
Question.5019 - Data cleaning is a crucial step in the big data analytics process. Before analyzing your data, you must ensure it is free from errors and inconsistencies and formatted appropriately for your analysis. Proper data cleaning helps ensure that your analysis is based on high-quality data, preventing errors and incorrect conclusions. Additionally, clean data is easier to work with and can reduce the time and resources needed for analysis.

OpenRefine, formerly known as Google Refine, is a powerful, free, open-source tool designed for working with messy data. It allows you to clean data, transform it from one format to another, and extend it with web services and external data. OpenRefine is capable of handling large datasets, making it suitable for big data projects. Note that OpenRefine is a Java application and requires the Java Runtime Environment (JRE) to run.

Action Items:

1. Visit OpenRefine’s homepage for an overview of its features. Then, download and install version 3.6.2.

2. Import Dataset (properties.csv)
Run OpenRefine and point your browser at 127.0.0.1:3333.
We use a product dataset from Mercari, derived from a Kaggle competition. If you are interested in the details, visit the data description page. We have sampled a subset of the dataset, provided as "properties.csv".
Choose "Create Project" → "This Computer" → "Choose Files" → "properties.csv". Click "Next".
You will now see a preview of the data. Click "Create Project" at the upper right corner.

3. Clean/Refine the data and answer the questions below.
NOTE: OpenRefine maintains a log of all changes. You can undo changes with the "Undo/Redo" button at the upper left corner. Follow the exact output format specified in every part below.

Additional resources for further learning:
OpenRefine is a powerful tool for cleaning data. Whether you are working on a course or a real-world project, this tool could be very helpful.
To learn more about it, search online; there are many tutorials, such as https://www.youtube.com/watch?v=RhaDVmLT-Ck.

Question 1 (4 points)
a) Select the category_name column and choose "Facet by Blank" (Facet → Customized Facets → Facet by Blank) to filter out the records that have blank values in this column. Exclude these rows (hint: click "include" next to the appropriate Boolean value in the facet panel on the left). How many rows are left?

Question 2 (4 points)
b) Select the name column and apply clustering by selecting Edit Cells → Cluster. This opens a window where you can choose different “methods” and “keying functions” to use while clustering. Choose the keying function that produces the largest number of clusters under the “Key Collision” method. Click ‘Select All’ and ‘Merge Selected & Close’. Provide the name of the keying function and the number of clusters produced.
Answer: Using the key collision method with the fingerprint keying function, the number of clusters produced is 9670.

Question 3 (4 points)
c) Replace the null values in the brand_name column with the text “Unknown” (Edit Cells → Transform). Hint: read the GREL instructions and learn how to use the "if" function. Report your GREL code. Click "OK" when the results in the preview window are correct. Note: as with Excel, you should learn how to find the appropriate function and apply it.
Answer: if(value == null, "Unknown", value)

Question 4 (4 points)
d) Go to the "price" column, choose “Edit cells”, select “Common transforms”, and choose “To number” as the desired format. Then, use an appropriate function under "Edit Column" to create a new column high_priced with the values 0 or 1 based on the “price” column with the following conditions: if the price is greater than 50, high_priced should be set to 1, else 0. Report your GREL code.
Click "OK" when the results in the preview window are correct.
Answer: if(value > 50, 1, 0)

Question 5 (4 points)
e) Create a new column has_offer with the values 0 or 1 based on the item_description column with the following conditions: if it contains the text “discount” or “offer” or “sale”, then set the value in has_offer to 1, else 0. Report your GREL code. Hint: you can use both the if() and or() functions.
Answer: if(or(contains(value.toLowercase(), "discount"), contains(value.toLowercase(), "offer"), contains(value.toLowercase(), "sale")), 1, 0)

Question 6 (5 points)
Please reflect on your learning experience throughout this lab assignment. To receive full credit, ensure you include specific details.
What was the most interesting thing you learned?
What challenges did you encounter during your learning process?
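
For readers who want to sanity-check the logic outside OpenRefine, the blank filter from Question 1, the GREL expressions reported for Questions 3 to 5, and the fingerprint keying function named in Question 2 can be mirrored in plain Python. This is only a rough sketch under stated assumptions: the function names are hypothetical, the sample rows are made up rather than taken from properties.csv, a missing item_description is treated as empty text, and the fingerprint function is a simplified approximation (ASCII punctuation only) of OpenRefine's documented steps, so it will not necessarily reproduce the cluster count above.

```python
import string
import unicodedata


def is_blank(value):
    """Question 1: treat None and the empty string as blank."""
    return value is None or value == ""


def brand_or_unknown(value):
    """Question 3: mirrors if(value == null, "Unknown", value)."""
    return "Unknown" if value is None else value


def high_priced(price):
    """Question 4: mirrors if(value > 50, 1, 0); price must already be a number."""
    return 1 if price > 50 else 0


def has_offer(description):
    """Question 5: 1 if the text mentions discount, offer, or sale."""
    # Assumption of this sketch: a missing description counts as empty text.
    text = (description or "").lower()
    return 1 if any(word in text for word in ("discount", "offer", "sale")) else 0


def fingerprint(value):
    """Question 2: simplified fingerprint key (approximation, not OpenRefine's code)."""
    # Trim, lowercase, and decompose accented characters.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    kept = []
    for ch in text:
        # Drop combining accents and ASCII punctuation.
        if unicodedata.combining(ch) or ch in string.punctuation:
            continue
        # Drop control characters, but keep whitespace for tokenizing.
        if unicodedata.category(ch) == "Cc" and not ch.isspace():
            continue
        kept.append(ch)
    # Split into tokens, remove duplicates, sort, and rejoin.
    return " ".join(sorted(set("".join(kept).split())))


# Made-up rows for illustration only; they are not taken from properties.csv.
rows = [
    {"category_name": "Women/Tops", "name": "Nike Shoes!", "brand_name": None,
     "price": 75.0, "item_description": "Big SALE, brand new"},
    {"category_name": "", "name": "shoes nike", "brand_name": "Nike",
     "price": 20.0, "item_description": "Gently used"},
]

# Question 1: keep only rows whose category_name is not blank.
kept_rows = [row for row in rows if not is_blank(row["category_name"])]

# Questions 3 to 5: fill brand_name and derive the two new columns.
for row in rows:
    row["brand_name"] = brand_or_unknown(row["brand_name"])
    row["high_priced"] = high_priced(row["price"])
    row["has_offer"] = has_offer(row["item_description"])
```

Two names that produce the same fingerprint key, such as the two sample names here, would land in the same key collision cluster. Note that the substring test in has_offer also matches words like "wholesale", which is the same behavior as contains() in the GREL answer.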

Answer Below:

Question xxxxxxxxx Matrix xxxxxxxxxx MetricsPredicted xxxxxxxxxx Positive xxxxxxxxx Non-Fraudulent xxxxxxxx Actual xxxxxxxxxx Positive xxxx Positive xx False xxxxxxxx FN xxxxxx Non-Fraudulent xxxxxxxx False xxxxxxxx FP xxxx Negative xx considering xxxxxxx accuracy xx TN xx TN xx FN xxxxx rate x accuracy x Considering xxxxxxxxxxx recall xxxx positive xxxx TP xx FN xxxxx in xxxxx of xxxxxxxxxxx true xxxxxxxx rate xx TN xx Increasing xxx cutoff xxxxx will xxxxxxxx decrease xxxxxxxxxxx since xxx threshold xxxxx become xxxx stringent xxxx fewer xxxxxx fraudulent xxxxx being xxxxxxxxx identified xxxxxxxx resulting xx an xxxxxx false xxxxxxxx rate xxxxx increase xx specificity xx raising xxx cutoff xxx model xxxxxxx more xxxxxxxxxxxx in xxxxxxxxxx fraudulent xxxxx correctly xxxxxxxxxxx more xxxxxxxxxxxxxx cases xxx reducing xxxxx positives

More Articles From DATA 610: Big Data Analytics and Data Mining

