Process Data from Dirty to Clean Module 1-4 Challenge Answers

Process Data from Dirty to Clean Module 1 Challenge Answers

Fill in the blank: If a test is statistically _____, the results are less likely to be due to random chance and more likely to be due to a real difference between the groups being compared.

  • significant
  • repeatable
  • connected
  • precise
Answers
  • significant

or

Fill in the blank: If a data professional calculates a statistical power of at least 80%, it is likely their experiment results are due to a _____ between the groups being compared, rather than random chance.

  • predicted connection
  • slight similarity
  • real difference
  • confirmed anomaly
Answers
  • real difference

A retailer keeps data in point-of-sale systems at its 10 local stores and an inventory management system at its central warehouse. When purchases are made at a store, they are only recorded in the point-of-sale systems. As a result, the inventory records are inaccurate and stores often run out of inventory. What data integrity problem does this scenario describe?

  • Replication
  • Transfer
  • Manipulation
  • Gathering
Answers
  • Replication

In a survey about a new smartphone app, 65% of respondents report they would recommend the app to others. The margin of error for the survey is 3%. Based on that margin of error, what range reflects the population’s true response?

  • 60-63%
  • 62-68%
  • 65-68%
  • 68-71%
Answers
  • 62-68%
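
The arithmetic behind this answer can be sketched in a few lines of Python. The function name is hypothetical and chosen only for illustration; it subtracts and adds the margin of error to the sample result.

```python
def margin_of_error_range(sample_pct, margin_pct):
    """Return the (low, high) bounds implied by a sample result and its margin of error."""
    return sample_pct - margin_pct, sample_pct + margin_pct


# Values from the question: 65% would recommend the app, margin of error 3%.
low, high = margin_of_error_range(65, 3)
print(f"{low}-{high}%")
```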

A car dealer conducts a survey to understand why customers choose their dealership. They are eager for positive feedback, so they email the survey to only those customers who purchased two or more vehicles from the dealership in the past five years. What is likely to result?

  • Random sampling
  • Unbiased sampling
  • Geographically limited sampling
  • Sampling bias
Answers
  • Sampling bias

Fill in the blank: To determine whether a survey or experiment has meaningful _____, a data team uses hypothesis testing.

  • process steps
  • results
  • action items
  • significance
Answers
  • results

A data professional in the logistics industry wants to calculate the margin of error for a study about transportation route efficiency. They know the sample size and confidence level. What must they also know in order to accurately calculate margin of error?

  • Distribution
  • Testing methodology
  • Population size
  • Correlation
Answers
  • Population size
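
To show how the three inputs fit together, here is a minimal sketch of one common textbook approach: the normal-approximation margin of error for a proportion, with a finite population correction. The formula choice, the function name, and the default proportion of 0.5 are assumptions made for illustration, not something the quiz specifies.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, population_size, confidence_level, proportion=0.5):
    """Approximate margin of error for a proportion, as a fraction (0.05 means 5%)."""
    # z-score for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of a sample proportion
    standard_error = sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: shrinks the error when the sample
    # is a large share of the population
    correction = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


# Hypothetical study: 400 sampled routes out of 10,000, at 95% confidence.
print(f"{margin_of_error(400, 10_000, 0.95):.1%}")
```

With the same sample size and confidence level, a smaller population gives a smaller margin of error, which is why population size is needed for an accurate figure.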

As an analyst downloads a dataset from the internet to their local drive, their internet connection goes down. This interrupts the download, causing them to have an incomplete copy of the dataset on their computer. What data integrity problem does this scenario describe?

  • Cleaning
  • Replication
  • Manipulation
  • Transfer
Answers
  • Transfer

Which of the following statements accurately describe sample size, population, and confidence level? Select all that apply.

  • Using sample size makes it possible to get enough information from a small group within a population to draw conclusions about the whole population.
  • A confidence level of 75% is considered ideal by most industries.
  • For effective outcomes, a data professional aims for a high confidence level in their sample.
  • The goal of random sampling is to ensure every possible type of the sample has an equal chance of being chosen.
Answers
  • For effective outcomes, a data professional aims for a high confidence level in their sample.
  • The goal of random sampling is to ensure every possible type of the sample has an equal chance of being chosen.

or

  • Sample size is a part of a population that is representative of the population.
  • Random sampling is a method that data professionals use to help address some of the issues with sampling bias.

Process Data from Dirty to Clean Module 2 Challenge Answers

To identify ways to improve the shipping process, a data analyst merges a dataset of client order data with a dataset of shipping data. What should the data team do to ensure the compatibility of the two datasets?

  • Apply a data structure
  • Use a visualization
  • Map the data
  • Spot-check for null values
Answers
  • Map the data

Fill in the blank: When typing a LEN function, the correct _____ to follow is =LEN(range).

  • validation
  • syntax
  • system
  • algorithm
Answers
  • syntax

In this spreadsheet, which function will extract Che Price’s four-digit postcode?

  • =RIGHT(C3,4)
  • =LEFT(4,C3)
  • =LEFT(C3,4)
  • =RIGHT(4,C3)
Answers
  • =RIGHT(C3,4)

Fill in the blank: In a VLOOKUP function, the word false tells VLOOKUP that an _____ match is desired.

  • inexact
  • approximate
  • exact
  • uncertain
Answers
  • exact

In the following spreadsheet, a data professional wants to create product IDs in column C. The IDs should include the item name plus its version number. Which function will create the ID Tether_rope02?

  • =CONCATENATE(A5+B5)
  • =CONCATENATE(A5_B5)
  • =CONCATENATE(A5,B5)
  • =CONCATENATE(A5*B5)
Answers
  • =CONCATENATE(A5,B5)

A data analyst wants to know how many cells from A2 through A50 contain numbers below 100. Which of the following COUNTIF statements should they use?

  • =COUNTIF(A2:A50,"<100")
  • =COUNTIF(A2:A50,">=100")
  • =COUNTIF(A2:A50, >100)
  • =COUNTIF(A2:A50, <=100)
Answers
  • =COUNTIF(A2:A50,"<100")

A data analyst uses the SPLIT tool to place each protein and nut into new, separate cells. What is the hyphen’s function in this scenario?

  • String
  • Substring
  • Delimiter
  • Duplicate
Answers
  • Delimiter
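
The same idea appears outside spreadsheets. As a rough Python analogue of the SPLIT tool, `str.split` breaks a string wherever the delimiter occurs. The cell value below is hypothetical, since the original spreadsheet is not shown.

```python
# Hypothetical cell value: a protein and a nut joined by a hyphen.
cell_value = "Chicken-Almonds"

# The hyphen is the delimiter: it marks where one item ends and the next begins.
parts = cell_value.split("-")
print(parts)
```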

A junior data analyst needs to search their spreadsheet for a particular client ID. In order to identify all cells containing the ID, they use a spreadsheet tool that changes how cells appear when values meet specific conditions. What tool do they use?

  • Cell filtering
  • Field length
  • Data merging
  • Conditional formatting
Answers
  • Conditional formatting

Process Data from Dirty to Clean Module 3 Challenge Answers

A data professional analyzes medical data for a health insurance company. The dataset they are working with contains millions of rows of data. What tool would be most efficient for the analyst to use?

  • CSV
  • Word processor
  • SQL
  • Spreadsheet
Answers
  • SQL

A data analyst discovers that their database has recognized product price data as text strings. What SQL function can the analyst use to convert the text strings to floats?

  • LENGTH
  • CAST
  • SUBSTR
  • TRIM
Answers
  • CAST
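
A minimal sketch of CAST in action, run through Python's built-in `sqlite3` module. The `products` table and its values are hypothetical. SQLite names its floating-point type `REAL`; other SQL dialects use different type names, so the target type may need adjusting.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO products (product_name, price) VALUES (?, ?)",
    [("Tether rope", "19.99"), ("Carabiner", "7.5")],
)

# CAST converts the text prices to a floating-point type (REAL in SQLite).
rows = conn.execute(
    "SELECT product_name, CAST(price AS REAL) AS price_float "
    "FROM products ORDER BY product_name"
).fetchall()

for name, price in rows:
    print(name, price, type(price).__name__)
conn.close()
```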

Fill in the blank: A data analyst working on a marketing project uses the SQL command _____ to add a row for a recent product lead to their organization’s database.

  • CREATE TABLE IF NOT EXISTS
  • INSERT INTO
  • DROP TABLE IF EXISTS
  • UPDATE
Answers
  • INSERT INTO

You are working with a database table that has columns about products, such as product_name. Which SUBSTR function and AS command will retrieve the first 2 characters of each product name and store the result in a new column called product_ID?

  • SUBSTR(product_name) AS (1, 2) product_ID
  • SUBSTR(product_name, 1, 2) AS product_ID
  • SUBSTR AS (1, 2 product_name) product_ID
  • SUBSTR(product_name, 2) AS product_ID
Answers
  • SUBSTR(product_name, 1, 2) AS product_ID
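
The correct option can be tried against a small in-memory SQLite database. This is only a sketch: the product names are hypothetical, and the arguments to SUBSTR are the column, the starting position (counted from 1), and the number of characters to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
# Hypothetical product names.
conn.executemany(
    "INSERT INTO products (product_name) VALUES (?)",
    [("Tether rope",), ("Carabiner",)],
)

# Take 2 characters starting at position 1 and expose them as product_ID.
rows = conn.execute(
    "SELECT product_name, SUBSTR(product_name, 1, 2) AS product_ID "
    "FROM products ORDER BY product_name"
).fetchall()
print(rows)
conn.close()
```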

In SQL, what function can be used to remove leading spaces from a piece of data?

  • FORMAT
  • SUBSTR
  • AVG
  • TRIM
Answers
  • TRIM

While working with a database table that contains the column computer_model, you notice that there are some duplicate entries. Which SQL clause would you use in a query to return the computer_model data without these duplicates?

  • DELETE computer_model
  • DUPLICATE computer_model
  • DISTINCT computer_model
  • DROP computer_model
Answers
  • DISTINCT computer_model
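
For illustration, the following sketch loads a few hypothetical model names, one of them repeated, into an in-memory SQLite table and queries them with DISTINCT. The table name `computers` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (computer_model TEXT)")
# Hypothetical data containing one duplicate entry.
conn.executemany(
    "INSERT INTO computers (computer_model) VALUES (?)",
    [("A100",), ("B200",), ("A100",)],
)

# DISTINCT returns each computer_model value once.
rows = conn.execute(
    "SELECT DISTINCT computer_model FROM computers ORDER BY computer_model"
).fetchall()
print(rows)
conn.close()
```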

Fill in the blank: The SQL command _____ is irreversible, so data analysts should consider whether data should be backed up before using it.

  • CREATE TABLE IF NOT EXISTS
  • DROP TABLE IF EXISTS
  • INSERT INTO
  • UPDATE
Answers
  • DROP TABLE IF EXISTS

You are using a database table that includes the column credit_card_numbers, and you want to check for any fraudulent activity. Which SQL clause will help you identify any credit card numbers that are more than 16 characters long?

  • COUNT(credit_card_numbers) > 16
  • IDENTIFY(credit_card_numbers) < 16
  • WHERE(credit_card_numbers) < 16
  • LENGTH(credit_card_numbers) > 16
Answers
  • LENGTH(credit_card_numbers) > 16
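
In a full query, this condition usually sits in a WHERE clause. The sketch below assumes a hypothetical `payments` table in which card numbers are stored as text, and uses made-up placeholder numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (credit_card_numbers TEXT)")

# Made-up placeholder values: one with 16 characters, one with 17.
expected_length = "4" + "1" * 15
too_long = "4" + "1" * 16
conn.executemany(
    "INSERT INTO payments (credit_card_numbers) VALUES (?)",
    [(expected_length,), (too_long,)],
)

# Keep only the rows whose card number has more than 16 characters.
rows = conn.execute(
    "SELECT credit_card_numbers FROM payments "
    "WHERE LENGTH(credit_card_numbers) > 16"
).fetchall()
print(rows)
conn.close()
```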

After joining multiple tables containing data about patient visits to a hospital, you find a significant number of null values in the patient_intake column. What SQL function can you use to replace these null values with a value in a different column?

  • CONCAT
  • TRIM
  • COALESCE
  • CAST
Answers
  • COALESCE
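
COALESCE returns the first non-null value among its arguments, so it can fall back to a second column when the first is null. A minimal sketch follows, assuming a hypothetical `visits` table with a `referral_note` column as the fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, patient_intake TEXT, referral_note TEXT)"
)
# Hypothetical rows; the second has a null patient_intake.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "walk-in", "clinic A"), (2, None, "clinic B")],
)

# Use patient_intake when present, otherwise fall back to referral_note.
rows = conn.execute(
    "SELECT patient_id, COALESCE(patient_intake, referral_note) AS intake "
    "FROM visits ORDER BY patient_id"
).fetchall()
print(rows)
conn.close()
```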

Process Data from Dirty to Clean Module 4 Challenge Answers

Fill in the blank: A data scientist keeps code for data analysis pipelines in a ___, which enables them to track the evolution of the pipelines over time.

  • dashboard
  • version control system
  • dataset
  • changelog
Answers
  • version control system

A data professional works on a financial audit. During the verification process, they keep in mind the big picture view of confirming that the company’s financial statements comply with accounting standards. What activities will help them achieve this goal? Select all that apply.

  • Consider the business problem
  • Consider the goal
  • Consider the reporting
  • Consider the data
Answers
  • Consider the data
  • Consider the business problem

Which SQL clause will consider a condition and return a value when that condition is met?

  • WHEN column_name = 'condition' CASE 'value' END
  • WHEN
    CASE column_name = 'condition' THEN 'value' END
  • CASE column_name = 'condition' THEN 'value' END
  • CASE
    WHEN column_name = 'condition' THEN 'value' END
Answers
  • CASE
    WHEN column_name = 'condition' THEN 'value' END
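
To see the CASE ... WHEN ... THEN ... END pattern run, here is a small sketch against an in-memory SQLite database. The `orders` table, its `status` values, and the `label` alias are hypothetical. Because the statement has no ELSE branch, rows that do not meet the condition come back as NULL (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
# Hypothetical rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "pending")],
)

# Return 'done' when the condition is met; with no ELSE, other rows yield NULL.
rows = conn.execute(
    "SELECT order_id, "
    "CASE WHEN status = 'shipped' THEN 'done' END AS label "
    "FROM orders ORDER BY order_id"
).fetchall()
print(rows)
conn.close()
```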

A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?

  • Disclosure
  • Illumination
  • Documentation
  • Examination
Answers
  • Documentation

During verification, you notice an error in a dataset. You remember fixing a similar error when previously cleaning the data. What tool can you reference to find documentation about how to fix the error?

  • Notepad
  • Changelog
  • Data table
  • Text editor
Answers
  • Changelog

A data analyst uses a pivot table in Google Sheets to determine how many times a particular country name occurs within a dataset. What function will provide the required information?

  • CONCAT
  • CASE
  • COUNTA
  • CHECK
Answers
  • COUNTA

Which of the following statements accurately describe code review and code commit? Select all that apply.

  • Code review occurs prior to code commit.
  • Code review must involve numerous formal approvals.
  • An example of code review is a data professional asking a colleague to assess their SQL query.
  • Code commit might involve updating code within a version control system.
Answers
  • Code review occurs prior to code commit.
  • An example of code review is a data professional asking a colleague to assess their SQL query.
  • Code commit might involve updating code within a version control system.

Fill in the blank: To correct a misspelling in their spreadsheet, a data professional uses _____ to search for any instance of “compurer” and change it to “computer.”

  • Remove duplicates
  • find and replace
  • formatting
  • TRIM
Answers
  • find and replace
