Table of Contents
- 1 Process Data from Dirty to Clean Module 1 Challenge Answers
- 1.1 Fill in the blank: If a test is statistically _____, the results are less likely to be due to random chance and more likely to be due to a real difference between the groups being compared.
- 1.2 Fill in the blank: If a data professional calculates a statistical power of at least 80%, it is likely their experiment results are due to a _____ between the groups being compared, rather than random chance.
- 1.3 A retailer keeps data in point-of-sale systems at its 10 local stores and an inventory management system at its central warehouse. When purchases are made at a store, they are only recorded in the point-of-sale systems. As a result, the inventory records are inaccurate and stores often run out of inventory. What data integrity problem does this scenario describe?
- 1.4 In a survey about a new smartphone app, 65% of respondents report they would recommend the app to others. The margin of error for the survey is 3%. Based on that margin of error, what range reflects the population’s true response?
- 1.5 A car dealer conducts a survey to understand why customers choose their dealership. They are eager for positive feedback, so they email the survey to only those customers who purchased two or more vehicles from the dealership in the past five years. What is likely to result?
- 1.6 Fill in the blank: To determine whether a survey or experiment has meaningful _____, a data team uses hypothesis testing.
- 1.7 A data professional in the logistics industry wants to calculate the margin of error for a study about transportation route efficiency. They know the sample size and confidence level. What must they also know in order to accurately calculate margin of error?
- 1.8 As an analyst downloads a dataset from the internet to their local drive, their internet connection goes down. This interrupts the download, causing them to have an incomplete copy of the dataset on their computer. What data integrity problem does this scenario describe?
- 1.9 Which of the following statements accurately describe sample size, population, and confidence level? Select all that apply.
- 2 Process Data from Dirty to Clean Module 2 Challenge Answers
- 2.1 To identify ways to improve the shipping process, a data analyst merges a dataset of client order data with a dataset of shipping data. What should the data team do to ensure the compatibility of the two datasets?
- 2.2 Fill in the blank: When typing a LEN function, the correct _____ to follow is =LEN(range).
- 2.3 In this spreadsheet, which function will extract Che Price’s four-digit postcode?
- 2.4 Fill in the blank: In a VLOOKUP function, the word false tells VLOOKUP that an _____ match is desired.
- 2.5 In the following spreadsheet, a data professional wants to create product IDs in column C. The IDs should include the item name plus its version number. Which function will create the ID Tether_rope02?
- 2.6 A data analyst wants to know how many cells from A2 through A50 contain numbers below 100. Which of the following COUNTIF statements should they use?
- 2.7 A data analyst uses the SPLIT tool to place each protein and nut into new, separate cells. What is the hyphen’s function in this scenario?
- 2.8 A junior data analyst needs to search their spreadsheet for a particular client ID. In order to identify all cells containing the ID, they use a spreadsheet tool that changes how cells appear when values meet specific conditions. What tool do they use?
- 3 Process Data from Dirty to Clean Module 3 Challenge Answers
- 3.1 A data professional analyzes medical data for a health insurance company. The dataset they are working with contains millions of rows of data. What tool would be most efficient for the analyst to use?
- 3.2 A data analyst discovers that their database has recognized product price data as text strings. What SQL function can the analyst use to convert the text strings to floats?
- 3.3 Fill in the blank: A data analyst working on a marketing project uses the SQL command _____ to add a row for a recent product lead to their organization’s database.
- 3.4 You are working with a database table that has columns about products, such as product_name. Which SUBSTR function and AS command will retrieve the first 2 characters of each product name and store the result in a new column called product_ID?
- 3.5 In SQL, what function can be used to remove leading spaces from a piece of data?
- 3.6 While working with a database table that contains the column computer_model, you notice that there are some duplicate entries. Which SQL clause would you use in a query to return the computer_model data without these duplicates?
- 3.7 Fill in the blank: The SQL command _____ is irreversible, so data analysts should consider whether data should be backed up before using it.
- 3.8 You are using a database table that includes the column credit_card_numbers, and you want to check for any fraudulent activity. Which SQL clause will help you identify any credit card numbers that are more than 16 characters long?
- 3.9 After joining multiple tables containing data about patient visits to a hospital, you find a significant number of null values in the patient_intake column. What SQL function can you use to replace these null values with a value in a different column?
- 4 Process Data from Dirty to Clean Module 4 Challenge Answers
- 4.1 Fill in the blank: A data scientist keeps code for data analysis pipelines in a _____, which enables them to track the evolution of the pipelines over time.
- 4.2 A data professional works on a financial audit. During the verification process, they keep in mind the big picture view of confirming that the company’s financial statements comply with accounting standards. What activities will help them achieve this goal? Select all that apply.
- 4.3 Which SQL clause will consider a condition and return a value when that condition is met?
- 4.4 A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?
- 4.5 During verification, you notice an error in a dataset. You remember fixing a similar error when previously cleaning the data. What tool can you reference to find documentation about how to fix the error?
- 4.6 A data analyst uses a pivot table in Google Sheets to determine how many times a particular country name occurs within a dataset. What function will provide the required information?
- 4.7 Which of the following statements accurately describe code review and code commit? Select all that apply.
- 4.8 Fill in the blank: To correct a misspelling in their spreadsheet, a data professional uses _____ to search for any instance of “compurer” and change it to “computer.”
Process Data from Dirty to Clean Module 1 Challenge Answers
Fill in the blank: If a test is statistically _____, the results are less likely to be due to random chance and more likely to be due to a real difference between the groups being compared.
- significant
- repeatable
- connected
- precise
or
Fill in the blank: If a data professional calculates a statistical power of at least 80%, it is likely their experiment results are due to a _____ between the groups being compared, rather than random chance.
- predicted connection
- slight similarity
- real difference
- confirmed anomaly
A retailer keeps data in point-of-sale systems at its 10 local stores and an inventory management system at its central warehouse. When purchases are made at a store, they are only recorded in the point-of-sale systems. As a result, the inventory records are inaccurate and stores often run out of inventory. What data integrity problem does this scenario describe?
- Replication
- Transfer
- Manipulation
- Gathering
In a survey about a new smartphone app, 65% of respondents report they would recommend the app to others. The margin of error for the survey is 3%. Based on that margin of error, what range reflects the population’s true response?
- 60-63%
- 62-68%
- 65-68%
- 68-71%
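The range comes from subtracting the margin of error from the observed result and adding it to the observed result. A quick arithmetic check in Python, using only the numbers given in the question:

```python
# Survey result and margin of error from the question, in percentage points.
observed = 65
margin_of_error = 3

# The interval runs margin_of_error points below and above the observed value.
lower = observed - margin_of_error
upper = observed + margin_of_error

print(f"{lower}-{upper}%")  # 62-68%
```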
A car dealer conducts a survey to understand why customers choose their dealership. They are eager for positive feedback, so they email the survey to only those customers who purchased two or more vehicles from the dealership in the past five years. What is likely to result?
- Random sampling
- Unbiased sampling
- Geographically limited sampling
- Sampling bias
Fill in the blank: To determine whether a survey or experiment has meaningful _____, a data team uses hypothesis testing.
- process steps
- results
- action items
- significance
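Hypothesis testing is how a team decides whether a difference between groups is significant rather than the product of random chance. A minimal sketch, assuming a two-proportion z-test (one common test, not necessarily the one a given team would use), the conventional 0.05 significance threshold, and invented counts:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one true rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis of no real difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 200 of 1,000 vs 250 of 1,000 users recommend the app.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant: {p_value < alpha}")
```

A p-value below alpha means a difference this large would be unlikely if the groups were truly the same.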
A data professional in the logistics industry wants to calculate the margin of error for a study about transportation route efficiency. They know the sample size and confidence level. What must they also know in order to accurately calculate margin of error?
- Distribution
- Testing methodology
- Population size
- Correlation
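One common formula for the margin of error of a proportion uses the confidence level (through a z-score) and the sample size, and brings in population size through a finite population correction. The sketch below assumes that formula and a worst-case proportion of 0.5; the route numbers are hypothetical:

```python
import math
from statistics import NormalDist

def margin_of_error(sample_size, confidence_level, population_size, proportion=0.5):
    """Margin of error for a proportion, with a finite population correction."""
    # z-score for a two-sided interval, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    std_err = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Shrinks the error when the sample is a large share of the population.
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * std_err * fpc

# Hypothetical study: 500 of 10,000 routes sampled at 95% confidence.
moe = margin_of_error(500, 0.95, 10_000)
print(f"Margin of error: {moe:.1%}")
```

Changing only the population size changes the result, which is why it must be known alongside sample size and confidence level in this formula.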
As an analyst downloads a dataset from the internet to their local drive, their internet connection goes down. This interrupts the download, causing them to have an incomplete copy of the dataset on their computer. What data integrity problem does this scenario describe?
- Cleaning
- Replication
- Manipulation
- Transfer
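An interrupted transfer can leave a file that looks normal but is missing data. If the publisher provides a checksum for the complete file, comparing it with a checksum of the downloaded copy catches the problem. A minimal sketch with Python's hashlib; the data and the published checksum are simulated in memory rather than read from disk:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical complete file and the checksum its publisher would list.
complete_file = b"id,value\n1,12.5\n2,8.0\n3,15.2\n"
published_checksum = sha256_of(complete_file)

# Simulate a download cut off halfway through by a dropped connection.
downloaded = complete_file[: len(complete_file) // 2]

transfer_ok = sha256_of(downloaded) == published_checksum
print("Transfer complete" if transfer_ok else "Transfer incomplete: re-download")
```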
Which of the following statements accurately describe sample size, population, and confidence level? Select all that apply.
- Using sample size makes it possible to get enough information from a small group within a population to draw conclusions about the whole population.
- A confidence level of 75% is considered ideal by most industries.
- For effective outcomes, a data professional aims for a high confidence level in their sample.
- The goal of random sampling is to ensure every possible type of the sample has an equal chance of being chosen.
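Random sampling gives every member of the population the same chance of being selected. A minimal sketch using Python's random module on a hypothetical list of customer IDs:

```python
import random

# Hypothetical population of 1,000 customer IDs.
population = list(range(1, 1001))

rng = random.Random(42)  # seeded only so the example is repeatable
# Each ID has the same chance of selection; sample() draws without replacement.
sample = rng.sample(population, k=100)
print(sorted(sample)[:10])
```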
Process Data from Dirty to Clean Module 2 Challenge Answers
To identify ways to improve the shipping process, a data analyst merges a dataset of client order data with a dataset of shipping data. What should the data team do to ensure the compatibility of the two datasets?
- Apply a data structure
- Use a visualization
- Map the data
- Spotcheck for null values
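Mapping means deciding which field in one dataset corresponds to which field in the other before merging. A simplified sketch, assuming two hypothetical datasets whose shared key is named differently (OrderID versus order_number):

```python
# Hypothetical rows from each source; the column names do not match.
orders = [
    {"OrderID": "A-1001", "Client": "Client 1"},
    {"OrderID": "A-1002", "Client": "Client 2"},
]
shipments = [{"order_number": "A-1001", "carrier": "Carrier X"}]

# The mapping: which shipping field corresponds to which order-side name.
field_map = {"order_number": "OrderID", "carrier": "Carrier"}

def map_row(row, mapping):
    """Rename each field using the mapping so both datasets share column names."""
    return {mapping[key]: value for key, value in row.items()}

mapped = {row["OrderID"]: row for row in (map_row(s, field_map) for s in shipments)}

# Merge on the now-shared key; orders without a shipment keep only order fields.
merged = [{**order, **mapped.get(order["OrderID"], {})} for order in orders]
print(merged)
```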
Fill in the blank: When typing a LEN function, the correct _____ to follow is =LEN(range).
- validation
- syntax
- system
- algorithm
In this spreadsheet, which function will extract Che Price’s four-digit postcode?
- =RIGHT(C3,4)
- =LEFT(4,C3)
- =LEFT(C3,4)
- =RIGHT(4,C3)
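Spreadsheet LEFT and RIGHT take the text (or cell) first and the number of characters second, which is why the argument order matters. The spreadsheet for this question is not shown, so the sketch below assumes C3 holds an address ending in a four-digit postcode and mimics the two functions in Python:

```python
def left(text: str, n: int) -> str:
    """Mimics the spreadsheet LEFT(text, n): the first n characters."""
    return text[:n]

def right(text: str, n: int) -> str:
    """Mimics the spreadsheet RIGHT(text, n): the last n characters."""
    return text[-n:] if n > 0 else ""

# Hypothetical contents of C3, with the postcode at the end of the string.
c3 = "12 Example Street, Sampletown 2000"
print(right(c3, 4))  # 2000
print(left(c3, 4))   # "12 E": LEFT reads from the wrong end for this layout
```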
Fill in the blank: In a VLOOKUP function, the word false tells VLOOKUP that an _____ match is desired.
- inexact
- approximate
- exact
- uncertain
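VLOOKUP's final argument chooses the match type: FALSE requires an exact match, while TRUE (the default) assumes the first column is sorted and returns the row with the largest key that is less than or equal to the search key. A simplified Python model of both modes, with a hypothetical lookup table:

```python
import bisect

# Hypothetical lookup table; the key column is sorted ascending, as the
# approximate mode requires.
keys = [0, 50, 100, 500]
values = ["none", "bronze", "silver", "gold"]

def vlookup(search_key, exact):
    """Simplified model of VLOOKUP(search_key, table, 2, not exact)."""
    if exact:  # like VLOOKUP(..., FALSE)
        if search_key in keys:
            return values[keys.index(search_key)]
        return "#N/A"
    # Like VLOOKUP(..., TRUE): largest key that is <= search_key.
    i = bisect.bisect_right(keys, search_key) - 1
    return values[i] if i >= 0 else "#N/A"

print(vlookup(75, exact=True))   # #N/A: 75 is not in the table
print(vlookup(75, exact=False))  # bronze: nearest key below 75 is 50
```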
In the following spreadsheet, a data professional wants to create product IDs in column C. The IDs should include the item name plus its version number. Which function will create the ID Tether_rope02?
- =CONCATENATE(A5+B5)
- =CONCATENATE(A5_B5)
- =CONCATENATE(A5,B5)
- =CONCATENATE(A5*B5)
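CONCATENATE joins its comma-separated arguments as text with no separator; + and * are arithmetic operators rather than joiners. A Python stand-in, assuming A5 holds "Tether_rope" and B5 holds the text "02" (the spreadsheet itself is not shown):

```python
def concatenate(*parts) -> str:
    """Mimics CONCATENATE: joins each argument's text with no separator."""
    return "".join(str(p) for p in parts)

# Assumed cell contents.
a5 = "Tether_rope"
b5 = "02"
print(concatenate(a5, b5))  # Tether_rope02
```

In this sketch, passing the number 2 instead of the text "02" would drop the leading zero from the ID.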
A data analyst wants to know how many cells from A2 through A50 contain numbers below 100. Which of the following COUNTIF statements should they use?
- =COUNTIF(A2:A50,"<100")
- =COUNTIF(A2:A50,">=100")
- =COUNTIF(A2:A50, >100)
- =COUNTIF(A2:A50, <=100)
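COUNTIF's criterion is a quoted string such as "<100", which counts numeric cells strictly below 100. A rough Python equivalent with hypothetical, shortened contents standing in for A2:A50; blank and text cells are skipped:

```python
def countif_less_than(cells, threshold):
    """Count numeric cells below threshold, like =COUNTIF(range, "<threshold")."""
    return sum(1 for c in cells if isinstance(c, (int, float)) and c < threshold)

# Hypothetical contents of A2:A50 (shortened); None stands for a blank cell.
column_a = [12, 250, 99, None, 100, 7, "n/a"]
print(countif_less_than(column_a, 100))  # 3
```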
A data analyst uses the SPLIT tool to place each protein and nut into new, separate cells. What is the hyphen’s function in this scenario?
- String
- Substring
- Delimiter
- Duplicate
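A delimiter is the character that marks where one value ends and the next begins. The original cell values are not shown, so the sketch assumes hypothetical protein-nut pairs joined by a hyphen and splits them with Python's str.split:

```python
# Hypothetical cells, each holding a protein and a nut joined by a hyphen.
cells = ["Chicken-Almond", "Tofu-Cashew", "Salmon-Walnut"]

# The hyphen is the delimiter: split on it to get one value per new cell.
split_rows = [cell.split("-") for cell in cells]
for protein, nut in split_rows:
    print(protein, nut)
```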
A junior data analyst needs to search their spreadsheet for a particular client ID. In order to identify all cells containing the ID, they use a spreadsheet tool that changes how cells appear when values meet specific conditions. What tool do they use?
- Cell filtering
- Field length
- Data merging
- Conditional formatting
Process Data from Dirty to Clean Module 3 Challenge Answers
A data professional analyzes medical data for a health insurance company. The dataset they are working with contains millions of rows of data. What tool would be most efficient for the analyst to use?
- CSV
- Word processor
- SQL
- Spreadsheet
A data analyst discovers that their database has recognized product price data as text strings. What SQL function can the analyst use to convert the text strings to floats?
- LENGTH
- CAST
- SUBSTR
- TRIM
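A minimal sketch of CAST using SQLite through Python's built-in sqlite3 module, with a hypothetical products table. SQLite's floating-point type is REAL; in BigQuery the equivalent cast is CAST(price AS FLOAT64).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where prices were loaded as text strings.
conn.execute("CREATE TABLE products (product_name TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Rope", "19.99"), ("Carabiner", "4.50")],
)

# CAST converts each text price to a floating-point number (REAL in SQLite).
rows = conn.execute(
    "SELECT product_name, CAST(price AS REAL) AS price_float FROM products"
).fetchall()
for name, price in rows:
    print(name, price, type(price).__name__)
```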
Fill in the blank: A data analyst working on a marketing project uses the SQL command _____ to add a row for a recent product lead to their organization’s database.
- CREATE TABLE IF NOT EXISTS
- INSERT INTO
- DROP TABLE IF EXISTS
- UPDATE
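INSERT INTO appends new rows to an existing table, while UPDATE changes rows that already exist. A minimal sketch in SQLite via Python's sqlite3 module; the leads table and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (lead_name TEXT, product TEXT, contacted_on TEXT)")

# INSERT INTO adds one new row; the ? placeholders keep values safely quoted.
conn.execute(
    "INSERT INTO leads (lead_name, product, contacted_on) VALUES (?, ?, ?)",
    ("Hypothetical Lead", "Smart thermostat", "2024-05-01"),
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM leads").fetchone()[0]
print(row_count)  # 1
```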
You are working with a database table that has columns about products, such as product_name. Which SUBSTR function and AS command will retrieve the first 2 characters of each product name and store the result in a new column called product_ID?
- SUBSTR(product_name) AS (1, 2) product_ID
- SUBSTR(product_name, 1, 2) AS product_ID
- SUBSTR AS (1, 2 product_name) product_ID
- SUBSTR(product_name, 2) AS product_ID
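SUBSTR takes the text, a starting position (counting from 1), and a length, and AS names the resulting column. A minimal sketch in SQLite via Python's sqlite3 module with hypothetical product names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [("Lamp",), ("Desk",), ("Chair",)])

# Start at character 1 and take 2 characters; store them as product_ID.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT SUBSTR(product_name, 1, 2) AS product_ID FROM products"
    )
]
print(sorted(ids))  # ['Ch', 'De', 'La']
```

With only two arguments, SUBSTR(product_name, 2) returns everything from the second character onward, not the first two characters.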
In SQL, what function can be used to remove leading spaces from a piece of data?
- FORMAT
- SUBSTR
- AVG
- TRIM
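TRIM removes spaces from both ends of a value; LTRIM removes only the leading spaces. A minimal sketch in SQLite via Python's sqlite3 module with a hypothetical padded value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
raw = "   Acme Corp "  # hypothetical value with leading and trailing spaces

# LTRIM strips the left side only; TRIM strips both sides.
ltrimmed, trimmed = conn.execute("SELECT LTRIM(?), TRIM(?)", (raw, raw)).fetchone()
print(repr(ltrimmed), repr(trimmed))
```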
While working with a database table that contains the column computer_model, you notice that there are some duplicate entries. Which SQL clause would you use in a query to return the computer_model data without these duplicates?
- DELETE computer_model
- DUPLICATE computer_model
- DISTINCT computer_model
- DROP computer_model
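SELECT DISTINCT returns each value once without modifying the table, whereas DELETE and DROP remove stored data. A minimal sketch in SQLite via Python's sqlite3 module with hypothetical model names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (computer_model TEXT)")
conn.executemany(
    "INSERT INTO computers VALUES (?)",
    [("X100",), ("Z200",), ("X100",), ("Y150",), ("Z200",)],
)

# DISTINCT collapses repeated values in the result only.
models = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT computer_model FROM computers ORDER BY computer_model"
    )
]
print(models)  # ['X100', 'Y150', 'Z200']
```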
Fill in the blank: The SQL command _____ is irreversible, so data analysts should consider whether data should be backed up before using it.
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
- INSERT INTO
- UPDATE
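Once a DROP TABLE is committed, the table and its data are gone unless a copy exists, so a simple safeguard is to copy the table first. A minimal sketch in SQLite via Python's sqlite3 module; the leads table and the leads_backup name are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (lead_name TEXT)")
conn.execute("INSERT INTO leads VALUES ('Hypothetical Lead')")

# Copy the table before dropping it.
conn.execute("CREATE TABLE leads_backup AS SELECT * FROM leads")
conn.execute("DROP TABLE IF EXISTS leads")
conn.commit()

tables = {
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
print(tables)  # {'leads_backup'}
```

IF EXISTS makes the statement safe to rerun: if leads is already gone, it does nothing instead of raising an error.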
You are using a database table that includes the column credit_card_numbers, and you want to check for any fraudulent activity. Which SQL clause will help you identify any credit card numbers that are more than 16 characters long?
- COUNT(credit_card_numbers) > 16
- IDENTIFY(credit_card_numbers) < 16
- WHERE(credit_card_numbers) < 16
- LENGTH(credit_card_numbers) > 16
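LENGTH counts the characters in a value, and the condition belongs in a WHERE clause to filter rows. A minimal sketch in SQLite via Python's sqlite3 module; the card numbers are made-up strings stored as text:

```python
import sqlite3

valid = "4" + "1" * 15     # 16 characters
too_long = "4" + "1" * 16  # 17 characters

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (credit_card_numbers TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?)", [(valid,), (too_long,)])

# Keep only rows whose card number is longer than 16 characters.
suspicious = [
    row[0]
    for row in conn.execute(
        "SELECT credit_card_numbers FROM transactions "
        "WHERE LENGTH(credit_card_numbers) > 16"
    )
]
print(suspicious)
```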
After joining multiple tables containing data about patient visits to a hospital, you find a significant number of null values in the patient_intake column. What SQL function can you use to replace these null values with a value in a different column?
- CONCAT
- TRIM
- COALESCE
- CAST
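COALESCE returns the first non-null value among its arguments, so it can fall back to another column (and then to a default) when patient_intake is null. A minimal sketch in SQLite via Python's sqlite3 module; the referral_source column and the 'unknown' default are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (visit_id INTEGER, patient_intake TEXT, referral_source TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "walk-in", "clinic"), (2, None, "referred"), (3, None, None)],
)

# First non-null of: patient_intake, then referral_source, then 'unknown'.
rows = conn.execute(
    "SELECT visit_id, COALESCE(patient_intake, referral_source, 'unknown') AS intake "
    "FROM visits ORDER BY visit_id"
).fetchall()
print(rows)
```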
Process Data from Dirty to Clean Module 4 Challenge Answers
Fill in the blank: A data scientist keeps code for data analysis pipelines in a _____, which enables them to track the evolution of the pipelines over time.
- dashboard
- version control system
- dataset
- changelog
A data professional works on a financial audit. During the verification process, they keep in mind the big picture view of confirming that the company’s financial statements comply with accounting standards. What activities will help them achieve this goal? Select all that apply.
- Consider the business problem
- Consider the goal
- Consider the reporting
- Consider the data
Which SQL clause will consider a condition and return a value when that condition is met?
- WHEN column_name = 'condition' CASE 'value' END
- WHEN CASE column_name = 'condition' THEN 'value' END
- CASE column_name = 'condition' THEN 'value' END
- CASE WHEN column_name = 'condition' THEN 'value' END
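A CASE expression opens with CASE, lists one or more WHEN condition THEN value pairs, optionally adds ELSE, and closes with END. A minimal sketch in SQLite via Python's sqlite3 module with a hypothetical customers table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Ana", "US"), ("Ben", "CA")])

# Return 'United States' when the condition is met; otherwise keep the original code.
rows = conn.execute(
    """
    SELECT name,
           CASE
               WHEN country = 'US' THEN 'United States'
               ELSE country
           END AS country_name
    FROM customers
    ORDER BY name
    """
).fetchall()
print(rows)
```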
A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?
- Disclosure
- Illumination
- Documentation
- Examination
During verification, you notice an error in a dataset. You remember fixing a similar error when previously cleaning the data. What tool can you reference to find documentation about how to fix the error?
- Notepad
- Changelog
- Data table
- Text editor
A data analyst uses a pivot table in Google Sheets to determine how many times a particular country name occurs within a dataset. What function will provide the required information?
- CONCAT
- CASE
- COUNTA
- CHECK
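In a pivot table, summarizing a column with COUNTA counts its non-empty cells, so grouping rows by country yields how often each name appears. A rough Python equivalent using collections.Counter on a hypothetical country column:

```python
from collections import Counter

# Hypothetical Country column; "" and None stand for empty cells.
country_column = ["Kenya", "Peru", "", "Kenya", "Japan", "Kenya", None]

# Like COUNTA per group: count only non-empty values.
counts = Counter(c for c in country_column if c not in ("", None))
print(counts["Kenya"])  # 3
```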
Which of the following statements accurately describe code review and code commit? Select all that apply.
- Code review occurs prior to code commit.
- Code review must involve numerous formal approvals.
- An example of code review is a data professional asking a colleague to assess their SQL query.
- Code commit might involve updating code within a version control system.
Fill in the blank: To correct a misspelling in their spreadsheet, a data professional uses _____ to search for any instance of “compurer” and change it to “computer.”
- Remove duplicates
- find and replace
- formatting
- TRIM