Google Data Analytics Professional Certificate capstone. In this project I cleaned the Divvy dataset in BigQuery using SQL.
Business task (our assignment from the stakeholder): How do annual members and casual riders use Cyclistic bikes differently?
To address this business task, we will produce a report with the following deliverables:
- A clear statement of the business task
- A description of all data sources used
- Documentation of any cleaning or manipulation of data
- A summary of the analysis
- Supporting visualizations and key findings
- Top three recommendations based on the analysis
Guiding Questions:
- What tools are you choosing and why?
For this case study, I will use SQL (BigQuery) for most of the cleaning, organization, and analysis, and Tableau for the visualizations. This lets me reinforce what I learned during the course and showcase my skills to recruiters.
- Have you ensured your data’s integrity?
The data is consistent, and I use queries to filter it and to calculate helpful fields from it.
- What steps have you taken to ensure that your data is clean?
After combining the data in BigQuery, I ran queries to check for missing and duplicate data and to find and remove outliers.
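A minimal sketch of these cleaning steps, not the project's actual queries. It runs the SQL against an in-memory SQLite table as a local stand-in for BigQuery. The table name `trips`, the column names, the sample rows, and the outlier thresholds (rides shorter than 60 seconds or longer than 24 hours) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows; the columns are assumed, not taken from the project.
ROWS = [
    ("A1", "2022-01-01 08:00:00", "2022-01-01 08:15:00", "Clark St", "member"),
    ("A1", "2022-01-01 08:00:00", "2022-01-01 08:15:00", "Clark St", "member"),  # duplicate
    ("B2", "2022-01-01 09:00:00", "2022-01-01 09:30:00", None, "casual"),        # missing station
    ("C3", "2022-01-01 10:00:00", "2022-01-01 10:00:20", "State St", "casual"),  # 20-second ride
    ("D4", "2022-01-02 11:00:00", "2022-01-02 11:45:00", "Lake St", "casual"),
]


def clean_trips(rows, min_seconds=60, max_seconds=86400):
    """Return (report, cleaned_rows) for the given trip rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (ride_id TEXT, started_at TEXT, ended_at TEXT, "
        "start_station_name TEXT, member_casual TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", rows)

    # Step 1: count duplicate ride IDs and rows with a missing start station.
    duplicates = conn.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT ride_id) FROM trips"
    ).fetchone()[0]
    missing = conn.execute(
        "SELECT COUNT(*) FROM trips WHERE start_station_name IS NULL"
    ).fetchone()[0]

    # Step 2: keep distinct, complete rows whose duration is within the
    # assumed bounds (duration = end minus start, in seconds).
    cleaned = conn.execute(
        """
        SELECT DISTINCT ride_id, started_at, ended_at,
                        start_station_name, member_casual
        FROM trips
        WHERE ride_id IS NOT NULL
          AND start_station_name IS NOT NULL
          AND CAST(strftime('%s', ended_at) AS INTEGER)
            - CAST(strftime('%s', started_at) AS INTEGER) BETWEEN ? AND ?
        ORDER BY ride_id
        """,
        (min_seconds, max_seconds),
    ).fetchall()
    conn.close()

    report = {"duplicate_ride_ids": duplicates, "missing_start_station": missing}
    return report, cleaned
```

In BigQuery itself the duration would more likely be computed with `TIMESTAMP_DIFF(ended_at, started_at, SECOND)`; the `strftime` arithmetic here is only the SQLite equivalent.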
- How can you verify that your data is clean and ready to analyze?
The queries remove duplicate, missing, irrelevant, and outlier rows and columns, and confirm that every column has the correct data type.
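One way to make this verification repeatable is to express each check as a query that counts offending rows, so that a clean table returns zero for every check. The sketch below assumes a `trips` table with `ride_id`, `started_at`, `ended_at`, and `member_casual` columns and again uses in-memory SQLite in place of BigQuery; the check names are hypothetical.

```python
import sqlite3

# Each check counts rows that violate one expectation; 0 means the check passes.
# Table and column names are assumptions for illustration.
CHECKS = {
    "duplicate_ride_ids": "SELECT COUNT(*) - COUNT(DISTINCT ride_id) FROM trips",
    "null_values": (
        "SELECT COUNT(*) FROM trips WHERE ride_id IS NULL OR started_at IS NULL "
        "OR ended_at IS NULL OR member_casual IS NULL"
    ),
    # SQLite's datetime() returns NULL when a value is not a valid timestamp.
    "bad_timestamps": (
        "SELECT COUNT(*) FROM trips "
        "WHERE datetime(started_at) IS NULL OR datetime(ended_at) IS NULL"
    ),
    "unknown_rider_type": (
        "SELECT COUNT(*) FROM trips WHERE member_casual NOT IN ('member', 'casual')"
    ),
}


def run_checks(rows):
    """Load rows into a temporary table and return {check_name: offending_row_count}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (ride_id TEXT, started_at TEXT, "
        "ended_at TEXT, member_casual TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", rows)
    results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
    conn.close()
    return results
```

The data can be treated as ready for analysis when `all(count == 0 for count in run_checks(rows).values())` holds. In BigQuery, the timestamp check could instead use `SAFE_CAST(started_at AS TIMESTAMP) IS NULL` if the column was loaded as a string.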
- Have you documented your cleaning process so you can review and share those results?
This notebook serves as the documentation for all processes of the case study.
Key Tasks:
[1] Check the data for errors.
[2] Choose your tools.
[3] Transform the data so you can work with it effectively.
[4] Document the cleaning process.
Deliverable: Documentation of any cleaning or manipulation of data can be found in Data Exploration in SQL Queries.