Course project files for the JHU Getting and Cleaning Data course

nnorris7/GettingAndCleaningData

Getting and Cleaning Data - Course Project

This README outlines all of the files contained in this repository. You will need to follow the instructions in this file and the CodeBook file in order to (re)create the tidy data set (file) required for the JHU Getting and Cleaning Data course project.

File Listing and Description

This repository contains 3 files at the root level:

  • GettingAndCleaningData-master/
    • CodeBook.md

      • A code book that describes the variables, the data, and any transformations or work performed to clean up the data
    • README.md

      • This file, which explains how all of the scripts work and are connected in this repository
    • run_analysis.R

      • The R script containing the code to produce the tidy data set, given the raw data files

It also contains 2 directories. The first directory, raw_data_files, contains the raw data files provided in the UCI HAR Dataset:

  • GettingAndCleaningData-master/raw_data_files/
    • activity_labels.txt

      • Links the activity code to the activity name (6 x 2)
    • features.txt

      • List of the variables collected in the test and training data sets (561 x 2)
    • subject_test.txt

      • Each row identifies the (test) subject who performed the activity for a window sample. Subject IDs range from 1 to 30 (2947 x 1)
    • subject_train.txt

      • Each row identifies the (train) subject who performed the activity for a window sample. Subject IDs range from 1 to 30 (7352 x 1)
    • X_test.txt

      • Raw test data set (2947 x 561)
    • X_train.txt

      • Raw training data set (7352 x 561)
    • y_test.txt

      • Activity code for each row of the test data set (2947 x 1)
    • y_train.txt

      • Activity code for each row of the train data set (7352 x 1)

The second directory, tidy_data_file, contains the tidy data set text file we were required to produce.

  • GettingAndCleaningData-master/tidy_data_file/
    • tidy_data.txt
      • The tidy data set (181 x 68)

How To Use

In order to use these files to recreate the tidy data set, follow the instructions below (which assume you know how to use GitHub):

  1. Click the GitHub link to this repository provided in the Evaluation area.
  2. Either download the zip version of the repository or clone it to your machine.
  3. Unzip the file, if required. This will recreate the folder structure listed above.
  4. In R, set your working directory to the "GettingAndCleaningData-master" folder. (NB: If you cloned the repository, the '-master' part of the folder name will not appear.)
  5. The run_analysis.R script requires the following packages to be loaded:
    • data.table
    • dplyr
  6. Sourcing the run_analysis.R file runs the script and (re)creates the tidy data set file. Note that the script overwrites the tidy data set file each time it runs. The script takes approximately 30 s to run, depending on your hardware (see Considerations below).
  7. The tidy data set is stored in a variable called "tidy", which you can explore in R. Alternatively, open the tidy_data.txt file in a text editor.
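
Steps 4 to 7 could be wrapped in a small R helper like the one below. This is only a sketch and is not part of the repository: the function name `recreate_tidy` and its `repo_dir` argument are hypothetical, and reading the output with `header = TRUE` assumes that tidy_data.txt was written with a header row.

```r
# Hypothetical helper (not in the repository): check prerequisites,
# source run_analysis.R, and read the resulting tidy data file back in.
# `repo_dir` is an assumed path to the unzipped or cloned repository.
recreate_tidy <- function(repo_dir = "GettingAndCleaningData-master") {
  script <- file.path(repo_dir, "run_analysis.R")
  if (!file.exists(script)) {
    stop("run_analysis.R not found in ", repo_dir)
  }

  # The script needs data.table and dplyr; fail early if either is missing.
  needed <- c("data.table", "dplyr")
  installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Please install: ", paste(needed[!installed], collapse = ", "))
  }

  # Run the script with the repository as the working directory,
  # then restore the previous working directory on exit.
  old_wd <- setwd(repo_dir)
  on.exit(setwd(old_wd))
  source("run_analysis.R")

  # Assumption: the output file has a header row.
  read.table(file.path("tidy_data_file", "tidy_data.txt"), header = TRUE)
}
```

Calling `dim(recreate_tidy())` gives a shape to compare with the dimensions listed above, bearing in mind that a header row, if present, is not counted as a data row.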

Considerations

The run_analysis.R script contains comments throughout that explain what each section of the script does. Here is a summary of those comments:

## This R script called run_analysis.R performs the following tasks:
## 1. Reads the raw data files and merges them into one data set,
## 2. Replaces the activity codes with the activity names,
## 3. Uses the "features.txt" file to appropriately label the columns/variables,
## 4. Extracts only the mean and std variables from the larger data set,
## 5. Groups the data by subject and activity and then calculates the mean for each
##    mean/std column/variable,
## 6. Writes out the "tidy" data to a text file.
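
To make the six steps concrete, here is a minimal sketch of the same pipeline on made-up toy data. It is not the code in run_analysis.R: it uses base R rather than data.table and dplyr so that it runs without extra packages, all object names are hypothetical, and the `mean()`/`std()` name pattern is an assumption about how the mean and std variables are selected.

```r
# Toy stand-ins for the raw files (made-up values, not the UCI HAR data).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(code = 1:2, activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
x_test  <- matrix(c(0.1, 0.3, 1, 3, 9, 9), nrow = 2)  # 2 rows x 3 features
x_train <- matrix(c(0.5, 0.7, 5, 7, 9, 9), nrow = 2)
y_test  <- c(1, 1)
y_train <- c(1, 2)
subject_test  <- c(1, 1)
subject_train <- c(2, 2)

# 1. Merge test and training data into one data set.
merged <- data.frame(subject  = c(subject_test, subject_train),
                     activity = c(y_test, y_train),
                     rbind(x_test, x_train))

# 2. Replace the activity codes with the activity names.
merged$activity <- activity_labels$activity[
  match(merged$activity, activity_labels$code)]

# 3. Label the measurement columns using the feature names.
names(merged)[-(1:2)] <- features

# 4. Keep only the mean and std variables (assumed name pattern).
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- merged[, c(TRUE, TRUE, keep)]

# 5. Average every remaining variable per subject and activity.
tidy_sketch <- aggregate(selected[, -(1:2)],
                         by = list(subject  = selected$subject,
                                   activity = selected$activity),
                         FUN = mean)

# 6. Write the result to a text file (a temp file here, to avoid
#    overwriting anything in the repository).
out_file <- tempfile(fileext = ".txt")
write.table(tidy_sketch, file = out_file, row.names = FALSE)
```

On the toy data this yields one row per subject/activity pair that actually occurs. On the real data, the same grouping over 30 subjects and 6 activities can yield at most 180 data rows.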

The script cleans up after itself in each block of code, so it should not need more than 60 MB of memory to run.

While most of the operations have been optimized for speed of execution, speed was not a specified consideration for this project. As stated above, this script takes approximately 30 s to run on a 2.6 GHz Intel Core i7 MacBook Pro with 8 GB of RAM.

The Fine Print

In accordance with the JHU Honor Code, I certify that my answers here are my own work, and that I have appropriately acknowledged all external sources (if any) that were used in this work.
