This repository contains the course project for the Getting and Cleaning Data course on coursera.org, which is part of the Data Science specialization.
The analysis script 'run_analysis.R' does the following:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
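
The five steps above can be sketched on a toy example. This is a minimal illustration in base R, not the code of 'run_analysis.R' itself (which relies on 'dplyr'). The helper `make_set`, the numeric values, and the cleaned-up column names are assumptions made for the example; only the feature and activity names follow the style of the UCI HAR files.

```r
# Toy stand-ins for the UCI HAR files; all numeric values are made up.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: builds one set (subject, activity, measurements).
make_set <- function(subject, activity, values) {
  x <- as.data.frame(matrix(values, ncol = length(features)))
  names(x) <- features
  data.frame(subject = subject, activity = activity, x, check.names = FALSE)
}

train <- make_set(c(1, 1, 2), c(1, 2, 1), c(0.1, 0.2, 0.3, 1, 2, 3, 9, 9, 9))
test  <- make_set(c(1, 2),    c(1, 1),    c(0.3, 0.5, 3, 5, 9, 9))

# Step 1: merge the training and test sets.
merged <- rbind(train, test)

# Step 2: keep only the mean() and std() measurements.
keep <- grep("mean\\(\\)|std\\(\\)", names(merged), value = TRUE)
merged <- merged[, c("subject", "activity", keep)]

# Step 3: replace activity ids with descriptive names.
merged$activity <- activity_labels$name[match(merged$activity, activity_labels$id)]

# Step 4: tidy up the variable names (one possible naming scheme).
names(merged) <- gsub("-", "_", gsub("\\(\\)", "", names(merged)))

# Step 5: average every variable per activity and subject.
tidy <- aggregate(. ~ activity + subject, data = merged, FUN = mean)
print(tidy)
```

Each row of `tidy` holds one activity/subject pair, and each remaining column holds the mean of one measurement for that pair.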

The repository includes the following files:
- 'CodeBook.md': a code book giving a brief overview of the data set, the files included, the variables, the work performed on the data set, and how 'run_analysis.R' obtains the tidy data set.
- 'run_analysis.R': the script which performs the analysis.
- 'action_subject_means.txt': the resulting tidy data set produced by 'run_analysis.R' script.
To reproduce the analysis, follow these steps:
- Clone the repository:

        git clone git@github.com:scoricov/getdata-012.git

- Make the directory "getdata-012" your current working directory.
- Download and extract the raw data into the current working directory. A directory called "UCI HAR Dataset" will appear there.
- Run the analysis:

        Rscript run_analysis.R

- The resulting tidy data set will be written to the text file 'action_subject_means.txt'.
- Load and view the tidy data set in R:

        x <- read.table("action_subject_means.txt")
        x

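
When loading the result, it may help to check whether the file starts with a header row. The sketch below uses a made-up stand-in file rather than the real 'action_subject_means.txt'. It assumes the file was written with column names and without row names, which this README does not confirm. If that holds, `header = TRUE` keeps the variable names instead of reading them as a data row.

```r
# Toy stand-in for 'action_subject_means.txt'; names and values are made up.
toy <- data.frame(activity = c("WALKING", "SITTING"),
                  subject = c(1L, 1L),
                  tBodyAcc_mean_X = c(0.2, 0.3))
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE)  # assumed output format

# header = TRUE treats the first line as column names.
x <- read.table(path, header = TRUE)
str(x)  # compact overview: one line per column with its type
```

If the real file has no header row, drop `header = TRUE` and the columns will be named V1, V2, and so on.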
## Pre-requisites
- Package 'dplyr' must be installed before running the script.
- This course project uses the "Human Activity Recognition Using Smartphones Dataset" which can be downloaded here.
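
Both prerequisites can be checked from R before running the script. The helper below, `check_prerequisites`, is hypothetical and not part of 'run_analysis.R'. It assumes the data was extracted into a directory named "UCI HAR Dataset" inside the current working directory, as described in the steps above.

```r
# Hypothetical helper: returns a character vector describing what is missing.
check_prerequisites <- function(pkg = "dplyr", data_dir = "UCI HAR Dataset") {
  problems <- character(0)
  # requireNamespace() reports whether the package can be loaded,
  # without attaching it to the search path.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    problems <- c(problems, sprintf("package '%s' is not installed", pkg))
  }
  if (!dir.exists(data_dir)) {
    problems <- c(problems,
                  sprintf("directory '%s' not found in %s", data_dir, getwd()))
  }
  problems
}

problems <- check_prerequisites()
if (length(problems) > 0) {
  message("Missing prerequisites:\n- ", paste(problems, collapse = "\n- "))
}
```

A missing package can then be installed with `install.packages("dplyr")`.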