GetDataProject

For Coursera getdata-012 class project

Assumptions:

run_analysis.R is called from the working directory containing the unzipped files from [the UCI HAR dataset](http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones). The working directory contains features.txt, activity_labels.txt and two sub-directories, test and train, where the data resides. Summarizing the data by subject and by activity requires the dplyr package to be installed; the script also uses data.table.
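
A quick way to confirm the working directory matches these assumptions is sketched below. The helper `missing_har_files` is hypothetical and not part of run_analysis.R; the y_ and subject_ file names are the ones shipped in the dataset's train and test folders.

```r
# Hypothetical helper (not part of run_analysis.R):
# return the expected dataset files that are missing under `dir`.
missing_har_files <- function(dir = ".") {
  expected <- c(
    "features.txt",
    "activity_labels.txt",
    file.path("train", c("X_train.txt", "y_train.txt", "subject_train.txt")),
    file.path("test", c("X_test.txt", "y_test.txt", "subject_test.txt"))
  )
  expected[!file.exists(file.path(dir, expected))]
}

# Demo against an empty temporary directory, where every file is missing.
empty_dir <- tempfile("har_check")
dir.create(empty_dir)
missing_files <- missing_har_files(empty_dir)
```

Calling `missing_har_files()` with no argument checks the current working directory; an empty result means the layout is as expected.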

Logical Flow of the Script

1. Define which columns to extract by parsing features.txt
2. Read the defined columns from train/X_train.txt into a data frame
3. Add the activity column to that data frame
4. Add the subject ID to the data frame
5. Repeat steps 2-4 on the test/ data
6. Concatenate the two data frames
7. Output a tidy data set with human-readable names
8. Write a summary data frame

Reading features.txt

Two nice overviews of string functions in R can be found at GastonSanchez and Wikibooks. The function getCols defines which columns to extract. It keeps only columns whose names contain mean() or std(). This includes names with intervening characters, such as meanFreq(), but not agglomeration columns like angle(Z,gravityMean).
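
The selection rule can be sketched as follows. This is a minimal illustration, not the actual getCols code: the pattern `mean|std`, the name clean-up step, and the sample rows (written in the format of features.txt) are assumptions chosen to reproduce the behaviour described above.

```r
# Illustrative rows in the "index name" format of features.txt.
features_text <- "1 tBodyAcc-mean()-X
4 tBodyAcc-std()-X
10 tBodyAcc-max()-X
294 fBodyAcc-meanFreq()-X
559 angle(X,gravityMean)"

features <- read.table(
  text = features_text,
  col.names = c("index", "name"),
  stringsAsFactors = FALSE
)

# Case-sensitive match: keeps mean(), std() and meanFreq(),
# but not gravityMean (capital M) or max().
keep_cols <- features[grepl("mean|std", features$name), ]

# Assumed clean-up: drop "-", "(" and ")" to get names such as tBodyAccmeanX.
keep_cols$name <- gsub("[-()]", "", keep_cols$name)
```

Because the match is case-sensitive, the angle(...) columns, which only contain "Mean" with a capital M, fall out without any special handling.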

Reading activity_labels.txt

The function getActivityNames reads this file into a data frame that maps each numeric activity code to its descriptive name.
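
A minimal sketch of such a lookup table is shown below. It reads the labels from an inline string rather than from activity_labels.txt so that it runs anywhere; the column names `activity` and `activityName` follow the names used later in this README, but the exact code inside getActivityNames is an assumption.

```r
# Contents of activity_labels.txt, inlined so the example is self-contained.
labels_text <- "1 WALKING
2 WALKING_UPSTAIRS
3 WALKING_DOWNSTAIRS
4 SITTING
5 STANDING
6 LAYING"

# One row per activity: numeric code plus readable name.
activity_names <- read.table(
  text = labels_text,
  col.names = c("activity", "activityName"),
  stringsAsFactors = FALSE
)
```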

Reading The Data in sub directories

The function readData reads the data from the files passed in and creates a data frame. SubjectFile is the file identifying the subject who performed the activity. YFile holds the label for the activity being performed, in the original numeric format. XFile holds the actual data values for all columns. keepCol is the data frame created by getCols() that specifies which columns to keep. The files XFile, YFile, and SubjectFile all need to have the relative path specified (e.g. ./test/X_test.txt). This function should be invoked twice, once for the test data and once for the training data. It returns the resulting data frame with human-readable column labels.
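
The shape of such a function might look like the sketch below. `read_data_sketch` is a hypothetical stand-in, not the readData from run_analysis.R: it assumes keepCol has `index` and `name` columns and subsets after reading, whereas the real script may read only the selected columns. The demo writes tiny temporary files instead of using the real data.

```r
# Hypothetical stand-in for readData(); argument names follow the README.
read_data_sketch <- function(XFile, YFile, SubjectFile, keepCol) {
  x <- read.table(XFile)[, keepCol$index, drop = FALSE]  # keep selected columns
  names(x) <- keepCol$name                               # readable labels
  activity <- read.table(YFile, col.names = "activity")
  subject <- read.table(SubjectFile, col.names = "subject")
  cbind(subject, activity, x)
}

# Demo with tiny temporary files (two observations, three measurements).
demo_dir <- tempfile("har_demo")
dir.create(demo_dir)
x_file <- file.path(demo_dir, "X_demo.txt")
y_file <- file.path(demo_dir, "y_demo.txt")
s_file <- file.path(demo_dir, "subject_demo.txt")
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), x_file)
writeLines(c("5", "2"), y_file)
writeLines(c("1", "1"), s_file)

keep <- data.frame(
  index = c(1, 3),
  name = c("tBodyAccmeanX", "tBodyAccstdX"),
  stringsAsFactors = FALSE
)
demo_df <- read_data_sketch(x_file, y_file, s_file, keep)
```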

Merging the two data frames

The function buildData brings all the functions together. First it calls getCols() to make a data frame, dataCols, which contains the index and name of the columns to keep. Then buildData calls the readData function twice to build two data frames: testdf for the test data and traindf for the training data. These two data frames are concatenated to make the mergeddf data frame. Finally, it calls getActivityNames and uses the result to add an activityName column to mergeddf, translating the integers in the activity column into a more human-readable form.
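
The concatenation and labelling steps can be illustrated with toy data. The values below are made up, the lookup table is shortened to three activities, and the use of `rbind` and `match` is an assumption about how buildData could do this, not a copy of its code.

```r
# Toy stand-ins for the two data frames returned by readData.
traindf <- data.frame(subject = c(1, 1), activity = c(1, 6),
                      tBodyAccmeanX = c(0.28, 0.29))
testdf <- data.frame(subject = c(2, 2), activity = c(5, 1),
                     tBodyAccmeanX = c(0.25, 0.27))

# Concatenate row-wise; both frames must have identical columns.
mergeddf <- rbind(traindf, testdf)

# Shortened lookup in the shape produced by getActivityNames.
activity_names <- data.frame(
  activity = c(1, 5, 6),
  activityName = c("WALKING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Translate integer codes to names without reordering rows.
idx <- match(mergeddf$activity, activity_names$activity)
mergeddf$activityName <- activity_names$activityName[idx]
```

Using `match` rather than `merge` keeps the rows in their original order, which matters if other columns are attached by position.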

Averaging and Tidying the Data by subject by activity

## The criteria for tidy data

Exactly one measured variable per column
Each observation is in a unique row
One table for every kind of variable
If multiple tables, should include column that allows them to be linked

So after the script runs, we have one measured variable per column. Taking the mean values gives one observation (row) per subject per activity. The data, which starts out spread across multiple files, ends up in a single table.

## SummarizeData

The summarizeData function takes as its argument the data frame constructed by buildData(). It uses group_by from the dplyr package to group the data by subject and activity, as requested. The function summarise_each then applies the mean function to each column between tBodyAccmeanX and fBodyGyroJerkMagmeanFreq.
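
The grouping-and-averaging step is illustrated below on toy data. To keep the example free of package dependencies it uses base R's `aggregate` as a stand-in for the dplyr calls named above, so it shows the same result shape rather than the script's actual code. The values are made up.

```r
# Toy merged data: two measurement columns, three subject/activity groups.
mergeddf <- data.frame(
  subject = c(1, 1, 1, 2),
  activityName = c("WALKING", "WALKING", "LAYING", "WALKING"),
  tBodyAccmeanX = c(0.2, 0.4, 0.1, 0.3),
  tBodyAccstdX = c(-0.9, -0.7, -0.5, -0.8),
  stringsAsFactors = FALSE
)

# Mean of every measurement column for each subject/activity pair.
summarydf <- aggregate(
  cbind(tBodyAccmeanX, tBodyAccstdX) ~ subject + activityName,
  data = mergeddf,
  FUN = mean
)
```

The result has one row per subject per activity, which is the tidy layout described above.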
