Repository for the Getting and Cleaning Data course on Coursera.

Assumptions: the working directory contains the unzipped data folder, which contains the UCI HAR Dataset folder with its train and test folders and all other data files.
#### Required:
- The Samsung data from UCI, with the data folder in the working directory
- The reshape2 package
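
Both requirements can be checked before running the analysis. The following is a minimal sketch, not part of the original script: `check_prerequisites` is a hypothetical helper, and the default path `data/UCI HAR Dataset` is an assumption about how the folders are laid out.

```r
# Hypothetical helper: reports which prerequisites are present.
# data_dir is an ASSUMED location of the "UCI HAR Dataset" folder.
check_prerequisites <- function(data_dir = file.path("data", "UCI HAR Dataset")) {
  c(
    dataset  = dir.exists(data_dir),
    train    = dir.exists(file.path(data_dir, "train")),
    test     = dir.exists(file.path(data_dir, "test")),
    # TRUE only if the reshape2 package is installed
    reshape2 = requireNamespace("reshape2", quietly = TRUE)
  )
}

print(check_prerequisites())
```

Any `FALSE` entry points at a missing folder or package.
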
#### Output
- "combined": a data frame containing all of the train and test data
- "mean_STD_comb": a data frame with only the columns whose names contain the string "std" or "mean(", plus the Activity and Subject columns
- "avg_df": a data frame with the average reading for each activity and subject pair
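
The column filter behind "mean_STD_comb" can be illustrated on a handful of names. This is a sketch under assumptions rather than the script's actual code: the vector `cols` is a toy sample in the style of the dataset's feature names, and literal matching with `fixed = TRUE` is one way to match "mean(" without treating the parenthesis as regular-expression syntax.

```r
# Toy column names in the style of the dataset's features (the real set is much larger).
cols <- c("Subject", "Activity",
          "tBodyAcc-mean()-X", "tBodyAcc-std()-X",
          "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# fixed = TRUE matches "mean(" as a literal string; as a regular
# expression the unescaped "(" would be a syntax error.
is_mean <- grepl("mean(", cols, fixed = TRUE)
is_std  <- grepl("std", cols, fixed = TRUE)
is_id   <- cols %in% c("Activity", "Subject")

keep <- cols[is_mean | is_std | is_id]
print(keep)
```

Matching "mean(" rather than "mean" leaves out names such as `fBodyAcc-meanFreq()-X`, which contain "mean" but not "mean(".
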
#### Procedure:
- Load the test and train files into six separate data frames
- Load the titles of the columns into another data frame
- Load the vector consisting of the labels
- For each frame in test and train, set the column names
- Combine the frames into two sets, train and test
- Combine the train and test frames to get the full data frame, named combined
- Remove the data frames that are no longer needed, essentially all but combined and labels
- Make a new data frame that consists only of the columns whose names contain "mean(", "std", "Activity", or "Subject"
- Relabel the Activity values to match the names given in labels
- Melt the data by Activity and Subject
- Rename the columns to lower case with no special characters
- Dcast the data frame so that each element is a mean, each column title is an (activity, subject) pair and each row title is a measure
- Change the activity numbers to activity names
- Write outputs to files
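
The reshaping steps above can be sketched on a toy data frame. This is an illustration under assumptions, not the repository's script: the values in `toy` are made up, `labels` is assumed to be a character vector indexed by activity number, the step order is simplified for brevity, and the output path is a temporary file rather than the real output name.

```r
library(reshape2)

# Toy stand-in for the filtered data frame; the values are invented.
toy <- data.frame(
  Subject  = c(1, 1, 1, 2, 2, 2),
  Activity = c(1, 1, 2, 1, 2, 2),
  "tBodyAcc-mean()-X" = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7),
  "tBodyAcc-std()-X"  = c(-0.9, -0.7, -0.8, -0.6, -0.4, -0.2),
  check.names = FALSE
)
labels <- c("WALKING", "WALKING_UPSTAIRS")  # ASSUMED: position = activity number

# Rename the columns: lower case, no special characters.
names(toy) <- tolower(gsub("[^A-Za-z0-9]", "", names(toy)))

# Replace the activity numbers with activity names.
toy$activity <- labels[toy$activity]

# Long format: one row per (activity, subject, measure) reading.
long <- melt(toy, id.vars = c("activity", "subject"),
             variable.name = "measure", value.name = "value")

# Wide format: measures as rows, (activity, subject) pairs as columns,
# and the mean of the readings in each cell.
avg <- dcast(long, measure ~ activity + subject,
             fun.aggregate = mean, value.var = "value")
print(avg)

# Write the result to a file (temporary path used here for illustration).
out_file <- tempfile(fileext = ".txt")
write.table(avg, out_file, row.names = FALSE)
```

In this sketch `dcast` joins the activity and subject values with an underscore, so the columns are named in the form `WALKING_1`.
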