
This repository contains the R code for completing the assignments of the online machine learning course on Coursera taught by Professor Andrew Ng.


fredvanderzeeuw/MachineLearningMOOC

 
 


Introduction

This is an R version of all the assignments for the online machine learning course (MOOC) offered by Stanford University and taught by Andrew Ng. The course materials, including lectures and presentation PDFs, can be downloaded all at once from the Coursera website. The course wiki page is accessible too.

You can solve the assignments, pass the quizzes and earn a certificate from Coursera.

This repository provides the starter code and the infrastructure for carrying out the assignments in R. The completed assignments are available too; however, publishing the solutions is against the course rules and I will remove them in the future. If you have any questions regarding the assignments, feel free to ask them here using GitHub issues.

To solve the assignments, simply fill in the parts of the code marked "YOUR CODE HERE". The assignment instructions in PDF format are also included in this repository. Each exercise folder contains an RStudio project file, which is a good starting point for coding. Once completed, the assignments can be submitted directly from R.

The "Solutions" folder has the solutions to the exercises. Note that the .Rda or .txt data files are not included in this folder.

Dependencies (3rd party packages)

I have tried my best not to use 3rd party packages in the starter code. However, to produce results and plots similar to those of Octave/Matlab, I had to use a few packages.

The rgl package is used to produce the 3D scatter plots and surface plots in the exercises.

There are many optimization tasks within the assignments. Most of them are not large-scale optimization problems and are solved using R's built-in optim function. However, to solve the optimization problems in exercise 4 and exercise 8, I have used a slightly modified version of the lbfgsb3 package. You should first install the package and then source lbfgsb3_.R; the starter code does the sourcing automatically. The fmincg and fminunc optimization functions in Octave/Matlab take a single function as input that computes the cost and the gradient simultaneously. In R, however, the cost and gradient functions must be supplied to optim or lbfgsb3 separately, so I have separated them in the starter code.
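As a minimal sketch of what supplying the cost and gradient separately looks like with the built-in optim function, the example below fits a small linear regression. The data are synthetic and the names `cost` and `gradient` are illustrative only; they are not the function names used in the starter code.

```r
# Synthetic data for illustration only: y = 1 + 2 * x, no noise.
x <- seq(0, 1, length.out = 20)
X <- cbind(1, x)   # design matrix with an intercept column
y <- 1 + 2 * x
m <- length(y)

# Cost: sum of squared residuals divided by 2m.
cost <- function(theta) {
  residuals <- X %*% theta - y
  sum(residuals^2) / (2 * m)
}

# Gradient of the cost with respect to theta, as a separate function.
gradient <- function(theta) {
  as.vector(t(X) %*% (X %*% theta - y)) / m
}

# optim takes the cost as `fn` and the gradient as `gr`.
fit <- optim(par = c(0, 0), fn = cost, gr = gradient, method = "BFGS")
fit$par   # estimated intercept and slope
```

Both functions take the parameter vector as their first argument, which is what optim expects of `fn` and `gr`.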

Stemmer software (portStemmer.m) is used in exercise 6 (spam classification), where the portStemmer function is called from the processEmail function. Instead of re-implementing portStemmer in R, I have used the SnowballC package, which produces the same results as portStemmer.m.
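For orientation, a small sketch of calling the SnowballC stemmer directly might look like the following. The word list is made up for illustration, and the guard simply skips the call when the package is not installed; how processEmail actually invokes the stemmer in the starter code may differ.

```r
# Illustrative words only; not taken from the exercise data.
words <- c("running", "emails", "classification")

if (requireNamespace("SnowballC", quietly = TRUE)) {
  # wordStem applies the Porter stemmer to each word in the vector.
  stems <- SnowballC::wordStem(words, language = "porter")
  print(stems)
} else {
  stems <- NULL
  message("SnowballC is not installed; skipping the stemming example.")
}
```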

The R.matlab package was used to read the Octave/Matlab .mat datasets, which were then converted to .Rda format. Thus you do not need this package to complete the assignments.
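Because the datasets ship as .Rda files, base R is enough to read them. The sketch below shows the save/load round trip with made-up objects and a temporary file; the object names `X` and `y` and the file path are assumptions for illustration, not the contents of any particular exercise file.

```r
# Hypothetical objects standing in for a converted dataset.
X <- matrix(1:6, nrow = 3)
y <- c(0, 1, 1)

# Write both objects to a temporary .Rda file (the path is illustrative).
rda_path <- tempfile(fileext = ".Rda")
save(X, y, file = rda_path)

# load() restores the objects under their original names; a fresh
# environment is used here so the originals are not overwritten.
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)
print(loaded_names)
```

Calling load() without the `envir` argument restores the objects directly into the calling environment.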

The raster package is used to produce the plot of the bird in exercise 7.

Last but not least is the Octave/Matlab pinv function. The ginv function in the MASS package doesn't produce exactly the same result as Octave/Matlab pinv. Therefore a slightly modified version of MASS ginv is included in the starter code. The MASS package does not need to be installed.
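To make the idea of a pseudoinverse concrete, here is a minimal sketch that computes one from the singular value decomposition using base R only. It is not the modified ginv shipped with the starter code; `pinv_sketch` is a hypothetical helper, and its default tolerance (largest dimension times machine epsilon, relative to the largest singular value) is an assumption chosen for illustration.

```r
# Hypothetical helper: Moore-Penrose pseudoinverse computed from the SVD.
pinv_sketch <- function(A, tol = max(dim(A)) * .Machine$double.eps) {
  s <- svd(A)
  # Keep only singular values that are meaningfully larger than zero.
  keep <- s$d > tol * max(s$d)
  if (!any(keep)) {
    return(matrix(0, nrow = ncol(A), ncol = nrow(A)))
  }
  # V * diag(1 / d) * t(U), restricted to the retained singular values.
  s$v[, keep, drop = FALSE] %*%
    (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
A_pinv <- pinv_sketch(A)
dim(A_pinv)   # a 3 x 2 input gives a 2 x 3 pseudoinverse
```

A quick sanity check for any pseudoinverse is that `A %*% A_pinv %*% A` reproduces `A` up to rounding error.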

To wrap up, before starting to code make sure the following 4 packages are installed: rgl, lbfgsb3, SnowballC and raster.

install.packages(c('rgl','lbfgsb3','SnowballC','raster'))
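If some of these packages may already be present, a small sketch like the one below reports only the ones that are missing before anything is installed. The variable names are illustrative, and the install call is left commented out so that running the snippet changes nothing.

```r
# The four packages named above.
required <- c("rgl", "lbfgsb3", "SnowballC", "raster")

# Compare against the packages currently installed in the library paths.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All four packages are already installed.")
}
```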

Submission

It is now possible to submit assignments directly from R, so R programmers can take the course too. I submitted all assignments to Coursera for testing and the scores were 100%.

Two more packages, httr for the POST() function and jsonlite for the toJSON() function, need to be installed before submission.

install.packages(c('jsonlite', 'httr'))

To submit, after completing each assignment, set the working directory to the root folder of the corresponding assignment, e.g. setwd('D:/MachineLearningMOOC/StarterCodes/mlclass-ex1'). R treats a backslash inside a string as an escape character, so write Windows paths with forward slashes or doubled backslashes. Then source submit.r in R and type submit() in the R console.

Try not to use my solutions; submit your own work instead, as doing otherwise is against the course rules. In the future I may remove the solutions from the repository so that all submissions are your own work.

Screen-shots

A few screen-shots of the plots produced in R:

Plots shown: Anomaly Detection; Gradient Descent Convergence; K-Means Clustering; K-Means Raster Compress; Learning Curves (http://faridcher.github.io/uploads/ml-course/Snapshots/Learning Curve.png); PCA Face Dataset; SVM RBF Kernel; Multiple Regression; PCA Pixel Dataset Centroids

Topics covered in the course and assignments

  1. Linear regression, cost function and normalization
  2. Gradient descent and advanced optimization
  3. Multiple linear regression and normal equation
  4. Logistic regression, decision boundary and multi-class classification
  5. Over-fitting and Regularization
  6. Neural Network non-linear classification
  7. Model validation, diagnosis and learning curves
  8. System design, prioritizing and error analysis
  9. Support vector machine (SVM), large margin classification and SVM kernels (linear and Gaussian)
  10. K-Means clustering
  11. Principal component analysis (PCA)
  12. Anomaly detection vs. supervised learning
  13. Recommender systems, Collaborative filtering
  14. Large scale machine learning, stochastic and mini-batch gradient descent, online learning, map reduce
