Hurrah, we've finished! Now it's time to review everything.
So far we have covered:
- Distance metrics
  - cosine similarity
  - Euclidean distance (L2 norm)
  - Manhattan distance (L1 norm)
  - Chebyshev distance
  - Jaccard distance
- collinearity and multicollinearity
- support vector machines (SVM)
- principal component analysis (PCA)
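
As a quick refresher on the distance metrics listed above, here is a minimal sketch in plain Python. The function names and the sample vectors `u` and `v` are made up for illustration; in practice you would more likely reach for a numerical library.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    # L2 norm of the difference: straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 norm of the difference: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute difference along any single coordinate.
    return max(abs(x - y) for x, y in zip(a, b))

def jaccard_distance(s, t):
    # 1 minus (size of intersection / size of union), defined on sets.
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1 - len(s & t) / len(union)

# Hypothetical sample data, chosen only for illustration.
u = [1.0, 2.0, 3.0]
v = [4.0, 0.0, 3.0]

print(cosine_similarity(u, v))
print(euclidean(u, v))
print(manhattan(u, v))   # 5.0
print(chebyshev(u, v))   # 3.0
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Note that cosine similarity is a similarity rather than a distance (higher means closer), while the other four grow as the inputs move apart.
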
Still to cover:
- item-based vs. user-based recommenders: which similarity matrix do you use, and why?
- cost function, gradient descent
- lasso vs. ridge
- NMF
- LDA
- MCMC
- heteroscedasticity
- A/B testing
- scoring methods
- clustering
- anomaly detection
- social network analysis (SNA)
- Command line utilities for data cleaning
- Useful utilities for data visualization
- Tips on approaching coding questions
- graphs and network analysis, social networks