mlcourse.ai
Open Machine Learning Course
by OpenDataScience
Yury Kashnitskiy (@yorko)
Data Scientist @ KPN, Amsterdam
OpenDataScience. DataFest
OpenDataScience. Kaggle
mlcourse.ai. What we have for you
Syllabus
• 10 lectures
• Basic ML algorithms and their applications
• Assignments and in-class practice
• Competitions
• Individual projects
• Tutorials
More info here https://mlcourse.ai/roadmap
What makes it different
• Lots and lots of practice
• Theoretical understanding of applied techniques
• Delving into competitions
• Your own projects
• Really vibrant community!
Roadmap/logistics
• All communication in ODS Slack, #mlcourse_ai
• https://mlcourse.ai/roadmap
• 10 assignments – ~10 credits each
• Projects, competitions, tutorials – up to 40 credits each
• Current rating is here https://goo.gl/TGGr3b
• All materials are stored on GitHub at https://github.com/Yorko/mlcourse.ai and at https://mlcourse.ai
• Top-100 participants will be mentioned on a special Wiki page
Toolbox
• Python
• Jupyter notebooks
• GitHub
• Docker (optional)
• Other libraries such as Vowpal Wabbit & XGBoost
• Instructions https://mlcourse.ai/prerequisites
Lecture 1
• Data analysis with Pandas
• Practice on first steps after getting data (sketch below)
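A minimal sketch of such first steps with Pandas; the tiny in-memory dataframe and its columns are placeholders, not the course dataset:

    import pandas as pd

    # Reading data would normally start with something like
    #   df = pd.read_csv("path/to/your_data.csv")
    # here a small in-memory frame stands in for it
    df = pd.DataFrame({"age": [25, 32, 47, 51],
                       "city": ["Amsterdam", "Moscow", "Delft", "Kyiv"],
                       "churn": [0, 1, 0, 1]})

    print(df.shape)        # (rows, columns)
    print(df.head())       # first rows
    df.info()              # dtypes and non-null counts
    print(df.describe())   # summary statistics for numeric columns

    # typical first manipulations: filtering, sorting, grouping
    print(df[df["churn"] == 1])
    print(df.groupby("city")["age"].mean())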
Lecture 2
• Visual data analysis with Pandas and Seaborn
• Crucial plots for feature exploration
• Practice on "drawing" (sketch below)
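A rough idea of the kind of exploratory plots involved, assuming a recent Seaborn (>= 0.11) and a toy dataframe made up for the example:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Toy dataframe: one numeric feature and a binary target
    df = pd.DataFrame({"feature": [1.2, 3.4, 2.2, 5.1, 4.3, 2.9, 0.8, 4.9],
                       "target":  [0, 1, 0, 1, 1, 0, 0, 1]})

    sns.histplot(df["feature"])                    # distribution of a single feature
    plt.show()

    sns.boxplot(x="target", y="feature", data=df)  # feature split by class
    plt.show()

    sns.heatmap(df.corr(), annot=True)             # pairwise correlations
    plt.show()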
Lecture 3
• Foundations of Machine Learning
• Supervised learning
• Decision trees
• k Nearest Neighbours
• Practice: first steps with Scikit-learn (sketch below)
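A minimal Scikit-learn sketch of the two algorithms, fit on the toy Iris dataset rather than the course data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)  # toy dataset, not the course data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=17)

    # A shallow decision tree and a 5-nearest-neighbours classifier
    tree = DecisionTreeClassifier(max_depth=3, random_state=17).fit(X_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

    print("tree:", accuracy_score(y_test, tree.predict(X_test)))
    print("knn :", accuracy_score(y_test, knn.predict(X_test)))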
Lecture 4
• Linear classification models
• Regularization
• Cross-validation
• Practice on logistic regression for a "real-world" task (sketch below)
Lecture 5
• Ensembles, random forest
• Feature importance
• Practice on random forest and assessing feature importance (sketch below)
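A short illustration of training a random forest and reading off impurity-based feature importances (stand-in dataset, not the course task):

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    data = load_breast_cancer()  # stand-in dataset
    forest = RandomForestClassifier(n_estimators=300, random_state=17)
    forest.fit(data.data, data.target)

    # Impurity-based importances, one value per input feature
    importances = pd.Series(forest.feature_importances_, index=data.feature_names)
    print(importances.sort_values(ascending=False).head(10))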
Lecture 6
• Regression task
• Linear and non-linear regression models
• Practice on grasping the core ideas behind linear regression (sketch below)
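A toy example of fitting a linear model to synthetic data and recovering its coefficients; the quadratic feature expansion hints at how non-linear regression can be built from linear models:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.RandomState(17)
    X = rng.rand(200, 1) * 10                   # one synthetic feature
    y = 3.0 * X.ravel() + 2.0 + rng.randn(200)  # linear signal plus noise

    lin = LinearRegression().fit(X, y)
    print(lin.coef_, lin.intercept_)            # roughly 3 and 2

    # Non-linear regression via feature expansion, with L2-regularized weights
    X_poly = np.hstack([X, X ** 2])
    ridge = Ridge(alpha=1.0).fit(X_poly, y)
    print(ridge.coef_)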
Lecture 7
• Unsupervised Learning
• Principal Component Analysis
• Clustering
• Practice: clustering Samsung Galaxy S3 sensor data into types of human activity (sketch below)
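A sketch of the PCA-then-cluster pipeline on the toy digits dataset, standing in for the Samsung sensor data:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X, y = load_digits(return_X_y=True)  # stand-in for the sensor data

    # Project the 64-dimensional digits onto 2 principal components
    X_pca = PCA(n_components=2, random_state=17).fit_transform(X)

    # Cluster in the reduced space; 10 clusters for 10 digit classes
    labels = KMeans(n_clusters=10, random_state=17, n_init=10).fit_predict(X_pca)
    print(labels[:20])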
Lecture 8
• Stochastic Gradient Descent & online learning
• Learning with a couple of GB of data
• Vowpal Wabbit
• Extracting simple features from texts
• Practice: text classification (sketch below)
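A minimal out-of-core-style example using Scikit-learn's SGDClassifier with hashed text features; the texts and labels are made up, and Vowpal Wabbit itself is a separate command-line tool not shown here:

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    texts = ["free money now", "meeting at noon", "win a big prize", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]  # made-up spam (1) / ham (0) labels

    # Hashing keeps memory constant regardless of vocabulary size,
    # which is what makes learning on a couple of GB of text feasible
    vectorizer = HashingVectorizer(n_features=2 ** 20)
    clf = SGDClassifier(loss="log_loss", random_state=17)  # "log" in older scikit-learn

    # partial_fit consumes the data batch by batch (online / out-of-core learning)
    clf.partial_fit(vectorizer.transform(texts), labels, classes=[0, 1])
    print(clf.predict(vectorizer.transform(["win free money"])))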
Lecture 9
• Time series
• Classical and modern approaches
• Practice: ARIMA model, Facebook Prophet (sketch below)
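A small ARIMA example on a synthetic monthly series via statsmodels (>= 0.12); the data and the (1, 1, 1) order are illustrative, not a recommendation:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly series: linear trend plus noise, standing in for real data
    idx = pd.date_range("2015-01-01", periods=48, freq="MS")
    series = pd.Series(np.arange(48) + np.random.RandomState(17).randn(48) * 3.0,
                       index=idx)

    # ARIMA(1, 1, 1): one autoregressive term, first differencing, one moving-average term
    model = ARIMA(series, order=(1, 1, 1)).fit()
    print(model.forecast(steps=6))  # forecast the next six months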
Lecture 10
• Gradient boosting: a modern view
• Theoretical basis for gradient boosting
• Best implementations
• Practice: beating a baseline in a Kaggle Inclass competition (sketch below)
• Regularization?
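A sketch of a gradient boosting baseline with XGBoost, one of the popular implementations; the dataset, metric, and hyperparameters are placeholders, not the competition setup:

    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier  # one popular implementation

    X, y = load_breast_cancer(return_X_y=True)  # stand-in for the competition data
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.3, random_state=17)

    # learning_rate and max_depth are the usual first knobs to tune;
    # reg_lambda is XGBoost's L2 regularization on leaf weights
    model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=3,
                          reg_lambda=1.0, eval_metric="logloss")
    model.fit(X_train, y_train)
    print(roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1]))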
Assignments
• Full versions are announced during course sessions: https://mlcourse.ai/assignments
• Demo versions are found in the course repo: https://github.com/Yorko/mlcourse.ai
• And in the Kaggle Dataset "mlcourse.ai": https://www.kaggle.com/kashnitsky/mlcourse
Kaggle Inclass
• Alice - tracking visited websites to distinguish Alice from all the others
• Medium - predicting the number of claps for a story on Medium
More info here https://mlcourse.ai/roadmap
Individual projects
• Throughout the whole course
• Straightforward instructions
• Your own data or just Kaggle Datasets
• Peer review
• Very cool experience
More info here https://mlcourse.ai/roadmap
Project "Alice"
• A substitute for an individual project if you don't have cool ideas for one
• Clear instructions
• 6 weeks, 6 notebooks to complete
• In cooperation with Yandex and MIPT (specialization "Machine Learning and Data Analysis")
• Solutions are not shared
Tutorials
• Your own tutorials on pretty much any topic around ML & DS
• Peer-voted
• A nice way to grasp something yourself is to write a tutorial
More info here https://mlcourse.ai/roadmap
More info in Slack
#mlcourse_ai, pinned items
Good luck!
https://mlcourse.ai/news
