Machine Learning and Data Science

This repository contains some of my work in Machine Learning and Data Science. The notebooks are experiments, course exercises, Kaggle entries, hacks and code ideas from various sources. The work was done in Python using NumPy, pandas and scikit-learn. Some references are:

Udacity, Coursera, Kaggle, DataCamp, Sebastian Raschka, Jason Brownlee, Open Data Science

My Notebooks

Kaggle Data Science and Machine Learning

Donor Screening on Kaggle: Kaggle entry on Donors Screening
Advanced Regression Techniques: Exploratory Data Analysis and regression techniques to predict house prices.
Pandas Dataframe Optimisation: Reducing the memory footprint of a pandas DataFrame
Titanic Prediction: Using Transforms and Ensemble methods to predict Titanic survival
EDA on Titanic Dataset: Exploratory Data Analysis on the Titanic Kaggle Dataset
EDA and XGB Boost: Exploratory Data Analysis and Gradient Boosting on Kaggle Dataset entry
EDA on Kaggle Dataset: Playing around with seaborn and pandas on Kaggle ELO dataset
LightGBM on Kaggle Dataset: Feature engineering and prediction on Kaggle ELO dataset
Credit Card Fraud: Predicting credit card fraud from Kaggle
Fake News: Building a simple fake news classifier
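
As an illustration of the idea behind the Pandas Dataframe Optimisation notebook, here is a minimal sketch of one common way to shrink a DataFrame: downcasting numeric columns to the smallest dtype that holds their values. It is not taken from the notebook itself; the helper name `downcast_numeric` and the toy columns `count` and `ratio` are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast to smaller dtypes.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    # Integer columns: pick the smallest integer dtype that fits the values.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: float64 becomes float32 where the values allow it.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy data (made up for this example).
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "ratio": np.linspace(0.0, 1.0, 1000),
})

small = downcast_numeric(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"memory before: {before} bytes, after: {after} bytes")
```

Downcasting floats trades precision for memory, so it is worth checking that the reduced precision is acceptable before applying it to real features.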

Machine Learning and Data Science

Internet Advertisement Detection: Predicting internet advertising with Decision Trees
Affinity Analysis: Utilise Affinity analysis on the MovieLens 100k dataset.
Regression Models in Pipeline: Evaluating various machine learning algorithms with pipelines on a dataset
Enron Fraud Investigation: Investigate fraud with ML techniques
Reduced Ensemble Training Time: How Pandas dataframe optimisation can speed up training times for random forest.
SKLearn Model Comparison: Trying different models on a binary classification problem
Decision Trees: Predicting NBA winners with Decision Tree based methods.
KNN: K-Nearest Neighbours investigation of a dataset with scikit-learn
XGBoost: Using Gradient Boosting on Facebook Dataset
Bagging: Simple experiment with bagging regressor
Explaining Decision Trees: My explanation of Decision Trees
Latent Semantic Analysis: Applying Latent Semantic Analysis to hotel reviews
Fraud Detection: Supervised and Unsupervised fraud detection techniques with unbalanced datasets
Unsupervised Machine Learning: Exercises on clustering, PCA and NMF
Olympic Medals: Exploring medal winners using manipulation of pandas dataframes
Water Pumps: Multiclass classification of functional water pumps with XGBoost tuning

Feature Transformation

Correlation Feature Selection: Feature selection, pipelines and custom transformers
Feature Engineering part1: Feature engineering a dataset - cleaning and transforming
Feature Transformation with PCA: Investigating and applying PCA to ML pipeline
Facial Recognition: Using ML pipelines with PCA, LDA and Logistic Regression
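
Several notebooks in this section combine PCA with a classifier inside a scikit-learn pipeline. The sketch below shows the general shape of such a pipeline. It is not the code from the notebooks: the digits dataset, the choice of 20 components and the step names are assumptions picked to keep the example small and self-contained.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in image dataset, used here only as a stand-in.
X, y = load_digits(return_X_y=True)

# Scale, reduce dimensionality with PCA, then classify.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=20)),  # 20 components is an arbitrary choice
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validating the whole pipeline refits the scaler and PCA on each
# training fold, so no information leaks from the validation fold.
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.3f}")
```

Keeping the transformers inside the pipeline, rather than transforming the data up front, is what makes the cross-validation estimate trustworthy.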

Visualisation

Data Visualisation Part1: Plotting with matplotlib
Data Visualisation Part2: Plotting with Pandas
Visualising Flight Data 2015: Plotting flight data with pandas
Seaborn Visualisation: Plotting Diamond dataset with seaborn

Keras Deep Learning

Deep Learning Basics

Deep Learning Network Tutorial: Exploring MLP, CNN and RNNs
SciKit Learn: Using cross validation and grid search on a simple dataset
Multiclass Classification: Uses famous iris dataset
Binary Classification: Predict binary target data
Regression: Regression of House prices and tuning of network topology

Better Deep Learning

Network Capacity: Investigate impact of changing model capacity on a complex multiclass dataset
Batch Size and Gradient Descent: Investigating Batch, Stochastic and Minibatch Gradient Descent
Dropout: Investigate Dropout techniques and evaluate performance on Deep learning model
Learning Rates: Investigate Learning Rate techniques and evaluate performance on Deep learning model
Checkpoints: Use Keras API to checkpoint and save model weights
Training History: Use Keras API to display training and test history
Early Stopping: Use Keras API to employ early stopping on a dataset
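
The difference between batch, stochastic and minibatch gradient descent, explored in the Batch Size and Gradient Descent notebook, comes down to how many samples contribute to each weight update. The following is a plain-Python sketch on a toy linear model rather than Keras code from the notebook; the function name `minibatch_gd`, the learning rate and the noiseless data are all assumptions made for illustration.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on mean squared error.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between is minibatch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]
                grad_b += 2.0 * error
            # Average the gradient over the batch before stepping.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Toy noiseless data generated from y = 3x + 1 (made up for this example).
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]

for batch_size in (1, 5, len(xs)):
    w, b = minibatch_gd(xs, ys, batch_size)
    print(f"batch_size={batch_size:2d}  w={w:.4f}  b={b:.4f}")
```

On noiseless data all three settings recover the same line; on real data, smaller batches give noisier but more frequent updates, which is the trade-off the notebook investigates.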

NLP

Preparing Text Data: Preparing and applying scikit-learn and keras vectorizers to text data
Movie Reviews: Formatting IMDB movie review data ready for analysis
Sentiment Analysis: Sentiment analysis on the prepared movie reviews using Bag of Words model.
Model Validation: Model validation of the trained movie reviews
Language Model: Trying to predict text using LSTMs
IMDB Sentiment Analysis: Basic CNN and NLP on the IMDB dataset
Quora Kaggle Challenge: First Attempt at Kaggle Quora challenge using word embedding and CNN model

CNN

Keras Covnets: Using the Keras Framework tools to process images
Pre-Trained Covnet: Using the Keras Framework tools to process images using a pre-trained convnet
Whale Id: First attempt at Kaggle Whale Id challenge

LSTM

Prediction: Trying to predict airline passenger numbers with LSTMs

Audio

Audio Classification Part1: Looking at the audio classification of street sounds

Notebook Assignments

Various completed notebook assignments and projects from online courses using pandas and scikit-learn.

Coursera University of Washington

Gradient Descent: Assignment on Gradient Descent.
Polynomial Regression: Assignment on Polynomial Regression.
Ridge and Lasso Regression: Assignment on Ridge and Lasso Regression.
Lasso Regularisation: Assignment on Lasso.

Datacamp

Superbowl Halftime Shows: Analysis and visualisation with Seaborn
Machine Learning Topics: Analysis, visualisation, NLP and LDA
Disease Analysis: Analysis and visualisation of the work of a medical pioneer
Author Gender Distribution: Analysis and text processing
Traffic Mortality: Data cleaning, analysis, visualisation and prediction
Song Classification: PCA, classification and prediction
Credit Card Classification: Cleaning, Logistic Regression
Nobel Prize Analysis: Analysis and visualisation with Seaborn
Movie Similarity: NLP and Kmeans clustering
Exploring Cryptocurrencies: Exploring the bitcoin market
Predicting Bees: Deep learning on bee images
Bad Passwords: Cleaning data, NLP

mlcourse.ai

Analysing Cardio Data: Cleaning, visualisation and data analysis of health data
