This repository contains some of my work in the fields of Machine Learning and Data Science. The notebooks are experiments, course exercises, Kaggle entries, hacks and code ideas from various sources. The work was done in Python utilising NumPy, pandas, scikit-learn and Keras. Some references are: Udacity, Coursera, Kaggle, DataCamp, Sebastian Raschka, Jason Brownlee and Open Data Science.
- Donor Screening on Kaggle: Kaggle entry on donor screening
- Advanced Regression Techniques: Exploratory Data Analysis and regression techniques to predict house prices.
- Pandas Dataframe Optimisation: Reducing the memory footprint of a pandas DataFrame
- Titanic Prediction: Using Transforms and Ensemble methods to predict Titanic survival
- EDA on Titanic Dataset: Exploratory Data Analysis on the Titanic Kaggle Dataset
- EDA and XGBoost: Exploratory Data Analysis and Gradient Boosting on a Kaggle dataset entry
- EDA on Kaggle Dataset: Playing around with seaborn and pandas on Kaggle ELO dataset
- LightGBM on Kaggle Dataset: Feature engineering and prediction on the Kaggle ELO dataset
- Credit Card Fraud: Predicting credit card fraud on a Kaggle dataset
- Fake News: Building a simple fake news classifier
- Internet Advertisement Detection: Predicting internet advertising with Decision Trees
- Affinity Analysis: Applying affinity analysis to the MovieLens 100k dataset.
- Regression Models in Pipeline: Evaluating various machine learning algorithms with pipelines on a dataset
- Enron Fraud Investigation: Investigate fraud with ML techniques
- Reduced Ensemble Training Time: How pandas DataFrame optimisation can speed up random forest training times.
- SKLearn Model Comparison: Trying different models on a binary classification problem
- Decision Trees: Predicting NBA winners with Decision Tree based methods.
- KNN: K-Nearest Neighbours investigation of a dataset with scikit-learn
- XGBoost: Using Gradient Boosting on Facebook Dataset
- Bagging: Simple experiment with a bagging regressor
- Explaining Decision Trees: My explanation of Decision Trees
- Latent Semantic Analysis: Applying Latent Semantic Analysis to hotel reviews
- Fraud Detection: Supervised and Unsupervised fraud detection techniques with unbalanced datasets
- Unsupervised Machine Learning: Exercises on clustering, PCA and NMF
- Olympic Medals: Exploring medal winners using manipulation of pandas dataframes
- Water Pumps: Multiclass classification of functional water pumps with XGBoost tuning
- Correlation Feature Selection: Feature selection, pipelines and custom transformers
- Feature Engineering part1: Feature engineering a dataset: cleaning and transforming
- Feature Transformation with PCA: Investigating and applying PCA in an ML pipeline
- Facial Recognition: Using ML pipelines with PCA, LDA and Logistic Regression
- Data Visualisation Part1: Plotting with matplotlib
- Data Visualisation Part2: Plotting with Pandas
- Visualising Flight Data 2015: Plotting flight data with pandas
- Seaborn Visualisation: Plotting Diamond dataset with seaborn
- Deep Learning Network Tutorial: Exploring MLPs, CNNs and RNNs
- Scikit-learn: Using cross-validation and grid search on a simple dataset
- Multiclass Classification: Using the famous Iris dataset
- Binary Classification: Predicting a binary target
- Regression: Regression of house prices and tuning of network topology
- Network Capacity: Investigate impact of changing model capacity on a complex multiclass dataset
- Batch Size and Gradient Descent: Investigating batch, stochastic and mini-batch gradient descent
- Dropout: Investigating dropout techniques and evaluating their performance on a deep learning model
- Learning Rates: Investigating learning rate techniques and evaluating their performance on a deep learning model
- Checkpoints: Using the Keras API to checkpoint and save model weights
- Training History: Using the Keras API to display training and test history
- Early Stopping: Using the Keras API to employ early stopping on a dataset
- Preparing Text Data: Preparing and applying scikit-learn and keras vectorizers to text data
- Movie Reviews: Formatting IMDB movie review data ready for analysis
- Sentiment Analysis: Sentiment analysis on the prepared movie reviews using Bag of Words model.
- Model Validation: Validating the model trained on the movie reviews
- Language Model: Trying to predict text using LSTMs
- IMDB Sentiment Analysis: Basic CNN and NLP on the IMDB dataset
- Quora Kaggle Challenge: First attempt at the Kaggle Quora challenge using word embeddings and a CNN model
- Keras Convnets: Using the Keras framework tools to process images
- Pre-Trained Convnet: Using the Keras framework tools to process images with a pre-trained convnet
- Whale Id: First attempt at Kaggle Whale Id challenge
- Prediction: Trying to predict airline passenger numbers with LSTMs
- Audio Classification Part1: Looking at the audio classification of street sounds
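Both the Pandas Dataframe Optimisation and Reduced Ensemble Training Time notebooks rely on shrinking a DataFrame's memory footprint. The block below is a minimal sketch of one common approach, not the notebooks' own code: it assumes numeric downcasting via `pd.to_numeric(..., downcast=...)` and conversion of low-cardinality string columns to `category`. The helper names `optimise_dataframe` and `memory_mb`, the `max_unique_ratio` threshold and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd


def optimise_dataframe(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if pd.api.types.is_integer_dtype(dtype):
            # Smallest signed integer type that still holds the column's range.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(dtype):
            # float64 -> float32; note this can lose precision.
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_string_dtype(out[col]):
            # Repeated strings are stored once as categories plus small integer codes.
            if out[col].nunique() / max(len(out), 1) < max_unique_ratio:
                out[col] = out[col].astype("category")
    return out


def memory_mb(df: pd.DataFrame) -> float:
    # deep=True counts the actual string payloads, not just object pointers.
    return df.memory_usage(deep=True).sum() / 1024 ** 2


# Synthetic example data (assumed, not from any notebook).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(0, 100, size=100_000),           # int64
    "fare": rng.random(100_000) * 500,                    # float64
    "port": rng.choice(["C", "Q", "S"], size=100_000),    # strings
})
small = optimise_dataframe(df)
print(f"{memory_mb(df):.2f} MB -> {memory_mb(small):.2f} MB")
print(small.dtypes)
```

Smaller dtypes mean less data to copy and scan, which is the usual reason such optimisation can also reduce model training time; the float downcast is the one step worth checking against the precision your model needs.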
Various completed notebook assignments and projects from online courses utilising pandas and scikit-learn.
- Gradient Descent: Assignment on Gradient Descent.
- Polynomial Regression: Assignment on Polynomial Regression.
- Ridge and Lasso Regression: Assignment on Ridge and Lasso Regression.
- Lasso Regularisation: Assignment on Lasso.
- Superbowl Halftime Shows: Analysis and visualisation with Seaborn
- Machine Learning Topics: Analysis, visualisation, NLP and LDA
- Disease Analysis: Analysis and visualisation of the work of a medical pioneer
- Author Gender Distribution: Analysis and text processing
- Traffic Mortality: Data cleaning, analysis, visualisation and prediction
- Song Classification: PCA, classification and prediction
- Credit Card Classification: Cleaning, Logistic Regression
- Nobel Prize Analysis: Analysis and visualisation with Seaborn
- Movie Similarity: NLP and k-means clustering
- Exploring Cryptocurrencies: Exploring the bitcoin market
- Predicting Bees: Deep learning on bee images
- Bad Passwords: Cleaning data, NLP
- Analysing Cardio Data: Cleaning, visualisation and data analysis of health data
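The Gradient Descent assignment and the batch/stochastic/mini-batch notebook both revolve around the same update rule, w ← w − η∇L(w). As a minimal sketch (not the assignment's own code), the block below applies full-batch gradient descent to least-squares linear regression in NumPy; the function name `gradient_descent`, the learning rate, the iteration count and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np


def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for least-squares linear regression (hypothetical helper)."""
    X_b = np.c_[np.ones(len(X)), X]  # prepend a bias column of ones
    w = np.zeros(X_b.shape[1])
    n = len(y)
    for _ in range(n_iters):
        residuals = X_b @ w - y
        # Gradient of the mean squared error (1/n) * ||X_b w - y||^2.
        grad = (2.0 / n) * X_b.T @ residuals
        w -= lr * grad
    return w


# Synthetic data: y = 3 + 2x plus a little noise (assumed example).
rng = np.random.default_rng(42)
X = rng.random((200, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, 200)

w = gradient_descent(X, y)
print(w)  # should be close to the true intercept 3.0 and slope 2.0
```

Stochastic and mini-batch variants differ only in computing `grad` from one sample or a small random subset per step instead of the full dataset.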