Email Mining: Application of Machine Learning and NLP to email data

Machine learning projects written in Python.
Includes data cleaning and feature engineering files.
The 'ProcessedCSV' file contains a spam dataset that was hand-picked based on subject line content and sender email address. Features extracted from this dataset can be used to train an ML algorithm to detect spam.
The original Enron email corpus is available here: https://www.cs.cmu.edu/~enron/
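
As a rough illustration of what extracting features from the subject line and sender address could look like, here is a minimal sketch. The keyword list, the feature names, and the `extract_features` helper are all assumptions made for this example and are not taken from the project code.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
SPAM_KEYWORDS = ("free", "winner", "offer", "urgent")


def extract_features(subject: str, sender: str) -> dict:
    """Turn a subject line and sender address into simple numeric features."""
    subject_lower = subject.lower()
    # Everything after the last "@" is treated as the sender's domain.
    domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else ""
    letters = [c for c in subject if c.isalpha()]
    uppercase_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "keyword_hits": sum(kw in subject_lower for kw in SPAM_KEYWORDS),
        "exclamation_count": subject.count("!"),
        "uppercase_ratio": uppercase_ratio,
        "is_enron_sender": int(domain == "enron.com"),
        "digits_in_sender": len(re.findall(r"\d", sender)),
    }


# Made-up example message.
features = extract_features("FREE offer!!! Claim now", "promo123@deals.example")
print(features)
```

A list of such dictionaries can be turned into a feature table and passed to whichever classifier is being trained.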

The machine learning algorithms in the 'MLClassificationAlgorithms' notebook were used for spam detection. This code comes from a group project in my first machine learning course, and all three group members contributed to it. We obtained decent metrics when we tested the models. The notebook also contains a naive labelling function that searches for keywords and separates emails into different categories.
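
A keyword-based labelling function of the kind described above might look roughly like the sketch below. The categories, keywords, and the `naive_label` name are hypothetical and only meant to show the idea; the notebook's own function may differ.

```python
# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "meeting": ("meeting", "schedule", "agenda"),
    "finance": ("invoice", "payment", "budget"),
    "spam": ("free", "winner", "click here"),
}


def naive_label(text: str, default: str = "other") -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return default


# Made-up email bodies.
emails = [
    "Agenda for Monday's meeting attached",
    "Please approve the Q3 budget",
    "You are a WINNER, click here",
    "See you at lunch",
]
labels = [naive_label(body) for body in emails]
print(labels)
```

Because this uses plain substring matching, a keyword such as "free" would also match "freedom". Matching on whole tokens would be a stricter alternative.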

As part of this report, I also ran a network analysis on the most frequent email recipients. The graph and the code can be found in the 'EnronEmployeeNetworkGraph' file. A full report of our findings can be found here: https://fariakh973079136.files.wordpress.com/2021/01/finalml-project-report-1.pdf
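
The counting step behind a recipient network can be sketched with the standard library alone. The addresses and the `messages` list below are invented for illustration, and this is not necessarily how the notebook builds its graph.

```python
from collections import Counter

# Hypothetical (sender, recipients) pairs; real data would come from the parsed corpus.
messages = [
    ("alice@enron.com", ["bob@enron.com", "carol@enron.com"]),
    ("bob@enron.com", ["carol@enron.com"]),
    ("dave@enron.com", ["carol@enron.com", "bob@enron.com", "erin@enron.com"]),
    ("alice@enron.com", ["carol@enron.com"]),
]

recipient_counts = Counter()  # how many emails each address received
edge_weights = Counter()      # how many emails went from sender to recipient

for sender, recipients in messages:
    for recipient in recipients:
        recipient_counts[recipient] += 1
        edge_weights[(sender, recipient)] += 1

# Keep only the edges that point at the two most frequent recipients.
top_recipients = [address for address, _ in recipient_counts.most_common(2)]
top_set = set(top_recipients)
top_edges = {edge: weight for edge, weight in edge_weights.items() if edge[1] in top_set}

print(top_recipients)
print(top_edges)
```

The weighted edge dictionary can then be handed to a graph library for drawing.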

The file titled 'Email Thread Processing' contains all the preprocessing functions used to get the email files into a usable state for text analysis. However, the notebook doesn't seem to render, so you cannot see the functions and their outputs unless you download it. If you are only interested in the functions, please open the 'EmailProcessingCode' Python file.
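
To give a sense of what this kind of preprocessing involves, here is a minimal sketch that parses one raw message with Python's standard `email` module and keeps only the newest reply in a thread. The sample message, the `parse_email` helper, and the choice of "-----Original Message-----" as the thread separator are assumptions for this example, not the project's actual functions.

```python
from email import message_from_string

# A made-up message with plain-text headers followed by a quoted earlier message.
raw_email = """\
Message-ID: <123@example>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: alice@enron.com
To: bob@enron.com, carol@enron.com
Subject: Re: Q2 forecast

Looks good to me.

-----Original Message-----
From: bob@enron.com
Sent: Monday, May 14, 2001 9:00 AM
Subject: Q2 forecast

Please review the attached forecast.
"""


def parse_email(raw: str) -> dict:
    """Split a raw message into header fields and the newest reply text."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a plain string for non-multipart messages
    # Assumed separator: cut the thread at the first quoted-message marker.
    latest = body.split("-----Original Message-----", 1)[0].strip()
    recipients = [a.strip() for a in (msg["To"] or "").split(",") if a.strip()]
    return {
        "sender": msg["From"],
        "recipients": recipients,
        "subject": msg["Subject"],
        "body": latest,
    }


parsed = parse_email(raw_email)
print(parsed)
```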

The file labelled 'NLPCode' contains the code used to preprocess email bodies into structured text. It includes code for tokenization, lemmatization, POS tagging, and POS-tree parsing.

Files in the fariiaakh/EmailMining repository:
ProcessedEnronDatasets
DublinSIRIDataAnalysis.ipynb
Email Thread Processing.ipynb
EmailProcessingCode.py
EnronEmployeeNetworkGraph.ipynb
EnronKmeansClustering.ipynb
MLClassificationAlgorithms.ipynb
NLPCode.py
README.md
