This project is an email classifier that uses the k-nearest neighbors (k-NN) algorithm to label emails as spam or ham (legitimate mail). It uses the Parallel Java 2 library to run the analysis in parallel, which increases throughput. The dataset is the Enron email dataset, which contains about 0.5 million email records.
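
To make the k-NN idea concrete, here is a minimal sketch of how an email could be labelled by a majority vote among its nearest training examples. It is not the project's implementation: the class `KnnSketch`, its methods, the bag-of-words features, and the Euclidean distance are all assumptions chosen for illustration, and standard Java parallel streams stand in for the Parallel Java 2 library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the project's src directory.
class KnnSketch {

    // A labelled training email reduced to word counts.
    static final class Example {
        final Map<String, Integer> counts;
        final String label; // "spam" or "ham"

        Example(String text, String label) {
            this.counts = wordCounts(text);
            this.label = label;
        }
    }

    // Lower-case the text, split on anything that is not a-z, and count each word.
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean distance between two sparse word-count vectors.
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> words = new HashSet<>(a.keySet());
        words.addAll(b.keySet());
        double sum = 0.0;
        for (String word : words) {
            double diff = a.getOrDefault(word, 0) - b.getOrDefault(word, 0);
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Label an email by majority vote among its k nearest training examples.
    // Ties are resolved as "ham" (an assumption made for this sketch).
    static String classify(List<Example> training, String email, int k) {
        Map<String, Integer> query = wordCounts(email);

        // Each distance is independent of the others, so this step parallelises well.
        double[] dist = training.parallelStream()
                .mapToDouble(e -> distance(query, e.counts))
                .toArray();

        // Sort training indices by ascending distance to the query.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> dist[i]));

        int neighbours = Math.min(k, order.size());
        int spamVotes = 0;
        for (int n = 0; n < neighbours; n++) {
            if (training.get(order.get(n)).label.equals("spam")) {
                spamVotes++;
            }
        }
        return spamVotes * 2 > neighbours ? "spam" : "ham";
    }
}
```

The distance computation is the natural place to parallelise, because every training example can be compared with the query independently; the sort and the vote are cheap by comparison.
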
- src: Contains the source files.
- sampleFiles: Contains sample files that can be used to run the program.
- dataFiles: Contains data files used to train and test the system.
- scripts: Contains Python scripts used to clean the data files.
- Report.pdf: The project report, covering results, conclusions, and instructions for installing and running the system.
A detailed installation and usage guide can be found in Report.pdf.
- Cliffton Fernandes (https://github.com/cliffton)
- Nikhil Keswaney (https://github.com/nikhilkeswaney)
