Using Python to build a classifier with different algorithms (KNN, NB, SVM, Neural Networks)

alifars/training-classifier-in-python

training-classifier-in-python

Background

This project uses Python to build a spam filter with Naive Bayes, Support Vector Machine (SVM), and Neural Network classifiers, trained on the Enron Email Corpus (ham: 16454, spam: 17171) (http://www.cs.cmu.edu/~enron/). Useful background information can be found at http://www.aueb.gr/users/ion/docs/ceas2006_paper.pdf.

Tools

scikit-learn http://scikit-learn.org/stable/

NLTK http://nltk.org/

BeautifulSoup http://www.crummy.com/software/BeautifulSoup/

SciPy http://www.scipy.org/

NumPy http://www.numpy.org/

Project Details

  1. Loading the Enron email corpus into memory

  2. Tokenizing files into words and storing them in lists

  3. Feature extraction

  4. Feature selection based on the words from corpus

  5. Training classifiers with the Naive Bayes, SVM and AdaBoost algorithms

  6. Evaluating the classifier

  7. Using the AdaBoost method to improve the accuracy

  8. Checking results and improving the speed using NumPy & SciPy
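Steps 2 to 6 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from this repository: it uses a from-scratch multinomial Naive Bayes with add-one smoothing instead of scikit-learn, and the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) and the four toy messages are hypothetical stand-ins for the Enron data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Collect per-class token counts, per-class document counts and the vocabulary."""
    word_counts = {}              # label -> Counter of token frequencies
    doc_counts = Counter(labels)  # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        # Log prior estimated from the class frequencies in the training set.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy data standing in for the corpus.
train_docs = [
    "win cash prize now",
    "cheap pills win big",
    "meeting schedule for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(train_docs, train_labels)

test_docs = ["win a cheap prize", "monday report meeting"]
test_labels = ["spam", "ham"]
print(predict(model, test_docs[0]))
print(accuracy(model, test_docs, test_labels))
```

On the real corpus the same structure would apply, with the toy lists replaced by the loaded emails and a held-out split used for the evaluation step.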

1). more datasets 2). reduce dimensionality 3). add prior probability 4). optimize the program
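Items 2) and 3) could be approached as in the sketch below. It is only one possible approach and an assumption on my part, not something this project is known to implement: dimensionality is reduced by keeping the k tokens with the highest chi-square score against the spam/ham label, and class priors are computed from the document counts quoted in the Background section. The helper names (`chi_square`, `select_top_k`, `class_priors`) and the toy token lists are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table of token presence vs. class.

    n11: spam docs containing the token, n10: ham docs containing it,
    n01: spam docs without it,           n00: ham docs without it.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # token appears in every document or in none
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(token_docs, labels, k):
    """Return the k tokens most associated with either class."""
    n_spam = sum(1 for y in labels if y == "spam")
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()  # document frequencies per class
    for tokens, label in zip(token_docs, labels):
        (spam_df if label == "spam" else ham_df).update(set(tokens))
    scores = {}
    for token in set(spam_df) | set(ham_df):
        n11, n10 = spam_df[token], ham_df[token]
        scores[token] = chi_square(n11, n10, n_spam - n11, n_ham - n10)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def class_priors(counts):
    """Turn per-class document counts into prior probabilities."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical, already tokenized toy documents.
token_docs = [
    ["win", "cash", "the"],
    ["win", "prize", "the"],
    ["meeting", "monday", "the"],
    ["meeting", "report", "the"],
]
labels = ["spam", "spam", "ham", "ham"]
print(select_top_k(token_docs, labels, 2))  # ['meeting', 'win']

# Corpus sizes as stated in the Background section.
priors = class_priors({"ham": 16454, "spam": 17171})
print(priors)
```

A token such as "the", which occurs in every document, scores zero and is dropped first, which is the behavior one would want from a feature selection step.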

Books

  1. Web Data Mining

  2. Programming Collective Intelligence

  3. Machine Learning in Action

  4. SciPy and NumPy

  5. Building Machine Learning Systems with Python
