GitHub - bman2338/NLP2012: Too Cool for School

ngram.py - Uni/Bi-gram finder and sentence generator, perplexity calculator

	Run from the command line:
	
		python ngram.py -f <filename> [options]

	<filename> is the path to the corpus file you would like to parse


	Options:
		 
		 -s enables stemming
			default: disabled
			python ngram.py -f Dataset3/Train.txt -s
		
		 -f <filename> adds <filename> to the training set; may be given more than once
			python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt
			
		 -pp <filename> calculates the perplexity of the model, using <filename> as test data
			python ngram.py -f Dataset3/Train.txt -pp Dataset3/Test.txt

		 -w <word> sets <word> as the first word in the sentence generator
			default: random word from corpus
			python ngram.py -f Dataset3/Train.txt -w once

		 -l <length> sets the minimum sentence length to the number <length>
			default: 20	

		 -p <length> sets the passage length to the number <length>
			default: 100
			
		 -sm <method> selects the smoothing method, where <method> is one of:
			'n' (none), 'a' (add-one), 'i' (interpolated add-one), 'ig' (interpolated Good-Turing)
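
To illustrate what the -sm a and -pp options compute, here is a minimal sketch of a bigram model with add-one smoothing and its perplexity on a test sequence. It is not taken from ngram.py: the function names (train_bigrams, addone_prob, perplexity) and the toy corpus are assumptions made for illustration, and the script's own tokenization and handling of unseen words may differ.

```python
import math
from collections import Counter

def train_bigrams(tokens):
    """Count bigram contexts and bigrams, and collect the vocabulary."""
    contexts = Counter(tokens[:-1])             # how often each word is followed by something
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of (previous word, word) pairs
    vocab = set(tokens)
    return contexts, bigrams, vocab

def addone_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(test_tokens, contexts, bigrams, vocab):
    """Perplexity of the smoothed bigram model over the bigrams of test_tokens."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log(addone_prob(p, w, contexts, bigrams, vocab))
                  for p, w in pairs)
    return math.exp(-log_sum / len(pairs))

# Hypothetical toy corpus, standing in for Dataset3/Train.txt and Dataset3/Test.txt.
train = "once upon a time there was a cat".split()
test = "once upon a cat".split()
contexts, bigrams, vocab = train_bigrams(train)
print(perplexity(test, contexts, bigrams, vocab))
```

Lower perplexity means the model finds the test data less surprising. Without smoothing ('n'), any bigram unseen in training has probability zero, so the perplexity is undefined.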

	Example:
		
		python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt -w the -l 15 -p 5 -s
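
The sentence generator driven by -w and -l can be pictured as a random walk over bigram counts. The sketch below is an assumption about the general technique, not the code in ngram.py: the names build_successors and generate, the max_length cap, and the rule that a sentence ends at "." once the minimum length is reached are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_successors(tokens):
    """Map each word to a Counter of the words that follow it."""
    successors = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        successors[prev][word] += 1
    return successors

def generate(successors, first_word=None, min_length=20, max_length=100, rng=random):
    """Generate one sentence by sampling each next word in proportion to bigram counts."""
    # Like -w: use the given first word, otherwise pick a random word from the corpus.
    word = first_word if first_word is not None else rng.choice(list(successors))
    sentence = [word]
    while len(sentence) < max_length:
        options = successors.get(word)
        if not options:
            break  # the current word was never followed by anything in training
        words = list(options)
        weights = [options[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        sentence.append(word)
        # Like -l: only allow the sentence to end once it is long enough.
        if word == "." and len(sentence) >= min_length:
            break
    return sentence

# Hypothetical toy corpus; a real run would read the files passed with -f.
corpus = "the cat sat . the dog sat . the cat ran .".split()
successors = build_successors(corpus)
print(" ".join(generate(successors, first_word="the", min_length=5, max_length=30)))
```

Passing a seeded random.Random instance as rng makes the output reproducible.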
