bman2338/NLP2012

ngram.py - Unigram/bigram finder, sentence generator, and perplexity calculator

	Run from the command line:
	
		python ngram.py -f <filename> [options]

	<filename> is the path to the corpus file to parse.


	Options:
		 
		 -s enables stemming
			default: disabled
			python ngram.py -f Dataset3/Train.txt -s
		
		 -f <filename> adds <filename> to the training set; may be given more than once
			python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt
			
		 -pp <filename> calculates the perplexity of the model on <filename>, used as test data
			python ngram.py -f Dataset3/Train.txt -pp Dataset3/Test.txt

		 -w <word> sets <word> as the first word for the sentence generator
			default: random word from corpus
			python ngram.py -f Dataset3/Train.txt -w once

		 -l <length> sets the minimum sentence length to <length>
			default: 20

		 -p <length> sets the passage length to <length>
			default: 100
			
		 -sm <method> uses <method> for smoothing, where <method> is one of:
			'n'  (none)
			'a'  (add-one)
			'i'  (interpolated add-one)
			'ig' (interpolated Good-Turing)
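
	For readers unfamiliar with the terms used by -sm and -pp, the following is a minimal
	sketch of add-one smoothing and bigram perplexity. It is not taken from ngram.py: the
	function names, the whitespace tokenization, and the single extra vocabulary slot
	reserved for unseen words are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram_counts(tokens):
    """Count bigram contexts and bigrams in a list of tokens.

    Hypothetical helper, not part of ngram.py.
    """
    contexts = Counter(tokens[:-1])             # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    # Assumption: one extra vocabulary slot stands in for any unseen word.
    vocab_size = len(set(tokens)) + 1
    return contexts, bigrams, vocab_size


def addone_prob(prev, word, contexts, bigrams, vocab_size):
    """Add-one (Laplace) estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def perplexity(test_tokens, contexts, bigrams, vocab_size):
    """exp of the negative mean log-probability over the test bigrams."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    if not pairs:
        raise ValueError("need at least two test tokens")
    log_sum = sum(
        math.log(addone_prob(prev, word, contexts, bigrams, vocab_size))
        for prev, word in pairs
    )
    return math.exp(-log_sum / len(pairs))


if __name__ == "__main__":
    train = "once upon a time once upon a hill".split()
    contexts, bigrams, vocab_size = train_bigram_counts(train)
    print(perplexity("once upon a time".split(), contexts, bigrams, vocab_size))
```

	Without smoothing ('n'), any test bigram absent from the training data would get
	probability zero and the logarithm above would be undefined, which is the usual
	reason to pick one of the smoothed methods when using -pp.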

	Example:
		
		python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt -w the -l 15 -p 5 -s
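
	The -w and -l options in the example above can be pictured with the sketch below, which
	generates one sentence by repeatedly sampling a successor word in proportion to bigram
	counts. This is an illustration under stated assumptions, not the code in ngram.py: the
	names build_successors and generate_sentence are hypothetical, the sketch stops as soon as
	the minimum length is reached, and it does not model the passage length set by -p.

```python
import random
from collections import Counter, defaultdict


def build_successors(tokens):
    """Map each word to a Counter of the words observed right after it."""
    successors = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        successors[prev][word] += 1
    return successors


def generate_sentence(successors, first_word=None, min_length=20, rng=None):
    """Sample words from bigram counts until min_length words are produced.

    Assumption: generation stops early if the current word has no successor.
    """
    rng = rng or random.Random()
    if first_word is None:
        # Mirrors the documented default: a random word from the corpus.
        first_word = rng.choice(sorted(successors))
    sentence = [first_word]
    word = first_word
    while len(sentence) < min_length:
        options = successors.get(word)
        if not options:
            break  # dead end: this word never had a successor in training
        words = list(options)
        weights = [options[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        sentence.append(word)
    return sentence


if __name__ == "__main__":
    corpus = "once upon a time once upon a hill".split()
    table = build_successors(corpus)
    print(" ".join(generate_sentence(table, first_word="once", min_length=6)))
```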

About

Too Cool for School
