bman2338/NLP2012
ngram.py - Uni/Bi-gram finder and sentence generator, perplexity calculator
Run from the command line with:
python ngram.py -f <filename> [options]
<filename> is the path to the corpus file you would like to parse
Options:
-s enables stemming
default: disabled
python ngram.py -f Dataset3/Train.txt -s
-f <filename> adds <filename> to the training set; repeat the flag to train on several files
python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt
-pp <filename> calculates the perplexity of the model given <filename> as test data
python ngram.py -f Dataset3/Train.txt -pp Dataset3/Test.txt
-w <word> sets <word> as the first word of the generated sentence
default: random word from corpus
python ngram.py -f Dataset3/Train.txt -w once
-l <length> sets the minimum sentence length to <length>
default: 20
-p <length> sets the passage length to <length>
default: 100
-sm <method> selects the smoothing method; <method> is one of 'n' (none), 'a' (add-one), 'i' (interpolated add-one), 'ig' (interpolated Good-Turing)
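To illustrate what the 'a' (add-one) smoothing option and the -pp perplexity option compute, here is a minimal sketch of add-one smoothed bigram probabilities and the resulting perplexity. It is not taken from ngram.py: the function names train_bigrams, addone_prob, and perplexity are hypothetical, and the sketch assumes the corpus has already been split into a list of tokens.

```python
import math
from collections import Counter


def train_bigrams(tokens):
    """Count unigrams and adjacent-token bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def addone_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) estimate of P(word | prev).

    Assumption: the vocabulary is the set of training tokens, so words
    unseen in training still get a small non-zero probability.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(test_tokens, unigrams, bigrams):
    """Bigram perplexity of test_tokens under the add-one model."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    if not pairs:
        raise ValueError("need at least two test tokens")
    # Sum log-probabilities to avoid underflow on long test sets.
    log_sum = sum(math.log(addone_prob(p, w, unigrams, bigrams)) for p, w in pairs)
    return math.exp(-log_sum / len(pairs))


if __name__ == "__main__":
    unigrams, bigrams = train_bigrams("a b a b".split())
    print(addone_prob("a", "b", unigrams, bigrams))
    print(perplexity("a b a".split(), unigrams, bigrams))
```

Without smoothing ('n'), any bigram absent from the training data has probability zero, which makes the perplexity of a test set containing it undefined. That is the usual motivation for offering the smoothed variants.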
Example:
python ngram.py -f Dataset3/Train.txt -f Dataset3/Test.txt -w the -l 15 -p 5 -s
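The sentence generator that -w and -l control can be pictured as a random walk over bigram counts. The sketch below is an assumption about the general technique, not the code in ngram.py: build_successors and generate are hypothetical names, and the length argument is treated as a simple target that stops early when a word has no recorded successor, rather than as a strict minimum.

```python
import random
from collections import Counter, defaultdict


def build_successors(tokens):
    """Map each word to a Counter of the words that follow it."""
    successors = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        successors[prev][word] += 1
    return successors


def generate(successors, first_word=None, length=20, rng=random):
    """Sample up to `length` words, each drawn in proportion to its bigram count."""
    if first_word is None:
        # No seed word given: pick a random word that has a successor.
        first_word = rng.choice(sorted(successors))
    sentence = [first_word]
    while len(sentence) < length:
        options = successors.get(sentence[-1])
        if not options:
            break  # dead end: the last word was never followed by anything
        words = list(options)
        weights = [options[w] for w in words]
        sentence.append(rng.choices(words, weights=weights, k=1)[0])
    return sentence


if __name__ == "__main__":
    table = build_successors("once upon a time there was a time".split())
    print(" ".join(generate(table, first_word="once", length=6)))
```

Passing a seeded random.Random instance as rng makes the output reproducible, which helps when comparing generated text across smoothing settings.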