This project demonstrates tokenization and stemming using NLTK's Porter and Lancaster stemmers, containerized with Docker.
Clone the repository:
git clone https://github.com/learningwithmainsh/Stemming.git
cd Stemming
Project structure:
.
├── Dockerfile
├── requirements.txt
├── stemmer.py
└── README.md
Ensure you have Docker installed. You can verify by running:
docker --version
Build the image:
docker build -t nlp-stemmer .
Run the container:
docker run --rm nlp-stemmer
To run the container in the background and check its logs afterwards:
docker run -d --name nlp-stemmer nlp-stemmer
docker logs nlp-stemmer
- Dockerfile: Contains instructions to build the Docker image.
- requirements.txt: Lists required Python packages.
- stemmer.py: Python script for tokenization and stemming.
- README.md: This documentation.
The script processes the following sample text:
text = "Running ran easily quickly."
Output:
Tokenized Words: ['Running', 'ran', 'easily', 'quickly', '.']
Porter Stemmed Words: ['run', 'ran', 'easili', 'quickli', '.']
Lancaster Stemmed Words: ['run', 'ran', 'easy', 'quick', '.']
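For readers who want to try the comparison outside Docker, here is a minimal sketch of the tokenize-then-stem flow. It is not the repository's stemmer.py: the helper name `stem_text` is hypothetical, and it assumes `nltk` is installed. It also uses `wordpunct_tokenize`, a regex-based tokenizer that needs no extra data download, which may differ from the tokenizer the actual script uses.

```python
from nltk.stem import LancasterStemmer, PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def stem_text(text):
    """Tokenize `text` and stem every token with both stemmers.

    Hypothetical helper for illustration; not taken from stemmer.py.
    """
    # Regex-based split into runs of word characters and runs of punctuation.
    tokens = wordpunct_tokenize(text)
    porter = PorterStemmer()
    lancaster = LancasterStemmer()
    return {
        "tokens": tokens,
        "porter": [porter.stem(token) for token in tokens],
        "lancaster": [lancaster.stem(token) for token in tokens],
    }


if __name__ == "__main__":
    result = stem_text("Running ran easily quickly.")
    print("Tokenized Words:", result["tokens"])
    print("Porter Stemmed Words:", result["porter"])
    print("Lancaster Stemmed Words:", result["lancaster"])
```

Both stemmers lowercase their input by default, which is why "Running" loses its capital letter. Porter tends to leave non-word stems such as "easili", while Lancaster strips suffixes more aggressively.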
To remove all stopped containers and dangling images:
docker system prune -f
Feel free to fork this repo and open a pull request with any improvements!
Manish Pandey
Created by Manish Pandey. Feel free to reach out for any queries or collaborations!
Happy Coding!