mpandey95/Stemming

NLP Stemming Pipeline

This project demonstrates tokenization and stemming using NLTK's Porter and Lancaster stemmers, containerized with Docker.

Installation

Clone the repository:

git clone https://github.com/learningwithmainsh/Stemming.git
cd Stemming

Project Structure

.
├── Dockerfile
├── requirements.txt
├── stemmer.py
└── README.md

Prerequisites

Ensure you have Docker installed. You can verify by running:

docker --version

Setup and Usage

1. Build the Docker image

docker build -t nlp-stemmer .

2. Run the Docker container

docker run --rm nlp-stemmer

3. Check container logs (optional)

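To read the script's output after the fact, start the container detached under a fixed name, then fetch its logs. Because this container is started without --rm, it remains on the host after it exits, so remove it with docker rm nlp-stemmer before reusing the name: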

docker run -d --name nlp-stemmer nlp-stemmer

docker logs nlp-stemmer

Files

  • Dockerfile: Contains instructions to build the Docker image.
  • requirements.txt: Lists required Python packages.
  • stemmer.py: Python script for tokenization and stemming.
  • README.md: This documentation.

NLTK Stemming Example

The script processes the following sample text:

text = "Running ran easily quickly."

Sample Output

Tokenized Words: ['Running', 'ran', 'easily', 'quickly', '.']

Porter Stemmed Words: ['run', 'ran', 'easili', 'quickli', '.']

Lancaster Stemmed Words: ['run', 'ran', 'easy', 'quick', '.']
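The contents of stemmer.py are not reproduced in this README, so the following is only a minimal sketch of how output like the above could be produced. It assumes NLTK is installed. The function name stem_text is hypothetical, and wordpunct_tokenize is used here as a stand-in tokenizer because it needs no extra NLTK data download; the actual script may use a different tokenizer.

```python
from nltk.stem import LancasterStemmer, PorterStemmer
from nltk.tokenize import wordpunct_tokenize


def stem_text(text):
    """Tokenize text and stem every token with both stemmers.

    Hypothetical helper for illustration; not taken from stemmer.py.
    """
    # Assumption: a regex-based tokenizer that splits words from punctuation.
    tokens = wordpunct_tokenize(text)

    porter = PorterStemmer()
    lancaster = LancasterStemmer()

    return {
        "tokens": tokens,
        "porter": [porter.stem(token) for token in tokens],
        "lancaster": [lancaster.stem(token) for token in tokens],
    }


if __name__ == "__main__":
    result = stem_text("Running ran easily quickly.")
    print("Tokenized Words:", result["tokens"])
    print("Porter Stemmed Words:", result["porter"])
    print("Lancaster Stemmed Words:", result["lancaster"])
```

Both stemmers lowercase their input by default, which is why 'Running' becomes 'run'. Porter is the more conservative of the two and leaves stems such as 'easili', while Lancaster strips suffixes more aggressively.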

Cleanup

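To remove all stopped containers, unused networks, dangling images, and unused build cache (this applies to the whole Docker host, not just this project):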

docker system prune -f

Contributing

Feel free to fork this repo and open a pull request with any improvements!

Author

Created by Manish Pandey. Feel free to reach out with questions or collaboration ideas!


Happy Coding! 🚀
