A Java-based poetry generation system that uses Markov chains and hash tables to create poems based on word frequency analysis. The pre-trained version was trained on the English poems at https://zenodo.org/records/10907309
This Poetry Generator analyzes text files to build statistical models of word relationships and generates original poetry. It uses a custom hash table implementation with quadratic probing for efficient word lookup and frequency tracking.
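As an illustration of the quadratic probing mentioned above, here is a minimal sketch of an open-addressing word counter. It is not the project's `HashTable.java`: the class name `ProbingWordCounter`, the starting capacity of 11, and the policy of growing once the table is more than half full are all assumptions made for this example.

```java
/** Hypothetical sketch, not the project's HashTable.java. */
final class ProbingWordCounter {
    private String[] keys = new String[11];   // assumed prime starting capacity
    private int[] counts = new int[11];
    private int size;                          // number of distinct words stored

    /** Records one occurrence of the given word. */
    void add(String word) {
        // Keep the table at most half full so quadratic probing on a
        // prime-sized table is guaranteed to reach an empty slot.
        if ((size + 1) * 2 > keys.length) {
            resize();
        }
        int slot = findSlot(word);
        if (keys[slot] == null) {
            keys[slot] = word;
            size++;
        }
        counts[slot]++;
    }

    /** Returns how many times the word was added, or 0 if never seen. */
    int count(String word) {
        int slot = findSlot(word);
        return keys[slot] == null ? 0 : counts[slot];
    }

    int size() {
        return size;
    }

    /** Probes home, home+1, home+4, home+9, ... until the word or an empty slot is found. */
    private int findSlot(String word) {
        int home = Math.floorMod(word.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (int) ((home + (long) i * i) % keys.length);
            if (keys[slot] == null || keys[slot].equals(word)) {
                return slot;
            }
        }
        throw new IllegalStateException("no free slot found");
    }

    /** Moves every entry into a table of roughly double the size (next prime). */
    private void resize() {
        String[] oldKeys = keys;
        int[] oldCounts = counts;
        keys = new String[nextPrime(oldKeys.length * 2)];
        counts = new int[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) {
                int slot = findSlot(oldKeys[i]);
                keys[slot] = oldKeys[i];
                counts[slot] = oldCounts[i];
            }
        }
    }

    private static int nextPrime(int n) {
        for (int candidate = n | 1; ; candidate += 2) {
            if (isPrime(candidate)) {
                return candidate;
            }
        }
    }

    private static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ProbingWordCounter counter = new ProbingWordCounter();
        for (String w : "the rose is red the violet is blue".split(" ")) {
            counter.add(w);
        }
        System.out.println("the -> " + counter.count("the")); // prints: the -> 2
    }
}
```

Keeping the load factor at or below one half matters here: on a prime-sized table the first half of the quadratic probe sequence visits distinct slots, so an insert cannot loop forever.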
The easiest way to get started is with the pre-trained model:
cd trained
java HashingPoetry
Enter a starting word and poem length when prompted.
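To show how a starting word and a poem length could drive generation, here is a minimal first-order Markov chain sketch. It makes several assumptions: it uses `java.util.HashMap` instead of the project's custom hash table, it splits text on whitespace only, and `MarkovSketch`, `train`, and `generate` are hypothetical names. The actual `WritePoetry.java` may work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of word-level Markov generation, not the project's code. */
final class MarkovSketch {
    // Each word maps to every word observed right after it. Duplicates are
    // kept on purpose so that frequent followers are picked more often.
    private final Map<String, List<String>> followers = new HashMap<>();

    /** Records every adjacent word pair in the text. */
    void train(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
        }
    }

    /** Walks the chain from the start word, producing at most `length` words. */
    List<String> generate(String start, int length, Random rng) {
        List<String> poem = new ArrayList<>();
        String current = start.toLowerCase();
        while (poem.size() < length) {
            poem.add(current);
            List<String> next = followers.get(current);
            if (next == null) {
                break; // the word was never followed by anything in training
            }
            current = next.get(rng.nextInt(next.size()));
        }
        return poem;
    }

    public static void main(String[] args) {
        MarkovSketch model = new MarkovSketch();
        model.train("the rose is red the violet is blue the sky is wide");
        System.out.println(String.join(" ", model.generate("the", 8, new Random())));
    }
}
```

Storing repeated followers in a list is the simplest way to sample in proportion to frequency. A table of per-word counts achieves the same distribution with less memory on a large corpus.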
To train the generator on your own text data:
- Place your text files in the untrained/data/ folder
- Use plain text files (.txt format) or JSON files (.json format)
- Each file should contain the text you want to train on (poems, stories, etc.)
- The cleaning script will combine and preprocess all .txt and .json files in the folder
- Run the cleaning script to preprocess the data:
cd untrained/data
python combine_and_clean_v2.py
- Compile the Java files:
cd ../
javac *.java
- Run the generator (this will train on your data):
java HashingPoetry
poetryGenerator/
├── trained/                         # Compiled Java classes
├── untrained/                       # Source code
│   ├── HashingPoetry.java           # Main application
│   ├── WritePoetry.java             # Poetry generation engine
│   ├── HashTable.java               # Custom hash table implementation
│   ├── WordFreqInfo.java            # Word frequency tracking
│   ├── ProgressBar.java             # Progress display utility
│   └── data/
│       └── combine_and_clean_v2.py  # Text preprocessing script
└── README.md