This project provides tools for analyzing and visualizing data from the On-Line Encyclopedia of Integer Sequences (OEIS) using Python and the Neo4j graph database.
The project generates various graph visualizations of the OEIS sequence relationships. Here's an example of the network graph showing connections between sequences:
Network visualization showing relationships between OEIS sequences in the Neo4j database
- Web scraping of OEIS sequence pages
- Entity extraction using OpenAI's API
- Graph database population using Neo4j
- Asynchronous data processing for improved performance
- Various analysis queries and visualizations
- Secure environment variable management
- Python 3.10+
- Neo4j Database
- OpenAI API key
- Required Python packages (see installation section)
- Clone the repository and navigate to the project directory
- Install required Python packages:
pip install beautifulsoup4 neo4j aiohttp openai requests pandas python-dotenv
- Set up environment variables:
  - Create a .env file in the project root
  - Add your credentials:
# Neo4j Database Configuration
NEO4J_PASSWORD=test1234
# OpenAI API Configuration
OPENAI_API_KEY=your-actual-openai-api-key-here
- Start Neo4j database:
sudo docker run -p7474:7474 -p7687:7687 -d --env NEO4J_AUTH=neo4j/test1234 neo4j:latest

- load_oeis_batch.py - Main script for batch processing multiple OEIS folders (A000-A376)
- load_oeis_single.py - Processes a single OEIS folder for testing
- load_oeis_recursive.py - Recursively processes all OEIS folders
- oeis_experiments.py - Experimental/draft code for OEIS processing
- oeis_entity_extractor.py - Extracts mathematical concepts, authors, and cross-references using OpenAI's GPT API
- oeis_entity_extraction.ipynb - Jupyter notebook for entity extraction analysis
- neo4j_performance_optimization.py - Asynchronous processing for better performance with batch operations
- queries.ipynb - Collection of useful Neo4j queries for data analysis
- text_link_title.py - Text processing utilities
- text_link_title_boilerplate.ipynb - Initial data exploration notebook
- links.py - Link processing utilities
python load_oeis_batch.py
python load_oeis_single.py
python load_oeis_recursive.py
python oeis_entity_extractor.py
python neo4j_performance_optimization.py

MATCH (n:Sequence)
WITH n, COUNT { (n)--() } as connectionCount
WHERE connectionCount >= 250
WITH COLLECT(n) as highlyConnectedNodes
UNWIND highlyConnectedNodes as n
MATCH (n)-[r]-(connected:Sequence)
WHERE connected IN highlyConnectedNodes
RETURN n, r, connected LIMIT 20;

MATCH (n)-[r:AUTHOR]->(m)
WITH n, COUNT(DISTINCT m) AS aCount
WHERE aCount > 2
MATCH (n)-[r:AUTHOR]->(m)
RETURN n, r, m;

MATCH (s1:Sequence)-[cr1:CROSSREFS]->(sm:Sequence)<-[cr2:CROSSREFS]-(s2:Sequence)
WITH s1, COUNT(DISTINCT sm) AS seqCount
WHERE seqCount > 60
MATCH (s1:Sequence)-[cr1:CROSSREFS]->(s2:Sequence)
RETURN s1, cr1, s2;

- Never commit your .env file; it contains sensitive credentials
- Regenerate your OpenAI API key if it was ever exposed in code
- Use environment variables for all sensitive configuration
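One way to follow the notes above is to read both credentials from the environment in a single place and fail early if either is missing. The sketch below is illustrative only: `Settings` and `load_settings` are hypothetical names, not part of this repository, and the variable names are taken from the .env example earlier in this README.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

# Assumption: in the project, python-dotenv's load_dotenv() would be called
# before this point so that entries from .env are present in os.environ.

REQUIRED_KEYS = ("NEO4J_PASSWORD", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two secrets; hidden from repr()."""
    neo4j_password: str = field(repr=False)
    openai_api_key: str = field(repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required credentials and raise if any are missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(env["NEO4J_PASSWORD"], env["OPENAI_API_KEY"])
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.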
- Extracts mathematical concepts, authors, and cross-references from OEIS pages
- Uses OpenAI's GPT API for intelligent entity recognition
- Processes sequences in batches for efficiency
- Implements asynchronous processing for better performance
- Handles batch operations for Neo4j database updates
- Includes progress tracking and error handling
- Batch Processing: Handles multiple specific folders efficiently
- Single Folder: Perfect for testing and development
- Recursive Processing: Processes all available OEIS data
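The batching and asynchronous processing described above can be sketched with the standard library alone. This is a minimal illustration, not the code in neo4j_performance_optimization.py: `chunked`, `process_all`, and the `worker` callback are hypothetical names, and the worker stands in for whatever coroutine writes one batch to Neo4j.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


async def process_all(
    items: Iterable,
    worker: Callable[[list], Awaitable],
    batch_size: int = 50,
    max_concurrency: int = 4,
):
    """Run `worker` on every batch, with at most `max_concurrency` in flight.

    Returns (results, errors): results of successful batches in order, and
    (batch, exception) pairs for batches whose worker raised.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch: list):
        async with semaphore:  # limits how many workers run at once
            return await worker(batch)

    batches = list(chunked(items, batch_size))
    # return_exceptions=True keeps one failing batch from cancelling the rest
    outcomes = await asyncio.gather(
        *(guarded(b) for b in batches), return_exceptions=True
    )
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [(b, o) for b, o in zip(batches, outcomes) if isinstance(o, BaseException)]
    return results, errors
```

A caller would run it with `asyncio.run(process_all(sequence_ids, write_batch))`, where `write_batch` is an assumed coroutine that persists one batch. Collecting errors per batch makes it possible to retry only the batches that failed.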
The project generates various types of network visualizations to analyze different aspects of OEIS sequence relationships:
Combined visualization showing merged network analysis of OEIS sequences with multiple relationship types and clustering patterns
Network visualization focusing on highly connected sequences (>80 connections), revealing the most central and influential sequences in the OEIS database
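The idea behind the highly connected view can be illustrated in plain Python on an in-memory edge list: count the connections of each node, keep the nodes above a threshold, and keep only the edges between them. This is a sketch of the filtering logic under that assumption, not the code that produced the visualization; `highly_connected_subgraph` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def highly_connected_subgraph(edges: Iterable[tuple[str, str]], threshold: int):
    """Return (hubs, kept_edges) for nodes with more than `threshold` connections.

    Degree is counted over the full edge list, ignoring direction; an edge is
    kept only if both of its endpoints are hubs.
    """
    edges = list(edges)
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    hubs = {node for node, count in degree.items() if count > threshold}
    kept_edges = [(a, b) for a, b in edges if a in hubs and b in hubs]
    return hubs, kept_edges
```

With `threshold=80` this mirrors the ">80 connections" criterion described above; on the real dataset the same filter is better expressed as a Cypher query so that counting happens inside Neo4j.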
Feel free to submit issues and enhancement requests.
This project is for educational and research purposes.


