An automated research paper generation and management system that leverages AI to create comprehensive survey papers and manage academic literature.
Auto-Research is a comprehensive toolkit for automating academic research workflows, featuring:
- Automated Survey Paper Generation: Generate high-quality survey papers on any topic using advanced AI models
- ArXiv Paper Synchronization: Download and manage papers from ArXiv with intelligent categorization
- Vector-Based Paper Search: Find relevant papers using semantic similarity search
- IEEE-Style LaTeX Formatting: Generate publication-ready papers in IEEE format
```
auto-research/
├── sync_paper/                      # ArXiv paper synchronization and database management
│   ├── src/                         # Core synchronization modules
│   └── test/                        # Test suites
├── write_paper/                     # Automated paper generation system
│   ├── src/                         # Paper generation pipeline
│   │   ├── models/                  # Data models
│   │   ├── nodes/                   # Processing nodes
│   │   └── providers/               # LLM providers (Ollama, OpenAI, etc.)
│   └── IEEE_Conference_Template/    # IEEE LaTeX templates
└── Cline/                           # MCP server implementations
    └── MCP/
        ├── Ollama-mcp/              # Ollama MCP server
        └── mcp-server-firecrawl/    # Firecrawl MCP server
```
The sync_paper module provides:
- Automated ArXiv Download: Fetches papers from the ArXiv dataset via the Kaggle API
- Smart Categorization: Filters papers by ML/AI categories (cs.AI, cs.CL, cs.CV, cs.LG, stat.ML)
- PostgreSQL Integration: Stores papers with pgvector for efficient similarity search
- Duplicate Detection: Prevents re-uploading existing papers
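Conceptually, duplicate detection only needs to check each incoming ArXiv ID against the IDs already stored before uploading. The sketch below illustrates the idea with plain Python; the function name, dict shape, and sample IDs are illustrative, not the actual sync_paper API:

```python
def filter_new_papers(incoming, existing_ids):
    """Return only the papers whose ArXiv ID is not already stored.

    incoming: list of dicts with an "id" key (illustrative shape).
    existing_ids: set of ArXiv IDs already present in the database.
    """
    return [paper for paper in incoming if paper["id"] not in existing_ids]

# IDs already in the database (made-up examples).
existing = {"2301.00001", "2301.00002"}

batch = [
    {"id": "2301.00002", "title": "Already stored paper"},
    {"id": "2305.12345", "title": "New paper"},
]

# Only the paper with the unseen ID survives the filter.
new_papers = filter_new_papers(batch, existing)
```

In the real system the same check could be done in SQL with a unique constraint on the ArXiv ID, but the filtering logic is the same.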
The write_paper module provides:
- Topic analysis and outline generation
- Vector similarity search for relevant papers
- Content synthesis using LLMs
- IEEE LaTeX formatting
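The vector similarity search ranks stored papers by the cosine distance between their embeddings and the query embedding; pgvector's `<=>` operator computes this same distance in the database. A toy Python sketch with hand-made 3-dimensional vectors (the real system stores 1536-dimensional embeddings, and the paper titles here are invented):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy 3-d embeddings standing in for 1536-d abstract embeddings.
papers = {
    "Survey of LLMs": [0.9, 0.1, 0.0],
    "Graph Networks": [0.1, 0.9, 0.2],
    "LLM Alignment":  [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # embedding of the survey topic

# Smallest distance first: the most topically similar papers lead the ranking.
ranked = sorted(papers, key=lambda title: cosine_distance(query, papers[title]))
```

In production the equivalent ranking is a single `ORDER BY embedding <=> query_vector LIMIT n` query, so the papers never leave PostgreSQL.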
The AutoSurvey pipeline (enabled with `--autosurvey`) runs four stages to produce high-quality surveys:
1. Initial Retrieval & Outline Generation: Creates a structured hierarchical outline
2. Subsection Drafting: Targeted retrieval and drafting for each section
3. Integration & Refinement: Refines and integrates sections cohesively
4. Rigorous Evaluation: Iterative improvement based on quality metrics
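The four stages can be sketched as a simple function pipeline threading a shared state dict. The stage functions below are placeholders that show the data flow, not the actual write_paper code:

```python
def initial_retrieval_and_outline(topic):
    # Stage 1: retrieve broadly and build a hierarchical outline.
    return {"topic": topic, "outline": ["Introduction", "Methods", "Conclusion"]}

def draft_subsections(state):
    # Stage 2: targeted retrieval and drafting for each outline section.
    state["drafts"] = {s: f"Draft of {s}" for s in state["outline"]}
    return state

def integrate_and_refine(state):
    # Stage 3: merge the section drafts into one cohesive document.
    state["paper"] = "\n\n".join(state["drafts"][s] for s in state["outline"])
    return state

def evaluate(state):
    # Stage 4: score the draft; a real pipeline would loop back to
    # earlier stages until the quality metrics converge.
    state["score"] = min(1.0, len(state["paper"]) / 100)
    return state

state = initial_retrieval_and_outline("Multimodal Learning")
for stage in (draft_subsections, integrate_and_refine, evaluate):
    state = stage(state)
```

The key design point is that each stage only reads and extends the shared state, so stages can be rerun independently during the iterative-improvement loop.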
- Ollama MCP: Integration with Ollama for local LLM inference
- Firecrawl MCP: Web scraping and content extraction capabilities
- Python 3.8+ (3.12+ recommended)
- Docker and Docker Compose
- PostgreSQL with pgvector extension
- Ollama (for local LLM inference)
- Node.js (for MCP servers)
- Clone the repository

```bash
git clone https://github.com/keyuchen21/auto-research.git
cd auto-research
```

- Set up PostgreSQL with pgvector

```bash
docker compose up -d
```

- Install Python dependencies
For paper synchronization:

```bash
cd sync_paper
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
```

For paper generation:

```bash
cd write_paper
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

- Configure Kaggle API (for ArXiv sync)
  - Get an API token from https://www.kaggle.com/settings
  - Place `kaggle.json` in `~/.kaggle/`
  - Set permissions: `chmod 600 ~/.kaggle/kaggle.json`
- Start Ollama (for paper generation)

```bash
ollama run llama2  # or your preferred model
```

To synchronize papers from ArXiv:

```bash
cd sync_paper
uv run src/upload_paper.py
```

Standard pipeline:
```bash
cd write_paper
python -m src.main --topic "Large Language Models" \
    --output output_directory \
    --model llama2 \
    --reference-num 1500
```

AutoSurvey pipeline (recommended):

```bash
python -m src.main --topic "Multimodal Learning" \
    --output output_directory \
    --model llama2 \
    --reference-num 1500 \
    --autosurvey
```

Command-line options:

- `--topic`: Research topic for the survey (required)
- `--output`: Output directory (default: "output")
- `--model`: Ollama model to use (default: "llama2")
- `--reference-num`: Number of papers to consider (default: 1500)
- `--autosurvey`: Enable the advanced AutoSurvey pipeline
The system uses PostgreSQL with pgvector for efficient similarity search:
```sql
CREATE TABLE papers (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT[],
    abstract TEXT,
    categories TEXT[],
    url TEXT,
    published_date DATE,
    embedding VECTOR(1536)  -- For similarity search
);
```

Create a `.env` file in the project root:
```
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=research_db
DB_USER=postgres
DB_PASSWORD=postgres

# Ollama
OLLAMA_HOST=http://localhost:11434

# OpenAI (optional)
OPENAI_API_KEY=your_api_key_here
```

Generated papers include:
- LaTeX Source: IEEE-formatted `.tex` file
- Structured Content:
  - Title and abstract
  - Introduction with background
  - Methodology sections
  - Results and discussion
  - Conclusions
  - References in IEEE style
- Metadata: Generation parameters and statistics
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ArXiv for providing open access to research papers
- Ollama team for local LLM inference
- PostgreSQL and pgvector for efficient vector storage
- IEEE for LaTeX templates
For issues and questions:
- Open an issue on GitHub
- Check existing documentation in subdirectory READMEs