Auto-Research

An automated research paper generation and management system that leverages AI to create comprehensive survey papers and manage academic literature.

🚀 Overview

Auto-Research is a comprehensive toolkit for automating academic research workflows, featuring:

Automated Survey Paper Generation: Generate survey papers on a chosen topic using LLMs served through Ollama, OpenAI, or another configured provider
ArXiv Paper Synchronization: Download and manage papers from ArXiv with intelligent categorization
Vector-Based Paper Search: Find relevant papers using semantic similarity search
IEEE-Style LaTeX Formatting: Generate publication-ready papers in IEEE format

📁 Project Structure

auto-research/
├── sync_paper/          # ArXiv paper synchronization and database management
│   ├── src/            # Core synchronization modules
│   └── test/           # Test suites
├── write_paper/        # Automated paper generation system
│   ├── src/            # Paper generation pipeline
│   │   ├── models/     # Data models
│   │   ├── nodes/      # Processing nodes
│   │   └── providers/  # LLM providers (Ollama, OpenAI, etc.)
│   └── IEEE_Conference_Template/  # IEEE LaTeX templates
└── Cline/              # MCP server implementations
    └── MCP/
        ├── Ollama-mcp/        # Ollama MCP server
        └── mcp-server-firecrawl/  # Firecrawl MCP server

🎯 Key Features

1. Paper Synchronization (`sync_paper`)

Automated ArXiv Download: Fetches papers from the ArXiv dataset via the Kaggle API
Smart Categorization: Filters papers by ML/AI categories (cs.AI, cs.CL, cs.CV, cs.LG, stat.ML)
PostgreSQL Integration: Stores papers with pgvector for efficient similarity search
Duplicate Detection: Prevents re-uploading existing papers
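
The categorization and duplicate-detection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the ArXiv dataset is read as JSON lines where each record has an `id` and a space-separated `categories` string, and `filter_new_papers` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator, Set

# Categories listed in this README.
TARGET_CATEGORIES = {"cs.AI", "cs.CL", "cs.CV", "cs.LG", "stat.ML"}


def filter_new_papers(lines: Iterable[str], seen_ids: Set[str]) -> Iterator[dict]:
    """Yield records in a target category whose id has not been seen yet.

    `seen_ids` stands in for the ids already stored in the database
    (assumption); it is updated in place as new records are yielded.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: "categories" is a space-separated string.
        categories = set(record.get("categories", "").split())
        if not categories & TARGET_CATEGORIES:
            continue  # not an ML/AI paper
        paper_id = record["id"]
        if paper_id in seen_ids:
            continue  # duplicate: already stored or already yielded
        seen_ids.add(paper_id)
        yield record
```

In a real sync job, `seen_ids` would presumably be loaded from the `papers` table before the upload starts.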

2. Survey Paper Generation (`write_paper`)

Standard Pipeline

Topic analysis and outline generation
Vector similarity search for relevant papers
Content synthesis using LLMs
IEEE LaTeX formatting
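
To make the vector similarity search step concrete, here is a small in-memory sketch of ranking papers by cosine similarity. It is only an illustration: the real system delegates this to pgvector, and the `cosine_similarity` and `top_k` names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    papers: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the titles of the k papers most similar to the query embedding."""
    ranked = sorted(
        papers,
        key=lambda paper: cosine_similarity(query, paper[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]
```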

AutoSurvey Pipeline (Advanced)

A comprehensive 4-stage pipeline for high-quality surveys:

Initial Retrieval & Outline Generation: Creates structured hierarchical outline
Subsection Drafting: Targeted retrieval and drafting for each section
Integration & Refinement: Refines and integrates sections cohesively
Rigorous Evaluation: Iterative improvement based on quality metrics
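
The control flow of the four stages can be sketched as below. This is a simplified skeleton under stated assumptions, not the repository's implementation: each stage is passed in as a plain callable, and the `threshold` and `max_rounds` parameters are hypothetical knobs for the iterative evaluation loop.

```python
from typing import Callable, List


def run_autosurvey(
    topic: str,
    make_outline: Callable[[str], List[str]],
    draft_section: Callable[[str], str],
    refine: Callable[[List[str]], str],
    evaluate: Callable[[str], float],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Run the four stages in order and return the final survey text."""
    # Stage 1: initial retrieval and outline generation.
    outline = make_outline(topic)
    # Stage 2: targeted retrieval and drafting, one draft per outline entry.
    drafts = [draft_section(section) for section in outline]
    # Stage 3: integrate the drafts into one coherent document.
    survey = refine(drafts)
    # Stage 4: evaluate, and refine again until the score is good enough
    # or the round budget is exhausted.
    for _ in range(max_rounds):
        if evaluate(survey) >= threshold:
            break
        survey = refine([survey])
    return survey
```

Passing the stages as callables keeps the loop testable with stubs; in practice each callable would wrap retrieval and LLM calls.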

3. MCP Servers

Ollama MCP: Integration with Ollama for local LLM inference
Firecrawl MCP: Web scraping and content extraction capabilities

🛠️ Installation

Prerequisites

Python 3.8+ (3.12+ recommended)
Docker and Docker Compose
PostgreSQL with pgvector extension
Ollama (for local LLM inference)
Node.js (for MCP servers)

Quick Start

Clone the repository

git clone https://github.com/keyuchen21/auto-research.git
cd auto-research

Set up PostgreSQL with pgvector

docker compose up -d

Install Python dependencies

For paper synchronization:

cd sync_paper
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

For paper generation (run from the repository root, not from inside sync_paper):

cd write_paper
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Configure Kaggle API (for ArXiv sync)

Get API token from https://www.kaggle.com/settings
Place kaggle.json in ~/.kaggle/
Set permissions: chmod 600 ~/.kaggle/kaggle.json
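
If the sync step fails to authenticate, a quick check of the token file can help. The sketch below is an optional convenience, not part of the project; `kaggle_token_problems` is a hypothetical helper, and the permission check is only meaningful on POSIX systems.

```python
import stat
from pathlib import Path


def kaggle_token_problems(path: Path) -> list:
    """Return a list of human-readable problems with the Kaggle token file."""
    if not path.is_file():
        return [f"{path} not found"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group or other permission bit means the file is not mode 600.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            f"{path} is accessible by group/others (mode {oct(mode)}); run chmod 600"
        )
    return problems


if __name__ == "__main__":
    for problem in kaggle_token_problems(Path.home() / ".kaggle" / "kaggle.json"):
        print(problem)
```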

Start Ollama (for paper generation)

ollama run llama2  # or your preferred model

📖 Usage

Sync ArXiv Papers

cd sync_paper
uv run src/upload_paper.py

Generate Survey Paper

Standard pipeline:

cd write_paper
python -m src.main --topic "Large Language Models" \
                   --output output_directory \
                   --model llama2 \
                   --reference-num 1500

AutoSurvey pipeline (recommended):

python -m src.main --topic "Multimodal Learning" \
                   --output output_directory \
                   --model llama2 \
                   --reference-num 1500 \
                   --autosurvey

Parameters

--topic: Research topic for the survey (required)
--output: Output directory (default: "output")
--model: Ollama model to use (default: "llama2")
--reference-num: Number of papers to consider (default: 1500)
--autosurvey: Enable advanced AutoSurvey pipeline
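
For reference, the flags and defaults listed above map naturally onto an `argparse` parser. The following is a sketch of what such a parser might look like, assuming nothing beyond the parameter list in this README; the actual `src.main` may define additional options or differ in detail.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line parameters."""
    parser = argparse.ArgumentParser(
        prog="src.main", description="Generate a survey paper."
    )
    parser.add_argument("--topic", required=True,
                        help="Research topic for the survey")
    parser.add_argument("--output", default="output",
                        help="Output directory")
    parser.add_argument("--model", default="llama2",
                        help="Ollama model to use")
    parser.add_argument("--reference-num", type=int, default=1500,
                        help="Number of papers to consider")
    parser.add_argument("--autosurvey", action="store_true",
                        help="Enable the AutoSurvey pipeline")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes `--reference-num` as `args.reference_num`.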

📊 Database Schema

The system uses PostgreSQL with pgvector for efficient similarity search:

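-- pgvector must be enabled before the VECTOR type is available
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE papers (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT[],
    abstract TEXT,
    categories TEXT[],
    url TEXT,
    published_date DATE,
    embedding VECTOR(1536)  -- For similarity search; dimension must match the embedding model's output size
);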
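
A nearest-neighbour lookup against this table could be built as in the sketch below. It uses pgvector's `<=>` cosine-distance operator and assumes a driver that accepts `%s` placeholders (psycopg, for example); the helper names `to_vector_literal` and `nearest_papers_query` are hypothetical, and the function only builds the query without opening a connection.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. "[1.0,0.5]"."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_papers_query(
    embedding: Sequence[float], limit: int = 10
) -> Tuple[str, tuple]:
    """Return (sql, params) selecting the papers closest to `embedding`."""
    sql = (
        "SELECT id, title, embedding <=> %s::vector AS distance "
        "FROM papers "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    # The literal is passed twice: once for the SELECT list, once for ORDER BY.
    return sql, (literal, literal, limit)
```

The returned pair can be handed to a cursor's `execute(sql, params)`; the query embedding must have the same dimension as the `embedding` column.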

🔧 Configuration

Environment Variables

Create a .env file in the project root:

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=research_db
DB_USER=postgres
DB_PASSWORD=postgres

# Ollama
OLLAMA_HOST=http://localhost:11434

# OpenAI (optional)
OPENAI_API_KEY=your_api_key_here
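
As an illustration of how these variables might be consumed, the sketch below assembles a PostgreSQL connection URL from them. It is an assumption-laden example: `database_dsn` is a hypothetical helper, and falling back to the sample values shown above when a variable is unset is a choice made here, not documented project behavior.

```python
import os
from typing import Mapping
from urllib.parse import quote


def database_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgresql:// URL from the DB_* environment variables."""
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "research_db")
    user = env.get("DB_USER", "postgres")
    password = env.get("DB_PASSWORD", "postgres")
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


if __name__ == "__main__":
    print(database_dsn())
```

This reads from the process environment only; loading the .env file itself would need a separate step or library.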

📝 Output Format

Generated papers include:

LaTeX Source: IEEE-formatted .tex file
Structured Content:
- Title and abstract
- Introduction with background
- Methodology sections
- Results and discussion
- Conclusions
- References in IEEE style
Metadata: Generation parameters and statistics
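
To give a rough idea of the LaTeX source, the sketch below renders a bare IEEE conference skeleton using the standard `IEEEtran` document class. It is illustrative only: `render_skeleton` and `escape_latex` are hypothetical names, the escaping covers just a few common special characters, and the generated papers contain far more than this.

```python
from typing import List

# A few characters that must be escaped in LaTeX text (not exhaustive).
_LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def escape_latex(text: str) -> str:
    """Escape a small set of LaTeX special characters."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_skeleton(title: str, abstract: str, sections: List[str]) -> str:
    """Return a minimal IEEEtran document with empty sections."""
    lines = [
        r"\documentclass[conference]{IEEEtran}",
        r"\begin{document}",
        r"\title{" + escape_latex(title) + "}",
        r"\maketitle",
        r"\begin{abstract}",
        escape_latex(abstract),
        r"\end{abstract}",
    ]
    for heading in sections:
        lines.append(r"\section{" + escape_latex(heading) + "}")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```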

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

ArXiv for providing open access to research papers
Ollama team for local LLM inference
PostgreSQL and pgvector for efficient vector storage
IEEE for LaTeX templates

📞 Support

For issues and questions:

Open an issue on GitHub
Check existing documentation in subdirectory READMEs
