A comprehensive Streamlit application for generating and evaluating AI-powered research paper summaries.
- GPT-4 and GPT-4-Mini from OpenAI
- Claude-3-Sonnet and Claude-3-Haiku from Anthropic
- Deepseek-Chat model
- Llama-3-8B and Mistral-7B open-source models
- Upload BibTeX files with paper metadata and abstracts
- Upload PDF files for full-text processing
- Configurable AI models and parameters
- Custom prompt templates (Layman, Technical, Executive, Educational)
- Temperature control for creativity adjustment
- Generation history tracking
- Side-by-side comparison of original abstracts and generated summaries
- Factuality rating (1-5 scale)
- Readability rating (1-5 scale)
- Optional evaluator comments
- Random paper selection for unbiased evaluation
- Model performance metrics
- Evaluation statistics and trends
- Data visualization with interactive charts
- CSV data export functionality
- Python 3.8 or higher
- Internet connection for AI model access
- AbacusAI API key (required for AI model access)
Note: No external database installation required! The application uses DuckDB with Parquet file storage for a completely self-contained setup.
The application uses AbacusAI's unified API to access all AI models (GPT-4, Claude, Deepseek, Llama, Mistral) through a single API key.
- Get your AbacusAI API key:
- Visit AbacusAI
- Sign up for an account or log in
- Navigate to your account settings or API section
- Generate a new API key
- Copy the key for the next step
- Set up environment variables:
# Copy the example environment file
cp .env-example .env
# Edit the .env file and add your API key
nano .env
# OR use any text editor of your choice
- Configure your API key in the .env file:
# Replace 'your_abacusai_api_key_here' with your actual API key
ABACUSAI_API_KEY=your_actual_api_key_here
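The application reads this key from the environment at startup. As a minimal sketch of that pattern using python-dotenv (the project's own loading code may differ):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Load variables from the .env file into the process environment
load_dotenv()

api_key = os.getenv("ABACUSAI_API_KEY")
if not api_key:
    raise RuntimeError("ABACUSAI_API_KEY is not set; check your .env file")
```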
Important considerations:
- The AbacusAI API provides access to premium AI models and may have usage limits
- API usage may incur costs depending on your AbacusAI plan
- Monitor your API usage through your AbacusAI dashboard
- Consider starting with smaller batches to understand usage patterns
Keep your API key secure:
- Never commit the .env file to version control
- Never share your API key publicly
- Store it securely and treat it like a password
- Consider using environment variables in production deployments
After setting up your API key, you can verify it works by:
- Starting the application (see Quick Setup below)
- Navigating to the Generator interface
- Trying to generate a summary with a small test input
- Checking that the AI models are accessible and responding
- Clone or download the application
cd /home/ubuntu
git clone <repository-url> research_summary_app
# OR extract from ZIP file
- Navigate to the project directory
cd research_summary_app
- Run the setup script
python setup.py
- Start the application
streamlit run app.py
- Access the application
Open your browser and navigate to http://localhost:8501
If you prefer manual installation:
- Install dependencies
pip install -r requirements.txt
- Set up environment variables
Follow the Environment Setup section above to configure your AbacusAI API key.
- Create required directories
mkdir -p data/uploads data/exports data/db logs
- Initialize the database
python -c "from src.database.models import create_tables; create_tables()"
- Start the application
streamlit run app.py
Simplified Setup: No database server installation or configuration needed! DuckDB and Parquet files are created automatically in the data/db/ directory.
For the easiest and most portable deployment, you can run the application using Docker. This method ensures consistent behavior across different operating systems and eliminates dependency issues.
- Docker Desktop installed on your system
- Your AbacusAI API key (see Environment Setup section above)
- Clone or download the application
cd /home/ubuntu
git clone <repository-url> research_summary_app
# OR extract from ZIP file
- Navigate to the project directory
cd research_summary_app
- Set up environment variables for Docker
# Copy the Docker environment example file
cp .env.docker-example .env
# Edit the .env file and add your API key
nano .env
# OR use any text editor of your choice
- Configure your API key in the .env file
# Replace 'your_abacusai_api_key_here' with your actual API key
ABACUSAI_API_KEY=your_actual_api_key_here
- Build and run with Docker Compose
# Build and start the application
docker-compose up --build
# OR run in detached mode (background)
docker-compose up -d --build
- Access the application
Open your browser and navigate to http://localhost:8501
# Start the application (build if needed)
docker-compose up --build
# Start in background (detached mode)
docker-compose up -d
# Stop the application
docker-compose down
# View application logs
docker-compose logs -f
# Rebuild the image (if you made changes)
docker-compose build --no-cache
# View running containers
docker-compose ps
# Access the container shell (for debugging)
docker-compose exec research-app /bin/bash
✅ Consistent Environment: Same behavior across Windows, macOS, and Linux
✅ Easy Setup: No need to install Python, dependencies, or manage versions
✅ Isolation: Application runs in its own container without affecting your system
✅ Data Persistence: Your data is automatically saved and persists between container restarts
✅ Easy Updates: Simply pull new code and rebuild to update the application
Port Already in Use
# If port 8501 is already in use, modify docker-compose.yml
# Change the ports section to use a different port:
ports:
- "8502:8501" # Use port 8502 instead
Permission Issues on Linux/macOS
# Fix data directory permissions
sudo chown -R $USER:$USER ./data ./logs
chmod -R 755 ./data ./logs
Container Won't Start
# Check logs for errors
docker-compose logs research-app
# Check if all required environment variables are set
docker-compose config
API Key Not Working
# Verify your .env file is properly configured
cat .env
# Make sure the API key is valid by testing it manually
# The container logs will show warnings if the API key is missing
Database Issues
# If you encounter database issues, you can reset the data
# WARNING: This will delete all your data
rm -rf ./data/db/*
docker-compose restart
For production deployment, consider these additional configurations:
- Use a reverse proxy (nginx, Traefik) for SSL termination
- Set up monitoring and health checks
- Configure backup strategies for the data directory
- Use Docker secrets for sensitive environment variables
- Consider using Docker Swarm or Kubernetes for scaling
| Feature | Docker | Traditional |
|---|---|---|
| Setup Time | ⚡ 5 minutes | 🕐 10-15 minutes |
| Dependencies | ✅ Included | ❌ Manual install |
| Portability | ✅ Works everywhere | ❌ OS-specific |
| Isolation | ✅ Containerized | ❌ System-wide |
| Updates | ✅ Simple rebuild | ❌ Manual process |
The following screenshots showcase the key interfaces and features of the Research made Readable application:
The home page welcomes users with a clean interface featuring the application title "Research made Readable" and role-based navigation. Users can choose between different roles (Content Generator, Content Evaluator) to access specific functionality tailored to their needs.
The Generator interface provides comprehensive tools for creating AI-powered research summaries. Key features include:
- File upload support for BibTeX and PDF files
- AI model selection (GPT-4, Claude, Deepseek, etc.)
- Input mode selection (Abstract or Full PDF processing)
- Configurable parameters including temperature control
- Multiple prompt templates (Layman, Technical, Executive, Educational)
- Custom prompt editing capabilities
The generator interface shows detailed configuration options including:
- AI model dropdown with multiple options (GPT-4, Claude-3-Sonnet, etc.)
- Temperature slider for controlling output creativity (0.00 to 1.00)
- Prompt template selection with predefined options
- Custom prompt text area for personalized instructions
- Generate Summary button to initiate the AI processing
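These controls map naturally onto standard Streamlit widgets. The snippet below is an illustrative sketch, not the app's actual source; widget labels and defaults are assumptions:

```python
import streamlit as st

# Illustrative generator controls (labels and defaults are assumptions)
model = st.selectbox("AI model", ["GPT-4", "GPT-4-Mini", "Claude-3-Sonnet",
                                  "Claude-3-Haiku", "Deepseek-Chat",
                                  "Llama-3-8B", "Mistral-7B"])
temperature = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.7, step=0.01)
template = st.selectbox("Prompt template", ["Layman", "Technical", "Executive", "Educational"])
custom_prompt = st.text_area("Custom prompt", "Summarize the abstract for a general audience.")

if st.button("Generate Summary"):
    st.write(f"Generating with {model} at temperature {temperature:.2f}...")
```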
The Evaluator interface enables comprehensive quality assessment of generated summaries:
- Side-by-side comparison of original abstracts and AI-generated summaries
- Factuality rating system (1-5 scale) to assess accuracy
- Readability rating system (1-5 scale) to evaluate clarity and comprehension
- Optional comments section for detailed feedback
- Submit, Skip, and Refresh buttons for efficient evaluation workflow
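A rough sketch of how the side-by-side comparison and 1-5 ratings can be laid out in Streamlit (placeholder text and widget names are assumptions, not the app's actual code):

```python
import streamlit as st

# Placeholder texts; in the app these come from the database
paper_abstract = "Original abstract text..."
generated_summary = "AI-generated summary text..."

left, right = st.columns(2)
left.subheader("Original abstract")
left.write(paper_abstract)
right.subheader("Generated summary")
right.write(generated_summary)

factuality = st.slider("Factuality (1 = inaccurate, 5 = fully accurate)", 1, 5, 3)
readability = st.slider("Readability (1 = unclear, 5 = very clear)", 1, 5, 3)
comments = st.text_area("Comments (optional)")

if st.button("Submit evaluation"):
    st.success("Evaluation recorded")
```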
The Analytics Dashboard provides comprehensive insights into summary generation and evaluation performance:
- Overview of analytics and insights from summary evaluations
- Performance metrics visualization
- Data-driven insights for model comparison
- Export functionality for detailed analysis
- Clean, professional interface for monitoring application usage
- Navigate to the Generator page
- Upload files:
- BibTeX files (.bib) containing paper metadata and abstracts
- PDF files of research papers
- Configure generation settings:
- Select AI model (GPT-4, Claude, etc.)
- Choose input mode (Abstract or Full PDF)
- Select prompt template or write custom prompt
- Adjust temperature for creativity
- Generate summaries and review results
- Save summaries to the database
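For the BibTeX upload step, the parser pulls titles and abstracts out of each entry. A minimal sketch using the bibtexparser package (v1 API); the file path is illustrative and the app's own bibtex_parser.py may be implemented differently:

```python
import bibtexparser

# Parse an uploaded .bib file and extract titles and abstracts
with open("data/uploads/example.bib", encoding="utf-8") as bib_file:
    bib_database = bibtexparser.load(bib_file)

for entry in bib_database.entries:
    title = entry.get("title", "(no title)")
    abstract = entry.get("abstract", "")
    print(f"{title}: {len(abstract)} characters of abstract")
```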
- Navigate to the Evaluator page
- Review presented papers:
- Original abstract on the left
- Generated summary on the right
- Rate the summary:
- Factuality (1-5): How accurate is the summary?
- Readability (1-5): How clear and understandable is it?
- Add optional comments
- Submit evaluation and continue to next paper
- Navigate to the Dashboard page
- View performance metrics:
- Overall evaluation statistics
- Model-by-model performance comparison
- Interactive charts and visualizations
- Export data:
- Download complete dataset as CSV files
- Use for external analysis and reporting
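As a rough illustration of how per-model metrics and the CSV export can be derived from the Parquet store (the column names below are assumptions, not the app's actual schema):

```python
import pandas as pd
import streamlit as st

# Load evaluations and summaries directly from the Parquet files
evaluations = pd.read_parquet("data/db/evaluations.parquet")
summaries = pd.read_parquet("data/db/summaries.parquet")

# Average factuality and readability per model (join/column names assumed)
merged = evaluations.merge(summaries, left_on="summary_id", right_on="id")
per_model = merged.groupby("model")[["factuality", "readability"]].mean()

st.bar_chart(per_model)
st.download_button("Download evaluations as CSV",
                   data=evaluations.to_csv(index=False),
                   file_name="evaluations.csv",
                   mime="text/csv")
```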
research_summary_app/
├── app.py # Main Streamlit application
├── setup.py # Setup script
├── requirements.txt # Python dependencies
├── README.md # Documentation
├── src/
│ ├── ai_models/
│ │ ├── model_interface.py # AI model integration
│ │ └── prompts.py # Default prompts
│ ├── database/
│ │ ├── models.py # DuckDB schema definitions
│ │ └── operations.py # Database operations
│ ├── parsers/
│ │ ├── bibtex_parser.py # BibTeX file parser
│ │ └── pdf_parser.py # PDF text extraction
│ ├── ui_components/
│ │ ├── generator_interface.py # Generator UI
│ │ ├── evaluator_interface.py # Evaluator UI
│ │ └── dashboard_interface.py # Dashboard UI
│ └── utils/
│ ├── session_manager.py # Session management
│ └── helpers.py # Utility functions
├── data/
│ ├── db/ # DuckDB and Parquet files
│ │ ├── research_app.duckdb # DuckDB database file
│ │ ├── papers.parquet # Papers data storage
│ │ ├── summaries.parquet # Summaries data storage
│ │ ├── translations.parquet # Translations data storage
│ │ └── evaluations.parquet # Evaluations data storage
│ ├── uploads/ # Uploaded files
│ └── exports/ # Exported data
├── tests_and_debug/ # Testing and debugging files
│ ├── README.md # Testing documentation
│ ├── test_app.py # Interactive BibTeX parser test
│ ├── test_bibtex.bib # Test BibTeX data
│ ├── debug_bibtex_detailed.py # Detailed BibTeX debugging
│ ├── debug_standalone.py # Standalone parser testing
│ ├── debug_validation.py # Validation step debugging
│ ├── test_bibtex_debug.py # BibTeX parser unit tests
│ └── test_fixed_parser.py # Fixed parser implementation tests
└── docs/
└── deployment.md # Deployment instructions
The application uses DuckDB with Parquet file storage for optimal performance and portability:
- DuckDB: Fast, embedded SQL database for queries and operations
- Parquet Files: Columnar storage format for efficient data storage and retrieval
- Self-contained: No external database server required
- Papers (papers.parquet): Research paper metadata, abstracts, and full text
- Summaries (summaries.parquet): Generated summaries with model metadata and parameters
- Translations (translations.parquet): Multi-language translations of summaries
- Evaluations (evaluations.parquet): Human evaluations of summary quality and readability
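These files can be opened and queried directly with DuckDB, with no server involved. A minimal sketch (the column name used in the query is an assumption for illustration):

```python
import duckdb

# Open (or create) the embedded database file
con = duckdb.connect("data/db/research_app.duckdb")

# DuckDB can query the Parquet files in place
df = con.execute("""
    SELECT model, COUNT(*) AS n_summaries
    FROM 'data/db/summaries.parquet'
    GROUP BY model
    ORDER BY n_summaries DESC
""").fetchdf()
print(df)
con.close()
```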
- Portable: Copy the entire data/db/ directory to move your data
- Fast: DuckDB is optimized for analytical workloads
- No Setup: Database files created automatically on first run
- Efficient: Parquet format provides excellent compression and query performance
- Zero External Dependencies: No PostgreSQL server installation required
- File-Based Storage: All data stored in portable Parquet files
- Single Directory Deployment: Copy the entire application directory to any machine
# Backup your entire database
cp -r data/db/ backup_$(date +%Y%m%d)/
# Migrate to new server
scp -r research_summary_app/ user@newserver:/path/to/deployment/
- Identical Architecture: Development and production use the same storage format
- No Configuration Changes: No database connection strings or credentials to manage
- Instant Setup: Run streamlit run app.py on any machine with Python
The application integrates with multiple AI models through a unified API interface:
- All models use the same endpoint format
- Automatic fallback and error handling
- Configurable parameters (temperature, max tokens)
- Request/response logging for debugging
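The exact AbacusAI endpoint and payload used by model_interface.py are not reproduced here; the sketch below only illustrates the fallback, parameter-passing, and logging pattern, with the URL and response shape as placeholders:

```python
import logging
import os

import requests

API_KEY = os.getenv("ABACUSAI_API_KEY")
ENDPOINT = "https://example.invalid/v1/chat"  # placeholder; see the AbacusAI docs for the real URL

def generate_summary(prompt, models=("gpt-4", "claude-3-sonnet"),
                     temperature=0.7, max_tokens=512):
    """Try each model in turn, logging requests and falling back on failure."""
    for model in models:
        payload = {"model": model, "prompt": prompt,
                   "temperature": temperature, "max_tokens": max_tokens}
        try:
            logging.info("Requesting summary from %s", model)
            resp = requests.post(ENDPOINT, json=payload,
                                 headers={"Authorization": f"Bearer {API_KEY}"},
                                 timeout=60)
            resp.raise_for_status()
            return resp.json()  # response shape depends on the API
        except requests.RequestException as exc:
            logging.warning("Model %s failed (%s); trying next model", model, exc)
    raise RuntimeError("All configured models failed")
```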
- Database File Access Issues
  - Ensure the data/db/ directory exists and is writable
  - Check file permissions for Parquet files
  - Verify sufficient disk space for database operations
- AI Model API Errors
  - Check API key configuration
  - Verify internet connectivity
  - Review API rate limits
- File Upload Issues
  - Ensure file formats are supported (.bib, .pdf)
  - Check file size limits
  - Verify file permissions
- PDF Text Extraction Fails
  - Try different PDF files
  - Check if PDF is text-based (not scanned images)
  - Verify PDF is not password-protected
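To check whether a PDF actually contains extractable text (rather than scanned images), a quick probe with the pypdf package can help; the file path is illustrative and the app's own pdf_parser.py may use a different library:

```python
from pypdf import PdfReader

reader = PdfReader("data/uploads/example.pdf")  # illustrative path
if reader.is_encrypted:
    print("PDF is password-protected; extraction will fail")
else:
    # Concatenate whatever text layer each page exposes
    text = "".join(page.extract_text() or "" for page in reader.pages)
    print("Extractable characters:", len(text))
    if not text.strip():
        print("No text layer found; this is likely a scanned PDF")
```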
- Data Migration or Corruption
  - DuckDB automatically handles data integrity
  - Parquet files can be verified using DuckDB directly
  - Backup/restore is as simple as copying the data/db/ directory
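Verifying a Parquet file with DuckDB is a one-liner, for example (file name taken from the data/db/ listing above):

```python
import duckdb

# Quick integrity check: count rows in a Parquet file, no server required
print(duckdb.sql("SELECT COUNT(*) AS rows FROM 'data/db/papers.parquet'").fetchall())
```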
- Check browser console for JavaScript errors
- Review Streamlit logs in terminal
- DuckDB operations are logged in application output
- Database files are created automatically if missing
The application includes a comprehensive testing suite located in the tests_and_debug/ directory. This directory contains:
- Interactive test applications for BibTeX parsing
- Debug scripts for troubleshooting parsing issues
- Unit tests for parser functionality
- Test data files with real research paper examples
To run the testing suite:
# Run individual test files
python tests_and_debug/test_app.py
python tests_and_debug/debug_bibtex_detailed.py
# Run interactive BibTeX parser test
streamlit run tests_and_debug/test_app.py
# Run all debug scripts
cd tests_and_debug
for file in debug_*.py test_*.py; do
echo "Running $file..."
python "$file"
echo "---"
done
The testing suite focuses on:
- BibTeX parsing with complex formatting scenarios
- Edge cases handling (spaces in keys, special characters, long abstracts)
- Database operations with DuckDB and Parquet storage
- Error handling and validation processes
For detailed information about each test file and debugging procedure, see the Testing README.
To contribute to the project:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License. See LICENSE file for details.
For support and questions:
- Check the troubleshooting section
- Review the documentation
- Submit issues through the project repository
Research made Readable - Making research accessible through AI-powered summarization.




