Convert PowerPoint presentations into semantically rich text using Vision Language Models.
ppt2desc is a command-line tool that converts PowerPoint presentations into detailed textual descriptions. PowerPoint presentations are an inherently visual medium that often convey complex ideas through a combination of text, graphics, charts, and other visual layouts. This tool uses vision language models to not only transcribe the text content but also interpret and describe the visual elements and their relationships, capturing the full semantic meaning of each slide in a machine-readable format.
- Convert PPT/PPTX files to semantic descriptions
- Process individual files or entire directories
- Interpretation of visual elements (charts, graphs, figures)
- Rate limiting for API calls
- Customizable prompts and instructions
- JSON output format for easy integration
Current Model Provider Support
- Gemini models via Google Gemini API
- GPT Models via OpenAI API
- Claude Models via Anthropic API
- Gemini Models via Google Cloud Platform Vertex AI
- GPT Models via Microsoft Azure AI Foundry Deployments
- Nova & Claude Models via Amazon Bedrock on Amazon Web Services
- Python 3.13 or higher
- UV package manager (install from uv.pm)
- LibreOffice (for PPT/PPTX to PDF conversion)
- Option 1: Install LibreOffice locally.
- Option 2: Use the provided Docker container for LibreOffice.
- API credentials for a vision language model (vLLM) provider
- Clone the repository:
git clone https://github.com/ALucek/ppt2desc.git
cd ppt2desc
- Installing LibreOffice
LibreOffice is a critical dependency for this tool, as it handles the headless conversion of PowerPoint files to PDF format.
Option 1: Local Installation
Linux Systems:
sudo apt install libreoffice
macOS:
brew install libreoffice
Windows:
Install using the installer from LibreOffice's Official Website
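As an illustration of the headless conversion step described above, the following minimal sketch shows how a local soffice binary can be asked to convert a presentation to PDF from Python. The options --headless, --convert-to, and --outdir belong to LibreOffice's command line; the function names are hypothetical, and this is not necessarily how ppt2desc invokes LibreOffice internally.

```python
import subprocess
from pathlib import Path


def build_convert_command(soffice: str, pptx: Path, out_dir: Path) -> list[str]:
    """Build a headless LibreOffice command that converts one file to PDF."""
    return [
        soffice,              # path to the soffice binary (same value as --libreoffice_path)
        "--headless",         # run without a GUI
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(pptx),
    ]


def convert_to_pdf(soffice: str, pptx: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error
    subprocess.run(build_convert_command(soffice, pptx, out_dir), check=True)
    return out_dir / f"{pptx.stem}.pdf"
```

Calling convert_to_pdf("/path/to/soffice", Path("deck.pptx"), Path("out")) is a quick way to confirm that a local installation works before running the full tool.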
Option 2: Docker-based Installation
a. Ensure you have Docker installed on your system
b. Run the following command
docker compose up -d
This command will build the Docker image based on the provided Dockerfile and start the container in detached mode. The LibreOffice conversion service will be accessible at http://localhost:2002. To stop the container, run docker compose down.
- Install dependencies using UV:
uv sync
This will create a virtual environment and install all dependencies from pyproject.toml.
Basic usage with Gemini API:
uv run src/main.py \
--input_dir /path/to/presentations \
--output_dir /path/to/output \
--libreoffice_path /path/to/soffice \
--client gemini \
--api_key YOUR_GEMINI_API_KEY
General Arguments:
- --input_dir: Path to input directory or PPT file (required)
- --output_dir: Output directory path (required)
- --client: LLM client to use: 'gemini', 'vertexai', 'anthropic', 'azure', 'aws' or 'openai' (required)
- --model: Model to use (default: "gemini-2.5-flash")
- --instructions: Additional instructions for the model
- --libreoffice_path: Path to LibreOffice installation
- --libreoffice_url: URL for Docker-based LibreOffice installation (configured: http://localhost:2002)
- --rate_limit: API calls per minute (default: 60)
- --prompt_path: Custom prompt file path
- --api_key: Model Provider API key (if not set via environment variable)
- --save_pdf: Include to save the converted PDF in your output folder
- --save_images: Include to save the individual slide images in your output folder
Vertex AI Specific Arguments:
- --gcp_project_id: GCP project ID for Vertex AI service account
- --gcp_region: GCP region for Vertex AI service (e.g., us-central1)
- --gcp_application_credentials: Path to GCP service account JSON credentials file
Azure AI Foundry Specific Arguments:
- --azure_openai_api_key: Azure AI Foundry Resource Key 1 or Key 2
- --azure_openai_endpoint: Azure AI Foundry deployment service endpoint link
- --azure_deployment_name: The name of your model deployment
- --azure_api_version: Azure API version (default: "2023-12-01-preview")
AWS Amazon Bedrock Specific Arguments:
- --aws_access_key_id: Bedrock Account Access Key
- --aws_secret_access_key: Bedrock Account Secret Access Key
- --aws_region: AWS Bedrock Region
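The --rate_limit argument above is expressed in API calls per minute, so a value of 30 corresponds to roughly one call every 2 seconds. The sketch below shows one simple way such a limit can be enforced, by spacing calls at least 60 / rate seconds apart. The class name and its injectable clock are hypothetical, and ppt2desc's own limiter may work differently (for example, with a sliding window).

```python
import time


class MinuteRateLimiter:
    """Space calls at least 60 / calls_per_minute seconds apart."""

    def __init__(self, calls_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        if calls_per_minute <= 0:
            raise ValueError("calls_per_minute must be positive")
        self.interval = 60.0 / calls_per_minute  # minimum gap between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

In use, limiter.wait() would be called immediately before each model request, once per slide.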
Using Gemini API:
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client gemini \
--model gemini-2.5-flash \
--rate_limit 30 \
--instructions "Focus on extracting numerical data from charts and graphs"
Using Vertex AI:
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--client vertexai \
--libreoffice_path ./soffice \
--gcp_project_id my-project-123 \
--gcp_region us-central1 \
--gcp_application_credentials ./service-account.json \
--model gemini-2.5-pro \
--instructions "Extract detailed information from technical diagrams"
Using Azure AI Foundry:
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client azure \
--azure_openai_api_key 123456790ABCDEFG \
--azure_openai_endpoint 'https://example-endpoint-001.openai.azure.com/' \
--azure_deployment_name gpt-4o \
--azure_api_version 2023-12-01-preview \
--rate_limit 60
Using AWS Amazon Bedrock:
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client aws \
--model us.amazon.nova-lite-v1:0 \
--aws_access_key_id 123456790ABCDEFG \
--aws_secret_access_key 123456790ABCDEFG \
--aws_region us-east-1 \
--rate_limit 60
The tool generates JSON files with the following structure:
{
"deck": "presentation.pptx",
"model": "model-name",
"slides": [
{
"number": 1,
"content": "Detailed description of slide content..."
},
// ... more slides
]
}
When using the Docker container for LibreOffice, you can use the --libreoffice_url argument to direct the conversion process to the container's API endpoint, rather than a local installation.
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_url http://localhost:2002 \
--client vertexai \
--model gemini-2.5-pro \
--gcp_project_id my-project-123 \
--gcp_region us-central1 \
--gcp_application_credentials ./service-account.json \
--rate_limit 30 \
--instructions "Extract detailed information from technical diagrams" \
--save_pdf \
--save_images
You should use either --libreoffice_url or --libreoffice_path, but not both.
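Whichever conversion route is used, the resulting JSON files follow the structure shown earlier (deck, model, and a slides list with number and content). As a minimal sketch of downstream integration, the following reads one output file and joins the slide descriptions into a single text block. The function names are hypothetical, and the sketch assumes the files on disk are plain JSON without the illustrative comment line.

```python
import json
from pathlib import Path


def load_deck(path: Path) -> dict:
    """Read one ppt2desc output file and check the documented top-level keys."""
    data = json.loads(path.read_text(encoding="utf-8"))
    missing = {"deck", "model", "slides"} - data.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")
    return data


def deck_to_text(data: dict) -> str:
    """Join slide descriptions into one text block, ordered by slide number."""
    slides = sorted(data["slides"], key=lambda s: s["number"])
    return "\n\n".join(f"[Slide {s['number']}] {s['content']}" for s in slides)
```

The joined text can then be passed to a search index or another language model, which is the kind of integration the JSON format is meant to make easy.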
You can modify the base prompt by editing src/prompt.py (specifically the BASE_PROMPT constant) or providing additional instructions via the command line:
uv run src/main.py \
--input_dir ./presentations \
--output_dir ./output \
--libreoffice_path ./soffice \
--client gemini \
--instructions "Include mathematical equations and formulas in LaTeX format"
For Consumer APIs:
- Set your API key via the --api_key argument or through your respective provider's environment variables
For Vertex AI:
- Create a service account in your GCP project IAM
- Grant necessary permissions (typically, "Vertex AI User" role)
- Download the service account JSON key file
- Provide the credentials file path via --gcp_application_credentials
For Azure AI Foundry:
- Create an Azure OpenAI Resource
- Navigate to Azure AI Foundry and choose the subscription and Azure OpenAI Resource to work with
- Under Management, select Deployments
- Select Create new deployment and configure it with your vision LLM
- Provide the deployment name, API key, endpoint, and API version via --azure_deployment_name, --azure_openai_api_key, --azure_openai_endpoint, and --azure_api_version
For AWS Bedrock:
- Request access to serverless model deployments in Amazon Bedrock's model catalog
- Create a user in your AWS IAM
- Enable Amazon Bedrock access policies for your user
- Save the user's access key and secret access key
- Provide the user's credentials via --aws_access_key_id and --aws_secret_access_key
Contributions are welcome! Please feel free to submit a Pull Request.
Todo
- Handling Google's new genai SDK for a unified Gemini/Vertex experience
- Better Docker Setup
- AWS Llama Vision Support Confirmation
- Combination of JSON files across multiple presentations
- Dynamic font understanding (i.e., conversion struggles when a font used by the presentation is not installed on the machine)
This project is licensed under the MIT License - see the LICENSE file for details.
- LibreOffice for PPT/PPTX conversion
- PyMuPDF for PDF processing