Samtoosoon/Pdfquestion

Brainstorm with Pdfs

This application allows you to ask questions about the content of a PDF document and receive answers generated by a Large Language Model (LLM). It leverages semantic similarity to find relevant sections within the PDF before feeding them to the LLM, resulting in more accurate and context-aware responses.

How it Works

The application follows these steps:

  1. PDF Reading: The application takes a PDF file as input and extracts its textual content.

  2. Text Chunking: The extracted text is split into smaller chunks. This keeps each piece small enough for the LLM to process efficiently and lets the search step return only the passages relevant to a question.

  3. Embedding Generation: Using Hugging Face Sentence Transformers, each text chunk is converted into a vector embedding. These embeddings capture the semantic meaning of the text.

  4. Question Embedding: When you ask a question, it is also converted into a vector embedding using the same Sentence Transformer model.

  5. Semantic Similarity Search: The application calculates the semantic similarity between the question's embedding and the embeddings of all the text chunks. This identifies the chunks that are most relevant to your question.

  6. Contextual Information for LLM: The most semantically similar text chunks are retrieved and provided as context to the LLM.

  7. Answer Generation: The LLM uses the provided context to generate an answer to your question.

  8. User Interface: Streamlit provides an interactive graphical user interface for uploading PDFs and asking questions.

  9. LLM Integration: LangChain is used to connect the application to the LLM.
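The chunking in step 2 can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name `chunk_text`, the character-based windows, and the default `chunk_size` and `overlap` values are all assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Hypothetical helper: the sizes are illustrative defaults, not values
    taken from the repository.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.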

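Steps 4 to 6 (embed the question, rank chunks by similarity, assemble context for the LLM) can be sketched with the standard library alone. Note the assumptions: `toy_embed` is a bag-of-words stand-in for the Sentence Transformer model, cosine similarity is assumed as the similarity measure, and the names `top_k_chunks` and `build_prompt` as well as the prompt wording are hypothetical. The application itself may do each of these differently.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a word-count vector (assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q_vec = toy_embed(question)
    scored = [(cosine_similarity(q_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt (hypothetical wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "stock prices rose sharply",
        "a cat chased a mouse",
    ]
    question = "where did the cat sit"
    print(build_prompt(question, top_k_chunks(question, chunks)))
```

With a real embedding model, only `toy_embed` and the vector type would change; the ranking and prompt assembly keep the same shape.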

Deployed Application

You can try the deployed application here: https://pdfquestion-euvx4eayptjuxzvrht6wn5.streamlit.app/

Repository

The source code for this application can be found on GitHub: https://github.com/Samtoosoon/Pdfquestion/tree/main

Installation

To run this application locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Samtoosoon/Pdfquestion.git
    cd Pdfquestion
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Set up environment variables:

    • Create a file named .env in the root directory of the repository.

    • Add your Hugging Face API key to the .env file. You can obtain an API key from the Hugging Face website.

      HUGGINGFACE_API_KEY=YOUR_HUGGINGFACE_API_KEY

      Replace YOUR_HUGGINGFACE_API_KEY with your actual API key.
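How app.py reads this variable is not shown here. A common pattern is to load the .env file into the process environment (for example with python-dotenv's `load_dotenv()`) and then read the key from `os.environ`. The sketch below covers only the second half and uses a hypothetical helper, `get_api_key`, that fails early with a clear message when the key is missing or still set to the placeholder.

```python
import os


def get_api_key(env=None) -> str:
    """Return the Hugging Face API key from the environment.

    Hypothetical helper, not taken from app.py. `env` defaults to
    os.environ and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("HUGGINGFACE_API_KEY", "").strip()
    if not key or key == "YOUR_HUGGINGFACE_API_KEY":
        raise RuntimeError(
            "HUGGINGFACE_API_KEY is not set; add it to .env or export it."
        )
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```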

Usage

  1. Navigate to the repository directory in your terminal.

  2. Run the Streamlit application:

    streamlit run app.py
  3. Open your web browser to the address displayed in the terminal (usually http://localhost:8501).

  4. Follow the on-screen instructions:

    • Upload a PDF file using the file uploader.
    • Enter your question in the text input field.
    • Click the "Ask" button to get your answer.

Code Structure (Brief Overview)

  • app.py: This is the main Streamlit application file that handles the user interface, PDF processing, embedding generation, similarity search, and interaction with the LLM.
  • requirements.txt: This file lists all the Python dependencies required to run the application.
  • .env: This file stores sensitive information like your Hugging Face API key. Ensure this file is not committed to your version control system (e.g., add it to your .gitignore file).
