This application allows you to ask questions about the content of a PDF document and receive answers generated by a Large Language Model (LLM). It leverages semantic similarity to find relevant sections within the PDF before feeding them to the LLM, resulting in more accurate and context-aware responses.
The application follows these steps:
- PDF Reading: The application takes a PDF file as input and extracts its textual content.
- Text Chunking: The extracted text is split into smaller, manageable chunks. This is crucial for efficient processing by the LLM and for focusing on relevant information.
- Embedding Generation: Using Hugging Face Sentence Transformers, each text chunk is converted into a vector embedding. These embeddings capture the semantic meaning of the text.
- Question Embedding: When you ask a question, it is also converted into a vector embedding using the same Sentence Transformer model.
- Semantic Similarity Search: The application calculates the semantic similarity between the question's embedding and the embeddings of all the text chunks, identifying the chunks most relevant to your question.
- Contextual Information for LLM: The most semantically similar text chunks are retrieved and provided as context to the LLM.
- Answer Generation: The LLM uses the provided context to generate an answer to your question.
- User Interface: Streamlit provides an interactive graphical user interface for uploading PDFs and asking questions.
- LLM Integration: LangChain is used to integrate with the LLM.
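The chunk/embed/retrieve loop above can be sketched in a few lines. This is only an illustration: the function names, chunk sizes, and the toy bag-of-words "embeddings" are invented here for portability; the real application uses Hugging Face Sentence Transformer vectors for both chunks and questions, with the same cosine-similarity scoring.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=60, overlap=10):
    # Split extracted PDF text into overlapping chunks (sizes are illustrative).
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(text):
    # Toy bag-of-words vector; the real app uses Sentence Transformers here.
    tokens = [t.strip(".,?!") for t in text.lower().split()]
    return Counter(t for t in tokens if t)

def cosine_similarity(a, b):
    # The same scoring the app applies to real embedding vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_chunks(question, chunks, k=2):
    # Rank chunks by similarity to the question embedding; the top k
    # become the context passed to the LLM.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)),
                    reverse=True)
    return ranked[:k]

doc = ("Streamlit builds the user interface. "
       "Sentence Transformers turn text into vectors. "
       "LangChain wires the retrieved context into the LLM prompt.")
chunks = chunk_text(doc)
print(top_chunks("Which library creates the vectors?", chunks, k=1))
```

With real embeddings the ranking is driven by meaning rather than shared words, which is why semantically related chunks surface even without exact keyword overlap.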
You can try the deployed application here: https://pdfquestion-euvx4eayptjuxzvrht6wn5.streamlit.app/
The source code for this application can be found on GitHub: https://github.com/Samtoosoon/Pdfquestion/tree/main
To run this application locally, follow these steps:
- Clone the repository:

  git clone https://github.com/Samtoosoon/Pdfquestion.git
  cd Pdfquestion
- Install the required Python packages:

  pip install -r requirements.txt
- Set up environment variables:
  - Create a file named `.env` in the root directory of the repository.
  - Add your Hugging Face API key to the `.env` file. You can obtain an API key from the Hugging Face website:

    HUGGINGFACE_API_KEY=YOUR_HUGGINGFACE_API_KEY

    (Replace `YOUR_HUGGINGFACE_API_KEY` with your actual API key.)
- Navigate to the repository directory in your terminal.
- Run the Streamlit application:

  streamlit run app.py

- Open your web browser to the address displayed in the terminal (usually `http://localhost:8501`).
- Follow the on-screen instructions:
  - Upload a PDF file using the file uploader.
  - Enter your question in the text input field.
  - Click the "Ask" button to get your answer.
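The key placed in `.env` must be readable by the application at startup. The app most likely uses the `python-dotenv` package's `load_dotenv()` for this; the sketch below shows a standard-library-only equivalent (the file contents and the `load_env` helper are illustrative, not the app's actual code):

```python
import os
import tempfile

def load_env(path):
    # Minimal .env parser: KEY=VALUE lines, '#' comments ignored.
    # The real app presumably uses python-dotenv's load_dotenv() instead.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Demonstrate with a throwaway file; the real app reads ./.env instead.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("HUGGINGFACE_API_KEY=YOUR_HUGGINGFACE_API_KEY\n")
    env_path = f.name

load_env(env_path)
print(os.environ["HUGGINGFACE_API_KEY"])
os.remove(env_path)
```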
- `app.py`: The main Streamlit application file that handles the user interface, PDF processing, embedding generation, similarity search, and interaction with the LLM.
- `requirements.txt`: Lists all the Python dependencies required to run the application.
- `.env`: Stores sensitive information such as your Hugging Face API key. Ensure this file is not committed to your version control system (e.g., add it to your `.gitignore` file).
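The "interaction with the LLM" step in `app.py` amounts to assembling the retrieved chunks and the user's question into a single prompt. A hedged pure-Python sketch of that assembly (the real app routes this through LangChain's prompt and chain abstractions, and the template wording here is invented for illustration):

```python
def build_prompt(question, context_chunks):
    # Join the retrieved chunks into one context block and append the
    # question; the exact template wording is illustrative only.
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Which library creates the vectors?",
    ["Sentence Transformers turn text into vectors."],
)
print(prompt)
```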
