A collection of practical LLM projects for learning and experimentation.
We're surrounded by information, but often lack the time to digest it all. Whether you're researching competitors, staying on top of industry news, evaluating tools, or simply trying to understand what a company does before a meeting - manually reading through websites is time-consuming.
This project explores how LLMs can help us quickly extract insights from web content while keeping everything local and private.
- Research & Discovery - Quickly understand what a company or product does
- Competitive Analysis - Get the gist of competitor websites without reading every page
- Meeting Prep - Summarize a client's or partner's website before a call
- Content Curation - Evaluate if an article is worth a deep read
- Learning - Understand how to build LLM-powered tools from scratch
- Don't scrape sites that prohibit it - Check robots.txt and Terms of Service
- Don't use for mass scraping - This is for occasional, personal use
- Don't rely on it for critical decisions - LLMs can miss nuance or hallucinate
- Don't scrape login-protected content - Respect access controls
- Don't use commercially without consideration - Website content belongs to its owners
A note on responsibility: Web scraping exists in a gray area. This tool is meant for personal productivity and learning. Be respectful of website owners, don't hammer servers with requests, and always consider whether your use case is ethical and legal.
A web scraper + LLM-powered summarizer with a clean Gradio UI. Paste any URL and get an AI-generated summary.
Features:
- Smart scraping with BeautifulSoup + Playwright fallback (handles JS-heavy sites)
- Streaming responses for real-time output
- Two LLM providers:
  - OpenAI (paid) - GPT-4o, GPT-4o-mini, etc.
  - Ollama (free) - Llama 3.2, Mistral, Gemma, and more (runs 100% locally)
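Gradio renders streaming output by consuming a generator that yields the text accumulated so far. A minimal sketch of that pattern (the function name `stream_accumulated` is illustrative, not from this repo):

```python
from typing import Iterable, Iterator

def stream_accumulated(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the running text after each chunk arrives; Gradio
    re-renders the output box with each yielded value."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text
```

Wiring a generator like this into a Gradio event handler is what makes the summary appear token by token instead of all at once.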
Learning Path:
1. Start with `summarizer_tutorial.ipynb` - understand the fundamentals
2. Explore `scraper.py` - see how web scraping works
3. Check out `app.py` - learn how to build a UI with Gradio
```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/LLM_Projects_Notes.git
cd LLM_Projects_Notes

# Install dependencies with uv (recommended)
uv sync

# Or with pip
pip install -e .
```

```bash
# Copy the example env file
cp .env.example .env

# Edit .env and add your API keys
# For OpenAI: Get key at https://platform.openai.com/api-keys
```

```bash
# Install browser binaries
playwright install chromium
```

```bash
# Activate the virtual environment (if using uv)
source .venv/bin/activate   # Linux/macOS
# or
.venv\Scripts\activate      # Windows

# Run the Gradio app
python 1_llm_website_summarizer/app.py
```

Open http://127.0.0.1:7860 in your browser.
Ollama lets you run LLMs entirely on your machine - no API keys, no costs, complete privacy.
1. Install Ollama: https://ollama.com/download

2. Pull a model:

   ```bash
   # Recommended for most machines (4.7GB)
   ollama pull llama3.2

   # Lighter alternative (2.0GB)
   ollama pull phi3

   # More powerful if you have the RAM (8GB+)
   ollama pull llama3.1
   ```

3. Start the Ollama server:

   ```bash
   ollama serve
   ```

4. In the app, select "Ollama" as the provider and choose your model.
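Once `ollama serve` is running, the app can reach Ollama's local REST API (by default at `http://localhost:11434`). A rough sketch of a non-streaming chat call using only the standard library (the helper names are hypothetical; the app itself may use a dedicated client library instead):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response, not chunks
    }

def ollama_chat(model: str, prompt: str) -> str:
    """Send a chat request to a locally running Ollama server."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `ollama_chat("llama3.2", "Summarize: ...")` requires the server to be running; nothing here leaves your machine.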
```
LLM_Projects_Notes/
├── 1_llm_website_summarizer/
│   ├── app.py                     # Gradio web UI (the final product)
│   ├── scraper.py                 # Web scraping utilities
│   └── summarizer_tutorial.ipynb  # Tutorial notebook (start here!)
├── .env.example                   # Example environment variables
├── .gitignore                     # Git ignore rules
├── pyproject.toml                 # Project dependencies
└── README.md                      # This file
```
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For OpenAI | Your OpenAI API key |
| `ANTHROPIC_API_KEY` | Optional | For future Anthropic support |
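These variables are typically loaded at startup (commonly via `python-dotenv`). For illustration only, here is a minimal, hypothetical parser for the `KEY=VALUE` format a `.env` file uses:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return env
```

In practice, prefer a maintained loader such as `python-dotenv`; this sketch just shows what the format amounts to.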
In `app.py`, you can adjust:
- `max_chars`: Maximum characters to scrape (default: 5000)
- Playwright fallback is automatic for JS-heavy sites
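To see what `max_chars` truncation involves, here is a hypothetical word-boundary version of the trim (the real scraper may simply slice the string):

```python
def truncate_text(text: str, max_chars: int = 5000) -> str:
    """Trim scraped text to max_chars, cutting at the last space
    so a word isn't split mid-token."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    return cut[:cut.rfind(" ")] if " " in cut else cut
```

Keeping the limit modest keeps prompts cheap and within the model's context window.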
- Add a new function in `app.py` following the pattern of `summarize_with_openai()` or `summarize_with_ollama()`
- Add the provider to the dropdown choices
- Update the `summarize_website()` function to handle the new provider
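Those three steps amount to a dispatch table keyed by provider name. A sketch with stand-in summarizer functions (all bodies here are illustrative placeholders, not the repo's actual code):

```python
def summarize_with_openai(text: str) -> str:
    """Stand-in for the real OpenAI-backed summarizer."""
    return f"[openai] {text[:20]}"

def summarize_with_ollama(text: str) -> str:
    """Stand-in for the real Ollama-backed summarizer."""
    return f"[ollama] {text[:20]}"

# Map the dropdown label to the implementation
PROVIDERS = {
    "OpenAI": summarize_with_openai,
    "Ollama": summarize_with_ollama,
}

def summarize_website(text: str, provider: str) -> str:
    """Dispatch to the chosen provider's summarizer."""
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return PROVIDERS[provider](text)
```

With this shape, adding a provider is one new function plus one dictionary entry.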
Edit the `SYSTEM_PROMPT` constant in `app.py` to change how summaries are generated.
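For context, the system prompt is paired with the scraped page text in the messages list sent to the model. A sketch of that pairing (the prompt wording and helper name below are invented for illustration):

```python
SYSTEM_PROMPT = (
    "You are a helpful assistant that summarizes website content. "
    "Respond in concise markdown."
)  # illustrative wording; the real constant lives in app.py

def build_messages(page_text: str) -> list[dict]:
    """Combine the system prompt with the scraped page text."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Summarize this website:\n\n{page_text}"},
    ]
```

Changing the system prompt changes tone, length, and format of every summary without touching any other code.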
```python
from scraper import fetch_website_contents, fetch_website_links

# Get website text content
content = fetch_website_contents("https://example.com", max_chars=5000)

# Get all links on a page
links = fetch_website_links("https://example.com")
```

The scraper includes a `fetch_website_links()` function that extracts all links from a page. Here are some ideas to build on this project:
Crawl an entire website by following links and summarize each page. Useful for getting a complete picture of a company or product.
```python
from scraper import fetch_website_contents, fetch_website_links

# Get all links from homepage
links = fetch_website_links("https://example.com")

# Filter to same-domain links, then summarize each
for link in links[:10]:  # Limit to avoid hammering the server
    summary = summarize_website(link)
    print(f"## {link}\n{summary}\n")
```

Find broken links or analyze where a website links to (external dependencies, partners, etc.).
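The "filter to same-domain links" step can be done with `urllib.parse` from the standard library (the helper name here is illustrative):

```python
from urllib.parse import urlparse

def same_domain(links: list[str], base_url: str) -> list[str]:
    """Keep only links whose host matches base_url's host."""
    base = urlparse(base_url).netloc
    return [link for link in links if urlparse(link).netloc == base]
```

This keeps a crawl from wandering off to social-media widgets and ad domains.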
Given a topic, scrape multiple sources and generate a consolidated summary with citations.
Periodically scrape a page and use an LLM to identify what changed since last time.
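One way to sketch this: fingerprint the page text to detect any change cheaply, then hand a unified diff to the LLM to describe what changed. Both helpers below are illustrative and standard-library-only:

```python
import difflib
import hashlib

def page_fingerprint(text: str) -> str:
    """Cheap change check: hash the page text."""
    return hashlib.sha256(text.encode()).hexdigest()

def describe_change(old: str, new: str, context: int = 1) -> str:
    """Unified diff to feed the LLM as raw 'what changed' material."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="", n=context
    )
    return "\n".join(diff)
```

Only invoke the LLM when the fingerprints differ; the diff keeps the prompt focused on the changed lines rather than the whole page.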
- Make sure you created the `.env` file (copy from `.env.example`)
- Check that `OPENAI_API_KEY` is set correctly
- Restart the app after changing `.env`
- Make sure Ollama is installed: https://ollama.com/download
- Start the server: `ollama serve`
- Check it's running: `ollama list`
- Pull the model first: `ollama pull llama3.2`
- Check available models: `ollama list`
- Some sites are heavily JavaScript-based
- Playwright fallback should handle most cases
- Very dynamic sites (React SPAs) may still be challenging
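A common heuristic for deciding when to fall back to Playwright is to check whether static scraping produced a reasonable amount of text; the actual condition in `scraper.py` may differ:

```python
def should_use_playwright(extracted_text: str, min_chars: int = 200) -> bool:
    """If static HTML parsing yielded almost no text, the page is
    probably rendered client-side and needs a real browser."""
    return len(extracted_text.strip()) < min_chars
```

The threshold is a judgment call: too low and JS-heavy pages slip through with empty summaries; too high and plain pages pay the cost of launching a browser.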
- API Keys: Never commit `.env` files. The `.gitignore` is configured to exclude them.
- Local Only: The Gradio app runs on `127.0.0.1` only, not exposed to your network.
- No Cloud: This app is designed for local use. No data is sent anywhere except to the LLM provider you choose.
- Private URLs Blocked: The scraper blocks requests to localhost and private IP ranges.
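Blocking private destinations typically means resolving the host and checking the resulting address with the `ipaddress` module. An illustrative sketch (not necessarily the scraper's exact logic):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_private_url(url: str) -> bool:
    """Return True if the URL points at localhost or a private/loopback IP."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return True  # fail closed on unresolvable hosts
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

This guards against SSRF-style mistakes, e.g. pasting an internal dashboard URL into the summarizer.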
MIT License - feel free to use, modify, and distribute.
Contributions welcome! Please open an issue or PR.