A modular system to discover what people actually want and struggle with on Reddit, focusing on business and entrepreneurship topics.
Instead of starting from predefined questions, the system generates exploratory queries and applies sentiment analysis to the Reddit discussions they return, surfacing user needs, pain points, and desires.
- Generate Subqueries: Create targeted search queries for different subreddits and topics
- Firecrawl Search: Use Firecrawl API to search Reddit with rate limiting (5 searches/minute)
- Extract Insights: Process results to identify pain points, desires, and solutions
- query_generator.py - Generates and saves discovery queries to JSON
- reddit_scraper.py - Loads queries and scrapes Reddit for user insights
- crawl.py - Main runner that shows usage and system status
- requirements.txt - Python dependencies
- discovery_queries_YYYYMMDD_HHMMSS.json - Generated query tables
- user_insights_YYYYMMDD_HHMMSS.json - Final scraped insights
- user_insights_YYYYMMDD_HHMMSS_progress.json - Progress saves during scraping
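The timestamped file names above follow a `YYYYMMDD_HHMMSS` pattern. As a minimal sketch of how such names could be produced with the standard `datetime` module (the helper name `timestamped_name` is hypothetical and not taken from the project code):

```python
from datetime import datetime


def timestamped_name(prefix: str, when: datetime, suffix: str = "") -> str:
    """Build names like discovery_queries_YYYYMMDD_HHMMSS.json (hypothetical helper)."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}{suffix}.json"


# Fixed example time so the output is reproducible; a real run would likely use datetime.now().
run_time = datetime(2025, 1, 15, 9, 30, 5)
print(timestamped_name("discovery_queries", run_time))
print(timestamped_name("user_insights", run_time))
print(timestamped_name("user_insights", run_time, suffix="_progress"))
```

Because the stamp sorts lexicographically in chronological order, the newest file of each kind can be found with a plain sorted directory listing.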
# Install dependencies
pip install firecrawl-py python-dotenv
# Create .env file with your Firecrawl API key
echo "FIRECRAWL_API_KEY=your_api_key_here" > .env
python3 query_generator.py
This creates a JSON file with 682+ discovery queries including:
- Base discovery queries (42)
- Subreddit-specific queries (190)
- Topic-specific queries (450)
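The three query categories can be pictured as combinations of discovery phrases with subreddits and topics. The sketch below is an illustration only: the phrase, subreddit, and topic lists are shortened samples, and the query string formats (including the `reddit.com/r/...` form) are assumptions rather than the formats `query_generator.py` actually emits.

```python
from itertools import product

# Hypothetical, shortened sample lists; the real generator uses more entries.
PHRASES = ["frustrated with", "struggling with", "wish I could", "alternative to"]
SUBREDDITS = ["entrepreneur", "startups", "smallbusiness"]
TOPICS = ["marketing", "sales", "funding"]


def build_queries(phrases, subreddits, topics):
    """Return query lists keyed by category (string formats are assumed)."""
    return {
        "base": [f"{p} reddit.com" for p in phrases],
        "subreddit": [f"{p} reddit.com/r/{s}" for p, s in product(phrases, subreddits)],
        "topic": [f"{p} {t} reddit.com" for p, t in product(phrases, topics)],
    }


queries = build_queries(PHRASES, SUBREDDITS, TOPICS)
total = sum(len(group) for group in queries.values())
print(total)  # 4 + 12 + 12 = 28 with these sample lists
```

The same arithmetic explains the real totals: 42 base, 190 subreddit-specific, and 450 topic-specific queries add up to 682.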
python3 reddit_scraper.py
This loads the queries and scrapes Reddit to find user insights. The scraper stays within Firecrawl's rate limit of 5 searches per minute by waiting 15 seconds between searches, slightly longer than the 12-second minimum that limit implies.
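A fixed delay between calls is the simplest way to stay under a per-minute limit. The following is a minimal sketch of that idea, not the scraper's actual code: `run_searches` is a hypothetical helper, and the `search` callable stands in for whatever Firecrawl call the scraper makes. The `sleep` parameter is injectable so the demo finishes instantly.

```python
import time
from typing import Callable, Iterable

SEARCHES_PER_MINUTE = 5
MIN_INTERVAL = 60 / SEARCHES_PER_MINUTE  # 12.0 seconds is the tightest safe spacing
DELAY_SECONDS = 15  # the delay this README describes, leaving a 3-second margin


def run_searches(
    queries: Iterable[str],
    search: Callable[[str], object],
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call search(query) for each query, pausing between consecutive calls."""
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            sleep(delay)  # wait before every search except the first
        results.append(search(query))
    return results


# Demo with a stand-in search function and a recording sleep, so nothing actually waits.
waits = []
out = run_searches(["a", "b", "c"], search=str.upper, sleep=waits.append)
print(out, waits)
```

At 15 seconds per search, the full set of 682 queries would take roughly 170 minutes, which is why saving progress during a run matters.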
python3 crawl.py
Shows system status, file checks, and usage instructions.
The system generates queries to discover:
- "frustrated with reddit.com"
- "struggling with reddit.com"
- "having trouble reddit.com"
- "problem with reddit.com"
- "want to reddit.com"
- "wish I could reddit.com"
- "looking for reddit.com"
- "need reddit.com"
- "worked for me reddit.com"
- "succeeded in reddit.com"
- "helped me reddit.com"
- "better than reddit.com"
- "alternative to reddit.com"
- "vs reddit.com"
Targeted subreddits include:
- r/entrepreneur, r/startups, r/smallbusiness
- r/marketing, r/sales, r/finance
- r/webdev, r/programming, r/SaaS
- r/AskReddit, r/legaladvice, r/careerguidance
- And more...
The scraper analyzes content for:
- Insight Type: pain_point, desire, solution, comparison
- Intensity: 1-10 scale based on emotional language
- Key Phrases: business terms (marketing, sales, funding, etc.)
- Platform Classification: Reddit-focused
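To make the analysis fields concrete, here is a minimal keyword-based sketch. It is an assumption about how such scoring could work, not the scraper's actual logic: the cue lists, the intensifier words, the scoring rule, and the `analyze` function name are all hypothetical.

```python
import re

# Hypothetical cue lists, chosen for illustration only.
TYPE_CUES = {
    "pain_point": ["frustrated", "struggling", "trouble", "problem"],
    "desire": ["want to", "wish i could", "looking for", "need"],
    "solution": ["worked for me", "succeeded", "helped me"],
    "comparison": ["better than", "alternative to", " vs "],
}
INTENSIFIERS = ["really", "extremely", "hate", "desperate", "!"]
KEY_TERMS = ["marketing", "sales", "funding"]


def analyze(text: str) -> dict:
    """Classify one piece of text (assumed rules, simple substring matching)."""
    lowered = text.lower()
    # The first type with a matching cue wins; None if nothing matches.
    insight_type = next(
        (name for name, cues in TYPE_CUES.items() if any(c in lowered for c in cues)),
        None,
    )
    # Intensity starts at 1, gains 2 per intensifier occurrence, and is capped at 10.
    hits = sum(lowered.count(word) for word in INTENSIFIERS)
    intensity = min(10, 1 + 2 * hits)
    # Key phrases are matched on word boundaries to avoid partial-word hits.
    key_phrases = [k for k in KEY_TERMS if re.search(rf"\b{k}\b", lowered)]
    return {"insight_type": insight_type, "intensity": intensity, "key_phrases": key_phrases}


print(analyze("I'm really frustrated with marketing, nothing works!"))
# {'insight_type': 'pain_point', 'intensity': 5, 'key_phrases': ['marketing']}
```

A rule-based approach like this is easy to inspect and tune, at the cost of missing sarcasm and phrasing outside the cue lists.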
{
  "total_insights": 150,
  "timestamp": "2025-01-XX...",
  "insights": [...],
  "insight_summary": {
    "by_type": {"pain_point": 45, "desire": 38, "solution": 67},
    "by_intensity": {"high": 23, "medium": 89, "low": 38},
    "top_key_phrases": {"marketing": 15, "sales": 12, "funding": 8},
    "platforms": {"reddit": 150}
  }
}
- Modular Architecture: Separate query generation and scraping
- No Pre-defined Questions: Discovers what people actually want
- Rate Limited: Respects Firecrawl's 5 searches/minute limit
- Progress Saving: Saves progress during long scraping sessions
- Business-Focused: Targets entrepreneurship and business topics
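The progress-saving feature can be sketched as periodic checkpoints written during the scraping loop. This is an illustration under assumptions: the function names, the checkpoint interval `every`, and the write-then-rename approach are choices made here, not details confirmed by the project code.

```python
import json
import os
import tempfile


def save_progress(insights: list, path: str) -> None:
    """Write the insights collected so far, replacing the file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"total_insights": len(insights), "insights": insights}, handle, indent=2)
    os.replace(tmp_path, path)  # rename over the old file so readers never see a partial write


def scrape_with_checkpoints(queries, search, path, every=10):
    """Run search(query) per query and checkpoint every `every` queries (assumed cadence)."""
    insights = []
    for count, query in enumerate(queries, start=1):
        insights.extend(search(query))
        if count % every == 0:
            save_progress(insights, path)
    save_progress(insights, path)  # final save covers any remainder
    return insights


# Demo with a stand-in search function and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "user_insights_progress.json")
    scrape_with_checkpoints(["q1", "q2", "q3"], lambda q: [{"query": q}], target, every=2)
    with open(target, encoding="utf-8") as handle:
        print(json.load(handle)["total_insights"])  # 3
```

Writing to a temporary file and renaming it means an interrupted run leaves the previous checkpoint intact rather than a truncated JSON file.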
- firecrawl-py - Web scraping and search
- python-dotenv - Environment variable management
- json - Data serialization
- datetime - Timestamping
- Run query_generator.py → Creates query table
- Run reddit_scraper.py → Discovers user insights
- Analyze JSON output → Find business opportunities
- Use insights → Build products people actually want
This system helps you understand what people really struggle with and want, rather than guessing or using predefined questions.
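As a starting point for analyzing the JSON output, the sketch below ranks and normalizes the counts in `insight_summary`. The sample dictionary copies the figures from the example output in this README; the helper names `top_items` and `shares` are hypothetical, and a real run would load the data with `json.load` from a `user_insights_*.json` file instead.

```python
# Sample mirroring the insight_summary structure shown in this README.
summary = {
    "by_type": {"pain_point": 45, "desire": 38, "solution": 67},
    "by_intensity": {"high": 23, "medium": 89, "low": 38},
    "top_key_phrases": {"marketing": 15, "sales": 12, "funding": 8},
    "platforms": {"reddit": 150},
}


def top_items(counts: dict, n: int = 3) -> list:
    """Return the n largest (name, count) pairs, highest count first."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


def shares(counts: dict) -> dict:
    """Convert raw counts to fractions of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(value / total, 2) for name, value in counts.items()} if total else {}


print(top_items(summary["by_type"]))
# [('solution', 67), ('pain_point', 45), ('desire', 38)]
print(shares(summary["by_intensity"]))
# {'high': 0.15, 'medium': 0.59, 'low': 0.25}
print(top_items(summary["top_key_phrases"], n=1))
# [('marketing', 15)]
```

Shares are often more useful than raw counts when comparing runs of different sizes.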