Track competitor mentions across the web using AI-powered search and LLM extraction. Automatically monitors competitors, extracts competitive intelligence events, and stores structured data in PostgreSQL for analysis.
This pipeline automatically:
- Searches the web using Tavily AI (AI-native search engine optimized for agents)
- Extracts competitive intelligence events using LLM analysis (GPT-4o-mini via OpenRouter):
  - Product launches and feature releases
  - Partnerships and collaborations
  - Funding rounds and financial news
  - Key executive hires/departures
  - Acquisitions and mergers
- Indexes both raw articles and extracted events in PostgreSQL
- Enables queries like:
  - "What has OpenAI been doing recently?"
  - "Which competitors are making the most news?"
  - "Find all partnership announcements"
  - "What are the most significant competitive moves this week?"
- PostgreSQL Database - Choose one option:
  - Local PostgreSQL installation
  - Cloud PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database, etc.)
- Python 3.11+ - Required for CocoIndex
- API Keys (required):
  - Tavily API key from tavily.com (free tier: 1,000 searches/month)
  - OpenRouter API key for LLM extraction via GPT-4o-mini (cost-effective: ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens)
Choose Option A (Local) or Option B (Cloud):
# Install PostgreSQL (macOS)
brew install postgresql@15
brew services start postgresql@15
# Create database
createdb competitive_intel
# Your connection string:
# postgresql://username:password@localhost:5432/competitive_intel
Google Cloud SQL Example:
- Create PostgreSQL instance in Google Cloud Console
- Note the Public IP address (e.g., 34.71.19.121)
- Create database: postgres (or custom name)
- Set password for postgres user
- Allow your IP in Cloud SQL connections
Connection string format:
postgresql://postgres:YOUR_PASSWORD@PUBLIC_IP:5432/postgres
💡 Special characters in password? URL-encode them:
@ → %40, # → %23, & → %26
Example: Password Lucas@123 becomes Lucas%40123
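The encoding rule above can also be applied programmatically. The following is a minimal sketch using Python's standard `urllib.parse.quote`; the `build_database_url` helper and its parameters are illustrative assumptions and are not part of this project:

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str,
                       port: int = 5432, database: str = "postgres") -> str:
    """Build a PostgreSQL URL, percent-encoding the user and password.

    Hypothetical helper for illustration; not part of this repository.
    """
    # safe="" makes quote() encode every reserved character, including "/".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_database_url("postgres", "Lucas@123", "34.71.19.121"))
# prints postgresql://postgres:Lucas%40123@34.71.19.121:5432/postgres
```

The printed value can be pasted into DATABASE_URL and COCOINDEX_DATABASE_URL as is.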
AWS RDS / Azure: Same format, just use your cloud database endpoint instead of public IP.
pip install -e .
Copy the example environment file and add your credentials:
cp .env.example .env
Edit .env and set:
- DATABASE_URL - Your PostgreSQL connection string (from Step 1)
- COCOINDEX_DATABASE_URL - Same as DATABASE_URL (required by CocoIndex)
- OPENAI_API_KEY - OpenRouter API key from openrouter.ai
- TAVILY_API_KEY - Tavily API key from tavily.com
- COMPETITORS - Comma-separated list of companies to track
- SEARCH_DAYS_BACK - How many days back to search (default: 7)
Example (Local PostgreSQL):
DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
COCOINDEX_DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
OPENAI_API_KEY=sk-or-v1-...
TAVILY_API_KEY=tvly-...
COMPETITORS=OpenAI,Anthropic,Google AI,Meta AI,Mistral AI
REFRESH_INTERVAL_SECONDS=3600
SEARCH_DAYS_BACK=7
Example (Google Cloud SQL):
DATABASE_URL=postgresql://postgres:Lucas%40123@34.71.19.121:5432/postgres
COCOINDEX_DATABASE_URL=postgresql://postgres:Lucas%40123@34.71.19.121:5432/postgres
OPENAI_API_KEY=sk-or-v1-...
TAVILY_API_KEY=tvly-...
COMPETITORS=Apple,Google,Microsoft,Amazon,Meta
REFRESH_INTERVAL_SECONDS=3600
SEARCH_DAYS_BACK=7
Option A: Interactive Mode (Recommended for first-time users)
Run the interactive CLI that prompts you for what to monitor:
python3 run_interactive.py
This will ask you:
- Which companies to track
- What types of events to focus on (product launches, partnerships, funding, etc.)
- Time range to search (default: 7 days)
- How many articles per company (default: 10)
- One-time sync or continuous monitoring
See INTERACTIVE_DEMO.md for example sessions and use cases.
Option B: Direct Mode (For automated/scheduled runs)
Initial sync:
cocoindex update main -f
Continuous monitoring (live mode):
cocoindex update -L main.py
Run the test script to verify data extraction:
python3 test_results.py
Save extracted intelligence to a text file:
python3 generate_report.py
This creates intelligence_report_YYYY-MM-DD_HH-MM-SS.txt with:
- Summary statistics
- Event type distribution
- Competitor rankings
- Detailed intelligence by company
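For scripts that need to locate or predict the report file, the timestamp pattern in the file name maps directly onto a `strftime` format string. This is a small sketch only; the `report_filename` function is a hypothetical name and may not match how generate_report.py builds the name internally:

```python
from datetime import datetime
from typing import Optional


def report_filename(now: Optional[datetime] = None) -> str:
    """Return a name shaped like intelligence_report_YYYY-MM-DD_HH-MM-SS.txt.

    Hypothetical helper; it only mirrors the naming pattern described above.
    """
    now = now or datetime.now()
    return f"intelligence_report_{now.strftime('%Y-%m-%d_%H-%M-%S')}.txt"


print(report_filename(datetime(2025, 1, 2, 3, 4, 5)))
# prints intelligence_report_2025-01-02_03-04-05.txt
```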
See USAGE_GUIDE.md for more commands and TESTING.md for comprehensive testing.
Once the pipeline is running, you can query your competitive intelligence:
"What has Anthropic been doing recently?"
→ Uses: search_by_competitor(competitor="Anthropic")
"Find funding news about OpenAI"
→ Uses: search_by_competitor(competitor="OpenAI", event_type="funding")
"What are the most significant competitive moves this week?"
→ Uses: get_high_significance_events(days=7)
"Which AI companies are making the most news?"
→ Uses: get_trending_competitors(days=7)
"What partnerships has Google AI announced?"
→ Uses: search_partnerships(partner="Google AI")
Stores raw articles from news sources and blogs:
- id - Article URL (primary key)
- title - Article headline
- content - Article text/summary
- url - Source URL
- source - Publisher name
- published_at - Publication timestamp
Stores extracted competitive intelligence events:
- article_id - Reference to source article
- event_type - Category: product_launch, partnership, funding, key_hire, acquisition
- competitor - Primary company involved
- description - Event summary
- significance - Impact rating: high, medium, low
- related_companies - Other companies mentioned (partners, investors, etc.)
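To make the relationship between the two tables concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without any setup. The table names `articles` and `events` are assumptions (the real names depend on how main.py exports them), and `related_companies` is stored as plain text here for simplicity; the column names follow the lists above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.executescript("""
    CREATE TABLE articles (
        id TEXT PRIMARY KEY, title TEXT, content TEXT,
        url TEXT, source TEXT, published_at TEXT
    );
    CREATE TABLE events (
        article_id TEXT REFERENCES articles(id),
        event_type TEXT, competitor TEXT, description TEXT,
        significance TEXT, related_companies TEXT
    );
""")

# Made-up sample rows, for illustration only.
conn.execute(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)",
    ("https://example.com/a1", "Example headline", "Example body",
     "https://example.com/a1", "Example News", "2025-01-01T00:00:00Z"),
)
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    ("https://example.com/a1", "partnership", "ExampleCo",
     "ExampleCo partners with OtherCo", "high", "OtherCo"),
)

# Join each event back to its source article for full context.
rows = conn.execute("""
    SELECT e.competitor, e.event_type, e.significance, a.title, a.source
    FROM events AS e
    JOIN articles AS a ON a.id = e.article_id
    WHERE e.event_type = ?
""", ("partnership",)).fetchall()
print(rows)
```

Against PostgreSQL the same join should work with a driver such as psycopg, using `%s` placeholders instead of `?`.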
Edit main.py TavilySearchSource configuration:
flow.add_source(
    TavilySearchSource(
        api_key=tavily_api_key,
        competitor=competitor.strip(),
        days_back=7,      # Adjust lookback period
        max_results=20,   # Increase results per competitor
    ),
    refresh_interval_seconds=1800,  # Check every 30 minutes
)
Modify the search query in TavilySearchSource (line ~65):
search_query = (
    f"{self.competitor} AND "
    f"(funding OR partnership OR product launch OR acquisition OR executive hire OR regulatory)"
)
Edit .env to track different companies:
COMPETITORS=Company1,Company2,Company3
Edit the CompetitiveEvent model in main.py to track different event categories.
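As a rough illustration of what such a change could look like, the sketch below models the event fields listed in this README as a plain dataclass and adds a hypothetical `regulatory` category. The actual definition in main.py may differ; the `EVENT_TYPES` and `SIGNIFICANCE_LEVELS` tuples and the `validate` method are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Categories named in this README, plus one hypothetical addition.
EVENT_TYPES = (
    "product_launch", "partnership", "funding", "key_hire", "acquisition",
    "regulatory",  # hypothetical new category
)
SIGNIFICANCE_LEVELS = ("high", "medium", "low")


@dataclass
class CompetitiveEvent:
    """Illustrative stand-in for the model in main.py (fields from this README)."""
    event_type: str
    competitor: str
    description: str
    significance: str
    related_companies: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a field holds an unknown category or level."""
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if self.significance not in SIGNIFICANCE_LEVELS:
            raise ValueError(f"unknown significance: {self.significance!r}")


event = CompetitiveEvent(
    event_type="regulatory",
    competitor="ExampleCo",
    description="ExampleCo responds to a new compliance rule",
    significance="medium",
)
event.validate()  # passes because "regulatory" was added to EVENT_TYPES
```

If a new category is added, the search query keywords probably need a matching term so that relevant articles are retrieved in the first place.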
Adjust REFRESH_INTERVAL_SECONDS in .env:
- 3600 = hourly (default)
- 1800 = every 30 minutes
- 86400 = daily
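The .env settings described in this section could be read along the following lines. This is a sketch under stated assumptions: `load_settings` is a hypothetical helper, and main.py may parse these variables differently. The defaults (3600 seconds, 7 days) are the ones documented in this README.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Parse pipeline settings from environment-style key/value pairs.

    Hypothetical helper for illustration; not part of this repository.
    """
    env = os.environ if env is None else env
    # Split the comma-separated list and drop blanks and stray whitespace.
    competitors = [
        name.strip()
        for name in env.get("COMPETITORS", "").split(",")
        if name.strip()
    ]
    return {
        "competitors": competitors,
        "refresh_interval_seconds": int(env.get("REFRESH_INTERVAL_SECONDS", "3600")),
        "search_days_back": int(env.get("SEARCH_DAYS_BACK", "7")),
    }


settings = load_settings({"COMPETITORS": "OpenAI, Anthropic,,Google AI"})
print(settings["competitors"])  # prints ['OpenAI', 'Anthropic', 'Google AI']
```

Passing a plain dictionary, as above, makes the parsing easy to test without touching the real environment.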
CocoIndex provides CocoInsight (free beta) for visualizing data lineage and debugging:
- See how data flows through the pipeline
- Inspect LLM extraction results
- Troubleshoot indexing issues
Visit the CocoIndex documentation for CocoInsight setup.
┌─────────────────────────────────────────────────────────────────────┐
│ COMPETITIVE INTELLIGENCE MONITOR │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Tavily AI │──────▶│ CocoIndex │──────▶│ PostgreSQL │
│ Search │ │ Pipeline │ │ Database │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
│ │ │
▼ ▼ ▼
Articles Extraction Intelligence
(web data) (GPT-4o-mini) (structured)
- Data Ingestion (Tavily AI Search)
  - Searches web for competitor mentions
  - Filters by time range (configurable: 1-30 days)
  - Returns clean, full article content
  - Output: Raw articles with metadata
- LLM Extraction (GPT-4o-mini via OpenRouter)
  - Processes article content through LLM
  - Extracts structured CompetitiveEvent objects
  - Classifies: product launches, partnerships, funding, hires, acquisitions
  - Assigns significance: high, medium, low
  - Output: Structured intelligence events
- Dual Indexing (CocoIndex + PostgreSQL)
  - Articles Table: Raw content, URLs, sources, timestamps
  - Events Table: Extracted intelligence with relationships
  - Incremental updates (only new data processed)
  - Output: Queryable database
- Query Layer (SQL + Python)
  - Search by competitor
  - Filter by event type
  - Rank by significance
  - Trend analysis
  - Output: Intelligence reports
- Incremental Processing: CocoIndex tracks processed articles, avoiding duplicate work
- Dual Indexing: Both raw content and extracted entities for maximum flexibility
- Weighted Scoring: High-significance events = 3 points, medium = 2, low = 1
- Relational Queries: Join articles with events for full context
- Real-time Monitoring: Continuous mode refreshes every hour (configurable)
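The weighted scoring rule in the list above (high = 3, medium = 2, low = 1) can be sketched in a few lines of Python. The `rank_competitors` function and the sample events are illustrative assumptions; the project's own ranking query may be implemented in SQL and may differ in detail.

```python
from collections import Counter

# Weights as stated in this README.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def rank_competitors(events):
    """Sum significance weights per competitor, highest total first.

    `events` is an iterable of (competitor, significance) pairs.
    Hypothetical helper for illustration.
    """
    scores = Counter()
    for competitor, significance in events:
        scores[competitor] += WEIGHTS[significance]
    return scores.most_common()


sample = [
    ("OpenAI", "high"), ("OpenAI", "low"),
    ("Anthropic", "medium"), ("Anthropic", "medium"), ("Anthropic", "low"),
]
print(rank_competitors(sample))  # prints [('Anthropic', 5), ('OpenAI', 4)]
```

In this sample, three lower-impact events outrank one high-impact event plus one low-impact event, which shows that the score mixes volume with significance.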
Tavily is an AI-native search engine designed specifically for AI agents and LLMs:
- Clean content extraction - Returns full article text, not just snippets
- Relevance scoring - Built-in ranking for competitive intelligence
- No scraping needed - Handles content extraction and cleaning
- Free tier - 1,000 searches/month (at one search per competitor per refresh, roughly enough to check one competitor hourly or five competitors every four hours)
- Advanced search - Deeper crawling for comprehensive results
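Whether a given schedule fits inside the free tier can be checked with simple arithmetic. The sketch below assumes one Tavily search per competitor per refresh and a 30-day month; both are assumptions, since the pipeline may issue more or fewer requests. `monthly_searches` is a hypothetical helper.

```python
FREE_TIER_SEARCHES_PER_MONTH = 1000  # free-tier figure quoted in this README


def monthly_searches(competitors: int, refresh_interval_seconds: int,
                     days: int = 30) -> int:
    """Estimate searches per month, assuming one search per competitor per refresh."""
    refreshes_per_day = 86400 // refresh_interval_seconds
    return competitors * refreshes_per_day * days


for competitors, interval in [(1, 3600), (5, 3600), (5, 14400), (10, 86400)]:
    used = monthly_searches(competitors, interval)
    verdict = "within" if used <= FREE_TIER_SEARCHES_PER_MONTH else "over"
    print(f"{competitors} competitors every {interval}s: {used} searches ({verdict} free tier)")
```

Under these assumptions, raising REFRESH_INTERVAL_SECONDS is the simplest way to track more companies without exceeding the quota.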
- Refine search queries - Add industry-specific keywords or event types
- Add custom event types - Track regulation changes, PR crises, etc.
- Sentiment analysis - Classify news as positive/negative/neutral
- Alert system - Get notified of high-significance events via email/Slack
- Dashboard - Build a web UI for exploring competitive intelligence
- Export reports - Generate weekly/monthly competitor summary reports
competitive-intelligence/
├── main.py # Core pipeline definition
├── run_interactive.py # Interactive CLI for easy setup
├── test_results.py # Validation and testing script
├── generate_report.py # Report generation tool
├── clear_and_run.py # Fresh data testing utility
├── pyproject.toml # Project dependencies
├── .env.example # Environment template
├── .env # Your credentials (git-ignored)
│
├── README.md # This file
├── QUICKSTART.md # 3-minute setup guide
├── USAGE_GUIDE.md # Complete command reference
├── TESTING.md # Testing procedures
├── INTERACTIVE_DEMO.md # Interactive mode examples
├── CLAUDE.md # Developer guidance
├── CONTRIBUTING.md # Contribution guidelines
└── LICENSE # MIT License
We welcome contributions! See CONTRIBUTING.md for guidelines.
- Report bugs via GitHub Issues
- Submit feature requests
- Improve documentation
- Add new data sources
- Create new query handlers
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with CocoIndex - Modern data pipeline framework
- Powered by Tavily AI Search - AI-native search engine
- LLM extraction via OpenRouter - Multi-model API gateway
- Documentation: Full docs | Quick Start | Usage Guide
- Issues: Report bugs or request features via GitHub Issues
- CocoIndex: cocoindex.io
- Examples: github.com/cocoindex-io/cocoindex
Built with ❤️ using CocoIndex | Track your competitors automatically