A multi-threaded Dark Web OSINT tool for crawling and archiving .onion websites through Tor.
This tool is intended for security research and educational purposes. Users are responsible for compliance with all applicable laws. Only use on systems you own or have explicit permission to test.
- Multi-threaded concurrent downloads (configurable workers)
- SQLite database for persistent storage
- Content deduplication via SHA-256 hashing (see the sketch after this list)
- Robots.txt compliance
- Automatic retry with exponential backoff
- HTML text extraction
- Tor circuit rotation for anonymity
- URL filtering (blacklist/whitelist with regex)
- Session persistence (resume interrupted crawls)
- Multi-format exports (JSON, CSV, TXT, HTML)
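As a rough illustration of the SHA-256 deduplication listed above (the names here are hypothetical, not torclawx's actual internals):

```python
import hashlib

seen_hashes: set[str] = set()

def is_duplicate(body: bytes) -> bool:
    """Hash the payload and skip it if an identical copy was already archived."""
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

In a persistent crawler the digests would typically be stored in the database rather than in memory, so deduplication survives restarts.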
```bash
pip install requests beautifulsoup4
```

Tor must be running on 127.0.0.1:9050 (default).
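A quick way to confirm Tor is reachable before crawling is to route a request through the SOCKS proxy. This sketch is just a connectivity check, not part of torclawx; note that SOCKS support in requests needs the PySocks extra (`pip install requests[socks]`), which is not in the dependency list above:

```python
import requests

# Route traffic through the local Tor SOCKS proxy.
# socks5h makes DNS resolution happen inside Tor as well.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

resp = requests.get(
    "https://check.torproject.org/api/ip",
    proxies=TOR_PROXIES,
    timeout=30,
)
print(resp.json())  # {"IsTor": true, "IP": "..."} when Tor is up
```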
- Install Tor:

  ```bash
  # Debian/Ubuntu
  sudo apt install tor
  sudo systemctl start tor

  # macOS
  brew install tor
  tor
  ```
- Install dependencies:

  ```bash
  pip install requests beautifulsoup4
  ```
```bash
torsocks python torclawx.py
```

When running, you'll be prompted for:
- Starting .onion URL
- Max crawl depth (default: 2)
- Delay between requests in seconds (default: 1)
- Max concurrent workers (default: 5)
- Download all resources vs HTML only
- Enable database storage
- Tor circuit rotation frequency
- URL filtering patterns (optional; a regex sketch follows this list)
- Resume previous session
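URL filtering of this kind typically combines a blacklist with an optional whitelist of regular expressions. A minimal sketch with made-up patterns (torclawx prompts for the real ones at startup):

```python
import re

# Hypothetical patterns, for illustration only.
BLACKLIST = [re.compile(p) for p in (r"/logout", r"\.exe$")]
WHITELIST = [re.compile(p) for p in (r"\.onion/docs/",)]

def url_allowed(url: str) -> bool:
    """Reject blacklist matches; if a whitelist is set, require a match."""
    if any(p.search(url) for p in BLACKLIST):
        return False
    if WHITELIST:
        return any(p.search(url) for p in WHITELIST)
    return True
```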
For circuit rotation, enable the Tor control port in /etc/tor/torrc, then restart Tor so the change takes effect:

```
ControlPort 9051
```
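One common way to request a fresh circuit from Python is the stem library's NEWNYM signal. This is a sketch of the general technique; stem is not among the dependencies listed above, so torclawx may implement rotation differently:

```python
from stem import Signal
from stem.control import Controller

# Connect to the control port enabled above. authenticate() handles
# cookie auth or no auth, depending on the torrc configuration.
with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)  # ask Tor to build fresh circuits
```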
```
onion_archive/
├── html/        # HTML pages
├── images/      # Images
├── css/         # Stylesheets
├── js/          # JavaScript
├── other/       # Other files
├── text/        # Extracted text
├── reports/     # HTML reports
├── exports/     # JSON/CSV/TXT
└── crawler.db   # SQLite database
```
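The schema of crawler.db isn't documented here, but Python's standard sqlite3 module can inspect it without assuming any table names:

```python
import sqlite3

con = sqlite3.connect("onion_archive/crawler.db")
# Enumerate tables straight from SQLite's own catalog.
for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
):
    print(name)
con.close()
```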
Version: 1.0
License: MIT License - See LICENSE file
Developer: DR4CBOI
Contact: dr4cboi@protonmail.com