A multi-threaded Dark Web OSINT tool for crawling and archiving .onion websites through Tor.
This tool is intended for security research and educational purposes. Users are responsible for compliance with all applicable laws. Only use on systems you own or have explicit permission to test.
- Multi-threaded concurrent downloads (configurable workers)
- SQLite database for persistent storage
- Content deduplication via SHA-256 hashing (see the sketch after this list)
- Robots.txt compliance
- Automatic retry with exponential backoff
- HTML text extraction
- Tor circuit rotation for anonymity
- URL filtering (blacklist/whitelist with regex)
- Session persistence (resume interrupted crawls)
- Multi-format exports (JSON, CSV, TXT, HTML)
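As a rough illustration of the SHA-256 deduplication listed above (the names here are hypothetical, not torclawx's actual internals):

```python
import hashlib

seen_hashes: set[str] = set()

def is_duplicate(body: bytes) -> bool:
    """Hash the payload and skip it if an identical copy was already archived."""
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

In a persistent crawler the digests would typically be stored in the database rather than in memory, so deduplication survives restarts.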
```bash
pip install requests beautifulsoup4
```

Tor must be running on 127.0.0.1:9050 (default).
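A quick way to confirm Tor is reachable before crawling is to route a request through the SOCKS proxy. This sketch is just a connectivity check, not part of torclawx; note that SOCKS support in requests needs the PySocks extra (`pip install requests[socks]`), which is not in the dependency list above:

```python
import requests

# Route traffic through the local Tor SOCKS proxy.
# socks5h makes DNS resolution happen inside Tor as well.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

resp = requests.get(
    "https://check.torproject.org/api/ip",
    proxies=TOR_PROXIES,
    timeout=30,
)
print(resp.json())  # {"IsTor": true, "IP": "..."} when Tor is up
```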
- Install Tor:

  ```bash
  # Debian/Ubuntu
  sudo apt install tor
  sudo systemctl start tor

  # macOS
  brew install tor
  tor
  ```
- Install dependencies:

  ```bash
  pip install requests beautifulsoup4
  ```
```bash
torsocks python torclawx.py
```

When running, you'll be prompted for:
- Starting .onion URL
- Max crawl depth (default: 2)
- Delay between requests in seconds (default: 1)
- Max concurrent workers (default: 5)
- Download all resources vs HTML only
- Enable database storage
- Tor circuit rotation frequency
- URL filtering patterns (optional; a regex sketch follows this list)
- Resume previous session
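URL filtering of this kind typically combines a blacklist with an optional whitelist of regular expressions. A minimal sketch with made-up patterns (torclawx prompts for the real ones at startup):

```python
import re

# Hypothetical patterns, for illustration only.
BLACKLIST = [re.compile(p) for p in (r"/logout", r"\.exe$")]
WHITELIST = [re.compile(p) for p in (r"\.onion/docs/",)]

def url_allowed(url: str) -> bool:
    """Reject blacklist matches; if a whitelist is set, require a match."""
    if any(p.search(url) for p in BLACKLIST):
        return False
    if WHITELIST:
        return any(p.search(url) for p in WHITELIST)
    return True
```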
For circuit rotation, enable the Tor control port in /etc/tor/torrc, then restart Tor so the change takes effect:

```
ControlPort 9051
```
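One common way to request a fresh circuit from Python is the stem library's NEWNYM signal. This is a sketch of the general technique; stem is not among the dependencies listed above, so torclawx may implement rotation differently:

```python
from stem import Signal
from stem.control import Controller

# Connect to the control port enabled above. authenticate() handles
# cookie auth or no auth, depending on the torrc configuration.
with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)  # ask Tor to build fresh circuits
```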
```
onion_archive/
├── html/        # HTML pages
├── images/      # Images
├── css/         # Stylesheets
├── js/          # JavaScript
├── other/       # Other files
├── text/        # Extracted text
├── reports/     # HTML reports
├── exports/     # JSON/CSV/TXT
└── crawler.db   # SQLite database
```
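The schema of crawler.db isn't documented here, but Python's standard sqlite3 module can inspect it without assuming any table names:

```python
import sqlite3

con = sqlite3.connect("onion_archive/crawler.db")
# Enumerate tables straight from SQLite's own catalog.
for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
):
    print(name)
con.close()
```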
Version: 1.0
License: MIT License - See LICENSE file
Developer: DR4CBOI
Contact: dr4cboi@protonmail.com