DR4CBOI/TorClawx

Torclawx

A multi-threaded Dark Web OSINT tool for crawling and archiving .onion websites through Tor.

Disclaimer

This tool is intended for security research and educational purposes. Users are responsible for compliance with all applicable laws. Only use on systems you own or have explicit permission to test.

Features

  • Multi-threaded concurrent downloads (configurable workers)
  • SQLite database for persistent storage
  • Content deduplication via SHA256 hashing
  • Robots.txt compliance
  • Automatic retry with exponential backoff
  • HTML text extraction
  • Tor circuit rotation for anonymity
  • URL filtering (blacklist/whitelist with regex)
  • Session persistence (resume interrupted crawls)
  • Multi-format exports (JSON, CSV, TXT, HTML)
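Two of the features above, SHA256 deduplication and retry with exponential backoff, can be illustrated with a minimal sketch. This is not taken from torclawx.py: the names `ContentDeduplicator`, `backoff_delays`, and `fetch_with_retry`, as well as the base delay and cap, are assumptions chosen for illustration only.

```python
import hashlib
import time


class ContentDeduplicator:
    """Remembers SHA-256 digests of bodies that were already stored (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, body: bytes) -> bool:
        """Return True the first time a body is seen, False for exact duplicates."""
        digest = hashlib.sha256(body).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0):
    """Yield base * 2**attempt seconds for each retry, never exceeding cap."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def fetch_with_retry(fetch, url, max_retries: int = 3, base: float = 1.0, sleep=time.sleep):
    """Call fetch(url) once, then retry up to max_retries times with growing waits.

    Catching Exception is a simplification; a real crawler would likely
    catch only network errors such as requests.RequestException.
    """
    last_error = None
    for delay in [0.0, *backoff_delays(max_retries, base)]:
        if delay:
            sleep(delay)
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
    raise last_error
```

With the assumed defaults, a failing request would be retried after waits of 1, 2, and 4 seconds before the last error is raised.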

Requirements

pip install requests beautifulsoup4

Tor must be running with its SOCKS proxy listening on 127.0.0.1:9050 (the default).
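Before starting a crawl, it can help to confirm that something is listening on the SOCKS port. The sketch below is an illustrative check and not part of the tool; the function name `tor_socks_reachable` is an assumption. It only verifies that the TCP port accepts connections, not that the listener is actually Tor.

```python
import socket


def tor_socks_reachable(host: str = "127.0.0.1", port: int = 9050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if tor_socks_reachable():
        print("A service is listening on 127.0.0.1:9050")
    else:
        print("Nothing is listening on 127.0.0.1:9050; is Tor running?")
```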

Installation

  1. Install Tor:

    # Debian/Ubuntu
    sudo apt install tor
    sudo systemctl start tor
    
    # macOS
    brew install tor
    tor    # runs in the foreground; keep this terminal open
  2. Install dependencies:

    pip install requests beautifulsoup4

Usage

torsocks python torclawx.py

Configuration Options

When running, you'll be prompted for:

  • Starting .onion URL
  • Max crawl depth (default: 2)
  • Delay between requests in seconds (default: 1)
  • Max concurrent workers (default: 5)
  • Whether to download all resources or HTML only
  • Whether to enable database storage
  • Tor circuit rotation frequency
  • URL filtering patterns (optional)
  • Resume previous session
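The options and defaults listed above could be modeled roughly as follows. This is a sketch under stated assumptions, not the tool's actual code: `CrawlConfig`, `parse_answer`, and `allows` are hypothetical names, and the rule that a blacklist match takes precedence over the whitelist is an assumption about how the filtering behaves.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CrawlConfig:
    """Hypothetical container for the prompted options, using the documented defaults."""
    start_url: str = ""
    max_depth: int = 2
    delay_seconds: float = 1.0
    max_workers: int = 5
    whitelist: list = field(default_factory=list)  # regex patterns a URL must match
    blacklist: list = field(default_factory=list)  # regex patterns that exclude a URL

    def allows(self, url: str) -> bool:
        """Assumed rule: blacklist wins; an empty whitelist allows everything else."""
        if any(re.search(pattern, url) for pattern in self.blacklist):
            return False
        if self.whitelist:
            return any(re.search(pattern, url) for pattern in self.whitelist)
        return True


def parse_answer(raw: str, default, cast):
    """Return cast(raw), or the default when the user just pressed Enter."""
    raw = raw.strip()
    return cast(raw) if raw else default
```

A prompt could then be written as `parse_answer(input("Max crawl depth [2]: "), 2, int)`, so that an empty answer keeps the default.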

Tor Control (Optional)

For circuit rotation, enable the Tor control port in /etc/tor/torrc, then restart Tor:

ControlPort 9051
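With the control port enabled, a new circuit can be requested by sending `SIGNAL NEWNYM` over the Tor control protocol. The sketch below uses only the standard library and is not necessarily how torclawx.py does it; the function names are hypothetical. It assumes either no control-port authentication or password authentication, and does not handle cookie authentication.

```python
import socket


def build_newnym_commands(password: str = "") -> list:
    """Return the control-protocol lines: authenticate, request new circuits, quit."""
    # Quote the password as a control-protocol QuotedString.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [f'AUTHENTICATE "{escaped}"', "SIGNAL NEWNYM", "QUIT"]


def request_new_circuit(host: str = "127.0.0.1", port: int = 9051,
                        password: str = "", timeout: float = 10.0) -> None:
    """Send the commands one by one and require a 250 (success) reply to each."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\r\n") as reader:
        for command in build_newnym_commands(password):
            sock.sendall((command + "\r\n").encode("utf-8"))
            reply = reader.readline().strip()
            if not reply.startswith("250"):
                raise RuntimeError(f"Tor rejected {command.split()[0]}: {reply!r}")


if __name__ == "__main__":
    try:
        request_new_circuit()
        print("New circuit requested")
    except (OSError, RuntimeError) as exc:
        print(f"Could not rotate circuit: {exc}")
```

Tor rate-limits NEWNYM requests, so calling this more often than every few seconds is unlikely to produce additional rotations.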

Output Structure

onion_archive/
├── html/          # HTML pages
├── images/        # Images
├── css/           # Stylesheets
├── js/            # JavaScript
├── other/         # Other files
├── text/          # Extracted text
├── reports/       # HTML reports
├── exports/       # JSON/CSV/TXT
└── crawler.db     # SQLite database
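As an illustration of how downloaded files might be routed into this layout, the sketch below creates the directory tree and picks a subfolder from a response's Content-Type header. The function names and the exact MIME-type mapping are assumptions; the tool itself may decide by file extension or other rules.

```python
from pathlib import Path

# Subfolder names taken from the output structure shown above.
SUBFOLDERS = ("html", "images", "css", "js", "other", "text", "reports", "exports")


def create_archive_tree(root) -> Path:
    """Create the archive root and all subfolders, leaving existing ones alone."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def subfolder_for(content_type: str) -> str:
    """Map a Content-Type header value to one of the download subfolders."""
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("text/html", "application/xhtml+xml"):
        return "html"
    if mime.startswith("image/"):
        return "images"
    if mime == "text/css":
        return "css"
    if mime in ("application/javascript", "text/javascript"):
        return "js"
    return "other"
```

For example, `create_archive_tree("onion_archive") / subfolder_for("image/png")` would yield the path `onion_archive/images`.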

About Project

Version: 1.0

License: MIT License - See LICENSE file

Developer: DR4CBOI

Contact: dr4cboi@protonmail.com
