Canary is a command-line tool that scrapes HTML elements, tests the availability of links and images, and builds an easy-to-read report.
WARNING:
Be careful about the number of URLs being verified or scraped.
Frequent use may be interpreted by web application firewalls and intrusion detection and prevention systems (IDPS) as a DoS attack.
Do not use this tool on any website without the owner's permission.
Symbiotech LLC is not liable or responsible for how this tool is used.
python canary.py -h
python canary.py --excel
python canary.py -u "http://www.google.com" -u "https://www.yahoo.com"
python canary.py -f "filepath.txt"
python canary.py -f "filepath.txt" -base "https://www.mydomain.com"
For example, if filepath.txt contains:
1. /home.html
2. /sitemap.html
then Canary would test:
1. https://www.mydomain.com/home.html
2. https://www.mydomain.com/sitemap.html
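The path expansion above can be sketched in a few lines of Python. This is only an illustration of the idea, not Canary's actual code: the function name `expand_paths` is hypothetical, and the use of `urljoin` is an assumption (the tool may simply concatenate the base and the path).

```python
from urllib.parse import urljoin

def expand_paths(lines, base=None):
    """Turn each non-empty line into a URL, resolving it against `base` when given.

    Hypothetical helper for illustration; Canary's own logic may differ.
    """
    urls = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the input file
        urls.append(urljoin(base, path) if base else path)
    return urls

print(expand_paths(["/home.html", "/sitemap.html"], base="https://www.mydomain.com"))
# ['https://www.mydomain.com/home.html', 'https://www.mydomain.com/sitemap.html']
```

Note that `urljoin` resolves a path starting with `/` against the root of the base URL, so a base that already contains a path would lose it.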
Report will contain the following:
- URL
- Status Code
- Message
- Page Title
python canary.py -u "https://www.google.com" --type status
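To make the four report fields concrete, here is a minimal sketch of how one report row could be assembled from a response that has already been fetched. The names `TitleParser` and `status_row` are hypothetical, and taking the message from Python's `http.HTTPStatus` is an assumption; Canary may produce its message differently.

```python
from html.parser import HTMLParser
from http import HTTPStatus

class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def status_row(url, status_code, html_text):
    """Build one report row (hypothetical layout) from an already-fetched response."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    try:
        message = HTTPStatus(status_code).phrase  # e.g. 404 -> "Not Found"
    except ValueError:
        message = "Unknown"  # non-standard status code
    return {
        "URL": url,
        "Status Code": status_code,
        "Message": message,
        "Page Title": parser.title.strip(),
    }
```

The fetch itself is left out on purpose so the sketch can be tried on saved HTML without sending any traffic to a site.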
Report will contain the following:
- anchor links <a>
- images <img>
- forms / input fields (<button>, <input>, etc.)
- iframes
python canary.py -u "https://www.google.com" --type scrape
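As an illustration of what a scrape collects, the following sketch groups the elements of a page into the four categories listed above, using only the standard library. The class and function names, the tag-to-category mapping, and the choice of which attribute to record are all assumptions made for this example, not Canary's implementation.

```python
from html.parser import HTMLParser

# Tag -> report category. This grouping is an assumption for illustration.
CATEGORIES = {
    "a": "anchors",
    "img": "images",
    "iframe": "iframes",
    "form": "forms",
    "input": "forms",
    "button": "forms",
    "select": "forms",
    "textarea": "forms",
}

class ElementCollector(HTMLParser):
    """Records (tag, target) pairs for each element of interest."""

    def __init__(self):
        super().__init__()
        self.found = {"anchors": [], "images": [], "forms": [], "iframes": []}

    def handle_starttag(self, tag, attrs):
        category = CATEGORIES.get(tag)
        if category is None:
            return
        attrs = dict(attrs)
        if tag == "a":
            target = attrs.get("href")
        elif tag in ("img", "iframe"):
            target = attrs.get("src")
        else:
            # For form elements, record the field name or the form action.
            target = attrs.get("name") or attrs.get("action")
        self.found[category].append((tag, target))

def scrape(html_text):
    """Return the collected elements of an HTML string, grouped by category."""
    collector = ElementCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found
```

Calling `scrape()` on a page's HTML returns a dictionary with one list per category, which maps directly onto the sections of the report described above.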
Report will contain:
- all anchor links <a>, status code, status code message, and page title
- images <img>, status code, status code message, and page title
- forms / input fields
- iframes
python canary.py -u "https://www.google.com" --type verify
You will be prompted for the password. This helps maintain security by keeping the password out of the command-line history.
Warning: The password may still be sent in clear text, so anyone capturing network packets or monitoring the network may be able to see it.
python canary.py -u "https://www.google.com" --type verify -webuser "grimm"
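Prompting for a password instead of accepting it as an argument is commonly done in Python with the standard `getpass` module. The sketch below shows the pattern; the function name `read_credentials` and the prompt text are hypothetical, and whether Canary uses `getpass` internally is an assumption.

```python
import getpass

def read_credentials(web_user, prompt=getpass.getpass):
    """Return (user, password), asking for the password without echoing it.

    `prompt` is injectable so the function can be exercised without a terminal.
    """
    password = prompt(f"Password for {web_user}: ")
    return web_user, password

# Interactive use (not executed here because it waits for input):
#   user, password = read_credentials("grimm")
```

Because the password is typed at the prompt, it never appears in the shell history or in the process argument list.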
It is common to see links to Facebook, Twitter, and other external sites when scraping. The --limit option restricts the results to the URLs you care about.
python canary.py -u "https://www.mydomain.com" --type verify --limit "https://www.mydomain.com"
Exclude URLs that belong to the specified domain:
python canary.py -u "https://www.google.com" --type verify --exclude "https://www.facebook.com"
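The effect of --limit and --exclude can be pictured as a simple filter over the scraped URLs. The sketch below assumes plain prefix matching, which is a guess about how the options behave; the function name `filter_urls` is hypothetical.

```python
def filter_urls(urls, limit=None, exclude=None):
    """Keep URLs that start with `limit` (if given) and do not start with `exclude`.

    Prefix matching is an assumption; the real tool may compare domains instead.
    """
    kept = []
    for url in urls:
        if limit and not url.startswith(limit):
            continue  # outside the site we care about
        if exclude and url.startswith(exclude):
            continue  # explicitly excluded
        kept.append(url)
    return kept

found = [
    "https://www.mydomain.com/home.html",
    "https://www.facebook.com/mydomain",
    "https://twitter.com/mydomain",
]
print(filter_urls(found, limit="https://www.mydomain.com"))
# ['https://www.mydomain.com/home.html']
print(filter_urls(found, exclude="https://www.facebook.com"))
# ['https://www.mydomain.com/home.html', 'https://twitter.com/mydomain']
```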
