Releases: petebuffon/scuwl

v1.2 (2022-10-11)

11 Oct 20:40

New Features

  • Scuwl now attempts to abide by a robots.txt found at the root of the input URL.
  • Connections now have a default timeout of 20 seconds. A new flag, -T (--timeout), has been added to change this value.
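
A minimal sketch of how a robots.txt check and a connection timeout could fit together, using only the Python standard library. The helper names (robots_url, fetch_robots, build_parser) and the DEFAULT_TIMEOUT constant are hypothetical and are not taken from the scuwl source; only the 20-second default comes from the notes above.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# Assumption: mirrors the 20-second default described in the release notes.
DEFAULT_TIMEOUT = 20


def robots_url(input_url):
    """Return the robots.txt URL at the root of the input URL's host."""
    parts = urlsplit(input_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(input_url, timeout=DEFAULT_TIMEOUT):
    """Download robots.txt as text, giving up after `timeout` seconds."""
    with urlopen(robots_url(input_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def build_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Offline usage with an inline robots.txt, so no network access is needed.
parser = build_parser("User-agent: *\nDisallow: /private/\n")
print(parser.can_fetch("scuwl/1.2", "https://example.com/private/page"))
print(parser.can_fetch("scuwl/1.2", "https://example.com/public"))
```

A scraper following this pattern would call can_fetch before each request and skip any URL the parser disallows.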

Improvements

  • The default user-agent is now scuwl/version.
  • Exception handling in the fetch method of Scraper now uses a series of more specific exceptions instead of the general Exception.
  • The fetch method also filters out any responses that don't contain "text/html" in their "content-type" header.
  • Link extraction can now handle scheme-relative, origin-relative, and directory-relative URLs.
  • JPG and SVG links, as well as link fragments, are now ignored.
  • The global variable version is now retrieved with get_distribution from pkg_resources.
  • Added a new set (scraper.urls) that keeps track of URLs already visited. It stores the 32-byte blake2b digest of each URL instead of the URL itself. This cuts down on recursive calls to scraper.recursive_scrape.
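
The link handling and visited-set changes above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the scuwl implementation: normalize, digest, mark_visited, and IGNORED_SUFFIXES are hypothetical names, and matching ignored links by file suffix is a guess at how JPG and SVG links are detected.

```python
from hashlib import blake2b
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumption: ignored link types are recognised by their path suffix.
IGNORED_SUFFIXES = (".jpg", ".svg")


def normalize(base_url, href):
    """Resolve href against base_url, drop any fragment, skip ignored types."""
    # urljoin handles scheme-relative (//host/x), origin-relative (/x),
    # and directory-relative (x) references.
    absolute = urljoin(base_url, href)
    url, _fragment = urldefrag(absolute)
    if urlsplit(url).path.lower().endswith(IGNORED_SUFFIXES):
        return None
    return url


def digest(url):
    """Return the 32-byte blake2b digest of a URL."""
    return blake2b(url.encode("utf-8"), digest_size=32).digest()


def mark_visited(seen, url):
    """Record url in seen; return True only the first time it appears."""
    key = digest(url)
    if key in seen:
        return False
    seen.add(key)
    return True


base = "https://example.com/docs/index.html"
seen = set()
for href in ("//cdn.example.com/a", "/about", "guide.html#intro", "logo.svg"):
    url = normalize(base, href)
    if url is not None and mark_visited(seen, url):
        print(url)
```

Storing fixed-size digests keeps each entry in the set at 32 bytes regardless of how long the URL is.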

v1.1 (2022-10-05)

06 Oct 00:14

New Features

  • Added -t, --tables flag. This flag limits the scraping of websites to text from table elements.
  • Added -a, --alpha flag. This flag limits words extracted from websites to alphabet characters only.
  • Added -m, --max-length flag. This flag limits words extracted from websites to a specific maximum length.
  • Generated wordlists are now sorted.
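
As a rough illustration of how the alphabet-only and maximum-length limits could combine with sorted output, consider the sketch below. build_wordlist is a hypothetical function, not scuwl's API, and it assumes that words failing a filter are dropped entirely rather than trimmed. Note that str.isalpha also accepts non-ASCII letters.

```python
def build_wordlist(text, alpha=False, max_length=None):
    """Return the sorted, de-duplicated words from text that pass the filters."""
    words = set()
    for word in text.split():
        # Assumption: a word containing any non-letter is dropped entirely.
        if alpha and not word.isalpha():
            continue
        # Assumption: a word longer than max_length is dropped, not truncated.
        if max_length is not None and len(word) > max_length:
            continue
        words.add(word)
    return sorted(words)


print(build_wordlist("beta alpha r2d2 alpha extraordinary", alpha=True, max_length=5))
```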

Improvements

  • Updated help text.
  • Refined passing URL checks.
  • Cleaned up code.

v1.0 (2022-09-30)

05 Oct 23:19
fbf7a37

Initial Release