-
Notifications
You must be signed in to change notification settings - Fork 0
shengssw/WebCrawler
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Project Name: simple multi-threaded crawler Author: Sheng Wang 1.Environment: python 3.9 2.Usage: In terminal, go into the project directory. Run python main.py -m "simple" (for simple crawl) Run python main.py -m "Priority" (for prority crawl) type in the query you want in the prompt and hit enter. 3.Description: The program will take in a user query and run a google search for 10 intial seed pages. Then the seeds will be passed into crawler and the crawler will start crawling until certain condition is meet. The crawled results will be write to a log file. For each log entry, it will record the crawled url, its fetch status, the size of the page if it get fetched successfully, and the priority score if the mode is priority.
About
Backup for CS6913
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published