weizhang1992/WebSearchEngine-CourseProject-Webcrawler

(a) Open urlCrawler.py. The first few lines contain these settings:
		thequery="software engineer"
		maxpage=500 
(b) Change thequery and maxpage to the values you want.

(c) The folder name will be Webcrawler.

(d) Set your basepath. Result files will be stored at basepath+"/Webcrawler/result_file/"+thequery+"/"

(e) Run python urlCrawler.py in a terminal.

(f) Check the log or the result file.

(g) The crawler will be slow at first because it scores all links, but it will be fast after that.
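
As a rough illustration of steps (a) through (d), the sketch below shows how the result directory could be derived from the settings. It is not taken from urlCrawler.py: the function name result_dir and the basepath value are assumptions made for this example. Only thequery, maxpage, and the path layout come from the steps above.

```python
import os

# Settings as they appear in the first lines of urlCrawler.py.
thequery = "software engineer"
maxpage = 500

# Assumption: basepath is the directory that contains the Webcrawler
# folder. The home directory is used here only as a placeholder.
basepath = os.path.expanduser("~")


def result_dir(basepath, thequery):
    """Build basepath + "/Webcrawler/result_file/" + thequery + "/".

    Hypothetical helper; the trailing "" makes os.path.join end the
    path with a separator, matching the layout described in step (d).
    """
    return os.path.join(basepath, "Webcrawler", "result_file", thequery, "")


# Show where results for the current query would be expected.
print(result_dir(basepath, thequery))
```

Printing the path before a long crawl is a cheap way to confirm that basepath is set correctly, since a query containing spaces produces a directory name containing spaces as well.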

About

For Web Search Engine Course
