A downloader middleware for Scrapy that switches the HTTP proxy from time to time.
The initial proxies are stored in a file. At runtime, the middleware fetches new proxies when it finds that too few valid proxies remain.
The fetching code collects free proxies from the Internet and can be modified to suit your needs.
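The rotate-and-refill behaviour described above can be sketched as follows. This is a minimal illustration, not the middleware's actual code: `ProxyPool`, `load_proxies`, `fetch_new`, and `min_valid` are hypothetical names, and the proxy file is assumed to hold one proxy URL per line.

```python
def load_proxies(lines):
    """Parse proxy URLs, assuming one per line; skip blanks and '#' comments."""
    cleaned = (line.strip() for line in lines)
    return [line for line in cleaned if line and not line.startswith("#")]


class ProxyPool:
    """Hypothetical helper that rotates proxies and refills when few are valid."""

    def __init__(self, initial, fetch_new, min_valid=2):
        self.proxies = [{"url": url, "valid": True} for url in initial]
        self.fetch_new = fetch_new  # callable returning a list of proxy URLs
        self.min_valid = min_valid  # refill threshold (an assumption)
        self.index = 0              # position of the proxy currently in use

    def valid_count(self):
        return sum(1 for p in self.proxies if p["valid"])

    def invalidate_current(self):
        """Mark the proxy currently in use as no longer valid."""
        self.proxies[self.index]["valid"] = False

    def refill_if_needed(self):
        """Fetch new proxies when fewer than min_valid valid ones remain."""
        if self.valid_count() >= self.min_valid:
            return
        known = {p["url"] for p in self.proxies}
        for url in self.fetch_new():
            if url not in known:
                self.proxies.append({"url": url, "valid": True})
                known.add(url)

    def next_proxy(self):
        """Advance to the next valid proxy and return its URL."""
        self.refill_if_needed()
        total = len(self.proxies)
        for step in range(1, total + 1):
            candidate = (self.index + step) % total
            if self.proxies[candidate]["valid"]:
                self.index = candidate
                return self.proxies[candidate]["url"]
        raise RuntimeError("no valid proxy available")
```

In a real middleware, `fetch_new` would be the function that scrapes free proxy lists, and `next_proxy()` would supply the value assigned to `request.meta["proxy"]`.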
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': 3,
    # put this middleware after RetryMiddleware
    'crawler.middleware.HttpProxyMiddleware': 4,
}
Often, you will want to switch to a new proxy when your spider gets banned. Detect that your IP has been banned and yield a new Request with meta["change_proxy"] = True from your Spider.parse method.