Update for https
author Magnus Hagander <magnus@hagander.net>
Wed, 25 May 2016 13:37:32 +0000 (15:37 +0200)
committer Magnus Hagander <magnus@hagander.net>
Wed, 25 May 2016 13:37:32 +0000 (15:37 +0200)
The uncommented "+7" offset in the slice that strips the URL prefix
seems to have come from the length of http://. The prefix is now
https://, so change the offset to 8 and properly comment it.
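
To illustrate why the offset matters, here is a minimal sketch that is not taken from the crawler itself. The function names, the constant, and the example.org hostname are all hypothetical; the sketch only assumes that each sitemap URL has the form scheme://hostname/path. The first function mirrors the fixed-offset slice from the commit, and the second shows one possible alternative that parses the URL instead of counting characters.

```python
from urllib.parse import urlsplit

# Hypothetical constant: number of characters in the scheme prefix.
HTTPS_PREFIX_LEN = len("https://")  # 8; len("http://") would be 7


def strip_prefix_fixed(url, hostname, prefix_len=HTTPS_PREFIX_LEN):
    """Drop scheme and hostname by slicing at a fixed offset.

    Mirrors the approach in the commit: only correct when the URL
    really starts with a scheme of prefix_len characters followed
    by exactly this hostname.
    """
    return url[len(hostname) + prefix_len:]


def strip_prefix_parsed(url):
    """Alternative sketch: let urlsplit find the path.

    Works for both http and https because nothing is counted by hand.
    """
    parts = urlsplit(url)
    path = parts.path
    if parts.query:
        path += "?" + parts.query
    return path


if __name__ == "__main__":
    host = "www.example.org"  # assumed hostname, for illustration only
    url = "https://www.example.org/docs/"
    print(strip_prefix_fixed(url, host))                # offset 8
    print(strip_prefix_fixed(url, host, prefix_len=7))  # old offset, off by one
    print(strip_prefix_parsed(url))
```

With the old offset of 7, the slice starts one character too early on an https URL, so the last character of the hostname leaks into the result. The parsing variant avoids that class of error, at the cost of differing from what the commit actually does.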

tools/search/crawler/lib/sitemapsite.py

index a6f5ae83139ebaf6f8526300657c8e5c3bc7893b..4534a456a23f093eb84ad6f26cfeded63c110f5e 100644 (file)
@@ -69,7 +69,8 @@ class SitemapSiteCrawler(BaseSiteCrawler):
                u.close()
 
                for url, prio, lastmod in p.urls:
-                       url = url[len(self.hostname)+7:]
+                       # Advance 8 characters - length of https://.
+                       url = url[len(self.hostname)+8:]
                        if lastmod:
                                if self.scantimes.has_key(url):
                                        if lastmod < self.scantimes[url]: