Labels: question (Further information is requested)
Description
In our testing, the current code produces unreliable results on Wikipedia articles: sometimes it returns a date and sometimes it doesn't. Because Wikipedia articles are constantly updated, @coreydockser and I propose changing it to return no date when the URL is on wikipedia.org. In our broader experience with Media Cloud, this produces more useful results for our context of open web news analysis.
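One detail worth settling is how "a wikipedia.org URL" is matched. The sketch below is a hypothetical helper (`is_undateable_domain` and `UNDATEABLE_DOMAINS` are illustrative names, not existing code in this project or in date_guesser). It compares the parsed hostname rather than doing a substring search, so language and mobile subdomains such as `en.wikipedia.org` or `en.m.wikipedia.org` match, while look-alike hosts such as `notwikipedia.org` do not. It assumes URLs include a scheme; without one, `urlparse` yields no hostname and the check returns False.

```python
from urllib.parse import urlparse

# Hypothetical set of domains whose pages should never be assigned a date.
UNDATEABLE_DOMAINS = {"wikipedia.org"}


def is_undateable_domain(url: str) -> bool:
    """Return True if the URL's host is an undateable domain or one of its subdomains."""
    # .hostname is already lowercased by urlparse; None when there is no scheme/netloc.
    host = (urlparse(url).hostname or "").lower()
    # Exact match or a true subdomain; a plain substring test would also match
    # hosts like "notwikipedia.org" or "wikipedia.org.example.com".
    return any(host == d or host.endswith("." + d) for d in UNDATEABLE_DOMAINS)
```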
For the implementation, we could copy the `filter_url_for_undateable` function from date_guesser and use it as is, which would also bring in the other checks it performs for undateable domains. We would call it early in `guess_date`.
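A minimal sketch of the proposed control flow, under stated assumptions: `undateable_reason` is a hypothetical stand-in for the copied `filter_url_for_undateable` (the real function's return type and rules may differ), and the `guess_date` signature here, including the injected `extract` callable, is illustrative rather than this project's actual API. The point it shows is that the URL filter runs first and short-circuits, so undateable pages never reach the HTML-based extraction.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def undateable_reason(url: str) -> Optional[str]:
    """Hypothetical stand-in for filter_url_for_undateable.

    Returns a short reason string if the URL should not be dated, else None.
    """
    host = (urlparse(url).hostname or "").lower()
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return "Wikipedia article"
    # Any further undateable-domain rules copied from date_guesser would go here.
    return None


def guess_date(
    url: str, html: str, extract: Callable[[str, str], Optional[str]]
) -> Optional[str]:
    """Illustrative guess_date: check the URL first, then fall back to extraction."""
    if undateable_reason(url) is not None:
        # Early exit: skip all HTML-based date extraction for undateable URLs.
        return None
    return extract(url, html)
```

Checking the URL before parsing the page keeps the rule cheap and deterministic: a Wikipedia URL yields no date regardless of what the page's markup happens to contain at crawl time.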