Learning to Webscrape!

I came across this webscraping kata on Codewars and didn't have the slightest idea on where to start. So, I took to google and found a few tutorials!

Tutorials used:

This freeCodeCamp tutorial for the wikipedia_scraper.py file
This Real Python tutorial for the scraper.py file

Both of these tutorials use Beautiful Soup

Wikipedia_scraper.py

This was a simple tutorial to follow, which at first simply returns the title of the current article and one wikipedia link from the entire article at random.

Removing the shuffling line broke the code, I'm still not sure why, so it's stayed in.

The 2nd leap was to take what we'd written and turn it into a method that returns infinite random wikipedia titles, jumping from one article to another.

This worked fine but was a little unpredictable and would crash as the method is an infinite loop!

Here is the output prior to crashing:

scraper.py

This again was simple to follow. Although, I ran into errors when I tried to translate this to the monster.co.uk website as the html element I required didn't have an id tag, only a class one. This is something to come back to as I'm sure with more knowledge I'll be able to figure out how to do this!

After a lot of frustration, I decided to continue using the monster.com site instead and I could get it to work.

I tested the python_jobs filter on line 11 by switching 'python' with 'engineer' as there were no python jobs currently going in Texas. This allowed me to see what the output would be like!

All in all, this was a fun tutorial to follow, but I still need to go into more of it to solidify my learning.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
scraper.py		scraper.py
wikipedia_scraper.py		wikipedia_scraper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Learning to Webscrape!

Tutorials used:

Wikipedia_scraper.py

scraper.py

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Learning to Webscrape!

Tutorials used:

Wikipedia_scraper.py

scraper.py

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages