Web Scraping with BeautifulSoup
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations.
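As a first taste of the navigating and searching described above, here is a minimal sketch. The HTML snippet, the variable names, and the choice of Python's built-in `html.parser` are assumptions made for illustration only, and the code needs the beautifulsoup4 package to be installed first.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration.
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro">Hello, <b>world</b>!</p>
<a href="http://example.com/one" id="link1">One</a>
<a href="http://example.com/two" id="link2">Two</a>
</body></html>
"""

# Parse the document with the parser bundled with Python.
soup = BeautifulSoup(html_doc, "html.parser")

# Navigate: reach the <title> tag by attribute access and read its text.
page_title = soup.title.string

# Search: collect the href attribute of every <a> tag.
links = [a["href"] for a in soup.find_all("a")]

# Search by CSS class, then flatten nested tags into plain text.
intro_text = soup.find("p", class_="intro").get_text()

print(page_title)
print(links)
print(intro_text)
```

Other parsers such as lxml can be swapped in by changing the second argument, provided they are installed separately.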
- You must install Python 2.7 or higher (current releases of beautifulsoup4 require Python 3).
After that, open a terminal and install the package with pip, the Python package installer:
pip install beautifulsoup4
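To confirm that the installation worked, something like the following sketch may help. It assumes Python 3, and the helper name `is_installed` is hypothetical, not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The beautifulsoup4 package is imported under the name "bs4".
if is_installed("bs4"):
    print("Beautiful Soup is installed")
else:
    print("Beautiful Soup is missing; run: pip install beautifulsoup4")
```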
- If you want to use Jupyter Notebook, install it as well. Jupyter Notebook is an open-source web application that lets you create and share documents containing live code, equations, visualizations, and explanatory text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more.
pip install jupyter
Then, from the python-BeautifulSoup folder, change into the jupyter directory:
cd jupyter
and run the following command to start the notebook server:
jupyter notebook
You will then see a message such as 'Copy/paste this URL into your browser when you connect for the first time, to login with a token', followed by a URL, for example:
http://localhost:8889/?token=af34c4bd65b51f89466c0058a0bae2d90723e6f354cbcac2
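For readers curious about what that URL contains, the sketch below splits the example URL into its parts with Python 3's standard library. The variable names are illustrative, and your own port and token will differ.

```python
from urllib.parse import urlparse, parse_qs

# The example URL printed by the notebook server above.
url = "http://localhost:8889/?token=af34c4bd65b51f89466c0058a0bae2d90723e6f354cbcac2"

parts = urlparse(url)
host = parts.hostname  # machine the server runs on
port = parts.port      # port the server listens on
token = parse_qs(parts.query)["token"][0]  # login token from the query string

print(host, port)
print(token)
```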