Websites can be accessed using Python's urllib module. You can use it to interact with any website, whether you want to get data or post data; parsing the returned HTML is usually left to a separate module.
If you want to do web scraping or data mining, you can use urllib, but it is not the only option. urllib only fetches the data; if you want to emulate a complete web browser, there is a separate module for that.
We can download a webpage's HTML using 3 lines of code:
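A minimal sketch of such a download, assuming Python 3 and its urllib.request submodule. The helper name fetch_html is hypothetical and only wraps the import, open and read steps so they can be reused:

```python
import urllib.request


def fetch_html(url):
    """Fetch `url` and return the response body decoded as text."""
    with urllib.request.urlopen(url) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, html = fetch_html("https://example.com") (a placeholder URL, not one from this article) would leave the page source in html.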
The variable html will contain the webpage data as HTML markup. Normally a web browser such as Google Chrome renders this data visually.
A web browser sends its name and version along with each request; this is known as the user agent. Python can mimic this by sending its own User-Agent header. The User-Agent string contains the name of the web browser and its version number:
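One way this might look, assuming Python 3's urllib.request.Request. The helper name build_request and the User-Agent value shown in the comment are illustrative assumptions, not strings taken from a real browser:

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request that identifies itself with the given User-Agent."""
    # Example value (made up): "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the returned object to urllib.request.urlopen() sends the request with that header; servers that vary their output by browser will then see the chosen string instead of Python's default.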
Given a web page's data, we usually want to extract the interesting information from it. You can use the BeautifulSoup module to parse the returned HTML.
You can use the BeautifulSoup module to:
Several other modules aim to do the same job as BeautifulSoup, for example PyQuery and HTMLParser.
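To give a feel for the parsing step without installing anything, here is a rough sketch using HTMLParser, assuming that name refers to the html.parser.HTMLParser class in Python 3's standard library. Collecting link targets is just an illustrative task, and the names LinkCollector and extract_links are hypothetical:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

BeautifulSoup and PyQuery offer a more convenient, query-style interface for the same kind of task; the event-based approach above trades convenience for having no third-party dependency.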
The code below posts data to a server:
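A hedged sketch of such a POST, assuming Python 3's urllib.parse and urllib.request. The helper name build_post_request and any form field names you pass in are hypothetical; a real server defines its own fields:

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Create a POST request carrying `fields` as URL-encoded form data."""
    # urlencode turns {"name": "Ada"} into "name=Ada"; the body must be bytes.
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data)
```

Passing the result to urllib.request.urlopen() sends the form data, and the server's reply can be read from the returned response in the same way as for a normal download.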