python urllib
Websites can be accessed using the urllib module. You can use it to interact with any website, whether you want to get data, post data or parse data.
If you want to do web scraping or data mining, you can use urllib, but it’s not the only option. urllib only fetches the data; if you want to emulate a complete web browser, there’s also a module for that.
Download website
We can download a webpage’s HTML using three lines of code.
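A minimal sketch of such a download, assuming Python 3. The three essential steps are the import, the call to urlopen and the call to read; they are wrapped in a hypothetical helper named download so that nothing is fetched until it is called, and the URL in the usage note is only a placeholder.

```python
import urllib.request


def download(url):
    """Fetch url and return the response body as bytes."""
    # urlopen returns a response object that can be used as a context manager.
    with urllib.request.urlopen(url) as response:
        html = response.read()
    return html
```

Calling html = download("https://www.example.com") would store the page's raw bytes in html; html.decode("utf-8") turns them into text, assuming the page is UTF-8 encoded.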
The variable html will contain the webpage data as HTML markup. Normally a web browser such as Google Chrome renders this data.
Web browser
Web browsers send their name and version along with each request; this is known as the user-agent. Python can mimic this by sending its own User-Agent header with the request. The User-Agent string contains the name of the web browser and its version number:
import urllib.request
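A minimal sketch of sending a browser-like User-Agent, assuming Python 3. The header value below is only an illustrative string, and the helper names build_request and download_as_browser are made up for this example.

```python
import urllib.request

# Illustrative value only; substitute the string of the browser you want to mimic.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_as_browser(url):
    """Fetch url while identifying as the browser named in USER_AGENT."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Without an explicit header, urllib identifies itself as Python-urllib followed by the Python version, which some sites treat differently from a regular browser.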
Parsing data
Given a webpage’s data, we want to extract the interesting information. You could use the BeautifulSoup module to parse the returned HTML data, for example to find elements by tag name and read their text or attributes.
Several other modules aim to do the same as BeautifulSoup, including PyQuery and HTMLParser.
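As a small illustration of parsing, the sketch below uses HTMLParser from the standard library's html.parser module rather than BeautifulSoup, so it runs without installing anything. The class name LinkCollector, the helper extract_links and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)  # feed expects str, so decode downloaded bytes first
    parser.close()
    return parser.links


sample = '<p><a href="/docs">Docs</a> and <a href="https://www.example.com">Example</a></p>'
print(extract_links(sample))  # ['/docs', 'https://www.example.com']
```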
Posting data
The code below posts data to a server:
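A minimal sketch of a POST request, assuming Python 3. The form fields and the URL in the usage note are placeholders, and the helper names build_post_request and post are made up for this example.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Return a Request whose body is the form-encoded fields."""
    # urlencode produces a str such as "name=Ada&lang=python";
    # the request body must be bytes, hence the encode call.
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data makes urllib send the request as POST instead of GET.
    return urllib.request.Request(url, data=data)


def post(url, fields):
    """Send fields to url as a POST request and return the response body."""
    with urllib.request.urlopen(build_post_request(url, fields)) as response:
        return response.read()
```

For example, post("https://www.example.com/form", {"name": "Ada", "lang": "python"}) would send both fields as form data and return whatever bytes the server replies with.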