Urllib Tutorial Python 3

The urllib module lets you access websites from Python. You can use it to interact with a website whether you want to get data, post data or parse data.

If you want to do web scraping or data mining, you can use urllib but it’s not the only option. Urllib will just fetch the data, but if you want to emulate a complete web browser, there’s also a module for that.


Download website
We can download a webpage's HTML in just a few lines of code:

import urllib.request

html = urllib.request.urlopen('https://arstechnica.com').read()

The variable html will contain the webpage data as raw bytes of HTML markup. Normally a web browser such as Google Chrome renders this data visually.
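Because read() returns bytes, a common next step is decoding them to text. The following is a minimal sketch, not the only way to do it: fetch_text is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Download a URL and return its body decoded to a str."""
    # The context manager closes the connection when the block exits.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, exactly as sent by the server
        # Use the charset from the Content-Type header when one is given.
        charset = response.headers.get_content_charset() or default_charset
    # Replace undecodable bytes rather than raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")
```

Calling fetch_text('https://arstechnica.com') should then give the page as a str instead of bytes.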

Web browser
A web browser sends its name and version along with each request; this is known as the user agent. Python can mimic this by setting the User-Agent header, as in the code below. The User-Agent string contains the name of the web browser and its version number:

import urllib.request

headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

req = urllib.request.Request('https://arstechnica.com', headers=headers)
html = urllib.request.urlopen(req).read()
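To check which header a request will carry before sending it, the Request object can be inspected directly. This is a small sketch; build_request and FIREFOX_UA are hypothetical names introduced for illustration, and nothing is sent over the network here.

```python
import urllib.request

FIREFOX_UA = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"


def build_request(url, user_agent):
    # Hypothetical helper: wraps the URL and the User-Agent in a Request.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://arstechnica.com", FIREFOX_UA)

# Request stores header names capitalized, so the key is "User-agent".
print(req.has_header("User-agent"))
print(req.get_header("User-agent"))
```

Without a custom header, urllib identifies itself as Python-urllib followed by the Python version, which some sites may treat differently from a browser.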

Parsing data
Given the data of a web page, we want to extract the interesting information. You could use the BeautifulSoup module to parse the returned HTML data.

There are several modules that try to achieve the same as BeautifulSoup, such as PyQuery and HTMLParser.
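As an illustration of parsing with HTMLParser from the standard library, here is a rough sketch that collects link targets. LinkCollector, extract_links and the sample markup are made up for this example; BeautifulSoup or PyQuery would express the same idea differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<p><a href="/news">News</a> and <a href="https://example.com/">more</a></p>'
print(extract_links(sample))
```

Note that feed() expects a str, so the bytes returned by urlopen(...).read() need to be decoded first.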

Posting data
The code below posts data to a server:

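import http.client
import urllib.parse

# Encode the form fields as application/x-www-form-urlencoded
data = urllib.parse.urlencode({'s': 'Post variable'})
# HTTPConnection takes a host name and a port, not a full URL
h = http.client.HTTPConnection('server', 80)
headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
h.request('POST', '/webpage.php', data, headers)
r = h.getresponse()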
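A POST can also be expressed with urllib.request alone. The sketch below only builds the request so it can be inspected; build_post_request is a hypothetical helper, and the host name server is the same placeholder used above.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    # urlencode produces a str; Request needs the body as bytes.
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Supplying data switches the request method from GET to POST.
    return urllib.request.Request(url, data=body, headers=headers)


req = build_post_request("http://server/webpage.php", {"s": "Post variable"})
print(req.get_method())
print(req.data)
```

Passing req to urllib.request.urlopen would send it and return the server's response.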

