

Urllib Tutorial Python 3


The urllib package lets you access websites from Python. You can use it to interact with practically any website, whether you want to download data from it or post data to it, and its urllib.parse submodule helps you take URLs apart and build new ones.
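urllib is a package rather than a single module: urllib.request fetches URLs, while urllib.parse splits and builds them. As a small offline sketch of the latter, the snippet below takes a URL apart and assembles a new one with an escaped query string. The /search/ path and its q and page parameters are hypothetical examples, not a documented Ars Technica API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Take a URL apart (the path and query values are hypothetical examples)
parts = urlsplit("https://arstechnica.com/search/?q=python&page=2")
print(parts.scheme, parts.netloc, parts.path)  # https arstechnica.com /search/
params = parse_qs(parts.query)
print(params)  # {'q': ['python'], 'page': ['2']}

# Build a URL; urlencode escapes spaces and other special characters
query = urlencode({"q": "python urllib", "page": 3})
url = urlunsplit(("https", "arstechnica.com", "/search/", query, ""))
print(url)  # https://arstechnica.com/search/?q=python+urllib&page=3
```

parse_qs returns a list for every key because a query string may repeat the same parameter.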

If you want to do web scraping or data mining, urllib is one option, but not the only one. Urllib only fetches the data; if you need to emulate a complete web browser (for example, one that runs a page's JavaScript), there are separate modules for that.



Download website

We can download a webpage's HTML using 3 lines of code:

import urllib.request

html = urllib.request.urlopen('https://arstechnica.com').read()
print(html)

The variable html contains the raw HTML of the page as a bytes object; decode it (for example with html.decode('utf-8')) to get a string. Normally a web browser such as Google Chrome renders this data for you.
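Because read() returns bytes, a small helper that decodes the body is often handy. The sketch below uses the charset the server declares in its Content-Type header and falls back to UTF-8 when none is given (that fallback is an assumption, not something every site guarantees). The helper name fetch_html is hypothetical, and the demo uses a data: URL so it runs without network access; in practice you would pass an address such as 'https://arstechnica.com'.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Hypothetical helper: download url and return its body as a str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if missing
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL keeps this demo offline; %C3%A9 is the UTF-8 encoding of "é"
html = fetch_html("data:text/html;charset=utf-8,<h1>Caf%C3%A9</h1>")
print(html)  # <h1>Café</h1>
```

Using the with statement closes the connection as soon as the body has been read.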

Web browser

A web browser sends its name and version number along with each request; this is known as the user agent and is sent in the User-Agent header. Python can mimic a browser by sending the same header, as in the code below:

import urllib.request

headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

req = urllib.request.Request('https://arstechnica.com', headers=headers)
html = urllib.request.urlopen(req).read()
print(html)
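To see what setting this header changes, the sketch below inspects objects locally instead of contacting a server. It shows the User-Agent urllib sends by default ("Python-urllib/" followed by the Python version), and that Request stores header names via str.capitalize(), so the stored key is 'User-agent'. No request is actually sent.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"

# Default headers of a freshly built opener: this is how urllib identifies itself
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_agent)  # e.g. Python-urllib/3.12

req = urllib.request.Request("https://arstechnica.com", headers={"User-Agent": USER_AGENT})
# add_header is an equivalent way to set (or overwrite) the header later
req.add_header("User-Agent", USER_AGENT)

# Header names are normalized with str.capitalize(), so the key is "User-agent"
print(req.header_items())
print(req.get_header("User-agent") == USER_AGENT)  # True
print(req.has_header("User-Agent"))  # False: the lookup is case-sensitive
```

When the request is sent, urllib only adds its default User-Agent if the Request does not already carry one, which is why the custom header wins.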

Parsing data

Once we have a web page's data, we want to extract the interesting information from it. You could use the BeautifulSoup module to parse the returned HTML data.

You can use the BeautifulSoup module to:

Several other modules serve a similar purpose to BeautifulSoup, such as PyQuery and HTMLParser.
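BeautifulSoup is a third-party package, so as a dependency-free sketch the example below uses the standard library's html.parser.HTMLParser to collect every link on a page, resolving relative links against the page's address with urllib.parse.urljoin. The class LinkExtractor, the helper extract_links, and the sample HTML are illustrative names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/news" into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = (
    '<html><body>'
    '<a href="/news">News</a>'
    '<a name="top">No href</a>'
    '<A HREF="https://example.com/">Elsewhere</A>'
    '</body></html>'
)
links = extract_links(sample, "https://arstechnica.com/")
print(links)  # ['https://arstechnica.com/news', 'https://example.com/']
```

feed() expects a str, so decode the bytes returned by read() before passing downloaded HTML to the parser.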

Posting data

The code below posts data to a server:

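import http.client
import urllib.parse

# Python 3 names: urllib.urlencode is now urllib.parse.urlencode, httplib is now http.client
data = urllib.parse.urlencode({'s': 'Post variable'})
headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}

# HTTPConnection takes a host name and port, not a URL ('server' is a placeholder)
h = http.client.HTTPConnection('server', 80)
h.request('POST', '/webpage.php', data, headers)
r = h.getresponse()
print(r.read())
h.close()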
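The same request can also be made with urllib.request alone: passing a data argument to Request makes it default to the POST method. In the sketch below, build_post_request and post_form are hypothetical helper names, and 'http://server/webpage.php' is a placeholder modeled on the example above. Only the request object is built and inspected, so it runs without a server.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Hypothetical helper: a Request that POSTs fields as a URL-encoded form."""
    # urlencode returns a str; the request body must be bytes
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
    # Because data is not None, Request defaults to the POST method
    return urllib.request.Request(url, data=body, headers=headers)


def post_form(url, fields, timeout=10):
    """Hypothetical helper: send the form and return the raw response body."""
    with urllib.request.urlopen(build_post_request(url, fields), timeout=timeout) as response:
        return response.read()


req = build_post_request("http://server/webpage.php", {"s": "Post variable"})
print(req.get_method())  # POST
print(req.data)          # b's=Post+variable'
```

Calling post_form("http://server/webpage.php", {"s": "Post variable"}) would send the form, provided a server is actually listening at that address.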

