Requests: HTTP for Humans
If you want to request data from webservers, the traditional way to do that in Python is the urllib library. While this library is effective, it can easily add more complexity than a task needs. Is there a simpler way?
Requests is an Apache2-licensed HTTP library, written in Python. It is powered by urllib3, but it does all the hard work for you.
To install, type:
pip install requests
Or install from source:
git clone https://github.com/kennethreitz/requests.git
cd requests
pip install .
The Requests library is now installed. We will list some examples below:
Grabbing raw HTML using HTTP/HTTPS requests
We can now query a website as follows:
import requests

r = requests.get('http://pythonspot.com/')
print(r.content)
Save it as website.py and run with:
python website.py
It will output the raw HTML code.
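As a side note, a response carries the body in two forms: r.content holds the raw bytes, while r.text holds the body decoded to a string using the encoding Requests detected. A small sketch of the difference:

```python
import requests

r = requests.get('http://pythonspot.com/')
print(type(r.content))  # bytes: the raw body
print(type(r.text))     # str: the body decoded with r.encoding
print(r.encoding)       # the encoding Requests detected, e.g. 'UTF-8'
```

Use r.content when saving binary data (images, archives) and r.text for HTML or other text.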
Download binary image using Python
from io import BytesIO
from PIL import Image
import requests

# the image URL is truncated in the original article
r = requests.get('http://1.bp.blogspot.com/_r-MQun1PKUg/SlnHnaLcw6I/AAAAAAAAA_U')
i = Image.open(BytesIO(r.content))
i.show()
An image retrieved using Python
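For large files it is better not to hold the whole body in memory. A minimal sketch using Requests' streaming mode (the URL below is a placeholder, not from the article):

```python
import requests

# placeholder URL -- substitute a real image address
url = 'http://example.com/image.png'
r = requests.get(url, stream=True)  # don't download the body all at once
with open('image.png', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)  # write each chunk straight to disk
```

stream=True defers the download until you iterate, so even multi-gigabyte files use little memory.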
Website status code (is the website online?)
import requests

r = requests.get('http://pythonspot.com/')
print(r.status_code)
This returns 200 (OK). A list of status codes can be found here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
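Rather than comparing against the number 200 yourself, Requests exposes named status codes and a helper that raises an exception for error responses. A short sketch:

```python
import requests

r = requests.get('http://pythonspot.com/')
if r.status_code == requests.codes.ok:  # requests.codes.ok == 200
    print('The website is online')
r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
```

raise_for_status() does nothing on success, which makes it convenient as a one-line guard in scripts.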
Retrieve JSON from a webserver
You can easily grab a JSON object from a webserver.
import requests

r = requests.get('https://api.github.com/events')
print(r.json())
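r.json() parses the body into ordinary Python objects, here a list of dicts, so you can index into it directly. The field names below follow the GitHub events API:

```python
import requests

r = requests.get('https://api.github.com/events')
events = r.json()  # a list of event dicts
for event in events[:3]:
    # each GitHub event carries a 'type' and a 'repo' with a 'name'
    print(event['type'], event['repo']['name'])
```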
HTTP Post requests using Python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)
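The data= argument sends the payload form-encoded. To send a JSON body instead, pass json= and Requests will serialize the dict and set the Content-Type header for you; a sketch against the same httpbin echo service:

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
# json= serializes the payload and sets Content-Type: application/json
r = requests.post('http://httpbin.org/post', json=payload)
print(r.json()['json'])  # httpbin echoes the JSON body back
```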
SSL verification, verify certificates using Python
import requests

print(requests.get('https://github.com', verify=True))
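verify can also point at a custom CA bundle, or be set to False to skip checking entirely (only sensible for testing). The host names and path below are placeholders, not from the article:

```python
import requests

# default: certificates are verified (verify=True on every Session)
r = requests.get('https://github.com', verify=True)
print(r.status_code)

# hypothetical examples -- hosts and path are placeholders:
# requests.get('https://internal.example.com', verify='/path/to/ca-bundle.crt')
# requests.get('https://self-signed.example.com', verify=False)  # disables checking
```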
Extract data from the HTTP response header
With every request you send to an HTTP server, the server will send you some additional data. You can extract data from an HTTP response using:
#!/usr/bin/env python
import requests

r = requests.get('http://pythonspot.com/')
print(r.headers)
This prints the headers as a dict-like object. There is no need to parse anything: r.headers is a case-insensitive dictionary, so individual fields can be read directly.
#!/usr/bin/env python
import requests

r = requests.get('http://pythonspot.com/')
# r.headers is a case-insensitive dict; use .get() since not every
# server sends every header
print(r.headers.get('server'))
print(r.headers.get('content-length'))
print(r.headers.get('content-encoding'))
print(r.headers.get('content-type'))
print(r.headers.get('date'))
print(r.headers.get('x-powered-by'))
Extract data from HTML response
Once you get the data from a server, you can parse it using Python string functions or use a library. BeautifulSoup is often used. An example that gets the page title and links:
from bs4 import BeautifulSoup
import requests

# get html data
r = requests.get('http://stackoverflow.com/')
html_doc = r.content

# create a BeautifulSoup object with an explicit parser
soup = BeautifulSoup(html_doc, 'html.parser')

# get title
print(soup.title)

# print all links
for link in soup.find_all('a'):
    print(link.get('href'))
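Many href values on a page are relative. A small follow-up sketch that resolves each link against the page URL using the standard-library urljoin:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import requests

r = requests.get('http://stackoverflow.com/')
soup = BeautifulSoup(r.content, 'html.parser')
for link in soup.find_all('a', href=True):  # only anchors that have an href
    print(urljoin(r.url, link['href']))     # resolve relative links to absolute
```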