Creating a Gmail wordcloud

We have created a Python program that generates a wordcloud based on your Gmail account; the resulting image depends on the contents of your emails.
First you will need a small script that interacts with the Gmail service. It relies on the gmaillib library being installed, and you will need to allow "less-secure" applications to access the Gmail server: https://www.google.com/settings/security/lesssecureapps
Gmail example:
#!/usr/bin/env python
import gmaillib
from collections import Counter

def getMails(cnt, account, start, amount):
    # Count the sender address of every fetched email
    emails = account.inbox(start, amount)
    for email in emails:
        cnt[email.sender_addr] += 1

amountOfMails = 100
cnt = Counter()
username = raw_input("Gmail account: ")
password = raw_input("Password: ")
account = gmaillib.account(username, password)
getMails(cnt, account, 0, amountOfMails)
print cnt
If this script runs successfully, you have almost all requirements installed. You will also need the wordcloud library. We modify the script so that it builds one long string containing all message bodies, which we feed as input to the WordCloud instance. The variable amount contains the number of mails to fetch. We have set it to 100, but you could fetch all messages using get_inbox_count() (see the sketch below) or simply fetch all emails of the last week.
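For example, here is a minimal sketch of fetching every message rather than a fixed amount. It assumes gmaillib's get_inbox_count() returns the total number of messages in the inbox, as its name suggests:

# Sketch: fetch all messages instead of a fixed amount.
# Assumes account.get_inbox_count() returns the total inbox size.
amount = account.get_inbox_count()
emails = account.inbox(0, amount)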
Final program:
#!/usr/bin/env python
import gmaillib
from wordcloud import WordCloud
import matplotlib.pyplot as plt

amount = 100
username = raw_input("Gmail account: ")
password = raw_input("Password: ")
account = gmaillib.account(username, password)

# Concatenate the bodies of the fetched emails into one long string
emails = account.inbox(0, amount)
data = ""
for email in emails:
    data = data + str(email.body)

# Generate and display the wordcloud
wordcloud = WordCloud().generate(data)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
Requests: HTTP for Humans
If you want to request data from webservers, the traditional way to do that in Python is the urllib library. While this library is effective, it is easy to end up with more complexity than you need. Is there another way?
Requests is an Apache2 Licensed HTTP library, written in Python. It’s powered by httplib and urllib3, but it does all the hard work for you.
To install, type:

git clone https://github.com/kennethreitz/requests.git
cd requests
sudo python setup.py install
The Requests library is now installed (alternatively, if you have pip available, pip install requests achieves the same). We will list some examples below:
Grabbing raw html using HTTP/HTTPS requests
We can now query a website as follows:

import requests

r = requests.get('http://pythonspot.com/')
print r.content
Save it as website.py and run with:
python website.py
It will output the raw HTML code.
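Note that r.content holds the raw bytes of the response. If you want the body decoded to text instead, Requests also provides r.text and r.encoding; a small sketch:

import requests

r = requests.get('http://pythonspot.com/')
print r.encoding   # the encoding Requests guessed from the response headers
print r.text       # the response body decoded to unicode using that encoding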
Download binary image using Python
from PIL import Image
from StringIO import StringIO
import requests

# The image URL was truncated in the original; substitute any direct image URL.
url = 'http://1.bp.blogspot.com/_r-MQun1PKUg/SlnHnaLcw6I/AAAAAAAAA_U...'
r = requests.get(url)
i = Image.open(StringIO(r.content))
i.show()
An image retrieved using Python
Website status code (is the website online?)
import requests

r = requests.get('http://pythonspot.com/')
print r.status_code
This returns 200 (OK). A list of status codes can be found here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
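A status code only arrives if the server answers at all; if the website is completely unreachable, Requests raises a ConnectionError instead. A minimal sketch that treats both cases:

import requests

def is_online(url):
    # True if the server answered with 200 (OK), False otherwise
    try:
        return requests.get(url).status_code == 200
    except requests.exceptions.ConnectionError:
        return False

print is_online('http://pythonspot.com/')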
Retrieve JSON from a webserver
You can easily grab a JSON object from a webserver.
import requests

r = requests.get('https://api.github.com/events')
print r.json()
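r.json() returns plain Python data, in this case a list of dictionaries, so you can index into it directly. A small sketch, assuming the GitHub events feed keeps its usual 'type' field on each event:

import requests

r = requests.get('https://api.github.com/events')
# each event is a dictionary; print the type of every event
for event in r.json():
    print event['type']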
HTTP Post requests using Python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print r.text
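httpbin.org echoes the request back as JSON, which makes it easy to check what was actually sent; the posted form fields come back under the "form" key. A short sketch:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
# httpbin returns the posted form fields under the "form" key
print r.json()['form']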
SSL verification, verify certificates using Python
import requests

print requests.get('https://github.com', verify=True)
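If the certificate cannot be verified, Requests raises an SSLError. You can catch it, or disable verification with verify=False (not recommended outside of testing). A minimal sketch:

import requests

try:
    r = requests.get('https://github.com', verify=True)
    print r.status_code
except requests.exceptions.SSLError:
    print 'certificate could not be verified'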
Extract data from the HTTP response header
With every request you send to an HTTP server, the server sends back some additional metadata in the response headers. You can extract this data using:
#!/usr/bin/env python
import requests

r = requests.get('http://pythonspot.com/')
print r.headers
This prints the headers as a dictionary-like object. Since r.headers behaves like a case-insensitive dictionary, you can read individual fields directly; which fields are present depends on the server:

#!/usr/bin/env python
import requests

r = requests.get('http://pythonspot.com/')
# r.headers is a case-insensitive dict; .get() returns None for absent fields
print r.headers.get('server')
print r.headers.get('content-length')
print r.headers.get('content-encoding')
print r.headers.get('content-type')
print r.headers.get('date')
print r.headers.get('x-powered-by')
Extract data from HTML response
Once you get the data from a server, you can parse it using Python string functions or use a library. BeautifulSoup is often used. Example code that gets the page title and all links:
from bs4 import BeautifulSoup
import requests

# get html data
r = requests.get('http://stackoverflow.com/')
html_doc = r.content

# create a BeautifulSoup object (specify the parser explicitly)
soup = BeautifulSoup(html_doc, 'html.parser')

# get title
print soup.title

# print all links
for link in soup.find_all('a'):
    print link.get('href')
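BeautifulSoup also understands CSS selectors, which is often shorter than chaining find_all() calls. A small sketch that picks out only the anchor tags that actually carry an href attribute:

from bs4 import BeautifulSoup
import requests

r = requests.get('http://stackoverflow.com/')
soup = BeautifulSoup(r.content, 'html.parser')

# CSS selector: anchor tags that have an href attribute
for link in soup.select('a[href]'):
    print link.get('href')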