
Visualize Data with Python



Visualizing data has become a critical skill for many professionals, especially in the field of data analysis. In this article, we’ll explore how to visualize data using Python’s powerful pandas library.

Start by obtaining the dataset we’ll be working with. This dataset, which can be fetched from depaul.edu, consists of information about US presidents, their associated political parties, professions, and other related data.

Python Pandas Dataset


Plotting Data with Pandas

One of the most useful features of pandas is its built-in plotting. With a few lines of code, you can visualize data loaded from large Excel files. For this demonstration, we’ll focus on the “Occupation” column:

df['Occupation']
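
Selecting a single column this way returns a pandas Series, and calling `value_counts()` on it tallies how often each value occurs. A minimal sketch, assuming a small made-up DataFrame (the rows and the `sample` name are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample rows standing in for the real spreadsheet
sample = pd.DataFrame({
    "President": ["A", "B", "C", "D"],
    "Occupation": ["Lawyer", "Soldier", "Lawyer", "Planter"],
})

occupations = sample["Occupation"]   # selecting one column gives a Series
counts = occupations.value_counts()  # occurrences per value, most frequent first

print(type(occupations).__name__)
print(counts)
```

The resulting counts are the kind of data a pie or bar chart can be drawn from directly.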

The complete code to plot the distribution of occupations among the presidents is as follows:

import matplotlib.pyplot as plt
import pandas as pd

# Load the spreadsheet into a DataFrame (reading .xls files requires the xlrd package)
file = r'data/Presidents.xls'
df = pd.read_excel(file)

# Define the colors for the plot
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral', 'red', 'green', 'blue', 'orange', 'white', 'brown']

# Count the presidents per occupation and draw the counts as a pie chart
df['Occupation'].value_counts().plot(kind='pie', title='Occupation by President', colors=colors)
plt.show()

Occupation by President

Data Cleaning and Plotting

Data visualization often requires clean data. So, before we visualize the popularity of each president, let’s first clean our dataset. Here’s an example of what the raw data looks like:

Data Cleaning with Pandas

As observed, some cells do not have numerical values, and it’s best practice to either remove or replace them. For the sake of this tutorial, we’ll opt for removal:

df = df[df['% popular'] != 'NA()']
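
The filter above removes only the literal string `'NA()'`, and the remaining values may still be stored as text. One alternative, sketched here on made-up values, is to let `pd.to_numeric` turn anything non-numeric into `NaN` and then drop those rows. The sample values and the `sample` and `cleaned` names are hypothetical; only the column name `% popular` comes from the dataset.

```python
import pandas as pd

# Hypothetical values mimicking a column that mixes numbers and 'NA()' markers
sample = pd.DataFrame({"% popular": [50.1, "NA()", 43.8, "NA()", 61.0]})

# errors="coerce" converts unparseable entries to NaN instead of raising an error
sample["% popular"] = pd.to_numeric(sample["% popular"], errors="coerce")

# Drop the rows whose popularity value could not be converted
cleaned = sample.dropna(subset=["% popular"])

print(cleaned["% popular"].tolist())
```

This approach also catches other non-numeric markers, at the cost of silently discarding them, so it is worth checking how many rows were dropped.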

Now, let’s plot the popularity:

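import matplotlib.pyplot as plt
import pandas as pd

# Load the spreadsheet into a DataFrame (reading .xls files requires the xlrd package)
file = r'data/Presidents.xls'
df = pd.read_excel(file)

# Cleaning data: drop rows without a popularity value, then convert the rest to numbers
df = df[df['% popular'] != 'NA()']
popularity = pd.to_numeric(df['% popular'])

# Plotting popularity distribution
# (density=True replaces the 'normed' argument, which recent Matplotlib releases no longer accept)
popularity.plot(kind='hist', bins=8, title='Popularity by President', facecolor='blue', alpha=0.5, density=True)
plt.show()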

Popularity by President
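
The histogram above is normalized (Matplotlib's `density=True`, called `normed` in older releases), so the bar heights are densities rather than raw counts. A small sketch of what that normalization means, using NumPy's `np.histogram` on made-up percentages (the values are hypothetical, not from the dataset):

```python
import numpy as np

# Hypothetical popularity percentages, for illustration only
values = np.array([42.0, 47.5, 50.1, 52.3, 55.0, 58.9, 61.0, 49.2])

# density=True rescales the counts so that the total bar area equals 1
heights, edges = np.histogram(values, bins=8, density=True)
widths = np.diff(edges)  # width of each bin
total_area = float(np.sum(heights * widths))

print(heights)
print(total_area)
```

Because the area is fixed at 1, normalized histograms of datasets with different sizes can be compared on the same scale.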

With pandas, visualizing data takes only a few lines of code. Explore more of the library's functionality to build your data analysis skills.
