Visualize data with Pandas

Get the xls data for this tutorial from: depaul.edu. This dataset contains a list of US presidents, associated parties,profession and more.

pandas-data
Python Pandas Dataset

Related course
Data Analysis with Python Pandas

Beautiful Plots with Pandas
We can plot data of this large excel file with a few lines of code. We select the column “Occupation” for this demonstration using:

df['Occupation'].

Full code:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd 
 
file = r'data/Presidents.xls'
df = pd.read_excel(file)
 
# plot data
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral','red','green','blue','orange','white','brown']
df['Occupation'].value_counts().plot(kind='pie',title='Occupation by President',colors=colors)
plt.show()

occupation by president

Data cleaning and plotting
We can also plot the popularity by president. Before we do that, we need to clean some data:

data-cleaning
Data cleaning with Pandas

Some of the cells do not contain numerical values, we can either remove them or replace them:

#df = df.replace('NA()',0)
df = df[df['% popular'] != 'NA()']

To plot the popularity:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd 
 
file = r'data/Presidents.xls'
df = pd.read_excel(file)
 
# plot data
#df = df.replace('NA()',0)
df = df[df['% popular'] != 'NA()']
 
print( df['% popular'] )
df['% popular'].plot(kind='hist', bins=8, title='Popularity by President', facecolor='blue', alpha=0.5, normed=1)
plt.show()
popularity-by-president
Popularity by President

Pandas read CSV

Pandas is a data analaysis module. It provides you with high-performance, easy-to-use data structures and data analysis tools.

In this article you will learn how to read a csv file with Pandas.

Related course
Data Analysis with Python Pandas

Read CSV with Python Pandas
We create a comma seperated value (csv) file:

Names,Highscore,
Mel,8,
Jack,5,
David,3,
Peter,6,
Maria,5,
Ryan,9,

Imported in excel that will look like this:

pandas-dataset
Python Pandas example dataset

The data can be read using:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd 
 
file = r'highscore.csv'
df = pd.read_csv(file)
print(df)

The first lines import the Pandas module. The read_csv method loads the data in a a Pandas dataframe that we named df.

Dataframes
A dataframe can be manipulated using methods, the minimum and maximum can easily be extracted:

from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd 
 
file = r'highscore.csv'
df = pd.read_csv(file)
print('Max', df['Highscore'].max())
print('Min', df['Highscore'].min())
pandas-shell
Pandas on a dataset

The dataset in this example is very small, but a dataset can easily contain thousands or millions of records.

Read Excel with Pandas

Excel files can be read using the Python module Pandas. In this article we will read excel files using Pandas.

Related course:
Data Analysis with Python Pandas

Read Excel column names
We import the pandas module, including ExcelFile. The method read_excel() reads the data into a Pandas Data Frame, where the first parameter is the filename and the second parameter is the sheet.

The list of columns will be called df.columns.

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
 
df = pd.read_excel('File.xlsx', sheetname='Sheet1')
 
print("Column headings:")
print(df.columns)

Using the data frame, we can get all the rows below an entire column as a list. To get such a list, simply use the column header

print(df['Sepal width'])

Read Excel data
We start with a simple Excel file, a subset of the Iris dataset.

excel - python pandas

To iterate over the list we can use a loop:

for i in df.index:
    print(df['Sepal width'][i])

We can save an entire column into a list:

listSepalWidth = df['Sepal width']
print(listSepalWidth[0])

We can simply take entire columns from an excel sheet:

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
 
df = pd.read_excel('File.xlsx', sheetname='Sheet1')
 
sepalWidth = df['Sepal width']
sepalLength = df['Sepal length']
petalLength = df['Petal length']

Write Excel with Pandas

Excel files can be created in Python using the module Pandas. In this article we will show how to create an excel file using Python.

Related course:
Data Analysis with Python Pandas

Write Excel
We start by importing the module pandas. From the module we import ExcelWriter and ExcelFile.
The next step is to create a data frame. In the data frame we put a list, with the name of the list as the first argument:

df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                   'b':[3,5,6,2,4,6,7,8,7,8,9]})

In this short code snippet, ‘a’ and ‘b’ are columns’. The second is the list of data the these columns will contain.

Then we write data frame to an excel file using:

writer = ExcelWriter('Pandas-Example2.xlsx')
df.to_excel(writer,'Sheet1',index=False)
writer.save()

The complete code:

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import numpy as np
 
df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                   'b':[3,5,6,2,4,6,7,8,7,8,9]})
 
writer = ExcelWriter('Pandas-Example2.xlsx')
df.to_excel(writer,'Sheet1',index=False)
writer.save()

The output of this code:

write excel pandas

Posts navigation

1 2