
Pandas Data Structures


Pandas, a data analysis library, provides two main data structures:

  • Series: a one-dimensional labeled array, created with pd.Series(data)
  • DataFrame: a two-dimensional labeled data structure with columns, much like a table, created with pd.DataFrame(data)


Series

A series can be seen as a one-dimensional array. It can hold any data type, including strings, integers, floats and Python objects.
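As a small illustration of the point about data types, the sketch below builds three series and prints the dtype pandas infers for each. The variable names are made up for this example.

```python
import pandas as pd

# A series built from integers or floats gets a numeric dtype.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5, 3.5])

# Mixed Python values fall back to the generic "object" dtype.
mixed = pd.Series(['a', 1, 2.0])

print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)
```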

A very basic example is shown below, where it holds characters:

import pandas as pd
s = pd.Series(['a','b','c'])

It can also be created from a list of integers:

import pandas as pd
s = pd.Series([1,2,3,4,5])
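Because a series is a labeled array, every value has an index label. The sketch below shows the default integer labels and, as an assumption for illustration, a series with the made-up labels 'x', 'y' and 'z' passed through the index argument.

```python
import pandas as pd

# Without an explicit index, the labels default to 0, 1, 2, ...
s = pd.Series([1, 2, 3])
print(list(s.index))  # [0, 1, 2]

# The labels 'x', 'y', 'z' are hypothetical, chosen only for this example.
labeled = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
print(labeled['y'])  # 2
```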

A pandas series can also be created from a dictionary. The keys become the index labels and the values become the data:

import pandas as pd

d = { 'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66,'UK': 64}
population = pd.Series(d)
print(population)
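To make the mapping from dictionary to series explicit, this short sketch inspects the index and the values separately. It reuses the same dictionary as the example above.

```python
import pandas as pd

d = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(d)

# The dictionary keys are now the index labels.
print(list(population.index))

# The dictionary values are the data.
print(list(population.values))

# Membership tests look at the labels, just like with a dictionary.
print('US' in population)  # True
```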

To get a single value, index the series by its label:

print(population['US'])

To get a subset, pass a list of labels:

print(population[['US','Canada','UK']])
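The same lookups can also be written with the explicit accessors .loc (by label) and .iloc (by integer position). The sketch below is one possible way to spell them out, not the only one.

```python
import pandas as pd

d = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(d)

# Label-based lookup of a single value.
us = population.loc['US']

# Label-based lookup of several values; the result is again a series.
subset = population.loc[['US', 'Canada', 'UK']]

# Position-based lookup: the first entry, regardless of its label.
first = population.iloc[0]

print(us)
print(subset)
print(first)
```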

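You can also use comparison operators on the series. The comparison produces a series of True/False values, which then selects the matching entries:
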
print(population[population > 60])
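It can help to look at the intermediate True/False series on its own. The sketch below stores it in a variable (the name mask is just a convention) and also combines two conditions with &, which needs parentheses around each comparison.

```python
import pandas as pd

d = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(d)

# The comparison is evaluated for every element and yields booleans.
mask = population > 60
print(mask)

# Indexing with the boolean series keeps only the True entries.
large = population[mask]
print(large)

# Conditions are combined with & (and) or | (or), each in parentheses.
mid = population[(population > 30) & (population < 100)]
print(mid)
```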

Data Frames

The data frame data structure is similar to a table. Data frames are the most commonly used Pandas data structure. So how is one made?

Let's say you have the following table:

name    occupation
Bob     Lawyer
Bart    Programmer
Bobby   Teacher

Then you can represent that as a Python dictionary like this:

d = { 'name': ['Bob','Bart','Bobby'],
'occupation': ['Lawyer','Programmer','Teacher']}

After that, you can create a new DataFrame object.

frame = pd.DataFrame(d, columns=['name','occupation'])

Unlike a plain dictionary, a data frame supports operations such as selection, filtering and arithmetic across whole rows and columns.
The complete code then becomes:

import pandas as pd

d = { 'name': ['Bob','Bart','Bobby'],
'occupation': ['Lawyer','Programmer','Teacher']}

frame = pd.DataFrame(d, columns=['name','occupation'])
print(frame)
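As a rough illustration of the kind of operations a data frame offers beyond a dictionary, the sketch below checks the shape, sorts by a column and filters rows. These are only a few of the available operations, picked for this example.

```python
import pandas as pd

d = {'name': ['Bob', 'Bart', 'Bobby'],
     'occupation': ['Lawyer', 'Programmer', 'Teacher']}
frame = pd.DataFrame(d, columns=['name', 'occupation'])

# (number of rows, number of columns)
print(frame.shape)

# Sort the rows alphabetically by the 'name' column.
sorted_frame = frame.sort_values('name')
print(sorted_frame)

# Keep only the rows where the occupation is 'Teacher'.
teachers = frame[frame['occupation'] == 'Teacher']
print(teachers)
```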

Data may be defined manually, as above, or loaded from an external source.
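The text does not say which sources are meant. One common case, shown here as an assumption, is CSV data read with pd.read_csv. To keep the sketch self-contained, the CSV content is held in a string and wrapped in io.StringIO instead of coming from a real file.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data that
# mirrors the table used earlier).
csv_text = "name,occupation\nBob,Lawyer\nBart,Programmer\nBobby,Teacher\n"

# read_csv accepts a file path or any file-like object.
frame = pd.read_csv(io.StringIO(csv_text))
print(frame)
```

With a real file you would pass its path to pd.read_csv instead of the StringIO object.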


Indexing / selection

You can select parts of a data frame. First, define a DataFrame:

>>> dff = pd.DataFrame( { 'name': ['Bob','Bart','Bobby'],  
... 'occupation': ['Lawyer','Programmer','Teacher'] } )

You can select a column by its column header name:

>>> dff['name']
0      Bob
1     Bart
2    Bobby
Name: name, dtype: object
>>> dff['occupation']
0        Lawyer
1    Programmer
2       Teacher
Name: occupation, dtype: object
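Selecting with a single name returns a series, while selecting with a list of names returns a data frame. The sketch below shows the difference and also uses a list to pick several columns in a chosen order.

```python
import pandas as pd

dff = pd.DataFrame({'name': ['Bob', 'Bart', 'Bobby'],
                    'occupation': ['Lawyer', 'Programmer', 'Teacher']})

# A single column name gives back a Series.
col = dff['name']
print(type(col))

# A list with one name gives back a DataFrame with one column.
sub = dff[['name']]
print(type(sub))

# A list with several names selects those columns in the given order.
both = dff[['occupation', 'name']]
print(both)
```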

You can select a row by integer location:

>>> dff.iloc[0]
name             Bob
occupation    Lawyer
Name: 0, dtype: object
>>> dff.iloc[1]
name                Bart
occupation    Programmer
Name: 1, dtype: object
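Integer location and index label only coincide when the index is the default 0, 1, 2. The sketch below gives the rows the made-up labels 'a', 'b' and 'c' (an assumption for illustration) to show .iloc selecting by position and .loc selecting by label.

```python
import pandas as pd

# The row labels 'a', 'b', 'c' are hypothetical, not from the example above.
dff = pd.DataFrame({'name': ['Bob', 'Bart', 'Bobby'],
                    'occupation': ['Lawyer', 'Programmer', 'Teacher']},
                   index=['a', 'b', 'c'])

# .iloc uses the integer position of the row.
first_row = dff.iloc[0]
print(first_row)

# .loc uses the row label.
last_row = dff.loc['c']
print(last_row)

# .loc also accepts a row label and a column name together.
job = dff.loc['b', 'occupation']
print(job)
```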

If you want to slice rows from the data frame, you can do that too:

>>> dff[0:2]
   name  occupation
0   Bob      Lawyer
1  Bart  Programmer
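The slice above works by position and excludes the end, like a Python list slice. As a hedged comparison, the sketch below shows the equivalent .iloc slice and the label-based .loc slice, which includes both endpoints.

```python
import pandas as pd

dff = pd.DataFrame({'name': ['Bob', 'Bart', 'Bobby'],
                    'occupation': ['Lawyer', 'Programmer', 'Teacher']})

# Positional slice: rows at positions 0 and 1, the end is excluded.
by_brackets = dff[0:2]
by_iloc = dff.iloc[0:2]
print(by_iloc)

# Label slice: rows labeled 0 through 2, the end is included.
by_loc = dff.loc[0:2]
print(by_loc)
```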

Data arithmetic

You can do many arithmetic operations on Pandas data frames and series.
For example, you can multiply a series by a scalar, or multiply it by another series (here, itself) element by element.

>>> s = pd.Series([1,2,3,4,5])
>>> s = s * 2
>>> s
0     2
1     4
2     6
3     8
4    10
dtype: int64
>>> s = s * s
>>> s
0      4
1     16
2     36
3     64
4    100
dtype: int64
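When two different series are combined, pandas matches the values by index label, not by position. The sketch below uses made-up labels to show this. Labels present in only one series give NaN unless a fill value is supplied through the add method.

```python
import pandas as pd

# The labels are hypothetical and only partly overlap.
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Values are paired up by label; unmatched labels become NaN.
total = a + b
print(total)

# add() with fill_value treats a missing entry as 0 instead.
filled = a.add(b, fill_value=0)
print(filled)
```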

Scalars can be applied to data frames too.

In the example below we create a data frame of random integers (this requires import numpy as np), so your values will differ from the ones shown.
Then we multiply the data frame by a scalar.

>>> import numpy as np
>>> df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
>>> df
   A  B  C  D
0  0  3  2  2
1  3  3  2  2
2  1  2  2  3
3  3  2  3  4
4  0  2  2  3
>>> df = df * 2
>>> df
   A  B  C  D
0  0  6  4  4
1  6  6  4  4
2  2  4  4  6
3  6  4  6  8
4  0  4  4  6
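Arithmetic also works between columns of a data frame. The sketch below is a variation on the example above: it uses a seeded NumPy generator (an assumption, so that repeated runs produce the same numbers) and stores the sum of two columns in a new column named 'A_plus_B', a name invented for this example.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random integers reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(5, 4)), columns=list('ABCD'))

# Multiplying by a scalar applies to every cell.
doubled = df * 2
print(doubled)

# Adding two columns works row by row and yields a Series.
total = df['A'] + df['B']

# The result can be stored as a new column on a copy of the frame.
with_total = df.assign(A_plus_B=total)
print(with_total)
```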




