Pandas Data Structures

Pandas is a powerful tool for data analysis in Python, with two primary structures: Series and DataFrames. By understanding these structures, you’ll be better equipped to manipulate, analyze, and visualize data efficiently.

Pandas Series
A Series is a one-dimensional labeled array capable of holding data of various types, such as strings, numbers, and Python objects.

Example of a Series holding characters:

import pandas as pd
s = pd.Series(['a', 'b', 'c'])

Storing integers in a Series:
s = pd.Series([1, 2, 3, 4, 5])
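To make "labeled" and "various types" concrete, here is a minimal sketch. The values and the labels 'x', 'y', 'z' are made up for illustration, as is the name `mixed`.

```python
import pandas as pd

# Illustrative values only: a string, an integer, and a float in one Series,
# with explicit index labels instead of the default 0, 1, 2.
mixed = pd.Series(['a', 2, 3.5], index=['x', 'y', 'z'])

print(mixed.index.tolist())  # ['x', 'y', 'z']
print(mixed['y'])            # 2
print(mixed.dtype)           # object (the generic dtype used for mixed types)
```

If you leave out `index`, pandas labels the values with integer positions starting at 0.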

A Series can also be built from a dictionary; the keys become the index labels:

countries_population = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(countries_population)

Accessing values in a Series:

population['US']  # Gets the population of the US

Retrieving a subset of the Series:
population[['US', 'Canada', 'UK']]

Filtering the Series with a boolean condition:

populous_countries = population[population > 60]
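It can help to see the filter in two steps. The sketch below reuses the population figures from above; the intermediate name `mask` is introduced here only for illustration.

```python
import pandas as pd

# Same figures as the population example above.
population = pd.Series({'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64})

# Step 1: the comparison yields a boolean Series with the same labels.
mask = population > 60
print(mask.tolist())  # [False, True, False, True, True]

# Step 2: indexing with the mask keeps only the entries where it is True.
populous_countries = population[mask]
print(populous_countries.index.tolist())  # ['US', 'France', 'UK']
```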

Pandas DataFrames
A DataFrame is a two-dimensional labeled data structure. Think of it as a table or a spreadsheet. DataFrames are more versatile than Series and are extensively used in Pandas.

Example DataFrame creation:

data = {
  'name': ['Bob', 'Bart', 'Bobby'],
  'occupation': ['Lawyer', 'Programmer', 'Teacher']
}
dff = pd.DataFrame(data, columns=['name', 'occupation'])

DataFrames support numerous operations, and their data can be loaded from various formats, including CSV files, SQLite databases, and Excel files.
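As a rough sketch of loading data, the example below reads a CSV and queries a SQLite table. To stay self-contained it uses in-memory stand-ins: the CSV text, the table name `people`, and the rows are all hypothetical. With real data you would pass a file path or an existing database connection instead.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content held in memory; a file path works the same way.
csv_text = "name,occupation\nBob,Lawyer\nBart,Programmer\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical in-memory SQLite database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, occupation TEXT)")
conn.execute("INSERT INTO people VALUES ('Bobby', 'Teacher')")
from_sql = pd.read_sql_query("SELECT name, occupation FROM people", conn)
conn.close()

print(from_csv)
print(from_sql)
```

Excel files can be read in a similar way with `pd.read_excel`, which relies on an optional engine such as openpyxl being installed.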

DataFrame Indexing & Selection
DataFrames allow fine-grained access to rows and columns.

Selecting a column:

names = dff['name']
occupations = dff['occupation']

Accessing specific rows by integer position with iloc:

first_row = dff.iloc[0]
second_row = dff.iloc[1]

Slicing rows for a subset of the DataFrame:
subset = dff[0:2]
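The three kinds of selection return different things, which the sketch below makes visible. It rebuilds the example DataFrame so it runs on its own.

```python
import pandas as pd

dff = pd.DataFrame({
    'name': ['Bob', 'Bart', 'Bobby'],
    'occupation': ['Lawyer', 'Programmer', 'Teacher'],
})

names = dff['name']      # one column: a Series
first_row = dff.iloc[0]  # one row by position: a Series labeled by column name
subset = dff[0:2]        # a row slice: a DataFrame; the end position is excluded

print(type(names).__name__, type(first_row).__name__, type(subset).__name__)
# Series Series DataFrame
print(first_row['occupation'])  # Lawyer
print(len(subset))              # 2
```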

Arithmetic Operations on Data
Both Series and DataFrames support element-wise arithmetic. An operation with a scalar is applied to every value and returns a new object; the original data is left unchanged.

Applying scalars to a Series:

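numbers = pd.Series([1, 2, 3, 4, 5])
doubled_numbers = numbers * 2  # multiply every value by the scalar 2
# Multiplying two Series works element-wise, so this squares the doubled values
squared_numbers = doubled_numbers * doubled_numbers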
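For reference, here is the same calculation with its results printed. The name `product` is used here only to stress that the last step squares the doubled values, not the original ones.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5])
doubled_numbers = numbers * 2
product = doubled_numbers * doubled_numbers  # element-wise, value by value

print(doubled_numbers.tolist())  # [2, 4, 6, 8, 10]
print(product.tolist())          # [4, 16, 36, 64, 100]
print(numbers.tolist())          # [1, 2, 3, 4, 5] (the original is unchanged)
```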

Applying scalars to a DataFrame:

import numpy as np
random_data = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))
doubled_data = random_data * 2
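Because the random example gives different numbers on every run, a small fixed table makes the effect easier to check. The values and the name `fixed_data` below are made up for illustration.

```python
import pandas as pd

# Small fixed table (made-up values) so the result is reproducible.
fixed_data = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 0]})

# The scalar is applied to every cell; a new DataFrame is returned.
doubled = fixed_data * 2

print(doubled['A'].tolist())     # [0, 2, 4]
print(doubled['B'].tolist())     # [6, 8, 0]
print(fixed_data['A'].tolist())  # [0, 1, 2] (the original is unchanged)
```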

By mastering the concepts of Series and DataFrames, you’ll be well on your way to becoming proficient in data analysis using Pandas. Remember, practice is key, so keep experimenting and exploring these data structures!

Posted in pandas

2017-03-31
