
pandas filter



Filtering the rows of a DataFrame is one of the most common tasks in data analysis with Python. Often we are not interested in the entire dataset, only in the rows that meet some condition.

Related course:
Data Analysis with Python Pandas

Filter using query
The rows of a DataFrame can be filtered with a boolean expression over its columns. Every DataFrame has a query() method for this purpose.


We start by importing pandas and numpy and creating a DataFrame:


import pandas as pd
import numpy as np

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
        'year': [2017, 2017, 2017, 2017, 2017],
        'salary': [40000, 24000, 31000, 20000, 30000]}

df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

print(df)

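This will create the following DataFrame:

           name  year  salary
Acme      Alice  2017   40000
Acme        Bob  2017   24000
Bilbao  Charles  2017   31000
Bilbao    David  2017   20000
Bilbao     Eric  2017   30000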

After creating the DataFrame, we call the query method with a boolean expression. The expression refers to the column names we defined (name, year, and salary). The query method returns a new, filtered DataFrame.


df_filtered = df.query('salary>30000')
print(df_filtered)

This will return:

           name  year  salary
Acme      Alice  2017   40000
Bilbao  Charles  2017   31000

The complete code for creating the DataFrame and filtering it with a boolean expression:


import pandas as pd
import numpy as np

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
        'year': [2017, 2017, 2017, 2017, 2017],
        'salary': [40000, 24000, 31000, 20000, 30000]}

df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

print(df)
print('----------')

df_filtered = df.query('salary>30000')
print(df_filtered)
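
The expression passed to query() can also combine several conditions and refer to Python variables. The following is a minimal sketch on the same data; the variable name min_salary is only illustrative and not part of the original example. Inside the query string, the @ prefix looks up a Python variable and the word and joins two conditions:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
        'year': [2017, 2017, 2017, 2017, 2017],
        'salary': [40000, 24000, 31000, 20000, 30000]}

df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

# Illustrative threshold (an assumption, not taken from the original example)
min_salary = 30000

# '@min_salary' refers to the Python variable; 'and' combines both conditions
df_filtered = df.query('salary >= @min_salary and year == 2017')
print(df_filtered)
```

Keeping the threshold in a variable avoids building the query string by hand whenever the value changes.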

Filter by indexing, chain methods
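Instead of query(), we can use boolean indexing.
We index the DataFrame with a boolean expression built from its columns. Each condition goes in parentheses, and the conditions are combined with & (and):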


df_filtered = df[(df.salary >= 30000) & (df.year == 2017)]
print(df_filtered)

This will return:

           name  year  salary
Acme      Alice  2017   40000
Bilbao  Charles  2017   31000
Bilbao     Eric  2017   30000
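
To make the mechanics of boolean indexing more explicit, the sketch below stores the condition in a separate variable first and then compares the result with the equivalent query() call. It also shows the | (or) and ~ (not) operators. The variable names mask, by_mask, by_query, low_or_high, and not_acme are illustrative only:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
        'year': [2017, 2017, 2017, 2017, 2017],
        'salary': [40000, 24000, 31000, 20000, 30000]}

df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

# A comparison on a column yields a boolean Series, one True/False per row
mask = (df['salary'] >= 30000) & (df['year'] == 2017)

# Indexing with the mask keeps only the rows where it is True
by_mask = df[mask]

# The same filter written as a query string
by_query = df.query('salary >= 30000 and year == 2017')
print(by_mask.equals(by_query))

# | means "or": salaries below 25000 or above 35000
low_or_high = df[(df['salary'] < 25000) | (df['salary'] > 35000)]
print(low_or_high)

# ~ negates a condition: every row whose index label is not 'Acme'
not_acme = df[~(df.index == 'Acme')]
print(not_acme)
```

The parentheses around each condition are required because &, | and ~ bind more tightly than comparison operators such as >= and ==.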

