
pandas filter



Filtering rows in a DataFrame with Python’s pandas library is a fundamental task in data analysis. It narrows a dataset down to the rows that are relevant to the question at hand.
If you’re looking to enhance your data analysis skills, consider this Data Analysis with Python Pandas course.

Filter Using the Query Method
In pandas, the rows of a DataFrame can be filtered using boolean expressions. The query() method of the DataFrame takes such an expression as a string and returns only the rows for which it is true.
To demonstrate, first import pandas and create a simple DataFrame:

import pandas as pd

# Example data: one entry per employee
data = {
    'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
    'year': [2017, 2017, 2017, 2017, 2017],
    'salary': [40000, 24000, 31000, 20000, 30000]
}

# The index holds the company each employee works for
df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])
print(df)

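The above code prints the following DataFrame:

           name  year  salary
Acme      Alice  2017   40000
Acme        Bob  2017   24000
Bilbao  Charles  2017   31000
Bilbao    David  2017   20000
Bilbao     Eric  2017   30000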

With the DataFrame in place, we can use the query() method. For instance, to keep only the employees with a salary greater than 30,000:

df_filtered = df.query('salary>30000')
print(df_filtered)

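The result contains only the employees that meet the criterion. Eric, at exactly 30,000, is excluded because the comparison is strict:

           name  year  salary
Acme      Alice  2017   40000
Bilbao  Charles  2017   31000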
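A query() expression can also combine several conditions with and / or, and can refer to a local Python variable by prefixing its name with @. The following is a minimal sketch that reuses the DataFrame from above; the variable name min_salary is an assumption chosen for illustration and does not appear in the original example.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
    'year': [2017, 2017, 2017, 2017, 2017],
    'salary': [40000, 24000, 31000, 20000, 30000]
}
df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

# Hypothetical threshold kept in an ordinary Python variable
min_salary = 30000

# '@' pulls the local variable into the expression;
# 'and' requires both conditions to hold for a row to be kept
high_earners = df.query('salary > @min_salary and year == 2017')
print(high_earners)
```

Keeping the threshold in a variable avoids building the query string by hand when the cutoff changes.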

Here is the complete code, which creates the DataFrame and applies the filter in one place:

import pandas as pd

# Example data: one entry per employee
data = {
    'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
    'year': [2017, 2017, 2017, 2017, 2017],
    'salary': [40000, 24000, 31000, 20000, 30000]
}

# The index holds the company each employee works for
df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])
print(df)
print('----------')

# Keep only the rows where salary is strictly greater than 30000
df_filtered = df.query('salary > 30000')
print(df_filtered)

Filter Using Indexing and Method Chaining
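Another approach to filtering data in pandas is boolean indexing: pass a boolean Series inside square brackets, and pandas keeps the rows where it is True. Because the mask is built with ordinary Python code rather than a string, it can use any Series operation, which sometimes makes it more flexible than the query() method. Combine conditions with & (and) or | (or), and wrap each condition in parentheses, since these operators bind more tightly than comparisons: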

df_filtered = df[(df.salary >= 30000) & (df.year == 2017)]
print(df_filtered)

This returns the rows where the salary is 30,000 or higher and the year is 2017:

           name  year  salary
Acme      Alice  2017   40000
Bilbao  Charles  2017   31000
Bilbao     Eric  2017   30000
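Boolean masks also support | (or) and ~ (not), and because every filter returns a new DataFrame, further method calls can be chained onto it. The sketch below is one possible way to do this, not the only one: it assumes the same DataFrame as above, uses .loc with a callable so the filter can sit inside a chain, and sorts the result with sort_values(). The variable names are illustrative.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
    'year': [2017, 2017, 2017, 2017, 2017],
    'salary': [40000, 24000, 31000, 20000, 30000]
}
df = pd.DataFrame(data, index=['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

# '|' keeps rows where either condition holds
low_or_alice = df[(df['salary'] < 25000) | (df['name'] == 'Alice')]

# '~' negates a mask: rows whose salary is NOT 30000 or higher
below_30k = df[~(df['salary'] >= 30000)]

# Method chaining: filter with .loc and a callable, then sort the result.
# The lambda receives the DataFrame at that point in the chain.
chained = (
    df
    .loc[lambda d: d['salary'] >= 30000]
    .sort_values('salary')
)

print(low_or_alice)
print(below_30k)
print(chained)
```

Passing a callable to .loc means the condition is evaluated against the intermediate result, so the chain keeps working if earlier steps are added before the filter.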

In conclusion, pandas offers multiple ways to filter the rows of a DataFrame. The query() method takes the condition as a string, while boolean indexing builds it from Python expressions; both return a new DataFrame containing only the matching rows.