
Reading CSV files in Python


In this article you will learn how to read CSV files using Python 3. The file data contains comma-separated values (CSV). The comma is known as the delimiter; another character, such as a semicolon, may be used instead.

A CSV file is a table of values, separated by commas. To read a CSV file from Python, you can use the csv module from the standard library or the pandas library.


Read CSV

CSV stands for “comma-separated values”. CSV files are a common format for data exchange, storage, and editing. In fact, the .csv files you may open in a spreadsheet application (like Excel) are just plain text files, with one very simple rule:

All of the fields in your records must be separated by commas.

For example, the following might be a small part of a sample spreadsheet in CSV format. Every row has the same number of fields as the header, and there are no spaces between a comma and the opening quote of the next field:

"first_name","last_name","email","address","city","state","zip","phone"
"charlie","davidson","[email protected]","123 main street, akron, ohio","akron","ohio","23678",""
"tanya","jones","[email protected]","734 main street","ny","new york","nyc","12354"

Another example csv file:


01/01/2016, 4
02/01/2016, 2
03/01/2016, 10
04/01/2016, 8
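Quoting matters when a field itself contains a comma, such as a full street address. The sketch below is only an illustration: the two-row sample is made up, and io.StringIO stands in for a real file so the snippet can run without one.

```python
import csv
import io

# Hypothetical sample data; the address contains commas, so it is quoted.
sample = (
    '"name","address"\n'
    '"charlie","123 main street, akron, ohio"\n'
)

# io.StringIO lets csv.reader treat the string like an open file.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

The quoted address comes back as a single cell, even though it contains two commas.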

The process will be: open the file, pass it to a CSV reader, and loop over the rows.

Read CSV file

One of the first things you will need to do when creating a data-driven Python application is to read your data from a CSV file into a dataset. Whether you already know CSV files from Excel or are new to the format, reading them in Python is straightforward.

The most basic method to read a csv file is:

# load csv module
import csv

# open file for reading (newline='' is recommended by the csv module docs)
with open('file.csv', newline='') as csvDataFile:

    # read file as csv file
    csvReader = csv.reader(csvDataFile)

    # for every row, print the row
    for row in csvReader:
        print(row)

We import the csv module. This is a simple module to read and write CSV files in Python.


import csv

You can read every row in the file. Each row is returned as a list of strings, so its cells can be accessed by index. Inside the loop, to print the first cell of each row we could simply write:


print(row[0])

For the second cell, you would use:


print(row[1])
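To try indexing without a file on disk, here is a minimal sketch that reads one row of the date/score sample from a string. Using io.StringIO and next() here is my own choice for the illustration, not something the article's example does.

```python
import csv
import io

# Two lines of the date/score sample, held in a string instead of a file.
data = "01/01/2016, 4\n02/01/2016, 2\n"

reader = csv.reader(io.StringIO(data))

# next() pulls a single row from the reader; the row is a plain list.
first_row = next(reader)

print(type(first_row))
print(first_row[0])  # first cell
print(first_row[1])  # second cell
```

Note that every cell is a string, and the space after the comma in the sample stays part of the second cell.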

It is often clearer to collect each column in its own named list, because names such as dates and scores are easier to understand than indices like [0], [1], [2].

You can do that by adding the cells to a list during loading. The example below demonstrates this:

# load module
import csv

# first cell data
dates = []

# second cell data
scores = []

# open file for reading
with open('file.csv', newline='') as csvDataFile:

    # open file as csv file
    csvReader = csv.reader(csvDataFile)

    # loop over rows
    for row in csvReader:

        # add cell [0] to list of dates
        dates.append(row[0])

        # add cell [1] to list of scores
        scores.append(row[1])

# output data
print(dates)
print(scores)

We create two lists: dates and scores. We use the append method to add each cell to its list.
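The csv module returns every cell as a string, so the scores list holds text rather than numbers. If you want to calculate with the scores, they need to be converted first. A minimal sketch, assuming the date/score sample shown earlier and reading it from a string rather than from file.csv:

```python
import csv
import io

# The date/score sample, held in a string instead of a file.
data = "01/01/2016, 4\n02/01/2016, 2\n03/01/2016, 10\n04/01/2016, 8\n"

dates = []
scores = []
for row in csv.reader(io.StringIO(data)):
    dates.append(row[0])
    # int() ignores surrounding whitespace, so ' 4' becomes 4.
    scores.append(int(row[1]))

print(scores)
print(sum(scores) / len(scores))  # average score
```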

If you want to use a different delimiter, pass it to the reader call together with the file object:


csvReader = csv.reader(csvDataFile, delimiter=';')
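As a quick check of the delimiter argument, the sketch below parses a semicolon-separated version of the date/score data. The sample string is hypothetical and io.StringIO replaces the open file.

```python
import csv
import io

# Hypothetical semicolon-separated version of the date/score data.
data = "01/01/2016;4\n02/01/2016;2\n"

# The file object comes first; the delimiter is a keyword argument.
rows = list(csv.reader(io.StringIO(data), delimiter=';'))

print(rows)
```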

Load CSV function

If you have many csv files in an identical format, you can create a function for loading the data. That way you don’t have to write duplicate code.

For instance, if your csv files have the format (dates,scores) then you can write this code:


import csv

def readMyFile(filename):
    dates = []
    scores = []

    with open(filename, newline='') as csvDataFile:
        csvReader = csv.reader(csvDataFile)
        for row in csvReader:
            dates.append(row[0])
            scores.append(row[1])

    return dates, scores


dates, scores = readMyFile('file.csv')

print(dates)
print(scores)

Given a CSV filename, the function reads and parses the CSV data. The cells are added to the lists dates and scores, which are then returned.
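If you do not have a CSV file at hand, a loader like this can be tried against a throwaway file. The sketch below is one possible way to do that; the snake_case function name and the use of tempfile are assumptions made for the illustration.

```python
import csv
import os
import tempfile

def read_my_file(filename):
    """Return the first two columns of a CSV file as two lists."""
    dates, scores = [], []
    # newline='' is what the csv module documentation recommends.
    with open(filename, newline='') as csv_file:
        for row in csv.reader(csv_file):
            dates.append(row[0])
            scores.append(row[1])
    return dates, scores

# Write a small temporary file so the function has something to read.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("01/01/2016,4\n02/01/2016,2\n")
    path = tmp.name

dates, scores = read_my_file(path)
os.remove(path)  # clean up the temporary file

print(dates)
print(scores)
```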

Read csv with pandas

CSV files can also be read with the pandas library. Its read_csv() function accepts a file path, a URL, or a file-like object containing your data.

Pandas is not part of the Python standard library, so you will need to install it with the pip package manager. The read_csv function reads every column of the file into a single table.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Pandas uses its own data structure called a DataFrame (df), which is different from the Python lists you used with the csv module. Once a dataset has been read, many data manipulation functions become available.

To access a row by its index label, use loc. With the default index, label 0 is the first data row:

print(df.loc[0])
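As a rough illustration of what becomes available once the data is in a DataFrame, the sketch below loads the date/score sample from a string. The column names date and score are hypothetical (the sample has no header row), and skipinitialspace is used only because the sample has a space after each comma.

```python
import io

import pandas as pd

# The date/score sample, held in a string instead of a file.
data = "01/01/2016, 4\n02/01/2016, 2\n03/01/2016, 10\n04/01/2016, 8\n"

df = pd.read_csv(
    io.StringIO(data),
    header=None,               # the sample has no header row
    names=["date", "score"],   # hypothetical column names
    skipinitialspace=True,     # drop the space after each comma
)

# Columns can be selected by name and summarised directly.
print(df["score"].mean())

# Rows are still reachable by index label.
print(df.loc[0])
```

Unlike the csv module, pandas converts the score column to integers on its own, so no manual conversion is needed.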
