There are multiple ways to iterate (loop) over rows in pandas. However, doing so is often a “code smell” that suggests you are not very experienced with pandas.

First, let me cover how to iterate over rows. Let’s assume we have this pandas DataFrame.

import pandas as pd

df = pd.DataFrame({
    "product": ['iphone 14 pro', 'samsung s23', 'Motorola edge 30'],
    "price": [999, 1199, 799]
})
df

You can use the df.itertuples method.

for row in df.itertuples():
    print(row.price)
# 999
# 1199
# 799
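For completeness, here is a minimal sketch of the other common option, df.iterrows, using the same example DataFrame. It yields (index, row) pairs where each row is a pandas Series. The `prices` list is only a helper added for this illustration. Note that itertuples is generally the faster of the two, and iterrows does not preserve dtypes across a row.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["iphone 14 pro", "samsung s23", "Motorola edge 30"],
    "price": [999, 1199, 799],
})

# iterrows yields (index, row) pairs; each row is a pandas Series
prices = []  # helper list, only used for this illustration
for index, row in df.iterrows():
    prices.append(row["price"])
    print(index, row["product"], row["price"])
```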

Why isn’t this recommended?

The problem with this approach is that it’s slow. You will not feel this if you are working with a relatively small dataset (up to 100k rows, for example).

With a medium-sized dataset, however, this approach will be very slow.

What should I do then?

Generally, the best option is to avoid for loops altogether. For example, let’s divide each price by the average price.

df['new_col'] = df.price / df.price.mean()

Imagine all the code we would need to write to do the same thing with df.itertuples or df.iterrows.
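To make the comparison concrete, here is a rough sketch of what the loop-based version could look like next to the one-liner. The names `total`, `mean_price`, `loop_values` and the column `new_col_loop` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["iphone 14 pro", "samsung s23", "Motorola edge 30"],
    "price": [999, 1199, 799],
})

# Loop version, step 1: compute the average price by hand
total = 0
for row in df.itertuples():
    total += row.price
mean_price = total / len(df)

# Loop version, step 2: build the new column one row at a time
loop_values = []
for row in df.itertuples():
    loop_values.append(row.price / mean_price)
df["new_col_loop"] = loop_values

# Vectorized version: a single expression over the whole column
df["new_col"] = df.price / df.price.mean()
```

Both columns hold the same values, but the vectorized line is shorter and leaves the looping to pandas.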

The concept behind this idea is vectorization, and it applies to pandas, NumPy, PyTorch, TensorFlow, etc. These libraries provide a set of operations that are already optimized in low-level languages. Therefore, if you use, for example, df.price.mean(), you are calling a C function that is heavily optimized.

However, if you compute the mean with a plain Python loop, your solution will be much slower.
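If you want to check this on your own machine, a small timing sketch like the one below can help. It assumes a synthetic Series of 100,000 random prices, and `python_mean` is a hypothetical helper written for this comparison. The exact timings depend on your hardware and library versions, so no numbers are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption for this sketch): 100,000 random integer prices
rng = np.random.default_rng(0)
prices = pd.Series(rng.integers(100, 2000, size=100_000))


def python_mean(values):
    """Compute the mean with a plain Python loop (hypothetical helper)."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


# Time five runs of each approach
loop_time = timeit.timeit(lambda: python_mean(prices), number=5)
vectorized_time = timeit.timeit(lambda: prices.mean(), number=5)

print(f"python loop: {loop_time:.4f}s")
print(f"vectorized:  {vectorized_time:.4f}s")
```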

Interested in more pandas tips and tricks? Check out this YouTube video with the top 10 tips and tricks I use in my day-to-day work.

