In this post I cover how to make line charts using the most popular data visualization libraries in Python: the pandas .plot() method, Matplotlib, Seaborn, Plotly Express and Plotnine.

A common issue with line charts is overplotting. This happens when the chart contains so many time series that it becomes impossible to read anything useful from it.

To solve this, I’ll show several examples using the small multiples (or facets) data visualization technique. If you don’t know the term, it means splitting a plot into a grid of panels, one for each value of a categorical variable.

Getting the data

import pandas as pd
import numpy as np

import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *

pd.options.display.max_columns = 500

# Use matplotlib's default styling
plt.style.use('default')

# Read the data and keep info of 9 states
data_path = "https://raw.githubusercontent.com/martinbel/datasets/master/unemployment.csv"
df = pd.read_csv(data_path, parse_dates=['date'])

# Keep only the states we are interested in
keep_states = ['SC', 'CA', 'FL', 'NY', 'WI', 'WA', 'NJ', 'IL', 'TX']
df = df.query('state in @keep_states')

# Take the first 3 rows of each state, then show the first 9 of those rows
df.groupby("state").head(3).head(9)
Unemployment Data
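All the code in this post relies on the data being in long format: one row per state and date, with the columns state, date and unemployment. If the URL above is not reachable, a small synthetic stand-in with the same column names is enough to try the examples. The sketch below is only that: the name demo and the values are made up (random walks), not real unemployment figures.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: random walks, NOT real unemployment figures
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
states = ["CA", "FL", "NY"]

demo = pd.concat(
    [
        pd.DataFrame({
            "state": state,
            "date": dates,
            "unemployment": 5 + rng.normal(0, 0.2, len(dates)).cumsum(),
        })
        for state in states
    ],
    ignore_index=True,
)

# Long format: one row per (state, date) pair
print(demo.groupby("state").head(2))
```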

1. Pandas .plot()

Pandas is a great option when you need a simple line chart. I use it a lot for quick plots when doing EDA on one dimension.

(df
 .query('state == "FL"')
 .set_index('date')
 .plot()
)
1d Time Series Plot

It’s also possible to plot multiple lines on one chart with Pandas. However, with data for nine states the chart gets crowded and is hard to read.

(df
 .set_index(['state', 'date'])
 .unstack(0)
 .plot()
 .legend(loc='center left',
         bbox_to_anchor=(1.0, 0.5))
)
Pandas Plot – Multiple time series
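The set_index / unstack chain above reshapes the data from long to wide format, with one column per state. As a rough sketch of the same idea, pivot does that reshape in a single, more explicit call. The tiny frame below is made up for illustration.

```python
import pandas as pd

# Made-up values, only to show the reshape
demo = pd.DataFrame({
    "state": ["CA", "CA", "FL", "FL"],
    "date": pd.to_datetime(["2020-01-01", "2020-02-01"] * 2),
    "unemployment": [4.0, 4.1, 3.0, 3.2],
})

# Long -> wide: dates as the index, one column per state
wide = demo.pivot(index="date", columns="state", values="unemployment")
print(wide)
```

Calling wide.plot() on the result draws one line per state, and the legend shows just the state codes.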

There is a small multiples (or faceted plot) option in pandas. However, in this case the default plot is just awful. I show it only for completeness; I would never use this plot in practice.

axis = (df
 .set_index(['state', 'date'])
 .plot(by='state')
)

for ax in axis:
    ax.legend(loc='center left',
              bbox_to_anchor=(1.0, 0.5))
Pandas – Small Multiples

I don’t think there is much to say about this plot. It works, and perhaps with a different backend you get something more interesting. But I’d rather explore other libraries that make more sense for complex plots.
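If you want to stay within pandas, one alternative worth trying is to reshape the data to wide format first and pass subplots=True, which draws one panel per column. This is a minimal sketch on synthetic data (the frame wide and its values are made up), not a polished chart.

```python
import numpy as np
import pandas as pd

# Synthetic wide-format data: one column per state, NOT real figures
rng = np.random.default_rng(1)
dates = pd.date_range("2000-01-01", periods=36, freq="MS")
wide = pd.DataFrame(
    5 + rng.normal(0, 0.2, (len(dates), 4)).cumsum(axis=0),
    index=dates,
    columns=["CA", "FL", "NY", "TX"],
)

# One panel per column, arranged in a 2 x 2 grid with shared axes
axes = wide.plot(subplots=True, layout=(2, 2), sharex=True, sharey=True,
                 figsize=(8, 4), legend=False)

# Label each panel with its state instead of using a legend
for ax, state in zip(axes.ravel(), wide.columns):
    ax.set_title(state, fontsize=9)
```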

2. Matplotlib

Let’s see what we can do with matplotlib.

group_values = list(df.state.unique())

# set number of columns in the plot
ncols = 1

# calculate number of rows in the plot
nrows = len(group_values) // ncols + (len(group_values) % ncols > 0)

# Define the plot 
plt.figure(figsize = (8, 10))
plt.suptitle("Unemployment Rate by State", fontsize=14)
plt.subplots_adjust(hspace=0, top=0.95)

for n, state in enumerate(group_values):
    # add a new subplot at each iteration using nrows and cols
    ax = plt.subplot(nrows, ncols, n + 1)
    
    # Filter the dataframe data for each state
    df_temp = df.query("state == @state")
    ax = df_temp.set_index("date").unemployment.plot(ax=ax)
        
    # chart formatting
    ax.set_title(label=state, x=1.02, y=0.4)
    ax.set_xlabel("")
Matplotlib – Small Multiples – Time Series

In this case, we get a much better plot. However, we had to write a lot more code. Still, I don’t think the code is hard to read, so this is a decent option to visualize this data.

The downside of this plot is that it’s hard to compare time series that are not next to each other.
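One thing that can help a little with comparisons is putting every panel on the same scale. The sketch below is a variation on the loop above, not a replacement for it: it uses plt.subplots with sharex and sharey, computes the row count with math.ceil (same result as the // and % formula), and hides unused panels. The data is synthetic and the name demo is made up.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic long-format data, NOT real unemployment figures
rng = np.random.default_rng(2)
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
demo = pd.DataFrame({
    "state": np.repeat(["CA", "FL", "NY"], len(dates)),
    "date": np.tile(dates, 3),
    "unemployment": 5 + rng.normal(0, 0.2, 3 * len(dates)).cumsum(),
})

states = demo["state"].unique()
ncols = 2
nrows = math.ceil(len(states) / ncols)  # round up so every state gets a panel

# squeeze=False keeps axes two-dimensional even for a single row or column
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 4),
                         sharex=True, sharey=True, squeeze=False)

for ax, (state, grp) in zip(axes.ravel(), demo.groupby("state")):
    ax.plot(grp["date"], grp["unemployment"])
    ax.set_title(state, fontsize=9)

# Hide leftover panels when the grid has more cells than states
for ax in axes.ravel()[len(states):]:
    ax.set_visible(False)
```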

3. Seaborn

We can make a similar plot with seaborn using the sns.FacetGrid class. This is actually similar to the matplotlib version, but instead of writing the loop ourselves, it’s done for us.

sns.set(style='darkgrid')

g = sns.FacetGrid(df, 
                  col='state',  # col to facet by
                  col_wrap=2,
                  height=1.2, aspect=5,
                  sharex=True, sharey=True
                 )
g.map(sns.lineplot, 
      "date",
      'unemployment'
     )

g.fig.subplots_adjust(hspace=0.05, top=0.94);
g.fig.suptitle("Unemployment Rate by State");

# Change the title labels
for ax in g.axes:
    subplot_title = ax.get_title().split("= ")[1]
    ax.set_title(subplot_title, x=0.97, y=0.7)
    ax.set_ylabel("")
Seaborn – FacetGrid – Time Series

I think this plot is an improvement compared to the matplotlib version. It’s just more visually appealing and the code is relatively simpler. It still involved some customization to make it look good.

This is another option using seaborn that I think is the best of all.

# Plot each state's time series in its own facet
g = sns.relplot(
    data=df,
    x="date", y="unemployment", col="state",
    kind="line", linewidth=1, zorder=5,
    col_wrap=3, height=2, aspect=1.5, legend=False
)

# Iterate over each subplot to customize further
for state, ax in g.axes_dict.items():

    # Add the title as an annotation within the plot
    ax.text(.8, .85, state, transform=ax.transAxes, 
            fontweight="semibold")

    # Plot every state's time series in the background
    sns.lineplot(
        data=df, x="date", y="unemployment", 
        units="state", estimator=None, 
        color=".7", linewidth=1, ax=ax,
    )

    
# Reduce the frequency of the x axis ticks
#ax.set_xticks(ax.get_xticks()[::2])

# Tweak the supporting aspects of the plot
g.set_titles("")
g.set_axis_labels("", "Unemployment")
g.tight_layout()
g.fig.suptitle("Unemployment Rate by State", fontsize=14)
g.fig.subplots_adjust(hspace=0.1, top=0.93)
Seaborn – Facet Grid – Time Series

I think this is a great option because each subplot makes it easy to compare one state’s unemployment rate with the rest of the states.

So far this is my favorite solution to visualizing this data.

4. Plotly Express

Plotly is a great library for data visualization. It’s easy to use and lets you build interactive plots without much effort. There are two ways to visualize this data with plotly.

The first one is to plot all the lines in one line chart and then use the legend to filter out the lines we want to compare.

fig = px.line(
    df, x='date', y='unemployment', color='state'
)

# better hover labels
fig.update_traces(hovertemplate=None)
fig.update_layout(hovermode="x", autosize=False)
fig.update_yaxes(title='')
Plotly – Interactive Line Chart with legend

In this example, I selected only the CA and WA states, so I can easily compare them on the same plot. This would be a decent option for a dashboard, for example, where you let the user select which states to compare.

Plotly also supports small multiples (faceted plots). This is how you can do it:

fig = px.line(
    df, x='date', y='unemployment', 
    facet_col="state", facet_col_wrap=2
)

# better hover labels
fig.update_traces(hovertemplate=None)
fig.update_layout(hovermode="x",
                  autosize=False, 
                  height=800, width=1000)
fig.update_yaxes(title='')
Plotly – Faceted line chart

I think this looks similar to the first version of the seaborn chart. However, this plot is interactive and this might be an advantage if you are developing a dashboard.

5. Plotnine

Finally we have the ggplot2 port, plotnine. In terms of the time it took me to get the plot working, it was probably the fastest one. I didn’t have to fix the labels or any other aesthetic issue. The defaults were simply right.

(ggplot(df, aes(x='date', y='unemployment')) +
 geom_line() + 
 facet_grid("state ~ .") +
 theme(figure_size=(8, 10)) + 
 ggtitle("Unemployment Rate by State")
)
Plotnine – Facet Grid

I think this looks decent, but we can do better with plotnine. The next plot looks similar to the first Seaborn plot.

from mizani.breaks import date_breaks

(ggplot(df, aes(x='date', y='unemployment')) +
 geom_line() + 
 facet_wrap("~ state", ncol=2) +
 scale_x_datetime(
     breaks=date_breaks('15 years'),
     minor_breaks=[]) +
 theme(figure_size=(12, 10)) + 
 ggtitle("Unemployment Rate by State")
)
Plotnine – Facet Wrap Line Chart

Considering the effort this plot involved, I think it looks great. However, I think the seaborn plot that has the other time series in grey is a better way to visualize this data.

Did I mention I have a YouTube channel where I cover data science topics? In this video I explain in more detail what I covered in this post.

