Polars is the fastest DataFrame library available in Python today. It is written in Rust, built on the Apache Arrow memory format, and exposed to Python through bindings.

I think Polars will grow massively over the next few years and might eventually be able to fully replace Pandas. In the meantime, I think Polars is a perfect fit for “intermediate-sized data”: datasets larger than 5 GB and up to 1 TB, a fairly wide range that covers most use cases.

Tasks that were previously done with Spark will start to be done with Polars because it’s simply faster, and therefore more cost-efficient. Polars is also a lot simpler to use than Spark for datasets of this size.

I’ve developed a series of introductory video tutorials related to Spark:

Here is the first video of the series:
