1a: Polars I
Let’s turn tabular data work into clear, reproducible Python.
This segment is the basic workflow: read data, inspect it, clean it, transform it, and write it back out.
Most empirical research data work is tabular data work.
Rows
Observations, records, documents, firm-years, estimates, events.
Columns
Variables, identifiers, timestamps, measures, labels.
Operations
Select, filter, transform, aggregate, join, and write.
Pandas
Older, everywhere, and deeply connected to the Python data ecosystem.
Extremely useful when another package hands us a pandas object, or when pandas reads a format we need.
Polars
Newer, fast, expression-oriented, and designed around efficient dataframe operations.
Useful for research pipelines where we care about readable transformations and performance.
Some research formats are still easiest to read with pandas.
Then we do the dataframe work in Polars.
pl.from_pandas() converts a pandas DataFrame into a Polars DataFrame, so we can read with pandas and then do the work in Polars.
Polars benefited from years of accumulated experience with pandas and from starting with a clean slate.
Expressions
Use pl.col() and friends to describe column computations.
Pipelines
Chain operations so the data work reads as a sequence of decisions.
Lazy option
Build a query first, then run it when we ask for the result.
The Polars expression docs are the main reference.
Polars is designed to stay inside optimized dataframe operations instead of bouncing in and out of slow row-by-row Python work.
The lazy API can see a whole query before running it, which enables query optimization and can reduce unnecessary work. We’ll talk about that in 1c.
The lazy API docs have the deeper version.
Read and inspect
Load a Stata file through pandas, convert to Polars, and inspect shape, columns, schema, and nulls.
Clean and transform
Cast types, rename columns, select columns, filter rows, and create new variables.
Work by group
Use .over() for firm-level calculations inside firm-year data.
Write outputs
Write dated CSV and Parquet files that can be reused later.
Open notebooks/1a_polars.ipynb.