Why you should use Pydantic Dataclasses instead of Python Dataclasses

Date: 2024-10-02 | create | tech | python | pydantic | dataclasses | records |

I've used Python for roughly the last five years of my career, and I regularly lament how weak its type system is. Luckily, tools like Pydantic make it suck a little less.

In this post I'll share an example of why you should probably use Pydantic dataclasses over Python's built-in dataclasses.

What's the point of types in programming?

Types are basically a way to label a code flow. This is useful because it means at any point along that flow you can easily tell what kind of data is flowing through it (I like to think of this as using colored wires).

For small scale programs this doesn't matter so much - you can probably just remember / figure out what's in a code flow.

But for large scale programs with dozens of engineers this becomes a heavy tax on productivity.

For more on the usefulness of types: Types vs No Types - How Types Allow Code to Scale across Developers, Organizations, and Lines of Code

Python Types and Dataclasses

So Python added type hints. This is good! Some types are better than no types - as evidenced by every single company I've worked at (including Instagram!) eventually moving away from dynamic Python and towards typed Python.

A Dataclass in Python is similar to a record in other programming languages - it allows you to create a logical grouping of properties. This is useful because often a single variable / type is not enough to describe something - often we need several fields.

You can use it like:

from dataclasses import dataclass 

@dataclass(frozen=True, kw_only=True)
class RegularDataclass:
    number: int 

This creates a dataclass with one field, number, of type int. It uses frozen=True to make instances immutable and kw_only=True (available from Python 3.10) so that you can only create it with keyword arguments, which is useful for making sure you're setting the right field.

The Problem with Python Types and Dataclasses

Now the problem with these types, and the dataclasses built on them, is that the language doesn't actually enforce them. Static type checkers and linters (mypy, Pyright, and friends) can catch a lot of mistakes before you run the code, but there's very little that protects you at runtime.

At the end of the day Python is still a dynamic language so it lets you trip yourself up in the same ways a dynamic language would.

Let's take our simple dataclass above. We would assume that because the number field is an int, we could only set it to an int, right?

Wrong! Cause Python!

Here's an example:

from dataclasses import dataclass 

@dataclass(frozen=True, kw_only=True)
class RegularDataclass:
    number: int 

regular_dataclass = RegularDataclass(number=None)
print(f"Regular Dataclass: {regular_dataclass}")

This will output: Regular Dataclass: RegularDataclass(number=None)

The RegularDataclass is created successfully even though the value inside it doesn't match its type. This is bad because all downstream code will be written against this type, assuming number is an int, when it will in fact get None.

And this is how we get Python's version of null pointer exceptions: a TypeError or AttributeError on None, raised far away from where the bad value was introduced.

Fixing the Python Types and Dataclass issues

So how do we fix this? Unfortunately Python is a dynamic language through and through, so fixing this at the language level is hard. Yes, it has type hints, but nothing checks them at runtime, so on their own they don't protect you.

So a layer of libraries like Pydantic has emerged to act as a more type-safe validation layer. It doesn't add more types, but it does add runtime validation (kind of like Zod for JavaScript).

Here's an example of how this protects our dataclasses better than Python's built-ins:

# Python example
from dataclasses import dataclass

@dataclass(frozen=True, kw_only=True)
class RegularDataclass:
    number: int

# No validation happens here, so this succeeds despite the wrong type.
regular_dataclass = RegularDataclass(number=None)
print(f"Regular Dataclass: {regular_dataclass}")

# Pydantic example
import pydantic.dataclasses

@pydantic.dataclasses.dataclass(frozen=True, kw_only=True)
class PydanticDataclass:
    number: int

# Pydantic validates on construction, so this raises a ValidationError
# and the print below never runs.
pydantic_dataclass = PydanticDataclass(number=None)
print(f"Pydantic Dataclass: {pydantic_dataclass}")

When we run this we see:

  • RegularDataclass is still created: Regular Dataclass: RegularDataclass(number=None)
  • PydanticDataclass fails with a ValidationError: Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]

So we still don't catch this at dev time, but at least we catch it at runtime, right where the object is created. That's much better than something failing somewhere else and having to trace it back to the origin of the type lie.
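If you'd rather handle the failure than let it crash, you can catch Pydantic's ValidationError at the creation site. This is a minimal sketch assuming Pydantic is installed and Python 3.10+; parse_number is a hypothetical helper I made up, not something Pydantic provides.

```python
from typing import Optional

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass(frozen=True, kw_only=True)
class PydanticDataclass:
    number: int


def parse_number(raw: object) -> Optional[PydanticDataclass]:
    """Hypothetical helper: return a validated record, or None if invalid."""
    try:
        return PydanticDataclass(number=raw)
    except ValidationError as exc:
        # errors() returns a list with one entry per failed field.
        print(f"Rejected {raw!r}: {len(exc.errors())} validation error(s)")
        return None


good = parse_number(5)
bad = parse_number(None)
print(good, bad)
```

Whether you return None, raise your own error, or log and move on is a design choice. The important part is that the bad value gets stopped at the boundary instead of leaking into the rest of the code.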

Next

I like types. They make my dev experience better. Unfortunately sometimes you don't get to pick your technologies so you have to deal with what you have. Pydantic helps make working with Python a little more sane so I like Pydantic.

Want to run this project yourself? Get this example's full source code (github) and access to dozens of others by joining HAMINIONs.
