Python dataclasses are a really nice feature for constructing classes that primarily hold or work with data. They can be a good alternative to using dictionaries, since they allow you to add methods, dynamic properties, and subclasses. They can also be a good alternative to building your own class by hand, since they don’t need a custom __init__() that reassigns attributes, and they provide methods like __eq__() out of the box.
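As a minimal sketch of what you get for free, here is a small dataclass with one ordinary method. The Point class and its scaled() method are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical example class; the field names are illustrative only.
    x: int
    y: int

    def scaled(self, factor: int) -> "Point":
        # Regular methods work just like on any other class.
        return Point(self.x * factor, self.y * factor)


# __init__() and __eq__() are generated from the field definitions,
# so two instances with the same field values compare equal.
a = Point(1, 2)
b = Point(1, 2)
same = a == b
doubled = a.scaled(2)
```

Nothing here had to be written by hand beyond the field annotations and the one custom method.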
One small tip for keeping dataclasses maintainable is to always construct them with kw_only=True (available in Python 3.10 and later), like so:
from dataclasses import dataclass

@dataclass(kw_only=True)
class MyDataClass:
    x: int
    y: str
    z: bool = True
This will construct an __init__() that looks like this:
class MyDataClass:
    def __init__(
        self,
        *,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z
Instead of:
class MyDataClass:
    def __init__(
        self,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z
That * in the argument list means everything that follows must be passed as a keyword argument, instead of a positional argument.
There are two reasons you probably want to do this:
- It allows you to reorder the fields on the dataclass without breaking callers. Positional arguments mean a caller can use MyDataClass(1, 'foo', False), and if you remove or reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
- It allows subclasses to add required fields. Normally, any field with a default value (like z above) will force any fields following it to also have a default, and that includes all fields defined by subclasses. Using kw_only=True gives subclasses the flexibility to decide for themselves which fields must be provided by the caller and which have a default.
These reasons matter most to library authors. We spend a lot of time trying to ensure backwards-compatibility and forwards-extensibility in Review Board, so this is an important topic for us. And if you’re developing something reusable with dataclasses, it might be for you, too.