Today I Learned – ChipLog — Christian Hammond

Using pprint.saferepr() for string comparisons in Python

August 26, 2025 - Python, Today I Learned - Leave a Comment

Python offers a handy module called pprint, which has helpers for formatting and printing data in a nicely-formatted way. If you haven’t used this, you should definitely explore it!

Most people will reach for this module when using pprint.pprint() (aka pprint.pp()) or pprint.pformat(), but one under-appreciated method is pprint.saferepr(), which advertises itself as returning a string representation of an object that is safe from some recursion issues. They demonstrate this as:

>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

But actually repr() handles this specific case just fine:

>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

Pretty minimal difference there. Is there another reason we might care about saferepr()?

Dictionary string comparisons!

saferepr() first sets up the pprint pretty-printing machinery, meaning it takes care of things like sorting keys in dictionaries.

This is great! This is really handy when writing unit tests that need to compare, say, logging of data.

See, modern versions of CPython 3 will preserve insertion order of keys into dictionaries, which can be nice but aren’t always great for comparison purposes. If you’re writing a unit test, you probably want to feel confident about the string you’re comparing against.

Let’s take a look at repr() vs. saferepr() with a basic dictionary.

>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"

>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"

A nice, stable order. Now we don’t have to worry about a unit test breaking on different versions or implementations or with different logic.

saferepr() will disable some of pprint()`’s typical output limitations. There’s no max dictionary depth. There’s no max line width. You just get a nice, normalized string of data for comparison purposes.

But not for sets…

Okay, it’s not perfect. Sets will still use their standard representation (which seems to be iteration order), and this might be surprising given how dictionaries are handled.

Interestingly, pprint() and pformat() will sort keys if it needs to break them across multiple lines, but otherwise, nope.

For example:

>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}

>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}

Ah well, can’t have everything we want.

Still, depending on what you’re comparing, this can be a very handy tool in your Python Standard Library toolset.

Using pprint.saferepr() for string comparisons in Python Read More »

A crash course on Python function signatures and typing

July 12, 2025 - Python, Today I Learned - Leave a Comment

Screenshot of Python code defining some example Python types for functions

I’ve been doing some work on Review Board and our utility library Typelets, and thought I’d share some of the intricacies of function signatures and their typing in Python.

We have some pretty neat Python typing utilities in the works to help a function inherit another function’s types in their own *args and **kwargs without rewriting a TypedDict. Useful for functions that need to forward arguments to another function. I’ll talk more about that later, but understanding how it works first requires understanding a bit about how Python sees functions.

Function signatures

Python’s inspect module is full of goodies for analyzing objects and code, and today we’ll explore the inspect.signature() function.

inspect.signature() is used to introspect the signature of another function, showing its parameters, default values, type annotations, and more. It can also aid IDEs in understanding a function signature. If a function has __signature__ defined, it will reference this (at least in CPython), and this gives highly-dynamic code the ability to patch signatures at runtime. (If you’re curious, here’s exactly how it works).

There are a few places where knowing the signature can be useful, such as:

Automatically-crafting documentation (Sphinx does this)
Checking if a callback handler accepts the right arguments
Checking if an implementation of an interface is using deprecated function signatures

Let’s set up a function and take a look at its signature.

>>> def my_func(
...     a: int,
...     b: str,
...     /,
...     c: dict[str, str] | None,
...     *,
...     d: bool = False,
...     **kwargs,
... ) -> str:
...     ...
... 
>>> import inspect
>>> sig = inspect.signature(my_func)

>>> sig
<Signature (a: int, b: str, /, c: dict[str, str] | None, *,
d: bool = False, **kwargs) -> str>

>>> sig.parameters
mappingproxy(OrderedDict({
    'a': <Parameter "a: int">,
    'b': <Parameter "b: str">,
    'c': <Parameter "c: dict[str, str] | None">,
    'd': <Parameter "d: bool = False">,
    'kwargs': <Parameter "**kwargs">
}))

>>> sig.return_annotation
<class 'str'>

>>> sig.parameters.get('c')
<Parameter "c: dict[str, str] | None">

>>> 'kwargs' in sig.parameters
True

>>> 'foo' in sig.parameters
False

>>> def my_func(
...     a: int,
...     b: str,
...     /,
...     c: dict[str, str] | None,
...     *,
...     d: bool = False,
...     **kwargs,
... ) -> str:
...     ...
... 
>>> import inspect
>>> sig = inspect.signature(my_func)

>>> sig
<Signature (a: int, b: str, /, c: dict[str, str] | None, *,
d: bool = False, **kwargs) -> str>

>>> sig.parameters
mappingproxy(OrderedDict({
    'a': <Parameter "a: int">,
    'b': <Parameter "b: str">,
    'c': <Parameter "c: dict[str, str] | None">,
    'd': <Parameter "d: bool = False">,
    'kwargs': <Parameter "**kwargs">
}))

>>> sig.return_annotation
<class 'str'>

>>> sig.parameters.get('c')
<Parameter "c: dict[str, str] | None">

>>> 'kwargs' in sig.parameters
True

>>> 'foo' in sig.parameters
False

Pretty neat. Pretty useful, when you need to know what a function takes and returns.

Let’s see what happens when we work with methods on classes.

>>> class MyClass:
...     def my_method(
...         self,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass.my_method)
<Signature (self, *args, **kwargs) -> None>

>>> class MyClass:
...     def my_method(
...         self,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass.my_method)
<Signature (self, *args, **kwargs) -> None>

Seems reasonable. But…

>>> obj = MyClass()
>>> inspect.signature(obj.my_method)
<Signature (*args, **kwargs) -> None>

>>> obj = MyClass()
>>> inspect.signature(obj.my_method)
<Signature (*args, **kwargs) -> None>

self disappeared!

What happens if we do this on a classmethod? Place your bets…

>>> class MyClass2:
...     @classmethod
...     def my_method(
...         cls,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass2.my_method)
<Signature (*args, **kwargs) -> None>

>>> class MyClass2:
...     @classmethod
...     def my_method(
...         cls,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass2.my_method)
<Signature (*args, **kwargs) -> None>

If you guessed it wouldn’t have cls, you’d be right.

Only unbound methods (definitions of methods on a class) will have a self parameter in the signature. Bound methods (callable methods bound to an instance of a class) and classmethods (callable methods bound to a class) don’t. And this makes sense, if you think about it, because this signature represents what the call accepts, not what the code defining the method looks like.

You don’t pass a self when calling a method on an object, or a cls when calling a classmethod, so it doesn’t appear in the function signature. But did you know that you can call an unbound method if you provide an object as the self parameter? Watch this:

>>> class MyClass:
...     def my_method(self) -> None:
...         self.x = 42
...
>>> obj = MyClass()
>>> MyClass.my_method(obj)
>>> obj.x
42

>>> class MyClass:
...     def my_method(self) -> None:
...         self.x = 42
...
>>> obj = MyClass()
>>> MyClass.my_method(obj)
>>> obj.x
42

In this case, the unbound method MyClass.my_method has a self argument in its signature, meaning it takes it in a call. So, we can just pass in an instance. There aren’t a lot of cases where you’d want to go this route, but it’s helpful to know how this works.

What are bound and unbound methods?

I briefly touched upon this, but:

Unbound methods are just functions. Functions that are defined on a class.

Bound methods are a function where the very first argument (self or cls) is bound to a value.

Binding normally happens when you instantiate a class, but you can do it yourself through any function’s __get__():

>>> def my_func(self):
...     print('I am', self)
... 
>>> class MyClass:
...     ...
...
>>> obj = MyClass()
>>> method = my_func.__get__(MyClass)
>>> method
<bound method my_func of <__main__.MyClass object at 0x100ea20d0>>

>>> method.__self__
<__main__.MyClass object at 0x100ea20d0>

>>> inspect.ismethod(method)
True

>>> method()
I am <__main__.MyClass object at 0x100ea20d0>

>>> def my_func(self):
...     print('I am', self)
... 
>>> class MyClass:
...     ...
...
>>> obj = MyClass()
>>> method = my_func.__get__(MyClass)
>>> method
<bound method my_func of <__main__.MyClass object at 0x100ea20d0>>

>>> method.__self__
<__main__.MyClass object at 0x100ea20d0>

>>> inspect.ismethod(method)
True

>>> method()
I am <__main__.MyClass object at 0x100ea20d0>

my_func wasn’t even defined on a class, and yet we could still make it a bound method tied to an instance of MyClass.

You can think of a bound method as a convenience over having to pass in an object as the first parameter every time you want to call the function. As we saw above, we can do exactly that, if we pass it to the unbound method, but binding saves us from doing this every time.

You’ll probably never need to do this trick yourself, but it’s helpful to know how it all ties together.

By the way, @staticmethod is a way of telling Python to never make an unbound method into a bound method when instantiating the object (it stays a function), and @classmethod is a way of telling Python to bind it immediately to the class it’s defined on (and not rebind when instantiating an object from the class).

How do you tell them apart?

If you have a function, and you don’t know if it’s a standard function, a classmethod, a bound method, or an unbound method, how can you tell?

Bound methods have a __self__ attribute pointing to the parent object (and inspect.ismethod() will be True).
Classmethods have a __self__ attribute pointing to the parent class (and inspect.ismethod() will be True).
Unbound methods are tricky:
- They do not have a __self__ attribute.
- They might have a self or cls parameter in the signature, but they might not have those names (and other functions may define them).
- They should have a . in its __qualname__ attribute. This is a full .-based path to the method, relative to the module.
- Splitting __qualname__, the last component would be the name. The previous component won’t be <locals> (but if <locals> is found, you’re going to have trouble getting to the method).
- If the full path is resolvable, the parent component should be a class (but it might not be).
- You could… resolve the parent module to a file and walk its AST and find the class and method based on __qualname__. But this is expensive and probably a bad idea for most cases.
Standard functions are the fallback.

Since unbound methods are standard functions that are just defined on a class, it’s difficult to really tell the difference. You’d have to rely on heuristics and know you won’t always get a definitive answer.

(Interesting note: In Python 2, unbound methods were special kinds of functions with a __self__ that pointed to the class they were defined on, so you could easily tell!)

The challenges of typing

Functions can be typed using Callable[[<param_types>], <return_type>].

This is very simplistic, and can’t be used to represent positional-only arguments, keyword-only arguments, default arguments, *args, or **kwargs. For that, you can define a Protocol with __call__:

from typing import Protocol


class MyCallback(Protocol):
    def __call__(
        self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

from typing import Protocol


class MyCallback(Protocol):
    def __call__(
        self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

Type checkers will then treat this as a Callable, effectively. If we take my_func from the top of this post, we can assign to it:

cb: MyCallback = my_func  # This works

cb: MyCallback = my_func  # This works

What if we want to assign a method from a class? Let’s try bound and unbound.

class MyClass:
    def my_func(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'

cb2: MyCallback = MyClass.my_func  # This fails
cb3: MyCallback = MyClass().my_func  # This works

class MyClass:
    def my_func(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'

cb2: MyCallback = MyClass.my_func  # This fails
cb3: MyCallback = MyClass().my_func  # This works

What happened? It’s that self again. Remember, the unbound method has self in the signature, and the bound method does not.

Let’s add self and try again.

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

cb2: MyCallback = MyClass.my_func    # This works
cb3: MyCallback = MyClass().my_func  # This fails

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

cb2: MyCallback = MyClass.my_func    # This works
cb3: MyCallback = MyClass().my_func  # This fails

What happened this time?! Well, now we’ve matched the unbound signature with self, but not the bound signature without it.

Solving this gets… verbose. We can create two versions of this: Unbound, and Bound (or plain function, or classmethod):

class MyUnboundCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

# These work
cb4: MyCallback = my_func
cb5: MyCallback = MyClass().my_func


# These fail correctly
cb7: MyUnboundCallback = my_func
cb8: MyUnboundCallback = MyClass().my_func
cb9: MyCallback = MyClass.my_func

class MyUnboundCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

# These work
cb4: MyCallback = my_func
cb5: MyCallback = MyClass().my_func


# These fail correctly
cb7: MyUnboundCallback = my_func
cb8: MyUnboundCallback = MyClass().my_func
cb9: MyCallback = MyClass.my_func

This means we can use union types (MyUnboundCallback | MyCallback) to cover our bases.

It’s not flawless. Depending on how you’ve typed your signature, and the signature of the function you’re setting, you might not get the behavior you want or expect. As an example, any method with a leading self-like parameter (basically any parameter coming before your defined signature) will type as MyUnboundCallback, because it might be! Remember, we can turn any function into a bound method for an arbitrary class using __get__. That may or may not matter, depending on what you need to do.

What do I mean by that?

def my_bindable_func(
    x,
    a: int,
    b: str,
    /,
    c: dict[str, str] | None,
    *,
    d: bool = False,
    **kwargs,
) -> str:
    return ''


x1: MyCallback = my_bindable_func         # This fails
x2: MyUnboundCallback = my_bindable_func  # This works

def my_bindable_func(
    x,
    a: int,
    b: str,
    /,
    c: dict[str, str] | None,
    *,
    d: bool = False,
    **kwargs,
) -> str:
    return ''


x1: MyCallback = my_bindable_func         # This fails
x2: MyUnboundCallback = my_bindable_func  # This works

x may not be named self, but it’ll get treated as one, because if we do my_bindable_func.__get__(some_obj), then some_obj will be bound to x and callers won’t have to pass anything to x.

Okay, what if you want to return a function that can behave as an unbound method (with self) that can become an unbound method (with __get__)? We can mostly do it with:

from typing import ParamSpec, TypeVar, cast, overload


_C = TypeVar('_C')
_R_co = TypeVar('_R_co', covariant=True)
_P = ParamSpec('_P')


class MyMethod(Protocol[_C, _P, _R_co]):
    __self__: _C

    @overload
    def __get__(
        self,
        instance: None,
        owner: type[_C],
    ) -> Callable[Concatenate[_C, _P], _R_co]:
        ...

    @overload
    def __get__(
        self,
        instance: _C,
        owner: type[_C],
    ) -> Callable[_P, _R_co]:
        ...

    def __get__(
        self,
        instance: _C | None,
        owner: type[_C],
    ) -> (
        Callable[Concatenate[_C, _P], _R_co] |
        Callable[_P, _R_co]
    ):
        ...

from typing import ParamSpec, TypeVar, cast, overload


_C = TypeVar('_C')
_R_co = TypeVar('_R_co', covariant=True)
_P = ParamSpec('_P')


class MyMethod(Protocol[_C, _P, _R_co]):
    __self__: _C

    @overload
    def __get__(
        self,
        instance: None,
        owner: type[_C],
    ) -> Callable[Concatenate[_C, _P], _R_co]:
        ...

    @overload
    def __get__(
        self,
        instance: _C,
        owner: type[_C],
    ) -> Callable[_P, _R_co]:
        ...

    def __get__(
        self,
        instance: _C | None,
        owner: type[_C],
    ) -> (
        Callable[Concatenate[_C, _P], _R_co] |
        Callable[_P, _R_co]
    ):
        ...

Putting it into practice:

def make_method(
    source_method: Callable[Concatenate[_C, _P], _R_co],
) -> MyMethod[_C, _P, _R_co]:
    return cast(MyMethod, source_method)


class MyClass2:
    @make_method
    def my_method(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return True


# These work!
MyClass2().my_method(1, 'x', {}, d=True)
MyClass2.my_method(MyClass2(), 1, 'x', {}, d=True)

def make_method(
    source_method: Callable[Concatenate[_C, _P], _R_co],
) -> MyMethod[_C, _P, _R_co]:
    return cast(MyMethod, source_method)


class MyClass2:
    @make_method
    def my_method(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return True


# These work!
MyClass2().my_method(1, 'x', {}, d=True)
MyClass2.my_method(MyClass2(), 1, 'x', {}, d=True)

That’s a fair bit of work, but it satisfies the bound vs. unbound methods signature differences. If we inspect these:

>>> reveal_type(MyClass2.my_method)
Type of "MyClass2.my_method" is "(MyClass2, a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2().my_method)
Type of "MyClass2().my_method" is "(a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2.my_method)
Type of "MyClass2.my_method" is "(MyClass2, a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2().my_method)
Type of "MyClass2().my_method" is "(a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

And those are type-compatible with the MyCallback and MyUnboundCallback we built earlier, since the signatures match:

# These work
cb10: MyUnboundCallback = MyClass.my_method
cb11: MyCallback = MyClass().my_method

# These fail correctly
cb12: MyUnboundCallback = MyClass().my_method
cb13: MyCallback = MyClass.my_method

# These work
cb10: MyUnboundCallback = MyClass.my_method
cb11: MyCallback = MyClass().my_method

# These fail correctly
cb12: MyUnboundCallback = MyClass().my_method
cb13: MyCallback = MyClass.my_method

And if we wanted, we could modify that ParamSpec going into the MyMethod from make_method() and that’ll impact what the type checkers expect during the call.

Hopefully you can see how this can get complex fast, and involve some tradeoffs.

I personally believe Python needs a lot more love in this area. Types for the different kinds of functions/methods, better specialization for Callable, and some of the useful capabilities from TypeScript would be nice (such as Parameters<T>, ReturnType<T>, OmitThisParameter<T>, etc.). But this is what we have to work with today.

My teachers said to always write a conclusion

What have we learned?

Python’s method signatures are different when bound vs. unbound, and this can affect typing.
Unbound methods aren’t really their own thing, and this can lead to some challenges.
Any method can be a bound method with a call to __get__().
Callable only gets you so far. If you want to type complex functions, write a Protocol with a __call__() signature.
If you want to simulate a bound/unbound-aware type, you’ll need Protocol with __get__().

I feel like I just barely scratched the surface here. There’s a lot more to functions, working with signatures, and challenges around typing than I covered here. We haven’t talked about how you can rewrite signatures on the fly, how annotations are represented, what functions look under the hood, or how bytecode behind functions are mutable at runtime.

I’ll leave some of that for future posts. And I’ll have more to talk about when we expand Typelets with the new parameter inheritance capabilities. It builds upon a lot of what I covered today to perform some very neat and practical tricks for library authors and larger codebases.

What do you think? Did you learn something? Did I get something wrong? Have you done anything interesting or unexpected with signatures or typing around functions you’d like to share? I want to hear about it!

A crash course on Python function signatures and typing Read More »

Tip: Use keyword-only arguments in Python dataclasses

June 29, 2025 - Python, Today I Learned - Leave a Comment

Python dataclasses are a really nice feature for constructing classes that primarily hold or work with data. They can be a good alternative to using dictionaries, since they allow you to add methods, dynamic properties, and subclasses. They can also be a good alternative to building your own class by hand, since they don’t need a custom __init__() that reassigns attributes and provide methods like __eq__() out of the box.

One small tip to keeping dataclasses maintainable is to always construct them with kw_only=True, like so:

from dataclasses import dataclass


@dataclass(kw_only=True)
class MyDataClass:
    x: int
    y: str
    z: bool = True

This will construct an __init__() that looks like this:

class MyDataClass:
    def __init__(
        self,
        *,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

Instead of:

class MyDataClass:
    def __init__(
        self,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

That * in the argument list means everything that follows must be passed as a keyword argument, instead of a positional argument.

There are two reasons you probably want to do this:

It allows you to reorder the fields on the dataclass without breaking callers. Positional arguments means a caller can use MyDataClass(1, 'foo', False), and if you remove/reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
It allows subclasses to add required fields. Normally, any field with a default value (like z above) will force any fields following it to also have a default. And that includes all fields defined by subclasses. Using kw_only=True gives subclasses the flexibility to decide for themselves which fields must be provided by the caller and which have a default.

These reasons are more important for library authors than anything. We spend a lot of time trying to ensure backwards-compatibility and forwards-extensibility in Review Board, so this is an important topic for us. And if you’re developing something reusable with dataclasses, it might be for you, too.

Update: One important point I left out is Python compatibility. This flag was introduced in Python 3.10, so if you’re supporting older versions, you won’t be able to use this just yet. If you want to optimistically enable this just on 3.10+, one approach would be:

import sys
from dataclasses import dataclass


if sys.version_info[:2] >= (3, 10):
    dataclass_kwargs = {
        'kw_only': True,
    }
else:
    dataclass_kwargs = {}

...

@dataclass(**dataclass_kwargs)
class MyDataClass:
    ...
...

But this won’t solve the subclassing issue, so you’d still need to ensure any subclasses use default arguments if you want to support versions prior to 3.10.

Tip: Use keyword-only arguments in Python dataclasses Read More »

Excluding nested node_modules in Rollup.js

April 8, 2024 - Today I Learned, Web Development - Leave a Comment

We’re often developing multiple Node packages at the same time, symlinking their trees around in order to test them in other projects prior to release.

And sometimes we hit some pretty confusing behavior. Crazy caching issues, confounding crashes, and all manner of chaos. All resulting from one cause: Duplicate modules appearing in our Rollup.js-bundled JavaScript.

For example, we may be developing Ink (our in-progress UI component library) over here with one copy of Spina (our modern Backbone.js successor), and bundling it in Review Board (our open source, code review/document review product) over there with a different copy of Spina. The versions of Spina should be compatible, but technically they’re two separate copies.

And it’s all because of nested node_modules.

The nonsense of nested node_modules

Normally, when Rollup.js bundles code, it looks for any and all node_modules directories in the tree, considering them for dependency resolution.

If a dependency provides its own node_modules, and needs to bundle something from it, Rollup will happily include that copy in the final bundle, even if it’s already including a different copy for another project (such as the top-level project).

This is wasteful at best, and a source of awful nightmare bugs at worst.

In our case, because we’re symlinking source trees around, we’re ending up with Ink’s node_modules sitting inside Review Board’s node_modules (found at node_modules/@beanbag/ink/node_modules.), and we’re getting a copy of Spina from both.

Easily eradicating extra node_modules

Fortunately, it’s easy to resolve in Rollup.js with a simple bit of configuration.

Assuming you’re using @rollup/plugin-node-resolve, tweak the plugin configuration to look like:

{
    plugins: [
        resolve({
            moduleDirectories: [],
            modulePaths: ['node_modules'],
        }),
    ],
}

What we’re doing here is telling Resolve and Rollup two things:

Don’t look for node_modules recursively. moduleDirectories is responsible for looking for the named paths anywhere in the tree, and it defaults to ['node_modules']. This is why it’s even considering the nested copies to begin with.
Explicitly look for a top-level node_modules. modulePaths is responsible for specifying absolute paths or paths relative to the root of the tree where modules should be found. Since we’re no longer looking recursively above, we need to tell it which one we do want.

These two configurations together avoid the dreaded duplicate modules in our situation.

And hopefully it will help you avoid yours, too.

Excluding nested node_modules in Rollup.js Read More »

Building Multi-Platform Docker Images Using Multiple Hosts

October 8, 2023 - DevOps, Today I Learned - Leave a Comment

Here’s a very quick, not exactly comprehensive tutorial on building Docker images using multiple hosts (useful for building multiple architectures).

If you’re an expert on docker buildx, you may know all of this already, but if you’re not, hopefully you find this useful.

We’ll make some assumptions in this tutorial:

We want to build a single Docker image with both linux/amd64 and linux/arm64 architectures.
We’ll be building the linux/arm64 image on the local machine, and linux/amd64 on a remote machine (accessible via SSH).
We’ll call this builder instance “my-builder”

We’re going to accomplish this by building a buildx builder instance for the local machine and architecture, then append a configuration for another machine. And then we’ll activate that instance.

This is easy.

Step 1: Create your builder instance for localhost and arm64

$ docker buildx create \
    --name my-builder \
    --platform linux/arm64

This will create our my-builder instance, defaulting it to using our local Docker setup for linux/arm64.

If we wanted, we could provide a comma-separated list of platforms that the local Docker should be handling (e.g., --platform linux/arm64,darwin/arm64).

(This doesn’t have to be arm64. I’m just using this as an example.)

Step 2: Add your amd64 builder

$ docker buildx create \
    --name my-builder \
    --append \
    --platform linux/amd64 \
    ssh://<user>@<remotehost>

This will update our my-builder, informing it that linux/amd64 builds are supported and must go through the Docker service over SSH.

Note that we could easily add additional builders if we wanted (whether for the same architectures or others) by repeating this command and choosing new --platform values and remote hosts

Step 3: Verify your builder instance

Let’s take a look and make sure we have the builder setup we expect:

$ docker buildx ls
NAME/NODE       DRIVER/ENDPOINT           STATUS    BUILDKIT  PLATFORMS
my-builder *    docker-container
  my-builder0   desktop-linux             inactive            linux/arm64*
  my-builder1   ssh://myuser@example.com  inactive            linux/amd64*

Yours may look different, but it should look something like that. You’ll also see default and any other builders you’ve set up.

Step 4: Activate your builder instance

Now we’re ready to use it:

$ docker buildx use my-builder

Just that easy.

Step 5: Build your image

If all went well, we can now safely build our image:

$ docker buildx build --platform linux/arm64,linux/amd64 .

You should see build output for each architecture stream by.

If we want to make sure the right builder is doing the right thing, you can re-run docker buildx ls in another terminal. You should see running as the status for each, along with an inferred list of other architectures that host can now build (pretty much anything it natively supports that you didn’t explicitly configure above).

Step 6: Load your image into Docker

You probably want to test your newly-built image locally, don’t you? When you run the build, you might notice this message:

WARNING: No output specified with docker-container driver. Build
result will only remain in the build cache. To push result image
into registry use --push or to load image into docker use --load

And if you try to start it up, you might notice it’s missing (or that you’re running a pre-buildx version of your image).

What you need to do is re-run docker buildx build with --load and a single platform, like so:

$ docker buildx build --load --platform linux/arm64 .

That’ll rebuild it (it’ll likely just reuse what it built before) and then make it available in your local Docker registry.

Hope that helps!

Building Multi-Platform Docker Images Using Multiple Hosts Read More »

Re-typing Parent Class Attributes in TypeScript

August 25, 2023 - Today I Learned, Web Development - Leave a Comment

I was recently working on converting some code away from Backbone.js and toward Spina, our TypeScript Backbone “successor” used in Review Board, and needed to override a type from a parent class.

(I’ll talk about why we still choose to use Backbone-based code another time.)

We basically had this situation:

class BaseClass {
    summary: string | (() => string) = 'BaseClass thing doer';
    description: string | (() => string);
}

class MySubclass extends BaseClass {
    get summary(): string {
        return 'MySubclass thing doer';
    }

    // We'll just make this a standard function, for demo purposes.
    description(): string {
        return 'MySubclass does a thing!';
    }
}

TypeScript doesn’t like that so much:

Class 'BaseClass' defines instance member property 'summary', but extended class 'MySubclass' defines it as an accessor.

Class 'BaseClass' defines instance member property 'description', but extended class 'MySubclass' defines it as instance member function.

Clearly it doesn’t want me to override these members, even though one of the allowed values is a callable returning a string! Which is what we wrote, darnit!!

So what’s going on here?

How ES6 class members work

If you’re coming from another language, you might expect members defined on the class to be class members. For example, you might think you could access BaseClass.summary directly, but you’d be wrong, because these are instance members.

…

Re-typing Parent Class Attributes in TypeScriptRead More »

Re-typing Parent Class Attributes in TypeScript Read More »

Breaking back into your network with the Synology Web UI

December 23, 2013 - Security and Privacy, Today I Learned, Writings - 5 Comments

Have you ever left town, or even just took a trip to the coffee shop, only to find that you’re locked out of your home network? Maybe you needed a file that you forgot to put in Dropbox, or felt paranoid and wanted to check on your security cameras, or you just wanted to stream music. I have…

The end of a long drive

Last night, I arrived at my hotel after a 4 hour drive only to find my VPN wasn’t working. I always VPN in to home, so that I can access my file server, my VMs, security cameras, what have you. I didn’t understand.. I was sure I had things set up right. You see, I recently had my Xfinity router replaced, and had to set it up to talk to my Asus N66U, but I was absolutely sure it was working. Almost sure. Well, I thought it was working…

So I tried SSHing in. No dice. Hmm.. Any web server ports I exposed? Guess not. Maybe port forwarding was messed up somewhere?

Ah HA! I could reach my wonderful Synology NAS’s web UI. If you haven’t used this thing, it’s like a full-on desktop environment with apps. It’s amazing. Only thing it’s really missing is a web browser for accessing the home network (get on this, guys!). After spending some time thinking about it, I devised a solution to get me back into my home network, with full VPN access (though, see the end of the story for what happened there).

Christian’s step-by-step guide to breaking in with Synology

No more stories for now.

To get started, I’m assuming you have three things:

Remote access (with admin rights) to your Synology NAS’s web console.
A Linux server somewhere both sides can log into remotely (other than your local machine, as I’m assuming yours isn’t publicly connected to the network).
A local Linux or Mac with a web browser and ssh. You can make this work on Windows with Putty as well, but I’m not going into details on that. Just figure out SSH tunneling and replace step 7 below.

All set? Here’s what you do.

Log into your NAS and go to Package Center. Click Settings -> Package Sources and add:
Name: MissileHugger
Location: http://packages.missilehugger.com/
Install the “Web Console” package and run it from the start menu.
Web Console doesn’t support interactive sessions with commands, so you’ll need to have some SSH key set up on your linux server’s authorized_keys, and have that key available to you. There’s also no multi-line paste, so you’ll need to copy this key through Web Console line-by-line:
Locally:
```
$ cat ~/.ssh/id_dsa
```
On Web Console:
```
$ echo "-----BEGIN DSA PRIVATE KEY-----" > id_dsa
$ echo "<first line of private key>" >> id_dsa
$ echo "<second line of private key>" >> id_dsa
$ ...
$ echo "-----END DSA PRIVATE KEY-----" >> id_dsa
$ chmod 600 id_dsa
```
Establish a reverse tunnel to your Linux box, pointing to the web server you’re trying to reach (we’ll say 192.168.1.1 for your router).
Remember that Web Console doesn’t support interactive sessions, or pseudo-terminal allocation, so we’ll need to tweak some stuff when calling ssh:
```
$ ssh -o 'StrictHostKeyChecking no' -t -t -i id_dsa \
      -R 19980:192.168.1.1:80 youruser@yourlinuxserver
```
The ‘StrictHostKeyChecking no’ is to get around not having any way to verify a host key from Web Console, and the two -t parameters (yes, two) forces TTY allocation regardless of the shell.
If all went well, your Linux server should locally have a port 19980 that reaches your web server. Verify this by logging in and typing:
```
$ lynx http://localhost:19980
```
On your local machine, set up a tunnel to connect port 19980 on your machine to port 19980 on your Linux server.
```
$ ssh -L 19980:yourlinuxserver:19980 youruser@yourlinuxserver
```
You should now be able to reach your router. Try it! Open your favorite browser and go to http://localhost:19980
Clean up. Delete your id_dsa you painfully hand-copied over, if you no longer need it, and kill your SSH sessions.

Epilogue

While this worked great, and I was able to get back in and see my router configuration, I wasn’t able to spot any problems.

That’s when I realized my Mac’s VPN configuration was hard-coding my old IP address and not the domain for my home network. Oops 🙁

Hope this helps someone!

Breaking back into your network with the Synology Web UI Read More »

Re: Subverting Subversion

January 5, 2007 - Source Code Management, Today I Learned - Leave a Comment

I used to use the same trick Rodney Dawes describes in Subverting Subversion. Yes, it was very annoying to have to set everything from a file every time, or from stdin.

Ah, but there’s a better way, and people new to SVN seem to somehow miss this valuable command.

$ svn propedit svn:ignore

Up comes your editor, just as if you opened .cvsignore. You can now safely nuke your .cvsignore files. This is a useful command, so write it down until it’s burned into your brain.

Re: Subverting Subversion Read More »

Do you trust your eyes?

May 21, 2006 - Today I Learned - 1 Comment

I came across an awesome optical illusion on Digg. Stare at the center of the image for 15-30 seconds, and then move your mouse over it. You’ll see a color image, but it’s not really there. Move your eyes away and back to see how it really looks.

Do you trust your eyes? Read More »