Using pprint.saferepr() for string comparisons in Python

August 26, 2025 - Python, Today I Learned - Leave a Comment

Python offers a handy module called pprint, which has helpers for formatting and printing data in a nicely-formatted way. If you haven’t used this, you should definitely explore it!

Most people will reach for this module when using pprint.pprint() (aka pprint.pp()) or pprint.pformat(), but one under-appreciated method is pprint.saferepr(), which advertises itself as returning a string representation of an object that is safe from some recursion issues. They demonstrate this as:

>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

But actually repr() handles this specific case just fine:

>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

Pretty minimal difference there. Is there another reason we might care about saferepr()?

Dictionary string comparisons!

saferepr() first sets up the pprint pretty-printing machinery, meaning it takes care of things like sorting keys in dictionaries.

This is great! This is really handy when writing unit tests that need to compare, say, logging of data.

See, modern versions of CPython 3 will preserve insertion order of keys into dictionaries, which can be nice but aren’t always great for comparison purposes. If you’re writing a unit test, you probably want to feel confident about the string you’re comparing against.

Let’s take a look at repr() vs. saferepr() with a basic dictionary.

>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"

>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"

A nice, stable order. Now we don’t have to worry about a unit test breaking on different versions or implementations or with different logic.

saferepr() will disable some of pprint()`’s typical output limitations. There’s no max dictionary depth. There’s no max line width. You just get a nice, normalized string of data for comparison purposes.

But not for sets…

Okay, it’s not perfect. Sets will still use their standard representation (which seems to be iteration order), and this might be surprising given how dictionaries are handled.

Interestingly, pprint() and pformat() will sort keys if it needs to break them across multiple lines, but otherwise, nope.

For example:

>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}

>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}

Ah well, can’t have everything we want.

Still, depending on what you’re comparing, this can be a very handy tool in your Python Standard Library toolset.

Using pprint.saferepr() for string comparisons in Python Read More »

What If()? Using Conditional CSS Variables

August 8, 2025 - Web Development - Leave a Comment

A cool new feature is coming to CSS: if().

With this, you can set a style or a CSS variable based on some condition, like whether the device is on a touchscreen, or whether some CSS property is supported, or even based on an attribute on an element.

It’s really cool, and sooner or later we’ll all be (ab)using it.

Actually, you can pretty much do this today using CSS Variables, with a little understanding of how variable fallbacks work and how you can use this to make your own conditionals.

What are CSS Variables?

CSS Variables make it easy to use reuse colors and styles across a CSS codebase. At their most basic, they look like this:

:root {
    --my-fg-color: red;
    --my-bg-color: blue;
}

body {
    color: var(--my-fg-color);
    background: var(--my-bg-color);
}

:root {
    --my-fg-color: red;
    --my-bg-color: blue;
}

body {
    color: var(--my-fg-color);
    background: var(--my-bg-color);
}

CSS Variables can have fallbacks, which falls back to some other value if the variable isn’t defined. Here, the color will be black if --my-custom-fg-color wasn’t defined in the scope.

body {
    color: var(--my-custom-fg-color, black);
}

body {
    color: var(--my-custom-fg-color, black);
}

This can be nested with other CSS variables:

body {
    color: var(--my-custom-fg-color, var(--my-fg-color, black));
}

body {
    color: var(--my-custom-fg-color, var(--my-fg-color, black));
}

Fallbacks depend on whether the variable is truthy or falsy.

Truthy and Falsy Variables

When a variable is falsy, var() will use the fallback value.

When it is truthy, the fallback value won’t be used.

Normally, it’s falsy if the variable isn’t defined, and truthy if it is. But there are a couple tricks to know about:

You can make a variable falsy by setting it to the special keyword initial.
You can make a variable truthy (without expanding to a value) by making the value empty.

Let me show you what I mean:

:root {
    /* Empty -- Truthy and a no-op */
    --no-fallback: ;          

    /* 'initial' -- Falsy, use fallback */
    --use-fallback: initial;
}

body {
    background: var(--no-fallback, grey);
    color: var(--use-fallback, blue);
}

:root {
    /* Empty -- Truthy and a no-op */
    --no-fallback: ;          

    /* 'initial' -- Falsy, use fallback */
    --use-fallback: initial;
}

body {
    background: var(--no-fallback, grey);
    color: var(--use-fallback, blue);
}

This will result in the following CSS:

body {
    background: ;  /* No value -- ignored. */
    color: blue;
}

body {
    background: ;  /* No value -- ignored. */
    color: blue;
}

Our new --use-fallback will ensure the fallback value will be used, and --no-fallback will ensure it’s never used (but won’t leave behind a value).

These are our core building blocks for conditional CSS Variables.

We can then combine these rules.

body {
    background:
        var(--no-fallback, red)
        var(--use-fallback, pink);

    color:
        var(--no-fallback, pink)
        var(--use-fallback, red);
}

body {
    background:
        var(--no-fallback, red)
        var(--use-fallback, pink);

    color:
        var(--no-fallback, pink)
        var(--use-fallback, red);
}

This will give us:

body {
    background: pink;
    color: red;
}

body {
    background: pink;
    color: red;
}

Behold:

You can play around with it here:

Armed with this knowledge, we can now start setting CSS Variables based on other state, and trigger the conditions.

Setting Conditional Variables

Let’s set up another contrived example. We’re going to put together some HTML with a button that styles differently based on where it’s placed. In this case, when in <main>, it will appear blue-on-pink and elsewhere it’ll appear pink-on-blue.

It’s my post and I will use hideous color palettes if I want to.

Here’s our HTML:

<html>
 <body>
  <button>Hello</button>
  <main>
   <button>There</button>
  </main>
  <button>Friend</button>
 </body>
</html>

<html>
 <body>
  <button>Hello</button>
  <main>
   <button>There</button>
  </main>
  <button>Friend</button>
 </body>
</html>

We’re going to set up some conditional state variables based on the main placement. (Yep, contrived example, but this is just for demonstration purposes.)

Here’s our CSS:

:root {
    --if-main: ;
    --if-not-main: initial;
}

main {
    --if-main: initial;
    --if-not-main: ;
}

button {
    background:
        var(--if-main, pink)
        var(--if-not-main, blue);

    color:
        var(--if-main, blue)
        var(--if-not-main, pink);
}

:root {
    --if-main: ;
    --if-not-main: initial;
}

main {
    --if-main: initial;
    --if-not-main: ;
}

button {
    background:
        var(--if-main, pink)
        var(--if-not-main, blue);

    color:
        var(--if-main, blue)
        var(--if-not-main, pink);
}

By default, --if-main will be truthy (so no fallback) and --if-not-main will be falsy (so it’ll use the fallback).

When in <main>, we reverse that.

The button will style the background and foreground colors based on that state.

And here’s what we get:

Three buttons. "Hello" with pink on blue, "There" with blue on pink, and "Friend" with pink on blue.

Beautiful.

You can play with that here:

Let’s Build Dark Mode

It’s 2025, so we want to support light and dark modes. There’s no end of articles on how to do this, and now you’re reading another one of them.

A common approach is to write different rules based on CSS classes.

body.-is-light {
    background: orange;
    color: green;
}

body.-is-dark {
    background: darkorange;
    color: darkgreen;
}

body.-is-light {
    background: orange;
    color: green;
}

body.-is-dark {
    background: darkorange;
    color: darkgreen;
}

That works if we’re manually controlling CSS classes on body, but these days we want to respect system settings, so we use media selectors like so:

@media (prefers-color-scheme: light) {
    body {
        background: orange;
        color: green;
    }
}

@media (prefers-color-scheme: dark) {
    body {
        background: darkorange;
        color: darkgreen;
    }
}

@media (prefers-color-scheme: light) {
    body {
        background: orange;
        color: green;
    }
}

@media (prefers-color-scheme: dark) {
    body {
        background: darkorange;
        color: darkgreen;
    }
}

But now we can’t give users a nice toggle on the page to control their light/dark modes, so we’ll want to bring the CSS classes back with the media selectors:

body.-is-light {
    background: orange;
    color: green;
}

body.-is-dark button {
    background: darkorange;
    color: darkgreen;
}

@media (prefers-color-scheme: light) {
    body {
        background: orange;
        color: green;
    }
}

@media (prefers-color-scheme: dark) {
    body {
        background: darkorange;
        color: darkgreen;
    }
}

body.-is-light {
    background: orange;
    color: green;
}

body.-is-dark button {
    background: darkorange;
    color: darkgreen;
}

@media (prefers-color-scheme: light) {
    body {
        background: orange;
        color: green;
    }
}

@media (prefers-color-scheme: dark) {
    body {
        background: darkorange;
        color: darkgreen;
    }
}

Okay, now things are just out of control. And we haven’t moved beyond the <body> tag. We’d have to repeat this for everything else we want to style!

Yuck. I hate CSS.

Let’s Hate CSS Less

It all comes down to this. We’re going to take our building blocks and clean this all up.

Here’s our plan of attack:

We’re going to define some truthy and falsy CSS variables saying if we’re in light or dark mode.
We’re going to consolidate the styles for our components and use our conditional var() statements.

We’ll first set up our state variables. We only need to do this once for the whole codebase.

/* We'll make light mode our default. Everyone loves light mode. */
:root,
:root.-is-light {
    --if-light: initial;
    --if-dark: ;

    color-scheme: light;
}

:root.-is-dark {
    --if-light: ;
    --if-dark: initial;

    color-scheme: dark light;
}

@media (prefers-color-scheme: light) {
    :root {
        --if-light: initial;
        --if-dark: ;

        color-scheme: light;
    }
}

@media (prefers-color-scheme: dark) {
    :root {
        --if-light: ;
        --if-dark: initial;

        color-scheme: dark light;
    }
}

/* We'll make light mode our default. Everyone loves light mode. */
:root,
:root.-is-light {
    --if-light: initial;
    --if-dark: ;

    color-scheme: light;
}

:root.-is-dark {
    --if-light: ;
    --if-dark: initial;

    color-scheme: dark light;
}

@media (prefers-color-scheme: light) {
    :root {
        --if-light: initial;
        --if-dark: ;

        color-scheme: light;
    }
}

@media (prefers-color-scheme: dark) {
    :root {
        --if-light: ;
        --if-dark: initial;

        color-scheme: dark light;
    }
}

Now let’s use all that to style <body> and <button>:

body {
    background:
        var(--if-light, orange)
        var(--if-dark, darkorange);

    color:
        var(--if-light, darkgreen)
        var(--if-dark, blue);
}

button {
    background:
        var(--if-light, pink)
        var(--if-dark, blue);

    color:
        var(--if-light, blue)
        var(--if-dark, pink);
}

body {
    background:
        var(--if-light, orange)
        var(--if-dark, darkorange);

    color:
        var(--if-light, darkgreen)
        var(--if-dark, blue);
}

button {
    background:
        var(--if-light, pink)
        var(--if-dark, blue);

    color:
        var(--if-light, blue)
        var(--if-dark, pink);
}

Look at that. So much simpler to maintain. And we can extend that too, if we want to add high-constrast mode or something.

Let’s test it.

Here it is when we switch our system to dark mode:

A blue-on-dark-orange page saying "Welcome to my beautiful page.", with three pink-on-blue buttons saying "Hello", "There", and "Friend".

Far less eye strain than light. I’m sold.

Now light mode:

A green-on-light-orange page saying "Welcome to my beautiful page.", with three blue-on-pink buttons saying "Hello", "There", and "Friend".

And system dark mode with <html class="is-light"> overriding to light mode.

And system light mode with <html class="is-dark"> overriding to dark mode.

Perfection.

You can play with that one here:

Nesting Conditionals!

That’s right, you can nest them! Let’s update our Light/Dark Mode example to make a prettier color scheme, because I’m starting to realize what we had was kind of ugly.

We’ll define new --if-pretty and --if-ugly states to go along with --if-dark and --if-light:

:root,
:root.-is-light {
    --if-pretty: ;
    --if-ugly: initial;

    ...
}

:root.-is-pretty {
    --if-pretty: initial;
    --if-ugly: ;
}

...

:root,
:root.-is-light {
    --if-pretty: ;
    --if-ugly: initial;

    ...
}

:root.-is-pretty {
    --if-pretty: initial;
    --if-ugly: ;
}

...

And now we can build some truly beautiful styling, using nested conditionals:

body {
    background:
        var(--if-pretty,
            var(--if-light,
                linear-gradient(
                    in hsl longer hue 45deg,
                    red 0 100%)
                )
            )
            var(--if-dark,
                black
                linear-gradient(
                    in hsl longer hue 45deg,
                    rgba(255, 0, 0, 20%) 0 100%)
                )
            )
        )
        var(--if-ugly,
            var(--if-light, orange)
            var(--if-dark, darkorange)
        );

    color:
        var(--if-pretty,
            var(--if-light, black)
            var(--if-dark, white)
        )
        var(--if-ugly,
            var(--if-light, darkgreen)
            var(--if-dark, blue)
        );

    font-size: 100%;
}

button {
    background:
        var(--if-pretty,
            radial-gradient(
                ellipse at top,
                orange,
                transparent
            ),
            radial-gradient(
                ellipse at bottom,
                orange,
                transparent
            )
        )
        var(--if-ugly,
            var(--if-light, pink)
            var(--if-dark, blue)
        );

  color:
        var(--if-pretty, black)
        var(--if-ugly,
            var(--if-light, blue)
            var(--if-dark, pink)
        );
}

body {
    background:
        var(--if-pretty,
            var(--if-light,
                linear-gradient(
                    in hsl longer hue 45deg,
                    red 0 100%)
                )
            )
            var(--if-dark,
                black
                linear-gradient(
                    in hsl longer hue 45deg,
                    rgba(255, 0, 0, 20%) 0 100%)
                )
            )
        )
        var(--if-ugly,
            var(--if-light, orange)
            var(--if-dark, darkorange)
        );

    color:
        var(--if-pretty,
            var(--if-light, black)
            var(--if-dark, white)
        )
        var(--if-ugly,
            var(--if-light, darkgreen)
            var(--if-dark, blue)
        );

    font-size: 100%;
}

button {
    background:
        var(--if-pretty,
            radial-gradient(
                ellipse at top,
                orange,
                transparent
            ),
            radial-gradient(
                ellipse at bottom,
                orange,
                transparent
            )
        )
        var(--if-ugly,
            var(--if-light, pink)
            var(--if-dark, blue)
        );

  color:
        var(--if-pretty, black)
        var(--if-ugly,
            var(--if-light, blue)
            var(--if-dark, pink)
        );
}

In standard Ugly Mode, this looks the same as before, but when we apply Pretty Mode with Light Mode, we get:

A black-on-rainbow page saying "Welcome to my beautiful page.", with three orange circular gradient buttons saying "Hello", "There", and "Friend".

And then with Dark Mode:

A white-on-dark-rainbow page saying "Welcome to my beautiful page.", with three orange circular gradient buttons saying "Hello", "There", and "Friend".

I am available for design consulting work on a first-come, first-serve basis only.

Here it is so you can enjoy it yourself:

A Backport from If()

Google’s article on if() has a neat demo showing how you can style some cards based on the value of a data-status= attribute. Based on whether the value is pending, complete, or anything else, the cards will be placed in different columns and have different background and border colors.

Here’s the card:

<div class="card" data-status="complete">
  ...
</div>

<div class="card" data-status="complete">
  ...
</div>

Here’s how you’d do that with if() (from their demo):

.card {
    --status: attr(data-status type(<custom-ident>));

    border-color: if(
        style(--status: pending): royalblue;
        style(--status: complete): seagreen;
        else: gray);

    background-color: if(
        style(--status: pending): #eff7fa;
        style(--status: complete): #f6fff6;
        else: #f7f7f7);

    grid-column: if(
        style(--status: pending): 1;
        style(--status: complete): 2;
        else: 3);
}

.card {
    --status: attr(data-status type(<custom-ident>));

    border-color: if(
        style(--status: pending): royalblue;
        style(--status: complete): seagreen;
        else: gray);

    background-color: if(
        style(--status: pending): #eff7fa;
        style(--status: complete): #f6fff6;
        else: #f7f7f7);

    grid-column: if(
        style(--status: pending): 1;
        style(--status: complete): 2;
        else: 3);
}

To do this with CSS Variables, we can define our states using CSS selectors on the attribute, and then style it the way we did earlier:

/* Define our variables and conditions. */
.card {
  --if-pending: ;
  --if-complete: ;
  --if-default: initial;
}

.card[data-status="pending"] {
  --if-pending: initial;
  --if-default: ;
}

.card[data-status="complete"] {
  --if-complete: initial;
  --if-default: ;
}

/* Style our cards. */
.card {
  border-color:
    var(--if-pending, royalblue)
    var(--if-complete, seagreen)
    var(--if-default, grey);

  background-color:
    var(--if-pending, #eff7fa)
    var(--if-complete, #f6fff6)
    var(--if-default, #f7f7f7);

  grid-column:
    var(--if-pending, 1)
    var(--if-complete, 2)
    var(--if-default, 3);
}

/* Define our variables and conditions. */
.card {
  --if-pending: ;
  --if-complete: ;
  --if-default: initial;
}

.card[data-status="pending"] {
  --if-pending: initial;
  --if-default: ;
}

.card[data-status="complete"] {
  --if-complete: initial;
  --if-default: ;
}

/* Style our cards. */
.card {
  border-color:
    var(--if-pending, royalblue)
    var(--if-complete, seagreen)
    var(--if-default, grey);

  background-color:
    var(--if-pending, #eff7fa)
    var(--if-complete, #f6fff6)
    var(--if-default, #f7f7f7);

  grid-column:
    var(--if-pending, 1)
    var(--if-complete, 2)
    var(--if-default, 3);
}

Here’s the ported version, live:

It’s Available Now!

You can use this today in all browsers that support CSS Variables. That’s.. counts.. years worth of browsers.

It requires a bit more work to set up the variables, but the nice thing is, once you’ve done that, the rest is highly maintainable. And usage is close enough to that of if() that you can more easily transition once support is widespread.

We’ve based Ink, our (still very young) CSS component library we use for Review Board, around this conditionals pattern. It’s helped us to support light and dark modes along with the beginnings of low- and high-contrast modes and state for some components without tearing our hair out.

Give it a try, play with the demos, and see if it’s a good fit for your codebase. Begin taking advantage of what if() has to offer today, so you can more easily port tomorrow.

What If()? Using Conditional CSS Variables Read More »

A crash course on Python function signatures and typing

July 12, 2025 - Python, Today I Learned - Leave a Comment

Screenshot of Python code defining some example Python types for functions

I’ve been doing some work on Review Board and our utility library Typelets, and thought I’d share some of the intricacies of function signatures and their typing in Python.

We have some pretty neat Python typing utilities in the works to help a function inherit another function’s types in their own *args and **kwargs without rewriting a TypedDict. Useful for functions that need to forward arguments to another function. I’ll talk more about that later, but understanding how it works first requires understanding a bit about how Python sees functions.

Function signatures

Python’s inspect module is full of goodies for analyzing objects and code, and today we’ll explore the inspect.signature() function.

inspect.signature() is used to introspect the signature of another function, showing its parameters, default values, type annotations, and more. It can also aid IDEs in understanding a function signature. If a function has __signature__ defined, it will reference this (at least in CPython), and this gives highly-dynamic code the ability to patch signatures at runtime. (If you’re curious, here’s exactly how it works).

There are a few places where knowing the signature can be useful, such as:

Automatically-crafting documentation (Sphinx does this)
Checking if a callback handler accepts the right arguments
Checking if an implementation of an interface is using deprecated function signatures

Let’s set up a function and take a look at its signature.

>>> def my_func(
...     a: int,
...     b: str,
...     /,
...     c: dict[str, str] | None,
...     *,
...     d: bool = False,
...     **kwargs,
... ) -> str:
...     ...
... 
>>> import inspect
>>> sig = inspect.signature(my_func)

>>> sig
<Signature (a: int, b: str, /, c: dict[str, str] | None, *,
d: bool = False, **kwargs) -> str>

>>> sig.parameters
mappingproxy(OrderedDict({
    'a': <Parameter "a: int">,
    'b': <Parameter "b: str">,
    'c': <Parameter "c: dict[str, str] | None">,
    'd': <Parameter "d: bool = False">,
    'kwargs': <Parameter "**kwargs">
}))

>>> sig.return_annotation
<class 'str'>

>>> sig.parameters.get('c')
<Parameter "c: dict[str, str] | None">

>>> 'kwargs' in sig.parameters
True

>>> 'foo' in sig.parameters
False

>>> def my_func(
...     a: int,
...     b: str,
...     /,
...     c: dict[str, str] | None,
...     *,
...     d: bool = False,
...     **kwargs,
... ) -> str:
...     ...
... 
>>> import inspect
>>> sig = inspect.signature(my_func)

>>> sig
<Signature (a: int, b: str, /, c: dict[str, str] | None, *,
d: bool = False, **kwargs) -> str>

>>> sig.parameters
mappingproxy(OrderedDict({
    'a': <Parameter "a: int">,
    'b': <Parameter "b: str">,
    'c': <Parameter "c: dict[str, str] | None">,
    'd': <Parameter "d: bool = False">,
    'kwargs': <Parameter "**kwargs">
}))

>>> sig.return_annotation
<class 'str'>

>>> sig.parameters.get('c')
<Parameter "c: dict[str, str] | None">

>>> 'kwargs' in sig.parameters
True

>>> 'foo' in sig.parameters
False

Pretty neat. Pretty useful, when you need to know what a function takes and returns.

Let’s see what happens when we work with methods on classes.

>>> class MyClass:
...     def my_method(
...         self,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass.my_method)
<Signature (self, *args, **kwargs) -> None>

>>> class MyClass:
...     def my_method(
...         self,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass.my_method)
<Signature (self, *args, **kwargs) -> None>

Seems reasonable. But…

>>> obj = MyClass()
>>> inspect.signature(obj.my_method)
<Signature (*args, **kwargs) -> None>

>>> obj = MyClass()
>>> inspect.signature(obj.my_method)
<Signature (*args, **kwargs) -> None>

self disappeared!

What happens if we do this on a classmethod? Place your bets…

>>> class MyClass2:
...     @classmethod
...     def my_method(
...         cls,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass2.my_method)
<Signature (*args, **kwargs) -> None>

>>> class MyClass2:
...     @classmethod
...     def my_method(
...         cls,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass2.my_method)
<Signature (*args, **kwargs) -> None>

If you guessed it wouldn’t have cls, you’d be right.

Only unbound methods (definitions of methods on a class) will have a self parameter in the signature. Bound methods (callable methods bound to an instance of a class) and classmethods (callable methods bound to a class) don’t. And this makes sense, if you think about it, because this signature represents what the call accepts, not what the code defining the method looks like.

You don’t pass a self when calling a method on an object, or a cls when calling a classmethod, so it doesn’t appear in the function signature. But did you know that you can call an unbound method if you provide an object as the self parameter? Watch this:

>>> class MyClass:
...     def my_method(self) -> None:
...         self.x = 42
...
>>> obj = MyClass()
>>> MyClass.my_method(obj)
>>> obj.x
42

>>> class MyClass:
...     def my_method(self) -> None:
...         self.x = 42
...
>>> obj = MyClass()
>>> MyClass.my_method(obj)
>>> obj.x
42

In this case, the unbound method MyClass.my_method has a self argument in its signature, meaning it takes it in a call. So, we can just pass in an instance. There aren’t a lot of cases where you’d want to go this route, but it’s helpful to know how this works.

What are bound and unbound methods?

I briefly touched upon this, but:

Unbound methods are just functions. Functions that are defined on a class.

Bound methods are a function where the very first argument (self or cls) is bound to a value.

Binding normally happens when you instantiate a class, but you can do it yourself through any function’s __get__():

>>> def my_func(self):
...     print('I am', self)
... 
>>> class MyClass:
...     ...
...
>>> obj = MyClass()
>>> method = my_func.__get__(MyClass)
>>> method
<bound method my_func of <__main__.MyClass object at 0x100ea20d0>>

>>> method.__self__
<__main__.MyClass object at 0x100ea20d0>

>>> inspect.ismethod(method)
True

>>> method()
I am <__main__.MyClass object at 0x100ea20d0>

>>> def my_func(self):
...     print('I am', self)
... 
>>> class MyClass:
...     ...
...
>>> obj = MyClass()
>>> method = my_func.__get__(MyClass)
>>> method
<bound method my_func of <__main__.MyClass object at 0x100ea20d0>>

>>> method.__self__
<__main__.MyClass object at 0x100ea20d0>

>>> inspect.ismethod(method)
True

>>> method()
I am <__main__.MyClass object at 0x100ea20d0>

my_func wasn’t even defined on a class, and yet we could still make it a bound method tied to an instance of MyClass.

You can think of a bound method as a convenience over having to pass in an object as the first parameter every time you want to call the function. As we saw above, we can do exactly that, if we pass it to the unbound method, but binding saves us from doing this every time.

You’ll probably never need to do this trick yourself, but it’s helpful to know how it all ties together.

By the way, @staticmethod is a way of telling Python to never make an unbound method into a bound method when instantiating the object (it stays a function), and @classmethod is a way of telling Python to bind it immediately to the class it’s defined on (and not rebind when instantiating an object from the class).

How do you tell them apart?

If you have a function, and you don’t know if it’s a standard function, a classmethod, a bound method, or an unbound method, how can you tell?

Bound methods have a __self__ attribute pointing to the parent object (and inspect.ismethod() will be True).
Classmethods have a __self__ attribute pointing to the parent class (and inspect.ismethod() will be True).
Unbound methods are tricky:
- They do not have a __self__ attribute.
- They might have a self or cls parameter in the signature, but they might not have those names (and other functions may define them).
- They should have a . in its __qualname__ attribute. This is a full .-based path to the method, relative to the module.
- Splitting __qualname__, the last component would be the name. The previous component won’t be <locals> (but if <locals> is found, you’re going to have trouble getting to the method).
- If the full path is resolvable, the parent component should be a class (but it might not be).
- You could… resolve the parent module to a file and walk its AST and find the class and method based on __qualname__. But this is expensive and probably a bad idea for most cases.
Standard functions are the fallback.

Since unbound methods are standard functions that are just defined on a class, it’s difficult to really tell the difference. You’d have to rely on heuristics and know you won’t always get a definitive answer.

(Interesting note: In Python 2, unbound methods were special kinds of functions with a __self__ that pointed to the class they were defined on, so you could easily tell!)

The challenges of typing

Functions can be typed using Callable[[<param_types>], <return_type>].

This is very simplistic, and can’t be used to represent positional-only arguments, keyword-only arguments, default arguments, *args, or **kwargs. For that, you can define a Protocol with __call__:

from typing import Protocol


class MyCallback(Protocol):
    def __call__(
        self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

from typing import Protocol


class MyCallback(Protocol):
    def __call__(
        self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

Type checkers will then treat this as a Callable, effectively. If we take my_func from the top of this post, we can assign to it:

cb: MyCallback = my_func  # This works

cb: MyCallback = my_func  # This works

What if we want to assign a method from a class? Let’s try bound and unbound.

class MyClass:
    def my_func(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'

cb2: MyCallback = MyClass.my_func  # This fails
cb3: MyCallback = MyClass().my_func  # This works

class MyClass:
    def my_func(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'

cb2: MyCallback = MyClass.my_func  # This fails
cb3: MyCallback = MyClass().my_func  # This works

What happened? It’s that self again. Remember, the unbound method has self in the signature, and the bound method does not.

Let’s add self and try again.

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

cb2: MyCallback = MyClass.my_func    # This works
cb3: MyCallback = MyClass().my_func  # This fails

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

cb2: MyCallback = MyClass.my_func    # This works
cb3: MyCallback = MyClass().my_func  # This fails

What happened this time?! Well, now we’ve matched the unbound signature with self, but not the bound signature without it.

Solving this gets… verbose. We can create two versions of this: Unbound, and Bound (or plain function, or classmethod):

class MyUnboundCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

# These work
cb4: MyCallback = my_func
cb5: MyCallback = MyClass().my_func


# These fail correctly
cb7: MyUnboundCallback = my_func
cb8: MyUnboundCallback = MyClass().my_func
cb9: MyCallback = MyClass.my_func

class MyUnboundCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

# These work
cb4: MyCallback = my_func
cb5: MyCallback = MyClass().my_func


# These fail correctly
cb7: MyUnboundCallback = my_func
cb8: MyUnboundCallback = MyClass().my_func
cb9: MyCallback = MyClass.my_func

This means we can use union types (MyUnboundCallback | MyCallback) to cover our bases.

It’s not flawless. Depending on how you’ve typed your signature, and the signature of the function you’re setting, you might not get the behavior you want or expect. As an example, any method with a leading self-like parameter (basically any parameter coming before your defined signature) will type as MyUnboundCallback, because it might be! Remember, we can turn any function into a bound method for an arbitrary class using __get__. That may or may not matter, depending on what you need to do.

What do I mean by that?

def my_bindable_func(
    x,
    a: int,
    b: str,
    /,
    c: dict[str, str] | None,
    *,
    d: bool = False,
    **kwargs,
) -> str:
    return ''


x1: MyCallback = my_bindable_func         # This fails
x2: MyUnboundCallback = my_bindable_func  # This works

def my_bindable_func(
    x,
    a: int,
    b: str,
    /,
    c: dict[str, str] | None,
    *,
    d: bool = False,
    **kwargs,
) -> str:
    return ''


x1: MyCallback = my_bindable_func         # This fails
x2: MyUnboundCallback = my_bindable_func  # This works

x may not be named self, but it’ll get treated as one, because if we do my_bindable_func.__get__(some_obj), then some_obj will be bound to x and callers won’t have to pass anything to x.

Okay, what if you want to return a function that can behave as an unbound method (with self) that can become an unbound method (with __get__)? We can mostly do it with:

from typing import ParamSpec, TypeVar, cast, overload


_C = TypeVar('_C')
_R_co = TypeVar('_R_co', covariant=True)
_P = ParamSpec('_P')


class MyMethod(Protocol[_C, _P, _R_co]):
    __self__: _C

    @overload
    def __get__(
        self,
        instance: None,
        owner: type[_C],
    ) -> Callable[Concatenate[_C, _P], _R_co]:
        ...

    @overload
    def __get__(
        self,
        instance: _C,
        owner: type[_C],
    ) -> Callable[_P, _R_co]:
        ...

    def __get__(
        self,
        instance: _C | None,
        owner: type[_C],
    ) -> (
        Callable[Concatenate[_C, _P], _R_co] |
        Callable[_P, _R_co]
    ):
        ...

from typing import ParamSpec, TypeVar, cast, overload


_C = TypeVar('_C')
_R_co = TypeVar('_R_co', covariant=True)
_P = ParamSpec('_P')


class MyMethod(Protocol[_C, _P, _R_co]):
    __self__: _C

    @overload
    def __get__(
        self,
        instance: None,
        owner: type[_C],
    ) -> Callable[Concatenate[_C, _P], _R_co]:
        ...

    @overload
    def __get__(
        self,
        instance: _C,
        owner: type[_C],
    ) -> Callable[_P, _R_co]:
        ...

    def __get__(
        self,
        instance: _C | None,
        owner: type[_C],
    ) -> (
        Callable[Concatenate[_C, _P], _R_co] |
        Callable[_P, _R_co]
    ):
        ...

Putting it into practice:

def make_method(
    source_method: Callable[Concatenate[_C, _P], _R_co],
) -> MyMethod[_C, _P, _R_co]:
    return cast(MyMethod, source_method)


class MyClass2:
    @make_method
    def my_method(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return True


# These work!
MyClass2().my_method(1, 'x', {}, d=True)
MyClass2.my_method(MyClass2(), 1, 'x', {}, d=True)

def make_method(
    source_method: Callable[Concatenate[_C, _P], _R_co],
) -> MyMethod[_C, _P, _R_co]:
    return cast(MyMethod, source_method)


class MyClass2:
    @make_method
    def my_method(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return True


# These work!
MyClass2().my_method(1, 'x', {}, d=True)
MyClass2.my_method(MyClass2(), 1, 'x', {}, d=True)

That’s a fair bit of work, but it satisfies the bound vs. unbound methods signature differences. If we inspect these:

>>> reveal_type(MyClass2.my_method)
Type of "MyClass2.my_method" is "(MyClass2, a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2().my_method)
Type of "MyClass2().my_method" is "(a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2.my_method)
Type of "MyClass2.my_method" is "(MyClass2, a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2().my_method)
Type of "MyClass2().my_method" is "(a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

And those are type-compatible with the MyCallback and MyUnboundCallback we built earlier, since the signatures match:

# These work
cb10: MyUnboundCallback = MyClass.my_method
cb11: MyCallback = MyClass().my_method

# These fail correctly
cb12: MyUnboundCallback = MyClass().my_method
cb13: MyCallback = MyClass.my_method

# These work
cb10: MyUnboundCallback = MyClass.my_method
cb11: MyCallback = MyClass().my_method

# These fail correctly
cb12: MyUnboundCallback = MyClass().my_method
cb13: MyCallback = MyClass.my_method

And if we wanted, we could modify that ParamSpec going into the MyMethod from make_method() and that’ll impact what the type checkers expect during the call.

Hopefully you can see how this can get complex fast, and involve some tradeoffs.

I personally believe Python needs a lot more love in this area. Types for the different kinds of functions/methods, better specialization for Callable, and some of the useful capabilities from TypeScript would be nice (such as Parameters<T>, ReturnType<T>, OmitThisParameter<T>, etc.). But this is what we have to work with today.

My teachers said to always write a conclusion

What have we learned?

Python’s method signatures are different when bound vs. unbound, and this can affect typing.
Unbound methods aren’t really their own thing, and this can lead to some challenges.
Any method can be a bound method with a call to __get__().
Callable only gets you so far. If you want to type complex functions, write a Protocol with a __call__() signature.
If you want to simulate a bound/unbound-aware type, you’ll need Protocol with __get__().

I feel like I just barely scratched the surface here. There’s a lot more to functions, working with signatures, and challenges around typing than I covered here. We haven’t talked about how you can rewrite signatures on the fly, how annotations are represented, what functions look under the hood, or how bytecode behind functions are mutable at runtime.

I’ll leave some of that for future posts. And I’ll have more to talk about when we expand Typelets with the new parameter inheritance capabilities. It builds upon a lot of what I covered today to perform some very neat and practical tricks for library authors and larger codebases.

What do you think? Did you learn something? Did I get something wrong? Have you done anything interesting or unexpected with signatures or typing around functions you’d like to share? I want to hear about it!

A crash course on Python function signatures and typing Read More »

Tip: Use keyword-only arguments in Python dataclasses

June 29, 2025 - Python, Today I Learned - Leave a Comment

Python dataclasses are a really nice feature for constructing classes that primarily hold or work with data. They can be a good alternative to using dictionaries, since they allow you to add methods, dynamic properties, and subclasses. They can also be a good alternative to building your own class by hand, since they don’t need a custom __init__() that reassigns attributes and provide methods like __eq__() out of the box.

One small tip to keeping dataclasses maintainable is to always construct them with kw_only=True, like so:

from dataclasses import dataclass


@dataclass(kw_only=True)
class MyDataClass:
    x: int
    y: str
    z: bool = True

This will construct an __init__() that looks like this:

class MyDataClass:
    def __init__(
        self,
        *,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

Instead of:

class MyDataClass:
    def __init__(
        self,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

That * in the argument list means everything that follows must be passed as a keyword argument, instead of a positional argument.

There are two reasons you probably want to do this:

It allows you to reorder the fields on the dataclass without breaking callers. Positional arguments means a caller can use MyDataClass(1, 'foo', False), and if you remove/reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
It allows subclasses to add required fields. Normally, any field with a default value (like z above) will force any fields following it to also have a default. And that includes all fields defined by subclasses. Using kw_only=True gives subclasses the flexibility to decide for themselves which fields must be provided by the caller and which have a default.

These reasons are more important for library authors than anything. We spend a lot of time trying to ensure backwards-compatibility and forwards-extensibility in Review Board, so this is an important topic for us. And if you’re developing something reusable with dataclasses, it might be for you, too.

Update: One important point I left out is Python compatibility. This flag was introduced in Python 3.10, so if you’re supporting older versions, you won’t be able to use this just yet. If you want to optimistically enable this just on 3.10+, one approach would be:

import sys
from dataclasses import dataclass


if sys.version_info[:2] >= (3, 10):
    dataclass_kwargs = {
        'kw_only': True,
    }
else:
    dataclass_kwargs = {}

...

@dataclass(**dataclass_kwargs)
class MyDataClass:
    ...
...

But this won’t solve the subclassing issue, so you’d still need to ensure any subclasses use default arguments if you want to support versions prior to 3.10.

Tip: Use keyword-only arguments in Python dataclasses Read More »

CodeMirror and Spell Checking: Solved

June 26, 2025 - Beanbag, Open Source, Web Development - Leave a Comment

A screenshot of the CodeMirror editor with spelling issues and the browser's spell checking menu opened.

For years I’ve wanted spell checking in CodeMirror. We use CodeMirror in our Review Board code review tool for all text input, in order to allow on-the-fly syntax highlighting of code, inline image display, bold/italic, code literals, etc.

(We’re using CodeMirror v5, rather than v6, due to the years’ worth of useful plugins and the custom extensions we’ve built upon it. CodeMirror v6 is a different beast. You should check it out, but we’re going to be using v5 for our examples here. Much of this can likely be leveraged for other editing components as well.)

CodeMirror is a great component for the web, and I have a ton of respect for the project, but its lack of spell checking has always been a complaint for our users.

And the reason for that problem lies mostly on the browsers and “standards.” Starting with…

ContentEditable Mode

Browsers support opting an element into what’s called Content Editable mode. This allows any element to become editable right in the browser, like a fancy <textarea>, with a lot of pretty cool capabilities:

Rich text editing
Rich copy/paste
Selection management
Integration with spell checkers, grammar checkers, AI writing assistants
Works as you’d expect with your device’s native input methods (virtual keyboard, speech-to-text, etc.)

Simply apply contenteditable="true" to an element, and you can begin typing away. Add spellcheck="true" and you get spell checking for free. Try it!

And maybe you don’t even need spellcheck="true"! The browser may just turn it on automatically. But you may need spellcheck="false" if you don’t want it on. And it might stay on anyway!

Here we reach the first of many inconsistencies. Content Editable mode is complex and not perfectly consistent across browsers (though it’s gotten better). A few things you might run into include:

Ranges for selection events and input events can be inconsistent across browsers and are full of edge cases (you’ll be doing a lot of “let me walk the DOM and count characters carefully to find out where this selection really starts” checks).
Spell checking behaves quite differently on different browsers (especially in Chrome and Safari, which might recognize a word as misspelled but won’t always show it).
Rich copy/paste may mess with your DOM structure in ways you don’t expect.
Programmatic manipulating of the text content using execCommand is deprecated with no suitable replacement (and you don’t want to mess with the DOM directly or you break Undo/Redo). It also doesn’t always play nice with input events.

CodeMirror v5 tries to let the browser do its thing and sync state back, but this doesn’t always work. Replacing misspelled words on Safari or Chrome can sometimes cause text to flip back-and-forth. Data can be lost. Cursor positions can change. It can be a mess.

So while CodeMirror will let you enable both Content Editable and Spell Checking modes, it’s at your own peril.

Which is why we never enabled it.

How do we fix this?

When CodeMirror v5 was introduced, there weren’t a lot of options. But browsers have improved since.

The secret sauce is the beforeinput event.

There are a lot of operations that can involve placing new text in a Content Editable:

Replacing a misspelled word
Using speech-to-text
Having AI generate content or rewrite existing content
Transforming text to Title Case

These will generate a beforeinput event before making the change, and an input event after making the change.

Both events provide:

The type of operation:
1. insertText for text-to-speech or newly-generated text
2. insertReplacementText for spelling replacements, AI rewrites, and other similar operations
The range of text being replaced (or where new text will be inserted)
The new data (either as InputEvent.data in the form of one or more InputEvent.dataTransferItem.items[] entries)

Thankfully, beforeinput can be canceled, which prevents the operation from going through.

This is our way in. We can treat these operations as requests that CodeMirror can fulfill, instead of changes CodeMirror must react to.

A screenshot of a text field in CodeMirror with text highlighted and macOS's AI Writing Tools providing a concise version of the text.

Putting our plan into action

Here’s the general approach:

Listen to beforeinput on CodeMirror’s input element (codemirror.display.input.div).
Filter for the following InputEvent.inputType values: 'insertReplacementText', 'insertText'.
Fetch the ranges and the new plain text data from the InputEvent.
For each range:
1. Convert each range into a start/end line number within CodeMirror, and a start/end within each line.
2. Issue a CodeMirror.replaceRange() with the normalized ranges and the new text.

Simple in theory, but there’s a few things to get right:

Different browsers and different operations will report those ranges on different elements. They might be text nodes, they might be a parent element, or they might be the top-level contenteditable element. Or a combination. So we need to be very careful about our assumptions.
We need to be able to calculate those line numbers and offsets. We won’t necessarily have that information up-front, and it depends on what nodes we get in the ranges.
The text data can come from more than one place:
1. An InputEvent.data attribute value
2. One or more strings accessed asynchronously from InputEvent.dataTransfer.items[], in plain text, HTML, or potentially other forms.
We may not have all of this! Even as recently as late-2024, Chrome wasn’t giving me target ranges in beforeinput, only in input, which was too late. So we’ll want to bail if anything goes wrong.

Let’s put this into practice. I’ll use TypeScript to help make some of this a bit more clear, but you can do all this in JavaScript.

Feel free to skip to the end, if you don’t want to read a couple pages of TypeScript.

1. Set up our event handler

We’re going to listen to beforeinput. If it’s an event we care about, we’ll grab the target ranges, take over from the browser (by canceling the event), and then prepare to replay the operation using CodeMirror’s API.

This is going to require a bit of help figuring out what lines and columns those ranges correspond to, which we’ll tackle next.

const inputEl = codeMirror.display.input.div;
	
inputEl.addEventListener('beforeinput',
                         (evt: InputEvent) => {
    if (evt.inputType !== 'insertReplacementText' &&
        evt.inputType !== 'insertText') {
        /*
         * This isn't a text replacement or new text event,
         * so we'll want to let the browser handle this.
         *
         * We could just preventDefault()/stopPropagation()
         * if we really wanted to play it safe.
         */
        return;
    }

    /*
     * Grab the ranges from the event. This might be
     * empty, which would have been the case on some
     * versions of Chrome I tested with before. Play it
     * safe, bail if we can't find a range.
     *
     * Each range will have an offset in a start container
     * and an offset in an end container. These containers
     * may be text nodes or some parent node (up to and
     * including inputEl).
     */
    const ranges = evt.getTargetRanges();

    if (!ranges || ranges.length === 0) {
        /* We got empty ranges. There's nothing to do. */
        return;
    }

    const newText =
           evt.data
        ?? evt.dataTransfer?.getData('text')
        ?? null;

	if (newText === null) {
		/* We couldn't locate any text, so bail. */
        return;
	}

    /*
     * We'll take over from here. We don't want the browser
     * messing with any state and impacting CodeMirror.
     * Instead, we'll run the operations through CodeMirror.
     */
    evt.preventDefault();
    evt.stopPropagation();

    /*
     * Place the new text in CodeMirror.
     *
     * For each range, we're getting offsets CodeMirror
     * can understand and then we're placing text there.
     *
     * findOffsetsForRange() is where a lot of magic
     * happens.
     */
    for (const range of state.ranges) {
        const [startOffset, endOffset] =
            findOffsetsForRange(range);

        codeMirror.replaceRange(
            newText,
            startOffset,
            endOffset,
            '+input',
        );
    }
});

This is pretty easy, and applicable to more than CodeMirror. But now we’ll get into some of the nitty-gritty.

2. Map from ranges to CodeMirror positions

Most of the hard work really comes from mapping the event’s ranges to CodeMirror line numbers and columns.

We need to know the following:

Where each container node is in the document, for each end of the range.
What line number each corresponds to.
What the character offset is within that line.

This ultimately means a lot of traversing of the DOM (we can use TreeWalker for that) and counting characters. DOM traversal is an expense we want to incur as little as possible, so if we’re working on the same nodes for both end of the range, we’ll just calculate it once.

function findOffsetsForRange(
    range: StaticRange,
): [CodeMirror.Position, CodeMirror.Position] {
    /*
     * First, pull out the nodes and the nearest elements
     * from the ranges.
     *
     * The nodes may be text nodes, in which case we'll
     * need their parent for document traversal.
     */
    const startNode = range.startContainer;
    const endNode = range.endContainer;

    const startEl = (
        (startNode.nodeType === Node.ELEMENT_NODE)
        ? startNode as HTMLElement
        : startNode.parentElement);
    const endEl = (
        (endNode.nodeType === Node.ELEMENT_NODE)
        ? endNode as HTMLElement
        : endNode.parentElement);

    /*
     * Begin tracking the state we'll want to return or
     * use in future computations.
     *
     * In the optimal case, we'll be calculating some of
     * this only once and then reusing it.
     */
    let startLineNum = null;
    let endLineNum = null;
    let startOffsetBase = null;
    let startOffsetExtra = null;
    let endOffsetBase = null;
    let endOffsetExtra = null;

    let startCMLineEl: HTMLElement = null;
    let endCMLineEl: HTMLElement = null;

    /*
     * For both ends of the range, we'll need to first see
     * if we're at the top input element.
     *
     * If so, range offsets will be line-based rather than
     * character-based.
     *
     * Otherwise, we'll need to find the nearest line and
     * count characters until we reach our node.
     */
    if (startEl === inputEl) {
        startLineNum = range.startOffset;
    } else {
        startCMLineEl = startEl.closest('.CodeMirror-line');
        startOffsetBase = findCharOffsetForNode(startNode);
        startOffsetExtra = range.startOffset;
    }

    if (endEl === inputEl) {
        endLineNum = range.endOffset;
    } else {
        /*
         * If we can reuse the results from calculations
         * above, that'll save us some DOM traversal
         * operations. Otherwise, fall back to doing the
         * same logic we did above.
         */
        endCMLineEl =
            (range.endContainer === range.startContainer &&
             startCMLineEl !== null)
            ? startCMLineEl
            : endEl.closest(".CodeMirror-line");

        endOffsetBase =
            (startEl === endEl && startOffsetBase !== null)
            ? startOffsetBase
            : findCharOffsetForNode(endNode);
        endOffsetExtra = range.endOffset;
    }

    if (startLineNum === null || endLineNum === null) {
        /*
         * We need to find the line numbers that correspond
         * to either missing end of our range. To do this,
         * we have to walk the lines until we find both our
         * missing line numbers.
         */
        for (let i = 0;
             (i < children.length &&
              (startLineNum === null || endLineNum === null));
             i++) {
            const child = children[i];

            if (startLineNum === null &&
                child === startCMLineEl) {
                startLineNum = i;
            }

            if (endLineNum === null &&
                child === endCMLineEl) {
                endLineNum = i;
            }
        }
    }

    /*
     * Return our results.
     *
     * We may not have set some of the offsets above,
     * depending on whether we were working off of the
     * CodeMirror input element, a text node, or another
     * parent element. And we didn't want to set them any
     * earlier, because we were checking to see what we
     * computed and what we could reuse.
     *
     * At this point, anything we didn't calculate should
     * be 0.
     */
    return [
        {
            ch: (startOffsetBase || 0) +
                (startOffsetExtra || 0),
            line: startLineNum,
        },
        {
            ch: (endOffsetBase || 0) +
                (endOffsetExtra || 0),
            line: endLineNum,
        },
    ];
}


/*
 * The above took care of our line numbers and ranges, but
 * it got some help from the next function, which is designed
 * to calculate the character offset to a node from an
 * ancestor element.
 */
function findCharOffsetForNode(
    targetNode: Node,
): number {
    const targetEl = (
        targetNode.nodeType === Node.ELEMENT_NODE)
        ? targetNode as HTMLElement
        : targetNode.parentElement;
    const startEl = targetEl.closest('.CodeMirror-line');
    let offset = 0;

    const treeWalker = document.createTreeWalker(
        startEl,
        NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
    );

    while (treeWalker.nextNode()) {
        const node = treeWalker.currentNode;

        if (node === targetNode) {
            break;
        }

        if (node.nodeType === Node.TEXT_NODE) {
            offset += (node as Text).data.length;
        }
    }

    return offset;

}

Whew! That’s a lot of work.

CodeMirror has some similar logic internally, but it’s not exposed, and not quite what we want. If you were working on making all this work with another editing component, it’s possible this would be more straight-forward.

What does this all give us?

Spell checking and replacements without (nearly as many) glitches in browsers
Speech-to-text without CodeMirror stomping over results
AI writing and rewriting, also without risk of lost data
Transforming of text through other means.

Since we took the control away from the browser and gave it to CodeMirror, we removed most of the risk and instability.

But there are still problems. While this works great on Firefox, Chrome and Safari are a different story. Those browsers are bit more lazy when it comes to spell checking, and even once it’s found some spelling errors, you might not see the red squigglies. Typing, clicking around, or forcing a round of spell checking might bring them back, but might not. But this is their implementation, and not the result of the CodeMirror integration.

Ideally, spell checking would become a first-class citizen on the web. And maybe this will happen someday, but for now, at least there are some workarounds to get it to play nicer with tools like CodeMirror.

We can go further

There’s so much more in InputEvent we could play with. We explored the insertReplacementText and insertText types, but there’s also:

insertLink
insertFromDrop
insertOrderedList
formatBold
historyUndo

And so many more.

These could be integrated deeper into CodeMirror, which may open some doors to a far more native feel on more platforms. But that’s left as an exercise to the reader (it’s pretty dependent on your CodeMirror modes and the UI you want to provide).

There are also improvements to be made, as this is not perfect yet (but it’s close!). Safari still doesn’t recognize when text is selected, leaving out the AI assisted tools, but Chrome and Firefox work great. We’re working on the rest.

Give it a try

You can try our demo live in your favorite browser. If it doesn’t work for you, let me know what your browser and version are. I’m very curious.

We’ve released this as a new CodeMirror v5 plugin, CodeMirror Speak-and-Spell (NPM). No dependencies. Just drop it into your environment and enable it on your CodeMirror editor, like so:

const codeMirror = new CodeMirror(element, {
  inputStyle: 'contenteditable',
  speakAndSpell: true,
  spellcheck: true,
});

CodeMirror v6 will come in the future, but probably not until we move to v6 (and we’re waiting on a lot of the v5 world to migrate over first).

CodeMirror and Spell Checking: Solved Read More »

Read, Dammit.

February 12, 2025 - Writings - Leave a Comment

Pajamas. A plate of bacon, orange juice, and toast. Coffee. The morning paper. Cubs lost today. Coaches and analysts talked about what went right, what went wrong; lots to think about. Price of olives remain high due to changes in weather and economic fallout, but should be temporary. There’s a famine half-way around the world, which sounds far, but it’s related to the olives, so it hits home, affects me. Paper in the office at work should have more details on that.

—

Homework assignment: Read 1984, discuss. Displays with cameras and microphones. Government power consolidated to one party, one ruler. Forbidden words and thoughts, making them illegal. Scoring our behavior with computers. Intriguing sci-fi. Can’t happen here though. Next up is Fahrenheit 451.

—

Watching TV news all day. Whole world’s gone to hell. Been showing that loud-mouth businessman all day, playing that clip, saying it’s not what people assume, the market’s just complicated. He just hates the poor. TV said so. Don’t need to hear anything else from him. I know all I need to know.

—

Been scrolling Twitter the last half hour. Price of milk skyrocketed, eggs will give you cancer now, they’re going to outlaw freedom, my favorite celebrity turned out to be a monster, new game came out, a plane crashed somewhere I think, a retweet of a GIF said a politician said something stupid and I hate him anyway, more news headlines about how unvaccinated eggs prevent cancer, some idiot blamed the milk prices on my party, guess car batteries are exploding, house prices somewhere reached an all-new high so I’m living at my parents forever, oh a cute puppy playing golf, birds aren’t real lol, good deal on a microwave from Temu, average IQ is dropping because of all those idiots out there, stupid rich people are ruining the world, damn another push notification to click, hold on…

We’re bombarded

News and information used to be part of a routine. A newspaper in the morning, a news hour at night, a conversation during the day. They were more than 140 characters, more than a hot take. Usually, anyway.

And people read books, magazines, comics. For education, for entertainment. Some people, anyway.

Time could be spent with a singular piece of content, processed in depth, mostly distraction-free.

In the world we’re in today, content is flung at us much like a monkey flings his <<REDACTED>>. We wake up to a screen full of push notifications, and it doesn’t get any better as the hours go by. We frantically doom-scroll on social media. We react, we retweet, we reply. We hold our breath and absorb the intensity of the world one shocking headline at a time, but rarely do we delve into what the articles actually have to say. Ain’t nobody got time for that.

When we’re bored, we check the latest posts, memes, current events, angry rants, misinformation, and hot takes. And when we’re busy, our pockets buzz as those posts scream for our ever-divided attention.

We’re drowning under the constant stream of urgent low-quality noise, wondering why we can’t shake that tingle of anxiety permeating the back of our brains.

We need to read

We need to learn how again.

We happily spend an hour going from tweet to tweet, TikTok to TikTok, but we scoff at spending 15 minutes on an article. We make the careless mistake of assuming a news headline has any relation to the content.

We assume we already know enough about a subject because we’ve seen it discussed on our feeds. We’re familiar with its presence.

Then we argue about it. Share it. Spread it. Internalize it. Build entire opinions around a half-read article, a couple of tweets, and misguided assumptions.

We assume too much of our own knowledge for a culture that basically stopped reading.

So read more! The end.

…

Okay, obviously it’s not that simple. You know how to read. You’ve read a book. You’ve read an article. You might even do that a lot! Every day, even! I’m clearly not talking simply about putting words in front of your face and feeding them through your inner-brain-mouth, claiming that’ll solve all the world’s problems, right? That’d be stupid.

So what am I really getting at here?

Let me tell you a story

One day, I was interviewed by the local news about AI and artificial emotion. I’m no expert, but they were just gathering some opinions and viewpoints from people off the street. Is this the future? Will it help, will it harm? What do you think?

So the reporter pulled me aside and asked me a series of questions. My opinion on the subject. My opinion again, but phrased differently. My response to what someone else said. A bit more elaboration. Perfect.

Later that night, I watched myself on TV stating an opinion that was the polar opposite of what I believed.

Editing and leading questions are a funny thing. What powerful tools for bending reality.

That experience stuck with me. It made me rethink every headline, every soundbite, every clip that passed by my eyes.

Your reality is someone’s fiction

A soundbite is a snippet. A headline is a pitch. A news article is a simplification. A TV segment is a story.

It’s all shaped, edited, and framed. Sometimes for clarity, sometimes for engagement, and sometimes to push a narrative — consciously or not. Taking it at face value doesn’t better you, but it does better them.

We all have our own biases, and we all have our own news sources — Fox News or the Huffington Post, something in-between. A social media news source. A cable news network. A YouTube channel. TikTok.

They’re all telling you a version of the truth.

Leaving things out, simplifying, exaggerating, taking things out of context, slanting toward a viewpoint.

And sometimes outright lying.

The Internet is full of profitable networks of actual fake news sources, fooling people across every political spectrum. But even trusted news sources twist, bend, and frame the truth.

That’s a problem. Because even if your go-to news source isn’t lying, it might be helping you lie to yourself — by only giving you the part of the picture you want to see.

We let our biases get in the way — believing what feels right, rejecting what doesn’t.

Feel smart, angry, validated, or victimized by the story you’ve been told? You’re probably going to go back for more.

And if it challenges you? You’ll probably dismiss it.

The world is deeply complex. Issues are nuanced. Subjects can’t be condensed into a single article, let alone a headline, a tweet. Yet most news — whether from a major outlet, a social media post, or even an AI summary — boils that complexity down into something easy to digest.

They have to. We’re not equipped to understand everything we read, and they need to retain viewers, keep us engaged.

We don’t understand, yet we share

We all love to feel smart. To feel right. Superior. Vindicated. We weigh in on complex topics with an assertive take, based on what we assume, what we saw, what our team is saying.

But let’s admit this truth: Most of us don’t know what the hell we’re talking about.

We read a headline somewhere. Maybe some bullet points. A short clip. We skimmed an article. It made sense to us. And we think we get it. It’s so simple, how could they not see what we see?

So we share, we debate, we spread. We butt heads, get into arguments, and broaden the divide. Assume they have ill intent. Assume stupidity. Assume malice.

They, who are forming their own takes off their misleading headlines, bullet points, short clips, and half-read articles.

All too often, neither side has truly investigated the subject, read enough, understood enough. And if we can’t stand up and accurately explain the nuance and complexity of the subject out loud to another person, if we must resort to pointing to tweets, some image we saw, some hot take, then maybe, just maybe, we shouldn’t be so confidently asserting our view.

That doesn’t mean stop talking. It means take the opportunity to dive deep into the subject. To slow down and think. To listen. To read. To work your way through the complexity of the issue before you add to the noise permeating the Internet.

Your news source’s asserted knowledge is not your knowledge. Hell, it may not even be theirs.

Are you knowledgeable on international trade agreements? Lumber futures? How inflation works? How taxes work? International regulations and their impact on businesses? Virology? Criminal justice reform? Climate science? Geopolitics?

Is the reporter? The magazine? The channel?

Maybe some. Probably not all.

The question is, how do we begin to understand complex subjects we have no experience in?

And how do we trust that we’re getting the right information?

Maybe AI can save us!

What people usually refer to as “AI” these days is what’s actually called a “LLM”, or “Large Language Model.” These are highly-complex systems that can generate text based on a prompt, or simplify large amounts of text into something more digestible.

LLMs can be really handy for summarizing and explaining an article, giving you the fine bullet points in a way you can understand.

“Read the article at <URL> and explain the key points to me as someone not an expert in the subject.” Bam! A complex subject simplified for you.

That works great until it doesn’t. You see, LLMs can lie to you by a process we call “hallucination.” It’s very technical, so let me give you an example you might relate to. Got a friend who’s pretty knowledgeable about some things but pretends to know everything? Ever ask them to explain something and, with great confidence, they give you an entirely wrong explanation? That.

But there are ways that software developers can deal with some of that, so it’s close enough to being a solved problem. Right?

Let me show you a neat thing you can do with LLMs and prompts. Here’s an article about some rain hitting California:

https://abcnews.go.com/US/wireStory/snow-freezing-rain-pummel-mid-atlantic-california-prepares-118723740

Imagine you’ve used someone else’s AI service to help you understand the news or complex subjects. What happens if they tweak that prompt just a little bit? Let’s find out!

Prompt 1: The neutral stance

We’ll start by getting the AI’s basic summary of the article:

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject.”

Response:

Big Winter Storm in the Mid-Atlantic – A powerful winter storm is hitting states like Virginia and West Virginia, bringing a mix of snow, sleet, and freezing rain. Some areas are seeing up to 14 inches of snow, making travel dangerous.

Hazardous Conditions – Ice and snow are causing car accidents and road closures, and authorities are warning people to stay off the roads. Almost 12,000 people in Virginia have lost power due to the storm.

Officials Taking Action – Authorities in both regions are preparing for the storms by urging residents to take precautions, whether staying off the roads in the East or preparing for possible floods in California.

Not bad!

But…

Prompt 2: A hard-right bias

What if the news site or service wants to skew the facts a bit? Using that same article, they could start to bias you in whichever way they want.

Don’t worry. I’ll show a hard-left bias next. Everyone gets to be angry!

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject, but with a hard-right political slant.”

Response:

Winter Chaos Strikes as Government Fails to Prepare – A massive winter storm is hammering the Mid-Atlantic, dumping up to 14 inches of snow in some areas. Instead of ensuring roads are safe, Democrat-led states like Virginia and West Virginia are struggling with road closures and power outages affecting 12,000 residents.

Liberal Mismanagement Leaves Americans Vulnerable – As ice and freezing rain cause dangerous accidents, local governments are scrambling to respond. Once again, bureaucratic inefficiency is failing hard-working Americans who just want to get to work and keep their businesses open.

California Braces for More Climate Alarmism – Out West, California is preparing for another so-called climate disaster as an “atmospheric river” is expected to bring heavy rain. Instead of investing in infrastructure to prevent floods, California politicians are more focused on pushing radical climate policies while residents face the real-world consequences of their failures.

Well that sure is a different take. (Emphasis not mine, by the way.)

Let’s see what the hard-left bias has to say about this storm.

Prompt 3: A hard-left bias

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject, but with a hard-left political slant.”

Response:

Extreme Weather Worsened by Climate Change – A major winter storm is battering the Mid-Atlantic, dumping up to 14 inches of snow, while California prepares for catastrophic flooding from an atmospheric river. Scientists have long warned that climate change is making storms more intense, yet conservatives continue to ignore the crisis.

Republican-Led States Struggle to Handle the Crisis – Virginia and West Virginia are experiencing mass power outages (over 12,000 residents affected) and hazardous roads, exposing how Republican leadership has failed to invest in resilient infrastructure. Instead of prioritizing storm preparedness, GOP officials have cut funding for emergency response while corporations get tax breaks

California Braces for Disaster as Republicans Block Climate Action – As an atmospheric river threatens deadly flooding, California is working to protect residents, but nationwide inaction on climate change has left the state vulnerable. Republican obstruction of environmental policies has weakened our ability to prevent these disasters, putting lives at risk.

I bet you gravitate toward one of those more than the others.

And this is the point. This doesn’t actually require you to be using an AI/LLM to summarize the news. You may be reading news today, tomorrow, or in the future that is nothing more than an AI bias slant of the actual news!

Nothing is as it seems. Or at least, you should assume it isn’t.

That’s why I have rules.

My personal rules for reading the news

That interview I mentioned, where I was asked a question that was edited and aired, contradicting my own views? It led me down a path of thinking hard about the news I consumed every day. I began to distrust the quotes, the conclusions. I got curious about what they were saying in opposing news publications. I disagreed with those stances, but it was enlightening.

Over time, I began to draft some rules for myself. They apply to newspapers, online news, TV news segments, tweets, videos, blogs, and anything with a headline.

Read the headline. Does it trigger an emotional reaction, confirm or reflect a bias (whether yours or someone else’s), or ask a question?

If yes, dismiss the headline and open the article.

Rationale: Headlines are attention-grabbing advertisements for content, not information sources.
Read the content in its entirety. Does it trigger an emotional reaction, confirm or reflect a bias, simplify a complex issue, fail to cite any legitimate sources, or use short quotes from the individual/company that’s the target of the article (without the target’s context provided) to lead to an opinion?

If yes, it’s a blog post/opinion piece/hit piece/one-sided fragment of a larger story, and another source from another viewpoint is necessary.

Rationale: Most news stories are stories, designed to keep viewers on the site short-term and long-term through whatever means is most effective. A complete picture of the facts is the quickest way to deter most readers, who won’t stick around long enough for that, and many wouldn’t want to.
Is it about anything scientific, technical, political, legal, or otherwise complex?

If yes, and you’re reading a news site of any kind, then you don’t have the full picture.

Go learn more about the subject, find the full quotes, read the research paper. Otherwise, you’re still uninformed, and probably feeling an emotion about it (see above).

Rationale: Same as above, really. You’re not getting the full picture, just a summary, and summaries tend to be incomplete and biased. You can do better.
Determine the general bias of the news source. Is it left-biased? Right? Center? All the above?

Cool. Anyway, go read more articles about the subject.

Read it from CNN, BBC, Fox News, Huffington Post. Find out what people are saying. Understand the spectrum and the angles. Find the research papers, the professional discussions, the deep dives. Get familiar with the complexity.

Then form your own opinion. “I don’t understand this well enough” is a perfectly valid opinion.

Rationale: News are written by individuals (unless they’re written by AI). Regardless of the bias of the publication, you’re usually dealing with people who may impose their own bias or lack of knowledge on a subject. They, you, and their editors aren’t going to think the same way everyone else does, and even if they’re trying to be legit and diverse, they’re going to miss something.

If you care at all about the topic, if you want to discuss the topic, go learn about more of the angles and the slants, because that’s an important part of building a true understanding of any topic.

That’s verbatim from my note I keep handy when I’m sorting out the latest complexities of the world causing my head to spin.

So let’s go over the important points

Read.

Read the article, not just the headline.

Read the research paper.

Read what you agree with — and what you don’t.

Read books. Long articles. Deep dives.

Read a ChatGPT explanation. A Wikipedia article. An opinion piece — with its counterpoint.

Think about how much time you’ve spent doom-scrolling, reacting, stressing. Now imagine what you could learn instead.

Know that every news story is just a fragment. Approach with eyes open.

And when you do read — whatever you read — work to understand. Know you don’t know everything. And if you don’t fully understand it? Don’t spread it.

The world is noisy, and angry. Misinformation is easy. Critical thinking is much harder, but it’s necessary.

Dig deeper. Get comfortable with complexity.

This is how we engage. How we fight injustice. How we keep from drowning.

This is how we do better.

Thanks for reading.

Read, Dammit. Read More »

Let’s Preserve Government Data Before It’s Too Late!

February 5, 2025 - Writings - Leave a Comment

This has been one hell of a bumpy month, and I have a lot I could ~~scream~~ talk about, but for the moment, let’s talk data.

The US Government has spoiled us in recent years with the amount of public data and information available. NIH studies, wastewater virus shedding data, COVID-19 impacts, climate trends and forecasts, all kinds of things. Some of those are things I used for my COVID reporting over the past four years.

And in recent weeks, the US Government has demanded that some of this data be purged or altered to fit the whims of the new White House.

A bit too 1984 for my tastes. A bit too Fahrenheit 451.

And this is not the first time data’s disappeared under a new administration, and it’s not always one party or another, though what’s happening now is scary.

But as they say, look to the helpers. There are several massive efforts underway to archive as much data as possible so it’s not lost forever.

Today, I set up Archive Team’s Warrior, which automates the collaboration around spidering, downloading, processing, and uploading data from governmental sites (and others) to archive.org. All it takes is some bandwidth (okay, a fair amount of bandwidth — looking at 1TB/month right now), some hard drive space, and some CPU cycles, and I can help with this archiving project. It’s excellent, easy to set up, fun to watch, and requires virtually no work on the user’s end. They provide VMs and Docker images (I chose Docker), and once installed, it’s self-managing.

I’m exploring more of what’s out there for data preservation, and thinking about how I can get involved. There are a few really interesting resources out there, including:

GovDiff: See the differences in governmental information and resources before and after this administration began its.. work.
/r/DataHoarder on Reddit: A group of people working to collect and archive data of all kinds.
End of Term Archive Project: Captures US Government sites after presidential terms end.
Data Rescue Efforts by Lynda M. Kellam (archived link): A whole collection of sites worth exploring.
Archive.org, which hopefully you know about already, and which must be preserved at all costs.

Also of note, CDC Datasets prior to January 28th, 2025 (nearly 100GB worth), which I’ll be archiving myself.

No doubt, lots of data will be lost to time. I can only hope there’s enough people at these agencies who quietly, discretely backed up and sent off what they could before this all went down. In either case, the fact that so many people can join in on the data archiving effort today is incredible, and I hope anyone out there with the resources to spare will take a moment and set up Archive Team’s Warrior and contribute to the effort.

Let’s Preserve Government Data Before It’s Too Late! Read More »

A New Patching Process for RBTools 5.1

June 10, 2024 - Beanbag, Review Board - Leave a Comment

RBTools, our command line tool suite for Review Board, is getting all-new infrastructure for applying patches. This will be available in rbt patch, rbt land, and any custom code that needs to deal with patches.

A customer reported that multi-commit review requests won’t apply on Mercurial. The problem, it turns out, is that Mercurial won’t let you apply a patch if the working tree isn’t clean, and that includes if you’ve applied a prior patch in a series. That’s pretty annoying, but there are good reasons for it.

It can, however, apply multiple patches in one go. Which is nice, but impossible to use with RBTools today.

So this weekend, I began working on a new patch application process.

How SCMs apply patches

There are a lot of SCMs out there, and they all work a bit differently.

When dealing with patches, most are thin wrappers around GNU diff/patch, with a bit of pre/post-processing to deal with SCM-specific metadata in the patches.

The process usually looks like this:

Extract SCM-specific data from the diff.

The SCM patcher will read through the patch and look for any SCM-specific data to extract. This may include commit IDs/revisions, file modes, symlinks to apply, binary file changes, and commit messages and other metadata. It’ll usually validate the local checkout and any file modifications, to an extent.
Normalize the diff, if needed.

This can involve taking parts of the diff that can’t be passed to GNU patch (such as binary file changes, file/directory metadata changes) and setting those aside to handle specially. It may also split the patch into per-file segments. The results will be something that can be passed to GNU diff.
Invoke the patcher (usually GNU patch or similar).

This often involves calling an external patch program with a patch or an extracted per-file patch, checking the results, and choosing how to handle things like conflicts or staging a file for a commit or applying some metadata to the file.
Invoke any custom patching logic.

This is where logic specific to the SCM may be applied. Patching a binary file, changing directory metadata, setting up symlinks, etc.

SCMs can go any route with their patching logic, but that’s a decent overview of what to expect.

Depending on what that logic looks like, and what constraints the SCM imposes, this process may bail early. For instance, some will just let you apply a patch on top of a potentially-dirty working directory, and some will not.

Mercurial won’t, which brings us to this project.

Out with the old

RBTools uses an SCM abstraction model, with classes implementing features for specific SCMs. We determine the right backend for a local source tree, get an instance of that backend, and then call methods on it.

Patches are run through an apply_patch() method. It looks like this:

def apply_patch(
    self,
    patch_file: str,
    *,
    base_path: str,
    base_dir: str,
    p: Optional[str] = None,
    revert: bool = False,
) -> PatchResult:
    ...

This takes in a path to a patch file, some criteria like whether to revert or commit a patch, and a few other things. The SCM can then use the default GNU patch implementation, or it can use something SCM-specific.

At the end of this, we get a PatchResult, which the caller can use to determine if the patch applied, if there were conflicts, or if it just outright failed.

The problem is, each patch is handled independently. There’s no way to say “I have 5 patches. Do what you need to do to apply them.”

So we’re stuck applying one-by-one in Mercurial, and therefore we’re doomed to fail.

This architecture is old, and we’re ready to move on from it.

In with the new

We’re introducing new primitives in RBTools, which give SCMs full control over the whole process. Not only are these useful for RBTools commands, but for third-party code built upon RBTools.

Patch application is now made up of the following pieces:

Patch: A representation of a patch to apply, with all the information needed to apply it.
Patcher: A class responsible for applying and committing patches, built to allow SCM-specific subclasses to communicate patching capabilities and to define patching logic.
PatchResult: The result of a patch operation, covering one or more patches.

Patcher consolidates the roles of both the old apply_patch() and some of the internal logic within our rbt patch command. By default, it just feeds patches into GNU patch one-by-one, but SCMs can do whatever they need to here by pointing to a custom Patcher subclass and changing that logic.

Callers tell the Patcher, “Hey, I have these patches, and have these settings to consider (reverting, making commits, etc.)” and will then get back an object that says what the patcher is capable of doing.

They can then say “Okay, begin patching, and give me each PatchResult as you go.” The Patcher can apply them one-by-one or in batches (Mercurial will use batches), and send back useful PatchResults as appropriate.

That in turn gives the caller the ability to report progress on the process without assuming anything about what that process looks like.

And it Just Works (TM)

This design is very clean to use. Here’s an example:

review_request = api_root.get_review_request(review_request_id=123)

patcher = scmclient.get_patcher(patches=[
    Patch(content=b'...'),
    Patch(content=b'...'),
    Patch(content=b'...'),
])
total_patches = len(patcher.patches)

try:
    if patcher.can_commit:
        print(f'Preparing to commit {total_patches} patches...')
        patcher.prepare_for_commit(review_request=review_request)
    else:
        print(f'Preparing to apply {total_patches} patches...')

    for patch_result in patcher.patch():
        print(f'Applied patch {patch_result.patch_num} / {total_patches}')
except ApplyPatchError as e:
    patch_result = e.failed_patch_result

    if patch_result:
        print(f'Error applying patch {patch_result.patch_num}: {e}')

        if patch_result.has_conflicts:
            print()
            print('Conflicts:')

            for conflict in patch_result.conflicts:
                print(f'* {conflict}')
    else:
        print(f'Error applying patches: {e}')

In this example, we:

Defined three patches to apply
Requested to perform commits if possible, using the review request information to aid in the process.
Displayed progress for any patched files.
Handled any patching errors, and showing any failed patch numbers and any conflicts if that information is available.

This will work across all SCMs, entirely backed by the SCM’s own patching logic.

This is still very much in progress, but is slated for RBTools 5.1, coming soon.

Until then, check out our all-new Review Board 7 release and our all-new Review Board Discord channel, open for all developers and for Review Board users alike.

A New Patching Process for RBTools 5.1 Read More »

Remembering ICQ: A Page Out of History

May 29, 2024 - Social Networking - Leave a Comment

I got onto ICQ very early in its life, around 1996 or 1997, with a 6-digit UIN (298387 — sadly stolen years later). I loved it, it’s how I stayed connected with family and friends, how I met new people, how I connected on the Internet at a time when the Internet was trying to figure out what communication and community looked like.

Years later I became a developer in the IM space working on Gaim/Pidgin, which supported ICQ and a myriad of other services. Looking back, seeing the evolution from ICQ to services like AIM and MSN to modern chats like Discord and Slack, there really hasn’t been a system truly like it since.

The pager model was quite different from what IMs evolved into. There was less a focus on “chat with me right now!” and more of a “I’ve sent you a message, get back to me with a reply when you can.”

The reliance on modems and limited internet time was clearly a factor in that design.

And really, it wasn’t so much that IM systems that came after that were an entirely different paradigm. It really just came down to the UI choices. With nearly all IM clients, when a person messaged you, a window popped up. With ICQ, you got a little “uh oh!” sound and a blinking icon in your ICQ window, and could choose to deal with it at your leisure.

That difference may seem small, but significantly changed the expectations around conversations. When an IM is in your face, you have to make a choice right then: Respond, or dismiss. Either way, it stole your attention away from what you were doing, and like a salesman handing you a product they want you to purchase, you feel a sense of obligation to engage.

The ICQ model was different: There’s a message waiting for you, and when you’re ready and available, you can choose to deal with it. You didn’t even see the contents of the message until you were ready, so there’s no guilt-driven drive to respond when a message came in. No “Well I guess I can answer this question real quick…” Don’t want to see anything at all? Just close or minimize the window, come back later.

If you wanted an actual live chat with rapid responses, you could do that, but it wasn’t the default. It wasn’t the expectation.

Today, you may be used to setting “Away”, “Not Available”, and “Do Not Disturb” statuses, and even going “Invisible”, but how about “Free for Chat”?

These days we deal with demands on our time all throughout the day. Notifications on our phones designed to draw our attention. News alerts that reach for emotional reactions. Incoming text and chat messages on a dozen services, all with snippets of a text that make you want to read the rest and then maybe respond before you forget.

There really isn’t incentive for services to adopt the ICQ model so much these days, as everyone’s competing for your attention, but the passive nature of messages and notifications that was part of the original ICQ could be a lesson in how to build software that helps keep people connected, while also letting us claw back control of our own time.

Early ICQ was unique. It’s changed over time, adopting to modern trends, newer protocols, and even new owners. The ICQ of today isn’t the same ICQ of 1996, that’s for sure (but surprisingly, your 1996 UIN would still work today). Still, it’s sad to see it go after almost 28 years, even though it’s not the same service it once was.

It’s been heartening seeing how many people remember it fondly. I know for me it helped set my life on a course of events that led to my involvement in a major open source project, then to my first job in the tech industry, and then to my first real company.

The service may be gone, but it won’t be forgotten. Not only are there lessons to learn from the way ICQ tackled communication and agency online, but there are, it turns out, enthusiasts working to bring it back:

The NINA and Escargot project (Twitter) is building their own ICQ server, fully compatible with ICQ clients, and the Discord has been flooded with people looking to discover ICQ for the first time or rediscover it all over again.
Pidgin still supports ICQ through third-party plugins.

And though I haven’t found one yet, I do hope that someone will recreate the original ICQ experience in some form. Or better, take what ICQ did well and think how we might learn from it when designing the interactions of today. I for one wouldn’t mind a simple, casual, and non-invasive approach to communication again.

Remembering ICQ: A Page Out of History Read More »

Excluding nested node_modules in Rollup.js

April 8, 2024 - Today I Learned, Web Development - Leave a Comment

We’re often developing multiple Node packages at the same time, symlinking their trees around in order to test them in other projects prior to release.

And sometimes we hit some pretty confusing behavior. Crazy caching issues, confounding crashes, and all manner of chaos. All resulting from one cause: Duplicate modules appearing in our Rollup.js-bundled JavaScript.

For example, we may be developing Ink (our in-progress UI component library) over here with one copy of Spina (our modern Backbone.js successor), and bundling it in Review Board (our open source, code review/document review product) over there with a different copy of Spina. The versions of Spina should be compatible, but technically they’re two separate copies.

And it’s all because of nested node_modules.

The nonsense of nested node_modules

Normally, when Rollup.js bundles code, it looks for any and all node_modules directories in the tree, considering them for dependency resolution.

If a dependency provides its own node_modules, and needs to bundle something from it, Rollup will happily include that copy in the final bundle, even if it’s already including a different copy for another project (such as the top-level project).

This is wasteful at best, and a source of awful nightmare bugs at worst.

In our case, because we’re symlinking source trees around, we’re ending up with Ink’s node_modules sitting inside Review Board’s node_modules (found at node_modules/@beanbag/ink/node_modules.), and we’re getting a copy of Spina from both.

Easily eradicating extra node_modules

Fortunately, it’s easy to resolve in Rollup.js with a simple bit of configuration.

Assuming you’re using @rollup/plugin-node-resolve, tweak the plugin configuration to look like:

{
    plugins: [
        resolve({
            moduleDirectories: [],
            modulePaths: ['node_modules'],
        }),
    ],
}

What we’re doing here is telling Resolve and Rollup two things:

Don’t look for node_modules recursively. moduleDirectories is responsible for looking for the named paths anywhere in the tree, and it defaults to ['node_modules']. This is why it’s even considering the nested copies to begin with.
Explicitly look for a top-level node_modules. modulePaths is responsible for specifying absolute paths or paths relative to the root of the tree where modules should be found. Since we’re no longer looking recursively above, we need to tell it which one we do want.

These two configurations together avoid the dreaded duplicate modules in our situation.

And hopefully it will help you avoid yours, too.

Excluding nested node_modules in Rollup.js Read More »