Development

Tip: Use keyword-only arguments in Python dataclasses

Python dataclasses are a really nice feature for constructing classes that primarily hold or work with data. They can be a good alternative to using dictionaries, since they allow you to add methods, dynamic properties, and subclasses. They can also be a good alternative to building your own class by hand, since they don’t need a custom __init__() that reassigns attributes, and they provide methods like __eq__() out of the box.

One small tip for keeping dataclasses maintainable is to always construct them with kw_only=True, like so:

from dataclasses import dataclass


@dataclass(kw_only=True)
class MyDataClass:
    x: int
    y: str
    z: bool = True

This will construct an __init__() that looks like this:

class MyDataClass:
    def __init__(
        self,
        *,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

Instead of:

class MyDataClass:
    def __init__(
        self,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

That * in the argument list means everything that follows must be passed as a keyword argument, instead of a positional argument.
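For example (a hypothetical interactive session):

>>> MyDataClass(1, 'foo')
Traceback (most recent call last):
    ...
TypeError: MyDataClass.__init__() takes 1 positional argument but 3 were given

>>> MyDataClass(x=1, y='foo')
MyDataClass(x=1, y='foo', z=True)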

There are two reasons you probably want to do this:

  1. It allows you to reorder the fields on the dataclass without breaking callers. Positional arguments mean a caller can use MyDataClass(1, 'foo', False), and if you remove or reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
  2. It allows subclasses to add required fields. Normally, any field with a default value (like z above) will force any fields following it to also have a default. And that includes all fields defined by subclasses. Using kw_only=True gives subclasses the flexibility to decide for themselves which fields must be provided by the caller and which have a default (see the example below).
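For example, this hypothetical subclass can add a required field, even though the parent’s z has a default:

@dataclass(kw_only=True)
class MySubclass(MyDataClass):
    # Required, despite following the inherited default for z.
    # Without kw_only=True, this would fail with "non-default
    # argument 'name' follows default argument".
    name: str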

These reasons are more important for library authors than anyone. We spend a lot of time trying to ensure backwards-compatibility and forwards-extensibility in Review Board, so this is an important topic for us. And if you’re developing something reusable with dataclasses, it might be for you, too.

Update: One important point I left out is Python compatibility. This flag was introduced in Python 3.10, so if you’re supporting older versions, you won’t be able to use this just yet. If you want to optimistically enable this just on 3.10+, one approach would be:

import sys
from dataclasses import dataclass


if sys.version_info[:2] >= (3, 10):
    dataclass_kwargs = {
        'kw_only': True,
    }
else:
    dataclass_kwargs = {}

...

@dataclass(**dataclass_kwargs)
class MyDataClass:
    ...

But this won’t solve the subclassing issue, so you’d still need to ensure any subclasses use default arguments if you want to support versions prior to 3.10.
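For example, a hypothetical subclass that stays compatible with older Python versions by giving its new field a default:

@dataclass(**dataclass_kwargs)
class MySubclass(MyDataClass):
    # Must have a default on pre-3.10, since the inherited z
    # has one. On 3.10+, kw_only=True would let this be
    # required instead.
    name: str = ''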


CodeMirror and Spell Checking: Solved

[Screenshot: the CodeMirror editor with spelling issues and the browser’s spell-checking menu opened.]

For years I’ve wanted spell checking in CodeMirror. We use CodeMirror in our Review Board code review tool for all text input, in order to allow on-the-fly syntax highlighting of code, inline image display, bold/italic, code literals, etc.

(We’re using CodeMirror v5, rather than v6, due to the years’ worth of useful plugins and the custom extensions we’ve built upon it. CodeMirror v6 is a different beast. You should check it out, but we’re going to be using v5 for our examples here. Much of this can likely be leveraged for other editing components as well.)

CodeMirror is a great component for the web, and I have a ton of respect for the project, but its lack of spell checking has always been a complaint for our users.

And the reason for that problem lies mostly with the browsers and “standards.” Starting with…

ContentEditable Mode

Browsers support opting an element into what’s called Content Editable mode. This allows any element to become editable right in the browser, like a fancy <textarea>, with a lot of pretty cool capabilities:

  • Rich text editing
  • Rich copy/paste
  • Selection management
  • Integration with spell checkers, grammar checkers, AI writing assistants
  • Works as you’d expect with your device’s native input methods (virtual keyboard, speech-to-text, etc.)

Simply apply contenteditable="true" to an element, and you can begin typing away. Add spellcheck="true" and you get spell checking for free. Try it!
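Here’s a minimal sketch (the #editor element is hypothetical):

const editorEl = document.querySelector('#editor');

if (editorEl) {
    /* Make the element editable, and opt into the browser's
       native spell checking. */
    editorEl.setAttribute('contenteditable', 'true');
    editorEl.setAttribute('spellcheck', 'true');
}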

And maybe you don’t even need spellcheck="true"! The browser may just turn it on automatically. But you may need spellcheck="false" if you don’t want it on. And it might stay on anyway!

Here we reach the first of many inconsistencies. Content Editable mode is complex and not perfectly consistent across browsers (though it’s gotten better). A few things you might run into include:

  • Ranges for selection events and input events can be inconsistent across browsers and are full of edge cases (you’ll be doing a lot of “let me walk the DOM and count characters carefully to find out where this selection really starts” checks).
  • Spell checking behaves quite differently on different browsers (especially in Chrome and Safari, which might recognize a word as misspelled but won’t always show it).
  • Rich copy/paste may mess with your DOM structure in ways you don’t expect.
  • Programmatic manipulation of the text content using execCommand is deprecated with no suitable replacement (and you don’t want to mess with the DOM directly or you break Undo/Redo). It also doesn’t always play nice with input events.

CodeMirror v5 tries to let the browser do its thing and sync state back, but this doesn’t always work. Replacing misspelled words on Safari or Chrome can sometimes cause text to flip back-and-forth. Data can be lost. Cursor positions can change. It can be a mess.

So while CodeMirror will let you enable both Content Editable and Spell Checking modes, it’s at your own peril.

Which is why we never enabled it.

How do we fix this?

When CodeMirror v5 was introduced, there weren’t a lot of options. But browsers have improved since.

The secret sauce is the beforeinput event.

There are a lot of operations that can involve placing new text in a Content Editable:

  • Replacing a misspelled word
  • Using speech-to-text
  • Having AI generate content or rewrite existing content
  • Transforming text to Title Case

These will generate a beforeinput event before making the change, and an input event after making the change.

Both events provide:

  1. The type of operation:
    1. insertText for speech-to-text or newly-generated text
    2. insertReplacementText for spelling replacements, AI rewrites, and other similar operations
  2. The range of text being replaced (or where new text will be inserted)
  3. The new data (either as InputEvent.data or as one or more InputEvent.dataTransfer.items[] entries)

Thankfully, beforeinput can be canceled, which prevents the operation from going through.

This is our way in. We can treat these operations as requests that CodeMirror can fulfill, instead of changes CodeMirror must react to.

[Screenshot: a text field in CodeMirror with text highlighted and macOS’s AI Writing Tools providing a concise version of the text.]

Putting our plan into action

Here’s the general approach:

  1. Listen to beforeinput on CodeMirror’s input element (codemirror.display.input.div).
  2. Filter for the following InputEvent.inputType values: 'insertReplacementText', 'insertText'.
  3. Fetch the ranges and the new plain text data from the InputEvent.
  4. For each range:
    1. Convert each range into a start/end line number within CodeMirror, and a start/end within each line.
    2. Issue a CodeMirror.replaceRange() with the normalized ranges and the new text.

Simple in theory, but there are a few things to get right:

  1. Different browsers and different operations will report those ranges on different elements. They might be text nodes, they might be a parent element, or they might be the top-level contenteditable element. Or a combination. So we need to be very careful about our assumptions.
  2. We need to be able to calculate those line numbers and offsets. We won’t necessarily have that information up-front, and it depends on what nodes we get in the ranges.
  3. The text data can come from more than one place:
    1. An InputEvent.data attribute value
    2. One or more strings accessed asynchronously from InputEvent.dataTransfer.items[], in plain text, HTML, or potentially other forms.
  4. We may not have all of this! Even as recently as late-2024, Chrome wasn’t giving me target ranges in beforeinput, only in input, which was too late. So we’ll want to bail if anything goes wrong.

Let’s put this into practice. I’ll use TypeScript to help make some of this a bit more clear, but you can do all this in JavaScript.

Feel free to skip to the end if you don’t want to read a couple pages of TypeScript.

1. Set up our event handler

We’re going to listen to beforeinput. If it’s an event we care about, we’ll grab the target ranges, take over from the browser (by canceling the event), and then prepare to replay the operation using CodeMirror’s API.

This is going to require a bit of help figuring out what lines and columns those ranges correspond to, which we’ll tackle next.

const inputEl = codeMirror.display.input.div;

inputEl.addEventListener('beforeinput',
                         (evt: InputEvent) => {
    if (evt.inputType !== 'insertReplacementText' &&
        evt.inputType !== 'insertText') {
        /*
         * This isn't a text replacement or new text event,
         * so we'll want to let the browser handle this.
         *
         * We could just preventDefault()/stopPropagation()
         * if we really wanted to play it safe.
         */
        return;
    }

    /*
     * Grab the ranges from the event. This might be
     * empty, which would have been the case on some
     * versions of Chrome I tested with before. Play it
     * safe, bail if we can't find a range.
     *
     * Each range will have an offset in a start container
     * and an offset in an end container. These containers
     * may be text nodes or some parent node (up to and
     * including inputEl).
     */
    const ranges = evt.getTargetRanges();

    if (!ranges || ranges.length === 0) {
        /* We got empty ranges. There's nothing to do. */
        return;
    }

    const newText =
           evt.data
        ?? evt.dataTransfer?.getData('text')
        ?? null;

    if (newText === null) {
        /* We couldn't locate any text, so bail. */
        return;
    }

    /*
     * We'll take over from here. We don't want the browser
     * messing with any state and impacting CodeMirror.
     * Instead, we'll run the operations through CodeMirror.
     */
    evt.preventDefault();
    evt.stopPropagation();

    /*
     * Place the new text in CodeMirror.
     *
     * For each range, we're getting offsets CodeMirror
     * can understand and then we're placing text there.
     *
     * findOffsetsForRange() is where a lot of magic
     * happens.
     */
    for (const range of ranges) {
        const [startOffset, endOffset] =
            findOffsetsForRange(range);

        codeMirror.replaceRange(
            newText,
            startOffset,
            endOffset,
            '+input',
        );
    }
});

This is pretty easy, and applicable to more than CodeMirror. But now we’ll get into some of the nitty-gritty.

2. Map from ranges to CodeMirror positions

Most of the hard work really comes from mapping the event’s ranges to CodeMirror line numbers and columns.

We need to know the following:

  1. Where each container node is in the document, for each end of the range.
  2. What line number each corresponds to.
  3. What the character offset is within that line.

This ultimately means a lot of traversing of the DOM (we can use TreeWalker for that) and counting characters. DOM traversal is an expense we want to incur as little as possible, so if we’re working on the same nodes for both ends of the range, we’ll just calculate it once.

function findOffsetsForRange(
    range: StaticRange,
): [CodeMirror.Position, CodeMirror.Position] {
    /*
     * First, pull out the nodes and the nearest elements
     * from the ranges.
     *
     * The nodes may be text nodes, in which case we'll
     * need their parent for document traversal.
     */
    const startNode = range.startContainer;
    const endNode = range.endContainer;

    const startEl = (
        (startNode.nodeType === Node.ELEMENT_NODE)
        ? startNode as HTMLElement
        : startNode.parentElement);
    const endEl = (
        (endNode.nodeType === Node.ELEMENT_NODE)
        ? endNode as HTMLElement
        : endNode.parentElement);

    /*
     * Begin tracking the state we'll want to return or
     * use in future computations.
     *
     * In the optimal case, we'll be calculating some of
     * this only once and then reusing it.
     */
    let startLineNum = null;
    let endLineNum = null;
    let startOffsetBase = null;
    let startOffsetExtra = null;
    let endOffsetBase = null;
    let endOffsetExtra = null;

    let startCMLineEl: HTMLElement = null;
    let endCMLineEl: HTMLElement = null;

    /*
     * For both ends of the range, we'll need to first see
     * if we're at the top input element.
     *
     * If so, range offsets will be line-based rather than
     * character-based.
     *
     * Otherwise, we'll need to find the nearest line and
     * count characters until we reach our node.
     */
    if (startEl === inputEl) {
        startLineNum = range.startOffset;
    } else {
        startCMLineEl = startEl.closest('.CodeMirror-line');
        startOffsetBase = findCharOffsetForNode(startNode);
        startOffsetExtra = range.startOffset;
    }

    if (endEl === inputEl) {
        endLineNum = range.endOffset;
    } else {
        /*
         * If we can reuse the results from calculations
         * above, that'll save us some DOM traversal
         * operations. Otherwise, fall back to doing the
         * same logic we did above.
         */
        endCMLineEl =
            (range.endContainer === range.startContainer &&
             startCMLineEl !== null)
            ? startCMLineEl
            : endEl.closest('.CodeMirror-line');

        endOffsetBase =
            (startEl === endEl && startOffsetBase !== null)
            ? startOffsetBase
            : findCharOffsetForNode(endNode);
        endOffsetExtra = range.endOffset;
    }

    if (startLineNum === null || endLineNum === null) {
        /*
         * We need to find the line numbers that correspond
         * to either missing end of our range. To do this,
         * we have to walk the lines until we find both our
         * missing line numbers.
         *
         * The lines are the children of the contenteditable
         * input element.
         */
        const children = inputEl.children;

        for (let i = 0;
             (i < children.length &&
              (startLineNum === null || endLineNum === null));
             i++) {
            const child = children[i];

            if (startLineNum === null &&
                child === startCMLineEl) {
                startLineNum = i;
            }

            if (endLineNum === null &&
                child === endCMLineEl) {
                endLineNum = i;
            }
        }
    }

    /*
     * Return our results.
     *
     * We may not have set some of the offsets above,
     * depending on whether we were working off of the
     * CodeMirror input element, a text node, or another
     * parent element. And we didn't want to set them any
     * earlier, because we were checking to see what we
     * computed and what we could reuse.
     *
     * At this point, anything we didn't calculate should
     * be 0.
     */
    return [
        {
            ch: (startOffsetBase || 0) +
                (startOffsetExtra || 0),
            line: startLineNum,
        },
        {
            ch: (endOffsetBase || 0) +
                (endOffsetExtra || 0),
            line: endLineNum,
        },
    ];
}


/*
 * The above took care of our line numbers and ranges, but
 * it got some help from the next function, which is designed
 * to calculate the character offset to a node from an
 * ancestor element.
 */
function findCharOffsetForNode(
    targetNode: Node,
): number {
    const targetEl = (
        targetNode.nodeType === Node.ELEMENT_NODE)
        ? targetNode as HTMLElement
        : targetNode.parentElement;
    const startEl = targetEl.closest('.CodeMirror-line');
    let offset = 0;

    const treeWalker = document.createTreeWalker(
        startEl,
        NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
    );

    while (treeWalker.nextNode()) {
        const node = treeWalker.currentNode;

        if (node === targetNode) {
            break;
        }

        if (node.nodeType === Node.TEXT_NODE) {
            offset += (node as Text).data.length;
        }
    }

    return offset;
}

Whew! That’s a lot of work.

CodeMirror has some similar logic internally, but it’s not exposed, and not quite what we want. If you were working on making all this work with another editing component, it’s possible this would be more straightforward.

What does this all give us?

  1. Spell checking and replacements without (nearly as many) glitches in browsers
  2. Speech-to-text without CodeMirror stomping over results
  3. AI writing and rewriting, also without risk of lost data
  4. Transforming text through other means

Since we took the control away from the browser and gave it to CodeMirror, we removed most of the risk and instability.

But there are still problems. While this works great on Firefox, Chrome and Safari are a different story. Those browsers are a bit lazier when it comes to spell checking, and even once they’ve found some spelling errors, you might not see the red squigglies. Typing, clicking around, or forcing a round of spell checking might bring them back, but might not. But this is their implementation, and not the result of the CodeMirror integration.

Ideally, spell checking would become a first-class citizen on the web. And maybe this will happen someday, but for now, at least there are some workarounds to get it to play nicer with tools like CodeMirror.

We can go further

There’s so much more in InputEvent we could play with. We explored the insertReplacementText and insertText types, but there’s also:

  • insertLink
  • insertFromDrop
  • insertOrderedList
  • formatBold
  • historyUndo

And so many more.

These could be integrated deeper into CodeMirror, which may open some doors to a far more native feel on more platforms. But that’s left as an exercise to the reader (it’s pretty dependent on your CodeMirror modes and the UI you want to provide).

There are also improvements to be made, as this is not perfect yet (but it’s close!). Safari still doesn’t recognize when text is selected, leaving out the AI-assisted tools, but Chrome and Firefox work great. We’re working on the rest.

Give it a try

You can try our demo live in your favorite browser. If it doesn’t work for you, let me know what your browser and version are. I’m very curious.

We’ve released this as a new CodeMirror v5 plugin, CodeMirror Speak-and-Spell (NPM). No dependencies. Just drop it into your environment and enable it on your CodeMirror editor, like so:

const codeMirror = new CodeMirror(element, {
  inputStyle: 'contenteditable',
  speakAndSpell: true,
  spellcheck: true,
});

Support for CodeMirror v6 will come in the future, but probably not until we move to v6 ourselves (and we’re waiting on a lot of the v5 world to migrate over first).


Excluding nested node_modules in Rollup.js

We’re often developing multiple Node packages at the same time, symlinking their trees around in order to test them in other projects prior to release.

And sometimes we hit some pretty confusing behavior. Crazy caching issues, confounding crashes, and all manner of chaos. All resulting from one cause: Duplicate modules appearing in our Rollup.js-bundled JavaScript.

For example, we may be developing Ink (our in-progress UI component library) over here with one copy of Spina (our modern Backbone.js successor), and bundling it in Review Board (our open source, code review/document review product) over there with a different copy of Spina. The versions of Spina should be compatible, but technically they’re two separate copies.

And it’s all because of nested node_modules.

The nonsense of nested node_modules

Normally, when Rollup.js bundles code, it looks for any and all node_modules directories in the tree, considering them for dependency resolution.

If a dependency provides its own node_modules, and needs to bundle something from it, Rollup will happily include that copy in the final bundle, even if it’s already including a different copy for another project (such as the top-level project).

This is wasteful at best, and a source of awful nightmare bugs at worst.

In our case, because we’re symlinking source trees around, we’re ending up with Ink’s node_modules sitting inside Review Board’s node_modules (found at node_modules/@beanbag/ink/node_modules), and we’re getting a copy of Spina from both.

Easily eradicating extra node_modules

Fortunately, it’s easy to resolve in Rollup.js with a simple bit of configuration.

Assuming you’re using @rollup/plugin-node-resolve, tweak your Rollup configuration to look like:

import resolve from '@rollup/plugin-node-resolve';

export default {
    plugins: [
        resolve({
            moduleDirectories: [],
            modulePaths: ['node_modules'],
        }),
    ],
};

What we’re doing here is telling Resolve and Rollup two things:

  1. Don’t look for node_modules recursively. moduleDirectories is responsible for looking for the named paths anywhere in the tree, and it defaults to ['node_modules']. This is why it’s even considering the nested copies to begin with.
  2. Explicitly look for a top-level node_modules. modulePaths is responsible for specifying absolute paths or paths relative to the root of the tree where modules should be found. Since we’re no longer looking recursively above, we need to tell it which one we do want.

These two configurations together avoid the dreaded duplicate modules in our situation.

And hopefully it will help you avoid yours, too.


Building Multi-Platform Docker Images Using Multiple Hosts

Here’s a very quick, not exactly comprehensive tutorial on building Docker images using multiple hosts (useful for building multiple architectures).

If you’re an expert on docker buildx, you may know all of this already, but if you’re not, hopefully you find this useful.

We’ll make some assumptions in this tutorial:

  1. We want to build a single Docker image with both linux/amd64 and linux/arm64 architectures.
  2. We’ll be building the linux/arm64 image on the local machine, and linux/amd64 on a remote machine (accessible via SSH).
  3. We’ll call this builder instance “my-builder”.

We’re going to accomplish this by building a buildx builder instance for the local machine and architecture, then append a configuration for another machine. And then we’ll activate that instance.

This is easy.

Step 1: Create your builder instance for localhost and arm64

$ docker buildx create \
    --name my-builder \
    --platform linux/arm64

This will create our my-builder instance, defaulting it to using our local Docker setup for linux/arm64.

If we wanted, we could provide a comma-separated list of platforms that the local Docker should be handling (e.g., --platform linux/arm64,darwin/arm64).

(This doesn’t have to be arm64. I’m just using this as an example.)

Step 2: Add your amd64 builder

$ docker buildx create \
    --name my-builder \
    --append \
    --platform linux/amd64 \
    ssh://<user>@<remotehost>

This will update our my-builder, informing it that linux/amd64 builds are supported and must go through the Docker service over SSH.

Note that we could easily add additional builders if we wanted (whether for the same architectures or others) by repeating this command and choosing new --platform values and remote hosts.
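For instance, a hypothetical third node handling 32-bit Arm builds on another host might look like:

$ docker buildx create \
    --name my-builder \
    --append \
    --platform linux/arm/v7 \
    ssh://<user>@<anotherhost>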

Step 3: Verify your builder instance

Let’s take a look and make sure we have the builder setup we expect:

$ docker buildx ls
NAME/NODE       DRIVER/ENDPOINT           STATUS    BUILDKIT  PLATFORMS
my-builder *    docker-container
  my-builder0   desktop-linux             inactive            linux/arm64*
  my-builder1   ssh://myuser@example.com  inactive            linux/amd64*

Yours may look different, but it should look something like that. You’ll also see default and any other builders you’ve set up.

Step 4: Activate your builder instance

Now we’re ready to use it:

$ docker buildx use my-builder

Just that easy.

Step 5: Build your image

If all went well, we can now safely build our image:

$ docker buildx build --platform linux/arm64,linux/amd64 .

You should see build output for each architecture stream by.

If you want to make sure the right builder is doing the right thing, you can re-run docker buildx ls in another terminal. You should see running as the status for each, along with an inferred list of other architectures that host can now build (pretty much anything it natively supports that you didn’t explicitly configure above).

Step 6: Load your image into Docker

You probably want to test your newly-built image locally, don’t you? When you run the build, you might notice this message:

WARNING: No output specified with docker-container driver. Build
result will only remain in the build cache. To push result image
into registry use --push or to load image into docker use --load

And if you try to start it up, you might notice it’s missing (or that you’re running a pre-buildx version of your image).

What you need to do is re-run docker buildx build with --load and a single platform, like so:

$ docker buildx build --load --platform linux/arm64 .

That’ll rebuild it (it’ll likely just reuse what it built before) and then make it available in your local Docker registry.

Hope that helps!


Re-typing Parent Class Attributes in TypeScript

I was recently working on converting some code away from Backbone.js and toward Spina, our TypeScript Backbone “successor” used in Review Board, and needed to override a type from a parent class.

(I’ll talk about why we still choose to use Backbone-based code another time.)

We basically had this situation:

class BaseClass {
    summary: string | (() => string) = 'BaseClass thing doer';
    description: string | (() => string);
}

class MySubclass extends BaseClass {
    get summary(): string {
        return 'MySubclass thing doer';
    }

    // We'll just make this a standard function, for demo purposes.
    description(): string {
        return 'MySubclass does a thing!';
    }
}

TypeScript doesn’t like that so much:

Class 'BaseClass' defines instance member property 'summary', but extended class 'MySubclass' defines it as an accessor.

Class 'BaseClass' defines instance member property 'description', but extended class 'MySubclass' defines it as instance member function.

Clearly it doesn’t want me to override these members, even though one of the allowed values is a callable returning a string! Which is what we wrote, darnit!!

So what’s going on here?

How ES6 class members work

If you’re coming from another language, you might expect members defined on the class to be class members. For example, you might think you could access BaseClass.summary directly, but you’d be wrong, because these are instance members.
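For example, the BaseClass above behaves roughly like this (a sketch of the runtime semantics, not the compiler’s actual output):

class BaseClass {
    summary: string | (() => string);
    description: string | (() => string);

    constructor() {
        /*
         * Field initializers run once per instance, as if
         * they were assignments at the top of the
         * constructor. (description had no initializer, so
         * it's simply left unset.)
         */
        this.summary = 'BaseClass thing doer';
    }
}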


Peer-Programming a Buggy World with ChatGPT AI

AI has been all the rage lately, with solutions like Stable Diffusion for image generation, GPT-3 for text generation, and CoPilot for code development becoming publicly available to the masses.

That excitement ramped up this week with the release of ChatGPT, an extremely impressive chat-based AI system leveraging the best GPT has to offer.

I decided last night to take ChatGPT for a spin, to test its code-generation capabilities. And I was astonished by the experience.

Together, we built a simulation of bugs foraging for food in a 100×100 grid world, tracking essentials like hunger and life, reproducing, and dealing with hardships involving seasonal changes, natural disasters, and predators. All graphically represented.

We’re going to explore this in detail, but I want to start off by showing you what we built:

Also, you can find out more on my GitHub repository.

A Recap of my Experience

Before we dive into the collaborative sessions that resulted in a working simulation, let me share a few thoughts and tidbits about my experience:


Scratching Out AI Chicken Art with Stable Diffusion

I’ve been enjoying playing with Stable Diffusion, an AI image generator that came out this past week. It runs phenomenally on my M1 Max Macbook Pro with 64GB of RAM, taking only about 30 seconds to produce an image at standard settings.

AI image generation has been a controversial, but exciting, topic in the news as of late. I’ve been following it with interest, but thought I was still years off from being able to actually play with it on my own hardware. That all changed this week.

I’m on day two now with Stable Diffusion, having successfully installed the M1 support via a fork. And my topic to get my feet wet has been…

Chickens.

Why not.

So let’s begin our tour. I’ll provide prompts and pictures, but please note I do not have the seeds (due to a bug with seed stability in the M1 fork).


Integration and Simulation Tests in Python

One of my (many) tasks lately has been to rework unit and integration tests for Review Bot, our automated code review add-on for Review Board.

The challenge was providing a test suite that could test against real-world tools, but not require them. An ever-increasing list of compatible tools has threatened to become an ever-increasing burden on contributors. We wanted to solve that.

So here’s how we’re doing it.

First off, unit test tooling

First off, this is all Python code, which you can find on the Review Bot repository on GitHub.

We make heavy use of kgb, a package we’ve written to add function spies to Python unit tests. This goes far beyond Mock, allowing nearly any function to be spied on without having to be replaced. This module is a key component to our solution, given our codebase and our needs, but it’s an implementation detail — it isn’t a requirement for the overall approach.

Still, if you’re writing complex Python test suites, check out kgb.
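As a taste, here’s a minimal, hypothetical example of a kgb spy forcing a function’s return value (the function and values are made up):

import unittest

import kgb


def get_review_count():
    # Imagine this talks to a live server.
    return 100


class ExampleTests(kgb.SpyAgency, unittest.TestCase):
    def test_get_review_count(self):
        # Spy on the real function, forcing a return value
        # without ever swapping the function out.
        spy = self.spy_on(get_review_count,
                          op=kgb.SpyOpReturn(42))

        self.assertEqual(get_review_count(), 42)
        self.assertTrue(spy.called)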

Deciding on the test strategy

Review Bot can talk to many command line tools, which are used to perform checks and audits on code. Some are harder than others to install, or at least annoying to install.

We decided there are two types of tests we need:

  1. Integration tests — ran against real command line tools
  2. Simulation tests — ran against simulated output/results that would normally come from a command line tool

Being that our goal is to ease contribution, we have to keep in mind that we can’t err too far on that side at the expense of a reliable test suite.

We decided to make these the same tests.

The strategy, therefore, would be this:

  1. Each test would contain common logic for integration and simulation tests. A test would set up state, perform the tool run, and then check results.
  2. Integration tests would build upon this by checking dependencies and applying configuration before the test run.
  3. Simulation tests would be passed fake output or setup data needed to simulate that tool.

This would be done without any code duplication between integration or simulation tests. There would be only one test function per expectation (e.g., a successful result or the handling of an error). We don’t want to worry about tests getting out of sync.

Regression in our code? Both types of tests should catch it.

Regression or change in behavior in an integrated tool? Any fixes we apply would update or build upon the simulation.

Regression in the simulation? Something went wrong, and we caught it early without having to run the integration test.

Making this all happen

We introduced three core testing components:

  1. @integration_test() — a decorator that defines and provides dependencies and input for an integration test
  2. @simulation_test() — a decorator that defines and provides output and results for a simulation test
  3. ToolTestCaseMetaClass — a metaclass that ties it all together

Any test class that needs to run integration and simulation tests will use ToolTestCaseMetaClass and then apply either or both @integration_test/@simulation_test decorators to the necessary test functions.

When a decorator is applied, the test function is opted into that type of test. Data can be passed into the decorator, which is then passed into the parent test class’s setup_integration_test() or setup_simulation_test().

These can do whatever they need to set up that particular type of test. For example:

  • Integration test setup defaults to checking dependencies, skipping a test if not met.
  • Simulation test setup may write some files or spy on a subprocess.Popen() call to fake output.

For example:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=(
        b'MyTool 1.2.3\n'
        b'Scanning code...\n'
        b'0 errors, 0 warnings, 1 file(s) checked\n'
    ))
    def test_execute(self):
        """Testing MyTool.execute"""
        ...

When applied, ToolTestCaseMetaClass will loop through each of the test_*() functions with these decorators applied and split them up:

  • Test functions with @integration_test will be split out into a test_integration_<name>() function, with an [integration test] suffix appended to the docstring.
  • Test functions with @simulation_test will be split out into test_simulation_<name>(), with a [simulation test] suffix appended.

The above code ends up being equivalent to:

class MyTests(kgb.SpyAgency, TestCase):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    def test_integration_execute(self):
        """Testing MyTool.execute [integration test]"""
        self.setup_integration_test(exe_deps=['mytool'])
        self._test_common_execute()

    def test_simulation_execute(self):
        """Testing MyTool.execute [simulation test]"""
        self.setup_simulation_test(output=(
            b'MyTool 1.2.3\n'
            b'Scanning code...\n'
            b'0 errors, 0 warnings, 1 file(s) checked\n'
        ))
        self._test_common_execute()

    def _test_common_execute(self):
        ...

Pretty similar, but less to maintain in the end, especially as tests pile up.

And when we run it, we get something like:

Testing MyTool.execute [integration test] ... ok
Testing MyTool.execute [simulation test] ... ok

...

Or, you know, with a horrible, messy error.

Iterating on tests

It’s become really easy to maintain and run these tests.

We can now start by writing the integration test, modify the code to log any data that might be produced by the command line tool, and then fake-fail the test to see that output.

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])

        # XXX
        print(repr(payload))

        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

        # XXX Fake-fail the test
        assert False

I can run that and get the results I’ve printed:

======================================================================
ERROR: Testing MyTool.process_results [integration test]
----------------------------------------------------------------------
Traceback (most recent call last):
    ...
-------------------- >> begin captured stdout << ---------------------
{"errors": [{"code": 123, "column": 13, "filename": "test.c", "line": 1, "message": "Expected return type: int"}]}

Now that I have that, and I know it’s all working right, I can feed that output into the simulation test and clean things up:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=json.dumps({
        'errors': [
            {
                'filename': 'test.c',
                'code': 123,
                'line': 1,
                'column': 13,
                'message': 'Expected return type: int',
            },
        ],
    }).encode('utf-8'))
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])
        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

Once it’s running correctly in both tests, our job is done.

From then on, anyone working on this code can simply run the test suite and make sure their change hasn’t broken any simulation tests. If it has, and it wasn’t intentional, they’ll have a great starting point in diagnosing their issue, without having to install anything.

Anything that passes simulation tests can be considered a valid contribution. We can then test against the real tools ourselves before landing a change.

Development is made simpler, and there’s no worry about regressions.

Going forward

We’re planning to apply this same approach to both Review Board and RBTools. Both currently require contributors to install a handful of command line tools or optional Python modules to make sure they haven’t broken anything, and that’s a bottleneck.

In the future, we’re looking at making use of python-nose's attrib plugin, tagging integration and simulation tests and making it trivially easy to run just the suites you want.

We’re also considering pulling out the metaclass and decorators into a small, reusable Python package, making it easy for others to make use of this pattern.


Weird bugs: Django, timezones, and importing from eggs

Every so often you hit a bug that makes you question your sanity. The past several days have been spent chasing one of the more confusing ones I’ve seen in a long time.

Review Board 1.7 added the ability to set the server-wide timezone. During development, we found problems using SSH with a non-default timezone. This only happened when updating os.environ['TZ'] to something other than our default of UTC. We’d see the SSH process (rbssh, our wrapper for SSH communication) break due to an EOF on stdin and stdout, and then we’d see the development server reload itself.

Odd.

Since this originated with a Subversion repository, I first suspected libsvn. I spent some time going through their code to see if a timezone update would break something. Perhaps timeout logic. That didn’t turn up anything interesting, but I couldn’t rule it out.

Other candidates for suspicion were rbssh itself, paramiko (the SSH library), Django, and the trickster god Loki. We just had too many moving pieces to know for sure.

So I wrote a little script to get in-between a calling process and another process and log all communication between them. I tested this with rbssh and with plain ol’ ssh. rbssh was the only one that broke. Strange, since it wasn’t doing anything obviously wrong, and it worked with the default timezone. Unless it was Paramiko somehow…

For the heck of it, I tried copying some of rbssh’s imports into this new script. Ah-ha! It dropped its streams when importing Paramiko, same as rbssh. Interesting. Time to dig into that code.

The base paramiko module imports a couple dozen other modules, so I started by narrowing it down and reducing imports until I found the common one that breaks things. Well that turned out to be a module that imported Crypto.Random. Replacing the paramiko import in my wrapper with Crypto.Random verified that that was the culprit.

Getting closer…

I rinsed and repeated with Crypto.Random, digging through the code and seeing what could have broken. Hmm, that code’s pretty straightforward, but there are some native libraries in there. Well, all this is in a .egg file (not an extracted .egg directory), making it hard to look through, so I extracted it and replaced it with a .egg directory.

Woah! The problem went away!

I glance at the clock. 3AM. I’m not sure I can trust what I’m seeing anymore. Crypto.Random breaks rbssh, but only when installed as a .egg file and not a .egg directory. That made no sense, but I figured I’d deal with it in the morning.

My dreams that night were filled with people wearing “stdin” and “stdout” labels on their foreheads, not at all getting along.

Today, I considered just ripping out timezone support. I didn’t know what else to do. Though, since I’m apparently a bit of a masochist, I decided to look into this just a little bit more. And finally struck gold.

With my Django development server running, I opened up a separate, plain Python shell. In it, I typed “import Crypto.Random”. And suddenly saw my development server reload.

How could that happen, I wondered. I tried it again. Same result. And then… lightbulb!

Django reloads the dev server when modules change. Crypto is a self-contained .egg file with native files that must be extracted and added to the module path. Causing Django to reload. Causing it to drop the spawned rbssh process. Causing the streams to disconnect. Ah-ha. This had to be it.

One last piece of the puzzle. The timezone change.

I quickly located their autoreload code and pulled it up. Yep, it’s comparing modified timestamps. We have two processes with two different ideas of what the current timezone is (one UTC, one US/Pacific, in my case), meaning when rbssh launched and imported Crypto, we’d get a bunch of files extracted with US/Pacific-based timestamps and not UTC, triggering the autoreload.

Now that the world makes sense again, I can finally fix the problem!

All told, that was about 4 or 5 days of debugging. Certainly not the longest debugging session I’ve had, but easily one of the more confusing ones in a while. Yet in the end, it’s almost obvious.


Django Development with Djblets: Unrooting your URLs

Typically, Django sites are designed with the assumption that they’ll have a domain or subdomain to themselves. Oftentimes this is fine, but if you’re developing a web application designed for redistribution, sometimes you can’t make that assumption.

During development of Review Board, many of our users wanted the ability to install Review Board into a subdirectory of one of their domains, rather than a subdomain.

There’s a few rules that are important when making your site relocatable:

  • Always use MEDIA_URL when referring to your media directory, so that people can customize where they put their media files.
  • Don’t hard-code URLs in templates. Use the {% url %} tag or get_absolute_url() when possible.

These solve some of the issues, but don’t address relocating a Django site to a subdirectory.

Djblets fills in this gap by providing a special URL handler designed to act as a prefix for all of your project’s URLs. To make use of this, you need to modify your settings.py file as follows:

settings.py

TEMPLATE_CONTEXT_PROCESSORS = (
    ...
    'djblets.util.context_processors.siteRoot',
)

SITE_ROOT_URLCONF = 'yourproject.urls'
ROOT_URLCONF = 'djblets.util.rooturl'

SITE_ROOT = '/'

SITE_ROOT specifies where the site is located. By default, this should be “/”, but this can be changed to point to any path. For example, “/myproject/”. Note that it should always have a leading and a trailing slash.

The custom template context processor (djblets.util.context_processors.siteRoot) will make SITE_ROOT accessible to templates.

SITE_ROOT should be used in templates when you need to refer to URLs that aren’t designed to respect SITE_ROOT (such as User.get_absolute_url). Your own custom applications should always respect SITE_ROOT whenever providing a URL.
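For example, a template could build such a URL with SITE_ROOT as the prefix (the path here is hypothetical):

<a href="{{ SITE_ROOT }}users/{{ user.username }}/">{{ user.username }}</a>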

ROOT_URLCONF is typically what you would set to point to your project’s URL. However, in this case, you’ll be pointing it to djblets.util.rooturl. This in turn will forward all URLs to your project’s handler, defined in SITE_ROOT_URLCONF.

This is all you need to have a fully relocatable Django site!

To sum up:

  1. Add djblets.util.context_processors.siteRoot to your TEMPLATE_CONTEXT_PROCESSORS.
  2. Set SITE_ROOT_URLCONF to your project’s URL handler.
  3. Set ROOT_URLCONF to 'djblets.util.rooturl'.
  4. Prefix any URLs with SITE_ROOT in your templates, unless the URL would already take SITE_ROOT into account.

This is functionality that will hopefully make its way into Django at some point. For now, you have an easy way of unrooting your Django project.

