A New Patching Process for RBTools 5.1

RBTools, our command line tool suite for Review Board, is getting all-new infrastructure for applying patches. This will be available in rbt patch, rbt land, and any custom code that needs to deal with patches.

A customer reported that multi-commit review requests won’t apply on Mercurial. The problem, it turns out, is that Mercurial won’t let you apply a patch if the working tree isn’t clean, and that includes if you’ve applied a prior patch in a series. That’s pretty annoying, but there are good reasons for it.

It can, however, apply multiple patches in one go. Which is nice, but impossible to use with RBTools today.

So this weekend, I began working on a new patch application process.

How SCMs apply patches

There are a lot of SCMs out there, and they all work a bit differently.

When dealing with patches, most are thin wrappers around GNU diff/patch, with a bit of pre/post-processing to deal with SCM-specific metadata in the patches.

The process usually looks like this:

  1. Extract SCM-specific data from the diff.

    The SCM patcher will read through the patch and look for any SCM-specific data to extract. This may include commit IDs/revisions, file modes, symlinks to apply, binary file changes, and commit messages and other metadata. It’ll usually validate the local checkout and any file modifications, to an extent.
  2. Normalize the diff, if needed.

    This can involve taking parts of the diff that can’t be passed to GNU patch (such as binary file changes, file/directory metadata changes) and setting those aside to handle specially. It may also split the patch into per-file segments. The results will be something that can be passed to GNU diff.
  3. Invoke the patcher (usually GNU patch or similar).

    This often involves calling an external patch program with a patch or an extracted per-file patch, checking the results, and choosing how to handle things like conflicts or staging a file for a commit or applying some metadata to the file.
  4. Invoke any custom patching logic.

    This is where logic specific to the SCM may be applied. Patching a binary file, changing directory metadata, setting up symlinks, etc.

SCMs can go any route with their patching logic, but that’s a decent overview of what to expect.

Depending on what that logic looks like, and what constraints the SCM imposes, this process may bail early. For instance, some will just let you apply a patch on top of a potentially-dirty working directory, and some will not.

Mercurial won’t, which brings us to this project.

Out with the old

RBTools uses an SCM abstraction model, with classes implementing features for specific SCMs. We determine the right backend for a local source tree, get an instance of that backend, and then call methods on it.

Patches are run through an apply_patch() method. It looks like this:

def apply_patch(
    self,
    patch_file: str,
    *,
    base_path: str,
    base_dir: str,
    p: Optional[str] = None,
    revert: bool = False,
) -> PatchResult:
    ...

This takes in a path to a patch file, some criteria like whether to revert or commit a patch, and a few other things. The SCM can then use the default GNU patch implementation, or it can use something SCM-specific.

At the end of this, we get a PatchResult, which the caller can use to determine if the patch applied, if there were conflicts, or if it just outright failed.

The problem is, each patch is handled independently. There’s no way to say “I have 5 patches. Do what you need to do to apply them.”

So we’re stuck applying one-by-one in Mercurial, and therefore we’re doomed to fail.

This architecture is old, and we’re ready to move on from it.

In with the new

We’re introducing new primitives in RBTools, which give SCMs full control over the whole process. Not only are these useful for RBTools commands, but for third-party code built upon RBTools.

Patch application is now made up of the following pieces:

  • Patch: A representation of a patch to apply, with all the information needed to apply it.
  • Patcher: A class responsible for applying and committing patches, built to allow SCM-specific subclasses to communicate patching capabilities and to define patching logic.
  • PatchResult: The result of a patch operation, covering one or more patches.

Patcher consolidates the roles of both the old apply_patch() and some of the internal logic within our rbt patch command. By default, it just feeds patches into GNU patch one-by-one, but SCMs can do whatever they need to here by pointing to a custom Patcher subclass and changing that logic.

Callers tell the Patcher, “Hey, I have these patches, and have these settings to consider (reverting, making commits, etc.)” and will then get back an object that says what the patcher is capable of doing.

They can then say “Okay, begin patching, and give me each PatchResult as you go.” The Patcher can apply them one-by-one or in batches (Mercurial will use batches), and send back useful PatchResults as appropriate.

That in turn gives the caller the ability to report progress on the process without assuming anything about what that process looks like.

And it Just Works (TM)

This design is very clean to use. Here’s an example:

review_request = api_root.get_review_request(review_request_id=123)

patcher = scmclient.get_patcher(patches=[
    Patch(content=b'...'),
    Patch(content=b'...'),
    Patch(content=b'...'),
])
total_patches = len(patcher.patches)

try:
    if patcher.can_commit:
        print(f'Preparing to commit {total_patches} patches...')
        patcher.prepare_for_commit(review_request=review_request)
    else:
        print(f'Preparing to apply {total_patches} patches...')

    for patch_result in patcher.patch():
        print(f'Applied patch {patch_result.patch_num} / {total_patches}')
except ApplyPatchError as e:
    patch_result = e.failed_patch_result

    if patch_result:
        print(f'Error applying patch {patch_result.patch_num}: {e}')

        if patch_result.has_conflicts:
            print()
            print('Conflicts:')

            for conflict in patch_result.conflicts:
                print(f'* {conflict}')
    else:
        print(f'Error applying patches: {e}')

In this example, we:

  1. Defined three patches to apply
  2. Requested to perform commits if possible, using the review request information to aid in the process.
  3. Displayed progress for any patched files.
  4. Handled any patching errors, and showing any failed patch numbers and any conflicts if that information is available.

This will work across all SCMs, entirely backed by the SCM’s own patching logic.

This is still very much in progress, but is slated for RBTools 5.1, coming soon.

Until then, check out our all-new Review Board 7 release and our all-new Review Board Discord channel, open for all developers and for Review Board users alike.