Python offers a handy module called pprint, which provides helpers for formatting and pretty-printing data. If you haven’t used it, you should definitely explore it!
Most people reach for this module for pprint.pprint() (or its close cousin pprint.pp(), which differs only in not sorting dictionary keys by default) or pprint.pformat(), but one under-appreciated function is pprint.saferepr(). The documentation describes it as returning a string representation of an object that is protected against recursion in some common data structures, and demonstrates it like this:
>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"
But actually, repr() handles this specific case just fine:
>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"
Pretty minimal difference there. Is there another reason we might care about saferepr()?
Dictionary string comparisons!
saferepr() first sets up the pprint pretty-printing machinery, which means it takes care of things like sorting dictionary keys.
This is great! It’s really handy when writing unit tests that need to compare, say, logged data.
See, since Python 3.7, dictionaries are guaranteed to preserve the insertion order of their keys. That can be nice, but it isn’t always great for comparison purposes: two dictionaries with the same contents compare equal, yet their repr() strings can differ. If you’re writing a unit test, you probably want to feel confident about the string you’re comparing against.
Let’s take a look at repr() vs. saferepr() with a basic dictionary.
>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"
A nice, stable order. Now we don’t have to worry about a unit test breaking across Python versions or implementations, or when code changes the order in which keys are inserted.
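To make the unit-testing angle concrete, here is a minimal sketch. The log_order() function, the orders logger name, and the test class are hypothetical stand-ins for your own code; unittest’s assertLogs() and pprint.saferepr() are the real standard-library pieces. Because saferepr() sorts the keys, the expected log line doesn’t depend on how the dictionary was built.

```python
import logging
import unittest
from pprint import saferepr

# Hypothetical logger and function under test, for illustration only.
logger = logging.getLogger('orders')


def log_order(order):
    # saferepr() sorts dict keys, so the logged text does not depend on
    # the order in which the caller inserted them.
    logger.info('Processing order: %s', saferepr(order))


class LogOrderTests(unittest.TestCase):
    def test_logged_order_is_stable(self):
        expected = ["INFO:orders:Processing order: {'item': 'spam', 'qty': 3}"]
        # Same contents, built with two different insertion orders.
        for order in ({'qty': 3, 'item': 'spam'}, {'item': 'spam', 'qty': 3}):
            with self.assertLogs('orders', level='INFO') as cm:
                log_order(order)
            self.assertEqual(cm.output, expected)
```

Saved as, say, test_orders.py, it runs with python -m unittest test_orders. With repr() in place of saferepr(), the first order would be logged as {'qty': 3, 'item': 'spam'} and the assertion would fail.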
saferepr() also drops some of pprint()’s typical output limits. There’s no max depth for nested data. There’s no max line width. You just get a nice, normalized string of data for comparison purposes.
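As a quick illustration, pformat() accepts depth and width arguments (width defaults to 80 columns), while saferepr() takes neither. The sample data below is arbitrary.

```python
import pprint

# Arbitrary nested data, three levels deep.
nested = {'a': {'b': {'c': 1}}}

# pformat() can elide nesting beyond a given depth...
print(pprint.pformat(nested, depth=1))  # {'a': {...}}
# ...but saferepr() always renders the whole structure.
print(pprint.saferepr(nested))          # {'a': {'b': {'c': 1}}}

# Arbitrary data whose repr is longer than 80 characters.
wide = {f'key{i}': list(range(5)) for i in range(5)}

# pformat() wraps long output at its default width of 80...
print('\n' in pprint.pformat(wide))     # True
# ...while saferepr() always returns a single line.
print('\n' in pprint.saferepr(wide))    # False
```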
But not for sets…
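Okay, it’s not perfect. Sets still use their standard representation, which follows the set’s iteration order, and this might be surprising given how dictionaries are handled. For sets of strings, that order can even change between interpreter runs, because string hashing is randomized by default.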
Interestingly, pprint() and pformat() will sort a set’s elements if they need to break it across multiple lines, but otherwise, nope.
For example:
>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}
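Because the iteration order of a set of strings depends on hash randomization, the exact output above will differ from run to run. The sketch below checks the same behavior in a way that should hold for any hash seed: saferepr() matches repr(), and a set too wide for one line comes out of pformat() sorted. The choice of 30 letters is arbitrary; it just makes the repr longer than 80 characters.

```python
import pprint
import string

# 30 distinct letters; the repr of this set is well over 80 characters.
letters = set(string.ascii_letters[:30])

# saferepr() has no special handling for sets; it falls back to repr().
print(pprint.saferepr(letters) == repr(letters))  # True

# pformat() has to wrap this set, so it sorts the elements first and
# puts one element per line.
wrapped = pprint.pformat(letters)
elements = [line.strip(" {},'") for line in wrapped.splitlines()]
print(elements == sorted(letters))  # True
```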
Ah well, can’t have everything we want.
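If you do need stable output for sets, one workaround is to normalize the data before calling saferepr(). The normalized_repr() helper below is hypothetical, not part of pprint: it replaces sets and frozensets with sorted lists (falling back to sorting by repr() when the elements can’t be compared) and recurses into dicts, lists, and tuples. The trade-offs are that a set and a list with the same elements render identically, and that, unlike saferepr(), this sketch doesn’t guard against recursive structures.

```python
from pprint import saferepr


def normalized_repr(obj):
    """Hypothetical helper: like saferepr(), but set elements are sorted too.

    Sets and frozensets become sorted lists; dicts, lists, and tuples are
    walked recursively. Unlike saferepr(), this does not detect recursion.
    """
    def normalize(value):
        if isinstance(value, (set, frozenset)):
            items = [normalize(v) for v in value]
            try:
                return sorted(items)
            except TypeError:
                # Mixed, unorderable types: fall back to sorting by repr().
                return sorted(items, key=repr)
        if isinstance(value, dict):
            # Keys stay as-is (they must remain hashable); saferepr() sorts
            # them. Dict subclasses come out as plain dicts.
            return {k: normalize(v) for k, v in value.items()}
        if type(value) is list:
            return [normalize(v) for v in value]
        if type(value) is tuple:
            return tuple(normalize(v) for v in value)
        return value

    return saferepr(normalize(obj))


print(normalized_repr({'tags': {'spam', 'eggs', 'ni'}, 'id': 7}))
# {'id': 7, 'tags': ['eggs', 'ni', 'spam']}
```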
Still, depending on what you’re comparing, this can be a very handy tool in your Python Standard Library toolset.