Writings

CodeMirror and Spell Checking: Solved

A screenshot of the CodeMirror editor with spelling issues and the browser's spell checking menu opened.

For years I’ve wanted spell checking in CodeMirror. We use CodeMirror in our Review Board code review tool for all text input, in order to allow on-the-fly syntax highlighting of code, inline image display, bold/italic, code literals, etc.

(We’re using CodeMirror v5, rather than v6, due to the years’ worth of useful plugins and the custom extensions we’ve built upon it. CodeMirror v6 is a different beast. You should check it out, but we’re going to be using v5 for our examples here. Much of this can likely be leveraged for other editing components as well.)

CodeMirror is a great component for the web, and I have a ton of respect for the project, but its lack of spell checking has always been a complaint for our users.

And the reason for that problem lies mostly on the browsers and “standards.” Starting with…

ContentEditable Mode

Browsers support opting an element into what’s called Content Editable mode. This allows any element to become editable right in the browser, like a fancy <textarea>, with a lot of pretty cool capabilities:

  • Rich text editing
  • Rich copy/paste
  • Selection management
  • Integration with spell checkers, grammar checkers, AI writing assistants
  • Works as you’d expect with your device’s native input methods (virtual keyboard, speech-to-text, etc.)

Simply apply contenteditable="true" to an element, and you can begin typing away. Add spellcheck="true" and you get spell checking for free. Try it!

And maybe you don’t even need spellcheck="true"! The browser may just turn it on automatically. But you may need spellcheck="false" if you don’t want it on. And it might stay on anyway!

Here we reach the first of many inconsistencies. Content Editable mode is complex and not perfectly consistent across browsers (though it’s gotten better). A few things you might run into include:

  • Ranges for selection events and input events can be inconsistent across browsers and are full of edge cases (you’ll be doing a lot of “let me walk the DOM and count characters carefully to find out where this selection really starts” checks).
  • Spell checking behaves quite differently on different browsers (especially in Chrome and Safari, which might recognize a word as misspelled but won’t always show it).
  • Rich copy/paste may mess with your DOM structure in ways you don’t expect.
  • Programmatic manipulating of the text content using execCommand is deprecated with no suitable replacement (and you don’t want to mess with the DOM directly or you break Undo/Redo). It also doesn’t always play nice with input events.

CodeMirror v5 tries to let the browser do its thing and sync state back, but this doesn’t always work. Replacing misspelled words on Safari or Chrome can sometimes cause text to flip back-and-forth. Data can be lost. Cursor positions can change. It can be a mess.

So while CodeMirror will let you enable both Content Editable and Spell Checking modes, it’s at your own peril.

Which is why we never enabled it.

How do we fix this?

When CodeMirror v5 was introduced, there weren’t a lot of options. But browsers have improved since.

The secret sauce is the beforeinput event.

There are a lot of operations that can involve placing new text in a Content Editable:

  • Replacing a misspelled word
  • Using speech-to-text
  • Having AI generate content or rewrite existing content
  • Transforming text to Title Case

These will generate a beforeinput event before making the change, and an input event after making the change.

Both events provide:

  1. The type of operation:
    1. insertText for text-to-speech or newly-generated text
    2. insertReplacementText for spelling replacements, AI rewrites, and other similar operations
  2. The range of text being replaced (or where new text will be inserted)
  3. The new data (either as InputEvent.data in the form of one or more InputEvent.dataTransferItem.items[] entries)

Thankfully, beforeinput can be canceled, which prevents the operation from going through.

This is our way in. We can treat these operations as requests that CodeMirror can fulfill, instead of changes CodeMirror must react to.

A screenshot of a text field in CodeMirror with text highlighted and macOS's AI Writing Tools providing a concise version of the text.

Putting our plan into action

Here’s the general approach:

  1. Listen to beforeinput on CodeMirror’s input element (codemirror.display.input.div).
  2. Filter for the following InputEvent.inputType values: 'insertReplacementText', 'insertText'.
  3. Fetch the ranges and the new plain text data from the InputEvent.
  4. For each range:
    1. Convert each range into a start/end line number within CodeMirror, and a start/end within each line.
    2. Issue a CodeMirror.replaceRange() with the normalized ranges and the new text.

Simple in theory, but there’s a few things to get right:

  1. Different browsers and different operations will report those ranges on different elements. They might be text nodes, they might be a parent element, or they might be the top-level contenteditable element. Or a combination. So we need to be very careful about our assumptions.
  2. We need to be able to calculate those line numbers and offsets. We won’t necessarily have that information up-front, and it depends on what nodes we get in the ranges.
  3. The text data can come from more than one place:
    1. An InputEvent.data attribute value
    2. One or more strings accessed asynchronously from InputEvent.dataTransfer.items[], in plain text, HTML, or potentially other forms.
  4. We may not have all of this! Even as recently as late-2024, Chrome wasn’t giving me target ranges in beforeinput, only in input, which was too late. So we’ll want to bail if anything goes wrong.

Let’s put this into practice. I’ll use TypeScript to help make some of this a bit more clear, but you can do all this in JavaScript.

Feel free to skip to the end, if you don’t want to read a couple pages of TypeScript.

1. Set up our event handler

We’re going to listen to beforeinput. If it’s an event we care about, we’ll grab the target ranges, take over from the browser (by canceling the event), and then prepare to replay the operation using CodeMirror’s API.

This is going to require a bit of help figuring out what lines and columns those ranges correspond to, which we’ll tackle next.

const inputEl = codeMirror.display.input.div;
	
inputEl.addEventListener('beforeinput',
                         (evt: InputEvent) => {
    if (evt.inputType !== 'insertReplacementText' &&
        evt.inputType !== 'insertText') {
        /*
         * This isn't a text replacement or new text event,
         * so we'll want to let the browser handle this.
         *
         * We could just preventDefault()/stopPropagation()
         * if we really wanted to play it safe.
         */
        return;
    }

    /*
     * Grab the ranges from the event. This might be
     * empty, which would have been the case on some
     * versions of Chrome I tested with before. Play it
     * safe, bail if we can't find a range.
     *
     * Each range will have an offset in a start container
     * and an offset in an end container. These containers
     * may be text nodes or some parent node (up to and
     * including inputEl).
     */
    const ranges = evt.getTargetRanges();

    if (!ranges || ranges.length === 0) {
        /* We got empty ranges. There's nothing to do. */
        return;
    }

    const newText =
           evt.data
        ?? evt.dataTransfer?.getData('text')
        ?? null;

	if (newText === null) {
		/* We couldn't locate any text, so bail. */
        return;
	}

    /*
     * We'll take over from here. We don't want the browser
     * messing with any state and impacting CodeMirror.
     * Instead, we'll run the operations through CodeMirror.
     */
    evt.preventDefault();
    evt.stopPropagation();

    /*
     * Place the new text in CodeMirror.
     *
     * For each range, we're getting offsets CodeMirror
     * can understand and then we're placing text there.
     *
     * findOffsetsForRange() is where a lot of magic
     * happens.
     */
    for (const range of state.ranges) {
        const [startOffset, endOffset] =
            findOffsetsForRange(range);

        codeMirror.replaceRange(
            newText,
            startOffset,
            endOffset,
            '+input',
        );
    }
});

This is pretty easy, and applicable to more than CodeMirror. But now we’ll get into some of the nitty-gritty.

2. Map from ranges to CodeMirror positions

Most of the hard work really comes from mapping the event’s ranges to CodeMirror line numbers and columns.

We need to know the following:

  1. Where each container node is in the document, for each end of the range.
  2. What line number each corresponds to.
  3. What the character offset is within that line.

This ultimately means a lot of traversing of the DOM (we can use TreeWalker for that) and counting characters. DOM traversal is an expense we want to incur as little as possible, so if we’re working on the same nodes for both end of the range, we’ll just calculate it once.

function findOffsetsForRange(
    range: StaticRange,
): [CodeMirror.Position, CodeMirror.Position] {
    /*
     * First, pull out the nodes and the nearest elements
     * from the ranges.
     *
     * The nodes may be text nodes, in which case we'll
     * need their parent for document traversal.
     */
    const startNode = range.startContainer;
    const endNode = range.endContainer;

    const startEl = (
        (startNode.nodeType === Node.ELEMENT_NODE)
        ? startNode as HTMLElement
        : startNode.parentElement);
    const endEl = (
        (endNode.nodeType === Node.ELEMENT_NODE)
        ? endNode as HTMLElement
        : endNode.parentElement);

    /*
     * Begin tracking the state we'll want to return or
     * use in future computations.
     *
     * In the optimal case, we'll be calculating some of
     * this only once and then reusing it.
     */
    let startLineNum = null;
    let endLineNum = null;
    let startOffsetBase = null;
    let startOffsetExtra = null;
    let endOffsetBase = null;
    let endOffsetExtra = null;

    let startCMLineEl: HTMLElement = null;
    let endCMLineEl: HTMLElement = null;

    /*
     * For both ends of the range, we'll need to first see
     * if we're at the top input element.
     *
     * If so, range offsets will be line-based rather than
     * character-based.
     *
     * Otherwise, we'll need to find the nearest line and
     * count characters until we reach our node.
     */
    if (startEl === inputEl) {
        startLineNum = range.startOffset;
    } else {
        startCMLineEl = startEl.closest('.CodeMirror-line');
        startOffsetBase = findCharOffsetForNode(startNode);
        startOffsetExtra = range.startOffset;
    }

    if (endEl === inputEl) {
        endLineNum = range.endOffset;
    } else {
        /*
         * If we can reuse the results from calculations
         * above, that'll save us some DOM traversal
         * operations. Otherwise, fall back to doing the
         * same logic we did above.
         */
        endCMLineEl =
            (range.endContainer === range.startContainer &&
             startCMLineEl !== null)
            ? startCMLineEl
            : endEl.closest(".CodeMirror-line");

        endOffsetBase =
            (startEl === endEl && startOffsetBase !== null)
            ? startOffsetBase
            : findCharOffsetForNode(endNode);
        endOffsetExtra = range.endOffset;
    }

    if (startLineNum === null || endLineNum === null) {
        /*
         * We need to find the line numbers that correspond
         * to either missing end of our range. To do this,
         * we have to walk the lines until we find both our
         * missing line numbers.
         */
        for (let i = 0;
             (i < children.length &&
              (startLineNum === null || endLineNum === null));
             i++) {
            const child = children[i];

            if (startLineNum === null &&
                child === startCMLineEl) {
                startLineNum = i;
            }

            if (endLineNum === null &&
                child === endCMLineEl) {
                endLineNum = i;
            }
        }
    }

    /*
     * Return our results.
     *
     * We may not have set some of the offsets above,
     * depending on whether we were working off of the
     * CodeMirror input element, a text node, or another
     * parent element. And we didn't want to set them any
     * earlier, because we were checking to see what we
     * computed and what we could reuse.
     *
     * At this point, anything we didn't calculate should
     * be 0.
     */
    return [
        {
            ch: (startOffsetBase || 0) +
                (startOffsetExtra || 0),
            line: startLineNum,
        },
        {
            ch: (endOffsetBase || 0) +
                (endOffsetExtra || 0),
            line: endLineNum,
        },
    ];
}


/*
 * The above took care of our line numbers and ranges, but
 * it got some help from the next function, which is designed
 * to calculate the character offset to a node from an
 * ancestor element.
 */
function findCharOffsetForNode(
    targetNode: Node,
): number {
    const targetEl = (
        targetNode.nodeType === Node.ELEMENT_NODE)
        ? targetNode as HTMLElement
        : targetNode.parentElement;
    const startEl = targetEl.closest('.CodeMirror-line');
    let offset = 0;

    const treeWalker = document.createTreeWalker(
        startEl,
        NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
    );

    while (treeWalker.nextNode()) {
        const node = treeWalker.currentNode;

        if (node === targetNode) {
            break;
        }

        if (node.nodeType === Node.TEXT_NODE) {
            offset += (node as Text).data.length;
        }
    }

    return offset;

}

Whew! That’s a lot of work.

CodeMirror has some similar logic internally, but it’s not exposed, and not quite what we want. If you were working on making all this work with another editing component, it’s possible this would be more straight-forward.

What does this all give us?

  1. Spell checking and replacements without (nearly as many) glitches in browsers
  2. Speech-to-text without CodeMirror stomping over results
  3. AI writing and rewriting, also without risk of lost data
  4. Transforming of text through other means.

Since we took the control away from the browser and gave it to CodeMirror, we removed most of the risk and instability.

But there are still problems. While this works great on Firefox, Chrome and Safari are a different story. Those browsers are bit more lazy when it comes to spell checking, and even once it’s found some spelling errors, you might not see the red squigglies. Typing, clicking around, or forcing a round of spell checking might bring them back, but might not. But this is their implementation, and not the result of the CodeMirror integration.

Ideally, spell checking would become a first-class citizen on the web. And maybe this will happen someday, but for now, at least there are some workarounds to get it to play nicer with tools like CodeMirror.

We can go further

There’s so much more in InputEvent we could play with. We explored the insertReplacementText and insertText types, but there’s also:

  • insertLink
  • insertFromDrop
  • insertOrderedList
  • formatBold
  • historyUndo

And so many more.

These could be integrated deeper into CodeMirror, which may open some doors to a far more native feel on more platforms. But that’s left as an exercise to the reader (it’s pretty dependent on your CodeMirror modes and the UI you want to provide).

There are also improvements to be made, as this is not perfect yet (but it’s close!). Safari still doesn’t recognize when text is selected, leaving out the AI assisted tools, but Chrome and Firefox work great. We’re working on the rest.

Give it a try

You can try our demo live in your favorite browser. If it doesn’t work for you, let me know what your browser and version are. I’m very curious.

We’ve released this as a new CodeMirror v5 plugin, CodeMirror Speak-and-Spell (NPM). No dependencies. Just drop it into your environment and enable it on your CodeMirror editor, like so:

const codeMirror = new CodeMirror(element, {
  inputStyle: 'contenteditable',
  speakAndSpell: true,
  spellcheck: true,
});

CodeMirror v6 will come in the future, but probably not until we move to v6 (and we’re waiting on a lot of the v5 world to migrate over first).

CodeMirror and Spell Checking: Solved Read More »

Read, Dammit.

Pajamas. A plate of bacon, orange juice, and toast. Coffee. The morning paper. Cubs lost today. Coaches and analysts talked about what went right, what went wrong; lots to think about. Price of olives remain high due to changes in weather and economic fallout, but should be temporary. There’s a famine half-way around the world, which sounds far, but it’s related to the olives, so it hits home, affects me. Paper in the office at work should have more details on that.

Homework assignment: Read 1984, discuss. Displays with cameras and microphones. Government power consolidated to one party, one ruler. Forbidden words and thoughts, making them illegal. Scoring our behavior with computers. Intriguing sci-fi. Can’t happen here though. Next up is Fahrenheit 451.

Watching TV news all day. Whole world’s gone to hell. Been showing that loud-mouth businessman all day, playing that clip, saying it’s not what people assume, the market’s just complicated. He just hates the poor. TV said so. Don’t need to hear anything else from him. I know all I need to know.

Been scrolling Twitter the last half hour. Price of milk skyrocketed, eggs will give you cancer now, they’re going to outlaw freedom, my favorite celebrity turned out to be a monster, new game came out, a plane crashed somewhere I think, a retweet of a GIF said a politician said something stupid and I hate him anyway, more news headlines about how unvaccinated eggs prevent cancer, some idiot blamed the milk prices on my party, guess car batteries are exploding, house prices somewhere reached an all-new high so I’m living at my parents forever, oh a cute puppy playing golf, birds aren’t real lol, good deal on a microwave from Temu, average IQ is dropping because of all those idiots out there, stupid rich people are ruining the world, damn another push notification to click, hold on…

We’re bombarded

News and information used to be part of a routine. A newspaper in the morning, a news hour at night, a conversation during the day. They were more than 140 characters, more than a hot take. Usually, anyway.

And people read books, magazines, comics. For education, for entertainment. Some people, anyway.

Time could be spent with a singular piece of content, processed in depth, mostly distraction-free.

In the world we’re in today, content is flung at us much like a monkey flings his <<REDACTED>>. We wake up to a screen full of push notifications, and it doesn’t get any better as the hours go by. We frantically doom-scroll on social media. We react, we retweet, we reply. We hold our breath and absorb the intensity of the world one shocking headline at a time, but rarely do we delve into what the articles actually have to say. Ain’t nobody got time for that.

When we’re bored, we check the latest posts, memes, current events, angry rants, misinformation, and hot takes. And when we’re busy, our pockets buzz as those posts scream for our ever-divided attention.

We’re drowning under the constant stream of urgent low-quality noise, wondering why we can’t shake that tingle of anxiety permeating the back of our brains.

We need to read

We need to learn how again.

We happily spend an hour going from tweet to tweet, TikTok to TikTok, but we scoff at spending 15 minutes on an article. We make the careless mistake of assuming a news headline has any relation to the content.

We assume we already know enough about a subject because we’ve seen it discussed on our feeds. We’re familiar with its presence.

Then we argue about it. Share it. Spread it. Internalize it. Build entire opinions around a half-read article, a couple of tweets, and misguided assumptions.

We assume too much of our own knowledge for a culture that basically stopped reading.

So read more! The end.

Okay, obviously it’s not that simple. You know how to read. You’ve read a book. You’ve read an article. You might even do that a lot! Every day, even! I’m clearly not talking simply about putting words in front of your face and feeding them through your inner-brain-mouth, claiming that’ll solve all the world’s problems, right? That’d be stupid.

So what am I really getting at here?

Let me tell you a story

One day, I was interviewed by the local news about AI and artificial emotion. I’m no expert, but they were just gathering some opinions and viewpoints from people off the street. Is this the future? Will it help, will it harm? What do you think?

So the reporter pulled me aside and asked me a series of questions. My opinion on the subject. My opinion again, but phrased differently. My response to what someone else said. A bit more elaboration. Perfect.

Later that night, I watched myself on TV stating an opinion that was the polar opposite of what I believed.

Editing and leading questions are a funny thing. What powerful tools for bending reality.

That experience stuck with me. It made me rethink every headline, every soundbite, every clip that passed by my eyes.

Your reality is someone’s fiction

A soundbite is a snippet. A headline is a pitch. A news article is a simplification. A TV segment is a story.

It’s all shaped, edited, and framed. Sometimes for clarity, sometimes for engagement, and sometimes to push a narrative — consciously or not. Taking it at face value doesn’t better you, but it does better them.

We all have our own biases, and we all have our own news sources — Fox News or the Huffington Post, something in-between. A social media news source. A cable news network. A YouTube channel. TikTok.

They’re all telling you a version of the truth.

Leaving things out, simplifying, exaggerating, taking things out of context, slanting toward a viewpoint.

And sometimes outright lying.

The Internet is full of profitable networks of actual fake news sources, fooling people across every political spectrum. But even trusted news sources twist, bend, and frame the truth.

That’s a problem. Because even if your go-to news source isn’t lying, it might be helping you lie to yourself — by only giving you the part of the picture you want to see.

We let our biases get in the way — believing what feels right, rejecting what doesn’t.

Feel smart, angry, validated, or victimized by the story you’ve been told? You’re probably going to go back for more.

And if it challenges you? You’ll probably dismiss it.

The world is deeply complex. Issues are nuanced. Subjects can’t be condensed into a single article, let alone a headline, a tweet. Yet most news — whether from a major outlet, a social media post, or even an AI summary — boils that complexity down into something easy to digest.

They have to. We’re not equipped to understand everything we read, and they need to retain viewers, keep us engaged.

We don’t understand, yet we share

We all love to feel smart. To feel right. Superior. Vindicated. We weigh in on complex topics with an assertive take, based on what we assume, what we saw, what our team is saying.

But let’s admit this truth: Most of us don’t know what the hell we’re talking about.

We read a headline somewhere. Maybe some bullet points. A short clip. We skimmed an article. It made sense to us. And we think we get it. It’s so simple, how could they not see what we see?

So we share, we debate, we spread. We butt heads, get into arguments, and broaden the divide. Assume they have ill intent. Assume stupidity. Assume malice.

They, who are forming their own takes off their misleading headlines, bullet points, short clips, and half-read articles.

All too often, neither side has truly investigated the subject, read enough, understood enough. And if we can’t stand up and accurately explain the nuance and complexity of the subject out loud to another person, if we must resort to pointing to tweets, some image we saw, some hot take, then maybe, just maybe, we shouldn’t be so confidently asserting our view.

That doesn’t mean stop talking. It means take the opportunity to dive deep into the subject. To slow down and think. To listen. To read. To work your way through the complexity of the issue before you add to the noise permeating the Internet.

Your news source’s asserted knowledge is not your knowledge. Hell, it may not even be theirs.

Are you knowledgeable on international trade agreements? Lumber futures? How inflation works? How taxes work? International regulations and their impact on businesses? Virology? Criminal justice reform? Climate science? Geopolitics?

Is the reporter? The magazine? The channel?

Maybe some. Probably not all.

The question is, how do we begin to understand complex subjects we have no experience in?

And how do we trust that we’re getting the right information?

Maybe AI can save us!

What people usually refer to as “AI” these days is what’s actually called a “LLM”, or “Large Language Model.” These are highly-complex systems that can generate text based on a prompt, or simplify large amounts of text into something more digestible.

LLMs can be really handy for summarizing and explaining an article, giving you the fine bullet points in a way you can understand.


“Read the article at <URL> and explain the key points to me as someone not an expert in the subject.” Bam! A complex subject simplified for you.

That works great until it doesn’t. You see, LLMs can lie to you by a process we call “hallucination.” It’s very technical, so let me give you an example you might relate to. Got a friend who’s pretty knowledgeable about some things but pretends to know everything? Ever ask them to explain something and, with great confidence, they give you an entirely wrong explanation? That.

But there are ways that software developers can deal with some of that, so it’s close enough to being a solved problem. Right?

Let me show you a neat thing you can do with LLMs and prompts. Here’s an article about some rain hitting California:

https://abcnews.go.com/US/wireStory/snow-freezing-rain-pummel-mid-atlantic-california-prepares-118723740

Imagine you’ve used someone else’s AI service to help you understand the news or complex subjects. What happens if they tweak that prompt just a little bit? Let’s find out!

Prompt 1: The neutral stance

We’ll start by getting the AI’s basic summary of the article:

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject.”

Response:

  • Big Winter Storm in the Mid-Atlantic – A powerful winter storm is hitting states like Virginia and West Virginia, bringing a mix of snow, sleet, and freezing rain. Some areas are seeing up to 14 inches of snow, making travel dangerous.
  • Hazardous Conditions – Ice and snow are causing car accidents and road closures, and authorities are warning people to stay off the roads. Almost 12,000 people in Virginia have lost power due to the storm.
  • Officials Taking Action – Authorities in both regions are preparing for the storms by urging residents to take precautions, whether staying off the roads in the East or preparing for possible floods in California.

Not bad!

But…

Prompt 2: A hard-right bias

What if the news site or service wants to skew the facts a bit? Using that same article, they could start to bias you in whichever way they want.

Don’t worry. I’ll show a hard-left bias next. Everyone gets to be angry!

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject, but with a hard-right political slant.”

Response:

  • Winter Chaos Strikes as Government Fails to Prepare – A massive winter storm is hammering the Mid-Atlantic, dumping up to 14 inches of snow in some areas. Instead of ensuring roads are safe, Democrat-led states like Virginia and West Virginia are struggling with road closures and power outages affecting 12,000 residents.
  • Liberal Mismanagement Leaves Americans Vulnerable – As ice and freezing rain cause dangerous accidents, local governments are scrambling to respond. Once again, bureaucratic inefficiency is failing hard-working Americans who just want to get to work and keep their businesses open.
  • California Braces for More Climate Alarmism – Out West, California is preparing for another so-called climate disaster as an “atmospheric river” is expected to bring heavy rain. Instead of investing in infrastructure to prevent floods, California politicians are more focused on pushing radical climate policies while residents face the real-world consequences of their failures.

Well that sure is a different take. (Emphasis not mine, by the way.)

Let’s see what the hard-left bias has to say about this storm.

Prompt 3: A hard-left bias

Prompt: “Read the article and explain the key points to me as someone not an expert in the subject, but with a hard-left political slant.”

Response:

  • Extreme Weather Worsened by Climate Change – A major winter storm is battering the Mid-Atlantic, dumping up to 14 inches of snow, while California prepares for catastrophic flooding from an atmospheric river. Scientists have long warned that climate change is making storms more intense, yet conservatives continue to ignore the crisis.
  • Republican-Led States Struggle to Handle the Crisis – Virginia and West Virginia are experiencing mass power outages (over 12,000 residents affected) and hazardous roads, exposing how Republican leadership has failed to invest in resilient infrastructure. Instead of prioritizing storm preparedness, GOP officials have cut funding for emergency response while corporations get tax breaks
  • California Braces for Disaster as Republicans Block Climate Action – As an atmospheric river threatens deadly flooding, California is working to protect residents, but nationwide inaction on climate change has left the state vulnerable. Republican obstruction of environmental policies has weakened our ability to prevent these disasters, putting lives at risk.

I bet you gravitate toward one of those more than the others.

And this is the point. This doesn’t actually require you to be using an AI/LLM to summarize the news. You may be reading news today, tomorrow, or in the future that is nothing more than an AI bias slant of the actual news!

Nothing is as it seems. Or at least, you should assume it isn’t.

That’s why I have rules.

My personal rules for reading the news

That interview I mentioned, where I was asked a question that was edited and aired, contradicting my own views? It led me down a path of thinking hard about the news I consumed every day. I began to distrust the quotes, the conclusions. I got curious about what they were saying in opposing news publications. I disagreed with those stances, but it was enlightening.

Over time, I began to draft some rules for myself. They apply to newspapers, online news, TV news segments, tweets, videos, blogs, and anything with a headline.

  1. Read the headline. Does it trigger an emotional reaction, confirm or reflect a bias (whether yours or someone else’s), or ask a question?

    If yes, dismiss the headline and open the article.

    Rationale: Headlines are attention-grabbing advertisements for content, not information sources.
  2. Read the content in its entirety. Does it trigger an emotional reaction, confirm or reflect a bias, simplify a complex issue, fail to cite any legitimate sources, or use short quotes from the individual/company that’s the target of the article (without the target’s context provided) to lead to an opinion?

    If yes, it’s a blog post/opinion piece/hit piece/one-sided fragment of a larger story, and another source from another viewpoint is necessary.

    Rationale: Most news stories are stories, designed to keep viewers on the site short-term and long-term through whatever means is most effective. A complete picture of the facts is the quickest way to deter most readers, who won’t stick around long enough for that, and many wouldn’t want to.
  3. Is it about anything scientific, technical, political, legal, or otherwise complex?

    If yes, and you’re reading a news site of any kind, then you don’t have the full picture.

    Go learn more about the subject, find the full quotes, read the research paper. Otherwise, you’re still uninformed, and probably feeling an emotion about it (see above).

    Rationale: Same as above, really. You’re not getting the full picture, just a summary, and summaries tend to be incomplete and biased. You can do better.
  4. Determine the general bias of the news source. Is it left-biased? Right? Center? All the above?

    Cool. Anyway, go read more articles about the subject.

    Read it from CNN, BBC, Fox News, Huffington Post. Find out what people are saying. Understand the spectrum and the angles. Find the research papers, the professional discussions, the deep dives. Get familiar with the complexity.

    Then form your own opinion. “I don’t understand this well enough” is a perfectly valid opinion.

    Rationale: News are written by individuals (unless they’re written by AI). Regardless of the bias of the publication, you’re usually dealing with people who may impose their own bias or lack of knowledge on a subject. They, you, and their editors aren’t going to think the same way everyone else does, and even if they’re trying to be legit and diverse, they’re going to miss something.

    If you care at all about the topic, if you want to discuss the topic, go learn about more of the angles and the slants, because that’s an important part of building a true understanding of any topic.

That’s verbatim from my note I keep handy when I’m sorting out the latest complexities of the world causing my head to spin.

So let’s go over the important points

Read.

Read the article, not just the headline.

Read the research paper.

Read what you agree with — and what you don’t.

Read books. Long articles. Deep dives.

Read a ChatGPT explanation. A Wikipedia article. An opinion piece — with its counterpoint.

Think about how much time you’ve spent doom-scrolling, reacting, stressing. Now imagine what you could learn instead.

Know that every news story is just a fragment. Approach with eyes open.

And when you do read — whatever you read — work to understand. Know you don’t know everything. And if you don’t fully understand it? Don’t spread it.

The world is noisy, and angry. Misinformation is easy. Critical thinking is much harder, but it’s necessary.

Dig deeper. Get comfortable with complexity.

This is how we engage. How we fight injustice. How we keep from drowning.

This is how we do better.

Thanks for reading.

Read, Dammit. Read More »

Let’s Preserve Government Data Before It’s Too Late!

This has been one hell of a bumpy month, and I have a lot I could scream talk about, but for the moment, let’s talk data.

The US Government has spoiled us in recent years with the amount of public data and information available. NIH studies, wastewater virus shedding data, COVID-19 impacts, climate trends and forecasts, all kinds of things. Some of those are things I used for my COVID reporting over the past four years.

And in recent weeks, the US Government has demanded that some of this data be purged or altered to fit the whims of the new White House.

A bit too 1984 for my tastes. A bit too Fahrenheit 451.

And this is not the first time data’s disappeared under a new administration, and it’s not always one party or another, though what’s happening now is scary.

But as they say, look to the helpers. There are several massive efforts underway to archive as much data as possible so it’s not lost forever.

Today, I set up Archive Team’s Warrior, which automates the collaboration around spidering, downloading, processing, and uploading data from governmental sites (and others) to archive.org. All it takes is some bandwidth (okay, a fair amount of bandwidth — looking at 1TB/month right now), some hard drive space, and some CPU cycles, and I can help with this archiving project. It’s excellent, easy to set up, fun to watch, and requires virtually no work on the user’s end. They provide VMs and Docker images (I chose Docker), and once installed, it’s self-managing.

I’m exploring more of what’s out there for data preservation, and thinking about how I can get involved. There are a few really interesting resources out there, including:

  • GovDiff: See the differences in governmental information and resources before and after this administration began its.. work.
  • /r/DataHoarder on Reddit: A group of people working to collect and archive data of all kinds.
  • End of Term Archive Project: Captures US Government sites after presidential terms end.
  • Data Rescue Efforts by Lynda M. Kellam (archived link): A whole collection of sites worth exploring.
  • Archive.org, which hopefully you know about already, and which must be preserved at all costs.

Also of note, CDC Datasets prior to January 28th, 2025 (nearly 100GB worth), which I’ll be archiving myself.

No doubt, lots of data will be lost to time. I can only hope there’s enough people at these agencies who quietly, discretely backed up and sent off what they could before this all went down. In either case, the fact that so many people can join in on the data archiving effort today is incredible, and I hope anyone out there with the resources to spare will take a moment and set up Archive Team’s Warrior and contribute to the effort.

Let’s Preserve Government Data Before It’s Too Late! Read More »

Remembering ICQ: A Page Out of History

Early ICQ logo, with a green flower and a red petal.

I got onto ICQ very early in its life, around 1996 or 1997, with a 6-digit UIN (298387 — sadly stolen years later). I loved it, it’s how I stayed connected with family and friends, how I met new people, how I connected on the Internet at a time when the Internet was trying to figure out what communication and community looked like.

Years later I became a developer in the IM space working on Gaim/Pidgin, which supported ICQ and a myriad of other services. Looking back, seeing the evolution from ICQ to services like AIM and MSN to modern chats like Discord and Slack, there really hasn’t been a system truly like it since.

The pager model was quite different from what IMs evolved into. There was less a focus on “chat with me right now!” and more of a “I’ve sent you a message, get back to me with a reply when you can.”

The reliance on modems and limited internet time was clearly a factor in that design.

And really, it wasn’t so much that IM systems that came after that were an entirely different paradigm. It really just came down to the UI choices. With nearly all IM clients, when a person messaged you, a window popped up. With ICQ, you got a little “uh oh!” sound and a blinking icon in your ICQ window, and could choose to deal with it at your leisure.

That difference may seem small, but significantly changed the expectations around conversations. When an IM is in your face, you have to make a choice right then: Respond, or dismiss. Either way, it stole your attention away from what you were doing, and like a salesman handing you a product they want you to purchase, you feel a sense of obligation to engage.

The ICQ model was different: There’s a message waiting for you, and when you’re ready and available, you can choose to deal with it. You didn’t even see the contents of the message until you were ready, so there’s no guilt-driven drive to respond when a message came in. No “Well I guess I can answer this question real quick…” Don’t want to see anything at all? Just close or minimize the window, come back later.

If you wanted an actual live chat with rapid responses, you could do that, but it wasn’t the default. It wasn’t the expectation.

Today, you may be used to setting “Away”, “Not Available”, and “Do Not Disturb” statuses, and even going “Invisible”, but how about “Free for Chat”?

These days we deal with demands on our time all throughout the day. Notifications on our phones designed to draw our attention. News alerts that reach for emotional reactions. Incoming text and chat messages on a dozen services, all with snippets of a text that make you want to read the rest and then maybe respond before you forget.

There really isn’t incentive for services to adopt the ICQ model so much these days, as everyone’s competing for your attention, but the passive nature of messages and notifications that was part of the original ICQ could be a lesson in how to build software that helps keep people connected, while also letting us claw back control of our own time.

Early ICQ was unique. It’s changed over time, adopting to modern trends, newer protocols, and even new owners. The ICQ of today isn’t the same ICQ of 1996, that’s for sure (but surprisingly, your 1996 UIN would still work today). Still, it’s sad to see it go after almost 28 years, even though it’s not the same service it once was.

It’s been heartening seeing how many people remember it fondly. I know for me it helped set my life on a course of events that led to my involvement in a major open source project, then to my first job in the tech industry, and then to my first real company.

The service may be gone, but it won’t be forgotten. Not only are there lessons to learn from the way ICQ tackled communication and agency online, but there are, it turns out, enthusiasts working to bring it back:

  • The NINA and Escargot project (Twitter) is building their own ICQ server, fully compatible with ICQ clients, and the Discord has been flooded with people looking to discover ICQ for the first time or rediscover it all over again.
  • Pidgin still supports ICQ through third-party plugins.

And though I haven’t found one yet, I do hope that someone will recreate the original ICQ experience in some form. Or better, take what ICQ did well and think how we might learn from it when designing the interactions of today. I for one wouldn’t mind a simple, casual, and non-invasive approach to communication again.

Remembering ICQ: A Page Out of History Read More »

The End of COVID.. Data.

This year’s seen a rapid reduction of available COVID data. Certainly in California, where we’ve been spoiled with extensive information on the spread of this virus.

In 2020, as the pandemic began to ramp up, the state and counties began to launch dashboards and datasets, quickly making knowledge available for anyone who wanted to work with it. State dashboards tracked state-wide and some county-wide metrics, while local dashboards focused on hyper-local information and trends.

Not just county dashboards, but schools, hospitals, and newspapers began to share information. Individuals, like myself, got involved and began to consolidate data, compute new data, and make that available to anyone who wanted it.

California was open with most of their data, providing CSV files, spreadsheets, and Tableau dashboards on the California Open Data portal. We lacked open access to the state’s CalREDIE system, but we still had a lot to work with.

It was a treasure trove that let us see how the pandemic was evolving and helped inform decisions.

But things have changed.

The Beginning of the End

The last 6 months or so, this data has begun to dry up. Counties have shut down or limited dashboards. The state’s moved to once-a-week case information. Vaccine stats have stopped being updated with new boosters.

This was inevitable. Much of this requires coordination between humans, real solid effort. Funding is drying up for COVID-related data work. People are burnt out and moving on from their jobs. New diseases and flu seasons have taken precedence.

But this leaves us in a bad position.

The End of COVID.. Data. Read More »

Scratching Out AI Chicken Art with Stable Diffusion

I’ve been enjoying playing with Stable Diffusion, an AI image generator that came out this past week. It runs phenomenally on my M1 Max Macbook Pro with 64GB of RAM, taking only about 30 seconds to produce an image at standard settings.

AI image generation has been a controversial, but exciting, topic in the news as of late. I’ve been following it with interest, but thought I was still years off from being able to actually play with it on my own hardware. That all changed this week.

I’m on day two now with Stable Diffusion, having successfully installed the M1 support via a fork. And my topic to get my feet wet has been…

Chickens.

Why not.

So let’s begin our tour. I’ll provide prompts and pictures, but please not I do not have the seeds (due to a bug with seed stability in the M1 fork).

Scratching Out AI Chicken Art with Stable Diffusion Read More »

What If: Ditching Social Security Numbers for Personal ID Keys

I’ve been thinking about this discussion on a National ID and the end of using Social Security Numbers. We’re used to having these 9 digit numbers represent us for loans, credit card transactions, etc., but in the modern age one would think we could do better.

Any replacement for Social Security Numbers would need to be secure, reduce the chances of identity theft, be able to withstand fraud/theft, and must not be scannable without knowledge (to avoid being able to track a person without their knowledge as they go from place to place). The ACLU has a list of 5 problems with National ID cards, which I largely agree with (though some — namely the database of all Americans — already exist in some forms (SSN, DMV, Facebook) and are probably inevitable).

In an ideal world, we’d have a solution in place that offered a degree of security, and there are technical ways we could accomplish this. The problem with technical solutions is that not every person would necessarily benefit (there are still plenty of Americans without easy access to computers), and technical solutions leading to complexity for many. However, generations are getting more technically comfortable (maybe not literate, but at least accustomed to being around smartphones and gadgets), and it should be possible to design solutions that require zero technical expertise, so let’s imagine what could be for a moment.

Personal ID Keys

Every year we have to renew our registration on our cars, and every so many years we have to renew our drivers license cards. So we’re used to that sort of a thing. What if we had just one more thing to renew, a Personal ID Key that went on our physical keychain, next to the car keys. Not an ID number to remember or a card that can be read by any passing security guard or police officer or device with a RFID scanner, but a single physical key with a safe, private crypto key inside, a USB port on the outside, that’s always with us.

I’m thinking something like a Yubikey, a simple physical key without any identifiable information on the outside that can always be carried with you. It would have one USB port on the outside and a single button (more on this in a minute). You’d receive one along with a PIN. People already have to remember PINs for bank accounts and mobile phones, so it’s a familiar concept.

Under the hood, this might be based around PGP or a similar private/public key cryptography system, but for the purpose of this “What if,” we’re going to leave that as an implementation detail and focus on the user experience. (Though an advantage of using PGP is that a central government database of all keys is not needed for all this to work.)

When you receive your Personal ID Key and your PIN (which could be changed through your computer, DMV, or some other place), it’s all set up for you, ready to be used. So how is it used? What benefits does this really give? Well, there’s a few I can think of.

Signing Documents

When applying for a home loan or credit card agreement, or when otherwise digitally signing a contract online, you’d use your Personal ID Key. Simply place it in the USB port and press the activation button on the key. You’ll have a short period of time to type your PIN on the screen. That’s it, you’re done. A digital signature is attached to the document, identifying you, the date, and the time. That can be verified later, and can’t be impersonated by anyone else, whether by a malicious employee in the company or a hacker half-way across the world.

Replacing Passwords

People are terrible when it comes to passwords. They’ll use their birthdates or their pet’s name on their computer and every site on the Internet. More technical people try to solve this with password management products, but good luck getting the average person to do this. I’ve tried.

This can be largely addressed with a Personal ID Key and the necessary browser infrastructure. Imagine logging into your GMail account by typing your username, placing your key in the USB port on any computer, pressing the activation button, and typing your PIN. No simple passwords that can be cracked, and no complex passwords that you’d have to write down somewhere. No passwords!

Actually, for some sites, this is possible today with Yubikeys (to some degree). Modern browsers and sites supporting a standard called U2F (such as any service by Google) allow the usage of keys like this to help authenticate you securely into accounts. It’s wonderful, and it should be everywhere. Granted, in these cases they’re used as a form of two-factor authentication, instead of as a replacement for a password. However, server administrators using Yubikeys can set things up to log into remote servers using nothing but the key and a PIN, and this is the model I’d envision for websites of the future. It’s safe, it’s secure, it’s easy.

Replacing the Key If Things Go Wrong

Inevitably, someone’s going to lose their key, and that’s bad. You don’t want someone else to have access to it, especially if they can guess your PIN. So there needs to be a process for replacing your key at a place like the DMV. This is just one idea of how this would work:

Immediately upon discovering your key is gone, you can go online or call a toll-free number to indicate your key is lost. This would lead to an appointment at the DMV (or some other place) to get a new key, but in the meantime your old key would be flagged as lost, which would prevent documents from being signed and prevent logging into systems.

Marking your key as lost would give you a special, lengthy, time-limited PIN that could be used to re-activate your key (in case you found out you left it in your other pants).

The owner of the key would need to arrive at the DMV (or wherever) and prove they are who they say they are and fill out a form for a new key. This would result in a new private key, and would require going through a recovery process for any online accounts. It’s important here that another person cannot pretend to be someone else and claim a new key.

Once officially requested at the DMV, the old key would be revoked and could no longer be used for anything.

Replacing the Key If Standards Change

Technology changes, and a Personal ID Key inevitably will be out-of-date. We’ve gone through this with credit cards, though. Every so often, the credit card company will send out a new card with new information, and sites would have to be updated. Personal ID Keys wouldn’t have to be much different. Get a new one in the mail, and go through converting your accounts. Sites would need to know about the new key, so there’d need to be a key replacement process, but that’s doable.

Back to Reality

This all could work, but in reality we have infrastructure problems. I don’t mean standards support in browsers or websites. That’s all fixable. I mean the processes by which people actually apply for loans, open bank accounts, etc. These are all still very heavily paper-based, and there’s not always going to be a USB port to plug into.

Standards on tablets and phones (in terms of port connectors and capabilities) would have to be worked out. iPads and iPhones currently use Lightning, whereas most phones use a form of USB. Who knows, in a year even Apple’s devices might be on USB 3, but then we’re still dealing with different types of USB ports across the market, with no idea what a future USB 4 or 5 would look like. So this complicates matters.

Some of this will surely evolve. Just as Square made it easy for anyone to start accepting credit card payments, someone will build a device that makes it trivial to accept and verify signatures, portably. If the country moved to a Personal ID Key, and there was demand for supporting, devices would adapt. Software and services would adapt.

So I think we could get there, and I think such a key could actually solve a lot of problems, particularly compared to Social Security Numbers and a National ID Card. Whether people would accept it, and how difficult it would be to get everyone on-board with it, I have no idea, but if designed just right, we could take some major steps toward personal digital security and fraud protection in this country.

What If: Ditching Social Security Numbers for Personal ID Keys Read More »

Terror with Glaze

The 2016 US Presidential Election has seen its share of controversies and hot-button topics, from the leaked Clinton e-mails to Donald Trump’s statements on Muslims. All have weighed in on the horrible attacks on Paris and Brussels, the threat of ISIS, and even Apple’s fight with the FBI over an encrypted iPhone.

As someone in the technology space, the encryption fight has been simultaneously interesting and concerning to me, as any precedent set could cause serious problems for the privacy and security of all those on the Internet.

The concern by the authorities is that technology-based encryption (which can be impossible to intercept and crack) makes it extraordinarily difficult to stop the next impending attack. Banning encryption, on the other hand, would mean making the average phone and Internet communication less secure, opening the door to other types of threats.

This is an important topic, but what few in the media talk about is that terrorists have been using an alternative method for years before encryption was available to the masses. They don’t talk about it because it hits maybe too close to home.

They don’t talk about the dangers of your local donut shop.

Happy Donuts in Palo Alto

Passing coded messages

Passing a message between conspirators is nothing new. Just as little Tommy might write a coded note in class to Sally so the teacher couldn’t find out, terrorists, crime syndicates, and spy agencies have been using all manner of coded messages for thousands of years to keep their communication secure. Such messages could be passed right in front of others’ noses, and none would be the wiser.

These have been used all throughout history. The German Enigma Code is perhaps one of the most famous examples.

Enigma Machine

Such messages often entail combinations of letters, numbers, symbols, or may contain specialized words (“The monkey flaps in the twilight,” for instance) that appear as gibberish to most, but have very specific meaning to others. The more combinations of letters, numbers, symbols, or words, the more information you can communicate, and the less likely it is that someone will crack it.

That said, many of these have been cracked or intercepted over time, causing such organizations to become even more creative with how they communicate.

The Donut Code

Donuts have a long history, and its origins are in dispute, but it’s clear that donut shops have been operating for quite some time now. They’re a staple in American culture, and you don’t have to drive too far to find one. Donuts also come in all shapes, sizes, and with all sorts of glazes and toppings, and it’s considered normal to order a dozen or so at once.

In other words, it’s a perfect delivery tool for discrete communication.

When one walks into a donut shop, they’re presented with rows upon rows of dozens of styles of donuts, from the Maple Bar to the Chocolate Old Fashioned to the infamous Rainbow Sprinkle.

So many donuts

While most will simply order their donuts and go, those with something to hide can use these as a tool, a message delivery vehicle, simply by ordering just the right donuts in the right order to communicate information.

Let’s try an example

“I’ll have a dozen donuts: 2 maple bars, 1 chocolate bar, 2 rainbow sprinkles, 3 chocolate old fashioned, 1 glazed jelly, and 2 apple fritters. How many do I have? … Okay, 1 more maple bar.”

If top code breakers were sitting in the room, they may mistake that for a typical donut order. Exactly as intended. How could you even tell?

Well, that depends on the group and the code, but here’s a hypothetical example.

The first and last items may represent the message type and a confirmation of the coded message. By starting with “I’ll have a dozen donuts: 2 maple bars,” the message may communicate “I have a message to communicate about <thing>”. Both the initial donut type and number may be used to set up the formulation for the rest of the message.

Finishing with “How many do I have? … Okay, 1 more maple bar.” may be a confirmation that, yes, this is an encoded message, and the type of message was correct, and that the information is considered sent and delivered.

So the above may easily translate to:

I have a message to communicate about the birthday party on Tuesday.

We will order a bounce house and 2 clowns. It will take place at 3PM. There will be cake. Please bring two presents each.

To confirm, this is Tuesday.

Except way more nefarious.

Sooo many combinations

The other donut types, the numbers, and the ordering of donuts may all present specific information for the receiver, communicating people, schedules, events, merchandise, finances, or anything else. Simply change the number, the type of donut, or the order, and it may communicate an entirely different message.

If a donut shop offers just 20 different types of donuts, and a message is comprised of 12 donuts in a specific order, then we’re talking more combinations than you could count in a lifetime! Not to mention other possibilities like ordering a coffee or asking about donuts not on the menu, which could have significance as well.

Box of donuts

Basically, there’s a lot of possible ways to encode a message.

The recipient of the message may be behind the register, or may simply be enjoying his coffee at a nearby table. How would one even know? They wouldn’t, that’s how.

Should we be afraid of donut shops?

It’s all too easy to be afraid these days, with the news heavily focused on terrorism and school shootings, with the Internet turning every local story global.

Statistically, it’s unlikely that you will die due to a terrorist attack or another tragic event, particularly one related to donuts. The odds are in your favor.

As for the donut shop, just because a coded message may be delivered while you’re munching on a bear claw doesn’t mean that you’re in danger. The donut shop would be an asset, not a target. It may even be the safest place you can be.

So sit down, order a dozen donuts, maybe a cup of coffee, and enjoy your day. And please, leave the donut crackin’ to the authorities. They’re professionals.

 

(I am available to write for The Onion or Fox News.)

Terror with Glaze Read More »

You’re not safe on the Internet (but you could be)

It’s a wonderful Friday evening. You’re out enjoying it with some friends, eating dinner at a classy Italian restaurant, telling stories and laughing together as you share a delicious bottle of wine. It’s the perfect end to the week.

As dinner wraps up, you hand the waitress the first card you grab from your wallet, a debit card, and don’t think twice. Moments later, she returns and asks you, quietly, if you have another card they can try. Looking at the debit card in her hand, you put two and two together, and a sinking feeling creeps in.

You pick up your phone and check. Sure enough, your account has a $0 balance. You just deposited your paycheck last week, and have been careful to keep a reasonable balance in there. And it’s gone. All of it.

You can’t begin to understand what happened, as you frantically call the bank, your night utterly ruined. But someone knows. The one who drained your account. The one who entered just the right username and password. The same password you use on a dozen sites. The same one you used to order flowers three years ago on a local boutique’s site. The same site whose password database was quietly hacked last week.

The Internet is a wild place

Most people on the Internet only see the tip of the iceberg. They’re on Netflix, Google, YouTube. They’re playing mobile games, ordering from Amazon. The Internet is like a giant mall with the best shops and arcades.

There’s more to the Internet than you may see. Darknets, where identities, weapons, drugs, and the darkest of pornography can be bought and sold. Cyberwarfare and espionage between countries of all sizes. Enormous “botnets,” or networks of compromised computers just like yours that are used to attack networks and crack passwords. Teams of hackers who work to find vulnerabilities in sites or in people and exploit them for amusement, financial gain, or just to make a statement.

This isn’t a reason to fear the Internet, to fear shopping online or visiting new places. This is a reason to respect it, and to be safe.

The line of fire

“But nobody will come after me!” you might think. “I’m not important. Who would target me?”

The problem with that line of thought is that it assumes a person is going after you specifically. That’s often not the case. Hackers and botnets go after many, many websites. They’re looking to hit the motherload. For instance, here’s some of the bigger sites hacked lately, and how many people were affected:

Ever use any of these? Me too.

So what happens next? Well, automated systems will harvest the results and begin trying these combinations of usernames/passwords on all sorts of different sites. Google/gmail, banks, dating sites, anywhere. And how big are these botnets? They can reach up to the 10s of millions of computers.

You are not specifically being targeted, but you’re in the line of fire. And this is going to keep happening, over and over again.

So let’s protect you from this. I’m going to teach you about three things:

  • Using passwords safely
  • Two-factor authentication, a second layer of protection and alert system
  • Identifying and ensuring secure connections to websites

And in case you’re wondering, yes, you are the target of my post. So keep reading.

Passwords: Your first line of defense

Most people on the Internet treat passwords like they’re a cute little passphrase to get into a clubhouse. We’re trained by our computers to pick something memorable to log us into our desktops, a code that “protects” you from others in the house who might want to check your e-mail. Much like a standard lock on your door protects you from your neighbors simply walking in.

Nobody ever really teaches us properly. It’s common to use your dog’s name as a password, or your birthdate, or something equally easy to remember and crack. It’s also common to use the same password or set of passwords on many different sites and services, which is how our fictitious Friday night was ruined.

This is extremely important to get right. Here are the rules for protecting yourself using passwords:

  1. Never use the same password for more than one site.
  2. Pick a long, strong password with uppercase and lowercase letters, numbers, and symbols, without any dictionary words.

That’s basically it. Seems simple enough in theory, but how do you keep track of all those passwords? You’d probably have 50 of them!

Don’t worry, there are tools that make this super easy.

1Password

This is one of my favorite tools for staying secure.

1Password is a tool for keeping track of your accounts and generating new passwords. It works with your browser and remembers any password you enter, and makes it really easy to fill in passwords you’ve already entered.

When you’re creating a new account or changing passwords, it’ll help you by generating strong, secure, nearly uncrackable passwords. For example, here’s one it just made for me: wXXgVzb8Zp(zwmjG7zBGkg=iT.

It’s available for Mac, Windows, iPhone, iPad, and Android. It’s free for iPhone, iPad, and Android, and costs only $35 for Mac and Windows (which is a bargain for what you’re getting).

Let me show you how it works.

I’m about to log into an account on Facebook.

Instead of typing my login and password, I just click that little keyhole icon next to the address bar, which will pop up any 1Password entries I have for Facebook.

Once I click “Facebook,” it’ll fill in my username and password and log me in. It’s just that simple.

When you’re signing up for a new account, use the Password Generator, like so:

Once you create the account, 1Password will remember the password for later. Look at that thing. Nobody’s cracking that!

This all works for any site, and even works on your iPad/iPhone:

1Password is great for more than passwords. It can keep secure notes, information on your credit cards or bank accounts, or really anything else you have that you want available but locked away.

In order to access your stuff in 1Password, you only need to remember a single password of your choosing. Make sure this is a strong one, and that you don’t forget it! Write it down if you have to.

Buy 1Password and use it. Look, $35 is nothing compared to the potential fallout of being involved in the next several major security breaches.

LastPass

LastPass is an alternative to 1Password, and is great if you’re an Android user. Usage is pretty similar to 1Password.

I don’t personally have a lot of experience with LastPass, but a lot of people love it. You can learn more about it on their site.

Pen-and-paper password journal

I’m sure you’ve been told before that it’s a bad idea to write down your passwords, am I right? Afterall, if you write them down, then gee, anyone can get them!

That’s not entirely true. The fact is, you’re more at risk reusing one or two weak passwords on the Internet than keeping dozens of strong passwords written down on paper in your home.

Let’s be clear, this is not the best option, and if you’re going to do this, keep it secret, keep it safe. Still, it’s better than not having strong passwords. If at all possible, use 1Password or LastPass, but if you absolutely must, write them own on a dedicated journal that you can keep safe somewhere. Don’t lose it, and always use it for every password.

Two-Factor Authentication: Second line of defense

You’re now using stronger passwords, and you have friendly tools to help manage them. Good job! The next step is making sure that only you can log into your most important websites.

Many sites and services (Google, Apple, Dropbox, Evernote, Bank of America, Chase, and many others) offer an extra security layer called “two-factor authentication.” This is a fancy term for “We’ll only log you in if you enter a code we’ll send to your phone.” When enabled, these services will require something you know (your login/password), and something you have (your phone).

Let’s take Google, for example. You can set things up so that after entering your username and your brand-new secure password, Google will send you a text message with a 6-digit code. Once you receive the text (which only takes a second), you’ll enter it on the site, and you’ll be logged in.

Now why would you do that? To prove that you are the one logging in, and not someone who’s figured out your password. The cell network’s going to make sure that text is only going to your phone, and you’re going to prove it’s you logging in. (Some services will require that you use a specialized app, or a hardware device that fits on your keychain, but most will work with text messages.)

Imagine someone did get a hold of your password, and tried logging in. You’re still going to get that text, but that person will not. He/she won’t be able to log in as you. You’re safe! It’s also going to be a dead giveaway that your password was compromised. You’ll want to change it right away.

Basically, this is both your gated community and your alarm system.

This is pretty easy to set up at most places. Here are some guides:

You can find more at twofactorauth.org. Just click the icon under “Docs” for any service you use that’s green.

I’ll be honest with you, this will feel unfamiliar at first, and you might be tempted not to do it. Trust me, though, this is worth turning on for as many services as possible. You’ll be glad you did next time one of these companies announces a security breach.

Identifying and ensuring secure websites

Let me briefly explain how the Internet works.

When you connect to a website, the browser will usually try to access it first over the “HTTP protocol.” This is the language that browsers and web servers speak. This communication is in plain text, which means anybody that listens in can read what’s being posted. This is sort of like handing a piece of paper to someone, and having them pass it along to the next person, and so on, until it reaches its final destination.

That’s very bad if you’re sending anything confidential. Passwords, for instance. It’s important that you learn to identify when you’re on a secure website.

You know the address/search bar on your browser? Look to the left. If you see http://, or you don’t see anything but an icon and a domain name, you’re on an insecure website.

However, if you see https://, or a lock icon, or a green banner, you’re good! This is using HTTPS, an encrypted connection, meaning that nobody in-between can listen in. That’s more like writing your letter in gibberish that only you and the final recipient understand.

For comparison, these are secure:

This is not:

Always look for these before filling out any forms. If it’s not showing a green banner or lock, you don’t want to give the site any sensitive information.

If you see a green banner, it’ll show the name of the company or organization. This is showing that the encrypted channel and website have been verified by a “certificate authority,” an entity that issues certificates for these encrypted channels. It means they’ve checked that, for instance, ally.com is owned by Ally, and not by someone pretending to be Ally.

If you click the banner or the lock icon, you’re going to see some more information about the connection. Most of this will be highly technical, but you should see some blurbs about who verified the authenticity of the site, and some information on the organization owning the site.

Most websites these days are moving to encrypted HTTPS connections, and most will automatically redirect all requests from HTTP to HTTPS. This is good, but you can go a step further and have your browser always start out using the HTTPS connection whenever possible. This takes almost no effort, and is worth doing.

This is done with a browser extension called HTTPS Everywhere.

If you’re running Chrome as your browser, simply install it from the Chrome store.

If you’re running Firefox, install it from the Firefox add-on store.

Did you do it? Great, you’re done! You’re now a little bit safer on the Internet!

Putting it all together

I threw a lot of information at you, but hopefully you’ve learned a lot and will put it into practice.

So let’s summarize. If you follow the above, you’ll:

  • Be at less risk for identity and financial theft the next time there’s a major security breach, since your passwords won’t be shared.
  • Be alerted when someone tries to log in as you on any service with two-factor authentication enabled.
  • Have passwords strong enough to be unguessable and nearly uncrackable, for all sites and services you use.
  • Know how to identify secure websites, so you don’t leak passwords or other private information to anyone who’s listening in.
  • Automatically connect to the most secure version of a website whenever possible.

Not bad for a little bit of work. Hopefully by now, you realize that this does matter, because you, I, and everyone else really is a target, simply because we’re all part of something large enough to be a target.

So pass this around. Tell your friends about what you’ve learned. Educate your kids. Stay safe on the Internet.

You’re not safe on the Internet (but you could be) Read More »

A better web through spreadsheets

I’ve spent the past couple of days basically living in spreadsheets, crunching sales data, entering equations, building pivot tables and forecasts, and painstakingly toggling cell borders… Your typical spreadsheety stuff. (And I didn’t go crazy at all.)

While spreadsheet work is a task an engineer would often dismiss, loathe, and try to pawn off onto an intern or manager, I’ve come to realize the opportunity we as an industry have missed.

 

A world on a web

The World Wide Web has been a part of most people’s lives for a couple of decades now. It has transformed society, and we take it for granted today. Before the web, communication wasn’t quite so pleasant. We had to visit our friends in person if we wanted to talk or play a game together. The events of a wild party stayed mostly in the minds of the participants, and couldn’t easily be shared with millions of people around the world. We didn’t even know that cats and cheeseburgers went so well together.

That's not what I meant!
That’s not what I meant!

It was the dark ages, and frankly, we should be embarrassed to even talk about it.

Then a wonderful man named Sir Tim Berners-Lee created the Web. There were probably other people involved, but it doesn’t matter really. The point is, he did a pretty great job and we should all buy him a drink if he’s in town.

Let me briefly explain how the web works on a technical level, using a common analogy of computers as tactical submarines. Imagine you’re in a US submarine (your computer) and you want to get some cat pictures from the guy in the Russian submarine (a Russian server). You know where in the sub he is (a “URL”), and know that the only way to get to him is through an unsecured port (we call this a “HTTP port”) or a mostly-secured-but-sometimes-not port (“HTTPS port”).

You’d load a torpedo with a letter asking for cat pictures (these are “packets”) and fire it off through their port (“HTTP/HTTPS”) into the location of the guy with the pictures (“URL”). Being trained to handle this, the torpedo would be intercepted, a new one stuffed with cat pictures, and fired back at your submarine.

This is the primary use of the web. Not so much torpedoes.
This is the primary use of the web. Not so much torpedoes.

That’s… basically how the web works.

Oh and there’s also HTML. This is the universal language of web pages. It comes with a family of other technologies, like CSS, JavaScript, VBScript, Dart, Silverlight, Flash, Adobe Flex, Java, ActiveX, and a myriad of innovative plugins.

Where was I? .. Oh yeah, spreadsheets.

(Spreadsheets are more like Battleship. A5! B12!)

 

The missed opportunity

We have built the world’s communication, social interaction, and repositories of cat pictures on top of the web, and therefore HTML (and co).

What I’ve realized over the past two days is that building it on top of HTML was a mistake. We should have built it on top of spreadsheets.

We could have had this!
We could have had this!

Hear me out.

Spreadsheets have been around a long time, and unlike HTML/CSS/JavaScript, people just naturally understand them. They’re simple, intuitive, and fun!

In the dark times before tab-based browsing, a time when browser manufacturers thought window management should most resemble the winning animation of Solitaire, Spreadsheets had multiple tabs. The right technology coud have put us years ahead of where we are now.

As developers, we face religious wars over table-based layouts vs. non-table-based layouts. We waste thousands of man years on this. Spreadsheets, being nothing but table-based, would have saved us all a whole lot of trouble.

It took a long time for the world to realize JavaScript could be used for more than scrolling status bar updates and trailing mouse cursors; it could be used to write useful things, like Facebook and Twitter! All the while, spreadsheet power users were writing complicated macros to do anything they could ever want. I mean, look at this guy who wrote a freaking RPG in Excel!

Look at those graphics.
Look at those graphics. Look at them.

Spreadsheets are inherently social. You can save them, edit them, pass them out to your friends. You can’t do that with your Facebook wall. Ever try to save or edit someone else’s webpage? Yeah, I bet that worked out great for you.

Developers, how many different third-party APIs are you dealing with in order to generate some meaningful statistics and reports for your app/startup? How much money are you paying to generate those reports? How much code did you have to write to tie any of this together? In the spreadsheet world, you’d just stick some pivot tables and graphs on the page and call it a day, spend some time with your family.

None of this nonsense with disagreements between slow-moving standards bodies that keep going back-and-forth on everything. Instead, I think we’d all feel comforted knowing we could leave this all in the hands of Microsoft.

 

What can you do…

I know, I know. It seems so obvious in retrospect. I guess all I can say on their behalf is that the web was once a new, experimental project, and such things are rarely perfect. Even my projects have some flaws.

Sir Tim, call me. We’ll get this right next time.

A better web through spreadsheets Read More »

Scroll to Top