Jun 24, 2024 6:00 AM

The Eternal Truth of Markdown

An exegesis of the most ubiquitous piece of code on the web.

ILLUSTRATION: SAMUEL TOMSON

In the beginning was the Word, and the Word was in plaintext, and the Word was in plaintext because plaintext was the Way. It was good.

On the sixth day—I’m skipping ahead here—the internet was born. The Word needed to be rewritten in HTML. Now there were two Words. It was not good.

On the eighth day, after a bit of rest, Markdown was born. Markdown made it possible to bring forth the Word as HTML on the web, PDF in the library, LaTeX in the publishing house, even Microsoft Word DOC in the office—all generated from the same plaintext.

The people saw that in this form the Word was more flexible. It was good. The internet rejoiced and put Markdown in all the things.

This is where the real problems began.

Today, Markdown is possibly the most ubiquitous piece of code on the web. Support for Markdown is embedded in nearly every online text box you’re likely to encounter, and there’s an entire economy of mobile writing and note-taking apps built on its back.

Markdown is not just a piece of software. It’s also a markup language—it’s used to format plaintext, which then appears the way you want it to on, say, the internet. Markdown the markup language was designed to be “as easy-to-read and easy-to-write as is feasible,” according to creator John Gruber’s syntax guide. “A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.”

This, I believe, is the cornerstone of Markdown’s success (and why related projects from that era, like reStructuredText and Setext, remain largely unknown): It looked at the world as it actually was and built on the informal conventions people were using. Markdown took common quirks of writing plaintext emails or message-board posts—like wrapping a word in asterisks to *emphasize* it—and extended those formatting customs. It did not come in and declare an entirely new syntax and ask people to adopt it.

Of course, there are some important assumptions behind Markdown. The big one is that the ideal canonical format for storing data long-term is plaintext. This is self-evident to any programmer. Code is plaintext. Humans write in text—using text editors, some of which are more than 40 years old—and we’ve even created entire operating systems (Unix) built around the idea that the file system is a tree of plaintext files. Plaintext is the alpha and the omega of digital files.

I first encountered Markdown when Gruber posted something about it to the BBEdit mailing list toward the end of 2004. (At the time, most worthwhile discussions happened on BBSes or over email.) Like most people, I was able to memorize Markdown in an afternoon because we were already using half of it.

I liked Markdown so much that I took the parser and adapted it to spit out LaTeX, a system for typesetting documents, which I could then convert to PDF and print. I had never written a line of Perl (the language Gruber wrote Markdown in), nor had I ever attempted a regular expression (regular expressions make up the bulk of the code in the Markdown parser), but the code was out there, so why not try? It worked.
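To give a feel for what that kind of adaptation involves, here is a minimal sketch of regex-driven conversion from Markdown to LaTeX. It is written in Python rather than Perl, it is not Gruber's code or my old adaptation of it, and the function name `markdown_to_latex` is made up for illustration. It handles only top-level headings, bold, and italics, and it assumes the input contains no characters that LaTeX treats as special.

```python
import re

def markdown_to_latex(text):
    """Toy converter: a few regex substitutions, applied in a fixed order.

    Hypothetical helper for illustration only. It does not escape LaTeX
    special characters and ignores most of Markdown's syntax.
    """
    # "# Title" at the start of a line becomes \section{Title}.
    text = re.sub(r"^# (.+)$", r"\\section{\1}", text, flags=re.MULTILINE)
    # Doubled asterisks are handled before single ones, so that
    # **bold** is not mistaken for two italic markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"\\textbf{\1}", text)
    # Whatever single-asterisk pairs remain are treated as emphasis.
    text = re.sub(r"\*(.+?)\*", r"\\emph{\1}", text)
    return text

if __name__ == "__main__":
    sample = "# Title\n\nSome *plain* and **bold** text."
    print(markdown_to_latex(sample))
```

The same plaintext could be fed to a second set of substitutions that emits HTML instead, which is the whole appeal: one source, several outputs.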

Markdown became a core part of how I wrote. The simplicity and flexibility meant I would live the dream of write once, run anywhere. It did lead to some ambiguity, though. Gruber would probably say this is by design. His emphasis throughout the Markdown documentation is on the syntax of Markdown, not—say—the resulting HTML. His Perl script does not support HTML class names or IDs, for example, so you can’t add those to the generated HTML. By the logic of the original Markdown script, if you want complete control over the HTML output, then you’d need to write in HTML.

This situation is great for Markdown users: that is, writers. It’s less great for programmers. In fact, it drives them crazy. Programmers do not like ambiguity. It goes against so much of what programming is about. As a writer using Markdown, I love that I can pick whichever particular version is best suited to my needs. As a programmer, I hate that when I build something I have to make this same decision, which then affects all the people who use my finished product. Maybe I didn’t support some specific extension they were expecting because they’ve always used the same Markdown parser and assume that feature is available.

If this weren’t bad enough, there are also some ambiguities in the syntax. For example, asterisks are used for italics when singular (*like this*) and bold when doubled (**like this**). So far so good. But what should happen if you write **like* this**? Should the whole phrase come out bold, with a literal asterisk stranded inside it? Or should part of it come out italic, with stray asterisks left over? There’s no way to know; whoever is writing the parser has to make that decision.
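A small sketch shows how quickly that decision turns into different output. The two functions below are deliberately naive and entirely hypothetical (they are not taken from any real Markdown parser). They use the same two regular expressions and differ only in which one runs first.

```python
import re

# Deliberately naive patterns, for illustration only.
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")

def bold_first(text):
    """Resolve doubled asterisks first, then single ones."""
    text = BOLD.sub(r"<strong>\1</strong>", text)
    return ITALIC.sub(r"<em>\1</em>", text)

def italic_first(text):
    """Resolve single asterisks first, then doubled ones."""
    text = ITALIC.sub(r"<em>\1</em>", text)
    return BOLD.sub(r"<strong>\1</strong>", text)

if __name__ == "__main__":
    ambiguous = "**like* this**"
    print(bold_first(ambiguous))    # <strong>like* this</strong>
    print(italic_first(ambiguous))  # <em>*like</em> this**
```

Neither result is wrong according to the original syntax guide, which does not say which reading wins. Both functions agree on the simple case of *like this*; they part ways only once the asterisks stop pairing up neatly.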

What’s more, unlike most extremely successful pieces of code, Markdown is not publicly hosted on the code-sharing site du jour. It doesn’t have hundreds of people contributing to it, and the last time the original Perl script was updated was 2004. This too rubs programmers the wrong way. We’re a cliquish bunch; things outside the clique are viewed with suspicion.

About a decade ago, there was an effort to eliminate the ambiguities in Markdown and bring it into line with coding dogma. Some programmers got together and created CommonMark, which makes the choices the original Markdown script doesn’t and came up with what its creators think is the One Right Way to Do It.

CommonMark offered comfort. It’s on GitHub. It has a discussion forum. It seems to be an active project. I have never personally incorporated CommonMark into a project, but its parsers are what convert your Markdown to HTML on such popular sites as Stack Overflow, GitHub, and Reddit. (To eliminate the asterisk ambiguity, for example, it proposed underscore for italics, asterisk for bold.) Presumably the developers behind CommonMark consider it a success.

But it’s not Markdown. Not in name, and I would argue not in spirit.

Around the time the CommonMark effort was happening, the software developer Dave Winer told me something I still think about: Markdown belongs to everyone who uses it. This is literally true because of the license. But it also reminded me of the real point of free software. We all have a say in it: by using it, by adapting it, even by forking it.

Whether Gruber intended it this way or not, Markdown does belong to everyone, and there is no standard. I use a very old version of Markdown for Python. Gruber presumably still uses his Perl script. Other people use other versions. It’s messy. It’s ambiguous. It’s human.

And this, in the end, is the Way.
