YAML and Markdown

2019-11-13 - Progress - Tony Finch

This web site is built with a static site generator. Each page on the site has a source file written in Markdown. Various pieces of metadata (sidebar links, title variations, blog tags) are set in a block of YAML front matter at the top of each file.

Both YAML and Markdown are terrible in several ways.

YAML is ridiculously over-complicated, and its minimal syntax can hide minor syntax errors, turning them into semantic errors. (A classic example is a list of two-letter country codes, in which Norway (NO) is transmogrified into False under YAML 1.1's rules.)
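To see how NO turns into a boolean, here is a toy sketch of YAML 1.1's implicit typing for plain (unquoted) scalars. It only models the boolean forms listed for the YAML 1.1 bool type; the function name `resolve_plain_scalar` is hypothetical, and a real parser also resolves integers, floats, nulls, timestamps and more.

```python
# Toy model of YAML 1.1 implicit boolean resolution for plain (unquoted) scalars.
# Only booleans are handled here; any other text is returned unchanged as a string.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE",
               "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE",
                "off", "Off", "OFF"}


def resolve_plain_scalar(text):
    """Return the value a YAML 1.1 parser would give an unquoted scalar (booleans only)."""
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text


countries = [resolve_plain_scalar(code) for code in ["GB", "IE", "NO", "SE"]]
print(countries)  # ['GB', 'IE', False, 'SE']
```

Quoting the value ('NO') keeps it a string. YAML 1.2's core schema narrows booleans to true and false (in three capitalisations), but some widely used parsers, such as PyYAML, still follow the 1.1 rules.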

Markdown is poorly defined, and has a number of awkward edge cases where its vagueness causes gotchas. It has spawned several dialects to fill in some of its inadequacies, which causes compatibility problems.

However, they are both extremely popular and relatively pleasant to write and read.

For this web site, I have found that a couple of simple sanity checks are really helpful for avoiding cockups.

YAML documents

One of YAML's peculiarities is its idea of storing multiple documents in a stream.

A YAML document consists of a --- marker followed by a YAML value. You can have multiple documents in a file, like these two:

---
document: one
---
document: two

YAML values don't have to be key/value maps: they can also be simple strings. So you can also have a two-document file like:

--- one
--- two

YAML has a complicated variety of multiline string syntaxes. For the simple case of a preformatted string, you can use the | sigil. This example is like the previous one, except that each string ends with a newline:

--- |
one
--- |
two

YAML front matter

The source files for this web site each start with something like this (using this page as an example, and cutting off after the title):

---
tags: [ progress ]
authors: [ fanf2 ]
--- |
YAML and Markdown
=================

This is a YAML stream consisting of two documents, the front matter (a key/value map) and the Markdown page body (a preformatted string).

There's a fun gotcha. I like to use underlined headings because they stand out in my editor. If I ever have a three-letter subheading, its underline is ---, which splits the source file into a third YAML document. Oops!

So my static site generator's first sanity check is to verify there are exactly two YAML documents in the file.
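A minimal sketch of such a check, assuming the site's convention that every document starts with an explicit --- line. Rather than using a full YAML parser (a real generator would more likely count the documents returned by its YAML library's multi-document loader), it only looks for lines YAML treats as a document start: --- in the first column, followed by whitespace or the end of the line. The function names `count_documents` and `check_two_documents` are hypothetical.

```python
import re

# A line that starts a YAML document: "---" in column 0, then whitespace or end of line.
# This is an approximation; it does not understand the rest of the YAML grammar.
DOC_START = re.compile(r"^---(?:[ \t]|$)")


def count_documents(source):
    """Count lines that YAML would treat as document-start markers."""
    return sum(1 for line in source.splitlines() if DOC_START.match(line))


def check_two_documents(source):
    """Raise if the file is not exactly front matter plus a Markdown body."""
    n = count_documents(source)
    if n != 2:
        raise ValueError(f"expected 2 YAML documents, found {n}")


good = """---
tags: [ progress ]
authors: [ fanf2 ]
--- |
YAML and Markdown
=================
"""

# A three-letter subheading gets a "---" underline,
# which YAML reads as the start of a third document.
bad = good + "\nFoo\n---\n"

check_two_documents(good)  # passes silently
try:
    check_two_documents(bad)
except ValueError as err:
    print(err)  # expected 2 YAML documents, found 3
```

The regular expression deliberately ignores lines like ---foo, which YAML does not treat as markers, while still catching a bare --- heading underline.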

Aside: There is also a YAML document end marker, ... (three dots), but I have not had problems with pages accidentally truncated by it!

Tabs and indentation

Practically everything (terminals, editors, pagers, browsers...) by default has tab stops every 8 columns. It's a colossal pain in the arse to have to reconfigure everything for different tab stops, and even more of a pain in the arse if you have to work on projects that expect different tab stop settings. (PostgreSQL is the main offender of the projects I have worked with, bah.)

I don't mind different coding styles, or different amounts of indentation, so long as the code I am working on has a consistent style. I tend to default to KNF (the Linux / BSD kernel normal form) when I'm working on my own stuff; it uses one tab per indent level.

The only firm opinion I have is that if you are not using 8 column tab stops and tabs for indents, then you should use spaces for indents.

Indents in Markdown

Markdown uses indentation for structure, either a 4-space indent or a tab indent. This is a terrible footgun if tabs are displayed in the default way and you accidentally have a mixture of spaces and tabs: an indent that looks 8 columns wide might be one indent level (a tab) or two (eight spaces), and the difference is mostly invisible.
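The ambiguity can be made concrete by measuring an indent twice: once with the 8-column tab stops an editor or terminal uses by default, and once with the 4-column tab stops that CommonMark specifies for block structure. The helpers below are a hypothetical sketch built on Python's `str.expandtabs`; counting every 4 columns as one Markdown indent level is a simplification of the real rules for lists and code blocks.

```python
def indent_columns(line, tabsize):
    """Width of the leading whitespace after expanding tabs to `tabsize`-column stops."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


def describe(line):
    """Return (columns shown in an 8-column-tab editor, Markdown indent levels)."""
    on_screen = indent_columns(line, 8)  # what the editor displays
    markdown = indent_columns(line, 4)   # what a CommonMark parser measures
    return on_screen, markdown // 4


tab_line = "\tcode"
space_line = "        code"  # eight spaces

print(describe(tab_line))    # (8, 1)
print(describe(space_line))  # (8, 2)
```

Both lines look identical on screen, yet Markdown sees one indent level for the tab and two for the spaces.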

So my static site generator's second sanity check is to ensure there are no tabs in the Markdown.

This is a backup check, in case my editor configuration is wrong and unintentionally leaks tabs.
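A minimal sketch of the tab check, reporting the line numbers of any tabs so they are easy to find. The function name `find_tabs` is hypothetical; a generator would presumably run it on the Markdown body after separating it from the front matter.

```python
def find_tabs(markdown):
    """Return 1-based line numbers of lines containing a tab character."""
    return [number for number, line in enumerate(markdown.splitlines(), start=1)
            if "\t" in line]


body = "Heading\n=======\n\n    indented with spaces\n\tindented with a tab\n"
bad_lines = find_tabs(body)
if bad_lines:
    print(f"tabs found on lines {bad_lines}")  # tabs found on lines [5]
```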