Clarify how empty lines and list continuations impact list boundaries
Lists have implicit boundaries in AsciiDoc (and in most lightweight markup languages). Hence, a common concern for an author is how to maintain the boundaries of a list or how to break out of them. Our goal is to ensure that it's easy for an author to keep a list together when needed, but also easy to separate lists when they shouldn't be adjoined.
In AsciiDoc, there are two forms that impact this outcome: the list continuation and the empty line. In this issue, we'll clarify what impact these two forms have on list parsing, with the intent to solidify the rules of list boundaries.
Scenarios
To understand how the grammar rules are defined, we'll be examining several scenarios:
- [l-1] Empty lines between list items (after the list item definitively ends)

    * first item

    * second item
- [l-2] Empty lines above a block attached with a list continuation, as well as between its metadata lines

  Above the attached block:

    * item
    +

    attached block

  Between the metadata lines as well:

    * item
    +

    [#idname]

    attached block with an ID
- [l-3] Empty lines above a new list with or without block metadata following a list item

  Without block metadata:

    * item

    . nested list

  With block metadata:

    * item

    []
    . nested or sibling list?
- [l-4] Empty lines above an indented (literal) block with or without block metadata following a list item

  Without block metadata:

    * item

      indented

  With block metadata:

    * item

    []
      indented
Although it won't be discussed in this issue, an empty line above a list continuation applies that list continuation to an ancestor list. The number of empty lines equates to how many levels it ascends (e.g., one empty line means it applies to the parent).
l-1
Let's start with [l-1], since a decision here sets the foundation for what rules are available for the other scenarios. We often refer to [l-1] as ventilated list items because authors tend to put some space between list items to make them more readable. The question is, how much space is allowed? Consider the following case:
* first item

* second item
In pre-spec AsciiDoc, any number of empty lines is tolerated between list items and the list will still stay together. To tighten this rule, and make it easier to separate lists, one proposal is to permit only a single empty line between list items. Any more and the list would be severed. However, this proposal could have major compatibility implications as there are many documents in the wild that rely on arbitrary ventilation. Furthermore, this new rule would be inconsistent with other lightweight markup languages such as Markdown and reStructuredText. It just seems to be part of the unwritten code of lightweight markup languages to allow list items to be separated by an arbitrary number of empty lines. (One notable exception is Textile.) If we decide to honor that code, then we won't be able to allow adjacent lists that have the same marker to be separated using empty lines alone.
Currently, the main way to separate adjacent lists that are congruent (i.e., same list marker) is to insert a block attribute line between them (with or without a preceding empty line). The block attribute line can be empty (i.e., nothing between the square brackets). Since a sibling list item cannot have metadata lines above it, this line effectively acts as an interrupting line. As a result, it causes the first list to end and a new one to begin. For example:
* first list
[]
* second list
This technique also works if the second list is preceded by a block title line, though it usually has to follow an empty line in order to be recognized as a block title line (typical rules).
If the lists are not congruent, the empty line above the block attribute line is required (to account for the scenario in [l-3]).
To make the intent of the interrupting line clearer, a non-functional option could be used to communicate the block attribute line's purpose:
* first list
[%interrupt]
* second list
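To make the interrupting behavior concrete, here's a minimal Python sketch. It is not how any AsciiDoc processor is implemented. `split_congruent_lists` is a hypothetical helper that assumes every list item is a single line using the `*` marker and that any line wrapped in square brackets is a block attribute line. It only illustrates that empty lines alone never split the list, while a block attribute line does.

```python
def split_congruent_lists(lines):
    """Group single-line `* ` items into lists (hypothetical, simplified model).

    Assumptions: only the `*` marker is used, items are one line each, and a
    line of the form `[...]` is a block attribute line. Any other line is
    ignored by this sketch.
    """
    lists, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Empty lines are tolerated in any number; they never end the list.
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            # A block attribute line interrupts: close the list in progress.
            if current:
                lists.append(current)
                current = []
            continue
        if stripped.startswith("* "):
            current.append(stripped[2:])
    if current:
        lists.append(current)
    return lists


print(split_congruent_lists(["* first item", "", "", "* second item"]))
# -> [['first item', 'second item']]
print(split_congruent_lists(["* first list", "[%interrupt]", "* second list"]))
# -> [['first list'], ['second list']]
```

The same result is produced whether or not an empty line precedes the block attribute line, which mirrors the rule for congruent lists described above.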
Another technique to keep them apart is to enclose one of the lists in an open block:
* first list
~~~~
* second list
~~~~
We are still considering whether there are other ways to separate adjacent lists, such as using a line comment. However, we generally prefer comments to not impact parsing, so this may not be pursued. Either way, it will be addressed in a separate issue.
l-2
Let's now consider [l-2]. Here, there's an argument to once again be tolerant of multiple empty lines, but for different reasons. Before going on, it's important to emphasize that the list continuation cannot be preceded by an empty line (otherwise, it becomes a list continuation for an ancestor list). When the list continuation is found, it effectively tells the parser to expect a single block. Normally, a block can have empty lines above it (either above the metadata or between its metadata lines). Thus, it seems like it would be safe and consistent to allow them here too. The intent of the author is clear: "find one block to attach". If empty lines were not tolerated, then the parser would essentially be ignoring the request of the author and leave the list continuation dangling.
The AsciiDoc style guide should certainly encourage authors to not leave empty lines after a list continuation as it makes the attachment less clear. But from the standpoint of the parser, there's no real benefit of giving empty lines special meaning here.
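The "find one block to attach" behavior can be sketched as a simple forward scan. This is only an illustration under the assumption that the parser sees the source as a list of lines; `find_attached_block_start` is a hypothetical name, not a function from any AsciiDoc implementation.

```python
def find_attached_block_start(lines, continuation_index):
    """Return the index of the first line of the block that follows a list continuation.

    Hypothetical helper: empty lines after the `+` line are skipped rather
    than treated as the end of the list. The returned line may be a metadata
    line of the block. Returns None if the continuation dangles (no block
    follows before the end of input).
    """
    if lines[continuation_index].strip() != "+":
        raise ValueError("line at continuation_index is not a list continuation")
    for index in range(continuation_index + 1, len(lines)):
        if lines[index].strip():
            # First non-empty line: the block (or its metadata) starts here.
            return index
    return None


source = ["* item", "+", "", "", "[#idname]", "", "attached block with an ID"]
print(find_attached_block_start(source, 1))  # -> 4
```

Because the scan does not count the empty lines it skips, one empty line and several empty lines lead to the same attachment.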
l-3
In a list, there are two cases of an implicit list continuation: [l-3] a nested list and [l-4] an indented (literal) block. In these cases, the rules about when empty lines are tolerated are stricter.
Let's look at [l-3] first. If an adjacent list is encountered that has a different marker and no metadata lines, that list is attached as a child of the current list item regardless of how many empty lines are above it. Again, this comes from the empty line tolerance in lists across lightweight markup languages. While we could forbid consecutive empty lines, we'd be introducing a special rule just for this case, which would be hard to remember.
The primary question is, how many empty lines should be permitted if the adjacent list has metadata lines? Typically in AsciiDoc, a block attribute line acts as an interrupting line. Intuition would then tell us that a block attribute line above an adjacent list will cause the previous list to end. Consider this case:
* disc
[square]
** square
However, the AsciiDoc syntax grants a special exception here. If there's no empty line above the block attribute line, it acts as though there's an implicit list continuation above it. Thus, the second list becomes a child of the list item in the first list (hence a nested list).
But what happens if there's an empty line above the block attribute line? Consider this case:
* disc

[square]
** square
Now we are torn between two standard rules. On the one hand, we said earlier that a block attribute line is one way to separate adjacent lists (i.e., prevent nesting). On the other hand, there's an implicit list continuation above an adjacent list when that list is different.
There are two possibilities here. The first choice is that we stick with the idea that empty line + block metadata line acts as an interrupting line with a list. This rule matches pre-spec AsciiDoc. In that case, here's how the second list would need to be attached if preceded by an empty line:
* disc
+
[square]
** square
The second choice is that we tolerate at least one empty line, but not consecutive ones. This rule borrows from an earlier proposal. Since nowhere else in the AsciiDoc syntax do consecutive empty lines have a different meaning than a single empty line (especially above a block), I think the second rule would be a risky choice to introduce here. I'm inclined to reject the idea.
l-4
Finally, we arrive at [l-4]. Like with a nested list, an indented (literal) block has an implicit list continuation. If the indented block has no metadata lines, then it must be offset by at least one empty line or else it gets soaked up as part of the list item principal. Consider this case:
* item

  indented
Since we've already established that a nested list without metadata lines can be preceded by an arbitrary number of empty lines, it's both logical and consistent to allow it in this case as well.
Once again, we need to consider what happens if the indented block has metadata lines. Consider this case:
* item

[.output]
  indented
Pre-spec does not apply the implicit list continuation if the indented block has at least one metadata line. So the indented block would not be attached to the list item in this case. Instead, it would require an explicit list continuation to do so. However, if we want the rules of an implicit list continuation to be consistent, then we should attach the indented block if not preceded by any empty lines:
* item
[.output]
  indented
The block attribute line interrupts the list item principal, so the indented block should be a candidate for attachment in this case. This is not supported in pre-spec AsciiDoc, but we could add it now.
Summary and decisions
As we've stated in other issues, while formalizing AsciiDoc, we're trying to remain as consistent with how the language is currently interpreted as possible. At the same time, we need to address idiosyncrasies so the language is easy to understand, remember, and use.
With that in mind, we want to make it easy to keep lists together and also easy to separate them. The main subject of concern is empty lines. When are empty lines tolerated in a list, and are consecutive empty lines allowed? We established that pre-spec AsciiDoc, and lightweight markup languages in general, are quite tolerant of empty lines in a list. Any number of empty lines is permitted between list items of the same list, and empty lines are permitted following an explicit or implicit list continuation.
We considered the proposal of assigning meaning to consecutive empty lines so they act like a list interrupting line. While enticing, this proposal would greatly threaten compatibility and deviate from the unwritten code of lightweight markup languages. Thus, we don't think it's worth the risk.
We then clarified that adjacent lists can be separated using an empty line followed by a block attribute line. (If the two lists are congruent, the empty line is not required.) This is a pattern that was heavily promoted in pre-spec AsciiDoc and, as a result, plenty of documents now depend on it. It offers a definitive way to separate lists that can be made self-documenting.
We then accepted that empty lines should be permitted above a block attached using an explicit list continuation. The justification is that the intent of the explicit list continuation is clear and there's no reason to counter that intent by giving empty lines special meaning. The goal of the list continuation is to find a block, and the parser should proceed until it does.
We then considered whether one or more empty lines should be permitted when an implicit list continuation is being used to attach a block with metadata lines. Here we decided that the syntax should not be tolerant of empty lines. The reason is that tolerating them would break the contract that empty line + block attribute line can be used to separate adjacent lists. Either the block metadata line must not be preceded by an empty line, or the block must be attached using an explicit list continuation.
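As a compact restatement of these decisions, here's a hedged Python sketch that encodes them as a decision function. The `Candidate` type, its field names, and the result labels are all illustrative assumptions made for this sketch; they don't come from any AsciiDoc implementation. Note that attaching an indented block that has metadata and no empty line above it reflects the change proposed in this issue, not pre-spec behavior.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """What follows a list item. All names here are illustrative assumptions."""
    kind: str                   # "list" or "indented"
    empty_lines_above: int      # empty lines above the candidate (or above its metadata)
    has_metadata: bool = False  # a block attribute line precedes the candidate
    same_marker: bool = False   # for kind == "list": marker matches the current list
    explicit_continuation: bool = False  # a `+` line directly follows the list item


def resolve(c: Candidate) -> str:
    """Return "sibling", "attached", "principal", or "separate"."""
    if c.explicit_continuation:
        # The continuation asks for one block; empty lines after it are ignored.
        return "attached"
    if c.kind == "list" and c.same_marker:
        # Congruent lists: only a block attribute line separates them.
        return "separate" if c.has_metadata else "sibling"
    if c.has_metadata:
        # Implicit continuation with metadata: no empty lines are tolerated.
        # (For an indented block this is the proposed rule, not pre-spec.)
        return "attached" if c.empty_lines_above == 0 else "separate"
    if c.kind == "indented" and c.empty_lines_above == 0:
        # Not offset by an empty line: absorbed into the list item principal.
        return "principal"
    # Nested list or offset indented block without metadata: any number of empty lines.
    return "attached"


print(resolve(Candidate("list", 3, same_marker=True)))   # -> sibling
print(resolve(Candidate("list", 1, has_metadata=True)))  # -> separate
print(resolve(Candidate("list", 0, has_metadata=True)))  # -> attached
```

Keeping `empty_lines_above` as a count rather than a flag makes it visible that no branch distinguishes one empty line from several, which is the consistency argument made above.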