Decide whether the attribute value reader and inline preprocessor preemptively resolve escaped backslashes
The text of a paragraph goes through two phases of inline parsing: the inline preprocessor and the inline parser. The value of an attribute entry goes through two phases as well: the value assembler and the inline preprocessor. (That value subsequently goes through the inline parser when it's referenced in a paragraph.) At each stage, there's syntax that can be escaped using a backslash. For example, in the value of an attribute entry, an attribute reference or a value continuation can be escaped using a backslash. Consider the following case:
:hint: Use \{backslash} to insert \\
When the hint attribute is referenced in a paragraph, we expect to see the following in the rendered document:
Use {backslash} to insert \
We need to consider how we end up at that result.
There are two strategies for how escaped backslashes can be handled as the processor works through the inline parsing phases.
Strategy 1: Resolve escaped backslashes per phase (strict)
Following this strategy, each time the processor looks for escaped backslashes, it resolves (or normalizes) them. That entails consuming the odd backslash (if present) as an escape, then reducing the remaining backslashes by half. Consider this sequence:
\\\
That would resolve to:
\
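The per-phase rule can be sketched as a function over the length of a backslash run. This is a minimal sketch, assuming a phase only looks at the number of consecutive backslashes that immediately precede some escapable syntax; `resolve_strict` is a hypothetical helper, not part of any AsciiDoc implementation.

```python
def resolve_strict(run: int) -> tuple[int, bool]:
    """Resolve a run of `run` consecutive backslashes under the strict rule.

    Hypothetical helper. Returns (literal_backslashes_left, escapes_next_syntax).
    """
    # An odd backslash, if present, is consumed as the escape.
    escapes = run % 2 == 1
    remaining = run - 1 if escapes else run
    # Each escaped pair collapses to a single literal backslash.
    return remaining // 2, escapes


print(resolve_strict(3))  # (1, True): the \\\ sequence leaves one literal backslash
print(resolve_strict(8))  # (4, False)
```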
The benefit of this strategy is that it can account for every permutation of backslash escaping: if you want a backslash to be treated as a literal backslash, you just add more backslashes. However, this strategy quickly leads to the leaning toothpick problem, which is essentially an exponential increase in the number of required backslashes, since the count doubles with each phase the text passes through.
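The growth can be stated as a short recurrence. Assuming every phase halves the run of backslashes (as strategy 1 does), work backward from the single literal backslash we want: each phase doubles the count, so surviving k phases takes 2^k backslashes. The following example passes through three phases (value assembler, inline preprocessor, inline parser), and one more backslash is consumed as the value continuation:

```latex
n_0 = 1, \qquad n_i = 2\,n_{i-1} \;\Longrightarrow\; n_k = 2^k
\qquad\text{so}\qquad
n_{\text{source}} = 2^3 + 1 = 9
```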
Let's assume that we start with the following AsciiDoc source:
:command: *begin*
:text: Use ??
{command} to begin a block.
We want to see the following in the output document:
Use \<strong>begin</strong> to begin a block.
The question is, how many trailing backslashes do we need in place of ?? to produce a literal backslash without disrupting the attribute reference and the text formatting that follow it? The answer is 9.
:text: Use \\\\\\\\\
{command} to begin a block.
The last backslash acts as a value continuation. The value assembler then reduces the even number of backslashes that precede it by half, leaving 4. At this stage, this is what the processor sees:
Use \\\\{command} to begin a block.
Next, the inline preprocessor resolves the attribute reference, once again reducing by half the even number of backslashes that precede it, leaving 2. At this stage, here's what the processor sees:
Use \\*begin* to begin a block.
When the {text} attribute reference is used in a paragraph, the inline parser locates the escaped backslash in the resolved value and once again reduces the backslashes by half, leaving a single literal backslash. Because the pair was even, no backslash is left over to escape the text formatting. Thus, we arrive at the following result:
Use \<strong>begin</strong> to begin a block.
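The walkthrough above can be traced with a small simulation of strategy 1. This is a simplified model, not an implementation of any AsciiDoc processor: it assumes each phase recognizes exactly one construct (a trailing value continuation, a `{name}` reference, and constrained `*strong*` text respectively), and the helper names are hypothetical. Every phase halves the backslash run in front of the construct it handles.

```python
import re


def halve_run(run: int) -> tuple[int, bool]:
    # Strict rule: an odd backslash is consumed as the escape (or continuation),
    # then the remaining escaped pairs are halved.
    escaped = run % 2 == 1
    return (run - 1 if escaped else run) // 2, escaped


def assemble_value(lines: list[str]) -> str:
    # Phase 1 (value assembler): an odd trailing run ends in a value continuation.
    value = ""
    for line in lines:
        run = len(line) - len(line.rstrip("\\"))
        kept, continued = halve_run(run)
        value += line[: len(line) - run] + "\\" * kept
        if not continued:
            break
    return value


def preprocess(text: str, attrs: dict[str, str]) -> str:
    # Phase 2 (inline preprocessor): resolve {name} unless an odd backslash escapes it.
    def sub(m: re.Match) -> str:
        kept, escaped = halve_run(len(m.group(1)))
        ref = "{" + m.group(2) + "}"
        return "\\" * kept + (ref if escaped else attrs[m.group(2)])

    return re.sub(r"(\\*)\{(\w+)\}", sub, text)


def inline_parse(text: str) -> str:
    # Phase 3 (inline parser), modelled only for constrained *strong* text.
    def sub(m: re.Match) -> str:
        kept, escaped = halve_run(len(m.group(1)))
        word = m.group(2)
        body = "*" + word + "*" if escaped else "<strong>" + word + "</strong>"
        return "\\" * kept + body

    return re.sub(r"(\\*)\*(\w+)\*", sub, text)


attrs = {"command": "*begin*"}
lines = ["Use " + "\\" * 9, "{command} to begin a block."]
value = assemble_value(lines)        # holds: Use \\\\{command} to begin a block.
expanded = preprocess(value, attrs)  # holds: Use \\*begin* to begin a block.
print(inline_parse(expanded))        # Use \<strong>begin</strong> to begin a block.
```

Changing `9` to `8` or `10` in `lines` breaks the result, which is the fragility strategy 1 imposes on authors.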
While this works, it's hard to explain to an author, especially one not familiar with the low-level phases, why 9 backslashes are needed. Thus, I think we should consider strategy 2.
Strategy 2: Only resolve escaped backslashes once, during inline parsing
In this strategy, each phase still checks for escaped backslashes (to decide whether the syntax that follows them is escaped), but leaves them as is until inline parsing (the last phase). That way, they remain stable through the phases rather than being reduced at each stage. As a result, the user only needs to escape a backslash once.
Revisiting the previous example, the author only needs 3 trailing backslashes to achieve the desired result.
:command: *begin*
:text: Use \\\
{command} to begin a block.
The odd backslash is consumed as the value continuation. The remaining escaped backslash is reduced to a literal backslash by the inline parser.
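The same simplified model can be adapted to strategy 2. As before, this is a sketch under stated assumptions with hypothetical helper names: the early phases use `peel`, which consumes an odd escape (or continuation) backslash but passes escaped pairs through untouched, and only the inline parser uses `halve`.

```python
import re


def peel(run: int) -> tuple[int, bool]:
    # Early phases: consume the odd backslash (if any), keep escaped pairs as is.
    escaped = run % 2 == 1
    return (run - 1 if escaped else run), escaped


def halve(run: int) -> tuple[int, bool]:
    # Inline parser only: consume the odd backslash, then halve the pairs.
    escaped = run % 2 == 1
    return (run - 1 if escaped else run) // 2, escaped


def assemble_value(lines: list[str]) -> str:
    # Value assembler: an odd trailing run ends in a value continuation.
    value = ""
    for line in lines:
        run = len(line) - len(line.rstrip("\\"))
        kept, continued = peel(run)
        value += line[: len(line) - run] + "\\" * kept
        if not continued:
            break
    return value


def preprocess(text: str, attrs: dict[str, str]) -> str:
    # Inline preprocessor: parity decides escaping; pairs are not reduced.
    def sub(m: re.Match) -> str:
        kept, escaped = peel(len(m.group(1)))
        ref = "{" + m.group(2) + "}"
        return "\\" * kept + (ref if escaped else attrs[m.group(2)])

    return re.sub(r"(\\*)\{(\w+)\}", sub, text)


def inline_parse(text: str) -> str:
    # Inline parser (constrained *strong* only): the one place pairs are halved.
    def sub(m: re.Match) -> str:
        kept, escaped = halve(len(m.group(1)))
        word = m.group(2)
        body = "*" + word + "*" if escaped else "<strong>" + word + "</strong>"
        return "\\" * kept + body

    return re.sub(r"(\\*)\*(\w+)\*", sub, text)


attrs = {"command": "*begin*"}
lines = ["Use " + "\\" * 3, "{command} to begin a block."]
value = assemble_value(lines)        # holds: Use \\{command} to begin a block.
expanded = preprocess(value, attrs)  # holds: Use \\*begin* to begin a block.
print(inline_parse(expanded))        # Use \<strong>begin</strong> to begin a block.
```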
The drawback of this strategy is that it's not possible to use a backslash to escape the resolved value of an attribute. Let's assume that we want the following output instead:
Use *begin* to begin a block.
If we use \\{command}, then we're going to end up with \\*begin* rather than \*begin*, which the inline parser turns into a literal backslash followed by strong text. So we've sacrificed some flexibility for simplicity. However, there's still a mechanism available to achieve the desired result. If we set the esc attribute to a single backslash, then it becomes possible to insert an escape character in front of the resolved value of the attribute. Consider this case:
:esc: \
:text: Use {esc}\
{command} to begin a block.
Now when we reference {text}, the inline parser will see \*begin*. That means the output will show:
Use *begin* to begin a block.
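A sketch of why the esc workaround succeeds under strategy 2 where \\{command} does not. It assumes the inline preprocessor substitutes references in a single left-to-right pass, so a backslash inserted by {esc} is never reconsidered as an escape for the {command} reference that follows it; Python's `re.sub` behaves that way because it does not rescan replacement text. The helpers are hypothetical, and the value is shown after the continuation has already been joined.

```python
import re

ATTRS = {"esc": "\\", "command": "*begin*"}  # esc holds a single backslash


def preprocess(text: str, attrs: dict[str, str]) -> str:
    # Strategy 2: an odd backslash escapes the reference and is consumed;
    # escaped pairs are kept as is. Replacement text is not rescanned.
    def sub(m: re.Match) -> str:
        run = len(m.group(1))
        if run % 2 == 1:
            return "\\" * (run - 1) + "{" + m.group(2) + "}"
        return "\\" * run + attrs[m.group(2)]

    return re.sub(r"(\\*)\{(\w+)\}", sub, text)


def inline_parse(text: str) -> str:
    # Only this phase halves backslash runs (modelled for *strong* only).
    def sub(m: re.Match) -> str:
        run = len(m.group(1))
        if run % 2 == 1:
            return "\\" * ((run - 1) // 2) + "*" + m.group(2) + "*"
        return "\\" * (run // 2) + "<strong>" + m.group(2) + "</strong>"

    return re.sub(r"(\\*)\*(\w+)\*", sub, text)


# Workaround: {esc} inserts the escape in front of the resolved value.
print(inline_parse(preprocess("Use {esc}{command} to begin a block.", ATTRS)))
# Use *begin* to begin a block.

# Drawback: an escaped backslash before the reference stays a literal backslash.
print(inline_parse(preprocess("Use \\\\{command} to begin a block.", ATTRS)))
# Use \<strong>begin</strong> to begin a block.
```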
NOTE: The value of the implicit backslash attribute will need to be \\ rather than \ so it produces a literal backslash as expected.
Proposed decision
Given that the audience for AsciiDoc is broader than just programmers, I think the simpler approach is best here. We want to avoid the leaning toothpick problem, and we want to be able to easily explain the AsciiDoc rules without having to make the user aware of all the low-level phases. There are still plenty of mechanisms available in AsciiDoc to escape syntax without having to rely on strict backslash escaping.
It's worth noting that none of the scenarios mentioned in this issue are even possible in Asciidoctor or its predecessor. That's because both only check whether the character immediately preceding the reserved syntax (e.g., an attribute reference) is a backslash, not whether that backslash is itself escaped. So this issue is primarily a refinement of #25.