# AsciiDoc Language issues
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues (2024-03-06)

# Issue #48: Specify how location ranges are reported in the ASG
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/48 (Dan Allen, 2024-03-06)

An implementation can optionally include location information in the ASG of the parsed document. If it does include this location information, the values must match the location information in the expected data. The purpose of this issue is to specify the logic for how these location ranges are reported.
The optional `loc` property on every element in the ASG reports the location range of that node in the source. The range consists of the starting line, column, and file position of the element and the ending line, column, and file position of the element. Both the line and column values are 1-based. The column is meant to represent a visible column in the source. The file property captures the include target stack, when applicable, starting from the first include in the stack. (It's up to code analyzing the ASG to resolve the file value to an absolute representation).
Here's an example:
```
"location": [{ "line": 1, "col": 1 }, { "line": 3, "col": 4 }]
```
We're not tracking newline characters when calculating end locations, nor are we tracking empty lines that separate blocks in the ASG.
When computing the end location of a block, the column of the trailing newline is not included. The block ends at the visible location in the source document, not at the newline that follows it. For a delimited block with a delimiter length of 4, the end column is 4, not 5. There are two reasons for this. First, it points to a column in the source where the cursor can go. Second, it ensures that the end column for a block is consistent regardless of whether it's at the end of the document or somewhere in the middle of it.
The tracked location of the document is all the lines in the document, even those that precede or follow the first and last block, respectively.
The tracked start location of a delimited block is the start of the opening delimiter line (e.g., line: 1, col: 1) and the tracked end location of a delimited block is the end of the closing delimiter line (e.g., line: 3, col: 4).
The tracked start location of a paragraph or named non-delimited block is the start of the first line of content, and the tracked end location of a paragraph or named non-delimited block is the last visible character of the last line of content.
The tracked start location of block metadata is the first column of the first block attribute line and the tracked end location is the last column of the last block attribute line. The location of the block metadata comes before the start of the block itself (so the entire range of the block is the start location of the metadata to the end location of the block).
Normally, the lowest column value is 1. However, there are several cases when the column must be 0. First, if a document has no blocks, then the start and end column is 0; the 0 column indicates that the source does not occupy any space. If the first line of the document is empty, then the start column is 0, indicating that there is no content on the first line, only the newline that follows it. Similarly, if the contents of a verbatim block start with an empty line, then the start column of the content is 0, again indicating that there is no content on the first line, only the newline that follows it. If the content has a trailing empty line, then the end column is 0 for the same reason.
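The column rules above can be sketched in code. This is a minimal illustration with hypothetical helper names (`start_location`, `end_location`), not part of any implementation:

```python
# A minimal sketch of the column rules above (hypothetical helpers). Lines and
# columns are 1-based; column 0 signals that there is no visible content at
# that position.

def start_location(lines):
    """Start location of the content spanning the given source lines."""
    if not lines or all(line == "" for line in lines):
        return {"line": 1, "col": 0}   # no blocks: the source occupies no space
    # an empty first line has no content, only the newline that follows it
    return {"line": 1, "col": 0 if lines[0] == "" else 1}

def end_location(lines):
    """End location: the last visible character, never a trailing newline."""
    if not lines or all(line == "" for line in lines):
        return {"line": 1, "col": 0}
    return {"line": len(lines), "col": len(lines[-1])}  # "----" ends at col 4, not 5

delimited_block = ["----", "text", "----"]
print(start_location(delimited_block))  # {'line': 1, 'col': 1}
print(end_location(delimited_block))    # {'line': 3, 'col': 4}
```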
For an inline element, if the beginning or end of the element was resolved from an attribute reference, the location should be the start or end of the attribute reference, respectively. That's because the location range is meant to track the offsets in the source, not in the resolved value.
Any other nuances of the location range should be covered by this SDR.

Milestone: 0.4.0 (milestone build). Assignee: Dan Allen.

# Issue #47: Draft preliminary descriptions for block parsing, structural forms, and content models
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/47 (Sarah White, 2024-02-29)

Write up the initial specification document content for the following global block sections:
* Block element
* Block parsing
* Block structural forms
* Paragraph form
* Block content models
  * Basic
  * Verbatim
  * Compound
* (Possibly) Block style, macro name and variant
This content evolved out of working on the paragraph content and heavily draws from the SDRs. Most of all, it needs to get off my computer before it gets buried in a sea of other experimental branches :scream:.
The information will not be complete, and may possibly mention some concepts or terms that haven't been officially discussed or approved (I'll mark such items as "under consideration", but it is important to remember that everything in the specification is pre-alpha and is highly likely to be tweaked, if not significantly changed, depending on new discoveries and feedback from implementations).

Milestone: 0.4.0 (milestone build). Assignee: Sarah White.

# Issue #43: Decide whether the attribute value reader and inline preprocessor preemptively resolve escaped backslashes
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/43 (Dan Allen, 2024-02-28)

The text of a paragraph goes through two phases of inline parsing: the inline preprocessor and the inline parser. The value of an attribute entry goes through two phases as well: the value assembler and the inline preprocessor. (That value will subsequently go through the inline parser when referenced in a paragraph.) At each stage, there's syntax that can be escaped using a backslash. For example, in the value of an attribute entry, an attribute reference or a value continuation can be escaped using a backslash. Consider the following case:
```
:hint: Use \{backslash} to insert \\
```
When the `hint` attribute is referenced in a paragraph, we expect to see the following in the rendered document:
```
Use {backslash} to insert \
```
We need to consider how we end up at that result.
There are two strategies for how escaped backslashes can be handled as the processor works through the inline parsing phases.
## Strategy 1: Resolve escaped backslashes per phase (strict)
Following this strategy, each time the processor looks for escaped backslashes, it resolves (or normalizes) them. What that entails is consuming the odd backslash (if present) as an escape, then reducing the number of backslashes by half. Consider this sequence:
```
\\\
```
That would resolve to:
```
\
```
The benefit of this strategy is that it can account for every permutation of backslash escaping. If you want the backslash to be treated as a literal backslash, you just add more backslashes. However, this strategy quickly leads to the leaning toothpick problem: an exponential increase in the number of required backslashes.
Let's assume that we start with the following AsciiDoc source:
```
:command: *begin*
:text: Use ??
{command} to begin a block.
```
We want to see the following in the output document:
```
Use \<strong>begin</strong> to begin a block.
```
The question is, how many trailing backslashes do we need to use in place of ?? to produce a literal backslash without impacting the attribute reference and text formatting it contains? The answer is, we need 9.
```
:text: Use \\\\\\\\\
{command} to begin a block.
```
The last backslash acts as a value continuation. Then, it reduces the even number of backslashes that precede it by half, leaving us with 4. At this stage, this is what the processor sees:
```
Use \\\\{command} to begin a block.
```
Now we resolve the attribute reference, once again reducing the even number of backslashes that precede it by half. At this stage, here's what the processor sees:
```
Use \\*begin* to begin a block.
```
When the `{text}` attribute reference is used in the paragraph, the inline parser will locate the escaped backslash in the resolved value and once again reduce the backslashes by half, reducing it to 1 (which will not impact the text formatting). Thus, we arrive at the following result:
```
Use \<strong>begin</strong> to begin a block.
```
While this works, it's hard to explain to an author—especially someone not familiar with the low-level phases—why 9 backslashes are needed. Thus, I think we should consider strategy 2.
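The per-phase halving can be sketched in a few lines. This is an illustration of strategy 1 with a hypothetical helper name, not Asciidoctor's API:

```python
# A minimal sketch of strategy 1: each phase that handles escapes halves a
# trailing run of backslashes, consuming the odd one (if present) as the
# escape or continuation marker.

def reduce_escapes(run: str) -> str:
    """Resolve a run of backslashes in a single phase."""
    return "\\" * (len(run) // 2)

# Nine backslashes survive the three phases as one literal backslash:
run = "\\" * 9
for phase in ("value assembler", "inline preprocessor", "inline parser"):
    run = reduce_escapes(run)
    print(phase, len(run))  # 4, then 2, then 1
```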
## Strategy 2: Only resolve escaped backslashes once, during inline parsing
In this strategy, the escaped backslashes are still considered at each phase, but they are left as is until inline parsing (the last phase). That way, they remain stable through the phases rather than being reduced at each stage. As a result, the user only needs to escape a backslash once.
Revisiting the previous example, the author only needs 3 trailing backslashes to achieve the desired result.
```
:command: *begin*
:text: Use \\\
{command} to begin a block.
```
The odd backslash is consumed as the value continuation. The remaining escaped backslash is reduced to a literal backslash by the inline parser.
The drawback of this strategy is that it's not possible to use a backslash to escape the resolved value of an attribute. Let's assume that we want the following output instead:
```
Use *begin* to begin a block.
```
If we use `\\{command}`, then we're going to end up with `\\*begin*` rather than `\*begin*`. So we've sacrificed some flexibility for simplicity. However, there's still a mechanism available to achieve the desired result. If we set the `esc` attribute to a single backslash, then it becomes possible to insert an escape character in front of the resolved value of the attribute. Consider this case:
```
:esc: \
:text: Use {esc}\
{command} to begin a block.
```
Now when we reference `{text}`, the inline parser will see `\*begin*`. That means the output will show:
```
Use *begin* to begin a block.
```
**NOTE:** The value of the implicit `backslash` attribute will need to be `\\` rather than `\` so it produces a literal backslash as expected.
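Strategy 2 can also be sketched briefly. The helper names (`assemble_value`, `inline_parse`) are hypothetical, and only the backslash handling is modeled; the key point is that escaped backslashes survive the early phases untouched:

```python
# A minimal sketch of strategy 2: earlier phases only consume the odd trailing
# backslash that acts as a continuation; escaped backslashes are resolved
# exactly once, by the inline parser.

def assemble_value(line: str, next_line: str) -> str:
    """Value assembler: an odd trailing backslash is a value continuation."""
    trailing = len(line) - len(line.rstrip("\\"))
    if trailing % 2 == 1:
        return line[:-1] + next_line   # drop only the continuation marker
    return line                        # escaped backslashes are left intact

def inline_parse(text: str) -> str:
    """Inline parser: the one place where escaped backslashes are resolved."""
    return text.replace("\\\\", "\\")

value = assemble_value("Use \\\\\\", "{command} to begin a block.")
print(value)  # Use \\{command} to begin a block.
print(inline_parse(value.replace("{command}", "*begin*")))  # Use \*begin* to begin a block.
```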
## Proposed decision
Given that the audience for AsciiDoc is more than just programmers, I think the simplistic approach is best here. We want to avoid the leaning toothpick problem, and we want to be able to easily explain the AsciiDoc rules without having to make the user aware of all the low-level phases. There are still plenty of mechanisms available in AsciiDoc to escape syntax without having to rely on strict backslash escaping.
It's worth noting that none of the scenarios mentioned in this issue are even available in Asciidoctor or its predecessor. That's because both only consider whether the character that immediately precedes the reserved syntax (e.g., an attribute reference) is a backslash, not whether that backslash is itself escaped. So this issue is primarily a refinement of #25.

# Issue #41: Clarify syntax and parsing rules for continuing an attribute entry value across multiple lines
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/41 (Dan Allen, 2024-02-27)

Most of the time, an attribute entry occupies a single line. For example:
```
:source-language: java
```
When the value is very long, the AsciiDoc syntax allows that value to be split across multiple lines by ending each previous line in a backslash, called an *attribute continuation*. This feature is inspired by shell interpreters, such as Bash. For example:
```
:description: This page is a migration guide. \
It only covers the migration between each LTS release.
```
The attribute continuation has never been very well defined beyond a basic example. This issue aims to resolve the syntax and parsing rules while also making the feature more robust and universal.
The attribute continuation serves two purposes. First, it tells the parser to append the next line to the value, as long as that line is not an interrupting line. Second, it controls how the line break is handled: if the line is taken, the continuation and the newline that follows it are dropped; if the line is not taken, the continuation is preserved (meaning it remains part of the value), but the trailing newline is not.
Thus, the resolved value of the previous example is as follows:
```
This page is a migration guide. It only covers the migration between each LTS release.
```
**NOTE:** In addition to the interrupting lines for a paragraph, an attribute entry is interrupted by an adjacent attribute entry. Asciidoctor does not always get this requirement right. (Also, it's still unclear whether a list continuation should only be an interrupting line when inside of a list, or at any time).
Both Asciidoctor and its predecessor required the attribute continuation to be preceded by a space. However, this is an unnecessary requirement, and it makes it impossible to continue the value without introducing a space. It should be possible to use the continuation directly at the end of the line.
```
:product-code: ISV-\
1234
```
This attribute entry would produce the value `ISV-1234`.
Any time we rely on a character to have special meaning, especially a backslash, it should be possible to escape that character. Like with the inline preprocessor, we will want to apply contextual escaping here. What that means is that if there are an even number of backslashes at the end of the line, the last backslash does not act as an attribute continuation and those backslashes are reduced by half. If there are an odd number, the last backslash is an attribute continuation and the remaining backslashes are reduced by half. Escaped backslashes anywhere else in the line are not considered.
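The even/odd rule above can be sketched as follows. `split_continuation` is a hypothetical helper name, and only the trailing run of backslashes is examined, per the rule:

```python
# A sketch of the contextual escaping rule above: only the run of backslashes
# at the very end of the line is considered; backslashes elsewhere are ignored.

def split_continuation(line: str):
    """Return (resolved line, has_continuation) for one attribute entry line."""
    stripped = line.rstrip("\\")
    n = len(line) - len(stripped)          # count of trailing backslashes
    # odd: the last backslash continues the value; the rest are halved
    # even: no continuation; the escaped backslashes are halved
    return stripped + "\\" * (n // 2), n % 2 == 1

print(split_continuation("ISV-\\"))                    # ('ISV-', True)
print(split_continuation("escape markup using \\\\"))  # ('escape markup using \\', False)
print(split_continuation("using \\\\\\"))              # ('using \\', True)
```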
Here's an example of how to use a literal backslash at the end of a value:
```
:instructions: escape markup using \\
```
However, keep in mind that most of the time this won't be necessary. That's because the backslash is preserved if the attribute entry is interrupted, which it almost always is. So this is unlikely to affect existing documents. Consider this case:
```
:instructions: escape markup using \
{instructions}
```
Here's an example of how to use a literal backslash and then continue the value:
```
:instructions: escape an autolink using \\\
https://example.org
```
Again, these are pretty rare events, so we're just defining the rules for completeness.
An attribute continuation allows the continued value to be aligned with the value on the previous line. Yet, the indentation is dropped from the value. Consider this case:
```
:description: This page is a migration guide. \
It only covers the migration between each LTS release.
```
Shell interpreters also support this feature. In shell interpreters, the repeating spaces are always normalized to a single space. However, I don't think we want that behavior. Instead, all leading indentation should be removed and only the space to the left of the attribute continuation should be kept. That gives the user better control over where the space ends up in the resolved value.
Of course, we have to consider whether we even want to normalize the spaces at all or just keep them as entered. In other words, do we want to encourage this style of formatting in the AsciiDoc source, or should the wrapped line always start at the left margin?
The final point to consider is how to specify a hard wrap. Consider the case when the value of the attribute entry is going to be used in a verbatim block or a paragraph with the hardbreaks option. The author is going to want to be able to preserve the newlines in the attribute value so that they carry over. But this is not possible in AsciiDoc.
Asciidoctor offers a partial compromise by enhancing the attribute continuation to recognize a hard line break shorthand before the continuation. When Asciidoctor detects this case, it preserves the newline. Consider this case:
```
:lines: one + \
two + \
three
```
The resolved value would be as follows:
```
one +
two +
three
```
This is not a general-purpose feature, and thus I think we can do better. I see two possible ways to express that the newline should be preserved, and there's no need to link it to the hard line break shorthand.
The first option is to use a double attribute continuation offset by a space. For example:
```
:lines: one\ \
two\ \
three
```
The escaped space in front of the attribute continuation would tell the processor to keep the newline after the continuation. This syntax is not likely to interfere with content. However, it may be costly to parse.
Another option is to take a page from YAML and use the `|` character in front of the continuation as a hint to keep the newline.
```
:lines: one|\
two|\
three
```
However, the risk here is that the pipe character is used to separate table cells, so it could cause an AsciiDoc table cell to end prematurely. Though it could be escaped in that case.
Yet another option is to take a hint from Markdown and use multiple spaces in front of the continuation as a hint to preserve the ensuing newline.
```
:lines: one \
two \
three
```
This may be the safest and most portable option, and it's not terribly difficult to parse. It's rare that you need spaces at the end of a line, so we're able to take advantage of characters that would otherwise have no meaning. That's ideal for introducing a new feature. When newlines are preserved, indentation on wrapped lines is also preserved.
We can apply this to the earlier example of the partial syntax offered by Asciidoctor to see how it compares:
```
:lines: one + \
two + \
three
```
It's nearly the same syntax, but now it's not coupled to the hard line break shorthand.
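The two-space proposal can be sketched as follows. Note this models a proposal under discussion, not an implemented feature, and escaping, indentation handling, and interrupting-line checks are omitted:

```python
# A sketch of the proposed two-space rule: a continuation preceded by two
# spaces keeps the newline; a plain continuation drops it.

def assemble(lines):
    value = ""
    for line in lines:
        if line.endswith("\\"):
            body = line[:-1]
            if body.endswith("  "):        # two spaces: preserve the newline
                value += body[:-2] + "\n"
            else:                          # plain continuation: drop the newline
                value += body
        else:
            value += line
    return value

print(assemble(["one +  \\", "two +  \\", "three"]))
# one +
# two +
# three
print(assemble(["This page is a migration guide. \\",
                "It only covers the migration between each LTS release."]))
```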
* In summary, an attribute entry is interrupted by an adjacent attribute entry or paragraph interrupting line
* An attribute value can be continued to the next line by ending the line in an attribute continuation (trailing backslash)
* If the attribute continuation is unused, it is preserved at the end of the value
* Indentation is removed from wrapped lines
* The attribute continuation can be escaped using a backslash (any even number of backslashes at the end of the line)
* Newlines in an attribute value can be preserved by preceding the attribute continuation with two spaces

Milestone: 0.4.0 (milestone build).

# Issue #40: Define how document attributes are represented in the ASG
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/40 (Dan Allen, 2024-02-14)

As part of defining a semantic representation of an AsciiDoc document, we must determine where document attributes fit within that structure. It's not enough for document attributes to reside as state in the parser, since they can impact how nodes are interpreted after parsing is complete. In other words, the document attributes have to be tracked and represented in the ASG.
To start, we first need to establish that a document attribute is more than just a map entry (name/value pair). Where and how the attribute was declared (set or unset) matters too. Thus, we need to store document attributes as a value record. That record has the following properties:
* value - the value of the attribute if set, or `null` if explicitly unset
* source - records how the attribute was set; an enum that is one of the following values: external, header, body, intrinsic
* writable - whether the attribute can be changed or unset (i.e., mutable)
Thus, a document attribute entry consists of a name and value record. These entries are stored in an insertion-ordered map, where the map keys are the attribute names.
Let's assume the document header contains the following attribute entry:
```
:toc: left
```
Here's an example of the resulting entry in the document attributes map:
```
{
toc: {
value: 'left',
source: 'header',
writable: true,
}
}
```
With the information provided by the value records, it's possible to find document attributes that are immutable (i.e., locked), document attributes that have been explicitly unset, document attributes that are implicit, document attributes set from the API or CLI (and not later overridden), etc. This is all information that would not have been available if the document attributes were stored with their value and only when set.
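A hypothetical Python rendering of the value record makes these queries concrete. The field names follow this issue's text, not any published ASG schema:

```python
from dataclasses import dataclass
from typing import Optional

# A hypothetical rendering of the value record described above.

@dataclass
class AttributeValue:
    value: Optional[str]   # None when the attribute is explicitly unset
    source: str            # one of: 'external', 'header', 'body', 'intrinsic'
    writable: bool = True

# Insertion-ordered map keyed by attribute name (Python dicts preserve order):
attributes = {"toc": AttributeValue(value="left", source="header", writable=True)}

# Queries the records make possible:
explicitly_unset = [k for k, v in attributes.items() if v.value is None]
locked = [k for k, v in attributes.items() if not v.writable]
from_header = [k for k, v in attributes.items() if v.source == "header"]
print(explicitly_unset, locked, from_header)  # [] [] ['toc']
```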
The document attributes stored on the document node in the ASG should be stored using this value record. That map should be the compiled collection of document attributes up to the end of the document header (thus discarding any intermediate state).
AsciiDoc permits attributes to be set or unset using attribute entries in the document (in the header or in the body). Not all document attributes are declared this way. However, for those that are, those entries should be tracked in the ASG. Those records will need to be stored differently. That brings us to attribute entry nodes.
An attribute entry is recorded in an insertion-ordered map (or should it be an array?) for each group of attribute entries. In this case, the value of the entry can be the attribute value if set, or `null` if unset. An attribute declared using an attribute entry is always writable and the source is implicit, so there's no reason for using a value record in this case.
For attributes declared in the header, these attribute entries are stored on the `attributeEntries` property on the value of the `header` property on the document. For attributes declared in the body, these attribute entries are either stored on the `attributeEntries` property on the ensuing block, or stored as a sibling node of that block. (Prototyping is required in order to resolve this choice).
While it's true that all attributes declared in the header will already be represented in the map of attributes on the document, it's important to track them to ensure the document was correctly parsed. (As an alternative, the document attributes map could be all attributes set before the header, then the header attributes could be replayed onto the proxy, just like attributes declared in the body).
Within any attribute entries map, only the last entry for a given attribute name in that scope should be stored (i.e., locally impactful). Hence, if the attribute is set, then unset, no entry is stored. If the attribute is set, then set again, the second value is used. If the attribute is unset, then set, the set value of the attribute is stored. If the attribute is not mutable, the entry is discarded (since the attribute entry is effectively nullified).
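The "only the last locally impactful entry is stored" rule can be sketched as follows. `fold_entries` is a hypothetical helper, with `None` standing in for an unset:

```python
# A sketch of the folding rules above: set-then-unset leaves no entry,
# later declarations win, and entries for immutable attributes are discarded.

def fold_entries(declarations, writable):
    entries = {}
    for name, value in declarations:
        if not writable.get(name, True):
            continue                      # immutable: the entry is nullified
        if value is None and entries.get(name) is not None:
            del entries[name]             # set, then unset: no entry is stored
        else:
            entries[name] = value         # later declarations win
    return entries

decls = [
    ("a", "1"), ("a", None),   # set, then unset -> no entry
    ("b", "1"), ("b", "2"),    # set, then set again -> second value
    ("c", None), ("c", "3"),   # unset, then set -> the set value
    ("d", "x"),                # not writable -> discarded
]
print(fold_entries(decls, writable={"d": False}))  # {'b': '2', 'c': '3'}
```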
As the body of the document is parsed, the effective document attributes need to be tracked. However, the map of document attributes as it exists at the end of the header should not be modified. Instead, that object should be proxied. Any updates to the map from attribute entries in the body should be applied to the proxy during parsing. That way, attribute entries in the body don't impact the state of the document attributes bound to the document. Yet the parser will still have access to the updated map as parsing proceeds.
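The proxying idea maps naturally onto the standard library's `ChainMap`; this is one possible realization, not a prescribed implementation:

```python
from collections import ChainMap

# Body writes land in an overlay while the header map stays frozen.

header_attrs = {"toc": "left", "icons": "font"}   # as bound to the document node

proxy = ChainMap({}, header_attrs)   # lookups fall through; writes hit the overlay
proxy["icons"] = "image"             # an attribute entry in the body

print(proxy["icons"])         # image -> what the parser sees mid-body
print(proxy["toc"])           # left  -> inherited from the header map
print(header_attrs["icons"])  # font  -> the document's map is untouched
```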
If a node is accessed at random from the parsed document (ASG), the view of the document attributes needs to be built on demand by proxying the document attributes map on the document and replaying all the attribute entries in the body up to the start of the node. These proxy objects could be cached to avoid repeated and redundant computations.

# Issue #39: Clarify how empty lines and list continuations impact list boundaries
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/39 (Dan Allen, 2024-02-27)

Lists have implicit boundaries in AsciiDoc (and in most lightweight markup languages). Hence, a common concern for an author is how to maintain the boundaries of a list or how to break out of them. Our goal is to ensure that it's easy for an author to keep a list together when needed, but also easy to separate lists when they shouldn't be adjoined.
In AsciiDoc, there are two forms that impact this outcome, the list continuation and the empty line. In this issue, we'll clarify what impact these two forms have on list parsing with the intent to solidify the rules of list boundaries.
## Scenarios
To understand how the grammar rules are defined, we'll be examining several scenarios:
* **[l-1]** Empty lines between list items (after the list item definitively ends)
```
* first item
* second item
```
* **[l-2]** Empty lines above a block attached with a list continuation, as well as between its metadata lines
```
* item
+
attached block
```
```
* item
+
[#idname]
attached block with an ID
```
* **[l-3]** Empty lines above a new list with or without block metadata following a list item
```
* item
. nested list
```
```
* item
[]
. nested or sibling list?
```
* **[l-4]** Empty lines above an indented (literal) block with or without block metadata following a list item
```
* item
indented
```
```
* item
[]
indented
```
Although it won't be discussed in this issue, an empty line above a list continuation applies that list continuation to an ancestor list. The number of empty lines equates to how many levels it ascends (e.g., one empty line means it applies to the parent).
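The ancestor-selection rule above can be expressed as a one-line function (a hypothetical helper, shown only to make the counting concrete):

```python
# The count of empty lines directly above a list continuation selects how many
# levels of the list ancestry the attached block ascends.

def continuation_target_depth(current_depth: int, empty_lines_above: int) -> int:
    """0 empty lines -> the current item; 1 -> its parent; and so on."""
    return max(current_depth - empty_lines_above, 0)

print(continuation_target_depth(3, 0))  # 3 (attaches to the current, nested item)
print(continuation_target_depth(3, 1))  # 2 (one empty line: the parent list)
```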
### l-1
Let's start with **[l-1]**, since a decision here sets the foundation for what rules are available for the other scenarios. We often refer to **[l-1]** as ventilated list items. The reason is, authors have a tendency to want to put some space between list items to make them more readable. The question is, how much space is allowed? Consider the following case:
```
* first item
* second item
```
In pre-spec AsciiDoc, any number of empty lines is tolerated between list items and the list will still stay together. To tighten this rule, and to make it easier to separate lists, one proposal is to only permit a single empty line between list items. Any more and the list would be severed. However, this proposal could have major compatibility implications, as there are many documents in the wild that rely on arbitrary ventilation. Furthermore, this new rule would be inconsistent with other lightweight markup languages such as Markdown and reStructuredText. It just seems to be part of the unwritten code of lightweight markup languages to allow list items to be separated by an arbitrary number of empty lines. (One notable exception is Textile.) If we decide to honor that code, then we won't be able to allow adjacent lists that have the same marker to be separated using empty lines alone.
Currently, the main way to separate adjacent lists that are congruent (i.e., same list marker) is to insert a block attribute line between them (with or without a preceding empty line). The block attribute line can be empty (i.e., nothing between the square brackets). Since a sibling list item cannot have metadata lines above it, this line effectively acts as an interrupting line. As a result, it causes the first list to end and a new one to begin. For example:
```
* first list
[]
* second list
```
This technique also works if the second list is preceded by a block title line, though it usually has to follow an empty line in order to be recognized as a block title line (typical rules).
If the lists are not congruent, the empty line above the block attribute line is required (to account for the scenario in **[l-3]**).
To make the intent of the interrupting line more clear, a non-functional option could be used to communicate the block attribute line's function:
```
* first list
[%interrupt]
* second list
```
Another technique to keep them apart is to enclose one of the lists in an open block:
```
* first list
~~~~
* second list
~~~~
```
We are still considering whether there are other ways to separate adjacent lists, such as using a line comment. However, we generally prefer that comments not impact parsing, so this may not be pursued. Either way, it will be addressed in a separate issue.
### l-2
Let's now consider **[l-2]**. Here, there's an argument to once again be tolerant of multiple empty lines, but for different reasons. Before going on, it's important to emphasize that the list continuation cannot be preceded by an empty line (otherwise, it becomes a list continuation for an ancestor list). When the list continuation is found, it effectively tells the parser to expect a single block. Normally, a block can have empty lines above it (either above the metadata or in between it). Thus, it seems like it would be safe and consistent to allow them here too. The intent of the author is clear: "find one block to attach". If empty lines were not tolerated, then the parser would essentially be ignoring the request of the author, leaving the list continuation dangling.
The AsciiDoc style guide should certainly encourage authors to not leave empty lines after a list continuation as it makes the attachment less clear. But from the standpoint of the parser, there's no real benefit of giving empty lines special meaning here.
### l-3
In a list, there are two cases of an implicit list continuation, **[l-3]** a nested list and **[l-4]** an indented (literal) block. In these cases, the rules about when empty lines are tolerated are more strict.
Let's look at **[l-3]** first. If an adjacent list is encountered that's different and has no metadata lines, that list is attached as a child of the current list item regardless of how many empty lines are above it. Again, this comes from the empty line tolerance in lists across lightweight markup languages. While we could forbid consecutive empty lines, we'd be introducing a special rule just for this case, which will be hard to remember.
The primary question is, how many empty lines should be permitted if the adjacent list has metadata lines? Typically in AsciiDoc, a block attribute line acts as an interrupting line. Intuition would then tell us that a block attribute line above an adjacent list will cause the previous list to end. Consider this case:
```
* disc
[square]
** square
```
However, the AsciiDoc syntax grants a special exception here. If there's no empty line above the block attribute line, it acts as though there's an implicit list continuation above it. Thus, the second list becomes a child of the list item in the first list (hence a nested list).
But what happens if there's an empty line above the block attribute line? Consider this case:
```
* disc
[square]
** square
```
Now we are torn between two standard rules. On the one hand, we said earlier that a block attribute line is one way to separate adjacent lists (i.e., prevent nesting). On the other hand, there's an implicit list continuation above an adjacent list when that list is different.
There are two possibilities here. The first choice is that we stick with the idea that *empty line + block metadata line* acts as an interrupting line with a list. This rule matches pre-spec AsciiDoc. In that case, here's how the second list would need to be attached if preceded by an empty line:
```
* disc
+
[square]
** square
```
The second choice is that we tolerate at least one empty line, but not consecutive ones. This rule borrows from an earlier proposal. Since nowhere else in the AsciiDoc syntax do consecutive empty lines have a different meaning than a single empty line (especially above a block), I think the second rule would be a risky choice to introduce here. I'm inclined to reject the idea.
### l-4
Finally, we arrive at **[l-4]**. Like with a nested list, an indented (literal) block has an implicit list continuation. If the indented block has no metadata lines, then it must be offset by at least one empty line or else it gets soaked up as part of the list item principal. Consider this case:
```
* item

 indented
```
Since we've already established that a nested list without metadata lines can be preceded by an arbitrary number of empty lines, it's both logical and consistent to allow it in this case as well.
Once again, we need to consider what happens if the indented block has metadata lines. Consider this case:
```
* item
[.output]
 indented
```
Pre-spec does not apply the implicit list continuation if the indented block has at least one metadata line. So the indented block would not be attached to the list item in this case. Instead, it would require an explicit list continuation to do so. However, if we want the rules of an implicit list continuation to be consistent, then we **should** attach the indented block if not preceded by any empty lines:
```
* item
[.output]
 indented
```
The block attribute line interrupts the list item principal, so the indented block should be a candidate for attachment in this case. This is not supported in pre-spec AsciiDoc, but we could add it now.
## Summary and decisions
As we've stated in other issues, while formalizing AsciiDoc, we're trying to remain as consistent as possible with how the language is currently interpreted. At the same time, we need to address idiosyncrasies so the language is easy to understand, remember, and use.
With that in mind, we want to make it easy to keep lists together and also easy to separate them. The main subject of concern is empty lines. When are empty lines tolerated in a list, and are consecutive empty lines allowed? We established that pre-spec AsciiDoc, and lightweight markup languages in general, are quite tolerant of empty lines in a list. Any number of empty lines are permitted between list items of the same list, and empty lines are permitted following an explicit or implicit list continuation.
We considered the proposal of assigning meaning to consecutive empty lines so they act like a list interrupting line. While enticing, this proposal would greatly threaten compatibility and deviate from the unwritten code of lightweight markup languages. Thus, we don't think it's worth the risk.
We then clarified that adjacent lists can be separated using an empty line followed by a block attribute line. (If the two lists are congruent, the empty line is not required). This is a pattern that was heavily promoted in pre-spec AsciiDoc and, as a result, plenty of documents now depend on it. It offers a definitive way to separate lists that can be made to be self-documenting.
We then accepted that empty lines should be permitted above a block attached using an explicit list continuation. The justification is that the intent of the explicit list continuation is clear and there's no reason to counter that intent by giving empty lines special meaning. The goal of the list continuation is to find a block, and the parser should proceed until it does.
We then considered whether an empty line or lines should be permitted in the case where an implicit list continuation is being used to attach a block with metadata lines. Here we decided that the syntax should not be tolerant of empty lines. The reason is that it would break the contract that *empty line + block attribute line* can be used to separate adjacent lists. The block metadata line either must not be preceded by an empty line, or the list must be attached using an explicit list continuation.

# Decide whether a non-indented line interrupts an indented block form
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/38 · 2024-02-27T20:59:42Z · Dan Allen

An indented block form is defined as one or more contiguous lines indented by at least one space. This is an implicit structure that produces a literal block in the parsed document (the ASG).
(In pre-spec AsciiDoc, this was referred to as a literal paragraph, but we've since decided to name it a literal block with the indented form to make the terminology more accurate and consistent).
A question that has come up when defining the grammar is what to do if a subsequent line is not indented (and not otherwise an interrupting line). In other words, can a paragraph interrupt an indented literal block? Consider the following case:
```
indented
not indented
```
In both Asciidoctor and its predecessor, the non-indented line does not interrupt the indented block form. Thus, only the first line has to be indented by at least one space. This parsing behavior mandates that an adjacent paragraph must be separated by at least one empty line. In other words, a non-indented line cannot interrupt the indented block form, but is rather consumed as part of it.
There are two reasons why this behavior may be problematic:
* It's not consistent with other markup languages, Markdown in particular. (rST also treats it as an interrupting line, though the indented block is a blockquote)
  * According to CommonMark, "A blank line is not needed ... between a code block and a following paragraph."
* It makes it more nuanced to explain and to identify in the source.
There's one other important reason this should be considered. The next list item should be allowed to interrupt the indented block.
```
* first item

 indented
* second item
```
However, it currently is not permitted, which is definitely surprising. And yet the list item is permitted to interrupt an attached paragraph. The interruption rules just seem inconsistent in this regard.
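To make the two behaviors concrete, here's a non-normative sketch (function and option names are made up) of how a line reader might treat a non-indented line under the current rule versus the proposed one:

```javascript
// Sketch contrasting the two rules for the indented block form.
// Under the pre-spec rule, once the first line is indented, subsequent lines
// are consumed until an empty line. Under the proposed rule, a non-indented
// line interrupts (ends) the block instead of being consumed into it.

function readIndentedBlock(lines, start, { nonIndentedInterrupts }) {
  if (!/^[ \t]/.test(lines[start])) return undefined // not an indented block form
  const contents = [lines[start]]
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i]
    if (line === '') break // an empty line always ends the block
    if (nonIndentedInterrupts && !/^[ \t]/.test(line)) break // proposed rule
    contents.push(line)
  }
  return contents
}

const lines = [' indented', 'not indented', '', 'next']

// pre-spec behavior: the non-indented line is consumed as part of the block
console.log(readIndentedBlock(lines, 0, { nonIndentedInterrupts: false }))
// proposed behavior: the non-indented line interrupts the block
console.log(readIndentedBlock(lines, 0, { nonIndentedInterrupts: true }))
```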
It's very unlikely that existing documents rely on this behavior since the general practice is to surround the indented block by empty lines. But in the event that it does occur, the parser must have deterministic behavior. I think we should at least discuss changing the rule so that a non-indented line acts as an interrupting line, meaning it's not consumed as part of the indented block.

# Create guidelines for describing an element in the specification document
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/34 · 2024-02-27T20:27:16Z · Sarah White

This proposal is an outline for how to write about an element in the specification document. Some of the following section titles or their contents won't apply when writing about high-level topics such as blocks, spec-wide terminology, macros, etc. Creating guidelines for high-level topics may not be necessary, but if so, we'll do that in a separate issue.
Proposed structure of a specification section for an element
1. The document title is the name of the element
- Generally, create one document per element, such as Paragraph, Sidebar, Strong Span
- The filename should match the document title in most cases.
2. Description: A non-normative section describing the element and any applicable semantics and meanings.
- Source examples: A subsection of Description showing source examples included from the TCK; this ensures they're accurate and valid
- The source examples should be included from the tests in the TCK, not handwritten! A case would need to be made for any examples to be written in the document that aren't functionally sourced directly from a tested file.
3. Context: A section specifying the place, environment, or situation in which the element can be used.
A context is a parent of the element; therefore, the element is a child of the parent in the ASG and DOM trees.
- Q: Should this section be normative or non-normative?
4. Content model: A normative section.
The content model of an element equates to the grammar rules for the contents of the element.
The content model describes the children the element is capable of accepting.
These children are represented as descendants of the element in the ASG and DOM trees.
- The section should state the model, what the element accepts, whether it can be empty, whether it can be parsed to empty, and what it can be interrupted by
- Attributes and metadata: A subsection of Content model that lists the attributes and metadata the element must accept
5. Possible section: Grammar (or) Grammar rules
- Q: What differentiates the grammar/grammar rule section and the content model and metadata sections? That is, why is this section needed in the specification document?
6. ASG and DOM: A normative section containing the applicable snippets from the ASG schema and the DOM, potentially in a tabbed interface.
- Should this section be a child of the content model section?
7. Q: Should we add a section or information in one of the sections that identifies whether the element is extensible?
As far as text usage and formatting, we need to decide if we want to:
* follow the formalized capitalization and usage of words such as MUST, SHOULD, etc. (there's an RFI for this sort of thing)
* try to consistently use those words in normative sections but don't capitalize them
* use some other terminology or styling to call out hard and soft rules
For this issue to be complete, I would like to be 75% confident we're using a communicative and usable structure, including clear section titles and sections in a logical order, and that the content in the sections clearly communicates to developers the information they must know while creating an implementation. However, we don't need to nail down every term or determine how we're going to functionally insert the ASG and DOM snippets into the specification from their tested single source.
When drafting up these guidelines, I created a few sample sections (Paragraph, Strong Span, etc.) and consulted the Unicode, HTML, JavaScript, and OpenAPI specs for structural information.

# Proposal to clarify the behavior of the preprocessor
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/26 · 2023-05-05T22:47:24Z · Dan Allen

## Purpose
The line-oriented preprocessor in AsciiDoc (herein the AsciiDoc preprocessor) is one of the biggest hurdles we face in formalizing the AsciiDoc language syntax (using a grammar formalism). That’s because the preprocessor directives are both coupled with the document structure and work outside of it.
The purpose of this issue is to figure out how to describe the behavior of the preprocessor in a way that allows the language to be formalized. We’ll look at it from various perspectives that range from untangling the preprocessor from the document structure so it’s easier to parse, to carefully defining grammar rules, actions, and parsing requirements to match the existing functionality. In the end, it may just come down to accepting the existing behavior and figuring out how to describe it in terms of a grammar formalism.
This issue can be resolved once we've decided on the behavior of the AsciiDoc preprocessor and resolved how to describe it in such a way that it doesn't prevent formalizing the grammar for the AsciiDoc language.
## Background
> In this section, I define the AsciiDoc preprocessor, describe its purpose, and explain how it works today (according to the user documentation and how it’s implemented in Asciidoctor).
The AsciiDoc preprocessor provides directives that add or remove lines from the source document ahead of (block-level) parsing. The preprocessor is strictly line-oriented.
There are two types of preprocessor directives: conditional directives and the include directive. The conditional directives (ifdef, ifndef, and ifeval) are for filtering lines in the source document. Conditional directives are useful for producing variations of a single document, such as to repurpose it for different audiences. The include directive (include) is for adding new lines to the source document from an external file, thus allowing documents to be composed.
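For illustration, here's how the two directive types might appear in a source document (the attribute names and include target are made up):

```asciidoc
ifdef::audience-internal[]
This line is kept only when the audience-internal attribute is set.
endif::[]

ifeval::["{product-version}" >= "2.0"]
This line is kept only when product-version is 2.0 or greater.
endif::[]

include::shared/terms.adoc[]
```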
Once a preprocessor directive is processed, the parser does not see it (and thus it does not introduce a boundary in the structure). Rather, the parser only sees the outcome of the directive (which may be more or fewer lines than what’s in the source document).
In its purest form, a [preprocessor](https://en.wikipedia.org/wiki/Preprocessor) is applied to the source document before the document is parsed (i.e., called a lexical preprocessor). But that’s not how the AsciiDoc preprocessor works. The AsciiDoc preprocessor is somewhere between a lexical preprocessor and a syntactic preprocessor.
The AsciiDoc preprocessor is able to see the value of attributes set or unset by attribute entries in the document. This pertains to attribute entries set in either the header or the body. However, since attribute entries are not permitted just anywhere in an AsciiDoc document, knowing how to find them means the preprocessor must have at least some awareness of the document structure. On the other hand, the preprocessor directives themselves can appear anywhere in the document (except perhaps in comment blocks, which is open for discussion), meaning they exist outside the structure of the document. Thus, the preprocessor must be able to recognize the structure of the document enough to process attribute entries, but not restrict where the preprocessor directives can be used.
To summarize, the preprocessor has access to document attributes as soon as they are defined in the document (in addition to ones passed to the processor), but does not otherwise recognize or honor the document’s block structure. Thus, we can say that the AsciiDoc preprocessor is not a lexical preprocessor, but rather a syntactic preprocessor (at least in part). I prefer to think of it as a priority (or contextual) preprocessor.
Unfortunately, the behavior I just described presents a real problem for defining a grammar for AsciiDoc. The requirements it calls for are really at odds with a grammar formalism and potentially compromise our ability to define one. Addressing this problem may call for a separate parsing phase which handles preprocessor directives with just enough parsing of the block structure baked in to also handle attribute entries. Or it may be possible to fold that behavior into the primary grammar, hiding it behind select annotated rules to keep the grammar tidy. Either way, it’s going to put some real constraints on which parsing technologies can be used to parse AsciiDoc.
Let’s consider different approaches that make the described behavior compatible with a grammar formalism and/or change the behavior so it can be.
## Proposed Models
There are at least four ways we can consider defining the behavior of the preprocessor:
* lexical preprocessor
* lexical preprocessor in body
* priority block processor
* priority line processor
We’ll look at each of these in detail.
### Lexical Preprocessor
One approach we could take is to redefine the AsciiDoc preprocessor as a strict lexical preprocessor. Using this model, the preprocessor only looks for preprocessor directives as reserved, line-oriented tokens. In other words, it looks at the source document as a series of lines, but otherwise doesn’t acknowledge the structure of the document. This is by far the easiest to implement. The grammar rules only have to look for lines that are preprocessor directives, adding or removing lines as prescribed. The grammar does not have to try to find or process attribute entries. However, what it means is that the only attributes the preprocessor directives can see are the ones passed into the processor (in other words, attributes defined in the document don’t affect the operation of the preprocessor).
Although this model may sound alluring, it has a tremendous impact on compatibility. The assumption that preprocessor directives can reference attributes defined in the document, at the very least in the document header, is ingrained and, as such, documents that use preprocessor directives are often written in this way. Changing this model now would almost certainly violate our commitment to creating a specification that’s reasonably compatible with existing content. In other words, it would be a significant departure from AsciiDoc as we know it.
### Lexical Preprocessor in Body
To address the problem of the lexical preprocessor not being able to see attribute entries defined in the document header, where they are most often defined, we could consider different preprocessing rules for the document header. This would work because the structure of the document header is flat and thus lends itself to line-oriented processing. Locating the end of the header is only complicated by having to consider preprocessor directives, which would already be addressed here.
The document header could be processed line-by-line, allowing the preprocessor directives to see the result of the previous line and filtering lines in advance of the header parser. While the grammar for the header would be slightly less formal (or require extra parsing requirements), that exception would be confined to the document header. Once the header is cleared, the preprocessor would switch to being a lexical preprocessor, ignoring all remaining structure (and, as such, attribute entries).
This model is certainly something worth considering, but I'd need to see a proof of concept of it working. It does have an impact on compatibility, but only in the case where documents are written to use preprocessor directives that rely on attributes defined in the body of the document. If there are documents that rely on this behavior, then this change will break compatibility in a way that is significant.
An impact assessment would certainly need to be done here. It’s not uncommon for documents to change the value of an attribute to modify the target of a subsequent include. We often see this pattern used in books, where the target of a chapter file is controlled by an attribute. Documents may also use attributes to change the location of an example file that is included in a code block. My instinct tells me that we’re going to find that it’s going to present major problems.
### Priority Block Processor
Another model for the preprocessor is to process it as a transparent block that does not appear in the ASG. In this model, the preprocessor would be part of the document structure and thus would naturally be able to see attributes set or unset by attribute entries. However, it would impose a lot of new restrictions on how and where the preprocessor directives can be used. It could also introduce side effects in the parsing.
For one, conditional preprocessor directives would have to be balanced within the document structure. In other words, they couldn’t overlap boundaries of a block like they can today. They would also not be permitted in places where blocks are not allowed, such as around block attribute lines. There’s also a question of how they would be processed within verbatim content, something that’s permitted today. It’s also not clear whether lines contributed by adjacent include directives would be stitched back together to create a single block. In other words, the preprocessor directives would end up introducing artificial boundaries in the block structure.
While this model has some merit, it also has tremendous consequences for compatibility. And while it may simplify the grammar, it would also require additional processing to transform the parse tree. Thus, I don’t really see how we can consider it.
### Priority Line Processor
A priority line processor is the closest model to what we have in AsciiDoc today. Thus, this is the preferred proposal.
In this model, every line must be checked for a preprocessor directive before it’s considered by the grammar parser. If a preprocessor directive is found, it needs to be processed and the input modified so the pending grammar rule only sees the outcome. If the preprocessor directive leaves behind a preprocessor directive on the same line (such as by an include directive), that directive must also be processed. Once the current line is confirmed to not be a preprocessor directive, the pending grammar rule may proceed.
The priority line processor can either be integrated with the grammar parser (thus a single parsing phase), or it can be done as a separate parsing phase. If done as a separate phase, it will still have to consider the structure of the document in order to locate and process attribute entries, but this mode is effectively a lightweight parse rather than a complete one. By lightweight, I mean that it would be doing just enough to identify valid attribute entries.
While choosing this model has no impact on compatibility, it puts rather substantial restrictions on what parsing technologies can be used for parsing AsciiDoc. We are essentially requiring the parser to allow the input ahead of the cursor to be modified while parsing is taking place. It also has to be possible to instruct the parser to backtrack to the location of the preprocessor directive after the directive has been processed. That, in turn, means that any information cached about the input at that point forward needs to be cleared. These requirements are distinctly at odds with a grammar formalism.
With that said, it’s not likely that an implementation will use a grammar-based parser to handle the preprocessor requirement. Instead, it may decide to employ bespoke line-based processing logic for this step, such as we see in Asciidoctor (and downdoc). But we still may be able to describe the behavior of the preprocessor using the grammar from a grammar-based parser that can accommodate the stated requirements. In doing so, we will have achieved the goal of communicating the normative rules using a grammar while, at the same time, not mandating that an implementation do it that way.
Here’s a partial exhibit of a dedicated grammar for the preprocessor that shows how a priority line processor might work:
```
document = header? body lf*
header = ...
body = pp_block*
pp_block = (pp (lf / attribute_entry / block_attribute_line))* block
pp = (pp_directive* . !.)?
pp_directive = pp_conditional / pp_conditional_short / pp_include
pp_conditional_short = operator:pp_conditional_name '::' attribute_name:attribute_name '[' contents:$([^\n\]]+ &(']' eol) / ([^\n\]] / ']' !eol)+) ']' eol
{
  // see action for pp_conditional rule
}
pp_conditional = operator:pp_conditional_name '::' attribute_name:attribute_name '[]\n' contents:conditional_lines 'endif::[]' eol
{
  const { start: { offset: startOffset }, end: { offset: endOffset } } = location()
  const drop = operator === 'ifdef' ? !(attribute_name in options.attributes) : (attribute_name in options.attributes)
  // TODO record line offsets
  input = input.slice(0, (peg$currPos = startOffset)) + (drop ? '' : contents.join('')) + input.slice(endOffset)
  peg$posDetailsCache = [{ line: 1, column: 1 }]
  return true
}
conditional_lines = (!('endif::[]' eol) @(pp_conditional_pair / $([^\n]+ eol) / '\n'))*
pp_conditional_pair = opening:$(pp_conditional_name '::' attribute_name '[]\n') contents:conditional_lines closing:$('endif::[]' eol)?
pp_conditional_name = 'ifdef' / 'ifndef'
pp_include = 'include::' target:$[^\[\n]+ '[]' eol
{
  const { start: { offset: startOffset }, end: { offset: endOffset } } = location()
  const contents = require('fs').readFileSync(target, 'utf8').split(/(?<=\n)/)
  // TODO record line offsets
  input = input.slice(0, (peg$currPos = startOffset)) + contents.join('') + input.slice(endOffset)
  peg$posDetailsCache = [{ line: 1, column: 1 }]
  return true
}
block = example / listing / list / ... / paragraph
attribute_entry = ':' name:attribute_name ':' value:(' ' @$[^\n]+ / '') eol
{
  options.attributes[name] = value
}
attribute_name = $[a-z]+
example = ...
listing = '----\n' contents:$(pp !('----' eol) line / '\n')* pp '----' eol
list = ...
paragraph = (pp !(block_attribute_line / any_parent_block_delimiter_line) @line)+
line = value:$([^\n]+ eol)
eol = '\n' / eof
eof = !.
lf = '\n'
```
There are a couple of things to notice about this grammar. Any time the grammar looks for a line, it must run the `pp` rule to make sure the line has been preprocessed. When the grammar looks for preprocessor directives, it must keep looking until it doesn’t find any at that location. It must then fail that rule so that the cursor is not advanced. (In practice, I found it necessary to reset the cursor manually since I couldn’t find a way to fail each `pp_directive` rule individually but still continue checking for preprocessor directives). In the action that processes the preprocessor directive, it must modify the input to replace the directive with its contents (either the conditional lines or the contents of the include). It then needs to move the cursor back to the start offset of the directive so the input can be reprocessed starting at that point. The grammar needs to walk the block structure, but does not have to get into the finer details of how to parse the blocks. In particular, it doesn’t need to consider the inline syntax at all.
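The same resolution the grammar actions perform can also be sketched as bespoke line-based logic. The following is a hypothetical, minimal implementation (function name and shape are made up) that handles only ifdef/ifndef directives with empty brackets; includes, ifeval, and nested conditionals are omitted for brevity:

```javascript
// Minimal sketch of a priority line processor: each line is checked for a
// conditional directive before it would be handed to the grammar parser.
function preprocess(source, attributes = {}) {
  const out = []
  const lines = source.split('\n')
  for (let i = 0; i < lines.length; i++) {
    const m = /^(ifdef|ifndef)::([^\[\]]+)\[\]$/.exec(lines[i])
    if (!m) {
      out.push(lines[i])
      continue
    }
    // keep the enclosed lines when ifdef matches a set attribute,
    // or when ifndef names an unset attribute
    const keep = (m[1] === 'ifdef') === (m[2] in attributes)
    // consume lines up to the matching endif, keeping or dropping them
    while (++i < lines.length && lines[i] !== 'endif::[]') {
      if (keep) out.push(lines[i])
    }
  }
  return out.join('\n')
}

const source = 'début\nifdef::flag[]\nconditional content\nendif::[]\nfin'
console.log(preprocess(source, { flag: '' })) // keeps the conditional content
console.log(preprocess(source, {}))           // drops it
```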
The behavior of the priority line processor is described formally as follows:
* a lightweight parse of the block structure in order to identify and process attribute entries
* the inclusion of the `pp` rule to identify and process for preprocessor directives
* a rule to read the contents of a conditional preprocessor directive without processing the lines
It’s debatable whether it helps to have a separate preprocessing phase, though it’s certainly useful as a tool (consider the role of Asciidoctor Reducer). I think when we define the primary grammar for the language, we may want to do so without including the preprocessor rules (thus thinking about them as a separate phase). But an implementation may combine the grammars to avoid having to maintain separate grammars.
One very important factor to consider in all these models is how to map nodes to the original source. In other words, how to track line offsets as a result of resolving the preprocessor directives.
## Line Offsets
Another big challenge with the preprocessor (irrespective of the model) is tracking line offsets. When reporting problems, or to allow a document to be properly analyzed, we want to be able to map the location of nodes in the parsed document / ASG to the source document or documents. If a preprocessor comes through and moves lines around, it compromises the parser’s ability to provide this information accurately. Thus, when the preprocessor runs, it must build a map of processed lines to source lines. The parser then needs to run the reported location through this map to resolve the correct location in the source document. (From experience, building this map can be quite tricky).
Although the logic is difficult, the result of the mapping is quite easy to understand. Consider the following AsciiDoc source:
```asciidoc
début
ifdef::flag[]
conditional content
endif::[]
fin
```
Here’s the source the parser will see (after the preprocessor does its thing):
```asciidoc
début
conditional content
fin
```
A line offset mapping may look something like this:
```json
{
"1": { "line": 1, "column": 1, "delta": 0 },
"2": { "line": 3, "column": 1, "delta": 1 },
"3": { "line": 5, "column": 1, "delta": 2 },
}
```
The parser can run the reported line through this map to get the source line. Obviously, it gets a little trickier when we have to consider included lines, but the idea is still the same.
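A hypothetical sketch of that lookup, using the example map above (the map structure is assumed from the example, and the function name is made up):

```javascript
// Line offset map produced by the preprocessor: keys are line numbers in the
// preprocessed input; values point at the corresponding source line.
const lineOffsets = {
  1: { line: 1, column: 1, delta: 0 },
  2: { line: 3, column: 1, delta: 1 },
  3: { line: 5, column: 1, delta: 2 },
}

// Map a line number reported by the parser (against the preprocessed input)
// back to the corresponding line in the original source document.
function toSourceLine(reportedLine, offsets) {
  const entry = offsets[reportedLine]
  return entry ? entry.line : reportedLine
}

console.log(toSourceLine(2, lineOffsets)) // → 3
```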
## Conclusion
After careful analysis, I don’t see any way to make the preprocessor less complicated than it currently is (both in terms of how to define it using a grammar formalism and how to implement it). Of the models presented, I think the priority line processor is the best choice to pursue. That’s partly because it maintains compatibility with existing usage. It’s also because I’ve proved that it’s possible to use a grammar formalism to describe its behavior given we can make use of specialized parsing features to do it. Namely, we have to assume that the parser is capable of modifying the input as it proceeds and to reprocess input that was modified by resolving preprocessor directives.
I do think it’s at least worth discussing the switch to a lexical preprocessor for the document body. However, I’m not convinced that actually makes AsciiDoc simpler to parse (in addition to the incompatibility problem it introduces). Thus, it may be better to stick to defining the preprocessor as currently works in Asciidoctor (a priority line processor), but to do a better job of tracking line offsets accurately.https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/25Proposal to make backslash escaping stable2024-02-28T00:22:45ZDan AllenProposal to make backslash escaping stableEscaping markup using a backslash character (“backslash escaping”) is one of the weaker areas of the AsciiDoc syntax. As currently described in the user docs (which reflects how it's implemented in Asciidoctor), a backslash character is ...Escaping markup using a backslash character (“backslash escaping”) is one of the weaker areas of the AsciiDoc syntax. As currently described in the user docs (which reflects how it's implemented in Asciidoctor), a backslash character is only treated as meaningful if it precedes a markup element (markup that would have otherwise been interpreted). For example, `\*stars*` becomes `*stars*`. If the backslash is used in front of a character that isn't a markup element (i.e., doesn't match a grammar rule), such as `\*star`, the input remains as is, `\*star`. To a writer not well-versed in the rules of the AsciiDoc syntax, this behavior appears broken.
What we can say is that backslash escaping is contextual. Rather than instructing the parser to pass through the escaped character without interpreting it (`\*` becomes `*`), it's dependent on whether that markup character is enlisted in a markup element. That puts the onus on the writer to track where the markup character is being used and whether that usage gives it special meaning. Expecting the writer to take on this responsibility makes backslash escaping feel unstable. Writers often avoid using this escaping mechanism and resort to more brute-force methods such as inline passthroughs.
As part of formalizing the AsciiDoc language, I feel strongly that we should stabilize this mechanism to make it more approachable.
I see three ways we could define backslash escaping:
* **contextual** - the backslash prevents a markup element from being interpreted; in this case, the backslash is consumed (`\*stars*` becomes `*stars*`); if no markup element is found immediately following the backslash, the backslash is left in place (`\*star` remains as `\*star`)
* **universal** - a backslash can be used in front of any character and is always consumed; the character it escapes will not be considered when looking for markup elements (`\look at that \*` becomes `look at that *`); a literal backslash must escape itself (`\\` becomes `\`)
* **reserved** - a backslash can be used in front of any reserved character in the markup and is always consumed; if used in front of any other character, the backslash is left as is (`\n is a line feed; \* is an asterisk` becomes `\n is a line feed; * is an asterisk`); a literal backslash in front of a reserved markup character would have to itself be escaped (`\\*word*` becomes `\<strong>word</strong>`); otherwise, the backslash can be written as `\\` or `\`
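To make the trade-offs concrete, here is how a few inputs would come out under each mode (a sketch derived from the rules above; none of this is settled, current behavior):

```asciidoc
// input              contextual      universal      reserved
// \*stars*      ->   *stars*         *stars*        *stars*
// \*star        ->   \*star          *star          *star
// C:\projects   ->   C:\projects     C:projects     C:\projects
// \n            ->   \n              n              \n
```

The `C:\projects` row illustrates the compatibility concern with the universal mode discussed below.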
One exception, to maintain backwards compatibility, is a macro prefix, which is treated as a single markup expression (`\link: starts a link macro` becomes `link: starts a link macro`). (Another option would be to switch to contextual backslash escaping in this case, though that would add a dependency on using semantic predicates in the parser.) Regardless, moving forward, escaping the colon would be preferred (`link\: starts a link macro`). Another exception is a bare URL, which is treated as a single markup expression and thus a contextual escape (this wouldn't rely on a semantic predicate since there is no intention to interpret the identified URL any other way).
As mentioned above, AsciiDoc is currently described as permitting contextual backslash escaping. We want to move past this. However, universal backslash escaping may be a step too far if we consider the impact on compatibility. The most notable problem would be Windows file paths. Under universal backslash escaping rules, `C:\projects` becomes `C:projects`. We can't expect writers to go back and fix all these cases. Besides, there's no expectation that a backslash has meaning in this case (and universal escaping quickly introduces leaning toothpick syndrome).
Therefore, reserved backslash escaping may offer the best compromise. By choosing reserved backslash escaping, the writer no longer has to worry about escaped markup that doesn't match a syntax rule, but also won't be faced with the Windows file path problem. The only thing that still must be considered is that escaping markup could cause different markup to be found, which then must be escaped.
One **open question** is which markup characters to define as reserved. Should we say that all symbol/punctuation characters in the ASCII charset can be escaped, or limit it to just the ASCII characters that the AsciiDoc syntax currently uses? For reference, CommonMark allows escaping all ASCII punctuation.
Here are the reserved markup characters identified thus far:
```
\ ` _ * # ~ ^ : [ < ( {
```
Note that it shouldn't be necessary to escape the closing bracket of a markup element, which is why those characters are not listed here as reserved.
Another **open question** is how to escape unconstrained marked text. Currently, AsciiDoc requires that the opening unconstrained mark be double escaped (`\\**stars**`). However, this is both context dependent and ambiguous (escaping a backslash should produce a literal backslash). Therefore, we may have to change this rule to `\*\*stars**`. This will introduce a slight incompatibility, but one that is reasonable to explain and to justify with the goal of making backslash escaping stable.
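For instance, the change for unconstrained strong would look like this (a sketch of the proposal; the current double-escape form is shown for contrast):

```asciidoc
// current:  \\**stars**  (double escape; ambiguous with a literal backslash)
// proposed: \*\*stars**  (each opening mark escaped individually)
```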
The examples provided thus far focus on where backslash escaping is used in inline syntax. It should also be considered for the following block-level constructs:
* preprocessor directive (`\include::target[]`)
* block macro (`\image::target[]`)
* list item (`\* is an asterisk`)
* dlist term (`App\:: is a Ruby namespace`) (or should it be `\App:: is a Ruby namespace`?)
* heading (`\= is an equals sign`)
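Taken together, a document escaping these block-level constructs might look like the following (a sketch; each line would render as literal text rather than being interpreted as a block):

```asciidoc
\= is an equals sign

\* is an asterisk

\include::target[]

\image::target[]
```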
**Open Question:** For block-level constructs, are we interpreting the backslash because it's at the beginning of the line, or because it is escaping a character? I think we should consider it because it's used at the beginning of the line. (I think this would translate to removing the backslash at the beginning of a paragraph). That reduces how much markup we have to designate as reserved.
In terms of parsing efficiency, we have identified the following optimization for processing backslash escaping. During parsing, only consider backslash characters that escape a grammar rule currently under consideration. Once parsing is complete, drop the backslash in front of all reserved characters in the transformation from a parse tree to an AST/ASG. This minimizes the number of checks that the grammar has to consider.

Issue 24: Summarize all available block style in one section (Hemang Ajmera, 2023-03-10)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/24
Adding a list/table of all the styles available in AsciiDoc will be useful on this page: https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/blob/main/docs/modules/blocks/pages/index.adoc
Issue 23: Include Content Model information for each context (Hemang Ajmera, 2023-03-10)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/23

It will be really useful for a newbie like me to understand the content model concept better if we can add a third column to the table at https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/blob/main/docs/modules/blocks/pages/index.adoc#user-content-summary-of-built-in-contexts indicating the content model (whether a block is compound, simple, verbatim, raw, empty, or table).

Issue 21: Document see and see-also indexterm attributes (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/21

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/120*
In brief, the shorthand delimiter `>>` defines a see term and the shorthand delimiter `&>` defines one or more see also terms.
Here's an example of an index term with a see term (i.e., redirect):
```
(((Flash >> HTML 5)))
```
This is shorthand for:
```
indexterm:[Flash,see=HTML 5]
```
Here's an example of an index term with see also terms:
```
(((HTML 5 &> CSS 3 &> SVG)))
```
This is shorthand for:
```
indexterm:[HTML 5,see-also="CSS 3,SVG"]
```
Both attributes work with the visible index term as well (double round brackets or indexterm2 macro).
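For example, the visible forms would presumably be written as follows (an assumption extrapolated from the shorthand and macro forms above):

```asciidoc
((Flash >> HTML 5))

indexterm2:[HTML 5,see-also="CSS 3,SVG"]
```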
There can only be one "see" term. There can be multiple "see also" terms.

Issue 20: Consolidate and clarify anchors and ID end user documentation (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/20

The end user documentation regarding anchors and IDs needs to be consolidated and clarified. Some users have expressed that the concepts of ID and anchor could be better explained, that the information about where IDs can be placed and the syntax for such use cases should be consolidated, and that more and/or better examples would be helpful.
This issue should:
- identify what areas of the anchor and ID end user documentation need to be clarified (restructured, reworked, more search-friendly, direct headings, etc.)
- restructure and edit the anchor and ID content that is problematic
- make sure that the exceptions and recommendations for each syntax and/or use case are clearly stated
- add new content and/or new examples for any anchor and ID use cases
This issue was created based on a PR and the discussion referenced in the PR: https://github.com/asciidoctor/asciidoc-docs/pull/106
This issue may have some crossover with #19.

Issue 19: Clarify and expand xref macro end user documentation (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/19

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/79*
Some users are having difficulty finding the documentation about how to use the xref macro (both the shorthand and named version), determining when they should use which form, and what options and attributes are associated with the xref macro.
This issue should:
- identify what areas of the xref end user documentation need to be clarified (restructured, reworked, more search-friendly, direct headings, etc.)
- restructure and edit the xref content that is problematic
- add new content and/or new examples for any undocumented xref options, attributes, or common use cases
The existing xref documentation pages are:
- Cross References: https://docs.asciidoctor.org/asciidoc/latest/macros/xref/
- Document to Document Cross References: https://docs.asciidoctor.org/asciidoc/latest/macros/inter-document-xref/
- Cross Reference Text and Styles: https://docs.asciidoctor.org/asciidoc/latest/macros/xref-text-and-style/
- Validate Cross References: https://docs.asciidoctor.org/asciidoc/latest/macros/xref-validate/

Issue 18: Add dedicated section or page for the linenums option on source blocks (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/18

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/77*
Source blocks support the `linenums` option. This option adds line numbers to the rendered source block when supported by the source highlighter or converter.
```asciidoc
:source-highlighter: rouge
[source%linenums,js]
----
function fib (n) {
  if (n < 2) return n
  return fib(n - 1) + fib(n - 2)
}
console.log([1, 2, 3, 4, 5].map(fib))
----
```
The `linenums` option can also be specified as the third positional (unnamed) attribute on the source block.
```asciidoc
:source-highlighter: rouge
[source,js,linenums]
----
function fib (n) {
  if (n < 2) return n
  return fib(n - 1) + fib(n - 2)
}
console.log([1, 2, 3, 4, 5].map(fib))
----
```
However, the named option is preferred. The option can also be enabled globally by setting the `source-linenums-option` attribute.
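For example, setting the document attribute mentioned above turns on line numbers for every source block without repeating the option on each one:

```asciidoc
:source-highlighter: rouge
:source-linenums-option:

[source,js]
----
function fib (n) {
  if (n < 2) return n
  return fib(n - 1) + fib(n - 2)
}
----
```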
The docs should emphasize that while the option is a part of the AsciiDoc language, it does require support from the toolchain (syntax highlighter adapter, converter, and/or output format), so it may not always be honored.
The `linenums` option is documented in passing in the Asciidoctor docs on the page for each syntax highlighter adapter. See:
* https://docs.asciidoctor.org/asciidoctor/latest/syntax-highlighting/pygments/
* https://docs.asciidoctor.org/asciidoctor/latest/syntax-highlighting/rouge/
* https://docs.asciidoctor.org/asciidoctor/latest/syntax-highlighting/coderay/
However, this topic deserves its own section or page to define it and solidify it as part of the AsciiDoc language.

Issue 17: Add top-level Document section to documentation site navigation (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/17

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/10*
The document is such a foundational concept in AsciiDoc that it deserves its own top-level section in the nav. In the end user documentation, this section is the natural place for the pages about the document type and document header to live. (It's possible that the pages about document attributes could live here, though they are arguably so universal that they still deserve their own top-level section).
Here's the proposed layout:
- Document
- Document Type
- Document Header
Document is also a central concept in the API, so having a place to link might make writing about the API easier.

Issue 15: Fix scope of abstract section style (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/15

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/23*
*Published page: https://docs.asciidoctor.org/asciidoc/latest/sections/abstract/*
*Source page: https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/blob/main/docs/modules/sections/pages/abstract.adoc?plain=1*
The documentation incorrectly states that the `abstract` style can only be applied to a section in the `article` doctype on the [Abstract (Section) page](https://docs.asciidoctor.org/asciidoc/latest/sections/abstract/). The abstract style can also be used on a section in the book doctype. In this case, it becomes a chapter, even if defined as a level-0 section (in a multipart book).
It does correctly note that the style can be applied to both the `article` and `book` doctypes on the [Section Styles for Articles and Books page](https://docs.asciidoctor.org/asciidoc/latest/sections/styles/).
Note the content commented out in the page:
```
////
There's some sort of funkiness with book and abstract so we're putting this section on hold.
== Book abstract syntax
When the document type is book, the `abstract` section style must be placed on the first section _inside_ the chapter section.
The section must be one section level below the chapter, that is, the chapter is marked up as level 1 (`==`) so its abstract must be marked up as level 2 (`===`).
An abstract may not be used _before_ a part or chapter in a book.
[source]
----
= Book Title
:doctype: book
== Chapter Title
[abstract]
=== Summary
Documentation is a distillation of many long adventures.
=== Section
----
////
```
This content needs to be reconciled as part of this issue. Is the section level information correct or not?
The PR for this issue should update the Abstract (Section) page to correctly describe the application and result of the `abstract` style on a section in both doctypes, and reconcile the commented-out content (remove it if it is incorrect, or make sure any correct information in it is communicated in the new content).

Issue 14: Section style should be framed as a special section (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/14

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/5*
When a section title has a style (e.g., `preface`, `appendix`, `glossary`, etc.), it automatically makes that section a special section. The documentation should introduce the term "special section" and explain this correlation.
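For example, applying a style to a section title makes it a special section (a minimal sketch):

```asciidoc
= Document Title
:doctype: book

[dedication]
== Dedication

For everyone who files documentation issues.

[appendix]
== Additional Resources

This section is treated as an appendix rather than a regular chapter.
```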
Also, do we need separate pages for [Dedication](https://docs.asciidoctor.org/asciidoc/latest/sections/dedication/) and [Colophon](https://docs.asciidoctor.org/asciidoc/latest/sections/colophon/)? Except for a handful of exceptions, special sections are just a passthrough designation. (What does a passthrough designation actually mean?) Any special section style to which AsciiDoc doesn't apply special meaning should be grouped on the Special Sections page and simply listed. These include: colophon, dedication, acknowledgements. We may also point out that technically the list is open ended; it's up to the discretion of the converter to handle any additional names.

Issue 13: Add a dedicated page that covers the AsciiDoc table cell style (Sarah White, 2023-02-28)
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/13

*Original issue: https://github.com/asciidoctor/asciidoc-docs/issues/75*
The AsciiDoc table cell style deserves its own page. Currently, it is only documented as a section on the following page:
*Published page: https://docs.asciidoctor.org/asciidoc/latest/tables/format-cell-content/*
*Source page: https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/blob/main/docs/modules/tables/pages/format-cell-content.adoc*
Unlike the other table cell styles, it actually changes how the cell content is parsed, and there are several things to know about it:
- The AsciiDoc table cell must be used if the content contains AsciiDoc block content (paragraphs, listing blocks, etc.); it is not needed if the cell only has inline formatting
- An AsciiDoc table cell is an embedded document
- An AsciiDoc table cell inherits attributes from the parent document (though there are some exceptions)
- Line comments are removed from the content before the table cell is parsed (unlike a normal AsciiDoc document)
- The content should begin on a new line (though this is just a recommendation atm)
- References are shared with the parent document
- Counters are shared with the parent document
- Footnotes are processed independently from the parent document
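A minimal sketch of an AsciiDoc table cell: the `a` style on the second cell below tells the processor to parse the cell as an embedded AsciiDoc document, so block content works, while the first cell treats the same markup as plain text:

```asciidoc
|===
|Default cell |AsciiDoc cell

|* shown as literal text
a|
* an actual list item

----
a listing block
----
|===
```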
This issue would include moving the content (or reconciling and removing the content) from the source page linked to above, particularly the [a operator section](https://docs.asciidoctor.org/asciidoc/latest/tables/format-cell-content/#a-operator) as well as adding some important terminology (most notably "AsciiDoc table cell") and context (when do I need it, when do I not need it).