AsciiDoc Language issues
https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues
2024-03-19T09:22:46Z

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/49
Document exceptions to the nobreak role and non-breaking characters
2024-03-19T09:22:46Z
Ryan Carpenter

The nobreak role and non-breaking characters should prevent breaks from occurring within a span or string of text, but unintended breaks may occur in files generated with Prawn when the span or string includes mixed fonts.
The reason is that Prawn creates separate fragments for the different fonts, and allows breaks between the fragments.
This means that the nobreak role and non-breaking glyphs will only behave as intended when supported by the target font.
Unfortunately, this is not the case for many of the most common fonts, which typically do not include non-breaking hyphens, narrow no-break space, or zero-width word joiner.
See [discussion in the Asciidoctor Zulip chat](https://asciidoctor.zulipchat.com/#narrow/stream/288690-users.2Fasciidoctor-pdf/topic/Doc.20Suggestion.3A.20AsciiDoc.20Table.20Cell.20Implications)
Suggestions
* Document exceptions to the [nobreak role](https://docs.asciidoctor.org/asciidoc/latest/text/text-span-built-in-roles/#built-in)
* Consider adding a note about [formatting table content by cell](https://docs.asciidoctor.org/asciidoc/latest/tables/format-cell-content/) as this is a common context for using non-breaking text.
* Consider adding more information to the [Character Replacements page](https://docs.asciidoctor.org/asciidoc/latest/subs/replacements/)
* Adjust the documentation for Asciidoctor PDF about required characters on the [Prepare a Custom Font page](https://docs.asciidoctor.org/pdf-converter/latest/theme/prepare-custom-font/#required-characters) which advises that "You need to ensure these glyphs are present in your prepared font or configure a fallback font that provides them." This instruction is suitable for most of the required non-Latin characters, but not for the non-breaking ones, which will not behave as intended if these alone (and not the surrounding text) are supplied by Prawn's fallback font.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/48
Specify how location ranges are reported in the ASG
2024-03-06T07:04:04Z
Dan Allen

An implementation can optionally include location information in the ASG of the parsed document. If it does include this location information, the values must match the location information in the expected data. The purpose of this issue is to specify the logic for how these location ranges are reported.
The optional `loc` property on every element in the ASG reports the location range of that node in the source. The range consists of the starting line, column, and file position of the element and the ending line, column, and file position of the element. Both the line and column values are 1-based. The column is meant to represent a visible column in the source. The file property captures the include target stack, when applicable, starting from the first include in the stack. (It's up to code analyzing the ASG to resolve the file value to an absolute representation).
Here's an example:
```
"location": [{ "line": 1, "col": 1 }, { "line": 3, "col": 4 }]
```
We're not tracking newline characters when calculating end locations, nor are we tracking empty lines that separate blocks in the ASG.
When computing the end location of a block, the column of the trailing newline is not included. The block ends at the visible location in the source document, not at the newline that follows it. For a delimited block with a delimiter length of 4, the end column is 4, not 5. There are two reasons for this. First, it points to a column in the source where the cursor can go. Second, it ensures that the end column for a block is consistent regardless of whether it's at the end of the document or somewhere in the middle of it.
The tracked location of the document is all the lines in the document, even those that precede or follow the first and last block, respectively.
The tracked start location of a delimited block is the start of the opening delimiter line (e.g., line: 1, col: 1) and the tracked end location of a delimited block is the end of the closing delimiter line (e.g., line: 3, col: 4).
The tracked start location of a paragraph or named non-delimited block is the start of the first line of content and tracked end location of a paragraph or named non-delimited block is the last visible character of the last line of content.
The tracked start location of block metadata is the first column of the first block attribute line and the tracked end location is the last column of the last block attribute line. The location of the block metadata comes before the start of the block itself (so the entire range of the block is the start location of the metadata to the end location of the block).
Normally, the lowest column value is 1. However, there are two cases when the column must be 0. First, if a document has no blocks, then the start and end column is 0. The 0 column indicates that the source does not occupy any space. If the first line of the document is empty, then the start column is 0. The 0 column indicates that there is no content on the first line, only a newline that follows it. Similarly, if the contents of a verbatim block starts with an empty line, then the start column of the content is 0, again indicating that there is no content on the first line, only the newline that follows it. If the content has a trailing empty line, then the end column is 0 for the same reason.
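The column rules above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: `line_range_location` is a hypothetical helper, it ignores the file property, it does not cover a document with no blocks, and it approximates a visible column by counting characters.

```python
def line_range_location(lines, first, last):
    """Return a [start, end] location for a 1-based, inclusive range of lines.

    `lines` holds the source lines without their trailing newlines.
    """
    start_text = lines[first - 1]
    end_text = lines[last - 1]
    # An empty first line occupies no visible column, so the start column is 0.
    start = {"line": first, "col": 1 if start_text else 0}
    # The end column is the last visible character; the newline is never counted.
    end = {"line": last, "col": len(end_text)}
    return [start, end]


# A delimited block whose delimiter has a length of 4.
block_lines = "----\nputs 1\n----\n".splitlines()
delimited = line_range_location(block_lines, 1, 3)

# Verbatim content that starts and ends with an empty line.
padded = line_range_location(["", "puts 1", ""], 1, 3)
```

Because the newline is never counted, the end column of the delimited block is the same whether or not anything follows it in the document.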
For an inline element, if the beginning or end of the element was resolved from an attribute reference, the location should be the start or end of the attribute reference, respectively. That's because the location range is meant to track the offsets in the source, not in the resolved value.
Any other nuances of the location range should be covered by this SDR.

0.4.0 (milestone build)
Dan Allen

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/47
Draft preliminary descriptions for block parsing, structural forms, and content models
2024-02-29T18:56:58Z
Sarah White

Write up the initial specification document content for the following global block sections:
* Block element
* Block parsing
* Block structural forms
* Paragraph form
* Block content models
* Basic
* Verbatim
* Compound
* (Possibly) Block style, macro name and variant
This content evolved out of working on the paragraph content and heavily draws from the SDRs. Most of all, it needs to get off my computer before it gets buried in a sea of other experimental branches :scream:.
The information will not be complete, and may possibly mention some concepts or terms that haven't been officially discussed or approved (I'll mark such items as "under consideration", but it is important to remember that everything in the specification is pre-alpha and is highly likely to be tweaked if not significantly changed depending on new discoveries and feedback from implementations).

0.4.0 (milestone build)
Sarah White

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/46
Does AsciiDoc officially support Setext style headers?
2024-02-22T09:55:22Z
Boqian Jiang

I see that headers in AsciiDoc are described as titles. But I can't find any information about Setext style support like below:
```
A First Level Header
====================
A Second Level Header
---------------------
```
However, the above headers are both valid in the online preview and the AsciiDoc Visual Studio Code extension. My question is: is this Setext style header officially supported in AsciiDoc, or is it just an undocumented feature added for Markdown compatibility?

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/45
Document brand usage requirements
2024-02-19T22:35:44Z
Jason Porter

This issue should include rules we have from the Eclipse Foundation, and any that we as a community create relating to the usage, modification, and adoption of the AsciiDoc brand.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/44
Enforcing 2FA on Gitlab Accounts
2023-10-22T08:10:12Z
Tiago Lucas

Dear project committers, \
I would like to bring to your attention that the security team at the Eclipse Foundation will soon be requiring that accounts with committer privileges on gitlab.eclipse.org activate 2FA access control. \
The plans, along with details on the importance of this change, have been [shared on the committers mailing list](https://www.eclipse.org/lists/eclipse.org-committers/msg01397.html). \
As included in the announcement, we are opening this ticket to inform you and track the activation of 2FA on accounts belonging to this project’s members. \
To keep in mind, starting on the **30th of October** you’ll likely see a banner each time you access GitLab reminding you to activate 2FA in your account. \
The deadline is **December the 4th**, after which access to your account will be limited until you activate 2FA. It is highly recommended that you enroll in this process before the deadline.
GitLab offers [instructions](https://gitlab.eclipse.org/help/user/profile/account/two_factor_authentication.md) on every step of the process and we’re happy to answer any question you might have. \
Thank you!
/cc @mbarbero
## FAQ
### How can I activate 2FA for my [gitlab.eclipse.org](https://gitlab.eclipse.org) account?
Detailed [instructions](https://gitlab.eclipse.org/help/user/profile/account/two_factor_authentication.md) are available. In a nutshell, visit [gitlab.eclipse.org/-/profile/two_factor_auth](https://gitlab.eclipse.org/-/profile/two_factor_auth) and follow the on-screen instructions.
If the form asks you for a password in order to set up 2FA on your account, this is not your Eclipse account’s password. It is a known bug in GitLab that some accounts are asked for a “local” password despite having one in the Active Directory. \
You should request a [password reset](https://gitlab.eclipse.org/-/profile/password/edit) and use that same password for this form. This process *does not* change your Eclipse account password.
### Do I need to purchase a hardware token for account access?
No. GitLab supports two 2FA methods:
* _Time-based One Time Password_ (TOTP), compatible with mobile apps like Google Authenticator or Authy, and several password managers such as Bitwarden or 1Password.
* _WebAuthN_, which necessitates a hardware token, typically a USB key (examples include [Solo 2 key](https://solokeys.com/) or [Yubikey](https://www.yubico.com/la-cle-yubikey/yubikey-5-series/)). These tokens are sometimes referred to as FIDO2 keys.
### How will this affect my [gitlab.eclipse.org](https://gitlab.eclipse.org) accounts?
In the near future, 2FA will become mandatory for authentication on your accounts. Should you not have enrolled by the deadline we communicated to you, access to the platform will be restricted.
### I already have 2FA enabled on [gitlab.eclipse.org](https://gitlab.eclipse.org), do I need to do anything?
No, you’re all good.
### What do I do if I lose my 2FA device?
We highly recommend the utilization of diverse secondary authentication methods. In the event that you misplace all your secondary authentication elements, recovery codes will be the only way to restore account access. By securely storing your recovery codes, you'll ensure the ability to regain access.
Note that the Eclipse IT team may be able to recover access to accounts with 2FA enabled if both the 2FA credentials and account recovery methods are lost. This will require extra identity verification and direct contact with [security@eclipse-foundation.org](mailto:security@eclipse-foundation.org) or [webmaster@eclipse-foundation.org](mailto:webmaster@eclipse-foundation.org).

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/43
Decide whether the attribute value reader and inline preprocessor preemptively resolve escaped backslashes
2024-02-28T00:17:15Z
Dan Allen

The text of a paragraph goes through two phases of inline parsing, the inline preprocessor and the inline parser. The value of an attribute entry goes through two phases as well, the value assembler and the inline preprocessor. (That value will subsequently go through the inline parser when referenced in a paragraph). At each stage, there's syntax that can be escaped using a backslash. For example, in the value of an attribute entry, an attribute reference or a value continuation can be escaped using a backslash. Consider the following case:
```
:hint: Use \{backslash} to insert \\
```
When the `hint` attribute is referenced in a paragraph, we expect to see the following in the rendered document:
```
Use {backslash} to insert \
```
We need to consider how we end up at that result.
There are two strategies for how escaped backslashes can be handled as the processor works through the inline parsing phases.
## Strategy 1: Resolve escaped backslashes per phase (strict)
Following this strategy, each time the processor looks for escaped backslashes, it resolves (or normalizes) them. What that entails is consuming the odd backslash (if present) as an escape, then reducing the number of backslashes by half. Consider this sequence:
```
\\\
```
That would resolve to:
```
\
```
The benefit of this strategy is that it can account for every permutation of backslash escaping. If you want the backslash to be treated as a literal backslash, you just add more backslashes. However, this strategy quickly leads to the leaning toothpick problem...which is essentially an exponential increase of required backslashes.
Let's assume that we start with the following AsciiDoc source:
```
:command: *begin*
:text: Use ??
{command} to begin a block.
```
We want to see the following in the output document:
```
Use \<strong>begin</strong> to begin a block.
```
The question is, how many trailing backslashes do we need to use in place of ?? to produce a literal backslash without impacting the attribute reference and text formatting it contains? The answer is, we need 9.
```
:text: Use \\\\\\\\\
{command} to begin a block.
```
The last backslash acts as a value continuation. Then, it reduces the even number of backslashes that precede it by half, leaving us with 4. At this stage, this is what the processor sees:
```
Use \\\\{command} to begin a block.
```
Now we resolve the attribute reference, once again reducing the even number of backslashes that precede it by half. At this stage, here's what the processor sees:
```
Use \\*begin* to begin a block.
```
When the `{text}` attribute reference is used in the paragraph, the inline parser will locate the escaped backslash in the resolved value and once again reduce the backslashes by half, reducing it to 1 (which will not impact the text formatting). Thus, we arrive at the following result:
```
Use \<strong>begin</strong> to begin a block.
```
While this works, it's hard to explain to an author—especially someone not familiar with the low-level phases—why 9 backslashes are needed. Thus, I think we should consider strategy 2.
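The count of 9 can be checked with a small sketch of the per-phase reduction described above. The helper names `reduce_run` and `trace_phases` are hypothetical and only model the arithmetic on a single run of backslashes, not a real parser.

```python
def reduce_run(count):
    """Strictly resolve a run of backslashes.

    Returns a pair: whether an odd backslash was consumed as an escape
    (or continuation), and how many literal backslashes remain.
    """
    return count % 2 == 1, count // 2


def trace_phases(count, phases):
    """Record the length of the run after each reducing phase."""
    history = [count]
    for _ in range(phases):
        _, count = reduce_run(count)
        history.append(count)
    return history


# Value assembler, inline preprocessor, inline parser: three reducing phases.
history = trace_phases(9, 3)
```

The trace goes from 9 to 4 to 2 to 1, which matches the walkthrough: the odd backslash is consumed as the value continuation in the first phase, and each later phase halves what is left.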
## Strategy 2: Only resolve escaped backslashes once, during inline parsing
In this strategy, the escaped backslashes are still considered at each phase, but they are left as is until inline parsing (the last phase). That way, they remain stable through the phases rather than being reduced at each stage. As a result, the user only needs to escape a backslash once.
Revisiting the previous example, the author only needs 3 trailing backslashes to achieve the desired result.
```
:command: *begin*
:text: Use \\\
{command} to begin a block.
```
The odd backslash is consumed as the value continuation. The remaining escaped backslash is reduced to a literal backslash by the inline parser.
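The difference between the two strategies comes down to how many phases halve the run. The following sketch is simple arithmetic under that assumption; `required_trailing_backslashes` is a hypothetical name, not part of any implementation.

```python
def required_trailing_backslashes(literal, reducing_phases, continuation=True):
    """Count the backslashes an author must type at the end of the line.

    Each reducing phase halves the run, so the literal count is doubled
    once per phase. One more backslash is added when the line also needs
    a value continuation.
    """
    return literal * 2 ** reducing_phases + (1 if continuation else 0)


# Strategy 1: three phases each reduce the run.
strict = required_trailing_backslashes(1, 3)
# Strategy 2: only the inline parser reduces the run.
once = required_trailing_backslashes(1, 1)
```

Under these assumptions `strict` is 9 and `once` is 3, the two figures quoted in the text.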
The drawback of this strategy is that it's not possible to use a backslash to escape the resolved value of an attribute. Let's assume that we want the following output instead:
```
Use *begin* to begin a block.
```
If we use `\\{command}`, then we're going to end up with `\\*begin*` rather than `\*begin*`. So we've sacrificed some flexibility for simplicity. However, there's still a mechanism available to achieve the desired result. If we set the `esc` attribute to a single backslash, then it becomes possible to insert an escape character in front of the resolved value of the attribute. Consider this case:
```
:esc: \
:text: Use {esc}\
{command} to begin a block.
```
Now when we reference `{text}`, the inline parser will see `\*begin*`. That means the output will show:
```
Use *begin* to begin a block.
```
**NOTE:** The value of the implicit `backslash` attribute will need to be `\\` rather than `\` so it produces a literal backslash as expected.
## Proposed decision
Given that the audience for AsciiDoc is more than just programmers, I think the simplistic approach is best here. We want to avoid the leaning toothpick problem, and we want to be able to easily explain the AsciiDoc rules without having to make the user aware of all the low-level phases. There are still plenty of mechanisms available in AsciiDoc to escape syntax without having to rely on strict backslash escaping.
It's worth noting that none of the scenarios mentioned in this issue are even available in Asciidoctor or its predecessor. That's because both only consider whether the character that immediately precedes the reserved syntax (e.g., an attribute reference) is a backslash, not whether that backslash is itself escaped. So this issue is primarily a refinement of #25.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/42
Decide whether a list continuation is a universal interrupting line
2024-02-29T18:55:49Z
Dan Allen

A list continuation acts as an interrupting line (for a paragraph, principal text, or wrapped attribute entry value) inside of a list. This is well established. The question is whether the list continuation should act as an interrupting line outside of a list.
From a purist perspective, there's no reason for a list continuation to act as an interrupting line outside of a list since it has no meaning there. However, there are good reasons to make it universal.
The first reason is that both Asciidoctor and its predecessor implement this rule. Consider the following AsciiDoc source:
```
foo
+
bar
```
Both implementations produce two paragraphs, with the first paragraph ending before the list continuation and the second paragraph starting with the list continuation (since it isn't otherwise consumed).
```
<p>foo</p>
<p>+
bar</p>
```
That may be reason enough to keep this rule. But even from the standpoint of wanting to clarify and refine the language, there's still good reason to retain the list continuation as a universal interrupting line. It means that an implementation doesn't need to maintain two separate rules for blocks with implicit boundaries. Currently, those blocks are the paragraph and the attribute entry (in the case the value continuation is used)...and anything that uses them as alternatives (which can have quite a ripple effect on the grammar). Making it universal also means it may be possible for the line preprocessor to not have to keep track of when a line is inside a list or not, thus making it more lightweight.
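A minimal sketch of the universal rule follows, assuming the only interrupting lines are an empty line and a lone `+`. The function `split_paragraphs` is hypothetical and ignores every other block type.

```python
def split_paragraphs(lines):
    """Group lines into paragraphs, treating a lone "+" as an interrupting line."""
    paragraphs = []
    current = []
    for line in lines:
        if line == "":
            # An empty line ends the current paragraph and is not kept.
            if current:
                paragraphs.append(current)
                current = []
        elif line == "+" and current:
            # The list continuation interrupts the paragraph. Since nothing
            # else consumes it here, it starts the next paragraph.
            paragraphs.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return ["\n".join(paragraph) for paragraph in paragraphs]


result = split_paragraphs(["foo", "+", "bar"])
```

The rule is applied without checking whether the parser is inside a list, which is the simplification the universal approach allows.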
Encountering a list continuation outside of a list is very rare. Given that there are implementation benefits and a guarantee of backward compatibility, I think that is justification for keeping it this way.

0.3.0 (milestone build)
Dan Allen

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/41
Clarify syntax and parsing rules for continuing an attribute entry value across multiple lines
2024-02-27T21:00:43Z
Dan Allen

Most of the time, an attribute entry occupies a single line. For example:
```
:source-language: java
```
When the value is very long, the AsciiDoc syntax allows that value to be split across multiple lines by ending each previous line in a backslash, called an *attribute continuation*. This feature is inspired by shell interpreters, such as Bash. For example:
```
:description: This page is a migration guide. \
It only covers the migration between each LTS release.
```
The attribute continuation has never been very well defined beyond a basic example. This issue aims to resolve the syntax and parsing rules while also making the feature more robust and universal.
The attribute continuation serves two purposes. First, it tells the parser to append the next line to the value as long as that line is not an interrupting line. If the line is taken, the continuation and the newline that follows it are dropped. If the line is not taken, the continuation is preserved (meaning it remains as part of the value), but not the trailing newline.
Thus, the resolved value of the previous example is as follows:
```
This page is a migration guide. It only covers the migration between each LTS release.
```
**NOTE:** In addition to the interrupting lines for a paragraph, an attribute entry is interrupted by an adjacent attribute entry. Asciidoctor does not always get this requirement right. (Also, it's still unclear whether a list continuation should only be an interrupting line when inside of a list, or at any time).
Both Asciidoctor and its predecessor required the attribute continuation to be preceded by a space. However, this is an unnecessary requirement and it makes it impossible to continue the value without introducing a space. It should be possible to use the continuation directly at the end of the line.
```
:product-code: ISV-\
1234
```
This attribute entry would produce the value `ISV-1234`.
Any time we rely on a character to have special meaning, especially a backslash, it should be possible to escape that character. Like with the inline preprocessor, we will want to apply contextual escaping here. What that means is that if there are an even number of backslashes at the end of the line, the last backslash does not act as an attribute continuation and those backslashes are reduced by half. If there are an odd number, the last backslash is an attribute continuation and the remaining backslashes are reduced by half. Escaped backslashes anywhere else in the line are not considered.
Here's an example of how to use a literal backslash at the end of a value:
```
:instructions: escape markup using \\
```
However, keep in mind that most of the time this won't be necessary. That's because the backslash is preserved if the attribute entry is interrupted, which it almost always is. So this is unlikely to affect existing documents. Consider this case:
```
:instructions: escape markup using \

{instructions}
```
Here's an example of how to use a literal backslash and then continue the value:
```
:instructions: escape an autolink using \\\
https://example.org
```
Again, these are pretty rare events, so we're just defining the rules for completeness.
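The escaping rules above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `assemble_value` is a hypothetical name, the list it receives contains only the first value line plus the lines the parser would take (so reaching the end of the list stands in for an interrupting line), and indentation handling is left out.

```python
def assemble_value(lines):
    """Assemble an attribute value from its first line and any continued lines."""
    value = ""
    for index, line in enumerate(lines):
        text = line.rstrip("\\")
        count = len(line) - len(text)  # number of trailing backslashes
        # Each escaped pair at the end of the line becomes one literal backslash.
        value += text + "\\" * (count // 2)
        if count % 2 == 0:
            break  # no continuation, so the value ends on this line
        if index == len(lines) - 1:
            value += "\\"  # continuation not taken, so it is preserved
    return value


joined = assemble_value(["ISV-\\", "1234"])
escaped = assemble_value(["escape markup using \\\\"])
interrupted = assemble_value(["escape markup using \\"])
both = assemble_value(["escape an autolink using \\\\\\", "https://example.org"])
```

Only the backslashes at the very end of a line are examined, so escaped backslashes elsewhere in the line pass through untouched, as the rule requires.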
An attribute continuation allows the continued value to be aligned with the value on the previous line. Yet, the indentation is dropped from the value. Consider this case:
```
:description: This page is a migration guide. \
              It only covers the migration between each LTS release.
```
Shell interpreters also support this feature. In shell interpreters, the repeating spaces are always normalized to a single space. However, I don't think we want that behavior. Instead, all leading indentation should be removed and only the space to the left of the attribute continuation should be kept. That gives the user better control over where the space ends up in the resolved value.
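The proposed indentation handling could look like the sketch below. It assumes every line but the last ends in a single, unescaped continuation, and `join_continued` is a hypothetical helper that ignores backslash escaping.

```python
def join_continued(lines):
    """Join wrapped value lines, dropping indentation from continued lines."""
    pieces = []
    for index, line in enumerate(lines):
        if index > 0:
            # Leading indentation on a wrapped line is not part of the value.
            line = line.lstrip(" \t")
        if line.endswith("\\") and index < len(lines) - 1:
            # Drop the continuation but keep any space to its left.
            line = line[:-1]
        pieces.append(line)
    return "".join(pieces)


description = join_continued([
    "This page is a migration guide. \\",
    "              It only covers the migration between each LTS release.",
])
```

The single space before the continuation is the only separator that survives, which gives the author control over where the space ends up.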
Of course, we have to consider whether we even want to normalize the spaces at all or just keep them as entered. In other words, do we want to encourage this style of formatting in the AsciiDoc source, or should the wrapped line always start at the left margin?
The final point to consider is how to specify a hard wrap. Consider the case when the value of the attribute entry is going to be used in a verbatim block or a paragraph with the hardbreaks option. The author is going to want to be able to preserve the newlines in the attribute value so that they carry over. But this is not possible in AsciiDoc.
Asciidoctor offers a partial compromise by enhancing the attribute continuation to recognize a hard line break shorthand before the continuation. When Asciidoctor detects this case, it preserves the newline. Consider this case:
```
:lines: one + \
two + \
three
```
The resolved value would be as follows:
```
one +
two +
three
```
This is not a general-purpose feature, and thus I think we can do better. I see several possible ways to express that the newline should be preserved, and there's no need to link it to the hard line break shorthand.
The first option is to use a double attribute continuation offset by a space. For example:
```
:lines: one\ \
two\ \
three
```
The escaped space in front of the attribute continuation would tell the processor to keep the newline after the attribute continuation. This is not likely a syntax that would interfere with content. However, it may be costly to parse.
Another option is to take a page from YAML and use the `|` character in front of the continuation as a hint to keep the newline.
```
:lines: one|\
two|\
three
```
However, the risk here is that the pipe character is used to separate table cells, so it could cause an AsciiDoc table cell to end prematurely. Though it could be escaped in that case.
Yet another option is to take a hint from Markdown and use multiple spaces in front of the continuation as a hint to preserve the ensuing newline.
```
:lines: one  \
two  \
three
```
This may be the safest and most portable option, and it's not terribly difficult to parse. It's rare that you need spaces at the end of a line, so we're able to take advantage of characters that would otherwise have no meaning. That's ideal for introducing a new feature. When newlines are preserved, indentation on wrapped lines is also preserved.
We can apply this to the earlier example of the partial syntax offered by Asciidoctor to see how it compares:
```
:lines: one +  \
two +  \
three
```
It's nearly the same syntax, but now it's not coupled to the hard line break shorthand.
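One way the two-space hint could be handled is sketched below. The issue does not say what happens to the marker spaces themselves, so dropping them is an assumption made here, and `assemble_with_hard_wraps` is a hypothetical name. Backslash escaping and indentation are ignored.

```python
def assemble_with_hard_wraps(lines):
    """Join value lines, keeping the newline when two spaces precede the continuation."""
    value = ""
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if is_last or not line.endswith("\\"):
            value += line  # an unused continuation stays in the value
            break
        text = line[:-1]
        if text.endswith("  "):
            # Assumption: the marker spaces are dropped and the newline is kept.
            value += text.rstrip(" ") + "\n"
        else:
            value += text
    return value


plain = assemble_with_hard_wraps(["one  \\", "two  \\", "three"])
shorthand = assemble_with_hard_wraps(["one +  \\", "two +  \\", "three"])
```

A continuation preceded by a single space still joins the lines without a newline, so existing wrapped values would be unaffected under this sketch.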
In summary:
* An attribute entry is interrupted by an adjacent attribute entry or paragraph interrupting line
* An attribute value can be continued to the next line by ending the line in an attribute continuation (trailing backslash)
* If the attribute continuation is unused, it is preserved at the end of the value
* Indentation is removed from wrapped lines
* The attribute continuation can be escaped using a backslash (any even number of backslashes at the end of the line)
* Newlines in an attribute value can be preserved by preceding the attribute continuation with two spaces

0.4.0 (milestone build)

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/40
Define how document attributes are represented in the ASG
2024-02-14T23:23:52Z
Dan Allen

As part of defining a semantic representation of an AsciiDoc document, we must determine where document attributes fit within that structure. It's not enough for document attributes to reside as state in the parser since they can impact how nodes are interpreted after parsing is complete. In other words, the document attributes have to be tracked and represented in the ASG.
To start, we first need to establish that a document attribute is more than just a map entry (name/value pair). Where and how the attribute was declared (set or unset) matters too. Thus, we need to store document attributes as a value record. That record has the following properties:
* value - the value of the attribute if set, or `null` if explicitly unset
* source - records how the attribute was set; an enum that is one of the following values: external, header, body, intrinsic
* writable - whether the attribute can be changed or unset (i.e., mutable)
Thus, a document attribute entry consists of a name and value record. These entries are stored in an insertion-ordered map, where the map keys are the attribute names.
Let's assume the document header contains the following attribute entry:
```
:toc: left
```
Here's an example of the resulting entry in the document attributes map:
```
{
  toc: {
    value: 'left',
    source: 'header',
    writable: true,
  }
}
```
With the information provided by the value records, it's possible to find document attributes that are immutable (i.e., locked), document attributes that have been explicitly unset, document attributes that are implicit, document attributes set from the API or CLI (and not later overridden), etc. This is all information that would not have been available if the document attributes were stored with their value and only when set.
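As an illustration of the kind of queries the value record enables, here is a sketch in Python. The class names `Source` and `ValueRecord` and the `sectnums` entry are assumptions made for the example, not names from the ASG schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    EXTERNAL = "external"
    HEADER = "header"
    BODY = "body"
    INTRINSIC = "intrinsic"


@dataclass
class ValueRecord:
    value: Optional[str]  # None means the attribute was explicitly unset
    source: Source
    writable: bool = True


# A plain dict preserves insertion order in Python 3.7 and later.
attributes = {
    "toc": ValueRecord("left", Source.HEADER),
    # Hypothetical entry: unset and locked from the API or CLI.
    "sectnums": ValueRecord(None, Source.EXTERNAL, writable=False),
}

locked = [name for name, record in attributes.items() if not record.writable]
unset = [name for name, record in attributes.items() if record.value is None]
from_header = [name for name, record in attributes.items() if record.source is Source.HEADER]
```

None of these three queries could be answered from a map that stores only the values of attributes that are set.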
The document attributes stored on the document node in the ASG should be stored using this value record. That map should be the compiled collection of document attributes up to the end of the document header (thus discarding any intermediate state).
AsciiDoc permits attributes to be set or unset using attribute entries in the document (in the header or in the body). Not all document attributes are declared this way. However, for those that are, those entries should be tracked in the ASG. Those records will need to be stored differently. That brings us to attribute entry nodes.
An attribute entry is recorded in an insertion-ordered map (or should it be an array?) for each group of attribute entries. In this case, the value of the entry can be the attribute value if set, or `null` if unset. An attribute declared using an attribute entry is always writable and the source is implicit, so there's no reason for using a value record in this case.
For attributes declared in the header, these attribute entries are stored on the `attributeEntries` property on the value of the `header` property on the document. For attributes declared in the body, these attribute entries are either stored on the `attributeEntries` property on the ensuing block, or stored as a sibling node of that block. (Prototyping is required in order to resolve this choice).
While it's true that all attributes declared in the header will already be represented in the map of attributes on the document, it's important to track them to ensure the document was correctly parsed. (As an alternative, the document attributes map could be all attributes set before the header, then the header attributes could be replayed onto the proxy, just like attributes declared in the body).
Within any attribute entries map, only the last entry for a given attribute name in that scope should be stored (i.e., locally impactful). Hence, if the attribute is set, then unset, no entry is stored. If the attribute is set, then set again, the second value is used. If the attribute is unset, then set, the set value of the attribute is stored. If the attribute is not mutable, the entry is discarded (since the attribute entry is effectively nullified).
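The folding rules above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `compactAttributeEntries`, the `{ name, value }` entry shape, and the `isWritable` predicate are hypothetical, and an attribute that is only unset in the scope is assumed to be stored as `null`.

```javascript
// entries: array of { name, value } in document order; value === null means unset.
// isWritable: hypothetical predicate reporting whether the attribute may be changed.
function compactAttributeEntries (entries, isWritable = () => true) {
  const result = new Map()
  for (const { name, value } of entries) {
    if (!isWritable(name)) continue // the entry is effectively nullified
    if (value === null && result.get(name) != null) {
      result.delete(name) // set, then unset: no entry is stored
    } else {
      result.set(name, value) // last set wins; a lone unset is stored as null (assumption)
    }
  }
  return result
}

const entries = [
  { name: 'icons', value: 'font' },
  { name: 'icons', value: null },
  { name: 'toc', value: 'left' },
  { name: 'toc', value: 'right' },
  { name: 'sectnums', value: null },
  { name: 'sectnums', value: '' },
  { name: 'locked', value: 'ignored' },
]
const compacted = compactAttributeEntries(entries, (name) => name !== 'locked')
console.log([...compacted]) // [ [ 'toc', 'right' ], [ 'sectnums', '' ] ]
```

Sequences not covered by the rules above (for example unset, set, unset) simply fall out of the same two branches here; the issue does not say how they should behave.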
As the body of the document is parsed, the effective document attributes need to be tracked. However, the map of document attributes as it exists at the end of the header should not be modified. Instead, that object should be proxied. Any updates to the map from attribute entries in the body should be applied to the proxy during parsing. That way, attribute entries in the body don't impact the state of the document attributes bound to the document. Yet the parser will still have access to the updated map as parsing proceeds.
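One way to picture the proxying described above is a layered view that reads through to the header-time map but records its own writes. This is a sketch only; `createAttributesView` and the `'body'` source value assigned to overrides are assumptions for illustration, not part of any existing implementation.

```javascript
// Returns a view over `base` (a Map of value records). Writes are stored in the
// view only, so the map bound to the document is never modified.
function createAttributesView (base) {
  const overrides = new Map()
  return {
    get (name) {
      return overrides.has(name) ? overrides.get(name) : base.get(name)
    },
    // value === null means the attribute entry unsets the attribute.
    set (name, value) {
      const current = this.get(name)
      if (current && !current.writable) return false // locked attributes are left alone
      overrides.set(name, { value, source: 'body', writable: true })
      return true
    },
  }
}

const documentAttributes = new Map([
  ['toc', { value: 'left', source: 'header', writable: true }],
  ['experimental', { value: '', source: 'external', writable: false }],
])
const view = createAttributesView(documentAttributes)
view.set('toc', 'right') // applied to the view only
view.set('experimental', null) // ignored because the attribute is locked

console.log(view.get('toc').value) // right
console.log(documentAttributes.get('toc').value) // left
```

The parser would consult `view` while it works through the body, whereas the ASG keeps exposing the untouched `documentAttributes` map.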
If a node is accessed at random from the parsed document (ASG), the view of the document attributes needs to be built on demand by proxying the document attributes map on the document and replaying all the attribute entries in the body up to the start of the node. These proxy objects could be cached to avoid repeated and redundant computations.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/39
Clarify how empty lines and list continuations impact list boundaries
2024-02-27T21:17:44Z
Dan Allen
Lists have implicit boundaries in AsciiDoc (and in most lightweight markup languages). Hence, a common matter for an author is how to maintain the boundaries of a list or how to break out of them. Our goal is to ensure that it's easy for an author to keep a list together when needed, but also easy to separate lists when they shouldn't be adjoined.
In AsciiDoc, there are two forms that impact this outcome, the list continuation and the empty line. In this issue, we'll clarify what impact these two forms have on list parsing with the intent to solidify the rules of list boundaries.
## Scenarios
To understand how the grammar rules are defined, we'll be examining several scenarios:
* **[l-1]** Empty lines between list items (after the list item definitively ends)
```
* first item

* second item
```
* **[l-2]** Empty lines above a block attached with a list continuation, as well as between its metadata lines
```
* item
+

attached block
```
```
* item
+

[#idname]

attached block with an ID
```
* **[l-3]** Empty lines above a new list with or without block metadata following a list item
```
* item

. nested list
```
```
* item

[]
. nested or sibling list?
```
* **[l-4]** Empty lines above an indented (literal) block with or without block metadata following a list item
```
* item

 indented
```
```
* item

[]
 indented
```
Although it won't be discussed in this issue, an empty line above a list continuation applies that list continuation to an ancestor list. The number of empty lines equates to how many levels it ascends (e.g., one empty line means it applies to the parent).
### l-1
Let's start with **[l-1]**, since a decision here sets the foundation for what rules are available for the other scenarios. We often refer to **[l-1]** as ventilated list items. The reason is, authors have a tendency to want to put some space between list items to make them more readable. The question is, how much space is allowed? Consider the following case:
```
* first item

* second item
```
In pre-spec AsciiDoc, any number of empty lines is tolerated between list items and the list will still stay together. To tighten this rule, and make it easier to separate lists, one proposal is to only permit a single empty line between list items. Any more and the list would be severed. However, this proposal could have major compatibility implications as there are many documents in the wild that rely on arbitrary ventilation. Furthermore, this new rule would be inconsistent with other lightweight markup languages such as Markdown and reStructuredText. It just seems to be part of the unwritten code of lightweight markup languages to allow list items to be separated by an arbitrary number of empty lines. (One notable exception is Textile.) If we decide to honor that code, then we won't be able to allow adjacent lists that have the same marker to be separated using empty lines alone.
Currently, the main way to separate adjacent lists that are congruent (i.e., same list marker) is to insert a block attribute line between them (with or without a preceding empty line). The block attribute line can be empty (i.e., nothing between the square brackets). Since a sibling list item cannot have metadata lines above it, this line effectively acts as an interrupting line. As a result, it causes the first list to end and a new one to begin. For example:
```
* first list
[]
* second list
```
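As a rough illustration of the two rules discussed so far (empty lines never split a list, while a block attribute line does), here is a deliberately small sketch. It assumes single-line items that all use the `*` marker; the names `splitLists`, `BlockAttributeLineRx`, and `ListItemRx` are hypothetical and this is not how any existing parser is implemented.

```javascript
const BlockAttributeLineRx = /^\[.*\]$/
const ListItemRx = /^\* (.+)$/

// Groups congruent list items into lists, honoring the interrupting line.
function splitLists (source) {
  const lists = []
  let current
  for (const line of source.split('\n')) {
    if (line === '') continue // any number of empty lines is tolerated between items
    if (BlockAttributeLineRx.test(line)) {
      current = undefined // interrupting line: the next item starts a new list
      continue
    }
    const match = ListItemRx.exec(line)
    if (!match) continue // other line types are out of scope for this sketch
    if (!current) lists.push((current = []))
    current.push(match[1])
  }
  return lists
}

console.log(splitLists('* a\n\n\n* b')) // [ [ 'a', 'b' ] ]
console.log(splitLists('* first list\n[]\n* second list')) // [ [ 'first list' ], [ 'second list' ] ]
```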
This technique also works if the second list is preceded by a block title line, though it usually has to follow an empty line in order to be recognized as a block title line (typical rules).
If the lists are not congruent, the empty line above the block attribute line is required (to account for the scenario in **[l-3]**).
To make the intent of the interrupting line more clear, a non-functional option could be used to communicate the block attribute line's function:
```
* first list
[%interrupt]
* second list
```
Another technique to keep them apart is to enclose one of the lists in an open block:
```
* first list
~~~~
* second list
~~~~
```
We are still considering whether there are other ways to separate adjacent lists, such as using a line comment. However, we generally prefer comments to not impact parsing, so this may not be pursued. Either way, it will be addressed in a separate issue.
### l-2
Let's now consider **[l-2]**. Here, there's an argument to once again be tolerant of multiple empty lines, but for different reasons. Before going on, it's important to emphasize that the list continuation cannot be preceded by an empty line (otherwise, it becomes a list continuation for an ancestor list). When the list continuation is found, it effectively tells the parser to expect a single block. Normally, a block can have empty lines above it (either above the metadata or in between it). Thus, it seems like it would be safe and consistent to allow them here too. The intent of the author is clear: "find one block to attach". If empty lines were not tolerated, then the parser would essentially be ignoring the request of the author and leaving the list continuation dangling.
The AsciiDoc style guide should certainly encourage authors to not leave empty lines after a list continuation as it makes the attachment less clear. But from the standpoint of the parser, there's no real benefit of giving empty lines special meaning here.
### l-3
In a list, there are two cases of an implicit list continuation, **[l-3]** a nested list and **[l-4]** an indented (literal) block. In these cases, the rules about when empty lines are tolerated are more strict.
Let's look at **[l-3]** first. If an adjacent list is encountered that's different and has no metadata lines, that list is attached as a child of the current list item regardless of how many empty lines are above it. Again, this comes from the empty line tolerance in lists across lightweight markup languages. While we could forbid consecutive empty lines, we'd be introducing a special rule just for this case, which will be hard to remember.
The primary question is, how many empty lines should be permitted if the adjacent list has metadata lines? Typically in AsciiDoc, a block attribute line acts as an interrupting line. Intuition would then tell us that a block attribute line above an adjacent list will cause the previous list to end. Consider this case:
```
* disc
[square]
** square
```
However, the AsciiDoc syntax grants a special exception here. If there's no empty line above the block attribute line, it acts as though there's an implicit list continuation above it. Thus, the second list becomes a child of the list item in the first list (hence a nested list).
But what happens if there's an empty line above the block attribute line? Consider this case:
```
* disc

[square]
** square
```
Now we are torn between two standard rules. On the one hand, we said earlier that a block attribute line is one way to separate adjacent lists (i.e., prevent nesting). On the other hand, there's an implicit list continuation above an adjacent list when that list is different.
There are two possibilities here. The first choice is that we stick with the idea that *empty line + block metadata line* acts as an interrupting line with a list. This rule matches pre-spec AsciiDoc. In that case, here's how the second list would need to be attached if preceded by an empty line:
```
* disc
+
[square]
** square
```
The second choice is that we tolerate at least one empty line, but not consecutive ones. This rule borrows from an earlier proposal. Since nowhere else in the AsciiDoc syntax do consecutive empty lines have a different meaning than a single empty line (especially above a block), I think the second rule would be a risky choice to introduce here. I'm inclined to reject the idea.
### l-4
Finally, we arrive at **[l-4]**. Like with a nested list, an indented (literal) block has an implicit list continuation. If the indented block has no metadata lines, then it must be offset by at least one empty line or else it gets soaked up as part of the list item principal. Consider this case:
```
* item

 indented
```
Since we've already established that a nested list without metadata lines can be preceded by an arbitrary number of empty lines, it's both logical and consistent to allow it in this case as well.
Once again, we need to consider what happens if the indented block has metadata lines. Consider this case:
```
* item

[.output]
 indented
```
Pre-spec does not apply the implicit list continuation if the indented block has at least one metadata line. So the indented block would not be attached to the list item in this case. Instead, it would require an explicit list continuation to do so. However, if we want the rules of an implicit list continuation to be consistent, then we **should** attach the indented block if not preceded by any empty lines:
```
* item
[.output]
 indented
```
The block attribute line interrupts the list item principal, so the indented block should be a candidate for attachment in this case. This is not supported in pre-spec AsciiDoc, but we could add it now.
## Summary and decisions
As we've stated in other issues, while formalizing AsciiDoc, we're trying to remain as consistent with how the language is currently interpreted as possible. At the same time, we need to address idiosyncrasies so the language is easy to understand, remember, and use.
With that in mind, we want to make it easy to keep lists together and also easy to separate them. The main subject of concern is empty lines. When are empty lines tolerated in a list, and are consecutive empty lines allowed? We established that pre-spec AsciiDoc, and lightweight markup languages in general, are quite tolerant of empty lines in a list. Any number of empty lines is permitted between list items of the same list, and empty lines are permitted following an explicit or implicit list continuation.
We considered the proposal of assigning meaning to consecutive empty lines so they act like a list interrupting line. While enticing, this proposal would greatly threaten compatibility and deviate from the unwritten code of lightweight markup languages. Thus, we don't think it's worth the risk.
We then clarified that adjacent lists can be separated using an empty line followed by a block attribute line. (If the two lists are congruent, the empty line is not required.) This is a pattern that was heavily promoted in pre-spec AsciiDoc and, as a result, plenty of documents now depend on it. It offers a definitive way to separate lists that can be made to be self-documenting.
We then accepted that empty lines should be permitted above a block attached using an explicit list continuation. The justification is that the intent of the explicit list continuation is clear and there's no reason to counter that intent by giving empty lines special meaning. The goal of the list continuation is to find a block, and the parser should proceed until it does.
We then considered whether an empty line or lines should be permitted in the case an implicit list continuation is being used to attach a block with metadata lines. Here we decided that the syntax should not be tolerant of empty lines. The reason is that it would break the contract that *empty line + block attribute line* can be used to separate adjacent lists. The block metadata line either must not be preceded by an empty line or the list must be attached using an explicit list continuation.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/38
Decide whether a non-indented line interrupts an indented block form
2024-02-27T20:59:42Z
Dan Allen
An indented block form is defined as one or more contiguous lines indented by at least one space. This is an implicit structure that produces a literal block in the parsed document (the ASG).
(In pre-spec AsciiDoc, this was referred to as a literal paragraph, but we've since decided to name it a literal block with the indented form to make the terminology more accurate and consistent).
A question that has come up when defining the grammar is what to do if a subsequent line is not indented (and not otherwise an interrupting line). In other words, can a paragraph interrupt an indented literal block? Consider the following case:
```
 indented
not indented
```
In both Asciidoctor and its predecessor, the non-indented line does not interrupt the indented block form. Thus, only the first line has to be indented by at least one space. This parsing behavior mandates that an adjacent paragraph must be separated by at least one empty line. In other words, a non-indented line cannot interrupt the indented block form, but is rather consumed as part of it.
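The difference between the current behavior and the alternative raised by the question above can be sketched with a small line reader. The function name `readIndentedBlock` and its `interruptOnNonIndented` option are hypothetical; other interrupting lines (list items, block attribute lines, and so on) are ignored to keep the sketch short.

```javascript
// Collects the lines of an indented block form starting at index `start`.
// interruptOnNonIndented: false models the behavior described above (only the first
// line must be indented); true models a non-indented line acting as an interrupting line.
function readIndentedBlock (lines, start, { interruptOnNonIndented = false } = {}) {
  if (!lines[start]?.startsWith(' ')) return []
  const block = [lines[start]]
  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i]
    if (line === '') break // an empty line always ends the block
    if (interruptOnNonIndented && !line.startsWith(' ')) break
    block.push(line)
  }
  return block
}

const lines = [' indented', 'not indented', '', 'paragraph']
console.log(readIndentedBlock(lines, 0)) // [ ' indented', 'not indented' ]
console.log(readIndentedBlock(lines, 0, { interruptOnNonIndented: true })) // [ ' indented' ]
```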
There are two reasons why this behavior may be problematic:
* It's not consistent with other markup languages, Markdown in particular. (rST also treats it as an interrupting line, though the indented block is a blockquote)
* According to CommonMark, "A blank line is not needed ... between a code block and a following paragraph."
* It makes the rule more nuanced to explain and to identify in the source.
There's one other important reason this should be considered. The next list item should be allowed to interrupt the indented block.
```
* first item

 indented
* second next item
```
However, it currently is not permitted, which is definitely surprising. And yet the list item is permitted to interrupt an attached paragraph. The interruption rules just seem inconsistent in this regard.
It's very unlikely that existing documents rely on this behavior since the general practice is to surround the indented block by empty lines. But in the event that it does occur, the parser must have deterministic behavior. I think we should at least discuss changing the rule so that a non-indented line acts as an interrupting line, meaning it's not consumed as part of the indented block.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/37
Draft specification content for strong span
2024-02-27T19:59:48Z
Sarah White
Write up the initial specification document content for the strong span. This content will also act as an example of the writing guidelines proposed in #34 and should be used as a working reference along with !17 (the paragraph block written up using the guidelines).
This issue requires:
* A brief non-normative description of the element
* Source examples included from the TCK
* Context section with content
* Content model section with content
* Attributes and metadata section with content
* ASG and DOM placeholder section
The issue requirements might change depending on feedback provided in issue #34.

0.3.0 (milestone build)
Sarah White

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/36
Clarify when named positional attributes are mapped and their precedence
2024-02-29T18:56:11Z
Dan Allen
AsciiDoc supports a feature in which positional attributes are mapped onto named attributes if the block provides this mapping. One such example is the alt (text), width and height attributes on the block macro. These are known as named positional attributes, or posattrs for short. We need to clarify when positional attributes are mapped to names and processed and what precedence they have over regular named attributes, if any.
Let's consider a block image macro:
```
image::diagram.png[Diagram,300,400]
```
The posattrs for an image block are `alt,width,height`. That means that the first positional attribute will be assigned to the `alt` attribute, the second positional attribute will be assigned to the `width` attribute, and the third to the `height` attribute.
The first question is, when does this happen? To be consistent with Asciidoctor, this mapping occurs after all the attrlists have been parsed and once the block is known. In the block grammar, we may choose to do this assignment in the action for any block (the final action).
```js
if (posattrs) {
for (let i = 0, num = posattrs.length; i < num; i++) {
const posKey = `$${i + 1}`
if (posKey in attributes) attributes[posattrs[i]] = attributes[posKey]
}
}
```
In order to communicate the list of named positional attributes, the grammar action for a specific block would have to pass this through using a reserved property that's later deleted.
```js
node.posattrs = ['alt', 'width', 'height']
```
We need to consider what happens if both the positional and named attributes are specified. For example:
```
image::diagram.png[Diagram,300,400,alt=A symbolic representation of something]
```
What is the value of the `alt` attribute in this case? We could say that if the attribute is already assigned, it cannot be overwritten by a named positional attribute. Keep in mind that block attributes could be defined in block attribute lines too:
```
[Diagram]
image::diagram.png[,300,400,alt=A symbolic representation of something]
```
It seems obvious in this case that the named `alt` attribute would win out. But note that we can't rely on the order in which the attributes are specified since we won't have that information at the time the posattrs are mapped...unless we were to fundamentally change the process by which attrlists are parsed.
Note that in the initial contribution (the docs from Asciidoctor), a posattr is allowed to override an existing named attribute for blocks. However, it does not override an existing named attribute defined in the boxed attrlist of a block macro. This inconsistency needs to be rectified.
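The two precedence behaviors described above can be compared with a small sketch. The function name `mapPosattrs` and its `override` option are hypothetical; the `$1`, `$2`, ... keys follow the convention used in the snippet earlier in this issue.

```javascript
// Maps positional attributes ($1, $2, ...) onto the names listed in posattrs.
// override: true lets a posattr replace an existing named attribute;
// override: false keeps the named attribute that is already assigned.
function mapPosattrs (attributes, posattrs, { override = false } = {}) {
  const result = { ...attributes }
  posattrs.forEach((name, idx) => {
    const posKey = `$${idx + 1}`
    if (!(posKey in result)) return // no positional attribute at this index
    if (!override && name in result) return // the named attribute wins
    result[name] = result[posKey]
  })
  return result
}

// Attributes as they might look after parsing
// image::diagram.png[Diagram,300,400,alt=A symbolic representation of something]
const attributes = { $1: 'Diagram', $2: '300', $3: '400', alt: 'A symbolic representation of something' }
const posattrs = ['alt', 'width', 'height']

console.log(mapPosattrs(attributes, posattrs).alt) // A symbolic representation of something
console.log(mapPosattrs(attributes, posattrs, { override: true }).alt) // Diagram
console.log(mapPosattrs(attributes, posattrs).width) // 300
```

Whichever behavior is chosen, applying it through a single function for both blocks and block macros would remove the inconsistency noted above.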
By mapping named positional attributes in the block action, it means those attributes will not be assigned while descendant blocks are being parsed. But this seems acceptable. The alternative would be to map the named positional attributes after the opening delimited block or marker is matched, though that would result in a much noisier grammar.
Another issue that comes up is the positional content attributes on quote and verse blocks. Consider the following case:
```
[,'https://en.wikipedia.org/wiki/Martin_Luther_King_Jr.[Martin Luther King Jr.]']
____
World peace through nonviolent means is neither absurd nor unattainable.
____
```
We currently rely on the explicit name (e.g., `attribution`) to determine if the value should be parsed as a content attribute (parsed into inlines). However, when parsing attrlists, we don't yet know the posattrs, or even know that we're above a quote block or paragraph. And yet we have to parse the attrlists ahead of the block in order to know the style (such as the discrete style above a heading or the cols attribute above a table). So there's a circular dependency we have to unlock here.
One option would be to run the inline parser on any single-quoted positional attribute. The problem here is that a positional attribute which is not enclosed in single-quotes may end up being mapped as a content attribute, and thus wouldn't be in the right form as an array of inlines. So we're kind of stuck in between.
Another option would be to make a special exception for quote and verse blocks. To do so, the block metadata action would need to look ahead to see if it's above a quote container and, if so, pass that information in to the attrlist parser. The attrlist parser would then recognize the second and third positional attributes as content attributes and parse them appropriately. The attrlist parser would also need to check if the explicit style is quote or verse and activate this feature in that case as well. With this approach, there would still be room for supporting this feature for custom blocks since that information would be accessible from the explicit style, which is always required in the case of a custom block. So this solution might actually work. The only downside is that the explicit style would have to appear in document order before any positional attributes it impacts.
In summary, there are several decisions to make:
* How is the named positional attribute list (posattrs) communicated in the grammar?
* When are the named positional attributes mapped during parsing?
* Do named positional attributes override named attributes already defined?
* How are the positional content attributes for quote and verse blocks handled?
* Is it possible for a custom block to define named positional content attributes (like quote and verse blocks)?

0.3.0 (milestone build)
Dan Allen

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/35
Draft specification content for paragraph
2024-02-22T21:12:40Z
Sarah White
Write up the initial specification document content for the paragraph. This content will also act as an example of the writing guidelines proposed in #34.
This issue requires:
* A brief non-normative description of the element
* Source examples included from the TCK
* Context section with content
* Content model section with content
* Attributes and metadata section with content
* ASG and DOM placeholder section
The issue requirements might change depending on feedback provided in issue #34.

0.3.0 (milestone build)
Sarah White

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/34
Create guidelines for describing an element in the specification document
2024-02-27T20:27:16Z
Sarah White
This proposal is an outline for how to write about an element in the specification document. Some of the following section titles or their contents won't apply when writing about high-level topics such as blocks, spec-wide terminology, macros, etc. Creating guidelines for high-level topics may not be necessary, but if so, we'll do that in a separate issue.
Proposed structure of a specification section for an element
1. The document title is the name of the element
- Generally, create one document per element, such as Paragraph, Sidebar, Strong Span
- The filename should match the document title in most cases.
2. Description: A non-normative section describing the element and any applicable semantics and meanings.
- Source examples: A subsection of Description showing source examples included from the TCK; this ensures they're accurate and valid
- The source examples should be included from the tests in the TCK, not handwritten! A case would need to be made for any examples to be written in the document that aren't functionally sourced directly from a tested file.
3. Context: A section specifying the place, environment, or situation in which the element can be used.
A context is a parent of the element; therefore, the element is a child of the parent in the ASG and DOM trees.
- Q: Should this section be normative or non-normative?
4. Content model: A normative section.
The content model of an element equates to the grammar rules for the contents of the element.
The content model describes the children the element is capable of accepting.
These children are represented as descendants of the element in the ASG and DOM trees.
- The section should state the model, what the element accepts, whether it can be empty, whether it can be parsed to empty, and what it can be interrupted by
- Attributes and metadata: A subsection of Content model that lists the attributes and metadata the element must accept
5. Possible section: Grammar (or) Grammar rules
- Q: What differentiates the grammar/grammar rule section and the content model and metadata sections? That is, why is this section needed in the specification document?
6. ASG and DOM: A normative section containing the applicable snippets from the ASG schema and the DOM, potentially in a tabbed interface.
- Should this section be a child of the content model section?
7. Q: Should we add a section or information in one of the sections that identifies whether the element is extensible?
As far as text usage and formatting, we need to decide if we want to:
* follow the formalized capitalization and usage of words such as MUST, SHOULD, etc. (there's an RFC for this sort of thing, RFC 2119)
* try to consistently use those words in normative sections but don't capitalize them
* use some other terminology or styling to call out hard and soft rules
For this issue to be complete, I would like to be 75% confident we're using a communicative and usable structure, including clear section titles and sections in a logical order, and that the content in the sections clearly communicates to developers the information they must know while creating an implementation. However, we don't need to nail down every term or determine how we're going to functionally insert the ASG and DOM snippets into the specification from their tested single source.
When drafting up these guidelines, I created a few sample sections (Paragraph, Strong Span, etc.) and consulted the Unicode, HTML, JavaScript, and OpenAPI specs for structural information.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/33
Clarify how block attribute lines are parsed and aggregated
2023-06-26T21:00:15Z
Dan Allen
While working through the block-level grammar, it became clear that there's a lot of gray area with regard to how the block attribute lines are parsed. This issue seeks to clarify these rules.
First, it should be stated how the attrlist in a block attribute line is found and parsed. We might be tempted to think that the attrlist should be parsed incrementally using a top-level rule in the grammar within the block attribute line rule. At a high level, something like:
```
block_attribute_line = '[' block_attrs ']' eol
block_attrs = attrs:(block_attr|.., ',' !' ' / ' '* ',' ' '*|)
block_attr = block_attr_name '=' block_attr_value / block_attr_value
...
```
However, this approach is not compatible with the rule that a block attribute line must be restricted to a single line and that line must start and end with matching square brackets. In other words, the closing square bracket at the end of the line is a hard boundary (especially important when we get into the matter of resolving attribute references).
Instead, the attrlist should be parsed using a subparser, then aggregated with the result from other attrlists, in a rule action:
```
block_with_metadata = metadata:(attrlists:block_attribute_line* {
const attributes = {}
for (const { source: attrlist, location: loc } of attrlists) {
const theseAttributes = parseAttrlist(attrlist, { ..., line: loc[0].line, startCol: loc[0].col })
...
}
return { metadata: { attributes } }
}) block:block
{
return metadata ? Object.assign(block, { metadata }) : block
}
block_attribute_line = '[' block_attrlist ']' eol
block_attrlist = !space source:$(!(lf / space? ']') .)*
{
return { source, location: toSourceLocation(getLocation()) }
}
```
(We're saying here that the attrlist cannot start or end with a space)
This approach also has the benefit of making the attrlist parser easier to implement since it doesn't have to worry about overrunning the end of the line. However, since it's a subparser, it does require more effort to propagate the location information.
**R1: Use subparser to parse attrlist matched by block attribute line rule**
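Outside of a PEG grammar, the same two-step idea (find the line first, then hand the attrlist to a subparser) might look like the following sketch. Everything here is an assumption made for illustration: the regular expression treats the last `]` on the line as the hard boundary, `parseAttrlist` is a deliberately naive stand-in that splits on commas and ignores quoting, and later attrlists are assumed to overwrite earlier entries when aggregated.

```javascript
// Matches a block attribute line whose attrlist neither starts nor ends with a space.
const BlockAttributeLineRx = /^\[(|[^ ]|[^ ].*[^ ])\]$/

// Step 1: find the line. The attrlist begins right after the opening bracket,
// i.e., at column 2 (1-based); a real subparser would use this to offset locations.
function matchBlockAttributeLine (line, lineNumber) {
  const match = BlockAttributeLineRx.exec(line)
  if (!match) return undefined
  return { source: match[1], location: { line: lineNumber, col: 2 } }
}

// Step 2: parse the attrlist in isolation (naive: no quoting, no shorthand syntax).
function parseAttrlist (attrlist) {
  const attributes = {}
  if (attrlist === '') return attributes
  let posIdx = 0
  for (const entry of attrlist.split(',')) {
    const trimmed = entry.trim()
    const eqIdx = trimmed.indexOf('=')
    if (eqIdx > 0) attributes[trimmed.slice(0, eqIdx)] = trimmed.slice(eqIdx + 1)
    else attributes[`$${++posIdx}`] = trimmed
  }
  return attributes
}

// Aggregates the attributes from consecutive block attribute lines.
function aggregateAttrlists (lines) {
  const attributes = {}
  lines.forEach((line, idx) => {
    const matched = matchBlockAttributeLine(line, idx + 1)
    if (matched) Object.assign(attributes, parseAttrlist(matched.source))
  })
  return attributes
}

console.log(aggregateAttrlists(['[source,ruby]', '[role=example]']))
console.log(matchBlockAttributeLine('[ source]', 1)) // undefined
```

Because the subparser only ever sees the text between the brackets, it has no way to overrun the end of the line.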
The next issue has to do with attribute references. AsciiDoc has always allowed attribute references to be resolved before the attrlist is parsed. Changing that now would be very problematic. However, keeping this feature does raise some questions that need to be answered.
First, having to preprocess the attrlist by resolving attribute references makes it clear that the attrlist requires a subparser. It's not possible to replace attribute references as the block attribute line is being matched. It could be done in the block/line preprocessor. However, the consequence of that is that an attribute value could introduce additional lines that could breach the boundaries of the block attribute line and actually change the structure of the document. This is not allowed today, and we don't want users to start exploiting that loophole. Therefore, it's imperative that the block attribute line be found first, then the contents within it (the attrlist) be parsed.
If attribute references are resolved first, we need a new mode for the inline preprocessor that only processes attributes. At the same time, it must not resolve attribute references inside inline passthroughs, so those ranges still need to be considered. It then needs to return the same result as it does for the inline parser, except the source mapping only contains information about resolved attribute references, not inline passthroughs. At this point, parsing of the attrlist may proceed. That parser need not worry about newline characters in any source that was added by an attribute value, since the block parser still considers it all to be on the same line in the source document.
**R2: Use inline preprocessor to resolve attributes before parsing attrlist**
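As a rough sketch of the attributes-only mode, the function below replaces attribute references and records which ranges of the output came from an attribute value. The function name, the returned shape, and the reference pattern are assumptions for illustration. The sketch deliberately omits the passthrough handling described above, so it is not a complete preprocessor.

```javascript
// Hypothetical attributes-only mode of the inline preprocessor. It replaces
// {name} references and records which ranges of the output text came from an
// attribute value. Skipping inline passthroughs is NOT implemented here.
function resolveAttributeReferences (source, attributes) {
  const resolved = [] // ranges in the output text; end offset is exclusive
  let text = ''
  let last = 0
  for (const match of source.matchAll(/\{([A-Za-z0-9_][A-Za-z0-9_-]*)\}/g)) {
    const [reference, name] = match
    // Leave references to unknown attributes untouched.
    if (!Object.prototype.hasOwnProperty.call(attributes, name)) continue
    text += source.slice(last, match.index)
    const value = String(attributes[name])
    resolved.push({
      name,
      start: text.length,
      end: text.length + value.length,
      sourceOffset: match.index,
      sourceLength: reference.length,
    })
    text += value
    last = match.index + reference.length
  }
  text += source.slice(last)
  return { text, resolved }
}
```

The recorded ranges are what a later phase would need in order to map locations in the resolved text back to the original attrlist, and to avoid resolving the same text twice.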
The next question has to do with when inline parsing is performed on an attribute value. As part of a larger commit a decade ago, Asciidoctor introduced the feature into the AsciiDoc Language that substitutions (i.e., inline parsing) are applied to an attribute value enclosed in single quotes. If we were to preserve this feature, then we would expect the location on the inline nodes to be accurate (even though location information is not stored for unparsed attribute values).
But we have to wonder whether we should support this feature. After all, there are numerous attributes whose value should never be parsed, such as `id`, `role`, `opts`, `cols`, etc. Should the parser ignore the single quotes in these cases? And what about the title attribute? In Asciidoctor, the value is always parsed even if it is not enclosed in single quotes. Should that behavior be preserved?
Assuming we still want to parse single-quoted attribute values, the next question is how to handle inline preprocessing. Since attribute references have already been resolved, we don't want attribute references to be resolved again. However, the inline passthroughs need to be processed just as they would have been had that processing been done at the same time as attribute references. That means the inline preprocessor has to work only on ranges between resolved attribute references. We have effectively split the inline preprocessor into two modes or phases, yet it needs to work as if it were all done in the same phase.
In the case the attrlist has at least one attribute reference and one single-quoted value, the parsing order is:
resolve attribute references -> parse attrlist -> extract passthroughs -> parse inlines -> restore passthroughs
**R3: Either a) universally parse single-quoted attribute value, b) ignore single-quoted enclosure and parse attribute values of certain attribute names, like title, c) parse attribute values of certain attribute names if value is single quoted**
The next question is how to aggregate block attributes. Although multiple block attribute lines are permitted, in the end we need a single map of attributes. Here is a sketch of the aggregation rules:
* role values are always aggregated; duplicates are ignored
* options is an alias for opts
* options are always aggregated; duplicates are ignored
* if any other attribute appears twice (whether it is named or positional), it is overwritten
* positional attributes are stored using 1-based index keys as a string (do we want to add a $ in front of the number?)
* the order of the names in the attribute map matches the order in which they first appear in the document
* the first positional attribute permits shorthands (#idname, .rolename, %optname, [idname,reftext])
* what's left, if any, is interpreted as the style (e.g., source from source%linenums)
* if nothing remains, the existing style is not overwritten
Note that attributes that are overwritten don't appear in the ASG.
**R4: Aggregate certain attribute values (role, opts) and overwrite the rest**
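The rules above can be sketched roughly as follows. This is only an illustration under several assumptions: the input is an array of parsed attrlists whose entries look like `{ name, value }`, the style remainder of the first positional attribute is stored under the key `1`, roles and options are returned as separate arrays, and the `[idname,reftext]` shorthand is left out. None of these choices are settled by the proposal.

```javascript
// Hypothetical aggregation of parsed block attribute lines into one map.
// Each attrlist is an array of { name, value } entries; positional entries
// use the names '1', '2', and so on. A Map is used because plain objects
// reorder integer-like keys, which would lose first-appearance order.
function aggregateAttributes (attrlists) {
  const attributes = new Map()
  const roles = new Set() // a Set ignores duplicates
  const options = new Set()
  const addAll = (set, value, separator) => {
    for (const item of value.trim().split(separator)) if (item) set.add(item)
  }
  for (const attrlist of attrlists) {
    for (let { name, value } of attrlist) {
      if (name === 'options') name = 'opts' // options is an alias for opts
      if (name === '1') {
        // Shorthand form: style#id.role%option
        const style = value.match(/^[^#.%]*/)[0]
        for (const [, marker, text] of value.slice(style.length).matchAll(/([#.%])([^#.%]+)/g)) {
          if (marker === '#') attributes.set('id', text)
          else if (marker === '.') roles.add(text)
          else options.add(text)
        }
        if (style) attributes.set('1', style) // otherwise keep the existing style
      } else if (name === 'role') {
        addAll(roles, value, /\s+/)
      } else if (name === 'opts') {
        addAll(options, value, /\s*,\s*/)
      } else {
        attributes.set(name, value) // last occurrence wins, first position is kept
      }
    }
  }
  return { attributes, roles: [...roles], options: [...options] }
}
```

Setting an existing key on a `Map` replaces the value but keeps the key's original position, which lines up with the rule that names appear in the order in which they first occur.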
Certain attributes are promoted to a top-level property on the node in the ASG. These attributes are:
* id
* title
* reftext
* roles (an array form of the role attribute); may want to move this to metadata.roles
The values of the title and reftext properties are always arrays of inlines, even if the attribute value is not parsed (because it was enclosed in single quotes).
**R5: The metadata property is only defined if the block has metadata lines; metadata contains the properties attributes, options, roles, and location. The title and reftext attributes are promoted to top-level properties. The value of the top-level id property either comes from the attributes or is generated, if applicable.**
0.2.0 (milestone build)

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/32
Is there any non-English version of AsciiDoc user documentation?
2023-05-22T21:44:33Z
劲 曾
Maybe I am wrong, but it is hard for me to find an official or up-to-date unofficial Chinese version of the AsciiDoc user documentation. It is also not possible to switch the language on https://docs.asciidoctor.org/asciidoctor/latest. If non-English versions of the doc site exist, where can I find them? If not, is there any plan to support multiple languages? Actually, I'd really like to contribute to a Chinese version if needed.

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/31
Clarify how styled paragraphs are parsed and transformed
2023-06-26T21:21:54Z
Dan Allen
The AsciiDoc Language currently permits a paragraph to be promoted to a permissible named block by specifying that name as the block style. This is referred to as a styled paragraph. Consider this example:
```
[quote]
A quote block.
```
Now consider this example:
```
[source]
A source block.
```
The way this is implemented (according to the initial contribution) presents challenges for a language formalism for two reasons:
* The style influences how the ensuing lines are parsed
* The parser generates a named block which has lines instead of a child paragraph (when applicable)
The first challenge is not compatible with a language formalism. Allowing the style to modify the parsing rules is extremely difficult to express in a grammar. Thus, I'd like to propose a different approach that will be simpler to understand and carry a low risk of breaking compatibility with existing documents.
There are two parsing models for non-enclosed, non-marked lines: paragraph and literal paragraph. The former is a contiguous group of lines that ends at an empty or interrupting line. The latter is a contiguous group of indented lines that ends at an empty line (no interrupting lines are possible). The latter then drops the uniform indentation at the start of each line.
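Dropping the uniform indentation of a literal paragraph could look like the sketch below. The helper name is hypothetical, and treating only leading spaces and tabs as indentation is an assumption.

```javascript
// Hypothetical helper: removes the indentation shared by every line of a
// literal paragraph. Only leading spaces and tabs count as indentation.
function dropUniformIndentation (lines) {
  const widths = lines.map((line) => line.match(/^[ \t]*/)[0].length)
  const indent = Math.min(...widths) // the smallest indent is common to all lines
  return lines.map((line) => line.slice(indent))
}
```

Indentation beyond the shared amount is preserved, so relative indentation inside the literal paragraph survives.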
Thus, a styled paragraph is actually a transformation that occurs after parsing of the block is complete. A paragraph is parsed as a paragraph. A literal paragraph is parsed as a literal paragraph. Then the paragraph is promoted to a named block in the rule action. While it's not required that a styled paragraph for a verbatim block be written as a literal paragraph, it is required to avoid any interpretation of the lines. Thus, the second example above becomes:
```
[source]
 A source block.
```
Which is equivalent to:
```
[source]
----
A source block.
----
```
That brings us to the second challenge. A styled paragraph is just a shorthand way of writing a named delimited block. Thus, once the parser has found a styled paragraph, it should generate the same result as if it had been written in the long form. For a verbatim block like a source block, this transformation is obvious since a verbatim block cannot have any child blocks. (You had one block, you get one block.) However, a compound block like a sidebar block is more complex.
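For the verbatim case, the promotion in the rule action might be sketched like this. The node shapes (`name`, `lines`, `metadata`), the use of key `1` for the style, and the set of verbatim styles are all assumptions for illustration and do not reflect the ASG schema.

```javascript
// Hypothetical rule-action helper for the verbatim case only. Node shapes
// and the list of verbatim styles are assumptions, not the ASG schema.
const VERBATIM_STYLES = new Set(['listing', 'literal', 'source'])

function promoteStyledParagraph (paragraph) {
  const style = paragraph.metadata?.attributes?.['1']
  // Paragraphs without a verbatim style are returned unchanged.
  if (!VERBATIM_STYLES.has(style)) return paragraph
  // One block in, one block out: the lines and the metadata carry over as is.
  return { name: style, lines: paragraph.lines, metadata: paragraph.metadata }
}
```

Since the paragraph has already been parsed by the time this runs, the style never influences how the lines are matched; it only changes which node is emitted.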
What we propose is that the paragraph be added as a child of the generated named block. (You had one block, you get two blocks). However, all metadata on the paragraph will get moved to the parent block. Thus, the paragraph will have no metadata (such as attributes or options). If the writer wants metadata on both the named block and the paragraph, it will be necessary to write it in long form (as a delimited block with a child paragraph). (Maybe we can hold back certain attributes/options, such as the lead role, text alignment roles, or the hardbreaks option).
0.2.0 (milestone build)
Dan Allen

https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-lang/-/issues/30
Change version of user documentation from latest to pre-spec
2023-06-26T21:03:54Z
Dan Allen
In its current state, the user documentation for the AsciiDoc Language hosted in this repository constitutes the initial contribution for this project, which is a reading of AsciiDoc as implemented by Asciidoctor. The draft outline for the specification has set a course to formalize the language leading up to the 1.0.0 release. That means there will be notable differences in how the parsing is modeled and the language interpreted, which will require material updates to the explanations in this documentation. We need to preserve the user documentation in its current state, and flag it as such, for users of Asciidoctor while the spec is under development. In order to do so, the version of the user documentation in this repository should be changed from "latest" to "pre-spec". The documentation in this state can continue to accept refinements to the reading of AsciiDoc as implemented by Asciidoctor during this period.
Once we get to the point where we start updating the user documentation to match the specification, we'll create a pre-spec branch to store the pre-spec version, then update the version in the main branch to match the version of the specification.
0.1.0 (milestone build)