Verified Commit 92973f9d authored by Sarah White

add initial block element, content model, and structural forms pages

parent 0b6fc20a
= Block Content Model
A block content model defines the permitted content a block can contain and the common rules to which the block subscribes.
The block parser identifies the content model of a block from the structural form used to express that block in the source text.
The block content models are: basic, verbatim, compound, empty, and raw.
// It's highly likely there are other content models (see the commented out headings and sections below), but I'm still doing research and those sections will be added as part of other issues.
== Basic
A block that belongs to the basic content model can only contain uninterpreted text, inline elements, and inline preprocessor directives.
Basic block content cannot contain any child block elements.
A basic block is represented as a terminal block node in the ASG tree.
The following block elements belong to the basic content model by default:
* paragraph
// * table cell?
// * principal list text?
  • Dan Allen @mojavelinux ·
    Developer

    These are all correct. At the moment, the latter two are somewhat lesser blocks since they cannot have metadata, but we expect that to eventually change so that they can. A table cell is somewhat weird in that it can have multiple paragraphs...but they are more like line breaks than actual paragraphs. A verse block is similar in that regard.

    Edited by Dan Allen
  • Author Owner

    Got it :thumbsup:

Certain attributes, such as a block style or an AsciiDoc table cell modifier, can change a block that adheres to the basic content model into another type of block and alter how the parser transforms the block's content.
== Verbatim
A block that belongs to the verbatim content model can only contain a single inline string; it cannot contain any child blocks.
A verbatim block is represented as a terminal block node in the ASG tree.
The following block elements belong to the verbatim content model by default:
* indented lines
  • Dan Allen @mojavelinux ·
    Developer

    Perhaps "standalone indented lines" to make it clear they are an atomic unit.

* delimited literal block
  • Dan Allen @mojavelinux ·
    Developer

    I think we could say "delimited verbatim block (literal, listing, source)"

  • Author Owner

    This one I disagree with, because the list is to specifically point out the things that belong to this content model. By saying verbatim block, in the verbatim content model, you're using the thing to describe the thing.

  • Dan Allen @mojavelinux ·
    Developer

    Oh, I see. You're right.

* delimited listing block
* delimited source block
== Compound block
A block that belongs to the compound block content model consists of one or more child block elements.
A compound block can contain basic, verbatim, compound, empty, and raw blocks.
  • Dan Allen @mojavelinux ·
    Developer

    A compound block may contain any other type of block. I don't think we need to itemize them here.

  • Author Owner

    This is a spec, it's all about being blatantly specific so we don't have to repeat it over and over. For now I prefer to itemize. We can revisit at the end of the spec and remove things if it is too repetitive.

  • Dan Allen @mojavelinux ·
    Developer

    Ah, of course. I don't know what I was thinking here. Itemize away!

A compound block cannot contain inlines.
  • Dan Allen @mojavelinux ·
    Developer

    cannot itself contain inlines, but appropriate children can

A compound block is a parent block node in the ASG tree.
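The rules stated so far for the basic, verbatim, and compound content models can be summarized as a small lookup table. The Python sketch below is illustrative only: `ContentModelRules`, its field names, and the `RULES` table are hypothetical names rather than part of the specification, and the empty and raw content models are left out because they are not yet described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentModelRules:
    """Hypothetical summary of what a content model permits."""
    allows_child_blocks: bool      # may the block contain other blocks?
    allows_inline_elements: bool   # may the block itself contain inline elements?
    asg_node_kind: str             # "terminal" or "parent" block node in the ASG


# Rules as stated in the text above; the empty and raw models are omitted.
RULES = {
    # Basic: uninterpreted text, inline elements, inline preprocessor directives.
    "basic": ContentModelRules(False, True, "terminal"),
    # Verbatim: a single inline string, so no inline elements and no child blocks.
    "verbatim": ContentModelRules(False, False, "terminal"),
    # Compound: one or more child blocks, but no inlines of its own.
    "compound": ContentModelRules(True, False, "parent"),
}


def can_contain_block(parent_model: str) -> bool:
    """Return True if a block with this content model may hold child blocks."""
    return RULES[parent_model].allows_child_blocks
```

With this table, `can_contain_block("compound")` is true while `can_contain_block("basic")` is false, which mirrors the split between parent and terminal block nodes.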
The following block elements belong to the compound block content model by default:
* preamble
* admonition
* delimited example block
* delimited open block
* delimited quote block
* delimited sidebar block
* delimited verse block
// * table
// * list
// * dlist
  • Dan Allen @mojavelinux ·
    Developer

    I think we had talked at one point about table, list, and dlist being compound blocks with a restricted content model. In other words, they act like a compound block, but only allow designated children specific to that type of block. HTML has this same situation.

    Edited by Dan Allen
  • Author Owner

    Yeah, I know we have, and table I'm pretty happy with just being a compound block. I'm a little floaty on the lists though and wondering if they should just be their own thing because there would be so many exceptions. I'd prefer, if we can work it out, that we not have a million "except for "this" in "this" situation" because no one will really pay attention to all those admonition/exception things <- this goes for section too, which is why I have them commented out for now.

  • Dan Allen @mojavelinux ·
    Developer

    Yes, I think the restricted content model of lists makes them their own thing. I actually consider table to be the same because the content model is even more restricted. Once we get into something like a list item or table cell, then we can start to view those more as a compound block, but there's no requirement to do so...just something to keep in mind.

////
== Section block
  • Dan Allen @mojavelinux ·
    Developer

    Yes, a section block is actually its own type of compound block...perhaps we can say it is a specialization of the compound block, and thus a subsection here.

  • Author Owner

    See previous comment

  • Dan Allen @mojavelinux ·
    Developer

    I'm a little torn on a section because it is the most like a compound block. The only difference between a compound block and a section from a content model perspective is that a section can contain other sections. Perhaps we can just add this point in the compound block section if we aren't comfortable having a whole separate section for section.

Section block content can contain other section blocks, as well as basic, verbatim, compound, empty, and raw blocks.
A section block is a parent block node in the ASG tree and therefore can only directly contain block nodes.
Sections are only permitted inside the document block and other section blocks; they're not permitted inside any other types of blocks.
For example, a section cannot start (or end) inside a sidebar block.
////
//== Empty block
// == Raw block
// stem?
// entry, list, dlist, table
\ No newline at end of file
= Blocks
== Block elements
Block elements, referred to as blocks, are discrete, linewise chunks of source text that form the main structure of an AsciiDoc document.
Block elements are stacked vertically, one block above or below another, in the source text.
  • Dan Allen @mojavelinux ·
    Developer

    Let's say "one block below another in document order"

A block always starts at the beginning or effective beginning of a whole line and ends on a whole line (except for table cells).
Block elements are separated from one another by boundaries.
These boundaries are either implicit, such as an empty line, or explicit, such as a delimiter line that is part of an enclosure.
An enclosure is a source feature of some blocks.
Specifically, a block element consists of two or three source features: an enclosure, content, and metadata.
These three features are described in the following sections.
=== Boundaries and enclosures
The boundaries of a block define the start and end of a block.
A block boundary is an interrupting line that begins or ends the parsing context of a block.
Some blocks have explicit boundaries that mark the start and end of a block's content.
These explicit boundaries are represented by a balanced, matching pair of delimiter lines in the source text, and they are referred to as an enclosure because they enclose the content of the block.
The boundaries of many other blocks are implicit.
  • Dan Allen @mojavelinux ·
    Developer

    I don't think we need "many other" here, just "other"

For example, the boundaries of a paragraph block are implicit because they can be represented by an empty line or by the boundary of a parent or sibling block, such as the delimiter line of a delimited block.
How the boundaries of a block are represented in the source text is defined by the structural form of a block.
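As a rough illustration of an enclosure, the Python sketch below looks for the closing delimiter line that matches an opening delimiter line. It rests on assumptions this page does not state: the `DELIMITER_LINE` pattern (four or more repeats of `-`, `.`, `=`, `*`, or `_`) is a simplified stand-in for the real delimiter grammar, `find_enclosure` is a hypothetical helper, and nested blocks that reuse the same delimiter are ignored.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a delimiter line is four or more repeats of a single character.
DELIMITER_LINE = re.compile(r"^(-{4,}|\.{4,}|={4,}|\*{4,}|_{4,})$")


def find_enclosure(lines: List[str], start: int) -> Optional[Tuple[int, int]]:
    """Return (open_index, close_index) for the enclosure opening at `start`.

    Returns None if `start` is not a delimiter line or if no matching
    closing delimiter line follows it.
    """
    if start >= len(lines) or not DELIMITER_LINE.match(lines[start]):
        return None
    opener = lines[start]
    # The closing delimiter must be identical to the opening one (balanced pair).
    for index in range(start + 1, len(lines)):
        if lines[index] == opener:
            return (start, index)
    return None


source = ["====", "Content of the block.", "====", "A following paragraph."]
enclosure = find_enclosure(source, 0)
```

Here `enclosure` spans the two `====` lines, and the content of the block is whatever lies strictly between them. A paragraph has no such pair, so its boundaries have to be inferred from the surrounding lines.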
=== Content
The content of a block is one or more lines of source text, which may consist of other blocks or inlines if permitted by the block's content model.
The block's content always starts at the beginning or effective beginning of a new line (except for table cells).
The content of a block may end on the same line it starts on or on a subsequent line.
What content a block can contain and how that content is handled is determined by the content model to which the block belongs.
=== Metadata
A block element can have metadata, such as a block title line, block attribute lines, or a boxed attribute list.
\ No newline at end of file
= Block Parsing and Structural Forms
== Block parsing
Block element parsing takes precedence over inline element parsing.
Each block should be parsed in the order it appears in the document.
How a block is identified and parsed is determined by a block's structural form in the source text and the content model that applies to the structural form.
Discrete headings are the one exception to this rule.
The parser identifies a block using its structural form in the source text.
The structural form controls how the parser determines the boundaries of a block.
The structural form also tells the parser the default content model to apply to the block.
If the content model of the block allows the block to contain child blocks, the parser will descend into the block to search for and identify any child blocks.
Once the parser has identified the boundaries of a block, the lines that comprise it, and, if allowed, the boundaries and lines of any child blocks, it will then determine if and how the lines in the block should be parsed.
If and how the lines that comprise a block should be parsed is determined by the block's content model and further refined by the block style or macro name of the block.
The block's content model and its style or macro name instruct the parser whether it should change the block from one type to another, run the inline preprocessor in a certain mode on the lines, and run the inline parser on the lines.
  • Dan Allen @mojavelinux ·
    Developer

    We should be clear that inline parsing occurs as soon as the end of the block is identified. The inline parsing of a block always happens before the inline parsing of blocks that follow that block in document order. This is the most fundamental change in the parsing of AsciiDoc being introduced by the specification.

When the parser completes the parsing of a block, it records the name of the block, its variant (if applicable), its structural form, and any block metadata as properties of a block node in the ASG.
== Block structural forms
In AsciiDoc, there are various structural forms, which are the building blocks of the language at the block level.
Each structural form indicates how the block is expressed in the source text.
These forms are recognized by the parser based on the grammar rule that they match.
The structural form also informs the parser what block content model it should apply to the lines that comprise the block.
Once the parser identifies what the structure is and the block content model associated with the structure, it follows certain rules for how to identify the lines that comprise the block, how to process any leading markers or surrounding delimiter lines, and whether to search for and parse any child blocks within the parent block.
The block parser should identify the block and its boundaries without considering any of the block metadata, including the block style attribute or the block macro name.
In other words, the block style attribute and block macro name should not influence how the parser identifies the block structure.
The one exception to this rule is the discrete block style on a heading.
  • Dan Allen @mojavelinux ·
    Developer

    In this case, the style is effectively acting as part of the structural form.

  • Dan Allen @mojavelinux ·
    Developer

    I almost wonder whether we should use the # marker for discrete headings, whereas the = marker is for section titles. Just a thought.

  • Author Owner

    I would like to most respectfully veto the # marker.

  • Dan Allen @mojavelinux ·
    Developer

    I think we might want to bring this up to the spec group / language project, but we won't mention it here. The reason I say that is because Markdown only has discrete headings and it uses the # marker. So it would be an easier way for us to distinguish between headings that are discrete and those that are titles of a section using existing knowledge. If we decided to get rid of the discrete style on a section title, then we could also simplify the parsing drastically (it's actually pretty hard to parse differently based on the style). I think there are legs to this idea. But not something to solve in this MR to be sure.

The block structural forms are: paragraph, indented line, delimited block, block macro, shorthand block macro, heading, and list item.
The structural form is recorded by the `form` property on a block node in the ASG.
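To make the recorded properties concrete, the following Python sketch models a block node as a small data class. Only the `form` property name comes from the text; the `BlockNode` class, the `name`, `variant`, and `metadata` field names, and the string values used for the forms are assumptions made for illustration and may not match the actual ASG schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Structural forms as listed in the text above; the exact strings are assumed.
STRUCTURAL_FORMS = {
    "paragraph",
    "indented line",
    "delimited block",
    "block macro",
    "shorthand block macro",
    "heading",
    "list item",
}


@dataclass
class BlockNode:
    """Hypothetical block node carrying the properties the parser records."""
    name: str                       # name of the block
    form: str                       # structural form used in the source text
    variant: Optional[str] = None   # only set when a variant applies
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject forms that are not in the list above.
        if self.form not in STRUCTURAL_FORMS:
            raise ValueError(f"unknown structural form: {self.form!r}")


node = BlockNode(name="paragraph", form="paragraph")
```

Keeping `form` separate from `name` reflects the point that the structural form describes how the block was expressed in the source text, not what kind of block it ends up being.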
=== Paragraph form
The paragraph block form is composed of a sequence of contiguous, non-interrupting lines of source text.
The first line of the source text may not start with a space character; it should start directly adjacent to the left margin.
Line breaks are not significant between adjacent lines in a paragraph.
  • Dan Allen @mojavelinux ·
    Developer

    As a side note, indented lines can be demoted to a paragraph by applying the normal style. But the lines will still be parsed at the block-level as indented lines. This can be useful as an escaping mechanism. Perhaps we can put this information in an admonition or sidebar here.

  • Author Owner

    I feel like this is heavily explained somewhere else, and should belong in the section where we go into indented lines, not the paragraph.

  • Dan Allen @mojavelinux ·
    Developer

    Got it. Just wanted to mention it in case it wasn't yet called out.

A paragraph block ends when the block parser encounters any of the following interrupting lines and syntax:
* empty line
* block attribute line
* block delimiter line
* list continuation line (proposed)
* table cell delimiter (when the paragraph occurs anywhere inside a table)
The default content model of the paragraph form is the basic content model.
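The termination rule for the paragraph form can be sketched as a line scanner. The Python below is a simplified illustration under several assumptions: the regular expressions for block attribute lines and delimiter lines are rough approximations, a list continuation line is taken to be a lone `+`, the table cell delimiter case is left out, and `is_interrupting` and `read_paragraph` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# Rough approximations of interrupting lines; the real grammar is stricter.
BLOCK_ATTRIBUTE_LINE = re.compile(r"^\[.*\]$")
DELIMITER_LINE = re.compile(r"^(-{4,}|\.{4,}|={4,}|\*{4,}|_{4,}|--)$")


def is_interrupting(line: str) -> bool:
    """Return True if the line ends a paragraph under the assumed rules."""
    return (
        line.strip() == ""                             # empty line
        or BLOCK_ATTRIBUTE_LINE.match(line) is not None
        or DELIMITER_LINE.match(line) is not None
        or line == "+"                                 # list continuation (proposed)
    )


def read_paragraph(lines: List[str], start: int) -> Tuple[List[str], int]:
    """Collect contiguous paragraph lines beginning at `start`.

    Returns the paragraph lines and the index of the first line after them.
    Returns ([], start) if no paragraph can begin at `start`.
    """
    if start >= len(lines):
        return [], start
    first = lines[start]
    # The first line may not start with a space and may not be interrupting.
    if first.startswith(" ") or is_interrupting(first):
        return [], start
    end = start
    while end < len(lines) and not is_interrupting(lines[end]):
        end += 1
    return lines[start:end], end


source = ["First line of text", "second line of text", "----", "code", "----"]
paragraph, next_index = read_paragraph(source, 0)
```

In this example the paragraph consists of the first two lines, and `next_index` points at the `----` delimiter line that interrupted it.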
////
=== Indented form
=== Delimited form
=== Macro form
=== Shorthand macro form
=== Heading (marked, prefixed?) form
=== List item form
=== dlist item form
////