resolves #51 document the normalization applied to the AsciiDoc source and AsciiDoc include files (PR #52)

Commit fc6ca8e2 authored 4 years ago by Dan Allen
--- a/CHANGELOG.adoc
+++ b/CHANGELOG.adoc
@@ -13,6 +13,7 @@ Fixed::
 Added::
 * Add example of how to select all lines outside of tagged regions and lines inside a specific tagged region
 * Document attribute list parsing in detail (#43)
+* Document the normalization applied to the AsciiDoc source and AsciiDoc include files (#51)
 Changed::
 * Clarify the rules for include tag filtering; emphasize that the wildcards can only be used once

--- a/modules/ROOT/nav-top.adoc
+++ b/modules/ROOT/nav-top.adoc
@@ -2,3 +2,4 @@
 ** xref:document-structure.adoc[]
 ** xref:key-concepts.adoc[]
 ** xref:document-processing.adoc[]
+** xref:normalization.adoc[]

--- a/modules/ROOT/pages/normalization.adoc
+++ b/modules/ROOT/pages/normalization.adoc
+= Normalization
+
+When an AsciiDoc processor reads the AsciiDoc source, the first thing it does is normalize the lines.
+(This operation can be performed up front or as each line is visited.)
+
+Normalization consists of the following operations:
+
+* Force the encoding to UTF-8 (an AsciiDoc processor always assumes the content is UTF-8 encoded)
+* Strip trailing spaces, along with any end-of-line character, from each line
+
+This normalization is performed independently of any structural context.
+It doesn't matter whether the line is part of a literal block or a regular paragraph; all lines get normalized.
+
+The lines of an include file are normalized only in certain cases.
+Only include files that have a recognized AsciiDoc extension are normalized as described above.
+For all other files, only the trailing end-of-line character is removed.
+
+Include files can also have a different encoding, which is specified using the `encoding` attribute on the include directive.
+If the `encoding` attribute is not specified, UTF-8 is assumed.
+
+When the AsciiDoc processor brings the lines back together to produce the rendered document (HTML, DocBook, etc.), it joins the lines with the line feed character (`\n`).
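The normalization rules documented in the new page can be sketched in Python. This is a minimal illustration under stated assumptions, not Asciidoctor's implementation: `normalize_lines` and `join_lines` are hypothetical helpers, and the sketch treats a line feed, optionally preceded by a carriage return, as the end-of-line sequence.

```python
from __future__ import annotations

# Hypothetical helpers illustrating the documented rules; not Asciidoctor's code.


def normalize_lines(data: bytes) -> list[str]:
    """Decode the source as UTF-8 and strip trailing spaces from every line."""
    # The processor always assumes UTF-8, so no encoding detection is attempted.
    text = data.decode("utf-8")
    lines = text.split("\n")
    # A final line feed terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    # Remove any leftover carriage return (CRLF input) and trailing spaces.
    # This is applied to every line, regardless of the block it belongs to.
    return [line.rstrip(" \r") for line in lines]


def join_lines(lines: list[str]) -> str:
    """Join lines with a line feed, as the processor does when producing output."""
    return "\n".join(lines)


source = "= Title  \r\n\r\n....\r\nliteral text   \r\n....\r\n".encode("utf-8")
lines = normalize_lines(source)
print(lines)
# ['= Title', '', '....', 'literal text', '....']
print(repr(join_lines(lines)))
# '= Title\n\n....\nliteral text\n....'
```

Because the stripping happens before any parsing, the trailing spaces inside the literal block are removed just like those on the title line.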
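The include-file rules can be sketched the same way. `read_include_lines` is a hypothetical helper, and the `ASCIIDOC_EXTENSIONS` set is an assumption modeled on extensions commonly treated as AsciiDoc; the page itself does not enumerate the recognized extensions. The optional `encoding` argument stands in for the include directive's `encoding` attribute.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of "recognized AsciiDoc extensions"; the page does not list them.
ASCIIDOC_EXTENSIONS = {".adoc", ".asciidoc", ".asc", ".ad", ".txt"}


def read_include_lines(data: bytes, filename: str, encoding: str | None = None) -> list[str]:
    """Hypothetical helper: split an include file into lines, normalized by file type."""
    # Use the encoding given on the include directive, otherwise assume UTF-8.
    text = data.decode(encoding or "utf-8")
    lines = text.split("\n")
    # A final line feed terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    if Path(filename).suffix in ASCIIDOC_EXTENSIONS:
        # Recognized AsciiDoc file: strip trailing spaces and any carriage return.
        return [line.rstrip(" \r") for line in lines]
    # Any other file: remove only the end-of-line character, keeping trailing spaces.
    return [line[:-1] if line.endswith("\r") else line for line in lines]


latin1 = "Caf\u00e9   \r\nline two  \r\n".encode("iso-8859-1")
print(read_include_lines(latin1, "chapter.adoc", encoding="iso-8859-1"))
# ['Café', 'line two']
print(read_include_lines(b"x = 1   \r\n", "script.py"))
# ['x = 1   ']
```

Keeping the two branches separate makes the asymmetry explicit: trailing spaces in a non-AsciiDoc include (for example, source code pulled into a listing block) survive, while the line terminator is always dropped.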