io7m-kstructural 0.3.1 Documentation
Package Information
Orientation
Overview
The structural language is a syntactically lightweight language for writing technical documentation.
Metadata
The structural language allows for tagging of terms with metadata. For example, in this document, every reference to the name structural is tagged with "package". Individual terms within a document are given semantic meaning. The available tags are entirely user-defined: The language merely provides a way to apply tags, but does not define any tags of its own.
Semantic Simplicity
The structural language currently defines around twenty elements. The language is trivial to learn, the number of possible permutations of elements is low, and the output is predictable.
Lightweight Syntax
The structural language defines an abstract model for documents as an algebraic data type. It then defines multiple formats, all of which compile down to the same internal model. The canonical format is defined in terms of s-expressions and can be parsed at a basic level by any standard s-expression parser. An XML format with a full RELAX-NG schema is provided for legacy compatibility. Additionally, an imperative format is provided that allows for syntax that is only slightly more heavyweight than Markdown whilst still allowing for the same control over metadata and document structure. Documents may freely combine elements written in any of the formats via the standard import mechanism [6].
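As a rough illustration of what "an abstract model as an algebraic data type" can look like, the sketch below models a drastically reduced document with three element types and renders it in a canonical-style bracket syntax. The class names, fields, and the to_canonical function are hypothetical and chosen only for illustration; the real model defines around twenty elements and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Paragraph:
    text: str


@dataclass(frozen=True)
class Section:
    title: str
    content: Tuple[Paragraph, ...]


@dataclass(frozen=True)
class Document:
    title: str
    content: Tuple[Section, ...]


# The "sum" part of the algebraic type: an element is exactly one of these.
Element = Union[Document, Section, Paragraph]


def to_canonical(element: Element) -> str:
    """Render the model as bracketed s-expression-style text (illustrative only)."""
    if isinstance(element, Paragraph):
        return f"[paragraph {element.text}]"
    if isinstance(element, Section):
        body = " ".join(to_canonical(c) for c in element.content)
        return f"[section [title {element.title}] {body}]"
    if isinstance(element, Document):
        body = " ".join(to_canonical(c) for c in element.content)
        return f"[document [title {element.title}] {body}]"
    raise TypeError(f"not a document element: {element!r}")
```

Because every input format is parsed into the same kind of value, a renderer such as to_canonical only has to be written once against the model, not once per input format.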
Portability
The kstructural implementation is written primarily in the Kotlin language using APIs defined in Java 8. The implementation can therefore execute on any JVM supporting Java 8 and above. This also means that the language can be used to generate documentation as part of the build process for Java-based projects without having to assume the existence of any platform-specific native binaries.
Specification
The structural language is carefully and unambiguously specified as an executable Haskell specification.
Modular Documents
The structural language supports modular documents with a simple import system: Documents may import elements or raw text from files, so large technical manuals can be cleanly separated into files based on logical content. Intra-document links are fully supported and full referential integrity checks are performed to ensure that links always point to elements that actually exist. Circular imports are detected and prevented. Additionally, the parsers are constructed to be secure by default: Malicious documents cannot import files outside of a given base directory, and in the case of the XML encoding, cannot cause the XML parser to perform requests over the network for remote content.
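The two import checks described above (staying inside a base directory, and refusing circular imports) can be sketched as follows. This is not the kstructural implementation; the function names resolve_import and push_import are assumptions made for this example, and the sketch only shows the general technique.

```python
from pathlib import Path
from typing import List


def resolve_import(base_dir: Path, importing_file: Path, target: str) -> Path:
    """Resolve an import relative to the importing file, refusing to leave base_dir."""
    base = base_dir.resolve()
    candidate = (importing_file.parent / target).resolve()
    # After resolution, ".." segments are gone, so a simple ancestry check suffices.
    if base not in candidate.parents:
        raise ValueError(f"import escapes base directory: {target}")
    return candidate


def push_import(stack: List[Path], target: Path) -> List[Path]:
    """Return the import stack extended with target, refusing circular imports."""
    if target in stack:
        chain = " -> ".join(p.name for p in stack + [target])
        raise ValueError(f"circular import: {chain}")
    return stack + [target]
```

The stack holds the files currently being processed, so a file may be imported from several places as long as it never ends up importing itself, directly or indirectly.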
Comparison With Other Systems
In the table below, Metadata should be taken to mean that the language allows for expressing the semantic meaning of the contents of the documents, as opposed to just being a series of formatting commands.
Semantic Simplicity is a somewhat subjective (but somewhat measurable) judgement of how simple the conceptual model of each language is. An excellent way to measure this aspect of a language is to attempt to come up with a definition of an algebraic type that is able to represent a parsed and validated document in the given language [0].
Lightweight Syntax is an indication of the ratio between markup and actual document content. For example, XML is notoriously verbose and the text of some documents can often consist more of XML elements than actual content.
The term Specification should be taken to mean both that the language has a complete formal specification of the syntax and the underlying semantic model, and that documents can be machine-checked against this specification. The important point is: For a given document, can a machine determine unambiguously whether or not the document is valid? This is critical both for ensuring that documents remain accessible decades into the future, and for ensuring that different implementations of a language assign the same meaning to documents: If a language does not have a machine-checkable specification, then users of that language are locked into a single implementation of the language perpetually. Languages are assigned a value of Informal if they have at least made an attempt at a complete specification even if the specification is ambiguous, unimplementable, and/or does not provide any means to check documents.
The term Portability should be taken to mean that the language either has a platform-independent implementation, or implementations exist for multiple platforms [1].
The term Modular Documents should be taken to mean that a language provides a way to break documents into multiple files. Languages are assigned a value of Import if they have a language-supported system that actually parses external files and performs substitutions into an abstract syntax tree. Languages are assigned a value of Include if they only implement a simple-minded system akin to macro expansions or the C preprocessor (where the contents of external files are simply dumped verbatim into the current file and the whole mess is parsed as one unit).
                     structural  DocBook  Markdown   Texinfo  troff      AsciiDoc  reStructuredText  LaTeX
Metadata             Yes         Yes      No         No       No         No        No                No
Semantic Simplicity  Yes         No       Maybe [3]  No       No         No        No                No
Lightweight Syntax   Yes         No       Yes        Yes      Yes        Yes       Yes               Questionable
Specification        Yes         Yes      Informal   No       No         No        Informal          No
Portability          Yes         Yes      Maybe [4]  Yes      Maybe [5]  Yes       Yes               Yes
Modular Documents    Import      Import   No         Include  No         Include   No                Include
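The difference between the Import and Include values can be made concrete with a toy model. In the sketch below, a "file" is valid if its square brackets balance, and the marker @name stands for a reference to another file. Both the marker syntax and the function names are invented for this example and do not correspond to any of the languages in the table.

```python
from typing import Dict


def is_balanced(text: str) -> bool:
    """Toy validity check: every '[' has a matching ']'."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def include_style(files: Dict[str, str], name: str) -> bool:
    """Include: paste file contents in verbatim, then check the result once."""
    text = files[name]
    for other, content in files.items():
        text = text.replace("@" + other, content)
    return is_balanced(text)


def import_style(files: Dict[str, str], name: str) -> bool:
    """Import: each referenced file must be valid on its own.

    No cycle detection here; a file that imports itself would recurse forever.
    """
    text = files[name]
    for other in files:
        marker = "@" + other
        if marker in text and not import_style(files, other):
            return False
    return is_balanced(text)


# Two files that are each broken, but happen to fit together when pasted.
broken = {"main": "[document @part ]]", "part": "[paragraph text"}
# Two files that are each well-formed.
sound = {"main": "[document @part]", "part": "[paragraph text]"}
```

With these inputs, include_style(broken, "main") accepts the pasted-together text even though neither file is well-formed by itself, whereas import_style(broken, "main") rejects it. Both functions accept the sound pair.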
The DocBook system is similar to structural in that it allows for metadata within a document. DocBook, however, takes a different approach in that it defines elements for all of the things that authors may be expected to talk about in technical documentation. For example, if an author wants to tag a name as being the name of a software package, the name has to be contained within a package element. In terms of semantic simplicity, the sheer number of defined elements and the resulting possible permutations of elements mean that it is difficult to make the case that DocBook is in any way simple. In the author's experience, there are many combinations of elements in DocBook that are valid according to the schema but cannot actually be used in practice because the resulting XHTML output becomes ugly or difficult to style in a useful manner. DocBook is, however, strongly and unambiguously specified: If a document is well-formed XML, it can be machine-checked against the published schema and it is immediately known whether or not the document is valid. DocBook also allows for modular documents via the standard XML XInclude mechanism. Unfortunately, due to DocBook being defined in XML, writing documents using it can be an exercise in physical stamina. XML is notorious for being syntactically heavyweight and really requires editor support to avoid causing repetitive strain injury.
Markdown was originally designed as a trivial text format intended for quick conversion to HTML. Additionally, Markdown is intended to be readable without processing. Unfortunately, the original description of Markdown has numerous ambiguities, meaning that almost every implementation of the format differs in important ways. It was also designed with the incredibly poorly thought out idea that no document should be considered invalid; every mistake simply causes silent failure or corrupted output. It offers absolutely no standard way to incorporate metadata into documents: Terms may be marked as being bold, italic, or monospace, and very little else. It offers no way to make documents modular, and most implementations require the user to manually concatenate their documents into one large file before passing it to the Markdown processor. An attempt has been made to formalize a compatible common subset of Markdown into a system known as CommonMark. Unfortunately, after two years, the compatible subset is still rather poorly specified and contains almost none of the features of any existing implementation such as footnotes and intra-document links [2].
GNU Texinfo is a typesetting system similar to LaTeX and has all of the same advantages and flaws as that system.
troff is a somewhat archaic UNIX typesetting tool primarily used to construct manual pages. Due to being a macro-based typesetting system (albeit a drastically simpler one), it suffers from most of the same disadvantages as LaTeX. The syntax, however, is extremely lightweight and easy to parse.
LaTeX is a set of macros for the TeX typesetting system. While it produces very aesthetically pleasing output and has moderately lightweight syntax, it suffers from an excruciatingly error-prone document authoring workflow due to its macro-based nature. The user is forced to manage the state of an enormous imperative typesetting engine, and error messages are incomprehensible at best and typically contain layers of elements that have appeared from inside expanded macros. Users are forced to use external packages of macros to get support for basic features such as images, and due to the complete lack of a type system or indeed any kind of module system at all, packages can and do break when combined in unexpected ways. The language has no specification, and has an include-based system for modular documents (made extremely dangerous by the presence of the global state machine). The kstructural implementation of the structural language currently contains support for producing LaTeX code. This protects users from having to have anything to do with the LaTeX or TeX system directly.
AsciiDoc is a Python-based text document format. It has all of the advantages and disadvantages of a Markdown based system but supports a drastically larger feature set than most Markdown implementations. It also contains an insecure-by-default mode of operation that makes evaluating arbitrary documents a risky proposition. It has no specification.
reStructuredText is a text document format in the style of AsciiDoc. It has the same advantages and disadvantages as both AsciiDoc and Markdown. It has what appears to be a fairly complete informal specification, but does not have any way to determine if a given document is valid or not.
Installation
Source compilation
The project can be compiled and installed with Maven:
$ mvn -C clean install
Maven
Regular releases are made to the Central Repository, so it's possible to use the io7m-kstructural package in your projects with the following Maven dependencies:
<!-- For the user-friendly frontend Java API -->
<dependency>
  <groupId>com.io7m.kstructural</groupId>
  <artifactId>io7m-kstructural-frontend</artifactId>
  <version>0.1.0</version>
</dependency>
All io7m.com packages use Semantic Versioning [7], which implies that it is always safe to use version ranges with an exclusive upper bound equal to the next major version - the API of the package will not change in a backwards-incompatible manner before the next major version.
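A version range with an exclusive upper bound at the next major version can be written in Maven's range notation, where a square bracket is inclusive and a round bracket is exclusive. The helper below is only a sketch for deriving such a range from a three-part version string; the name next_major_range is an assumption made for this example.

```python
def next_major_range(version: str) -> str:
    """Build a Maven-style range from `version` up to (excluding) the next major."""
    major, _minor, _patch = (int(part) for part in version.split("."))
    return f"[{version},{major + 1}.0.0)"
```

For example, next_major_range("0.3.1") yields a range that could be used as the content of a version element in place of a single fixed version.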
Platform Specific Issues
There are currently no known platform-specific issues.
License
All files distributed with the io7m-kstructural package are placed under the following license:
Copyright © 2016 <code@io7m.com> http://io7m.com

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Tutorial
Overview
The structural language is intended to be a syntactically lightweight language for writing technical documentation.
The language is currently missing a tutorial, so instead, here is the source code for the documentation that you are currently reading.
Tools
Command Line Usage
Synopsis
The kstructural command processes structural documents.
Usage: kstructural [options] [command] [command options]
The command line tool is distributed as part of the executable io7m-kstructural-cmdline-0.3.1-main.jar file (referred to as kstructural.jar in usage examples, for brevity):
$ java -jar io7m-kstructural-cmdline-0.3.1-main.jar
check
The check subcommand parses and validates the given document.
check      Check document syntax and structure
  Usage: check [options]
    Options:
      * -file
          Input file
        -verbose
          Set the minimum logging verbosity level
          Default: info
          Possible Values: [trace, debug, info, warn, error]
The command exits with code 0 if no errors occurred, and a positive exit code otherwise.
$ java -jar kstructural.jar check -file valid.sd
$ echo $?
0

$ java -jar kstructural.jar check -file invalid.sd
ERROR com.io7m.kstructural.frontend.KSOpCheck: invalid.sd: 1:14: Expected an inline element.
  Expected: [symbol:{image | include | link-ext | link | term | verbatim | footnote-ref | table | list-ordered | list-unordered} ... ]
  Received: [invalid]
$ echo $?
1
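Because the outcome is reported through the exit code, the check subcommand is easy to drive from a script. The following sketch wraps it using only the options documented above; the function names and the assumption that java and the jar file are reachable from the working directory are illustrative.

```python
import subprocess
from typing import List


def check_command(jar: str, document: str, verbose: str = "info") -> List[str]:
    """Build the argument list for the documented `check` subcommand."""
    return ["java", "-jar", jar, "check", "-file", document, "-verbose", verbose]


def document_is_valid(jar: str, document: str) -> bool:
    """Run the check and interpret exit code 0 as success."""
    result = subprocess.run(check_command(jar, document))
    return result.returncode == 0
```

A build script might call document_is_valid("kstructural.jar", "valid.sd") and stop early when it returns False.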
compile-xhtml
The compile-xhtml subcommand parses and validates the given document, and then generates XHTML pages based on the content.
compile-xhtml      Compile documents to XHTML
  Usage: compile-xhtml [options]
    Options:
      -brand-bottom
         Append the contents of the given XML file to each XHTML page's body
         element
      -brand-top
         Prepend the contents of the given XML file to each XHTML page's
         body element
      -css-create-default
         Create the default CSS files in the output directory
         Default: true
      -css-extra-styles
         A comma-separated list of extra CSS styles (as URIs) that will be
         used for each page
         Default: []
      -css-include-default
         Include links to the default CSS files
         Default: true
    * -file
         Input file
    * -output-dir
         The directory in which output files will be written
      -pagination
         The type of XHTML pagination that will be used
         Default: multi
         Possible Values: [single, multi]
      -render-toc-document
         Render a table of contents at the document level
         Default: true
      -render-toc-part
         Render a table of contents at the part level
         Default: true
      -render-toc-section
         Render a table of contents at the section level
         Default: true
      -verbose
         Set the minimum logging verbosity level
         Default: info
         Possible Values: [trace, debug, info, warn, error]
The -brand-bottom option specifies an XML file that will be appended to the body of each generated XHTML page. This effectively allows for custom footers on generated pages. Note that the XML file is parsed and the element is appended to the AST of each generated page. This guarantees that the output is well formed (although not necessarily valid) XHTML.
The -brand-top option specifies an XML file that will be prepended to the body of each generated XHTML page. This effectively allows for custom headers on generated pages. Note that the XML file is parsed and the element is prepended to the AST of each generated page. This guarantees that the output is well formed (although not necessarily valid) XHTML.
The -css-create-default option specifies that the default CSS files used by kstructural should be written to the output directory. This option can be set to false if entirely custom CSS is to be used.
The -file option specifies the input file.
The -css-include-default option specifies that links to the default CSS files used by kstructural should be generated in each XHTML page. This option can be set to false if entirely custom CSS is to be used.
The -output-dir option specifies the output directory.
The -pagination option specifies how the generated XHTML should be paginated. A value of single indicates that the output should be one large XHTML page. A value of multi indicates that a new page should be created for each document, part, and section.
The -render-toc-document option specifies whether or not a table of contents should be generated for the main document.
The -render-toc-section option specifies whether or not a table of contents should be generated at the start of each section.
The -render-toc-part option specifies whether or not a table of contents should be generated at the start of each part.
The -verbose option specifies the level of logging desired.
The command exits with code 0 if no errors occurred, and a positive exit code otherwise.
$ java -jar kstructural.jar compile-xhtml -file valid.sd -output-dir /tmp
$ echo $?
0

$ java -jar kstructural.jar compile-xhtml -pagination single -file valid.sd -output-dir /tmp
$ echo $?
0

$ file /tmp/index-m.xhtml
/tmp/index-m.xhtml: XML 1.0 document, ASCII text, with very long lines, with CRLF line terminators
$ file /tmp/index.xhtml
/tmp/index.xhtml: XML 1.0 document, ASCII text, with very long lines, with CRLF line terminators
compile-latex
The compile-latex subcommand parses and validates the given document, and then generates LaTeX based on the content.
compile-latex      Compile documents to LaTeX
  Usage: compile-latex [options]
    Options:
    * -file
         Input file
    * -output-dir
         The directory in which output files will be written
      -type-map
         A file containing type name to LaTeX emphasis mappings
      -verbose
         Set the minimum logging verbosity level
         Default: info
         Possible Values: [trace, debug, info, warn, error]
The -file option specifies the input file.
The -output-dir option specifies the output directory.
The -type-map option specifies a file that contains a set of mappings from terms to emphasis types. This is used to mark specific terms as being displayed in a monospaced font, in bold, or in italic. The file contains one line per term and has the following grammar:
mapping =
  term_name , ':' , style ;

term_character =
  p{isLetter} | p{isNumber} | '_' ;

term_name =
  term_character , { term_character } ;

style =
  "bold" | "mono" | "italic" ;
For example, the following type map makes all text tagged with function monospaced, all text tagged with package bold, and all text tagged with term italic:
package  : bold
function : mono
term     : italic
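To make the grammar concrete, here is a small sketch of a reader for such a file. It is not the parser used by kstructural. It assumes that whitespace around the colon is allowed (as in the example above), that an underscore is a valid term character, and that blank lines are skipped; the name parse_type_map is hypothetical.

```python
import re
from typing import Dict

# \w matches Unicode letters, digits, and underscore in Python 3 string patterns.
_MAPPING = re.compile(r"^\s*(\w+)\s*:\s*(bold|mono|italic)\s*$")


def parse_type_map(text: str) -> Dict[str, str]:
    """Parse 'term_name : style' lines into a dictionary of term name to style."""
    styles: Dict[str, str] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        match = _MAPPING.match(line)
        if match is None:
            raise ValueError(f"line {number}: expected 'term_name : style', got {line!r}")
        styles[match.group(1)] = match.group(2)
    return styles
```

Rejecting unknown styles up front, instead of silently ignoring them, mirrors the general approach of treating malformed input as an error.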
The -verbose option specifies the level of logging desired.
The command exits with code 0 if no errors occurred, and a positive exit code otherwise.
$ java -jar kstructural.jar compile-latex -file valid.sd -output-dir /tmp
$ echo $?
0

$ file /tmp/main.tex
/tmp/main.tex: ASCII text, with very long lines
compile-plain
The compile-plain subcommand parses and validates the given document, and then generates plain text based on the content.
compile-plain      Compile documents to plain text
  Usage: compile-plain [options]
    Options:
    * -file
         Input file
    * -output-dir
         The directory in which output files will be written
      -verbose
         Set the minimum logging verbosity level
         Default: info
         Possible Values: [trace, debug, info, warn, error]
The -file option specifies the input file.
The -output-dir option specifies the output directory.
The -verbose option specifies the level of logging desired.
The command exits with code 0 if no errors occurred, and a positive exit code otherwise.
$ java -jar kstructural.jar compile-plain -file valid.sd -output-dir /tmp
$ echo $?
0

$ file /tmp/main.txt
/tmp/main.txt: UTF-8 Unicode text
convert
The convert subcommand parses and validates the given document, and then converts it to one of the supported structural formats.
convert      Convert documents between input formats
  Usage: convert [options]
    Options:
    * -file
         Input file
      -format
         The format that will be used for exported documents
         Default: canonical
         Possible Values: [canonical, imperative, xml]
      -indent
         The number of spaces that will be used to indent documents
         Default: 2
      -no-imports
         Export as one large document that does not contain any imports
         Default: false
    * -output-dir
         The directory in which output files will be written
      -verbose
         Set the minimum logging verbosity level
         Default: info
         Possible Values: [trace, debug, info, warn, error]
      -width
         The maximum width in characters that will be used when formatting
         documents
         Default: 80
The command exits with code 0 if no errors occurred, and a positive exit code otherwise.
$ cat valid.sd
[document
  [title A document]
  [section
    [title A section]
    [paragraph A paragraph]]]

$ java -jar kstructural.jar convert -file valid.sd -output-dir /tmp -format imperative
$ java -jar kstructural.jar convert -file valid.sd -output-dir /tmp -format xml

$ cat /tmp/main.xml
<?xml version="1.0" encoding="UTF-8"?>
<s:document s:title="A document" xmlns:s="http://schemas.io7m.com/structural/3.0.0">
  <s:section s:title="A section">
    <s:paragraph>A paragraph</s:paragraph>
  </s:section>
</s:document>

$ cat /tmp/main.sdi
[document [title A document]]
[section [title A section]]
[paragraph]
A paragraph
Maven Plugin Usage
Synopsis
The kstructural Maven plugin processes documents during a Maven build.
The plugin currently exposes the command line's compile-xhtml, compile-latex, and compile-plain commands via the compileXHTML, compileLaTeX, and compilePlain Maven goals to produce XHTML, LaTeX, and plain text documentation during the build. The goals have the exact same behaviour as the command-line subcommands and the parameters have the same names modulo differences in casing, so the command line documentation should be consulted for information on the behaviour of the parameters.
<plugin>
  <groupId>com.io7m.kstructural</groupId>
  <artifactId>io7m-kstructural-maven-plugin</artifactId>
  <version>0.3.1</version>
  <executions>
    <execution>
      <id>exec-multi</id>
      <goals>
        <goal>compileXHTML</goal>
      </goals>
      <phase>process-resources</phase>
      <configuration>
        <documentFile>src/main/resources/documentation/documentation.sd</documentFile>
        <outputDirectory>target/documentation/</outputDirectory>
        <pagination>XHTML_MULTI_PAGE</pagination>
        <brandTopFile>src/main/resources/documentation/brand.xml</brandTopFile>
        <renderTOCDocument>true</renderTOCDocument>
        <renderTOCSection>true</renderTOCSection>
        <renderTOCPart>true</renderTOCPart>
        <cssIncludeDefault>true</cssIncludeDefault>
        <cssCreateDefault>true</cssCreateDefault>
        <cssExtraStyles>
          <param>documentation.css</param>
        </cssExtraStyles>
      </configuration>
    </execution>
  </executions>
</plugin>
Skipping
Set the property kstructural.skip to true to skip execution of the plugin.
$ mvn -Dkstructural.skip=true clean package
Exhaustive Example
Exhaustive Example
Overview
This section represents an exhaustive example of every element available in the structural language. See the source code listing [complete.sd].
Footnotes
This is a reference to a footnote [8].
Images
This paragraph contains an image: A strawberry.
Links
This is an internal link. This is an external link.
Lists
An unordered list contained within this paragraph as inline content:
  • One
  • Two
  • Three
An ordered list contained within this paragraph as inline content:
  1. One
  2. Two
  3. Three
Nested Lists
A nested unordered list contained within this paragraph as inline content:
    • One
    • Two
    • Three
    • One
    • Two
    • Three
    • One
    • Two
    • Three
A nested ordered list contained within this paragraph as inline content:
    1. One
    2. Two
    3. Three
    1. One
    2. Two
    3. Three
    1. One
    2. Two
    3. Three
Table
The paragraph contains a simple 3x3 table with a head:
One          Two            Three
Top Left     Top Middle     Top Right
Middle Left  Middle Middle  Middle Right
Bottom Left  Bottom Middle  Bottom Right
Table (Headless)
The paragraph contains a simple 3x3 table without a head:
Top Left     Top Middle     Top Right
Middle Left  Middle Middle  Middle Right
Bottom Left  Bottom Middle  Bottom Right
Table (Mixed)
The paragraph contains a table with differently sized rows:
Top One     Top Two
Middle One  Middle Two  Middle Three
Bottom One
Term
This paragraph contains a selection of terms: mono, bold, italic.
Verbatim
This paragraph contains verbatim text:
This is verbatim text.
ASCII
This paragraph contains the entire ASCII space. Note that unrepresentable characters may be replaced with U+FFFD:
� � � � � � � � � 	 
 � � 
 � � � � � � � � � � � � � � � � � �  
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  
This paragraph contains the entire ASCII space again, but not contained within a verbatim element: � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ 
LaTeX
This paragraph contains text intended to injure the LaTeX backend by trying to insert an \end{verbatim} command inside a verbatim element:
\end{verbatim}
Target
This subsection is the target of an earlier link.

[0]
Doing this for an imperative state machine such as LaTeX could easily become a research-level project.
[1]
This again assumes the existence of an unambiguous formal specification so that multiple interoperable implementations can actually exist!
[2]
Users of Markdown are resigned to the fact that they are writing documents in an implementation-specific dialect of Markdown because there is no other alternative. This is an incredibly poor long-term strategy.
[3]
The CommonMark specification has a simple semantic model due to including almost nothing.
[4]
There are Markdown implementations for most languages. However, there may not be an implementation that supports the specific dialect the author has used.
[5]
Being a traditional UNIX tool, the troff tool is unlikely to be pleasant to use outside of UNIX.
[6]
Although elements written in the XML format cannot import non-XML elements due to limitations with XInclude.
[8]
This is the footnote content.