STX is a markup language for creating documents. It is designed to be simple enough to read and write easily, yet powerful enough to create complex documents.
A document in STX is generated from a single self-describing text file, which must comply with the syntax in order to be compiled. External content can be included from other resources by using directives.
Some of the patterns in the syntax were inspired by other markup languages, such as Markdown and AsciiDoc, to keep the entry barrier low.
The current STX implementation was written in Python and can be installed with pip.
The source code can be found in the python-stx repository on GitHub.
This document is a living example of the capabilities of STX; its source code can be found in the file index.stx in the docs repository on GitHub.
Most of the features are aimed at creating technical and academic documents.
The text written in STX generates documents represented by a hierarchical component structure. Building a document from components makes “infinite” nesting possible and allows the document to be rendered to multiple output formats.
Nesting rules are based on the text alignment. Whenever there is a block mark, the content can be nested by following the alignment of the subsequent text. The content can be broken simply by breaking the alignment or by using a % (percentage symbol).
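The alignment rule can be illustrated with a small Python sketch. It rests on assumptions that are not taken from python-stx. A block's content column is the position right after the mark and its following spaces. Later lines stay nested while they are indented at least to that column. A line containing only % ends the block. The function name split_block is hypothetical.

```python
def split_block(lines, mark):
    """Split the lines of a block opened by `mark` on lines[0].

    Returns (nested_content, remaining_lines).
    """
    first = lines[0]
    column = first.index(mark) + len(mark)
    while column < len(first) and first[column] == " ":
        column += 1  # the content starts after the mark and its spaces
    nested = [first[column:]]
    consumed = 1
    for line in lines[1:]:
        if line.strip() == "%":
            consumed += 1  # an explicit break ends the block and is dropped
            break
        indent = len(line) - len(line.lstrip(" "))
        if line.strip() and indent < column:
            break  # the alignment is broken: the block ends here
        nested.append(line[column:] if line.strip() else "")
        consumed += 1
    return nested, lines[consumed:]


content, rest = split_block(["- first line", "  second line", "back to top"], "-")
print(content, rest)  # ['first line', 'second line'] ['back to top']
```

Deeper indentation is preserved relative to the content column, so nested blocks can be split again with the same function.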
Since STX documents are structured, cross references are easy to generate and validate. Text between brackets is considered a link, and the target reference can be customized by appending it between parentheses. Broken references are detected during validation and generate a warning.
The matching algorithm for validating references ignores case and only considers words made of ASCII letters and numbers. A reference like [this] is equivalent to [THIS].
Only sections can be referenced automatically, by using the heading text. If a section would create a duplicate reference, the algorithm starts appending a number.
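The matching and de-duplication rules can be sketched in Python. This is not the python-stx implementation. Joining words with "-" and the "-2", "-3" suffix format are assumptions chosen for illustration, and normalize_reference and assign_section_refs are hypothetical names.

```python
import re


def normalize_reference(text):
    """Build a case-insensitive key from the ASCII letter/digit words of text."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "-".join(word.lower() for word in words)


def assign_section_refs(headings):
    """Give every heading a unique reference, appending a number on duplicates.

    The "-2", "-3", ... suffix format is an assumption for illustration.
    """
    used = set()
    refs = []
    for heading in headings:
        base = normalize_reference(heading)
        ref, n = base, 1
        while ref in used:  # keep counting until the reference is unique
            n += 1
            ref = f"{base}-{n}"
        used.add(ref)
        refs.append(ref)
    return refs


print(normalize_reference("this") == normalize_reference("THIS"))  # True
print(assign_section_refs(["Usage", "Examples", "usage!"]))  # ['usage', 'examples', 'usage-2']
```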
Custom references are supported by adding the ref attribute to any component.
Since STX documents are organized in sections, they can be numbered easily: every section, figure and table is numbered automatically.
Additionally, there is a function for generating a table of contents. All section, figure and table numbers in this document, as well as its table of contents, were auto-generated.
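Hierarchical numbering of this kind can be derived from section levels alone. The following Python sketch keeps one counter per level. number_sections is a hypothetical helper, not python-stx code. Figures and tables would only need a single running counter each.

```python
def number_sections(levels):
    """Return dotted section numbers ("1", "1.1", ...) for a list of levels."""
    counters = []
    numbers = []
    for level in levels:
        del counters[level:]           # a new section closes any deeper ones
        while len(counters) < level:   # open missing levels at zero
            counters.append(0)
        counters[level - 1] += 1
        numbers.append(".".join(str(c) for c in counters))
    return numbers


print(number_sections([1, 2, 2, 1, 2, 3]))
# ['1', '1.1', '1.2', '2', '2.1', '2.1.1']
```

Pairing each number with its heading text is enough to print a simple table of contents.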
While developing a new output format is not a trivial task, STX is designed to be modular so that output formats can be plugged in. The currently supported output formats are described in this section.
Since STX documents are stand-alone, the output format is specified by using directives in the same document. A document can have any number of output formats.
This output format generates a single HTML5 file. The document structure is entirely based on the HTMLBook specification, a great unofficial draft from O’Reilly Media, Inc.
Since this is the main format for rendering STX documents, some directives are optimized for HTML.
Another useful feature for creating HTML documents is the embed function, which integrates the content of another file directly into the document.
This output format generates a JSON file with the raw structure of the document. This format is not optimized for human reading, but it can be useful for debugging and for integration with other tools.
The STX syntax is stricter than that of other markup languages: a malformed component causes a compilation error.
Since this is a markup language as well, the marks in the text, together with the text alignment, make it possible to create complex documents. This section describes the different types of marks and their syntax.
This type of component can contain other components; block components are the building blocks of a document.
Block marks are not recognized inside inline content; consequently, paragraphs cannot begin with a block mark.
When several block components are defined sequentially, they can even be considered a single composite component.
There are 6 levels of sections, which are created by repeating the = (equal symbol); the number of symbols represents the level of the section.
The content of the section must be aligned with the section mark.
Compared with other markup languages, sections here are not just titles: they contain all subsequent components until a section with the same or a lower level number (the same number of = symbols or fewer) is found.
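The rule that a section contains everything up to the next section of the same or lower level behaves like a stack of open sections. This Python sketch makes several simplifying assumptions. Lines are unindented. Only = headings followed by a space and plain text lines are handled. The alignment rule is ignored. Section and build_sections are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    level: int
    title: str
    children: list = field(default_factory=list)


def build_sections(lines):
    """Nest text lines and sections under the closest open section."""
    root = Section(0, "document")
    stack = [root]
    for line in lines:
        marks = len(line) - len(line.lstrip("="))
        if 1 <= marks <= 6 and line[marks:marks + 1] == " ":
            section = Section(marks, line[marks:].strip())
            # Close every open section whose level is equal or deeper.
            while stack[-1].level >= marks:
                stack.pop()
            stack[-1].children.append(section)
            stack.append(section)
        elif line.strip():
            stack[-1].children.append(line.strip())
    return root


doc = build_sections(["= A", "text a", "== B", "text b", "= C"])
print([s.title for s in doc.children])  # ['A', 'C']
```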
There are two kinds of lists, unordered and ordered, which are created by using the - (dash symbol) and . (period symbol) respectively.
The content of the items can be any supported component as long as the nesting rules are followed.
Tables are created by gathering rows: the mark for heading rows is |= (pipe and equal symbols) and the mark for normal rows is |- (pipe and dash symbols).
The content of a row represents a cell; more cells can be added to the row by using the | (pipe symbol). This cell mark can also be used inline; in other words, multiple cells can be defined in a single line.
The number of cells per row is not required to match.
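Splitting rows into cells can be sketched in Python as follows. This is a simplification under several assumptions. Each cell holds a single line of text. A | inside cell content cannot be escaped. Lines without a table mark are ignored. parse_table and split_cells are hypothetical names. Rows are not checked for equal cell counts, in line with the rule above.

```python
def split_cells(text):
    """Split the text after a row or cell mark on inline | marks."""
    return [cell.strip() for cell in text.split("|")]


def parse_table(lines):
    """Group |= heading rows and |- normal rows into lists of cells."""
    rows = []
    for line in lines:
        text = line.strip()
        if text.startswith("|=") or text.startswith("|-"):
            kind = "heading" if text.startswith("|=") else "normal"
            rows.append({"kind": kind, "cells": split_cells(text[2:])})
        elif text.startswith("|") and rows:
            # A cell mark on its own line adds cells to the current row.
            rows[-1]["cells"].extend(split_cells(text[1:]))
    return rows


rows = parse_table(["|= Mark | Description", "|- * | Strong text", "| bold"])
```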
This component can contain raw text, which won't be parsed by STX. The text must be delimited by +++ (three plus symbols), placed on one line before and one line after it.
Optionally, the raw text can be processed by a function to decorate it or create a richer component. The function must be indicated by an entry right after the first +++ mark (there can be any number of spaces in between).
See the functions section for more details.
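Reading a raw text block is a matter of collecting lines verbatim until the closing mark. The Python sketch below assumes the closing +++ stands alone on its line. The function entry is returned as an unparsed string. read_raw_block is a hypothetical name.

```python
def read_raw_block(lines, start):
    """Read a raw text block whose opening +++ mark is on lines[start].

    Returns (function_entry or None, raw_text, index_after_block).
    """
    opening = lines[start].strip()
    if not opening.startswith("+++"):
        raise ValueError("not a raw text block")
    entry = opening[3:].strip() or None  # any spaces may precede the entry
    body = []
    for index in range(start + 1, len(lines)):
        if lines[index].strip() == "+++":
            return entry, "\n".join(body), index + 1
        body.append(lines[index])  # kept verbatim, never parsed as STX
    raise ValueError("unclosed raw text block")
```

For example, read_raw_block(["+++ code", "*not* parsed", "+++"], 0) returns the entry "code" and the text "*not* parsed" unchanged.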
This component creates a group of components that can be processed by a function to decorate them or create a richer component. The components must be delimited by {{{ (three left curly braces) and }}} (three right curly braces), placed on one line before and one line after them. The function must be indicated by an entry right after the first {{{ mark (there can be any number of spaces in between).
See the functions section for more details.
Any block component can be broken by using a % (percentage symbol).
Any other sequence of characters that doesn't match a block mark is considered a paragraph. Paragraphs are composed of a sequence of inline components and are broken by an empty line.
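The rule that every line not starting with a block mark is a paragraph can be sketched as a line classifier in Python. The marks come from this section. The kind names and the ordering are assumptions. So is the requirement that a mark be followed by whitespace or the end of the line.

```python
import re

# (kind, pattern) pairs, longest marks first. Requiring a space or the end of
# the line after each mark is an assumption of this sketch.
BLOCK_MARKS = [
    ("raw_text", r"\+\+\+"),
    ("capture_open", r"\{\{\{"),
    ("capture_close", r"\}\}\}"),
    ("heading_row", r"\|="),
    ("normal_row", r"\|-"),
    ("cell", r"\|"),
    ("section", r"={1,6}"),
    ("unordered_item", r"-"),
    ("ordered_item", r"\."),
    ("break", r"%"),
]


def classify(line):
    """Return the kind of block mark that starts line, or "paragraph"."""
    for kind, pattern in BLOCK_MARKS:
        if re.match(r"\s*" + pattern + r"(?=\s|$)", line):
            return kind
    return "paragraph"
```

With this ordering, "...so it goes" is a paragraph rather than an ordered item, because the period is not followed by whitespace.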
There are some delimiter marks that decorate the inline content they surround to produce richer components. These marks can be nested like any other component in STX.
Mark | Description | Example |
---|---|---|
* (asterisk symbol) | Strong text: normally rendered as bold text. | Note: *This* is important! |
_ (underscore symbol) | Emphasized text: normally rendered as italic text. | A.K.A text in _italics_. |
` (grave accent symbol) | Code sample text: normally rendered with a monospace typeface. | HTML uses the `<code>` tag. |
~~ (two tilde symbols) | Strike-through text: normally rendered with a horizontal line through its center. | Use ~~one~~ two tilde symbols. |
"" (two double quote symbols) | Inline typographic quotation primary marks: the text is surrounded with “ and ”. | ""Primary"" marks. |
'' (two single quote symbols) | Inline typographic quotation secondary marks: the text is surrounded with ‘ and ’. | ''Secondary'' marks. |
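The nesting of inline marks can be illustrated with a short recursive Python sketch that renders the marks in the table above to HTML. It is a simplification. The output tags are illustrative rather than those of the python-stx HTML writer. A mark cannot be nested inside itself. Delimiters cannot be escaped. Identifiers containing underscores would be mis-read as emphasis.

```python
import html
import re

# Delimiter -> (opening, closing) output; the HTML tags chosen here are an
# assumption about rendering, not the python-stx HTML writer.
INLINE_MARKS = [
    ("~~", "<del>", "</del>"),
    ('""', "\u201c", "\u201d"),
    ("''", "\u2018", "\u2019"),
    ("*", "<strong>", "</strong>"),
    ("_", "<em>", "</em>"),
]


def render_marks(text):
    """Render the earliest delimited span, recursing into its content."""
    best = None
    for delimiter, opening, closing in INLINE_MARKS:
        mark = re.escape(delimiter)
        match = re.search(mark + r"(.+?)" + mark, text)
        if match and (best is None or match.start() < best[0].start()):
            best = (match, opening, closing)
    if best is None:
        return html.escape(text, quote=False)
    match, opening, closing = best
    return (html.escape(text[:match.start()], quote=False)
            + opening + render_marks(match.group(1)) + closing
            + render_marks(text[match.end():]))


def render_inline(text):
    """Code spans are emitted literally; everything else may nest marks."""
    parts = re.split(r"`([^`]*)`", text)
    return "".join(
        "<code>" + html.escape(part, quote=False) + "</code>" if i % 2
        else render_marks(part)
        for i, part in enumerate(parts)
    )


print(render_inline("*bold _and italic_*"))  # <strong>bold <em>and italic</em></strong>
```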
Links can create cross references for internal targets or hyperlinks for external ones. To create a link, the inline content must be surrounded by [ (left square bracket symbol) and ] (right square bracket symbol). Optionally, the target reference, surrounded by parentheses, can be appended immediately after the right bracket.
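Link extraction can be sketched with a regular expression in Python. When no target is appended, the link content itself serves as the reference, as described for cross references above. Classifying targets that contain "://" as external is an assumption of this sketch, and find_links is a hypothetical name. Brackets nested inside the link text are not supported.

```python
import re

# [content] optionally followed immediately by (target).
LINK = re.compile(r"\[([^\]]+)\](?:\(([^)]*)\))?")


def find_links(text):
    """Return (content, target, kind) triples for every link in text.

    Without an explicit target, the content itself is the reference. Treating
    targets containing "://" as external is an assumption of this sketch.
    """
    links = []
    for match in LINK.finditer(text):
        content = match.group(1)
        target = match.group(2) or content
        kind = "external" if "://" in target else "internal"
        links.append((content, target, kind))
    return links
```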
There are some combinations of characters that generate specific symbols:
Character Sequence | Symbol |
---|---|
... (three period symbols) | … (ellipsis) |
Similar to capturing blocks, inline content can be captured as well by surrounding it with { and } (left and right brace symbols).
Captured text doesn't change much by itself: it only gets grouped, and the rendered result is normally the same.
In order to generate richer content, the captured content can be processed by appending a function call immediately after the } (right brace symbol).
To create a function call, an entry must be surrounded with < and > (left and right angle bracket symbols). Function calls can receive captured content and the arguments indicated in the entry.
The name of the entry indicates the function and its value provides the direct arguments. Captured content can be passed as an argument by placing it immediately before a function call.
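The pairing of captured content with a following function call can be sketched in Python. The entry is kept as a raw string here rather than parsed. Nested braces or angle brackets are not handled. find_calls is a hypothetical name. The function name "highlight" in the example is invented for illustration, while toc is one of the built-in functions.

```python
import re

# Either {captured}<entry> with nothing in between, or a bare <entry>.
CALL = re.compile(r"\{([^{}]*)\}<([^<>]*)>|<([^<>]*)>")


def find_calls(text):
    """List function calls in text with their captured content, if any."""
    calls = []
    for match in CALL.finditer(text):
        if match.group(2) is not None:
            calls.append({"captured": match.group(1), "entry": match.group(2).strip()})
        else:
            calls.append({"captured": None, "entry": match.group(3).strip()})
    return calls


print(find_calls("Press {Ctrl+C}<highlight> now"))
# [{'captured': 'Ctrl+C', 'entry': 'highlight'}]
```

A space between the closing brace and the call breaks the pairing, matching the "immediately before" rule above.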
All components can receive attributes, which are used mainly to change how the component is rendered or just to provide meta information.
The attributes must be specified before the component and are created with a @ (at symbol) followed by an entry. The entry represents the name and value of the attribute.
The attributes that all components accept are described in the table below.
Attribute | Accepted Values | Description |
---|---|---|
ref | Token or Group of tokens. | Defines how the component can be referenced by links. |
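Attaching attributes to the component that follows them can be sketched in Python. It handles only simple "name: value" entries written on their own line. Components are reduced to single text lines. attach_attributes is a hypothetical name.

```python
def attach_attributes(lines):
    """Attach '@name: value' lines to the component line that follows them."""
    pending = {}
    components = []
    for line in lines:
        text = line.strip()
        if text.startswith("@"):
            name, _, value = text[1:].partition(":")
            pending[name.strip()] = value.strip()
        elif text:
            components.append({"text": text, "attributes": pending})
            pending = {}  # attributes apply only to the next component
    return components
```

For example, an "@ref: intro" line placed before a "= Introduction" heading gives that section the custom reference intro.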
Directives are instructions for the STX compiler; they are created by using a # (number sign) followed by an entry representing the name and arguments of the directive.
Unlike functions, which produce a component and are evaluated after the document is parsed, directives are evaluated as soon as they are found in the document and don't necessarily produce components.
Accepts a Token or a Group of tokens; the tokens represent the files to be included where the directive is placed.
The specified files are relative to the current STX file; if the arguments contain a folder, the entire content of the folder is included recursively.
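Resolving include targets can be sketched with pathlib in Python. Paths are resolved against the folder of the including file, and folders are expanded recursively. Including every file in a folder without filtering by extension, in sorted order, is an assumption. resolve_includes is a hypothetical name.

```python
from pathlib import Path


def resolve_includes(current_file, targets):
    """Expand include targets relative to the folder of the including file."""
    base = Path(current_file).parent
    files = []
    for target in targets:
        path = base / target
        if path.is_dir():
            # Folders are included recursively; the sorted order is an assumption.
            files.extend(sorted(p for p in path.rglob("*") if p.is_file()))
        else:
            files.append(path)
    return files
```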
Indicates the output format after processing the document. Accepts the following arguments:
format
: Built-in values are html and json.
target
: The file where the document should be generated.
options
: Format-specific options.
This directive exists because HTML and other formats support the use of stylesheets. It accepts a Token or a Group of tokens representing the stylesheet files.
There are some directives for defining document attributes; these values are not necessarily rendered in the document.
title
: Defines the title of the document.
author
: Defines the author of the document.
encoding
: Defines the encoding of the document.
In STX, functions can receive plain values and components as arguments to generate new components. There are some built-in functions described in this section; however, more functions can be registered at runtime, since the design is modular.
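Functions can be invoked by using the following components: raw text blocks (+++), capturing blocks ({{{ and }}}) and function calls (< and >), which may be preceded by captured inline content delimited by { and }.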
All allowed components can use plain values as arguments by sending them in the entry.
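Runtime registration can be sketched in Python as a keyword-to-callable registry. The names FUNCTIONS, register and call are hypothetical and are not the python-stx API. The keywords img/image and information/warning are taken from the built-in functions described below. Passing the keyword to the function lets one implementation serve several keywords.

```python
# Hypothetical registry: the names FUNCTIONS, register and call are
# illustrative and are not the python-stx API.
FUNCTIONS = {}


def register(*keywords):
    """Decorator that makes a function available under one or more keywords."""
    def decorator(function):
        for keyword in keywords:
            FUNCTIONS[keyword] = function
        return function
    return decorator


def call(keyword, captured=None, **arguments):
    """Invoke a registered function by keyword."""
    if keyword not in FUNCTIONS:
        raise ValueError(f"unknown function: {keyword}")
    return FUNCTIONS[keyword](keyword, captured, **arguments)


@register("img", "image")
def image(keyword, captured, src, alt=""):
    return {"type": "image", "src": src, "alt": alt}


@register("information", "warning")
def admonition(keyword, captured):
    # The keyword used for the call becomes the admonition type.
    return {"type": "admonition", "kind": keyword, "content": captured}
```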
This function can be invoked by using the code keyword, accepting the following plain arguments:
lang
: Name of the language of the content.
Literal text should be passed as the argument; if a rich component is passed, it will be converted to plain text and a warning will be emitted.
If the language is supported by a registered grammar, the result will be tokenized; otherwise, the result will just be marked as a code block.
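The fallback behavior of the code function can be sketched in Python. GRAMMARS and code_block are hypothetical names. Rich content is modelled as a list of text fragments, and the "words" grammar is a toy tokenizer registered only for the example.

```python
import warnings

# Hypothetical grammar registry: language name -> tokenizer callable.
GRAMMARS = {}


def code_block(content, lang=None):
    """Build a code component from literal text (sketch).

    Rich content is modelled here as a list of text fragments; it is
    flattened to plain text with a warning, as the documentation describes.
    """
    if not isinstance(content, str):
        warnings.warn("rich content converted to plain text")
        content = "".join(str(fragment) for fragment in content)
    tokenizer = GRAMMARS.get(lang)
    if tokenizer is None:
        # Unsupported language: just mark the text as a code block.
        return {"type": "code", "lang": lang, "text": content}
    return {"type": "code", "lang": lang, "tokens": tokenizer(content)}


GRAMMARS["words"] = str.split  # toy grammar registered for the example
```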
This function can be invoked by using the img and image keywords, accepting the following plain arguments:
src
: Location of the image.
alt
: Alternative text describing the image.
The resulting component will be an image inserted in the document.
This function can be invoked by using the embed keyword, accepting the following plain arguments:
src
: Location of the file to embed.
The resulting component will be literal text with the content of the file.
To embed the components of an STX file, the include directive should be used instead.
This function can be invoked by using the toc keyword, accepting the following plain arguments:
title
: The title of the table of contents.
The resulting component will be a table of contents of all sections defined in the document.
This function can be invoked by using the information and warning keywords.
The resulting component will be an admonition block whose type is given by the keyword.
Apart from the rich content of the components, STX has a syntax for entering structured data, which is used to invoke and pass arguments to functions and to define directives and attributes, among other uses.
This way of entering data is designed to be easy for a human to read and write; it consists of the three types of data described below.
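Represents a final value; depending on the context, it can be interpreted as text, number, boolean, etc.
There are two ways of writing a token:
- Without delimiters: a sequence of letters, numbers, underscores (_), dashes (-), periods (.) or slashes (/).
- With delimiters: text enclosed in quotation marks, such as grave accents (`); special characters can be escaped by using a \.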
An entry represents a named value: the name is defined by a token followed by a : (colon symbol) as a separator, and the value can be either a token or a group. When the value is a group, the separator can be omitted.
A sequence of values (tokens, entries or groups) delimited by parentheses, ( and ), and separated by commas (,).
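The three data types fit a small recursive-descent parser. The Python sketch below is illustrative only. The Entry class, Parser class and parse function are hypothetical. It also makes several assumptions:
- Quoted tokens may be delimited by ", ' or `, and the same character closes them.
- Unquoted tokens consist of letters, digits, _, -, . and /.
- Whitespace between items is insignificant.

Under these assumptions, values containing a colon, such as URLs, must be quoted.

```python
from dataclasses import dataclass

QUOTES = "\"'`"      # assumption: characters that may delimit a quoted token
WORD_CHARS = "_-./"  # extra characters allowed in an unquoted token


@dataclass
class Entry:
    name: str
    value: object


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Skip whitespace and return the next character ('' at the end)."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse_value(self):
        """value := group | token | token ':' value | token group"""
        if self.peek() == "(":
            return self.parse_group()
        name = self.parse_token()
        if self.peek() == ":":
            self.pos += 1
            return Entry(name, self.parse_value())
        if self.peek() == "(":
            return Entry(name, self.parse_group())  # separator omitted
        return name

    def parse_group(self):
        self.pos += 1  # consume "("
        items = []
        if self.peek() == ")":
            self.pos += 1
            return items
        while True:
            items.append(self.parse_value())
            char = self.peek()
            self.pos += 1
            if char == ")":
                return items
            if char != ",":
                raise SyntaxError(f"expected ',' or ')' at {self.pos - 1}")

    def parse_token(self):
        char = self.peek()
        if char and char in QUOTES:
            return self.parse_quoted(char)
        start = self.pos
        while self.pos < len(self.text) and (
            self.text[self.pos].isalnum() or self.text[self.pos] in WORD_CHARS
        ):
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a token at {start}")
        return self.text[start:self.pos]

    def parse_quoted(self, quote):
        self.pos += 1  # consume the opening quote
        chars = []
        while self.pos < len(self.text):
            char = self.text[self.pos]
            self.pos += 1
            if char == "\\" and self.pos < len(self.text):
                chars.append(self.text[self.pos])  # escaped character, kept as-is
                self.pos += 1
            elif char == quote:
                return "".join(chars)
            else:
                chars.append(char)
        raise SyntaxError("unterminated quoted token")


def parse(text):
    parser = Parser(text)
    value = parser.parse_value()
    if parser.peek():
        raise SyntaxError(f"unexpected text at {parser.pos}")
    return value


print(parse('img(src: logo.png, alt: "The logo")'))
# Entry(name='img', value=[Entry(name='src', value='logo.png'), Entry(name='alt', value='The logo')])
```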