Site Generator

Written February 4, 2021, last changed February 21, 2021

A tool for generating static html and gemtext pages from markdown-like source documents.

=> https://github.com/clarityflowers/sitegen Source code

By keeping the functionality simple and the code structure straightforward, adding and removing features becomes an inviting task. Rather than building a library of common tools or writing extensions or plugins, I want to keep the scope small enough that new variations for kinds of websites could be copy & paste clones with custom tweaks.

> sitegen --help

usage:
  sitegen <command> [<args>]

commands:
  make   The main generation tool
  index  Outputs site indexes in various formats
  docs   Print the sitegen documentation in sitegen format

Building the site

> sitegen make --help

usage:
  sitegen make [-p] [--private] [--] <out_dir> [<site_dir>]
  sitegen make [--help]

arguments:
  out_dir   the output folder for all generated content
  site_dir  the input folder for the site itself, defaults to cwd
options:
  --help         show this text
  -p, --private  include private content in the build

The site generator creates /html and /gmi directories inside the given output directory. My publishing process looks something like:

rm -rf out
mkdir -p out/html out/gmi
cp -r public-html/* out/html/
cp -r public/* out/html/
cp -r public/* out/gmi
sitegen make out site && \
scp -r out/html/* username@clarity.flowers:/path/to/http/server && \
scp -r out/gmi/* username@clarity.flowers:/path/to/gmi/server

Please note that these documents are full-fledged programs. Running the site generator on a document that you haven't vetted yourself is like running a shell script you got off the internet without making sure you know what it does. Use at your own risk.
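
One way to vet an unfamiliar document before building is to list every line that would hand text to the shell. This is only a minimal sketch: it assumes command lines always begin with a colon (possibly behind the semicolon of a private line), and "list_commands" is a hypothetical helper name, not part of sitegen.

```shell
# list_commands: print file name, line number and text of every command line
# in the given documents.
# Assumption: a command line starts with ":", or with ";" plus optional
# spaces and then ":" when the command sits on a private line.
list_commands() {
  # /dev/null forces grep to print the file name even for a single file
  grep -nE '^(; *)?:' /dev/null "$@"
}
```

Running something like `list_commands site/*.txt` gives a quick overview of what a build would execute. It is a reading aid, not a sandbox.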

Indexing

> sitegen index --help

usage:
  sitegen index [--help]
  sitegen index [--private] [--limit <num>] [--updates | --additions] [--] 
    <file> [<file>...]
options:
  -p, --private            include private content in the list
  -l <num>, --limit <num>  limit the number of entries to the top num
  -u, --updates            only include updates (and not additions)
  -a, --additions          only include additions (and not updates)

Indexing allows you to create lists of files inside of your site, which you can combine with the ability to run arbitrary commands inside of documents to produce in-page tables of contents.

> sitegen index --limit 5 *

=> language.* 2021-02-27 – Language
=> toki_pona.* 2021-02-27 – Toki Pona – Added some sample Toki Pona text.
=> computers.* 2021-02-21 – Computers – Added some more interesting things
=> music.* 2021-02-21 – Music
=> sexual_hegemony.* 2021-02-21 – Notes on Sexual Hegemony – Finished chapter 4

Document format

Sitegen documents are UTF-8 encoded .txt files, with newlines as line endings. The first few lines of the file are the title and metadata, followed by an empty line and then the contents.

Info

A document starts with its title and the date it was written, followed by optional metadata.

Site Generator
Written 2021-02-04

You can keep track of the history of updates to the page:

Updated 2021-02-05 Added a bunch of features
Updated 2021-02-06 You can have multiple updates now

You can also mark a page as private or unlisted:

Private
Unlisted

Private pages will only be generated if you include the "--private" option during generation. Unlisted pages will be generated, but won't show up during indexing.
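
The header layout is simple enough to inspect with standard tools. Here is a rough sketch, assuming the header is everything before the first empty line. "doc_header" and "is_listed" are hypothetical helpers, not sitegen commands.

```shell
# doc_header: print the title and metadata lines of a document,
# i.e. everything before the first empty line.
doc_header() {
  awk '$0 == "" { exit } { print }' "$1"
}

# is_listed: succeed unless the header contains a line that is exactly
# "Private" or "Unlisted".
is_listed() {
  ! doc_header "$1" | grep -qxE 'Private|Unlisted'
}
```

For example, `is_listed music.txt && echo listed` prints "listed" only when neither marker appears in the header of music.txt.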

Body

The remaining text comprises the document itself. Documents are raw text with special "blocks" marked by prefixing a line with special characters.

Raw text is copied literally in gemini, and separated into paragraphs by empty lines in html. Newlines are carried over into the final content, so you'll need an editor that supports text wrapping for large blocks of text.

There are two levels of heading available:

# Section heading
## Sub-section heading

Links are written gemini-style.

=> gemini://gemini.circumlunar.space/ About Gemini

Links to internal documents need to have a different extension based on the output format. Using "*" as an extension will accomplish this.

=> computers.* Computers

Block quotes are formatted into paragraphs the same way raw text is:

> This line and...
> ... this one are all one paragraph within this quote block.
> 
> This is a separate paragraph in the same quote block.

There is only a single style of list, with only one line allowed per entry:

- list
- items

Preformatted text is preceded by two spaces:

Here is a code block:

  echo "hello world"

Sometimes you'll want certain blocks to only be included in certain output formats. You can start the line with the given extension and write raw text that will be literally copied into matching documents.

.html <img src="/media/me.png" alt="A selfie"/>
.gmi => /media/me.png A selfie
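
To make the effect of these prefixes concrete, here is a small filter that mimics which lines survive for a given format. It is a sketch of the described behaviour, not sitegen's actual code. "select_format" is a hypothetical name, and it assumes "html" and "gmi" are the only extensions in play.

```shell
# select_format: filter stdin for one output format ("html" or "gmi").
# Lines tagged for the requested format lose their prefix, lines tagged for
# the other format are dropped, and everything else passes through untouched.
select_format() {
  awk -v want="$1" '
    /^\.(html|gmi) / {
      sp = index($0, " ")                 # position of the first space
      ext = substr($0, 2, sp - 2)         # text between "." and that space
      if (ext == want) print substr($0, sp + 1)
      next
    }
    { print }
  '
}
```

Piping the two example lines through `select_format gmi` keeps only the link line, while `select_format html` keeps only the img tag.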

Lines prefixed with a semicolon are private. You can use them to keep track of personal notes that don't need to be shared, or to comment your documents. Private lines are only included in the final output if you pass the "--private" option during generation.

; This is a private line

Parsing for private lines happens during a pre-parsing step, which means your private lines can include formatting like any other.

; => somewhere_secret.* A private link
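
The pre-parsing step can be pictured as a simple line filter that runs before any other formatting is recognised. The sketch below is an illustration under assumptions, not the real implementation. "preparse_private" is a hypothetical name, and it assumes the semicolon is followed by at most one space that belongs to the marker.

```shell
# preparse_private: filter stdin the way the pre-parsing step is described.
# With "--private", the ";" marker and one optional following space are
# removed, so the rest of the line can be parsed like any other line.
# Without it, private lines are dropped entirely.
preparse_private() {
  if [ "${1:-}" = "--private" ]; then
    sed 's/^; \{0,1\}//'
  else
    sed '/^;/d'
  fi
}
```

With the flag, the private link line comes out as an ordinary "=>" link line, which is why private lines can carry any formatting.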

You can also run arbitrary commands with your shell and parse the output as formatted text!

: echo "# Hello world
: => hello.png Hello world!"

This has pretty huge implications, as it allows you to write "plugins" in any language that generate data without constraints. The shell is invoked with the document file's directory as its working directory, and the "$FILE" environment variable is set to the name of the file (without the .txt extension).
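
As an illustration of what such a "plugin" might look like, here is a tiny one that reports the word count of the page it is called from. It leans only on the two guarantees described above (the working directory and "$FILE"). The name "word_count_line" and the output wording are made up for this example.

```shell
# word_count_line: hypothetical plugin that prints one sentence with the
# word count of the current document.
# Relies on the described behaviour: the working directory is the document's
# directory and $FILE is the document's name without ".txt".
word_count_line() {
  words=$(wc -w < "${FILE}.txt")
  # $((...)) strips the padding some wc implementations add
  echo "This page is about $((words)) words long."
}
```

Saved in a script that ends by calling the function, it could be run from a command line in any document, and its output would be parsed as ordinary text.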

A pattern I find myself using fairly often (including in this document) is to pipe the result of commands into sed in order to format the output as a preformatted block.

: sitegen index --help | sed 's/.*/  &/'
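
If you repeat this pattern often, it can be wrapped in a small helper. The sketch below prints the command as a prompt line, then a blank line, then the command's output, all indented by two spaces so the result is one preformatted block. "show" is a hypothetical name, and the helper would need to live in a script the invoking shell can find.

```shell
# show: run a command and print "> command", an empty line and the command's
# output, every line prefixed with two spaces (preformatted text).
show() {
  { printf '> %s\n\n' "$*"; "$@"; } | sed 's/.*/  &/'
}
```

For example, `show sitegen make --help` would produce the prompt line followed by the indented help text.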

Example

For a complete example of a document, here is the source of the page you are currently reading:

Site Generator
Written 2021-02-04
Updated 2021-02-06 Added a bunch of features
Updated 2021-02-16 Blank lines are now handled literally in gemini
Updated 2021-02-21 Rewrote to a more conventional documentation format.

A tool for generating static html and gemtext pages from markdown-like source documents.

=> https://github.com/clarityflowers/sitegen Source code

By keeping the functionality simple and the code structure straightforward, adding and removing features becomes an inviting task. Rather than building a library of common tools or writing extensions or plugins, I want to keep the scope small enough that new variations for kinds of websites could be copy & paste clones with custom tweaks.

: echo '  > sitegen --help'
: echo '  '
: sitegen --help | sed 's/.*/  &/'

# Building the site

: echo '  > sitegen make --help'
: echo '  '
: sitegen make --help | sed 's/.*/  &/'

The site generator creates /html and /gmi directories inside the given output directory. My publishing process looks something like:

  rm -rf out
  mkdir -p out/html out/gmi
  cp -r public-html/* out/html/
  cp -r public/* out/html/
  cp -r public/* out/gmi
  sitegen make out site && \
  scp -r out/html/* username@clarity.flowers:/path/to/http/server && \
  scp -r out/gmi/* username@clarity.flowers:/path/to/gmi/server

Please note that these documents are full-fledged programs. Running the site generator on a document that you haven't vetted yourself is like running a shell script you got off the internet without making sure you know what it does. Use at your own risk.


# Indexing

: echo '  > sitegen index --help'
: echo '  '
: sitegen index --help | sed 's/.*/  &/'

Indexing allows you to create lists of files inside of your site, which you can combine with the ability to run arbitrary commands inside of documents to produce in-page tables of contents.

: echo '  > sitegen index --limit 5 *'
: echo '  '
: sitegen index --limit 5 * | sed 's/.*/  &/'

# Document format

Sitegen documents are UTF-8 encoded .txt files, with newlines as line endings. The first few lines of the file are the title and metadata, followed by an empty line and then the contents.

## Info

A document starts with its title and the date it was written, followed by optional metadata.

  Site Generator
  Written 2021-02-04

You can keep track of the history of updates to the page:

  Updated 2021-02-05 Added a bunch of features
  Updated 2021-02-06 You can have multiple updates now

You can also mark a page as private or unlisted:

  Private
  Unlisted

Private pages will only be generated if you include the "--private" option during generation. Unlisted pages will be generated, but won't show up during indexing.

## Body

The remaining text comprises the document itself. Documents are raw text with special "blocks" marked by prefixing a line with special characters.

Raw text is copied literally in gemini, and separated into paragraphs by empty lines in html. Newlines are carried over into the final content, so you'll need an editor that supports text wrapping for large blocks of text.

There are two levels of heading available:

  # Section heading
  ## Sub-section heading

Links are written gemini-style.

  => gemini://gemini.circumlunar.space/ About Gemini

Links to internal documents need to have a different extension based on the output format. Using "*" as an extension will accomplish this.

  => computers.* Computers

Block quotes are formatted into paragraphs the same way raw text is:

  > This line and...
  > ... this one are all one paragraph within this quote block.
  > 
  > This is a separate paragraph in the same quote block.

There is only a single style of list, with only one line allowed per entry:

  - list
  - items

Preformatted text is preceded by two spaces:

  Here is a code block:
  
    echo "hello world"

Sometimes you'll want certain blocks to only be included in certain output formats. You can start the line with the given extension and write raw text that will be literally copied into matching documents.

  .html <img src="/media/me.png" alt="A selfie"/>
  .gmi => /media/me.png A selfie

Lines prefixed with a semicolon are private. You can use them to keep track of personal notes that don't need to be shared, or to comment your documents. Private lines are only included in the final output if you pass the "--private" option during generation.

  ; This is a private line

Parsing for private lines happens during a pre-parsing step, which means your private lines can include formatting like any other.

  ; => somewhere_secret.* A private link

You can also run arbitrary commands with your shell and parse the output as formatted text!

  : echo "# Hello world
  : => hello.png Hello world!"

This has pretty huge implications, as it allows you to write "plugins" in any language that generate data without constraints. The shell is invoked with the document file's directory as its working directory, and the "$FILE" environment variable is set to the name of the file (without the .txt extension).

A pattern I find myself using fairly often (including in this document) is to pipe the result of commands into sed in order to format the output as a preformatted block. 

  : sitegen index --help | sed 's/.*/  &/'

## Example

For a complete example of a document, here is the source of the page you are currently reading:

: sed 's/.*/  &/' {$FILE}.txt