The Goofy Markup Language Processor
Last Update: Version 1.4.3, October 8, 2017.
A "less hasty" update than previous. Markdown support greatly improved.
Note: This code has a basic flaw: It treats the input text as an array of lines, modifying the array as it processes the text. It should read the input text line by line, while buffering/outputting the processed text separately.
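The line-by-line approach described in the note could look roughly like the sketch below. This is not GMLP code: convert_stream() and its parameters are hypothetical names, and the per-line conversion is reduced to a single callback to keep the example short.

```php
<?php
// Hypothetical sketch: none of these names come from GMLP itself.

/**
 * Read $in one line at a time, convert each line, and write the result
 * to $out, so the input is never modified in place.
 */
function convert_stream($in, $out, callable $convertLine): void
{
    while (($line = fgets($in)) !== false) {
        // Strip the line terminator before converting, restore it after.
        $line = rtrim($line, "\r\n");
        fwrite($out, $convertLine($line) . "\n");
    }
}

// Usage: convert an in-memory text with a trivial per-line rule.
$in = fopen('php://memory', 'w+');
fwrite($in, "first line\nsecond line\n");
rewind($in);

$out = fopen('php://memory', 'w+');
convert_stream($in, $out, 'strtoupper');

rewind($out);
$result = stream_get_contents($out);
echo $result;
```

Because the output goes to its own stream, a rule can look at the current line without worrying that earlier rules have already rewritten neighbouring lines. Block (multi-line) rules would need a small look-ahead buffer, which this sketch leaves out.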
The term Markup Language Processor reflects its origins as a program to convert "marked up" text to HTML. It has since turned into something completely different: a set of functions for converting text in nearly any conceivable way via a User Defined Data File.
The main documentation is viewed through index.php, run under your localhost server.
The idea was, at first, to be able to change the converted output, not by modifying code, but by modifying the data that defined the output conversions.
Why?
I saw the idea of a program that converts only one markup to HTML as myopic. For example, Markdown only does Markdown; Textile only does Textile; BBCode only does BBCode; etc.
The reason for GMLP was to convert any markup language to HTML with a definition file for each markup language. So GMLP does Markdown, Textile, BBCode, etc. And anyone can add their own definition file.
Markdown support is only slowly being finished and Textile support has been removed. (The Markdown code was written "hastily, sloppily" and needs yet another re-write.)
But GMLP is not just for markup to HTML. It is a fully functional "any text input to any text output" processor, with no programming required, though basic familiarity with PHP and regular expressions is.
GMLP Is
- A really small PHP API and not a Class.
- Not a single string literal test for markup in the code.
- Conversions are based on user defined regular expressions.
- User defined functions can be added to support conversions.
- Can be integrated into a Blog/CMS type website (though the code is a bit slow).
- Can be run from the command line to convert files.
The algorithm is currently about 800 lines and a basic definition file (data and code) is about 300 lines, which means basic Markdown conversion in about 1100 lines (though there is some tweaking to do for it). (Compared to Markdown.pl, at about 1400 lines, the size of the code is not that relevant; but Markdown can only do Markdown.)
Caveats
- The code lacks proper test/benchmark code, though work on that has begun.
- Some of this code is a bit sloppy and unclear. The documentation is poor.
- See Simple Markup for a 300 line version of HTML markup.
Definition Files
The changes to make to the input text are defined by a specially formatted PHP associative array of "actions" to perform on the input data.
A definition file has sections for conversions based on characters, words, lines and blocks (multi-line), with each section defining "rules" for how to convert the input text.
For more complex conversions the output can be supported by user defined functions. And there are "hooks" to run functions with the data at certain points in the conversion process.
There are additional sections for internal options and for defining paragraph marks and end of line terminators.
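To make the idea concrete, here is a minimal sketch of a data-driven definition and a loop that applies it. The section and key names ('lines', 'words', 'match', 'replace', 'eol') and the apply_definition() function are invented for illustration and are not GMLP's actual format; see the definition files in the distribution for the real layout.

```php
<?php
// Hypothetical layout: the section and key names below are invented for
// illustration and are not GMLP's actual definition format.
$definition = [
    // Line rules: patterns that must match a whole line.
    'lines' => [
        ['match' => '/^= (.+) =$/', 'replace' => '<h1>$1</h1>'],
    ],
    // Word rules: inline patterns applied within a line.
    'words' => [
        ['match' => '/\*(\w+)\*/', 'replace' => '<b>$1</b>'],
        // A user defined function can stand in for a replacement string.
        ['match' => '/!(\w+)!/', 'replace' => function (array $m): string {
            return strtoupper($m[1]);
        }],
    ],
    // End of line terminator used to split and re-join the text.
    'eol' => "\n",
];

function apply_definition(string $text, array $definition): string
{
    $out = [];
    foreach (explode($definition['eol'], $text) as $line) {
        foreach (['lines', 'words'] as $section) {
            foreach ($definition[$section] as $rule) {
                // Closures go through preg_replace_callback, strings
                // through plain preg_replace.
                $line = $rule['replace'] instanceof Closure
                    ? preg_replace_callback($rule['match'], $rule['replace'], $line)
                    : preg_replace($rule['match'], $rule['replace'], $line);
            }
        }
        $out[] = $line;
    }
    return implode($definition['eol'], $out);
}

$html = apply_definition("= Title =\nsome *bold* and !loud! text", $definition);
echo $html, "\n";
```

Changing the output then means editing the array, not the loop. Block rules and hooks are omitted here; they would need state that spans several lines.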
Please note that the following Definition Files (and their included code) are not fully completed. That is a fault of implementation, not a limitation, which means they will be finished and work as expected once I (or anybody else) finish their code.
The Definition Files
Definition Files are in the tests/ directory:
- markdown: Markdown to HTML
- txt2markdown: plain text to Markdown
- htm2markdown: HTML to Markdown
- txt2html: plain text to HTML
- html2txt: HTML to plain text
- md2txt: Markdown to plain text
- bbcode: BBCode to HTML
- css-: CSS minify
- css+: CSS un-minify
- replace: string search and replace (example)
- js-: Javascript minimizer (not great)
- comments-: strip comment lines/blocks
Recently added:
- htmlstrip: remove all HTML from input
- phpdoc: Phpdoc function comments to PHP.NET documentation. See phpdoc.txt.
Experimental:
- js: Javascript syntax highlighting
The Javascript highlighting definition file is kind of sloppy and weird, and was created just to see if that kind of processing could be done.
Markdown
Markdown is nearly complete. It's about 400 lines now.
HTML to Markdown
HTML to Markdown has just been started (one day's worth of coding) and is based on to-markdown by Dom Christie. (It is a good example of the kind of things that can be done.)
Text to Markdown
Text to Markdown is experimental. It is a way to format text documents by simple conventions, without writing Markdown, so that they can be readily converted to Markdown. For example, a line of only UPPERCASE WORDS is converted to a # header, and a line of only Camel Case words to a ## header.
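The two heading conventions mentioned here can be sketched with a pair of regular expressions. The patterns and the text_line_to_markdown() function are assumptions made for this example, not the rules in the txt2markdown definition file; in particular, "Camel Case" is read here as "every word capitalized".

```php
<?php
// Hypothetical sketch of the two heading rules; the patterns are
// assumptions, not the ones in the txt2markdown definition.
function text_line_to_markdown(string $line): string
{
    // Only uppercase words separated by single spaces: level-1 header.
    if (preg_match('/^[A-Z]+( [A-Z]+)*$/', $line)) {
        return '# ' . $line;
    }
    // Only capitalized words (e.g. "Getting Started"): level-2 header.
    if (preg_match('/^[A-Z][a-z]+( [A-Z][a-z]+)*$/', $line)) {
        return '## ' . $line;
    }
    // Anything else is ordinary text and passes through unchanged.
    return $line;
}

$lines = ['INSTALLATION', 'Getting Started', 'Run the installer first.'];
$converted = array_map('text_line_to_markdown', $lines);
echo implode("\n", $converted), "\n";
```

The uppercase test runs first so that an all-caps line is never mistaken for a capitalized one. Lines with digits or punctuation fall through as plain text under these patterns.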
Text to HTML
Text to HTML is new and similar to above but output is HTML (like Text -> MD -> HTML without the MD).
Testing
GMLP comes with index.php, which will convert the main document gmlp.txt as well as the .php source files, and the Markdown and To-Markdown test files.
The default markup is called "Goofy Markup" and should be considered example only.
CLI
There is also a CLI executable, gmlp, for command line converting and testing (see CLI.md).