module documentation

Generic parser for wiki formats

This parser for wiki text (and similar formats) consists of two classes: the Rule class, whose objects each specify a single parser rule, and the Parser class, which takes a number of rules and parses a piece of text accordingly. The parser simply performs a series of regex matches and, for each match, calls a method on the corresponding rule object to process it. Recursion can be achieved by having a rule process its match with another Parser object.
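To make the single-regex design concrete, here is a minimal sketch in Python. It is not this module's API: `ToyRule`, `ToyParser` and the `process(output, text)` callback are hypothetical names. Each rule's pattern is wrapped in a named group, all groups are joined into one alternation, and `Match.lastgroup` identifies the rule that matched. Text between matches is passed through as plain text.

```python
import re
from typing import Callable, List


class ToyRule:
    """Hypothetical rule: a name, a regex fragment and a callback."""

    def __init__(self, name: str, pattern: str,
                 process: Callable[[List[str], str], None]) -> None:
        self.name = name
        self.pattern = pattern
        self.process = process


class ToyParser:
    """Hypothetical parser that merges all rule patterns into one regex."""

    def __init__(self, *rules: ToyRule) -> None:
        self.rules = {rule.name: rule for rule in rules}
        # One big alternation; each rule gets its own named group.
        combined = "|".join(
            "(?P<%s>%s)" % (rule.name, rule.pattern) for rule in rules
        )
        self.regex = re.compile(combined, re.U | re.M | re.X)

    def __call__(self, text: str) -> List[str]:
        output: List[str] = []
        pos = 0
        for match in self.regex.finditer(text):
            # Text between two matches is emitted as plain text.
            if match.start() > pos:
                output.append("TEXT:" + text[pos:match.start()])
            # Each rule's group wraps its whole pattern, so lastgroup
            # names the rule that matched.
            self.rules[match.lastgroup].process(output, match.group())
            pos = match.end()
        if pos < len(text):
            output.append("TEXT:" + text[pos:])
        return output


bold = ToyRule("bold", r"\*\*[^*]+\*\*",
               lambda out, s: out.append("BOLD:" + s[2:-2]))
link = ToyRule("link", r"\[\[[^\]]+\]\]",
               lambda out, s: out.append("LINK:" + s[2:-2]))
parser = ToyParser(bold, link)

print(parser("a **b** [[c]] d"))
# ['TEXT:a ', 'BOLD:b', 'TEXT: ', 'LINK:c', 'TEXT: d']
```

Compiling the rules into one alternation lets the text be scanned in a single pass instead of once per rule; the trade-off is the limit on capturing groups per expression.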

All rules have access to a Builder object which is used to construct the resulting parse tree.
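A builder keeps the rules independent of the tree they produce: rules only emit events such as "start element", "add text" and "end element". The sketch below is hypothetical (`ToyTreeBuilder` and its `start`, `text`, `end` and `get_root` methods are assumptions for illustration, not a description of this module's Builder) and represents each element as a nested list `[tag, attrib, *children]`.

```python
from typing import Any, Dict, List, Optional


class ToyTreeBuilder:
    """Hypothetical builder that turns start/text/end events into nested lists."""

    def __init__(self) -> None:
        self.root: List[Any] = ["root", {}]
        self.stack: List[List[Any]] = [self.root]

    def start(self, tag: str, attrib: Optional[Dict[str, str]] = None) -> None:
        element: List[Any] = [tag, dict(attrib or {})]
        self.stack[-1].append(element)  # attach to the current parent
        self.stack.append(element)      # the new element becomes the parent

    def text(self, text: str) -> None:
        self.stack[-1].append(text)

    def end(self, tag: str) -> None:
        if len(self.stack) == 1:
            raise ValueError("end(%r) without a matching start()" % tag)
        element = self.stack.pop()
        if element[0] != tag:
            raise ValueError("end(%r) does not match start(%r)" % (tag, element[0]))

    def get_root(self) -> List[Any]:
        return self.root


builder = ToyTreeBuilder()
builder.start("p")
builder.text("Hello ")
builder.start("strong")
builder.text("world")
builder.end("strong")
builder.end("p")
print(builder.get_root())
# ['root', {}, ['p', {}, 'Hello ', ['strong', {}, 'world']]]
```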

There are several limitations to this parser. Most importantly, it does not backtrack: once a rule matches, it is not allowed to fail. Since we are dealing with wiki input, however, it is a reasonable assumption that the parser should always produce a representation of the text, even if that text is broken according to the grammar. Rules should therefore be made robust when implementing a wiki parser.
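Because a matched rule cannot fail, robustness has to be built into the patterns themselves. As a minimal illustration, the hypothetical heading rule below (`HEADING` and `parse_heading` are not part of this module) accepts a missing or unbalanced closing run of `=`, so broken input still yields a heading instead of an error.

```python
import re
from typing import Tuple

# Hypothetical "== Title ==" heading rule. The closing run of "=" is
# optional, so "== Title" and "=== Title =" are accepted as well.
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*=*[ \t]*$")


def parse_heading(line: str) -> Tuple[int, str]:
    """Return (level, title); the level comes from the opening run only."""
    match = HEADING.match(line)
    if match is None:
        raise ValueError("not a heading: %r" % line)
    return len(match.group(1)), match.group(2)


print(parse_heading("== Title =="))           # (2, 'Title')
print(parse_heading("== Title"))              # (2, 'Title')
print(parse_heading("=== Broken heading ="))  # (3, 'Broken heading')
```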

Another limitation comes from the use of regular expressions. There is a limit on the number of capturing groups a single regex can have (100 on my system), and since all rules in a set are compiled into one big expression, this can become an issue for more complex parser implementations. For a typical wiki implementation, however, this limit should be sufficient.
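Since every rule contributes at least one group to the combined expression, it can be worth counting them before compiling. A compiled pattern's `groups` attribute gives its number of capturing groups. The helpers `count_groups` and `check_group_budget` below are hypothetical; the default `max_groups=100` simply mirrors the figure quoted above, and the real limit depends on the Python version.

```python
import re
from typing import Sequence


def count_groups(patterns: Sequence[str]) -> int:
    """Compile the patterns as one alternation and count its capturing groups.

    Each pattern is wrapped in its own group, as a combined-rule parser
    would do, so the total is len(patterns) plus any groups inside them.
    """
    combined = "|".join("(%s)" % p for p in patterns)
    return re.compile(combined, re.U | re.M | re.X).groups


def check_group_budget(patterns: Sequence[str], max_groups: int = 100) -> None:
    """Hypothetical guard: fail early if the combined regex has too many groups."""
    n = count_groups(patterns)
    if n > max_groups:
        raise ValueError("combined regex has %d groups (limit %d)" % (n, max_groups))


rules = [r"\*\*(.+?)\*\*", r"//(.+?)//", r"\[\[(.+?)(?:\|(.+?))?\]\]"]
print(count_groups(rules))  # 7: three wrapper groups plus four inner groups
```

Non-capturing groups such as `(?:...)` do not count toward the total, so using them inside rule patterns is one way to stay within a budget.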

Note that the regexes are compiled using the flags re.U, re.M, and re.X. Because of re.X, any whitespace in an expression is ignored, so a literal space needs to be written as "\ ". In general you need to use the "r" string prefix to ensure that these backslashes make it through to the final expression.
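The effect of re.X on spaces is easy to check. In this sketch an unescaped space is dropped from the pattern, while an escaped space (`\ `) and a space inside a character class (`[ ]`) are both kept literally:

```python
import re

FLAGS = re.U | re.M | re.X

# Unescaped whitespace is ignored in verbose mode: this pattern means "foobar".
ignored = re.compile(r"foo bar", FLAGS)
# An escaped space is kept as a literal space.
escaped = re.compile(r"foo\ bar", FLAGS)
# Whitespace inside a character class is also kept literally.
in_class = re.compile(r"foo[ ]bar", FLAGS)

print(bool(ignored.match("foo bar")))   # False
print(bool(ignored.match("foobar")))    # True
print(bool(escaped.match("foo bar")))   # True
print(bool(in_class.match("foo bar")))  # True
```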

Class Builder: No summary
Class BuilderTextBuffer: Wrapper that buffers text going to a Builder object such that the last piece of text remains accessible for inspection and can be modified.
Class Parser: Parser class that matches multiple rules at once. It compiles the patterns of the various rules into a single regex and, based on the match, calls the correct rule for processing.
Class ParserError: Undocumented
Class Rule: No summary
Class SimpleTreeBuilder: Builder class that builds a tree of SimpleTreeElements
Class SimpleTreeElement: No class docstring; 0/2 instance variables, 0/1 class variables, 1/6 methods documented
Function convert_space_to_tab: Convert spaces to tabs
Function fix_unicode_chars: Fixes missing line end
Function get_line_count: Helper function used to report line numbers for exceptions that happen during parsing.
Variable logger: Undocumented
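The idea behind BuilderTextBuffer, as summarized above, is to hold back the most recent text instead of passing it straight to the builder, so a rule can still inspect or change it (for example, to trim trailing whitespace before a block element starts). The sketch below uses hypothetical names (`EventLog`, `ToyTextBuffer`, `get_text`, `set_text`, `flush`) and assumes a builder with `start`/`text`/`end` methods; the real class may work differently.

```python
from typing import Any, Dict, List, Optional


class EventLog:
    """Stand-in builder that just records the calls it receives."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def start(self, tag: str, attrib: Optional[Dict[str, str]] = None) -> None:
        self.events.append(("start", tag))

    def text(self, text: str) -> None:
        self.events.append(("text", text))

    def end(self, tag: str) -> None:
        self.events.append(("end", tag))


class ToyTextBuffer:
    """Hypothetical wrapper that delays text so it can still be edited."""

    def __init__(self, builder: Any) -> None:
        self.builder = builder
        self.buffer: List[str] = []

    def text(self, text: str) -> None:
        self.buffer.append(text)  # hold text back instead of forwarding it

    def get_text(self) -> str:
        return "".join(self.buffer)

    def set_text(self, text: str) -> None:
        self.buffer = [text] if text else []

    def flush(self) -> None:
        if self.buffer:
            self.builder.text(self.get_text())
            self.buffer = []

    def start(self, tag: str, attrib: Optional[Dict[str, str]] = None) -> None:
        self.flush()  # pending text goes out before the new element
        self.builder.start(tag, attrib)

    def end(self, tag: str) -> None:
        self.flush()
        self.builder.end(tag)


log = EventLog()
buf = ToyTextBuffer(log)
buf.text("Some text ")
buf.text("\n")
buf.set_text(buf.get_text().rstrip())  # inspect and trim before the block
buf.start("pre")
buf.text("code")
buf.end("pre")
print(log.events)
# [('text', 'Some text'), ('start', 'pre'), ('text', 'code'), ('end', 'pre')]
```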
def convert_space_to_tab(text, tabstop=4):
Convert spaces to tabs.
Parameters
text: the input text
tabstop: the number of spaces that represent one tab
Returns
the converted text
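The docstring does not say which spaces are converted. Assuming the common case of leading indentation, where each full run of `tabstop` spaces at the start of a line becomes one tab, a sketch could look like this (`indent_spaces_to_tabs` is a hypothetical name, not this module's function):

```python
import re


def indent_spaces_to_tabs(text: str, tabstop: int = 4) -> str:
    """Replace each full group of `tabstop` leading spaces with a tab.

    Assumption: only indentation at the start of a line is converted;
    leftover spaces (fewer than `tabstop`) and spaces after the first
    non-space character are left unchanged.
    """
    if tabstop < 1:
        raise ValueError("tabstop must be at least 1")
    # re.M makes "^" match at the start of every line.
    pattern = re.compile(r"^((?: {%d})+)" % tabstop, re.M)
    return pattern.sub(lambda m: "\t" * (len(m.group(1)) // tabstop), text)


print(repr(indent_spaces_to_tabs("        x\n  y\n    z  w")))
# '\t\tx\n  y\n\tz  w'
```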
def fix_unicode_chars(text):
Fixes missing line end.
Parameters
text: the input text
Returns
the fixed text
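The summary "Fixes missing line end" is terse. One plausible reading, offered here only as an assumption, is that Unicode line and paragraph separators (U+2028, U+2029) get normalized: `str.splitlines()` treats them as line breaks, but the `^` and `$` anchors of a regex compiled with re.M only recognize `\n`, so a line-based rule would miss those line ends. `normalize_line_separators` is a hypothetical name.

```python
import re

# Assumed mapping: both separators become a plain newline.
SEPARATORS = {"\u2028": "\n", "\u2029": "\n"}


def normalize_line_separators(text: str) -> str:
    """Hypothetical helper: turn Unicode line/paragraph separators into newlines."""
    for char, replacement in SEPARATORS.items():
        text = text.replace(char, replacement)
    return text


raw = "first\u2028second"
print(raw.splitlines())                 # ['first', 'second']
print(re.findall(r"^\w+$", raw, re.M))  # [] because "$" only sees "\n"
print(re.findall(r"^\w+$", normalize_line_separators(raw), re.M))
# ['first', 'second']
```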
def get_line_count(text, offset):
Helper function used to report line numbers for exceptions that happen during parsing.
Parameters
text: the text being parsed
offset: the character offset in this text
Returns
a 2-tuple of the line and column that corresponds to this offset
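Turning a character offset into a line and column only requires counting the newlines before the offset and finding the last one. A minimal sketch, where `line_and_column` is a hypothetical name and the convention (lines counted from 1, columns from 0) is an assumption, since the page does not state which convention get_line_count uses:

```python
from typing import Tuple


def line_and_column(text: str, offset: int) -> Tuple[int, int]:
    """Hypothetical equivalent of get_line_count.

    Assumed convention: lines are counted from 1, columns from 0.
    """
    if not 0 <= offset <= len(text):
        raise ValueError("offset %d out of range" % offset)
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when no newline precedes the offset,
    # which makes the column equal to the offset itself.
    column = offset - (text.rfind("\n", 0, offset) + 1)
    return line, column


source = "line one\nline two"
print(line_and_column(source, 0))   # (1, 0)
print(line_and_column(source, 14))  # (2, 5)
```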
logger

Undocumented