zim.parser

module documentation

Generic parser for wiki formats

This parser for wiki text (and similar formats) consists of two classes: the Rule class, whose objects each specify a single parser rule, and the Parser class, which takes a number of rules and parses a piece of text accordingly. The parser simply performs a series of regex matches and calls a method on the matching rule object to process each match. Recursion can be achieved by having a rule process its match with another Parser object.

All rules have access to a Builder object which is used to construct the resulting parse tree.
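The matching loop described above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: `SketchRule`, `parse_sketch`, and the list of tuples standing in for a Builder are hypothetical names chosen for illustration, and only the standard `re` module is used.

```python
import re


class SketchRule:
    """Hypothetical stand-in for a rule: a name, a regex and a handler."""

    def __init__(self, name, pattern, handler):
        self.name = name
        self.pattern = pattern
        self.handler = handler


def parse_sketch(rules, text):
    """Match all rules at once; a flat list stands in for the builder."""
    # One big alternation in which each rule gets its own named group.
    combined = re.compile(
        "|".join("(?P<%s>%s)" % (r.name, r.pattern) for r in rules),
        re.U | re.M | re.X,
    )
    by_name = {r.name: r for r in rules}
    out = []
    pos = 0
    for match in combined.finditer(text):
        if match.start() > pos:
            # Text between two matches is passed through unchanged.
            out.append(("text", text[pos:match.start()]))
        # The outermost named group that matched identifies the rule.
        rule = by_name[match.lastgroup]
        rule.handler(out, match.group())
        pos = match.end()
    if pos < len(text):
        out.append(("text", text[pos:]))
    return out


# Two illustrative rules; the markup syntax here is an assumption.
RULES = [
    SketchRule("strong", r"\*\*.+?\*\*",
               lambda out, s: out.append(("strong", s[2:-2]))),
    SketchRule("emphasis", r"//.+?//",
               lambda out, s: out.append(("emphasis", s[2:-2]))),
]

result = parse_sketch(RULES, "a **b** c //d//")
```

In this sketch a handler could itself call `parse_sketch` with a different rule set on the matched text, which is one way to picture the recursion mentioned above.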

There are several limitations to this parser. Most importantly, it does not backtrack, so once a rule matches it is not allowed to fail. Since we are dealing with wiki input, however, it is reasonable to require that the parser always produces a representation of the text, even if the text is broken according to the grammar. Rules should therefore be made robust when implementing a wiki parser.
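As an illustration of what "robust" can mean without backtracking, the sketch below shows a handler that falls back to emitting the raw text when its match turns out to be malformed, instead of raising an error. The `[[target|label]]` syntax, the function name `handle_link`, and the tuple output are assumptions made for this example and are not taken from the module.

```python
def handle_link(out, raw):
    """Hypothetical handler for a span that matched '[[...]]'."""
    inner = raw[2:-2]
    target, _, label = inner.partition("|")
    target = target.strip()
    if not target:
        # Malformed link: the match cannot be undone, so keep the raw text.
        out.append(("text", raw))
    else:
        # Use the target as the label when no label was given.
        out.append(("link", target, label.strip() or target))
    return out
```

Either way the input ends up in the output, so broken markup degrades to plain text rather than aborting the parse.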

Another limitation comes from the use of regular expressions. There is a limit on the number of capturing groups in a single regex (100 on my system), and since all rules in a set are compiled into one big expression, this can become an issue for more complex parser implementations. For a typical wiki implementation, however, this should be sufficient.
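To see how quickly groups add up, the following sketch counts the capturing groups of a combined expression using the `groups` attribute of a compiled pattern. The helper name `count_groups` and the way patterns are wrapped are assumptions for illustration; the module may combine its rules differently.

```python
import re


def count_groups(patterns, flags=re.U | re.M | re.X):
    """Count capturing groups after joining patterns into one alternation."""
    # Each pattern is wrapped in one capturing group of its own.
    combined = "|".join("(%s)" % p for p in patterns)
    return re.compile(combined, flags).groups


# An inner capturing group adds to the total ...
with_inner = count_groups([r"\*\*(.+?)\*\*", r"//.+?//"])
# ... while a non-capturing group "(?:...)" does not.
without_inner = count_groups([r"\*\*(?:.+?)\*\*", r"//.+?//"])
```

Using non-capturing groups inside rule patterns wherever the captured text is not needed is one way to stay well below such a limit.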

Note that the regexes are compiled using the flags re.U, re.M, and re.X. This means any whitespace in the expression is ignored, and a literal space needs to be written as "\ ". In general you need to use the "r" string prefix to ensure those backslashes make it through to the final expression.
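The effect of re.X on whitespace can be checked with a small sketch using only the standard `re` module. The patterns and the bullet-like sample strings are made up for illustration.

```python
import re

FLAGS = re.U | re.M | re.X

# Unescaped spaces are dropped under re.X, so this is really "^\*item".
loose = re.compile(r"^ \* item", FLAGS)

# Escaped spaces survive and must be present in the input.
strict = re.compile(r"^\ \*\ item", FLAGS)

# A character class is another way to keep a literal space under re.X.
bracketed = re.compile(r"^[ ]\*[ ]item", FLAGS)

loose_hits_compact = loose.match("*item") is not None
loose_hits_spaced = loose.match(" * item") is not None
strict_hits_spaced = strict.match(" * item") is not None
bracketed_hits_spaced = bracketed.match(" * item") is not None
```

Note also that under re.X an unescaped "#" outside a character class starts a comment, so it needs escaping in the same way when it is meant literally.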

Class	`Builder`	No summary
Class	`BuilderTextBuffer`	Wrapper that buffers text going to a `Builder` object such that the last piece of text remains accessible for inspection and can be modified.
Class	`Parser`	Parser class that matches multiple rules at once. It will compile the patterns of various rules into a single regex and based on the match call the correct rules for processing.
Class	`ParserError`	Undocumented
Class	`Rule`	No summary
Class	`SimpleTreeBuilder`	Builder class that builds a tree of `SimpleTreeElement`s
Class	`SimpleTreeElement`	No class docstring; 0/2 instance variable, 0/1 class variable, 1/6 method documented
Function	`convert_space_to_tab`	Convert spaces to tabs
Function	`fix_unicode_chars`	Fixes missing line end
Function	`get_line_count`	Helper function used to report line numbers for exceptions that happen during parsing.
Variable	`logger`	Undocumented

def convert_space_to_tab(text, tabstop=4): (source)

Convert spaces to tabs

Parameters
text	the input text
tabstop	the number of spaces to represent a tab
Returns
the fixed text
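The documentation does not say which spaces are converted. The sketch below shows one plausible reading, under the assumption that only leading indentation is affected and that each full run of `tabstop` spaces becomes one tab. The function name `spaces_to_tabs_sketch` is hypothetical, and this is not the module's own code.

```python
import re


def spaces_to_tabs_sketch(text, tabstop=4):
    """Replace each full group of `tabstop` leading spaces with a tab."""
    # Optional existing tabs, followed by one or more full groups of spaces,
    # anchored at the start of every line (re.M).
    pattern = re.compile(r"^(\t*)((?:%s)+)" % (" " * tabstop), re.M)
    return pattern.sub(
        lambda m: m.group(1) + "\t" * (len(m.group(2)) // tabstop),
        text,
    )


converted = spaces_to_tabs_sketch("    a\n        b\nc d")
```

Spaces inside a line, and leftover spaces that do not fill a whole tab stop, are left untouched in this sketch.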

def fix_unicode_chars(text): (source)

Fixes missing line end

Parameters
text	the input text
Returns
the fixed text

def get_line_count(text, offset): (source)

Helper function used to report line numbers for exceptions that happen during parsing.

Parameters
text	the text being parsed
offset	character offset in this text
Returns
a 2-tuple of the line and column that corresponds to this offset
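A helper of this kind can be sketched as follows. The numbering convention is an assumption, since it is not stated here: this sketch counts lines from 1 and columns from 0. The name `line_and_column_sketch` is hypothetical.

```python
def line_and_column_sketch(text, offset):
    """Map a character offset to (line, column); line 1-based, column 0-based."""
    before = text[:offset]
    # Every newline before the offset starts a new line.
    line = before.count("\n") + 1
    # rfind returns -1 when there is no newline, giving a line start of 0.
    column = offset - (before.rfind("\n") + 1)
    return line, column


position = line_and_column_sketch("ab\ncd", 4)
```

A parser could use such a pair to build an error message that points at the offending place in the input.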

logger = (source)

Undocumented