Generic parser for wiki formats
This parser for wiki text (and similar formats) consists of two classes: the Rule class, whose objects each specify a single parser rule, and the Parser class, which takes a number of rules and parses a piece of text accordingly. The parser simply performs a series of regex matches and calls a method on the matching rule object to process each match. Recursion can be achieved by having a rule process its match with another Parser object.
All rules have access to a Builder object which is used to construct the resulting parse tree.
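The matching scheme described above can be illustrated with a minimal sketch. The names `SketchRule`, `SketchParser`, `emit_strong`, and `emit_emphasis` are hypothetical and are not the API of this module, and a plain list of events stands in for the Builder object. The sketch only shows the general idea of compiling several rule patterns into one expression and dispatching on whichever group matched.

```python
import re


class SketchRule:
    """Hypothetical rule: a tag, a regex pattern, and a handler for matches."""

    def __init__(self, tag, pattern, process):
        self.tag = tag
        self.pattern = pattern
        self.process = process


class SketchParser:
    """Hypothetical parser that joins all rule patterns into one regex."""

    def __init__(self, *rules):
        self.rules = {rule.tag: rule for rule in rules}
        # One named group per rule, so the match tells us which rule fired
        combined = "|".join("(?P<%s>%s)" % (r.tag, r.pattern) for r in rules)
        self.regex = re.compile(combined, re.U | re.M | re.X)

    def parse(self, events, text):
        pos = 0
        for match in self.regex.finditer(text):
            if match.start() > pos:
                # Text between matches is passed through unchanged
                events.append(("text", text[pos:match.start()]))
            self.rules[match.lastgroup].process(events, match)
            pos = match.end()
        if pos < len(text):
            events.append(("text", text[pos:]))
        return events


def emit_strong(events, match):
    events.append(("strong", match.group("strong")[2:-2]))


def emit_emphasis(events, match):
    events.append(("emphasis", match.group("emphasis")[2:-2]))


parser = SketchParser(
    SketchRule("strong", r"\*\*.+?\*\*", emit_strong),
    SketchRule("emphasis", r"//.+?//", emit_emphasis),
)
events = parser.parse([], "some **bold** and //italic// text")
```

In this sketch a handler could achieve recursion by calling `parse` on a second `SketchParser` with the inner text of its match.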
There are several limitations to this parser. Most importantly, it does not backtrack, so once a rule matches it is not allowed to fail. Since we are dealing with wiki input, however, it is a reasonable assumption that the parser should always produce a representation of the text, even if the text is broken according to the grammar. Rules should therefore be made robust when implementing a wiki parser.
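As an illustration of what "robust" can mean here, the following sketch shows a hypothetical handler, `process_link`, for text that a rule has already matched as `[[...]]`. The function name, the event tuples, and the link syntax are assumptions made for this example only. Since the parser cannot back out of the match, the handler falls back to emitting the raw text when the markup is broken instead of raising an error.

```python
def process_link(events, raw):
    """Hypothetical handler for text already matched as [[...]]."""
    inner = raw[2:-2]
    href, _sep, label = inner.partition("|")
    href = href.strip()
    if not href:
        # Broken markup: keep the raw text rather than failing
        events.append(("text", raw))
    else:
        # Use the target as the label when no label is given
        events.append(("link", href, label.strip() or href))
    return events
```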
Another limitation comes from the fact that we use regular expressions. There is a limit on the number of capturing groups you can have in a single regex (100 on my system), and since all rules in a set are compiled into one big expression, this can become an issue for more complex parser implementations. For a typical wiki implementation, however, this should be sufficient.
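One way to keep an eye on this is to check how many capturing groups a combined expression ends up with, using the `groups` attribute of a compiled pattern. The patterns below are hypothetical examples, not the rules of any particular wiki format, and the way they are joined is an assumption about how a combined expression might be built.

```python
import re

# Hypothetical rule patterns; each contributes its own capturing groups
patterns = [
    r"\*\*(.+?)\*\*",
    r"//(.+?)//",
    r"\[\[(.+?)(\|(.+?))?\]\]",
]

# Wrap every pattern in one extra group and join them into one expression
combined = re.compile(
    "|".join("(%s)" % p for p in patterns),
    re.U | re.M | re.X,
)

# Total number of capturing groups in the combined expression
group_count = combined.groups
```

Here the count is the three wrapper groups plus the groups inside each pattern, so the total grows with every rule that is added to the set.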
Note that the regexes are compiled using the flags re.U, re.M, and re.X. This means any whitespace in the expression is ignored, and a literal space needs to be written as "\ ". In general you need to use the "r" string prefix to ensure those backslashes make it through to the final expression.
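A short sketch of the effect of these flags, using made-up patterns. It shows that unescaped whitespace is dropped under re.X, that an escaped space matches literally, and why the "r" prefix matters for backslash sequences such as "\b".

```python
import re

FLAGS = re.U | re.M | re.X

# The space in the pattern is ignored, so this matches "foobar"
ignored = re.compile(r"foo bar", FLAGS)

# The escaped space is kept, so this matches "foo bar"
escaped = re.compile(r"foo\ bar", FLAGS)

# With the r prefix, \b reaches the regex engine as a word boundary
raw_boundary = re.compile(r"\bwiki\b", FLAGS)

# Without it, Python turns "\b" into a backspace character first
cooked_boundary = re.compile("\bwiki\b", FLAGS)

ignored_hits_foobar = ignored.match("foobar") is not None
escaped_hits_foo_bar = escaped.match("foo bar") is not None
raw_found = raw_boundary.search("a wiki page") is not None
cooked_found = cooked_boundary.search("a wiki page") is not None
```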
| Class | Builder | No summary |
| Class | BuilderTextBuffer | Wrapper that buffers text going to a Builder object such that the last piece of text remains accessible for inspection and can be modified. |
| Class | Parser | Parser class that matches multiple rules at once. It will compile the patterns of various rules into a single regex and based on the match call the correct rules for processing. |
| Class | ParserError | Undocumented |
| Class | Rule | No summary |
| Class | SimpleTreeBuilder | Builder class that builds a tree of SimpleTreeElements |
| Class | SimpleTreeElement | No class docstring; 0/2 instance variable, 0/1 class variable, 1/6 method documented |
| Function | convert_space_to_tab | No summary |
| Function | fix_unicode_chars | Fixes missing line end @param text: the input text @returns: the fixed text |
| Function | get_line_count | No summary |
| Variable | logger | Undocumented |
| Parameters | |
| text | the input text |
| tabstop | the number of spaces to represent a tab |
| Returns | |
| the fixed text | |