Generic parser for wiki formats
This parser for wiki text (and similar formats) consists of two classes: the Rule class, whose objects each specify a single parser rule, and the Parser class, which takes a number of rules and parses a piece of text accordingly. The parser performs a series of regex matches and calls a method on the matching rule object to process each match. Recursion can be achieved by having a rule process its match with another Parser object.
All rules have access to a Builder object, which is used to construct the resulting parse tree.
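The matching scheme described above can be illustrated with a minimal sketch. It is not the module's actual implementation: ToyRule, ToyParser, and the use of a plain list as the builder are hypothetical stand-ins, and the sketch assumes every rule pattern contains exactly one capturing group so that the group index identifies the rule.

```python
import re


class ToyRule:
    """Hypothetical rule: a tag plus a pattern with exactly one capturing group."""

    def __init__(self, tag, pattern):
        self.tag = tag
        self.pattern = pattern

    def process(self, builder, text):
        # A real rule could hand the text to another parser here (recursion)
        builder.append((self.tag, text))


class ToyParser:
    """Hypothetical parser: joins all rule patterns into one regex."""

    def __init__(self, *rules):
        self.rules = rules
        self.regex = re.compile(
            '|'.join(rule.pattern for rule in rules),
            re.U | re.M | re.X,
        )

    def parse(self, builder, text):
        pos = 0
        for match in self.regex.finditer(text):
            if match.start() > pos:
                # Text between matches is passed through unchanged
                builder.append(('text', text[pos:match.start()]))
            # One group per rule, so lastindex tells us which rule matched
            rule = self.rules[match.lastindex - 1]
            rule.process(builder, match.group(match.lastindex))
            pos = match.end()
        if pos < len(text):
            builder.append(('text', text[pos:]))


builder = []  # stand-in for a Builder object
parser = ToyParser(
    ToyRule('bold', r'\*\*(.+?)\*\*'),
    ToyRule('italic', r'//(.+?)//'),
)
parser.parse(builder, 'plain **bold** and //italic// text')
print(builder)
```

Using a list as the builder keeps the sketch short; a tree builder would receive start, text, and end events instead of tuples.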
There are several limitations to this parser. Most importantly, it does not have backtracking, so once a rule matches it is not allowed to fail. Since we are dealing with wiki input, it is a reasonable assumption that the parser should always produce a representation of the text, even if the text is broken according to the grammar. Rules should therefore be made robust when implementing a wiki parser.
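As an illustration of what "robust" can mean here, the following sketch shows a rule handler that falls back to plain text instead of failing on malformed input. The function name process_link, the link syntax, and the list used as a builder are assumptions made for this example only.

```python
def process_link(builder, content):
    """Hypothetical handler for the text captured between '[[' and ']]'."""
    target = content.strip()
    if target:
        builder.append(('link', target))
    else:
        # Malformed link: keep the raw markup as text rather than failing,
        # because the parser cannot backtrack and try another rule
        builder.append(('text', '[[' + content + ']]'))


builder = []  # stand-in for a Builder object
process_link(builder, 'Home')
process_link(builder, '  ')
print(builder)
```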
Another limitation comes from the fact that we use regular expressions. There is a limit on the number of capturing groups you can have in a single regex (100 on my system), and since all rules in a set are compiled into one big expression, this can become an issue for more complex parser implementations. However, for a typical wiki implementation this should be sufficient.
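One way to keep an eye on this is to count the capturing groups of the combined expression, which the standard re module exposes as the groups attribute of a compiled pattern. The sketch below is only an illustration: the helper count_groups, the sample patterns, and the MAX_GROUPS budget (taken from the figure quoted above) are assumptions, not part of this module.

```python
import re

MAX_GROUPS = 100  # assumed budget, taken from the figure mentioned above


def count_groups(patterns):
    """Return the number of capturing groups in the combined expression."""
    combined = '|'.join('(?:%s)' % p for p in patterns)
    return re.compile(combined, re.U | re.M | re.X).groups


patterns = [
    r'\*\*(.+?)\*\*',        # one group
    r'//(.+?)//',            # one group
    r'\[\[(.+?)\|(.+?)\]\]', # two groups
]
total = count_groups(patterns)
print(total, total <= MAX_GROUPS)
```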
Note that the regexes are compiled using the flags re.U, re.M, and re.X. This means any whitespace in the expression is ignored, and a literal space needs to be written as "\ ". In general you need to use the "r" string prefix to ensure those backslashes make it through to the final expression.
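The effect of re.X on spaces is easy to overlook, so here is a short sketch using only the standard re module. The heading-like patterns are made up for the example and are not taken from any rule set in this module.

```python
import re

FLAGS = re.U | re.M | re.X

# Unescaped spaces are dropped under re.X: this is really "==(.+?)=="
loose = re.compile(r'== (.+?) ==', FLAGS)

# Escaped spaces are kept as literal spaces
strict = re.compile(r'==\ (.+?)\ ==', FLAGS)

loose_group = loose.match('== Title ==').group(1)    # spaces end up in the group
strict_group = strict.match('== Title ==').group(1)  # spaces are consumed by the pattern
strict_miss = strict.match('==Title==')              # no spaces in input, so no match

print(repr(loose_group), repr(strict_group), strict_miss)
```

Without the "r" prefix, Python would interpret "\ " as an unrecognized string escape; the raw string passes the backslash through to the regex engine unchanged.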
Class | Builder |
No summary |
Class | BuilderTextBuffer |
Wrapper that buffers text going to a Builder object such that the last piece of text remains accessible for inspection and can be modified. |
Class | Parser |
Parser class that matches multiple rules at once. It will compile the patterns of various rules into a single regex and based on the match call the correct rules for processing. |
Class | ParserError |
Undocumented |
Class | Rule |
No summary |
Class | SimpleTreeBuilder |
Builder class that builds a tree of SimpleTreeElement objects |
Class | SimpleTreeElement |
No class docstring; 0/2 instance variable, 0/1 class variable, 1/6 method documented |
Function | convert_space_to_tab |
No summary |
Function | fix_unicode_chars |
Fixes missing line end. Takes the input text and returns the fixed text. |
Function | get_line_count |
No summary |
Variable | logger |
Undocumented |
Parameters | |
text | the input text |
tabstop | the number of spaces to represent a tab |
Returns | |
the fixed text |