Package with source formats for pages.
Each module in zim.formats should contain exactly one subclass of ParserClass (optional for export formats) and exactly one subclass of DumperClass. These can be loaded by get_parser() and get_dumper() respectively. The requirement to have exactly one subclass per module means you cannot import other classes that derive from these base classes directly into the module.
For format modules it is safe to import '*' from this module.
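To illustrate why the "exactly one subclass per module" rule matters, here is a minimal sketch of how a loader could locate the parser and dumper class in a format module. It does not reproduce the actual get_parser() or get_dumper() code: the ParserClass and DumperClass below are empty stand-ins, and find_single_subclass and fake_format are hypothetical names used only for this example.

```python
import types


class ParserClass:
    """Stand-in for the zim.formats base class (hypothetical, illustration only)."""


class DumperClass:
    """Stand-in for the zim.formats base class (hypothetical, illustration only)."""


def find_single_subclass(module, base):
    """Return the only subclass of `base` found in `module`, or None.

    Raises ValueError when more than one subclass is present, which is the
    ambiguity that the one-subclass-per-module rule prevents.
    """
    found = [
        obj for obj in vars(module).values()
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base
    ]
    if len(found) > 1:
        names = sorted(cls.__name__ for cls in found)
        raise ValueError("ambiguous subclasses of %s: %s" % (base.__name__, names))
    return found[0] if found else None


# Build a fake format module in memory instead of importing a real one.
fake_format = types.ModuleType("fake_format")
fake_format.ParserClass = ParserClass  # the base classes themselves are ignored
fake_format.DumperClass = DumperClass
fake_format.Parser = type("Parser", (ParserClass,), {})
fake_format.Dumper = type("Dumper", (DumperClass,), {})

parser_cls = find_single_subclass(fake_format, ParserClass)
dumper_cls = find_single_subclass(fake_format, DumperClass)
```

Under this sketch, importing a second ParserClass subclass into the module would make the lookup ambiguous, while importing only the base classes (as with a '*' import from the package) stays safe.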
Parse trees are built using the (c)ElementTree module (included in Python since 2.5 as xml.etree.ElementTree). A parse tree is basically an XML structure supporting a subset of HTML-like tags.
Supported tags:
page for the root element that groups paragraphs
p for paragraphs
h for heading, level attribute can be 1..6
pre for verbatim paragraphs (no further parsing in these blocks)
em for emphasis, rendered italic by default
strong for strong emphasis, rendered bold by default
mark for highlighted text, rendered with background color or underlined
strike for text that is removed, usually rendered as strike through
code for inline verbatim text
ul for bullet and checkbox lists
ol for numbered lists
li for list items
link for links, attribute href gives the target
img for images, attributes src, width, height and optionally href and alt
table for tables, attributes:
* aligns - comma separated values: right,left,center
* wraps - 0 for not wrapped, 1 for auto-wrapped line display
thead for table header row
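As a rough illustration of the tags listed above, the following sketch builds a small tree directly with xml.etree.ElementTree. It only uses tag and attribute names from the list; the page content and the link target are made up for the example, and real trees produced by the parsers may carry additional attributes that are not shown here.

```python
import xml.etree.ElementTree as ET

# Root element that groups all blocks of the page.
page = ET.Element("page")

# Heading with the documented "level" attribute (1..6).
heading = ET.SubElement(page, "h", {"level": "1"})
heading.text = "Shopping\n"

# Paragraph mixing plain text with inline "strong" and "link" elements.
para = ET.SubElement(page, "p")
para.text = "Buy "
strong = ET.SubElement(para, "strong")
strong.text = "fresh"
strong.tail = " bread, see "
link = ET.SubElement(para, "link", {"href": "Recipes:Bread"})  # made-up target
link.text = "the recipe"
link.tail = ".\n"

# Bullet list with two items.
bullets = ET.SubElement(page, "ul")
for label in ("flour", "yeast"):
    item = ET.SubElement(bullets, "li")
    item.text = label + "\n"

# Serialize to a string to inspect the resulting XML structure.
xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)
```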
Nesting rules:
Unlike HTML, we respect line breaks and other whitespace as is. When rendering as HTML, use the "white-space: pre" CSS definition to get the same effect.
Text blocks (paragraphs, list items, headings, verbatim blocks) must end with a newline. Only the last block of the sequence can omit the newline. This case will be interpreted as a text snippet and affects copy-paste behavior.
Tables and other objects that are not inline are implicitly handled as ending in a newline.
As a result, the newlines outside blocks represent the number of empty lines between the blocks, and the newline ending a block is contained in the block itself.
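The newline rules can be checked mechanically. The sketch below assumes that newlines between blocks are stored as the tail text of the preceding element, which is how plain ElementTree would represent them; whether zim stores them exactly this way is an assumption. The helper names empty_lines_after and unterminated_blocks are hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level tags treated as text blocks in this sketch (an assumption);
# list items are nested inside ul/ol and are not checked here.
TEXT_BLOCKS = {"p", "h", "pre"}


def empty_lines_after(block):
    """Count the newlines stored outside (after) a block."""
    return (block.tail or "").count("\n")


def unterminated_blocks(page):
    """Return top-level text blocks whose own content lacks a final newline."""
    return [
        block for block in page
        if block.tag in TEXT_BLOCKS
        and not "".join(block.itertext()).endswith("\n")
    ]


page = ET.Element("page")

heading = ET.SubElement(page, "h", {"level": "1"})
heading.text = "Title\n"  # the newline ending the block is inside the block
heading.tail = "\n"       # one empty line between heading and paragraph

para = ET.SubElement(page, "p")
para.text = "A snippet without a final newline"  # only valid for the last block

missing = unterminated_blocks(page)
# Only the last block may omit its newline; that marks the tree as a snippet.
is_snippet = missing == [page[-1]]
```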
If a page starts with an h1, this heading is considered the page title; otherwise we can fall back to the page name as title.
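The title rule can be sketched as a small function over an ElementTree structure. This is only an illustration: page_title is a hypothetical helper and not part of the documented API, and it assumes the heading level is stored in the "level" attribute as described in the tag list.

```python
import xml.etree.ElementTree as ET


def page_title(page, page_name):
    """Return the text of a leading level-1 heading, else the page name."""
    first = page[0] if len(page) else None
    if first is not None and first.tag == "h" and str(first.get("level")) == "1":
        # Drop the newline that terminates the heading block.
        return "".join(first.itertext()).strip()
    return page_name


with_heading = ET.fromstring('<page><h level="1">My Title\n</h><p>Text\n</p></page>')
without_heading = ET.fromstring("<page><p>Text\n</p></page>")

title_a = page_title(with_heading, "Notes:Example")
title_b = page_title(without_heading, "Notes:Example")
```

Here title_a is taken from the heading, while title_b falls back to the page name passed in by the caller.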
NOTE: To avoid confusion: "headers" refers to metadata, usually in the form of rfc822 headers at the top of a page, while "heading" refers to a title or subtitle in the document.
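To make the distinction concrete, the sketch below separates rfc822-style headers from the page body using the standard library email parser. It does not use parse_header_lines() from this package, and the sample header names and values are illustrative only.

```python
from email.parser import HeaderParser

# Illustrative page source: headers (metadata), a blank line, then the body.
text = (
    "Content-Type: text/x-zim-wiki\n"
    "Wiki-Format: zim 0.6\n"
    "\n"
    "====== Heading ======\n"
    "Body text\n"
)

# HeaderParser parses only the header block and keeps the rest as payload.
message = HeaderParser().parsestr(text)
headers = dict(message.items())  # the "headers": metadata key/value pairs
body = message.get_payload()     # the document text, where a "heading" lives
```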
Module | html |
This module supports dumping to HTML |
Module | latex |
This module handles export of LaTeX code |
Module | markdown |
This module handles dumping markdown text with pandoc extensions |
Module | plain |
This module handles parsing and dumping input in plain text |
Module | rst |
This module handles dumping reStructuredText with sphinx extensions |
Module | wiki |
This module handles parsing and dumping wiki text |
Module | __main__ |
No summary |
From __init__.py:
Class | BaseLinker |
No summary |
Class | DocumentFragment |
Document fragment class for DOM-like access |
Class | DumperClass |
No summary |
Class | Element |
Element class for DOM-like access |
Class | Node |
Base class for DOM-like access to the document structure. Note: This class is not optimized for keeping large structures in memory. |
Class | OldParseTreeBuilder |
No summary |
Class | ParserClass |
Base class for parsers |
Class | ParseTree |
Wrapper for zim parse trees. |
Class | ParseTreeBuilder |
Builder object that builds a ParseTree |
Class | StubLinker |
Linker used for testing - just gives back the link as it was parsed. DO NOT USE outside of testing. |
Class | TableParser |
Common functions for converting a table from its XML structure to another format |
Class | Visitor |
Conceptual opposite of a builder, but with the same API. Used to walk nodes in a parse tree and call callbacks for each node. See e.g. ParseTree.visit(). |
Class | VisitorSkip |
Exception to be raised when the visitor should skip a leaf node and not descend into it. |
Class | VisitorStop |
Exception to be raised to cancel a visitor action |
Function | canonical_name |
Undocumented |
Function | convert_list_iter_letter_to_number |
No summary |
Function | dump_header_lines |
Return text representation of header dict |
Function | encode_xml |
Encode text such that it can be used in XML. Parameter text: label text as string. Returns: encoded text. |
Function | get_dumper |
Returns a dumper object instance for a specific format |
Function | get_format |
Returns the module object for a specific format. |
Function | get_format_module |
Returns the module object for a specific format |
Function | get_parser |
Returns a parser object instance for a specific format |
Function | heading_to_anchor |
Derive an anchor name from a heading |
Function | increase_list_iter |
No summary |
Function | list_formats |
Undocumented |
Function | parse_header_lines |
Read header lines in the rfc822 format. Can e.g. look like: |
Constant | ANCHOR |
Undocumented |
Constant | BLOCK |
Undocumented |
Constant | BLOCK_LEVEL |
Undocumented |
Constant | BULLET |
Undocumented |
Constant | BULLETLIST |
Undocumented |
Constant | CHECKED_BOX |
Undocumented |
Constant | EMPHASIS |
Undocumented |
Constant | EXPORT_FORMAT |
Undocumented |
Constant | FORMATTEDTEXT |
Undocumented |
Constant | FRAGMENT |
Undocumented |
Constant | HEADDATA |
Undocumented |
Constant | HEADING |
Undocumented |
Constant | HEADROW |
Undocumented |
Constant | IMAGE |
Undocumented |
Constant | IMPORT_FORMAT |
Undocumented |
Constant | LINE |
Undocumented |
Constant | LINK |
Undocumented |
Constant | LISTITEM |
Undocumented |
Constant | MARK |
Undocumented |
Constant | MIGRATED_BOX |
Undocumented |
Constant | NATIVE_FORMAT |
Undocumented |
Constant | NUMBEREDLIST |
Undocumented |
Constant | OBJECT |
Undocumented |
Constant | PARAGRAPH |
Undocumented |
Constant | STRIKE |
Undocumented |
Constant | STRONG |
Undocumented |
Constant | SUBSCRIPT |
Undocumented |
Constant | SUPERSCRIPT |
Undocumented |
Constant | TABLE |
Undocumented |
Constant | TABLEDATA |
Undocumented |
Constant | TABLEROW |
Undocumented |
Constant | TAG |
Undocumented |
Constant | TEXT_FORMAT |
Undocumented |
Constant | TRANSMIGRATED_BOX |
Undocumented |
Constant | UNCHECKED_BOX |
Undocumented |
Constant | VERBATIM |
Undocumented |
Constant | VERBATIM_BLOCK |
Undocumented |
Constant | XCHECKED_BOX |
Undocumented |
Variable | count_eol_re |
Undocumented |
Variable | DumperContextElement |
Undocumented |
Variable | logger |
Undocumented |
Variable | Pango |
Undocumented |
Variable | split_para_re |
Undocumented |
Variable | _aliases |
Undocumented |
Variable | _is_continue_re |
Undocumented |
Variable | _is_header_re |
Undocumented |
Variable | _letters |
Undocumented |
Function | increase_list_iter |
Parameters | |
listiter | the current item, either an integer number or single letter |
Returns | |
the next item, or None |
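The listiter contract described above (a number or a single letter in, the next item or None out) can be sketched as follows. This is an independent illustration, not the package's own implementation: next_list_iter is a hypothetical name, and returning None after "z" or "Z" is an assumption about the end-of-alphabet case.

```python
import string


def next_list_iter(listiter):
    """Return the item following `listiter` in a numbered list, or None."""
    text = str(listiter)
    try:
        # Numeric iterators simply count up.
        return str(int(text) + 1)
    except ValueError:
        pass
    if len(text) == 1:
        for letters in (string.ascii_lowercase, string.ascii_uppercase):
            index = letters.find(text)
            if index >= 0:
                # Assumption: there is no next item after the last letter.
                return letters[index + 1] if index + 1 < len(letters) else None
    return None  # not a number and not a single ASCII letter


examples = {item: next_list_iter(item) for item in ("1", "9", "a", "B", "z", "?")}
```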
Function | get_parser |
Parameters | |
name | format name |
*arg | arguments to pass to the parser object |
**kwarg | keyword arguments to pass to the parser object |
Returns | |
parser object instance (subclass of ParserClass ) |
Function | get_dumper |
Parameters | |
name | format name |
*arg | arguments to pass to the dumper object |
**kwarg | keyword arguments to pass to the dumper object |
Returns | |
dumper object instance (subclass of DumperClass ) |