package documentation

Package with source formats for pages.

Each module in zim.formats should contain exactly one subclass of DumperClass and exactly one subclass of ParserClass (optional for export formats). These can be loaded by get_dumper() and get_parser() respectively. The requirement to have exactly one subclass per module means you cannot import other classes that derive from these base classes directly into the module.
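The one-subclass-per-module rule can be illustrated with a small lookup. This is only a sketch of the idea, not the code zim uses: find_single_subclass is a hypothetical helper, and the two base classes below are empty stand-ins for the real ones.

```python
import inspect
import types


class ParserClass:
    """Stand-in for the real base class (assumption for this sketch)."""


class DumperClass:
    """Stand-in for the real base class (assumption for this sketch)."""


def find_single_subclass(module, base):
    """Return the one class in `module` that derives from `base`.

    Hypothetical helper: returns None when the module has no such
    subclass and raises ValueError when it has more than one.
    """
    found = [
        obj for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    ]
    if len(found) > 1:
        raise ValueError(
            'expected one subclass of %s, found %d' % (base.__name__, len(found))
        )
    return found[0] if found else None


# Build a throwaway module object that mimics an export-only format module.
demo = types.ModuleType('demo_format')


class Dumper(DumperClass):
    pass


demo.DumperClass = DumperClass  # the base class itself is ignored by the lookup
demo.Dumper = Dumper

dumper_cls = find_single_subclass(demo, DumperClass)
parser_cls = find_single_subclass(demo, ParserClass)  # no parser defined
```

Importing a second DumperClass subclass into the same module namespace would make a lookup like this ambiguous, which is the situation the rule above is meant to prevent.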

For format modules it is safe to import '*' from this module.

Parse tree structure

Parse trees are built using the (c)ElementTree module (included in Python since 2.5 as xml.etree.ElementTree). It is basically an XML structure supporting a subset of "html like" tags.

Supported tags:

  • page root element for grouping paragraphs

  • p for paragraphs

  • h for heading, level attribute can be 1..6

  • pre for verbatim paragraphs (no further parsing in these blocks)

  • em for emphasis, rendered italic by default

  • strong for strong emphasis, rendered bold by default

  • mark for highlighted text, rendered with background color or underlined

  • strike for text that is removed, usually rendered as strike through

  • code for inline verbatim text

  • ul for bullet and checkbox lists

  • ol for numbered lists

  • li for list items

  • link for links, attribute href gives the target

  • img for images, attributes src, width, height and optionally href and alt

    • type can be used to control plugin functionality, e.g. type=equation

  • table for tables, with attributes:

    • aligns - comma separated values: right,left,center
    • wraps - 0 for not wrapped, 1 for auto-wrapped line display

  • thead for table header row (inside table)

    • th for table header cell

  • trow for table row (inside table)

    • td for table data cell

Nesting rules:

  • paragraphs, list items, table cells & headings can contain all inline elements
  • inline formats can contain other inline formats as well as links and tags
  • code and pre cannot contain any other elements
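As a rough illustration of the tags and nesting rules above, the following sketch builds a tiny tree with the standard library's xml.etree.ElementTree directly. It does not use zim's own builder classes; the tag names are taken from the list above, and the page content and URL are made up for the example.

```python
import xml.etree.ElementTree as ET

# Root element that groups the blocks (tag name as given in the list above).
root = ET.Element('page')

# A level-1 heading; attribute values must be strings for serialization.
heading = ET.SubElement(root, 'h', {'level': '1'})
heading.text = 'Shopping\n'

# A paragraph that mixes plain text with inline elements.
para = ET.SubElement(root, 'p')
para.text = 'Buy '
strong = ET.SubElement(para, 'strong')
strong.text = 'fresh'
strong.tail = ' bread at '
link = ET.SubElement(para, 'link', {'href': 'https://example.org'})
link.text = 'the bakery'
link.tail = '\n'

xml_text = ET.tostring(root, encoding='unicode')
plain_text = ''.join(root.itertext())
```

Text that follows an inline element is stored in that element's tail, which is how ElementTree represents mixed content.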

Unlike html we respect line breaks and other whitespace as is. When rendering as html use the "white-space: pre" CSS definition to get the same effect.

Text blocks (paragraphs, list items, headings, verbatim blocks) must end with a newline. Only the last block of the sequence can omit the newline. This case will be interpreted as a text snippet and affects copy-paste behavior.

Tables and other objects that are not inline are implicitly handled as ending in a newline.

As a result, the newlines outside blocks represent the number of empty lines between the blocks, and the newline ending a block is contained in the block.
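A minimal sketch of this newline convention, again using plain xml.etree.ElementTree rather than zim's own classes (the paragraph texts are invented): the newline that ends a block is part of the block's text, and the tail after the block holds one newline per empty line that follows.

```python
import xml.etree.ElementTree as ET

root = ET.Element('page')

first = ET.SubElement(root, 'p')
first.text = 'First paragraph\n'  # the newline ending the block is inside the block
first.tail = '\n'                 # one newline outside the block: one empty line

second = ET.SubElement(root, 'p')
second.text = 'Second paragraph\n'

# Joining all text in document order reproduces the original layout.
plain = ''.join(root.itertext())
empty_lines_between = first.tail.count('\n')
```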

If a page starts with an h1, this heading is considered the page title; otherwise we can fall back to the page name as the title.
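The title rule could be sketched as follows. page_title is a hypothetical helper written for this example, not part of the package, and it assumes the heading is the first child of the root element with its level stored as the string '1'.

```python
import xml.etree.ElementTree as ET


def page_title(root, page_name):
    """Return the text of a leading level-1 heading, else `page_name`."""
    first = next(iter(root), None)  # first child element, if any
    if first is not None and first.tag == 'h' and first.get('level') == '1':
        return ''.join(first.itertext()).strip()
    return page_name


with_heading = ET.fromstring(
    '<page><h level="1">Project plan\n</h><p>Text\n</p></page>'
)
without_heading = ET.fromstring('<page><p>Text\n</p></page>')

title_a = page_title(with_heading, 'Home:Plan')
title_b = page_title(without_heading, 'Home:Plan')
```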

NOTE: To avoid confusion: "headers" refers to meta data, usually in the form of rfc822 headers at the top of a page. But "heading" refers to a title or subtitle in the document.

Module html This module supports dumping to HTML
Module latex This module handles export of LaTeX code
Module markdown This module handles dumping markdown text with pandoc extensions
Module plain This module handles parsing and dumping input in plain text
Module rst This module handles dumping reStructuredText with sphinx extensions
Module wiki This module handles parsing and dumping wiki text
Module __main__ No summary

From __init__.py:

Class BaseLinker No summary
Class DocumentFragment Document fragment class for DOM-like access
Class DumperClass No summary
Class Element Element class for DOM-like access
Class Node Base class for DOM-like access to the document structure. Note: this class is not optimized for keeping large structures in memory.
Class OldParseTreeBuilder No summary
Class ParserClass Base class for parsers
Class ParseTree Wrapper for zim parse trees.
Class ParseTreeBuilder Builder object that builds a ParseTree
Class StubLinker Linker used for testing - just gives back the link as it was parsed. DO NOT USE outside of testing.
Class TableParser Common functions for converting a table from its XML structure to another format
Class Visitor Conceptual opposite of a builder, but with same API. Used to walk nodes in a parsetree and call callbacks for each node. See e.g. ParseTree.visit().
Class VisitorSkip Exception to be raised when the visitor should skip a leaf node and not descend into it.
Class VisitorStop Exception to be raised to cancel a visitor action
Function canonical_name Undocumented
Function convert_list_iter_letter_to_number No summary
Function dump_header_lines Return text representation of header dict
Function encode_xml Encode text such that it can be used in xml
Function get_dumper Returns a dumper object instance for a specific format
Function get_format Returns the module object for a specific format.
Function get_format_module Returns the module object for a specific format
Function get_parser Returns a parser object instance for a specific format
Function heading_to_anchor Derive an anchor name from a heading
Function increase_list_iter No summary
Function list_formats Undocumented
Function parse_header_lines Read header lines in the rfc822 format.
Constant ANCHOR Undocumented
Constant BLOCK Undocumented
Constant BLOCK_LEVEL Undocumented
Constant BULLET Undocumented
Constant BULLETLIST Undocumented
Constant CHECKED_BOX Undocumented
Constant EMPHASIS Undocumented
Constant EXPORT_FORMAT Undocumented
Constant FORMATTEDTEXT Undocumented
Constant FRAGMENT Undocumented
Constant HEADDATA Undocumented
Constant HEADING Undocumented
Constant HEADROW Undocumented
Constant IMAGE Undocumented
Constant IMPORT_FORMAT Undocumented
Constant LINE Undocumented
Constant LINK Undocumented
Constant LISTITEM Undocumented
Constant MARK Undocumented
Constant MIGRATED_BOX Undocumented
Constant NATIVE_FORMAT Undocumented
Constant NUMBEREDLIST Undocumented
Constant OBJECT Undocumented
Constant PARAGRAPH Undocumented
Constant STRIKE Undocumented
Constant STRONG Undocumented
Constant SUBSCRIPT Undocumented
Constant SUPERSCRIPT Undocumented
Constant TABLE Undocumented
Constant TABLEDATA Undocumented
Constant TABLEROW Undocumented
Constant TAG Undocumented
Constant TEXT_FORMAT Undocumented
Constant TRANSMIGRATED_BOX Undocumented
Constant UNCHECKED_BOX Undocumented
Constant VERBATIM Undocumented
Constant VERBATIM_BLOCK Undocumented
Constant XCHECKED_BOX Undocumented
Variable count_eol_re Undocumented
Variable DumperContextElement Undocumented
Variable logger Undocumented
Variable Pango Undocumented
Variable split_para_re Undocumented
Variable _aliases Undocumented
Variable _is_continue_re Undocumented
Variable _is_header_re Undocumented
Variable _letters Undocumented
logger = (source)

Undocumented

Pango = (source)

Undocumented

EXPORT_FORMAT: int = (source)

Undocumented

Value
1
IMPORT_FORMAT: int = (source)

Undocumented

Value
2
NATIVE_FORMAT: int = (source)

Undocumented

Value
4
TEXT_FORMAT: int = (source)

Undocumented

Value
8
UNCHECKED_BOX: str = (source)

Undocumented

Value
'unchecked-box'
CHECKED_BOX: str = (source)

Undocumented

Value
'checked-box'
XCHECKED_BOX: str = (source)

Undocumented

Value
'xchecked-box'
MIGRATED_BOX: str = (source)

Undocumented

Value
'migrated-box'
TRANSMIGRATED_BOX: str = (source)

Undocumented

Value
'transmigrated-box'
BULLET: str = (source)

Undocumented

Value
'*'
FORMATTEDTEXT: str = (source)

Undocumented

Value
'zim-tree'
FRAGMENT: str = (source)

Undocumented

Value
'zim-tree'
HEADING: str = (source)

Undocumented

Value
'h'
PARAGRAPH: str = (source)

Undocumented

Value
'p'
VERBATIM_BLOCK: str = (source)

Undocumented

Value
'pre'
BLOCK: str = (source)

Undocumented

Value
'div'
IMAGE: str = (source)

Undocumented

Value
'img'
OBJECT: str = (source)

Undocumented

Value
'object'
BULLETLIST: str = (source)

Undocumented

Value
'ul'
NUMBEREDLIST: str = (source)

Undocumented

Value
'ol'
LISTITEM: str = (source)

Undocumented

Value
'li'
EMPHASIS: str = (source)

Undocumented

Value
'emphasis'
STRONG: str = (source)

Undocumented

Value
'strong'
MARK: str = (source)

Undocumented

Value
'mark'
VERBATIM: str = (source)

Undocumented

Value
'code'
STRIKE: str = (source)

Undocumented

Value
'strike'
SUBSCRIPT: str = (source)

Undocumented

Value
'sub'
SUPERSCRIPT: str = (source)

Undocumented

Value
'sup'
LINK: str = (source)

Undocumented

Value
'link'
TAG: str = (source)

Undocumented

Value
'tag'
ANCHOR: str = (source)

Undocumented

Value
'anchor'
TABLE: str = (source)

Undocumented

Value
'table'
HEADROW: str = (source)

Undocumented

Value
'thead'
HEADDATA: str = (source)

Undocumented

Value
'th'
TABLEROW: str = (source)

Undocumented

Value
'trow'
TABLEDATA: str = (source)

Undocumented

Value
'td'
LINE: str = (source)

Undocumented

Value
'line'
BLOCK_LEVEL = (source)

Undocumented

Value
(PARAGRAPH, HEADING, VERBATIM_BLOCK, BLOCK, LISTITEM)
_letters: str = (source)

Undocumented

def increase_list_iter(listiter): (source)
Get the next item in a list for a numbered list. E.g. if listiter is "1" this function returns "2", if it is "a" it returns "b".
Parameters
listiter: the current item, either an integer number or single letter
Returns
the next item, or None
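The documented behavior can be approximated with the sketch below. next_list_iter is a hypothetical re-implementation for illustration only; in particular, what the real function returns after the last letter of the alphabet is not stated here, so returning None in that case is an assumption.

```python
import string


def next_list_iter(listiter):
    """Return the item following `listiter` in a numbered list, or None."""
    try:
        return str(int(listiter) + 1)  # "1" -> "2"
    except ValueError:
        pass  # not a number, try letters below

    letters = string.ascii_lowercase if listiter.islower() else string.ascii_uppercase
    index = letters.find(listiter)
    if len(listiter) != 1 or index < 0 or index + 1 >= len(letters):
        # Assumption: no successor for the last letter or for invalid input.
        return None
    return letters[index + 1]  # "a" -> "b"


examples = [next_list_iter(item) for item in ('1', 'a', 'B', '?')]
```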
def convert_list_iter_letter_to_number(listiter): (source)
Convert a "letter" numbered list to a digit numbered list. Useful for export to formats that do not support letter lists. Both "A." and "a." convert to "1.". The assumption is that this function is used for the start iter only, not the whole list.
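A possible shape for such a conversion is sketched here. letter_to_number is a hypothetical stand-in, and it assumes the iter is passed as a bare letter or number without the trailing dot; how the real function treats other input is not documented above.

```python
import string


def letter_to_number(listiter):
    """Map 'a'/'A' to '1', 'b'/'B' to '2', and so on."""
    if listiter.isdigit():
        return listiter  # already a digit iter, keep it unchanged
    index = string.ascii_lowercase.find(listiter.lower())
    if len(listiter) != 1 or index < 0:
        return None  # assumption: unknown input yields None
    return str(index + 1)


converted = [letter_to_number(item) for item in ('a', 'A', 'c', '7')]
```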
def encode_xml(text): (source)
Encode text such that it can be used in xml
Parameters
text: label text as string
Returns
encoded text
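For comparison, the same kind of encoding can be done with the standard library. The sketch below uses xml.sax.saxutils.escape and is not the package's implementation; which characters the real encode_xml escapes is not listed above, so escaping both quote characters in addition to &, < and > is an assumption.

```python
from xml.sax.saxutils import escape


def encode_xml_sketch(text):
    """Escape characters that are special in XML text and attribute values."""
    # escape() handles &, < and >; the mapping adds both quote characters.
    return escape(text, {'"': '&quot;', "'": '&apos;'})


encoded = encode_xml_sketch('a < b & "c"')
```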
def list_formats(type): (source)

Undocumented

def canonical_name(name): (source)

Undocumented

_aliases: dict[str, str] = (source)

Undocumented

def get_format(name): (source)
Returns the module object for a specific format.
def get_format_module(name): (source)
Returns the module object for a specific format
Parameters
name: the format name
Returns
a module object
def get_parser(name, *arg, **kwarg): (source)
Returns a parser object instance for a specific format
Parameters
name: format name
*arg: arguments to pass to the parser object
**kwarg: keyword arguments to pass to the parser object
Returns
parser object instance (subclass of ParserClass)
def get_dumper(name, *arg, **kwarg): (source)
Returns a dumper object instance for a specific format
Parameters
name: format name
*arg: arguments to pass to the dumper object
**kwarg: keyword arguments to pass to the dumper object
Returns
dumper object instance (subclass of DumperClass)
def heading_to_anchor(name): (source)
Derive an anchor name from a heading
count_eol_re = (source)

Undocumented

split_para_re = (source)

Undocumented

DumperContextElement = (source)

Undocumented

_is_header_re = (source)

Undocumented

_is_continue_re = (source)

Undocumented

def parse_header_lines(text): (source)

Read header lines in the rfc822 format. Can e.g. look like:

        Content-Type: text/x-zim-wiki
        Wiki-Format: zim 0.4
        Creation-Date: 2010-12-14T14:15:09.134955
Returns
the text minus the headers and a dict with the headers
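The header handling described above can be sketched roughly as follows. parse_headers_sketch and HEADER_RE are hypothetical names for this example; the sketch ignores continuation lines and assumes a single blank line separates the headers from the body, neither of which is confirmed by the description.

```python
import re

# "Key: value" line; the key may contain word characters and dashes.
HEADER_RE = re.compile(r'^([\w\-]+):\s+(.*?)\s*$')


def parse_headers_sketch(text):
    """Split leading 'Key: value' lines from the body.

    Returns a tuple (body, headers) where headers is a dict.
    """
    headers = {}
    lines = text.splitlines(keepends=True)
    consumed = 0
    for line in lines:
        match = HEADER_RE.match(line)
        if match is None:
            break  # first non-header line ends the header block
        headers[match.group(1)] = match.group(2)
        consumed += 1
    # Assumption: one blank line separates the headers from the body.
    if headers and consumed < len(lines) and lines[consumed].strip() == '':
        consumed += 1
    return ''.join(lines[consumed:]), headers


page = (
    'Content-Type: text/x-zim-wiki\n'
    'Wiki-Format: zim 0.4\n'
    'Creation-Date: 2010-12-14T14:15:09.134955\n'
    '\n'
    'Some body text\n'
)
body, headers = parse_headers_sketch(page)
```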
def dump_header_lines(*headers): (source)
Return text representation of header dict