zim.formats

package documentation

(source)

Package with source formats for pages.

Each module in zim.formats should contain exactly one subclass of DumperClass and exactly one subclass of ParserClass (the latter is optional for export formats). These can be loaded by get_dumper() and get_parser() respectively. The requirement to have exactly one subclass per module means you cannot import other classes that derive from these base classes directly into the module.
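To illustrate why the "exactly one subclass per module" rule matters, here is a minimal sketch of a loader that searches a module for the single class deriving from a base class. The helper `find_single_subclass`, the stand-in base classes and the throwaway module `fake_format` are all hypothetical names made up for this example; this is not how zim itself is implemented.

```python
import types


class DumperClass:
    """Stand-in base class for this sketch (not zim's real DumperClass)."""


class ParserClass:
    """Stand-in base class for this sketch (not zim's real ParserClass)."""


def find_single_subclass(module, base):
    """Return the one class in `module` that derives from `base`, or None.

    Raises ValueError when more than one candidate is found, which is the
    ambiguity the one-subclass-per-module rule is meant to avoid.
    """
    found = [
        obj for obj in vars(module).values()
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base
    ]
    if len(found) > 1:
        names = sorted(cls.__name__ for cls in found)
        raise ValueError("ambiguous subclasses: " + ", ".join(names))
    return found[0] if found else None


# A throwaway module object that plays the role of an export-only format.
fake_format = types.ModuleType("fake_format")


class Dumper(DumperClass):
    pass


fake_format.DumperClass = DumperClass  # the base class itself is ignored
fake_format.Dumper = Dumper

dumper_cls = find_single_subclass(fake_format, DumperClass)
parser_cls = find_single_subclass(fake_format, ParserClass)  # no parser defined
```

Importing a second class derived from `DumperClass` into `fake_format` would make the lookup ambiguous, which is why such imports are ruled out.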

For format modules it is safe to import '*' from this module.

Parse tree structure

Parse trees are built using the (c)ElementTree module (included in python 2.5 as xml.etree.ElementTree). It is basically an XML structure supporting a subset of "html like" tags.

Supported tags:

page root element for grouping paragraphs
p for paragraphs
h for heading, level attribute can be 1..6
pre for verbatim paragraphs (no further parsing in these blocks)
em for emphasis, rendered italic by default
strong for strong emphasis, rendered bold by default
mark for highlighted text, rendered with background color or underlined
strike for text that is removed, usually rendered as strike through
code for inline verbatim text
ul for bullet and checkbox lists
ol for numbered lists
li for list items
link for links, attribute href gives the target
img for images, attributes src, width, height and optionally href and alt
- type can be used to control plugin functionality, e.g. type=equation
table for tables, attributes:
  * aligns - comma separated values: right,left,center
  * wraps - 0 for not wrapped, 1 for auto-wrapped line display
- thead for table header row
  - th for table header cell
- trow for table row
  - td for table data cell

Nesting rules:

paragraphs, list items, table cells & headings can contain all inline elements
inline formats can contain other inline formats as well as links and tags
code and pre cannot contain any other elements
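As a rough illustration of the tags and nesting rules above, the following sketch builds a tiny tree with the standard library's xml.etree.ElementTree only. It does not use zim's own `ParseTree` or `ParseTreeBuilder` classes. The root tag is taken from the FORMATTEDTEXT constant documented on this page ('zim-tree'); the link target and the text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Root element; 'zim-tree' is the value of the FORMATTEDTEXT constant.
root = ET.Element("zim-tree")

# A heading block: the level attribute can be 1..6.
heading = ET.SubElement(root, "h", {"level": "1"})
heading.text = "Shopping\n"

# A paragraph containing inline elements (strong emphasis and a link).
para = ET.SubElement(root, "p")
para.text = "Buy "
strong = ET.SubElement(para, "strong")
strong.text = "fresh"
strong.tail = " bread, see "
link = ET.SubElement(para, "link", {"href": "Recipes:Bread"})
link.text = "the recipe"
link.tail = "\n"

# Serialize the structure and recover the plain text it carries.
xml_text = ET.tostring(root, encoding="unicode")
plain_text = "".join(root.itertext())
```

In ElementTree, text that follows an inline element is stored in that element's `tail`, which is why the paragraph text is split over `para.text`, `strong.tail` and `link.tail`.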

Unlike html we respect line breaks and other whitespace as is. When rendering as html use the "white-space: pre" CSS definition to get the same effect.

Text blocks (paragraphs, list items, headings, verbatim blocks) must end with a newline. Only the last block of the sequence can omit the newline. This case will be interpreted as a text snippet and affects copy-paste behavior.

Tables and other objects that are not inline are implicitly handled as ending in a newline.

As a result, the newlines outside blocks represent the number of empty lines between the blocks, and the newline ending a block is contained in the block.
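The newline convention can be sketched with plain xml.etree.ElementTree, where any text that follows a block element is kept in that element's `tail`. The helper `empty_lines_after` is a hypothetical name for this example and is not part of zim.formats.

```python
import xml.etree.ElementTree as ET


def empty_lines_after(block):
    """Count the newlines stored outside `block`, i.e. in its tail."""
    return (block.tail or "").count("\n")


root = ET.Element("zim-tree")

first = ET.SubElement(root, "p")
first.text = "First paragraph\n"  # the newline ending the block is inside it
first.tail = "\n\n"               # two empty lines before the next block

second = ET.SubElement(root, "p")
second.text = "Second paragraph\n"

plain = "".join(root.itertext())
gap = empty_lines_after(first)
```

Here `gap` is the number of empty lines between the two paragraphs, while each paragraph's own terminating newline stays part of its text.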

If a page starts with an h1, this heading is considered the page title; otherwise we can fall back to the page name as the title.
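A minimal sketch of this title rule, again using only xml.etree.ElementTree. The function `page_title` and the page name "Notes:Home" are hypothetical and only serve the example; zim's own lookup may differ in detail.

```python
import xml.etree.ElementTree as ET


def page_title(root, page_name):
    """Return the text of a leading level-1 heading, else `page_name`."""
    if (root.text or "").strip():
        return page_name  # the page starts with loose text, not a heading
    first = next(iter(root), None)  # first child element, if any
    if first is not None and first.tag == "h" and str(first.get("level")) == "1":
        return "".join(first.itertext()).rstrip("\n")
    return page_name


with_title = ET.fromstring(
    '<zim-tree><h level="1">Home\n</h><p>Hello\n</p></zim-tree>'
)
without_title = ET.fromstring('<zim-tree><p>Hello\n</p></zim-tree>')

title_a = page_title(with_title, "Notes:Home")
title_b = page_title(without_title, "Notes:Home")
```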

NOTE: To avoid confusion: "headers" refers to meta data, usually in the form of rfc822 headers at the top of a page. But "heading" refers to a title or subtitle in the document.

Module	`html`	This module supports dumping to HTML
Module	`latex`	This module handles export of LaTeX code
Module	`markdown`	This module handles dumping markdown text with pandoc extensions
Module	`plain`	This module handles parsing and dumping input in plain text
Module	`rst`	This module handles dumping reStructuredText with sphinx extensions
Module	`wiki`	This module handles parsing and dumping wiki text
Module	`__main__`	No summary

From __init__.py:

Class	`BaseLinker`	No summary
Class	`DocumentFragment`	Document fragment class for DOM-like access
Class	`DumperClass`	No summary
Class	`Element`	Element class for DOM-like access
Class	`Node`	Base class for DOM-like access to the document structure. Note: This class is not optimized for keeping large structures in memory.
Class	`OldParseTreeBuilder`	No summary
Class	`ParserClass`	Base class for parsers
Class	`ParseTree`	Wrapper for zim parse trees.
Class	`ParseTreeBuilder`	Builder object that builds a `ParseTree`
Class	`StubLinker`	Linker used for testing - just gives back the link as it was parsed. DO NOT USE outside of testing.
Class	`TableParser`	Common functions for converting a table from its XML structure to another format
Class	`Visitor`	Conceptual opposite of a builder, but with same API. Used to walk nodes in a parsetree and call callbacks for each node. See e.g. `ParseTree.visit()`.
Class	`VisitorSkip`	Exception to be raised when the visitor should skip a leaf node and not descend into it.
Class	`VisitorStop`	Exception to be raised to cancel a visitor action
Function	`canonical_name`	Undocumented
Function	`convert_list_iter_letter_to_number`	No summary
Function	`dump_header_lines`	Return text representation of header dict
Function	`encode_xml`	Encode text such that it can be used in XML
Function	`get_dumper`	Returns a dumper object instance for a specific format
Function	`get_format`	Returns the module object for a specific format.
Function	`get_format_module`	Returns the module object for a specific format
Function	`get_parser`	Returns a parser object instance for a specific format
Function	`heading_to_anchor`	Derive an anchor name from a heading
Function	`increase_list_iter`	No summary
Function	`list_formats`	Undocumented
Function	`parse_header_lines`	Read header lines in the rfc822 format
Constant	`ANCHOR`	Undocumented
Constant	`BLOCK`	Undocumented
Constant	`BLOCK_LEVEL`	Undocumented
Constant	`BULLET`	Undocumented
Constant	`BULLETLIST`	Undocumented
Constant	`CHECKED_BOX`	Undocumented
Constant	`EMPHASIS`	Undocumented
Constant	`EXPORT_FORMAT`	Undocumented
Constant	`FORMATTEDTEXT`	Undocumented
Constant	`FRAGMENT`	Undocumented
Constant	`HEADDATA`	Undocumented
Constant	`HEADING`	Undocumented
Constant	`HEADROW`	Undocumented
Constant	`IMAGE`	Undocumented
Constant	`IMPORT_FORMAT`	Undocumented
Constant	`LINE`	Undocumented
Constant	`LINK`	Undocumented
Constant	`LISTITEM`	Undocumented
Constant	`MARK`	Undocumented
Constant	`MIGRATED_BOX`	Undocumented
Constant	`NATIVE_FORMAT`	Undocumented
Constant	`NUMBEREDLIST`	Undocumented
Constant	`OBJECT`	Undocumented
Constant	`PARAGRAPH`	Undocumented
Constant	`STRIKE`	Undocumented
Constant	`STRONG`	Undocumented
Constant	`SUBSCRIPT`	Undocumented
Constant	`SUPERSCRIPT`	Undocumented
Constant	`TABLE`	Undocumented
Constant	`TABLEDATA`	Undocumented
Constant	`TABLEROW`	Undocumented
Constant	`TAG`	Undocumented
Constant	`TEXT_FORMAT`	Undocumented
Constant	`TRANSMIGRATED_BOX`	Undocumented
Constant	`UNCHECKED_BOX`	Undocumented
Constant	`VERBATIM`	Undocumented
Constant	`VERBATIM_BLOCK`	Undocumented
Constant	`XCHECKED_BOX`	Undocumented
Variable	`count_eol_re`	Undocumented
Variable	`DumperContextElement`	Undocumented
Variable	`logger`	Undocumented
Variable	`Pango`	Undocumented
Variable	`split_para_re`	Undocumented
Variable	`_aliases`	Undocumented
Variable	`_is_continue_re`	Undocumented
Variable	`_is_header_re`	Undocumented
Variable	`_letters`	Undocumented

logger = (source)

Undocumented

Pango = (source)

Undocumented

EXPORT_FORMAT: int = (source)

Undocumented

Value

IMPORT_FORMAT: int = (source)

Undocumented

Value

NATIVE_FORMAT: int = (source)

Undocumented

Value

TEXT_FORMAT: int = (source)

Undocumented

Value

UNCHECKED_BOX: str = (source)

Undocumented

Value

'unchecked-box'

CHECKED_BOX: str = (source)

Undocumented

Value

'checked-box'

XCHECKED_BOX: str = (source)

Undocumented

Value

'xchecked-box'

MIGRATED_BOX: str = (source)

Undocumented

Value

'migrated-box'

TRANSMIGRATED_BOX: str = (source)

Undocumented

Value

'transmigrated-box'

BULLET: str = (source)

Undocumented

Value

'*'

FORMATTEDTEXT: str = (source)

Undocumented

Value

'zim-tree'

FRAGMENT: str = (source)

Undocumented

Value

'zim-tree'

HEADING: str = (source)

Undocumented

Value

'h'

PARAGRAPH: str = (source)

Undocumented

Value

'p'

VERBATIM_BLOCK: str = (source)

Undocumented

Value

'pre'

BLOCK: str = (source)

Undocumented

Value

'div'

IMAGE: str = (source)

Undocumented

Value

'img'

OBJECT: str = (source)

Undocumented

Value

'object'

BULLETLIST: str = (source)

Undocumented

Value

'ul'

NUMBEREDLIST: str = (source)

Undocumented

Value

'ol'

LISTITEM: str = (source)

Undocumented

Value

'li'

EMPHASIS: str = (source)

Undocumented

Value

'emphasis'

STRONG: str = (source)

Undocumented

Value

'strong'

MARK: str = (source)

Undocumented

Value

'mark'

VERBATIM: str = (source)

Undocumented

Value

'code'

STRIKE: str = (source)

Undocumented

Value

'strike'

SUBSCRIPT: str = (source)

Undocumented

Value

'sub'

SUPERSCRIPT: str = (source)

Undocumented

Value

'sup'

LINK: str = (source)

Undocumented

Value

'link'

TAG: str = (source)

Undocumented

Value

'tag'

ANCHOR: str = (source)

Undocumented

Value

'anchor'

TABLE: str = (source)

Undocumented

Value

'table'

HEADROW: str = (source)

Undocumented

Value

'thead'

HEADDATA: str = (source)

Undocumented

Value

'th'

TABLEROW: str = (source)

Undocumented

Value

'trow'

TABLEDATA: str = (source)

Undocumented

Value

'td'

LINE: str = (source)

Undocumented

Value

'line'

BLOCK_LEVEL = (source)

Undocumented

Value

(PARAGRAPH, HEADING, VERBATIM_BLOCK, BLOCK, LISTITEM)

_letters: str = (source)

Undocumented

def increase_list_iter(listiter): (source)

Get the next item in a list for a numbered list. E.g. if listiter is "1" this function returns "2"; if it is "a" it returns "b".

Parameters
listiter	the current item, either an integer number or single letter
Returns
the next item, or `None`
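The documented behaviour can be approximated by the sketch below. It is an independent illustration named `next_list_iter` (a hypothetical name), not zim's actual code. In particular, returning `None` after the last letter of the alphabet is an assumption, since the page does not say what happens there.

```python
import string


def next_list_iter(listiter):
    """Return the item following `listiter`, or None if there is none."""
    if listiter.isdigit():
        return str(int(listiter) + 1)
    if len(listiter) == 1:
        for letters in (string.ascii_lowercase, string.ascii_uppercase):
            if listiter in letters:
                i = letters.index(listiter)
                # Assumption: there is no next item after the last letter.
                return letters[i + 1] if i + 1 < len(letters) else None
    return None  # neither an integer nor a single letter


after_one = next_list_iter("1")
after_a = next_list_iter("a")
after_symbol = next_list_iter("?")
```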

def convert_list_iter_letter_to_number(listiter): (source)

Convert a "letter" numbered list to a digit numbered list. Useful for export to formats that do not support letter lists. Both "A." and "a." convert to "1."; the assumption is that this function is used for the start iter only, not the whole list.
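A small sketch of such a conversion follows. The function name `letter_iter_to_number` is hypothetical, and the sketch assumes the iterator is passed without its trailing dot (so "a" rather than "a."); both points are assumptions, not statements about zim's implementation.

```python
import string


def letter_iter_to_number(listiter):
    """Map a letter iterator to its position in the alphabet, as a string."""
    if listiter.isdigit():
        return listiter  # already a number, nothing to convert
    if len(listiter) == 1 and listiter in string.ascii_letters:
        # Upper and lower case map to the same number: "A" and "a" give "1".
        return str(string.ascii_lowercase.index(listiter.lower()) + 1)
    return None  # not a recognised list iterator


from_lower = letter_iter_to_number("a")
from_upper = letter_iter_to_number("A")
from_digit = letter_iter_to_number("4")
```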

def encode_xml(text): (source)

Encode text such that it can be used in xml

Parameters
text	label text as string
Returns
encoded text
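For comparison, the standard library offers a similar facility in `xml.sax.saxutils.escape`. The sketch below uses it as a stand-in; `encode_xml_sketch` is a hypothetical name, and whether zim's `encode_xml` also escapes quote characters is an assumption made here.

```python
from xml.sax.saxutils import escape


def encode_xml_sketch(text):
    """Escape characters that have a special meaning in XML."""
    # escape() always handles &, < and >; the extra mapping covers quotes.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


encoded = encode_xml_sketch('a < "b" & c')
```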

def list_formats(type): (source)

Undocumented

def canonical_name(name): (source)

Undocumented

_aliases: dict[str, str] = (source)

Undocumented

def get_format(name): (source)

Returns the module object for a specific format.

def get_format_module(name): (source)

Returns the module object for a specific format

Parameters
name	the format name
Returns
a module object

def get_parser(name, *arg, **kwarg): (source)

Returns a parser object instance for a specific format

Parameters
name	format name
*arg	arguments to pass to the parser object
**kwarg	keyword arguments to pass to the parser object
Returns
parser object instance (subclass of `ParserClass`)

def get_dumper(name, *arg, **kwarg): (source)

Returns a dumper object instance for a specific format

Parameters
name	format name
*arg	arguments to pass to the dumper object
**kwarg	keyword arguments to pass to the dumper object
Returns
dumper object instance (subclass of `DumperClass`)

def heading_to_anchor(name): (source)

Derive an anchor name from a heading

count_eol_re = (source)

Undocumented

split_para_re = (source)

Undocumented

DumperContextElement = (source)

Undocumented

_is_header_re = (source)

Undocumented

_is_continue_re = (source)

Undocumented

def parse_header_lines(text): (source)

Read header lines in the rfc822 format. Can e.g. look like:

        Content-Type: text/x-zim-wiki
        Wiki-Format: zim 0.4
        Creation-Date: 2010-12-14T14:15:09.134955

Returns
the text minus the headers and a dict with the headers
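The sketch below shows one way such header parsing could work, returning the remaining text first and the header dict second, as described above. The function `parse_headers_sketch`, its regular expression and the handling of the blank separator line and of indented continuation lines are assumptions for illustration, not zim's actual implementation.

```python
import re

# "Key: value" at the start of a line; keys are words joined by hyphens.
_header_line = re.compile(r"^([\w\-]+):\s+(.*?)\s*$")


def parse_headers_sketch(text):
    """Split leading rfc822-style headers from `text`.

    Returns a tuple (body, headers).
    """
    headers = {}
    lines = text.splitlines(keepends=True)
    consumed = 0
    key = None
    for line in lines:
        match = _header_line.match(line)
        if match:
            key = match.group(1)
            headers[key] = match.group(2)
        elif key is not None and line[:1] in (" ", "\t") and line.strip():
            # Indented line: continuation of the previous header value.
            headers[key] += "\n" + line.strip()
        else:
            if headers and not line.strip():
                consumed += 1  # drop the blank line that ends the headers
            break
        consumed += 1
    return "".join(lines[consumed:]), headers


sample = (
    "Content-Type: text/x-zim-wiki\n"
    "Wiki-Format: zim 0.4\n"
    "Creation-Date: 2010-12-14T14:15:09.134955\n"
    "\n"
    "====== Title ======\n"
)
body, headers = parse_headers_sketch(sample)
```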

def dump_header_lines(*headers): (source)

Return text representation of header dict
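As a rough counterpart, the sketch below turns one or more header dicts into "Key: value" lines. The name `dump_headers_sketch` is hypothetical, and the exact output layout (one header per line, each ending in a newline) is an assumption; the real function may format its output differently.

```python
def dump_headers_sketch(*headers):
    """Return 'Key: value' lines for every dict passed in, in order."""
    lines = []
    for mapping in headers:
        for key, value in mapping.items():
            lines.append("%s: %s\n" % (key, value))
    return "".join(lines)


header_text = dump_headers_sketch(
    {"Content-Type": "text/x-zim-wiki"},
    {"Wiki-Format": "zim 0.4"},
)
```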