Class | Re |
Wrapper around regex pattern objects which remembers the last match object and gives list access to its capturing groups. See module re for regex docs. |
Class | TextBuffer |
No summary |
Function | escape_string |
Escape special characters with a backslash. Escapes newline, tab, the backslash itself, and any characters in chars. |
Function | link_type |
Function that returns a link type for urls and page links |
Function | normalize_win32_share |
No summary |
Function | parse_date |
Returns a tuple of (year, month, day) for a date string, or None if the string could not be parsed. Currently supported formats: |
Function | split_escaped_string |
Split string on char while respecting backslash escapes |
Function | split_quoted_strings |
Split a word list respecting quotes |
Function | unescape_quoted_string |
Removes quotes from a string and unescapes embedded quotes. Returns a string. |
Function | unescape_string |
Unescape backslash escapes in string. Recognizes \n and \t for newline and tab respectively, otherwise keeps the literal character. |
Function | uri_scheme |
Function that returns a scheme for URIs, URLs and email addresses |
Function | url_decode |
Replace url-encoding hex sequences with their proper characters. |
Function | url_encode |
Replaces non-standard characters in urls with hex codes. |
Function | valid_interwiki_key |
Undocumented |
Constant | URL_ENCODE_DATA |
Undocumented |
Constant | URL_ENCODE_PATH |
Undocumented |
Constant | URL_ENCODE_READABLE |
Undocumented |
Variable | is_email_re |
Undocumented |
Variable | is_interwiki_keyword_re |
Undocumented |
Variable | is_interwiki_re |
Undocumented |
Variable | is_path_re |
Undocumented |
Variable | is_uri_re |
Undocumented |
Variable | is_url_re |
Undocumented |
Variable | is_win32_path_re |
Undocumented |
Variable | is_win32_share_re |
Undocumented |
Variable | is_www_link_re |
Undocumented |
Variable | url_re |
Undocumented |
Function | _escape |
Undocumented |
Function | _unescape |
Undocumented |
Function | _url_decode |
Undocumented |
Function | _url_decode_bytes |
Undocumented |
Function | _url_encode |
Undocumented |
Function | _url_encode_on_error |
Undocumented |
Function | _url_encode_readable |
Undocumented |
Variable | _classes |
Undocumented |
Variable | _parse_date_re |
Undocumented |
Variable | _url_bytes_decode_re |
Undocumented |
Variable | _url_decode_ascii_re |
Undocumented |
Variable | _url_decode_unicode_bytes_re |
Undocumented |
Variable | _url_encode_path_re |
Undocumented |
Variable | _url_encode_re |
Undocumented |
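The summaries of escape_string and unescape_string above describe a backslash escaping scheme that should round-trip. A minimal sketch of that scheme, assuming the behaviour stated in the summaries and nothing more; the names `escape_sketch` and `unescape_sketch` are hypothetical and this is not the module's own implementation:

```python
import re

def escape_sketch(string, chars=''):
    """Escape newline, tab, backslash and every character in chars.

    Assumes chars does not itself contain a backslash, 'n' or 't'.
    """
    # Escape the backslash first, so that backslashes added by the
    # later replacements are not escaped a second time.
    string = string.replace('\\', '\\\\')
    string = string.replace('\n', '\\n').replace('\t', '\\t')
    for char in chars:
        string = string.replace(char, '\\' + char)
    return string

# A backslash followed by any single character (including a newline).
_ESCAPE_RE = re.compile(r'\\(.)', re.DOTALL)

def unescape_sketch(string):
    """Map \\n and \\t to newline and tab, keep any other escaped character."""
    def replace(match):
        char = match.group(1)
        return {'n': '\n', 't': '\t'}.get(char, char)
    return _ESCAPE_RE.sub(replace, string)
```

Escaping the backslash before anything else is what makes the round trip unambiguous: every backslash in the escaped output then starts exactly one two-character escape.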
Parameters | |
path | a filesystem path or URL |
Returns | |
the platform specific path or the original input path |
Returns a tuple of (year, month, day) for a date string, or None if the string could not be parsed. Currently supported formats:
Where '-' can be replaced by any separator. Any preceding or trailing text will be ignored (so we can parse journal page names correctly).
TODO: Some setting to prefer US dates with mm-dd instead of dd-mm
TODO: More date formats?
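The "any separator" and "ignore surrounding text" behaviour can be illustrated with a small regex-based sketch. The list of supported formats is not reproduced on this page, so the example assumes a single year-month-day form with a four-digit year; the name `parse_date_sketch` and the pattern are hypothetical, not the module's actual code:

```python
import re
from datetime import date

# Hypothetical pattern: four-digit year, then month and day, each preceded
# by any single non-digit separator. Surrounding text is allowed, but the
# date may not be glued to other digits.
_DATE_RE = re.compile(r'(?<!\d)(\d{4})\D(\d{1,2})\D(\d{1,2})(?!\d)')

def parse_date_sketch(string):
    """Return (year, month, day), or None if no valid date is found."""
    match = _DATE_RE.search(string)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # range check, e.g. rejects month 13
    except ValueError:
        return None
    return (year, month, day)
```

Using `search` rather than `match` is what lets a name such as `Journal:2012:03:04` parse even though the date is preceded by other text.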
Split a word list respecting quotes
This function always expects full words to be quoted. Even if quotes appear in the middle of a word, they are considered word boundaries.
( The XDG Desktop Entry spec says full words must be quoted and quotes in a word escaped, but doesn't specify what to do with loose quotes in a string. )
Also, a comma "," is handled specially and is always considered a word on its own.
Parameters | |
string | string to split in words |
unescape | if True quotes are removed, else they are left in place |
strict | if True unmatched quotes will cause a ValueError to be raised, if False unmatched quotes are ignored. |
Returns | |
list of strings |
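The rules above (quoted words, quotes acting as word boundaries, the comma as a separate word, and the unescape and strict flags) can be sketched with a single tokenizing regex. This is an illustration under those stated rules only; the name `split_quoted_sketch` is hypothetical and the real function may differ in corner cases:

```python
import re

_WORD_RE = re.compile(r'''
    "(?:\\.|[^"\\])*"     # double-quoted word, backslash escapes allowed
  | '(?:\\.|[^'\\])*'     # single-quoted word, backslash escapes allowed
  | ,                     # a comma is always a word on its own
  | [^\s,"']+             # run of plain, unquoted characters
  | ["']                  # loose quote without a matching partner
''', re.VERBOSE)

def split_quoted_sketch(string, unescape=True, strict=True):
    """Split string into words, treating quoted runs as single words."""
    words = []
    for token in _WORD_RE.findall(string):
        if token in ('"', "'"):
            if strict:
                raise ValueError('Unmatched quote in: %r' % string)
            continue  # non-strict: the loose quote is ignored
        if unescape and len(token) >= 2 and token[0] in '"\'' and token[-1] == token[0]:
            quote = token[0]
            # Drop the surrounding quotes and unescape embedded ones.
            token = token[1:-1].replace('\\' + quote, quote)
        words.append(token)
    return words
```

For example, `split_quoted_sketch('foo "bar baz" , qux')` yields `['foo', 'bar baz', ',', 'qux']`, and `foo"bar"` splits into two words because the quote acts as a boundary.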
Replace url-encoding hex sequences with their proper characters.
Mode can be:
The mode URL_ENCODE_READABLE will not decode any other characters, so urls decoded with this mode can still contain escape sequences. They are safe to use within zim, but should be re-encoded with URL_ENCODE_READABLE before handing them to an external program.
This method will only decode non-ascii byte codes when the _whole_ byte equivalent of the URL is valid UTF-8. Otherwise it is assumed the encoding was done in another format and the decoding fails silently for these byte sequences.
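The "whole URL must be valid UTF-8" rule can be sketched with the standard library. The example below does not model the three modes; it only shows the fallback, assuming that ASCII escapes are still decoded when the UTF-8 check fails. The name `url_decode_sketch` is hypothetical:

```python
import re
from urllib.parse import unquote_to_bytes

_HEX_ESCAPE_RE = re.compile(r'%([0-9A-Fa-f]{2})')

def url_decode_sketch(url):
    """Decode %XX escapes, keeping non-ASCII escapes if they are not UTF-8."""
    try:
        # Decode all escapes at once. This succeeds only when the byte
        # equivalent of the whole URL is valid UTF-8.
        return unquote_to_bytes(url).decode('utf-8')
    except UnicodeDecodeError:
        # Assumed fallback: decode ASCII escapes only and silently leave
        # the non-ASCII byte sequences in their encoded form.
        def ascii_only(match):
            code = int(match.group(1), 16)
            return chr(code) if code < 128 else match.group(0)
        return _HEX_ESCAPE_RE.sub(ascii_only, url)
```

With this sketch `caf%C3%A9%20bar` decodes fully, while `caf%E9%20bar` (a Latin-1 encoded "é") keeps the `%E9` sequence and only the space is decoded.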
Replaces non-standard characters in urls with hex codes.
Mode can be:
The mode URL_ENCODE_READABLE can be applied to urls that are already encoded because it does not touch the "%" character. The modes URL_ENCODE_DATA and URL_ENCODE_PATH can only be applied to strings that are known not to be encoded.
The encoded URL is a string containing only ASCII characters.
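Why a mode that leaves "%" alone can be re-applied safely, while a mode that encodes "%" cannot, is easy to see with `urllib.parse.quote`. The safe-character sets and function names below are assumptions made for illustration; they are not the definitions of URL_ENCODE_READABLE or URL_ENCODE_DATA:

```python
from urllib.parse import quote

# Hypothetical safe sets; the real modes may reserve different characters.
_SAFE_READABLE = "%/:?#[]@!$&'()*+,;=-._~"  # "%" is left untouched
_SAFE_DATA = "-._~"                          # "%" gets encoded as %25

def encode_readable_sketch(url):
    """Encode spaces and non-ASCII characters, keep existing escapes."""
    return quote(url, safe=_SAFE_READABLE)

def encode_data_sketch(text):
    """Encode everything outside a small unreserved set, including '%'."""
    return quote(text, safe=_SAFE_DATA)
```

Applying `encode_readable_sketch` twice gives the same result as applying it once, whereas `encode_data_sketch('a%20b')` turns the existing escape into `a%2520b`, which is why the stricter modes are only suitable for strings known not to be encoded.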