API
Degrotesque
A tiny web type setter.
The main method "prettify" uses the list of actions to change the contents of the given HTML page.
The XML elements themselves (the markup) are skipped, as are the contents of specific elements. Additional methods support parsing and setting new values for the actions and for the elements to skip.
Some internal methods determine which parts of the document shall be processed and which ones shall be skipped.
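The skipping described above can be pictured with a minimal sketch. It is not degrotesque's own implementation: the function names, the boolean mask, and the default list of skipped elements are assumptions made for illustration only.

```python
import re


def build_skip_mask(document, to_skip=("script", "style", "code", "pre")):
    """Return one flag per character; True means 'leave untouched'.

    The default element names are an assumption, not degrotesque's list.
    """
    mask = [False] * len(document)
    # 1) The markup itself: every <...> tag is skipped.
    for match in re.finditer(r"<[^>]*>", document):
        mask[match.start():match.end()] = [True] * (match.end() - match.start())
    # 2) The complete contents of the elements named in to_skip.
    for name in to_skip:
        pattern = re.compile(
            r"<{0}\b[^>]*>.*?</{0}\s*>".format(re.escape(name)),
            re.IGNORECASE | re.DOTALL,
        )
        for match in pattern.finditer(document):
            mask[match.start():match.end()] = [True] * (match.end() - match.start())
    return mask


def apply_outside_mask(document, mask, pattern, replacement):
    """Apply one regex replacement, but only where no matched character is masked."""
    def substitute(match):
        if any(mask[match.start():match.end()]):
            return match.group(0)  # inside skipped text: keep as-is
        return replacement
    return re.sub(pattern, substitute, document)
```

With this sketch, an ellipsis inside `<p>` would be replaced while the same characters inside `<code>` stay unchanged.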
__init__()
Sets defaults for the elements whose contents shall not be processed.
Sets defaults for actions to perform.
_restore_default_actions()
Instantiates the default actions.
set_actions(action_names)
Sets the actions to apply.
If the given names of actions are None or empty, the default actions are used.
Otherwise, the actions matching the given names are retrieved from the internal database and their list is returned.
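The name lookup described for set_actions can be sketched as follows. The database contents, the action names, and the function name are hypothetical; the real internal database is not shown in this documentation.

```python
# Hypothetical action database: name -> list of (regex pattern, replacement).
ACTIONS_DB = {
    "quotes": [(r'"(.*?)"', "&#8220;\\1&#8221;")],
    "dashes": [(r" - ", " &#8211; ")],
    "ellipsis": [(r"\.\.\.", "&#8230;")],
}
# Hypothetical default selection.
DEFAULT_ACTION_NAMES = ["quotes", "dashes", "ellipsis"]


def resolve_actions(action_names=None):
    """Return the actions for the given names, or the defaults if none are given."""
    if not action_names:  # covers both None and an empty list
        action_names = DEFAULT_ACTION_NAMES
    actions = []
    for name in action_names:
        if name not in ACTIONS_DB:
            raise ValueError("Action '%s' is not known." % name)
        actions.extend(ACTIONS_DB[name])
    return actions
```

Raising an error for an unknown name is a design choice of this sketch; the documentation does not say how unknown names are handled.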
set_format(format_name)
Sets the target character representation.
get_marker(filename, document)
Returns the marker to use.
In a first step, the method tries to determine the marker from the file's extension. If the extension matches a marker, this marker is returned.
If the extension is not listed in any marker's extensions list, the method checks whether the document is an SGML derivative (HTML/XML/...). In this case, a DegrotesqueHTMLMarker is returned.
If no other marker could be found, a DegrotesqueTextMarker is returned.
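The three-step selection described for get_marker can be sketched like this. The classes below are simplified stand-ins, and the extension lists as well as the SGML check are assumptions; only the order of the steps follows the description.

```python
import os


class TextMarker:  # stand-in for DegrotesqueTextMarker
    extensions = ["txt"]


class HTMLMarker:  # stand-in for DegrotesqueHTMLMarker
    extensions = ["html", "htm", "xml"]


class MarkdownMarker:  # hypothetical additional marker
    extensions = ["md"]


MARKERS = [TextMarker, HTMLMarker, MarkdownMarker]


def looks_like_sgml(document):
    """Very rough check for an SGML derivative (assumption: inspect the start only)."""
    head = document.lstrip()[:200].lower()
    return head.startswith(("<?xml", "<!doctype", "<html"))


def pick_marker(filename, document):
    """Choose a marker by extension, then by content, then fall back to plain text."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    for marker_class in MARKERS:
        if extension in marker_class.extensions:
            return marker_class()
    if looks_like_sgml(document):
        return HTMLMarker()
    return TextMarker()
```

For example, a file with an unlisted extension whose contents start with a doctype declaration would get the HTML marker in this sketch.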
prettify(document, marker, to_skip=None)
Prettifies (degrotesques) the given document.
It is assumed that the input is given in utf-8.
The result is returned in utf-8 as well.
_replace_unicode(matchobj)
Converts a Unicode number to itself (leaves it unchanged).
_replace_html(matchobj)
Converts a Unicode number to an HTML entity.
_replace_character(matchobj)
Converts a Unicode number to a Unicode character.
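The three conversion helpers above can be illustrated with a small sketch. It assumes that "Unicode numbers" are decimal character references such as `&#8230;` and that the format names 'unicode', 'html' and 'text' select the respective helper; both points are assumptions, not confirmed by this documentation.

```python
import re
from html.entities import codepoint2name

# Assumed form of a Unicode number: a decimal character reference.
NUMERIC_REFERENCE = re.compile(r"&#(\d+);")


def replace_unicode(matchobj):
    """Keep the numeric reference as it is."""
    return matchobj.group(0)


def replace_html(matchobj):
    """Turn the numeric reference into a named HTML entity, if one exists."""
    name = codepoint2name.get(int(matchobj.group(1)))
    return "&%s;" % name if name else matchobj.group(0)


def replace_character(matchobj):
    """Turn the numeric reference into the character itself."""
    return chr(int(matchobj.group(1)))


# Assumed mapping from format name to conversion helper.
REPLACERS = {
    "unicode": replace_unicode,
    "html": replace_html,
    "text": replace_character,
}


def convert(text, format_name):
    """Rewrite all numeric references in text for the chosen target format."""
    return NUMERIC_REFERENCE.sub(REPLACERS[format_name], text)
```

Passing the helper as the replacement callback of `re.sub` is what makes a `matchobj` parameter natural here.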
prettify(document, marker=None, actions=None, replacement_format='text', to_skip=None)
Prettifies (degrotesques) the given document.
Builds a Degrotesque instance, inserts the given options, and applies it on the document.
main(arguments=[])
The main method, using parameters from the command line.
The application reads the given file or the files from the folder defined by the given name. If the -r/--recursive option is set, the input folder is scanned recursively. All files are processed unless they are limited to those matching the extensions given with the -e/--extensions option. The default encoding for the files is utf-8; this can be changed using the -E/--encoding option.
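A minimal sketch of the file selection described above, assuming a hypothetical helper named collect_files; whether a single named file is also filtered by extension is not stated here, so the sketch returns it unfiltered.

```python
import os


def collect_files(path, recursive=False, extensions=None):
    """Return the files to process for a file name or a folder name."""
    if os.path.isfile(path):
        return [path]  # assumption: a single file is always processed
    files = []
    for root, _dirs, names in os.walk(path):
        for name in sorted(names):
            extension = os.path.splitext(name)[1].lstrip(".")
            if extensions is None or extension in extensions:
                files.append(os.path.join(root, name))
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return files
```

A call such as `collect_files("site", recursive=True, extensions=["html"])` would then mirror `-r -e html` on a folder named "site".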
The default actions or those named using the -a/--actions option are applied. When parsing HTML/XML documents, the elements themselves (the markup) are skipped. The contents of a default set of XML/HTML elements are skipped as well; this list may be changed using the -s/--skip option. degrotesque tries to determine the file type from the file's extension. The -t/--type option can be used to set the type explicitly.
By default, replacements are written as Unicode entities; this may be changed using the -f/--format option.
The files are saved under their original names. Unless the -B/--no-backup option is given, a backup of each original file is created, named like the original file with the suffix ".orig" appended.
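The backup behaviour can be sketched as below. The function name and its parameters are hypothetical; only the ".orig" suffix and the default encoding come from the description.

```python
import shutil


def save_with_backup(filename, new_contents, encoding="utf-8", backup=True):
    """Write new_contents to filename, keeping the old file as filename + '.orig'."""
    if backup:
        # Copy first, so the original survives even if writing fails.
        shutil.copyfile(filename, filename + ".orig")
    with open(filename, "w", encoding=encoding) as fd:
        fd.write(new_contents)
```

Calling it with `backup=False` corresponds to running with -B/--no-backup.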
degrotesque can read a configuration file named using the -c/--config option. If the -w/--write-config option is given, it saves the current options into the file named there.
Options
degrotesque must get the name(s) of the files/folders to process.
The following options are optional:
--recursive / -r: Set if a given folder shall be processed recursively
--extensions / -e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed
--encoding / -E <ENCODING>: Sets the file encoding (default: 'utf-8')
--type / -t: Sets the file type ['sgml', 'text', 'md', 'doxygen', 'python', 'rst']
--no-backup / -B: Set if no backup files shall be generated
--format / -f <FORMAT>: Defines the format of the replacements ['html', 'unicode', 'text']
--skip / -s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed
--actions / -a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied
--config / -c <FILE>: Reads options from the named configuration file
--write-config / -w <FILE>: Writes the set options into a configuration file
--help / -h: Prints the help screen
--version / -v: Prints the version
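To make the option list concrete, here is a sketch of an argparse parser that accepts the same options. It is not degrotesque's actual parser: passing the inputs as positional arguments, the destination names, and the version string are assumptions. The 'unicode' default for the format follows the description of main above.

```python
import argparse


def split_names(value):
    """Split a comma-separated option value into a list of names."""
    return [name for name in value.split(",") if name]


def build_parser():
    parser = argparse.ArgumentParser(prog="degrotesque")  # -h/--help is added by argparse
    # Assumption: files/folders are given as positional arguments.
    parser.add_argument("input", nargs="+")
    parser.add_argument("-r", "--recursive", action="store_true")
    parser.add_argument("-e", "--extensions", type=split_names, default=None)
    parser.add_argument("-E", "--encoding", default="utf-8")
    parser.add_argument("-t", "--type", default=None,
                        choices=["sgml", "text", "md", "doxygen", "python", "rst"])
    parser.add_argument("-B", "--no-backup", action="store_true")
    parser.add_argument("-f", "--format", default="unicode",
                        choices=["html", "unicode", "text"])
    parser.add_argument("-s", "--skip", type=split_names, default=None)
    parser.add_argument("-a", "--actions", type=split_names, default=None)
    parser.add_argument("-c", "--config", default=None)
    parser.add_argument("-w", "--write-config", default=None)
    # Placeholder version string; the real version is not given here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (version not specified)")
    return parser
```

Parsing `["-r", "-e", "html,htm", "-f", "html", "site"]` with this sketch yields a recursive run over the folder "site", limited to two extensions, with HTML entities as the replacement format.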