Running on the Command Line

degrotesque is started on the command line.

Synopsis

degrotesque [-h] [-c FILE] [--version] [-r] 
            [-e EXTENSIONS] [-E ENCODING]
            [-t {sgml,text,md,doxygen,python,rst}] 
            [-B] [-f {html,unicode,char}] [-s SKIP] 
            [-a ACTIONS] [-w FILE]
            input

Description

degrotesque reads one or more files named on the command line. Multiple files are read if the given name contains an asterisk ('*') or is a folder. If the option -r / --recursive is set and a folder is given, the folder is processed recursively.

By default, all files are processed when the given path points to a folder. You may limit the files to process by their extension using the -e <EXTENSION>[,<EXTENSION>]* / --extensions <EXTENSION>[,<EXTENSION>]* option; multiple file extensions can be given, separated by ','.
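
The selection rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not degrotesque's own code: the helper name collect_files and its parameters are hypothetical, and taking a single named file as-is, without applying the extension filter, is an assumption.

```python
import glob
import os


def collect_files(name, recursive=False, extensions=None):
    """Return the sorted list of files selected by one 'input' argument.

    Hypothetical helper that mirrors the documented rules only.
    """
    if os.path.isdir(name):
        if recursive:
            # Walk the folder and all of its subfolders.
            found = [os.path.join(root, f)
                     for root, _dirs, files in os.walk(name)
                     for f in files]
        else:
            # Only the files directly inside the folder.
            found = [os.path.join(name, f) for f in os.listdir(name)
                     if os.path.isfile(os.path.join(name, f))]
    elif "*" in name:
        # A name with an asterisk is expanded as a wildcard pattern.
        found = [p for p in glob.glob(name) if os.path.isfile(p)]
    else:
        # Assumption: a single named file is taken as-is.
        return [name]
    if extensions:
        # 'extensions' is the comma-separated value of -e / --extensions.
        wanted = {e.strip().lstrip(".").lower() for e in extensions.split(",")}
        found = [p for p in found
                 if os.path.splitext(p)[1].lstrip(".").lower() in wanted]
    return sorted(found)
```

For example, collect_files("my_folder", recursive=True, extensions="html,md") would list only the HTML and Markdown files below “my_folder”.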

By default, the files are assumed to be encoded using UTF-8. You may change the encoding using the option -E <ENCODING> / --encoding <ENCODING>.

The files are read one by one, and plain characters are replaced by nicer ones according to a chosen set of “actions”. Known actions are given in Appendix A. You may select the actions to apply using the -a <ACTION_NAME>[,<ACTION_NAME>]* / --actions <ACTION_NAME>[,<ACTION_NAME>]* option. The default actions are ‘quotes.english’, ‘dashes’, ‘ellipsis’, ‘math’, ‘apostrophe’, and ‘commercial’.
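
To make the idea of named actions concrete, here is a toy sketch. It assumes that an action is simply a named list of plain-to-typographic replacements; the table TOY_ACTIONS and both function names are hypothetical, and the two rule sets shown are simplified stand-ins rather than the real tables from Appendix A.

```python
# Toy rule tables for two action names; NOT the tool's actual rules.
TOY_ACTIONS = {
    "ellipsis": [("...", "\u2026")],
    "commercial": [("(c)", "\u00a9"), ("(r)", "\u00ae"), ("(tm)", "\u2122")],
}


def parse_action_names(value):
    """Split the comma-separated value of -a / --actions into names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def apply_actions(text, names):
    """Apply the replacements of each named action in the given order."""
    for name in names:
        if name not in TOY_ACTIONS:
            raise ValueError(f"unknown action: {name}")
        for plain, nice in TOY_ACTIONS[name]:
            text = text.replace(plain, nice)
    return text
```

Calling apply_actions(text, parse_action_names("ellipsis,commercial")) then applies both toy actions to text.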

By default, Unicode characters are inserted (e.g. ‘—’ for an em dash). You may change this using the --format <FORMAT> / -f <FORMAT> option. The following formats are currently supported:

  • ‘unicode’: uses numeric entities (e.g. ‘&#8212;’ for an ‘—’);
  • ‘html’: uses HTML entities (e.g. ‘&mdash;’ for an ‘—’);
  • ‘char’: uses plain (UTF-8) characters (e.g. ‘—’ for an ‘—’).
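
The difference between the three formats can be shown with a short Python sketch. The function render is a hypothetical name used only for illustration; it relies on the standard library table html.entities.codepoint2name and says nothing about how degrotesque itself stores its replacements. Falling back to a numeric entity when no named entity exists is an assumption.

```python
from html.entities import codepoint2name


def render(char, fmt):
    """Render a single character in the 'html', 'unicode', or 'char' format."""
    code = ord(char)
    if fmt == "char":
        # Plain character, written to the file in its encoding.
        return char
    if fmt == "unicode":
        # Decimal numeric character reference.
        return f"&#{code};"
    if fmt == "html":
        # Named entity; numeric fallback if no name is known (assumption).
        name = codepoint2name.get(code)
        return f"&{name};" if name else f"&#{code};"
    raise ValueError(f"unknown format: {fmt}")


em_dash = "\u2014"
print(render(em_dash, "unicode"))  # &#8212;
print(render(em_dash, "html"))     # &mdash;
```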

degrotesque tries to determine whether the read files are plain text files, Markdown files, or XML/HTML derivatives using the files' extensions and contents. Appendix B lists the extensions by which files are recognized as HTML / Markdown files. To be safe, you may set the file type explicitly using the -t <TYPE> / --type <TYPE> option. The following types are currently recognized:

  • ‘sgml’: used for processing XML/HTML documents;
  • ‘text’: used for processing plain text files;
  • ‘md’: used for processing Markdown documents;
  • ‘doxygen’: used for processing files documented using the Doxygen syntax;
  • ‘python’: used for processing Python files;
  • ‘rst’: used for processing reStructuredText documents.

When parsing XML/HTML files, the script does not change quotation marks within elements. In addition, the contents of several elements, such as <code> or <pre>, are skipped. You may change the list of elements whose contents shall not be processed using the option -s <ELEMENT_NAME>[,<ELEMENT_NAME>]* / --skip <ELEMENT_NAME>[,<ELEMENT_NAME>]*. The list of elements that are skipped by default is given in Appendix C. This works only if the set or determined file type is ‘sgml’.
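
The effect of skipping can be illustrated with Python's standard html.parser module. This is a rough sketch of the idea only and makes no claim about how degrotesque parses documents: the class SkippingRewriter and the function rewrite are hypothetical names, the default skip list ("code", "pre") contains just the two elements named above (the real default list is in Appendix C), and end tags are re-emitted in lower case.

```python
from html.parser import HTMLParser


class SkippingRewriter(HTMLParser):
    """Re-emit a document, transforming text only outside skipped elements."""

    def __init__(self, skip, transform):
        super().__init__(convert_charrefs=False)
        self.skip = set(skip)
        self.transform = transform
        self.depth = 0   # > 0 while inside a skipped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1
        # The tag itself, including quoted attributes, is copied verbatim.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.skip and self.depth > 0:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data if self.depth else self.transform(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite(html_text, transform, skip=("code", "pre")):
    """Apply 'transform' to all text that is not inside a skipped element."""
    rewriter = SkippingRewriter(skip, transform)
    rewriter.feed(html_text)
    rewriter.close()
    return "".join(rewriter.out)
```

With transform set to a function that replaces "..." by an ellipsis character, the text of a <p> element would be changed while the contents of <pre> stay untouched.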

When parsing Markdown and reStructuredText files, code is skipped, as are quotes. When parsing Doxygen files, only the contents of the Doxygen comments are processed. In Python files, only comments are processed, skipping doctest parts. The complete content of text files is processed. URLs and ISBN/ISSN numbers are always skipped (in text files as well); see Appendix D.

After the actions have been applied to its contents, the file is saved. By default, a backup of the original file is saved under the same name with the suffix “.orig” appended. You may omit the creation of these backup files using the option -B / --no-backup.
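
The save step can be sketched as follows. The helper save_with_backup is a hypothetical name, and the sketch assumes that the backup is a plain copy of the original file made before the processed text is written.

```python
import shutil


def save_with_backup(path, new_content, encoding="utf-8", backup=True):
    """Write new_content to path, keeping the original as path + '.orig'."""
    if backup:
        # Corresponds to the default; -B / --no-backup would pass backup=False.
        shutil.copyfile(path, path + ".orig")
    with open(path, "w", encoding=encoding) as handle:
        handle.write(new_content)
```

Processing “my_page.html” this way would leave the original text in “my_page.html.orig”.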

You may also define all the options in a configuration file. The options set within the configuration file must be preceded by a line with "[degrotesque]". You can define the configuration file to load using the option -c <FILE> / --config <FILE>. You may generate a configuration file that contains the currently given options using the option -w <FILE> / --write-config <FILE>.
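
A file with a "[degrotesque]" section header has the shape of an INI file, which Python's configparser module can read. The sketch below shows what such a file might look like and how it could be parsed. The key names (actions, format) are assumed to mirror the long option names; this document does not confirm the exact keys, so generate a real file with -w / --write-config to see them. The names EXAMPLE_CONFIG and read_options are hypothetical.

```python
import configparser

# Hypothetical file contents; the key names are an assumption.
EXAMPLE_CONFIG = """\
[degrotesque]
actions = quotes.german,dashes
format = html
"""


def read_options(text):
    """Return the options of the [degrotesque] section as a dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["degrotesque"])


options = read_options(EXAMPLE_CONFIG)
print(options["actions"].split(","))
```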

The option --help / -h prints a help screen. The option --version prints degrotesque's version number.

Examples

degrotesque --actions quotes.german my_page.html

Replaces single and double quotes within the file “my_page.html” by their typographic German counterparts.

degrotesque --recursive --no-backup my_folder

Applies the default actions to all files in the folder “my_folder” and all subfolders. No backup files are generated. The format of each file is determined using the file's extension.

Command line arguments

The script can be started on the command line with the following options:

  • --config/-c <FILE>: Load options from the named configuration file
  • --recursive/-r: Set if the folder — if given — shall be processed recursively
  • --extensions/-e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed
  • --encoding/-E <ENCODING>: The assumed encoding of the files
  • --type/-t <TYPE>: Defines the file type of the read files [‘sgml’, ‘text’, ‘md’, ‘doxygen’, ‘python’, ‘rst’]
  • --no-backup/-B: Set if no backup files shall be generated
  • --format/-f <FORMAT>: Define the format of the replacements [‘html’, ‘unicode’, ‘char’]
  • --skip/-s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed
  • --actions/-a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied
  • --write-config/-w <FILE>: Save the current options into the named file (generate a configuration file)
  • --help/-h: Prints the help screen
  • --version: Prints the version
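
For readers who want to see how these options fit together, the option list can be mirrored with Python's argparse module. This is a sketch reconstructed from the synopsis and the list above, not the script's actual parser: build_parser is a hypothetical name, the version string is a placeholder, and only the UTF-8 default for the encoding is taken from this document.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options documented above."""
    parser = argparse.ArgumentParser(prog="degrotesque")
    parser.add_argument("input", help="file, wildcard pattern, or folder")
    parser.add_argument("-c", "--config", metavar="FILE")
    # Placeholder text; the real version number is not stated here.
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    parser.add_argument("-r", "--recursive", action="store_true")
    parser.add_argument("-e", "--extensions")
    parser.add_argument("-E", "--encoding", default="utf-8")
    parser.add_argument("-t", "--type",
                        choices=["sgml", "text", "md", "doxygen",
                                 "python", "rst"])
    parser.add_argument("-B", "--no-backup", action="store_true")
    parser.add_argument("-f", "--format", choices=["html", "unicode", "char"])
    parser.add_argument("-s", "--skip")
    parser.add_argument("-a", "--actions")
    parser.add_argument("-w", "--write-config", metavar="FILE")
    return parser


args = build_parser().parse_args(["--recursive", "--no-backup", "my_folder"])
print(args.recursive, args.no_backup, args.input)
```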