API
Module holding a class that computes the mask for HTML documents.
DegrotesqueHTMLMarker
Bases: DegrotesqueMarker
A class that returns the mask for SGML (HTML/XML) documents.
Masks all element definitions (opening, closing, and single) and everything else that lies between < and >. Masks the contents of code elements (<pre>, <code>, and others). Masks links.
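The masking rules above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `sketch_mask_with_code`, the regular expressions, and the restriction to `<pre>` and `<code>` are assumptions made for illustration, and link masking is left out.

```python
import re

# Hypothetical patterns, not taken from the library.
# A whole <pre>...</pre> or <code>...</code> element, including its contents.
_CODE_ELEMENT = re.compile(r"<(pre|code)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Anything that sits between < and >.
_TAG = re.compile(r"<[^>]*>")


def sketch_mask_with_code(document: str) -> str:
    """Return a mask with one character per document character (assumed):
    '1' for tags and for the contents of code elements, '0' otherwise."""
    mask = ["0"] * len(document)
    for pattern in (_CODE_ELEMENT, _TAG):
        for match in pattern.finditer(document):
            for i in range(match.start(), match.end()):
                mask[i] = "1"
    return "".join(mask)
```

In this sketch, the text inside `<code>` is masked along with the tags, while text inside an ordinary element such as `<i>` stays unmasked.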
get_extensions()
Returns the extensions of file types that can be processed using this marker.
| Returns: |
|---|
| The extensions of the file types that this marker can process. |
get_mask(document, to_skip=None)
Returns a string in which all HTML elements are denoted as '1' and plain content as '0'.
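| Parameters: |
|---|
| document |
| to_skip (optional, defaults to None) |

| Returns: |
|---|
| A string in which HTML elements are denoted as '1' and plain content as '0'. |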
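As a rough illustration of the '1'/'0' convention, the following sketch scans a document character by character. It assumes the mask has exactly one character per document character, which the description suggests but does not state. The name `sketch_tag_mask` is hypothetical, and the sketch ignores `to_skip`, code elements, and links.

```python
def sketch_tag_mask(document: str) -> str:
    """Mark every character from '<' up to and including '>' as '1',
    and every other character as '0'."""
    mask = []
    inside_tag = False
    for ch in document:
        if ch == "<":
            inside_tag = True
        mask.append("1" if inside_tag else "0")
        if ch == ">":
            inside_tag = False
    return "".join(mask)


print(sketch_tag_mask("<b>Hi</b>"))  # prints 111001111
```

A caller could then walk the document and the mask in parallel and only touch characters whose mask entry is '0'.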
_get_tag_name(document)
Returns the name of the tag that starts at the beginning of the given string.
| Parameters: |
|---|
| document |

| Returns: |
|---|
| The name of the tag that starts at the beginning of the given string. |
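One way to extract a tag name from the start of a string is sketched below. The function name `sketch_tag_name`, the regular expression, the lowercasing, and the empty-string result for non-tags are all assumptions. The behavior of the actual `_get_tag_name` may differ.

```python
import re

# Hypothetical pattern: '<', an optional '/', then a tag name.
_TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9:-]*)")


def sketch_tag_name(document: str) -> str:
    """Return the lowercased name of the tag at the start of the string,
    or an empty string if the string does not start with a tag."""
    match = _TAG_NAME.match(document)
    return match.group(1).lower() if match else ""


print(sketch_tag_name("<pre class='x'>text"))  # prints pre
print(sketch_tag_name("</CODE>"))              # prints code
```

A name obtained this way could be compared against a list of code element names such as `pre` and `code` to decide whether an element's contents should be masked as well.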