Reference
turbohtml: fast, typed HTML utilities powered by a C-accelerated core.
- class turbohtml.Token
An HTML token produced by Tokenizer or tokenize(). Immutable; the meaningful attributes depend on .type.
- attr(name, default=None)
Return the value of attribute name on a start or end tag. A valueless attribute yields None; a missing attribute yields default.
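Because a valueless attribute already yields None, the default of None cannot tell "present without a value" apart from "absent". The following is a minimal sketch of one way to separate the three cases with a sentinel default. `describe_attr` and `_MISSING` are hypothetical names, and the call assumes that the attribute name is passed as the first positional argument and that `default` may be passed by keyword.

```python
_MISSING = object()  # hypothetical sentinel; never a real attribute value


def describe_attr(token, name):
    """Classify attribute `name` on a tag token as value, valueless, or missing."""
    # Assumption: attr() takes the attribute name first and accepts default= by keyword.
    value = token.attr(name, default=_MISSING)
    if value is _MISSING:
        return "missing"      # attribute not present on the tag
    if value is None:
        return "valueless"    # attribute present but written without a value
    return f"value={value!r}"
```

For a start tag such as `<input disabled name="q">`, this helper would be expected to report `disabled` as valueless, `name` as a value, and `id` as missing.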
- class turbohtml.Tokenizer
Streaming HTML tokenizer. Feed markup with feed() and iterate the iterator that each call returns. At end of input, either call close() and iterate its result, or use the tokenizer as a context manager: leaving the with block signals end of input, after which iterating the tokenizer itself yields the remaining tokens. To tokenize a whole string at once, use tokenize().
- close()
Signal end of input and return an iterator over the final tokens, flushing any buffered text and the token in progress.
- feed()
Append a chunk of markup and return an iterator over the tokens that are now complete. Text before an unfinished tag stays buffered until more is fed or close() is called.
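The feed()/close() protocol described above can be wrapped in a small generator. This is a sketch rather than part of the library: `iter_tokens` is a hypothetical helper, and it assumes only that `feed(chunk)` and `close()` each return an iterable of tokens, as documented.

```python
def iter_tokens(tokenizer, chunks):
    """Yield tokens as they complete while markup arrives in pieces."""
    for chunk in chunks:
        # Tokens completed by this chunk; anything unfinished stays buffered.
        yield from tokenizer.feed(chunk)
    # End of input: flush buffered text and the token in progress.
    yield from tokenizer.close()


# Intended use with the real class (not executed here):
#     from turbohtml import Tokenizer
#     for token in iter_tokens(Tokenizer(), ["<p>Hel", "lo</p>"]):
#         ...
```

Draining each `feed()` iterator before feeding the next chunk keeps tokens in document order and avoids holding more than one chunk's worth of pending tokens.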
- turbohtml.__version__ = '0.2.0'
The installed package version.
- turbohtml.escape(s, quote=True)
Replace special characters “&”, “<” and “>” with HTML-safe sequences.
If the optional flag quote is true (the default), the quotation mark characters, both double quote (") and single quote ('), are also translated.
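A short sketch of where the quote flag matters: attribute values need quotes escaped, while text content does not. `render_link` is a hypothetical helper. The fallback import assumes that the standard library's `html.escape`, whose description reads the same, behaves equivalently; that equivalence is an assumption, not something this reference states.

```python
try:
    from turbohtml import escape
except ImportError:
    # Assumption: stdlib html.escape is a behavioural stand-in.
    from html import escape


def render_link(url, label):
    """Build an anchor tag with an escaped attribute value and escaped text."""
    href = escape(url)                 # quote=True: safe inside "..." attributes
    text = escape(label, quote=False)  # text nodes only need &, < and > escaped
    return f'<a href="{href}">{text}</a>'


print(render_link("/s?a=1&b=2", "x < y"))
```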
- turbohtml.tokenize(s, /)
Tokenize a whole HTML string, returning an iterator of Token objects following the WHATWG tokenization algorithm.
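Since tokenize() returns an iterator, its result can be consumed in a single pass without building a list. A minimal sketch that counts tokens per kind follows. `type_histogram` is a hypothetical helper, and it assumes only that each token exposes a hashable `.type` attribute; the concrete type values are not listed in this reference.

```python
from collections import Counter


def type_histogram(tokens):
    """Count tokens by their .type in one pass over any token iterable."""
    return Counter(token.type for token in tokens)


# Intended use with the real function (not executed here):
#     from turbohtml import tokenize
#     print(type_histogram(tokenize("<p>hi</p>")))
```

Wrap the call in `list(...)` first if the tokens are needed more than once, because an iterator is exhausted after one pass.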
- turbohtml.unescape(s, /)
Convert all named and numeric character references in s to the corresponding Unicode characters, following the HTML5 rules.
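A brief sketch covering named, decimal, and hexadecimal references. As with escape(), the fallback import assumes that the standard library's `html.unescape` follows the same HTML5 rules and can stand in when turbohtml is not installed.

```python
try:
    from turbohtml import unescape
except ImportError:
    # Assumption: stdlib html.unescape is a behavioural stand-in.
    from html import unescape

named = unescape("&lt;b&gt; &amp; &copy;")  # named character references
numeric = unescape("&#233; &#xE9;")         # decimal and hexadecimal forms

print(named)
print(numeric)
```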