turbohtml
A fast, fully typed HTML toolkit for Python, powered by a C-accelerated core. turbohtml provides spec-correct HTML
escaping and unescaping that match the standard library byte for byte, and a WHATWG-conformant streaming tokenizer — all
several times faster than their pure-Python counterparts and ready for the free-threaded build.
>>> import turbohtml
>>> turbohtml.escape('<a href="?x=1&y=2">Tom & Jerry</a>')
'&lt;a href=&quot;?x=1&amp;y=2&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'
>>> turbohtml.unescape("caf&eacute; &amp; r&eacute;sum&eacute;")
'café & résumé'
>>> [token.tag or token.data for token in turbohtml.tokenize("<p>Tom &amp; Jerry</p>")]
['p', 'Tom & Jerry', 'p']
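The byte-for-byte claim for escaping and unescaping can be spot-checked against the standard library's `html.escape` and `html.unescape`. The following is a minimal sketch, not part of turbohtml: the helper `parity_mismatches` and the sample lists are hypothetical names introduced here for illustration, and the script falls back to the standard library when turbohtml is not installed so that it still runs.

```python
import html

# Hypothetical sample inputs: each character html.escape rewrites, plus
# named, decimal, and hexadecimal character references for unescape.
ESCAPE_SAMPLES = [
    '<a href="?x=1&y=2">Tom & Jerry</a>',
    "it's",
    "plain text",
]
UNESCAPE_SAMPLES = [
    "caf&eacute; &amp; r&eacute;sum&eacute;",
    "&#233;&#xe9;",
    "&lt;p&gt;",
]


def parity_mismatches(escape_fn, unescape_fn):
    """Return (function name, input) pairs where a candidate differs from the stdlib."""
    mismatches = []
    for text in ESCAPE_SAMPLES:
        if escape_fn(text) != html.escape(text):
            mismatches.append(("escape", text))
    for text in UNESCAPE_SAMPLES:
        if unescape_fn(text) != html.unescape(text):
            mismatches.append(("unescape", text))
    return mismatches


try:
    # Assumption: turbohtml is installed and exposes escape/unescape as shown above.
    import turbohtml as candidate
except ImportError:
    # Fall back to the stdlib so the sketch still runs; this trivially passes.
    candidate = html

# An empty list means every sample matched the standard library.
print(parity_mismatches(candidate.escape, candidate.unescape))
```

A handful of samples only catches gross differences. For broader coverage, the same comparison could be driven by a fuzzer or a property-based testing tool.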
The documentation follows the Diátaxis framework.