Changelog
vUnreleased (2026-06-11)
No significant changes.
v0.2.0 (2026-06-11)
Features - 0.2.0
Add a WHATWG-conformant HTML tokenizer: turbohtml.tokenize() for whole strings, the streaming turbohtml.Tokenizer, and the turbohtml.Token / turbohtml.TokenType types. The C state machine is validated against the html5lib-tests tokenizer conformance suite and bulk-scans text runs the way html5ever does. (#6)
Speed up escape and unescape across the board:
escape: one-byte strings are classified sixteen bytes at a time with NEON / SSE2 (a SWAR word elsewhere); on NEON a single low-nibble table lookup matches all five specials at once. The sizing pass accumulates growth branchlessly, and the writing pass copies clean stretches wholesale and rewrites only the positions a match bitmask singles out. UCS-2 / UCS-4 text is probed for all special characters in a single SWAR pass instead of one PyUnicode_FindChar sweep per character (UCS-4 with a four-lane NEON vector).
unescape: the scan hops between & occurrences and bulk-copies the clean spans instead of funnelling every character through a per-code-point emit. Output staging stays at the input’s width until a reference actually widens it, and the entities html.escape emits resolve with one comparison instead of the full binary search; unescaping escaped real HTML runs about three times faster than via the general lookup path alone.
The benchmark now uses pyperf with multi-MiB real documents referenced as pinned git submodules under tools/bench-data - by @gaborbernat. (#7)
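The unescape strategy described in this entry can be illustrated in pure Python. This is only a minimal sketch of the idea, not the library's C implementation: the name unescape_fast_path and the FAST_REFS table are hypothetical, and the sketch resolves only the five references that html.escape() emits, leaving every other & untouched where a real implementation would fall back to the general lookup.

```python
import html

# The five references html.escape(quote=True) can emit.
FAST_REFS = (
    ("&amp;", "&"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&#x27;", "'"),
)


def unescape_fast_path(text: str) -> str:
    """Resolve only the references html.escape() emits; copy the rest verbatim."""
    parts = []
    pos = 0
    while True:
        amp = text.find("&", pos)  # hop straight to the next '&'
        if amp == -1:
            parts.append(text[pos:])  # bulk-copy the clean tail
            break
        parts.append(text[pos:amp])  # bulk-copy the clean span before '&'
        for ref, char in FAST_REFS:
            if text.startswith(ref, amp):  # one direct comparison per candidate
                parts.append(char)
                pos = amp + len(ref)
                break
        else:
            # Not one of the five: a full implementation would run the general
            # named/numeric reference lookup here. This sketch keeps the '&'.
            parts.append("&")
            pos = amp + 1
    return "".join(parts)


sample = '<a href="x?a=1&b=2">it\'s</a>'
assert unescape_fast_path(html.escape(sample)) == sample
```

Because text that came out of html.escape() contains only those five references, the round trip above never reaches the fallback branch, which is the case this changelog entry calls out as the common one.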
v0.1.1 (2026-06-09)
Packaging updates - 0.1.1
Publish each wheel artifact in its own job so PEP 740 attestations finish within the Sigstore signing identity’s lifetime, fixing the sigstore.oidc.ExpiredIdentity failure that blocked the first PyPI upload - by @gaborbernat. (#4)
v0.1.0 (2026-06-09)
Features - 0.1.0
Add C-accelerated turbohtml.escape() and turbohtml.unescape(), matching html.escape() and html.unescape() byte for byte, with free-threading support and per-interpreter wheels for CPython 3.10 through 3.15 - by @gaborbernat. (#1)
Speed up escape of non-ASCII (UCS-2/UCS-4) text that needs no escaping by probing for special characters with a vectorized scan instead of a scalar one, making it several times faster and ahead of html.escape() - by @gaborbernat. (#3)
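To show what probing several characters at once can look like, here is a rough pure-Python sketch of a SWAR (SIMD within a register) check over 16-bit lanes. It is an illustration under stated assumptions, not the library's C code: the names word_has_special and needs_escape are hypothetical, UTF-16-LE encoding is used only as a convenient way to obtain 16-bit lanes, and the classic "has a zero lane" bit trick stands in for whatever vector instructions the real scan uses.

```python
import html

ESCAPED = "&<>\"'"  # the five characters html.escape(quote=True) rewrites

LANES = 4                      # four 16-bit lanes per 64-bit word
ONES = 0x0001_0001_0001_0001   # value 1 in every lane
HIGHS = 0x8000_8000_8000_8000  # top bit of every lane


def word_has_special(word: int) -> bool:
    """True if any 16-bit lane of `word` equals one of the ESCAPED characters."""
    for ch in ESCAPED:
        v = word ^ (ONES * ord(ch))  # lanes equal to `ch` become zero
        if (v - ONES) & ~v & HIGHS:  # nonzero only if some lane is zero
            return True
    return False


def needs_escape(text: str) -> bool:
    """Scan four code units per step instead of one character at a time."""
    data = text.encode("utf-16-le")  # assumption: just a way to get 16-bit lanes
    step = LANES * 2
    for i in range(0, len(data), step):
        # Pad the tail with zero lanes; NUL is not a special character.
        chunk = data[i:i + step].ljust(step, b"\0")
        if word_has_special(int.from_bytes(chunk, "little")):
            return True
    return False


for s in ["héllo wörld", "日本語 <b>", "naïve café & bar", ""]:
    assert needs_escape(s) == (html.escape(s) != s)
```

When the probe reports no special characters, an escape routine can return the input unchanged, which is the "needs no escaping" case this entry speeds up.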
Improved documentation - 0.1.0
Document the measured escape / unescape speedups over the standard library in the README and the docs, add a reproducible benchmark behind tox -e bench, and give the API reference typed signatures with intersphinx links - by @gaborbernat. (#2)
Miscellaneous internal changes - 0.1.0
Automate releases the tox-dev way: git-tag-derived versioning, a towncrier-managed changelog, and a manual prepare-release workflow that tags and triggers the trusted-publishing wheel build - by @gaborbernat. (#1)