Changelog

vUnreleased (2026-06-11)

No significant changes.

Add a WHATWG-conformant HTML tokenizer: turbohtml.tokenize() for whole strings, the streaming turbohtml.Tokenizer, and the turbohtml.Token / turbohtml.TokenType types. The C state machine is validated against the html5lib-tests tokenizer conformance suite and bulk-scans text runs the way html5ever does. (#6)
Speed up escape and unescape across the board. For escape, one-byte strings are classified sixteen bytes at a time with NEON / SSE2 (a SWAR word elsewhere); on NEON a single low-nibble table lookup matches all five specials at once. The sizing pass accumulates growth branchlessly, the writing pass copies clean stretches wholesale and rewrites only the positions a match bitmask singles out, and UCS-2 / UCS-4 text is probed for all special characters in a single SWAR pass instead of one PyUnicode_FindChar sweep per character (UCS-4 with a four-lane NEON vector). For unescape, the scan hops between & occurrences and bulk-copies the clean spans instead of funnelling every character through a per-code-point emit, output staging stays at the input’s width until a reference actually widens it, and the entities html.escape emits resolve with one comparison instead of the full binary search; unescaping escaped real HTML runs about three times faster than via the general lookup path alone. The benchmark now uses pyperf with multi-MiB real documents referenced as pinned git submodules under tools/bench-data - by @gaborbernat. (#7)
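
The unescape strategy described in this entry (hop between & occurrences, bulk-copy the clean spans, try the html.escape entities first) can be sketched in pure Python. This is an illustration only, not the project's C code: the names FAST_ENTITIES and unescape_fast_path are hypothetical, the sketch tries the five references in a small loop rather than with one comparison, and it leaves every other reference untouched where a real implementation would fall back to the general lookup.

```python
import html

# The five references html.escape() emits with its default quote=True.
FAST_ENTITIES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&#x27;": "'",
}


def unescape_fast_path(text: str) -> str:
    """Undo html.escape() by hopping from one '&' to the next (hypothetical helper)."""
    out = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp == -1:
            out.append(text[pos:])  # trailing clean span, copied in one piece
            break
        out.append(text[pos:amp])  # clean span before the '&', copied in one piece
        for entity, char in FAST_ENTITIES.items():
            if text.startswith(entity, amp):
                out.append(char)
                pos = amp + len(entity)
                break
        else:
            # Not one of the five: this sketch keeps the '&' literally;
            # a full unescape would resolve the reference here instead.
            out.append("&")
            pos = amp + 1
    return "".join(out)


sample = '<a href="x?a=1&b=2">it\'s</a> & more'
assert unescape_fast_path(html.escape(sample)) == sample
```

Because clean spans are appended as slices, text without any & costs a single find plus one copy, which is the property the entry attributes to the bulk-copy approach.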

Publish each wheel artifact in its own job so PEP 740 attestations finish within the Sigstore signing identity’s lifetime, fixing the sigstore.oidc.ExpiredIdentity failure that blocked the first PyPI upload - by @gaborbernat. (#4)

Add C-accelerated turbohtml.escape() and turbohtml.unescape(), matching html.escape() and html.unescape() byte for byte, with free-threading support and per-interpreter wheels for CPython 3.10 through 3.15 - by @gaborbernat. (#1)
Speed up escape of non-ASCII (UCS-2/UCS-4) text that needs no escaping by probing for special characters with a vectorized scan instead of a scalar one, making it several times faster and ahead of html.escape() - by @gaborbernat. (#3)
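
To make the idea of probing wide text for special characters concrete, here is a minimal pure-Python sketch of a SWAR-style check over 16-bit lanes, four UCS-2 code units per 64-bit word. It is an assumption-laden illustration rather than the project's implementation: the names ONES, HIGHS, word_has_lane and needs_escaping are hypothetical, the text is viewed as UTF-16 code units instead of CPython's internal storage, and the five specials are tested one after another per word.

```python
ONES = 0x0001_0001_0001_0001   # the value 1 in each 16-bit lane
HIGHS = 0x8000_8000_8000_8000  # the top bit of each 16-bit lane
SPECIALS = tuple(ord(c) for c in "&<>\"'")


def word_has_lane(word: int, value: int) -> bool:
    """Return True if any 16-bit lane of `word` equals `value`."""
    x = word ^ (value * ONES)  # lanes equal to `value` become zero
    # Classic "has a zero lane" test: nonzero iff some lane of x is zero.
    return ((x - ONES) & ~x & HIGHS) != 0


def needs_escaping(text: str) -> bool:
    """Probe `text` four code units at a time for any special character."""
    data = text.encode("utf-16-le")
    for start in range(0, len(data), 8):
        # A short final chunk is zero-extended; lane value 0 matches no special.
        word = int.from_bytes(data[start:start + 8], "little")
        if any(word_has_lane(word, value) for value in SPECIALS):
            return True
    return False


assert needs_escaping("héllo <b>")
assert not needs_escaping("naïve café ☕")
```

Comparing whole 16-bit lanes matters here: U+2615 (☕) contains the byte 0x26, the code of &, yet it is not reported as a match because the lane value 0x2615 differs from 0x0026.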

Document the measured escape/unescape speedups over the standard library in the README and the docs, add a reproducible benchmark behind tox -e bench, and give the API reference typed signatures with intersphinx links - by @gaborbernat. (#2)

Automate releases the tox-dev way: git-tag-derived versioning, a towncrier-managed changelog, and a manual prepare-release workflow that tags and triggers the trusted-publishing wheel build - by @gaborbernat. (#1)