Changelog

vUnreleased (2026-06-11)

No significant changes.

v0.2.0 (2026-06-11)

Features - 0.2.0

  • Add a WHATWG-conformant HTML tokenizer: turbohtml.tokenize() for whole strings, the streaming turbohtml.Tokenizer, and the turbohtml.Token / turbohtml.TokenType types. The C state machine is validated against the html5lib-tests tokenizer conformance suite and bulk-scans text runs the way html5ever does. (#6)

  • Speed up escape and unescape across the board. escape: one-byte strings are classified sixteen bytes at a time with NEON or SSE2 (a SWAR word elsewhere), and on NEON a single low-nibble table lookup matches all five special characters at once. The sizing pass accumulates growth branchlessly, and the writing pass copies clean stretches wholesale and rewrites only the positions a match bitmask singles out. UCS-2 and UCS-4 text is probed for all special characters in a single SWAR pass (a four-lane NEON vector for UCS-4) instead of one PyUnicode_FindChar sweep per character. unescape: the scan hops between & occurrences and bulk-copies the clean spans instead of funnelling every character through a per-code-point emit; output staging stays at the input’s width until a reference actually widens it; and the entities html.escape emits resolve with one comparison instead of the full binary search. Unescaping escaped real HTML runs about three times faster than through the general lookup path alone. The benchmark now uses pyperf with multi-MiB real documents, referenced as pinned git submodules under tools/bench-data - by @gaborbernat. (#7)
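The tokenizer entry's bulk text-run scanning can be illustrated with a rough pure-Python sketch. It is not turbohtml's API or its C code: scan_text_runs is a hypothetical helper that only splits input at the characters the WHATWG data state treats specially (<, &, and U+0000), the points where a real tokenizer would change state.

```python
import re

# Hypothetical illustration, not turbohtml's API: in the WHATWG data state,
# only "<", "&", and U+0000 need special handling, so everything between
# them can be emitted as one text run instead of character by character.
_RUN_END = re.compile(r"[<&\x00]")


def scan_text_runs(doc: str) -> list[tuple[str, str]]:
    """Split doc into ("text", run) and ("special", char) pieces."""
    pieces: list[tuple[str, str]] = []
    pos = 0
    while pos < len(doc):
        match = _RUN_END.search(doc, pos)
        end = match.start() if match else len(doc)
        if end > pos:
            # The whole clean run is emitted in a single step.
            pieces.append(("text", doc[pos:end]))
        if match is None:
            break
        # A real tokenizer would switch state here (tag open, char reference, ...).
        pieces.append(("special", match.group()))
        pos = end + 1
    return pieces
```

For example, scan_text_runs("ab<c&d") yields the runs "ab", "c", and "d", separated by the specials < and &.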
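The escape and unescape techniques in the speed-up entry can be sketched in pure Python, under stated assumptions: word_has_byte, escape_bytes, and unescape_sketch are hypothetical helpers, not turbohtml functions. The SWAR test here checks one special byte per 64-bit word at a time rather than matching all five in a single NEON table lookup. Every reference other than the five html.escape emits falls back to html.unescape, standing in for the general lookup path.

```python
import html

LO = 0x0101010101010101  # 0x01 in every byte lane
HI = 0x8080808080808080  # 0x80 in every byte lane

# Replacements matching html.escape(s, quote=True).
ESCAPES = {ord("&"): b"&amp;", ord("<"): b"&lt;", ord(">"): b"&gt;",
           ord('"'): b"&quot;", ord("'"): b"&#x27;"}
# The entities html.escape emits, checked before the general path.
FAST_REFS = {"&amp;": "&", "&lt;": "<", "&gt;": ">",
             "&quot;": '"', "&#x27;": "'"}


def word_has_byte(word: int, byte: int) -> bool:
    """SWAR test: does any byte lane of the 64-bit word equal `byte`?"""
    x = word ^ (LO * byte)            # matching lanes become 0x00
    return ((x - LO) & ~x & HI) != 0  # classic "has zero byte" trick


def escape_bytes(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 8):
        chunk = data[i:i + 8]
        # A short tail is zero-padded, and 0x00 never matches a special byte.
        word = int.from_bytes(chunk, "little")
        if not any(word_has_byte(word, special) for special in ESCAPES):
            out += chunk  # clean word: copy wholesale
        else:
            for b in chunk:  # rewrite only the special bytes
                out += ESCAPES.get(b, bytes((b,)))
    return bytes(out)


def unescape_sketch(s: str) -> str:
    i = s.find("&")
    if i < 0:
        return s  # no references: nothing to rewrite
    parts = [s[:i]]
    while i >= 0:
        nxt = s.find("&", i + 1)  # hop to the next "&"
        end = len(s) if nxt < 0 else nxt
        for ref, char in FAST_REFS.items():
            if s.startswith(ref, i):
                parts.append(char)
                parts.append(s[i + len(ref):end])  # clean span, bulk-copied
                break
        else:
            # Any other reference: defer to the stdlib for this segment.
            parts.append(html.unescape(s[i:end]))
        i = nxt
    return "".join(parts)
```

Splitting at & is safe because no character reference contains an ampersand, so each segment resolves exactly as it would inside the whole string.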

v0.1.1 (2026-06-09)

Packaging updates - 0.1.1

  • Publish each wheel artifact in its own job so PEP 740 attestations finish within the Sigstore signing identity’s lifetime, fixing the sigstore.oidc.ExpiredIdentity failure that blocked the first PyPI upload - by @gaborbernat. (#4)

v0.1.0 (2026-06-09)

Features - 0.1.0

Improved documentation - 0.1.0

  • Document the measured escape/unescape speedups over the standard library in the README and the docs, add a reproducible benchmark behind tox -e bench, and give the API reference typed signatures with intersphinx links - by @gaborbernat. (#2)

Miscellaneous internal changes - 0.1.0

  • Automate releases the tox-dev way: git-tag-derived versioning, a towncrier-managed changelog, and a manual prepare-release workflow that tags and triggers the trusted-publishing wheel build - by @gaborbernat. (#1)