###########
 Tutorials
###########

*****************
 Getting started
*****************

This tutorial walks you from an empty environment to escaping and unescaping your first HTML.
Install turbohtml from PyPI:

.. code-block:: console

    $ pip install turbohtml

Open a Python prompt and escape some text for safe inclusion in an HTML page:

.. code-block:: pycon

    >>> import turbohtml
    >>> turbohtml.escape("5 > 3 & 2 < 4")
    '5 &gt; 3 &amp; 2 &lt; 4'

By default the quotation marks are escaped too, which is what you want inside an attribute value:

.. code-block:: pycon

    >>> turbohtml.escape("name=\"O'Brien\"")
    'name=&quot;O&#x27;Brien&quot;'

Now go the other way and turn HTML character references back into text:

.. code-block:: pycon

    >>> turbohtml.unescape("Tom &amp; Jerry &mdash; caf&eacute;")
    'Tom & Jerry — café'

From here you can stay with the string helpers, or continue below to break whole documents into
tokens.

********************************
 Tokenizing your first document
********************************

This tutorial takes you from a string of HTML to a stream of tokens you can inspect.

Start with a small document and hand it to :func:`turbohtml.tokenize`, which returns an iterator of
:class:`turbohtml.Token` objects:

.. code-block:: pycon

    >>> import turbohtml
    >>> for token in turbohtml.tokenize('<p>Tom &amp; Jerry</p>'):
    ...     print(token)
    Token(START_TAG, tag='p')
    Token(TEXT, data='Tom & Jerry')
    Token(END_TAG, tag='p')

Each token tells you what it is through ``type``, a :class:`turbohtml.TokenType`. Start and end tags
carry the lowercased tag name and the attributes, already decoded:

.. code-block:: pycon

    >>> start, text, end = turbohtml.tokenize('<p>Tom &amp; Jerry</p>')
    >>> start.type