HTMLParser does not support escapable raw text mode (<textarea> and <title>) #118350

@savchenko

Description

Bug report

Bug description:

An example where parsing stops after the <style color="red"> tag:

from html.parser import HTMLParser
from io import StringIO

class HTML2text(HTMLParser):
    def __init__(self):
        super().__init__()
        self.data = StringIO()

    def handle_data(self, html):
        self.data.write(html)

    def get_data(self):
        return self.data.getvalue().strip()

html_test = '''
<!DOCTYPE html>
<head><title>Glued</title></head><body><some><style color="red">title</bar>
<h1>Spacious </h1><a href="https://heading.net">heading.net</a>
<span>not<a href="https://www.arpa.home">my.home.arpa</a><p> URL</p>
</body></html>
'''

parser = HTML2text()
parser.feed(html_test)
print(parser.get_data())

Changing a single character in the word "style" restores the normal functionality.
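A minimal sketch of that comparison, for anyone who wants to reproduce it side by side. The names `TextCollector`, `extract_text`, `SAMPLE` and `RENAMED` are made up for this illustration, and the markup is a shortened version of the test string above. The call to `close()` is an addition that the original snippet does not make. What the `<style>` variant prints depends on the Python version in use, so no expected output is given for it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every chunk passed to handle_data (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup):
    """Feed markup to a fresh parser and return the collected text."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # not in the original report; asks the parser to finish up
    return "".join(parser.chunks).strip()


# Shortened version of the markup from the report.
SAMPLE = (
    '<head><title>Glued</title></head><body><some>'
    '<style color="red">title</bar>\n'
    '<h1>Spacious </h1><a href="https://heading.net">heading.net</a>\n'
    '</body></html>'
)

# Same markup with one character dropped from the tag name, so the parser
# no longer recognizes it as a <style> element.
RENAMED = SAMPLE.replace("<style", "<styl")

with_style = extract_text(SAMPLE)
with_renamed = extract_text(RENAMED)

print(repr(with_style))    # version-dependent
print(repr(with_renamed))  # includes the text after the renamed tag
```

With the renamed tag, the heading and link text after it reach `handle_data` as ordinary text. With `<style>`, the text before the tag is still delivered, and the handling of everything after it is what this report is about.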

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Linked PRs

Metadata

Assignees

Labels

3.10: only security fixes
3.11: only security fixes
3.12: only security fixes
3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
3.9 (EOL): end of life
type-bug: An unexpected behavior, bug, or error
type-security: A security issue

Projects

Status

Done
