Skip to content

Link in code comment no longer relevant for HTML unescaping #100210

@jcamiel

Description

@jcamiel

Documentation

The link in

# see http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references
is not longer relevant and should be replace:

The link should explain the source of the replacements table:

# see http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references _invalid_charrefs = { 0x00: '\ufffd', # REPLACEMENT CHARACTER 0x0d: '\r', # CARRIAGE RETURN 0x80: '\u20ac', # EURO SIGN 0x81: '\x81', # <control> 0x82: '\u201a', # SINGLE LOW-9 QUOTATION MARK 0x83: '\u0192', # LATIN SMALL LETTER F WITH HOOK 0x84: '\u201e', # DOUBLE LOW-9 QUOTATION MARK 0x85: '\u2026', # HORIZONTAL ELLIPSIS 0x86: '\u2020', # DAGGER 0x87: '\u2021', # DOUBLE DAGGER 0x88: '\u02c6', # MODIFIER LETTER CIRCUMFLEX ACCENT 0x89: '\u2030', # PER MILLE SIGN 0x8a: '\u0160', # LATIN CAPITAL LETTER S WITH CARON

Linked PRs

Metadata

Metadata

Assignees

Labels

docsDocumentation in the Doc dir

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions