@KitaitiMakoto
Contributor

Hello,

I found that parsing XHTML documents like the one below has failed since v3.3.3:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>XHTML Document</title>
  </head>
  <body>
    <h1>XHTML Document</h1>
    <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
  </body>
</html>
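A minimal reproduction sketch, assuming the library under discussion is REXML and that it is loaded with `require "rexml/document"`. The helper name `root_xml_lang` is hypothetical and only serves this illustration. On affected releases the parse is expected to raise `REXML::ParseException`, in which case the helper returns `nil`; on releases that accept the document it returns the root element's `xml:lang` value.

```ruby
require "rexml/document"

# The sample document from this report.
XHTML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
      <title>XHTML Document</title>
    </head>
    <body>
      <h1>XHTML Document</h1>
      <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
    </body>
  </html>
XML

# Hypothetical helper (not part of REXML): parse the source and return the
# root element's xml:lang value, or nil if the parser rejects the document.
def root_xml_lang(source)
  REXML::Document.new(source).root.attributes["xml:lang"]
rescue REXML::ParseException => e
  warn "parse failed: #{e.message}"
  nil
end

p root_xml_lang(XHTML)
```

Running the script against two installed versions of the gem is one quick way to see whether a given release is affected.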

The XML namespace spec is a little ambiguous, but the document above is valid according to an article published by the W3C.

I fixed the parsing algorithm. Can you review it?
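The diff itself is not reproduced in this conversation, so the following is only a sketch of the namespace rule at stake, not the code from this PR. It assumes the rule from the Namespaces in XML recommendation that the `xml` prefix is always bound to `http://www.w3.org/XML/1998/namespace` without being declared, and that unprefixed attributes belong to no namespace. The helper `expand_attribute_name` is hypothetical.

```ruby
# Namespace the "xml" prefix is bound to by definition.
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

# Hypothetical helper, not REXML's API: resolve an attribute's qualified name
# to [namespace_uri, local_name], given the prefixes declared in scope.
def expand_attribute_name(qname, declared_prefixes)
  prefix, local = qname.include?(":") ? qname.split(":", 2) : [nil, qname]
  # Unprefixed attributes are in no namespace; the default namespace
  # applies to elements only.
  return [nil, local] if prefix.nil?

  # "xml" is predefined and cannot be rebound, so it is merged in last.
  bindings = declared_prefixes.merge("xml" => XML_NAMESPACE)
  uri = bindings.fetch(prefix) { raise ArgumentError, "undeclared prefix: #{prefix}" }
  [uri, local]
end

# The sample <html> element declares only a default namespace, no prefixes.
names = %w[xml:lang lang].map { |qname| expand_attribute_name(qname, {}) }
p names
p names.uniq.size == names.size # => true
```

With the predefined binding in place, `xml:lang` and `lang` expand to different names, so they are not duplicates of each other even though they share the local name `lang`.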

As an aside, the <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> style of language declaration is often used in XHTML files inside EPUB files, because the sample EPUB files provided by the IDPF, the former EPUB spec authority, use that style.

@KitaitiMakoto KitaitiMakoto changed the title from "Lang attr" to 'Fix handling with "xml:" prefixed namespace' on Sep 27, 2024
Member

@kou kou left a comment

Good catch!

@KitaitiMakoto
Contributor Author

@kou Thanks for your review! I fixed the patches.

@kou kou merged commit 78f8712 into ruby:master Sep 29, 2024
58 of 61 checks passed
@kou
Member

kou commented Sep 29, 2024

Thanks.

@KitaitiMakoto KitaitiMakoto deleted the lang-attr branch September 29, 2024 07:13
