DEV Community

spO0q

Hugo: remove accents from anchors

So I have this very specific need: removing accents from generated anchors with Hugo.

To my knowledge, English has almost no accented characters, but other languages contain plenty of them.

For example, with a French heading, Hugo generates:

<h2 id="les-frameworks-de-r%C3%A9f%C3%A9rence">

The problem is that if you use {{ .TableOfContents }} to generate a table of contents for your posts, its anchors have to be exactly the same as those generated in {{ .Content }}.

Therefore, you can't manually filter headings or override them in layouts without the table of contents links going out of sync.

This global configuration seems effective:

[markup]
  [markup.goldmark]
    [markup.goldmark.parser]
      autoHeadingIDType = 'github-ascii'

This setting is similar to the default autoHeadingIDType (github), but it strips accents first (é becomes e) and then drops any remaining non-ASCII characters.

Now, I get:

<h2 id="les-frameworks-de-reference">
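
To make the transformation concrete, here is a rough Python approximation of what a github-ascii style ID looks like for a given heading. This is only a sketch, not Hugo's actual implementation (Hugo is written in Go), and the function name github_ascii_anchor is a hypothetical helper made up for this illustration. It assumes that stripping accents means Unicode decomposition followed by dropping the combining marks.

```python
import re
import unicodedata


def github_ascii_anchor(heading: str) -> str:
    """Approximate a 'github-ascii' style heading ID (hypothetical helper)."""
    # Decompose accented characters: "é" becomes "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFD", heading)
    # Drop the combining marks, then anything that is still not ASCII.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # GitHub-style slug: lowercase, drop punctuation, turn spaces into hyphens.
    slug = re.sub(r"[^a-z0-9 _-]", "", ascii_only.lower())
    return slug.strip().replace(" ", "-")


print(github_ascii_anchor("Les frameworks de référence"))
# les-frameworks-de-reference
```

A quick check like this can help predict which anchor a heading will get before bookmarking or linking to it, but the generated HTML remains the source of truth.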

Top comments (2)

david duymelinck

Why would you want to remove information just because one part of the application formats it wrong?
It is perfectly valid to use UTF-8 encoded characters in the id. It is the fragment in the href that needs to be encoded.

<a href="#les-frameworks-de-r%C3%A9f%C3%A9rence">test</a>
<div style="height:3000px"></div>
<div id="les-frameworks-de-référence">content</div>
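
The relationship between the UTF-8 id and the encoded href can be checked with a short Python sketch using the standard library's urllib.parse. The variable names are illustrative only. It shows that percent-encoding the id gives the fragment used in the link, and that decoding the fragment gives back the original id.

```python
from urllib.parse import quote, unquote

# Value of the id attribute, kept as plain UTF-8.
heading_id = "les-frameworks-de-référence"

# Percent-encode the id to build the fragment used in the href.
fragment = quote(heading_id)
print(f'<a href="#{fragment}">test</a>')
# <a href="#les-frameworks-de-r%C3%A9f%C3%A9rence">test</a>

# Decoding the fragment returns the original id, so the link still resolves.
assert unquote(fragment) == heading_id
```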

I also wonder how AI handles those ASCII-ized words. I think it is fair to assume they score lower.

spO0q

I'm not saying it's not valid. I just don't like it, especially when you want to bookmark a URL that points to a specific anchor.

I found various articles saying you have to override the layout, but that breaks if you use the automatic table of contents.