Posted on Jul 27, 2023 • Originally published at tnvmadhav.me

How to Generate Table of Contents From Markdown/Html Text in Next.js

Introduction

I wanted a programmatic way to generate a 'Table of Contents' HTML snippet from existing markdown text, and then extract it, for my Next.js blogging website www.notionworkspaces.com.¹

These are the benefits of this approach:

- You don't need a Table of Contents section in every markdown or HTML file.
- Once the table of contents has been generated inside the content HTML, you can either cut it out (extract it) or keep it in place, using cheerio.

In this tutorial, I'll show you a programmatic way to generate and extract a 'Table of Contents' HTML snippet from existing markdown text in Next.js.

TL;DR: the working code is in the section "The Working Code Snippet" further down.

Original Snippet before modification

I already had a function that converted markdown text to HTML using the remark² library.

```javascript
import fs from 'fs';
import path from 'path';
import matter from 'gray-matter';
import { remark } from 'remark';
import html from 'remark-html';

// Folder that holds the markdown posts (the location used in the Next.js tutorial)
const postsDirectory = path.join(process.cwd(), 'posts');

export async function getPostData(id) {
  const fullPath = path.join(postsDirectory, `${id}.md`);
  const fileContents = fs.readFileSync(fullPath, 'utf8');

  // Use gray-matter to parse the post metadata section
  const matterResult = matter(fileContents);

  // Use remark to convert markdown into HTML string
  const processedContent = await remark()
    .use(html)
    .process(matterResult.content);
  const contentHtml = processedContent.toString();

  // Combine the data with the id and contentHtml
  return {
    id,
    contentHtml,
    ...matterResult.data,
  };
}
```

The snippet above was taken from the official Next.js getting started tutorial³.

What I wanted exactly

However, the function didn't do everything I wanted: it didn't generate a table of contents based on the heading structure of the markdown content.

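I googled around and found an existing library called remark-toc⁴, but it didn't do exactly what I wanted.

It comes with a few conditions that I didn't want to entertain, notably a placeholder heading (such as "Table of Contents") in the markdown itself to mark where the list should be inserted.

I later stumbled upon rehype⁵, a more recent library for processing HTML (and, combined with remark, markdown) that works well in Next.js.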

The Working Code Snippet

This is the final code I use to generate and extract the table of contents from my markdown content.

```javascript
export async function getPostData(id) {
  const fullPath = path.join(postsDirectory, `${id}.md`);
  const fileContents = fs.readFileSync(fullPath, 'utf8');

  // Use gray-matter to parse the post metadata section
  const matterResult = matter(fileContents);

  const file = await unified()
    .use(remarkParse)
    .use(remarkRehype)
    .use(rehypeSlug)
    .use(rehypeDocument)
    .use(rehypeFormat)
    .use(rehypeTOC)
    .use(rehypeStringify)
    .process(matterResult.content);

  // Extract TOC dynamically
  const $ = cheerio.load(String(file));
  const contentTOC = $("nav.toc").html();
  $("nav.toc").remove();
  const contentHtml = $.html();

  // Combine the data with the id and contentHtml
  return {
    id,
    contentHtml,
    contentTOC,
    ...matterResult.data,
  };
}
```

The import requirements

I used the following imports to get it all working:

```javascript
// Already needed by the original snippet
import fs from 'fs';
import path from 'path';
import matter from 'gray-matter';

// Markdown -> HTML pipeline
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeDocument from 'rehype-document';
import rehypeFormat from 'rehype-format';
import rehypeStringify from 'rehype-stringify';
import rehypeSlug from 'rehype-slug';
import rehypeTOC from '@jsdevtools/rehype-toc';

// DOM-style access to the generated HTML
import * as cheerio from 'cheerio';
```
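To make the two TOC-related plugins less of a black box: rehypeSlug gives every heading an id, and rehypeTOC adds a nav element whose links point at those ids. The sketch below imitates that idea in plain JavaScript with no dependencies. It is only an illustration and not how the plugins are implemented; the function names (`slugify`, `extractHeadings`, `renderToc`) and the `toc-depth-N` class names are made up for this example, and unlike the real plugins it produces a flat list and does not de-duplicate repeated heading ids.

```javascript
// Illustrative only: hypothetical helpers, not part of remark, rehype or rehype-toc.

// Turn heading text into a URL-friendly id.
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation
    .replace(/\s+/g, '-');        // whitespace runs -> single hyphen
}

// Escape characters that are special in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Collect ATX headings ("# Title") from markdown, skipping fenced code blocks.
function extractHeadings(markdown) {
  const headings = [];
  let inFence = false;
  for (const line of markdown.split('\n')) {
    // A line starting with three backticks or tildes opens or closes a fence.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const match = /^(#{1,6})\s+(.+?)\s*#*\s*$/.exec(line);
    if (match) {
      const text = match[2];
      headings.push({ depth: match[1].length, text, id: slugify(text) });
    }
  }
  return headings;
}

// Render a flat list of links wrapped in <nav class="toc">,
// the same element the working snippet selects with "nav.toc".
function renderToc(headings) {
  const items = headings
    .map(
      (h) =>
        `<li class="toc-depth-${h.depth}"><a href="#${h.id}">${escapeHtml(h.text)}</a></li>`
    )
    .join('');
  return `<nav class="toc"><ol>${items}</ol></nav>`;
}

// Usage
const sampleMarkdown = '# Introduction\n\nSome text.\n\n## The Working Code Snippet\n';
console.log(renderToc(extractHeadings(sampleMarkdown)));
```

In the real pipeline this work is split between the two plugins, which is why rehypeSlug has to come before rehypeTOC in the `.use()` chain.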

I used cheerio⁶ to build a DOM tree from the HTML text so that I could select the generated TOC element with the selector nav.toc, extract its markup, and use that markup as the Table of Contents snippet in my React components.

This is a screenshot of how I used this piece of code on www.notionworkspaces.com.
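On the React side, the two strings returned by getPostData still have to be injected as raw HTML. A minimal sketch of one way to do that is below, assuming the page receives the returned object as `postData`. The helper name `toInnerHtmlProps` is hypothetical and not taken from my site's code. Because `.html()` returns the inner HTML of nav.toc, the component wraps it in a nav element again, and because cheerio's `.html()` returns null when the selector matches nothing, the helper treats a missing TOC as "render nothing".

```javascript
// Hypothetical helper: maps the object returned by getPostData to the
// { __html: string } shape that React's dangerouslySetInnerHTML expects.
function toInnerHtmlProps(post) {
  return {
    // contentTOC is null when "nav.toc" was not found in the generated HTML
    toc: post.contentTOC ? { __html: post.contentTOC } : null,
    body: { __html: post.contentHtml },
  };
}

// In a page component (JSX shown as a comment so this file runs in plain Node):
//
//   const { toc, body } = toInnerHtmlProps(postData);
//   return (
//     <>
//       {toc && <nav className="toc" dangerouslySetInnerHTML={toc} />}
//       <article dangerouslySetInnerHTML={body} />
//     </>
//   );
```

Since the HTML here comes from my own markdown files, injecting it directly is acceptable; content from untrusted sources would need sanitizing first.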