Custom Markdown Parsing

Beta

This guide walks you through implementing custom Markdown parsing in the Tiptap Editor. By the end of this tutorial, you'll be able to turn Markdown tokens into Tiptap JSON.

Extensions can provide custom parsing logic to handle specific Markdown tokens. This is done through the parseMarkdown handler.

Creating and understanding a parse handler

A parse handler receives a Markdown token from MarkedJS and returns Tiptap JSON content that can be consumed by the editor. In addition to the token, the parse function also receives a helpers object with utility functions to assist in parsing.

These helpers are useful for creating nodes and marks, or for parsing child MarkedJS tokens from token.tokens.

import { Node } from '@tiptap/core'

const MyHeading = Node.create({
  name: 'customHeading',
  // ...

  markdownTokenName: 'heading', // Token type to handle (optional, default is the extension name)
  parseMarkdown: (token, helpers) => {
    return {
      type: 'heading',
      attrs: { level: token.depth },
      content: helpers.parseInline(token.tokens || []),
    }
  },
})

In this example, the parse handler processes a heading token that MarkedJS passes to the Markdown manager. The handler transforms the token into a Tiptap node of type heading.

The level attribute is read from the token's depth, and the token's inline content (headings can only contain inline text and marks) is parsed using the helpers.parseInline() function.

Important: Attributes on tokens can vary depending on how the Tokenizer is configured.
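Because token attributes depend on the tokenizer configuration, a handler may want to read them defensively. The following is a minimal sketch, not part of Tiptap's API: readHeadingLevel is a hypothetical helper, and the token objects are hand-written stand-ins for what a tokenizer might emit.

```javascript
// Hypothetical helper: read a heading level from a token without assuming
// that `depth` is always present or within range.
function readHeadingLevel(token) {
  const depth = Number(token.depth)
  if (!Number.isInteger(depth)) {
    return 1 // fall back to a top-level heading
  }
  // Clamp to the six heading levels that Markdown supports
  return Math.min(6, Math.max(1, depth))
}

console.log(readHeadingLevel({ type: 'heading', depth: 3 }))
console.log(readHeadingLevel({ type: 'heading' }))
console.log(readHeadingLevel({ type: 'heading', depth: 9 }))
```

A handler could then use attrs: { level: readHeadingLevel(token) } instead of reading token.depth directly.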

Parse Helper Functions

As described in the section above, the helpers object provides utility functions for parsing child tokens or creating nodes and marks. Let us go through each helper and see how they can be used.

Parse inline-level child tokens with helpers.parseInline(tokens)

This helper takes a list of tokens and tries to parse them as inline content (text nodes with marks). It does not verify that the tokens are actually inline tokens, so make sure to pass only inline tokens here.

The function returns TiptapJSON[] that can be used as the content of a Tiptap Node.

parseMarkdown: (token, helpers) => {
  const content = helpers.parseInline(token.tokens || [])

  return {
    type: 'paragraph',
    content,
  }
}
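To make the shape of the returned content more concrete, here is a simplified stand-in for an inline parser. It is an illustration only and not how helpers.parseInline() is implemented: sketchParseInline is a hypothetical function that understands just text and strong tokens, and the tokens are written by hand.

```javascript
// Simplified stand-in for an inline parser, for illustration only.
// It handles plain 'text' tokens and 'strong' tokens; anything else is skipped.
function sketchParseInline(tokens, marks = []) {
  return tokens.flatMap(token => {
    if (token.type === 'text') {
      const node = { type: 'text', text: token.text }
      if (marks.length > 0) {
        node.marks = marks
      }
      return [node]
    }
    if (token.type === 'strong') {
      // Recurse into the children and carry the bold mark along
      return sketchParseInline(token.tokens || [], [...marks, { type: 'bold' }])
    }
    return []
  })
}

const inlineTokens = [
  { type: 'text', text: 'Hello ' },
  { type: 'strong', tokens: [{ type: 'text', text: 'World' }] },
]

const paragraph = { type: 'paragraph', content: sketchParseInline(inlineTokens) }
console.log(JSON.stringify(paragraph, null, 2))
```

The result is a flat list of text nodes, where nested formatting shows up as entries in each node's marks array rather than as nested nodes.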

Parse block-level child tokens with helpers.parseChildren(tokens)

Similar to parseInline(), but parses tokens as block-level content (e.g., list items, blockquotes, code blocks and more). It does not verify that the tokens are actually block-level tokens, so make sure to pass only block-level tokens here.

The function returns TiptapJSON[] that can be used as the content of a Tiptap Node.

parseMarkdown: (token, helpers) => {
  // Parse nested block content (e.g., list items)
  const content = helpers.parseChildren(token.tokens || [])

  return {
    type: 'blockquote',
    content,
  }
}

Parsing Marks with helpers.parseInline() and helpers.applyMark()

Use helpers.applyMark() to apply a mark to content:

import { Mark } from '@tiptap/core'

const Bold = Mark.create({
  name: 'bold',

  markdownTokenName: 'strong',
  parseMarkdown: (token, helpers) => {
    const content = helpers.parseInline(token.tokens || []) // parse the inline content inside the mark
    return helpers.applyMark('bold', content) // apply the 'bold' mark to the parsed content
  },
})
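Conceptually, applying a mark means that every text node in the parsed content ends up carrying that mark. The sketch below models this idea with a hypothetical sketchApplyMark function. It is not Tiptap's implementation of helpers.applyMark(), which may return a different intermediate structure.

```javascript
// Illustrative model of applying a mark: add the mark to every text node.
// This is not Tiptap's implementation of helpers.applyMark().
function sketchApplyMark(markType, content, attrs) {
  const mark = attrs ? { type: markType, attrs } : { type: markType }
  return content.map(node =>
    node.type === 'text'
      ? { ...node, marks: [...(node.marks || []), mark] } // copy, do not mutate the input
      : node,
  )
}

const parsedInline = [
  { type: 'text', text: 'bold and ' },
  { type: 'text', text: 'italic', marks: [{ type: 'italic' }] },
]

const marked = sketchApplyMark('bold', parsedInline)
console.log(JSON.stringify(marked))
```

Text nodes that already carry marks keep them, so nested formatting such as bold text inside italic text accumulates on the same node.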

HTML Parsing in Markdown

When Markdown contains HTML, it's parsed using your extensions' existing parseHTML methods.

# Regular Markdown

<custom-component data-foo="bar">
  <p>This HTML is parsed by your extensions</p>
</custom-component>

More **Markdown** here.

Defining the Markdown token name

Markdown token names do not always map one-to-one to your node or mark names. In that case, you can use markdownTokenName to specify which token name should be parsed into your node or mark type.

import { Mark } from '@tiptap/core'

const CustomBold = Mark.create({
  name: 'bold',
  // ...

  markdownTokenName: 'strong', // Match 'strong' tokens when parsing
  parseMarkdown: (token, helpers) => { /* ... */ },
  renderMarkdown: (node, helpers) => { /* ... */ },
})

This is useful when:

  • Markdown token names differ from your node names
  • Multiple Markdown tokens map to the same node type
  • One node type can be serialized to multiple Markdown formats
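The mapping can be pictured as a lookup table from token name to extension name. The following sketch uses plain objects as hypothetical stand-ins for extensions, and buildTokenLookup is an illustrative helper rather than a Tiptap function. It only mirrors the rule that the token name defaults to the extension name.

```javascript
// Hypothetical, simplified extension descriptors (not real Tiptap extensions)
const extensionConfigs = [
  { name: 'bold', markdownTokenName: 'strong' },
  { name: 'italic', markdownTokenName: 'em' },
  { name: 'paragraph' }, // no markdownTokenName: falls back to the extension name
]

// Build a lookup from Markdown token name to extension name
function buildTokenLookup(configs) {
  const lookup = new Map()
  for (const config of configs) {
    lookup.set(config.markdownTokenName ?? config.name, config.name)
  }
  return lookup
}

const tokenLookup = buildTokenLookup(extensionConfigs)
console.log(tokenLookup.get('strong'))
console.log(tokenLookup.get('paragraph'))
```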

Fallback Parsing

If no extension handles a specific token type, the MarkdownManager provides fallback parsing for common tokens:

  • paragraph → { type: 'paragraph' }
  • heading → { type: 'heading', attrs: { level } }
  • text → { type: 'text', text }
  • html → Parsed using extensions' parseHTML methods

You can override this by providing your own handler for these token types.
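The lookup order can be sketched as "custom handler first, fallback second". The code below is a simplified model and not the MarkdownManager's implementation: parseWithFallback and both handler maps are hypothetical, child content is left out for brevity, and the html fallback is omitted because it depends on the extensions' parseHTML rules.

```javascript
// Illustrative fallback handlers, mirroring the list above (without child content)
const fallbackHandlers = new Map([
  ['paragraph', () => ({ type: 'paragraph' })],
  ['heading', token => ({ type: 'heading', attrs: { level: token.depth } })],
  ['text', token => ({ type: 'text', text: token.text })],
])

// Prefer a custom handler, then a fallback; return null if nothing matches
function parseWithFallback(token, customHandlers) {
  const handler = customHandlers.get(token.type) ?? fallbackHandlers.get(token.type)
  return handler ? handler(token) : null
}

// A custom handler for 'heading' overrides the fallback for that token type
const customHandlers = new Map([
  ['heading', token => ({ type: 'customHeading', attrs: { level: token.depth } })],
])

console.log(parseWithFallback({ type: 'heading', depth: 2 }, customHandlers))
console.log(parseWithFallback({ type: 'text', text: 'Hi' }, customHandlers))
console.log(parseWithFallback({ type: 'unknown' }, customHandlers))
```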

Debug Parsing

Log tokens to understand what MarkedJS produces:

const markdown = '# Hello **World**'
const tokens = editor.markdown.instance.lexer(markdown)
console.log(JSON.stringify(tokens, null, 2))
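The full JSON dump can get long, so a compact outline of token types is sometimes easier to scan. This is a small sketch with a hypothetical outlineTokens helper. The sample tokens are written by hand to approximate what a lexer might emit for the input above, so real tokens will carry more fields.

```javascript
// Hand-written tokens approximating a lexer result for '# Hello **World**'
const sampleTokens = [
  {
    type: 'heading',
    depth: 1,
    tokens: [
      { type: 'text', text: 'Hello ' },
      { type: 'strong', tokens: [{ type: 'text', text: 'World' }] },
    ],
  },
]

// Produce one line per token, indented by nesting depth
function outlineTokens(tokens, indent = 0) {
  return tokens.flatMap(token => [
    `${'  '.repeat(indent)}${token.type}`,
    ...outlineTokens(token.tokens || [], indent + 1),
  ])
}

console.log(outlineTokens(sampleTokens).join('\n'))
```

The outline makes it easy to see which token types your extensions need to handle and how deeply they are nested.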

Parse tokens in isolation

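const token = {
  type: 'heading',
  depth: 1,
  tokens: [{ type: 'text', text: 'Hello' }],
}

// Minimal stub of the helpers object
const helpers = {
  parseInline: tokens => [{ type: 'text', text: 'Hello' }],
  // ... other helpers
}

// Assumes the handler is defined as parseMarkdown on the extension config,
// as in the examples in this guide
const result = myExtension.config.parseMarkdown(token, helpers)
console.log(result)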
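When testing handlers in isolation, it can also help to check which helper was called and with which tokens. The sketch below uses a hypothetical createRecordingHelpers test double. It covers only parseInline and parseChildren and returns empty content from both.

```javascript
// Hypothetical test double: records every token list passed to the helpers
function createRecordingHelpers() {
  const calls = []
  const record = name => tokens => {
    calls.push({ name, tokens })
    return [] // empty content keeps the handler result easy to inspect
  }
  return {
    calls,
    helpers: {
      parseInline: record('parseInline'),
      parseChildren: record('parseChildren'),
    },
  }
}

// Handler under test, in the same shape as the handlers in this guide
const parseBlockquote = (token, helpers) => ({
  type: 'blockquote',
  content: helpers.parseChildren(token.tokens || []),
})

const { calls: recordedCalls, helpers: recordingHelpers } = createRecordingHelpers()
const blockquoteResult = parseBlockquote(
  { type: 'blockquote', tokens: [{ type: 'paragraph' }] },
  recordingHelpers,
)

console.log(blockquoteResult.type, recordedCalls.map(call => call.name))
```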

Performance Considerations

Lazy Parsing

For large documents, consider parsing on demand:

let cachedJSON = null

function getJSON() {
  if (!cachedJSON) {
    cachedJSON = editor.markdown.parse(largeMarkdownString)
  }
  return cachedJSON
}
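Note that the cache above never changes once it is filled. If the Markdown string can change, one option is to remember the last input alongside the result. The following sketch assumes a hypothetical createCachedParser wrapper with an injected parse function. The fakeParse stand-in only counts calls and does not parse anything.

```javascript
// Wrap any parse function so repeated calls with the same input reuse the result.
// `parse` is injected; in an editor setup it could call into the Markdown parser.
function createCachedParser(parse) {
  let lastInput = null
  let lastResult = null
  return markdown => {
    if (markdown !== lastInput) {
      lastResult = parse(markdown)
      lastInput = markdown
    }
    return lastResult
  }
}

// Stand-in parser that counts how often it runs
let parseCount = 0
const fakeParse = markdown => {
  parseCount += 1
  return { type: 'doc', length: markdown.length }
}

const parseCached = createCachedParser(fakeParse)
parseCached('# Title')
parseCached('# Title') // served from the cache
parseCached('# Changed') // input changed, so the parser runs again
console.log(parseCount)
```

This keeps only the most recent result, which is usually enough when a single document is parsed repeatedly.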

Incremental Updates

Instead of re-parsing the entire document on each change, update specific sections:

editor.commands.insertContentAt(position, newMarkdown, { contentType: 'markdown' })

Examples

Custom Heading Parser

Let's build a custom heading parser for a customHeading extension that will extract the heading level and also generate a unique ID for each heading.

import { Node } from '@tiptap/core'

const CustomHeading = Node.create({
  name: 'customHeading',
  markdownTokenName: 'heading', // match heading tokens, since the default is the extension name

  // ... other config

  parseMarkdown: (token, helpers) => {
    const level = token.depth || 1 // we can get the heading level from the token

    // Add custom attributes
    return {
      type: 'customHeading',
      attrs: {
        level,
        id: `heading-${Math.random()}`, // Generate ID
      },
      content: helpers.parseInline(token.tokens || []), // parse the inline content of the heading token
    }
  },
})
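IDs based on Math.random() change every time the document is parsed. If stable IDs are preferable, for example for anchor links, one alternative is to derive the ID from the heading text. This is a sketch under assumptions: tokenText and createHeadingIdGenerator are hypothetical helpers, the slug is ASCII-only, and repeated headings are disambiguated with a numeric suffix.

```javascript
// Collect the plain text of a token tree
function tokenText(tokens) {
  return tokens
    .map(token => (token.tokens?.length ? tokenText(token.tokens) : token.text || ''))
    .join('')
}

// Create an ID generator that remembers used slugs to disambiguate repeated headings
function createHeadingIdGenerator() {
  const seen = new Map()
  return token => {
    const slug =
      tokenText(token.tokens || [])
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-') // collapse everything else into dashes
        .replace(/^-+|-+$/g, '') || 'heading' // trim dashes, fall back if empty
    const count = seen.get(slug) || 0
    seen.set(slug, count + 1)
    return count === 0 ? slug : `${slug}-${count}`
  }
}

const headingId = createHeadingIdGenerator()
const introToken = {
  type: 'heading',
  depth: 1,
  tokens: [{ type: 'text', text: 'Getting Started!' }],
}

console.log(headingId(introToken))
console.log(headingId(introToken))
```

Inside a parse handler, the generated value could be used for the id attribute in place of the random one.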

Custom YouTube Embed Parser

Let's create a custom parser for a youtube token that turns the token into a youtubeEmbed node with the appropriate embed attributes.

import { Node } from '@tiptap/core'

const YoutubeEmbed = Node.create({
  name: 'youtubeEmbed',
  atom: true, // this is a self-contained node
  markdownTokenName: 'youtube', // match the custom 'youtube' token

  // ... other config

  parseMarkdown: (token) => {
    // These attributes are extracted from the youtube token.
    // We assume that a custom tokenizer provides these tokens
    // from Markdown syntax like: ![youtube](videoId?start=60&width=800&height=450)
    const videoId = token.videoId || ''
    const start = token.start || 0
    const width = token.width || 560
    const height = token.height || 315

    // Because this is an atom node, we don't require the helpers
    // to parse any children, as this node is self-contained.
    return {
      type: 'youtubeEmbed',
      attrs: {
        videoId,
        start,
        width,
        height,
      },
    }
  },
})
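The handler above relies on a custom tokenizer to supply videoId, start, width and height. How that tokenizer is registered is outside the scope of this example, but the step of turning the link target into those fields could look like the sketch below. parseYoutubeHref is a hypothetical helper, and the defaults mirror the ones used in the handler.

```javascript
// Hypothetical helper for a custom tokenizer: split 'videoId?start=60&width=800'
// into the fields that the parse handler above reads from the token.
function parseYoutubeHref(href) {
  const [videoId, query = ''] = href.split('?')
  const params = new URLSearchParams(query)

  // Read a numeric parameter, falling back when it is missing or not a number
  const toNumber = (key, fallback) => {
    const value = Number.parseInt(params.get(key) ?? '', 10)
    return Number.isNaN(value) ? fallback : value
  }

  return {
    type: 'youtube',
    videoId,
    start: toNumber('start', 0),
    width: toNumber('width', 560),
    height: toNumber('height', 315),
  }
}

console.log(parseYoutubeHref('abc123?start=60&width=800&height=450'))
console.log(parseYoutubeHref('abc123'))
```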