Posted on Jul 27

Processing Markdown in Python with mq

The markdown-query package brings the functionality of the mq Markdown processor to Python. mq uses a jq-like syntax to filter, transform, and extract data from Markdown documents.

Installation

```shell
pip install markdown-query
```

Basic Usage

The primary interface is the run() function, which takes three parameters: a query string, the Markdown content, and an optional Options object (pass None to use the defaults).

````python
import mq

markdown = """# Title

This is a paragraph.

## Section

- Item 1
- Item 2

```javascript
console.log("hello");
```
"""

# Extract all headings
result = mq.run("select(.h1 || .h2)", markdown, None)
print(result.values)  # ['# Title', '## Section']

# Get heading text without Markdown formatting
result = mq.run("select(.h1 || .h2) | to_text()", markdown, None)
print(result.values)  # ['Title', 'Section']
````
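To make the semantics of `select(.h1 || .h2) | to_text()` concrete, here is a rough, standard-library-only approximation. It is not how mq works (mq parses the document into a full Markdown tree); it only recognizes ATX headings of level 1 or 2 outside fenced code blocks, and the function name `approx_h1_h2_text` is hypothetical.

```python
import re

# Matches ATX headings of level 1 or 2, e.g. "# Title" or "## Section".
# "### Deep" and "#NoSpace" do not match.
HEADING_RE = re.compile(r"^(#{1,2})[ \t]+(.*?)[ \t]*$")


def approx_h1_h2_text(markdown: str) -> list[str]:
    """Rough stand-in for mq's `select(.h1 || .h2) | to_text()`.

    Hypothetical helper: handles only unindented ATX headings and skips
    lines inside ``` or ~~~ fenced code blocks. Setext headings and inline
    formatting are not handled.
    """
    texts = []
    fence = None  # opening fence marker while inside a code block
    for line in markdown.splitlines():
        stripped = line.strip()
        if fence is None and stripped.startswith(("```", "~~~")):
            fence = stripped[:3]  # entering a fenced code block
            continue
        if fence is not None:
            if stripped.startswith(fence):
                fence = None  # closing fence found
            continue
        match = HEADING_RE.match(line)
        if match:
            texts.append(match.group(2))
    return texts
```

On the sample document above, this returns the same two heading texts the mq query prints, but mq remains the tool to rely on for real documents.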

Working with Different Input Formats

The library supports multiple input formats through the Options class:

```python
import mq

# Process HTML input
html = '<h1>Title</h1><p>Content</p><ul><li>Item</li></ul>'
options = mq.Options()
options.input_format = mq.InputFormat.HTML
result = mq.run(".h1 | upcase()", html, options)
print(result.values)  # ['# TITLE']
```

Available input formats:

InputFormat.MARKDOWN (default)
InputFormat.HTML
InputFormat.MDX
InputFormat.TEXT
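When a script handles mixed files, one option is to choose the input format from the file extension. The sketch below is hypothetical: the extension mapping and the name `input_format_name` are assumptions, and it returns the InputFormat member name as a string so it runs without mq installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to an mq InputFormat member name.
EXTENSION_TO_FORMAT = {
    ".md": "MARKDOWN",
    ".markdown": "MARKDOWN",
    ".mdx": "MDX",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "TEXT",
}


def input_format_name(path: str) -> str:
    """Return the InputFormat member name for a path, defaulting to MARKDOWN."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, "MARKDOWN")
```

With mq installed, the name can then be turned into the enum value with `options.input_format = getattr(mq.InputFormat, input_format_name("page.html"))`.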

Query Examples

Extract Code Blocks

```python
# Get all code blocks
code_blocks = mq.run("select(.code)", markdown, None)

# Get code block content as text
code_text = mq.run("select(.code) | to_text()", markdown, None)
```
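For intuition about what `select(.code)` picks out, here is a minimal standard-library sketch that collects fenced code blocks together with the language from their info string. It is an approximation under simplifying assumptions (backtick or tilde fences only, no indented code blocks, no container blocks), not mq's parser; the `CodeBlock` type and `fenced_code_blocks` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodeBlock:
    lang: str  # first word of the info string, e.g. "javascript" ("" if none)
    text: str  # the code between the fences


def fenced_code_blocks(markdown: str) -> list[CodeBlock]:
    """Collect ``` / ~~~ fenced code blocks (a rough stand-in for select(.code))."""
    blocks = []
    fence = None  # the opening fence, e.g. "```" or "````"
    lang = ""
    body: list[str] = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if fence is None:
            if stripped.startswith(("```", "~~~")):
                marker = stripped[0]
                # The fence is the full run of marker characters.
                fence = marker * (len(stripped) - len(stripped.lstrip(marker)))
                parts = stripped[len(fence):].split()
                lang = parts[0] if parts else ""
                body = []
        elif stripped.startswith(fence) and stripped.strip(fence[0]) == "":
            # A closing fence is at least as long as the opening one.
            blocks.append(CodeBlock(lang, "\n".join(body)))
            fence = None
        else:
            body.append(line)
    if fence is not None:
        # An unclosed fence runs to the end of the document.
        blocks.append(CodeBlock(lang, "\n".join(body)))
    return blocks
```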

Filter by Content

```python
# Find headings containing specific text
headings = mq.run('select(.h1 || .h2) | select(test("Section"))', markdown, None)

# Extract list items
items = mq.run(".[] | to_text()", markdown, None)
```
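Because `result.values` is an ordinary list of strings, filtering can also happen in Python after a broader query, roughly mirroring the `test("Section")` filter above. A minimal sketch, assuming values shaped like the `['# Title', '## Section']` output shown earlier; `filter_values` is a hypothetical helper built on `re.search`, which matches anywhere in the string.

```python
import re


def filter_values(values: list[str], pattern: str, *, ignore_case: bool = False) -> list[str]:
    """Keep only the values that contain a match for the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [value for value in values if regex.search(value)]
```

For example, `filter_values(mq.run("select(.h1 || .h2)", markdown, None).values, "Section")` keeps only the matching headings.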

Transform Content

```python
# Convert headings to uppercase
upper_headings = mq.run("select(.h1 || .h2) | upcase()", markdown, None)

# Replace text in paragraphs
modified = mq.run('select(.paragraph) | gsub("paragraph"; "text")', markdown, None)
```
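mq's gsub and upcase run inside the query. When a replacement needs Python logic, the same kind of global regex substitution can be applied to the returned strings with `re.sub`. A sketch assuming `values` is a list of strings such as `result.values`; `replace_in_values` is a hypothetical helper.

```python
import re
from typing import Callable, Union

# A replacement is either a template string or a function of the match.
Replacement = Union[str, Callable[[re.Match], str]]


def replace_in_values(values: list[str], pattern: str, repl: Replacement) -> list[str]:
    """Apply re.sub to every value, replacing all matches of the pattern."""
    regex = re.compile(pattern)
    return [regex.sub(repl, value) for value in values]
```

Passing a function as `repl` is where this differs from a query-side replacement: for example, `lambda m: m.group(0).upper()` with the pattern `(?<= ).+` uppercases heading text while leaving the `#` markers alone.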

Accessing Result Data

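The run() function returns an MQResult object. Its values attribute is a list of the results, and the object itself can be indexed and iterated to get items with text and markdown_type attributes: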

```python
result = mq.run("select(.h1)", markdown, None)

# Access all values
for value in result.values:
    print(value)

# Access individual items
first_heading = result[0]
print(first_heading.text)  # Text content
print(first_heading.markdown_type)  # MarkdownType enum

# Iterate over results
for item in result:
    print(f"Type: {item.markdown_type}, Text: {item.text}")
```
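As an example of post-processing results, the sketch below turns heading values such as `['# Title', '## Section']` into an indented table of contents with anchor links. The slug rule is a simplified assumption (lowercase, drop punctuation, spaces become hyphens, no de-duplication of repeated titles), and `toc_from_headings` is a hypothetical helper.

```python
import re


def slugify(text: str) -> str:
    """Simplified anchor slug: lowercase, drop punctuation, spaces become hyphens."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"\s+", "-", text.strip())


def toc_from_headings(values: list[str]) -> str:
    """Build a nested Markdown list from heading strings like '## Section'."""
    lines = []
    for value in values:
        match = re.match(r"^(#{1,6})\s+(.*)$", value)
        if not match:
            continue  # skip anything that is not an ATX heading
        level = len(match.group(1))
        title = match.group(2).strip()
        indent = "  " * (level - 1)  # two spaces per nesting level
        lines.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(lines)
```

Usage: `print(toc_from_headings(mq.run("select(.h1 || .h2)", markdown, None).values))`.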

Integration with Other Tools

Because run() takes its input as a plain string, it composes easily with other Python tools that produce Markdown, such as MarkItDown:

```python
from markitdown import MarkItDown
import mq

# Convert a web page to Markdown, then process it with mq
markitdown = MarkItDown()
result = markitdown.convert("https://example.com")

# Extract specific content
code_samples = mq.run(".code | to_text()", result.text_content, None)
all_links = mq.run(".link | to_html()", result.text_content, None)
```
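If the link values come back as HTML fragments, the standard library's `html.parser` can pull out the URLs. This assumes each value contains an `<a href=...>` element; the exact HTML that to_html() emits may differ, and `extract_hrefs` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _HrefCollector(HTMLParser):
    """Collects href attributes from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(fragments: list[str]) -> list[str]:
    """Return unique hrefs from HTML fragments, preserving first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for fragment in fragments:
        parser = _HrefCollector()
        parser.feed(fragment)
        parser.close()
        for href in parser.hrefs:
            seen.setdefault(href, None)
    return list(seen)
```

Usage: `extract_hrefs(all_links.values)`.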

Configuration Options

The Options class provides additional configuration:

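```python
# content is assumed to be an HTML string, since input_format is set to HTML
options = mq.Options()
options.input_format = mq.InputFormat.HTML
options.list_style = mq.ListStyle.PLUS  # Use + as the list marker
options.link_title_style = mq.TitleSurroundStyle.SINGLE  # Wrap link titles in single quotes
options.link_url_style = mq.UrlSurroundStyle.ANGLE  # Wrap link URLs in angle brackets
result = mq.run("select(.list)", content, options)
```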
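To show the Markdown syntax the two link options presumably control, here is a small formatter for CommonMark inline links that can wrap the URL in angle brackets and the title in single or double quotes. It only illustrates the output syntax; it is not mq's renderer, and the style strings `"angle"` and `"single"` are hypothetical stand-ins for the `UrlSurroundStyle` and `TitleSurroundStyle` members.

```python
def format_link(text: str, url: str, title: str = "", *,
                url_style: str = "none", title_style: str = "double") -> str:
    """Render a CommonMark inline link such as [text](<url> 'title')."""
    # Angle brackets around the destination are valid CommonMark syntax.
    destination = f"<{url}>" if url_style == "angle" else url
    if not title:
        return f"[{text}]({destination})"
    if title_style == "single":
        quoted = "'" + title.replace("'", "\\'") + "'"
    else:
        quoted = '"' + title.replace('"', '\\"') + '"'
    return f"[{text}]({destination} {quoted})"
```

With `url_style="angle"` and `title_style="single"`, a link renders as `[mq](<https://example.com> 'Home')`.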

Error Handling

A query that fails raises Python's built-in RuntimeError (PyRuntimeError is the name PyO3 uses for it on the Rust side), so catch RuntimeError:

```python
try:
    result = mq.run("invalid_query", markdown, None)
except RuntimeError as e:
    print(f"Query failed: {e}")
```
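For batch jobs it can help to wrap the call so one bad query does not stop the whole run. A minimal sketch: `safe_values` is hypothetical, and it takes the query function as a parameter (pass `mq.run`) so it can be exercised with a stand-in; it relies only on the RuntimeError and `.values` behavior described above.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def safe_values(run: Callable[[str, str, Any], Any], query: str,
                content: str, options: Any = None) -> Optional[list]:
    """Run a query and return result.values, or None if the query fails."""
    try:
        return list(run(query, content, options).values)
    except RuntimeError as exc:
        logger.warning("mq query %r failed: %s", query, exc)
        return None
```

Usage: `safe_values(mq.run, "select(.h1)", markdown)`. Returning None rather than an empty list keeps "the query failed" distinguishable from "the query matched nothing".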

Performance

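The library is implemented in Rust and compiled to a native extension module, so parsing and querying run as native code rather than in the Python interpreter, which helps when processing large Markdown files.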


The markdown-query package provides a straightforward way to bring mq's Markdown processing to Python applications, from simple content extraction to complex document transformations.

DEV Community