The markdown-query package brings the functionality of the mq Markdown processor to Python. mq uses a jq-like syntax to filter, transform, and extract data from Markdown documents.
Installation
pip install markdown-query
Basic Usage
The primary interface is the run() function, which takes three parameters: a query string, the Markdown content, and an optional configuration object (pass None to use the defaults).
    import mq

    markdown = """# Title

    This is a paragraph.

    ## Section

    - Item 1
    - Item 2

    ```javascript
    console.log("hello");
    ```
    """

    # Extract all headings
    result = mq.run("select(.h1 || .h2)", markdown, None)
    print(result.values)  # ['# Title', '## Section']

    # Get heading text without markdown formatting
    result = mq.run("select(.h1 || .h2) | to_text()", markdown, None)
    print(result.values)  # ['Title', 'Section']
Working with Different Input Formats
The library supports multiple input formats through the Options class:
    import mq

    # Process HTML input
    html = '<h1>Title</h1><p>Content</p><ul><li>Item</li></ul>'

    options = mq.Options()
    options.input_format = mq.InputFormat.HTML

    result = mq.run(".h1 | upcase()", html, options)
    print(result.values)  # ['# TITLE']
Available input formats:
- InputFormat.MARKDOWN (default)
- InputFormat.HTML
- InputFormat.MDX
- InputFormat.TEXT
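When files of several kinds pass through the same pipeline, the input format can be chosen from the file extension. The sketch below is one possible way to do that; `format_name_for` and its extension table are hypothetical and not part of the package. It returns only the member name from the list above, so it runs without mq installed.

```python
from pathlib import Path

# Assumed mapping from file extension to the InputFormat member names
# listed above; the extension choices are illustrative, not defined by mq.
EXTENSION_TO_FORMAT = {
    ".md": "MARKDOWN",
    ".markdown": "MARKDOWN",
    ".html": "HTML",
    ".htm": "HTML",
    ".mdx": "MDX",
    ".txt": "TEXT",
}


def format_name_for(path: str, default: str = "MARKDOWN") -> str:
    """Return the InputFormat member name for a file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)


if __name__ == "__main__":
    for name in ("README.md", "page.HTML", "post.mdx", "notes.txt", "data.bin"):
        print(name, "->", format_name_for(name))
```

With mq installed, the returned name could then be resolved with `getattr(mq.InputFormat, format_name_for(path))` and assigned to `options.input_format`.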
Query Examples
Extract Code Blocks
    # Get all code blocks
    code_blocks = mq.run("select(.code)", markdown, None)

    # Get code block content as text
    code_text = mq.run("select(.code) | to_text()", markdown, None)
Filter by Content
    # Find headings containing specific text
    headings = mq.run('select(.h1 || .h2) | select(test("Section"))', markdown, None)

    # Extract list items
    items = mq.run(".[] | to_text()", markdown, None)
Transform Content
    # Convert headings to uppercase
    upper_headings = mq.run("select(.h1 || .h2) | upcase()", markdown, None)

    # Replace text in paragraphs
    modified = mq.run('select(.paragraph) | gsub("paragraph", "text")', markdown, None)
Accessing Result Data
The run() function returns an MQResult object with a values list:
    result = mq.run("select(.h1)", markdown, None)

    # Access all values
    for value in result.values:
        print(value)

    # Access individual items
    first_heading = result[0]
    print(first_heading.text)           # Text content
    print(first_heading.markdown_type)  # MarkdownType enum

    # Iterate over results
    for item in result:
        print(f"Type: {item.markdown_type}, Text: {item.text}")
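Because each result item carries both its text and its type, results can be bucketed by type in a few lines. The following is a minimal sketch, assuming only the `.text` and `.markdown_type` attributes used above; `group_text_by_type` is a hypothetical helper, and the type strings in the demo are placeholders rather than real MarkdownType values.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_text_by_type(items):
    """Group each item's text under its markdown_type (hypothetical helper).

    Assumes every item exposes `.text` and `.markdown_type`, as the
    result items in the example above do.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item.markdown_type].append(item.text)
    return dict(groups)


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without mq; real code would
    # pass the MQResult returned by mq.run() instead.
    fake_items = [
        SimpleNamespace(markdown_type="Heading", text="Title"),
        SimpleNamespace(markdown_type="Text", text="This is a paragraph."),
        SimpleNamespace(markdown_type="Heading", text="Section"),
    ]
    print(group_text_by_type(fake_items))
```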
Integration with Other Tools
The library works well with other Python markdown processing tools:
    from markitdown import MarkItDown
    import mq

    # Convert web pages to markdown, then process with mq
    markitdown = MarkItDown()
    result = markitdown.convert("https://example.com")

    # Extract specific content
    code_samples = mq.run(".code | to_text()", result.text_content, None)
    all_links = mq.run(".link | to_html()", result.text_content, None)
Configuration Options
The Options class provides additional configuration:
    import mq

    options = mq.Options()
    options.input_format = mq.InputFormat.HTML
    options.list_style = mq.ListStyle.PLUS  # Use + for lists
    options.link_title_style = mq.TitleSurroundStyle.SINGLE
    options.link_url_style = mq.UrlSurroundStyle.ANGLE

    # `content` is an HTML string defined elsewhere
    result = mq.run("select(.list)", content, options)
Error Handling
Queries that fail raise a RuntimeError (PyRuntimeError on the Rust side):
    try:
        result = mq.run("invalid_query", markdown, None)
    except RuntimeError as e:
        print(f"Query failed: {e}")
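In batch jobs it is often more convenient to fall back to a default than to repeat the try/except at every call site. The sketch below wraps that pattern; `run_or_default` is a hypothetical helper, and `fake_run` is a stand-in with the same three-argument shape as mq.run so the example runs without mq installed.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Optional


def run_or_default(
    run: Callable[[str, str, Any], Any],
    query: str,
    content: str,
    options: Any = None,
    default: Optional[List[Any]] = None,
) -> List[Any]:
    """Call run(query, content, options) and return list(result.values).

    Hypothetical helper: on RuntimeError it returns `default` (an empty
    list if none was given) instead of propagating the exception.
    """
    try:
        result = run(query, content, options)
    except RuntimeError:
        return [] if default is None else default
    return list(result.values)


def fake_run(query, content, options):
    # Stand-in for mq.run (an assumption for this demo, not the real API).
    if query == "invalid_query":
        raise RuntimeError("unknown function")
    return SimpleNamespace(values=["# Title"])


if __name__ == "__main__":
    print(run_or_default(fake_run, "select(.h1)", "# Title"))
    print(run_or_default(fake_run, "invalid_query", "# Title"))
```

With the real package, `mq.run` would be passed in place of `fake_run`. Only RuntimeError is swallowed; any other exception still propagates.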
Performance
The library is built on Rust and compiled to native code, providing fast processing for large markdown files.
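How fast this is in practice depends on the document and the query, so it is worth measuring on your own inputs. The sketch below is a small timing harness, not a benchmark of mq itself; `best_time` is a hypothetical helper, and the demo times a plain string operation as a stand-in workload.

```python
import time
from typing import Any, Callable


def best_time(run: Callable[..., Any], *args: Any, repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without mq installed; with mq,
    # pass mq.run and its arguments instead.
    document = "# Title\n\nThis is a paragraph.\n" * 10_000
    elapsed = best_time(lambda text: text.count("# "), document)
    print(f"best of 5: {elapsed:.6f} s")
```

With the package installed, a call such as `best_time(mq.run, "select(.h1)", document, None)` would time a real query. Taking the minimum over several runs reduces noise from other processes.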
Support
- 🐛 Report bugs
- 💡 Request features
- ⭐ Star the project if you find it useful!
The markdown-query package provides a straightforward way to apply mq's Markdown processing capabilities in Python applications, from simple content extraction to complex document transformations.