mdast utility to transform to nlcst.
- What is this?
- When should I use this?
- Install
- Use
- API
- Types
- Compatibility
- Security
- Related
- Contribute
- License
This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into nlcst (natural language).
This project is useful when you want to deal with ASTs and inspect the natural language inside markdown. Unfortunately, there is no way yet to apply changes to the nlcst back into mdast.
The hast utility hast-util-to-nlcst does the same but uses an HTML tree as input.
The remark plugin remark-retext wraps this utility to do the same at a higher-level (easier) abstraction.
This package is ESM only. In Node.js (version 14.14+ and 16.0+), install with npm:
npm install mdast-util-to-nlcst
In Deno with esm.sh:
import {toNlcst} from 'https://esm.sh/mdast-util-to-nlcst@5'
In browsers with esm.sh:
<script type="module">
  import {toNlcst} from 'https://esm.sh/mdast-util-to-nlcst@5?bundle'
</script>
Say we have the following example.md:
Some *foo*sball.
…and next to it a module example.js:
import {read} from 'to-vfile'
import {ParseEnglish} from 'parse-english'
import {inspect} from 'unist-util-inspect'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toNlcst} from 'mdast-util-to-nlcst'

const file = await read('example.md')
const mdast = fromMarkdown(file)
const nlcst = toNlcst(mdast, file, ParseEnglish)

console.log(inspect(nlcst))
Yields:
RootNode[1] (1:1-1:17, 0-16)
└─0 ParagraphNode[1] (1:1-1:17, 0-16)
    └─0 SentenceNode[4] (1:1-1:17, 0-16)
        ├─0 WordNode[1] (1:1-1:5, 0-4)
        │   └─0 TextNode "Some" (1:1-1:5, 0-4)
        ├─1 WhiteSpaceNode " " (1:5-1:6, 4-5)
        ├─2 WordNode[2] (1:7-1:16, 6-15)
        │   ├─0 TextNode "foo" (1:7-1:10, 6-9)
        │   └─1 TextNode "sball" (1:11-1:16, 10-15)
        └─3 PunctuationNode "." (1:16-1:17, 15-16)
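To show what inspecting the natural language in such a tree can look like, here is a minimal sketch that walks a hand-written copy of the tree above and collects its words. The tree literal omits positions, and the helpers `textOf` and `collectWords` are hypothetical names for this illustration, not part of this package. In a real project a traversal utility such as unist-util-visit would likely be a better fit.

```javascript
// Hand-written nlcst tree mirroring the output above (positions omitted).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Some'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'foo'},
                {type: 'TextNode', value: 'sball'}
              ]
            },
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all leaves below a node.
function textOf(node) {
  if ('value' in node) return node.value
  return node.children.map(textOf).join('')
}

// Hypothetical helper: collect the text of every WordNode, depth first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(textOf(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

console.log(collectWords(tree)) // [ 'Some', 'foosball' ]
```

Note that the emphasis in `*foo*sball` does not split the word: both text nodes end up in one WordNode.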
This package exports the identifier toNlcst. There is no default export.
Turn an mdast tree into an nlcst tree.
👉 Note: tree must have positional info and file must be a VFile corresponding to tree.
- tree (MdastNode): mdast tree to transform
- file (VFile): virtual file
- Parser (ParserConstructor or ParserInstance): parser to use
- options (Options, optional): configuration
Returns an nlcst tree (NlcstNode).
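The requirement that tree has positional info and that file corresponds to it can be illustrated without the library. The sketch below assumes, as the output in the Use example suggests, that the offsets on nlcst nodes index into the original markdown. The offsets are copied by hand from that output, and `sliceByOffsets` is a hypothetical helper, not part of this package.

```javascript
// The markdown from example.md in the Use section, inlined for illustration.
const markdown = 'Some *foo*sball.'

// Offsets copied by hand from the inspect output: [label, start, end].
const nodes = [
  ['Some', 0, 4],
  ['foo', 6, 9],
  ['sball', 10, 15],
  ['.', 15, 16]
]

// Hypothetical helper: recover the source text that a node spans.
function sliceByOffsets(source, start, end) {
  return source.slice(start, end)
}

for (const [label, start, end] of nodes) {
  // Each slice matches the text of the corresponding nlcst node.
  console.log(label, '=>', JSON.stringify(sliceByOffsets(markdown, start, end)))
}
```

The offsets skip the `*` markers at 5 and 9, which is why a file that does not match the tree would produce text that lines up with the wrong characters.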
Configuration (TypeScript type).
ignore: list of mdast node types to ignore (Array<string>, optional).
The types 'table', 'tableRow', and 'tableCell' are always ignored.
Say we have the following file example.md:
A paragraph.

> A paragraph in a block quote.
…and if we now transform with ignore: ['blockquote'], we get:
RootNode[2] (1:1-3:1, 0-14)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
└─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)
source: list of mdast node types to mark as nlcst source nodes (Array<string>, optional).
The type 'inlineCode' is always marked as source.
Say we have the following file example.md:
A paragraph.

> A paragraph in a block quote.
…and if we now transform with source: ['blockquote'], we get:
RootNode[3] (1:1-3:32, 0-45)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
├─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)
└─2 ParagraphNode[1] (3:1-3:32, 14-45)
    └─0 SentenceNode[1] (3:1-3:32, 14-45)
        └─0 SourceNode "> A paragraph in a block quote." (3:1-3:32, 14-45)
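As a rough sketch of how the two fields could be combined, the options object below sets both. The specific node types are illustrative choices, not recommendations, and the call itself is left as a comment because it needs the packages and variables from the Use example.

```javascript
// Illustrative configuration; the node types are assumptions picked for this sketch.
const options = {
  // Drop block quotes from the resulting nlcst tree entirely.
  ignore: ['blockquote'],
  // Keep headings, but as opaque source nodes rather than parsed words.
  source: ['heading']
}

// With `mdast`, `file`, and `ParseEnglish` set up as in the Use example,
// the options go in as the fourth argument:
// const nlcst = toNlcst(mdast, file, ParseEnglish, options)

console.log(options)
```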
Create a new parser (TypeScript type).
type ParserConstructor = new () => ParserInstance
nlcst parser (TypeScript type).
For example, parse-dutch, parse-english, or parse-latin.
type ParserInstance = {
  tokenizeSentencePlugins: ((node: NlcstSentence) => void)[]
  tokenizeParagraphPlugins: ((node: NlcstParagraph) => void)[]
  tokenizeRootPlugins: ((node: NlcstRoot) => void)[]
  parse(value: string | null | undefined): NlcstRoot
  tokenize(value: string | null | undefined): Array<NlcstSentenceContent>
}
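Because the Parser parameter accepts either a ParserConstructor or a ParserInstance, a caller-side way to tell the two apart can be sketched as follows. `StubParser` and `resolveParser` are hypothetical names for this illustration. The stub only mimics the `parse` and `tokenize` methods from the type above, and this is not presented as how the package itself handles the parameter.

```javascript
// Hypothetical stand-in for a parser such as parse-english.
class StubParser {
  parse(value) {
    return {type: 'RootNode', children: this.tokenize(value)}
  }

  tokenize(value) {
    return value ? [{type: 'TextNode', value: String(value)}] : []
  }
}

// Hypothetical helper: accept a constructor or an instance, return an instance.
// An instance exposes `parse` through its prototype chain; a class does not,
// because the method lives on `StubParser.prototype`, not on the class itself.
function resolveParser(Parser) {
  return 'parse' in Parser ? Parser : new Parser()
}

const fromConstructor = resolveParser(StubParser)
const existing = new StubParser()
const fromInstance = resolveParser(existing)

console.log(fromConstructor instanceof StubParser) // true
console.log(fromInstance === existing) // true
```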
This package is fully typed with TypeScript. It exports the types Options, ParserConstructor, and ParserInstance.
Projects maintained by the unified collective are compatible with all maintained versions of Node.js. As of now, that is Node.js 12.20+, 14.14+, and 16.0+. Our projects sometimes work with older versions, but this is not guaranteed.
Use of mdast-util-to-nlcst does not involve hast so there are no openings for cross-site scripting (XSS) attacks.
- mdast-util-to-hast: transform mdast to hast
- hast-util-to-nlcst: transform hast to nlcst
- hast-util-to-mdast: transform hast to mdast
- hast-util-to-xast: transform hast to xast
- hast-util-sanitize: sanitize hast nodes
See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.