Good Stuff. #21

panoply · 2024-06-24T06:38:58Z

@FIameCaster

This is a well-done lib. I was looking to do something similar with PrismJS but never got around to exploring things in depth. I'm interested in the per-line tokenizing tactic you've got in place. At first glance there is a good foundation, and extending the existing logic while keeping things snappy beyond 1k LoC is something I'd be interested in exploring, as it's both a challenge and an itch I've been wanting to scratch with Prism.

Given that you've already delved into the lexing algorithm of PrismJS, I'd be curious to hear your thoughts on incremental updates, your experience with the refactors needed to get it where it is now, and any issues you encountered. Some of the biggest performance hits with PrismJS come from the grammars and its tokenization method. In my previous attempts, getting Prism to behave reactively with 1k to 2k LoC proved difficult.

The per-line targeting of syntax really piqued my interest.

jonpyt · 2024-06-24T10:03:59Z
Maintainer

Hi @panoply. Thanks for showing interest!

The tokenization algorithm here is very similar to the one in PrismJS, with only minor changes. Splitting into lines is done after tokenization. To avoid losing context when splitting, tags must be closed and immediately reopened when a line break is found. You can see the logic for it here. Splitting the HTML into lines allows diffing when updating the DOM, instead of replacing all the HTML as many other code editors that use PrismJS or Highlight.js for syntax highlighting do. Here's the diffing algorithm.
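A minimal sketch of the close-and-reopen idea, in TypeScript. It assumes a Prism-like token tree (a `Token` with a `type` and nested `content`) and wraps each token in `<span class="token TYPE">`; the names `Token`, `TokenStream`, and `tokensToLines` are hypothetical and not prism-code-editor's actual code.

```typescript
// Assumed Prism-like token shape: a type plus nested content.
interface Token {
  type: string;
  content: TokenStream;
}
type TokenStream = string | Token | Array<string | Token>;

const escapeHtml = (text: string): string =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;");

// Serializes a token tree into one HTML string per line. When a newline is
// found inside open tokens, every open <span> is closed at the end of the
// line and reopened at the start of the next, so each line stands alone.
function tokensToLines(stream: TokenStream): string[] {
  const lines: string[] = [];
  const openTags: string[] = []; // opening tags of the tokens we are inside
  let current = "";

  const visit = (node: TokenStream): void => {
    if (typeof node === "string") {
      node.split("\n").forEach((part, i) => {
        if (i > 0) {
          current += "</span>".repeat(openTags.length); // close before the break
          lines.push(current);
          current = openTags.join(""); // reopen on the next line
        }
        current += escapeHtml(part);
      });
    } else if (Array.isArray(node)) {
      node.forEach(child => visit(child));
    } else {
      const tag = `<span class="token ${node.type}">`;
      current += tag;
      openTags.push(tag);
      visit(node.content);
      openTags.pop();
      current += "</span>";
    }
  };

  visit(stream);
  lines.push(current);
  return lines;
}
```

Because every line is a self-contained HTML string, an update can compare old and new lines and only touch the ones that changed; the diffing step itself is not shown here.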

The only behavioral change I've made to the tokenization algorithm is skipping empty matches. Empty matches can cause an infinite loop, which is why PrismJS has this failsafe. By skipping empty matches, an infinite loop becomes impossible, which means we can remove the failsafe, along with the linked list's length, since nothing else uses it.
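Below is a heavily simplified, Prism-flavoured sketch (not Prism's `matchGrammar` and not PCE's code) of what skipping empty matches looks like: an empty match only advances `lastIndex` and never becomes a token, so every emitted token consumes at least one character. The grammar format, `[type, RegExp]` pairs applied in order, and the names `Segment` and `tokenize` are assumptions for illustration.

```typescript
// A token produced by the sketch; plain strings are untokenized text.
type Segment = string | { type: string; content: string };

// Simplified Prism-style tokenizer: each [type, pattern] pair, in order,
// splits the plain-string segments left over by the previous patterns.
// Empty matches are skipped, so every token consumes at least one character.
function tokenize(text: string, grammar: Array<[string, RegExp]>): Segment[] {
  let segments: Segment[] = [text];
  for (const [type, source] of grammar) {
    // Fresh global copy so lastIndex state never leaks between calls.
    const pattern = new RegExp(source.source, source.flags.replace("g", "") + "g");
    const next: Segment[] = [];
    for (const segment of segments) {
      if (typeof segment !== "string") {
        next.push(segment); // already tokenized by an earlier pattern
        continue;
      }
      let pos = 0;
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(segment)) !== null) {
        if (match[0] === "") {
          pattern.lastIndex++; // skip the empty match instead of emitting it
          continue;
        }
        if (match.index > pos) next.push(segment.slice(pos, match.index));
        next.push({ type, content: match[0] });
        pos = match.index + match[0].length;
      }
      if (pos < segment.length) next.push(segment.slice(pos));
    }
    segments = next;
  }
  return segments;
}
```

A pattern such as `/\d*/` matches the empty string at almost every position; here those matches are simply stepped over, so no separate failsafe is needed to stop the loop.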

I also rewrote the tokenization algorithm slightly to use a singly linked list instead of a doubly linked list, but this optimization doesn't change the behavior of the tokenizer at all.
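To illustrate why a singly linked list is enough, here is a hypothetical sketch: while walking the list the tokenizer already holds the previous node, so splicing a matched string into `[before, token, after]` only requires rewriting forward `next` pointers. `ListNode`, `fromArray`, `replaceNext`, and `toArray` are illustrative names, not PCE's internals.

```typescript
interface ListNode<T> {
  value: T;
  next: ListNode<T> | null;
}

// Builds a list behind a sentinel head so every real node has a predecessor.
function fromArray<T>(values: T[], sentinel: T): ListNode<T> {
  const head: ListNode<T> = { value: sentinel, next: null };
  let tail = head;
  for (const value of values) {
    tail.next = { value, next: null };
    tail = tail.next;
  }
  return head;
}

// Replaces the node after `prev` with new nodes holding `values`.
// Only forward links change; there are no back pointers to fix up.
function replaceNext<T>(prev: ListNode<T>, values: T[]): ListNode<T> {
  const rest = prev.next ? prev.next.next : null;
  let cursor = prev;
  for (const value of values) {
    const node: ListNode<T> = { value, next: null };
    cursor.next = node;
    cursor = node;
  }
  cursor.next = rest;
  return cursor; // last inserted node, or `prev` if `values` was empty
}

// Collects the values after the sentinel head.
function toArray<T>(head: ListNode<T>): T[] {
  const out: T[] = [];
  for (let node = head.next; node; node = node.next) out.push(node.value);
  return out;
}
```

Returning the last inserted node lets a caller continue scanning from the right place after a split.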

Keeping things snappy beyond 1k LoC would require both virtualization and incremental parsing. Virtualization is something I've avoided doing since it would massively increase the complexity of nearly everything, defeating the purpose of this library IMO.

Regarding incremental tokenization, I'd say it's nearly impossible. Prism's regex based grammars are very simple and flexible. Doing incremental parsing with them would be anything but simple, and not worth it.

0 replies

panoply · 2024-07-15T18:55:43Z
Author

Thanks for getting back to me on this.

I really like what you've done here. I started something similar some time ago, which is still very much WIP (see papyrus.js.org; repo: https://github.com/panoply/papyrus). I found it interesting to see the approaches we have in common, but of course Papyrus is not nearly as fluid and well composed as PCE. I think there is a great need for a solution of this nature.

I looked into bringing the work you've done in PCE over to Papyrus, but it is currently not viable, mainly because of the underlying approach I've taken so that the solution can work in Node environments, along with the rendering logic and extended grammars I have in place (which are purely to my personal taste).

> Keeping things snappy beyond 1k LoC

FWIW, with 2k LoC in PCE, things remained snappy for me.

> Prism's regex based grammars are very simple and flexible. Doing incremental parsing with them would be anything but simple, and not worth it.

Indeed.

8 replies

panoply Jul 16, 2024
Author

Hydration can be achieved in a few different ways here. It's relatively inexpensive to perform in a build environment; on the server it's a different question. I'll respond in more detail tomorrow, but one possible route might be to leverage the DOM. Given that the tree will require traversal during hydration, it might be an idea to provide a structure via annotation.

Looking forward to thinking more deeply about this.

jonpyt Jul 17, 2024
Maintainer

> It's relatively inexpensive to perform in a build environment, different question on the server.

Constructing the HTML string for an editor will be very fast. Shouldn't impact server response times during SSR much at all.

> it might be an idea to provide a structure via annotation.

I don't think I understand why we need any annotations in the DOM.

Anyway, I looked into using this API with Astro, and it looks very promising.

```astro
---
import { renderEditor } from "prism-code-editor/ssr"
import "prism-code-editor/prism/languages/tsx"
import "prism-code-editor/layout.css"
import "prism-code-editor/scrollbar.css"
import "prism-code-editor/themes/github-dark.css"

const editor = renderEditor(Astro.props)
---
<Fragment set:html={editor} />

<script>
  import { mountAll } from "prism-code-editor/client"
  import "prism-code-editor/prism/languages/tsx"

  const editors = mountAll(document.body, options => [
    // Extensions here
  ])
</script>
```

This uses the set:html directive to display the editor and a client-side script to do the hydration. With a Fragment we don't even need a wrapper element to add the HTML to.

When using this Astro component, there would be zero layout shifts. You can even do the hydration in a dynamic import, so almost zero JavaScript gets downloaded before the page loads.

panoply Jul 17, 2024
Author

> encode it in a data attribute

> reconstruct the options used to create the editor by reading the DOM.

> We could add a data attribute to hydrated editors

Referring to these operations when I mentioned annotation: hydration needs additional context, so passing data within attributes makes the most sense.

> To handle extensions, the hydrate functions could receive a callback function.

> Lastly, how could we handle the onTokenize, onUpdate, and onSelectionChange options? It's obviously impossible to encode a function in an HTML string

This is where things get a little tedious IMO, but maybe some restrictions are good for the sake of sanity. Still, there could be situations where developers create multiple instances, each differing in the features and capabilities it applies. If that route were taken, the only remotely feasible way would be to compose an object representation and possibly compress it with an LZ-based algorithm, but that feels really "hacky"...
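If options were encoded in a data attribute, a minimal sketch might look like the following. It assumes plain JSON with no LZ compression and a hypothetical `data-options` attribute; `escapeAttr`, `unescapeAttr`, `optionsAttribute`, and `readOptions` are illustrative names. Note that `JSON.stringify` silently drops function values such as `onUpdate`, which is why callbacks would still have to be supplied on the client.

```typescript
// Escapes the characters that matter inside a double-quoted HTML attribute.
const escapeAttr = (value: string): string =>
  value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");

// Reverse of escapeAttr; "&amp;" is decoded last so "&amp;quot;" round-trips.
const unescapeAttr = (value: string): string =>
  value.replace(/&lt;/g, "<").replace(/&quot;/g, '"').replace(/&amp;/g, "&");

// Serializes the JSON-safe part of the options (hypothetical data-options
// attribute). Function values such as onUpdate are dropped by JSON.stringify.
function optionsAttribute(options: Record<string, unknown>): string {
  return `data-options="${escapeAttr(JSON.stringify(options))}"`;
}

// Parses a raw, still-escaped attribute value back into an options object.
function readOptions(rawValue: string): Record<string, unknown> {
  return JSON.parse(unescapeAttr(rawValue));
}
```

In a browser, `element.dataset.options` already returns the decoded string, so only `JSON.parse` would be needed on the client.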

panoply Jul 17, 2024
Author


I've not used Astro. When it comes to the virtual DOM, I'm using a different full-stack web framework, a successor to mithril.js that is not yet public (the author is still working on some things), so my knowledge of how these frameworks execute and handle hydration is limited. Hydration in the project I'm referring to is elegantly handled by walking the tree, with comments used to signal component hydration. That might be worth considering, though I don't know how viable it would be for PCE.
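For comparison, the comment-marker approach could be sketched roughly like this. It uses a hypothetical `VNode` shape instead of the real DOM so it runs anywhere, and the `hydrate:` comment prefix is an assumption; in a browser the same walk would run over `childNodes` or a `TreeWalker`.

```typescript
// Minimal DOM-like node shape (hypothetical) so the sketch runs without a browser.
type VNode =
  | { kind: "element"; tag: string; children: VNode[] }
  | { kind: "comment"; text: string }
  | { kind: "text"; text: string };

interface HydrationRoot {
  name: string; // component name taken from the marker comment
  node: VNode; // the element right after the marker
}

// Depth-first walk: a comment such as <!--hydrate:editor--> marks the next
// sibling element as the root of a component to hydrate.
function findHydrationRoots(root: VNode, prefix = "hydrate:"): HydrationRoot[] {
  const found: HydrationRoot[] = [];
  const walk = (siblings: VNode[]): void => {
    siblings.forEach((node, i) => {
      if (node.kind === "comment" && node.text.startsWith(prefix)) {
        const target = siblings[i + 1];
        if (target && target.kind === "element") {
          found.push({ name: node.text.slice(prefix.length), node: target });
        }
      } else if (node.kind === "element") {
        walk(node.children);
      }
    });
  };
  walk([root]);
  return found;
}
```

The markers cost nothing at render time and leave the element markup untouched, which is the main appeal over attribute-based annotation.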

I am, however, well versed in 11ty, and things are rather simple in that context.

jonpyt Jul 17, 2024
Maintainer

> Referring to these operations when I mentioned annotation. Hydration needs additional context, passing data within attributes makes the most sense

The textContent of the editor becomes the value option. tabSize is already added as an inline style. The language, lineNumbers, readOnly, and wordWrap options can all be determined by reading the class name of the scroll container. Only insertSpaces needs an annotation, as I mentioned earlier. A data attribute for it would work just fine.
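A hypothetical sketch of reading options back from the server-rendered markup as described above. The class names (`language-*`, `show-line-numbers`, `pce-readonly`, `pce-wrap`), the `tab-size` inline style, and treating a missing insert-spaces attribute as `true` are all assumptions for illustration, not prism-code-editor's confirmed markup; `value` would come from `textContent` and is left out.

```typescript
interface RecoveredOptions {
  language?: string;
  lineNumbers: boolean;
  readOnly: boolean;
  wordWrap: boolean;
  tabSize?: number;
  insertSpaces: boolean;
}

// Recovers options from the scroll container's class name, its inline style,
// and a data attribute for insertSpaces (all markup details are assumed).
function recoverOptions(
  className: string,
  style: string,
  insertSpacesAttr: string | null,
): RecoveredOptions {
  const classes = className.split(/\s+/);
  const language = classes.find(c => c.startsWith("language-"))?.slice("language-".length);
  const tabMatch = /tab-size:\s*(\d+)/.exec(style);
  return {
    language,
    lineNumbers: classes.includes("show-line-numbers"),
    readOnly: classes.includes("pce-readonly"),
    wordWrap: classes.includes("pce-wrap"),
    tabSize: tabMatch ? Number(tabMatch[1]) : undefined,
    // A missing attribute falls back to true here; that default is an assumption.
    insertSpaces: insertSpacesAttr !== "false",
  };
}
```

In a browser the arguments would be the container's `className`, its `style` attribute, and `dataset` value for the insert-spaces flag.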
