Replies: 2 comments 8 replies
-
| Hi @panoply. Thanks for showing interest! The tokenization algorithm here is very similar to the one in PrismJS, with only minor changes. Splitting into lines is done after tokenization. To avoid losing context when splitting, open tags must be closed and immediately reopened whenever a line break is found. You can see the logic for it here. Splitting the HTML into lines allows diffing when updating the DOM instead of replacing all the HTML, as many other code editors that use PrismJS or Highlight.js for syntax highlighting do. Here's the diffing algorithm. The only behavioral change I've made to the tokenization algorithm is skipping empty matches. Empty matches can cause an infinite loop, which is why PrismJS has this failsafe. By skipping empty matches, an infinite loop becomes impossible, which means we can remove this failsafe and also the length of the linked list, since it's then unused. I also rewrote the tokenization algorithm slightly to use a singly linked list instead of a doubly linked list, but this optimization doesn't change the behavior of the tokenizer at all. Keeping things snappy beyond 1k LoC would require both virtualization and incremental parsing. Virtualization is something I've avoided since it would massively increase the complexity of nearly everything, defeating the purpose of this library IMO. Regarding incremental tokenization, I'd say it's nearly impossible. Prism's regex-based grammars are very simple and flexible. Doing incremental parsing with them would be anything but simple, and not worth it. |
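For readers following along, here is a minimal sketch of the two ideas described above: closing and reopening tags at each line break, and comparing the resulting lines to find what changed. It is not the library's actual code. The function names `splitHighlightedLines` and `changedLineRange` are hypothetical, and the sketch assumes the highlighted HTML contains only `<span ...>` and `</span>` tags, that text content is HTML-escaped, and that attribute values never contain `>`. The diff shown is a plain common-prefix/common-suffix comparison, which may differ from the diffing algorithm linked above.

```typescript
// Hypothetical helper (not the library's implementation): split highlighted
// HTML into lines that are each well-formed on their own.
function splitHighlightedLines(html: string): string[] {
  const lines: string[] = [];
  const openTags: string[] = []; // opening tags currently in effect, outermost first
  let current = "";
  let i = 0;

  while (i < html.length) {
    const ch = html[i];
    if (ch === "<") {
      // Assumption: "<" only ever starts a <span ...> or </span> tag.
      const end = html.indexOf(">", i);
      if (end < 0) throw new Error("Malformed HTML: unterminated tag");
      const tag = html.slice(i, end + 1);
      if (tag[1] === "/") openTags.pop();
      else openTags.push(tag);
      current += tag;
      i = end + 1;
    } else if (ch === "\n") {
      // Close every open tag so the finished line is self-contained...
      current += "</span>".repeat(openTags.length);
      lines.push(current);
      // ...and reopen the same tags so the next line keeps its context.
      current = openTags.join("");
      i++;
    } else {
      current += ch;
      i++;
    }
  }

  lines.push(current);
  return lines;
}

// Hypothetical helper: find the smallest range of lines that differs between
// two renders by trimming the common prefix and the common suffix.
function changedLineRange(
  oldLines: string[],
  newLines: string[]
): { start: number; oldEnd: number; newEnd: number } {
  const max = Math.min(oldLines.length, newLines.length);
  let start = 0;
  while (start < max && oldLines[start] === newLines[start]) start++;

  let oldEnd = oldLines.length;
  let newEnd = newLines.length;
  while (
    oldEnd > start &&
    newEnd > start &&
    oldLines[oldEnd - 1] === newLines[newEnd - 1]
  ) {
    oldEnd--;
    newEnd--;
  }

  // Lines [start, oldEnd) of the old render are replaced by
  // lines [start, newEnd) of the new render.
  return { start, oldEnd, newEnd };
}
```

With these assumptions, a multi-line comment token such as `<span class="token comment">/* a\nb */</span>` becomes two lines, each wrapped in its own `<span class="token comment">`, so only the lines inside the changed range need to be touched in the DOM.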
-
| Thanks for getting back to me on this. I really like what you've done here. I started something similar some time ago which is still very much WIP (see papyrus.js.org). I found it interesting to see the common approaches taken (repo: https://github.com/panoply/papyrus), but of course Papyrus is not nearly as fluid and well composed as PCE. I think there is a great need for a solution of this nature. I wanted to see about implementing the work you've done in PCE for Papyrus, but currently it is not viable, mainly because of the underlying approach I've taken so that the solution can work within Node environments, along with the rendering logic and extended grammars I have in place (which are solely for my personal tastes).
FWIW: I was able to have 2k LoC and things remained snappy in PCE.
Indeed. |
-
@FIameCaster
This is a well-done lib. I was looking to do something similar with PrismJS but never got around to exploring things in depth. I'm interested in the per-line tokenizing tactic you've got in place. At first glance, there is a good foundation, and extending the existing logic while keeping things snappy beyond 1k LoC is something I'd be interested in exploring, as it's both a challenge and an itch I've been wanting to scratch with Prism.
Given you've already delved into the lexing algorithm of PrismJS, I'd be curious to hear your thoughts on incremental updates, your experience with the refactors needed to get it where it is now, and any issues you encountered. One of the biggest perf hits with PrismJS is the grammars and its tokenization method. In my previous attempts, getting Prism to behave reactively at 1k - 2k LoC was difficult.
The per-line targeting of syntactics really piqued my interest.