Add a new proposal for how to organize things #7
Conversation
I am completely on board with this solution. I think it addresses all the concerns we've discussed very well.
Added just a few minor comments.
```
- `:lexical-preservation' If `t', assume the AST contains
  whitespace. If `nil', insert whitespace between forms. When
  parsing with `:lexical-preservation', you should unparse the
  same way. ")
```
Maybe we could also retain the `:lexical-preservation` value within the root node of the AST, in order to be able to advise/warn the user if they're trying to do something that doesn't make sense: trying to unparse with lexical preservation an AST that was generated without it, or the other way around.
That's a really good idea. I would even add it to each branch node. That way the unparser can be a single function that knows how to handle both cases.
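A minimal sketch of what tagging each branch node could look like, assuming a plist-based node representation; the names here are illustrative, not the actual parseclj API:

```elisp
;; Illustrative only: assumes nodes are plists with a :node-type key,
;; and records a :lexical-preservation flag on every branch node.
(defun my-make-branch-node (type children lexical-p)
  "Create a branch node of TYPE with CHILDREN, recording the parse mode."
  (list :node-type type
        :lexical-preservation lexical-p
        :children children))

(defun my-check-unparse-mode (node lexical-p)
  "Warn when unparsing NODE with a mode it was not parsed with."
  (unless (eq (plist-get node :lexical-preservation) lexical-p)
    (warn "AST was parsed with :lexical-preservation %s, unparsing with %s"
          (plist-get node :lexical-preservation) lexical-p)))
```

With the flag present on every branch node, a single unparse function can simply dispatch on it.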
```
of the newly created node.
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")
```
Curious why you went with different functions to handle lexical preservation instead of handling it as an optional argument.
This way the functions follow the signature that `parseclj-reduce` expects. Passing in options would either mean wrapping it in a closure (like here), or adding an extra options argument to all implementations (so we keep a uniform interface for the functions passed to `parseclj-reduce`).

That said, I'm warming up to the idea of passing any key-value arguments given to the top-level parsing functions straight on to these reducer functions. We need that already for `:tag-readers`, and might also want it for `:fail-fast`, although I think that one could be handled in `parseclj-reduce` itself (which I'd rename to `parseclj-parse`).
I understand, it makes sense.

I think passing `:fail-fast` to the reducers could be useful. It'd let implementations decide when and how to fail; our other option is to centralize failure and define a clear way for reducers to pass failure-like values to `parseclj-parse` so that it can throw the error.
If I understand correctly, what you are proposing by passing kv args to the reducer functions would be something like:
```elisp
(defun parseclj-ast (&rest args)
  (parseclj-reduce #'parseclj-ast--reduce-leaf
                   #'parseclj-ast--reduce-node
                   args))

(defun parseclj-reduce (reduce-leaf reduce-node extra)
  ;; ...
  (apply reduce-leaf stack opener children extra)
  ;; ...
  (apply reduce-node stack opener children extra))
```

(Note that `reduce-leaf` and `reduce-node` are function arguments here, so they are applied as values rather than via `#'`.)
If that's the case, I think it's a good way to go.
I see you went with centralizing options to `parseclj-parse-clojure` and `parseclj--reduce-coll`. Curious about your thoughts on ^.
hold on, I'm not done here.
You'll see in the current version that options are passed through to the reducers, which was always the plan. But errors which can be detected in the parser itself are handled there; otherwise we'd have to duplicate them across the different reducers.
Reducers could still error on additional things, though; e.g. the edn reader should error when non-edn features like quotes or metadata are used.
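For instance, an EDN reducer could reject Clojure-only node types up front. This is a hypothetical sketch, assuming the reducer receives the node type as an argument; the function name and the set of rejected types are illustrative:

```elisp
;; Hypothetical sketch: reject non-edn features inside the EDN reducer.
(defun my-edn-assert-valid-node (node-type)
  "Signal a user error when NODE-TYPE is not part of EDN."
  (when (memq node-type '(:quote :metadata :var :regex))
    (user-error "%s is not part of EDN" node-type)))
```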
DESIGN.md (outdated)
```
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")

(defun parse-clj-ast-value (node)
```
`parseclj-*`
👍
(using zsh's globbing and perl's rename utility)

```sh
sed s/clj-parse/parseclj/g -i **/*(.)
rename 's/clj-parse/parseclj/g' **/*(.)
```
To differentiate "leaf nodes" from "branch nodes".
This will need to be moved into its own package eventually.
Other than my open comment, lgtm
Make token keys keywords, as with nodes. Use :token-type instead of just 'type, analogous with :node-type. Provide similar helper functions for tokens and nodes.
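As a sketch of what that representation might look like (the exact key set here is an assumption, not the actual parseclj token shape):

```elisp
;; Hypothetical token shape after the change: a plist keyed by keywords,
;; with :token-type instead of a plain 'type symbol.
(let ((token '(:token-type :number :form "42" :pos 7)))
  (plist-get token :token-type))   ;; => :number
```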
This also includes a change where a reduction of :root returns a stack, rather than a single reduced result. This makes the interface of the reduce-branch functions consistent (they always return a stack).
…broken

Discard nodes were not being preserved. This change handles reductions of type :discard, but also adds a test case that demonstrates this breaks down in the face of whitespace (or other discard nodes). The reason is that the parser only considers the top two elements on the stack, but in the case of :lexical-preservation there could be several :whitespace and :comment nodes that need to be taken into account as well. We'll have the same problem when dealing with metadata.
When nodes on the stack can be "empty" (i.e. whitespace, comments), extra care needs to be taken to prevent e.g. a #_ from discarding the whitespace node that follows it in "#_ 1 2 3". This introduces parseclj--take-token and parseclj--take-value, which should also come in handy for implementing the three-element reduction rule for metadata literals.
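The idea can be sketched like this (illustrative names and node shapes, not the actual parseclj--take-* implementations): when a #_ reduction pops its value off the stack, whitespace and comment nodes above it are skipped over and kept:

```elisp
;; Illustrative sketch: drop the first non-"empty" node from STACK,
;; preserving any :whitespace/:comment nodes that sit above it.
(defun my-node-empty-p (node)
  (memq (plist-get node :node-type) '(:whitespace :comment)))

(defun my-discard-first-value (stack)
  "Return STACK with its first non-empty node removed."
  (let (kept)
    (while (and stack (my-node-empty-p (car stack)))
      (push (pop stack) kept))
    (pop stack)                          ; the value being discarded
    (nconc (nreverse kept) stack)))
```

Applied to the stack for "#_ 1 2 3", this drops the node for 1 while the whitespace nodes survive.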
This changes the signature of the reducers to also receive a final `options` argument. This is currently only used by parseedn to pass through the :tag-readers, but it could also be used by `parseedn` to bail on non-edn features when `:fail-fast` is enabled.
I've continued to convert this to the interface laid out in the design doc; there are still a few things missing (parseclj-ast-value, some parseedn stuff), but mostly it's there. This also implements :lexical-preservation (read and write). I ran into an interesting issue where something like "#_ 5" would end up discarding the whitespace, not the 5, so the parser had to become a bit smarter about how it does its reductions. This does mean some more groundwork is done for implementing metadata. I've also finally converted tokens to use keywords and …
@volrath I've added you as a collaborator. You should be able to merge this if you think it's ready to go in.
Took me a while, but looks good to me =)
There are several things to note here.
This splits the package into two: parseclj and parseedn. The first contains the lexer, parser backend, clojure-to-ast parser frontend, AST printer (unparser), and some utility functions for working with the AST.
The second depends on the first; it contains the EDN reader+writer, which makes use of the parser backend from parseclj.
I think this will make it a bit more intuitive for people to know what to expect from each, and it also means we can work on shipping parseclj first. I think there is significantly more opportunity for bikeshedding on the parseedn side.
Instead of having several different functions for similar things, the parser/reader/unparser/printer functions now take keyword arguments.
- `:fail-fast`: raise an error, or continue parsing?
- `:lexical-preservation` (clojure parser/unparser only): retain comments/whitespace?
- `:compat` (edn reader/printer only): mimic edn.el exactly
- `:tag-readers` / `:tag-writers` (edn reader/printer only): for handling tagged literals in EDN

Another difference that is a bit more explicit in this proposal is that parsing Clojure always parses until end-of-buffer/end-of-string, returning a list of parsed forms (or, to be precise, a `:root` node with a list of parsed forms), whereas `edn-read` (from edn.el) and Clojure's `read-string` read a single form. I think for EDN this makes sense, and I would like parseedn to do the same thing.
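A hedged sketch of what a call under this keyword-argument interface might look like; the exact option plumbing is an assumption on my part, not the settled API:

```elisp
;; Hypothetical invocation: parse a string to a :root node, keeping
;; whitespace/comments and collecting errors instead of raising.
(parseclj-parse-clojure "(+ 1 1) #_ 2"
                        :lexical-preservation t
                        :fail-fast nil)
```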
Finally, a note about EDN output formats: edn.el makes use of `cl-defstruct` to represent things like sets, insts, and uuids. Emacs Lisp doesn't really have structs, so `cl-defstruct` really just means representing things as a list with an initial symbol that designates the type. I would like to use this mechanism across the board for anything that can be ambiguous. This way we still get a lossless representation of Clojure data within Emacs's data modelling limitations.

The edn reader/writer can switch between this mode and what edn.el does (with `:compat`). If you do use this mode then you get a nice symmetry when using custom types: you can pass `:tag-readers` and `:tag-writers` to handle those. This is not possible with `edn.el` as it has no way to know what counts as a custom type.

I hope that makes sense; feel free to ask if I need to elaborate on any of these points.
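To illustrate the tagged-list representation (the struct definition here is illustrative; edn.el's actual definitions may differ):

```elisp
(require 'cl-lib)

;; With (:type list) :named, cl-defstruct yields a plain list whose
;; first element is a symbol designating the type.
(cl-defstruct (edn-set (:type list) :named) values)

(make-edn-set :values '(1 2 3))               ;; => (edn-set (1 2 3))
(edn-set-p (make-edn-set :values '(1 2 3)))   ;; => t
```

Because the type symbol is right there in the data, a writer can recognize such a value and hand it to the matching entry in `:tag-writers`.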