
Conversation

@plexus (Collaborator) commented Jul 23, 2017

There are several things to note here.

This splits the package into two: parseclj and parseedn. The first contains the lexer, parser backend, clojure-to-ast parser frontend, AST printer (unparser), and some utility functions for working with the AST.

The second depends on the first; it contains the EDN reader and writer, which make use of the parser backend from parseclj.

I think this will make it a bit more intuitive for people to know what to expect from each, and it also means we can work on shipping parseclj first. I think there is significantly more opportunity for bikeshedding on the parseedn side.

Instead of having several different functions for similar things, the parser/reader/unparser/printer functions now take keyword arguments:

  • :fail-fast: raise an error, or continue parsing?
  • :lexical-preservation (Clojure parser/unparser only): retain comments/whitespace?
  • :compat (EDN reader/printer only): mimic edn.el exactly
  • :tag-readers / :tag-writers (EDN reader/printer only): for handling tagged literals in EDN
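As an illustration, a hypothetical call using these options might look like the following (parseclj-parse-clojure is the entry-point name used later in this thread; the exact signature is an assumption, not settled API):

```elisp
;; Sketch only: assumes a keyword-argument entry point as described above.
;; The option names come from the list above; the function name matches
;; the parseclj-parse-clojure mentioned later in this discussion.
(parseclj-parse-clojure "(+ 1 1) ;; a comment"
                        :fail-fast nil
                        :lexical-preservation t)
```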

Another difference that is a bit more explicit in this proposal is that parsing Clojure always parses until end-of-buffer/end-of-string and returns a list of parsed forms (or, to be precise, a :root node containing a list of parsed forms), whereas edn-read (from edn.el) and Clojure's read-string read a single form.

```
Clojure 1.8.0
user=> (read-string "(1 2 3) (4 5 6)")
(1 2 3)
```

I think for EDN this makes sense, and I would like parseedn to do the same thing.
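For contrast, a sketch of the whole-buffer behavior described above; the node shape shown is an assumption, loosely based on the :node-type naming used elsewhere in this PR:

```elisp
;; Hypothetical: parsing the whole string yields a :root node whose
;; children are the individual forms, rather than a single form.
(parseclj-parse-clojure "(1 2 3) (4 5 6)")
;;=> (:node-type :root
;;    :children ((:node-type :list ...)
;;               (:node-type :list ...)))
```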

Finally, a note about EDN output formats: edn.el makes use of cl-defstruct to represent things like sets, insts, and uuids. Emacs Lisp doesn't really have structs, so cl-defstruct really just means representing things as a list with an initial symbol that designates the type.

```elisp
(edn-read "#{1 2 3}")
;;=> (edn-set (1 2 3))
(edn-read "#inst \"2017-07-23T15:52:09.516-00:00\"")
;;=> (edn-inst 22900 50729)
(edn-read "#uuid \"3596e8d6-5f69-48d0-a2f3-fc49bd948330\"")
;;=> (edn-uuid "3596e8d6-5f69-48d0-a2f3-fc49bd948330")
```

I would like to use this mechanism across the board for anything that can be ambiguous. That way we still get a lossless representation of Clojure data within Emacs's data-modelling limitations.

```elisp
"#{1 2 3}" => (edn-set (1 2 3))
"false"    => (edn-false)
"true"     => (edn-true)
;; The symbol `t`. Emacs's `t` is also really just a symbol.
"t"        => t
"(1 2 3)"  => (edn-list (1 2 3))
"#uuid \"255efd69-dec9-4428-9142-cebd5357fb2a\"" => (edn-uuid "255efd69-dec9-4428-9142-cebd5357fb2a")
```

The edn reader/writer can switch between this mode and what edn.el does (with :compat). If you do use this mode then you get a nice symmetry when using custom types, you can pass :tag-readers and :tag-writers to handle those. This is not possible with edn.el as it has no way to know what counts as a custom type.

```elisp
(parseedn-read "#my-tag {:foo :bar}"
               :tag-readers (a-list 'my-tag
                                    (lambda (val) `(my-custom-type ,val))))

(parseedn-print value
                :tag-writers (a-list 'my-custom-type
                                     (lambda (val)
                                       (insert (concat "#my-tag "
                                                       (parseedn-print-to-string val))))))
```

I hope that makes sense; feel free to ask if I need to elaborate on any of these points.

@volrath (Member) left a comment

I am completely on board with this solution. I think it addresses all the concerns we've discussed very well.

Added just a few minor comments.

```
- `:lexical-preservation' If `t', assume the AST contains
whitespace. If `nil', insert whitespace between forms. When
parsing with `:lexical-preservation', you should unparse the
same way. ")
```
Member:

Maybe we could also retain the :lexical-preservation value within the root node of the AST, in order to be able to advise/warn the user if they're trying to do something that doesn't make sense: trying to unparse with lexical preservation on an AST that was generated without it, or the other way around.

Collaborator (Author):

That's a really good idea. I would even add it to each branch node. That way the unparser can be a single function that knows how to handle both cases.
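A sketch of how storing the flag on each branch node could enable a single unparse function; names prefixed with my- are hypothetical, not part of the proposal:

```elisp
;; Hypothetical sketch: each branch node carries a :lexical-preservation
;; flag, so one unparser can dispatch per node instead of there being
;; two separate entry points.
(defun my-unparse-node (node)
  (if (plist-get node :lexical-preservation)
      (my-unparse-preserved node)   ; whitespace/comments are in the AST
    (my-unparse-generated node)))   ; insert whitespace between forms
```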

```
of the newly created node.
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")
```
Member:

Curious why you went with different functions to handle lexical preservation instead of handling it as an optional argument.

Collaborator (Author):

This way the functions follow the signature that parseclj-reduce expects. Passing in options would either mean wrapping it in a closure (like here), or adding an extra options argument to all implementations (so we keep a uniform interface for the functions passed to parseclj-reduce).

That said, I'm warming up to the idea of passing any key-value arguments passed to the top-level parsing functions straight on to these reducer functions. We need that already for :tag-readers, and might also want it for :fail-fast, although I think that one could be handled in parseclj-reduce itself (which I'd rename to parseclj-parse).

@volrath (Member) commented Jul 25, 2017

I understand, it makes sense.

I think passing :fail-fast to reducers could be useful. It'd let implementations decide when and how to fail; the other option is to centralize failure and define a clear way for reducers to pass failure-like values to parseclj-parse so that it can throw the error.

If I understand correctly, what you are proposing by passing kv args to the reducer functions would be something like:

```elisp
(defun parseclj-ast (&rest args)
  (parseclj-reduce #'parseclj-ast--reduce-leaf
                   #'parseclj-ast--reduce-node
                   args))

(defun parseclj-reduce (reduce-leaf reduce-node extra)
  ;; ...
  (apply #'reduce-leaf stack opener children extra)
  ;; ...
  (apply #'reduce-node stack opener children extra))
```

If that's the case, I think it's a good way to go.

@volrath (Member) commented Aug 1, 2017

I see you went with centralizing options to parseclj-parse-clojure and parseclj--reduce-coll. Curious about your thoughts on ^.

Collaborator (Author):

hold on, I'm not done here.

Collaborator (Author):

You'll see in the current version that options are passed through to the reducers, which was always the plan. But errors that can be detected in the parser itself are handled there; otherwise we'd have to duplicate them across the different reducers.

Reducers could still error on additional things, though; e.g. the EDN reader should error when non-EDN features like quotes or metadata are used.

DESIGN.md Outdated
```
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")

(defun parse-clj-ast-value (node)
```
Member:

parseclj-*

Collaborator (Author):

👍

@volrath (Member) commented Aug 1, 2017

Other than my open comment, lgtm

plexus added 8 commits August 2, 2017 14:34
  • Make token keys keywords, as with nodes. Use :token-type instead of just 'type, analogous with :node-type. Provide similar helper functions for tokens and nodes.
  • This also includes a change where a reduction of :root returns a stack, rather than a single reduced result. This makes the interface of the reduce-branch functions consistent (they always return a stack).
  • …broken Discard nodes were not being preserved. This change handles reductions of type :discard, but also adds a test case that demonstrates this breaks down in the face of whitespace (or other discard nodes). The reason is that the parser only considers the top two elements on the stack, but in the case of :lexical-preservation, there could be several :whitespace and :comment nodes that need to be taken into account as well. We'll have the same problem when dealing with metadata.
  • When nodes on the stack can be "empty" (i.e. whitespace, comments), extra care needs to be taken to prevent e.g. a #_ from discarding the whitespace node that follows it in "#_ 1 2 3". This introduces parseclj--take-token and parseclj--take-value, which should also come in handy to implement the three-element reduction rule for metadata literals.
  • This changes the signature of the reducers to also receive a final `options` argument. This is currently only used by parseedn to pass through the :tag-readers, but it could also be used by `parseedn` to bail on non-EDN features when `:fail-fast` is enabled.
@plexus (Collaborator, Author) commented Aug 2, 2017

I've continued to convert this to the interface laid out in the design doc; there are still a few things missing (parseclj-ast-value, some parseedn stuff), but mostly it's there.

This also implements :lexical-preservation (read and write). I ran into an interesting issue where something like "#_ 5" would end up discarding the whitespace, not the 5, so the parser had to become a bit smarter about how it does its reductions. This does mean some more groundwork is done for implementing metadata.

I've also finally converted tokens to use keywords and :token-type, so they're more in line with how we represent nodes.

@lambdaisland:
@volrath I've added you as a collaborator. You should be able to merge this if you think it's ready to go in.

@volrath (Member) commented Aug 19, 2017

took me a while, but looks good to me =)

@volrath volrath merged commit 168027f into master Aug 19, 2017
@vemv vemv deleted the reorganize-package branch September 11, 2023 15:28