
Conversation

@plexus (Collaborator) commented Jul 23, 2017

There are several things to note here.

This splits the package into two: parseclj and parseedn. The first contains the lexer, parser backend, clojure-to-ast parser frontend, AST printer (unparser), and some utility functions for working with the AST.

The second depends on the first; it contains the EDN reader and writer, which make use of the parser backend from parseclj.

I think this will make it a bit more intuitive for people to know what to expect from each, and it also means we can work on shipping parseclj first. I think there is significantly more opportunity for bikeshedding on the parseedn side.

Instead of having several different functions for similar things, the parser/reader/unparser/printer functions now take keyword arguments:

  • :fail-fast: raise an error, or continue parsing?
  • :lexical-preservation (Clojure parser/unparser only): retain comments/whitespace?
  • :compat (EDN reader/printer only): mimic edn.el exactly
  • :tag-readers / :tag-writers (EDN reader/printer only): for handling tagged literals in EDN
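As an illustration, a hypothetical call using these options might look like the following (parseclj-parse-clojure is the entry-point name used later in this thread; the exact signature is an assumption, not settled API):

```elisp
;; Sketch only: assumes a keyword-argument entry point as described above.
;; The option names come from the list above; the function name matches
;; the parseclj-parse-clojure mentioned later in this discussion.
(parseclj-parse-clojure "(+ 1 1) ;; a comment"
                        :fail-fast nil
                        :lexical-preservation t)
```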

Another difference that is a bit more explicit in this proposal is that parsing Clojure always parses until end-of-buffer/end-of-string and returns a list of parsed forms (or, to be precise, a :root node containing a list of parsed forms), whereas edn-read (from edn.el) and Clojure's read-string read a single form.

```
Clojure 1.8.0
user=> (read-string "(1 2 3) (4 5 6)")
(1 2 3)
```

I think for EDN this makes sense, and I would like parseedn to do the same thing.
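For contrast, a sketch of the whole-buffer behavior described above; the node shape shown is an assumption, loosely based on the :node-type naming used elsewhere in this PR:

```elisp
;; Hypothetical: parsing the whole string yields a :root node whose
;; children are the individual forms, rather than a single form.
(parseclj-parse-clojure "(1 2 3) (4 5 6)")
;;=> (:node-type :root
;;    :children ((:node-type :list ...)
;;               (:node-type :list ...)))
```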

Finally, a note about EDN output formats: edn.el makes use of cl-defstruct to represent things like sets, insts, and uuids. Emacs Lisp doesn't really have structs, so cl-defstruct really just means representing things as a list with an initial symbol that designates the type.

```elisp
(edn-read "#{1 2 3}")
;;=> (edn-set (1 2 3))
(edn-read "#inst \"2017-07-23T15:52:09.516-00:00\"")
;;=> (edn-inst 22900 50729)
(edn-read "#uuid \"3596e8d6-5f69-48d0-a2f3-fc49bd948330\"")
;;=> (edn-uuid "3596e8d6-5f69-48d0-a2f3-fc49bd948330")
```

I would like to use this mechanism across the board for anything that can be ambiguous. That way we still get a lossless representation of Clojure data within Emacs's data-modelling limitations.

```elisp
"#{1 2 3}" => (edn-set (1 2 3))
"false"    => (edn-false)
"true"     => (edn-true)
;; The symbol `t`. Emacs's `t` is also really just a symbol.
"t"        => t
"(1 2 3)"  => (edn-list (1 2 3))
"#uuid \"255efd69-dec9-4428-9142-cebd5357fb2a\"" => (edn-uuid "255efd69-dec9-4428-9142-cebd5357fb2a")
```

The edn reader/writer can switch between this mode and what edn.el does (with :compat). If you do use this mode then you get a nice symmetry when using custom types, you can pass :tag-readers and :tag-writers to handle those. This is not possible with edn.el as it has no way to know what counts as a custom type.

```elisp
(parseedn-read "#my-tag {:foo :bar}"
               :tag-readers (a-list 'my-tag
                                    (lambda (val) `(my-custom-type ,val))))

(parseedn-print value
                :tag-writers (a-list 'my-custom-type
                                     (lambda (val)
                                       (insert (concat "#my-tag "
                                                       (parseedn-print-to-string val))))))
```

I hope that makes sense; feel free to ask if I need to elaborate on any of these points.

@volrath (Member) left a comment

I am completely on board with this solution. I think it addresses all the concerns we've discussed very well.

Added just a few minor comments.

```
- `:lexical-preservation' If `t', assume the AST contains
whitespace. If `nil', insert whitespace between forms. When
parsing with `:lexical-preservation', you should unparse the
same way. ")
```
Member:

Maybe we could also retain the :lexical-preservation value within the root node of the AST, in order to be able to advise/warn the user if they're trying to do something that doesn't make sense: trying to unparse with lexical preservation on an AST that was generated without it, or the other way around.

Collaborator (Author):

That's a really good idea. I would even add it to each branch node. That way the unparser can be a single function that knows how to handle both cases.
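A sketch of how storing the flag on each branch node could enable a single unparse function; names prefixed with my- are hypothetical, not part of the proposal:

```elisp
;; Hypothetical sketch: each branch node carries a :lexical-preservation
;; flag, so one unparser can dispatch per node instead of there being
;; two separate entry points.
(defun my-unparse-node (node)
  (if (plist-get node :lexical-preservation)
      (my-unparse-preserved node)   ; whitespace/comments are in the AST
    (my-unparse-generated node)))   ; insert whitespace between forms
```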

```
of the newly created node.
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")
```
Member:

Curious why you went with different functions to handle lexical preservation instead of handling it as an optional argument.

Collaborator (Author):

This way the functions follow the signature that parseclj-reduce expects. Passing in options would either mean wrapping it in a closure (like here), or adding an extra options argument to all implementations (so we keep a uniform interface for the functions passed to parseclj-reduce).

That said, I'm warming up to the idea of passing any key-value arguments passed to the top-level parsing functions straight on to these reducer functions. We need that already for :tag-readers, and might also want it for :fail-fast, although I think that one could be handled in parseclj-reduce itself (which I'd rename to parseclj-parse).

@volrath (Member) commented Jul 25, 2017

I understand, it makes sense.

I think passing :fail-fast to reducers could be useful. It'd let implementations decide when and how to fail; the other option is to centralize failure and define a clear way for reducers to pass failure-like values to parseclj-parse so that it can throw the error.

If I understand correctly, what you are proposing by passing kv args to the reducer functions would be something like:

```elisp
(defun parseclj-ast (&rest args)
  (parseclj-reduce #'parseclj-ast--reduce-leaf
                   #'parseclj-ast--reduce-node
                   args))

(defun parseclj-reduce (reduce-leaf reduce-node extra)
  ;; ...
  (apply #'reduce-leaf stack opener children extra)
  ;; ...
  (apply #'reduce-node stack opener children extra))
```

If that's the case, I think it's a good way to go.

@volrath (Member) commented Aug 1, 2017

I see you went with centralizing options to parseclj-parse-clojure and parseclj--reduce-coll. Curious about your thoughts on ^.

Collaborator (Author):

hold on, I'm not done here.

Collaborator (Author):

You'll see in the current version that options are passed through to the reducers, which was always the plan. But errors that can be detected in the parser itself are handled there; otherwise we'd have to duplicate them across the different reducers.

Reducers could still error on additional things, though; e.g. the EDN reader should error when non-EDN features like quotes or metadata are used.

DESIGN.md Outdated
```
This implementation handles `:discard' nodes (#_), for other node
types it delegates to `parseclj-ast--reduce-branch'.")

(defun parse-clj-ast-value (node)
```
Member:

parseclj-*

Collaborator (Author):

👍

@volrath (Member) commented Aug 1, 2017

Other than my open comment, lgtm

plexus added 8 commits August 2, 2017 14:34
  • Make token keys keywords, as with nodes. Use :token-type instead of just 'type, analogous with :node-type. Provide similar helper functions for tokens and nodes.
  • This also includes a change where a reduction of :root returns a stack, rather than a single reduced result. This makes the interface of the reduce-branch functions consistent (they always return a stack).
  • …broken Discard nodes were not being preserved. This change handles reductions of type :discard, but also adds a test case that demonstrates this breaks down in the face of whitespace (or other discard nodes). The reason is that the parser only considers the top two elements on the stack, but in the case of :lexical-preservation, there could be several :whitespace and :comment nodes that need to be taken into account as well. We'll have the same problem when dealing with metadata.
  • When nodes on the stack can be "empty" (i.e. whitespace, comments), extra care needs to be taken to prevent e.g. a #_ from discarding the whitespace node that follows it in "#_ 1 2 3". This introduces parseclj--take-token and parseclj--take-value, which should also come in handy to implement the three-element reduction rule for metadata literals.
  • This changes the signature of the reducers to also receive a final `options` argument. This is currently only used by parseedn to pass through the :tag-readers, but it could also be used by `parseedn` to bail on non-EDN features when `:fail-fast` is enabled.
@plexus (Collaborator, Author) commented Aug 2, 2017

I've continued to convert this to the interface laid out in the design doc; there are still a few things missing (parseclj-ast-value, some parseedn stuff), but mostly it's there.

This also implements :lexical-preservation (read and write). I ran into an interesting issue where something like "#_ 5" would end up discarding the whitespace, not the 5, so the parser had to become a bit smarter about how it does its reductions. This does mean some more groundwork is done for implementing metadata.

I've also finally converted tokens to use keywords and :token-type, so they're more in line with how we represent nodes.

@lambdaisland:
@volrath I've added you as a collaborator. You should be able to merge this if you think it's ready to go in.

@volrath (Member) commented Aug 19, 2017

took me a while, but looks good to me =)

@volrath volrath merged commit 168027f into master Aug 19, 2017
@vemv vemv deleted the reorganize-package branch September 11, 2023 15:28