Skip to content

Conversation

volrath
Copy link
Member

@volrath volrath commented Jul 5, 2017

This refactor includes the following:

  • Make parser produce an AST by default.
  • Switch to lexical binding and fix some broken var references.
  • Reorganize reduce functions a bit.
  • Rename some private functions to follow the -- convention.
  • Separate all test cases.

It also includes two printers for the AST, one that goes to elisp code (ala edn.el) and one that returns EDN/Clojure strings.

volrath added 3 commits July 5, 2017 20:06
- Make parser produce an AST by default. - Switch to lexical binding and fix some broken var references. - Reorganizes reduce functions' signatures a bit.
All tests are separated now.
@plexus
Copy link
Collaborator

plexus commented Jul 6, 2017

Thanks! I left some initial comments, but I'll have to have another look with a fresh mind.

Could you look into why the tests are failing on Travis?

@plexus
Copy link
Collaborator

plexus commented Jul 6, 2017

Actually im going to start with writing up an overview of all the use cases this library aims to support, that should hopefully clarify some of the design decisions in there.

@volrath
Copy link
Member Author

volrath commented Jul 6, 2017

It's failing because of a.el, I was hoping it was going to be approved by now =/. I could drop it though, or should we wait?

@lambdaisland
Copy link

lambdaisland commented Jul 6, 2017 via email

@volrath
Copy link
Member Author

volrath commented Jul 7, 2017

Cloning it for the repo seems like a good idea meanwhile. Let me know when you have that overview, should be useful.

btw

Thanks! I left some initial comments,

Couldn't find them 😕

(with-temp-buffer
(insert ,test-string)
(goto-char 1)
(should (equal (,parse-to-fn) ,expected)))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of wrapping the whole ert-deftest in a macro I'm thinking it might be better to only have a macro for the inner part, and still use regular ert-deftest. I'm thinking of using the same approach for the integration tests in unrepl.el.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the unrepl-deftest macro? It does include ert-deftest in it, I was actually copying it when I did this macro 😅, although I agree it's better if we just do the internal part. I'll fix it.

clj-parse.el Outdated
(let ((stack nil))

(while (not (eq (clj-lex-token-type token) :eof))
(while (not (eq (clj-lex-token-type (setq token (clj-parse--next))) :eof))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser should not throw away whitespace, that's the job of the reducing functions (if they choose to do so). Otherwise the parser is no longer able to generate an AST that can round-trip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I left you a comment about it in the clj-parse--next function definition. I was on the fence about this as well.

(let ((node (pop stack)))
(funcall reduceN stack node coll))
;; Syntax error
(error "Syntax Error"))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser should not throw syntax errors. Instead it should continue parsing and return a partial result. You can inspect the result afterwards to find syntax errors, or raise them in the reducer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "the reducer", do you mean the reduce1/reduceN functions? or the "to-elisp" / "to-clojure-string" functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, reduce1 and reduceN, which maybe should be called reduce-leaf and reduce-node. Or even shift-leaf and reduce-node, since the first one will typically "shift" a single leaf node onto the stack, whereas the second one "reduces" a number of nodes into a single node.

you can think of parsing as three layers working together: the lexer, "the parser", "the reducer". The last part is pluggable to support different outputs.

to-clojure-string would be "the printer"

@plexus
Copy link
Collaborator

plexus commented Jul 7, 2017

Ugh this github review thing is confusing :( they should be there now.

@volrath
Copy link
Member Author

volrath commented Jul 7, 2017

That was quick! thanks.

clj-parse.el Outdated
(setq next (clj-lex-next))
(while (eq (clj-lex-token-type next) :whitespace)
(setq next (clj-parse--next)))
next)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this to simplify clj-parse--ast-reduce1, and it's ok as long as we only use this parser to produce AST, but if we want to use the clj-parse--reduce algorithm to produce CST (not entirely sure we would want to do that) then we can get rid of it and handle whitespaces when reducing leafs.

clj-parse.el Outdated
(:lparen (clj-parse--make-node :list subnodes))
(:lbracket (clj-parse--make-node :vector subnodes))
(:set (clj-parse--make-node :set subnodes))
(:lbrace (clj-parse--make-node :map subnodes))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left maps as a collection of subnodes so that printers would transform them into whatever they need, but maybe we want kv alists in our AST instead?

clj-parse.el Outdated
(:lbracket (clj-parse--make-node :vector subnodes))
(:set (clj-parse--make-node :set subnodes))
(:lbrace (clj-parse--make-node :map subnodes))
(:discard (clj-parse--make-node :discard subnodes)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For :discard, the subnodes list only has one element, always. Right now the resulting node for a discard token includes this 1-sized list as the list of subnodes, but we could also get rid of the list and just add the its internal node (the discarded element). I'm guessing this depends on implementation details for the AST transversing funtions/API.

Same for :tag nodes.

(with-temp-buffer
(insert ,test-string)
(goto-char 1)
(should (equal (,parse-to-fn) ,expected)))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the unrepl-deftest macro? It does include ert-deftest in it, I was actually copying it when I did this macro 😅, although I agree it's better if we just do the internal part. I'll fix it.

clj-parse.el Outdated
(let ((stack nil))

(while (not (eq (clj-lex-token-type token) :eof))
(while (not (eq (clj-lex-token-type (setq token (clj-parse--next))) :eof))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I left you a comment about it in the clj-parse--next function definition. I was on the fence about this as well.

(let ((node (pop stack)))
(funcall reduceN stack node coll))
;; Syntax error
(error "Syntax Error"))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "the reducer", do you mean the reduce1/reduceN functions? or the "to-elisp" / "to-clojure-string" functions?

(ert-deftest clj-parse-to-elisp-simple-list ()
(clj-parse-eq-test clj-parse-to-elisp
"(1 2 3)"
'((1 2 3))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I don't know if this is the best way to indent this.

@volrath
Copy link
Member Author

volrath commented Jul 7, 2017

Turns out I didn't send my review either...... it is confusing.

@volrath
Copy link
Member Author

volrath commented Jul 9, 2017

@plexus I did some adjustments regarding the DESIGN.md document to help land this PR into master and keep working on the rest of the implementation in further PRs. So far we have:

  • clj-parse-ast
  • clj-parse-ast-print
  • clj-parse-edn

I rewrote tests again, and this time I went with including ert-deftest into a macro (similar to the unrepl-deftest) because I wanted to generate different named tests for each parsing mode.

Regarding :discard nodes: before, I was adding them to the AST as any other node, now I'm completely ignoring them in the AST construction (similar to the elisp reducers) because, tbh, I didn't find an elegant way to handle the case where there are two #_ in a row. "(1 #_#_ 2 3) should produce (1), but if we want to represent that in a AST, we would have to have something like:

- :root - :list - :number 1 - :discard - :number 2 - :discard - :number 3 
@plexus
Copy link
Collaborator

plexus commented Jul 12, 2017

When unsure about how to represent things like #_ or ^, just imagine them with lisp syntax

#_ #_ 2 3 => (discard (discard 2) 3)

That said discards are like whitespace or comments: they don't add any semantics and so should not be represented in the AST mode. In the whitespace aware source mode they should be present.

@plexus plexus merged commit 697618d into clojure-emacs:master Jul 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants