Skip to content

Conversation

@taku0
Copy link
Member

@taku0 taku0 commented Jan 18, 2015

This patch updates most of indentation logic.

The current implementation defines a complex syntax for SIME and depends on the indentation logic of SMIE. Unfortunately, SMIE is a black magic and controlling SMIE is very hard, so that results in a fragile and inconsistent logic:

func foo() { foo .bar() // :( foo .bar() }
CGPoint(x: aaaaaaaaaaaaaaa.x + bbbbbbbbbbbbbbbb, // 1 space y: aaaaaaaaaaaaaaa.y + bbbbbbbbbbbbbbbb) // 2 spaces
class Foo: Bar, Baz { // 4 spaces from "class" } class Foo: Bar, Baz { // aligns with "Foo" }
a .b // 2 spaces if (a .b) { // 1 spaces }
protocol Foo { func foo() func bar() // :( }
public class Foo { func foo() { // :( } // :( } // :(
let a = [ [ // :( a ] ]
UIView.animateWithDuration(1.0, animations: { foo.center = CGPoint(x: x, y: y) // :( } // :( )
protocol Foo { func foo() -> Foo<(A, B), [C]> // :( }

This patch defines a simple syntax enough for indentation and use a simple manual logic for most cases.

The new logic also supports different wrapping styles:

let x = 1 + 1 + 1 let x = 1 + 1 + 1
timer = NSTimer.scheduledTimerWithTimeInterval(1.0, target: self, selector: Selector("onTimer"), userInfo: nil, repeats: true) timer = NSTimer.scheduledTimerWithTimeInterval( 1.0, target: self, selector: Selector("onTimer"), userInfo: nil, repeats: true )
let x = [ foo: bar, foo : bar ]
func a() -> [a] { a } func a() -> [a] { a }

The new logic passes all existing tests and new tests (on Emacs 24.4.1).

Sorry for a huge monolithic patch.

@taku0
Copy link
Member Author

taku0 commented Jan 18, 2015

The failure of Travis CI seems to be a problem in ELPA package dependencies since make check in a clean environment fails with Dependency flycheck-cask failed to install: Package let-alist-1.0.1' is unavailable. By removing flycheck-caskandflychecktemporally from theCask file,make check` passes all tests.

@ap4y
Copy link
Collaborator

ap4y commented Jan 18, 2015

Hi taku0, thanks for your contribution. I kind of agree that smie is a bit complex and I think documentation is rather lackluster. I felt the same way as you are when I started working with it, but at this point I think that smie does a better job of handling scopes, which makes your indentation more robust. I also think that with smie you need significantly less code to achieve better indentation and a result is a much cleaner code. One of the best indentations at the moment IMO is in ruby mode and it's based on smie. Comparing it to the manually implemented indentation in go and rust modes (other 2 methods I use a lot) it looks way more consistent.

The downside (not sure if it's a downside though) is that it has default style of indentation that obviously doesn't fit every user. For each grammar you can have different token position (or maybe formatting style will more correct here) which will require tweaks to the smie indentation rules. For example superclass position in class definition, braces position in statements, dot position in multi-line statements etc can have different styles. Obviously smie can't handle everything by default, so you have to account to it. On the other hand it doesn't require you to change grammars and lexer (in most cases) thanks to the design of the smie. It will track your scopes properly but it doesn't know if you want relative position or not, if you want default indentation offset or not. If you check ruby mode for example, it has a simple well defined grammars and quite a few indentation rules. Unfortunately, it's a hard task to handle every possible style of code formatting. I think it's a collaborative and iterative task, so pull requests and issue reports will help a lot.

I'm not attached to the current implementation of smie by any means, so I keen to hear other opinions. I also keen to collaborate on this problem and I will try to help. I can answer the questions about smie as I understand it, which may not be correct sometimes :). Also I saw in mailing list that author of smie is keen to hear from users about the ways he can improve documentation and make it easier to use.

@taku0
Copy link
Member Author

taku0 commented Jan 19, 2015

Hi ap4y. The new code still based on SMIE and using its great power. SMIE is also getting better, as many strange behaviors I met are fixed in Emacs 24.4.1. What I wrote is variants of smie-indent--parent and smie-indent-keyword with explicit parent tokens.

It is best to achieve same level of indentation quality only with SMIE functions, but I cannot do it for the moment. If the new code does not match the taste of the project, please use only the test code and the lexer. It will help.


The problem I could not solve is a problem with "parent". Assume the following code:

func foo() { bar1 .baz1(); bar2 .baz2(); ; }

The smie-indent--parent for the first "." token is the "{" token while one for the second "." token is the first ";" token. To place the "{" token at the beginning of the line, smie-indent-virtual for the "{" token must return 0. For the ";" token, smie-indent-virtual should return 4 to indent the third ";" token. As the result, returning 2 for (:before . ".") rule, which means an offset from its parent, come down to a zigzag indentation.

The problem of the following code results from the same issue:

a .b // 2 spaces if (a .b) { // 1 spaces }

How to solve this problem?

The strategy I used is indent them based on the token before the parent token, that is bar1 for the first "." token and bar2 for the second "." token. The bar1 and bar2 tokens are indented with :after rule for "{" and ";" token.

Another problem is opener-closer. Assume the following code:

func foo() { } // point A let x = { } // point B

Invoking backward-sexp at the point A with the current swift-mode.el, the cursor backs to the "func" token, since the syntax is defined as (class-level-st ("DECSPEC" "func" func-header "{" insts "}")) and "DECSPEC" is the opener. In contrast, invoking backward-sexp at the point B, the cursor cannot stop at "{" token since it is not a opener. A token cannot be both "opener" and "neither" token.

I restricted openers and closers to true parenthesis (i.e. (), [], {}, ) to keep things simple.

We cannot write ("DECSPEC" "func" func-header block) since a non-terminal token cannot follow another non-terminal token, so that I break it into two non-terminals: (func (func-modifiers "func" params)) and (block ("{" insts "}")). This separation had a side-effect that the parent of the "{" token in the following code is the "," token rather than "class" or preceding ";".

class Foo: Bar, Baz { }

To indent the "{" token, we need to access the "class" or the ";" token, that is an ancestor token but SMIE only provides smie-indent--parent. We could resolve this by writing a carefully crafted BNF, but I rather wrote a function to access ancestor tokens we are interested in.

@ap4y
Copy link
Collaborator

ap4y commented Jan 19, 2015

Ok, I will try to answer the problems as I see it.

For the first example with multi-line dot expressions. This one is rather straightforward, I would try to explain it from the smie perspective. For this particular example BNF grammar defined as (reduced example):

("DECSPEC" "func" func-header "{" insts "}") (insts (inst) (insts ";" insts))

In this particular case we are interested in internal scope rules, which means that the parent of first statement will be always {, all other statements will have ;. This is exactly what is described in your example.

Checking through the defined rules, you will notice that we are overriding only rules for multi-line expression, by giving it (smie-rule-parent swift-indent-multiline-statement-offset). As you mentioned default rules produce incorrect behavior, specifically base offset of the { token is not the same as offset of the ; token. This makes sense too, since all ; tokens will be descendants of the {. There are several solutions to this problem. We could modify multi-line expression rule to have additional offset for the case when . is a descendant of {. Better solution will be to establish consistent baseline inside the scope. To do that we can add implicit semicolon for the { token and set to this new implicit semicolon proper offset. I did something like that:

((smie-rule-parent-p "{") (smie-rule-parent swift-indent-offset))

Second example. For the statements with "headers" (like conditional, loops and class definition) currently lexer doesn't provide any information, so there is no anchor in rules to create offset relatively to start of the header. I just scanned through ruby-mode source code and apparently they tokenise it as a @ symbol. To be honest I didn't account for such style of indentation, so I didn't implement this rules in lexer. It seems like it will be relatively easy to add this additional token and write new rules against it.

I couldn't reproduce problems with sexp movements on the current master branch, so not sure what to say about it. Let me know if you have any question and you need other help.

Coming back to your PR, I again really appreciate your contributions, it's a tremendous amount of work. I absolutely agree that tests and some corrections have to be merged. I'm not ready to merge it the whole commit though. I'm a bit concerned about BNF grammars, it looks like they are rather simplistic, especially for statements. I'm also a bit concerned about overall increased complexity. On the other hand I agree that your code handles more cases correctly at the moment. Again I'm not strictly opposed merging it, but would like you get other opinions. cc @bbatsov.

@taku0
Copy link
Member Author

taku0 commented Jan 20, 2015

Thank you for answering. Inserting implicit semicolon seems reasonable solution, but yet magical (should we insert implicit commas for lists? Is it works for other cases?). It also has problems with some minor cases:

With current implementation, indenting at "|" (which represents the cursor as check-indentation macro):

func foo() { bar(); bar(); |bar(); }

results in:

func foo() { bar(); bar(); |bar(); }

which says "if the user indents the previous line manually, follow it". This behavior is seen in many Emacs modes or other editor/IDEs and important to cope with indentation bug in editors.

By contrast, indenting the following example:

func foo() { bar(); baz |.bar(); }

results in:

func foo() { bar(); baz |.bar(); }

instead of:

func foo() { bar(); baz |.bar(); }

which is inconsistent with previous example. This arises from indenting based on the ";" token. Although this problem itself is not critical, I believe that indenting based on the beginning of the statement is more natural than indenting based on the parent.


For the second example, the problem is not related headers:

foo(); a .b // 2 spaces from smie-indent-virtual of ";", that is the first column. (a .b) // 2 spaces from smie-indent-virtual of "(", // results in 1 space from the "a" token.

The above example should be indented like this (ruby-mode with ruby-use-smie nil):

(a .b)

or this (c-mode):

(a .b)

at least, this indentation is not desirable:

(a .b, a .b)

I couldn't reproduce problems with sexp movements on the current master branch

I should have used class instead of func (there is also problem with func). Using commit 250e8e4 (HEAD) of master branch with Emacs 24.4.1:

foo(); class Foo { } // backward-sexp goes to "class" token let x = { } // backward-sexp goes to beginning of the buffer func foo() { } // backward-sexp goes to beginning of the buffer

I'm not sure that this is a essential problem of SMIE or just a bug in BNF.

This results in "Mismatched parenthesis" or incorrect indentation because smie-indent-close depends on backward-sexp:

class Foo { func foo() { // the beginning of the line is highlighted when closing the function } // :( }

I'm a bit concerned about BNF grammars, it looks like they are rather simplistic, especially for statements.

Indeed, the new code cannot indent some kind of statements below. We need to expand the BNF and/or the rule function.

// expected indents for var i = 0; i < 10; i++ { foo() } for i = 0, j = 0; i < 10; i++ { foo() } for foo in foos { } return 1 // actual indents for var i = 0; i < 10; i++ { foo() } for i = 0, j = 0; i < 10; i++ { foo() } for foo in foos { } return 1
@ap4y
Copy link
Collaborator

ap4y commented Jan 21, 2015

Hey @taku0, sorry for the delay.

I kind of agree that placing implicit semicolon may feel a bit weird in some places. Like in this particular example with { I mentioned it as one of the solution for consistent baseline for parent element in this particular scope. It could have been resolved a bit differently, but it could result in slightly more complex rules. It may be result of not investing enough time from my side, I found resetting indentation for descendants of ; is a bit hard. I will try again and maybe I will come up with a better solution.

I also noticed inconsistencies with parenthesis () and I'm pretty sure ruby-mode has the same behavior/quirk. I haven't investigated it thoroughly yet.

What my current plan is to cherry pick all your tests and try to fix it with pure smie grammar/lexer/rules. It will take some time, but I hope to come up with some result within a week. I hope this will help us to continue this discussion. I'll let you know how it goes asap. I'm not a smie master by any means and had issues with it too, but I feel like it should be flexible enough to handle cases you mentioned. Thanks again for finding so much indentation inconsistencies, fixing those will significantly improve this mode.

@bbatsov
Copy link
Collaborator

bbatsov commented Jan 21, 2015

What my current plan is to cherry pick all your tests and try to fix it with pure smie grammar/lexer/rules. It will take some time, but I hope to come up with some result within a week. I'll let you know how it goes asap. I'm not a smie master by any means and had issues with it too, but I feel like it should be flexible enough to handle cases you mentioned. Thanks again for finding so much indentation inconsistencies, fixing those will significantly improve this mode.

Sounds like a good plan to me. SMIE is the way to go IMO - many modes have adopted it recently and we should push to improve it (and its documentation).

@taku0
Copy link
Member Author

taku0 commented Jan 21, 2015

Here are informal test cases I used while development. This may also helps.
https://gist.github.com/taku0/1fcf50214a71593a72fa

Miscellaneous comments for the current implementation:

swift-smie--decl-specifier-regexp matches modifiers and "func", so that a token "func" is never generated.

All operators are reported as "OP" token, so that the smie-precs->prec2 part has no effects.

Swift has user defined operators, so that swift-smie--operators-regexp is insufficient.

looking-back function does not behave as what computer scientists expect. It behaves like this:

  1. Evaluate folowing steps with save-excursion
  2. Repeat looking-at followed by backward-char until it matches or exceed the limit.
  3. If it exceed the limit, return nil.
  4. Otherwise, if greedy is nil, return t.
  5. Otherwise, repeat looking-at followed by backward-char until it does not match.
  6. forward-char followed by looking-at to set match-beginning and match-end.
  7. Return t.

As a result, evaluating (progn (looking-back "\\(?:class\\)? *func" (point-min) t) (goto-char (match-beginning 0))) after class func goes between "class" and "func", not at "class", since "s func" does not match the regexp. This means swift-smie--backward-token does not work as expected.

swift-smie--implicit-semi-p does not conver several cases. Many bugs are fixed by fixing it.

Type parameters may begin with (and end with) brackets (e.g. Foo<[A], (B, C), Bar<D>>). The lexer should handle them.

swift-smie--backward-token before y of x >> y reports "T>".

The following example is a valid enum definition, so that distinguishing "case" and "ecase" at the lexer is hard. IMO, neither "switch" nor "enum" is used so frequently, so that we can use extra CPU time in the rule function to distinguish them.

enum Foo { case A(Int, [Int : String]), B, C } 

swift-smie--forward-token for "Foo:" returns "Foo:", not "Foo", because of (modify-syntax-entry ?: "_" table).

In (:before . "OP") rule, (looking-at ".[\n]") returns t for "+\n" but not for ">>\n".

@ap4y ap4y mentioned this pull request Jan 26, 2015
@ap4y
Copy link
Collaborator

ap4y commented Jan 26, 2015

Hey @taku0, I just submitted #76 with all or tests but with fixes based on current implementation. Can you check it? Keen to hear your feedback.

@taku0
Copy link
Member Author

taku0 commented Jan 26, 2015

It is significantly improved but we still have a long way to go:

https://gist.github.com/taku0/931503557ba42905bbbd

My indentation logic indents them correctly (excluding some differences of preferences. For example, I prefer

foobar( a, b, c )

and

func foo() { timer = NSTimer.scheduledTimerWithTimeInterval( 1.0, target: self, selector: Selector("onTimer"), userInfo: nil, repeats: true ) }

).

@ap4y
Copy link
Collaborator

ap4y commented Jan 27, 2015

Hey @taku0 thanks for checking and making this gist. I managed to fix some of the issues already, but it will take some time before I will run through all cases (I'm a bit busy at the moment). Overall it should be possible fix all this cases with smie, I would appreciate any help though ;)

@taku0
Copy link
Member Author

taku0 commented Feb 7, 2015

Fixed known bugs.

@ap4y
Copy link
Collaborator

ap4y commented Feb 23, 2015

Hey @taku0 do you think I can close this PR?

@taku0
Copy link
Member Author

taku0 commented Feb 23, 2015

@ap4y
Yes. The master is improved enough.

@taku0 taku0 closed this Feb 23, 2015
@taku0 taku0 mentioned this pull request Jan 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants