Feature #19070: Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods

Added by yui-knk (Kaneko Yuichiro) about 3 years ago. Updated almost 3 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:110418]

Description

Background

Language Server Protocol (LSP) implementations sometimes need token information. For example, m(1) and m(1, ) have the same AST structure apart from node locations, so it is impossible to detect the presence of the trailing , from the AST alone. In the latter case, however, it may be better to suggest a list of variables for the second argument. Token information is important in such cases.
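To make the point above concrete, here is a minimal sketch that compares the two parses while ignoring locations. The `shape` helper is hypothetical (it is not part of RubyVM::AbstractSyntaxTree); it only relies on Node#type, Node#children and Node#last_column, and assumes CRuby.

```ruby
# Hypothetical helper: reduce a node to nested arrays of node types and
# literal children, dropping all location information.
def shape(node)
  return node unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  [node.type, *node.children.map { |child| shape(child) }]
end

plain    = RubyVM::AbstractSyntaxTree.parse("m(1)")
trailing = RubyVM::AbstractSyntaxTree.parse("m(1, )")

# The trees have the same shape; only the recorded locations differ.
same_shape = shape(plain) == shape(trailing)
puts "same shape: #{same_shape}"
puts "last columns: #{plain.last_column} vs #{trailing.last_column}"
```

The location difference hints that something follows the argument, but it does not say what; that is the gap token information is meant to fill.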

Example

    require "pp"

    node = RubyVM::AbstractSyntaxTree.parse(<<~STR, keep_tokens: true)
      def m(a, b = 1, *rest, &block)
      end

      m(1, )
    STR

    defn = node.children[2].children[0]
    fcall = node.children[2].children[1]

    puts "defn.tokens"
    pp defn.tokens
    puts "\n\n"
    puts "fcall.tokens"
    pp fcall.tokens
    puts "\n\n"
    puts defn.tokens.map{_1[2]}.join
    puts fcall.tokens.map{_1[2]}.join

This prints the following, where each token is [sequence_id, token_type, token_string, [first_line, first_column, last_line, last_column]]:

    defn.tokens
    [[0, :kw, "def", [1, 0, 1, 3]],
     [1, :sp, " ", [1, 3, 1, 4]],
     [2, :ident, "m", [1, 4, 1, 5]],
     [3, :lparen, "(", [1, 5, 1, 6]],
     [4, :ident, "a", [1, 6, 1, 7]],
     [5, :comma, ",", [1, 7, 1, 8]],
     [6, :sp, " ", [1, 8, 1, 9]],
     [7, :ident, "b", [1, 9, 1, 10]],
     [8, :sp, " ", [1, 10, 1, 11]],
     [9, :op, "=", [1, 11, 1, 12]],
     [10, :sp, " ", [1, 12, 1, 13]],
     [11, :int, "1", [1, 13, 1, 14]],
     [12, :comma, ",", [1, 14, 1, 15]],
     [13, :sp, " ", [1, 15, 1, 16]],
     [14, :op, "*", [1, 16, 1, 17]],
     [15, :ident, "rest", [1, 17, 1, 21]],
     [16, :comma, ",", [1, 21, 1, 22]],
     [17, :sp, " ", [1, 22, 1, 23]],
     [18, :op, "&", [1, 23, 1, 24]],
     [19, :ident, "block", [1, 24, 1, 29]],
     [20, :rparen, ")", [1, 29, 1, 30]],
     [21, :ignored_nl, "\n", [1, 30, 1, 31]],
     [22, :kw, "end", [2, 0, 2, 3]]]


    fcall.tokens
    [[25, :ident, "m", [4, 0, 4, 1]],
     [26, :lparen, "(", [4, 1, 4, 2]],
     [27, :int, "1", [4, 2, 4, 3]],
     [28, :comma, ",", [4, 3, 4, 4]],
     [29, :sp, " ", [4, 4, 4, 5]],
     [30, :rparen, ")", [4, 5, 4, 6]]]


    def m(a, b = 1, *rest, &block)
    end
    m(1, )
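Since each token is a plain nested array, consumers may find it convenient to unpack it into named fields. The following is only a sketch: the `Token` struct and `to_token` helper are hypothetical names introduced for illustration, and the sample value is copied from the fcall.tokens output above.

```ruby
# Hypothetical wrapper giving names to the positional fields of a token:
# [sequence_id, token_type, token_string,
#  [first_line, first_column, last_line, last_column]]
Token = Struct.new(:id, :type, :text,
                   :first_line, :first_column, :last_line, :last_column)

def to_token(raw)
  # Nested destructuring pulls the location array apart in one step.
  id, type, text, (first_line, first_column, last_line, last_column) = raw
  Token.new(id, type, text, first_line, first_column, last_line, last_column)
end

# The comma token from the fcall.tokens output shown above.
comma = to_token([28, :comma, ",", [4, 3, 4, 4]])
puts "#{comma.text.inspect} at line #{comma.first_line}, " \
     "columns #{comma.first_column}...#{comma.last_column}"
```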

Interface

  • Add a keep_tokens option to RubyVM::AbstractSyntaxTree.parse, .parse_file and .of
  • Add RubyVM::AbstractSyntaxTree::Node#tokens, which returns the tokens for the node, including the tokens of its descendant nodes.
  • Add RubyVM::AbstractSyntaxTree::Node#all_tokens, which returns all tokens for the input script regardless of the receiver node.
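As a sketch of how the interface above might serve the LSP use case from the Background section, the snippet below checks a call node for a trailing comma. It assumes a CRuby build that includes this feature (3.2 or later). The `trailing_comma?` and `text_of` helpers are hypothetical. Token type names may not match those in the example output on every build, so the sketch compares token text (index 2) instead of token types.

```ruby
source = "m(1, )\nn(2)\n"
root = RubyVM::AbstractSyntaxTree.parse(source, keep_tokens: true)

# As in the example above, children[2] of the top-level node is the body;
# with two statements it is a block whose children are the two calls.
first_call, second_call = root.children[2].children

# Hypothetical helper: rebuild a node's source text from its tokens.
def text_of(node)
  node.tokens.map { |token| token[2] }.join
end

# Hypothetical helper: true when the last two non-blank tokens are "," ")".
def trailing_comma?(call_node)
  texts = call_node.tokens.map { |token| token[2] }
  significant = texts.reject { |text| text.strip.empty? }
  significant.last(2) == [",", ")"]
end

puts "#{text_of(first_call)}: #{trailing_comma?(first_call)}"
puts "#{text_of(second_call)}: #{trailing_comma?(second_call)}"

# Node#all_tokens ignores the receiver and covers the whole input.
puts "tokens in first call: #{first_call.tokens.length}, " \
     "in whole script: #{first_call.all_tokens.length}"
```

Matching on text keeps the check independent of how a given build names its token types, at the cost of not distinguishing tokens that share the same text.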

Implementation

https://github.com/yui-knk/ruby/tree/cst5

Updated by Eregon (Benoit Daloze) about 3 years ago #1 [ruby-core:110419]

Doesn't Ripper.lex already provide this information?

Updated by matz (Yukihiro Matsumoto) about 3 years ago #2 [ruby-core:110631]

Sounds OK.

Matz.

Updated by yui-knk (Kaneko Yuichiro) almost 3 years ago #3

  • Status changed from Open to Closed

Applied in changeset git|d8601621edcf29e3323b90dcf04b774edd9fb45e.


Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods

Language Server Protocol (LSP) implementations sometimes need token information.
For example, m(1) and m(1, ) have the same AST structure apart from node locations,
so it is impossible to detect the presence of the trailing , from the AST alone.
In the latter case, however, it may be better to suggest a list of variables for
the second argument. Token information is important in such cases.

This commit adds these methods.

  • Add a keep_tokens option to RubyVM::AbstractSyntaxTree.parse, .parse_file and .of
  • Add RubyVM::AbstractSyntaxTree::Node#tokens, which returns the tokens for the node, including the tokens of its descendant nodes.
  • Add RubyVM::AbstractSyntaxTree::Node#all_tokens, which returns all tokens for the input script regardless of the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

    $ cat test.rb
    root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
    ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    11408kb

    # keep_tokens :false
    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
    17508kb

    # keep_tokens :true
    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
    30960kb

Performance:

    $ cat ../ast_keep_tokens.yml
    prelude: |
      src = <<~SRC
        module M
          class C
            def m1(a, b)
              1 + a + b
            end
          end
        end
      SRC
    benchmark:
      without_keep_tokens: |
        RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
      with_keep_tokens: |
        RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

    $ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
    /home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
                --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
                --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
                --output=markdown --output-compare -v ../ast_keep_tokens.yml
    compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    warming up..

    |                     |compare-ruby|built-ruby|
    |:--------------------|-----------:|---------:|
    |without_keep_tokens  |     21.659k|   21.303k|
    |                     |       1.02x|         -|
    |with_keep_tokens     |      6.220k|    5.691k|
    |                     |       1.09x|         -|
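The figures above are given without ratios, so here is a small worked calculation. It assumes the benchmark numbers are iterations per second (benchmark-driver's default metric) and uses the built-ruby column; both executables report the same revision, so either column could be used. The variable names are illustrative only.

```ruby
# Peak resident memory in kB, as reported by /usr/bin/time -f %M above.
baseline_kb       = 11_408 # ruby -v only
without_tokens_kb = 17_508 # run labelled "keep_tokens :false"
with_tokens_kb    = 30_960 # run labelled "keep_tokens :true"

# Ratio of total peak memory between the two runs.
memory_ratio = with_tokens_kb.fdiv(without_tokens_kb)

# Same ratio after subtracting the interpreter's own baseline.
parse_memory_ratio =
  (with_tokens_kb - baseline_kb).fdiv(without_tokens_kb - baseline_kb)

# Benchmark throughput from the built-ruby column (assumed to be i/s).
without_tokens_ips = 21_303
with_tokens_ips    = 5_691
slowdown = without_tokens_ips.fdiv(with_tokens_ips)

puts format("peak memory:            %.2fx", memory_ratio)
puts format("memory above baseline:  %.2fx", parse_memory_ratio)
puts format("parse slowdown:         %.2fx", slowdown)
```

On these numbers, keeping tokens costs roughly 1.8x the total peak memory (about 3.2x once the interpreter baseline is subtracted) and makes parsing roughly 3.7x slower for this input.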