Contributor

@ASvyatkovskiy ASvyatkovskiy commented Jan 7, 2019

This pull request introduces a uniqueness check to avoid duplicate path contexts in the C# extractor.
For example, given this code snippet:

    using System;

    namespace Test
    {
        class Program
        {
            static int TestMethod(int n)
            {
                if (n == 0) return 0;
                if (n == 1) return 1;
                if (n == 2) return 2;
                return -1;
            }
        }
    }

I get the following duplicate path contexts [1]. (I added the full span of the start and end syntax tokens to distinguish tokens that have the same name but sit at different locations in the snippet.) Is this expected behavior, or should such path contexts be removed?
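To make the idea of the uniqueness check concrete, here is a minimal Python sketch. It is not the C# code from this PR: the tuple layout and the name `unique_path_contexts` are assumptions made for illustration. Each path context is keyed on its start span, start token, path, end token, and end span, so two contexts count as duplicates only when all of those match.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one path context (not the extractor's own type):
# (start_span, start_token, path, end_token, end_span)
Span = Tuple[int, int]
PathContext = Tuple[Span, str, str, str, Span]


def unique_path_contexts(contexts: Iterable[PathContext]) -> List[PathContext]:
    """Drop exact duplicates while keeping the first-seen order."""
    seen = set()
    unique = []
    for ctx in contexts:
        if ctx not in seen:
            seen.add(ctx)
            unique.append(ctx)
    return unique


# Two entries taken from the listing in [1]; the first one is repeated.
a = ((85, 89), "int", "PredefinedType^Parameter", "n", (89, 90))
b = ((70, 74), "int", "PredefinedType^MethodDeclaration", "METHOD_NAME", (74, 84))
deduplicated = unique_path_contexts([a, b, a, a])
```

Because the spans are part of the key, contexts that share token names and a path but come from different source locations are kept as separate entries.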

In addition, the PR suggests sampling terminal nodes in Extractor.GetInternalPaths before extracting the paths that connect them. The suggested sample size is 30000, which would affect only very deep or wide ASTs. Currently, sampling down to 200 path contexts is performed in the post-processing step: https://github.com/tech-srl/code2vec/blob/master/preprocess.py#L23.
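The commit list for this PR mentions a reservoir sampling utility. As a rough illustration of that technique (classic Algorithm R), here is a Python sketch. The function name, its signature, and the stand-in leaf names are assumptions for this example and do not mirror the C# implementation in the PR.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random in a single pass."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: cap the number of terminal nodes before pairing them.
leaves = [f"leaf_{n}" for n in range(100)]
sampled = reservoir_sample(leaves, 10, random.Random(0))
```

With a cap as large as 30000, inputs that have fewer terminal nodes pass through unchanged, which matches the note that only very deep or wide ASTs would be affected.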

[1] [85..89) int,PredefinedType^Parameter^ParameterList^MethodDeclaration_PredefinedType,int [70..74) 3 [70..74) int,PredefinedType^MethodDeclaration_ParameterList_Parameter_PredefinedType,int [85..89) 3 [70..74) int,PredefinedType^MethodDeclaration_ParameterList_Parameter,n [89..90) 8 [85..89) int,PredefinedType^Parameter,n [89..90) 8 [70..74) int,PredefinedType^MethodDeclaration,METHOD_NAME [74..84) 2 [85..89) int,PredefinedType^Parameter^ParameterList^MethodDeclaration,METHOD_NAME [74..84) 2 [118..120) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 10 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 10 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_IdentifierName,n [118..120) 10 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [118..120) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_IdentifierName,n [152..154) 10 [118..120) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [152..154) 10 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [152..154) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_IdentifierName,n [186..188) 10 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [186..188) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_NumericLiteralExpression,0 [123..124) 4 [118..120) n,IdentifierName^EqualsExpression_NumericLiteralExpression,0 [123..124) 4 [152..154) 
n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,0 [123..124) 4 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_ReturnStatement_NumericLiteralExpression,0 [133..134) 4 [118..120) n,IdentifierName^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,0 [133..134) 4 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,0 [133..134) 4 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 10 [118..120) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 10 [152..154) n,IdentifierName^EqualsExpression_NumericLiteralExpression,1 [157..158) 10 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 10 [118..120) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 10 [152..154) n,IdentifierName^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 10 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_ReturnStatement_UnaryMinusExpression_NumericLiteralExpression,1 [225..226) 10 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block_ReturnStatement_UnaryMinusExpression_NumericLiteralExpression,1 [225..226) 10 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 8 [152..154) 
n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 8 [186..188) n,IdentifierName^EqualsExpression_NumericLiteralExpression,2 [191..192) 8 [89..90) n,Parameter^ParameterList^MethodDeclaration_Block_IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 8 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 8 [186..188) n,IdentifierName^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 8 [89..90) n,Parameter^ParameterList^MethodDeclaration,METHOD_NAME [74..84) 4 [118..120) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 4 [152..154) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 4 [186..188) n,IdentifierName^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 4 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement_EqualsExpression_NumericLiteralExpression,0 [123..124) 3 [123..124) 0,NumericLiteralExpression^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,0 [133..134) 3 [123..124) 0,NumericLiteralExpression^EqualsExpression^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 4 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 4 [123..124) 0,NumericLiteralExpression^EqualsExpression_IdentifierName,n [118..120) 4 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement_EqualsExpression_IdentifierName,n [118..120) 4 [123..124) 0,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [152..154) 4 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [152..154) 4 [123..124) 
0,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 6 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 6 [123..124) 0,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 6 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 6 [123..124) 0,NumericLiteralExpression^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 2 [133..134) 0,NumericLiteralExpression^ReturnStatement^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 2 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 6 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 6 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 2 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 2 [225..226) 1,NumericLiteralExpression^UnaryMinusExpression^ReturnStatement^Block^MethodDeclaration_ParameterList_Parameter,n [89..90) 2 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [118..120) 2 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [118..120) 2 [157..158) 1,NumericLiteralExpression^EqualsExpression_IdentifierName,n [152..154) 2 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement_EqualsExpression_IdentifierName,n [152..154) 2 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n 
[186..188) 2 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [186..188) 2 [225..226) 1,NumericLiteralExpression^UnaryMinusExpression^ReturnStatement^Block_IfStatement_EqualsExpression_IdentifierName,n [186..188) 2 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 4 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 4 [225..226) 1,NumericLiteralExpression^UnaryMinusExpression^ReturnStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 4 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 4 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 4 [225..226) 1,NumericLiteralExpression^UnaryMinusExpression^ReturnStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 4 [157..158) 1,NumericLiteralExpression^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 3 [167..168) 1,NumericLiteralExpression^ReturnStatement^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 3 [225..226) 1,NumericLiteralExpression^UnaryMinusExpression^ReturnStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 3 [201..202) 2,NumericLiteralExpression^ReturnStatement^IfStatement_EqualsExpression_NumericLiteralExpression,2 [191..192) 3 [191..192) 2,NumericLiteralExpression^EqualsExpression^IfStatement_ReturnStatement_NumericLiteralExpression,2 [201..202) 3 [191..192) 2,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 2 [201..202) 
2,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_EqualsExpression_NumericLiteralExpression,1 [157..158) 2 [191..192) 2,NumericLiteralExpression^EqualsExpression^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 2 [201..202) 2,NumericLiteralExpression^ReturnStatement^IfStatement^Block_IfStatement_ReturnStatement_NumericLiteralExpression,1 [167..168) 2 [191..192) 2,NumericLiteralExpression^EqualsExpression^IfStatement^Block_ReturnStatement_UnaryMinusExpression_NumericLiteralExpression,1 [225..226) 2 [201..202) 2,NumericLiteralExpression^ReturnStatement^IfStatement^Block_ReturnStatement_UnaryMinusExpression_NumericLiteralExpression,1 [225..226) 2 [191..192) 2,NumericLiteralExpression^EqualsExpression^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 2 [201..202) 2,NumericLiteralExpression^ReturnStatement^IfStatement^Block^MethodDeclaration,METHOD_NAME [74..84) 2
@yahave yahave requested a review from urialon January 8, 2019 11:21
Collaborator

@urialon urialon left a comment

Thanks, see minor comments

Collaborator

urialon commented Jan 9, 2019

Hi Alexey,
Thanks!
All changes seem correct and reasonable. Thank you for finding the duplication; it looks like a bug.
Please see minor comments

Collaborator

urialon commented Jan 9, 2019

LGTM, thanks Alexey!

@urialon urialon merged commit 978564a into tech-srl:master Jan 9, 2019
avi1mizrahi pushed a commit to avi1mizrahi/code2vec that referenced this pull request Feb 18, 2019
* Add sampling in GetInternalPaths
* Add uniqueness check in Variables
* Use StreamWriter instead of standard out
* Add MaxContexts and output file parameters, add Reservoir sampling utility function
* Flush stream to write last line
* Pass ofile_name command line option to the python script
* Change variable names to follow convention used
* Specify ofile_name argument instead of stdout
* Use a file pointed to by ofile_name rather than directing from stdout
* Revert change in the Pool size
* Use IDisposable to manage StreamWriter
anki54 pushed a commit to anki54/code2vec that referenced this pull request May 31, 2020