Accurate and Efficient Refactoring Detection in Commit History 1May 31, 2018 Danny DigNikolaos Tsantalis Matin Mansouri Laleh Eshkevari Davood Mazinanian
Refactoring is noise in evolution analysis • Bug-inducing analysis (SZZ): flag refactoring edits as bug-introducing changes • Tracing requirements to code: miss traceability links due to refactoring • Regression testing: unnecessary execution of tests for refactored code with no behavioral changes • Code review/merging: refactoring edits tangled with the actual changes intended by developers 2
There are many refactoring detection tools • Demeyer et al. [OOPSLA’00] • UMLDiff + JDevAn [Xing & Stroulia ASE’05] • RefactoringCrawler [Dig et al. ECOOP’06] • Weißgerber and Diehl [ASE’06] • Ref-Finder [Kim et al. ICSM’10, FSE’10] • RefDiff [Silva & Valente, MSR’17] 3
Limitations of previous approaches • Dependence on similarity thresholds • thresholds need calibration for projects with different characteristics • Dependence on built versions • only 38% of the change history can be successfully compiled [Tufano et al., 2017] • Unreliable oracles for evaluating precision/recall • Incomplete (refactorings found in release notes or commit messages) • Biased (applying a single tool with two different similarity thresholds) • Artificial (seeded refactorings) 4
Why do we need better accuracy? 5 Empirical studies Refactoring detection Library adaptation Framework migration poor accuracy
Why do we need better accuracy? 6
Contributions 1. First refactoring detection algorithm operating without any code similarity thresholds 2. RefactoringMiner open-source tool with an API 3. Oracle comprising 3,188 refactorings found in 538 commits from 185 open-source projects 4. Evaluation of precision/recall and comparison with previous state-of-the-art 5. Tool infrastructure for comparing multiple refactoring detection tools 7
Approach in a nutshell AST-based statement matching algorithm • Input: code fragments T1 from parent commit and T2 from child commit • Output: • M set of matched statement pairs • UT1 set of unmatched statements from T1 • UT2 set of unmatched statements from T2 • Code changes due to refactoring mechanics: abstraction, argumentization • Code changes due to overlapping refactorings or bug fixes: syntax-aware AST node replacements 8
9 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore
10 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore
11 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
12 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
13 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { } return addresses; } try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } AfterBefore
14 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore
15 protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } AfterBefore textual similarity  30%
16 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (1) Abstraction
17 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (1) Abstraction
18 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (2) Argumentization
19 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (2) Argumentization
20 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (3) AST Node Replacements
21 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (3) AST Node Replacements
22 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore textual similarity = 100%
23 private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore A B C D E F G 1 2 3 9 4 5 6 7 8 M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 = {A, B, G} UT2 = {8}
Extract Method detection rule (M, UT1, UT2) = statement-matching(createAddresses, createAddress) M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 ={A, B, G} UT2 = {8} createAddress is a newly added method in child commit  createAddresses in parent commit does not call createAddress  createAddresses in child commit calls createAddress  |M| > |UT2|   createAddress has been extracted from createAddresses 24
Evaluation RQ1: What is the accuracy of RefactoringMiner and how does it compare to the state-of-the-art? RQ2: What is the execution time of RefactoringMiner and how does it compare to the state-of-the-art? 25
Oracle construction • Public dataset with validated refactoring instances: 538 commits from 185 open-source projects [Silva et al., FSE’2016 Distinguished Artifact] • We executed two tools: RefactoringMiner and RefDiff [Silva & Valente, MSR’2017] • We manually validated all 4,108 detected instances with 3 validators for a period of 3 months • 3,188 true positives and 920 false positives 26
Comparison with state-of-the-art RefDiff [Silva & Valente, MSR’2017] • Commit-based refactoring detection tool • Evaluation on 448 seeded refactorings in 20 open-source projects • RefDiff has much higher precision/recall than Ref-Finder [Kim et al. 2010] and RefactoringCrawler [Dig et al. 2006] • Ref-Finder and RefactoringCrawler need fully built Eclipse projects as input 27
28 • RMiner better precision in all refactoring types • In half of the types RefDiff has better recall • Overall RMiner has +22% precision and +1.5% recall
Advantage of RefDiff • Treats code fragments as bags of tokens and ignores the structure 29 - private void startScanner() throws Exception + public void startScanner() throws Exception { { - // check if scanning is enabled + if (!isScanningEnabled()) - if (scanIntervalSeconds <= 0) return; - if ( "manual".equalsIgnoreCase( reload ) ) return; return; + public boolean isScanningEnabled () + { + if (scanIntervalSeconds <=0 || "manual".equalsIgnoreCase( reload )) + return false; + return true; + }
Disadvantage of RefDiff • Inability to deal with changes in the tokens 30 - if (eventBus != null) { + onRemoteStatusChanged(lastRemoteInstanceStatus, currentRemoteInstanceStatus); - StatusChangeEvent event = new StatusChangeEvent(lastRemoteInstanceStatus, - currentRemoteInstanceStatus); - eventBus.publish(event); - } + protected void onRemoteStatusChanged(InstanceInfo.InstanceStatus oldStatus, InstanceInfo.InstanceStatus newStatus) { + if (eventBus != null) { + StatusChangeEvent event = new StatusChangeEvent(oldStatus, newStatus); + eventBus.publish(event); + } + }
Execution time per commit [ms] • On median, RefactoringMiner is 7 times faster than RefDiff 31 ms
Limitations + Future work • Missing context: Pull Up reported as Move, if a class between the source and destination is unchanged. • Nested refactorings: unable to detect Extract Method applied within an extracted method • Unsupported refactorings: refactoring types, such as Rename Variable/Parameter/Field, Extract/Inline Variable can be supported from the analysis of AST replacements. • Oracle bias: plan to add more tools for constructing the oracle (challenge: make tools work without binding information) 32
Conclusions • RefactoringMiner: commit-based refactoring detection • No similarity thresholds • High accuracy: 98% precision, 87% recall • Ultra-fast: 58ms on median per commit • Better than competitive tools (RefDiff): +22% precision, 7 times faster • Largest and least biased refactoring oracle up to date • 3188 true refactoring instances • 538 commits • 185 open-source projects • 3 validators over 3 months (9 person-months) 33 http://refactoring.encs.concordia.ca/oracle/ https://github.com/tsantalis/RefactoringMiner

Accurate and Efficient Refactoring Detection in Commit History

  • 1.
    Accurate and EfficientRefactoring Detection in Commit History 1May 31, 2018 Danny DigNikolaos Tsantalis Matin Mansouri Laleh Eshkevari Davood Mazinanian
  • 2.
    Refactoring is noisein evolution analysis • Bug-inducing analysis (SZZ): flag refactoring edits as bug-introducing changes • Tracing requirements to code: miss traceability links due to refactoring • Regression testing: unnecessary execution of tests for refactored code with no behavioral changes • Code review/merging: refactoring edits tangled with the actual changes intended by developers 2
  • 3.
    There are manyrefactoring detection tools • Demeyer et al. [OOPSLA’00] • UMLDiff + JDevAn [Xing & Stroulia ASE’05] • RefactoringCrawler [Dig et al. ECOOP’06] • Weißgerber and Diehl [ASE’06] • Ref-Finder [Kim et al. ICSM’10, FSE’10] • RefDiff [Silva & Valente, MSR’17] 3
  • 4.
    Limitations of previousapproaches • Dependence on similarity thresholds • thresholds need calibration for projects with different characteristics • Dependence on built versions • only 38% of the change history can be successfully compiled [Tufano et al., 2017] • Unreliable oracles for evaluating precision/recall • Incomplete (refactorings found in release notes or commit messages) • Biased (applying a single tool with two different similarity thresholds) • Artificial (seeded refactorings) 4
  • 5.
    Why do weneed better accuracy? 5 Empirical studies Refactoring detection Library adaptation Framework migration poor accuracy
  • 6.
    Why do weneed better accuracy? 6
  • 7.
    Contributions 1. First refactoringdetection algorithm operating without any code similarity thresholds 2. RefactoringMiner open-source tool with an API 3. Oracle comprising 3,188 refactorings found in 538 commits from 185 open-source projects 4. Evaluation of precision/recall and comparison with previous state-of-the-art 5. Tool infrastructure for comparing multiple refactoring detection tools 7
  • 8.
    Approach in anutshell AST-based statement matching algorithm • Input: code fragments T1 from parent commit and T2 from child commit • Output: • M set of matched statement pairs • UT1 set of unmatched statements from T1 • UT2 set of unmatched statements from T2 • Code changes due to refactoring mechanics: abstraction, argumentization • Code changes due to overlapping refactorings or bug fixes: syntax-aware AST node replacements 8
  • 9.
    9 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore
  • 10.
    10 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore
  • 11.
    11 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
  • 12.
    12 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } AfterBefore private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; }
  • 13.
    13 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { } return addresses; } try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } AfterBefore
  • 14.
    14 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore
  • 15.
    15 protected static AddresscreateAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } AfterBefore textual similarity  30%
  • 16.
    16 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (1) Abstraction
  • 17.
    17 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (1) Abstraction
  • 18.
    18 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (2) Argumentization
  • 19.
    19 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (2) Argumentization
  • 20.
    20 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (3) AST Node Replacements
  • 21.
    21 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore (3) AST Node Replacements
  • 22.
    22 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore textual similarity = 100%
  • 23.
    23 private static Address[]createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } } return addresses; } private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; } protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); } return null; } AfterBefore A B C D E F G 1 2 3 9 4 5 6 7 8 M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 = {A, B, G} UT2 = {8}
  • 24.
    Extract Method detectionrule (M, UT1, UT2) = statement-matching(createAddresses, createAddress) M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 ={A, B, G} UT2 = {8} createAddress is a newly added method in child commit  createAddresses in parent commit does not call createAddress  createAddresses in child commit calls createAddress  |M| > |UT2|   createAddress has been extracted from createAddresses 24
  • 25.
    Evaluation RQ1: What isthe accuracy of RefactoringMiner and how does it compare to the state-of-the-art? RQ2: What is the execution time of RefactoringMiner and how does it compare to the state-of-the-art? 25
  • 26.
    Oracle construction • Publicdataset with validated refactoring instances: 538 commits from 185 open-source projects [Silva et al., FSE’2016 Distinguished Artifact] • We executed two tools: RefactoringMiner and RefDiff [Silva & Valente, MSR’2017] • We manually validated all 4,108 detected instances with 3 validators for a period of 3 months • 3,188 true positives and 920 false positives 26
  • 27.
    Comparison with state-of-the-art RefDiff[Silva & Valente, MSR’2017] • Commit-based refactoring detection tool • Evaluation on 448 seeded refactorings in 20 open-source projects • RefDiff has much higher precision/recall than Ref-Finder [Kim et al. 2010] and RefactoringCrawler [Dig et al. 2006] • Ref-Finder and RefactoringCrawler need fully built Eclipse projects as input 27
  • 28.
    28 • RMiner betterprecision in all refactoring types • In half of the types RefDiff has better recall • Overall RMiner has +22% precision and +1.5% recall
  • 29.
    Advantage of RefDiff •Treats code fragments as bags of tokens and ignores the structure 29 - private void startScanner() throws Exception + public void startScanner() throws Exception { { - // check if scanning is enabled + if (!isScanningEnabled()) - if (scanIntervalSeconds <= 0) return; - if ( "manual".equalsIgnoreCase( reload ) ) return; return; + public boolean isScanningEnabled () + { + if (scanIntervalSeconds <=0 || "manual".equalsIgnoreCase( reload )) + return false; + return true; + }
  • 30.
    Disadvantage of RefDiff •Inability to deal with changes in the tokens 30 - if (eventBus != null) { + onRemoteStatusChanged(lastRemoteInstanceStatus, currentRemoteInstanceStatus); - StatusChangeEvent event = new StatusChangeEvent(lastRemoteInstanceStatus, - currentRemoteInstanceStatus); - eventBus.publish(event); - } + protected void onRemoteStatusChanged(InstanceInfo.InstanceStatus oldStatus, InstanceInfo.InstanceStatus newStatus) { + if (eventBus != null) { + StatusChangeEvent event = new StatusChangeEvent(oldStatus, newStatus); + eventBus.publish(event); + } + }
  • 31.
    Execution time percommit [ms] • On median, RefactoringMiner is 7 times faster than RefDiff 31 ms
  • 32.
    Limitations + Futurework • Missing context: Pull Up reported as Move, if a class between the source and destination is unchanged. • Nested refactorings: unable to detect Extract Method applied within an extracted method • Unsupported refactorings: refactoring types, such as Rename Variable/Parameter/Field, Extract/Inline Variable can be supported from the analysis of AST replacements. • Oracle bias: plan to add more tools for constructing the oracle (challenge: make tools work without binding information) 32
  • 33.
    Conclusions • RefactoringMiner: commit-basedrefactoring detection • No similarity thresholds • High accuracy: 98% precision, 87% recall • Ultra-fast: 58ms on median per commit • Better than competitive tools (RefDiff): +22% precision, 7 times faster • Largest and least biased refactoring oracle up to date • 3188 true refactoring instances • 538 commits • 185 open-source projects • 3 validators over 3 months (9 person-months) 33 http://refactoring.encs.concordia.ca/oracle/ https://github.com/tsantalis/RefactoringMiner

Editor's Notes

  • #2 My name is Nikolaos Tsantalis, associate professor at Concordia University, and I will present to you our work on Refactoring detection. The rest of co-authors are Matin (Master student in my group), Laleh (PDF in my group), Davood (PhD in my group), and Danny from Oregon State Univeristy.