Clone Refactoring with Lambda Expressions
Nikolaos Tsantalis, Davood Mazinanian, Shahriar Rostami
May 24, 2017

Motivation
• Studied 1M+ clones detected by 4 clone detectors in 9 open-source projects [Tsantalis et al., TSE 2015]
• 94% of the clones are either Type-2 or Type-3
• Out of those, only 14% could be safely refactored

Because
The clones have differences that cannot be parameterized with regular parameters:
1. Method calls or object instantiations (side-effects)
2. Unmatched statements

First clone:

    public void testGetFirstMillisecond() {
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.UK);
        TimeZone savedZone = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone("London"));
        Day d = new Day(1, 3, 1970);
        assertEquals(5094000000L, d.getFirstMillisecond());
        Locale.setDefault(saved);
        TimeZone.setDefault(savedZone);
    }

Second clone:

    public void testGetFirstMillisecond() {
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.UK);
        TimeZone savedZone = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone("London"));
        Hour h = new Hour(15, 1, 4, 2006);
        assertEquals(114390000000L, h.getFirstMillisecond());
        Locale.setDefault(saved);
        TimeZone.setDefault(savedZone);
    }

Regular parameters

    public void extracted(RegularTimePeriod arg0, long arg1) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.UK);
        TimeZone savedZone = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone("London"));
        RegularTimePeriod p = arg0;
        assertEquals(arg1, p.getFirstMillisecond());
        Locale.setDefault(saved);
        TimeZone.setDefault(savedZone);
    }

    public void testGetFirstMillisecond() {
        extracted(new Day(1, 3, 1970), 5094000000L);
    }

    public void testGetFirstMillisecond() {
        extracted(new Hour(15, 1, 4, 2006), 114390000000L);
    }

Lambda parameters

    public void extracted(Supplier<RegularTimePeriod> arg0, long arg1) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.UK);
        TimeZone savedZone = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone("London"));
        RegularTimePeriod p = arg0.get();
        assertEquals(arg1, p.getFirstMillisecond());
        Locale.setDefault(saved);
        TimeZone.setDefault(savedZone);
    }

    public void testGetFirstMillisecond() {
        extracted(() -> new Day(1, 3, 1970), 5094000000L);
    }

    public void testGetFirstMillisecond() {
        extracted(() -> new Hour(15, 1, 4, 2006), 114390000000L);
    }

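The difference between the two parameterizations comes down to when the argument expression is evaluated. A regular argument is evaluated at the call site, before the extracted method body starts; a Supplier defers evaluation until get() is called inside the body. The following is a minimal sketch of that ordering, not the authors' code: the class EvaluationOrderDemo, the method createPeriod (a stand-in for an object instantiation such as new Day(...)), and the log are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class EvaluationOrderDemo {
    // Records the order in which the steps happen.
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for an object instantiation with observable timing.
    static String createPeriod() {
        log.add("create");
        return "period";
    }

    // Regular parameter: the argument has already been evaluated when the body starts.
    static void withRegularParameter(String arg0) {
        log.add("setup");
        String p = arg0;
        log.add("use " + p);
    }

    // Lambda parameter: evaluation is deferred until arg0.get() is called.
    static void withLambdaParameter(Supplier<String> arg0) {
        log.add("setup");
        String p = arg0.get();
        log.add("use " + p);
    }

    // Runs one of the two variants and returns a copy of the recorded order.
    static List<String> run(boolean lambda) {
        log.clear();
        if (lambda) {
            withLambdaParameter(() -> createPeriod());
        } else {
            withRegularParameter(createPeriod());
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [create, setup, use period]
        System.out.println(run(true));  // [setup, create, use period]
    }
}
```

With the lambda parameter the instantiation happens after the setup step, which is the position it had in the original clones.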
Contributions
• Method assessing if two clones can be refactored with Lambda expressions
• Eclipse plug-in automating the method
• Refactoring implementation
• Study of 46K+ clones (Lambda refactorability)
• Dataset and tools publicly available

Approach in a Nutshell
[Diagram: input (differences between two clones) -> Unification -> output (Refactorable?)]

Input [Tsantalis et al., TSE 2015]

First clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = regexps.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // The declarations and the assigned expression differ between the clones.
                    RegularExpression regexp = (RegularExpression) regexps.elementAt(i);
                    Regexp re = regexp.getRegexp(getProject());
                    matches = re.matches(line);
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

Second clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = contains.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // The declaration and the assigned expression differ between the clones.
                    String containsStr = (String) contains.elementAt(i);
                    matches = line.indexOf(containsStr) >= 0;
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

Unification principle
Two code fragments can be abstracted into a common functional interface if they:
1. Have the same input parameter types
2. Have the same output type
3. Throw the same exception types

Unification strategy
Starting from two differences, we perform:
1. Backward expansion, until the input is the same
2. Forward expansion, until the output is the same

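The three conditions of the unification principle can be read as an equality check on a fragment's signature. The sketch below is an illustrative simplification, not the authors' implementation: the Signature class, the unifiable method, and the use of plain type-name strings are assumptions made for this example. It checks the read() fragments at two expansion stages: with only the declarations included the outputs are String and Regexp, and with the assignment to matches included both outputs are boolean.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnificationSketch {
    // Hypothetical summary of a code fragment: what it reads, produces, and throws.
    static final class Signature {
        final List<String> inputTypes;   // ordered input parameter types
        final String outputType;         // type of the single produced value
        final Set<String> thrownTypes;   // exception types, order-insensitive

        Signature(List<String> inputTypes, String outputType, String... thrownTypes) {
            this.inputTypes = inputTypes;
            this.outputType = outputType;
            this.thrownTypes = new HashSet<>(Arrays.asList(thrownTypes));
        }
    }

    // Two fragments fit one functional interface only if all three parts agree.
    static boolean unifiable(Signature a, Signature b) {
        return a.inputTypes.equals(b.inputTypes)
                && a.outputType.equals(b.outputType)
                && a.thrownTypes.equals(b.thrownTypes);
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("int");
        // Declarations only: same input (int i), different outputs.
        System.out.println(unifiable(new Signature(in, "String"), new Signature(in, "Regexp")));
        // Declarations plus the assignment to matches: same input, same output.
        System.out.println(unifiable(new Signature(in, "boolean"), new Signature(in, "boolean")));
    }
}
```

A real analysis would work on program dependence information and resolved types rather than strings; the sketch only shows the shape of the final comparison.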
First clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = regexps.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // Input: int i
                    // Output: Regexp re
                    RegularExpression regexp = (RegularExpression) regexps.elementAt(i);
                    Regexp re = regexp.getRegexp(getProject());
                    matches = re.matches(line);
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

Second clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = contains.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // Input: int i
                    // Output: String containsStr
                    String containsStr = (String) contains.elementAt(i);
                    matches = line.indexOf(containsStr) >= 0;
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

First clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = regexps.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // Input: int i
                    // Output: boolean matches
                    RegularExpression regexp = (RegularExpression) regexps.elementAt(i);
                    Regexp re = regexp.getRegexp(getProject());
                    matches = re.matches(line);
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

Second clone:

    public int read() throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = contains.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    // Input: int i
                    // Output: boolean matches
                    String containsStr = (String) contains.elementAt(i);
                    matches = line.indexOf(containsStr) >= 0;
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

Lambda parameterization

    protected int extracted(Vector vector, Function<Integer, Boolean> matcher) throws IOException {
        if (!getInitialized()) {
            initialize();
            setInitialized(true);
        }
        int ch = -1;
        if (line != null) {
            ch = line.charAt(0);
            if (line.length() == 1) {
                line = null;
            } else {
                line = line.substring(1);
            }
        } else {
            final int size = vector.size();
            for (line = readLine(); line != null; line = readLine()) {
                boolean matches = true;
                for (int i = 0; matches && i < size; i++) {
                    matches = (boolean) matcher.apply(i);
                }
                if (matches ^ isNegated()) {
                    break;
                }
            }
            if (line != null) {
                return read();
            }
        }
        return ch;
    }

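The slide shows only the extracted method, so the following sketch illustrates what the two call sites might look like once the loop body is passed as a Function<Integer, Boolean>. It is a simplified, self-contained approximation under stated assumptions: the class LineFilterSketch and the methods allMatch, containsAll, and matchesAll are hypothetical, the line is passed as a parameter instead of being a field, java.util.List replaces Vector, and java.util.regex.Pattern with find() is swapped in for the Regexp type used in the original clones.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

class LineFilterSketch {
    // Simplified stand-in for the inner loop of extracted(): stops at the first mismatch.
    static boolean allMatch(int size, Function<Integer, Boolean> matcher) {
        boolean matches = true;
        for (int i = 0; matches && i < size; i++) {
            matches = matcher.apply(i);
        }
        return matches;
    }

    // Call site corresponding to the "contains" clone: substring test per element.
    static boolean containsAll(String line, List<String> contains) {
        return allMatch(contains.size(), i -> {
            String containsStr = contains.get(i);
            return line.indexOf(containsStr) >= 0;
        });
    }

    // Call site corresponding to the "regexps" clone: regular expression test per element.
    static boolean matchesAll(String line, List<Pattern> regexps) {
        return allMatch(regexps.size(), i -> {
            Pattern re = regexps.get(i);
            return re.matcher(line).find();
        });
    }
}
```

Both lambdas take the loop index and return a boolean, which is why the two loop bodies fit the same functional interface type.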
Evaluation
1. Correctness (compilation errors, behavior preservation)
2. Applicability (location, clone type, Template Method)
3. Characteristics (number, size, functional interface types)

RQ1. Correctness
• Refactored 12.6K clones assessed as Lambda parameterizable and covered by unit tests
• Executed the entire test suite after each refactoring
• JFreeChart test suite (less than 10 seconds)
Result: zero compilation errors, one test failure

RQ2. Applicability
• Analyzed 46.7K Type-2 & Type-3 clones from 9 projects
  • non-refactorable with regular parameters
  • 60% Type-2 (differences in expressions)
  • 40% Type-3 (unmatched statements, a.k.a. clone gaps)
58% lambda applicability
Clone source code location: 72% in test clones, 51% in production clones
Clone type: 57% applicability in Type-2 clones, 60% applicability in Type-3 clones

RQ2. Applicability
• Template Method pattern as an alternative to Lambdas
• clones extracted to a new or existing abstract superclass
Template Method can be used as an alternative in only 1/3 of the cases refactored with Lambda expressions. It does not apply when:
1. The clones exist in the same class, or in classes without a common superclass
2. The existing common superclass cannot be made abstract, because it is instantiated or has other subclasses

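For comparison, here is a minimal sketch of the Template Method alternative mentioned above. It is not taken from the study: the classes PeriodCheckTemplate, DayCheck, and HourCheck and the values they return are hypothetical. The shared steps live in a method of an abstract superclass, and the differing behavior becomes an abstract method that each subclass overrides, which is why the clones need a common superclass that can be made abstract.

```java
// Hypothetical abstract superclass holding the shared steps of two clones.
abstract class PeriodCheckTemplate {
    // Hook: the behavior that differs between the clones.
    protected abstract long firstMillisecond();

    // Template method: the common steps, fixed for all subclasses.
    public final boolean check(long expected) {
        // Shared set-up code of the clones would go here.
        boolean ok = expected == firstMillisecond();
        // Shared tear-down code of the clones would go here.
        return ok;
    }
}

// Each former clone becomes a subclass that supplies only its difference.
class DayCheck extends PeriodCheckTemplate {
    @Override
    protected long firstMillisecond() {
        return 100L; // illustrative value only
    }
}

class HourCheck extends PeriodCheckTemplate {
    @Override
    protected long firstMillisecond() {
        return 200L; // illustrative value only
    }
}
```

With a lambda parameter the same variation point is passed as an argument, so no inheritance relationship between the classes containing the clones is required.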
RQ3. Characteristics
• Analyzed 27.2K refactored clones
• Number of lambda expressions
• Relative size of lambda expression to clone fragment
• Functional interface types
[Charts: Number of Lambdas; Relative Lambda Size]

RQ3. Characteristics
• Built-in functional interfaces
  • Function<T,R>
  • Supplier<T>
  • Consumer<T>
[Charts: Actual functional interface types; Custom types break-down]

Take-home message
1. Lambdas are very effective for parameterizing clones with behavioral differences, esp. test clones
2. Lambdas are equally beneficial for parameterizing Type-2 and Type-3 clones
3. The vast majority of clones require one or two Lambdas
4. 60% of the Lambdas can be parameterized using just three of the Java built-in functional interface types
5. If these functional interface types supported exception throwing, 72% of the clones could be parameterized with built-in functional interfaces

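The last point reflects a limitation of java.util.function: the abstract methods of Function, Supplier, and Consumer declare no checked exceptions, so a lambda whose body throws one needs a custom functional interface. The sketch below shows one possible shape of such an interface; ThrowingFunction, firstChar, and readFirst are hypothetical names chosen for this example, not types from the study or from the JDK.

```java
import java.io.IOException;

class ThrowingFunctionSketch {
    // Hypothetical custom interface: like Function<T, R>, but apply may throw E.
    @FunctionalInterface
    interface ThrowingFunction<T, R, E extends Exception> {
        R apply(T t) throws E;
    }

    // Extracted-method stand-in that accepts behavior which may throw IOException.
    static int firstChar(String line, ThrowingFunction<String, Integer, IOException> reader)
            throws IOException {
        return reader.apply(line);
    }

    // Example behavior with a checked exception; it cannot be assigned to
    // java.util.function.Function<String, Integer> without wrapping the exception.
    static Integer readFirst(String line) throws IOException {
        if (line.isEmpty()) {
            throw new IOException("empty line");
        }
        return (int) line.charAt(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstChar("abc", ThrowingFunctionSketch::readFirst)); // 97
    }
}
```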
https://github.com/tsantalis/JDeodorant
http://tiny.cc/ICSE17
https://github.com/tsantalis/jdeodorant-commandline
