DEV Community

Java Daily Coding Problem #001

Andrew (he/him) on April 20, 2019

Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some o...
Andrei Gatej

I’d use a set: iterate over the list, and for each element check whether the set already contains it. If it does, we’ve found a positive answer; if not, add k - arr[i] to the set.

for (int number : given) {
    if (set.contains(number)) return true;
    set.add(k - number);
}
return false;

Thank you for sharing!

Andrew (he/him)

What is the Big-O of this solution, though?

In the worst case, if we have...

2 elements: we will do a contains() on a set with 0 elements, then an add(), then a contains() on a set with 1 element, then an add().

3 elements: all of the above, plus a contains() on a set with 2 elements, then an add()

Assuming we skip the last add() when we know we've reached the end of the list, the worst case is calling contains() on sets of size 0 through N-1 (at most 0 + 1 + ... + (N-1) = N(N-1)/2 element comparisons, if contains() is linear), plus calling add() N-1 times.

This is definitely less space-efficient than my solution, because of the set, but is it more time-efficient? What do you think?
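The counting above can be sketched with a small instrumented version of the set approach (an illustration, not code from the thread; note it performs one add() per element rather than skipping the last one):

```java
import java.util.HashSet;
import java.util.Set;

public class OpCount {

    // Runs the set-based approach on `given` and returns
    // { contains() calls, add() calls, 1 if a pair summing to k was found else 0 }.
    static int[] count(int[] given, int k) {
        int containsCalls = 0;
        int addCalls = 0;
        Set<Integer> set = new HashSet<>();
        for (int number : given) {
            containsCalls++;
            if (set.contains(number)) return new int[]{ containsCalls, addCalls, 1 };
            addCalls++;
            set.add(k - number);
        }
        return new int[]{ containsCalls, addCalls, 0 };
    }

    public static void main(String[] args) {
        // worst case: no pair sums to k, so every element triggers a contains() and an add()
        int[] ops = count(new int[]{ 1, 2, 3, 4, 5 }, Integer.MAX_VALUE);
        System.out.println(ops[0] + " contains() calls, " + ops[1] + " add() calls");
        // prints: 5 contains() calls, 5 add() calls
    }
}
```

So the number of contains() and add() calls grows linearly with N; whether the whole thing is linear depends on what each contains() costs internally.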

Andrei Gatej

I think it depends on how set works under the hood.
At first, because there is only one for loop, one might think it’s only O(n).

But how does the `contains()` method actually work? I have just read that a Set may internally iterate over its elements and compare each one to the object passed as a parameter.

Another way of making sure there isn’t another for loop going on behind the scenes is to use a map, since accessing an item takes O(1) time (as far as I know).
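For instance, a hash-based sketch of that idea (an illustration; it stores the numbers themselves and looks up k - number, which is equivalent to the approach above, and relies on HashSet's expected O(1) lookup/insert):

```java
import java.util.HashSet;
import java.util.Set;

public class PairSum {

    // Returns true if any two distinct elements of arr sum to k.
    // HashSet lookups and inserts are O(1) on average, so this is O(n) expected time.
    static boolean hasPairSum(int[] arr, int k) {
        Set<Integer> seen = new HashSet<>();
        for (int number : arr) {
            if (seen.contains(k - number)) return true;
            seen.add(number);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasPairSum(new int[]{ 10, 15, 3, 7 }, 17)); // true  (10 + 7)
        System.out.println(hasPairSum(new int[]{ 10, 15, 3, 7 }, 19)); // false
    }
}
```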

Andrew (he/him)

So it could be that the Set solution is just as time-efficient as the double-for loop. I'd love to do a benchmark of this.

Andrei Gatej

If you do, please let me know what the results are!
Thanks!

Rodion Gorkovenko

Ah, here is the answer already.

It's O(N) for sure. Hash-based Sets in Java (HashSet is backed by a HashMap) have O(1) amortized lookup and insert.

Andrew (he/him)

Very clever! I like this answer.

Angie Jones

yo!🔥

Trung Nguyen • Edited

I thought the same; this has a much more efficient runtime.

wil222

Hi

Why not sort the data first? Once it is sorted, you can easily do a binary (dichotomic) search for the second number and get O(n log n). Since sorting can be done in the same complexity, you get something better than the O(n²) algorithm above, with a space complexity of O(1).

You could also sort using a binary tree; then it becomes trivial to search for the second number in the tree while keeping the same time complexity (though with a space complexity of O(n)).

With Andrei's map solution, you don't know exactly how much space the map will take (it depends on the implementation), but the time complexity is O(n). If you use a tree-based set instead, you get the same complexity as my first solution.
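The sort-plus-binary-search idea can be sketched like this (an illustration using Arrays.binarySearch; the j != i check avoids pairing an element with itself):

```java
import java.util.Arrays;

public class SortedSearch {

    // Sort, then binary-search for k - arr[i]: O(n log n) time, O(1) extra space
    // (the sort is done in place, so the input array is modified).
    static boolean hasPairSum(int[] arr, int k) {
        Arrays.sort(arr);
        for (int i = 0; i < arr.length; i++) {
            int j = Arrays.binarySearch(arr, k - arr[i]);
            // j >= 0 means the complement exists; j != i means it's a different element
            if (j >= 0 && j != i) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasPairSum(new int[]{ 10, 15, 3, 7 }, 17)); // true  (10 + 7)
        System.out.println(hasPairSum(new int[]{ 1, 2, 4 }, 8));       // false
    }
}
```

With duplicates (e.g. two 3s and k = 6) this still works: iterating i over both copies guarantees that for one of them the index returned by binarySearch differs from i.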

Andrew (he/him) • Edited

Joan and Andrei both provided alternative solutions to this problem. I wondered which of our solutions was the best in the worst-case, so I created a Java Microbenchmark Harness (JMH) example which runs each of these pieces of code to time them. So I present...

A mini-tutorial on using the Java Microbenchmark Harness to benchmark Java code

Setup

Use Maven to set up a simple JMH project:

$ mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DgroupId=<org.sample> \
    -DartifactId=<test> \
    -Dversion=1.0

My package is daily, so I use that in place of <org.sample>, and I set the artifactId to 001 because this is the first Daily Coding Problem I'm solving in this fashion. This makes a directory called 001 with the following structure:

$ tree 001
001
├── pom.xml
└── src
    └── main
        └── java
            └── daily
                └── MyBenchmark.java

4 directories, 2 files

The contents of MyBenchmark.java are as follows:

/*
 * ...Oracle copyright...
 */
package daily;

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public void testMethod() {
        // This is a demo/sample template for building your JMH benchmarks.
        // Edit as needed.
        // Put your benchmark code here.
    }

}

...but we're going to delete all of this and write our own code.

Benchmarking

Christophe Schreiber has a good example of using JMH on Dev.To. We need to send parameters to our methods, though. In the worst-case scenario, we'll have a very long array in which no two numbers add to k. These numbers should also all be unique so that the compiler can't do any optimisations and so that we need to continually add them to Andrei's Set.

I will be using this file on GitHub by Bruno Doolaeghe as a template:

package daily;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(1)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {

    @State(Scope.Benchmark)
    public static class Setup {

        @Param({ "10", "15", "20", "25", "30", "40", "60", "80", "100",
                 "200", "400", "800", "1000", "10000", "100000",
                 "1000000", "10000000" })
        int N;

        // create a List of numbers 1...N
        // (caveat: this field initializer runs before JMH injects the @Param
        //  value of N; building the List in an @Setup method would be safer)
        List<Integer> list = IntStream.rangeClosed(1, N).boxed().collect(Collectors.toList());

        // always larger than the largest possible sum of two elements
        int k = Integer.MAX_VALUE;

        // shuffle the array before returning it to the user
        int[] get_given() {

            // shuffle the List
            Collections.shuffle(list);

            // convert the List to an array
            int[] given = list.stream().mapToInt(i -> i).toArray();

            return given;
        }

        // return k
        int get_k() {
            return k;
        }
    }

    //--------------------------------------------------------------------------
    //
    //  TESTS
    //
    //--------------------------------------------------------------------------

    @Benchmark
    public boolean testAndrew(Setup d) {

        // setup
        int[] given = d.get_given();
        int k = d.get_k();

        // algorithm
        int len = given.length;

        for (int outer = 0; outer < (len - 1); ++outer)
            for (int inner = outer + 1; inner < len; ++inner)
                if (given[outer] + given[inner] == k)
                    return true;

        return false;
    }

    @Benchmark
    public boolean testAndrei(Setup d) {

        // setup
        int[] given = d.get_given();
        int k = d.get_k();

        // algorithm
        Set<Integer> set = new TreeSet<Integer>();

        for (int number : given) {
            if (set.contains(number)) return true;
            set.add(k - number);
        }

        return false;
    }

    @Benchmark
    public boolean testJoan(Setup d) {

        // setup
        int[] given = d.get_given();
        int k = d.get_k();

        // algorithm
        Arrays.sort(given);

        int i = 0;
        int j = given.length - 1;

        while (i < given.length && j >= 0) {
            int sum = given[i] + given[j];
            if (sum == k) return true;
            else if (sum > k) --j;
            else ++i;
        }

        return false;
    }
}

All code is available on my GitHub. The results of the benchmarking are below. Please note that I don't use JMH very often so this may not be the optimal way to lay out this benchmark. If you have any suggestions, be sure to let me (politely) know :)

Benchmark               (N)       Mode  Cnt  Score   Error  Units
MyBenchmark.testAndrei        10  avgt   10  0.053 ± 0.004  us/op
MyBenchmark.testAndrei        15  avgt   10  0.056 ± 0.002  us/op
MyBenchmark.testAndrei        20  avgt   10  0.052 ± 0.003  us/op
MyBenchmark.testAndrei        25  avgt   10  0.057 ± 0.008  us/op
MyBenchmark.testAndrei        30  avgt   10  0.053 ± 0.003  us/op
MyBenchmark.testAndrei        40  avgt   10  0.052 ± 0.001  us/op
MyBenchmark.testAndrei        60  avgt   10  0.053 ± 0.002  us/op
MyBenchmark.testAndrei        80  avgt   10  0.056 ± 0.007  us/op
MyBenchmark.testAndrei       100  avgt   10  0.057 ± 0.008  us/op
MyBenchmark.testAndrei       200  avgt   10  0.054 ± 0.006  us/op
MyBenchmark.testAndrei       400  avgt   10  0.055 ± 0.009  us/op
MyBenchmark.testAndrei       800  avgt   10  0.054 ± 0.001  us/op
MyBenchmark.testAndrei      1000  avgt   10  0.065 ± 0.031  us/op
MyBenchmark.testAndrei     10000  avgt   10  0.056 ± 0.002  us/op
MyBenchmark.testAndrei    100000  avgt   10  0.053 ± 0.003  us/op
MyBenchmark.testAndrei   1000000  avgt   10  0.063 ± 0.004  us/op
MyBenchmark.testAndrei  10000000  avgt   10  0.052 ± 0.001  us/op
MyBenchmark.testAndrew        10  avgt   10  0.051 ± 0.002  us/op
MyBenchmark.testAndrew        15  avgt   10  0.051 ± 0.002  us/op
MyBenchmark.testAndrew        20  avgt   10  0.055 ± 0.006  us/op
MyBenchmark.testAndrew        25  avgt   10  0.052 ± 0.001  us/op
MyBenchmark.testAndrew        30  avgt   10  0.053 ± 0.002  us/op
MyBenchmark.testAndrew        40  avgt   10  0.054 ± 0.005  us/op
MyBenchmark.testAndrew        60  avgt   10  0.050 ± 0.002  us/op
MyBenchmark.testAndrew        80  avgt   10  0.059 ± 0.009  us/op
MyBenchmark.testAndrew       100  avgt   10  0.053 ± 0.007  us/op
MyBenchmark.testAndrew       200  avgt   10  0.055 ± 0.003  us/op
MyBenchmark.testAndrew       400  avgt   10  0.051 ± 0.002  us/op
MyBenchmark.testAndrew       800  avgt   10  0.062 ± 0.017  us/op
MyBenchmark.testAndrew      1000  avgt   10  0.080 ± 0.034  us/op
MyBenchmark.testAndrew     10000  avgt   10  0.051 ± 0.002  us/op
MyBenchmark.testAndrew    100000  avgt   10  0.063 ± 0.038  us/op
MyBenchmark.testAndrew   1000000  avgt   10  0.062 ± 0.014  us/op
MyBenchmark.testAndrew  10000000  avgt   10  0.058 ± 0.005  us/op
MyBenchmark.testJoan          10  avgt   10  0.059 ± 0.007  us/op
MyBenchmark.testJoan          15  avgt   10  0.068 ± 0.010  us/op
MyBenchmark.testJoan          20  avgt   10  0.060 ± 0.002  us/op
MyBenchmark.testJoan          25  avgt   10  0.055 ± 0.003  us/op
MyBenchmark.testJoan          30  avgt   10  0.058 ± 0.008  us/op
MyBenchmark.testJoan          40  avgt   10  0.056 ± 0.003  us/op
MyBenchmark.testJoan          60  avgt   10  0.061 ± 0.010  us/op
MyBenchmark.testJoan          80  avgt   10  0.064 ± 0.013  us/op
MyBenchmark.testJoan         100  avgt   10  0.061 ± 0.006  us/op
MyBenchmark.testJoan         200  avgt   10  0.059 ± 0.014  us/op
MyBenchmark.testJoan         400  avgt   10  0.059 ± 0.008  us/op
MyBenchmark.testJoan         800  avgt   10  0.060 ± 0.001  us/op
MyBenchmark.testJoan        1000  avgt   10  0.058 ± 0.001  us/op
MyBenchmark.testJoan       10000  avgt   10  0.057 ± 0.002  us/op
MyBenchmark.testJoan      100000  avgt   10  0.066 ± 0.009  us/op
MyBenchmark.testJoan     1000000  avgt   10  0.059 ± 0.007  us/op
MyBenchmark.testJoan    10000000  avgt   10  0.058 ± 0.008  us/op

Even in the worst-case scenario with a 10-million element array, I see no difference between the three methods in terms of time:

Benchmark               (N)       Mode  Cnt  Score   Error  Units
MyBenchmark.testAndrei  10000000  avgt   10  0.052 ± 0.001  us/op
MyBenchmark.testAndrew  10000000  avgt   10  0.058 ± 0.005  us/op
MyBenchmark.testJoan    10000000  avgt   10  0.058 ± 0.008  us/op

Given the uncertainties on the three benchmarks, they could all be identical (all three could be 0.053 us/op). That's anticlimactic!

Andrew (he/him) • Edited

Follow-up with arrays with more elements:

Benchmark               (N)         Mode  Cnt  Score   Error  Units
MyBenchmark.testAndrei    10000000  avgt  100  0.071 ± 0.004  us/op
MyBenchmark.testAndrew    10000000  avgt  100  0.049 ± 0.001  us/op
MyBenchmark.testJoan      10000000  avgt  100  0.887 ± 0.365  us/op
MyBenchmark.testAndrei   100000000  avgt  100  0.057 ± 0.001  us/op
MyBenchmark.testAndrew   100000000  avgt  100  0.050 ± 0.001  us/op
MyBenchmark.testJoan     100000000  avgt  100  0.129 ± 0.080  us/op
MyBenchmark.testAndrei  1000000000  avgt  100  0.058 ± 0.001  us/op
MyBenchmark.testAndrew  1000000000  avgt  100  0.053 ± 0.001  us/op
MyBenchmark.testJoan    1000000000  avgt  100  0.057 ± 0.003  us/op

Does the double-for win??

charles1303

Interesting stats, though. But could it be that the N-squared approach never really got to its worst-case scenario by not iterating through all the elements?

Joan Verdaguer • Edited

If you care about Big-O complexity, I have the impression it would be better to start by sorting the array and then walking through it from both the beginning and the end, but only once, so it becomes O(n log n). It's probably less readable though.

Arrays.sort(arr);

int i = 0;
int j = arr.length - 1;

while (i < arr.length && j >= 0) {
    int sum = arr[i] + arr[j];
    if (sum == k) return true;
    else if (sum > k) j--;
    else i++;
}

return false;

Andrei's solution was very smart as well, and I guess that, depending on the complexity of the Set's contains and add methods, you could get an algorithm with the same complexity.

Andrew • Edited

I enjoyed this! I read the problem and decided to have a go myself and see what our different solutions looked like. I did the same, except in the inner loop I just started at array.length and went down to 0 instead.

Andrew (he/him)

You bring up a good point. I should hide my solutions for future coding challenges like this.

imeugn

I would even check whether the outer number is greater than or equal to k; that way we could skip some loop iterations.

Andrew (he/him)

This sounds reasonable to me but note that the problem doesn't state whether the numbers in the array can be negative!

If I got this question in a coding interview, I'm not sure I would make that assumption.

imeugn

I was actually thinking more along the lines of:
Let n = {3, 5, 10, 15, 17} and k = 13. 15 and 17 should not be considered at all, since they are greater than k.
The numbers are not supposed to be negative. The worst case remains the same.
But if n is sorted, we then only have to check the left side of n, up to the closest number that is still less than k. That would change the approach in general, I would think.
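That pruning idea can be sketched like this (an illustration combining it with the set approach; it assumes non-negative elements, as stated above, and skips any element greater than k):

```java
import java.util.HashSet;
import java.util.Set;

public class PrunedPairSum {

    // Assumes all elements are non-negative, so any element greater than k
    // can never be part of a pair summing to k and is skipped outright.
    static boolean hasPairSum(int[] arr, int k) {
        Set<Integer> seen = new HashSet<>();
        for (int number : arr) {
            if (number > k) continue; // the pruning step described above
            if (seen.contains(k - number)) return true;
            seen.add(number);
        }
        return false;
    }

    public static void main(String[] args) {
        // from the example above: n = {3, 5, 10, 15, 17}, k = 13 -> 3 + 10
        System.out.println(hasPairSum(new int[]{ 3, 5, 10, 15, 17 }, 13)); // true
        System.out.println(hasPairSum(new int[]{ 15, 17 }, 13));           // false
    }
}
```

The pruning doesn't change the worst-case complexity, but it avoids pointless set operations for any element that is too large to participate in a valid pair.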