COLLECTIONS FRAMEWORK PROGRAMMAZIONE CONCORRENTE E DISTR. Università degli Studi di Padova Dipartimento di Matematica Corso di Laurea in Informatica, A.A. 2015 – 2016 rcardin@math.unipd.it
Programmazione concorrente e distribuita SUMMARY  Introduction  Collections and iterators  Linked list  Array list  Hash set  Tree set  Maps  Collections framework 2Riccardo Cardin
Programmazione concorrente e distribuita INTRODUCTION  Data structures can make a big difference in programming  Do you need to search quickly? Do you need to rapidly insert and remove element? ...  The first version of Java was supplied with a very small set of classes  Vector, Stack, Hashtable, BitSet and Enumeration  Java needed a serious data structure library  Without the complexity of C++ STL  With the benefit of «generic algorithms» of C++ STL  The collections framework satisfies those needs 3Riccardo Cardin
Programmazione concorrente e distribuita INTRODUCTION  Java collection framework separates interfaces and implementations  It defines a set of interfaces that represents abstract data structures  Each interface is then implemented in different ways, each with main focus on some aspects  For example, fast insertion, fast search, fixed memory consumption, ...  Using interfaces, you can change the implementation simply modifying a single statement 4Riccardo Cardin // ArrayList optimizes random access, LinkedList modification stmts List<Integer> list = new ArrayList<>(); list = new LinkedList<>()
Programmazione concorrente e distribuita INTRODUCTION 5Riccardo Cardin
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS  The fundamental interface for collection is the Collection interface  The add method adds an element ot the collection  Returns true if the collection was changed by the addition  Iterators are used visit elements in the collection  Implements the Iterator design pattern 6Riccardo Cardin public interface Collection<E> { boolean add(E element); Iterator<E> iterator(); // ... } public interface Iterator<E> { E next(); boolean hasNext(); void remove(); }
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS  Using iterators we have decoupled a collection from the traversing policies  The order in which elements are visited depends on the the collection type  Using the next method the collection is visited one element at time  If the end of the collection is reached an NoSuchElementException is thrown  Use the hasNext method to check if the collection has more elements to visit  Think of Java iterators as being between elements  The iterator jumps over the next element 7Riccardo Cardin
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS 8Riccardo Cardin Think of Java iterators as being between elements While moving, the iterator returns the current element
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS  Using next and hasNext it is possibile to traverse the collection  As of Java 5, there is an elegant shortcut to looping a collection using iterators  To use the for each loop the data structure must implement Iterable<E> interface 9Riccardo Cardin for (String element : c) { // do something with element } Collection<String> c = /* ... */; Iterator<String> iter = c.iterator(); while (iter.hasNext()) { String element = iter.next(); // do something with element }
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS  Using an iterator it possibile to remove elements from a collection  The remove method removes the element that was returned by the last call to next  There is a strong dependency between next and remove: it is illegal to call remove if it wasn’t preceded by a call to next  This is the only way to safely modify a collection after the creation of an iterator 10Riccardo Cardin for (Iterator<String> iterator = list.iterator(); iterator.hasNext();) { // Point the iterator to the current element String string = iterator.next(); iterator.remove(); }
Programmazione concorrente e distribuita COLLECTIONS AND ITERATORS 11Riccardo Cardin
Programmazione concorrente e distribuita CONCRETE COLLECTIONS  All concrete collections implement Collection or Map interfaces  We will introduce only few of them 12Riccardo Cardin Collection type Description ArrayList An indexed sequence that grows and shrinks dynamically LinkedList An ordered sequence that allows efficient insertion and removal at any location HashSet An unordered collection that rejects duplicates TreeSet A sorted set HashMap A data structure that stores key/value associations
Programmazione concorrente e distribuita LINKED LIST  A LinkedList<E> is an ordered data structure that stores each object in a link  Each link store a reference to the next link of the seq.  In Java, a linked list is always doubly linked 13Riccardo Cardin Implements List<E> interface
Programmazione concorrente e distribuita LINKED LIST  Very efficient for remove and add operations  These ops are made through an Iterator  Other elements in the list have not to be repositioned after removal of an element 14Riccardo Cardin
Programmazione concorrente e distribuita LINKED LIST  Also di add operation is made efficiently through an iterator of type ListIterator<E>  Use List.listIterator method to get one  New element is added before the current position of the iterator  Be carefull of concurrent modification using iterators  Linked list are very inefficient in random access 15Riccardo Cardin interface ListIterator<E> extends Iterator<E> { void add(E element); E previous() boolean hasPrevious() } for (int i = 0; i < list.size(); i++) // do something with list.get(i);
Programmazione concorrente e distribuita LINKED LIST 16Riccardo Cardin Adding a new element changes at most two references
Programmazione concorrente e distribuita ARRAY LIST  An ArrayList<E> is an ordered data structure that is very efficient in random access ops.  It encapsulate a dynamically reallocated array of objects  Adding and removing operation are not so efficient  Reallocation of elements is needed  Use ArrayList instead of Vector  More efficient due to its not synchronized methods 17Riccardo Cardin
Programmazione concorrente e distribuita ARRAY LIST 18Riccardo Cardin
Programmazione concorrente e distribuita HASH SET  A Set<E> is a data structure that doesn’t care about element’s ordering  A set contain the same element only once  Search operation performs very efficiently  A set is a Collection  An HashSet<E> uses hash codes to distinguish among elements  An hashCode is a number that can be derived from object data  You must provide an hash function to your classes  The function must be compatible with the equals method 19Riccardo Cardin
Programmazione concorrente e distribuita HASH SET  Hash codes have to be computed quickly  Hash functions have some hash collision  Using the correct hash function the number of collision should be unlikely  Hash table are implemented as an array of linked lists 20Riccardo Cardin Using an hash function, the bucket in which inserting a new element is equal to hash(element) % #bucket If the hash function produceds values that are randomly distributed, collision should be rare buckets
Programmazione concorrente e distribuita HASH SET  An HashSet is implemented using an hash table  The contains method is very efficient, because it has to lookup the element only in one bucket  An iterator to hash set visit each bucket in turn  Because of scattering, they are visited in a seemingly random order  The add method adds an element if it is not already present  Don’t mutate an element in a set once inserted  If the hash code of an element were to change, the element would no longer be in the correct position 21Riccardo Cardin
Programmazione concorrente e distribuita TREE SET  A TreeSet<E> is a sorted set  While iterating over the collection, the elements are presented in sorted order  It uses a red-black tree to store data  Insertion is slower than insertion in an hash table, but it is still much faster than insertion in an array or linked list  ...but a tree set automatically sorts the elements ;)  The type of the elements may implement Comparable<T> 22Riccardo Cardin public interface Comparable<T> { // It returns a value that is < 0, = 0 or > 0 int compareTo(T other); }
Programmazione concorrente e distribuita TREE SET  What if elements do not implement Comparable or if you need more than on compation alg?  Provide a Comparator during set construction  The compare method acts like the compareTo method  Function object (lambda anyone?!)  Using an HashSet or a TreeSet?  There must be a total ordering defined on elements  You have to implement also the Comparator interface  Do you neeed elements to be sorted? 23Riccardo Cardin public interface Comparator<T> { // Defines how two elements of type T have to be compared int compare(T a, T b); }
Programmazione concorrente e distribuita TREE SET 24Riccardo Cardin
Programmazione concorrente e distribuita MAPS  A Map<K,V> is a data structure that allows you to search an element using a key  A map stores key/value pairs  Key must be unique: If you call the put method twice with the same key, the second value replaces the first one.  If no info is associated with a key, get returns null  In sets to find an element you must have a copy of it  An HashMap hashes the keys 25Riccardo Cardin // HashMap implements Map Map<String, Employee> staff = new HashMap<>(); Employee harry = new Employee("Harry Hacker"); staff.put("987-98-9996", harry); String s = "987-98-9996"; e = staff.get(s); // gets harry
Programmazione concorrente e distribuita MAPS  The collection framework does not consider a map itself as a Collection  It is possible to obtain views of the map  If you are interested in all the values of a map, loop over the enumeration of its entries  Iterators on views cannot add elements to the map  An UnsupportedOperationException is thrown 26Riccardo Cardin Set<K> keySet() // Set of keys Collection<K> values() // Collection of values Set<Map.Entry<K, V>> entrySet() // Set of pairs (key,value) for (Map.Entry<String, Employee> entry : staff.entrySet()) { String key = entry.getKey(); Employee value = entry.getValue(); // do something with key, value }
Programmazione concorrente e distribuita THE FRAMEWORK  A framework is a set of classes that form the basis for building advanced functionality  The Collection framework defines classes to implement collections  The main interfaces are Collection and Map  Insertion interfaces are different between the two types  To get the elements from a Collection, just iterate over it  To get a value from a Map, use the get method 27Riccardo Cardin // To insert an element in a Collection boolean add(E element) // To store a key/value pair in a Map V put(K key, V value) V get(K key)
Programmazione concorrente e distribuita THE FRAMEWORK  A List is an ordered collection  There is the concept of position of an element  A list provides random access methods  Lists provides a specialized iterator, ListIterator  A Set is a Collection with no duplicates  The add method can reject a value if already present  Methods equals and hashCode are used to maintain elements inside the set 28Riccardo Cardin void add(int index, E element) E get(int index) void remove(int index) void add(E element)
Programmazione concorrente e distribuita THE FRAMEWORK  The framework provides some wrappers 29Riccardo Cardin
Programmazione concorrente e distribuita THE FRAMEWORK  The framework have some companion objects  The Arrays type allows to trasform arrays into List  The list returned is a view on the array: it is not possible to change the size of the list; elements are the same  The Collections type have a bunch of utilities  The method nCopies builds an illusory immutable list  Object is stored only once  Method singleton returns an illusory set with one element  And so on... 30Riccardo Cardin static <T> List<T> asList(T... a) List<String> settings = Collections.nCopies(100, "DEFAULT"); Collections.singleton(anObject)
Programmazione concorrente e distribuita THE FRAMEWORK 31Riccardo Cardin
Programmazione concorrente e distribuita EXAMPLES 32Riccardo Cardin https://github.com/rcardin/pcd-snippets
Programmazione concorrente e distribuita REFERENCES  Chap. 13 «Collections», Core Java Volume I - Fundamentals, Cay Horstmann, Gary Cornell, 2012, Prentice Hall 33Riccardo Cardin

Java - Collections framework

  • 1.
    COLLECTIONS FRAMEWORK PROGRAMMAZIONE CONCORRENTEE DISTR. Università degli Studi di Padova Dipartimento di Matematica Corso di Laurea in Informatica, A.A. 2015 – 2016 rcardin@math.unipd.it
  • 2.
    Programmazione concorrente edistribuita SUMMARY  Introduction  Collections and iterators  Linked list  Array list  Hash set  Tree set  Maps  Collections framework 2Riccardo Cardin
  • 3.
    Programmazione concorrente edistribuita INTRODUCTION  Data structures can make a big difference in programming  Do you need to search quickly? Do you need to rapidly insert and remove element? ...  The first version of Java was supplied with a very small set of classes  Vector, Stack, Hashtable, BitSet and Enumeration  Java needed a serious data structure library  Without the complexity of C++ STL  With the benefit of «generic algorithms» of C++ STL  The collections framework satisfies those needs 3Riccardo Cardin
  • 4.
    Programmazione concorrente edistribuita INTRODUCTION  Java collection framework separates interfaces and implementations  It defines a set of interfaces that represents abstract data structures  Each interface is then implemented in different ways, each with main focus on some aspects  For example, fast insertion, fast search, fixed memory consumption, ...  Using interfaces, you can change the implementation simply modifying a single statement 4Riccardo Cardin // ArrayList optimizes random access, LinkedList modification stmts List<Integer> list = new ArrayList<>(); list = new LinkedList<>()
  • 5.
    Programmazione concorrente edistribuita INTRODUCTION 5Riccardo Cardin
  • 6.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  The fundamental interface for collection is the Collection interface  The add method adds an element ot the collection  Returns true if the collection was changed by the addition  Iterators are used visit elements in the collection  Implements the Iterator design pattern 6Riccardo Cardin public interface Collection<E> { boolean add(E element); Iterator<E> iterator(); // ... } public interface Iterator<E> { E next(); boolean hasNext(); void remove(); }
  • 7.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using iterators we have decoupled a collection from the traversing policies  The order in which elements are visited depends on the the collection type  Using the next method the collection is visited one element at time  If the end of the collection is reached an NoSuchElementException is thrown  Use the hasNext method to check if the collection has more elements to visit  Think of Java iterators as being between elements  The iterator jumps over the next element 7Riccardo Cardin
  • 8.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS 8Riccardo Cardin Think of Java iterators as being between elements While moving, the iterator returns the current element
  • 9.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using next and hasNext it is possibile to traverse the collection  As of Java 5, there is an elegant shortcut to looping a collection using iterators  To use the for each loop the data structure must implement Iterable<E> interface 9Riccardo Cardin for (String element : c) { // do something with element } Collection<String> c = /* ... */; Iterator<String> iter = c.iterator(); while (iter.hasNext()) { String element = iter.next(); // do something with element }
  • 10.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using an iterator it possibile to remove elements from a collection  The remove method removes the element that was returned by the last call to next  There is a strong dependency between next and remove: it is illegal to call remove if it wasn’t preceded by a call to next  This is the only way to safely modify a collection after the creation of an iterator 10Riccardo Cardin for (Iterator<String> iterator = list.iterator(); iterator.hasNext();) { // Point the iterator to the current element String string = iterator.next(); iterator.remove(); }
  • 11.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS 11Riccardo Cardin
  • 12.
    Programmazione concorrente edistribuita CONCRETE COLLECTIONS  All concrete collections implement Collection or Map interfaces  We will introduce only few of them 12Riccardo Cardin Collection type Description ArrayList An indexed sequence that grows and shrinks dynamically LinkedList An ordered sequence that allows efficient insertion and removal at any location HashSet An unordered collection that rejects duplicates TreeSet A sorted set HashMap A data structure that stores key/value associations
  • 13.
    Programmazione concorrente edistribuita LINKED LIST  A LinkedList<E> is an ordered data structure that stores each object in a link  Each link store a reference to the next link of the seq.  In Java, a linked list is always doubly linked 13Riccardo Cardin Implements List<E> interface
  • 14.
    Programmazione concorrente edistribuita LINKED LIST  Very efficient for remove and add operations  These ops are made through an Iterator  Other elements in the list have not to be repositioned after removal of an element 14Riccardo Cardin
  • 15.
    Programmazione concorrente edistribuita LINKED LIST  Also di add operation is made efficiently through an iterator of type ListIterator<E>  Use List.listIterator method to get one  New element is added before the current position of the iterator  Be carefull of concurrent modification using iterators  Linked list are very inefficient in random access 15Riccardo Cardin interface ListIterator<E> extends Iterator<E> { void add(E element); E previous() boolean hasPrevious() } for (int i = 0; i < list.size(); i++) // do something with list.get(i);
  • 16.
    Programmazione concorrente edistribuita LINKED LIST 16Riccardo Cardin Adding a new element changes at most two references
  • 17.
    Programmazione concorrente edistribuita ARRAY LIST  An ArrayList<E> is an ordered data structure that is very efficient in random access ops.  It encapsulate a dynamically reallocated array of objects  Adding and removing operation are not so efficient  Reallocation of elements is needed  Use ArrayList instead of Vector  More efficient due to its not synchronized methods 17Riccardo Cardin
  • 18.
    Programmazione concorrente edistribuita ARRAY LIST 18Riccardo Cardin
  • 19.
    Programmazione concorrente edistribuita HASH SET  A Set<E> is a data structure that doesn’t care about element’s ordering  A set contain the same element only once  Search operation performs very efficiently  A set is a Collection  An HashSet<E> uses hash codes to distinguish among elements  An hashCode is a number that can be derived from object data  You must provide an hash function to your classes  The function must be compatible with the equals method 19Riccardo Cardin
  • 20.
    Programmazione concorrente edistribuita HASH SET  Hash codes have to be computed quickly  Hash functions have some hash collision  Using the correct hash function the number of collision should be unlikely  Hash table are implemented as an array of linked lists 20Riccardo Cardin Using an hash function, the bucket in which inserting a new element is equal to hash(element) % #bucket If the hash function produceds values that are randomly distributed, collision should be rare buckets
  • 21.
    Programmazione concorrente edistribuita HASH SET  An HashSet is implemented using an hash table  The contains method is very efficient, because it has to lookup the element only in one bucket  An iterator to hash set visit each bucket in turn  Because of scattering, they are visited in a seemingly random order  The add method adds an element if it is not already present  Don’t mutate an element in a set once inserted  If the hash code of an element were to change, the element would no longer be in the correct position 21Riccardo Cardin
  • 22.
    Programmazione concorrente edistribuita TREE SET  A TreeSet<E> is a sorted set  While iterating over the collection, the elements are presented in sorted order  It uses a red-black tree to store data  Insertion is slower than insertion in an hash table, but it is still much faster than insertion in an array or linked list  ...but a tree set automatically sorts the elements ;)  The type of the elements may implement Comparable<T> 22Riccardo Cardin public interface Comparable<T> { // It returns a value that is < 0, = 0 or > 0 int compareTo(T other); }
  • 23.
    Programmazione concorrente edistribuita TREE SET  What if elements do not implement Comparable or if you need more than on compation alg?  Provide a Comparator during set construction  The compare method acts like the compareTo method  Function object (lambda anyone?!)  Using an HashSet or a TreeSet?  There must be a total ordering defined on elements  You have to implement also the Comparator interface  Do you neeed elements to be sorted? 23Riccardo Cardin public interface Comparator<T> { // Defines how two elements of type T have to be compared int compare(T a, T b); }
  • 24.
    Programmazione concorrente edistribuita TREE SET 24Riccardo Cardin
  • 25.
    Programmazione concorrente edistribuita MAPS  A Map<K,V> is a data structure that allows you to search an element using a key  A map stores key/value pairs  Key must be unique: If you call the put method twice with the same key, the second value replaces the first one.  If no info is associated with a key, get returns null  In sets to find an element you must have a copy of it  An HashMap hashes the keys 25Riccardo Cardin // HashMap implements Map Map<String, Employee> staff = new HashMap<>(); Employee harry = new Employee("Harry Hacker"); staff.put("987-98-9996", harry); String s = "987-98-9996"; e = staff.get(s); // gets harry
  • 26.
    Programmazione concorrente edistribuita MAPS  The collection framework does not consider a map itself as a Collection  It is possible to obtain views of the map  If you are interested in all the values of a map, loop over the enumeration of its entries  Iterators on views cannot add elements to the map  An UnsupportedOperationException is thrown 26Riccardo Cardin Set<K> keySet() // Set of keys Collection<K> values() // Collection of values Set<Map.Entry<K, V>> entrySet() // Set of pairs (key,value) for (Map.Entry<String, Employee> entry : staff.entrySet()) { String key = entry.getKey(); Employee value = entry.getValue(); // do something with key, value }
  • 27.
    Programmazione concorrente edistribuita THE FRAMEWORK  A framework is a set of classes that form the basis for building advanced functionality  The Collection framework defines classes to implement collections  The main interfaces are Collection and Map  Insertion interfaces are different between the two types  To get the elements from a Collection, just iterate over it  To get a value from a Map, use the get method 27Riccardo Cardin // To insert an element in a Collection boolean add(E element) // To store a key/value pair in a Map V put(K key, V value) V get(K key)
  • 28.
    Programmazione concorrente edistribuita THE FRAMEWORK  A List is an ordered collection  There is the concept of position of an element  A list provides random access methods  Lists provides a specialized iterator, ListIterator  A Set is a Collection with no duplicates  The add method can reject a value if already present  Methods equals and hashCode are used to maintain elements inside the set 28Riccardo Cardin void add(int index, E element) E get(int index) void remove(int index) void add(E element)
  • 29.
    Programmazione concorrente edistribuita THE FRAMEWORK  The framework provides some wrappers 29Riccardo Cardin
  • 30.
    Programmazione concorrente edistribuita THE FRAMEWORK  The framework have some companion objects  The Arrays type allows to trasform arrays into List  The list returned is a view on the array: it is not possible to change the size of the list; elements are the same  The Collections type have a bunch of utilities  The method nCopies builds an illusory immutable list  Object is stored only once  Method singleton returns an illusory set with one element  And so on... 30Riccardo Cardin static <T> List<T> asList(T... a) List<String> settings = Collections.nCopies(100, "DEFAULT"); Collections.singleton(anObject)
  • 31.
    Programmazione concorrente edistribuita THE FRAMEWORK 31Riccardo Cardin
  • 32.
    Programmazione concorrente edistribuita EXAMPLES 32Riccardo Cardin https://github.com/rcardin/pcd-snippets
  • 33.
    Programmazione concorrente edistribuita REFERENCES  Chap. 13 «Collections», Core Java Volume I - Fundamentals, Cay Horstmann, Gary Cornell, 2012, Prentice Hall 33Riccardo Cardin