DATA STRUCTURES AND ALGORITHMS-II
For S.Y.B.Sc. Computer Science : Semester – IV (Paper – I)
 [Course Code CS 241 : Credits - 2]
 CBCS Pattern
 As Per New Syllabus, Effective from June 2020
Price ₹ 190.00
 N5541
DATA STRUCTURES & ALGORITHMS-II ISBN 978-93-90506-31-6
First Edition : January 2021
© : Author
 The text of this publication, or any part thereof, should not be reproduced or transmitted in any form or stored in any computer
storage system or device for distribution including photocopy, recording, taping or information retrieval system or reproduced on any disc,
tape, perforated media or other information storage device etc., without the written permission of Author with whom the rights are
reserved. Breach of this condition is liable for legal action.
 Every effort has been made to avoid errors or omissions in this publication. In spite of this, errors may have crept in. Any mistake, error
or discrepancy so noted, if brought to our notice, shall be taken care of in the next edition. It is notified that neither the publisher
nor the author or seller shall be responsible for any damage or loss of any kind, to anyone, in any manner, arising therefrom.
 DISTRIBUTION CENTRES
 PUNE
 Nirali Prakashan : 119, Budhwar Peth, Jogeshwari Mandir Lane, Pune 411002, Maharashtra
 (For orders within Pune) Tel : (020) 2445 2044; Mobile : 9657703145
 Email : niralilocal@pragationline.com
 Nirali Prakashan : S. No. 28/27, Dhayari, Near Asian College Pune 411041
 (For orders outside Pune) Tel : (020) 24690204; Mobile : 9657703143
 Email : bookorder@pragationline.com
 MUMBAI
 Nirali Prakashan : 385, S.V.P. Road, Rasdhara Co-op. Hsg. Society Ltd.,
 Girgaum, Mumbai 400004, Maharashtra; Mobile : 9320129587
 Tel : (022) 2385 6339 / 2386 9976, Fax : (022) 2386 9976
 Email : niralimumbai@pragationline.com
 DISTRIBUTION BRANCHES
 JALGAON
 Nirali Prakashan : 34, V. V. Golani Market, Navi Peth, Jalgaon 425001, Maharashtra,
 Tel : (0257) 222 0395, Mob : 94234 91860; Email : niralijalgaon@pragationline.com
 KOLHAPUR
 Nirali Prakashan : New Mahadvar Road, Kedar Plaza, 1st Floor Opp. IDBI Bank, Kolhapur 416 012
 Maharashtra. Mob : 9850046155; Email : niralikolhapur@pragationline.com
 NAGPUR
 Nirali Prakashan : Above Maratha Mandir, Shop No. 3, First Floor,
 Rani Jhanshi Square, Sitabuldi, Nagpur 440012, Maharashtra
 Tel : (0712) 254 7129; Email : niralinagpur@pragationline.com
 DELHI
 Nirali Prakashan : 4593/15, Basement, Agarwal Lane, Ansari Road, Daryaganj
 Near Times of India Building, New Delhi 110002 Mob : 08505972553
 Email : niralidelhi@pragationline.com
 BENGALURU
 Nirali Prakashan : Maitri Ground Floor, Jaya Apartments, No. 99, 6th Cross, 6th Main,
 Malleswaram, Bengaluru 560003, Karnataka; Mob : 9449043034
 Email: niralibangalore@pragationline.com
 Other Branches : Hyderabad, Chennai
Note : Every possible effort has been made to avoid errors or omissions in this book. The reader must cross-check all the facts and
contents with the original Government notifications or publications.
 niralipune@pragationline.com | www.pragationline.com
 Also find us on www.facebook.com/niralibooks
Preface …
 I take this opportunity to present this text book on "Data Structures and Algorithms-II"
to the students of Second Year B.Sc. (Computer Science) Semester-IV as per the New
Syllabus, June 2020.
 The book has its own unique features. It brings out the subject in a simple and lucid
manner for easy and comprehensive understanding of the basic concepts. The book covers
the theory of Trees, Efficient Search Trees, Graphs and Hash Tables.
 A special word of thanks to Shri. Dineshbhai Furia and Mr. Jignesh Furia for
showing full faith in me to write this text book. I also thank Mr. Amar Salunkhe and
Mr. Akbar Shaikh of M/s Nirali Prakashan for their excellent co-operation.
 I also thank Ms. Chaitali Takle, Mr. Ravindra Walodare, Mr. Sachin Shinde, Mr. Ashok
Bodke, Mr. Moshin Sayyed and Mr. Nitin Thorat.
 Although every care has been taken to check mistakes and misprints, suggestions from
teachers and students for the improvement of this text book shall be most welcome.
 Author
Syllabus …
1. Tree (10 Hrs.)
 1.1 Concept and Terminologies
 1.2 Types of Binary Trees - Binary Tree, Skewed Tree, Strictly Binary Tree, Full Binary Tree,
 Complete Binary Tree, Expression Tree, Binary Search Tree, Heap
 1.3 Representation - Static and Dynamic
 1.4 Implementation and Operations on Binary Search Tree - Create, Insert, Delete, Search, Tree
 Traversals - Preorder, Inorder, Postorder (Recursive Implementation), Level-Order Traversal
 using Queue, Counting Leaf, Non-Leaf and Total Nodes, Copy, Mirror
 1.5 Applications of Trees
 1.5.1 Heap Sort, Implementation
 1.5.2 Introduction to Greedy Strategy, Huffman Encoding (Implementation using Priority
 Queue)
2. Efficient Search Trees (8 Hrs.)
 2.1 Terminology: Balanced Trees - AVL Trees, Red Black Tree, Splay Tree, Lexical Search
 Tree -Trie
 2.2 AVL Tree - Concept and Rotations
 2.3 Red Black Trees - Concept, Insertion and Deletion
 2.4 Multi-Way Search Tree - B and B+ Tree - Insertion, Deletion
3. Graph (12 Hrs.)
 3.1 Concept and Terminologies
 3.2 Graph Representation - Adjacency Matrix, Adjacency List, Inverse Adjacency List, Adjacency
 Multi-list
 3.3 Graph Traversals - Breadth First Search and Depth First Search (With Implementation)
 3.4 Applications of Graph
 3.4.1 Topological Sorting
 3.4.2 Use of Greedy Strategy in Minimal Spanning Trees (Prim’s and Kruskal’s Algorithm)
 3.4.3 Single Source Shortest Path - Dijkstra’s Algorithm
 3.4.4 Dynamic Programming Strategy, All Pairs Shortest Path - Floyd Warshall Algorithm
 3.4.5 Use of Graphs in Social Networks
4. Hash Table (6 Hrs.)
 4.1 Concept of Hashing
 4.2 Terminologies - Hash Table, Hash Function, Bucket, Hash Address, Collision, Synonym,
 Overflow etc.
 4.3 Properties of Good Hash Function
 4.4 Hash Functions: Division Function, Mid Square, Folding Methods
 4.5 Collision Resolution Techniques
 4.5.1 Open Addressing - Linear Probing, Quadratic Probing, Rehashing
 4.5.2 Chaining - Coalesced, Separate Chaining
 1.0 INTRODUCTION
• A tree is a non-linear data structure. A non-linear data structure is one in which its
 elements do not form a sequence.
• A tree data structure is a widely used Abstract Data Type (ADT) that simulates a
 hierarchical tree structure, with a root value and subtrees of children with a parent
 node, represented as a set of linked nodes.
• A tree data structure stores the data elements in a hierarchical manner. A tree is a
 hierarchical data structure that consists of nodes connected by edges.
• Let us see an example of a directory structure stored by operating system. The
 operating system organizes files into directories and subdirectories. This can be
 viewed as tree shown in Fig. 1.1.
 [Figure: root directory Desktop with subdirectories Win, TC and Java; include and lib are subdirectories at the next level.]
 Fig. 1.1: Tree Representation of Directory Structure
Data Structures & Algorithms - II Tree
 [Fig. 1.2: parts of a tree — a parent node with children B and C (siblings), an edge, and a subtree.]
1.1.1 Definition
• A tree may be defined as a finite set T of one or more nodes such that there is a node
 designated as the root of the tree, and the remaining nodes (excluding the root) are divided
 into n ≥ 0 disjoint sets T1, T2, …, Tn, each of which is in turn a tree. The trees
 T1, T2, …, Tn are called the subtrees or children of the root.
• Fig. 1.3 (a) shows an empty tree (there are no nodes), Fig. 1.3 (b) shows a tree with
 only one node, and the tree in Fig. 1.3 (c) has 12 nodes.
  (a) root = NULL (empty tree)
  (b) root → A (a single node)
  (c)                                Level
      root →        A                  0
               B         C             1
             D   E     F   G           2
             H   I       J   K         3
                             L         4
  Fig. 1.3: Examples of Tree
• The root of tree is A, it has 2 subtrees. The roots of these subtrees are called the
 children of the root A.
• The nodes with no subtrees are called terminal node or leaves. There are 5 leaves in
 the tree of Fig. 1.3 (c).
• Because family relationships can be modeled as trees, we often call the root of a tree (or
 subtree) the parent and the roots of the subtrees the children; the children of a node
 are called siblings.
1.1.3 Terminology
• A tree consists of following terminology:
 1. Node: In a tree, every individual element is called as node. Node in a tree data
 structure, stores the actual data of that particular element and link to next element
 in hierarchical structure. Fig. 1.3 (c) has 12 nodes.
 2. Root Node: Every tree must have a root node. In tree data structure, the first node
 from where the tree originates is called as a root node. In any tree, there must be
 only one root node.
 3. Parent Node: In tree data structure, the node which is predecessor of any node is
 called as parent node. Parent node can be defined as, "the node which has
 child/children". In simple words, the node which has branch from it to any other
 node is called as parent node.
 4. Child Node: In tree data structure, the node which is descendant of any node is
 called as child node. In simple words, the node which has a link from its parent
 node is called as child node. In tree data structure, all the nodes except root node
 are child nodes.
 5. Leaf Node: The node which does not have any child is called as a leaf node. Leaf
 nodes are also called as external nodes or terminal nodes.
 6. Internal Node: In a tree data structure, a node which has at least one child is
 called as an internal node (or non-leaf node). In other words, every node other than
 a leaf node is an internal node.
 7. Edge: In tree data structure, the connecting link between any two nodes is called
 as edge. In a tree with 'n' number of nodes there will be a maximum of 'n-1'
 number of edges.
 8. Path: In tree data structure, the sequence of nodes and edges from one node to
 another node is called as path between those two nodes. Length of a path is the
 number of edges on that path.
 9. Siblings: Nodes which belong to the same parent are called as siblings. In other
 words, nodes with the same parent are sibling nodes. [Oct. 17]
 10. Null Tree: A tree with no nodes is called as a null tree (Refer Fig. 1.3 (a)).
 11. Degree of a Node: Degree of a node is the total number of children of that node.
 The degree of A is 2, degree of K is 1 and degree of L is 0, (Refer Fig. 1.3 (c)).
 12. Degree of a Tree: The highest degree of a node among all the nodes in a tree is
 called as degree of tree. In Fig. 1.3 (c), degree of the tree is 2. [April 18]
 13. Depth or Level of a Node: In tree data structure, the root node is said to be at
 Level 0 and the children of root node are at Level 1 and the children of the nodes
 which are at Level 1 will be at Level 2 and so on. In simple words, in a tree each
 step from top to bottom is called as a Level and the Level count starts with '0' and
 incremented by one (1) at each level (step).
 14. Descendants: The descendants of a node are those nodes which are reachable
 from that node. In Fig. 1.3 (c), nodes J, K and L are descendants of G.
 15. Ancestors: The ancestors of a node are all the nodes along the path from the root to
 that node. In Fig. 1.3 (c), nodes A and C are ancestors of G.
 16. In-degree: The in-degree of a node is the total number of edges coming to that
 node.
 17. Out-degree: The out-degree of a node is the total number of edges going outside
 from the node.
 18. Forest: A forest is a set of disjoint trees.
 19. Sub Tree: In tree data structure, each child from a node forms a subtree
 recursively. Every child node will form a subtree on its parent node.
 20. Height: In tree data structure, the total number of edges from leaf node to a
 particular node in the longest path is called as height of that node. In a tree, height
 of the root node is said to be height of the tree. In a tree, height of all leaf nodes is
 '0'.
 21. Depth: Total number of edges from root node to a particular node is called
 as depth of that node. In tree, depth of the root node is ‘0’. The tree of Fig. 1.3 (c),
 has depth 4.
1.2.8 Heap
• A heap is a special tree-based data structure in which the tree is a complete binary
 tree.
• Generally, heaps can be of following two types:
 1. Max-Heap: In a max-heap the key present at the root node must be greatest among
 the keys present at all of its children. The same property must be recursively true
 for all sub-trees in that binary tree.
 2. Min-Heap: In a min-heap the key present at the root node must be minimum
 among the keys present at all of its children. The same property must be
 recursively true for all sub-trees in that binary tree.
 [Figure: a min-heap with root 10 and a max-heap with root 100 — in each, the key at every parent is respectively smaller or larger than the keys of its children.]
• In the static (array) representation, nodes of the tree are stored level by level, starting
 from the zero level where the root is present.
• The root node is stored in the first memory location as the first element in the array.
• Static representation of tree needs sequential numbering of the nodes starting with
 nodes on level zero, then those on level 1 and so on.
• A complete binary tree of height h has 2^(h+1) − 1 nodes in it. The nodes can be stored in
 a one-dimensional array. An array of size 2^(h+1) − 1, i.e. 2^d − 1 (where d = number of
 levels), is required.
• The following rules can be used to decide the location of any i-th node of a tree.
 For any node with index i, 1 ≤ i ≤ n:
 (a) PARENT(i) = ⌊i/2⌋, if i ≠ 1.
  If i = 1 then it is the root, which has no parent.
 (b) LCHILD(i) = 2 * i, if 2i ≤ n.
  If 2i > n, then i has no left child.
 (c) RCHILD(i) = 2i + 1, if 2i + 1 ≤ n.
  If (2i + 1) > n, then i has no right child.
• Let us, consider complete binary tree in the Fig. 1.12.
            A
        B       E
      C   D   F   G
 Fig. 1.12
• The representation of the above tree in Fig. 1.12 using array is given in Fig. 1.13.
 1 2 3 4 5 6 7 8 9
 A B E C D F G – –
 Fig. 1.13
• The parent of node i is at location ⌊i/2⌋.
 Example: parent of node D (index 5) = ⌊5/2⌋ = 2, i.e. B.
• The left child of node i is present at position 2i, if 2i ≤ n.
 Example: left child of node B (index 2) = 2 × 2 = 4, i.e. C.
• The right child of node i is present at position 2i + 1, if 2i + 1 ≤ n.
 Example: right child of node E (index 3) = (2 × 3) + 1 = 7, i.e. G.
Examples:
 Example 1: Consider the complete binary tree in Fig. 1.14.
              A (0)
        B (1)       E (2)
     C (3)  D (4)  F (5)  G (6)
  Fig. 1.14
 In Fig. 1.14,
 Number of levels = 3 (levels 0 to 2) and height = 2.
 Therefore, we need an array of size 2^d − 1 = 2^3 − 1 = 7 (equivalently 2^(h+1) − 1 = 7).
 The representation of the above binary tree using an array is as follows:
 1 2 3 4 5 6 7
 A B E C D F G
 We will number each node of tree starting from the root. Nodes on same level will be
 numbered left to right.
 Example 2: Consider almost complete binary tree in Fig. 1.15.
              A
         B         C
       D   E
      F G H I
  Fig. 1.15
 Here, depth = 4 (levels 0 to 3), therefore we need an array of size 2^4 − 1 = 15. The
 representation of the above binary tree using an array is as follows:
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
 A B C D E − − F G H  I  −  −  −  −
 We can apply the above rules to find the array representation:
 1. Parent of node E (node 5) = ⌊5/2⌋ = 2, i.e. B.
 2. LCHILD(i) = 2i.
  For example, left child of E = 2 × 5 = 10, i.e. H (since E is the 5th node of the tree).
 3. RCHILD(i) = 2i + 1.
  For example, right child of D = 2 × 4 + 1 = 8 + 1 = 9, i.e. G (since D is the 4th node
  of the tree and G is the 9th element of the array).
 Example 3: Consider the example of a skewed tree, in which each node has only a left child:
        A
      B
    C
  D
  Fig. 1.16: Skewed Binary Tree
  The tree has the following array representation:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  A B – C – – – D – –  –  –  –  –  –
  We need an array of size 2^4 − 1 = 15.
  Thus, only four positions are occupied in the array and 11 are wasted.
 From the above examples it is clear that the array representation of a binary tree is
 easy, but the method has certain drawbacks. In most cases, there will be a lot of unused
 space. For complete binary trees the representation is ideal, as no space is wasted.
 But for the skewed binary tree (see Fig. 1.16), less than half of the array is used and
 the rest is left unused. In the worst case, a skewed binary tree of depth k will require
 2^k − 1 array locations while occupying only k of them.
Advantages of Array Representation of Tree:
 1. In static representation of tree, we can access any node from other node by
 calculating index and this is efficient from the execution point of view.
 2. In static representation, the data is stored without any pointers to their successor
 or ancestor.
 3. Programming languages like BASIC, FORTRAN etc., where dynamic allocation is
 not possible, use array representation to store a tree.
Disadvantages of Array Representation of Tree:
 1. In static representation, many of the array entries may remain empty.
 2. In static representation, there is no way to enhance the tree structure; the
  array size cannot be changed during execution.
 3. Inserting a new node to it or deleting a node from it are inefficient for this
 representation, (here, considerable data movement up and down the array
 happens, thus more/excessive processing time is used).
Program 1.1: Program for static (array) representation of tree (converting an array list
into a binary tree).
 #include<stdio.h>
 #include<stdlib.h> /* for malloc() */
 typedef struct node
 {
     struct node *left;
     struct node *right;
     char data;
 } node;
 /* build a tree from the array: children of index n are at 2n+1 and 2n+2 */
 node* insert(char c[], int n)
 {
     node *tree = NULL;
     if(c[n] != '\0')
     {
         tree = (node *)malloc(sizeof(node));
         tree->left = insert(c, 2*n + 1);
         tree->data = c[n];
         tree->right = insert(c, 2*n + 2);
     }
     return tree;
 }
 /* traverse the tree in inorder */
 void inorder(node *tree)
 {
     if(tree != NULL)
     {
         inorder(tree->left);
         printf("%c\t", tree->data);
         inorder(tree->right);
     }
 }
 int main()
 {
     node *tree = NULL;
     char c[] = {'A','B','C','D','E','F','\0','G','\0','\0','\0',
                 '\0','\0','\0','\0','\0','\0','\0','\0','\0','\0'};
     tree = insert(c, 0);
     inorder(tree);
     return 0;
 }
Output:
 G D B E A F C
 [Fig. 1.17: a binary tree with root node A. Below it, (a) shows the linked representation of a binary tree and (b) the linked representation of a left-skewed tree A–B–C–D; absent children are stored as NULL pointers.]
 Fig. 1.18: Linked Representation of Binary Tree
• Each node contains information (data) and two links, namely Lchild and Rchild, which
 are used to store the addresses of the left child and right child of the node.
 Declaration in 'C': We can define the node structure as follows:
 struct tnode
 {
 struct tnode *left;
 <datatype> data; /* data type is a data type of field data */
 struct tnode *right;
 };
• Using this declaration for the linked representation, the binary tree can be logically
 viewed as in Fig. 1.19 (c). Fig. 1.19 (b) shows the physical representation, i.e. the
 memory allocation of the nodes.
 [Fig. 1.19: (a) a binary tree with nodes A to I; (b) physical view — each node is stored at an arbitrary memory address with fields left | data | right, the link fields holding the addresses of the children (NULL where a child is absent); (c) logical view of the same linked structure.]
 Fig. 1.19: Linked Representation of Binary Tree
Advantages of Linked Representation of Binary Tree:
 1. More efficient use of memory than the static representation.
 2. Insertion and deletion operations are more efficient.
 3. Enhancement (growth) of the tree is possible.
Disadvantages of Linked Representation of Binary Tree:
 1. If we want to access a particular node, we have to traverse from root to that node;
 there is no direct access to any node.
 2. Since, two additional pointers are present (left and right), the memory needed per
 node is more than that of sequential/static representation.
 3. Programming languages that do not support dynamic memory management cannot
  use the dynamic representation.
Program 1.2: Program for dynamic (linked list) representation of tree.
#include <stdio.h>
#include <stdlib.h> /* for malloc() */
struct node {
    struct node *left;
    char data;
    struct node *right;
};
struct node *constructTree(int);
void inorder(struct node *);
/* index -1 means "no child"; leftcount[i] and rightcount[i] give the array
   indices of the children of the node stored at array[i] */
char array[] = {'A', 'B', 'C', 'D', 'E', 'F', 'G', '\0', '\0', 'H'};
int leftcount[]  = {1, 3, 5, -1, 9, -1, -1, -1, -1, -1};
int rightcount[] = {2, 4, 6, -1, -1, -1, -1, -1, -1, -1};
int main() {
    struct node *root;
    root = constructTree(0);
    inorder(root);
    return 0;
}
struct node *constructTree(int index) {
    struct node *temp = NULL;
    if (index != -1) {
        temp = (struct node *)malloc(sizeof(struct node));
        temp->left = constructTree(leftcount[index]);
        temp->data = array[index];
        temp->right = constructTree(rightcount[index]);
    }
    return temp;
}
void inorder(struct node *root) {
    if (root != NULL) {
        inorder(root->left);
        printf("%c\t", root->data);
        inorder(root->right);
    }
}
 3. Deletion: The deletion operation deletes a node from any non-empty binary tree.
  In order to delete a node in a binary tree, it is required to reach the parent node
  of the node to be deleted. The link field of the parent node which stores the
  address of the node to be deleted is then set to NULL, as shown in Fig. 1.21.
  [Fig. 1.21: Node with Data G is Deleted and Left of D becomes NULL]
 4. Traversal: Binary tree traversal means visiting every node of the tree exactly
  once. There are three methods of traversing a binary tree, called inorder,
  preorder and postorder traversals.
1.4.1.1 Creating Binary Search Tree [April 16, 18, 19, Oct. 17]
• The function create() simply creates an empty binary search tree by initializing the
 root pointer to NULL.
 struct tree *root = NULL;
• The actual tree is built through a call to the insert BST function for every key to be
 inserted.
Algorithm:
 1. root = NULL
 2. read key and create a new node to store the key value
 3. if root is null then the new node becomes root
 4. t = root, flag = false
 5. while (t ≠ null) and (flag = false) do
  case 1: if key < t → data, move left (t = t → left); if left is null, attach the new node as the left child
  case 2: if key > t → data, move right (t = t → right); if right is null, attach the new node as the right child
  case 3: if t → data = key,
   then print "key already exists" and set flag = true
 6. Stop.
Example: Construct Binary Search Tree (BST) for the following elements:
 15, 11, 13, 8, 9, 17, 16, 18
 1. Insert 15 → becomes the root.
 2. Insert 11: 11 < 15, so 11 becomes the left child of 15.
 3. Insert 13: 13 < 15, 13 > 11, so 13 becomes the right child of 11.
 4. Insert 8: 8 < 15, 8 < 11, so 8 becomes the left child of 11.
 5. Insert 9: 9 < 15, 9 < 11, 9 > 8, so 9 becomes the right child of 8.
 6. Insert 17: 17 > 15, so 17 becomes the right child of 15.
 7. Insert 16 (key = 16): 16 > 15, 16 < 17, so 16 becomes the left child of 17.
 8. Insert 18 (key = 18): 18 > 15, 18 > 17, so 18 becomes the right child of 17.
• Here, all values in the left subtrees are less than the root and all values in the right
 subtrees are greater than the root.
Program 1.3: Program to create a BST.
 #include<stdio.h>
 #include<stdlib.h> /* for malloc() */
 struct node
 {
     int data;
     struct node *right;
     struct node *left;
 };
Output:
 Enter data for node 10
 Do you want to insert more elements?y
 Enter data for node 20
 20 inserted right of 10
 Do you want to insert more elements?y
 Enter data for node 5
 5 inserted left of 10
 Do you want to insert more elements?y
 Enter data for node 15
 15 inserted right of 10
 15 inserted left of 20
 Do you want to insert more elements?y
 Enter data for node 2
 2 inserted left of 10
 2 inserted left of 5
 Do you want to insert more elements?
1.4.1.2 Searching a Node in BST
• To search a target key, we first compare it with the key at the root of the tree. If it is
 the same, the search ends; if it is less than the key at the root, we search the target key
 in the left subtree, else we search in the right subtree.
• Example: Consider the BST in Fig. 1.23.
 [Fig. 1.23: a BST of month names — root Jan; left child Dec, right child May; Apr is the left child of Dec.]
1.4.1.3 Inserting a Node into BST [April 16, 18, Oct. 17, 18]
• We insert a node into binary search tree in such a way that resulting tree satisfies the
 properties of BST.
Algorithm: Insert_BST (Key)
Steps: 1. t = root, flag = false
 2. while (t ≠ null) & (flag = false) do
 case 1: key < t → data
 t1 = t;
 t = t → left;
 case 2: key > t → data
 t1 = t
 t = t → right
 case 3: t → data = key
  flag = true
  display "key already exists"
  break
 end case
 end while
 3. if (t = null) then
 new = getnode (node) //create node
 new → data = key
 new → left = null //initialize a node
 new → right = null
 if (t1 → data < key) then //insert at right
 t1 → right = new
 else t1 → left = new //insert at left
 endif
 4. Stop
Recursive C function for inserting node into BST:
 BSTNODE *Insert-BST (BSTNODE *root, int n)
 {
 if (root == NULL)
 {
 root = (BSTNODE *) malloc (sizeof(BSTNODE));
 root → data = n;
 root → left = root → right = NULL;
 }
 else
 if (n < root → data)
 root → left = Insert-BST (root → left, n);
 else
 root → right = Insert–BST (root → right, n);
 return(root);
 }
Non-Recursive C function for inserting node into BST:
 BSTNODE *Insert-BST (BSTNODE *root, int n)
 {
     BSTNODE *temp, *newnode;
     newnode = (BSTNODE*) malloc (sizeof(BSTNODE));
     newnode → data = n;
     newnode → left = newnode → right = NULL;
     if (root == NULL)
         root = newnode;
     else
     {
         temp = root;
         while (temp)
         {
             if (n < temp → data)
             {
                 if (temp → left == NULL)
                 {
                     temp → left = newnode;
                     break;
                 }
                 else
                     temp = temp → left;
             }
             else if (n > temp → data)
             {
                 if (temp → right == NULL)
                 {
                     temp → right = newnode;
                     break;
                 }
                 else
                     temp = temp → right;
             }
             else
             {   /* duplicate key: do not insert */
                 free(newnode);
                 break;
             }
         }
     }
     return root;
 }
• Example: Consider the insertion of keys: Heena, Deepa, Tina, Meena, Beena, Anita.
• Initially tree is empty. The first key 'Heena' is inserted, it becomes a root. Since, 'Deepa'
 < 'Heena', it is inserted at left. Similarly insert remaining keys in such a way that they
 satisfy BST properties.
• Now, suppose we want to insert the key 'Geeta'. It is first compared with the root. Since
 'Geeta' < 'Heena', the search proceeds to the left side. Since 'Geeta' > 'Deepa' and the
 right child of 'Deepa' is null, 'Geeta' is inserted as the right child of 'Deepa'.
 [Fig. 1.25: Binary Search Tree — root 22; children 12 and 30; 12 has children 10 and 15; 30 has children 25 and 35; 35 has left child 32; 32 has right child 33.]
 Case 1: Deleting a leaf node. After deletion of the leaf node with data 33, the link from
 its parent 32 is set to NULL (Fig. 1.26: node 33 is deleted).
 Case 2: Deleting a node with one child. The node's only child is attached in its place:
 deleting node 35, its child 32 takes its position (Fig. 1.27: Deletion of Node 35).
 Case 3: The node to be deleted has both children. When the node to be deleted has
 two non-empty subtrees, the problem is more difficult.
 One solution is to attach the right subtree in place of the deleted node and then
 attach the left subtree to the appropriate node of the right subtree.
 Fig. 1.28 shows the deletion of the node with data 12 by this method.
 [Fig. 1.29: deleting a node x with two children — its inorder successor z replaces it, and z's subtree is re-attached.]
 Another approach is to delete x from T by first deleting the inorder successor of
 node x, say z, and then replacing the data content of node x by the data content of
 node z. The inorder successor is the node which comes immediately after node x
 during the inorder traversal of T.
• Example: Consider Fig. 1.30 and delete the node with data 15. The tree has root 10,
 with left child 8 (whose left child is 6) and right child 15; 15 has children 11 and 20;
 20 has children 16 and 22; 16 has right child 18. The inorder successor of 15 is 16, so
 16 replaces 15, and 18 takes 16's old place as the left child of 20.
1.4.3 Tree Traversals (Preorder, Inorder and Postorder) [April 16, 19]
• Traversing a binary tree refers to the process of visiting each and every node of the
 tree exactly once.
• There are two methods of implementing a traversal, namely recursive and non-recursive:
 1. Recursive is a straightforward approach where the programmer has to convert
  the definition into recursion. The entire load is on the language translator to
  carry out the execution.
 2. The non-recursive approach makes use of a stack. This approach is more efficient,
  as it requires less execution time. In addition, it suits programming languages
  which do not support dynamic memory management.
• The sequence in which the nodes are processed defines a particular traversal method.
• There are three traversal methods, i.e. inorder traversal, preorder traversal and
 postorder traversal.
• These traversal techniques are applicable to a binary tree as well as a binary search tree.
Preorder Traversal (DLR):
• In preorder traversal, the root node is visited before traversing its left child and
 right child nodes.
• In this traversal, the root node is visited first, then its left child and later its right child.
 This preorder traversal is applicable for every root node of all subtrees in the tree.
 (i) Process the root Data (D) (Data)
 (ii) Traverse the left subtree of D in preorder (Left)
 (iii) Traverse the right subtree of D in preorder (Right)
 Preorder ⇒ Data − Left − Right (DLR)
Example:
• A preorder traversal of the tree in Fig. 1.31 (a) visits the nodes in the sequence
 A B D E C F.
          A
       B     C
     D   E  F
 Fig. 1.31 (a)
 [Fig. 1.31 (b): Prefix Expression]
• In preorder traversal, we visit a node, traverse left and continue again. When we
 cannot continue, move right and begin again or move back, until we can move right
 and stop. Preorder function can be written as both recursive and non-recursive way.
Recursive preorder traversal:
 Algorithm:
 Step 1 : begin
 Step 2 : if tree not empty
 visit the root
 preorder (left child)
 preorder(right child)
 Step 3 : end
C Function for Recursive Preorder Traversal:
 void preorder (struct treenode * root)
 {
 if (root)
 {
 printf("%d \t", root → data); /*data is integer type */;
 preorder (root → left);
 preorder (root → right);
 }
 }
Example:
• The tree in Fig. 1.32 contains an arithmetic expression. Its preorder traversal gives
 the prefix expression + * − A / B C D E. The preorder traversal is also called the
 depth-first traversal.
            +
         *     E
       −   D
     A   /
        B C
 Fig. 1.32: Binary Tree for Expression
  ((A − B/C) * D) + E
Inorder Traversal (LDR):
• In Inorder traversal, the root node is visited between the left child and right child.
• In this traversal, the left child node is visited first, then the root node is visited and
 later we go for visiting the right child node.
• This inorder traversal is applicable for every root node of all subtrees in the tree. This
 is performed recursively for all nodes in the tree.
 (i) Traverse the left subtree of D in inorder (Left)
 (ii) Process the root Data (D) (Data)
 (iii) Traverse the right subtree of D in inorder (Right)
 Inorder ⇒ Left − Data − Right (LDR)
Algorithm for Recursive Inorder Traversal:
 Step 1 : begin
 Step 2 : if tree is not empty
 inorder (left child)
 visit the root
 inorder (right child)
 Step 3 : end
'C' function for Inorder Traversal:
• The inorder function keeps moving down the tree towards the left until it can go no
 further, then visits the node, moves one node to the right and continues again.
 void inorder (tnode * root)
 {
 if(root)
 {
 inorder (root → left);
 printf("%d", root → data);
 inorder (root → right);
 }
 }
• This traversal is also called the symmetric traversal.
 [Fig. 1.33: an expression tree with root 'log', whose right child is '!', whose left child is 'n'.]
 Traversals: Preorder Traversal: log ! n
 Inorder Traversal: log n !
 Postorder Traversal: n ! log
 (b)
            and
          /     \
       and       <
      /   \     / \
     <     <   c   d
    / \   / \
   a   b b   c
 Fig. 1.34
 Traversals: Preorder Traversal: and and < a b < b c < c d
 Inorder Traversal: a < b and b < c and c < d
 Postorder Traversal: a b < b c < and c d < and
 (c)
 [Fig. 1.35: a five-node tree — root 1; 1's left child is 2; 2's right child is 3; 3's left child is 4; 4's right child is 5.]
 Traversals: Preorder Traversal: 1 2 3 4 5
 Inorder Traversal: 2 4 5 3 1
 Postorder Traversal: 5 4 3 2 1
 (d)
            1
          /   \
         2     3
        /     / \
       4     5   6
        \       / \
         7     8   9
 Fig. 1.36
 Traversals: Preorder Traversal: 1 2 4 7 3 5 6 8 9
 Inorder Traversal: 4 7 2 1 5 3 8 6 9
 Postorder Traversal: 7 4 2 5 8 9 6 3 1
Examples:
 Example 1: Perform inorder, postorder and preorder traversals of the binary tree
shown in Fig. 1.37.
 [Fig. 1.37: a binary tree with root A, left child B and right child H.]
 In-order traversal (left-root-right): D C E B G F A I H J
 Example 2: Consider the binary tree of Fig. 1.38, with root 8:
 Pre-order: 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
 In-order: 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
 Post-order: 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
Fig. 1.38
Program 1.4: Menu driven program for binary search tree creation and tree traversing.
 #include<stdio.h>
 #include<malloc.h>
 #include<stdlib.h>
 struct node
 {
 int data;
 struct node *right;
 struct node *left;
 };
 struct node *Create(struct node *, int);
 void Inorder(struct node *);
 void Preorder(struct node *);
 void Postorder(struct node *);
 int main()
 {
 struct node *root = NULL;
 setbuf(stdout, NULL);
 int choice, item, n, i;
 printf("\n*** Binary Search Tree ***\n");
 printf("\n1. Creation of BST");
 printf("\n2. Traverse in Inorder");
 printf("\n3. Traverse in Preorder");
 printf("\n4. Traverse in Postorder");
 printf("\n5. Exit\n");
 while(1)
 {
 printf("\nEnter Your Choice :(1.Create 2.Inorder 3.Preorder
 4.Postorder 5.Exit)\n");
 scanf("%d",&choice);
 switch(choice)
 {
 case 1:
 root = NULL;
 printf("Enter number of nodes:\n");
 scanf("%d",&n);
 for(i = 1; i <= n; i++)
 {
 printf("\nEnter data for node %d : ", i);
 scanf("%d",&item);
 root = Create(root,item);
 }
 break;
 case 2:
 Inorder(root);
 break;
 case 3:
 Preorder(root);
 break;
 case 4:
 Postorder(root);
 break;
 case 5:
 exit(0);
 default:
 printf("Wrong Choice !!\n");
 }
 }
 return 0;
 }
 struct node *Create(struct node *root, int item)
 {
 if(root == NULL)
 {
 root = (struct node *)malloc(sizeof(struct node));
 root->left = root->right = NULL;
 root->data = item;
 return root;
 }
 else
 {
 if(item < root->data )
 root->left = Create(root->left,item); //recursive function call
 else if(item > root->data )
 root->right = Create(root->right,item);
 else
 printf(" Duplicate Element is Not Allowed !!!");
 return(root);
 }
 }
 void Inorder(struct node *root)
 {
 if( root != NULL)
 {
 Inorder(root->left); //recursive function call
 printf("%d ",root->data);
 Inorder(root->right);
 }
 }
Fig. 1.39
• To count total nodes of a binary tree, we traverse a tree. We can use any of the three
 traversal methods.
'C' Function:
 int CountNode(struct node *root)
 {
     int count;
     if (root == NULL)
         return 0;
     else
     {
         count = 1 + CountNode(root->left) + CountNode(root->right);
         return count;
     }
 }
 OR
 int CountNode(struct node *root)
 {
     if (root == NULL)
         return 0;
     else
         return (1 + CountNode(root->left) + CountNode(root->right));
 }
 Fig. 1.40: A binary tree with root 20 (left) and its mirror image (right)
'C' Function:
 struct node * mirror(struct node * root)
 {
     struct node * temp = NULL;
     if (root != NULL)
     {
         temp = root->left;
         root->left = mirror(root->right);
         root->right = mirror(temp);
     }
     return root;
 }
 (a) Max Heap (b) Min Heap
 Fig. 1.41: Heap Trees
• Heap sort is one of the important applications of the heap tree.
• Heap sort is based on the heap tree and is an efficient sorting method. We can
 sort either in ascending or descending order using heap sort.
 Fig. 1.42: Complete binary tree with root 32; second level 15, 64; third level 2, 75, 67, 57; and 80 as the last node
 The tree of Fig. 1.42 is not a max heap tree; we convert it into a heap tree as follows:
 (a)-(e) Successive parent-child interchanges while building the max heap; the final heap is 80, 75, 67, 15, 32, 64, 57, 2.
 Fig. 1.43: Heap Tree (Max)
 Here, if a key in a child is greater than the key in the parent node, the keys in the
parent and child are interchanged.
 Iteration 1: Swap the root 80 with the last node 2, move 80 to the sorted portion,
 and rebuild the heap.
 Iteration 2: Swap the root 75 with the last node and rebuild the heap:
 67 15 64 2 32 57 | 75 80
 Iteration 3: Swap the root 67 with the last node and rebuild the heap:
 64 15 57 2 32 | 67 75 80
 Iteration 4: Swap the root 64 with the last node and rebuild the heap:
 57 15 32 2 | 64 67 75 80
 Continuing to swap the root with the last node and rebuild until the heap is empty
 gives the sorted list:
 2 15 32 57 64 67 75 80
 Fig. 1.44: Tracing of Heap Sort
 Hence, we get the array sorted in ascending order.
 (Internal node 14 with children a 5 and b 9)
 Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single
 element each, and one heap node is the root of a tree with 3 elements.
 Character Frequency
 c 12
 d 13
 Internal Node 14
 e 16
 f 45
 Step 3 : Extract two minimum frequency nodes from heap. Add a new internal node
 with frequency 12 + 13 = 25.
 25
c 12 d 13
 Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single
 element each, and two heap nodes are roots of trees with more than one node.
 Character Frequency
 Internal Node 14
 e 16
 Internal Node 25
 f 45
 Step 4 : Extract two minimum frequency nodes. Add a new internal node with
 frequency 14 + 16 = 30.
 30
 e 16
 14
a 5 b 9
 Step 5 : Extract two minimum frequency nodes. Add a new internal node with
 frequency 25 + 30 = 55.
 The new internal node 55 has children 25 (with c 12 and d 13) and 30 (with the
 internal node 14 and e 16).
 Now the min heap contains 2 nodes: f 45 and the tree rooted at 55.
 Final tree with frequency 100 at the root: the 0 and 1 labels on the left and right
 branches give the codes. The left edge (0) leads to f 45; the right edge (1) leads to
 the node 55, whose 0-branch leads to 25 (c 12, d 13) and whose 1-branch leads to 30
 (the internal node 14 with a 5 and b 9, and e 16).
 if (root->right) {
 arr[top] = 1;
 printCodes(root->right, arr, top + 1);
 }
 /* If this is a leaf node, then it
    contains one of the input characters;
    print the character and its code
    from arr[] */
 if (isLeaf(root)) {
 printf("%c: ", root->data);
 printArr(arr, top);
 }
 }
 /* The main function that builds a */
 /* Huffman Tree and prints codes by */
 /* traversing the built Huffman Tree */
 void HuffmanCodes(char data[], int freq[], int size)
 {
 /* Construct Huffman Tree */
 struct MinHeapNode* root = buildHuffmanTree(data, freq, size);
 /* Print Huffman codes using */
 /* the Huffman tree built above */
 int arr[MAX_TREE_HT], top = 0;
 printCodes(root, arr, top);
 }
 /* Driver program to test above functions */
 int main()
 {
 char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
 int freq[] = { 5, 9, 12, 13, 16, 45 };
 int size = sizeof(arr) / sizeof(arr[0]);
 HuffmanCodes(arr, freq, size);
 return 0;
 }
Output:
 f: 0
 c: 100
 d: 101
 a: 1100
 b: 1101
 e: 111
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
 1. Which is widely used non-linear data structure?
 (a) Tree (b) Array
 (c) Queue (d) Stack
 2. Which in a tree data structure stores the actual data of that particular element and
 link to next element in hierarchical structure?
 (a) Root (b) Node
 (c) Child (d) Leaf
 3. Which is the operation in tree will remove a node and all of its descendants from
 the tree?
 (a) Prune (b) Graft
 (c) Insert (d) Delete
 4. The depth of the root node = .
 (a) 1 (b) 3
 (c) 0 (d) 4
 5. Which is a set of several trees that are not linked to each other?
 (a) Node (b) Forest
 (c) Leaf (d) Root
 6. In which tree, every node can have a maximum of two children, which are known
 as left child and right child.
 (a) Binary (b) Binary search
 (c) Strictly (d) Extended
 7. Which is data structure like a tree-based data structure that satisfies a property
 called heap property?
 (a) Tree (b) Graph
 (c) Heap (d) Stack
 8. How many roots does a tree contain?
 (a) 1 (b) 3
 (c) 0 (d) 4
 9. The total number of edges from root node to a particular node is called as,
 (a) Height (b) Path
 (c) Depth (d) Degree
 10. In which binary tree every node has either two or zero number of children?
 (a) Binary (b) Binary search
 (c) Strictly (d) Extended
 11. Which tree operation will return a list or some other collection containing every
 descendant of a particular node, including the root node itself?
 (a) Prune (b) Graft
 (c) Insert (d) Enumerate
 15. Consider the complete binary tree with nodes numbered 1 to 7, where 1 is the
 root, 2 and 3 are its children, and 4, 5, 6, 7 are the leaves.
 If the postorder traversal gives (ab–cd*+) then the label of the nodes 1, 2, 3, 4, 5, 6
 will be,
 (a) +, –, *, a, b, c, d (b) a, –, b, +, c, *, d
 (c) a, b, c, d, –, *, + (d) –, a, b, +, *, c, d
 16. Which of the following statement about binary tree is correct?
 (a) Every binary tree is either complete or full
 (b) Every complete binary tree is also a full binary tree
 (c) Every full binary tree is also a complete binary tree
 (d) A binary tree cannot be both complete and full
 17. Which type of traversal of binary search tree outputs the value in sorted order?
 (a) Preorder (b) Inorder
 (c) Postorder (d) None of these
 Answers
 1. (a) 2. (b) 3. (a) 4. (c) 5. (b) 6. (a) 7. (c)
 8. (a) 9. (c) 10.(c) 11. (d) 12. (c) 13. (a) 14. (b)
 15. (a) 16. (c) 17. (b)
Q. II Fill in the Blanks:
 1. A tree is a non-linear ______ data structure.
 2. There is only ______ root per tree and one path from the root node to any node.
 3. In tree data structure, every individual element is called as ______.
 4. Nodes which belong to ______ parent are called as siblings.
 5. The ______ operation will remove a specified node from the tree.
 6. Height of all leaf nodes is ______.
 7. A ______ tree is simply a tree with zero nodes.
 18. What are different tree traversal methods? Explain with example.
 19. What is binary search tree? How to implement it? Explain with example.
 20. Traverse following trees in:
 (i) Inorder (ii) Preorder (iii) Postorder
 Ans. Refer to Section 1.4.3.
 April 2017
 1. Write a ‘C’ function to compare two BST. [5 M]
 Ans. Refer to Section 1.4.1.2.
 2. Define skewed binary tree. [1 M]
 Ans. Refer to Section 1.2.2.
 October 2017
 1. What is siblings? [1 M]
 Ans. Refer to Section 1.1.3, Point (9).
 Ans. Refer to Section 1.4.3.
 CHAPTER
 2
 Efficient Search Trees
Objectives …
 To study AVL Trees with its Operations
 To learn Red Black Trees
 To understand B and B+ Tree with its Operations
 2.0 INTRODUCTION
• The efficiency of many operations on trees is related to the height of the tree - for
 example searching, inserting, and deleting.
• The efficiency of various operations on a Binary Search Tree (BST) decreases with
 increase in the differences between the heights of right sub tree and left sub tree of the
 root node.
• Hence, the differences between the heights of left sub tree and right sub tree should be
 kept to the minimum.
• Various operations performed on binary tree can lead to an unbalanced tree, in which
 either the height of left sub tree is much more than the height of right sub tree or vice
 versa.
• Such type of tree must be balanced using some techniques to achieve better efficiency
 level.
• The need to have balanced trees led to the emergence of another type of binary
 search tree known as the height-balanced tree (also known as the AVL tree), named
 after its inventors G. M. Adelson-Velsky and E. M. Landis.
• The AVL tree is a special kind of binary search tree, which satisfies the following two
 conditions:
 1. The heights of the left and right sub trees of a node differ by at most one.
 2. The left and right sub trees of a node (if exist) are also AVL trees.
 2.1 TERMINOLOGY
• In 1962, Adelson-Velsky and Landis introduced a binary tree structure that is balanced
 with respect to the height of subtree. That is why it is called a height balanced tree.
• The basic objective of the height balanced tree is to perform the searching, insertion
 and deletion operations efficiently.
• A balanced binary tree is a binary tree in which the heights of two sub trees (left and
 right) of every node never differ by more than 1.
Data Structures & Algorithms - II Efficient Search Trees
• A binary search tree is said to be a height balanced binary tree if all its nodes have a
 balance factor of 1, 0 or − 1 i.e.,
 |hL − hR | ≤ 1
 Where, hL and hR are heights of left and right subtrees respectively.
1. Balance Factor: [Oct. 17, April 19]
• The term Balance Factor (BF) is used to determine whether the given binary
 search tree is balanced or not.
• The BF of a node is calculated as the height of its left sub tree minus the height of
 its right sub tree i.e.,
 Balance factor = hL − hR
• The balance factor of a node in a binary tree can have value +1, −1 or 0 depending on
 whether the height of its left sub tree is greater than, equal to, or less than the height
 of its right subtree.
• The balance factor of each node is indicated in the Fig. 2.1.
 o If balance factor of any node is +1, it means that the left sub-tree is one level higher
 than the right sub-tree.
 o If balance factor of any node is 0, it means that the left sub-tree and right sub-tree
 contain equal height.
 o If balance factor of any node is −1, it means that the left sub-tree is one level lower
 than the right sub-tree.
 Fig. 2.1: Binary tree with the balance factor of each node (root L with subtrees rooted at E and T)
 (Height balanced tree: root 25; children 20 and 36; then 10, 22, 30, 40; leaves 12, 28, 38, 48; each node annotated with its balance factor.)
 Fig. 2.2
3. Red Black Tree:
• A red black tree is a variant of Binary Search Tree (BST) in which an additional
 attribute, ‘color’ is used for balancing the tree. The value of this attribute can be
 either red or black.
• The red black trees are self-balancing binary search tree. In this type of tree, the leaf
 nodes are the NULL/NIL child nodes storing no data.
• In addition to the conditions satisfied by a binary search tree, the following
 conditions/rules must also be satisfied for being a red black tree:
 (i) Each and every node is either red or black.
 (ii) The root node and leaf nodes are always black in color.
 (iii) If any node is red, then both its child nodes are black.
 (iv) Each and every path from a given node to a leaf node contains the same
      number of black nodes. The number of black nodes on such a path is known
      as the black-height of the node.
• Fig. 2.3 shows a red black tree.
 (Red black tree on keys 1 to 9; the NULL nodes at the leaves are black.)
 Fig. 2.3
• The AVL trees are more balanced compared to red black trees, but they may cause
 more rotations during insertion and deletion.
• So if the application involves many frequent insertions and deletions, then Red Black
 trees should be preferred.
• And if the insertions and deletions are less frequent and search is a more frequent
 operation, then AVL tree should be preferred over red black tree.
4. Splay Tree:
• Splay tree was invented by D. D. Sleator and R. E. Tarjan in 1985. According to them,
 the tree is called a splay tree (splay means to spread wide apart).
• Splay tree is another variant of a Binary Search Tree (BST). In a splay tree, recently
 accessed element is placed at the root of the tree.
• Splay Tree is a self-adjusted/self-balancing Binary Search Tree in which every
 operation on element rearranges the tree so that the element is placed at the root
 position of the tree.
• All normal operations on a binary search tree are combined with one basic operation,
 called splaying.
• Splaying the tree for a certain element rearranges the tree so that the element is
 placed at the root of the tree.
• Fig. 2.4 shows an example of splay tree.
 (Splay tree with root j; left subtree rooted at f and right subtree rooted at l.)
 Fig. 2.4
Rotations in Splay Tree:
• In zig rotation, every node moves one position to the right from its current position.
• In zag rotation, every node moves one position to the left from its current position.
• In zig-zig rotation, every node moves two positions to the right from its current
 position.
• In zag-zag rotation, every node moves two positions to the left from its current
 position.
• In zig-zag rotation, every node moves one position to the right followed by one
 position to the left from its current position.
• In zag-zig rotation, every node moves one position to the left followed by one position
 to the right from its current position.
• Trie is a tree-like data structure used to store a collection of strings. Trie is also
 called a Prefix Tree and sometimes a Digital Tree.
• The trie data structure makes retrieval of a string from a collection of strings easy
 and efficient.
• Fig. 2.6 shows an example of trie. Consider a list of strings Cat, Bat, Ball, Rat, Cap, Be.
 (Trie with root *; branches for B, C and R spell out the stored strings, and $ marks the end of a string.)
 Fig. 2.6
2.2.1 Concept
• AVL, named after inventors Adelson-Velsky and Landis, is a binary tree that self-
 balances by keeping a check on the balance factor of every node.
• The balance factor of a node is the difference in the heights of the left and right
 subtrees. The balance factor of every node in the AVL tree should be either +1, 0 or -1.
• AVL trees are special kind of binary search trees. In AVL trees, difference of heights of
 left subtree and right subtree of any node is less than or equal to one.
• AVL trees are also called self-balancing binary search trees. The node structure of
 an AVL tree is given below:
 struct AVLNode
 {
 int data;
 struct AVLNode *left, *right;
 int balfactor;
 };
• An AVL tree can be defined as follows: let T be a non-empty binary tree with TL and TR
 as its left and right subtrees. The tree is height balanced if:
 o TL and TR are height balanced.
 o |hL − hR| ≤ 1, where hL and hR are the heights of TL and TR.
 (AVL tree with root J; each node is annotated with its balance factor.)
 Fig. 2.7: AVL Tree with Balance Factors
• Balance factor of a node is the difference between the heights of the left and right
 subtrees of that node. Consider the following binary search tree in Fig. 2.8.
 (BST with root 90: the left subtree is 63 → 30 with children 21 and 35, and 10 as the
 child of 21; the right subtree is 200 with children 175 and 300; 175 → 180 → 184 with
 children 182 and 186; 300 has children 220 and 320.)
 Fig. 2.8: Binary Search Tree (BST)
 Height of tree with root 90: Height (90) = 1 + max (height (63), height (200))
 Height of (63) = 1 + height (30)
 Height (30) = 1 + max (height (21), height (35))
 Height (21) = 1 + height (10)
 Height (10) = 1
 Therefore,
 Height (21) = 1 + 1 = 2
 Height (30) = 1 + max (2, 1) = 1 + 2 = 3
 Height (63) = 1 + 3 = 4
 Height (200) = 1 + max (height (175), height (300))
 Height (175) = 1 + height (180)
 Height (180) = 1 + height (184)
 Height (184) = 1 + max (height (182), height (186))
 Height (182) = 1
 Height (186) = 1
 2.7
Data Structures & Algorithms - II Efficient Search Trees
 Height (184) = 1 + 1 = 2
 Height (180) = 1 + 2 = 3
 Height (175) = 1 + 3 = 4
 Height (200) = 1 + max. (height (175), height (300))
 = 1 + max (4, 2)
 = 1+4=5
 Height (90) = 1 + max (height (63), height (200))
 = 1 + max (4, 5)
 = 1+5=6
• Thus, this tree has height 6. But from this we do not get any information about the
 balance of heights. The tree is said to be balanced if the difference in the heights of
 the right subtree and left subtree of every node is not more than 1.
• Consider the above example in which all the leaf nodes have a balance factor of 0.
 BF (21) = hL (10) − 0 = 1 − 0 = 1
 BF (30) = hL − hR = 2 − 1 = 1
 BF (63) = 3 − 0 = 3
 BF (184) = 1 − 1 = 0
 BF (180) = 0 − 2 = − 2
 BF (175) = 0 − 3 = − 3
 BF (300) = 1 − 1 = 0
 BF (200) = 4 − 2 = 2
 BF (90) = 4 − 5 = − 1
• Hence, the above tree is not height balanced. In order to balance a tree, we have to
 perform rotations on the tree.
2.2.2 Rotations
• Rotation is the process of moving nodes either to left or to right to make the tree
 balanced. To balance itself, an AVL tree may perform the following four kinds of
 rotations:
 1. Left Rotation (LL Rotation)
 2. Right Rotation (RR Rotation)
 3. Left-Right Rotation (LR Rotation)
 4. Right-Left Rotation (RL Rotation)
• The first two rotations are single rotations and the next two rotations are double
 rotations.
Single Left Rotation (LL Rotation):
• In LL Rotation, every node moves one position to left from the current position.
• To understand LL Rotation, let us consider the following insertion operation in AVL
 tree.
 insert 1, 2 and 3
 Inserting 3 makes the balance factor of node 1 equal to −2. After LL rotation,
 2 becomes the root with children 1 and 3, and every balance factor becomes 0.
Fig. 2.9
• In RR Rotation, every node moves one position to right from the current position.
 insert 3, 2 and 1
 Inserting 1 makes the balance factor of node 3 equal to +2. After RR rotation,
 2 becomes the root with children 1 and 3, and every balance factor becomes 0.
Fig. 2.10
• The LR Rotation is a sequence of single left rotation followed by a single right rotation.
• In LR Rotation, at first, every node moves one position to the left and one position to
 right from the current position.
 insert 3, 1 and 2
 Inserting 2 makes the balance factor of node 3 equal to +2. After an LL rotation
 at 1 followed by an RR rotation at 3, node 2 becomes the root with children 1 and 3.
 insert 2
 Tree is balanced.
 insert 3
 The balance factor of node 1 becomes −2. After LL rotation, 2 becomes the root
 with children 1 and 3. Tree is balanced.
 insert 4
 Tree is balanced.
 insert 5
 The balance factor of node 3 becomes −2. After LL rotation at 3, node 4 takes its
 place with children 3 and 5.
 insert 6
 The balance factor of node 2 becomes −2. After LL rotation at 2, node 4 becomes
 the root with left child 2 (children 1 and 3) and right child 5 (right child 6).
 insert 7
 The balance factor of node 5 becomes −2. After LL rotation at 5, node 6 takes its
 place with children 5 and 7.
 insert 8
 Node 8 becomes the right child of 7. Tree is balanced.
Example: Delete the nodes 8, 12 and 14 from the AVL tree shown in Fig. 2.13.
 (AVL tree with root 15: children 12 and 54; below them 8, 13, 18 and 60; node 8 has
 children 5 and 9, and node 13 has right child 14; the balance factor of each node is
 shown in brackets.)
 Fig. 2.13
 Step 1:
 • The node to be deleted from the tree is 8.
 • If we observe it is the parent node of the node 5 and 9.
 • Since the node 8 has two children it can be replaced by either of its child nodes.
 Fig. 2.14
 Step 2:
 • The node 8 is deleted from the tree.
 • As the node is deleted we replace it with either of its children nodes.
 • Here we replaced the node with the inorder successor, i.e. 9.
 • Again we check the balance factor for each node.
 Fig. 2.15
 Step 3:
 • Now the next element to be deleted is 12.
 • If we observe, we can see that the node 12 has a left subtree and a right subtree.
 • We can again replace the node by either its inorder successor or inorder
   predecessor.
 • In this case we have replaced it by the inorder successor.
Fig. 2.16
 Step 4:
 • The node 12 is deleted from the tree.
 • Since we have replaced the node with the inorder successor, the tree structure
 looks like shown in the Fig. 2.17.
 • After removal and replacing check for the balance factor of each node of the tree.
Fig. 2.17
 Step 5:
 • The next node to be eliminated is 14.
 • It can be seen clearly in the image that 14 is a leaf node.
 • Thus it can be eliminated easily from the tree.
 Fig. 2.18
 Step 6:
 • As the node 14 is deleted, we check the balance factor of all the nodes.
 • We can see the balance factor of the node 13 is 2.
 • This violates the terms of the AVL tree thus we need to balance it using the rotation
 mechanism.
 Fig. 2.19
 Step 7:
 • In order to balance the tree, we identify the rotation mechanism to be applied.
 • Here, we need to use LL Rotation.
 • The nodes involved in the rotation is shown in Fig. 2.20.
 Fig. 2.20
 Step 8:
 • The nodes are rotated and the tree satisfies the conditions of an AVL tree.
 • The final structure of the tree is shown as follows.
 • We can see all the nodes have their balance factor as ‘0’, ‘1’ and ‘-1’.
 (Final tree: root 15 with children 9 and 54; node 9 has children 5 and 13, and node 54
 has children 18 and 60.)
Fig. 2.21
Example: Construct an AVL tree by inserting the names Manisha, Kamal, Archana,
Reena, Nidhi and Shalaka in the given order.
 (i) Manisha and (ii) Kamal: tree is balanced.
 (iii) Archana: the balance factor of Manisha becomes 2. After LL rotation, Kamal
 becomes the root with children Archana and Manisha.
 (iv) Reena: no balancing is required.
 (v) Nidhi: the balance factor of Manisha becomes −2. After RL rotation, Nidhi takes
 its place, and the children of the root Kamal are Archana and Nidhi.
 (vi) Shalaka: the balance factor of Kamal becomes −2. After RR rotation, Nidhi
 becomes the root with children Kamal and Reena, and Shalaka becomes the right
 child of Reena.
Example: Construct an AVL tree by inserting the days Mon, Sun, Thur, Fri, Sat, Wed
and Tue in the given order.
 (i) Mon and (ii) Sun: tree is balanced.
 (iii) Thur: the balance factor of Mon becomes −2. After RR rotation, Sun becomes
 the root with children Mon and Thur.
 (iv) Fri: no balancing is required.
 (v) Sat and (vi) Wed: no balancing is required.
 (vii) Tue: the balance factor of Thur becomes −2. After RL rotation, Tue takes the
 place of Thur with children Thur and Wed.
Example: Construct an AVL tree by inserting 40, 20, 50, 80, 100, 200 and 150 in the
given order.
 (i)-(v) Inserting 40, 20, 50, 80 and 100 requires no balancing.
 (vi) 200: the balance factor of 40 becomes −2. After RL rotation, 80 becomes the
 root with children 40 (children 20 and 50) and 100 (right child 200).
 (vii) 150: inserted as the left child of 200; the tree is rebalanced in the same way
 wherever a balance factor reaches ±2.
Example: Construct an AVL tree by inserting the colors Blue, Red, Green, Orange,
Pink and Black, rebalancing wherever a balance factor reaches ±2.
 (vi) After the final insertion, no balancing is required: the tree has root Green
 with children Blue (left child Black) and Pink (children Orange and Red).
 insert (8)
 Tree is Empty. So insert new node as Root node with black color.
 8
 insert (18)
 Tree is not Empty. So insert new node with red color.
 8
 18
 insert (5)
 Tree is not Empty. So insert new node with red color.
 8
 5 18
 insert ( 15 )
 Tree is not Empty. So insert new node with red color.
 8
 Here there are two consecutive Red nodes (18 and 15).
 5 18 The new node's parent sibling color is Red
 and parent's parent is root node.
 15 So we use RECOLOR to make it Red Black Tree.
After RECOLOR, the tree satisfies the Red Black Tree properties.
 insert (17)
 Tree is not Empty. So insert new node with red color; 17 becomes the right child
 of 15.
 Here there are two consecutive Red nodes (15 and 17).
 The new node's parent sibling is NULL. So we need rotation.
 Here, we need LR Rotation and Recolor.
 After LR Rotation and Recolor:
 8
 5 17
 15 18
 insert ( 25 )
 Tree is not Empty. So insert new node with red color.
 8
 Here there are two consecutive
 5 17 Red nodes (18 and 25),
 The new node's parent sibling color is Red
 18 and parent's parent is not root node.
 15
 So we use RECOLOR and Recheck.
 25
 After Recolor
 8
 After Recolor operation, the tree is satisfying
 5 17 all Red Black Tree properties.
15 18
25
 insert (40)
 Tree is not Empty. So insert new node with red color.
 8
 Here there are two consecutive
 5 17 Red nodes (25 and 40).
 The new node's parent sibling is NULL
 15 18 So we need a Rotation and Recolor.
 Here, we use LL Rotation and Recheck
 25
 After LL Rotation 40
 and Recolor
 5 17
 After LL Rotation and Recolor operation,
 the tree is satisfying all Red Black Tree properties.
 15 25
18 40
 Insert (80)
 Tree is not Empty. So insert new node with red color.
 8
 Here there are two consecutive
 5 17 Red nodes (40 and 80). The new node's
 parent sibling color is Red
 15 25 and parent's parent is not root node.
 So we use RECOLOR and Recheck.
 18 40
 80
 After Recolor
 8
 After Recolor again there are two consecutive
 5 17 Red nodes (17 and 25). The new node's
 parent sibling color is Black. So we need
 15 25 Rotation. We use Left Rotation and Recolor.
18 40
80
17
8 25
5 15 18 40
80
 Finally above tree is satisfying all the properties of Red Black Tree and
 it is a perfect Red Black tree.
• The deletion operation in red black tree is similar to the deletion operation in BST,
 but after every deletion operation, we need to check the red black tree properties.
• If any of the properties are violated, then we make suitable operations like recolor
 and rotation to restore them.
• In this example, we show the red black trees that result from the successive deletion of
 Delete 8
 B B
 38 38
 R B R B
 19 41 19 41
 R B
 R B
 12 31 12 31
 8
 Delete 12
 B B
 38 38
 R B B B
 Case-2
 19 41 19 41
 B
 B R
 12 31 31
 Delete 19
 B B
 38 38
 B B R B
 19 41 31 41
 R
 31
 Delete 31
 B B
 38 38
 B R
 Case-2
 31 41 41
 Delete 38 B
 B
 38
 41
 R
 41
 Delete 41
 No/Empty Tree.
• A multiway tree can have more than one value/child per node. They are written as m-
 way trees where the m means the order of the tree. A multiway tree can have m-1
 values per node and m children.
• An m-way tree is a search tree in which each node can have from 0 to m subtrees,
 where m is defined as the B-tree order.
• Fig. 2.23 shows an example of m-way tree of order 4.
 50 100 150
60 70 90 110 120
75
17 23 42 68 85 96
16 21 57 78 91
11 17 19 21 23 24 45 63 65 85 90 95
2.4.1 B-Tree
• In search trees like binary search tree, AVL tree, red-black tree, etc., every node
 contains only one value (key) and a maximum of two children.
• But there is a special type of search tree called B-tree in which a node contains more
 than one value (key) and more than two children.
• B-Tree was developed in the year 1972 by Bayer and McCreight with the name Height
 Balanced m-way Search Tree. Later it was named as B-Tree.
• A B-tree is a specialized m-way tree that is widely used for disk access. A B tree of
 order m can have a maximum of m–1 keys and m pointers to its sub-trees.
• B-tree is a type of tree in which each node can store multiple values and can point to
 multiple subtrees.
• The B-trees are useful in case of very large data that cannot be accommodated in the
 main memory of the computer and is stored on disks in the form of files.
• The B-tree is a self-balancing search tree like AVL and red-black trees. The main
 objective of B-trees is to minimize/reduce the number of disk accesses for accessing a
 record.
• Every B-tree has an order. B-tree of order m has the following properties:
 1. All leaf nodes must be at same level.
 2. All nodes except root must have at least ⌈m/2⌉ − 1 keys and a maximum of m − 1 keys.
 3. All non-leaf nodes except root (i.e. all internal nodes) must have at
    least ⌈m/2⌉ children.
 4. If the root node is a non-leaf node, then it must have at least two children.
 5. A non-leaf node with n-1 keys must have n number of children.
 6. All the key values in a node must be in Ascending Order.
• Fig. 2.26 shows an example of B-tree of order 5.
 Root: [66]; internal nodes: [35 45] and [80 100]; leaves: [20 30], [38 40], [48 52 56],
 [68 70 73 77], [85 90], [110 115]
• Now, while inserting the element in the searched leaf node following one of the two
 cases may arise:
 1. There may be a space in the leaf node to accommodate more elements. In that
 case, the element is simply added to the node in such a way that the order of
 elements is maintained.
 2. There may not be a space in the leaf node to accommodate more elements. In that
 case, after inserting new element in the full leaf node, a single middle element is
 selected from these elements and is shifted to the parent node and the leaf is split
 into two nodes namely, left node and right node (at the same level).
• All the elements less than the middle element are placed in the left node and all the
 elements greater than the middle element are placed in the right node.
• If there is no space for middle element in the parent node, the splitting of the parent
 node takes place using the same procedure.
• The process of splitting may be repeated all the way to the root. In case the splitting of
 root node takes place, a new root node is created that comprises the middle element
 from the old root node.
• The rest of the elements of the old root node are distributed in two nodes created as a
 result of splitting. The splitting of root node increases the height of B-tree by one.
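Case 2 above, the node split, can be sketched in isolation. The following is a hedged illustration for a B-tree of order 4 (at most 3 keys per node): a full, sorted key array is divided around its median, which is passed up to the parent. The helper name split_keys is illustrative, not from the text.

```c
#define MAXK 3   /* order-4 B-tree: at most 3 keys per node */

/* Split a full, sorted key array around its median.  The keys below the
   median go to the left node, those above it to the right node, and the
   median itself is returned so it can move up to the parent. */
int split_keys(const int keys[MAXK], int left[], int *nl, int right[], int *nr)
{
    int mid = MAXK / 2, i;
    *nl = *nr = 0;
    for (i = 0; i < mid; i++)
        left[(*nl)++] = keys[i];
    for (i = mid + 1; i < MAXK; i++)
        right[(*nr)++] = keys[i];
    return keys[mid];
}
```

Splitting the full root {40, 66, 90} of the running example returns 66 for the new root, with 40 in the left node and 90 in the right node.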
• For example, consider the following step-by-step procedure for inserting elements in
 the B-tree of order 4, i.e. any node can store at most 3 elements and can point to at
 most 4 subtrees.
• The elements to be inserted in the B-tree are 66, 90, 40, 75, 30, 35, 80, 70, 20, 50, 45, 55,
 110, 100, and 120.
• The element 66 forms the part of new root node, as B tree is empty initially
 [Fig. 2.27 (a)].
• Since, each node of B tree can store up to 3 elements, the elements 90 and 40 also
 become part of the root node [Fig. 2.27 (b)].
• Now, since the root node is full, it is split into two nodes. The left node stores 40, the
 right node stores 90, and middle element 66 becomes the new root node. Since, 75 is
 less than 90 and greater than 66, it is placed before 90 in the right node [Fig. 2.27 (c)].
• The elements 30 and 35 are inserted in left sub tree and the element 80 is inserted in
 the right sub tree such that the order of elements is maintained [Fig. 2.27 (d)].
• The appropriate position for the element 70 is in the right sub tree, and since there is
 no space for more elements, the splitting of this node takes place.
• As a result, the middle element 80 is moved to the parent node, the element 75 forms
 the part of the left sub tree (of element 80) and the element 90 forms the part of the
 right sub tree (of element 80). The new element 70 is placed before the element 75
 [Fig. 2.27 (e)].
• The appropriate position for the element 20 is in the left most sub tree, and since
 there is no space for more elements, the splitting of this node takes place as discussed
 in the previous step. The new element 20 is placed before the element 30
 [Fig. 2.27 (f)].
Data Structures & Algorithms - II Efficient Search Trees
• This tree can be used for future insertions, but a situation may arise when any of the
 sub trees splits and it will be required to adjust the middle element from that sub tree
 to the root node where there is no space for more elements.
• Hence, keeping in mind the future requirements, as soon as root node becomes full,
 splitting of root node must take place [Fig. 2.27 (g)]. This splitting of root node
 increases the height of tree by one.
• Similarly, other elements 50, 45, 55, 110, 100, and 120 can be inserted in this B-tree.
 The resultant B-tree is shown in Fig. 2.27 (h).
 [Fig. 2.27 (a): 66]
 [Fig. 2.27 (b): 40 66 90]
 [Fig. 2.27 (c): root 66; children 40 | 75 90]
 [Fig. 2.27 (d): root 66; children 30 35 40 | 75 80 90]
 [Fig. 2.27 (e): root 66 80; children 30 35 40 | 70 75 | 90]
 [Fig. 2.27 (f): root 35 66 80; children 20 30 | 40 | 70 75 | 90]
 [Fig. 2.27 (g): root 66; children 35 | 80; leaves 20 30 | 40 | 70 75 | 90]
 [Fig. 2.27 (h): root 66; children 35 45 | 80 100; leaves 20 30 | 40 | 50 55 | 70 75 | 90 | 110 120]
 Fig. 2.27: Insertion Operation in B-Tree
Deletion Operation on B-Tree:
• Deletion of an element from a B-tree involves following two steps:
 1. Searching the desired element and
 2. Deleting the element.
• Whenever, an element is deleted from a B-tree, it must be ensured that no property of
 B-tree is violated after the deletion.
• The element to be deleted may belong to either leaf node or internal node.
• Consider a B-tree in Fig. 2.28.
 [Tree: root 13; children 4 7 | 17 20; leaves 1 3 | 5 6 | 8 11 12 | 14 16 | 18 19 | 23 24 25 26]
 Fig. 2.28
• If we want to delete 8 then it is very simple (See Fig. 2.29).
 [Tree: root 13; children 4 7 | 17 20; leaves 1 3 | 5 6 | 11 12 | 14 16 | 18 19 | 23 24 25 26]
 Fig. 2.29
• Next, we will delete 20. Since 20 is not in a leaf node, we find its in-order successor, which is 23. Hence, 23 is moved up to replace 20 (See Fig. 2.30).
 [Tree: root 13; children 4 7 | 17 23; leaves 1 3 | 5 6 | 11 12 | 14 16 | 18 19 | 24 25 26]
 Fig. 2.30
• Next we will delete 18. Deleting 18 from the corresponding node leaves that node with only one key, which is not allowed in a B-tree of order 5.
• The sibling node to the immediate right has an extra key. In such a case, we can borrow a key from the parent and move the sibling's spare key up to the parent (See Fig. 2.31).
 [Tree: root 13; children 4 7 | 17 24; leaves 1 3 | 5 6 | 11 12 | 14 16 | 19 23 | 25 26]
 Fig. 2.31
• Now delete 5. The deletion of 5 is not as easy. First, 5 is in a leaf node. Secondly, this leaf node has no extra keys, and neither do its immediate left or right siblings.
• In such a situation we can combine this node with one of its siblings: remove 5 and combine the remaining key 6 with the node containing 1, 3.
• To keep the tree balanced we must move the parent's key down. Hence, we move 4 down, as 4 lies between 1, 3 and 6. The tree will look as shown in Fig. 2.32.
 [Tree: root 13; children 7 | 17 24; leaves 1 3 4 6 | 11 12 | 14 16 | 19 23 | 25 26]
 Fig. 2.32
• But again, the internal node containing only 7 has a single key, which is not allowed in a B-tree of order 5. We then try to borrow a key from a sibling.
 2.32
Data Structures & Algorithms - II Efficient Search Trees
• But the sibling 17, 24 has no spare key. Hence, we combine 7 with 13 and with 17, 24. The resulting B-tree, whose height has decreased by one, is shown in Fig. 2.33.
 [Tree: single index node 7 13 17 24; leaves 1 3 4 6 | 11 12 | 14 16 | 19 23 | 25 26]
Fig. 2.33
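The borrow-or-merge decision used in this example can be condensed into a small helper. This is an illustrative sketch, assuming the order m and the sibling's current key count are known; underflow_fix is a hypothetical name, not from the text.

```c
/* Possible fixes for a node after a deletion in a B-tree of order m. */
enum fix { OK, BORROW, MERGE };

/* If the node still has enough keys, nothing to do.  Otherwise borrow a
   key through the parent when a sibling has one to spare, else merge
   with the sibling and pull the separating parent key down. */
enum fix underflow_fix(int m, int nkeys, int sibling_keys)
{
    int min = (m + 1) / 2 - 1;   /* minimum keys: ceil(m/2) - 1 */
    if (nkeys >= min)
        return OK;
    return (sibling_keys > min) ? BORROW : MERGE;
}
```

For the order-5 tree above, deleting 18 leaves one key while the right sibling holds four, so the helper reports BORROW; deleting 5 leaves one key with a two-key sibling, so it reports MERGE.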
2.4.2 B+ Tree
• The B+ (B Plus) Tree is a balanced, multi-way search tree. B+ Trees are an extended version of B-Trees.
• A B+ tree is an m-ary tree with a variable but often large number of children per node.
 A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or
 a node with two or more children.
• A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–
 value pairs), and to which an additional level is added at the bottom with linked
 leaves.
• The B+ tree is a variation of B-tree in a sense that unlike B-tree, it includes all the key
 values and record pointers only in the leaf nodes, and key values are duplicated in
 internal nodes for defining the paths that can be used for searching purposes.
• In addition, each leaf node contains a pointer, which points to the next leaf node, that
 is, leaf nodes are linked sequentially (See Fig. 2.34).
• Hence, B+ tree supports fast sequential access of records in addition to the random
 access feature of B-tree.
• Note that in the case of B+ trees, even if the key corresponding to the desired record is found in an internal node, the traversal continues until the respective leaf node is reached to access the appropriate record pointer.
• Fig. 2.34 shows a sample B+ tree.
 [Fig. 2.34: B+ tree — root 40; internal nodes 30 40 | 77 90 hold duplicated key values; linked leaves 20 30 | 38 40 | 48 52 56 | 68 70 73 77 | 85 90 | 110 115. Only leaf nodes hold record pointers with the key values.]
 Fig. 2.34
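The linked leaf level that gives the B+ tree its fast sequential access can be sketched as follows. This is an assumed layout in which each leaf holds its keys and a pointer to the next leaf; Leaf and scan_all are illustrative names, and real leaves would also carry record pointers.

```c
#include <stddef.h>

/* A B+ tree leaf: keys plus a link to the next leaf in key order. */
struct Leaf {
    int keys[4];
    int nkeys;
    struct Leaf *next;   /* sequential link: the feature plain B-trees lack */
};

/* Visit every key in ascending order by walking the leaf chain only;
   no descent from the root is needed.  Here we simply sum the keys. */
int scan_all(const struct Leaf *first)
{
    int total = 0;
    for (; first != NULL; first = first->next)
        for (int i = 0; i < first->nkeys; i++)
            total += first->keys[i];
    return total;
}
```

A range query first descends to the leaf holding the lower bound, then follows the next pointers, which is exactly the sequential access described above.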
Insertion in B+ Tree:
 [Fig. 2.35: B+ tree before the insertion of 195]
 Fig. 2.35
• The value 195 will be inserted in the right sub-tree of 120 after 190. Insert it at the
 desired position (See Fig. 2.36).
 [Fig. 2.36: 195 inserted into the leaf after 190; index node 60 78 108 120]
 Fig. 2.36
• The leaf node now contains more than the maximum number of elements, i.e. 4; therefore, split it and move the median key up to the parent.
 [Fig. 2.37: the overfull leaf is split and its median moves up; index node 60 78 108 120 190]
 Fig. 2.37
• Now, the index node contains 6 children and 5 keys, which violates the B+ tree properties; therefore, we need to split it, as shown in Fig. 2.38.
 [Fig. 2.38: the index node is split around 108; children 60 78 | 120 190]
 Fig. 2.38
Deletion in B+ Tree:
• Delete the key and data from the leaves.
• Delete the key 200 from the B+ tree shown in the Fig. 2.39.
 [Fig. 2.39: B+ tree with root 108; index nodes 60 78 | 120 190]
 Fig. 2.39
• The value 200 is present in the right sub-tree of 190, after 195, delete it.
 [Fig. 2.40: the tree after deleting 200 from its leaf; root 108; index nodes 60 78 | 120 190]
 Fig. 2.40
• Merge the two nodes by using 195, 190, 154 and 129.
 [Fig. 2.41: leaves merged using 195, 190, 154 and 129; root 108; index nodes 60 78 | 120]
 Fig. 2.41
• Now, the element 120 is the only element present in its node, which violates the B+ Tree properties. Therefore, we need to merge it by using 60, 78, 108 and 120.
• Now, the height of B+ tree will be decreased by 1.
 60 78 108 120
Fig. 2.42
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
 1. Which factor of many tree operations is related to the height of the tree?
 (a) Efficiency (b) Degree
 (c) Sibling (d) Path
 14. A B-tree of order 4 and of height 3 will have a maximum of _______ keys.
 (a) 255 (b) 63
 (c) 127 (d) 188
 15. Consider the below formations of red-black tree:
 [Figure: three colorings of the chain 100 → 50 → 18, each with NULL children]
 All the above formations are incorrect for a red-black tree. Then what may be the correct order?
 (a) 50-black root, 18-red left subtree, 100-red right subtree
 (b) 50-red root, 18-red left subtree, 100-red right subtree
 (c) 50-black root, 18-black left subtree, 100-red right subtree
 (d) 50-black root, 18-red left subtree, 100-black right subtree
 16. What is the special property of red-black trees, and what should the root always be?
 (a) a color which is either red or black and root should always be black color
 (b) height of the tree
 (c) pointer to next node
 (d) a color which is either green or black
 Answers
 1. (a) 2. (d) 3. (c) 4. (c) 5. (b) 6. (a) 7. (a)
 8. (d) 9. (c) 10.(a) 11. (b) 12. (d) 13. (b) 14. (a)
 15. (a) 16. (a)
 CHAPTER
 3
 Graph
Objectives …
 To study Basic Concepts of Graph Data Structure
 To learn Graph Terminology
 To understand Representation and Operations of Graph
 To study Graph Traversals
 To learn Greedy Strategy, Dynamic Programming Strategy of Graphs
 3.0 INTRODUCTION
• A graph is one of the most important non-linear data structures. A graph is a pictorial representation of a set of objects, where some pairs of objects (vertices) are connected by links (edges).
• A graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
 connecting the pairs of vertices.
3.1.1 Concept
• A graph G is a set of two tuples G = (V, E) where, V is a finite non-empty set of vertices,
 and E is the set of pairs of vertices called edges.
• Fig. 3.1 shows an example of graph.
 [Fig. 3.1: a graph with vertices A, B, C, D]
Data Structures & Algorithms - II Graph
• There are two types of Graph, Undirected graph and Directed graph.
1. Undirected Graph:
• In an undirected graph, the pair of vertices representing any edge is unordered i.e. the
 pairs (v1, v2) and (v2, v1) represent the same edge.
• In other words, the edges have no direction in undirected graph.
Example: Consider the Fig. 3.2.
 [Fig. 3.2 (a), (b), (c): Undirected Graphs]
 Fig. 3.2: Undirected Graphs
2. Directed Graph:
• In a directed graph (digraph), each edge is an ordered pair of vertices, i.e. the edge (v1, v2) is directed from v1 to v2 and is different from the edge (v2, v1).
Example: Consider the Fig. 3.3.
 [Fig. 3.3 (a), (b), (c): Directed Graphs]
 Fig. 3.3: Directed Graphs
• In Fig. 3.3 (a) G = (V, E) where V = {A, B, C, D, E} and E = {(A, B), (A, C), (C, D),
 (A, E), (E, D)}.
• In Fig. 3.3 (b) G = (V, E) where V = {A, B, C, D} and E = {(A, B), (A, C), (C, D), (D, B)}.
• In Fig. 3.3 (c) G = (V, E) where V = {A, B, C, D} and E = {(A, B), (A, C), (B, D), (B, E), (C, F)}.
3.1.2 Terminology
• Basic graph terminology is listed below:
 1. Adjacent Vertex: When there is an edge from one vertex to another, these vertices are called adjacent vertices. Node 1 is adjacent to node 2, as there exists an edge from node 1 to node 2, as shown in Fig. 3.4.
 [Fig. 3.4: graph with adjacent vertices]
 Fig. 3.4
 2. Cycle: A path from a vertex to itself is called a cycle. Thus, a cycle is a path in
 which the initial and last vertices are same.
 Example: Fig. 3.5 shows the path (A, B, C, A) or (A, C, D, B, A) are cycles of different
 lengths. If a graph contains a cycle it is cyclic otherwise acyclic.
 [Fig. 3.5: graph on vertices A, B, C, D containing the cycles (A, B, C, A) and (A, C, D, B, A)]
 [Fig. 3.7: Connected Graph]
 [Fig. 3.8: Non-connected Graph]
 5. Degree of a Vertex: It is the number of edges incident to a vertex. It is written as degree(V), where V is a vertex.
 6. In-degree of a Vertex: In directed graph, in-degree of a vertex ‘v’ is the number of
 edges for which ‘v’ is the head.
 [Fig. 3.9: Digraph with vertices A, B, C, D, E, F]
 Indegree of vertex A = 1, B = 1, C = 2, D = 2, E = 1, F = 1
 Fig. 3.9: Digraph
 7. Out-degree of a Vertex: In directed graph, the out-degree of a vertex ‘v’ is the total
 number of edges for which ‘v’ is the tail.
 From Fig. 3.9:
 Outdegree of A = 1
 Outdegree of B = 2
 Outdegree of C = 2
 Outdegree of D = 0
 Outdegree of E = 2
 Outdegree of F = 1
 8. Isolated Vertex: If any vertex does not belong to any edge then it is called isolated
 vertex.
 [Figure: graph with isolated vertex D]
 [Figure: (a) Graph G, (b) Subgraph of G]
 Fig. 3.12
 13. Weighted Graph: A weighted graph is a graph in which every edge is assigned a weight. In Fig. 3.13, the weights denote the distance between the two vertices connected by the corresponding edge. The weight of an edge is also called its cost. In the case of a weighted graph, an edge is a 3-tuple (U, V, W), where U and V are vertices and W is the weight of the edge (U, V).
 [Fig. 3.13: weighted graphs on vertices A, B, C, D and 1, 2, 3, 4, with each edge labelled by its weight]
3.2.1 Sequential Representation of Graph [April 16, 17, 18, 19 Oct. 17, 18]
• Graphs can be represented through matrix (array) in computer system's memory. This
 is sequential in nature. This type of representation is called sequential representation
 of graphs.
• The graph when represented using sequential representation using matrix, is
 commonly known as Adjacency Matrix.
• Let G = (V, E) be a graph with n vertices, where n >= 1. An adjacency matrix of G is a 2-dimensional n × n array, say A, with the following property:
 A[i][j] = 1 if the edge (vi, vj) is in E(G)
 = 0 if there is no such edge in G
• If the graph is undirected then,
 A[i] [j] = A[j] [i] = 1
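The property above translates directly into code. The sketch below fills an n × n matrix from an edge list for an undirected graph, assuming vertices are numbered 0 to N − 1; build_matrix is an illustrative name, not from the text.

```c
#define N 4   /* number of vertices */

/* Build the adjacency matrix of an undirected graph from an edge list:
   A[i][j] = A[j][i] = 1 for every edge (i, j), 0 elsewhere. */
void build_matrix(int a[N][N], int edges[][2], int ne)
{
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    for (i = 0; i < ne; i++) {
        int u = edges[i][0], v = edges[i][1];
        a[u][v] = 1;
        a[v][u] = 1;   /* mirror entry, since the graph is undirected */
    }
}
```

Every undirected edge therefore appears twice in the matrix, which is why the matrix of an undirected graph is always symmetric.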
 [Fig. 3.16: G1 — complete undirected graph on vertices 1 to 4; G2 — undirected graph on vertices 1 to 7; G3 — directed graph on vertices 1, 2, 3]
 Fig. 3.16: Undirected Graph G1, G2 and Directed Graph G3
• The graphs G1, G2 and G3 of Fig. 3.16 are represented using adjacency matrix in
 Fig. 3.17.
 1 2 3 4 1 2 3
 1 0 1 1 1 1 0 1 0
 2 1 0 1 1 2 1 0 1
 3 1 1 0 1 3 0 0 0
 4 1 1 1 0
 G1 G3
 1 2 3 4 5 6 7
 1 0 1 1 1 0 0 1
 2 1 0 1 1 1 0 0
 3 1 1 0 0 0 1 1
 4 1 1 0 0 1 0 0
 5 0 1 0 1 0 0 0
 6 0 0 1 0 0 0 1
 7 1 0 1 0 0 1 0
 G2
 Fig. 3.17 (a): Adjacency Matrix for G1, G2 and G3 of Fig. 3.16
• Adjacency Matrix Representation of a Weighted Graph:
 For a weighted graph, the matrix A is represented as,
 A[i][j] = weight of the edge (i, j)
 = 0 otherwise
 Here, the weight is the label associated with the edge.
 Example: Following is a weighted graph and its associated adjacency matrix.
 [Fig. 3.17 (b): weighted directed graph on vertices 1 to 5 and its adjacency matrix]
     1   2   3   4   5
 1   0  15  12  19   0
 2  10   0  14   0   0
 3   0   0   0   0   9
 4   0   0   0   0   0
 5   0   0   6   8   0
 Fig. 3.17 (b)
 Example 1: Consider the following undirected graph and provide its adjacency matrix.
The graph has 4 vertices and is undirected. Write 1 to the number of vertices, i.e. 1 to 4, as row and column headers of the adjacency matrix. If an edge exists between any two nodes (row and column headers indicate nodes), write 1; otherwise write 0 in the matrix.
 [Graph: vertices 1 to 4 with edges 1-2, 1-3, 1-4, 2-3, 3-4]
     1   2   3   4
 1   0   1   1   1
 2   1   0   1   0
 3   1   1   0   1
 4   1   0   1   0
 (a) Undirected Graph   (b) Adjacency Matrix
 Fig. 3.18
 for(j=0;j<n;j++)
 {
 printf("Enter Edge from %d to %d,(1: Edge 0: No edge) \n",i,j);
 scanf("%d",&adj[i][j]);
 }
 for(i=0;i<n;i++)
 {
 degree(adj,i,n);
 }
 return 0;
 }
Output:
 Enter the total number of nodes in graph4
 Enter Edge from 0 to 0,(1: Edge 0: No edge)
 0
 Enter Edge from 0 to 1,(1: Edge 0: No edge)
 1
 Enter Edge from 0 to 2,(1: Edge 0: No edge)
 0
 Enter Edge from 0 to 3,(1: Edge 0: No edge)
 1
 Enter Edge from 1 to 0,(1: Edge 0: No edge)
 1
 Enter Edge from 1 to 1,(1: Edge 0: No edge)
 0
 Enter Edge from 1 to 2,(1: Edge 0: No edge)
 1
 Enter Edge from 1 to 3,(1: Edge 0: No edge)
 1
 Enter Edge from 2 to 0,(1: Edge 0: No edge)
 0
 Enter Edge from 2 to 1,(1: Edge 0: No edge)
 1
 Enter Edge from 2 to 2,(1: Edge 0: No edge)
 0
 Enter Edge from 2 to 3,(1: Edge 0: No edge)
 1
 Enter Edge from 3 to 0,(1: Edge 0: No edge)
 1
 Enter Edge from 3 to 1,(1: Edge 0: No edge)
 1
 Enter Edge from 3 to 2,(1: Edge 0: No edge)
 1
 Enter Edge from 3 to 3,(1: Edge 0: No edge)
 0
3.2.2 Linked Representation of Graphs [April 16, 17, 18, 19 Oct. 17, 18]
• We use the adjacency list for the linked representation of the graph. In the adjacency list representation, the n rows of the adjacency matrix are represented as n linked lists.
• The adjacency list representation maintains each node of the graph and a link to the
 nodes that are adjacent to this node. When we traverse all the adjacent nodes, we set
 the next pointer to null at the end of the list.
• Example: Consider the graph G. The Fig. 3.20 shows the graph G and linked
 representation of G in memory. The linked representation will contain two lists:
 1. A node vertex list: To keep the track of all the N nodes of the graph.
 2. An edge list: To keep the information of adjacent vertices of each and every vertex
 of a graph.
• Head node is used to represent each list, i.e. we can represent G by an array Head,
 where Head [i] is a pointer to the adjacency list of vertex i.
 [Fig. 3.20/3.21: a directed graph G and its linked representation, e.g. 2 → 3 → 4 → NULL, 3 → 4 → NULL, 5 → 4 → NULL]
 Adjacency list of the undirected graph G1:
 1 → 2 → 3 → 4 → NULL
 2 → 1 → 3 → 4 → NULL
 3 → 1 → 2 → 4 → NULL
 4 → 1 → 2 → 3 → NULL
 Fig. 3.22: Adjacency List Representation of G1 (Undirected Graph)
• The structure for adjacency list representation can be defined in C as follows:
 #define MAX_V 20
 struct node
 {
     int vertex;
     struct node *next;
 };
 struct node head[MAX_V];
• In this representation, every edge (vi, vj) of an undirected graph is represented twice: once in the list of vi and once in the list of vj.
• For directed graph time required is O(n) to determine whether, there is an edge from
 vertex i to vertex j and for undirected graph time is O(n + e).
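Building such a list can be sketched with a prepend operation. This is a variant of the declaration above that keeps an array of head pointers instead of head nodes; add_edge and degree are illustrative helpers, not from the text.

```c
#include <stdlib.h>

struct node {
    int vertex;
    struct node *next;
};

struct node *head[20];   /* head[v] points to the list of vertices adjacent to v */

/* Prepend w to the adjacency list of v.  For an undirected edge (v, w),
   call add_edge(v, w) and add_edge(w, v): each edge is stored twice. */
void add_edge(int v, int w)
{
    struct node *p = malloc(sizeof *p);
    p->vertex = w;
    p->next = head[v];
    head[v] = p;
}

/* The degree of v is simply the length of its adjacency list. */
int degree(int v)
{
    int d = 0;
    for (struct node *p = head[v]; p != NULL; p = p->next)
        d++;
    return d;
}
```

Checking whether (v, w) is an edge means scanning head[v], which is why the time bounds stated above depend on the list lengths rather than being constant as in the matrix representation.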
 Example 1: Consider the undirected graph in Fig. 3.23 and provide adjacency list.
 1 2 3
4 5
 6 7
 Fig. 3.23
 Solution: The adjacency list is given in Fig. 3.24
 1 → 2 → 5 → 4
 2 → 1 → 3
 3 → 2 → 5
 4 → 1 → 6
 5 → 1 → 3 → 6
 6 → 4 → 5
 7 → NULL
 Fig. 3.24
 Example 2: Consider the weighted undirected graph in Fig. 3.25 and provide
adjacency list.
 [Fig. 3.25: weighted undirected graph on vertices A to F; edge weights appear in the adjacency list below]
 Fig. 3.25
 Solution: An adjacency list for this graph is:
 A B 12 E 15 F 17
B A 12 C 1 D 2 F 7
C B 1 D 6
D B 2 C 6 E 14 F 10
E A 15 D 14 F 19
F A 17 B 7 D 10 E 19
Fig. 3.26
 [Figure: adjacency multilist representation of a graph on vertices 0 to 3; each edge node Ni stores one edge, e.g. N4: edge (1, 2), N5: edge (1, 3), N6: edge (2, 3).
 Lists: Vertex 0: N1 N2 N3; Vertex 1: N1 N4 N5; Vertex 2: N2 N4 N6; Vertex 3: N3 N5 N6]
3.3.1 Depth First Search (DFS) [April 16, 17, 18, 19 Oct. 17, 18]
 begin
 visited[v] = 1
 display vertex v
 push any one adjacent vertex x of v onto STACK which is not visited
 end
 end
 3. Stop.
Pseudo 'C' Code for Recursive DFS:
 int visited[MAX] = {0};
 void DFS(int i)
 {
     int k;
     visited[i] = 1;
     printf(" %d", i);
     for (k = 0; k < n; k++)
         if (A[i][k] == 1 && visited[k] == 0)
             DFS(k);
 }
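A complete, runnable version of the pseudo-code above might look as follows. This is a sketch that fixes the graph in a global adjacency matrix (a small sample graph, assumed here) and records the visiting order in an array instead of printing it.

```c
#define NV 4   /* number of vertices in the sample graph */

int A[NV][NV] = {          /* adjacency matrix of the sample graph */
    {0, 1, 1, 0},
    {1, 0, 0, 1},
    {1, 0, 0, 1},
    {0, 1, 1, 0}
};
int visited[NV];
int order[NV], cnt = 0;    /* sequence in which vertices are visited */

void DFS(int i)
{
    int k;
    visited[i] = 1;
    order[cnt++] = i;      /* record the visit instead of printing */
    for (k = 0; k < NV; k++)
        if (A[i][k] == 1 && visited[k] == 0)
            DFS(k);        /* go as deep as possible before backtracking */
}
```

Calling DFS(0) on this graph visits 0, then 1, then 3, and finally 2: each branch is followed to its end before the traversal backtracks.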
• Let us consider the graph in Fig. 3.29 drawn below:
 [Fig. 3.29: graph with vertices a, b, c, d, e, f, g, h, i]
 Sr. No. | Step | Stack | Visited
 1. Select vertex 'a' as the starting point (visit 'a'). Push 'a' onto the stack. Stack: a. Visited = {a}
 2. Visit an unvisited vertex adjacent to 'a' ('b'). Push 'b' onto the stack. Stack: a, b. Visited = {a, b}
 3. Visit an unvisited vertex adjacent to 'b' ('c'). Push 'c' onto the stack. Stack: a, b, c. Visited = {a, b, c}
 4. Visit an unvisited vertex adjacent to 'c' ('f'). Push 'f' onto the stack. Stack: a, b, c, f. Visited = {a, b, c, f}
 5. Continuing in the same way, vertices 'e' and 'g' are visited next. Visited = {a, b, c, f, e, g}
 The stack becomes empty, so we stop the DFS traversal. The final result of the DFS traversal is the following spanning tree:
 [Figure: DFS spanning tree on vertices a, b, c, d, e, f, g, h, i]
3.3.2 Breadth First Search (BFS) [April 16, 17, 18, 19 Oct. 17, 18]
• Another systematic way of visiting the vertices is Breadth-First Search (BFS). Starting at vertex i, mark it as visited; BFS differs from DFS in that all unvisited vertices adjacent to i are visited next.
• Then the unvisited vertices adjacent to these vertices are visited, and so on, until all the vertices have been visited. The approach is called 'breadth first' because from each vertex i that we visit, we search as broadly as possible by next visiting all the vertices adjacent to i.
• In BFS method, all the vertices are stored in a Queue and each vertex of the graph is
 visited or explored once.
• The oldest vertex (added first) in the Queue is explored first. To traverse the graph
 using breadth first search, Adjacent List of a graph is created first.
• For example, the BFS of graph of Fig. 3.30 results in visiting the nodes in the following
 order: 1, 2, 3, 4, 5, 6, 7, 8.
 [Fig. 3.30: graph with vertex 1 on the first level; 2, 3 on the second; 4, 5, 6 on the third; 7 below]
 Fig. 3.30
• This search algorithm uses a queue to store the adjacent vertices of each vertex of the
 graph as and when it is visited.
• These vertices are then taken out from queue in a sequence (FIFO) and their adjacent
 vertices are visited and so on until all the vertices have been visited. The algorithm
 terminates when the queue is empty.
• The algorithm for BFS is given below. The algorithm initializes the Boolean array
 visited[ ] to 0 (false) i.e. marks each vertex as unvisited.
 for i = 1 to n do
 visited[i] = 0.
 Algorithm:
 BFS (i)
 begin
 visited[i] = 1
 add (Queue, i)
 while not empty (Queue) do
 begin
 i = delete (Queue)
 for all vertices j adjacent to i do
 begin
 if (visited[j] = 0) then
 begin
 add (Queue, j)
 visited[j] = 1
 end
 end
 end
 end
• Here, the while loop is executed n times, as n is the number of vertices and each vertex is inserted in the queue once. If the adjacency list representation is used, then the adjacent nodes are computed in the for loop.
Pseudo 'C' Code for Non-recursive BFS (using a Queue):
 BFS (int i)
 {
     int k, visited[MAX];
     struct queue Q;
     for (k = 0; k < n; k++) visited[k] = 0;
     visited[i] = 1;
     printf(" %d", i);
     insert(Q, i);
     while (!empty(Q))
     {
         i = del(Q);
         for (k = 0; k < n; k++)
             if (A[i][k] == 1 && visited[k] == 0)
             {
                 printf(" %d", k);
                 visited[k] = 1;
                 insert(Q, k);
             }
     }
 }
 [Figure: the example graph with vertices a, b, c, d, e, f, g, h, i]
• Let us traverse the graph using non-recursive algorithm which uses queue. Let 'a' be a
 start vertex. Initially queue is empty. Initial set of visited vertices, V = φ.
 (i) Add ‘a’ to the queue. Mark ‘a’ as visited. V = {a}
 a
 Front
 Rear
 (ii) As queue is not empty,
 vertex = delete( ) = ‘a’.
 Add all unvisited adjacent vertices of ‘a’ to the queue. Also mark them visited.
 V = {a, b, d, e}
b d e
 Front Rear
 (iii) As queue is not empty, vertex = delete( ) = 'b'. Add all unvisited adjacent vertices of 'b' to the queue and mark them visited. V = {a, b, d, e, c}
 d e c
 Front Rear
 (iv) As queue is not empty,
 Vertex = delete( ) = ‘d’. Now insert all adjacent, unvisited vertices of ‘d’ to the
 queue and mark them as visited
 V = {a, b, d, e, c, g}
e c g
Front Rear
 (v) As queue is not empty, vertex = delete( ) = 'e'. No unvisited adjacent vertices of 'e' remain.
 c g
 Front Rear
 (vi) As queue is not empty, vertex = delete( ) = ‘c’.
 Insert all adjacent, unvisited vertices of ‘c’ to the queue and mark them visited.
 V = {a, b, d, e, c, g, f}
g f
 Front Rear
 (vii) As queue is not empty, vertex = delete( ) = ‘g’.
 Insert all unvisited adjacent vertices of ‘g’ to the queue and mark them visited.
 V = {a, b, d, e, c, g, f, h}
f h
Front Rear
 (viii) As queue is not empty, vertex = delete( ) = 'f'. No unvisited adjacent vertices of 'f' remain.
 h
 Front Rear
 (ix) As queue is not empty, vertex = delete( ) = ‘h’.
 Insert its unvisited adjacent vertices to queue and Mark them.
 V = {a, b, d, e, c, g, f, h, i}
 i
 Front Rear
 (x) As queue is not empty,
 Vertex = delete( ) = ‘ i ’, No adjacent vertices of i are unvisited.
 (xi) As queue is empty, stop.
 The sequence in which vertices are visited by BFS is as:
 a, b, d, e, c, g, f, h, i.
 [Fig. 3.31: BFS numbering of the example graph — a:1, b:2, d:3, e:4, c:5, g:6, f:7, h:8, i:9]
 Fig. 3.31
Program 3.2: To create a graph and represent using adjacency matrix and adjacency list
and traverse in BFS order.
 #include<stdio.h>
 #include<stdlib.h>
 struct q
 {
 int data[20];
 int front, rear;
 } q1;
 struct node
 {
 int vertex;
 struct node * next;
 } * v[10];
 while (temp)
 {
 printf("v%d -> ", temp -> vertex);
 temp = temp -> next;
 }
 printf("null");
 }
 }
 void bfs(int m[10][10], int n) /* BFS traversal using the adjacency matrix */
 {
 int i, j, v, w;
 int visited[20];
 initq();
 for(i=0; i<n; i++)
 visited[i] = 0;
 printf("\n \t The BFS traversal is: \n");
 v=0;
 visited[v] = 1;
 add(v);
 while(! emptyq())
 {
 v = del();
 printf("\n v%d ", v + 1);
 printf("\n");
 for(w = 0; w < n; w++)
 if((m[v][w] ==1) && (visited[w] == 0))
 {
 add(w);
 visited[w] = 1;
 }
 }
 }
 /* main program */
 void main()
 {
 int m[10][10], n;
 printf("\n \t enter no. of vertices");
 scanf("%d", &n);
 create(m,n);
 disp(m,n);
 create1(m,n);
 displist(n);
 bfs(m,n);
 }
 [Figure: traversal orders of a sample graph — DFS: A, B, D, C, E, F; BFS: A, B, C, D, E, F]
• In an algorithm to find the topological sort of an acyclic directed graph, the indegree of
 the vertices is considered. Following are the steps that are repeated until the graph is
 empty.
 1. Select any vertex Vi with 0 indegree.
 2. Add vertex Vi to the topological sort (initially the topological sort is empty).
 3. Remove the vertex Vi along with its edges from the graph and decrease the
 indegree of each adjacent vertex of Vi by one.
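The three repeated steps can be implemented with an indegree array. The sketch below works in the style of Kahn's algorithm over an adjacency matrix, using a small hypothetical DAG rather than the figure's graph; topo_sort is an illustrative name.

```c
#define NV 5   /* number of vertices */

/* Topological sort by repeatedly removing a vertex of indegree 0.
   Returns how many vertices were output; fewer than NV means a cycle. */
int topo_sort(int g[NV][NV], int out[NV])
{
    int indeg[NV] = {0}, done[NV] = {0};
    int n = 0, i, j;
    for (i = 0; i < NV; i++)
        for (j = 0; j < NV; j++)
            if (g[i][j])
                indeg[j]++;
    while (n < NV) {
        for (i = 0; i < NV; i++)       /* step 1: find a vertex with 0 indegree */
            if (!done[i] && indeg[i] == 0)
                break;
        if (i == NV)
            break;                     /* none left: the graph has a cycle */
        done[i] = 1;
        out[n++] = i;                  /* step 2: append it to the sort */
        for (j = 0; j < NV; j++)       /* step 3: remove its outgoing edges */
            if (g[i][j])
                indeg[j]--;
    }
    return n;
}
```

Because a removed vertex lowers the indegree of its successors, every vertex eventually reaches indegree 0 as long as the graph is acyclic.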
• To illustrate this algorithm, consider an acyclic directed graph shown in Fig. 3.35.
 [Fig. 3.35: acyclic directed graph on vertices 1 to 7; the indegree of each vertex is shown beside it. Topological sort: empty]
 Fig. 3.35: Acyclic Directed Graph
• The steps for finding topological sort for this graph are shown in Fig. 3.36.
 [Fig. 3.36: step-by-step construction of the topological sort:
 (a) Removing Vertex 1 with 0 Indegree. Topological sort: 1
 (b) Removing Vertex 3 with 0 Indegree. Topological sort: 1, 3
 (c) Removing Vertex 2 with 0 Indegree. Topological sort: 1, 3, 2
 (d) Removing Vertex 4 with 0 Indegree. Topological sort: 1, 3, 2, 4
 (e) Removing Vertex 5 with 0 Indegree. Topological sort: 1, 3, 2, 4, 5
 (f) Removing Vertices 7 and 6. Topological sort: 1, 3, 2, 4, 5, 7, 6]
 Fig. 3.36
 (a) 1 → 3 → 2 → 4 → 5 → 7 → 6
 (b) 1 → 3 → 4 → 2 → 5 → 7 → 6
 Fig. 3.37: Graphical Representation of Topological Sort
• Topological sort is useful for the proper scheduling of various subtasks to be executed for completing a particular task. In the computer field, it is used for scheduling instructions.
• For example, consider a task in which smaller number is to be subtracted from the
 larger one. The set of instructions for this task is as follows:
 1. if A>B then goto Step 2, else goto Step 3
 2. C = A-B, goto Step 4
 3. C = B-A, goto Step 4
 4. Print C
 5. End
• The two possible scheduling orders to accomplish this task are (1, 2, 4, 5) and
 (1, 3, 4, 5). From this, it can be concluded that the instruction 2 cannot be executed
 unless the instruction 1 is executed before it. Moreover, these instructions are non-
 repetitive, hence acyclic in nature.
• A spanning tree of a connected graph G is a tree that covers all the vertices and the
 edges required to connect those vertices in the graph.
• Formally, a tree T is called a spanning tree of a connected graph G if the following two
 conditions hold:
 1. T contains all the vertices of G, and
 2. All the edges of T are subsets of edges of G.
• For a given graph G with n vertices, there can be many spanning trees and each tree
 will have n − 1 edges. For example, consider a graph as shown in Fig. 3.38.
• Since, this graph has 4 vertices, each spanning tree must have 4 − 1 = 3 edges. Some of
 the spanning trees for this graph are shown in Fig. 3.39.
• Observe that in spanning trees, there exists only one path between any two vertices
 and insertion of any other edge in the spanning tree results in a cycle.
• The spanning tree generated by using depth-first traversal is known as depth-first
 spanning tree. Similarly, the spanning tree generated by using breadth-first traversal
 is known as breadth-first spanning tree.
 [Fig. 3.38: graph G with vertex 1 connected to vertices 2, 3 and 4]
 [Fig. 3.39 (a)-(d): four spanning trees of G, each containing all 4 vertices and 3 edges]
 Fig. 3.39: Spanning Trees of Graph G
• For a connected weighted graph G, it is required to construct a spanning tree T such
 that the sum of weights of the edges in T must be minimum. Such a tree is called a
 minimum spanning tree.
• There are various approaches for constructing a minimum spanning tree out of which
 Kruskal's algorithm and Prim's algorithm are commonly used.
• If each edge of the graph is associated with a weight and there exists more than one
 spanning tree, we need to find the minimum spanning tree of the graph.
• A Minimum Spanning Tree (MST) or minimum weight spanning tree is a subset of the
 edges of a connected, edge-weighted undirected graph that connects all the vertices
 together, without any cycles and with the minimum possible total edge weight.
• A minimum spanning tree in an undirected connected weighted graph is a spanning
 tree of minimum weight (among all spanning trees).
• In a minimum spanning tree of a graph, the maximum weight of an edge is the
 minimum possible from all possible spanning trees of that graph.
• In Fig. 3.40 (a) you can see a weighted undirected graph, and in Fig. 3.40 (b) the corresponding minimum spanning tree.
 [Fig. 3.40: a weighted undirected graph and its minimum spanning tree]
 (a) Weighted Undirected Graph (b) Minimum Spanning Tree
 Fig. 3.40
Prim’s Minimum Spanning Tree (MST) Algorithm:
• Prim’s algorithm is a greedy strategy to find the minimum spanning tree. In this
 algorithm, to form a MST we can start from an arbitrary vertex.
• Prim's algorithm shares a similarity with the shortest path first algorithms.
• Prim's algorithm treats the growing spanning tree as a single tree and keeps adding new nodes to it from the given graph.
• Example: Consider the following graph.
 [Fig. 3.41: weighted graph on vertices S, A, B, C, D, T, containing a loop and parallel edges]
 Fig. 3.41
Step 1: Remove all loops and parallel edges:
 [Fig. 3.42: the graph after removing the loop]
 Fig. 3.42
• Remove all loops and parallel edges from the given graph. In case of parallel edges,
 keep the one which has the least cost associated and remove all others.
 [Fig. 3.43: the graph after removing loops and parallel edges; remaining edges S-A 7, S-C 8, A-B 6, A-C 3, B-C 4, B-D 2, B-T 5, C-D 3, D-T 2]
 Fig. 3.43
Step 2: Choose any arbitrary node as root node:
• In this case, we choose node S as the root node of Prim's spanning tree. This node is arbitrarily chosen, so any node can be the root node. One may wonder why any vertex can be a root node. The answer is that the spanning tree includes all the nodes of the graph, and because the graph is connected, there must be at least one edge joining any node to the rest of the tree.
Step 3: Check outgoing edges and select the one with the least cost:
• After choosing the root node S, we see that S-A and S-C are two edges with weights 7 and 8, respectively. We choose the edge S-A, as its weight is less than the other.
 [Fig. 3.44: edge S-A (weight 7) added to the spanning tree]
 Fig. 3.44
• Now, the tree S-7-A is treated as one node and we check for all edges going out from it.
 We select the one which has the lowest cost and include it in the tree.
 [Fig. 3.45: edge A-C (weight 3) added; the tree is now S-A-C]
 Fig. 3.45
 3.35
Data Structures & Algorithms - II Graph
• After this step, S-7-A-3-C tree is formed. Now we'll again treat it as a node and will
 check all the edges again. However, we will choose only the least cost edge. In this
 case, C-3-D is the new edge, which is less than other edges' cost 8, 6, 4, etc.
 Fig. 3.46
• After adding node D to the spanning tree, we now have two edges going out of it
 having the same cost, i.e. D-2-T and D-2-B. Thus, we can add either one. But the next
 step will again yield edge 2 as the least cost. Hence, we are showing a spanning tree
 with both edges included.
 Fig. 3.47
• We may find that the output spanning tree of the same graph using two different
 algorithms is the same.
Kruskal's Minimum Spanning Tree (MST) Algorithm:
• Kruskal's algorithm to find the minimum cost spanning tree uses the greedy strategy.
• Kruskal's algorithm treats the graph as a forest and every node it has as an individual
 tree. A tree connects to another if and only if it has the least cost among all
 available options and does not violate MST properties.
• To understand Kruskal's algorithm let us consider the Fig. 3.48.
 Fig. 3.48
 Fig. 3.49
Step 1: Remove all loops and parallel edges:
• In case of parallel edges, keep the one which has the least cost associated and remove
 all others.
 Fig. 3.50
Step 2: Arrange all edges in their increasing order of weight:
• The next step is to create a set of edges and weight, and arrange them in an ascending
 order of weightage (cost).
 Edge   : B, D   D, T   A, C   C, D   C, B   B, T   A, B   S, A   S, C
 Weight :  2      2      3      3      4      5      6      7      8
Step 3: Add the edge which has the least weightage:
• Now we start adding edges to the graph beginning from the one which has the least
 weight. Throughout, we shall keep checking that the spanning tree properties remain
 intact. In case adding an edge violates the spanning tree property, we shall not
 include that edge in the graph.
 Fig. 3.51
• The least cost is 2 and edges involved are B, D and D, T. We add them. Adding them
 does not violate spanning tree properties, so we continue to our next edge selection.
• Next cost is 3, and associated edges are A, C and C, D. We add them again:
 Fig. 3.52
• Next cost in the table is 4, and we observe that adding it will create a circuit in the
 graph.
 Fig. 3.53
• We ignore it. In the process we shall ignore/avoid all edges that create a circuit.
 Fig. 3.54
• We observe that edges with cost 5 and 6 also create circuits. We ignore them and move
 on.
 Fig. 3.55
• Now we are left with only one node to be added. Between the two least cost edges
 available 7 and 8, we shall add the edge with cost 7.
 Fig. 3.56
• By adding edge S, A we have included all the nodes of the graph and we now have the
 minimum cost spanning tree.
 Fig. 3.57
 Steps 1 to 5: at each step the tables of No. of Nodes (0 to 5), Distance, and
 Distance From are updated as one more vertex is finalized.
 Hence, the minimum distance of vertex 9 from vertex 1 is 20. And the path is
 1 → 3 →7 →8 →6 →9
 This path is determined based on predecessor information.
 Fig. 3.58
 Example: Consider the graph in Fig. 3.59.
 Fig. 3.59
Procedure for Dijkstra's Algorithm:
 Step 1 : Consider A as source vertex.
 No. of Nodes  : A   B   C   D   E
 Distance      : 0   10  3   ∞   ∞
 Distance From : –   A   A   –   –
 Step 2 : Now consider vertex C.
 No. of Nodes  : A   B   C   D   E
 Distance      : 0   7   3   11  5
 Distance From : –   C   A   C   C
 Therefore,
          A    B    C    D    E
          0    ∞    ∞    ∞    ∞
  A       0   10    3    ∞    ∞
  C            7    3   11    5
  E                     14    5
  B                      9
  D                     16
 Fig. 3.60
• The all-pairs shortest path problem requires finding the shortest paths between every
 pair of vertices in the graph.
• The Floyd-Warshall algorithm solves the all-pairs shortest path problem for a given
 weighted graph. It works for both directed as well as undirected graphs.
• The algorithm was invented by Robert Floyd and Stephen Warshall, hence it is often
 called the Floyd-Warshall algorithm.
• As a result, this algorithm generates a matrix which represents the minimum distance
 from any node to all other nodes in the graph.
  0    1   −3    2   −4
  3    0   −4    1   −1
  7    4    0    5    3
  2   −1   −5    0   −2
  8    5    1    6    0
 Fig. 3.61
 3.43
Data Structures & Algorithms - II Graph
• At first, the output matrix is the same as the given cost matrix of the graph. After that,
 the output matrix will be updated with all vertices k as the intermediate vertex.
 Input and Output:
 Input: The cost matrix of the graph.
 0 3 6 ∞ ∞ ∞ ∞
 3 0 2 1 ∞ ∞ ∞
 6 2 0 1 4 2 ∞
 ∞ 1 1 0 2 ∞ 4
 ∞ ∞ 4 2 0 2 1
 ∞ ∞ 2 ∞ 2 0 1
 ∞ ∞ ∞ 4 1 1 0
 Output: Matrix of all pair shortest path.
 0 3 4 5 6 7 7
 3 0 2 1 3 4 4
 4 2 0 1 3 2 3
 5 1 1 0 2 3 3
 6 3 3 2 0 2 1
 7 4 2 3 2 0 1
 7 4 3 3 1 1 0
• On the Graph API, everything is a vertex or node. These are entities such as Users,
 Pages, Places, Groups, Comments, Photos, Photo Albums, Stories, Videos, Notes, Events
 and so forth. Anything that has properties that store data is a vertex.
• The Graph API uses these collections of vertices and edges (essentially graph data
 structures) to store its data.
• The Graph API has run into some problems because of its ability to obtain unusually
 rich information about users' friends.
 Fig. 3.62: A sample Graph (in this graph, individuals are represented with nodes (circles) and
 individuals who know each other are connected with edges (lines))
Example: Define spanning tree and minimum spanning tree. Find the minimum spanning
tree of the following graph.
Using Prim's Algorithm:
 Let X be the set of nodes explored, initially X = {A}.
 Step 1 : Taking minimum weight edge of all Adjacent edges of X = {A}, i.e. edge (A, B)
 of weight 4. X = {A, B}.
 Step 2 : Taking minimum weight edge of all Adjacent edges of X = {A, B}.
 Edge (B, C) of weight 2 is selected, so X = {A, B, C}.
 Step 3 : Taking minimum weight edge of all Adjacent edges of X = {A, B, C}.
 Edge (C, D) of weight 1 is selected, so X = {A, B, C, D}.
 Step 4 : Taking minimum weight edge of all Adjacent edges of X = {A, B, C, D}.
 Edge (B, E) of weight 3 is selected, so X = {A, B, C, D, E}.
 All nodes of graph are there with set X, so we obtained minimum spanning tree of
 cost: 4 + 2 + 1 + 3 = 10.
Using Kruskal's Algorithm:
• Let X be the set of nodes explored, initially X = {A}.
 Step 1 : Taking minimum edge (C, D).
 Step 2 : Taking next minimum edge (B, C).
 Step 3 : Taking next minimum edge (B, E).
 Step 4 : Taking next minimum edge (A, B).
 Step 5 : Taking next minimum edge (A, E) it forms cycle so do not consider.
 Step 6 : Taking next minimum edge (C, E) it forms cycle so do not consider.
 Step 7 : Taking next minimum edge (A, D) it forms cycle so do not consider.
 Step 8 : Taking next minimum edge (A, C) it forms cycle so do not consider.
 Step 9 : Taking next minimum edge (E, D) it forms cycle so do not consider.
 All edges of graph have been visited, so we obtained minimum spanning tree of
 cost: 4 + 2 + 1 + 3 = 10.
Program 3.3: Program that accepts the vertices and edges of a graph. Create adjacency list
and display the adjacency list.
 #include <stdio.h>
 #include <stdlib.h>
 #define new_node (struct node*)malloc(sizeof(struct node))
 struct node
 {
 int vertex;
 struct node *next;
 };
 void main()
 {
 int choice;
 do
 {
 printf("\n A Program to represent a Graph by using an
 Adjacency List \n ");
 printf("\n 1. Directed Graph ");
 printf("\n 2. Un-Directed Graph ");
 printf("\n 3. Exit ");
 printf("\n\n Select a proper choice : ");
 scanf("%d", &choice);
 switch(choice)
 {
 case 1 : dir_graph();
 break;
 case 2 : undir_graph();
 break;
 case 3 : exit(0);
 }
 }while(1);
 }
 int dir_graph()
 {
 struct node *adj_list[10], *p;
 int n;
 int in_deg, out_deg, i, j;
 printf("\n How Many Vertices ? : ");
 scanf("%d", &n);
 for( i = 1 ; i <= n ; i++ )
 adj_list[i] = NULL;
 read_graph (adj_list, n);
 printf("\n Vertex \t In_Degree \t Out_Degree \t Total_Degree ");
 for (i = 1; i <= n ; i++ )
 {
 in_deg = out_deg = 0;
 p = adj_list[i];
 while( p != NULL )
 {
 out_deg++;
 p = p -> next;
 }
 for ( j = 1 ; j <= n ; j++ )
 {
 p = adj_list[j];
 while( p != NULL )
 {
 if ( p -> vertex == i )
 in_deg++;
 p = p -> next;
 }
 }
 Vertex   In_Degree   Out_Degree   Total_Degree
   1          2            0             2
   2          1            2             3
   3          0            1             1
   4          1            1             2
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
 1. Which is a non-linear data structure?
 (a) Graph (b) Array
 (c) Queue (d) Stack
 2. A graph G is represented as G = (V, E) where,
 (a) V is set of vertices (b) E is set of edges
 (c) Both (a) and (b) (d) None of these
 3. In which representation is the graph represented using a matrix of size total
 number of vertices by total number of vertices?
 (a) Adjacency List (b) Adjacency Matrix
 (c) Adjacency Queue (d) Adjacency Stack
 4. Each node of the graph is called a ______.
 (a) Edge (b) Path
 (c) Vertex (d) Cycle
 5. Which representation of graph is based on linked lists?
 (a) Adjacency List (b) Adjacency Matrix
 (c) Both (a) and (b) (d) None of these
 6. Which operation of a graph means visiting each of its nodes exactly once?
 (a) Insert (b) Traversal
 (c) Delete (d) Merge
 7. Which is a vertex based technique for finding a shortest path in graph?
 (a) DFS (b) BST
 (c) BFS (d) None of these
 8. Which is usually implemented using a stack data structure?
 (a) DFS (b) BST
 (c) BFS (d) None of these
 9. Which is a graph in which all the edges are uni-directional i.e. the edges point in a
 single direction?
 (a) Undirected (b) Directed
 (c) Cyclic (d) Acyclic
 10. Which sorting involves displaying the specific order in which a sequence of
 vertices must be followed in a directed graph?
 (a) Cyclic (b) Acyclic
 (c) Topological (d) None of these
 11. Which is a subset of an undirected graph that has all the vertices connected by
 minimum number of edges?
 (a) Spanning tree (b) Minimum spanning tree
 (c) Both (a) and (b) (d) None of these
 12. Which is a subset of edges of a connected weighted undirected graph that connects
 all the vertices together with the minimum possible total edge weight?
 (a) Spanning tree (b) Minimum Spanning Tree (MST)
 (c) Both (a) and (b) (d) None of these
 Answers
 1. (a) 2. (c) 3. (b) 4. (c) 5. (a) 6. (b) 7. (c)
 8. (a) 9. (b) 10.(c) 11. (a) 12. (b)
Q. II Fill in the Blanks:
 13. The sequence of nodes that we need to follow when we have to travel from one
 vertex to another in a graph is called the ______.
 14. An adjacency ______ is a matrix of size n x n where n is the number of vertices in
 the graph.
 15. Topological ordering of vertices in a graph is possible only when the graph is a
 ______ acyclic graph.
 16. A ______ (or minimum weight spanning tree) for a weighted, connected and
 undirected graph is a spanning tree with weight less than or equal to the weight of
 every other spanning tree.
 17. The ______ algorithm is an example of an all-pairs shortest paths algorithm.
 18. A graph ______ cycle is called acyclic graph.
 Answers
 1. G = (V, E) 2. cyclic 3. non-linear 4. Vertex, Arc
 5. searching 6. Stack 7. degree 8. toward
 9. Null 10. link 11. Queue 12. visited
 13. path 14. matrix 15. directed
 16. Minimum Spanning Tree (MST) 17. Floyd-Warshall 18. without
Q. III State True or False:
 1. An acyclic graph is a graph that has no cycle.
 2. A Graph consists of a finite set of vertices (or nodes) and set of Edges which
 connect a pair of nodes.
 3. A graph in which weights are assigned to every edge is called acyclic graph.
 4. The Floyd-Warshall algorithm is for solving the all pairs shortest path problem.
 5. The number edges pointing away from the node are called out-degree/out-order.
 6. In a graph, if two nodes are connected by an edge then they are called adjacent
 nodes or neighbors.
 7. Dijkstra's Algorithm can be applied to either a directed or an undirected graph to
 find the shortest path to each vertex from a single source.
 8. The graph is a linear data structure.
 9. BFS usually implemented using a queue data structure.
 10. The spanning tree does not have any cycle (loops).
 11. Prim’s Algorithm will find the minimum spanning tree from the graph G.
 12. Given an undirected, connected and weighted graph, find Minimum Spanning
 Tree (MST) of the graph using Kruskal’s algorithm.
 13. The number of edges that are connected to a particular node is called the path of
 the node.
 14. A spanning tree of that graph is a subgraph that is a tree and connects all the
 vertices together.
 15. Directed graph is also called as Digraph.
 Answers
 1. (T) 2. (T) 3. (F) 4. (T) 5. (T) 6. (T) 7. (T) 8. (F)
 9. (T) 10. (T) 11. (T) 12. (T) 13. (F) 14. (T) 15. (T)
 Starting vertex v1
 (i) Draw adjacency list
 (ii) Give DFS and BFS traversals.
Ans. Refer to Sections 3.2.2 and 3.3.
 October 2018
 April 2019
 1. Consider the following graph: [4 M]
 CHAPTER
 4
 Hash Table
Objectives …
 To study Basic Concepts of Hashing
 To learn Hash Table and Hash Function
 To understand Terminologies in Hashing
 To study Collision Resolution Techniques
 4.0 INTRODUCTION
• Hashing is a technique which uses a special function, called the hash function, to map
 a given value to a particular key for faster access of elements.
• Hashing is the process of mapping large amount of data item to smaller table with the
 help of hashing function.
• The mapping between an item and the slot where that item belongs in the hash table
 is called the hash function.
• The hash function will take any item in the collection and return an integer in the
 range of slot names, between 0 and m−1.
• A hash table is a collection of items which are stored in such a way as to make it easy
 to find them later.
Need for Hash Table Data Structure:
• In the linear search and binary search, the location of the item is determined by a
 sequence/series of comparisons. The data item to be searched is compared with items
 at certain locations in the list.
• If any item/element matches with the item to be searched, the search is successful.
 The number of comparisons required to locate an item depends on the data structure
 like array, linked list, sorted array, binary search tree, etc. and the search algorithm
 used.
• For example, if the items are stored in sorted order in an array, binary search can be
 applied which locates an item in O(log n) comparisons.
• On the other hand, if an item is to be searched in a linked list or an unsorted array,
 linear search has to be applied which locates an item in O(n) comparisons.
 4.1
Data Structures and Algorithms - II Hash Table
• However, for some applications, searching is very critical operation and they need a
 search algorithm which performs search in constant time, i.e. O(1).
• Although it is almost impossible to achieve a performance of O(1) in all cases, a
 search algorithm can be derived which is independent of n and can give a
 performance very close to O(1).
• That search algorithm is called hashing. Hashing uses a data structure called hash
 table which is merely an array of fixed size and items in it are inserted using a
 function called hash function.
• Best case timing behavior of searching using hashing = O(1), and worst case timing
 behavior of searching using hashing = O(n).
• A hash table is a data structure in which the location of a data item is determined
 directly as a function of the data item itself rather than by a sequence of comparisons.
• Under ideal conditions, the time required to locate a data item in a hash table is O(1),
 i.e. it is constant and does not depend on the number of data items stored.
 4.2 TERMINOLOGY
• The basic terms used in hashing are explained below:
 1. Hash Table: A hash table is a data structure that is used to store keys/value pairs.
 Hashing uses a data structure called hash table which is merely an array of fixed
 size and items in it are inserted using a hash function. It uses a hash function to
 compute an index into an array in which an element will be inserted or searched.
 A hash table (hash map) is a data structure that implements an associative array
 abstract data type, a structure that can map keys to values. A hash table uses a
 hash function to compute an index, also called a hash code, into an array of
 buckets or slots, from which the desired value can be found.
 2. Hash Function: A hash function is a function that maps the key to some slot in the
 hash table. A hash function is a mapping function which maps all the set of search
 keys to the addresses where actual records are placed.
 3. Bucket: A hash file stores data in bucket format. Bucket is considered a unit of
 storage. A bucket typically stores one complete disk block, which in turn can store
 one or more records.
 4. Hash Address: A hash function is a function which when given a key, generates an
 address in the table. Hash index is an address of the data block.
 5. Collision: The situation where a newly inserted key maps to an already occupied
 slot in the hash table is called collision. A collision occurs when two data elements
 are hashed to the same value and try to occupy the same space in the hash table.
 In simple words, the situation in which a key hashes to an index which is already
 occupied by another key is called as collision.
 6. Synonym: It is possible for different keys to hash to the same array location. This
 situation is called collision and the colliding keys are called synonyms.
 7. Overflow: An overflow occurs when the bucket for a new (key, element) pair is
 full.
 8. Open Addressing: It ensures that all elements are stored directly in the hash
 table itself, and attempts to resolve collisions by probing for another open
 slot using various methods.
 9. Linear Probing: It is performed to resolve collisions by placing the data into the
 next open slot in the table.
 10. Hashing: Hashing is a process that uses a hash function to get the key for the hash
 table and transform it into an index that will point to different arrays of buckets,
 which is where the information will be stored.
 11. Chaining: It is a technique used for avoiding collisions in hash tables.
 12. Collision Resolution: Collision should be resolved by finding some other location
 to insert the new key, this process of finding another location is called as collision
 resolution.
• Such a field is known as primary key (denoted by k). The values k1, k2, …, kn in the key
 field are known as keys or key values.
• The key through an algorithmic function determines the location of a particular
 record.
 The algorithmic function i.e. hashing function basically performs the key-to-address
 transformation in which key is mapped to the addresses of records in the file as shown
 in Fig. 4.2.
 while(hashArray[hashIndex] != NULL)
 {
 if(hashArray[hashIndex]->key == key)
 {
 struct DataItem* temp = hashArray[hashIndex];
 //assign a dummy item at deleted position
 hashArray[hashIndex] = dummyItem;
 return temp;
 }
 //go to next cell
 ++hashIndex;
 //wrap around the table
 hashIndex %= SIZE;
 }
 return NULL;
 }
 void display()
 {
 int i = 0;
 for(i = 0; i<SIZE; i++)
 {
 if(hashArray[i] != NULL)
 printf(" (%d,%d)",hashArray[i]->key,hashArray[i]->data);
 else
 printf(" ~~ ");
 }
 printf("\n");
 }
 int main()
 {
 dummyItem = (struct DataItem*) malloc(sizeof(struct DataItem));
 dummyItem->data = -1;
 dummyItem->key = -1;
 insert(1, 20);
 insert(2, 70);
 insert(42, 80);
 insert(4, 25);
 insert(12, 44);
 insert(14, 32);
 insert(17, 11);
 insert(13, 78);
 insert(37, 97);
 display();
 item = search(37);
 if(item != NULL)
 {
 printf("Element found: %d\n", item->data);
 } else
 {
 printf("Element not found\n");
 }
 delete(item);
 item = search(37);
 if(item != NULL)
 {
 printf("Element found: %d\n", item->data);
 } else
 {
 printf("Element not found\n");
 }
 }
Output:
 $gcc -o hashpgm *.c
 $hashpgm
 ~~ (1,20) (2,70) (42,80) (4,25) ~~ ~~ ~~ ~~ ~~ ~~ ~~ (12,44)
 (13,78) (14,32) ~~ ~~ (17,11) (37,97) ~~
 Element found: 97
 Element not found
• The number obtained after removing the digits is used as the hash value. Note that the
 digits at the same positions of k² must be used for all keys.
• Thus, the hash function is given below,
 h(k) = s
 where, s is obtained by deleting digits from both sides of k².
• For example, consider a hash table with N = 1000. The hash value of the key value
 132437 can be calculated as follows:
 1. The square of the key value is calculated, which is, 17539558969.
 2. The hash value is obtained by taking the 5th, 6th and 7th digits counting from the
 right, which is 955.
• Let us suppose, if k is the index retrieved from the hash function. If the kth index is
 already filled then we will look for (k+1) %M, then (k+2) %M and so on. When we get a
 free slot, we will insert the object into that free slot.
• The table below shows the result of inserting keys (5, 18, 55, 78, 35, 15) using the hash
 function h(k) = k % 10 and the linear probing strategy.
 Index  Empty Table  After 5  After 18  After 55  After 78  After 35  After 15
   0                                                                     15
   1
   2
   3
   4
   5                    5        5         5         5         5         5
   6                                       55        55        55        55
   7                                                           35        35
   8                             18        18        18        18        18
   9                                                 78        78        78
• Linear probing is easy to implement but it suffers from "primary clustering".
• When many keys are mapped to the same location (clustering), linear probing will not
 distribute these keys evenly in the hash table. These keys will be stored in
 neighborhood of the location where they are mapped. This will lead to clustering of
 keys around the point of collision.
Example 1: Consider an example of a hash table of size 20, where the following items are
to be stored. Items are in the (key, value) format. Here, we can search for the next empty
location in the array by looking into the next cell until we find an empty cell. This
technique is called linear probing.
 (1, 20)
 (2, 70)
 (42, 80)
 (4, 25)
 (12, 44)
 (14, 32)
 (17, 11)
 (13, 78)
 (37, 98)
Solution:
 Sr. No.   Key   Hash            Array Index   Array Index After Linear Probing
 1.        1     1 % 20 = 1      1             1
 2.        2     2 % 20 = 2      2             2
 3.        42    42 % 20 = 2     2             3
 4.        4     4 % 20 = 4      4             4
 5.        12    12 % 20 = 12    12            12
 6.        14    14 % 20 = 14    14            14
 7.        17    17 % 20 = 17    17            17
 8.        13    13 % 20 = 13    13            13
 9.        37    37 % 20 = 17    17            18
Example 2: A hash table of length 10 uses open addressing with hash function h(k)=k mod
10, and linear probing. After inserting 6 values into an empty hash table, the table is as
shown below:
 0
 1
 2 42
 3 23
 4 34
 5 52
 6 46
 7 33
 8
 9
 What is a possible order in which the key values could have been inserted in the table?
Solution: 46, 34, 42, 23, 52, 33 is the sequence in which the key values could have been
inserted in the table.
 How many different insertion sequences of the key values using the same hash function
and linear probing will result in the hash table shown above?
 Solution: 30
 In a valid insertion sequence, the elements 42, 23 and 34 must appear before 52 and 33,
and 46 must appear before 33.
 Total number of different sequences = 3! × 5 = 30
 In the above expression, 3! is for elements 42, 23 and 34 as they can appear in any order,
and 5 is for element 46 as it can appear at 5 different places.
4.5.1.3 Rehashing
• As the name suggests, rehashing means hashing again. Rehashing is a technique in
 which the table is resized, i.e. the size of table is doubled by creating a new table.
• Several deletion operations are intermixed with insertion operations while
 performing operations on hash table. Eventually, a situation arises when the hash
 table becomes almost full.
• At this time, it might happen that the insert, delete, and search operations on the hash
 table take too much time. The insert operation even fails in spite of performing open
 addressing with quadratic probing for collision resolution. This condition/situation
 indicates that the current space allocated to the hash table is not sufficient to
 accommodate all the keys.
• A simple solution to this problem is rehashing, in which all the keys in the original
 hash table are rehashed to a new hash table of larger size.
• The default size of the new hash table is twice as that of the original hash table. Once
 the new hash table is created, a new hash value is computed for each key in the
 original hash table and the keys are inserted into the new hash table.
• After this, the memory allocated to the original hash table is freed. The performance of
 the hash table improves significantly after rehashing.
 Fig. 4.3: Rehashing (transferring the contents of the old table to a new, larger table)
4.5.2 Chaining
• In hashing the chaining is one collision resolution technique. Chaining is a possible
 way to resolve collisions. Each slot of the array contains a link to a singly-linked list
 containing key-value pairs with the same hash.
• Chaining allows storing the key elements with the same hash value into linked list as
 shown in Fig. 4.4.
• Thus, each slot h in the hash table contains a pointer to the head of the linked list of all
 the elements that hash to the value h.
• All collisions are chained in the lists attached to the appropriate slot. This allows an
 unlimited number of collisions to be handled and does not require a prior knowledge
 of how many elements are contained in the collection.
• The tradeoff is the same as with linked lists versus array implementations of
 collections: linked list overhead in space and to a lesser extent, in time.
 Fig. 4.4: Chaining (each slot points to a linked list of (Key, Value, Next) nodes having the
 same hash address)
0 ofu
1 gcl
2 qur
3 clq
4 ecd
5 dim
6 aty
7 rhv
8 qrj
9 qsu
 Fig. 4.5
• Coalesced hashing technique is effective, efficient, and very easy to implement.
• In the separate chaining technique, a separate list of all elements mapped to the same
 value is maintained. Separate chaining is a simple collision resolution technique.
• If memory space is tight, separate chaining should be avoided. Additional memory
 space for links is wasted in storing address of linked elements.
• Hashing function should ensure even distribution of elements among buckets;
 otherwise the timing behavior of most operations on hash table will deteriorate.
• Fig. 4.6 shows a separate chaining hash table.
 List of elements:
 0 : 10 → 50
 2 : 12 → 32 → 62
 4 : 4 → 24
 7 : 7
 9 : 9 → 69
 Fig. 4.6: A separate chaining hash table
 Example 1: The integers given below are to be inserted in a hash table with 5 locations
using chaining to resolve collisions. Construct hash table and use simplest hash function.
1, 2, 3, 4, 5, 10, 21, 22, 33, 34, 15, 32, 31, 48, 49, 50
 Solution: An element can be mapped to a location in the hash table using the mapping
function h(k) = k % 5.
0 5, 10, 15, 50
1 1, 21, 31
2 2, 22, 32
3 3, 33, 48
 4 4, 34, 49
 0 : 5 → 10 → 15 → 50
 1 : 1 → 21 → 31
 2 : 2 → 22 → 32
 3 : 3 → 33 → 48
 4 : 4 → 34 → 49
 Fig. 4.7
Example 2: Consider the key values 20, 32, 41, 66, 72, 80, 105, 77, 56, 53 that need to be
hashed using the simple hash function h(k) = k mod 10. The keys 20 and 80 hash to index
0, key 41 hashes to index 1, keys 32 and 72 hashes to index 2, key 53 is hashed to index 3,
key 105 is hashed to index 5, keys 66 and 56 are hashed to index 6 and finally the key 77 is
hashed to index 7. The collision is handled using separate chaining (also known as
open hashing) as shown below:
0 80 20 NULL
1 41 NULL
2 72 32 NULL
3 53 NULL
4 NULL
5 105 NULL
6 56 66 NULL
7 77 NULL
8 NULL
9 NULL
 Sr. No.  Separate Chaining                           Open Addressing
 1.   Chaining is simpler to implement.           Open addressing requires more
                                                  computation.
 2.   In chaining, the hash table never fills     In open addressing, the table may
      up; we can always add more elements         become full.
      to a chain.
 3.   Chaining is less sensitive to the hash      Open addressing requires extra care
      function or load factor.                    to avoid clustering and a high load
                                                  factor.
 4.   Chaining is mostly used when it is          Open addressing is used when the
      unknown how many and how frequently         frequency and number of keys is
      keys may be inserted or deleted.            known.
 5.   Cache performance of chaining is not        Open addressing provides better
      good, as keys are stored using linked       cache performance, as everything is
      lists.                                      stored in the same table.
 6.   Wastage of space (some parts of the         In open addressing, a slot can be
      hash table in chaining are never used).     used even if an input does not map
                                                  to it.
 7.   Chaining uses extra space for links.        No links in open addressing.
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
 1. Which data structure is designed to use a special function called the hash
 function?
 (a) Hashing (b) Stacking
 (c) Queueing (d) None of these
 2. Which is a data structure that represents data in the form of key-value pairs?
 (a) Hash function (b) Hash table
 (c) Hashing (d) None of these
 3. Which is any function that can be used to map a data set of an arbitrary size to a
 data set of a fixed size, which falls into the hash table?
 (a) Hash function (b) Hash table
 (c) Hash values (d) None of these
8. (a) 9. (c) 10. (b) 11. (d) 12. (a) 13. (b)