B Tree
Muhammad Haris
Department of Computer Science
m.haris@nu.edu.pk
Today’s Lecture
Introduction to B tree
Properties of B tree
Insertion into B tree
Examples
Activity to submit now
B-tree concept
BST?
AVL trees / balanced trees
A B-tree is a good structure when much of the tree lives in
slow memory (disk),
since the smaller height means fewer disk reads
Each read picks up a large block of data (one node)
Also used in cache applications
Definition of a B-tree
A self-balancing M-way search tree
M is the order of the B-tree
M could be 3, 4, 5, 6, 7, etc.
M is the maximum number of children a node can have
[not the number of values]
All leaf nodes are at the same level
Keys are always kept in sorted order within each node
Definition of a B-tree: properties
Every node can have at most m children
Minimum number of children per node:
Root => 2 (unless the root is itself a leaf)
Leaf node => 0 children
Internal (middle) node => ceil[m/2]
Every node can have at most m-1 keys
Every node except the root has at least ceil[m/2]-1 keys
The root node can have as few as one key
Every new key is inserted into a leaf node
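The limits above can be tabulated for any order m. The following is a minimal sketch; the function name btree_limits and the dictionary keys are illustrative choices, not part of the lecture.

```python
import math


def btree_limits(m):
    """Return the per-node limits of a B-tree of order m (m = maximum children)."""
    if m < 3:
        raise ValueError("the order m must be at least 3")
    min_children = math.ceil(m / 2)
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children_internal": min_children,
        "min_keys_non_root": min_children - 1,
        "min_children_root": 2,  # applies only when the root is not a leaf
        "min_keys_root": 1,
    }


for m in (3, 4, 5):
    print(m, btree_limits(m))
```

For m = 5 this gives three to five children and two to four keys for every internal node other than the root.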
Extra information
Self-balancing tree
Allows a node to have more than 2 children, within a specific range
Max children = 2t, i.e. m
Min children = ceil[m/2], called t
where m = 2t and t is the branching factor (minimum degree), which must be
greater than 1.
Balanced = all leaves are at the same height
Disk operations are efficient
m > 2
t > 1
Definition
The major advantage of the B+ tree (and B-trees in general) over binary
search trees is that they play well with caches. If you have a binary search
tree whose nodes are stored in more or less random order in memory, then
each time you follow a pointer, the machine will have to pull in a new
block of memory into the processor cache, which is dramatically slower
than accessing memory already in cache.
The B+-tree and the B-tree work by having each node store a huge number
of keys or values and have a large number of children. They are typically
packed together in a way that makes it possible for a single node to fit nicely
into cache (or, if stored on disk, to be pulled from the disk in a single
read operation). You then have to do more work to find a key within the
node or determine which child to read next, but because all memory accesses
done on a single node can be done without going back to disk, the access
times are very small. This means that even though in principle a BST might
be better in terms of number of memory accesses, the B+-tree and the B-tree
can perform better in terms of the runtime of those memory accesses.
The typical use case for a B+-tree or B-tree is in a
database, where there is a huge amount of information
and the data are so numerous that they can't all fit into
main memory. Accordingly, the data can then be
stored in a B+-tree or B-tree on a hard disk
somewhere. This minimizes the number of disk reads
necessary to pull in the data during lookups. Some
filesystems (like ext4, I believe) use B-trees as well
for the same reason - they minimize the number of
disk lookups necessary, which is the real bottleneck.
An example B-Tree
A B-tree of order 5:
Root:     [26]
Level 1:  [6 12]   [42 51 62]
Leaves under [6 12]:     [1 2 4]  [7 8]  [13 15 18 25]
Leaves under [42 51 62]: [27 29]  [45 46 48]  [53 55 60]  [64 70 90]
Note that all the leaves are at the same level
9 B-Trees
Properties of B-trees
1. Every node x has the following fields
a. x.n: the number of keys currently stored in node x.
e.g. for the node 1|2|4, x.n is 3.
b. The x.n keys themselves, stored in non-decreasing order
so that x.key_1 ≤ x.key_2 ≤ … ≤ x.key_{x.n}
e.g. 1|2|4 are ordered
c. x.leaf, a boolean that is TRUE if x is a leaf and FALSE
otherwise.
2. Each internal (= non-leaf) node contains x.n+1 pointers x.c_1, x.c_2,
…, x.c_{x.n+1} to children.
3. Leaf nodes have no such pointers.
4. The keys x.key_i separate the ranges of keys stored in each subtree:
if k_i is any key stored in the subtree with root x.c_i then
k_1 ≤ x.key_1 ≤ k_2 ≤ x.key_2 ≤ … ≤ x.key_{x.n} ≤ k_{x.n+1}
e.g. consider the 6|12 node and its three children:
[6 12]
[1 2 4]  [7 8]  [13 15 18 25]
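The node fields and the key-range rule (properties 1b, 2 and 4) can be sketched as a small Python class with a checker. This is only an illustration: the names Node, all_keys and check_node are assumptions, and x.n and x.leaf are derived from the lists rather than stored as separate fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def n(self):
        return len(self.keys)          # x.n: number of keys in the node

    @property
    def leaf(self):
        return not self.children       # x.leaf: TRUE when there are no children


def all_keys(x):
    """Yield every key stored in the subtree rooted at x."""
    if x.leaf:
        yield from x.keys
        return
    for i, child in enumerate(x.children):
        yield from all_keys(child)
        if i < x.n:
            yield x.keys[i]


def check_node(x):
    """Check properties 1b, 2 and 4 for node x only (not for its descendants)."""
    if any(a > b for a, b in zip(x.keys, x.keys[1:])):
        return False                   # keys are not in non-decreasing order
    if x.leaf:
        return True                    # a leaf has no child pointers to check
    if len(x.children) != x.n + 1:
        return False                   # an internal node needs x.n + 1 children
    for i, child in enumerate(x.children):
        for k in all_keys(child):
            if i > 0 and k < x.keys[i - 1]:
                return False           # key is too small for subtree i
            if i < x.n and k > x.keys[i]:
                return False           # key is too large for subtree i
    return True


node = Node([6, 12], [Node([1, 2, 4]), Node([7, 8]), Node([13, 15, 18, 25])])
print(node.n, node.leaf, check_node(node))   # 2 False True
```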
B-tree: overall properties
A balanced m-way search tree
Nodes can have more than 2 children, but the tree stays balanced (it generalizes a balanced BST)
All leaf nodes must be at the same level (2, 3, 4, ..., k)
New items are always added to a leaf node
A B-tree of order m has the following properties
Every node has at most m children
Minimum children: zero for a leaf, 2 for a non-leaf root, and
ceil(m/2) for all other nodes
Every node has at most m-1 keys (values)
The minimum number of keys for the root is 1
All other nodes have at least ceil(m/2)-1 keys
M-way B-tree
5-way tree
A B-tree of order 5, that is, internal nodes (other than the root) can have
three, four or five children
At most m-1 = 4 keys per node
3-way tree
A B-tree of order 3, that is, internal nodes can have two or
three children.
At most m-1 = 2 keys per node
Inserting a value X into a B-tree
1. Using the search procedure for M-way trees, find the leaf
node to which X should be added
2. Add X to this node in the appropriate place among the
values already there. Being a leaf node, there are no
subtrees to worry about.
3. If there are M-1 or fewer values in the node after
adding X, then we are finished
4. If there are M values in the node after adding X,
split the node into three parts (counts shown for odd M)
Left: the first (M-1)/2 values
Middle: the single value at position (M-1)/2 + 1
Right: the last (M-1)/2 values
Move the middle key up into the parent
This step might have to be repeated all the way to the top
If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
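The insertion steps above can be sketched in Python as follows. This is one possible reading of the steps, not the lecture's own code: the names Node, insert and levels are illustrative, an empty children list marks a leaf, and for even M the sketch assumes the left part gets floor((M-1)/2) values.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means the node is a leaf


def _insert(node, x, m):
    """Insert x below node. Return (middle_key, right_node) if node split, else None."""
    i = bisect.bisect_left(node.keys, x)
    if not node.children:                # steps 1-2: at the leaf, add x in sorted position
        node.keys.insert(i, x)
    else:
        promoted = _insert(node.children[i], x, m)
        if promoted is not None:         # the child split: take in its middle key
            key, right = promoted
            node.keys.insert(i, key)
            node.children.insert(i + 1, right)
    if len(node.keys) < m:               # step 3: at most m-1 values, finished
        return None
    mid = (m - 1) // 2                   # step 4: split into left, middle, right
    middle_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle_key, right


def insert(root, x, m):
    """Insert x into the tree of order m rooted at root; return the (possibly new) root."""
    promoted = _insert(root, x, m)
    if promoted is None:
        return root
    key, right = promoted
    return Node([key], [root, right])    # the root split: the tree grows one level


def levels(root):
    """Return the keys of every node, grouped level by level from the top."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


root = Node()
for x in [1, 12, 8, 2, 25]:
    root = insert(root, x, 5)
print(levels(root))   # [[[8]], [[1, 2], [12, 25]]]
```

The recursion returns the promoted key to the caller, so a split can propagate from the leaf up to the root without storing parent pointers.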
Constructing a B-tree
Suppose we start with an empty B-tree and keys arrive
in the following order: 1 12 8 2 25 6 14 28 17 7
52 16 48 68 3 26 29 53 55 45
We want to construct a B-tree of order 5
The first four items go into the root:
1 2 8 12
Putting the fifth item in the root would give it five keys, one more than the maximum of four (Step 4)
Therefore, when 25 arrives, pick the middle key (8) to
make a new root
Constructing a B-tree (contd.)
Root:   [8]
Leaves: [1 2]  [12 25]
6, 14, 28 get added to the leaf nodes:
Root:   [8]
Leaves: [1 2 6]  [12 14 25 28]
Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key, promote it (to the root) and split the leaf
Root:   [8 17]
Leaves: [1 2 6]  [12 14]  [25 28]
7, 52, 16, 48 get added to the leaf nodes
Root:   [8 17]
Leaves: [1 2 6 7]  [12 14 16]  [25 28 48 52]
Constructing a B-tree (contd.)
Adding 68 causes us to split the rightmost leaf, promoting 48 to the
root, and adding 3 causes us to split the leftmost leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves
Root:   [3 8 17 48]
Leaves: [1 2]  [6 7]  [12 14 16]  [25 26 28 29]  [52 53 55 68]
Adding 45 over-fills the leaf 25 26 28 29, which splits,
promoting 28 to the root; this in turn causes the root to split
Constructing a B-tree (contd.)
Root:    [17]
Level 1: [3 8]  [28 48]
Leaves:  [1 2]  [6 7]  [12 14 16]  [25 26]  [29 45]  [52 53 55 68]
Exercise: Inserting into a B-Tree
Home Task
Insert the following letters into a 3-way B-tree:
C N G A H E K Q M F W L T
Analysis of B-Tree
Two principal components of running time:
The number of disk accesses
The CPU computing time
B-Tree-Search
Search 55
Root:    [17]
Level 1: [3 8]  [28 48]
Leaves:  [1 2]  [6 7]  [12 14 16]  [25 26]  [29 45]  [52 53 55 68]
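The search for 55 can be traced with a short Python sketch in the spirit of B-TREE-SEARCH. The Node class and the visited list are illustrative additions; each entry in the list stands for one node read, which on disk would be one disk access.

```python
class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


def btree_search(x, k, visited=None):
    """Return (node, index) where k is stored, or None if k is absent."""
    if visited is not None:
        visited.append(x.keys)           # record every node that is read
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1                           # linear scan inside the node
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:
        return None                      # reached a leaf without finding k
    return btree_search(x.children[i], k, visited)


root = Node([17], [
    Node([3, 8], [Node([1, 2]), Node([6, 7]), Node([12, 14, 16])]),
    Node([28, 48], [Node([25, 26]), Node([29, 45]), Node([52, 53, 55, 68])]),
])

path = []
found = btree_search(root, 55, path)
print(path)       # [[17], [28, 48], [52, 53, 55, 68]]
print(found[1])   # 2
```

Only three nodes are read, one per level, even though the tree holds 20 keys.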
Analysis of B-Tree Search
Number of disk accesses
The B-TREE-SEARCH procedure performs O(h) =
O(log_t n) disk accesses, where h is the height of the B-tree and n is
the number of keys in the B-tree.
Assumption: 2t = m
Each node has at most 2t-1 items/keys
CPU time
Since x.n < 2t, the while loop of lines 2–3 of B-TREE-SEARCH takes O(t)
time within each node
The total CPU time is O(th) = O(t log_t n).
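To get a feel for O(log_t n), the sketch below computes the largest possible height for given n and t. It relies on the standard bound from CLRS, n ≥ 2t^h - 1 (with the root at depth 0), which is not derived on these slides; the function name max_height is illustrative.

```python
def max_height(n, t):
    """Largest height h (root at depth 0) of a B-tree with n >= 1 keys and minimum degree t >= 2.

    Uses the bound n >= 2 * t**h - 1, in integer arithmetic to avoid rounding errors.
    """
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


n = 1_000_000
for t in (2, 10, 100):
    h = max_height(n, t)
    # a search reads at most h + 1 nodes and scans fewer than 2t keys in each
    print(t, h, h + 1)
```

With a million keys, t = 2 allows a height of up to 18, while t = 100 keeps the height at 2 or less, so a search reads at most three nodes.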
Self study
Read the code for insertion into a B-tree (CLRS,
Chapter 18, pp. 491 to 495)
The analysis of the pseudocode should give O(th)
Contd..
It is actually a proactive insertion algorithm: before going down to a
node, we split it if it is full.
The advantage of splitting beforehand is that we never traverse a node twice. If we
don’t split a node before going down to it and split it only when a new key is
inserted (reactive), we may end up traversing all nodes again from leaf to
root.
This happens when all nodes on the path from root to leaf are full.
When we reach the leaf node, we split it and move a key up. Moving a
key up causes a split in the parent node (because the parent was already full).
This cascading effect never happens in the proactive insertion algorithm.
Proactive insertion has one disadvantage, though: we may do
unnecessary splits.
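A minimal Python sketch of the proactive idea, assuming minimum degree t so that a full node holds 2t-1 keys. It follows the single-pass scheme described above but is not a transcription of the CLRS pseudocode; the names Node, split_child, insert and inorder are illustrative.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i] (2t-1 keys) around its median key."""
    full = parent.children[i]
    right = Node(leaf=full.leaf)
    median = full.keys[t - 1]
    right.keys = full.keys[t:]
    full.keys = full.keys[:t - 1]
    if not full.leaf:
        right.children = full.children[t:]
        full.children = full.children[:t]
    parent.keys.insert(i, median)        # the parent is never full here, so this is safe
    parent.children.insert(i + 1, right)


def insert(root, k, t):
    """Insert k in one downward pass; return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:      # full root: split it first, the tree grows a level
        new_root = Node(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0, t)
        root = new_root
    x = root
    while not x.leaf:
        i = bisect.bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)         # split before going down, even if not strictly needed
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    bisect.insort(x.keys, k)             # the leaf is guaranteed to have room
    return root


def inorder(x):
    """Return all keys of the subtree rooted at x in sorted order."""
    if x.leaf:
        return list(x.keys)
    out = []
    for i, child in enumerate(x.children):
        out += inorder(child)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


root = Node()
for k in range(1, 11):
    root = insert(root, k, 2)
print(root.keys, inorder(root))
```

Because every node on the way down is split while its parent still has room, no split ever propagates back up, at the price of an occasional split that a reactive algorithm would have avoided.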