Data Structures Trees
A tree is a non-linear data structure that organizes data hierarchically. It can be defined
recursively: a tree is a root node together with zero or more subtrees, each of which is itself a tree.
OR
A tree is a connected graph without any circuits.
OR
If a graph has one and only one path between every pair of vertices, then it is called a tree.
Tree Terminology :
The important terms related to the tree data structure are:
Root:
The node from which the tree originates is called the root node.
A tree has exactly one root node; it can never have multiple root nodes.
Parent:
A node that has a branch from it to any other node is called a parent node.
In other words, a node that has one or more children is a parent node.
In a tree, a parent node can have any number of child nodes.
Ex:
Here,
Node A is the parent of nodes B and C
Node B is the parent of nodes D, E and F
Node C is the parent of nodes G and H
Node E is the parent of nodes I and J
Node G is the parent of node K
Child:
A node that is an immediate descendant of another node is called a child node.
Every node except the root node is a child node.
Ex:
Here,
Nodes B and C are the children of node A
Nodes D, E and F are the children of node B, and so on.
Siblings:
Nodes that have the same parent are called siblings.
Ex:
Here,
Nodes B and C are siblings
Nodes D, E and F are siblings
Nodes G and H are siblings
Nodes I and J are siblings
Degree:
Degree of a node is the total number of children of that node.
Degree of a tree is the highest degree of a node among all the nodes in the tree.
Ex:
Here,
Degree of node A = 2
Degree of node B = 3
Degree of node C = 2
Degree of node D = 0
Degree of node E = 2
Degree of node F = 0
Degree of node G = 1
Degree of node H = 0
Degree of node I = 0
Degree of node J = 0
Degree of node K = 0
Internal Node:
A node that has at least one child is called an internal node.
Internal nodes are also called non-terminal nodes.
Every non-leaf node is an internal node.
Leaf Node:
A node that does not have any child is called a leaf node.
Leaf nodes are also called external nodes or terminal nodes.
Height:
The height of a node is the total number of edges on the longest downward path from that node to
a leaf node.
The height of a tree is the height of its root node.
The height of every leaf node is 0.
Ex:
In the above diagram,
Height of node A = 3
Height of node B = 2
Height of node C = 2
Height of node E = 1
Height of node G = 1
Height of all leaf nodes D, F, H, I, J, K = 0
Depth:
The depth of a node is the total number of edges from the root node to that node.
The depth of a tree is the depth of its deepest node, that is, the number of edges on the longest
path from the root node to a leaf node.
The depth of the root node is always 0.
The terms “level” and “depth” are used interchangeably.
Ex:
Here,
Depth of node A = 0 (root node)
Depth of node B = 1
Depth of node C = 1
Depth of node D = 2
Depth of node E = 2
Depth of node F = 2
Depth of node G = 2
Depth of node H = 2
Depth of node I = 3
Depth of node J = 3
Depth of node K = 3
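The degree, height and depth values listed above can be checked with a short C sketch. The first-child / next-sibling representation and the names `TNode`, `degree`, `height`, `depth_of` and `build_example` are assumptions made for illustration; the tree itself is the A to K example used in the listings (A has children B and C, B has D, E and F, and so on).

```c
#include <stddef.h>

/* General tree node in first-child / next-sibling form (an assumed
   representation; the notes do not prescribe one). */
struct TNode {
    char label;
    struct TNode *first_child;   /* leftmost child, or NULL for a leaf */
    struct TNode *next_sibling;  /* next child of the same parent, or NULL */
};

/* Degree of a node: the number of its children. */
int degree(const struct TNode *n) {
    int d = 0;
    for (const struct TNode *c = n->first_child; c != NULL; c = c->next_sibling)
        d++;
    return d;
}

/* Height of a node: edges on the longest downward path to a leaf. */
int height(const struct TNode *n) {
    int best = -1;  /* a leaf has no children, so it ends up with height 0 */
    for (const struct TNode *c = n->first_child; c != NULL; c = c->next_sibling) {
        int h = height(c);
        if (h > best) best = h;
    }
    return best + 1;
}

/* Depth of the node labelled `target`: edges from `root` down to it,
   or -1 if the label does not occur in the tree. */
int depth_of(const struct TNode *root, char target) {
    if (root == NULL) return -1;
    if (root->label == target) return 0;
    for (const struct TNode *c = root->first_child; c != NULL; c = c->next_sibling) {
        int d = depth_of(c, target);
        if (d >= 0) return d + 1;
    }
    return -1;
}

/* Builds the example tree: A -> B, C; B -> D, E, F; C -> G, H; E -> I, J; G -> K. */
struct TNode *build_example(void) {
    static struct TNode n[11];  /* n[0] = 'A' ... n[10] = 'K' */
    for (int i = 0; i < 11; i++) {
        n[i].label = (char)('A' + i);
        n[i].first_child = NULL;
        n[i].next_sibling = NULL;
    }
    n[0].first_child = &n[1];  n[1].next_sibling = &n[2];   /* A: B, C */
    n[1].first_child = &n[3];  n[3].next_sibling = &n[4];   /* B: D, E, ... */
    n[4].next_sibling = &n[5];                               /* ... F */
    n[2].first_child = &n[6];  n[6].next_sibling = &n[7];   /* C: G, H */
    n[4].first_child = &n[8];  n[8].next_sibling = &n[9];   /* E: I, J */
    n[6].first_child = &n[10];                               /* G: K */
    return &n[0];
}
```

Calling `height(build_example())` gives the height of the tree, and `depth_of(root, 'K')` gives the depth of node K.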
Subtree:
In a tree, each child of a node is the root of a subtree, and this holds recursively.
Every child node therefore forms a subtree of its parent node.
Forest:
A forest is a set of disjoint trees.
Advantages of trees:
Trees reflect structural relationships in the data
Trees are used to represent hierarchies
Trees provide efficient insertion and searching
Trees are very flexible, allowing subtrees to be moved around with minimum effort
Application of Trees:
One reason to use trees might be because you want to store information that naturally forms a
hierarchy. For example, the File System on a computer:
File System
              /    <-- root
           /     \
        ...       home
               /       \
          upgrade      course
            /         /   |   \
          ...     cs11  cs12  cs13
In the above binary tree, each node contains Left_Child, Data and Right_Child.
Node 1 is the root; its Left_Child points to node 2 and its Right_Child points to node 3.
Node 2 is a child of node 1; its Left_Child points to node 5 and its Right_Child points to node 4.
Node 4 is a leaf node because it has no children.
Node 5 is a child of node 2; its Left_Child points to node 6 and its Right_Child points to node 7.
Nodes 6 and 7 are both leaf nodes.
Node 3 is a child of node 1; its Left_Child and Right_Child point to nothing, so it is a
leaf node.
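As a rough illustration of the node layout just described, the following C sketch links the seven nodes exactly as the text lists them. The names `BTNode`, `is_leaf` and `build_described_tree` are hypothetical, and static storage is used only to keep the sketch short.

```c
#include <stddef.h>

/* One binary tree node: a data field plus left and right child links. */
struct BTNode {
    int data;
    struct BTNode *left_child;
    struct BTNode *right_child;
};

/* A node is a leaf when both child links point to nothing (NULL). */
int is_leaf(const struct BTNode *n) {
    return n->left_child == NULL && n->right_child == NULL;
}

/* Builds the seven-node tree described in the text and returns its root. */
struct BTNode *build_described_tree(void) {
    static struct BTNode n[8];  /* n[1]..n[7] hold nodes 1..7; n[0] is unused */
    for (int i = 1; i <= 7; i++) {
        n[i].data = i;
        n[i].left_child = NULL;
        n[i].right_child = NULL;
    }
    n[1].left_child = &n[2];  n[1].right_child = &n[3];  /* node 1: 2 and 3 */
    n[2].left_child = &n[5];  n[2].right_child = &n[4];  /* node 2: 5 and 4 */
    n[5].left_child = &n[6];  n[5].right_child = &n[7];  /* node 5: 6 and 7 */
    return &n[1];
}
```

With this layout, `is_leaf` returns true for nodes 3, 4, 6 and 7, matching the description.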
Types of Binary Tree: There are three different types of Binary Trees :
Algorithm:
1. Put the root node in a queue.
2. Iterate while the queue is not empty.
3. While iterating, remove one element from the queue (say 45 in our example), print it, and put
the children of 45 into the queue.
4. Repeat step 3 until the queue is empty.
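The queue-based steps above can be sketched in C as follows. This is a minimal sketch under stated assumptions: the queue is a fixed-size array bounded by a hypothetical `MAX_NODES`, and visited values are written to an output array instead of being printed so that the order can be inspected. The names `BTNode` and `level_order` are illustrative.

```c
#include <stddef.h>

#define MAX_NODES 64  /* assumed upper bound on the tree size for this sketch */

struct BTNode {
    int data;
    struct BTNode *left_child;
    struct BTNode *right_child;
};

/* Visits the tree level by level, writing each visited value to out[].
   Returns the number of nodes visited. Assumes the tree has at most
   MAX_NODES nodes and that out[] has room for all of them. */
int level_order(struct BTNode *root, int out[]) {
    struct BTNode *queue[MAX_NODES];
    int head = 0, tail = 0, count = 0;

    if (root == NULL) return 0;
    queue[tail++] = root;                     /* step 1: put the root in the queue */

    while (head < tail) {                     /* step 2: loop while the queue is not empty */
        struct BTNode *cur = queue[head++];   /* step 3: remove one element ... */
        out[count++] = cur->data;             /* ... visit it ... */
        if (cur->left_child != NULL)          /* ... and enqueue its children */
            queue[tail++] = cur->left_child;
        if (cur->right_child != NULL)
            queue[tail++] = cur->right_child;
    }
    return count;
}
```

Replacing the write to `out[]` with a `printf` call gives the printing variant described in the steps.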
Ex2:
1. PreOrder Traversal - visit the parent first, then the left and right children.
Algorithm: PreOrder(tree)
Check if (root != NULL) then
    Visit the root node and print data of node.
    Traverse the left subtree, i.e., call PreOrder(left-subtree)
    Traverse the right subtree, i.e., call PreOrder(right-subtree)
PreOrder traversal of the above tree - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
2. InOrder Traversal - visit the left child, then the parent, then the right child.
Algorithm: InOrder(tree)
Check if (root != NULL) then
    Traverse the left subtree, i.e., call InOrder(left-subtree)
    Visit the root node and print data of node.
    Traverse the right subtree, i.e., call InOrder(right-subtree)
InOrder traversal of the above tree - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
3. PostOrder Traversal - visit the left child, then the right child, then the parent.
Algorithm: PostOrder(tree)
Check if (root != NULL) then
    Traverse the left subtree, i.e., call PostOrder(left-subtree)
    Traverse the right subtree, i.e., call PostOrder(right-subtree)
    Visit the root node and print data of node.
PostOrder traversal of the above tree - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
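The three traversal algorithms can be sketched in C as below. The tree shape in `build_example` is not given in the text; it is reconstructed here from the PreOrder and InOrder sequences listed above, which together determine a unique binary tree. The function names and the choice to collect values in an array rather than print them are assumptions for illustration.

```c
#include <stddef.h>

struct BTNode {
    int data;
    struct BTNode *left_child;
    struct BTNode *right_child;
};

/* Each traversal appends the visited values to out[] and advances *n. */
void pre_order(const struct BTNode *t, int out[], int *n) {
    if (t == NULL) return;
    out[(*n)++] = t->data;                 /* parent first */
    pre_order(t->left_child, out, n);
    pre_order(t->right_child, out, n);
}

void in_order(const struct BTNode *t, int out[], int *n) {
    if (t == NULL) return;
    in_order(t->left_child, out, n);
    out[(*n)++] = t->data;                 /* parent between the subtrees */
    in_order(t->right_child, out, n);
}

void post_order(const struct BTNode *t, int out[], int *n) {
    if (t == NULL) return;
    post_order(t->left_child, out, n);
    post_order(t->right_child, out, n);
    out[(*n)++] = t->data;                 /* parent last */
}

/* Example tree reconstructed from the listed sequences:
   8 -> (5, 4); 5 -> (9, 7); 7 -> (1, 12); 12 -> (2, -); 4 -> (-, 11); 11 -> (3, -). */
struct BTNode *build_example(void) {
    static struct BTNode n[10];
    static const int vals[10] = {8, 5, 4, 9, 7, 11, 1, 12, 3, 2};
    for (int i = 0; i < 10; i++) {
        n[i].data = vals[i];
        n[i].left_child = NULL;
        n[i].right_child = NULL;
    }
    n[0].left_child = &n[1];  n[0].right_child = &n[2];  /* 8: 5 and 4 */
    n[1].left_child = &n[3];  n[1].right_child = &n[4];  /* 5: 9 and 7 */
    n[4].left_child = &n[6];  n[4].right_child = &n[7];  /* 7: 1 and 12 */
    n[7].left_child = &n[9];                             /* 12: left child 2 */
    n[2].right_child = &n[5];                            /* 4: right child 11 */
    n[5].left_child = &n[8];                             /* 11: left child 3 */
    return &n[0];
}
```

Running the three functions on `build_example()` reproduces the three sequences listed above.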
What is Binary Search Tree (BST)?
A binary search tree is a data structure that lets us maintain a sorted collection of numbers efficiently.
It is called a binary tree because each tree node has a maximum of two children.
A binary search tree is a useful data structure for fast addition and removal of data.
It is called a search tree because it can be used to search for the presence of a number
in O(log n) time, provided the tree is reasonably balanced.
A binary search tree is a node-based binary tree data structure which has the following properties:
The left subtree of a node contains only nodes with keys less than the node’s key.
The right subtree of a node contains only nodes with keys greater than the node’s key.
The left and right subtrees must each be a binary search tree.
A binary search tree is shown in the above figure. Following the BST constraint, the root node 8
has no value greater than or equal to 8 in its left subtree and no value less than 8 in its right
subtree.
Here, 4, 2, 5, 1 and 3 are < 6 because they lie in the left subtree of 6, and 8, 7, 13 and 9 are >= 6
because they lie in the right subtree of 6.
Ex: Create a binary search tree using the following data elements: 43, 10, 79, 90, 12, 54, 11, 9, 50
Insert 43 into the tree as the root of the tree.
Read the next element. If it is less than the root node element, insert it into the left subtree,
repeating the comparison at each node until an empty position is found.
Otherwise, insert it into the right subtree in the same way.
The process of creating the BST from the above elements is shown in the image below.
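The insertion steps above can be sketched as a short recursive C function. The names `BSTNode`, `bst_insert` and `build_example` are hypothetical; sending keys that are not smaller than the current node to the right follows the "otherwise" rule in the steps.

```c
#include <stdlib.h>

struct BSTNode {
    int key;
    struct BSTNode *left, *right;
};

/* Inserts key into the BST rooted at root and returns the (possibly new) root.
   Keys smaller than a node go into its left subtree; all others go right.
   If allocation fails, the tree is left unchanged. */
struct BSTNode *bst_insert(struct BSTNode *root, int key) {
    if (root == NULL) {                       /* empty position found */
        struct BSTNode *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->key = key;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Builds the tree from the example sequence 43, 10, 79, 90, 12, 54, 11, 9, 50. */
struct BSTNode *build_example(void) {
    const int keys[] = {43, 10, 79, 90, 12, 54, 11, 9, 50};
    struct BSTNode *root = NULL;
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        root = bst_insert(root, keys[i]);
    return root;
}
```

For this sequence the root is 43, its left subtree is rooted at 10 (with children 9 and 12, and 11 below 12), and its right subtree is rooted at 79 (with children 54 and 90, and 50 below 54).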
Comparison between Binary Tree and Binary Search Tree:
Structure
    Binary Tree: Each node must have at most two child nodes, with each node being connected
    from exactly one other node by a directed edge.
    Binary Search Tree: The values of the nodes in the left subtree are less than or equal to the
    value of the root node, and the nodes in the right subtree have values greater than or equal to
    the value of the root node.
Order
    Binary Tree: There is no relative order to how the nodes should be organized. It is basically a
    hierarchical data structure that is a collection of elements called nodes.
    Binary Search Tree: It follows a definitive order to how the nodes should be organized in a
    tree. It is a variant of the binary tree in which the nodes are arranged in a relative order.
Operation
    Binary Tree: It is mainly used for insertion, deletion, and searching of elements.
    Binary Search Tree: It is used for fast and efficient lookup of data and information in a tree
    structure.
Types
    Binary Tree: Different types of binary trees are the “Full Binary Tree”, “Complete Binary
    Tree”, “Perfect Binary Tree”, and “Extended Binary Tree”.
    Binary Search Tree: T-trees, AVL trees, Splay trees, Tango trees, Red-Black trees, etc.
Operations on BST:
The operations on a binary search tree are search, insertion and deletion.
Searching Operation in BST :
Searching means finding or locating a specific element or node within a data structure.
Searching for a specific node in a binary search tree is straightforward because the elements in
a BST are stored in a particular order.
A binary search tree can be searched for a specific value in two ways:
Recursive
Iterative
This explanation covers the recursive method. We begin by examining the root node.
If the tree is null, the value we are searching for does not exist in the tree.
Otherwise, if the value equals the root, the search is successful. If the value is less than the root,
search the left subtree.
Similarly, if it is greater than the root, search the right subtree.
This process is repeated until the value is found or the indicated subtree is null. If the searched
value is not found before a null subtree is reached, then the item is not present in the tree.
This operation requires O(log n) time in the average case, but needs O(n) time in the worst case,
when the unbalanced tree resembles a linked list (degenerate tree).
For Ex: Consider the following binary search tree and find 60 in it.
Compare 60 with the root of the tree, i.e. 50.
If the key matches, return the location of the node.
Otherwise, check whether 60 is less than the element at the root (50); if so, move to the left
subtree.
But 60 > 50, so move to the right subtree.
Now the root is 75. 60 < 75, so move to the left subtree.
Now the root is 60. 60 = 60, so return this node.
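Both search styles mentioned above, recursive and iterative, can be sketched in a few lines of C. The names `BSTNode`, `bst_search` and `bst_search_iter` are hypothetical; each function returns a pointer to the matching node, or NULL when the value is not in the tree.

```c
#include <stddef.h>

struct BSTNode {
    int key;
    struct BSTNode *left, *right;
};

/* Recursive search: compare with the root, then continue in one subtree. */
struct BSTNode *bst_search(struct BSTNode *root, int key) {
    if (root == NULL || root->key == key)
        return root;                           /* not found, or found here */
    if (key < root->key)
        return bst_search(root->left, key);    /* smaller: go left */
    return bst_search(root->right, key);       /* larger: go right */
}

/* Iterative search: the same comparisons, written as a loop. */
struct BSTNode *bst_search_iter(struct BSTNode *root, int key) {
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

On the example path (50, then 75, then 60), either function makes three comparisons before returning the node that holds 60.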
An AVL tree is a self-balancing binary search tree (BST) in which the heights of the left and
right subtrees of every node differ by at most one.
Insertion and deletion in an AVL tree carry extra cost because the tree may have to rebalance
itself after each such operation. In return, searching is guaranteed to run in O(log n) time,
which is what makes the AVL tree so popular.
AVL tree operations are conveniently implemented with recursion, because rebalancing rotations
proceed from the point of change near the leaves back up to the root.
Why AVL Trees ? [Advantages of AVL tree]
Most of the BST operations (e.g., search, max, min, insert, delete, etc.) take O(h) time, where h is
the height of the BST. The cost of these operations may become O(n) for a skewed binary tree. If we
make sure that the height of the tree remains O(log n) after every insertion and deletion, then we
can guarantee an upper bound of O(log n) for all these operations. The height of an AVL tree is
always O(log n), where n is the number of nodes in the tree.
A skewed tree is a tree in which each node has only one child node or none. This type of BST is
similar to a linked list.
AVL trees are self-balancing binary search trees (BSTs). A normal BST may be skewed to either
side, which results in a much greater effort to search for a key: the cost can be much more
than O(log n) and in the worst case equals O(n), which is the same effort spent in sequential
searching. This makes an unbalanced BST inefficient in practice.
Balance Factor: It is defined as the difference between the height of the left subtree and the
height of the right subtree, that is, balance factor = height(left subtree) - height(right subtree).
If the balance factor of a node is 1, the left subtree is one level higher than the right
subtree.
If the balance factor of a node is 0, the left subtree and the right subtree have equal
height.
If the balance factor of a node is -1, the left subtree is one level lower than the right
subtree.
An AVL tree is given in the following figure. We can see that the balance factor associated with
each node is between -1 and +1.
Therefore, it is an example of an AVL tree.
After an insertion or deletion, the following two cases are possible for the balance factor:
Case-01:
After the operation, the balance factor of each node is either 0 or 1 or -1.
In this case, the AVL tree is considered to be balanced.
The operation is concluded.
Case-02:
After the operation, the balance factor of at least one node is not 0 or 1 or -1.
In this case, the AVL tree is considered to be imbalanced.
Rotations are then performed to balance the imbalanced AVL tree.
Insertion in an AVL tree is performed in the same way as in a binary search tree: the new node is
added to the AVL tree as a leaf node. However, this may violate the AVL tree property, and the
tree may then need rebalancing.
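A minimal C sketch of AVL insertion follows, assuming each node caches its own height and duplicate keys are ignored. The notes do not spell out the rotations, so the four rebalancing cases here follow the standard left-left, left-right, right-right and right-left scheme; the names `AVLNode`, `balance_factor` and `avl_insert` are hypothetical.

```c
#include <stdlib.h>

struct AVLNode {
    int key;
    int height;                     /* height in edges; a leaf has height 0 */
    struct AVLNode *left, *right;
};

static int height(const struct AVLNode *n) { return n ? n->height : -1; }

static void update(struct AVLNode *n) {
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height(left subtree) - height(right subtree). */
int balance_factor(const struct AVLNode *n) {
    return n ? height(n->left) - height(n->right) : 0;
}

static struct AVLNode *rotate_right(struct AVLNode *y) {
    struct AVLNode *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;                       /* x is the new subtree root */
}

static struct AVLNode *rotate_left(struct AVLNode *x) {
    struct AVLNode *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;                       /* y is the new subtree root */
}

/* Standard BST insertion followed by rebalancing on the way back up. */
struct AVLNode *avl_insert(struct AVLNode *root, int key) {
    if (root == NULL) {
        struct AVLNode *n = malloc(sizeof *n);
        if (n == NULL) return NULL; /* allocation failed: tree unchanged */
        n->key = key;
        n->height = 0;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key);
    else if (key > root->key)
        root->right = avl_insert(root->right, key);
    else
        return root;                /* duplicates are ignored in this sketch */

    update(root);
    int bf = balance_factor(root);
    if (bf > 1) {                                   /* left side too high */
        if (balance_factor(root->left) < 0)         /* left-right case */
            root->left = rotate_left(root->left);
        return rotate_right(root);                  /* left-left case */
    }
    if (bf < -1) {                                  /* right side too high */
        if (balance_factor(root->right) > 0)        /* right-left case */
            root->right = rotate_right(root->right);
        return rotate_left(root);                   /* right-right case */
    }
    return root;                                    /* Case-01: already balanced */
}
```

Inserting the keys 1 to 7 in increasing order, which would produce a skewed plain BST, yields a tree of height 2 rooted at 4 with this sketch.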
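A red-black tree is a binary search tree with one extra attribute for each node: the colour, which is
either red or black. The node structure of a red-black tree is
#include <stdbool.h>   /* needed for bool in C */

struct Node
{
    int data;                  /* key stored in the node */
    bool color;                /* node colour: red or black */
    struct Node *right_child;  /* root of the right subtree, or NULL */
    struct Node *left_child;   /* root of the left subtree, or NULL */
};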
Why Red-Black Trees?
Most of the BST operations (e.g., search, max, min, insert, delete, etc.) take O(h) time,
where h is the height of the BST.
The cost of these operations may become O(n) for a skewed binary tree. If we make sure
that the height of the tree remains O(log n) after every insertion and deletion, then we can
guarantee an upper bound of O(log n) for all these operations.
The height of a red-black tree is always O(log n), where n is the number of nodes in the
tree. In comparison, AVL trees are more balanced than red-black trees, but they
may cause more rotations during insertion and deletion.
Black height is the number of black nodes on a path from the root to a leaf. Leaf nodes are also
counted as black nodes.
From properties 3 and 4 above, we can derive that a
red-black tree of height h has black-height >= h/2.
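The notion of black height can be made concrete with a small checker. The sketch below uses the node structure given earlier and assumes that `true` encodes red and `false` encodes black (the notes do not fix an encoding). It verifies three standard red-black conditions: the root is black, no red node has a red child, and every path down to a NULL leaf passes the same number of black nodes. It does not check the BST ordering of the keys.

```c
#include <stdbool.h>
#include <stddef.h>

#define RED   true    /* assumed encoding of the bool color field */
#define BLACK false

struct Node {
    int data;
    bool color;
    struct Node *right_child;
    struct Node *left_child;
};

static bool is_red(const struct Node *n) {
    return n != NULL && n->color == RED;   /* NULL leaves count as black */
}

/* Returns the black height of the subtree rooted at n, counting the NULL
   leaf as one black node, or -1 if a red node has a red child or two
   paths contain different numbers of black nodes. */
int black_height(const struct Node *n) {
    if (n == NULL) return 1;
    if (is_red(n) && (is_red(n->left_child) || is_red(n->right_child)))
        return -1;                          /* two consecutive reds */
    int lh = black_height(n->left_child);
    int rh = black_height(n->right_child);
    if (lh < 0 || rh < 0 || lh != rh)
        return -1;                          /* violation below, or unequal paths */
    return lh + (is_red(n) ? 0 : 1);
}

/* True when the root is black and black_height finds no violation. */
bool is_red_black(const struct Node *root) {
    return !is_red(root) && black_height(root) > 0;
}
```

For a black root with two red children, `black_height` returns 2: one for the root and one for the NULL leaves.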
Advantages of Red Black Tree
Red-black trees are useful when insertions and deletions are relatively frequent.
Red-black trees are self-balancing, so these operations are guaranteed to be O(log n).
They have relatively low constants in a wide variety of scenarios.
In AVL tree insertion, we used rotation as a tool to rebalance the tree after an insertion caused
imbalance. In a red-black tree, we use two tools to do balancing:
Recoloring
Rotation
We try recoloring first; if recoloring doesn’t work, then we go for rotation. The detailed
algorithm follows. The algorithm has mainly two cases, depending upon the color of the uncle. If the
uncle is red, we do recoloring. If the uncle is black, we do rotations and/or recoloring.
Let x be the newly inserted node.
1) Perform standard BST insertion and make the color of the newly inserted node RED.
2) If x is the root, change the color of x to BLACK (the black height of the complete tree increases by 1).
3) Do the following if the color of x’s parent is not BLACK and x is not the root.
a) If x’s uncle is RED (the grandparent must be black, because a red node cannot have a red child)
i) Change the color of the parent and uncle to BLACK.
ii) Change the color of the grandparent to RED.
iii) Set x = x’s grandparent and repeat steps 2 and 3 for the new x.
b) If x’s uncle is BLACK, then there can be four configurations for x, x’s parent (p) and x’s
grandparent (g) (this is similar to the AVL tree)
i) Left Left Case (p is the left child of g and x is the left child of p)
ii) Left Right Case (p is the left child of g and x is the right child of p)
iii) Right Right Case (mirror of case i)
iv) Right Left Case (mirror of case ii)
Examples of Insertion :
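The insertion steps can be sketched in C as below. This is an illustrative sketch under stated assumptions: a `parent` pointer is added to the node structure so the uncle can be reached, `true` encodes red, duplicates are sent to the right, and the names `rotate_left`, `rotate_right`, `fix_insert` and `rb_insert` are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define RED   true    /* assumed encoding of the bool color field */
#define BLACK false

struct Node {
    int data;
    bool color;
    struct Node *left_child, *right_child;
    struct Node *parent;   /* assumed extra link, used to find the uncle */
};

/* Left rotation around x: x's right child y takes x's place. */
static void rotate_left(struct Node **root, struct Node *x) {
    struct Node *y = x->right_child;
    x->right_child = y->left_child;
    if (y->left_child != NULL) y->left_child->parent = x;
    y->parent = x->parent;
    if (x->parent == NULL) *root = y;
    else if (x == x->parent->left_child) x->parent->left_child = y;
    else x->parent->right_child = y;
    y->left_child = x;
    x->parent = y;
}

/* Right rotation around x: x's left child y takes x's place. */
static void rotate_right(struct Node **root, struct Node *x) {
    struct Node *y = x->left_child;
    x->left_child = y->right_child;
    if (y->right_child != NULL) y->right_child->parent = x;
    y->parent = x->parent;
    if (x->parent == NULL) *root = y;
    else if (x == x->parent->right_child) x->parent->right_child = y;
    else x->parent->left_child = y;
    y->right_child = x;
    x->parent = y;
}

/* Steps 2 and 3: repair the tree after x was inserted as a RED node. */
static void fix_insert(struct Node **root, struct Node *x) {
    while (x != *root && x->parent->color == RED) {
        struct Node *p = x->parent;
        struct Node *g = p->parent;   /* exists: a RED parent is never the root */
        bool p_is_left = (p == g->left_child);
        struct Node *uncle = p_is_left ? g->right_child : g->left_child;

        if (uncle != NULL && uncle->color == RED) {  /* 3a: recolor, move up */
            p->color = BLACK;
            uncle->color = BLACK;
            g->color = RED;
            x = g;
            continue;
        }
        if (p_is_left) {                             /* 3b: uncle is BLACK */
            if (x == p->right_child) {               /* Left Right case */
                rotate_left(root, p);
                p = x;
            }
            rotate_right(root, g);                   /* Left Left case */
        } else {
            if (x == p->left_child) {                /* Right Left case */
                rotate_right(root, p);
                p = x;
            }
            rotate_left(root, g);                    /* Right Right case */
        }
        p->color = BLACK;             /* p is now the root of this subtree */
        g->color = RED;
        break;
    }
    (*root)->color = BLACK;                          /* step 2 */
}

/* Step 1: standard BST insertion of a RED node, then repair.
   Returns false if allocation fails. */
bool rb_insert(struct Node **root, int data) {
    struct Node *x = malloc(sizeof *x);
    if (x == NULL) return false;
    x->data = data;
    x->color = RED;
    x->left_child = x->right_child = x->parent = NULL;

    struct Node *p = NULL, *cur = *root;
    while (cur != NULL) {
        p = cur;
        cur = (data < cur->data) ? cur->left_child : cur->right_child;
    }
    x->parent = p;
    if (p == NULL) *root = x;
    else if (data < p->data) p->left_child = x;
    else p->right_child = x;

    fix_insert(root, x);
    return true;
}
```

Inserting 10, 20, 30, 40 and 35 in that order exercises a Right Right rotation, a recoloring, and a Right Left rotation, and leaves a black root 20 whose right child 35 is black with red children 30 and 40.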
Deletion Operation of Red Black Trees:
Like insertion, deletion uses recoloring and rotations to maintain the red-black properties.
In the delete operation, we check the color of the sibling to decide the appropriate case.
The main property violated after insertion is two consecutive reds. In deletion, the main violated
property is the change of black height in subtrees, as deleting a black node may reduce the black
height of one root-to-leaf path.
Deletion is a fairly complex process. To understand deletion, the notion of double black is used.
When a black node is deleted and replaced by a black child, the child is marked as double black.
The main task then becomes converting this double black to single black.
Steps For Deletion:
1. Perform standard BST delete. When we perform the standard delete operation in a BST, we
always end up deleting a node which is either a leaf or has only one child (for a node with
two children, we copy the successor and then recursively call delete for the successor; the
successor is always a leaf node or a node with one child). So we only need to handle cases
where a node is a leaf or has one child. Let v be the node to be deleted and u be the child that
replaces v (note that u is NULL when v is a leaf, and the color of NULL is considered black).
2. Simple Case: If either u or v is red, we mark the replaced child as black (no change in black
height). Note that both u and v cannot be red, as v is the parent of u and two consecutive reds
are not allowed in a red-black tree.
3. If both u and v are black:
i) Mark u as double black (a black node was deleted and replaced by a black child).
ii) Do the following while the current node u is double black and it is not the root. Let the
sibling of the node be s.
(a): If sibling s is black and at least one of the sibling’s children is red, perform rotation(s).
Let the red child of s be r. This case can be divided into four subcases depending upon the
positions of s and r.
(i) Left Left Case (s is the left child of its parent and r is the left child of s, or both
children of s are red). This is the mirror of the right right case shown in the diagram below.
(ii) Left Right Case (s is the left child of its parent and r is the right child of s). This is
the mirror of the right left case shown in the diagram below.
(iii) Right Right Case (s is the right child of its parent and r is the right child of s, or both
children of s are red)
(iv) Right Left Case (s is the right child of its parent and r is the left child of s)
(b): If the sibling is black and both of its children are black, perform recoloring, and recur for
the parent if the parent is black.
In this case, if the parent was red, we do not need to recur for the parent; we can simply make it
black (red + double black = single black).
(c): If the sibling is red, perform a rotation to move the old sibling up, and recolor the old
sibling and the parent. The new sibling is always black (see the diagram below). This mainly
converts the tree to the black sibling case (by rotation) and leads to case (a) or (b). This case
can be divided into two subcases.
(i) Left Case (s is the left child of its parent). This is the mirror of the right case shown in
the diagram below. We right rotate the parent p.
(ii) Right Case (s is the right child of its parent). We left rotate the parent p.
iii) If u is the root, make it single black and return (the black height of the complete tree
reduces by 1).
Splay Tree :
Like AVL and red-black trees, a splay tree is also a self-balancing BST. The main idea of a splay
tree is to bring the recently accessed item to the root of the tree, which makes the recently
searched item accessible in O(1) time if it is accessed again. The idea is to use locality of
reference (in a typical application, 80% of the accesses are to 20% of the items). Imagine a
situation where we have millions or billions of keys and only a few of them are accessed
frequently, which is very likely in many practical applications.
All splay tree operations run in O(log n) time on average (in the amortized sense, over a sequence
of operations), where n is the number of entries in the tree. Any single operation can take
Theta(n) time in the worst case.
The worst-case time complexity of binary search tree (BST) operations like search, delete and
insert is O(n). The worst case occurs when the tree is skewed. We can bring the worst-case time
complexity down to O(log n) with AVL and red-black trees.
The operations on a splay tree are insert, delete and search.
Search Operation:
The search operation in a splay tree does the standard BST search and, in addition, splays
(moves a node to the root). If the search is successful, the node that is found is splayed and
becomes the new root. Otherwise, the last node accessed before reaching NULL is splayed and
becomes the new root.
The following cases arise for the node being accessed.
Node is root : We simply return the root and do nothing else, as the accessed node is already the
root.
Otherwise, rotations are performed:
Zig Rotation :
The Zig Rotation in a splay tree is similar to the single right rotation in AVL tree rotations. In
zig rotation, every node moves one position to the right from its current position.
Consider the following example...
Zag Rotation :
The Zag Rotation in a splay tree is similar to the single left rotation in AVL tree rotations. In
zag rotation, every node moves one position to the left from its current position.
Consider the following example...
Zig-Zig Rotation :
The Zig-Zig Rotation in a splay tree is a double zig rotation. In zig-zig rotation, every node
moves two positions to the right from its current position.
Consider the following example...
Zag-Zag Rotation :
The Zag-Zag Rotation in a splay tree is a double zag rotation. In zag-zag rotation, every node
moves two positions to the left from its current position.
Consider the following example...
Zig-Zag Rotation :
The Zig-Zag Rotation in a splay tree is a sequence of a zig rotation followed by a zag rotation. In
zig-zag rotation, every node moves one position to the right followed by one position to the left
from its current position.
Consider the following example...
Zag-Zig Rotation :
The Zag-Zig Rotation in a splay tree is a sequence of a zag rotation followed by a zig rotation. In
zag-zig rotation, every node moves one position to the left followed by one position to the right
from its current position.
Consider the following example...
Note: Every splay tree must be a binary search tree, but it need not be a balanced tree.
Example:
The important thing to note is that the search or splay operation not only brings the searched key
to the root, but also tends to balance the BST. For example, in the above case, the height of the
BST is reduced by 1.
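The search-with-splay behaviour can be sketched recursively in C. This is one common way to write it, not the only one; the names `SNode`, `splay` and `splay_search` are hypothetical. Following the naming used in these notes, a zig is a right rotation and a zag is a left rotation.

```c
#include <stddef.h>

struct SNode {
    int key;
    struct SNode *left, *right;
};

/* Zig: single right rotation around x; returns the new subtree root. */
static struct SNode *rotate_right(struct SNode *x) {
    struct SNode *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

/* Zag: single left rotation around x; returns the new subtree root. */
static struct SNode *rotate_left(struct SNode *x) {
    struct SNode *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Moves the node holding key to the root if it is present; otherwise moves
   the last node reached on the search path to the root. Returns the new root. */
struct SNode *splay(struct SNode *root, int key) {
    if (root == NULL || root->key == key) return root;

    if (key < root->key) {
        if (root->left == NULL) return root;          /* key is not in the tree */
        if (key < root->left->key) {                  /* left-left: two right rotations */
            root->left->left = splay(root->left->left, key);
            root = rotate_right(root);
        } else if (key > root->left->key) {           /* left-right: left, then right */
            root->left->right = splay(root->left->right, key);
            if (root->left->right != NULL)
                root->left = rotate_left(root->left);
        }
        return (root->left == NULL) ? root : rotate_right(root);
    } else {
        if (root->right == NULL) return root;         /* key is not in the tree */
        if (key > root->right->key) {                 /* right-right: two left rotations */
            root->right->right = splay(root->right->right, key);
            root = rotate_left(root);
        } else if (key < root->right->key) {          /* right-left: right, then left */
            root->right->left = splay(root->right->left, key);
            if (root->right->left != NULL)
                root->right = rotate_right(root->right);
        }
        return (root->right == NULL) ? root : rotate_left(root);
    }
}

/* Search is a splay; the key was found if the new root holds it. */
struct SNode *splay_search(struct SNode *root, int key) {
    return splay(root, key);
}
```

After `root = splay_search(root, key)`, the caller checks `root != NULL && root->key == key` to learn whether the key was found; in either case the tree has been restructured.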
Splay trees have become the most widely used basic data structure invented in the last 30
years, because they’re the fastest type of balanced search tree for many applications.
Splay trees are used in Windows NT (in the virtual memory, networking, and file system
code), the gcc compiler and GNU C++ library, the sed string editor, Fore Systems network
routers, the most popular implementation of Unix malloc, Linux loadable kernel modules,
and in many other software packages.