
UNIT - 4

Tree

Er. Asim Ahmad


Assistant Professor, CSE
Shri Ramswaroop Memorial College of Engineering & Management
WHAT IS THE TREE DATA STRUCTURE?
A tree data structure is a hierarchical data structure in which nodes are connected to each other in a
parent-child relationship. In a tree, there is a single node called the root, which has no parent, and every
other node has exactly one parent. Nodes that have the same parent are called siblings. Each node in a
tree can have any number of children, but there are some types of trees that have specific constraints
on the maximum number of children a node can have. For example, a binary tree is a tree in which each
node can have at most two children.

A TYPICAL TREE STRUCTURE

Other data structures such as arrays, linked lists, stacks, and queues are linear data structures that store
data sequentially. Searching an unsorted linear data structure takes time proportional to the number of
elements, which becomes too slow as the data size grows. Because a tree is a non-linear data structure,
suitably organized trees (for example, balanced search trees) can locate an element in time proportional to
the height of the tree rather than to its size.
Trees are used to represent a wide range of structures, such as file systems, organization charts, family
trees, and data structures such as heaps, tries, and search trees. The hierarchical structure of a tree
allows for efficient searching, sorting, and insertion operations, making them a fundamental data
structure in computer science.

Tree Terminologies
In data structures, a tree is a data structure that represents a hierarchical structure. Each node in a tree
has a parent node (except the root) and zero or more child nodes. Here are some common terminologies
related to trees in data structures:

 Node: A single element in a tree, which contains data and references to its child nodes and parent
node (except the root node). The last nodes of each path are called leaf nodes or external nodes
that do not contain a link/pointer to child nodes.

 Edge: A link between a parent node and one of its child nodes. The edges of a tree are also known
as branches. A tree with 'n' nodes has 'n-1' edges, because every node except the root has exactly
one edge connecting it to its parent.
NODES AND EDGES OF A TREE

 Root: The topmost node of a tree, which has no parent.

 Parent: A node that has one or more child nodes.

 Child: A node that has a parent node.

 Leaf: A node that has no child nodes. Leaves are the last nodes on a tree. They are nodes without
children. Like real trees, we have the root, branches, and finally leaves.

 Sibling: Nodes that have the same parent are called siblings. Siblings are always at the same
distance from the root.

 Depth: The depth of a node is the number of edges on the path from the root node to that node,
so the depth of the root node is 0. The number of edges on the longest path from the root node to
a leaf node is known as the "Depth of the Tree".

 Height: The height of a node is the number of edges on the longest path from that node down to a
leaf node in its subtree. The height of a leaf is 0, and the height of the tree is the height of its
root.

HEIGHT AND DEPTH OF EACH NODE IN A TREE

 Degree of a Node: The degree of a node is the number of children (outgoing branches) of that node.

 Forest: A collection of disjoint trees is called a forest. You can create a forest by removing the root
of a tree: each subtree of the root then becomes a separate tree.

CREATING A FOREST FROM A TREE

 Binary Tree: A tree where each node has at most two child nodes.

 Binary Search Tree (BST): A binary tree where every value in the left subtree of a node is less
than the node's value, and every value in the right subtree of a node is greater than the node's
value.

 AVL Tree: A self-balancing binary search tree. The height of a tree is defined as the number of
edges on the longest path from the root to a leaf node. In an AVL tree, the difference in height
between the left and right subtrees of any node is at most one. This difference is called the balance
factor of the node. If the balance factor becomes greater than one or less than negative one, the
tree is unbalanced, and the AVL tree performs a rotation operation to rebalance itself.

 Red-Black Tree: A self-balancing binary search tree that maintains a balance between the depth
of its left and right subtrees by ensuring that the paths from the root to any leaf node have the
same number of black nodes.

Binary Tree Representation:


A binary tree is a non-linear data structure of the tree type that has a maximum of two children for every
parent node. The node at the top of the entire binary tree is called the root node. Conceptually, every
node of a binary tree has a left reference, a right reference, and a data element. There are two ways in
which we can implement binary trees. These are as follows:

 Array Representation: One common way to represent a binary tree as an array is to use a
breadth-first-order traversal and fill the array level by level. The root node is placed in the first
index of the array, and its left and right children are placed in the second and third indices,
respectively. Then, the next level of the tree is filled in from left to right, with each node's children
being placed in consecutive indices. If a node has no child, its corresponding index in the array is
marked as empty (e.g., by using a special value like null or -1).
In the numbering used here, the nodes are visited level by level starting from the root node and
from left to right within each level, and each node is labelled with the array index it belongs to,
starting from 0. If p is the index of a parent element, then its left child is stored at index
2p + 1, and its right child is stored at index 2p + 2.

And now we can simply make an array of length 7 and store these elements at their
corresponding indices.

The second method is called the linked representation of binary trees. Do not confuse it with the
linked lists you have studied: a linked list is a linear data structure, whereas here every node
can link to two different successors.

 Pointer (Linked List) Representation:


In a pointer (linked list) representation of a binary tree, each node of the tree is represented using
a node structure that contains a pointer to the node's left child, a pointer to its right child, and
the node's data value. Recall that a doubly linked list node carries two pointers so that the list can
be traversed both to the left and to the right. A binary tree node is built the same way, except that
its two pointers lead to the left child and the right child. Follow the below representation of a
node in the linked representation of a binary tree.

You can see how closely this representation resembles a real tree node, unlike the array
representation where all the nodes are flattened into a single linear sequence. We can now easily
transform the whole tree into its linked representation, which looks just the way we would draw
the tree on paper.
Kinds of Binary Tree:
A binary tree is a tree data structure in which each node has at most two children, referred to as the
left child and the right child. There are several kinds of binary trees, some of the most common ones
are:
 Binary Search Tree:
A binary search tree (BST) is a type of data structure used in computer science and programming.
It is a tree structure where each node has at most two children, and the left subtree of a node
contains only values less than the node's value, while the right subtree of a node contains only
values greater than the node's value.

BINARY SEARCH TREE

The key feature of a binary search tree is that it allows for efficient searching, insertion, and
deletion operations, with an average time complexity of O(log n), where n is the number of nodes
in the tree. This is because, when the tree is reasonably balanced, each comparison discards about
half of the remaining search space. The following are the properties of a binary search tree:

o It is a type of binary tree.
o All nodes of the left subtree are lesser than the node itself.
o All nodes of the right subtree are greater than the node itself.
o The left and right subtrees are also binary search trees.
o There are no duplicate nodes.

Having discussed all the properties, check whether the above binary tree is a binary search tree
or not. The answer is no. The left subtree of the root node contains an element that is greater
than the root node, which violates the second property, so it is not a binary search tree.

Searching in Binary search tree

Searching means to find or locate a specific element or node in a data structure. In Binary search tree,
searching a node is easy because elements in BST are stored in a specific order. The steps of searching a
node in Binary Search tree are listed as follows -

1. First, compare the element to be searched with the root element of the tree.
2. If root is matched with the target element, then return the node's location.
3. If it is not matched, then check whether the item is less than the root element. If it is smaller
than the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until the match is found or an empty subtree is reached.
6. If the element is not found or not present in the tree, then return NULL.

Algorithm to search an element in Binary search tree

Search (root, item)

Step 1 - if (root = NULL) or (item = root → data)
             return root
         else if (item < root → data)
             return Search(root → left, item)
         else
             return Search(root → right, item)
         END if
Step 2 - END
Deletion in Binary Search tree

In a binary search tree, we must delete a node in such a way that the property of BST is not violated.
When deleting a node from a BST, one of three possible situations can occur:

o The node to be deleted is a leaf node, or
o The node to be deleted has only one child, or
o The node to be deleted has two children

When the node to be deleted is the leaf node

This is the simplest case of deleting a node in BST. Here, we have to replace the leaf node with NULL
(in its parent) and simply free the allocated space.

We can see the process of deleting a leaf node from BST in the below image. In the below image, suppose
we have to delete node 90. As the node to be deleted is a leaf node, it will be replaced with NULL, and
the allocated space will be freed.

When the node to be deleted has only one child

In this case, we have to replace the target node with its child, and then delete the child node. It means
that after swapping the values of the target node and its child node, the child node will contain the
value to be deleted. So, we simply have to replace the child node with NULL and free up the allocated
space. If the child has subtrees of its own, the same result is obtained by linking the parent of the
target node directly to the target's only child.

We can see the process of deleting a node with one child from BST in the below image. In the below
image, suppose we have to delete the node 79. As the node to be deleted has only one child, it will be
replaced with its child 55.

So, the value 79 will now sit in a leaf node that can be easily deleted.
When the node to be deleted has two children

This case of deleting a node in BST is more complex than the other two cases. In such a case, the steps
to be followed are listed as follows -

o First, find the inorder successor of the node to be deleted.
o After that, swap the value of the node with the value of its inorder successor, so that the value
to be deleted moves down to the successor's position. If that position is not a leaf, it has only a
right child and is handled as in the one-child case.
o And at last, replace the node with NULL and free up the allocated space.

The inorder successor is used when the right subtree of the node is not empty, which is always true
for a node with two children. We can obtain the inorder successor by finding the minimum element in
the right subtree of the node.

We can see the process of deleting a node with two children from BST in the below image. In the below
image, suppose we have to delete node 45, which is the root node. As the node to be deleted has two
children, it will be replaced with its inorder successor. Now, the value 45 will be at a leaf of the
tree so that it can be deleted easily.
Insertion in Binary Search tree

A new key in BST is always inserted at the leaf. To insert an element in BST, we have to start searching
from the root node; if the key to be inserted is less than the root node, then search for an empty
location in the left subtree. Else, search for the empty location in the right subtree and insert the data.
Insertion in BST is similar to searching, as we always have to maintain the rule that the left subtree is
smaller than the root, and the right subtree is larger than the root.

Now, let's see the process of inserting a node into BST using an example.

The complexity of the Binary Search tree: 'n' is the number of nodes in the given tree.

1. Time Complexity

Operation    Best case    Average case    Worst case
Insertion    O(log n)     O(log n)        O(n)
Deletion     O(log n)     O(log n)        O(n)
Search       O(log n)     O(log n)        O(n)

2. Space Complexity

Operation    Space complexity
Insertion    O(n)
Deletion     O(n)
Search       O(n)

 Strictly Binary Tree:


A strictly binary tree is a binary tree in which each node contains either 0 or 2 children. It is
also called a full binary tree: every node except the leaf nodes must contain exactly 2 children.
Here are some key characteristics of a strictly binary tree:

o Every non-leaf node in a strictly binary tree has exactly two children.
o The number of leaf nodes is one more than the number of non-leaf nodes.
o A strictly binary tree of height h (counted in edges) has at least 2h + 1 nodes and at most
2^(h+1) - 1 nodes.
o The maximum is reached only when all leaf nodes are at the same level.

Strictly binary trees are commonly used in computer science and data structures. They provide
an efficient way to store and retrieve data and can be used for a variety of applications, such as
representing hierarchical data structures or searching for data in sorted order.

DIFFERENT KINDS OF BINARY TREE


 Complete Binary Tree:
A complete binary tree is a binary tree in which all levels of the tree are completely filled, except
possibly the last level, which is filled from left to right. More formally, a complete binary tree
satisfies both of the following conditions:

o Every level of the tree, except possibly the last one, is completely filled with nodes.
o All nodes on the last level are as far left as possible.

In other words, a complete binary tree is a binary tree where all levels except the last are
completely filled, and in the last level, nodes are filled from left to right without any gaps. This
means that a node can never have a right child without also having a left child, and at most one
node in the tree has exactly one child.
 Extended Binary Tree:
The extended binary tree is a type of binary tree in which all the null subtrees of the original tree
are replaced with special nodes called external nodes whereas other nodes are called internal
nodes.

Here the circles represent the internal nodes and the boxes represent the external nodes.
Properties of Extended binary tree

o The nodes from the original tree are internal nodes and the special nodes are external
nodes.
o All external nodes are leaf nodes and the internal nodes are non-leaf nodes.

Every internal node has exactly two children and every external node is a leaf, so the result is
a strictly binary tree.

Threaded Binary trees


In the linked representation of binary trees, more than one half of the link fields contain NULL values
(a tree with n nodes has 2n link fields, of which n + 1 are NULL), which results in wastage of storage
space. To use this space effectively, the NULL links are replaced with special links known as threads.
Such binary trees with threads are known as threaded binary trees. Each link field of a node in a
threaded binary tree contains either a link to a child node or a thread to another node in the tree,
and a flag stored with the link tells the two apart. Threading reduces the memory wastage by pointing
the NULL links of a node to its in-order predecessor or in-order successor.
So the basic idea of a threaded binary tree is that for the nodes whose right pointer is null, we store the
in-order successor of the node (if it exists), and for the nodes whose left pointer is null, we store the
in-order predecessor of the node (if it exists). One thing to note is that the left pointer of the leftmost
node and the right pointer of the rightmost node always remain null, as their in-order predecessor and
successor do not exist.
 AVL Binary Tree:
An AVL tree is a self-balancing binary search tree that was named after its inventors,
Adelson-Velskii and Landis. It was the first such data structure to be invented, and it guarantees
that the height of the tree is always logarithmic with respect to the number of nodes.
In an AVL tree, each node has at most two children, and each node is assigned a balance factor,
which is the height of its right subtree minus the height of its left subtree. The balance factor is
always -1, 0, or 1, and if the balance factor of a node is not within this range, the tree needs to be
rebalanced. To maintain balance, the AVL tree performs rotations to change the structure of the
tree. There are four types of rotations: left rotation, right rotation, left-right rotation, and
right-left rotation. These rotations are used to adjust the balance factors of nodes in the tree so
that they fall within the acceptable range.
In the tree shown above, none of the nodes has a balance factor of more than 1 or less than -1, so
it is an AVL tree. If a tree is not an AVL tree, then we have to apply a rotation suited to the
situation, after which it becomes an AVL tree. So, rotation is very important for making AVL trees
and it is of four types:

o LL Rotation: The name LL comes from the fact that the new element was inserted into the left
subtree of the left child of the unbalanced node. In this rotation technique, you simply rotate
the tree one time in the clockwise direction as shown below:

o RR Rotation: The name RR comes from the fact that the new element was inserted into the right
subtree of the right child of the unbalanced node. In this rotation technique, you simply rotate
the tree one time in the anticlockwise direction as shown below:
o LR Rotation: The name LR comes from the fact that the new element was inserted into the right
subtree of the left child of the unbalanced node. This rotation takes two steps: first rotate
the left subtree in the anticlockwise direction, and then rotate the whole tree in the clockwise
direction. Follow the two steps illustrated below:
o RL Rotation: The name RL comes from the fact that the new element was inserted into the left
subtree of the right child of the unbalanced node. It mirrors the LR rotation: first rotate the
right subtree in the clockwise direction, and then rotate the whole tree in the anticlockwise
direction. Follow the two steps illustrated below:

 B Tree:
Binary search trees become impractical when a large amount of data has to be stored and searched on
disk: each node holds only one key, so the tree is tall, and following each level can cost a
separate disk access. B-Trees are a type of self-balancing tree that was specifically designed to
overcome these limitations.
A B-tree is a special type of self-balancing search tree in which each node can contain more than
one key and can have more than two children. It is a generalized form of the binary search tree.
It is also known as a height-balanced m-way tree.
B-TREE

Each node in a B-Tree can contain multiple keys, which allows the tree to have a larger branching
factor and thus a shallower height. This shallow height leads to less disk I/O, which results in
faster search and insertion operations. B-Trees are particularly well suited for storage systems
that have slow, bulky data access such as hard drives, flash memory, and CD-ROMs.

Tree Traversal algorithms:


A binary tree is a hierarchical data structure in which each node has at most two children, referred to as
the left child and the right child. Each node of the tree has a value or data associated with it. Traversing
a tree means visiting every node in the tree. You might, for instance, want to add all the values in the
tree or find the largest one. For all these operations, you will need to visit each node of the tree. Linear
data structures like arrays, stacks, queues, and linked lists have only one way to read the data. But a
hierarchical data structure like a tree can be traversed in different ways.

TREE HAVING ITS INORDER, PRE-ORDER AND POST-ORDER TRAVERSAL


Inorder Traversal: In the case of binary search trees (BST), Inorder traversal gives nodes in non-
decreasing order. To get nodes of BST in non-increasing order, a variation of Inorder traversal where
Inorder traversal is reversed can be used. The algorithm of Inorder tree traversal is:

 Traverse the left subtree, i.e., call Inorder(left->subtree)


 Visit the root.
 Traverse the right subtree, i.e., call Inorder(right->subtree)

// C program for different tree traversals

#include <stdio.h>
#include <stdlib.h>

/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;

    return (node);
}

/* Given a binary tree, print its nodes in inorder */
void printInorder(struct node* node)
{
    if (node == NULL)
        return;

    /* first recur on left child */
    printInorder(node->left);

    /* then print the data of node */
    printf("%d ", node->data);

    /* now recur on right child */
    printInorder(node->right);
}

/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);

    // Function call
    printf("\nInorder traversal of binary tree is \n");
    printInorder(root);

    getchar();
    return 0;
}

Preorder Traversal: In this traversal method, the root node is visited first, then the left subtree and finally
the right subtree. The algorithm of Preorder tree traversal is:

 Visit the root.
 Traverse the left subtree, i.e., call Preorder(left->subtree)
 Traverse the right subtree, i.e., call Preorder(right->subtree)

// C program for different tree traversals

#include <stdio.h>
#include <stdlib.h>

/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;

    return (node);
}

/* Given a binary tree, print its nodes in preorder */
void printPreorder(struct node* node)
{
    if (node == NULL)
        return;

    /* first print data of node */
    printf("%d ", node->data);

    /* then recur on left subtree */
    printPreorder(node->left);

    /* now recur on right subtree */
    printPreorder(node->right);
}

/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);

    // Function call
    printf("\nPreorder traversal of binary tree is \n");
    printPreorder(root);

    getchar();
    return 0;
}

Postorder Traversal: In this traversal method, the root node is visited last, hence the name. First we
traverse the left subtree, then the right subtree and finally the root node. The algorithm is as follows:

 Traverse the left subtree, i.e., call Postorder(left->subtree)


 Traverse the right subtree, i.e., call Postorder(right->subtree)
 Visit the root

// C program for different tree traversals

#include <stdio.h>
#include <stdlib.h>

/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;

    return (node);
}

/* Given a binary tree, print its nodes according to the "bottom-up" postorder traversal. */
void printPostorder(struct node* node)
{
    if (node == NULL)
        return;

    // first recur on left subtree
    printPostorder(node->left);

    // then recur on right subtree
    printPostorder(node->right);

    // now deal with the node
    printf("%d ", node->data);
}

/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);

    // Function call
    printf("\nPostorder traversal of binary tree is \n");
    printPostorder(root);

    getchar();
    return 0;
}

Huffman coding using Binary Tree

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to
input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding
characters: the most frequent characters get the shortest codes. Huffman codes are variable-length and
prefix-free (that means no code is a prefix of any other), so an encoded bit stream can be decoded without
ambiguity. Any prefix-free binary code can be visualized as a binary tree with the encoded characters
stored at the leaves.

A Huffman tree, or Huffman coding tree, is defined as a full binary tree in which each leaf of the tree
corresponds to a letter in the given alphabet.

The Huffman tree is the binary tree with minimum external path weight, that is, the one with the minimum
sum of weighted path lengths for the given set of leaves. So the goal is to construct a tree with the
minimum external path weight.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is Huffman
Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is
used as a priority queue. The value of the frequency field is used to compare two nodes in the min heap.
Initially, the least frequent character is at the root.)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the
first extracted node its left child and the other extracted node its right child. Add this node to the
min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and
the tree is complete.

Example: Letter frequency table

Letter       z    k    m    c    u    d    l    e
Frequency    2    7    24   32   37   42   42   120

Huffman code

Letter    Freq    Code      Bits
e         120     0         1
d         42      101       3
l         42      110       3
u         37      100       3
c         32      1110      4
m         24      11111     5
k         7       111101    6
z         2       111100    6
The Huffman tree (for the above example) is given below:


Time complexity: O(n log n), where n is the number of unique characters.
Space complexity: O(n)

Applications of Huffman Coding:


1. Huffman codes are used for transmitting fax and text.
2. They are used by conventional compression formats like PKZIP, GZIP, etc.
3. Multimedia codecs like JPEG, PNG, and MP3 use Huffman encoding (to be more precise, prefix codes).
4. Huffman coding is useful in cases where some characters occur much more frequently than others.
