DM Unit 1


Trees:

Trees are graphs that do not contain even a single cycle. They represent hierarchical structure in
a graphical form. Trees belong to the simplest class of graphs. Despite their simplicity, they
have a rich structure.

Trees have applications ranging from something as simple as a family tree to something as complex
as the tree data structures of computer science.

A connected acyclic graph is called a tree. In other words, a connected graph with no cycles is
called a tree. The edges of a tree are known as branches, and its vertices are called nodes. In a
rooted tree, the nodes without child nodes are called leaf nodes. A tree with ‘n’ vertices has
‘n-1’ edges. If one more edge is added, it must join two vertices that are already connected by a
path, which creates a cycle. The graph then becomes cyclic and is no longer a tree.
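
The definition above can be checked mechanically. The following is a minimal sketch, assuming the vertices are numbered 0 to n-1 and the graph is given as a list of undirected edges; the function name `is_tree` is chosen here for illustration. It relies on the fact that a graph with exactly n-1 edges is a tree exactly when it is connected.

```python
def is_tree(n, edges):
    """Return True if the undirected graph on vertices 0..n-1 is a tree."""
    if n == 0 or len(edges) != n - 1:
        return False
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # With exactly n-1 edges, the graph is a tree if and only if it is connected.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


path = [(0, 1), (1, 2), (2, 3)]           # a path on four vertices: a tree
triangle = [(0, 1), (1, 2), (2, 0)]       # three edges, but vertex 3 is isolated
print(is_tree(4, path))                   # True
print(is_tree(4, triangle))               # False: has a cycle and is disconnected
print(is_tree(4, path + [(3, 0)]))        # False: the extra edge closes a cycle
```

The second call shows that counting edges alone is not enough: the graph has n-1 edges but is neither connected nor acyclic.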

Example 1

The graph shown here is a tree because it has no cycles and it is connected. It has four vertices
and three edges, i.e., for ‘n’ vertices ‘n-1’ edges as mentioned in the definition.
Example 2

In the above example, the vertices ‘a’ and ‘d’ have degree one, and the other two vertices ‘b’ and
‘c’ have degree two. This is no coincidence: a tree with at least two vertices always has at least
two vertices of degree one, because otherwise a cycle would be formed.

Forest

A disconnected acyclic graph is called a forest. In other words, a disjoint collection of trees is
called a forest.

Example

The following graph looks like two sub-graphs; but it is a single disconnected graph. There are
no cycles in this graph. Hence, clearly it is a forest.

Spanning Trees

Let G be a connected graph, then the sub-graph H of G is called a spanning tree of G if −

H is a tree

H contains all vertices of G.


Equivalently, a spanning tree T of an undirected graph G is a subgraph that is a tree and includes
all of the vertices of G.

In the above example, G is a connected graph and H is a sub-graph of G.

Clearly, the graph H has no cycles; it is a tree with six edges, which is one less than the total
number of vertices. Hence H is a spanning tree of G.
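
One common way to obtain a spanning tree of a connected graph is a breadth-first search that keeps only the edges through which a new vertex is first reached. The sketch below assumes an adjacency-list dictionary; the small graph is hypothetical and is not the graph G from the example above.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Return the edges of a spanning tree of a connected graph."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                # First time v is reached: keep the edge, so no cycle can form.
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges


# Hypothetical connected graph: a 4-cycle a-b-c-d with the chord a-c.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}
tree = bfs_spanning_tree(graph, "a")
print(tree)  # three edges covering all four vertices
```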

Properties of trees

A tree is an undirected graph G that satisfies any of the following equivalent conditions:

1. G is connected and has no cycles.


2. G is acyclic, and a simple cycle is formed if any edge is added to G.
3. G is connected, but is not connected if any single edge is removed from G.
4. G is connected and the 3-vertex complete graph K3 is not a minor of G.
5. Any two vertices in G can be connected by a unique simple path.
Binary Tree

A binary tree is made of nodes, where each node contains a "left" reference, a "right" reference,
and a data element. The topmost node in the tree is called the root. Every node (excluding the
root) is connected by a directed edge from exactly one other node, called its parent. In a general
tree a node can be connected to an arbitrary number of nodes, called its children; in a binary tree
a node has at most two. Nodes with no children are called leaves, or external nodes. Nodes which
are not leaves are called internal nodes. Nodes with the same parent are called siblings.
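
The node layout described above can be sketched as a small class. The names `Node`, `is_leaf` and `count_leaves` are illustrative choices, not taken from the notes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    data: Any
    left: Optional["Node"] = None     # "left" reference
    right: Optional["Node"] = None    # "right" reference

    def is_leaf(self):
        """A leaf (external node) has no children."""
        return self.left is None and self.right is None


def count_leaves(node):
    """Count the leaves in the subtree rooted at node."""
    if node is None:
        return 0
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


# A is the root; B and C are siblings; D, E and C are leaves.
root = Node("A", Node("B", Node("D"), Node("E")), Node("C"))
print(count_leaves(root))  # 3
```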

Tree Traversal Algorithms:

Traversal is the process of visiting all the nodes of a tree, possibly printing their values as
well. Because all nodes are connected via edges (links), we always start from the root (head)
node; we cannot randomly access a node in a tree. There are three ways to traverse a tree −

1. In-order Traversal

2. Pre-order Traversal

3. Post-order Traversal

Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the
values it contains.
In-order Traversal

In this traversal method, the left subtree is visited first, then the root and later the right sub-tree.
We should always remember that every node may represent a subtree itself.

If a binary search tree is traversed in-order, the output will produce the key values sorted in
ascending order.

We start from A, and following in-order traversal, we move to its left subtree B. B is also
traversed in-order. The process goes on until all the nodes are visited. The output of in-order
traversal of this tree will be −

D→B→E→A→F→C→G
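
A minimal recursive sketch of in-order traversal follows. It assumes a tree is written as a nested tuple `(value, left, right)` with `None` for an empty subtree; the tree shape is the one implied by the traversal outputs in this section (A at the root, B and C as its children, D, E, F, G as leaves).

```python
def in_order(tree):
    """Yield values in left subtree, root, right subtree order."""
    if tree is None:
        return
    value, left, right = tree
    yield from in_order(left)
    yield value
    yield from in_order(right)


def leaf(value):
    return (value, None, None)


tree = ("A", ("B", leaf("D"), leaf("E")), ("C", leaf("F"), leaf("G")))
print("→".join(in_order(tree)))  # D→B→E→A→F→C→G
```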
Pre-order Traversal

In this traversal method, the root node is visited first, then the left subtree and finally the right
subtree.

We start from A, and following pre-order traversal, we first visit A itself and then move to its
left subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited.
The output of pre-order traversal of this tree will be −

A→B→D→E→C→F→G
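
Pre-order traversal can also be written without recursion by keeping the pending subtrees on an explicit stack. This is a sketch under the same assumption as before: a tree is a nested tuple `(value, left, right)`, with `None` for an empty subtree.

```python
def pre_order(tree):
    """Return values in root, left, right order using an explicit stack."""
    output = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        value, left, right = node
        output.append(value)
        # Push the right subtree first so the left subtree is popped first.
        stack.append(right)
        stack.append(left)
    return output


def leaf(value):
    return (value, None, None)


tree = ("A", ("B", leaf("D"), leaf("E")), ("C", leaf("F"), leaf("G")))
print("→".join(pre_order(tree)))  # A→B→D→E→C→F→G
```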

Post-order Traversal

In this traversal method, the root node is visited last, hence the name. First we traverse the left
subtree, then the right subtree and finally the root node.

We start from A, and following post-order traversal, we first visit the left subtree B. B is also
traversed post-order. The process goes on until all the nodes are visited. The output of
post-order traversal of this tree will be −

D→E→B→F→G→C→A
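
A short recursive sketch of post-order traversal, again assuming the nested-tuple form `(value, left, right)` for a tree:

```python
def post_order(tree):
    """Return values in left subtree, right subtree, root order."""
    if tree is None:
        return []
    value, left, right = tree
    return post_order(left) + post_order(right) + [value]


def leaf(value):
    return (value, None, None)


tree = ("A", ("B", leaf("D"), leaf("E")), ("C", leaf("F"), leaf("G")))
print("→".join(post_order(tree)))  # D→E→B→F→G→C→A
```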

Binary Search Tree

A Binary Search Tree (BST) is a binary tree in which every node satisfies the following
properties −

1. Every key in the left sub-tree of a node is less than or equal to the node's key.

2. Every key in the right sub-tree of a node is greater than the node's key.

Thus, a BST divides the sub-trees of each node into two segments, the left sub-tree and the right
sub-tree, and can be defined as −

left_subtree (keys) ≤ node (key) < right_subtree (keys)


Representation

BST is a collection of nodes arranged in a way where they maintain BST properties. Each node
has a key and an associated value. While searching, the desired key is compared to the keys in
BST and if found, the associated value is retrieved.

Following is a pictorial representation of BST −

We observe that the root node key (27) has all less-valued keys on the left sub-tree and the
higher valued keys on the right sub-tree.
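
Insertion and search in a BST can be sketched as follows. Only the root key 27 comes from the text; the other keys and the stored values are made up for illustration, as are the names `BSTNode`, `insert`, `search` and `keys_in_order`.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def insert(node, key, value=None):
    """Insert key into the subtree rooted at node and return the subtree root."""
    if node is None:
        return BSTNode(key, value)
    if key <= node.key:      # keys less than or equal to the node go left
        node.left = insert(node.left, key, value)
    else:                    # keys greater than the node go right
        node.right = insert(node.right, key, value)
    return node


def search(node, key):
    """Return the value stored under key, or None if the key is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


def keys_in_order(node):
    """In-order traversal; for a BST this lists the keys in ascending order."""
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)


root = None
for key in [27, 14, 35, 10, 19, 31, 42]:   # hypothetical keys apart from 27
    root = insert(root, key, f"value-{key}")

print(search(root, 19))       # value-19
print(search(root, 50))       # None
print(keys_in_order(root))    # [10, 14, 19, 27, 31, 35, 42]
```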

Decision Tree:

A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf
node holds a class label. The topmost node in the tree is the root node.

The following decision tree is for the concept buy_computer that indicates whether a customer
at a company is likely to buy a computer or not. Each internal node represents a test on an
attribute. Each leaf node represents a class.
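
Since the figure is not reproduced here, the sketch below uses a hypothetical buy_computer tree: the attribute names (`age`, `student`, `credit_rating`), their values and the class labels are assumptions made for illustration only. Internal nodes are dictionaries holding an attribute test and one branch per outcome; leaves are plain class labels.

```python
# Hypothetical tree: attributes, outcomes and labels are illustrative only.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "middle_aged": "yes",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def classify(node, record):
    """Follow test outcomes from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"age": "youth", "student": "yes"}))              # yes
print(classify(tree, {"age": "senior", "credit_rating": "excellent"})) # no
```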
The benefits of having a decision tree are as follows −

1. It does not require any domain knowledge.


2. It is easy to comprehend.
3. The learning and classification steps of a decision tree are simple and fast.

Prefix Codes:

A prefix code is most easily represented by a binary tree in which the external nodes are labeled
with single characters that are combined to form the message. The encoding for a character is
determined by following the path down from the root of the tree to the external node that holds
that character: a 0 bit identifies a left branch in the path, and a 1 bit identifies a right branch. In
the following tree, black circles are internal nodes and gray squares are external nodes. The code
for b is 111, because the external node holding b is reached from the root by taking 3 consecutive
right branches. The other codes are given in the table below.
character   encoding
---------   --------
a           0
b           111
c           1011
d           1010
r           110
!           100
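
The table can be used directly for encoding and decoding. In this sketch the message "abracadabra!" is a made-up example built from the six characters in the table. Decoding reads bits one at a time and emits a character as soon as the accumulated bits match a code word, which is safe because no code word is a prefix of another.

```python
CODES = {"a": "0", "b": "111", "c": "1011", "d": "1010", "r": "110", "!": "100"}


def encode(message):
    return "".join(CODES[ch] for ch in message)


def decode(bits):
    """Decode by accumulating bits until they match a complete code word."""
    by_code = {code: ch for ch, code in CODES.items()}
    output, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            output.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(output)


bits = encode("abracadabra!")
print(len(bits))      # 28 bits for 12 characters
print(decode(bits))   # abracadabra!
```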
Huffman coding:

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length
codes to input characters, where the lengths of the assigned codes are based on the frequencies of
the corresponding characters. The most frequent character gets the shortest code and the least
frequent character gets the longest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit
sequences) are assigned in such a way that the code assigned to one character is never a prefix of
the code assigned to any other character. This is how Huffman coding makes sure that there is no
ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d,
and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to
ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the
compressed bit stream is 0001, the decompressed output may be “cccd”, “ccb”, “cad”, “acd” or “ab”.
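
The ambiguity can be made concrete by enumerating every way of splitting the bit stream into code words. This is a small brute-force sketch; the function name `all_decodings` is illustrative.

```python
def all_decodings(bits, codes):
    """Return every way to split bits into code words from codes."""
    if not bits:
        return [""]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            for rest in all_decodings(bits[len(code):], codes):
                results.append(ch + rest)
    return results


codes = {"a": "00", "b": "01", "c": "0", "d": "1"}
decodings = sorted(all_decodings("0001", codes))
print(decodings)  # ['ab', 'acd', 'cad', 'ccb', 'cccd']
```

With a prefix code the same function would return exactly one decoding for any validly encoded stream.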


Huffman coding has two major parts:


1) Build a Huffman Tree from input characters.
2) Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree


The input is an array of unique characters along with their frequencies of occurrence, and the
output is the Huffman tree.

1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min
heap is used as a priority queue, and the value of the frequency field is used to compare two
nodes. Initially, the least frequent character is at the root.)

2. Extract the two nodes with the minimum frequency from the min heap.

3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies.
Make the first extracted node its left child and the other extracted node its right child. Add
this node to the min heap.

4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root
node and the tree is complete.
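
The four steps above can be sketched with Python's `heapq` module as the min heap. This is one possible implementation, not the only one: it assumes a non-empty dictionary of frequencies whose symbols are not tuples, and it uses a running counter as a tie-breaker so that the heap never has to compare two subtrees.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Map each symbol to its bit string, given {symbol: frequency}."""
    tie = count()  # tie-breaker so the heap never compares trees directly
    # Step 1: one heap entry per symbol; a tree is a symbol or a (left, right) pair.
    heap = [(freq, next(tie), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    # Steps 2 to 4: merge the two least frequent nodes until one tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # first extracted: left child
        f2, _, right = heapq.heappop(heap)    # second extracted: right child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")     # 0 for the left branch
            assign(tree[1], prefix + "1")     # 1 for the right branch
        else:
            codes[tree] = prefix or "0"       # a lone symbol still gets one bit

    assign(heap[0][2], "")
    return codes


weights = [8, 9, 10, 11, 13, 15, 22]
codes = huffman_codes({w: w for w in weights})
for w in weights:
    print(w, codes[w])
```

For the weights 8, 9, 10, 11, 13, 15, 22 this gives a 2-bit code to 22 and 3-bit codes to the rest. The exact bits depend on which child is placed on the left, so they may differ from a hand-drawn tree even though the code lengths agree.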

Example:

Construct an optimal binary prefix code for the weights 8, 9, 10, 11, 13, 15, 22.

Following are the steps for construction of the Huffman tree

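1. Combine the two smallest weights, 8 and 9, into a node of weight 17.
2. Combine 10 and 11 into a node of weight 21.
3. Combine 13 and 15 into a node of weight 28.
4. Combine 17 and 21 into a node of weight 38.
5. Combine 28 and 22 into a node of weight 50.
6. Combine 38 and 50 into the root, of weight 88.

Assign 0 to the left and 1 to the right

88
  0: 38
    0: 17
      0: 8
      1: 9
    1: 21
      0: 10
      1: 11
  1: 50
    0: 28
      0: 13
      1: 15
    1: 22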

The prefix codes are

8  -> 000
9  -> 001
10 -> 010
11 -> 011
13 -> 100
15 -> 101
22 -> 11

Spanning Tree Algorithms:

Kruskal's algorithm finds the minimum cost spanning tree using the greedy approach. The
algorithm treats the graph as a forest in which every node is an individual tree. A tree is
connected to another if and only if the connecting edge has the least cost among all available
options and does not violate the MST (minimum spanning tree) properties.

To understand Kruskal's algorithm let us consider the following example −


Step 1 - Remove all loops and parallel edges

Remove all loops and parallel edges from the given graph.

Step 2 - Arrange all edges in their increasing order of weight

The next step is to list all edges together with their weights and arrange them in ascending
order of weight (cost).

Step 3 - Add the edge which has the least weightage


Now we start adding edges to the graph, beginning with the one which has the least weight.
Throughout, we keep checking that the spanning tree properties remain intact. If adding an edge
would violate the spanning tree property, we do not include that edge in the graph.

The least cost is 2 and the edges involved are B,D and D,T. We add them. Adding them does not
violate spanning tree properties, so we continue to our next edge selection.

The next cost is 3, and the associated edges are A,C and C,D. We add them as well.

The next cost in the table is 4, and we observe that adding this edge would create a circuit in
the graph, so we ignore it. In the process we ignore all edges that create a circuit.

We observe that the edges with cost 5 and 6 also create circuits. We ignore them and move on.

Now we are left with only one node to be added. Between the two least cost edges available, 7
and 8, we add the edge with cost 7.

By adding edge S,A we have included all the nodes of the graph and we now have the minimum cost
spanning tree.
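
The procedure can be sketched with a union-find (disjoint-set) structure, which is a standard way to test whether an edge would create a circuit. The edges of cost 2, 3, 7 and 8 are the ones named in the text; the endpoints of the edges of cost 4, 5 and 6 are not given there, so C,B, B,T and A,B are assumed here for illustration.

```python
def kruskal(vertices, edges):
    """Return (total_cost, chosen_edges); edges are (weight, u, v) tuples."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative, shortening the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen, total = [], 0
    for weight, u, v in sorted(edges):        # increasing order of weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                  # joins two trees: no circuit
            parent[root_u] = root_v
            chosen.append((u, v, weight))
            total += weight
    return total, chosen


vertices = "SABCDT"
edges = [
    (2, "B", "D"), (2, "D", "T"), (3, "A", "C"), (3, "C", "D"),
    (4, "C", "B"), (5, "B", "T"), (6, "A", "B"),   # endpoints assumed
    (7, "S", "A"), (8, "S", "C"),
]
total, chosen = kruskal(vertices, edges)
print(total)    # 17
print(chosen)
```

Under these assumptions the edges of cost 4, 5 and 6 are rejected and the tree consists of B,D, D,T, A,C, C,D and S,A.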
Prim’s Algorithm
Prim's algorithm, like Kruskal's algorithm, finds the minimum cost spanning tree using the greedy
approach. Prim's algorithm shares a similarity with shortest-path-first algorithms.
In contrast with Kruskal's algorithm, Prim's algorithm grows a single tree and keeps adding new
nodes to it from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use the
same example −

Step 1 - Remove all loops and parallel edges

Remove all loops and parallel edges from the given graph. In case of parallel edges, keep the one
which has the least cost associated and remove all others.
Step 2 - Choose any arbitrary node as root node
In this case, we choose node S as the root node of Prim's spanning tree. This node is arbitrarily
chosen, so any node can be the root node. One may wonder why any node can be a root node.
The answer is that the spanning tree includes all the nodes of the graph, and because the graph
is connected, every node has at least one edge joining it to the rest of the tree.
Step 3 - Check outgoing edges and select the one with the least cost
After choosing the root node S, we see that S,A and S,C are two edges with weight 7 and 8,
respectively. We choose the edge S,A as its cost is lower than the other's.

Now, the tree S-7-A is treated as one node and we check for all edges going out from it. We
select the one which has the lowest cost and include it in the tree.
After this step, the tree S-7-A-3-C is formed. We again treat it as a node and check all the
outgoing edges, choosing only the least cost edge. In this case, C-3-D is the new edge, whose
cost is less than the costs of the other edges (8, 6, 4, etc.).

After adding node D to the spanning tree, we now have two edges going out of it with the same
cost, i.e. D-2-T and D-2-B. Thus, we can add either one. The next step will again yield an edge
of cost 2 as the least cost, so we show a spanning tree with both edges included.

We find that the two different algorithms produce the same spanning tree for this graph.
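
Prim's algorithm can be sketched with a min heap holding the edges that leave the current tree. The graph is the same example, with the same caveat: the endpoints of the edges of cost 4, 5 and 6 are assumed (C,B, B,T and A,B) because the text does not name them.

```python
import heapq


def build_adjacency(edges):
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


def prim(adjacency, root):
    """Return (total_cost, tree_edges), growing a single tree from root."""
    in_tree = {root}
    frontier = [(w, root, v) for v, w in adjacency[root].items()]
    heapq.heapify(frontier)
    total, tree_edges = 0, []
    while frontier and len(in_tree) < len(adjacency):
        weight, u, v = heapq.heappop(frontier)   # least cost outgoing edge
        if v in in_tree:
            continue                             # both endpoints already in the tree
        in_tree.add(v)
        tree_edges.append((u, v, weight))
        total += weight
        for nxt, w in adjacency[v].items():
            if nxt not in in_tree:
                heapq.heappush(frontier, (w, v, nxt))
    return total, tree_edges


edges = [
    ("S", "A", 7), ("S", "C", 8), ("A", "C", 3), ("C", "D", 3),
    ("D", "B", 2), ("D", "T", 2),
    ("C", "B", 4), ("B", "T", 5), ("A", "B", 6),   # endpoints assumed
]
total, tree_edges = prim(build_adjacency(edges), "S")
print(total)        # 17
print(tree_edges)   # S-A, A-C, C-D, then the two cost-2 edges out of D
```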
CUTS:

A cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-
set, the set of edges that have one endpoint in each subset of the partition. These edges are said
to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases
cuts are identified with their cut-sets rather than with their vertex partitions.

A cut C=(S,T) is a partition of the vertex set V of a graph G=(V,E) into two subsets S and T. The
cut-set of a cut C=(S,T) is the set {(u,v) ∈ E | u ∈ S, v ∈ T} of edges that have one endpoint in
S and the other endpoint in T.
If s and t are specified vertices of the graph G, then an s–t cut is a cut in which s belongs to the
set S and t belongs to the set T.

In an unweighted undirected graph, the size or weight of a cut is the number of edges crossing
the cut. In a weighted graph, the value or weight is defined by the sum of the weights of the
edges crossing the cut.

A bond is a cut-set that does not have any other cut-set as a proper subset.
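
Computing a cut-set and its weight from a vertex subset S is a one-line filter over the edges. The weighted graph below is hypothetical and is not one of the illustrated graphs.

```python
def cut_set(edges, S):
    """Return the edges with exactly one endpoint in the vertex subset S."""
    S = set(S)
    return [(u, v, w) for u, v, w in edges if (u in S) != (v in S)]


def cut_weight(edges, S):
    """Sum of the weights of the edges crossing the cut (S, V - S)."""
    return sum(w for _, _, w in cut_set(edges, S))


# Hypothetical weighted graph: a 4-cycle a-b-c-d with the chord a-c.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("d", "a", 3), ("a", "c", 4)]
print(cut_set(edges, {"a", "b"}))     # the edges b-c, d-a and a-c cross the cut
print(cut_weight(edges, {"a", "b"}))  # 9
```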

Minimum Cut

A cut is minimum if the size or weight of the cut is not larger than the size of any other cut. The
illustration on the right shows a minimum cut: the size of this cut is 2, and there is no cut of size
1 because the graph is bridgeless.

A Minimum Cut
Maximum Cut

A cut is maximum if the size of the cut is not smaller than the size of any other cut. The
illustration on the right shows a maximum cut: the size of the cut is equal to 5, and there is no cut
of size 6, or |E| (the number of edges), because the graph is not bipartite (there is an odd cycle).

A Maximum Cut

The Max-Flow Min-Cut Theorem

For any network, the value of the maximum flow is equal to the capacity of the minimum cut.

The Ford-Fulkerson Algorithm

The Ford-Fulkerson algorithm for finding the maximum flow:

a. Construct the residual graph.

b. Find a path from the source to the sink along which every edge has strictly positive residual
capacity.

c. If such a path exists, update the flow to include it. Go to step a.

d. Else, the flow is maximal.

e. The minimum (s,t)-cut has as S all vertices reachable from the source in the residual graph,
and T as V - S.
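
The steps can be sketched as follows. This version finds each path with a breadth-first search, which is the Edmonds-Karp variant of Ford-Fulkerson; the small network is hypothetical and is not the network from the example that follows.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Return (max_flow_value, S), where S is the source side of a minimum cut.

    capacity is a dictionary {u: {v: capacity of edge u->v}}.
    """
    # Step a: residual graph, with zero-capacity reverse edges added.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Step b: BFS for a path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # Steps d and e: the flow is maximal; the reachable vertices form S.
            return flow, set(parent)
        # Step c: push the bottleneck capacity along the path found.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


# Hypothetical network with source "s" and sink "t".
capacity = {"s": {"a": 4, "b": 3}, "a": {"t": 2, "b": 1}, "b": {"t": 2}}
value, S = max_flow_min_cut(capacity, "s", "t")
print(value)       # 4
print(sorted(S))   # ['a', 'b', 's']
```

Here the edges crossing from S to T are a→t and b→t, with total capacity 2 + 2 = 4, which equals the maximum flow as the theorem states.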
Example:

Find the maximal flow from node 1 to node 7.


GAME TREE

A game tree is a directed graph whose nodes are positions in a game and whose edges are
moves. The complete game tree for a game is the game tree starting at the initial position and
containing all possible moves from each position; the complete tree is the same tree as that
obtained from the extensive-form game representation.

The first two plies of the game tree for tic-tac-toe.

The diagram shows the first two levels, or plies, in the game tree for tic-tac-toe. The rotations
and reflections of positions are equivalent, so the first player has three choices of move: in the
center, at the edge, or in the corner. The second player has two choices for the reply if the first
player played in the center, otherwise five choices. And so on.

The number of leaf nodes in the complete game tree is the number of possible different ways the
game can be played. For example, the game tree for tic-tac-toe has 255,168 leaf nodes.
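
The leaf count for tic-tac-toe can be reproduced by walking the complete game tree. In this sketch a game ends as soon as one player completes a line or the board is full, and positions that are rotations or reflections of one another are counted separately.

```python
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def has_winner(board):
    return any(
        board[a] != " " and board[a] == board[b] == board[c] for a, b, c in LINES
    )


def count_games(board, player):
    """Count the leaf nodes (finished games) below this position."""
    if has_winner(board) or " " not in board:
        return 1
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = " "             # undo the move before trying the next
    return total


leaves = count_games([" "] * 9, "X")
print(leaves)  # 255168
```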
Game trees are important in artificial intelligence because one way to pick the best move in a
game is to search the game tree using the minimax algorithm or its variants. The game tree for
tic-tac-toe is easily searchable, but the complete game trees for larger games like chess are much
too large to search. Instead, a chess-playing program searches a partial game tree: typically as
many plies from the current position as it can search in the time available. Except for the case of
"pathological" game trees (which seem to be quite rare in practice), increasing the search depth
(i.e., the number of plies searched) generally improves the chance of picking the best move.

Two-person games can also be represented as and-or trees. For the first player to win a game,
there must exist a winning move for every move of the second player. This is represented in the
and-or tree by using disjunction to represent the first player's alternative moves and using
conjunction to represent all of the second player's moves.



Minimax Algorithm
Minimax is a recursive algorithm used to choose an optimal move for a player, assuming that the
other player is also playing optimally. It is used in games such as tic-tac-toe, go, chess,
isola, checkers, and many other two-player games. Such games are called games of perfect
information because both players can see the complete state of the game and therefore all the
possible moves. There are also two-player games which are not of perfect information, such as
Scrabble, because each player's tiles are hidden from the opponent.

It is similar to how we think when we play a game: "if I make this move, then my opponent can
only make these moves," and so on.

Minimax is so called because it minimizes the player's loss in the case where the opponent
chooses the strategy that inflicts the maximum loss.

Terminology
Game Tree: It is a structure in the form of a tree consisting of all the possible moves which allow
you to move from a state of the game to the next state.

A game can be defined as a search problem with the following components:

Initial state: It comprises the position of the board and an indication of whose move it is.

Successor function: It defines the legal moves a player can make.

Terminal state: It is the position of the board when the game is over.

Utility function: It is a function which assigns a numeric value to the outcome of a game. For
instance, in chess or tic-tac-toe, the outcome is either a win, a loss, or a draw, and these can be
represented by the values +1, -1, or 0, respectively. There are games that have a much larger
range of possible outcomes; for instance, the utilities in backgammon vary from +192 to -192.
A utility function can also be called a payoff function.

How does the algorithm work?

There are two players involved in a game, called MIN and MAX. The player MAX tries to get
the highest possible score and MIN tries to get the lowest possible score, i.e., MIN and MAX
work against each other.
The general process of the Minimax algorithm is as follows:

Step 1: First, generate the entire game tree, starting with the current position of the game, all
the way up to the terminal states. This is what the game tree looks like for tic-tac-toe.

Let us understand the defined terminology in terms of the diagram above.

1. The initial state is the first layer, which shows that the board is blank and it is MAX's turn
to play.

2. The successor function lists all the possible successor moves. It is defined for all the layers
in the tree.

3. The terminal state is the last layer of the tree, which shows the final state, i.e., whether
the player MAX wins, loses, or ties with the opponent.

4. The utilities for the terminal states in this case are 1, 0, and -1 as discussed earlier, and
they can be used to determine the utilities of the other nodes as well.

Step 2: Apply the utility function to get the utility values for all the terminal states.
Step 3: Determine the utilities of the higher nodes with the help of the utilities of the terminal
nodes. For instance, in the diagram below, we have the utilities for the terminal states written in
the squares.
Let us calculate the utility for the left node(red) of the layer above the terminal. Since it is the
move of the player MIN, we will choose the minimum of all the utilities. For this case, we have
to evaluate MIN{3, 5, 10}, which we know is certainly 3. So the utility for the red node is 3.
Similarly, for the green node in the same layer, we will have to evaluate MIN{2,2} which is 2.

Step 4: Calculate the utility values with the help of leaves considering one layer at a time until
the root of the tree.
Step 5: Eventually, all the backed-up values reach the root of the tree, i.e., the topmost point.
At that point, MAX has to choose the highest value. In our example, we only have 3 layers so we
immediately reach the root, but in actual games there will be many more layers and nodes.
So we have to evaluate MAX{3,2}, which is 3.
Therefore, the best opening move for MAX is the left node (the red one). This move is called
the minimax decision as it maximizes the utility under the assumption that the opponent is
also playing optimally to minimize it.

To summarize,

Minimax Decision = MAX{MIN{3,5,10}, MIN{2,2}}
                 = MAX{3,2}
                 = 3
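
The backing-up of utilities can be sketched as a short recursive function. It assumes the game tree is given as nested lists whose leaves are the terminal utilities, so the three-layer example becomes `[[3, 5, 10], [2, 2]]`; the name `best_move` is an illustrative helper.

```python
def minimax(node, maximizing):
    """Return the backed-up utility of node.

    A node is either a number (terminal utility) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(node):
    """Index of the child MAX should choose at the root."""
    return max(range(len(node)), key=lambda i: minimax(node[i], False))


tree = [[3, 5, 10], [2, 2]]
print(minimax(tree, True))   # 3
print(best_move(tree))       # 0, i.e. the left node
```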
