
A Research in Data Structures
(Trees and Graphs)

Submitted by: Ryan Paolo Salvador
Submitted to: Mrs. Elsa Salvador

October 9, 2013

Trees and indexes


The tree is one of the most powerful of the advanced data structures, and it often pops up in even more
advanced subjects such as AI and compiler design. Surprisingly, though, the tree is also important in a much
more basic application: keeping an efficient index.
Whenever you use a database, it is very likely that an index is involved somewhere. The simplest
type of index is a sorted listing of the key field. This provides a fast lookup because you can use a binary
search to locate any item without having to look at each one in turn.
The trouble with a simple ordered list only becomes apparent once you start adding new items and have
to keep the list sorted: it can be done reasonably efficiently, but it takes some advanced juggling. A more
important defect in these days of networking and multi-user systems is related to the file-locking
properties of such an index. Basically, if you want to share a linear index and allow more than one user to
update it, then you have to lock the entire index during each update. In other words, a linear index isn't
easy to share, and this is where trees come in: I suppose you could say that trees are shareable.
Tree ecology
A tree is a data structure consisting of nodes organised as a hierarchy (see Figure 1).
Figure 1: Some tree jargon


Some of the jargon that relates to trees is obvious and some is not; both kinds are summarised in the
glossary, and selected examples are shown in Figure 1.
I will try to avoid overly academic definitions or descriptions in what follows, but if you need a quick
definition of any term, look it up in the glossary.
Binary trees
A worthwhile simplification is to consider only binary trees. A binary tree is one in which each node has
at most two children (immediate descendants): a node can have none or just one, but it can't have more than two.
Clearly, each node in a binary tree can have a left and/or a right child. The importance of a binary
tree is that it can create a data structure that mimics a "yes/no" decision-making process.
For example, if you construct a binary tree to store numeric values such that each left sub-tree contains
larger values and each right sub-tree contains smaller values (the mirror image of the binary search tree
convention used later in this paper, but equally valid), then it is easy to search the tree for any
particular value. The algorithm is simply a tree-search equivalent of a binary search:
start at the root
REPEAT until the value is found or there is no descendant to move to
    IF value at node = search value
    THEN found (stop)
    IF value at node < search value
    THEN move to left descendant
    ELSE move to right descendant
END REPEAT
Of course if the loop terminates because it reaches a terminal node then the search value isn't in the tree,
but the fine detail only obscures the basic principles.
The next question is how the shape of the tree affects the efficiency of the search. We all have a tendency
to imagine complete binary trees like the one in Figure 2a, and in this case it isn't difficult to see that in the
worst case a search would have to go down to the full depth of the tree. If you are happy with maths,
you will know that if the tree in Figure 2a contains n items then its depth is about log2 n, and so at best a tree
search is as fast as a binary search.
Figure 2a: The "perfect" binary tree


The worst possible performance is produced by a tree like that in Figure 2b. In this case all of the items
are lined up on a single branch, making a tree that is n levels deep. The worst-case search of such a tree
would take n comparisons, which is the same as searching an unsorted linear list.
So, depending on the shape of the tree, search efficiency varies from that of a binary search of a sorted list to
that of a linear search of an unsorted list. Clearly, if it is going to be worth using a tree, we have to ensure that it is
going to be closer in shape to the tree in Figure 2a than to that in Figure 2b.
Figure 2b: This may be an extreme binary tree, but it still IS a binary tree

Common Uses of Trees


1. Manipulate hierarchical data.
2. Make information easy to search (traversal).
3. Manipulate sorted lists of data.
4. Act as a workflow for compositing digital images for visual effects.
5. Implement router algorithms.
How trees are represented in memory

Binary Tree
A binary tree is a tree data structure in which each node has at most two child nodes,
usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes
may contain references to their parents. Outside the tree, there is often a reference to the "root"
node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by
starting at the root node and repeatedly following references to either the left or the right child. A tree
that has no nodes at all is called a null (or empty) tree. In a binary tree, the degree of every node
is at most two. A tree with n nodes has exactly n - 1 branches (edges).
Binary trees are used to implement binary search trees and binary heaps, finding
applications in efficient searching and sorting algorithms.

A directed edge refers to the link from the parent to the child (the arrows in the picture
of the tree).

The root node of a tree is the node with no parents. There is at most one root node in a
rooted tree.

A leaf node has no children.

The depth of a node is the length of the path from the root to the node. The set of all
nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero.

The depth (or height) of a tree is the length of the path from the root to the deepest node
in the tree. A (rooted) tree with only one node (the root) has a depth of zero.

Siblings are nodes that share the same parent node.


A node p is an ancestor of a node q if it exists on the path from the root to node q. The
node q is then termed a descendant of p.

The size of a node is the number of descendants it has including itself.

In-degree of a node is the number of edges arriving at that node.

Out-degree of a node is the number of edges leaving that node.

The root is the only node in the tree with In-degree = 0.

All the leaf nodes have Out-degree = 0.


BINARY SEARCH TREE

A binary search tree (BST), sometimes also called an ordered or sorted binary tree, is
a node-based binary tree data structure which has the following properties:[1]

The left subtree of a node contains only nodes with keys less than the node's key.

The right subtree of a node contains only nodes with keys greater than the node's key.

The left and right subtree must each also be a binary search tree.

There must be no duplicate nodes.

Generally, the information represented by each node is a record rather than a single data
element. However, for sequencing purposes, nodes are compared according to their keys rather
than any part of their associated records.
The major advantage of binary search trees over other data structures is that the
related sorting algorithms (such as in-order traversal) and search algorithms can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data
structures such as sets, multisets, and associative arrays.

TREE TRAVERSALS

Preorder traversal: To traverse a binary tree in preorder, the following operations are carried out:
(i) visit the root, (ii) traverse the left subtree, and (iii) traverse the right subtree.
Therefore, the preorder traversal of the example tree (a binary search tree whose root is 7) outputs:
7, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10
Inorder traversal: To traverse a binary tree in inorder, the following operations are carried out:
(i) traverse the left subtree, (ii) visit the root, and (iii) traverse the right subtree. Each subtree is
traversed by the same rule, so the traversal starts at the leftmost node; for a binary search tree the
keys come out in sorted order.
Therefore, the inorder traversal of the example tree outputs:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Postorder traversal: To traverse a binary tree in postorder, the following operations are carried out:
(i) traverse the left subtree, (ii) traverse the right subtree, and (iii) visit the root. Again, each
subtree is traversed by the same rule, so a leaf is output first and the root last.
Therefore, the postorder traversal of the example tree outputs:
0, 2, 4, 6, 5, 3, 1, 8, 10, 9, 7
The binary tree is a fundamental data structure used in computer science. It is a
useful data structure for rapidly storing sorted data and rapidly retrieving stored data. A binary
tree is composed of nodes, each of which stores data and also links to up to two child nodes,
which can be visualized spatially as below the first node, with one placed to the left and one
placed to the right. It is the relationship between the linked child nodes and the linking node, also
known as the parent node, that makes the binary tree such an efficient data structure. The child on
the left has a lesser key value (i.e., the value used to search for a node in the tree), and the child on
the right has an equal or greater key value. As a result, the nodes on the farthest left of the tree have
the lowest values, whereas the nodes on the farthest right have the greatest values. More importantly,
each child node is itself the root of a new, smaller binary tree (a subtree). Because of this, it is
possible to access and insert data in a binary tree easily by calling search and insert functions
recursively on successive nodes.

The typical graphical representation of a binary tree is essentially that of an upside-down tree. It
begins with a root node, which contains the original key value. The root node has up to two child
nodes, and each child node might have its own child nodes. Ideally, the tree would be perfectly
balanced, with each node having the same number of nodes in its left and right subtrees. A perfectly
balanced tree allows for the fastest average insertion and retrieval of data. The worst-case scenario
is a tree in which each node has only one child node, so that in terms of speed it behaves like a
linked list. The typical representation of a binary tree looks like the following:
        10
       /  \
      6    14
     / \   / \
    5   8 11  18
The node storing the value 10, represented here simply as 10, is the root node. It links to the
left and right child nodes: the left node stores a lower value than the parent node, and the right
node stores a greater value. Notice that if one removed the root node and the right child nodes,
the node storing the value 6 would be the root of a new, smaller binary tree.
The structure of a binary tree makes the insertion and search functions simple to
implement using recursion. In fact, the insertion and search functions are very similar. Inserting
data into a binary tree involves searching for an unused position in the tree at which to insert the
key value. The insert function is generally recursive: it keeps moving down the levels of the tree
until it reaches an unused position that follows the rules for placing nodes. The rules are that a
lower value goes to the left of a node, and a greater or equal value goes to the right. Following
these rules, the insert function checks whether the current position is empty; if so, it inserts the
data to be stored along with the key value (in most implementations an empty position is simply a
NULL pointer in the parent node, so the function also has to create the node). If the position is
already filled, the insert function checks whether the key value to be inserted is less than the key
value of the current node. If it is, the insert function is called recursively on the left child node;
if the key value is greater than or equal to the key value of the current node, it is called
recursively on the right child node.
The search function works in a similar fashion. It checks whether the key value of the current
node is the value being searched for. If not, it checks whether the value being searched for is less
than the value of the node, in which case it is called recursively on the left child node; if it is
greater, it is called recursively on the right child node. Of course, it is also necessary to check
that the left or right child node actually exists before calling the function on it.
Because a balanced binary tree has about log (base 2) n layers, the average search time for such a tree is about log (base 2) n comparisons.
The node struct (named node in the code below) stores the key_value and contains pointers to the
two child nodes, which define the node as part of a tree. In fact, the node itself is very similar to
the node in a linked list, so a basic knowledge of the code for a linked list is very helpful in
understanding the techniques of binary trees. Essentially, pointers are necessary to allow the
arbitrary creation of new nodes in the tree.
It is most logical to create a binary tree class that encapsulates the workings of the tree in a single
place and makes it reusable. The class will contain functions to insert data into the tree and
to search for data. Because the nodes are allocated through pointers, it is also necessary to include a
function that deletes the tree to free its memory once it is no longer needed.
class btree
{
  public:
    btree();
    ~btree();

    void insert(int key);
    node *search(int key);
    void destroy_tree();

  private:
    void destroy_tree(node *leaf);
    void insert(int key, node *leaf);
    node *search(int key, node *leaf);

    node *root;
};
The insert and search functions that are public members of the class are designed to allow
the user of the class to use it without dealing with the underlying design. The insert and
search functions that will be called recursively are the ones with two parameters, which allows
them to travel down the tree. The destroy_tree function without arguments is a front end for
the destroy_tree function that recursively destroys the tree, node by node, from the bottom
up.
The code for the class would look similar to the following:
btree::btree()
{
  root = NULL;
}

The destroy_tree function will set off the recursive function destroy_tree shown below which
will actually delete all nodes of the tree.
void btree::destroy_tree(node *leaf)
{
  if(leaf != NULL)
  {
    destroy_tree(leaf->left);
    destroy_tree(leaf->right);
    delete leaf;
  }
}
The function destroy_tree goes to the bottom of each part of the tree, that is, it keeps descending
while there is a non-null node, deletes that leaf, and then works its way back up. The function
deletes the leftmost node, then the right child node of the leftmost node's parent, then the parent
node itself, then works its way back to deleting the other child node of the parent of the node it
just deleted, and it continues this deletion, working its way up to the node of the tree upon which
destroy_tree was originally called. In the example tree above, the order of deletion of nodes would
be 5 8 6 11 18 14 10. Note that it is necessary to delete all the child nodes to avoid leaking
memory.
void btree::insert(int key, node *leaf)
{
  if(key < leaf->key_value)
  {
    if(leaf->left != NULL)
      insert(key, leaf->left);
    else
    {
      leaf->left = new node;
      leaf->left->key_value = key;
      leaf->left->left = NULL;    //Sets the left child of the child node to null
      leaf->left->right = NULL;   //Sets the right child of the child node to null
    }
  }
  else if(key >= leaf->key_value)
  {
    if(leaf->right != NULL)
      insert(key, leaf->right);
    else
    {
      leaf->right = new node;
      leaf->right->key_value = key;
      leaf->right->left = NULL;   //Sets the left child of the child node to null
      leaf->right->right = NULL;  //Sets the right child of the child node to null
    }
  }
}
The case where the root node is still NULL is handled by the nonrecursive insert function,
which is available to non-members of the class. The recursive insert function moves down the tree
of child nodes following the prescribed rules (left for a lower value, right for a greater or equal
value) until it finds an empty position. There it creates a node using the 'new' keyword,
initializes it with the key value and sets the new node's child pointers to NULL. After creating
the new node, the insert function no longer calls itself.
node *btree::search(int key, node *leaf)
{
  if(leaf != NULL)
  {
    if(key == leaf->key_value)
      return leaf;
    if(key < leaf->key_value)
      return search(key, leaf->left);
    else
      return search(key, leaf->right);
  }
  else return NULL;
}
The search function shown above recursively moves down the tree until it either reaches
a node with a key value equal to the value being searched for, or reaches a NULL pointer,
meaning that the value being searched for is not stored in the binary tree. It returns a pointer
to the node (or NULL) to the previous instance of the function that called it, handing the pointer
back up to the search function accessible outside the class.
void btree::insert(int key)
{
  if(root != NULL)
    insert(key, root);
  else
  {
    root = new node;
    root->key_value = key;
    root->left = NULL;
    root->right = NULL;
  }
}

Graph
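A graph is a mathematical structure consisting of a set of vertices (also called nodes)
$\{v_1, v_2, \ldots, v_n\}$ and a set of edges $\{e_1, e_2, \ldots, e_m\}$. An edge is a pair of
vertices $\{v_i, v_j\}$, where $i, j \in \{1, \ldots, n\}$. The two vertices are called the edge
endpoints. Graphs are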
ubiquitous in computer science. They are used to model real-world systems such as the Internet
(each node represents a router and each edge represents a connection between routers); airline
connections (each node is an airport and each edge is a flight); or a city road network (each node
represents an intersection and each edge represents a block). The wireframe drawings in
computer graphics are another example of graphs.
A graph may be either undirected or directed. Intuitively, an undirected edge models a
"two-way" or "duplex" connection between its endpoints, while a directed edge is a one-way
connection, and is typically drawn as an arrow. A directed edge is often called an arc.
Mathematically, an undirected edge is an unordered pair of vertices, and an arc is an ordered pair.
For example, a road network might be modeled as a directed graph, with one-way streets
indicated by an arrow between endpoints in the appropriate direction, and two-way streets shown
by a pair of parallel directed edges going both directions between the endpoints. You might ask
why not use a single undirected edge for a two-way street. There's no theoretical problem with
this, but from a practical programming standpoint, it's generally simpler and less error-prone to
stick with all directed or all undirected edges.

An undirected graph can have at most $\binom{n+1}{2} = n(n+1)/2$ edges (one for each unordered
pair, counting a vertex paired with itself), while a directed graph can have at most $n^2$ edges
(one per ordered pair). A graph is called sparse if it has many fewer than this many edges
(typically $O(n)$ edges), and dense if it has closer to $\Omega(n^2)$ edges. A multigraph can have
more than one edge between the same two vertices. For
example, if one were modeling airline flights, there might be multiple flights between two cities,
occurring at different times of the day.

A path in a graph is a sequence of vertices $v_{i_1}, v_{i_2}, \ldots, v_{i_k}$ such that there
exists an edge or arc between consecutive vertices. The path is called a cycle if
$v_{i_1} = v_{i_k}$. A connected undirected acyclic graph is equivalent to an undirected tree (an
undirected acyclic graph that is not connected is a forest). A directed acyclic graph is called a
DAG. It is not necessarily a tree.
Nodes and edges often have associated information, such as labels or weights. For
example, in a graph of airline flights, a node might be labeled with the name of the
corresponding airport, and an edge might have a weight equal to the flight time. The popular
game "Six Degrees of Kevin Bacon" can be modeled by a labeled undirected graph. Each actor
becomes a node, labeled by the actor's name. Nodes are connected by an edge when the two
actors appeared together in some movie. We can label this edge by the name of the movie.
Deciding if an actor is separated from Kevin Bacon by six or fewer steps is equivalent to finding
a path of length at most six in the graph between Bacon's vertex and the other actor's vertex.
(This can be done with the breadth-first search algorithm. The Oracle of Bacon at the University
of Virginia has actually implemented this algorithm and can tell you the path from any actor to
Kevin Bacon in a few clicks.)
Uses of Graphs
Applications of graph theory are primarily, but not exclusively, concerned with labeled graphs
and various specializations of these.
Structures that can be represented as graphs are ubiquitous, and many problems of practical
interest can be represented by graphs. The link structure of a website could be represented by a
directed graph: the vertices are the web pages available at the website and a directed edge from
page A to page B exists if and only if A contains a link to B. A similar approach can be taken to
problems in travel, biology, computer chip design, and many other fields. The development of
algorithms to handle graphs is therefore of major interest in computer science. There, the
transformation of graphs is often formalized and represented by graph rewrite systems. They are
either used directly, or their properties (e.g. confluence) are studied.
A graph structure can be extended by assigning a weight to each edge of the graph. Graphs with
weights, or weighted graphs, are used to represent structures in which pairwise connections have
some numerical values. For example if a graph represents a road network, the weights could
represent the length of each road. A digraph with weighted edges in the context of graph theory is
called a network.
Networks have many uses in the practical side of graph theory, network analysis (for example, to
model and analyze traffic networks). Within network analysis, the definition of the term
"network" varies, and may often refer to a simple graph.
Many applications of graph theory exist in the form of network analysis. These split broadly into
three categories. Firstly, analysis to determine structural properties of a network, such as the
distribution of vertex degrees and the diameter of the graph. A vast number of graph measures
exist, and the production of useful ones for various domains remains an active area of research.

Secondly, analysis to find a measurable quantity within the network, for example, for a
transportation network, the level of vehicular flow within any portion of it. Thirdly, analysis of
dynamical properties of networks.
Graph theory is also used to study molecules in chemistry and physics. In condensed matter
physics, the three-dimensional structure of complicated simulated atomic structures can be
studied quantitatively by gathering statistics on graph-theoretic properties related to the topology
of the atoms, for example Franzblau's shortest-path (SP) rings. In chemistry a graph makes a
natural model for a molecule, where vertices represent atoms and edges bonds. This approach is
especially used in computer processing of molecular structures, ranging from chemical editors to
database searching.
Graph theory is also widely used in sociology as a way, for example, to measure actors' prestige
or to explore diffusion mechanisms, notably through the use of social network analysis software.

How are Graphs Represented in Memory?

Graph Creation
The following program constructs a small directed graph with vertices A to E using the intuitive
representation MBgraph1 (from a third-party C graph library declared in graph1.h), and then
enumerates the vertices, their neighbours and the edges:
#include <stdio.h>
#include <graph1.h>

int main(void)
{
    MBgraph1 *graph;
    MBvertex *vertex;
    MBvertex *A, *B, *C, *D, *E;
    MBiterator *vertices, *edges;
    MBedge *edge;

    /* Create a graph */
    graph = MBgraph1_create();

    /* Add vertices */
    A = MBgraph1_add(graph, "A", NULL);
    B = MBgraph1_add(graph, "B", NULL);
    C = MBgraph1_add(graph, "C", NULL);
    D = MBgraph1_add(graph, "D", NULL);
    E = MBgraph1_add(graph, "E", NULL);

    /* Add edges */
    MBgraph1_add_edge(graph, A, B);
    MBgraph1_add_edge(graph, A, D);
    MBgraph1_add_edge(graph, B, C);
    MBgraph1_add_edge(graph, C, B);
    MBgraph1_add_edge(graph, D, A);
    MBgraph1_add_edge(graph, D, C);
    MBgraph1_add_edge(graph, D, E);

    /* Display the vertices and their neighbours */
    printf("Vertices (%d) and their neighbours:\n\n", MBgraph1_get_vertex_count(graph));
    vertices = MBgraph1_get_vertices(graph);
    while ((vertex = MBiterator_get(vertices))) {
        MBiterator *neighbours;
        MBvertex *neighbour;
        unsigned int n = 0;
        printf("%s (%d): ", MBvertex_get_name(vertex),
               MBgraph1_get_neighbour_count(graph, vertex));
        neighbours = MBgraph1_get_neighbours(graph, vertex);
        while ((neighbour = MBiterator_get(neighbours))) {
            printf("%s", MBvertex_get_name(neighbour));
            if (n < MBgraph1_get_neighbour_count(graph, vertex) - 1) {
                fputs(", ", stdout);
            }
            n++;
        }
        putchar('\n');
        MBiterator_delete(neighbours);
    }
    putchar('\n');
    MBiterator_delete(vertices);

    /* Display the edges */
    printf("Edges (%d):\n\n", MBgraph1_get_edge_count(graph));
    edges = MBgraph1_get_edges(graph);
    while ((edge = MBiterator_get(edges))) {
        printf("<%s, %s>\n", MBvertex_get_name(MBedge_get_from(edge)),
               MBvertex_get_name(MBedge_get_to(edge)));
    }
    putchar('\n');
    MBiterator_delete(edges);

    /* Delete */
    MBgraph1_delete(graph);

    return 0;
}
Graph Traversal
Graph traversal is the problem of visiting all the nodes in a graph in a particular manner,
updating and/or checking their values along the way. Tree traversal is a special case of graph
traversal.
Unlike tree traversal, graph traversal may require that some nodes be visited more than
once, since it is not necessarily known before transitioning to a node that it has already been
explored. As graphs become more dense, this redundancy becomes more prevalent, causing
computation time to increase; as graphs become more sparse, the opposite holds true.
Thus, it is usually necessary to remember which nodes have already been explored by the
algorithm, so that nodes are revisited as infrequently as possible (or in the worst case, to prevent
the traversal from continuing indefinitely). This may be accomplished by associating each node
of the graph with a "color" or "visitation" state during the traversal, which is then checked and
updated as the algorithm visits each node. If the node has already been visited, it is ignored and
the path is pursued no further; otherwise, the algorithm checks/updates the node and continues
down its current path.
Several special cases of graphs imply the visitation of other nodes in their structure, and
thus do not require that visitation be explicitly recorded during the traversal. An important
example of this is a tree, during a traversal of which it may be assumed that all "ancestor" nodes
of the current node (and others depending on the algorithm) have already been visited. Both the
depth-first and breadth-first graph searches are adaptations of tree-based algorithms,
distinguished primarily by the lack of a structurally determined "root" node and the addition of a
data structure to record the traversal's visitation state.
