
Unit-4 SYMBOL TABLE MANAGEMENT

 Symbol Table:-
An important part of any compiler is the construction and maintenance of a dictionary
containing names and their associated values. Such a dictionary is called a symbol
table.
The symbol table is the repository of all information within a compiler. All parts of a compiler
communicate through this table and access the data it holds about symbols. Symbol tables are also
used to hold labels, constants, and types.
Formally, a symbol table maps names into declarations (called attributes), such as mapping
the variable name x to its type int. More specifically, a symbol table stores:
 For each type name, its type definition (eg. for the C type declaration typedef int* mytype,
it maps the name mytype to a data structure that represents the type int*).
 For each variable name, its type. If the variable is an array, it also stores dimension
information. It may also store storage class, offset in activation record etc.
 For each constant name, its type and value.
 For each function and procedure, its formal parameter list and its return type. Each formal
parameter must have a name, a type, a passing mode (by-reference or by-value), etc.
The construction of a symbol table involves a number of phases. The principal
phases are:-
 Building phase: The building phase involves the insertion of symbols and their associated
values into a table.
 Referencing phase: The referencing phase is the fetching or accessing of values from a
table.

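Operations of Symbol table:-

The basic operations defined on a symbol table include inserting a symbol together with its
attributes (building phase) and looking up a symbol to fetch its attributes (referencing phase).

Attributes of symbol table: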
1. Name: Several alternatives are available for representing the name. Symbol names may be
fixed length or variable length:
 Fixed length: The symbol table record reserves a fixed amount of memory to store the
name. For example: the original BASIC language (2 bytes), FORTRAN 66 (6 bytes), ANSI C (32 bytes).
 Variable length: A pointer to the start of the name is stored in the symbol table record. For
example: Pascal.
2. Information: The information field about a name may include the following attributes:
 Type
 Lifetime
 Address
 Size
 Scope
 Value

Information Field
 Type: The data structures needed to represent type information depend on the complexity
of allowable types in the language.
 Size: The size of a name is determined by its type and varies from compiler to compiler.
 Lifetime: The lifetime of a variable is the period of time for which the variable has memory
allocated. An integer code may be used to represent lifetime.
 Static: Memory is allocated in a fixed location and size for the duration of the program.
Global variables are usually static.
 Semi-static: Memory is allocated in a fixed size and location in the stack frame for the
duration of a function invocation.
 Semi-dynamic: During a particular function invocation, memory is allocated in a fixed size
and location, but the size and location may change from invocation to invocation.
 Dynamic: The size and location may change at any time. Algol-68 allows the declaration
of fully dynamic arrays.
 Scope: Languages that support scope rules can store the scope as an integer value.
Scoped languages should choose a symbol table data structure that supports scope rules,
i.e. a variable defined in the current scope must be found before a variable of the same
name in an enclosing scope.
A stack organization provides this behavior. New scopes are pushed on the stack, and
searching for a symbol begins at the top of the stack.
 Value: Languages like Pascal and C++ allow the declaration of named constants. Initialization of
variables with a value known at compile time is also allowed in most languages. The
symbol table can store a binary representation of the value that can be used to generate
the initialization during code generation.
 Address: The value stored for address depends on the lifetime of the symbol.
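The attributes listed above can be gathered into a single record. The following is only a minimal sketch: the names `struct symbol`, `make_symbol`, the enum codes, and the assumed sizes (1 byte for char, 4 bytes otherwise) are illustrative and not taken from any particular compiler.

```c
#include <string.h>

/* Hypothetical attribute codes; a real compiler defines its own. */
enum type_code { TYPE_INT, TYPE_FLOAT, TYPE_CHAR };
enum lifetime_code { LIFE_STATIC, LIFE_SEMI_STATIC, LIFE_SEMI_DYNAMIC, LIFE_DYNAMIC };

struct symbol {
    char name[32];                /* fixed-length name field */
    enum type_code type;          /* Type */
    enum lifetime_code lifetime;  /* Lifetime, stored as an integer code */
    long address;                 /* Address: meaning depends on the lifetime */
    int size;                     /* Size in bytes */
    int scope;                    /* Scope, stored as a nesting depth */
    int has_value;                /* nonzero if a compile-time value is known */
    long value;                   /* Value, valid only if has_value is set */
};

/* Build a record with the given attributes; all other fields start at zero. */
struct symbol make_symbol(const char *name, enum type_code type,
                          enum lifetime_code lifetime, int scope)
{
    struct symbol s;
    memset(&s, 0, sizeof s);
    strncpy(s.name, name, sizeof s.name - 1);  /* memset keeps it terminated */
    s.type = type;
    s.lifetime = lifetime;
    s.scope = scope;
    s.size = (type == TYPE_CHAR) ? 1 : 4;      /* assumed sizes */
    return s;
}
```

For example, `make_symbol("count", TYPE_INT, LIFE_SEMI_STATIC, 1)` would describe a local integer declared at nesting depth 1.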
Data Structure for Symbol table:
Several data structures can be used to implement a symbol table, including:
 Arrays,
 Linear lists,
 Search trees
 Hash tables.

 Array: An array data structure for a symbol table is subdivided into two categories:
1. Unsorted array: New symbols are added to the end of the table. Searching for a symbol
starts at the end and is linear. The most recent scope is always at the end of the table.
Following are some important points about this data structure:
 It is easy to program.
 A brief summary of its operations:
 It is slow in searching: O(n)
 It is fast in insertion: O(1)
 It is fast in scope deletion: O(1)
 It has a fixed size.
 It has good scope support.
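The behavior described for the unsorted array can be sketched as follows. This is an illustration under assumptions: the names `sym_insert`, `sym_lookup`, `open_scope`, `close_scope` and the table limits are hypothetical.

```c
#include <string.h>

#define MAX_SYMBOLS 256   /* fixed table size (assumed) */
#define MAX_SCOPES  32    /* maximum nesting depth (assumed) */

struct entry { const char *name; int scope; };

static struct entry table[MAX_SYMBOLS];
static int n_entries = 0;
static int scope_start[MAX_SCOPES];  /* table length when each scope was opened */
static int depth = 0;

/* Remember where the new scope begins. */
int open_scope(void)
{
    if (depth == MAX_SCOPES) return -1;
    scope_start[depth++] = n_entries;
    return 0;
}

/* Scope deletion is O(1): truncate the table to where the scope began. */
void close_scope(void)
{
    if (depth > 0) n_entries = scope_start[--depth];
}

/* Insertion is O(1): append at the end. Returns the index, or -1 if full. */
int sym_insert(const char *name)
{
    if (n_entries == MAX_SYMBOLS) return -1;
    table[n_entries].name = name;
    table[n_entries].scope = depth;
    return n_entries++;
}

/* Search is O(n), from the end so the most recent scope is found first. */
int sym_lookup(const char *name)
{
    for (int i = n_entries - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0) return i;
    return -1;
}
```

Because the search runs backwards, an inner declaration of x hides an outer one until `close_scope()` truncates the table.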
2. Sorted array: In this data structure new symbols are inserted in sorted order. Binary search
is used. It has poor scope support.
Following are some important points about this data structure:
 It is moderately easy to program.
 A brief summary of its operations, compared with the unsorted array:
 It is improved in searching: O(log n)
 It is slow in insertion: O(n)
 It is slow in scope deletion: O(n)
 It has a fixed size.
 It has poor scope support.
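A sorted array might look like the sketch below. The names `sorted_insert`, `sorted_lookup` and `lower_bound` are assumptions made for illustration; names are compared with `strcmp`.

```c
#include <string.h>

#define MAX_SYMBOLS 256   /* fixed table size (assumed) */

static const char *names[MAX_SYMBOLS];
static int count = 0;

/* Binary search: first index whose name is not less than the given name. */
static int lower_bound(const char *name)
{
    int lo = 0, hi = count;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(names[mid], name) < 0) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Search is O(log n). Returns the index of the name, or -1 if absent. */
int sorted_lookup(const char *name)
{
    int i = lower_bound(name);
    return (i < count && strcmp(names[i], name) == 0) ? i : -1;
}

/* Insertion is O(n): later entries are shifted to keep the order. */
int sorted_insert(const char *name)
{
    int i = lower_bound(name);
    if (i < count && strcmp(names[i], name) == 0) return i;  /* already present */
    if (count == MAX_SYMBOLS) return -1;                     /* table full */
    memmove(&names[i + 1], &names[i], (size_t)(count - i) * sizeof names[0]);
    names[i] = name;
    count++;
    return i;
}
```

Entries from different scopes are interleaved by name, which is why removing one scope means scanning the whole array.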

 Linear list: The simplest and easiest to implement data structure for a symbol table is a
linear list of records. Two variants are described here:
 Linear linked list
 Self Organizing List

1. Linear linked list:

In this data structure new symbols are inserted at the front of the list or, in an array
implementation, at the end of the array. The end of the array is always marked by a pointer known
as ‘space’. When a name is to be inserted, the whole array is first searched from ‘space’ back to
the beginning of the array. If the name is not found, an entry is created at ‘space’, and ‘space’
is incremented by one (or by the size of the entry).
Following are some important points about the linear linked list:
 It has moderate programming difficulty.
 A brief summary of its operations:
o It is slow in searching: O(n)
o It is fast in insertion: O(1)
o It is fast in scope deletion: O(1)
 It has no size limit.
 It has good scope support.
2. Self Organizing List:
To reduce the search time we can add an additional field ‘linker’ to each record or each
array index. When a name is inserted, it is placed at ‘space’ and the linkers to the other
existing names are updated.

Figure: symbol table for self organizing list

 Balanced search tree:


The structure of the tree represents the sorted order of the records. In this data structure
maintaining balance is important because of the tendency of programmers to use systematic
names with common prefixes/suffixes. It has poor scope support, with the same problems as sorted
arrays. Following are some important points about the balanced search tree:
 It has moderate to high programming difficulty.
 A brief summary of its operations:
o It is moderate in searching: O(log n)
o It is moderate in insertion: O(log n)
o It is slow in scope deletion: O(n)
 It has no size limit.
 It has poor scope support.
 Hash table:
A hash table data structure for a symbol table is subdivided into two categories:
 Closed hash table: This data structure is an array of symbol table records inserted in an
apparently random order. A hash function applied to the name determines the location in the
table where the symbol record should be stored.
A collision resolution strategy must be employed to deal with names that hash to the same
value.
Following are some important points about the closed hash table:
 It has moderate programming difficulty.
 A brief summary of its operations:
o It is fast in searching: O(1)
o It is fast in insertion: O(1)
o It is slow in scope deletion: O(n)
 It has a fixed size.
 It has poor scope support.

 Open hash table:

A hash table, or hash map, is a data structure that associates keys with values. ‘Open
hashing’ is one way of organizing such a table: the data structure is an array of pointers to
linear linked lists of symbol table records.
All symbols whose names hash to the same value are placed on the same list. New records are
inserted at the start of the list.
It has good scope support; symbols in the most recently entered scope are at the
front of the list.
Following are some important points about the open hash table:
 It has high programming difficulty.
 A brief summary of its operations:
o It is fast in searching: O(1)
o It is fast in insertion: O(1)
o It is fast in scope deletion: O(1)
 It has no fixed size.
 It has good scope support.
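An open hash table with scope support can be sketched as below. The bucket count, the multiply-by-31 hash function, and the names `hash_insert`, `hash_lookup`, `open_scope`, `close_scope` are all assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64   /* assumed number of lists */

struct node { const char *name; int scope; struct node *next; };

static struct node *bucket[N_BUCKETS];  /* array of pointers to linked lists */
static int depth = 0;                   /* current scope nesting depth */

/* A simple string hash, chosen only for illustration. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % N_BUCKETS;
}

/* Insert at the start of the list so the newest scope is found first. */
struct node *hash_insert(const char *name)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    unsigned h = hash(name);
    n->name = name;
    n->scope = depth;
    n->next = bucket[h];
    bucket[h] = n;
    return n;
}

/* Walk one list only; returns the innermost declaration, or NULL. */
struct node *hash_lookup(const char *name)
{
    for (struct node *n = bucket[hash(name)]; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0) return n;
    return NULL;
}

void open_scope(void) { depth++; }

/* Entries of the closing scope sit at the front of each list. */
void close_scope(void)
{
    for (int i = 0; i < N_BUCKETS; i++)
        while (bucket[i] != NULL && bucket[i]->scope == depth) {
            struct node *dead = bucket[i];
            bucket[i] = dead->next;
            free(dead);
        }
    depth--;
}
```

Note that this particular sketch visits every bucket when a scope closes; an implementation that also threads the entries of each scope together could avoid that scan.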

Representing Scope Information:


The scope of a name is the portion of the program over which it may be used. In other words, each
name possesses a region of validity within the source program, called the scope of that name. The
rules governing the scope of names in a block-structured language are as follows:
 A name declared within block B is valid only within B.
 If block B1 is nested within B2, then any name that is valid for B2 is also valid for B1, unless
the identifier for that name is re-declared in B1.
These rules require a more complicated symbol table organization than simply a list of associations
between names and attributes. One technique is to keep multiple symbol tables, one for each active
block:
 Each table is a list of names and their associated attributes, and the tables are organized on
a stack.
 Whenever a new block is entered, a new table is pushed on the stack. When a declaration is
compiled, the table on top of the stack is searched for the name.
 If the name is not found, it is inserted.
 When a reference is translated, the name is searched for in all tables, starting from the top.
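The stack-of-tables technique can be sketched as follows. The limits and the names `enter_block`, `leave_block`, `declare` and `resolve` are hypothetical; each table stores only names here, with attributes omitted for brevity.

```c
#include <string.h>

#define MAX_DEPTH 16   /* assumed maximum block nesting */
#define MAX_NAMES 32   /* assumed maximum names per block */

struct table { const char *names[MAX_NAMES]; int count; };

static struct table stack[MAX_DEPTH];
static int top = -1;   /* index of the table for the innermost active block */

/* Entering a block pushes a new, empty table. Returns its level or -1. */
int enter_block(void)
{
    if (top + 1 == MAX_DEPTH) return -1;
    stack[++top].count = 0;
    return top;
}

/* Leaving a block pops its table. */
void leave_block(void) { if (top >= 0) top--; }

static int find_in(const struct table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return i;
    return -1;
}

/* Declaration: search only the top table and insert the name if absent.
   Returns 1 if inserted, 0 if already declared in this block, -1 on error. */
int declare(const char *name)
{
    if (top < 0) return -1;
    struct table *t = &stack[top];
    if (find_in(t, name) >= 0) return 0;
    if (t->count == MAX_NAMES) return -1;
    t->names[t->count++] = name;
    return 1;
}

/* Reference: search all tables from the top. Returns the block level, or -1. */
int resolve(const char *name)
{
    for (int level = top; level >= 0; level--)
        if (find_in(&stack[level], name) >= 0) return level;
    return -1;
}
```

A name re-declared in an inner block resolves to the inner level until `leave_block()` pops that table.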
Another technique is to represent scope information in the symbol table.
 Store the nesting depth of each procedure block in the symbol table.
 Use the (procedure name, nesting depth) pair as the key to accessing the information from
the table.
 The nesting depth of a procedure is a number obtained by starting with a value of
one for the main procedure and adding one every time we go from an enclosing to an enclosed
procedure. It counts the number of procedures in the referencing environment of a
procedure.

 ERROR HANDLING, ERROR DETECTION AND RECOVERY:-


ERROR:-
Programs submitted to a compiler often have errors of various kinds. A good compiler
should therefore be able to detect as many errors as possible and also recover from
them, i.e. even in the presence of errors the compiler should scan the whole program and try to
compile all of it (error recovery).

CLASSIFICATION OF ERRORS:-

 Lexical Error:-
This type of error is detected during the lexical analysis phase. Typical lexical phase errors
are:
o Spelling errors, which produce incorrect tokens.
o Exceeding the length limit of identifiers or numeric constants.
o Appearance of illegal characters.
Ex:
fi ( )
{
}
In the above code, 'fi' cannot be recognized as a misspelling of the keyword if; rather, the
lexical analyzer will treat it as an identifier and return it as a valid identifier. Thus
misspelling causes errors in token formation.

 Syntax error:-
This type of error appears during the syntax analysis phase of the compiler. Typical errors are:
o Errors in structure
o Missing operators
o Unbalanced parentheses
The parser requests tokens from the lexical analyzer, and if the tokens do not satisfy the
grammatical rules of the programming language, a syntax error is raised.

 Semantic error:-
This type of error is detected during the semantic analysis phase. Typical errors are:
o Incompatible types of operands
o Undeclared variables
o Mismatch between actual arguments and formal arguments

 Error recovery strategies:-


o Panic mode
o Phrase level recovery
o Error production
o Global correction

 Panic mode:-
o This strategy is used by most parsing methods and is simple to implement.
o In this method, on discovering an error, the parser discards input symbols one at a time.
This process continues until one of a designated set of synchronizing tokens is
found. Synchronizing tokens are delimiters such as semicolon or end. These tokens
indicate the end of an input statement.
o Thus in panic mode recovery a considerable amount of input is skipped without
checking it for additional errors.
o This method is guaranteed not to go into an infinite loop.
o If there are few errors in the same statement, this strategy is the best
choice.
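The skipping step of panic mode can be sketched as below. The token codes and the name `panic_skip` are assumptions; the sketch also treats end of input as a synchronizing token, which is what keeps the loop from running forever.

```c
/* Hypothetical token codes for illustration. */
enum token { TOK_ID, TOK_NUM, TOK_PLUS, TOK_SEMI, TOK_END, TOK_EOF };

/* Synchronizing tokens: statement delimiters, plus end of input. */
static int is_sync(enum token t)
{
    return t == TOK_SEMI || t == TOK_END || t == TOK_EOF;
}

/* Discard tokens one at a time starting at pos until a synchronizing
   token is found, and return its index. The token array must end with
   TOK_EOF. Nothing that is skipped is checked for further errors. */
int panic_skip(const enum token *toks, int pos)
{
    while (!is_sync(toks[pos]))
        pos++;
    return pos;
}
```

After the call, the parser would resume with the statement that follows the returned position.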
 Phrase level recovery:-
o In this method, on discovering an error the parser performs a local correction on the
remaining input.
o It can replace a prefix of the remaining input by some string that allows the parser to
continue its job.
o The local correction can be replacing a comma by a semicolon, deleting an extra semicolon or
inserting a missing semicolon; the choice of local correction is decided by the compiler
designer.
o While doing the replacement, care should be taken not to go into an infinite loop.
o This method is used in many error-repairing compilers.

 Error production:-
o If we have knowledge of common errors that can be encountered, then we can
incorporate these errors by augmenting the grammar of the corresponding language
with error productions that generate the erroneous constructs.
o If an error production is used during parsing, we can generate an appropriate error
message and parsing can continue.
o This method is extremely difficult to maintain, because if we change the grammar then it
becomes necessary to change the corresponding error productions.

 Global correction:-
o Ideally we want a compiler that makes as few changes as possible in processing an
incorrect input string.
o We expect the smallest number of insertions, deletions, and changes of tokens needed to
recover from erroneous input.
o Such methods increase time and space requirements at parsing time.
o Global correction is thus mainly a theoretical concept.

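Error recovery in predictive parsing:-

Consider the grammar given below, shown with the FOLLOW set of each non-terminal:

Production            FOLLOW
E  ::= TE’            FOLLOW(E)  = {$, )}
E’ ::= +TE’ | ε       FOLLOW(E’) = {$, )}
T  ::= FT’            FOLLOW(T)  = {+, $, )}
T’ ::= *FT’ | ε       FOLLOW(T’) = {+, $, )}
F  ::= (E) | id       FOLLOW(F)  = {+, *, $, )}

For every non-terminal A, insert ‘synch’ in the parsing table entry M[A, a] for each terminal a in
FOLLOW(A), wherever that entry would otherwise be blank. ‘synch’ indicates where parsing can resume.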
Parsing table with synch entries (rows: non-terminals, columns: input symbols):

NT  | id        | +           | *           | (         | )        | $
E   | E => TE’  |             |             | E => TE’  | synch    | synch
E’  |           | E’ => +TE’  |             |           | E’ => ε  | E’ => ε
T   | T => FT’  | synch       |             | T => FT’  | synch    | synch
T’  |           | T’ => ε     | T’ => *FT’  |           | T’ => ε  | T’ => ε
F   | F => id   | synch       | synch       | F => (E)  | synch    | synch

Parsing the erroneous input )id*+id$ (top of stack at the right):

Stack        | Input     | Remarks
$E           | )id*+id$  | Error, skip )
$E           | id*+id$   |
$E’ T        | id*+id$   |
$E’ T’ F     | id*+id$   |
$E’ T’ id    | id*+id$   |
$E’ T’       | *+id$     |
$E’ T’ F *   | *+id$     |
$E’ T’ F     | +id$      | Error, M[F,+]=synch
$E’ T’       | +id$      | F has been popped.
$E’          | +id$      |
$E’ T +      | +id$      |
$E’ T        | id$       |
$E’ T’ F     | id$       |
$E’ T’ id    | id$       |
$E’ T’       | $         |
$E’          | $         |
$            | $         |


o If the parser looks up entry M[A,a] and finds that it is blank, then the input symbol a is skipped.
o If the entry is “synch”, then the non-terminal on top of the stack is popped in an attempt to
resume parsing.
o If a token on top of the stack does not match the input symbol, then the token is popped from
the stack.
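The three rules can be tried out on this grammar with a small table-driven parser. This is a sketch under assumptions: each symbol is encoded as one character (i for id, e for E’, t for T’), the function name `parse` is hypothetical, and one extra rule is assumed so that the sketch behaves like the trace for )id*+id$: when the start symbol is the only non-terminal left on the stack, a synch entry skips the input symbol instead of popping.

```c
#include <string.h>

static const char SYNCH[] = "synch";      /* marker for synch entries */
static const char NONTERMS[] = "EeTtF";   /* e stands for E', t for T' */
static const char TERMS[] = "i+*()$";     /* i stands for the token id */

/* M[A][a]: a right-hand side, SYNCH, or NULL for a blank entry. */
static const char *const M[5][6] = {
    /*          i     +      *      (      )      $     */
    /* E  */ { "Te",  NULL,  NULL,  "Te",  SYNCH, SYNCH },
    /* E' */ { NULL,  "+Te", NULL,  NULL,  "",    ""    },
    /* T  */ { "Ft",  SYNCH, NULL,  "Ft",  SYNCH, SYNCH },
    /* T' */ { NULL,  "",    "*Ft", NULL,  "",    ""    },
    /* F  */ { "i",   SYNCH, SYNCH, "(E)", SYNCH, SYNCH },
};

/* Returns the number of errors recovered from, or -1 for unusable input.
   The input must end with '$'. */
int parse(const char *ip)
{
    char stack[128];
    int top = 0, errors = 0;
    stack[top++] = '$';
    stack[top++] = 'E';
    while (top > 0) {
        char X = stack[top - 1];
        const char *col = *ip ? strchr(TERMS, *ip) : NULL;
        if (col == NULL) return -1;          /* unknown symbol or missing $ */
        const char *row = strchr(NONTERMS, X);
        if (row == NULL) {                   /* a token (or $) is on top */
            if (X != *ip) errors++;          /* mismatch: just pop the token */
            else if (X != '$') ip++;         /* match: consume the input symbol */
            top--;
            continue;
        }
        const char *rhs = M[row - NONTERMS][col - TERMS];
        if (rhs == NULL) {                   /* blank entry: skip input symbol */
            errors++;
            ip++;
        } else if (rhs == SYNCH) {
            errors++;
            if (top == 2 && *ip != '$') ip++;  /* assumed rule, see lead-in */
            else top--;                        /* pop the non-terminal */
        } else {
            size_t len = strlen(rhs);
            top--;                           /* replace X by its right-hand side */
            if (top + len > sizeof stack) return -1;
            while (len > 0) stack[top++] = rhs[--len];  /* pushed in reverse */
        }
    }
    return errors;
}
```

With this encoding the erroneous input of the trace is written `")i*+i$"`, and the sketch recovers from the same two errors: the leading ) and the missing operand between * and +.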
Error recovery in LR parsing:-
An LR parser detects an error when it consults the parsing action table and finds an error entry.
Consider the expression grammar E -> E+E | E*E | (E) | id, with the productions numbered 1 to 4
in that order and the augmenting production E’ -> E. Its sets of items are:

I0: E’->.E    E->.E+E   E->.E*E   E->.(E)   E->.id
I1: E’->E.    E->E.+E   E->E.*E
I2: E->(.E)   E->.E+E   E->.E*E   E->.(E)   E->.id
I3: E->id.
I4: E->E+.E   E->.E+E   E->.E*E   E->.(E)   E->.id
I5: E->E*.E   E->.E+E   E->.E*E   E->.(E)   E->.id
I6: E->(E.)   E->E.+E   E->E.*E
I7: E->E+E.   E->E.+E   E->E.*E
I8: E->E*E.   E->E.+E   E->E.*E
I9: E->(E).


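The parsing table given below shows error detection and recovery. Columns id to $ are the action
part; column E is the goto part.

State | id | +  | *  | (  | )  | $   | E
0     | S3 | E1 | E1 | S2 | E2 | E1  | 1
1     | E3 | S4 | S5 | E3 | E2 | Acc | -
2     | S3 | E1 | E1 | S2 | E2 | E1  | 6
3     | R4 | R4 | R4 | R4 | R4 | R4  | -
4     | S3 | E1 | E1 | S2 | E2 | E1  | 7
5     | S3 | E1 | E1 | S2 | E2 | E1  | 8
6     | E3 | S4 | S5 | E3 | S9 | E4  | -
7     | R1 | R1 | S5 | R1 | R1 | R1  | -
8     | R2 | R2 | R2 | R2 | R2 | R2  | -
9     | R3 | R3 | R3 | R3 | R3 | R3  | -
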
The error routines are as follows:
o E1: Push an imaginary id onto the stack and cover it with state 3.
Issue the diagnostic “missing operands”. This routine is called from states 0, 2, 4 and 5, all of
which expect the beginning of an operand, either an id or a left parenthesis. Instead, an
operator + or *, or the end of the input, was found.
o E2: Remove the right parenthesis from the input.
Issue the diagnostic “unbalanced right parenthesis”. This routine is called from states 0, 1, 2, 4
and 5 on finding a right parenthesis.
o E3: Push + onto the stack and cover it with state 4.
Issue the diagnostic “missing operator”. This routine is called from states 1 or 6 when
expecting an operator and an id or left parenthesis is found.
o E4: Push a right parenthesis onto the stack and cover it with state 9.
Issue the diagnostic “missing right parenthesis”. This routine is called from state 6 when the
end of the input is found. State 6 expects an operator or a right parenthesis.

Parsing the erroneous input id+)$:

Stack      | Input  | Error message and action
0          | id+)$  |
0id3       | +)$    |
0E1        | +)$    |
0E1+4      | )$     |
0E1+4      | $      | “unbalanced right parenthesis”; e2 removes right parenthesis
0E1+4id3   | $      | “missing operands”; e1 pushes id 3 on stack
0E1+4E7    | $      |
0E1        | $      |
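The table and the four error routines can be exercised with a small driver. This is only a sketch: the encoding of the table as two-character cells, the single-character tokens (i for id), and the name `lr_parse` are assumptions made for illustration, and only states are kept on the stack.

```c
#include <string.h>

static const char TERMS[] = "i+*()$";   /* i stands for the token id */

/* Action part of the table, two characters per cell, columns in the order
   id + * ( ) $ : sN shift, rN reduce, eN error routine, ac accept. */
static const char *const ACTION[10] = {
    "s3e1e1s2e2e1",  /* state 0 */
    "e3s4s5e3e2ac",  /* state 1 */
    "s3e1e1s2e2e1",  /* state 2 */
    "r4r4r4r4r4r4",  /* state 3 */
    "s3e1e1s2e2e1",  /* state 4 */
    "s3e1e1s2e2e1",  /* state 5 */
    "e3s4s5e3s9e4",  /* state 6 */
    "r1r1s5r1r1r1",  /* state 7 */
    "r2r2r2r2r2r2",  /* state 8 */
    "r3r3r3r3r3r3",  /* state 9 */
};
static const int GOTO_E[10] = { 1, -1, 6, -1, 7, 8, -1, -1, -1, -1 };
static const int RHS_LEN[5] = { 0, 3, 3, 3, 1 };  /* length of productions 1 to 4 */

/* Parses input (which must end with '$'). The digit of every error routine
   called is appended to fired (capacity cap, at least 1).
   Returns 1 on accept, 0 if the input cannot be processed. */
int lr_parse(const char *input, char *fired, int cap)
{
    int stack[256], top = 0, nf = 0;
    stack[0] = 0;
    fired[0] = '\0';
    for (;;) {
        const char *col = *input ? strchr(TERMS, *input) : NULL;
        if (col == NULL) return 0;
        const char *cell = ACTION[stack[top]] + 2 * (col - TERMS);
        int arg = cell[1] - '0';
        switch (cell[0]) {
        case 'a':                                  /* accept */
            return 1;
        case 's':                                  /* shift and consume */
            if (top + 1 >= 256) return 0;
            stack[++top] = arg;
            input++;
            break;
        case 'r':                                  /* reduce by production arg */
            top -= RHS_LEN[arg];
            stack[top + 1] = GOTO_E[stack[top]];
            top++;
            break;
        case 'e':                                  /* error routine E1..E4 */
            if (nf + 1 < cap) { fired[nf++] = cell[1]; fired[nf] = '\0'; }
            if (arg == 2) {
                input++;                           /* E2: drop the ')' */
            } else {
                if (top + 1 >= 256) return 0;
                /* E1 covers with state 3, E3 with state 4, E4 with state 9 */
                stack[++top] = (arg == 1) ? 3 : (arg == 3) ? 4 : 9;
            }
            break;
        }
    }
}
```

For the input of the trace, written `"i+)$"` in this encoding, the driver calls E2 and then E1 before accepting, matching the two messages in the table above.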
 Run Time Environment/ Administration:-
o Storage Organization
o Storage Allocation
o Activation Record
 Storage Organization:-
o When the target program executes, it runs in its own logical address space
in which each program value has a location.
o The management and organization of this logical address space is shared among the
compiler, the operating system and the target machine. The operating system
maps the logical addresses into physical addresses, which are usually spread
throughout the memory.
Subdivision of Run-time Memory:-
 Run-time storage comes in blocks of contiguous bytes, where a byte is the smallest
unit of addressable memory.
 Four bytes form a machine word.
 A multibyte object is stored in consecutive bytes and is given the address of its first
byte.
 Run-time storage can be subdivided to hold the different components of an
executing program:
o Generated executable code
o Static data objects
o Dynamic data-object- heap
o Automatic data objects- stack

 Storage Allocation:-
 The runtime environment manages runtime memory requirements for the following
entities:
 Code: It is known as the text part of a program and does not change at runtime. Its
memory requirements are known at compile time.
 Procedures: Their text part is static but they are called in an unpredictable order. That is
why stack storage is used to manage procedure calls and activations.
 Variables: Variables are known at runtime only, unless they are global or
constant. The heap memory allocation scheme is used for managing allocation and
de-allocation of memory for variables at runtime.
The different ways to allocate memory are:
o Static storage allocation
o Dynamic storage allocation
• Stack storage allocation
• Heap storage allocation
 Static storage allocation:-
• In static allocation, names are bound to storage locations at compile time.
• If memory is laid out at compile time, it is created in the static area
and only once.
• Static allocation does not support dynamic data structures: memory is laid out
only at compile time and deallocated after program completion.
• The drawback with static storage allocation is that the size and position of data
objects must be known at compile time.
• Another drawback is that recursive procedures are restricted.
 Stack Storage Allocation:-
• In stack storage allocation, storage is organized as a stack.
• An activation record is pushed onto the stack when an activation begins and it is popped
when the activation ends.
• The activation record contains the locals, so that they are bound to fresh storage in each
activation. The values of the locals are deleted when the activation ends.
• It works on a last-in-first-out (LIFO) basis, and this allocation supports
recursion.
 Heap Storage Allocation:-
• Heap allocation is the most flexible allocation scheme.
• Allocation and deallocation of memory can be done at any time and at any place
depending upon the user's requirement.
• Heap allocation is used to allocate memory to variables dynamically and to reclaim it
when the variables are no longer used.
• Heap storage allocation supports the recursion process.
Example of Dynamic Allocation in Heap:- the call fact(6)
int fact(int n)
{
    if (n <= 1)
        return 1;
    else
        return n * fact(n - 1);   /* each recursive call is a new activation */
}
Difference between Static , Stack and Heap Allocation:-

Activation Record:-
 A program is a sequence of instructions combined into a number of procedures.
 Instructions in a procedure are executed sequentially.
 A procedure has a start and an end delimiter and everything inside it is called the body of
the procedure.
 The execution of a procedure is called its activation.
 An activation record contains all the necessary information required to call a procedure.
 The control stack is a run-time stack which is used to keep track of the live procedure
activations, i.e. it is used to find the procedures whose execution has not been completed.
 When a procedure is called (activation begins), its name is pushed onto the stack, and
when it returns (activation ends), it is popped.
 Activation record is used to manage the information needed by a single execution of a
procedure.
 An activation record is pushed into the stack when a procedure is called and it is popped
when the control returns to the caller function.
An activation record may contain the following units (depending upon the source
language used):-
Return Value: It is used by the called procedure to return a value to the calling procedure.
Actual Parameter: It is used by calling procedures to supply parameters to the called
procedures.
Control Link: It points to activation record of the caller.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds the information about status of machine before the procedure
is called.
Local Data: It holds the data that is local to the execution of the procedure.
Temporaries: It stores the value that arises in the evaluation of an expression.
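The units listed above can be pictured as fields of one record on the control stack. The sketch below is illustrative only: the field sizes, the use of a saved program counter as a stand-in for the saved machine status, and the names `call_proc` and `return_proc` are assumptions.

```c
#include <string.h>

#define MAX_PARAMS 4    /* assumed limits, for illustration only */
#define MAX_LOCALS 4
#define MAX_TEMPS  4
#define MAX_FRAMES 64

struct activation_record {
    long return_value;                       /* Return Value */
    long actual_params[MAX_PARAMS];          /* Actual Parameters */
    struct activation_record *control_link;  /* Control Link: caller's record */
    struct activation_record *access_link;   /* Access Link: non-local data */
    int saved_pc;                            /* stand-in for Saved Machine Status */
    long locals[MAX_LOCALS];                 /* Local Data */
    long temporaries[MAX_TEMPS];             /* Temporaries */
};

static struct activation_record control_stack[MAX_FRAMES];
static int n_frames = 0;

/* Activation begins: push a fresh record. Returns NULL if the stack is full. */
struct activation_record *call_proc(long arg0, int return_pc)
{
    if (n_frames == MAX_FRAMES) return NULL;
    struct activation_record *ar = &control_stack[n_frames];
    memset(ar, 0, sizeof *ar);               /* locals get fresh storage */
    ar->actual_params[0] = arg0;
    ar->control_link = (n_frames > 0) ? &control_stack[n_frames - 1] : NULL;
    ar->saved_pc = return_pc;
    n_frames++;
    return ar;
}

/* Activation ends: record the result, pop the record, and hand control
   back to the caller's record (NULL when the outermost call returns). */
struct activation_record *return_proc(long value)
{
    if (n_frames == 0) return NULL;
    struct activation_record *ar = &control_stack[--n_frames];
    ar->return_value = value;
    return ar->control_link;
}
```

Calling `call_proc` twice and then `return_proc` once leaves only the first record live, which is the last-in-first-out discipline of the control stack.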

Activation Tree:-
• Program control flows in a sequential manner, and when a procedure is called,
control is transferred to the called procedure.
• When a called procedure finishes executing, it returns control to the caller.
• This type of control flow makes it easy to represent a series of activations in the form of a
tree, known as the activation tree.

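• Example:-
int n;                            /* number of elements; set by readarray() */
void readarray(void);
int partition(int m, int n);
void quicksort(int m, int n);
int main(void)
{
    readarray();
    quicksort(1, n);
}
void quicksort(int m, int n)
{
    if (m < n) {                  /* stop when the range has at most one element */
        int i = partition(m, n);
        quicksort(m, i - 1);
        quicksort(i + 1, n);
    }
}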
Figure: the program, its sequence of procedure calls, and the resulting activation tree.
