SSC Module3 SyntaxAnalysis


Module 3

Syntax Analysis
Chapter 4 (4.1 – 4.5)

Sunitha G 1
Outline
⚫Introduction
⚫Context free grammars
⚫Writing a grammar
⚫Top down parsing
⚫Bottom up parsing

The role of parser

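[Figure: the source program enters the Lexical Analyzer, which returns a token to the Parser each time the Parser calls getNextToken. The Parser passes the parse tree to the Rest of Front End, which produces the intermediate representation. These phases share the Symbol table.]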
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Error handling
⚫Common programming errors
⚫Lexical errors
⚫Syntactic errors
⚫Semantic errors
⚫Logical errors
⚫Error handler goals
⚫Report the presence of errors clearly and accurately
⚫Recover from each error quickly enough to detect
subsequent errors
⚫Add minimal overhead to the processing of correct
programs
Error-recovery strategies
⚫Panic mode recovery
⚫Discard input symbols one at a time until one of a
designated set of synchronizing tokens is found
⚫Phrase level recovery
⚫Replacing a prefix of remaining input by some string
that allows the parser to continue
⚫Error productions
⚫Augment the grammar with productions that generate
the erroneous constructs
⚫Global correction
⚫Choosing minimal sequence of changes to obtain a
globally least-cost correction

Context free grammars
A CFG is defined as G = (V, T, P, S)
⚫V is the finite set of non-terminals (variables)
⚫T is the finite set of terminals (tokens)
⚫P is the finite set of production rules of the following form
⚫A -> α where
⚫A is a non-terminal and
⚫α is a string of terminals and non-terminals (including the empty string)
⚫S is the start symbol (one of the non-terminal symbols)

Example:
expr -> expr + term
expr -> expr – term
expr -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expr)
factor -> id

The same grammar in compact form:
E -> E+T | E-T | T
T -> T*F | T/F | F
F -> ( E ) | id
Notational conventions:
1. Symbols used for terminals are :
a. Lower case letters early in the alphabet (such as a, b, c, . . .)
b. Operator symbols (such as +, *, . . . )
c. Punctuation symbols (such as parenthesis, comma and so on)
d. The digits(0…9)
e. Boldface strings and keywords (such as id or if)
2. Symbols used for non terminals are:
a. Uppercase letters early in the alphabet (A, B, C, …)
b. The letter S, which when it appears is usually the start symbol.
c. Lowercase, italic names (such as expr or stmt).
3. Lower case greek letters such as α, β, γ represent (possibly empty)
strings of grammar symbols.
4. X, Y, Z represent grammar symbols(Terminal or Nonterminal)
5. u,v,…,z represent strings of terminals.
6. A -> α1 , A -> α2 , A -> α3 can be written as A -> α1 | α2 | α3

Derivations
⚫Productions are treated as rewriting rules to generate
a string
⚫A sequence of replacements of non-terminal symbols
by their production bodies is called a derivation
⚫Rightmost and leftmost derivations
⚫E -> E + E | E * E | -E | (E) | id
⚫Derivations for –(id+id)
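LMD (leftmost derivation; each step is a left sentential form):
E => -E
=> -(E)
=> -(E+E)
=> -(id+E)
=> -(id+id)
RMD (rightmost derivation; each step is a right sentential form):
E => -E
=> -(E)
=> -(E+E)
=> -(E+id)
=> -(id+id)
⚫If S =>* α, where α may contain non-terminals, then α is a sentential form of G.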
Parse trees
⚫-(id+id)
⚫E => -E
=> -(E)
=> -(E+E)
=> -(id+E)
=>-(id+id)

The leaves of a parse tree when read from left to right constitute a sentential form,
called the yield or frontier of the tree.
Problem:
⚫Obtain LMD and RMD for the following grammar
S -> +SS | *SS | a | b
with string +*+abba
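
One way to check an answer to this problem is to build the parse tree and read both derivations off it. The sketch below assumes the input is a well-formed string of the grammar; the helper names parse and derivation are illustrative only.

```python
def parse(s, i=0):
    """Parse a prefix string for S -> +SS | *SS | a | b.

    Returns (tree, next_index). A tree is a tuple: (operator, left, right)
    for +SS and *SS, or (letter,) for a leaf a or b.
    """
    c = s[i]
    if c in "+*":
        left, i = parse(s, i + 1)
        right, i = parse(s, i)
        return (c, left, right), i
    return (c,), i + 1


def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation."""
    form = [tree]          # tuples are unexpanded S's, strings are terminals
    steps = ["S"]
    while any(isinstance(x, tuple) for x in form):
        pending = [k for k, x in enumerate(form) if isinstance(x, tuple)]
        k = pending[0] if leftmost else pending[-1]
        node = form[k]
        # Replace S by its production body: a terminal followed by subtrees.
        form[k:k + 1] = [node[0], *node[1:]]
        steps.append("".join("S" if isinstance(x, tuple) else x for x in form))
    return steps


tree, _ = parse("+*+abba")
lmd = derivation(tree, leftmost=True)
rmd = derivation(tree, leftmost=False)
print(" => ".join(lmd))
print(" => ".join(rmd))
```

Both derivations use the same productions the same number of times; only the order of expansion differs.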

Ambiguity
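⚫A grammar that produces more than one parse tree
for some sentence is called an ambiguous grammar.
⚫Equivalently, some sentence has more than one leftmost derivation
⚫Or more than one rightmost derivation
⚫Example: id+id*id has two parse trees. With 3+4*5, the tree that groups 4*5 first
evaluates to 23; the tree that groups 3+4 first evaluates to 35.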
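
The effect of ambiguity on meaning can be shown by evaluating both parse trees of 3+4*5. This is a small sketch; the tuple encoding of trees and the names evaluate, tree_mul_first and tree_add_first are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a parse tree given as an int leaf or (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# The two parse trees an ambiguous grammar allows for 3+4*5.
tree_mul_first = ("+", 3, ("*", 4, 5))   # 3 + (4 * 5)
tree_add_first = ("*", ("+", 3, 4), 5)   # (3 + 4) * 5

print(evaluate(tree_mul_first), evaluate(tree_add_first))
```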
Context-Free Grammars versus Regular
Expressions
⚫Every construct that can be described by a regular
expression can be described by a grammar, but not
vice-versa.
⚫Every regular language is a context-free language,
but not vice-versa.

Writing a Grammar
Lexical Versus Syntactic Analysis
"Why do we use regular expressions to define the lexical syntax of a
language?" There are several reasons.
⚫Separating the syntactic structure of a language into lexical and
nonlexical parts provides a convenient way of modularizing the
front end of a compiler into two manageable-sized components.
⚫The lexical rules of a language are frequently quite simple, and to
describe them we do not need a notation as powerful as grammars.
⚫Regular expressions generally provide a more concise and easier-
to-understand notation for tokens than grammars.
⚫More efficient lexical analyzers can be constructed automatically
from regular expressions than from arbitrary grammars.

Elimination of ambiguity
⚫Ambiguous grammars can be disambiguated
according to the precedence and associativity rules.
E -> E+E | E-E | E*E | E/E | id | (E)
⚫Precedence (highest first) and associativity rules:
id, ( )
*, / (left to right)
+, - (left to right)
⚫Unambiguous grammar (introduce new variables T and F):
E -> E+T | E-T | T
T -> T*F | T/F | F
F -> id | (E)
Elimination of ambiguity
from the following "dangling else" grammar:

if E1 then S1 else if E2 then S2 else S3 (1 Parse tree)


if E1 then if E2 then S1 else S2 (2 Parse trees)

Elimination of ambiguity (cont.)
⚫Idea:
⚫A statement appearing between a then and an else
must be matched
⚫Matched statement (ms):
if (E) then ms else ms
⚫Open statement (os):
if (E) then s
if (E) then ms else os
Elimination of left recursion
⚫A grammar is left recursive if it has a non-terminal A
such that there is a derivation A =>+ Aα for some string α
⚫Top-down parsing methods cannot handle left-recursive grammars
⚫A simple rule for direct left recursion elimination:
⚫For a rule like:
⚫A -> A α | β
⚫We may replace it with
⚫A -> β A’
⚫A’ -> α A’ | ɛ
Removal of Immediate left recursion
⚫First, group the productions as
A -> 𝐴𝛼1 | 𝐴𝛼2 |…| 𝐴𝛼𝑚 | 𝛽1| 𝛽2 | ... | 𝛽𝑛
where no 𝛽𝑖 begins with an A.
⚫Then, replace the A-productions by
A -> 𝛽1𝐴′ | 𝛽2𝐴′ | … | 𝛽𝑛𝐴′
A' -> 𝛼1A' | 𝛼2A' | … | 𝛼𝑚A' | ɛ

Example 1:
E -> E + id | id
After removing immediate left recursion:
E -> id E’
E’ -> +id E’ | ɛ

Example 2:
E -> E + id | ɛ
After removing immediate left recursion:
E -> E’
E’ -> +id E’ | ɛ
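
The grouping rule above can be sketched as a short function. The representation is an assumption: a production body is a list of symbols, the empty list stands for ɛ, and the new non-terminal is named by appending an apostrophe. The function name is illustrative only.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Returns a dict mapping each non-terminal to its list of bodies.
    """
    # alphas: the tails of the left-recursive alternatives (A alpha).
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # betas: the alternatives that do not begin with A (including ɛ).
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],   # [] is ɛ
    }


example1 = remove_immediate_left_recursion("E", [["E", "+", "id"], ["id"]])
example2 = remove_immediate_left_recursion("E", [["E", "+", "id"], []])
print(example1)
print(example2)
```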
Problem
⚫Eliminate left recursion from the following grammar:
E -> E+T | E-T | T
T -> T*F | T/F | F
F -> (E) | id
⚫The non-left-recursive expression grammar is
E -> T E'
E' -> + T E' | - T E' | ɛ
T -> FT'
T' -> * F T' | / F T' | ɛ
F -> (E) | id

Indirect Left recursion elimination
⚫There are cases like the following, where S is left recursive through A:
⚫S -> Aa | b
⚫A -> Ac | Sd | ɛ
⚫S => Aa => Sda
⚫Left recursion elimination algorithm:

Arrange the nonterminals in some order A1,A2,…,An.


For (each i from 1 to n) {
For (each j from 1 to i-1) {
Replace each production of the form Ai-> Aj γ by the production
Ai -> δ1 γ | δ2 γ | … |δk γ where
Aj-> δ1 | δ2 | … |δk are all current Aj productions
}
Eliminate left recursion among the Ai-productions
}
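
The algorithm above can be sketched in a few lines. Assumptions: bodies are tuples of symbols, the empty tuple stands for ɛ, and new non-terminals are named with an apostrophe. In general the algorithm is only guaranteed for grammars without cycles or ɛ-productions; for the grammar S -> Aa | b, A -> Ac | Sd | ɛ the ɛ-production happens to be harmless.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict non-terminal -> list of bodies (tuples of symbols)."""
    g = {a: list(bodies) for a, bodies in grammar.items()}
    for i, ai in enumerate(order):
        # Substitute earlier non-terminals Aj that appear at the left end.
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    expanded += [delta + body[1:] for delta in g[aj]]
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [b[1:] for b in g[ai] if b and b[0] == ai]
        if alphas:
            betas = [b for b in g[ai] if not b or b[0] != ai]
            new = ai + "'"
            g[ai] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ɛ" for b in bodies))
```

For this input the result is S -> Aa | b, A -> bdA’ | A’, A’ -> cA’ | adA’ | ɛ.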

Left factoring
⚫Left factoring is a grammar transformation that is
useful for producing a grammar suitable for predictive
or top-down parsing.
⚫Consider following grammar:
Stmt -> if expr then stmt else stmt
| if expr then stmt
⚫On seeing the input token if, it is not clear to the parser which
production to use

⚫We can easily perform left factoring:


⚫If we have A->αβ1 | αβ2 then we replace it with
A -> αA’
A’ -> β1 | β2
Left factoring (cont.)
⚫Algorithm
⚫For each non-terminal A, find the longest prefix α
common to two or more of its alternatives.
⚫If α ≠ ɛ, then replace all of the A-productions
A -> αβ1 | αβ2 | … | αβn | γ, where γ represents all alternatives that do not begin with α, by
⚫A -> αA’ | γ
⚫A’ -> β1 | β2 | … | βn

⚫Example:
⚫S -> i E t S | i E t S e S | a
⚫E -> b
⚫Left-factored result:
⚫S -> i E t S S’ | a
⚫S’ -> e S | ɛ
⚫E -> b
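
One round of left factoring can be sketched as follows. This is a simplification, not the full algorithm: alternatives are grouped by their first symbol and the longest prefix common to the whole group is factored out. Bodies are tuples of symbols, the empty tuple stands for ɛ, and the function names are illustrative.

```python
def common_prefix(bodies):
    """Longest sequence of symbols that starts every body in the list."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(head, bodies):
    """Factor out common prefixes among the alternatives of one non-terminal."""
    groups = {}
    for body in bodies:
        groups.setdefault(body[:1], []).append(body)   # key: first symbol
    result = {head: []}
    count = 0
    for first, group in groups.items():
        if len(group) < 2 or not first:
            result[head] += group                      # nothing to factor
            continue
        count += 1
        new = head + "'" * count                       # A', A'', ...
        alpha = common_prefix(group)
        result[head].append(alpha + (new,))
        result[new] = [body[len(alpha):] for body in group]
    return result


factored = left_factor_once(
    "S", [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
)
print(factored)
```

Repeating the step on the new non-terminals until nothing changes gives the full transformation.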
Top Down Parsing

Introduction
⚫A Top-down parser tries to create a parse tree from
the root towards the leafs scanning input from left to
right
⚫It can be also viewed as finding a leftmost derivation
for an input string
⚫Example: id+id*id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
[Figure: successive parse trees built by the leftmost derivation of id+id*id. The root E is expanded to T E’, T to F T’, F to id, T’ to Ɛ, and E’ to + T E’.]
High level classification of Top-Down Parser:

Recursive descent parsing
⚫Consists of a set of procedures, one for each
nonterminal
⚫Execution begins with the procedure for start symbol
⚫A typical procedure for a non-terminal

void A() {
    choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
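
As a concrete instance of the scheme above, here is a minimal sketch of a recursive descent parser for the grammar E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id. It assumes the input is already a list of tokens; the class name Parser and the helper accepts are illustrative only. Each procedure chooses its production by looking at the next token, so no backtracking is needed for this grammar.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.look != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.look!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_()

    def E_(self):                # E' -> + T E' | ɛ
        if self.look == "+":
            self.match("+")
            self.T()
            self.E_()

    def T(self):                 # T -> F T'
        self.F()
        self.T_()

    def T_(self):                # T' -> * F T' | ɛ
        if self.look == "*":
            self.match("*")
            self.F()
            self.T_()

    def F(self):                 # F -> ( E ) | id
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()               # execution begins with the start symbol
        parser.match("$")        # all input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))
```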
Recursive descent parsing (cont)
⚫General recursive descent may require backtracking
⚫The previous code needs to be modified to allow
backtracking
⚫In its general form it cannot easily choose the right A-production
⚫So we need to try all alternatives
⚫If one fails, the input pointer needs to be reset and
another alternative should be tried
⚫Recursive descent parsers cannot be used for left-recursive grammars
Example
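S -> cAd
A -> ab | a
Input: cad
[Figure: three parse-tree snapshots. First S is expanded to c A d. Then A is expanded to a b, which fails to match the input d. After backtracking, A is expanded to a, and the parse succeeds.]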
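
Backtracking on this example can be sketched with generators: each alternative is tried in order, and a failed alternative simply yields nothing, which resets the input position to where it was. The names derive, match_body and accepts are illustrative; terminals are assumed to be single characters of the input string.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def derive(symbol, text, pos):
    """Yield every input position reachable after matching symbol at pos."""
    if symbol not in GRAMMAR:                 # terminal
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:              # try all alternatives in order
        yield from match_body(body, text, pos)


def match_body(body, text, pos):
    """Match the symbols of one production body left to right."""
    if not body:
        yield pos
        return
    for mid in derive(body[0], text, pos):
        yield from match_body(body[1:], text, mid)


def accepts(text):
    return any(end == len(text) for end in derive("S", text, 0))


print(accepts("cad"))
```

On cad, the alternative A -> ab matches a and then fails on d, so the parser falls back to A -> a. A left-recursive grammar would make derive call itself forever without consuming input.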
First and Follow
⚫First(α) is the set of terminals that begin strings derived
from α
⚫If α =>* ɛ, then ɛ is also in First(α)
⚫In predictive parsing when we have A -> α | β, if First(α)
and First(β) are disjoint sets then we can select the
appropriate A-production by looking at the next input symbol
⚫Follow(A), for any nonterminal A, is the set of terminals
a that can appear immediately after A in some
sentential form
⚫If we have S =>* αAaβ for some α and β, then a is in
Follow(A)
⚫If A can be the rightmost symbol in some sentential form,
then $ is in Follow(A)
Computing First
⚫To compute First(X) for all grammar symbols X,
apply the following rules until no more terminals or ɛ
can be added to any First set:
1. If X is a terminal, then First(X) = {X}.
2. If X is a nonterminal and X -> Y1Y2…Yk is a
production for some k >= 1, then place a in First(X) if
for some i, a is in First(Yi) and ɛ is in all of
First(Y1), …, First(Yi-1); that is, Y1…Yi-1 =>* ɛ. If ɛ is in
First(Yj) for all j = 1, …, k, then add ɛ to First(X).
3. If X -> ɛ is a production, then add ɛ to First(X).
⚫ Example!
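
The rules can be applied mechanically by iterating until no First set changes. The sketch below does this for the expression grammar E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id. The dictionary encoding (empty list for ɛ) and the name first_sets are assumptions for illustration.

```python
EPS = "ɛ"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute First(A) for every non-terminal A by fixed-point iteration."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    # Rule 1: First of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break              # later symbols cannot contribute
                else:
                    # Every symbol can vanish, or the body is ɛ (rules 2, 3).
                    first[head].add(EPS)
                changed |= len(first[head]) != before
    return first


FIRST = first_sets(GRAMMAR)
for nonterminal, symbols in FIRST.items():
    print(nonterminal, sorted(symbols))
```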

Computing Follow
⚫To compute Follow(A) for all nonterminals A, apply
following rules until nothing can be added to any
follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A-> αBβ then everything in
First(β) except ɛ is in Follow(B).
3. If there is a production A->αB or a production
A->αBβ where First(β) contains ɛ, then everything
in Follow(A) is in Follow(B)
⚫ Example!
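
The three rules can be sketched the same way, again iterating until nothing changes. To keep the example short, the First sets of the expression grammar are written in by hand rather than computed; the names first_of_string and follow_sets are illustrative.

```python
EPS, END = "ɛ", "$"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First sets of the non-terminals, assumed to be known already.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """First of a string of grammar symbols; contains ɛ if all can vanish."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})      # a terminal is its own First
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def follow_sets(grammar, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                # Follow is for non-terminals
                    beta_first = first_of_string(body[i + 1:])
                    new = beta_first - {EPS}    # rule 2
                    if EPS in beta_first:
                        new |= follow[head]     # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, "E")
for nonterminal, symbols in FOLLOW.items():
    print(nonterminal, sorted(symbols))
```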

LL(1) Grammars
⚫Predictive parsers are those recursive descent parsers
needing no backtracking
⚫Grammars for which we can create predictive parsers are
called LL(1)
⚫The first L means scanning input from left to right
⚫The second L means leftmost derivation
⚫And 1 stands for using one input symbol for lookahead
⚫A grammar G is LL(1) if and only if whenever A ->
α | β are two distinct productions of G, the following
conditions hold:
⚫For no terminal a do both α and β derive strings beginning
with a
⚫At most one of α and β can derive the empty string
⚫If β =>* ɛ, then α does not derive any string beginning with a
terminal in Follow(A), and vice versa.
Algorithm:
Construction of predictive parsing table
Input: Grammar G
Output: Parsing table M
⚫For each production A->α in grammar, do the
following:
1. For each terminal a in First(α), add A -> α to M[A,a]
2. If ɛ is in First(α), then for each terminal b in
Follow(A), add A -> α to M[A,b]. If ɛ is in First(α)
and $ is in Follow(A), add A -> α to M[A,$] as well
⚫If, after performing the above, there is no production
in M[A,a], then set M[A,a] to error
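
The construction can be sketched as below for the expression grammar, with its First and Follow sets written in by hand. The table is a dictionary keyed by (non-terminal, terminal); missing keys are the error entries, and each entry is a list so that a conflict (more than one production in a cell) would be visible. All names are illustrative.

```python
EPS = "ɛ"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """First of a production body; contains ɛ if the whole body can vanish."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def build_table(grammar):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body)
            targets = body_first - {EPS}        # step 1
            if EPS in body_first:
                targets |= FOLLOW[head]         # step 2 ($ is in Follow)
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table


M = build_table(GRAMMAR)
is_ll1 = all(len(entries) == 1 for entries in M.values())
print(len(M), is_ll1)
```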

Example
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Symbol   First      Follow
F        {(, id}    {+, *, ), $}
T        {(, id}    {+, ), $}
E        {(, id}    {), $}
E’       {+, ɛ}     {), $}
T’       {*, ɛ}     {+, ), $}

Parsing table (rows: non-terminals; columns: input symbols; blank entries are errors)
Non-terminal | id        | +           | *           | (         | )        | $
E            | E -> TE’  |             |             | E -> TE’  |          |
E’           |           | E’ -> +TE’  |             |           | E’ -> Ɛ  | E’ -> Ɛ
T            | T -> FT’  |             |             | T -> FT’  |          |
T’           |           | T’ -> Ɛ     | T’ -> *FT’  |           | T’ -> Ɛ  | T’ -> Ɛ
F            | F -> id   |             |             | F -> (E)  |          |
Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b

Symbol   First     Follow
S        {i, a}    {e, $}
S’       {e, Ɛ}    {e, $}
E        {b}       {t}

Non-terminal | a       | b       | e                  | i            | t | $
S            | S -> a  |         |                    | S -> iEtSS’  |   |
S’           |         |         | S’ -> Ɛ, S’ -> eS  |              |   | S’ -> Ɛ
E            |         | E -> b  |                    |              |   |

The entry M[S’, e] holds two productions, so this grammar is not LL(1).
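Non-recursive predictive parsing
[Figure: model of a table-driven predictive parser. The input buffer holds a + b $; the stack holds X Y Z $ with X on top; the predictive parsing program reads the input and the stack, consults the parsing table M, and produces the output.]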
Predictive parsing algorithm
Let a be the first symbol of w;
Set X to the top stack symbol;
while (X ≠ $) { /* stack is not empty */
    if (X is a) pop the stack and let a be the next symbol of w;
    else if (X is a terminal) error();
    else if (M[X,a] is an error entry) error();
    else if (M[X,a] = X -> Y1Y2…Yk) {
        output the production X -> Y1Y2…Yk;
        pop the stack;
        push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
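
The loop above can be sketched directly. The table is hard-coded for the expression grammar E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id; missing keys are error entries and an empty list is an Ɛ body. The names TABLE, NONTERMINALS and predictive_parse are illustrative. One check is added that the pseudocode leaves implicit: the input must be exhausted when the stack reaches $.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]               # the top of the stack is the list end
    pos = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                   # terminal matches: pop and advance
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
        elif (top, a) not in TABLE:
            raise SyntaxError(f"no entry M[{top}, {a}]")
        else:
            body = TABLE[(top, a)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))   # Y1 ends up on top
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r} after end of parse")
    return output


steps = predictive_parse(["id", "+", "id", "*", "id"])
for head, body in steps:
    print(head, "->", " ".join(body) or "ɛ")
```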

Example
⚫Parsing id+id*id$ with the parsing table M for the grammar
E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id

Matched | Stack | Input      | Action
        | E$    | id+id*id$  |
Error recovery in predictive parsing
⚫Panic mode
⚫Place all symbols in Follow(A) into synchronization set for
nonterminal A: skip tokens until an element of Follow(A) is seen
and pop A from stack.
⚫Add to the synchronization set of lower level construct the
symbols that begin higher level constructs. Ex: ; in C.
⚫Add symbols in First(A) to the synchronization set of nonterminal
A
⚫If a nonterminal can generate the empty string then the
production deriving Ɛ can be used as a default
⚫If a terminal on top of the stack cannot be matched, pop the
terminal, issue a message saying that the terminal was inserted and
continue parsing.

Example
Non-terminal | id        | +           | *           | (         | )        | $
E            | E -> TE’  |             |             | E -> TE’  | synch    | synch
E’           |           | E’ -> +TE’  |             |           | E’ -> Ɛ  | E’ -> Ɛ
T            | T -> FT’  | synch       |             | T -> FT’  | synch    | synch
T’           |           | T’ -> Ɛ     | T’ -> *FT’  |           | T’ -> Ɛ  | T’ -> Ɛ
F            | F -> id   | synch       | synch       | F -> (E)  | synch    | synch

Stack     | Input     | Action
E$        | )id*+id$  | Error, skip )
E$        | id*+id$   | id is in First(E)
TE’$      | id*+id$   |
FT’E’$    | id*+id$   |
idT’E’$   | id*+id$   |
T’E’$     | *+id$     |
*FT’E’$   | *+id$     |
FT’E’$    | +id$      | Error, M[F,+] = synch
T’E’$     | +id$      | F has been popped
⚫Phrase-level Recovery
⚫Phrase-level error recovery is implemented by filling in
the blank entries in the predictive parsing table with
pointers to error routines.
⚫These routines may change, insert, or delete symbols
on the input and issue appropriate error messages.
They may also pop from the stack.
Problems:
⚫First, the steps carried out by the parser might then
not correspond to the derivation of any word in the
language at all.
⚫Second, we must ensure that there is no possibility of
an infinite loop.

Bottom-up Parsing

Introduction
⚫Constructs parse tree for an input string beginning
at the leaves (the bottom) and working towards the
root (the top)
⚫Example: id*id
E -> E + T | T
T -> T * F | F
F -> (E) | id
[Figure: sequence of parse-tree snapshots for the bottom-up parse of id*id, corresponding to the strings id*id, F*id, T*id, T*F, T, E.]
Shift-reduce parser
⚫The general idea is to shift some symbols of input to
the stack until a reduction can be applied
⚫At each reduction step, a specific substring matching
the body of a production is replaced by the
nonterminal at the head of the production
⚫The key decisions during bottom-up parsing are
about when to reduce and about what production to
apply
⚫A reduction is a reverse of a step in a derivation
⚫The goal of a bottom-up parser is to construct a
derivation in reverse:
⚫E=>T=>T*F=>T*id=>F*id=>id*id
Handle pruning
⚫A Handle is a substring that matches the body of a
production and whose reduction represents one step
along the reverse of a rightmost derivation

E -> E + T | T
T -> T * F | F
F -> (E) | id

Right sentential form | Handle | Reducing production
id*id                 | id     | F -> id
F*id                  | F      | T -> F
T*id                  | id     | F -> id
T*F                   | T*F    | T -> T*F
T                     | T      | E -> T
⚫If S =>* αAw => αβw by a rightmost derivation, then the production A -> β in the position
following α is a handle of αβw.
[Figure: a handle A -> β in the parse tree for αβw. The root S has frontier α β w, and A is the parent of β.]
⚫A rightmost derivation in reverse can be obtained by "handle pruning".
⚫We start with a string of terminals w to be parsed. If w is a sentence of the
grammar at hand, then let w = γn, where γn is the nth right-sentential
form of some as yet unknown rightmost derivation
S = γ0 => γ1 => γ2 => … => γn-1 => γn = w
⚫To reconstruct this derivation in reverse order, we locate the handle βn in γn and
replace βn by the head of the relevant production An -> βn to obtain the
previous right-sentential form γn-1.
⚫We then repeat this process.
Shift-Reduce parsing
⚫A stack is used to hold grammar symbols
⚫The handle always appears on top of the stack
⚫Initial configuration:
Stack Input
$ w$
⚫Acceptance configuration
Stack Input
$S $

Shift-Reduce parsing (cont.)
⚫Basic operations:
⚫Shift
⚫Reduce
⚫Accept
⚫Error
⚫Example: id*id
E -> E + T | T
T -> T * F | F
F -> (E) | id

Stack   | Input   | Action
$       | id*id$  | shift
$id     | *id$    | reduce by F -> id
$F      | *id$    | reduce by T -> F
$T      | *id$    | shift
$T*     | id$     | shift
$T*id   | $       | reduce by F -> id
$T*F    | $       | reduce by T -> T*F
$T      | $       | reduce by E -> T
$E      | $       | accept
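
The stack mechanics of shift and reduce can be sketched by replaying a given sequence of actions for id*id. This is not a full parser: deciding which action to take is the job of the parsing tables, so the action list is supplied by hand here. The names PRODUCTIONS and replay are illustrative.

```python
PRODUCTIONS = {
    "F->id": ("F", ["id"]),
    "T->F": ("T", ["F"]),
    "T->T*F": ("T", ["T", "*", "F"]),
    "E->T": ("E", ["T"]),
}


def replay(tokens, actions):
    """Apply shift/reduce actions; return the configurations visited."""
    stack, rest = ["$"], list(tokens) + ["$"]
    configs = []
    for action in actions:
        configs.append(("".join(stack), "".join(rest), action))
        if action == "shift":
            stack.append(rest.pop(0))
        elif action == "accept":
            break
        else:
            head, body = PRODUCTIONS[action]
            # A reduction is only legal if the handle is on top of the stack.
            if stack[-len(body):] != body:
                raise ValueError(f"handle for {action} is not on top")
            del stack[-len(body):]
            stack.append(head)
    return configs


trace = replay(
    ["id", "*", "id"],
    ["shift", "F->id", "T->F", "shift", "shift",
     "F->id", "T->T*F", "E->T", "accept"],
)
for stack, rest, action in trace:
    print(f"{stack:8}{rest:8}{action}")
```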
Handle will appear on top of the stack

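[Figure: two parse trees illustrating the two possible cases. In the first, the frontier is α β γ y z, with B the parent of γ and A the parent of β B y. In the second, the frontier is α γ x y z, with B the parent of γ and A the parent of y.]

Case 1:
Stack   | Input
$αβγ    | yz$
$αβB    | yz$
$αβBy   | z$

Case 2:
Stack   | Input
$αγ     | xyz$
$αB     | xyz$
$αBxy   | z$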
Conflicts during Shift-Reduce parsing
⚫Two kinds of conflicts
⚫shift/reduce conflict
⚫reduce/reduce conflict

Shift/reduce conflict
⚫Example:

Stack                   | Input
… if expr then stmt     | else …$
Reduce/reduce conflict
S -> AB
A -> aA | ab
B -> bB | ab

Stack   | Input  | Action
$       | ab…$   |
$…ab    | …$     | reduce/reduce conflict

stmt -> id(parameter_list)
stmt -> expr:=expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr

Stack         | Input
… id ( id     | , id ) …$
Exercise: For the following grammar and string, obtain
its bottom-up parse tree. Also, indicate the handle in each
right-sentential form.
S -> SS + | SS * | a
String: aaa*a++

Rightmost derivation:
S => SS+
=> SSS++
=> SSa++
=> SSS*a++
=> SSa*a++
=> Saa*a++
=> aaa*a++
Reverse the derivation steps and construct the bottom-up parse tree.

Right sentential form | Handle | Reducing production
aaa*a++               | a      | S -> a
Saa*a++               | a      | S -> a
SSa*a++               | a      | S -> a
SSS*a++               | SS*    | S -> SS*
SSa++                 | a      | S -> a
SSS++                 | SS+    | S -> SS+
SS+                   | SS+    | S -> SS+
S                     |        |
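
For this particular grammar the handles can be found with a very simple strategy: reduce whenever a production body is on top of the stack, otherwise shift. That strategy is an assumption that happens to work for S -> SS+ | SS* | a; it is not a general shift-reduce algorithm. The sketch records each right-sentential form together with its handle.

```python
BODIES = [["S", "S", "+"], ["S", "S", "*"], ["a"]]


def shift_reduce(text):
    """Return (steps, accepted); each step is (sentential form, handle)."""
    stack, rest, steps = [], list(text), []
    while True:
        for body in BODIES:
            if stack[-len(body):] == body:           # handle on top: reduce
                form = "".join(stack) + "".join(rest)
                steps.append((form, "".join(body)))
                stack[-len(body):] = ["S"]
                break
        else:
            if not rest:                             # nothing left to shift
                break
            stack.append(rest.pop(0))                # shift
    return steps, stack == ["S"]


steps, accepted = shift_reduce("aaa*a++")
for form, handle in steps:
    print(f"{form:10}{handle}")
print(accepted)
```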
