Context Free Grammar CFG
if a<0 then U+V else if a*b < 17 then U/V else if k <> y then V/U
else 0
Example derivation in a Grammar
• Grammar: start symbol is A
A → aAa
A → B
B → bB
B → ε
• Sample Derivation:
A ⇒ aAa ⇒ aaAaa ⇒ aaaAaaa ⇒ aaaBaaa ⇒ aaabBaaa
⇒ aaabbBaaa ⇒ aaabbaaa
• Language?
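The sample derivation can be replayed mechanically. The following is a minimal sketch, assuming single-character symbols with uppercase letters as variables; the helper name `apply_leftmost` is illustrative and not part of the slides.

```python
def apply_leftmost(sentential, head, body):
    """Replace the leftmost occurrence of variable `head` with `body`."""
    i = sentential.index(head)  # raises ValueError if head is absent
    return sentential[:i] + body + sentential[i + 1:]

# Productions used in the sample derivation, in order ("" stands for ε).
steps = [("A", "aAa")] * 3 + [("A", "B")] + [("B", "bB")] * 2 + [("B", "")]

form = "A"
derivation = [form]
for head, body in steps:
    form = apply_leftmost(form, head, body)
    derivation.append(form)

print(" => ".join(derivation))
```

Each sentential form here contains at most one variable, so "leftmost" is only a convention; any choice of occurrence would give the same result.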
Derivations in Tree Form
Example CFG
• An example of a context-free grammar, which we call G1 .
A → 0A1
A → B
B → #
A grammar consists of a collection of substitution rules, also called productions. Each rule
appears as a line in the grammar, comprising a symbol and a string separated by an arrow.
The symbol is called a variable.
stmt → id = expression;
| if(expression) stmt
| if(expression) stmt else stmt
| while(expression) stmt
| do stmt while (expression);
| {stmts}
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
The grammar and the regular expression (a|b)*abb describe the same
language: the set of strings of a’s and b’s ending with abb
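Reading the grammar above as productions for the variables A0 through A3, membership can be checked by tracking which variables could still be pending after each input symbol. This is a minimal sketch; the names `GRAMMAR`, `NULLABLE`, and `derives` are assumptions made for illustration.

```python
# Right-linear grammar: each body is a terminal followed by a variable, or ε.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],            # A3 -> ε is recorded in NULLABLE below
}
NULLABLE = {"A3"}

def derives(s, start="A0"):
    """Return True if the start variable derives the string s."""
    current = {start}
    for ch in s:
        # Follow every production whose terminal matches the next symbol.
        current = {nxt for var in current
                   for (t, nxt) in GRAMMAR[var] if t == ch}
    return bool(current & NULLABLE)

for w in ["abb", "babb", "ab", "abba"]:
    print(w, derives(w))
```

The set `current` plays the same role as the set of active states in an NFA simulation, which is why a grammar of this restricted shape describes a regular language.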
CFG vs. Regex
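• Language L = {a^n b^n | n >= 1} can be described by a grammar but not by a
regex
• Suppose L was defined by some regex
– We could construct a DFA with a finite number of states, say k, to accept L
– For an input beginning with more than k a’s, the DFA must enter some state
twice: call it si, reached after reading a^i and again after reading a^j (i < j),
so a path labeled a^(j-i) leads from si back to si
– a^i b^i is in the language, so there is a path labeled b^i from si to an
accepting state f
– The path a^j b^i from s0 through si to f is also possible
– This DFA accepts both a^i b^i and a^j b^i, so it does not accept exactly L:
a contradiction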
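By contrast, a grammar handles the matching counts directly. The following is a minimal sketch, assuming the grammar S → aSb | ab for L (this grammar is not given on the slide); the function name `in_L` is illustrative.

```python
def in_L(s):
    """Return True if s is in {a^n b^n | n >= 1}, following S -> aSb | ab."""
    if s == "ab":                      # S -> ab
        return True
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        return in_L(s[1:-1])           # S -> aSb
    return False

for w in ["ab", "aabb", "aaabbb", "aab", "abab", ""]:
    print(repr(w), in_L(w))
```

The recursion depth stands in for the unbounded counter that a finite set of DFA states cannot provide.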
Right-Most Derivation Parse Tree
Ambiguous Grammar
• A grammar can have more than one parse tree
generating a given string of terminals. Such a
grammar is said to be ambiguous.
E → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
Enforces precedence of * over +
Example
E → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
• Input strings: id + id * id and id * id + id
[Figure: parse trees for id * id + id and id + id * id; in both, * is grouped under E' below the + node]
Example
Another Ambiguous Grammar
Dangling Else
E → if E then E
| if E then E else E | OTHER
• The expression
if E1 then if E2 then E3 else E4
has two parse trees
[Figure: the two parse trees. In one, else E4 attaches to the inner if (if E2 then E3 else E4); in the other, it attaches to the outer if E1.]
• Which ‘THEN’ should the ‘ELSE’ be considered with?
Dangling Else
E → matchedIF // all THEN are matched
| unmatchedIF // some THEN is unmatched
matchedIF → if E then matchedIF else matchedIF
| OTHER
unmatchedIF → if E then E
| if E then matchedIF else unmatchedIF
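The effect of the rewritten grammar is that each else pairs with the nearest unmatched then. The sketch below reproduces that pairing with a hand-written recursive-descent parser, which is a different technique from parsing with the matchedIF/unmatchedIF productions directly; the token format and the function names are assumptions made for illustration.

```python
def parse(tokens):
    """Parse a token list into nested tuples; each else joins the nearest open if."""
    pos = 0

    def expect(word):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            raise SyntaxError(f"expected {word!r} at position {pos}")
        pos += 1

    def expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        if tokens[pos] == "if":
            pos += 1
            cond = expr()
            expect("then")
            then_branch = expr()
            # Greedy choice: an else right after the then-branch belongs
            # to this (innermost still-open) if.
            if pos < len(tokens) and tokens[pos] == "else":
                pos += 1
                return ("if", cond, then_branch, expr())
            return ("if", cond, then_branch)
        tok = tokens[pos]
        if tok in ("then", "else"):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok                     # OTHER

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

tree = parse("if E1 then if E2 then E3 else E4".split())
print(tree)
```

The printed tuple nests the four-element if (with else) inside the three-element outer if, which corresponds to the parse tree that attaches else E4 to the inner if.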
Dangling Else
• Consider again the expression
if E1 then if E2 then E3 else E4
[Figure: the single parse tree under the rewritten grammar (MIF = matchedIF), in which else E4 attaches to the inner if (if E2 then E3 else E4)]
Ambiguity
• There are no general techniques for handling
ambiguity
• In general, it is impossible to automatically convert an
ambiguous grammar into an unambiguous one
• If used sensibly, ambiguity can simplify the
grammar
• Disambiguation Rules: Instead of rewriting the
grammar, we can
– Use the ambiguous grammar
– Along with disambiguation rules.
Disambiguation Rules
• Precedence and Associativity Declarations
• %left: all tokens following this declaration are
left-associative
• %right: all tokens following this declaration are
right-associative
• Precedence is established by the order of the
%left and %right declarations: tokens declared
later have higher precedence
• %left ‘+’ ‘-’
• %right ‘*’ ‘/’
– ‘*’ has a higher precedence than ‘+’, so ‘1+2*3’ would
be evaluated as ‘1+(2*3)’
• %nonassoc: the specified operators may not be
used together, e.g., %nonassoc ‘>’ ‘<’.
Associativity Example
E → E + E | int
%left +
Precedence Example
E → E + E | E * E | int
%left +
%left *
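One way to see the effect of these two declarations is a small precedence-climbing parser. This is a sketch of a different technique from the table-driven parser that yacc generates; the `DECLS` list and the function names are assumptions made for illustration. Declarations listed later get higher precedence.

```python
import re

# Mirrors:  %left '+'   then   %left '*'   (later line = higher precedence)
DECLS = [("left", ["+"]), ("left", ["*"])]
PREC = {op: (level, assoc)
        for level, (assoc, ops) in enumerate(DECLS, start=1)
        for op in ops}

def parse(text):
    """Build a nested-tuple parse tree for E -> E + E | E * E | int."""
    tokens = re.findall(r"\d+|[+*]", text)
    pos = 0

    def expr(min_level):
        nonlocal pos
        left = int(tokens[pos])       # E -> int
        pos += 1
        while pos < len(tokens) and PREC[tokens[pos]][0] >= min_level:
            op = tokens[pos]
            level, assoc = PREC[op]
            pos += 1
            # Left-associative: the right operand may only contain
            # operators of strictly higher precedence.
            right = expr(level + 1 if assoc == "left" else level)
            left = (op, left, right)
        return left

    return expr(1)

print(parse("1+2+3"))
print(parse("1+2*3"))
```

In the first tree the left + is nested deepest (left associativity); in the second, the * subtree sits under the + node (higher precedence for *).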
Associativity of Operators
• The operator + associates to the left
An operand with + signs on both sides of it
belongs to the operator to its left.
E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
Derivations Using a Grammar
We apply the productions of a CFG to infer that certain
strings are in the language of a certain variable.
There are two approaches to this inference.
The more conventional approach is to use the rules
from body to head. That is, we take strings known to
be in the language of each of the variables of the
body, concatenate them, in the proper order, with
any terminals appearing in the body, and infer that
the resulting string is in the language of the variable
in the head. This procedure is called recursive
inference.
Derivations Using a Grammar
• There is another approach to defining the
language of a grammar, in which we use the
productions from head to body. We expand the
start symbol using one of its productions (i.e.,
using a production whose head is the start
symbol). We further expand the resulting
string by replacing one of the variables by the
body of one of its productions, and so on, until
we derive a string consisting entirely of
terminals. The language of the grammar is all
strings of terminals that we can obtain in this
way. This use of grammars is called derivation.
Example
Let us explore a more complex CFG that
represents (a simplification of) expressions in
a typical programming language. First we shall
limit ourselves to the operators + and *,
representing addition and multiplication. We
shall allow arguments to be identifiers, but
instead of allowing the full set of typical
identifiers (letters followed by zero or more
letters and digits), we shall allow only the
letters a and b and the digits 0 and 1. Every
identifier must begin with a or b, which may be
followed by any string in {a, b, 0, 1}*.
Example
We need two variables in this grammar. One, which we
call E, represents expressions. It is the start symbol
and represents the language of expressions we are
defining. The other variable, I, represents identifiers.
Its language is actually regular; it is the language of
the regular expression
(a | b)(a | b | 0 | 1)*
However, we shall not use regular expressions directly
in grammars. Rather we use a set of productions
that say essentially the same thing as this regular
expression.
Example
A context-free grammar for simple expressions
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
Example
The grammar for expressions is stated
formally as G = ({E, I}, T, P, E), where T is the
set of symbols {+, *, (, ), a, b, 0, 1} and P is the
set of productions shown above.
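Recursive inference on this grammar can be sketched as a fixed-point computation that uses the productions from body to head. The code below is a minimal illustration, assuming a length bound `max_len` so that the (infinite) languages stay finite; the names `PRODUCTIONS` and `infer` are not from the slides.

```python
PRODUCTIONS = [
    ("E", ["I"]), ("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("I", ["a"]), ("I", ["b"]), ("I", ["I", "a"]), ("I", ["I", "b"]),
    ("I", ["I", "0"]), ("I", ["I", "1"]),
]

def infer(max_len):
    """Recursive inference: grow each variable's language from body to head."""
    lang = {"E": set(), "I": set()}
    changed = True
    while changed:
        changed = False
        for head, body in PRODUCTIONS:
            # Strings obtainable from the body: concatenate, in order, a known
            # string for each variable with the terminals as written.
            partial = {""}
            for sym in body:
                choices = lang[sym] if sym in lang else {sym}
                partial = {p + c for p in partial for c in choices
                           if len(p + c) <= max_len}
            new = partial - lang[head]
            if new:
                lang[head] |= new
                changed = True
    return lang

lang = infer(max_len=3)
print("a+b" in lang["E"], "(a)" in lang["E"], "0a" in lang["E"])
```

The loop stops when a full pass over the productions adds no new string, which must happen because only finitely many strings fit under the length bound.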
Tree Traversal
• Tree traversals are used for describing
attribute evaluation and for specifying the
execution of code fragments in a translation
scheme.
• A traversal of a tree starts at the root and
visits each node of the tree in some order.
Depth-First Traversal
• Depth-first traversal starts at the root and
recursively visits the children of each node in
any order, not necessarily from left to right. It
is called "depth-first" because it visits an
unvisited child of a node whenever it can, so it
visits nodes as far away from the root (as
"deep") as quickly as it can.
Depth-First Traversal
• The procedure visit(N) in Fig. is a depth-first
traversal that visits the children of a node in
left-to-right order, as shown in Fig.
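A procedure of this kind can be sketched as follows. The figure itself is not reproduced here, so this is an assumption about its shape: children are visited left to right, and the node is handled after its children. The `Node` class and the `order` list are illustrative only.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def visit(n, order):
    """Depth-first traversal: visit children left to right, then handle n."""
    for c in n.children:
        visit(c, order)
    order.append(n.label)   # stand-in for the work done at node n

tree = Node("root", [Node("left", [Node("x")]), Node("right")])
order = []
visit(tree, order)
print(order)
```

Moving the `order.append` line above the loop would handle each node before its children instead, while keeping the same left-to-right, depth-first shape.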
Depth-First Traversal
• Synthesized attributes can be evaluated
during any bottom-up traversal, that is, a
traversal that evaluates attributes at a node
after having evaluated attributes at its
children.
• In general, with both synthesized and
inherited attributes, the matter of evaluation
order is quite complex.
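For the synthesized-only case, bottom-up evaluation can be sketched with a postorder walk. This is a minimal illustration, assuming a single synthesized attribute `val` on an expression tree; the `Node` class and `evaluate` function are hypothetical names, not taken from the slides.

```python
class Node:
    def __init__(self, op, children=(), value=None):
        self.op = op
        self.children = list(children)
        self.val = value          # synthesized attribute; leaves carry it directly

def evaluate(n):
    """Postorder: compute the children's val first, then this node's val."""
    for c in n.children:
        evaluate(c)
    if n.op == "+":
        n.val = n.children[0].val + n.children[1].val
    elif n.op == "*":
        n.val = n.children[0].val * n.children[1].val
    return n.val

# Tree for 1 + 2 * 3
tree = Node("+", [Node("int", value=1),
                  Node("*", [Node("int", value=2), Node("int", value=3)])])
print(evaluate(tree))
```

Because `val` at a node depends only on the children's `val`, any order that finishes the children before the parent works; inherited attributes would break that guarantee.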
Questions
• CFG representing the Regular Expression a+
A → aA | a