PL Units1.2
3.1 Introduction
3.2 The General Problem of Describing Syntax
3.3 Formal Methods of Describing Syntax
3.4 Attribute Grammars
3.5 Describing the Meanings of Programs: Dynamic Semantics
Chapter 3
Describing Syntax and Semantics
3.1 Introduction
Syntax – the form of the expressions, statements, and program units.
Semantics – the meaning of the expressions, statements, and program units.
Ex: the syntax of a Java while statement is
    while (boolean_expr) statement
– The semantics of this statement form is that when the current value of the Boolean
expression is true, the embedded statement is executed.
– The form of a statement should strongly suggest what the statement is meant to
accomplish.
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
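The lexeme/token pairing above can be reproduced with a small scanner. The following is a minimal sketch, assuming the statement being scanned is index = 2 * count + 17; (reconstructed from the lexeme column). The function name tokenize and the regular expressions are illustrative choices, not part of the notes.

```python
import re

# Token names follow the table above; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<10} {token}")
```

Each lexeme is the matched text and each token is the category name, which is the distinction the table draws.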
Language Recognizers and Generators
In general, languages can be formally defined in two distinct ways: by recognition and by
generation.
Language Recognizers:
– A recognition device reads input strings of the language and decides whether the input
strings belong to the language.
– It only determines whether given programs are in the language.
– Example: the syntax analysis part of a compiler. Syntax analyzers, also known as
parsers, determine whether the given programs are syntactically correct.
Language Generators:
– A device that generates sentences of a language.
– One can determine whether the syntax of a particular sentence is correct by comparing it to the
structure of the generator.
3.3 Formal Methods of Describing Syntax
Backus-Naur Form (BNF) is a syntax description formalism that became the most widely used
method for describing programming language syntax.
3.3.1.3 Fundamentals
A metalanguage is a language used to describe another language. BNF is a metalanguage for
programming languages.
In BNF, abstractions are used to represent classes of syntactic structures. They act like
syntactic variables (also called nonterminal symbols).
A grammar is a finite nonempty set of rules. The abstractions in the rules are called
nonterminal symbols, or simply nonterminals.
The lexemes and tokens of the rules are called terminal symbols or terminals.
A BNF description, or grammar, is simply a collection of rules.
An abstraction (or nonterminal symbol) can have more than one RHS.
For example, a Java if statement can be described with the rules
<if_stmt> → if ( <logic_expr> ) <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
Multiple definitions can be written as a single rule, with the different definitions separated by
the symbol |, meaning logical OR.
<ident_list> → identifier
             | identifier, <ident_list>
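The recursive <ident_list> rule maps directly onto a recursive recognizer. The sketch below assumes the input has already been split into tokens and treats any string accepted by Python's str.isidentifier as an identifier; both choices are illustrative assumptions.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # second alternative: identifier , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["a", ",", "b", ",", "c"]))
print(is_ident_list(["a", ","]))
```

The function only answers whether the token list is in the language, which is exactly the role of a recognizer.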
Figure 3.2 Two distinct parse trees for the same sentence, A = B + C * A
Figure 3.3 The unique parse tree for A = B + C * A using an unambiguous grammar
Rightmost derivation of the sentence A = B + C * A:
<assign> => <id> = <expr>
         => <id> = <expr> + <term>
         => <id> = <expr> + <term> * <factor>
         => <id> = <expr> + <term> * <id>
         => <id> = <expr> + <term> * A
         => <id> = <expr> + <factor> * A
         => <id> = <expr> + <id> * A
         => <id> = <expr> + C * A
         => <id> = <term> + C * A
         => <id> = <factor> + C * A
         => <id> = <id> + C * A
         => <id> = B + C * A
         => A = B + C * A
Both of these derivations, however, are represented by the same parse tree.
3.3.1.9 Associativity of Operators
Do parse trees for expressions with two or more adjacent occurrences of operators with equal
precedence have those occurrences in proper hierarchical order?
An example of an assignment using the previous grammar is: A = B + C + A
The figure above shows the left + operator lower than the right + operator. This is the correct
order if the + operator is meant to be left associative, which is typical.
When a grammar rule has its LHS also appearing at the beginning of its RHS, the rule is said to be
left recursive. Left recursion specifies left associativity.
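The effect of the grammar on precedence and associativity can be seen with a small parser. This is a sketch only: it assumes the unambiguous grammar with <expr>, <term>, and <factor> nonterminals implied by the derivation above, identifiers A, B, and C, and whitespace-separated tokens. Because a recursive-descent parser cannot use a left-recursive rule directly, each left-recursive rule is written as a loop that builds the same left-leaning tree. The result is a condensed tree of operators and operands, not the full parse tree.

```python
IDS = ("A", "B", "C")


def parse_assign(text):
    """Parse '<id> = <expr>' and return a nested-tuple tree."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():  # <factor> -> ( <expr> ) | <id>
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok not in IDS:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():  # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())  # earlier operands end up lower in the tree
        return node

    def expr():  # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target not in IDS or advance() != "=":
        raise SyntaxError("expected '<id> ='")
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse_assign("A = B + C * A"))
print(parse_assign("A = B + C + A"))
```

In the first tree the * node sits below the + node (higher precedence); in the second, the left + sits below the right + (left associativity).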
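In most languages that provide it, the exponentiation operator is right associative. To
indicate right associativity, right recursion can be used. A grammar rule is right recursive if
the LHS appears at the right end of the RHS. Rules such as
<factor> → <exp> ** <factor>
         | <exp>
<exp> → ( <expr> )
      | id
could be used to describe exponentiation as a right-associative operator.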
Because of minor inconveniences in BNF, it has been extended in several ways. EBNF
extensions do not enhance the descriptive powers of BNF; they only increase its readability
and writability.
Three extensions are commonly included in the various versions of EBNF:
– Optional parts are placed in brackets ([ ])
Without the use of the brackets, the syntactic description of this statement would require
the following two rules:
– Put multiple-choice options of RHSs in parentheses and separate them with vertical bars
(|, OR operator)
In BNF, a description of this <term> would require the following three rules:
Context-free grammars (CFGs) cannot describe all of the syntax of programming languages.
In Java, for example, a floating-point value cannot be assigned to an integer type variable,
although the opposite is legal.
The static semantics of a language is only indirectly related to the meaning of programs
during execution; rather, it has to do with the legal forms of programs (syntax rather than
semantics).
Many static semantic rules of a language state its type constraints. Static semantics is so
named because the analysis required to check these specifications can be done at compile time.
Attribute grammars were designed by Knuth (1968) to describe both the syntax and the
static semantics of programs.
Attribute grammars add to context-free grammars a means of carrying some semantic
information on parse tree nodes.
Attribute grammars are context-free grammars to which have been added attributes, attribute
computation functions, and predicate functions.
Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined
outside the parse tree.
For example, the type of an instance of a variable in a program could come from the symbol
table, which is used to store variable names and their types.
3.4.5 Examples of Attribute Grammars
The look-up function looks up a given variable name in the symbol table and returns the
variable’s type.
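The role of the look-up function can be sketched as follows. Everything beyond look-up itself is an assumption made for illustration: the symbol table contents, the rule that a sum is int only when both operands are int (otherwise real), and the names sum_actual_type and check_assignment.

```python
# Hypothetical symbol table: variable name -> declared type.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def look_up(name):
    """Return the declared type of a variable (an intrinsic attribute)."""
    return SYMBOL_TABLE[name]


def sum_actual_type(left, right):
    """Synthesized attribute for left + right (assumed typing rule)."""
    if look_up(left) == "int" and look_up(right) == "int":
        return "int"
    return "real"


def check_assignment(target, left, right):
    """Predicate for target = left + right: actual type must equal expected type."""
    expected_type = look_up(target)  # passed down from the assignment target
    return sum_actual_type(left, right) == expected_type


print(check_assignment("A", "A", "B"))
print(check_assignment("A", "B", "C"))
```

The second call fails the predicate under these assumptions because the expression is int while the target is real, which is the kind of static semantic error an attribute grammar is meant to catch.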
Now, consider the process of computing the attribute values of a parse tree, which is
sometimes called decorating the parse tree.
The tree in Figure 3.7 shows the flow of attribute values in the example of Figure 3.6.
3.4.7 Evaluation
Checking the static semantic rules of a language is an essential part of all compilers.
– One of the main difficulties in using an attribute grammar to describe all of the syntax
and static semantics of a real contemporary programming language is the size and
complexity of the attribute grammar.
– Furthermore, the attribute values on a large parse tree are costly to evaluate.
3.5 Describing the Meanings of Programs: Dynamic Semantics
C Statement                      Meaning
for (expr1; expr2; expr3) {            expr1;
    …                            loop: if expr2 == 0 goto out
}                                      …
                                       expr3;
                                       goto loop
                                 out:  …
Denotational semantics is the most rigorous and most widely known formal method for
describing the meaning of programs.
It is solidly based on recursive function theory.
<bin_num> → '0'
          | '1'
          | <bin_num> '0'
          | <bin_num> '1'
– A parse tree for the example binary number, 110, is shown in Figure 3.9.
Mbin ('0') = 0
Mbin ('1') = 1
Mbin (<bin_num> '0') = 2 * Mbin (<bin_num>)
Mbin (<bin_num> '1') = 2 * Mbin (<bin_num>) + 1
– The meanings, or denoted objects (which in this case are decimal numbers), can be
attached to the nodes of the parse tree, yielding the tree in Figure 3.10.
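The four Mbin equations translate almost line for line into a recursive function. A minimal sketch, assuming binary numerals are passed as Python strings and that the name m_bin stands for Mbin:

```python
def m_bin(numeral):
    """Map a binary numeral (string of '0'/'1') to the number it denotes."""
    if numeral == "0":
        return 0                      # Mbin('0') = 0
    if numeral == "1":
        return 1                      # Mbin('1') = 1
    if len(numeral) < 2 or numeral[-1] not in ("0", "1"):
        raise ValueError(f"not a binary numeral: {numeral!r}")
    prefix, last = numeral[:-1], numeral[-1]
    if last == "0":
        return 2 * m_bin(prefix)      # Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)
    return 2 * m_bin(prefix) + 1      # Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1


print(m_bin("110"))  # 6
```

For 110 the recursion computes m_bin("1") = 1, then m_bin("11") = 3, then m_bin("110") = 6, matching the values attached bottom-up to the parse tree nodes.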
Example 2:
– The syntactic domain is the set of character string representations of decimal numbers.
The semantic domain is once again the set N.
<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
          | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')
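By analogy with Mbin, a mapping function for decimal numerals can be sketched as below. The equations it follows (a single digit denotes its value, and a numeral followed by digit d denotes 10 times the meaning of the numeral, plus d) are assumed here, since the notes give only the grammar; the name m_dec is likewise illustrative.

```python
DIGITS = "0123456789"


def m_dec(numeral):
    """Map a decimal numeral (string of digits) to the number it denotes."""
    if not numeral or any(ch not in DIGITS for ch in numeral):
        raise ValueError(f"not a decimal numeral: {numeral!r}")
    digit = DIGITS.index(numeral[-1])         # meaning of the last digit
    if len(numeral) == 1:
        return digit                          # single-digit alternatives
    return 10 * m_dec(numeral[:-1]) + digit   # <dec_num> followed by a digit


print(m_dec("309"))  # 309
```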
3.5.3.1 Assertions
The logical expressions are called predicates, or assertions.
An assertion before a statement (a precondition) states the relationships and constraints
among variables that are true at that point in execution.
An assertion following a statement is a postcondition.
– An example: a = b + 1 {a > 1}
If the weakest precondition can be computed from the given postcondition for each statement
of a language, then correctness proofs can be constructed from programs in that language.
Program proof process: The postcondition for the whole program is the desired result.
Work back through the program to the first statement. If the precondition on the first
statement is the same as the program spec, the program is correct.
An axiom is a logical statement that is assumed to be true.
An inference rule is a method of inferring the truth of one assertion on the basis of the
values of other assertions.
S1, S2, …, Sn
-------------
      S
– The rule states that if S1, S2, …, and Sn are true, then the truth of S can be inferred. The
top part of an inference rule is called its antecedent; the bottom part is called its consequent.
a = b / 2 - 1 {a < 10}
b / 2 - 1 < 10
b / 2 < 11
b < 22
∴ the weakest precondition for the given assignment and the postcondition is {b < 22}
An assignment statement has a side effect if it changes some variable other than its left side.
Ex:
x = 2 * y - 3 {x > 25}
2 * y - 3 > 25
2 * y > 28
y > 14
∴ the weakest precondition for the given assignment and the postcondition is {y > 14}
Ex:
x = x + y - 3 {x > 10}
x + y - 3 > 10
y > 13 - x
∴ the weakest precondition for the given assignment and the postcondition is {y > 13 - x}
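The assignment examples above all follow one pattern: the weakest precondition is the postcondition with the assigned variable replaced by the right-hand side. The sketch below models that substitution by evaluating the postcondition in the state after the assignment, then checks the three results by brute force over a small range of integers. The name wp_assign and the dictionary representation of program states are assumptions, and division is treated as real division, as in the algebra above.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var = expr` with respect to `post`.

    States are dicts of variable values; expr and post are functions of a state.
    """
    def pre(state):
        after = dict(state)
        after[var] = expr(state)   # effect of the assignment
        return post(after)         # postcondition evaluated after the assignment
    return pre


# a = b / 2 - 1 {a < 10}
wp_a = wp_assign("a", lambda s: s["b"] / 2 - 1, lambda s: s["a"] < 10)
# x = 2 * y - 3 {x > 25}
wp_x = wp_assign("x", lambda s: 2 * s["y"] - 3, lambda s: s["x"] > 25)
# x = x + y - 3 {x > 10}
wp_xy = wp_assign("x", lambda s: s["x"] + s["y"] - 3, lambda s: s["x"] > 10)

values = range(-40, 41)
print(all(wp_a({"b": b}) == (b < 22) for b in values))
print(all(wp_x({"y": y}) == (y > 14) for y in values))
print(all(wp_xy({"x": x, "y": y}) == (y > 13 - x) for x in values for y in values))
```

A brute-force check over a finite range is evidence, not a proof; it only confirms that the computed predicates agree with {b < 22}, {y > 14}, and {y > 13 - x} on the sampled states.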
3.5.3.4 Sequences
The weakest precondition for a sequence of statements cannot be described by an axiom,
because the precondition depends on the particular kinds of statements in the sequence.
In this case, the precondition can only be described with an inference rule.
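Let S1 and S2 be adjacent program statements. If S1 and S2 have the following preconditions
and postconditions
{P1} S1 {P2}
{P2} S2 {P3}
then the inference rule for the two-statement sequence is
{P1} S1 {P2}, {P2} S2 {P3}
--------------------------
{P1} S1; S2 {P3}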
Ex:
y = 3 * x + 1;
x = y + 3; {x < 10}
y + 3 < 10
y < 7
3 * x + 1 < 7
3 * x < 6
x < 2
∴ the weakest precondition for the first assignment statement is {x < 2}
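Working backward through a sequence amounts to composing weakest preconditions: the precondition of the second statement becomes the postcondition of the first. A minimal sketch, using a hypothetical helper wp_assign and dictionary states, and checking the result {x < 2} over a small integer range:

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var = expr` with respect to `post`."""
    def pre(state):
        after = dict(state)
        after[var] = expr(state)
        return post(after)
    return pre


def post(state):
    return state["x"] < 10                                  # {x < 10}


# Second statement first: x = y + 3, giving {y < 7}
wp_s2 = wp_assign("x", lambda s: s["y"] + 3, post)
# Then the first statement: y = 3 * x + 1, giving {x < 2}
wp_s1 = wp_assign("y", lambda s: 3 * s["x"] + 1, wp_s2)

# The initial value of y is irrelevant because the first statement overwrites it.
print(all(wp_s2({"x": 0, "y": y}) == (y < 7) for y in range(-20, 21)))
print(all(wp_s1({"x": x, "y": 0}) == (x < 2) for x in range(-20, 21)))
```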
3.5.3.5 Selection
We next consider the inference rule for selection statements, the general form of which is
if B then S1 else S2
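We consider only selections that include an else clause. The inference rule is
{B and P} S1 {Q}, {(not B) and P} S2 {Q}
----------------------------------------
{P} if B then S1 else S2 {Q}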
if (x > 0) then
    y = y - 1;
else
    y = y + 1;
{y > 0}
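For the selection example above, each branch can be worked backward separately: the then branch needs y - 1 > 0, that is y > 1, and the else branch needs y + 1 > 0, that is y > -1. Since y > 1 implies y > -1, {y > 1} is a single precondition that serves both branches. The sketch below checks this by brute force; the helper names wp_assign and wp_if and the dictionary states are assumptions.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var = expr` with respect to `post`."""
    def pre(state):
        after = dict(state)
        after[var] = expr(state)
        return post(after)
    return pre


def wp_if(cond, wp_then, wp_else):
    """Precondition of if-then-else: use the branch selected by cond."""
    return lambda state: wp_then(state) if cond(state) else wp_else(state)


def post(state):
    return state["y"] > 0                                   # {y > 0}


wp_then = wp_assign("y", lambda s: s["y"] - 1, post)        # y > 1
wp_else = wp_assign("y", lambda s: s["y"] + 1, post)        # y > -1
exact = wp_if(lambda s: s["x"] > 0, wp_then, wp_else)

states = [{"x": x, "y": y} for x in range(-5, 6) for y in range(-5, 6)]
# {y > 1} guarantees the postcondition whichever branch runs.
print(all(exact(s) for s in states if s["y"] > 1))
# It is stronger than strictly necessary when the else branch is taken.
print(exact({"x": 0, "y": 0}))
```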
{I and B} S {I}
------------------------------------
{I} while B do S end {I and (not B)}
To find I, we use the loop postcondition to compute preconditions for several different
numbers of iterations of the loop body, starting with none. If the loop body contains a single
assignment statement, the axiom for assignment statements can be used to compute these
cases.
Characteristics of the loop invariant: I must meet the following conditions:
– P => I -- the loop invariant must be true initially
– {I} B {I} -- evaluation of the Boolean must not change the validity of I
– {I and B} S {I} -- I is not changed by executing the body of the loop
– (I and (not B)) => Q -- if I is true and B is false, Q is implied
– The loop terminates -- can be difficult to prove
Ex:
It is now obvious that {y < x} will suffice for cases of one or more iterations. Combining this
with {y = x} for the 0 iterations case, we get {y <= x} which can be used for the loop
invariant.
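The invariant conditions can be checked mechanically on small cases. The sketch below assumes the loop under discussion is while y <> x do y = y + 1 end with postcondition {y = x}; that loop is not shown in these notes and is inferred from the invariant {y <= x}. The function names are illustrative, and the check covers only a finite range of integers, so it is evidence rather than a proof.

```python
def invariant(x, y):
    return y <= x          # I


def guard(x, y):
    return y != x          # B


def body(x, y):
    return x, y + 1        # S: y = y + 1


def post(x, y):
    return y == x          # Q


def run_loop(x, y, max_steps=1000):
    """Execute the loop, checking I before every iteration."""
    steps = 0
    while guard(x, y):
        assert invariant(x, y), "invariant violated"
        x, y = body(x, y)
        steps += 1
        if steps > max_steps:
            raise RuntimeError("loop did not terminate within the step limit")
    return x, y


for x in range(-5, 6):
    for y in range(-5, 6):
        if not invariant(x, y):
            continue
        if guard(x, y):
            assert invariant(*body(x, y))   # {I and B} S {I}
        else:
            assert post(x, y)               # (I and (not B)) => Q
        assert post(*run_loop(x, y))        # terminates with Q when started in I
print("all invariant checks passed")
```

Starting from a state where y > x, the invariant does not hold and the loop would never terminate, which is why P => I is required initially.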
Ex: