Chapter Three
3.1 Context free languages
Context free languages
A context-free language has important applications in the design of programming languages as
well as in the construction of efficient compilers.
Context-Free Language (CFL) is a language generated by a Context-Free Grammar (CFG). CFG
is a set of rules for deriving (or generating) strings (or sentences) in a language.
A grammar G = (N, T, S, P) is said to be context-free if all productions in P have the form
A → α where A is a Non-terminal [A ∈ N] and α is any string containing Non-terminals and/or
terminals. [α ∈ (N ∪ T)*].
Definition: Let L be a language. Then L is a context-free language if and only if there exists a
context-free grammar G such that L = L(G).
Every regular grammar is context-free, so a regular language is also a context- free one, so we
see that the family of regular languages is a proper subset of the family of context-free
languages. Context- free grammars derive their name from the fact that the substitution of the
variable on the left of a production can be made any time such a variable appears in a sentential
form. It does not depend on the symbols in the rest of the sentential form (the context). This
feature is the consequence of allowing only a single variable on the left side of the production.
Examples of Context-Free languages:
The grammar G = ({S}, {a, b}, S, P) with productions
S → aSa
S → bSb
S→ε
is context-free. A typical derivation in this grammar is
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
This makes it clear that
L(G) = {wwR : w ∈ {a, b}*}. The language is context-free
Page | 1
3.2. Parsing and ambiguity
Definition: Let G = (V, T, P, S) be a CFG. A tree is a derivation (or parse) tree if:
– Every vertex has a label from V U T U {ε}
– The label of the root is S
– If a vertex with label A has children with labels X1, X2,…, Xn, from left to right,
then
A –> X1, X2,…, Xn must be a production in P
If a vertex has label ε, then that vertex is a leaf and the only child of its’ parent.
Example
For generating a language that generates equal number of a’s and b’s in te form anbn . the context
free grammar is defined as G= {(S, A), (a, b), (S aAb, A aAb/ ε)}
SaAb
aaAbb (AaAb)
aaaAbbb (A aAb)
aaabb (A ε)
aaabbb a3b3 anbn
Page | 2
Parsing is the process of analyzing a string of symbols, either in natural or computer language.
Ambiguity
Grammar can be derived in either leftmost or rightmost derivation. We can draw the derivation
tree called as parse tree or syntax tree.
Parse tree is a unique one though the derivation is leftmost or rightmost.
If there exists more than one parse tree for a given grammar that means more than one
rightmost or leftmost is possible: that grammar is said to be ambiguous grammar.
Page | 3
Example: the grammar with production
S → aSb|SS|ε , is ambiguous
The sentence aabb has two derivation trees
Let us consider this grammar: E→ E+E|id, We can create a 2-parse tree from this grammar to
obtain a string id+id+id. The following are the 2 parse trees generated by left-most derivation:
From the above grammar String "id + id + id" can be derived in 2 ways:
First Leftmost derivation
E→E+E
→ id + E
Page | 4
→ id + E + E
→ id + id + E
→ id + id + id
Second Leftmost derivation
E→E+E
→E+E+E
→ id + E + E
→ id + id + E
→ id + id + id
Both the above parse trees are derived from the same grammar rules but both parse trees are
different. Hence, the grammar is ambiguous.
For example, consider the grammar
S → SS
S → aSb
S→ε
The sentence aabb has two derivation trees as shown in the figures below:
Hence the above grammar is an Ambiguous grammar.
Example:
Check whether the given grammar G is ambiguous or not.
A → AA
A → (A)
A→a
Solution:
Page | 5
For the string "a(a)aa" the above grammar can generate two parse trees:
Since there are two parse trees for a single string "a(a)aa", the grammar G is ambiguous.
Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a
variety of ways. In programming languages, where there should be only one interpretation of
each statement, ambiguity must be removed when possible. Often we can achieve this by
rewriting the grammar in an equivalent, unambiguous form.
Unambiguous Grammar
A grammar can be unambiguous if the grammar does not contain ambiguity that means if it does
not contain more than one leftmost derivation or more than one rightmost derivation or more
than one parse tree for the given input string.
Example 1:
Consider a grammar G is given as follows:
S → AB | aaB
A → a | Aa
B→b
Solution:
Let us derive the string "aab"
Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an
unambiguous grammar equivalent to G.
Page | 6
As there are two different parse tree for deriving the same string, the given grammar is
ambiguous.
Unambiguous grammar will be:
S → AB
A → Aa | a
B→b
Sentential forms
Replacing right most non terminal, then replacing right most non terminal, and so on. So always
replacing right most non terminal, such a derivation is called right most derivation. Left most
and right most derivation are very important because later on when you learn about compiler
realizing top most parsing used for left most derivation and bottom most parsing used for right
most derivation.
We will consider one more example Venice is a beautiful city.
<S> <NP1> <VP> <S> <NP1><VP>
<NP1> <PropN> <PropN><VP>
<PropN Venice Venice <VP>
>
<VP> <V> <NP2> Venice <V><NP2>
<V> Is Venice is <NP2>
<NP2> <Art><Adj><N Venice is <Art><Adj><N2>
2>
<Art> A Venice is a <Adj><N2>
<Adj> beautiful Venice is a beautiful <N2>
<N2> City Venice is a beautiful city.
<S> rewritten as <NP1><VP>, then <NP1> rewritten as <PropN>, then <PropN>
rewritten as Venice, then <VP> rewritten as <Verb><NP2>, then <Verb> rewritten
as is, then <NP2> rewritten as <Art><Adj><N2>, then <Art> rewritten as a, then
<Adj> rewritten as beautiful and finally <N2> rewritten as city. So Grammar
generates the sentence Venice is a beautiful city.
Page | 7
3.3 Derivation tree or parse tree
Derivation tree is a graphical representation for the derivation of the given production rules for
the given CFG. It is called as parse tree.
Properties:
• Root node indicating start symbol
• Derivation is read from left to right
• Leaf nodes are terminals
• The interior nodes are non terminals
Example: Let G be the grammar
S aB | bA
A a | aS | bAA
B b | bS | aBB
For the string baaabbabba find leftmost, rightmost derivation and Derivation tree
First, write the production rules separately like below
S aB rule1
S bA rule2
Aa rule3
A aS rule4
A bAA rule5
Bb rule6
B bS rule7
B aBB rule8
Page | 8
3.4. Left most and right most derivations
Generation of the string using production rules is called derivation. In derivation of string:
replacing the non -terminal by appropriate rule. If there are more non-terminal available then
which non-terminal has to replace first; which is based on the two methods listed below:
Leftmost Derivation
The derivation is said to be leftmost if in each step the leftmost variable in the sentential form is
replaced. If in each step the right most variable is replaced, we call the derivation rightmost.
Example: consider the grammar with productions
S → aAB
A → bBb
B → A|λ
Page | 9
S → aAB → abBbB → abAbB → abbBbbB → abbbbB → abbbb is leftmost derivation of the
string abbbb. A rightmost derivation of the same string
S → aAB → aA → abBb → abAb → abbBbb → abbbb
Rightmost Derivation
Its derivation in which, rightmost non-terminal is replaced first from the sentential form
Simplification of context free grammar
All Grammars are not always optimized. That means grammars are containing some unnecessary
symbols(non-terminals) and this will increase the length of the grammar.
Simplification of grammar means reduction of grammar by removing useless symbols.
This is done in three ways:
i. Removal of useless symbols
ii. Elimination of ε production
iii. Removal of unit production
Removal of Useless Symbols
If a symbol does not exist on the RHS of the production rule and does not participate in the
derivation of any string, it may be considered to be useless. Similar to this, if a variable is not
used to create any strings, it may be worthless.
Example:
T → xxY | xbX | xxT
X → xX
Y → xy | y
Z → xz
In the example given above, the variable ‘Z’ will never occur in any string’s derivation. Thus,
the production Z → xz is useless. Therefore, let’s eliminate it. This way, all the other productions
would be written in a way that the variable Z can never reach from the ‘T’ starting variable.
Production X → xX would also be useless because it cannot terminate it in any way. But if it
never terminates, it can never ever produce a string. Thus, such a production can not participate
in any derivation ever.
Elimination of ε Production
All the productions that are of type S → ε are known as ε productions. These types of
productions can be removed only from the grammars that do not happen to generate ε.
Page | 10
Step 1: Find all nullable non-terminal variables, which derive ε first.
Step 2: For all the productions of X → a, construct all production X → a, where a is obtained
from x by removing non-terminals from the first step.
Step 3: Combine the result of the second step with the original production. Remove all the ε
productions.
Example:
Let us remove the production from the CFG given below by preserving its meaning.
S → ABA
A → 0A | ε
B → 1B | ε
Solution:
While removing ε production, the rule A → ε and B → ε would be deleted. To preserve the
CFG’s meaning, we are placing ε actually at the RHS whenever A and B have appeared.
Let us take
S → ABA
If the first A at RHS is ε. Then,
S → BA
Similarly, if the last A in RHS = ε. Then,
S → AB
If B = ε, then
S → AA
If B and A are ε, then
S→A
If both A is replaced by ε, then
S→B
Now,
S → AB | BA | AA | A | B
Now, let us consider
A → 0A
If we place ε at RHS for A, then
A→0
Page | 11
A → 0A | 0
Similarly, B → 1B | 1
We can rewrite the CFG collectively with removed ε production as
S → AB | BA | AA | A | B
A → 0A | 0
B → 1B | 1
Removing Unit Productions
The productions that result in the giving of one non-terminal to another are known as unit
productions. To get rid of unit production, take the following actions:
Step 1: To remove A → B, add production A → x to the grammar rule whenever B → x occurs
in the grammar.
Step 2: Delete A → B from the grammar.
Step 3: Repeat the first and the second steps until all the unit productions are removed.
Example:
S → 0X | 1Y | Z
X → 0S | 00
Y→1|X
Z → 01
Solution:
S → Z is a unit production. However, while removing S → Z, we have to consider what Z gives.
So, we can add a rule to S.
S → 0X | 1Y | 01
Similarly, Y → X is also a unit production, so we can modify it as
B → 1 | 0S | 00
Thus, we can write CFG finally without the unit production, as follows:
S → 0X | 1Y | 01
X → 0S | 00
Y → 1 | 0S | 00
Z → 01
3.5 Chomsky’s hierarchy of grammars
According to Chomsky hierarchy, grammar is divided into 4 types as follows:
Page | 12
1. Type 0 is known as unrestricted grammar.
2. Type 1 is known as context-sensitive grammar.
3. Type 2 is known as a context-free grammar.
4. Type 3 Regular Grammar.
Type 0: Unrestricted Grammar:
Type-0 grammars include all formal grammar. Type 0 grammar languages are recognized by
turing machine. These languages are also known as the Recursively Enumerable languages.
Grammar Production in the form of α → β where α is a string of terminals and nonterminals
with at least one non-terminal and α cannot be null. β is a string of terminals and non-
terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in the form
αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
Page | 13
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.
Example
AB → AbBc
A → bcA
B→b
Type-2 Context Free Grammar:
Type-2 grammars generate context-free languages.
The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
These languages generated by these grammars are be recognized by a non-deterministic
pushdown automaton.
Example
S→Xa
X→a
X → aX
X → abc
X→ε
Type-3 Regular Grammar:
Generate the regular languages. Such a grammar restricts its rules to a single non-terminal on the
left-hand side and the right-hand side consisting of a single terminal, possibly followed by a
single non-terminal (right regular).
Example
X→ε
X → a | aY
Y→b
Page | 14