Regular Expressions
READING: CHAPTER 2
AN INTRODUCTION TO FORMAL
L ANGUAGES AND AUTOMATA THEORY
PETER LINZ
1
Regular Expressions
One way of describing regular languages
The notation involves
a combination of strings of symbols from some alphabet Σ
parentheses
operators +, ., and *
Definition
Let Σ be a given alphabet. Then
1. Ø,λ and a ∈ Σ are all regular expressions. These are called primitive regular
expressions.
2. If r1 and r2 are regular expressions, so are r1+ r2,r1.r2, , and (r1).
3. A string is a regular expression if and only if it can be derived from the primitive
regular expressions by a finite number of applications of the rules in (2).
2
Example
For Σ = {a, b, c}, the string
(a+b+c)* .(c+Ø) is a regular expression, since it is constructed
by application of the above rules.
For example,
if we take r1 = c and r2 = Ø, we find that c + Øand (c + Ø)
are also regular expressions.
Repeating this, we eventually generate the whole string.
On the other hand, (a + b +) is not a regular expression, since
there is no way it can be constructed from the primitive
regular expressions.
3
Languages Associated
with Regular
Expressions
The language L(r) denoted by any regular expression r is defined by the
following rules.
1. Ø is a regular expression denoting the empty set.
2. λ is a regular expression denoting {λ}.
3. For every a ∈ Σ, a is a regular expression denoting {a}.
If r1 and r2are regular expressions, then
4. L (r1 + r2) = L (r1)∪ L (r2)
5.L (r1 · r2) = L (r1) L (r2)
6. L ((r1)) = L (r1)
7.L ( ) = (L (r1))*.
4
Example
For Σ = {a,b}, the expression
r=(a+b)*(a+bb)
is regular.
It denotes the language L (r)= {a, bb, aa, abb, ba, bbb,…}.
The first part, (a + b)*, stands for any string of a’s and b’s.
The second part, (a + bb) represents either an a or a double b.
Consequently, L(r) is the set of all strings on {a, b}, terminated by either
an a or a bb.
5
Example
The language denoted by the expression
r =(aa)* (bb)* b is
the set of all strings with an even number of a’s followed by an odd
number of b’s;
that is, L (r) = {a2nb2m+1: n ≥ 0, m ≥ 0}
6
Regular Expressions vs.
Finite Automata
Automata => more machine-like
Regular expressions => more program syntax-like
Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
Perl scripting –good for string processing
Lexical analyzers such as Lex or Flex
7
Finite Automata to RE
8
Regular Expressions vs. Finite
Automata
(a) nfa accepts Ø.
(b) nfa accepts {λ}.
(c) nfa accepts {a}.
9
Automation for Languages
10
Automation for
Languages
11
Regular Expression to
FA
12
Example
Find an nfa that accepts L(r), where
r=(a + bb)* (ba* + λ) i.e., Automaton accepting L ((a + bb)* (ba* + λ)).
(a) M1 accepts L(a + bb). (b) M2 accepts L (ba* + λ)
13
Algebraic Laws…
Distributive:
E(F+G) = EF + EG
(F+G)E = FE+GE
Idempotent:
E+E=E
14
Summary
Regular Expressions
Equivalence to Finite Automata
DFA to Regular Expression conversion
Regular Expression to NFA conversion
Algebraic Laws of Regular Expressions
15