prog1_re.py
Re
containing the following instance variables: an instance variable type
indicating which type of regexp it is, an instance variable symb
in case it is a symbol regexp, and two instance variables left
and right
for possible children. To illustrate how the various instances are used for each case of a regular expression construction, we provide a to_string()
method that translates a regular expression into a string.
class Re:
def __init__(self):
self.type = None
self.symb = None
self.left = None
self.right = None
def to_string(self):
if self.type == "eps":
return "\u0395"
elif self.type == "symb":
return self.symb
elif self.type == "union":
return self.left.to_string() + "\u222A" + self.right.to_string()
elif self.type == "conc":
return self.left.to_string() + "." + self.right.to_string()
elif self.type == "star":
return self.left.to_string() + "*"
The definition of the class Re
ends right here for now. We won't define constructors following the OOP programming fashion. Instead, constructors for regular expressions are defined outside its class, simply as functions.
We need 5 constructors, one for each type of regexp. We provide the two constructors for epsilon and union:
def eps():
r = Re()
r.type = "eps"
return r
def union(e1,e2):
r = Re()
r.type = "union"
r.left = e1
r.right = e2
return r
to_string()
function works, define the remaining constructors:
def symb(a):
...
def conc(e1,e2):
...
def star(e1):
...
Once you have defined all the constructors, it is easy to write regexps. For example, the regexp a.b is encoded as conc(symb("a"),symb("b"))
. The regexp (a*.b*)* is encoded as star(conc(star(symb("a")),star(symb("b"))))
. If you don't like to type symb("a")
or symb("b")
everytime then you can define ra = symb("a")
and rb = symb("b")
, and use ra
and rb
to define regexps. Considering the two previous expressions, they could be defined as conc(ra,rb)
and star(conc(star(ra),star(rb)))
, which is shorter and more readable.
print((star(conc(star(ra),star(rb)))).to_string())
a*.b**
This is a little bit confusing because there is an ambiguity. It is not clear whether the last star is applied to b*
or to the entire expression. Modify the to_string()
such it includes parantheses in all cases. After that, you may consider that it prints too many parantheses like in ((a)*.(b)*)*
, but that's okay for this assignment.
Encode regexps corresponding to the following regular languages (with Σ = {a,b}
):
ex1 =
is the regexp for the language containing all the strings starting and ending with the same symb. ex2 =
is the regexp for the language containing all the strings with "abba"
as substring.ex3 =
is the regexp for the language containing all strings without 3 consecutive a's. ex4 =
is the regexp for the language containing all strings of even length. ex5 =
is the regexp for the language containing all strings such each a
is followed by at least three b
's. match(w,e)
that decides whether a string w
matches the regexp e
.
To implement this function, you are going to need two helper functions split(w)
and split_all(w)
. These functions are necessary for the iteration in the concatenation and Kleene's star cases. We provide these functions below:
def split(w):
res = []
for i in range(len(w)+1):
res += [ [w[0:i],w[i:]] ]
return res
def split_all(w):
if len(w) == 0:
return [ ]
elif len(w) == 1:
return [ [w] ]
else:
res = [ ]
for i in range(len(w)-1):
w1 = w[0:i+1]
res += [ [w[0:i+1]]+lst for lst in split_all(w[i+1:]) ]
res += [[w]]
return res
The function call split(w)
returns the list of pair of strings such that the concatenation of the two string in a pair is equal to w.
The function call split_all(w)
returns the list containing all the list of strings of the form [w1,w2,...,wn]
such that each wi
is not the empty string and the concatenation w1+w2+...+wn
of all these strings is equal to w
.
I let you test this function on the string "xyzt"
to make sure you understand what it does.
> split("xyzt")
[['', 'xyzt'], ['x', 'yzt'], ['xy', 'zt'], ['xyz', 't'], ['xyzt', '']]
> split_all("xyzt")
[['x', 'y', 'z', 't'], ['x', 'y', 'zt'], ['x', 'yz', 't'], ['x', 'yzt'],
['xy', 'z', 't'], ['xy', 'zt'], ['xyz', 't'], ['xyzt']]
match(w,e)
using the skeleton code below. It returns True
when it matches, and it returns False
otherwise.
def match(w,e):
if e.type == "eps":
return w == ""
elif e.type == "symb":
...
elif e.type == "union":
...
elif e.type == "conc":
for p in split(w):
...
return False
elif e.type == "star":
...
You may need one helper function for the Kleene's star case. All the other cases should translate more or less directly from the pseudo-code for match()
ex0 = conc(union(ra,rb), union(ra,rb))
you would come up with the 6 strings "ab","ba", "bb", "", "a", "bab"
. To help you in this task, we provide a test_match
function that takes a list of tests and run them one after another and check that the output is as expected.
def test_match(lst):
if len(lst) == 0:
return
else:
the_test = lst[0]
w = the_test[0]
e = the_test[1]
b = the_test[2]
print("match("+ w + ", " + e.to_string() + ") =? " + str(b))
if match(w,e) == b:
print("Ok")
test_match(lst[1:])
else:
raise Exception("I have to fix that")
Using this function you could test match()
on ex0
by typing:
> test_match([ [ "ab",ex0,True ], [ "ba",ex0,True], [ "bb",ex0, True ],
[ "", ex0,False], [ "a" ,ex0,False],[ "bab",ex0,False] ])
match(ab, (a∪b).(a∪b)) =? True
Ok
match(ba, (a∪b).(a∪b)) =? True
Ok
match(bb, (a∪b).(a∪b)) =? True
Ok
match(, (a∪b).(a∪b)) =? False
Ok
match(a, (a∪b).(a∪b)) =? False
Ok
match(bab, (a∪b).(a∪b)) =? False
Ok
csci341
folder to your gitlab account. Add me bhcr001
to the list of developers for this project.
Copy your