Non-deterministic Finite Automata (NFA)

Introduction

NFAs are cool! You are going to write the answer to the problems into a Python file named prog2_aut.py At the beginning of this file, copy the code from the previous programming assignment for:

class Re:
    def __init__(self):
        self.type = None
        self.char = None
        self.left = None
        self.right = None

    def to_string(self):
        ...

def eps():
    ...
        
def symb(c):
    ...
        
def union(e1,e2):
    ...
    
def conc(e1,e2):
    ...

def star(e1):
    ...

1 A datatype for Automata

The mathematical definition for an automaton is a 5-tuple (Q, Σ, Δ, s, F). You may have noticed that the last three components Δ, s and F are essential to describe an NFA, whereas the two first components Q and Σ could be inferred from the last 3 components. To keep our data-structure as simple as possible, we are going to keep only the last three components Δ, s and F. Therefore, a natural data-structure for automata is a triple stored in an object.

class Automata:
    def __init__(self,delta,start,accept):
        self.delta = delta
        self.start = start
        self.accept = accept

    def to_string(self):
        s1 = str(self.delta)
        s2 = str(self.start)
        s3 = str(self.accept)
        return "(" + s1 + ", " + s2 + ", " + s3 + ")"

The starting state self.start is a particular state (usually an integer). The set of accepting states self.accept is expected to be a list of states like [4,1,7]. The transition relation self.delta is expected to be a list of transitions, where a single transition is represented as a triple. For example a single transition [1, 'a', 3] encodes a transition from state 1 to state 3 labelled with symbol 'a'. A transition relation for an NFA recoginizing the language (ab)* could be [ [1,'a',2], [2,'b',3], [3,"eps",1]]. Notice how the epsilon transitions are labelled with the particular string "eps".

To illustrate the format expected for the transition function we provide the function transition_symb(delta,q,a) that returns the list of states p such that there exists a transtion [q,a,p] in the transition relation delta.

  
def transition_symb(delta, q, a):
    if delta == []:
        return []
    else:
        tr = delta[0]
        q1 = tr[0]
        a1 = tr[1]
        p  = tr[2]
        if q == q1 and a == a1:
            return [ p ] + transition_symb(delta[1:],q,a)
        else:
            return transition_symb(delta[1:],q,a)

Make sure you understand all the details of this function before proceeding. Remark that this function can be used with a being a symbol or an epsilon transition "eps". When the input symbol is "eps", the function returns all the states that can be reached from q taking exactly one epsilon-transition.

Problem 1

The epsilon-closure of a state q is the set of states that are reachable by an arbitrary number of epsilon-transitions. Implement the epsilon-closure of a state q according to a transition relation delta as the function


def eps_closure(delta,q):
  ...

  
To implement this function you may only use the function transition_symb() provided above. Hint: consider having two variables, one for the states you have already visited (initialized to []), another variable for the states you have to visit (initialized to [q]), and a while-loop that updates these two variables until there are no states available in states to visit.

Your function is expected to behave as in the following example:


>>> delta1 = [ [1,"eps",2], [2,'a',3], [4, "eps", 3], [4,"b", 5], [2,"eps",4], [3,"eps",2] ]
>>> print(eps_closure(delta1,1))
[3, 4, 2, 1]

The order of states appearing in the list does not matter. Nevertheless, no duplicates are expected in the result. Make sure the states 4 and 1 appear in the list.

2 NFA - String matching algorithm

Using the epsilon-closure defined previously, given a state q and a symbol a, we can compute the set of states that are accessible by an arbitrary number of epsilon-transitions followed by a transition labelled by a:

def transition_eps_symb(delta, q, a):
    eps_reached = eps_closure(delta,q)
    res = []
    for p in eps_reached:
        res = transition_symb(delta,p,a) + res
    return res

Make sure you understand all the details of this function before proceeding.

Problem 2

Using the two functions epsilon_closure() and transition_eps_symb(), implement the algorithm that decides whether a string w matches an NFA nfa, starting from a state q. The function can be defined recursively on the length of w and may have the structure given below:

def match_nfa(nfa,w,q):
    if w == "":
        eps_reached = eps_closure(nfa.delta,q)
        for p in eps_reached:
            ...
        return False
    else:
        a = w[0]
        ...

3 From regexps to NFA

We are now going to implement the algorithm that converts regexp into NFAs, like the one we have seen in class. The construction we have seen for each case of a regexp (eps, symb, union, conc, star) introduced at most 2 new states and some epsilon transitions depending on the case. In particular, remember the construction for conc and star... it creates epsilon-transitions from some accepting states to a starting state. To this matter it will be very useful to have the function below:

def conc_helper(states,s):
    res = []
    for q in states:
        res = [[q,"eps",s]] +res
    return res

For example, this function called on the set of states [1,3,5] and state 7 returns the following transitions:

>>>print( conc_helper([1,3,5],7) )
[[5, 'eps', 7], [3, 'eps', 7], [1, 'eps', 7]]

Problem 3

Implement the function re_to_nfa(e) that converts a regexp e to an NFA, completing the following piece of code filling up the parts indicated with "...", that is the construction for the epsilon, union and conc:

state_count = 0

def new_state():
    global state_count
    s = state_count
    state_count += 1
    return s

def re_to_nfa(e):
    if e.type == "eps":
        ...
    elif e.type == "symb":
        qstart = new_state()
        qaccept = new_state()
        return Automata([ [qstart, e.symb, qaccept] ],qstart,[qaccept])
    elif e.type == "union":
        ...
    elif e.type == "conc":
        ...
    elif e.type == "star":
        aut1 = re_to_nfa(e.left)
        s = new_state()
        delta = aut1.delta + [ [ s,"eps", aut1.start ] ] + conc_helper(aut1.accept, aut1.start)
        return Automata(delta, s, [s]+aut1.accept)

You may have noticed that this code is using a function new_state(). This is required to avoid having to rename the states by making sure the set of states allocated for 2 regexps are disjoint. To achieve this goal, we define a global variable named state_count and a function new_state() that increment the global variable state_count and returns its previous value. The global variable is invisible from re_to_nfa(). (Another solution without using a global variable is possible, but at the expense of adding an argument to re_to_nfa().)

4 Test your work

If you run the function re_to_nfa() on different regexp, you may want to reset the global variable state_count to 0 everytime.

Problem 4

Define three regexps for regexp written in comments

re1 = ...     # (aub)(cud)(euf)
re2 = ...     # (a*b*)*
re3 = ...     # (aub)*aaa((aub)*)

Convert each regexp to an NFA using re_to_nfa()

nfa1 = re_to_nfa(re1)
nfa2 = re_to_nfa(re2)
nfa3 = re_to_nfa(re3)

For each NFA, test the string matching with 4 representative strings. The strings should be choosen to illustrate the correctness of your implementation.

Make sure that if I run your program, it will print clearly all the tests, such that I can quickly evaluate your work, thanks!

5 Submit your work

Copy your file to the csci341 git folder. Add, commit and push your work.