Finite state machines in Python

• `fsm.py` which is on GitHub now

This is a library for the creation and manipulating of deterministic finite state machines in Python 3.

`fsm` class overview

This contains the basic `fsm` class. This allows you to create a deterministic finite-state machine (I guess "automaton" is a more correct term but never mind, if that bothers you too much you can do a find-and-replace).

You can use the `fsm.accepts()` method to pass a string to the FSM and retrieve `True` or `False`. The `fsm` class also implements `__str__()` which means you can print an FSM using `print(fsm)`.

Finally and most importantly, if you also have my companion `lego.py` module, you can call `fsm.lego()` to convert the FSM into an object representing a regular expression with exactly equivalent matching power to the original FSM. This process uses Brzozowski's algebraic method to carry out the conversion, as well as using the full power of `lego.py` to rearrange and simplify the resulting regular expression objects.

Example

To do: a slightly more impressive example than this.

```>>> from fsm import fsm
>>> a = fsm(
...     alphabet = {"a", "b"},
...     states   = {0, 1, 2},
...     initial  = 0,
...     finals   = {1},
...     map      = {
...             0 : {"a" : 1, "b" : 2},
...             1 : {"a" : 2, "b" : 2},
...             2 : {"a" : 2, "b" : 2},
...     },
... )
>>> print(a)
name final? a b
------------------
* 0    False  1 2
1    True   2 2
2    False  2 2

>>> print(repr(a.lego()))
charclass("a")
>>> print(a.lego())
a
>>> print(a.accepts(""))
False
>>> print(a.accepts("a"))
True
>>> print(a.accepts("b"))
False
>>> print(a.accepts("c"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fsm.py", line 68, in accepts
state = self.map[state][symbol]
KeyError: 'c'
```

Built-ins

Given any alphabet (collection of hashable "symbols"), you can generate some ready-made FSMs:

• `fsm1 = null(alphabet)` returns an FSM matching no strings at all.
• `fsm1 = epsilon(alphabet)` returns an FSM accepting only ε, also written "", the empty string.

Methods

Call `reversed(fsm1)` to return a reversed FSM. For each string that `fsm1` accepted, `reversed(fsm1)` will accept the reversed string.

Call `fsm1.strings()` to get a generator of strings that this FSM accepts.

Operators

`fsm.py` also provides code which allows you to perform all of the FSM combination operations alluded to above.

`fsm.__add__()` (called when you write `fsm3 = fsm1 + fsm2`) concatenates two FSMs. If the first FSM accepts all strings in A and the second FSM accepts all strings in B, then the resulting FSM will accept all strings of the form a·b where a is in A and b is in B.

`fsm.__and__()` (called when you write `fsm3 = fsm1 & fsm2`) produces an FSM accepting any string which is in both A and B. This is called intersection.

`fsm.__or__()` (called when you write `fsm3 = fsm1 | fsm2`) produces an FSM accepting any string which is in A or in B or both. This is called alternation.

`fsm.star()` takes a single FSM and connects its initial state to its final states. For any string a accepted by the input FSM, the output FSM will accept ε, a, aa, aaa, and so on.

`fsm.__mul__()` (called when you write `fsm2 = fsm1 * 7`) is essentially repeated self-concatenation. For any string a accepted by the input FSM, the output FSM will accept aaaaaaa.

All four of these operations are automatically followed with a clever routine to locate and merge duplicate states in the resulting FSM. This almost always makes the resulting FSM much simpler.

Converting to regular expression objects

Every finite state machine has an alphabet associated with it. Typically this alphabet will be a `set` of characters, such as `{"a", "b", "c"}`. If you have an FSM over this alphabet which is capable of recognising "any string", then upon conversion into a regex you might expect an expression like `.*`, with `.` standing for "any character". Instead, you will get an expression like `[abc]*`.

There is a way around this. By convention, if you add `lego.otherchars` to your alphabet, this will be taken to stand for "every possible character not explicitly mentioned in the rest of the alphabet". Thus, an FSM over the alphabet `{"a", "b", "c", otherchars}`, which can recognise `"a"` or `"b"` or `"c"`, will be turned into the regular expression `[abc]`, whereas an FSM over that same alphabet which can recognise `"a"` or `"b"` or `"c"` or `lego.otherchars` will be turned into `.` as you would expect.

FSM theory if you aren't already familiar

A finite state machine is a combination of three things:

• A finite alphabet Σ of symbols
• A finite set S of states
• A map δ : S × ΣS which tells the machine what to do when a symbol is passed to it.

One of the states is marked as the initial state, and any of the states may be marked as possible final states.

Clearly, if you take an arbitrary FSM, starting in its initial state, and pass it an arbitrary string of symbols, the FSM will now be in a new state. This state may be final or not. If the state is final, then we say that the FSM accepts that string. If there are no final states, or they exist but are unreachable, the FSM accepts no strings. If the starting state is final, then the FSM will accept the empty string (possibly others).

The (possibly infinite) set of finite strings which an FSM will accept is called a regular language. Likewise, a regular language is defined as any (possibly infinite) set of finite strings for which an FSM can be devised accepting only those strings and no others. Since FSMs can be combined in various ways to form additional FSMs, and every FSM corresponds to a unique regular language, it can be proven that regular languages themselves can be combined.

What about nondeterministic finite state machines?

Sorry, non-deterministic finite state machines are impossible with this module unless you modify it. I did have NFSM capability here at one point, but I found out that it was completely unnecessary for this project. NFSMs can be trivially converted to and from DFSMs, and this conversion was so trivial that I removed the NFSM steps entirely.

Discussion (4)

2010-07-31 18:29:27 by Fjord:

This has nothing to do with anything, but reading this was made marginally more difficult by my inability to stop mentally translating "FSM" to "Flying Spaghetti Monster." Just saying.

2010-08-01 05:10:11 by JeremyBowers:

Unless you've got a really strong reason to believe there's some chance that someone's going to use this to make billions of \$CURRENCY and laugh maniacally about how they don't have to pay you one thin \$SMALLEST_CURRENCY, I would just release this into the public domain. Licensing is for code chunks large enough or complicated enough for that to sound much less silly. Anything else is just a hassle for you.

2010-08-01 08:22:40 by lorg:

Usually I just release my code under the BSD license. Especially for small pieces I put up on my blog.

2010-08-01 22:57:15 by jonas:

To Fjord: maybe that's why these are more commonly called "finite automaton" (FA) instead.