Finite state machines in Python

Okay, so I had an idea for a project and I decided that this would be as good a time as any to also learn Python so here's the first part of it. I can't release all of it right now but the rest is forthcoming.

This is (and I realise that this has been done before) a library for the creation and manipulating of deterministic finite state machines in Python. I realise that this has been done before in basically every language under the sun, Python included, but doing it for myself was incredibly good fun and highly educational.

fsm.py

This contains the basic fsm class. This allows you to create a deterministic finite-state machine (I guess "automaton" is a more correct term but never mind, if that bothers you too much you can do a find-and-replace). A finite state machine is a combination of three things:

One of the states is marked as the initial state, and any of the states may be marked as a possible final state.

Clearly, if you take an arbitrary FSM and pass it an arbitrary string of symbols, the FSM will now be in a new state. This state may be final or not. If the state is final, then we say that the FSM accepts that string. If there are no final states, or they exist but are unreachable, the FSM accepts no strings. If the starting state is final, then the FSM will accept the empty string (possibly others).

The (possibly infinite) set of finite strings which an FSM will accept is called a regular language. Likewise, a regular language is defined as any (possibly infinite) set of finite strings for which an FSM can be devised accepting only those strings and no others. Since FSMs can be combined in various ways to form additional FSMs, and every FSM corresponds to a unique regular language, it can be proven that regular languages themselves can be combined.

fsm.py also allows you to print an FSM and provides some simple sample FSMs, including a null FSM which accepts no strings, and epsilon, which is an FSM accepting only ε, also written "", the empty string.

Technically, a fully described finite state machine must completely define the transition function δ. That is, for every combination of a state and a symbol, there MUST be a resulting state defined by the transition function. fsm.py requires the transition function to include every state, but not every symbol in the alphabet. Transitions are tested by calling the getNext() method. If a state/symbol transition doesn't appear in the supplied transition function, an exception arises.

This allows the implicit presence of an "oblivion" state. Given a state/symbol combination which is undefined, it is assumed that you have instead made a transition to this unseen "oblivion" state, and any transition from this state leads back to oblivion. This is basically intended to save time and repetition. It's also not necessary; an FSM can be fully defined instead.

fsmops.py

fsmops.py provides code which allows you to perform all of the FSM combination operations alluded to above.

fsmconcatenate() takes an arbitrary number of FSMs and concatenates them together. If the first FSM accepts all strings in A and the second FSM accepts all strings in B, then the resulting FSM will accept all strings of the form a·b where a is in A and b is in B.

fsmalternate() produces an FSM accepting any string which is in A or in B or both.

fsmintersect() produces an FSM accepting any string which is in both A and B.

fsmstar() takes a single FSM and connects its initial state to its final states. For any string a accepted by the input FSM, the output FSM will accept ε, a, aa, aaa, and so on.

The above four methods all work by the same basic method, which is to take the input FSM(s) and use them to construct information such as isFinal() and getNext() and newInitialState. These variables and functions are passed to the _crawl() function which uses them to explore and map out the resulting finite state machine. _crawl() constructs the list of states, takes a note of which of the states are final, and constructs the transition function for the new FSM. This saved a LOT of duplicate code for me and the solution was much more elegant than what I had before. It also does some incredibly cool (and difficult to code!) stuff to strip out equivalent states. There's one unit test in fsmops.py which demonstrates this. I never got around to writing unit tests for the rest of the the module. Or package. I don't really know what the difference is, the Python documentation seems to use the terms interchangeably.

That's it for now. There's more to come. One other thing: I wanted to use this opportunity to open up a dialogue to figure out exactly what license to release this code under. Up until now I've used the "I write, you read" model of code release which proved somewhat controversial, but I want people to be able to do stuff with this.

What about nondeterministic finite state machines?

Sorry, non-deterministic finite state machines are impossible with this module unless you modify it. I did have NFSM capability here at one point, but I found out that it was completely unnecessary for this project. NFSMs can be trivially converted to and from DFSMs, and this conversion was so trivial that I removed the NFSM steps entirely.)

Back to Code
Back to Things Of Interest
StumbleUpon Twitter Hacker News Facebook Reddit Digg del.icio.us Email

Discussion (4)

2010-07-31 18:29:27 by Fjord:

This has nothing to do with anything, but reading this was made marginally more difficult by my inability to stop mentally translating "FSM" to "Flying Spaghetti Monster." Just saying.

2010-08-01 05:10:11 by JeremyBowers:

Unless you've got a really strong reason to believe there's some chance that someone's going to use this to make billions of $CURRENCY and laugh maniacally about how they don't have to pay you one thin $SMALLEST_CURRENCY, I would just release this into the public domain. Licensing is for code chunks large enough or complicated enough for that to sound much less silly. Anything else is just a hassle for you.

2010-08-01 08:22:40 by lorg:

Usually I just release my code under the BSD license. Especially for small pieces I put up on my blog.

2010-08-01 22:57:15 by jonas:

To Fjord: maybe that's why these are more commonly called "finite automaton" (FA) instead.

add comment