Alert! Don't use this! It is hideously, unnecessarily, unmaintainably complicated under the covers and I no longer support it in any meaningful sense. Use Loco instead! Much more sensible.
Here's the PHP script containing the Lexer and Parser classes that you'll need to create your own PHP lexers and parsers.
And here are some pre-built samples.
- json.php - lexer and parser for the JSON data format. Turns JSON strings into pleasant PHP arrays.
- simpleComment.php - lexer and parser for user comments here at qntm.org. Validates that the supplied string fits the strict subset of HTML that is permitted. Returns the original comment in string form, but could just as easily return a cleaned-up comment, or nothing.
- bnf.php - lexer and parser for a very basic subset of Backus-Naur Form. Actually returns a new Parser object as output.
- bracketMatch.php - very simple worked example for Wikipedia's example well-formed parentheses context-free grammar. With extensive comments.
- perfDemo.php - demonstrates three ways to tokenise and then parse a very simple context-free language, in increasing order of performance.
- regEx.php - NEW! Turn a regular expression from a string into a physical, manipulable object instance. (I wrote this in an evening. The previous regex parser I wrote took me a week!)
How do I use RailroadParser?
Here's a simple worked example with a million comments: bracketMatch.php.
For more advanced demonstrations, I recommend inspecting and fooling around with the other examples above, such as the JSON example. Also take a look at the unit tests to see some usage examples and failure cases.
How does RailroadParser work?
RailroadParser is a top-down parser similar to an LL(1) parser.
As the name implies, RailroadParser is based on the concept of the railroad diagram. As parsing progresses, the parser maintains two pieces of information. The first is the incomplete parse tree. The second is the physical position on the railroad, which includes a stack of nested rules that we are currently inside. The position is singular (i.e. the parser is deterministic and can only be in one state at a time) and fully describes the current location on the railroad, including which rules need to be "popped" off the stack in order to exit the railroad at the far end.
In order to figure out which instantaneous description to jump to next, RailroadParser has to look into the future. Because of this, RailroadParser cannot be used with left-recursive grammars because a left-recursive grammar results in situations where the number of possible future states is infinite. However, all left-recursive grammars can be rewritten to be right-recursive instead, which are fine. Left-recursion is caught at Parser instantiation time.
Because RailroadParser is deterministic, it cannot handle non-deterministic context-free grammars, only deterministic ones, also known as the LL(k) grammars. RailroadParser can only accept grammars specified in an LL(1) form.
Discussion (14)
2011-04-17 11:00:22 by skztr:
2011-04-17 11:02:47 by Sam:
2011-04-17 20:09:48 by Sam:
2011-04-18 07:44:46 by skztr:
2011-04-18 10:06:10 by Sam:
2011-04-18 21:50:57 by tef:
2011-04-20 19:31:46 by Sam:
2011-05-12 03:08:10 by RasmusSchultz:
2011-07-06 08:52:46 by Bushee:
2011-07-06 10:15:40 by Sam:
2011-07-12 03:03:44 by intelChrisAKAChrisClark:
2011-07-12 16:01:38 by Bushee:
2011-07-13 23:09:58 by Sam:
2011-07-15 11:47:50 by Bushee:
add comment