Given a formal grammar and a string produced by that grammar, parsing is figuring out the production process for that string.
In the case of the context-free grammars, the production process takes the form of a parse tree. Before we begin, we always know two things about the parse tree: the root node, which is the initial symbol from which the string was originally derived, and the leaf nodes, which are all the characters of the string in order. What we don't know is the layout of nodes and branches between them.
For example, if the string is acddf, we know this much already:
S /|\ ??? | | | | | a c d d f
This approach is not unlike solving a jigsaw puzzle. We start at the bottom of the parse tree with individual characters. We then use the rules to connect the characters together into larger tokens as we go. At the end of the string, everything should have been combined into a single big S, and S should be the only thing we have left. If not, it's necessary to backtrack and try combining tokens in different ways.
With bottom-up parsing, we typically maintain a stack, which is the list of characters and tokens we've seen so far. At each step, we shift a new character onto the stack, and then reduce as far as possible by combining characters into larger tokens.
String is acddf.
| a
| | a c
B | | a c
B | | | a c d
B | | | | a c d d
B | | | | | a c d d f
B C | | | |\ a c d d f
| | a c
| | | a c d
B | /| a c d
B | /| | a c d d
B | /| | | a c d d f
B C | /| |\ a c d d f
S /|\ / | | / B C | /| |\ a c d d f
If all combinations fail, then the string cannot be parsed.
String is acdg.
| a
| | a c
B | | a c
B | | | a c d
B | | | | a c d g
| | a c
| | | a c d
B | /| a c d
B | /| | a c d g
| | | a c d
| | | | a c d g
For this approach we assume that the string matches S and look at the internal logical implications of this assumption. For example, the fact that the string matches S logically implies that either (1) the string matches xyz or (2) the string matches aBC. If we know that (1) is not true, then (2) must be true. But (2) has its own further logical implications. These must be examined as far as necessary to prove the base assertion.
String is acddf.
S |
S /|\ a B C | |
S /|\ a B C | | c
S /|\ a B C /| | c d
S /|\ a B C /| |\ c d d f
If, after following every logical lead, we can't prove the basic hypothesis ("The string matches S") then the string cannot be parsed.
String is acdg.
S |
S /|\ a B C | |
S /|\ a B C | | c
S /|\ a B C /| | c d
If our rules were left-recursive, for example something like this:
Then notice how our algorithm behaves:
S |
S |\ S b |
S |\ S b |\ S b |
S |\ S b |\ S b |\ S b |
...
Discussion (17)
2011-06-27 23:15:12 by Gil:
2012-04-18 03:09:33 by Sowmyaaa:
2013-02-14 00:05:04 by Deborah:
2013-07-25 12:06:26 by Pes:
2013-09-25 22:58:33 by Payel:
2013-12-15 17:49:59 by Joey:
2014-01-06 08:50:00 by melissa:
2014-04-06 15:35:49 by naw:
2014-04-07 22:52:28 by qntm:
2014-05-07 11:20:27 by Tom:
2014-06-24 14:04:35 by Chris:
2014-07-29 07:53:12 by hosein:
2014-12-31 17:48:57 by Cristiano:
2015-01-04 19:40:59 by Nachi:
2015-01-05 05:26:21 by Sérgio:
2015-02-03 19:20:21 by Chris:
2015-09-01 19:30:27 by Rajiv:
This discussion is closed.