Given a formal grammar and a string produced by that grammar, parsing is figuring out the production process for that string.
In the case of the contextfree grammars, the production process takes the form of a parse tree. Before we begin, we always know two things about the parse tree: the root node, which is the initial symbol from which the string was originally derived, and the leaf nodes, which are all the characters of the string in order. What we don't know is the layout of nodes and branches between them.
For example, if the string is acddf, we know this much already:
S /\ ???      a c d d f
Example grammar for use in this article
 S → xyz  aBC
 B → c  cd
 C → eg  df
Bottomup parsing
This approach is not unlike solving a jigsaw puzzle. We start at the bottom of the parse tree with individual characters. We then use the rules to connect the characters together into larger tokens as we go. At the end of the string, everything should have been combined into a single big S, and S should be the only thing we have left. If not, it's necessary to backtrack and try combining tokens in different ways.
With bottomup parsing, we typically maintain a stack, which is the list of characters and tokens we've seen so far. At each step, we shift a new character onto the stack, and then reduce as far as possible by combining characters into larger tokens.
Example
String is acddf.
Steps
 ε can't be reduced
 a can't be reduced
 ac can be reduced, as follows:

reduce ac to aB
 aB can't be reduced
 aBd can't be reduced
 aBdd can't be reduced
 aBddf can be reduced, as follows:

reduce aBddf to aBdC
 aBdC can't be reduced
 End of string. Stack is aBdC, not S. Failure! Must backtrack.
 aBddf can't be reduced
 ac can't be reduced
 acd can be reduced, as follows:

reduce acd to aB
 aB can't be reduced
 aBd can't be reduced
 aBdf can be reduced, as follows:

reduce aBdf to aBC
 aBC can be reduced, as follows:

reduce aBC to S
 End of string. Stack is S. Success!
Parse trees
 a
  a c
B   a c
B    a c d
B     a c d d
B      a c d d f
B C    \ a c d d f
  a c
   a c d
B  / a c d
B  /  a c d d
B  /   a c d d f
B C  / \ a c d d f
S /\ /   / B C  / \ a c d d f
Example 2
If all combinations fail, then the string cannot be parsed.
String is acdg.
Steps
 ε can't be reduced
 a can't be reduced
 ac can be reduced, as follows:

reduce ac to aB
 aB can't be reduced
 aBd can't be reduced
 aBdg can't be reduced
 End of string. Stack is aBdg, not S. Failure! Must backtrack.
 ac can't be reduced
 acd can be reduced, as follows:

reduce acd to aB
 aB can't be reduced
 aBg can't be reduced
 End of string. stack is aBg, not S. Failure! Must backtrack.
 acd can't be reduced
 acdg can't be reduced
 End of string. Stack is is acdg, not S. No backtracking is possible. Failure!
Parse trees
 a
  a c
B   a c
B    a c d
B     a c d g
  a c
   a c d
B  / a c d
B  /  a c d g
   a c d
    a c d g
Topdown parsing
For this approach we assume that the string matches S and look at the internal logical implications of this assumption. For example, the fact that the string matches S logically implies that either (1) the string matches xyz or (2) the string matches aBC. If we know that (1) is not true, then (2) must be true. But (2) has its own further logical implications. These must be examined as far as necessary to prove the base assertion.
Example
String is acddf.
Steps

Assertion 1: acddf matches S
 Assertion 2: acddf matches xyz:
 Assertion is false. Try another.

Assertion 2: acddf matches aBC i.e. cddf matches BC:

Assertion 3: cddf matches cC i.e. ddf matches C:
 Assertion 4: ddf matches eg:
 False.
 Assertion 4: ddf matches df:
 False.
 Assertion 3 is false. Try another.

Assertion 3: cddf matches cdC i.e. df matches C:
 Assertion 4: df matches eg:
 False.
 Assertion 4: df matches df:
 Assertion 4 is true.
 Assertion 3 is true.

Assertion 3: cddf matches cC i.e. ddf matches C:
 Assertion 2 is true.
 Assertion 1 is true. Success!
Parse trees
S 
S /\ a B C  
S /\ a B C   c
S /\ a B C /  c d
S /\ a B C / \ c d d f
Example 2
If, after following every logical lead, we can't prove the basic hypothesis ("The string matches S") then the string cannot be parsed.
String is acdg.
Steps

Assertion 1: acdg matches S:
 Assertion 2: acdg matches xyz:
 False.

Assertion 2: acdg matches aBC i.e. cdg matches BC:

Assertion 3: cdg matches cC i.e. dg matches C:
 Assertion 4: dg matches eg:
 False.
 Assertion 4: dg matches df:
 False.
 False.

Assertion 3: cdg matches cdC i.e. g matches C:
 Assertion 4: g matches eg:
 False.
 Assertion 4: g matches df:
 False.
 False.

Assertion 3: cdg matches cC i.e. dg matches C:
 False.
 Assertion 1 is false. Failure!
Parse trees
S 
S /\ a B C  
S /\ a B C   c
S /\ a B C /  c d
Why leftrecursion is a problem for topdown parsers
If our rules were leftrecursive, for example something like this:
 S → Sb
Then notice how our algorithm behaves:
Steps

Assertion 1: acddf matches S:

Assertion 2: acddf matches Sb:

Assertion 3: acddf matches Sbb:

Assertion 4: acddf matches Sbbb:
 ...and so on forever

Assertion 4: acddf matches Sbbb:

Assertion 3: acddf matches Sbb:

Assertion 2: acddf matches Sb:
Parse trees
S 
S \ S b 
S \ S b \ S b 
S \ S b \ S b \ S b 
...
Discussion (17)
20110627 22:15:12 by Gil:
20120418 02:09:33 by Sowmyaaa:
20130214 00:05:04 by Deborah:
20130725 11:06:26 by Pes:
20130925 21:58:33 by Payel:
20131215 17:49:59 by Joey:
20140106 08:50:00 by melissa:
20140406 14:35:49 by naw:
20140407 21:52:28 by qntm:
20140507 10:20:27 by Tom:
20140624 13:04:35 by Chris:
20140729 06:53:12 by hosein:
20141231 17:48:57 by Cristiano:
20150104 19:40:59 by Nachi:
20150105 05:26:21 by Sérgio:
20150203 19:20:21 by Chris:
20150901 18:30:27 by Rajiv:
This discussion is closed.