The difference between top-down parsing and bottom-up parsing

Given a formal grammar and a string produced by that grammar, parsing is figuring out the production process for that string.

In the case of the context-free grammars, the production process takes the form of a parse tree. Before we begin, we always know two things about the parse tree: the root node, which is the initial symbol from which the string was originally derived, and the leaf nodes, which are all the characters of the string in order. What we don't know is the layout of nodes and branches between them.

For example, if the string is acddf, we know this much already:

    S
   /|\

   ???

| | | | |
a c d d f

Example grammar for use in this article

  • Sxyz | aBC
  • Bc | cd
  • Ceg | df

Bottom-up parsing

This approach is not unlike solving a jigsaw puzzle. We start at the bottom of the parse tree with individual characters. We then use the rules to connect the characters together into larger tokens as we go. At the end of the string, everything should have been combined into a single big S, and S should be the only thing we have left. If not, it's necessary to backtrack and try combining tokens in different ways.

With bottom-up parsing, we typically maintain a stack, which is the list of characters and tokens we've seen so far. At each step, we shift a new character onto the stack, and then reduce as far as possible by combining characters into larger tokens.

Example

String is acddf.

Steps
  • ε can't be reduced
  • a can't be reduced
  • ac can be reduced, as follows:
  • reduce ac to aB
    • aB can't be reduced
    • aBd can't be reduced
    • aBdd can't be reduced
    • aBddf can be reduced, as follows:
    • reduce aBddf to aBdC
      • aBdC can't be reduced
      • End of string. Stack is aBdC, not S. Failure! Must backtrack.
    • aBddf can't be reduced
  • ac can't be reduced
  • acd can be reduced, as follows:
  • reduce acd to aB
    • aB can't be reduced
    • aBd can't be reduced
    • aBdf can be reduced, as follows:
    • reduce aBdf to aBC
      • aBC can be reduced, as follows:
      • reduce aBC to S
        • End of string. Stack is S. Success!
Parse trees
|
a
| |
a c
  B
| |
a c
  B
| | |
a c d
  B
| | | |
a c d d
  B
| | | | |
a c d d f
  B   C
| | | |\
a c d d f
| |
a c
| | |
a c d
    B
|  /|
a c d
    B
|  /| |
a c d d
    B
|  /| | |
a c d d f
    B C
|  /| |\
a c d d f
    S
   /|\
  / | |
 /  B C
|  /| |\
a c d d f

Example 2

If all combinations fail, then the string cannot be parsed.

String is acdg.

Steps
  • ε can't be reduced
  • a can't be reduced
  • ac can be reduced, as follows:
  • reduce ac to aB
    • aB can't be reduced
    • aBd can't be reduced
    • aBdg can't be reduced
    • End of string. Stack is aBdg, not S. Failure! Must backtrack.
  • ac can't be reduced
  • acd can be reduced, as follows:
  • reduce acd to aB
    • aB can't be reduced
    • aBg can't be reduced
    • End of string. stack is aBg, not S. Failure! Must backtrack.
  • acd can't be reduced
  • acdg can't be reduced
  • End of string. Stack is is acdg, not S. No backtracking is possible. Failure!
Parse trees
|
a
| |
a c
  B
| |
a c
  B
| | |
a c d
  B
| | | |
a c d g
| |
a c
| | |
a c d
    B
|  /|
a c d
    B
|  /| |
a c d g
| | |
a c d
| | | |
a c d g

Top-down parsing

For this approach we assume that the string matches S and look at the internal logical implications of this assumption. For example, the fact that the string matches S logically implies that either (1) the string matches xyz or (2) the string matches aBC. If we know that (1) is not true, then (2) must be true. But (2) has its own further logical implications. These must be examined as far as necessary to prove the base assertion.

Example

String is acddf.

Steps
  • Assertion 1: acddf matches S
    • Assertion 2: acddf matches xyz:
    • Assertion is false. Try another.
    • Assertion 2: acddf matches aBC i.e. cddf matches BC:
      • Assertion 3: cddf matches cC i.e. ddf matches C:
        • Assertion 4: ddf matches eg:
        • False.
        • Assertion 4: ddf matches df:
        • False.
      • Assertion 3 is false. Try another.
      • Assertion 3: cddf matches cdC i.e. df matches C:
        • Assertion 4: df matches eg:
        • False.
        • Assertion 4: df matches df:
        • Assertion 4 is true.
      • Assertion 3 is true.
    • Assertion 2 is true.
  • Assertion 1 is true. Success!
Parse trees
    S
    |
    S
   /|\
  a B C
    | |
    S
   /|\
  a B C
    | |
    c
    S
   /|\
  a B C
   /| |
  c d
    S
   /|\
  a B C
   /| |\
  c d d f

Example 2

If, after following every logical lead, we can't prove the basic hypothesis ("The string matches S") then the string cannot be parsed.

String is acdg.

Steps
  • Assertion 1: acdg matches S:
    • Assertion 2: acdg matches xyz:
    • False.
    • Assertion 2: acdg matches aBC i.e. cdg matches BC:
      • Assertion 3: cdg matches cC i.e. dg matches C:
        • Assertion 4: dg matches eg:
        • False.
        • Assertion 4: dg matches df:
        • False.
      • False.
      • Assertion 3: cdg matches cdC i.e. g matches C:
        • Assertion 4: g matches eg:
        • False.
        • Assertion 4: g matches df:
        • False.
      • False.
    • False.
  • Assertion 1 is false. Failure!
Parse trees
    S
    |
    S
   /|\
  a B C
    | |
    S
   /|\
  a B C
    | |
    c
    S
   /|\
  a B C
   /| |
  c d

Why left-recursion is a problem for top-down parsers

If our rules were left-recursive, for example something like this:

  • SSb

Then notice how our algorithm behaves:

Steps
  • Assertion 1: acddf matches S:
    • Assertion 2: acddf matches Sb:
      • Assertion 3: acddf matches Sbb:
        • Assertion 4: acddf matches Sbbb:
          • ...and so on forever
Parse trees
  S
  |
  S
  |\
  S b
  |
  S
  |\
  S b
  |\
  S b
  |
  S
  |\
  S b
  |\
  S b
  |\
  S b
  |

...

Back to Blog
Back to Things Of Interest

Discussion (17)

2011-06-27 22:15:12 by Gil:

I think you just made a really awesome lead-in to the halting problem.

2012-04-18 02:09:33 by Sowmyaaa:

splendid work buddy!!

2013-02-14 00:05:04 by Deborah:

I finally understood it! Thank you!

2013-07-25 11:06:26 by Pes:

Nicely written! Thanks!

2013-09-25 21:58:33 by Payel:

Really helpful..Thanks a lot

2013-12-15 17:49:59 by Joey:

Very helpful. Much better explanation than my theory of computation textbook.

2014-01-06 08:50:00 by melissa:

Very helpful..thanks!

2014-04-06 14:35:49 by naw:

i don't understand shit


aBdd can't be reduced
aBddf can't be reduced

but for some reason

aBdf can be reduced to aBC

why dd couldn't have been reduced to C or even df in aBddf (to aBdC) is apparently so obvious for everyone but me, that it didn't even need a brief explanation.

2014-04-07 21:52:28 by qntm:

The explanation is that I'm an idiot. Cheers.

2014-05-07 10:20:27 by Tom:

Great post, thanks.

2014-06-24 13:04:35 by Chris:

Great and easy-to-understand explanation.

2014-07-29 06:53:12 by hosein:

it was just an introduction, what is the application differences. where should one use top-down or bottom-up ?

2014-12-31 17:48:57 by Cristiano:

From Italy

Very nice representation of the parse tree generation
Thank you !

2015-01-04 19:40:59 by Nachi:

Bottom up parsing - "If not, it's necessary to backtrack and try combining tokens in different ways" I feel the way it is put along with shift reduce parsing is misleading. Backtracking is essentially an aspect avoided (more like solved) by shift-reduce parsing. They don't go together in bottom up parsing. Shift reduce parsers posses the potential to predict the handle to use by looking at the contents on top of the stack.
Good job BTW, there are very few people who write about compiler/language theory :)

2015-01-05 05:26:21 by Sérgio:

This is awesome. Well done.

2015-02-03 19:20:21 by Chris:

Thank you i got it ;)

2015-09-01 18:30:27 by Rajiv:

Drawings made the concept so much clear. Thanks :)

This discussion is closed.