For a formal grammar `G` = { `N`, `Σ`, `P`, `S` }, where...

`N`is a finite set of*nonterminal symbols*`Σ`is a finite set of*terminal symbols*, disjoint from`N``P`is a finite set of*production rules*, each consisting of- A finite string of symbols on the left (with at least one of them being nonterminal in
`N`) - A finite string of symbols on the right

- A finite string of symbols on the left (with at least one of them being nonterminal in
- A special starting symbol
`S`which is nonterminal in`N`

...the formal language ** L**(

`G`) is the set of all finite strings of symbols which can be

*produced*by...

- Starting with a string consisting of just an
`S` - Applying a production rule to the current string, to turn it into a new string
- Repeating step 2 until there are no nonterminal symbols left in the string.

At this point, all the symbols in the string are terminal in `Σ`, which means no more production rules can be applied. So, the sentence itself is terminal.

While `G` is finite, ** L** may be finite, countably infinite, or empty.

**is distinct from the set of**

`L`*all*finite strings of terminal symbols in Σ, because

**only contains the**

`L`*producible*strings.

**is the set of all terminal strings.**

`L`### The relationship between `G` and `L`

`L`

It is a relatively straightforward recursive process to derive all the the members of ** L** from

`G`.

- Stage 1: start with
`S` - Stage 2: take each production rule in turn. Try to apply it to
`S`. Sometimes the rule is not applicable, but sometimes a new string is produced. Some of these are terminal. Add these to. The rest are nonterminal. Hand these off to Stage 3.`L` - Stage 3: take each production rule in turn. Take each nonterminal string from Stage 2 in turn. Try to apply the rule to the string. This produces many, many more strings. Some of these are terminal. Add these to
. The rest are nonterminal. Hand these off to Stage 4...`L` - Continue...

This process may continue forever, as we know, because ** L** may be infinite.

But what about reversing this process: turning a set of strings ** L** back into a finite formal grammar

`G`? This is much harder.

If ** L** is finite, e.g....

= {L

fjjw

a

moocow

floipojwerpingelrnv

kkkkkkk

peas

...

zzzzz

}

...then in theory we can just enumerate all the members of ** L** and have a grammar which produces each one in turn...

G= {

N= {S}

Σ= {a,b, ...,z} [every terminal character seen in]L

S=S, obviously

P= {

S→fjjw

S→a

S→moocow

S→floipojwerpingelrnv

S→kkkkkkk

S→peas

...

S→zzzzz

}

}

However, this is trivial, and not terribly helpful, because `G` is as huge as ** L** and therefore doesn't tell us anything about

**, which is the point of the exercise.**

`L`If we wish to make a *small* formal grammar from ** L**, then the task becomes substantially more difficult. If

**is infinite, then the task can even become impossible.**

`L`#### Proof that some languages have no formal grammar

If a *language* is a set of finite strings, then a language ** L** is a subset of the set (call it

`V`) of all possible finite strings. This means that the set of

*all*languages is the set of

*all*possible subsets of

`V`. This is called the power set of

`V`, which is written P(

`V`). A power set is always strictly larger than the original set.

`V`is countably infinite, meaning that P(

`V`) is uncountably infinite. There are uncountably many languages.

However, every *formal* grammar `G` is finite. This means that the set of all possible formal grammars is countably infinite. Each formal grammar generates a single language ** L**, which means, even including duplicates, that there only exists a countably infinite number of formal languages.

This means that uncountably many languages must *not* be formal languages; they cannot be generated by a finite formal grammar.

## Discussion (4)

## 2010-02-11 03:57:51 by pascal:

## 2010-02-12 04:18:06 by joel:

## 2010-02-12 10:38:11 by qntm:

## 2017-03-10 10:00:07 by Ingvar: