© Copyright 2005 Peri Hankey - documentation license Gnu FDL - code license Gnu GPL - validate HTML
SourceForge.net Logo grammar

home formal grammar

In 1957, Noam Chomsky described a way of talking about language that proved immensely influential. The key elements in his approach were:

home two kinds of symbol

The two kinds of symbol in Chomsky's account of a grammar are:

home rewriting rules

An unrestricted rewriting rule is characterised as follows:

home a hierarchy of languages

Chomksy also showed that successive restrictions on the generality of rules in the grammar produce a hierarchy of languages in which the more general include the less general:

home received wisdom

Most researchers who took these ideas as a point of departure bought the whole bag - the description of a language as an infinite set of sentences, the description of grammar, and the idea that the grammar generates the sentences.

This was in many ways unfortunate: almost every application of formal grammar involves recognising and understanding the sentences of a language. From that standpoint, the unrestricted and context-sensitive grammars were not easy to understand, and so almost all efforts were concentrated on context-free and regular grammars.

So the received wisdom is that to think about language, you think in terms of regular grammars for low-level lexical analysis, with context-free grammars as a basis for the construction of parsers, where the object of parsing is to produce a tree structure. Beyond this it appears that there is very little that the theories of grammar can offer.

home analytic versus generative grammar

In fact it is perfectly possible to accept Chomksy's description of language and grammar, while stating that there are two possible ways in which a finite grammar can characterise a possibly infinite set of sentences:

In each case, the rest of Chomsky's ideas remain valid and useful. But the difference between generative and analytic grammars is very great, particularly in relation to unrestricted grammars - the most general category of grammar:

home analytic and generative rules

Unrestricted rewriting rules in a generative grammar go from sequences of symbols in the grammar towards the sentences of the language - they have this overall form:

 sequence-of-symbols-in-the-grammar => sequence-nearer-the-sentences-of-the-language;

Unrestricted rewriting rules in an analytic grammar go in the opposite direction, from the sentences of the language towards sequences of symbols in the grammar:

 sequence-nearer-the-sentences-of-the-language => sequence-of-symbols-in-the-grammar;

where '=>' means 'can be rewritten as'. In accounts of generative grammar, this would often be written using '->', while in the lmn metalanguage it would be written using '<-' to emphasise substitution back into the input stream; I am grateful to Nathan Sobo on http://lambda-the-ultimate.org for pointing out that in this context it was confusing to use both forms.

To the extent that an analytic grammar is the mirror image of a generative grammar, anything that can be said or proved about generative grammars should have a direct equivalent that can be applied to analytic grammars.

home analysis precedes generation

Unrestricted rules in a generative grammar replace patterns in the grammar with patterns that get closer to the sentences of the language. To operate an unrestricted generative grammar, you have to recognise patterns in the grammar in order to decide which generative rule to apply at each iteration. So some process of recognition or analysis is required before unrestricted rules in a generative grammar can be applied as part of a generative process. Analysis (or at least recognition) logically precedes generation.

home the language machine

The language machine directly implements unrestricted analytic grammars. It adds a system of variables with side-effect actions and calls on external procedures. It can be used to create useful applications, and above all it provides a way of thinking about and understanding how grammar works.