hippos.lmn: this file is (C) Copyright 2006 Peri Hankey (mpah@users.sourceforge.net). This text is published under the terms of the Gnu General Public License, and comes with absolutely no warranty.
bogus-english, bogus-dutch, bogus-german
We have three toy languages with different word orderings. In each of them a sentence consists of a noun followed by nouns and verbs that are to be grouped in pairs.
We define a canonical ordering noun (noun (noun (...) verb) verb) in which the pairings are defined by simple nesting. The possible orderings are:
- I saw Cecilia help Henk feed hippos - bogus-english: noun verb1 noun1 verb2 noun2 ...
- I Cecilia Henk hippos saw help feed - bogus-dutch: noun noun1 noun2 ... verb1 verb2 ...
- I Cecilia Henk hippos feed help saw - bogus-german: noun noun1 noun2 ... ... verb2 verb1
In other words the canonical order is bogus-german. The objective is to recognise sentences in these different languages (as indicated by e: , d: , and g: prefixes) and translate them to the canonical order. Here is an example of what we expect:
- e: I saw Cecilia help Henk feed hippos; -> y {I Cecilia Henk hippos feed help saw }
- e: I saw Cecilia help Bill make hippos bite horses; -> y {I Cecilia Bill hippos horses bite make help saw }
- d: I Cecilia Bill hippos horses saw help make bite; -> y {I Cecilia Bill hippos horses bite make help saw }
- g: I Cecilia Bill hippos horses bite make help saw; -> y {I Cecilia Bill hippos horses bite make help saw }
- e: Bill made tigers eat hay; -> y {Bill tigers hay eat made }
- g: Bill tigers hay eat make saw; -> n {g: Bill tigers hay eat make saw;}
In the last case above, the input should not be recognised because there are too many verbs. The example that prompted all this is in an otherwise unrelated paper about combinatory categorial grammar by Mark Steedman.
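As a cross-check on the examples above, here is a rough Python sketch of the intended translation. It is not how the lmn rules work internally: it simply splits the words and rearranges them. The names `canonical` and `translate` are made up for this illustration, the word lists are copied from the vocabulary at the end of this file, and edge cases (a lone subject, runs of several spaces) are not meant to mirror the real rules.

```python
NOUNS = {"I", "Cecilia", "Gertrude", "Henk", "Bill",
         "hippos", "horses", "tigers", "hay"}
VERBS = {"make", "made", "saw", "help", "feed", "like", "eat", "bite"}


def canonical(lang, words):
    """Return the words in canonical (bogus-german) order, or None."""
    if not words or words[0] not in NOUNS:
        return None
    subject, rest = words[0], words[1:]
    if len(rest) % 2 != 0:
        return None                      # nouns and verbs cannot pair up
    if lang == "e":
        # bogus-english: verb1 noun1 verb2 noun2 ...
        verbs, nouns = rest[0::2], rest[1::2]
        verbs = verbs[::-1]
    elif lang in ("d", "g"):
        # bogus-dutch and bogus-german: all the nouns, then all the verbs
        half = len(rest) // 2
        nouns, verbs = rest[:half], rest[half:]
        if lang == "d":
            verbs = verbs[::-1]          # dutch verbs arrive outermost first
    else:
        return None
    if not all(n in NOUNS for n in nouns):
        return None
    if not all(v in VERBS for v in verbs):
        return None
    return [subject] + nouns + verbs


def translate(sentence):
    """Produce a 'y {...}' or 'n {...}' line in the style of the examples."""
    ok = (len(sentence) > 4 and sentence[1:3] == ": "
          and sentence.endswith(";"))
    result = canonical(sentence[0], sentence[3:-1].split()) if ok else None
    if result is None:
        return "n {" + sentence + "}"
    return "y {" + "".join(word + " " for word in result) + "}"


print(translate("e: I saw Cecilia help Henk feed hippos;"))
print(translate("g: Bill tigers hay eat make saw;"))
```

The first call prints `y {I Cecilia Henk hippos feed help saw }` and the second prints `n {g: Bill tigers hay eat make saw;}`, matching the expectations listed above.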
a few useful rules in the outermost context
say output <- eof - ;
.[ \n] <- eof - ;
- error <- eof - ;
rules that apply when generating output
eot <- output;
- out <- output -;
tr vv :V <- output - V;
n :N <- output - N;
v :V <- output - V;
deal with errors by consuming one line
- line :T <- error say "n {" T "}\n" eot;
- repeat .[^\n] % <- line % ;
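The error strategy can be pictured in Python roughly as follows. This is only a line-oriented approximation (the lmn rules work on the character stream), and `report`, `recognise` and `toy` are hypothetical names introduced for the sketch: whatever cannot be recognised is consumed up to the end of the line and echoed back inside `n {...}`.

```python
def report(text, recognise):
    """Translate each non-blank line, or echo it back as rejected.

    `recognise` is a hypothetical callback: it returns the translated
    words as a string, or None when the line is not a valid sentence.
    """
    results = []
    for line in text.split("\n"):
        if line.strip() == "":
            continue                              # skip blanks and newlines
        translated = recognise(line)
        if translated is None:
            results.append("n {" + line + "}")    # the whole line is consumed
        else:
            results.append("y {" + translated + "}")
    return results


# Toy recogniser, for illustration only: accepts lines that start with "ok".
def toy(line):
    return line.upper() if line.startswith("ok") else None


print(report("ok one\n\nbad two\n", toy))
```

This prints `['y {OK ONE}', 'n {bad two}']`: one bad line costs exactly one line of input, and recognition resumes with the next.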
rules for english, german and dutch word order
'e: ' n :S verb_noun_pairs :P ';' <- eof - say "y {" S P "}\n" eot;
'g: ' n :S nouns_verbs_reverse_order :P ';' <- eof - say "y {" S P "}\n" eot;
'd: ' n :S nouns_verbs_forward_order :P ';' <- eof - say "y {" S P "}\n" eot;
The rule for the verb_noun_pairs ordering is very simple: it builds a reverse-ordered list of verbs and a forward-ordered list of nouns. The other rules operate by requiring a verb for each noun recognised. The verb recogniser builds a forward list for the german ordering, and a reversed list for the dutch ordering.
- repeat rv :Rv nn :Nn <- verb_noun_pairs :{ Nn Rv };
- rqfv :Nn :V V <- nouns_verbs_reverse_order :{ Nn Fv };
- rqrv :Nn :V V <- nouns_verbs_forward_order :{ Nn Rv };
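The list-building idea behind the verb/noun pair ordering can be sketched in Python as below. This is an illustration only: `verb_noun_pairs` here is an ordinary function that borrows the rule's name, it works on words that have already been split, and it does no vocabulary checking.

```python
def verb_noun_pairs(words):
    """Turn verb1 noun1 verb2 noun2 ... into nouns followed by reversed verbs."""
    nouns, verbs = [], []
    for verb, noun in zip(words[0::2], words[1::2]):
        verbs.insert(0, verb)   # reverse-ordered list: newest verb goes first
        nouns.append(noun)      # forward-ordered list: newest noun goes last
    return nouns + verbs


print(verb_noun_pairs("saw Cecilia help Henk feed hippos".split()))
```

This prints `['Cecilia', 'Henk', 'hippos', 'feed', 'help', 'saw']`. Adding each verb at the front of its list is what turns the english order into the nested canonical order.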
These rules recognise a sequence of nouns and provide a recogniser that will match one verb for each noun. They differ only in the ordering of the list of verbs that will be produced when the verb list recogniser is applied.
- repeat nn :Nn require :F :R <- rqrv :Nn :{ each R };
- repeat nn :Nn require :F :R <- rqfv :Nn :{ each F };
The require rule provides verb recognisers on demand - one to recognise a verb and add it to a forward list, the other to recognise a verb and add it to a reverse list (it's concise to have one rule provide both, but it would be more efficient at runtime to use two separate rules).
- <- require :{ fv :Fv } :{ rv :Rv };
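The "one verb recogniser per noun" technique can be imitated in Python by collecting one small function for each noun and then running the collected functions over the verbs. This is a loose analogy under stated assumptions, not a description of how the language machine evaluates these rules: `nouns_then_verbs`, `take_forward` and `take_reverse` are invented names, and the word lists are copied from the vocabulary at the end of this file.

```python
NOUNS = {"I", "Cecilia", "Gertrude", "Henk", "Bill",
         "hippos", "horses", "tigers", "hay"}
VERBS = {"make", "made", "saw", "help", "feed", "like", "eat", "bite"}


def take_forward(verbs, verb):
    verbs.append(verb)          # like fv: add to a forward list


def take_reverse(verbs, verb):
    verbs.insert(0, verb)       # like rv: add to a reverse list


def nouns_then_verbs(words, reverse):
    """Recognise nouns, then demand exactly one verb per noun, or return None."""
    nouns, pending = [], []
    i = 0
    while i < len(words) and words[i] in NOUNS:
        nouns.append(words[i])
        # each noun queues up one verb recogniser, as `require` does
        pending.append(take_reverse if reverse else take_forward)
        i += 1
    verbs = []
    for take in pending:
        if i >= len(words) or words[i] not in VERBS:
            return None         # too few verbs
        take(verbs, words[i])
        i += 1
    if i != len(words):
        return None             # too many verbs, or trailing junk
    return nouns + verbs


print(nouns_then_verbs("Cecilia Henk hippos saw help feed".split(), reverse=True))
print(nouns_then_verbs("tigers hay eat make saw".split(), reverse=False))
```

The first call (dutch order, reversed verb list) prints `['Cecilia', 'Henk', 'hippos', 'feed', 'help', 'saw']`. The second prints `None`, because two nouns queue up only two verb recognisers and a third verb is left over.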
Here are the rules that recognise verbs and nouns and add them to lists: fv recognises a verb and adds it to a forward list Fv, rv recognises a verb and adds it to a reverse list Rv, and nn recognises a noun and adds it to a forward list Nn.
- v :V <- fv :{ Fv v :V };
- v :V <- rv :{ v :V Rv };
- n :N <- nn :{ Nn n :N };
vocabulary
And finally, here is a simple and minimal vocabulary of nouns and verbs. Note that the lexical details are very crude: exactly one space is required after each word, except that no space is required before a semicolon. These lexical details are not hard, they are just not the point of this exercise.
';' <- ' ;' ;
'I ' <- n :"I " ;
'Cecilia ' <- n :"Cecilia " ;
'Gertrude ' <- n :"Gertrude " ;
'Henk ' <- n :"Henk " ;
'Bill ' <- n :"Bill " ;
'hippos ' <- n :"hippos " ;
'horses ' <- n :"horses " ;
'tigers ' <- n :"tigers " ;
'hay ' <- n :"hay " ;
'make ' <- v :"make " ;
'made ' <- v :"made " ;
'saw ' <- v :"saw " ;
'help ' <- v :"help " ;
'feed ' <- v :"feed " ;
'like ' <- v :"like " ;
'eat ' <- v :"eat " ;
'bite ' <- v :"bite " ;