TreeGraSP is a research project funded by an ERC Consolidator Grant. The acronym stands for Tree rewriting grammars and the syntax-semantics interface: From grammar development to semantic parsing.
The project is hosted by the University of Düsseldorf and led by Professor Laura Kallmeyer.


The increasing amount of data available in our digital society is both an opportunity and a challenge for natural language processing. On the one hand, we have better possibilities than ever to extract and process meaning from language data, and recent techniques, in particular deep learning methods, have achieved impressive results. On the other hand, linguistic research has a much broader empirical basis and can aim at rich quantitative models of language. Unfortunately, theory and application interact too little in these areas of meaning extraction and grammar theory. Current semantic processing techniques do not sufficiently capture the complex structure of language, while grammatical theory does not sufficiently incorporate data-driven insights about language.
TreeGraSP bridges this gap by combining rich linguistic theory with data-driven approaches to large scale statistical grammar induction and to semantic parsing. The novelty of its approach consists in putting semantics at the center of grammar theory, putting an emphasis on multilinguality and typological diversity, and adopting a constructional approach to grammar.

TreeGraSP is interdisciplinary and innovative in several respects.
It contributes to the field of linguistics by a) making theories of grammar explicit, b) providing a grammar implementation tool for linguists working in typology, and c) developing means to obtain a quantitative grammar theory. It also contributes to the field of computational semantics by providing a probabilistic theory of meaning construal that can be used for textual entailment and reasoning applications.

The challenge lies in the intended transfer between theoretical linguistics and statistical natural language processing.


Prof. Laura Kallmeyer

Dr. Rainer Osswald

Dr. Simon Petitjean

WP 1: Formalizing and implementing RRG (syntax and semantics)

TreeGraSP builds on previous theoretical work concerning the specific form of the syntax-semantics interface using Tree Adjoining Grammar (TAG), and it will add work on formalizing Role and Reference Grammar (RRG) on this basis. RRG is a rich, empirically motivated grammar theory that has so far lacked formalization. By formalizing it, we will gain new insights into the mathematical properties of grammar and we will be able to provide tools for RRG implementation that will be highly useful to the community. Conversely, insights on grammar architecture, for instance the mechanisms underlying argument linking, can be transferred from RRG to TAG. Our view is that TAG (as a linguistic theory) and RRG share the same semantic grammar specifications, while they differ in their syntactic building blocks and in the syntactic composition operations.
The overall goal of WP 1 is an integrated TAG/RRG grammar development framework including a thorough formalization, implementation tools, an implementation of universal parts of the grammar such as a linking theory for the syntax-semantics interface, and a parser. As a test case, we plan the implementation of typologically interesting grammar fragments.

WP 2: Automatic (meta)grammar extraction and parsing (syntax)

WP 2 is concerned with syntactic grammar induction. This comprises on the one hand a supervised induction of elementary trees (for TAG/RRG) from treebanks and on the other hand a metagrammar induction starting from an existing TAG/RRG. In contrast to WP 1, WP 2 is concerned with a data-driven probabilistic approach. The resulting probabilistic (meta)grammar can then, in turn, be used for parsing, which requires the implementation of corresponding probabilistic parsers for TAG and RRG.
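To make the idea of a probabilistic grammar induced from a treebank more concrete, the following is a minimal sketch of relative-frequency estimation over elementary trees. It is an illustration only, not the project's method: the toy data, the tree names, and the function estimate_tree_probabilities are hypothetical, and the extraction step that cuts treebank trees into elementary trees is assumed to have already happened.

```python
from collections import Counter, defaultdict

# Hypothetical toy "treebank": each derivation is a list of
# (anchor category, elementary tree name) pairs. In practice these pairs
# would come from an extraction step over real treebank trees.
toy_derivations = [
    [("V", "transitive"), ("N", "np"), ("N", "np")],
    [("V", "intransitive"), ("N", "np")],
    [("V", "transitive"), ("N", "np"), ("N", "np")],
    [("V", "transitive"), ("N", "np"), ("N", "np_with_det")],
]


def estimate_tree_probabilities(derivations):
    """Relative-frequency estimate of P(elementary tree | anchor category)."""
    counts = defaultdict(Counter)
    for derivation in derivations:
        for category, tree in derivation:
            counts[category][tree] += 1

    probabilities = {}
    for category, tree_counts in counts.items():
        total = sum(tree_counts.values())
        probabilities[category] = {
            tree: n / total for tree, n in tree_counts.items()
        }
    return probabilities


probs = estimate_tree_probabilities(toy_derivations)
print(probs["V"])
```

A probabilistic parser could then use such estimates to rank competing derivations; smoothing and richer conditioning contexts are left out of this sketch.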

WP 3: Semantic Parsing

WP 3 extends the data-driven approach from WP 2 to semantics. Overall, we will pursue an approach inspired by Lewis & Steedman (2013), albeit with TAG/RRG instead of Combinatory Categorial Grammar (CCG). More concretely, via a probabilistic grammar of tree-meaning pairs, language will be mapped to semantic representations. The semantic representations are frames enriched with logical operators. Furthermore, the elements of the frames, i.e., the event predicates and semantic roles, are mapped to distributional vectors. These predicates can then be related to one another via their distributional properties, which makes it possible to reason with these representations.
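As a rough illustration of how frame elements might be related through distributional vectors, the sketch below represents a frame as a plain dictionary and compares event predicates by cosine similarity. Everything in it is an assumption made for the example: the three-dimensional vectors are invented, the frame layout is simplified, and the helper most_similar is hypothetical. Real vectors would be learned from corpus data.

```python
import math

# Hypothetical, hand-written vectors for three event predicates.
vectors = {
    "buy": [0.9, 0.1, 0.3],
    "purchase": [0.8, 0.2, 0.3],
    "sleep": [0.1, 0.9, 0.2],
}

# A simplified frame: an event type plus semantic roles filled by variables.
frame = {"type": "buy", "agent": "x", "theme": "y"}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(predicate, vectors):
    """Return the other predicate whose vector is closest to the given one."""
    candidates = (p for p in vectors if p != predicate)
    return max(candidates, key=lambda p: cosine(vectors[predicate], vectors[p]))


print(most_similar(frame["type"], vectors))
```

With these toy vectors, the predicate of the frame is closest to "purchase", which hints at how distributional similarity could support inferences between frames with related event types.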

Lewis, M. & M. Steedman. 2013. Combined Distributional and Logical Semantics. Transactions of the Association for Computational Linguistics 1. 179–192.