SESCO

Semantic Tagging System for Corpora

SESCO is a tag set for the semantic representation of linguistic corpora. The tagging system follows a compositional approach based on event structures. Events are classified under only three major types (states, processes and actions), and each major type can be divided into subtypes according to the arguments it requires. The approach is compositional because a state has two arguments (an entity and its property or location), a process is a transition from one state to another, and an action is a process with an agent and a patient. Parts of an event that are not arguments are tagged as indirect relations.
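The compositional layering described above can be illustrated with a small Python sketch. It is only an illustration: the class names, field names and the "time" label are hypothetical and do not reproduce the actual SESCO tags or subtypes, and the English example sentences are invented for the purpose.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class State:
    """A state has two arguments: an entity and its property or location."""
    entity: str
    attribute: str  # property or location (hypothetical field name)


@dataclass(frozen=True)
class Process:
    """A process is a transition from one state to another."""
    source: State
    target: State


@dataclass(frozen=True)
class Action:
    """An action is a process with an agent and a patient."""
    agent: str
    patient: str
    process: Process


Event = Union[State, Process, Action]


@dataclass(frozen=True)
class Annotation:
    """An event plus the non-argument material, kept as indirect relations."""
    event: Event
    # Each indirect relation is a (label, text) pair; labels are hypothetical.
    indirect_relations: Tuple[Tuple[str, str], ...] = ()


def major_type(event: Event) -> str:
    """Return which of the three major event types an event belongs to."""
    if isinstance(event, Action):
        return "action"
    if isinstance(event, Process):
        return "process"
    return "state"


# "The door is open": a state.
door_open = State(entity="door", attribute="open")
# "The door opened": a process, from closed to open.
door_opens = Process(source=State("door", "closed"), target=door_open)
# "Ana opened the door": an action wrapping that process.
ana_opens_door = Action(agent="Ana", patient="door", process=door_opens)
# "Ana opened the door yesterday": "yesterday" is not an argument.
annotated = Annotation(ana_opens_door, (("time", "yesterday"),))
```

Each level reuses the one below it, so the state reached at the end of an action can be recovered as `ana_opens_door.process.target`.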

A 50,000-word subcorpus of the Spanish part of C-ORAL-ROM has been manually annotated. C-ORAL-ROM is a corpus of spoken language recorded under strict requirements of spontaneity and variety of speakers and contexts, and it represents a wide variety of speech acts performed in the daily use of language.

Sentences have been tokenized according to semantic constraints, and every sentence corresponds to a complete event structure.

In addition, the UAM Spanish Treebank, a corpus of 1600 written sentences taken from newspapers, has been semi-automatically annotated with SESCO.

Both corpora are being used for syntactic, semantic and pragmatic studies (for further information, see Publications/CV).