TALK
****

Oesterreichisches Forschungsinstitut fuer Artificial Intelligence (OeFAI)
(Austrian Research Institute for Artificial Intelligence)
Schottengasse 3, A-1010 Wien
Tel.: +43-1-53361120, Fax: +43-1-5336112-77, Email: sec@oefai.at
-------------------------------------------------------------------------

Dr. Karel Oliva and Mag. Pavel Kveton
Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, Wien

A LINGUISTIC BASIS OF CORRECTLY TAGGED POS CORPORA

In this talk, we shall first review two notions from statistical (i.e. purely "quantitative") language processing, the "representativity of a corpus" and the "bigram", and try to give them a linguistic ("qualitative") interpretation. Based on these considerations, we shall develop a practical technique for detecting errors in a part-of-speech (POS) tagged corpus. We shall then generalize the approach in two orthogonal directions: from bigrams to n-grams (for any natural number n), and from error detection to genuine tagging. In the last section, we shall illustrate the error-detection method on the NEGRA corpus of German and discuss the general implications of this linguistics-based framework for statistical taggers.

Time:  Thursday, 21 March 2002, 18:30 sharp
Venue: Oesterreichisches Forschungsinstitut fuer Artificial Intelligence,
       Schottengasse 3, 1010 Wien

OESTERREICHISCHES FORSCHUNGSINSTITUT FUER ARTIFICIAL INTELLIGENCE
o.Univ.-Prof. Dr. Robert Trappl