Projects

2006

AUREX/W

Automated retrieval and extraction of web contents

The goal of AUREX/W is to investigate the development of tools for information extraction from unstructured websites, thus making information extraction methods applicable to a significantly larger portion of the web in a largely task-independent way. AUREX/W aims at developing tools that are generic and easy to use for end-users, and at providing a working toolbox for certain common subtasks of the information extraction process. The project is focused primarily on extraction from German-language websites.

2004 – 2006

SIMAC

Semantic Interaction with Music Audio Contents

SIMAC's main task was the development of prototypes for the automatic generation of semantic descriptors and of prototypes for exploration, recommendation, and retrieval. One special feature was the development and use of semantic descriptors, that is, ways to tag music that are close to the user's way of describing its contents. We developed tools to be used by music consumers, music distributors, and music creators.

2004 – 2006

Creative Histories

The Josefsplatz Experience

The goal of the project is to reconstruct a complex 3D model of an urban environment, in particular the Viennese Josefsplatz, from historical pictures and paintings and to present this 4D information space (3D geometry over time) on PCs and mobile devices. Moreover, a user-adaptive meta-information system enables the visualisation of complex, interlinked historic events. A special focus is placed on the different qualities of historical information. In the project, OFAI concentrates on the realization of the meta-information system.

2003 – 2006

Architecture and Effective Development of a High-Quality Part-of-Speech Tagger

The general aim of the project is to enhance the quality of Part-of-Speech tagging by developing a tagger that combines the statistical approach with the Constraint Grammar based approach in such a way that (i) the strengths of each approach are accentuated, and (ii) their weaknesses are mutually compensated for. Apart from these theoretical aims, a validation and practical demonstration of the developed methodology is also due, together with an evaluation of the practical results achieved. This amounts to the following three main objectives of the project, which are at the same time its three contributions to the field of PoS tagging: (1) proposing and advocating a novel tagger architecture that combines the statistical and the Constraint Grammar based tagging schemes into a tagging system with higher accuracy than either of its components taken alone; (2) developing a systematic method for writing the rules of a Constraint Grammar tagger, together with a novel and more powerful method of applying them; (3) implementing and evaluating a combined tagger for German, employing the TnT tagger by T. Brants as the statistical component and the newly developed Constraint Grammar tagger for German, and using the NEGRA corpus as the evaluation standard.
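
One simple way to picture the combination of a statistical tagger with constraint rules is sketched below in Python. This is an illustration only, not the architecture developed in the project: the function names (`combine`, `no_verb_after_article`), the toy rule, and the probabilities are all hypothetical. The sketch assumes the statistical component delivers a probability for each candidate tag of a token, lets constraint rules discard readings that are impossible in context, and then picks the most probable remaining tag.

```python
from typing import Callable, Dict, List, Set

# Assumed output format of the statistical component: tag -> probability.
Candidates = Dict[str, float]
# A rule inspects the sentence at position i and returns the tags to discard there.
Rule = Callable[[List[str], List[Candidates], int], Set[str]]


def no_verb_after_article(tokens: List[str], cands: List[Candidates], i: int) -> Set[str]:
    """Toy constraint: no finite verb directly after an unambiguous article."""
    if i > 0 and set(cands[i - 1]) == {"ART"}:
        return {"VVFIN"}
    return set()


def combine(tokens: List[str], cands: List[Candidates], rules: List[Rule]) -> List[str]:
    """Apply all rules, then choose the most probable surviving tag per token."""
    result = []
    for i in range(len(tokens)):
        banned: Set[str] = set()
        for rule in rules:
            banned |= rule(tokens, cands, i)
        remaining = {t: p for t, p in cands[i].items() if t not in banned}
        if not remaining:
            # Never discard the last reading of a token.
            remaining = cands[i]
        result.append(max(remaining, key=remaining.get))
    return result


tokens = ["die", "Arbeiten"]
cands = [{"ART": 1.0}, {"VVFIN": 0.6, "NN": 0.4}]
print(combine(tokens, cands, [no_verb_after_article]))
```

In this toy input the statistical scores alone would prefer the verb reading for the second token; the constraint removes it, so the noun reading is selected.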

2005 – 2006

SPARC

Semantic Phonetic Automatic Reconstruction of Dictations

The SPARC project aims at integrating semantic knowledge bases into automatic speech recognition systems for dictation applications. Speech recognition systems, which take spoken text as input and convert it into written text, have long since reached a point where they can be employed commercially. An important application of speech recognition is automating document creation in institutions with a large dictation volume. This type of application poses a challenge for text processing due to its potentially large vocabulary. While in dialog or command-and-control systems 'semantics' is represented by the underlying databases or the set of possible system actions, dictation systems have to handle texts with much broader content, even if the domain is usually limited. To create documents from spoken texts, speech recognition systems usually rely only on an acoustic model and a language model that represents co-occurrence statistics of words. Based on this knowledge, a transcription of the spoken text is produced. To fully exploit the potential of language technology for automated dictation, systems must move away from simple transcriptions of spoken utterances towards document creation that conforms to the formal and informal requirements of specific types of texts. By making use of explicit semantic information, our project will contribute to this new dimension in automatic speech recognition technology for dictation systems. Improvements gained from the integration of semantic knowledge will concern document quality, word error rate, and usability.
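
To make the notion of a language model based on word co-occurrence statistics concrete, the following Python sketch estimates bigram probabilities from a tiny corpus. It is a minimal illustration under stated assumptions, not the model used in SPARC: the function name `bigram_probs`, the sentence-start marker `<s>`, and the example sentences are hypothetical, and no smoothing is applied.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_probs(sentences: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(current word | previous word) by relative frequency."""
    context_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        # "<s>" marks the sentence start so the first word has a context too.
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}


probs = bigram_probs(["the patient is stable", "the patient is well"])
print(probs[("is", "stable")])
```

Such a model only captures which words tend to follow each other; it carries no explicit semantic information, which is the gap the project description addresses.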

2003 – 2005

BioMint

Biological Text Mining

The main aim of this project is the development of a generic text mining tool for the biological domain. The BioMint tool will search the literature and automatically extract information from abstracts and papers in order to provide two essential research support services: (1) Curator's assistant: accelerate the annotation and update of bio-databases by partially automating them; and (2) Researcher's assistant: generate readable reports in response to queries from biological researchers and practitioners.

2002 – 2005

Knowledge Exploration in Science and Technology

The primary objective of the Action is to develop and implement computerised systems for extracting previously unknown, non-trivial, and potentially useful knowledge from structurally complex, high-volume, distributed, and fast-changing scientific and R&D databases within the context of current and newly developed global computing and data infrastructures such as the GRID.

1998 – 2005

Computer-Based Music Research

Artificial Intelligence Models of Musical Expression

The goal of this project is to use Artificial Intelligence methods to study the phenomenon of expressive music performance. The focus of the project is on developing and using machine learning and data mining methods for the analysis of expressive performance data. The goal is to gain a deeper understanding of this complex domain of human competence and to contribute new methods to the (relatively new) branch of musicology that tries to develop quantitative models and theories of musical expression.

2004 – 2005

An Automaton for the Moderation of Internet-based Discussion Fora

The project aimed at the development of an automaton for the moderation of internet-based discussion fora. Up to now, fora that have to fulfill certain legal and qualitative standards had to be moderated by hand. In the case of fora with intensive participation, this required a large amount of human work and often led to delays in publication. The goal of the project was a partially automatic checking of contributions according to given criteria in order to greatly reduce the human workload. A prototype was built and successfully tested with actual material from an Austrian online newspaper.

2004 – 2005

Foromat

Machine learning to qualify postings

The research and development project Foromat was OFAI's first endeavour to bring the benefits of machine learning to news media. Media owners and publishers have a legal (and moral) obligation to monitor the content published on their pages. Foromat helps forum moderators to identify postings that must not be published because of their infringing, abusive, or offensive content. The system has been running successfully at Der Standard since 2005.