2005 – 2006


Semantic Phonetic Automatic Reconstruction of Dictations

The SPARC project aims at integrating semantic knowledge bases into automatic speech recognition systems for dictation applications. Speech recognition systems, which take spoken text as input and convert it into written text, have long since reached a point where they can be commercially employed. An important application for speech recognition is automating document creation in institutions with a large dictation volume. This type of application poses a challenge for text processing due to its potentially large vocabulary. While in dialog or command-and-control systems 'semantics' is represented by the underlying databases or the set of possible system actions, dictation systems have to handle texts with a much broader content, even if the domain is usually limited. In order to create documents from spoken texts, speech recognition systems usually rely only on an acoustic model and a language model, which represents co-occurrence statistics of words. Based on this knowledge, a transcription of the spoken text is produced. To fully employ the potential of language technology for automated dictation, systems must move away from simple transcriptions of the spoken utterances to document creation conforming to the formal and informal requirements of specific types of texts. By making use of explicit semantic information, our project will contribute to this new dimension in automatic speech recognition technology for dictation systems. Improvements gained from the integration of semantic knowledge will concern document quality, word error rate and usability.

2003 – 2005


Biological Text Mining

The main aim of this project is the development of a generic text mining tool for the biological domain. The BioMint tool will search the literature and automatically extract information from abstracts and papers in order to provide two essential research support services: (1) Curator's assistant: accelerate, by partially automating, the annotation and update of bio-databases; and (2) Researcher's assistant: generate readable reports in response to queries from biological researchers and practitioners.

2002 – 2005

Knowledge Exploration in Science and Technology

The primary objective of the Action is to develop and implement computerised systems for extracting previously unknown, non-trivial, and potentially useful knowledge from structurally complex, high-volume, distributed, and fast-changing scientific and R&D databases within the context of current and newly developed global computing and data infrastructures such as the GRID.

1998 – 2005

Computer-Based Music Research

Artificial Intelligence Models of Musical Expression

The goal of this project is to use Artificial Intelligence methods to study the phenomenon of expressive music performance. The focus of the project is on developing and using machine learning and data mining methods for the analysis of expressive performance data. The goal is to gain a deeper understanding of this complex domain of human competence and to contribute new methods to the (relatively new) branch of musicology that tries to develop quantitative models and theories of musical expression.

2004 – 2005

An Automaton for the Moderation of Internet-based Discussion Fora

The project aimed to develop an automaton for the moderation of internet-based discussion fora. Until now, fora that have to fulfill certain legal and qualitative standards had to be moderated by hand. For fora with intensive participation, this required a large amount of human work and often led to delays in publication. The goal of the project was partially automated checking of contributions against given criteria in order to greatly reduce the human workload. A prototype was built and successfully tested with actual material from an Austrian online newspaper.

2004 – 2005


Machine learning to qualify postings

The research and development project Foromat was OFAI's first endeavour to bring the benefits of machine learning to news media. Media owners and publishers have a legal (and moral) obligation to monitor the content published on their pages. Foromat helps forum moderators identify postings that must not be published because of infringing, abusive, or offensive content. The system has been running successfully at Der Standard since 2005.

2003 – 2004


Artificial Intelligence Methods for eBusiness

This project aims at obtaining an overview of the potential of AI for eBusiness, studied in four sub-projects, plus the development of small-scale, prototypical applications in each of these areas.

2001 – 2004


A Net Environment for Embodied Emotional Conversational Agents

The objective of the NECA project was to develop a new generation of mixed multi-user / multi-agent virtual spaces populated by affective conversational agents. The agents are able to express themselves through synchronised emotional speech and non-verbal expression, generated from an abstract representation. This is the first time that such expressive capabilities are featured in Internet applications. The agents' usefulness was evaluated in two concrete application scenarios. From a technical point of view, the NECA platform provides a confederation of dedicated components including an affective reasoner, co-ordinated generation of verbal and nonverbal aspects of communication, and emotional speech synthesis, thus providing a basis for the development of new Internet applications with emotional agents. OFAI was the co-ordinating partner of the project. Moreover, OFAI was responsible for the representation of multimodal information and for text generation in the German versions of the demonstrators. OFAI also contributed to the speech synthesis.

1998 – 2003


Methods and Tools for Collocation Extraction and Performance-Oriented Parsing

The aim of this project was to lay the foundations for a new generation of systems that enable fast, efficient and robust natural language processing and are still sufficiently general. Based on the assumption that particular aspects of performance are grammaticalized, we pursued a novel approach to grammar where performance and competence aspects are already interleaved within the grammar model. In particular, we aimed at modeling the interaction of generativity, which is the distinctive feature of competence, and lexicalization, which is a feature of language usage. To achieve this goal, the influence of lexicalization on generativity was studied within the phenomenon of collocations. The interaction of lexical and structural information was modeled by means of corpus-based statistical techniques.