WP 2 Perception and Action Project partners contributions
Input Processing Tool (OFAI)
Natural Language Input Processing Tool. The tool was conceived as a robust mechanism for constructing a rich annotation of natural language text. It integrates an extensive set of NLP components and resources with the ontologies and databases created for and used in the Rascalli platform. The tool structures a natural language text using a finite set of categories and symbols, and creates a computer-interpretable representation of human utterances. In this process, the tool incorporates knowledge about various concepts and relations: utterance categories, question categories, POS tags, WordNet relations, the role of words in an utterance (such as question focus or question interest), and the mapping from words in a user utterance to the available ontologies and database categories.
Natural Language Database Interface (OFAI)
For the first versions of the Rascalli system, OFAI developed a natural
language query interface to retrieve information from the music databases
available to the Rascalli agents. The tool analyses the parse tree of a
question and maps it to concepts and relations stored in the databases. If the
mapping is successful, a query is formulated and executed.
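The mapping step described above can be illustrated with a minimal sketch. It is not the OFAI implementation: a flat list of tagged tokens stands in for the parse tree, and the lexicon, the table and column names, the "NE" tag and the toy database are all hypothetical.

```python
import sqlite3

# Hypothetical lexicon: question nouns -> (table, column) in a toy music database.
LEXICON = {
    "album": ("albums", "title"),
    "albums": ("albums", "title"),
    "song": ("songs", "title"),
    "songs": ("songs", "title"),
}

def map_question(parse):
    """Map a simplified parse to (table, column, artist), or None if mapping fails.

    `parse` is a list of (token, tag) pairs; the assumed tag "NE" marks a named entity.
    """
    target = next((LEXICON[t.lower()] for t, _ in parse if t.lower() in LEXICON), None)
    artist = next((t for t, tag in parse if tag == "NE"), None)
    if target is None or artist is None:
        return None
    table, column = target
    return table, column, artist

def answer(conn, parse):
    """Formulate and execute a query only if the mapping succeeded."""
    mapping = map_question(parse)
    if mapping is None:
        return None  # mapping failed, so no query is formulated
    table, column, artist = mapping
    # Table and column come from the fixed lexicon; the artist is bound as a parameter.
    sql = f"SELECT {column} FROM {table} WHERE artist = ? ORDER BY {column}"
    return [row[0] for row in conn.execute(sql, (artist,))]

def demo_db():
    """Build an in-memory toy database with placeholder content."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE albums (title TEXT, artist TEXT)")
    conn.execute("CREATE TABLE songs (title TEXT, artist TEXT)")
    conn.executemany(
        "INSERT INTO albums VALUES (?, ?)",
        [("Album A", "Artist X"), ("Album B", "Artist X"), ("Album C", "Artist Y")],
    )
    return conn
```

With the toy database, a question such as "Which albums did Artist X release" maps to the albums table and returns the two matching titles, while an utterance with no recognizable concept or entity produces no query at all.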
Question Answering System (OFAI)
Internet-Based Open-Domain Question Answering System. The processing stages of the tool are: question analysis, accessing Internet resources, and analysing the accessed documents. The tool incorporates, amongst others, a number of natural language processing tools (named entity recognition, part-of-speech tagger, chunker, stemmer, etc.), information retrieval tools (document indexing and querying engine), and machine-learning-based classifiers and clustering solutions. It further includes SVM-based question classification, keyword extraction, keyword scoring, and query generation modules. A set of tools provides access to a variety of Internet websites such as wikipedia.com, dictionary.com, howstuffworks.com and wordnet.org, and to Internet search engines such as Google, Yahoo and Altavista. These tools can interpret the results returned by the Internet resources (e.g. extract distinct definitions for a given term, report on the ambiguity of a term or possible misspellings, and provide information on the number of available documents for a given query, the distinct senses of a given term, or the lack of term-related documents or of a searched definition). The tools used to analyse the retrieved documents include text segmentation, document scoring and indexing, a local search engine (using Lemur Indri), and the Fine Grained Answer Candidates Extraction module (employing a combination of LingPipe NER, gazetteers, and extraction rules).
RSS Feed Tool (OFAI)
RSS feeds provide a mechanism for Rascalli agents to detect changes in their
environment. While technically RSS feeds have to be polled and retrieved (and
thus obtaining information from an RSS feed is an active behaviour), they can
easily be modelled within the Rascalli system as part of a dynamic
and changing environment, with new feed items arriving in a manner that is temporally
unpredictable for the agent. We therefore implemented an RSS feed tool which
continuously (i.e. at very short intervals) polls and
retrieves the feeds that have been registered by a Rascalli agent, without requiring
any further activation. As soon as new
data is retrieved, the tool triggers a signal which is perceived by the
Rascalli agent.
The RSS feed tool includes a mechanism to filter news feeds for sets of
keywords, so that a Rascalli is informed only of news items that match
at least one keyword.
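The polling and filtering behaviour can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the class name FeedWatcher, the fetch and on_new_item callbacks, and the use of guid, link or title to recognise already-seen items are all hypothetical, and only plain RSS 2.0 without namespaces is handled.

```python
import xml.etree.ElementTree as ET

class FeedWatcher:
    """Polls one registered feed and signals only new, keyword-matching items."""

    def __init__(self, fetch, on_new_item, keywords=None):
        self.fetch = fetch              # callable returning the feed as RSS XML text
        self.on_new_item = on_new_item  # callback standing in for the agent's signal
        self.keywords = [k.lower() for k in (keywords or [])]
        self.seen = set()               # identifiers of items already processed

    def poll_once(self):
        root = ET.fromstring(self.fetch())
        for item in root.iter("item"):
            title = item.findtext("title", default="")
            description = item.findtext("description", default="")
            # Assumption: guid, then link, then title identifies an item.
            key = item.findtext("guid") or item.findtext("link") or title
            if key in self.seen:
                continue
            self.seen.add(key)
            text = f"{title} {description}".lower()
            # With no keywords registered, every new item is signalled.
            if not self.keywords or any(k in text for k in self.keywords):
                self.on_new_item(title)
```

In a running system, poll_once would be called in a loop at short intervals and fetch would download the feed over HTTP; here the fetch function is injected so that the change detection and the keyword filter can be exercised without network access.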
Domain Modelling, Information Extraction and Visualization (DFKI)
DFKI has continued its research and implementation work on
applying NLP to relation extraction. The major task is the development
of an
NLP-based information acquisition service platform that retrieves
and extracts musician-relevant information from web sites of interest,
utilizing IR, ontology and NLP tools, and allows easy access to the
acquired information. The information acquisition service platform
consists of three components:
domain modelling: it defines the relevant concepts of the musician
profiles and the associated gossip information, together with their
relations. The concepts and their relations are structured in an
ontology;
relation extraction: it identifies the relevant concepts
and detects relations among different concepts;
visualization: it enables users to access the acquired
data via different options, such as ontology-driven hierarchical
navigation and question answering.
The interaction among the three components is depicted in the following
figure:
Internet-based Perception: Information Discovery via Information
Wrapping and Information Extraction (DFKI)
DFKI has worked on automatic and semi-automatic acquisition of data
from the web, in order to collect sufficient information and knowledge
about musicians, in particular personal profiles and other gossip
information. The methods developed for this work package contribute to
perception, while the acquired knowledge supports the actions of
agents. We apply information wrapping, information extraction
and information merging techniques. The whole discovery process is embedded in
a bootstrapping framework: it starts with some examples and
learns more and more information over several iterations.
Information Wrapping: Data Collection from Structured or
Semi-Structured Web Portals: This process aims to extract data from
websites where gossip information about musicians is described,
e.g., the community site Wikipedia
and NNDB, a dedicated web portal for people and their profiles.
It starts with a set of musicians provided by the consortium partner
as seed. An extra data cleaning method was developed to
filter the noisy information contained in the seed content. The
data extractor component applies information wrapping
techniques: it sends a query about an artist from the seed set to the
websites and maps the extracted relevant pieces of information to the
RASCALLI gossip data types. The information merging
component merges the data discovered from the various sources, validates
them to some degree, and stores them in the gossip database.
Bootstrapping is triggered if new musicians are discovered: the
new person lists are then used as the new seed.
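The seed-driven loop described above can be sketched as follows. It is a simplified illustration rather than the DFKI implementation: the wrap function, its return shape (facts plus a list of related names), the merge-by-union policy and the round limit are assumptions, and data cleaning and validation are omitted.

```python
def bootstrap(seed, wrap, max_rounds=5):
    """Iterate wrapping from a seed set until no new musicians are discovered.

    `wrap(name)` is a hypothetical wrapper returning (facts, related_names),
    where `facts` is a dict of gossip attributes found for that musician.
    """
    database = {}            # merged records, keyed by musician name
    known = set(seed)        # every musician seen so far
    frontier = set(seed)     # musicians still to be queried
    for _ in range(max_rounds):
        if not frontier:
            break
        discovered = set()
        for name in sorted(frontier):
            facts, related = wrap(name)
            merged = database.setdefault(name, {})
            for key, value in facts.items():
                # Merging policy (assumed): keep every distinct value seen.
                merged.setdefault(key, set()).add(value)
            discovered |= set(related) - known
        known |= discovered
        frontier = discovered  # newly found musicians become the next seed
    return database
```

Starting from a single seed musician, each round queries only the musicians discovered in the previous round, so the loop ends either when a round finds nobody new or when the round limit is reached.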
Information Extraction: Relation
Extraction from Unstructured Free Texts: This process aims to
extract musicians and their properties, and to recognize and
classify the corresponding relation types. We applied named entity
recognition and dependency parsing to the unstructured free texts. We
automatically crawled relevant web articles reporting news
about musicians and ranked them according to occurrences of the
musicians. The minimally supervised machine learning method DARE took
the known musician list and the relations obtained by the information wrapping
method and extracted more and more relations in a bootstrapping manner.
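The general idea of bootstrapped relation extraction can be illustrated with a deliberately simplified sketch. It is not DARE: where the text above mentions dependency parsing, this sketch uses the surface string between two mentions as a pattern, and the function names, the capitalised-word heuristic for values and the example sentences are all assumptions.

```python
import re

def induce_patterns(sentences, pairs):
    """Collect the text between a known (musician, value) pair as a pattern."""
    patterns = set()
    for sentence in sentences:
        for name, value in pairs:
            m = re.search(re.escape(name) + r"(.+?)" + re.escape(value), sentence)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns, musicians):
    """Find (musician, value) pairs where a known musician precedes a pattern."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            for name in musicians:
                # Assumed heuristic: the value is a run of capitalised words.
                m = re.search(re.escape(name + pattern) + r"([A-Z]\w+(?: [A-Z]\w+)*)", sentence)
                if m:
                    found.add((name, m.group(1)))
    return found

def bootstrap_relations(sentences, seed_pairs, musicians, rounds=3):
    """Alternate pattern induction and extraction until nothing new is found."""
    pairs = set(seed_pairs)
    for _ in range(rounds):
        patterns = induce_patterns(sentences, pairs)
        new_pairs = apply_patterns(sentences, patterns, musicians) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs
```

Each round can learn patterns from the pairs extracted in the previous round, which is what lets a single seed example reach sentences phrased differently from the one it came from.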
Visual Browser (DFKI)
DFKI has developed the visual browser to give users access to
the
RASCALLI information. The visual browser tool was originally developed
in the dropping knowledge
project.
We have modified it and adapted it to the RASCALLI needs. The visual
browser takes a taxonomy or an ontology in OWL as input. Users
can navigate the concept hierarchy in a visually appealing and convenient way.
Furthermore, each concept node in the visual browser is associated
with information services
which provide additional information about the concept, such as search
in Wikipedia, search in Google, and search in the RASCALLI musician data. The
terminal
nodes in the visual browser graph are the musicians.
We provide not only the facts about a musician but also a network
view of the people related to them.
NLP Tools for Applications (DFKI)
DFKI has prepared several relevant NLP tools for the consortium.
They include the SProUT system for named
entity recognition and the Java interface of MINIPAR for
dependency parsing. We also experimented with the Stanford Parser and tools provided by OpenNLP.
RASCALLI is supported by the European Commission Cognitive Systems
Programme (IST-27596-2004).
RASCALLI develops a new type of personalized cognitive agents, the
Rascalli, that live and learn on the Internet.
Rascalli combine Internet-based perception, action, reasoning, learning, and
communication.
Rascalli come into existence when they are created by the user.
The users not only create their Rascalli but also
train them to fulfil specific tasks, such as being experts in a quiz
game or assisting the user in a music portal.