java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractProcessingResource
gate.creole.AbstractLanguageAnalyser
at.ofai.gate.creole.MtlTransducer
public class MtlTransducer
Montreal Transducer: a cascaded, multi-phase, ontology-aware transducer using the JAPE language, which is a variant of the CPSL language. Requires Java 1.4 or higher.
The Montreal Transducer is based on the Transducer from the ANNIE suite but with the following added features:
To use this resource, the user must load the repository (or directory) containing the creole.xml and the resource jar file. The repository must be accessible via the file:// protocol. Unlike most resources, the repository cannot be a web URL (http://www...). This is because the transducer compiles Java code (the grammar rules) every time it is loaded, and the resource jar file must be part of the classpath when compiling, but only regular file URLs are allowed in the classpath. The resource will try to add the jar file to the classpath automatically; if problems arise when loading the transducer, add the jar file to the classpath manually before running the application.
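The file:// requirement can be checked before the repository is loaded. The following is a minimal sketch using only the Java standard library; the class name `RepositoryUrlCheck`, the method `isFileRepository` and the example locations are hypothetical and are not part of GATE or of this resource.

```java
import java.net.URI;

/** Hypothetical helper, not part of GATE or the Montreal Transducer. */
final class RepositoryUrlCheck {

    /** Returns true when the repository location uses the file: scheme. */
    static boolean isFileRepository(String location) {
        URI uri = URI.create(location);
        return "file".equalsIgnoreCase(uri.getScheme());
    }

    public static void main(String[] args) {
        // Example locations are made up for illustration.
        System.out.println(isFileRepository("file:///opt/gate/plugins/MtlTransducer/"));        // true
        System.out.println(isFileRepository("http://www.example.org/plugins/MtlTransducer/")); // false
    }
}
```

A caller could run such a check first and report a clear error for an http:// location instead of waiting for the grammar compilation to fail.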
The Montreal Transducer offers additional comparison operators for the left-hand-side constraints of a JAPE grammar. The standard ANNIE transducer allows only constraints like these:
{MyAnnot}
// true if the current annotation is a MyAnnot annotation
{MyAnnot.attrib == "3"}
// true if the attrib attribute has a value that is equal to 3
The Montreal Transducer allows the following constraints:
{!MyAnnot}
// true if NO annotation at current point is a MyAnnot
{!MyAnnot.attrib == 3}
// true if attrib is not equal to 3
{MyAnnot.attrib != 3}
// true if attrib is not equal to 3
{MyAnnot.attrib > 3}
// true if attrib > 3
{MyAnnot.attrib >= 3}
// true if attrib ≥ 3
{MyAnnot.attrib < 3}
// true if attrib < 3
{MyAnnot.attrib <= 3}
// true if attrib ≤ 3
{MyAnnot.attrib =~ "[Dd]ogs?"}
// true if regular expression matches attrib entirely
{MyAnnot.attrib !~ "[Dd]ogs?"}
// true if regular expression does not match attrib
See the notes on the equality operators, comparison operators, pattern matching operators and negation operator.
Notes on equality operators: "==" and "!="
The "!=" operator is the negation of the "==" operator, that is to say: {Annot.attribute != value} is equivalent to {!Annot.attribute == value}.
When a constraint on an attribute cannot be evaluated because an annotation does not have a value for the attribute, the equality operator returns false (and the difference operator returns true).
If the constraint's attribute is a string, then the String.equals method is called with the annotation's attribute as a parameter. If the constraint's attribute is an integer, then the Long.equals method is called. If the constraint's attribute is a float, then the Double.equals method is called. And if the constraint's attribute is a boolean, then the Boolean.equals method is called. The grammar parser does not allow other types of constraints.
Normally, when the types of the constraint's and the annotation's attribute differ, they cannot be equal. However, because some ANNIE processing resources (namely the tokeniser) set all attribute values as strings even when they are numbers (Token.length is set to a string value, for example), the Montreal Transducer can convert the string to a Long/Double/Boolean before testing for equality. In other words, for the token "dog":
{Token.attrib == "3"} is true using either the ANNIE transducer or the Montreal Transducer
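The equality rules described in these notes can be illustrated with a small standalone sketch. It is not the transducer's actual code: the class `EqualitySketch` and its methods are hypothetical, and the handling of strings that cannot be parsed (treated as "not equal") is an assumption.

```java
/** Hypothetical sketch of the equality rules; not the transducer's actual code. */
final class EqualitySketch {

    /**
     * constraintValue is a String, Long, Double or Boolean (the types the
     * grammar parser allows); annotationValue is the annotation's attribute
     * value, or null when the annotation has no value for the attribute.
     */
    static boolean isEqual(Object constraintValue, Object annotationValue) {
        if (annotationValue == null) {
            return false; // missing attribute: "==" is false, so "!=" is true
        }
        Object actual = annotationValue;
        if (actual instanceof String && !(constraintValue instanceof String)) {
            String text = (String) actual;
            try {
                if (constraintValue instanceof Long) {
                    actual = Long.valueOf(text);
                } else if (constraintValue instanceof Double) {
                    actual = Double.valueOf(text);
                } else if (constraintValue instanceof Boolean) {
                    // Assumption: only "true"/"false" count as boolean strings.
                    if (!text.equalsIgnoreCase("true") && !text.equalsIgnoreCase("false")) {
                        return false;
                    }
                    actual = Boolean.valueOf(text);
                }
            } catch (NumberFormatException e) {
                return false; // assumption: an unparsable string is simply not equal
            }
        }
        return constraintValue.equals(actual);
    }

    /** "!=" is the negation of "==". */
    static boolean isNotEqual(Object constraintValue, Object annotationValue) {
        return !isEqual(constraintValue, annotationValue);
    }

    public static void main(String[] args) {
        System.out.println(isEqual("3", "3"));               // true
        System.out.println(isEqual(Long.valueOf(3L), "3"));  // true, after string-to-Long conversion
        System.out.println(isNotEqual(Long.valueOf(3L), null)); // true
    }
}
```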
Notes on comparison operators: ">", "<", ">=" and "<="
If the constraint's attribute is a string, then the String.compareTo method is called with the annotation's attribute as a parameter (strings can be compared alphabetically). If the constraint's attribute is an integer, then the Long.compareTo method is called. If the constraint's attribute is a float, then the Double.compareTo method is called. The transducer issues a warning if an attempt is made to compare two Boolean values because, in Java 1.4, this type does not implement the Comparable interface and thus has no compareTo method.
The transducer issues a warning when it encounters an annotation's attribute that cannot be compared to the constraint's attribute because the value types are different, or because one value is null. For example, given a constraint {MyAnnot.attrib > 2}, a warning is issued for any MyAnnot in the document for which attrib is not an integer, such as attrib = "dog", because we cannot evaluate "dog" > 2. Similarly, {MyAnnot.attrib > 2} cannot be compared to attrib = 2.5 because 2.5 is a float. In this case, force 2 as a float with {MyAnnot.attrib > 2.0}.
The transducer does not issue a warning when the constraint's attribute is an integer/float and the annotation's attribute is a string but can be parsed as an integer/float. Some ANNIE processing resources (namely the tokeniser) set all attribute values as strings even when they are numbers (Token.length is set to a string value, for example), and because {Token.length < "10"} would lead to an alphabetical comparison, a workaround was needed so we could write {Token.length < 10}.
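The comparison rules can be sketched in the same way. This is an illustration only, under the assumption that "cannot be compared" is represented by a null result (the case where the transducer would issue a warning); the class `ComparisonSketch` and its methods are hypothetical.

```java
/** Hypothetical sketch of the comparison rules; not the transducer's actual code. */
final class ComparisonSketch {

    /**
     * Compares the annotation's value with the constraint's value.
     * Returns a negative, zero or positive Integer, or null when the two
     * values cannot be compared (where a warning would be issued).
     */
    static Integer compare(Object annotationValue, Object constraintValue) {
        if (annotationValue == null || constraintValue == null) {
            return null;
        }
        if (constraintValue instanceof String) {
            if (!(annotationValue instanceof String)) {
                return null;
            }
            // Alphabetical comparison of two strings.
            return ((String) annotationValue).compareTo((String) constraintValue);
        }
        try {
            if (constraintValue instanceof Long) {
                Long value = toLong(annotationValue);
                return value == null ? null : Integer.valueOf(value.compareTo((Long) constraintValue));
            }
            if (constraintValue instanceof Double) {
                Double value = toDouble(annotationValue);
                return value == null ? null : Integer.valueOf(value.compareTo((Double) constraintValue));
            }
        } catch (NumberFormatException e) {
            return null; // e.g. "dog" > 2 cannot be evaluated
        }
        return null; // Boolean and other types are not compared
    }

    private static Long toLong(Object value) {
        if (value instanceof Long) return (Long) value;
        if (value instanceof String) return Long.valueOf((String) value);
        return null; // e.g. a Double such as 2.5 against an integer constraint
    }

    private static Double toDouble(Object value) {
        if (value instanceof Double) return (Double) value;
        if (value instanceof String) return Double.valueOf((String) value);
        return null;
    }

    public static void main(String[] args) {
        // Token.length is stored as a string: "9" < 10 numerically ...
        System.out.println(compare("9", Long.valueOf(10L)) < 0);  // true
        // ... but "9" sorts after "10" alphabetically.
        System.out.println(compare("9", "10") > 0);               // true
    }
}
```

The last two lines of `main` show why {Token.length < "10"} and {Token.length < 10} can give different answers for the same token.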
Notes on pattern matching operators: "=~" and "!~"
The "!~" operator is the negation of the "=~" operator, that is to say: {Annot.attribute !~ "value"} is equivalent to {!Annot.attribute =~ "value"}.
When a constraint on an attribute cannot be evaluated because an annotation does not have a value for the attribute, the value defaults to an empty string ("").
The regular expression must be enclosed in double quotes, otherwise the transducer issues a warning:
{MyAnnot.attrib =~ "[Dd]ogs?"} is correct
{MyAnnot.attrib =~ 2} is incorrect
The regular expression must be a valid java.util.regex.Pattern, otherwise a warning is issued.
To have a match, the regular expression must cover the entire attribute string, not only a part of it. For example:
{MyAnnot.attrib =~ "do"} does not match "does"
{MyAnnot.attrib =~ "do.*"} matches "does"
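The whole-string matching behaviour corresponds to `Matcher.matches()` in java.util.regex, as opposed to `Matcher.find()`. The sketch below illustrates this with the examples above; the class `PatternMatchSketch` and its methods are hypothetical, and an invalid pattern simply throws `PatternSyntaxException` here, where the transducer would issue a warning.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the "=~" and "!~" rules; not the transducer's actual code. */
final class PatternMatchSketch {

    /** "=~": the regular expression must cover the entire attribute value. */
    static boolean matches(String regex, String attributeValue) {
        // A missing attribute value defaults to the empty string.
        String value = (attributeValue == null) ? "" : attributeValue;
        return Pattern.compile(regex).matcher(value).matches();
    }

    /** "!~" is the negation of "=~". */
    static boolean doesNotMatch(String regex, String attributeValue) {
        return !matches(regex, attributeValue);
    }

    public static void main(String[] args) {
        System.out.println(matches("do", "does"));          // false: only part of the string is covered
        System.out.println(matches("do.*", "does"));        // true
        System.out.println(matches("[Dd]ogs?", "Dogs"));    // true
        System.out.println(doesNotMatch("[Dd]ogs?", null)); // true: "" does not match
    }
}
```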
Notes on the negation operator: "!"
Bindings: when a constraint contains both negated and regular elements, the negated elements do not affect the bindings of the regular elements. Thus, {Person, !Organization} binds to the same annotations (amongst those that start at the current node in the annotation graph) as {Person}; the difference between the two is that the first will simply not match if one of the annotations starting at the current node is an Organization. On the other hand, when a constraint contains only negated elements, such as {!Organization}, it binds to all annotations starting at the current node. It is important to keep that in mind, especially when a rule ends with a constraint with negated elements only: the longest annotation at the current node will be preferred.
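The binding behaviour can be pictured with a simplified model in which each annotation starting at the current node is represented only by its type name. This is a sketch under that assumption, not the transducer's algorithm; the class `NegationBindingSketch` and its `bind` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of bindings with negated elements. */
final class NegationBindingSketch {

    /**
     * typesAtNode: types of the annotations starting at the current node.
     * Returns the types that are bound, or null when the constraint does not match.
     */
    static List<String> bind(List<String> typesAtNode, Set<String> required, Set<String> negated) {
        for (String type : typesAtNode) {
            if (negated.contains(type)) {
                return null; // a negated type is present: no match
            }
        }
        if (required.isEmpty()) {
            // Only negated elements: bind all annotations at the node.
            return new ArrayList<String>(typesAtNode);
        }
        if (!typesAtNode.containsAll(required)) {
            return null; // a required type is missing: no match
        }
        List<String> bound = new ArrayList<String>();
        for (String type : typesAtNode) {
            if (required.contains(type)) {
                bound.add(type); // negated elements never change what is bound
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        List<String> node = Arrays.asList("Person", "Token");
        Set<String> person = Collections.singleton("Person");
        Set<String> organization = Collections.singleton("Organization");
        Set<String> none = Collections.<String>emptySet();

        System.out.println(bind(node, person, organization)); // [Person], same as {Person}
        System.out.println(bind(node, none, organization));   // [Person, Token]
        System.out.println(bind(Arrays.asList("Person", "Organization"), person, organization)); // null
    }
}
```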
Conjunctions of constraints on different types of annotation
The Montreal Transducer allows constraints on different types of annotation. Though the JAPE implementation described in the GATE 2.1 User Guide details an algorithm that would allow such constraints, the ANNIE transducer does not implement it. This transducer does. The following examples do not work as expected with the ANNIE transducer but do with this transducer:
{Person, Organization}
{Person, Organization, Token.length == "10"}
{Person, !Organization}
Greedy Kleene operators: "*" and "+"
The ANNIE transducer does not behave consistently regarding the "*" and "+" Kleene operators. Suppose we have the following rule with 2 bindings:
({Lookup.majorType == title})+:titles ({Token.orth == upperInitial})+:names
Given the text "the Honourable Mr. John Atkinson", we expect the following bindings:
titles: "Honourable Mr."
names: "John Atkinson"
A non-greedy match would instead produce:
titles: "Honourable"
names: "Mr. John Atkinson"
Nested Class Summary |
---|
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource |
---|
gate.creole.AbstractProcessingResource.InternalStatusListener, gate.creole.AbstractProcessingResource.IntervalProgressListener |
Field Summary | |
---|---|
protected Batch | batch: The actual JapeTransducer used for processing the document(s). |
static java.lang.String | TRANSD_AUTHORISE_DUPLICATES_PARAMETER_NAME |
static java.lang.String | TRANSD_DOCUMENT_PARAMETER_NAME |
static java.lang.String | TRANSD_ENCODING_PARAMETER_NAME |
static java.lang.String | TRANSD_GRAMMAR_URL_PARAMETER_NAME |
static java.lang.String | TRANSD_INPUT_AS_PARAMETER_NAME |
static java.lang.String | TRANSD_OUTPUT_AS_PARAMETER_NAME |
Fields inherited from class gate.creole.AbstractLanguageAnalyser |
---|
corpus, document |
Fields inherited from class gate.creole.AbstractProcessingResource |
---|
interrupted |
Fields inherited from class gate.creole.AbstractResource |
---|
name |
Fields inherited from class gate.util.AbstractFeatureBearer |
---|
features |
Fields inherited from interface gate.creole.ANNIEConstants |
---|
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DATE_POSTED_ANNOTATION_TYPE, DOCUMENT_COREF_FEATURE_NAME, JOB_ID_ANNOTATION_TYPE, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_INSTANCE_FEATURE_NAME, LOOKUP_LANGUAGE_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME |
Constructor Summary | |
---|---|
MtlTransducer() | Default constructor. |
Method Summary | |
---|---|
void | cleanup(): Remove this class' jar file from the system classpath so that the system state is the same as when the init method was called (and before this class' jar file was added to the classpath, if missing). |
void | execute(): Implementation of the run() method from Runnable. |
java.lang.Boolean | getAuthoriseDuplicates(): Gets the authoriseDuplicates flag that allows or prevents the transducer from creating annotations that already exist at some point in the document. |
java.lang.String | getEncoding(): Gets the encoding used for reading the grammar file(s). |
java.net.URL | getGrammarURL(): Gets the URL to the grammar used to build this transducer. |
java.lang.String | getInputASName(): Gets the AnnotationSet used as input by this transducer. |
gate.creole.ontology.Ontology | getOntology(): Gets the ontology used by this transducer. |
java.lang.String | getOutputASName(): Gets the AnnotationSet used as output by this transducer. |
gate.Resource | init(): This method is the one responsible for initialising the transducer. |
void | interrupt(): Notifies all the PRs in this controller that they should stop their execution as soon as possible. |
void | setAuthoriseDuplicates(java.lang.Boolean newAuthoriseDuplicates): Sets the authoriseDuplicates flag that allows or prevents the transducer from creating annotations that already exist at some point in the document. |
void | setEncoding(java.lang.String newEncoding): Sets the encoding to be used for reading the input file(s) forming the JAPE grammar. |
void | setGrammarURL(java.net.URL newGrammarURL): Sets the grammar to be used for building this transducer. |
void | setInputASName(java.lang.String newInputASName): Sets the AnnotationSet to be used as input for the transducer. |
void | setOntology(gate.creole.ontology.Ontology ontology): Sets the ontology used by this transducer. |
void | setOutputASName(java.lang.String newOutputASName): Sets the AnnotationSet to be used as output by the transducer. |
Methods inherited from class gate.creole.AbstractLanguageAnalyser |
---|
getCorpus, getDocument, setCorpus, setDocument |
Methods inherited from class gate.creole.AbstractProcessingResource |
---|
addProgressListener, addStatusListener, fireProcessFinished, fireProgressChanged, fireStatusChanged, isInterrupted, reInit, removeProgressListener, removeStatusListener |
Methods inherited from class gate.creole.AbstractResource |
---|
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class gate.util.AbstractFeatureBearer |
---|
getFeatures, setFeatures |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface gate.ProcessingResource |
---|
reInit |
Methods inherited from interface gate.Resource |
---|
getParameterValue, setParameterValue, setParameterValues |
Methods inherited from interface gate.util.FeatureBearer |
---|
getFeatures, setFeatures |
Methods inherited from interface gate.util.NameBearer |
---|
getName, setName |
Methods inherited from interface gate.Executable |
---|
isInterrupted |
Field Detail |
---|
public static final java.lang.String TRANSD_DOCUMENT_PARAMETER_NAME
public static final java.lang.String TRANSD_INPUT_AS_PARAMETER_NAME
public static final java.lang.String TRANSD_OUTPUT_AS_PARAMETER_NAME
public static final java.lang.String TRANSD_ENCODING_PARAMETER_NAME
public static final java.lang.String TRANSD_GRAMMAR_URL_PARAMETER_NAME
public static final java.lang.String TRANSD_AUTHORISE_DUPLICATES_PARAMETER_NAME
protected Batch batch
Constructor Detail |
---|
public MtlTransducer()
Default constructor. Initialisation is performed by the init() method.
Method Detail |
---|
public gate.Resource init() throws gate.creole.ResourceInstantiationException
This method is the one responsible for initialising the transducer.
Specified by: init in interface gate.Resource
Overrides: init in class gate.creole.AbstractProcessingResource
Throws: gate.creole.ResourceInstantiationException
public void execute() throws gate.creole.ExecutionException
Implementation of the run() method from Runnable. This method is responsible for doing all the processing of the input document.
Specified by: execute in interface gate.Executable
Overrides: execute in class gate.creole.AbstractProcessingResource
Throws: gate.creole.ExecutionException
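public void interrupt()
Notifies all the PRs in this controller that they should stop their execution as soon as possible.
Specified by: interrupt in interface gate.Executable
Overrides: interrupt in class gate.creole.AbstractProcessingResource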
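public void setGrammarURL(java.net.URL newGrammarURL)
Sets the grammar to be used for building this transducer.
Parameters: newGrammarURL - a URL to a file containing a JAPE grammar.
public java.net.URL getGrammarURL()
Gets the URL to the grammar used to build this transducer.
Returns: a URL pointing to the grammar file.
public void setEncoding(java.lang.String newEncoding)
Sets the encoding to be used for reading the input file(s) forming the JAPE grammar.
Parameters: newEncoding - a String representing the encoding.
public java.lang.String getEncoding()
Gets the encoding used for reading the grammar file(s).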
public void setInputASName(java.lang.String newInputASName)
Sets the AnnotationSet to be used as input for the transducer.
Parameters: newInputASName - the name of an AnnotationSet
public java.lang.String getInputASName()
Gets the AnnotationSet used as input by this transducer.
Returns: the name of an AnnotationSet
public void setOutputASName(java.lang.String newOutputASName)
Sets the AnnotationSet to be used as output by the transducer.
Parameters: newOutputASName - the name of an AnnotationSet
public java.lang.String getOutputASName()
Gets the AnnotationSet used as output by this transducer.
Returns: the name of an AnnotationSet
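public void setAuthoriseDuplicates(java.lang.Boolean newAuthoriseDuplicates)
Sets the authoriseDuplicates flag that allows or prevents the transducer from creating annotations that already exist at some point in the document.
Parameters: newAuthoriseDuplicates - if set to false, the transducer performs right-hand-side actions as usual but does not add annotations to the output annotation set when an identical annotation exists at the same point in the document.
public java.lang.Boolean getAuthoriseDuplicates()
Gets the authoriseDuplicates flag that allows or prevents the transducer from creating annotations that already exist at some point in the document.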
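The effect of the authoriseDuplicates flag can be pictured with a small standalone model. It is only a sketch: the class `DuplicateFilterSketch` is hypothetical, and treating two annotations as identical when their offsets, type and features are all equal is an assumption, since the exact criterion is not stated here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of an output annotation set with a duplicate filter. */
final class DuplicateFilterSketch {

    private final boolean authoriseDuplicates;
    private final List<String> output = new ArrayList<String>();

    DuplicateFilterSketch(boolean authoriseDuplicates) {
        this.authoriseDuplicates = authoriseDuplicates;
    }

    /** Adds an annotation; returns false when it was skipped as a duplicate. */
    boolean add(long start, long end, String type, Map<String, String> features) {
        // Assumption: same offsets, type and features means "identical".
        // A TreeMap gives a stable key regardless of feature insertion order.
        String key = start + "-" + end + " " + type + " " + new TreeMap<String, String>(features);
        if (!authoriseDuplicates && output.contains(key)) {
            return false;
        }
        output.add(key);
        return true;
    }

    int size() {
        return output.size();
    }

    public static void main(String[] args) {
        Map<String, String> features = Collections.singletonMap("rule", "PersonTitle");
        DuplicateFilterSketch strict = new DuplicateFilterSketch(false);
        strict.add(4, 18, "Person", features);
        strict.add(4, 18, "Person", features); // skipped
        System.out.println(strict.size()); // 1
    }
}
```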
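public gate.creole.ontology.Ontology getOntology()
Gets the ontology used by this transducer.
Returns: an Ontology value.
public void setOntology(gate.creole.ontology.Ontology ontology)
Sets the ontology used by this transducer.
Parameters: ontology - an Ontology value.
public void cleanup()
Remove this class' jar file from the system classpath so that the system state is the same as when the init method was called (and before this class' jar file was added to the classpath, if missing).
Specified by: cleanup in interface gate.Resource
Overrides: cleanup in class gate.creole.AbstractProcessingResource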