gate.treetagger2
Class TreeTaggerChunk

java.lang.Object
  extended by gate.util.AbstractFeatureBearer
      extended by gate.creole.AbstractResource
          extended by gate.creole.AbstractProcessingResource
              extended by gate.creole.AbstractLanguageAnalyser
                  extended by gate.treetagger2.TreeTaggerBase
                      extended by gate.treetagger2.TreeTaggerChunk
All Implemented Interfaces:
gate.creole.ANNIEConstants, gate.Executable, gate.LanguageAnalyser, gate.ProcessingResource, gate.Resource, gate.util.FeatureBearer, gate.util.NameBearer, java.io.Serializable

public class TreeTaggerChunk
extends TreeTaggerBase
implements gate.ProcessingResource

This class wraps the chunker function of TreeTagger, the language-independent POS tagger from the University of Stuttgart, Germany. It is a modified version of the plugin that comes with GATE version 3.1 and earlier, and it includes several changes and enhancements over the original version.

The following parameters are available for the TreeTaggerPOS part of speech tagger processing resource:

The output of the chunker is stored in three additional features for each annotation: chunktag, which holds the original tag assigned by the TreeTagger; chunkpart, which contains the sequence information from the chunktag; and chunktype, which contains the type information from the chunktag.
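
The relationship between the three features can be sketched in plain Java. This is only an illustration, not the plugin's code: the class ChunkTagFeatures and its method are hypothetical, and the sketch assumes that a chunk tag has the shape "B-NC", with the sequence information before the first hyphen and the type information after it. Tags without a hyphen are kept whole as chunkpart with an empty chunktype, which is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin. It derives the three
// features from one chunk tag, ASSUMING tags of the form "B-NC".
final class ChunkTagFeatures {

    static Map<String, String> fromChunkTag(String chunkTag) {
        Map<String, String> features = new LinkedHashMap<>();
        // chunktag keeps the tag exactly as the tagger produced it.
        features.put("chunktag", chunkTag);
        int dash = chunkTag.indexOf('-');
        if (dash < 0) {
            // Assumption: a tag without a hyphen carries no type information.
            features.put("chunkpart", chunkTag);
            features.put("chunktype", "");
        } else {
            features.put("chunkpart", chunkTag.substring(0, dash));
            features.put("chunktype", chunkTag.substring(dash + 1));
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {chunktag=B-NC, chunkpart=B, chunktype=NC}
        System.out.println(fromChunkTag("B-NC"));
    }
}
```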

To create annotations that span several tokens of the same type, you need to postprocess the output of the chunker with a JAPE transducer. An example JAPE rule file that works with the default token annotation type is provided in the plugin directory, in the subdirectory resources/grammar, as file join.jape.
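
To illustrate what such a joining step does, here is a minimal plain-Java sketch. It is not the join.jape grammar and does not use the GATE API. The classes ChunkJoiner, Token and Span are hypothetical, and the sketch assumes that chunkpart is "B" for the first token of a chunk and "I" for each following token of the same chunk; any other value is treated as lying outside a chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of joining per-token chunk features into spans.
final class ChunkJoiner {

    // A token with its character offsets and the two derived chunk features.
    static final class Token {
        final long start;
        final long end;
        final String chunkPart;
        final String chunkType;

        Token(long start, long end, String chunkPart, String chunkType) {
            this.start = start;
            this.end = end;
            this.chunkPart = chunkPart;
            this.chunkType = chunkType;
        }
    }

    // A span covering one or more consecutive tokens of the same chunk type.
    static final class Span {
        final long start;
        long end;
        final String type;

        Span(long start, long end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    static List<Span> join(List<Token> tokens) {
        List<Span> spans = new ArrayList<>();
        Span open = null; // the span currently being extended, if any
        for (Token t : tokens) {
            boolean continues = open != null
                    && "I".equals(t.chunkPart)
                    && open.type.equals(t.chunkType);
            if (continues) {
                // Same chunk: stretch the open span to cover this token.
                open.end = t.end;
            } else if ("B".equals(t.chunkPart) || "I".equals(t.chunkPart)) {
                // A new chunk starts here (an "I" with no matching open span
                // is treated as a start too, which is an assumption).
                open = new Span(t.start, t.end, t.chunkType);
                spans.add(open);
            } else {
                // Token outside any chunk: close the open span.
                open = null;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(0, 3, "B", "NC"));
        tokens.add(new Token(4, 7, "I", "NC"));
        tokens.add(new Token(8, 11, "B", "VC"));
        for (Span s : join(tokens)) {
            System.out.println(s.type + " " + s.start + "-" + s.end);
        }
    }
}
```

In a GATE pipeline the same effect is obtained declaratively by the JAPE transducer; the loop above only shows the grouping logic under the stated assumptions.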

Author:
Johann Petrak, Austrian Research Institute for AI (OFAI)
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class gate.creole.AbstractProcessingResource
gate.creole.AbstractProcessingResource.InternalStatusListener, gate.creole.AbstractProcessingResource.IntervalProgressListener
 
Field Summary
protected  boolean doPosTagging
           
protected  boolean includeLemma
           
 
Fields inherited from class gate.treetagger2.TreeTaggerBase
addToScriptParms, annotationSetName, debugMode, encoding, failOnUnmappableChar, tokenAnnotationType
 
Fields inherited from class gate.creole.AbstractLanguageAnalyser
corpus
 
Fields inherited from class gate.creole.AbstractProcessingResource
interrupted
 
Fields inherited from class gate.creole.AbstractResource
name
 
Fields inherited from class gate.util.AbstractFeatureBearer
features
 
Fields inherited from interface gate.creole.ANNIEConstants
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DATE_POSTED_ANNOTATION_TYPE, DOCUMENT_COREF_FEATURE_NAME, JOB_ID_ANNOTATION_TYPE, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME
 
Constructor Summary
TreeTaggerChunk()
           
 
Method Summary
 void execute()
          Run the TreeTagger on the current document.
 java.lang.Boolean getDoPosTagging()
          Get the flag for whether POS tags are also set from the chunker output.
protected  void getFeatures4Tokens(java.util.ArrayList lines, java.util.ArrayList tokens)
          Parse the lines of TreeTagger Chunker output and create features for the tokens.
 java.lang.Boolean getIncludeLemma()
          Get the flag for whether the lemma is also set from the chunker output.
 void setDoPosTagging(java.lang.Boolean newValue)
          Set the flag for whether POS tags are also set from the chunker output.
 void setIncludeLemma(java.lang.Boolean newValue)
          Set the flag for whether the lemma is also set from the chunker output.
 
Methods inherited from class gate.treetagger2.TreeTaggerBase
getAnnotationSetName, getDebugMode, getDocument, getEncoding, getFailOnUnmappableChar, getTokenAnnotationType, getTreeTaggerInvocationScriptParms, init, setAnnotationSetName, setDebugMode, setDocument, setEncoding, setFailOnUnmappableChar, setTokenAnnotationType, setTreeTaggerInvocationScriptParms
 
Methods inherited from class gate.creole.AbstractLanguageAnalyser
getCorpus, setCorpus
 
Methods inherited from class gate.creole.AbstractProcessingResource
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, reInit, removeProgressListener, removeStatusListener
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class gate.util.AbstractFeatureBearer
getFeatures, setFeatures
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.ProcessingResource
reInit
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, init, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.FeatureBearer
getFeatures, setFeatures
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 
Methods inherited from interface gate.Executable
interrupt, isInterrupted
 

Field Detail

doPosTagging

protected boolean doPosTagging

includeLemma

protected boolean includeLemma
Constructor Detail

TreeTaggerChunk

public TreeTaggerChunk()
Method Detail

execute

public void execute()
             throws gate.creole.ExecutionException
Description copied from class: TreeTaggerBase
Run the TreeTagger on the current document. This writes the document text to a temporary file, runs the tagger and processes its output to produce TreeTaggerToken annotations on the document.

Specified by:
execute in interface gate.Executable
Overrides:
execute in class TreeTaggerBase
Throws:
gate.creole.ExecutionException

getFeatures4Tokens

protected void getFeatures4Tokens(java.util.ArrayList lines,
                                  java.util.ArrayList tokens)
Parse the lines of TreeTagger Chunker output and create features for the tokens. The chunker output has the format:

word-POS POS/CHUNK

If the doPosTagging flag is true, the POS tags are also set from this output (using the first part of the POS/CHUNK pair). If the includeLemma flag is true, the invocation script is called with the -lemma option. In this case, the lemma is set in addition to the chunk tags, and the format is:

word-POS POS/CHUNK LEMMA

Specified by:
getFeatures4Tokens in class TreeTaggerBase
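
The line format described for getFeatures4Tokens can be illustrated with a small stand-alone parser. This is a sketch under assumptions, not the method's actual implementation: the class ChunkerLine is hypothetical, fields are assumed to be separated by whitespace, and the POS/CHUNK pair is split at its last slash.

```java
// Hypothetical parser for one line of chunker output in the format
// "word-POS POS/CHUNK" or, with lemmas, "word-POS POS/CHUNK LEMMA".
final class ChunkerLine {
    final String pos;
    final String chunkTag;
    final String lemma; // null when no lemma was requested

    private ChunkerLine(String pos, String chunkTag, String lemma) {
        this.pos = pos;
        this.chunkTag = chunkTag;
        this.lemma = lemma;
    }

    static ChunkerLine parse(String line, boolean includeLemma) {
        // Assumption: fields are separated by spaces or tabs.
        String[] fields = line.trim().split("\\s+");
        int needed = includeLemma ? 3 : 2;
        if (fields.length < needed) {
            throw new IllegalArgumentException("Too few fields in: " + line);
        }
        String pair = fields[1];
        // Assumption: the chunk tag contains no slash, so split at the last one.
        int slash = pair.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No POS/CHUNK pair in: " + line);
        }
        String pos = pair.substring(0, slash);
        String chunkTag = pair.substring(slash + 1);
        String lemma = includeLemma ? fields[2] : null;
        return new ChunkerLine(pos, chunkTag, lemma);
    }

    public static void main(String[] args) {
        ChunkerLine parsed = parse("cats-NNS NNS/I-NC cat", true);
        System.out.println(parsed.pos + " " + parsed.chunkTag + " " + parsed.lemma);
    }
}
```

A caller following the description above would use the pos field only when doPosTagging is true and the lemma field only when includeLemma is true.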

setDoPosTagging

public void setDoPosTagging(java.lang.Boolean newValue)
Set the flag for whether POS tags are also set from the chunker output.


getDoPosTagging

public java.lang.Boolean getDoPosTagging()
Get the flag for whether POS tags are also set from the chunker output.


setIncludeLemma

public void setIncludeLemma(java.lang.Boolean newValue)
Set the flag for whether the lemma is also set from the chunker output. If true, the invocation script is called with the -lemma option.


getIncludeLemma

public java.lang.Boolean getIncludeLemma()
Get the flag for whether the lemma is also set from the chunker output.