Extended List Gazetteer Plugin
Description
NOTE: This plugin is out of date. You can find a newer version of the Extended Gazetteer PR in the StringAnnotation plugin.
This plugin provides an extended version of the original GATE ANNIE Gazetteer (DefaultGazetteer). In addition to the features of the original, built-in version of the List Gazetteer, this version provides features for more powerful matching of partial words:
- A word can be defined in one of four ways: 1) everything that is not whitespace, 2) everything that is not a letter, 3) everything that is not a digit, or 4) everything that is neither a letter nor a digit.
- In addition, you can define additional characters to be treated either as part of words or as part of whitespace.
- The plugin creates the additional annotations LookupPrefix and LookupSuffix for the parts of a word that come before or after the part matched by an entry from the gazetteer. Their features majorType, minorType, and ontology are the same as for the corresponding Lookup annotation.
- The Lookup annotations include the additional boolean features atEnd and atBeginning, which indicate whether the match is at the end or the beginning of a word.
- All generated annotations can include the string feature, which contains the actual text that corresponds to the annotation. This can be useful in JAPE rules, for example to match exceptions.
- For the Lookup and Lookup_prefix annotations, the additional features firstcharUpper (true or false) and firstcharCategory (the integer value corresponding to the Unicode category of the character) are generated.
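
The features listed above can be illustrated with a small, self-contained Java sketch. It is not the plugin's actual code: the class LookupFeaturesSketch, the method describeMatch, and the map keys "prefix" and "suffix" are hypothetical names chosen for illustration, and the sketch assumes the first word definition (a word is everything that is not whitespace). It shows how a match inside a word leaves a prefix and a suffix, and how atBeginning, atEnd, firstcharUpper, and firstcharCategory could be derived using the standard java.lang.Character methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LookupFeaturesSketch {

    /**
     * Hypothetical helper (not part of the plugin): describes a gazetteer
     * match covering text[start, end). Requires 0 <= start < end <= text.length().
     */
    static Map<String, Object> describeMatch(String text, int start, int end) {
        if (start < 0 || start >= end || end > text.length()) {
            throw new IllegalArgumentException("invalid match span");
        }
        // Assumption: a word is a maximal run of non-whitespace characters.
        int wordStart = start;
        while (wordStart > 0 && !Character.isWhitespace(text.charAt(wordStart - 1))) {
            wordStart--;
        }
        int wordEnd = end;
        while (wordEnd < text.length() && !Character.isWhitespace(text.charAt(wordEnd))) {
            wordEnd++;
        }

        String matched = text.substring(start, end);
        int first = matched.codePointAt(0);

        Map<String, Object> features = new LinkedHashMap<>();
        features.put("string", matched);
        // Text that LookupPrefix / LookupSuffix annotations would cover.
        features.put("prefix", text.substring(wordStart, start));
        features.put("suffix", text.substring(end, wordEnd));
        features.put("atBeginning", start == wordStart);
        features.put("atEnd", end == wordEnd);
        features.put("firstcharUpper", Character.isUpperCase(first));
        // Character.getType returns the Unicode general category as an int.
        features.put("firstcharCategory", Character.getType(first));
        return features;
    }

    public static void main(String[] args) {
        String text = "the Unbreakable code";
        int start = text.indexOf("break");
        System.out.println(describeMatch(text, start, start + "break".length()));
    }
}
```

In this example the gazetteer entry "break" matches inside "Unbreakable", so the prefix is "Un", the suffix is "able", and both atBeginning and atEnd are false. The category value is compared most safely against the constants in java.lang.Character, such as Character.LOWERCASE_LETTER.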
NOTE: Unlike the GATE ANNIE Gazetteer, all parameters except the URL of the input file, the encoding, and the feature separator are now runtime parameters, and are therefore not visible when you create the resource. They can, however, be changed at any time once the resource is included in a pipeline, without the need to re-create the processing resource.
Current version: 1.3
You can download the plugin as
INSTALLATION: Both the gzipped and the ZIP file contain a precompiled version, compiled with Sun JDK 1.6 under Linux. This should work with other operating systems or Java versions, but if not, the package can be recompiled in the standard way with a simple ant command.
Simply unpack the archive, then within GATE go to File->Manage Creole Plugins, press the "Add new CREOLE repository" button, and select the directory you have just created.
Once the plugin has been loaded this way, the new processing resource "Extended List Gazetteer" should appear in the "New" menu for processing resources.