GATE Plugin for S4

Source code
The source code for the GATE plugin is available from the S4 GitHub repository.

Introduction

This component provides access to the S4 text analytics services directly from the GATE platform. The S4 Annotator plug-in is implemented as a GATE Processing Resource (PR) and it acts as a local proxy to the remotely accessible RESTful services of S4, hiding the complexity of the underlying technologies and communication protocols. The PR can be integrated in any GATE processing pipeline regardless of the context and it does not have any requirements or assumptions about the type of pre-processing or post-processing of the textual data being annotated.

The following sections describe the procedures for downloading, configuring and running the S4 Annotator PR for GATE.

Prerequisites

An S4 API key and secret pair is required for accessing the S4 services. More details on acquiring S4 API keys are available in the S4 documentation.

Download

There are two options for downloading the plug-in, depending on user preference:

  • The first is supported by the GATE Developer environment via the CREOLE Plugin Manager user interface
  • The second involves manually downloading the plug-in binaries and extracting them to the proper location on the local file system

Downloading via the CREOLE Plugin Manager

Step 1: Start GATE
Step 2: Open the CREOLE Plugin Manager
Step 3: Open the Configuration tab (see the figure below)
Step 4: Choose User Plugin Directory (the directory where GATE will be downloading plugins)
Step 5: Click on the "plus" icon (it's a button on the right side of the Plugin Repositories box)
Step 6: For the plugin name write:

GATE S4 Plugin

and for the URL:

http://ontotext-ad.github.io/S4/GatePR/gate-update-site.xml
The URL must start with "http://" because GATE cannot handle encrypted ("https://") connections.

Step 7: Once the previous step has completed successfully, the Available to Install tab becomes enabled. The plug-in should be listed there (see the figure below).
Step 8: Check Install for the plugin and then click the Apply All button. The manager then downloads the plugin, after which it appears in the Installed Plugins tab.

Direct plug-in download

The plug-in binaries package is available at http://ontotext-ad.github.io/S4/GatePR/Annotator_S4/Annotator_S4.zip

Download the archive and extract it into the plugins directory of the GATE platform, usually $GATE_HOME/plugins. Then (re)start the GATE Developer application.
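
The manual download and extraction can also be scripted. The following is a minimal sketch using only the Python standard library; the function names download_plugin and extract_plugin are illustrative and not part of GATE or S4, and the sketch assumes the archive URL quoted above is still reachable over plain HTTP.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Archive URL quoted in the "Direct plug-in download" section.
PLUGIN_ZIP_URL = "http://ontotext-ad.github.io/S4/GatePR/Annotator_S4/Annotator_S4.zip"


def download_plugin(target_zip):
    """Fetch the plug-in archive to target_zip (performs a network request)."""
    urlretrieve(PLUGIN_ZIP_URL, target_zip)
    return Path(target_zip)


def extract_plugin(zip_path, plugins_dir):
    """Unpack the archive into plugins_dir and return the entry names it contained."""
    plugins_dir = Path(plugins_dir)
    plugins_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(plugins_dir)
        return archive.namelist()
```

A typical call would pass the path returned by download_plugin together with the plugins directory, for example the plugins folder under $GATE_HOME. GATE Developer still needs to be (re)started afterwards.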

Update

There are two ways for updating the plug-in depending on the chosen method for downloading the plugin.

Updating via the CREOLE Plugin Manager

Direct plug-in update

Remove the Annotator_S4 directory and repeat the steps from Direct plug-in download.
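
Because S4.config lives inside the plug-in directory, deleting that directory also deletes the stored credentials. The sketch below, with the hypothetical helper name remove_plugin, removes the Annotator_S4 directory and optionally keeps a copy of S4.config first. It assumes the plug-in was extracted as a directory named Annotator_S4 directly under the plugins directory.

```python
import shutil
from pathlib import Path


def remove_plugin(plugins_dir, backup_dir=None, plugin_name="Annotator_S4"):
    """Delete the installed plug-in directory.

    If backup_dir is given and S4.config exists, copy it there first.
    Returns True if a directory was removed, False if none was found.
    """
    plugin_dir = Path(plugins_dir) / plugin_name
    if not plugin_dir.is_dir():
        return False
    config = plugin_dir / "S4.config"
    if backup_dir is not None and config.is_file():
        Path(backup_dir).mkdir(parents=True, exist_ok=True)
        shutil.copy2(config, Path(backup_dir) / "S4.config")
    shutil.rmtree(plugin_dir)
    return True
```

After the removal, the fresh archive can be extracted as in the direct download procedure and the saved S4.config copied back.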

Configuration

If your GATE installation is in the Program Files folder (Windows), move the S4.config file to a different folder, for example "C:/Users/<user_name>/Documents/S4.config".
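
Relocating the file can be sketched as follows. The helper name relocate_config is illustrative; it copies rather than moves the file so the original stays in place, and it returns a file: URL on the assumption that this is the form expected by the PR's configFileURL parameter.

```python
import shutil
from pathlib import Path


def relocate_config(source, target_dir):
    """Copy the config file into target_dir and return the copy's file: URL."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(source, target_dir / Path(source).name))
    # as_uri() requires an absolute path, hence resolve().
    return copy.resolve().as_uri()
```

The returned URL is what configFileURL should point to when the PR is instantiated.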

Before loading the plug-in and instantiating the PR, a single file needs to be updated with the proper credentials (API keys). Recall from the Prerequisites section that an API key and secret pair is required for accessing the various S4 services. This information should be provided in the configuration file S4.config, located in the main directory of the plug-in ($GATE_HOME/plugins/Annotator_S4/S4.config). It is a plain text file containing two properties, one for the API key and one for the secret.
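
As a sanity check before loading the plug-in, a short script can confirm that the file has been filled in. This is only a sketch: it assumes S4.config uses a Java-properties-like key=value layout, which is not confirmed here, and it deliberately does not rely on any particular property names. Both function names are illustrative.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    ASSUMPTION: S4.config follows this Java-properties-like layout.
    """
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def config_problems(props):
    """Return a list of problems; an empty list means the file looks filled in."""
    problems = []
    if len(props) != 2:
        problems.append(f"expected 2 properties, found {len(props)}")
    problems += [f"property '{k}' has no value" for k, v in props.items() if not v]
    return problems
```

Calling config_problems(read_properties(path)) on the edited file should return an empty list when both properties have values.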

Loading the Plugin

Loading the plug-in follows the regular GATE procedure.

Step 1: Open the CREOLE Plugin Manager (File > Manage CREOLE plugins...)
Step 2: Open Installed plugins
Step 3: Find the Annotator_S4 plugin
Step 4: Check Load Now and/or Load Always
Step 5: Click the Apply All button

Step 6: Close the CREOLE Plugin Manager window
Step 7: Right-click Processing Resources to instantiate the PR
Step 8: Choose New > Annotator_S4

Step 9: Ensure the configFileURL refers to the correct configuration file
Step 10: Choose the proper S4 service endpoint URL for the service that you want to use (one of TwitIE, News or Semantic Biomedical Tagger).

For future extensions and new S4 services, the PR also provides an option for specifying the S4 service endpoint URL manually (customUrl).

Processing Documents with S4

Step 1: Load the documents you want to annotate
Step 2: Create a new application of type Corpus Pipeline
Step 3: Add the Annotator_S4 PR to the pipeline
Step 4: Select the proper corpus and execute the pipeline

Annotation Results

Run the pipeline and review the results.

Below is a sample document annotated with the News annotation service. A description of the annotation results is available in the News Annotation documentation.

Below is a sample document annotated with the Bio-medical annotation service. A description of the annotation results is available in the Semantic Biomedical Tagger documentation.

Source Code

The source code for the GATE plugin for S4 is available from the S4 GitHub repository.

Next Steps

The S4 Annotator plugin for GATE provides an easy way for GATE developers and language engineers to incorporate the S4 text analytics services into GATE pipelines. If you haven't done so already, register and start using S4 right away!
