There are no parameters to the POST request; all the configuration information is provided in a JSON structure in the request body.
The request body is a JSON structure containing the input text document, either included directly in the request (document) or given as a reference to a remote URL (documentUrl). In either case, the documentType property should specify the format of the input data (plain text, HTML page, Twitter message, MS Word, etc.).
The following table provides the details on the attributes of the JSON request structure:
For plain text input, the document content can be included directly in the request. The documentType property should specify the format of the input data (plain text, HTML, Twitter message, etc.).
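As an illustration of the inline case, the following minimal Python sketch builds such a request body. The property names document and documentType come from the description above; the helper name build_text_request and the "text/plain" value are assumptions made for this example, so check the attribute table for the exact documentType values the service accepts.

```python
import json


def build_text_request(text, document_type="text/plain"):
    """Return the JSON request body for a document sent inline.

    "text/plain" is an assumed documentType value for plain text input.
    """
    payload = {
        "document": text,               # the document content itself
        "documentType": document_type,  # format of the input data
    }
    return json.dumps(payload)


body = build_text_request("Amazon is testing delivery drones.")
```

The resulting string is sent as the body of the POST request; nothing is passed as a request parameter.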
For processing of HTML pages, the document URL has to be specified in the request. The documentType property should specify the format of the input data ("text/html").
If the input for the text processing services of S4 is neither plain text nor an HTML document, then the structure of the service request is different. It consists of two sections:
The request is implemented as an HTTP multipart message with two attachments:
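A request that references an HTML page by URL could be prepared as in the sketch below. It only constructs the request object and does not send it. The endpoint address is a hypothetical placeholder, the Content-Type header on the request is an assumption, and authentication is omitted because it is not covered here.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real service endpoint.
SERVICE_ENDPOINT = "https://example.invalid/text-analytics"


def build_url_request(document_url, endpoint=SERVICE_ENDPOINT):
    """Prepare a POST request that points the service at a remote HTML page."""
    payload = {
        "documentUrl": document_url,  # reference to the remote document
        "documentType": "text/html",  # format stated for HTML pages
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed request header
        method="POST",
    )


request = build_url_request("http://example.com/article.html")
```

Passing the prepared object to urllib.request.urlopen would submit it. All configuration travels in the JSON body, so the endpoint URL carries no query string.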
The content type (Content-Type) header of each of the attachments should be: application/json for the metadata and application/octet-stream for the binary data.
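The two-part layout can be sketched by assembling the multipart body by hand, as below. The per-part Content-Type values are the ones stated above. The multipart/mixed subtype, the helper name build_multipart, and the absence of part names or Content-Disposition headers are assumptions of this sketch, so the real service may expect additional headers.

```python
import json
import uuid


def build_multipart(metadata, binary, boundary=None):
    """Return (content_type_header, body) for a two-part multipart message."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b"Content-Type: application/json",          # metadata part
        b"",
        json.dumps(metadata).encode("utf-8"),
        delimiter,
        b"Content-Type: application/octet-stream",  # binary data part
        b"",
        binary,
        delimiter + b"--",                          # closing delimiter
        b"",
    ]
    body = b"\r\n".join(lines)
    # "multipart/mixed" is an assumed subtype.
    return "multipart/mixed; boundary=" + boundary, body


# "application/msword" is an assumed documentType value for MS Word input.
content_type, body = build_multipart(
    {"documentType": "application/msword"}, b"\xd0\xcf\x11\xe0"
)
```

The first returned value goes into the Content-Type header of the HTTP request and the second is the request body.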
The simplest response format for the S4 text annotation services (news, bio-medical, Twitter) is application/json. For each annotated document, it consists of a JSON object with two properties:
If the original document is Twitter JSON (i.e. is sent with text/x-json-twitter MIME type), the output JSON will attempt to preserve the JSON structure of the original Tweet as much as possible. If the original Tweet contains "entities", the output annotations will be merged with the ones from the original JSON.
MATE format is a special kind of HTML containing annotation results from the S4 Text Analytics services. The purpose of this format is to preserve the original layout of the documents while adding the metadata in HTML elements that are not visually rendered.
A document in MATE format is:
Using this format requires HTML input documents.
The annotations representation is included in the header of the input HTML document as a JSON object containing all the annotations generated by the service. Each annotation is a collection of HTML elements referenced by IDs (the "data-custom-id" attribute). The IDs are unique within the document and are generated at processing time.
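A client could read a MATE document along the lines of the sketch below, which uses only the Python standard library. It assumes the annotations are the first script element whose content parses as JSON, since no script type or id is specified here. The names MateParser and read_mate are hypothetical.

```python
import json
from html.parser import HTMLParser


class MateParser(HTMLParser):
    """Collect script contents and the data-custom-id values of elements."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self._chunks = []
        self.scripts = []
        self.element_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._chunks = []
        custom_id = dict(attrs).get("data-custom-id")
        if custom_id is not None:
            self.element_ids.append(custom_id)

    def handle_data(self, data):
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            self.scripts.append("".join(self._chunks))


def read_mate(html_text):
    """Return (annotations, element_ids) extracted from a MATE document."""
    parser = MateParser()
    parser.feed(html_text)
    parser.close()
    annotations = None
    for source in parser.scripts:
        try:
            annotations = json.loads(source)  # assumed: first JSON-valued script
            break
        except ValueError:
            continue
    return annotations, parser.element_ids
```

The returned list of IDs can then be matched against the IDs referenced inside the parsed annotations object to locate the annotated elements in the page.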
Each annotation (JSON) object has the following structure:
JSON representation of a single annotation:
The contents in the features element contain the annotations for the entities identified in the text as key-value pairs. For semantic annotations the relevant annotation properties can be:
Refer to the example in the following section for complete annotation representations.
The following example is based on a BBC news article introducing the new Amazon delivery drone. For brevity, the result is cut down to a few annotations on a fragment of the original document. The annotations are generated by the News Annotation service.
The proper request to the service should provide a reference to the original document:
The resulting HTML document contains a JSON object, enclosed in <script> tags, that describes the recognized entities, such as persons and organizations. The annotation boundaries are marked by span elements in the document content.
The response format for the S4 news classification service is application/json. The result is a JSON object providing document classification information as well as a ranked list of the top 3 category candidates. The latter provides the confidence level for the selected category relative to the other best category candidates.
You may refer to the News analytics example which has a detailed explanation of the input request format as well as the JSON response for a sample document.
The Swagger description of the Text Analytics REST API is available at http://swagger.s4.ontotext.com/