Healthcare Tagger



The Healthcare application focuses on extracting biomedical knowledge from clinical notes, discharge summaries, and many other types of medical records.


Several approaches are used to identify entities in the Healthcare application:

  • Named entity recognition - The application uses the SNOMED CT ontology, enriched with a list of synonyms available in UMLS. The synonyms ensure high recall, while a sophisticated disambiguation approach maintains the application's precision. See below for the full list of semantic types. In addition to conventional exact matching of names, a relaxed matching mechanism is used. It allows the discovery of entities whose words are not adjacent in the text, which is typical of natural speech. Drug names are also discovered using the RxNorm vocabulary.
  • Negations - Negation constructs are frequent in all kinds of medical documents, and detecting them correctly is important for clarifying the context of named entities.
  • Relations - Drug prescriptions are an integral part of almost every medical document and may contain very diverse information about the medicine, such as its name, quantity and measurement unit, frequency and period of intake, administration route, and population group. Discovering such comprehensive information in a medical document is difficult but, as we found, possible. Relations of this kind are designated as Medication.
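
To make the idea of relaxed matching more concrete, here is a minimal Python sketch. It is only an illustration of matching a term whose words are not adjacent in the text, not the tagger's actual algorithm. The function names, the `max_gap` parameter, and the example term "humeral neck fracture" are all hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relaxed_match(term, sentence, max_gap=3):
    """Find the words of `term` in `sentence`, in order, allowing gaps.

    Up to `max_gap` unmatched tokens may sit between two consecutive
    term words. Returns the list of matched token positions, or None
    when the term is not found. The search is greedy (no backtracking),
    which is enough for an illustration.
    """
    term_tokens = tokenize(term)
    sent_tokens = tokenize(sentence)
    if not term_tokens:
        return None
    for start, token in enumerate(sent_tokens):
        if token != term_tokens[0]:
            continue
        positions = [start]
        for wanted in term_tokens[1:]:
            window_start = positions[-1] + 1
            window = sent_tokens[window_start:window_start + max_gap + 1]
            if wanted not in window:
                break
            positions.append(window_start + window.index(wanted))
        else:
            # Every term word was found within the allowed gaps.
            return positions
    return None


sentence = ("The left shoulder did show old healed left humeral head "
            "and neck fracture with baseline anterior dislocation.")
# Hypothetical gazetteer entry whose words are split by "head and".
print(relaxed_match("humeral neck fracture", sentence))  # [8, 11, 12]
```

With `max_gap=0` the same function degrades to exact matching of adjacent words, which shows how the relaxed mode widens recall.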

Data sets

The following data sets were used to populate the gazetteers:

  • RxNorm
  • SNOMED CT
  • UMLS

Semantic types

The following semantic types are used in named entity recognition and relation extraction:

  • Assessment_scale
  • Body structure
  • Cell
  • Cell structure
  • Core metadata concept
  • Disorder
  • Environment
  • Environment / location
  • Ethnic group
  • Event
  • Finding
  • Foundation metadata concept
  • Geographic location
  • Inactive concept
  • Life style
  • Morphologic abnormality
  • Navigational concept
  • Observable entity
  • Occupation
  • Organism
  • Person
  • Physical force
  • Physical object
  • Procedure
  • Product
  • Qualifier value
  • Racial group
  • Record artifact
  • Regime/therapy
  • Religion/philosophy
  • Situation
  • Social concept
  • Special concept
  • Specimen
  • Staging scale
  • Substance
  • Tumor staging


In our example we will use a very simple request for annotating just a couple of sentences:

All x-rays including left foot, right knee, left shoulder and cervical spine showed no acute fractures. The left shoulder did show old healed left humeral head and neck fracture with baseline anterior dislocation. CT of the brain showed no acute changes, left periorbital soft tissue swelling. CT of the maxillofacial area showed no facial bone fracture. Echocardiogram showed normal left ventricular function, ejection fraction estimated greater than 65%.

For the sake of clarity, if you annotate the sample text above with the demo UI of S4 you will see a result like this:

The JSON request for the Healthcare Tagger service looks like the following (please refer to the Text Analytics page for details on the JSON input/output formats):

RESTful Request (Plain Text Content)

We are now ready to send a simple RESTful request to the S4 text analytics services using a simple command line tool like curl:

Let's go step by step through the sample code above:

  1. we specify the API key and secret - all S4 requests need a valid API key and secret pair, which can be generated from the S4 Management Console
  2. we specify the S4 RESTful service to be used - in this case the "Healthcare Tagger" text analytics service. Note that as part of the endpoint URL we also provide the API key and secret
  3. we choose a simple snippet of text to analyse
  4. we construct the proper JSON request document - comprising the content plus "text/plain" as the content type
  5. we make a RESTful request to the S4 service via curl, providing the JSON request document (from step 4) and the S4 service endpoint (from step 2), and we specify in the HTTP header that the content type of the request is "application/json"
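
The same five steps can also be sketched in Python. This is a sketch under stated assumptions rather than the official client code. The endpoint URL is a placeholder, and the JSON field names `document` and `documentType` are assumptions that should be checked against the Text Analytics page. The credentials are sent as an HTTP Basic `Authorization` header, which is what curl does when a key and secret are embedded in the URL. The request is built but deliberately not sent.

```python
import base64
import json
import urllib.request

# Step 1: placeholders for the key pair from the S4 Management Console.
API_KEY = "<your-api-key>"
API_SECRET = "<your-api-secret>"

# Step 2: hypothetical endpoint; use the real service URL from the S4 docs.
SERVICE_URL = "https://s4.example.invalid/healthcare-tagger"

# Step 3: a simple snippet of text to analyse.
TEXT = "CT of the maxillofacial area showed no facial bone fracture."


def build_request(text, url=SERVICE_URL, key=API_KEY, secret=API_SECRET):
    """Build (but do not send) the POST request for plain-text content."""
    # Step 4: the JSON request document. The field names are assumed.
    payload = {"document": text, "documentType": "text/plain"}
    body = json.dumps(payload).encode("utf-8")

    # Basic auth header, equivalent to key:secret@host in a curl URL.
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")

    # Step 5: a POST whose content type is application/json.
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_request(TEXT)
# To actually call the service (requires valid credentials and URL):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read().decode("utf-8"))
```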

RESTful Request (Office Formats Content)

The following example demonstrates the processing of Office documents (Word) as input for the S4 text analytics services. The result is in the format described in the next section.

The API key, secret, and service URL are configured in the same way as in the previous example. The request payload consists of two parts:

  1. the JSON_REQUEST body specifying the document type "application/msword"
  2. the MS Office Word document, contained in file MS_WORD_DOCUMENT

The RESTful request itself is performed via curl as a multipart message. The content type of the HTTP request should not be set explicitly (curl configures it properly); however, the JSON part 'meta' should explicitly set its content type ("type=application/json").
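
To show what such a two-part message looks like on the wire, the sketch below assembles a multipart/form-data body by hand in Python. It is an illustration only. The part name `meta` comes from the description above, while the part name `data`, the `documentType` field, and the file name are assumptions. The document bytes are a stand-in, not a real Word file.

```python
import json
import uuid


def build_multipart(meta, file_bytes, file_name, file_type):
    """Return (content_type_header, body) for a two-part multipart message."""
    boundary = uuid.uuid4().hex
    parts = [
        # Part 1: the JSON request document with an explicit content type.
        (
            'Content-Disposition: form-data; name="meta"\r\n'
            "Content-Type: application/json\r\n\r\n"
        ).encode("utf-8") + json.dumps(meta).encode("utf-8"),
        # Part 2: the raw document bytes (part name "data" is assumed).
        (
            'Content-Disposition: form-data; name="data"; '
            f'filename="{file_name}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode("utf-8") + file_bytes,
    ]
    delimiter = f"--{boundary}\r\n".encode("ascii")
    body = b"".join(delimiter + part + b"\r\n" for part in parts)
    body += f"--{boundary}--\r\n".encode("ascii")
    # The overall header carries the boundary, which is why it should not
    # be set by hand when a tool such as curl generates it.
    return f"multipart/form-data; boundary={boundary}", body


meta = {"documentType": "application/msword"}  # assumed field name
placeholder_doc = b"placeholder bytes standing in for a Word document"
content_type, body = build_multipart(
    meta, placeholder_doc, "sample.doc", "application/msword"
)
```

The returned `content_type` value would go into the request's Content-Type header and `body` into the POST data.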

JSON Result

The result of the service invocation is another JSON document (its structure is described on the Text Analytics page), which contains the annotations and their offsets for the various entities found in the text.
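
As a rough illustration of how annotations with offsets can be consumed, the Python sketch below walks over a hand-built stand-in for a response. The stand-in is not actual service output. Its layout (an `entities` object keyed by semantic type, each annotation carrying an `indices` pair of start and end offsets) and the semantic types assigned to the two phrases are assumptions made for the example. The real structure is the one described on the Text Analytics page.

```python
import json

TEXT = "CT of the maxillofacial area showed no facial bone fracture."


def span(text, phrase):
    """Helper for the illustration: [start, end) offsets of phrase in text."""
    start = text.index(phrase)
    return [start, start + len(phrase)]


# Hand-built stand-in for a response; NOT real output of the service.
sample_response = json.dumps({
    "text": TEXT,
    "entities": {
        "Body structure": [{"indices": span(TEXT, "maxillofacial area")}],
        "Disorder": [{"indices": span(TEXT, "facial bone fracture")}],
    },
})


def list_annotations(response_json):
    """Return (semantic_type, start, end, covered_text) sorted by offset."""
    response = json.loads(response_json)
    text = response["text"]
    rows = []
    for semantic_type, annotations in response["entities"].items():
        for annotation in annotations:
            start, end = annotation["indices"]
            # The offsets index into the original text.
            rows.append((semantic_type, start, end, text[start:end]))
    return sorted(rows, key=lambda row: row[1])


for row in list_annotations(sample_response):
    print(row)
```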


The full JSON response is available below.

Version History

Version 1

  • Update the RxNorm, SNOMED CT, and UMLS gazetteers
  • Modify the gazetteer population mechanism to disambiguate identical literals that belong to different concepts
  • Extend abbreviation detection to all semantic types
  • Flag annotations whose start and end positions coincide with those of a noun phrase