Semantic Biomedical Tagger

Introduction

The Semantic Biomedical Tagger (SBT) has a built-in capability to recognize 133 biomedical entity types and semantically link them to a knowledge base, in this case LinkedLifeData (LLD). The SBT can load entity names from the LLD service or from any other RDF database with a SPARQL endpoint. The current version is preloaded with the latest release of the LLD dataset. All URIs used by SBT are resolvable and can be opened in a web browser or through a machine-accessible API.

REST API

The details on the REST API for the Semantic Biomedical Tagger service are available on the Text Analytics page.

SBT annotations

The SBT creates semantic annotations that have names (Annotation type) and features: class (URI), instance (URI), and string (instance label). Both URIs can be further explored in the LLD service.

Example:

Annotation type   Annotation feature   Possible values
Gene_or_Genome    class                http://linkedlifedata.com/resource/semanticnetwork/id/T028
                  instance             http://linkedlifedata.com/resource/umls/concept/C1705323
                  string               COX-2
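
As an illustration, the annotation above can be modelled as a small record. This is only a sketch: the SbtAnnotation class and its field names are hypothetical and not part of the SBT API; only the values come from the example table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbtAnnotation:
    """Hypothetical container mirroring an annotation type and its three features."""
    annotation_type: str  # annotation name, e.g. "Gene_or_Genome"
    class_uri: str        # "class" feature: URI of the semantic class
    instance_uri: str     # "instance" feature: URI of the concrete concept
    string: str           # "string" feature: the instance label


cox2 = SbtAnnotation(
    annotation_type="Gene_or_Genome",
    class_uri="http://linkedlifedata.com/resource/semanticnetwork/id/T028",
    instance_uri="http://linkedlifedata.com/resource/umls/concept/C1705323",
    string="COX-2",
)

# The last path segment of each URI is a compact identifier.
class_id = cox2.class_uri.rsplit("/", 1)[-1]
concept_id = cox2.instance_uri.rsplit("/", 1)[-1]
print(class_id, concept_id)
```

Both URIs stay intact in the record, so they can still be opened in the LLD service for further exploration.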


Supported entity types

The following table lists the annotation types that SBT supports, together with the number of labels and the number of instances for each type.

Annotation type Number of labels Number of instances
Organism 326 93
Plant 352611 163305
Fungus 181514 111541
Virus 30481 15170
Bacterium 424705 298686
Animal 372 102
Vertebrate 94 23
Amphibian 21030 7427
Bird 55788 11635
Fish 71928 29789
Reptile 21170 8293
Mammal 28458 8764
Human 173 75
Anatomical Structure 330 127
Embryonic Structure 2556 818
Congenital Abnormality 32250 3067
Acquired Abnormality 6505 901
Fully Formed Anatomical Structure 31 7
Body System 1580 446
"Body Part, Organ, or Organ Component" 183374 69095
Tissue 9679 4808
Cell 14029 4984
Cell Component 17925 6864
Gene or Genome 300584 68088
Body Location or Region 33013 16595
Body Space or Junction 21535 7374
Body Substance 2771 1040
Organism Attribute 2072 900
Finding 219382 75719
Laboratory or Test Result 8008 2785
Injury or Poisoning 64544 9940
Biologic Function 4106 1558
Physiologic Function 6092 2053
Organism Function 11677 4425
Mental Process 3459 951
Organ or Tissue Function 10796 3796
Cell Function 44535 15953
Molecular Function 94641 27269
Genetic Function 12479 4273
Pathologic Function 27801 3943
Disease or Syndrome 239089 30862
Mental or Behavioral Dysfunction 19962 2461
Cell or Molecular Dysfunction 7810 1656
Experimental Model of Disease 457 130
Event 328 127
Activity 1084 364
Behavior 154 39
Social Behavior 2746 859
Individual Behavior 2649 740
Daily or Recreational Activity 1548 399
Occupational Activity 1947 631
Health Care Activity 12610 3819
Laboratory Procedure 35481 10652
Diagnostic Procedure 38191 14298
Therapeutic or Preventive Procedure 230553 85251
Research Activity 4216 1286
Molecular Biology Research Technique 1460 297
Governmental or Regulatory Activity 1567 529
Educational Activity 1475 416
Machine Activity 503 131
Phenomenon or Process 3473 1249
Human-caused Phenomenon or Process 1540 458
Environmental Effect of Humans 265 63
Natural Phenomenon or Process 2747 734
Entity 34 18
Physical Object 126 34
Manufactured Object 11873 3844
Medical Device 84454 36027
Research Device 460 114
Conceptual Entity 1185 671
Idea or Concept 6233 3394
Temporal Concept 6766 3143
Qualitative Concept 9555 3794
Quantitative Concept 19620 6512
Spatial Concept 5420 2811
Geographic Area 24862 4046
Molecular Sequence 46 12
Nucleotide Sequence 885 248
Amino Acid Sequence 580 162
Carbohydrate Sequence 5 1
Regulation or Law 1403 566
Occupation or Discipline 1431 579
Biomedical Occupation or Discipline 3112 1011
Organization 968 380
Health Care Related Organization 4917 1632
Professional Society 143 43
Self-help or Relief Organization 146 50
Group 112 54
Professional or Occupational Group 6735 2200
Population Group 5739 2261
Family Group 886 256
Age Group 423 110
Patient or Disabled Group 831 198
Group Attribute 332 127
Chemical 80 23
Chemical Viewed Structurally 947 325
Organic Chemical 449534 214216
"Nucleic Acid, Nucleoside, or Nucleotide" 34870 11599
"Amino Acid, Peptide, or Protein" 640799 130253
Chemical Viewed Functionally 571 205
Pharmacologic Substance 325453 137201
Biomedical or Dental Material 10906 4285
Biologically Active Substance 338462 67037
Hormone 11880 3007
Enzyme 143539 28043
Vitamin 5367 2426
Immunologic Factor 87211 24478
"Indicator, Reagent, or Diagnostic Aid" 41285 14840
Hazardous or Poisonous Substance 17498 5811
Substance 17011 5838
Food 6874 2812
Functional Concept 7942 2965
Intellectual Product 46867 18135
Language 1244 486
Sign or Symptom 18602 2709
Classification 1747 815
Anatomical Abnormality 11927 3811
Neoplastic Process 96457 13298
Receptor 35981 5064
Archaeon 6988 4681
Antibiotic 12710 5259
"Element, Ion, or Isotope" 4109 1290
Inorganic Chemical 13221 5513
Clinical Drug 408499 111455
Clinical Attribute 436319 80048
Drug Delivery Device 7310 1418
Eukaryote 560679 359656

Example

In this example we use a very simple request that annotates just a couple of sentences from the following biomedical article:

Type 1 diabetes results from autoimmune destruction of pancreatic β cells. Findings from preclinical studies suggest that dipeptidyl peptidase-4 inhibitors and proton-pump inhibitors might enhance β-cell survival and regeneration. We postulated that sitagliptin and lansoprazole would preserve β-cell function in patients with recent-onset type 1 diabetes

You can also annotate the sample text above with the S4 demo UI to see the resulting annotations.

The Semantic Biomedical Tagger (SBT) service accepts a JSON request; please refer to the Text Analytics page for details on the JSON input/output formats.
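
A minimal Python sketch of such a request document is given here. The field names "document" and "documentType" are assumptions made for illustration and should be checked against the Text Analytics page; this page only states that the request carries the content together with "text/plain" as its content type.

```python
import json

SAMPLE_TEXT = (
    "Type 1 diabetes results from autoimmune destruction of pancreatic β cells. "
    "Findings from preclinical studies suggest that dipeptidyl peptidase-4 "
    "inhibitors and proton-pump inhibitors might enhance β-cell survival and "
    "regeneration. We postulated that sitagliptin and lansoprazole would preserve "
    "β-cell function in patients with recent-onset type 1 diabetes"
)


def build_request(text: str, document_type: str = "text/plain") -> str:
    """Serialize the content and its type as a JSON request document."""
    # Assumed field names; verify them against the Text Analytics page.
    payload = {"document": text, "documentType": document_type}
    # ensure_ascii=False keeps characters such as "β" readable in the output.
    return json.dumps(payload, ensure_ascii=False)


json_request = build_request(SAMPLE_TEXT)
print(json_request)
```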

RESTful Request (Plain Text Content)

We are now ready to send a simple RESTful request to the S4 text analytics services using a command-line tool such as curl.

Let's go step by step through what the request script does:

  1. We specify the API key and secret. All S4 requests need a valid API key and secret pair, which can be generated from the S4 Management Console.
  2. We specify the S4 RESTful service to be used, in this case the "Semantic Biomedical Tagger" text analytics service. Note that the API key and secret are provided as part of the endpoint URL.
  3. We choose a simple snippet of text to analyse (from a biomedical article).
  4. We construct the JSON request document, which consists of the content plus "text/plain" as the content type.
  5. We make a RESTful request to the S4 service via curl, providing the JSON request document (from step 4) and the S4 service endpoint (from step 2), and we specify in the HTTP header that the content type of the request is "application/json".
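
The same five steps can be sketched in Python instead of curl. Everything specific here is an assumption for illustration: the credentials are placeholders, the endpoint URL is a made-up stand-in for the real service address, and the JSON field names are not confirmed by this page. When credentials are embedded in a URL, curl sends them as HTTP Basic authentication, so the sketch sets that header explicitly. The request is built but not sent, so the code runs without network access.

```python
import base64
import json
import urllib.request

# Step 1: API key and secret (placeholders; real ones come from the S4 Management Console).
API_KEY = "my-api-key"
API_SECRET = "my-api-secret"

# Step 2: service endpoint (hypothetical placeholder URL, not the real address).
ENDPOINT = "https://s4.example.org/v1/sbt"

# Step 3: the text to analyse.
TEXT = "We postulated that sitagliptin and lansoprazole would preserve β-cell function."


def build_sbt_request(text, key, secret, endpoint):
    """Build (but do not send) a POST request carrying the JSON document."""
    # Step 4: JSON request document; the field names are assumptions.
    body = json.dumps({"document": text, "documentType": "text/plain"}).encode("utf-8")
    # Equivalent of putting key:secret into the URL with curl: HTTP Basic auth.
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    # Step 5: declare the payload as application/json.
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_sbt_request(TEXT, API_KEY, API_SECRET, ENDPOINT)
print(request.get_method(), request.full_url)

# With real credentials and the real endpoint, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read().decode("utf-8"))
```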

RESTful Request (Office Formats Content)

The following example demonstrates the processing of Office documents (Word) as input for the S4 text analytics services. The result is in the format described in the next section.

The API key, secret and service URL are configured in the same way as in the previous example. The request payload consists of two parts:

  1. the JSON_REQUEST body specifying the document type "application/msword"
  2. the MS Office Word document, contained in file MS_WORD_DOCUMENT

The RESTful request itself is performed via curl as a multipart message. The content type of the HTTP request should not be set explicitly (curl configures it properly); however, the JSON part 'meta' must explicitly set its own content type ("type=application/json").
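
To make the structure of the multipart message concrete, the following Python sketch assembles one by hand. Only the part name 'meta' and the two content types come from this page; the part name "data" for the file, the "documentType" field name, the file name and the placeholder bytes are assumptions for illustration.

```python
import json
import uuid


def build_multipart(meta: dict, file_bytes: bytes, filename: str):
    """Return (body, content_type) for a two-part multipart/form-data message."""
    boundary = uuid.uuid4().hex
    # Part 1: the JSON request, with its content type set explicitly.
    meta_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="meta"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(meta)}\r\n"
    ).encode("utf-8")
    # Part 2: the Word document; the part name "data" is an assumption.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
        "Content-Type: application/msword\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = meta_part + file_header + file_bytes + closing
    # The overall content type carries the boundary, which is why curl sets it itself.
    return body, f"multipart/form-data; boundary={boundary}"


JSON_REQUEST = {"documentType": "application/msword"}  # assumed field name
MS_WORD_DOCUMENT = b"placeholder bytes standing in for a real .doc file"

body, content_type = build_multipart(JSON_REQUEST, MS_WORD_DOCUMENT, "sample.doc")
print(content_type)
```

The boundary is generated per request, which is the reason the overall content type is better left to the HTTP client than written by hand.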

JSON Result

The result of the service invocation is another JSON document (its structure is described on the Text Analytics page) which contains the annotations and their offsets for the various entities found in the text:

  • Pharmacologic Substance: "sitagliptin", "lansoprazole"
  • disease: "Type 1 diabetes"
  • cells: "pancreatic β cells"
  • molecular function: "dipeptidyl peptidase-4 inhibitors"
  • activities, laboratory procedures, etc
  • original text snippet + sentence splits

The full JSON response is available below.

Some important details:

  • the original text (rows 2-5) and the sentence splits (rows 166-179) are available
  • the offsets of the annotations in the original text are provided with the "indices" key
  • additional annotation information such as type, id, string, etc is available
  • the class and the instance that the annotation represents are also available
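
A short Python sketch of how such a result could be consumed is given here. The response is a hypothetical, heavily shortened stand-in: the top-level keys "text" and "entities", the grouping of annotations by type, and the reading of "indices" as a [start, end) pair are assumptions; only the "indices", "type" and "string" keys are named on this page. Real annotations also carry the class and instance URIs, which are omitted here.

```python
# Hypothetical, shortened response used only to illustrate offset handling.
response = {
    "text": "We postulated that sitagliptin and lansoprazole would preserve β-cell function",
    "entities": {
        "Pharmacologic_Substance": [
            {"indices": [19, 30], "type": "Pharmacologic_Substance", "string": "sitagliptin"},
            {"indices": [35, 47], "type": "Pharmacologic_Substance", "string": "lansoprazole"},
        ]
    },
}


def iter_annotations(result):
    """Yield (type, start, end, surface text) for every annotation in the result."""
    for entity_type, annotations in result["entities"].items():
        for annotation in annotations:
            start, end = annotation["indices"]
            # Recover the annotated span from the original text via its offsets.
            yield entity_type, start, end, result["text"][start:end]


# Sort by start offset so annotations appear in reading order.
found = sorted(iter_annotations(response), key=lambda item: item[1])
for entity_type, start, end, surface in found:
    print(f"{entity_type} [{start}:{end}] {surface}")
```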