News Annotation


Introduction

The News Annotation service retrieves various types of entities from texts as well as the relations between them. The extraction is based on gazetteers from trusted sources (such as the curated Freebase, DBpedia, etc.) and a combination of rule-based and machine learning techniques. The service applies word sense disambiguation techniques and attaches a unique URI to each extracted entity or relation.

Recognized entity types

Person, Organization, Location, Event, Work

The service recognizes names of persons (e.g., "Michael Jordan", "Michael Jeffrey Jordan", "M. Jordan"), organisations (e.g., "Deutsche Bank", "DB"), locations (e.g., "New York", "NY") and events (e.g., "2018 FIFA World Cup qualification") in their various lexical representations.

Features

Name Description
inst The unique URI of the extracted entity (person, location, organisation), mapped to DBpedia where possible (i.e. a URI starting with "http://dbpedia.org/"). When no mapping to DBpedia is possible, a unique URI of the form "http://data.ontotext.com..." is generated.
class The entity type: http://dbpedia.org/ontology/Person for persons, http://dbpedia.org/ontology/Organization for organisations, http://dbpedia.org/ontology/Place for locations
string The literal detected in the text
preferredLabel The preferred label of the entity, selected from among all of its labels
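The entity features above can be held in a small client-side record. The sketch below is illustrative only: `EntityAnnotation`, `is_dbpedia_mapped` and `entity_type` are hypothetical names, the feature names and class URIs come from the table above, and the Michael Jordan values are made-up sample data rather than real service output.

```python
from dataclasses import dataclass

# Entity classes listed in the feature table, mapped to short type names.
ENTITY_CLASSES = {
    "http://dbpedia.org/ontology/Person": "Person",
    "http://dbpedia.org/ontology/Organization": "Organization",
    "http://dbpedia.org/ontology/Place": "Location",
}


@dataclass
class EntityAnnotation:
    """Hypothetical container for the inst, class, string and preferredLabel features."""
    inst: str
    cls: str  # the "class" feature; renamed because class is a Python keyword
    string: str
    preferred_label: str

    def is_dbpedia_mapped(self) -> bool:
        # Entities mapped to DBpedia have URIs starting with http://dbpedia.org/;
        # otherwise the service generates a URI under http://data.ontotext.com...
        return self.inst.startswith("http://dbpedia.org/")

    def entity_type(self) -> str:
        return ENTITY_CLASSES.get(self.cls, "Other")


jordan = EntityAnnotation(
    inst="http://dbpedia.org/resource/Michael_Jordan",
    cls="http://dbpedia.org/ontology/Person",
    string="M. Jordan",
    preferred_label="Michael Jordan",
)
print(jordan.entity_type(), jordan.is_dbpedia_mapped())  # Person True
```

Checking the URI prefix is a simple way to tell DBpedia-linked entities from those that only received a generated URI, for example when deciding whether further DBpedia lookups make sense.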

Relations

Relations of Organizations

RelationOrganizationOrganization

A generic relation between two organizations, e.g., "Osram are up more than 28 per cent since it spun out of Siemens".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationOrganization
string The literal detected in the text
organizationURI http://dbpedia.org/resource/Osram
organizationStr Osram
organization1URI http://dbpedia.org/resource/Siemens
organization1Str Siemens
triggerStr spun out of

RelationOrganizationAbbreviation

A relation between the name of an organization and its abbreviation, e.g., "European Food Safety Authority (EFSA)".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationAbbreviation
string The literal detected in the text
organizationURI http://dbpedia.org/resource/European_Food_Safety_Authority
organizationStr European Food Safety Authority
abbrevURI http://dbpedia.org/resource/EFSA
abbrevStr EFSA

RelationOrganizationAffiliatedWithOrganization

A hierarchical relation between two organizations where one is a subsidiary or sub-unit of the other, e.g., "Apple's iTunes Store", "European Parliament's Culture Committee".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationAffiliatedWithOrganization
string The literal detected in the text
organizationURI http://dbpedia.org/resource/European_Parliament
organizationStr European Parliament
subOrganizationUri http://dbpedia.org/resource/European_Parliament_Committee_on_Culture_and_Education
subOrganizationStr Culture Committee

RelationOrganizationCustomerOfOrganization

A relation between two organizations where one is a customer of the other, e.g., "Honda will provide engines to McLaren".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationCustomerOfOrganization
string The literal detected in the text
providerURI http://dbpedia.org/resource/Honda
providerStr Honda
customerURI http://dbpedia.org/resource/McLaren
customerStr McLaren
triggerStr will provide

RelationOrganizatoinCompetesWithOrganization

A symmetric relation between two organizations that compete with each other, e.g., "Apple's major media tablet rival, Google Inc."

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizatoinCompetesWithOrganization
string The literal detected in the text
organizationURI http://dbpedia.org/resource/Apple_Inc.
organizationStr Apple
organization1URI http://dbpedia.org/resource/Google
organization1Str Google Inc.
triggerStr rival

RelationOrganizationPartnership

A symmetric relation between two organizations that are partners/collaborators, e.g., "Bechtel, the engineering group which is a partner and investor in Planetary Resources"

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationPartnership
string The literal detected in the text
organizationURI http://dbpedia.org/resource/Bechtel
organizationStr Bechtel
organization1URI http://dbpedia.org/resource/Planetary_Resources
organization1Str Planetary Resources
triggerStr is a partner
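Because the competition and partnership relations are symmetric, the same pair of organizations may be reported in either order. The following is a minimal client-side deduplication sketch, not part of the service; it assumes each relation annotation is available as a dict keyed by the feature names above, and `relation_key` and `dedupe` are hypothetical helper names.

```python
PROTON = "http://www.ontotext.com/proton/protontop#"

# Relation classes described above as symmetric (local names as written in the class URIs).
SYMMETRIC = {
    "RelationOrganizatoinCompetesWithOrganization",
    "RelationOrganizationPartnership",
}


def relation_key(rel):
    """Hashable key for a relation with organizationURI/organization1URI features.

    For symmetric relations the two URIs are sorted, so "A rival of B" and
    "B rival of A" produce the same key; other relations keep their order.
    """
    local_name = rel["class"].rsplit("#", 1)[-1]
    pair = (rel["organizationURI"], rel["organization1URI"])
    if local_name in SYMMETRIC:
        pair = tuple(sorted(pair))
    return (local_name, *pair)


def dedupe(relations):
    """Keep the first occurrence of each relation, treating symmetric pairs as equal."""
    seen, unique = set(), []
    for rel in relations:
        key = relation_key(rel)
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


apple = "http://dbpedia.org/resource/Apple_Inc."
google = "http://dbpedia.org/resource/Google"
competes = PROTON + "RelationOrganizatoinCompetesWithOrganization"
rels = [
    {"class": competes, "organizationURI": apple, "organization1URI": google},
    {"class": competes, "organizationURI": google, "organization1URI": apple},
]
print(len(dedupe(rels)))  # 1
```

The generic RelationOrganizationOrganization is not described as symmetric, so the sketch leaves its argument order untouched.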

RelationAcquisition

A relation between two organizations where one has acquired or merged with the other, e.g., "Dentsu Inc. closed its nearly $5 billion deal to buy Aegis Group today", "Bridgepoint is nearing an agreement to buy Cambridge Education".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationAcquisition
string The literal detected in the text
acquirerOrgURI http://dbpedia.org/resource/Dentsu
acquirerOrgStr Dentsu Inc.
acquiredOrgURI http://dbpedia.org/resource/Aegis_Group
acquiredOrgStr Aegis Group
triggerStr to buy

RelationOrganizationLocation

A relation between an organization and its location, which may also be read as the location where the organization is active, e.g., "Ontotext, Sofia", "California-based Apple".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationLocation
string The literal detected in the text
organizationURI http://dbpedia.org/resource/Apple_Inc.
organizationStr Apple
locationURI http://dbpedia.org/resource/California
locationStr California

RelationOrganizationQuotation

A relation between an organization and a quotation by the same organization, e.g., ""If the president had not taken up the issue, we would not be in the position where we are now. This is not where the debate was in September," said the White House."

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationOrganizationQuotation
string The literal detected in the text
organizationURI http://dbpedia.org/resource/White_House
organizationStr White House
quotationStr "If the president had not taken up the issue, we would not be in the position where we are now. This is not where the debate was in September," said the White House.
triggerStr said
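Each organization relation above stores its two arguments under different feature names. The table-driven sketch below turns a relation (assumed to be a dict keyed by those feature names) into a simple (subject, relation, object) triple. The feature names are copied from the tables above; the triple shape, the choice of which argument is the subject, and the `to_triple` helper are hypothetical.

```python
PROTON = "http://www.ontotext.com/proton/protontop#"

# (subject feature, object feature) per relation class, from the feature tables above.
ORG_RELATION_ARGS = {
    "RelationOrganizationOrganization": ("organizationStr", "organization1Str"),
    "RelationOrganizationAbbreviation": ("organizationStr", "abbrevStr"),
    "RelationOrganizationAffiliatedWithOrganization": ("organizationStr", "subOrganizationStr"),
    "RelationOrganizationCustomerOfOrganization": ("providerStr", "customerStr"),
    "RelationOrganizatoinCompetesWithOrganization": ("organizationStr", "organization1Str"),
    "RelationOrganizationPartnership": ("organizationStr", "organization1Str"),
    "RelationAcquisition": ("acquirerOrgStr", "acquiredOrgStr"),
    "RelationOrganizationLocation": ("organizationStr", "locationStr"),
    "RelationOrganizationQuotation": ("organizationStr", "quotationStr"),
}


def to_triple(rel):
    """Turn an organization relation (dict of features) into (subject, relation, object)."""
    local_name = rel["class"].rsplit("#", 1)[-1]
    subj_key, obj_key = ORG_RELATION_ARGS[local_name]
    return rel[subj_key], local_name, rel[obj_key]


acquisition = {
    "class": PROTON + "RelationAcquisition",
    "acquirerOrgStr": "Dentsu Inc.",
    "acquiredOrgStr": "Aegis Group",
    "triggerStr": "to buy",
}
print(to_triple(acquisition))  # ('Dentsu Inc.', 'RelationAcquisition', 'Aegis Group')
```

Keeping the argument names in one lookup table means a new relation type only needs a new table entry rather than new parsing code.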

Relations of Persons

RelationPersonRole

A relation between a person and their role (in a company, in society, etc.), e.g., "President Obama", "Hugh Laurie is a musician".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationPersonRole
string The literal detected in the text
personURI http://dbpedia.org/resource/Hugh_Laurie
personStr Hugh Laurie
roleURI http://dbpedia.org/resource/Musician
roleStr musician
roleTypeURI  
roleTypeStr  

RelationPersonRoleWithinOrganization

A relation between a person, the organization the person is employed by or affiliated with, and the role of this person within the organization, e.g., "Bill Gates, founder of Microsoft", "World Bank economist Nicholas Stern".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationPersonRoleWithinOrganization
string The literal detected in the text
personURI http://dbpedia.org/resource/Nicholas_Stern,_Baron_Stern_of_Brentford
personStr Nicholas Stern
roleURI http://dbpedia.org/resource/Economist
roleStr economist
roleTypeURI  
roleTypeStr
organizationURI http://dbpedia.org/resource/World_Bank
organizationStr World Bank

RelationPersonRoleInLocation

A relation between a person, a location and the role of the person in this location, e.g., "South Africa's president Mbeki".

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationPersonRoleInLocation
string The literal detected in the text
personURI http://dbpedia.org/resource/Thabo_Mbeki
personStr Mbeki
roleURI http://dbpedia.org/resource/President
roleStr president
roleTypeURI  
roleTypeStr  
locationURI http://dbpedia.org/resource/South_Africa
locationStr South Africa

RelationPersonRoleWithinOrganizationInLocation

A relation between a person, the organization where the person is employed, the location where this organization is based, and the role of the person in the organization, e.g., "... argues John Micklethwait, editor-in-chief of the London-based The Economist."

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationPersonRoleWithinOrganizationInLocation
string The literal detected in the text
personURI http://dbpedia.org/resource/John_Micklethwait
personStr John Micklethwait
roleURI http://dbpedia.org/resource/Editor-in-chief
roleStr editor-in-chief
roleTypeURI  
roleTypeStr  
locationURI http://dbpedia.org/resource/London
locationStr London
organizationURI http://dbpedia.org/resource/The_Economist
organizationStr The Economist

RelationPersonQuotation

A relation between a person and a statement made by the same person, e.g., ""It is clear to me that we have made real progress in several areas and that we have a credible way forward," Mr. Obama said."

Features

Name Description
inst The unique URI of the extracted relation instance
class The relation type, e.g., http://www.ontotext.com/proton/protontop#RelationPersonQuotation
string The literal detected in the text
personURI http://dbpedia.org/resource/Barack_Obama
personStr Obama
quotationURI  
quotationStr "It is clear to me that we have made real progress in several areas and that we have a credible way forward,"
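The person relations above share the personURI and roleStr features and optionally add organizationStr and locationStr. The sketch below groups them into a per-person profile on the client side. It assumes relations arrive as dicts keyed by these feature names; `person_profiles` is a hypothetical helper and the sample data is taken from the examples above.

```python
from collections import defaultdict


def person_profiles(relations):
    """Group RelationPersonRole* annotations by personURI.

    Each profile entry is a (roleStr, organizationStr, locationStr) tuple;
    features that a relation type does not carry become None.
    """
    profiles = defaultdict(list)
    for rel in relations:
        if "roleStr" not in rel:  # e.g. RelationPersonQuotation carries no role
            continue
        profiles[rel["personURI"]].append(
            (rel["roleStr"], rel.get("organizationStr"), rel.get("locationStr"))
        )
    return dict(profiles)


relations = [
    {"personURI": "http://dbpedia.org/resource/Thabo_Mbeki",
     "roleStr": "president", "locationStr": "South Africa"},
    {"personURI": "http://dbpedia.org/resource/Nicholas_Stern,_Baron_Stern_of_Brentford",
     "roleStr": "economist", "organizationStr": "World Bank"},
    {"personURI": "http://dbpedia.org/resource/Barack_Obama",
     "quotationStr": "It is clear to me that we have made real progress"},
]
print(person_profiles(relations)["http://dbpedia.org/resource/Thabo_Mbeki"])
# [('president', None, 'South Africa')]
```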

Keyphrase (Topics)

These are words and phrases extracted from the document by a supervised algorithm trained on a corpus of generic English news. Keyphrases are picked according to two criteria: relevance, computed by a heuristic scoring formula that takes into account the term frequency (tf) and inverse document frequency (idf) of each word in a candidate phrase; and individual term weights learned from the examples used to train the algorithm.
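The exact scoring formula used by the service is not documented here. As a rough illustration of the idea only, the sketch below scores a candidate phrase as the sum of tf(w) * idf(w) * weight(w) over its words, where `weight` stands in for the learned per-term weights (default 1.0). The smoothed idf variant, the treatment of unseen words, and all function names are assumptions.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for every word in a tokenized corpus."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def score_phrases(doc, candidates, idf, weights=None):
    """Rank candidate phrases (tuples of words) found in a tokenized document.

    Illustrative score: sum of tf(w) * idf(w) * weight(w) over the phrase's words.
    """
    weights = weights or {}
    tf = Counter(doc)
    default_idf = max(idf.values(), default=1.0)  # unseen words treated as rare
    scores = {
        " ".join(phrase): sum(
            tf[w] * idf.get(w, default_idf) * weights.get(w, 1.0) for w in phrase
        )
        for phrase in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


corpus = [
    "ebola nurse diagnosed in dallas".split(),
    "bridal shop in ohio closing".split(),
    "stocks rise in new york".split(),
]
doc = "bridal shop linked to ebola survivor says bridal shop is closing".split()
ranked = score_phrases(doc, [("bridal", "shop"), ("ebola", "survivor"), ("closing",)], idf_table(corpus))
print([phrase for phrase, _ in ranked])  # ['bridal shop', 'ebola survivor', 'closing']
```

Words that appear in every training document (such as "in" above) get the lowest idf, so frequent but uninformative words contribute little to a phrase's score.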

Features

Name Description
inst Unique URI of the extracted topic
class Topic type, e.g. http://www.ontotext.com/proton/protontop#Topic
string The literal detected in the text

Sentence

This annotation is a result of automatic sentence splitting. Each annotation corresponds to a single sentence.

Features

Name Description
string The sentence text

REST API

The details on the REST API for the News Annotation service are available on the Text Analytics page.

Example

In our example we will use a very simple request for annotating just a couple of sentences of text from a Guardian news article:

Operators of a northeast Ohio bridal shop linked to an Ebola survivor say the store is closing because it lost significant business and has been stigmatized.

Dallas nurse Amber Vinson was diagnosed with Ebola days after visiting Coming Attractions Bridal & Formal store in Akron in October

For the sake of clarity, if you annotate the sample text above with the demo UI of S4 you will see a result like this:

The JSON request for the News service looks like this (please refer to the Text Analytics page for details on the JSON input/output formats):

RESTful Request (Plain Text Content)

We are now ready to send a RESTful request to the S4 text analytics services using a command-line tool such as curl:

Let's go through the sample code above step by step:

  1. We specify the API key and secret. All S4 requests need a valid API key and secret pair, which can be generated from the S4 Management Console.
  2. We specify the S4 RESTful service to be used, in this case the "News" analytics service. Note that the API key and secret are also provided as part of the endpoint URL.
  3. We choose a simple snippet of text (from a news article) to analyse.
  4. We construct the JSON request document, comprising the content and "text/plain" as its content type.
  5. We make a RESTful request to the S4 service via curl, providing the JSON request document (from step 4) and the S4 service endpoint (from step 2), and we specify in the HTTP header that the request content type is "application/json".
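For readers not using curl, the steps above can be sketched in Python with the standard library. This is a hedged illustration, not the official client: the endpoint URL is a placeholder, and the JSON key names `document` and `documentType` are assumptions to be checked against the Text Analytics page. If the key and secret are embedded in the URL as `key:secret@host`, curl sends them as HTTP Basic authentication; the sketch sends the same credentials as an explicit Basic `Authorization` header instead. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: use your own key/secret from the S4 Management
# Console and the News endpoint documented on the Text Analytics page.
API_KEY = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"
NEWS_ENDPOINT = "https://example.com/s4/news"  # hypothetical URL

SNIPPET = (
    "Dallas nurse Amber Vinson was diagnosed with Ebola days after visiting "
    "Coming Attractions Bridal & Formal store in Akron in October"
)


def build_news_request(text, key, secret, endpoint=NEWS_ENDPOINT):
    """Build (but do not send) a POST request mirroring steps 1-5 above."""
    # Step 4: the JSON request document, i.e. the content plus "text/plain"
    # as its content type. The key names are assumptions.
    payload = {"document": text, "documentType": "text/plain"}
    body = json.dumps(payload).encode("utf-8")
    # Steps 1-2: credentials sent as HTTP Basic authentication.
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # step 5: the body is JSON
            "Authorization": f"Basic {token}",
        },
    )


request = build_news_request(SNIPPET, API_KEY, API_SECRET)
# Sending it requires network access and valid credentials:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```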

RESTful Request (Office Formats Content)

The following example demonstrates processing Office documents (Word) as input for the S4 text analytics services. The result has the format described in the next section.

The API key, secret and service URL are configured in the same way as in the previous example. The request payload comprises two parts:

  1. the JSON_REQUEST body specifying the document type "application/msword"
  2. the MS Office Word document, contained in file MS_WORD_DOCUMENT

The RESTful request itself is performed via curl as a multipart message. The HTTP content type should not be set explicitly (curl sets it properly); however, the JSON part 'meta' should explicitly set its content type ("type=application/json").
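To show what curl assembles here, the sketch below builds an equivalent multipart/form-data body by hand. Only the 'meta' part name and its application/json type come from the text above; the name of the file part (`document`), the JSON key name, and the `build_office_multipart` helper are hypothetical. It also shows why the Content-Type header is best left to the tool: that header has to carry the same boundary string that separates the parts.

```python
import json
import uuid


def build_office_multipart(doc_bytes, filename, boundary=None):
    """Build a multipart/form-data body with a JSON 'meta' part and a Word file part."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    meta = json.dumps({"documentType": "application/msword"}).encode("utf-8")  # key name assumed
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="meta"',
        b"Content-Type: application/json",  # what curl's ";type=application/json" sets
        b"",
        meta,
        delimiter,
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/msword",
        b"",
        doc_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(lines)
    # The request's Content-Type must name the same boundary used in the body.
    return body, f"multipart/form-data; boundary={boundary}"


body, content_type = build_office_multipart(b"fake .doc bytes", "report.doc", boundary="XYZ")
print(content_type)  # multipart/form-data; boundary=XYZ
```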

JSON Result

The result of the service invocation is another JSON document (the structure is described on the Text Analytics page) which contains annotations and their offsets for various entities found in text:

  • Locations: "Dallas", "Akron", "Ohio"
  • Person: "Amber Vinson"
  • Relations: "Dallas nurse Amber Vinson" (RelationPersonCareer)
  • original text snippet + sentence splits

The full JSON response is available below.

Some important details:

  • the original text (rows 1-3) and the sentence splits (rows 6-19) are available
  • the offsets of the annotations in the original text are provided with the "indices" key
  • additional annotation information, such as type, id, ambiguity level and string, is available
  • the class and the instance that the annotation represents are also available
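Since every annotation carries its offsets under the "indices" key, a client can map annotations back to the original text. The sketch below assumes a layout in which `response["entities"]` maps an annotation type to a list of annotations, each with an `[start, end)` pair of character offsets into `response["text"]`; both the layout and the exclusive end offset are assumptions to verify against the Text Analytics page. The sample response is hand-built for the second sample sentence and is not real service output.

```python
def iter_annotations(response):
    """Yield (annotation type, covered text, start, end) for each annotation."""
    text = response["text"]
    for ann_type, annotations in response.get("entities", {}).items():
        for ann in annotations:
            start, end = ann["indices"]
            yield ann_type, text[start:end], start, end


TEXT = ("Dallas nurse Amber Vinson was diagnosed with Ebola days after visiting "
        "Coming Attractions Bridal & Formal store in Akron in October")


def span(s):
    """Offsets of the first occurrence of s in TEXT (helper for the hand-built sample)."""
    start = TEXT.index(s)
    return [start, start + len(s)]


sample_response = {
    "text": TEXT,
    "entities": {
        "Location": [{"string": "Dallas", "indices": span("Dallas")},
                     {"string": "Akron", "indices": span("Akron")}],
        "Person": [{"string": "Amber Vinson", "indices": span("Amber Vinson")}],
    },
}

for ann_type, covered, start, end in iter_annotations(sample_response):
    print(f"{ann_type:8} {start:>3}-{end:<3} {covered}")
```

Comparing each covered slice with the annotation's "string" feature is a cheap sanity check that offsets are being interpreted correctly.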