Fully Managed Database


Introduction

The fully managed version of GraphDB in the Cloud provides an enterprise-grade RDF graph database as a service (DBaaS). Developers no longer need to deal with DBA tasks such as installation and upgrades, provisioning and deployment, backups and restores, or ensuring database availability - the DBaaS provides a highly available database in the Cloud, accessible 24/7 from anywhere. It is available with pay-per-use pricing, where the user pays only for the actual utilisation of the database (number of triples stored and number of queries executed).

The RDF graph database-as-a-service is the perfect solution for scenarios with small or medium database size and query load, where investing in software licenses and provisioning and maintaining an on-premise 24/7 server is not cost-effective.

Pricing Details

The RDF graph database-as-a-service is available in various configurations:

database type | max triples | max repositories | price (per month)
Micro         | 1 million   | 2                | FREE
XS            | 10 million  | 4                | FREE
S             | 50 million  | 4                | $49
M             | 250 million | 8                | $175
L             | 1 billion   | 8                | $450

Setup

Creating a Database

Fully managed databases are created via the S4 Management Console.

Creating Repositories

After the empty database is created, one or more repositories can be created within it. The repository creation can be performed via the S4 Management Console, cURL, or the Java / OpenRDF SDK, as shown in the subsections below.

The following JSON snippet shows a repository configuration, which needs to be sent to the DBaaS REST API to create the repository:
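
{
   "repositoryID" : "demo-repository1",
   "label" : "Demo repository number 1",
   "ruleset" : "rdfs-optimized",
   "base-URL" : "http://example.org/graphdb#",
   "enablePredicateList" : "true"
}

(This mirrors the repository creation example given in the REST API section below; the values shown are placeholders.)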

The various configuration parameters of the JSON configuration file are:

  • repositoryID (required) - Unique identifier for the repository within the database. The identifier appears in the service access URL and may contain lowercase letters, digits, '-' and '_'.
  • label (optional) - Human-readable description of the repository (free text).
  • ruleset (required) - Inference rule set specification. One of: empty, rdfs-optimized, rdfs, owl-horst-optimized, owl-horst, owl-max-optimized, owl-max, owl2-ql-optimized, owl2-ql, owl2-rl-optimized, owl2-rl.
  • base-URL (optional) - Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository.
  • enablePredicateList (optional) - Enables or disables mappings from an entity (subject or object) to its predicates; switching this on can drastically speed up queries that use wildcard predicate patterns. Default: false.
  • proportions (optional) - Determines the distribution of resources (memory cache) between the repositories within a database. This specification is necessary when more than one repository is hosted in the database AND the distribution of resources is not even. If omitted, all repositories (including any that preexist) are automatically set to equal shares.

S4 Management Console


cURL (repository creation)

A new repository can be created via a simple RESTful request providing the repository configuration in JSON format. Several request parameters must be provided, such as the user's API key/secret pair, the unique user number, and the database name:

  • API_KEY & KEY_SECRET are the API key/secret pair generated via the S4 Management Console
  • USER is a unique number for the user - all database endpoints for the user include it in the endpoint URL, and it is displayed on the S4 Management Console
  • DATABASE is the database name that the user has specified during the database creation process

The database service endpoint URL can be constructed with the proper authentication credentials as follows:

API_KEY=…
KEY_SECRET=…
USER=…
DATABASE=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

For the cURL example, the JSON repository configuration can be prepared like this:

CONFIG="{
   \"repositoryID\" : \"test01\",
   \"label\" : \"Description of my repository\",
   \"ruleset\" : \"owl-horst-optimized\",
   \"base-URL\" : \"http://example.org/graphdb#\"
}"

Here is the complete cURL example, which creates a new repository in the database. Note that the content type of the request body needs to be application/json:

CONFIG="{
   \"repositoryID\" : \"test01\",
   \"label\" : \"Description of my repository\",
   \"ruleset\" : \"owl-horst-optimized\",
   \"base-URL\" : \"http://example.org/graphdb#\"
}"

API_KEY=...
KEY_SECRET=...
USER=...
DATABASE=...
REPOSITORY_ID=...
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

curl -X PUT -H "Content-Type:application/json" -d "$CONFIG" $SERVICE_ENDPOINT/repositories/$REPOSITORY_ID

Java / OpenRDF SDK (repository creation)

The sample code is available from the S4 GitHub repository.

Step 1: Create an API key
Step 2: Specify your details:

String userId="<userId>";
String dbaasName="<database name>";
String repositoryId="<repository id>";
String ApiKey = "<api key>";
String ApiPass = "<api pass>";

Step 3: Set up the HTTP client context:

HttpClientContext ctx = HttpClientContext.create();
UsernamePasswordCredentials creds = new UsernamePasswordCredentials(ApiKey, ApiPass);
CredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(AuthScope.ANY, creds);
ctx.setCredentialsProvider(credsProvider);

Step 4: Prepare repository configuration

CreateRepositoryRequest createRepo=new CreateRepositoryRequest();
createRepo.setRepoId(repositoryId);
createRepo.setRuleset("owl-horst-optimized");

Step 5: Prepare the PUT request headers:

HttpPut put = new HttpPut("https://rdf.s4.ontotext.com/"+userId+"/"+dbaasName+"/repositories/"+repositoryId+"/");
put.setHeader("Content-Type", "application/json");
put.setHeader("Accept", "application/json");

Step 6: Add the configuration to the PUT request:

ObjectMapper mapper=new ObjectMapper();
String message=mapper.writeValueAsString(createRepo);
put.setEntity(new StringEntity(message, Charset.forName("UTF-8")));

Step 7: Execute the request:

// create the HTTP client (it was not instantiated in the earlier steps) and execute the request
CloseableHttpClient httpClient = HttpClients.createDefault();
CloseableHttpResponse response = httpClient.execute(put, ctx);

Step 8: Print the results:

InputStream content = response.getEntity().getContent();
StringWriter sw = new StringWriter();
IOUtils.copy(content, sw, "UTF-8");
String result=sw.toString();
System.out.println(result);

Repository management

To provide maximum flexibility of capacity provisioning for the repositories, S4 provides a mechanism for distributing the total memory used for caching by the database among the different repositories in the database. The cache memory configuration is also specified as a simple JSON structure:

{
   "proportions":[
      {
          "repositoryID" : "demo-repository1",
          "percentage" : 45
      },
      {
          "repositoryID" : "demo-repository2",
          "percentage" : 55
      },
      ...
    ]
}

The cache memory configuration MUST list all repositories, and the sum of the percentage distribution MUST be equal to 100; otherwise an error will be returned.

The cache memory configuration is optional - if the user does not specify it, then when a new repository is created the system automatically distributes the available cache memory equally among all repositories in the database, e.g. 100% for a single repository, 50/50 for 2 repositories, 33/33/33 for 3 repositories, and so on.
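
As an illustrative sketch, such a cache configuration can be applied with a cURL request against the PUT /settings/cache endpoint described in the REST API section below (the repository names and percentages here are placeholders; the application/json content type follows the repository creation example above):

API_KEY=...
KEY_SECRET=...
USER=...
DATABASE=...
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

# cache distribution across two hypothetical repositories; the percentages must sum to 100
PROPORTIONS="{
   \"proportions\" : [
      { \"repositoryID\" : \"demo-repository1\", \"percentage\" : 45 },
      { \"repositoryID\" : \"demo-repository2\", \"percentage\" : 55 }
   ]
}"

curl -X PUT -H "Content-Type:application/json" -d "$PROPORTIONS" $SERVICE_ENDPOINT/settings/cache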

Uploading Data

As soon as a database is successfully created and its status turns from Pending to OK, the URL indicated in the management interface can be opened in a separate browser window. This gives access to the GraphDB Workbench, a visual tool for database administration and data management. The current version of the Workbench is restricted to data management only; for the rest of the functionality, the S4 Management Console should be used.

Note: to access the integrated GraphDB Workbench, the user is asked to provide a valid API key for authentication (not the main account credentials), which can be generated via the S4 Management Console.

GraphDB Workbench (data upload)

OpenRDF Workbench (data upload)

You can use an open-source Web UI like the OpenRDF Workbench to easily upload data into the database.

cURL (data upload)

You can upload RDF data (example.rdf) via the following cURL script:

API_KEY=…
KEY_SECRET=…
USER=…
DATABASE=…
REPOSITORY=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

curl -X POST -H "Content-Type:application/rdf+xml;charset=UTF-8" -T example.rdf $SERVICE_ENDPOINT/repositories/$REPOSITORY/statements

Note that you need to specify the correct content type for the request body, according to the format of the RDF file you're uploading (RDF/XML, N-Triples, Turtle, N3, RDF/JSON, ...); see the Content Types section below.
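
For example, a Turtle file (example.ttl is a hypothetical filename) would be uploaded the same way, only with the matching content type from the Content Types section:

curl -X POST -H "Content-Type:text/turtle;charset=UTF-8" -T example.ttl $SERVICE_ENDPOINT/repositories/$REPOSITORY/statements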

Java / OpenRDF SDK (data upload)

The following example in Java uploads data into a repository via the OpenRDF SDK for Java:

String dbaasURL = "<dbaas URL>";
String repositoryId="<repository ID>";
String pathToTheFile="<pathToTheFile>";
String ApiKey = "<api-key>";
String ApiPass = "<api-pass>";

//The base URI to resolve any relative URIs that are in the data against.
String baseURI="http://www.example.org";

// Create RemoteRepositoryManager
RemoteRepositoryManager manager = RemoteRepositoryManager.getInstance(dbaasURL, ApiKey, ApiPass);

// Get the repository to use
Repository repository = manager.getRepository(repositoryId);

// Open a connection to this repository
RepositoryConnection repositoryConnection = repository.getConnection();

File fileToUpload=new File(pathToTheFile);

// upload the file, resolving relative URIs against baseURI
repositoryConnection.add(fileToUpload, baseURI, RDFFormat.RDFXML);

repositoryConnection.close();

GraphDB Workbench (SPARQL Update)

OpenRDF Workbench (SPARQL Update)

cURL (SPARQL Update)

You can upload data using SPARQL Update via the following cURL script:

API_KEY=...
KEY_SECRET=...
USER=...
DATABASE=...
REPOSITORY=...
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"
SPARQL_UPDATE="INSERT..."

curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -d "update=$SPARQL_UPDATE" $SERVICE_ENDPOINT/repositories/$REPOSITORY/statements

Java / OpenRDF SDK (SPARQL Update)

coming soon

Querying Data

GraphDB Workbench (data query)

OpenRDF Workbench (data query)

cURL (data query)

You can execute SPARQL queries via the following cURL script:

API_KEY=…
KEY_SECRET=…
USER=…
DATABASE=…
REPOSITORY=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"
SPARQL_QUERY="…"

curl -X POST -H "Accept:application/sparql-results+xml" -d "query=$SPARQL_QUERY" $SERVICE_ENDPOINT/repositories/$REPOSITORY
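
The infer parameter documented in the REST API section below can be added to restrict the evaluation to explicit statements only, e.g. (a sketch reusing the same variables):

curl -X POST -H "Accept:application/sparql-results+xml" -d "query=$SPARQL_QUERY" -d "infer=false" $SERVICE_ENDPOINT/repositories/$REPOSITORY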

Java / OpenRDF SDK (data query)

In this section, we show how to execute a SPARQL query via the OpenRDF SDK for Java:

String dbaasURL = "<dbaas URL>";
String repositoryId="<repository ID>";
String ApiKey = "<api-key>";
String ApiPass = "<api-pass>";
String queryString = "<query>";
RemoteRepositoryManager manager = RemoteRepositoryManager.getInstance(
		dbaasURL, ApiKey, ApiPass);

// Get the repository to use
Repository repository = manager.getRepository(repositoryId);

RepositoryConnection con = repository.getConnection();


TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL,
		queryString);

TupleQueryResult result = tupleQuery.evaluate();

while (result.hasNext()) { // iterate over the result
	BindingSet bindingSet = result.next();
	System.out.println(bindingSet);
	// do something interesting with the values here...
}

result.close();

con.close();

REST API

The fully managed DBaaS REST API is based on the OpenRDF API.

The base URL always looks like https://rdf.s4.ontotext.com/<USER>/<DATABASE>/, where

  • USER is a unique alphanumeric string for the user (system-assigned, displayed on the S4 Management Console)
  • DATABASE is the database name (specified by the user)

Managing Repositories & Querying Data

/repositories - GET
  parameters: none
  details: Get information on the repositories in the database.

/repositories/<REPOSITORY> - GET
  parameters:
    • query - the query to evaluate
    • infer (optional) - specifies whether inferred statements should be included in the query evaluation. Inferred statements are included by default ("true").
  details: This resource represents a SPARQL query endpoint for the repository.

/repositories/<REPOSITORY> - POST
  parameters: same as GET
  details: Same as GET. POST can be used in cases where the length of the (URL-encoded) query exceeds practicable limits of proxies, servers, etc. When a POST request is used, the query parameters should be sent to the server as application/x-www-form-urlencoded data.

/repositories/<REPOSITORY> - PUT
  parameters: a simple JSON structure of parameter-value pairs:
    • repositoryID - unique name for the repository (valid characters: alphanumeric, dash, underscore)
    • label (optional) - human-readable repository description
    • ruleset - the inference type to be used by the repository engine. Valid values: none, rdfs-optimized, rdfs, owl-horst-optimized, owl-horst, owl-max-optimized, owl-max, owl2-ql-optimized, owl2-ql, owl2-rl-optimized, owl2-rl
    • base-URL - ...
    • enablePredicateList - use additional indexes for predicates. Default: false.
  details: Create a new repository in the database. The repository configuration is provided as a simple JSON document containing the important parameters. Example:

{
   "repositoryID" : "demo-repository1",
   "label" : "Demo repository number 1",
   "ruleset" : "rdfs-optimized",
   "base-URL" : "http://example.org/graphdb#",
   "enablePredicateList" : "true"
}

  NOTE: this method is an extension specific to the S4 DBaaS platform; it is not available in the original OpenRDF REST API.

/repositories/<REPOSITORY> - DELETE
  parameters: none
  details: Deletes a repository and its data from the database.
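
For illustration, deleting a repository with cURL follows the same conventions as the repository creation example earlier:

API_KEY=...
KEY_SECRET=...
USER=...
DATABASE=...
REPOSITORY_ID=...
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

# remove the repository and all of its data
curl -X DELETE $SERVICE_ENDPOINT/repositories/$REPOSITORY_ID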

Create, Read, Upload & Delete Data

/repositories/<REPOSITORY>/statements - GET
  parameters:
    • subj (optional) - restricts the operation to statements with the specified resource as subject
    • pred (optional) - restricts the operation to statements with the specified URI as predicate
    • obj (optional) - restricts the operation to statements with the specified value as object
    • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
    • infer (optional) - specifies whether inferred statements should be included in the result. Inferred statements are included by default; specifying any value other than "true" (ignoring case) restricts the request to explicit statements only.
  details: Fetches specific (or all) statements from the repository.

/repositories/<REPOSITORY>/statements - POST
  parameters:
    • baseURI (optional) - specifies the base URI against which any relative URIs found in the uploaded data are resolved
    • update (optional) - specifies the SPARQL 1.1 Update string to be executed. The value is expected to be a syntactically valid SPARQL 1.1 Update string.
  details: Performs updates on the data in the repository. The data supplied with this request is expected to contain either an RDF document, a SPARQL 1.1 Update string, or a special-purpose transaction document. If an RDF document is supplied, the statements found in it are added to the repository. If a SPARQL 1.1 Update string is supplied, the update operation is parsed and executed. If a transaction document is supplied, the updates specified in it are executed.

/repositories/<REPOSITORY>/statements - PUT
  parameters:
    • baseURI (optional) - specifies the base URI against which any relative URIs found in the uploaded data are resolved
  details: Updates data in the repository, replacing any existing data with the supplied data. The data supplied with this request is expected to contain an RDF document (RDF/XML, N-Triples, Turtle, N3, RDF/JSON, ...).

/repositories/<REPOSITORY>/statements - DELETE
  parameters:
    • subj (optional) - restricts the operation to statements with the specified resource as subject
    • pred (optional) - restricts the operation to statements with the specified URI as predicate
    • obj (optional) - restricts the operation to statements with the specified value as object
    • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
  details: Deletes statements from the repository.
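
As a sketch (reusing the variables from the earlier cURL examples, and assuming the Sesame protocol convention that subj/pred/obj values are N-Triples encoded and percent-encoded in the URL), fetching the explicit statements about a single subject might look like:

SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@rdf.s4.ontotext.com/$USER/$DATABASE"

# subj is <http://example.org/item1>, percent-encoded; infer=false returns explicit statements only
curl -H "Accept:application/rdf+xml" "$SERVICE_ENDPOINT/repositories/$REPOSITORY/statements?subj=%3Chttp%3A%2F%2Fexample.org%2Fitem1%3E&infer=false"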

Working with Named Graphs

/repositories/<REPOSITORY>/rdf-graphs - GET
  parameters: none
  details: Get information on the named graphs in the repository.

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> - GET
  details: Fetches the statements in the named graph from the repository.

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> - PUT
  details: Updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> - POST
  details: Updates data in the named graph in the repository, adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> - DELETE
  details: Deletes all data in the named graph in the repository.

The /repositories/<REPOSITORY>/rdf-graphs/service resource provides the same operations for graphs identified by a request parameter rather than by the URL path. Each request must specify precisely one of the following parameters:
  • graph - specifies the URI of the named graph to be accessed
  • default - specifies that the default graph is to be accessed. This parameter is expected to be present but have no value.

/repositories/<REPOSITORY>/rdf-graphs/service - GET
  details: Fetches the statements in the named graph from the repository.

/repositories/<REPOSITORY>/rdf-graphs/service - PUT
  details: Updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data (an RDF document in one of the supported RDF formats).

/repositories/<REPOSITORY>/rdf-graphs/service - POST
  details: Updates data in the named graph in the repository, adding the supplied data (an RDF document in one of the supported RDF formats) to any existing data in the named graph.

/repositories/<REPOSITORY>/rdf-graphs/service - DELETE
  details: Deletes all data in the named graph in the repository.
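
A hedged cURL sketch for adding data to a named graph through the service endpoint (the graph URI http://example.org/graph1 is a placeholder and is passed percent-encoded, as described above):

# POST an RDF/XML file into the named graph http://example.org/graph1
curl -X POST -H "Content-Type:application/rdf+xml" -T example.rdf "$SERVICE_ENDPOINT/repositories/$REPOSITORY/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1"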

Working with Namespaces and Contexts

/repositories/<REPOSITORY>/contexts - GET
  parameters: none
  details: Gets a list of the resources that are used as context identifiers.

/repositories/<REPOSITORY>/size - GET
  parameters:
    • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
  details: Gets the number of triples in the repository.

/repositories/<REPOSITORY>/namespaces - GET
  parameters: none
  details: Gets a list of the namespace declarations that have been defined for the repository.

/repositories/<REPOSITORY>/namespaces - DELETE
  parameters: none
  details: Removes all namespace declarations from the repository.

/repositories/<REPOSITORY>/namespaces/<PREFIX> - GET
  parameters: none
  details: Gets the namespace that has been defined for a particular prefix.

/repositories/<REPOSITORY>/namespaces/<PREFIX> - PUT
  parameters: none
  details: Defines or updates a namespace declaration, mapping the prefix to the namespace supplied in plain text in the request body.

/repositories/<REPOSITORY>/namespaces/<PREFIX> - DELETE
  parameters: none
  details: Removes the namespace declaration for a particular prefix.
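
For example, mapping the hypothetical prefix ex to a namespace could be sketched as follows (the namespace is supplied as the plain-text request body):

# define prefix "ex" for the namespace http://example.org/graphdb#
curl -X PUT -H "Content-Type:text/plain" -d "http://example.org/graphdb#" $SERVICE_ENDPOINT/repositories/$REPOSITORY/namespaces/ex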

Database Configuration

/settings/cache - PUT
  parameters: a JSON structure defining how the memory cache should be distributed among all the repositories in the database (see the Repository management section above for the structure)
  details: Specifies the memory cache distribution between all the repositories in the database. Validity requirements:
    • all repositories in the database must be listed
    • the sum of the percentage values must be equal to 100
  If the JSON specification is invalid, an error is returned.
  NOTE: this method is an extension specific to the S4 DBaaS platform; it is not available in the original OpenRDF REST API.

/settings/cache - GET
  parameters: none
  details: Gets the current specification of the memory cache distribution between the repositories in the database.
  NOTE: this method is an extension specific to the S4 DBaaS platform; it is not available in the original OpenRDF REST API.

/protocol - GET
  parameters: none
  details: Gets the version of the Sesame server protocol.
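
A minimal sketch for reading the current cache distribution (assuming the endpoint returns the JSON structure shown in the Repository management section above):

curl -H "Accept:application/json" $SERVICE_ENDPOINT/settings/cache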

Content Types

MIME types for RDF formats

Format | MIME type
RDF/XML | application/rdf+xml
N-Triples | text/plain
Turtle | text/turtle
N3 | text/rdf+n3
N-Quads | text/x-nquads
RDF/JSON | application/rdf+json
TriX | application/trix
TriG | application/x-trig
Sesame Binary RDF | application/x-binary-rdf

MIME types for variable binding formats

Format | MIME type
SPARQL Query Results XML Format | application/sparql-results+xml
SPARQL Query Results JSON Format | application/sparql-results+json
Binary RDF Results Table Format | application/x-binary-rdf-results-table

MIME types for boolean result formats

Format | MIME type
SPARQL Query Results XML Format | application/sparql-results+xml
SPARQL Query Results JSON Format | application/sparql-results+json
Plain Text Boolean Result Format | text/boolean
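
These MIME types are used for content negotiation via the Accept header. For instance, the earlier query example can request JSON results instead of XML (a sketch reusing the same variables):

curl -X POST -H "Accept:application/sparql-results+json" -d "query=$SPARQL_QUERY" $SERVICE_ENDPOINT/repositories/$REPOSITORY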

Swagger

The Swagger description of the DBaaS REST API is available at http://swagger.s4.ontotext.com/

3rd Party Tools

Metreeca Graph Rover

Graph Rover by Metreeca can be used directly with the fully managed DBaaS on S4. Once you access the Graph Rover webapp, you only need to specify the SPARQL endpoint of the RDF repository you want to use with it.


After that, Graph Rover should be able to seamlessly connect to the SPARQL endpoint of the DBaaS on S4 and retrieve information about the triples in your repository.

Administration

Backup & Restore

The fully managed database automatically backs up the data once per day by creating a snapshot of the network-attached storage where the data is stored. This snapshot can be used to quickly restore the database to its state at an earlier point in time. The S4 platform preserves up to 3 backups (snapshots); when a new backup is created, the oldest one is deleted automatically.

Additionally, the user may initiate a backup or restore procedure manually at any time from the S4 Management Console. This functionality is accessible on a selected database via the context menu or via the 'Actions' list.

The Create backup operation is preceded by a confirmation message and short backup instructions. During the backup process the database will be temporarily inaccessible, which is indicated by a MAINTENANCE status. As soon as the backup is completed, the database status turns back to OK.

The Restore from backup action shows the user information about the latest available database backup. The restore operation disables database access until the previous state is recovered completely. During the restoration process the database is set into MAINTENANCE mode.

Export & Import

The fully managed database provides the capability to export the full database content into a file (in various RDF serialisation formats). The user may initiate an export at any time via the "Database" section of the S4 Management Console.

This export file also provides the means for users to migrate their data to a different RDF database if they no longer wish to use the fully managed database on the S4 platform. If the user wishes to migrate from an on-premise RDF database deployment to the fully managed database on S4, the data must first be exported from the old database instance (in one of the RDF serialisation formats) and then imported into the running S4 database instance. The import can be done via small code snippets in cURL, Java, etc., using the standard OpenRDF API that the S4 database supports. The import can also be performed with OpenRDF tools such as the OpenRDF Workbench. When the GraphDB Workbench is integrated with the S4 platform, it will provide easy means to export / import data from / into the fully managed databases running on S4.

To export the contents of a fully managed database on S4, follow these steps:

1. Initiate the export from the S4 Management Console.

2. Select the desired RDF serialisation format (Turtle is the default).

3. Start the export. The export should take only a few seconds on a Micro (1 million triples) instance.

4. When the export is complete, the database information on the S4 Management Console will be updated with the timestamp of the most recent export.

5. At any time the user may download the database export file.

6. The export file is a compressed ZIP archive with one file per database repository within the archive.

Support

The standard S4 support channels are available for questions, feedback and general information related to the fully managed GraphDB database on S4.
