Self Managed Database


Introduction

The self-managed version of GraphDB is a hosted database in the cloud that provides the full power of a scalable triple store as a pay-by-the-hour service through Amazon Web Services. GraphDB (Free or Standard Edition) can be purchased as an AMI running on EC2 instances ranging from 1 core / 2 GB RAM to 8 cores / 61 GB RAM.

Our customers often tell us that they want to develop and test in the cloud before bringing projects in-house. Now you can do that without first buying GraphDB licenses or provisioning hardware. GraphDB in the Cloud is well suited to limited-time projects or low-volume experiments in a production-quality setting, with no investment in hardware.

All GraphDB instances are designed to store data on user-supplied Amazon EBS volumes (network-attached storage), so your data persists and remains safe even when the instance is not running. GraphDB in the Cloud is accessible via standard RESTful APIs and SPARQL endpoints.

Amazon Web Services

The following Amazon Web Services concepts are relevant to running GraphDB on the AWS cloud:

  • AWS Marketplace is an online marketplace which makes it possible for customers to use its "1-Click deployment" to instantly launch pre-configured software and services on the AWS cloud infrastructure and pay only for what they use by the hour
    • The GraphDB software is available as a product on the AWS Marketplace.
  • AMI (Amazon Machine Image) provides a virtual server image which can be instantly launched on the AWS cloud
    • GraphDB provides such an AMI, and customers can provision it on virtual instances running on AWS.
  • EC2 (Elastic Compute Cloud) is the computing infrastructure where AMIs are launched as virtual instances. Security groups configure the firewalls controlling the network traffic to a running virtual EC2 instance. Key pairs are used to encrypt and decrypt login information and must be used for accessing a running EC2 instance.
    • The GraphDB AMI will be provisioned as an EC2 virtual instance, and a security group will be used to restrict network access to the instance, based on the user's preferences.
    • The user will use the private key of the key pair to log into the running EC2 virtual instance with GraphDB.
  • EBS (Elastic Block Store) provides network-attached storage volumes that can be used with running EC2 instances.
    • The EBS volume is created and managed through the user's own AWS account. The user is responsible for data volume maintenance tasks such as volume expansion, snapshots, and backup & restore.
  • On-demand EC2 instances are charged by the hour with no long-term commitments or upfront payments, while reserved EC2 instances provide a cheaper alternative for longer-term use. Note that GraphDB SHOULD NOT be deployed on spot instances, since they can be terminated abruptly, which can lead to database file corruption.

Pricing Details

GraphDB in the AWS cloud is available in various server configurations:

Database type | AWS instance type | Virtual cores | RAM (GB) | Price ($/hour): GraphDB Cloud | Price ($/hour): GraphDB Free | EC2 cost ($/hour) | Data volume estimate (triples)
XS | T2-S | 1 | 2 | ------ | free | 0.02 | 50 million
S | T2-M | 2 | 4 | ------ | free | 0.03 - 0.05 (reserved/on-demand) | 200 million
M | M4-L / T2-L | 2 | 8 | 0.35 | free | 0.10 - 0.14 (reserved/on-demand) | 500 million
L | R3-L | 2 | 15 | 0.40 | free | 0.11 - 0.18 (reserved/on-demand) | 1 billion
XL | R3-XL | 4 | 30 | 0.75 | free | 0.22 - 0.35 (reserved/on-demand) | 2 billion
2XL | R3-2XL | 8 | 61 | 1.40 | free | 0.44 - 0.70 (reserved/on-demand) | 4 billion

The EC2 cost depends on the type of instance being used: on-demand instances are best suited to short-term and occasional use, while reserved instances are best suited to longer-term and more frequent use.

Note that GraphDB in the AWS cloud SHOULD NOT be deployed on spot instances, since they can be abruptly terminated and this may lead to data corruption.
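As a rough illustration of how the hourly figures combine, the sketch below adds the GraphDB Cloud price to the EC2 cost and multiplies by the number of hours. It assumes the two figures are simply summed and ignores EBS storage and data transfer charges; the function name and the dictionary layout are hypothetical.

```python
from decimal import Decimal

# Hourly figures copied from the pricing table (USD per hour).
# Each entry: (GraphDB Cloud price, EC2 reserved cost, EC2 on-demand cost).
PRICING = {
    "M": (Decimal("0.35"), Decimal("0.10"), Decimal("0.14")),
    "L": (Decimal("0.40"), Decimal("0.11"), Decimal("0.18")),
    "XL": (Decimal("0.75"), Decimal("0.22"), Decimal("0.35")),
    "2XL": (Decimal("1.40"), Decimal("0.44"), Decimal("0.70")),
}


def estimate_cost(size, hours, reserved=False):
    """Return an estimated total in USD: (software price + EC2 cost) * hours.

    Assumption: both hourly figures are charged for every hour the instance
    runs. EBS storage and data transfer are not included.
    """
    software, ec2_reserved, ec2_on_demand = PRICING[size]
    ec2 = ec2_reserved if reserved else ec2_on_demand
    return (software + ec2) * hours


if __name__ == "__main__":
    # A 40-hour experiment on an M-sized, on-demand instance.
    print(estimate_cost("M", 40))
```

Decimal is used instead of float so that sums of prices such as 0.35 and 0.14 stay exact.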

Prerequisites

In order to use GraphDB on AWS you need the following:

  1. A valid AWS account
  2. An EC2 security group in the same region as the EC2 instance, configured as follows (alternatively, the security group can be created and configured when launching the instance):
    1. Port 22 open to the IPs which will be administering the EC2 instance
    2. Port 8080 open to the IPs which need to access GraphDB (data upload, SPARQL queries, Workbench, etc.)
  3. An EC2 key pair in the same region, used for user authentication on the EC2 instance. Alternatively, the key pair can be created when launching the EC2 instance.
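For readers who script their AWS setup, the two security group rules above can be expressed as the `IpPermissions` structure accepted by the boto3 EC2 call `authorize_security_group_ingress`. This is only a sketch: the helper name and the example CIDR ranges are hypothetical, and the block builds the rule data without contacting AWS.

```python
import ipaddress


def graphdb_ingress_rules(admin_cidr, client_cidr):
    """Build ingress rules for SSH (port 22) and GraphDB (port 8080).

    admin_cidr  - addresses allowed to administer the instance over SSH
    client_cidr - addresses allowed to reach GraphDB (Workbench, SPARQL, REST)
    """
    def tcp_rule(port, cidr, description):
        # Raises ValueError if the CIDR string is malformed.
        ipaddress.ip_network(cidr, strict=False)
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }

    return [
        tcp_rule(22, admin_cidr, "SSH administration"),
        tcp_rule(8080, client_cidr, "GraphDB Workbench and REST API"),
    ]


# Possible use with boto3 (not executed here; the group id is a placeholder):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="<security-group-id>",
#       IpPermissions=graphdb_ingress_rules("203.0.113.0/24", "198.51.100.0/24"),
#   )
```

Keeping both ranges as narrow as possible limits exposure, since port 8080 gives access to the Workbench and the data.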

Setup

The process of configuring and starting GraphDB in the AWS cloud involves the following steps:

  1. Activating the GraphDB product on the AWS Marketplace (one time step)
  2. Starting an EC2 instance with the GraphDB AMI
  3. Logging into the running EC2 instance via SSH
  4. Mounting the EBS data volume on the filesystem of the running EC2 instance
  5. Starting the GraphDB server
  6. Creating and configuring repositories with the Workbench
  7. Verifying that everything is correctly configured and running

The following diagram shows the sequence of steps to be followed:  

After an EC2 instance with GraphDB is launched and the GraphDB server is started, the customer can access it via the public IP address of that EC2 instance through:

  • OpenRDF / rdf4j RESTful service, including a standard SPARQL endpoint
  • GraphDB Workbench web based administration tool for configuring, querying and monitoring a running GraphDB database

Buying the GraphDB Product on the AWS Marketplace

  • Sign in to the AWS Marketplace portal
  • Search for the "GraphDB Cloud" or the "GraphDB Free" products
  • Alternatively, go directly to the GraphDB Cloud or GraphDB Free product pages on the AWS Marketplace

  • A detailed product preview page, including pricing options, is displayed. Select the Continue button to proceed to the purchase preview screen.

  • The purchase preview screen offers two options for launching the product: 1-Click Launch and Manual Launch. The following sections walk through launching the product manually via the EC2 Console, describing the various configuration options and their default values.

EC2 Instance Configuration & Startup

  • Choose an instance type - one of: m4.large, r3.large, r3.xlarge, r3.2xlarge
  • Add storage. The GraphDB AMI is bundled with a pair of EBS volumes - one for the application and one for the data storage. The latter can be reused beyond the life-cycle of the product usage and initially it contains no data. There are several important parameters which might be adjusted at this step:
    • Volume size - by default it will allocate 4GiB (sufficient for approximately 15 million triples) but depending on the estimated needs the size should be adjusted prior to volume creation
    • Volume type - affects the IO performance (SSD vs Magnetic drives)
    • Delete on Termination SHOULD NOT be selected. Otherwise the data will be lost after machine termination
    • Device name (/dev/sdf) SHOULD NOT be changed
    • If there already exists a data volume from previous use of the system, remove the second volume configuration row and attach the old volume manually when the instance is already running (as /dev/sdf)

  • Create a security group (or reuse an existing one). Two ports have to be opened: 22 (SSH) for EC2 instance management and 8080 (HTTP) for accessing the GraphDB service (Workbench UI & RESTful APIs)

  • Create a key pair (or reuse an existing one)

  • Review and Launch
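The Add storage step gives a single sizing reference point: 4 GiB for approximately 15 million triples. The sketch below extrapolates from that figure to estimate a data volume size. It assumes storage grows linearly with the number of triples, which the guide does not state and which may not hold for your data; the function name and the headroom parameter are hypothetical.

```python
DEFAULT_VOLUME_GIB = 4                    # default data volume size
TRIPLES_PER_DEFAULT_VOLUME = 15_000_000   # approximate capacity of the default


def estimate_volume_gib(triples, headroom_percent=0):
    """Estimate the EBS data volume size in GiB for a number of triples.

    Assumption: linear scaling from the 4 GiB / 15 million triples reference
    point. Integer arithmetic is used so the result is rounded up exactly.
    """
    numerator = triples * DEFAULT_VOLUME_GIB * (100 + headroom_percent)
    denominator = TRIPLES_PER_DEFAULT_VOLUME * 100
    needed = -(-numerator // denominator)  # ceiling division
    return max(DEFAULT_VOLUME_GIB, needed)


if __name__ == "__main__":
    # 500 million triples with 50% headroom for growth.
    print(estimate_volume_gib(500_000_000, headroom_percent=50))
```

Since the volume size has to be chosen before the volume is created, erring on the larger side avoids an early volume expansion.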

GraphDB Startup

  • Log into the instance via SSH as user ec2-user, using the private key for the EC2 instance

  • Run the script responsible for mounting the EBS data volume:

The script verifies that the EBS data volume is properly attached and creates a mount point for it. If the EBS volume is not attached yet, the script prompts the user to attach it and performs several delayed retries, giving the user time to attach the volume via the AWS Management Console. If the retries run out before the volume is attached, rerun the script.

On successful execution, the script confirms that the volume is mounted and prints out the mount point location: /data_mount/data.

  • Start the GraphDB service:

The script verifies that the data volume is available (if not, it terminates with a reminder message) and starts the service.

Workbench Configuration

  • Open the GraphDB Workbench UI in your web browser at http://<instance-public-url>:8080
If you are running GraphDB version 6.6.5 or older, the service URL is http://<instance-public-url>:8080/graphdb

If you are running GraphDB version 6.6.5 or older for the first time, you have to set up the data location manually via Admin > Locations and Repositories > Attach Location.

In newer GraphDB versions this property is preset.

If the attached data volume was used previously, the old repositories will be detected and listed under Admin > Locations and Repositories.

Verifying the Configuration & Startup

  • Testing the service. Back in the SSH console, test the configuration of the GraphDB instance by executing:

It performs various automated tests, such as creating a repository, loading some data, querying the data and deleting the repository. The result of each test is printed in the console.

GraphDB Performance Tuning

This section provides guidance on the recommended configuration for your GraphDB server.

The following parameters control the amount of memory assigned to each of the different caches:

Configuration settings per instance type:

Setting | Parameter name (unit) | Description | M | L | XL | 2XL
Data volume estimate (triples) | - | - | 500 million | 1 billion | 2 billion | 4 billion
AWS instance type / RAM | - | - | M4-L / 8 GB | R3-L / 15 GB | R3-XL / 30 GB | R3-2XL / 61 GB
Entity index size | entity-index-size | Defines the number of entity hash table index entries; the bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval; the entity hash table does not rehash, so its index size is constant throughout the life of the repository | 75000000 | 150000000 | 300000000 | 600000000
Total cache memory | cache-memory (bytes) | The amount of memory to be distributed among the different caches | 3414m | 6394m | 12924m | 26551m
Tuple index memory | tuple-index-memory (bytes) | Memory used for the PSO and POS caches | 2561m | 4796m | 9693m | 19913m
Enable predicate indices | enablePredicateList | Enables or disables mappings from an entity (subject or object) to its predicates; switching this on can drastically speed up queries that use wildcard predicate patterns | yes | yes | yes | yes
Predicate index memory | predicate-memory (bytes) | Specifies the amount of memory to be used for the predicate lists cache | 853m | 1598m | 3231m | 6638m
Use context index | enable-context-index | If set to 'true', GraphDB will build and use the context index/indices | yes | yes | yes | yes
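The recommended figures follow a regular pattern that can help when reasoning about other sizes. In every column, the total cache memory equals the tuple index memory plus the predicate index memory, the tuple index receives roughly three quarters of the total, and the entity index size is 15 entries per 100 triples of the data volume estimate. The sketch below only checks these observations against the table; they are read off the numbers, not a rule stated by the guide, and the memory values are kept as plain integers without interpreting the "m" suffix.

```python
# Figures copied from the tuning table, one entry per instance size.
TABLE = {
    "M": dict(triples=500_000_000, entity_index_size=75_000_000,
              cache=3414, tuple_index=2561, predicate=853),
    "L": dict(triples=1_000_000_000, entity_index_size=150_000_000,
              cache=6394, tuple_index=4796, predicate=1598),
    "XL": dict(triples=2_000_000_000, entity_index_size=300_000_000,
               cache=12924, tuple_index=9693, predicate=3231),
    "2XL": dict(triples=4_000_000_000, entity_index_size=600_000_000,
                cache=26551, tuple_index=19913, predicate=6638),
}


def describe(row):
    """Summarise the relationships between the settings in one table column."""
    return {
        # cache-memory == tuple-index-memory + predicate-memory
        "cache_is_sum": row["cache"] == row["tuple_index"] + row["predicate"],
        # share of the total cache given to the tuple index, to two decimals
        "tuple_share": round(row["tuple_index"] / row["cache"], 2),
        # entity-index-size entries per 100 triples of estimated data volume
        "entity_entries_per_100_triples":
            row["entity_index_size"] * 100 // row["triples"],
    }


if __name__ == "__main__":
    for size, row in TABLE.items():
        print(size, describe(row))
```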

All of these performance related settings can be configured from the GraphDB Workbench at repository creation time:

GraphDB Shutdown & Restart

The termination of the GraphDB service should be done only via the provided shell script:

This performs a graceful shutdown of the service, persisting any in-memory data to the EBS volume. The operation might take some time, so make sure there is no active java process before restarting the service or terminating the EC2 instance.
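One way to make sure the shutdown has finished is to poll until no java process remains. The sketch below does this with the standard `pgrep` utility; it assumes that GraphDB is the only process named `java` on the instance and that `pgrep` is installed. The function names, timeout and interval are hypothetical.

```python
import subprocess
import time


def java_running():
    """Return True if a process named exactly 'java' exists.

    pgrep exits with status 0 when at least one process matches; any other
    status is treated here as "not running".
    """
    result = subprocess.run(["pgrep", "-x", "java"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_stopped(is_running=java_running, timeout=300.0, interval=2.0):
    """Poll until is_running() returns False; give up after `timeout` seconds.

    Returns True if the process stopped in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would run the shutdown script first and proceed with a restart or instance termination only if `wait_until_stopped()` returns True.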

The GraphDB service can be started again at any time (provided the EC2 instance was stopped rather than terminated) with these steps:

  1. Mount the external EBS volume with the data:
  2. Start the GraphDB service:

Stopping the EC2 Instance

Note that the GraphDB service has to be gracefully shut down first, as explained in the previous section.

The EC2 resources can be completely or partially released depending on the use case requirements:

  • Stopping the instance - this operation stops the instance and preserves its filesystem state. You can use the EC2 Management Console to perform this task. This scenario is appropriate when the service is not needed for a certain period of time but will be restarted later. In this case the EBS volume remains attached.
  • Terminating the instance - complete termination of the service. This terminates the EC2 machine and its file system. Only the EBS data volume remains intact, and it is automatically detached.

Working with GraphDB Cloud

REST API

The GraphDB Cloud REST API is based on the RDF4J API.

The REST endpoint URL is http://<instance-public-url>:8080/graphdb/

The following sections provide details on the REST API usage.

Managing Repositories & Querying Data

/repositories GET
  Parameters: none
  Details: gets information on the repositories in the database

/repositories/<REPOSITORY> GET
  Parameters:
  • query - the query to evaluate
  • infer (optional) - specifies whether inferred statements should be included in the query evaluation. Inferred statements are included by default ("true")
  Details: this resource represents a SPARQL query endpoint for the repository

/repositories/<REPOSITORY> POST
  Parameters: same as GET
  Details: same as GET. POST can be used in cases where the length of the (URL-encoded) query exceeds practicable limits of proxies, servers, etc. When a POST request is used, the query parameters should be sent to the server as application/x-www-form-urlencoded data

/repositories/<REPOSITORY> DELETE
  Parameters: none
  Details: deletes a repository and its data from the database
Create, Read, Upload & Delete Data

/repositories/<REPOSITORY>/statements GET
  Parameters:
  • subj (optional) - restricts the GET operation to statements with the specified resource as subject
  • pred (optional) - restricts the GET operation to statements with the specified URI as predicate
  • obj (optional) - restricts the GET operation to statements with the specified value as object
  • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
  • infer (optional) - specifies whether inferred statements should be included in the result of GET requests. Inferred statements are included by default. Specifying any value other than "true" (ignoring case) restricts the request to explicit statements only
  Details: fetches specific (or all) statements from the repository

/repositories/<REPOSITORY>/statements POST
  Parameters:
  • baseURI (optional) - specifies the base URI to resolve any relative URIs found in uploaded data against
  • update (optional) - specifies the SPARQL 1.1 Update string to be executed. The value is expected to be a syntactically valid SPARQL 1.1 Update string
  Details: performs updates on the data in the repository. The data supplied with this request is expected to contain either an RDF document, a SPARQL 1.1 Update string, or a special purpose transaction document. If an RDF document is supplied, the statements found in the RDF document will be added to the repository. If a SPARQL 1.1 Update string is supplied, the update operation will be parsed and executed. If a transaction document is supplied, the updates specified in the transaction document will be executed

/repositories/<REPOSITORY>/statements PUT
  Parameters:
  • baseURI (optional) - specifies the base URI to resolve any relative URIs found in uploaded data against
  Details: updates data in the repository, replacing any existing data with the supplied data. The data supplied with this request is expected to contain an RDF document (RDF/XML, N-Triples, Turtle, N3, RDF/JSON, ...)

/repositories/<REPOSITORY>/statements DELETE
  Parameters:
  • subj (optional) - restricts the DELETE operation to statements with the specified resource as subject
  • pred (optional) - restricts the DELETE operation to statements with the specified URI as predicate
  • obj (optional) - restricts the DELETE operation to statements with the specified value as object
  • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
  Details: deletes statements from the repository
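As an illustration of the POST variant of the statements resource, the sketch below prepares two requests: one that adds a Turtle document and one that sends a SPARQL 1.1 Update string as form data. The function names are hypothetical, and the requests are only constructed here; pass them to `urllib.request.urlopen` to send them to a running instance.

```python
import urllib.parse
import urllib.request


def _statements_url(base_url, repository):
    repo = urllib.parse.quote(repository, safe="")
    return f"{base_url.rstrip('/')}/repositories/{repo}/statements"


def add_turtle_request(base_url, repository, turtle):
    """Prepare a POST that adds the statements of a Turtle document."""
    return urllib.request.Request(
        _statements_url(base_url, repository),
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def sparql_update_request(base_url, repository, update):
    """Prepare a POST that executes a SPARQL 1.1 Update string.

    The update is sent in the 'update' parameter as form-encoded data.
    """
    body = urllib.parse.urlencode({"update": update}).encode("ascii")
    return urllib.request.Request(
        _statements_url(base_url, repository),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Sending a prepared request (requires a running instance):
#   with urllib.request.urlopen(add_turtle_request(base, "myrepo", data)) as r:
#       print(r.status)
```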

Working with Named Graphs

/repositories/<REPOSITORY>/rdf-graphs GET
  Parameters: none
  Details: gets information on the named graphs in the repository

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> GET
  Parameters: none
  Details: fetches statements in the named graph from the repository

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> PUT
  Parameters: none
  Details: updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> POST
  Parameters: none
  Details: updates data in the named graph in the repository, adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats

/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> DELETE
  Parameters: none
  Details: deletes all data in the named graph in the repository

/repositories/<REPOSITORY>/rdf-graphs/service GET
  Parameters:
  • graph (optional) - specifies the URI of the named graph to be accessed
  • default (optional) - specifies that the default graph is to be accessed. This parameter is expected to be present but have no value
    NOTE: Each request needs to specify precisely one of the above parameters.
  Details: fetches statements in the named graph from the repository

/repositories/<REPOSITORY>/rdf-graphs/service PUT
  Parameters:
  • graph (optional) - specifies the URI of the named graph to be accessed
  • default (optional) - specifies that the default graph is to be accessed. This parameter is expected to be present but have no value
    NOTE: Each request needs to specify precisely one of the above parameters.
  Details: updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats

/repositories/<REPOSITORY>/rdf-graphs/service POST
  Parameters:
  • graph (optional) - specifies the URI of the named graph to be accessed
  • default (optional) - specifies that the default graph is to be accessed. This parameter is expected to be present but have no value
    NOTE: Each request needs to specify precisely one of the above parameters.
  Details: updates data in the named graph in the repository, adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats

/repositories/<REPOSITORY>/rdf-graphs/service DELETE
  Parameters:
  • graph (optional) - specifies the URI of the named graph to be accessed
  • default (optional) - specifies that the default graph is to be accessed. This parameter is expected to be present but have no value
    NOTE: Each request needs to specify precisely one of the above parameters.
  Details: deletes all data in the named graph in the repository
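The rule that each request to the service resource carries precisely one of `graph` or `default` is easy to get wrong when building URLs by hand. The hypothetical helper below sketches one way to enforce it; note that `default` is emitted without a value, as the parameter description requires.

```python
from urllib.parse import quote


def graph_store_url(base_url, repository, graph=None, default=False):
    """Build a URL for the rdf-graphs/service resource.

    Exactly one of `graph` (a named graph URI) or `default=True` must be
    given; anything else raises ValueError.
    """
    if (graph is not None) == bool(default):
        raise ValueError("specify exactly one of 'graph' or 'default'")
    url = (f"{base_url.rstrip('/')}/repositories/"
           f"{quote(repository, safe='')}/rdf-graphs/service")
    if default:
        # The parameter is present but has no value.
        return url + "?default"
    # Percent-encode the whole graph URI, including ':' and '/'.
    return url + "?graph=" + quote(graph, safe="")


if __name__ == "__main__":
    base = "http://localhost:8080/graphdb"
    print(graph_store_url(base, "myrepo", default=True))
    print(graph_store_url(base, "myrepo", graph="http://example.org/g1"))
```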

Working with Namespaces and Contexts

/repositories/<REPOSITORY>/contexts GET
  Parameters: none
  Details: gets a list of resources that are used as context identifiers

/repositories/<REPOSITORY>/size GET
  Parameters:
  • context (optional) - if specified, restricts the operation to one or more specific contexts in the repository
  Details: gets the number of triples in a repository

/repositories/<REPOSITORY>/namespaces GET
  Parameters: none
  Details: gets a list of namespace declarations that have been defined for the repository

/repositories/<REPOSITORY>/namespaces DELETE
  Parameters: none
  Details: removes all namespace declarations from the repository

/repositories/<REPOSITORY>/namespaces/<PREFIX> GET
  Parameters: none
  Details: gets the namespace that has been defined for a particular prefix

/repositories/<REPOSITORY>/namespaces/<PREFIX> PUT
  Parameters: none
  Details: defines or updates a namespace declaration, mapping the prefix to the namespace that is supplied in plain text in the request body

/repositories/<REPOSITORY>/namespaces/<PREFIX> DELETE
  Parameters: none
  Details: removes a namespace declaration for a particular prefix
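Defining a namespace differs from the other calls in that the request body is the bare namespace string. A minimal sketch of preparing such a PUT request follows; the function name is hypothetical, and sending `Content-Type: text/plain` is an assumption based on the "plain text" wording of the PUT entry.

```python
import urllib.parse
import urllib.request


def set_namespace_request(base_url, repository, prefix, namespace):
    """Prepare a PUT that maps `prefix` to `namespace` in a repository."""
    url = "{}/repositories/{}/namespaces/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(repository, safe=""),
        urllib.parse.quote(prefix, safe=""),
    )
    return urllib.request.Request(
        url,
        data=namespace.encode("utf-8"),           # namespace as the plain body
        headers={"Content-Type": "text/plain"},   # assumption, see lead-in
        method="PUT",
    )


# Sending the request (requires a running instance):
#   req = set_namespace_request("http://<instance-public-url>:8080/graphdb",
#                               "myrepo", "ex", "http://example.org/ns#")
#   urllib.request.urlopen(req).close()
```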

Content Types

MIME types for RDF formats

Format | MIME type
RDF/XML | application/rdf+xml
N-Triples | text/plain
Turtle | text/turtle
N3 | text/rdf+n3
N-Quads | text/x-nquads
RDF/JSON | application/rdf+json
TriX | application/trix
TriG | application/x-trig
Sesame Binary RDF | application/x-binary-rdf

MIME types for variable binding formats

Format | MIME type
SPARQL Query Results XML Format | application/sparql-results+xml
SPARQL Query Results JSON Format | application/sparql-results+json
Binary RDF Results Table Format | application/x-binary-rdf-results-table

MIME types for boolean result formats

Format | MIME type
SPARQL Query Results XML Format | application/sparql-results+xml
SPARQL Query Results JSON Format | application/sparql-results+json
Plain Text Boolean Result Format | text/boolean
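When uploading files, the Content-Type header has to match the RDF format of the data. The sketch below picks a MIME type from the list of RDF formats above based on the file extension. The extension mapping is an assumption based on common naming conventions and is not part of the guide; extend or adjust it to match your files.

```python
import os

# MIME types taken from the RDF formats table; the file extensions on the
# left are conventional choices, not something the guide prescribes.
RDF_MIME_TYPES = {
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
    ".n3": "text/rdf+n3",
    ".nq": "text/x-nquads",
    ".trix": "application/trix",
    ".trig": "application/x-trig",
}


def content_type_for(path):
    """Return the MIME type for an RDF file, based on its extension.

    Raises ValueError for extensions that are not in the mapping.
    """
    extension = os.path.splitext(path)[1].lower()
    try:
        return RDF_MIME_TYPES[extension]
    except KeyError:
        raise ValueError(f"unknown RDF file extension: {extension!r}") from None


if __name__ == "__main__":
    print(content_type_for("dataset.ttl"))
```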

GraphDB Workbench

Administration

User management

By default, user management and security are disabled. To enable them, go to Admin > Users and Access and enable the Security option.

The default login for the 'admin' user is with password 'root'. Make sure you change the password as soon as the security is enabled!

EBS Volume Expansion

  • Stop the instance if it is running (stop it, do not terminate it)
  • Create a snapshot of the volume to be expanded
  • From the new snapshot, create a new volume with the desired size (in the same availability zone)
  • Detach the old volume from the instance
  • Attach the new volume
  • Start the instance and expand the file system on the new volume from within the instance

A detailed description is available at: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

Backup & Restore

Backing up the data is a simple process of taking a snapshot of the EBS data volume. The snapshot can then be used to restore the application data state, to replicate the data, or to migrate it to another data center.

The proper order of steps for a data backup is:

  • stop the GraphDB service to ensure all in-memory data is persisted properly on the file system
  • stop the AWS instance to ensure the file system is in a consistent state
  • take a snapshot of the EBS data volume
  • restart the AWS instance and the GraphDB service.
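The AWS side of the backup steps can be scripted with boto3. The following is a sketch only: the function name is hypothetical, `ec2` is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`), and the GraphDB service is assumed to have been stopped on the instance beforehand and to be restarted there afterwards, since that happens on the instance itself and is not covered here.

```python
def backup_data_volume(ec2, instance_id, volume_id,
                       description="GraphDB data volume backup"):
    """Stop the instance, snapshot the data volume, start the instance again.

    ec2 is expected to be a boto3 EC2 client. Returns the id of the snapshot.
    The GraphDB service must already be shut down gracefully.
    """
    # Stop the instance so that the file system is in a consistent state.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Take the snapshot and wait until it is complete.
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = snapshot["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Bring the instance back; GraphDB still has to be started on it.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return snapshot_id


# Possible use (ids are placeholders):
#   import boto3
#   backup_data_volume(boto3.client("ec2"), "<instance-id>", "<volume-id>")
```

Waiting for the snapshot to complete before restarting is a conservative choice made for this sketch.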

Data restore steps (on a running AWS instance):

  • stop the GraphDB service if it is running
  • detach the old EBS data volume (if any)
  • create a new EBS volume from the backup data snapshot
  • attach the new volume as device /dev/sdf
  • run the attach_data_vol.sh script and then start the GraphDB service

Data restore steps (on a new AWS instance):

  • in the Launch instance wizard,
    • remove the default blank data volume
    • add the backup data snapshot as a source for the data volume
  • follow the rest of the start-up and configuration procedure described above

Upgrading the GraphDB Product

This section describes the procedure for upgrading the GraphDB product whenever a newer version is available on the AWS Marketplace. An older version of GraphDB will still remain functional, but updating to the latest one is always recommended due to the improvements in performance and stability.

The upgrade process should follow these steps:

  1. starting a new EC2 instance with the latest version of the GraphDB product via the AWS Marketplace. We'll refer to this instance as EC2-NEW
  2. stopping the GraphDB service/process on the old instance (we'll refer to it as EC2-OLD)
  3. detaching the EBS data volume from EC2-OLD
  4. attaching the EBS data volume to EC2-NEW
  5. starting the GraphDB service/process on EC2-NEW
  6. terminating the EC2-OLD instance which is no longer needed
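Steps 3 and 4 of the upgrade, moving the EBS data volume from EC2-OLD to EC2-NEW, can also be performed with boto3 instead of the AWS Management Console. This is a sketch under the same assumptions as the console procedure: GraphDB is stopped and the volume is unmounted on EC2-OLD, and both instances are in the same availability zone. The function name is hypothetical and `ec2` is assumed to be a boto3 EC2 client.

```python
def move_data_volume(ec2, volume_id, new_instance_id, device="/dev/sdf"):
    """Detach the data volume and attach it to the new instance.

    ec2 is expected to be a boto3 EC2 client. The device name defaults to
    /dev/sdf, the name the GraphDB AMI expects for the data volume.
    """
    # Detach from the old instance and wait until the volume is free.
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach to the new instance and wait until the attachment is in use.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_instance_id,
                      Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


# Possible use (ids are placeholders):
#   import boto3
#   move_data_volume(boto3.client("ec2"), "<volume-id>", "<new-instance-id>")
```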

The following steps provide detailed instructions for performing the upgrade procedure:

  1. Log into the EC2-OLD instance
  2. Stop the GraphDB service/process
    Use the graphdb.sh script to stop the service
  3. Unmount the EBS data volume in order to transfer it later to the EC2-NEW instance
  4. From the AWS Management Console detach the EBS data volume from the EC2-OLD instance. To identify the correct volume, in the Attachment Information column search for value: <old-instance-id>:/dev/sdf
  5. Launch the EC2-NEW instance with the latest version of the GraphDB product
    1. On the AWS Marketplace portal select Manual Launch instead of 1-Click Launch
    2. Locate the Region in which the EC2-OLD instance is running and use the corresponding Launch with EC2 Console button (redirecting to the Launch instance wizard of the AWS Management Console)
    3. On the Configure Instance Details select the Availability Zone where the EC2-OLD instance is running
    4. On the Add Storage tab, remove the suggested additional EBS volume, leaving only the Root storage device
    5. Launch the EC2-NEW instance and wait until its state indicates running before proceeding with the next steps
  6. Attach the EBS data volume to the EC2-NEW instance in the AWS Management Console. The attachment device name should be left at its default value: /dev/sdf
  7. Log into the EC2-NEW instance and execute the two startup scripts in the same way as described in the GraphDB Startup section of this document
  8. Test that the GraphDB database is operational (execute a sample SPARQL query, or connect to the endpoint via the GraphDB Workbench or a 3rd party tool)
  9. Terminate the EC2-OLD instance (which is still running) from the AWS Management Console

GraphDB Documentation

Support

The standard S4 support channels are available for questions, feedback and general information related to GraphDB on AWS:
