Graph database

General concepts

In mathematics, a graph is an abstract representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges. Typically, a graph is represented in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges. The figure below is an example of this concept.

Vertices are also referred to as nodes and their properties are often called attributes. For the remainder of the document, graphs will be composed of nodes, edges and attributes.

Sparksee graph database

Sparksee is an embedded graph database management system tightly integrated with the application at code level. As a graph database, stored data is modeled as a graph.

Unlike relational databases, where the data model is standardized, graph database vendors propose different variants of the graph data model, each built on the general notion of a graph described in the previous section.

Graph data model

The Sparksee graph model is based on a generalization of the graph concept which can be defined as a labeled attributed multigraph. In Sparksee we refer to the label as the type.

These are its main features:

All node and edge objects belong to a type.
Edges have a direction. The source node is called the tail node and the target or destination node is called the head node.

Edges can also be undirected. Sparksee undirected edges do not have a restricted direction; in fact they can be interpreted as bidirectional as both nodes play the head and tail roles at the same time.

Considering this particularity, the Sparksee graph model could also be called a mixed multigraph.
Node and edge objects can have one or more attributes.
As a multigraph, there are no restrictions on the number of edges between two nodes, even if those edges belong to the same type. In addition, loops are allowed.

This data model is more suitable for modeling complex scenarios such as the one in Figure 2.1, which could hardly be represented using a simple graph model. In Figure 2.1, there are two types of nodes (PEOPLE represented by a star icon and MOVIE shown as a clapperboard icon), each of which has an attribute (called Name and Title respectively) with a value. For instance, the Scarlett Johansson (Name) node belongs to the PEOPLE type (star icon). There are also two types of edges (DIRECTS shown in blue and CAST shown in orange). CAST (between PEOPLE and MOVIE) has an attribute called Character. Moreover, whereas DIRECTS is a directed edge type, drawn with an arrow pointing to its head node, CAST is an undirected edge type. More attributes could be added to both node and edge objects. Illustrating the multigraph property, the Woody Allen node and the Manhattan node are linked by two different edges.
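The features listed above can be sketched as a small in-memory model. This is only an illustration in plain Python, not the Sparksee API: the class and method names (MixedMultigraph, add_node, add_edge, edges_between) are hypothetical, and the Character value is an illustrative assumption rather than data read from Figure 2.1.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    oid: int
    type_name: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    oid: int
    type_name: str
    tail: int       # OID of the source node
    head: int       # OID of the target node
    directed: bool  # False means both nodes play the head and tail roles
    attrs: dict = field(default_factory=dict)


class MixedMultigraph:
    """Toy labeled attributed mixed multigraph (hypothetical, not Sparksee)."""

    def __init__(self):
        self._next_oid = count(1)
        self.nodes = {}
        self.edges = {}

    def add_node(self, type_name, **attrs):
        oid = next(self._next_oid)
        self.nodes[oid] = Node(oid, type_name, attrs)
        return oid

    def add_edge(self, type_name, tail, head, directed, **attrs):
        # No uniqueness check: parallel edges and loops are allowed.
        oid = next(self._next_oid)
        self.edges[oid] = Edge(oid, type_name, tail, head, directed, attrs)
        return oid

    def edges_between(self, a, b):
        """Edges that can be traversed from node a to node b."""
        return [e for e in self.edges.values()
                if (e.tail == a and e.head == b)
                or (not e.directed and e.tail == b and e.head == a)]


g = MixedMultigraph()
woody = g.add_node("PEOPLE", Name="Woody Allen")
manhattan = g.add_node("MOVIE", Title="Manhattan")
g.add_edge("DIRECTS", woody, manhattan, directed=True)
# Illustrative attribute value (assumption, not taken from the figure).
g.add_edge("CAST", woody, manhattan, directed=False, Character="Isaac")
```

Two edges of different types connect the same pair of nodes, and only the undirected CAST edge can be followed from the movie back to the person.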

Types

Nodes and edges in Sparksee must be of a certain type.

All Sparksee types are identified by a public user-provided unique name, the type name, and an immutable Sparksee-generated unique identifier, the type identifier. The type identifier is used to refer to the type when using Sparksee APIs, as explained in the ‘Nodes and edges’ section of the ‘API’ chapter.

In Figure 2.1 the types created are PEOPLE and MOVIES (node types) and CAST and DIRECTS (edge types). Note that we refer to the types with their type name.
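The relationship between type names and type identifiers can be pictured with a minimal registry. This is a sketch under assumptions, not Sparksee code: TypeRegistry, new_type and find_type are hypothetical names, and the way identifiers are generated here (a simple counter) is only a stand-in for whatever Sparksee does internally.

```python
class TypeRegistry:
    """Hypothetical registry: unique user-provided name -> generated identifier."""

    def __init__(self):
        self._id_by_name = {}
        self._next_id = 1

    def new_type(self, name):
        if name in self._id_by_name:
            raise ValueError(f"type name already in use: {name}")
        type_id = self._next_id  # generated by the system, never changed later
        self._next_id += 1
        self._id_by_name[name] = type_id
        return type_id

    def find_type(self, name):
        return self._id_by_name[name]


types = TypeRegistry()
people = types.new_type("PEOPLE")
directs = types.new_type("DIRECTS")
```

Application code would look up the identifier once by name and then pass the identifier, not the name, to later calls.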

Nodes and edges

Sparksee objects are node or edge instances of a certain type. When they are created they are given an immutable Sparksee-generated unique identifier, the object identifier (OID). The OID is used to refer to the object when using Sparksee APIs, as explained in the ‘Nodes and edges’ section of the ‘API’ chapter.

Nodes and edges must belong to a certain type and may have attributes.

In Figure 2.1, 18 objects (9 nodes and 9 edges) are displayed.

Attributes

Sparksee attributes are identified by a unique public user-provided name, the attribute name, and an immutable Sparksee-generated unique identifier, the attribute identifier. As in the case of type identifiers, an attribute identifier is used to refer to the attribute when using Sparksee APIs, as explained in the ‘Attributes and values’ section of the ‘API’ chapter.

Sparksee considers the following kinds of attributes:

Attributes defined within the scope of a type, the most general kind. For this kind of attribute the name must be unique amongst all the other attributes defined for that type. Objects (node or edge objects) belonging to that type are the only ones allowed to set and get values for that attribute. For example, we could define the attribute ID for both the PEOPLE and the MOVIES node types, resulting in two different attributes. Thus, only PEOPLE objects would be able to use the first attribute while only MOVIES objects would be able to use the second.
Node attributes. They are not restricted to only one node type. For example, we could define the node attribute NAME for all the node objects of the graph, no matter which specific node type they belong to.
Edge attributes. They are not restricted to only one edge type. For example, we could define the edge attribute WEIGHT for all the edge objects of the graph, no matter which specific edge type they belong to.
Global attributes. They are not restricted to a node or edge type. For example, we could define the global attribute ID for all the objects of the graph, no matter which type they belong to. Take into account that the attribute name must be unique among all other global attributes.
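The four kinds of attributes above differ only in their scope. The following sketch models that idea in plain Python; it is not the Sparksee API. The scope markers NODES, EDGES and GLOBAL, the AttributeCatalog class and the may_use function are all hypothetical, and the sketch assumes that names only need to be unique within one scope, since the text does not say how name clashes across scopes are handled.

```python
# Hypothetical scope markers; any other scope value is taken as a type name.
NODES, EDGES, GLOBAL = "<all nodes>", "<all edges>", "<global>"


class AttributeCatalog:
    """Attribute names are unique per scope; identifiers are generated."""

    def __init__(self):
        self._ids = {}  # (scope, name) -> attribute identifier

    def new_attribute(self, scope, name):
        key = (scope, name)
        if key in self._ids:
            raise ValueError(f"{name!r} already defined in scope {scope!r}")
        self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def may_use(scope, object_kind, object_type):
    """Can an object of this kind ('node' or 'edge') and type use the attribute?"""
    if scope == GLOBAL:
        return True
    if scope == NODES:
        return object_kind == "node"
    if scope == EDGES:
        return object_kind == "edge"
    return scope == object_type  # type-scoped attribute


catalog = AttributeCatalog()
people_id = catalog.new_attribute("PEOPLE", "ID")
movies_id = catalog.new_attribute("MOVIES", "ID")  # same name, different attribute
weight = catalog.new_attribute(EDGES, "WEIGHT")
```

Defining ID twice, once per node type, yields two independent attributes with different identifiers, as in the PEOPLE and MOVIES example.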

Sparksee attributes are defined for a domain or data type; all values of an attribute belong to a specified data type with the exception of the null value, which does not belong to any data type. Valid Sparksee data types are:

Boolean: TRUE or FALSE values.
Integer: 32-bit signed integer values.
Long: 64-bit signed integer values.
Double: 64-bit signed double-precision floating-point values.
Timestamp: Distance from Epoch (UTC) with millisecond precision. Valid timestamps must be within the range [‘1970-01-01 00:00:01’ UTC, ‘2038-01-19 03:14:07’ UTC].
String: Unicode string values. Maximum length is restricted to 2048 characters.
Text: Large Unicode character object. See ‘Text attributes’ in the ‘Attributes and values’ section of the ‘API’ chapter.
OID: Sparksee object identifier values.

Moreover, Sparksee attributes are univalued, which means that an object (node or edge) can only have one value for an attribute. Note that null may also be that value. For multiple values, use an Array attribute (see ‘Array attributes’ in the ‘Attributes and values’ section of the ‘API’ chapter).
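The limits stated for the data types above can be expressed as a simple validity check. This is a plain Python sketch, not Sparksee code: the function is_valid is hypothetical, it covers only the types whose limits are given in the list, and it assumes that the 2048-character limit counts Unicode code points.

```python
from datetime import datetime, timezone

# Bounds copied from the Timestamp description above.
TS_MIN = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
TS_MAX = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)


def is_valid(data_type, value):
    """Return True if value fits the stated limits of data_type."""
    if value is None:
        return True  # null is accepted whatever the data type
    if data_type == "Boolean":
        return isinstance(value, bool)
    if data_type in ("Integer", "Long"):
        bits = 32 if data_type == "Integer" else 64
        return (isinstance(value, int) and not isinstance(value, bool)
                and -2 ** (bits - 1) <= value <= 2 ** (bits - 1) - 1)
    if data_type == "Double":
        return isinstance(value, float)
    if data_type == "String":
        # Assumption: the limit is counted in code points.
        return isinstance(value, str) and len(value) <= 2048
    if data_type == "Timestamp":
        return (isinstance(value, datetime) and value.tzinfo is not None
                and TS_MIN <= value <= TS_MAX)
    raise ValueError(f"data type not covered by this sketch: {data_type}")
```

For example, is_valid("Integer", 2**31) is rejected because the value does not fit in a 32-bit signed integer, while the same value is a valid Long.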

Figure 2.2 shows the attributes extracted from Figure 2.1. PEOPLE nodes have an Id and a Name, MOVIES nodes have an Id and a Title, and edges of type CAST have an attribute Character that holds the name of the character played by that actor in the movie. Note that we refer to the attributes by the attribute name.

Indexing

Attributes

Different index capabilities can be set for each Sparksee attribute. Depending on these capabilities there are three types of attributes:

Basic attributes: there is no index associated with the attribute.
Indexed attributes: there is an index associated with the attribute, automatically maintained by the system.
Unique attributes: the same as indexed attributes but with an added integrity restriction: two different objects cannot have the same value, with the exception of the null value.

Sparksee operations accessing the graph through an attribute will automatically use the defined index, significantly improving the performance of the operation. Note that only a single index can be associated with an attribute.
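The difference between basic, indexed and unique attributes can be sketched as follows. This is an illustrative model in plain Python, not how Sparksee implements its indexes: the Attribute class and its set and select methods are hypothetical, and the index is a simple dictionary from value to the set of OIDs holding that value.

```python
from collections import defaultdict


class Attribute:
    """Hypothetical attribute with one of three index capabilities."""

    def __init__(self, kind="basic"):
        if kind not in ("basic", "indexed", "unique"):
            raise ValueError(f"unknown attribute kind: {kind}")
        self.kind = kind
        self.values = {}               # oid -> value
        self.index = defaultdict(set)  # value -> oids (unused when basic)

    def set(self, oid, value):
        if self.kind == "unique" and value is not None:
            # Null is exempt: any number of objects may hold it.
            if self.index.get(value, set()) - {oid}:
                raise ValueError("value already used by another object")
        if self.kind != "basic":
            if oid in self.values:
                self.index[self.values[oid]].discard(oid)
            self.index[value].add(oid)
        self.values[oid] = value

    def select(self, value):
        """OIDs of the objects whose value equals the given one."""
        if self.kind == "basic":
            # No index: every stored value has to be examined.
            return {o for o, v in self.values.items() if v == value}
        return set(self.index.get(value, ()))


name = Attribute("unique")
name.set(1, "Woody Allen")
name.set(2, None)
name.set(3, None)  # allowed: null is not subject to the restriction
```

A call such as name.set(4, "Woody Allen") raises an error in this sketch, because object 1 already holds that value.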

Edges

A specific index can also be defined to improve certain navigational operations. The neighbor index can be set for a specific edge type; the neighbor API (see the ‘Navigation operations’ section of the ‘API’ chapter) then uses it automatically, significantly improving the performance of this operation.
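The benefit of a neighbor index can be illustrated by comparing a scan over all edges with a precomputed lookup. This is a conceptual sketch in plain Python, not the Sparksee implementation: the function names and the (type, tail, head) edge tuples with small integer OIDs are hypothetical, and only outgoing neighbors of directed edges are considered.

```python
from collections import defaultdict

# Hypothetical edges as (edge type, tail OID, head OID).
edges = [
    ("DIRECTS", 1, 10),
    ("DIRECTS", 1, 11),
    ("CAST", 2, 10),
    ("DIRECTS", 3, 12),
]


def neighbors_scan(edges, edge_type, node):
    """Without an index: examine every edge on every call."""
    return {head for t, tail, head in edges if t == edge_type and tail == node}


def build_neighbor_index(edges, edge_type):
    """With an index: one pass up front, then a direct lookup per node."""
    index = defaultdict(set)
    for t, tail, head in edges:
        if t == edge_type:
            index[tail].add(head)
    return index


directs_index = build_neighbor_index(edges, "DIRECTS")
```

Both approaches return the same neighbors; the index trades extra storage and maintenance work for faster navigation.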

Processing

A Sparksee-based application is able to manage more than one database, each of them working independently. It is important to keep in mind that a single database can be accessed by only one application or process at a time. Also, the connection to the database (the open operation) can only be made once.

Access to the database must be enclosed within a session, and multiple sessions can concurrently access the same database.

Sessions

A session is a stateful period of a user’s activity with a database; it can also be described as an instance of database usage.

Whereas a database can be shared among multiple threads, a session cannot, because it is not thread-safe. All manipulation of a database, including any operation on its graph, must be enclosed within a session.

Session responsibilities include management of transactions and temporary data.

Figure 2.3: Sparksee application architecture

Figure 2.3 shows a basic Sparksee-based application architecture in which the application manages multiple databases, each database is accessed by multiple threads, and each thread handles its own session.
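One common way to respect the rule that a session must not be shared between threads is to keep one session per thread. The sketch below shows that pattern in plain Python with thread-local storage; it is not Sparksee code, and the Database and Session classes and the session method are hypothetical stand-ins.

```python
import threading


class Session:
    """Stand-in for a non-thread-safe session object."""

    def __init__(self):
        self.owner = threading.get_ident()


class Database:
    """Shared by all threads; hands each thread its own session."""

    def __init__(self):
        self._local = threading.local()

    def session(self):
        # Created lazily the first time a thread asks for it.
        if not hasattr(self._local, "session"):
            self._local.session = Session()
        return self._local.session


db = Database()
seen = []                 # sessions obtained by the worker threads
seen_lock = threading.Lock()


def worker():
    s = db.session()
    assert s is db.session()  # the same thread always gets the same session
    with seen_lock:
        seen.append(s)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker thread ends up with a distinct Session object, while the Database object itself is shared.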

Transactions

A Sparksee transaction encloses a set of operations and defines the granularity level for the concurrent execution of sessions.

There are two types of transactions: Read or Shared, and Write or Exclusive. Sparksee’s concurrency model is based on the N-readers 1-writer model, meaning that multiple read transactions can be executed concurrently whereas write transactions are executed exclusively.

A transaction started with the ‘begin’ instruction is self-defined by the operations it contains. It starts as a read transaction and automatically becomes a write transaction as soon as a method updates the persistent graph database. Before it can become a write transaction, all other read transactions must have finished. You can also start a write transaction directly by using the ‘beginUpdate’ instruction instead. This avoids any possible lost update problem, but keep Sparksee’s concurrency model in mind when creating this type of transaction.
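The N-readers 1-writer model and the read-to-write transition can be sketched with a readers-writer lock. This is a conceptual model in plain Python, not Sparksee's implementation: the classes ReadersWriterLock and Transaction and their methods are hypothetical, and the upgrade is assumed to release the shared lock before waiting for the exclusive one, which is one way a lost update could arise.

```python
import threading


class ReadersWriterLock:
    """Many concurrent readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer waits until no reader and no other writer is active.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Transaction:
    def __init__(self, lock):
        self._lock = lock
        self.mode = None

    def begin(self):         # starts as a read (shared) transaction
        self._lock.acquire_read()
        self.mode = "read"

    def begin_update(self):  # starts directly as a write (exclusive) transaction
        self._lock.acquire_write()
        self.mode = "write"

    def update(self):
        if self.mode == "read":
            # Assumed upgrade: another writer could run between these two
            # calls, so data read earlier may be stale by the time we write.
            self._lock.release_read()
            self._lock.acquire_write()
            self.mode = "write"

    def commit(self):
        if self.mode == "write":
            self._lock.release_write()
        elif self.mode == "read":
            self._lock.release_read()
        else:
            raise RuntimeError("no active transaction")
        self.mode = None


lock = ReadersWriterLock()
t1, t2 = Transaction(lock), Transaction(lock)
t1.begin()
t2.begin()
concurrent_readers = lock._readers  # both read transactions are active
t2.commit()                         # the other reader must finish first
t1.update()                         # t1 now holds the exclusive lock
t1.commit()
```

Starting with begin_update takes the exclusive lock from the outset, so there is no window between reading and writing, at the cost of blocking all other transactions for the whole duration.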

Users can manage transactions in two different ways:

Explicit use of transactions: The user explicitly calls the beginning and end (commit or rollback) of a transaction (see ‘Transactions’ section of the ‘API’ chapter). All operations between the beginning and the end of the transaction are grouped within a single transaction.
Autocommitted mode: A transaction is automatically started before each operation and automatically closed when the operation finishes. This results in transactions that each contain a single operation.

Explicit use of transactions may improve the performance of concurrently executed sessions, so it is highly recommended.
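The difference between the two modes can be made concrete by counting how many transactions are started for the same sequence of operations. This is a simplified sketch in plain Python, not the Sparksee API: TxSession and its begin, commit and operation methods are hypothetical, and rollback is left out.

```python
class TxSession:
    """Hypothetical session that counts the transactions it starts."""

    def __init__(self):
        self.transactions_started = 0
        self._in_tx = False

    def begin(self):
        if self._in_tx:
            raise RuntimeError("transaction already active")
        self._in_tx = True
        self.transactions_started += 1

    def commit(self):
        if not self._in_tx:
            raise RuntimeError("no active transaction")
        self._in_tx = False

    def operation(self, work):
        # Outside an explicit transaction, wrap the operation in its own one.
        autocommit = not self._in_tx
        if autocommit:
            self.begin()
        try:
            return work()
        finally:
            if autocommit:
                self.commit()


auto = TxSession()
for _ in range(3):
    auto.operation(lambda: None)  # three operations, three transactions

explicit = TxSession()
explicit.begin()
for _ in range(3):
    explicit.operation(lambda: None)  # three operations, one transaction
explicit.commit()
```

In this model the autocommitted session starts three transactions and the explicit one starts a single transaction for the same work.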

Temporary data

Some operations may require the use of temporary data. This temporary data is automatically managed by the session, which removes it when the session is closed. For this reason, temporary data may also be referred to as session data.

Large collections of object identifiers, their iterators and session attributes are examples of temporary data.

Whereas attributes are persistent in the graph database, session attributes are temporary and exclusive to a session:

As they are automatically removed when the session is closed, these attributes do not require a user-provided name identifier.
Session attributes can be global too.
Session attributes are only manipulated within their session, so the other sessions cannot see them.
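The lifetime of session attributes can be sketched with a session object that discards its temporary data on close. This is an illustration in plain Python, not the Sparksee API: the Session class and its new_session_attribute, set_value and get_value methods are hypothetical.

```python
class Session:
    """Hypothetical session that owns its temporary (session) attributes."""

    def __init__(self):
        self._session_attrs = {}  # attribute identifier -> {oid: value}
        self._next_id = 1
        self.closed = False

    def new_session_attribute(self):
        # No user-provided name: the generated identifier is enough.
        attr_id = self._next_id
        self._next_id += 1
        self._session_attrs[attr_id] = {}
        return attr_id

    def set_value(self, attr_id, oid, value):
        self._session_attrs[attr_id][oid] = value

    def get_value(self, attr_id, oid):
        return self._session_attrs[attr_id].get(oid)

    def close(self):
        self._session_attrs.clear()  # temporary data dies with the session
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


with Session() as s1, Session() as s2:
    visited = s1.new_session_attribute()
    s1.set_value(visited, 42, True)
    seen_in_s1 = s1.get_value(visited, 42)
    # s2 keeps its own, separate storage and never sees s1's attribute.
    visible_in_s2 = visited in s2._session_attrs
```

After the with block both sessions are closed and the values stored through them are gone.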

Sparksee User Manual

by Sparsity Technologies
