Maintenance and monitoring

In this chapter the database administrator can learn about the functionality that Sparksee offers to maintain and monitor Sparksee databases.

We would like to place particular emphasis on the Recovery functionality, which helps the administrator keep an automatic, safe copy of the database at all times.

Backup

Sparksee provides functionality for performing a cold backup and restoring a database which has been previously backed up.

During a cold backup, the database is closed or locked and not available to users. The data files do not change during the backup process, so the database is in a consistent state when it is returned to normal operation.

The method Graph#backup performs a full backup by writing all the content of the database into a given file path and Sparksee#restore creates a new Database instance from a backup file.

The alternative Graph#encryptedBackup and Sparksee#restoreEncryptedBackup methods are also available to create and restore AES-encrypted backup files.

The following code blocks provide an example of this functionality:

[Java]
// perform backup

Graph graph = sess.getGraph();
...
graph.backup("database.gdb.back");
...
sess.close();

// restore backup

Sparksee sparksee = new Sparksee(new SparkseeConfig());
Database db = sparksee.restore("database.gdb", "database.gdb.back");
Session sess = db.newSession();
Graph graph = sess.getGraph();
...
sess.close();
db.close();
sparksee.close();
[C#]
// perform backup

Graph graph = sess.GetGraph();
...
graph.Backup("database.gdb.back");
...
sess.Close();

// restore backup

Sparksee sparksee = new Sparksee(new SparkseeConfig());
Database db = sparksee.Restore("database.gdb", "database.gdb.back");
Session sess = db.NewSession();
Graph graph = sess.GetGraph();
...
sess.Close();
db.Close();
sparksee.Close();
[C++]
// perform backup

Graph * graph = sess->GetGraph();
...
graph->Backup(L"database.gdb.back");
...
delete sess;

// restore backup

SparkseeConfig cfg;
Sparksee * sparksee = new Sparksee(cfg);
Database * db = sparksee->Restore(L"database.gdb", L"database.gdb.back");
Session * sess = db->NewSession();
Graph * graph = sess->GetGraph();
...
delete sess;
delete db;
delete sparksee;
[Python]
# perform backup

graph = sess.get_graph()
...
graph.backup("database.gdb.back")
...
sess.close()

# restore backup

sparks = sparksee.Sparksee(sparksee.SparkseeConfig())
db = sparks.restore("database.gdb", "database.gdb.back")
sess = db.new_session()
graph = sess.get_graph()
...
sess.close()
db.close()
sparks.close()
[Objective-C]
// perform backup
STSGraph * graph = [sess getGraph];
...
[graph backup: @"database.gdb.back"];
...
[sess close];
[db close];
[sparksee close];
//[sparksee release];

// restore backup
STSSparkseeConfig * cfg = [[STSSparkseeConfig alloc] init];
STSSparksee * sparksee = [[STSSparksee alloc] initWithConfig: cfg];
//[cfg release];
STSDatabase * db = [sparksee restore: @"database.gdb" backupFile: @"database.gdb.back"];
STSSession * sess = [db createSession];
STSGraph * graph = [sess getGraph];
...
[sess close];
[db close];
[sparksee close];
//[sparksee release];

Note that OIDs (object identifiers) for both node and edge objects will be the same when the database is restored; however, type or attribute identifiers may differ.

Take into consideration that although a backup does not update the database, it works as a write operation. As Sparksee’s concurrency model only accepts one write transaction at a time (see more details in the ‘Processing’ section of the ‘Graph database’ chapter), this operation blocks any other transaction.
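The encrypted variants follow the same pattern. The Java sketch below assumes that Graph#encryptedBackup and Sparksee#restoreEncryptedBackup take a hex-encoded AES key and initialization vector in addition to the file paths; the exact signatures and parameter order should be checked in the reference manual.

[Java]
// perform an encrypted backup (the key and IV values are only illustrative)
String key = "00112233445566778899aabbccddeeff";
String iv = "000102030405060708090a0b0c0d0e0f";
graph.encryptedBackup("database.gdb.back", key, iv);
...
// restore the encrypted backup into a new database file
Database db = sparksee.restoreEncryptedBackup("database.gdb", "database.gdb.back", key, iv);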

Recovery

Sparksee includes an automatic recovery manager which keeps the database safe against any eventuality. In case of application or system failures, the recovery manager is able to bring the database back to a consistent state on the next restart.

By default, the recovery functionality is disabled, so in order to use it the user must enable and configure the manager. The recovery manager introduces a small performance penalty, so there is always a trade-off between the functionality it provides and a minor decrease in performance.

The configuration includes:

This configuration can be performed with the SparkseeConfig class or by setting the values in a Sparksee configuration file. This is explained in detail in the ‘Recovery’ section of the ‘Configuration’ chapter.
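As a minimal Java sketch, the recovery manager could be enabled programmatically before creating the Sparksee instance. The setter names below are assumptions mirroring the recovery settings described in the ‘Configuration’ chapter; check SparkseeConfig in the reference manual for the exact methods and units.

[Java]
SparkseeConfig cfg = new SparkseeConfig();
// Assumed setters mirroring the recovery configuration settings:
cfg.setRecoveryEnabled(true);             // turn the recovery manager on
cfg.setRecoveryLogFile("recovery.log");   // where the recovery log is written
cfg.setRecoveryCheckpointTime(60);        // checkpoint frequency (check units in the reference manual)
Sparksee sparksee = new Sparksee(cfg);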

Runtime information

Logging

It is possible to enable the logging of Sparksee activity. The log configuration requires both the level and the log file path.

This configuration can be performed with the SparkseeConfig class or by setting the values in a Sparksee configuration file. This is explained in detail in the ‘Log’ section of the ‘Configuration’ chapter.
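For example, the log can be configured programmatically as sketched below, assuming the SparkseeConfig#setLogLevel and SparkseeConfig#setLogFile setters; the equivalent configuration-file keys are described in the ‘Log’ section of the ‘Configuration’ chapter.

[Java]
SparkseeConfig cfg = new SparkseeConfig();
cfg.setLogLevel(LogLevel.Info);   // one of the LogLevel values listed below
cfg.setLogFile("sparksee.log");   // path where the log is written
Sparksee sparksee = new Sparksee(cfg);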

Current valid Sparksee log levels are defined in the LogLevel enum class. This is the list of values, ordered from least to most verbose:

  1. Off

    Log is disabled.

  2. Severe

    The log only stores errors.

  3. Warning

    Errors and situations which may require special attention are included in the log file.

  4. Info

    Errors, warnings and informational messages are stored.

  5. Config

    Log includes configuration details of the different components.

  6. Fine

    This is the most complete log level; it includes the previous levels of logging plus additional platform details.

  7. Debug

    Log debug information. It only works for a debug version of the library, so it can only be used by developers.

Dumps

There are two methods to dump a summary of the content from a Sparksee database.

Both files are written using YAML, a human-readable data serialization format.
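A minimal Java sketch of dumping both summaries, assuming the dump methods are Graph#dumpData and Graph#dumpStorage; check the reference manual for the exact names and signatures.

[Java]
Graph graph = sess.getGraph();
// Summary of the logical content of the database
graph.dumpData("data_summary.yaml");
// Summary of the internal storage structures
graph.dumpStorage("storage_summary.yaml");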

Statistics

Sparksee offers a set of runtime statistics for its different components. Before using each statistics method, it is recommended to check the corresponding class in the reference manual of the chosen programming language.

Database statistics

The class DatabaseStatistics provides general information about the database:

Use the Database#getStatistics method to retrieve this information.

Platform statistics

The class PlatformStatistics provides general information about the platform where Sparksee is running:

Use the Platform#getStatistics method to retrieve this information.
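A minimal Java sketch of retrieving both statistics objects; the individual getters of DatabaseStatistics and PlatformStatistics are listed in the reference manual.

[Java]
// Database statistics for an open Database instance
DatabaseStatistics dbStats = db.getStatistics();
// Platform statistics (assumed here to be a static helper on the Platform class)
PlatformStatistics platformStats = Platform.getStatistics();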

Attribute statistics

The class AttributeStatistics provides information about a certain attribute:

For numerical attributes (integer, long and double) it also includes:

For string attributes it also includes:

Use the Graph#getAttributeStatistics method to retrieve this information. Note that the method has a boolean argument to specify whether basic (TRUE) or complete (FALSE) statistics must be retrieved for that datatype. Check the reference manual to see which statistics are considered basic.

The administrator may also want to know how many objects have an attribute value within a certain range; in that case the method Graph#getAttributeIntervalCount is the most appropriate.

Note that neither method works for Basic attributes; statistics can only be retrieved for Indexed or Unique attributes. See the ‘API’ chapter for more details on the attribute types.
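A minimal Java sketch, assuming an Indexed integer attribute identified by attrId; the exact signature of Graph#getAttributeIntervalCount (Value bounds plus inclusion flags) should be checked in the reference manual.

[Java]
// Basic statistics (true); pass false for the complete set
AttributeStatistics stats = graph.getAttributeStatistics(attrId, true);

// Count the objects whose attribute value lies in [18, 65]
Value low = new Value();
low.setInteger(18);
Value high = new Value();
high.setInteger(65);
long count = graph.getAttributeIntervalCount(attrId, low, true, high, true);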

Cache statistics

Finally, it is also possible to enable logging of the cache to monitor its activity. By default, cache logging is disabled, so it must be enabled and configured first. This configuration can be performed with the SparkseeConfig class or by setting the values in a Sparksee configuration file. This is explained in detail in the ‘Log’ section of the ‘Configuration’ chapter.
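As an illustration, the cache statistics could be enabled programmatically as sketched below; the setter names are assumptions mirroring the cache statistics settings of the ‘Configuration’ chapter, so check SparkseeConfig in the reference manual for the actual methods.

[Java]
SparkseeConfig cfg = new SparkseeConfig();
cfg.setCacheStatisticsEnabled(true);            // assumed setter: turn cache statistics on
cfg.setCacheStatisticsFile("cache_stats.log");  // assumed setter: statistics log file
cfg.setCacheStatisticsSnapshotTime(1000);       // assumed setter: snapshot period (check units)
Sparksee sparksee = new Sparksee(cfg);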

The configuration of the cache statistics includes:

The cache statistics log includes:

Checksums

Sparksee computes a checksum every time a page is read from or written to disk. This checksum allows Sparksee to detect and report external I/O data corruption.

The type of checksum Sparksee uses is a 4-byte CRC-32 checksum. The checksum is computed for every page each time it is written to disk and is stored in the page header. When a page is read, a checksum of the page is computed and compared to the one stored in its header. If the two checksums disagree, the read of the page is retried and the checksum compared again. If the discrepancy persists, an Unrecoverable Error is reported, as explained in the ‘API’ chapter.

Additionally, Sparksee provides a mechanism to verify the integrity of a database image with respect to checksums, via the Sparksee object.

[Java]
Sparksee sparksee = new Sparksee(new SparkseeConfig());
boolean success = sparksee.verifyChecksums("database.gdb");
if(!success) {
  ...
}
sparksee.close();
[C#]
Sparksee sparksee = new Sparksee(new SparkseeConfig());
bool success = sparksee.VerifyChecksums("database.gdb");
if(!success) {
  ...
}
sparksee.Close();
[C++]
SparkseeConfig cfg;
Sparksee * sparksee = new Sparksee(cfg);
bool success = sparksee->VerifyChecksums(L"database.gdb");
if(!success)
{
  ...
}
delete sparksee;
[Python]
sparks = sparksee.Sparksee(sparksee.SparkseeConfig())
success = sparks.verify_checksums("database.gdb")
if not success:
  ...
sparks.close()
[Objective-C]
STSSparkseeConfig * cfg = [[STSSparkseeConfig alloc] init];
STSSparksee * sparksee = [[STSSparksee alloc] initWithConfig: cfg];
//[cfg release];
BOOL success = [sparksee verifyChecksums: @"database.gdb"];
if(!success)
  ...
[sparksee close];
//[sparksee release];

Tools

We distribute Sparksee with a set of maintenance tools.

GDBCheck

Verifies the checksum integrity of a database image, and returns an error code if the image appears to be corrupted.

   ./GDBCheck  [-c <cfg file>] -g <gdb file> [--] [--version] [-h]


Where: 

   -c <cfg file>,  --cfg <cfg file>
     Sparksee config file

   -g <gdb file>,  --gdb <gdb file>
     (required)  Database file

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.

GDBConf

Tool used to change configuration parameters of a given database.

   ./GDBConf  [-d] [-e] [-n] [-k] [-c <cfg file>] -g <gdb file> [--]
              [--version] [-h]

Where: 

   -d,  --re
     Remove encryption

   -e,  --ae
     Add encryption

   -n,  --rc
     Remove checksums

   -k,  --ac
     Add checksums

   -c <cfg file>,  --cfg <cfg file>
     Sparksee config file

   -g <gdb file>,  --gdb <gdb file>
     (required)  Database file

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.

If encryption is enabled, the key and the IV must also be provided in order to add or remove checksums.
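Simple arguments example to add checksums to a database (the file name is only illustrative):

   GDBConf -k -g database.gdb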

GDBBackup

Tool to create and restore backups.


   ./GDBBackup  [-c <cfg file>] [-r] [-i <hex encoded iv>] [-k <hex encoded
                key>] -b <backup file> -g <gdb file> [--] [--version] [-h]


Where:

   -c <cfg file>,  --cfg <cfg file>
     Sparksee config file

   -r,  --restore
     Restore mode (will overwrite the DB file)!

   -i <hex encoded iv>,  --iv <hex encoded iv>
     Backup encryption initialization vector

   -k <hex encoded key>,  --key <hex encoded key>
     Backup encryption key

   -b <backup file>,  --backup <backup file>
     (required)  Backup file

   -g <gdb file>,  --gdb <gdb file>
     (required)  Database file

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.




Simple Backup arguments example:

   GDBBackup -g sourceDB.gdb -b targetBackup.back

Simple Restore arguments example:

   GDBBackup -r -g targetDB.gdb -b sourceBackup.back

sparkseecli

Tool to execute queries on a Sparksee image using any of the supported languages.

   ./sparkseecli  [-l <algebra|cypher>] [--ro] [--create] [-r <numRows>]
                  [-e <command>] [-b] [-s <file>] [-c <cfg file>] -g
                  <filename> [--] [--version] [-h]


Where:

   -l <algebra|cypher>,  --lang <algebra|cypher>
     Query language

   --ro
     Open the DB in READ ONLY mode

   --create
     Create a new DB

   -r <numRows>,  --rows <numRows>
     Number of output rows limit

   -e <command>,  --execute <command>
     Execute command

   -b,  --batch
     Batch mode (non interactive)

   -s <file>,  --script <file>
     Script file

   -c <cfg file>,  --cfg <cfg file>
     Sparksee config file

   -g <filename>,  --gdb <filename>
     (required)  Database filename

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.




   Multiline commands are allowed. '/' marks the end of a command.

   Special commands for the console:

   quit             Exit the console

   CTRL+C           Force exit the console

   CTRL+A           Go to the beginning of the line

   CTRL+E           Go to the end of the line

   UP arrowkey      History up

   DOWN arrowkey    History down
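
Simple arguments example, executing a single query in batch mode (the query text is only illustrative):

   sparkseecli -g database.gdb -l cypher -b -e "MATCH (n) RETURN n"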

Best Practices

Effective Removal of Nodes and Edges

Sparksee uses the disk as a sequential storage unit. Removing information from the database (nodes, edges, types, etc.) does not free the used space; instead, the space is marked as removed so it can be reused by subsequent database additions.

However, there are some restrictions that apply when it comes to reusing the space of deleted information. Sparksee clusters the storage used for nodes, edges and other information into storage groups of elements. Such storage groups are the minimum unit of allocation and deallocation and their size depends on the data structure.

For this reason, in order to effectively deallocate the used space of an information item, all the elements allocated in the same storage group must be deallocated. Otherwise, the storage group cannot be deallocated and reused in future database insertions.

As mentioned above, the size of such groups depends on the data structure and the type of information to store. However, the following information can be useful when deciding which nodes and edges need to be removed in order to effectively free storage space for a later reuse:

In conclusion, removing nodes or edges with consecutive identifiers is the most effective way to free and reuse storage space. Since identifiers are assigned consecutively to nodes and edges of the same type as they are created, this means, for instance, that removing the oldest nodes and/or edges of a graph should effectively free the storage for later reuse.
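For instance, the Java sketch below removes a whole range of objects at once, assuming a node type identified by peopleTypeId and that Graph#drop accepts an Objects set (otherwise, iterate the set and drop each OID). Because the selected OIDs are consecutive, their storage groups can be completely deallocated and reused.

[Java]
// Select every object of the type and drop them in one operation
Objects people = graph.select(peopleTypeId);
graph.drop(people);
people.close();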

Encryption Key and IV storage

When creating or opening an image with encryption enabled, we recommend not providing the key and the IV through the sparksee.cfg configuration file but through the API, so that they are not exposed in a human-readable medium. The option to pass them through the sparksee.cfg file is meant to be used only for testing purposes.
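A minimal sketch of the recommended approach, supplying the key and the IV at runtime instead of writing them into sparksee.cfg. The setter names and the secrets-store helpers below are hypothetical and only illustrate the idea; check SparkseeConfig in the reference manual for the actual encryption methods.

[Java]
SparkseeConfig cfg = new SparkseeConfig();
// Hypothetical setters and helpers: the key and IV come from a secure
// store at runtime, never from a human-readable configuration file.
cfg.setEncryptionKey(loadKeyFromSecureStore());
cfg.setEncryptionIV(loadIvFromSecureStore());
Sparksee sparksee = new Sparksee(cfg);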

Upgrading from the previous release

Unless otherwise specified, you should always be able to upgrade your Sparksee to the next release, but you may not be able to skip version upgrades.

Usually only a major version number change implies a change in the database file format, but we highly recommend backing up your data before upgrading the Sparksee release. You can check the ‘Backup’ section above, and you can also keep a backup copy of your database files (make the copy while the database is closed).

If the database file format has changed with the new release, the database must be opened at least once without read-only mode in order to upgrade the files.

Upgrading from Sparksee 5 to 6

In addition to the general considerations for upgrading Sparksee, upgrading from version 5 to 6 requires a few changes in your database settings.

Version 6.0 of Sparksee is the first release to include checksum verification enabled by default and the new licensing system with mandatory identifiers. So, in order to open a Sparksee 5 database with Sparksee 6, you need to both disable the checksum verification and set up the license identifiers that you should have received.

You can find more information about checksums in the ‘Checksums’ section above and more information about how to set up both the checksums and the license in the ‘Configuration’ chapter.

Once you have successfully opened a Sparksee 5 database file with Sparksee 6, your database files should have been upgraded to the new format. Then you can choose to use the new Sparksee 6 features such as checksum verification or database encryption, but first you must add checksums and/or encryption to the database. You can do this using methods from the Sparksee class while the database is closed, but we recommend using the command-line tool GDBConf. More information can be found in the ‘Tools’ section.
