com.sparsity.dex.script
Class ScriptParser

java.lang.Object
  extended by com.sparsity.dex.script.ScriptParser

public class ScriptParser
extends java.lang.Object

ScriptParser.

ScriptParser creates schemas and loads data by executing the commands of a DEX script.

A DEX script contains an ordered list of commands, and ScriptParser executes them in that order. Commands create schemas, define node and edge types, and load data into a previously defined DEX schema.

There are six main commands: (i) database creation, 'create dbgraph', creates a new, empty schema in a DEX database and makes that database the target of the following operations; (ii) database usage, 'use dbgraph', opens an existing DEX database and makes it the target of the following operations; (iii) node type creation, 'create node', creates a node type in the database; (iv) edge type creation, 'create edge', creates an edge type in the database; (v) node data load, 'load nodes', loads a CSV file into the database as nodes; (vi) edge data load, 'load edges', loads a CSV file into the database as edges.

-- Schema definition --

This creates a DEX graph database:

CREATE (GDB|DBGRAPH) alias INTO filename

where alias is the name of the graph database to be created and filename is the path where the DEX database will be stored.

Instead of creating a new database, you can set an existing one as the operation database of the script:

USE (GDB|DBGRAPH) alias INTO filename

All following commands will be performed on the last created or used graph database.

This creates a node type:

CREATE NODE node_type_name "(" [attribute_name (INTEGER|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC] [DEFAULT value], ...] ")"

and this creates an edge type:

CREATE [UNDIRECTED] EDGE edge_type_name [FROM node_type_name TO node_type_name] "(" [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [DEFAULT value], ...] ")" [MATERIALIZE NEIGHBORS]
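When scripts are generated by another program, a small builder helps keep the emitted commands aligned with the two grammars above. The following is a minimal sketch: the class CreateCommandBuilder, its Attr record and its methods are hypothetical (not part of DEX), and it assumes that single-quoted identifiers are accepted wherever a name is expected, as in the quoted examples on this page. Names containing a quote character are not escaped.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not a DEX API) that renders CREATE NODE / CREATE EDGE text.
public class CreateCommandBuilder {

    /** One attribute definition: name, datatype keyword, optional index kind and default value. */
    public record Attr(String name, String type, String index, String defaultValue) {
        public Attr(String name, String type) {
            this(name, type, null, null);
        }
    }

    // Assumption: every identifier may be single-quoted, so reserved words are safe.
    static String quote(String identifier) {
        return "'" + identifier + "'";
    }

    // Renders "(name TYPE [INDEX] [DEFAULT v], ...)"; an empty list gives "()".
    static String attrList(List<Attr> attrs) {
        StringJoiner joiner = new StringJoiner(", ", "(", ")");
        for (Attr a : attrs) {
            StringBuilder sb = new StringBuilder(quote(a.name())).append(' ').append(a.type());
            if (a.index() != null) sb.append(' ').append(a.index());
            if (a.defaultValue() != null) sb.append(" DEFAULT ").append(a.defaultValue());
            joiner.add(sb.toString());
        }
        return joiner.toString();
    }

    public static String createNode(String type, List<Attr> attrs) {
        return "CREATE NODE " + quote(type) + " " + attrList(attrs);
    }

    public static String createEdge(String type, String from, String to, boolean undirected,
                                    boolean materializeNeighbors, List<Attr> attrs) {
        StringBuilder sb = new StringBuilder("CREATE ");
        if (undirected) sb.append("UNDIRECTED ");
        sb.append("EDGE ").append(quote(type));
        // FROM/TO is optional in the grammar; emit it only when both ends are given.
        if (from != null && to != null) {
            sb.append(" FROM ").append(quote(from)).append(" TO ").append(quote(to));
        }
        sb.append(' ').append(attrList(attrs));
        if (materializeNeighbors) sb.append(" MATERIALIZE NEIGHBORS");
        return sb.toString();
    }

    public static void main(String[] args) {
        // CREATE NODE 'DOG' ('NAME' STRING INDEXED, 'YEAR' INTEGER DEFAULT 2012)
        System.out.println(createNode("DOG", List.of(
                new Attr("NAME", "STRING", "INDEXED", null),
                new Attr("YEAR", "INTEGER", null, "2012"))));
        // CREATE UNDIRECTED EDGE 'MARRIED' FROM 'PERSON' TO 'PERSON' ('YEAR' INT) MATERIALIZE NEIGHBORS
        System.out.println(createEdge("MARRIED", "PERSON", "PERSON", true, true,
                List.of(new Attr("YEAR", "INT"))));
    }
}
```

Quoting every name unconditionally avoids maintaining a list of reserved words, at the cost of slightly noisier scripts.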

Here are some examples:

create gdb EXAMPLE into 'ex.dex'

use gdb WIKIPEDIA into 'wikipedia.dex'

create node TITLES (ID int unique, 'TEXT' string, NLC string, TITLE string indexed)

create node IMAGES (ID int unique, NLC string, FILENAME string indexed)

create edge REFS (NLC string, "TEXT" string, TYPE string)

create edge IMGS

create dbgraph FAMILY into 'family.dex'

create node PERSON (NAME string indexed, ID int unique, YEAR int)

create node DOG (NAME string indexed, YEAR int default 2012)

create edge CHILD from PERSON to PERSON (YEAR int)

create undirected edge MARRIED from PERSON to PERSON (YEAR int) materialize neighbors

create edge PET from PERSON to DOG () materialize neighbors

create gdb CARMODEL into 'cars.dex'

create node PERSON (NAME string, ID int unique, YEAR int)

create node CAR (MODEL string, ID int, OWNER int indexed)

Note that identifiers may be quoted, which makes it possible to use reserved words (such as TEXT in the examples above) as names.

Attributes can also be defined with a separate command:

CREATE ATTRIBUTE [type.]name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC] [DEFAULT value]

If no node or edge type name is given, it creates a global attribute.

-- Data node load --

The LOAD NODES command creates nodes and sets their attribute values from a CSV file. One new node is created for each CSV row.

By default, invalid data error messages are written to a new log file named after the node type. Alternatively, you can set a specific log file name (LOG logfile), abort at the first error instead of keeping a log (LOG ABORT), or turn off all invalid data error reporting (LOG OFF).

This is the command:

LOAD NODES file_name [LOCALE locale_name] COLUMNS attribute_name [alias_name], ... INTO node_type_name [IGNORE (attribute_name|alias_name), ...] [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE [maxExtralines]]] [FROM num] [MAX num] [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])] [LOG (OFF|ABORT|logfile)]
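The row-to-node mapping can be pictured with a small in-memory simulation. This is a hypothetical sketch, not DEX code: it assumes CSV fields are matched to the COLUMNS list by position, treats IGNORE as "read but not stored as an attribute", handles FIELDS TERMINATED/ENCLOSED with a simple splitter (no doubled enclosure characters, no multiline fields), records malformed rows in a list the way the default log mode keeps going, and leaves out aliases, LOCALE, FROM, MAX and MODE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative simulation of LOAD NODES; each "node" is a map of attribute -> raw value.
public class LoadNodesSketch {

    // Splits one CSV line on `terminator`; text between `enclosure` characters
    // stays in a single field. Doubled enclosures and multiline fields are not handled.
    public static List<String> splitRow(String line, char terminator, char enclosure) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean enclosed = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == enclosure) {
                enclosed = !enclosed;
            } else if (c == terminator && !enclosed) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // One node per well-formed row; ignored columns are parsed but not stored.
    public static List<Map<String, String>> loadNodes(List<String> csvLines, List<String> columns,
            Set<String> ignore, char terminator, char enclosure, List<String> errorLog) {
        List<Map<String, String>> nodes = new ArrayList<>();
        for (int row = 0; row < csvLines.size(); row++) {
            List<String> fields = splitRow(csvLines.get(row), terminator, enclosure);
            if (fields.size() != columns.size()) {
                // Invalid data: record the problem and continue with the next row.
                errorLog.add("row " + (row + 1) + ": expected " + columns.size()
                        + " fields, got " + fields.size());
                continue;
            }
            Map<String, String> node = new LinkedHashMap<>();
            for (int i = 0; i < columns.size(); i++) {
                if (!ignore.contains(columns.get(i))) {
                    node.put(columns.get(i), fields.get(i));
                }
            }
            nodes.add(node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<Map<String, String>> nodes = loadNodes(
                List.of("1,\"Main page\",en,Home", "2,\"A, B\",en,Letters", "3,broken"),
                List.of("ID", "TEXT", "NLC", "TITLE"), Set.of("NLC"), ',', '"', errors);
        System.out.println(nodes);  // two nodes, without the NLC column
        System.out.println(errors); // the malformed third row
    }
}
```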

Here are some examples:

load nodes 'titles.csv' columns ID, 'TEXT', NLC, TITLE into TITLES

load nodes 'images.csv' columns ID, NLC, FILENAME into IMAGES from 2 max 10000 mode columns

load nodes 'people.csv' locale 'en_US.utf8' columns *, DNI, NAME, AGE, *, ADDRESS into PEOPLE mode rows

-- Data edge load --

The LOAD EDGES command creates edges between existing nodes and sets their attribute values from a CSV file. One new edge is created for each CSV row.

By default, invalid data error messages are written to a new log file named after the edge type. Alternatively, you can set a specific log file name (LOG logfile), abort at the first error instead of keeping a log (LOG ABORT), or turn off all invalid data error reporting (LOG OFF).

LOAD EDGES file_name [LOCALE locale_name] COLUMNS attribute_name [alias_name], ... INTO edge_type_name [IGNORE (attribute_name|alias_name), ...] WHERE TAIL (attribute_name|alias_name) = node_type_name.attribute_name HEAD (attribute_name|alias_name) = node_type_name.attribute_name [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE [maxExtralines]]] [FROM num] [MAX num] [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])] [LOG (OFF|ABORT|logfile)]

The TAIL clause identifies the tail node of each edge: the value read from the given CSV column (attribute_name or alias_name) is looked up in the attribute node_type_name.attribute_name, and the node that holds that value becomes the tail. The HEAD clause identifies the head node in the same way.
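The endpoint lookup can be sketched with plain maps. In this hypothetical example (not DEX internals), node ids are longs, an attribute is a map from node id to value, and each value is assumed to identify at most one node, as with a UNIQUE attribute such as TITLES.ID. Treating a row whose tail or head value matches no node as invalid data (logged and skipped, or stopping the load when abort is set, loosely mirroring LOG ABORT) is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of resolving WHERE TAIL ... HEAD ... against node attributes.
public class EdgeEndpointSketch {

    /** A resolved edge between two node ids. */
    public record Edge(long tail, long head) {}

    // Builds a value -> node id lookup; assumes values are unique (last one wins otherwise).
    public static Map<String, Long> index(Map<Long, String> attributeValues) {
        Map<String, Long> byValue = new HashMap<>();
        attributeValues.forEach((nodeId, value) -> byValue.put(value, nodeId));
        return byValue;
    }

    // Each row is {tailValue, headValue}, read from the columns named in TAIL and HEAD.
    public static List<Edge> resolve(List<String[]> rows, Map<String, Long> tailIndex,
                                     Map<String, Long> headIndex, boolean abort,
                                     List<String> errorLog) {
        List<Edge> edges = new ArrayList<>();
        for (String[] row : rows) {
            Long tail = tailIndex.get(row[0]);
            Long head = headIndex.get(row[1]);
            if (tail == null || head == null) {
                String msg = "no node for tail=" + row[0] + " or head=" + row[1];
                if (abort) {
                    throw new IllegalStateException(msg); // stop at the first error
                }
                errorLog.add(msg); // keep going, recording the invalid row
                continue;
            }
            edges.add(new Edge(tail, head));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Hypothetical TITLES.ID values keyed by internal node id.
        Map<String, Long> titlesById = index(Map.of(100L, "1", 101L, "2"));
        List<String> errors = new ArrayList<>();
        List<Edge> edges = resolve(
                List.of(new String[] {"1", "2"}, new String[] {"2", "9"}),
                titlesById, titlesById, false, errors);
        System.out.println(edges);  // one edge, from node 100 to node 101
        System.out.println(errors); // the row whose head value 9 matches no node
    }
}
```

Building the value-to-node map once per referenced attribute keeps each row lookup constant-time, which is the point of declaring such attributes UNIQUE or INDEXED.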

Here are some examples:

load edges 'references.csv' columns NLC, 'TEXT', TYPE, FROM F, TO T into REFS ignore F, T where tail F = TITLES.ID head T = TITLES.ID mode columns split partitions 3

load edges 'imagesReferences.csv' locale 'es_ES.iso88591' columns From, To into IMGS ignore From, To where tail From = TITLES.ID HEAD To = IMAGES.ID mode rows

load edges 'calls.gz' columns From, To, Time, Long into CALLS ignore From, To where tail From = PEOPLE.DNI head To = PEOPLE.DNI mode columns

-- Schema update --

Schema update commands modify the schema of an existing graph database. Currently it is possible to remove node types, edge types, and attributes, and to change the indexing of node attributes.

DROP (NODE|EDGE) name

DROP ATTRIBUTE [type_name.]attribute_name

INDEX [type_name.]attribute_name [INDEXED|UNIQUE|BASIC]

When no type_name is given, the command refers to a global attribute.
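The [type_name.]attribute_name convention can be illustrated with a tiny parser. The AttributeRef record below is hypothetical, not part of DEX; it splits on the first dot and does not handle quoted names that themselves contain a dot.

```java
import java.util.Optional;

// Hypothetical parser for "[type_name.]attribute_name" references.
public record AttributeRef(Optional<String> typeName, String attributeName) {

    public static AttributeRef parse(String ref) {
        int dot = ref.indexOf('.');
        if (dot < 0) {
            // No type prefix: the reference names a global attribute.
            return new AttributeRef(Optional.empty(), ref);
        }
        return new AttributeRef(Optional.of(ref.substring(0, dot)), ref.substring(dot + 1));
    }

    public boolean isGlobal() {
        return typeName.isEmpty();
    }

    public static void main(String[] args) {
        AttributeRef typed = parse("PEOPLE.DNI");
        System.out.println(typed.typeName().orElse("(global)") + " / " + typed.attributeName()); // PEOPLE / DNI
        System.out.println(parse("GLOBAL_ID").isGlobal()); // true
    }
}
```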

Examples:

drop edge REFS

drop node 'TITLES'

drop attribute PEOPLE.DNI

drop attribute GLOBAL_ID

index PEOPLE.NAME indexed

index CAR.ID unique

-- Timestamp Format --

The timestamp format can be set with the command:

SET TIMESTAMP FORMAT timestamp_format_string

After this command, all timestamp data are loaded using the specified format.

Valid format fields:

yyyy -> Year

yy -> Year without century (resolved to within 80 years before or 20 years after the current year)

MM -> Month [1..12]

dd -> Day of month [1..31]

hh -> Hour [0..23]

mm -> Minute [0..59]

ss -> Second [0..59]

SSS -> Millisecond [0..999]

For parsing with the pattern 'yy', the parser determines the full year relative to the current year: it assumes that the two-digit year falls within 80 years before or 20 years after the time of processing. For example, if the current year is 2007, the value 01/11/12 parsed with the pattern MM/dd/yy yields January 11, 2012, while 05/04/64 yields May 4, 1964.
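The two-digit year rule can be written as a small function. This is a sketch of the rule as described, not DEX's code; the class name is hypothetical, and the window is taken as the 100 years [currentYear - 80, currentYear + 20), which is an assumption about the exact boundary handling.

```java
// Sketch of resolving a 'yy' value relative to the current year.
public class TwoDigitYear {

    public static int resolve(int yy, int currentYear) {
        if (yy < 0 || yy > 99) {
            throw new IllegalArgumentException("yy must be in 0..99");
        }
        int windowStart = currentYear - 80;                          // earliest accepted year
        int century = windowStart - Math.floorMod(windowStart, 100); // e.g. 1927 -> 1900
        int year = century + yy;
        if (year < windowStart) {
            year += 100; // before the window: the value belongs to the next century
        }
        return year;
    }

    public static void main(String[] args) {
        System.out.println(resolve(12, 2007)); // 2012
        System.out.println(resolve(64, 2007)); // 1964
    }
}
```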

Default formats accepted when this command is not present:

"yyyy-MM-dd hh:mm:ss.SSS"

"yyyy-MM-dd hh:mm:ss"

"yyyy-MM-dd"

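Accepting any of the three default formats can be sketched with java.time. Because the documented 'hh' field is an hour in [0..23], it corresponds to 'HH' in java.time patterns. The class name is hypothetical and this is not the DEX parser; since the three formats never match the same text, trying them one after another is enough, and a value matching none of them raises DateTimeParseException.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch: parse a timestamp with the default DEX formats, translated to java.time patterns.
public class DefaultTimestampFormats {

    private static final DateTimeFormatter WITH_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final DateTimeFormatter WITH_SECONDS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static LocalDateTime parse(String text) {
        for (DateTimeFormatter f : new DateTimeFormatter[] {WITH_MILLIS, WITH_SECONDS}) {
            try {
                return LocalDateTime.parse(text, f);
            } catch (DateTimeParseException ignored) {
                // not this format; try the next one
            }
        }
        // A date-only value carries no time fields, so read it as midnight of that day.
        return LocalDate.parse(text, DATE_ONLY).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(parse("2012-01-11 10:20:30.500"));
        System.out.println(parse("2012-01-11 10:20:30"));
        System.out.println(parse("2012-01-11"));
    }
}
```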
-- Default Attribute value --

The default value of an attribute can be set with the command:

SET ATTRIBUTE type.attribute_name DEFAULT value

The value must be of the same datatype as the attribute, or NULL.

After this command, every new node or edge that has this attribute is created with this value for it.

-------


Constructor Summary
ScriptParser()
          Constructor.
 
Method Summary
static void generateSchemaScript(java.lang.String path, Database db)
          Writes a script with the schema definition for the given database.
static void main(java.lang.String[] args)
          Executes ScriptParser for the given file path.
 boolean parse(java.lang.String path, boolean execute, java.lang.String localeStr)
          Parses the given input file.
 void setErrorLog(java.lang.String path)
          Sets the error log.
 void setOutputLog(java.lang.String path)
          Sets the output log.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ScriptParser

public ScriptParser()
Constructor.

Method Detail

setErrorLog

public void setErrorLog(java.lang.String path)
                 throws java.io.IOException
Sets the error log.

If not set, error log corresponds to standard error output.

Parameters:
path - [in] Path of the error log.
Throws:
java.io.IOException - If the file cannot be opened.

generateSchemaScript

public static void generateSchemaScript(java.lang.String path,
                                        Database db)
                                 throws java.io.IOException
Writes a script with the schema definition for the given database.

Parameters:
path - [in] Filename of the script to be written.
db - [in] Database.
Throws:
java.io.IOException - If the file cannot be opened or written.

parse

public boolean parse(java.lang.String path,
                     boolean execute,
                     java.lang.String localeStr)
              throws java.io.IOException
Parses the given input file.

Parameters:
path - [in] Input file path.
execute - [in] If TRUE the script is executed, if FALSE it is just parsed.
localeStr - [in] The locale string for reading the input file. See CSVReader.
Returns:
TRUE on success, FALSE in case of error.
Throws:
java.io.IOException - If the file cannot be opened.

setOutputLog

public void setOutputLog(java.lang.String path)
                  throws java.io.IOException
Sets the output log.

If not set, output log corresponds to standard output.

Parameters:
path - [in] Path of the output log.
Throws:
java.io.IOException - If the file cannot be opened.

main

public static void main(java.lang.String[] args)
Executes ScriptParser for the given file path.

One argument is required: the path of the file that contains the script to be parsed.

A second, optional argument is a boolean indicating whether the script must be executed or only parsed. If it is not given, the script is executed.
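A minimal sketch of how these two arguments could be interpreted follows. The ScriptArgs class and its Options record are hypothetical, not the actual ScriptParser implementation; reading the flag with Boolean.parseBoolean (so that only "true", in any case, enables execution when the argument is present) is an assumption, and "family.des" is a made-up file name.

```java
// Hypothetical command-line handling mirroring the documented main() arguments.
public class ScriptArgs {

    /** The script path and whether it should be executed (true) or only parsed (false). */
    public record Options(String scriptPath, boolean execute) {}

    public static Options fromArgs(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("a script file path is required");
        }
        // Assumption: absent flag means execute; a present flag is read with parseBoolean.
        boolean execute = args.length < 2 || Boolean.parseBoolean(args[1]);
        return new Options(args[0], execute);
    }

    public static void main(String[] args) {
        Options opts = fromArgs(args.length > 0 ? args : new String[] {"family.des"});
        System.out.println((opts.execute() ? "execute " : "parse only ") + opts.scriptPath());
    }
}
```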