DexNet 4.7.0
com::sparsity::dex::script::ScriptParser Class Reference

ScriptParser.


Public Member Functions

void SetErrorLog (System.String path) throws System.IO.IOException
 Sets the error log.
 ScriptParser ()
 Constructor.
bool Parse (System.String path, bool execute, System.String localeStr) throws System.IO.IOException
 Parses the given input file.
void SetOutputLog (System.String path) throws System.IO.IOException
 Sets the output log.

Static Public Member Functions

static void Main ()
 Executes ScriptParser for the given file path.
static void GenerateSchemaScript (System.String path, com.sparsity.dex.gdb.Database db) throws System.IO.IOException
 Writes a script with the schema definition for the given database.

Detailed Description

ScriptParser.

The ScriptParser can create schemas and load data from a set of commands in a dex script.

A DEX script contains an ordered list of commands, and ScriptParser executes each one of them in order. Commands create schemas, define nodes and edges, and load data into a previously defined DEX schema.

There are six main commands:

(i) Database creation 'create dbgraph': creates a new empty schema in a DEX database and sets this database as the target of the following operations.

(ii) Database usage 'use dbgraph': opens an existing DEX database and sets this database as the target of the following operations.

(iii) Node type creation 'create node': creates a node type in the database.

(iv) Edge type creation 'create edge': creates an edge type in the database.

(v) Node data load 'load nodes': loads a CSV file into the database.

(vi) Edge data load 'load edges': loads a CSV file into the database.

-- Schema definition --

This creates a DEX graph database:

CREATE (GDB|DBGRAPH) alias INTO filename

where alias is the name of the graph database to be created and filename is the path where the DEX database will be stored.

Instead of creating a new database, you can set an existing one as the operation database of the script:

USE (GDB|DBGRAPH) alias INTO filename

All following commands will be performed on the last created or used graph database.

This creates a node type:

CREATE NODE node_type_name "(" [attribute_name (INTEGER|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC] [DEFAULT value], ...] ")"

and this an edge type:

CREATE [UNDIRECTED] EDGE edge_type_name [FROM node_type_name TO node_type_name] "(" [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [DEFAULT value], ...] ")" [MATERIALIZE NEIGHBORS]

Here are some examples:

create gdb EXAMPLE into 'ex.dex'

use gdb WIKIPEDIA into 'wikipedia.dex'

create node TITLES (ID int unique, 'TEXT' string, NLC string, TITLE string indexed)

create node IMAGES (ID int unique, NLC string, FILENAME string indexed)

create edge REFS (NLC string, "TEXT" string, TYPE string)

create edge IMGS

create dbgraph FAMILY into 'family.dex'

create node PERSON (NAME string indexed, ID int unique, YEAR int)

create node DOG (NAME string indexed, YEAR int default 2012)

create edge CHILD from PERSON to PERSON (YEAR int)

create undirected edge MARRIED from PERSON to PERSON (YEAR int) materialize neighbors

create edge PET from PERSON to DOG () materialize neighbors

create gdb CARMODEL into 'cars.dex'

create node PERSON (NAME string, ID int unique, YEAR int)

create node CAR (MODEL string, ID int, OWNER int indexed)

Note that you may quote name identifiers in order to use reserved words as names.

Attributes can be defined as follows.

CREATE ATTRIBUTE [type.]name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC] [DEFAULT value]

If no node or edge type name is given, it creates a global attribute.
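For example (the attribute and type names below are illustrative, reusing PERSON from the examples above; GLOBAL_ID is a global attribute since no type name is given):

create attribute PERSON.ALIVE boolean default true

create attribute GLOBAL_ID int unique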

-- Node data load --

The LOAD NODES command creates nodes and sets attribute values for nodes imported from a CSV file. A new node is created for each CSV row.

By default, a new log file named after the node type is created to record invalid-data error messages. You can instead set a specific log file name (LOG logfile), abort at the first error instead of keeping a log (LOG ABORT), or turn off invalid-data error reporting entirely (LOG OFF).

This is the command:

LOAD NODES file_name [LOCALE locale_name] COLUMNS attribute_name [alias_name], ... INTO node_type_name [IGNORE (attribute_name|alias_name), ....] [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE [maxExtralines]]] [FROM num] [MAX num] [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])] [LOG (OFF|ABORT|logfile)]

Here are some examples:

load nodes 'titles.csv' columns ID, 'TEXT', NLC, TITLE into TITLES

load nodes 'images.csv' columns ID, NLC, FILENAME into IMAGES from 2 max 10000 mode columns

load nodes 'people.csv' locale 'en_US.utf8' columns *, DNI, NAME, AGE, *, ADDRESS into PEOPLE mode rows

-- Edge data load --

The LOAD EDGES command creates edges between existing nodes and sets attribute values for edges imported from a CSV file. A new edge is created for each CSV row.

By default, a new log file named after the edge type is created to record invalid-data error messages. You can instead set a specific log file name (LOG logfile), abort at the first error instead of keeping a log (LOG ABORT), or turn off invalid-data error reporting entirely (LOG OFF).

LOAD EDGES file_name [LOCALE locale_name] COLUMNS attribute_name [alias_name], ... INTO node_type_name [IGNORE (attribute_name|alias_name), ....] WHERE TAIL (attribute_name|alias_name) = node_type_name.attribute_name HEAD (attribute_name|alias_name) = node_type_name.attribute_name [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE [maxExtralines]]] [FROM num] [MAX num] [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])] [LOG (OFF|ABORT|logfile)]

The tail node is selected by the TAIL clause: the value of the given CSV column (attribute or alias name) is matched against the given attribute of the given node type, and the node with that value becomes the tail of the new edge. The head node is selected by the HEAD clause in the same way.

Here are some examples:

load edges 'references.csv' columns NLC, 'TEXT', TYPE, FROM F, TO T into REFS ignore F, T where tail F = TITLES.ID head T = TITLES.ID mode columns split partitions 3

load edges 'imagesReferences.csv' locale 'es_ES.iso88591' columns From, To into IMGS ignore From, To where tail From = TITLES.ID HEAD To = IMAGES.ID mode rows

load edges 'calls.gz' columns From, To, Time, Long into CALLS ignore From, To where tail From = PEOPLE.DNI head To = PEOPLE.DNI mode columns

-- Schema update --

Schema update commands allow updating the schema of a graph database. Currently, node or edge types and attributes can be removed, and the indexing of node attributes can be modified.

DROP (NODE|EDGE) name

DROP ATTRIBUTE [type_name.]attribute_name

INDEX [type_name.]attribute_name [INDEXED|UNIQUE|BASIC]

When no type_name is given, the command references a global attribute.

Examples:

drop edge REFS

drop node 'TITLES'

drop attribute PEOPLE.DNI

drop attribute GLOBAL_ID

index PEOPLE.NAME indexed

index CAR.ID unique

-- Timestamp Format --

The timestamp format can be set with the command:

SET TIMESTAMP FORMAT timestamp_format_string

After this command, all timestamp data is loaded using the specified format.

Valid format fields:

yyyy -> Year

yy -> Year without century (interpreted as 80 years before to 20 years after the current year)

MM -> Month [1..12]

dd -> Day of month [1..31]

hh -> Hour [0..23]

mm -> Minute [0..59]

ss -> Second [0..59]

SSS -> Millisecond [0..999]

For parsing, if the pattern is 'yy', the parser determines the full year relative to the current year: it assumes the two-digit year falls within 80 years before or 20 years after the time of processing. For example, if the current year is 2007, the pattern MM/dd/yy parses the value 01/11/12 as January 11, 2012, while the same pattern parses 05/04/64 as May 4, 1964.

Default formats accepted when this command is not present:

"yyyy-MM-dd hh:mm:ss.SSS"

"yyyy-MM-dd hh:mm:ss"

"yyyy-MM-dd"
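For example, the following command (the format string is illustrative, quoted in the same string style as the examples above) would accept timestamps such as 31/12/2012 23:59:

set timestamp format 'dd/MM/yyyy hh:mm'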

-- Default Attribute value --

The default value of an attribute can be set with the command:

SET ATTRIBUTE type.attribute_name DEFAULT value

Where the value should be of the same datatype as the attribute being set or NULL.

After this command, all new nodes or edges with this attribute will be created with this value as the attribute's default.
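For example, reusing the PERSON type and its YEAR attribute from the examples above:

set attribute PERSON.YEAR default 2012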

-------


Member Function Documentation

static void com::sparsity::dex::script::ScriptParser::GenerateSchemaScript ( System.String path, com.sparsity.dex.gdb.Database db ) throws System.IO.IOException [static]

Writes a script with the schema definition for the given database.

Parameters:
path [in] Filename of the script to be written.
db [in] Database.
Exceptions:
System.IO.IOException If bad things happen opening or writing the file.
static void com::sparsity::dex::script::ScriptParser::Main ( ) [static]

Executes ScriptParser for the given file path.

One argument is required, a file path which contains the script to be parsed.

A second argument may be given: a boolean that sets whether the script must be executed or just parsed. If not given, the script will be executed.

bool com::sparsity::dex::script::ScriptParser::Parse ( System.String path, bool execute, System.String localeStr ) throws System.IO.IOException

Parses the given input file.

Parameters:
path [in] Input file path.
execute [in] If TRUE the script is executed; if FALSE it is just parsed.
localeStr [in] The locale string for reading the input file. See CSVReader.
Returns:
TRUE if ok, FALSE in case of error.
Exceptions:
System.IO.IOException If bad things happen opening the file.
void com::sparsity::dex::script::ScriptParser::SetErrorLog ( System.String  path) throws System.IO.IOException

Sets the error log.

If not set, the error log corresponds to the standard error output.

Parameters:
path [in] Path of the error log.
Exceptions:
System.IO.IOException If bad things happen opening the file.
void com::sparsity::dex::script::ScriptParser::SetOutputLog ( System.String  path) throws System.IO.IOException

Sets the output log.

If not set, the output log corresponds to the standard output.

Parameters:
path [in] Path of the output log.
Exceptions:
System.IO.IOException If bad things happen opening the file.