A loading job is defined with a CREATE LOADING JOB statement and executed with a RUN LOADING JOB statement. These two statements, and the components of the loading job, are detailed below.

Example loading jobs and data files for the book_rating schema defined earlier in the document are available in the $(gadmin config get System.AppRoot)/document/examples folder in your TigerGraph platform installation.

A loading job may include either LOAD or DELETE statements but not both. A loading job that includes both will be rejected when the CREATE statement is executed.

To drop a job, run DROP JOB job_name. The job will be removed from GSQL. To drop all jobs, run either of the following commands: DROP JOB ALL or DROP JOB *. The scope of ALL depends on the user's current scope. If the user has set a working graph, then DROP ALL removes all the jobs for that graph. If a superuser has set their scope to be global, then DROP ALL removes all jobs across all graph spaces.
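To illustrate the overall flow, the following is a minimal sketch of defining, running, and dropping a loading job. The vertex type Book, its attribute layout, the job name, and the file path are assumptions for illustration, not part of the shipped examples.

CREATE LOADING JOB load_books FOR GRAPH book_rating {
  DEFINE FILENAME f1 = "/home/tigergraph/data/books.csv";
  LOAD f1 TO VERTEX Book VALUES ($0, $1, $2) USING SEPARATOR=",", HEADER="true";
}
RUN LOADING JOB load_books
DROP JOB load_books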
DEFINE statements

A DEFINE statement is used to define a local variable or expression to be used by the subsequent LOAD statements in the loading job.

DEFINE FILENAME

The DEFINE FILENAME statement defines a filename variable. The variable can then be used later in the JOB block by a LOAD statement to identify its data source. Every concurrent loading job must have at least one DEFINE FILENAME statement.

filevar is optionally followed by a filepath_string, which tells the job where to find input data. As the name suggests, filepath_string is a string value; therefore, it should start and end with double quotes. A filepath_string may take one of the following forms (examples follow the list):
- path: either an absolute path or relative path for either a file or a folder on the machine where the job is run. If it is a folder, then the loader will attempt to load each non-hidden file in the folder.
- "all:" path: If the path is prefixed with all:, then the loading job will attempt to run on every machine in the cluster which has a RESTPP component, and each machine will look locally for data at path. If the path is invalid on any of the machines, the job will be aborted. Also, the session parameter $sys.data_root may not be used.
- "any:" path: If the path is prefixed with any:, then the loading job will attempt to run on every machine in the cluster which has a RESTPP component, and each machine will look locally for data at the specified path. If the path is invalid on any of the machines, those machines where the path is not valid are skipped. Also, the session parameter $sys.data_root may not be used.
LOAD statement

A LOAD statement describes how to parse a data line into column values (tokens), and then describes how the values should be used to create a new vertex or edge instance. One LOAD statement can be used to generate multiple vertices or edges, each vertex or edge having its own destination clause, as shown below. Additionally, two or more LOAD statements may refer to the same input data file. In this case, the GSQL loader will merge their operations so that all of their operations are executed in a single pass through the data file.

The LOAD statement has many options. This reference guide provides examples of key features and options. The Knowledge Base and FAQs, and the tutorials such as GSQL 101, provide additional solution- and application-oriented examples.
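For instance, a single LOAD statement with two destination clauses might look like the following sketch; the Person vertex, Friend edge, and column layout are assumptions for illustration:

LOAD f1
  TO VERTEX Person VALUES ($0, $1),
  TO EDGE Friend VALUES ($0, $2)
  USING SEPARATOR=",", HEADER="false";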
Different LOAD statement types have different rules for the USING clause; see the USING clause section below for specifics.

filevar must have been previously defined in a DEFINE FILENAME statement. filepath_string must satisfy the same rules given above in the DEFINE FILENAME section.

Destination clause

A destination clause may be one of three types: TO VERTEX, TO EDGE, or TO TEMP_TABLE. The destination clause formats for the three types are very similar, but we show them separately for clarity:
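The general shapes are roughly as follows; this is a paraphrase of the three formats, not the formal grammar:

TO VERTEX vertex_type_name VALUES (id_expr [, attr_expr]*)
TO EDGE edge_type_name VALUES (source_id_expr [source_type_expr], target_id_expr [target_type_expr] [, attr_expr]*)
TO TEMP_TABLE table_name (id_name [, attr_name]*) VALUES (id_expr [, attr_expr]*)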
For the TO VERTEX and TO EDGE destination clauses, the following rules apply to their parameters:

- vertex_type_name or edge_type_name must match the name of a vertex or edge type previously defined in a CREATE VERTEX or CREATE UNDIRECTED|DIRECTED EDGE statement.
- The values (id_expr, attr_expr1, attr_expr2, ...) are assigned to the id(s) and attributes of a new vertex or edge instance, in the same order in which they are listed in the CREATE statement.
- id_expr obeys the same attribute rules as attr_expr, except that only attr_expr can use the reducer function, which is introduced later.
- For edges, source_id_expr and target_id_expr can each optionally be followed by a source_type_expr and target_type_expr, respectively. The source_type_expr and target_type_expr must evaluate to one of the allowed endpoint vertex types for the given edge type. Specifying the vertex type tells the loader what id types to expect, which may be important when the edge type is defined to accept more than one type of source/target vertex.

The TO TEMP_TABLE clause defines a new, temporary data structure. Its unique characteristics will be described in a separate subsection. For now, we focus on TO VERTEX and TO EDGE.

By default, if an edge is loaded before its endpoint vertices exist, the loader creates the missing vertices with default attribute values. To prevent this, use the VERTEX_MUST_EXIST=true option.
For example, suppose a Visit edge can connect two Person vertices or a Person to a Company. A Person has a string ID, while a Company has an INT ID. Then suppose the Visit edge source data comes from a single CSV file, containing both variants of edges. Note that the 2nd column ($1) contains either Person or Company, and that the 3rd column ($2) contains either a string or an integer. Using the target_type_expr field, we can load both variants of the Visit edge with a single clause, as sketched below.
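A sketch of that single clause, assuming the CSV columns are (source person, target type, target id, date); the file variable, sample data, and the date attribute are illustrative:

# visits.csv (hypothetical):
#   Sam,Person,Jenny,2012-04-03
#   Sam,Company,TigerGraph,2011-09-01
LOAD f1 TO EDGE Visit VALUES ($0 Person, $2 $1, $3) USING SEPARATOR=",";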
A USING clause is required when loading data into edge types with different FROM-TO vertex pairs, even if all options are default.
A LOAD statement processes each line of an input file, splitting each line (according to the SEPARATOR character; see the section "Other Optional LOAD Clauses" for more details) into a sequence of tokens. Each destination clause provides a token-to-attribute mapping which defines how to construct a new vertex, edge, or temp table row instance (i.e., one data object). The tokens can also be thought of as the column values in a table. There are two ways to refer to a column: by position or by name. Assuming a column has a name, either method may be used, and both methods may be used within one expression.

Column names come either from a header line in the input file or from a DEFINE HEADER statement. If a header line is used, then the first line of the input file should be structured like a data line, using the same separator characters, except that each column contains a column name string instead of a data value. Names are enclosed in double quotes, e.g. "age".

In the following example LOAD statement, the PRIMARY_ID of a person vertex comes from column $0 of the file "xx/yy/a.csv", and the next attribute comes from column $1.
.WHERE
clause and OPTION
clause are satisfied, then the data object is loaded.WHERE
clause. If the data object is not valid, the object is rejected (skipped) and counted as an error in the log file. The rules for invalid attributes values are summarized below:UINT
- UINT: Any non-digit character. (Out-of-range values cause overflow instead of rejection.)
- INT: Any non-digit or non-sign character. (Out-of-range values cause overflow instead of rejection.)
- FLOAT and DOUBLE: Any wrong format.
- STRING, STRING COMPRESS, FIXED_BINARY: N/A.
- DATETIME: Wrong format, invalid date time, or out of range.
- BOOL: Any value not listed later.
- An invalid field (in a UDT), element (in a LIST or SET), or key or value (in a MAP) causes rejection.
A LOAD statement can specify that a particular attribute should not be loaded by using the special character _ (underscore) as its attribute expression (attr_expr). For example:
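A minimal sketch; the person vertex type and its column layout are assumed:

LOAD f1 TO VERTEX person VALUES ($0, $1, _, $3) USING SEPARATOR=",";  # the third attribute is not loaded from the file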
An attribute expression may be a column token (e.g., $0), a literal (a constant numeric or string value), any of the built-in loader token functions, or a user-defined token function. Attribute expressions may not contain mathematical or boolean operators (such as +, *, AND). The rules for attribute expressions are the same as those for id expressions, but an attribute expression can additionally use a reducer function:
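A reducer function accumulates a value across multiple input lines that target the same vertex or edge. A sketch, assuming a person vertex whose third position is a numeric attribute and using the built-in add reducer:

LOAD f1 TO VERTEX person VALUES ($0, $1, REDUCE(add($2))) USING SEPARATOR=",";  # each line with the same id adds $2 to the attribute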
Loading a DATETIME Attribute

When loading data into a DATETIME attribute, the GSQL loader will automatically read a string representation of the datetime and convert it to the internal DATETIME representation. The loader accepts any of the following string formats:

- %Y-%m-%d %H:%M:%S (e.g., 2011-02-03 01:02:03)
- %Y/%m/%d %H:%M:%S (e.g., 2011/02/03 01:02:03)
- %Y-%m-%dT%H:%M:%S.000z (e.g., 2011-02-03T01:02:03.123z; the 123 will be ignored)
- %Y-%m-%d (date only, no time, e.g., 2011-02-03)
- %Y/%m/%d (date only, no time, e.g., 2011/02/03)
- %Y is a 4-digit year. A 2-digit year is not a valid value.
- %m and %d represent month (1 to 12) and day (1 to 31), respectively. Leading zeroes are optional.
- %H, %M, %S are hours (0 to 23), minutes (0 to 59), and seconds (0 to 59), respectively. Leading zeroes are optional.
Loading a BOOL Attribute

When loading data into a BOOL attribute, the accepted input values are TRUE, True, true, and 1 for a true value, and FALSE, False, false, and 0 for a false value. Boolean values are stored as true and false.

Loading a LIST or SET Attribute

There are two ways to load an attribute that is a LIST or a SET. The first method is to load multiple data lines that share the same id value; the collection attribute accumulates an element from each line. For example, the loading job load_set_list will load two test_vertex vertices because there are two unique id values in the data file. Vertex 1 has attribute values iset = [10,20] and ilist = [10,20,20]. Vertex 3 has values iset = [30,40] and ilist = [30,30,40]. Note that a set doesn't contain duplicate values, while a list can contain duplicate values. A sketch of such a job and its data appears below.
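The following job and data file are a reconstruction consistent with that description; the graph name, separator, and attribute order are assumptions:

# data file (id|value):
#   1|10
#   1|20
#   1|20
#   3|30
#   3|30
#   3|40
CREATE VERTEX test_vertex (PRIMARY_ID id UINT, iset SET<INT>, ilist LIST<INT>)
CREATE LOADING JOB load_set_list FOR GRAPH Test_Graph {
  DEFINE FILENAME f;
  LOAD f TO VERTEX test_vertex VALUES ($0, $1, $1) USING SEPARATOR="|";
}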
The second method is to use the SPLIT() function to read a compound token and split it into a collection of elements, to form a LIST or SET collection. The SPLIT() function takes two arguments: the column index and the element separator. The element separator should be distinct from the separator used throughout the whole file. An example is shown below:
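A sketch, assuming the second column holds compound values such as 10|20|30, with | as the element separator inside a comma-separated file:

LOAD f TO VERTEX test_vertex VALUES ($0, SPLIT($1, "|"), SPLIT($1, "|")) USING SEPARATOR=",";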
The SPLIT() function cannot be used for UDT type elements.

Loading a MAP Attribute

There are three methods to load a MAP.

The first method: use an arrow (->) to separate the map's key and value directly in the destination clause.

The second method: use the MAP() function. If there are multiple key-value pairs among multiple columns, MAP() can load them together.

The third method: use the SPLIT() function. Similar to SPLIT() when loading a LIST or SET, the SPLIT() function can be used when the key-value pair is in one column and separated by a key-value separator, or when multiple key-value pairs are in one column and separated by element separators and key-value separators. Here SPLIT() has three parameters: the first is the column index, the second is the key-value separator, and the third is the element separator. The third parameter is optional; if a row of raw data has only one key-value pair, the third parameter can be skipped. Examples of all three methods are shown below, including SPLIT() without and with the element separator.
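Sketches of the three methods; the vertex type v_map with a MAP attribute, the column layout, and the separators are assumptions:

# Method 1: arrow syntax, key in $1 and value in $2
LOAD f TO VERTEX v_map VALUES ($0, ($1 -> $2)) USING SEPARATOR=",";

# Method 2: MAP() with multiple key-value pairs spread across columns
LOAD f TO VERTEX v_map VALUES ($0, MAP(($1 -> $2), ($3 -> $4))) USING SEPARATOR=",";

# Method 3: SPLIT() with a key-value separator (:) and, optionally, an element separator (|)
LOAD f TO VERTEX v_map VALUES ($0, SPLIT($1, ":")) USING SEPARATOR=",";       # one pair per row, e.g. 1:apple
LOAD f TO VERTEX v_map VALUES ($0, SPLIT($1, ":", "|")) USING SEPARATOR=",";  # many pairs, e.g. 1:apple|2:pear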
Loading Composite Keys

If the TO_VERTEX or FROM_VERTEX of an edge uses a composite key, the composite set of attributes must be enclosed in parentheses in the LOAD statement for the edge. An example is shown below:
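A sketch, assuming a Person vertex whose primary key is the composite (name, birth_year) and a Visits_Place edge from Person to Place:

LOAD f TO EDGE Visits_Place VALUES (($0, $1), $2, $3) USING SEPARATOR=",";
# ($0, $1) together identify the source Person by its composite key;
# $2 is the target Place id and $3 is an edge attribute.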
Built-in loader token functions may be used to construct attribute expressions, and some may be used in the WHERE clause. A user-defined token function is called in the LOAD statement with an ordinary function call; the arguments of the function can be a column from the FILE object. Token functions that take a timestamp argument expect one of the following canonical string formats:

"%Y-%m-%d %H:%M:%S"
"%Y/%m/%d %H:%M:%S"
"%Y-%m-%dT%H:%M:%S.000z" // text after the dot . is ignored
WHERE clause

The WHERE clause is an optional clause. The WHERE clause's condition is a boolean expression. The expression may use column token variables, token functions, and the operators described below. The expression is evaluated for each input data line. If the condition is true, then the vertex or edge instance is loaded into the graph store. If the condition is false, then this instance is skipped. Note that all attribute values are treated as string values in the expression, so the type conversion functions to_int() and to_float(), which are described below, are provided to enable numerical conditions.

When comparing DOUBLE and FLOAT values, it is not recommended to test these data types for exact equality or inequality. Instead, one should allow for an acceptable amount of error. The following example checks whether $0 = 5, with an error of 0.00001 permitted:
, with an error of 0.00001 permitted:OPTION
clauseTAGS
clause (Beta)TAGS
clause specifies the tags to be applied to the vertices loaded by the LOAD
statement.LOAD
statement has a TAGS
clause, it will tag the vertices with the tags specified in the TAGS
clause. Before vertices can be loaded and tagged with a LOAD
statement, the vertex type must first be marked as taggable, and the tags must be defined.BY OR
: Add the new tags to the existing set of tags.BY OVERWRITE
: Overwrite existing tags with the new tags.USING
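A sketch of a tagged load, assuming a taggable Person vertex type and previously defined tags vip and kol:

LOAD f1 TO VERTEX Person VALUES ($0, $1) TAGS ("vip", "kol") BY OR USING SEPARATOR=",";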
USING clause

A USING clause contains one or more optional parameter-value pairs:

- SEPARATOR: specifies the character that separates tokens. Use "\t" for tab and "\xy" for the ASCII character with decimal code xy.
- EOL: the end-of-line character. The default is "\n" (the system-defined newline character or character sequence).
- QUOTE: specifies a quote character that marks off string tokens, so that a separator character inside a quoted string is not treated as a separator. For example, if QUOTE="double" and SEPARATOR=",", the comma in "Leonard,Euler" will not separate Leonard and Euler into separate tokens.
JSON_FILE: If JSON_FILE="true" is used, the loader loads JSON objects instead of tabular data. A JSON object is an unordered set of key/value pairs, where each value may itself be an array or object, leading to nested structures. A colon separates each key from its value, and a comma separates items in a collection. A more complete description of JSON format is available at www.json.org. The JSON loader requires that each input line has exactly one JSON object. Instead of using column values as tokens, the JSON loader uses JSON values as tokens, that is, the second part of each JSON key/value pair. In a GSQL loading job, a JSON field is identified by a dollar sign $ followed by the colon-separated sequence of nested key names to reach the value from the top level. For example, given the JSON object {"abc":{"def": "this_value"}}, the identifier $"abc":"def" is used to access "this_value". The double quotes are mandatory.
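A sketch of a JSON load, assuming one JSON object per input line of the form {"id": "v1", "abc": {"def": "this_value"}} and a vertex type with a string attribute; all names are illustrative:

LOAD f1 TO VERTEX MyVertex VALUES ($"id", $"abc":"def") USING JSON_FILE="true";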
Loading Parquet data uses the S3 loader. To load Parquet files:

- Specify "file.reader.type": "parquet" in the S3 file configuration file or argument.
- Specify JSON_FILE="true" in the USING clause of the LOAD statements.
- Add USING EOF="true" to your RUN LOADING JOB statement to explicitly indicate to the loading job to stop after consuming all data from the Parquet source, not to expect further entries.
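A sketch of the run command with the EOF flag; the job name is hypothetical:

RUN LOADING JOB load_parquet USING EOF="true"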