Common Applications
DemoExamples_2.0.tar.gz
16KB
Binary
DemoExamples_2.0.tar.gz

Example 1. Collaborative Filtering

Here is an observation about social networks: If a set of persons likes me, and many of them also like another person Z, it is probably true that person Z and I have some things in common. The same observation works for products and services: if a set of customers likes product X, and many of them also like product Z, then product X and Z probably have something in common. We say X and Z are "co-liked". This observation can be turned around into a search for recommendations: Given a user X, find a set of of users Z which are highly co-liked. For social networks, this can be used as friend recommendation: find a highly co-liked person Z to introduce to X. For e-commerce, this can be used for purchase recommendation: someone who bought X may also be interested in buying Z. This technique of finding the top co-liked individuals is called collaborative filtering.
A graph analytics approach is a natural fit for collaborative filtering because the original problem is in a graph (social network), and the search criteria can easily be expressed as a path in the graph. We first find all people Y who like user X, then find other users Z who are liked by someone in group Y, and rank members of Z according to how many times they're liked by Y.
Figure 1 below shows a simple graph according to our model. The circles represent three User vertices with id values id1, id2, and id3. There are two directed edges labeled "Liked" which show that User id2 likes id1, and id2 also likes id3. (In this model, friendship is directional because in online social networks, one of the two persons initiates the friendship.) There are two more directed edges in the opposite directions labeled "Liked_By". Since id2 likes both id1 and id3, id1 and id3 are co-liked.
Figure 1 - Example graph for the collaborative filtering model

Quick Demo

To just see the basic operation of the TigerGraph system, follow the easy instructions below . You can then continue to read for the explanation of the command files so you can learn to design your own examples.

Quick Demo Instructions

This example uses the graph below and asks the following query: "Who are the top co-liked persons of id1"?
Step 1: Obtain the data and command files. Create a graph model.
This example uses 4 small files: 3 command files ( cf_model.gsql , cf_load.gsql , cf_query.gsql) and one data file ( cf_data.csv) . Their contents are shown below, so you can either copy from this document or download the files (look in the "cf" subfolder of Examples.zip)
1
> gsql 'DROP ALL'
2
> gsql cf_model.gsql
3
> gsql 'CREATE GRAPH gsql_demo(*)'
Copied!
Step 2: Load data:
The command below loads our new data.
1
> gsql -g gsql_demo cf_load.gsql
Copied!
Step 3: Install and execute the query:
The file cf_query.gsql creates a query called topCoLiked. Then we install the query. The creation step runs fast, but the installation (compiling) step may take about 1 minute. We then run the query, asking for the top 20 Users who are co-liked with User id1.
1
> gsql -g gsql_demo cf_query.gsql
2
> gsql -g gsql_demo 'INSTALL QUERY topCoLiked'
3
> gsql -g gsql_demo 'RUN QUERY topCoLiked("id1", 20)'
Copied!
The query results should be the following. Interpretation: id4 has as score (@cnt) = 2, which means there are two persons who like both id1 and id4. Next, id2 and id3 each have 1 co-friend in common with id1.
1
{
2
"error": false,
3
"message": "",
4
"version": {
5
"edition": "developer",
6
"schema": 0,
7
"api": "v2"
8
},
9
"results": [{"L2": [
10
{
11
"v_id": "id4",
12
"attributes": {"@cnt": 2},
13
"v_type": "User"
14
},
15
{
16
"v_id": "id3",
17
"attributes": {"@cnt": 1},
18
"v_type": "User"
19
},
20
{
21
"v_id": "id2",
22
"attributes": {"@cnt": 1},
23
"v_type": "User"
24
}
25
]}]
26
}
Copied!
We now begin a tutorial-style explanation of this TigerGraph example and the workflow in general.
The figure below outlines the steps to progress from an empty graph to a query solution. Each of the blocks below corresponds to one of the steps in the Quick Demo above. The tutorial below will give you a deeper understanding of each step, so you can learn how it works and so you can design your own graph solutions.

Step 1: Create a graph model.

The first step is to create a model for your data which describes the types of vertices and edges you will have.
This example is written to be compatible with older TigerGraph platforms which support only one graph model at a time (though the user can make the model simple or complex, to handle multiple needs). To clear an existing model and old data, so you can install a new one, run the DROP ALLcommand.
The statements below describe the vertex types and edge types in our Co-Liked model
1
CREATE VERTEX User (PRIMARY_ID id string)
2
CREATE DIRECTED EDGE Liked (FROM User, TO User) WITH REVERSE_EDGE = "Liked_By"
Copied!
The first CREATE statement creates one vertex type called User. The second statement creates one directed edge type called Liked. The WITH REVERSE_EDGE clause means that for every two vertices (x,y) connected by a Liked type of edge, the system will automatically generate a corresponding edge of type Liked_By pointing from y to x, and both edges will have the same edge attributes.
After defining all your vertex and edge types, execute the following command to create a graph which binds the vertices and edges into one graph model:
CREATE GRAPH command
1
CREATE GRAPH gsql_demo(*)
Copied!
The name of the graph is gsql_demo. Within the parentheses, you can either list the specific vertex and edge types (User, Liked), or you can use *, which means include everything. We chose to use * so that the same command can be used for all of our examples.
The CREATE commands can be stored in one file and executed together.
cf_model.gsql
1
CREATE VERTEX User (PRIMARY_ID id string)
2
CREATE DIRECTED EDGE Liked (FROM User, TO User) WITH REVERSE_EDGE = "Liked_By"
3
#CREATE GRAPH gsql_demo(*)
Copied!
The CREATE GRAPH command is commented out for the following reason:
Our examples have been designed to run either as individual graphs or merged together into one multi-application graph. The CREATE GRAPH command may be run only once, after all the vertex and edge types have been created. (Each of our demo examples uses unique vertex and edge names, to avoid conflicts.) In other words, we run CREATE GRAPH gsql_demo(*)as a separate command after creating all the vertex and edge types. If you decide you want to modify the schema after running CREATE GRAPH, you can create and run a SCHEMA_CHANGE JOB.
Newer TigerGraph platforms (i.e., version 1.1 or higher) can support multiple graphs, but this tutorial has been designed to be compatible with older single-graph platforms.
  • To execute these statements (DROP ALL, CREATE VERTEX, etc.), you can type them individually at the GSQL shell prompt, or you can first save them to a file, such as cf_model.gsql , and then run the command file. From within the shell, you would run @cf_model.gsql From outside the shell, you would run > gsql cf_model.gsql
Normally a user would put all their CREATE VERTEX, CREATE EDGE, and the final CREATE GRAPH statements in one file. In our example files, we have separated out the CREATE GRAPH statement because we want to merge all our example schemas together into one common graph.
  • The vertex, edge, and graph types become part of the catalog . To see what is currently in your catalog, type the ls command from within the GSQL shell to see a report as below:
Catalog contents, as reported by the "ls" command
1
Vertex Types:
2
- VERTEX User(PRIMARY_ID id STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
3
4
Edge Types:
5
- directed edge Liked(from User, to User) with reverse_edge="Liked_By"
6
- directed edge Liked_By(from User, to User) with reverse_edge="Liked"
7
8
Graphs:
9
10
Jobs:
11
Queries:
12
13
Json API version: v2
Copied!
  • To remove a definition from the catalog, use some version of the DROPcommand. Use the helpcommand to see a summary of available GSQL commands.
  • In our examples, we typically show keywords in ALL UPPERCASE to distinguish them from user-defined identifiers. Identifiers are case-sensitive but keywords are not.
In this example, the vertices and edges don't have attributes. In general, a TigerGraph graph can have attributes on both vertices and edges, and it can also have different types of edges connecting the same two vertices. Please see GSQL Language Reference Part 1 - Defining Graphs and Loading Data which provides a more complete description of the graph schema definition language with additional examples.

Step 2: Load initial data.

Figure 2 shows a larger graph with five vertices and several edges. To avoid crowding the figure, only the Liked edges are shown: For every Liked edge, there is a corresponding Liked_By edge in the reverse direction.
Figure 2 - Graph for Collaborative Filtering Calculation
The data file below describes the five vertices and seven edges of Figure 2.
cf_data.csv
1
id2,id1
2
id2,id3
3
id3,id1
4
id3,id4
5
id5,id1
6
id5,id2
7
id5,id4
Copied!
The loading job below will read from a data file and create vertex and edge instances to put into the graph.
Per Example: Load data into the graph (file: cf_load.gsql)
1
# define the loading job
2
USE GRAPH gsql_demo # added for v1.2
3
CREATE LOADING JOB load_cf FOR GRAPH gsql_demo {
4
DEFINE FILENAME f;
5
LOAD f
6
TO VERTEX User VALUES ($0),
7
TO VERTEX User VALUES ($1),
8
TO EDGE Liked VALUES ($0, $1);
9
}
10
11
# load the data
12
RUN LOADING JOB load_cf USING f="../cf/data/cf_data.csv"
Copied!
Now that we have defined a graph (in Step 1), GSQL commands or sessions should specify that you want to use a particular graph. Line 2 (new for v1.2) sets the working graph to be gsql_demo. Another way to set the working graph is to specify each time you invoke the gsql command, e.g.,
1
gsql -g gsql_demo cf_load.gsql
Copied!
The CREATE LOADING JOB statement (line 3) defines a job called load_cf. The job will read each line of the input file, creates one vertex based on the value in the first column (referenced with column name $0), another vertex based on the value in the second column ($1), and one Liked edge pointing from the first vertex to the second vertex. In addition, since the Liked edge type definition includes the WITH REVERSE_EDGE clause, a Liked_By edge pointing in the opposite direction is also created.
After the job has been created, we run the job (line 12). the RUN LOADING JOB command line includes details about the data source: the name of the file is cf_data.csv, commas are used to separate columns, and \n is used to end each line. (Data files should not contain any extra spaces before or after the separator character.)
The TigerGraph loader automatically filters out duplicates. If either of the two column values has already been seen before, that vertex won't be created. Instead the existing vertex will be used. For example, if we read the first two data lines in data file cf_data.csv , the first line will generate two User vertices, one edge type of Liked , and one edge type of Liked_By . For the second row, however, only one new vertex will be created since id2 has been seen already. Two edges will be created for the second row.
  1. 1.
    It is okay to run an LOADING JOB again, or to run a different loading job, to add more data to a graph store which already has some data. For example, you could do the following:
1
RUN LOADING JOB load_cf USING f="../cf/cf_data1.csv"
2
RUN LOADING JOB load_cf USING f="../cf/cf_data2.tsv"
Copied!
2.After loading, you can use the GraphStudio UI to visually inspect your data. Refer to the TigerGraph GraphStudio UI Guide .
To clear all your data but to keep your graph model, run the "CLEAR GRAPH STORE -HARD" command. -HARD must be in all capital letters.
Be very careful using CLEAR GRAPH STORE; there is no UNDO command.
For the querying and updating examples in the remainder of this use case, we will assume that Figure 2 has been loaded.
This loading example is basic. The GSQL language can do complex data extraction and transformation, such as dealing with JSON input format and key-value list input, all in high-level syntax. Please see GSQL Language Reference Part 1 - Defining Graphs and Loading Data for more examples.

Step 3: Write a graph-based query solution

The GSQL language includes not only data definition and simple inspection of the data, but also advanced querying which traverses the graph and which supports aggregation and iteration.

Built-In Queries

First , we can run some simple queries to verify that the data were loaded correctly. Below are some examples of some built-in GSQL queries which can be run in GSQL shell:
Simple Query for Validation
Meaning & Comments
SELECT count(*) FROM User
Display the number of User vertices,
SELECT count(*) FROM User-(Liked)->User
Display the number of directed Liked edges from User type to User type
SELECT approx_count(*) FROM User
Display the number of User vertices according to cached statistics. Response time may be faster than count(*). See note below.
SELECT approx_count(*) FROM User-(Liked)->User
Display the number of directed Liked edges from User type to User Type, according to cached statistics. Response time may be faster than count(*). See note below.
SELECT * FROM User LIMIT 3
Display all id, type, and attribute information for up to 3 User vertices.
A LIMIT or WHERE condition is required, to prevent the output from being too large. Note that there is also a system limit of 10240 vertices or edges returned by SELECT *.
SELECT * FROM User WHERE primary_id=="id2"
Display all id, type and attribute information for the User vertex whose primary_id is "id2". The WHERE clause can also specify non-ID attributes.
SELECT * FROM User-(ANY)->ANY WHERE from_id=="id1"
Display all id,type, and attribute information about any type of edge which starts from vertex "id1".
To guard against queries which select too many edges, the WHERE clause is mandatory when selecting edges.
1
GSQL > SELECT * FROM User LIMIT 5
2
[
3
{
4
"v_id": "id2",
5
"attributes": {},
6
"v_type": "User"
7
},
8
{
9
"v_id": "id5",
10
"attributes": {},
11
"v_type": "User"
12
},
13
{
14
"v_id": "id3",
15
"attributes": {},
16
"v_type": "User"
17
},
18
{
19
"v_id": "id4",
20
"attributes": {},
21
"v_type": "User"
22
},
23
{
24
"v_id": "id1",
25
"attributes": {},
26
"v_type": "User"
27
}
28
]
Copied!

Create a Query

Note on approx_count(*)
The approx_count(*) function relies on statistics which may not account for recent insertions and deletions. If there has been no recent activity, they will give accurate results. In contrast, the count(*) function insures that recent data insertions and deletions are processed, so that it returns an accurate count.
SELECT * displays information in JSON format. Below is an example of query output.
Now let's solve our original problem: find users who are co-liked with a user X. The following query demonstrates a 2-step traversal with aggregation.
The query below performs the co-liked collaborative filtering search. The concept behind this query is to describe a "graph path" which represents the relationship between a person (the starting point) and a person that is co-liked (the ending point). Figure 1 illustrates this path: id3 is a co-liked user of id1, because id2 likes both of them. The path from id1 to co-liked users is: (1) traverse a Liked_By edge to a User, and then (2) traverse a Liked edge to another User. This query also calculates the magnitude of the relationship between the starting point and each ending point. The more users there are such as id2 which connect id1 and id3, the stronger the co-like relationship between id1 and id3. Counting the number of paths that end at id3 serves to calculate this magnitude.
cf_query.gsql - Define the collaborative filtering query
1
CREATE QUERY topCoLiked( vertex<User> input_user, INT topk) FOR GRAPH gsql_demo
2
{
3
SumAccum<int> @cnt = 0;
4
# @cnt is a runtime attribute to be associated with each User vertex
5
# to record how many times a user is liked.
6
L0 = {input_user};
7
L1 = SELECT tgt
8
FROM L0-(Liked_By)->User:tgt;
9
L2 = SELECT tgt
10
FROM L1-(Liked)->:tgt
11
WHERE tgt != input_user
12
ACCUM tgt.@cnt += 1
13
ORDER BY tgt.@cnt DESC
14
LIMIT topk;
15
PRINT L2;
16
}
Copied!
This query is structured like a procedure with two input parameters: an input vertex and value of k for the top-K ranking. The query contains three SELECT statements executed in order. The L0 statement defines our initial list of vertices: a set containing a single user supplied by the input_user parameter. Suppose the input user is id1 . Next, the L1 statement starts from every vertex in the set L0, traverses every connected edge of type Liked_By and returns every target vertex (that is, the other ends of the connected edges). As a result, L1 is the set of all users who liked the input user. Referring to the graph in Figure 2, the query travels backwards along every Liked edge which points to id1 , arriving at id2 , id3 , and id5 . These three vertices form L1. Next, the L2 statement starts from each user in L1, travels to every user liked by that starting user (via the Liked type of edges), and increments the count for each User reached. That is, the algorithm counts how many times each vertex is visited by a query path. The WHERE condition makes sure the original input user will not be returned in the result.ORDER BY and LIMIT have the same meaning as in SQL. Below, we show how the L2 step tallies the counts for each vertex encountered:
  1. 1.
    From id2, Liked edges lead to id1 and id3. id1 is excluded due to the WHERE clause. The cnt count for id3 is incremented from 0 to 1.
  2. 2.
    From id3, Liked edges lead to id1 and id4. id1 is excluded due to the WHERE clause. The cnt count for id4 is incremented from 0 to 1.
  3. 3.
    From id5, Liked edges lead to id1, id2, and id4. id1 is excluded to to the WHERE clause. The cnt count of id2 is incremented from 0 to 1. The cnt count of id4 is incremented from 1 to 2.
The three co-liked users and their cnt scores: id3 (cnt score = 1), id4 (cnt = 2), and id2 (cnt = 1). The ORDER BY clause indicates that the sorting should be in descending order, such that the LIMIT clause trims L2 to the 20 vertices with the highest (as opposed to lowest) cnt values. For the test graph, there are only 3 vertices which are co-liked, less than the limit of 20. id4 has the strongest co-liked relationship.

Install and Run a Query

After the query is defined (in the CREATE QUERY block), it needs to be installed. The INSTALL QUERY command compiles the query.
1
INSTALL QUERY topCoLiked
Copied!
If you have several queries, you can wait to install them in one command, which runs faster than installed each one separately. E.g.,
1
INSTALL QUERY query1, query2
Copied!
or
1
INSTALL QUERY ALL
Copied!
is faster than
1
INSTALL QUERY query1
2
INSTALL QUERY query2
Copied!
After a query has been installed, it can be run as many times has desired. The command RUN QUERY invokes the query, with the given input arguments.
Using "id1" as the starting point and allowing up to 5 vertices in the output, the RUN QUERY command and its output on our test graph is shown below:
1
GSQL > RUN QUERY topCoLiked("id1", 5)
2
{
3
"error": false,
4
"message": "",
5
"version": {
6
"edition": "developer",
7
"schema": 0,
8
"api": "v2"
9
},
10
"results": [{"L2": [
11
{
12
"v_id": "id4",
13
"attributes": {"@cnt": 2},
14
"v_type": "User"
15
},
16
{
17
"v_id": "id3",
18
"attributes": {"@cnt": 1},
19
"v_type": "User"
20
},
21
{
22
"v_id": "id2",
23
"attributes": {"@cnt": 1},
24
"v_type": "User"
25
}
26
]}]
27
}
Copied!
Instead of using the RUN QUERY command within the GSQL shell, the query can be invoked from the operating system via a RESTful GET endpoint (which is automatically created by the INSTALL QUERY command):
1
curl -X GET "http://hostName:port/query/gsql_demo/topCoLiked?input_user=id1&topk=5"
Copied!
If you followed the standard installation instructions for the TigerGraph system, hostName for the REST server is localhostand port is 9000.
As of TigerGraph 1.2, the URL for query REST endpoints includes the graph name after query/. Prior to 1.2, the URL for the example above was http://hostName:port /query/topCoLiked

Step 4 (Optional): Update Your Data.

You can update the stored graph at any time, to add new vertices and edges, to remove some, or to update existing values. The GSQL language includes ADD, DROP, ALTER, UPSERT, and DELETE operations which are similar to the SQL operations of the same name. The UPSERT operation is a combined UPDATE-INSERT operation: If object exists, then UPDATE, else INSERT. Note that this is the default behavior for The GSQL language's 'smart' loading described above. There are three basic types of modifications to a graph:
  1. 1.
    Adding or deleting objects
  2. 2.
    Altering the schema of the graph
  3. 3.
    Modifying the attributes of existing objects
We'll give a quick example of each type. To show the effect each modification, we'll use the following simple built-in queries:
cf_mod_check.gsql
1
SELECT * FROM User LIMIT 1000
2
SELECT * FROM User-(Liked)->User WHERE from_id=="id2"
Copied!
The current results, before making any modifications, are shown below.
Users vertices and Edges from id2, before any modifications
1
GSQL > SELECT * FROM User LIMIT 1000
2
[
3
{
4
"v_id": "id2",
5
"attributes": {},
6
"v_type": "User"
7
},
8
{
9
"v_id": "id5",
10
"attributes": {},
11
"v_type": "User"
12
},
13
{
14
"v_id": "id3",
15
"attributes": {},
16
"v_type": "User"
17
},
18
{
19
"v_id": "id4",
20
"attributes": {},
21
"v_type": "User"
22
},
23
{
24
"v_id": "id1",
25
"attributes": {},
26
"v_type": "User"
27
}
28
]
29
GSQL > SELECT * FROM User-(Liked)->User WHERE from_id=="id2"
30
[
31
{
32
"from_type": "User",
33
"to_type": "User",
34
"directed": true,
35
"from_id": "id2",
36
"to_id": "id3",
37
"attributes": {},
38
"e_type": "Liked"
39
},
40
{
41
"from_type": "User",
42
"to_type": "User",
43
"directed": true,
44
"from_id": "id2",
45
"to_id": "id1",
46
"attributes": {},
47
"e_type": "Liked"
48
}
49
]
Copied!
Graph modification operations are performed by a distributed computing model which satisfies Sequential Consistency. For these examples, a brief one second pause between the updating and querying the graph should be sufficient.

Modification Type 1: Adding or deleting

Adding is simply running a loading job again with a new data file. More details are in the GSQL Language Reference Part 1.
Deleting: Suppose we want to delete vertex id3 and all its connections:
cf_mod1.gsql
1
DELETE FROM User WHERE primary_id=="id3"
Copied!
Users vertices and Edges from id2, after Modification 1
1
GSQL > SELECT * FROM User LIMIT 1000
2
[
3
{
4
"v_id": "id2",
5
"attributes": {},
6
"v_type": "User"
7
},
8
{
9
"v_id": "id5",
10
"attributes": {},
11
"v_type": "User"
12
},
13
{
14
"v_id": "id4",
15
"attributes": {},
16
"v_type": "User"
17
},
18
{
19
"v_id": "id1",
20
"attributes": {},
21
"v_type": "User"
22
}
23
]
24
GSQL > SELECT * FROM User-(Liked)->User WHERE from_id=="id2"
25
[{
26
"from_type": "User",
27
"to_type": "User",
28
"directed": true,
29
"from_id": "id2",
30
"to_id": "id1",
31
"attributes": {},
32
"e_type": "Liked"
33
}]
Copied!

Modification Type 2: Altering the schema

The GSQL DELETE operation is a cascading deletion. If a vertex is deleted, then all of the edges which connect to it are automatically deleted as well.
Result: one fewer vertex and one fewer edge from id2.
The GSQL language supports four types of schema alterations:
  1. 1.
    Adding a new type of vertex or edge: ADD VERTEX | DIRECTED EDGE | UNDIRECTED EDGE
  2. 2.
    Removing a type of vertex or edge: DROP VERTEX | DIRECTED EDGE | UNDIRECTED EDGE
  3. 3.
    Adding attributes to a vertex or edge type: ALTER VERTEX vertex_type | EDGE edge_type ADD ATTRIBUTE (name type)
  4. 4.
    Removing attributes of a vertex or edge type: ALTER VERTEX vertex_type | EDGE edge_type DROP ATTRIBUTE (name)
To make schema changes, create a SCHEMA_CHANGE job. Running the SCHEMA_CHANGE JOB will automatically stop all services, update the graph store, and restart the service. For example, suppose we wish to add a name for Users and a weight to Liked edges to indicatehow much User A likes User B.
cf_mod2.gsql
1
CREATE GLOBAL SCHEMA_CHANGE JOB cf_mod2 {
2
ALTER VERTEX User ADD ATTRIBUTE (name string);
3
ALTER EDGE Liked ADD ATTRIBUTE (weight float DEFAULT 1);
4
}
5
RUN JOB cf_mod2
Copied!
As of v1.2, the schema_change job here needs to be GLOBAL because the User vertex and Liked edge are global types (they were defined before an active graph was set.)
Changing the schema may necessitate changing queries and other tasks, such as REST endpoints. In this example, the collaborative filtering query will still run with the the new weight attribute, but it will ignore the weight in its calculations.
Users vertices and Edges from id2, after Modification 2
1
GSQL > SELECT * FROM User LIMIT 1000
2
[
3
{
4
"v_id": "id2",
5
"attributes": {"name": ""},
6
"v_type": "User"
7
},
8
{
9
"v_id": "id5",
10
"attributes": {"name": ""},
11
"v_type": "User"
12
},
13
{
14
"v_id": "id4",
15
"attributes": {"name": ""},
16
"v_type": "User"
17
},
18
{
19
"v_id": "id1",
20
"attributes": {"name": ""},
21
"v_type": "User"
22
}
23
]
24
GSQL > SELECT * FROM User-(Liked)->User WHERE from_id=="id2"
25
[{
26
"from_type": "User",
27
"to_type": "User",
28
"directed": true,
29
"from_id": "id2",
30
"to_id": "id1",
31
"attributes": {"weight": 1},
32
"e_type": "Liked"
33
}]
Copied!

Modification Type 3: Modifying the attributes of existing objects

Now that we have added a weight attribute, we probably want to assign some weight values to the graph. The following example updates the weight values of two edges. For edge upserts, the first two arguments in the VALUES list specify the FROM vertex id and the TO vertex_id, respectively. Similarly, for vertex upserts, the first argument in the VALUES list specifies the PRIMARY_ID id. Since id values may not be updated, the GSQL shell implicitly applies a conditional test: "If the specified id value(s) exist, than update the non-id attributes in the VALUES list; otherwise, insert a new data record using these values."
cf_mod3.gsql
1
UPSERT User VALUES ("id1", "Aaron")
2
UPSERT User VALUES ("id2", "Bobbie")
3
UPSERT User-(Liked)->User VALUES ("id2","id1",2.5)
Copied!
Users vertices and Edges from id2, after Modification 3
1
GSQL > SELECT * FROM User LIMIT 1000
2
[
3
{
4
"v_id": "id2",
5
"attributes": {"name": "Bobbie"},
6
"v_type": "User"
7
},
8
{
9
"v_id": "id5",
10
"attributes": {"name": ""},
11
"v_type": "User"
12
},
13
{
14
"v_id": "id4",
15
"attributes": {"name": ""},
16
"v_type": "User"
17
},
18
{
19
"v_id": "id1",
20
"attributes": {"name": "Aaron"},
21
"v_type": "User"
22
}
23
]
24
GSQL > SELECT * FROM User-(Liked)->User WHERE from_id=="id2"
25
[{
26
"from_type": "User",
27
"to_type": "User",
28
"directed": true,
29
"from_id": "id2",
30
"to_id": "id1",
31
"attributes": {"weight": 2.5},
32
"e_type": "Liked"
33
}]
Copied!

Other Modes for Graph Updates

In addition to making graph updates within the GSQL Shell, there are two other ways: sending a query string directly to the Standard Data Manipulation REST API, or writing a custom REST endpoint. For details about the first method, see the GET, POST, and DELETE /graphendpoints in the RESTPP API User Guide . The functionality in GSQL and in the Standard Query API is the same; GSQL commands are translated into REST GET, POST, and DELETE requests and submitted to the Standard Query API.
The REST API equivalent of the GSQL Modification 3 upsert example above is as follows:
1
curl -X POST --data-binary @ data/cf_mod3_input.json http://hostName:9000/graph
Copied!
where serverIP is the IP address of your REST server (default = localhost) and data/cf_mod3_input.json is a text file containing the following JSON-encoded data:
cf_upsert.json
1
{
2
"vertices": {
3
"User":{
4
"id1":{
5
"name":{
6
"value":"Aaron"
7
}
8
}
9
},
10
"User":{
11
"id2":{
12
"name":{
13
"value":"Bobbie"
14
}
15
}
16
}
17
},
18
"edges": {
19
"User":{
20
"id2":{
21
"Liked":{
22
"User":{
23
"id1":{
24
"weight" : {
25
"value":2.5
26
}
27
}
28
}
29
}
30
}
31
}
32
}
33
}
Copied!

Example 2. Page Rank

This example shows the use of WHILE loop iteration, global variables , and the built-in outdegree attribute.
It is recommended that you do the Collaborative Filtering Use Case first, because it contains additional tips on running the TigerGraph system.
Remember that if you have a text file containing GSQL commands (e.g., commands.gsql), you can run it one of two ways:
  • From Linux: gsql commands.gsql
  • From inside the GSQL shell: @commands.gsql
To run a single command (such as DROP ALL):
  • From Linux: gsql 'DROP ALL'
  • From inside the GSQL shell: DROP ALL
Setting the working graph
If a graph has been defined, then all subsequent gsql commands need to specify which graph is being used. If your command file does not contain a "USE GRAPH" statement, then you can specify the graph when invoking gsql:gsql -g graph_name commands.gsql
1
gsql -g graph_name commands.gsql
Copied!
If you are always using the same graph, you can define a Linux alias to automatically include your graph name:
1
alias gsql='gsql -g graph_name'
Copied!
You can add this line to the .bashrc in your home directory so that the alias is defined each time you open a bash shell.

Step 1: Create a graph model.

In this example, there is only one type of vertex and one type of edge, and edges are directed.
pagerank_model.gsql
1
CREATE VERTEX Page (PRIMARY_ID pid string, page_id string)
2
CREATE DIRECTED EDGE Linkto (FROM Page, TO Page)
3
#CREATE GRAPH gsql_demo(*)
Copied!
Note how the Page vertex type has both a PRIMARY_ID and a page_id attribute. As will be seen in step 2, the same data will be loaded into both fields. While this seems redundant, this is a useful technique in TigerGraph graph stores. The PRIMARY_ID is not treated as an ordinary attribute. In exchange for high-performance storage, the PRIMARY_ID lacks some of the filtering and querying features available to regular attributes. The Linkto edge does not have any attributes. In general, a TigerGraph graph can have attributes on both vertices and edges, and it can also have different types of edges connecting the same two vertices.
The CREATE GRAPH command is commented out for the following reason:
Our examples have been designed to run either as individual graphs or merged together into one multi-application graph. The CREATE GRAPH command should be run only once, after all the vertex and edge types for all the examples have been created. (Naturally, every model uses unique vertex and edge names, to avoid conflicts.) In other words, run ' CREATE GRAPH gsql_demo(*) ' as a separate command after you have created all your vertex and edge types.
Please see the GSQL Language Reference which provides a more complete description of the graph schema definition language with additional examples .

Step 2: Load initial data

A similar graph to what was used for the Collaborative Filtering user-user network example can be used for an example here. That is, each row has two values which are node IDs, meaning that there is a connection from the first node to the second node. However, we will introduce a difference to demonstrate the flexibility of the TigerGraph loading system. We will modify the data file to use the tab character as a field separator instead of the comma.
pagerank_data.tsv
1
1 2
2
1 3
3
2 3
4
3 4
5
4 1
6
4 2
Copied!

Loading job:

Create your loading job and load the data.
Per Example: Load data into the graph (file: pagerank_load.gsql)
1
# define the loading job
2
CREATE LOADING JOB load_pagerank FOR GRAPH gsql_demo {
3
DEFINE FILENAME f;
4
LOAD f
5
TO VERTEX Page VALUES ($0, $0),
6
TO VERTEX Page VALUES ($1, $1),
7
TO EDGE Linkto VALUES ($0, $1)
8
USING SEPARATOR="\t";
9
}
10
11
# load the data
12
RUN LOADING JOB load_pagerank USING f="../pagerank/pagerank_data.tsv"
Copied!
The above loading job will read each line of the input file (pagerank_data.tsv), create one vertex based on the value in the first column (referenced as $0), another vertex based on the value in the second column ($1), and one edge pointing from the first vertex to the second vertex. If either of the two column values has already been seen before, that vertex won't be created. Instead the existing vertex will be used. For example, the first row of pagerank_data.tsv, will create two vertices, with ids 1 and 2, and one edge (1, 2). The second row, however, will create only one new vertex, id 3, and one edge (1, 3), because id 1 already exists.
Note how the LOAD statement specifies the SEPARATOR character is the tab character.

Step 3: Write a Graph-based query solution

GSQL includes not only data definition and simple inspection of the data, but also advanced querying which traverses the graph and which supports aggregation and iteration. This example uses iterations, repeating the computation block until the maximum score change at any vertex is no more than a user-provided threshold, or until it reaches a user-specified maximum number of allowed iterations. Note the arrow -> in the FROM clause used to represent the direction of a directed edge.
pagerank_query.gsql
1
CREATE QUERY pageRank (float maxChange, int maxIteration, float dampingFactor)
2
FOR GRAPH gsql_demo
3
{
4
# In each iteration, compute a score for each vertex:
5
# score = dampingFactor + (1-dampingFactor)* sum(received scores from its neighbors).
6
# The pageRank algorithm stops when either of the following is true:
7
# a) it reaches maxIterations iterations;
8
# b) max score difference of any vertex compared to the last iteration <= maxChange.
9
# @@ prefix means a global accumulator;
10
# @ prefix means an individual accumulator associated with each vertex
11
12
MaxAccum<float> @@maxDifference = 9999; # max score change in an iteration
13
SumAccum<float> @received_score = 0; # sum of scores each vertex receives from neighbors
14
SumAccum<float> @score = 1; # initial score for every vertex is 1.
15
16
AllV = {Page.*}; # Start with all vertices of type Page
17
WHILE @@maxDifference > maxChange LIMIT maxIteration DO
18
@@maxDifference = 0;
19
S = SELECT s
20
FROM AllV:s-(Linkto)->:t
21
ACCUM t.@received_score += s.@score/s.outdegree()
22
POST-ACCUM s.@score = dampingFactor + (1-dampingFactor) * s.@received_score,
23
s.@received_score = 0,
24
@@maxDifference += abs(s.@score - s.@score');
25
PRINT @@maxDifference; # print to default json result
26
END; # end while loop
27
#PRINT AllV.page_id, [email protected]; # print the results, JSON output API version v1
28
PRINT AllV[AllV.page_id, AllV.@score]; # print the results, JSON output API version v2
29
} # end query
Copied!
For JSON output API v2, the PRINT syntax for a vertex set variable is different than the v1 syntax.
After executing the CREATE QUERY command, remember to install the query, either by itself or together with other queries:
Install the query
1
INSTALL QUERY pageRank
Copied!
Run the query:
We will use the typical dampingFactor of 0.15, iterate until the pagerank values change by less than 0.001, up to a maximum of 100 iterations. For these conditions, the PageRank values for the 4 vertices (1,2,3,4) are ( 0.65551, 0.93379, 1.22156, 1.18914), respectively.
1
RUN QUERY pageRank(0.001, 100, 0.15)
2
{
3
"error": false,
4
"message": "",
5
"version": {
6
"edition": "developer",
7
"schema": 0,
8
"api": "v2"
9
},
10
"results": [
11
{"@@maxDifference": 0.425},
12
{"@@maxDifference": 0.36125},
13
{"@@maxDifference": 0.15353},
14
{"@@maxDifference": 0.19575},
15
{"@@maxDifference": 0.16639},
16
{"@@maxDifference": 0.09429},
17
{"@@maxDifference": 0.08014},
18
{"@@maxDifference": 0.05961},
19
{"@@maxDifference": 0.04705},
20
{"@@maxDifference": 0.03999},
21
{"@@maxDifference": 0.017},
22
{"@@maxDifference": 0.02},
23
{"@@maxDifference": 0.017},
24
{"@@maxDifference": 0.00953},
25
{"@@maxDifference": 0.0081},
26
{"@@maxDifference": 0.00616},
27
{"@@maxDifference": 0.00479},
28
{"@@maxDifference": 0.00407},
29
{"@@maxDifference": 0.00178},
30
{"@@maxDifference": 0.00205},
31
{"@@maxDifference": 0.00174},
32
{"@@maxDifference": 9.6E-4},
33
{"AllV": [
34
{
35
"v_id": "2",
36
"attributes": {
37
"AllV.page_id": "2",
38
"[email protected]": 0.93379
39
},
40
"v_type": "Page"
41
},
42
{
43
"v_id": "4",
44
"attributes": {
45
"AllV.page_id": "4",
46
"[email protected]": 1.18914
47
},
48
"v_type": "Page"
49
},
50
{
51
"v_id": "1",
52
"attributes": {
53
"AllV.page_id": "1",
54
"[email protected]": 0.65551
55
},
56
"v_type": "Page"
57
},
58
{
59
"v_id": "3",
60
"attributes": {
61
"AllV.page_id": "3",
62
"[email protected]": 1.22156
63
},
64
"v_type": "Page"
65
}
66
]}
67
]
68
}
Copied!

Step 4 (Optional): Update Your Data.

Details about updating were discussed in Use Case 1 (Collaborative Filtering). We will go right to the graph modification examples for the PageRank case.
To show the effect of each modification, we use two built-in queries. The first one lists all the Page vertices. The second one lists all the edges which start at Page 4.
pagerank_mod_check.gsql
1
SELECT * FROM Page LIMIT 1000
2
SELECT * FROM Page-(Linkto)->Page WHERE from_id=="4"
Copied!
These are the results of the diagnostic queries, before any graph modifications. There are 4 vertices total and 2 edges which start at page 4.
Page vertices and Linkto edges from Page 4, before modifications
1
SELECT * FROM Page LIMIT 1000
2
[
3
{
4
"v_id": "2",
5
"attributes": {"page_id": "2"},
6
"v_type": "Page"
7
},
8
{
9
"v_id": "4",
10
"attributes": {"page_id": "4"},
11
"v_type": "Page"
12
},
13
{
14
"v_id": "1",
15
"attributes": {"page_id": "1"},
16
"v_type": "Page"
17
},
18
{
19
"v_id": "3",
20
"attributes": {"page_id": "3"},
21
"v_type": "Page"
22
}
23
]
24
SELECT * FROM Page-(Linkto)->Page WHERE from_id=="4"
25
[
26
{
27
"from_type": "Page",
28
"to_type": "Page",
29
"directed": true,
30
"from_id": "4",
31
"to_id": "2",
32
"attributes": {},
33
"e_type": "Linkto"
34
},
35
{
36
"from_type": "Page",
37
"to_type": "Page",
38
"directed": true,
39
"from_id": "4",
40
"to_id": "1",
41
"attributes": {},
42
"e_type": "Linkto"
43
}
44
]
Copied!

Modification 1: Adding or deleting

Adding is simply running a loading job again with a new data file.
Deleting: Suppose we want to delete vertex url4 and all its connections:
pagerank_mod1.gsql
1
DELETE FROM Page WHERE page_id=="1"
Copied!
The GSQL DELETE operation is a cascading deletion. If a vertex is deleted, then all of the edges which connect to it are automatically deleted as well.
Result: one fewer vertex and one fewer edge from Page 4.
Page vertices and Linkto edges from Page 4, after Modification 1
1
SELECT * FROM Page LIMIT 1000
2
[
3
{
4
"v_id": "2",
5
"attributes": {"page_id": "2"},
6
"v_type": "Page"
7
},
8
{
9
"v_id": "4",
10
"attributes": {"page_id": "4"},
11
"v_type": "Page"
12
},
13
{
14
"v_id": "3",
15
"attributes": {"page_id": "3"},
16
"v_type": "Page"
17
}
18
]
19
SELECT * FROM Page-(Linkto)->Page WHERE from_id=="4"
20
[{
21
"from_type": "Page",
22
"to_type": "Page",
23
"directed": true,
24
"from_id": "4",
25
"to_id": "2",
26
"attributes": {},
27
"e_type": "Linkto"
28
}]
Copied!

Modification Type 2: Altering the schema

For example, suppose we wish to add an attribute to the Page vertices to classify what type of Page it is and also a date to the edges.
pagerank_mod2.gsql
1
CREATE GLOBAL SCHEMA_CHANGE JOB pagerank_mod2 {
2
ALTER VERTEX Page ADD ATTRIBUTE (pageType string DEFAULT "");
3
ALTER EDGE Linkto ADD ATTRIBUTE (dateLinked string DEFAULT "");
4
}
5
RUN JOB pagerank_mod2
Copied!
Changing the schema may necessitate revising and reinstalling loading jobs and queries. In this case, adding the pageType attribute does not harm the pageRank query.
This schema_change job is GLOBAL because the Page vertex and Linkto edge types are global (defined before setting an active graph).
Page vertices and Linkto edges from Page 4, after Modification 2
1
SELECT * FROM Page LIMIT 1000
2
[
3
{
4
"v_id": "2",
5
"attributes": {
6
"page_id": "2",
7
"pageType": ""
8
},
9
"v_type": "Page"
10
},
11
{
12
"v_id": "4",
13
"attributes": {
14
"page_id": "4",
15
"pageType": ""
16
},
17
"v_type": "Page"
18
},
19
{
20
"v_id"