SQL/PGQ
SQL/PGQ is a graph query language built on top of SQL, designed to bring graph pattern matching capabilities to both seasoned SQL users and those new to graph technology. Standardized by the International Organization for Standardization (ISO), it offers a declarative approach to querying property graphs, which store nodes, edges, and properties.
The language features a visual graph syntax inspired by Cypher while also supporting traditional SQL syntax, easing the transition for SQL users. With SQL/PGQ, you can query property graphs to:
- Discover paths between nodes
- Identify specific graph patterns
- Calculate the shortest path between two nodes
See here for a list of resources related to SQL/PGQ.
Loading data
Starting with an empty DuckDB database, load the Person
and Person_knows_person
tables from the LDBC SNB dataset using the following commands:
CREATE TABLE Person AS SELECT * FROM 'https://gist.githubusercontent.com/Dtenwolde/2b02aebbed3c9638a06fda8ee0088a36/raw/8c4dc551f7344b12eaff2d1438c9da08649d00ec/person-sf0.003.csv';
CREATE TABLE Person_knows_person AS SELECT * FROM 'https://gist.githubusercontent.com/Dtenwolde/81c32c9002d4059c2c3073dbca155275/raw/8b440e810a48dcaa08c07086e493ec0e2ec6b3cb/person_knows_person-sf0.003.csv';
Creating the property graph
Next, create a property graph, which is persistent across database sessions and automatically reflects changes made to the underlying data. Similar to a VIEW
, the property graph provides a layer for querying graph structures, ensuring that updates to the base tables are immediately reflected in the graph representation. For more details, refer to Property graph.
Use the following command to define the property graph:
CREATE PROPERTY GRAPH snb
VERTEX TABLES (
Person
)
EDGE TABLES (
Person_knows_person
SOURCE KEY ( person1id ) REFERENCES Person ( id )
DESTINATION KEY ( person2id ) REFERENCES Person ( id )
LABEL Knows
);
If successful, you will see the following confirmation, allowing you to execute queries using SQL/PGQ syntax on the created property graph:
Pattern matching queries
SQL/PGQ uses a visual graph syntax, inspired by Cypher. Vertex elements are denoted by ()
and edge elements are denoted by []
. Here is an example of a pattern-matching query where we find friends of Jan and return their first names.
FROM GRAPH_TABLE(snb
MATCH (a:Person WHERE a.firstName = 'Jan')-[k:Knows]->(b:Person)
COLUMNS (b.firstName)
);
The result will be:
┌───────────┐
│ firstName │
│ varchar │
├───────────┤
│ Ali │
│ Otto │
│ Bryn │
│ Hans │
└───────────┘
Thanks to DuckDB’s friendlier syntax (blog post 1, blog post 2) we can omit the SELECT
and instead use the COLUMNS
clause.
The previous query featured a right-directed edge, ()-[]->()
, meaning that the left node pattern is the source, and the right is the destination.
DuckPGQ also supports the following edge types:
- Left-directed edge:
()<-[]-()
- The source of the edge is on the right side, the destination is on the left side.
- Any-directed edge:
()-[]-()
- The relationship can exist in any direction.
- Left-right-directed edge:
()<-[]->()
- The relationship must exist in both directions.
OPTIONAL MATCH
is currently not supported but will be in a future update.
Path-finding
Another significant feature of SQL/PGQ is the introduction of a more concise syntax for path-finding within a query. This enables us to find the shortest path length between any pairs of nodes in the graph.
DuckPGQ only supports finding ANY SHORTEST
path between nodes, which is non-deterministic.
To query the shortest path length between Jan and the first five persons sorted alphabetically, use the following query:
FROM GRAPH_TABLE (snb
MATCH p = ANY SHORTEST (a:Person WHERE a.firstName = 'Jan')-[k:knows]->+(b:Person)
COLUMNS (path_length(p), b.firstName)
)
ORDER BY firstName
LIMIT 5;
The result will be:
┌────────────────┬─────────────┐
│ path_length(p) │ firstName │
│ int64 │ varchar │
├────────────────┼─────────────┤
│ 3 │ Abdul Haris │
│ 2 │ Aleksandr │
│ 2 │ Alexei │
│ 2 │ Ali │
│ 1 │ Ali │
└────────────────┴─────────────┘
The previous query showed the +
syntax, which is syntactic sugar for finding the paths with a lower bound of 1, and no upper bound. This can also be denoted as ()-[]->{1,}()
.
Other options for path-finding are:
- Kleene star
*
: Lower bound of 0, no upper bound. {n, m}
: Lower bound ofn
(wheren > 0
) and upper bound ofm
(wherem ≥ n
).{,m}
: Lower bound of 0, upper bound ofm
.{n,}
: Lower bound ofn
, no upper bound.
Retrieving the path
DuckPGQ also allows you to retrieve the rowid
’s of the nodes and edges that are on the shortest path by adding element_id(<path variable>)
in the COLUMNS
clause.
Other options are:
vertices(<path variable>)
: Returns therowid
’s of the vertices on the shortest path.edges(<path variable>)
: Returns therowid
’s of the edges on the shortest path.path_length(<path variable>)
: Returns the path length of the shortest path.
The following query shows an example:
FROM GRAPH_TABLE (snb
MATCH p = ANY SHORTEST (a:Person WHERE a.firstName = 'Jan')-[k:knows]->+(b:Person)
COLUMNS (element_id(p), vertices(p), edges(p), path_length(p), b.firstName)
)
ORDER BY firstName
LIMIT 5;
The result will be:
┌───────────────────────────┬────────────────┬─────────────┬────────────────┬─────────────┐
│ element_id(p) │ vertices(p) │ edges(p) │ path_length(p) │ firstName │
│ int64[] │ int64[] │ int64[] │ int64 │ varchar │
├───────────────────────────┼────────────────┼─────────────┼────────────────┼─────────────┤
│ [1, 3, 5, 22, 26, 66, 44] │ [1, 5, 26, 44] │ [3, 22, 66] │ 3 │ Abdul Haris │
│ [1, 5, 33, 79, 39] │ [1, 33, 39] │ [5, 79] │ 2 │ Aleksandr │
│ [1, 3, 5, 24, 32] │ [1, 5, 32] │ [3, 24] │ 2 │ Alexei │
│ [1, 3, 5, 21, 21] │ [1, 5, 21] │ [3, 21] │ 2 │ Ali │
│ [1, 3, 5] │ [1, 5] │ [3] │ 1 │ Ali │
└───────────────────────────┴────────────────┴─────────────┴────────────────┴─────────────┘