Simple Graph Queries

In this post we saw how to create some graph tables with data. In this I will explore simple queries off of this data and how they compare with their relational counterparts.

The main goal behind a graph design is to help you answer queries – so what are the questions you’d ask of a movie database, if you had one? Mine would typically be like below.

1 Who are the actors in this movie?
2 Who is this movie directed by?
3 Who is the most prolific actor, according this dataset?
4 How many actors are also directors?
..and so on.
Lets answer these one by one, and see how they compare relationally. So if i were to answer the first question the typical relational way – my query would be as below:

SELECT c.actor_name from movies a, moviesactor b, actor c

WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID

AND a.Movie_Title = 'Jurassic Park'

This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.

SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a

where MATCH(m-(a)->p) AND m.movietitle = 'Jurassic Park'

The queries for other questions, with their relational counterparts, are as below.

--Most prolific actor

SELECT TOP 10 c.actor_name,COUNT(1) AS moviesactedin from movies a, moviesactor b, actor c

WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID GROUP BY c.actor_name ORDER BY moviesactedin desc

SELECT TOP 10 p.personname,count(1) AS moviesactedin FROM dbo.personnode p, movienode m,moviesactorlink a

where MATCH(m-(a)->p) GROUP BY p.personname ORDER BY moviesactedin desc

--2 Actors who are directors

SELECT c.actor_name,a.Movie_Title from movies a 

INNER JOIN moviesactor b

ON a.MovieId = b.movieid

INNER JOIN actor c 

ON b.actorid = c.ActorID

INNER JOIN MoviesDirector d

ON a.MovieId = d.movieid 

INNER JOIN director e ON

d.directorid = e.directorid

AND e.director_name = c.actor_name

SELECT p1.personname, m.movietitle FROM personnode p1, movienode m, moviesactorlink a,moviesdirectorlink d

WHERE MATCH(m-(d)->p1 AND m-(a)->p1)

The advantages are
1 Fewer number of tables
2 Easy to write as opposed to a lot of joins.

The node table usually has a seek operator on it, but edge tables are scanned since it is not possible (currently) to create an index on edge id. I will explore the most useful part of this feature – shortest path, in the next post. Thanks for reading!