In the previous post on this we saw what is a basic graph data structure, and the layout of an example – a movie database with it. In this I am going to explain how to convert that design into graph tables.
The SQL Server graph architecture is explained really well here.
I already have some relational tables with data for this model. I am looking at converting that into a graph data model like below
DROP TABLE IF EXISTS PersonNode;
CREATE TABLE PersonNode (
PersonID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
PersonName varchar(500) ) AS NODE;
Now I get a node table..i’d like to see what the node table looks like.
I have highlighted in red what SQL Server adds to the table – the two system columns – graph id, which is bigint, and node id, which is nvarchar and stores json, and the unique index to help with queries.
We can also see from constraint type that this table is similar to other relational tables – it can be enabled for replication and can have related delete or update actions defined on it if need be.
We can now insert data into this table from relational tables as below.
INSERT INTO PersonNode (PersonName)
SELECT actor_name FROM MovieData.dbo.Actor
SELECT director_name FROM MovieData.dbo.Director
Selecting data from the table gives us below. It is interesting and informative to see how node id is stored as JSON.
SELECT * FROM PersonNode
We can create the movie node similarly, as below.
DROP TABLE IF EXISTS MovieNode;
CREATE TABLE MovieNode (
MovieID INT PRIMARY KEY,
MovieTitle varchar(500) NOT NULL,
MovieLanguage varchar(500) NULL,
MovieCountry varchar(500) NULL,
MovieFacebookLikes INT NULL
) AS NODE;
INSERT INTO MovieNode (movieid,movietitle,movielanguage,moviecountry,moviefacebooklikes)
SELECT [MovieId],[Movie_Title],[Language],[Country],movie_facebook_likes FROM MovieData.[dbo].[movies]
Now that we have the two basic nodes, lets try and create the edges or connections between them.
DROP TABLE IF EXISTS MoviesActorLink;
CREATE TABLE MoviesActorLink (
Link BIT NOT NULL DEFAULT 0,
MovieActorLevel smallint NULL,
MovieActorFacebooklikes DECIMAL NULL,
CONSTRAINT EDG_MoviesWithActor CONNECTION (MovieNode TO PersonNode)
) AS EDGE;
Let us explore what an edge table looks like under the covers.
As highlighted in the red square on top, there are a whole bunch of system columns added for an edge table – the first two, graph id and edge id are unique identifying columns while the rest are for the nodes we plan to connect using this edge. There is also a default index added on graph id.
I chose to add a constraint on two nodes i plan to connect – movie node and person node. In case anyone wants to delete data from either of the node tables – there should not be an orphaned record in the edge table that points to the deleted node. This constraint can come in very useful.
Now I am ready to insert data into the edge table from relational table. I do this as below.
INSERT INTO MoviesActorLink ($from_id, $to_id,movieactorlevel,movieactorfacebooklikes)
SELECT a.$node_id, p.$node_id,c.actor_level,c.actor_facebook_likes FROM dbo.MovieNode a
INNER JOIN MovieData.dbo.MoviesActor c ON a.movieid = c.movieid
INNER JOIN MovieData.dbo.Actor b ON c.actorid = b.actorid
INNER JOIN dbo.PersonNode p ON b.actor_name = p.personname
The data in my the newly created edge table can look as below.
SELECT * FROM dbo.MoviesActorLink
As we can see the node and edge ids are all json based pointing to existing nodes. In the next post I will explore running simple queries off of this data. Thanks for reading.