In the last post I briefly covered the history of graph databases. In this one I am going to look at the structure. A lot of blog posts have been written about the structure of graph tables in SQL Server. I really like this series by my friend Niko Neugebauer.
The simplest way to understand a graph data model is that there are just two building blocks – Nodes, which are what we call entities in the relational world, and Edges, which are what we call relationships. They are typically represented like below, with the circles standing for nodes and the arrows for relationships. The emphasis, as we can see, is on the bold arrows, because relationships are what graph data is about, with less emphasis on entities/nodes.
To illustrate with an example – I took the free movie dataset from IMDB and designed it the relational way. So, I have a few tables – movies, actors, directors and such, connected like below.
Relational Data Model
Why is this a good candidate for graph data?
1. It has more than one many-to-many relationship, with the candidate tables having significant amounts of data.
2. The nature of the relationships is worth querying on – for example, how many directors are also actors, which actors have co-starred with each other and in how many movies, what is the shortest path from one actor to another, and so on.
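To make those questions concrete, here is a minimal Python sketch that is not tied to any graph database product. The people, the movie titles, and the helper names (`directors_who_act`, `costar_counts`, `shortest_path`) are all made up for illustration and are not taken from the IMDB dataset; the only point is that each question is answered by walking relationships.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical sample data: each pair is one relationship (person, movie).
acted_in = [
    ("Ann", "Movie 1"), ("Bob", "Movie 1"),
    ("Bob", "Movie 2"), ("Cat", "Movie 2"),
    ("Ann", "Movie 3"), ("Bob", "Movie 3"),
    ("Cat", "Movie 4"), ("Dan", "Movie 4"),
]
directed = [("Bob", "Movie 4"), ("Eve", "Movie 1")]


def directors_who_act(acted_in, directed):
    """People who appear in both the 'directed' and 'acted in' relationships."""
    return {p for p, _ in directed} & {p for p, _ in acted_in}


def cast_by_movie(acted_in):
    """Group actors by the movie they appeared in."""
    cast = {}
    for person, movie in acted_in:
        cast.setdefault(movie, set()).add(person)
    return cast


def costar_counts(acted_in):
    """Count, for every pair of actors, how many movies they share."""
    counts = Counter()
    for people in cast_by_movie(acted_in).values():
        for a, b in combinations(sorted(people), 2):
            counts[(a, b)] += 1
    return counts


def shortest_path(acted_in, start, goal):
    """Breadth-first search over co-star links; returns a list of people or None."""
    neighbours = {}
    for people in cast_by_movie(acted_in).values():
        for p in people:
            neighbours.setdefault(p, set()).update(people - {p})
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In a relational design each of these would need joins through the many-to-many bridge tables; here the relationship pairs are the data being traversed.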
If I want to redesign this as a graph model, the main thing to remember is that graph data is largely implemented in NoSQL products, so there is no ANSI-like standard to stick to. We’d have to make our own rules. At the design stage, the main rule is ‘design around relationships.’ So, think in terms of the verbs: ‘who produced what’, ‘who acted in what’, ‘who directed what’. All of these make our edges, or arrows. Then we can see what those arrows connect: the two nodes, Person and Movies. Those are our entities. So, my graph data model of the same database looks like below.
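As a rough illustration of ‘design around relationships’, the same idea can be sketched in a few lines of Python. This is not SQL Server syntax and not the model from the diagram; the `Node` and `Edge` classes, the verb strings, and the sample people and movie are all assumptions made up for this sketch. The node labels follow the two node names used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # "Person" or "Movies"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node  # who
    verb: str     # the relationship: PRODUCED, ACTED_IN, DIRECTED
    target: Node  # what


# Hypothetical sample nodes.
ann = Node("Person", "Ann")
bob = Node("Person", "Bob")
film = Node("Movies", "Movie 1")

# One edge per 'who did what' statement.
edges = [
    Edge(ann, "ACTED_IN", film),
    Edge(bob, "DIRECTED", film),
    Edge(bob, "PRODUCED", film),
    Edge(bob, "ACTED_IN", film),
]


def who(verb, movie, edges):
    """Names of the people connected to a movie by the given verb."""
    return sorted(e.source.name for e in edges
                  if e.verb == verb and e.target == movie)


def roles_of(person, edges):
    """Distinct verbs that start at the given person."""
    return sorted({e.verb for e in edges if e.source == person})
```

Note that a person who both directs and acts is still a single `Person` node with two edges, rather than a row in each of two separate tables.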
In the next post we can look at how to create SQL graph tables and query them with this model. Thanks for reading!
2 thoughts on “Graph Data – Basic Structure”
” so there is no ANSI like standard to stick to” – but what about SPARQL?
SPARQL is a query language. ANSI SQL sets the standard for querying relational databases; we don’t have anything equivalent for querying graph databases. Any graph implementation is a combination of the database and the query language; these are two parts, which may come from the same provider or from different ones. Also, not all graph implementations are RDF-based, which SPARQL is. More on that in posts to come.