August 2019 – Curious..about data

Graph Data – Basic Structure

In the last post I briefly covered the history around graph databases. In this one I am going to look at the structure. There have been a lot of blog posts written about the structure of graph tables in SQL Server. I really like this series by my friend Niko Neugebauer.

The simplest way to understand a graph data model is that there are just two building blocks – Nodes, which are what we call entities in the relational world, and Edges, which are what we call relationships. They are typically represented like below, with the circles standing for nodes, and the arrows for relationships. The emphasis, as we can see, is on the bold arrows – because relationships are what graph data is about, with less emphasis on entities/nodes.

To illustrate with an example – I took the free movie dataset from IMDB and designed it the relational way. So, I have a few tables – movies, actors, directors and such, connected like below.

Relational Data Model

Why is this a good candidate for graph data?

1 It has more than one many-to-many relationship, with the candidate tables having significant amounts of data.
2 The relationships themselves are worth querying on – for example, how many directors are also actors, which actor has co-starred with which actor in how many movies, what is the shortest way to reach one actor from another, and so on.

If I want to redesign this in a graph model – the main thing to remember is that the concept of graph data is largely implemented as NoSQL, so there is no ANSI-like standard to stick to. We’d have to make our own rules. At the designing stage, the main rule is ‘design around relationships.’ So, think in terms of the verbs: ‘who produced what’, ‘who acted in what’, ‘who directed what’. All of these make our edges, or arrows. Then we can see what those arrows connect – the two nodes, Person and Movie. Those are our entities. So, my graph data model of the same database looks like below.
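To make the ‘design around relationships’ idea concrete, here is a minimal sketch in plain Python rather than SQL Server graph tables. The sample people and movies are made up for illustration and are not from the IMDB dataset, and the helper names (`directors_who_act`, `costar_counts`) are hypothetical. It only shows the shape of the model: two node collections, and one edge collection per verb.

```python
from collections import Counter
from itertools import combinations

# Node collections (hypothetical sample data, not the IMDB dataset).
persons = {1: "Alice", 2: "Bob", 3: "Carol"}
movies = {10: "First Film", 11: "Second Film"}

# Edge collections: one list per verb, each edge is (person_id, movie_id).
acted_in = [(1, 10), (2, 10), (1, 11), (2, 11), (3, 11)]
directed = [(1, 11), (3, 10)]
produced = [(3, 10)]


def directors_who_act():
    """Names of people who have both an acted_in and a directed edge."""
    actors = {person for person, _ in acted_in}
    directors = {person for person, _ in directed}
    return sorted(persons[p] for p in actors & directors)


def costar_counts():
    """Count, for each pair of actors, how many movies they share."""
    cast = {}
    for person, movie in acted_in:
        cast.setdefault(movie, set()).add(person)
    counts = Counter()
    for members in cast.values():
        # Every unordered pair within one cast has co-starred once.
        for a, b in combinations(sorted(members), 2):
            counts[(persons[a], persons[b])] += 1
    return counts


if __name__ == "__main__":
    print(directors_who_act())
    print(costar_counts())
```

Both questions are answered by walking edges only; the node collections are touched just to look up names, which is the sense in which the model puts relationships first.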

In the next post we can look at how to create SQL graph tables and query on them with this model. Thanks for reading!

Graph Databases – Introduction

I have been looking into this feature and also into understanding graph data in general. I believe the introduction of the graph database feature in SQL Server has many advantages, although I also believe it is important to understand the background, the origin, and how it was done before. In this series I will start with the history and cover several ways it was done before we got to where we are now.

Origins of Graph Theory: The theory behind graph data is old and goes back to what is popularly known as the Königsberg Bridge Problem. The problem has been narrated several times – in short, it was about how to walk through a town with 4 land masses (2 islands, or 4 if you include the two banks of the river) connected by 7 bridges, crossing each bridge exactly once. A mathematician named Leonhard Euler took it up and came up with the mathematical concepts of ‘nodes’ (land masses), ‘edges’ (the connections between nodes, here the bridges), and ‘degree’ (the number of edges coming out of each node). Euler’s theory, put very simply, says that if a configuration has more than two nodes with odd degree, then you cannot traverse the graph crossing every edge exactly once. This laid the foundation for graph data and what we have now.
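Euler’s degree rule is easy to check by hand, but a small Python sketch makes it explicit. The labels A to D for the land masses are my own, and the function assumes the graph is connected (which is true for the bridges), so treat it as an illustration rather than a general-purpose routine.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are assumptions for illustration: A is the central island,
# B and C are the two river banks, D is the second island.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Number of edges touching each node."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return dict(count)


def can_cross_every_edge_once(edges):
    """Euler's rule for a connected graph: a walk that uses every edge
    exactly once exists only if 0 or 2 nodes have odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


if __name__ == "__main__":
    print(degrees(bridges))
    print(can_cross_every_edge_once(bridges))
```

All four land masses end up with an odd degree (5, 3, 3 and 3), which is more than two, so the walk is impossible.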

How is it different from relational data? Everything we do with graph data modeling and querying can be done in the relational world. But to classify data as graph data it has to be a certain way. Graph data has way more relationships than it does entities. I like to use the simple example of social media followers. I am an entity, for example, with 100 followers. My friend has 200 followers. Between us we have 50 friends in common. Those friends in turn have friends in common with me. And so on. If you have to model this in the relational world, the number of relationships quickly becomes too large and difficult to represent, let alone query on. This is the kind of problem that graph data modeling and querying helps us deal with.
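As a rough sketch of the follower example, each person can be held as a node with a set of follower ids, and ‘in common’ becomes a set intersection. The ids below are invented purely so the counts match the numbers in the paragraph (100, 200 and 50), and `common_followers` is a hypothetical helper name.

```python
# Each key is a person (node); each value is the set of follower ids,
# i.e. the edges pointing at that person. Ids are made up so that the
# sizes match the example: 100 followers, 200 followers, 50 shared.
followers = {
    "me": set(range(0, 100)),
    "friend": set(range(50, 250)),
}


def common_followers(a, b):
    """Followers that two people share."""
    return followers[a] & followers[b]


if __name__ == "__main__":
    print(len(followers["me"]))
    print(len(followers["friend"]))
    print(len(common_followers("me", "friend")))
```

Two entities already carry 300 relationship entries between them, and each shared follower has a follower set of its own, which is where the relationship count starts to dwarf the entity count.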

What are some common examples of graph data? Graph data is all around us…some of the most common examples are Chart of Accounts, Organizational Charts, Transportation Systems (GPS), Bill of Materials and social media connections.

In the next post I will discuss some examples of data modeling with graph data, and the specific problems we can solve, and algorithms we can apply, by modeling it this way.

PASS Summit 2019 – Getting the most out of it

The session lineup for PASS Summit 2019 was announced today. There are so many good sessions to go to, and managing what we do with our limited time there is an important skill. I’d say it is partly skill and partly luck to get the most out of it. We can control the skill part, so let’s see how.

1 If I am sponsored by my job, I’d consider the top sessions that I can go to that would add value to what I do at work. So let us say PowerShell is one of them – I would shortlist all the sessions on PowerShell, decide which ones would add the most value to what I am doing, and attend those. One of the key things I’ve learned here is that a session being beginner level on something I know does not necessarily disqualify it. It may not be the best place for me to learn, but some beginner sessions are very creatively done, and can often offer new insights into something I already know. It also depends on the speaker who is presenting – I know certain speakers whose sessions I will attend, no matter what, which brings me to the next point.

2 Check the schedule for my favorite speakers and what they are presenting on. There are too many this year but I try to go to as many as I possibly can.

3 If I have skills I need to or like to learn personally – I try to attend all those sessions too. I have an ongoing list of those and keep up as much as I can.

Two of the challenges I face every year are as below:

1 The session I want to go to is cancelled, the room is full, or it is not as good as I expected – in all of these cases I go to the next one I planned to. (Always have a backup plan.) It does not hurt at all though to just stand in the hallway and talk to someone I’ve not seen in a long time. Or visit the exhibit hall, or the community zone. All of these activities are part of the summit.

2 I feel too tired to walk to the session I want to go to – this is not at all uncommon when doing back-to-back sessions and the next session is at the other end of the conference center. Even if I manage the sprint, there may not be room for me unless it is one of the larger rooms. It is for reasons like this that we have summit recordings – this time they seem to cost extra (they are usually free for summit attendees). I would still invest in them to listen to all the talks I missed.

A last few tips: attend the keynotes, visit the vendor area and go to the after-hours parties. Also don’t forget to have your business cards on you; you can order them yourself if your company doesn’t provide them. It is inexpensive and very worth it. Networking is a very important part of the summit – probably more important than going to sessions. Get rest as appropriate for you – there is no point having a very detailed list of sessions to attend if you feel drowsy or fall asleep in them. Stay out for the parties, but watch that you don’t overdo it and defeat the purpose of being there. Wear good walking shoes, and drink plenty of water. Hope this was helpful, and have fun!!