An Introduction
To begin, a quick introduction to basic database design. Most databases are a collection of tables that hold specific information, plus fields describing how each table relates to the others. The most common relationship is one-to-many: for example, an author has written many books and each book has one author. Can you see an immediate issue with this relational design? Some books have several authors. We could add extra author fields to a book record (author1, author2, author3 and so on), but these are hardwired and require a lot of code to maintain the links. What happens if a book has no author? Or if an author also writes under a nom de plume?
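To make the one-to-many limitation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and chosen only for illustration; the point is that a book row has exactly one author slot.

```python
import sqlite3

# Hypothetical schema: names and sample data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)  -- one author per book
    );
""")
conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                 [(1, "Author A"), (2, "Author B")])
# "Joint Book" was co-written by Author A and Author B,
# but the row can only point at one of them.
conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                 [("First Book", 1), ("Second Book", 1), ("Joint Book", 1)])


def books_by(author_name):
    """Return the titles linked to an author through the single author_id field."""
    rows = conn.execute(
        "SELECT book.title FROM book "
        "JOIN author ON author.id = book.author_id "
        "WHERE author.name = ? ORDER BY book.id",
        (author_name,),
    )
    return [title for (title,) in rows]


print(books_by("Author A"))  # ['First Book', 'Second Book', 'Joint Book']
print(books_by("Author B"))  # [] -- the co-author is invisible to the query
```

Recording Author B here would mean changing the table structure and every query that relies on it, which is the maintenance burden described above.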
Adding flexibility
With a graph database setup you still have the same kinds of data, but the relationships between items are more flexible. Each connection is described on the connection itself, so a link between a book and a person can be labelled as author, alias, contributor, illustrator, editor or any other type of connection that may occur. Because these connections are dynamic, you don’t need to know all of the potential ways your data may connect when you are setting it up; you can define new connections between items if and when required.
Nodes and edges
In graph database terminology, the people and books are called nodes and the connections between them are called edges. Each node has specific properties (metadata) on it, for example a date of birth for a person. Each edge has properties describing the join between items, for example whether the person is the author or the illustrator of our book. There is no limit to the number of edges (joins) you can make between nodes, nor any restriction on what kind of connections they are. With a standard database, each connection is essentially hard coded when the database is created, and items that fall outside of the norm tend to be poorly managed and non-discoverable. You cannot just add new connections to accommodate these items on the fly; you need to edit the database structure and the underlying code to make changes. In a graph database the connections can be made up as you go along, so your system expands as your data grows.
A data challenge
Here’s an example of how something fairly simple to say is nearly impossible to do in a standard database: music. Mapping songs to albums is easy enough, even with greatest hits and live recordings added to the mix. Add in the singer and songwriter on each song for each album and we’re getting richer metadata, and it’s still possible. We now have several tables covering the bands, albums, songs and people, with a variety of connections between them, and we’re all good. However, the next stage is where it falls apart.
Scenario time:
How many different and yet distinct connections can you think of in this situation? In 1992, after the death of Freddie Mercury, Queen is playing a tribute concert to over 70,000 people. The remaining members of the band, along with other performers including Robert Plant, Elton John, David Bowie, George Michael and Annie Lennox, perform a variety of Queen songs. Guns N’ Roses, Def Leppard and Metallica also get on stage and play their tributes to Freddie Mercury. Data-wise, we have:
- (Most of) a band (Queen) playing a tribute to another band member who isn’t there
- Queen (thankfully) play their own songs, but some other musicians contribute the vocals
- Musicians from other bands may join Queen on stage, playing various instruments
- In some cases a completely different band plays one of Queen’s songs
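
The connections listed above can be sketched as a tiny property graph in plain Python. This is only an illustration under stated assumptions: the `Graph` class, the edge labels (`member_of`, `tribute_to`, `song_by`, `played`, `sang`) and the `performers_of` helper are hypothetical and do not come from any graph database product. The specific song pairings are included as examples and should be checked against a setlist before being reused as data.

```python
from collections import namedtuple

# Assumption: a toy in-memory property graph, not the API of any graph database.
Edge = namedtuple("Edge", ["source", "kind", "target", "props"])


class Graph:
    def __init__(self):
        self.nodes = {}  # node name -> dict of properties (metadata)
        self.edges = []  # every connection, each carrying its own description

    def add_node(self, name, **props):
        self.nodes[name] = props

    def add_edge(self, source, kind, target, **props):
        # 'kind' is a free-form label, so a new type of connection
        # needs no change to any schema.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.edges.append(Edge(source, kind, target, props))

    def incoming(self, target, kinds):
        """Edges of the given kinds that point at target."""
        return [e for e in self.edges if e.target == target and e.kind in kinds]


g = Graph()
for name, node_type in [
    ("Queen", "band"), ("Def Leppard", "band"),
    ("Freddie Mercury", "person"), ("George Michael", "person"),
    ("Somebody to Love", "song"), ("Now I'm Here", "song"),
]:
    g.add_node(name, type=node_type)
g.add_node("Tribute Concert", type="event", year=1992)

concert = "Tribute Concert"
g.add_edge("Freddie Mercury", "member_of", "Queen")
g.add_edge(concert, "tribute_to", "Freddie Mercury")
g.add_edge("Somebody to Love", "song_by", "Queen")
g.add_edge("Now I'm Here", "song_by", "Queen")
# Queen play their own song while a guest contributes the vocals.
g.add_edge("Queen", "played", "Somebody to Love", event=concert)
g.add_edge("George Michael", "sang", "Somebody to Love", event=concert)
# A completely different band plays one of Queen's songs.
g.add_edge("Def Leppard", "played", "Now I'm Here", event=concert)


def performers_of(graph, song, event):
    """Names of everyone who played or sang the song at the event."""
    edges = graph.incoming(song, kinds={"played", "sang"})
    return sorted({e.source for e in edges if e.props.get("event") == event})


print(performers_of(g, "Somebody to Love", concert))  # ['George Michael', 'Queen']
print(performers_of(g, "Now I'm Here", concert))      # ['Def Leppard']
```

Each new kind of relationship, such as a guest guitarist, would be one more `add_edge` call with a new label, with no change to the structure that holds the data.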