Intro to Knowledge Graphs

AI is very good at faking understanding. Using statistical methods to identify correlations, it can predict the sale price of a house, turn spoken words into text, and identify whether a picture contains a dog. But do these algorithms understand what a house is and why a larger house will sell for more money than a smaller one? Do they understand that the sounds they are transcribing are humans speaking? Do they know what a dog is? The short answer is no.

At a very high level, these algorithms convert inputs into math and identify how these inputs are correlated to an output. To a machine learning algorithm, a sentence is just a string of alphanumeric characters and a picture is just a collection of RGB values. To a large degree this is all we need. Algorithms don’t need to know that a dog is an animal to identify a dog in a picture. But what happens if we need some understanding of context? What happens if we need to know that a dog is an animal or that the Wizard of Oz is a movie and is playing down the street from you at 6:46pm tomorrow? Knowledge Graphs are the answer.

Knowledge Graphs are a different way of storing information, where relationships between data points are stored alongside the data itself. Yes, I know that’s a lot to unravel, but bear with me. Normally, information is stored in relational databases, which structure data in columns and rows. Just think of an Excel spreadsheet. Each column contains a certain type of data. Each row contains data points that relate to one another. Let’s say we’re talking about movies. In a relational database, one column might have the names of movies, another might have each movie’s genre, and a bunch of other columns might contain the actors in each movie.

Now, I’m sitting on my couch watching Interstellar with a friend and I want to tell him about another movie, but I’ve forgotten its name. I know it’s in the same genre as Interstellar, and I know that the lead actor of Interstellar appeared in at least two other movies with one of the actors in the movie I’m thinking of. I purposefully picked an example in which I don’t have a lot of information, but I remember how the movies and actors relate to each other. In this case, a traditional relational database would have to first look up the lead actor of Interstellar, then look up all the movies he acted in, then all the actors who acted alongside him in those movies. It would then need to filter for actors that appear in that list at least twice, look up the movies that those actors acted in, look up the genre of Interstellar, and filter the resulting movie list by that genre. Whew! That’s seven steps that my search would need to take to give me the result I want. Not only is that slow, but it also takes a lot of computational power, which can be expensive.
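To make that chain of lookups concrete, here is a minimal Python sketch of the seven steps run against two tiny in-memory tables. Everything in it is an assumption made for illustration: the table layout, the placeholder titles and actor names ("Movie A", "Lead", "Actor Y"), and the function name `find_forgotten_movie`. A real relational database would express this with SQL joins rather than Python loops, but the sequence of steps is the same.

```python
from collections import Counter

# Hypothetical toy data. Titles and names are placeholders, not real filmographies.
MOVIES = {  # title -> (genre, lead actor)
    "Movie A": ("sci-fi", "Lead"),
    "Movie B": ("drama", "Lead"),
    "Movie C": ("comedy", "Lead"),
    "Movie D": ("sci-fi", "Actor Y"),
    "Movie E": ("drama", "Actor Y"),
    "Movie F": ("sci-fi", "Actor X"),
}
CREDITS = [  # one (movie, actor) row per acting credit
    ("Movie A", "Lead"), ("Movie A", "Actor X"),
    ("Movie B", "Lead"), ("Movie B", "Actor Y"),
    ("Movie C", "Lead"), ("Movie C", "Actor Y"),
    ("Movie D", "Actor Y"),
    ("Movie E", "Actor Y"),
    ("Movie F", "Actor X"),
]


def find_forgotten_movie(start):
    # 1. Look up the lead actor of the starting movie.
    lead = MOVIES[start][1]
    # 2. Look up the other movies the lead acted in.
    lead_movies = {m for m, a in CREDITS if a == lead and m != start}
    # 3. Look up everyone who acted alongside the lead in those movies.
    costars = [a for m, a in CREDITS if m in lead_movies and a != lead]
    # 4. Keep only the co-stars who show up at least twice.
    frequent = {a for a, n in Counter(costars).items() if n >= 2}
    # 5. Look up the movies those co-stars acted in.
    candidates = {m for m, a in CREDITS if a in frequent and m != start}
    # 6. Look up the genre of the starting movie.
    genre = MOVIES[start][0]
    # 7. Filter the candidate movies by that genre.
    return sorted(m for m in candidates if MOVIES[m][0] == genre)


result = find_forgotten_movie("Movie A")
print(result)
```

Each numbered comment is a separate pass over the data, which is the cost the paragraph above is describing.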

A Knowledge Graph would only need one query to answer my question. This is because the relationships between data points (we can call them entities) exist as first-class citizens in the graph along with the entities themselves. How does this work? Imagine each entity represented as a node. Nodes are connected to other nodes via relationships, which are sometimes called “edges”. Each edge records the type of relationship it represents and the strength of that relationship. Edges provide the context that is missing from the machine learning approaches described above.
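As a rough picture of what "nodes and edges" can look like in code, here is a small Python sketch. It is not how any particular graph database stores data. The class name `KnowledgeGraph`, the relation name `IS_A`, the `strength` field, and the extra "Mammal" node are all assumptions chosen for this example.

```python
class KnowledgeGraph:
    """A toy graph: each node maps to its outgoing, typed, weighted edges."""

    def __init__(self):
        self._out = {}  # node -> list of (relation, target, strength)

    def add_edge(self, source, relation, target, strength=1.0):
        # The edge itself carries the relationship type and its strength.
        self._out.setdefault(source, []).append((relation, target, strength))

    def neighbors(self, source, relation):
        # Nodes reachable from `source` over one edge of the given type.
        return [t for r, t, _ in self._out.get(source, []) if r == relation]

    def follow(self, source, *relations):
        # Walk a chain of relation types, one hop per relation.
        frontier = {source}
        for relation in relations:
            frontier = {t for node in frontier
                        for t in self.neighbors(node, relation)}
        return frontier


kg = KnowledgeGraph()
kg.add_edge("Dog", "IS_A", "Mammal")
kg.add_edge("Mammal", "IS_A", "Animal")
kg.add_edge("The Wizard of Oz", "IS_A", "Movie")

# One call describes the whole path instead of a series of separate lookups.
print(kg.follow("Dog", "IS_A", "IS_A"))
```

The point of `follow` is that the question is phrased as a path through relationships, which is the shape a graph query language takes as well.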

We’ve already discussed that Knowledge Graphs can be more efficient than relational databases for relationship-heavy questions and can take context into account in a way that typical machine learning techniques cannot. Knowledge Graphs also have the added advantage of being “traceable”. This means that when they return a result or a recommendation, we can identify why: the answer is backed by an explicit chain of entities and relationships. Knowledge Graphs don’t suffer from the same “black box” problems as many machine learning approaches, making them a good fit for regulated industries that need to explain how they came to certain conclusions.
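Here is one way to picture traceability, as a hedged Python sketch: alongside an answer, we return the chain of edges that connects the starting entity to it. The edge list, the relation names (`HAS_LEAD`, `ACTED_WITH`, `ACTED_IN`, `HAS_GENRE`), and the `explain` function are hypothetical, and a plain breadth-first search stands in for whatever a real graph engine would do.

```python
from collections import deque

# Hypothetical directed edges: (source, relation, target).
EDGES = [
    ("Movie A", "HAS_LEAD", "Lead"),
    ("Lead", "ACTED_WITH", "Actor Y"),
    ("Actor Y", "ACTED_IN", "Movie D"),
    ("Movie A", "HAS_GENRE", "sci-fi"),
    ("Movie D", "HAS_GENRE", "sci-fi"),
]


def explain(start, goal):
    """Return the shortest chain of edges from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for source, relation, target in EDGES:
            if source == node and target not in seen:
                seen.add(target)
                queue.append((target, path + [(source, relation, target)]))
    return None


# Print the reasoning behind recommending "Movie D" to someone watching "Movie A".
for source, relation, target in explain("Movie A", "Movie D"):
    print(f"{source} -[{relation}]-> {target}")
```

The returned path is the explanation: every hop is a stored relationship that a person can read and check.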

The main example I’ve used thus far to describe the benefits of Knowledge Graphs is search: if I need a piece of information, I can interrogate the graph to obtain it. We can see this in effect in Facebook’s Graph Search. Facebook stores information from its social network in its Social Graph, which contains people, places, and things, as well as actions people take on its platform like “Likes”. You can interrogate the Social Graph with Graph Search to answer questions like “Which of your boyfriend’s friends like Star Wars and Harry Potter?” Google also uses a Knowledge Graph to surface related results. If you search for Thomas Jefferson, it will surface a card with useful information about him and people related to him. If you search for a movie, it will surface showtimes for movie theaters near you.

Figure: Google Knowledge Panel

Knowledge Graphs can be used for more than search. Once data is structured in a Knowledge Graph, graph techniques like cluster analysis can be used to derive insights. This enables applications that go beyond search, like automated fraud detection, intelligent chatbots, advanced drug discovery, dynamic risk analysis, and content-based recommendation engines. Future articles will discuss each of these solutions and how Knowledge Graphs can be used to build them.
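To give a flavor of what a graph technique looks like, here is a small Python sketch that groups nodes into connected components, which is about the simplest form of clustering on a graph. It is a stand-in for illustration, not the method any particular fraud system uses. The account, phone, and card identifiers are made up, and `connected_components` is a hypothetical helper written for this example.

```python
def connected_components(edges):
    """Group nodes into clusters in which every node can reach the others."""
    neighbors = {}
    for a, b in edges:
        # Treat every edge as undirected.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:  # depth-first walk from the start node
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbors[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical data: which accounts share a phone number or payment card.
SHARED = [
    ("account_1", "phone_A"),
    ("account_2", "phone_A"),
    ("account_2", "card_B"),
    ("account_3", "card_B"),
    ("account_4", "phone_C"),
]
clusters = connected_components(SHARED)
for cluster in clusters:
    print(cluster)
```

In this toy data, three accounts end up in one cluster because they are linked through a shared phone and a shared card, which is the kind of pattern a fraud analyst might want to look at more closely.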

What we covered here was just an introduction to Knowledge Graphs. We didn’t get into graph database management systems like Neo4j, or into graph techniques and how they’re used. We’ll be addressing these in future articles.