Many AI techniques are about transforming data into forms that are more useful to us: taking unstructured data like free-form text, images, and audio, and extracting meaning from it. But while this transformed data (the objects we’ve recognized in images, the intents and entities we’ve recognized in free text, the words we’ve identified in audio) is more useful than the unstructured data it was derived from, on its own we still haven’t done anything meaningful with it.
Optimization at its core is about problem solving and planning. Using structured data provided by techniques like Natural Language Processing or Object Recognition as context, it can determine the most efficient way to achieve a goal. This goal can be anything from maximizing a score in a board game or computer game, to successfully negotiating a deal, to optimizing a supply chain. Unlike the previously discussed techniques, which rely either on providing an algorithm with pre-labeled data (supervised learning) or on feeding it unlabeled data and asking it to find categories in that data (unsupervised learning), Optimization uses a third method called reinforcement learning.
In its simplest form, Optimization looks like iterative trial and error. The algorithm makes small changes to its behavior and observes whether each change brings it closer to its goal. If it does, it continues to make changes in the same direction. If not, it tries something else. Reinforcement learning enables an Optimization algorithm to remember its prior actions and their effects on its goal, which allows it to develop tactics that lead to higher scores.
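The trial-and-error loop described above can be sketched in a few lines of Python. The three-action "slot machine" below and its hidden payout probabilities are invented purely for illustration: the agent doesn't know them, so it tries actions, remembers how well each one has worked on average, and gradually favors the action that brings the highest reward.

```python
import random

# Hypothetical environment: three actions with hidden payout probabilities.
# The agent must discover which action is best purely by trial and error.
TRUE_PAYOUTS = [0.2, 0.5, 0.8]

def pull(action, rng):
    """Return a reward of 1 with the action's hidden probability, else 0."""
    return 1 if rng.random() < TRUE_PAYOUTS[action] else 0

def train(episodes=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0, 0.0, 0.0]   # remembered average reward per action
    counts = [0, 0, 0]
    for _ in range(episodes):
        if rng.random() < epsilon:                 # occasionally try something new
            action = rng.randrange(3)
        else:                                      # otherwise repeat what worked best
            action = estimates.index(max(estimates))
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental average: nudge the remembered value toward what was observed.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = train()
print("Learned values:", [round(e, 2) for e in estimates])
print("Best action:", estimates.index(max(estimates)))
```

After a few thousand trials the remembered values settle near the true payouts and the agent consistently picks the best action; the `epsilon` parameter controls how often it keeps experimenting rather than repeating its current best tactic.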
Let’s use the example of an Optimization algorithm learning to play Super Mario Bros. In this game, for those of you living under a rock since 1985, the player controls Mario or Luigi as they jump from platform to platform, moving right through the levels. There are enemies that can either be avoided by jumping over them or defeated by jumping on them. If you want to train an Optimization algorithm to beat this game, you can’t simply set its goal to “beat Super Mario Bros” and wait until it does so. This would take a very long time, and when the algorithm finally beat the game through random chance, it wouldn’t have learned anything. The algorithm needs short-term goals to incentivize it to move in the right direction. While you and I would instinctively know to move right and to avoid enemies, this is something the machine must learn. The large problem of beating Super Mario Bros must be broken down into much smaller problems. Instead of the end goal of beating the game, you give the algorithm a goal that will lead to the game being beaten: maximize its score. Then you set up a point system where the algorithm is rewarded with points as it moves right, as it jumps onto higher platforms, and as it defeats enemies. This point system would be enabled by Object Recognition. Through trial and error (over thousands or millions of attempts), the algorithm will develop strategies that enable it to move right, jump onto higher platforms, and defeat enemies. Maximizing for these goals will eventually lead it to beat the game.
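The point system described above amounts to a reward function: a small piece of code that converts what Object Recognition reports about the screen into points. Here is a minimal sketch; the state fields and the point values are invented for illustration, not taken from any real Mario-playing system.

```python
def reward(prev, curr):
    """Score the change between two observed game states.

    `prev` and `curr` are dicts with hypothetical fields an object-recognition
    system might report: the player's horizontal position (x), the current
    platform height (y), and a running count of defeated enemies.
    """
    points = 0
    points += max(0, curr["x"] - prev["x"])        # reward moving right
    if curr["y"] > prev["y"]:
        points += 50                               # reward reaching a higher platform
    # reward each newly defeated enemy
    points += 100 * (curr["enemies_defeated"] - prev["enemies_defeated"])
    return points

before = {"x": 10, "y": 0, "enemies_defeated": 0}
after = {"x": 25, "y": 1, "enemies_defeated": 1}
print(reward(before, after))  # 15 (moved right) + 50 (higher platform) + 100 (enemy) = 165
```

The algorithm never sees these rules directly; it only sees the points, and over many attempts it discovers behaviors (move right, climb, defeat enemies) that make the number go up.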
But what if we’re talking about an adversarial game (a game played against an opponent) like Chess, Go, or DOTA 2? A human can’t possibly play a million games of Go against an algorithm to train it. First, it would take too long (and be far too boring for the human participant). Second, the machine would only learn to be marginally better than the human opponent it faced. This is where self-play comes in. Instead of facing off against a human opponent, the algorithm plays a slightly different version of itself thousands of times. After each game, the learnings from both sides are integrated back into the algorithm, and then a new game begins. Self-play enables these algorithms to gain experience much more quickly because they can play games at an accelerated rate. It also enables Optimization algorithms to surpass human levels of skill because they aren’t measuring themselves against humans. This is how DeepMind trained their AlphaGo algorithm to beat the best human players in the world at Go.
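Self-play can be illustrated on a game far simpler than Go. The sketch below uses a toy version of the game Nim (players alternate taking 1–3 stones from a pile; whoever takes the last stone wins), chosen only for illustration. Two copies of the same algorithm share one memory of move values: every game the algorithm plays against itself refines that shared memory, and because no human is in the loop, thousands of games take only a moment.

```python
import random

PILE, MAX_TAKE = 10, 3   # toy game: take 1-3 stones per turn; last stone wins

def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    # Shared value table: Q[(stones, take)] = value of that move for whoever makes it.
    Q = {}
    for _ in range(episodes):
        stones = PILE
        while stones > 0:
            take = rng.randint(1, min(MAX_TAKE, stones))  # explore by playing randomly
            left = stones - take
            if left == 0:
                target = 1.0                              # taking the last stone wins
            else:
                # The opponent (a copy of us) moves next; our value for this move
                # is the negative of their best option - both sides share one table.
                target = -max(Q.get((left, a), 0.0)
                              for a in range(1, min(MAX_TAKE, left) + 1))
            Q[(stones, take)] = target
            stones = left
    return Q

def best_move(Q, stones):
    return max(range(1, min(MAX_TAKE, stones) + 1),
               key=lambda a: Q.get((stones, a), 0.0))

Q = train()
print(best_move(Q, 10))  # learned policy leaves the opponent a multiple of 4
```

After enough self-play games the table converges on perfect play for this tiny game: from any pile, the algorithm leaves its opponent a multiple of four stones, a tactic no one programmed in explicitly.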
So, we’re done! If you incentivize an Optimization algorithm to maximize its score, it’ll eventually develop enough skill to beat the best human players, right? Well, not exactly. Approaches to Optimization tend to focus on maximizing short-term gains. In situations where short-term gains eventually lead to long-term success (like Super Mario Bros), this works very effectively. But in more complex situations, like the video games StarCraft and DOTA 2, short-term sacrifices can lead to longer-term success. This is part of the reason why AI has had trouble beating teams of human players in games that take place in complex, ever-changing environments and require the sort of strategic thinking that can’t be easily simulated.
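One common way algorithms trade short-term gains against long-term success is a discount factor: a number that controls how much future rewards count relative to immediate ones. The toy numbers below are invented for illustration. A "greedy" plan pays off right away; a "patient" plan sacrifices now for a larger payoff later; the discount factor decides which plan looks better to the algorithm.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with each step further in the future weighted by gamma."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

greedy  = [5, 0, 0, 0]    # grab a small reward immediately
patient = [0, 0, 0, 20]   # sacrifice now for a bigger payoff later

for gamma in (0.3, 0.95):
    g = discounted_return(greedy, gamma)
    p = discounted_return(patient, gamma)
    better = "patient" if p > g else "greedy"
    print(f"gamma={gamma}: greedy={g:.2f}, patient={p:.2f} -> prefer {better}")
```

With a low discount factor the future payoff shrinks to almost nothing and the algorithm grabs the immediate reward; with a high one, the patient plan wins. Tuning this kind of trade-off is part of what makes games like StarCraft and DOTA 2 so much harder than Super Mario Bros.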
Researchers have tackled this weakness by combining different systems to balance short- and long-term strategy. One algorithm tries to identify the next best move, while another looks at how the entire game might finish. Together, these two algorithms can identify the best path toward victory. A similar approach was used by a system called Libratus, which was able to beat top professional poker players. One of its algorithms used reinforcement learning to teach itself the game and then determine the best next move. A second focused on the end outcome of the game. A third identified patterns in how Libratus itself had previously bet and introduced randomness to throw off the other players.
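The idea of pairing a "next best move" algorithm with one that looks at how the game might finish can be sketched on a toy game. Everything below (the take-1-to-3-stones game, the weights, the scoring) is invented for illustration; it is not how Libratus works. One component scores each candidate move immediately, a second plays many random games to the end to estimate long-term value, and the final choice combines both.

```python
import random

MAX_TAKE = 3  # toy game: players alternate taking 1-3 stones; last stone wins

def immediate_score(stones, take):
    """Short-term component: only a move that wins right now scores anything."""
    return 1.0 if take == stones else 0.0

def rollout_value(stones, rollouts, rng):
    """Long-term component: estimate our win rate by playing random games out.

    `stones` is the count facing the *opponent* after our candidate move."""
    wins = 0
    for _ in range(rollouts):
        s, our_turn = stones, False        # the opponent moves first
        while True:
            s -= rng.randint(1, min(MAX_TAKE, s))
            if s == 0:                     # whoever just moved took the last stone
                wins += 1 if our_turn else 0
                break
            our_turn = not our_turn
    return wins / rollouts

def choose_move(stones, rollouts=2000, seed=0):
    rng = random.Random(seed)
    def value(take):
        left = stones - take
        short = immediate_score(stones, take)
        long_term = 1.0 if left == 0 else rollout_value(left, rollouts, rng)
        return 0.5 * short + 0.5 * long_term   # blend the two components
    return max(range(1, min(MAX_TAKE, stones) + 1), key=value)

print(choose_move(6))  # combined evaluation favors leaving the opponent 4 stones
```

Neither component alone plays well: the short-term scorer only sees immediate wins, and the rollouts are noisy. Blended, they steer toward moves that both look good now and tend to end the game in victory.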
Video games, board games, and card games—what do these have to do with how Optimization can help you run your business? The reason we talk about games when talking about Optimization is that researchers use games as a test of how well their algorithms work. Much like we gave the Optimization algorithm short-term goals to beat Super Mario Bros, beating computer games is a short-term goal toward building robust Optimization algorithms that can deal with the real world. The end goal is not to design a system that can beat a computer game, but to train algorithms that can work in complex, ever-changing environments filled with uncertainty.
Optimization can be applied in real-world situations where there is a defined goal to achieve. Facebook has already used Optimization techniques to train Chatbots to negotiate over simple items about as well as humans can. Optimization is also being used to streamline supply chains, design shift rotations, and make recommendations. If a large decision can be broken down into smaller decisions that can be improved through trial and error, Optimization can be employed to determine the most effective course of action.
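That "break a large decision into smaller ones" idea can be sketched with a shift-rotation example; the staffing numbers below are invented for illustration. Instead of searching for a whole schedule at once, the algorithm repeatedly tries one small change, moving a single worker to a different shift, and keeps the change only if it brings coverage closer to the target: the same trial-and-error loop as before, applied to a business problem.

```python
import random

TARGETS = [2, 2, 2]   # desired number of workers on each of three shifts
WORKERS = 6

def cost(assignment):
    """How far current coverage is from the target, summed over shifts."""
    coverage = [assignment.count(shift) for shift in range(len(TARGETS))]
    return sum(abs(c - t) for c, t in zip(coverage, TARGETS))

def optimize(iterations=5000, seed=0):
    rng = random.Random(seed)
    assignment = [0] * WORKERS                 # start badly: everyone on shift 0
    current = cost(assignment)
    for _ in range(iterations):
        worker = rng.randrange(WORKERS)        # small change: move one worker...
        shift = rng.randrange(len(TARGETS))    # ...to a randomly chosen shift
        old = assignment[worker]
        assignment[worker] = shift
        new = cost(assignment)
        if new <= current:                     # keep changes that help (or don't hurt)
            current = new
        else:                                  # otherwise undo and try something else
            assignment[worker] = old
    return assignment, current

assignment, final_cost = optimize()
print(assignment, "cost:", final_cost)
```

Real scheduling problems add constraints (availability, labor rules, preferences), but the shape is the same: a big decision, a measurable goal, and many small trial-and-error adjustments toward it.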