Have you ever wondered how Non-Player Characters (NPCs, bots, AI, etc.) are made in your favorite games? Well, I certainly have. Everything from simple characters like a Goomba in Mario to complex characters like Expert bots in League of Legends, Street Fighter, or Call of Duty is controlled by some form of Artificial Intelligence. It could be a simple movement path (move from point A to point B, then from point B to point A, repeat), or it could be something more advanced, like a decision tree that handles complex situations by following a path of decisions on what to do in each scenario. There are countless methods for implementing non-player behavior, and there really is no "one size fits all" option.


Unity Machine Learning Agents (ML-Agents) is a plugin for the Unity Editor, provided by Unity, that enables users to get started developing Intelligent Agents. To me, one of the most interesting aspects of the ML-Agents toolkit is that it doesn't only apply to gaming-related problems. Sure, most of the examples are simple gaming scenarios, but they serve as a way to connect with the audience and to present complex concepts in more digestible terms. Once the concepts have been understood, the applications for the tech are practically limitless and its uses are growing rapidly.

One of my favorite things about the Unity ecosystem is its documentation and learning resources, and the ML-Agents plugin is no exception. I will not attempt to replicate all of the information in the documentation because it does such a good job of explaining things. Instead, I will pick the parts that are important and relevant to me and do my best to summarize them here. But before we can really understand how the ML-Agents toolkit does its work, we should first understand a few key concepts in Machine Learning.

Machine Learning - Reinforcement Learning

Machine Learning is a vast topic to cover and a pretty big buzzword in the tech industry. For the purposes of Unity ML-Agents and the context of this challenge, the main class of Machine Learning in focus is Reinforcement Learning. The very basic explanation of Reinforcement Learning is that an agent takes observations about an environment, performs an action based on those observations, and receives either a positive or negative reward. The goal of Reinforcement Learning is to create something known as a policy, which is essentially a mapping from observations to actions.

An Agent is the NPC or object that is learning some new behavior, and the environment is the world that it lives in. The cycle of observing, acting, and being rewarded is repeated many times, positively rewarding actions that lead to the desired behavior and negatively rewarding actions that lead to undesired behavior. The goal of the agent is to maximize its reward, so it seeks out actions that reward it positively and avoids those that reward it negatively.
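To make that cycle concrete, here is a minimal sketch of the observe, act, reward loop in plain Python. Everything in it is my own assumption for illustration: the `Corridor` environment, the reward sizes, and the use of tabular Q-learning as the learning rule. I picked Q-learning only because it is small enough to read in one sitting; none of this is taken from the ML-Agents toolkit.

```python
import random

ACTIONS = (-1, +1)  # step left, step right


class Corridor:
    """Toy environment (hypothetical): cells 0..4, a pit at 0, a goal at 4."""

    def __init__(self, size=5):
        self.size = size
        self.position = size // 2

    def reset(self):
        self.position = self.size // 2
        return self.position  # the observation is just the current cell

    def step(self, action):
        self.position += action
        if self.position == self.size - 1:
            return self.position, 1.0, True    # reached the goal
        if self.position == 0:
            return self.position, -1.0, True   # fell into the pit
        return self.position, -0.01, False     # small penalty for dawdling


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a policy (observation -> action) with tabular Q-learning."""
    rng = random.Random(seed)
    env = Corridor()
    q = {(s, a): 0.0 for s in range(env.size) for a in ACTIONS}
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Mostly act greedily, sometimes explore a random action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(obs, a)])
            next_obs, reward, done = env.step(action)
            best_next = 0.0 if done else max(q[(next_obs, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted future value.
            q[(obs, action)] += alpha * (reward + gamma * best_next - q[(obs, action)])
            obs = next_obs
    # The policy is the best known action for every non-terminal cell.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, env.size - 1)}


policy = train()
```

The resulting `policy` dictionary is exactly the "mapping from observations to actions" described above: for each cell the agent can stand in, it stores the action that earned the most reward during training.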

The example used in the Unity Background on Machine Learning documentation is that of an NPC Medic character in a war game (think Battlefield or Call of Duty). The role of this NPC is to heal and/or revive the player characters as needed. An example of a negative reward: when the medic is hit by enemy fire, it receives a large negative reward. An example of a positive reward: when the medic successfully heals or revives a player character, it receives a moderate positive reward. You could also combine those with a negative reward for every second that passes while a player character needs healing or reviving, effectively making the NPC want to "hurry" to heal the injured player.
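The medic's reward scheme could be sketched as a small function like the one below. The names (`StepEvents`, `medic_reward`) and the exact reward sizes are assumptions of mine; the documentation only describes them as "large negative", "moderate positive", and a per-second penalty.

```python
from dataclasses import dataclass

# Hypothetical reward sizes, chosen only to match the relative scale in the text.
HIT_PENALTY = -1.0               # large negative: medic was hit by enemy fire
HEAL_REWARD = 0.5                # moderate positive: a successful heal or revive
WAIT_PENALTY_PER_SECOND = -0.05  # small negative while someone is waiting for help


@dataclass
class StepEvents:
    """What happened to the medic since the last decision."""
    was_hit: bool = False
    heals_or_revives: int = 0
    someone_needs_help: bool = False
    seconds_elapsed: float = 0.0


def medic_reward(events: StepEvents) -> float:
    """Combine the three reward signals into a single number."""
    reward = 0.0
    if events.was_hit:
        reward += HIT_PENALTY
    reward += HEAL_REWARD * events.heals_or_revives
    if events.someone_needs_help:
        reward += WAIT_PENALTY_PER_SECOND * events.seconds_elapsed
    return reward
```

Because the waiting penalty keeps accumulating until the injured player is helped, a medic that takes the scenic route ends up with a lower total than one that goes straight to the patient, which is where the "hurry" behavior comes from.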

  • Observations - What the NPC knows or perceives about the environment. This is not necessarily the entire environment, but typically a subset of it. In the medic example, the medic will likely only know about the things that it can see. It should not know if there is an enemy hiding behind a building out of sight, or if there is an enemy coming up behind it (unless perhaps there is some sort of radar system allowing the medic to be aware of the enemy). The medic might also be expected to be aware of the number of teammates, their positions, and their current health, allowing the medic to make informed decisions about where to go and who is in the most critical condition.

  • Actions - Based on the observations the medic has, certain actions will be taken accordingly. Actions can be anything from simple movement commands to complex actions such as running and jumping or healing a downed character. Each action taken should receive either a positive or negative reward to help inform the agent how it is performing in the given scenario.

  • Rewards - Rewards are given to the agent based on the actions that are taken. Rewards can be small or large based on the desired effect of the action. The outcome of the reward system is the agent performing the desired objective in order to maximize its rewards. You negatively reward actions that go against the desired objective, and positively reward actions that get the agent closer to its goal.
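As a rough illustration of the Observations point, the sketch below builds what the medic is allowed to know on a given step. The function name, the dictionary layout, and the idea of approximating "can see" with a simple distance check are all assumptions on my part; a real game would also need a line-of-sight test for things like buildings.

```python
import math


def build_observation(medic_pos, teammates, enemies, sight_range=10.0):
    """Return only the subset of the environment the medic can perceive.

    Assumption: visibility is a plain distance check, not true line of sight.
    """
    visible_enemies = [
        pos for pos in enemies if math.dist(medic_pos, pos) <= sight_range
    ]
    # Teammate count, positions, and health are always known to the medic.
    most_critical = min(teammates, key=lambda t: t["health"], default=None)
    return {
        "teammate_count": len(teammates),
        "teammates": teammates,
        "most_critical": most_critical,
        "visible_enemies": visible_enemies,
    }


observation = build_observation(
    medic_pos=(0.0, 0.0),
    teammates=[
        {"name": "alpha", "position": (2.0, 1.0), "health": 80},
        {"name": "bravo", "position": (5.0, 5.0), "health": 15},
    ],
    enemies=[(3.0, 4.0), (30.0, 40.0)],
)
```

Here the far-away enemy exists in the environment but never makes it into the observation, so the medic cannot base a decision on it.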

After we have defined our Observations, Actions, and Rewards we will need a way to tie them all together, something to actually handle taking the observations, deciding which action to take, and dealing out the rewards. In the next section we will take a look at how all of these pieces fit together to make a working system.

How ML-Agents Work

With a bit of background understanding of how Reinforcement Learning works, we can take a more informed look at the pieces of the ML-Agents toolkit that we will be using to create an Intelligent Agent. Up until now, the things we have discussed have been conceptual, and nothing that we can directly implement in a game or training scenario. To understand the ML-Agents toolkit, we need to transition into the world of Unity components and game objects.

There are three key components to understanding the ML-Agents toolkit.

  • Agents - An Agent is a component that is attached to a Unity GameObject. The Agent makes the observations about the environment and sends those observations to the Brain. Once the Agent receives the action from the Brain, the Agent will give itself a predetermined reward based on the result of that action: a positive reward for a desirable action, a negative reward for an undesirable action.

  • Brains - A Brain is another component that is attached to the same GameObject as the Agent. Each Agent is connected to a single Brain. The Brain decides which action the Agent should take based on the set of observations it receives from the Agent. Once the Brain receives the observations from the Agent, it forwards them along to the Academy, which interacts with a process external to Unity that keeps track of all observations and actions and helps the training process.

  • Academy - An Academy is essentially an orchestrator for the entire process. Unlike Agents and Brains, of which there can be many, there is only a single Academy in the training environment. The Academy connects the training environment to an external training process (in this case a Python library that performs the actual training) and links the results back into the training environment. The Academy also controls whether the experiment runs in training configuration or inference configuration. Training config is used when you are training your agents to learn new behavior, and inference config is used when you are ready to test the learned behavior.

In the Medic NPC example, the medic would have a single Agent attached to it, as well as a Brain. But what if there were other Medics in the environment? Each of those medics would have its own Agent and Brain. All Agents in the scene will have their own sets of observations, resulting in the Brains responding with their own sets of actions. Even if all of the medics have the same Brain, it is still receiving different observations from the Agent attached to it. Since all of these Brains are technically the same, there is only one copy of the Brain attached to the one instance of the Academy in the environment. The Academy is then responsible for communicating with the external Python process that handles the calculations and learning policy for the Brain.
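The "many Agents, one shared Brain" idea can be sketched in a few lines of plain Python. These classes are hypothetical stand-ins that I wrote for illustration, not the Unity components themselves, and the Brain's rule is a hard-coded placeholder instead of a trained policy.

```python
class Brain:
    """Hypothetical decision maker: maps one observation to one action."""

    def decide(self, observation):
        # Placeholder rule, not a trained policy: heal when the nearest
        # teammate is badly hurt, otherwise keep moving.
        return "heal" if observation["nearest_teammate_health"] < 50 else "move"


class Agent:
    """Hypothetical agent: observes, asks its Brain, then rewards itself."""

    def __init__(self, name, brain):
        self.name = name
        self.brain = brain
        self.total_reward = 0.0

    def step(self, observation):
        action = self.brain.decide(observation)
        # Predetermined rewards (made-up values) based on the action taken.
        self.total_reward += 0.5 if action == "heal" else -0.05
        return action


shared_brain = Brain()  # one copy of the Brain...
medics = [Agent(f"medic_{i}", shared_brain) for i in range(3)]  # ...many Agents

observations = [
    {"nearest_teammate_health": 20},
    {"nearest_teammate_health": 90},
    {"nearest_teammate_health": 45},
]
actions = [medic.step(obs) for medic, obs in zip(medics, observations)]
```

All three medics hold a reference to the same `Brain` object, yet they end up with different actions because each one feeds it a different observation.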

When the Academy is set to the training configuration, it forwards the observation and reward data to the external learning process to help train the policy. The output of a training run is a TensorFlow model that can be used in the inference configuration. When the Academy is set to the inference configuration, instead of communicating with the external process, the Academy uses the policy that was already created during training. This policy comes in the form of the aforementioned TensorFlow model.
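Here is a loose sketch of the difference between the two configurations. The `StubTrainer` class stands in for the external Python process, and a plain dictionary stands in for the trained model file; both are simplifications I made up for illustration and do not reflect the toolkit's actual API or file format.

```python
class StubTrainer:
    """Stand-in (hypothetical) for the external training process."""

    def __init__(self):
        self.experiences = []

    def record(self, observation, action, reward):
        self.experiences.append((observation, action, reward))

    def export_policy(self):
        # A real run would produce a trained model; this stub just keeps the
        # best-rewarded action seen for each observation.
        best = {}
        for obs, action, reward in self.experiences:
            if obs not in best or reward > best[obs][1]:
                best[obs] = (action, reward)
        return {obs: action for obs, (action, _) in best.items()}


class Academy:
    """Hypothetical orchestrator: training if it has a trainer, else inference."""

    def __init__(self, trainer=None, policy=None):
        self.trainer = trainer
        self.policy = policy or {}

    def report(self, observation, action, reward):
        # Only the training configuration talks to the external process.
        if self.trainer is not None:
            self.trainer.record(observation, action, reward)

    def decide(self, observation, default="wait"):
        # The inference configuration answers from the saved policy alone.
        return self.policy.get(observation, default)


# Training configuration: experiences flow out to the trainer.
trainer = StubTrainer()
training_academy = Academy(trainer=trainer)
training_academy.report("teammate_down", "heal", 0.5)
training_academy.report("teammate_down", "wait", -0.05)

# Inference configuration: no trainer, only the exported policy.
inference_academy = Academy(policy=trainer.export_policy())
action = inference_academy.decide("teammate_down")
inference_academy.report("teammate_down", action, 0.5)  # goes nowhere
```

The important part is the last line: in the inference configuration nothing is sent to the trainer, so the behavior you see is purely what was learned earlier.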

If you want to find out more about how the ML-Agents toolkit works, check out this link.

The 3D Ball

The first project covered by the ML-Agents documentation is the 3D Balance Ball environment. It is a pre-built environment with an experiment already set up for you, and the goal of the environment is to train a flat, square platform to balance a ball that has been dropped onto it. This is a relatively simple challenge to understand, and it gives you the opportunity to ease into many of the complex concepts that Machine Learning and the ML-Agents toolkit introduce. Some of the background knowledge that really helps in understanding what is happening in this environment is covered by the excellent documentation provided by the Unity team, broken down into sections.

Installing ml-agents

I am working on a Windows machine, so I followed their Windows Installation Guide to get set up. I skipped some of the extra steps around setting up GPU training and Docker for now to help me stay focused on the main goal of learning. If needed, I could always come back and set these things up in the future. I ran into a few issues getting set up. One in particular was a missing dependency in the Python setup that prevented me from running one of the required commands. I had to do some digging and found my answer on the ml-agents GitHub issues page.

After running the fix, I had my environment set up and ran a few tests to make sure things were working. At last I was ready to dive in and actually do some fun stuff! I then followed their Basic Guide, which provides a simple Unity environment for getting started with the ML-Agents toolkit and learning how the components work without being overwhelmed. Luckily I already had a reasonably strong Unity background, so jumping into this part was relatively easy for me; I just augmented my current Unity knowledge with the new information about the ML-Agents toolkit.

TensorFlow

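TensorFlow is an open-source Machine Learning framework from Google. As mentioned earlier, the model that comes out of an ML-Agents training run is a TensorFlow model.
If you want to find out more about how TensorFlow works with the ML-Agents toolkit, check out this link.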