Monday, February 19, 2018

2D Car AI project

https://github.com/longphin/Automated-2D-Car
Note: This article explains my struggles throughout this project. I will write a separate article later that covers just the final implementation and results.

A while back, I wanted to learn Unity while also implementing some machine learning. I think this is an excellent way to not only learn a new engine and language, but also end up with a fun result you can show off. This post details some of my thinking along the way, although I am starting it near the end of the project because I did not originally plan to blog about it at all.

Goal: Get a 2D car to learn how to navigate a racetrack.

Approach: Use an evolutionary algorithm approach to teach a neural network.

Input layer
  1. Distance to a wall within a certain radius.
  2. -1 or 1 indicating whether a wall was detected.
  3. The current velocity of the car.
  4. The current angular velocity of the car.
These 93 inputs produce 4 outputs that determine whether the car should turn left or right, and accelerate or decelerate.

Each hidden layer uses a ReLU activation function, and the output layer uses a Tanh activation function.
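As a sketch, the forward pass described above might look like the following in Python. The hidden-layer size is made up; the post only fixes 93 inputs and 4 outputs.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(inputs, weights):
    """Forward pass: ReLU hidden layers, Tanh output layer.

    `weights` is a list of (W, b) pairs. Layer sizes are
    illustrative; only 93 inputs and 4 outputs come from the post.
    """
    a = inputs
    for W, b in weights[:-1]:
        a = relu(a @ W + b)      # hidden layers use ReLU
    W, b = weights[-1]
    return np.tanh(a @ W + b)    # outputs land in [-1, 1]

rng = np.random.default_rng(0)
sizes = [93, 16, 4]              # one hidden layer, for illustration
weights = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
           for m, n in zip(sizes, sizes[1:])]
out = forward(rng.standard_normal(93), weights)
```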

Issue: I had originally bred the top 20% of the population to produce 40% of the next generation. I noticed that results varied greatly and learning was slow. To make sure future generations kept improving, I also created an elitist group whose exact neural networks are transferred to the next generation. This ensured that each generation's best performance was equal to or better than the previous one's.
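A minimal Python sketch of selection with an elitist group on top of truncation breeding. The fractions and the averaging-style crossover here are illustrative, not the project's exact operators; genomes are flat lists of weights.

```python
import random

def next_generation(population, fitness, elite_frac=0.05, parent_frac=0.20):
    """Elitism plus truncation selection (sketch).

    The top `elite_frac` genomes survive unchanged, so the best
    score can never get worse; the top `parent_frac` are bred
    (averaged, with small mutation noise) to refill the population.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    n = len(population)
    elites = [list(g) for g in ranked[:max(1, int(n * elite_frac))]]
    parents = ranked[:max(2, int(n * parent_frac))]
    children = []
    while len(elites) + len(children) < n:
        a, b = random.sample(parents, 2)
        child = [(x + y) / 2 + random.gauss(0, 0.01) for x, y in zip(a, b)]
        children.append(child)
    return elites + children
```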

Issue: After a certain point, the calculations would get very large and overflow issues would arise. To alleviate this, I normalized the inputs to the [-1, 1] range. Since the inputs do not represent the same measurements (some are distances, some are speeds, etc.), I did not use the same scaling for all values. For example, the distances ranged over (0, 10], velocity was not too significant in my case, and angular velocity ranged over (0, 100), so the scalings had to differ.
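The per-feature scaling can be sketched as a simple linear map to [-1, 1], using the ranges quoted above (the velocity range is not given in the post, so it is omitted here):

```python
def normalize(value, lo, hi):
    """Linearly map value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

# Per-feature ranges from the post: distances in (0, 10],
# angular velocity in (0, 100).
distance_n = normalize(5.0, 0.0, 10.0)     # -> 0.0
ang_vel_n  = normalize(25.0, 0.0, 100.0)   # -> -0.5
```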

Issue: The cars would start off by learning how to turn left. However, I found that it became difficult for the cars to learn to turn right after learning the left turn. So I mirrored the track along the X axis, giving two groups: Group 1 started off learning left turns, while Group 2 started off learning right turns. Each generation bred both populations together, so the cars would hopefully learn left and right turns simultaneously.
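Mirroring a track along the X axis amounts to flipping the sign of each point's y coordinate, which turns every left turn into a right turn. A tiny sketch, assuming the track is stored as a list of 2D points:

```python
def mirror_track(points):
    """Mirror a 2D track across the X axis (negate y), so that
    left turns become right turns. Points are (x, y) tuples."""
    return [(x, -y) for x, y in points]

mirrored = mirror_track([(0.0, 1.0), (2.0, -3.0)])
# -> [(0.0, -1.0), (2.0, 3.0)]
```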

Issue: My initial weights were far too large: they were randomly selected from [0, 1], but common practice is to initialize weights to fairly small values. For the ReLU layers, the initial weights are now drawn from [0, sqrt(2/n)], where n is the number of weights; for the Tanh layer, from [0, sqrt(1/n)]. This helped the cars learn a bit better.
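A sketch of this initialization scheme. Note that the standard He and Xavier schemes are zero-mean; the post's variant draws non-negative weights, and that is what is reproduced here.

```python
import math
import random

def init_weights(n, activation):
    """Draw n weights uniformly from [0, sqrt(2/n)] for ReLU
    layers, or [0, sqrt(1/n)] for the Tanh layer, following the
    post's scheme (non-negative, unlike standard He/Xavier init)."""
    hi = math.sqrt((2.0 if activation == "relu" else 1.0) / n)
    return [random.uniform(0.0, hi) for _ in range(n)]
```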

Issue: When I implemented the elitist group (to transfer exact replicas of the best cars to the next generation), I had the copies re-run the track. Of course, they would end up exactly where the originals did. This slowed training: whenever several generations in a row did not improve, each new generation had to wait for the elite cars to finish their redundant re-runs. To avoid this, I no longer create a copy. Instead, the car retains its position but is considered dead at the start of the next generation.
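One way to implement this carry-over, sketched in Python with hypothetical names (the actual project is a Unity/C# codebase):

```python
class Car:
    """Minimal stand-in for the project's car object."""
    def __init__(self, genome, score=0.0):
        self.genome = genome
        self.score = score
        self.alive = True

def carry_over_elites(elites):
    """Carry elite cars into the next generation without re-running
    them: keep the recorded score, but mark each one dead so the
    simulation skips it while its result still counts."""
    carried = []
    for car in elites:
        clone = Car(car.genome, score=car.score)
        clone.alive = False      # skip simulation; the score stands
        carried.append(clone)
    return carried
```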

Failed Change: After realizing that my car probably only needed the front sensors to tell it to move forward, I removed the back sensors, so each car had a semicircle as its line of sight. This did not prove useful: cars would regularly do a U-turn and continue straight, which made the genetic algorithm learn much more slowly. So I reverted this change.

Failed Score Measurement: Originally, the genetic algorithm ranked cars by their distance to the next checkpoint: whether they finished a lap, how long the lap took, and how many checkpoints they passed. However, because this prioritized safety, the cars would slow down tremendously to avoid crashing. So I tried scoring them by how long it took to reach each checkpoint instead: rank on the time to reach checkpoint X, then checkpoint X-1, then checkpoint X-2, and so on. When comparing two cars, if one car had no time for checkpoint X (i.e., the other car reached that checkpoint but this one did not), the other car scored higher. In other words, it ranked on the number of checkpoints reached and then on the times taken to reach them. I found that this learned VERY slowly, and the learning rate was not worth the results.
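This failed ranking can be expressed as a lexicographic sort key: checkpoints reached first, then the time to the latest checkpoint, then the one before it, and so on. A sketch with made-up times:

```python
def score_key(checkpoint_times):
    """Sort key for the failed scheme: more checkpoints reached
    ranks higher; ties break on the time to the latest checkpoint,
    then the one before it, and so on (lower is better)."""
    return (-len(checkpoint_times), tuple(reversed(checkpoint_times)))

cars = {
    "a": [1.0, 2.5],          # reached 2 checkpoints
    "b": [1.0, 2.0, 4.0],     # reached 3, so ranks first
    "c": [0.9, 2.6],          # reached 2, slower at the latest one
}
ranking = sorted(cars, key=lambda name: score_key(cars[name]))
# -> ['b', 'a', 'c']
```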