Wednesday, December 7, 2011

Neural networks part 2: Evolving a "living" robot

In my first post on neural networks, I discussed training the network using gradient descent - a pretty straightforward optimization method. This project took a completely different approach: evolving the network's weights with genetic algorithms.

Our project team designed a virtual agent (robot) that learned to avoid obstacles while acting autonomously to "work" and "eat", maintaining its own internal conditions in proper balance like a living animal.

The virtual robot (green circle) navigates from the green "health" waypoint to the red "work" waypoint while avoiding the gray obstacles.

We started in simulation, planning to implement the working system on a physical robot, but ran out of time to get the hardware side functioning. C'est la vie robotique! We did make sure our virtual agent would use the same motor commands as the real robot, so the simulation wasn't completely disconnected from the real world.

Full details, including the multilevel control architecture we developed, after the jump.


Multilevel Control for Multiple Goals
We wanted our robot to accomplish several objectives at once:
  • Avoid obstacles in the environment...
  • ...while navigating from any location to any other location...
  • ...specifically, move between a "health source" and a "work site" to maximize "work done" and "current health" scores. 
I've broken the goals down this way because we did a few experiments and concluded it would be too hard to train a single neural net to do all these things simultaneously. We settled on using a subsumption architecture to compose controls for each goal into an overall strategy.


The subsumption architecture. From the bottom up, control goals become more abstract, while control commands flow from the top down, becoming more and more concrete until they are direct motor commands.
There are two neural nets, as well as a hardcoded navigation layer, since we also had trouble training a neural net to do the navigation.
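To make the layering concrete, here's a rough sketch of how one control step might flow through the architecture. This is illustrative pseudocode, not our actual implementation, and every name in it (maximization_net, bearing_to, obstacle_net, and so on) is made up for the example.

    # Hypothetical sketch of one step through the three-layer subsumption pipeline.
    # Each layer refines the command from the layer above into something more concrete.
    def subsumption_step(agent_state, sensors):
        # Top layer (most abstract): the maximization net picks which waypoint to pursue.
        target = maximization_net.choose_target(agent_state)   # "health" or "work" site

        # Middle layer: the hand-coded navigation function turns that target
        # into a desired heading.
        desired_heading = bearing_to(agent_state.position, target)

        # Bottom layer (most concrete): the obstacle avoidance net overrides the
        # heading whenever the distance sensors report nearby walls or obstacles.
        safe_heading = obstacle_net.adjust(desired_heading, sensors)

        # The final output is a direct motor command for the (simulated) robot.
        return heading_to_motor_command(agent_state.heading, safe_heading)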

Training the Obstacle Avoidance Net
The lowest level is the neural net that controls obstacle avoidance. We trained this net with backprop, just like Canyonero. It basically worked fine from the word go, so we experimented with two different types of training data.

First we drove the agent around the virtual environment, creating training data by changing the agent's heading when its virtual distance sensors detected walls or obstacles. Because the agent kept moving while it turned, this produced smooth curves away from obstacles and walls.   

Network training during agent driving. The blue lines are training runs; the green line is a test run using the trained network.
In the second case, we placed points randomly on the map, checked the distance sensors given that location and a random heading, and assigned a heading change based on the distance to detected walls and obstacles. This method produced driving behavior with sharp turns very close to the obstacles.

Network training with random points (in blue) and green test runs using the trained network.
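For the curious, a minimal sketch of the random-points approach might look like the following. The sensor layout (three beams), the turn rule, and all the helper names are assumptions for illustration, not our actual code.

    import math
    import random

    # Sketch of the random-points training data generator (helper names are hypothetical).
    def make_random_training_set(world, n_points=500, max_turn=math.pi / 4):
        samples = []
        for _ in range(n_points):
            position = world.random_free_point()
            heading = random.uniform(0, 2 * math.pi)

            # Simulated distance readings (left, center, right) from this pose.
            left, center, right = read_distance_sensors(world, position, heading)

            # Label: turn away from the closer side, turning harder the closer the obstacle.
            if left < right:
                turn = max_turn * (1.0 - left / world.sensor_range)
            else:
                turn = -max_turn * (1.0 - right / world.sensor_range)

            samples.append(((left, center, right), turn))
        return samples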

Unsurprisingly, the random points method did worse and worse as we reduced the number of points. It was also much more susceptible to edge cases (e.g., turning away from all obstacles but one, since it hadn't encountered that particular type of distance reading before). The driving method was more robust to situations not seen in training.

No Training for Navigation
For some reason, neither backprop nor genetic algorithms worked very well for teaching the robot to change its heading so as to move from one place on the map to another.

Ultimately we ditched the neural net in favor of a hand-coded function that calculates the bearing of a straight line between the two points and then incrementally turns the robot toward that heading.
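A minimal version of that function is just atan2 plus a capped turn toward the result, something like the sketch below (the turn-rate limit here is a made-up value, not the one we used):

    import math

    # Sketch of the hand-coded navigation layer.
    def bearing_to(position, target):
        """Bearing of the straight line from position to target, in radians."""
        dx, dy = target[0] - position[0], target[1] - position[1]
        return math.atan2(dy, dx)

    def steer_toward(current_heading, position, target, max_turn=0.1):
        """Incrementally rotate the current heading toward the target bearing."""
        desired = bearing_to(position, target)
        # Wrap the error into [-pi, pi] so the robot always turns the short way around.
        error = (desired - current_heading + math.pi) % (2 * math.pi) - math.pi
        return current_heading + max(-max_turn, min(max_turn, error))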

In the subsumption architecture, this heading output becomes input data for the obstacle avoidance net, which modifies the heading as needed.

Mutating Weights to Maximize Work
We wanted our agent to behave like an animal that can gather energy from its environment and use that energy to perform useful work. The agent is always using energy, even when it's not moving, and it uses even more when it's "working".  The goal is to maximize work done without running out of energy ("dying").
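The energy bookkeeping itself is just a running balance. The sketch below shows the idea; the specific drain and gain rates are invented numbers for illustration only.

    # Illustrative energy balance per time step (rates are made up for the example).
    IDLE_DRAIN, MOVE_DRAIN, WORK_DRAIN = 0.1, 0.3, 0.5
    EAT_GAIN = 1.0

    def update_energy(energy, moving, working, eating):
        energy -= IDLE_DRAIN          # the agent is always burning energy, even at rest
        if moving:
            energy -= MOVE_DRAIN
        if working:
            energy -= WORK_DRAIN      # "working" costs the most
        if eating:
            energy += EAT_GAIN        # only possible at the health site
        return energy                 # hitting zero means the agent has "died"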

With the obstacle avoidance net and navigation function, the agent can move around the map from one place to another. However, we now have to train a second neural net (the maximization net) to choose at each time step whether to move to the health location, collect energy (i.e., stay at the health location), move to the work location, or perform work (i.e., stay at the work site).
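At each time step the maximization net maps the agent's state to one of those four choices. Here's a sketch of what that decision step could look like; the input features and the single hidden layer are assumptions made for the example, and the weights are exactly the parameters the genetic algorithm evolves.

    import numpy as np

    # Hypothetical sketch of the maximization net's decision step.
    ACTIONS = ["go_to_health", "collect_energy", "go_to_work", "do_work"]

    def choose_action(weights, health, dist_to_health, dist_to_work):
        w1, b1, w2, b2 = weights                 # evolved parameters (the "genes")
        x = np.array([health, dist_to_health, dist_to_work])
        hidden = np.tanh(w1 @ x + b1)            # small hidden layer
        scores = w2 @ hidden + b2                # one score per possible action
        return ACTIONS[int(np.argmax(scores))]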

This is nontrivial. We can't use supervised learning, because we don't know what the net should do, so we can't use backprop. Naive approaches (e.g., work until health falls below a threshold, then return to the "health" site) may work occasionally, but they fall apart easily. That's why we decided to evolve the correct weights with genetic algorithms.

Genetic algorithms apply biological concepts to change controller parameters ("genes") over time to optimize some measure of controller performance (often called fitness). In our case, the genes are the maximization net weights and the fitness measure is the "work done" score, which is trivial to calculate.

With each iteration of the algorithm (generation), the genes are randomly changed (mutated), split, and recombined in various ways to create many neural nets (members) with unique "genomes". Each member is then tested for fitness, and the most fit members carry forward to the next generation, often with additional mutations.

Over enough generations, fitness will rise until the most-fit member can perform the proper control task. Note that we didn't have to know anything about how that task should be done, just how to assign scores to different outcomes.
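A bare-bones version of that loop looks something like the sketch below. The population size, elitism, crossover, and mutation scheme shown here are generic choices for illustration, not necessarily the ones we used; evaluate() stands in for running a full simulation and returning the "work done" score.

    import random

    # Bare-bones sketch of the evolutionary loop (parameters are illustrative).
    def evolve(population, evaluate, generations=100, keep=10, mutation_std=0.1):
        for _ in range(generations):
            # Score every member; evaluate() runs a simulation and returns "work done".
            scored = sorted(population, key=evaluate, reverse=True)

            # The most fit members survive unchanged into the next generation.
            survivors = scored[:keep]

            # Refill the population with recombined, mutated copies of the survivors.
            children = []
            while len(survivors) + len(children) < len(population):
                a, b = random.sample(survivors, 2)
                child = [(ga + gb) / 2 + random.gauss(0, mutation_std)
                         for ga, gb in zip(a, b)]    # crossover + Gaussian mutation
                children.append(child)

            population = survivors + children
        return max(population, key=evaluate)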

Fitness per generation for a training run (lower values indicate higher fitness). Overall average fitness is green, average fitness per generation is red, and blue shows the fitness of the best member of a given generation.

You may notice the fitness doesn't improve that much; fitness curves for genetic algorithms often show much more dramatic improvement. However, our approach did successfully train the maximization net to identify the best location to head to, based on the agent's internal state and environmental conditions.

The plot below shows a successful run, with the fully trained three-level subsumption architecture controlling the robot to move between the work and health locations until the program is stopped.

A successful test run using fully trained nets in the subsumption architecture. The green line traces the path of the robot, starting from the right and then moving back and forth to gather health (at the pink X) and do work (at the black cross).

 


 
