For this lab, we will use the code developed in lab 6 to search for Busy Beaver Turing machines using genetic algorithms, which are an interesting kind of search technique often used for difficult problems with no obvious structure we can exploit.
Note that the amount of code you need to write for this lab is relatively small, but it will take a while to explain the problem fully, so read the lab carefully before beginning.
In assignment 6 you wrote a program to simulate a 5-state Turing machine, and you used that program to verify that a particular such Turing machine would print out 4098 1's before halting. A "busy beaver" 5-state Turing machine is one that will print out more 1's starting from a blank tape than any other 5-state Turing machine. It is not known whether the Turing machine you simulated is in fact the 5-state busy beaver.
An obvious question is: can we find a better one? Or, alternatively, what's the best one we can find, assuming we have no prior knowledge? We have to search through the space of 5-state Turing machines to find one that can print out the largest number of 1s and then stop. How many such Turing machines are there? For each of five states, we could be reading either a 0 or a 1 on the tape, and in either case we could write either a 0 or a 1, move left or right, and transition to one of six states (the five states plus the halt state). This makes for (2x2x6)^(5x2) = 63,403,380,965,376 different TMs.
Another complication is this: given that we use a finite-sized tape, if the TM runs off the tape, how do we know that it wouldn't have halted eventually if we had a longer tape? We don't, so we just have to see how well we can do given these limitations. If we could test a new TM each microsecond, and we didn't have to worry about running off the tape, we could search through all the TMs in about 64,000,000 seconds, or 740 days. Of course, we don't have that long, so we'll try to do something more clever, which may or may not work well.
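If you want to double-check that arithmetic, it's a couple of lines of Python:

n_tms = (2 * 2 * 6) ** (5 * 2)   # writes x moves x next-states, for each of the 10 (state, symbol) pairs
print(n_tms)                     # 63403380965376
print(n_tms / 1e6 / 86400.0)     # roughly 734 days at one TM per microsecond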
A genetic algorithm (GA) is an algorithm for searching a large space of possible configurations (called a parameter space) of some computational process (here, 5-state Turing machines). The idea is to encode the numbers that determine the configuration and then look for the best set of numbers (the set of numbers is referred to as a parameter set). In our case, these numbers would determine the structure of 5-state Turing machines. There are different ways to encode these numbers; we'll talk about this more below. Assuming that we have such an encoding, we can generate a large population of random Turing machines and simulate each one to see how many 1s it produces starting from an empty tape. This number can be used to determine the "fitness" of each Turing machine; we'll talk about this below as well. The fitness is a measure of how good the parameter set is; higher fitness means a better parameter set.
Once we have determined the fitness of each parameter set in the population, we can use a genetic algorithm to generate a new population of parameter sets that is likely to contain some Turing machines better than the best ones in the current population. The algorithm consists of three stages:
Take the most fit members of the current population and generate a new population (which is the next generation of parameter sets) according to some rule. We will use a rule called "fitness-proportional reproduction". Each of the parameter sets in the new generation is initially identical to one of the parameter sets in the old generation.
Take some of the parameter sets, pair them with other parameter sets and perform an operation called "crossing-over" which generates a new pair of parameter sets from a pre-existing pair. The process takes chunks of the two parameter sets and generates new parameter sets that consist of part of the first one and part of the second one.
Go through each parameter in each of the parameter sets and implement mutation of the parameter with some low probability.
These three processes occur in the order just described. All three of these processes are described in more detail below. The genetic algorithm gets its name because the three steps are roughly analogous to biological genetics (selective breeding, recombination or crossing-over, and mutation). Note that there are a very large number of ways to implement all three of these processes; the implementations we will use will be very simple and straightforward but not necessarily optimal. Note also that genetic algorithms are not guaranteed to find the optimal solution to the problem (unless you have infinite time to run the simulations, in which case anything will work); it turns out that some problems work very well with GAs and others don't work as well. We will see how well they work when we use them to find busy-beaver Turing machines (hint: don't get your hopes up).
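To make the ordering of the stages concrete, here is a minimal sketch of one possible top-level loop. The argument names are placeholders, not the actual interface of the GA class in ga.py; follow the docstrings in the template code rather than this sketch.

def run_ga(population, fitness_fn, reproduce, crossover, mutate, n_generations):
    """Run n_generations of the three GA stages, in the order described above.
    The callables passed in are placeholders for whatever code actually
    evaluates fitness, reproduces, crosses over and mutates."""
    for generation in range(n_generations):
        fitnesses = [fitness_fn(p) for p in population]  # evaluate each parameter set
        population = reproduce(population, fitnesses)    # fitness-proportional reproduction
        population = crossover(population)               # cross over randomly chosen pairs
        population = mutate(population)                  # mutate with low probability
    return population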
To encode a five-state Turing machine, we use ten "slots". Each slot corresponds to the actions to take when in a particular state (0-4) looking at a particular number on the tape (0 or 1). Each slot consists of a tuple of length 3, which includes:
the value to write on the tape (0 or 1)
the direction to move (0 or 1, i.e. left or right)
the state to transition to (an integer between -1 and 4, where -1 represents the halt state)
Each parameter set is a list of these tuples; the list is exactly ten tuples long. Our initial population is a list of these parameter sets, which is therefore a list of lists of tuples. The contents of the tuples of the parameter sets are selected randomly from valid values (e.g. the value to write is either 0 or 1, the direction to move is 0 or 1, and the state to transition to is an integer between -1 and 4).
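As a sketch of what this encoding might look like (the function name is hypothetical, not the one used in the template, and we assume -1 encodes the halt state):

import random

def random_param_set():
    """Build one random parameter set: ten (write, move, next_state) tuples,
    one per (state, tape symbol) pair."""
    param_set = []
    for slot in range(10):                  # 5 states x 2 tape symbols
        write = random.randint(0, 1)        # symbol to write: 0 or 1
        move = random.randint(0, 1)         # direction to move: 0 or 1
        new_state = random.randint(-1, 4)   # next state: 0-4, or -1 for halt
        param_set.append((write, move, new_state))
    return param_set

# The initial population is just a list of these parameter sets:
population = [random_param_set() for i in range(100)]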
The fitness function you will use is very simple: just simulate the Turing machine and add 1 to the total number of 1s written on the tape when the Turing machine halts. If it doesn't halt, runs off the tape, or fails in some other way, give it a fitness of 1. You might wonder why we don't just use the number of 1s directly instead of adding 1. This is to avoid difficulties in the unlikely event that all the Turing machines had a fitness of zero; this would cause division-by-zero errors in our reproduction algorithm (see below).
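As a rough sketch, assuming you have a simulate callable (built on your lab 6 code) that runs the encoded TM on a blank tape, returns the number of 1s left when it halts, and raises an exception if it runs off the tape or makes no progress:

def fitness(param_set, simulate):
    """Sketch of the fitness calculation: number of 1s written, plus 1."""
    try:
        ones = simulate(param_set)
    except Exception:
        return 1        # a failed run gets the minimum possible fitness
    return ones + 1     # add 1 so the sum of fitnesses can never be zero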
We implement reproduction as follows.
We compute the fitness for each parameter set.
We create a new array (list) to store the new parameter sets which is the same size as the old array.
We take the very best parameter set and copy it to the first location in the array. This parameter set will not be altered at all in this location.
We add up all the fitnesses (including the best one) and determine what proportion of the total number of remaining available locations in the parameter sets array is represented by each parameter set (including the best one). Then we round this to an integral number for each parameter set and copy that number of each parameter set to the new array. There are different ways to do this, and it's not necessary to be absolutely precise, but each location in the new array should have a parameter set in it and more fit parameter sets should be copied into more locations than less fit sets. NOTE: Make sure that you copy each parameter set and don't just copy a reference to it. This is the single most common bug in this lab.
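For concreteness, here is a minimal sketch of one way to implement this step; the function name and the exact rounding scheme are placeholders, and any scheme satisfying the rules above is fine.

def reproduce(population, fitnesses):
    """Sketch of fitness-proportional reproduction."""
    n = len(population)
    total = float(sum(fitnesses))
    best = max(range(n), key=lambda i: fitnesses[i])
    # Slot 0 always holds an untouched copy of the best parameter set.
    # Note the [:] -- copy the list, don't just copy a reference to it.
    new_population = [population[best][:]]
    for i in range(n):
        # Give each set a share of the remaining n - 1 slots roughly
        # proportional to its share of the total fitness.
        n_copies = int(round(fitnesses[i] / total * (n - 1)))
        for j in range(n_copies):
            if len(new_population) < n:
                new_population.append(population[i][:])
    # Rounding can leave a few slots empty; fill them with copies of the best set.
    while len(new_population) < n:
        new_population.append(population[best][:])
    return new_population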
Pairs of parameter sets are selected at random and each member of the pair is crossed over at one location (which is randomly selected for each pair). This means that two old parameter sets become two new ones as follows:
old: 1: A B C D E F G H I J
     2: a b c d e f g h i j
new: 1: A B C D e f g h i j
     2: a b c d E F G H I J
where A-J and a-j represent parameter slots in order. In this case the crossover occurred at the position between slots D and E (d and e) of both sets. The location of a crossover is random but must be between two slots in a parameter set. There is a number called a "crossover probability" (p_crossover) which determines how many parameter sets in the population get crossed over:
n_pairs = int(p_crossover * (nsets - 1) / 2)
Choose pairs so that once a given pair has been crossed over, it won't be chosen again (sampling without replacement). Also, make sure that your crossover location actually does something e.g. don't cross over at location 0. Also, do not alter the best parameter set that is in the first location when doing crossovers.
You may find the random.shuffle() function to be useful. Type pydoc random for more information on it.
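Here is a minimal sketch of one way to implement this step; the function names are placeholders, and your real code should follow the docstrings in ga.py.

import random

def cross_over(set1, set2):
    """Single-point crossover of two parameter sets.  Picks a point strictly
    inside the sets (never 0, so the swap actually does something) and
    returns two new sets."""
    point = random.randint(1, len(set1) - 1)
    new1 = set1[:point] + set2[point:]
    new2 = set2[:point] + set1[point:]
    return new1, new2

def do_crossovers(population, p_crossover):
    """Pair up randomly chosen parameter sets without replacement, skipping
    slot 0 (the saved best set), and cross over n_pairs of the pairs."""
    nsets = len(population)
    n_pairs = int(p_crossover * (nsets - 1) / 2)
    indices = list(range(1, nsets))   # never touch the best set at index 0
    random.shuffle(indices)           # random pairing, no index used twice
    for k in range(n_pairs):
        i, j = indices[2 * k], indices[2 * k + 1]
        population[i], population[j] = cross_over(population[i], population[j])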
Each parameter slot in each parameter set (except for the first one, which we leave alone) should be mutated with some low probability. What this means is that for each location in each parameter slot (there are three of these, representing the number to write on the tape, the direction to move in, and the state to change to), a random number is generated and if it's below a threshold (called the mutation probability), the contents of the location are altered randomly. For instance, if the number to write on the tape is mutated, you could change it from 0 to 1 or vice versa; if the direction is to be mutated, you could change it from left to right or vice-versa, and if the state to transition to is to be mutated, you could choose it randomly from the range -1 to 4.
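A minimal sketch of one way to implement this, again with placeholder names and assuming -1 encodes the halt state:

import random

def mutate_all(population, p_mutation):
    """Mutate every field of every slot in every parameter set, except the
    saved best set in slot 0, with probability p_mutation per field."""
    for s in range(1, len(population)):          # leave the best set (index 0) alone
        param_set = population[s]
        for i in range(len(param_set)):
            write, move, state = param_set[i]
            if random.random() < p_mutation:
                write = 1 - write                # flip the symbol to write
            if random.random() < p_mutation:
                move = 1 - move                  # flip the direction to move
            if random.random() < p_mutation:
                state = random.randint(-1, 4)    # pick a new state (-1 = halt)
            param_set[i] = (write, move, state)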
We're supplying you with these files:
ga.py This file will contain the implementation of the genetic algorithm.
bb_search.py This file will run the parameter search to find busy beaver Turing machines.
Much of the code is supplied for you, but both of these files are incomplete; you will have to fill in the methods with comments marked TODO (remove the TODO comments, of course). Don't modify any of the other methods at all. Make your method implementations conform to the docstring comments for each method. You will also need the busy_beaver.py program you wrote for assignment 6.
The ga.py file is self-contained and can be tested independently of either of the other files. The reason for this is that all of the aspects of the genetic algorithm that are specific to the busy beaver problem are passed to the GA code as arguments to the GA constructor. These arguments are functions which handle initialization, mutation, evaluation and saving of parameter sets. (Note that it's perfectly legal in python to pass a function as an argument to another function, just like it is in languages like Lisp or Scheme.) Most of this code has been provided in bb_search.py for the busy beaver problem.
For testing the GA code, though, we've supplied a simple (perfectly solvable) test problem at the end of ga.py which can be invoked by typing
% python ga.py
once the other parts of the GA code are filled in. You should make sure that this works before trying to run the GA on the busy beaver problem. If your GA code cannot perfectly solve the test problem there is probably something wrong with your code (try it a few times; it runs for ten generations only so it may not work every time).
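If passing a function as an argument is new to you, here is a tiny standalone illustration (it has nothing to do with the lab code itself):

def apply_twice(f, x):
    """Apply the function f to x twice."""
    return f(f(x))

def add_one(n):
    return n + 1

print(apply_twice(add_one, 5))   # prints 7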
When searching for busy beavers, you should run your busy beaver simulation code from lab 6 in silent mode so as not to clutter up the output with a lot of irrelevant printouts (modify the lab 6 code if necessary).
Once your GA class works correctly and you've written the necessary code in bb_search.py, you're ready to search for busy beavers. Type python bb_search.py and away you'll go. I recommend you try to run the program for at least an hour (you may want to run it late at night when the CS cluster computers are less heavily loaded). Every generation, save the best TM found so far to a file (the code supplied does this automatically). The name of the file should be best.<n> where <n> is the generation number. You should also submit a file containing some sample output from the GA program as it searches for busy beavers, as well as another file which gives the best BB TM you found along with the number of 1s it outputs. To save the sample output file for the program as it's running, run it like this:
% python bb_search.py | tee bb_search.out
which will print out the results of the program to the terminal while simultaneously saving them to the file bb_search.out. Halt the program whenever you get tired of it, but let it go at least ten generations. Note that by default, python bb_search.py will only search for ten generations and then halt; if you want to search for longer, give an integer command line argument. If it's negative, the program will run until you halt it; if it's positive, it will run that many generations.
NOTE: Once your program is finished, it must be possible to independently check how many 1s the best Turing machine produced. The easiest way to test this is to copy the Turing machine's description file to a file called "bb.in" (just like in lab 6) and run "python busy_beaver.py". If this fails for any reason, the lab will have to be redone. Make sure that you don't break your old busy beaver code while changing it for this program! This should be obvious, but for some reason, it isn't.
The best busy beaver found by my program so far produced only 21 1s before stopping. Hopefully your program will do better. You can try altering the mutation and/or crossover probabilities to see if that will help. One interesting feature is that the algorithm almost invariably picks out Turing machines that have only one halt state, even without being specifically programmed to do so. Such is the power of (un)natural selection.
Every year, most of the lab 7 submissions are broken in some way, and have to be redone. The reasons are usually trivial and easily fixed.
You need to read and understand all of the code, not just the parts that you write! In particular, you should understand what all the arguments to the constructor are for, and use them appropriately. If you don't, it'll result in a redo. You also need to read the doc comments for all of the methods carefully, and make sure your code reflects those comments. If you don't... you guessed it, a redo.
Pay particular attention to efficiency in this program! It is very easy to write a program that takes excessively long to run tests of single TMs, which will make the overall program run time way too long. Feel free to modify your busy beaver code from lab 6 to make it run faster. Also, be very careful to catch exceptions that can be thrown e.g. when a TM runs off the tape. In such cases you do not want the program as a whole to terminate (which is why we use exceptions in the first place). You also don't want silent failure i.e. just halting the simulation without throwing an exception when the TM runs off the tape. This can easily lead to excessively large (and bogus) 1 counts for a TM (see below).
The most common reason for excessive run times is invalid or weak checking for lack of progress in the lab 6 code (i.e. the TM is not making any progress, so it needs to be terminated, but it isn't and keeps on going for an abnormally long time). The second most common error is invalid error handling. So if any of these situations happen:
the program hangs for an abnormally long time on a single parameter set
the program can't get through ten generations without aborting
the program can't get through ten generations in an hour
then the program will be considered to be broken and you'll have to redo it. Almost always, these problems reflect problems in the busy beaver code from lab 6. Feel free to edit that code to make this lab work properly. You should only need to edit the run method, which is where the progress checking code should be.
At this point, you probably think I've belabored this enough, but I will add one more thing: none of your TMs should take more than a minute before a decision is made. If they haven't terminated after a minute, they should abort. You don't need to use a timer, you just need to set very harsh progress checking criteria e.g. check for progress every 10000 steps and require it to have printed at least 10 more 1s each time you check. So if you notice some TMs taking significantly longer than a minute to run, make your termination checking conditions more stringent.
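As an illustration, here is a sketch of the kind of harsh progress test you might call every 10000 steps from your lab 6 run method; the names are hypothetical, so adapt them to whatever your code actually uses.

class NoProgressError(Exception):
    """Hypothetical exception for a TM that appears to be stuck."""
    pass

def check_progress(ones_now, ones_at_last_check, min_new_ones=10):
    """If fewer than min_new_ones new 1s have been written since the last
    check, give up on this TM by raising an exception."""
    if ones_now - ones_at_last_check < min_new_ones:
        raise NoProgressError("TM is not making progress; aborting")
    return ones_now   # save this as ones_at_last_check for the next check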
Another very common bug is to have TMs which apparently generate ridiculously large numbers of 1s. If a TM is generating 5000 1s before stopping, and the tape is 10000 cells long, and you started the TM in the exact middle of the tape, then what is almost certainly happening is that the TM is running off the end of the tape and your exception handling code isn't working properly. I have never seen a TM generated by this genetic algorithm that correctly produces more than 21 1s, so even if your TM generates e.g. 200 1s, you probably have a bug. If this happens, try testing the resulting TM with your unmodified lab 6 code to see how it does (you may have introduced bugs when modifying the lab 6 code). Also, recall that a TM that runs off the tape should be considered to generate no 1s at all.
Another problem with this lab is that it's very easy to screw up the crossover and/or the mutation part of the GA code and still get something that appears to work. In particular, the test problem is perfectly solvable even without crossing over of parameter sets. Be extra careful when debugging your crossover code. One way to see if it's working is to try to solve the test problem with mutation disabled and with a small number of parameter sets. If no progress is made after a few generations it may indicate that your crossover algorithm has a bug. This should be tested using the test problem.
Another very common bug is to not copy parameter sets properly. If you have an array of parameter sets and try to copy one to another like this:
self.params[10] = self.params[0]
where self.params[0] is a list of parameters, all that this does is make self.params[10] point to the same list as self.params[0]. So mutating self.params[0][0] will change what you see through both self.params[0] and self.params[10], because they are the same list. Bugs of this form are called aliasing bugs in the programming literature, and they can be very tough to track down.
To fix this, explicitly copy the list like this:
self.params[10] = self.params[0][:]
A small change, but it makes a big difference. One symptom of bad copying is to have an inordinately large number of sets with exactly the same fitness (other than 1, for obvious reasons). Yet another symptom is when the fitness of the parameter set in the first location (which you're saving because it's the best set from the last generation) suddenly changes to a lower value for no obvious reason. Code with this bug is broken, even if good results are obtained, and you'll have to fix it.
Note, however, that the following code will not copy individual parameter sets:
new_params = self.params[:]
This is because the list slicing operator only does a shallow copy of the list. Each item in new_params is effectively a pointer to the same list that the corresponding item in self.params is. This is an incredibly common pitfall, so don't get fooled. There is a deepcopy function in the copy module if you really need to do a deep copy, but usually it isn't necessary.
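To see the difference concretely, here is a small standalone demonstration (not part of the lab code):

import copy

params = [[1, 2, 3], [4, 5, 6]]

shallow = params[:]            # new outer list, but the same inner lists
shallow[0][0] = 99
print(params[0])               # [99, 2, 3] -- changing the "copy" changed the original

deep = copy.deepcopy(params)   # copies the inner lists too
deep[0][0] = -1
print(params[0])               # still [99, 2, 3] -- the original is untouched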
To make it easier to track down aliasing bugs, we've added a method called check_for_aliasing to the template code in ga.py, and called it from the generation method. Do not alter this code in any way! If your code results in an assertion failure, you have a bug and you should fix it.
The files bb_search.py, ga.py, and your (possibly modified) busy_beaver.py code from lab 6.