CS 11 Python Track: Assignment 6: Simulated Annealing


Goals

In this assignment you will learn some of the basics of SciPy, a Python library for scientific computing. In addition, you will learn about optimization using simulated annealing.


Covered this week


Instructions

lab6a: Optimizing Simple Functions

A common problem in computer science (and indeed in all of science) is the optimization of some known function over the range of its possible inputs. For instance, let's find the dimensions that maximize the area of a rectangular region enclosed by n total feet of fencing, with the constraint that we aren't allowed to subdivide a foot of fencing. A priori, we know the answer will be a square. Let's write this up as a function:

    def fencing(l, w, n):
        # Area enclosed by an l-by-w rectangle; the assert checks that the
        # perimeter uses exactly n feet of fencing.
        assert 2 * (l + w) == n
        return l * w
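
For example, with 20 total feet of fencing a square encloses more area than a longer, thinner rectangle:

    fencing(5, 5, 20)    # a 5 x 5 square encloses 25 square feet
    fencing(7, 3, 20)    # a 7 x 3 rectangle uses the same fencing but encloses only 21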

[10] Let's find the optimal input to fencing when given a particular amount of fencing, n. Write a function called solve_fence(n) which, given a total amount of fencing n (which we will assume is an even number), will find the optimal length and width of the rectangular region. Use the naive approach of searching through the entire potential solution space. Assume the lengths and widths are non-negative integers.

[5] Why is this type of solution a bad idea in general? Write your answer in a comment.

A better strategy would be to use what is called "gradient descent". This means that we choose a starting point and determine whether increasing or decreasing a parameter along a particular direction would improve our solution. We move one step in the best direction and repeat until no move in any direction improves the solution. This can be done in any number of dimensions, but it's particularly simple for a one-dimensional problem.
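
Here is a minimal sketch of this kind of one-dimensional integer search over an arbitrary function to maximize (the name hill_climb and the arguments utility, lo, and hi are illustrative only, not part of the assignment):

    import random

    def hill_climb(utility, lo, hi):
        # Start at a random integer in [lo, hi] and repeatedly move to
        # whichever neighboring integer improves the utility; stop when
        # neither neighbor is an improvement.
        x = random.randint(lo, hi)
        while True:
            neighbors = [c for c in (x - 1, x + 1) if lo <= c <= hi]
            if not neighbors:
                return x
            best = max(neighbors, key=utility)
            if utility(best) <= utility(x):
                return x
            x = best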

[10] Write grad_fence(n) to implement gradient descent for the fencing problem. Start at a random point inside of the solution space. Only consider integer lengths, as before.

[5] Why is this an improvement? What could be a potential problem with this approach for general problems? Write your answer in a comment.

Consider this: what if we have a function with two maxima, one global and one merely local? Let's look at a function that has these properties.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-6, 3, 40)
    plt.plot(x, -2*(x+6)*(x+2)*(x-3)*x)
    plt.show()

    def f(x):
        return -2*(x+6)*(x+2)*(x-3)*x

[10] Write grad_f(start, stop) where start and stop are the minimum and maximum x's to consider. For simplicity, only consider integer values of x (so technically this is not a "gradient descent" but the principle is the same). Consider the range from below the minimum root to above the maximum root.

[5] Run your code a few times. What do you notice? Why is this the case?

[5] Now run your code once with start and stop within 1 of one of the maxima. What do you see now? Now try the same trick with the other maximum. What happens?

Another optimization method called "Simulated Annealing" can help solve this problem. Annealing is the process by which glass, metal or other molten material cools. The speed at which it cools affects the properties of the material and can introduce banding. Sometimes, the material is reheated in the middle of the process to slow the cooling down further to add additional order to the cooling material. Much as genetic algorithms use principles of biology to generate results in computer science, simulated annealing is a technique that uses a simulated physical "heating/cooling" process to achieve results.

Simulated annealing is the repetition of the following process: the temperature decreases; then the current guess of the global maximum is randomly perturbed by an amount based on the temperature. If the new guess is better than the old guess, the change is accepted. Otherwise, there is still a chance that the new (worse) guess will be accepted. For a very high temperature, this probability should be close to 1, and for a very low temperature, it should be close to 0. (Of course, probabilities should never be negative or greater than 1.) One formula we can use for the probability that a bad move is accepted is the following:

     Goal: find x for which f(x) is maximized
     Given: two values of x, old_x and new_x
     If f(new_x) > f(old_x), accept.
     Otherwise (f(new_x) <= f(old_x)), accept with this probability:
     p(accept) = exp((f(new_x) - f(old_x)) / T)

Other formulas for p(accept) are also possible, as long as they have the correct asymptotic behavior (i.e. the probability goes to 1 for high temperatures and goes to 0 for low temperatures).

Note that as the temperature decreases, the system is less likely to accept a worse guess. This means that the algorithm gets less tolerant as time goes on and starts converging to the nearest local maximum as T gets small.

When running simulated annealing, the most important parameter is the temperature T. This variable measures how much randomness there is in the system. When the temperature is high, the parameter(s) change randomly. When the temperature is very low, the simulation has "annealed" and very little movement occurs. When the temperature hits 0, the algorithm is done and the answer is returned. You can also stop the algorithm early by limiting the number of steps it will take. However, you should make sure that there are enough steps for the temperature to get close to zero.
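
As a concrete sketch of the acceptance step described above (the helper name accept_move is purely illustrative):

    import math
    import random

    def accept_move(f_old, f_new, T):
        # Always accept an improvement.  Otherwise accept the worse guess
        # with probability exp((f_new - f_old) / T): close to 1 when T is
        # large, close to 0 when T is small (the exponent is negative here).
        if f_new > f_old:
            return True
        return random.random() < math.exp((f_new - f_old) / T)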

[30] Implement simulated annealing on the function f. Choose a good rate of temperature decrease. We found that multiplying by a fraction close to 1 worked well enough. Find good inputs so that the answer is almost always the correct solution (say, 9 out of 10 times).

WARNING: Simulated annealing can be extremely sensitive to parameter variations. These include the initial temperature, the rate of decrease of the temperature, the distance to move in the parameter space per step, etc. We strongly recommend that you litter your code with print statements initially until you are convinced that things are working properly. Make sure you search over a fixed parameter range, and if you go beyond the range, do something intelligent (like wraparound). Don't let the magnitude of your parameters get arbitrarily large! Make sure arguments to the exp function don't get too large; your code shouldn't crash partway through the simulation! You should also use assert statements to make sure that e.g. probabilities are always in the range [0, 1].
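
As one illustration of "something intelligent", a small hypothetical helper could wrap an out-of-range value back into the allowed interval:

    def wrap(x, lo, hi):
        # Map x back into the half-open interval [lo, hi) by wrapping around.
        return lo + (x - lo) % (hi - lo)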

Also, try running the annealing simulation starting from around the point where the local (but not global) maximum is. If things are working correctly, it should usually still end up at the global maximum. If not, you may be lowering your temperature too quickly, which defeats the purpose of the algorithm.

lab6b: NP-hard problems and kNaPsacking

(See what we did there? :-))

NP-hard problems are, roughly speaking, problems that are difficult to solve algorithmically, even though verifying that a proposed solution is in fact a solution is often easy. You can learn more on this subject in courses like CS 21 and CS 38. For our purposes, it is sufficient to note that NP-hard problems are sufficiently complex that checking every possible solution is not feasible. In addition, a number of changes to a candidate solution of an NP-hard problem that each individually decrease the utility (i.e. the "desirability" of the solution) can, when combined, lead to a higher overall utility. If you have a function that represents an NP-hard problem, it is likely to have many local maxima.

A classic NP-hard problem is the knapsack problem. In this problem, there is a collection of objects where each object has some weight and some value. Some objects may be duplicated, i.e. there may be more than one of some kind of object. Your goal is to load up a knapsack such that the sum of the weights of all objects in the knapsack does not exceed the weight limit while maximizing the combined value of the items in the knapsack. (There are variants of this problem, but we are not worried about them for this section.)

[15] Write the knapsack(lst, weight_budget) function. It should take a list of tuples, each of which represents a single object of the form (weight, value). It should return a tuple of two items: a list of tuples that represent the objects that can go in the knapsack, and the total value of the items in that list. Note that your solution will have to search over nearly the entire search space to get the optimal answer. This is intentional for now.

Here is an example of inputs for this function.

    a = [(12, 4), (1, 2), (4, 10), (1, 1), (2, 2)]
    weight_a = 15

In this case, a is the list, and weight_a is the weight budget. The correct answer is ([(2, 2), (1, 1), (4, 10), (1, 2)], 15). (The order of the elements in the returned list is irrelevant, of course.)

[15] Now find a (possibly non-optimal) solution to the knapsack problem using simulated annealing. There are a number of ways to change your guess based on T. Find one that works for you. Your function should usually return the same answer as the brute-force search for the list of tuples a and the weight budget weight_a.

Your function signature should be knapsack_anneal(all_items, max_weight, max_steps, T) where all_items is a list of tuples, as it was in knapsack. max_weight is an integer, as is max_steps. T should be a float.

You will also want to be clever about your choice of utility function. Remember that knapsacks that try to fit more weight than the maximum are not allowed.
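
One possible shape for such a utility (a sketch only; the helper name knapsack_utility and the particular penalty for overweight knapsacks are illustrative, not required):

    def knapsack_utility(items, max_weight):
        # items is a list of (weight, value) tuples currently in the knapsack.
        # Overweight knapsacks get a negative score so the annealer is pushed
        # back toward feasible solutions; feasible knapsacks score their
        # total value.
        total_weight = sum(w for w, v in items)
        total_value = sum(v for w, v in items)
        if total_weight > max_weight:
            return -(total_weight - max_weight)
        return total_value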

Since this solution method should run much faster than brute-force methods, you can test it on a more complex set of objects.

    b = [(70,135), (73,139), (77,149), (80,150), (82,156), 
         (87,163), (90,173), (94,184), (98,192), (106,201), 
         (110,210), (113,214), (115,221), (118,229), (120,240)]
    weight_b = 750

The best possible score for this data set has a weight of 749 and a value of 1458:

    ([(70, 135), (77, 149), (82, 156), (90, 173),  
      (94, 184), (98, 192), (118, 229), (120, 240)], 1458) 

Don't worry about getting this exact value, but you should at least be close.

lab6c: SciPy

[5] SciPy is a library related to NumPy and pandas that is specifically meant for scientific computing. As a brief taste of its potential, use scipy.optimize.basinhopping to find the global maximum of the function f from part A. You know what the answer should look like, so find some decent parameters. Be aware that the basinhopping algorithm looks for a global minimum, not a maximum, so you will need to take this into account. Make sure you print the output of the basinhopping function so you can see what the search results are.

By the way, the basinhopping algorithm isn't exactly simulated annealing but is in the same broad class of stochastic search algorithms.
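
For orientation, a minimal call might look like the sketch below, assuming the function f from part A is already defined in your file; the niter, T, and stepsize values are illustrative, not tuned recommendations:

    from scipy.optimize import basinhopping

    def neg_f(x):
        # basinhopping minimizes, so minimize -f to maximize f.
        # x arrives as a length-1 array.
        return -f(x[0])

    result = basinhopping(neg_f, x0=[0.0], niter=100, T=1.0, stepsize=1.0)
    print(result)    # result.x is the best x found; -result.fun is the maximum of f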

[10] Write whatever functions you need for scipy.optimize.basinhopping to be applicable to the knapsack problem. Note how much cleaner and easier this is. Test it on both knapsack problems from part B.

Note that you might want to write your own step function because knapsack is discrete and basinhopping works for continuous functions by default. Here is some example code to help.

    class MyTakeStep(object):
        def __call__(self, x):
            # x is the array that represents which items are in the knapsack.
            # Each value in the array is 0 if the item is not present or 1 if it
            # is.  Modify the array here in some random way.
            return x
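
For example, one possibility (among many) is a step that flips a single randomly chosen item in or out of the knapsack; the class name RandomFlipStep is illustrative:

    import numpy as np

    class RandomFlipStep(object):
        # Hypothetical take-step: toggle one randomly chosen item in or out
        # of the knapsack.
        def __call__(self, x):
            i = np.random.randint(len(x))
            x[i] = 1 - x[i]
            return x

An instance of such a class can be passed to basinhopping through its take_step argument.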

Similarly, the function you want to optimize should take as its input an array of 0/1 values and compute the resulting value of the knapsack given that the 0 items were discarded and the 1 items were retained.
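
One possible shape for that objective (a sketch; the name neg_knapsack_value is hypothetical, and the extra arguments would need to be bound with functools.partial or a lambda before handing the function to basinhopping):

    def neg_knapsack_value(x, items, max_weight):
        # x is an array of 0/1 flags, one per (weight, value) tuple in items.
        # basinhopping minimizes, so return the negated knapsack value;
        # overweight knapsacks get a positive penalty instead.
        chosen = [item for flag, item in zip(x, items) if flag >= 0.5]
        total_weight = sum(w for w, v in chosen)
        total_value = sum(v for w, v in chosen)
        if total_weight > max_weight:
            return float(total_weight - max_weight)
        return -float(total_value)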

Running the basinhopping algorithm might give you runtime warnings. These are not usually a problem. You can redirect the warning (stderr) output to /dev/null if you want to ignore it.

[10] Play with the T and niter arguments. Find parameters that work well on the second knapsack data set (b with weight_b) from part B. This knapsack has a weight limit of 750. What is the best answer you get? Can you get the actual best value (1458)?


To hand in

The files lab6a.py, lab6b.py, and lab6c.py.


Copyright (c) 2017, California Institute of Technology. All rights reserved.