Simplex Optimization



By Larry Andrews

Sidebar: Function Objects


In science, engineering, finance, and other fields, practitioners often want to find the "best" solution to some problem. The problems change, but the need for stable, robust methods for finding best solutions does not. Writing a program using least-squares, conjugate-gradients, and similar optimization methods can be complex; it is often necessary to code the derivatives of the function being optimized. These methods may require exquisite care in the program design so that unexpected results (such as divergence from any solution) do not occur. Often overlooked, simplex optimization is a reasonable and quick-to-implement solution, useful at least for the initial stages of rapid development and for testing methodologies.

I first saw the simplex method in MINUIT, a Fortran package from CERN. MINUIT includes several optimization techniques, but simplex met my needs because it was simple to integrate into a program I was writing. Later, I used the same code in other programs. When I stopped using Fortran, I still needed to do optimizations, and I've coded the simplex method in Pascal, Modula-2, Visual Basic, and now C++.

C++ offers several advantages in implementing the simplex algorithm: good memory management, valarrays that implement simple array arithmetic, and templates all contribute to elegant packaging of the simplex method. Function objects allow the creation of a self-contained simplex template that requires no special knowledge of the problem domain or of how that domain is implemented programmatically.

Simplex optimization is easy to implement because it does not require the evaluation of derivatives. Its operation is simple to understand, and since its internal method is inherently iterative, it is possible to monitor the progress of the algorithm. Its stability is well known; it does not go off to infinity searching for nonexistent minima. It often works well even when the function being evaluated is discontinuous or does not have derivatives at all. In fact, sometimes it is about the only general method that can solve a problem.

Most scientific and engineering optimization problems use experimental data. Because of measurement error, the problems often have no actual zero, just some minimum that you want to get to. Least-squares is a common method of implementing optimization, but it often suffers from several problems. First, you need to derive and code the derivatives, sometimes a substantial task. Writing the derivatives tends to be an error-prone process. Then you need to implement or find matrix and vector code, including matrix inversion (and hope that what you choose is sufficiently robust). Finally, nonlinear least-squares is notorious for poor convergence in many cases; sometimes, it just diverges from any acceptable answer for reasons that are hard to diagnose. The simplex method, on the other hand, is straightforward, and if you need to, you can get inside and see the actual steps. Simplex just walks around in a stylized way, and you are guaranteed that you will at least have the best answer you have ever stumbled upon.

How Simplex Works

A simplex is defined as the simplest kind of object that can fill a space of a given dimension: in n dimensions it has n+1 vertices. In two dimensions, the simplex is a triangle; in three dimensions, it is a tetrahedron. The simplex optimization method begins with an initial (often guessed) point (that is, a set of parameter values) and constructs a simplex near that guessed point. The objective function is evaluated at each of the simplex vertices, the list of values is kept, and the best and worst vertices are located in the list. In any step, the best-known vertex is never replaced, and the worst-known vertex is always replaced.

The centroid (average) of all of the points in the simplex without the worst point is computed. From this point, at each step in the process one of the following methods is invoked to compute the next simplex:

  1. Reflection. The worst point is reflected through the centroid. If the new point is better than the worst and not better than the best, the worst point is replaced; Figure 1(a).

  2. Expansion. If the point tested in step 1 is better than the best point in the simplex, then the step from the worst to the new point is doubled, and the new point is tested. If it is better than the best, then the worst is replaced with the second new point; Figure 1(b).

  3. Replacement. If the expansion point in step 2 is not better than the best, then the worst is replaced with the point tested in step 1.

  4. Contraction. The more aggressive attempts to find a better point having failed, the point midway between the worst point and the centroid is tested. If that point is better than the worst, it replaces the worst; Figure 1(c).

  5. Shrinking. Okay! Nothing else worked. Replace every point except the best with the point midway between that point and the best; Figure 1(d).

The original publication of the simplex method [1] used only the Reflection method. That approach often suffered from slow convergence and, since it had no way to reduce the size of the simplex, it could not converge down to the best answer. Others have improved the algorithm by including steps 2 through 5; a sketch of the resulting step logic follows.
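To make the five steps concrete, here is a minimal sketch of the decision logic for a single iteration. It is my own illustration rather than the code in Listing 1: the names (Step, pts, vals, best, worst) are placeholders, the vertices are held in a std::vector instead of the valarray of valarrays discussed later, and all of the bookkeeping the real template does (locating the best and worst vertices, testing for convergence) is omitted.

#include <cstddef>
#include <valarray>
#include <vector>

// One reflection/expansion/contraction/shrink step over a simplex.
// f           -- the function object being minimized
// pts, vals   -- the simplex vertices and their objective values, vals[i] == f(pts[i])
// best, worst -- indices of the current best and worst vertices
template <typename Fn>
void Step(Fn &f, std::vector<std::valarray<double> > &pts,
          std::vector<double> &vals, std::size_t best, std::size_t worst)
{
    // Centroid of every vertex except the worst.
    std::valarray<double> centroid(0.0, pts[0].size());
    for (std::size_t i = 0; i < pts.size(); ++i)
        if (i != worst) centroid += pts[i];
    centroid /= double(pts.size() - 1);

    // Step 1: reflect the worst point through the centroid.
    const std::valarray<double> reflected = centroid + (centroid - pts[worst]);
    const double fr = f(reflected);

    if (fr < vals[best]) {
        // Step 2: expansion -- double the step from the worst point and test again.
        const std::valarray<double> expanded = centroid + 2.0 * (centroid - pts[worst]);
        const double fe = f(expanded);
        if (fe < vals[best]) { pts[worst] = expanded;  vals[worst] = fe; }
        else                 { pts[worst] = reflected; vals[worst] = fr; }  // Step 3: replacement
    } else if (fr < vals[worst]) {
        // Step 1 succeeded: the reflected point replaces the worst.
        pts[worst] = reflected;  vals[worst] = fr;
    } else {
        // Step 4: contraction -- try midway between the worst point and the centroid.
        const std::valarray<double> contracted = 0.5 * (centroid + pts[worst]);
        const double fc = f(contracted);
        if (fc < vals[worst]) { pts[worst] = contracted; vals[worst] = fc; }
        else {
            // Step 5: shrink every vertex except the best toward the best.
            for (std::size_t i = 0; i < pts.size(); ++i)
                if (i != best) { pts[i] = 0.5 * (pts[i] + pts[best]); vals[i] = f(pts[i]); }
        }
    }
}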

Implementation

Listing 1 is the basic simplex template. Some simple access routines for setting and getting simplex parameters have been deleted for clarity, but they are included in the files on the CUJ site (http://www.cuj.com/code/). Several of those access routines are in the commented code in Listing 2.

The first, obvious thing that is needed is some way to store the simplex. I have implemented the simplex as a Standard Library valarray of the points that define the simplex. The points are themselves implemented as Standard Library valarrays. Valarrays implement simple array arithmetic such as sums, differences, and multiplication by scalars, which considerably simplifies several functions. For instance, summing the points of the simplex without the worst point just becomes (in pseudocode): simplex.sum() - simplex[worst point]; dividing by the number of remaining points then gives the centroid. Also, the Standard Library allows the sizes of containers to be defined at run time, making the implementation independent of the details of the particular problem being solved. For instance, in the 2D example case, the valarrays for the points of the simplex (triangle) are just valarrays of length 2. Since there are three points, the valarray that contains the points is of length 3.
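As a minimal, standalone illustration of the kind of arithmetic involved (separate from Listing 1, with made-up vertex values), consider the 2D case:

#include <iostream>
#include <valarray>

// Three vertices of a 2-D simplex (a triangle), each a valarray of length 2.
// Suppose the third vertex, c, is currently the worst.
int main()
{
    std::valarray<double> a(2), b(2), c(2);
    a[0] = 1.0;  a[1] = 1.0;
    b[0] = 1.01; b[1] = 1.0;
    c[0] = 1.0;  c[1] = 1.01;

    // Centroid of the remaining points -- element-wise valarray arithmetic,
    // no explicit loop over the coordinates.
    std::valarray<double> centroid = (a + b) / 2.0;

    std::cout << centroid[0] << ", " << centroid[1] << '\n';   // prints 1.005, 1
    return 0;
}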

To provide more safety (for instance, protection from the user changing the data), I copied the function object into the template instantiation. This is not really necessary, but it provides some protection against the data being changed or the external function object's being deleted.

The other important component of the interface to the simplex class is fnMinimize(), which returns an std::pair. The first element of the returned pair is the agreement factor at the end point, and the second is the std::valarray of the best set of parameters found. For this reason, the calling function (see Listing 2) needs to provide:

pair<double, valarray<double> > p;

An Example

To demonstrate the simplex template, Listing 2 provides a simple driver program and a small function object whose objective function is the parabolic bowl (x-2)^2 + (y-1)^2, with its minimum at (x=2, y=1); so there is an actual point where the objective function is zero. The demonstration function object is quite simple. It provides only a constructor that stores the initial values of the two parameters (a and b in the example) and two call operators:

double operator () ( const std::valarray<double> );

std::valarray<double> operator () ();

The first call operator returns (x-2)^2 + (y-1)^2. The second returns the starting set of parameter values as an std::valarray. In the example main program, the starting point is (11, -5). Figure 2 shows the sequence of simplexes as the algorithm progresses: the simplex starts off moving rapidly along the x-direction until it passes the x-coordinate where the minimum lies (that is, x=2); at that point, it turns and walks toward the global minimum.
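For reference, the whole demonstration function object can be reconstructed along the following lines. This is a sketch consistent with the description above rather than the exact code of Listing 2; the class name ParabolicBowl and the member names are my own, and the small main() merely exercises the two call operators rather than running the full minimization.

#include <iostream>
#include <valarray>

// A function object for the parabolic bowl (x-2)^2 + (y-1)^2.
class ParabolicBowl
{
public:
    // Store the starting values of the two parameters.
    ParabolicBowl(double a, double b) : m_start(2)
    {
        m_start[0] = a;
        m_start[1] = b;
    }

    // First call operator: evaluate the objective function at a point.
    double operator()(const std::valarray<double> x)
    {
        return (x[0] - 2.0) * (x[0] - 2.0) + (x[1] - 1.0) * (x[1] - 1.0);
    }

    // Second call operator: return the starting set of parameter values.
    std::valarray<double> operator()()
    {
        return m_start;
    }

private:
    std::valarray<double> m_start;
};

int main()
{
    ParabolicBowl bowl(11.0, -5.0);                       // starting point (11, -5)
    std::valarray<double> start = bowl();                 // (11, -5)
    std::cout << "f(start) = " << bowl(start) << '\n';    // 81 + 36 = 117
    // In Listing 2, an object like this is copied into the simplex template,
    // whose fnMinimize() walks it down toward the minimum at (2, 1).
    return 0;
}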

Limitations

Sometimes the simplex method is the best method you can use to get something working quickly. If the computational burden of function evaluation is not too large, then almost any method may work well, and this one is simple to get up and running. There are times, however, when simplex doesn't work well, and there can (occasionally) be problems with getting it to run right. It also does not return an estimate of the standard deviations of the parameters.

Some of the problems that can be encountered using simplex were mentioned above. Because it does not converge as quickly as some other methods, it isn't effective if the function evaluations are very time consuming. The slow convergence can sometimes be a problem, but that may be offset by simplex being the only simple way to find a minimum at all. When there are a lot of parameters (that is, a high-dimension minimization problem), the simplex typically gets quite slow, simply because there are a lot of directions to explore.

A deeper problem in the use of the simplex method is getting it to stop. Basically, you want it to stop when it's not finding solutions that are any better than the best one known, but it's not always easy to see that this has happened. For one thing, you may not be interested in the absolute best solution. Any answer that is within 1.0E-7 or so of the best solution is good enough for many purposes, especially when using real data with measurement errors in it. In this implementation, I have included two termination methods. The first (and most powerful) is a limit on the number of iterations (5000 in the template); it stops when you tell it to stop. Second, the iterations stop when the magnitude of the vector of function evaluations changes by less than some preset amount (in the template, the default is 1.0E-7). Sometimes the simplex will cycle back and forth between two positions, and then it will simply go on forever (or at least until the iteration limit).
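As an illustration of what such a test can look like (my own sketch, not the code of Listing 1; the function name, the default values, and the use of a Euclidean norm on the change are assumptions consistent with the description above):

#include <cmath>
#include <cstddef>
#include <valarray>

// Decide whether to stop: either the iteration limit has been reached, or the
// vector of function values at the simplex vertices has changed by less than
// the tolerance since the previous iteration.
bool Finished(const std::valarray<double> &previousVals,
              const std::valarray<double> &currentVals,
              std::size_t iteration,
              std::size_t maxIterations = 5000,
              double tolerance = 1.0e-7)
{
    if (iteration >= maxIterations) return true;              // hard stop
    const std::valarray<double> diff = currentVals - previousVals;
    return std::sqrt((diff * diff).sum()) < tolerance;        // small change in magnitude
}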

Another peculiarity of the simplex method is the need to figure out a starting simplex. The default method in the template is simply to increase each of the input parameters' values by 1%, one at a time. For instance, if the starting point were (1,1,1), then the simplex would be (1,1,1), (1.01,1,1), (1,1.01,1), and (1,1,1.01). (For a zero input parameter, 0.1 is used.) Lots of other schemes could be used: the signs could be changed, or all the values up to the current one could be changed. Different starting simplexes can lead to different rates of convergence. If there are multiple minima, then different starting simplexes can even lead to different final results.
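A sketch of that default construction (again my own illustration rather than Listing 1; the 1% and 0.1 offsets come from the description above, the names are mine):

#include <cstddef>
#include <valarray>
#include <vector>

// Build the default starting simplex: the starting point itself, plus one
// extra vertex per parameter with that parameter increased by 1%
// (or set to 0.1 if the parameter starts at zero).
std::vector<std::valarray<double> >
StartingSimplex(const std::valarray<double> &start)
{
    std::vector<std::valarray<double> > simplex;
    simplex.push_back(start);
    for (std::size_t i = 0; i < start.size(); ++i)
    {
        std::valarray<double> vertex = start;
        vertex[i] = (vertex[i] == 0.0) ? 0.1 : 1.01 * vertex[i];
        simplex.push_back(vertex);
    }
    return simplex;                   // start.size() + 1 vertices in all
}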

In the interest of simplicity, certain checks have been left out of the template. For instance, there are no checks ensuring that the size of the valarrays is not changed by the function object; in fact, there is no sure way to do this from within the template. Likewise, there are no checks ensuring that the data is not being changed as the simplex moves along. If the copy constructor for the function object uses deep copies, then the data will simply reside in the simplex instantiation, and there will be no problem with changing data. I have chosen not to add simplifying niceties such as a typedef for valarray<double> so that the structure of the data flow remains clear.

References

[1] J.A. Nelder and R. Mead, "A Simplex Method for Function Minimization," Computer Journal, 1965, Vol. 7, p. 308.

[2] Qiang Liu, "Implementing Reusable Mathematical Procedures Using C++," C/C++ Users Journal, June 2001.


Larry Andrews has a Ph.D. in chemistry and is a software developer and software QA manager recently unemployed in the current downturn. He can be reached at [email protected].


