## The Mandelbrot SetThe Mandelbrot set is the "King" of all fractals. It describes all Julia Sets, has endless variety, and yet has an extremely simple definition. To construct it, one starts a complex number (a point on the complex plane). Next, that number is squared. Finally one adds the original number again. These steps are repeated over and over again. "Most" points quickly diverge to infinity: 2, 2*2+2 = 6, 6*6+2 = 38, etc. However, some points do not. 0, 0*0 + 0 = 0, 0*0 + 0 = 0. These points that do not are within the set. The either-or nature of the set isn't that interesting to draw. However, one can liven things up with a little colour. To do this, count how long it takes for the starting point to diverge beyond some boundary. By colouring by this "escape time" we can see extra detail. By choosing different boundaries, one gets different patterns of colours. As a general rule people tend to use the smallest circle about the origin that contains the set: If the magnitude of the iterate, z = x+iy, is greater than 2: x ## Displaying ImagesThe first thing we need in order to do this is have some way of displaying the results. Fortunately, it doesn't take much code to create a raw X11 window and then use a pixmap to show an image.
Next we need some sort of colour map for the escape-time colouring. To save space, we will just initialize one randomly, with a little bit of an intensity gradient built-in:
Now for the code that fills in the bitmap:
Finally all that is needed is the
The above code has an extremely simple inner loop... and as a result is quite slow, taking 24.9 seconds to run. It could be made faster by decreasing the maximum iteration count degrading the image somewhat, but as we'll see, that isn't the bottleneck. On the other hand, the results appear nicely:
## OptimizationThe first thing to notice is that we are using double precision. Since the "zoom level" is quite low, this excess precision isn't needed, and we can make do with just single precision floats. Since operations on floats tend to be faster than doubles, we can obtain an immediate speedup:
Unfortunately, things are still too slow, taking 20.6 to complete the drawing of the image. The next obvious thing to do is to use vectorization via SSE, where we can work on four floats at a time. The resulting speedup should be impressive. One way to do this is to use SSE intrinsics, however we will go directly to assembly language to try to extract as much speed as possible. First we need an SSE-enabled display function. Fortunately, gcc provides vector intrinsics that allow us to pass the data in SSE form to the inner loop:
The reason we will go directly to asm is that some inner loop functions have already been written. Converting these to the AT&T syntax used by gcc isn't difficult
These take on average 7.4 seconds for the "softlab" algorithm, 6.7 seconds for the first codeproject algorithm, 7.0 seconds for the second codeproject algorithm, and 6.7 seconds for the "Vodnaya" algorithm on this machine. Note how the "optimal" code depends quite strongly on machine type, but the timings don't vary all that much. The above code isn't quite the best on this machine. We can speed things up a little more by noticing that that output from the
The above is about as fast as we can go with the algorithm we are using. However, by changing that we can do much better. Note how most of the time spent is in the calculation of points within the Mandelbrot Set itself. Those points need to iterate ## Periodicity CheckingMany points within the Mandelbrot Set eventually reach periodic orbits. i.e. they converge to a sequence of values that repeats as the square-and-add operation is done. If we can detect this repetition, then we can bail-out early, and perhaps avoid quite a bit of work. To do this, we record a value to test against. We then test the next
The above code uses double precision (rather than single precision) and doesn't use any SSE tricks or assembly optimization. However, it completes the benchmark in 2.2 seconds! This is quite a bit faster than the "fastest" code. Of course, we can apply the same techniques to try and speed this up. Rewriting the inner loop to use assembly:
This gives us a little bit more speed, completing in 1.3 seconds. ## ThreadsThe next thing to try is a little bit of parallelization. This machine is a quad-core, and so far we have only been using a single one of those cores. By using multiple threads we can extract some more speed. Using the pthreads library with the same inner loop:
The above completes in 0.3 seconds, which corresponds to the 4-times speedup expected. It works by using scan-line interleaving. Each thread does roughly the same amount of work this way. Unfortunately, updating the screen becomes a bottleneck. This is due to the X11 calls not being thread-safe. Therefore we need to wrap them with locks to prevent data corruption. This locking slows things down. Fortunately, the result is so fast we don't need to worry too much about line-by-line updates. By only printing the image when done, we save a little more time. The is one more trick available. We can save some more time by exploiting some of the mathematics of the Mandelbrot set. A large part of it is described by some simple mathematical functions. By evaluating these, we can work out if points lie within those regions. If so, then we don't need to evaluate the inner loop at all. This could obviously save quite a bit of time. Wikipedia has a formula for the largest circle in the Set, given z = x+iy, points within it satisfy (x+1)
So how much faster is it? Unfortunately, the time taken is still 0.3 seconds. It appears that the program can't get that much faster due to the time taken by X11 to create the window. This extra optimization also seems to be mostly duplicating what the periodicity-checking code does (albeit in a much more efficient manner). However, the result is still much (20+ times) faster than the "optimal" asm code discussed on code project. (Another possibility is to use the reflection symmetry along the real axis to try and gain another factor of two, but again this doesn't help much in reality.) ## OnwardsAs can be seen, from the naive initial code to the final version there are almost a couple of orders of magnitude in difference in speed. So is the above the best code? Nope. There are still other algorithms that can be used to speed things up even more. However, they all add inaccuracy in some way or another. One way to speed things up is to not calculate every point. Since the image consists of blocks of colours, if we can detect regions of constant colour we can fill the entire block at once. Of course, it is possible to get this checking wrong, and thus draw something incorrect. Different algorithms have differing sensitivities to this. One simple one, known as "solid guessing" looks at the four corner points of a square. If they are all the same colour, it assumes that the entire square is. If they differ, it subdivides and tries again. Another algorithm traces the boundaries of coloured regions. Since the Mandelbrot Set is simply-connected this works fairly well. However, since the boundary-tracing algorithm only tests a finite number of points it doesn't prevent fine features, smaller than a pixel in width from confusing it. For deeper zooms, one can gain improved speed by noticing that the points on the screen all start out very close to each other in the complex plane. Instead of doing iterations for every point, one can discover how that rectangle (quadrilateral) is moved and distorted by the square-and-add operation. If the size is small enough, then the relative distortion is also small, and can be approximated in a simple functional form. Eventually, the distortion rises to be large enough that the quadrilateral needs to be subdivided. However, in the process we can save a large amount of work. Even deeper, one needs to worry about arbitrary precision arithmetic. As such, there is no obviously fastest algorithm. The best code needs to seamlessly change from single precision, to double, and then arbitrary precision as required. |

About Us | Returns Policy | Privacy Policy | Send us Feedback |

Company Info |
Product Index |
Category Index |
Help |
Terms of Use
Copyright © Lockless Inc All Rights Reserved. |

## Comments

firr said...for(n = 0 ; n <= max_iter; n++)

{

reim2 = (re + re) * im;

re = (re - im) * (re + im) + cRe;

im = reim2 + cIm;

if( re > 2.0 || re<-2.0 && im> 2.0 || im<-2.0 ) break;

}

had anyon tried how it works when rewritten to sse ?

(terrible captcha!)