Fraqtive/Generator Core

From MiMec
Jump to: navigation, search

The generator core, as the name implies, is the most important part of Fraqtive, doing all the actual calculations. Though only 30 kilobytes long, it took a major part of development time and was refactored and extended several times. It is quite complex and advanced code, heavily using templates and generative programming. The purpose of this article is to explain the generation algorithm and various techniques used in the code.

Algorithm

Generation of Mandelbrot set is a very simple algorithm which can be written in about 15 lines of code (see this example). The two reasons why the actual implementation is much more complicated are generalization and optimization.

Generalization

Generalization allows to produce an entire family of fractals using slightly modified variants of the classic Mandelbrot set equation. The general recursive fractal equation used in Fraqtive is:

Zn+1 = f(Zn)n + C

For Mandelbrot sets, C is the same as Z0 (that is, the coordinates of the point). For Julia sets, C is a constant parameter.

Function f determines the variant of the fractal. Normally it's simply identity function (f(x) = x), but there are three other variants used in Fraqtive:

  • f(x) = Re(x) - i*Im(x)
  • f(x) = |Re(x)| + i*|Im(x)|
  • f(x) = Re(x) + i*|Im(x)|

Generally, these are combinations of negations and absolute values which mirror the point about the X and/or Y axis.

Value n is the exponent of the equation which change the number of symmetry axes of the fractal. It can be either integer or fractional.

The recursive equation is calculated repeatedly for each point until value Z exceeds the bailout radius (which is 8 for better accuracy) or until reaching the maximum number of iterations. The continuous coloring algorithm is used to calculate the value of the points in order to improve the quality of the generated image. Finally the square root of the value is calculated for better color mapping in highly magnified areas.

Optimization

Optimization implemented in Fraqtive is based on the observation that no matter how much the fractal is magnified, there are always some areas which contain many details and some which contain few details. The algorithm uses simple heuristics to detect areas in which some (or most) calculations can be skipped. In those parts simple linear interpolation is used instead of calculating the recursive equation for every point. This allows to improve the generation speed several times.

The fractal surface is divided horizontally into rectangular regions. This allows to progressively update the image while calculating and to perform parallel calculations on multi-core processors. Each region consists of a number of whole cells of 3 x 3 points plus one row of points on the right and bottom edge:

x . . x . . x . . x
. . . . . . . . . .
. . . . . . . . . .
x . . x . . x . . x
. . . . . . . . . .
. . . . . . . . . .
x . . x . . x . . x

Calculation of each region is performed in three steps:

  • pre-calculation: every third point in every third row is calculated (shown as x)
  • interpolation: the calculated points are linearly interpolated to the remaining points
  • calculation: remaining points are calculated for areas containing significant details

The heuristics for determining which points are calculated works by comparing the absolute difference between values of pre-calculated points with a threshold value.

1 a a 2
b c c d
b c c d
3 e e 4

For example, for points a, the values of points 1 and 2 are compared. For points c, the values of all four pre-calculated points are compared.

Code Design

The generator requires a modern C++ compiler such as GCC 3.4 or Visual C++ 2005. It uses intrinsics for SSE2 support and some inline assembly for detecting the availability of SSE2. It depends on Qt to simplify detecting the architecture (compiler, operating system and byte order), but this dependency could easily be removed. It can be compiled with or without SSE2 support depending if the HAVE_SSE2 macro is defined.

The interface of the generator is very simple. There is one abstract class called the Functor, which can be used to calculate the value of a point at given coordinates. There are a few functions for creating functor objects of various types. There are also three functions which implement the three steps of the generation algorithm described above and helper structures for passing input parameters (the coordinates of the fractal region) and output parameters (memory buffer and resolution).

For maximum performance, to avoid conditional branches and improve code optimization, separate procedures for each variant and each value of exponent are used. SSE2 instructions are used if available, allowing to simultaneously calculate two points. Taking into account different variants and exponents, there are 40 different procedures with SSE2 and 48 procedures without SSE2 (fractional exponent calculations don't use SSE2). Advanced metaprogramming techniques are used to automatically generate these procedures in such way that they are equally fast as if they were written in pure assembly language.

Implementation

This section describes the most important parts on the code, focusing on mataprogramming techniques that were used. You can view the code online:

Calculations

The calculateResult() function implements the continuous coloring algorithm and calculates the square of the value based on number of calculated iteration and the final radius. Note that the term 'radius' actually means square of the radius, i.e. the distance between current point and (0,0).

The adjust() function template is parametrized using the Variant enumeration. It is specialized for each variant and performs the negation or absolute value calculation in each iteration. The adjustSSE2() function does the same on SSE2 registers, which store two double values. The negation and fabs() operations are performed using bitwise XOR and AND instructions and appropriate bit masks, effectively clearing or inverting the sign bit.

The calculate() function contains the main loop for calculating fractal with arbitrary fractional exponent. It is parametrized with the variant and calls the appropriate adjust() function is each iteration.

The calculatePower() function calculates an integral power of given complex number. The power is given as template parameter. The power operation is defined recursively as:

Zn = Zn/2 * Zn/2 (when n is even)
Zn = Z * Zn-1 (when n is odd)

The compiler generates fast code by inlining the recursive invocations of this function. Recursion is ended with specialization for n = 2. Therefore, optimal code for calculating an arbitrary exponent can be generated with a simple recursive definition. In addition, the calculatePower() function also calculates the (squared) radius to avoid calculating it separately.

The calculateFast() function contains the main loop for calculating fractal with integral exponent. It is parametrized with both the variant and the exponent value and calls appropriate adjust() and calculatePower() functions to create all possible combinations of variants and exponents.

There are also calculatePowerSSE2() and calculateFastSSE2() functions which works the same way as the regular ones, except they calculate two points simultaneously by using the SSE2 registers and instructions. The calculateStepSSE2() function performs a single step of the loop. After each step the radius of both points is compared with the bailout value. When a point exceeds the bailout radius, the result is stored, and when both points are already calculated, iteration is stopped.

The RepeatStepsSSE2 template clones the calculateStepSSE2() function the given number of times, depending on the STEPS parameter, by invoking itself recursively with number of steps decreased by one. This way multiple steps can be executed one after another in a single loop iteration, which speeds up the calculations. The AutoStepsSS2 is a helper class used to determine the number of steps depending on the exponent. The optimal number was found by experimenting - it's two steps for exponents from 2 to 5 and one step for higher exponents. Larger number of steps makes the code too big to fit in the instruction cache so there is no gain of performance.

Note that because templates are used, the compiler actually inlines all the code in the calculateFast() and calculateSSE2() functions. The templates and helper functions only describe how this code is generated for each combination of the variant and the exponent. Doing the same in pure assembly language would be almost impossible. This is exactly what metaprogramming or generative programming is all about.

What's more, the number of variants and the maximum allowed exponent can be easily increased without significant changes in the code. Otherwise the number of required functions would grow very fast or they would have to contain conditional code which would be significantly slower.

Functor Factories

Algorithms do not use static polymorphism (i.e. templates) but rather dynamic polymorphism (i.e. virtual methods). The reason is that the cost of the virtual function is very small (it's just called just once for every point) and the generated code doesn't become too huge. It also makes the code a bit easier to use, because only the abstract base class is exposed as the public interface and the templates are just implementation details. Combining inheritance of interface with implementation based on templates is a very powerful programming technique.

The implementation must provide factory functions for creating appropriate objects based on run-time parameters, such as the variant and the exponent. This leads to an interesting problem of creating instances of template classes whose parameters are static (defined at compile-time) based on dynamic parameters (run-time variables).

Selecting class template at run-time based on an enumeration requires a switch statement with separate case for each enumeration value. This switch statement is enclosed in the create() method of the VariantDispatcher class. Note that static methods are used instead of simple functions because class templates are much more flexible than function templates.

Selecting template based on the exponent, which is an integer from 2 to MaxExponent, is done by the create() method of the ExponentDispatcher class. It compares the run-time parameter exponent with its template parameter EXPONENT and if it doesn't match, it recursively calls another instance of itself with EXPONENT decreased by one. Recursion is finished with a specialization which returns NULL. Using this technique, the maximum allowed exponent can be changed by simply changing the value of one constant and the compiler will automatically generate code for higher exponents. Note that a similar approach could be used for enumerations, but using a switch statement seems more elegant.

There are two kinds of functors: those which depend only on the variant (because the exponent is passed at run-time as a fractional number) and those which depend on both the variant and the exponent (that includes the "Fast" functors and the SSE2 functors). To avoid creating separate dispatchers for different functor, they are written in a generic way by using template template parameters. The dispatcher's create() method calls the create() methods of the appropriate specialization of the passed template class, based on the run-time value of the parameter (either variant or exponent). This is demonstrated using the following pseudo-code:

template<typename BASE, template<Variant VARIANT> class FACTORY>
class VariantDispatcher
{
    template<typename PARAMS>
    static BASE* create( Variant variant, const PARAMS& params )
    {
        return FACTORY<variant>::create( params );
    }
};
template<typename BASE, template<int N, Variant VARIANT> class FACTORY>
class ExponentDispatcher
{
    template<typename PARAMS>
    static BASE* create( int exponent, Variant variant, const PARAMS& params )
    {
        return FACTORY<exponent, variant>::create( params );
    }
}

Of course this is a simplification, as template parameters must be compile-time constants, but it demonstrates what these two classes effectively do.

The PARAMS structure contains parameters passed to and stored by the functors. For Julia functors, the parameters contain the coordinates of the C point in the fractal equation. For functors with fractional exponent, the parameters contain the value of the exponent. Parameters are encapsulated in structures simply to avoid creating overloaded versions of the create() with different number of parameters. In future versions of C++ this could be done using variadic template parameters.

The functors inherit both the Functor abstract class for their public interface and the appropriate parameter class for implementation (which basically stores the parameters as members). They simply call the calculate() or calculateSSE2() functions described above, so they are a sort of adapters between those internal functions and the public interface.

Creating functors which use fractional exponents using the VariantDispatcher described above is quite simple. The FunctorFactory class template takes the functor class as its template template parameter. The inner class called InnerFactory is passed as a template parameter to the variant dispatcher. It is parametrized with the variant and creates an instance of the appropriate functor. Having the functor factory, the implementation of the publicly visible functions createMandelbrotFunctor() and createJuliaFunctor() is trivial.

Creating functors which use static, integral exponents requires two steps of dispatching: first using the exponent, then using the variant. The ExponentDispatcher passes its inner class called FactoryAdapter as a parameter to the variant dispatcher. The adapter, in turn, calls the factory method from the class passed to the exponent dispatcher. The fast functor factory uses the exponent dispatcher in the same way as the fractional functor factory uses variant dispatcher. So the actual sequence of calls is:

FastFunctorFactory<BASE, FUNCTOR>::create( exponent, variant, params )
 ExponentDispatcher<BASE, FastFunctorFactory<BASE, FUNCTOR>::InnerFactory>::create( exponent, variant, params )
  if ( ... )
   VariantDispatcher<BASE, ExponentDispatcher<..., exponent>::FactoryAdapter>::create( variant, params )
    switch ( ... )
     ExponentDispatcher<..., exponent>::FactoryAdapter<variant>::create( params )
      FastFunctorFactory<BASE, FUNCTOR>::InnerFactory<exponent, variant>::create( params )
       return new FUNCTOR<exponent, variant>( params )

It is quite complex and at first may be difficult to understand. Obviously it could be done (and it was, initially) in a much simpler way by creating separate factories for each kind of functor without using dispatcher classes, inner classes and template template parameters. However when I looked at the code I had a feeling that it's possible to write it in a much more generic way so that there is only a single switch statement for the variant and only one if statement for the exponent. Once I got used to it, I found that it's a very elegant and - apart from the convoluted C++ syntax - simple solution.