Difference between revisions of "Fraqtive/Generator Core"

From MiMec
Jump to: navigation, search
(Generalization)
(Calculations)
Line 66: Line 66:
 
=== Calculations ===
 
=== Calculations ===
  
The <code>calculateResult()</code> function implements the continuous coloring algorithm and calculates the square of the value based on number of calculated iteration and the final radius. Note that the term 'radius' actually means square of the radius (distance between current point and <code>(0, 0)</code>, i.e. <code>Re(Z)<sup>2</sup> + Im(Z)<sup>2</sup>).
+
The <code>calculateResult()</code> function implements the continuous coloring algorithm and calculates the square of the value based on number of calculated iteration and the final radius. Note that the term 'radius' actually means square of the radius, i.e. the distance between current point and <code>(0,0)</code>.
  
 
The <code>adjust()</code> function template is parametrized using the <code>Variant</code> enumeration. It is specialized for each variant and performs the negation or absolute value calculation in each iteration. The <code>adjustSSE2()</code> function does the same on SSE2 registers, which store two double values. The negation and <code>fabs()</code> operations are performed using bitwise XOR and AND instructions and appropriate bit masks, effectively clearing or inverting the sign bit.
 
The <code>adjust()</code> function template is parametrized using the <code>Variant</code> enumeration. It is specialized for each variant and performs the negation or absolute value calculation in each iteration. The <code>adjustSSE2()</code> function does the same on SSE2 registers, which store two double values. The negation and <code>fabs()</code> operations are performed using bitwise XOR and AND instructions and appropriate bit masks, effectively clearing or inverting the sign bit.

Revision as of 17:20, 28 May 2008

The generator core, as the name implies, is the most important part of Fraqtive, doing all the actual calculations. Though only 30 kilobytes long, it took a major part of development time and was refactored and extended several times. It is quite complex and advanced code, heavily using templates and generative programming. The purpose of this article is to explain the generation algorithm and various techniques used in the code.

Algorithm

Generation of Mandelbrot set is a very simple algorithm which can be written in about 15 lines of code (see this example). The two reasons why the actual implementation is much more complicated are generalization and optimization.

Generalization

Generalization allows to produce an entire family of fractals using slightly modified variants of the classic Mandelbrot set equation. The general recursive fractal equation used in Fraqtive is:

Zn+1 = f(Zn)n + C

For Mandelbrot sets, C is the same as Z0 (that is, the coordinates of the point). For Julia sets, C is a constant parameter.

Function f determines the variant of the fractal. Normally it's simply identity function (f(x) = x), but there are three other variants used in Fraqtive:

  • f(x) = Re(x) - i*Im(x)
  • f(x) = |Re(x)| + i*|Im(x)|
  • f(x) = Re(x) + i*|Im(x)|

Generally, these are combinations of negations and absolute values which mirror the point about the X and/or Y axis.

Value n is the exponent of the equation which change the number of symmetry axes of the fractal. It can be either integer or fractional.

The recursive equation is calculated repeatedly for each point until value Z exceeds the bailout radius (which is 8 for better accuracy) or until reaching the maximum number of iterations. The continuous coloring algorithm is used to calculate the value of the points in order to improve the quality of the generated image. Finally the square root of the value is calculated for better color mapping in highly magnified areas.

Optimization

Optimization implemented in Fraqtive is based on the observation that no matter how much the fractal is magnified, there are always some areas which contain many details and some which contain few details. The algorithm uses simple heuristics to detect areas which can be skipped. In those parts simple linear interpolation is used instead of calculating the recursive equation for every point. This allows to improve the generation speed several times.

The fractal surface is divided horizontally into rectangular regions. This allows to progressively update the image while calculating and to perform parallel calculations on multi-core processors. Each region consists of a number of whole cells of 3 x 3 points plus one row of points on the right and bottom edge:

x . . x . . x . . x
. . . . . . . . . .
. . . . . . . . . .
x . . x . . x . . x
. . . . . . . . . .
. . . . . . . . . .
x . . x . . x . . x

Calculation of each region is performed in three steps:

  • pre-calculation: every third point in every third row is calculated (shown as x)
  • interpolation: the calculated points are linearly interpolated to the remaining points
  • calculation: remaining points are calculated for areas containing significant details

The heuristics for determining which points are calculated works by comparing the absolute difference between values of pre-calculated points with a threshold value.

1 a a 2
b c c d
b c c d
3 e e 4

For points a, the values of points 1 and 2 are compared. For points c, the values of all four pre-calculated points are compared.

Code Design

The generator requires a modern C++ compiler such as GCC 3.4 or Visual C++ 2005. It uses intrinsics for SSE2 support and some inline assembly for detecting the availability of SSE2. It depends on Qt to simplify detecting the architecture (compiler, operating system and byte order), but this dependency could easily be removed. It can be compiled with or without SSE2 support depending if the HAVE_SSE2 macro is defined.

The interface of the generator is very simple. There is one abstract class called the Functor, which can be used to calculate the value of a point at given coordinates. There are a few functions for creating functor objects of various types. There are also three functions which implement the three steps of the generation algorithm described above and helper structures for passing input parameters (the coordinates of the fractal region) and output parameters (memory buffer and resolution).

For maximum performance, to avoid conditional branches and improve code optimization, separate procedures for each variant and each value of exponent are used. SSE2 instructions are used if available, allowing to simultaneously calculate two points. Taking into account different variants and exponents, there are 40 different procedures with SSE2 and 48 procedures without SSE2 (fractional exponent calculations don't use SSE2). Advanced metaprogramming techniques are used to automatically generate these procedures in such way that they are equally fast as if they were written in pure assembly language.

Code Documentation

Calculations

The calculateResult() function implements the continuous coloring algorithm and calculates the square of the value based on number of calculated iteration and the final radius. Note that the term 'radius' actually means square of the radius, i.e. the distance between current point and (0,0).

The adjust() function template is parametrized using the Variant enumeration. It is specialized for each variant and performs the negation or absolute value calculation in each iteration. The adjustSSE2() function does the same on SSE2 registers, which store two double values. The negation and fabs() operations are performed using bitwise XOR and AND instructions and appropriate bit masks, effectively clearing or inverting the sign bit.

The calculate() function contains the main loop for calculating fractal with arbitrary fractional exponent. It is parametrized with the variant and calls the appropriate adjust() function is each iteration.

The calculatePower() function calculates an integral power of given complex number. The power is given as template parameter. The power operation is defined recursively as:

Zn = Zn/2 * Zn/2 (when n is even)
Zn = Z * Zn-1 (when n is odd)

The compiler generates fast code by inlining the recursive invocations of this function. Recursion is ended with specialization for n = 2. In addition, the calculatePower() function also calculates the (squared) radius to avoid calculating it separately.

The calculateFast() function contains the main loop for calculating fractal with integral exponent. It is parametrized with both the variant and the exponent value and calls appropriate adjust() and calculatePower() functions to create all possible combinations of variants and exponents.

There are also calculatePowerSSE2() and calculateFastSSE2() functions which works the same way as the regular ones, except they calculate two points simultaneously by using the SSE2 registers and instructions. The calculateStepSSE2() function performs a single step of the loop. After each step the radius of both points is compared with the bailout value. When a point exceeds the bailout radius, the result is stored, and when both points are already calculated, iteration is stopped.

The RepeatStepsSSE2 template clones the calculateStepSSE2() function the given number of times, depending on the STEPS parameter, by invoking itself recursively with number of steps decreased by one. This way multiple steps can be executed one after another in a single loop iteration, which speeds up the calculations. The AutoStepsSS2 is a helper class used to determine the number of steps depending on the exponent. The optimal number was found by experimenting - it's two steps for exponents from 2 to 5 and one step for higher exponents. Larger number of steps makes the code too big to fit in the instruction cache so there is no gain of performance.

Note that because templates are used, the compiler actually inlines all the code in the calculate() and calculateSSE2() functions. The templates and helper functions only describe how this code is generated for each combination of the variant and the exponent. Doing the same in pure assembly language would be almost impossible. This is exactly what metaprogramming or generative programming is all about.

What's more, the number of variants and the maximum allowed exponent can be easily increased without significant changes in the code. Otherwise the number of required functions would grow very fast or they would have to contain conditional code which would be significantly slower.

Functor Factories

Problem: create instance of appropriate class template based on run-time parameters. Connection between virtual method overriding (run-time polymorphism) and generic programming (compile-time polymorhism).