Particle Rendering Hardware Considerations

CPU Cores vs. Memory

In Particle rendering mode, the particle data is loaded into memory, illuminated by every light and then sorted in camera space and splatted back to front using the current Filter Mode.
The illumination sorting and drawing, as well as the final pass sorting and drawing, are all multi-threaded and scale well up to about 8 cores.
- Adding more threads does not increase performance in a linear manner due to the overhead of managing multiple buffers, one for each thread.
- Rendering will be faster with 16 cores when compared to 8, but not twice as fast.
- In our tests, rendering on 40 cores was actually slower than rendering on 32 cores due to the management overhead.
- The latest versions of KRAKATOA will also measure the memory requirements of the particles and image buffers, and potentially reduce the actual number of threads used to ensure successful rendering.
Thus, when making hardware purchase decisions, going for more memory is recommended over going for 16, 24 or 32 cores.
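
The trade-off between thread count and memory can be sketched in a few lines. This is only an illustration, assuming each thread needs one buffer of a fixed size and that the particle data must fit in memory alongside those buffers. The function name `usable_threads` and all sizes are hypothetical and do not reflect KRAKATOA's actual heuristic.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def usable_threads(total_memory, particle_memory, buffer_per_thread, requested_threads):
    """Return how many per-thread buffers fit next to the particle data.

    Hypothetical model: every thread owns one buffer of `buffer_per_thread`
    bytes, and at least one thread is always kept.
    """
    remaining = total_memory - particle_memory
    fitting = remaining // buffer_per_thread
    return max(1, min(requested_threads, fitting))


# Assumed example: 64 GiB of RAM, 40 GiB of particles, 1 GiB per thread buffer.
threads = usable_threads(64 * GIB, 40 * GIB, 1 * GIB, 32)
print(threads)  # 24
```

Under these assumed numbers, a 32-core machine would use only 24 threads, while adding memory would let all 32 run.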

Some motherboards support memory modules in dual-channel, tri-channel, or quad-channel setup.
Hardware test sites claim that tri-channel memory does not provide a measurable benefit, but our own tests have shown that faster memory access can speed up KRAKATOA rendering.
KRAKATOA is very memory intensive, so faster memory modules and multi-channel configurations usually offer a measurable benefit.
When choosing memory, you should go with the fastest possible configuration.

Most modern CPUs have multiple physical cores, plus the same number of logical cores thanks to Hyper-Threading (HT).
KRAKATOA has been tested in various configurations of physical and logical cores, and the following results emerged:
- Hyper-Threading offers a significant benefit when rendering in KRAKATOA - rendering on 2 CPUs with 4 cores each and HT enabled was only 23% slower than rendering on 4 CPUs with 4 physical cores each (16 threads in both cases).
- Without Hyper-Threading, rendering with Affinity to physical cores on the same die would be faster than rendering on the same number of physical cores spread across multiple dies (possibly due to on-die cache performance benefits).

When rendering many millions of particles stored on disk, loading the data into memory often accounts for about half of the total rendering time.
The loading speed depends on two factors - storage bandwidth and CPU performance.
A PRT file contains a single compressed data stream which needs to be uncompressed at loading time, so the typical bottleneck is CPU performance.
- KRAKATOA MX is currently the only implementation which supports the optional parallel loading of multiple PRT sequences (Partitions) within the same PRT Loader.
- When the “Fast Particle Loading” option is enabled and the machine has multiple cores, the storage medium performance can become the bottleneck.
- For fast local iterations, a Solid State Drive (SSD) or a dedicated storage solution like a FusionIO card can increase PRT loading performance.
- SCSI drives tend to outperform HDD, with network storage usually being the slowest.
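
The idea behind loading several partitions in parallel can be sketched as follows. This is a minimal illustration, not KRAKATOA's loader: `zlib` stands in for the compressed stream, the in-memory byte strings stand in for PRT partition files, and the helper names `load_partition` and `load_partitions_parallel` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def load_partition(compressed: bytes) -> bytes:
    """Decompress one partition; a single stream is decoded by one thread."""
    return zlib.decompress(compressed)


def load_partitions_parallel(partitions, max_workers=4):
    """Decompress several independent partitions concurrently, keeping order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_partition, partitions))


# Stand-in data: four "partitions" of one million bytes each.
raw = [bytes([i]) * 1_000_000 for i in range(4)]
compressed = [zlib.compress(block) for block in raw]

loaded = load_partitions_parallel(compressed)
print(loaded == raw)  # True
```

Because each stream is decoded sequentially, spreading separate partitions over several cores shifts more of the load onto the storage medium, which is why the drive can then become the limiting factor.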
When network-rendering using KRAKATOA on multiple network render nodes, it is important to remember that multiple KRAKATOA instances will hit the storage at about the same time, and can potentially saturate the network.
In some cases, rendering on fewer render nodes can actually increase rendering performance.
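
A rough way to reason about this saturation is to divide the shared storage bandwidth among the nodes reading at the same time. The model below is a simplified sketch with assumed numbers (10 GB of particle data per frame, 1 GB/s of shared storage bandwidth, 250 MB/s per node). It ignores contention overhead, and the function name `frame_load_seconds` is hypothetical.

```python
def frame_load_seconds(frame_bytes, node_count, storage_bytes_per_s, node_bytes_per_s):
    """Estimate the time one node needs to load a frame while all nodes read at once.

    Each node gets an equal share of the storage bandwidth, capped by what
    a single node can consume on its own.
    """
    share = storage_bytes_per_s / node_count
    effective = min(node_bytes_per_s, share)
    return frame_bytes / effective


# Assumed figures, for illustration only.
for nodes in (2, 4, 8, 16):
    seconds = frame_load_seconds(10e9, nodes, 1e9, 250e6)
    print(nodes, "nodes:", seconds, "s per frame,", nodes / seconds, "frames/s loaded")
```

With these assumed figures, the shared link is saturated at four nodes. Beyond that, each node waits longer per frame while the combined loading throughput stays flat, and any real contention overhead would push it lower.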