This is going to be a dump of various things I've worked on recently.
I stumbled onto AMD's cluster culling which they added to GeometryFX.
Basically for a given mesh cluster, you can often perform a variant of backface culling on the entire cluster.
You do this by calculating a cone that represents the region in which the cluster is not visible.
Any viewer located within the cone is unable to see the cluster, so it can be culled.
AMD's implementation works like this:
- Find the average normal of the cluster
- Take the dot product of each normal against the average normal, and find the minimum.
- Use this as the cone angle; any cluster whose minimum dot product is greater than 0 can be culled in some situations.
They also do some other work involving the bounding box, to prevent some error cases they had to deal with.
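The steps above can be sketched as follows. This is my own illustration, not AMD's code: `build_cull_cone` and `can_backface_cull` are hypothetical names, and the cull test here is purely directional (it ignores cluster position, which only holds for distant or orthographic views):

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_cull_cone(normals):
    # Average the cluster's triangle normals to get the cone axis,
    # then find the worst-case (minimum) dot product against it.
    axis = normalize(tuple(sum(c) for c in zip(*normals)))
    min_dot = min(dot(axis, n) for n in normals)
    return axis, min_dot

def can_backface_cull(axis, min_dot, view_dir):
    # view_dir points from the viewer toward the cluster (unit length).
    # Every normal lies within angle a of the axis, where cos(a) = min_dot.
    # All triangles face away from the viewer when view_dir is within
    # (90 - a) degrees of the axis, i.e. dot(view_dir, axis) > sin(a).
    if min_dot <= 0:  # cone wider than a hemisphere: never cull
        return False
    return dot(view_dir, axis) > math.sqrt(1.0 - min_dot * min_dot)
```

A cluster viewed straight down its cone axis passes the test; flip the view direction and it fails, as expected.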
This is a smallest-circle problem; the AMD solution using the average axis is rarely going to produce the tightest circle.
For my code I run multiple algorithms: the average axis and the min/max axis, and then one round of Ritter's method over the data using whichever axis was best. The average axis is generally pretty bad, so even just using the min/max axis is a good improvement.
If you want an exact algorithm, you could try this method, although it will be slower to calculate.
The cull rate varies heavily depending on the scene. It is also much more effective at higher detail levels (smaller cluster sizes). Sometimes it is only 1%, but I have seen it go up to around 15%.
My engine does not generate clusters if they are outside the frustum or occluded, which reduces opportunities for culling.
In a standard game engine with offline generated content the cull rate would likely be higher.
At some point in the last few weeks I added support for map file loading and saving. So now I have a map with a few thousand objects placed around in it, and I can stress test how the system handles it.
Once that was working, I had to speed up file loading, because as the map grew, it required many textures. My texture loader now converts all textures into a cached, LZ4-compressed binary format for subsequent loads. It is pretty fast now, loading hundreds of MB of textures in about 1 second at startup. The SSD helps. At some point in the future I might use mmap for the files that will only be accessed on the CPU.
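The convert-once cache can be sketched like this. Everything here is a hypothetical illustration, not the engine's loader, and Python's zlib stands in for the LZ4 compression the post describes:

```python
import os
import struct
import zlib

def load_texture_cached(path, decode_fn, cache_dir=".texcache"):
    # On first load, decode the source texture (the slow path: parsing
    # PNG/JPG/etc.), compress the raw pixels, and write a cache file.
    # Subsequent loads just read and decompress the cached blob.
    os.makedirs(cache_dir, exist_ok=True)
    cache = os.path.join(cache_dir, os.path.basename(path) + ".bin")
    if (os.path.exists(cache)
            and os.path.getmtime(cache) >= os.path.getmtime(path)):
        with open(cache, "rb") as f:
            w, h = struct.unpack("<II", f.read(8))
            return w, h, zlib.decompress(f.read())
    w, h, pixels = decode_fn(path)  # slow path: full image decode
    with open(cache, "wb") as f:
        f.write(struct.pack("<II", w, h))
        f.write(zlib.compress(pixels, level=1))  # favor speed over ratio
    return w, h, pixels
```

The mtime comparison invalidates the cache when the source texture changes, so edited textures still get re-converted.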
Previously only objects placed in the world had textures; the "base layer", which is basically RMF noise, was using just a few manually assigned colors. Now it supports textures, using a pretty complicated selection process. I still need to tweak it and expose it to the user. I also need to get better textures; some of my test textures are less than great.
I have also reduced the data required for storing material information. Previously the number of materials that could affect a vertex was unbounded.
I had to change this because I decided to store a texture ID per vertex, which meant I needed to limit it to only one. I use an importance map to ensure the blend line isn't too obvious.
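A simplified illustration of picking the single winning texture ID per vertex. The function and the dictionary-based importance lookup are my own hypothetical stand-ins; the engine's importance map may well be spatial rather than per-texture:

```python
def dominant_texture(weights, importance):
    # weights: {texture_id: blend weight} for one vertex.
    # importance: {texture_id: visual importance}, defaulting to 1.0
    # for unlisted textures. Scale each blend weight by its importance
    # and keep only the winning texture ID for the vertex.
    return max(weights, key=lambda t: weights[t] * importance.get(t, 1.0))
```

Boosting the importance of a visually dominant texture lets it win vertices even where its raw blend weight is smaller, which moves the blend line somewhere less obvious.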
I have a custom occlusion system not based on the standard hi-Z / software rasterization / hardware occlusion query approaches. It worked by running a filter on the render command list, but this was not as efficient as I wanted, since it involved chasing a pointer per object. It now gathers everything as a stream-out from the frustum/cluster cull phase, which is fed to the occlusion phase.
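A toy sketch of the stream-out idea (names are mine, and the real version would run on the GPU or in the cull job, not as a Python list): the cull pass emits a compact index stream for the occlusion phase to walk sequentially, instead of the occlusion phase filtering the command list object by object:

```python
def cull_to_stream(cluster_bounds, is_visible):
    # Instead of filtering a render command list per object (one pointer
    # chase each), the frustum/cluster cull pass appends surviving
    # cluster indices to a flat stream; the occlusion phase then
    # consumes that stream with purely sequential access.
    return [i for i, b in enumerate(cluster_bounds) if is_visible(b)]
```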



