I've been experimenting with volume compression.
My data is single-channel signed distance values (an SDF), initially in f32 format.
These volumes are generated from a mesh during voxelization.
For high poly meshes, a large volume is needed to accurately capture details.
For example, the Stanford xyzrgb_dragon.
The source file is a 133MB uncompressed .ply.
A volume of:
w = 257, h = 145, d = 173
is able to reproduce the dragon fairly accurately.
So the uncompressed volume is:
257*145*173*sizeof(float) ~= 24.59mb
Let's make sure we have really captured all the detail, raising the settings..
w = 513, h = 287, d = 343
513*287*343*sizeof(float) ~= 192.64mb
Using this amount of memory for a single mesh's volume is not desirable.
(I should note, this is the most detailed mesh I've tested against, so it is an extreme example.)
The first step I used to compress the data was to store it as u8 instead of f32.
I did this by finding the min/max of the entire volume, and then scaling it such that
0 = min value
255 = max value
If we leave it with linear stepping, 256 values is not really enough for a large volume such as this dragon; truncation errors become visible on the surface.
Since the data is an SDF, we are mostly interested in accuracy near zero.
We can increase precision near zero by encoding distance with a gamma2 curve.
This steals precision from the upper range, and pushes it toward zero.
encode: sqrt_keep_sign(t)
decode: t*abs(t)
(there is slightly more complexity in the real encode/decode, but this is the gist)
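To make that concrete, here is a minimal sketch of the full u8 encode/decode. It assumes the distance has already been remapped to [-1, 1] symmetrically around zero via the volume's min/max; the function names are mine, not the actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize a signed distance to u8 through a gamma2 curve.
// Assumes 'd' has already been remapped to [-1, 1].
uint8_t encode_u8(float d)
{
    float g = std::copysign(std::sqrt(std::fabs(d)), d); // sqrt_keep_sign, still in [-1, 1]
    float t = g * 0.5f + 0.5f;                           // remap to [0, 1]
    return (uint8_t)std::clamp(t * 255.0f + 0.5f, 0.0f, 255.0f);
}

float decode_u8(uint8_t v)
{
    float t = v / 255.0f * 2.0f - 1.0f; // back to [-1, 1]
    return t * std::fabs(t);            // inverse of sqrt_keep_sign
}
```

Because the square root spreads the values near zero across more codes, a fixed u8 step near the surface now corresponds to a much smaller distance step.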
Now the 192.64mb file is 48.16mb. Still far too large.
I decided to investigate texture compression methods to see if they could be useful here.
Most, however, seemed geared toward encoding either color or normal data.
A digression into GPU texture compression..
The older methods were palette based; modern hardware compression centers on DXT.
The hardware side is covered well in Simon's answer, but the DXT approach can be summed up like this:
Basically for each 4x4 block of texels, find the best two colors that will act as endpoints to lerp between.
These endpoints are only stored in 565 format.
Each pixel is assigned a 2 bit index, which represents the lerp factor.
The original 16 colors must now be represented by 4, and those 4 must lie on a line in color space.
Sounds horrible, but somehow this produces acceptable results for color data; most games use this technique.
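For reference, here is roughly what decoding a DXT1 (BC1) block looks like. This is a sketch from the public format description (ignoring the c0 <= c1 mode that signals 1-bit alpha), not code from any particular library:

```cpp
#include <cstdint>

struct RGB { uint8_t r, g, b; };

// Expand a 5:6:5 packed color to 8 bits per channel.
static RGB unpack565(uint16_t c)
{
    return { (uint8_t)((c >> 11) * 255 / 31),
             (uint8_t)(((c >> 5) & 63) * 255 / 63),
             (uint8_t)((c & 31) * 255 / 31) };
}

// A 4x4 block is 8 bytes: two RGB565 endpoints + 16 two-bit indices.
void decode_dxt1_block(const uint8_t block[8], RGB out[16])
{
    uint16_t c0 = block[0] | (block[1] << 8);
    uint16_t c1 = block[2] | (block[3] << 8);
    RGB e0 = unpack565(c0), e1 = unpack565(c1);

    // The 4-entry palette: the two endpoints plus two interpolated colors.
    RGB pal[4] = {
        e0, e1,
        { (uint8_t)((2*e0.r + e1.r) / 3), (uint8_t)((2*e0.g + e1.g) / 3), (uint8_t)((2*e0.b + e1.b) / 3) },
        { (uint8_t)((e0.r + 2*e1.r) / 3), (uint8_t)((e0.g + 2*e1.g) / 3), (uint8_t)((e0.b + 2*e1.b) / 3) },
    };

    uint32_t bits = block[4] | (block[5] << 8) | (block[6] << 16) | ((uint32_t)block[7] << 24);
    for (int i = 0; i < 16; i++)
        out[i] = pal[(bits >> (i * 2)) & 3]; // 2-bit lerp index per texel
}
```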
ASTC is a newer standard for GPU texture compression.
It supports block sizes other than 4x4, for adjustable compression rates.
I was curious how this might work: given that DXT has 2-bit weights per texel, ASTC must be doing something new.
The ASTC specification indicates that the format stores a lower resolution weight array and interpolates per-texel weights from it.
ASTC can also use multiple sets of endpoints within a given block. It appears to support 2 different line segments, and a 1 bit index per texel indicating which segment to use.
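My mental model of the weight-grid idea, as a toy sketch (this is not the real ASTC arithmetic, which quantizes and packs these weights far more aggressively):

```cpp
// Store weights at a lower resolution than the texels, then bilinearly
// interpolate a per-texel weight from that low-res grid.
float texel_weight(const float* grid, int gw, int gh, // low-res weight grid
                   int x, int y, int bw, int bh)      // texel pos, block size
{
    // Map texel coordinates into grid space (assumes bw, bh, gw, gh > 1).
    float gx = x * (gw - 1) / float(bw - 1);
    float gy = y * (gh - 1) / float(bh - 1);
    int ix = (int)gx, iy = (int)gy;
    float fx = gx - ix, fy = gy - iy;
    int x1 = ix + 1 < gw ? ix + 1 : ix;
    int y1 = iy + 1 < gh ? iy + 1 : iy;

    float w00 = grid[iy * gw + ix], w10 = grid[iy * gw + x1];
    float w01 = grid[y1 * gw + ix], w11 = grid[y1 * gw + x1];
    return (w00 * (1 - fx) + w10 * fx) * (1 - fy)
         + (w01 * (1 - fx) + w11 * fx) * fy;
}
```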
After reading over the implementation details of these approaches I decided they might not be the best fit for storing SDF data.
Also I'm using d3d11, which does not support ASTC.
And old school dxt does not support volumes.
Taking what I had learned from my reading, I decided to come up with my own approach, one that exploits the fact that my data represents an SDF, meaning I only care about accuracy near the surface.
My approach works like this:
I break the volume down into N^3 sub blocks.
If a block contains a sign change I keep it, and stick it into a hash map.
Identical blocks are merged here.
I also generate a block palette buffer which maps each N^3 region to a corresponding block, or an empty flag if no block exists.
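A sketch of what that extraction pass might look like, with illustrative types and a byte-string hash map for the dedup (the real code presumably differs):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel used before the palette is bit-packed.
static const uint32_t EMPTY = 0xFFFFFFFF;

struct CompressedVolume {
    std::vector<std::vector<uint8_t>> blocks; // unique N^3 blocks
    std::vector<uint32_t> palette;            // one entry per region
};

// 'zero' is the u8 code that corresponds to distance 0 after quantization.
bool has_sign_change(const std::vector<uint8_t>& block, uint8_t zero)
{
    bool neg = false, pos = false;
    for (uint8_t v : block) (v < zero ? neg : pos) = true;
    return neg && pos;
}

// 'regions' holds the volume already split into N^3 chunks, in scan order.
CompressedVolume build(const std::vector<std::vector<uint8_t>>& regions, uint8_t zero)
{
    CompressedVolume out;
    std::unordered_map<std::string, uint32_t> seen; // block bytes -> index
    for (const auto& r : regions) {
        if (!has_sign_change(r, zero)) { out.palette.push_back(EMPTY); continue; }
        std::string key(r.begin(), r.end());
        auto it = seen.find(key);
        if (it == seen.end()) { // first time we see this exact block: keep it
            it = seen.emplace(key, (uint32_t)out.blocks.size()).first;
            out.blocks.push_back(r);
        }
        out.palette.push_back(it->second); // identical blocks merge here
    }
    return out;
}
```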
I also have an optional lossy toggle which will merge similar blocks.
Once we have the total number of unique blocks, I losslessly compress the palette buffer by using the minimum number of bits to represent each entry.
For instance, if we have <= 1023 blocks, we only need 10 bits per entry.
The maximum value for a given number of bits (1023 in this case) is reserved as the empty flag.
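A sketch of that packing scheme, with hypothetical names; the caller is assumed to have zeroed and sized the byte buffer to (entryCount * bitsPerEntry + 7) / 8:

```cpp
#include <cstdint>
#include <vector>

struct PackedPalette {
    std::vector<uint8_t> bytes; // entries stored LSB-first
    uint32_t bitsPerEntry;
};

// Smallest bit width that can hold every block index plus the reserved
// all-ones "empty" value (e.g. 1023 blocks + empty -> 10 bits).
static uint32_t bits_needed(uint32_t numBlocks)
{
    uint32_t b = 1;
    while ((1u << b) < numBlocks + 1) b++;
    return b;
}

void put_entry(PackedPalette& p, size_t i, uint32_t value)
{
    size_t bit = i * p.bitsPerEntry;
    for (uint32_t k = 0; k < p.bitsPerEntry; k++, bit++)
        if (value & (1u << k))
            p.bytes[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

uint32_t get_entry(const PackedPalette& p, size_t i)
{
    uint32_t value = 0;
    size_t bit = i * p.bitsPerEntry;
    for (uint32_t k = 0; k < p.bitsPerEntry; k++, bit++)
        if (p.bytes[bit / 8] & (1u << (bit % 8)))
            value |= 1u << k;
    return value;
}
```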
Many meshes exhibit symmetry.
We can exploit this by allowing a flag per palette index, which indicates if we should flip the block around x/y/z.
Blocks that extend beyond the bounds of the source volume are zero padded.
For regions which do not have a block (because there is no sign change), I use the 2nd mip map.
This mip is 1/64th the size in memory (a quarter of the resolution on each axis).
In the case of the dragon the 2nd mip is (48.16mb / 64) ~= 0.75mb
When sampling, I first read the 2nd mip.
If we are not near a surface, we are done.
Only if the sample is within a certain threshold of zero do we need to read the palette.
The palette lookup produces the index of the correct block, or the empty flag.
If empty, we use the 2nd mip value we already read.
Otherwise we sample from the block.
After this we perform linear or cubic interpolation on the results.
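The whole sampling path for a single voxel, as a sketch with illustrative types (the real version applies the filtering mentioned above, and decodes the u8 values):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

static const uint32_t EMPTY = 0xFFFFFFFF;

struct Volume {
    int w, h, d, N;                         // dimensions, block edge length
    std::vector<float> mip2;                // quarter resolution per axis
    std::vector<uint32_t> palette;          // one entry per N^3 region
    std::vector<std::vector<float>> blocks; // unique N^3 blocks, decoded
    float nearSurface;                      // threshold for "close to zero"
};

float sample_sdf(const Volume& v, int x, int y, int z)
{
    // 1. Cheap read from the 2nd mip.
    int mw = (v.w + 3) / 4, mh = (v.h + 3) / 4;
    float coarse = v.mip2[(z / 4 * mh + y / 4) * mw + x / 4];
    if (std::fabs(coarse) > v.nearSurface)
        return coarse; // far from any surface: the coarse value is good enough

    // 2. Palette lookup for this N^3 region.
    int bw = (v.w + v.N - 1) / v.N, bh = (v.h + v.N - 1) / v.N;
    uint32_t idx = v.palette[(z / v.N * bh + y / v.N) * bw + x / v.N];
    if (idx == EMPTY)
        return coarse; // no sign change stored here: fall back to the mip

    // 3. Full resolution read from the stored block.
    const std::vector<float>& b = v.blocks[idx];
    return b[((z % v.N) * v.N + (y % v.N)) * v.N + (x % v.N)];
}
```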
To avoid regenerating these massive volumes & palettes, I cache them on disk.
On disk they are currently compressed using zstd at level 22 (no dictionary).
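With zstd's one-shot API this is only a few lines (a sketch, assuming the raw cache is a byte buffer):

```cpp
#include <zstd.h>
#include <cstdint>
#include <vector>

// Compress a raw cache buffer for disk storage at level 22, no dictionary.
std::vector<uint8_t> compress_for_disk(const std::vector<uint8_t>& raw)
{
    std::vector<uint8_t> out(ZSTD_compressBound(raw.size()));
    size_t n = ZSTD_compress(out.data(), out.size(), raw.data(), raw.size(), 22);
    out.resize(ZSTD_isError(n) ? 0 : n); // empty vector on failure
    return out;
}
```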
In the future I might use lzma, now that my uncompressed size is small enough for lzma to not be a bottleneck during load.
Prior to compression, I run a simple PNG-style entropy reduction pass; it uses the previous value in the stream to predict the next, which seems to work fairly well on my data.
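A delta filter like this is only a few lines and is exactly reversible; this is a guess at the flavor of filter described, not the actual implementation:

```cpp
#include <cstdint>
#include <vector>

// Replace each byte with its difference from the previous byte, so runs of
// similar values become runs of near-zero residuals that compress better.
void delta_encode(std::vector<uint8_t>& data)
{
    uint8_t prev = 0;
    for (uint8_t& v : data) {
        uint8_t cur = v;
        v = (uint8_t)(cur - prev); // prediction residual, wraps mod 256
        prev = cur;
    }
}

void delta_decode(std::vector<uint8_t>& data)
{
    uint8_t prev = 0;
    for (uint8_t& v : data) {
        v = (uint8_t)(v + prev);
        prev = v;
    }
}
```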
The 192.64mb dragon becomes 2.7mb uncompressed (palette + blocks), or 533kb when run through zstd for disk storage.
I suspect I could further reduce the size by encoding the blocks as 4 bit values representing offsets from the 2nd mip's predicted value, but this would be quite lossy.
From testing, the maximum difference between the 2nd mip's value and the true value was about (-38 to +35), although the majority were <3.
Encoding this in 4 bits would be very lossy, but perhaps a 5 or 6 bit representation would work well.
Running dxt/astc style compression on the blocks and mips is another option, although I'm concerned the quality loss would not be worth it.
At this point I think the volumes are small enough for my purpose; I can load hundreds of them into memory without worrying. Also, this dragon is the most extreme of them all; most are much smaller.



