Monday, November 27, 2017

GPU Texture Format for PBR

Visuals with PBR.  


Objective: encode a full set of PBR channels into as few bits as possible, while keeping good performance.

From testing, it appears that each additional texture adds significant cost: using 3 textures to encode the PBR signal at 2 bytes per texel is much more costly than using 2 textures at the same total number of bytes per texel.

The PBR system I'm using has 8 channels:

3 base color
2 normal 
1 roughness
1 metallic
1 AO

If we store the normal in BC5, a two-channel format designed specifically for tangent-space normals, 6 channels remain. These cannot fit into a single texture, because no BC format supports more than 4 channels.
BC5 would therefore force a third texture, so we cannot use it.
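A quick channel count makes the texture budget concrete. This is a minimal Python sketch with hypothetical names; it only counts channels and ignores per-format precision:

```python
import math

# Channel counts from the PBR set described above (8 channels total).
CHANNELS = {"base_color": 3, "normal": 2, "roughness": 1, "metallic": 1, "ao": 1}
MAX_CHANNELS_PER_TEXTURE = 4  # no BC format stores more than 4 channels


def textures_needed(normal_in_bc5: bool) -> int:
    """Count how many textures are needed to hold every channel."""
    total = sum(CHANNELS.values())
    if normal_in_bc5:
        # BC5 holds exactly the two normal channels; the rest go elsewhere.
        remaining = total - CHANNELS["normal"]
        return 1 + math.ceil(remaining / MAX_CHANNELS_PER_TEXTURE)
    # Otherwise pack all channels densely into 4-channel textures.
    return math.ceil(total / MAX_CHANNELS_PER_TEXTURE)


print(textures_needed(normal_in_bc5=True))   # 3
print(textures_needed(normal_in_bc5=False))  # 2
```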

I've found two good alternatives, both using the same layout: two textures with 4 channels each.
On anything D3D11 and newer, BC7 can be used.
For pre-D3D11 systems, BC3 textures can be used instead.

The normal is split across the two alpha channels, which should help preserve its precision.

Texture1: RGB = base color, A = normal.x
Texture2: RGB = AO, roughness, metallic, A = normal.y
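A minimal sketch of this layout, written in Python as a stand-in for shader code. It assumes channel values are floats in [0, 1] (as a shader sees them after sampling), that normal x/y are remapped from [-1, 1] to [0, 1] for storage, and that z is rebuilt from the unit-length constraint; the post doesn't spell out the z reconstruction, so treat that part as an assumption. The function names are hypothetical.

```python
import math


def pack_texels(base_color, normal_xy, ao, roughness, metallic):
    """Pack one texel's 8 channels into two RGBA tuples (values in [0, 1]).

    Normal x/y are remapped from [-1, 1] to [0, 1] before storage.
    """
    nx, ny = normal_xy
    tex1 = (*base_color, nx * 0.5 + 0.5)               # RGB: base color, A: normal.x
    tex2 = (ao, roughness, metallic, ny * 0.5 + 0.5)   # RGB: AO/rough/metal, A: normal.y
    return tex1, tex2


def unpack_texels(tex1, tex2):
    """Invert pack_texels and rebuild the tangent-space normal's z component."""
    base_color = tex1[:3]
    ao, roughness, metallic = tex2[:3]
    nx = tex1[3] * 2.0 - 1.0
    ny = tex2[3] * 2.0 - 1.0
    # Tangent-space normals point out of the surface (z >= 0), so z follows
    # from x^2 + y^2 + z^2 = 1. The clamp guards against compression error
    # pushing x^2 + y^2 slightly above 1.
    nz = math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return base_color, (nx, ny, nz), ao, roughness, metallic
```

Only two channels of the normal are stored, which is what lets the whole set fit into 2 x 4 channels.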

*Texture1 can safely be set to sRGB, as both BC3 and BC7 treat the alpha as linear.
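To illustrate why the sRGB flag doesn't disturb normal.x: when an sRGB BC3 or BC7 texture is sampled, the sRGB-to-linear curve is applied to RGB only, and alpha comes back as stored. The Python sketch below mimics that sampling step; in practice the conversion happens in the texture unit, not in user code, and the function names here are hypothetical.

```python
def srgb_to_linear(c: float) -> float:
    """Standard sRGB decode curve (piecewise) for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def sample_srgb_rgba(texel):
    """Mimic what an sRGB texture view returns: RGB decoded, alpha untouched."""
    r, g, b, a = texel
    return (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b), a)


# Base color 0.5 decodes to roughly 0.214; the alpha (normal.x) stays 0.5.
print(sample_srgb_rgba((0.5, 0.5, 0.5, 0.5)))
```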

Uncompressed signal: 8 bytes per texel
Compressed: 2 bytes per texel in both BC3 and BC7 formats
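The arithmetic behind these numbers, assuming 8 bits per channel when uncompressed: BC3 (8 bytes of alpha plus an 8-byte BC1 color block) and BC7 (128 bits in every mode) both store a 4x4 block in 16 bytes, i.e. 1 byte per texel per texture, and the layout uses two textures. A small Python check:

```python
BLOCK_TEXELS = 4 * 4    # BC formats compress 4x4 texel blocks
BC3_BLOCK_BYTES = 16    # 8 bytes alpha block + 8 bytes BC1 color block
BC7_BLOCK_BYTES = 16    # every BC7 mode is 128 bits
TEXTURES = 2            # Texture1 + Texture2


def bytes_per_texel(block_bytes: int) -> float:
    return block_bytes / BLOCK_TEXELS


uncompressed = 8 * 1  # 8 channels x 1 byte each
compressed_bc3 = TEXTURES * bytes_per_texel(BC3_BLOCK_BYTES)
compressed_bc7 = TEXTURES * bytes_per_texel(BC7_BLOCK_BYTES)
print(uncompressed, compressed_bc3, compressed_bc7, uncompressed / compressed_bc3)
# 8 2.0 2.0 4.0
```

That 4:1 ratio is the same for both formats, so any quality difference between BC3 and BC7 comes at no extra memory cost.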

Encoding speed:

With AMD Compressonator, BC3 is fast to encode: even with quality set to high, it churns through BC3 fairly quickly.

Another encoder I tested was Crunch, a BC1/BC3 compressor that applies lossy entropy reduction on top of the lossy block compression; this lets crunched BC1/BC3 files compress much smaller on disk.
I decided not to use it because the compressor was very slow, and I feel that BC1 already looks less than stellar (the endpoints are only RGB565). Add even more artifacts from Crunch and the textures just didn't look very good.

AMD Compressonator's BC7 encoding is not nearly as fast as its BC3 encoding.
This is understandable, as the format is vastly more complex.

With the quality set to low, it still takes much longer than BC3 at high quality. 

BC Format Impact on Rendering:
There is no observable difference in rendering performance between BC3 and BC7 on my AMD 280X.
Both are observably faster than uncompressed, which is not surprising given that uncompressed data is 4x larger.

BC3 vs BC7 Visual Quality: 

I have only run high-quality BC7 on a few images; I'd probably have to run it overnight and then some to generate high-quality BC7 for everything.

Comparing low-quality BC7 vs high-quality BC3:

BC3's RGB part (identical to BC1) can only encode 4 possible colors in each 4x4 block; BC7 is far less limited.
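To show where the 4-color limit comes from, here is a Python sketch of how a BC1 color block's palette is built in four-color mode (first endpoint greater than the second): two RGB565 endpoints are expanded to 8 bits and two more colors are interpolated between them, and each texel picks one of the four with a 2-bit index. The function names are hypothetical, and the rounding of the interpolated entries (integer division here) varies between decoders.

```python
def rgb565_to_rgb888(c: int):
    """Expand a 16-bit RGB565 endpoint to 8-bit channels by bit replication."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def bc1_palette(c0: int, c1: int):
    """Build the 4-entry palette for one BC1 block in four-color mode (c0 > c1).

    Every texel in the 4x4 block must use one of these 4 colors.
    """
    if c0 <= c1:
        raise ValueError("this sketch only covers four-color mode (c0 > c1)")
    e0, e1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    p2 = tuple((2 * a + b) // 3 for a, b in zip(e0, e1))  # 2/3 e0 + 1/3 e1
    p3 = tuple((a + 2 * b) // 3 for a, b in zip(e0, e1))  # 1/3 e0 + 2/3 e1
    return [e0, e1, p2, p3]


# White/black endpoints: a smooth gradient across the block collapses to 4 steps.
print(bc1_palette(0xFFFF, 0x0000))
```

A smooth gradient spanning a block is quantized to those few steps, which matches the banding BC3 shows on gradients compared with BC7.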

For noisy images the difference isn't all that noticeable, but if you look closely BC7 generally has slightly more detail.

For anything with smooth gradients BC7 is clearly superior.


BC3 has dedicated 8-bit endpoints and 3-bit indices for the alpha channel, while BC7 may or may not even have dedicated indices for alpha, since the encoding mode is chosen per block.
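For comparison with the 4-entry color palette, this Python sketch builds BC3's alpha palette from its two 8-bit endpoints: 8 levels when the first endpoint is larger, otherwise 6 levels plus explicit 0 and 255, with each texel choosing a level through its 3-bit index. The function name is hypothetical, and the integer-division rounding is an assumption; real decoders may round slightly differently.

```python
def bc3_alpha_palette(a0: int, a1: int):
    """Build the 8-entry alpha palette for one BC3 block.

    a0 and a1 are the 8-bit endpoints; each texel selects an entry
    with a 3-bit index.
    """
    if a0 > a1:
        # 8-value mode: both endpoints plus 6 evenly interpolated values.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # 6-value mode: 4 interpolated values plus explicit 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]


print(bc3_alpha_palette(255, 0))
print(bc3_alpha_palette(0, 255))
```

With the normal's x and y in the alpha channels, each component gets these dedicated endpoints and twice as many levels as the color part, which is the reasoning behind storing the normal there.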

There is no obvious difference in the normals, but when I zoom in I can occasionally spot areas where BC3 appears to have done a better job. This is rare, and the improvement in the other channels outweighs this small loss. Running BC7 at high quality may also change this.

Size on Disk:
Both BC3 and BC7 are 8 bits per pixel.
When further compressed, in this case with zstd, the BC7 files are generally about 1-2% smaller.

I tried LZHAM (an LZMA variant), but the files are only about 5% smaller than with zstd level 19, which isn't worth the 5x slower decode.

Possible/Future Improvements:

1. The quality of all channels can be improved by tracking the min/max of each channel over the entire image and then re-normalizing each channel to the full [0, 1] range. Decoding would then require 2 floats per channel in the shader, though.

2. The normals in the normal map are stored as Cartesian (Euclidean) x/y components, which wastes bits since some values are never used: for a unit normal, any x/y pair with x² + y² > 1 is invalid. Octahedral coordinates make better use of the available bits, and decoding isn't really much different.
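A Python sketch of the min/max re-normalization from the first improvement. It assumes per-channel float data in [0, 1] and uses hypothetical function names; the point is that the shader-side decode is one multiply-add using the two stored floats (scale and bias). For example, a channel that only spans [0.2, 0.4] would otherwise use roughly 51 of the 256 available 8-bit levels.

```python
def compute_range(values):
    """Return (scale, bias) for one channel over the whole image."""
    lo, hi = min(values), max(values)
    scale = hi - lo
    # A constant channel has no range; use scale 1 to avoid dividing by zero.
    return (scale if scale > 0 else 1.0), lo


def normalize(values, scale, bias):
    """Offline step: stretch a channel to [0, 1] before block compression."""
    return [(v - bias) / scale for v in values]


def denormalize(v, scale, bias):
    """Shader-side decode: one multiply-add with the two per-channel floats."""
    return v * scale + bias


roughness = [0.2, 0.3, 0.4]
scale, bias = compute_range(roughness)
stretched = normalize(roughness, scale, bias)   # now spans the full [0, 1]
restored = [denormalize(v, scale, bias) for v in stretched]
```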
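The octahedral idea from the second improvement can be sketched with the standard octahedral mapping (as described in Cigolle et al.'s survey of unit-vector representations). This is a swapped-in, well-known technique rather than code from this renderer, and the function names are hypothetical: the vector is projected onto the octahedron |x| + |y| + |z| = 1, and the lower hemisphere is folded over the diagonals of the [-1, 1]² square, so every x/y pair decodes to a valid unit vector.

```python
import math


def _sign_not_zero(v: float) -> float:
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(n):
    """Map a unit vector to 2D octahedral coordinates in [-1, 1]^2."""
    x, y, z = n
    inv_l1 = 1.0 / (abs(x) + abs(y) + abs(z))
    px, py = x * inv_l1, y * inv_l1
    if z < 0.0:
        # Fold the lower hemisphere over the square's diagonals.
        px, py = (1.0 - abs(py)) * _sign_not_zero(px), (1.0 - abs(px)) * _sign_not_zero(py)
    return px, py


def oct_decode(e):
    """Inverse of oct_encode: rebuild a unit vector from octahedral coords."""
    ex, ey = e
    z = 1.0 - abs(ex) - abs(ey)
    x, y = ex, ey
    if z < 0.0:
        x, y = (1.0 - abs(ey)) * _sign_not_zero(ex), (1.0 - abs(ex)) * _sign_not_zero(ey)
    length = math.sqrt(x * x + y * y + z * z)
    return x / length, y / length, z / length
```

For tangent-space normals z is never negative, so the fold branch never runs and decoding reduces to two abs, a subtract, and a normalize. The two coordinates would still need remapping from [-1, 1] to [0, 1] to go into the alpha channels.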

The metallic channel is active for many of the objects seen here.