D3D11 Extensions: I need barycentric coords. AMD has a D3D11 extension for it. For Nvidia a geometry shader is required, but it looks like they have an NVAPI fast geometry shader that might work.
AMD Polaris: The reduced cost for small triangles is what most interests me here
GPUOpen: AMD open source with hair, shadows, GPU compute etc
Screen Space Reflections: implementation details
C survey: undefined behavior yadda yadda
compilers blog
math stuff
LZSSE: faster decompression than lz4
small lz4 -- smaller lz4 compatible files
corner wang tiles
fractal stuff
hg_sdf + pouet
povray: list of shapes supported has some interesting shapes
custom vertex fetch: see sebbbi's post. You can manually fetch vertex data instead of relying on fixed function. Can use this to encode extra bits of data into any unused bits in your indices. Runs well on AMD, but appears to perform very poorly on Nvidia.
Timing from Turanszkij's post:
GPU             Method                      ShadowPass  ZPrepass  OpaquePass  All GPU
NVidia GTX 960  InputLayout                 4.52 ms     0.37 ms   6.12 ms     15.68 ms
NVidia GTX 960  CustomFetch (typed buffer)  18.89 ms    1.31 ms   8.68 ms     33.58 ms
NVidia GTX 960  CustomFetch (RAW buffer 1)  18.29 ms    1.35 ms   8.62 ms     33.03 ms
NVidia GTX 960  CustomFetch (RAW buffer 2)  18.42 ms    1.32 ms   8.61 ms     33.18 ms
AMD RX 470      InputLayout                 7.43 ms     0.29 ms   3.06 ms     14.01 ms
AMD RX 470      CustomFetch (typed buffer)  7.41 ms     0.31 ms   3.12 ms     14.08 ms
AMD RX 470      CustomFetch (RAW buffer 1)  7.50 ms     0.29 ms   3.07 ms     14.09 ms
AMD RX 470      CustomFetch (RAW buffer 2)  7.56 ms     0.28 ms   3.09 ms     14.15 ms
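
A rough CPU-side sketch of the "extra bits in unused index bits" idea from the custom vertex fetch note, in C++. The 24/8 split and the names PackIndex, UnpackVertexIndex and UnpackExtra are my own assumptions for illustration, not anything from sebbbi's post; the shader would presumably apply the same mask and shift before fetching the vertex.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (hypothetical): low 24 bits = vertex index, high 8 bits = extra data.
constexpr uint32_t kIndexBits = 24;
constexpr uint32_t kExtraBits = 32 - kIndexBits;
constexpr uint32_t kIndexMask = (1u << kIndexBits) - 1u;

// Pack a vertex index and a small payload into one 32-bit index value.
inline uint32_t PackIndex(uint32_t vertexIndex, uint32_t extra) {
    assert(vertexIndex <= kIndexMask);      // index must fit in 24 bits
    assert(extra < (1u << kExtraBits));     // payload must fit in 8 bits
    return (extra << kIndexBits) | vertexIndex;
}

// Recover the real vertex index (what the manual fetch would use).
constexpr uint32_t UnpackVertexIndex(uint32_t packed) {
    return packed & kIndexMask;
}

// Recover the payload stored in the otherwise unused high bits.
constexpr uint32_t UnpackExtra(uint32_t packed) {
    return packed >> kIndexBits;
}
```

With this layout a mesh can have up to 2^24 vertices and carry 8 bits per index, e.g. PackIndex(1234, 0xAB) round-trips to 1234 and 0xAB.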