VS2015 generates inefficient code for these instructions
float floored = std::floor(some_float);
Here is what VS generates with the /arch:AVX2 switch:
00007FF6EE961016 vmovss xmm1,dword ptr [bob]
00007FF6EE96101C vcvttss2si ecx,xmm1
00007FF6EE961020 cmp ecx,80000000h
00007FF6EE961026 je main+4Bh (07FF6EE96104Bh)
00007FF6EE961028 vxorps xmm0,xmm0,xmm0
00007FF6EE96102C vcvtsi2ss xmm0,xmm0,ecx
00007FF6EE961030 vucomiss xmm0,xmm1
00007FF6EE961034 je main+4Bh (07FF6EE96104Bh)
00007FF6EE961036 vunpcklps xmm1,xmm1,xmm1
00007FF6EE96103A vmovmskps eax,xmm1
00007FF6EE96103E and eax,1
00007FF6EE961041 sub ecx,eax
00007FF6EE961043 vxorps xmm1,xmm1,xmm1
00007FF6EE961047 vcvtsi2ss xmm1,xmm1,ecx
Not good.
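For anyone who would rather read C++ than disassembly, here is roughly what that sequence does. This is only my reading of the listing, not MSVC's source: the function name is made up, and the explicit range check stands in for the 80000000h sentinel comparison, since the equivalent out-of-range cast would be undefined behaviour in portable C++.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical name; a reading of the listing, not MSVC's actual source.
float floor_as_compiled(float x) {
    // vcvttss2si yields 0x80000000 for NaN and out-of-range inputs, and the
    // listing then leaves x untouched. The same cast would be undefined
    // behaviour in C++, so the range is checked up front instead.
    if (!(x > -2147483648.0f && x < 2147483648.0f)) {
        return x;
    }
    std::int32_t i = static_cast<std::int32_t>(x);  // truncate toward zero
    if (static_cast<float>(i) == x) {
        return x;  // already integral (vucomiss / je); keeps -0.0f as is
    }
    i -= std::signbit(x) ? 1 : 0;  // vmovmskps / and / sub: step down for negatives
    return static_cast<float>(i);
}
```

So that is two conversions, two branches and a sign-bit extraction for something a single rounding instruction can do.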
With AVX enabled, I'd expect to see vroundss (the VEX-encoded form of roundss) used instead.
Here is a custom implementation of floor using intrinsics.
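#include <smmintrin.h> // SSE4.1 intrinsics: _mm_floor_ss

float floor_avx(float a) {
    // o is uninitialized; it only supplies the upper three lanes of the
    // result, which _mm_cvtss_f32 discards.
    __m128 o;
    return _mm_cvtss_f32(_mm_floor_ss(o, _mm_set_ss(a)));
}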
And the assembly:
00007FF7461C1016 vmovss xmm1,dword ptr [bob]
00007FF7461C101C vmovaps xmm2,xmm1
00007FF7461C1020 vmovups xmm1,xmmword ptr [rsp+20h]
00007FF7461C1026 vroundss xmm3,xmm1,xmm2,1
There seem to be a few extra moves here (the load from [rsp+20h] is the uninitialized o being read from the stack), but at least it is in the ballpark of reasonable.
The same problem exists for std::trunc, std::ceil, and applies to both float and double.
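The same intrinsic trick should carry over to ceil and trunc. Below is a minimal sketch, assuming an x86 target with SSE4.1. The helper names and the SSE41_FN macro are my own; the macro is only there so GCC and Clang accept the intrinsics without a global -msse4.1. Passing the same vector as both operands avoids reading an uninitialized register.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_ceil_ss, _mm_round_ss

// GCC/Clang need SSE4.1 enabled per function when it is not enabled globally;
// MSVC accepts these intrinsics without a switch.
#if defined(__GNUC__) && !defined(__SSE4_1__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

// Hypothetical helper names, chosen to sit next to floor_avx.
SSE41_FN inline float ceil_sse41(float a) {
    const __m128 v = _mm_set_ss(a);
    // Same vector for both operands, so nothing uninitialized is read.
    return _mm_cvtss_f32(_mm_ceil_ss(v, v));
}

SSE41_FN inline float trunc_sse41(float a) {
    const __m128 v = _mm_set_ss(a);
    // Round toward zero and suppress the precision exception.
    return _mm_cvtss_f32(
        _mm_round_ss(v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
}
```

The double versions would presumably use _mm_set_sd, _mm_ceil_sd, _mm_round_sd and _mm_cvtsd_f64 in the same way.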
Anyway, I reported this on Connect (floor/ceil/trunc), although my past experience with Connect has not been great.
Hopefully they fix this one.
Here is what std::trunc generates. It calls a function instead of using roundss:
00007FF750091016 vmovss xmm0,dword ptr [bob]
00007FF75009101C call qword ptr [__imp_truncf (07FF750092108h)]
(Edit: VS2017 is better, but still misses some optimizations with std::trunc and std::round)
godbolt link for x64