Faster way to multiply floats

In C/C++ we can use the left shift operator as a faster way to multiply integers by powers of 2.

But we cannot use the left shift operator on floats or doubles, because they are represented differently, with an exponent component and a mantissa component.
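To make the exponent/mantissa split concrete, here is a minimal sketch using the standard `std::frexp` and `std::ldexp` functions. The `Parts` struct and the `split` and `rebuild` names are hypothetical, chosen only for illustration.

```cpp
#include <cmath>

// Hypothetical container for the two components of a floating-point value.
struct Parts {
    double fraction;  // magnitude in [0.5, 1) for nonzero finite input
    int exponent;     // power of two such that x == fraction * 2^exponent
};

// Split x into its fraction and power-of-two exponent.
Parts split(double x) {
    int e = 0;
    double f = std::frexp(x, &e);
    return {f, e};
}

// Recombine the two components: fraction * 2^exponent.
double rebuild(const Parts& p) {
    return std::ldexp(p.fraction, p.exponent);
}
```

For example, `split(5.0)` yields fraction 0.625 and exponent 3, since 5 = 0.625 * 2^3. Shifting the raw bits of such a value would move bits across the boundary between the two fields rather than change the exponent by one.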

My question is: is there anything like the left shift operator for integers that multiplies floating-point numbers faster, even if only by powers of 2?

13.10.2009 19:19:46
Floats have two elements (mantissa and exponent) which are powers of 2. What are you asking?
S.Lott 13.10.2009 19:23:43
don't try to outsmart your compiler without profound reasons
Christoph 13.10.2009 19:23:55
A Google search turns up many PDFs explaining fast floating-point multiplication algorithms.
Tom 13.10.2009 19:25:33
The compiler chooses how to implement multiplication. The idea that you can do it faster by shifting is an old, old myth.
nos 13.10.2009 19:50:59
It's true that most compilers have recognized multiplication of integers by statically defined powers of two, and turned them into shifts (if helpful), for quite a while. That doesn't apply to floating point though. Having written a few compilers, and examined the output from quite a few more, I feel quite safe in stating categorically that I know more than any compiler I've seen yet. Contrary to popular belief, compilers do NOT seem to be improving in this respect either -- the best FP optimization I've seen was on mainframes, decades ago.
Jerry Coffin 13.10.2009 20:10:08

No, you can't. But depending on your problem, you might be able to use SIMD instructions to perform one operation on several packed variables. Read about the SSE2 instruction set.
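As a rough sketch of the packed-multiply idea, the following multiplies four floats per instruction using x86 intrinsics, and falls back to a plain loop where SSE2 is not available. The `scale` function and the `HAVE_SSE2` macro are hypothetical names. The single-precision intrinsics used here come from the original SSE set, which the SSE2 header also provides.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 header; also declares the SSE float intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: multiply every element of data[0..n) by factor.
void scale(float* data, std::size_t n, float factor) {
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128 f = _mm_set1_ps(factor);        // broadcast factor into 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);       // unaligned load of 4 floats
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));  // 4 multiplications at once
    }
#endif
    for (; i < n; ++i) {
        data[i] *= factor;                       // scalar tail, or full fallback
    }
}
```

Many compilers can auto-vectorize a simple loop like the scalar one at higher optimization levels, so it is worth checking the generated code before writing intrinsics by hand.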

In any event, if you are optimizing floating-point multiplications, you are in 99% of the cases looking in the wrong place. Without going on a major rant regarding premature optimization, at least justify it by performing proper profiling.

13.10.2009 19:31:45

You could do this:

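float f = 5.0f;
int* i = (int*)&f;   // view the float's bits as an int (assumes 32-bit int; breaks strict aliasing)
*i += 0x00800000;    // add 1 to the exponent field, i.e. multiply by 2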

But then you have the overhead of moving the float out of the register, into memory, then back into a different register, only to be flushed back to memory ... about 15 or so cycles more than if you'd just done fmul. Of course, that's even assuming your system has IEEE floats at all.
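For reference, the same exponent trick can be written without the pointer cast by copying the bits with `std::memcpy`. This is only a sketch under the same assumption of 32-bit IEEE 754 floats, and the name `double_via_exponent` is hypothetical. It is not claimed to be faster than a multiplication.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: double f by adding 1 to its biased exponent field.
// Only meaningful for normal, finite values whose doubled result does not
// overflow; zero, subnormals, infinities and NaN are not handled.
float double_via_exponent(float f) {
    assert(std::isnormal(f));
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copy the bits out of the float
    bits += 0x00800000u;                   // exponent field starts at bit 23
    std::memcpy(&f, &bits, sizeof f);      // copy the modified bits back
    return f;
}
```

With IEEE 754 floats, `double_via_exponent(5.0f)` gives `10.0f`. The special cases the function has to exclude are exactly what a hardware multiply handles for free.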

Don't try to optimize this. You should look at the rest of your program to find algorithmic optimizations instead of trying to discover ways to microoptimize things like floats. It will only end in blood and tears.

13.10.2009 19:34:24
Gah.. that code gives me the willies. Also, passing data between the floating-point registers and the general-purpose CPU registers is often a very costly operation, which is why even float-to-int conversions suck.
Mads Elvheim 13.10.2009 19:36:41

Truly, any decent compiler would recognize compile-time power-of-two constants and use the smartest operation.

13.10.2009 19:32:11
I have to guess that you rarely (if ever) really examine the output of a compiler. I have -- nearly none of them is very smart, especially when it comes to floating point. Intel's does about as well as any I've seen recently, and I'd barely rate it as "mediocre" in this respect -- most of the others are substantially worse.
Jerry Coffin 13.10.2009 20:11:02

In Microsoft Visual C++, don't forget the "floating point model" switch. The default is /fp:precise but you can change it to /fp:fast. The fast model trades some floating-point accuracy for more speed. In some cases, the speedups can be drastic (the blog post referenced below notes speedups as high as 5x in some cases). Note that Xbox games are compiled with the /fp:fast switch by default.

I just switched from /fp:precise to /fp:fast on a math-heavy application of mine (with many float multiplications) and got an immediate 27% speedup with almost no loss in accuracy across my test suite.

The Microsoft blog post on this switch covers the details. It seems that the main reasons not to enable it would be if you need all the accuracy available (e.g., games with large worlds, long-running simulations where errors may accumulate) or you need robust double or float NaN processing.
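One reason accuracy can change is that a fast floating-point model lets the compiler reorder and regroup operations, and floating-point addition is not associative. A small sketch, with hypothetical function names, showing how grouping alone changes a result under strict (precise) semantics:

```cpp
// Two ways of summing the same three values; only the grouping differs.
float sum_left(float a, float b, float c) {
    return (a + b) + c;
}

float sum_right(float a, float b, float c) {
    return a + (b + c);
}
```

Compiled with strict floating-point semantics, `sum_left(1e20f, -1e20f, 1.0f)` is `1.0f`, while `sum_right(1e20f, -1e20f, 1.0f)` is `0.0f`, because the `1.0f` is lost when added to `-1e20f` first. Under a fast model the compiler may pick either grouping, so results like these can shift between builds.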

Lastly, also consider enabling the SSE2 instruction extensions. This gave an extra 3% boost in my application. The effects of this will vary depending on the number of operands in your arithmetic—for example, these extensions can provide speedup in cases where you are adding or multiplying more than 2 numbers together at a time.

30.12.2015 06:36:46