Forcing GCC to emit the correct SIMD code

I was adding SIMD support to crtrm. SIMD is a CPU feature that speeds up arithmetic by performing several operations with a single instruction. With the 16-byte float vectors used here, that means 4 maths operations at the same time instead of the usual 1. Nearly every computer today has some form of SIMD, so sooner or later I would have to support it.

After some fooling around with compiler options and C definitions, I got something that compiled fine. Until I looked at the actual display window, that is. It was completely corrupted. More than a little bit concerned, I switched off SIMD, and everything looked perfect. SIMD on, corrupted. SIMD off, fine.

Luckily I had enough experience to be sure that my code was solid, and so I started trying to figure out what was wrong with SIMD, rather than what was wrong with my code. As it turned out, the problem was neither my code, nor SIMD, but GCC. After fewer hours than I expected, I found the answer in my debug output:

Subtracting two vectors:
<1 1 1 1> -
<1 1 1 1> 
-------------------
 0 0 0 0

Adding two vectors:
<1 1 1 1> +
<1 1 1 1> 
-------------------
 2 1 1 1

That's not right.

GCC was not emitting a SIMD instruction when adding vectors (though, oddly, it was getting subtraction right). In effect, GCC treated the vector as a single number, so it only added the first of the four elements. The "Multiple" part of "Single Instruction, Multiple Data" wasn't happening.
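
A lane-by-lane comparison against plain scalar arithmetic is enough to catch this kind of failure. The sketch below is not crtrm's actual debug code: count_bad_add_lanes is a hypothetical helper that takes the four floats an add produced, plus the two inputs, and counts the lanes that differ from the scalar sum.

```c
#include <stdio.h>

/* Hypothetical helper, not from crtrm: compare the result of a 4-lane
   SIMD add against a scalar reference, lane by lane, and return the
   number of lanes that disagree. */
int count_bad_add_lanes(const float got[4],
                        const float a[4], const float b[4]) {
	int bad = 0;
	for (int i = 0; i < 4; i++) {
		float expected = a[i] + b[i];   /* scalar reference for this lane */
		if (got[i] != expected) {
			printf("lane %d: got %g, expected %g\n", i, got[i], expected);
			bad++;
		}
	}
	return bad;
}
```

Given the broken output above (2 1 1 1 for 1 1 1 1 + 1 1 1 1), this reports lanes 1, 2 and 3 as wrong and returns 3. For a correct add it returns 0.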

Another few hours with the manuals and I was able to force GCC to emit the correct SIMD instruction. I'm happy that I didn't spend days combing through my code for a mistake of my own, as I would have in years past. Sometimes, the problem really is the compiler.

The solution

Here's how to force GCC to emit the correct SIMD instructions. This uses GCC-only extensions to the C language, which are relatively well documented, once you realise that you need them.

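INLINE simd vAdd ( simd a, simd b) {
	simd c;
	/* Copy the operands into explicitly typed 16-byte (4 x float) vectors. */
	float __attribute__ ((vector_size(16))) v1 = a.v;
	float __attribute__ ((vector_size(16))) v2 = b.v;

	/* GCC's x86 builtin for the packed single-precision add (ADDPS). */
	c.v = __builtin_ia32_addps(v1, v2);
	return c;
}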

Or, if you have declared your vectors as unions of a SIMD vector and a float array:

	INLINE simd vAdd ( simd a, simd b) {
		simd c;
		c.v = __builtin_ia32_addps(a.v, b.v);
		return c;
	}
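
Neither snippet compiles on its own, because INLINE and simd are defined elsewhere. Here is a minimal self-contained sketch of the union variant. The definitions are assumptions, not crtrm's real ones: INLINE is guessed to be static inline, the typedef name v4sf and the float member f are made up for illustration, and the fallback branch is only there so the sketch still builds where the x86 builtin is unavailable.

```c
/* Assumption: INLINE is a project macro; this definition is a guess. */
#define INLINE static inline

/* Four packed floats (16 bytes), via GCC's vector_size extension. */
typedef float v4sf __attribute__ ((vector_size(16)));

/* Assumption: simd is a union of the vector and a float array, as the
   text describes. The member name f is hypothetical. */
typedef union {
	v4sf  v;      /* whole-vector view, used for SIMD operations */
	float f[4];   /* per-element view, handy for debug output */
} simd;

INLINE simd vAdd(simd a, simd b) {
	simd c;
#if defined(__SSE__) && defined(__GNUC__) && !defined(__clang__)
	/* GCC on x86 with SSE enabled: force the packed add. */
	c.v = __builtin_ia32_addps(a.v, b.v);
#else
	/* Other compilers or targets may not provide the builtin, so fall
	   back to the generic element-wise vector add. */
	c.v = a.v + b.v;
#endif
	return c;
}
```

With this layout, adding the vectors 1 2 3 4 and 10 20 30 40 should leave 11 22 33 44 in c.f, which is easy to print one element at a time when checking that all four lanes were really added.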