File under: compilers, development, C, crtrm

Compilers make me :(

An insignificant change in the code causes a 30x slowdown.

Optimisation?

This issue cost me about a day of work. What worries me is that if I had hit this bug at the start of the project, I probably would have given up on the whole thing.

The problem started when I rewrote some of the crtrm code. I reworked the critical path so that I could pass an extra vector through it, which is vital for supporting textures like these:

Textured objects

The frame rate dropped by a factor of about 30.

Before my "improvements"

crtrm$ time ./benchmark 
Starting engine...
Engine initialised!
Internal resolution:  64x64
Starting glut engine...done!

real	0m3.636s
user	0m3.540s
sys	0m0.030s

After my "improvements"

crtrm$ time ./benchmark 
Starting engine...
Engine initialised!
Internal resolution:  64x64
Starting glut engine...done!

real	1m39.917s		<---  30x slowdown!
user	1m39.750s
sys	0m0.080s

Debugging

I start running profilers and cachegrind to see what's going on. The tools point out various issues: my homemade sine function apparently takes a ludicrous amount of time to be read in, and my main loop is taking too long (how helpful). I fix all the reported problems, but the slowdown remains.

In desperation, I grab an old version and start reverting my code, line by line, until I am left with the most minor change I made.

The culprit

The last line of the critical path was:

  intersection i = {start, point_s, dir, dist, count, FALSE, zeroVec};  //no_intersection
  return i;
}

This is the last part of the critical path for a raymarcher. If the ray being cast misses every object in the scene, we build an intersection struct, mark it as a miss (FALSE), and return it. It gets called around 80 million times per second.

This "no-hit" structure has to be created in several different parts of the codebase, so I moved it into a function and returned the result of calling that function:

  return no_intersection();
}

where no_intersection() holds the original lines to build an intersection data structure and return it.
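
For anyone who wants to poke at this, here is a minimal sketch of the two variants side by side. The struct layout, the field meanings, the names miss_direct, miss_via_helper and check_variants, and the zero-filled helper are all assumptions made for illustration; the real crtrm definitions aren't shown in this post, and the real helper may fill the fields differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types; the real crtrm definitions are not shown in the post. */
typedef struct { float x, y, z; } vec3;

typedef struct {
    vec3  start;   /* ray origin (guessed meaning) */
    vec3  point;   /* last point sampled along the ray (guessed meaning) */
    vec3  dir;     /* ray direction */
    float dist;    /* distance marched so far */
    int   count;   /* number of marching steps taken */
    bool  hit;     /* false means the ray missed everything */
    vec3  extra;   /* the extra vector carried along for textures */
} intersection;

static const vec3 zero_vec = {0.0f, 0.0f, 0.0f};

/* Variant A: build the no-hit struct directly at the end of the hot path. */
intersection miss_direct(vec3 start, vec3 point, vec3 dir, float dist, int count)
{
    intersection i = {start, point, dir, dist, count, false, zero_vec};
    return i;
}

/* The shared helper. Assumed to zero-fill, since it takes no arguments. */
intersection no_intersection(void)
{
    intersection i = {zero_vec, zero_vec, zero_vec, 0.0f, 0, false, zero_vec};
    return i;
}

/* Variant B: the hot path ends by returning the helper's result. */
intersection miss_via_helper(void)
{
    return no_intersection();
}

/* Both variants should report a miss. */
void check_variants(void)
{
    vec3 origin  = {0.0f, 0.0f, 0.0f};
    vec3 forward = {0.0f, 0.0f, 1.0f};
    intersection a = miss_direct(origin, forward, forward, 1.0f, 3);
    intersection b = miss_via_helper();
    assert(!a.hit && a.count == 3);
    assert(!b.hit && b.count == 0);
}
```

Functionally the two variants do the same job, which is what makes the performance gap so annoying. Timing each one in a tight loop at different optimisation levels would be the obvious way to see whether a given compiler still treats them differently.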

That one change slowed the code down by a factor of 30. Apparently GCC didn't inline the function when its result was immediately returned by the caller. They've probably fixed this by now; I really should go back and check one day.
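
One thing that could be tried on GCC is asking for inlining explicitly with the always_inline function attribute. This is a guess at a workaround, not something tested against crtrm, and whether it would have recovered the lost frame rate is unknown. The FORCE_INLINE macro, cast_ray_miss, check_forced_inline, and the struct layout are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types; the real crtrm definitions are not shown in the post. */
typedef struct { float x, y, z; } vec3;

typedef struct {
    vec3  start, point, dir;
    float dist;
    int   count;
    bool  hit;
    vec3  extra;
} intersection;

/* always_inline is a GCC/Clang extension; fall back to plain inline elsewhere. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* The no-hit helper, now carrying an explicit request to be inlined. */
FORCE_INLINE intersection no_intersection(void)
{
    intersection i = { .hit = false };  /* remaining fields are zero-initialised */
    return i;
}

/* Stand-in for the end of the hot path: return the helper's result directly. */
intersection cast_ray_miss(void)
{
    return no_intersection();
}

/* The miss result should be flagged as a miss with zeroed fields. */
void check_forced_inline(void)
{
    intersection r = cast_ray_miss();
    assert(!r.hit);
    assert(r.count == 0);
}
```

Compiling with GCC's -Winline makes the compiler warn when a function declared inline could not be inlined, which might help catch this sort of thing before a day gets lost to it.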

Summary

In summary, C can bite me.