I checked the code and didn't find a bug, an uninitialized variable, or anything else. No reductions are present. The problem really seems to be caused by the way the floating-point operations are done.
Do you have a (reduced yet complete) example to play around with (compile, run)? It's good you examined the code, but I think it's unlikely the compiler changes the kind of floating-point operations... Maybe something along the lines of what Mark suggests is happening, but in any case, the problem seems to be ill-conditioned (or something similar), in which case parallel computing exposed it, hopefully at an early stage...
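A minimal Python sketch (not from the original code) of the two effects in play: floating-point addition is not associative, so a different evaluation order (as can happen when work is split across threads) may change the last bits of a result, and an ill-conditioned problem can turn such a tiny difference into a large one. The helper `solve2x2`, the nearly singular matrix, and the value of `eps` are hypothetical, illustrative choices.

```python
def solve2x2(a, b, c, d, e, f):
    """Hypothetical helper: solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y

# 1) The same three numbers summed in two different orders
#    give results that differ in the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)

# 2) A nearly singular (ill-conditioned) system: perturbing the
#    right-hand side by about 1e-10 moves the solution from
#    roughly (2, 0) to roughly (1, 1).
eps = 1e-10
x1 = solve2x2(1.0, 1.0, 1.0, 1.0 + eps, 2.0, 2.0)
x2 = solve2x2(1.0, 1.0, 1.0, 1.0 + eps, 2.0, 2.0 + eps)
print(x1, x2)
```

If a serial and a parallel run of the real code differ by much more than rounding noise, a perturbation experiment like the second one (on the actual problem) is one way to check whether ill-conditioning is the cause.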
HTH,
Fernando.
