I've got a numerically intensive program running on Mac OS X 10.5, and I'm trying to apply some OpenMP to speed it up.
I think I know something about the underlying MP universe: I was part of the design team for a shared-memory multi-'super'-computer years ago. Not that I couldn't be rusty or confused.
Anyway, I identified a region of code that I believe is completely safe to parallelize. It consists of four 'for' loops. Each loop writes a distinct part of a matrix while reading from a wide variety of other shared data. The loops run one after another, so I put a 'sections' construct around the group of four and a 'section' around each loop. I did not parallelize the loops themselves. Since I have only two cores to work with, this seemed as good a way as any to exploit the natural division of labor.
I run this, and my program exhibits some rather bizarre failures: numbers that are never written inside the parallel section turn up with impossible values.
I am currently running various very long-running experiments to try to isolate this, since it might turn on OpenMP versus no OpenMP, or on -O2 versus -O3.
My question for the esteemed readers of this forum is the following: various websites I've read seem to assume that serious OpenMP work will take place with gcc 4.3 or gcc 4.4. Here I am with gcc 4.2. Is its OpenMP support generally considered stable for this purpose, particularly on Mac OS?