I know this is to be expected. But the efficiency seems to drop very quickly as I add cores. With 32 cores, I get only about 22% efficiency (the code runs only 7 times faster).
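To put numbers on that, here is a small sketch of the arithmetic. It computes the parallel efficiency from the measured speedup, and estimates the effective serial fraction with the Karp-Flatt metric (which is just Amdahl's law solved for the serial fraction). The function names are only illustrative, and the estimate lumps together everything that does not scale (serial code, memory bandwidth limits, synchronization), so it is a rough indicator rather than a diagnosis.

```python
def parallel_efficiency(speedup, cores):
    """Efficiency = measured speedup divided by the number of cores."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup, cores):
    """Effective serial fraction implied by a measured speedup.

    Obtained by solving Amdahl's law, S = 1 / (f + (1 - f) / p), for f.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def amdahl_speedup(serial_fraction, cores):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    speedup, cores = 7.0, 32  # the figures quoted above
    eff = parallel_efficiency(speedup, cores)
    frac = karp_flatt_serial_fraction(speedup, cores)
    print(f"efficiency      = {eff:.3f}")
    print(f"serial fraction = {frac:.3f}")
    # With this serial fraction, the speedup can never exceed 1 / frac,
    # no matter how many cores are used.
    print(f"speedup limit   = {1.0 / frac:.1f}")
```

If the same serial fraction comes out for the 8-, 16- and 32-core runs, the limit is probably a fixed non-parallel part; if it grows with the core count, overheads such as synchronization or memory bandwidth are the more likely cause.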
So my question is: is there any way to increase the efficiency, especially when using 16 or 32 cores?
Actually, I don't parallelize by blocks. I just use "Parallel Do" to run the loops in parallel. So I think I have no control over the load imbalance.
My code solves the 3D Navier-Stokes (NS) equations, explicit for momentum, and uses multigrid for the pressure correction, with an SOR smoother.
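For context on the smoother: a plain SOR sweep in lexicographic order uses values already updated in the same sweep, so its loop iterations are not independent. One common way around this is red-black ordering, sketched below for a 3D Poisson-type problem. This is only a minimal illustration under assumptions (uniform grid spacing `h`, fixed boundary values, nested lists indexed `[i][j][k]`, a hypothetical function name); it is not my actual solver, and it is written in Python only to show the idea.

```python
def red_black_sor_sweep(p, rhs, h, omega):
    """One red-black SOR sweep for the 3D problem laplacian(p) = rhs.

    p and rhs are nested lists indexed [i][j][k] on an n x n x n grid.
    Boundary values of p are left untouched. p is updated in place.
    """
    n = len(p)
    for color in (0, 1):
        # Points of one color only read neighbours of the other color,
        # so the iterations of this triple loop are independent of each
        # other and could be distributed over threads.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    if (i + j + k) % 2 != color:
                        continue
                    neighbours = (p[i - 1][j][k] + p[i + 1][j][k]
                                  + p[i][j - 1][k] + p[i][j + 1][k]
                                  + p[i][j][k - 1] + p[i][j][k + 1])
                    gauss_seidel = (neighbours - h * h * rhs[i][j][k]) / 6.0
                    # Over-relaxation: blend the old value with the update.
                    p[i][j][k] = (1.0 - omega) * p[i][j][k] + omega * gauss_seidel
    return p
```

In a Fortran code, each of the two color passes would get its own "Parallel Do" directive, with the implicit barrier at the end of the first pass separating red from black.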