Although OpenMP can give a large performance improvement when used well, there are also some basic programming strategies that reduce memory access costs and improve performance. These become even more significant when using OpenMP. From the coding example msohail posted on 7th March, I would recommend changing the DO loop order so that memory is accessed sequentially:
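!$OMP PARALLEL DO COLLAPSE(3)
DO K = 1,NK ; DO J = 1,NJ ; DO I = 1,NI   ! loop bounds assumed; I is the innermost loop
  Q(I,J,K) = V(I,J,K)                     ! update the variable
END DO ; END DO ; END DO
!$OMP END PARALLEL DO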
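You did do this with the other nested loops, but (depending on the values of NI, NJ and NK) you might be surprised how slow your alternative loop order is at accessing memory, especially if arrays Q and V are much larger than the cache size of the computer you are using. Fortran stores arrays in column-major order, so the first index (I) should vary fastest, which means placing it in the innermost loop.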
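To see the effect of loop order on your own machine, a small serial test along these lines might help. This is only a sketch: the array sizes and the program name are my assumptions, not values from the original code, and an optimising compiler may interchange or simplify the loops, so treat the timings as indicative only.

```fortran
program loop_order
  implicit none
  ! Sizes are illustrative assumptions, not taken from the original code
  integer, parameter :: NI = 200, NJ = 200, NK = 200
  real, allocatable :: Q(:,:,:), V(:,:,:)
  integer :: I, J, K
  integer(kind=8) :: t0, t1, t2, t3, rate
  real :: check1, check2

  allocate (Q(NI,NJ,NK), V(NI,NJ,NK))
  call random_number (V)
  Q = 0.0

  ! First index innermost: consecutive iterations touch adjacent memory
  call system_clock (t0, rate)
  do K = 1, NK
    do J = 1, NJ
      do I = 1, NI
        Q(I,J,K) = V(I,J,K)
      end do
    end do
  end do
  call system_clock (t1)
  check1 = sum (Q)          ! use the result so the loop is not discarded

  ! Last index innermost: each iteration jumps NI*NJ elements in memory
  Q = 0.0
  call system_clock (t2)
  do I = 1, NI
    do J = 1, NJ
      do K = 1, NK
        Q(I,J,K) = V(I,J,K)
      end do
    end do
  end do
  call system_clock (t3)
  check2 = sum (Q)

  ! Both orders must produce the same copy
  if (any (Q /= V) .or. check1 /= check2) error stop 'copy mismatch'
  print *, 'I innermost (s):', real (t1-t0) / real (rate)
  print *, 'K innermost (s):', real (t3-t2) / real (rate)
end program loop_order
```

Increasing NI, NJ and NK until Q and V no longer fit in cache should make any difference between the two orders easier to see.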
I have not used COLLAPSE, but I wonder: if NK is much larger than the number of available threads, would COLLAPSE have any benefit? There is no indication of the relative sizes of NI, NJ and NK to show whether COLLAPSE is worthwhile, or which loop order might be most effective.
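For comparison, here is a sketch of the alternative I have in mind, where only the outer K loop is shared among the threads and COLLAPSE is not used. The sizes and the program name are assumptions for illustration; whether this is faster or slower than COLLAPSE(3) would have to be measured with the real NI, NJ and NK.

```fortran
program outer_parallel
  !$ use omp_lib
  implicit none
  ! Assumed sizes, chosen so that NK is well above a typical thread count
  integer, parameter :: NI = 64, NJ = 64, NK = 256
  real, allocatable :: Q(:,:,:), V(:,:,:)
  integer :: I, J, K, nthreads

  allocate (Q(NI,NJ,NK), V(NI,NJ,NK))
  call random_number (V)

  ! The !$ lines are compiled only when OpenMP is enabled
  nthreads = 1
  !$ nthreads = omp_get_max_threads()

  ! Only the K loop is divided among threads; each thread runs
  ! the J and I loops sequentially over contiguous memory
  !$omp parallel do private(I,J) shared(Q,V)
  do K = 1, NK
    do J = 1, NJ
      do I = 1, NI
        Q(I,J,K) = V(I,J,K)
      end do
    end do
  end do
  !$omp end parallel do

  if (any (Q /= V)) error stop 'copy mismatch'
  print *, 'threads:', nthreads
  print *, 'K iterations per thread (approx):', NK / nthreads
end program outer_parallel
```

If NK / nthreads is large, each thread already has plenty of K iterations to work on, which is why I question what COLLAPSE would add in that case.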