I will soon be building a workstation with 4 processors. Each processor will have 4 cores, for a total of 16 cores. The memory will be distributed shared memory split across 4 slots, with 1 processor (i.e., 4 cores) attached to each slot.
Can OpenMP run threads on more than 1 processor, or would the operating system choose to run all the threads on a single processor, sharing them among its 4 cores?
If I want to take full advantage of this architecture, would I have to write a hybrid program that uses MPI between processors and OpenMP within each one? For example, would I have to start 4 MPI processes (one per processor) and 4 OpenMP threads in each process to achieve full usage of the 16 cores? How else can this be done?
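To make the 4 x 4 configuration concrete, here is a minimal sketch of the layout I have in mind. It is plain Python arithmetic, not actual MPI or OpenMP code. The function name `hybrid_layout` is hypothetical, and the contiguous core numbering (cores 0-3 on the first processor, 4-7 on the second, and so on) is only an assumption for illustration; real core numbering depends on the OS and hardware.

```python
def hybrid_layout(sockets=4, cores_per_socket=4):
    """Map each hypothetical MPI rank (one per processor) to the core IDs
    its OpenMP threads would occupy.

    Assumption: cores are numbered contiguously per processor. This is an
    illustrative model only, not something queried from the machine.
    """
    layout = {}
    for rank in range(sockets):
        first_core = rank * cores_per_socket
        layout[rank] = list(range(first_core, first_core + cores_per_socket))
    return layout


if __name__ == "__main__":
    layout = hybrid_layout()
    for rank, cores in layout.items():
        print(f"rank {rank}: {len(cores)} threads on cores {cores}")
    print("total cores used:", sum(len(c) for c in layout.values()))
```

In this picture each MPI rank stays on one processor and its 4 threads fill that processor's cores. What I am unsure about is whether this split is required, or whether a single OpenMP process with 16 threads would be spread across all 4 processors in the same way.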
Thank you for your time and help.