```c
void valuep(float x, float y, float *xaxis, float *yaxis, float *r,
            long long int totalnum, float *p)
{
    long long int i;
    float dist, path, tp = 0;

    /* num is the thread-count variable set elsewhere in the program */
    #pragma omp parallel for default(none) num_threads(num) \
            shared(totalnum, x, y, xaxis, yaxis, r) \
            private(i, dist, path) reduction(+:tp)
    for (i = 1; i <= totalnum; i++) {
        path = 0;
        dist = (x - xaxis[i]) * (x - xaxis[i]) + (y - yaxis[i]) * (y - yaxis[i]);
        if (dist < r[i])              /* r[i] must hold the squared radius */
            path = sqrtf(r[i] - dist);
        tp = tp + path;
    }
    /* no explicit barrier needed here: a parallel for already joins its
       threads; a bare "#pragma omp barrier" outside a parallel region
       is invalid anyway */
    *p = tp;
}
```

The main function:

```c
float ph;
long long int i, j;

for (j = 1; j <= 130000; j++) {
    for (i = 1; i <= 130000; i++) {
        valuep((float)i, (float)j, x, y, r, totalnum, &ph);
        phase[(j - 1) * 130000 + i] = ph;
    }
    printf("j= %lld\n", j);   /* newline so the progress line is flushed */
}
```

The problem:

My computer has 24 cores.

totalnum = 2730256, so each call to valuep should have plenty of work to share.

When I set num=20, each iteration of the outer j loop (one `printf("j= %lld ", j)`) takes 57 s, but when I set num=5 it takes only 44 s.

More cores, but not less time. What's wrong with my program? Has anyone run into this before?

Who can help me? Thank you very much!