OpenMP and BLAKE

Post by Neshor » Mon Feb 11, 2013 3:23 pm

I am having a lot of trouble parallelizing BLAKE with OpenMP. The specification suggests that the "column step" and the "diagonal step" can each be parallelized. I tried this, but the result is the opposite of what I expected: the code runs 10 times slower than the single-threaded version. I know that the authors of BLAKE have published BLAKE2, an improved (faster) version of BLAKE, but it has a different implementation (tree hashing) from BLAKE, which is quite hard for me to understand. My task is to compare single-threaded and multi-threaded implementations using OpenMP, so I am trying to do this with an implementation that I understand. I am not an OpenMP expert, and I want to make BLAKE multi-threaded in the simplest way possible. I have to produce a correct OpenMP implementation even if the performance is no better. This is part of my code:
#pragma omp parallel shared(n)
for(round=0; round<n; ++round)
/* column step: I want to run these 4 G32 calls in parallel, but I don't know
   whether this is the proper approach to the problem */
    #pragma omp critical
    G32( 0, 4, 8,12, 0);
    #pragma omp critical
    G32( 1, 5, 9,13, 1);
    #pragma omp critical
    G32( 2, 6,10,14, 2);
    #pragma omp critical
    G32( 3, 7,11,15, 3);

/* diagonal step: the same applies here */
    #pragma omp critical
    G32( 0, 5,10,15, 4);
    #pragma omp critical
    G32( 1, 6,11,12, 5);
    #pragma omp critical
    G32( 2, 7, 8,13, 6);
    #pragma omp critical
    G32( 3, 4, 9,14, 7);

And this is the G32 function:
#define G32(a,b,c,d,i)\
do { \
v[a] = ADD32(v[a],v[b])+XOR32(m[sigma[round][2*i]], c32[sigma[round][2*i+1]]);\
v[d] = ROT32(XOR32(v[d],v[a]),16);\
v[c] = ADD32(v[c],v[d]);\
v[b] = ROT32(XOR32(v[b],v[c]),12);\
v[a] = ADD32(v[a],v[b])+XOR32(m[sigma[round][2*i+1]], c32[sigma[round][2*i]]);\
v[d] = ROT32(XOR32(v[d],v[a]), 8);\
v[c] = ADD32(v[c],v[d]);\
v[b] = ROT32(XOR32(v[b],v[c]), 7);\
} while (0)

So the question is how to parallelize this loop properly?

Re: OpenMP and BLAKE

Post by MarkB » Tue Feb 12, 2013 3:08 am

Hi there,

You could parallelise this code using the SECTIONS construct, but the amount of work appears to be much too small to offset the overheads of the OpenMP parallel region, so you won't see any speedup. A quick look at the BLAKE2 source suggests that it is parallelised at a higher level.
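A minimal sketch of what the SECTIONS approach might look like, under some stated assumptions. The names `g32`, `rotr32`, `round_serial`, `rounds_sections` and `rounds_agree` are hypothetical and not from the original code. The `x` and `y` arguments stand in for the two `m[...] ^ c32[...]` lookups of the `G32` macro, and the `k[]` words are arbitrary test values, not BLAKE constants. `ROT32` is assumed to be a right rotation. One parallel region is opened around the whole round loop so that its start-up cost is paid only once. Each `sections` construct ends with an implicit barrier, which keeps the diagonal step from starting before the column step has finished.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word right by n bits; n must be in 1..31. */
static uint32_t rotr32(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* One G step on four state words. x and y stand in for the two
   m[...] ^ c32[...] lookups of the G32 macro (assumption: the caller
   has already resolved them). */
static void g32(uint32_t v[16], int a, int b, int c, int d,
                uint32_t x, uint32_t y) {
    v[a] = v[a] + v[b] + x;
    v[d] = rotr32(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];
    v[b] = rotr32(v[b] ^ v[c], 12);
    v[a] = v[a] + v[b] + y;
    v[d] = rotr32(v[d] ^ v[a], 8);
    v[c] = v[c] + v[d];
    v[b] = rotr32(v[b] ^ v[c], 7);
}

/* Reference: one round (column step, then diagonal step), no OpenMP. */
static void round_serial(uint32_t v[16], const uint32_t k[16]) {
    g32(v, 0, 4,  8, 12, k[0],  k[1]);
    g32(v, 1, 5,  9, 13, k[2],  k[3]);
    g32(v, 2, 6, 10, 14, k[4],  k[5]);
    g32(v, 3, 7, 11, 15, k[6],  k[7]);
    g32(v, 0, 5, 10, 15, k[8],  k[9]);
    g32(v, 1, 6, 11, 12, k[10], k[11]);
    g32(v, 2, 7,  8, 13, k[12], k[13]);
    g32(v, 3, 4,  9, 14, k[14], k[15]);
}

/* n rounds with the four G calls of each step run as OpenMP sections.
   Every thread executes the loop; r is private because it is declared
   inside the parallel region. Within a step the four calls touch
   disjoint state words, so no critical section is needed. */
static void rounds_sections(uint32_t v[16], const uint32_t k[16], int n) {
    #pragma omp parallel num_threads(4)
    {
        for (int r = 0; r < n; ++r) {
            #pragma omp sections
            {
                #pragma omp section
                g32(v, 0, 4,  8, 12, k[0],  k[1]);
                #pragma omp section
                g32(v, 1, 5,  9, 13, k[2],  k[3]);
                #pragma omp section
                g32(v, 2, 6, 10, 14, k[4],  k[5]);
                #pragma omp section
                g32(v, 3, 7, 11, 15, k[6],  k[7]);
            } /* implicit barrier: column step is complete */
            #pragma omp sections
            {
                #pragma omp section
                g32(v, 0, 5, 10, 15, k[8],  k[9]);
                #pragma omp section
                g32(v, 1, 6, 11, 12, k[10], k[11]);
                #pragma omp section
                g32(v, 2, 7,  8, 13, k[12], k[13]);
                #pragma omp section
                g32(v, 3, 4,  9, 14, k[14], k[15]);
            } /* implicit barrier: diagonal step is complete */
        }
    }
}

/* Returns 1 if n serial rounds and n section-based rounds leave the
   state identical, 0 otherwise. Inputs are arbitrary test values. */
int rounds_agree(int n) {
    uint32_t a[16], b[16], k[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = b[i] = 0x9E3779B9u * (uint32_t)(i + 1);
        k[i] = 0x01234567u + (uint32_t)i * 0x11111111u;
    }
    for (int r = 0; r < n; ++r)
        round_serial(a, k);
    rounds_sections(b, k, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

With GCC this would be built with `-fopenmp`; without that flag the pragmas are ignored and the code runs serially with the same result. The sketch only checks correctness. Each section does a handful of arithmetic operations, and all sixteen state words are likely to sit in the same cache line, so the barriers and cache traffic can be expected to cost far more than the work itself.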

Hope that helps,