After some trial and error, the issue is resolved. Part of my code is called millions of times, and each call uses a few local std::vector objects whose element type is a few hundred bytes in size. The memory management should be trivial compared to the complexity of the computations involved, yet somehow it drags down the whole process.
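A minimal sketch of the pattern described here, using hypothetical names (`Sample`, `process_local`, `process_reused`) and an assumed 256-byte element type; it is not the actual code and not necessarily the fix that was applied. It contrasts a function that builds a local std::vector on every call with one that reuses a caller-owned scratch buffer, which is one common way to cut per-call allocation cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type of a few hundred bytes (32 doubles = 256 bytes).
struct Sample {
    double values[32];
};

// Variant A: a fresh local vector on every call.
// Each call allocates heap memory, value-initializes n elements,
// and frees the memory again on return.
double process_local(std::size_t n) {
    std::vector<Sample> work(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        work[i].values[0] = static_cast<double>(i);
        sum += work[i].values[0];
    }
    return sum;
}

// Variant B: the caller owns the scratch buffer and passes it in.
// clear() keeps the existing capacity, so resize() only reallocates
// when n exceeds what earlier calls have already reserved.
double process_reused(std::vector<Sample>& work, std::size_t n) {
    work.clear();
    work.resize(n);
    assert(work.size() == n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        work[i].values[0] = static_cast<double>(i);
        sum += work[i].values[0];
    }
    return sum;
}
```

With Variant B, the caller would declare one `std::vector<Sample>` outside the hot loop and pass it to every call, so after the first few iterations no further heap allocations should occur. Whether this helps in a given program is something to confirm by profiling both variants.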
I think it's not unusual for memory footprint to strongly penalize performance. However, can you see any relationship between memory (vector) accesses and debug versus release builds?