Anyone have problems with STL vector resizing, and OpenMP?

General OpenMP discussion

Anyone have problems with STL vector resizing, and OpenMP?

Postby myusrnam » Mon Sep 22, 2008 9:04 am

I am wondering if anyone has run into the following problem----by the way, it is an assumption on my part that this problem exists! I believe it does, given what I know about the STL library, but I have not yet followed a debug trace to confirm an instance of it. I am worried about it, however, given that the STL and in particular, the STL vector is widely used.

The problem I see is this : if an STL vector is full and the user pushes a new element onto it, automated resizing will start. My understanding of what happens is that (a) a new, larger block of memory is allocated to hold the vector's contents, (b) the existing contents of the vector, plus the new element just 'pushed', are copied to that new location, and (here is the important part), (c) the old block of memory previously used by the vector is "freed". I put "freed" in parentheses because I am not sure if STL actually returns the memory to the outer program's memory manager, or keeps it on a list of memory available for STL's reuse. But, either way, the old block of memory is made available for new content.

So, what if we multi-threaded a body of code that had, as a shared variable, an STL vector object. Let's say there is a line of code that does a "push" onto the vector. We can insert a "critical section" pragma for this line. But, I don't believe this protects other threads that might be reading the content of the vector at the same (or psuedo-same) time that the push is being done. If the push triggers automated resizing of the vector, it seems to me there is a finite chance that another thread might read corrupted data. For instance, what if the following happens: thread A launches a read from the vector, but the data is not in cache and the processor executing thread A schedules a memory fetch, but, before this fetch goes out (meaning, the memory address has been sent to the processor's bus interface unit, but it's sitting queued there), thread B is doing a critical section push (onto the vector) and the entire process of copying the vector's contents and "freeing" the old block of memory completes, and, then---all of this happening before thread A's processor gets access to the memory bus----another thread, thread C, causes something to be written to the old memory address. For instance, thread C could be creating a new STL vector, a private variable for that thread, and STL might be letting it reuse the memory that was freed. Then, when thread A's processor completes its memory fetch, it will get corrupted data.

This scenario seems possible. I can imagine ways around it---for instance, make all the code that reads from the vector be critical sections just like the push. But, that would be incompatible with good performance.

So, what has others' experience been? Have you thought about this problem, or experienced it? Have you implemented a solution?

Thanks,

Richard
myusrnam
 
Posts: 17
Joined: Wed Sep 17, 2008 4:43 pm

Re: Anyone have problems with STL vector resizing, and OpenMP?

Postby ejd » Tue Sep 23, 2008 7:01 am

What you are describing is a data race condition. Before the OpenMP V3.0 spec, the consequences of a data race were not really clearly spelled out. In the OpenMP V3.0 spec, in section 1.4 Memory Model (P14:19-20), it states, "if a data race occurs then the result of the program is unspecified". So while you may hate to lose performance by protecting these sections of code, you have no choice if you want the program to function correctly.

Since I am a "casual" user of C++, I also asked my resident expert. His reply follows:

Myusrnam is correct that modifying a container will invalidate iterators and references to elements, and that other threads can be affected without knowing that the container has been modified without their knowledge. In the general multi-threaded case, you need to lock a container for as long as you need iterators or references into the object to remain valid. In a case where, for example, you know that elements will be added only at the end of a vector, you could keep indices into the vector instead of iterators or references. An index in this context will remain valid.
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Anyone have problems with STL vector resizing, and OpenMP?

Postby myusrnam » Wed Sep 24, 2008 6:42 am

Thanks for your reply. I agree with everything said except the last part, about indices into an STL vector still being valid even if the vector is modified. I think the case that I outlined in my first email can be construed as using an index, and I think it will produce an invalid result.

In that example, thread A is reading the vector. Let's assume the code that does the read is using an index, i.e., the code looks something like this (assume 'index' is an integer variable):
someVariable = vectorObject[index];
It is my understanding that an STL vector holds all its elements in a contiguous block of memory. The "index" will be used to compute an offset from the base of that block of memory, i.e., it will be used to compute a memory address. In my example, the processor executing this read sends this address to its load-store unit. But, the data is *not* found in the cache. This means that an access will have to be made over the system bus to main memory (processors usually have multiple levels of caching, but let's assume the data is not in any of them). However, we are also assuming that this bus is shared by other processors executing other threads. In my example, those other processors grab the bus first and use it to overwrite the memory at the address thread A is trying to read from. By the time thread A gets the data back, it has been corrupted.

So, I don't think use of indices in STL vectors is safe under multi-threading.

When I decided to make this post, I wanted to do so to confirm that use of an STL vector has to be very tightly controlled under mulit-threading, and I wanted to discuss the implications of this, which I'll now do. I think this is a problem, in that the way many people want to use openMP is to alter code originally written for single thread operation. The use of STL and in particular the STL vector is so common in C++ that a lot of the code we will want to alter will have this problem. The problem is particularly tricky because the operation that invalidates the data----which is the built-in automated resizing algorithm of STL---is hidden from the user. The user sees a line of code like this:

someVector.push_back(someObject);

and does not realize that this seemingly innocent looking line *may not* just be adding one more element to an existing block of memroy, it may overwriting the entire contents of that block of memory by triggering STL's automed resizing of the vector, which copies all the existing data elsewhere and "frees" the old block.

This creates a large methodology problem. The attractiveness of STL in the past has been precisely that the user, in general, did not need to know the details of how things were being done under the covers. Now, when using STL under multi-threading, the user *does* need to know these things, and the details are hidden in documents and files the user has been told he/she needn't look at or understand. This will be a big methodology shift. It will be very painful to migrate a lot of existing C++ code to OpenMP, because of this issue. Even if a person understands the problem, it takes a lot of time to restructure existing code to avoid it.

I guess I'm just stating the obvious. But, then again, maybe it's not obvious to some, and I am perhaps doing a service by stating it.

Thanks!

Richard
myusrnam
 
Posts: 17
Joined: Wed Sep 17, 2008 4:43 pm

Re: Anyone have problems with STL vector resizing, and OpenMP?

Postby ejd » Wed Sep 24, 2008 9:30 am

The problem you are pointing out is not only a problem with OpenMP, but with any attempt at threading a C++ program. It depends on how the STL is implemented as to how much locking needs to be done and at what level. There are some parallel implementations of the STL that do better for threading, but have a lot of locking overhead. The other problem is in the terminology. A "thread-safe" implementation of the STL can mean several different things. It could mean that the library can be called by different threads. However, it may not mean that the data objects that they handle are really thread-safe. All in all, it is quite a mess - one not specific to C++, but to any library trying to provide parallel support.

So you are correct (and this has been pointed out in some of the other topics in the forum). When trying to parallelize any program, you need to be careful when using any function or subroutine. You really have to know that it is thread-safe (in all meanings of the word) or you need to do the locking required to make sure it is safe. Parallelism unfortunately still is not for the faint at heart and the older languages which OpenMP tries to cover haven't addressed the issue yet (though Fortran is looking at co-arrays and C++ is looking at an MPI interface). It really has to be done at the language level to be done right and it will still be awhile before that is done. Meanwhile, OpenMP is an attempt to give the user some ability to parallelize their program - though it has it's pitfalls. But what doesn't. We could spend hours talking about the problems in Fortran, C, and C++. That is why there are new programming languages developed every day by someone.
ejd
 
Posts: 1025
Joined: Wed Jan 16, 2008 7:21 am

Re: Anyone have problems with STL vector resizing, and OpenMP?

Postby myusrnam » Thu Sep 25, 2008 7:43 am

First of all, thank you for your reply. I am new to mulit-threading. I don't mean that I'm new to the concept, and I've even used mulit-threaded programs some in the past. But, what is new (to me) is that this is the first time I've taken on the task of trying to make an existing application program run under multi-threading. I apologize if my emails have the tone of someone discovering a major problem others are unaware of----I should have been aware that others have been working in this area for a long time and *are* aware of all that I'm finding out. So, yes, it is difficult to do this work because support in existing languages / libraries is not what it needs to be. That seems to be true, and thank you for validating it.

Before concluding the thread (no pun intended) on this topic, if anyone has any direct experience solving this particular problem, i.e., making use of the STL vector "thread safe", and in particular, controlling automated resizing so that it is thread safe, I'd be interested to hear about it.

I've already considered the fact that STL provides a couple of different ways to "pre-sizing" a vector, precisely so that the user can avoid automated resizing. The problem, of course, is that the user might be wrong in the pre-sizing estimate, or it might not be possible at all, under the circumstances, to presize (for example, when storing objects read in from some user-supplied input, where the number of objects is not possible to know in advance)

Richard
myusrnam
 
Posts: 17
Joined: Wed Sep 17, 2008 4:43 pm


Return to Using OpenMP

Who is online

Users browsing this forum: No registered users and 12 guests