On Wed, 24 Mar 2010, Shawn Pearce wrote:

> On Wed, Mar 24, 2010 at 10:53 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Wed, 24 Mar 2010, Fredrik Kuivinen wrote:
> >
> >> On Wed, Mar 24, 2010 at 00:50, Nicolas Pitre <nico@fluxnic.net> wrote:
> >> > On Tue, 23 Mar 2010, Fredrik Kuivinen wrote:
> >> >
> >> >> On Tue, Mar 23, 2010 at 19:43, Shawn O. Pearce <spearce@spearce.org> wrote:
> >> >> > If that is what we are doing, disabling the release of pack windows
> >> >> > when malloc fails, why can't we do that all of the time?
> >> >>
> >> >> The idea was that most git programs are single threaded, so they can
> >> >> still benefit from releasing the pack windows when they are low on
> >> >> memory.
> >> >
> >> > This is bobus. The Git program using the most memory is probably
> >> > pack-objects and it is threaded.  Most single-threaded programs don't
> >> > use close to as much memory.
> >>
> >> Ok, you are right. But xmalloc/xrealloc cannot be used in multiple
> >> threads simultaneously without some serialization.
> >>
> >> For example, I think there are some potential race conditions in the
> >> pack-objects code. In the threaded code we have the following call
> >> chains leading to xcalloc, xmalloc, and xrealloc:
> >>
> >> find_deltas -> xcalloc
> >> find_deltas -> do_compress -> xmalloc
> >> find_deltas -> try_delta -> xrealloc
> >> find_deltas -> try_delta -> read_sha1_file -> ... -> xmalloc  (called
> >> with read_lock held, but it can still race with the other calls)
> >>
> >> As far as I can see there is no serialization between these calls.
> >
> > True.  We already have a problem.  This is nasty.
> 
> The easy solution is probably to remove the use of xmalloc from
> find_deltas code path.  But then we run into hard failures when we
> can't get the memory we need, there isn't a way to recover from a
> malloc() failure deep within read_sha1_file for example.

The read_sha1_file path is not the problem -- it is always protected 
against concurrency with a mutex.

It is more about do_compress() called on line 1476 of pack-objects.c for 
example.


> The current solution is the best we can do, try to ditch pack windows 
> and hope that releases sufficient virtual memory space that a second 
> malloc() attempt can succeed by increasing heap.
> 
> We could use a mutex during the malloc failure code-path of xmalloc,
> to ensure only one thread goes through that pack window cleanup at a
> time.  But that will still mess with the main thread which doesn't
> really want to acquire mutexes during object access as it uses the
> existing pack windows.

Right.

> I thought pack-objects did all object access from the main thread and
> only delta searches on the worker threads?

No.  Each thread is responsible for grabbing its own data set.

> If that is true, maybe we
> can have the worker threads signal the main thread on malloc failure
> to release pack windows, and then wait for that signal to be
> acknowledged before they attempt to retry the malloc.  This means the
> main thread would need to periodically test that condition as its
> dispatching batches of objects to the workers.
> 
> Ugly.

Indeed.

The real solution, of course, would be to have pack window manipulations 
protected by a mutex of its own.  This, plus another mutex for the delta 
base cache, and then read_sha1_file() could almost be reentrant.

Another solution could be for xmalloc() to use a function pointer for 
the method to use on malloc error path, which would default to a 
function calling release_pack_memory(size, -1).  Then pack-objects.c 
would override the default with its own to acquire the read_mutex around 
the call to release_pack_memory().  That is probably the easiest 
solution for now.


Nicolas