* Concurrent pushes updating the same ref
@ 2011-01-06 15:46 Marc Branchaud
  2011-01-06 16:30 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Branchaud @ 2011-01-06 15:46 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

[ BACKGROUND: I've modified our build system to push a custom ref at the
start of each build.  The aim is to identify in the repo which revision got
built.  For us, an overall "build" consists of creating about a dozen
products, all from the same source tree.  The build system (Hudson) launches
each product's build concurrently on one or more build slaves.  Each of those
individual product builds clones the repo, checks out the appropriate
revision, and pushes up the custom ref.  (I would have liked to make the
Hudson master job push up the ref, instead of all the slave jobs, but I
couldn't find a way to do that.) ]

Usually this works:  Each slave is setting the ref to the same value, so the
order of the updates doesn't matter.  But every once in a while, the push
fails with:

fatal: Unable to create
'/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
If no other git process is currently running, this probably means a
git process crashed in this repository earlier. Make sure no other git
process is running and remove the file manually to continue.
fatal: The remote end hung up unexpectedly

I think the cause is pretty obvious, and in a normal interactive situation
the solution would be to simply try again.  But in a script trying again
isn't so straightforward.

So I'm wondering if there's any sense or desire to make git a little more
flexible here.  Maybe teach it to wait and try again once or twice when it
sees a lock file.  I presume that normally a ref lock file should disappear
pretty quickly, so there shouldn't be a need to wait very long.

Thoughts?

		M.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concurrent pushes updating the same ref
  2011-01-06 15:46 Concurrent pushes updating the same ref Marc Branchaud
@ 2011-01-06 16:30 ` Jeff King
  2011-01-06 16:48   ` Shawn Pearce
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jeff King @ 2011-01-06 16:30 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Git Mailing List

On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:

> fatal: Unable to create
> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
> If no other git process is currently running, this probably means a
> git process crashed in this repository earlier. Make sure no other git
> process is running and remove the file manually to continue.
> fatal: The remote end hung up unexpectedly
> 
> I think the cause is pretty obvious, and in a normal interactive situation
> the solution would be to simply try again.  But in a script trying again
> isn't so straightforward.
> 
> So I'm wondering if there's any sense or desire to make git a little more
> flexible here.  Maybe teach it to wait and try again once or twice when it
> sees a lock file.  I presume that normally a ref lock file should disappear
> pretty quickly, so there shouldn't be a need to wait very long.

Yeah, we probably should try again. The simplest possible (and untested)
patch is below. However, a few caveats:

  1. This patch unconditionally retries for all lock files. Do all
     callers want that? I wonder if there are any exploratory lock
     acquisitions that would rather return immediately than have some
     delay.

  2. The number of tries and sleep time are pulled out of a hat.

  3. Even with retries, I don't know if you will get the behavior you
     want. The lock procedure for refs is:

        1. get the lock
        2. check and remember the sha1
        3. release the lock
        4. do some long-running work (like the actual push)
        5. get the lock
        6. check that the sha1 is the same as the remembered one
        7. update the sha1
        8. release the lock

     Right now you are getting contention on the lock itself. But may
     you not also run afoul of step (6) above? That is, one push updates
     the ref from A to B, then the other one in attempting to go from A
     to B sees that it has already changed to B under our feet and
     complains?

     I can certainly think of a rule around that special case (if we are
     going to B, and it already changed to B, silently leave it alone
     and pretend we wrote it), but I don't know how often that would be
     useful in the real world.
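     That special case -- tolerating a ref that has already moved to the
     value we were about to write -- could be sketched like this
     (illustrative C only; the function and names are made up for this
     example, not actual git internals):

```c
#include <string.h>

/* Step (6) with the proposed relaxation: the update is allowed if the
 * ref is still where we remembered it, or if it already points at the
 * value we were about to write (someone else won the race with the
 * same update).  Refs are represented as plain strings here. */
static int ref_update_allowed(const char *remembered, const char *current,
			      const char *target)
{
	if (!strcmp(current, remembered))
		return 1;	/* nothing changed under us; normal case */
	if (!strcmp(current, target))
		return 1;	/* already at our value; pretend we wrote it */
	return 0;		/* genuine conflict; fail the update */
}
```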

Anyway, patch (for discussion, not inclusion) is below.

diff --git a/lockfile.c b/lockfile.c
index b0d74cd..3329719 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -122,7 +122,7 @@ static char *resolve_symlink(char *p, size_t s)
 }
 
 
-static int lock_file(struct lock_file *lk, const char *path, int flags)
+static int lock_file_single(struct lock_file *lk, const char *path, int flags)
 {
 	if (strlen(path) >= sizeof(lk->filename))
 		return -1;
@@ -155,6 +155,21 @@ static int lock_file(struct lock_file *lk, const char *path, int flags)
 	return lk->fd;
 }
 
+static int lock_file(struct lock_file *lk, const char *path, int flags)
+{
+	int tries;
+	int fd;
+	for (tries = 0; tries < 3; tries++) {
+		fd = lock_file_single(lk, path, flags);
+		if (fd >= 0)
+			return fd;
+		if (errno != EEXIST)
+			return fd;
+		sleep(1);
+	}
+	return fd;
+}
+
 static char *unable_to_lock_message(const char *path, int err)
 {
 	struct strbuf buf = STRBUF_INIT;


* Re: Concurrent pushes updating the same ref
  2011-01-06 16:30 ` Jeff King
@ 2011-01-06 16:48   ` Shawn Pearce
  2011-01-06 17:28     ` Ilari Liusvaara
  2011-01-06 17:12   ` Marc Branchaud
  2011-01-06 19:37   ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: Shawn Pearce @ 2011-01-06 16:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Marc Branchaud, Git Mailing List

On Thu, Jan 6, 2011 at 08:30, Jeff King <peff@peff.net> wrote:
>
> Yeah, we probably should try again. The simplest possible (and untested)
> patch is below. However, a few caveats:
>
>  1. This patch unconditionally retries for all lock files. Do all
>     callers want that? I wonder if there are any exploratory lock
>     acquisitions that would rather return immediately than have some
>     delay.

I don't see why not.  We shouldn't be exploring to see if a lock is
possible anywhere.

>  2. The number of tries and sleep time are pulled out of a hat.

FWIW, JGit has started to do some of this stuff for Windows.  We're
using 10 retries, with a delay of 100 milliseconds between each.  This
was also pulled out of a hat, but it seems to have resolved the bug
reports that came in on Windows.  We unfortunately have to do retries
on directory and file deletion.
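Translated onto Jeff's wrapper, those parameters would look something like
the sketch below.  This is a self-contained simulation: try_lock_once() is
a stand-in that fails with EEXIST a few times before succeeding, not the
real lock code.

```c
#include <errno.h>
#include <unistd.h>

/* Stand-in for a single-shot lock attempt: fails with EEXIST while a
 * simulated contender holds the lock, then succeeds with a fake fd. */
static int contention_left = 3;
static int try_lock_once(void)
{
	if (contention_left > 0) {
		contention_left--;
		errno = EEXIST;
		return -1;
	}
	return 7; /* pretend this is the lock file's descriptor */
}

/* 10 tries, 100ms apart, as in JGit; only EEXIST is worth retrying,
 * since any other error will not go away on its own. */
static int lock_with_retry(void)
{
	int tries, fd = -1;
	for (tries = 0; tries < 10; tries++) {
		fd = try_lock_once();
		if (fd >= 0 || errno != EEXIST)
			return fd;
		usleep(100 * 1000); /* 100 milliseconds between attempts */
	}
	return fd;
}
```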

>  3. Even with retries, I don't know if you will get the behavior you
>     want. The lock procedure for refs is:
>
>        1. get the lock
>        2. check and remember the sha1
>        3. release the lock

Why are we locking the ref to read it?  You can read a ref atomically
without locking.

>        4. do some long-running work (like the actual push)
>        5. get the lock
>        6. check that the sha1 is the same as the remembered one
>        7. update the sha1
>        8. release the lock
>
>     Right now you are getting contention on the lock itself. But may
>     you not also run afoul of step (6) above? That is, one push updates
>     the ref from A to B, then the other one in attempting to go from A
>     to B sees that it has already changed to B under our feet and
>     complains?

Not if it's a force push.  :-)

-- 
Shawn.


* Re: Concurrent pushes updating the same ref
  2011-01-06 16:30 ` Jeff King
  2011-01-06 16:48   ` Shawn Pearce
@ 2011-01-06 17:12   ` Marc Branchaud
  2011-01-10 22:14     ` Marc Branchaud
  2011-01-06 19:37   ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: Marc Branchaud @ 2011-01-06 17:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 11-01-06 11:30 AM, Jeff King wrote:
> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
> 
>> fatal: Unable to create
>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>> If no other git process is currently running, this probably means a
>> git process crashed in this repository earlier. Make sure no other git
>> process is running and remove the file manually to continue.
>> fatal: The remote end hung up unexpectedly
>>
>> I think the cause is pretty obvious, and in a normal interactive situation
>> the solution would be to simply try again.  But in a script trying again
>> isn't so straightforward.
>>
>> So I'm wondering if there's any sense or desire to make git a little more
>> flexible here.  Maybe teach it to wait and try again once or twice when it
>> sees a lock file.  I presume that normally a ref lock file should disappear
>> pretty quickly, so there shouldn't be a need to wait very long.
> 
> Yeah, we probably should try again. The simplest possible (and untested)
> patch is below. However, a few caveats:
> 
>   1. This patch unconditionally retries for all lock files. Do all
>      callers want that? I wonder if there are any exploratory lock
>      acquisitions that would rather return immediately than have some
>      delay.
> 
>   2. The number of tries and sleep time are pulled out of a hat.
> 
>   3. Even with retries, I don't know if you will get the behavior you
>      want. The lock procedure for refs is:
> 
>         1. get the lock
>         2. check and remember the sha1
>         3. release the lock
>         4. do some long-running work (like the actual push)
>         5. get the lock
>         6. check that the sha1 is the same as the remembered one
>         7. update the sha1
>         8. release the lock
> 
>      Right now you are getting contention on the lock itself. But may
>      you not also run afoul of step (6) above? That is, one push updates
>      the ref from A to B, then the other one in attempting to go from A
>      to B sees that it has already changed to B under our feet and
>      complains?

Could not anything run afoul of step (6)?  Who knows what might happen in
step (4)...

However, in my particular case I'm using a "force" refspec:

	git push origin +HEAD:refs/builds/${TAG}

so (as Shawn says) step (6) shouldn't matter, right?  Plus, all the
concurrent pushes are setting the ref to the same value anyway.

This is fairly degenerate behaviour though.

>      I can certainly think of a rule around that special case (if we are
>      going to B, and it already changed to B, silently leave it alone
>      and pretend we wrote it), but I don't know how often that would be
>      useful in the real world.

Yes -- useful in my case, but otherwise...  Still, I think it would be
more correct to do that.

		M.


* Re: Concurrent pushes updating the same ref
  2011-01-06 16:48   ` Shawn Pearce
@ 2011-01-06 17:28     ` Ilari Liusvaara
  0 siblings, 0 replies; 8+ messages in thread
From: Ilari Liusvaara @ 2011-01-06 17:28 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jeff King, Marc Branchaud, Git Mailing List

On Thu, Jan 06, 2011 at 08:48:11AM -0800, Shawn Pearce wrote:
> On Thu, Jan 6, 2011 at 08:30, Jeff King <peff@peff.net> wrote:
> >
> >     Right now you are getting contention on the lock itself. But may
> >     you not also run afoul of step (6) above? That is, one push updates
> >     the ref from A to B, then the other one in attempting to go from A
> >     to B sees that it has already changed to B under our feet and
> >     complains?
> 
> Not if it's a force push.  :-)

IIRC, there are no wire protocol bits to denote a forced push, the
force option only overrides client-side checks. Thus, even forced pushes
can fail due to race conditions...

-Ilari


* Re: Concurrent pushes updating the same ref
  2011-01-06 16:30 ` Jeff King
  2011-01-06 16:48   ` Shawn Pearce
  2011-01-06 17:12   ` Marc Branchaud
@ 2011-01-06 19:37   ` Junio C Hamano
  2011-01-06 21:51     ` Marc Branchaud
  2 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2011-01-06 19:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Marc Branchaud, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>
>> fatal: Unable to create
>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>> If no other git process is currently running, this probably means a
>> git process crashed in this repository earlier. Make sure no other git
>> process is running and remove the file manually to continue.
>> fatal: The remote end hung up unexpectedly
>> 
>> I think the cause is pretty obvious, and in a normal interactive situation
>> the solution would be to simply try again.  But in a script trying again
>> isn't so straightforward.
>> 
>> So I'm wondering if there's any sense or desire to make git a little more
>> flexible here.  Maybe teach it to wait and try again once or twice when it
>> sees a lock file.  I presume that normally a ref lock file should disappear
>> pretty quickly, so there shouldn't be a need to wait very long.
>
> Yeah, we probably should try again. The simplest possible (and untested)
> patch is below. However, a few caveats:
>
>   1. This patch unconditionally retries for all lock files. Do all
>      callers want that?

I actually have to say that _no_ caller should want this.  If somebody
earlier crashed, we would want to know about it (and how).  If somebody
else alive is actively holding a lock, why not make it the responsibility
of a calling script to decide if it wants to retry itself or perhaps
decide to do something else?


* Re: Concurrent pushes updating the same ref
  2011-01-06 19:37   ` Junio C Hamano
@ 2011-01-06 21:51     ` Marc Branchaud
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Branchaud @ 2011-01-06 21:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On 11-01-06 02:37 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>>
>>> fatal: Unable to create
>>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>>> If no other git process is currently running, this probably means a
>>> git process crashed in this repository earlier. Make sure no other git
>>> process is running and remove the file manually to continue.
>>> fatal: The remote end hung up unexpectedly
>>>
>>> I think the cause is pretty obvious, and in a normal interactive situation
>>> the solution would be to simply try again.  But in a script trying again
>>> isn't so straightforward.
>>>
>>> So I'm wondering if there's any sense or desire to make git a little more
>>> flexible here.  Maybe teach it to wait and try again once or twice when it
>>> sees a lock file.  I presume that normally a ref lock file should disappear
>>> pretty quickly, so there shouldn't be a need to wait very long.
>>
>> Yeah, we probably should try again. The simplest possible (and untested)
>> patch is below. However, a few caveats:
>>
>>   1. This patch unconditionally retries for all lock files. Do all
>>      callers want that?
> 
> I actually have to say that _no_ caller should want this.  If somebody
> earlier crashed, we would want to know about it (and how).  If somebody
> else alive is actively holding a lock, why not make it the responsibility
> of a calling script to decide if it wants to retry itself or perhaps
> decide to do something else?

I'm not sure I follow this.

How would retrying a few times prevent us from finding out about an earlier
crash?  It's not like we're overriding the lock by retrying.  Nobody's going
to be able to remove a lock created by a crashed process, right?

And if someone active doesn't release the lock and the low-level code retried
a few times, the caller can still decide what to do.  I don't see how it
would even impact that decision -- if the caller wants to try again, the
system can still retry a few times underneath the caller's one retry.  It
seems fine to me.

		M.


* Re: Concurrent pushes updating the same ref
  2011-01-06 17:12   ` Marc Branchaud
@ 2011-01-10 22:14     ` Marc Branchaud
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Branchaud @ 2011-01-10 22:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 11-01-06 12:12 PM, Marc Branchaud wrote:
> On 11-01-06 11:30 AM, Jeff King wrote:
>> On Thu, Jan 06, 2011 at 10:46:38AM -0500, Marc Branchaud wrote:
>>
>>> fatal: Unable to create
>>> '/usr/xiplink/git/public/Main.git/refs/builds/3.3.0-3.lock': File exists.
>>> If no other git process is currently running, this probably means a
>>> git process crashed in this repository earlier. Make sure no other git
>>> process is running and remove the file manually to continue.
>>> fatal: The remote end hung up unexpectedly
>>>
>>> I think the cause is pretty obvious, and in a normal interactive situation
>>> the solution would be to simply try again.  But in a script trying again
>>> isn't so straightforward.
>>>
>>> So I'm wondering if there's any sense or desire to make git a little more
>>> flexible here.  Maybe teach it to wait and try again once or twice when it
>>> sees a lock file.  I presume that normally a ref lock file should disappear
>>> pretty quickly, so there shouldn't be a need to wait very long.
>>
>> Yeah, we probably should try again. The simplest possible (and untested)
>> patch is below. However, a few caveats:
>>
>>   1. This patch unconditionally retries for all lock files. Do all
>>      callers want that? I wonder if there are any exploratory lock
>>      acquisitions that would rather return immediately than have some
>>      delay.
>>
>>   2. The number of tries and sleep time are pulled out of a hat.
>>
>>   3. Even with retries, I don't know if you will get the behavior you
>>      want. The lock procedure for refs is:
>>
>>         1. get the lock
>>         2. check and remember the sha1
>>         3. release the lock
>>         4. do some long-running work (like the actual push)
>>         5. get the lock
>>         6. check that the sha1 is the same as the remembered one
>>         7. update the sha1
>>         8. release the lock
>>
>>      Right now you are getting contention on the lock itself. But may
>>      you not also run afoul of step (6) above? That is, one push updates
>>      the ref from A to B, then the other one in attempting to go from A
>>      to B sees that it has already changed to B under our feet and
>>      complains?
> 
> Could not anything run afoul of step (6)?  Who knows what might happen in
> step (4)...
> 
> However, in my particular case I'm using a "force" refspec:
> 
> 	git push origin +HEAD:refs/builds/${TAG}
> 
> so (as Shawn says) step (6) shouldn't matter, right?  Plus, all the
> concurrent pushes are setting the ref to the same value anyway.

Well, after modifying my build script to ignore failed pushes, I do
occasionally see failures like this:

remote: fatal: Invalid revision range 0000000000000000000000000000000000000000..1c58dc4c3fdd9475d26d0eb797cc096fb622a594
error: Ref refs/builds/3.3.0-9 is at 1c58dc4c3fdd9475d26d0eb797cc096fb622a594 but expected 0000000000000000000000000000000000000000
remote: error: failed to lock refs/builds/3.3.0-9

So I guess even the "force" refspec is getting blocked by step 6.

FYI, the repo receiving the push is running git 1.7.1.

		M.

