All of lore.kernel.org
 help / color / mirror / Atom feed
* concurrent fetches to update same mirror
@ 2011-01-05 20:33 Neal Kreitzinger
  2011-01-05 20:47 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Neal Kreitzinger @ 2011-01-05 20:33 UTC (permalink / raw)
  To: git

If two or more different users perform a git-fetch on the same mirror 
(--mirror) repo concurrently, could that cause corruption?  I tried a manual 
test using the git protocol over separate machines and they both thought 
they needed to do the full updates and they both appeared to work.  I'm not 
sure if git is serializing this, or if it is possible for concurrent fetches 
to step on each other.

v/r,
Neal 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 20:33 concurrent fetches to update same mirror Neal Kreitzinger
@ 2011-01-05 20:47 ` Jeff King
  2011-01-05 20:51   ` Shawn Pearce
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-01-05 20:47 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: git

On Wed, Jan 05, 2011 at 02:33:36PM -0600, Neal Kreitzinger wrote:

> If two or more different users perform a git-fetch on the same mirror 
> (--mirror) repo concurrently, could that cause corruption?  I tried a manual 
> test using the git protocol over separate machines and they both thought 
> they needed to do the full updates and they both appeared to work.  I'm not 
> sure if git is serializing this, or if it is possible for concurrent fetches 
> to step on each other.

No, it shouldn't cause corruption, but it will cause wasted effort and
it may cause one to report failure. The fetch process gets all of the
objects first, and then updates the ref (so we never have refs that
point to object we didn't get yet). So both of the concurrent fetches
will see that we have a big set of objects to get and will work on
getting them at the same time, after which they will update the refs
appropriately (presumably to the same thing).

I haven't looked specifically at how fetch does locking, but usually the
procedure is to lock the ref, fetch the old value, unlock it, then do
some long-running task (like fetching objects), then lock again, check
that the old value didn't change out from under us, update it, then
unlock. In which case one of the fetches might see "oops, somebody
updated while we were fetching" and complain.

However, in the default configuration, we fetch using a "+" refspec,
which forces update of the ref even in the case of a non-fast-forward. I
don't know whether that force also would override any lock-checking.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 20:47 ` Jeff King
@ 2011-01-05 20:51   ` Shawn Pearce
  2011-01-05 20:53     ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Pearce @ 2011-01-05 20:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Neal Kreitzinger, git

On Wed, Jan 5, 2011 at 12:47, Jeff King <peff@peff.net> wrote:
>
> However, in the default configuration, we fetch using a "+" refspec,
> which forces update of the ref even in the case of a non-fast-forward. I
> don't know whether that force also would override any lock-checking.

Nope, it doesn't.  We still use locking to update the refs, to ensure
the update is seen atomically by a reader.  The + just means don't
check that the old value is fully reachable from the new after the
lock as been taken.

If both fetch processes try to update the same ref at the same time,
one will get the lock and continue, and the other will crash with an
error (because the lock was busy).  If one is slightly slower than the
other, they will probably update the refs twice, with the slower fetch
updating what the faster one had just updated.  :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 20:51   ` Shawn Pearce
@ 2011-01-05 20:53     ` Jeff King
  2011-01-05 21:13       ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-01-05 20:53 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Neal Kreitzinger, git

On Wed, Jan 05, 2011 at 12:51:12PM -0800, Shawn Pearce wrote:

> On Wed, Jan 5, 2011 at 12:47, Jeff King <peff@peff.net> wrote:
> >
> > However, in the default configuration, we fetch using a "+" refspec,
> > which forces update of the ref even in the case of a non-fast-forward. I
> > don't know whether that force also would override any lock-checking.
> 
> Nope, it doesn't.  We still use locking to update the refs, to ensure
> the update is seen atomically by a reader.  The + just means don't
> check that the old value is fully reachable from the new after the
> lock as been taken.

Good, that's what IMHO it _should_ do. :)

> If both fetch processes try to update the same ref at the same time,
> one will get the lock and continue, and the other will crash with an
> error (because the lock was busy).  If one is slightly slower than the
> other, they will probably update the refs twice, with the slower fetch
> updating what the faster one had just updated.  :-)

I assumed it would take the "old" value at the very beginning of the
fetch (before talking with the remote), and then see that the ref was
changed under our feet. Or does it simply do it at the end?

... goes to read code ...

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 20:53     ` Jeff King
@ 2011-01-05 21:13       ` Jeff King
  2011-01-05 22:34         ` Neal Kreitzinger
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeff King @ 2011-01-05 21:13 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Neal Kreitzinger, git

On Wed, Jan 05, 2011 at 03:53:25PM -0500, Jeff King wrote:

> > If both fetch processes try to update the same ref at the same time,
> > one will get the lock and continue, and the other will crash with an
> > error (because the lock was busy).  If one is slightly slower than the
> > other, they will probably update the refs twice, with the slower fetch
> > updating what the faster one had just updated.  :-)
> 
> I assumed it would take the "old" value at the very beginning of the
> fetch (before talking with the remote), and then see that the ref was
> changed under our feet. Or does it simply do it at the end?

Hmm. Weirder even, builtin/fetch.c:s_update_ref takes a "check_old"
flag, and we do always use it for branch updates. But not for tag
updates. I can't think of why. The code blames all the way back to the
original builtin-fetch.

Anyway, when we do check, we check the value from the beginning of the
fetch. So you can get lock conflicts. For example, doing this:

  mkdir repo && cd repo && git init
  echo contents >foo && git add . && git commit -m one
  git update-ref refs/remotes/origin/master refs/heads/master
  git remote add origin some-remote-repo-that-takes-a-few-seconds
  xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'

I.e., putting some cruft into the ref and then updating it. One fetch
will force-write over the ref properly:

   + ac32203...4e64590 master     -> origin/master  (forced update)

but the other one will barf on the lock:

  error: Ref refs/remotes/origin/master is at 4e6459052ab329914c7712a926773e566b8c821d but expected ac32203727daa3bcb5fc041786aa45adbbe86299
  ...
   ! ac32203...4e64590 master     -> origin/master  (unable to update local ref)

Interestingly, in the case of ref _creation_, not update, like this:

  mkdir repo && cd repo && git init
  git remote add origin some-remote-repo-that-takes-a-few-seconds
  xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'

then both will happily update, the second one overwriting the results of
the first. It seems in the case of locking a ref which previously didn't
exist, we don't enforce that it still doesn't exist.

I wonder if we should, but perhaps there is some corner case I am not
considering. The code is in lock_ref_sha1_basic, but blaming didn't turn
up anything helpful.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 21:13       ` Jeff King
@ 2011-01-05 22:34         ` Neal Kreitzinger
  2011-01-05 22:42         ` Neal Kreitzinger
  2011-01-05 23:29         ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Neal Kreitzinger @ 2011-01-05 22:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git

On 1/5/2011 3:13 PM, Jeff King wrote:
> On Wed, Jan 05, 2011 at 03:53:25PM -0500, Jeff King wrote:
>
>>> If both fetch processes try to update the same ref at the same time,
>>> one will get the lock and continue, and the other will crash with an
>>> error (because the lock was busy).  If one is slightly slower than the
>>> other, they will probably update the refs twice, with the slower fetch
>>> updating what the faster one had just updated.  :-)
>>
>> I assumed it would take the "old" value at the very beginning of the
>> fetch (before talking with the remote), and then see that the ref was
>> changed under our feet. Or does it simply do it at the end?
>
> Hmm. Weirder even, builtin/fetch.c:s_update_ref takes a "check_old"
> flag, and we do always use it for branch updates. But not for tag
> updates. I can't think of why. The code blames all the way back to the
> original builtin-fetch.
>
> Anyway, when we do check, we check the value from the beginning of the
> fetch. So you can get lock conflicts. For example, doing this:
>
>    mkdir repo&&  cd repo&&  git init
>    echo contents>foo&&  git add .&&  git commit -m one
>    git update-ref refs/remotes/origin/master refs/heads/master
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read'&  xterm -e 'git fetch -v; read'
>
> I.e., putting some cruft into the ref and then updating it. One fetch
> will force-write over the ref properly:
>
>     + ac32203...4e64590 master     ->  origin/master  (forced update)
>
> but the other one will barf on the lock:
>
>    error: Ref refs/remotes/origin/master is at 4e6459052ab329914c7712a926773e566b8c821d but expected ac32203727daa3bcb5fc041786aa45adbbe86299
>    ...
>     ! ac32203...4e64590 master     ->  origin/master  (unable to update local ref)
>
> Interestingly, in the case of ref _creation_, not update, like this:
>
>    mkdir repo&&  cd repo&&  git init
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read'&  xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.
>
> I wonder if we should, but perhaps there is some corner case I am not
> considering. The code is in lock_ref_sha1_basic, but blaming didn't turn
> up anything helpful.
>
> -Peff

This was actually the case in my test.  Updates to the mirror are always 
new branches except for master.  The only pre-existing branch that might 
get updated is master, but in that test it didn't.  The new branches and 
tags were updated.  The new tags always point to the new branches.  I'm 
running 1.7.1 on both servers.

v/r,
Neal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 21:13       ` Jeff King
  2011-01-05 22:34         ` Neal Kreitzinger
@ 2011-01-05 22:42         ` Neal Kreitzinger
  2011-01-05 22:57           ` Jeff King
  2011-01-05 23:29         ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Neal Kreitzinger @ 2011-01-05 22:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git

On 1/5/2011 3:13 PM, Jeff King wrote:
> On Wed, Jan 05, 2011 at 03:53:25PM -0500, Jeff King wrote:
>
>>> If both fetch processes try to update the same ref at the same time,
>>> one will get the lock and continue, and the other will crash with an
>>> error (because the lock was busy).  If one is slightly slower than the
>>> other, they will probably update the refs twice, with the slower fetch
>>> updating what the faster one had just updated.  :-)
>>
>> I assumed it would take the "old" value at the very beginning of the
>> fetch (before talking with the remote), and then see that the ref was
>> changed under our feet. Or does it simply do it at the end?
>
> Hmm. Weirder even, builtin/fetch.c:s_update_ref takes a "check_old"
> flag, and we do always use it for branch updates. But not for tag
> updates. I can't think of why. The code blames all the way back to the
> original builtin-fetch.
>
> Anyway, when we do check, we check the value from the beginning of the
> fetch. So you can get lock conflicts. For example, doing this:
>
>    mkdir repo&&  cd repo&&  git init
>    echo contents>foo&&  git add .&&  git commit -m one
>    git update-ref refs/remotes/origin/master refs/heads/master
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read'&  xterm -e 'git fetch -v; read'
>
> I.e., putting some cruft into the ref and then updating it. One fetch
> will force-write over the ref properly:
>
>     + ac32203...4e64590 master     ->  origin/master  (forced update)
>
> but the other one will barf on the lock:
>
>    error: Ref refs/remotes/origin/master is at 4e6459052ab329914c7712a926773e566b8c821d but expected ac32203727daa3bcb5fc041786aa45adbbe86299
>    ...
>     ! ac32203...4e64590 master     ->  origin/master  (unable to update local ref)
>
> Interestingly, in the case of ref _creation_, not update, like this:
>
>    mkdir repo&&  cd repo&&  git init
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read'&  xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.
>
> I wonder if we should, but perhaps there is some corner case I am not
> considering. The code is in lock_ref_sha1_basic, but blaming didn't turn
> up anything helpful.
>
> -Peff

In the case of concurrent pulls to the same non-bare repo, could the 
working tree or index get corrupted, or does git have concurrency 
control mechanisms for this too?

v/r,
Neal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 22:42         ` Neal Kreitzinger
@ 2011-01-05 22:57           ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2011-01-05 22:57 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: Shawn Pearce, Neal Kreitzinger, git

On Wed, Jan 05, 2011 at 04:42:49PM -0600, Neal Kreitzinger wrote:

> In the case of concurrent pulls to the same non-bare repo, could the
> working tree or index get corrupted, or does git have concurrency
> control mechanisms for this too?

There's a lock on the index, so it shouldn't be corruptable; one process
will just end up waiting. I'm not sure offhand whether writing working
tree files is done under any lock, but I would tend to think not, since
it can be a long process. However, writing the same file twice should be
OK; we unlink the old version and create the new from scratch. So the
first writer will get its write-in-progress unlinked, and the second one
will "win".

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 21:13       ` Jeff King
  2011-01-05 22:34         ` Neal Kreitzinger
  2011-01-05 22:42         ` Neal Kreitzinger
@ 2011-01-05 23:29         ` Junio C Hamano
  2011-01-06 23:45           ` Jeff King
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-01-05 23:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git

Jeff King <peff@peff.net> writes:

> Interestingly, in the case of ref _creation_, not update, like this:
>
>   mkdir repo && cd repo && git init
>   git remote add origin some-remote-repo-that-takes-a-few-seconds
>   xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.

We probably should, especially when there is no --force or +prefix is
involved.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-05 23:29         ` Junio C Hamano
@ 2011-01-06 23:45           ` Jeff King
  2011-01-07 14:50             ` Marc Branchaud
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-01-06 23:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Neal Kreitzinger, git

On Wed, Jan 05, 2011 at 03:29:47PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Interestingly, in the case of ref _creation_, not update, like this:
> >
> >   mkdir repo && cd repo && git init
> >   git remote add origin some-remote-repo-that-takes-a-few-seconds
> >   xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
> >
> > then both will happily update, the second one overwriting the results of
> > the first. It seems in the case of locking a ref which previously didn't
> > exist, we don't enforce that it still doesn't exist.
> 
> We probably should, especially when there is no --force or +prefix is
> involved.

Hmph. So I created the test below to try to exercise this, expecting to
see at least one failure: according to the above example, we aren't
actually checking "null sha1 means ref must not exist", so we should get
an erroneous success for that case. And there is the added complication
that the null sha1 may also mean "don't care what the old one was". So
even if I changed the code, we would get erroneous failures the other
way.

But much to my surprise, it actually passes with stock git. Which means
I need to dig a little further to see exactly what is going on.

Hooray for test-driven development, I guess? :)

diff --git a/t/t1403-concurrent-refs.sh b/t/t1403-concurrent-refs.sh
new file mode 100755
index 0000000..7fb4424
--- /dev/null
+++ b/t/t1403-concurrent-refs.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+test_description='test behavior of concurrent ref updates'
+. ./test-lib.sh
+
+ref=refs/heads/foo
+null=0000000000000000000000000000000000000000
+
+check_ref() {
+	echo $2 >expect &&
+	git rev-parse --verify $1 >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success setup '
+	for name in A B C; do
+		test_tick &&
+		T=$(git write-tree) &&
+		sha1=$(echo $name | git commit-tree $T) &&
+		eval $name=$sha1
+	done
+'
+
+test_expect_success '(create ref, expecting non-null sha1) should fail' '
+	test_must_fail git update-ref $ref $A $C &&
+	test_must_fail git rev-parse --verify $ref
+'
+
+test_expect_success '(create ref, expecting null sha1) should work' '
+	git update-ref $ref $A $null &&
+	check_ref $ref $A
+'
+
+test_expect_success '(update ref, expecting null sha1) should fail' '
+	test_must_fail git update-ref $ref $B $null &&
+	check_ref $ref $A
+'
+
+test_expect_success '(update ref, expecting wrong sha1) should fail' '
+	test_must_fail git update-ref $ref $B $C &&
+	check_ref $ref $A
+'
+
+test_expect_success '(update ref, expecting current sha1) should work' '
+	git update-ref $ref $B $A &&
+	check_ref $ref $B
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-06 23:45           ` Jeff King
@ 2011-01-07 14:50             ` Marc Branchaud
  2011-01-07 14:51               ` Marc Branchaud
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2011-01-07 14:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Shawn Pearce, Neal Kreitzinger, git

On 11-01-06 06:45 PM, Jeff King wrote:
> On Wed, Jan 05, 2011 at 03:29:47PM -0800, Junio C Hamano wrote:
> 
>> Jeff King <peff@peff.net> writes:
>>
>>> Interestingly, in the case of ref _creation_, not update, like this:
>>>
>>>   mkdir repo && cd repo && git init
>>>   git remote add origin some-remote-repo-that-takes-a-few-seconds
>>>   xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>>>
>>> then both will happily update, the second one overwriting the results of
>>> the first. It seems in the case of locking a ref which previously didn't
>>> exist, we don't enforce that it still doesn't exist.
>>
>> We probably should, especially when there is no --force or +prefix is
>> involved.
> 
> Hmph. So I created the test below to try to exercise this, expecting to
> see at least one failure: according to the above example, we aren't
> actually checking "null sha1 means ref must not exist", so we should get
> an erroneous success for that case. And there is the added complication
> that the null sha1 may also mean "don't care what the old one was". So
> even if I changed the code, we would get erroneous failures the other
> way.
> 
> But much to my surprise, it actually passes with stock git. Which means
> I need to dig a little further to see exactly what is going on.

I should point out that the repository where I saw this issue is running git
1.7.1.

		M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: concurrent fetches to update same mirror
  2011-01-07 14:50             ` Marc Branchaud
@ 2011-01-07 14:51               ` Marc Branchaud
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Branchaud @ 2011-01-07 14:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Shawn Pearce, Neal Kreitzinger, git

On 11-01-07 09:50 AM, Marc Branchaud wrote:
> On 11-01-06 06:45 PM, Jeff King wrote:
>> On Wed, Jan 05, 2011 at 03:29:47PM -0800, Junio C Hamano wrote:
>>
>>> Jeff King <peff@peff.net> writes:
>>>
>>>> Interestingly, in the case of ref _creation_, not update, like this:
>>>>
>>>>   mkdir repo && cd repo && git init
>>>>   git remote add origin some-remote-repo-that-takes-a-few-seconds
>>>>   xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>>>>
>>>> then both will happily update, the second one overwriting the results of
>>>> the first. It seems in the case of locking a ref which previously didn't
>>>> exist, we don't enforce that it still doesn't exist.
>>>
>>> We probably should, especially when there is no --force or +prefix is
>>> involved.
>>
>> Hmph. So I created the test below to try to exercise this, expecting to
>> see at least one failure: according to the above example, we aren't
>> actually checking "null sha1 means ref must not exist", so we should get
>> an erroneous success for that case. And there is the added complication
>> that the null sha1 may also mean "don't care what the old one was". So
>> even if I changed the code, we would get erroneous failures the other
>> way.
>>
>> But much to my surprise, it actually passes with stock git. Which means
>> I need to dig a little further to see exactly what is going on.
> 
> I should point out that the repository where I saw this issue is running git
> 1.7.1.

Oops -- sorry!  I'm in the wrong concurrency thread...

		M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-01-07 14:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-05 20:33 concurrent fetches to update same mirror Neal Kreitzinger
2011-01-05 20:47 ` Jeff King
2011-01-05 20:51   ` Shawn Pearce
2011-01-05 20:53     ` Jeff King
2011-01-05 21:13       ` Jeff King
2011-01-05 22:34         ` Neal Kreitzinger
2011-01-05 22:42         ` Neal Kreitzinger
2011-01-05 22:57           ` Jeff King
2011-01-05 23:29         ` Junio C Hamano
2011-01-06 23:45           ` Jeff King
2011-01-07 14:50             ` Marc Branchaud
2011-01-07 14:51               ` Marc Branchaud

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.