All of lore.kernel.org
 help / color / mirror / Atom feed
* Simultaneous gc and repack
@ 2017-04-13 17:31 David Turner
  2017-04-13 18:03 ` Jacob Keller
  0 siblings, 1 reply; 7+ messages in thread
From: David Turner @ 2017-04-13 17:31 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

Git gc locks the repository (using a gc.pid file) so that other gcs
don't run concurrently. But git repack doesn't respect this lock, so
it's possible to have a repack running at the same time as a gc.  This
makes the gc sad when its packs are deleted out from under it with:
"fatal: ./objects/pack/pack-$sha.pack cannot be accessed".  Then it
dies, leaving a large temp file hanging around.

Does the following seem reasonable?

1. Make git repack, by default, check for a gc.pid file (using the same
logic as git gc itself does).
2. Provide a --force option to git repack to ignore said check.
3. Make git gc provide that --force option when it calls repack under
its own lock.

This came up because Gitlab runs a repack after every N pushes and a gc
after every M commits, where M >> N.  Sometimes, when pushes come in
rapidly, the repack catches the still-running gc and the above badness
happens.  At least, that's my understanding: I don't run our Gitlab
servers, but I talked to the person who does and that's what he said.

Of course, Gitlab could do its own locking, but the general approach
seems like it would help other folks too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 17:31 Simultaneous gc and repack David Turner
@ 2017-04-13 18:03 ` Jacob Keller
  2017-04-13 18:08   ` Martin Fick
  0 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2017-04-13 18:03 UTC (permalink / raw)
  To: David Turner; +Cc: Git mailing list, Christian Couder

On Thu, Apr 13, 2017 at 10:31 AM, David Turner <novalis@novalis.org> wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. But git repack doesn't respect this lock, so
> it's possible to have a repack running at the same time as a gc.  This
> makes the gc sad when its packs are deleted out from under it with:
> "fatal: ./objects/pack/pack-$sha.pack cannot be accessed".  Then it
> dies, leaving a large temp file hanging around.
>
> Does the following seem reasonable?
>
> 1. Make git repack, by default, check for a gc.pid file (using the same
> logic as git gc itself does).
> 2. Provide a --force option to git repack to ignore said check.
> 3. Make git gc provide that --force option when it calls repack under
> its own lock.
>

What about just making the code that calls repack today just call gc
instead? I guess it's more work if you don't strictly need it but..?

Thanks,
Jake

> This came up because Gitlab runs a repack after every N pushes and a gc
> after every M commits, where M >> N.  Sometimes, when pushes come in
> rapidly, the repack catches the still-running gc and the above badness
> happens.  At least, that's my understanding: I don't run our Gitlab
> servers, but I talked to the person who does and that's what he said.
>
> Of course, Gitlab could do its own locking, but the general approach
> seems like it would help other folks too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 18:03 ` Jacob Keller
@ 2017-04-13 18:08   ` Martin Fick
  2017-04-13 18:28     ` David Turner
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Fick @ 2017-04-13 18:08 UTC (permalink / raw)
  To: Jacob Keller; +Cc: David Turner, Git mailing list, Christian Couder

On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
<novalis@novalis.org> wrote:
> > Git gc locks the repository (using a gc.pid file) so
> > that other gcs don't run concurrently. But git repack
> > doesn't respect this lock, so it's possible to have a
> > repack running at the same time as a gc.  This makes
> > the gc sad when its packs are deleted out from under it
> > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > accessed".  Then it dies, leaving a large temp file
> > hanging around.
> > 
> > Does the following seem reasonable?
> > 
> > 1. Make git repack, by default, check for a gc.pid file
> > (using the same logic as git gc itself does).
> > 2. Provide a --force option to git repack to ignore said
> > check. 3. Make git gc provide that --force option when
> > it calls repack under its own lock.
> 
> What about just making the code that calls repack today
> just call gc instead? I guess it's more work if you don't
> strictly need it but..?

There are many scanerios where this does not achieve the 
same thing.  On the obvious side, gc does more than 
repacking, but on the other side, repacking has many 
switches that are not available via gc.

Would it make more sense to move the lock to repack instead 
of to gc?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 18:08   ` Martin Fick
@ 2017-04-13 18:28     ` David Turner
  2017-04-13 18:35       ` Jacob Keller
  2017-04-13 18:36       ` Martin Fick
  0 siblings, 2 replies; 7+ messages in thread
From: David Turner @ 2017-04-13 18:28 UTC (permalink / raw)
  To: Martin Fick, Jacob Keller; +Cc: Git mailing list, Christian Couder

On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
> 
> <novalis@novalis.org> wrote:
> > > Git gc locks the repository (using a gc.pid file) so
> > > that other gcs don't run concurrently. But git repack
> > > doesn't respect this lock, so it's possible to have a
> > > repack running at the same time as a gc.  This makes
> > > the gc sad when its packs are deleted out from under it
> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > > accessed".  Then it dies, leaving a large temp file
> > > hanging around.
> > > 
> > > Does the following seem reasonable?
> > > 
> > > 1. Make git repack, by default, check for a gc.pid file
> > > (using the same logic as git gc itself does).
> > > 2. Provide a --force option to git repack to ignore said
> > > check. 3. Make git gc provide that --force option when
> > > it calls repack under its own lock.
> > 
> > What about just making the code that calls repack today
> > just call gc instead? I guess it's more work if you don't
> > strictly need it but..?
> 
> There are many scanerios where this does not achieve the 
> same thing.  On the obvious side, gc does more than 
> repacking, but on the other side, repacking has many 
> switches that are not available via gc.
> 
> Would it make more sense to move the lock to repack instead 
> of to gc?

Other gc operations might step on each other too (e.g. packing refs). 
That would be less bad (and less common), but it still seems worth
avoiding.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 18:28     ` David Turner
@ 2017-04-13 18:35       ` Jacob Keller
  2017-04-13 18:36       ` Martin Fick
  1 sibling, 0 replies; 7+ messages in thread
From: Jacob Keller @ 2017-04-13 18:35 UTC (permalink / raw)
  To: David Turner; +Cc: Martin Fick, Git mailing list, Christian Couder

On Thu, Apr 13, 2017 at 11:28 AM, David Turner <novalis@novalis.org> wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
>> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
>> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner
>>
>> <novalis@novalis.org> wrote:
>> > > Git gc locks the repository (using a gc.pid file) so
>> > > that other gcs don't run concurrently. But git repack
>> > > doesn't respect this lock, so it's possible to have a
>> > > repack running at the same time as a gc.  This makes
>> > > the gc sad when its packs are deleted out from under it
>> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
>> > > accessed".  Then it dies, leaving a large temp file
>> > > hanging around.
>> > >
>> > > Does the following seem reasonable?
>> > >
>> > > 1. Make git repack, by default, check for a gc.pid file
>> > > (using the same logic as git gc itself does).
>> > > 2. Provide a --force option to git repack to ignore said
>> > > check. 3. Make git gc provide that --force option when
>> > > it calls repack under its own lock.
>> >
>> > What about just making the code that calls repack today
>> > just call gc instead? I guess it's more work if you don't
>> > strictly need it but..?
>>
>> There are many scanerios where this does not achieve the
>> same thing.  On the obvious side, gc does more than
>> repacking, but on the other side, repacking has many
>> switches that are not available via gc.
>>
>> Would it make more sense to move the lock to repack instead
>> of to gc?
>
> Other gc operations might step on each other too (e.g. packing refs).
> That would be less bad (and less common), but it still seems worth
> avoiding.

It sounds like your original solution would work, though I wouldn't
use "force" and I would either not document or document with "this is
only meant to be used by git-gc internally"

Thanks,
Jake

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 18:28     ` David Turner
  2017-04-13 18:35       ` Jacob Keller
@ 2017-04-13 18:36       ` Martin Fick
  2017-04-13 19:05         ` David Turner
  1 sibling, 1 reply; 7+ messages in thread
From: Martin Fick @ 2017-04-13 18:36 UTC (permalink / raw)
  To: David Turner; +Cc: Jacob Keller, Git mailing list, Christian Couder

On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller 
wrote:
> > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
> > 
> > <novalis@novalis.org> wrote:
> > > > Git gc locks the repository (using a gc.pid file) so
> > > > that other gcs don't run concurrently. But git
> > > > repack
> > > > doesn't respect this lock, so it's possible to have
> > > > a
> > > > repack running at the same time as a gc.  This makes
> > > > the gc sad when its packs are deleted out from under
> > > > it
> > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot
> > > > be
> > > > accessed".  Then it dies, leaving a large temp file
> > > > hanging around.
> > > > 
> > > > Does the following seem reasonable?
> > > > 
> > > > 1. Make git repack, by default, check for a gc.pid
> > > > file
> > > > (using the same logic as git gc itself does).
> > > > 2. Provide a --force option to git repack to ignore
> > > > said
> > > > check. 3. Make git gc provide that --force option
> > > > when
> > > > it calls repack under its own lock.
> > > 
> > > What about just making the code that calls repack
> > > today
> > > just call gc instead? I guess it's more work if you
> > > don't
> > > strictly need it but..?
> > 
> > There are many scanerios where this does not achieve
> > the 
> > same thing.  On the obvious side, gc does more than 
> > repacking, but on the other side, repacking has many 
> > switches that are not available via gc.
> > 
> > Would it make more sense to move the lock to repack
> > instead  of to gc?
> 
> Other gc operations might step on each other too (e.g.
> packing refs). That would be less bad (and less common),
> but it still seems worth avoiding.

Yes, but all of thsoe operations need to be self protected 
already, or they risk the same issue.

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Simultaneous gc and repack
  2017-04-13 18:36       ` Martin Fick
@ 2017-04-13 19:05         ` David Turner
  0 siblings, 0 replies; 7+ messages in thread
From: David Turner @ 2017-04-13 19:05 UTC (permalink / raw)
  To: Martin Fick; +Cc: Jacob Keller, Git mailing list, Christian Couder

On Thu, 2017-04-13 at 12:36 -0600, Martin Fick wrote:
> On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> > On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller 
> 
> wrote:
> > > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
> > > 
> > > <novalis@novalis.org> wrote:
> > > > > Git gc locks the repository (using a gc.pid file) so
> > > > > that other gcs don't run concurrently. But git
> > > > > repack
> > > > > doesn't respect this lock, so it's possible to have
> > > > > a
> > > > > repack running at the same time as a gc.  This makes
> > > > > the gc sad when its packs are deleted out from under
> > > > > it
> > > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot
> > > > > be
> > > > > accessed".  Then it dies, leaving a large temp file
> > > > > hanging around.
> > > > > 
> > > > > Does the following seem reasonable?
> > > > > 
> > > > > 1. Make git repack, by default, check for a gc.pid
> > > > > file
> > > > > (using the same logic as git gc itself does).
> > > > > 2. Provide a --force option to git repack to ignore
> > > > > said
> > > > > check. 3. Make git gc provide that --force option
> > > > > when
> > > > > it calls repack under its own lock.
> > > > 
> > > > What about just making the code that calls repack
> > > > today
> > > > just call gc instead? I guess it's more work if you
> > > > don't
> > > > strictly need it but..?
> > > 
> > > There are many scanerios where this does not achieve
> > > the 
> > > same thing.  On the obvious side, gc does more than 
> > > repacking, but on the other side, repacking has many 
> > > switches that are not available via gc.
> > > 
> > > Would it make more sense to move the lock to repack
> > > instead  of to gc?
> > 
> > Other gc operations might step on each other too (e.g.
> > packing refs). That would be less bad (and less common),
> > but it still seems worth avoiding.
> 
> Yes, but all of thsoe operations need to be self protected 
> already, or they risk the same issue.

They are  individually protected.  But I would rather fail fast.  And
I'm not sure what happens if git prune happens during a git repack -a;
might the repack fail if an object that it expects to pack is pruned
out from under it?



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-13 19:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-13 17:31 Simultaneous gc and repack David Turner
2017-04-13 18:03 ` Jacob Keller
2017-04-13 18:08   ` Martin Fick
2017-04-13 18:28     ` David Turner
2017-04-13 18:35       ` Jacob Keller
2017-04-13 18:36       ` Martin Fick
2017-04-13 19:05         ` David Turner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.