git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Flurries of 'git reflog expire'
@ 2017-07-04  7:57 Andreas Krey
  2017-07-04  9:43 ` Ævar Arnfjörð Bjarmason
  2017-07-05  8:20 ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Andreas Krey @ 2017-07-04  7:57 UTC (permalink / raw)
  To: git

Hi everyone,

how is 'git reflog expire' triggered? We're occasionally seeing a lot
of the running in parallel on a single of our repos (atlassian bitbucket),
and this usually means that the machine is not very responsive for
twenty minutes, the repo being as big as it is.

The server is still on git 2.6.2 (and bitbucket 4.14.5).

Questions:

What can be done about this? Cronjob 'git reflog expire' at midnight,
so the heuristic don't trigger during the day? (The relnotes don't
mention anything after 2.4.0, so I suppose a git upgrade won't help.)

What is the actual cause? Bad heuristics in git itself, or does
bitbucket run them too often (improbable)?

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-04  7:57 Flurries of 'git reflog expire' Andreas Krey
@ 2017-07-04  9:43 ` Ævar Arnfjörð Bjarmason
  2017-07-06 13:27   ` Andreas Krey
  2017-07-05  8:20 ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-07-04  9:43 UTC (permalink / raw)
  To: Andreas Krey; +Cc: git


On Tue, Jul 04 2017, Andreas Krey jotted:

> Hi everyone,
>
> how is 'git reflog expire' triggered? We're occasionally seeing a lot
> of the running in parallel on a single of our repos (atlassian bitbucket),
> and this usually means that the machine is not very responsive for
> twenty minutes, the repo being as big as it is.

Assuming Linux, what does 'ps auxf' look like when this happens? Is the
parent a 'git gc --auto'?

> The server is still on git 2.6.2 (and bitbucket 4.14.5).

You might want to upgrade, we've had a bunch of changes since then,
maybe some of this fixes it:

    git log --reverse -p -L'/^static.*lock_repo_for/,/^}/:builtin/gc.c'

> Questions:
>
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristic don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
>
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

You can set gc.auto=0 in the repo to disable auto-gc, and play with
e.g. the reflog expire values, see the git-gc manpage.

But then you need to run your own gc, which is not a bad idea anyway
with a dedicated git server.

But it would be good to get to the bottom of this, we shouldn't be
running these concurrently.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-04  7:57 Flurries of 'git reflog expire' Andreas Krey
  2017-07-04  9:43 ` Ævar Arnfjörð Bjarmason
@ 2017-07-05  8:20 ` Jeff King
  2017-07-06 13:31   ` Andreas Krey
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-07-05  8:20 UTC (permalink / raw)
  To: Andreas Krey; +Cc: git

On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:

> Questions:
> 
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristic don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
> 
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

If it's using --expire-unreachable (which a default "git gc" does), that
means the we have to traverse the entire history to see what is
reachable and what is not. Added on to a normal git-gc, that's usually
not a big deal (it has to do that traversal and much more for the
repack). But if bitbucket is triggering it for other operations, that
could be related (I don't think anything but gc should ever run it
otherwise).

I seem to recall that using --stale-fix is also extremely expensive,
too. What do the command line arguments for the slow commands look like?
And what does the process tree look like?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-04  9:43 ` Ævar Arnfjörð Bjarmason
@ 2017-07-06 13:27   ` Andreas Krey
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Krey @ 2017-07-06 13:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Tue, 04 Jul 2017 11:43:33 +0000, Ævar Arnfjörð Bjarmason wrote:
...
> You can set gc.auto=0 in the repo to disable auto-gc, and play with
> e.g. the reflog expire values, see the git-gc manpage.
> 
> But then you need to run your own gc, which is not a bad idea anyway
> with a dedicated git server.

Actually, bitbucket should be doing this. Although I can't quite
rule out the possibility that we reenabled GC in this repo some
time ago.

> But it would be good to get to the bottom of this, we shouldn't be
> running these concurrently.

Indeed. Unfortunately this isn't easily reproduced in the test instance,
so I will need to get a newer git under the production bitbucket.

There are quite some of

          \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
          |   \_ git gc --auto --quiet
          |       \_ git reflog expire --all

in the process tree, apparently a new one gets started even though previous
ones are still running. The number of running expires grew slowly, in the
order of many minutes.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-05  8:20 ` Jeff King
@ 2017-07-06 13:31   ` Andreas Krey
  2017-07-06 17:01     ` Bryan Turner
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Krey @ 2017-07-06 13:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Wed, 05 Jul 2017 04:20:27 +0000, Jeff King wrote:
> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
...
> I seem to recall that using --stale-fix is also extremely expensive,
> too. What do the command line arguments for the slow commands look like?

The problem isn't that the expire is slow, it is that there are
many of them, waiting for disk writes.

> And what does the process tree look like?

Lots (~ 10) of

          \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
          |   \_ git gc --auto --quiet
          |       \_ git reflog expire --all

plus another dozen gc/expire pairs where the parent is already gone.
All with the same arguments - auto GC.

I'd wager that each push sees that a GC is in order,
and doesn't notice that there is one already running.

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-06 13:31   ` Andreas Krey
@ 2017-07-06 17:01     ` Bryan Turner
  2017-07-11  4:45       ` Andreas Krey
  2017-07-11  7:31       ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Bryan Turner @ 2017-07-06 17:01 UTC (permalink / raw)
  To: Andreas Krey; +Cc: Jeff King, Git Users

I'm one of the Bitbucket Server developers. My apologies; I just
noticed this thread or I would have jumped in sooner!

On Thu, Jul 6, 2017 at 6:31 AM, Andreas Krey <a.krey@gmx.de> wrote:
> On Wed, 05 Jul 2017 04:20:27 +0000, Jeff King wrote:
>> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
> ...
>> And what does the process tree look like?
>
> Lots (~ 10) of
>
>           \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
>           |   \_ git gc --auto --quiet
>           |       \_ git reflog expire --all
>
> plus another dozen gc/expire pairs where the parent is already gone.
> All with the same arguments - auto GC.

Do you know what version of Bitbucket Server is in use? Based on the
fact that it's "git gc --auto" triggered from a "git receive-pack",
that implies two things:
- You're on a 4.x version of Bitbucket Server
- The repository (68) has never been forked

Depending on your Bitbucket Server version (this being the reason I
asked), there are a couple different fixes available:

- Fork the repository. You don't need to _use_ the fork, but having a
fork existing will trigger Bitbucket Server to disable auto GC and
fully manage that itself. That includes managing both _concurrency_
and _frequency_ of GC. This works on all versions of Bitbucket Server.

- Run "git config gc.auto 0" in
/opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 to
disable auto GC yourself. This may be preferable to forking the
repository, which, in addition to disabling auto GC, also disables
object pruning. However, you must be running at least Bitbucket Server
4.6.0 for this approach to work. Otherwise auto GC will simply be
reenabled the first time Bitbucket Server goes to trigger GC, when it
detects that the repository has no forks.

Assuming you're on 4.6.0 or newer, either approach should fix the
issue. If you're on 4.5 or older, forking is the only viable approach
unless you upgrade Bitbucket Server first.

I also want to add that Bitbucket Server 5.x includes totally
rewritten GC handling. 5.0.x automatically disables auto GC in all
repositories and manages it explicitly, and 5.1.x fully removes use of
"git gc" in favor of running relevant plumbing commands directly. We
moved away from "git gc" specifically to avoid the "git reflog expire
--all", because there's no config setting that _fully disables_
forking that process. By default our bare clones only have reflogs for
pull request refs, and we've explicitly configured those to never
expire, so all "git reflog expire --all" can do is use up I/O and,
quite frequently, fail because refs are updated. Since we stopped
running "git gc", we've not yet seen any GC failures on our internal
Bitbucket Server clusters.

Bitbucket Server 5.1.x also includes a new "gc.log" (not to be
confused with the one Git itself writes) which retains a record of
every GC-related process we run in each repository, and how long that
process took to complete. That can be useful for getting clearer
insight into both how often GC work is being done, and how long it's
taking.

Upgrading to 5.x can be a bit of an undertaking, since the major
version brings API changes, so it's totally understandable that many
organizations haven't upgraded yet. I'm just noting that these
improvements are there for when such an upgrade becomes viable.

Hope this helps!
Bryan

>
> I'd wager that each push sees that a GC is in order,
> and doesn't notice that there is one already running.
>
> - Andreas
>
> --
> "Totally trivial. Famous last words."
> From: Linus Torvalds <torvalds@*.org>
> Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-06 17:01     ` Bryan Turner
@ 2017-07-11  4:45       ` Andreas Krey
  2017-07-11  7:25         ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King
  2017-07-11  7:35         ` Flurries of 'git reflog expire' Bryan Turner
  2017-07-11  7:31       ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: Andreas Krey @ 2017-07-11  4:45 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Jeff King, Git Users

On Thu, 06 Jul 2017 10:01:05 +0000, Bryan Turner wrote:
....
> Do you know what version of Bitbucket Server is in use?

We're on the newest 4.x.

...
> - Run "git config gc.auto 0" in

Going that route.

...
> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly.

That's the part that irks me. This shouldn't be necessary - git itself
should make sure auto GC isn't run in parallel. Now I probably can't
evaluate whether a git upgrade would fix this, but given that you
are going the do-gc-ourselves route I suppose it wouldn't.

...
> Upgrading to 5.x can be a bit of an undertaking, since the major
> version brings API changes,

The upgrade is on my todo list, but there are plugins that don't
appear to be ready for 5.0, notable the jenkins one.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire'
  2017-07-11  4:45       ` Andreas Krey
@ 2017-07-11  7:25         ` Jeff King
  2017-07-11  9:06           ` [PATCH] gc: run pre-detach operations under lock Jeff King
  2017-07-11  7:35         ` Flurries of 'git reflog expire' Bryan Turner
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-07-11  7:25 UTC (permalink / raw)
  To: Andreas Krey
  Cc: Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users

[Updating the subject since I think this really is a bug].

On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote:

> > I also want to add that Bitbucket Server 5.x includes totally
> > rewritten GC handling. 5.0.x automatically disables auto GC in all
> > repositories and manages it explicitly, and 5.1.x fully removes use of
> > "git gc" in favor of running relevant plumbing commands directly.
> 
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

It's _supposed_ to take a lock, even in older versions. See 64a99eb47
(gc: reject if another gc is running, unless --force is given,
2013-08-08).

But it looks like before we take that lock, we sometimes run pack-refs
and reflog expire. This is due to 62aad1849 (gc --auto: do not lock refs
in the background, 2014-05-25). IMHO this is buggy; it should be
checking the lock before calling gc_before_repack() and daemonizing.

Annoyingly, the lock code interacts badly with daemonizing because that
latter will fork to a new process. So the simple solution like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 2ba50a287..79480124a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			if (report_last_gc_error())
 				return -1;
 
+			if (lock_repo_for_gc(force, &pid))
+				return 0;
+
 			if (gc_before_repack())
 				return -1;
 			/*

means that anybody looking at the lockfile will report the wrong pid
(and thus think the lock is invalid). I guess we'd need to update it in
place after daemonizing.

-Peff

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-06 17:01     ` Bryan Turner
  2017-07-11  4:45       ` Andreas Krey
@ 2017-07-11  7:31       ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-07-11  7:31 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Andreas Krey, Git Users

On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote:

> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly. We
> moved away from "git gc" specifically to avoid the "git reflog expire
> --all", because there's no config setting that _fully disables_
> forking that process.

FWIW, I think auto-gc in general is not a good way to handle maintenance
on a busy hosting server. Repacking can be very resource hungry (both
CPU and memory), and it needs to be throttled. You _could_ throttle with
an auto-gc hook, but that isn't very elegant when it comes to
re-queueing jobs which fail or timeout.

The right model IMHO (and what GitHub uses, and what I'm guessing
Bitbucket is doing in more recent versions) is to make note of write
operations in a data structure, then use that data to schedule
maintenance in a job queue. But that can never really be part of Git
itself, as the notion of a system job queue is outside its scope.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-11  4:45       ` Andreas Krey
  2017-07-11  7:25         ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King
@ 2017-07-11  7:35         ` Bryan Turner
  2017-07-11  7:45           ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Bryan Turner @ 2017-07-11  7:35 UTC (permalink / raw)
  To: Andreas Krey; +Cc: Jeff King, Git Users

On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Thu, 06 Jul 2017 10:01:05 +0000, Bryan Turner wrote:
> ....
>> I also want to add that Bitbucket Server 5.x includes totally
>> rewritten GC handling. 5.0.x automatically disables auto GC in all
>> repositories and manages it explicitly, and 5.1.x fully removes use of
>> "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.
>

I believe I've seen some commits on the mailing list that suggest "git
gc --auto" manages its concurrency better in newer versions than it
used to, but even then it can only manage its concurrency within a
single repository. For a hosting server with thousands, or tens of
thousands, of active repositories, there still wouldn't be any
protection against "git gc --auto" running concurrently in dozens of
them at the same time.

But it's not only about concurrency. "git gc" (and by extension "git
gc --auto") is a general purpose tool, designed to generally do what
you need, and to mostly stay out of your way while it does it. I'd
hazard to say it's not really designed for managing heavily-trafficked
repositories on busy hosting services, though, and as a result, there
are things it can't do.

For example, I can configure auto GC to run based on how many loose
objects or packs I have, but there's no heuristic to make it repack
refs when I have a lot of loose ones, or configure it to _only_ pack
refs without repacking objects or pruning reflogs. There are knobs for
various things (like "gc.*.reflogExpire"), but those don't give
complete control. Even if I set "gc.reflogExpire=never", "git gc"
still forks "git reflog expire --all" (compared to
"gc.packRefs=false", which completely prevents forking "git
pack-refs").

A trace on "git gc" shows this:
$ GIT_TRACE=1 git gc
00:10:45.058066 git.c:437               trace: built-in: git 'gc'
00:10:45.067075 run-command.c:369       trace: run_command:
'pack-refs' '--all' '--prune'
00:10:45.077086 git.c:437               trace: built-in: git
'pack-refs' '--all' '--prune'
00:10:45.084098 run-command.c:369       trace: run_command: 'reflog'
'expire' '--all'
00:10:45.093102 git.c:437               trace: built-in: git 'reflog'
'expire' '--all'
00:10:45.097088 run-command.c:369       trace: run_command: 'repack'
'-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.106096 git.c:437               trace: built-in: git 'repack'
'-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.107098 run-command.c:369       trace: run_command:
'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty'
'--all' '--reflog' '--indexed-objects'
'--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset'
'objects/pack/.tmp-15212-pack'
00:10:45.127117 git.c:437               trace: built-in: git
'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty'
'--all' '--reflog' '--indexed-objects'
'--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset'
'objects/pack/.tmp-15212-pack'
Counting objects: 6, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
00:10:45.173161 run-command.c:369       trace: run_command: 'prune'
'--expire' '2.weeks.ago'
00:10:45.184171 git.c:437               trace: built-in: git 'prune'
'--expire' '2.weeks.ago'
00:10:45.199202 run-command.c:369       trace: run_command: 'worktree'
'prune' '--expire' '3.months.ago'
00:10:45.208193 git.c:437               trace: built-in: git
'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.212198 run-command.c:369       trace: run_command: 'rerere' 'gc'
00:10:45.221223 git.c:437               trace: built-in: git 'rerere' 'gc'

The bare repositories used by Bitbucket Server:
* Don't have reflogs enabled generally, and for the ones that are
enabled "gc.*.reflogExpire" is set to "never"
* Never have worktrees, so they don't need to be pruned
* Never use rerere, so that doesn't need to GC
* Have pruning disabled if they've been forked, due to using
alternates to manage disk space

That means of all the commands "git gc" runs, under the covers, at
most only "pack-refs", "repack" and sometimes "prune" have any value.
"reflog expire --all" in particular is extremely likely to fail. Which
brings up another consideration.

"git gc --auto" has no sense of context, or adjacent behavior. Even if
it correctly guards against concurrency, it still doesn't know what
else is going on. Immediately after a push, Bitbucket Server has many
other housekeeping tasks it performs, especially around pull requests.
That means pull request refs are disproportionately likely to be
"moving" immediately after a push completes--exactly when "git gc
--auto" tries to run. (Which tends to be why "reflog expire --all"
fails, due ref locking issues with pull request refs.) Bitbucket
Server, on the other hand, better understands the context GC is
running in. So it can defer GC processing for a period of time after a
push completes, to increase the likelihood that the repository is
"quiet" and GC can complete without issue.

Another limitation is that you can't configure "negative" heuristics,
like "Don't run GC more than once per day.". If "git gc --auto"'s
heuristics are exceeded, it'll run GC. Depending, for example, on how
rapidly a repository generates unreachable objects, it's entirely
possible to get to a point where "git gc --auto" wants to run after
every single push, sometimes for days in a row, while it waits for
objects to hit the prune threshold. By managing GC ourselves, we gain
the ability to enforce "cooldowns" to prevent continuous GC.

"git gc --auto" also has a tendency to run "attached" to the "git
receive-pack" process, which means both that pushing users can have
their local process "delayed" while it runs, and that they sometimes
get to see "scary" errors that they can't fix (or, often, understand).
Newer versions of Git have increased the likelihood that "git gc
--auto" will run detached, but that doesn't always happen. (Up to and
including 2.13.2, the "git config" documentation for "gc.autoDetach"
is qualified with "if the system supports it.") Managing GC in
Bitbucket Server guarantees that it's _always_ detached from user
processes.

That's a few of the reasons we've switched over. I'd imagine most
hosting providers take a similarly "hands on" approach to controlling
their GC. Beyond a certain scale, it seems almost unavoidable. Git
never has more than a repository-level view of the world; only the
hosting provider can see the big picture.

Best regards,
Bryan Turner

> ...
>> Upgrading to 5.x can be a bit of an undertaking, since the major
>> version brings API changes,
>
> The upgrade is on my todo list, but there are plugins that don't
> appear to be ready for 5.0, notable the jenkins one.
>
> Andreas
>
> --
> "Totally trivial. Famous last words."
> From: Linus Torvalds <torvalds@*.org>
> Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Flurries of 'git reflog expire'
  2017-07-11  7:35         ` Flurries of 'git reflog expire' Bryan Turner
@ 2017-07-11  7:45           ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-07-11  7:45 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Andreas Krey, Git Users

On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote:

> That's a few of the reasons we've switched over. I'd imagine most
> hosting providers take a similarly "hands on" approach to controlling
> their GC. Beyond a certain scale, it seems almost unavoidable. Git
> never has more than a repository-level view of the world; only the
> hosting provider can see the big picture.

Thanks for writing this out. I agree with all of the reasons given (in
my email which I suspect crossed with yours, I just said "throttling",
but there really are a lot of other reasons).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] gc: run pre-detach operations under lock
  2017-07-11  7:25         ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King
@ 2017-07-11  9:06           ` Jeff King
  2017-07-12 16:46             ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-07-11  9:06 UTC (permalink / raw)
  To: Andreas Krey
  Cc: Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users

On Tue, Jul 11, 2017 at 03:25:36AM -0400, Jeff King wrote:

> Annoyingly, the lock code interacts badly with daemonizing because that
> latter will fork to a new process. So the simple solution like:
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 2ba50a287..79480124a 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  			if (report_last_gc_error())
>  				return -1;
>  
> +			if (lock_repo_for_gc(force, &pid))
> +				return 0;
> +
>  			if (gc_before_repack())
>  				return -1;
>  			/*
> 
> means that anybody looking at the lockfile will report the wrong pid
> (and thus think the lock is invalid). I guess we'd need to update it in
> place after daemonizing.

Updating it in place is a bit tricky. I came up with this hack that
makes it work, but I'm not sure if the reasoning is too gross.

-- >8 --
Subject: [PATCH] gc: run pre-detach operations under lock

We normally try to avoid having two auto-gc operations run
at the same time, because it wastes resources. This was done
long ago in 64a99eb47 (gc: reject if another gc is running,
unless --force is given, 2013-08-08).

When we do a detached auto-gc, we run the ref-related
commands _before_ detaching, to avoid confusing lock
contention. This was done by 62aad1849 (gc --auto: do not
lock refs in the background, 2014-05-25).

These two features do not interact well. The pre-detach
operations are run before we check the gc.pid lock, meaning
that on a busy repository we may run many of them
concurrently. Ideally we'd take the lock before spawning any
operations, and hold it for the duration of the program.

This is tricky, though, with the way the pid-file interacts
with the daemonize() process.  Other processes will check
that the pid recorded in the pid-file still exists. But
detaching causes us to fork and continue running under a
new pid. So if we take the lock before detaching, the
pid-file will have a bogus pid in it. We'd have to go back
and update it with the new pid after detaching. We'd also
have to play some tricks with the tempfile subsystem to
tweak the "owner" field, so that the parent process does not
clean it up on exit, but the child process does.

Instead, we can do something a bit simpler: take the lock
only for the duration of the pre-detach work, then detach,
then take it again for the post-detach work. Technically,
this means that the post-detach lock could lose to another
process doing pre-detach work. But in the long run this
works out.

That second process would then follow-up by doing
post-detach work. Unless it was in turn blocked by a third
process doing pre-detach work, and so on. This could in
theory go on indefinitely, as the pre-detach work does not
repack, and so need_to_gc() will continue to trigger.  But
in each round we are racing between the pre- and post-detach
locks. Eventually, one of the post-detach locks will win the
race and complete the full gc. So in the worst case, we may
racily repeat the pre-detach work, but we would never do so
simultaneously (it would happen via a sequence of serialized
race-wins).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/gc.c  |  4 ++++
 t/t6500-gc.sh | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index bd91f136f..5a535040f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,8 +414,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			if (report_last_gc_error())
 				return -1;
 
+			if (lock_repo_for_gc(force, &pid))
+				return 0;
 			if (gc_before_repack())
 				return -1;
+			delete_tempfile(&pidfile);
+
 			/*
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index cc7acd101..41b0be575 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -95,6 +95,27 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
 	test_line_count = 1 packs
 '
 
+test_expect_success 'background auto gc respects lock for all operations' '
+	# make sure we run a background auto-gc
+	test_commit make-pack &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+
+	# create a ref whose loose presence we can use to detect a pack-refs run
+	git update-ref refs/heads/should-be-loose HEAD &&
+	test_path_is_file .git/refs/heads/should-be-loose &&
+
+	# now fake a concurrent gc that holds the lock; we can use our
+	# shell pid so that it looks valid.
+	hostname=$(hostname || echo unknown) &&
+	printf "$$ %s" "$hostname" >.git/gc.pid &&
+
+	# our gc should exit zero without doing anything
+	run_and_wait_for_auto_gc &&
+	test_path_is_file .git/refs/heads/should-be-loose
+'
+
 # DO NOT leave a detached auto gc process running near the end of the
 # test script: it can run long enough in the background to racily
 # interfere with the cleanup in 'test_done'.
-- 
2.13.2.1149.g835628f6b


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] gc: run pre-detach operations under lock
  2017-07-11  9:06           ` [PATCH] gc: run pre-detach operations under lock Jeff King
@ 2017-07-12 16:46             ` Junio C Hamano
  2017-07-12 16:58               ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-07-12 16:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Andreas Krey, Nguyễn Thái Ngọc Duy,
	Bryan Turner, Git Users

Jeff King <peff@peff.net> writes:

> Instead, we can do something a bit simpler: take the lock
> only for the duration of the pre-detach work, then detach,
> then take it again for the post-detach work. Technically,
> this means that the post-detach lock could lose to another
> process doing pre-detach work. But in the long run this
> works out.

You might have found this part gross, but I actually don't.  It
looks like a reasonable practical compromise, and I tried to think
of a scenario that this would do a wrong thing but I didn't---it is
not like we carry information off-disk from the pre-detach to
post-detach work to cause the latter make decisions on it, so this
"split into two phrases" looks fairly safe.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] gc: run pre-detach operations under lock
  2017-07-12 16:46             ` Junio C Hamano
@ 2017-07-12 16:58               ` Jeff King
  2017-07-12 21:10                 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-07-12 16:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Andreas Krey, Nguyễn Thái Ngọc Duy,
	Bryan Turner, Git Users

On Wed, Jul 12, 2017 at 09:46:25AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Instead, we can do something a bit simpler: take the lock
> > only for the duration of the pre-detach work, then detach,
> > then take it again for the post-detach work. Technically,
> > this means that the post-detach lock could lose to another
> > process doing pre-detach work. But in the long run this
> > works out.
> 
> You might have found this part gross, but I actually don't.  It
> looks like a reasonable practical compromise, and I tried to think
> of a scenario that this would do a wrong thing but I didn't---it is
> not like we carry information off-disk from the pre-detach to
> post-detach work to cause the latter make decisions on it, so this
> "split into two phrases" looks fairly safe.

Anytime I have to spend a few paragraphs saying "well, it looks like
this might behave terribly, but it doesn't because..." I get worried
that my analysis is missing a case. And that writing it in a way that
avoids that analysis might be safer, even if it's a little more work.

I gave it some more thought after sending the earlier message. And I
really think it's not "a little more work". Even if we decided to keep
the same file and replace the PID in it with the daemonized one, I think
that still isn't quite right. Because we don't do so atomically unless
we take gc.pid.lock again. But we may actually conflict with somebody
else on that! Even though that somebody would just pick up the lock,
read gc.pid and say "well, looks like somebody else is running" and
release it again. So we'd have to either hold the lock the whole time,
or do some kind of retry loop to race with other processes picking up
the lock.

It's definitely possible, but it's fighting an uphill battle against the
way our locking and tempfile code works. So I came to the conclusion
that it's not worth the trouble, and what I posted is probably a good
compromise.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] gc: run pre-detach operations under lock
  2017-07-12 16:58               ` Jeff King
@ 2017-07-12 21:10                 ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-07-12 21:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Andreas Krey, Nguyễn Thái Ngọc Duy,
	Bryan Turner, Git Users

Jeff King <peff@peff.net> writes:

> ... And I
> really think it's not "a little more work". Even if we decided to keep
> the same file and replace the PID in it with the daemonized one, I think
> that still isn't quite right. Because we don't do so atomically unless
> we take gc.pid.lock again.

Exactly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-07-12 21:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-04  7:57 Flurries of 'git reflog expire' Andreas Krey
2017-07-04  9:43 ` Ævar Arnfjörð Bjarmason
2017-07-06 13:27   ` Andreas Krey
2017-07-05  8:20 ` Jeff King
2017-07-06 13:31   ` Andreas Krey
2017-07-06 17:01     ` Bryan Turner
2017-07-11  4:45       ` Andreas Krey
2017-07-11  7:25         ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King
2017-07-11  9:06           ` [PATCH] gc: run pre-detach operations under lock Jeff King
2017-07-12 16:46             ` Junio C Hamano
2017-07-12 16:58               ` Jeff King
2017-07-12 21:10                 ` Junio C Hamano
2017-07-11  7:35         ` Flurries of 'git reflog expire' Bryan Turner
2017-07-11  7:45           ` Jeff King
2017-07-11  7:31       ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).