* Flurries of 'git reflog expire' @ 2017-07-04 7:57 Andreas Krey 2017-07-04 9:43 ` Ævar Arnfjörð Bjarmason 2017-07-05 8:20 ` Jeff King 0 siblings, 2 replies; 15+ messages in thread From: Andreas Krey @ 2017-07-04 7:57 UTC (permalink / raw) To: git Hi everyone, how is 'git reflog expire' triggered? We're occasionally seeing a lot of the running in parallel on a single of our repos (atlassian bitbucket), and this usually means that the machine is not very responsive for twenty minutes, the repo being as big as it is. The server is still on git 2.6.2 (and bitbucket 4.14.5). Questions: What can be done about this? Cronjob 'git reflog expire' at midnight, so the heuristic don't trigger during the day? (The relnotes don't mention anything after 2.4.0, so I suppose a git upgrade won't help.) What is the actual cause? Bad heuristics in git itself, or does bitbucket run them too often (improbable)? Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-04 7:57 Flurries of 'git reflog expire' Andreas Krey @ 2017-07-04 9:43 ` Ævar Arnfjörð Bjarmason 2017-07-06 13:27 ` Andreas Krey 2017-07-05 8:20 ` Jeff King 1 sibling, 1 reply; 15+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2017-07-04 9:43 UTC (permalink / raw) To: Andreas Krey; +Cc: git On Tue, Jul 04 2017, Andreas Krey jotted: > Hi everyone, > > how is 'git reflog expire' triggered? We're occasionally seeing a lot > of the running in parallel on a single of our repos (atlassian bitbucket), > and this usually means that the machine is not very responsive for > twenty minutes, the repo being as big as it is. Assuming Linux, what does 'ps auxf' look like when this happens? Is the parent a 'git gc --auto'? > The server is still on git 2.6.2 (and bitbucket 4.14.5). You might want to upgrade, we've had a bunch of changes since then, maybe some of this fixes it: git log --reverse -p -L'/^static.*lock_repo_for/,/^}/:builtin/gc.c' > Questions: > > What can be done about this? Cronjob 'git reflog expire' at midnight, > so the heuristic don't trigger during the day? (The relnotes don't > mention anything after 2.4.0, so I suppose a git upgrade won't help.) > > What is the actual cause? Bad heuristics in git itself, or does > bitbucket run them too often (improbable)? You can set gc.auto=0 in the repo to disable auto-gc, and play with e.g. the reflog expire values, see the git-gc manpage. But then you need to run your own gc, which is not a bad idea anyway with a dedicated git server. But it would be good to get to the bottom of this, we shouldn't be running these concurrently. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-04 9:43 ` Ævar Arnfjörð Bjarmason @ 2017-07-06 13:27 ` Andreas Krey 0 siblings, 0 replies; 15+ messages in thread From: Andreas Krey @ 2017-07-06 13:27 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git On Tue, 04 Jul 2017 11:43:33 +0000, Ævar Arnfjörð Bjarmason wrote: ... > You can set gc.auto=0 in the repo to disable auto-gc, and play with > e.g. the reflog expire values, see the git-gc manpage. > > But then you need to run your own gc, which is not a bad idea anyway > with a dedicated git server. Actually, bitbucket should be doing this. Although I can't quite rule out the possibility that we reenabled GC in this repo some time ago. > But it would be good to get to the bottom of this, we shouldn't be > running these concurrently. Indeed. Unfortunately this isn't easily reproduced in the test instance, so I will need to get a newer git under the production bitbucket. There are quite some of \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 | \_ git gc --auto --quiet | \_ git reflog expire --all in the process tree, apparently a new one gets started even though previous ones are still running. The number of running expires grew slowly, in the order of many minutes. Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-04 7:57 Flurries of 'git reflog expire' Andreas Krey 2017-07-04 9:43 ` Ævar Arnfjörð Bjarmason @ 2017-07-05 8:20 ` Jeff King 2017-07-06 13:31 ` Andreas Krey 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2017-07-05 8:20 UTC (permalink / raw) To: Andreas Krey; +Cc: git On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote: > Questions: > > What can be done about this? Cronjob 'git reflog expire' at midnight, > so the heuristic don't trigger during the day? (The relnotes don't > mention anything after 2.4.0, so I suppose a git upgrade won't help.) > > What is the actual cause? Bad heuristics in git itself, or does > bitbucket run them too often (improbable)? If it's using --expire-unreachable (which a default "git gc" does), that means the we have to traverse the entire history to see what is reachable and what is not. Added on to a normal git-gc, that's usually not a big deal (it has to do that traversal and much more for the repack). But if bitbucket is triggering it for other operations, that could be related (I don't think anything but gc should ever run it otherwise). I seem to recall that using --stale-fix is also extremely expensive, too. What do the command line arguments for the slow commands look like? And what does the process tree look like? -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-05 8:20 ` Jeff King @ 2017-07-06 13:31 ` Andreas Krey 2017-07-06 17:01 ` Bryan Turner 0 siblings, 1 reply; 15+ messages in thread From: Andreas Krey @ 2017-07-06 13:31 UTC (permalink / raw) To: Jeff King; +Cc: git On Wed, 05 Jul 2017 04:20:27 +0000, Jeff King wrote: > On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote: ... > I seem to recall that using --stale-fix is also extremely expensive, > too. What do the command line arguments for the slow commands look like? The problem isn't that the expire is slow, it is that there are many of them, waiting for disk writes. > And what does the process tree look like? Lots (~ 10) of \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 | \_ git gc --auto --quiet | \_ git reflog expire --all plus another dozen gc/expire pairs where the parent is already gone. All with the same arguments - auto GC. I'd wager that each push sees that a GC is in order, and doesn't notice that there is one already running. - Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-06 13:31 ` Andreas Krey @ 2017-07-06 17:01 ` Bryan Turner 2017-07-11 4:45 ` Andreas Krey 2017-07-11 7:31 ` Jeff King 0 siblings, 2 replies; 15+ messages in thread From: Bryan Turner @ 2017-07-06 17:01 UTC (permalink / raw) To: Andreas Krey; +Cc: Jeff King, Git Users I'm one of the Bitbucket Server developers. My apologies; I just noticed this thread or I would have jumped in sooner! On Thu, Jul 6, 2017 at 6:31 AM, Andreas Krey <a.krey@gmx.de> wrote: > On Wed, 05 Jul 2017 04:20:27 +0000, Jeff King wrote: >> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote: > ... >> And what does the process tree look like? > > Lots (~ 10) of > > \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 > | \_ git gc --auto --quiet > | \_ git reflog expire --all > > plus another dozen gc/expire pairs where the parent is already gone. > All with the same arguments - auto GC. Do you know what version of Bitbucket Server is in use? Based on the fact that it's "git gc --auto" triggered from a "git receive-pack", that implies two things: - You're on a 4.x version of Bitbucket Server - The repository (68) has never been forked Depending on your Bitbucket Server version (this being the reason I asked), there are a couple different fixes available: - Fork the repository. You don't need to _use_ the fork, but having a fork existing will trigger Bitbucket Server to disable auto GC and fully manage that itself. That includes managing both _concurrency_ and _frequency_ of GC. This works on all versions of Bitbucket Server. - Run "git config gc.auto 0" in /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 to disable auto GC yourself. This may be preferable to forking the repository, which, in addition to disabling auto GC, also disables object pruning. However, you must be running at least Bitbucket Server 4.6.0 for this approach to work. Otherwise auto GC will simply be reenabled the first time Bitbucket Server goes to trigger GC, when it detects that the repository has no forks. Assuming you're on 4.6.0 or newer, either approach should fix the issue. If you're on 4.5 or older, forking is the only viable approach unless you upgrade Bitbucket Server first. I also want to add that Bitbucket Server 5.x includes totally rewritten GC handling. 5.0.x automatically disables auto GC in all repositories and manages it explicitly, and 5.1.x fully removes use of "git gc" in favor of running relevant plumbing commands directly. We moved away from "git gc" specifically to avoid the "git reflog expire --all", because there's no config setting that _fully disables_ forking that process. By default our bare clones only have reflogs for pull request refs, and we've explicitly configured those to never expire, so all "git reflog expire --all" can do is use up I/O and, quite frequently, fail because refs are updated. Since we stopped running "git gc", we've not yet seen any GC failures on our internal Bitbucket Server clusters. Bitbucket Server 5.1.x also includes a new "gc.log" (not to be confused with the one Git itself writes) which retains a record of every GC-related process we run in each repository, and how long that process took to complete. That can be useful for getting clearer insight into both how often GC work is being done, and how long it's taking. Upgrading to 5.x can be a bit of an undertaking, since the major version brings API changes, so it's totally understandable that many organizations haven't upgraded yet. I'm just noting that these improvements are there for when such an upgrade becomes viable. Hope this helps! Bryan > > I'd wager that each push sees that a GC is in order, > and doesn't notice that there is one already running. > > - Andreas > > -- > "Totally trivial. Famous last words." > From: Linus Torvalds <torvalds@*.org> > Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-06 17:01 ` Bryan Turner @ 2017-07-11 4:45 ` Andreas Krey 2017-07-11 7:25 ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King 2017-07-11 7:35 ` Flurries of 'git reflog expire' Bryan Turner 2017-07-11 7:31 ` Jeff King 1 sibling, 2 replies; 15+ messages in thread From: Andreas Krey @ 2017-07-11 4:45 UTC (permalink / raw) To: Bryan Turner; +Cc: Jeff King, Git Users On Thu, 06 Jul 2017 10:01:05 +0000, Bryan Turner wrote: .... > Do you know what version of Bitbucket Server is in use? We're on the newest 4.x. ... > - Run "git config gc.auto 0" in Going that route. ... > I also want to add that Bitbucket Server 5.x includes totally > rewritten GC handling. 5.0.x automatically disables auto GC in all > repositories and manages it explicitly, and 5.1.x fully removes use of > "git gc" in favor of running relevant plumbing commands directly. That's the part that irks me. This shouldn't be necessary - git itself should make sure auto GC isn't run in parallel. Now I probably can't evaluate whether a git upgrade would fix this, but given that you are going the do-gc-ourselves route I suppose it wouldn't. ... > Upgrading to 5.x can be a bit of an undertaking, since the major > version brings API changes, The upgrade is on my todo list, but there are plugins that don't appear to be ready for 5.0, notable the jenkins one. Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds <torvalds@*.org> Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire' 2017-07-11 4:45 ` Andreas Krey @ 2017-07-11 7:25 ` Jeff King 2017-07-11 9:06 ` [PATCH] gc: run pre-detach operations under lock Jeff King 2017-07-11 7:35 ` Flurries of 'git reflog expire' Bryan Turner 1 sibling, 1 reply; 15+ messages in thread From: Jeff King @ 2017-07-11 7:25 UTC (permalink / raw) To: Andreas Krey Cc: Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users [Updating the subject since I think this really is a bug]. On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote: > > I also want to add that Bitbucket Server 5.x includes totally > > rewritten GC handling. 5.0.x automatically disables auto GC in all > > repositories and manages it explicitly, and 5.1.x fully removes use of > > "git gc" in favor of running relevant plumbing commands directly. > > That's the part that irks me. This shouldn't be necessary - git itself > should make sure auto GC isn't run in parallel. Now I probably can't > evaluate whether a git upgrade would fix this, but given that you > are going the do-gc-ourselves route I suppose it wouldn't. It's _supposed_ to take a lock, even in older versions. See 64a99eb47 (gc: reject if another gc is running, unless --force is given, 2013-08-08). But it looks like before we take that lock, we sometimes run pack-refs and reflog expire. This is due to 62aad1849 (gc --auto: do not lock refs in the background, 2014-05-25). IMHO this is buggy; it should be checking the lock before calling gc_before_repack() and daemonizing. Annoyingly, the lock code interacts badly with daemonizing because that latter will fork to a new process. So the simple solution like: diff --git a/builtin/gc.c b/builtin/gc.c index 2ba50a287..79480124a 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (report_last_gc_error()) return -1; + if (lock_repo_for_gc(force, &pid)) + return 0; + if (gc_before_repack()) return -1; /* means that anybody looking at the lockfile will report the wrong pid (and thus think the lock is invalid). I guess we'd need to update it in place after daemonizing. -Peff ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH] gc: run pre-detach operations under lock 2017-07-11 7:25 ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King @ 2017-07-11 9:06 ` Jeff King 2017-07-12 16:46 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2017-07-11 9:06 UTC (permalink / raw) To: Andreas Krey Cc: Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users On Tue, Jul 11, 2017 at 03:25:36AM -0400, Jeff King wrote: > Annoyingly, the lock code interacts badly with daemonizing because that > latter will fork to a new process. So the simple solution like: > > diff --git a/builtin/gc.c b/builtin/gc.c > index 2ba50a287..79480124a 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > if (report_last_gc_error()) > return -1; > > + if (lock_repo_for_gc(force, &pid)) > + return 0; > + > if (gc_before_repack()) > return -1; > /* > > means that anybody looking at the lockfile will report the wrong pid > (and thus think the lock is invalid). I guess we'd need to update it in > place after daemonizing. Updating it in place is a bit tricky. I came up with this hack that makes it work, but I'm not sure if the reasoning is too gross. -- >8 -- Subject: [PATCH] gc: run pre-detach operations under lock We normally try to avoid having two auto-gc operations run at the same time, because it wastes resources. This was done long ago in 64a99eb47 (gc: reject if another gc is running, unless --force is given, 2013-08-08). When we do a detached auto-gc, we run the ref-related commands _before_ detaching, to avoid confusing lock contention. This was done by 62aad1849 (gc --auto: do not lock refs in the background, 2014-05-25). These two features do not interact well. The pre-detach operations are run before we check the gc.pid lock, meaning that on a busy repository we may run many of them concurrently. Ideally we'd take the lock before spawning any operations, and hold it for the duration of the program. This is tricky, though, with the way the pid-file interacts with the daemonize() process. Other processes will check that the pid recorded in the pid-file still exists. But detaching causes us to fork and continue running under a new pid. So if we take the lock before detaching, the pid-file will have a bogus pid in it. We'd have to go back and update it with the new pid after detaching. We'd also have to play some tricks with the tempfile subsystem to tweak the "owner" field, so that the parent process does not clean it up on exit, but the child process does. Instead, we can do something a bit simpler: take the lock only for the duration of the pre-detach work, then detach, then take it again for the post-detach work. Technically, this means that the post-detach lock could lose to another process doing pre-detach work. But in the long run this works out. That second process would then follow-up by doing post-detach work. Unless it was in turn blocked by a third process doing pre-detach work, and so on. This could in theory go on indefinitely, as the pre-detach work does not repack, and so need_to_gc() will continue to trigger. But in each round we are racing between the pre- and post-detach locks. Eventually, one of the post-detach locks will win the race and complete the full gc. So in the worst case, we may racily repeat the pre-detach work, but we would never do so simultaneously (it would happen via a sequence of serialized race-wins). Signed-off-by: Jeff King <peff@peff.net> --- builtin/gc.c | 4 ++++ t/t6500-gc.sh | 21 +++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/builtin/gc.c b/builtin/gc.c index bd91f136f..5a535040f 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -414,8 +414,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (report_last_gc_error()) return -1; + if (lock_repo_for_gc(force, &pid)) + return 0; if (gc_before_repack()) return -1; + delete_tempfile(&pidfile); + /* * failure to daemonize is ok, we'll continue * in foreground diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh index cc7acd101..41b0be575 100755 --- a/t/t6500-gc.sh +++ b/t/t6500-gc.sh @@ -95,6 +95,27 @@ test_expect_success 'background auto gc does not run if gc.log is present and re test_line_count = 1 packs ' +test_expect_success 'background auto gc respects lock for all operations' ' + # make sure we run a background auto-gc + test_commit make-pack && + git repack && + test_config gc.autopacklimit 1 && + test_config gc.autodetach true && + + # create a ref whose loose presence we can use to detect a pack-refs run + git update-ref refs/heads/should-be-loose HEAD && + test_path_is_file .git/refs/heads/should-be-loose && + + # now fake a concurrent gc that holds the lock; we can use our + # shell pid so that it looks valid. + hostname=$(hostname || echo unknown) && + printf "$$ %s" "$hostname" >.git/gc.pid && + + # our gc should exit zero without doing anything + run_and_wait_for_auto_gc && + test_path_is_file .git/refs/heads/should-be-loose +' + # DO NOT leave a detached auto gc process running near the end of the # test script: it can run long enough in the background to racily # interfere with the cleanup in 'test_done'. -- 2.13.2.1149.g835628f6b ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] gc: run pre-detach operations under lock 2017-07-11 9:06 ` [PATCH] gc: run pre-detach operations under lock Jeff King @ 2017-07-12 16:46 ` Junio C Hamano 2017-07-12 16:58 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2017-07-12 16:46 UTC (permalink / raw) To: Jeff King Cc: Andreas Krey, Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users Jeff King <peff@peff.net> writes: > Instead, we can do something a bit simpler: take the lock > only for the duration of the pre-detach work, then detach, > then take it again for the post-detach work. Technically, > this means that the post-detach lock could lose to another > process doing pre-detach work. But in the long run this > works out. You might have found this part gross, but I actually don't. It looks like a reasonable practical compromise, and I tried to think of a scenario that this would do a wrong thing but I didn't---it is not like we carry information off-disk from the pre-detach to post-detach work to cause the latter make decisions on it, so this "split into two phrases" looks fairly safe. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] gc: run pre-detach operations under lock 2017-07-12 16:46 ` Junio C Hamano @ 2017-07-12 16:58 ` Jeff King 2017-07-12 21:10 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2017-07-12 16:58 UTC (permalink / raw) To: Junio C Hamano Cc: Andreas Krey, Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users On Wed, Jul 12, 2017 at 09:46:25AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Instead, we can do something a bit simpler: take the lock > > only for the duration of the pre-detach work, then detach, > > then take it again for the post-detach work. Technically, > > this means that the post-detach lock could lose to another > > process doing pre-detach work. But in the long run this > > works out. > > You might have found this part gross, but I actually don't. It > looks like a reasonable practical compromise, and I tried to think > of a scenario that this would do a wrong thing but I didn't---it is > not like we carry information off-disk from the pre-detach to > post-detach work to cause the latter make decisions on it, so this > "split into two phrases" looks fairly safe. Anytime I have to spend a few paragraphs saying "well, it looks like this might behave terribly, but it doesn't because..." I get worried that my analysis is missing a case. And that writing it in a way that avoids that analysis might be safer, even if it's a little more work. I gave it some more thought after sending the earlier message. And I really think it's not "a little more work". Even if we decided to keep the same file and replace the PID in it with the daemonized one, I think that still isn't quite right. Because we don't do so atomically unless we take gc.pid.lock again. But we may actually conflict with somebody else on that! Even though that somebody would just pick up the lock, read gc.pid and say "well, looks like somebody else is running" and release it again. So we'd have to either hold the lock the whole time, or do some kind of retry loop to race with other processes picking up the lock. It's definitely possible, but it's fighting an uphill battle against the way our locking and tempfile code works. So I came to the conclusion that it's not worth the trouble, and what I posted is probably a good compromise. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] gc: run pre-detach operations under lock 2017-07-12 16:58 ` Jeff King @ 2017-07-12 21:10 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2017-07-12 21:10 UTC (permalink / raw) To: Jeff King Cc: Andreas Krey, Nguyễn Thái Ngọc Duy, Bryan Turner, Git Users Jeff King <peff@peff.net> writes: > ... And I > really think it's not "a little more work". Even if we decided to keep > the same file and replace the PID in it with the daemonized one, I think > that still isn't quite right. Because we don't do so atomically unless > we take gc.pid.lock again. Exactly. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-11 4:45 ` Andreas Krey 2017-07-11 7:25 ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King @ 2017-07-11 7:35 ` Bryan Turner 2017-07-11 7:45 ` Jeff King 1 sibling, 1 reply; 15+ messages in thread From: Bryan Turner @ 2017-07-11 7:35 UTC (permalink / raw) To: Andreas Krey; +Cc: Jeff King, Git Users On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey <a.krey@gmx.de> wrote: > On Thu, 06 Jul 2017 10:01:05 +0000, Bryan Turner wrote: > .... >> I also want to add that Bitbucket Server 5.x includes totally >> rewritten GC handling. 5.0.x automatically disables auto GC in all >> repositories and manages it explicitly, and 5.1.x fully removes use of >> "git gc" in favor of running relevant plumbing commands directly. > > That's the part that irks me. This shouldn't be necessary - git itself > should make sure auto GC isn't run in parallel. Now I probably can't > evaluate whether a git upgrade would fix this, but given that you > are going the do-gc-ourselves route I suppose it wouldn't. > I believe I've seen some commits on the mailing list that suggest "git gc --auto" manages its concurrency better in newer versions than it used to, but even then it can only manage its concurrency within a single repository. For a hosting server with thousands, or tens of thousands, of active repositories, there still wouldn't be any protection against "git gc --auto" running concurrently in dozens of them at the same time. But it's not only about concurrency. "git gc" (and by extension "git gc --auto") is a general purpose tool, designed to generally do what you need, and to mostly stay out of your way while it does it. I'd hazard to say it's not really designed for managing heavily-trafficked repositories on busy hosting services, though, and as a result, there are things it can't do. For example, I can configure auto GC to run based on how many loose objects or packs I have, but there's no heuristic to make it repack refs when I have a lot of loose ones, or configure it to _only_ pack refs without repacking objects or pruning reflogs. There are knobs for various things (like "gc.*.reflogExpire"), but those don't give complete control. Even if I set "gc.reflogExpire=never", "git gc" still forks "git reflog expire --all" (compared to "gc.packRefs=false", which completely prevents forking "git pack-refs"). A trace on "git gc" shows this: $ GIT_TRACE=1 git gc 00:10:45.058066 git.c:437 trace: built-in: git 'gc' 00:10:45.067075 run-command.c:369 trace: run_command: 'pack-refs' '--all' '--prune' 00:10:45.077086 git.c:437 trace: built-in: git 'pack-refs' '--all' '--prune' 00:10:45.084098 run-command.c:369 trace: run_command: 'reflog' 'expire' '--all' 00:10:45.093102 git.c:437 trace: built-in: git 'reflog' 'expire' '--all' 00:10:45.097088 run-command.c:369 trace: run_command: 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago' 00:10:45.106096 git.c:437 trace: built-in: git 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago' 00:10:45.107098 run-command.c:369 trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack' 00:10:45.127117 git.c:437 trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack' Counting objects: 6, done. Delta compression using up to 16 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (6/6), done. Total 6 (delta 0), reused 6 (delta 0) 00:10:45.173161 run-command.c:369 trace: run_command: 'prune' '--expire' '2.weeks.ago' 00:10:45.184171 git.c:437 trace: built-in: git 'prune' '--expire' '2.weeks.ago' 00:10:45.199202 run-command.c:369 trace: run_command: 'worktree' 'prune' '--expire' '3.months.ago' 00:10:45.208193 git.c:437 trace: built-in: git 'worktree' 'prune' '--expire' '3.months.ago' 00:10:45.212198 run-command.c:369 trace: run_command: 'rerere' 'gc' 00:10:45.221223 git.c:437 trace: built-in: git 'rerere' 'gc' The bare repositories used by Bitbucket Server: * Don't have reflogs enabled generally, and for the ones that are enabled "gc.*.reflogExpire" is set to "never" * Never have worktrees, so they don't need to be pruned * Never use rerere, so that doesn't need to GC * Have pruning disabled if they've been forked, due to using alternates to manage disk space That means of all the commands "git gc" runs, under the covers, at most only "pack-refs", "repack" and sometimes "prune" have any value. "reflog expire --all" in particular is extremely likely to fail. Which brings up another consideration. "git gc --auto" has no sense of context, or adjacent behavior. Even if it correctly guards against concurrency, it still doesn't know what else is going on. Immediately after a push, Bitbucket Server has many other housekeeping tasks it performs, especially around pull requests. That means pull request refs are disproportionately likely to be "moving" immediately after a push completes--exactly when "git gc --auto" tries to run. (Which tends to be why "reflog expire --all" fails, due ref locking issues with pull request refs.) Bitbucket Server, on the other hand, better understands the context GC is running in. So it can defer GC processing for a period of time after a push completes, to increase the likelihood that the repository is "quiet" and GC can complete without issue. Another limitation is that you can't configure "negative" heuristics, like "Don't run GC more than once per day.". If "git gc --auto"'s heuristics are exceeded, it'll run GC. Depending, for example, on how rapidly a repository generates unreachable objects, it's entirely possible to get to a point where "git gc --auto" wants to run after every single push, sometimes for days in a row, while it waits for objects to hit the prune threshold. By managing GC ourselves, we gain the ability to enforce "cooldowns" to prevent continuous GC. "git gc --auto" also has a tendency to run "attached" to the "git receive-pack" process, which means both that pushing users can have their local process "delayed" while it runs, and that they sometimes get to see "scary" errors that they can't fix (or, often, understand). Newer versions of Git have increased the likelihood that "git gc --auto" will run detached, but that doesn't always happen. (Up to and including 2.13.2, the "git config" documentation for "gc.autoDetach" is qualified with "if the system supports it.") Managing GC in Bitbucket Server guarantees that it's _always_ detached from user processes. That's a few of the reasons we've switched over. I'd imagine most hosting providers take a similarly "hands on" approach to controlling their GC. Beyond a certain scale, it seems almost unavoidable. Git never has more than a repository-level view of the world; only the hosting provider can see the big picture. Best regards, Bryan Turner > ... >> Upgrading to 5.x can be a bit of an undertaking, since the major >> version brings API changes, > > The upgrade is on my todo list, but there are plugins that don't > appear to be ready for 5.0, notable the jenkins one. > > Andreas > > -- > "Totally trivial. Famous last words." > From: Linus Torvalds <torvalds@*.org> > Date: Fri, 22 Jan 2010 07:29:21 -0800 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-11 7:35 ` Flurries of 'git reflog expire' Bryan Turner @ 2017-07-11 7:45 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2017-07-11 7:45 UTC (permalink / raw) To: Bryan Turner; +Cc: Andreas Krey, Git Users On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote: > That's a few of the reasons we've switched over. I'd imagine most > hosting providers take a similarly "hands on" approach to controlling > their GC. Beyond a certain scale, it seems almost unavoidable. Git > never has more than a repository-level view of the world; only the > hosting provider can see the big picture. Thanks for writing this out. I agree with all of the reasons given (in my email which I suspect crossed with yours, I just said "throttling", but there really are a lot of other reasons). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Flurries of 'git reflog expire' 2017-07-06 17:01 ` Bryan Turner 2017-07-11 4:45 ` Andreas Krey @ 2017-07-11 7:31 ` Jeff King 1 sibling, 0 replies; 15+ messages in thread From: Jeff King @ 2017-07-11 7:31 UTC (permalink / raw) To: Bryan Turner; +Cc: Andreas Krey, Git Users On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote: > I also want to add that Bitbucket Server 5.x includes totally > rewritten GC handling. 5.0.x automatically disables auto GC in all > repositories and manages it explicitly, and 5.1.x fully removes use of > "git gc" in favor of running relevant plumbing commands directly. We > moved away from "git gc" specifically to avoid the "git reflog expire > --all", because there's no config setting that _fully disables_ > forking that process. FWIW, I think auto-gc in general is not a good way to handle maintenance on a busy hosting server. Repacking can be very resource hungry (both CPU and memory), and it needs to be throttled. You _could_ throttle with an auto-gc hook, but that isn't very elegant when it comes to re-queueing jobs which fail or timeout. The right model IMHO (and what GitHub uses, and what I'm guessing Bitbucket is doing in more recent versions) is to make note of write operations in a data structure, then use that data to schedule maintenance in a job queue. But that can never really be part of Git itself, as the notion of a system job queue is outside its scope. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-07-12 21:10 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-04 7:57 Flurries of 'git reflog expire' Andreas Krey 2017-07-04 9:43 ` Ævar Arnfjörð Bjarmason 2017-07-06 13:27 ` Andreas Krey 2017-07-05 8:20 ` Jeff King 2017-07-06 13:31 ` Andreas Krey 2017-07-06 17:01 ` Bryan Turner 2017-07-11 4:45 ` Andreas Krey 2017-07-11 7:25 ` [BUG] detached auto-gc does not respect lock for 'reflog expire', was " Jeff King 2017-07-11 9:06 ` [PATCH] gc: run pre-detach operations under lock Jeff King 2017-07-12 16:46 ` Junio C Hamano 2017-07-12 16:58 ` Jeff King 2017-07-12 21:10 ` Junio C Hamano 2017-07-11 7:35 ` Flurries of 'git reflog expire' Bryan Turner 2017-07-11 7:45 ` Jeff King 2017-07-11 7:31 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).