git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>, git@vger.kernel.org
Subject: Re: non-smooth progress  indication for git fsck and git gc
Date: Thu, 16 Aug 2018 16:55:57 -0400	[thread overview]
Message-ID: <20180816205556.GA8257@sigill.intra.peff.net> (raw)
In-Reply-To: <87bma2qcba.fsf@evledraar.gmail.com>

On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is all interesting, but I think unrelated to what Ulrich is talking
> about. Quote:
> 
>     Between the two phases of "git fsck" (checking directories and
>     checking objects) there was a break of several seconds where no
>     progress was indicated
> 
> I.e. it's not about the pause you get with your testcase (which is
> certainly another issue) but the break between the two progress bars.

I think he's talking about both. What I said responds to this:

> >> During "git gc" the writing objects phase did not update for some
> >> seconds, but then the percentage counter jumped like from 15% to 42%.

But yeah, I missed that the fsck thing was specifically about a break
between two meters. That's a separate problem, but also worth
discussing (and hopefully much easier to address).

> If you fsck this repository it'll take around (on my spinning rust
> server) 30 seconds between 100% of "Checking object directories" before
> you get any output from "Checking objects".
> 
> The breakdown of that is (this is from approximate eyeballing):
> 
>  * We spend 1-3 seconds just on this:
>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181

OK, so that's checking the sha1 over the .idx file. We could put a meter
on that. I wouldn't expect it to generally be all that slow outside of
pathological cases, since it scales with the number of objects (and 1s
is our minimum update anyway, so that might be OK as-is). Your case has
13M objects, which is quite large.

>  * We spend the majority of the ~30s on this:
>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79

This is hashing the actual packfile. This is potentially quite long,
especially if you have a ton of big objects.

I wonder if we need to do this as a separate step anyway, though. Our
verification is based on index-pack these days, which means it's going
to walk over the whole content as part of the "Indexing objects" step to
expand base objects and mark deltas for later. Could we feed this hash
as part of that walk over the data? It's not going to save us 30s, but
it's likely to be more efficient. And it would fold the effort naturally
into the existing progress meter.

>  * Wes spend another 3-5 seconds on this QSORT:
>    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105

That's a tough one. I'm not sure how we'd count it (how many compares we
do?). And each item is doing so little work that hitting the progress
code may make things noticeably slower.

Again, your case is pretty big. Just based on the number of objects,
linux.git should be 1.5-2.5 seconds on your machine for the same
operation. Which I think may be small enough to ignore (or even just
print a generic before/after). It's really the 30s packfile hash that's
making the whole thing so terrible.

-Peff

  reply	other threads:[~2018-08-16 20:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-16  6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl
2018-08-16 15:18 ` Duy Nguyen
2018-08-16 16:05   ` Jeff King
2018-08-20  8:27   ` Antw: " Ulrich Windl
2018-08-16 15:57 ` Jeff King
2018-08-16 20:02   ` Jeff King
2018-08-16 22:10     ` Junio C Hamano
2018-08-16 20:35   ` Ævar Arnfjörð Bjarmason
2018-08-16 20:55     ` Jeff King [this message]
2018-08-16 21:06       ` Jeff King
2018-08-17 14:39         ` Duy Nguyen
2018-08-20  8:33       ` Antw: " Ulrich Windl
2018-08-20  8:57         ` Ævar Arnfjörð Bjarmason
2018-08-20  9:37           ` Ulrich Windl
2018-08-21  1:07           ` Jeff King
2018-08-21  6:20             ` Ulrich Windl
2018-08-21 15:21             ` Duy Nguyen
2018-09-01 12:53     ` Ævar Arnfjörð Bjarmason
2018-09-01 13:52       ` Ævar Arnfjörð Bjarmason
2018-09-02  7:46       ` Jeff King
2018-09-02  7:55         ` Jeff King
2018-09-02  8:55           ` Jeff King
2018-09-03 16:48             ` Ævar Arnfjörð Bjarmason
2018-09-07  3:30               ` Jeff King
2018-09-04 15:53           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180816205556.GA8257@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Ulrich.Windl@rz.uni-regensburg.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).