* non-smooth progress indication for git fsck and git gc @ 2018-08-16 6:54 Ulrich Windl 2018-08-16 15:18 ` Duy Nguyen 2018-08-16 15:57 ` Jeff King 0 siblings, 2 replies; 25+ messages in thread From: Ulrich Windl @ 2018-08-16 6:54 UTC (permalink / raw) To: git Hi! I'd like to point out some minor issue observed while processing some 50000-object repository with many binary objects, but most are rather small: Between the two phases of "git fsck" (checking directories and checking objects) there was a break of several seconds where no progress was indicated. During "git gc" the writing objects phase did not update for some seconds, but then the percentage counter jumped like from 15% to 42%. I understand that updating the progress output too often can be a performance bottleneck, while updating it too rarely might only bore the user... ;-) But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3). Regards, Ulrich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl @ 2018-08-16 15:18 ` Duy Nguyen 2018-08-16 16:05 ` Jeff King 2018-08-20 8:27 ` Antw: " Ulrich Windl 2018-08-16 15:57 ` Jeff King 1 sibling, 2 replies; 25+ messages in thread From: Duy Nguyen @ 2018-08-16 15:18 UTC (permalink / raw) To: Ulrich.Windl; +Cc: Git Mailing List On Thu, Aug 16, 2018 at 1:10 PM Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote: > > Hi! > > I'd like to point out some minor issue observed while processing some 50000-object repository with many binary objects, but most are rather small: > > Between the two phases of "git fsck" (checking directories and checking objects) there was a break of several seconds where no progress was indicated. > > During "git gc" the writing objects phase did not update for some seconds, but then the percentage counter jumped like from 15% to 42%. > > I understand that updating the progress output too often can be a performance bottleneck, while upating it too rarely might only bore the user... ;-) > > But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3). Is it possible to make this repository public? You can also use "git fast-export --anonymize" to make a repo with same structure but no real content (but read the man page about that option first) > Regards, > Ulrich > > -- Duy ^ permalink raw reply [flat|nested] 25+ messages in thread
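Duy's `fast-export --anonymize` suggestion can be sketched roughly as follows. This is a minimal demo under assumptions: the repository names `original` and `anon` and the commit message `secret-message` are purely illustrative; in practice the real 5.3G repository stands in for the throwaway one.

```shell
# Build a tiny throwaway repository with recognizable content.
git init -q original
git -C original -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m secret-message

# Export an anonymized stream and replay it into a fresh repository.
git -C original fast-export --anonymize --all >anon.dump
git init -q anon
git -C anon fast-import --quiet <anon.dump

# The structure survives, the content does not: the original commit
# message should no longer appear anywhere in the dump.
! grep -q secret-message anon.dump
```

The anonymized clone keeps the object graph's shape (commit/tree topology, number and rough layout of files), which is what matters for reproducing progress-meter behavior, while scrubbing messages, paths, and blob contents.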
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 15:18 ` Duy Nguyen @ 2018-08-16 16:05 ` Jeff King 2018-08-20 8:27 ` Antw: " Ulrich Windl 1 sibling, 0 replies; 25+ messages in thread From: Jeff King @ 2018-08-16 16:05 UTC (permalink / raw) To: Duy Nguyen; +Cc: Ulrich.Windl, Git Mailing List On Thu, Aug 16, 2018 at 05:18:51PM +0200, Duy Nguyen wrote: > > During "git gc" the writing objects phase did not update for some > > seconds, but then the percentage counter jumped like from 15% to > > 42%. > [...] > > Is it possible to make this repository public? You can also use "git > fast-export --anonymize" to make a repo with same structure but no > real content (but read the man page about that option first) Try this: -- >8 -- git init repo cd repo # one big blob dd if=/dev/urandom of=big bs=1M count=100 git add big git commit -m big # several small blobs for i in $(seq 10); do echo $i >file git add file git commit -m $i done git gc -- >8 -- It "stalls" at 33%, and then jumps straight to 100%. -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-16 15:18 ` Duy Nguyen 2018-08-16 16:05 ` Jeff King @ 2018-08-20 8:27 ` Ulrich Windl 1 sibling, 0 replies; 25+ messages in thread From: Ulrich Windl @ 2018-08-20 8:27 UTC (permalink / raw) To: Duy Nguyen; +Cc: git >>> Duy Nguyen <pclouds@gmail.com> wrote on 16.08.2018 at 17:18 in message <CACsJy8Dukjw_PKQXMTxwd_C3juA_0cqZSjb=1L2wKqtJoC3rkQ@mail.gmail.com>: > On Thu, Aug 16, 2018 at 1:10 PM Ulrich Windl > <Ulrich.Windl@rz.uni-regensburg.de> wrote: >> >> Hi! >> >> I'd like to point out some minor issue observed while processing some > 50000-object repository with many binary objects, but most are rather small: >> >> Between the two phases of "git fsck" (checking directories and checking > objects) there was a break of several seconds where no progress was > indicated. >> >> During "git gc" the writing objects phase did not update for some seconds, > but then the percentage counter jumped like from 15% to 42%. >> >> I understand that updating the progress output too often can be a > performance bottleneck, while upating it too rarely might only bore the > user... ;-) >> >> But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3). > > Is it possible to make this repository public? You can also use "git > fast-export --anonymize" to make a repo with same structure but no > real content (but read the man page about that option first) Hi! Actually I tried that locally, but with the resulting repository both fsck and gc are very fast. So I guess it won't be very useful. Also the original .git directory uses 5.3G, while the anonymous .git just used 4.3M... I tried to capture the behavior as a screencast, but it seems the screencast optimized the little changes away, and in the result git almost had no delay on any operation 8-( Regards, Ulrich > >> Regards, >> Ulrich >> >> > > > -- > Duy ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl 2018-08-16 15:18 ` Duy Nguyen @ 2018-08-16 15:57 ` Jeff King 2018-08-16 20:02 ` Jeff King 2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 25+ messages in thread From: Jeff King @ 2018-08-16 15:57 UTC (permalink / raw) To: Ulrich Windl; +Cc: git On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: > I'd like to point out some minor issue observed while processing some > 50000-object repository with many binary objects, but most are rather > small: > > Between the two phases of "git fsck" (checking directories and > checking objects) there was a break of several seconds where no > progress was indicated. > > During "git gc" the writing objects phase did not update for some > seconds, but then the percentage counter jumped like from 15% to 42%. > > I understand that updating the progress output too often can be a > performance bottleneck, while upating it too rarely might only bore > the user... ;-) We update the counter integer for every object we process, and then actually update the display whenever the percentage increases or a second has elapsed, whichever comes first. What you're seeing is likely an artifact of _what_ we're counting: written objects. Not all objects are the same size, so it's not uncommon to see thousands/sec when dealing with small ones, and then several seconds for one giant blob. The only way to solve that is to count bytes. We don't have a total byte count in most cases, and it wouldn't always make sense (e.g., the "Compressing objects" meter can show the same issue, but it's not really putting through bytes in a linear way). In some cases we do show transmitted size and throughput, but that's just for network operations. We could do the same for "gc" with the patch below. 
But usually throughput isn't all that interesting for a filesystem write, because bandwidth isn't the bottleneck. Possibly we could have a "half throughput" mode that counts up the total size written, but omits the speed indicator. That's not an unreasonable thing to show for a local pack, since you end up with the final pack size. The object counter would still be jumpy, but you'd at least have one number updated at least once per second as you put through a large blob. If you really want a smooth percentage, then we'd have to start counting bytes instead of objects. Two reasons we don't do that are: - we often don't know the total byte size exactly. E.g., for a packfile write, it depends on the result of deflating each object. You can make an approximation and just pretend at the end that you hit 100%. Or you can count the pre-deflate sizes, but then your meter doesn't match the bytes from the throughput counter. - for something like fsck, we're not actually writing out bytes. So I guess you'd be measuring "here's how many bytes of objects I fsck-ed". But is that on-disk compressed bytes, or decompressed bytes? If the former, that's only marginally better as a measure of effort, since delta compression means that a small number of on-disk bytes may require a big effort (imagine processing a 100 byte blob versus a 100 byte delta off of a 100MB blob). The latter is probably more accurate. But it's also not free to pre-generate the total. We can get the number of objects or the size of the packfile in constant-time, but totaling up the uncompressed size of all objects is O(n). So that's extra computation, but it also means a potential lag before we can start the progress meter. I'm also not sure how meaningful a byte count is for a user there. So there. That's probably more than you wanted to know about Git's progress code. I think it probably _could_ be improved by counting more/different things, but I also think it can be a bit of a rabbit hole. 
Which is why AFAIK nobody's really looked too seriously into changing it. -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 15:57 ` Jeff King @ 2018-08-16 20:02 ` Jeff King 2018-08-16 22:10 ` Junio C Hamano 2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 25+ messages in thread From: Jeff King @ 2018-08-16 20:02 UTC (permalink / raw) To: Ulrich Windl; +Cc: git On Thu, Aug 16, 2018 at 11:57:14AM -0400, Jeff King wrote: > The only way to solve that is to count bytes. We don't have a total byte > count in most cases, and it wouldn't always make sense (e.g., the > "Compressing objects" meter can show the same issue, but it's not really > putting through bytes in a linear way). In some cases we do show > transmitted size and throughput, but that's just for network operations. > We could do the same for "gc" with the patch below. But usually > throughput isn't all that interesting for a filesystem write, because > bandwidth isn't the bottleneck. Just realized I forgot to include the patch. Here it is, for reference. Doing something similar for fsck would be quite a bit more invasive. 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 80c880e9ad..e1130b959d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -837,7 +837,7 @@ static void write_pack_file(void) if (pack_to_stdout) f = hashfd_throughput(1, "<stdout>", progress_state); else - f = create_tmp_packfile(&pack_tmp_name); + f = create_tmp_packfile(&pack_tmp_name, progress_state); offset = write_pack_header(f, nr_remaining); diff --git a/bulk-checkin.c b/bulk-checkin.c index 9f3b644811..0df45b8f55 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -178,7 +178,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state, if (!(flags & HASH_WRITE_OBJECT) || state->f) return; - state->f = create_tmp_packfile(&state->pack_tmp_name); + state->f = create_tmp_packfile(&state->pack_tmp_name, NULL); reset_pack_idx_option(&state->pack_idx_opts); /* Pretend we are going to write only one object */ diff --git a/pack-write.c b/pack-write.c index a9d46bc03f..b72480b440 100644 --- a/pack-write.c +++ b/pack-write.c @@ -334,14 +334,15 @@ int encode_in_pack_object_header(unsigned char *hdr, int hdr_len, return n; } -struct hashfile *create_tmp_packfile(char **pack_tmp_name) +struct hashfile *create_tmp_packfile(char **pack_tmp_name, + struct progress *progress) { struct strbuf tmpname = STRBUF_INIT; int fd; fd = odb_mkstemp(&tmpname, "pack/tmp_pack_XXXXXX"); *pack_tmp_name = strbuf_detach(&tmpname, NULL); - return hashfd(fd, *pack_tmp_name); + return hashfd_throughput(fd, *pack_tmp_name, progress); } void finish_tmp_packfile(struct strbuf *name_buffer, diff --git a/pack.h b/pack.h index 34a9d458b4..c87628b093 100644 --- a/pack.h +++ b/pack.h @@ -98,7 +98,8 @@ extern int encode_in_pack_object_header(unsigned char *hdr, int hdr_len, #define PH_ERROR_PROTOCOL (-3) extern int read_pack_header(int fd, struct pack_header *); -extern struct hashfile *create_tmp_packfile(char **pack_tmp_name); +extern struct hashfile *create_tmp_packfile(char **pack_tmp_name, + struct progress 
*progress); extern void finish_tmp_packfile(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]); #endif ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 20:02 ` Jeff King @ 2018-08-16 22:10 ` Junio C Hamano 0 siblings, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2018-08-16 22:10 UTC (permalink / raw) To: Jeff King; +Cc: Ulrich Windl, git Jeff King <peff@peff.net> writes: > On Thu, Aug 16, 2018 at 11:57:14AM -0400, Jeff King wrote: > >> The only way to solve that is to count bytes. We don't have a total byte >> count in most cases, and it wouldn't always make sense (e.g., the >> "Compressing objects" meter can show the same issue, but it's not really >> putting through bytes in a linear way). In some cases we do show >> transmitted size and throughput, but that's just for network operations. >> We could do the same for "gc" with the patch below. But usually >> throughput isn't all that interesting for a filesystem write, because >> bandwidth isn't the bottleneck. > > Just realized I forgot to include the patch. Here it is, for reference. I've been wondering when you'd realize the omission ;-) > Doing something similar for fsck would be quite a bit more invasive. Yeah, on that codepath there is no streaming write passing through a single chokepoint you can count bytes X-<. 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 80c880e9ad..e1130b959d 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -837,7 +837,7 @@ static void write_pack_file(void) > if (pack_to_stdout) > f = hashfd_throughput(1, "<stdout>", progress_state); > else > - f = create_tmp_packfile(&pack_tmp_name); > + f = create_tmp_packfile(&pack_tmp_name, progress_state); > > offset = write_pack_header(f, nr_remaining); > > diff --git a/bulk-checkin.c b/bulk-checkin.c > index 9f3b644811..0df45b8f55 100644 > --- a/bulk-checkin.c > +++ b/bulk-checkin.c > @@ -178,7 +178,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state, > if (!(flags & HASH_WRITE_OBJECT) || state->f) > return; > > - state->f = create_tmp_packfile(&state->pack_tmp_name); > + state->f = create_tmp_packfile(&state->pack_tmp_name, NULL); > reset_pack_idx_option(&state->pack_idx_opts); > > /* Pretend we are going to write only one object */ > diff --git a/pack-write.c b/pack-write.c > index a9d46bc03f..b72480b440 100644 > --- a/pack-write.c > +++ b/pack-write.c > @@ -334,14 +334,15 @@ int encode_in_pack_object_header(unsigned char *hdr, int hdr_len, > return n; > } > > -struct hashfile *create_tmp_packfile(char **pack_tmp_name) > +struct hashfile *create_tmp_packfile(char **pack_tmp_name, > + struct progress *progress) > { > struct strbuf tmpname = STRBUF_INIT; > int fd; > > fd = odb_mkstemp(&tmpname, "pack/tmp_pack_XXXXXX"); > *pack_tmp_name = strbuf_detach(&tmpname, NULL); > - return hashfd(fd, *pack_tmp_name); > + return hashfd_throughput(fd, *pack_tmp_name, progress); > } > > void finish_tmp_packfile(struct strbuf *name_buffer, > diff --git a/pack.h b/pack.h > index 34a9d458b4..c87628b093 100644 > --- a/pack.h > +++ b/pack.h > @@ -98,7 +98,8 @@ extern int encode_in_pack_object_header(unsigned char *hdr, int hdr_len, > #define PH_ERROR_PROTOCOL (-3) > extern int read_pack_header(int fd, struct pack_header *); > > -extern struct hashfile 
*create_tmp_packfile(char **pack_tmp_name); > +extern struct hashfile *create_tmp_packfile(char **pack_tmp_name, > + struct progress *progress); > extern void finish_tmp_packfile(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]); > > #endif ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 15:57 ` Jeff King 2018-08-16 20:02 ` Jeff King @ 2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason 2018-08-16 20:55 ` Jeff King 2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-08-16 20:35 UTC (permalink / raw) To: Jeff King; +Cc: Ulrich Windl, git On Thu, Aug 16 2018, Jeff King wrote: > On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: > >> I'd like to point out some minor issue observed while processing some >> 50000-object repository with many binary objects, but most are rather >> small: >> >> Between the two phases of "git fsck" (checking directories and >> checking objects) there was a break of several seconds where no >> progress was indicated. >> >> During "git gc" the writing objects phase did not update for some >> seconds, but then the percentage counter jumped like from 15% to 42%. >> >> I understand that updating the progress output too often can be a >> performance bottleneck, while upating it too rarely might only bore >> the user... ;-) > > We update the counter integer for every object we process, and then > actually update the display whenever the percentage increases or a > second has elapsed, whichever comes first. > > What you're seeing is likely an artifact of _what_ we're counting: > written objects. Not all objects are the same size, so it's not uncommon > to see thousands/sec when dealing with small ones, and then several > seconds for one giant blob. > > The only way to solve that is to count bytes. We don't have a total byte > count in most cases, and it wouldn't always make sense (e.g., the > "Compressing objects" meter can show the same issue, but it's not really > putting through bytes in a linear way). In some cases we do show > transmitted size and throughput, but that's just for network operations. > We could do the same for "gc" with the patch below. 
But usually > throughput isn't all that interesting for a filesystem write, because > bandwidth isn't the bottleneck. > > Possibly we could have a "half throughput" mode that counts up the total > size written, but omits the speed indicator. That's not an unreasonable > thing to show for a local pack, since you end up with the final pack > size. The object counter would still be jumpy, but you'd at least have > one number updated at least once per second as you put through a large > blob. > > If you really want a smooth percentage, then we'd have to start counting > bytes instead of objects. Two reasons we don't do that are: > > - we often don't know the total byte size exactly. E.g., for a > packfile write, it depends on the result of deflating each object. > You can make an approximation and just pretend at the end that you > hit 100%. Or you can count the pre-deflate sizes, but then your > meter doesn't match the bytes from the throughput counter. > > - for something like fsck, we're not actually writing out bytes. So I > guess you'd be measuring "here's how many bytes of objects I > fsck-ed". But is that on-disk compressed bytes, or decompressed > bytes? > > If the former, that's only marginally better as a measure of effort, > since delta compression means that a small number of on-disk bytes > may require a big effort (imagine processing a 100 byte blob versus > a 100 byte delta off of a 100MB blob). > > The latter is probably more accurate. But it's also not free to > pre-generate the total. We can get the number of objects or the size > of the packfile in constant-time, but totaling up the uncompressed > size of all objects is O(n). So that's extra computation, but it > also means a potential lag before we can start the progress meter. > > I'm also not sure how meaningful a byte count is for a user there. > > So there. That's probably more than you wanted to know about Git's > progress code. 
I think it probably _could_ be improved by counting > more/different things, but I also think it can be a bit of a rabbit > hole. Which is why AFAIK nobody's really looked too seriously into > changing it. > > -Peff This is all interesting, but I think unrelated to what Ulrich is talking about. Quote: Between the two phases of "git fsck" (checking directories and checking objects) there was a break of several seconds where no progress was indicated I.e. it's not about the pause you get with your testcase (which is certainly another issue) but the break between the two progress bars. Here's a test case you can clone: https://github.com/avar/2015-04-03-1M-git (or might already have "locally" :) If you fsck this repository it'll take around (on my spinning rust server) 30 seconds between 100% of "Checking object directories" before you get any output from "Checking objects". The breakdown of that is (this is from approximate eyeballing): * We spend 1-3 seconds just on this: https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181 * We spend the majority of the ~30s on this: https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 * We spend another 3-5 seconds on this QSORT: https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105 I.e. it's not about objects vs. bytes, but the structural problem with the code that we pass a progress bar down to verify_pack() which does a lot of work in verify_pack_index() and verify_packfile() before we even get to iterating over the objects in the file, and only then do we start displaying progress. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason @ 2018-08-16 20:55 ` Jeff King 2018-08-16 21:06 ` Jeff King 2018-08-20 8:33 ` Antw: " Ulrich Windl 2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 25+ messages in thread From: Jeff King @ 2018-08-16 20:55 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: > This is all interesting, but I think unrelated to what Ulrich is talking > about. Quote: > > Between the two phases of "git fsck" (checking directories and > checking objects) there was a break of several seconds where no > progress was indicated > > I.e. it's not about the pause you get with your testcase (which is > certainly another issue) but the break between the two progress bars. I think he's talking about both. What I said responds to this: > >> During "git gc" the writing objects phase did not update for some > >> seconds, but then the percentage counter jumped like from 15% to 42%. But yeah, I missed that the fsck thing was specifically about a break between two meters. That's a separate problem, but also worth discussing (and hopefully much easier to address). > If you fsck this repository it'll take around (on my spinning rust > server) 30 seconds between 100% of "Checking object directories" before > you get any output from "Checking objects". > > The breakdown of that is (this is from approximate eyeballing): > > * We spend 1-3 seconds just on this: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181 OK, so that's checking the sha1 over the .idx file. We could put a meter on that. I wouldn't expect it to generally be all that slow outside of pathological cases, since it scales with the number of objects (and 1s is our minimum update anyway, so that might be OK as-is). Your case has 13M objects, which is quite large. 
> * We spend the majority of the ~30s on this: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 This is hashing the actual packfile. This is potentially quite long, especially if you have a ton of big objects. I wonder if we need to do this as a separate step anyway, though. Our verification is based on index-pack these days, which means it's going to walk over the whole content as part of the "Indexing objects" step to expand base objects and mark deltas for later. Could we feed this hash as part of that walk over the data? It's not going to save us 30s, but it's likely to be more efficient. And it would fold the effort naturally into the existing progress meter. > * Wes spend another 3-5 seconds on this QSORT: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105 That's a tough one. I'm not sure how we'd count it (how many compares we do?). And each item is doing so little work that hitting the progress code may make things noticeably slower. Again, your case is pretty big. Just based on the number of objects, linux.git should be 1.5-2.5 seconds on your machine for the same operation. Which I think may be small enough to ignore (or even just print a generic before/after). It's really the 30s packfile hash that's making the whole thing so terrible. -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
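The pack-hashing cost Peff points to can be approximated from the shell. This is a sketch under assumptions: `hashdemo` is a throwaway repo name, and `sha1sum` merely stands in for git's internal end-to-end checksum pass over the pack (real fsck timings will differ):

```shell
# Throwaway repo holding one largish, incompressible blob so the pack
# has some real size to it.
git init -q hashdemo && cd hashdemo
git config user.email demo@example.com
git config user.name demo
head -c 10000000 /dev/urandom >big   # ~10MB, barely deflates
git add big && git commit -qm big
git gc --quiet

# One sequential hash over the whole pack: roughly the silent work fsck
# does between "Checking object directories" and "Checking objects".
pack=$(ls .git/objects/pack/pack-*.pack)
time sha1sum "$pack"
```

On a 5.3G repository like Ulrich's, that single pass over the pack data is where most of the unmetered gap between the two progress bars would go.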
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 20:55 ` Jeff King @ 2018-08-16 21:06 ` Jeff King 2018-08-17 14:39 ` Duy Nguyen 2018-08-20 8:33 ` Antw: " Ulrich Windl 1 sibling, 1 reply; 25+ messages in thread From: Jeff King @ 2018-08-16 21:06 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote: > > * We spend the majority of the ~30s on this: > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 > > This is hashing the actual packfile. This is potentially quite long, > especially if you have a ton of big objects. > > I wonder if we need to do this as a separate step anyway, though. Our > verification is based on index-pack these days, which means it's going > to walk over the whole content as part of the "Indexing objects" step to > expand base objects and mark deltas for later. Could we feed this hash > as part of that walk over the data? It's not going to save us 30s, but > it's likely to be more efficient. And it would fold the effort naturally > into the existing progress meter. Actually, I take it back. That's the nice, modern way we do it in git-verify-pack. But git-fsck uses the ancient "just walk over all of the idx entries method". It at least sorts in pack order, which is good, but: - it's not multi-threaded, like index-pack/verify-pack - the index-pack way is actually more efficient than pack-ordering for the delta-base cache, because it actually walks the delta-graph in the optimal order Once upon a time verify-pack used this same pack-check code, and we switched in 3de89c9d42 (verify-pack: use index-pack --verify, 2011-06-03). So I suspect the right thing to do here is for fsck to switch to that, too, and then delete pack-check.c entirely. That may well involve us switching the progress to a per-pack view (e.g., "checking pack 1/10 (10%)", followed by its own progress meter). 
But I don't think that would be a bad thing. It's a more realistic view of the work we're actually doing. Although perhaps it's misleading about the total work remaining, because not all packs are the same size (so you see that you're halfway through pack 2/10, but that's quite likely to be half of the total work if it's the one gigantic pack). -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 21:06 ` Jeff King @ 2018-08-17 14:39 ` Duy Nguyen 0 siblings, 0 replies; 25+ messages in thread From: Duy Nguyen @ 2018-08-17 14:39 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Ulrich.Windl, Git Mailing List On Thu, Aug 16, 2018 at 11:08 PM Jeff King <peff@peff.net> wrote: > > On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote: > > > > * We spend the majority of the ~30s on this: > > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 > > > > This is hashing the actual packfile. This is potentially quite long, > > especially if you have a ton of big objects. > > > > I wonder if we need to do this as a separate step anyway, though. Our > > verification is based on index-pack these days, which means it's going > > to walk over the whole content as part of the "Indexing objects" step to > > expand base objects and mark deltas for later. Could we feed this hash > > as part of that walk over the data? It's not going to save us 30s, but > > it's likely to be more efficient. And it would fold the effort naturally > > into the existing progress meter. > > Actually, I take it back. That's the nice, modern way we do it in > git-verify-pack. But git-fsck uses the ancient "just walk over all of > the idx entries method". It at least sorts in pack order, which is good, > but: > > - it's not multi-threaded, like index-pack/verify-pack > > - the index-pack way is actually more efficient than pack-ordering for > the delta-base cache, because it actually walks the delta-graph in > the optimal order > I actually tried to make git-fsck use index-pack --verify at one point. The only thing that stopped it from working was index-pack automatically wrote the newer index version if I remember correctly, and that would fail the final hash check. fsck performance was not a big deal so I dropped it. 
Just saying it should be possible, if someone's interested in that direction. -- Duy ^ permalink raw reply [flat|nested] 25+ messages in thread
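The `index-pack --verify` route Duy mentions can be exercised by hand today; a minimal sketch (the repo name `verifydemo` is illustrative only):

```shell
# Throwaway repo; after gc everything lives in a single pack.
git init -q verifydemo && cd verifydemo
git config user.email demo@example.com
git config user.name demo
seq 1 1000 >f && git add f && git commit -qm one
git gc --quiet

# Re-verify the pack the way verify-pack has since 3de89c9d42:
# index-pack --verify dies (non-zero exit) if the pack contains broken
# objects or links, or if the result disagrees with the existing .idx.
pack=$(ls .git/objects/pack/pack-*.pack)
git index-pack --verify "$pack"
```

An fsck built on this would get the multi-threaded, delta-graph-ordered walk for free, at the cost of handling the index-version mismatch Duy ran into.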
* Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-16 20:55 ` Jeff King 2018-08-16 21:06 ` Jeff King @ 2018-08-20 8:33 ` Ulrich Windl 2018-08-20 8:57 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 25+ messages in thread From: Ulrich Windl @ 2018-08-20 8:33 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, peff; +Cc: git >>> Jeff King <peff@peff.net> wrote on 16.08.2018 at 22:55 in message <20180816205556.GA8257@sigill.intra.peff.net>: > On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> This is all interesting, but I think unrelated to what Ulrich is talking >> about. Quote: >> >> Between the two phases of "git fsck" (checking directories and >> checking objects) there was a break of several seconds where no >> progress was indicated >> >> I.e. it's not about the pause you get with your testcase (which is >> certainly another issue) but the break between the two progress bars. > > I think he's talking about both. What I said responds to this: Hi guys! Yes, I was wondering what git does between the two visible phases, and between the lines I was suggesting another progress message between those phases. At least the maximally unspecific three-dot message "Thinking..." could be displayed ;-) Of course anything more appropriate would be welcome. Also that message should only be displayed if it's foreseeable that the operation will take significant time. In my case (I just repeated it a few minutes ago) the delay is significant (at least 10 seconds). As noted earlier I was hoping to capture the timing in a screencast, but it seems all the delays were just optimized away in the recording. > >> >> During "git gc" the writing objects phase did not update for some >> >> seconds, but then the percentage counter jumped like from 15% to 42%. > > But yeah, I missed that the fsck thing was specifically about a break > between two meters. 
That's a separate problem, but also worth > discussing (and hopefully much easier to address). > >> If you fsck this repository it'll take around (on my spinning rust >> server) 30 seconds between 100% of "Checking object directories" before >> you get any output from "Checking objects". >> >> The breakdown of that is (this is from approximate eyeballing): >> >> * We spend 1-3 seconds just on this: >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > -check.c#L181 > > OK, so that's checking the sha1 over the .idx file. We could put a meter > on that. I wouldn't expect it to generally be all that slow outside of > pathological cases, since it scales with the number of objects (and 1s > is our minimum update anyway, so that might be OK as-is). Your case has > 13M objects, which is quite large. Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my CPU in question is an AMD Phenom2 quad-core with 3.2GHz nominal, and there is a classic spinning disk with 5400RPM built in... > >> * We spend the majority of the ~30s on this: >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > -check.c#L70-L79 > > This is hashing the actual packfile. This is potentially quite long, > especially if you have a ton of big objects. That seems to apply. BTW: Is there a way to get some repository statistics like a histogram of object sizes (or whatever else might be useful to help make decisions)? > > I wonder if we need to do this as a separate step anyway, though. Our > verification is based on index-pack these days, which means it's going > to walk over the whole content as part of the "Indexing objects" step to > expand base objects and mark deltas for later. Could we feed this hash > as part of that walk over the data? It's not going to save us 30s, but > it's likely to be more efficient. And it would fold the effort naturally > into the existing progress meter. 
> >> * Wes spend another 3-5 seconds on this QSORT: >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > -check.c#L105 > > That's a tough one. I'm not sure how we'd count it (how many compares we > do?). And each item is doing so little work that hitting the progress > code may make things noticeably slower. If it's sorting, maybe add some code like (wild guess): if (objects_to_sort > magic_number) message("Sorting something..."); > > Again, your case is pretty big. Just based on the number of objects, > linux.git should be 1.5-2.5 seconds on your machine for the same > operation. Which I think may be small enough to ignore (or even just > print a generic before/after). It's really the 30s packfile hash that's > making the whole thing so terrible. > > -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-20 8:33 ` Antw: " Ulrich Windl @ 2018-08-20 8:57 ` Ævar Arnfjörð Bjarmason 2018-08-20 9:37 ` Ulrich Windl 2018-08-21 1:07 ` Jeff King 0 siblings, 2 replies; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-08-20 8:57 UTC (permalink / raw) To: Ulrich Windl; +Cc: peff, git On Mon, Aug 20 2018, Ulrich Windl wrote: >>>> Jeff King <peff@peff.net> schrieb am 16.08.2018 um 22:55 in Nachricht > <20180816205556.GA8257@sigill.intra.peff.net>: >> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> This is all interesting, but I think unrelated to what Ulrich is talking >>> about. Quote: >>> >>> Between the two phases of "git fsck" (checking directories and >>> checking objects) there was a break of several seconds where no >>> progress was indicated >>> >>> I.e. it's not about the pause you get with your testcase (which is >>> certainly another issue) but the break between the two progress bars. >> >> I think he's talking about both. What I said responds to this: > > Hi guys! > > Yes, I was wondering what git does between the two visible phases, and between > the lines I was suggesting another progress message between those phases. At > least the maximum unspecific three-dot-message "Thinking..." could be displayed > ;-) Of course anything more appropriate would be welcome. > Also that message should only be displayed if it's foreseeable that the > operation will take significant time. In my case (I just repeated it a few > minutes ago) the delay is significant (at least 10 seconds). As noted earlier I > was hoping to capture the timing in a screencast, but it seems all the delays > were just optimized away in the recording. > >> >>> >> During "git gc" the writing objects phase did not update for some >>> >> seconds, but then the percentage counter jumped like from 15% to 42%. 
>> >> But yeah, I missed that the fsck thing was specifically about a break >> between two meters. That's a separate problem, but also worth >> discussing (and hopefully much easier to address). >> >>> If you fsck this repository it'll take around (on my spinning rust >>> server) 30 seconds between 100% of "Checking object directories" before >>> you get any output from "Checking objects". >>> >>> The breakdown of that is (this is from approximate eyeballing): >>> >>> * We spend 1-3 seconds just on this: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L181 >> >> OK, so that's checking the sha1 over the .idx file. We could put a meter >> on that. I wouldn't expect it to generally be all that slow outside of >> pathological cases, since it scales with the number of objects (and 1s >> is our minimum update anyway, so that might be OK as-is). Your case has >> 13M objects, which is quite large. > > Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my > CPU is question is an AMD Phenom2 quad-core with 3.2GHz nominal, and there is a > classic spinning disk with 5400RPM built in... > >> >>> * We spend the majority of the ~30s on this: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L70-L79 >> >> This is hashing the actual packfile. This is potentially quite long, >> especially if you have a ton of big objects. > > That seems to apply. BTW: Is there a way go get some repository statistics > like a histogram of object sizes (or whatever that might be useful to help > making decisions)? The git-sizer program is really helpful in this regard: https://github.com/github/git-sizer >> >> I wonder if we need to do this as a separate step anyway, though. Our >> verification is based on index-pack these days, which means it's going >> to walk over the whole content as part of the "Indexing objects" step to >> expand base objects and mark deltas for later. 
Could we feed this hash >> as part of that walk over the data? It's not going to save us 30s, but >> it's likely to be more efficient. And it would fold the effort naturally >> into the existing progress meter. >> >>> * Wes spend another 3-5 seconds on this QSORT: >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack > >> -check.c#L105 >> >> That's a tough one. I'm not sure how we'd count it (how many compares we >> do?). And each item is doing so little work that hitting the progress >> code may make things noticeably slower. > > If it's sorting, maybe add some code like (wild guess): > > if (objects_to_sort > magic_number) > message("Sorting something..."); I think a good solution to these cases is to just introduce something to the progress.c mode where it learns how to display a counter where we don't know what the end-state will be. Something like your proposed magic_number can just be covered under the more general case where we don't show the progress bar unless it's been 1 second (which I believe is the default). >> >> Again, your case is pretty big. Just based on the number of objects, >> linux.git should be 1.5-2.5 seconds on your machine for the same >> operation. Which I think may be small enough to ignore (or even just >> print a generic before/after). It's really the 30s packfile hash that's >> making the whole thing so terrible. >> >> -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-20 8:57 ` Ævar Arnfjörð Bjarmason @ 2018-08-20 9:37 ` Ulrich Windl 2018-08-21 1:07 ` Jeff King 1 sibling, 0 replies; 25+ messages in thread From: Ulrich Windl @ 2018-08-20 9:37 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: peff, git Hi! Here are some stats from the repository. First the fast-import ones (which had good performance, but probably all cached, too):

% git fast-import <../git-stream
/usr/lib/git/git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:        55000
Total objects:          51959 (        56 duplicates                  )
      blobs  :          47991 (         0 duplicates          0 deltas of          0 attempts)
      trees  :           3946 (        56 duplicates        994 deltas of       3929 attempts)
      commits:             22 (         0 duplicates          0 deltas of          0 attempts)
      tags   :              0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:            15 (        15 loads     )
      marks:          1048576 (     48013 unique    )
      atoms:            43335
Memory total:            9611 KiB
       pools:           7033 KiB
     objects:           2578 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =       1780
pack_report: pack_mmap_calls          =         23
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =    2905751 /    2905751
---------------------------------------------------------------------

Then the output from git-sizer:

Processing blobs: 47991
Processing trees: 3946
Processing commits: 22
Matching commits to trees: 22
Processing annotated tags: 0
Processing references: 15

| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Blobs                      |           |                                |
|   * Total size               | 13.7 GiB  | *                              |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Trees                      |           |                                |
|   * Maximum entries [1]      | 13.4 k    | *************                  |
| * Blobs                      |           |                                |
|   * Maximum size [2]         | 279 MiB   | *****************************  |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Maximum path depth [3]     | 10        | *                              |
| * Maximum path length [3]    | 130 B     | *                              |
| * Total size of files [3]    | 8.63 GiB  | *********                      |

[1] b701345cbd4317276568b9d9890fd77f38933bdc (refs/heads/master:Resources/default/CIFP)
[2] 19f54c4a7595667329c1be23200234f0fe50ac56 (refs/heads/master:Resources/default/apt.dat)
[3] b0e3d3a2b7f2504117408f13486c905a8eb8fb1e (refs/heads/master^{tree})

Some notes:
[1] is a directory with many short (typically < 1kB) text files
[2] is a very large text file with many changes
[3] Yes, some paths are long

Regards, Ulrich >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> schrieb am 20.08.2018 um 10:57 in Nachricht <87woslpg9i.fsf@evledraar.gmail.com>: > On Mon, Aug 20 2018, Ulrich Windl wrote: > >>>>> Jeff King <peff@peff.net> schrieb am 16.08.2018 um 22:55 in Nachricht >> <20180816205556.GA8257@sigill.intra.peff.net>: >>> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote: >>> >>>> This is all interesting, but I think unrelated to what Ulrich is talking >>>> about. Quote: >>>> >>>> Between the two phases of "git fsck" (checking directories and >>>> checking objects) there was a break of several seconds where no >>>> progress was indicated >>>> >>>> I.e. it's not about the pause you get with your testcase (which is >>>> certainly another issue) but the break between the two progress bars. >>> >>> I think he's talking about both. What I said responds to this: >> >> Hi guys! >> >> Yes, I was wondering what git does between the two visible phases, and > between >> the lines I was suggesting another progress message between those phases. At >> least the maximum unspecific three-dot-message "Thinking..." could be > displayed >> ;-) Of course anything more appropriate would be welcome. >> Also that message should only be displayed if it's foreseeable that the >> operation will take significant time. In my case (I just repeated it a few >> minutes ago) the delay is significant (at least 10 seconds). 
As noted > earlier I >> was hoping to capture the timing in a screencast, but it seems all the > delays >> were just optimized away in the recording. >> >>> >>>> >> During "git gc" the writing objects phase did not update for some >>>> >> seconds, but then the percentage counter jumped like from 15% to 42%. >>> >>> But yeah, I missed that the fsck thing was specifically about a break >>> between two meters. That's a separate problem, but also worth >>> discussing (and hopefully much easier to address). >>> >>>> If you fsck this repository it'll take around (on my spinning rust >>>> server) 30 seconds between 100% of "Checking object directories" before >>>> you get any output from "Checking objects". >>>> >>>> The breakdown of that is (this is from approximate eyeballing): >>>> >>>> * We spend 1-3 seconds just on this: >>>> >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack >> >>> -check.c#L181 >>> >>> OK, so that's checking the sha1 over the .idx file. We could put a meter >>> on that. I wouldn't expect it to generally be all that slow outside of >>> pathological cases, since it scales with the number of objects (and 1s >>> is our minimum update anyway, so that might be OK as-is). Your case has >>> 13M objects, which is quite large. >> >> Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my >> CPU is question is an AMD Phenom2 quad-core with 3.2GHz nominal, and there > is a >> classic spinning disk with 5400RPM built in... >> >>> >>>> * We spend the majority of the ~30s on this: >>>> >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack >> >>> -check.c#L70-L79 >>> >>> This is hashing the actual packfile. This is potentially quite long, >>> especially if you have a ton of big objects. >> >> That seems to apply. BTW: Is there a way go get some repository statistics >> like a histogram of object sizes (or whatever that might be useful to help >> making decisions)? 
> > The git-sizer program is really helpful in this regard: > https://github.com/github/git-sizer > >>> >>> I wonder if we need to do this as a separate step anyway, though. Our >>> verification is based on index-pack these days, which means it's going >>> to walk over the whole content as part of the "Indexing objects" step to >>> expand base objects and mark deltas for later. Could we feed this hash >>> as part of that walk over the data? It's not going to save us 30s, but >>> it's likely to be more efficient. And it would fold the effort naturally >>> into the existing progress meter. >>> >>>> * Wes spend another 3-5 seconds on this QSORT: >>>> >>> >> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack >> >>> -check.c#L105 >>> >>> That's a tough one. I'm not sure how we'd count it (how many compares we >>> do?). And each item is doing so little work that hitting the progress >>> code may make things noticeably slower. >> >> If it's sorting, maybe add some code like (wild guess): >> >> if (objects_to_sort > magic_number) >> message("Sorting something..."); > > I think a good solution to these cases is to just introduce something to > the progress.c mode where it learns how to display a counter where we > don't know what the end-state will be. Something like your proposed > magic_number can just be covered under the more general case where we > don't show the progress bar unless it's been 1 second (which I believe > is the default). > >>> >>> Again, your case is pretty big. Just based on the number of objects, >>> linux.git should be 1.5-2.5 seconds on your machine for the same >>> operation. Which I think may be small enough to ignore (or even just >>> print a generic before/after). It's really the 30s packfile hash that's >>> making the whole thing so terrible. >>> >>> -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-20 8:57 ` Ævar Arnfjörð Bjarmason 2018-08-20 9:37 ` Ulrich Windl @ 2018-08-21 1:07 ` Jeff King 2018-08-21 6:20 ` Ulrich Windl 2018-08-21 15:21 ` Duy Nguyen 1 sibling, 2 replies; 25+ messages in thread From: Jeff King @ 2018-08-21 1:07 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Mon, Aug 20, 2018 at 10:57:13AM +0200, Ævar Arnfjörð Bjarmason wrote: > > That seems to apply. BTW: Is there a way go get some repository statistics > > like a histogram of object sizes (or whatever that might be useful to help > > making decisions)? > > The git-sizer program is really helpful in this regard: > https://github.com/github/git-sizer Yeah, I'd very much agree that's the best tool for a general overview of the repository stats. Ulrich, if you want more analysis (like a histogram), probably something like: git cat-file --batch-all-objects --batch-check='%(objectsize)' might be a good starting place to dump information (see the "BATCH OUTPUT" section of "git help cat-file" for more format items). > > If it's sorting, maybe add some code like (wild guess): > > > > if (objects_to_sort > magic_number) > > message("Sorting something..."); > > I think a good solution to these cases is to just introduce something to > the progress.c mode where it learns how to display a counter where we > don't know what the end-state will be. Something like your proposed > magic_number can just be covered under the more general case where we > don't show the progress bar unless it's been 1 second (which I believe > is the default). Yeah. We already have open-ended counters (e.g., "counting objects"), and delayed meters (we actually normalized the default to 2s recently). I _think_ they should work together OK without further modification. 
Once upon a time the caller had to say "don't show if we're past N% after M seconds", but I think with the current code we'd just show it if we're not completely finished after 2 seconds. So it really should just be a simple: progress = start_delayed_progress("Hashing packfile", 0); That said, counting bytes would probably be ugly (just because the numbers get really big). We'd probably prefer to show a throughput or something. And as you noted, there's some refactoring to be done with pack-check for it to show multiple progress meters. (I still think in the long run we would want to scrap that code, but that's a much bigger job; I'm fine if somebody wants to do incremental improvements in the meantime). -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-21 1:07 ` Jeff King @ 2018-08-21 6:20 ` Ulrich Windl 2018-08-21 15:21 ` Duy Nguyen 1 sibling, 0 replies; 25+ messages in thread From: Ulrich Windl @ 2018-08-21 6:20 UTC (permalink / raw) To: peff; +Cc: git >>> Jeff King <peff@peff.net> schrieb am 21.08.2018 um 03:07 in Nachricht <20180821010712.GA32126@sigill.intra.peff.net>: > On Mon, Aug 20, 2018 at 10:57:13AM +0200, Ævar Arnfjörð Bjarmason wrote: > [...] > So it really should just be a simple: > > progress = start_delayed_progress("Hashing packfile", 0); > > That said, counting bytes would probably be ugly (just because the > numbers get really big). We'd probably prefer to show a throughput or Hi! With big numbers you could apply scaling (the usual ISO suffixes), but after having read GBs, there wouldn't be much change visible unless you add significant fractional digits. So scaling seems good for absolute numbers, but not for growing numbers. Of course one could recursively scale the fractional parts again, but then the result becomes lengthy again. Regards, Ulrich > something. And as you noted, there's some refactoring to be done with > pack-check for it to show multiple progress meters. > > (I still think in the long run we would want to scrap that code, but > that's a much bigger job; I'm fine if somebody wants to do incremental > improvements in the meantime). > > -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Antw: Re: non-smooth progress indication for git fsck and git gc 2018-08-21 1:07 ` Jeff King 2018-08-21 6:20 ` Ulrich Windl @ 2018-08-21 15:21 ` Duy Nguyen 1 sibling, 0 replies; 25+ messages in thread From: Duy Nguyen @ 2018-08-21 15:21 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Ulrich Windl, Git Mailing List On Tue, Aug 21, 2018 at 3:13 AM Jeff King <peff@peff.net> wrote: > I _think_ they should work together OK without further modification. > Once upon a time the caller had to say "don't show if we're past N% > after M seconds", but I think with the current code we'd just show it if > we're not completely finished after 2 seconds. > > So it really should just be a simple: > > progress = start_delayed_progress("Hashing packfile", 0); > > That said, counting bytes would probably be ugly (just because the > numbers get really big). We'd probably prefer to show a throughput or > something. Or just an ASCII throbber. I think the important thing is to communicate "I'm still doing something, not hanging up". "Hashing packfile" satisfies the "something" part, the throbber the "hanging" part. -- Duy ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-08-16 20:35 ` Ævar Arnfjörð Bjarmason 2018-08-16 20:55 ` Jeff King @ 2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason 2018-09-01 13:52 ` Ævar Arnfjörð Bjarmason 2018-09-02 7:46 ` Jeff King 1 sibling, 2 replies; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-09-01 12:53 UTC (permalink / raw) To: Jeff King; +Cc: Ulrich Windl, git On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote: > On Thu, Aug 16 2018, Jeff King wrote: > >> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: >> >>> I'd like to point out some minor issue observed while processing some >>> 50000-object repository with many binary objects, but most are rather >>> small: >>> >>> Between the two phases of "git fsck" (checking directories and >>> checking objects) there was a break of several seconds where no >>> progress was indicated. >>> >>> During "git gc" the writing objects phase did not update for some >>> seconds, but then the percentage counter jumped like from 15% to 42%. >>> >>> I understand that updating the progress output too often can be a >>> performance bottleneck, while upating it too rarely might only bore >>> the user... ;-) >> >> We update the counter integer for every object we process, and then >> actually update the display whenever the percentage increases or a >> second has elapsed, whichever comes first. >> >> What you're seeing is likely an artifact of _what_ we're counting: >> written objects. Not all objects are the same size, so it's not uncommon >> to see thousands/sec when dealing with small ones, and then several >> seconds for one giant blob. >> >> The only way to solve that is to count bytes. We don't have a total byte >> count in most cases, and it wouldn't always make sense (e.g., the >> "Compressing objects" meter can show the same issue, but it's not really >> putting through bytes in a linear way). 
In some cases we do show >> transmitted size and throughput, but that's just for network operations. >> We could do the same for "gc" with the patch below. But usually >> throughput isn't all that interesting for a filesystem write, because >> bandwidth isn't the bottleneck. >> >> Possibly we could have a "half throughput" mode that counts up the total >> size written, but omits the speed indicator. That's not an unreasonable >> thing to show for a local pack, since you end up with the final pack >> size. The object counter would still be jumpy, but you'd at least have >> one number updated at least once per second as you put through a large >> blob. >> >> If you really want a smooth percentage, then we'd have to start counting >> bytes instead of objects. Two reasons we don't do that are: >> >> - we often don't know the total byte size exactly. E.g., for a >> packfile write, it depends on the result of deflating each object. >> You can make an approximation and just pretend at the end that you >> hit 100%. Or you can count the pre-deflate sizes, but then your >> meter doesn't match the bytes from the throughput counter. >> >> - for something like fsck, we're not actually writing out bytes. So I >> guess you'd be measuring "here's how many bytes of objects I >> fsck-ed". But is that on-disk compressed bytes, or decompressed >> bytes? >> >> If the former, that's only marginally better as a measure of effort, >> since delta compression means that a small number of on-disk bytes >> may require a big effort (imagine processing a 100 byte blob versus >> a 100 byte delta off of a 100MB blob). >> >> The latter is probably more accurate. But it's also not free to >> pre-generate the total. We can get the number of objects or the size >> of the packfile in constant-time, but totaling up the uncompressed >> size of all objects is O(n). So that's extra computation, but it >> also means a potential lag before we can start the progress meter. 
>> >> I'm also not sure how meaningful a byte count is for a user there. >> >> So there. That's probably more than you wanted to know about Git's >> progress code. I think it probably _could_ be improved by counting >> more/different things, but I also think it can be a bit of a rabbit >> hole. Which is why AFAIK nobody's really looked too seriously into >> changing it. >> >> -Peff > > This is all interesting, but I think unrelated to what Ulrich is talking > about. Quote: > > Between the two phases of "git fsck" (checking directories and > checking objects) there was a break of several seconds where no > progress was indicated > > I.e. it's not about the pause you get with your testcase (which is > certainly another issue) but the break between the two progress bars. > > Here's a test case you can clone: > https://github.com/avar/2015-04-03-1M-git (or might already have > "locally" :) > > If you fsck this repository it'll take around (on my spinning rust > server) 30 seconds between 100% of "Checking object directories" before > you get any output from "Checking objects". > > The breakdown of that is (this is from approximate eyeballing): > > * We spend 1-3 seconds just on this: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181 > > * We spend the majority of the ~30s on this: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 > > * Wes spend another 3-5 seconds on this QSORT: > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105 > > I.e. it's not about objects v.s. bytes, but the structural problem with > the code that we pass a progress bar down to verify_pack() which does a > lot of work in verify_pack_index() and verify_packfile() before we even > get to iterating over the objects in the file, and only then do we start > displaying progress. 
This patch almost entirely fixes the problem, except for the 1-3 seconds we take on the qsort:

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index df0630bc6d..6d6b9b54c4 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -25,6 +25,7 @@
 
 #include "sha1.h"
 #include "ubc_check.h"
+#include "../progress.h"
 
 #if (defined(__amd64__) || defined(__amd64) || defined(__x86_64__) || defined(__x86_64) || \
     defined(i386) || defined(__i386) || defined(__i386__) || defined(__i486__) || \
@@ -1820,6 +1821,8 @@ void SHA1DCSetCallback(SHA1_CTX* ctx, collision_block_callback callback)
 void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len)
 {
 	unsigned left, fill;
+	struct progress *progress;
+	size_t len_cp = len, done = 0;
 
 	if (len == 0)
 		return;
@@ -1836,6 +1839,8 @@ void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len)
 		len -= fill;
 		left = 0;
 	}
+
+	progress = start_delayed_progress(_("Hashing"), len_cp);
 	while (len >= 64)
 	{
 		ctx->total += 64;
@@ -1848,7 +1853,12 @@ void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len)
 #endif /* defined(SHA1DC_ALLOW_UNALIGNED_ACCESS) */
 		buf += 64;
 		len -= 64;
+		done += 64;
+		display_progress(progress, done);
 	}
+	display_progress(progress, len_cp);
+	stop_progress(&progress);
+
 	if (len > 0)
 	{
 		ctx->total += len;

With this we'll get output like:

    $ ~/g/git/git -C ~/g/2015-04-03-1M-git/ --exec-path=$PWD fsck
    Checking object directories: 100% (256/256), done.
    Hashing: 100% (452634108/452634108), done.
    Hashing: 100% (1073741824/1073741824), done.
    Hashing: 100% (1073741824/1073741824), done.
    Hashing: 100% (1008001572/1008001572), done.
    Checking objects: 2% (262144/13064614)
    ^C

All tests pass with this. Isn't it awesome? Except it's of course a massive hack, we wouldn't want to just hook into SHA1DC like this. 
The problem comes down to us needing to call git_hash_sha1_update() with some really huge input; that function is going to take a *long* time, and the only way we're getting incremental progress is: 1) if we ourselves split the input into N chunks, or 2) if we hack into the SHA1 library itself. This patch does #2, but for this to be acceptable we'd need to do something like #1. The least hacky way I can think of doing this is something like the following: We'd define two git_hash_algo entries for SHA-1, one that could give you progress reports, and that one would only ever hash in chunks of at most size N, and would wrap git_SHA1_Update() in something similar to the progress code above. We'd also need to extend the git_hash_algo struct to e.g. have a char * we'd use as a description, instead of just "Hashing". Then the caller here could just switch out the hash algo if show_progress was true, temporarily set some "int do_progress" and "char *progress_description" entries in the struct, and off we go: https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L69-L80 Still a hack, but I can't see another way of making it work without compromising all hashing performance with some progress reporting. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc 2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason @ 2018-09-01 13:52 ` Ævar Arnfjörð Bjarmason 2018-09-02 7:46 ` Jeff King 1 sibling, 0 replies; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-09-01 13:52 UTC (permalink / raw) To: Jeff King; +Cc: Ulrich Windl, git On Sat, Sep 01 2018, Ævar Arnfjörð Bjarmason wrote: > On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote: > >> On Thu, Aug 16 2018, Jeff King wrote: >> >>> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: >>> >>>> I'd like to point out some minor issue observed while processing some >>>> 50000-object repository with many binary objects, but most are rather >>>> small: >>>> >>>> Between the two phases of "git fsck" (checking directories and >>>> checking objects) there was a break of several seconds where no >>>> progress was indicated. >>>> >>>> During "git gc" the writing objects phase did not update for some >>>> seconds, but then the percentage counter jumped like from 15% to 42%. >>>> >>>> I understand that updating the progress output too often can be a >>>> performance bottleneck, while upating it too rarely might only bore >>>> the user... ;-) >>> >>> We update the counter integer for every object we process, and then >>> actually update the display whenever the percentage increases or a >>> second has elapsed, whichever comes first. >>> >>> What you're seeing is likely an artifact of _what_ we're counting: >>> written objects. Not all objects are the same size, so it's not uncommon >>> to see thousands/sec when dealing with small ones, and then several >>> seconds for one giant blob. >>> >>> The only way to solve that is to count bytes. We don't have a total byte >>> count in most cases, and it wouldn't always make sense (e.g., the >>> "Compressing objects" meter can show the same issue, but it's not really >>> putting through bytes in a linear way). 
In some cases we do show >>> transmitted size and throughput, but that's just for network operations. >>> We could do the same for "gc" with the patch below. But usually >>> throughput isn't all that interesting for a filesystem write, because >>> bandwidth isn't the bottleneck. >>> >>> Possibly we could have a "half throughput" mode that counts up the total >>> size written, but omits the speed indicator. That's not an unreasonable >>> thing to show for a local pack, since you end up with the final pack >>> size. The object counter would still be jumpy, but you'd at least have >>> one number updated at least once per second as you put through a large >>> blob. >>> >>> If you really want a smooth percentage, then we'd have to start counting >>> bytes instead of objects. Two reasons we don't do that are: >>> >>> - we often don't know the total byte size exactly. E.g., for a >>> packfile write, it depends on the result of deflating each object. >>> You can make an approximation and just pretend at the end that you >>> hit 100%. Or you can count the pre-deflate sizes, but then your >>> meter doesn't match the bytes from the throughput counter. >>> >>> - for something like fsck, we're not actually writing out bytes. So I >>> guess you'd be measuring "here's how many bytes of objects I >>> fsck-ed". But is that on-disk compressed bytes, or decompressed >>> bytes? >>> >>> If the former, that's only marginally better as a measure of effort, >>> since delta compression means that a small number of on-disk bytes >>> may require a big effort (imagine processing a 100 byte blob versus >>> a 100 byte delta off of a 100MB blob). >>> >>> The latter is probably more accurate. But it's also not free to >>> pre-generate the total. We can get the number of objects or the size >>> of the packfile in constant-time, but totaling up the uncompressed >>> size of all objects is O(n). So that's extra computation, but it >>> also means a potential lag before we can start the progress meter. 
>>> >>> I'm also not sure how meaningful a byte count is for a user there. >>> >>> So there. That's probably more than you wanted to know about Git's >>> progress code. I think it probably _could_ be improved by counting >>> more/different things, but I also think it can be a bit of a rabbit >>> hole. Which is why AFAIK nobody's really looked too seriously into >>> changing it. >>> >>> -Peff >> >> This is all interesting, but I think unrelated to what Ulrich is talking >> about. Quote: >> >> Between the two phases of "git fsck" (checking directories and >> checking objects) there was a break of several seconds where no >> progress was indicated >> >> I.e. it's not about the pause you get with your testcase (which is >> certainly another issue) but the break between the two progress bars. >> >> Here's a test case you can clone: >> https://github.com/avar/2015-04-03-1M-git (or might already have >> "locally" :) >> >> If you fsck this repository it'll take around (on my spinning rust >> server) 30 seconds between 100% of "Checking object directories" before >> you get any output from "Checking objects". >> >> The breakdown of that is (this is from approximate eyeballing): >> >> * We spend 1-3 seconds just on this: >> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181 >> >> * We spend the majority of the ~30s on this: >> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 >> >> * We spend another 3-5 seconds on this QSORT: >> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105 >> >> I.e. it's not about objects vs. bytes, but the structural problem with >> the code that we pass a progress bar down to verify_pack() which does a >> lot of work in verify_pack_index() and verify_packfile() before we even >> get to iterating over the objects in the file, and only then do we start >> displaying progress. 
> > This patch almost entirely fixes the problem, except for the 1-3 seconds > we take on the qsort: > > diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c > index df0630bc6d..6d6b9b54c4 100644 > --- a/sha1dc/sha1.c > +++ b/sha1dc/sha1.c > @@ -25,6 +25,7 @@ > > #include "sha1.h" > #include "ubc_check.h" > +#include "../progress.h" > > #if (defined(__amd64__) || defined(__amd64) || defined(__x86_64__) || defined(__x86_64) || \ > defined(i386) || defined(__i386) || defined(__i386__) || defined(__i486__) || \ > @@ -1820,6 +1821,8 @@ void SHA1DCSetCallback(SHA1_CTX* ctx, collision_block_callback callback) > void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len) > { > unsigned left, fill; > + struct progress *progress; > + size_t len_cp = len, done = 0; > > if (len == 0) > return; > @@ -1836,6 +1839,8 @@ void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len) > len -= fill; > left = 0; > } > + > + progress = start_delayed_progress(_("Hashing"), len_cp); > while (len >= 64) > { > ctx->total += 64; > @@ -1848,7 +1853,12 @@ void SHA1DCUpdate(SHA1_CTX* ctx, const char* buf, size_t len) > #endif /* defined(SHA1DC_ALLOW_UNALIGNED_ACCESS) */ > buf += 64; > len -= 64; > + done += 64; > + display_progress(progress, done); > } > + display_progress(progress, len_cp); > + stop_progress(&progress); > + > if (len > 0) > { > ctx->total += len; > > With this we'll get output like: > > $ ~/g/git/git -C ~/g/2015-04-03-1M-git/ --exec-path=$PWD fsck > Checking object directories: 100% (256/256), done. > Hashing: 100% (452634108/452634108), done. > Hashing: 100% (1073741824/1073741824), done. > Hashing: 100% (1073741824/1073741824), done. > Hashing: 100% (1008001572/1008001572), done. > Checking objects: 2% (262144/13064614) > ^C > > All tests pass with this. Isn't it awesome? Except it's of course a > massive hack, we wouldn't want to just hook into SHA1DC like this. 
> > The problem comes down to us needing to call git_hash_sha1_update() with > some really huge input, that function is going to take a *long* time, > and the only way we're getting incremental progress is: > > 1) If we ourselves split the input into N chunks > 2) If we hack into the SHA1 library itself > > This patch does #2, but for this to be acceptable we'd need to do > something like #1. > > The least hacky way I can think of doing this is something like the > following: We'd define two git_hash_algo entries for SHA-1, one that could give you > progress reports, and that one would only ever hash in chunks of at most > size N, and would wrap git_SHA1_Update() in something similar to the > progress code above. > > We'd also need to extend the git_hash_algo struct to e.g. have a char * > we'd use as a description, instead of just "Hashing". > > Then the caller here could just switch out the hash algo if > show_progress was true, temporarily some "int do_progress" and "char > *progress_description" entries in the struct, and off we go: > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L69-L80 > > Still a hack, but I can't see another way of making it work without > compromising all hashing performance with some progress reporting. Here's a POC version which does #1, the chunk size is 1 byte (just for simplicity), this one's SHA-1 hash implementation independent: diff --git a/hash.h b/hash.h index 7c8238bc2e..f0cc96586d 100644 --- a/hash.h +++ b/hash.h @@ -52,8 +52,9 @@ #define GIT_HASH_UNKNOWN 0 /* SHA-1 */ #define GIT_HASH_SHA1 1 +#define GIT_HASH_SHA1_WITH_PROGRESS 2 /* Number of algorithms supported (including unknown). */ -#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1) +#define GIT_HASH_NALGOS (GIT_HASH_SHA1_WITH_PROGRESS + 1) /* A suitably aligned type for stack allocations of hash contexts. */ union git_hash_ctx { @@ -95,7 +96,11 @@ struct git_hash_algo { /* The OID of the empty blob. 
*/ const struct object_id *empty_blob; + + /* Are we showing hashing progress? And if so what are we doing? */ + int show_progress; + char *progress_title; }; -extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; +extern struct git_hash_algo hash_algos[GIT_HASH_NALGOS]; #endif diff --git a/pack-check.c b/pack-check.c index d3a57df34f..a96a8f4d10 100644 --- a/pack-check.c +++ b/pack-check.c @@ -66,6 +66,9 @@ static int verify_packfile(struct packed_git *p, if (!is_pack_valid(p)) return error("packfile %s cannot be accessed", p->pack_name); + repo_set_hash_algo(the_repository, GIT_HASH_SHA1_WITH_PROGRESS); + the_hash_algo->show_progress = 1; + the_hash_algo->progress_title = xstrdup("Hashing"); the_hash_algo->init_fn(&ctx); do { unsigned long remaining; @@ -78,6 +81,9 @@ static int verify_packfile(struct packed_git *p, the_hash_algo->update_fn(&ctx, in, remaining); } while (offset < pack_sig_ofs); the_hash_algo->final_fn(hash, &ctx); + free(the_hash_algo->progress_title); + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL); if (hashcmp(hash, pack_sig)) err = error("%s pack checksum mismatch", diff --git a/repository.h b/repository.h index 9f16c42c1e..1d2954773c 100644 --- a/repository.h +++ b/repository.h @@ -88,7 +88,7 @@ struct repository { struct index_state *index; /* Repository's current hash algorithm, as serialized on disk. */ - const struct git_hash_algo *hash_algo; + struct git_hash_algo *hash_algo; /* Configurations */ diff --git a/sha1-file.c b/sha1-file.c index 97b7423848..9caf1f02e7 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -32,6 +32,7 @@ #include "packfile.h" #include "fetch-object.h" #include "object-store.h" +#include "progress.h" /* The maximum size for an object header. 
*/ #define MAX_HEADER_LEN 32 @@ -61,7 +62,33 @@ static void git_hash_sha1_init(git_hash_ctx *ctx) static void git_hash_sha1_update(git_hash_ctx *ctx, const void *data, size_t len) { - git_SHA1_Update(&ctx->sha1, data, len); + void *chunk; + int i; + + for (i = 0; i < len; i++) { + chunk = (void *)data + i; + git_SHA1_Update(&ctx->sha1, chunk, 1); + } +} + +static void git_hash_sha1_update_progress(git_hash_ctx *ctx, const void *data, size_t len) +{ + struct progress *progress; + void *chunk; + int i; + + if (len >= 100000) { + progress = start_delayed_progress(_(the_hash_algo->progress_title), len); + for (i = 0; i < len; i++) { + chunk = (void *)data + i; + git_SHA1_Update(&ctx->sha1, chunk, 1); + display_progress(progress, i); + } + display_progress(progress, i); + stop_progress(&progress); + } else { + git_SHA1_Update(&ctx->sha1, data, len); + } } static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx) @@ -84,7 +111,7 @@ static void git_hash_unknown_final(unsigned char *hash, git_hash_ctx *ctx) BUG("trying to finalize unknown hash"); } -const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { +struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { NULL, 0x00000000, @@ -95,6 +122,8 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { git_hash_unknown_final, NULL, NULL, + 0, + NULL, }, { "sha-1", @@ -107,6 +136,22 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { git_hash_sha1_final, &empty_tree_oid, &empty_blob_oid, + 0, + NULL, + }, + { + "sha-1", + /* "sha1", big-endian */ + 0x73686131, + GIT_SHA1_RAWSZ, + GIT_SHA1_HEXSZ, + git_hash_sha1_init, + git_hash_sha1_update_progress, + git_hash_sha1_final, + &empty_tree_oid, + &empty_blob_oid, + 0, + NULL, }, }; Now, setting hash_algos to s/const // is scary, also the whole notion of needing to store the "do progress" and "title" info there is a bit nasty. 
I don't actually make use of the "show_progress" member; instead I just hardcode a large input size above which we start showing this. Consider this an RFC. I'd like to submit a patch like this (with the obvious bugs fixed, e.g. chunking in some sane size). It's nasty, but very useful. We'll get progress output during fsck where currently we just hang for 30s on this large repo. ^ permalink raw reply [flat|nested] 25+ messages in thread
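The "chunk in some sane size" fix this RFC calls for can be sketched in isolation: feed the hash update function bounded slices of the buffer and report progress between slices. The names below are invented for illustration and are not Git's hashing API; a real patch would wrap the_hash_algo->update_fn() this way.

```c
#include <stddef.h>

/*
 * How many update calls (and progress ticks) a buffer of 'len'
 * bytes produces with a given chunk size.
 */
static size_t hash_chunk_count(size_t len, size_t chunk)
{
	return len ? (len + chunk - 1) / chunk : 0;
}

/*
 * Feed 'data' to 'update' in slices of at most 'chunk' bytes,
 * reporting progress after each slice.  Either callback may be
 * NULL.  Returns the number of slices processed.
 */
static size_t chunked_update(const unsigned char *data, size_t len,
			     size_t chunk,
			     void (*update)(const unsigned char *, size_t),
			     void (*progress)(size_t done, size_t total))
{
	size_t done = 0, slices = 0;

	while (done < len) {
		size_t n = len - done;
		if (n > chunk)
			n = chunk;
		if (update)
			update(data + done, n);
		done += n;
		slices++;
		if (progress)
			progress(done, len);
	}
	return slices;
}
```

With a chunk size of a few MiB, a multi-gigabyte pack would yield hundreds of progress ticks instead of one long silent call, without measurably hurting hashing throughput.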
* Re: non-smooth progress indication for git fsck and git gc 2018-09-01 12:53 ` Ævar Arnfjörð Bjarmason 2018-09-01 13:52 ` Ævar Arnfjörð Bjarmason @ 2018-09-02 7:46 ` Jeff King 2018-09-02 7:55 ` Jeff King 1 sibling, 1 reply; 25+ messages in thread From: Jeff King @ 2018-09-02 7:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Sat, Sep 01, 2018 at 02:53:28PM +0200, Ævar Arnfjörð Bjarmason wrote: > With this we'll get output like: > > $ ~/g/git/git -C ~/g/2015-04-03-1M-git/ --exec-path=$PWD fsck > Checking object directories: 100% (256/256), done. > Hashing: 100% (452634108/452634108), done. > Hashing: 100% (1073741824/1073741824), done. > Hashing: 100% (1073741824/1073741824), done. > Hashing: 100% (1008001572/1008001572), done. > Checking objects: 2% (262144/13064614) > ^C > > All tests pass with this. Isn't it awesome? Except it's of course a > massive hack, we wouldn't want to just hook into SHA1DC like this. I still consider that output so-so; the byte counts are big and there's no indication how many "hashing" lines we're going to see. It's also broken up in a weird way (it's not one per file; it's one per giant chunk we fed to sha1). > The problem comes down to us needing to call git_hash_sha1_update() with > some really huge input, that function is going to take a *long* time, > and the only way we're getting incremental progress is: > > 1) If we ourselves split the input into N chunks > 2) If we hack into the SHA1 library itself > > This patch does #2, but for this to be acceptable we'd need to do > something like #1. I think we could just do the chunking in verify_packfile(), couldn't we? (And the .idx hash, if we really want to cover that case, but IMHO that's way less interesting). Something like this, which chunks it there, uses a per-packfile meter (though still does not give any clue how many packfiles there are), and shows a throughput meter. 
diff --git a/pack-check.c b/pack-check.c index d3a57df34f..c94223664f 100644 --- a/pack-check.c +++ b/pack-check.c @@ -62,10 +62,25 @@ static int verify_packfile(struct packed_git *p, uint32_t nr_objects, i; int err = 0; struct idx_entry *entries; + struct progress *hashing_progress; + char *title; + off_t total_hashed = 0; if (!is_pack_valid(p)) return error("packfile %s cannot be accessed", p->pack_name); + if (progress) { + /* Probably too long... */ + title = xstrfmt("Hashing %s", p->pack_name); + + /* + * I don't think it actually works to have two progresses going + * at the same time, because when the first one ends, we'll + * cancel the alarm. But hey, this is a hacky proof of concept. + */ + hashing_progress = start_progress(title, 0); + } + the_hash_algo->init_fn(&ctx); do { unsigned long remaining; @@ -75,9 +90,25 @@ static int verify_packfile(struct packed_git *p, pack_sig_ofs = p->pack_size - the_hash_algo->rawsz; if (offset > pack_sig_ofs) remaining -= (unsigned int)(offset - pack_sig_ofs); - the_hash_algo->update_fn(&ctx, in, remaining); + while (remaining) { + int chunk = remaining < 4096 ? remaining : 4096; + the_hash_algo->update_fn(&ctx, in, chunk); + in += chunk; + remaining -= chunk; + total_hashed += chunk; + /* + * The progress code needs tweaking to show throughputs + * better for open-ended meters. + */ + display_throughput(hashing_progress, total_hashed); + display_progress(hashing_progress, 0); + } } while (offset < pack_sig_ofs); + the_hash_algo->final_fn(hash, &ctx); + stop_progress(&hashing_progress); + free(title); + pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL); if (hashcmp(hash, pack_sig)) err = error("%s pack checksum mismatch", ^ permalink raw reply related [flat|nested] 25+ messages in thread
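The throughput figure that display_throughput() feeds in the patch above comes down to bytes moved over elapsed time. A rough sketch of that arithmetic (a hypothetical helper, not Git's progress.c code):

```c
/*
 * Hypothetical helper, not Git's implementation: bytes moved over
 * elapsed milliseconds, expressed in KiB/s as a progress meter
 * might print it.  Widen before multiplying to avoid overflow on
 * multi-GB inputs.
 */
static unsigned long rate_kib_per_sec(unsigned long bytes, unsigned long ms)
{
	if (!ms)
		return 0;
	return (unsigned long)((unsigned long long)bytes * 1000 / ms / 1024);
}
```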
* Re: non-smooth progress indication for git fsck and git gc 2018-09-02 7:46 ` Jeff King @ 2018-09-02 7:55 ` Jeff King 2018-09-02 8:55 ` Jeff King 2018-09-04 15:53 ` Junio C Hamano 0 siblings, 2 replies; 25+ messages in thread From: Jeff King @ 2018-09-02 7:55 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Sun, Sep 02, 2018 at 03:46:57AM -0400, Jeff King wrote: > Something like this, which chunks it there, uses a per-packfile meter > (though still does not give any clue how many packfiles there are), and > shows a throughput meter. Actually, in typical cases it would not matter how many packfiles there are, because there's generally one big one, and then none of the small ones would take long enough to trigger the delayed meter. So you'd only see one such meter usually, and it would cover the majority of the time. In theory, anyway. I still think the more interesting long-term thing here is to reuse the pack verification from index-pack, which actually hashes as it does the per-object countup. That code isn't lib-ified enough to be run in process, but I think the patch below should give similar behavior to what fsck currently does. We'd need to tell index-pack to use our fsck.* config for its checks, I imagine. The progress here is still per-pack, but I think we could pass in sufficient information to have it do one continuous meter across all of the packs (see the in-code comment). And it makes the result multi-threaded, and lets us drop a bunch of duplicate code. 
--- builtin/fsck.c | 53 +++++++------ pack-check.c | 142 ----------------------------------- pack.h | 1 - 3 files changed, 32 insertions(+), 164 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 250f5af118..643e16125b 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -386,25 +386,6 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size) return err; } -static int fsck_obj_buffer(const struct object_id *oid, enum object_type type, - unsigned long size, void *buffer, int *eaten) -{ - /* - * Note, buffer may be NULL if type is OBJ_BLOB. See - * verify_packfile(), data_valid variable for details. - */ - struct object *obj; - obj = parse_object_buffer(the_repository, oid, type, size, buffer, - eaten); - if (!obj) { - errors_found |= ERROR_OBJECT; - return error("%s: object corrupt or missing", oid_to_hex(oid)); - } - obj->flags &= ~(REACHABLE | SEEN); - obj->flags |= HAS_OBJ; - return fsck_obj(obj, buffer, size); -} - static int default_refs; static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid, @@ -662,6 +643,37 @@ static int mark_packed_for_connectivity(const struct object_id *oid, return 0; } +static int verify_pack(struct packed_git *p) +{ + struct child_process index_pack = CHILD_PROCESS_INIT; + + for_each_object_in_pack(p, mark_packed_for_connectivity, NULL, 0); + + /* + * This should probably be replaced with a command-line option + * that teaches "index-pack --verify" to show a more compact and + * fsck-oriented progress (see also the "-v" below). 
+ */ + if (show_progress) + fprintf(stderr, "Checking packfile %s...\n", + p->pack_name); + + index_pack.git_cmd = 1; + argv_array_pushl(&index_pack.args, + "index-pack", + "--verify", + "--strict", + NULL); + if (show_progress) + argv_array_push(&index_pack.args, "-v"); + argv_array_push(&index_pack.args, p->pack_name); + + if (run_command(&index_pack)) + return -1; + + return 0; +} + static char const * const fsck_usage[] = { N_("git fsck [<options>] [<object>...]"), NULL @@ -752,8 +764,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) for (p = get_packed_git(the_repository); p; p = p->next) { /* verify gives error messages itself */ - if (verify_pack(p, fsck_obj_buffer, - progress, count)) + if (verify_pack(p)) errors_found |= ERROR_PACK; count += p->num_objects; } diff --git a/pack-check.c b/pack-check.c index d3a57df34f..ea1457ce53 100644 --- a/pack-check.c +++ b/pack-check.c @@ -15,17 +15,6 @@ struct idx_entry { unsigned int nr; }; -static int compare_entries(const void *e1, const void *e2) -{ - const struct idx_entry *entry1 = e1; - const struct idx_entry *entry2 = e2; - if (entry1->offset < entry2->offset) - return -1; - if (entry1->offset > entry2->offset) - return 1; - return 0; -} - int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr) { @@ -48,121 +37,6 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, return data_crc != ntohl(*index_crc); } -static int verify_packfile(struct packed_git *p, - struct pack_window **w_curs, - verify_fn fn, - struct progress *progress, uint32_t base_count) - -{ - off_t index_size = p->index_size; - const unsigned char *index_base = p->index_data; - git_hash_ctx ctx; - unsigned char hash[GIT_MAX_RAWSZ], *pack_sig; - off_t offset = 0, pack_sig_ofs = 0; - uint32_t nr_objects, i; - int err = 0; - struct idx_entry *entries; - - if (!is_pack_valid(p)) - return error("packfile %s cannot be accessed", p->pack_name); - - 
the_hash_algo->init_fn(&ctx); - do { - unsigned long remaining; - unsigned char *in = use_pack(p, w_curs, offset, &remaining); - offset += remaining; - if (!pack_sig_ofs) - pack_sig_ofs = p->pack_size - the_hash_algo->rawsz; - if (offset > pack_sig_ofs) - remaining -= (unsigned int)(offset - pack_sig_ofs); - the_hash_algo->update_fn(&ctx, in, remaining); - } while (offset < pack_sig_ofs); - the_hash_algo->final_fn(hash, &ctx); - pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL); - if (hashcmp(hash, pack_sig)) - err = error("%s pack checksum mismatch", - p->pack_name); - if (hashcmp(index_base + index_size - the_hash_algo->hexsz, pack_sig)) - err = error("%s pack checksum does not match its index", - p->pack_name); - unuse_pack(w_curs); - - /* Make sure everything reachable from idx is valid. Since we - * have verified that nr_objects matches between idx and pack, - * we do not do scan-streaming check on the pack file. - */ - nr_objects = p->num_objects; - ALLOC_ARRAY(entries, nr_objects + 1); - entries[nr_objects].offset = pack_sig_ofs; - /* first sort entries by pack offset, since unpacking them is more efficient that way */ - for (i = 0; i < nr_objects; i++) { - entries[i].oid.hash = nth_packed_object_sha1(p, i); - if (!entries[i].oid.hash) - die("internal error pack-check nth-packed-object"); - entries[i].offset = nth_packed_object_offset(p, i); - entries[i].nr = i; - } - QSORT(entries, nr_objects, compare_entries); - - for (i = 0; i < nr_objects; i++) { - void *data; - enum object_type type; - unsigned long size; - off_t curpos; - int data_valid; - - if (p->index_version > 1) { - off_t offset = entries[i].offset; - off_t len = entries[i+1].offset - offset; - unsigned int nr = entries[i].nr; - if (check_pack_crc(p, w_curs, offset, len, nr)) - err = error("index CRC mismatch for object %s " - "from %s at offset %"PRIuMAX"", - oid_to_hex(entries[i].oid.oid), - p->pack_name, (uintmax_t)offset); - } - - curpos = entries[i].offset; - type = unpack_object_header(p, 
w_curs, &curpos, &size); - unuse_pack(w_curs); - - if (type == OBJ_BLOB && big_file_threshold <= size) { - /* - * Let check_object_signature() check it with - * the streaming interface; no point slurping - * the data in-core only to discard. - */ - data = NULL; - data_valid = 0; - } else { - data = unpack_entry(the_repository, p, entries[i].offset, &type, &size); - data_valid = 1; - } - - if (data_valid && !data) - err = error("cannot unpack %s from %s at offset %"PRIuMAX"", - oid_to_hex(entries[i].oid.oid), p->pack_name, - (uintmax_t)entries[i].offset); - else if (check_object_signature(entries[i].oid.oid, data, size, type_name(type))) - err = error("packed %s from %s is corrupt", - oid_to_hex(entries[i].oid.oid), p->pack_name); - else if (fn) { - int eaten = 0; - err |= fn(entries[i].oid.oid, type, size, data, &eaten); - if (eaten) - data = NULL; - } - if (((base_count + i) & 1023) == 0) - display_progress(progress, base_count + i); - free(data); - - } - display_progress(progress, base_count + i); - free(entries); - - return err; -} - int verify_pack_index(struct packed_git *p) { off_t index_size; @@ -185,19 +59,3 @@ int verify_pack_index(struct packed_git *p) p->pack_name); return err; } - -int verify_pack(struct packed_git *p, verify_fn fn, - struct progress *progress, uint32_t base_count) -{ - int err = 0; - struct pack_window *w_curs = NULL; - - err |= verify_pack_index(p); - if (!p->index_data) - return -1; - - err |= verify_packfile(p, &w_curs, fn, progress, base_count); - unuse_pack(&w_curs); - - return err; -} diff --git a/pack.h b/pack.h index 34a9d458b4..0d346c5e31 100644 --- a/pack.h +++ b/pack.h @@ -80,7 +80,6 @@ typedef int (*verify_fn)(const struct object_id *, enum object_type, unsigned lo extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1); extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t 
len, unsigned int nr); extern int verify_pack_index(struct packed_git *); -extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, uint32_t); extern off_t write_pack_header(struct hashfile *f, uint32_t); extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t); extern char *index_pack_lockfile(int fd); ^ permalink raw reply related [flat|nested] 25+ messages in thread
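The "one continuous meter across all of the packs" idea reduces to seeding each per-pack run with the number of objects earlier packs already covered and the grand total, so a later pack's percentage picks up where the previous one left off. A sketch of that math, with invented names and no claim to match Git's progress.c:

```c
/*
 * Illustrative only: 'base' is how many objects earlier packs
 * already accounted for, 'done' is progress within the current
 * pack, 'total' is the object count across all packs.  Widen
 * before multiplying to avoid overflow on huge repositories.
 */
static unsigned percent_done(unsigned long base, unsigned long done,
			     unsigned long total)
{
	if (!total)
		return 100;
	return (unsigned)((unsigned long long)(base + done) * 100 / total);
}
```

For example, a second pack starting at base 500 of 1000 total objects resumes the meter at 50% rather than resetting to 0%.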
* Re: non-smooth progress indication for git fsck and git gc 2018-09-02 7:55 ` Jeff King @ 2018-09-02 8:55 ` Jeff King 2018-09-03 16:48 ` Ævar Arnfjörð Bjarmason 2018-09-04 15:53 ` Junio C Hamano 1 sibling, 1 reply; 25+ messages in thread From: Jeff King @ 2018-09-02 8:55 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git On Sun, Sep 02, 2018 at 03:55:28AM -0400, Jeff King wrote: > I still think the more interesting long-term thing here is to reuse the > pack verification from index-pack, which actually hashes as it does the > per-object countup. > > That code isn't lib-ified enough to be run in process, but I think the > patch below should give similar behavior to what fsck currently does. > We'd need to tell index-pack to use our fsck.* config for its checks, I > imagine. The progress here is still per-pack, but I think we could pass > in sufficient information to have it do one continuous meter across all > of the packs (see the in-code comment). > > And it makes the result multi-threaded, and lets us drop a bunch of > duplicate code. Here's a little polish on that to pass enough progress data to index-pack to let it have one nice continuous meter. I'm not sure if it's worth all the hackery or not. The dual-meter that index-pack generally uses is actually more informative (since it meters the initial pass through the pack and the delta reconstruction separately). And there are definitely a few nasty bits (like the way the progress is ended). I'm not planning on taking this further for now, but maybe you or somebody can find it interesting or useful. 
--- builtin/fsck.c | 59 +++++++++------ builtin/index-pack.c | 43 ++++++++++- pack-check.c | 142 ----------------------------------- pack.h | 1 - 4 files changed, 75 insertions(+), 170 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 250f5af118..2d774ea2e5 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -386,25 +386,6 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size) return err; } -static int fsck_obj_buffer(const struct object_id *oid, enum object_type type, - unsigned long size, void *buffer, int *eaten) -{ - /* - * Note, buffer may be NULL if type is OBJ_BLOB. See - * verify_packfile(), data_valid variable for details. - */ - struct object *obj; - obj = parse_object_buffer(the_repository, oid, type, size, buffer, - eaten); - if (!obj) { - errors_found |= ERROR_OBJECT; - return error("%s: object corrupt or missing", oid_to_hex(oid)); - } - obj->flags &= ~(REACHABLE | SEEN); - obj->flags |= HAS_OBJ; - return fsck_obj(obj, buffer, size); -} - static int default_refs; static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid, @@ -662,6 +643,32 @@ static int mark_packed_for_connectivity(const struct object_id *oid, return 0; } +static int verify_pack(struct packed_git *p, + unsigned long count, unsigned long total) +{ + struct child_process index_pack = CHILD_PROCESS_INIT; + + if (is_pack_valid(p) < 0) + return -1; + for_each_object_in_pack(p, mark_packed_for_connectivity, NULL, 0); + + index_pack.git_cmd = 1; + argv_array_pushl(&index_pack.args, + "index-pack", + "--verify-fsck", + NULL); + if (show_progress) + argv_array_pushf(&index_pack.args, + "--fsck-progress=%lu,%lu,Checking pack %s", + count, total, sha1_to_hex(p->sha1)); + argv_array_push(&index_pack.args, p->pack_name); + + if (run_command(&index_pack)) + return -1; + + return 0; +} + static char const * const fsck_usage[] = { N_("git fsck [<options>] [<object>...]"), NULL @@ -737,7 +744,6 @@ int cmd_fsck(int argc, const char **argv, const 
char *prefix) if (check_full) { struct packed_git *p; uint32_t total = 0, count = 0; - struct progress *progress = NULL; if (show_progress) { for (p = get_packed_git(the_repository); p; @@ -746,18 +752,21 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) continue; total += p->num_objects; } - - progress = start_progress(_("Checking objects"), total); } for (p = get_packed_git(the_repository); p; p = p->next) { /* verify gives error messages itself */ - if (verify_pack(p, fsck_obj_buffer, - progress, count)) + if (verify_pack(p, count, total)) errors_found |= ERROR_PACK; count += p->num_objects; } - stop_progress(&progress); + /* + * Our child index-pack never calls stop_progress(), + * which lets each child appear on the same line. Now + * that we've finished all of them, we have to tie that + * off. + */ + fprintf(stderr, "\n"); } if (fsck_finish(&fsck_obj_options)) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 9582ead950..d03ec7bb89 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -85,6 +85,13 @@ static int show_resolving_progress; static int show_stat; static int check_self_contained_and_connected; +static int verify_fsck_mode; +/* unlike our normal 2-part progress, this counts up only total objects */ +struct progress *fsck_progress; +static uint32_t fsck_progress_cur; +static uint32_t fsck_progress_total; +static const char *fsck_progress_title; + static struct progress *progress; /* We always read in 4kB chunks. 
 */
@@ -1100,6 +1107,8 @@ static void *threaded_second_pass(void *data)
 		int i;
 		counter_lock();
 		display_progress(progress, nr_resolved_deltas);
+		display_progress(fsck_progress,
+				 fsck_progress_cur + nr_resolved_deltas);
 		counter_unlock();
 		work_lock();
 		while (nr_dispatched < nr_objects &&
@@ -1145,18 +1154,23 @@ static void parse_pack_objects(unsigned char *hash)
 			nr_ofs_deltas++;
 			ofs_delta->obj_no = i;
 			ofs_delta++;
+			/* no fsck_progress; we only count it when we've resolved the delta */
 		} else if (obj->type == OBJ_REF_DELTA) {
 			ALLOC_GROW(ref_deltas, nr_ref_deltas + 1, ref_deltas_alloc);
 			oidcpy(&ref_deltas[nr_ref_deltas].oid, &ref_delta_oid);
 			ref_deltas[nr_ref_deltas].obj_no = i;
 			nr_ref_deltas++;
+			/* no fsck_progress; we only count it when we've resolved the delta */
 		} else if (!data) {
 			/* large blobs, check later */
 			obj->real_type = OBJ_BAD;
 			nr_delays++;
-		} else
+			display_progress(fsck_progress, ++fsck_progress_cur);
+		} else {
 			sha1_object(data, NULL, obj->size, obj->type,
 				    &obj->idx.oid);
+			display_progress(fsck_progress, ++fsck_progress_cur);
+		}
 		free(data);
 		display_progress(progress, i+1);
 	}
@@ -1238,6 +1252,8 @@ static void resolve_deltas(void)
 			continue;
 		resolve_base(obj);
 		display_progress(progress, nr_resolved_deltas);
+		display_progress(fsck_progress,
+				 fsck_progress_cur + nr_resolved_deltas);
 	}
 }

@@ -1391,6 +1407,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 				   base_obj->data, base_obj->size, type);
 		find_unresolved_deltas(base_obj);
 		display_progress(progress, nr_resolved_deltas);
+		display_progress(fsck_progress,
+				 fsck_progress_cur + nr_resolved_deltas);
 	}
 	free(sorted_by_pos);
 }
@@ -1714,6 +1732,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 			verify = 1;
 			show_stat = 1;
 			stat_only = 1;
+		} else if (!strcmp(arg, "--verify-fsck")) {
+			verify_fsck_mode = 1;
+			verify = 1;
+			strict = 1;
+			do_fsck_object = 1;
 		} else if (skip_to_optional_arg(arg, "--keep", &keep_msg)) {
 			; /* nothing to do */
 		} else if (skip_to_optional_arg(arg, "--promisor", &promisor_msg)) {
@@ -1746,6 +1769,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 			verbose = 1;
 		} else if (!strcmp(arg, "--show-resolving-progress")) {
 			show_resolving_progress = 1;
+		} else if (skip_prefix(arg, "--fsck-progress=", &arg)) {
+			char *end;
+			fsck_progress_cur = strtoul(arg, &end, 10);
+			if (*end++ != ',')
+				die("bad fsck-progress arg: %s", arg);
+			fsck_progress_total = strtoul(end, &end, 10);
+			if (*end++ != ',')
+				die("bad fsck-progress arg: %s", arg);
+			fsck_progress_title = end;
 		} else if (!strcmp(arg, "--report-end-of-input")) {
 			report_end_of_input = 1;
 		} else if (!strcmp(arg, "-o")) {
@@ -1800,6 +1832,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	}
 #endif

+	if (fsck_progress_total)
+		fsck_progress = start_progress(fsck_progress_title,
+					       fsck_progress_total);
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(st_add(nr_objects, 1), sizeof(struct object_entry));
@@ -1833,8 +1869,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);

-	if (do_fsck_object && fsck_finish(&fsck_options))
+	if (do_fsck_object && fsck_finish(&fsck_options)) {
+		if (verify_fsck_mode)
+			return 1; /* quietly return error */
 		die(_("fsck error in pack objects"));
+	}

 	free(objects);
 	strbuf_release(&index_name_buf);
diff --git a/pack-check.c b/pack-check.c
index d3a57df34f..ea1457ce53 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -15,17 +15,6 @@ struct idx_entry {
 	unsigned int nr;
 };

-static int compare_entries(const void *e1, const void *e2)
-{
-	const struct idx_entry *entry1 = e1;
-	const struct idx_entry *entry2 = e2;
-	if (entry1->offset < entry2->offset)
-		return -1;
-	if (entry1->offset > entry2->offset)
-		return 1;
-	return 0;
-}
-
 int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 		   off_t offset, off_t len, unsigned int nr)
 {
@@ -48,121 +37,6 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 	return data_crc != ntohl(*index_crc);
 }

-static int verify_packfile(struct packed_git *p,
-			   struct pack_window **w_curs,
-			   verify_fn fn,
-			   struct progress *progress, uint32_t base_count)
-
-{
-	off_t index_size = p->index_size;
-	const unsigned char *index_base = p->index_data;
-	git_hash_ctx ctx;
-	unsigned char hash[GIT_MAX_RAWSZ], *pack_sig;
-	off_t offset = 0, pack_sig_ofs = 0;
-	uint32_t nr_objects, i;
-	int err = 0;
-	struct idx_entry *entries;
-
-	if (!is_pack_valid(p))
-		return error("packfile %s cannot be accessed", p->pack_name);
-
-	the_hash_algo->init_fn(&ctx);
-	do {
-		unsigned long remaining;
-		unsigned char *in = use_pack(p, w_curs, offset, &remaining);
-		offset += remaining;
-		if (!pack_sig_ofs)
-			pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
-		if (offset > pack_sig_ofs)
-			remaining -= (unsigned int)(offset - pack_sig_ofs);
-		the_hash_algo->update_fn(&ctx, in, remaining);
-	} while (offset < pack_sig_ofs);
-	the_hash_algo->final_fn(hash, &ctx);
-	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
-	if (hashcmp(hash, pack_sig))
-		err = error("%s pack checksum mismatch",
-			    p->pack_name);
-	if (hashcmp(index_base + index_size - the_hash_algo->hexsz, pack_sig))
-		err = error("%s pack checksum does not match its index",
-			    p->pack_name);
-	unuse_pack(w_curs);
-
-	/* Make sure everything reachable from idx is valid. Since we
-	 * have verified that nr_objects matches between idx and pack,
-	 * we do not do scan-streaming check on the pack file.
-	 */
-	nr_objects = p->num_objects;
-	ALLOC_ARRAY(entries, nr_objects + 1);
-	entries[nr_objects].offset = pack_sig_ofs;
-	/* first sort entries by pack offset, since unpacking them is more efficient that way */
-	for (i = 0; i < nr_objects; i++) {
-		entries[i].oid.hash = nth_packed_object_sha1(p, i);
-		if (!entries[i].oid.hash)
-			die("internal error pack-check nth-packed-object");
-		entries[i].offset = nth_packed_object_offset(p, i);
-		entries[i].nr = i;
-	}
-	QSORT(entries, nr_objects, compare_entries);
-
-	for (i = 0; i < nr_objects; i++) {
-		void *data;
-		enum object_type type;
-		unsigned long size;
-		off_t curpos;
-		int data_valid;
-
-		if (p->index_version > 1) {
-			off_t offset = entries[i].offset;
-			off_t len = entries[i+1].offset - offset;
-			unsigned int nr = entries[i].nr;
-			if (check_pack_crc(p, w_curs, offset, len, nr))
-				err = error("index CRC mismatch for object %s "
-					    "from %s at offset %"PRIuMAX"",
-					    oid_to_hex(entries[i].oid.oid),
-					    p->pack_name, (uintmax_t)offset);
-		}
-
-		curpos = entries[i].offset;
-		type = unpack_object_header(p, w_curs, &curpos, &size);
-		unuse_pack(w_curs);
-
-		if (type == OBJ_BLOB && big_file_threshold <= size) {
-			/*
-			 * Let check_object_signature() check it with
-			 * the streaming interface; no point slurping
-			 * the data in-core only to discard.
-			 */
-			data = NULL;
-			data_valid = 0;
-		} else {
-			data = unpack_entry(the_repository, p, entries[i].offset, &type, &size);
-			data_valid = 1;
-		}
-
-		if (data_valid && !data)
-			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
-				    oid_to_hex(entries[i].oid.oid), p->pack_name,
-				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(entries[i].oid.oid, data, size, type_name(type)))
-			err = error("packed %s from %s is corrupt",
-				    oid_to_hex(entries[i].oid.oid), p->pack_name);
-		else if (fn) {
-			int eaten = 0;
-			err |= fn(entries[i].oid.oid, type, size, data, &eaten);
-			if (eaten)
-				data = NULL;
-		}
-		if (((base_count + i) & 1023) == 0)
-			display_progress(progress, base_count + i);
-		free(data);
-
-	}
-	display_progress(progress, base_count + i);
-	free(entries);
-
-	return err;
-}
-
 int verify_pack_index(struct packed_git *p)
 {
 	off_t index_size;
@@ -185,19 +59,3 @@ int verify_pack_index(struct packed_git *p)
 		    p->pack_name);
 	return err;
 }
-
-int verify_pack(struct packed_git *p, verify_fn fn,
-		struct progress *progress, uint32_t base_count)
-{
-	int err = 0;
-	struct pack_window *w_curs = NULL;
-
-	err |= verify_pack_index(p);
-	if (!p->index_data)
-		return -1;
-
-	err |= verify_packfile(p, &w_curs, fn, progress, base_count);
-	unuse_pack(&w_curs);
-
-	return err;
-}
diff --git a/pack.h b/pack.h
index 34a9d458b4..0d346c5e31 100644
--- a/pack.h
+++ b/pack.h
@@ -80,7 +80,6 @@ typedef int (*verify_fn)(const struct object_id *, enum object_type, unsigned lo
 extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
-extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, uint32_t);
 extern off_t write_pack_header(struct hashfile *f, uint32_t);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc
  2018-09-02  8:55 ` Jeff King
@ 2018-09-03 16:48 ` Ævar Arnfjörð Bjarmason
  2018-09-07  3:30   ` Jeff King
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 16:48 UTC (permalink / raw)
To: Jeff King; +Cc: Ulrich Windl, git

On Sun, Sep 02 2018, Jeff King wrote:

> On Sun, Sep 02, 2018 at 03:55:28AM -0400, Jeff King wrote:
>
>> I still think the more interesting long-term thing here is to reuse the
>> pack verification from index-pack, which actually hashes as it does the
>> per-object countup.
>>
>> That code isn't lib-ified enough to be run in process, but I think the
>> patch below should give similar behavior to what fsck currently does.
>> We'd need to tell index-pack to use our fsck.* config for its checks, I
>> imagine. The progress here is still per-pack, but I think we could pass
>> in sufficient information to have it do one continuous meter across all
>> of the packs (see the in-code comment).
>>
>> And it makes the result multi-threaded, and lets us drop a bunch of
>> duplicate code.
>
> Here's a little polish on that to pass enough progress data to
> index-pack to let it have one nice continuous meter. I'm not sure if
> it's worth all the hackery or not. The dual-meter that index-pack
> generally uses is actually more informative (since it meters the initial
> pass through the pack and the delta reconstruction separately).

Thanks!

> And there are definitely a few nasty bits (like the way the progress is
> ended). I'm not planning on taking this further for now, but maybe
> you or somebody can find it interesting or useful.

I think it would be really nice if this were taken further. Using my
perf test in
https://public-inbox.org/git/20180903144928.30691-7-avarab@gmail.com/T/#u
I get these results:

    $ GIT_PERF_LARGE_REPO=/home/aearnfjord/g/linux GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1450-fsck.sh
    [...]
    Test          HEAD~               HEAD
    ----------------------------------------------------------------
    1450.1: fsck  384.18(381.63+2.53) 301.52(508.28+38.34) -21.5%

I.e. this gives a 20% speedup, although of course some of that might be
because some of this might be skipping too much work, but looks really
promising.

I don't know if I'll have time for it, and besides, you wrote the code
so you already understand it *hint* *hint* :)

> ---
>  builtin/fsck.c       |  59 +++++++------
>  builtin/index-pack.c |  43 ++++++++++-
>  pack-check.c         | 142 -----------------------------------
>  pack.h               |   1 -
>  4 files changed, 75 insertions(+), 170 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 250f5af118..2d774ea2e5 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -386,25 +386,6 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
>  	return err;
>  }
>
> -static int fsck_obj_buffer(const struct object_id *oid, enum object_type type,
> -			   unsigned long size, void *buffer, int *eaten)
> -{
> -	/*
> -	 * Note, buffer may be NULL if type is OBJ_BLOB. See
> -	 * verify_packfile(), data_valid variable for details.
> -	 */
> -	struct object *obj;
> -	obj = parse_object_buffer(the_repository, oid, type, size, buffer,
> -				  eaten);
> -	if (!obj) {
> -		errors_found |= ERROR_OBJECT;
> -		return error("%s: object corrupt or missing", oid_to_hex(oid));
> -	}
> -	obj->flags &= ~(REACHABLE | SEEN);
> -	obj->flags |= HAS_OBJ;
> -	return fsck_obj(obj, buffer, size);
> -}
> -
>  static int default_refs;
>
>  static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
> @@ -662,6 +643,32 @@ static int mark_packed_for_connectivity(const struct object_id *oid,
>  	return 0;
>  }
>
> +static int verify_pack(struct packed_git *p,
> +		       unsigned long count, unsigned long total)
> +{
> +	struct child_process index_pack = CHILD_PROCESS_INIT;
> +
> +	if (is_pack_valid(p) < 0)
> +		return -1;
> +	for_each_object_in_pack(p, mark_packed_for_connectivity, NULL, 0);
> +
> +	index_pack.git_cmd = 1;
> +	argv_array_pushl(&index_pack.args,
> +			 "index-pack",
> +			 "--verify-fsck",
> +			 NULL);
> +	if (show_progress)
> +		argv_array_pushf(&index_pack.args,
> +				 "--fsck-progress=%lu,%lu,Checking pack %s",
> +				 count, total, sha1_to_hex(p->sha1));
> +	argv_array_push(&index_pack.args, p->pack_name);
> +
> +	if (run_command(&index_pack))
> +		return -1;
> +
> +	return 0;
> +}
> +
>  static char const * const fsck_usage[] = {
>  	N_("git fsck [<options>] [<object>...]"),
>  	NULL
> @@ -737,7 +744,6 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>  	if (check_full) {
>  		struct packed_git *p;
>  		uint32_t total = 0, count = 0;
> -		struct progress *progress = NULL;
>
>  		if (show_progress) {
>  			for (p = get_packed_git(the_repository); p;
> @@ -746,18 +752,21 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>  					continue;
>  				total += p->num_objects;
>  			}
> -
> -			progress = start_progress(_("Checking objects"), total);
>  		}
>  		for (p = get_packed_git(the_repository); p;
>  		     p = p->next) {
>  			/* verify gives error messages itself */
> -			if (verify_pack(p, fsck_obj_buffer,
> -					progress, count))
> +			if (verify_pack(p, count, total))
>  				errors_found |= ERROR_PACK;
>  			count += p->num_objects;
>  		}
> -		stop_progress(&progress);
> +		/*
> +		 * Our child index-pack never calls stop_progress(),
> +		 * which lets each child appear on the same line. Now
> +		 * that we've finished all of them, we have to tie that
> +		 * off.
> +		 */
> +		fprintf(stderr, "\n");
>  	}
>
>  	if (fsck_finish(&fsck_obj_options))
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 9582ead950..d03ec7bb89 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -85,6 +85,13 @@ static int show_resolving_progress;
>  static int show_stat;
>  static int check_self_contained_and_connected;
>
> +static int verify_fsck_mode;
> +/* unlike our normal 2-part progress, this counts up only total objects */
> +struct progress *fsck_progress;
> +static uint32_t fsck_progress_cur;
> +static uint32_t fsck_progress_total;
> +static const char *fsck_progress_title;
> +
>  static struct progress *progress;
>
>  /* We always read in 4kB chunks. */
> @@ -1100,6 +1107,8 @@ static void *threaded_second_pass(void *data)
>  		int i;
>  		counter_lock();
>  		display_progress(progress, nr_resolved_deltas);
> +		display_progress(fsck_progress,
> +				 fsck_progress_cur + nr_resolved_deltas);
>  		counter_unlock();
>  		work_lock();
>  		while (nr_dispatched < nr_objects &&
> @@ -1145,18 +1154,23 @@ static void parse_pack_objects(unsigned char *hash)
>  			nr_ofs_deltas++;
>  			ofs_delta->obj_no = i;
>  			ofs_delta++;
> +			/* no fsck_progress; we only count it when we've resolved the delta */
>  		} else if (obj->type == OBJ_REF_DELTA) {
>  			ALLOC_GROW(ref_deltas, nr_ref_deltas + 1, ref_deltas_alloc);
>  			oidcpy(&ref_deltas[nr_ref_deltas].oid, &ref_delta_oid);
>  			ref_deltas[nr_ref_deltas].obj_no = i;
>  			nr_ref_deltas++;
> +			/* no fsck_progress; we only count it when we've resolved the delta */
>  		} else if (!data) {
>  			/* large blobs, check later */
>  			obj->real_type = OBJ_BAD;
>  			nr_delays++;
> -		} else
> +			display_progress(fsck_progress, ++fsck_progress_cur);
> +		} else {
>  			sha1_object(data, NULL, obj->size, obj->type,
>  				    &obj->idx.oid);
> +			display_progress(fsck_progress, ++fsck_progress_cur);
> +		}
>  		free(data);
>  		display_progress(progress, i+1);
>  	}
> @@ -1238,6 +1252,8 @@ static void resolve_deltas(void)
>  			continue;
>  		resolve_base(obj);
>  		display_progress(progress, nr_resolved_deltas);
> +		display_progress(fsck_progress,
> +				 fsck_progress_cur + nr_resolved_deltas);
>  	}
>  }
>
> @@ -1391,6 +1407,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
>  				   base_obj->data, base_obj->size, type);
>  		find_unresolved_deltas(base_obj);
>  		display_progress(progress, nr_resolved_deltas);
> +		display_progress(fsck_progress,
> +				 fsck_progress_cur + nr_resolved_deltas);
>  	}
>  	free(sorted_by_pos);
>  }
> @@ -1714,6 +1732,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  			verify = 1;
>  			show_stat = 1;
>  			stat_only = 1;
> +		} else if (!strcmp(arg, "--verify-fsck")) {
> +			verify_fsck_mode = 1;
> +			verify = 1;
> +			strict = 1;
> +			do_fsck_object = 1;
>  		} else if (skip_to_optional_arg(arg, "--keep", &keep_msg)) {
>  			; /* nothing to do */
>  		} else if (skip_to_optional_arg(arg, "--promisor", &promisor_msg)) {
> @@ -1746,6 +1769,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  			verbose = 1;
>  		} else if (!strcmp(arg, "--show-resolving-progress")) {
>  			show_resolving_progress = 1;
> +		} else if (skip_prefix(arg, "--fsck-progress=", &arg)) {
> +			char *end;
> +			fsck_progress_cur = strtoul(arg, &end, 10);
> +			if (*end++ != ',')
> +				die("bad fsck-progress arg: %s", arg);
> +			fsck_progress_total = strtoul(end, &end, 10);
> +			if (*end++ != ',')
> +				die("bad fsck-progress arg: %s", arg);
> +			fsck_progress_title = end;
>  		} else if (!strcmp(arg, "--report-end-of-input")) {
>  			report_end_of_input = 1;
>  		} else if (!strcmp(arg, "-o")) {
> @@ -1800,6 +1832,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  	}
>  #endif
>
> +	if (fsck_progress_total)
> +		fsck_progress = start_progress(fsck_progress_title,
> +					       fsck_progress_total);
> +
>  	curr_pack = open_pack_file(pack_name);
>  	parse_pack_header();
>  	objects = xcalloc(st_add(nr_objects, 1), sizeof(struct object_entry));
> @@ -1833,8 +1869,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  	else
>  		close(input_fd);
>
> -	if (do_fsck_object && fsck_finish(&fsck_options))
> +	if (do_fsck_object && fsck_finish(&fsck_options)) {
> +		if (verify_fsck_mode)
> +			return 1; /* quietly return error */
>  		die(_("fsck error in pack objects"));
> +	}
>
>  	free(objects);
>  	strbuf_release(&index_name_buf);
> diff --git a/pack-check.c b/pack-check.c
> index d3a57df34f..ea1457ce53 100644
> --- a/pack-check.c
> +++ b/pack-check.c
> @@ -15,17 +15,6 @@ struct idx_entry {
>  	unsigned int nr;
>  };
>
> -static int compare_entries(const void *e1, const void *e2)
> -{
> -	const struct idx_entry *entry1 = e1;
> -	const struct idx_entry *entry2 = e2;
> -	if (entry1->offset < entry2->offset)
> -		return -1;
> -	if (entry1->offset > entry2->offset)
> -		return 1;
> -	return 0;
> -}
> -
>  int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
>  		   off_t offset, off_t len, unsigned int nr)
>  {
> @@ -48,121 +37,6 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
>  	return data_crc != ntohl(*index_crc);
>  }
>
> -static int verify_packfile(struct packed_git *p,
> -			   struct pack_window **w_curs,
> -			   verify_fn fn,
> -			   struct progress *progress, uint32_t base_count)
> -
> -{
> -	off_t index_size = p->index_size;
> -	const unsigned char *index_base = p->index_data;
> -	git_hash_ctx ctx;
> -	unsigned char hash[GIT_MAX_RAWSZ], *pack_sig;
> -	off_t offset = 0, pack_sig_ofs = 0;
> -	uint32_t nr_objects, i;
> -	int err = 0;
> -	struct idx_entry *entries;
> -
> -	if (!is_pack_valid(p))
> -		return error("packfile %s cannot be accessed", p->pack_name);
> -
> -	the_hash_algo->init_fn(&ctx);
> -	do {
> -		unsigned long remaining;
> -		unsigned char *in = use_pack(p, w_curs, offset, &remaining);
> -		offset += remaining;
> -		if (!pack_sig_ofs)
> -			pack_sig_ofs = p->pack_size - the_hash_algo->rawsz;
> -		if (offset > pack_sig_ofs)
> -			remaining -= (unsigned int)(offset - pack_sig_ofs);
> -		the_hash_algo->update_fn(&ctx, in, remaining);
> -	} while (offset < pack_sig_ofs);
> -	the_hash_algo->final_fn(hash, &ctx);
> -	pack_sig = use_pack(p, w_curs, pack_sig_ofs, NULL);
> -	if (hashcmp(hash, pack_sig))
> -		err = error("%s pack checksum mismatch",
> -			    p->pack_name);
> -	if (hashcmp(index_base + index_size - the_hash_algo->hexsz, pack_sig))
> -		err = error("%s pack checksum does not match its index",
> -			    p->pack_name);
> -	unuse_pack(w_curs);
> -
> -	/* Make sure everything reachable from idx is valid. Since we
> -	 * have verified that nr_objects matches between idx and pack,
> -	 * we do not do scan-streaming check on the pack file.
> -	 */
> -	nr_objects = p->num_objects;
> -	ALLOC_ARRAY(entries, nr_objects + 1);
> -	entries[nr_objects].offset = pack_sig_ofs;
> -	/* first sort entries by pack offset, since unpacking them is more efficient that way */
> -	for (i = 0; i < nr_objects; i++) {
> -		entries[i].oid.hash = nth_packed_object_sha1(p, i);
> -		if (!entries[i].oid.hash)
> -			die("internal error pack-check nth-packed-object");
> -		entries[i].offset = nth_packed_object_offset(p, i);
> -		entries[i].nr = i;
> -	}
> -	QSORT(entries, nr_objects, compare_entries);
> -
> -	for (i = 0; i < nr_objects; i++) {
> -		void *data;
> -		enum object_type type;
> -		unsigned long size;
> -		off_t curpos;
> -		int data_valid;
> -
> -		if (p->index_version > 1) {
> -			off_t offset = entries[i].offset;
> -			off_t len = entries[i+1].offset - offset;
> -			unsigned int nr = entries[i].nr;
> -			if (check_pack_crc(p, w_curs, offset, len, nr))
> -				err = error("index CRC mismatch for object %s "
> -					    "from %s at offset %"PRIuMAX"",
> -					    oid_to_hex(entries[i].oid.oid),
> -					    p->pack_name, (uintmax_t)offset);
> -		}
> -
> -		curpos = entries[i].offset;
> -		type = unpack_object_header(p, w_curs, &curpos, &size);
> -		unuse_pack(w_curs);
> -
> -		if (type == OBJ_BLOB && big_file_threshold <= size) {
> -			/*
> -			 * Let check_object_signature() check it with
> -			 * the streaming interface; no point slurping
> -			 * the data in-core only to discard.
> -			 */
> -			data = NULL;
> -			data_valid = 0;
> -		} else {
> -			data = unpack_entry(the_repository, p, entries[i].offset, &type, &size);
> -			data_valid = 1;
> -		}
> -
> -		if (data_valid && !data)
> -			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
> -				    oid_to_hex(entries[i].oid.oid), p->pack_name,
> -				    (uintmax_t)entries[i].offset);
> -		else if (check_object_signature(entries[i].oid.oid, data, size, type_name(type)))
> -			err = error("packed %s from %s is corrupt",
> -				    oid_to_hex(entries[i].oid.oid), p->pack_name);
> -		else if (fn) {
> -			int eaten = 0;
> -			err |= fn(entries[i].oid.oid, type, size, data, &eaten);
> -			if (eaten)
> -				data = NULL;
> -		}
> -		if (((base_count + i) & 1023) == 0)
> -			display_progress(progress, base_count + i);
> -		free(data);
> -
> -	}
> -	display_progress(progress, base_count + i);
> -	free(entries);
> -
> -	return err;
> -}
> -
>  int verify_pack_index(struct packed_git *p)
>  {
>  	off_t index_size;
> @@ -185,19 +59,3 @@ int verify_pack_index(struct packed_git *p)
>  		    p->pack_name);
>  	return err;
>  }
> -
> -int verify_pack(struct packed_git *p, verify_fn fn,
> -		struct progress *progress, uint32_t base_count)
> -{
> -	int err = 0;
> -	struct pack_window *w_curs = NULL;
> -
> -	err |= verify_pack_index(p);
> -	if (!p->index_data)
> -		return -1;
> -
> -	err |= verify_packfile(p, &w_curs, fn, progress, base_count);
> -	unuse_pack(&w_curs);
> -
> -	return err;
> -}
> diff --git a/pack.h b/pack.h
> index 34a9d458b4..0d346c5e31 100644
> --- a/pack.h
> +++ b/pack.h
> @@ -80,7 +80,6 @@ typedef int (*verify_fn)(const struct object_id *, enum object_type, unsigned lo
>  extern const char *write_idx_file(const char *index_name, struct pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const unsigned char *sha1);
>  extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
>  extern int verify_pack_index(struct packed_git *);
> -extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, uint32_t);
>  extern off_t write_pack_header(struct hashfile *f, uint32_t);
>  extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
>  extern char *index_pack_lockfile(int fd);

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc
  2018-09-03 16:48 ` Ævar Arnfjörð Bjarmason
@ 2018-09-07  3:30 ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2018-09-07 3:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: Ulrich Windl, git

On Mon, Sep 03, 2018 at 06:48:54PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > And there are definitely a few nasty bits (like the way the progress is
> > ended). I'm not planning on taking this further for now, but maybe
> > you or somebody can find it interesting or useful.
>
> I think it would be really nice if this were taken further. Using my
> perf test in
> https://public-inbox.org/git/20180903144928.30691-7-avarab@gmail.com/T/#u
> I get these results:
>
>     $ GIT_PERF_LARGE_REPO=/home/aearnfjord/g/linux GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1450-fsck.sh
>     [...]
>     Test          HEAD~               HEAD
>     ----------------------------------------------------------------
>     1450.1: fsck  384.18(381.63+2.53) 301.52(508.28+38.34) -21.5%
>
> I.e. this gives a 20% speedup, although of course some of that might be
> because some of this might be skipping too much work, but looks really
> promising.

I'm pretty sure it's doing the correct thing, in terms of doing all the
right checks.

But look at your CPU time. You're getting a 20% wall-clock speedup, but
spending a lot more CPU. So the main difference is really the
multi-threading in index-pack. It should be strictly worse in terms of
total CPU on a single-processor system because we're doing work in the
sub-process (so we pay for the process invocation, but also we probably
are unable to share things like in-memory commit structs, wasting a
little extra time).

So I'm on the fence on whether it is worth it. I like getting rid of the
duplicated code. But on the other hand it is not all that complex, and
maybe when it comes to things like fsck it is good to have a different
implementation than the one that writes the .idx out in the first place.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: non-smooth progress indication for git fsck and git gc
  2018-09-02  7:55 ` Jeff King
  2018-09-02  8:55   ` Jeff King
@ 2018-09-04 15:53 ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2018-09-04 15:53 UTC (permalink / raw)
To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Ulrich Windl, git

Jeff King <peff@peff.net> writes:

> That code isn't lib-ified enough to be run in process, but I think the
> patch below should give similar behavior to what fsck currently does.
> We'd need to tell index-pack to use our fsck.* config for its checks, I
> imagine. The progress here is still per-pack, but I think we could pass
> in sufficient information to have it do one continuous meter across all
> of the packs (see the in-code comment).
>
> And it makes the result multi-threaded, and lets us drop a bunch of
> duplicate code.
>
> ---
>  builtin/fsck.c |  53 +++++++------
>  pack-check.c   | 142 -----------------------------------
>  pack.h         |   1 -
>  3 files changed, 32 insertions(+), 164 deletions(-)

The numbers here are nice, even to readers who do not necessarily care
about the progress meter output ;-)

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2018-09-07  3:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-16  6:54 non-smooth progress indication for git fsck and git gc Ulrich Windl
2018-08-16 15:18 ` Duy Nguyen
2018-08-16 16:05   ` Jeff King
2018-08-20  8:27   ` Antw: " Ulrich Windl
2018-08-16 15:57 ` Jeff King
2018-08-16 20:02   ` Jeff King
2018-08-16 22:10     ` Junio C Hamano
2018-08-16 20:35   ` Ævar Arnfjörð Bjarmason
2018-08-16 20:55     ` Jeff King
2018-08-16 21:06       ` Jeff King
2018-08-17 14:39         ` Duy Nguyen
2018-08-20  8:33           ` Antw: " Ulrich Windl
2018-08-20  8:57             ` Ævar Arnfjörð Bjarmason
2018-08-20  9:37               ` Ulrich Windl
2018-08-21  1:07               ` Jeff King
2018-08-21  6:20                 ` Ulrich Windl
2018-08-21 15:21                   ` Duy Nguyen
2018-09-01 12:53               ` Ævar Arnfjörð Bjarmason
2018-09-01 13:52                 ` Ævar Arnfjörð Bjarmason
2018-09-02  7:46                 ` Jeff King
2018-09-02  7:55                   ` Jeff King
2018-09-02  8:55                     ` Jeff King
2018-09-03 16:48                       ` Ævar Arnfjörð Bjarmason
2018-09-07  3:30                         ` Jeff King
2018-09-04 15:53                     ` Junio C Hamano