* [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
@ 2016-07-07 19:09 Kirill Smelkov
  2016-07-07 20:52 ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-07-07 19:09 UTC
  To: Junio C Hamano
  Cc: Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki,
	Julien Muchembled, git, Kirill Smelkov, Vicent Marti, Jeff King

Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

We can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to destination
  pack by write_reused_pack() bypassing needed for pack index generation
  bookkeeping done by regular codepath in write_one() and friends.

  (at least that's my understanding after briefly looking at the code)

We also need to care and teach add_object_entry_from_bitmap() to respect
--local via not adding nonlocal loose object to resultant pack (this is
bitmap-codepath counterpart of daae0625 (pack-objects: extend --local to
mean ignore non-local loose objects too) -- not to break 'loose objects in
alternate ODB are not repacked' in t7700-repack.sh .

Otherwise all git tests pass, and for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

Cc: Vicent Marti <tanoku@gmail.com>
Cc: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 builtin/pack-objects.c  | 7 +++++--
 t/t5310-pack-bitmaps.sh | 9 +++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfd..be0ebe8 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1052,6 +1052,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
 {
 	uint32_t index_pos;
 
+	if (local && has_loose_object_nonlocal(sha1))
+		return 0;
+
 	if (have_duplicate_entry(sha1, 0, &index_pos))
 		return 0;
 
@@ -2488,7 +2491,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	if (prepare_bitmap_walk(revs) < 0)
 		return -1;
 
-	if (pack_options_allow_reuse() &&
+	if (pack_options_allow_reuse() && pack_to_stdout &&
 	    !reuse_partial_packfile_from_bitmap(
 			&reuse_packfile,
 			&reuse_packfile_objects,
@@ -2773,7 +2776,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
+	if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
 		use_bitmap_index = 0;
 
 	if (pack_to_stdout || !rev_list_all)
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 3893afd..533fc31 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -118,6 +118,15 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
+test_expect_success 'pack-objects to file can use bitmap' '
+	# make sure we still have 1 bitmap index from previous tests
+	ls .git/objects/pack/ | grep bitmap >output &&
+	test_line_count = 1 output &&
+	# pack-objects uses bitmap index by default, when it is available
+	packsha1=$(git pack-objects --all mypack </dev/null) &&
+	git verify-pack mypack-$packsha1.pack
+'
+
 test_expect_success 'full repack, reusing previous bitmaps' '
 	git repack -ad &&
 	ls .git/objects/pack/ | grep bitmap >output &&
-- 
2.9.0.431.gb11dac7.dirty
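
The last hunk of builtin/pack-objects.c above changes when the bitmap
index gets switched off. A minimal standalone sketch of the old and new
rule follows, assuming the flags behave as their names suggest. The
struct and both function names are hypothetical stand-ins for the
pack-objects globals; this is not code from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pack-objects globals named in the patch. */
struct pack_opts {
	bool use_internal_rev_list; /* objects come from a revision walk (--revs, --all) */
	bool pack_to_stdout;        /* --stdout was given */
	bool write_bitmap_index;    /* a new bitmap is being written for the result */
	bool repository_shallow;    /* what is_repository_shallow() would report */
	bool use_bitmap_index;      /* requested or default setting */
};

/* Rule before the patch: bitmaps are only ever used when packing to stdout. */
static bool bitmap_allowed_before(const struct pack_opts *o)
{
	if (!o->use_internal_rev_list || !o->pack_to_stdout || o->repository_shallow)
		return false;
	return o->use_bitmap_index;
}

/*
 * Rule after the patch: packing to a file may also use the bitmap,
 * unless a new bitmap index is being written at the same time.
 */
static bool bitmap_allowed_after(const struct pack_opts *o)
{
	if (!o->use_internal_rev_list ||
	    (!o->pack_to_stdout && o->write_bitmap_index) ||
	    o->repository_shallow)
		return false;
	return o->use_bitmap_index;
}
```

With these definitions a pack-to-file run without bitmap writing is
rejected by the old rule and accepted by the new one, while the
pack-to-stdout case gives the same answer under both.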
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-07 19:09 [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too Kirill Smelkov
@ 2016-07-07 20:52 ` Jeff King
  2016-07-08 10:38   ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2016-07-07 20:52 UTC
  To: Kirill Smelkov
  Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki,
	Julien Muchembled, git, Vicent Marti

On Thu, Jul 07, 2016 at 10:09:17PM +0300, Kirill Smelkov wrote:

> Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
> if a repository has bitmap index, pack-objects can nicely speedup
> "Counting objects" graph traversal phase. That however was done only for
> case when resultant pack is sent to stdout, not written into a file.
>
> We can teach pack-objects to use bitmap index for initial object
> counting phase when generating resultant pack file too:

I'm not sure this is a good idea in general. When bitmaps are in use, we
cannot fill out the details in the object-packing list as thoroughly. In
particular:

  - we will not compute the same write order (which is based on
    traversal order), leading to packs that have less efficient cache
    characteristics

  - we don't learn about the filename of trees and blobs, which is going
    to make the delta step much less efficient. This might be mitigated
    by turning on the bitmap name-hash cache; I don't recall how much
    detail pack-objects needs on the name (i.e., the full name versus
    just the hash).

There may be other subtle things, too. The general idea of tying the
bitmap use to pack_to_stdout is that you _do_ want to use it for
serving fetches and pushes, but for a full on-disk repack via gc, it's
more important to generate a good pack.

Your use case:

> git-backup extracts many packs on repositories restoration. That was my
> initial motivation for the patch.

Seems to be somewhere in between. I'm not sure I understand how you're
invoking pack-objects here, but I wonder if you should be using
"pack-objects --stdout" yourself.

But even if it is the right thing for your use case to be using bitmaps
to generate an on-disk bitmap, I think we should be making sure it
_doesn't_ trigger when doing a normal repack.

-Peff
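
On the "full name versus just the hash" question above, here is a rough
sketch of how a path can be folded into a 32-bit hint in which later
characters weigh most, so that paths sharing a suffix end up numerically
close. It is modeled on a recollection of git's pack_name_hash() and has
not been checked against this tree; the function name name_hash_sketch
is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a path into a 32-bit sortable number. Each new character lands
 * in the top byte and everything seen earlier is shifted down by two
 * bits, so only roughly the last sixteen non-whitespace characters
 * still influence the result.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

Under this scheme "foo/Makefile" and "Makefile" produce different values
that agree in their upper bits, which is the property a sort by hash
needs in order to place them near each other.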
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-07 20:52 ` Jeff King
@ 2016-07-08 10:38   ` Kirill Smelkov
  2016-07-12 19:08     ` Kirill Smelkov
  2016-07-13  8:26     ` Jeff King
  0 siblings, 2 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-07-08 10:38 UTC
  To: Jeff King
  Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki,
	Julien Muchembled, git, Vicent Marti

Peff first of all thanks for feedback,

On Thu, Jul 07, 2016 at 04:52:23PM -0400, Jeff King wrote:
> On Thu, Jul 07, 2016 at 10:09:17PM +0300, Kirill Smelkov wrote:
>
> > Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
> > if a repository has bitmap index, pack-objects can nicely speedup
> > "Counting objects" graph traversal phase. That however was done only for
> > case when resultant pack is sent to stdout, not written into a file.
> >
> > We can teach pack-objects to use bitmap index for initial object
> > counting phase when generating resultant pack file too:
>
> I'm not sure this is a good idea in general. When bitmaps are in use, we
> cannot fill out the details in the object-packing list as thoroughly. In
> particular:
>
>   - we will not compute the same write order (which is based on
>     traversal order), leading to packs that have less efficient cache
>     characteristics

I agree the order can be not exactly the same. Still if original pack is
packed well (with good recency order), while using bitmap we will tend
to traverse it in close to original order.

Maybe I'm not completely right on this, but to me it looks to be the
case because if objects in original pack are put there linearly sorted
by recency order, and we use bitmap index to set of all reachable
objects from a root, and then just _linearly_ gather all those objects
from original pack by 1s in bitmap and put them in the same order into
destination pack, the recency order won't be broken.

Or am I maybe misunderstanding something?

Please also see below:

>   - we don't learn about the filename of trees and blobs, which is going
>     to make the delta step much less efficient. This might be mitigated
>     by turning on the bitmap name-hash cache; I don't recall how much
>     detail pack-objects needs on the name (i.e., the full name versus
>     just the hash).

If I understand it right, it uses only uint32_t name hash while searching. From
pack-objects.{h,c} :

---- 8< ----
struct object_entry {
	...
	uint32_t hash;			/* name hint hash */


/*
 * We search for deltas in a list sorted by type, by filename hash, and then
 * by size, so that we see progressively smaller and smaller files.
 * That's because we prefer deltas to be from the bigger file
 * to the smaller -- deletes are potentially cheaper, but perhaps
 * more importantly, the bigger file is likely the more recent
 * one.  The deepest deltas are therefore the oldest objects which are
 * less susceptible to be accessed often.
 */
static int type_size_sort(const void *_a, const void *_b)
{
	const struct object_entry *a = *(struct object_entry **)_a;
	const struct object_entry *b = *(struct object_entry **)_b;

	if (a->type > b->type)
		return -1;
	if (a->type < b->type)
		return 1;
	if (a->hash > b->hash)
		return -1;
	if (a->hash < b->hash)
		return 1;
	...
---- 8< ----

Documentation/technical/pack-heuristics.txt also confirms this:

---- 8< ----
    ...

    <gitster> The quote from the above linus should be rewritten a
        bit (wait for it):
        - first sort by type.  Different objects never delta with
          each other.
        - then sort by filename/dirname.  hash of the basename
          occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
          DIR_BITS are for the hash of leading path elements.

    ...

    If I might add, the trick is to make files that _might_ be similar be
    located close to each other in the hash buckets based on their file
    names.  It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
    "Makefile" all landed in the same bucket due to their common basename,
    "Makefile". However, now they land in "close" buckets.

    The algorithm allows not just for the _same_ bucket, but for _close_
    buckets to be considered delta candidates.  The rationale is
    essentially that files, like Makefiles, often have very similar
    content no matter what directory they live in.
---- 8< ----


So yes, exactly as you say with pack.writeBitmapHashCache=true (ae4f07fb) the
delta-search heuristics is almost as efficient as with just raw filenames.

I can confirm this also via e.g. (with my patch applied) :

---- 8< ----
$ time echo 0186ac99 | git pack-objects --no-use-bitmap-index --revs erp5pack-plain
Counting objects: 627171, done.
Compressing objects: 100% (176949/176949), done.
50570987560d481742af4a8083028c2322a0534a
Writing objects: 100% (627171/627171), done.
Total 627171 (delta 439404), reused 594820 (delta 410210)

real	0m37.272s
user	0m33.648s
sys	0m1.580s

$ time echo 0186ac99 | git pack-objects --revs erp5pack-bitmap
Counting objects: 627171, done.
Compressing objects: 100% (176914/176914), done.
7c15a9b1eca1326e679297b217c5a48954625ca2
Writing objects: 100% (627171/627171), done.
Total 627171 (delta 439484), reused 594855 (delta 410245)

real	0m27.020s
user	0m23.364s
sys	0m0.992s

$ ll erp5pack-{plain,bitmap}*
 17561860 erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.idx
238760161 erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.pack
 17561860 erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.idx
238634201 erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.pack
---- 8< ----

( By the way about pack generated with bitmap retaining close recency
  order:

  ---- 8< ----
  $ git verify-pack -v erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.pack >1
  $ git verify-pack -v erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.pack >2
  $ grep commit 1 |awk '{print $1}' >1.commit
  $ grep commit 2 |awk '{print $1}' >2.commit
  $ wc -l 1.commit
  46136 1.commit
  $ wc -l 2.commit
  46136 2.commit
  $ diff -u0 1.commit 2.commit |wc -l
  55
  ---- 8< ----

  so 55/46136 shows it is very almost the same. )


> There may be other subtle things, too. The general idea of tying the
> bitmap use to pack_to_stdout is that you _do_ want to use it for
> serving fetches and pushes, but for a full on-disk repack via gc, it's
> more important to generate a good pack.

It is better we send good packs to clients too, right? And with
pack.writeBitmapHashCache=true and retaining recency order (please see
above, but again maybe I'm not completely right) to me we should be still
generating a good pack while using bitmap reachability index for object
graph traversal.

> Your use case:
>
> > git-backup extracts many packs on repositories restoration. That was my
> > initial motivation for the patch.
>
> Seems to be somewhere in between. I'm not sure I understand how you're
> invoking pack-objects here,

It is just

    pack-objects --revs --reuse-object --reuse-delta --delta-base-offset extractedrepo/objects/pack/pack  < SHA1-HEADS

    https://lab.nexedi.com/kirr/git-backup/blob/7fcb8c67/git-backup.go#L829

> but I wonder if you should be using "pack-objects --stdout" yourself.

I already tried --stdout. The problem is on repository extraction we
need to both extract the pack and index it. While `pack-object file`
does both, for --stdout case we need to additionally index extracted
pack with `git index-pack`, and standalone `git index-pack` is very slow
- in my experience much slower than generating the pack itself:

---- 8< ----
$ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
Counting objects: 627171, done.
Compressing objects: 100% (176914/176914), done.
Total 627171 (delta 439484), reused 594855 (delta 410245)

real	0m22.309s
user	0m21.148s
sys	0m0.932s

$ ll erp5pack-stdout*
238760161 erp5pack-stdout.pack

$ time git index-pack erp5pack-stdout.pack
7c15a9b1eca1326e679297b217c5a48954625ca2

real	0m50.873s   <-- more than 2 times slower than time to generate pack itself!
user	0m49.300s
sys	0m1.360s

$ ll erp5pack-stdout*
 17561860 erp5pack-stdout.idx
238760161 erp5pack-stdout.pack
---- 8< ----

So the time for

    `pack-object --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index     is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`              is  37s.


I've tried to briefly see why index-pack is so slow and offhand I can
see that it needs to load all objects, decompresses them etc (maybe I'm
not so right here - I looked only briefly), while pack-objects while
generating the pack has all needed information directly at hand and thus
can emit index much more easily.

For sever - clients scenario, index-pack load is put onto clients thus
offloading server, but for my use case where extracted repository is on
the same machine the load does not go away.

That's why for me it makes more sense to emit both pack and its index in
one go.

Still it would be interesting to eventually see why index-pack is so
anomaly slow.

> But even if it is the right thing for your use case to be using bitmaps
> to generate an on-disk bitmap, I think we should be making sure it
> _doesn't_ trigger when doing a normal repack.

So seems the way forward here is to teach pack-objects not to silently
drop explicit --use-pack-bitmap for cases when it can handle it?
(currently even if this option was given, for !stdout cases pack-objects
simply drop use_bitmap_index to 0).

And to make sure default for use_bitmap_index is 0 for !stdout cases?

Or are we fine with my arguments about recency order staying the same
when using bitmap reachability index for object graph traversal, and this
way the patch is fine to go in as it is?

Thanks again,
Kirill
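
The type_size_sort() excerpt quoted in the message above is cut off
after the hash comparison. As an illustration of the ordering its
comment describes (by type, then by name hash, then by size with larger
objects first), here is a self-contained sketch. The struct is a
hypothetical, trimmed-down stand-in for struct object_entry, the size
tie-break is taken from the quoted comment rather than from the elided
code, and both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct object_entry. */
struct entry_sketch {
	int type;           /* object type code; larger values sort first */
	uint32_t hash;      /* name hint hash */
	unsigned long size; /* object size in bytes */
};

/*
 * Comparator over an array of pointers to entries: type descending,
 * then hash descending, then size descending, so that entries with
 * equal type and hash are visited from biggest to smallest.
 */
static int type_hash_size_cmp(const void *_a, const void *_b)
{
	const struct entry_sketch *a = *(struct entry_sketch *const *)_a;
	const struct entry_sketch *b = *(struct entry_sketch *const *)_b;

	if (a->type > b->type)
		return -1;
	if (a->type < b->type)
		return 1;
	if (a->hash > b->hash)
		return -1;
	if (a->hash < b->hash)
		return 1;
	if (a->size > b->size)
		return -1;
	if (a->size < b->size)
		return 1;
	return 0;
}

/* Sort the candidate list in place using the comparator above. */
static void sort_delta_candidates(struct entry_sketch **list, size_t nr)
{
	qsort(list, nr, sizeof(*list), type_hash_size_cmp);
}
```

Only the 32-bit hash takes part in the comparison, never the path
itself, which matches the point made above that a stored name hash is
enough for this sort.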
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-08 10:38   ` Kirill Smelkov
@ 2016-07-12 19:08     ` Kirill Smelkov
  2016-07-13  8:30       ` Jeff King
  2016-07-13  8:26     ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-07-12 19:08 UTC
  To: Jeff King
  Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki,
	Julien Muchembled, git, Vicent Marti

On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote:
> Peff first of all thanks for feedback,
>
> On Thu, Jul 07, 2016 at 04:52:23PM -0400, Jeff King wrote:
> > On Thu, Jul 07, 2016 at 10:09:17PM +0300, Kirill Smelkov wrote:
> >
> > > Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
> > > if a repository has bitmap index, pack-objects can nicely speedup
> > > "Counting objects" graph traversal phase. That however was done only for
> > > case when resultant pack is sent to stdout, not written into a file.
> > >
> > > We can teach pack-objects to use bitmap index for initial object
> > > counting phase when generating resultant pack file too:
> >
> > I'm not sure this is a good idea in general. When bitmaps are in use, we
> > cannot fill out the details in the object-packing list as thoroughly. In
> > particular:
> >
> >   - we will not compute the same write order (which is based on
> >     traversal order), leading to packs that have less efficient cache
> >     characteristics
>
> I agree the order can be not exactly the same. Still if original pack is
> packed well (with good recency order), while using bitmap we will tend
> to traverse it in close to original order.
>
> Maybe I'm not completely right on this, but to me it looks to be the
> case because if objects in original pack are put there linearly sorted
> by recency order, and we use bitmap index to set of all reachable
> objects from a root, and then just _linearly_ gather all those objects
> from original pack by 1s in bitmap and put them in the same order into
> destination pack, the recency order won't be broken.
>
> Or am I maybe misunderstanding something?
>
> Please also see below:
>
> >   - we don't learn about the filename of trees and blobs, which is going
> >     to make the delta step much less efficient. This might be mitigated
> >     by turning on the bitmap name-hash cache; I don't recall how much
> >     detail pack-objects needs on the name (i.e., the full name versus
> >     just the hash).
>
> If I understand it right, it uses only uint32_t name hash while searching. From
> pack-objects.{h,c} :
>
> ---- 8< ----
> struct object_entry {
> 	...
> 	uint32_t hash;			/* name hint hash */
>
>
> /*
>  * We search for deltas in a list sorted by type, by filename hash, and then
>  * by size, so that we see progressively smaller and smaller files.
>  * That's because we prefer deltas to be from the bigger file
>  * to the smaller -- deletes are potentially cheaper, but perhaps
>  * more importantly, the bigger file is likely the more recent
>  * one.  The deepest deltas are therefore the oldest objects which are
>  * less susceptible to be accessed often.
>  */
> static int type_size_sort(const void *_a, const void *_b)
> {
> 	const struct object_entry *a = *(struct object_entry **)_a;
> 	const struct object_entry *b = *(struct object_entry **)_b;
>
> 	if (a->type > b->type)
> 		return -1;
> 	if (a->type < b->type)
> 		return 1;
> 	if (a->hash > b->hash)
> 		return -1;
> 	if (a->hash < b->hash)
> 		return 1;
> 	...
> ---- 8< ----
>
> Documentation/technical/pack-heuristics.txt also confirms this:
>
> ---- 8< ----
>     ...
>
>     <gitster> The quote from the above linus should be rewritten a
>         bit (wait for it):
>         - first sort by type.  Different objects never delta with
>           each other.
>         - then sort by filename/dirname.  hash of the basename
>           occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
>           DIR_BITS are for the hash of leading path elements.
>
>     ...
>
>     If I might add, the trick is to make files that _might_ be similar be
>     located close to each other in the hash buckets based on their file
>     names.  It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
>     "Makefile" all landed in the same bucket due to their common basename,
>     "Makefile". However, now they land in "close" buckets.
>
>     The algorithm allows not just for the _same_ bucket, but for _close_
>     buckets to be considered delta candidates.  The rationale is
>     essentially that files, like Makefiles, often have very similar
>     content no matter what directory they live in.
> ---- 8< ----
>
>
> So yes, exactly as you say with pack.writeBitmapHashCache=true (ae4f07fb) the
> delta-search heuristics is almost as efficient as with just raw filenames.
>
> I can confirm this also via e.g. (with my patch applied) :
>
> ---- 8< ----
> $ time echo 0186ac99 | git pack-objects --no-use-bitmap-index --revs erp5pack-plain
> Counting objects: 627171, done.
> Compressing objects: 100% (176949/176949), done.
> 50570987560d481742af4a8083028c2322a0534a
> Writing objects: 100% (627171/627171), done.
> Total 627171 (delta 439404), reused 594820 (delta 410210)
>
> real	0m37.272s
> user	0m33.648s
> sys	0m1.580s
>
> $ time echo 0186ac99 | git pack-objects --revs erp5pack-bitmap
> Counting objects: 627171, done.
> Compressing objects: 100% (176914/176914), done.
> 7c15a9b1eca1326e679297b217c5a48954625ca2
> Writing objects: 100% (627171/627171), done.
> Total 627171 (delta 439484), reused 594855 (delta 410245)
>
> real	0m27.020s
> user	0m23.364s
> sys	0m0.992s
>
> $ ll erp5pack-{plain,bitmap}*
>  17561860 erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.idx
> 238760161 erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.pack
>  17561860 erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.idx
> 238634201 erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.pack
> ---- 8< ----
>
> ( By the way about pack generated with bitmap retaining close recency
>   order:
>
>   ---- 8< ----
>   $ git verify-pack -v erp5pack-plain-50570987560d481742af4a8083028c2322a0534a.pack >1
>   $ git verify-pack -v erp5pack-bitmap-7c15a9b1eca1326e679297b217c5a48954625ca2.pack >2
>   $ grep commit 1 |awk '{print $1}' >1.commit
>   $ grep commit 2 |awk '{print $1}' >2.commit
>   $ wc -l 1.commit
>   46136 1.commit
>   $ wc -l 2.commit
>   46136 2.commit
>   $ diff -u0 1.commit 2.commit |wc -l
>   55
>   ---- 8< ----
>
>   so 55/46136 shows it is very almost the same. )
>
>
> > There may be other subtle things, too. The general idea of tying the
> > bitmap use to pack_to_stdout is that you _do_ want to use it for
> > serving fetches and pushes, but for a full on-disk repack via gc, it's
> > more important to generate a good pack.
>
> It is better we send good packs to clients too, right? And with
> pack.writeBitmapHashCache=true and retaining recency order (please see
> above, but again maybe I'm not completely right) to me we should be still
> generating a good pack while using bitmap reachability index for object
> graph traversal.
>
> > Your use case:
> >
> > > git-backup extracts many packs on repositories restoration. That was my
> > > initial motivation for the patch.
> >
> > Seems to be somewhere in between. I'm not sure I understand how you're
> > invoking pack-objects here,
>
> It is just
>
>     pack-objects --revs --reuse-object --reuse-delta --delta-base-offset extractedrepo/objects/pack/pack  < SHA1-HEADS
>
>     https://lab.nexedi.com/kirr/git-backup/blob/7fcb8c67/git-backup.go#L829
>
> > but I wonder if you should be using "pack-objects --stdout" yourself.
>
> I already tried --stdout. The problem is on repository extraction we
> need to both extract the pack and index it. While `pack-object file`
> does both, for --stdout case we need to additionally index extracted
> pack with `git index-pack`, and standalone `git index-pack` is very slow
> - in my experience much slower than generating the pack itself:
>
> ---- 8< ----
> $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
> Counting objects: 627171, done.
> Compressing objects: 100% (176914/176914), done.
> Total 627171 (delta 439484), reused 594855 (delta 410245)
>
> real	0m22.309s
> user	0m21.148s
> sys	0m0.932s
>
> $ ll erp5pack-stdout*
> 238760161 erp5pack-stdout.pack
>
> $ time git index-pack erp5pack-stdout.pack
> 7c15a9b1eca1326e679297b217c5a48954625ca2
>
> real	0m50.873s   <-- more than 2 times slower than time to generate pack itself!
> user	0m49.300s
> sys	0m1.360s
>
> $ ll erp5pack-stdout*
>  17561860 erp5pack-stdout.idx
> 238760161 erp5pack-stdout.pack
> ---- 8< ----
>
> So the time for
>
>     `pack-object --stdout >file.pack` + `index-pack file.pack`  is  72s,
>
> while
>
>     `pack-objects file.pack` which does both pack and index     is  27s.
>
> And even
>
>     `pack-objects --no-use-bitmap-index file.pack`              is  37s.
>
>
> I've tried to briefly see why index-pack is so slow and offhand I can
> see that it needs to load all objects, decompresses them etc (maybe I'm
> not so right here - I looked only briefly), while pack-objects while
> generating the pack has all needed information directly at hand and thus
> can emit index much more easily.
>
> For sever - clients scenario, index-pack load is put onto clients thus
> offloading server, but for my use case where extracted repository is on
> the same machine the load does not go away.
>
> That's why for me it makes more sense to emit both pack and its index in
> one go.
>
> Still it would be interesting to eventually see why index-pack is so
> anomaly slow.
>
> > But even if it is the right thing for your use case to be using bitmaps
> > to generate an on-disk bitmap, I think we should be making sure it
> > _doesn't_ trigger when doing a normal repack.
>
> So seems the way forward here is to teach pack-objects not to silently
> drop explicit --use-pack-bitmap for cases when it can handle it?
> (currently even if this option was given, for !stdout cases pack-objects
> simply drop use_bitmap_index to 0).
>
> And to make sure default for use_bitmap_index is 0 for !stdout cases?
>
> Or are we fine with my arguments about recency order staying the same
> when using bitmap reachability index for object graph traversal, and this
> way the patch is fine to go in as it is?

Since there is no reply I assume the safe way to go is to let default
for pack-to-file case to be "not using bitmap index". Please find updated
patch and interdiff below. I would still be grateful for feedback on
my above use-bitmap-for-pack-to-file arguments.

Thanks,
Kirill

(interdiff)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e455fae..1888f42 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2241,12 +2241,20 @@ pack.packSizeLimit::
 	Common unit suffixes of 'k', 'm', or 'g' are
 	supported.
 
-pack.useBitmaps::
+pack.useBitmaps (deprecated)::
+	This is a deprecated synonym for `pack.useBitmaps.stdout`.
+
+pack.useBitmaps.stdout::
 	When true, git will use pack bitmaps (if available) when packing
 	to stdout (e.g., during the server side of a fetch). Defaults to
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
 
+pack.useBitmaps.file::
+	When true, git will use pack bitmaps (if available) when packing
+	to file (e.g., on repack). Defaults to false. You should not
+	generally need to turn this on unless you know what you are doing.
+
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index be0ebe8..7aaa1af 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
 static off_t reuse_packfile_offset;
 
-static int use_bitmap_index = 1;
+static int use_bitmap_stdout = 1, use_bitmap_file = 0;
+static int use_bitmap_index;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
@@ -2227,8 +2228,12 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
-	if (!strcmp(k, "pack.usebitmaps")) {
-		use_bitmap_index = git_config_bool(k, v);
+	if (!strcmp(k, "pack.usebitmaps") || !strcmp(k, "pack.usebitmaps.stdout")) {
+		use_bitmap_stdout = git_config_bool(k, v);
+		return 0;
+	}
+	if (!strcmp(k, "pack.usebitmaps.file")) {
+		use_bitmap_file = git_config_bool(k, v);
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -2705,6 +2710,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
+	use_bitmap_index = pack_to_stdout ? use_bitmap_stdout : use_bitmap_file;
 	if (!pack_compression_seen && core_compression_seen)
 		pack_compression_level = core_compression_level;
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 533fc31..9fab2bb 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -122,9 +122,14 @@ test_expect_success 'pack-objects to file can use bitmap' '
 	# make sure we still have 1 bitmap index from previous tests
 	ls .git/objects/pack/ | grep bitmap >output &&
 	test_line_count = 1 output &&
-	# pack-objects uses bitmap index by default, when it is available
-	packsha1=$(git pack-objects --all mypack </dev/null) &&
-	git verify-pack mypack-$packsha1.pack
+	# verify equivalent packs are generated with/without using bitmap index
+	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+	git verify-pack -v packa-$packasha1.pack >packa.verify &&
+	git verify-pack -v packb-$packbsha1.pack >packb.verify &&
+	grep -o "^$_x40" packa.verify |sort >packa.objects &&
+	grep -o "^$_x40" packb.verify |sort >packb.objects &&
+	test_cmp packa.objects packb.objects
 '
 
 test_expect_success 'full repack, reusing previous bitmaps' '

---- 8< ----
From: Kirill Smelkov <kirr@nexedi.com>
Date: Thu, 7 Jul 2016 20:12:00 +0300
Subject: [PATCH v2] pack-objects: Teach it to use reachability bitmap index when
 generating non-stdout pack too

Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

We can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to destination
  pack by write_reused_pack() bypassing needed for pack index generation
  bookkeeping done by regular codepath in write_one() and friends.

  (at least that's my understanding after briefly looking at the code)

We also need to care and teach add_object_entry_from_bitmap() to respect
--local via not adding nonlocal loose object to resultant pack (this is
bitmap-codepath counterpart of daae0625 (pack-objects: extend --local to
mean ignore non-local loose objects too) -- not to break 'loose objects in
alternate ODB are not repacked' in t7700-repack.sh .

Otherwise all git tests pass, and for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff King suggested that it might be not generally a good idea to
use bitmap reachability index when repacking a repository. For this reason
when packing to a file the default is not to use bitmap, while for
packing-to-stdout case the default stays to be "bitmap is used". The
defaults can be configured with

    pack.useBitmaps.stdout	(renamed from pack.useBitmaps), and
    pack.useBitmaps.file

More context:

    http://article.gmane.org/gmane.comp.version-control.git/299063
    http://article.gmane.org/gmane.comp.version-control.git/299107

Cc: Vicent Marti <tanoku@gmail.com>
Cc: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 Documentation/config.txt | 10 +++++++++-
 builtin/pack-objects.c   | 19 ++++++++++++++-----
 t/t5310-pack-bitmaps.sh  | 14 ++++++++++++++
 3 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e455fae..1888f42 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2241,12 +2241,20 @@ pack.packSizeLimit::
 	Common unit suffixes of 'k', 'm', or 'g' are
 	supported.
 
-pack.useBitmaps::
+pack.useBitmaps (deprecated)::
+	This is a deprecated synonym for `pack.useBitmaps.stdout`.
+
+pack.useBitmaps.stdout::
 	When true, git will use pack bitmaps (if available) when packing
 	to stdout (e.g., during the server side of a fetch). Defaults to
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
 
+pack.useBitmaps.file::
+	When true, git will use pack bitmaps (if available) when packing
+	to file (e.g., on repack). Defaults to false. You should not
+	generally need to turn this on unless you know what you are doing.
+
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfd..7aaa1af 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
 static off_t reuse_packfile_offset;
 
-static int use_bitmap_index = 1;
+static int use_bitmap_stdout = 1, use_bitmap_file = 0;
+static int use_bitmap_index;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
@@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
 {
 	uint32_t index_pos;
 
+	if (local && has_loose_object_nonlocal(sha1))
+		return 0;
+
 	if (have_duplicate_entry(sha1, 0, &index_pos))
 		return 0;
 
@@ -2224,8 +2228,12 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
-	if (!strcmp(k, "pack.usebitmaps")) {
-		use_bitmap_index = git_config_bool(k, v);
+	if (!strcmp(k, "pack.usebitmaps") || !strcmp(k, "pack.usebitmaps.stdout")) {
+		use_bitmap_stdout = git_config_bool(k, v);
+		return 0;
+	}
+	if (!strcmp(k, "pack.usebitmaps.file")) {
+		use_bitmap_file = git_config_bool(k, v);
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -2488,7 +2496,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	if (prepare_bitmap_walk(revs) < 0)
 		return -1;
 
-	if (pack_options_allow_reuse() &&
+	if (pack_options_allow_reuse() && pack_to_stdout &&
 	    !reuse_partial_packfile_from_bitmap(
 			&reuse_packfile,
 			&reuse_packfile_objects,
@@ -2702,6 +2710,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
+	use_bitmap_index = pack_to_stdout ?
use_bitmap_stdout : use_bitmap_file; if (!pack_compression_seen && core_compression_seen) pack_compression_level = core_compression_level; @@ -2773,7 +2782,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..9fab2bb 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -118,6 +118,20 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + git verify-pack -v packa-$packasha1.pack >packa.verify && + git verify-pack -v packb-$packbsha1.pack >packb.verify && + grep -o "^$_x40" packa.verify |sort >packa.objects && + grep -o "^$_x40" packb.verify |sort >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.0.431.g3cb5c84 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-12 19:08 ` Kirill Smelkov @ 2016-07-13 8:30 ` Jeff King 0 siblings, 0 replies; 62+ messages in thread From: Jeff King @ 2016-07-13 8:30 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti

On Tue, Jul 12, 2016 at 10:08:08PM +0300, Kirill Smelkov wrote:

> > Or are we fine with my arguments about recency order staying the same
> > when using bitmap reachability index for object graph traversal, and this
> > way the patch is fine to go in as it is?
>
> Since there is no reply I assume the safe way to go is to let default
> for pack-to-file case to be "not using bitmap index". Please find updated
> patch and interdiff below. I would still be grateful for feedback on
> my above use-bitmap-for-pack-to-file arguments.

Yeah, I think that is a reasonable approach. I see here you've added new config, though, and I don't think we want that.

For your purposes, where you're driving pack-objects individually, I think a command-line option makes more sense.

If we did want to have a flag for "use bitmaps when repacking via repack", I think it should be "repack.useBitmaps", and git-repack should pass the command-line option to pack-objects. pack-objects is plumbing and should not really be reading config at all. You'll note that pack.writeBitmaps was a mistake and got deprecated in favor of repack.writeBitmaps. I think pack.useBitmaps is a mistake, too, but nobody has really noticed or cared because there's no good reason to set it (the more interesting question is: are there bitmaps available? and if so, we try to use them).

-Peff

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-08 10:38 ` Kirill Smelkov 2016-07-12 19:08 ` Kirill Smelkov @ 2016-07-13 8:26 ` Jeff King 2016-07-13 10:52 ` Kirill Smelkov 1 sibling, 1 reply; 62+ messages in thread From: Jeff King @ 2016-07-13 8:26 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote: > > - we will not compute the same write order (which is based on > > traversal order), leading to packs that have less efficient cache > > characteristics > > I agree the order can be not exactly the same. Still if original pack is > packed well (with good recency order), while using bitmap we will tend > to traverse it in close to original order. > > Maybe I'm not completely right on this, but to me it looks to be the > case because if objects in original pack are put there linearly sorted > by recency order, and we use bitmap index to set of all reachable > objects from a root, and then just _linearly_ gather all those objects > from original pack by 1s in bitmap and put them in the same order into > destination pack, the recency order won't be broken. > > Or am I maybe misunderstanding something? Yeah, I think you can go some of the way by reusing the order from the old pack. But keep in mind that the bitmap result may also contain objects that are not yet packed. Those will just come in a big lump at the end of the bitmap (these are the "extended entries" in the bitmap code). So I think if you were to repeatedly "git repack -adb" over time, you would get worse and worse ordering as objects are added to the repository. As an aside, two other things that pack order matters for: it makes the bitmaps themselves compress better (because it increases locality of reachability, so you get nice runs of "1" or "0" bits). 
It also makes the pack-reuse code more efficient (since in an ideal case, you can just dump a big block of data from the front of the pack). Note that the pack-reuse code that's in upstream git isn't that great; I have a better system on my big pile of patches to send upstream (that never seems to get smaller; <sigh>). > > - we don't learn about the filename of trees and blobs, which is going > > to make the delta step much less efficient. This might be mitigated > > by turning on the bitmap name-hash cache; I don't recall how much > > detail pack-objects needs on the name (i.e., the full name versus > > just the hash). > > If I understand it right, it uses only uint32_t name hash while searching. From > pack-objects.{h,c} : Yeah, I think you are right. Not having the real names is a problem for doing rev-list output, but I think pack-objects doesn't care (though do note that the name-hash cache is not enabled by default). > > There may be other subtle things, too. The general idea of tying the > > bitmap use to pack_to_stdout is that you _do_ want to use it for > > serving fetches and pushes, but for a full on-disk repack via gc, it's > > more important to generate a good pack. > > It is better we send good packs to clients too, right? And with > pack.writeBitmapHashCache=true and retaining recency order (please see > above, but again maybe I'm not completely right) to me we should be still > generating a good pack while using bitmap reachability index for object > graph traversal. We do want to send the client a good pack, but it's always a tradeoff. We could spend much more time searching for the perfect delta, but at some point we have to decide on how much CPU to spend serving them. Likewise, even if the bitmapped packs we send are in slightly worse order, saving a minute of CPU time off of every clone of the kernel is a big deal. We also take robustness shortcuts when sending to clients. 
For example, when doing an on-disk repack we re-crc32 all of the delta data we are reusing, even if we don't actually inflate it (because we would want to stop immediately if we see even a single bit flipped on disk). But we don't check them when sending to a client, because we know they are going to actually `index-pack` it and get a stronger consistency check anyway, and don't want to waste server CPU. The bitmaps are sort of the same. If there is a bug or corruption in the bitmap, the worst case is that we send a broken pack to the client, who will complain that we did not give them all of the objects. It's a momentary problem that can be fixed. If you use them for an on-disk repack, then the next step is usually to delete all of the old packs. So a corruption there carries forward, and is irreversible. As I understand your use case, it is OK to do the less careful things. It's just that pack-objects until now has been split into two modes: packing to a file is careful, and packing to stdout is less so. And you want to pack to a file in the non-careful mode. > > but I wonder if you should be using "pack-objects --stdout" yourself. > > I already tried --stdout. The problem is on repository extraction we > need to both extract the pack and index it. While `pack-objects file` > does both, for --stdout case we need to additionally index extracted > pack with `git index-pack`, and standalone `git index-pack` is very slow > - in my experience much slower than generating the pack itself: Ah, right, that makes sense. The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas! By contrast, a pack to stdout can be quite quick, because in most cases it can avoid even inflating most of the data; where possible it just sends the zlib data straight from disk to the client.
So I do agree "--stdout" is not ideal for you (or at the very least, you really want pack-objects to generate the index from its internal table rather than having to reconstruct it just from the pack stream).

> > But even if it is the right thing for your use case to be using bitmaps
> > to generate an on-disk bitmap, I think we should be making sure it
> > _doesn't_ trigger when doing a normal repack.
>
> So seems the way forward here is to teach pack-objects not to silently
> drop explicit --use-bitmap-index for cases when it can handle it?
> (currently even if this option was given, for !stdout cases pack-objects
> simply drop use_bitmap_index to 0).
>
> And to make sure default for use_bitmap_index is 0 for !stdout cases?

I think it would be reasonable to accept "--use-bitmap-index" on the command line as an override for "yes, really, this is what I want". So the logic would be something like:

	static int use_bitmap_index_default = 1;
	static int use_bitmap_index = -1;

	... parse config; if we see pack.usebitmaps, set
	    use_bitmap_index_default ...

	... parse command line, setting use_bitmap_index ...

	/* "soft" reasons not to use bitmaps */
	if (!pack_to_stdout)
		use_bitmap_index_default = 0;

	/* now install our default if the user didn't otherwise specify */
	if (use_bitmap_index < 0)
		use_bitmap_index = use_bitmap_index_default;

	/* "hard" reasons not to use bitmaps; these just won't work at all */
	if (!use_internal_rev_list || is_repository_shallow())
		use_bitmap_index = 0;

-Peff

^ permalink raw reply [flat|nested] 62+ messages in thread
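The resolution order sketched above (config sets a default, the command line may override it, "soft" reasons only weaken the default, "hard" reasons always win) can be written as a small standalone C function. This is only an illustrative sketch and not code from git: `struct bitmap_inputs` and `resolve_use_bitmap_index()` are hypothetical names, and the `write_bitmap_index` hard reason is an assumption taken from the patch under discussion rather than from the pseudocode in this message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the state pack-objects consults. The field
 * names mirror variables discussed in the thread, but this struct is
 * not part of git.
 */
struct bitmap_inputs {
	int cmdline;                /* -1 = option not given, 0 = --no-use-bitmap-index, 1 = --use-bitmap-index */
	bool config_default;        /* pack.useBitmaps; true when the key is unset */
	bool pack_to_stdout;
	bool use_internal_rev_list;
	bool repository_shallow;
	bool write_bitmap_index;    /* assumption: hard reason added by the patch for the to-file case */
};

/* Returns 1 when the bitmap index should be used, 0 otherwise. */
static int resolve_use_bitmap_index(const struct bitmap_inputs *in)
{
	int use_default = in->config_default ? 1 : 0;
	int use = in->cmdline;

	assert(use >= -1 && use <= 1);

	/* "soft" reason: packing to a file prefers the careful codepath */
	if (!in->pack_to_stdout)
		use_default = 0;

	/* install the default only if the command line was silent */
	if (use < 0)
		use = use_default;

	/* "hard" reasons: bitmaps cannot work at all in these cases */
	if (!in->use_internal_rev_list ||
	    (!in->pack_to_stdout && in->write_bitmap_index) ||
	    in->repository_shallow)
		use = 0;

	return use;
}
```

The tri-state `cmdline` value is what lets an explicit `--use-bitmap-index` survive the soft reason while still being dropped by a hard one; a plain boolean could not distinguish "not given" from "explicitly off".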
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-13 8:26 ` Jeff King @ 2016-07-13 10:52 ` Kirill Smelkov 2016-07-17 17:06 ` Kirill Smelkov 2016-07-25 18:40 ` Jeff King 0 siblings, 2 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-07-13 10:52 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Wed, Jul 13, 2016 at 04:26:53AM -0400, Jeff King wrote: > On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote: > > > > - we will not compute the same write order (which is based on > > > traversal order), leading to packs that have less efficient cache > > > characteristics > > > > I agree the order can be not exactly the same. Still if original pack is > > packed well (with good recency order), while using bitmap we will tend > > to traverse it in close to original order. > > > > Maybe I'm not completely right on this, but to me it looks to be the > > case because if objects in original pack are put there linearly sorted > > by recency order, and we use bitmap index to set of all reachable > > objects from a root, and then just _linearly_ gather all those objects > > from original pack by 1s in bitmap and put them in the same order into > > destination pack, the recency order won't be broken. > > > > Or am I maybe misunderstanding something? > > Yeah, I think you can go some of the way by reusing the order from the > old pack. But keep in mind that the bitmap result may also contain > objects that are not yet packed. Those will just come in a big lump at > the end of the bitmap (these are the "extended entries" in the bitmap > code). > > So I think if you were to repeatedly "git repack -adb" over time, you > would get worse and worse ordering as objects are added to the > repository. Jeff, first of all thanks for clarifying. So it is not-yet-packed-objects which make packing with bitmap less efficient. 
I originally had in mind a freshly repacked repository with a just-built bitmap index, and for that case extracting a pack with the bitmap index seems to be fine, but the more not-yet-packed objects we have, the worse the result can be. > As an aside, two other things that pack order matters for: it makes the > bitmaps themselves compress better (because it increases locality of > reachability, so you get nice runs of "1" or "0" bits). Yes, I agree, and thanks for bringing this up: putting objects in recency order in the pack also gives the bitmap index longer runs of identical 1 or 0 bits. > It also makes > the pack-reuse code more efficient (since in an ideal case, you can just > dump a big block of data from the front of the pack). Note that the > pack-reuse code that's in upstream git isn't that great; I have a better > system on my big pile of patches to send upstream (that never seems to > get smaller; <sigh>). Yes, that also makes sense. I saw that write_reused_pack() in upstream git just copies raw bytes from the original to the destination pack. You mentioned you have something better for pack reuse in your patch queue. In two words, does it now reuse the pack per object rather than as raw bytes, or is it something else? In other words, in what way does it work better? (I'm just curious here, as it is interesting to know.) > > > - we don't learn about the filename of trees and blobs, which is going > > > to make the delta step much less efficient. This might be mitigated > > > by turning on the bitmap name-hash cache; I don't recall how much > > > detail pack-objects needs on the name (i.e., the full name versus > > > just the hash). > > > > If I understand it right, it uses only uint32_t name hash while searching. From > > pack-objects.{h,c} : > > Yeah, I think you are right. Not having the real names is a problem for > doing rev-list output, but I think pack-objects doesn't care (though do > note that the name-hash cache is not enabled by default). Yes, for packing only the hash is used.
And I assume name-hash for bitmap is not enabled by default for compatibility with JGit code. It would make sense to me to eventually enable name-hash bitmap extension by default, as packing result is much better with it. And those who care about compatibility with JGit can just turn it off in their git config. Just my thoughts. > > > There may be other subtle things, too. The general idea of tying the > > > bitmap use to pack_to_stdout is that you _do_ want to use it for > > > serving fetches and pushes, but for a full on-disk repack via gc, it's > > > more important to generate a good pack. > > > > It is better we send good packs to clients too, right? And with > > pack.writeBitmapHashCache=true and retaining recency order (please see > > above, but again maybe I'm not completely right) to me we should be still > > generating a good pack while using bitmap reachability index for object > > graph traversal. > > We do want to send the client a good pack, but it's always a tradeoff. > We could spend much more time searching for the perfect delta, but at > some point we have to decide on how much CPU to spend serving them. > Likewise, even if the bitmapped packs we send are in slightly worse > order, saving a minute of CPU time off of every clone of the kernel is a > big deal. Yes, this I understand and agree. Like I said above I was imagining freshly repacked repo with recently rebuilt bitmap index and for that case we send a good pack with bitmaps out-of-the-box. > We also take robustness shortcuts when sending to clients. For example, > when doing an on-disk repack we re-crc32 all of the delta data we are > reusing, even if we don't actually inflate it (because we would want to > stop immediately if we see even a single bit flipped on disk). But we > don't check them when sending to a client, because we know they are > going to actually `index-pack` it and get a stronger consistency check > anyway, and don't want to waste server CPU. 
> > The bitmaps are sort of the same. If there is a bug or corruption in the > bitmap, the worst case is that we send a broken pack to the client, who > will complain that we did not give them all of the objects. It's a > momentary problem that can be fixed. If you use them for an on-disk > repack, then the next step is usually to delete all of the old packs. So > a corruption there carries forward, and is irreversible. Thanks for clarifying here. I did not know that pack-to-file is assumed to be robust while pack-to-stdout is allowed to be less so. Or at least I had not thought about it this way before. > As I understand your use case, it is OK to do the less careful things. > It's just that pack-objects until now has been split into two modes: > packing to a file is careful, and packing to stdout is less so. And you > want to pack to a file in the non-careful mode. Yes, it should be ok, as after repository extraction git-backup verifies rev-list for all refs https://lab.nexedi.com/kirr/git-backup/blob/7fcb8c67/git-backup.go#L855 And if an object is missing - e.g. a blob - rev-list complains: fatal: missing blob object '980a0d5f19a64b4b30a87d4206aade58726b60e3' though it does not catch blob corruption. Since the worst that can happen when using the bitmap index (due to a bug in the bitmap code or bitmap index corruption) is that not all objects are extracted, this should be an effective measure to catch it. The original whole-backup repository is also not removed, so we can re-extract objects anytime. So yes, using the bitmap reachability index for faster extraction from a freshly repacked and bitmap-indexed backup repository should be ok and makes sense to me. > > > but I wonder if you should be using "pack-objects --stdout" yourself. > > > > I already tried --stdout. The problem is on repository extraction we > > need to both extract the pack and index it.
While `pack-objects file` > > does both, for --stdout case we need to additionally index extracted > > pack with `git index-pack`, and standalone `git index-pack` is very slow > > - in my experience much slower than generating the pack itself: > > Ah, right, that makes sense. The packfile does not carry the sha1 of the > objects. A receiving index-pack has to compute them itself, including > inflating and applying all of the deltas! By contrast, a pack to stdout > can be quite quick, because in most cases it can avoid even inflating > most of the data; where possible it just sends the zlib data straight > from disk to the client. > > So I do agree "--stdout" is not ideal for you (or at the very least, you > really want pack-objects to generate the index from its internal table > rather than having to reconstruct it just from the pack stream). Yes, and thanks for clarifying a bit why standalone index-pack can be slow. > > > But even if it is the right thing for your use case to be using bitmaps > > > to generate an on-disk bitmap, I think we should be making sure it > > > _doesn't_ trigger when doing a normal repack. > > > > So seems the way forward here is to teach pack-objects not to silently > > drop explicit --use-bitmap-index for cases when it can handle it? > > (currently even if this option was given, for !stdout cases pack-objects > > simply drop use_bitmap_index to 0). > > > > And to make sure default for use_bitmap_index is 0 for !stdout cases? > > I think it would be reasonable to accept "--use-bitmap-index" on the > command line as an override for "yes, really, this is what I want". So > the logic would be something like: > > static int use_bitmap_index_default = 1; > static int use_bitmap_index = -1; > > ... parse config; if we see pack.usebitmaps, set > use_bitmap_index_default ... > > ... parse command line, setting use_bitmap_index ...
> > /* "soft" reasons not to use bitmaps */ > if (!pack_to_stdout) > use_bitmap_index_default = 0; > > /* now install our default if the user didn't otherwise specify */ > if (use_bitmap_index < 0) > use_bitmap_index = use_bitmap_index_default; > > /* "hard" reasons not to use bitmaps; these just won't work at all */ > if (!use_internal_rev_list || is_repository_shallow()) > use_bitmap_index = 0; On Wed, Jul 13, 2016 at 04:30:44AM -0400, Jeff King wrote: > On Tue, Jul 12, 2016 at 10:08:08PM +0300, Kirill Smelkov wrote: > > > > Or are we fine with my arguments about recency order staying the same > > > when using bitmap reachability index for object graph traversal, and this > > > way the patch is fine to go in as it is? > > > > Since there is no reply I assume the safe way to go is to let default > > for pack-to-file case to be "not using bitmap index". Please find updated > > patch and interdiff below. I would still be grateful for feedback on > > my above use-bitmap-for-pack-to-file arguments. > > Yeah, I think that is a reasonable approach. I see here you've added new > config, though, and I don't think we want that. > > For your purposes, where you're driving pack-objects individually, I > think a command-line option makes more sense. Yes, I was going to use --use-bitmap-index explicitly, but I thought that since we already have pack.useBitmaps, for consistency it would be better to introduce a config point controlling the to-file case. > If we did want to have a flag for "use bitmaps when repacking via > repack", I think it should be "repack.useBitmaps", and git-repack should > pass the command-line option to pack-objects. pack-objects is plumbing > and should not really be reading config at all. You'll note that > pack.writeBitmaps was a mistake and got deprecated in favor of > repack.writeBitmaps.
I think pack.useBitmaps is a mistake, too, but > nobody has really noticed or cared because there's no good reason to set > it (the more interesting question is: are there bitmaps available? and > if so, we try to use them). Probably pack.useBitmaps is of no use in normal situation, but for debugging problems related to bitmaps it can be handy. Though when someone debugs he/she can just adjust pack-objects.c . So should we deprecate and eventually remove pack.useBitmaps ? Anyway, please find below updated patch according to your suggestion. Hope it is ok now. Thanks again, Kirill (interdiff) diff --git a/Documentation/config.txt b/Documentation/config.txt index 8027951..4b14806 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2229,19 +2229,14 @@ pack.packSizeLimit:: Common unit suffixes of 'k', 'm', or 'g' are supported. -pack.useBitmaps (deprecated):: - This is a deprecated synonym for `pack.useBitmaps.stdout`. - -pack.useBitmaps.stdout:: +pack.useBitmaps:: When true, git will use pack bitmaps (if available) when packing to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. - -pack.useBitmaps.file:: - When true, git will use pack bitmaps (if available) when packing - to file (e.g., on repack). Defaults to false. You should not - generally need to turn this on unless you know what you are doing. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps. pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 7aaa1af..ffe8da6 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,8 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_stdout = 1, use_bitmap_file = 0; -static int use_bitmap_index; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2228,12 +2228,8 @@ static int git_pack_config(const char *k, const char *v, void *cb) else write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } - if (!strcmp(k, "pack.usebitmaps") || !strcmp(k, "pack.usebitmaps.stdout")) { - use_bitmap_stdout = git_config_bool(k, v); - return 0; - } - if (!strcmp(k, "pack.usebitmaps.file")) { - use_bitmap_file = git_config_bool(k, v); + if (!strcmp(k, "pack.usebitmaps")) { + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2710,7 +2706,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reset_pack_idx_option(&pack_idx_opts); git_config(git_pack_config, NULL); - use_bitmap_index = pack_to_stdout ? use_bitmap_stdout : use_bitmap_file; if (!pack_compression_seen && core_compression_seen) pack_compression_level = core_compression_level; @@ -2782,6 +2777,22 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH v3] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. We can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. (at least that's my understanding after briefly looking at the code) We also need to care and teach add_object_entry_from_bitmap() to respect --local via not adding nonlocal loose object to resultant pack (this is bitmap-codepath counterpart of daae0625 (pack-objects: extend --local to mean ignore non-local loose objects too) -- not to break 'loose objects in alternate ODB are not repacked' in t7700-repack.sh . 
Otherwise all git tests pass, and for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff King suggested that it might be not generally a good idea to use bitmap reachability index when repacking a repository. The reason here is for on-disk repack by default we want - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. 
More context: http://article.gmane.org/gmane.comp.version-control.git/299063 http://article.gmane.org/gmane.comp.version-control.git/299107 http://article.gmane.org/gmane.comp.version-control.git/299420 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- Documentation/config.txt | 3 +++ builtin/pack-objects.c | 28 ++++++++++++++++++++++++---- t/t5310-pack-bitmaps.sh | 14 ++++++++++++++ 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index db05dec..4b14806 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2234,6 +2234,9 @@ pack.useBitmaps:: to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps. pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a2f8cfd..ffe8da6 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, { uint32_t index_pos; + if (local && has_loose_object_nonlocal(sha1)) + return 0; + if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; @@ -2225,7 +2229,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2488,7 +2492,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (prepare_bitmap_walk(revs) < 0) return -1; - if (pack_options_allow_reuse() && + if (pack_options_allow_reuse() && pack_to_stdout && !reuse_partial_packfile_from_bitmap( &reuse_packfile, &reuse_packfile_objects, @@ -2773,7 +2777,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..9fab2bb 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -118,6 +118,20 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + git verify-pack -v packa-$packasha1.pack >packa.verify && + git verify-pack -v packb-$packbsha1.pack >packb.verify && + grep -o "^$_x40" packa.verify |sort >packa.objects && + grep -o "^$_x40" packb.verify |sort >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.0.431.g3cb5c84 ^ permalink raw reply related [flat|nested] 62+ messages in thread
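The option resolution in the hunk above (an explicit command-line choice overrides the default, packing to a file switches the default off, and the "hard" reasons always win) can be restated as one small standalone function. This is only a sketch under stated assumptions: `struct bitmap_inputs` and `resolve_use_bitmap_index()` are hypothetical names used for illustration; builtin/pack-objects.c keeps these values in file-scope variables instead.

```c
/*
 * Hypothetical restatement of the patch's decision logic.
 * cmdline is a tri-state: -1 = option not given, 0 = off, 1 = on.
 */
struct bitmap_inputs {
	int cmdline;			/* --[no-]use-bitmap-index, or -1 */
	int config_default;		/* pack.useBitmaps; 1 when unset */
	int pack_to_stdout;		/* non-zero when packing to stdout */
	int use_internal_rev_list;	/* non-zero with --revs, --all, ... */
	int write_bitmap_index;		/* non-zero when a new bitmap is written */
	int repository_shallow;		/* non-zero for a shallow repository */
};

static int resolve_use_bitmap_index(const struct bitmap_inputs *in)
{
	int def = in->config_default;
	int use = in->cmdline;

	/* "soft" reason: to-file packing defaults to the regular walk */
	if (!in->pack_to_stdout)
		def = 0;

	/* install the default only if the user did not specify anything */
	if (use < 0)
		use = def;

	/* "hard" reasons: bitmaps cannot work at all in these cases */
	if (!in->use_internal_rev_list ||
	    (!in->pack_to_stdout && in->write_bitmap_index) ||
	    in->repository_shallow)
		use = 0;

	return use;
}
```

With these rules a to-file pack uses the bitmap index only when `--use-bitmap-index` is given explicitly and no new bitmap is being written; to-stdout packing keeps its previous behaviour.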
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-13 10:52 ` Kirill Smelkov @ 2016-07-17 17:06 ` Kirill Smelkov 2016-07-19 11:29 ` Jeff King 2016-07-25 18:40 ` Jeff King 1 sibling, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-07-17 17:06 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Wed, Jul 13, 2016 at 01:52:16PM +0300, Kirill Smelkov wrote: > On Wed, Jul 13, 2016 at 04:26:53AM -0400, Jeff King wrote: > > On Fri, Jul 08, 2016 at 01:38:55PM +0300, Kirill Smelkov wrote: > > > > > > - we will not compute the same write order (which is based on > > > > traversal order), leading to packs that have less efficient cache > > > > characteristics > > > > > > I agree the order can be not exactly the same. Still if original pack is > > > packed well (with good recency order), while using bitmap we will tend > > > to traverse it in close to original order. > > > > > > Maybe I'm not completely right on this, but to me it looks to be the > > > case because if objects in original pack are put there linearly sorted > > > by recency order, and we use bitmap index to set of all reachable > > > objects from a root, and then just _linearly_ gather all those objects > > > from original pack by 1s in bitmap and put them in the same order into > > > destination pack, the recency order won't be broken. > > > > > > Or am I maybe misunderstanding something? > > > > Yeah, I think you can go some of the way by reusing the order from the > > old pack. But keep in mind that the bitmap result may also contain > > objects that are not yet packed. Those will just come in a big lump at > > the end of the bitmap (these are the "extended entries" in the bitmap > > code). > > > > So I think if you were to repeatedly "git repack -adb" over time, you > > would get worse and worse ordering as objects are added to the > > repository. 
> > Jeff, first of all thanks for clarifying. > > So it is not-yet-packed-objects which make packing with bitmap less > efficient. I was originally keeping in mind fresh repacked repository > with just built bitmap index and for that case extracting pack with > bitmap index seems to be just ok, but the more not-yet-packed objects we > have the worse the result can be. > > > As an aside, two other things that pack order matters for: it makes the > > bitmaps themselves compress better (because it increases locality of > > reachability, so you get nice runs of "1" or "0" bits). > > Yes I agree and thanks for bringing this up - putting objects in recency > order in pack also makes bitmap index to have larger runs of same 1 or 0. > > > It also makes > > the pack-reuse code more efficient (since in an ideal case, you can just > > dump a big block of data from the front of the pack). Note that the > > pack-reuse code that's in upstream git isn't that great; I have a better > > system on my big pile of patches to send upstream (that never seems to > > get smaller; <sigh>). > > Yes, it also make sense. I saw write_reused_pack() in upstream git just > copy raw bytes from original to destination pack. You mentioned you have > something better for pack reuse - in your patch queue, in two words, is > it now reusing pack based on object, not raw bytes, or is it something > else? > > In other words in which way it works better? (I'm just curious here as > it is interesting to know) > > > > > > - we don't learn about the filename of trees and blobs, which is going > > > > to make the delta step much less efficient. This might be mitigated > > > > by turning on the bitmap name-hash cache; I don't recall how much > > > > detail pack-objects needs on the name (i.e., the full name versus > > > > just the hash). > > > > > > If I understand it right, it uses only uint32_t name hash while searching. From > > > pack-objects.{h,c} : > > > > Yeah, I think you are right. 
Not having the real names is a problem for > > doing rev-list output, but I think pack-objects doesn't care (though do > > note that the name-hash cache is not enabled by default). > > Yes, for packing it is only hash which is used. And I assume name-hash > for bitmap is not enabled by default for compatibility with JGit code. > > It would make sense to me to eventually enable name-hash bitmap > extension by default, as packing result is much better with it. And > those who care about compatibility with JGit can just turn it off in > their git config. > > Just my thoughts. > > > > > There may be other subtle things, too. The general idea of tying the > > > > bitmap use to pack_to_stdout is that you _do_ want to use it for > > > > serving fetches and pushes, but for a full on-disk repack via gc, it's > > > > more important to generate a good pack. > > > > > > It is better we send good packs to clients too, right? And with > > > pack.writeBitmapHashCache=true and retaining recency order (please see > > > above, but again maybe I'm not completely right) to me we should be still > > > generating a good pack while using bitmap reachability index for object > > > graph traversal. > > > > We do want to send the client a good pack, but it's always a tradeoff. > > We could spend much more time searching for the perfect delta, but at > > some point we have to decide on how much CPU to spend serving them. > > Likewise, even if the bitmapped packs we send are in slightly worse > > order, saving a minute of CPU time off of every clone of the kernel is a > > big deal. > > Yes, this I understand and agree. Like I said above I was imagining > freshly repacked repo with recently rebuilt bitmap index and for that > case we send a good pack with bitmaps out-of-the-box. > > > We also take robustness shortcuts when sending to clients. 
For example, > > when doing an on-disk repack we re-crc32 all of the delta data we are > > reusing, even if we don't actually inflate it (because we would want to > > stop immediately if we see even a single bit flipped on disk). But we > > don't check them when sending to a client, because we know they are > > going to actually `index-pack` it and get a stronger consistency check > > anyway, and don't want to waste server CPU. > > > > The bitmaps are sort of the same. If there is a bug or corruption in the > > bitmap, the worst case is that we send a broken pack to the client, who > > will complain that we did not give them all of the objects. It's a > > momentary problem that can be fixed. If you use them for an on-disk > > repack, then the next step is usually to delete all of the old packs. So > > a corruption there carries forward, and is irreversible. > > Thanks for clarifying here. I did not know pack-to-file is assumed to be > robust and pack-to-stdout is assumed to be allowed to be less so. Or at > least I did not think about it this way before. > > > As I understand your use case, it is OK to do the less careful things. > > It's just that pack-objects until now has been split into two modes: > > packing to a file is careful, and packing to stdout is less so. And you > > want to pack to a file in the non-careful mode. > > Yes, it should be ok, as after repository extraction git-backup > verifies rev-list for all refs > > https://lab.nexedi.com/kirr/git-backup/blob/7fcb8c67/git-backup.go#L855 > > And if an object is missing - e.g. a blob - rev-list complains: > > fatal: missing blob object '980a0d5f19a64b4b30a87d4206aade58726b60e3' > > though it does not catch blob corruptions. > > Since, when using bitmap index (due to bug in bitmap code or bitmap > index corruption), the worst that can happen is that not all objects are > extracted, this should be an effective measure to catch it.
> > The original whole-backup repository is also not removed, so we can > re-extract objects anytime. > > So yes, using bitmap reachability index for faster extraction from > freshly repacked and bitmap indexed backup repository should be ok and > makes sense to me. > > > > > > but I wonder if you should be using "pack-objects --stdout" yourself. > > > > > > I already tried --stdout. The problem is on repository extraction we > > > need to both extract the pack and index it. While `pack-objects file` > > > does both, for --stdout case we need to additionally index extracted > > > pack with `git index-pack`, and standalone `git index-pack` is very slow > > > - in my experience much slower than generating the pack itself: > > > > Ah, right, that makes sense. The packfile does not carry the sha1 of the > > objects. A receiving index-pack has to compute them itself, including > > inflating and applying all of the deltas! By contrast, a pack to stdout > > can be quite quick, because in most cases it can avoid even inflating > > most of the data; where possible it just sends the zlib data straight > > from disk to the client. > > > > So I do agree "--stdout" is not ideal for you (or at the very least, you > > really want pack-objects to generate the index from its internal table > > rather than having to reconstruct it just from the pack stream). > > Yes, and thanks for clarifying a bit why standalone index-pack can be > slow. > > > > > But even if it is the right thing for your use case to be using bitmaps > > > > to generate an on-disk bitmap, I think we should be making sure it > > > > _doesn't_ trigger when doing a normal repack. > > > > > > So seems the way forward here is to teach pack-objects not to silently > > > drop explicit --use-bitmap-index for cases when it can handle it? > > > (currently even if this option was given, for !stdout cases pack-objects > > > simply drops use_bitmap_index to 0).
> > > > > > And to make sure default for use_bitmap_index is 0 for !stdout cases? > > > > I think it would be reasonable to accept "--use-bitmap-index" on the > > command line as an override for "yes, really, this is what I want". So > > the logic would be something like: > > > > static int use_bitmap_index_default = 1; > > static int use_bitmap_index = -1; > > > > ... parse config; if we see pack.usebitmaps, set > > use_bitmap_index_default ... > > > > ... parse command line, setting use_bitmap_index ... > > > > /* "soft" reasons not to use bitmaps */ > > if (!pack_to_stdout) > > use_bitmap_index_default = 0; > > > > /* now install our default if the user didn't otherwise specify */ > > if (use_bitmap_index < 0) > > use_bitmap_index = use_bitmap_index_default; > > > > /* "hard" reasons not to use bitmaps; these just won't work at all */ > > if (!use_internal_rev_list || is_repository_shallow()) > > use_bitmap_index = 0; > > > On Wed, Jul 13, 2016 at 04:30:44AM -0400, Jeff King wrote: > > On Tue, Jul 12, 2016 at 10:08:08PM +0300, Kirill Smelkov wrote: > > > > > > Or are we fine with my arguments about recency order staying the same > > > > when using bitmap reachability index for object graph traversal, and this > > > > way the patch is fine to go in as it is? > > > > > > Since there is no reply I assume the safe way to go is to let default > > > for pack-to-file case to be "not using bitmap index". Please find updated > > > patch and interdiff below. I would still be grateful for feedback on > > > my above use-bitmap-for-pack-to-file arguments. > > > > Yeah, I think that is a reasonable approach. I see here you've added new > > config, though, and I don't think we want that. > > > > For your purposes, where you're driving pack-objects individually, I > > think a command-line option makes more sense. 
> > Yes, I was going to use --use-bitmap-index explicitly, but I thought > since we already have pack.useBitmaps for consistency it is better to > introduce controlling to-file config point. > > > > If we did want to have a flag for "use bitmaps when repacking via > > repack", I think it should be "repack.useBitmaps", and git-repack should > > pass the command-line option to pack-objects. pack-objects is plumbing > > and should not really be reading config at all. You'll note that > > pack.writeBitmaps was a mistake and got deprecated in favor of > > repack.writeBitmaps. I think pack.useBitmaps is a mistake, too, but > > nobody has really noticed or cared because there's no good reason to set > > it (the more interesting question is: are there bitmaps available? and > > if so, we try to use them). > > Probably pack.useBitmaps is of no use in normal situation, but for > debugging problems related to bitmaps it can be handy. Though when > someone debugs he/she can just adjust pack-objects.c . So should we > deprecate and eventually remove pack.useBitmaps ? > > Anyway, please find below updated patch according to your suggestion. > Hope it is ok now. Ping. Is the patch OK, or does something still need to be improved? Thanks in advance for feedback, Kirill > (interdiff) > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 8027951..4b14806 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2229,19 +2229,14 @@ pack.packSizeLimit:: > Common unit suffixes of 'k', 'm', or 'g' are > supported. > > -pack.useBitmaps (deprecated):: > - This is a deprecated synonym for `pack.useBitmaps.stdout`. > - > -pack.useBitmaps.stdout:: > +pack.useBitmaps:: > When true, git will use pack bitmaps (if available) when packing > to stdout (e.g., during the server side of a fetch). Defaults to > true. You should not generally need to turn this off unless > you are debugging pack bitmaps.
> - > -pack.useBitmaps.file:: > - When true, git will use pack bitmaps (if available) when packing > - to file (e.g., on repack). Defaults to false. You should not > - generally need to turn this on unless you know what you are doing. > ++ > +*NOTE*: when packing to file (e.g., on repack) the default is always not to use > + pack bitmaps. > > pack.writeBitmaps (deprecated):: > This is a deprecated synonym for `repack.writeBitmaps`. > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 7aaa1af..ffe8da6 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -66,8 +66,8 @@ static struct packed_git *reuse_packfile; > static uint32_t reuse_packfile_objects; > static off_t reuse_packfile_offset; > > -static int use_bitmap_stdout = 1, use_bitmap_file = 0; > -static int use_bitmap_index; > +static int use_bitmap_index_default = 1; > +static int use_bitmap_index = -1; > static int write_bitmap_index; > static uint16_t write_bitmap_options; > > @@ -2228,12 +2228,8 @@ static int git_pack_config(const char *k, const char *v, void *cb) > else > write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; > } > - if (!strcmp(k, "pack.usebitmaps") || !strcmp(k, "pack.usebitmaps.stdout")) { > - use_bitmap_stdout = git_config_bool(k, v); > - return 0; > - } > - if (!strcmp(k, "pack.usebitmaps.file")) { > - use_bitmap_file = git_config_bool(k, v); > + if (!strcmp(k, "pack.usebitmaps")) { > + use_bitmap_index_default = git_config_bool(k, v); > return 0; > } > if (!strcmp(k, "pack.threads")) { > @@ -2710,7 +2706,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > > reset_pack_idx_option(&pack_idx_opts); > git_config(git_pack_config, NULL); > - use_bitmap_index = pack_to_stdout ? 
use_bitmap_stdout : use_bitmap_file; > if (!pack_compression_seen && core_compression_seen) > pack_compression_level = core_compression_level; > > @@ -2782,6 +2777,22 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > + /* > + * "soft" reasons not to use bitmaps - for on-disk repack by default we want > + * > + * - to produce good pack (with bitmap index not-yet-packed objects are > + * packed in suboptimal order). > + * > + * - to use more robust pack-generation codepath (avoiding possible > + * bugs in bitmap code and possible bitmap index corruption). > + */ > + if (!pack_to_stdout) > + use_bitmap_index_default = 0; > + > + if (use_bitmap_index < 0) > + use_bitmap_index = use_bitmap_index_default; > + > + /* "hard" reasons not to use bitmaps; these just won't work at all */ > if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) > use_bitmap_index = 0; > > > ---- 8< ---- > From: Kirill Smelkov <kirr@nexedi.com> > Subject: [PATCH v3] pack-objects: Teach it to use reachability bitmap index when > generating non-stdout pack too > > Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) > if a repository has bitmap index, pack-objects can nicely speedup > "Counting objects" graph traversal phase. That however was done only for > case when resultant pack is sent to stdout, not written into a file. > > We can teach pack-objects to use bitmap index for initial object > counting phase when generating resultant pack file too: > > - if we know bitmap index generation is not enabled for resultant pack: > > Current code has singleton bitmap_git so cannot work simultaneously > with two bitmap indices. 
> > - if we keep pack reuse enabled still only for "send-to-stdout" case: > > Because on pack reuse raw entries are directly written out to destination > pack by write_reused_pack() bypassing needed for pack index generation > bookkeeping done by regular codepath in write_one() and friends. > > (at least that's my understanding after briefly looking at the code) > > We also need to care and teach add_object_entry_from_bitmap() to respect > --local via not adding nonlocal loose object to resultant pack (this > is bitmap-codepath counterpart of daae0625 (pack-objects: extend --local > to mean ignore non-local loose objects too) -- not to break 'loose > objects in alternate ODB are not repacked' in t7700-repack.sh . > > Otherwise all git tests pass, and for pack-objects -> file we get nice > speedup: > > erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup > repository managed by git-backup[2] via > > time echo 0186ac99 | git pack-objects --revs erp5pack > > before: 37.2s > after: 26.2s > > And for `git repack -adb` packed git.git > > time echo 5c589a73 | git pack-objects --revs gitpack > > before: 7.1s > after: 3.6s > > i.e. it can be 30% - 50% speedup for pack extraction. > > git-backup extracts many packs on repositories restoration. That was my > initial motivation for the patch. > > [1] https://lab.nexedi.com/nexedi/erp5 > [2] https://lab.nexedi.com/kirr/git-backup > > NOTE > > Jeff King suggested that it might be not generally a good idea to > use bitmap reachability index when repacking a repository. The reason > here is for on-disk repack by default we want > > - to produce good pack (with bitmap index not-yet-packed objects are > emitted to pack in suboptimal order). > > - to use more robust pack-generation codepath (avoiding possible > bugs in bitmap code and possible bitmap index corruption). > > Jeff also suggests that pack.useBitmaps was probably a mistake to > introduce originally. 
This way we are not adding another config point, > but instead just always default to-file pack-objects not to use bitmap > index: Tools which need to generate on-disk packs with using bitmap, can > pass --use-bitmap-index explicitly. > > More context: > > http://article.gmane.org/gmane.comp.version-control.git/299063 > http://article.gmane.org/gmane.comp.version-control.git/299107 > http://article.gmane.org/gmane.comp.version-control.git/299420 > > Cc: Vicent Marti <tanoku@gmail.com> > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Kirill Smelkov <kirr@nexedi.com> > --- > Documentation/config.txt | 3 +++ > builtin/pack-objects.c | 28 ++++++++++++++++++++++++---- > t/t5310-pack-bitmaps.sh | 14 ++++++++++++++ > 3 files changed, 41 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index db05dec..4b14806 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2234,6 +2234,9 @@ pack.useBitmaps:: > to stdout (e.g., during the server side of a fetch). Defaults to > true. You should not generally need to turn this off unless > you are debugging pack bitmaps. > ++ > +*NOTE*: when packing to file (e.g., on repack) the default is always not to use > + pack bitmaps. > > pack.writeBitmaps (deprecated):: > This is a deprecated synonym for `repack.writeBitmaps`. 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index a2f8cfd..ffe8da6 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; > static uint32_t reuse_packfile_objects; > static off_t reuse_packfile_offset; > > -static int use_bitmap_index = 1; > +static int use_bitmap_index_default = 1; > +static int use_bitmap_index = -1; > static int write_bitmap_index; > static uint16_t write_bitmap_options; > > @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > { > uint32_t index_pos; > > + if (local && has_loose_object_nonlocal(sha1)) > + return 0; > + > if (have_duplicate_entry(sha1, 0, &index_pos)) > return 0; > > @@ -2225,7 +2229,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) > write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; > } > if (!strcmp(k, "pack.usebitmaps")) { > - use_bitmap_index = git_config_bool(k, v); > + use_bitmap_index_default = git_config_bool(k, v); > return 0; > } > if (!strcmp(k, "pack.threads")) { > @@ -2488,7 +2492,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) > if (prepare_bitmap_walk(revs) < 0) > return -1; > > - if (pack_options_allow_reuse() && > + if (pack_options_allow_reuse() && pack_to_stdout && > !reuse_partial_packfile_from_bitmap( > &reuse_packfile, > &reuse_packfile_objects, > @@ -2773,7 +2777,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) > + /* > + * "soft" reasons not to use bitmaps - for on-disk repack by default we want > + * > + * - to produce good pack (with bitmap index not-yet-packed objects are > + * packed in suboptimal order). 
> + * > + * - to use more robust pack-generation codepath (avoiding possible > + * bugs in bitmap code and possible bitmap index corruption). > + */ > + if (!pack_to_stdout) > + use_bitmap_index_default = 0; > + > + if (use_bitmap_index < 0) > + use_bitmap_index = use_bitmap_index_default; > + > + /* "hard" reasons not to use bitmaps; these just won't work at all */ > + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) > use_bitmap_index = 0; > > if (pack_to_stdout || !rev_list_all) > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > index 3893afd..9fab2bb 100755 > --- a/t/t5310-pack-bitmaps.sh > +++ b/t/t5310-pack-bitmaps.sh > @@ -118,6 +118,20 @@ test_expect_success 'incremental repack can disable bitmaps' ' > git repack -d --no-write-bitmap-index > ' > > +test_expect_success 'pack-objects to file can use bitmap' ' > + # make sure we still have 1 bitmap index from previous tests > + ls .git/objects/pack/ | grep bitmap >output && > + test_line_count = 1 output && > + # verify equivalent packs are generated with/without using bitmap index > + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && > + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && > + git verify-pack -v packa-$packasha1.pack >packa.verify && > + git verify-pack -v packb-$packbsha1.pack >packb.verify && > + grep -o "^$_x40" packa.verify |sort >packa.objects && > + grep -o "^$_x40" packb.verify |sort >packb.objects && > + test_cmp packa.objects packb.objects > +' > + > test_expect_success 'full repack, reusing previous bitmaps' ' > git repack -ad && > ls .git/objects/pack/ | grep bitmap >output && > -- > 2.9.0.431.g3cb5c84 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-17 17:06 ` Kirill Smelkov
@ 2016-07-19 11:29 ` Jeff King
  2016-07-19 12:14 ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2016-07-19 11:29 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti

On Sun, Jul 17, 2016 at 08:06:49PM +0300, Kirill Smelkov wrote:

> > Anyway, please find below updated patch according to your suggestion.
> > Hope it is ok now.
>
> Ping. Is the patch ok or something needs to be improved still?

Sorry, I'm traveling and haven't carefully reviewed it yet. It's still
on my list, but it may be a few days.

-Peff

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-19 11:29 ` Jeff King
@ 2016-07-19 12:14 ` Kirill Smelkov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-07-19 12:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti

On Tue, Jul 19, 2016 at 05:29:07AM -0600, Jeff King wrote:
> On Sun, Jul 17, 2016 at 08:06:49PM +0300, Kirill Smelkov wrote:
>
> > > Anyway, please find below updated patch according to your suggestion.
> > > Hope it is ok now.
> >
> > Ping. Is the patch ok or something needs to be improved still?
>
> Sorry, I'm traveling and haven't carefully reviewed it yet. It's still
> on my list, but it may be a few days.

Jeff, thanks for the feedback. Have a good trip; it is good to know the
patch has not been forgotten. I will wait while you are traveling.

Thanks again for the feedback,
Kirill

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-13 10:52 ` Kirill Smelkov 2016-07-17 17:06 ` Kirill Smelkov @ 2016-07-25 18:40 ` Jeff King 2016-07-25 18:53 ` Jeff King 2016-07-27 20:15 ` Kirill Smelkov 1 sibling, 2 replies; 62+ messages in thread From: Jeff King @ 2016-07-25 18:40 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Wed, Jul 13, 2016 at 01:52:17PM +0300, Kirill Smelkov wrote: > > So I think if you were to repeatedly "git repack -adb" over time, you > > would get worse and worse ordering as objects are added to the > > repository. > > Jeff, first of all thanks for clarifying. > > So it is not-yet-packed-objects which make packing with bitmap less > efficient. I was originally keeping in mind fresh repacked repository > with just built bitmap index and for that case extracting pack with > bitmap index seems to be just ok, but the more not-yet-packed objects we > have the worse the result can be. Right. So I think your scheme is fine as long as you are doing your regular "pack all into one" repacks with a real walk, and then "branching" off of that with one-off bitmap-computed packs into files (even if you later take a bunch of those files and pull them into a single bitmapped, as long as that final "all into one" does the walk). Or I guess another way to think about it would be that if you're computing bitmaps, you'd want to do the actual traversal. > Yes, it also make sense. I saw write_reused_pack() in upstream git just > copy raw bytes from original to destination pack. You mentioned you have > something better for pack reuse - in your patch queue, in two words, is > it now reusing pack based on object, not raw bytes, or is it something > else? > > In other words in which way it works better? 
(I'm just curious here as > it is interesting to know) The problem with the existing pack-reuse code is that it doesn't kick in often enough. I think it looks to see that the client wants some percentage of the pack (e.g., 90%), and then just sends the whole beginning. This works especially badly if you have a bunch of related repositories packed together (e.g., all of the forks of torvalds/linux on GitHub), because you'll never hit 90% of that big pack; it has too much unrelated cruft, even if most of the stuff you want _is_ at the beginning. And "percent of pack" is not really a useful metric anyway. So the better scheme is more like: 1. Generate the bitmap of objects to send using reachability bitmaps. 2. Do a quick scan of their content in the packfile to see which can be reused verbatim. If they're base objects, we can send them as-is. If they're deltas, we can send them if their base is going to be sent. This fills in another bitmap of "reusable" objects. After a long string of unusable objects, you can give up and set the rest of the bitmap to zeroes. 3. Walk the "reuse" bitmap and send out the objects more-or-less verbatim. You do have to make adjustments to delta-base-offsets for any "holes" (so if an object's entry says "my base is 500 bytes back", but you omitted some objects in between, you have to adjust that offset). The upside is that you can send out those objects without even making a "struct object_entry" for them, which drastically reduces the memory requirements for serving a clone. Any objects which didn't get marked for reuse just get handled in the usual way (so stuff that was not close by in the pack, or stuff that was pushed since your last big repack). The downside is that because those objects aren't in our normal packing list, they're not available as delta bases for the new objects we _do_ send. So it can make the resulting pack a little bit bigger. > Yes, for packing it is only hash which is used.
And I assume name-hash > for bitmap is not enabled by default for compatibility with JGit code. > > It would make sense to me to eventually enable name-hash bitmap > extension by default, as packing result is much better with it. And > those who care about compatibility with JGit can just turn it off in > their git config. Correct, the defaults are for JGit compatibility. If you are not using JGit, you should have it on all the time. We went with the conservative default, but as more people use regular Git bitmaps, it would probably be good to make them less arcane and confusing to use. > > As I understand your use case, it is OK to do the less careful things. > > It's just that pack-objects until now has been split into two modes: > > packing to a file is careful, and packing to stdout is less so. And you > > want to pack to a file in the non-careful mode. > > Yes, it should be ok, as after repository extraction git-backup > verifies rev-list for all refs > > https://lab.nexedi.com/kirr/git-backup/blob/7fcb8c67/git-backup.go#L855 > > And if an object is missing - e.g. a blob - rev-list complains: > > fatal: missing blob object '980a0d5f19a64b4b30a87d4206aade58726b60e3' > > though it does not catch blob corruptions. Right, that makes sense. Even the pack-to-disk code invoked by git-repack is not foolproof for blob corruptions. It is only checking a crc, not the full sha1. So it's better than nothing, but not as careful as a full index-pack. > Probably pack.useBitmaps is of no use in normal situation, but for > debugging problems related to bitmaps it can be handy. Though when > someone debugs he/she can just adjust pack-objects.c . So should we > deprecate and eventually remove pack.useBitmaps ? In my opinion, yes. If we had any debugging option, it should be something like "core.usebitmaps", to tell _all_ of git to pretend that bitmaps don't exist (right now only pack-objects respects it, but we could be using them to optimize more traversals).
> ---- 8< ---- > From: Kirill Smelkov <kirr@nexedi.com> > Subject: [PATCH v3] pack-objects: Teach it to use reachability bitmap index when > generating non-stdout pack too > > Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) > if a repository has bitmap index, pack-objects can nicely speedup > "Counting objects" graph traversal phase. That however was done only for > case when resultant pack is sent to stdout, not written into a file. I think we can give more motivation and context here with some of the bits that have come out in our discussion. Like: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. > We can teach pack-objects to use bitmap index for initial object > counting phase when generating resultant pack file too: > > - if we know bitmap index generation is not enabled for resultant pack: > > Current code has singleton bitmap_git so cannot work simultaneously > with two bitmap indices. 
So one reason is that it is not currently possible with the implementation. But I think it also gets to the above bit about "optimal" packs. We do not want to generate bitmaps off of bitmaps, because we lose information about the write order. That's probably worth mentioning here. > - if we keep pack reuse enabled still only for "send-to-stdout" case: > > Because on pack reuse raw entries are directly written out to destination > pack by write_reused_pack() bypassing needed for pack index generation > bookkeeping done by regular codepath in write_one() and friends. > > (at least that's my understanding after briefly looking at the code) Yes, that's right. We definitely want pack-reuse off for this case. > NOTE > > Jeff King suggested that it might be not generally a good idea to > use bitmap reachability index when repacking a repository. The reason > here is for on-disk repack by default we want > > - to produce good pack (with bitmap index not-yet-packed objects are > emitted to pack in suboptimal order). > > - to use more robust pack-generation codepath (avoiding possible > bugs in bitmap code and possible bitmap index corruption). Ah, this kind of covers the bits I talked about above. I think it makes more sense to introduce them as part of the motivation, though, rather than as a note here. > Jeff also suggests that pack.useBitmaps was probably a mistake to > introduce originally. This way we are not adding another config point, > but instead just always default to-file pack-objects not to use bitmap > index: Tools which need to generate on-disk packs with using bitmap, can > pass --use-bitmap-index explicitly. This part is important, though. Basically the reason we respect the command-line option is that we know that git-repack would never set it explicitly, so it is the hint that pack-objects can use to know which case we are serving: a careful repack of our data, or just extraction of some objects. 
> @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > { > uint32_t index_pos; > > + if (local && has_loose_object_nonlocal(sha1)) > + return 0; > + > if (have_duplicate_entry(sha1, 0, &index_pos)) > return 0; Hrm. Adding entries from the bitmap should ideally be very fast, but here we're introducing extra lookups in the object database. I guess it only kicks in when --local is given, though, which most bitmap-using paths would not do. But is this check enough? The non-bitmap code path calls want_object_in_pack, which checks not only loose objects, but also non-local packs, and .keep. Those don't kick in for your use case. I wonder if we should simply have something like: if (local || ignore_packed_keep) use_bitmap_index = 0; and just skip bitmaps for those cases. That's easy to reason about, and I don't think anybody would care (your use case does not, and the repack use case is already not going to use bitmaps). > @@ -2773,7 +2777,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) > + /* > + * "soft" reasons not to use bitmaps - for on-disk repack by default we want > + * > + * - to produce good pack (with bitmap index not-yet-packed objects are > + * packed in suboptimal order). > + * > + * - to use more robust pack-generation codepath (avoiding possible > + * bugs in bitmap code and possible bitmap index corruption). 
> + */ > + if (!pack_to_stdout) > + use_bitmap_index_default = 0; > + > + if (use_bitmap_index < 0) > + use_bitmap_index = use_bitmap_index_default; > + > + /* "hard" reasons not to use bitmaps; these just won't work at all */ > + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) > use_bitmap_index = 0; And that local/keep logic above would just become "hard" reasons included here. -Peff ^ permalink raw reply [flat|nested] 62+ messages in thread
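Step 3 of the reuse scheme described in the message above (adjusting delta-base offsets around "holes") can be illustrated with a minimal sketch in C. This is not git's code: `struct entry`, `plan_reuse()` and `adjusted_base_distance()` are hypothetical names made up for this example. The sketch assumes that every reused delta has a reused base that precedes it in the pack (which step 2 is meant to guarantee), and it ignores that re-encoding a smaller distance may change the size of the entry header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one object entry in the source pack. */
struct entry {
	uint64_t offset; /* byte offset of the entry in the original pack */
	uint64_t size;   /* number of bytes the entry occupies there */
	int reused;      /* bit taken from the "reuse" bitmap */
	long base;       /* array index of the delta base, or -1 for a base object */
};

/*
 * Walk the entries in pack order and assign each reused entry its offset
 * in the output (relative to the first written object). Entries that are
 * not reused are the "holes" and contribute no bytes.
 * Returns the total number of bytes that would be written.
 */
static uint64_t plan_reuse(const struct entry *e, size_t n, uint64_t *new_offset)
{
	uint64_t out = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!e[i].reused)
			continue; /* hole: skipped, so later entries move closer */
		new_offset[i] = out;
		out += e[i].size;
	}
	return out;
}

/*
 * "My base is N bytes back" for entry i in the output pack.
 * Returns 0 for non-delta entries.
 */
static uint64_t adjusted_base_distance(const struct entry *e,
				       const uint64_t *new_offset, size_t i)
{
	if (e[i].base < 0)
		return 0;
	return new_offset[i] - new_offset[e[i].base];
}
```

With a 100-byte base at offset 0, an omitted 400-byte object at offset 100, and a delta at offset 500 whose base is the first object, the delta originally says "my base is 500 bytes back"; after the hole is dropped the sketch yields a distance of 100.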
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-25 18:40 ` Jeff King @ 2016-07-25 18:53 ` Jeff King 2016-07-27 20:15 ` Kirill Smelkov 1 sibling, 0 replies; 62+ messages in thread From: Jeff King @ 2016-07-25 18:53 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Mon, Jul 25, 2016 at 02:40:25PM -0400, Jeff King wrote: > > @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > { > > uint32_t index_pos; > > > > + if (local && has_loose_object_nonlocal(sha1)) > > + return 0; > > + > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > return 0; > > Hrm. Adding entries from the bitmap should ideally be very fast, but > here we're introducing extra lookups in the object database. I guess it > only kicks in when --local is given, though, which most bitmap-using > paths would not do. > > But is this check enough? The non-bitmap code path calls > want_object_in_pack, which checks not only loose objects, but also > non-local packs, and .keep. > > Those don't kick in for your use case. I wonder if we should simply have > something like: > > if (local || ignore_packed_keep) > use_bitmap_index = 0; > > and just skip bitmaps for those cases. That's easy to reason about, and > I don't think anybody would care (your use case does not, and the repack > use case is already not going to use bitmaps). BTW, I thought we had more optimizations in this area, but I realized that I had never sent them to the list. I just did, and you may want to take a peek at: http://thread.gmane.org/gmane.comp.version-control.git/300218 I doubt it will speed up your case much (unless you really do have tons of packs in your extraction). And I think it is still worth doing the disabling I showed above, even with the optimizations, just because it's easier to reason about.
So I _think_ those optimizations are orthogonal to what we're discussing here, but I wanted to point you at them just in case. -Peff ^ permalink raw reply [flat|nested] 62+ messages in thread
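The decision logic discussed in the two messages above (a tri-state --use-bitmap-index, a "soft" default for on-disk packs, and "hard" reasons that always win, including --local and --honor-pack-keep) can be summarized in a small C sketch. It is only an illustration under stated assumptions: git keeps these flags as file-scope variables in builtin/pack-objects.c, whereas `struct pack_opts`, its `repository_is_shallow` field and `resolve_use_bitmap_index()` are hypothetical and exist only for this example.

```c
/* Hypothetical bundle of the flags named in the thread. */
struct pack_opts {
	int use_bitmap_index;         /* -1 = not given on the command line, 0 = off, 1 = on */
	int use_bitmap_index_default; /* value to fall back to when not given */
	int pack_to_stdout;
	int write_bitmap_index;
	int use_internal_rev_list;
	int repository_is_shallow;    /* stands in for is_repository_shallow() */
	int local;                    /* --local */
	int ignore_packed_keep;       /* --honor-pack-keep */
};

/* Returns 1 if the bitmap index should be used for the traversal, else 0. */
static int resolve_use_bitmap_index(const struct pack_opts *o)
{
	int use = o->use_bitmap_index;
	int dflt = o->use_bitmap_index_default;

	/*
	 * "soft" reason: an on-disk pack defaults to the careful,
	 * non-bitmap walk, but an explicit request can override it.
	 */
	if (!o->pack_to_stdout)
		dflt = 0;
	if (use < 0)
		use = dflt;

	/* "hard" reasons: bitmaps cannot be used correctly at all here. */
	if (!o->use_internal_rev_list ||
	    (!o->pack_to_stdout && o->write_bitmap_index) ||
	    o->repository_is_shallow)
		use = 0;

	/* --local / --honor-pack-keep are not handled by the bitmap path. */
	if (o->local || o->ignore_packed_keep)
		use = 0;

	return use;
}
```

The point of the tri-state is that only a caller which explicitly asks for bitmaps gets them for an on-disk pack, so a tool that never passes the option keeps the careful behaviour.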
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-25 18:40 ` Jeff King 2016-07-25 18:53 ` Jeff King @ 2016-07-27 20:15 ` Kirill Smelkov 2016-07-27 20:40 ` Junio C Hamano 1 sibling, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-07-27 20:15 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti On Mon, Jul 25, 2016 at 02:40:25PM -0400, Jeff King wrote: > On Wed, Jul 13, 2016 at 01:52:17PM +0300, Kirill Smelkov wrote: > > > > So I think if you were to repeatedly "git repack -adb" over time, you > > > would get worse and worse ordering as objects are added to the > > > repository. > > > > Jeff, first of all thanks for clarifying. > > > > So it is not-yet-packed-objects which make packing with bitmap less > > efficient. I was originally keeping in mind fresh repacked repository > > with just built bitmap index and for that case extracting pack with > > bitmap index seems to be just ok, but the more not-yet-packed objects we > > have the worse the result can be. > > Right. So I think your scheme is fine as long as you are doing your > regular "pack all into one" repacks with a real walk, and then > "branching" off of that with one-off bitmap-computed packs into files > (even if you later take a bunch of those files and pull them into a > single bitmapped, as long as that final "all into one" does the walk). > > Or I guess another way to think about it would be that if you're > computing bitmaps, you'd want to do the actual traversal. Yes, exactly, and thanks for stating it clearly. We are doing repacks and recomputing bitmaps doing the real walk. As you say this should be fine. > > Yes, it also make sense. I saw write_reused_pack() in upstream git just > > copy raw bytes from original to destination pack. 
You mentioned you have > > something better for pack reuse - in your patch queue, in two words, is > > it now reusing pack based on object, not raw bytes, or is it something > > else? > > > > In other words in which way it works better? (I'm just curious here as > > it is interesting to know) > > The problem with the existing pack-reuse code is that it doesn't kick in > often enough. I think it looks to see that the client wants some > percentage of the pack (e.g., 90%), and then just sends the whole > beginning. This works especially badly if you have a bunch of related > repositories packed together (e.g., all of the forks of torvalds/linux > on GitHub), because you'll never hit 90% of that big pack; it has too > much unrelated cruft, even if most of the stuff you want _is_ at the > beginning. And "percent of pack" is not really a useful metric anyway. > > So the better scheme is more like: > > 1. Generate the bitmap of objects to send using reachability bitmaps. > > 2. Do a quick scan of their content in the packfile to see which can > be reused verbatim. If they're base objects, we can send them > as-is. If they're deltas, we can send them if their base is going > to be sent. This fills in another bitmap of "reusable" objects. > > After a long string of unusable objects, you can give up and set > the rest of the bitmap to zeroes. > > 3. Walk the "reuse" bitmap and send out the objects more-or-less > verbatim. You do have make adjustments to delta-base-offsets for > any "holes" (so if an object's entry says "my base is 500 bytes > back", but you omitted some objects in between, you have to adjust > that offset). > > The upside is that you can send out those objects without even making a > "struct object_entry" for them, which drastically reduces the memory > requirements for serving a clone. 
Any objects which didn't get marked > for reuse just get handled in the usual way (so stuff that was not close > by in the pack, or stuff that was pushed since your last big repack). Thanks for clarifying. Yes, you are right, current upstream code checks whether >= 90% of the pack is what the destination wants and only reuses the pack in that case. (I forgot about it, having initially set reuse aside in my head as "not applicable to git-backup" because of that >= 90% reason). So with the scheme you describe above it can indeed be more efficient, and applicable to both the torvalds/linux+forks and git-backup cases (extracting packs from one big pack of all repos). I'm looking forward to your patches on this topic. Please cc me on those if you find it convenient. > The downside is that because those objects aren't in our normal packing > list, they're not available as delta bases for the new objects we _do_ > send. So it can make the resulting pack a little bit bigger. So once again, the badness grows with the number of such "new" objects that are not in the original main pack - i.e. loose objects or objects living in other, smaller packs. The badness drops to zero in the ideal case of a freshly repacked repo with only one big pack. Also: with more code effort, after sending reused objects we could in principle make them available as delta bases for new objects. I mean this should not be impossible in principle, or am I missing something? > > Yes, for packing it is only hash which is used. And I assume name-hash > > for bitmap is not enabled by default for compatibility with JGit code. > > > > It would make sense to me to eventually enable name-hash bitmap > > extension by default, as packing result is much better with it. And > > those who care about compatibility with JGit can just turn it off in > > their git config. > > Correct, the defaults are for JGit compatibility. If you are not using > JGit, you should have it on all the time.
We went with the conservative > default, but as more people using regular Git bitmaps, it would probably > be good to make them less arcane and confusing to use. I've just checked - JGit 3.7.1.201504261725-r (the version from Debian - quite old) does _not_ barf on seeing bitmaps with "name hash" section. I mean at least it does not error-exit on `jgit gc`, like t5310-pack-bitmaps.sh says it can: # jgit gc will barf if it does not like our bitmaps jgit gc I will be sending another mail with relevant JGit people cc'ed to turn pack.writeBitmapHashCache=true by default. > > ---- 8< ---- > > From: Kirill Smelkov <kirr@nexedi.com> > > Subject: [PATCH v3] pack-objects: Teach it to use reachability bitmap index when > > generating non-stdout pack too > > > > Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) > > if a repository has bitmap index, pack-objects can nicely speedup > > "Counting objects" graph traversal phase. That however was done only for > > case when resultant pack is sent to stdout, not written into a file. > > I think we can give more motivation and context here with some of the > bits that have come out in our discussion. Like: > > The reason for this split is that pack-objects tries to determine how > "careful" it should be based on whether we are packing to disk or to > stdout. Packing to disk implies "git repack", and that we will likely > delete the old packs after finishing. We want to be more careful (so > as not to carry forward a corruption, and to generate a more optimal > pack), and we presumably run less frequently and can afford extra CPU. > Whereas packing to stdout implies serving a remote via "git fetch" or > "git push". This happens more frequently (e.g., a server handling many > fetching clients), and we assume the receiving end takes more > responsibility for verifying the data. > > But this isn't always the case. One might want to generate on-disk > packfiles for a specialized object transfer. 
Just using "--stdout" and > writing to a file is not optimal, as it will not generate the matching > pack index. > > So it would be useful to have some way of overriding this heuristic: > to tell pack-objects that even though it should generate on-disk > files, it is still OK to use the reachability bitmaps to do the > traversal. Thanks, I'm adding this to the patch message in appropriate place. > > We can teach pack-objects to use bitmap index for initial object > > counting phase when generating resultant pack file too: > > > > - if we know bitmap index generation is not enabled for resultant pack: > > > > Current code has singleton bitmap_git so cannot work simultaneously > > with two bitmap indices. > > So one reason is that it is not currently possible with the > implementation. But I think it also gets to the above bit about > "optimal" packs. We do not want to generate bitmaps off of bitmaps, > because we lose information about the write order. That's probably worth > mentioning here. Ok. I'm adding relevant note. > > - if we keep pack reuse enabled still only for "send-to-stdout" case: > > > > Because on pack reuse raw entries are directly written out to destination > > pack by write_reused_pack() bypassing needed for pack index generation > > bookkeeping done by regular codepath in write_one() and friends. > > > > (at least that's my understanding after briefly looking at the code) > > Yes, that's right. We definitely want pack-reuse off for this case. Ok, thanks for clarifying. > > NOTE > > > > Jeff King suggested that it might be not generally a good idea to > > use bitmap reachability index when repacking a repository. The reason > > here is for on-disk repack by default we want > > > > - to produce good pack (with bitmap index not-yet-packed objects are > > emitted to pack in suboptimal order). > > > > - to use more robust pack-generation codepath (avoiding possible > > bugs in bitmap code and possible bitmap index corruption). 
> > Ah, this kind of covers the bits I talked about above. I think it makes > more sense to introduce them as part of the motivation, though, rather > than as a note here. Thanks, good idea (after we discussed the robustness issues and start to take them into account). I'm moving this close to head of the description. > > Jeff also suggests that pack.useBitmaps was probably a mistake to > > introduce originally. This way we are not adding another config point, > > but instead just always default to-file pack-objects not to use bitmap > > index: Tools which need to generate on-disk packs with using bitmap, can > > pass --use-bitmap-index explicitly. > > This part is important, though. Basically the reason we respect the > command-line option is that we know that git-repack would never set it > explicitly, so it is the hint that pack-objects can use to know which > case we are serving: a careful repack of our data, or just extraction of > some objects. Yes. To make this very clear I'm also adding explicit note git-repack never passes --use-bitmap-index to pack-objects, so this way we can be sure regular on-disk repacking remains robust. > > @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > { > > uint32_t index_pos; > > > > + if (local && has_loose_object_nonlocal(sha1)) > > + return 0; > > + > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > return 0; > > Hrm. Adding entries from the bitmap should ideally be very fast, but > here we're introducing extra lookups in the object database. I guess it > only kicks in when --local is given, though, which most bitmap-using > paths would not do. > > But is this check enough? The non-bitmap code path calls > want_object_in_pack, which checks not only loose objects, but also > non-local packs, and .keep. > > Those don't kick in for your use case. 
I wonder if we should simply have > something like: > > if (local || ignore_packed_keep) > use_bitmap_index = 0; > > and just skip bitmaps for those cases. That's easy to reason about, and > I don't think anybody would care (your use case does not, and the repack > use case is already not going to use bitmaps). You are right - this is not enough. Initially I did not delve into this --local case and only cared about making the tests pass (which were failing without this check when the initial patch was using --use-bitmap-index by default). I agree it is simpler to just not handle this case for now. Actually, after thinking about it a bit more, I can see that even the current code allows `git pack-objects --stdout` with --local or --honor-pack-keep and does not handle those options properly. Thus I suggest applying the following patch as the first one in what is now a series: ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH 1/2] pack-objects: Make sure use_bitmap_index is not active under --local or --honor-pack-keep Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), unlike its non-bitmapped counterpart add_object_entry(), does not check in any way whether --local or --honor-pack-keep should be respected. In the non-bitmapped codepath this is handled in want_object_in_pack(), but the bitmapped codepath simply has no such check at all. The bitmapped codepath however allowed --local and --honor-pack-keep to be passed, and bitmap indices were still used under such conditions - potentially giving wrong output (including objects from non-local or .keep'ed packs).
Instead of fixing bitmapped codepath to respect those options, since currently no one actually need or use them in combination with bitmaps, let's just force use_bitmap_index=0 when any of --local or --honor-pack-keep are used and add appropriate comment about not-checking for those in add_object_entry_from_bitmap() Suggested-by: Jeff King <peff@peff.net> --- builtin/pack-objects.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 15866d7..d7cf782 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1055,6 +1055,12 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + /* + * for simplicity we always want object to be in pack, as + * use_bitmap_index codepath assumes neither --local nor --honor-pack-keep + * is active. + */ + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); @@ -2776,6 +2782,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) use_bitmap_index = 0; + /* + * "lazy" reasons not to use bitmaps; it is easier to reason about when + * neither --local nor --honor-pack-keep is in action, and so far no one + * needed nor implemented such support yet. + */ + if (local || ignore_packed_keep) + use_bitmap_index = 0; + + if (pack_to_stdout || !rev_list_all) write_bitmap_index = 0; -- 2.9.0.431.g3cb5c84 ---- 8< ---- On Mon, Jul 25, 2016 at 02:53:13PM -0400, Jeff King wrote: > On Mon, Jul 25, 2016 at 02:40:25PM -0400, Jeff King wrote: > > > > @@ -1052,6 +1053,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > > { > > > uint32_t index_pos; > > > > > > + if (local && has_loose_object_nonlocal(sha1)) > > > + return 0; > > > + > > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > > return 0; > > > > Hrm. 
Adding entries from the bitmap should ideally be very fast, but > > here we're introducing extra lookups in the object database. I guess it > > only kicks in when --local is given, though, which most bitmap-using > > paths would not do. > > > > But is this check enough? The non-bitmap code path calls > > want_object_in_pack, which checks not only loose objects, but also > > non-local packs, and .keep. > > > > Those don't kick in for your use case. I wonder if we should simply have > > something like: > > > > if (local || ignore_packed_keep) > > use_bitmap_index = 0; > > > > and just skip bitmaps for those cases. That's easy to reason about, and > > I don't think anybody would care (your use case does not, and the repack > > use case is already not going to use bitmaps). > > BTW, I thought we had more optimizations in this area, but I realized > that I had never sent them to the list. I just did, and you may want to > take a peek at: > > http://thread.gmane.org/gmane.comp.version-control.git/300218 > > I doubt it will speed up your case much (unless you really do have tons > of packs in your extraction). > And I think it is still worth doing disabling I showed above, even > with the optimizations, just because it's easier to reason about. > > So I _think_ those optimizations are orthogonal to what we're discussing > here, but I wanted to point you at them just in case. Thanks for the heads-up and for sending it. Yes, for git-backup we usually restore from a freshly repacked repo, but the optimization is useful in many other cases. After reading the patches I wonder why the current state lasted for such a long time. I frequently have close to 50 packs in a repository, with only automatic gc triggering a full repack, and for that case always looping through all 50 packs for every object, even when we have already found the pack an object lives in, is just a waste of time.
And yes, on client side I almost never use alternate objects store and almost never do concurrent fetches etc (so if I understand correctly, no .keep files). Thanks for sending it. ( Btw, if we are talking about optimizations, here is something related to pack extractions, I think it is worth mentioning just in case: https://lab.nexedi.com/kirr/git-backup/blob/ad6c6853/NOTES.restore It is a scheme how to compute "non-overlapping" set of packs when restoring repositories from big backup repo, so both disk size (same objects in many packs) and time (computing packs with many same objects many times) are not wasted. Then shared between repositories packs are just hardlinked to appropriate places. It is in line with e.g. https://git.kernel.org/cgit/git/git.git/commit/tree-diff.c?id=72441af7 because it is algorithmical optimization, only for now I do not have working code implementing it yet. ) anyway updated main patch goes below: (whole-patch interdiff) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 1ef85a6..f8b173d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1053,12 +1053,15 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, { uint32_t index_pos; - if (local && has_loose_object_nonlocal(sha1)) - return 0; - if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + /* + * for simplicity we always want object to be in pack, as + * use_bitmap_index path assumes neither --local nor --honor-pack-keep + * is active. 
+ */ + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); @@ -2796,6 +2799,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; + /* + * "lazy" reasons not to use bitmaps; it is easier to reason about when + * neither --local nor --honor-pack-keep is in action, and so far no one + * needed nor implemented such support yet. + */ + if (local || ignore_packed_keep) + use_bitmap_index = 0; + + if (pack_to_stdout || !rev_list_all) write_bitmap_index = 0; (log interdiff) @@ -5,29 +5,69 @@ if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. -We can teach pack-objects to use bitmap index for initial object +The reason here is for on-disk repack by default we want: + +- to produce good pack (with bitmap index not-yet-packed objects are + emitted to pack in suboptimal order). + +- to use more robust pack-generation codepath (avoiding possible + bugs in bitmap code and possible bitmap index corruption). + +Jeff King further explains: + + The reason for this split is that pack-objects tries to determine how + "careful" it should be based on whether we are packing to disk or to + stdout. Packing to disk implies "git repack", and that we will likely + delete the old packs after finishing. We want to be more careful (so + as not to carry forward a corruption, and to generate a more optimal + pack), and we presumably run less frequently and can afford extra CPU. + Whereas packing to stdout implies serving a remote via "git fetch" or + "git push". This happens more frequently (e.g., a server handling many + fetching clients), and we assume the receiving end takes more + responsibility for verifying the data.
+ + But this isn't always the case. One might want to generate on-disk + packfiles for a specialized object transfer. Just using "--stdout" and + writing to a file is not optimal, as it will not generate the matching + pack index. + + So it would be useful to have some way of overriding this heuristic: + to tell pack-objects that even though it should generate on-disk + files, it is still OK to use the reachability bitmaps to do the + traversal. + +So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: +- if we care it is not activated under git-repack: + + See above about repack robustness and not forward-carrying corruption. + - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. + We also want to avoid (at least with current implementation) + generating bitmaps off of bitmaps. The reason here is: when generating + a pack, not-yet-packed objects will be emitted into pack in + suboptimal order and added to tail of the bitmap as "extended entries". + When the resultant pack + some new objects in associated repository + are in turn used to generate another pack with bitmap, the situation + repeats: new objects are again not emitted optimally and just added to + bitmap tail - not in recency order. + + So the pack badness can grow over time when at each step we have + bitmapped pack + some other objects. That's why we want to avoid + generating bitmaps off of bitmaps, not to let pack badness grow. + - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. 
- (at least that's my understanding after briefly looking at the code) - -We also need to care and teach add_object_entry_from_bitmap() to respect ---local via not adding nonlocal loose object to resultant pack (this -is bitmap-codepath counterpart of daae0625 (pack-objects: extend --local -to mean ignore non-local loose objects too) -- not to break 'loose -objects in alternate ODB are not repacked' in t7700-repack.sh . - -Otherwise all git tests pass, and for pack-objects -> file we get nice +This way all git tests pass, and for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup @@ -55,27 +95,62 @@ initial motivation for the patch. NOTE -Jeff King suggested that it might be not generally a good idea to -use bitmap reachability index when repacking a repository. The reason -here is for on-disk repack by default we want - -- to produce good pack (with bitmap index not-yet-packed objects are - emitted to pack in suboptimal order). - -- to use more robust pack-generation codepath (avoiding possible - bugs in bitmap code and possible bitmap index corruption). - Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can -pass --use-bitmap-index explicitly. +pass --use-bitmap-index explicitly. And git-repack does never pass +--use-bitmap-index, so this way we can be sure regular on-disk repacking +remains robust. + +NOTE2 + +`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower +than `git pack-objects file.pack`. 
Extracting erp5.git pack from +lab.nexedi.com backup repository: + +---- 8< ---- +$ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack + +real 0m22.309s +user 0m21.148s +sys 0m0.932s + +$ time git index-pack erp5pack-stdout.pack + +real 0m50.873s <-- more than 2 times slower than time to generate pack itself! +user 0m49.300s +sys 0m1.360s +---- 8< ---- + +So the time for + + `pack-objects --stdout >file.pack` + `index-pack file.pack` is 72s, + +while + + `pack-objects file.pack` which does both pack and index is 27s. + +And even + + `pack-objects --no-use-bitmap-index file.pack` is 37s. + +Jeff explains: + + The packfile does not carry the sha1 of the objects. A receiving + index-pack has to compute them itself, including inflating and applying + all of the deltas. + +that's why for `git-backup restore` we want to teach `git pack-objects +file.pack` to use bitmaps instead of using `git pack-objects --stdout +>file.pack` + `git index-pack file.pack`. More context: http://article.gmane.org/gmane.comp.version-control.git/299063 http://article.gmane.org/gmane.comp.version-control.git/299107 http://article.gmane.org/gmane.comp.version-control.git/299420 + http://article.gmane.org/gmane.comp.version-control.git/300217 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> (patch itself) ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order).
- to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we care it is not activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries".
When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. This way all git tests pass, and for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. 
Extracting erp5.git pack from lab.nexedi.com backup repository: ---- 8< ---- $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s ---- 8< ---- So the time for `pack-objects --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. More context: http://article.gmane.org/gmane.comp.version-control.git/299063 http://article.gmane.org/gmane.comp.version-control.git/299107 http://article.gmane.org/gmane.comp.version-control.git/299420 http://article.gmane.org/gmane.comp.version-control.git/300217 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- Documentation/config.txt | 3 +++ builtin/pack-objects.c | 25 +++++++++++++++++++++---- t/t5310-pack-bitmaps.sh | 14 ++++++++++++++ 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index b0ed71f..39ab41d 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2244,6 +2244,9 @@ pack.useBitmaps:: to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps.
pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d7cf782..f8b173d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE; @@ -2231,7 +2232,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2494,7 +2495,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (prepare_bitmap_walk(revs) < 0) return -1; - if (pack_options_allow_reuse() && + if (pack_options_allow_reuse() && pack_to_stdout && !reuse_partial_packfile_from_bitmap( &reuse_packfile, &reuse_packfile_objects, @@ -2779,7 +2780,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; /* diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 0d03583..0802b7c 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -117,6 +117,20 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + git verify-pack -v packa-$packasha1.pack >packa.verify && + git verify-pack -v packb-$packbsha1.pack >packb.verify && + grep -o "^$_x40" packa.verify |sort >packa.objects && + grep -o "^$_x40" packb.verify |sort >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.0.431.g3cb5c84
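The way the patch above resolves the final value of use_bitmap_index can be read as a small decision function. What follows is only a standalone sketch of that logic, not the code in builtin/pack-objects.c: `struct bitmap_inputs` and `resolve_use_bitmap_index()` are hypothetical names introduced for illustration, the fields stand in for the globals the patch consults, and `cmdline` is assumed to stay at -1 when neither --use-bitmap-index nor --no-use-bitmap-index was given.

```c
#include <assert.h>

/* Hypothetical bundle of the inputs consulted in cmd_pack_objects(). */
struct bitmap_inputs {
	int cmdline;               /* -1: flag absent, 0: --no-use-bitmap-index, 1: --use-bitmap-index */
	int config_default;        /* pack.useBitmaps; 1 when the config is unset */
	int pack_to_stdout;        /* --stdout was given */
	int use_internal_rev_list; /* --revs style input */
	int write_bitmap_index;    /* a bitmap is being written for the new pack */
	int repository_shallow;    /* stand-in for is_repository_shallow() */
};

/* Returns 1 if the bitmap index would be used, 0 otherwise. */
static int resolve_use_bitmap_index(const struct bitmap_inputs *in)
{
	int def, use;

	assert(in);
	def = in->config_default;
	use = in->cmdline;

	/* "soft" reason: on-disk packs default to the non-bitmap codepath */
	if (!in->pack_to_stdout)
		def = 0;

	/* an explicit command-line flag wins over the default */
	if (use < 0)
		use = def;

	/* "hard" reasons: these combinations cannot work at all */
	if (!in->use_internal_rev_list ||
	    (!in->pack_to_stdout && in->write_bitmap_index) ||
	    in->repository_shallow)
		use = 0;

	return use;
}
```

With these assumptions, a to-file run without any flag resolves to 0, the same run with an explicit --use-bitmap-index resolves to 1, and asking to write a bitmap for the to-file pack forces it back to 0 regardless of the flag.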
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-27 20:15 ` Kirill Smelkov @ 2016-07-27 20:40 ` Junio C Hamano 2016-07-28 20:22 ` Kirill Smelkov 0 siblings, 1 reply; 62+ messages in thread From: Junio C Hamano @ 2016-07-27 20:40 UTC (permalink / raw) To: Kirill Smelkov Cc: Jeff King, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti Kirill Smelkov <kirr@nexedi.com> writes: > > From: Kirill Smelkov <kirr@nexedi.com> > Subject: [PATCH 1/2] pack-objects: Make sure use_bitmap_index is not active under > --local or --honor-pack-keep > > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > are two codepaths in pack-objects: with & without using bitmap > reachability index. > > However add_object_entry_from_bitmap(), despite its non-bitmapped > counterpart add_object_entry(), in no way does check for whether --local > or --honor-pack-keep should be respected. In non-bitmapped codepath this > is handled in want_object_in_pack(), but bitmapped codepath has simply > no such checking at all. > > The bitmapped codepath however was allowing to pass --local and > --honor-pack-keep and bitmap indices were still used under such > conditions - potentially giving wrong output (including objects from > non-local or .keep'ed pack). 
> > Instead of fixing bitmapped codepath to respect those options, since > currently no one actually need or use them in combination with bitmaps, > let's just force use_bitmap_index=0 when any of --local or > --honor-pack-keep are used and add appropriate comment about > not-checking for those in add_object_entry_from_bitmap() > > Suggested-by: Jeff King <peff@peff.net> > --- > builtin/pack-objects.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 15866d7..d7cf782 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -1055,6 +1055,12 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > if (have_duplicate_entry(sha1, 0, &index_pos)) > return 0; > > + /* > + * for simplicity we always want object to be in pack, as > + * use_bitmap_index codepath assumes neither --local nor --honor-pack-keep > + * is active. > + */ I am not sure this comment is useful to readers. Unless the readers are comparing add_object_entry() and this function and wondering why this side lacks a check here, iow, when they are merely following from a caller of this function through this function down to its callee to understand what goes on, this comment would not help them and only confuse them. If we were to say something to help those who are comparing these two functions, I think we should be more explicit, i.e. The caller disables use-bitmap-index when --local or --honor-pack-keep options are in effect because bitmap code is not prepared to handle them. Because the control does not reach here if these options are in effect, the check with want_object_in_pack() to skip objects is not done. or something like that. Or is the rest of the bitmap codepath prepared to handle these options and it is just the matter of adding the missing check with want_object_in_pack() here to make it work correctly? 
> create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); > > display_progress(progress_state, nr_result); > @@ -2776,6 +2782,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) > use_bitmap_index = 0; > > + /* > + * "lazy" reasons not to use bitmaps; it is easier to reason about when > + * neither --local nor --honor-pack-keep is in action, and so far no one > + * needed nor implemented such support yet. > + */ Justifying comment like this is a good idea, but the comment above does not make it very clear that this is a correctness fix, i.e. if we do not disable, the code will do a wrong thing. The other logic to disable use of bitmap we can see in the pre-context would also benefit from some description as to why; 6b8fda2d (pack-objects: use bitmaps when packing objects, 2013-12-21) didn't do a very good job in that---the reason is not clear in its log message, either. > + if (local || ignore_packed_keep) > + use_bitmap_index = 0; > + > + I see one extra blank line here ;-) > if (pack_to_stdout || !rev_list_all) > write_bitmap_index = 0; Thanks.
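To make the correctness concern in the review above concrete, here is a rough sketch of the kind of per-pack check that --local and --honor-pack-keep imply for an object whose containing pack is already known. It is an illustration under assumptions, not git's implementation: `struct pack_flags` and `entry_wanted()` are hypothetical names, and the two fields are assumed to mirror the pack_local and pack_keep flags of struct packed_git.

```c
#include <assert.h>

/* Hypothetical, reduced view of a pack: only the two flags that matter here. */
struct pack_flags {
	int pack_local; /* pack lives in this repository, not in an alternate */
	int pack_keep;  /* pack has a matching .keep file */
};

/*
 * Returns 1 if an object found in pack `p` should go into the new pack,
 * 0 if it must be skipped.
 *   local              - --local was given
 *   ignore_packed_keep - --honor-pack-keep was given
 */
static int entry_wanted(const struct pack_flags *p, int local, int ignore_packed_keep)
{
	assert(p);

	/* --local: skip objects that come from a non-local (alternate) pack */
	if (local && !p->pack_local)
		return 0;

	/* --honor-pack-keep: skip objects that sit in a local .keep'ed pack */
	if (ignore_packed_keep && p->pack_local && p->pack_keep)
		return 0;

	return 1;
}
```

If the bitmapped codepath never runs a check like this, an object from an alternate or from a .keep'ed pack ends up in the output even though the option asked for it to be left out, which is why the patch under review simply turns bitmaps off when either option is active.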
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too 2016-07-27 20:40 ` Junio C Hamano @ 2016-07-28 20:22 ` Kirill Smelkov 2016-07-28 21:18 ` Junio C Hamano 0 siblings, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-07-28 20:22 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti Junio, first of all thanks for feedback, On Wed, Jul 27, 2016 at 01:40:36PM -0700, Junio C Hamano wrote: > Kirill Smelkov <kirr@nexedi.com> writes: > > > > From: Kirill Smelkov <kirr@nexedi.com> > > Subject: [PATCH 1/2] pack-objects: Make sure use_bitmap_index is not active under > > --local or --honor-pack-keep > > > > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > > are two codepaths in pack-objects: with & without using bitmap > > reachability index. > > > > However add_object_entry_from_bitmap(), despite its non-bitmapped > > counterpart add_object_entry(), in no way does check for whether --local > > or --honor-pack-keep should be respected. In non-bitmapped codepath this > > is handled in want_object_in_pack(), but bitmapped codepath has simply > > no such checking at all. > > > > The bitmapped codepath however was allowing to pass --local and > > --honor-pack-keep and bitmap indices were still used under such > > conditions - potentially giving wrong output (including objects from > > non-local or .keep'ed pack). 
> > > > Instead of fixing bitmapped codepath to respect those options, since > > currently no one actually need or use them in combination with bitmaps, > > let's just force use_bitmap_index=0 when any of --local or > > --honor-pack-keep are used and add appropriate comment about > > not-checking for those in add_object_entry_from_bitmap() > > > > Suggested-by: Jeff King <peff@peff.net> > > --- > > builtin/pack-objects.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > > index 15866d7..d7cf782 100644 > > --- a/builtin/pack-objects.c > > +++ b/builtin/pack-objects.c > > @@ -1055,6 +1055,12 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > return 0; > > > > + /* > > + * for simplicity we always want object to be in pack, as > > + * use_bitmap_index codepath assumes neither --local nor --honor-pack-keep > > + * is active. > > + */ > > I am not sure this comment is useful to readers. > > Unless the readers are comparing add_object_entry() and this > function and wondering why this side lacks a check here, iow, when > they are merely following from a caller of this function through > this function down to its callee to understand what goes on, this > comment would not help them and only confuse them. > > If we were to say something to help those who are comparing these > two functions, I think we should be more explicit, i.e. > > The caller disables use-bitmap-index when --local or > --honor-pack-keep options are in effect because bitmap code is > not prepared to handle them. Because the control does not reach > here if these options are in effect, the check with > want_object_in_pack() to skip objects is not done. > > or something like that. You are probably right. 
> Or is the rest of the bitmap codepath prepared to handle these > options and it is just the matter of adding the missing check with > want_object_in_pack() here to make it work correctly? I'm waiting so long for main patch to be at least queued to pu, that I'm now a bit frustrated and ready to do something not related to main goal :) (they say every joke contains part of a joke). Here is something from sleepy me: ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Date: Wed, 27 Jul 2016 22:18:04 +0300 Subject: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. For "2" we always have pack not yet found by bitmap traversal code, and thus we can simply reuse non-bitmapped want_object_in_pack() to find in which pack an object lives and also for taking omitting decision. For "1" we always have pack already found by bitmap traversal code and we only need to check that pack for same omission criteria used in want_object_in_pack() for found_pack. 
Suggested-by: Junio C Hamano <gitster@pobox.com> Discussed-with: Jeff King <peff@peff.net> --- builtin/pack-objects.c | 39 +++++++++++++++++++ t/t5310-pack-bitmaps.sh | 100 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 139 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a2f8cfd..34b3019 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -987,6 +987,42 @@ static int want_object_in_pack(const unsigned char *sha1, return 1; } +/* Like want_object_in_pack() but for objects coming from-under bitmapped traversal */ +static int want_object_in_pack_bitmap(const unsigned char *sha1, + struct packed_git **found_pack, + off_t *found_offset) +{ + struct packed_git *p = *found_pack; + + /* + * There are two types of requests coming here: + * 1. entries coming from main pack covered by bitmap index, and + * 2. object coming from, possibly alternate, loose or other packs. + * + * For "1" we always have *found_pack != NULL passed here from + * traverse_bitmap_commit_list(). (*found_pack is bitmap_git.pack + * actually). + * + * For "2" we always have *found_pack == NULL passed here from + * traverse_bitmap_commit_list() - since this is the way bitmap + * traversal passes here "extended" bitmap entries. 
+ */ + + /* objects not covered by bitmap */ + if (!p) + return want_object_in_pack(sha1, 0, found_pack, found_offset); + + /* objects covered by bitmap - we only have to check p wrt local and .keep */ + if (incremental) + return 0; + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + return 1; +} + static void create_object_entry(const unsigned char *sha1, enum object_type type, uint32_t hash, @@ -1055,6 +1091,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack_bitmap(sha1, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..a76f6ca 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -118,6 +118,88 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + mkdir -p alt_objects/pack && + echo $(pwd)/alt_objects > .git/objects/info/alternates && + echo content1 > file1 && + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | \ + git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + git verify-pack -v 1.pack >1.objects && + if egrep "^$objsha1" 1.objects; then + echo "Non-local object present in pack generated with --local: $objsha1" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 > file2 && + objsha2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + pack2=$(echo $objsha2 | \ + git pack-objects pack2) && + mv pack2-$pack2.* 
.git/objects/pack/ && + touch .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $objsha2) && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + git verify-pack -v 2a.pack >2a.objects && + if egrep "^$objsha2" 2a.objects; then + echo "Object from .keeped pack present in pack generated with --honor-pack-keep: $objsha2" + return 1 + fi +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + git verify-pack -v 2b.pack >2b.objects && + if egrep "^$objsha2" 2b.objects; then + echo "Non-local object present in pack generated with --local: $objsha2" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + git verify-pack -v .git/objects/pack/$packbitmap.pack >packbitmap.verify && + grep -o "^$_x40" packbitmap.verify |sort >packbitmap.objects && + touch .git/objects/pack/$packbitmap.keep && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + git verify-pack -v 3a.pack >3a.objects && + if grep -qFf packbitmap.objects 3a.objects; then + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" + return 1 + fi && + rm .git/objects/pack/$packbitmap.keep +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + git verify-pack -v 3b.pack >3b.objects && + if grep -qFf packbitmap.objects 3b.objects; then + echo "Non-local object from bitmapped pack present in pack generated with 
--local" + return 1 + fi && + mv alt_objects/pack/$packbitmap.* .git/objects/pack/ +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +225,24 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + git verify-pack -v 4.pack >4.verify && + grep -o "^$_x40" 4.verify |sort >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + grep -o "^$_x40" revlist |sort >objects && + if grep -qvFf objects 4.objects; then + echo "Expected objects not present in incremental pack" + return 1 + fi +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.0.431.g3cb5c84 ---- 8< ---- and main patch updated to avoid trivial conflicts ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Date: Thu, 7 Jul 2016 20:12:00 +0300 Subject: [PATCH 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout.
Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we care it is not activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. 
That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. This way all git tests pass, and for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: ---- 8< ---- $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! 
user 0m49.300s sys 0m1.360s ---- 8< ---- So the time for `pack-objects --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. That's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. More context: http://article.gmane.org/gmane.comp.version-control.git/299063 http://article.gmane.org/gmane.comp.version-control.git/299107 http://article.gmane.org/gmane.comp.version-control.git/299420 http://article.gmane.org/gmane.comp.version-control.git/300217 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- Documentation/config.txt | 3 +++ builtin/pack-objects.c | 25 +++++++++++++++++++++---- t/t5310-pack-bitmaps.sh | 14 ++++++++++++++ 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 8b1aee4..6a903c0 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2244,6 +2244,9 @@ pack.useBitmaps:: to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps. pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 34b3019..2b2e74a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2264,7 +2265,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2527,7 +2528,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (prepare_bitmap_walk(revs) < 0) return -1; - if (pack_options_allow_reuse() && + if (pack_options_allow_reuse() && pack_to_stdout && !reuse_partial_packfile_from_bitmap( &reuse_packfile, &reuse_packfile_objects, @@ -2812,7 +2813,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index a76f6ca..58c3b29 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -200,6 +200,20 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' mv alt_objects/pack/$packbitmap.* .git/objects/pack/ ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + git verify-pack -v packa-$packasha1.pack >packa.verify && + git verify-pack -v packb-$packbsha1.pack >packb.verify && + grep -o "^$_x40" packa.verify |sort >packa.objects && + grep -o "^$_x40" packb.verify |sort >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.0.431.g3cb5c84 ---- 8< ---- Thanks, Kirill ^ permalink raw reply related [flat|nested] 62+ messages in thread
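The "soft" versus "hard" reasons in the hunk above amount to a small tri-state resolution: the command line may leave the choice unset, the config supplies a default, and a few conditions veto bitmaps outright. A minimal sketch of that decision, assuming a hypothetical `struct bitmap_inputs` and function name (neither exists in git; only the ordering of the checks is taken from the patch):

```c
#include <assert.h>

/* Hypothetical bundle of the inputs the decision depends on. */
struct bitmap_inputs {
	int cli_choice;            /* -1: no flag given, 0: --no-use-bitmap-index, 1: --use-bitmap-index */
	int config_default;        /* pack.useBitmaps, 1 when not configured */
	int pack_to_stdout;
	int use_internal_rev_list;
	int write_bitmap_index;
	int repository_is_shallow;
};

/* Returns 1 if the bitmap index should be used for the traversal, else 0. */
int resolve_use_bitmap_index(struct bitmap_inputs in)
{
	int use = in.cli_choice;
	int def = in.config_default;

	assert(use >= -1 && use <= 1);

	/* "soft" reason: packing to a file defaults to not using bitmaps */
	if (!in.pack_to_stdout)
		def = 0;

	/* an explicit command-line choice overrides the default */
	if (use < 0)
		use = def;

	/* "hard" reasons: bitmaps cannot work at all in these cases */
	if (!in.use_internal_rev_list ||
	    (!in.pack_to_stdout && in.write_bitmap_index) ||
	    in.repository_is_shallow)
		use = 0;

	return use;
}
```

In this model an explicit `--use-bitmap-index` turns bitmaps on for a to-file pack, but it is still overridden when a bitmap is being written for the resulting pack or the repository is shallow.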
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-28 20:22 ` Kirill Smelkov
@ 2016-07-28 21:18 ` Junio C Hamano
  2016-07-29  7:40   ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-07-28 21:18 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti

Kirill Smelkov <kirr@nexedi.com> writes:

> I'm waiting so long for main patch to be at least queued to pu, that I'm
> now a bit frustrated and ready to do something not related to main goal :)

Perhaps the first step would be to stop putting multiple patches in
a single e-mail buried after a few pages of discussion. I will not
even find that there _are_ multiple patches in the message if I am
not involved directly in the discussion, and the discussion is still
ongoing, because it is likely that I'd skim just a few paragraphs at
the top before going on to other messages.

I won't touch the message I am responding to, as your -- 8< -- cut
mark does not even seem to be a reliable marker between patches
(i.e. I see something like this that is clearly not a message
boundary:

    than `git pack-objects file.pack`. Extracting erp5.git pack from
    lab.nexedi.com backup repository:

    ---- 8< ----
    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real 0m22.309s
    ...
)

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too
  2016-07-28 21:18 ` Junio C Hamano
@ 2016-07-29  7:40 ` Kirill Smelkov
  2016-07-29  7:46   ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Kirill Smelkov
  2016-07-29  7:47   ` [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Kirill Smelkov
  0 siblings, 2 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-07-29 7:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Vicent Marti

On Thu, Jul 28, 2016 at 02:18:29PM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> > I'm waiting so long for main patch to be at least queued to pu, that I'm
> > now a bit frustrated and ready to do something not related to main goal :)
>
> Perhaps the first step would be to stop putting multiple patches in
> a single e-mail buried after a few pages of discussion. I will not
> even find that there _are_ multiple patches in the message if I am
> not involved directly in the discussion, and the discussion is still
> ongoing, because it is likely that I'd skim just a few paragraphs at
> the top before going on to other messages.
>
> I won't touch the message I am responding to, as your -- 8< -- cut
> mark does not even seem to be a reliable marker between patches
> (i.e. I see something like this that is clearly not a message
> boundary:
>
>     than `git pack-objects file.pack`. Extracting erp5.git pack from
>     lab.nexedi.com backup repository:
>
>     ---- 8< ----
>     $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
>
>     real 0m22.309s
>     ...
> )

OK, that makes sense, and it is my fault. I'm resending each patch as a
separate message in reply to this mail.

^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental 2016-07-29 7:40 ` Kirill Smelkov @ 2016-07-29 7:46 ` Kirill Smelkov 2016-08-01 18:17 ` Junio C Hamano 2016-07-29 7:47 ` [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Kirill Smelkov 1 sibling, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-07-29 7:46 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. For "2" we always have pack not yet found by bitmap traversal code, and thus we can simply reuse non-bitmapped want_object_in_pack() to find in which pack an object lives and also for taking omitting decision. For "1" we always have pack already found by bitmap traversal code and we only need to check that pack for same criteria used in want_object_in_pack() for found_pack. 
Suggested-by: Junio C Hamano <gitster@pobox.com> Discussed-with: Jeff King <peff@peff.net> --- builtin/pack-objects.c | 39 +++++++++++++++++++ t/t5310-pack-bitmaps.sh | 100 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 139 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a2f8cfd..34b3019 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -987,6 +987,42 @@ static int want_object_in_pack(const unsigned char *sha1, return 1; } +/* Like want_object_in_pack() but for objects coming from-under bitmapped traversal */ +static int want_object_in_pack_bitmap(const unsigned char *sha1, + struct packed_git **found_pack, + off_t *found_offset) +{ + struct packed_git *p = *found_pack; + + /* + * There are two types of requests coming here: + * 1. entries coming from main pack covered by bitmap index, and + * 2. object coming from, possibly alternate, loose or other packs. + * + * For "1" we always have *found_pack != NULL passed here from + * traverse_bitmap_commit_list(). (*found_pack is bitmap_git.pack + * actually). + * + * For "2" we always have *found_pack == NULL passed here from + * traverse_bitmap_commit_list() - since this is the way bitmap + * traversal passes here "extended" bitmap entries. 
+ */ + + /* objects not covered by bitmap */ + if (!p) + return want_object_in_pack(sha1, 0, found_pack, found_offset); + + /* objects covered by bitmap - we only have to check p wrt local and .keep */ + if (incremental) + return 0; + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + return 1; +} + static void create_object_entry(const unsigned char *sha1, enum object_type type, uint32_t hash, @@ -1055,6 +1091,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack_bitmap(sha1, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..a76f6ca 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -118,6 +118,88 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + mkdir -p alt_objects/pack && + echo $(pwd)/alt_objects > .git/objects/info/alternates && + echo content1 > file1 && + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | \ + git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + git verify-pack -v 1.pack >1.objects && + if egrep "^$objsha1" 1.objects; then + echo "Non-local object present in pack generated with --local: $objsha1" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 > file2 && + objsha2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + pack2=$(echo $objsha2 | \ + git pack-objects pack2) && + mv pack2-$pack2.* 
.git/objects/pack/ && + touch .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $objsha2) && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + git verify-pack -v 2a.pack >2a.objects && + if egrep "^$objsha2" 2a.objects; then + echo "Object from .keeped pack present in pack generated with --honor-pack-keep: $objsha2" + return 1 + fi +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + git verify-pack -v 2b.pack >2b.objects && + if egrep "^$objsha2" 2b.objects; then + echo "Non-local object present in pack generated with --local: $objsha2" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + git verify-pack -v .git/objects/pack/$packbitmap.pack >packbitmap.verify && + grep -o "^$_x40" packbitmap.verify |sort >packbitmap.objects && + touch .git/objects/pack/$packbitmap.keep && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + git verify-pack -v 3a.pack >3a.objects && + if grep -qFf packbitmap.objects 3a.objects; then + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" + return 1 + fi && + rm .git/objects/pack/$packbitmap.keep +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + git verify-pack -v 3b.pack >3b.objects && + if grep -qFf packbitmap.objects 3b.objects; then + echo "Non-local object from bitmapped pack present in pack generated with 
--local" + return 1 + fi && + mv alt_objects/pack/$packbitmap.* .git/objects/pack/ +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +225,24 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + git verify-pack -v 4.pack >4.verify && + grep -o "^$_x40" 4.verify |sort >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + grep -o "^$_x40" revlist |sort >objects && + if grep -qvFf objects 4.objects; then + echo "Expected objects not present in incremental pack" + return 1 + fi +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.0.431.g3cb5c84 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental 2016-07-29 7:46 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Kirill Smelkov @ 2016-08-01 18:17 ` Junio C Hamano 2016-08-08 12:37 ` Kirill Smelkov 0 siblings, 1 reply; 62+ messages in thread From: Junio C Hamano @ 2016-08-01 18:17 UTC (permalink / raw) To: Kirill Smelkov Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git Kirill Smelkov <kirr@nexedi.com> writes: > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > are two codepaths in pack-objects: with & without using bitmap > reachability index. > > However add_object_entry_from_bitmap(), despite its non-bitmapped > counterpart add_object_entry(), in no way does check for whether --local > or --honor-pack-keep or --incremental should be respected. In > non-bitmapped codepath this is handled in want_object_in_pack(), but > bitmapped codepath has simply no such checking at all. > > The bitmapped codepath however was allowing to pass in all those options > and with bitmap indices still being used under such conditions - > potentially giving wrong output (e.g. including objects from non-local or > .keep'ed pack). > > We can easily fix this by noting the following: when an object comes to > add_object_entry_from_bitmap() it can come for two reasons: > > 1. entries coming from main pack covered by bitmap index, and > 2. object coming from, possibly alternate, loose or other packs. > > For "2" we always have pack not yet found by bitmap traversal code, and > thus we can simply reuse non-bitmapped want_object_in_pack() to find in > which pack an object lives and also for taking omitting decision. 
> > For "1" we always have pack already found by bitmap traversal code and we > only need to check that pack for same criteria used in > want_object_in_pack() for found_pack. > > Suggested-by: Junio C Hamano <gitster@pobox.com> > Discussed-with: Jeff King <peff@peff.net> > --- I do not think I suggested much of this to deserve credit like this, though, as I certainly haven't thought about the pros-and-cons between adding the same "some object in pack may not want to be in the output" logic to the bitmap side, or punting the bitmap codepath when local/keep are involved. > +/* Like want_object_in_pack() but for objects coming from-under bitmapped traversal */ > +static int want_object_in_pack_bitmap(const unsigned char *sha1, > + struct packed_git **found_pack, > + off_t *found_offset) > +{ > + struct packed_git *p = *found_pack; > + > + /* > + * There are two types of requests coming here: > + * 1. entries coming from main pack covered by bitmap index, and > + * 2. object coming from, possibly alternate, loose or other packs. > + * > + * For "1" we always have *found_pack != NULL passed here from > + * traverse_bitmap_commit_list(). (*found_pack is bitmap_git.pack > + * actually). > + * > + * For "2" we always have *found_pack == NULL passed here from > + * traverse_bitmap_commit_list() - since this is the way bitmap > + * traversal passes here "extended" bitmap entries. > + */ > + > + /* objects not covered by bitmap */ > + if (!p) > + return want_object_in_pack(sha1, 0, found_pack, found_offset); > + /* objects covered by bitmap - we only have to check p wrt local and .keep */ I am assuming that p != NULL only means "this object exists in THIS pack", without saying anything about "this object may also exist in other places", but "we only have to check" implies that "p != NULL" means "this object exists *ONLY* in this pack and nowhere else". Puzzled. 
> + if (incremental) > + return 0; > + if (local && !p->pack_local) > + return 0; > + if (ignore_packed_keep && p->pack_local && p->pack_keep) > + return 0; > + > + return 1; > +} > + ^ permalink raw reply [flat|nested] 62+ messages in thread
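The concern raised in the reply above can be made concrete with a toy model: if the same object lives both in the bitmapped pack and in a .keep'ed pack, looking only at the pack handed over by the bitmap traversal gives a different answer than looking at every pack that contains the object. The sketch below is a deliberately simplified illustration; `toy_pack`, `toy_opts` and both functions are hypothetical names, and details such as `--incremental` and loose objects are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a pack on disk. */
struct toy_pack {
	int has_object;   /* does this pack contain the object in question? */
	int is_local;     /* pack lives in the repository's own object store */
	int is_kept;      /* pack has a .keep file */
	struct toy_pack *next;
};

struct toy_opts {
	int local;            /* models --local */
	int honor_pack_keep;  /* models --honor-pack-keep */
};

/* Does this particular pack veto the object under the given options? */
static int pack_rejects(const struct toy_pack *p, struct toy_opts o)
{
	if (o.local && !p->is_local)
		return 1;
	if (o.honor_pack_keep && p->is_local && p->is_kept)
		return 1;
	return 0;
}

/* Models the first version of the patch: only the found pack is consulted. */
int want_checking_found_pack_only(const struct toy_pack *found, struct toy_opts o)
{
	assert(found != NULL && found->has_object);
	return !pack_rejects(found, o);
}

/* Models the intended semantics: any pack holding the object may veto it. */
int want_checking_all_packs(const struct toy_pack *head, struct toy_opts o)
{
	const struct toy_pack *p;

	for (p = head; p; p = p->next)
		if (p->has_object && pack_rejects(p, o))
			return 0;
	return 1;
}
```

With a bitmapped local pack followed by a kept local pack that both hold the object, and `honor_pack_keep` set, the first function accepts the object while the second rejects it, which is the mismatch the question points at.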
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental 2016-08-01 18:17 ` Junio C Hamano @ 2016-08-08 12:37 ` Kirill Smelkov 2016-08-08 13:50 ` Jeff King 2016-08-08 16:11 ` Junio C Hamano 0 siblings, 2 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-08-08 12:37 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Mon, Aug 01, 2016 at 11:17:30AM -0700, Junio C Hamano wrote: > Kirill Smelkov <kirr@nexedi.com> writes: > > > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > > are two codepaths in pack-objects: with & without using bitmap > > reachability index. > > > > However add_object_entry_from_bitmap(), despite its non-bitmapped > > counterpart add_object_entry(), in no way does check for whether --local > > or --honor-pack-keep or --incremental should be respected. In > > non-bitmapped codepath this is handled in want_object_in_pack(), but > > bitmapped codepath has simply no such checking at all. > > > > The bitmapped codepath however was allowing to pass in all those options > > and with bitmap indices still being used under such conditions - > > potentially giving wrong output (e.g. including objects from non-local or > > .keep'ed pack). > > > > We can easily fix this by noting the following: when an object comes to > > add_object_entry_from_bitmap() it can come for two reasons: > > > > 1. entries coming from main pack covered by bitmap index, and > > 2. object coming from, possibly alternate, loose or other packs. > > > > For "2" we always have pack not yet found by bitmap traversal code, and > > thus we can simply reuse non-bitmapped want_object_in_pack() to find in > > which pack an object lives and also for taking omitting decision. 
> > > > For "1" we always have pack already found by bitmap traversal code and we > > only need to check that pack for same criteria used in > > want_object_in_pack() for found_pack. > > > > Suggested-by: Junio C Hamano <gitster@pobox.com> > > Discussed-with: Jeff King <peff@peff.net> > > --- > > I do not think I suggested much of this to deserve credit like this, > though, as I certainly haven't thought about the pros-and-cons > between adding the same "some object in pack may not want to be in > the output" logic to the bitmap side, or punting the bitmap codepath > when local/keep are involved. I understand. Still for me it was you who convinced me to add proper support for e.g. --local vs bitmap instead of special-casing it. I think we also can avoid punting the bitmap codepath - please see below. > > +/* Like want_object_in_pack() but for objects coming from-under bitmapped traversal */ > > +static int want_object_in_pack_bitmap(const unsigned char *sha1, > > + struct packed_git **found_pack, > > + off_t *found_offset) > > +{ > > + struct packed_git *p = *found_pack; > > + > > + /* > > + * There are two types of requests coming here: > > + * 1. entries coming from main pack covered by bitmap index, and > > + * 2. object coming from, possibly alternate, loose or other packs. > > + * > > + * For "1" we always have *found_pack != NULL passed here from > > + * traverse_bitmap_commit_list(). (*found_pack is bitmap_git.pack > > + * actually). > > + * > > + * For "2" we always have *found_pack == NULL passed here from > > + * traverse_bitmap_commit_list() - since this is the way bitmap > > + * traversal passes here "extended" bitmap entries. 
> > + */ > > + > > + /* objects not covered by bitmap */ > > + if (!p) > > + return want_object_in_pack(sha1, 0, found_pack, found_offset); > > + /* objects covered by bitmap - we only have to check p wrt local and .keep */ > > I am assuming that p != NULL only means "this object exists in THIS > pack", without saying anything about "this object may also exist in > other places", but "we only have to check" implies that "p != NULL" > means "this object exists *ONLY* in this pack and nowhere else". > > Puzzled. You are right. Being new to --local and .keep I've missed this. I've added tests to cover cases like "object lives in both bitmapped pack and non-local loose or .keep'ed pack" and made the adjustments. The checks are now live unified in want_object_in_pack() for both bitmapped and non-bitmapped codepaths. Please apply the following corrected patch on top of 56dfeb62 (jk/pack-objects-optim). Thanks, Kirill ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Date: Fri, 29 Jul 2016 10:46:56 +0300 Subject: [PATCH v2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. 
entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way caring not to do more than 1 iteration in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows, we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.14(8.18+0.31) 8.89(7.92+0.28) -2.7% 5310.3: simulated clone 1.94(2.14+0.07) 1.91(2.08+0.08) -1.5% 5310.4: simulated fetch 0.75(1.01+0.02) 0.75(0.94+0.07) +0.0% 5310.6: partial bitmap 1.99(2.44+0.16) 1.95(2.40+0.14) -2.0% with all differences strangely showing we are a bit faster now, but probably all being within noise. Suggested-by: Junio C Hamano <gitster@pobox.com> Discussed-with: Jeff King <peff@peff.net> --- builtin/pack-objects.c | 36 ++++++++++++----- t/t5310-pack-bitmaps.sh | 103 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 130 insertions(+), 9 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..2c274d3 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -948,9 +948,9 @@ static int have_duplicate_entry(const unsigned char *sha1, * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. 
+ * As a side effect of this check, if object's pack entry was not already found, + * we will find the packed version of this object, if any. We therefore pass + * out the pack information to avoid having to look it up again later. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +958,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + struct packed_git *pack1 = *found_pack; + int pack1_seen = !pack1; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present the loop will degenerate to have only 1 iteration. + */ + for (p = (pack1 ? pack1 : packed_git); p; + p = (pack1_seen ? p->next : packed_git), pack1_seen = 1) { + off_t offset; + + if (p == pack1) { + if (pack1_seen) + continue; + offset = *found_offset; + } + else { + offset = find_pack_entry_one(sha1, p); + } - for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -1039,8 +1054,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1088,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh 
b/t/t5310-pack-bitmaps.sh index 3893afd..1a61de4 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -16,6 +16,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git show-ref -s master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +119,90 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + mkdir -p alt_objects/pack && + echo $(pwd)/alt_objects > .git/objects/info/alternates && + echo content1 > file1 && + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | \ + git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + git verify-pack -v 1.pack >1.objects && + echo -e "$objsha1\n$blob" >nonlocal-loose && + if grep -qFf nonlocal-loose 1.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 > file2 && + objsha2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + echo -e "$objsha2\n$bitmaptip" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + touch .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $objsha2) && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + git verify-pack -v 2a.pack >2a.objects && + if grep -qFf keepobjects 2a.objects; then + echo "Object from .keeped pack present in pack generated with --honor-pack-keep" 
+ return 1 + fi +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + git verify-pack -v 2b.pack >2b.objects && + if grep -qFf keepobjects 2b.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + git verify-pack -v .git/objects/pack/$packbitmap.pack >packbitmap.verify && + grep -o "^$_x40" packbitmap.verify |sort >packbitmap.objects && + touch .git/objects/pack/$packbitmap.keep && + echo HEAD | \ + git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + git verify-pack -v 3a.pack >3a.objects && + if grep -qFf packbitmap.objects 3a.objects; then + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" + return 1 + fi && + rm .git/objects/pack/$packbitmap.keep +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + echo HEAD | \ + git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + git verify-pack -v 3b.pack >3b.objects && + if grep -qFf packbitmap.objects 3b.objects; then + echo "Non-local object from bitmapped pack present in pack generated with --local" + return 1 + fi && + mv alt_objects/pack/$packbitmap.* .git/objects/pack/ +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +228,24 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + 
HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + git verify-pack -v 4.pack >4.verify && + grep -o "^$_x40" 4.verify |sort >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + grep -o "^$_x40" revlist |sort >objects && + if grep -qvFf objects 4.objects; then + echo "Expected objects not present in incremental pack" + return 1 + fi +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Jeff King @ 2016-08-08 13:50 UTC
To: Kirill Smelkov
Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 03:37:35PM +0300, Kirill Smelkov wrote:

> @@ -958,15 +958,30 @@ static int want_object_in_pack(const unsigned char *sha1,
>  		       off_t *found_offset)
>  {
>  	struct packed_git *p;
> +	struct packed_git *pack1 = *found_pack;
> +	int pack1_seen = !pack1;
>
>  	if (!exclude && local && has_loose_object_nonlocal(sha1))
>  		return 0;
>
> -	*found_pack = NULL;
> -	*found_offset = 0;
> +	/*
> +	 * If we already know the pack object lives in, start checks from that
> +	 * pack - in the usual case when neither --local was given nor .keep files
> +	 * are present the loop will degenerate to have only 1 iteration.
> +	 */
> +	for (p = (pack1 ? pack1 : packed_git); p;
> +	     p = (pack1_seen ? p->next : packed_git), pack1_seen = 1) {
> +		off_t offset;

Hmm. So this is basically sticking the found-pack at the front of the
loop.

We either need to look at zero packs here (we already know where the
object is, and we don't need to bother with --local or .keep lookups),
or we need to look at all of them (to check for local/keep).

I guess you structured it this way to try to reuse the "can we break out
early" logic from the middle of the loop. So we go through the loop one
time, and then break out. And then this:

> +		if (p == pack1) {
> +			if (pack1_seen)
> +				continue;
> +			offset = *found_offset;
> +		}
> +		else {
> +			offset = find_pack_entry_one(sha1, p);
> +		}

is meant to make that one-time through the loop cheaper. So I don't
think it's wrong, but it's very confusing to me.

Would it be simpler to stick that logic in a function like:

  static int want_found_object(int exclude, struct packed_git *pack)
  {
	if (exclude)
		return 1;
	if (incremental)
		return 0;

	/* if we can break early, then do so */
	if (!ignore_packed_keep &&
	    (!local || !have_non_local_packs))
		return 1;

	if (local && !p->pack_local)
		return 0;
	if (ignore_packed_keep && p->pack_local && p->pack_keep)
		return 0;

	/* indeterminate; keep looking for more packs */
	return -1;
  }

  static int want_object_in_pack(...)
  {
	...
	if (!exclude && local && has_loose_object_nonlocal(sha1))
		return 0;

	if (*found_pack) {
		int ret = want_found_object(exclude, *found_pack);
		if (ret != -1)
			return ret;
	}

	for (p = packed_git; p; p = p->next) {
		off_t offset;

		if (p == *found_pack)
			offset = *found_offset;
		else
			offset = find_pack_entry(sha1, p);
		if (offset) {
			... fill in *found_pack ...
			int ret = want_found_object(exclude, p);
			if (ret != -1)
				return ret;
		}
	}
	return 1;
  }

That's a little more verbose, but IMHO the flow is a lot easier to
follow (especially as the later re-rolls of that series actually muck
with the loop order more, but with this approach there's no conflict).

>  static int add_object_entry(const unsigned char *sha1, enum object_type type,
>  			    const char *name, int exclude)
>  {
> -	struct packed_git *found_pack;
> -	off_t found_offset;
> +	struct packed_git *found_pack = NULL;
> +	off_t found_offset = 0;
>  	uint32_t index_pos;
>
>  	if (have_duplicate_entry(sha1, exclude, &index_pos))
> @@ -1073,6 +1088,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
>  	if (have_duplicate_entry(sha1, 0, &index_pos))
>  		return 0;
>
> +	if (!want_object_in_pack(sha1, 0, &pack, &offset))
> +		return 0;
> +

This part looks correct and easy to understand.

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..1a61de4 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -16,6 +16,7 @@ test_expect_success 'setup repo with moderate-sized history' '
>  		test_commit side-$i
>  	done &&
>  	git checkout master &&
> +	bitmaptip=$(git show-ref -s master) &&

Our usual method for getting a sha1 is "git rev-parse". I don't think
there's anything wrong with your method, but it might be better to stick
to the canonical one (I had to actually look up "show-ref -s").

> @@ -118,6 +119,90 @@ test_expect_success 'incremental repack can disable bitmaps' '
>  	git repack -d --no-write-bitmap-index
>  '
>
> +test_expect_success 'pack-objects respects --local (non-local loose)' '
> +	mkdir -p alt_objects/pack &&
> +	echo $(pwd)/alt_objects > .git/objects/info/alternates &&
> +	echo content1 > file1 &&

Style: we don't put a space between ">" and the filename.

> +	objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) &&
> +	git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin &&

I'm not sure why we need two objects in the fake alt_objects repository.
Shouldn't one be enough to do the test?

> +	git add file1 &&

I think this will actually skip the writing of the loose object, because
it's already available in the alternate object store. You probably want
to do this before adding it there.

> +	test_tick &&
> +	git commit -m commit_file1 &&
> +	echo HEAD | \

No need for "\" after a "|"; the shell knows it has to keep looking.

> +	git pack-objects --local --stdout --revs >1.pack &&
> +	git index-pack 1.pack &&

I'd have expected you to use the non-stdout version here. Is this meant
to be independent of your other patch (I think that's OK).

> +	git verify-pack -v 1.pack >1.objects &&

It's cheaper to use "git show-index <1.pack", and the output is saner,
too.

> +	echo -e "$objsha1\n$blob" >nonlocal-loose &&

"echo -e" isn't portable. You can use "printf", or two echos like:

  {
	echo one &&
	echo two
  } >file

(though I'm still not sure what we gain by checking both).

> +	if grep -qFf nonlocal-loose 1.objects; then
> +		echo "Non-local object present in pack generated with --local"
> +		return 1
> +	fi
> +'

grep -f isn't portable. However, I think:

  echo $objsha1 >expect &&
  git show-index <1.pack | cut -d' ' -f2 >actual
  test_cmp expect actual

would work (if you do stick with two entries, you might need to sort
your "expect").

I think similar comments apply to the other tests. I would have expected
"respects --local (non-local pack)" to come next (i.e., to keep all of
the --local tests together). But you seem to interleave them with
--honor-pack-keep.

-Peff
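A minimal sketch of the two portable ways to write several lines to a file that the review above mentions (printf, or a group of plain echos). The object ids and the file names `ids-printf` and `ids-echo` are placeholders chosen for illustration; they do not come from the patch.

```shell
# Placeholder ids standing in for $objsha1 and $blob (hypothetical values).
id1=1111111111111111111111111111111111111111
id2=2222222222222222222222222222222222222222

# Option 1: printf expands "\n" in its format string in every POSIX shell,
# whereas the handling of "echo -e" differs between shells.
printf '%s\n%s\n' "$id1" "$id2" >ids-printf

# Option 2: group two plain echos and redirect the whole group once.
{
	echo "$id1" &&
	echo "$id2"
} >ids-echo

# cmp is silent and exits 0 when the two files are byte-for-byte equal.
cmp ids-printf ids-echo && echo "identical"
```

Either form avoids relying on how a particular shell's builtin echo treats backslash escapes.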
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Jeff King @ 2016-08-08 13:51 UTC
To: Kirill Smelkov
Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 09:50:20AM -0400, Jeff King wrote:

> > +	git pack-objects --local --stdout --revs >1.pack &&
> > +	git index-pack 1.pack &&
>
> I'd have expected you to use the non-stdout version here. Is this meant
> to be independent of your other patch (I think that's OK).

Oh, nevermind, I forgot this was meant to be a preparatory patch. So it
makes sense to use --stdout in the tests.

-Peff
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Junio C Hamano @ 2016-08-08 16:08 UTC
To: Jeff King
Cc: Kirill Smelkov, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Jeff King <peff@peff.net> writes:

> On Mon, Aug 08, 2016 at 03:37:35PM +0300, Kirill Smelkov wrote:
> ...
>   static int want_object_in_pack(...)
>   {
> 	...
> 	if (!exclude && local && has_loose_object_nonlocal(sha1))
> 		return 0;
>
> 	if (*found_pack) {
> 		int ret = want_found_object(exclude, *found_pack);
> 		if (ret != -1)
> 			return ret;
> 	}
>
> 	for (p = packed_git; p; p = p->next) {
> 		off_t offset;
>
> 		if (p == *found_pack)
> 			offset = *found_offset;
> 		else
> 			offset = find_pack_entry(sha1, p);
> 		if (offset) {
> 			... fill in *found_pack ...
> 			int ret = want_found_object(exclude, p);
> 			if (ret != -1)
> 				return ret;
> 		}
> 	}
> 	return 1;
>   }
>
> That's a little more verbose, but IMHO the flow is a lot easier to
> follow (especially as the later re-rolls of that series actually muck
> with the loop order more, but with this approach there's no conflict).

I agree; Kirill's version was so confusing that I couldn't see what
it was trying to do with "pack1_seen" flag that is reset every time
loop repeats (at least, before got my coffee ;-). A helper function
like the above makes the logic a lot easier to grasp.

>>  static int add_object_entry(const unsigned char *sha1, enum object_type type,
>>  			    const char *name, int exclude)
>>  {
>> -	struct packed_git *found_pack;
>> -	off_t found_offset;
>> +	struct packed_git *found_pack = NULL;
>> +	off_t found_offset = 0;
>>  	uint32_t index_pos;
>>
>>  	if (have_duplicate_entry(sha1, exclude, &index_pos))
>> @@ -1073,6 +1088,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
>>  	if (have_duplicate_entry(sha1, 0, &index_pos))
>>  		return 0;
>>
>> +	if (!want_object_in_pack(sha1, 0, &pack, &offset))
>> +		return 0;
>> +
>
> This part looks correct and easy to understand.

Yes.
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Junio C Hamano @ 2016-08-08 19:06 UTC
To: Jeff King
Cc: Kirill Smelkov, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Jeff King <peff@peff.net> writes:

>> +	if grep -qFf nonlocal-loose 1.objects; then
>> +		echo "Non-local object present in pack generated with --local"
>> +		return 1
>> +	fi
>> +'
>
> grep -f isn't portable. However, I think:
>
>   echo $objsha1 >expect &&
>   git show-index <1.pack | cut -d' ' -f2 >actual
>   test_cmp expect actual
>
> would work (if you do stick with two entries, you might need to sort
> your "expect").

Hmph, are you sure? "grep -f pattern_file" is in POSIX.1.
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Jeff King @ 2016-08-08 19:09 UTC
To: Junio C Hamano
Cc: Kirill Smelkov, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 12:06:13PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> >> +	if grep -qFf nonlocal-loose 1.objects; then
> >> +		echo "Non-local object present in pack generated with --local"
> >> +		return 1
> >> +	fi
> >> +'
> >
> > grep -f isn't portable. However, I think:
> >
> >   echo $objsha1 >expect &&
> >   git show-index <1.pack | cut -d' ' -f2 >actual
> >   test_cmp expect actual
> >
> > would work (if you do stick with two entries, you might need to sort
> > your "expect").
>
> Hmph, are you sure? "grep -f pattern_file" is in POSIX.1.

Hmm, you're right. I specifically checked my local grep.1posix manpage,
but searching for "-f" didn't turn up anything, because it's formatted
with a Unicode minus sign (U+2212). Bleh.

-Peff
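Given the conclusion above that `grep -f` is specified by POSIX, a check of the form "none of the listed ids appears in the pack listing" can be sketched as follows. The helper name `has_any_line`, the file names, and the short ids are hypothetical and exist only for this illustration; this is not the helper the patch itself ends up using.

```shell
# Hypothetical helper: succeed if any line of pattern-file ($1) occurs in
# content-file ($2).
#   -F  treat patterns as fixed strings, not regular expressions
#   -f  read the patterns from a file, one per line
#   -q  print nothing, report only through the exit status
# Without -x a pattern may match part of a line, which is fine when the
# patterns are full-length object ids.
has_any_line () {
	grep -qFf "$1" "$2"
}

# Placeholder data: two unwanted ids and two listings to check.
printf '%s\n' aaaa bbbb >unwanted
printf '%s\n' cccc dddd >listing-clean
printf '%s\n' cccc bbbb >listing-dirty

if has_any_line unwanted listing-clean
then
	echo "clean: unexpected match"
else
	echo "clean: no unwanted ids"
fi

if has_any_line unwanted listing-dirty
then
	echo "dirty: found unwanted id"
else
	echo "dirty: no unwanted ids"
fi
```

This asks whether the two files share any entry, which is a different question from whether they are identical.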
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
From: Junio C Hamano @ 2016-08-08 16:11 UTC
To: Kirill Smelkov
Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

>> > ...
>> > Suggested-by: Junio C Hamano <gitster@pobox.com>
>> > Discussed-with: Jeff King <peff@peff.net>
>> > ---
>>
>> I do not think I suggested much of this to deserve credit like this,
>> though, as I certainly haven't thought about the pros-and-cons
>> between adding the same "some object in pack may not want to be in
>> the output" logic to the bitmap side, or punting the bitmap codepath
>> when local/keep are involved.
>
> I understand. Still for me it was you who convinced me to add proper
> support for e.g. --local vs bitmap instead of special-casing it.

OK, in such a case, it probably is more sensible to do it like:

    ...
    with all differences strangely showing we are a bit faster now, but
    probably all being within noise.

    Credit for inspiring this solution and discussing the design of
    the change goes to Junio and Jeff King.

    Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
    ---
     builtin/pack-objects.c  |  36 ++++++++++++-----
     t/t5310-pack-bitmaps.sh | 103 ++++++++++++++++++++++++++++++++++++++++++++++++
     2 files changed, 130 insertions(+), 9 deletions(-)

Don't forget your own sign-off ;-)
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental 2016-08-08 16:11 ` Junio C Hamano @ 2016-08-08 18:19 ` Kirill Smelkov 2016-08-08 18:57 ` [PATCH v3] " Kirill Smelkov 2016-08-08 19:26 ` [PATCH 1/2] " Junio C Hamano 0 siblings, 2 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-08-08 18:19 UTC (permalink / raw) To: Jeff King, Junio C Hamano Cc: Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git ( updated patch is in the end of this mail ) Jeff, first of all thanks for commenting, On Mon, Aug 08, 2016 at 09:50:20AM -0400, Jeff King wrote: > On Mon, Aug 08, 2016 at 03:37:35PM +0300, Kirill Smelkov wrote: > > > @@ -958,15 +958,30 @@ static int want_object_in_pack(const unsigned char *sha1, > > off_t *found_offset) > > { > > struct packed_git *p; > > + struct packed_git *pack1 = *found_pack; > > + int pack1_seen = !pack1; > > > > if (!exclude && local && has_loose_object_nonlocal(sha1)) > > return 0; > > > > - *found_pack = NULL; > > - *found_offset = 0; > > + /* > > + * If we already know the pack object lives in, start checks from that > > + * pack - in the usual case when neither --local was given nor .keep files > > + * are present the loop will degenerate to have only 1 iteration. > > + */ > > + for (p = (pack1 ? pack1 : packed_git); p; > > + p = (pack1_seen ? p->next : packed_git), pack1_seen = 1) { > > + off_t offset; > > Hmm. So this is basically sticking the found-pack at the front of the > loop. > > We either need to look at zero packs here (we already know where the > object is, and we don't need to bother with --local or .keep lookups), > or we need to look at all of them (to check for local/keep). > > I guess you structured it this way to try to reuse the "can we break out > early" logic from the middle of the loop. So we go through the loop one > time, and then break out. 
And then this: > > > + if (p == pack1) { > > + if (pack1_seen) > > + continue; > > + offset = *found_offset; > > + } > > + else { > > + offset = find_pack_entry_one(sha1, p); > > + } > > is meant to make that one-time through the loop cheaper. So I don't > think it's wrong, but it's very confusing to me. > > Would it be simpler to stick that logic in a function like: > > static int want_found_object(int exclude, struct packed_git *pack) > { > if (exclude) > return 1; > if (incremental) > return 0; > > /* if we can break early, then do so */ > if (!ignore_packed_keep && > (!local || !have_non_local_packs)) > return 1; > > if (local && !p->pack_local) > return 0; > if (ignore_packed_keep && p->pack_local && p->pack_keep) > return 0; > > /* indeterminate; keep looking for more packs */ > return -1; > } > > static int want_object_in_pack(...) > { > ... > if (!exclude && local && has_loose_object_nonlocal(sha1)) > return 0; > > if (*found_pack) { > int ret = want_found_object(exclude, *found_pack); > if (ret != -1) > return ret; > } > > for (p = packed_git; p; p = p->next) { > off_t offset; > > if (p == *found_pack) > offset = *found_offset; > else > offset = find_pack_entry(sha1, p); > if (offset) { > ... fill in *found_pack ... > int ret = want_found_object(exclude, p); > if (ret != -1) > return ret; > } > } > return 1; > } > > That's a little more verbose, but IMHO the flow is a lot easier to > follow (especially as the later re-rolls of that series actually muck > with the loop order more, but with this approach there's no conflict). On Mon, Aug 08, 2016 at 09:08:51AM -0700, Junio C Hamano wrote: > I agree; Kirill's version was so confusing that I couldn't see what > it was trying to do with "pack1_seen" flag that is reset every time > loop repeats (at least, before got my coffee ;-). A helper function > like the above makes the logic a lot easier to grasp. Ok, at least I put today's record for the most confusing code. 
I agree with your comments - it is better to simplify control-flow logic. Somehow my head was refusing doing that and insisted on keeping the loop inside intact. Maybe I should have a bit of rest... Scratch all that in favour of want_found_object() and thanks for heads-up. > > static int add_object_entry(const unsigned char *sha1, enum object_type type, > > const char *name, int exclude) > > { > > - struct packed_git *found_pack; > > - off_t found_offset; > > + struct packed_git *found_pack = NULL; > > + off_t found_offset = 0; > > uint32_t index_pos; > > > > if (have_duplicate_entry(sha1, exclude, &index_pos)) > > @@ -1073,6 +1088,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > return 0; > > > > + if (!want_object_in_pack(sha1, 0, &pack, &offset)) > > + return 0; > > + > > This part looks correct and easy to understand. thanks. > > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > > index 3893afd..1a61de4 100755 > > --- a/t/t5310-pack-bitmaps.sh > > +++ b/t/t5310-pack-bitmaps.sh > > @@ -16,6 +16,7 @@ test_expect_success 'setup repo with moderate-sized history' ' > > test_commit side-$i > > done && > > git checkout master && > > + bitmaptip=$(git show-ref -s master) && > > Our usual method for getting a sha1 is "git rev-parse". I don't think > there's anything wrong with your method, but it might be better to stick > to the canonical one (I had to actually look up "show-ref -s"). ok. > > @@ -118,6 +119,90 @@ test_expect_success 'incremental repack can disable bitmaps' ' > > git repack -d --no-write-bitmap-index > > ' > > > > +test_expect_success 'pack-objects respects --local (non-local loose)' ' > > + mkdir -p alt_objects/pack && > > + echo $(pwd)/alt_objects > .git/objects/info/alternates && > > + echo content1 > file1 && > > Style: we don't put a space between ">" and the filename. ok, corrected. 
> > + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && > > + git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && > > I'm not sure why we need two objects in the fake alt_objects repository. > Shouldn't one be enough to do the test? Those two objects are different: one is not present in main bitmapped pack and another is present in main bitmapped pack. So the second one tests for case Junio caught - when bitmapped pack overlaps with non-local loose object and with --local we want to avoid that object in resultant pack. I've adjusted the patch as + # non-local loose object which is not present in bitmapped pack objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && > > + git add file1 && > > I think this will actually skip the writing of the loose object, because > it's already available in the alternate object store. You probably want > to do this before adding it there. It does not want to add the object to local objects - it just wants to make a commit with reference to that object, so that $objsha1 we added above becomes referenced from HEAD and thus should be put in pack without --local (and with --local we test it is not put there). > > + test_tick && > > + git commit -m commit_file1 && > > + echo HEAD | \ > > No need for "\" after a "|"; the shell knows it has to keep looking. Ok, thanks for the info. I've actually now folded those two lines into one as it is not long. > > > + git pack-objects --local --stdout --revs >1.pack && > > + git index-pack 1.pack && > > I'd have expected you to use the non-stdout version here. Is this meant > to be independent of your other patch (I think that's OK). On Mon, Aug 08, 2016 at 09:50:20AM -0400, Jeff King wrote: > Oh, nevermind, I forgot this was meant to be a preparatory patch. 
So it
> makes sense to use --stdout in the tests.

Actually now these two patches:

- to teach bitmapped pack-objects about --local & friends, and
- to teach `pack-objects file` to use bitmaps

are completely separated and orthogonal. I mean they work independently
and can be reviewed / applied independently, each solving its own task.

Initially I was keeping them together because in the first version of
`pack-objects file` the default was to always use bitmap index, and
since repack was using it and there were tests for repack vs. non-local
objects, those tests were failing.

Now, since we figured we should have use_bitmap_index=0 by default when
packing to file, the `bitmap + --local` part is not needed for the first
patch.

( it is still good to have the `bitmap + --local` applied because it
  restores correctness and consistency and allows future paths for brave
  souls to do repacking with bitmap index being on maybe )

For the current patch I think using --stdout in tests is ok as we know
--stdout uses bitmap indices by default.

> > +	git verify-pack -v 1.pack >1.objects &&
>
> It's cheaper to use "git show-index <1.pack", and the output is saner,
> too.

I've copied those verify-packs from t7700-repack.sh, but it is ok to
switch to show-index. Thanks for pointing this out.

> > +	echo -e "$objsha1\n$blob" >nonlocal-loose &&
>
> "echo -e" isn't portable. You can use "printf", or two echos like:
>
>   {
> 	echo one &&
> 	echo two
>   } >file

ok, switching to printf. Thanks for the portability hint.

> (though I'm still not sure what we gain by checking both).

Please see above: those two objects serve to test two different
scenarios: 1) a non-local object which is not in the bitmapped pack, and
2) a non-local object which is also present in the bitmapped pack.

> > +	if grep -qFf nonlocal-loose 1.objects; then
> > +		echo "Non-local object present in pack generated with --local"
> > +		return 1
> > +	fi
> > +'
>
> grep -f isn't portable. However, I think:
>
>   echo $objsha1 >expect &&
>   git show-index <1.pack | cut -d' ' -f2 >actual
>   test_cmp expect actual
>
> would work (if you do stick with two entries, you might need to sort
> your "expect").

Thanks for pointing out grep -f is not portable (I neither knew nor
cared about portability). However here and in similar places we are
checking that entries in nonlocal-loose are not present in 1.objects,
and that is not what test_cmp does, as it would test whether
nonlocal-loose and 1.objects are exactly the same. For this reason I'm
changing `grep -f ...` to `git grep --no-index -f ...` which we carry
with us.

>
> I think similar comments apply to the other tests.

I went through all tests in the patch and made similar adjustments
everywhere.

> I would have expected
> "respects --local (non-local pack)" to come next (i.e., to keep all of
> the --local tests together). But you seem to interleave them with
> --honor-pack-keep.

There is a reason: "respects --local (non-local pack)" needs a non-local
pack set up in alt_objects/ for its tests to run. And we set up such a
pack as a byproduct of running

	"pack-objects respects --honor-pack-keep (local non-bitmapped pack)".

Similarly

	"pack-objects respects --local (non-local bitmapped pack)"

for its testing needs (and moves to alt_objects/) the main bitmapped
pack, which was just analyzed in

	"pack-objects respects --honor-pack-keep (local bitmapped pack)"

Initially I tried to cluster tests (i.e. all --local together and then
all --honor-pack-keep together) but having tests interleaved turned out
to be handy because one step checks something and prepares the setup for
the next one.

On Mon, Aug 08, 2016 at 09:11:53AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> >> > ...
> >> > Suggested-by: Junio C Hamano <gitster@pobox.com> > >> > Discussed-with: Jeff King <peff@peff.net> > >> > --- > >> > >> I do not think I suggested much of this to deserve credit like this, > >> though, as I certainly haven't thought about the pros-and-cons > >> between adding the same "some object in pack may not want to be in > >> the output" logic to the bitmap side, or punting the bitmap codepath > >> when local/keep are involved. > > > > I understand. Still for me it was you who convinced me to add proper > > support for e.g. --local vs bitmap instead of special-casing it. > > OK, in such a case, it probably is more sensible to do it like: > > ... > with all differences strangely showing we are a bit faster now, but > probably all being within noise. > > Credit for inspiring this solution and discussing the design of > the change goes to Junio and Jeff King. Ok, thanks for advice. > Don't forget your own sign-off ;-) Oops, thanks for catching :) Obviously I forgot it and now corrected. Thanks, Kirill ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH v3] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). 
We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows, we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% with all differences strangely showing we are a bit faster now, but probably all being within noise. And in the general case we care not to have duplicate find_pack_entry_one(*found_pack) calls. Worst what can happen is we can call want_found_object(*found_pack) -- newly introduced helper for checking whether we want object -- twice, but since want_found_object() is very lightweight it does not make any difference. I appreciate help and discussing this change with Junio C Hamano and Jeff King. 
Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- builtin/pack-objects.c | 94 ++++++++++++++++++++++++++-------------- t/t5310-pack-bitmaps.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 172 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..e06c1bf 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,45 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* + * When asked to do --local (do not include an + * object that appears in a pack we borrow + * from elsewhere) or --honor-pack-keep (do not + * include an object that appears in a pack marked + * with .keep), we need to make sure no copy of this + * object come from in _any_ pack that causes us to + * omit it, and need to complete this loop. When + * neither option is in effect, we know the object + * we just found is going to be packed, so break + * out of the search loop now. + */ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. + * As a side effect of this check, if object's pack entry was not already found, + * we will find the packed version of this object, if any. We therefore pass + * out the pack information to avoid having to look it up again later. 
*/ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +990,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present we will determine the answer right now. + */ + if (*found_pack) { + want = want_found_object(exclude, *found_pack); + if (want != -1) + return want; + } for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(sha1, p); + if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -974,31 +1021,9 @@ static int want_object_in_pack(const unsigned char *sha1, *found_offset = offset; *found_pack = p; } - if (exclude) - return 1; - if (incremental) - return 0; - - /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the loop to return 1 now. 
- */ - if (!ignore_packed_keep && - (!local || !have_non_local_packs)) - break; - - if (local && !p->pack_local) - return 0; - if (ignore_packed_keep && p->pack_local && p->pack_keep) - return 0; + want = want_found_object(exclude, p); + if (want != -1) + return want; } } @@ -1039,8 +1064,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1098,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..e71caa4 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -7,6 +7,19 @@ objpath () { echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" } +# show objects present in pack ($1 should be associated *.idx) +packobjects () { + git show-index <$1 | cut -d' ' -f2 +} + +# hasany pattern-file content-file +# tests whether content-file has any entry from pattern-file with entries being +# whole lines. 
+hasany () { + # NOTE `grep -f` is not portable + git grep --no-index -qFf $1 $2 +} + test_expect_success 'setup repo with moderate-sized history' ' for i in $(test_seq 1 10); do test_commit $i @@ -16,6 +29,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +132,86 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + mkdir -p alt_objects/pack && + echo $(pwd)/alt_objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + packobjects 1.idx >1.objects && + printf "$objsha1\n$blob\n" >nonlocal-loose && + if hasany nonlocal-loose 1.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + objsha2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "$objsha2\n$bitmaptip\n" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + touch .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $objsha2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git 
index-pack 2a.pack && + packobjects 2a.idx >2a.objects && + if hasany keepobjects 2a.objects; then + echo "Object from .keeped pack present in pack generated with --honor-pack-keep" + return 1 + fi +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + packobjects 2b.idx >2b.objects && + if hasany keepobjects 2b.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + packobjects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + touch .git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + packobjects 3a.idx >3a.objects && + if hasany packbitmap.objects 3a.objects; then + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" + return 1 + fi && + rm .git/objects/pack/$packbitmap.keep +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + packobjects 3b.idx >3b.objects && + if hasany packbitmap.objects 3b.objects; then + echo "Non-local object from bitmapped pack present in pack generated with --local" + return 1 + fi && + mv alt_objects/pack/$packbitmap.* .git/objects/pack/ +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +237,23 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 
'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + packobjects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + if !hasany objects 4.objects; then + echo "Expected objects not present in incremental pack" + return 1 + fi +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH v3] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
  2016-08-08 18:19 ` Kirill Smelkov
@ 2016-08-08 18:57 ` Kirill Smelkov
  2016-08-08 19:26 ` [PATCH 1/2] " Junio C Hamano
  1 sibling, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-08 18:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov

Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with and without the bitmap
reachability index.

However add_object_entry_from_bitmap(), unlike its non-bitmapped
counterpart add_object_entry(), does not check at all whether --local,
--honor-pack-keep or --incremental should be respected. In the
non-bitmapped codepath this is handled in want_object_in_pack(), but
the bitmapped codepath simply has no such check.

The bitmapped codepath nevertheless accepted all those options and kept
using bitmap indices under such conditions, potentially giving wrong
output (e.g. including objects from a non-local or .keep'ed pack).

We can easily fix this by noting the following: an object can reach
add_object_entry_from_bitmap() in one of two ways:

    1. as an entry from the main pack covered by the bitmap index, or
    2. as an object from a loose store or another pack, possibly in an
       alternate.

Case "2" can already be handled by want_object_in_pack(). To cover
case "1" we teach want_object_in_pack() to accept a non-NULL
*found_pack, meaning the caller has already found the object's pack
entry.

In want_object_in_pack() we take care to start the checks from the
already-found pack, if there is one. That way the answer is determined
right away when neither --local nor --honor-pack-keep is active.
In particular, as p5310-pack-bitmaps.sh shows, we do not hurt the
performance of clones served with bitmaps:

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.63(8.67+0.33)   9.47(8.55+0.28) -1.7%
    5310.3: simulated clone   2.07(2.17+0.12)   2.03(2.14+0.12) -1.9%
    5310.4: simulated fetch   0.78(1.03+0.02)   0.76(1.00+0.03) -2.6%
    5310.6: partial bitmap    1.97(2.43+0.15)   1.92(2.36+0.14) -2.5%

All the differences, oddly, show that we are a bit faster now, but they
are probably within noise.

In the general case we take care not to duplicate the
find_pack_entry_one(*found_pack) call. The worst that can happen is
that want_found_object(*found_pack), a newly introduced helper that
checks whether we want the object, is called twice; since
want_found_object() is very lightweight this makes no difference.

I appreciate the help of Junio C Hamano and Jeff King in discussing
this change.

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 builtin/pack-objects.c  |  94 ++++++++++++++++++++++++++--------------
 t/t5310-pack-bitmaps.sh | 111 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 172 insertions(+), 33 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c4c2a3c..e06c1bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -944,13 +944,45 @@ static int have_duplicate_entry(const unsigned char *sha1,
 	return 1;
 }

+static int want_found_object(int exclude, struct packed_git *p)
+{
+	if (exclude)
+		return 1;
+	if (incremental)
+		return 0;
+
+	/*
+	 * When asked to do --local (do not include an
+	 * object that appears in a pack we borrow
+	 * from elsewhere) or --honor-pack-keep (do not
+	 * include an object that appears in a pack marked
+	 * with .keep), we need to make sure no copy of this
+	 * object come from in _any_ pack that causes us to
+	 * omit it, and need to complete this loop.
When + * neither option is in effect, we know the object + * we just found is going to be packed, so break + * out of the search loop now. + */ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. + * As a side effect of this check, if object's pack entry was not already found, + * we will find the packed version of this object, if any. We therefore pass + * out the pack information to avoid having to look it up again later. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +990,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present we will determine the answer right now. 
+ */ + if (*found_pack) { + want = want_found_object(exclude, *found_pack); + if (want != -1) + return want; + } for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(sha1, p); + if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -974,31 +1021,9 @@ static int want_object_in_pack(const unsigned char *sha1, *found_offset = offset; *found_pack = p; } - if (exclude) - return 1; - if (incremental) - return 0; - - /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the loop to return 1 now. 
- */ - if (!ignore_packed_keep && - (!local || !have_non_local_packs)) - break; - - if (local && !p->pack_local) - return 0; - if (ignore_packed_keep && p->pack_local && p->pack_keep) - return 0; + want = want_found_object(exclude, p); + if (want != -1) + return want; } } @@ -1039,8 +1064,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1098,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..e71caa4 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -7,6 +7,19 @@ objpath () { echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" } +# show objects present in pack ($1 should be associated *.idx) +packobjects () { + git show-index <$1 | cut -d' ' -f2 +} + +# hasany pattern-file content-file +# tests whether content-file has any entry from pattern-file with entries being +# whole lines. 
+hasany () { + # NOTE `grep -f` is not portable + git grep --no-index -qFf $1 $2 +} + test_expect_success 'setup repo with moderate-sized history' ' for i in $(test_seq 1 10); do test_commit $i @@ -16,6 +29,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +132,86 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + mkdir -p alt_objects/pack && + echo $(pwd)/alt_objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + packobjects 1.idx >1.objects && + printf "$objsha1\n$blob\n" >nonlocal-loose && + if hasany nonlocal-loose 1.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + objsha2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "$objsha2\n$bitmaptip\n" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + touch .git/objects/pack/pack2-$pack2.keep && + rm $(objpath $objsha2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git 
index-pack 2a.pack && + packobjects 2a.idx >2a.objects && + if hasany keepobjects 2a.objects; then + echo "Object from .keeped pack present in pack generated with --honor-pack-keep" + return 1 + fi +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + packobjects 2b.idx >2b.objects && + if hasany keepobjects 2b.objects; then + echo "Non-local object present in pack generated with --local" + return 1 + fi +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + packobjects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + touch .git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + packobjects 3a.idx >3a.objects && + if hasany packbitmap.objects 3a.objects; then + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" + return 1 + fi && + rm .git/objects/pack/$packbitmap.keep +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + packobjects 3b.idx >3b.objects && + if hasany packbitmap.objects 3b.objects; then + echo "Non-local object from bitmapped pack present in pack generated with --local" + return 1 + fi && + mv alt_objects/pack/$packbitmap.* .git/objects/pack/ +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +237,23 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 
'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + packobjects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + if !hasany objects 4.objects; then + echo "Expected objects not present in incremental pack" + return 1 + fi +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
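The order of checks in want_found_object() from the patch above can be
modelled outside of git. What follows is only a sketch: it renders the
same decision table as a shell function, and the positional 0/1 flag
arguments are a hypothetical calling convention invented for this
illustration, not anything git provides.

```shell
# Illustrative model of the decision order in want_found_object().
# Arguments are 0/1 flags (hypothetical convention, not part of git):
#   $1 exclude   $2 incremental   $3 local   $4 ignore_packed_keep
#   $5 have_non_local_packs   $6 pack_local   $7 pack_keep
# Prints 1 (want), 0 (do not want) or -1 (undecided, check other packs).
want_found_object () {
	exclude=$1 incremental=$2 local_only=$3 ignore_keep=$4
	have_non_local=$5 pack_local=$6 pack_keep=$7

	if test "$exclude" = 1
	then
		echo 1
	elif test "$incremental" = 1
	then
		echo 0
	elif test "$ignore_keep" = 0 &&
		{ test "$local_only" = 0 || test "$have_non_local" = 0; }
	then
		# neither option can exclude the object: decided at once
		echo 1
	elif test "$local_only" = 1 && test "$pack_local" = 0
	then
		echo 0
	elif test "$ignore_keep" = 1 && test "$pack_local" = 1 &&
		test "$pack_keep" = 1
	then
		echo 0
	else
		# this pack does not settle it; other packs must be checked
		printf '%s\n' "-1"
	fi
}

# no options active: the found pack settles the answer immediately
want_found_object 0 0 0 0 0 1 0
# --local with a local, non-.keep pack: undecided, keep looking
want_found_object 0 0 1 0 1 1 0
```

The third branch is the fast path the commit message relies on: with
neither --local nor --honor-pack-keep in effect, the pack that is
already known answers the question without walking the other packs.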
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
  2016-08-08 18:19 ` Kirill Smelkov
  2016-08-08 18:57 ` [PATCH v3] " Kirill Smelkov
@ 2016-08-08 19:26 ` Junio C Hamano
  2016-08-09 11:21 ` Kirill Smelkov
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-08-08 19:26 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> ---- 8< ----
> From: Kirill Smelkov <kirr@nexedi.com>
> Subject: [PATCH v3] pack-objects: Teach --use-bitmap-index codepath to respect
>  --local, --honor-pack-keep and --incremental

(Not a question to Kirill) Hmph.  I suspect that handling of in-body
header by mailinfo is not prepared to see RFC2822 header folding.
"am -c" gives a single line subject with " --local ..." as its first
line in the body.  I'll leave it as a low-hanging fruit for somebody
to fix ;-)

    Subject: pack-objects: respect --local, etc. when bitmap is in use

might be shorter and more to the point, anyway.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index c4c2a3c..e06c1bf 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -944,13 +944,45 @@ static int have_duplicate_entry(const unsigned char *sha1,
>  	return 1;
>  }
>
> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> +	if (exclude)
> +		return 1;
> +	if (incremental)
> +		return 0;
> +
> +	/*
> +	 * When asked to do --local (do not include an
> +	 * object that appears in a pack we borrow
> +	 * from elsewhere) or --honor-pack-keep (do not
> +	 * include an object that appears in a pack marked
> +	 * with .keep), we need to make sure no copy of this
> +	 * object come from in _any_ pack that causes us to
> +	 * omit it, and need to complete this loop. When
> +	 * neither option is in effect, we know the object
> +	 * we just found is going to be packed, so break
> +	 * out of the search loop now.
> +	 */

The blame is mine, but "no copy of this object appears in _any_ pack"
would be more correct and easier to read.

This code is no longer in a search loop; its caller is.  Further
rephrasing is needed.  "When asked to do ...these things..., finding
a pack that matches the criteria is sufficient for us to decide to
omit it.  However, even if this pack does not satisfy the criteria,
we need to make sure no copy of this object appears in _any_ pack
that makes us omit the object, so we need to check all the packs.
Signal that by returning -1 to the caller." or something along that
line.

>  /*
>   * Check whether we want the object in the pack (e.g., we do not want
>   * objects found in non-local stores if the "--local" option was used).
>   *
> - * As a side effect of this check, we will find the packed version of this
> - * object, if any. We therefore pass out the pack information to avoid having
> - * to look it up again later.
> + * As a side effect of this check, if object's pack entry was not already found,
> + * we will find the packed version of this object, if any. We therefore pass
> + * out the pack information to avoid having to look it up again later.

The reasoning leading to "We therefore" is understandable, but "pass
out the pack information" is not quite.  Is this meant to explain
the fact that *found_pack and *found_offset are in-out parameters?

The explanation to justify why *found_pack and *found_offset that
used to be out parameters are made in-out parameters belongs to the
log message.  We do not want this in-code comment to explain the
updated code relative to what the code used to do; that is not
useful to those who read the code for the first time in the context
of the committed state.

    /*
     * Check whether we want to pack the object in the pack (e.g. ...).
     *
     * If the caller already knows an existing pack it wants to
     * take the object from, that is passed in *found_pack and
     * *found_offset; otherwise this function finds if there is
     * any pack that has the object and returns the pack and its
     * offset in these variables.
     */

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..e71caa4 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -7,6 +7,19 @@ objpath () {
>  	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
>  }
>
> +# show objects present in pack ($1 should be associated *.idx)
> +packobjects () {
> +	git show-index <$1 | cut -d' ' -f2
> +}

That is a misleading name for a helper function that produces a list
of objects that were packed.  "list_packed_objects", perhaps.

> +# hasany pattern-file content-file
> +# tests whether content-file has any entry from pattern-file with entries being
> +# whole lines.
> +hasany () {
> +	# NOTE `grep -f` is not portable
> +	git grep --no-index -qFf $1 $2
> +}

I doubt "grep -f pattern_file" is not portable, but in any case, it
is probably a good idea to have this helper function to make the
caller easier to read.  Please name it "has_any", though, and quote
"$1" and "$2" as they are meant to be able to take any filename.

> +test_expect_success 'pack-objects respects --local (non-local loose)' '
> +	mkdir -p alt_objects/pack &&

I'd really really prefer to see an empty repository created for
this.  Even though the original intent was .git/objects/ alone,
i.e. GIT_OBJECT_DIRECTORY can exist without associated refs, we
discovered that it is in general not a good idea (think: "gc").

> +	echo $(pwd)/alt_objects >.git/objects/info/alternates &&
> +	echo content1 >file1 &&
> +	# non-local loose object which is not present in bitmapped pack
> +	objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) &&

Don't say "sha" when you mean "object name".  Otherwise you would
end up introducing funky variable names like $objsha2 we see below
that is confusing (we don't use SHA-2).

> +	# non-local loose object which is also present in bitmapped pack
> +	git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin &&
> +	git add file1 &&
> +	test_tick &&
> +	git commit -m commit_file1 &&
> +	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
> +	git index-pack 1.pack &&
> +	packobjects 1.idx >1.objects &&
> +	printf "$objsha1\n$blob\n" >nonlocal-loose &&

I think Peff meant to suggest this instead:

	printf "%s\n" "$objsha1" "$blob"

> +	if hasany nonlocal-loose 1.objects; then
> +		echo "Non-local object present in pack generated with --local"
> +		return 1
> +	fi

Just saying

	! has_any nonlocal-loose 1.objects

is sufficient.  Same comment for all other uses of this verbose
output.

Besides, we spell "if/then/fi" like this:

	if condition
	then
		body
	fi

without a semicolon.

> +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
> +...
> +	touch .git/objects/pack/pack2-$pack2.keep &&

Please don't do "touch" _unless_ you care about the timestamp of the
file.  Redirect an empty command into it, i.e.

	>.git/objects/pack/pack2-$pack2.keep

or

	echo "reason to keep it" >.git/objects/pack/pack2-$pack2.keep

instead.
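The helper, quoting and printf suggestions above can be tried outside
the test suite.  A minimal sketch, assuming plain POSIX grep rather
than "git grep --no-index", with made-up object names and a scratch
directory from mktemp; the -x (whole line) flag is an addition of this
sketch, not something proposed in the thread:

```shell
# has_any PATTERN-FILE CONTENT-FILE
# Succeeds when at least one line of PATTERN-FILE appears as a whole
# line of CONTENT-FILE. Both arguments are quoted so that any filename
# works. -F: fixed strings, -x: whole-line match, -f: patterns from file.
has_any () {
	grep -qxFf "$1" "$2"
}

tmp=$(mktemp -d)

# Hypothetical object names standing in for real 40-hex names.
altblob=1111111111111111111111111111111111111111
blob=2222222222222222222222222222222222222222

# A constant "%s\n" format is reused for each argument, so the values
# themselves are never interpreted as a format string.
printf "%s\n" "$altblob" "$blob" >"$tmp/nonlocal-loose"
printf "%s\n" 3333333333333333333333333333333333333333 >"$tmp/1.objects"

if has_any "$tmp/nonlocal-loose" "$tmp/1.objects"
then
	echo "non-local object present"
else
	echo "no non-local object present"
fi
```

Inside a test body, the whole if/then/fi collapses to a single
"! has_any nonlocal-loose 1.objects &&" step in the && chain.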
> +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
> +	ls .git/objects/pack/ | grep bitmap >output &&
> +	test_line_count = 1 output &&
> +	packbitmap=$(basename $(cat output) .bitmap) &&
> +	packobjects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
> +	touch .git/objects/pack/$packbitmap.keep &&
> +	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
> +	git index-pack 3a.pack &&
> +	packobjects 3a.idx >3a.objects &&
> +	if hasany packbitmap.objects 3a.objects; then
> +		echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep"
> +		return 1
> +	fi &&
> +	rm .git/objects/pack/$packbitmap.keep

Arrange this removal to happen even when any earlier step fails, so
that later tests will not get affected by stray existence of this
file, by using test_when_finished.  E.g.

	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
	>.git/objects/pack/$packbitmap.keep &&

> +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
> +	mv .git/objects/pack/$packbitmap.* alt_objects/pack/ &&
> +	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
> +	git index-pack 3b.pack &&
> +	packobjects 3b.idx >3b.objects &&
> +	if hasany packbitmap.objects 3b.objects; then
> +		echo "Non-local object from bitmapped pack present in pack generated with --local"
> +		return 1
> +	fi &&
> +	mv alt_objects/pack/$packbitmap.* .git/objects/pack/

Ditto on potential use of test_when_finished.
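The point of registering the cleanup before creating the file can be
seen in plain shell as well.  test_when_finished itself comes from
git's test-lib.sh and is not available outside the test suite, so the
sketch below uses an EXIT trap in a subshell as a rough analogy; the
run_step helper, the scratch directory and the file name are all
hypothetical.

```shell
packdir=$(mktemp -d)
keep=$packdir/pack-example.keep

# run_step CMD...: create the .keep file, run CMD, and remove the file
# again whether CMD succeeds or fails.
run_step () {
	(
		# register the cleanup first, then create the file
		trap 'rm -f "$keep"' EXIT
		>"$keep" &&
		test -f "$keep" &&
		"$@"
	)
}

for cmd in true false
do
	if run_step "$cmd"
	then
		result=ok
	else
		result=failed
	fi
	if test -f "$keep"
	then
		echo "$cmd: $result, keep file left behind"
	else
		echo "$cmd: $result, keep file removed"
	fi
done
```

With a trailing "rm" as the last step of the && chain, the "false"
case would leave the .keep file behind and affect whatever runs next,
which is the situation the review asks to avoid.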
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
  2016-08-08 19:26 ` [PATCH 1/2] " Junio C Hamano
@ 2016-08-09 11:21 ` Kirill Smelkov
  2016-08-09 11:25 ` [PATCH 1/2 v4] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov
  2016-08-09 16:52 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Junio C Hamano
  0 siblings, 2 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-09 11:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Junio, first of all thanks for feedback,

On Mon, Aug 08, 2016 at 12:26:33PM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:

[...]

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index c4c2a3c..e06c1bf 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -944,13 +944,45 @@ static int have_duplicate_entry(const unsigned char *sha1,
> >  	return 1;
> >  }
> >
> > +static int want_found_object(int exclude, struct packed_git *p)
> > +{
> > +	if (exclude)
> > +		return 1;
> > +	if (incremental)
> > +		return 0;
> > +
> > +	/*
> > +	 * When asked to do --local (do not include an
> > +	 * object that appears in a pack we borrow
> > +	 * from elsewhere) or --honor-pack-keep (do not
> > +	 * include an object that appears in a pack marked
> > +	 * with .keep), we need to make sure no copy of this
> > +	 * object come from in _any_ pack that causes us to
> > +	 * omit it, and need to complete this loop. When
> > +	 * neither option is in effect, we know the object
> > +	 * we just found is going to be packed, so break
> > +	 * out of the search loop now.
> > +	 */
>
> The blame is mine, but "no copy of this object appears in _any_ pack"
> would be more correct and easier to read.
>
> This code is no longer in a search loop; its caller is.  Further
> rephrasing is needed.  "When asked to do ...these things..., finding
> a pack that matches the criteria is sufficient for us to decide to
> omit it.  However, even if this pack does not satisfy the criteria,
> we need to make sure no copy of this object appears in _any_ pack
> that makes us omit the object, so we need to check all the packs.
> Signal that by returning -1 to the caller." or something along that
> line.

Ok, I've rephrased it your way. Thanks for advising.

> >  /*
> >   * Check whether we want the object in the pack (e.g., we do not want
> >   * objects found in non-local stores if the "--local" option was used).
> >   *
> > - * As a side effect of this check, we will find the packed version of this
> > - * object, if any. We therefore pass out the pack information to avoid having
> > - * to look it up again later.
> > + * As a side effect of this check, if object's pack entry was not already found,
> > + * we will find the packed version of this object, if any. We therefore pass
> > + * out the pack information to avoid having to look it up again later.
>
> The reasoning leading to "We therefore" is understandable, but "pass
> out the pack information" is not quite.  Is this meant to explain
> the fact that *found_pack and *found_offset are in-out parameters?
>
> The explanation to justify why *found_pack and *found_offset that
> used to be out parameters are made in-out parameters belongs to the
> log message.  We do not want this in-code comment to explain the
> updated code relative to what the code used to do; that is not
> useful to those who read the code for the first time in the context
> of the committed state.
>
>     /*
>      * Check whether we want to pack the object in the pack (e.g. ...).
> * > * If the caller already knows an existing pack it wants to > * take the object from, that is passed in *found_pack and > * *found_offset; otherwise this function finds if there is > * any pack that has the object and returns the pack and its > * offset in these variables. > */ The "pass out the pack information ..." is not my text - I only added "if object's pack entry was not already found" in the middle of the sentence and rewrapped this paragraph. The "pass out the pack information ..." comes from ce2bc424 (pack-objects: split add_object_entry; 2013-12-21) I agree your text is more clear and it is better to adjust the comments. > > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > > index 3893afd..e71caa4 100755 > > --- a/t/t5310-pack-bitmaps.sh > > +++ b/t/t5310-pack-bitmaps.sh > > @@ -7,6 +7,19 @@ objpath () { > > echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" > > } > > > > +# show objects present in pack ($1 should be associated *.idx) > > +packobjects () { > > + git show-index <$1 | cut -d' ' -f2 > > +} > > That is a misleading name for a helper function that produces a list > of objects that were packed. "list_packed_objects", perhaps. I agree it is ambiguous wrt `git pack-objects` and sorry for choosing not good name from the start. I'm changing it to pack_list_objects(). ( personally I would use pack_obj_list a-la git-rev-list, but let's try not to create another review step because of abbreviate vs not-abbreviate ) > > +# hasany pattern-file content-file > > +# tests whether content-file has any entry from pattern-file with entries being > > +# whole lines. > > +hasany () { > > + # NOTE `grep -f` is not portable > > + git grep --no-index -qFf $1 $2 > > +} > > I doubt "grep -f pattern_file" is not portable, but in any case, it > is probably a good idea to have this helper function to make the > caller easier to read. 
Please name it "has_any", though, and quote > "$1" and "$2" as they are meant to be able to take any filename. Ok, thanks for the info `grep -f` is portable > > +test_expect_success 'pack-objects respects --local (non-local loose)' ' > > + mkdir -p alt_objects/pack && > > I'd really really prefer to see an empty repository created for > this. Even though the original intent was .git/objects/ alone, > i.e. GIT_OBJECT_DIRECTORY can exist without associated refs, we > discovered that it is in general not a good idea (think: "gc"). The "mkdir alt_objects/" comes from t7700-repack.sh - e.g. from 3c3df429 which I've ported from there. However as you say let's switch this to having full another repo. > > + echo $(pwd)/alt_objects >.git/objects/info/alternates && > > + echo content1 >file1 && > > + # non-local loose object which is not present in bitmapped pack > > + objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && > > Don't say "sha" when you mean "object name". Otherwise you would > end up introducing funky variable names like $objsha2 we see below > that is confusing (we don't use SHA-2). Ok makes sense, I've changed objsha1 to altblob and objsha2 to blob2. Thanks for head-ups on this. > > + # non-local loose object which is also present in bitmapped pack > > + git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && > > + git add file1 && > > + test_tick && > > + git commit -m commit_file1 && > > + echo HEAD | git pack-objects --local --stdout --revs >1.pack && > > + git index-pack 1.pack && > > + packobjects 1.idx >1.objects && > > + printf "$objsha1\n$blob\n" >nonlocal-loose && > > I think Peff meant to suggest this instead: > > printf "%s\n" "$objsha1" "$blob" Oops, yes, my bad. Corrected. > > + if hasany nonlocal-loose 1.objects; then > > + echo "Non-local object present in pack generated with --local" > > + return 1 > > + fi > > Just saying > > ! has_any nonlocal-loose 1.objects > > is sufficient. 
Same comment for all other uses of these verbose > output. > > Besides, we spell "if/then/fi" like this: > > if condition > then > body > fi > > without a semicolon. I initially copied this check-templates from t7700-repack.sh, e.g. from 3289b9de (t7700: test that 'repack -a' packs alternate packed objects; 2008-11-13) and other places. But ok, let's switch the checks to oneliners like "! has_any ..." > > +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' > > +... > > + touch .git/objects/pack/pack2-$pack2.keep && > > Please don't do "touch" _unless_ you care about the timestamp of the > file. Redirect an empty command into it, i.e. > > >.git/objects/pack/pack2-$pack2.keep > > or > > echo "reason to keep it" >.git/objects/pack/pack2-$pack2.keep > > instead. Ok, I've changed to >file as the reason here is obvious. Would you please explain why we should not use touch if we do not care about timestamps? Simply style? > > +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' > > + ls .git/objects/pack/ | grep bitmap >output && > > + test_line_count = 1 output && > > + packbitmap=$(basename $(cat output) .bitmap) && > > + packobjects .git/objects/pack/$packbitmap.idx >packbitmap.objects && > > + touch .git/objects/pack/$packbitmap.keep && > > + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && > > + git index-pack 3a.pack && > > + packobjects 3a.idx >3a.objects && > > + if hasany packbitmap.objects 3a.objects; then > > + echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" > > + return 1 > > + fi && > > + rm .git/objects/pack/$packbitmap.keep > > Arrange this removal to happen even when any earlier step fails, so > that later tests will not get affected by stray existence of this > file, by using test_when_finished. E.g. 
> > list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && > test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && > >.git/objects/pack/$packbitmap.keep" && Ok, I did not knew about test_when_finished, and thanks for pointing this out. Adjusted here and in similar place. Will send v4 patch as reply to this mail with below interdiff: Thanks again, Kirill ---- 8< ---- (interdiff) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 4c129bd..c92d7fc 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -953,16 +953,14 @@ static int want_found_object(int exclude, struct packed_git *p) return 0; /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the search loop now. + * When asked to do --local (do not include an object that appears in a + * pack we borrow from elsewhere) or --honor-pack-keep (do not include + * an object that appears in a pack marked with .keep), finding a pack + * that matches the criteria is sufficient for us to decide to omit it. + * However, even if this pack does not satisfy the criteria, we need to + * make sure no copy of this object appears in _any_ pack that makes us + * to omit the object, so we need to check all the packs. Signal that by + * returning -1 to the caller. */ if (!ignore_packed_keep && (!local || !have_non_local_packs)) @@ -981,9 +979,10 @@ static int want_found_object(int exclude, struct packed_git *p) * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). 
* - * As a side effect of this check, if object's pack entry was not already found, - * we will find the packed version of this object, if any. We therefore pass - * out the pack information to avoid having to look it up again later. + * If the caller already knows an existing pack it wants to take the object + * from, that is passed in *found_pack and *found_offset; otherwise this + * function finds if there is any pack that has the object and returns the pack + * and its offset in these variables. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index cce95d8..44914ac 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -8,16 +8,15 @@ objpath () { } # show objects present in pack ($1 should be associated *.idx) -packobjects () { - git show-index <$1 | cut -d' ' -f2 +pack_list_objects () { + git show-index <"$1" | cut -d' ' -f2 } -# hasany pattern-file content-file +# has_any pattern-file content-file # tests whether content-file has any entry from pattern-file with entries being # whole lines. 
-hasany () { - # NOTE `grep -f` is not portable - git grep --no-index -qFf $1 $2 +has_any () { + grep -qFf "$1" "$2" } test_expect_success 'setup repo with moderate-sized history' ' @@ -133,83 +132,68 @@ test_expect_success 'incremental repack can disable bitmaps' ' ' test_expect_success 'pack-objects respects --local (non-local loose)' ' - mkdir -p alt_objects/pack && - echo $(pwd)/alt_objects >.git/objects/info/alternates && + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && echo content1 >file1 && # non-local loose object which is not present in bitmapped pack - objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) && + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && # non-local loose object which is also present in bitmapped pack - git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin && + git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && git add file1 && test_tick && git commit -m commit_file1 && echo HEAD | git pack-objects --local --stdout --revs >1.pack && git index-pack 1.pack && - packobjects 1.idx >1.objects && - printf "$objsha1\n$blob\n" >nonlocal-loose && - if hasany nonlocal-loose 1.objects; then - echo "Non-local object present in pack generated with --local" - return 1 - fi + pack_list_objects 1.idx >1.objects && + printf "%s\n" "$altblob" "$blob" >nonlocal-loose && + ! 
has_any nonlocal-loose 1.objects ' test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' echo content2 >file2 && - objsha2=$(git hash-object -w file2) && + blob2=$(git hash-object -w file2) && git add file2 && test_tick && git commit -m commit_file2 && - printf "$objsha2\n$bitmaptip\n" >keepobjects && + printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && pack2=$(git pack-objects pack2 <keepobjects) && mv pack2-$pack2.* .git/objects/pack/ && - touch .git/objects/pack/pack2-$pack2.keep && - rm $(objpath $objsha2) && + >.git/objects/pack/pack2-$pack2.keep && + rm $(objpath $blob2) && echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && git index-pack 2a.pack && - packobjects 2a.idx >2a.objects && - if hasany keepobjects 2a.objects; then - echo "Object from .keeped pack present in pack generated with --honor-pack-keep" - return 1 - fi + pack_list_objects 2a.idx >2a.objects && + ! has_any keepobjects 2a.objects ' test_expect_success 'pack-objects respects --local (non-local pack)' ' - mv .git/objects/pack/pack2-$pack2.* alt_objects/pack/ && + mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && echo HEAD | git pack-objects --local --stdout --revs >2b.pack && git index-pack 2b.pack && - packobjects 2b.idx >2b.objects && - if hasany keepobjects 2b.objects; then - echo "Non-local object present in pack generated with --local" - return 1 - fi + pack_list_objects 2b.idx >2b.objects && + ! 
has_any keepobjects 2b.objects ' test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' ls .git/objects/pack/ | grep bitmap >output && test_line_count = 1 output && packbitmap=$(basename $(cat output) .bitmap) && - packobjects .git/objects/pack/$packbitmap.idx >packbitmap.objects && - touch .git/objects/pack/$packbitmap.keep && + pack_list_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && + >.git/objects/pack/$packbitmap.keep && echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && git index-pack 3a.pack && - packobjects 3a.idx >3a.objects && - if hasany packbitmap.objects 3a.objects; then - echo "Object from .keeped bitmapped pack present in pack generated with --honour-pack-keep" - return 1 - fi && - rm .git/objects/pack/$packbitmap.keep + pack_list_objects 3a.idx >3a.objects && + ! has_any packbitmap.objects 3a.objects ' test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' - mv .git/objects/pack/$packbitmap.* alt_objects/pack/ && + mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && echo HEAD | git pack-objects --local --stdout --revs >3b.pack && git index-pack 3b.pack && - packobjects 3b.idx >3b.objects && - if hasany packbitmap.objects 3b.objects; then - echo "Non-local object from bitmapped pack present in pack generated with --local" - return 1 - fi && - mv alt_objects/pack/$packbitmap.* .git/objects/pack/ + pack_list_objects 3b.idx >3b.objects && + ! 
has_any packbitmap.objects 3b.objects ' test_expect_success 'pack-objects to file can use bitmap' ' @@ -256,14 +240,11 @@ test_expect_success 'pack-objects respects --incremental' ' EOF git pack-objects --incremental --stdout --revs <revs2 >4.pack && git index-pack 4.pack && - packobjects 4.idx >4.objects && + pack_list_objects 4.idx >4.objects && test_line_count = 4 4.objects && git rev-list --objects $commit >revlist && cut -d" " -f1 revlist |sort >objects && - if !hasany objects 4.objects; then - echo "Expected objects not present in incremental pack" - return 1 - fi + test_cmp 4.objects objects ' test_expect_success 'pack with missing blob' ' ^ permalink raw reply related [flat|nested] 62+ messages in thread
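Two of the review points in the message above, creating the .keep file with a plain redirect instead of "touch" and removing it even when an earlier step fails, can be tried outside git's test harness. This is only a sketch: test_when_finished exists only inside git's test library, so a subshell with an EXIT trap is swapped in here to play the same role. The function name run_with_keep_file and the simulated failing step are hypothetical and appear nowhere in the patch.

```shell
# Hypothetical stand-in for a test body; not part of t5310-pack-bitmaps.sh.
# Outside git's test-lib, test_when_finished is unavailable, so a subshell
# with an EXIT trap is used here as a substitute.
run_with_keep_file () {
	keep=$1
	(
		# Register the cleanup first, so it runs however the subshell exits.
		trap 'rm -f "$keep"' EXIT
		# Create an empty file; unlike "touch", a bare redirect does not
		# suggest that the timestamp of the file matters.
		>"$keep" &&
		test -f "$keep" &&
		# Simulate a later step that fails.
		false
	)
}
```

Calling run_with_keep_file with a path returns non-zero because of the simulated failure, and the file is gone afterwards. Inside the real test script the same ordering applies: register the cleanup before creating the file.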
* [PATCH 1/2 v4] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use 2016-08-09 11:21 ` Kirill Smelkov @ 2016-08-09 11:25 ` Kirill Smelkov 2016-08-09 16:52 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Junio C Hamano 1 sibling, 0 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-08-09 11:25 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. 
In particular, as p5310-pack-bitmaps.sh shows, we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% with all differences strangely showing we are a bit faster now, but probably all being within noise. And in the general case we care not to have duplicate find_pack_entry_one(*found_pack) calls. Worst what can happen is we can call want_found_object(*found_pack) -- newly introduced helper for checking whether we want object -- twice, but since want_found_object() is very lightweight it does not make any difference. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- builtin/pack-objects.c | 93 +++++++++++++++++++++++++++++++------------------ t/t5310-pack-bitmaps.sh | 92 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 152 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..b1007f2 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,44 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* + * When asked to do --local (do not include an object that appears in a + * pack we borrow from elsewhere) or --honor-pack-keep (do not include + * an object that appears in a pack marked with .keep), finding a pack + * that matches the criteria is sufficient for us to decide to omit it. 
+ * However, even if this pack does not satisfy the criteria, we need to + * make sure no copy of this object appears in _any_ pack that makes us + * to omit the object, so we need to check all the packs. Signal that by + * returning -1 to the caller. + */ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. + * If the caller already knows an existing pack it wants to take the object + * from, that is passed in *found_pack and *found_offset; otherwise this + * function finds if there is any pack that has the object and returns the pack + * and its offset in these variables. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +989,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present we will determine the answer right now. 
+ */ + if (*found_pack) { + want = want_found_object(exclude, *found_pack); + if (want != -1) + return want; + } for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(sha1, p); + if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -974,31 +1020,9 @@ static int want_object_in_pack(const unsigned char *sha1, *found_offset = offset; *found_pack = p; } - if (exclude) - return 1; - if (incremental) - return 0; - - /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the loop to return 1 now. 
- */ - if (!ignore_packed_keep && - (!local || !have_non_local_packs)) - break; - - if (local && !p->pack_local) - return 0; - if (ignore_packed_keep && p->pack_local && p->pack_keep) - return 0; + want = want_found_object(exclude, p); + if (want != -1) + return want; } } @@ -1039,8 +1063,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..a50d867 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -7,6 +7,18 @@ objpath () { echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" } +# show objects present in pack ($1 should be associated *.idx) +pack_list_objects () { + git show-index <"$1" | cut -d' ' -f2 +} + +# has_any pattern-file content-file +# tests whether content-file has any entry from pattern-file with entries being +# whole lines. 
+has_any () { + grep -qFf "$1" "$2" +} + test_expect_success 'setup repo with moderate-sized history' ' for i in $(test_seq 1 10); do test_commit $i @@ -16,6 +28,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +131,71 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + pack_list_objects 1.idx >1.objects && + printf "%s\n" "$altblob" "$blob" >nonlocal-loose && + ! has_any nonlocal-loose 1.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + blob2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + >.git/objects/pack/pack2-$pack2.keep && + rm $(objpath $blob2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + pack_list_objects 2a.idx >2a.objects && + ! 
has_any keepobjects 2a.objects +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + pack_list_objects 2b.idx >2b.objects && + ! has_any keepobjects 2b.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + pack_list_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && + >.git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + pack_list_objects 3a.idx >3a.objects && + ! has_any packbitmap.objects 3a.objects +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + pack_list_objects 3b.idx >3b.objects && + ! 
has_any packbitmap.objects 3b.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +221,20 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + pack_list_objects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + test_cmp 4.objects objects +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
  2016-08-09 11:21 ` Kirill Smelkov
  2016-08-09 11:25   ` [PATCH 1/2 v4] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov
@ 2016-08-09 16:52   ` Junio C Hamano
  2016-08-09 19:29     ` Kirill Smelkov
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-08-09 16:52 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> Would you please explain why we should not use touch if we do not care
> about timestamps? Simply style?

To help readers.

"touch A" forces the readers to wonder "does the timestamp of A
matter, and if so in what way?" and "does any later test care what
is _in_ A, and if so in what way?"  Both questions waste the readers'
time when there is no reason why "touch" should have been used.

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index cce95d8..44914ac 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -8,16 +8,15 @@ objpath () {
>  }
>  
>  # show objects present in pack ($1 should be associated *.idx)
> -packobjects () {
> -	git show-index <$1 | cut -d' ' -f2
> +pack_list_objects () {
> +	git show-index <"$1" | cut -d' ' -f2
>  }

pack-list-objects still sounds as if you are packing "list objects",
though.  If you are listing packed objects (or objects in a pack),
list-packed-objects (or list-objects-in-pack) reads clearer and more
to the point, at least to me.

> -# hasany pattern-file content-file
> +# has_any pattern-file content-file
>  # tests whether content-file has any entry from pattern-file with entries being
>  # whole lines.
> -hasany () {
> -	# NOTE `grep -f` is not portable
> -	git grep --no-index -qFf $1 $2
> +has_any () {
> +	grep -qFf "$1" "$2"

Omitting "-q" would help those who have to debug breakage in this
test or the code that this test checks.  What test_expect_success
outputs is not shown by default, and running the test script with
"-v" would show them as a debugging aid.

Thanks.

^ permalink raw reply [flat|nested] 62+ messages in thread
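The suggestion to drop "-q" can be illustrated with a small standalone sketch. It assumes plain POSIX grep and uses made-up short strings in place of real object names; the demo_dir variable and the file names unwanted, packed and clean are hypothetical and serve only this illustration.

```shell
# has_any pattern-file content-file
# Succeeds when content-file contains at least one entry from pattern-file.
# Without -q, grep prints the matching entries, so a verbose test run
# shows which entries caused the match.
has_any () {
	grep -Ff "$1" "$2"
}

# Hypothetical sample data; the strings stand in for object names.
demo_dir=$(mktemp -d)
printf "%s\n" aaa111 bbb222 >"$demo_dir/unwanted"
printf "%s\n" ccc333 bbb222 >"$demo_dir/packed"
printf "%s\n" ddd444 >"$demo_dir/clean"
```

With this data, has_any "$demo_dir/unwanted" "$demo_dir/packed" succeeds and prints the shared entry, while has_any "$demo_dir/unwanted" "$demo_dir/clean" prints nothing and fails, which is the outcome the "! has_any ..." checks in the tests rely on. Note that -F matches fixed strings anywhere in a line; since object names all have the same length this coincides with whole-line matching here, and adding -x would be one way to make that explicit.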
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental
  2016-08-09 16:52 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Junio C Hamano
@ 2016-08-09 19:29   ` Kirill Smelkov
  2016-08-09 19:31     ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov
                       ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-09 19:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Tue, Aug 09, 2016 at 09:52:18AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
> 
> > Would you please explain why we should not use touch if we do not care
> > about timestamps? Simply style?
> 
> To help readers.
> 
> "touch A" forces the readers to wonder "does the timestamp of A
> matter, and if so in what way?" and "does any later test care what
> is _in_ A, and if so in what way?"  Both questions waste the readers'
> time when there is no reason why "touch" should have been used.

I see, thanks for explaining. I used to read it a bit the other way;
maybe it is just an environment difference.

> > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> > index cce95d8..44914ac 100755
> > --- a/t/t5310-pack-bitmaps.sh
> > +++ b/t/t5310-pack-bitmaps.sh
> > @@ -8,16 +8,15 @@ objpath () {
> >  }
> >  
> >  # show objects present in pack ($1 should be associated *.idx)
> > -packobjects () {
> > -	git show-index <$1 | cut -d' ' -f2
> > +pack_list_objects () {
> > +	git show-index <"$1" | cut -d' ' -f2
> >  }
> 
> pack-list-objects still sounds as if you are packing "list objects",
> though.  If you are listing packed objects (or objects in a pack),
> list-packed-objects (or list-objects-in-pack) reads clearer and more
> to the point, at least to me.

Ok, let it be list_packed_objects().

> > -# hasany pattern-file content-file
> > +# has_any pattern-file content-file
> >  # tests whether content-file has any entry from pattern-file with entries being
> >  # whole lines.
> > -hasany () {
> > -	# NOTE `grep -f` is not portable
> > -	git grep --no-index -qFf $1 $2
> > +has_any () {
> > +	grep -qFf "$1" "$2"
> 
> Omitting "-q" would help those who have to debug breakage in this
> test or the code that this test checks.  What test_expect_success
> outputs is not shown by default, and running the test script with
> "-v" would show them as a debugging aid.

Ok, makes sense. Both patches adjusted and will be reposted.

Thanks,
Kirill

^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use 2016-08-09 19:29 ` Kirill Smelkov @ 2016-08-09 19:31 ` Kirill Smelkov 2016-08-18 17:52 ` Jeff King 2016-08-09 19:32 ` [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack Kirill Smelkov 2016-08-09 19:49 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Junio C Hamano 2 siblings, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-08-09 19:31 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. 
In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows, we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% with all differences strangely showing we are a bit faster now, but probably all being within noise. And in the general case we care not to have duplicate find_pack_entry_one(*found_pack) calls. Worst what can happen is we can call want_found_object(*found_pack) -- newly introduced helper for checking whether we want object -- twice, but since want_found_object() is very lightweight it does not make any difference. I appreciate help and discussing this change with Junio C Hamano and Jeff King. 
Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- builtin/pack-objects.c | 93 +++++++++++++++++++++++++++++++------------------ t/t5310-pack-bitmaps.sh | 92 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 152 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..b1007f2 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,44 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* + * When asked to do --local (do not include an object that appears in a + * pack we borrow from elsewhere) or --honor-pack-keep (do not include + * an object that appears in a pack marked with .keep), finding a pack + * that matches the criteria is sufficient for us to decide to omit it. + * However, even if this pack does not satisfy the criteria, we need to + * make sure no copy of this object appears in _any_ pack that makes us + * to omit the object, so we need to check all the packs. Signal that by + * returning -1 to the caller. + */ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. We therefore pass out the pack information to avoid having - * to look it up again later. 
+ * If the caller already knows an existing pack it wants to take the object + * from, that is passed in *found_pack and *found_offset; otherwise this + * function finds if there is any pack that has the object and returns the pack + * and its offset in these variables. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +989,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present we will determine the answer right now. + */ + if (*found_pack) { + want = want_found_object(exclude, *found_pack); + if (want != -1) + return want; + } for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(sha1, p); + if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -974,31 +1020,9 @@ static int want_object_in_pack(const unsigned char *sha1, *found_offset = offset; *found_pack = p; } - if (exclude) - return 1; - if (incremental) - return 0; - - /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the loop to return 1 now. 
- */ - if (!ignore_packed_keep && - (!local || !have_non_local_packs)) - break; - - if (local && !p->pack_local) - return 0; - if (ignore_packed_keep && p->pack_local && p->pack_keep) - return 0; + want = want_found_object(exclude, p); + if (want != -1) + return want; } } @@ -1039,8 +1063,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..a278d30 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -7,6 +7,18 @@ objpath () { echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" } +# show objects present in pack ($1 should be associated *.idx) +list_packed_objects () { + git show-index <"$1" | cut -d' ' -f2 +} + +# has_any pattern-file content-file +# tests whether content-file has any entry from pattern-file with entries being +# whole lines. 
+has_any () { + grep -Ff "$1" "$2" +} + test_expect_success 'setup repo with moderate-sized history' ' for i in $(test_seq 1 10); do test_commit $i @@ -16,6 +28,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +131,71 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + list_packed_objects 1.idx >1.objects && + printf "%s\n" "$altblob" "$blob" >nonlocal-loose && + ! has_any nonlocal-loose 1.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + blob2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + >.git/objects/pack/pack2-$pack2.keep && + rm $(objpath $blob2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + list_packed_objects 2a.idx >2a.objects && + ! 
has_any keepobjects 2a.objects +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + list_packed_objects 2b.idx >2b.objects && + ! has_any keepobjects 2b.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && + >.git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + list_packed_objects 3a.idx >3a.objects && + ! has_any packbitmap.objects 3a.objects +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + list_packed_objects 3b.idx >3b.objects && + ! 
has_any packbitmap.objects 3b.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +221,20 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + list_packed_objects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + test_cmp 4.objects objects +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
  2016-08-09 19:31 ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov
@ 2016-08-18 17:52 ` Jeff King
  2016-09-10 14:57 ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2016-08-18 17:52 UTC
To: Kirill Smelkov
Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote:

> Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
> are two codepaths in pack-objects: with & without using bitmap
> reachability index.

Sorry, I got distracted from reviewing these patches. I'll give them a
detailed look now and hopefully we can finalize the topic.

> In want_object_in_pack() we care to start the checks from already found
> pack, if we have one, this way determining the answer right away
> in case neither --local nor --honour-pack-keep are active. In
> particular, as p5310-pack-bitmaps.sh shows, we do not do harm to
> served-with-bitmap clones performance-wise:
>
>     Test                      56dfeb62          this tree
>     -----------------------------------------------------------------
>     5310.2: repack to disk    9.63(8.67+0.33)   9.47(8.55+0.28) -1.7%
>     5310.3: simulated clone   2.07(2.17+0.12)   2.03(2.14+0.12) -1.9%
>     5310.4: simulated fetch   0.78(1.03+0.02)   0.76(1.00+0.03) -2.6%
>     5310.6: partial bitmap    1.97(2.43+0.15)   1.92(2.36+0.14) -2.5%
>
> with all differences strangely showing we are a bit faster now, but
> probably all being within noise.

Good to know there is no regression. It is curious that there is a
slight _improvement_ across the board. Do we have an explanation for
that? It seems odd that noise would be so consistent.

> And in the general case we care not to have duplicate
> find_pack_entry_one(*found_pack) calls.
> Worst what can happen is we can
> call want_found_object(*found_pack) -- newly introduced helper for
> checking whether we want object -- twice, but since want_found_object()
> is very lightweight it does not make any difference.

I had trouble parsing this. I think maybe:

  In the general case we do not want to call find_pack_entry_one() more
  than once, because it is expensive. This patch splits the loop in
  want_object_in_pack() into two parts: finding the object and seeing if
  it impacts our choice to include it in the pack. We may call the
  inexpensive want_found_object() twice, but we will never call
  find_pack_entry_one() if we do not need to.

> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> +	if (exclude)
> +		return 1;
> +	if (incremental)
> +		return 0;
> +
> +	/*
> +	 * When asked to do --local (do not include an object that appears in a
> +	 * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> +	 * an object that appears in a pack marked with .keep), finding a pack
> +	 * that matches the criteria is sufficient for us to decide to omit it.
> +	 * However, even if this pack does not satisfy the criteria, we need to
> +	 * make sure no copy of this object appears in _any_ pack that makes us
> +	 * to omit the object, so we need to check all the packs. Signal that by
> +	 * returning -1 to the caller.
> +	 */
> +	if (!ignore_packed_keep &&
> +	    (!local || !have_non_local_packs))
> +		return 1;

Hmm. The comment says "-1", but the return says "1". That is because the
comment is describing the return that happens at the end. :)

I wonder if the last sentence should be:

  We can check here whether these options can possibly matter; if not,
  we can return early from the function here. Otherwise, we signal "-1"
  at the end to tell the caller that we do not know either way, and it
  needs to check more packs.

> -	*found_pack = NULL;
> -	*found_offset = 0;
> +	/*
> +	 * If we already know the pack object lives in, start checks from that
> +	 * pack - in the usual case when neither --local was given nor .keep files
> +	 * are present we will determine the answer right now.
> +	 */
> +	if (*found_pack) {
> +		want = want_found_object(exclude, *found_pack);
> +		if (want != -1)
> +			return want;
> +	}

Looks correct. Though it is not really "start checks from..." anymore,
but rather "do a quick check to see if we can quit early, and otherwise
start the loop". That might be nitpicking, though.

>  	for (p = packed_git; p; p = p->next) {
> -		off_t offset = find_pack_entry_one(sha1, p);
> +		off_t offset;
> +
> +		if (p == *found_pack)
> +			offset = *found_offset;
> +		else
> +			offset = find_pack_entry_one(sha1, p);
> +

This hunk will conflict with the MRU optimizations in 'next', but I
think the resolution should be pretty trivial.

>  static int add_object_entry(const unsigned char *sha1, enum object_type type,
>  			    const char *name, int exclude)
>  {
> -	struct packed_git *found_pack;
> -	off_t found_offset;
> +	struct packed_git *found_pack = NULL;
> +	off_t found_offset = 0;

I think technically we don't need to initialize found_offset here (it is
considered only if *found_pack is not NULL), but it doesn't hurt to make
our starting assumptions clear.

> @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
>  	if (have_duplicate_entry(sha1, 0, &index_pos))
>  		return 0;
>
> +	if (!want_object_in_pack(sha1, 0, &pack, &offset))
> +		return 0;
> +

And this caller doesn't need to worry about initialization, because of
course it knows it has a pack/offset already. Good.

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..a278d30 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh

Tests look OK.
I saw a few style nitpicks, but I think they are not even
against our style guide but more "I would have written it like this" and
are not even worth quibbling over.

So I think the code here is fine, and I just had a few minor complaints
on comment and commit message clarity.

-Peff
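The three-way result of want_found_object() that the review above discusses (1, 0, or -1 for "undecided, check more packs") can be modelled as a small decision function. This is only an illustrative shell model of the order of checks quoted in the patch, not git code: the variable names opt_incremental, opt_local and opt_honor_pack_keep are assumptions of this sketch standing in for the C globals incremental, local and ignore_packed_keep, and a pack is reduced to two 0/1 arguments.

```shell
# Option state (names are assumptions of this sketch; see the lead-in).
opt_incremental=0
opt_local=0
opt_honor_pack_keep=0
have_non_local_packs=0

# want_found_object EXCLUDE PACK_LOCAL PACK_KEEP
# Prints 1 (want the object), 0 (do not want it) or -1 (this pack alone
# does not decide; the caller has to look at more packs).
want_found_object () {
	exclude=$1 pack_local=$2 pack_keep=$3

	if test "$exclude" = 1
	then
		printf '%s\n' 1; return
	fi
	if test "$opt_incremental" = 1
	then
		printf '%s\n' 0; return
	fi
	# Early exit: neither option can possibly matter.
	if test "$opt_honor_pack_keep" = 0 &&
	   { test "$opt_local" = 0 || test "$have_non_local_packs" = 0; }
	then
		printf '%s\n' 1; return
	fi
	if test "$opt_local" = 1 && test "$pack_local" = 0
	then
		printf '%s\n' 0; return
	fi
	if test "$opt_honor_pack_keep" = 1 &&
	   test "$pack_local" = 1 && test "$pack_keep" = 1
	then
		printf '%s\n' 0; return
	fi
	# we don't know yet; keep looking for more packs
	printf '%s\n' -1
}

# With no options set, any pack that has the object settles the question.
echo "defaults, local pack: $(want_found_object 0 1 0)"

# With --local and alternates present, a local pack is not conclusive,
# while a borrowed pack makes us omit the object.
opt_local=1 have_non_local_packs=1
echo "--local, local pack: $(want_found_object 0 1 0)"
echo "--local, non-local pack: $(want_found_object 0 0 0)"
```

The model makes the point of the comment wording visible: the "1" returned from the middle of the function is the early exit, while "-1" is only produced by falling through to the end.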
* Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use 2016-08-18 17:52 ` Jeff King @ 2016-09-10 14:57 ` Kirill Smelkov 2016-09-10 15:01 ` [PATCH 1/2 v8] " Kirill Smelkov ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-09-10 14:57 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Thu, Aug 18, 2016 at 01:52:22PM -0400, Jeff King wrote: > On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote: > > > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > > are two codepaths in pack-objects: with & without using bitmap > > reachability index. > > Sorry, I got distracted from reviewing these patches. I'll give them a > detailed look now and hopefully we can finalize the topic. Jeff, thanks for feedback. On my side I'm sorry for the delay because I was travelling and only recently got back to work. > > In want_object_in_pack() we care to start the checks from already found > > pack, if we have one, this way determining the answer right away > > in case neither --local nor --honour-pack-keep are active. In > > particular, as p5310-pack-bitmaps.sh shows, we do not do harm to > > served-with-bitmap clones performance-wise: > > > > Test 56dfeb62 this tree > > ----------------------------------------------------------------- > > 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% > > 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% > > 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% > > 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% > > > > with all differences strangely showing we are a bit faster now, but > > probably all being within noise. > > Good to know there is no regression. It is curious that there is a > slight _improvement_ across the board. Do we have an explanation for > that? 
It seems odd that noise would be so consistent. Yes, I too thought it and it turned out to be t/perf/run does not copy config.mak.autogen & friends to build/ and I'm using autoconf with CFLAGS="-march=native -O3 ..." Junio, I could not resist to the following: ---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Subject: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area Otherwise for people who use autotools-based configure in main worktree, the performance testing results will be inconsistent as work and build trees could be using e.g. different optimization levels. See e.g. http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/ for example. NOTE config.status has to be copied because otherwise without it the build would want to run reconfigure this way loosing just copied config.mak.autogen. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- t/perf/run | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/perf/run b/t/perf/run index cfd7012..aa383c2 100755 --- a/t/perf/run +++ b/t/perf/run @@ -30,7 +30,7 @@ unpack_git_rev () { } build_git_rev () { rev=$1 - cp ../../config.mak build/$rev/config.mak + cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status} (cd build/$rev && make $GIT_PERF_MAKE_OPTS) || die "failed to build revision '$mydir'" } -- 2.9.2.701.gf965a18.dirty ---- 8< ---- With corrected t/perf/run the timings are more realistic - e.g. 3 consecutive runs of `./run 56dfeb62 . 
./p5310-pack-bitmaps.sh`: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% > > And in the general case we care not to have duplicate > > find_pack_entry_one(*found_pack) calls. Worst what can happen is we can > > call want_found_object(*found_pack) -- newly introduced helper for > > checking whether we want object -- twice, but since want_found_object() > > is very lightweight it does not make any difference. > > I had trouble parsing this. I think maybe: > > In the general case we do not want to call find_pack_entry_one() more > than once, because it is expensive. This patch splits the loop in > want_object_in_pack() into two parts: finding the object and seeing if > it impacts our choice to include it in the pack. We may call the > inexpensive want_found_object() twice, but we will never call > find_pack_entry_one() if we do not need to. Ok, thanks for the advice. 
> > > +static int want_found_object(int exclude, struct packed_git *p) > > +{ > > + if (exclude) > > + return 1; > > + if (incremental) > > + return 0; > > + > > + /* > > + * When asked to do --local (do not include an object that appears in a > > + * pack we borrow from elsewhere) or --honor-pack-keep (do not include > > + * an object that appears in a pack marked with .keep), finding a pack > > + * that matches the criteria is sufficient for us to decide to omit it. > > + * However, even if this pack does not satisfy the criteria, we need to > > + * make sure no copy of this object appears in _any_ pack that makes us > > + * to omit the object, so we need to check all the packs. Signal that by > > + * returning -1 to the caller. > > + */ > > + if (!ignore_packed_keep && > > + (!local || !have_non_local_packs)) > > + return 1; > > Hmm. The comment says "-1", but the return says "1". That is because the > comment is describing the return that happens at the end. :) > > I wonder if the last sentence should be: > > We can check here whether these options can possibly matter; if not, > we can return early from the function here. Otherwise, we signal "-1" > at the end to tell the caller that we do not know either way, and it > needs to check more packs. Thanks for the catch and hint. I've changed it to the following: We can however first check whether these options can possible matter; if they do not matter we know we want the object in generated pack. Otherwise, we signal "-1" at the end to tell the caller that we do not know either way, and it needs to check more packs. full version: /* * When asked to do --local (do not include an object that appears in a * pack we borrow from elsewhere) or --honor-pack-keep (do not include * an object that appears in a pack marked with .keep), finding a pack * that matches the criteria is sufficient for us to decide to omit it. 
* However, even if this pack does not satisfy the criteria, we need to * make sure no copy of this object appears in _any_ pack that makes us * to omit the object, so we need to check all the packs. * * We can however first check whether these options can possible matter; * if they do not matter we know we want the object in generated pack. * Otherwise, we signal "-1" at the end to tell the caller that we do * not know either way, and it needs to check more packs. */ Hope it is ok. > > - *found_pack = NULL; > > - *found_offset = 0; > > + /* > > + * If we already know the pack object lives in, start checks from that > > + * pack - in the usual case when neither --local was given nor .keep files > > + * are present we will determine the answer right now. > > + */ > > + if (*found_pack) { > > + want = want_found_object(exclude, *found_pack); > > + if (want != -1) > > + return want; > > + } > > Looks correct. Though it is not really "start checks from..." anymore, > but rather "do a quick check to see if we can quit early, and otherwise > start the loop". That might be nitpicking, though. I see. Your version is ok, but to me 'start checks from ...' is a bit more natural and explaining (yes, all subjective and depending on taste), so if possible I'd prefer to leave it as is. > > > for (p = packed_git; p; p = p->next) { > > - off_t offset = find_pack_entry_one(sha1, p); > > + off_t offset; > > + > > + if (p == *found_pack) > > + offset = *found_offset; > > + else > > + offset = find_pack_entry_one(sha1, p); > > + > > This hunk will conflict with the MRU optimizations in 'next', but I > think the resolution should be pretty trivial. Yes. 
> > static int add_object_entry(const unsigned char *sha1, enum object_type type, > > const char *name, int exclude) > > { > > - struct packed_git *found_pack; > > - off_t found_offset; > > + struct packed_git *found_pack = NULL; > > + off_t found_offset = 0; > > I think technically we don't need to initialize found_offset here (it is > considered only if *found_pack is not NULL), but it doesn't hurt to make > our starting assumptions clear. Yes, found_pack != NULL is indicator whether we have found_pack / found_offset info, but it makes it much clear and defending from mistakes to set both found_{pack,offset} into known initial state. > > @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > > if (have_duplicate_entry(sha1, 0, &index_pos)) > > return 0; > > > > + if (!want_object_in_pack(sha1, 0, &pack, &offset)) > > + return 0; > > + > > And this caller doesn't need to worry about initialization, because of > course it knows it has a pack/offset already. Good. Yes, we have this info from bitmap walker calling us. > > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > > index 3893afd..a278d30 100755 > > --- a/t/t5310-pack-bitmaps.sh > > +++ b/t/t5310-pack-bitmaps.sh > > Tests look OK. I saw a few style nitpicks, but I think they are not even > against our style guide but more "I would have written it like this" and > are not even worth quibbling over. > > So I think the code here is fine, and I just had a few minor complaints > on comment and commit message clarity. Thanks for feedback. Yes tastes can differ but your comments regarding commit message and want_found_object() were objectively (imho) worth it and there I've made the adjustments. Please expect updated patch to be send as reply to this mail. Thanks again for feedback, Kirill ^ permalink raw reply related [flat|nested] 62+ messages in thread
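The t/perf/run change proposed in the message above copies the build configuration with `cp -t` and a `{a,b,c}` brace list. `cp -t` is a GNU coreutils option and brace expansion is not part of POSIX sh, so a more conservative variant may be worth considering. The following is only a sketch of that idea; the helper name copy_build_config and the demo directory layout are hypothetical and not taken from the patch. It also skips files that do not exist, on the assumption that not every worktree has all three.

```shell
# copy_build_config SRC_DIR DEST_DIR
# Copies whichever of the build configuration files exist in SRC_DIR into
# DEST_DIR, using only POSIX sh constructs and plain "cp SRC DEST".
copy_build_config () {
	src=$1 dest=$2
	for config in config.mak config.mak.autogen config.status
	do
		if test -e "$src/$config"
		then
			cp "$src/$config" "$dest/" || return 1
		fi
	done
}

# Hypothetical demo layout standing in for the worktree and build/$rev.
demo=$(mktemp -d)
mkdir "$demo/worktree" "$demo/build"
echo "CFLAGS = -O2" >"$demo/worktree/config.mak"
echo "# generated by configure" >"$demo/worktree/config.mak.autogen"

copy_build_config "$demo/worktree" "$demo/build" && ls "$demo/build"
```

In build_git_rev () this would be called with `../..` and `build/$rev` as arguments; whether silently skipping a missing file is the desired behaviour there is a judgment call left to the patch author.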
* [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use 2016-09-10 14:57 ` Kirill Smelkov @ 2016-09-10 15:01 ` Kirill Smelkov 2016-09-13 6:23 ` Junio C Hamano 2016-09-10 15:05 ` [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area Kirill Smelkov 2016-09-12 17:33 ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Junio C Hamano 2 siblings, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-09-10 15:01 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. 
In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. 
Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/pack-objects.c | 97 ++++++++++++++++++++++++++++++++----------------- t/t5310-pack-bitmaps.sh | 92 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 156 insertions(+), 33 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c4c2a3c..19668d3 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -944,13 +944,48 @@ static int have_duplicate_entry(const unsigned char *sha1, return 1; } +static int want_found_object(int exclude, struct packed_git *p) +{ + if (exclude) + return 1; + if (incremental) + return 0; + + /* + * When asked to do --local (do not include an object that appears in a + * pack we borrow from elsewhere) or --honor-pack-keep (do not include + * an object that appears in a pack marked with .keep), finding a pack + * that matches the criteria is sufficient for us to decide to omit it. + * However, even if this pack does not satisfy the criteria, we need to + * make sure no copy of this object appears in _any_ pack that makes us + * to omit the object, so we need to check all the packs. + * + * We can however first check whether these options can possible matter; + * if they do not matter we know we want the object in generated pack. + * Otherwise, we signal "-1" at the end to tell the caller that we do + * not know either way, and it needs to check more packs. + */ + if (!ignore_packed_keep && + (!local || !have_non_local_packs)) + return 1; + + if (local && !p->pack_local) + return 0; + if (ignore_packed_keep && p->pack_local && p->pack_keep) + return 0; + + /* we don't know yet; keep looking for more packs */ + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). * - * As a side effect of this check, we will find the packed version of this - * object, if any. 
We therefore pass out the pack information to avoid having - * to look it up again later. + * If the caller already knows an existing pack it wants to take the object + * from, that is passed in *found_pack and *found_offset; otherwise this + * function finds if there is any pack that has the object and returns the pack + * and its offset in these variables. */ static int want_object_in_pack(const unsigned char *sha1, int exclude, @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char *sha1, off_t *found_offset) { struct packed_git *p; + int want; if (!exclude && local && has_loose_object_nonlocal(sha1)) return 0; - *found_pack = NULL; - *found_offset = 0; + /* + * If we already know the pack object lives in, start checks from that + * pack - in the usual case when neither --local was given nor .keep files + * are present we will determine the answer right now. + */ + if (*found_pack) { + want = want_found_object(exclude, *found_pack); + if (want != -1) + return want; + } for (p = packed_git; p; p = p->next) { - off_t offset = find_pack_entry_one(sha1, p); + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(sha1, p); + if (offset) { if (!*found_pack) { if (!is_pack_valid(p)) @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char *sha1, *found_offset = offset; *found_pack = p; } - if (exclude) - return 1; - if (incremental) - return 0; - - /* - * When asked to do --local (do not include an - * object that appears in a pack we borrow - * from elsewhere) or --honor-pack-keep (do not - * include an object that appears in a pack marked - * with .keep), we need to make sure no copy of this - * object come from in _any_ pack that causes us to - * omit it, and need to complete this loop. When - * neither option is in effect, we know the object - * we just found is going to be packed, so break - * out of the loop to return 1 now. 
- */ - if (!ignore_packed_keep && - (!local || !have_non_local_packs)) - break; - - if (local && !p->pack_local) - return 0; - if (ignore_packed_keep && p->pack_local && p->pack_keep) - return 0; + want = want_found_object(exclude, p); + if (want != -1) + return want; } } @@ -1039,8 +1067,8 @@ static const char no_closure_warning[] = N_( static int add_object_entry(const unsigned char *sha1, enum object_type type, const char *name, int exclude) { - struct packed_git *found_pack; - off_t found_offset; + struct packed_git *found_pack = NULL; + off_t found_offset = 0; uint32_t index_pos; if (have_duplicate_entry(sha1, exclude, &index_pos)) @@ -1073,6 +1101,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, if (have_duplicate_entry(sha1, 0, &index_pos)) return 0; + if (!want_object_in_pack(sha1, 0, &pack, &offset)) + return 0; + create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset); display_progress(progress_state, nr_result); diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..a278d30 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -7,6 +7,18 @@ objpath () { echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')" } +# show objects present in pack ($1 should be associated *.idx) +list_packed_objects () { + git show-index <"$1" | cut -d' ' -f2 +} + +# has_any pattern-file content-file +# tests whether content-file has any entry from pattern-file with entries being +# whole lines. 
+has_any () { + grep -Ff "$1" "$2" +} + test_expect_success 'setup repo with moderate-sized history' ' for i in $(test_seq 1 10); do test_commit $i @@ -16,6 +28,7 @@ test_expect_success 'setup repo with moderate-sized history' ' test_commit side-$i done && git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true && @@ -118,6 +131,71 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects respects --local (non-local loose)' ' + git init --bare alt.git && + echo $(pwd)/alt.git/objects >.git/objects/info/alternates && + echo content1 >file1 && + # non-local loose object which is not present in bitmapped pack + altblob=$(GIT_DIR=alt.git git hash-object -w file1) && + # non-local loose object which is also present in bitmapped pack + git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin && + git add file1 && + test_tick && + git commit -m commit_file1 && + echo HEAD | git pack-objects --local --stdout --revs >1.pack && + git index-pack 1.pack && + list_packed_objects 1.idx >1.objects && + printf "%s\n" "$altblob" "$blob" >nonlocal-loose && + ! has_any nonlocal-loose 1.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' ' + echo content2 >file2 && + blob2=$(git hash-object -w file2) && + git add file2 && + test_tick && + git commit -m commit_file2 && + printf "%s\n" "$blob2" "$bitmaptip" >keepobjects && + pack2=$(git pack-objects pack2 <keepobjects) && + mv pack2-$pack2.* .git/objects/pack/ && + >.git/objects/pack/pack2-$pack2.keep && + rm $(objpath $blob2) && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack && + git index-pack 2a.pack && + list_packed_objects 2a.idx >2a.objects && + ! 
has_any keepobjects 2a.objects +' + +test_expect_success 'pack-objects respects --local (non-local pack)' ' + mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ && + echo HEAD | git pack-objects --local --stdout --revs >2b.pack && + git index-pack 2b.pack && + list_packed_objects 2b.idx >2b.objects && + ! has_any keepobjects 2b.objects +' + +test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' ' + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + packbitmap=$(basename $(cat output) .bitmap) && + list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects && + test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" && + >.git/objects/pack/$packbitmap.keep && + echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack && + git index-pack 3a.pack && + list_packed_objects 3a.idx >3a.objects && + ! has_any packbitmap.objects 3a.objects +' + +test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' + mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ && + test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" && + echo HEAD | git pack-objects --local --stdout --revs >3b.pack && + git index-pack 3b.pack && + list_packed_objects 3b.idx >3b.objects && + ! 
has_any packbitmap.objects 3b.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && @@ -143,6 +221,20 @@ test_expect_success 'create objects for missing-HAVE tests' ' EOF ' +test_expect_success 'pack-objects respects --incremental' ' + cat >revs2 <<-EOF && + HEAD + $commit + EOF + git pack-objects --incremental --stdout --revs <revs2 >4.pack && + git index-pack 4.pack && + list_packed_objects 4.idx >4.objects && + test_line_count = 4 4.objects && + git rev-list --objects $commit >revlist && + cut -d" " -f1 revlist |sort >objects && + test_cmp 4.objects objects +' + test_expect_success 'pack with missing blob' ' rm $(objpath $blob) && git pack-objects --stdout --revs <revs >/dev/null -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
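
The has_any helper added to t5310 above relies on the exit status of
grep: "! has_any unwanted packed" succeeds only when no entry of the first
file shows up in the second. A minimal sketch of that behaviour outside the
test suite follows; the scratch directory, the file names and the short
object names are made-up placeholders, and mktemp -d is assumed to be
available on the platform.

```shell
#!/bin/sh
# Scratch files; the "object names" below are made-up placeholders.
tmp=$(mktemp -d) || exit 1

# Same helper as in the patch: prints the lines of $2 that contain
# any of the fixed strings listed (one per line) in $1.
has_any () {
	grep -Ff "$1" "$2"
}

printf '%s\n' aaaa1111 bbbb2222 >"$tmp/unwanted"
printf '%s\n' cccc3333 dddd4444 >"$tmp/packed-clean"
printf '%s\n' cccc3333 bbbb2222 >"$tmp/packed-leaky"

# No overlap: grep exits non-zero, so "! has_any" succeeds.
clean=no
if ! has_any "$tmp/unwanted" "$tmp/packed-clean" >/dev/null
then
	clean=ok
fi

# Overlap: grep exits zero and prints the offending entry.
leak=$(has_any "$tmp/unwanted" "$tmp/packed-leaky")

echo "clean=$clean leak=$leak"
rm -rf "$tmp"
```

Note that grep -F without -x matches substrings rather than whole lines.
For full object names of equal length the two are equivalent, which is
presumably why the helper can describe its entries as whole lines.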
* Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
  2016-09-10 15:01 ` [PATCH 1/2 v8] " Kirill Smelkov
@ 2016-09-13  6:23 ` Junio C Hamano
  2016-09-13  7:50   ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-09-13 6:23 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> +	if (exclude)
> +		return 1;
> +	if (incremental)
> +		return 0;
> +
> +	/*
> +	 * When asked to do --local (do not include an object that appears in a
> +	 * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> +	 * an object that appears in a pack marked with .keep), finding a pack
> +	 * that matches the criteria is sufficient for us to decide to omit it.
> +	 * However, even if this pack does not satisfy the criteria, we need to
> +	 * make sure no copy of this object appears in _any_ pack that makes us
> +	 * to omit the object, so we need to check all the packs.
> +	 *
> +	 * We can however first check whether these options can possible matter;
> +	 * if they do not matter we know we want the object in generated pack.
> +	 * Otherwise, we signal "-1" at the end to tell the caller that we do
> +	 * not know either way, and it needs to check more packs.
> +	 */
> +	if (!ignore_packed_keep &&
> +	    (!local || !have_non_local_packs))
> +		return 1;
> +
> +	if (local && !p->pack_local)
> +		return 0;
> +	if (ignore_packed_keep && p->pack_local && p->pack_keep)
> +		return 0;
> +
> +	/* we don't know yet; keep looking for more packs */
> +	return -1;
> +}

Moving this logic out to this helper made the main logic in the
caller easier to grasp.

> @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char *sha1,
>  			       off_t *found_offset)
>  {
>  	struct packed_git *p;
> +	int want;
>
>  	if (!exclude && local && has_loose_object_nonlocal(sha1))
>  		return 0;
>
> +	/*
> +	 * If we already know the pack object lives in, start checks from that
> +	 * pack - in the usual case when neither --local was given nor .keep files
> +	 * are present we will determine the answer right now.
> +	 */
> +	if (*found_pack) {
> +		want = want_found_object(exclude, *found_pack);
> +		if (want != -1)
> +			return want;
> +	}
>
>  	for (p = packed_git; p; p = p->next) {
> +		off_t offset;
> +
> +		if (p == *found_pack)
> +			offset = *found_offset;
> +		else
> +			offset = find_pack_entry_one(sha1, p);
> +
>  		if (offset) {
>  			if (!*found_pack) {
>  				if (!is_pack_valid(p))
> @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char *sha1,
>  				*found_offset = offset;
>  				*found_pack = p;
>  			}
> +			want = want_found_object(exclude, p);
> +			if (want != -1)
> +				return want;
>  		}
>  	}

As Peff noted in his earlier review, however, MRU code needed to be
grafted in to the caller (an update to the MRU list was done in the
code that was moved to the want_found_object() helper).  I think I
did it correctly, which ended up looking like this:

		want = want_found_object(exclude, p);
		if (!exclude && want > 0)
			mru_mark(packed_git_mru, entry);
		if (want != -1)
			return want;

I somewhat feel that it is ugly that the helper knows about exclude
(i.e. in the original code, we immediately returned 1 without
futzing with the MRU when we find an entry that is to be excluded,
which now is done in the helper), and the caller also knows about
exclude (i.e. the caller knows that the helper may return positive
in two cases, it knows that MRU marking needs to happen only one of
the two cases, and it also knows that "exclude" is what
differentiates between the two cases) at the same time.

But probably the reason why I feel it ugly is only because I knew
how the original looked like.  I dunno.

^ permalink raw reply	[flat|nested] 62+ messages in thread
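
The three-way return value of want_found_object() discussed above (1 = want
the object, 0 = do not want it, -1 = undecided, check more packs) can be
exercised in isolation. The following is only an illustrative shell
transliteration of the quoted C helper, not code from git; the variable
names (opt_local, ignore_packed_keep, ...) and the positional arguments are
made up for the sketch.

```shell
#!/bin/sh
# Stand-ins for pack-objects' global option flags (names are hypothetical).
incremental=0
opt_local=0
ignore_packed_keep=0
have_non_local_packs=0

# usage: want_found_object <exclude> <pack_is_local> <pack_has_keep>
# prints 1 (want), 0 (do not want) or -1 (undecided, check more packs)
want_found_object () {
	if test "$1" = 1; then printf '%s\n' 1; return; fi
	if test "$incremental" = 1; then printf '%s\n' 0; return; fi

	# Neither option can matter: the answer is known immediately.
	if test "$ignore_packed_keep" = 0 &&
	   { test "$opt_local" = 0 || test "$have_non_local_packs" = 0; }
	then
		printf '%s\n' 1; return
	fi

	if test "$opt_local" = 1 && test "$2" = 0
	then
		printf '%s\n' 0; return
	fi
	if test "$ignore_packed_keep" = 1 && test "$2" = 1 && test "$3" = 1
	then
		printf '%s\n' 0; return
	fi

	# This pack does not settle it; the caller must look at other packs.
	printf '%s\n' -1
}

plain=$(want_found_object 0 1 0)      # no options active: decided at once

ignore_packed_keep=1                  # as if --honor-pack-keep were given
kept=$(want_found_object 0 1 1)       # local pack with .keep: omit
unknown=$(want_found_object 0 1 0)    # local pack without .keep: undecided
excluded=$(want_found_object 1 1 1)   # exclude short-circuits everything

echo "$plain $kept $unknown $excluded"
```

Under these assumptions the script prints "1 0 -1 1". The last case is the
one the MRU remark is about: a positive return with exclude set is the
situation in which the caller skips mru_mark().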
* Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
  2016-09-13  6:23 ` Junio C Hamano
@ 2016-09-13  7:50 ` Kirill Smelkov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-09-13 7:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Sep 12, 2016 at 11:23:18PM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> > +static int want_found_object(int exclude, struct packed_git *p)
> > +{
> > +	if (exclude)
> > +		return 1;
> > +	if (incremental)
> > +		return 0;
> > +
> > +	/*
> > +	 * When asked to do --local (do not include an object that appears in a
> > +	 * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> > +	 * an object that appears in a pack marked with .keep), finding a pack
> > +	 * that matches the criteria is sufficient for us to decide to omit it.
> > +	 * However, even if this pack does not satisfy the criteria, we need to
> > +	 * make sure no copy of this object appears in _any_ pack that makes us
> > +	 * to omit the object, so we need to check all the packs.
> > +	 *
> > +	 * We can however first check whether these options can possible matter;
> > +	 * if they do not matter we know we want the object in generated pack.
> > +	 * Otherwise, we signal "-1" at the end to tell the caller that we do
> > +	 * not know either way, and it needs to check more packs.
> > +	 */
> > +	if (!ignore_packed_keep &&
> > +	    (!local || !have_non_local_packs))
> > +		return 1;
> > +
> > +	if (local && !p->pack_local)
> > +		return 0;
> > +	if (ignore_packed_keep && p->pack_local && p->pack_keep)
> > +		return 0;
> > +
> > +	/* we don't know yet; keep looking for more packs */
> > +	return -1;
> > +}
>
> Moving this logic out to this helper made the main logic in the
> caller easier to grasp.
>
> > @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char *sha1,
> >  			       off_t *found_offset)
> >  {
> >  	struct packed_git *p;
> > +	int want;
> >
> >  	if (!exclude && local && has_loose_object_nonlocal(sha1))
> >  		return 0;
> >
> > +	/*
> > +	 * If we already know the pack object lives in, start checks from that
> > +	 * pack - in the usual case when neither --local was given nor .keep files
> > +	 * are present we will determine the answer right now.
> > +	 */
> > +	if (*found_pack) {
> > +		want = want_found_object(exclude, *found_pack);
> > +		if (want != -1)
> > +			return want;
> > +	}
> >
> >  	for (p = packed_git; p; p = p->next) {
> > +		off_t offset;
> > +
> > +		if (p == *found_pack)
> > +			offset = *found_offset;
> > +		else
> > +			offset = find_pack_entry_one(sha1, p);
> > +
> >  		if (offset) {
> >  			if (!*found_pack) {
> >  				if (!is_pack_valid(p))
> > @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char *sha1,
> >  				*found_offset = offset;
> >  				*found_pack = p;
> >  			}
> > +			want = want_found_object(exclude, p);
> > +			if (want != -1)
> > +				return want;
> >  		}
> >  	}
>
> As Peff noted in his earlier review, however, MRU code needed to be
> grafted in to the caller (an update to the MRU list was done in the
> code that was moved to the want_found_object() helper).  I think I
> did it correctly, which ended up looking like this:
>
> 		want = want_found_object(exclude, p);
> 		if (!exclude && want > 0)
> 			mru_mark(packed_git_mru, entry);
> 		if (want != -1)
> 			return want;
>
> I somewhat feel that it is ugly that the helper knows about exclude
> (i.e. in the original code, we immediately returned 1 without
> futzing with the MRU when we find an entry that is to be excluded,
> which now is done in the helper), and the caller also knows about
> exclude (i.e. the caller knows that the helper may return positive
> in two cases, it knows that MRU marking needs to happen only one of
> the two cases, and it also knows that "exclude" is what
> differentiates between the two cases) at the same time.
>
> But probably the reason why I feel it ugly is only because I knew
> how the original looked like.  I dunno.

Junio, the code above is correct semantic merge of pack-mru and my
topic, because in pack-mru if found and exclude=1, 1 was returned
without marking found pack.

But I wonder: even if we exclude an object, we were still looking for it
in packs, and when we found it, we found the corresponding pack too. So,
that pack _was_ most-recently-used, and it is correct to mark it as MRU.

We can do the simplification in the follow-up patch after the merge, so
merge does not change semantics and it is all bisectable, etc.

Jeff?

^ permalink raw reply	[flat|nested] 62+ messages in thread
* [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area
  2016-09-10 14:57 ` Kirill Smelkov
  2016-09-10 15:01   ` [PATCH 1/2 v8] " Kirill Smelkov
@ 2016-09-10 15:05   ` Kirill Smelkov
  2016-09-12 19:12     ` Junio C Hamano
  2016-09-12 17:33   ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Junio C Hamano
  2 siblings, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-09-10 15:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov

Otherwise for people who use autotools-based configure in main worktree,
the performance testing results will be inconsistent as work and build
trees could be using e.g. different optimization levels.

See e.g.

	http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/

for example.

NOTE config.status has to be copied because otherwise without it the build
would want to run reconfigure this way losing just copied config.mak.autogen.

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 ( Resending as separate patch-mail, just in case )

 t/perf/run | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index cfd7012..aa383c2 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -30,7 +30,7 @@ unpack_git_rev () {
 }
 build_git_rev () {
 	rev=$1
-	cp ../../config.mak build/$rev/config.mak
+	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}
 	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
 	die "failed to build revision '$mydir'"
 }
-- 
2.9.2.701.gf965a18.dirty

^ permalink raw reply related	[flat|nested] 62+ messages in thread
* Re: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area
  2016-09-10 15:05 ` [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area Kirill Smelkov
@ 2016-09-12 19:12 ` Junio C Hamano
  2016-09-12 19:17   ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-09-12 19:12 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> Otherwise for people who use autotools-based configure in main worktree,
> the performance testing results will be inconsistent as work and build
> trees could be using e.g. different optimization levels.
>
> See e.g.
>
> 	http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/
>
> for example.
>
> NOTE config.status has to be copied because otherwise without it the build
> would want to run reconfigure this way losing just copied config.mak.autogen.
>
> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
> ---
>  ( Resending as separate patch-mail, just in case )
>
>  t/perf/run | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/perf/run b/t/perf/run
> index cfd7012..aa383c2 100755
> --- a/t/perf/run
> +++ b/t/perf/run
> @@ -30,7 +30,7 @@ unpack_git_rev () {
>  }
>  build_git_rev () {
>  	rev=$1
> -	cp ../../config.mak build/$rev/config.mak
> +	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}

That unfortunately is a GNUism -t with a bash-ism {a,b,c}; just keep
it simple and stupid to make sure it is portable.

This is not even a part that we measure the runtime for anyway.

>  	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
>  	die "failed to build revision '$mydir'"
>  }

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area
  2016-09-12 19:12 ` Junio C Hamano
@ 2016-09-12 19:17 ` Junio C Hamano
  2016-09-12 23:10   ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-09-12 19:17 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Junio C Hamano <gitster@pobox.com> writes:

>>  build_git_rev () {
>>  	rev=$1
>> -	cp ../../config.mak build/$rev/config.mak
>> +	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}
>
> That unfortunately is a GNUism -t with a bash-ism {a,b,c}; just keep
> it simple and stupid to make sure it is portable.
>
> This is not even a part that we measure the runtime for anyway.

In other words, something along this line, perhaps.

 t/perf/run | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index aa383c2..69a4714 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -30,7 +30,10 @@ unpack_git_rev () {
 }
 build_git_rev () {
 	rev=$1
-	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}
+	for config in config.mak config.mak.autogen config.status
+	do
+		cp "../../$config" "build/$rev/"
+	done
 	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
 	die "failed to build revision '$mydir'"
 }
-- 
2.10.0-342-gc678130

^ permalink raw reply related	[flat|nested] 62+ messages in thread
* Re: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area
  2016-09-12 19:17 ` Junio C Hamano
@ 2016-09-12 23:10 ` Junio C Hamano
  2016-09-13  6:58   ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-09-12 23:10 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Junio C Hamano <gitster@pobox.com> writes:

> In other words, something along this line, perhaps.
> ...

Not quite.  There is no guarantee that the user is using autoconf at
all.  It should be more like this, I think.

 t/perf/run | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index aa383c2..7ec3734 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -30,7 +30,13 @@ unpack_git_rev () {
 }
 build_git_rev () {
 	rev=$1
-	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}
+	for config in config.mak config.mak.autogen config.status
+	do
+		if test -f "../../$config"
+		then
+			cp "../../$config" "build/$rev/"
+		fi
+	done
 	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
 	die "failed to build revision '$mydir'"
 }

^ permalink raw reply related	[flat|nested] 62+ messages in thread
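
The copy loop proposed above uses only plain "for", "test" and a
two-argument cp, and it quietly skips files that are absent. A minimal
sketch of that behaviour in a scratch directory follows; the src/ and
build/rev/ layout merely stands in for the worktree and build/$rev, and
mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch layout standing in for the git worktree and build/$rev;
# the directory names are placeholders for this sketch.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/src" "$tmp/build/rev"

# Only config.mak exists, as for a user who does not use autoconf.
echo 'CFLAGS = -O2' >"$tmp/src/config.mak"

for config in config.mak config.mak.autogen config.status
do
	if test -f "$tmp/src/$config"
	then
		cp "$tmp/src/$config" "$tmp/build/rev/"
	fi
done

copied=$(ls "$tmp/build/rev")
echo "copied: $copied"
rm -rf "$tmp"
```

With only config.mak present this prints "copied: config.mak" and cp never
complains on stderr about the two files that do not exist.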
* Re: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area
  2016-09-12 23:10 ` Junio C Hamano
@ 2016-09-13  6:58 ` Kirill Smelkov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-09-13 6:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Sep 12, 2016 at 04:10:09PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > In other words, something along this line, perhaps.
> > ...
>
> Not quite.  There is no guarantee that the user is using autoconf at
> all.  It should be more like this, I think.
>
>  t/perf/run | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/t/perf/run b/t/perf/run
> index aa383c2..7ec3734 100755
> --- a/t/perf/run
> +++ b/t/perf/run
> @@ -30,7 +30,13 @@ unpack_git_rev () {
>  }
>  build_git_rev () {
>  	rev=$1
> -	cp -t build/$rev ../../{config.mak,config.mak.autogen,config.status}
> +	for config in config.mak config.mak.autogen config.status
> +	do
> +		if test -f "../../$config"
> +		then
> +			cp "../../$config" "build/$rev/"
> +		fi
> +	done
>  	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
>  	die "failed to build revision '$mydir'"
>  }

Junio, thanks for encouraging feedback and for catching the *-isms. What
you propose is good (and we also automatically fix error when there was
no config.mak - it was working but cp was giving an error to stderr but
script was continuing normally). I would amend your squash the following
way:

* `test -f` -> `test -e`, because -f tests whether a file exists _and_
  is regular file. Some people might have config.mak as a symlink for
  example. We don't want to miss them too.

Please find updated patch below:

---- 8< ----
From: Kirill Smelkov <kirr@nexedi.com>
Subject: [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area

Otherwise for people who use autotools-based configure in main worktree,
the performance testing results will be inconsistent as work and build
trees could be using e.g. different optimization levels.

See e.g.

	http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/

for example.

NOTE config.status has to be copied because otherwise without it the build
would want to run reconfigure this way losing just copied config.mak.autogen.

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 t/perf/run | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/perf/run b/t/perf/run
index cfd7012..e8adeda 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -30,7 +30,13 @@ unpack_git_rev () {
 }
 build_git_rev () {
 	rev=$1
-	cp ../../config.mak build/$rev/config.mak
+	for config in config.mak config.mak.autogen config.status
+	do
+		if test -e "../../$config"
+		then
+			cp "../../$config" "build/$rev/"
+		fi
+	done
 	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
 	die "failed to build revision '$mydir'"
 }
-- 
2.9.2.701.gf965a18.dirty

^ permalink raw reply related	[flat|nested] 62+ messages in thread
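
For readers weighing `test -f` against `test -e` here: as far as POSIX
specifies the test utility, both operators resolve symbolic links, so a
symlink pointing at a regular config.mak should satisfy either check. The
two differ only for things that exist but are not regular files, and a
dangling symlink fails both. A small sketch, with made-up file names in a
scratch directory and mktemp -d assumed to be available:

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
echo 'CFLAGS = -O2' >"$tmp/real.mak"
ln -s real.mak "$tmp/link.mak"          # symlink to a regular file
ln -s missing.mak "$tmp/dangling.mak"   # symlink whose target is absent
mkdir "$tmp/dir.mak"                    # exists, but is not a regular file

# Report what test -f and test -e say about one path.
probe () {
	f=no e=no
	test -f "$1" && f=yes
	test -e "$1" && e=yes
	echo "f=$f e=$e"
}

link=$(probe "$tmp/link.mak")
dangling=$(probe "$tmp/dangling.mak")
dir=$(probe "$tmp/dir.mak")

echo "link: $link"
echo "dangling: $dangling"
echo "dir: $dir"
rm -rf "$tmp"
```

So switching to -e is harmless for the symlink case, but it is the
directory-like cases, not symlinks to regular files, where the two
operators actually give different answers.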
* Re: [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
  2016-09-10 14:57 ` Kirill Smelkov
  2016-09-10 15:01   ` [PATCH 1/2 v8] " Kirill Smelkov
  2016-09-10 15:05   ` [PATCH] t/perf/run: Don't forget to copy config.mak.autogen & friends to build area Kirill Smelkov
@ 2016-09-12 17:33   ` Junio C Hamano
  2 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2016-09-12 17:33 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> On Thu, Aug 18, 2016 at 01:52:22PM -0400, Jeff King wrote:
> >
> > Good to know there is no regression. It is curious that there is a
> > slight _improvement_ across the board. Do we have an explanation for
> > that? It seems odd that noise would be so consistent.
>
> Yes, I too thought it and it turned out to be t/perf/run does not copy
> config.mak.autogen & friends to build/ and I'm using autoconf with
> CFLAGS="-march=native -O3 ..."
>
> Junio, I could not resist to the following:
> ...
> With corrected t/perf/run the timings are more realistic - e.g. 3
> consecutive runs of `./run 56dfeb62 . ./p5310-pack-bitmaps.sh`:

Wow, that's what I call an exchange with quality during a review ;-)
Thanks for the curiosity and digging it to the root cause of the
anomaly.

Some GNUism/bashism in the way copying is spelled in the patch
bothers me, but that is easily fixable.

Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread
* [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack
  2016-08-09 19:29 ` Kirill Smelkov
  2016-08-09 19:31   ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov
@ 2016-08-09 19:32   ` Kirill Smelkov
  2016-08-18 18:06     ` Jeff King
  2016-08-09 19:49   ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Junio C Hamano
  2 siblings, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-09 19:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov

Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

The reason here is for on-disk repack by default we want:

- to produce good pack (with bitmap index not-yet-packed objects are
  emitted to pack in suboptimal order).

- to use more robust pack-generation codepath (avoiding possible
  bugs in bitmap code and possible bitmap index corruption).

Jeff King further explains:

    The reason for this split is that pack-objects tries to determine how
    "careful" it should be based on whether we are packing to disk or to
    stdout. Packing to disk implies "git repack", and that we will likely
    delete the old packs after finishing. We want to be more careful (so
    as not to carry forward a corruption, and to generate a more optimal
    pack), and we presumably run less frequently and can afford extra CPU.
    Whereas packing to stdout implies serving a remote via "git fetch" or
    "git push". This happens more frequently (e.g., a server handling many
    fetching clients), and we assume the receiving end takes more
    responsibility for verifying the data.

    But this isn't always the case. One might want to generate on-disk
    packfiles for a specialized object transfer. Just using "--stdout" and
    writing to a file is not optimal, as it will not generate the matching
    pack index.

    So it would be useful to have some way of overriding this heuristic:
    to tell pack-objects that even though it should generate on-disk
    files, it is still OK to use the reachability bitmaps to do the
    traversal.

So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we care it is not activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

  We also want to avoid (at least with current implementation)
  generating bitmaps off of bitmaps. The reason here is: when generating
  a pack, not-yet-packed objects will be emitted into pack in
  suboptimal order and added to tail of the bitmap as "extended entries".
  When the resultant pack + some new objects in associated repository
  are in turn used to generate another pack with bitmap, the situation
  repeats: new objects are again not emitted optimally and just added to
  bitmap tail - not in recency order.

  So the pack badness can grow over time when at each step we have
  bitmapped pack + some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps, not to let pack badness grow.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to destination
  pack by write_reused_pack() bypassing needed for pack index generation
  bookkeeping done by regular codepath in write_one() and friends.

This way for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack does never pass
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much
slower than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:

    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real    0m22.309s
    user    0m21.148s
    sys     0m0.932s

    $ time git index-pack erp5pack-stdout.pack

    real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
    user    0m49.300s
    sys     0m1.360s

So the time for

    `pack-objects --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index      is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`               is  37s.

Jeff explains:

    The packfile does not carry the sha1 of the objects. A receiving
    index-pack has to compute them itself, including inflating and applying
    all of the deltas.

that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. More context: http://marc.info/?t=146792101400001&r=1&w=2 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- builtin/pack-objects.c | 31 ++++++++++++++++++++++++------- t/t5310-pack-bitmaps.sh | 12 ++++++++++++ 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index b1007f2..c92d7fc 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -67,7 +67,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2270,7 +2271,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2519,13 +2520,13 @@ static void loosen_unused_packed_objects(struct rev_info *revs) } /* - * This tracks any options which a reader of the pack might - * not understand, and which would therefore prevent blind reuse - * of what we have on disk. + * This tracks any options which pack-reuse code expects to be on, or which a + * reader of the pack might not understand, and which would therefore prevent + * blind reuse of what we have on disk. 
*/ static int pack_options_allow_reuse(void) { - return allow_ofs_delta; + return pack_to_stdout && allow_ofs_delta; } static int get_object_list_from_bitmap(struct rev_info *revs) @@ -2818,7 +2819,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). + */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index a278d30..9602e9a 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -196,6 +196,18 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' ! 
has_any packbitmap.objects 3b.objects ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + list_packed_objects <packa-$packasha1.idx >packa.objects && + list_packed_objects <packb-$packbsha1.idx >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack 2016-08-09 19:32 ` [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack Kirill Smelkov @ 2016-08-18 18:06 ` Jeff King 2016-09-10 14:59 ` Kirill Smelkov 0 siblings, 1 reply; 62+ messages in thread From: Jeff King @ 2016-08-18 18:06 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Tue, Aug 09, 2016 at 10:32:17PM +0300, Kirill Smelkov wrote: > Subject: Re: [PATCH 2/2 v7] pack-objects: use reachability bitmap index when > generating non-stdout pack This is v7, but as I understand your numbering, it goes with v5 of patch 1/2 that I just reviewed (usually we just increment the version number on the whole series and treat it as a unit, even if some patches didn't change from version to version). > So we can teach pack-objects to use bitmap index for initial object > counting phase when generating resultant pack file too: > > - if we care it is not activated under git-repack: Do you mean "if we take care that it is not..." here? (I think you might just be getting tripped up in the English idioms; "care" means that we have a preference; "to take care" means that we are being careful). > - if we know bitmap index generation is not enabled for resultant pack: > > Current code has singleton bitmap_git so cannot work simultaneously > with two bitmap indices. Minor English fixes: The current code has a singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. > - if we keep pack reuse enabled still only for "send-to-stdout" case: > > Because on pack reuse raw entries are directly written out to destination > pack by write_reused_pack() bypassing needed for pack index generation > bookkeeping done by regular codepath in write_one() and friends. 
Ditto on English: On pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing the need for pack index generation bookkeeping done by the regular code path in write_one() and friends. I think this is missing the implication. Why wouldn't we want to reuse in this case? Certainly we don't when doing a "careful" on-disk repack. I suspect the answer is that we cannot write a ".idx" off of the result of write_reused_pack(), and write-to-disk always includes the .idx. > More context: > > http://marc.info/?t=146792101400001&r=1&w=2 Can we turn this into a link to public-inbox? We have just been bit by all of our old links to gmane dying, and they cannot easily be replaced because they use a gmane-specific article number. public-inbox URLs use message-ids, which should be usable for other archives if public-inbox goes away. > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index b1007f2..c92d7fc 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c The code here looks fine. > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > index a278d30..9602e9a 100755 > --- a/t/t5310-pack-bitmaps.sh > +++ b/t/t5310-pack-bitmaps.sh > @@ -196,6 +196,18 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' > ! 
has_any packbitmap.objects 3b.objects > ' > > +test_expect_success 'pack-objects to file can use bitmap' ' > + # make sure we still have 1 bitmap index from previous tests > + ls .git/objects/pack/ | grep bitmap >output && > + test_line_count = 1 output && > + # verify equivalent packs are generated with/without using bitmap index > + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && > + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && > + list_packed_objects <packa-$packasha1.idx >packa.objects && > + list_packed_objects <packb-$packbsha1.idx >packb.objects && > + test_cmp packa.objects packb.objects > +' Of course we can't know if bitmaps were actually used, or if they were turned off under the hood. But at least this exercises the code a bit. You could possibly add a perf test which shows off the improvement, but I don't think it's strictly necessary. -Peff ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack 2016-08-18 18:06 ` Jeff King @ 2016-09-10 14:59 ` Kirill Smelkov 2016-09-10 15:01 ` [PATCH 2/2 v8] " Kirill Smelkov 2016-09-12 19:21 ` [PATCH 2/2 v7] " Junio C Hamano 0 siblings, 2 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-09-10 14:59 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Thu, Aug 18, 2016 at 02:06:15PM -0400, Jeff King wrote: > On Tue, Aug 09, 2016 at 10:32:17PM +0300, Kirill Smelkov wrote: > > > Subject: Re: [PATCH 2/2 v7] pack-objects: use reachability bitmap index when > > generating non-stdout pack > > This is v7, but as I understand your numbering, it goes with v5 of patch > 1/2 that I just reviewed (usually we just increment the version number > on the whole series and treat it as a unit, even if some patches didn't > change from version to version). The reason those patches are having their own numbers is that they are orthogonal to each other and can be applied / rejected independently. Since I thought Junio might want to pick them up as separate topics they were versioned separately. But ok, since now we have them considered both together, their next versions posted will be uniform v8. > > So we can teach pack-objects to use bitmap index for initial object > > counting phase when generating resultant pack file too: > > > > - if we care it is not activated under git-repack: > > Do you mean "if we take care that it is not..." here? > > (I think you might just be getting tripped up in the English idioms; > "care" means that we have a preference; "to take care" means that we are > being careful). Ok, I might have been tripped up; thanks for the catch.
I've changed to "if we take care to not let it be activated under git-repack" > > > - if we know bitmap index generation is not enabled for resultant pack: > > > > Current code has singleton bitmap_git so cannot work simultaneously > > with two bitmap indices. > > Minor English fixes: > > The current code has a singleton bitmap_git, so it cannot work > simultaneously with two bitmap indices. ok. > > - if we keep pack reuse enabled still only for "send-to-stdout" case: > > > > Because on pack reuse raw entries are directly written out to destination > > pack by write_reused_pack() bypassing needed for pack index generation > > bookkeeping done by regular codepath in write_one() and friends. > > Ditto on English: > > On pack reuse raw entries are directly written out to the destination > pack by write_reused_pack(), bypassing the need for pack index > generation bookkeeping done by the regular code path in write_one() > and friends. > > I think this is missing the implication. Why wouldn't we want to reuse > in this case? Certainly we don't when doing a "careful" on-disk repack. > I suspect the answer is that we cannot write a ".idx" off of the result > of write_reused_pack(), and write-to-disk always includes the .idx. Yes, mentioning pack-to-file needs to generate .idx makes it more clear and thanks for pointing this out. I've changed this item to the following (picking some of your English corrections): - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) Hope it is ok. 
> > More context: > > > > http://marc.info/?t=146792101400001&r=1&w=2 > > Can we turn this into a link to public-inbox? We have just been bit by > all of our old links to gmane dying, and they cannot easily be replaced > because they use a gmane-specific article number. public-inbox URLs use > message-ids, which should be usable for other archives if public-inbox > goes away. Yes, makes sense to put msgid here. I've added http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > > index b1007f2..c92d7fc 100644 > > --- a/builtin/pack-objects.c > > +++ b/builtin/pack-objects.c > > The code here looks fine. Thanks. > > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > > index a278d30..9602e9a 100755 > > --- a/t/t5310-pack-bitmaps.sh > > +++ b/t/t5310-pack-bitmaps.sh > > @@ -196,6 +196,18 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' > > ! has_any packbitmap.objects 3b.objects > > ' > > > > +test_expect_success 'pack-objects to file can use bitmap' ' > > + # make sure we still have 1 bitmap index from previous tests > > + ls .git/objects/pack/ | grep bitmap >output && > > + test_line_count = 1 output && > > + # verify equivalent packs are generated with/without using bitmap index > > + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && > > + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && > > + list_packed_objects <packa-$packasha1.idx >packa.objects && > > + list_packed_objects <packb-$packbsha1.idx >packb.objects && > > + test_cmp packa.objects packb.objects > > +' > > Of course we can't know if bitmaps were actually used, or if they were > turned off under the hood. But at least this exercises the code a bit. Yes, I was thinking how to know the bitmap codepath was actually active, and without adding debugging points there is no way (at least I could not find it). 
> You could possibly add a perf test which shows off the improvement, but
> I don't think it's strictly necessary.

Good idea. I've added this

---- 8< ----
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index de2a224..bb91dbb 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -32,6 +32,14 @@ test_perf 'simulated fetch' '
 	} | git pack-objects --revs --stdout >/dev/null
 '
 
+test_perf 'pack to file' '
+	git pack-objects --all pack1 </dev/null >/dev/null
+'
+
+test_perf 'pack to file (bitmap)' '
+	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
@@ -53,8 +61,12 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'partial bitmap' '
+test_perf 'clone (partial bitmap)' '
 	git pack-objects --stdout --all </dev/null >/dev/null
 '
 
+test_perf 'pack to file (partial bitmap)' '
+	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+'
+
 test_done
---- 8< ----

Test                                    56dfeb62          this tree
--------------------------------------------------------------------------------
5310.2: repack to disk                  8.98(8.05+0.29)   9.05(8.08+0.33) +0.8%
5310.3: simulated clone                 2.02(2.27+0.09)   2.01(2.25+0.08) -0.5%
5310.4: simulated fetch                 0.81(1.07+0.02)   0.81(1.05+0.04) +0.0%
5310.5: pack to file                    7.58(7.04+0.28)   7.60(7.04+0.30) +0.3%
5310.6: pack to file (bitmap)           7.55(7.02+0.28)   3.25(2.82+0.18) -57.0%
5310.8: clone (partial bitmap)          1.83(2.26+0.12)   1.82(2.22+0.14) -0.5%
5310.9: pack to file (partial bitmap)   6.86(6.58+0.30)   2.87(2.74+0.20) -58.2%

Kirill

^ permalink raw reply related [flat|nested] 62+ messages in thread
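The last column of the perf table above appears to be the relative change of the elapsed time between the two trees, printed with one decimal. A minimal sketch of that arithmetic, assuming the percentage is computed from the first (elapsed) number of each cell; `relative_change_percent()` is a hypothetical helper, not part of the perf suite:

```c
/*
 * Relative change of `after` versus `before`, in percent.
 * Negative values mean the new tree is faster.
 */
double relative_change_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the elapsed times from the table, `relative_change_percent(7.55, 3.25)` is roughly -56.95 (shown as -57.0%) and `relative_change_percent(6.86, 2.87)` is roughly -58.16 (shown as -58.2%).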
* [PATCH 2/2 v8] pack-objects: use reachability bitmap index when generating non-stdout pack 2016-09-10 14:59 ` Kirill Smelkov @ 2016-09-10 15:01 ` Kirill Smelkov 2016-09-12 19:21 ` [PATCH 2/2 v7] " Junio C Hamano 1 sibling, 0 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-09-10 15:01 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. 
So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. 
( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get a nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be a 30% - 50% speedup for pack extraction. git-backup extracts many packs when restoring repositories. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs using the bitmap index can pass --use-bitmap-index explicitly. And git-repack never passes --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-objects --stdout >file.pack` + `index-pack file.pack` is about 73s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects.
A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/pack-objects.c | 31 ++++++++++++++++++++++++------- t/perf/p5310-pack-bitmaps.sh | 14 +++++++++++++- t/t5310-pack-bitmaps.sh | 12 ++++++++++++ 3 files changed, 49 insertions(+), 8 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 19668d3..d48c290 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -67,7 +67,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2274,7 +2275,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= 
~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2523,13 +2524,13 @@ static void loosen_unused_packed_objects(struct rev_info *revs) } /* - * This tracks any options which a reader of the pack might - * not understand, and which would therefore prevent blind reuse - * of what we have on disk. + * This tracks any options which pack-reuse code expects to be on, or which a + * reader of the pack might not understand, and which would therefore prevent + * blind reuse of what we have on disk. */ static int pack_options_allow_reuse(void) { - return allow_ofs_delta; + return pack_to_stdout && allow_ofs_delta; } static int get_object_list_from_bitmap(struct rev_info *revs) @@ -2822,7 +2823,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh index de2a224..bb91dbb 100755 --- a/t/perf/p5310-pack-bitmaps.sh +++ b/t/perf/p5310-pack-bitmaps.sh @@ -32,6 +32,14 @@ test_perf 'simulated fetch' ' } | git pack-objects --revs --stdout >/dev/null ' +test_perf 'pack to file' ' + git pack-objects --all pack1 </dev/null >/dev/null +' + +test_perf 'pack to file (bitmap)' ' + git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null +' + test_expect_success 'create partial bitmap state' ' # pick a commit to represent the repo tip in the past cutoff=$(git rev-list HEAD~100 -1) && @@ -53,8 +61,12 @@ test_expect_success 'create partial bitmap state' ' git update-ref HEAD $orig_tip ' -test_perf 'partial bitmap' ' +test_perf 'clone (partial bitmap)' ' git pack-objects --stdout --all </dev/null >/dev/null ' +test_perf 'pack to file (partial bitmap)' ' + git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null +' + test_done diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index a278d30..9602e9a 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -196,6 +196,18 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' ! 
has_any packbitmap.objects 3b.objects ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + list_packed_objects <packa-$packasha1.idx >packa.objects && + list_packed_objects <packb-$packbsha1.idx >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack 2016-09-10 14:59 ` Kirill Smelkov 2016-09-10 15:01 ` [PATCH 2/2 v8] " Kirill Smelkov @ 2016-09-12 19:21 ` Junio C Hamano 1 sibling, 0 replies; 62+ messages in thread From: Junio C Hamano @ 2016-09-12 19:21 UTC (permalink / raw) To: Kirill Smelkov Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git Kirill Smelkov <kirr@nexedi.com> writes: >> This is v7, but as I understand your numbering, it goes with v5 of patch >> 1/2 that I just reviewed (usually we just increment the version number >> on the whole series and treat it as a unit, even if some patches didn't >> change from version to version). > > The reason those patches are having their own numbers is that they are > orthogonal to each other and can be applied / rejected independently. In such a case, we wouldn't label them 1/2 and 2/2, which tells the readers that these are two pieces that are to be applied together to form a single unit of change. That was what made these numbered patches with different version numbers confusing. > But ok, since now we have them considered both together, their next > versions posted will be uniform v8. OK. Thanks for clarifying. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental 2016-08-09 19:29 ` Kirill Smelkov 2016-08-09 19:31 ` [PATCH 1/2 v5] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Kirill Smelkov 2016-08-09 19:32 ` [PATCH 2/2 v7] pack-objects: use reachability bitmap index when generating non-stdout pack Kirill Smelkov @ 2016-08-09 19:49 ` Junio C Hamano 2 siblings, 0 replies; 62+ messages in thread From: Junio C Hamano @ 2016-08-09 19:49 UTC (permalink / raw) To: Kirill Smelkov Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git Kirill Smelkov <kirr@nexedi.com> writes: > On Tue, Aug 09, 2016 at 09:52:18AM -0700, Junio C Hamano wrote: >> "touch A" forces the readers to wonder "does the timestamp of A >> matter, and if so in what way?" and "does any later test care what >> is _in_ A, and if so in what way?" Both of them waste their >> time when there is no reason why "touch" should have been used. > > I see, thanks for explaining. I used to read it a bit the other way; Surely ">A" may invite "Hmm, is it important that A gets empty?", so the choice between the two is not so black-and-white. It is just that "touch" has a more specific "update the timestamp while keeping its contents intact" meaning, compared to ">A", which _could_ be read as "make it empty and update its mtime" but most people would not (i.e. "update its mtime" is a side effect for any modification). > Ok, makes sense. Both patches adjusted and will be reposted. Thanks. ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too 2016-07-29 7:40 ` Kirill Smelkov 2016-07-29 7:46 ` [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental Kirill Smelkov @ 2016-07-29 7:47 ` Kirill Smelkov 2016-08-08 13:56 ` Jeff King 1 sibling, 1 reply; 62+ messages in thread From: Kirill Smelkov @ 2016-07-29 7:47 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer.
Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we care it is not activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. 
This way all git tests pass, and for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-objects --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. 
that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. More context: http://article.gmane.org/gmane.comp.version-control.git/299063 http://article.gmane.org/gmane.comp.version-control.git/299107 http://article.gmane.org/gmane.comp.version-control.git/299420 http://article.gmane.org/gmane.comp.version-control.git/300217 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- Documentation/config.txt | 3 +++ builtin/pack-objects.c | 25 +++++++++++++++++++++---- t/t5310-pack-bitmaps.sh | 14 ++++++++++++++ 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 8b1aee4..6a903c0 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2244,6 +2244,9 @@ pack.useBitmaps:: to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps. pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 34b3019..2b2e74a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2264,7 +2265,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2527,7 +2528,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (prepare_bitmap_walk(revs) < 0) return -1; - if (pack_options_allow_reuse() && + if (pack_options_allow_reuse() && pack_to_stdout && !reuse_partial_packfile_from_bitmap( &reuse_packfile, &reuse_packfile_objects, @@ -2812,7 +2813,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). 
+ */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index a76f6ca..58c3b29 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -200,6 +200,20 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' ' mv alt_objects/pack/$packbitmap.* .git/objects/pack/ ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && + git verify-pack -v packa-$packasha1.pack >packa.verify && + git verify-pack -v packb-$packbsha1.pack >packb.verify && + grep -o "^$_x40" packa.verify |sort >packa.objects && + grep -o "^$_x40" packb.verify |sort >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.0.431.g3cb5c84 ^ permalink raw reply related [flat|nested] 62+ messages in thread
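The order in which the hunk above resolves use_bitmap_index (command line first, then the config default, then the "hard" overrides) can be modelled outside of C. The following is only a hedged sketch of that decision, not code from the patch: the function name and its positional arguments are hypothetical, and 1/0 stand for true/false.

```shell
#!/bin/sh
# Hypothetical model of the decision made in cmd_pack_objects().
# Arguments (1 = true, 0 = false):
#   $1 command-line choice: -1 if neither --use-bitmap-index nor
#      --no-use-bitmap-index was given, otherwise 1 or 0
#   $2 default taken from pack.useBitmaps
#   $3 pack_to_stdout
#   $4 write_bitmap_index
#   $5 use_internal_rev_list
#   $6 repository is shallow
resolve_use_bitmap_index () {
    use=$1 default=$2 to_stdout=$3 write_bitmap=$4 internal_revs=$5 shallow=$6

    # "soft" reason: packing to a file defaults to not using bitmaps
    if [ "$to_stdout" -eq 0 ]; then
        default=0
    fi

    # no explicit command-line choice: fall back to the default
    if [ "$use" -lt 0 ]; then
        use=$default
    fi

    # "hard" reasons: these combinations cannot work at all
    if [ "$internal_revs" -eq 0 ] ||
       { [ "$to_stdout" -eq 0 ] && [ "$write_bitmap" -ne 0 ]; } ||
       [ "$shallow" -ne 0 ]
    then
        use=0
    fi

    echo "$use"
}
```

In this model `resolve_use_bitmap_index -1 1 0 0 1 0` (packing to a file, no flag given) prints 0, while `resolve_use_bitmap_index 1 1 0 0 1 0` (same, with an explicit --use-bitmap-index) prints 1. Asking for a bitmap to be written for the resulting on-disk pack turns the result back to 0 regardless of the flag.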
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too 2016-07-29 7:47 ` [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Kirill Smelkov @ 2016-08-08 13:56 ` Jeff King 2016-08-08 15:40 ` Kirill Smelkov 0 siblings, 1 reply; 62+ messages in thread From: Jeff King @ 2016-08-08 13:56 UTC (permalink / raw) To: Kirill Smelkov Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Fri, Jul 29, 2016 at 10:47:46AM +0300, Kirill Smelkov wrote: > @@ -2527,7 +2528,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) > if (prepare_bitmap_walk(revs) < 0) > return -1; > > - if (pack_options_allow_reuse() && > + if (pack_options_allow_reuse() && pack_to_stdout && > !reuse_partial_packfile_from_bitmap( Should pack_to_stdout just be part of pack_options_allow_reuse()? > @@ -2812,7 +2813,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > unpack_unreachable_expiration = 0; > > - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) > + /* > + * "soft" reasons not to use bitmaps - for on-disk repack by default we want > + * > + * - to produce good pack (with bitmap index not-yet-packed objects are > + * packed in suboptimal order). > + * > + * - to use more robust pack-generation codepath (avoiding possible > + * bugs in bitmap code and possible bitmap index corruption). > + */ > + if (!pack_to_stdout) > + use_bitmap_index_default = 0; > + > + if (use_bitmap_index < 0) > + use_bitmap_index = use_bitmap_index_default; > + > + /* "hard" reasons not to use bitmaps; these just won't work at all */ > + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) > use_bitmap_index = 0; This all makes sense and looks good. 
> +test_expect_success 'pack-objects to file can use bitmap' ' > + # make sure we still have 1 bitmap index from previous tests > + ls .git/objects/pack/ | grep bitmap >output && > + test_line_count = 1 output && > + # verify equivalent packs are generated with/without using bitmap index > + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && > + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && > + git verify-pack -v packa-$packasha1.pack >packa.verify && > + git verify-pack -v packb-$packbsha1.pack >packb.verify && > + grep -o "^$_x40" packa.verify |sort >packa.objects && > + grep -o "^$_x40" packb.verify |sort >packb.objects && > + test_cmp packa.objects packb.objects > +' I don't think "grep -o" is portable. However, an easier way to do this is probably: # these are already in sorted order git show-index <packa-$packasha1.pack | cut -d' ' -f2 >packa.objects && git show-index <packb-$packbsha1.pack | cut -d' ' -f2 >packb.objects && test_cmp packa.objects packb.objects -Peff ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too 2016-08-08 13:56 ` Jeff King @ 2016-08-08 15:40 ` Kirill Smelkov 2016-08-08 18:08 ` Junio C Hamano 2016-08-08 18:55 ` [PATCH v5] pack-objects: teach " Kirill Smelkov 0 siblings, 2 replies; 62+ messages in thread From: Kirill Smelkov @ 2016-08-08 15:40 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git On Mon, Aug 08, 2016 at 09:56:00AM -0400, Jeff King wrote: > On Fri, Jul 29, 2016 at 10:47:46AM +0300, Kirill Smelkov wrote: > > > @@ -2527,7 +2528,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs) > > if (prepare_bitmap_walk(revs) < 0) > > return -1; > > > > - if (pack_options_allow_reuse() && > > + if (pack_options_allow_reuse() && pack_to_stdout && > > !reuse_partial_packfile_from_bitmap( > > Should pack_to_stdout just be part of pack_options_allow_reuse()? Yes, makes sense; thanks for catching this. > > @@ -2812,7 +2813,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > > if (!rev_list_all || !rev_list_reflog || !rev_list_index) > > unpack_unreachable_expiration = 0; > > > > - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) > > + /* > > + * "soft" reasons not to use bitmaps - for on-disk repack by default we want > > + * > > + * - to produce good pack (with bitmap index not-yet-packed objects are > > + * packed in suboptimal order). > > + * > > + * - to use more robust pack-generation codepath (avoiding possible > > + * bugs in bitmap code and possible bitmap index corruption). 
> > + */ > > + if (!pack_to_stdout) > > + use_bitmap_index_default = 0; > > + > > + if (use_bitmap_index < 0) > > + use_bitmap_index = use_bitmap_index_default; > > + > > + /* "hard" reasons not to use bitmaps; these just won't work at all */ > > + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) > > use_bitmap_index = 0; > > This all makes sense and looks good. Thanks. > > +test_expect_success 'pack-objects to file can use bitmap' ' > > + # make sure we still have 1 bitmap index from previous tests > > + ls .git/objects/pack/ | grep bitmap >output && > > + test_line_count = 1 output && > > + # verify equivalent packs are generated with/without using bitmap index > > + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && > > + packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) && > > + git verify-pack -v packa-$packasha1.pack >packa.verify && > > + git verify-pack -v packb-$packbsha1.pack >packb.verify && > > + grep -o "^$_x40" packa.verify |sort >packa.objects && > > + grep -o "^$_x40" packb.verify |sort >packb.objects && > > + test_cmp packa.objects packb.objects > > +' > > I don't think "grep -o" is portable. However, an easier way to do this > is probably: > > # these are already in sorted order > git show-index <packa-$packasha1.pack | cut -d' ' -f2 >packa.objects && > git show-index <packb-$packbsha1.pack | cut -d' ' -f2 >packb.objects && > test_cmp packa.objects packb.objects Thanks for the info. I did not knew about show-index when I was starting to work on this and later it just came out of sight. Please find corrected patch below. 
---- 8< ---- From: Kirill Smelkov <kirr@nexedi.com> Date: Fri, 29 Jul 2016 10:47:46 +0300 Subject: [PATCH v5] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. 
So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we care it is not activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: Current code has singleton bitmap_git so cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because on pack reuse raw entries are directly written out to destination pack by write_reused_pack() bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. 
[1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-objects --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. That's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. 
More context: http://marc.info/?t=146792101400001&r=1&w=2 Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> --- Documentation/config.txt | 3 +++ builtin/pack-objects.c | 31 ++++++++++++++++++++++++------- t/t5310-pack-bitmaps.sh | 12 ++++++++++++ 3 files changed, 39 insertions(+), 7 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index bc1c433..4ba0c4a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2244,6 +2244,9 @@ pack.useBitmaps:: to stdout (e.g., during the server side of a fetch). Defaults to true. You should not generally need to turn this off unless you are debugging pack bitmaps. ++ +*NOTE*: when packing to file (e.g., on repack) the default is always not to use + pack bitmaps. pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 92e2e5f..0a89e8d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile; static uint32_t reuse_packfile_objects; static off_t reuse_packfile_offset; -static int use_bitmap_index = 1; +static int use_bitmap_index_default = 1; +static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; @@ -2226,7 +2227,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE; } if (!strcmp(k, "pack.usebitmaps")) { - use_bitmap_index = git_config_bool(k, v); + use_bitmap_index_default = git_config_bool(k, v); return 0; } if (!strcmp(k, "pack.threads")) { @@ -2475,13 +2476,13 @@ static void loosen_unused_packed_objects(struct rev_info *revs) } /* - * This tracks any options which a reader of the pack might - * not understand, and which would therefore prevent blind reuse - * of what we have on disk. 
+ * This tracks any options which pack-reuse code expects to be on, or which a + * reader of the pack might not understand, and which would therefore prevent + * blind reuse of what we have on disk. */ static int pack_options_allow_reuse(void) { - return allow_ofs_delta; + return pack_to_stdout && allow_ofs_delta; } static int get_object_list_from_bitmap(struct rev_info *revs) @@ -2774,7 +2775,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; - if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow()) + /* + * "soft" reasons not to use bitmaps - for on-disk repack by default we want + * + * - to produce good pack (with bitmap index not-yet-packed objects are + * packed in suboptimal order). + * + * - to use more robust pack-generation codepath (avoiding possible + * bugs in bitmap code and possible bitmap index corruption). + */ + if (!pack_to_stdout) + use_bitmap_index_default = 0; + + if (use_bitmap_index < 0) + use_bitmap_index = use_bitmap_index_default; + + /* "hard" reasons not to use bitmaps; these just won't work at all */ + if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow()) use_bitmap_index = 0; if (pack_to_stdout || !rev_list_all) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 3893afd..ffecc6a 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -118,6 +118,18 @@ test_expect_success 'incremental repack can disable bitmaps' ' git repack -d --no-write-bitmap-index ' +test_expect_success 'pack-objects to file can use bitmap' ' + # make sure we still have 1 bitmap index from previous tests + ls .git/objects/pack/ | grep bitmap >output && + test_line_count = 1 output && + # verify equivalent packs are generated with/without using bitmap index + packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) && + packbsha1=$(git 
pack-objects --use-bitmap-index --all packb </dev/null) && + git show-index <packa-$packasha1.idx | cut -d" " -f2 >packa.objects && + git show-index <packb-$packbsha1.idx | cut -d" " -f2 >packb.objects && + test_cmp packa.objects packb.objects +' + test_expect_success 'full repack, reusing previous bitmaps' ' git repack -ad && ls .git/objects/pack/ | grep bitmap >output && -- 2.9.2.701.gf965a18.dirty ^ permalink raw reply related [flat|nested] 62+ messages in thread
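The comparison in the v5 test above relies on `git show-index`, which reads a pack index (.idx) on standard input and prints one line per object with the object name in the second column. Note that the test uses `cut -d" "` because the test body itself sits inside single quotes. The following is a minimal sketch of the same idea as standalone helpers; the names `object_names` and `same_objects` are hypothetical and not part of git or its test library.

```shell
#!/bin/sh
# Print only the object-name column from `git show-index` output on stdin.
# Each input line is expected to look like "<offset> <object-name> (<crc32>)".
object_names () {
    cut -d" " -f2
}

# Hypothetical helper: succeed if two .idx files list the same objects.
# show-index emits entries in index order (sorted by object name), so the
# two lists can be compared directly without an extra sort.
same_objects () {
    git show-index <"$1" | object_names >"$1.objects" &&
    git show-index <"$2" | object_names >"$2.objects" &&
    cmp -s "$1.objects" "$2.objects"
}
```

Assuming two packs were written as in the test, a call would look like `same_objects packa-$packasha1.idx packb-$packbsha1.idx`. Splitting out the column extraction keeps the parsing step checkable without a repository.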
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 15:40 ` Kirill Smelkov
@ 2016-08-08 18:08 ` Junio C Hamano
  2016-08-08 18:13 ` Kirill Smelkov
  2016-08-08 18:55 ` [PATCH v5] pack-objects: teach " Kirill Smelkov
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-08-08 18:08 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> Thanks for the info. I did not knew about show-index when I was starting
> to work on this and later it just came out of sight. Please find
> corrected patch below.
>
> ---- 8< ----
> From: Kirill Smelkov <kirr@nexedi.com>
> Date: Fri, 29 Jul 2016 10:47:46 +0300
> Subject: [PATCH v5] pack-objects: Teach it to use reachability bitmap index when
>  generating non-stdout pack too

Please don't do this (not the patch text itself, but saying "Please
find ..." and attaching the patch AFTER 60+ lines of response).
When going through old/read messages to see if there are patches
that fell through the cracks, if it is not immediately clear in the
top part of the message that it contains an updated patch, such a
patch will certainly be missed.

Please say "I'll follow up with a corrected patch" instead of
"Please find ..." and respond to that message with just the patch.

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 18:08 ` Junio C Hamano
@ 2016-08-08 18:13 ` Kirill Smelkov
  2016-08-08 18:28 ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-08 18:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 11:08:34AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> > Thanks for the info. I did not knew about show-index when I was starting
> > to work on this and later it just came out of sight. Please find
> > corrected patch below.
> >
> > ---- 8< ----
> > From: Kirill Smelkov <kirr@nexedi.com>
> > Date: Fri, 29 Jul 2016 10:47:46 +0300
> > Subject: [PATCH v5] pack-objects: Teach it to use reachability bitmap index when
> >  generating non-stdout pack too
>
> Please don't do this (not the patch text itself, but saying "Please
> find ..." and attaching the patch AFTER 60+ lines of response).
> When going through old/read messages to see if there are patches
> that fell through the cracks, if it is not immediately clear in the
> top part of the message that it contains an updated patch, such a
> patch will certainly be missed.
>
> Please say "I'll follow up with a corrected patch" instead of
> "Please find ..." and respond to that message with just the patch.

Ok, I see. Should I resend this v5 as separated one or only starting from next time?

Another question: I'm preparing another version of "pack-objects: Teach
--use-bitmap-index codepath to respect --local ..." and was going to
put

    ( updated patch is in the end of this mail )

in the top of the message. Is it ok or better not to do so and just respin
the patch in its own separate mail?

Thanks beforehand for clarifying,
Kirill

P.S. I put updated patches in the same mail not because I'm trying to make maintainer's life harder, but because this is the way I would expect and prefer them to be coming to me...

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 18:13 ` Kirill Smelkov
@ 2016-08-08 18:28 ` Junio C Hamano
  2016-08-08 18:58 ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-08-08 18:28 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> Another question: I'm preparing another version of "pack-objects: Teach
> --use-bitmap-index codepath to respect --local ..." and was going to
> put
>
> ( updated patch is in the end of this mail )
>
> in the top of the message. Is it ok or better not to do so and just respin
> the patch in its own separate mail?

That would force those who pick leftover bits to _open_ and read a
first few lines.

Definitely it is better than burying a patch after 60+ lines, but a
separate patch with incremented "[PATCH v6 1/2]" on the subject line
beats it hands-down from discoverability's point of view.

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 18:28 ` Junio C Hamano
@ 2016-08-08 18:58 ` Kirill Smelkov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-08 18:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet, Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 11:28:02AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> > Another question: I'm preparing another version of "pack-objects: Teach
> > --use-bitmap-index codepath to respect --local ..." and was going to
> > put
> >
> > ( updated patch is in the end of this mail )
> >
> > in the top of the message. Is it ok or better not to do so and just respin
> > the patch in its own separate mail?
>
> That would force those who pick leftover bits to _open_ and read a
> first few lines.
>
> Definitely it is better than burying a patch after 60+ lines, but a
> separate patch with incremented "[PATCH v6 1/2]" on the subject line
> beats it hands-down from discoverability's point of view.

Thanks, I see. I've resent both patches as separate mails.

^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v5] pack-objects: teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 15:40 ` Kirill Smelkov
  2016-08-08 18:08   ` Junio C Hamano
@ 2016-08-08 18:55   ` Kirill Smelkov
  2016-08-08 20:53     ` Junio C Hamano
  1 sibling, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-08 18:55 UTC
To: Junio C Hamano
Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov

Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speed up
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

The reason here is for on-disk repack by default we want:

- to produce good pack (with bitmap index not-yet-packed objects are
  emitted to pack in suboptimal order).

- to use more robust pack-generation codepath (avoiding possible
  bugs in bitmap code and possible bitmap index corruption).

Jeff King further explains:

    The reason for this split is that pack-objects tries to determine
    how "careful" it should be based on whether we are packing to disk
    or to stdout. Packing to disk implies "git repack", and that we will
    likely delete the old packs after finishing. We want to be more
    careful (so as not to carry forward a corruption, and to generate a
    more optimal pack), and we presumably run less frequently and can
    afford extra CPU. Whereas packing to stdout implies serving a remote
    via "git fetch" or "git push". This happens more frequently (e.g., a
    server handling many fetching clients), and we assume the receiving
    end takes more responsibility for verifying the data.

    But this isn't always the case. One might want to generate on-disk
    packfiles for a specialized object transfer. Just using "--stdout"
    and writing to a file is not optimal, as it will not generate the
    matching pack index.

    So it would be useful to have some way of overriding this heuristic:
    to tell pack-objects that even though it should generate on-disk
    files, it is still OK to use the reachability bitmaps to do the
    traversal.

So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we care it is not activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

  We also want to avoid (at least with current implementation)
  generating bitmaps off of bitmaps. The reason here is: when generating
  a pack, not-yet-packed objects will be emitted into pack in
  suboptimal order and added to tail of the bitmap as "extended entries".
  When the resultant pack + some new objects in associated repository
  are in turn used to generate another pack with bitmap, the situation
  repeats: new objects are again not emitted optimally and just added to
  bitmap tail - not in recency order.

  So the pack badness can grow over time when at each step we have
  bitmapped pack + some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps, not to let pack badness grow.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to
  destination pack by write_reused_pack() bypassing needed for pack index
  generation bookkeeping done by regular codepath in write_one() and
  friends.

This way for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack never passes
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is
much slower than `git pack-objects file.pack`. Extracting erp5.git pack
from lab.nexedi.com backup repository:

    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real    0m22.309s
    user    0m21.148s
    sys     0m0.932s

    $ time git index-pack erp5pack-stdout.pack

    real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
    user    0m49.300s
    sys     0m1.360s

So the time for

    `pack-objects --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index      is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`               is  37s.

Jeff explains:

    The packfile does not carry the sha1 of the objects. A receiving
    index-pack has to compute them itself, including inflating and
    applying all of the deltas.

that's why for `git-backup restore` we want to teach
`git pack-objects file.pack` to use bitmaps instead of using
`git pack-objects --stdout >file.pack` + `git index-pack file.pack`.

More context:

    http://marc.info/?t=146792101400001&r=1&w=2

Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 Documentation/config.txt |  3 +++
 builtin/pack-objects.c   | 31 ++++++++++++++++++++++++-------
 t/t5310-pack-bitmaps.sh  | 12 ++++++++++++
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bc1c433..4ba0c4a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2244,6 +2244,9 @@ pack.useBitmaps::
 	to stdout (e.g., during the server side of a fetch). Defaults to
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
++
+*NOTE*: when packing to file (e.g., on repack) the default is always not to use
+	pack bitmaps.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 92e2e5f..0a89e8d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -66,7 +66,8 @@ static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
 static off_t reuse_packfile_offset;
 
-static int use_bitmap_index = 1;
+static int use_bitmap_index_default = 1;
+static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
@@ -2226,7 +2227,7 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
 	if (!strcmp(k, "pack.usebitmaps")) {
-		use_bitmap_index = git_config_bool(k, v);
+		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -2475,13 +2476,13 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 }
 
 /*
- * This tracks any options which a reader of the pack might
- * not understand, and which would therefore prevent blind reuse
- * of what we have on disk.
+ * This tracks any options which pack-reuse code expects to be on, or which a
+ * reader of the pack might not understand, and which would therefore prevent
+ * blind reuse of what we have on disk.
  */
 static int pack_options_allow_reuse(void)
 {
-	return allow_ofs_delta;
+	return pack_to_stdout && allow_ofs_delta;
 }
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
@@ -2774,7 +2775,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
+	/*
+	 * "soft" reasons not to use bitmaps - for on-disk repack by default we want
+	 *
+	 * - to produce good pack (with bitmap index not-yet-packed objects are
+	 *   packed in suboptimal order).
+	 *
+	 * - to use more robust pack-generation codepath (avoiding possible
+	 *   bugs in bitmap code and possible bitmap index corruption).
+	 */
+	if (!pack_to_stdout)
+		use_bitmap_index_default = 0;
+
+	if (use_bitmap_index < 0)
+		use_bitmap_index = use_bitmap_index_default;
+
+	/* "hard" reasons not to use bitmaps; these just won't work at all */
+	if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
 		use_bitmap_index = 0;
 
 	if (pack_to_stdout || !rev_list_all)
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 3893afd..ffecc6a 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -118,6 +118,18 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
+test_expect_success 'pack-objects to file can use bitmap' '
+	# make sure we still have 1 bitmap index from previous tests
+	ls .git/objects/pack/ | grep bitmap >output &&
+	test_line_count = 1 output &&
+	# verify equivalent packs are generated with/without using bitmap index
+	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+	git show-index <packa-$packasha1.idx | cut -d" " -f2 >packa.objects &&
+	git show-index <packb-$packbsha1.idx | cut -d" " -f2 >packb.objects &&
+	test_cmp packa.objects packb.objects
+'
+
 test_expect_success 'full repack, reusing previous bitmaps' '
 	git repack -ad &&
 	ls .git/objects/pack/ | grep bitmap >output &&
-- 
2.9.2.701.gf965a18.dirty

* Re: [PATCH v5] pack-objects: teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 18:55 ` [PATCH v5] pack-objects: teach " Kirill Smelkov
@ 2016-08-08 20:53 ` Junio C Hamano
  2016-08-09 11:21   ` Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2016-08-08 20:53 UTC
To: Kirill Smelkov
Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

Kirill Smelkov <kirr@nexedi.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index bc1c433..4ba0c4a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2244,6 +2244,9 @@ pack.useBitmaps::
>  	to stdout (e.g., during the server side of a fetch). Defaults to
>  	true. You should not generally need to turn this off unless
>  	you are debugging pack bitmaps.
> ++
> +*NOTE*: when packing to file (e.g., on repack) the default is always not to use
> +	pack bitmaps.

This is a bit hard to read and understand.

The patched result starts with "When true, git will use bitmap when
packing to stdout", i.e. when packing to file, git will not. So
this *NOTE* is repeating the same thing. The reader is made to
wonder "Why does it need to repeat the same thing? Does this mean
when the variable is set, a pack sent to a disk uses the bitmap?"

I think what you actually do in the code is to make the variable
affect _only_ the standard-output case, and users need a command
line option if they want to use bitmap when writing to a file (the
code to do so looks correctly done).

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..ffecc6a 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -118,6 +118,18 @@ test_expect_success 'incremental repack can disable bitmaps' '
>  	git repack -d --no-write-bitmap-index
>  '
>
> +test_expect_success 'pack-objects to file can use bitmap' '
> +	# make sure we still have 1 bitmap index from previous tests
> +	ls .git/objects/pack/ | grep bitmap >output &&
> +	test_line_count = 1 output &&
> +	# verify equivalent packs are generated with/without using bitmap index
> +	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
> +	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
> +	git show-index <packa-$packasha1.idx | cut -d" " -f2 >packa.objects &&
> +	git show-index <packb-$packbsha1.idx | cut -d" " -f2 >packb.objects &&
> +	test_cmp packa.objects packb.objects
> +'

Looks good.

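The semantics described above (pack.useBitmaps only sets the default,
and only for the stdout case; an explicit --use-bitmap-index or
--no-use-bitmap-index on the command line wins over that default; a few
conditions turn bitmaps off unconditionally) can be sketched as one
small C function. This is a minimal sketch, not code from git: the name
resolve_use_bitmap_index() and all of its parameters are hypothetical,
and only the order of the checks is taken from the cmd_pack_objects()
hunk in the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of git. It mirrors the order of the
 * checks the patch adds to cmd_pack_objects().
 *
 *   config_default  value of pack.useBitmaps (1 when the key is unset)
 *   cli_choice      -1 if neither --use-bitmap-index nor
 *                   --no-use-bitmap-index was given, otherwise 0 or 1
 */
int resolve_use_bitmap_index(int config_default, int cli_choice,
			     int pack_to_stdout,
			     int use_internal_rev_list,
			     int write_bitmap_index,
			     int repository_is_shallow)
{
	int use_bitmap_index;

	assert(cli_choice >= -1 && cli_choice <= 1);

	/* "soft" reason: packing to a file defaults to not using bitmaps */
	if (!pack_to_stdout)
		config_default = 0;

	/* an explicit command-line choice overrides the default */
	use_bitmap_index = cli_choice < 0 ? config_default : cli_choice;

	/* "hard" reasons: bitmaps cannot work at all in these cases */
	if (!use_internal_rev_list ||
	    (!pack_to_stdout && write_bitmap_index) ||
	    repository_is_shallow)
		use_bitmap_index = 0;

	return use_bitmap_index;
}
```

Under these assumptions, resolve_use_bitmap_index(1, -1, 0, 1, 0, 0)
yields 0 (packing to a file with no option given), while the same call
with cli_choice set to 1 yields 1, which is the case the new test in
t5310 exercises.
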
* Re: [PATCH v5] pack-objects: teach it to use reachability bitmap index when generating non-stdout pack too
  2016-08-08 20:53 ` Junio C Hamano
@ 2016-08-09 11:21 ` Kirill Smelkov
  2016-08-09 11:26   ` [PATCH 2/2 v6] pack-objects: use reachability bitmap index when generating non-stdout pack Kirill Smelkov
  0 siblings, 1 reply; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-09 11:21 UTC
To: Junio C Hamano
Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git

On Mon, Aug 08, 2016 at 01:53:20PM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@nexedi.com> writes:
>
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index bc1c433..4ba0c4a 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -2244,6 +2244,9 @@ pack.useBitmaps::
> >  	to stdout (e.g., during the server side of a fetch). Defaults to
> >  	true. You should not generally need to turn this off unless
> >  	you are debugging pack bitmaps.
> > ++
> > +*NOTE*: when packing to file (e.g., on repack) the default is always not to use
> > +	pack bitmaps.
>
> This is a bit hard to read and understand.
>
> The patched result starts with "When true, git will use bitmap when
> packing to stdout", i.e. when packing to file, git will not. So
> this *NOTE* is repeating the same thing. The reader is made to
> wonder "Why does it need to repeat the same thing? Does this mean
> when the variable is set, a pack sent to a disk uses the bitmap?"
>
> I think what you actually do in the code is to make the variable
> affect _only_ the standard-output case, and users need a command
> line option if they want to use bitmap when writing to a file (the
> code to do so looks correctly done).

Yes, that is how it is programmed. But I added the note because, to me,
it is only implicit that "When true, git will use bitmap when packing to
stdout" means 1) the default for packing-to-file is different and 2)
there is no way to set the default for packing-to-file. That's why I
added the explicit info.

And especially since the config name "pack.useBitmaps" does not contain
"stdout" at all, it can be very confusing to people looking at this for
the first time (at least it was for me). (Also please recall you
wondering why 6b8fda2d added bitmap support only for the to-stdout case,
without even mentioning why it is done only for that case and not for
the to-file case.)

I do not insist on the note, however - I only thought it is better to
have it - so if you prefer we go without it, let us drop this note. Will
send v6 as a reply to this mail, with the interdiff below.

Thanks,
Kirill

---- 8< ----
(interdiff)

--- b/Documentation/config.txt
+++ a/Documentation/config.txt
@@ -2246,9 +2246,6 @@
 	to stdout (e.g., during the server side of a fetch). Defaults to
 	true. You should not generally need to turn this off unless
 	you are debugging pack bitmaps.
-+
-*NOTE*: when packing to file (e.g., on repack) the default is always not to use
-	pack bitmaps.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff -u b/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
--- b/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -219,8 +219,8 @@
 	# verify equivalent packs are generated with/without using bitmap index
 	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
 	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	git show-index <packa-$packasha1.idx | cut -d" " -f2 >packa.objects &&
-	git show-index <packb-$packbsha1.idx | cut -d" " -f2 >packb.objects &&
+	pack_list_objects <packa-$packasha1.idx >packa.objects &&
+	pack_list_objects <packb-$packbsha1.idx >packb.objects &&
 	test_cmp packa.objects packb.objects
 '

* [PATCH 2/2 v6] pack-objects: use reachability bitmap index when generating non-stdout pack
  2016-08-09 11:21 ` Kirill Smelkov
@ 2016-08-09 11:26 ` Kirill Smelkov
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Smelkov @ 2016-08-09 11:26 UTC
To: Junio C Hamano
Cc: Jeff King, Vicent Marti, Jérome Perrin, Isabelle Vallet,
	Kazuhiko Shiozaki, Julien Muchembled, git, Kirill Smelkov

Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speed up
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

The reason here is for on-disk repack by default we want:

- to produce good pack (with bitmap index not-yet-packed objects are
  emitted to pack in suboptimal order).

- to use more robust pack-generation codepath (avoiding possible
  bugs in bitmap code and possible bitmap index corruption).

Jeff King further explains:

    The reason for this split is that pack-objects tries to determine
    how "careful" it should be based on whether we are packing to disk
    or to stdout. Packing to disk implies "git repack", and that we will
    likely delete the old packs after finishing. We want to be more
    careful (so as not to carry forward a corruption, and to generate a
    more optimal pack), and we presumably run less frequently and can
    afford extra CPU. Whereas packing to stdout implies serving a remote
    via "git fetch" or "git push". This happens more frequently (e.g., a
    server handling many fetching clients), and we assume the receiving
    end takes more responsibility for verifying the data.

    But this isn't always the case. One might want to generate on-disk
    packfiles for a specialized object transfer. Just using "--stdout"
    and writing to a file is not optimal, as it will not generate the
    matching pack index.

    So it would be useful to have some way of overriding this heuristic:
    to tell pack-objects that even though it should generate on-disk
    files, it is still OK to use the reachability bitmaps to do the
    traversal.

So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we care it is not activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

  We also want to avoid (at least with current implementation)
  generating bitmaps off of bitmaps. The reason here is: when generating
  a pack, not-yet-packed objects will be emitted into pack in
  suboptimal order and added to tail of the bitmap as "extended entries".
  When the resultant pack + some new objects in associated repository
  are in turn used to generate another pack with bitmap, the situation
  repeats: new objects are again not emitted optimally and just added to
  bitmap tail - not in recency order.

  So the pack badness can grow over time when at each step we have
  bitmapped pack + some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps, not to let pack badness grow.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to
  destination pack by write_reused_pack() bypassing needed for pack index
  generation bookkeeping done by regular codepath in write_one() and
  friends.

This way for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack never passes
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is
much slower than `git pack-objects file.pack`. Extracting erp5.git pack
from lab.nexedi.com backup repository:

    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real    0m22.309s
    user    0m21.148s
    sys     0m0.932s

    $ time git index-pack erp5pack-stdout.pack

    real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
    user    0m49.300s
    sys     0m1.360s

So the time for

    `pack-objects --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index      is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`               is  37s.

Jeff explains:

    The packfile does not carry the sha1 of the objects. A receiving
    index-pack has to compute them itself, including inflating and
    applying all of the deltas.

that's why for `git-backup restore` we want to teach
`git pack-objects file.pack` to use bitmaps instead of using
`git pack-objects --stdout >file.pack` + `git index-pack file.pack`.

More context:

    http://marc.info/?t=146792101400001&r=1&w=2

Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 builtin/pack-objects.c  | 31 ++++++++++++++++++++++++-------
 t/t5310-pack-bitmaps.sh | 12 ++++++++++++
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b1007f2..c92d7fc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -67,7 +67,8 @@ static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
 static off_t reuse_packfile_offset;
 
-static int use_bitmap_index = 1;
+static int use_bitmap_index_default = 1;
+static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
@@ -2270,7 +2271,7 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
 	if (!strcmp(k, "pack.usebitmaps")) {
-		use_bitmap_index = git_config_bool(k, v);
+		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -2519,13 +2520,13 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 }
 
 /*
- * This tracks any options which a reader of the pack might
- * not understand, and which would therefore prevent blind reuse
- * of what we have on disk.
+ * This tracks any options which pack-reuse code expects to be on, or which a
+ * reader of the pack might not understand, and which would therefore prevent
+ * blind reuse of what we have on disk.
  */
 static int pack_options_allow_reuse(void)
 {
-	return allow_ofs_delta;
+	return pack_to_stdout && allow_ofs_delta;
 }
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
@@ -2818,7 +2819,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
+	/*
+	 * "soft" reasons not to use bitmaps - for on-disk repack by default we want
+	 *
+	 * - to produce good pack (with bitmap index not-yet-packed objects are
+	 *   packed in suboptimal order).
+	 *
+	 * - to use more robust pack-generation codepath (avoiding possible
+	 *   bugs in bitmap code and possible bitmap index corruption).
+	 */
+	if (!pack_to_stdout)
+		use_bitmap_index_default = 0;
+
+	if (use_bitmap_index < 0)
+		use_bitmap_index = use_bitmap_index_default;
+
+	/* "hard" reasons not to use bitmaps; these just won't work at all */
+	if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
 		use_bitmap_index = 0;
 
 	if (pack_to_stdout || !rev_list_all)
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a50d867..44914ac 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -196,6 +196,18 @@ test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
 	! has_any packbitmap.objects 3b.objects
 '
 
+test_expect_success 'pack-objects to file can use bitmap' '
+	# make sure we still have 1 bitmap index from previous tests
+	ls .git/objects/pack/ | grep bitmap >output &&
+	test_line_count = 1 output &&
+	# verify equivalent packs are generated with/without using bitmap index
+	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+	pack_list_objects <packa-$packasha1.idx >packa.objects &&
+	pack_list_objects <packb-$packbsha1.idx >packb.objects &&
+	test_cmp packa.objects packb.objects
+'
+
 test_expect_success 'full repack, reusing previous bitmaps' '
 	git repack -ad &&
 	ls .git/objects/pack/ | grep bitmap >output &&
-- 
2.9.2.701.gf965a18.dirty

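The "30% - 50%" figure in the commit message above follows from plain
arithmetic on the quoted timings. A minimal sketch of that calculation,
assuming the speedup is meant as the fraction of wall-clock time saved;
percent_saved() is a hypothetical helper used only for illustration:

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of wall-clock time saved when a run
 * that used to take before_s seconds now takes after_s seconds.
 */
double percent_saved(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return 100.0 * (before_s - after_s) / before_s;
}
```

With the numbers from the message, percent_saved(37.2, 26.2) is about
29.6 for erp5.git and percent_saved(7.1, 3.6) is about 49.3 for git.git,
which matches the stated range. Likewise, the two "real" times in NOTE2
add up to 22.309 + 50.873 = 73.182 seconds, close to the 72s quoted for
the two-step `--stdout` + `index-pack` path.
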