* [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs @ 2017-02-25 1:18 Jonathan Tan 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Jonathan Tan @ 2017-02-25 1:18 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, gitster, peff, peartben, benpeart As stated in a previous e-mail [1], I was trying to think a way to allow Git to fetch arbitrary blobs from another Git server, and it turned out that fetch-pack already can. However, there were some bugs with blob reachability. This patch set fixes those bugs, and verifies (with tests) that fetch-pack can fetch reachable blobs and cannot fetch unreachable blobs. These patches are (I think) worthwhile on their own, but may be of special interest to people who need Git to tolerate missing objects in the local repo (for example, the e-mail discussion "[RFC] Add support for downloading blobs on demand" [2]) because a way for Git to download missing objects natively is (I think) a prerequisite to that. [1] <20170223230358.30050-1-jonathantanmy@google.com> [2] <20170113155253.1644-1-benpeart@microsoft.com> Jonathan Tan (3): revision: unify {tree,blob}_objects in rev_info revision: exclude trees/blobs given commit upload-pack: compute blob reachability correctly bisect.c | 2 +- builtin/rev-list.c | 6 ++-- list-objects.c | 4 +-- pack-bitmap-write.c | 3 +- pack-bitmap.c | 3 +- reachable.c | 3 +- revision.c | 18 +++++----- revision.h | 3 +- t/t5500-fetch-pack.sh | 30 +++++++++++++++++ t/t6000-rev-list-misc.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ upload-pack.c | 15 +++++++++ 11 files changed, 151 insertions(+), 24 deletions(-) -- 2.11.0.483.g087da7b7c-goog ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info 2017-02-25 1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan @ 2017-02-25 1:18 ` Jonathan Tan 2017-02-28 21:42 ` Junio C Hamano 2017-02-28 22:06 ` Jeff King 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan 2017-02-25 1:18 ` [PATCH 3/3] upload-pack: compute blob reachability correctly Jonathan Tan 2 siblings, 2 replies; 17+ messages in thread From: Jonathan Tan @ 2017-02-25 1:18 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, gitster, peff, peartben, benpeart Whenever tree_objects is set to 1 in revision.h's struct rev_info, blob_objects is likewise set, and vice versa. Combine those two fields into one. Some of the existing code does not handle tree_objects being different from blob_objects properly. For example, "handle_commit" in revision.c recurses from an UNINTERESTING tree into its subtree if tree_objects == 1, completely ignoring blob_objects; it probably should still recurse if tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should behave differently depending on blob_objects (controlling the instantiation and marking of blob objects). This commit resolves the issue by forbidding tree_objects from being different to blob_objects. It could be argued that in the future, Git might need to distinguish tree_objects from blob_objects - in particular, a user might want rev-list to print the trees but not the blobs. However, this results in a minor performance savings at best in that objects no longer need to be instantiated (causing memory allocations and hashtable insertions) - no disk reads are being done for objects whether blob_objects is set or not. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- bisect.c | 2 +- builtin/rev-list.c | 6 +++--- list-objects.c | 4 ++-- pack-bitmap-write.c | 3 +-- pack-bitmap.c | 3 +-- reachable.c | 3 +-- revision.c | 16 ++++++---------- revision.h | 3 +-- 8 files changed, 16 insertions(+), 24 deletions(-) diff --git a/bisect.c b/bisect.c index 8e63c40d2..265b32905 100644 --- a/bisect.c +++ b/bisect.c @@ -634,7 +634,7 @@ static void bisect_common(struct rev_info *revs) { if (prepare_revision_walk(revs)) die("revision walk setup failed"); - if (revs->tree_objects) + if (revs->tree_and_blob_objects) mark_edges_uninteresting(revs, NULL); } diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 0aa93d589..6c2651b31 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -345,7 +345,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) revs.commit_format = CMIT_FMT_RAW; if ((!revs.commits && - (!(revs.tag_objects || revs.tree_objects || revs.blob_objects) && + (!(revs.tag_objects || revs.tree_and_blob_objects) && !revs.pending.nr)) || revs.diff) usage(rev_list_usage); @@ -374,7 +374,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) return 0; } } else if (revs.max_count < 0 && - revs.tag_objects && revs.tree_objects && revs.blob_objects) { + revs.tag_objects && revs.tree_and_blob_objects) { if (!prepare_bitmap_walk(&revs)) { traverse_bitmap_commit_list(&show_object_fast); return 0; @@ -384,7 +384,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (prepare_revision_walk(&revs)) die("revision walk setup failed"); - if (revs.tree_objects) + if (revs.tree_and_blob_objects) mark_edges_uninteresting(&revs, show_edge); if (bisect_list) { diff --git a/list-objects.c b/list-objects.c index f3ca6aafb..796957105 100644 --- a/list-objects.c +++ b/list-objects.c @@ -18,7 +18,7 @@ static void process_blob(struct rev_info *revs, struct object *obj = &blob->object; size_t pathlen; - if (!revs->blob_objects) + if (!revs->tree_and_blob_objects) return; if (!obj) die("bad blob object"); @@ -78,7 +78,7 @@ static void process_tree(struct rev_info *revs, all_entries_interesting: entry_not_interesting; int baselen = base->len; - if (!revs->tree_objects) + if (!revs->tree_and_blob_objects) return; if (!obj) die("bad tree object"); diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 970559601..5080e276b 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -258,8 +258,7 @@ void bitmap_writer_build(struct packing_data *to_pack) init_revisions(&revs, NULL); revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; + revs.tree_and_blob_objects = 1; revs.no_walk = 0; revs.include_check = should_include; diff --git a/pack-bitmap.c b/pack-bitmap.c index 39bcc1684..445a24e0d 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -951,8 +951,7 @@ void test_bitmap_walk(struct rev_info *revs) die("Commit %s doesn't have an indexed bitmap", oid_to_hex(&root->oid)); revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; result_popcnt = bitmap_popcount(result); diff --git a/reachable.c b/reachable.c index d0199cace..9a3beb75f 100644 --- a/reachable.c +++ b/reachable.c @@ -166,8 +166,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog, * in all object types, not just commits. */ revs->tag_objects = 1; - revs->blob_objects = 1; - revs->tree_objects = 1; + revs->tree_and_blob_objects = 1; /* Add all refs from the index file */ add_index_objects_to_pending(revs, 0); diff --git a/revision.c b/revision.c index b37dbec37..5e49d9e0e 100644 --- a/revision.c +++ b/revision.c @@ -267,7 +267,7 @@ static struct commit *handle_commit(struct rev_info *revs, */ if (object->type == OBJ_TREE) { struct tree *tree = (struct tree *)object; - if (!revs->tree_objects) + if (!revs->tree_and_blob_objects) return NULL; if (flags & UNINTERESTING) { mark_tree_contents_uninteresting(tree); @@ -281,7 +281,7 @@ static struct commit *handle_commit(struct rev_info *revs, * Blob object? You know the drill by now.. */ if (object->type == OBJ_BLOB) { - if (!revs->blob_objects) + if (!revs->tree_and_blob_objects) return NULL; if (flags & UNINTERESTING) return NULL; @@ -1817,23 +1817,19 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->limited = 1; } else if (!strcmp(arg, "--objects")) { revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; } else if (!strcmp(arg, "--objects-edge")) { revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; revs->edge_hint = 1; } else if (!strcmp(arg, "--objects-edge-aggressive")) { revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; revs->edge_hint = 1; revs->edge_hint_aggressive = 1; } else if (!strcmp(arg, "--verify-objects")) { revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; revs->verify_objects = 1; } else if (!strcmp(arg, "--unpacked")) { revs->unpacked = 1; diff --git a/revision.h b/revision.h index 9fac1a607..43cce137f 100644 --- a/revision.h +++ b/revision.h @@ -89,8 +89,7 @@ struct rev_info { simplify_merges:1, simplify_by_decoration:1, tag_objects:1, - tree_objects:1, - blob_objects:1, + tree_and_blob_objects:1, verify_objects:1, edge_hint:1, edge_hint_aggressive:1, -- 2.11.0.483.g087da7b7c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan @ 2017-02-28 21:42 ` Junio C Hamano 2017-02-28 21:59 ` Jeff King 2017-02-28 22:06 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2017-02-28 21:42 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, peff, peartben, benpeart Jonathan Tan <jonathantanmy@google.com> writes: > It could be argued that in the future, Git might need to distinguish > tree_objects from blob_objects - in particular, a user might want > rev-list to print the trees but not the blobs. That was exactly why these bits were originally made to "appear independent but in practice nobody sets only one and leaves others off". And it didn't happen in the past 10 years, which tells us that we should take this patch. Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info 2017-02-28 21:42 ` Junio C Hamano @ 2017-02-28 21:59 ` Jeff King 2017-03-02 18:36 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2017-02-28 21:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git, peartben, benpeart On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > > > It could be argued that in the future, Git might need to distinguish > > tree_objects from blob_objects - in particular, a user might want > > rev-list to print the trees but not the blobs. > > That was exactly why these bits were originally made to "appear > independent but in practice nobody sets only one and leaves others > off". > > And it didn't happen in the past 10 years, which tells us that we > should take this patch. I actually have a patch which uses the distinction. It's for upload-archive doing reachability checks (which seems rather familiar to what's going on here). The whole series (from 2013!) is at: git://github.com/peff/git jk/archive-reachability but the relevant commits are below. I don't think the same logic holds for this case, though, because somebody actually can ask for a single blob. -- >8 -- From: Jeff King <peff@peff.net> Date: Wed, 5 Jun 2013 17:57:02 -0400 Subject: [PATCH] list-objects: optimize "revs->blob_objects = 0" case If we are traversing trees during a "--objects" traversal, we may skip blobs if the "blob_objects" field of rev_info is not set. But we do so as the first thing in process_blob(), only after we have actually created the "struct blob" object, incurring a hash lookup. We can optimize out this no-op call completely. This does not actually affect any current code, as all of the current traversals always set blob_objects when looking at objects, anyway. Signed-off-by: Jeff King <peff@peff.net> --- list-objects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/list-objects.c b/list-objects.c index f3ca6aafb..58ad69557 100644 --- a/list-objects.c +++ b/list-objects.c @@ -117,7 +117,7 @@ static void process_tree(struct rev_info *revs, process_gitlink(revs, entry.oid->hash, show, base, entry.path, cb_data); - else + else if (revs->blob_objects) process_blob(revs, lookup_blob(entry.oid->hash), show, base, entry.path, -- 2.12.0.359.gd4c8c42e9 -- >8 -- From: Jeff King <peff@peff.net> Date: Wed, 5 Jun 2013 18:02:42 -0400 Subject: [PATCH] archive: ignore blob objects when checking reachability We cannot create an archive from a blob object, so we would not expect anyone to provide one to us. And if they do, we will fail anyway just after the reachability check. We can therefore optimize our reachability check to ignore blobs completely, and not even create a "struct blob" for them. Depending on the repository size and the exact place we find the reachable object in the traversal, this can save 20-25%, a we can avoid many lookups in the object hash. The downside of this is that a blob provided to a remote archive process will fail with "no such object" rather than "object is not a tree" (we could organize the code to retain the old message, but since we no longer know whether the blob is reachable or not, we would potentially be leaking information about the existence of unreachable objects). Signed-off-by: Jeff King <peff@peff.net> --- archive.c | 1 + 1 file changed, 1 insertion(+) diff --git a/archive.c b/archive.c index ef89b2556..489115f9f 100644 --- a/archive.c +++ b/archive.c @@ -383,6 +383,7 @@ static int object_is_reachable(const unsigned char *sha1) save_commit_buffer = 0; init_revisions(&data.revs, NULL); setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL); + data.revs.blob_objects = 0; if (prepare_revision_walk(&data.revs)) return 0; -- 2.12.0.359.gd4c8c42e9 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info 2017-02-28 21:59 ` Jeff King @ 2017-03-02 18:36 ` Junio C Hamano 0 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2017-03-02 18:36 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, git, peartben, benpeart Jeff King <peff@peff.net> writes: > On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote: > >> Jonathan Tan <jonathantanmy@google.com> writes: >> >> > It could be argued that in the future, Git might need to distinguish >> > tree_objects from blob_objects - in particular, a user might want >> > rev-list to print the trees but not the blobs. >> >> That was exactly why these bits were originally made to "appear >> independent but in practice nobody sets only one and leaves others >> off". >> >> And it didn't happen in the past 10 years, which tells us that we >> should take this patch. > > I actually have a patch which uses the distinction. It's for > upload-archive doing reachability checks (which seems rather familiar to > what's going on here). OK. Thanks for stopping me ;-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan 2017-02-28 21:42 ` Junio C Hamano @ 2017-02-28 22:06 ` Jeff King 1 sibling, 0 replies; 17+ messages in thread From: Jeff King @ 2017-02-28 22:06 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster, peartben, benpeart On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote: > Whenever tree_objects is set to 1 in revision.h's struct rev_info, > blob_objects is likewise set, and vice versa. Combine those two fields > into one. > > Some of the existing code does not handle tree_objects being different > from blob_objects properly. For example, "handle_commit" in revision.c > recurses from an UNINTERESTING tree into its subtree if tree_objects == > 1, completely ignoring blob_objects; it probably should still recurse if > tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should > behave differently depending on blob_objects (controlling the > instantiation and marking of blob objects). This commit resolves the > issue by forbidding tree_objects from being different to blob_objects. Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is set, then tree_objects must also be set". It's the other way around I care more about. > It could be argued that in the future, Git might need to distinguish > tree_objects from blob_objects - in particular, a user might want > rev-list to print the trees but not the blobs. However, this results in > a minor performance savings at best in that objects no longer need to be > instantiated (causing memory allocations and hashtable insertions) - no > disk reads are being done for objects whether blob_objects is set or > not. In a full object-graph traversal, we actually spend a big chunk of our time in hash lookups. My measurements (admittedly from 2013, which I haven't repeated lately) show something like a 20-25% speedup for this case. My only use for it (and the source of those timings) was to compute archive reachability, which nobody seems to care too much about. But I suspect we could speed up your case, too, when we are just computing the reachability of a non-blob. I.e., you should be able to turn on the smallest subset of "commits only", "commits and trees", and "commits, trees, and blobs", based on what the other side has asked for. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/3] revision: exclude trees/blobs given commit 2017-02-25 1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan @ 2017-02-25 1:18 ` Jonathan Tan 2017-02-28 21:44 ` Junio C Hamano ` (2 more replies) 2017-02-25 1:18 ` [PATCH 3/3] upload-pack: compute blob reachability correctly Jonathan Tan 2 siblings, 3 replies; 17+ messages in thread From: Jonathan Tan @ 2017-02-25 1:18 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, gitster, peff, peartben, benpeart When the --objects argument is given to rev-list, an argument of the form "^$tree" can be given to exclude all blobs and trees reachable from that tree, but an argument of the form "^$commit" only excludes that commit, not any blob or tree reachable from it. Make "^$commit" behave consistent to "^$tree". Also, formalize this behavior in unit tests. (Some of the added tests would already pass even before this commit, but are included nevertheless for completeness.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- revision.c | 2 ++ t/t6000-rev-list-misc.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+) diff --git a/revision.c b/revision.c index 5e49d9e0e..e6a62da98 100644 --- a/revision.c +++ b/revision.c @@ -254,6 +254,8 @@ static struct commit *handle_commit(struct rev_info *revs, die("unable to parse commit %s", name); if (flags & UNINTERESTING) { mark_parents_uninteresting(commit); + if (revs->tree_and_blob_objects) + mark_tree_uninteresting(commit->tree); revs->limited = 1; } if (revs->show_source && !commit->util) diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 969e4e9e5..c04a9582b 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -114,4 +114,92 @@ test_expect_success '--header shows a NUL after each commit' ' test_cmp expect actual ' +test_expect_success 'setup tree-granularity and blob-granularity tests' ' + echo 0 >file && + + mkdir subdir && + echo 1 >subdir/file && + git add file subdir/file && + git commit -m one && + + echo 2 >subdir/file && + git add subdir/file && + git commit -m two && + + commit1=$(git rev-parse HEAD^1) && + commit2=$(git rev-parse HEAD) && + tree1=$(git rev-parse $commit1^{tree}) && + tree2=$(git rev-parse $commit2^{tree}) && + subtree1=$(git cat-file -p $tree1 | grep "subdir" | cut -c13-52) && + subtree2=$(git cat-file -p $tree2 | grep "subdir" | cut -c13-52) && + blob0=$(echo 0 | git hash-object --stdin) && + blob1=$(echo 1 | git hash-object --stdin) && + blob2=$(echo 2 | git hash-object --stdin) +' + +test_expect_success 'include commit, exclude blob' ' + git rev-list --objects $commit2 >out && + grep "$blob1" out && + grep "$blob2" out && + + git rev-list --objects $commit2 "^$blob2" >out && + grep "$blob1" out && + ! grep "$blob2" out +' + +test_expect_success 'include commit, exclude tree (also excludes nested trees/blobs)' ' + git rev-list --objects $commit2 "^$tree2" >out && + grep "$tree1" out && + grep "$subtree1" out && + grep "$blob1" out && + ! grep "$tree2" out && + ! grep "$subtree2" out && + ! grep "$blob2" out && + + git rev-list --objects $commit2 "^$subtree2" >out && + grep "$tree1" out && + grep "$subtree1" out && + grep "$blob1" out && + grep "$tree2" out && + ! grep "$subtree2" out && + ! grep "$blob2" out +' + +test_expect_success 'include tree, exclude commit' ' + git rev-list --objects "$tree1" "^$commit2" >out && + ! grep "$blob0" out && # common to both + grep "$blob1" out && # only in tree + ! grep "$blob2" out # only in commit +' + +test_expect_success 'include tree, exclude tree' ' + git rev-list --objects "$tree1" "^$tree2" >out && + ! grep "$blob0" out && # common to both + grep "$blob1" out && # only in tree1 + ! grep "$blob2" out # only in tree2 +' + +test_expect_success 'include tree, exclude blob' ' + git rev-list --objects "$tree1" "^$blob2" >out && + grep "$blob0" out && + grep "$blob1" out && + ! grep "$blob2" out +' + +test_expect_success 'include blob, exclude commit' ' + git rev-list --objects "$blob2" "^$commit1" >out && + grep "$blob2" out && + + git rev-list --objects "$blob2" "^$commit2" >out && + ! grep "$blob2" out +' + +test_expect_success 'include blob, exclude tree' ' + git rev-list --objects "$blob2" "^$tree1" >out && + grep "$blob2" out && + + git rev-list --objects "$blob2" "^$tree2" >out && + ! grep "$blob2" out +' + test_done -- 2.11.0.483.g087da7b7c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] revision: exclude trees/blobs given commit 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan @ 2017-02-28 21:44 ` Junio C Hamano 2017-02-28 22:12 ` Jeff King 2017-02-28 23:12 ` [PATCH 2/3] revision: exclude trees/blobs given commit Junio C Hamano 2 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2017-02-28 21:44 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, peff, peartben, benpeart Jonathan Tan <jonathantanmy@google.com> writes: > When the --objects argument is given to rev-list, an argument of the > form "^$tree" can be given to exclude all blobs and trees reachable from > that tree, but an argument of the form "^$commit" only excludes that > commit, not any blob or tree reachable from it. Make "^$commit" behave > consistent to "^$tree". > > Also, formalize this behavior in unit tests. (Some of the added tests > would already pass even before this commit, but are included > nevertheless for completeness.) > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > revision.c | 2 ++ > t/t6000-rev-list-misc.sh | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 90 insertions(+) > > diff --git a/revision.c b/revision.c > index 5e49d9e0e..e6a62da98 100644 > --- a/revision.c > +++ b/revision.c > @@ -254,6 +254,8 @@ static struct commit *handle_commit(struct rev_info *revs, > die("unable to parse commit %s", name); > if (flags & UNINTERESTING) { > mark_parents_uninteresting(commit); > + if (revs->tree_and_blob_objects) > + mark_tree_uninteresting(commit->tree); I fear that this may end up to be quite expensive. Can we have a perf test? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] revision: exclude trees/blobs given commit 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan 2017-02-28 21:44 ` Junio C Hamano @ 2017-02-28 22:12 ` Jeff King 2017-03-02 19:50 ` [PATCH] t/perf: export variable used in other blocks Jonathan Tan 2017-02-28 23:12 ` [PATCH 2/3] revision: exclude trees/blobs given commit Junio C Hamano 2 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2017-02-28 22:12 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster, peartben, benpeart On Fri, Feb 24, 2017 at 05:18:37PM -0800, Jonathan Tan wrote: > When the --objects argument is given to rev-list, an argument of the > form "^$tree" can be given to exclude all blobs and trees reachable from > that tree, but an argument of the form "^$commit" only excludes that > commit, not any blob or tree reachable from it. Make "^$commit" behave > consistent to "^$tree". Like Junio, I suspect this is going to be quite expensive. This is similar to the "--objects-edge" and "--objects-edge-aggressive" options, which we had to pull back on the use of because of their expensiveness. (And as an aside, wouldn't those options be the right place for what you're doing?). I also think that the mechanism here is not 100% accurate. The commit traversal will stop once it has painted down, so you're effectively exploring the trees of the merge bases. But older history could mention an object that has resurfaced again (e.g., due to a cherry-pick). Getting the 100% accurate answer is _really_ expensive, though with reachability bitmaps it's not too bad. I just wonder if that's a better approach to be taking. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] t/perf: export variable used in other blocks 2017-02-28 22:12 ` Jeff King @ 2017-03-02 19:50 ` Jonathan Tan 2017-03-03 6:45 ` Jeff King 0 siblings, 1 reply; 17+ messages in thread From: Jonathan Tan @ 2017-03-02 19:50 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, peff, gitster In p0001, a variable was created in a test_expect_success block to be used in later test_perf blocks, but was not exported. This caused the variable to not appear in those blocks (this can be verified by writing 'test -n "$commit"' in those blocks), resulting in a slightly different invocation than what was intended. Export that variable. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Performance indeed does drop significantly with this patch. And I cannot say "at least this is more correct", because it isn't - as Peff thinks, it is indeed still not 100% accurate because of the resurfacing-object issue he mentions (I've verified this by constructing a test that fails even after this patch set). Also, since we don't want to unify tree_ and blob_ (for the reason Peff mentioned in another e-mail), feel free to drop this patch set. Having said that, while looking at the perf test, I noticed an issue with a non-exported variable, which is corrected in this patch. t/perf/p0001-rev-list.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh index 16359d51a..ebf172401 100755 --- a/t/perf/p0001-rev-list.sh +++ b/t/perf/p0001-rev-list.sh @@ -15,7 +15,8 @@ test_perf 'rev-list --all --objects' ' ' test_expect_success 'create new unreferenced commit' ' - commit=$(git commit-tree HEAD^{tree} -p HEAD) + commit=$(git commit-tree HEAD^{tree} -p HEAD) && + test_export commit ' test_perf 'rev-list $commit --not --all' ' -- 2.12.0.rc1.440.g5b76565f74-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] t/perf: export variable used in other blocks 2017-03-02 19:50 ` [PATCH] t/perf: export variable used in other blocks Jonathan Tan @ 2017-03-03 6:45 ` Jeff King 2017-03-03 7:14 ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King 2017-03-03 18:51 ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano 0 siblings, 2 replies; 17+ messages in thread From: Jeff King @ 2017-03-03 6:45 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster On Thu, Mar 02, 2017 at 11:50:41AM -0800, Jonathan Tan wrote: > In p0001, a variable was created in a test_expect_success block to be > used in later test_perf blocks, but was not exported. This caused the > variable to not appear in those blocks (this can be verified by writing > 'test -n "$commit"' in those blocks), resulting in a slightly different > invocation than what was intended. Export that variable. Thanks, this is obviously the right thing to do, and the mistake is mine from ea97002fc (t/perf: time rev-list with UNINTERESTING commits, 2014-01-20). This is not the first time I've been confused by missing variables in t/perf scripts, since it behaves differently than the normal test suite. I wonder if we should turn on "set -a" during t/perf setup snippets. That's a bit of a blunt tool, but I suspect it would just be easier to work with. I was curious that the tests added by ea97002fc showed something useful even with the bug you're fixing here. But it's because the actual slice of history we meant to traverse isn't important. It's intentionally tiny to show off the time spent dealing with the UNINTERESTING commits. So in effect we were traversing no commits instead of a tiny set, but the timing results were the same. I repeated the tests over fbd4a703 given in the commit message of ee9a7002fc and confirmed that it behaves the same with your fixed version of the test. I did have to tweak a few other things to get the test to run against such an old version of git, though. I'll follow-up with a patch. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps 2017-03-03 6:45 ` Jeff King @ 2017-03-03 7:14 ` Jeff King 2017-03-03 7:36 ` [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git Jeff King 2017-03-03 18:51 ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano 1 sibling, 1 reply; 17+ messages in thread From: Jeff King @ 2017-03-03 7:14 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster On Fri, Mar 03, 2017 at 01:45:12AM -0500, Jeff King wrote: > I repeated the tests over fbd4a703 given in the commit message of > ee9a7002fc and confirmed that it behaves the same with your fixed > version of the test. I did have to tweak a few other things to get the > test to run against such an old version of git, though. I'll follow-up > with a patch. Here's that patch. -- >8 -- Subject: [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Since 1a0962dee (t/perf: fix regression in testing older versions of git, 2016-06-22), we point "$MODERN_GIT" to a copy of git that matches the t/perf script itself, and which can be used for tasks outside of the actual timings. This is needed because the setup done by perf scripts keeps moving forward in time, and may use features that the older versions of git we are testing do not have. That commit used $MODERN_GIT to fix a case where we relied on the relatively recent --git-path option. But if you go back further still, there are more problems. Since 7501b5921 (perf: make the tests work in worktrees, 2016-05-13), we use "git -C", but versions of git older than 44e1e4d67 (git: run in a directory given with -C option, 2013-09-09) don't know about "-C". So testing an old version of git with a new version of t/perf will fail the setup step. We can fix this by using $MODERN_GIT during the setup; there's no need to use the antique version, since it doesn't affect the timings. Likewise, we'll adjust the "init" invocation; antique versions of git called this "init-db". Signed-off-by: Jeff King <peff@peff.net> --- With this patch I was able to run p0001 against v1.7.0. I don't think we can go further back than that because the perf library depends on the presence of bin-wrappers. That's probably enough. Unlike the t/interop library I proposed recently it's not that interesting to go really far back in time (and I did hack around the bin-wrappers thing in t/interop; you really can test against v1.0.0 there). t/perf/perf-lib.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index 46f08ee08..ab4b8b06a 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -83,7 +83,7 @@ test_perf_create_repo_from () { error "bug in the test script: not 2 parameters to test-create-repo" repo="$1" source="$2" - source_git="$(git -C "$source" rev-parse --git-dir)" + source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)" objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)" mkdir -p "$repo/.git" ( @@ -102,7 +102,7 @@ test_perf_create_repo_from () { ) && ( cd "$repo" && - git init -q && { + "$MODERN_GIT" init -q && { test_have_prereq SYMLINKS || git config core.symlinks false } && -- 2.12.0.385.gdf4947bc7 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git 2017-03-03 7:14 ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King @ 2017-03-03 7:36 ` Jeff King 0 siblings, 0 replies; 17+ messages in thread From: Jeff King @ 2017-03-03 7:36 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster On Fri, Mar 03, 2017 at 02:14:03AM -0500, Jeff King wrote: > With this patch I was able to run p0001 against v1.7.0. I don't think we > can go further back than that because the perf library depends on the > presence of bin-wrappers. That's probably enough. Unlike the t/interop > library I proposed recently it's not that interesting to go really far > back in time (and I did hack around the bin-wrappers thing in t/interop; > you really can test against v1.0.0 there). This is easy to fix (see below). I doubt anybody cares, but it's probably worth fixing just because the failure mode (quietly running whatever git is in your PATH) is so confusing. It would also be an improvement to just detect the situation and die(), but this is literally not any more effort. -- >8 -- Subject: [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git It's tempting to say: ./run v1.0.0 HEAD to see how we've sped up Git over the years. Unfortunately, this doesn't quite work because versions of Git prior to v1.7.0 lack bin-wrappers, so our "run" script doesn't correctly put them in the PATH. Worse, it means we silently find whatever other "git" is in the PATH, and produce test results that have no bearing on what we asked for. Let's fallback to the main git directory when bin-wrappers isn't present. Many modern perf scripts won't run with such an antique version of Git, of course, but at least those failures are detected and reported (and you're free to write a limited perf script that works across many versions). Signed-off-by: Jeff King <peff@peff.net> --- t/perf/run | 3 +++ 1 file changed, 3 insertions(+) diff --git a/t/perf/run b/t/perf/run index e8adedadf..c788d713a 100755 --- a/t/perf/run +++ b/t/perf/run @@ -63,6 +63,9 @@ run_dirs_helper () { unset GIT_TEST_INSTALLED else GIT_TEST_INSTALLED="$mydir/bin-wrappers" + # Older versions of git lacked bin-wrappers; fallback to the + # files in the root. + test -d "$GIT_TEST_INSTALLED" || GIT_TEST_INSTALLED=$mydir export GIT_TEST_INSTALLED fi run_one_dir "$@" -- 2.12.0.385.gdf4947bc7 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] t/perf: export variable used in other blocks 2017-03-03 6:45 ` Jeff King 2017-03-03 7:14 ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King @ 2017-03-03 18:51 ` Junio C Hamano 2017-03-03 22:31 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2017-03-03 18:51 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, git Jeff King <peff@peff.net> writes: > Thanks, this is obviously the right thing to do, and the mistake is mine > from ea97002fc (t/perf: time rev-list with UNINTERESTING commits, > 2014-01-20). This is not the first time I've been confused by missing > variables in t/perf scripts, since it behaves differently than the > normal test suite. I wonder if we should turn on "set -a" during t/perf > setup snippets. That's a bit of a blunt tool, but I suspect it would > just be easier to work with. I wonder if we can make t/perf to behave more similar to the normal test suite to eliminate the need for this exporting, by the way. t/perf/README does not say anything more than "for lack of better options" throughout its history, which does not help very much. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] t/perf: export variable used in other blocks 2017-03-03 18:51 ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano @ 2017-03-03 22:31 ` Jeff King 0 siblings, 0 replies; 17+ messages in thread From: Jeff King @ 2017-03-03 22:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git On Fri, Mar 03, 2017 at 10:51:57AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Thanks, this is obviously the right thing to do, and the mistake is mine > > from ea97002fc (t/perf: time rev-list with UNINTERESTING commits, > > 2014-01-20). This is not the first time I've been confused by missing > > variables in t/perf scripts, since it behaves differently than the > > normal test suite. I wonder if we should turn on "set -a" during t/perf > > setup snippets. That's a bit of a blunt tool, but I suspect it would > > just be easier to work with. > > I wonder if we can make t/perf to behave more similar to the normal > test suite to eliminate the need for this exporting, by the way. > t/perf/README does not say anything more than "for lack of better > options" throughout its history, which does not help very much. The problem is that it has to run each block multiple times under an external "time" program. If you assume that the shell running the perf suite is bash, you can do: i=0 foo() { echo $i; i=$(($i+1)); } for run in 1 2 3; do time foo; done which shows that you can get repeated timings from a shell function. That shell function would run the actual test snippet, and would have to do some redirect trickery to split the "time" results from the stderr of the underlying test. And parse bash's time output. But those are all do-able things. The main thing is throwing out a dependency on an external "time" for "bash". That may not be a big loss, though. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] revision: exclude trees/blobs given commit 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan 2017-02-28 21:44 ` Junio C Hamano 2017-02-28 22:12 ` Jeff King @ 2017-02-28 23:12 ` Junio C Hamano 2 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2017-02-28 23:12 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, peff, peartben, benpeart Jonathan Tan <jonathantanmy@google.com> writes: > When the --objects argument is given to rev-list, an argument of the > form "^$tree" can be given to exclude all blobs and trees reachable from > that tree, but an argument of the form "^$commit" only excludes that > commit, not any blob or tree reachable from it. Make "^$commit" behave > consistent to "^$tree". So with this: $ git rev-list --objects ^HEAD^@ HEAD ^HEAD^{tree} should be a round-about way to say $ git rev-parse HEAD ;-) The expression wants to list everything reachable from HEAD, but it does not want to show its parents (i.e. ^HEAD^@) and it does not want to show its tree (i.e. ^HEAD^{tree}), so the only thing that remains is the commit object HEAD and nothing else? I agree with Peff's comment about objects that may appear beyond the boundary (i.e. merge base between interesting ones and uninteresting ones); whether that inaccuracy matters depends on what you want to use this for---if you want to hide sensitive objects it does, if you want to reduce the network cost without incurring too much cpu cost, it probably does not. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/3] upload-pack: compute blob reachability correctly 2017-02-25 1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan @ 2017-02-25 1:18 ` Jonathan Tan 2 siblings, 0 replies; 17+ messages in thread From: Jonathan Tan @ 2017-02-25 1:18 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, gitster, peff, peartben, benpeart If allowreachablesha1inwant is set, upload-pack will provide a blob to a user, provided its hash, regardless of whether the blob is reachable or not. Teach upload-pack to compute reachability correctly by passing the "--objects" argument when it invokes "rev-list" if necessary. This commit only affects the case where blob/tree hashes are provided to upload-pack; the more typical case of only commit/tag hashes being provided is not affected. In the case where blob/tree hashes are provided, the reachability check is now slower (since trees need to be read) but correct. (The user may still set allowanysha1inwant instead of allowreachablesha1inwant to opt-out of the reachability check.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++ upload-pack.c | 15 +++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 505e1b4a7..a4ae888ff 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -547,6 +547,36 @@ test_expect_success 'fetch-pack can fetch a raw sha1' ' git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one) ' +test_expect_success 'setup for tests that fetch blobs by hash' ' + git init blobserver && + test_commit -C blobserver 1 && + test_commit -C blobserver 2 && + test_commit -C blobserver 3 && + blob1=$(echo 1 | git hash-object --stdin) && + blob2=$(echo 2 | git hash-object --stdin) && + blob3=$(echo 3 | git hash-object --stdin) && + + unreachable=$(echo 4 | git -C blobserver hash-object -w --stdin) && + git -C blobserver cat-file -e "$unreachable" +' + +test_expect_success 'fetch-pack can fetch reachable blobs by hash' ' + test_config -C blobserver uploadpack.allowreachablesha1inwant 1 && + + git init reachabletest && + git -C reachabletest fetch-pack ../blobserver "$blob1" "$blob2" && + git -C reachabletest cat-file -e "$blob1" && + git -C reachabletest cat-file -e "$blob2" && + test_must_fail git -C reachabletest cat-file -e "$blob3" +' + +test_expect_success 'fetch-pack cannot fetch unreachable blobs' ' + test_config -C blobserver uploadpack.allowreachablesha1inwant 1 && + + git init unreachabletest && + test_must_fail git -C unreachabletest fetch-pack ../blobserver "$blob1" "$unreachable" +' + check_prot_path () { cat >expected <<-EOF && Diag: url=$1 diff --git a/upload-pack.c b/upload-pack.c index 7597ba340..f05cc2b5e 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -471,6 +471,9 @@ static int do_reachable_revlist(struct child_process *cmd, static const char *argv[] = { "rev-list", "--stdin", NULL, }; + static const char *argv_with_objects[] = { + "rev-list", "--objects", "--stdin", NULL, + }; struct object *o; char namebuf[42]; /* ^ + SHA-1 + LF */ int i; @@ -488,6 +491,18 @@ static int do_reachable_revlist(struct child_process *cmd, */ sigchain_push(SIGPIPE, SIG_IGN); + /* + * If we are testing reachability of a tree or blob, rev-list needs to + * operate more granularly. + */ + for (i = 0; i < src->nr; i++) { + o = src->objects[i].item; + if (o->type == OBJ_TREE || o->type == OBJ_BLOB) { + cmd->argv = argv_with_objects; + break; + } + } + if (start_command(cmd)) goto error; -- 2.11.0.483.g087da7b7c-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-03-03 22:31 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-02-25 1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan 2017-02-25 1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan 2017-02-28 21:42 ` Junio C Hamano 2017-02-28 21:59 ` Jeff King 2017-03-02 18:36 ` Junio C Hamano 2017-02-28 22:06 ` Jeff King 2017-02-25 1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan 2017-02-28 21:44 ` Junio C Hamano 2017-02-28 22:12 ` Jeff King 2017-03-02 19:50 ` [PATCH] t/perf: export variable used in other blocks Jonathan Tan 2017-03-03 6:45 ` Jeff King 2017-03-03 7:14 ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King 2017-03-03 7:36 ` [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git Jeff King 2017-03-03 18:51 ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano 2017-03-03 22:31 ` Jeff King 2017-02-28 23:12 ` [PATCH 2/3] revision: exclude trees/blobs given commit Junio C Hamano 2017-02-25 1:18 ` [PATCH 3/3] upload-pack: compute blob reachability correctly Jonathan Tan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.