* [PATCH] send-pack: never fetch when checking exclusions @ 2019-10-07 21:38 Jonathan Tan 2019-10-08 4:10 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Jonathan Tan @ 2019-10-07 21:38 UTC (permalink / raw) To: git; +Cc: Jonathan Tan When building the packfile to be sent, send_pack() is given a list of remote refs to be used as exclusions. For each ref, it first checks if the ref exists locally, and if it does, passes it with a "^" prefix to pack-objects. However, in a partial clone, the check may trigger a lazy fetch. Ensure that this lazy fetch does not occur. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- send-pack.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/send-pack.c b/send-pack.c index 6dc16c3211..34c77cbb1a 100644 --- a/send-pack.c +++ b/send-pack.c @@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt, static void feed_object(const struct object_id *oid, FILE *fh, int negative) { - if (negative && !has_object_file(oid)) + if (negative && + !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT)) return; if (negative) -- 2.23.0.581.g78d2f28ef7-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] send-pack: never fetch when checking exclusions 2019-10-07 21:38 [PATCH] send-pack: never fetch when checking exclusions Jonathan Tan @ 2019-10-08 4:10 ` Junio C Hamano 2019-10-08 18:37 ` [PATCH v2] " Jonathan Tan 0 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2019-10-08 4:10 UTC (permalink / raw) To: Jonathan Tan; +Cc: git Jonathan Tan <jonathantanmy@google.com> writes: > When building the packfile to be sent, send_pack() is given a list of > remote refs to be used as exclusions. For each ref, it first checks if > the ref exists locally, and if it does, passes it with a "^" prefix to > pack-objects. However, in a partial clone, the check may trigger a lazy > fetch. Ensure that this lazy fetch does not occur. Is there any effect worth describing here, other than the obvious "we do not lazily fetch from within the has_object_file() function"? For example, would this change mean that the resulting pack may include stuff that are reachable from the (missing) negative objects that would not otherwise have to be sent if these objects were available (or made available by the lazy fetching), and we are making a trade-off to send possibly more in order for not fetching? Have we laid enough on the table to help readers if such a trade-off (if we are making one, that is) strikes the right balance? Thanks. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > send-pack.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/send-pack.c b/send-pack.c > index 6dc16c3211..34c77cbb1a 100644 > --- a/send-pack.c > +++ b/send-pack.c > @@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt, > > static void feed_object(const struct object_id *oid, FILE *fh, int negative) > { > - if (negative && !has_object_file(oid)) > + if (negative && > + !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT)) > return; > > if (negative) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-08 4:10 ` Junio C Hamano @ 2019-10-08 18:37 ` Jonathan Tan 2019-10-11 6:12 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Jonathan Tan @ 2019-10-08 18:37 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, gitster When building the packfile to be sent, send_pack() is given a list of remote refs to be used as exclusions. For each ref, it first checks if the ref exists locally, and if it does, passes it with a "^" prefix to pack-objects. However, in a partial clone, the check may trigger a lazy fetch. The additional commit ancestry information obtained during such fetches may show that certain objects that would have been sent are already known to the server, resulting in a smaller pack being sent. But this is at the cost of fetching from many possibly unrelated refs, and the lazy fetches do not help at all in the typical case where the client is up-to-date with the upstream of the branch being pushed. Ensure that these lazy fetches do not occur. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- > For example, would this change mean that the resulting pack may > include stuff that are reachable from the (missing) negative objects > that would not otherwise have to be sent if these objects were > available (or made available by the lazy fetching), and we are > making a trade-off to send possibly more in order for not fetching? > Have we laid enough on the table to help readers if such a trade-off > (if we are making one, that is) strikes the right balance? Thanks for your comments. I've expanded the commit message. --- send-pack.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/send-pack.c b/send-pack.c index 6dc16c3211..34c77cbb1a 100644 --- a/send-pack.c +++ b/send-pack.c @@ -40,7 +40,8 @@ int option_parse_push_signed(const struct option *opt, static void feed_object(const struct object_id *oid, FILE *fh, int negative) { - if (negative && !has_object_file(oid)) + if (negative && + !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT)) return; if (negative) -- 2.23.0.581.g78d2f28ef7-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-08 18:37 ` [PATCH v2] " Jonathan Tan @ 2019-10-11 6:12 ` Jeff King 2019-10-11 12:31 ` Derrick Stolee 0 siblings, 1 reply; 9+ messages in thread From: Jeff King @ 2019-10-11 6:12 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, gitster On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote: > When building the packfile to be sent, send_pack() is given a list of > remote refs to be used as exclusions. For each ref, it first checks if > the ref exists locally, and if it does, passes it with a "^" prefix to > pack-objects. However, in a partial clone, the check may trigger a lazy > fetch. > > The additional commit ancestry information obtained during such fetches > may show that certain objects that would have been sent are already > known to the server, resulting in a smaller pack being sent. But this is > at the cost of fetching from many possibly unrelated refs, and the lazy > fetches do not help at all in the typical case where the client is > up-to-date with the upstream of the branch being pushed. > > Ensure that these lazy fetches do not occur. That makes sense. For similar reasons, should we be using OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that we don't have, we'll end up re-scanning the pack directory over and over (which is _usually_ pretty quick, but can be slow if you have a lot of packs locally). And it's OK if we racily miss out on an exclusion due to somebody else repacking simultaneously. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-11 6:12 ` Jeff King @ 2019-10-11 12:31 ` Derrick Stolee 2019-10-11 16:15 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Derrick Stolee @ 2019-10-11 12:31 UTC (permalink / raw) To: Jeff King, Jonathan Tan; +Cc: git, gitster On 10/11/2019 2:12 AM, Jeff King wrote: > On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote: > >> When building the packfile to be sent, send_pack() is given a list of >> remote refs to be used as exclusions. For each ref, it first checks if >> the ref exists locally, and if it does, passes it with a "^" prefix to >> pack-objects. However, in a partial clone, the check may trigger a lazy >> fetch. >> >> The additional commit ancestry information obtained during such fetches >> may show that certain objects that would have been sent are already >> known to the server, resulting in a smaller pack being sent. But this is >> at the cost of fetching from many possibly unrelated refs, and the lazy >> fetches do not help at all in the typical case where the client is >> up-to-date with the upstream of the branch being pushed. >> >> Ensure that these lazy fetches do not occur. > > That makes sense. For similar reasons, should we be using > OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that > we don't have, we'll end up re-scanning the pack directory over and over > (which is _usually_ pretty quick, but can be slow if you have a lot of > packs locally). And it's OK if we racily miss out on an exclusion due to > somebody else repacking simultaneously. That's a good idea. We can hint to the object store that we don't expect misses to be due to a concurrent repack, so we don't want to reprepare pack-files. -Stolee ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-11 12:31 ` Derrick Stolee @ 2019-10-11 16:15 ` Jeff King 2019-10-11 22:08 ` Jonathan Tan 2019-10-12 0:47 ` Junio C Hamano 0 siblings, 2 replies; 9+ messages in thread From: Jeff King @ 2019-10-11 16:15 UTC (permalink / raw) To: Derrick Stolee; +Cc: Jonathan Tan, git, gitster On Fri, Oct 11, 2019 at 08:31:30AM -0400, Derrick Stolee wrote: > >> Ensure that these lazy fetches do not occur. > > > > That makes sense. For similar reasons, should we be using > > OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that > > we don't have, we'll end up re-scanning the pack directory over and over > > (which is _usually_ pretty quick, but can be slow if you have a lot of > > packs locally). And it's OK if we racily miss out on an exclusion due to > > somebody else repacking simultaneously. > > That's a good idea. We can hint to the object store that we don't expect > misses to be due to a concurrent repack, so we don't want to reprepare > pack-files. As a general rule (and why I'm raising this issue in reply to Jonathan's patch), I think most or all sites that want OBJECT_INFO_QUICK will want SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally the same: - it's OK to racily have a false negative (we'll still be correct, but possibly a little less optimal) - it's expected and normal to be missing the object, so spending time double-checking the pack store wastes measurable time in real-world cases -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-11 16:15 ` Jeff King @ 2019-10-11 22:08 ` Jonathan Tan 2019-10-17 6:10 ` Jeff King 2019-10-12 0:47 ` Junio C Hamano 1 sibling, 1 reply; 9+ messages in thread From: Jonathan Tan @ 2019-10-11 22:08 UTC (permalink / raw) To: peff; +Cc: stolee, jonathantanmy, git, gitster > As a general rule (and why I'm raising this issue in reply to Jonathan's > patch), I think most or all sites that want OBJECT_INFO_QUICK will want > SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally > the same: > > - it's OK to racily have a false negative (we'll still be correct, but > possibly a little less optimal) > > - it's expected and normal to be missing the object, so spending time > double-checking the pack store wastes measurable time in real-world > cases I took a look on "next" and it's true for these reasons in most cases but not all. QUICK implies SKIP_FETCH_OBJECT: fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c, builtin/fetch-pack.c, or through a lazy fetch) so OK. builtin/index-pack.c: Run with fetch_if_missing=0, so OK. builtin/fetch.c: Run with fetch_if_missing=0, so OK. object-store.h, sha1-file.c: Definition and implementation of this flag. Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK: cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not lazy-fetch tentative tree", 2019-09-09). No problem with a false negative, since we know how to reconstruct the tree. OK. object-store.h, sha1-file.c: Definition and implementation of this flag. send-pack.c: This patch (which is already in "next"). If we have a false negative, we might accidentally send more than we need. But that is not too bad. promisor-remote.c: This is the slightly tricky one. We use this information to determine if we got our lazily-fetched object from the most recent lazy fetch, or if we should continue attempting to fetch the given object from other promisor remotes; so this information is important. However, adding QUICK doesn't lose us anything because the lack of QUICK only helps us when there is another process packing loose objects: if we got our object, our object will be in a pack (because of the way the fetch is implemented - in particular, we need a pack because we need the ".promisor" file). So everything is OK except for promisor-remote.c, but even that is OK for another reason. Having said that, perhaps we should consider promisor-remote.c as low-level code and expect it to know that objects are fetched into a packfile (as opposed to loose objects), so it can safely use QUICK (which is documented as checking packed after packed and loose). If no one disagrees, I can make such a patch after jt/push-avoid-lazy-fetch is merged to master (as is the plan, according to What's Cooking [1]). [1] https://public-inbox.org/git/xmqq8sprhgzc.fsf@gitster-ct.c.googlers.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-11 22:08 ` Jonathan Tan @ 2019-10-17 6:10 ` Jeff King 0 siblings, 0 replies; 9+ messages in thread From: Jeff King @ 2019-10-17 6:10 UTC (permalink / raw) To: Jonathan Tan; +Cc: stolee, git, gitster On Fri, Oct 11, 2019 at 03:08:22PM -0700, Jonathan Tan wrote: > > As a general rule (and why I'm raising this issue in reply to Jonathan's > > patch), I think most or all sites that want OBJECT_INFO_QUICK will want > > SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally > > the same: > > > > - it's OK to racily have a false negative (we'll still be correct, but > > possibly a little less optimal) > > > > - it's expected and normal to be missing the object, so spending time > > double-checking the pack store wastes measurable time in real-world > > cases > > I took a look on "next" and it's true for these reasons in most cases > but not all. Thanks for digging into this. > QUICK implies SKIP_FETCH_OBJECT: > > fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c, > builtin/fetch-pack.c, or through a lazy fetch) so OK. > > builtin/index-pack.c: Run with fetch_if_missing=0, so OK. > > builtin/fetch.c: Run with fetch_if_missing=0, so OK. > > object-store.h, sha1-file.c: Definition and implementation of this > flag. Right, I think going in this direction is pretty simple. Having been marked with QUICK, they hit both of my points from above. And if we want to avoid re-scanning the pack directory because of cost, we _definitely_ want to avoid making an expensive network call. > Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK: > > cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not > lazy-fetch tentative tree", 2019-09-09). No problem with a false > negative, since we know how to reconstruct the tree. OK. > [...] > send-pack.c: This patch (which is already in "next"). If we have a > false negative, we might accidentally send more than we need. But that > is not too bad. Yeah, I think both of these could be QUICK. > promisor-remote.c: This is the slightly tricky one. We use this > information to determine if we got our lazily-fetched object from the > most recent lazy fetch, or if we should continue attempting to fetch the > given object from other promisor remotes; so this information is > important. However, adding QUICK doesn't lose us anything because the > lack of QUICK only helps us when there is another process packing > loose objects: if we got our object, our object will be in a pack > (because of the way the fetch is implemented - in particular, we need > a pack because we need the ".promisor" file). > > So everything is OK except for promisor-remote.c, but even that is OK > for another reason. Yeah, though I wouldn't be sad to see that use a separate flag, since it really is about promisor logic. That implies to me maybe we should be using QUICK more aggressively, and QUICK should auto-imply SKIP_FETCH_OBJECT. > Having said that, perhaps we should consider promisor-remote.c as > low-level code and expect it to know that objects are fetched into a > packfile (as opposed to loose objects), so it can safely use QUICK > (which is documented as checking packed after packed and loose). If no > one disagrees, I can make such a patch after jt/push-avoid-lazy-fetch is > merged to master (as is the plan, according to What's Cooking [1]). I think it's OK to continue leaving out QUICK there if it's not causing problems. It really is a bit different than the other cases. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] send-pack: never fetch when checking exclusions 2019-10-11 16:15 ` Jeff King 2019-10-11 22:08 ` Jonathan Tan @ 2019-10-12 0:47 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2019-10-12 0:47 UTC (permalink / raw) To: Jeff King; +Cc: Derrick Stolee, Jonathan Tan, git Jeff King <peff@peff.net> writes: > As a general rule (and why I'm raising this issue in reply to Jonathan's > patch), I think most or all sites that want OBJECT_INFO_QUICK will want > SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally > the same: > > - it's OK to racily have a false negative (we'll still be correct, but > possibly a little less optimal) > > - it's expected and normal to be missing the object, so spending time > double-checking the pack store wastes measurable time in real-world > cases 31f5256c ("sha1-file: split OBJECT_INFO_FOR_PREFETCH", 2019-05-28) separated SKIP_FETCH_OBJECT out of FOR_PREFETCH, the latter of which was and is SKIP_FETCH and QUICK combined. Use SKIP_FETCH_OBJECT alone may need to be re-examined and discouraged? ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-10-17 6:10 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-07 21:38 [PATCH] send-pack: never fetch when checking exclusions Jonathan Tan 2019-10-08 4:10 ` Junio C Hamano 2019-10-08 18:37 ` [PATCH v2] " Jonathan Tan 2019-10-11 6:12 ` Jeff King 2019-10-11 12:31 ` Derrick Stolee 2019-10-11 16:15 ` Jeff King 2019-10-11 22:08 ` Jonathan Tan 2019-10-17 6:10 ` Jeff King 2019-10-12 0:47 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.