From: Emily Shaffer <emilyshaffer@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] connected: always use partial clone optimization
Date: Thu, 26 Mar 2020 14:11:56 -0700
Message-ID: <20200326211156.GA37946@google.com>
In-Reply-To: <20200320220045.258462-1-jonathantanmy@google.com>
On Fri, Mar 20, 2020 at 03:00:45PM -0700, Jonathan Tan wrote:
> With 50033772d5 ("connected: verify promisor-ness of partial clone",
> 2020-01-30), the fast path (checking promisor packs) in
> check_connected() now passes a subset of the slow path (rev-list) - if
> all objects to be checked are found in promisor packs, both the fast
> path and the slow path will pass; otherwise, the fast path will
> definitely not pass. This means that we can always attempt the fast path
> whenever we need to do the slow path.
>
> The fast path is currently guarded by a flag; therefore, remove that
> flag. Also, make the fast path fallback to the slow path - if the fast
> path fails, the failing OID and all remaining OIDs will be passed to
> rev-list.

It looks like a pretty simple change. I have one probably-biased
complaint about the gotos below; otherwise it looks reasonable to me.

>
> The main user-visible benefit is the performance of fetch from a partial
> clone - specifically, the speedup of the connectivity check done before
> the fetch. In particular, a no-op fetch into a partial clone on my
> computer was sped up from 7 seconds to 0.01 seconds. This is a
> complement to the work in 2df1aa239c ("fetch: forgo full
> connectivity check if --filter", 2020-01-30), which is the child of the
> aforementioned 50033772d5. In that commit, the connectivity check
> *after* the fetch was sped up.
>
> The addition of the fast path might cause performance reductions in
> these cases:
>
> - If a partial clone or a fetch into a partial clone fails, Git will
> fruitlessly run rev-list (it is expected that everything fetched
> would go into promisor packs, so if that didn't happen, it is most
> likely that rev-list will fail too).
>
> - Any connectivity checks done by receive-pack, in the (in my opinion,
> unlikely) event that a partial clone serves receive-pack.
>
> I think that these cases are rare enough, and the performance reduction
> in this case minor enough (additional object DB access), that the
> benefit of avoiding a flag outweighs these.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> This is the second half of the work I did previously [1]. Quoting from
> [1]:
>
> > For example, a local fetch was sped up from 6.63s to 3.39s. The bulk of
> > the remaining time is spent in yet another connectivity check
> > (fetch_refs -> check_exist_and_connected) prior to the fetch - that will
> > hopefully be done in a subsequent patch.
>
> This is the subsequent patch. (Note that the timings were done on
> another computer, so don't compare the timings from [1] and this patch
> directly.)
>
> [1] https://lore.kernel.org/git/be1d6aa4c4fd8868f3682b73c01a92d3830534ad.1578802317.git.jonathantanmy@google.com/
> ---
> builtin/clone.c | 7 ++-----
> builtin/fetch.c | 7 -------
> connected.c | 9 +++++++--
> connected.h | 9 ---------
> 4 files changed, 9 insertions(+), 23 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 1ad26f4d8c..4b2b14ff61 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -672,8 +672,7 @@ static void update_remote_refs(const struct ref *refs,
> const char *branch_top,
> const char *msg,
> struct transport *transport,
> - int check_connectivity,
> - int check_refs_are_promisor_objects_only)
> + int check_connectivity)
> {
> const struct ref *rm = mapped_refs;
>
> @@ -682,8 +681,6 @@ static void update_remote_refs(const struct ref *refs,
>
> opt.transport = transport;
> opt.progress = transport->progress;
> - opt.check_refs_are_promisor_objects_only =
> - !!check_refs_are_promisor_objects_only;
>
> if (check_connected(iterate_ref_map, &rm, &opt))
> die(_("remote did not send all necessary objects"));
> @@ -1275,7 +1272,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>
> update_remote_refs(refs, mapped_refs, remote_head_points_at,
> branch_top.buf, reflog_msg.buf, transport,
> - !is_local, filter_options.choice);
> + !is_local);
>
> update_head(our_head_points_at, remote_head, reflog_msg.buf);
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index bf6bab80fa..1097e1e512 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -908,13 +908,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
> if (!connectivity_checked) {
> struct check_connected_options opt = CHECK_CONNECTED_INIT;
>
> - if (filter_options.choice)
> - /*
> - * Since a filter is specified, objects indirectly
> - * referenced by refs are allowed to be absent.
> - */
> - opt.check_refs_are_promisor_objects_only = 1;
> -
> rm = ref_map;
> if (check_connected(iterate_ref_map, &rm, &opt)) {
> rc = error(_("%s did not send all necessary objects\n"), url);
> diff --git a/connected.c b/connected.c
> index 7e9bd1bc62..846f2e4eef 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -52,7 +52,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
> strbuf_release(&idx_file);
> }
>
> - if (opt->check_refs_are_promisor_objects_only) {
> + if (has_promisor_remote()) {
> /*
> * For partial clones, we don't want to have to do a regular
> * connectivity check because we have to enumerate and exclude
> @@ -71,13 +71,18 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
> if (find_pack_entry_one(oid.hash, p))
> goto promisor_pack_found;
> }
> - return 1;
> + /*
> + * Fallback to rev-list with oid and the rest of the
> + * object IDs provided by fn.
> + */
> + goto no_promisor_pack_found;
> promisor_pack_found:
> ;
> } while (!fn(cb_data, &oid));
> return 0;
> }
>
> +no_promisor_pack_found:

Having a look at the final structure of the loop with these gotos, I'm a
little confused. Maybe this isn't C-idiomatic, but I think the code
would be easier to read with helpers instead of gotos. I realize it's
longer, but it took me a while to work out that one goto acts as a
double-continue and the other as a double-break; nested loops tend to
make me want to use helpers. But - I'm a lowly barely-reformed C++
developer, so what do I know ;)

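static int oid_in_promisor(const struct object_id *oid)
{
	struct packed_git *p;

	for (p = get_all_packs(the_repository); p; p = p->next) {
		if (!p->pack_promisor)
			continue;
		if (find_pack_entry_one(oid->hash, p))
			return 1;
	}
	return 0;
}

/* On failure, *oid is left at the first OID not found in a promisor pack. */
static int all_oids_in_promisors(struct object_id *oid, oid_iterate_fn fn,
				 void *cb_data)
{
	do {
		if (!oid_in_promisor(oid))
			return 0;
	} while (!fn(cb_data, oid));
	return 1;
}

int check_connected(...)
{
	...
	if (has_promisor_remote()) {
		if (all_oids_in_promisors(&oid, fn, cb_data))
			return 0;
		/* otherwise fall through to rev-list */
	}

	if (opt->shallow_file) {
		...
	}
	...
}
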
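
To check that the helper shape keeps the fallback behaviour described in
the commit message (the failing OID and all remaining OIDs go to the slow
path), here is a small standalone toy model. Everything in it is made up
for illustration: toy_pack, toy_iter, toy_next() and
toy_check_connected() are not Git APIs, object IDs are plain ints, and
the "slow path" only counts the IDs it would have been handed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a packfile; NOT Git's struct packed_git. */
struct toy_pack {
	int is_promisor;
	const int *ids;		/* object IDs stored in this pack */
	size_t nr;
};

/* Toy stand-in for the (fn, cb_data) pair. */
struct toy_iter {
	const int *ids;
	size_t nr;
	size_t next;
};

/* Same convention as the iterator in the patch: 0 = got an ID, non-zero = done. */
static int toy_next(struct toy_iter *it, int *id)
{
	assert(it->next <= it->nr);
	if (it->next == it->nr)
		return 1;
	*id = it->ids[it->next++];
	return 0;
}

static int id_in_promisor(const struct toy_pack *packs, size_t nr_packs, int id)
{
	for (size_t i = 0; i < nr_packs; i++) {
		if (!packs[i].is_promisor)
			continue;
		for (size_t j = 0; j < packs[i].nr; j++)
			if (packs[i].ids[j] == id)
				return 1;
	}
	return 0;
}

/*
 * Returns 1 if every ID is in some promisor pack. Otherwise returns 0
 * and leaves *id at the first ID that was not found, with the iterator
 * positioned just after it.
 */
static int all_ids_in_promisors(const struct toy_pack *packs, size_t nr_packs,
				struct toy_iter *it, int *id)
{
	do {
		if (!id_in_promisor(packs, nr_packs, *id))
			return 0;
	} while (!toy_next(it, id));
	return 1;
}

/* Returns how many IDs would reach the slow path (0 = fast path passed). */
static size_t toy_check_connected(const struct toy_pack *packs, size_t nr_packs,
				  struct toy_iter *it)
{
	int id;
	size_t slow = 0;

	if (toy_next(it, &id))
		return 0;	/* nothing to check */
	if (all_ids_in_promisors(packs, nr_packs, it, &id))
		return 0;
	do {
		slow++;		/* the patch would pass this OID to rev-list */
	} while (!toy_next(it, &id));
	return slow;
}
```

With one promisor pack holding {1, 2, 3} and one non-promisor pack
holding {9}, checking {1, 3} should never reach the slow path, while
checking {1, 9, 2} should hand it two IDs (9 and 2), without any goto.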
> if (opt->shallow_file) {
> argv_array_push(&rev_list.args, "--shallow-file");
> argv_array_push(&rev_list.args, opt->shallow_file);
> diff --git a/connected.h b/connected.h
> index eba5c261ba..8d5a6b3ad6 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -46,15 +46,6 @@ struct check_connected_options {
> * during a fetch.
> */
> unsigned is_deepening_fetch : 1;
> -
> - /*
> - * If non-zero, only check that the top-level objects referenced by the
> - * wanted refs (passed in as cb_data) are promisor objects. This is
> - * useful for partial clones, where enumerating and excluding all
> - * promisor objects is very slow and the commit-walk itself becomes a
> - * no-op.
> - */
> - unsigned check_refs_are_promisor_objects_only : 1;
> };
>
> #define CHECK_CONNECTED_INIT { 0 }
> --
> 2.25.1.696.g5e7596f4ac-goog
>