All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
Date: Tue, 4 Dec 2018 15:12:21 -0800	[thread overview]
Message-ID: <CAGZ79kYOOk2ODYgRcSZgDUqBfx2HeywnEGpbJB9BrrVzEUi_JA@mail.gmail.com> (raw)
In-Reply-To: <20181204224238.50966-1-jonathantanmy@google.com>

On Tue, Dec 4, 2018 at 2:42 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When fetching into a repository, a connectivity check is first made by
> check_exist_and_connected() in builtin/fetch.c that runs:
>
>   git rev-list --objects --stdin --not --all --quiet <(list of objects)
>
> If the client repository has many refs, this command can be slow,
> regardless of the nature of the server repository or what is being
> fetched. A profiler reveals that most of the time is spent in
> setup_revisions() (approx. 60/63), and of the time spent in
> setup_revisions(), most of it is spent in parse_object() (approx.
> 49/60). This is because setup_revisions() parses the target of every ref
> (from "--all"), and parse_object() reads the buffer of the object.
>
> Reading the buffer is unnecessary if the repository has a commit graph
> and if the ref points to a commit (which is typically the case). This
> patch uses the commit graph wherever possible; on my computer, when I
> run the above command with a list of 1 object on a many-ref repository,
> I get a speedup from 1.8s to 1.0s.
>
> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> function.

This is a mere nicety, not strictly required.
Before we had parse_commit(struct commit *) which would accomplish the
same, (and we'd still have that afterwards as a #define falling back onto
the_repository). As the function get_reference() is not the_repository safe
as it contains a call to is_promisor_object() that is repository
agnostic, I think
it would be fair game to not depend on that series. I am not
complaining, though.

> A colleague noticed this issue when handling a mirror clone.
>
> Looking at the bigger picture, the speed of the connectivity check
> during a fetch might be further improved by passing only the negotiation
> tips (obtained through --negotiation-tip) instead of "--all". This patch
> just handles the low-hanging fruit first.
> ---
>  revision.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/revision.c b/revision.c
> index b5108b75ab..e7da2c57ab 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  {
>         struct object *object;
>
> -       object = parse_object(revs->repo, oid);
> +       /*
> +        * If the repository has commit graphs, repo_parse_commit() avoids
> +        * reading the object buffer, so use it whenever possible.
> +        */
> +       if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
> +               struct commit *c = lookup_commit(revs->repo, oid);
> +               if (!repo_parse_commit(revs->repo, c))
> +                       object = (struct object *) c;
> +               else
> +                       object = NULL;

Would it make sense in this case to rely on parse_object below
instead of assigning NULL? The reason for that would be that
when lookup_commit returns NULL, we would try more broadly.

AFAICT oid_object_info doesn't take advantage of the commit graph,
but just looks up the object header, which is still less than completely
parsing it. Then lookup_commit is overly strict, as it may return
NULL as when there still is a type mismatch (I don't think a mismatch
could happen here, as both rely on just the object store, and not the
commit graph.), so this would be just defensive programming for
the sake of it. I dunno.

    struct commit *c;

    if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
        (c = lookup_commit(revs->repo, oid)) &&
        !repo_parse_commit(revs->repo, c))
            object = (struct object *) c;
    else
        object = parse_object(revs->repo, oid);


So with all that said, I still think this is a good patch.

Thanks,
Stefan

> +       } else {
> +               object = parse_object(revs->repo, oid);
> +       }
> +
>         if (!object) {
>                 if (revs->ignore_missing)
>                         return object;
> --
> 2.19.0.271.gfe8321ec05.dirty
>

  reply	other threads:[~2018-12-04 23:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04 22:42 [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference() Jonathan Tan
2018-12-04 23:12 ` Stefan Beller [this message]
2018-12-06 23:36   ` Jonathan Tan
2018-12-07 13:49     ` Derrick Stolee
2018-12-05  4:54 ` Jeff King
2018-12-06 23:54   ` Jonathan Tan
2018-12-07  8:53     ` Jeff King
2018-12-05 23:15 ` Junio C Hamano
2018-12-07 21:50 ` [PATCH on master v2] " Jonathan Tan
2018-12-09  0:51   ` Junio C Hamano
2018-12-09  1:49     ` Junio C Hamano
2018-12-11 10:54     ` Jeff King
2018-12-12 19:58       ` Jonathan Tan
2018-12-13  1:27         ` Jeff King
2018-12-13 16:20           ` Derrick Stolee
2018-12-13 18:54 ` [PATCH v3] " Jonathan Tan
2018-12-14  3:20   ` Junio C Hamano
2018-12-14  8:45   ` Jeff King
2019-01-25 15:33 ` Regression in: [PATCH on sb/more-repo-in-api] " SZEDER Gábor
2019-01-25 19:56   ` Stefan Beller
2019-01-25 22:01     ` Jonathan Tan
2019-01-25 22:14     ` SZEDER Gábor
2019-01-25 22:21       ` SZEDER Gábor
2019-01-27 13:08         ` [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit' SZEDER Gábor
2019-01-27 13:28           ` SZEDER Gábor
2019-01-27 18:40             ` Derrick Stolee
2019-01-28 16:15           ` Jeff King
2019-01-28 16:57           ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kYOOk2ODYgRcSZgDUqBfx2HeywnEGpbJB9BrrVzEUi_JA@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.