All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] defer expensive load_ref_decorations until needed
@ 2017-11-21 23:43 Phil Hord
  2017-11-22  5:03 ` Junio C Hamano
  2017-11-22 21:27 ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Phil Hord @ 2017-11-21 23:43 UTC (permalink / raw)
  To: Git; +Cc: Phil Hord

With many thousands of references, a simple `git rev-parse HEAD` may take
more than a second to return because it first loads all the refs into
memory even though it will never use them.

Defer loading any references until we actually need them.

Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
 log-tree.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 3b904f037..c1509f8b9 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -84,8 +84,10 @@ void add_name_decoration(enum decoration_type type, const char *name, struct obj
 	res->next = add_decoration(&name_decoration, obj, res);
 }
 
+static void maybe_load_ref_decorations();
 const struct name_decoration *get_name_decoration(const struct object *obj)
 {
+	maybe_load_ref_decorations();
 	return lookup_decoration(&name_decoration, obj);
 }
 
@@ -150,10 +152,13 @@ static int add_graft_decoration(const struct commit_graft *graft, void *cb_data)
 
 void load_ref_decorations(int flags)
 {
-	if (!decoration_loaded) {
+	decoration_flags = flags;
+}
 
+static void maybe_load_ref_decorations()
+{
+	if (!decoration_loaded) {
 		decoration_loaded = 1;
-		decoration_flags = flags;
 		for_each_ref(add_ref_decoration, NULL);
 		head_ref(add_ref_decoration, NULL);
 		for_each_commit_graft(add_graft_decoration, NULL);
-- 
2.15.0.471.g17a719cfe.dirty


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-21 23:43 [PATCH] defer expensive load_ref_decorations until needed Phil Hord
@ 2017-11-22  5:03 ` Junio C Hamano
  2017-11-22  6:14   ` Junio C Hamano
  2017-11-22 21:27 ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-11-22  5:03 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git

Phil Hord <phil.hord@gmail.com> writes:

> With many thousands of references, a simple `git rev-parse HEAD` may take
> more than a second to return because it first loads all the refs into
> memory even though it will never use them.
>
> Defer loading any references until we actually need them.
>
> Signed-off-by: Phil Hord <phil.hord@gmail.com>
> ---
>  log-tree.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/log-tree.c b/log-tree.c
> index 3b904f037..c1509f8b9 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -84,8 +84,10 @@ void add_name_decoration(enum decoration_type type, const char *name, struct obj
>  	res->next = add_decoration(&name_decoration, obj, res);
>  }
>  
> +static void maybe_load_ref_decorations();

I'll tweak that "()" and the other one we see below to "(void)"
while queuing.

I am not sure if "maybe_" is a good name here, though.  If anything,
you are making the semantics of "load_ref_decorations()" to "maybe"
(but I do not suggest renaming that one).

How about calling it to load_ref_decorations_lazily() or something?

I also wonder if decoration_loaded should now become function-scope
static in this new helper, but that can be left outside of the
topic.

Other than that, I like what this patch attempts to do.  A nicely
identified low-hanging fruit ;-).

Thanks.

>  const struct name_decoration *get_name_decoration(const struct object *obj)
>  {
> +	maybe_load_ref_decorations();
>  	return lookup_decoration(&name_decoration, obj);
>  }
>  
> @@ -150,10 +152,13 @@ static int add_graft_decoration(const struct commit_graft *graft, void *cb_data)
>  
>  void load_ref_decorations(int flags)
>  {
> -	if (!decoration_loaded) {
> +	decoration_flags = flags;
> +}
>  
> +static void maybe_load_ref_decorations()
> +{
> +	if (!decoration_loaded) {
>  		decoration_loaded = 1;
> -		decoration_flags = flags;
>  		for_each_ref(add_ref_decoration, NULL);
>  		head_ref(add_ref_decoration, NULL);
>  		for_each_commit_graft(add_graft_decoration, NULL);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-22  5:03 ` Junio C Hamano
@ 2017-11-22  6:14   ` Junio C Hamano
  2017-11-22 17:45     ` Phil Hord
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-11-22  6:14 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git, Rafael Ascensão

Junio C Hamano <gitster@pobox.com> writes:

> Other than that, I like what this patch attempts to do.  A nicely
> identified low-hanging fruit ;-).

Having said that, this will have a bad interaction with another
topic in flight: <20171121213341.13939-1-rafa.almas@gmail.com>

Perhaps this should wait until the other topic lands and stabilizes.
We'd need to rethink if the approach taken by this patch, i.e. to
still pass the info to load() but holding onto it until the time
lazy_load() actually uses it, is a sensible way forward, or we would
want to change the calling convention to help making it easier to
implement the lazy loading.

Thanks.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-22  6:14   ` Junio C Hamano
@ 2017-11-22 17:45     ` Phil Hord
  0 siblings, 0 replies; 9+ messages in thread
From: Phil Hord @ 2017-11-22 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git, Rafael Ascensão

On Tue, Nov 21, 2017, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> I am not sure if "maybe_" is a good name here, though.  If anything,
> you are making the semantics of "load_ref_decorations()" to "maybe"
> (but I do not suggest renaming that one).
>
> How about calling it to load_ref_decorations_lazily() or something?

I groped about for something conventional, but "..._gently" didn't fit
the bill, so I went with "maybe".  I like "lazily" better for this
case. I will change it for v2.

>> Other than that, I like what this patch attempts to do.  A nicely
>> identified low-hanging fruit ;-).
>
> Having said that, this will have a bad interaction with another
> topic in flight: <20171121213341.13939-1-rafa.almas@gmail.com>
>
> Perhaps this should wait until the other topic lands and stabilizes.
> We'd need to rethink if the approach taken by this patch, i.e. to
> still pass the info to load() but holding onto it until the time
> lazy_load() actually uses it, is a sensible way forward, or we would
> want to change the calling convention to help making it easier to
> implement the lazy loading.

I noticed that after just after cleaning this one up, but didn't look
closely yet.  I'll hold this in my local queue until ra lands.

P

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-21 23:43 [PATCH] defer expensive load_ref_decorations until needed Phil Hord
  2017-11-22  5:03 ` Junio C Hamano
@ 2017-11-22 21:27 ` Jeff King
  2017-11-22 23:21   ` Phil Hord
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-11-22 21:27 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git

On Tue, Nov 21, 2017 at 03:43:36PM -0800, Phil Hord wrote:

> With many thousands of references, a simple `git rev-parse HEAD` may take
> more than a second to return because it first loads all the refs into
> memory even though it will never use them.

The overall goal of lazy-loading seems reasonable, but I'm slightly
confused: how and why does "git rev-parse HEAD" load ref decorations?

Grepping around I find that we mostly load them only when appropriate
(when the "log" family sees a decorate option, when we see %d/%D in a
pretty format, or with --simplify-by-decoration in a traversal). And
poking at "rev-parse HEAD" in gdb seems to confirm that it does not hit
that function.

I have definitely seen "rev-parse HEAD" be O(# of refs), but that is
mostly attributable to having all the refs packed (and until v2.15.0,
the packed-refs code would read the whole file into memory). I've also
seen unnecessary ref lookups due to replace refs (we load al of the
packed refs to find out that no, there's nothing in refs/replace).

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-22 21:27 ` Jeff King
@ 2017-11-22 23:21   ` Phil Hord
  2017-11-22 23:48     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Hord @ 2017-11-22 23:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Git

On Wed, Nov 22, 2017 at 1:27 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Nov 21, 2017 at 03:43:36PM -0800, Phil Hord wrote:
>
>> With many thousands of references, a simple `git rev-parse HEAD` may take
>> more than a second to return because it first loads all the refs into
>> memory even though it will never use them.
>
> The overall goal of lazy-loading seems reasonable, but I'm slightly
> confused: how and why does "git rev-parse HEAD" load ref decorations?
>
> Grepping around I find that we mostly load them only when appropriate
> (when the "log" family sees a decorate option, when we see %d/%D in a
> pretty format, or with --simplify-by-decoration in a traversal). And
> poking at "rev-parse HEAD" in gdb seems to confirm that it does not hit
> that function.

Hm. I think I was confused.

I wrote v1 of this patch a few months ago. Clearly I was wrong about
rev-parse being afflicted.  We have a script that was suffering and it
uses both "git log --format=%h" and "git rev-parse" to get hashes; I
remember testing both, but I can't find it in my $zsh_history; my
memory and my commit-message must be faulty.

However, "git log" does not need any --decorate option to trigger this lag.

    $ git for-each-ref| wc -l
    24172
    $ time git log --format=%h -1
    git log --format=%h -1   0.47s user 0.04s system 99% cpu 0.509 total

I grepped the code just now, too, and I see the same as you, though;
it seems to hold off unless !!decoration_style.  Nevertheless, gdb
shows me decoration_style=1 with this command:

    GIT_CONFIG=/dev/null cgdb --args git log -1 --format="%h"

Here are timing tests on this repo without this change:

    git log --format=%h -1             0.54s user 0.05s system 99% cpu
0.597 total
    git log --format=%h -1 --decorate  0.54s user 0.04s system 98% cpu
0.590 total
    git log --format=%h%d -1           0.53s user 0.05s system 99% cpu
0.578 total

And the same commands with this change:

    git log --format=%h -1              0.01s user 0.01s system 71%
cpu 0.017 total
    git log --format=%h -1 --decorate   0.00s user 0.01s system 92%
cpu 0.009 total
    git log --format=%h%d -1            0.53s user 0.09s system 88%
cpu 0.699 total

> I have definitely seen "rev-parse HEAD" be O(# of refs), but that is
> mostly attributable to having all the refs packed (and until v2.15.0,
> the packed-refs code would read the whole file into memory).

Hm.  Could this be why rev-parse was slow for me?  My original problem
showed up on v1.9 (build machine) and I patched it on v2.14.0-rc1.
But, no; testing on 1.9, 2.11 and 2.14 still doesn't show me the lag
in rev-parse.  I remain befuddled.

> I've also
> seen unnecessary ref lookups due to replace refs (we load al of the
> packed refs to find out that no, there's nothing in refs/replace).

I haven't seen this in the code, but I have had refs/replace hacks in
the past. Is that enough to wake this up?

Phil

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-22 23:21   ` Phil Hord
@ 2017-11-22 23:48     ` Jeff King
  2017-11-23  2:19       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-11-22 23:48 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git

On Wed, Nov 22, 2017 at 03:21:06PM -0800, Phil Hord wrote:

> Hm. I think I was confused.
> 
> I wrote v1 of this patch a few months ago. Clearly I was wrong about
> rev-parse being afflicted.  We have a script that was suffering and it
> uses both "git log --format=%h" and "git rev-parse" to get hashes; I
> remember testing both, but I can't find it in my $zsh_history; my
> memory and my commit-message must be faulty.

OK, that makes more sense (that log would see it).

> However, "git log" does not need any --decorate option to trigger this lag.
> 
>     $ git for-each-ref| wc -l
>     24172
>     $ time git log --format=%h -1
>     git log --format=%h -1   0.47s user 0.04s system 99% cpu 0.509 total
>
> I grepped the code just now, too, and I see the same as you, though;
> it seems to hold off unless !!decoration_style.  Nevertheless, gdb
> shows me decoration_style=1 with this command:
> 
>     GIT_CONFIG=/dev/null cgdb --args git log -1 --format="%h"
> 

Right, the default these days is "auto decorate", so it's enabled if
your output is to a terminal. So "git log --no-decorate" should be cheap
again (or you may want to set log.decorate=false in your config).

And lazy-load wouldn't help you there for a normal:

  git log

But what's interesting in your command is the pretty-format. Even though
decoration is turned on, your format doesn't show any. So we never
actually ask "is this commit decorated" and the lazy-load helps.

So I think your patch is doing the right thing, but the explanation
should probably cover that it is really helping non-decorating formats.

> Here are timing tests on this repo without this change:
> 
>     git log --format=%h -1             0.54s user 0.05s system 99% cpu
> 0.597 total
>     git log --format=%h -1 --decorate  0.54s user 0.04s system 98% cpu
> 0.590 total
>     git log --format=%h%d -1           0.53s user 0.05s system 99% cpu
> 0.578 total
> 
> And the same commands with this change:
> 
>     git log --format=%h -1              0.01s user 0.01s system 71%
> cpu 0.017 total
>     git log --format=%h -1 --decorate   0.00s user 0.01s system 92%
> cpu 0.009 total
>     git log --format=%h%d -1            0.53s user 0.09s system 88%
> cpu 0.699 total

Yeah, that's consistent with what I'd expect.

> > I have definitely seen "rev-parse HEAD" be O(# of refs), but that is
> > mostly attributable to having all the refs packed (and until v2.15.0,
> > the packed-refs code would read the whole file into memory).
> 
> Hm.  Could this be why rev-parse was slow for me?  My original problem
> showed up on v1.9 (build machine) and I patched it on v2.14.0-rc1.
> But, no; testing on 1.9, 2.11 and 2.14 still doesn't show me the lag
> in rev-parse.  I remain befuddled.

Doing "rev-parse HEAD" would still have to load the packed refs if the
thing that HEAD points to is in there. Perhaps your current HEAD is
detached, or you have a loose ref for the current branch? Try "git
pack-refs --all --prune" and then re-time.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-22 23:48     ` Jeff King
@ 2017-11-23  2:19       ` Junio C Hamano
  2017-11-23  4:00         ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-11-23  2:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Phil Hord, Git

Jeff King <peff@peff.net> writes:

> And lazy-load wouldn't help you there for a normal:
>
>   git log
>
> But what's interesting in your command is the pretty-format. Even though
> decoration is turned on, your format doesn't show any. So we never
> actually ask "is this commit decorated" and the lazy-load helps.

Hmph, I wonder if we can detect this case and not make a call to
load decorations in the first place.  That would remove the need to
remember the options when load is called so that we can use it when
we load decorations lazily later.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] defer expensive load_ref_decorations until needed
  2017-11-23  2:19       ` Junio C Hamano
@ 2017-11-23  4:00         ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2017-11-23  4:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phil Hord, Git

On Thu, Nov 23, 2017 at 11:19:42AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > And lazy-load wouldn't help you there for a normal:
> >
> >   git log
> >
> > But what's interesting in your command is the pretty-format. Even though
> > decoration is turned on, your format doesn't show any. So we never
> > actually ask "is this commit decorated" and the lazy-load helps.
> 
> Hmph, I wonder if we can detect this case and not make a call to
> load decorations in the first place.  That would remove the need to
> remember the options when load is called so that we can use it when
> we load decorations lazily later.

Probably userformat_find_requirements() could (and should) be taught to
report on decoration flags, like it does for notes. As it is now we call
load_ref_decorations() repeatedly while processing the commits (which
works because it's a noop after the first call).

And once we can do that, it would be easy to do something like:

  if (decoration_style && !want->decorations)
	decoration_style = 0;

But I think that may only be part of the story. Do all of the output
formats show decorations? I think --format=email, for instance, does
not.

So the real question is not just "does the user format want
decorations", but "does the pretty format want decorations". Which
requires knowing things about each format that might get out of sync
with the rest of the code. That might be OK if it lives in pretty.c. But
the lazy-load thing make sit just work without having to duplicate the
logic at all.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-23  4:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-21 23:43 [PATCH] defer expensive load_ref_decorations until needed Phil Hord
2017-11-22  5:03 ` Junio C Hamano
2017-11-22  6:14   ` Junio C Hamano
2017-11-22 17:45     ` Phil Hord
2017-11-22 21:27 ` Jeff King
2017-11-22 23:21   ` Phil Hord
2017-11-22 23:48     ` Jeff King
2017-11-23  2:19       ` Junio C Hamano
2017-11-23  4:00         ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.