All of lore.kernel.org
 help / color / mirror / Atom feed
* git log with ordering option and --first-parent is unnecessarily slow
@ 2016-04-09  3:55 Josiah Boning
  2016-04-09  5:01 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Josiah Boning @ 2016-04-09  3:55 UTC (permalink / raw)
  To: git

As measured on linux.git, adding --date-order to a log command can
result in a significant slowdown (~25x here; I've seen ~100x on other
repositories):

    $ time git log --first-parent --max-count=51 master > /dev/null
    real    0m0.024s
    user    0m0.006s
    sys    0m0.016s
    $ time git log --date-order --first-parent --max-count=51 master > /dev/null
    real    0m0.652s
    user    0m0.570s
    sys    0m0.074s

In combination with --first-parent, --date-order (or any other
ordering option) should be a no-op, since --first-parent selects a
linear history. So it seems like there's a significant performance win
available by ignoring ordering options when --first-parent is present.
Would this change be desirable? If so, I'll see about submitting a
patch.

More generally, it seems like it might be possible to identify when
the selected commits are linear and avoid the cost of sorting.

Josiah

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git log with ordering option and --first-parent is unnecessarily slow
  2016-04-09  3:55 git log with ordering option and --first-parent is unnecessarily slow Josiah Boning
@ 2016-04-09  5:01 ` Jeff King
  2016-04-09  5:42   ` Josiah Boning
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2016-04-09  5:01 UTC (permalink / raw)
  To: Josiah Boning; +Cc: git

On Fri, Apr 08, 2016 at 08:55:54PM -0700, Josiah Boning wrote:

> As measured on linux.git, adding --date-order to a log command can
> result in a significant slowdown (~25x here; I've seen ~100x on other
> repositories):
> 
>     $ time git log --first-parent --max-count=51 master > /dev/null
>     real    0m0.024s
>     user    0m0.006s
>     sys    0m0.016s
>     $ time git log --date-order --first-parent --max-count=51 master > /dev/null
>     real    0m0.652s
>     user    0m0.570s
>     sys    0m0.074s

Try timing "git log --first-parent master >/dev/null". It should be
about the same as the latter, which gives a hint about what is going on.

It's the "--max-count" which is interesting here. It is applied _after_
the sorting. So in the non-sorted case, git can stream out commits and
stop after printing 51. In the sorted case, we have to access every
commit to get its date (well, every commit on the first-parent path),
then sort them, and then return the first 51.

> In combination with --first-parent, --date-order (or any other
> ordering option) should be a no-op, since --first-parent selects a
> linear history. So it seems like there's a significant performance win
> available by ignoring ordering options when --first-parent is present.
> Would this change be desirable? If so, I'll see about submitting a
> patch.

I do agree that --date-order on a linear parent walk cannot change the
ordering (which guarantees child-before-parent), and is a noop.  But
note that not all first-parent invocations are strictly linear. For
example:

  git log --first-parent --date-order master next

which starts from two tips. We'd still want to order commits from the
two lists according to --date-order.

I suppose we could catch the single-tip, first-parent case and ignore
any ordering options which imply child-before-parent (which is currently
all of them). But I did not think too hard if there are any other corner
cases. This sounds like a case of "doctor, it hurts when I do this". Why
do you want to add --date-order in such a case, when it is a noop?

> More generally, it seems like it might be possible to identify when
> the selected commits are linear and avoid the cost of sorting.

It's not the cost of sorting. It's the cost of accessing the commits (if
you profile, you should see most time spent in zlib). So figuring out
that the case is linear will require roughly the same expense.

-Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git log with ordering option and --first-parent is unnecessarily slow
  2016-04-09  5:01 ` Jeff King
@ 2016-04-09  5:42   ` Josiah Boning
  0 siblings, 0 replies; 3+ messages in thread
From: Josiah Boning @ 2016-04-09  5:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Fri, Apr 8, 2016 at 10:01 PM, Jeff King wrote:
> I do agree that --date-order on a linear parent walk cannot change the
> ordering (which guarantees child-before-parent), and is a noop.  But
> note that not all first-parent invocations are strictly linear. For
> example:
>
>   git log --first-parent --date-order master next
>
> which starts from two tips. We'd still want to order commits from the
> two lists according to --date-order.

Ah, indeed; multiple tips certainly count as a nonlinearity.

> I suppose we could catch the single-tip, first-parent case and ignore
> any ordering options which imply child-before-parent (which is currently
> all of them). But I did not think too hard if there are any other corner
> cases. This sounds like a case of "doctor, it hurts when I do this". Why
> do you want to add --date-order in such a case, when it is a noop?

It hurt when I did that, so I stopped. In my case, the git log command
was written when the repositories involved were much smaller, so the
delay was not so noticeable. Today I found and fixed the
invocation--but it still seems nice to fix at the git level, if that's
possible without adding much complexity.

> It's not the cost of sorting. It's the cost of accessing the commits (if
> you profile, you should see most time spent in zlib). So figuring out
> that the case is linear will require roughly the same expense.

I'm imagining that while traversing the commit graph, one could keep
an is_linear flag, which is set initially if starting from a single
tip, and becomes unset when a merge commit is encountered. If
is_linear is still true when the entire requested range or the desired
number of commit sare accumulated, then there's no need to sort.

This is, as a disclaimer, the guesswork of someone not familiar with
the codebase. I'll go read the source before imagining further :)

Josiah

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-04-09  5:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-09  3:55 git log with ordering option and --first-parent is unnecessarily slow Josiah Boning
2016-04-09  5:01 ` Jeff King
2016-04-09  5:42   ` Josiah Boning

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.