All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-08 20:44 Ben Peart
  2016-09-08 21:22 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Peart @ 2016-09-08 20:44 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, =peartben, Ben Peart

Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

    merge_working_tree <-- 95%
        unpack_trees <-- 80%
            traverse_trees <-- 50%
            cache_tree_update <-- 17%
            mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406               trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667       performance: 17.442926555 s: read_index_from
name-hash.c:128         performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208       performance: 4.387713335 s: write_locked_index
trace.c:420             performance: 166.458921289 s: git command:
                                        'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406               trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667       performance: 16.100742476 s: read_index_from
trace.c:420             performance: 16.461547867 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/checkout.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..595d64b 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -827,10 +827,25 @@ static int switch_branches(const struct checkout_opts *opts,
 		parse_commit_or_die(new->commit);
 	}
 
-	ret = merge_working_tree(opts, &old, new, &writeout_error);
-	if (ret) {
-		free(path_to_free);
-		return ret;
+	/*
+	 * Optimize the performance of checkout when the current and
+	 * new branch have the same OID and avoid the trivial merge.
+	 * For example, a "git checkout -b foo" just needs to create
+	 * the new ref and report the stats.
+	 */
+	if (!old.commit || !new->commit
+		|| oidcmp(&old.commit->object.oid, &new->commit->object.oid)
+		|| !opts->new_branch || opts->new_branch_force || opts->new_orphan_branch
+		|| opts->patch_mode || opts->merge || opts->force || opts->force_detach
+		|| opts->writeout_stage || !opts->overwrite_ignore
+		|| opts->ignore_skipworktree || opts->ignore_other_worktrees
+		|| opts->new_branch_log || opts->branch_exists || opts->prefix
+		|| opts->source_tree) {
+		ret = merge_working_tree(opts, &old, new, &writeout_error);
+		if (ret) {
+			free(path_to_free);
+			return ret;
+		}
 	}
 
 	if (!opts->quiet && !old.path && old.commit && new->commit != old.commit)
-- 
2.10.0.windows.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-08 20:44 [PATCH] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
@ 2016-09-08 21:22 ` Junio C Hamano
  2016-09-08 21:37   ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2016-09-08 21:22 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, pclouds, =peartben, Ben Peart

Ben Peart <peartben@gmail.com> writes:

> Teach git to avoid unnecessary merge during trivial checkout.  When
> running 'git checkout -b foo' git follows a common code path through
> the expensive merge_working_tree even when it is unnecessary.

I would be lying if I said I am not sympathetic to the cause, but...

> +	/*
> +	 * Optimize the performance of checkout when the current and
> +	 * new branch have the same OID and avoid the trivial merge.
> +	 * For example, a "git checkout -b foo" just needs to create
> +	 * the new ref and report the stats.
> +	 */
> +	if (!old.commit || !new->commit
> +		|| oidcmp(&old.commit->object.oid, &new->commit->object.oid)
> +		|| !opts->new_branch || opts->new_branch_force || opts->new_orphan_branch
> +		|| opts->patch_mode || opts->merge || opts->force || opts->force_detach
> +		|| opts->writeout_stage || !opts->overwrite_ignore
> +		|| opts->ignore_skipworktree || opts->ignore_other_worktrees
> +		|| opts->new_branch_log || opts->branch_exists || opts->prefix
> +		|| opts->source_tree) {

... this is a maintenance nightmare in that any new option we will
add later will need to consider what this "optimization" is trying
(not) to skip.  The first two lines (i.e. we need a real checkout if
we cannot positively say that old and new commits are the same
object) are clear, but no explanation was given for all the other
random conditions this if condition checks.  What if opts->something
was not listed (or "listed" for that matter) in the list above--it
is totally unclear if it was missed by mistake (or "added by
mistake") or deliberately excluded (or "deliberately added").

For example, why is opts->prefix there?  If

	git checkout -b new-branch HEAD

should be able to omit the two-way merge, shouldn't

	cd t && git checkout -b new-branch HEAD

also be able to?

Even the main condition is unclear.  It wants to see that old and
new have exactly the same commit, but shouldn't the "the result of
the two-way merge is known to be no-op" logic equally apply if the
old and two trees are the same?





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-08 21:22 ` Junio C Hamano
@ 2016-09-08 21:37   ` Jeff King
  2016-09-09 13:29     ` Ben Peart
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2016-09-08 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, git, pclouds, =peartben, Ben Peart

On Thu, Sep 08, 2016 at 02:22:16PM -0700, Junio C Hamano wrote:

> > +	/*
> > +	 * Optimize the performance of checkout when the current and
> > +	 * new branch have the same OID and avoid the trivial merge.
> > +	 * For example, a "git checkout -b foo" just needs to create
> > +	 * the new ref and report the stats.
> > +	 */
> > +	if (!old.commit || !new->commit
> > +		|| oidcmp(&old.commit->object.oid, &new->commit->object.oid)
> > +		|| !opts->new_branch || opts->new_branch_force || opts->new_orphan_branch
> > +		|| opts->patch_mode || opts->merge || opts->force || opts->force_detach
> > +		|| opts->writeout_stage || !opts->overwrite_ignore
> > +		|| opts->ignore_skipworktree || opts->ignore_other_worktrees
> > +		|| opts->new_branch_log || opts->branch_exists || opts->prefix
> > +		|| opts->source_tree) {
> 
> ... this is a maintenance nightmare in that any new option we will
> add later will need to consider what this "optimization" is trying
> (not) to skip.  The first two lines (i.e. we need a real checkout if
> we cannot positively say that old and new commits are the same
> object) are clear, but no explanation was given for all the other
> random conditions this if condition checks.  What if opts->something
> was not listed (or "listed" for that matter) in the list above--it
> is totally unclear if it was missed by mistake (or "added by
> mistake") or deliberately excluded (or "deliberately added").
> 
> For example, why is opts->prefix there?  If
> 
> 	git checkout -b new-branch HEAD
> 
> should be able to omit the two-way merge, shouldn't
> 
> 	cd t && git checkout -b new-branch HEAD
> 
> also be able to?

I was just writing another reply, but I think our complaints may have
dovetailed.

My issue is that the condition above is an unreadable mass.  It would be
really nice to pull it out into a helper function, and then all of the
items could be split out and commented independently, like:

  static int needs_working_tree_merge(const struct checkout_opts *opts,
                                      const struct branch_info *old,
				      const struct branch_info *new)
  {
	/*
	 * We must do the merge if we are actually moving to a new
	 * commit.
	 */
	if (!old->commit || !new->commit ||
	    oidcmp(&old.commit->object.oid, &new->commit->object.oid))
		return 1;

	/* Option "foo" is not compatible because of... */
	if (opts->foo)
		return 1;

	... etc ...
  }

That still leaves your "what if opts->something is not listed" question
open, but at least it makes it easier to comment on it in the code.

-Peff

PS I didn't think hard on whether the conditions above make _sense_. My
   first goal would be to get more communication about them individually,
   and then we can evaluate them.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-08 21:37   ` Jeff King
@ 2016-09-09 13:29     ` Ben Peart
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Peart @ 2016-09-09 13:29 UTC (permalink / raw)
  To: 'Jeff King', 'Junio C Hamano'
  Cc: git, pclouds, peartben, 'Ben Peart'



> -----Original Message-----
> From: Jeff King [mailto:peff@peff.net]
> Sent: Thursday, September 8, 2016 5:38 PM
> To: Junio C Hamano <gitster@pobox.com>
> Cc: Ben Peart <peartben@gmail.com>; git@vger.kernel.org;
> pclouds@gmail.com; =peartben@gmail.com; Ben Peart
> <benpeart@microsoft.com>
> Subject: Re: [PATCH] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> On Thu, Sep 08, 2016 at 02:22:16PM -0700, Junio C Hamano wrote:
> 
> > > +	/*
> > > +	 * Optimize the performance of checkout when the current and
> > > +	 * new branch have the same OID and avoid the trivial merge.
> > > +	 * For example, a "git checkout -b foo" just needs to create
> > > +	 * the new ref and report the stats.
> > > +	 */
> > > +	if (!old.commit || !new->commit
> > > +		|| oidcmp(&old.commit->object.oid, &new->commit-
> >object.oid)
> > > +		|| !opts->new_branch || opts->new_branch_force || opts-
> >new_orphan_branch
> > > +		|| opts->patch_mode || opts->merge || opts->force || opts-
> >force_detach
> > > +		|| opts->writeout_stage || !opts->overwrite_ignore
> > > +		|| opts->ignore_skipworktree || opts-
> >ignore_other_worktrees
> > > +		|| opts->new_branch_log || opts->branch_exists || opts-
> >prefix
> > > +		|| opts->source_tree) {
> >
> > ... this is a maintenance nightmare in that any new option we will add
> > later will need to consider what this "optimization" is trying
> > (not) to skip.  The first two lines (i.e. we need a real checkout if
> > we cannot positively say that old and new commits are the same
> > object) are clear, but no explanation was given for all the other
> > random conditions this if condition checks.  What if opts->something
> > was not listed (or "listed" for that matter) in the list above--it is
> > totally unclear if it was missed by mistake (or "added by
> > mistake") or deliberately excluded (or "deliberately added").
> >
> > For example, why is opts->prefix there?  If
> >
> > 	git checkout -b new-branch HEAD
> >
> > should be able to omit the two-way merge, shouldn't
> >
> > 	cd t && git checkout -b new-branch HEAD
> >
> > also be able to?

Because this induces a behavior change (the optimized path will no 
longer do a "soft reset" and regenerate the index for example) I was
attempting to make it as restrictive as possible but still enable the
fast path in the most common case.  If everyone is OK with the behavior
change, I can make the optimization more inclusive by removing those
tests that are not absolutely required (like opts->prefix).

To help ensure the optimization is updated when new checkout options are
added I could add a comment into the checkout_opts structure and/or put
a pseudo version check into the code so if the size of the structure
changes, the fast path fails.  That feels a little hacky and I haven't
seen that in other areas so I'd rather stick with splitting it out into
a helper function and add comments.

> 
> I was just writing another reply, but I think our complaints may have
> dovetailed.
> 
> My issue is that the condition above is an unreadable mass.  It would be
> really nice to pull it out into a helper function, and then all of the items could
> be split out and commented independently, like:
> 
>   static int needs_working_tree_merge(const struct checkout_opts *opts,
>                                       const struct branch_info *old,
> 				      const struct branch_info *new)
>   {
> 	/*
> 	 * We must do the merge if we are actually moving to a new
> 	 * commit.
> 	 */
> 	if (!old->commit || !new->commit ||
> 	    oidcmp(&old.commit->object.oid, &new->commit->object.oid))
> 		return 1;
> 
> 	/* Option "foo" is not compatible because of... */
> 	if (opts->foo)
> 		return 1;
> 
> 	... etc ...
>   }

That is a great suggestion.  Splitting this out into a helper function 
with comments will definitely make this more readable/maintainable and 
provide more information on why each test is there.  I'll do that and
reroll the patch.

> 
> That still leaves your "what if opts->something is not listed" question open,
> but at least it makes it easier to comment on it in the code.
> 
> -Peff
> 
> PS I didn't think hard on whether the conditions above make _sense_. My
>    first goal would be to get more communication about them individually,
>    and then we can evaluate them.



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-09-09 13:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08 20:44 [PATCH] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-08 21:22 ` Junio C Hamano
2016-09-08 21:37   ` Jeff King
2016-09-09 13:29     ` Ben Peart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.