git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Ben Peart <Ben.Peart@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2] checkout: optimize "git checkout -b <new_branch>"
Date: Wed, 15 Aug 2018 17:05:36 -0400	[thread overview]
Message-ID: <d15a36bc-96a8-5d92-5e41-23d726535cf3@gmail.com> (raw)
In-Reply-To: <21ac52cb-6a23-01c7-c593-59bd86ebca52@gmail.com>



On 8/6/2018 10:25 AM, Ben Peart wrote:
> 
> 
> On 8/3/2018 11:58 AM, Duy Nguyen wrote:
>> On Thu, Aug 02, 2018 at 02:02:00PM -0400, Ben Peart wrote:
>>>
>>>
> 
>> But if you still want to push it further, this is something I have in
>> mind. It probably has bugs, but at least preliminary test shows me
>> that it could skip 99% work inside unpack_trees() and not need to
>> write the index.
>>
>> The main check to determine "checkout -b" is basically the new
>> oidcmp() in merge_working_tree(). Not sure if I miss any condition in
>> that check, I didn't look very closely at checkout code.
>>
> 
> Thanks Duy.  I think this is an interesting idea to pursue but... when I 
> tried running this patch on a virtualized repo it started triggering 
> many object downloads.  After taking a quick look, it appears that 
> CE_UPDATE is set on every cache entry so check_updates() ends up calling 
> checkout_entry() which writes out every file to the working tree - even 
> those supposedly skipped by the skip-wortree bit.  Oops.
> 
> Not too surprising (you did say it probably has bugs :)) but it means I 
> can't trivially get performance data on how much this will help.  It 
> also fails a lot of tests (see below).
> 
> It experience does highlight the additional risk of this model of 
> changing the underlying functions (vs the high level optimization of my 
> original patch).  In addition, the new special cases in those 
> lower-level functions do add additional complexity and fragility to the 
> codebase.  So, like most things, to me it isn't a clear better/worse 
> decision - it's just different.  While I like the idea of general 
> optimizations that could apply more broadly to other commands; I do 
> worry about the additional complexity, amount of code churn, and 
> associated risk with the change.
> 
> When I have cycles, I'll take a look at how to fix this bug and get some 
> performance data.  I just wanted to give you a heads up that I'm not 
> ignoring your patch, just that it is going to take additional time and 
> effort before I can properly evaluate how much impact it will have.
> 

Now that the unpack-trees and cache-tree optimizations are settling 
down, I took a look at this proposed patch again with the intent of 
debugging why so many tests were broken by it.

The most obvious first fix was for all the segment faults when 
dereferencing a NULL pointer.  Adding an additional test so that we only 
perform the optimization when we actually have commit ID's to compare 
fixed a bunch of the test failures.

The next fix was to resolve all the breaks caused by applying this 
optimization when sparse-checkout is turned on.  Since we are skipping 
the logic to update the skip-worktree bit, I added an additional test so 
that we only perform the optimization when sparse checkout is not turned 
on.  Of course, this does completely remove the optimization when using 
sparse checkout so it isn't a workable permanent solution but it let me 
make progress.

There are still test failures with submodules and partial clone.  I 
haven't found/added the necessary tests to prevent those breaks nor the 
few other remaining breaks.

My current set of tests looks like this:

	if (!core_apply_sparse_checkout &&
		old_branch_info->commit &&
		new_branch_info->commit &&
		!oidcmp(&old_branch_info->commit->object.oid,
			&new_branch_info->commit->object.oid)) {

While I'm sure I could find and add additional tests to handle the 
remaining bugs, the net result is starting to look as fragile as the 
original patch.

Unfortunately it has the additional downsides of 1) being at a much 
lower level where we risk breaking more code paths and 2) not being 
nearly as much savings (with the original patch checkout -b <new branch> 
takes 0.3 seconds, this patch will make it take >4 seconds.)

Net, net - I don't think this particular path is a better path to pursue.

I understand the concern with the fragility of the current patch and 
it's set of tests to determine if the optimization is valid.  I also 
understand the concern with the potential change in behavior (ie not 
showing the local changes - even though nothing has changed).  Other 
than switching the optimization back to be "opt-in" via a config flag, I 
don't currently have a great answer.  I'll keep thinking and looking but 
am open to suggestions!



> Test Summary Report
> -------------------
> ./t1011-read-tree-sparse-checkout.sh               (Wstat: 256 Tests: 21 
> Failed: 1)
>    Failed test:  20
>    Non-zero exit status: 1
> ./t1400-update-ref.sh                              (Wstat: 256 Tests: 
> 170 Failed: 73)
>    Failed tests:  40, 42-45, 55-59, 70, 72, 82, 85, 87-88
>                  90-100, 103-110, 113-119, 127, 129-130
>                  132-133, 136-137, 140-147, 150-157, 160-166
>                  170
>    Non-zero exit status: 1
> ./t2011-checkout-invalid-head.sh                   (Wstat: 256 Tests: 10 
> Failed: 5)
>    Failed tests:  3, 6-7, 9-10
>    Non-zero exit status: 1
> ./t2015-checkout-unborn.sh                         (Wstat: 256 Tests: 6 
> Failed: 3)
>    Failed tests:  2-4
>    Non-zero exit status: 1
> ./t2017-checkout-orphan.sh                         (Wstat: 256 Tests: 13 
> Failed: 7)
>    Failed tests:  7-13
>    Non-zero exit status: 1
> ./t3033-merge-toplevel.sh                          (Wstat: 256 Tests: 13 
> Failed: 11)
>    Failed tests:  3-13
>    Non-zero exit status: 1
> ./t3200-branch.sh                                  (Wstat: 256 Tests: 
> 139 Failed: 2)
>    Failed tests:  137-138
>    Non-zero exit status: 1
> ./t5616-partial-clone.sh                           (Wstat: 256 Tests: 13 
> Failed: 1)
>    Failed test:  4
>    Non-zero exit status: 1
> ./t5516-fetch-push.sh                              (Wstat: 256 Tests: 90 
> Failed: 1)
>    Failed test:  34
>    Non-zero exit status: 1
> ./t6300-for-each-ref.sh                            (Wstat: 256 Tests: 
> 205 Failed: 9)
>    Failed tests:  189-196, 199
>    Non-zero exit status: 1
> ./t7114-reset-sparse-checkout.sh                   (Wstat: 256 Tests: 3 
> Failed: 2)
>    Failed tests:  2-3
>    Non-zero exit status: 1
> ./t7063-status-untracked-cache.sh                  (Wstat: 256 Tests: 50 
> Failed: 1)
>    Failed test:  23
>    Non-zero exit status: 1
> ./t7201-co.sh                                      (Wstat: 256 Tests: 38 
> Failed: 33)
>    Failed tests:  4, 6-27, 29-38
>    Non-zero exit status: 1
> ./t7409-submodule-detached-work-tree.sh            (Wstat: 256 Tests: 2 
> Failed: 1)
>    Failed test:  1
>    Non-zero exit status: 1
> ./t9350-fast-export.sh                             (Wstat: 256 Tests: 37 
> Failed: 1)
>    Failed test:  12
>    Non-zero exit status: 1
> ./t9903-bash-prompt.sh                             (Wstat: 256 Tests: 65 
> Failed: 52)
>    Failed tests:  4, 6-10, 14-34, 36, 39-51, 53-62, 65
>    Non-zero exit status: 1
> Files=834, Tests=19658, 2081 wallclock secs (10.42 usr 15.09 sys + 
> 1082.56 cusr 3530.46 csys = 4638.53 CPU)
> Result: FAIL
> 
> 

  reply	other threads:[~2018-08-15 21:05 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24 18:01 [PATCH v1] checkout: optionally speed up "git checkout -b foo" Ben Peart
2018-07-24 18:42 ` Eric Sunshine
2018-07-24 19:45   ` Ben Peart
2018-07-26 15:04     ` Junio C Hamano
2018-07-26 18:59       ` Eric Sunshine
2018-07-26 19:08         ` Eric Sunshine
2018-07-24 19:21 ` Junio C Hamano
2018-07-24 20:47   ` Ben Peart
2018-07-31 16:39 ` [PATCH v2] checkout: optimize "git checkout -b <new_branch>" Ben Peart
2018-07-31 20:01   ` Junio C Hamano
2018-08-01 15:10   ` Duy Nguyen
2018-08-02 18:02     ` Ben Peart
2018-08-03 15:58       ` Duy Nguyen
2018-08-06 14:25         ` Ben Peart
2018-08-15 21:05           ` Ben Peart [this message]
2018-08-05  8:57       ` Duy Nguyen
2018-08-16 18:27 ` [PATCH v3] " Ben Peart
2018-08-16 18:37   ` Duy Nguyen
2018-08-17 12:37     ` Ben Peart
2018-08-19  1:44       ` Elijah Newren
2018-08-20 13:40         ` Ben Peart
2018-08-20 18:16           ` Elijah Newren
2018-08-21 14:51             ` Duy Nguyen
2018-08-30 17:22               ` Elijah Newren
2018-09-04 16:46                 ` Duy Nguyen
2018-08-20 18:31         ` Junio C Hamano
2018-09-18  5:34   ` [PATCH] config doc: add missing list separator for checkout.optimizeNewBranch Ævar Arnfjörð Bjarmason
2018-09-18 16:57     ` Taylor Blau
2018-09-18 17:16       ` Jeff King
2018-09-18 17:20         ` Taylor Blau
2018-09-18 17:13     ` Jeff King
2018-09-19  4:41       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d15a36bc-96a8-5d92-5e41-23d726535cf3@gmail.com \
    --to=peartben@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).