git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Taylor Blau <me@ttaylorr.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions
Date: Wed, 17 Jun 2020 22:03:09 -0700	[thread overview]
Message-ID: <CABPp-BHqPqEyf_US9GXB1bqC_ks0Xt5xLcDAwdKE=GhC6CAHZw@mail.gmail.com> (raw)
In-Reply-To: <2ee11a3c-9785-fa29-afa4-9eeac2248972@gmail.com>

On Wed, Jun 17, 2020 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/17/2020 9:59 PM, Elijah Newren wrote:
> > On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 6/17/2020 7:14 PM, Elijah Newren wrote:
> >> You mentioned in another thread that it is a bit unwieldy for a user
> >> to rely on a committed (or staged?) file, so adding the ability to
> >> check the working directory first is interesting. I wonder how the
> >> timing comes into play when changing HEAD to a new commit? Seems
> >> tricky, but solvable.
> >
> > Isn't that essentially the same timing issue that comes into play if
> > you only look at the index, and are changing HEAD to a new commit?
>
> It adds to the complexity. We can inspect the new index for the
> in-tree definition and update the skip-worktree bits before actually
> changing the working directory to match the new index. However, if
> we trust the working directory copy over the index copy, then we
> need to also decide if the working directory copy _would_ change
> in this move before using it.
>
> Of course, maybe I'm making it over-complicated. I still know so
> little about the index. I got into this feature due to a simple
> pattern-matching problem that I could understand, but now I'm
> getting lost in index states and dirty statuses. ;)

Honestly, I think your first cut for switching branches with this new
feature should just be:
  1) Do a switch/checkout exactly the way it's done today already:
    1a) Load the index and existing sparsity rules (from worktree then
index), according to what existed before the branch switch
    1b) Do the appropriate 1-way or 2-way merge from unpack_trees() to
move to the right branch (as builtin/checkout.c already does)
  2) *If* using in-tree sparsity feature, re-load the sparsity rules
(from worktree then index) and run the equivalent of `sparse-checkout
reapply`

Not only does this avoid messing up any code for the normal case, I'm
pretty sure this gets the behavior the user expects in all the special
cases.  There is one big downside, and a little downside.  The big
downside is that it'll have two progress meters instead of just one
(much like sparse-checkout init && sparse-checkout set do today).  The
little downside is that this isn't optimal from a performance
standpoint, for two reasons:
(a) some files may be updated in the working tree in step 1b despite
the fact that they'll be removed in step 2.
(b) step 2 may just be an expensive no-op, in fact I suspect it will
be most the time

The reason I think the performance isn't a big deal is because:
C) the fact that (b) is a problem means (a) is not often an issue --
the sparsity patterns are usually the same.
D) Even if the sparsity patterns differ, it's often the case that
files within the tree won't change (people tend to only modify a few
files per commit, after all, so even switching branches only updates a
subset of all files).
E) Even if the sparsity patterns differ and files differ and have to
be updated, it's still proportional at most to the number of files
selected by the sparsity patterns, so it won't feel excessive or slow
to the user.
F) `sparse-checkout reapply` is pretty cheap, especially if called as
a function instead of as a subprocess


In my opinion, other than the unfortunate dual progress meters, the
only place where this new feature gets hairy is that `git
sparse-checkout reapply` (or its libified variant) now needs to know
how to "read your new form of sparsity rules".  That sounds easy at
first but in my opinion it is a bit of a mess.  The user could have
written garbage to the file and I don't know what to do with garbage,
but I think we have to do something sane.  (Give up and checkout
everything?  Give up and checkout nothing but the toplevel directory?
Ignore each line we don't understand and treat the rest as stuff that
needs to be checked out?  Can that last one even be done with the .ini
reader?)  I don't think saying "well, tough luck the user should know
better" works because we already know that merges or rebases or
checkout -m could have written garbage to those files (in the form of
conflict markers), and users will say that *we* were the ones that
filled the file with garbage.  So we have to handle it intelligently
in some fashion.

Thoughts?

      reply	other threads:[~2020-06-18  5:03 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
2020-05-07 22:27   ` Junio C Hamano
2020-05-08 12:19     ` Derrick Stolee
2020-05-08 15:09       ` Junio C Hamano
2020-05-20 16:32     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58   ` Junio C Hamano
2020-05-08 15:40     ` Derrick Stolee
2020-05-20 17:52       ` Elijah Newren
2020-06-17 23:07         ` Elijah Newren
2020-06-18  8:18           ` Son Luong Ngoc
2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
2020-05-20 16:28   ` Elijah Newren
2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
2020-05-20 16:40   ` Elijah Newren
2020-05-21  3:49     ` Elijah Newren
2020-05-21 17:54       ` Derrick Stolee
2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
2020-05-20 18:10   ` Elijah Newren
2020-05-30 17:26     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
2020-06-17 23:14 ` Elijah Newren
2020-06-18  1:42   ` Derrick Stolee
2020-06-18  1:59     ` Elijah Newren
2020-06-18  3:01       ` Derrick Stolee
2020-06-18  5:03         ` Elijah Newren [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHqPqEyf_US9GXB1bqC_ks0Xt5xLcDAwdKE=GhC6CAHZw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).