All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Victoria Dye <vdye@github.com>
Cc: Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 6/7] read-tree: make two-way merge sparse-aware
Date: Mon, 28 Feb 2022 18:56:58 -0800	[thread overview]
Message-ID: <CABPp-BH4jf6z5umYpWB_A+gokUtDkjkiLOahB3FKed_Pe4sZ2g@mail.gmail.com> (raw)
In-Reply-To: <dc47f12b-8724-22ef-ed2c-096badfafd76@github.com>

On Mon, Feb 28, 2022 at 10:04 AM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > On Wed, Feb 23, 2022 at 4:09 PM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Victoria Dye <vdye@github.com>
> >>
> >> Enable two-way merge with 'git read-tree' without expanding the sparse
> >> index. When in a sparse index, a two-way merge will trivially succeed as
> >> long as there are not changes to the same sparse directory in multiple trees
> >> (i.e., sparse directory-level "edit-edit" conflicts). If there are such
> >> conflicts, the merge will fail despite the possibility that individual files
> >> could merge cleanly.
> >>
> >> In order to resolve these "edit-edit" conflicts, "conflicted" sparse
> >> directories are - rather than rejected - merged by traversing their
> >> associated trees by OID. For each child of the sparse directory:
> >>
> >> 1. Files are merged as normal (see Documentation/git-read-tree.txt for
> >>    details).
> >> 2. Subdirectories are treated as sparse directories and merged in
> >>    'twoway_merge'. If there are no conflicts, they are merged according to
> >>    the rules in Documentation/git-read-tree.txt; otherwise, the subdirectory
> >>    is recursively traversed and merged.
> >>
> >> This process allows sparse directories to be individually merged at the
> >> necessary depth *without* expanding a full index.
> >
> > The idea of merging directory-level entries turns out to be
> > problematic _if_ rename detection is involved, but read-tree-style
> > merges are only trivial merges that ignore rename detection.  As such,
> > this idea is perfectly reasonable, and is a good way to go.  Nicely
> > done.
> >
> > Mostly the patch looks good.  There's one thing I'm wondering about, though...
> >
> >>
> >> Signed-off-by: Victoria Dye <vdye@github.com>
> >> ---
> >>  builtin/read-tree.c                      |  5 --
> >>  t/t1092-sparse-checkout-compatibility.sh |  3 +-
> >>  unpack-trees.c                           | 75 ++++++++++++++++++++++++
> >>  3 files changed, 77 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> >> index a7b7f822281..5a421de2629 100644
> >> --- a/builtin/read-tree.c
> >> +++ b/builtin/read-tree.c
> >> @@ -225,11 +225,6 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >>                         opts.fn = opts.prefix ? bind_merge : oneway_merge;
> >>                         break;
> >>                 case 2:
> >> -                       /*
> >> -                        * TODO: update twoway_merge to handle edit/edit conflicts in
> >> -                        * sparse directories.
> >> -                        */
> >> -                       ensure_full_index(&the_index);
> >>                         opts.fn = twoway_merge;
> >>                         opts.initial_checkout = is_cache_unborn();
> >>                         break;
> >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> >> index a404be0a10f..d6f19682d65 100755
> >> --- a/t/t1092-sparse-checkout-compatibility.sh
> >> +++ b/t/t1092-sparse-checkout-compatibility.sh
> >> @@ -1411,7 +1411,8 @@ test_expect_success 'sparse index is not expanded: read-tree' '
> >>         init_repos &&
> >>
> >>         ensure_not_expanded checkout -b test-branch update-folder1 &&
> >> -       for MERGE_TREES in "update-folder2"
> >> +       for MERGE_TREES in "update-folder2" \
> >> +                          "base update-folder2"
> >>         do
> >>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
> >>                 ensure_not_expanded reset --hard HEAD || return 1
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index dba122a02bb..a4ace53904e 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -1360,6 +1360,42 @@ static int is_sparse_directory_entry(struct cache_entry *ce,
> >>         return sparse_dir_matches_path(ce, info, name);
> >>  }
> >>
> >> +static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
> >> +{
> >> +       struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> >> +       struct unpack_trees_options *o = info->data;
> >> +       int ret;
> >> +
> >> +       assert(o->merge);
> >> +
> >> +       /*
> >> +        * Unlike in 'unpack_callback', where src[0] is derived from the index when
> >> +        * merging, src[0] is a transient cache entry derived from the first tree
> >> +        * provided. Create the temporary entry as if it came from a non-sparse index.
> >> +        */
> >> +       if (!is_null_oid(&names[0].oid)) {
> >> +               src[0] = create_ce_entry(info, &names[0], 0,
> >> +                                       &o->result, 1,
> >> +                                       dirmask & (1ul << 0));
> >> +               src[0]->ce_flags |= (CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE);
> >> +       }
> >> +
> >> +       /*
> >> +        * 'unpack_single_entry' assumes that src[0] is derived directly from
> >> +        * the index, rather than from an entry in 'names'. This is *not* true when
> >> +        * merging a sparse directory, in which case names[0] is the "index" source
> >> +        * entry. To match the expectations of 'unpack_single_entry', shift past the
> >> +        * "index" tree (i.e., names[0]) and adjust 'names', 'n', 'mask', and
> >> +        * 'dirmask' accordingly.
> >> +        */
> >> +       ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
> >
> > So, you're passing one less entry to unpack_single_entry() when you've
> > traversed into a sparse directory...won't the traversal at the next
> > subdirectory deeper then also pass one less entry to
> > unpack_single_entry(), so after recursing a directory or two, you only
> > have one directory left and it won't conflict with anything so it just
> > uses that remaining tree?  (Or maybe it passes the wrong number of
> > arguments into twoway_merge()?)  Did I miss something in the logic
> > somewhere that avoids that issue?  It'd be nice to test it out, which
> > brings me to...
> >
>
> The answer itself is pretty straightforward (`merged_sparse_dir(...)` is
> called with `n = 3`, which is +1 to the `n` propagated throughout
> `unpack_trees(...)` for a two-way merge), but I'd like to take a more
> in-depth approach answering "why" in case it helps with review and/or anyone
> reading along.
>
> Suppose you are performing a two-way merge, e.g. with the command `git
> read-tree my-base other-commit`. The repo contains the following files:
>
> .
> ├── bar
> │   └── f1
> ├── baz
> │   ├── deep
> │   │   └── a
> │   └── f2
> ├── foo
> └── foo1
>
> Additionally:
>
> 1. 'other-commit' is identical to the index
> 2. 'baz/' is a sparse directory
>
> With those assumptions in mind, below are excerpts of the execution path for
> merging a file, a non-sparse directory, and a sparse directory (with values
> for arguments indicated where appropriate):
>
> MERGING FILE 'foo'
> ------------------
> unpack_trees(len = 2, t = [my-base, other-commit], ...)
> -> traverse_trees(..., n = 2, t = [my-base, other-commit], ...)
>   -> unpack_callback(n = 2, ..., names = [my-base:foo, other-commit:foo], ...)
>     -> unpack_single_entry(n = 2, ..., src = [INDEX:foo, NULL, NULL],
>                            names = [my-base:foo, other-commit:foo], ...)
>       -> call_unpack_fn(src = [INDEX:foo, my-base:foo, other-commit:foo], ...)
>         -> twoway_merge(src = [INDEX:foo, my-base:foo, other-commit:foo] ...)
>   -> unpack_callback(n = 2, ..., names = [my-base:foo1, other-commit:foo1], ...)
>     ...
>
> MERGING NON-SPARSE DIRECTORY 'bar/'
> ----------------------------------
> unpack_trees(len = 2, t = [my-base, other-commit], ...)
> -> traverse_trees(..., n = 2, t = [my-base, other-commit], ...)
>   -> unpack_callback(n = 2, ..., names = [my-base:bar/, other-commit:bar/], ...)
>     -> unpack_single_entry(n = 2, ..., src = [NULL, NULL, NULL],
>                            names = [my-base:bar/, other-commit:bar/], ...)
>     -> traverse_trees_recursive(n = 2, ..., names = [my-base:bar/, other-commit:bar/], ...)
>       -> traverse_trees(..., n = 2, t = [my-base, other-commit], ...)
>         -> unpack_callback(n = 2, ..., names = [my-base:bar/f1, other-commit:bar/f1], ...)
>           ...
>
> MERGING SPARSE DIRECTORY 'baz/'
> ------------------------------
> unpack_trees(len = 2, t = [my-base, other-commit], ...)
> -> traverse_trees(..., n = 2, t = [my-base, other-commit], ...)
>   -> unpack_callback(n = 2, ..., names = [my-base:baz/, other-commit:baz/], ...)
>     -> unpack_single_entry(n = 2, ..., src = [INDEX:baz/, NULL, NULL],
>                            names = [my-base:baz/, other-commit:baz/], ...)
>       -> call_unpack_fn(src = [INDEX:baz/, my-base:baz/, other-commit:baz/], ...)
>         -> twoway_merge(src = [INDEX:baz/, my-base:baz/, other-commit:baz/] ...)
>   -> unpack_callback(n = 2, ..., names = [my-base:foo, other-commit:foo], ...)
>     ...
>
> Note that, throughout this process, `n = 2` despite the fact that
> `twoway_merge` is actually called with *three* source entries, because
> `src[0]` is the traversed entry *as it appears in the index*. Additionally,
> because there are no differences between the index and 'other-commit',
> 'baz/' is merged without conflict in basically the same way as 'foo'.
>
> Now suppose you update the the index to differ from 'other-commit':
>
> 1. 'baz/deep/a' is different between the index and 'my-base', same between
>    'my-base' and 'other-commit'
> 2. 'baz/f2' is different between 'my-base' and 'other-commit', same between
>    the index and 'my-base'
>
> Before this patch, `twoway_merge` would reject the merge entirely because
> the sparse directory 'baz/' has three different tree OIDs between the index,
> 'my-base' , and 'other-commit'. However, these changes *should* be mergeable
> as long as you merge 'baz/deep/' separately from 'baz/f2'.
>
> The way we do this is by traversing the 'baz/' index entry the same way we
> traverse 'my-base' and 'other-commit' so we can circumvent populating
> `src[0]` from the index. The execution path looks something like this (when
> started at the sparse directory's `twoway_merge`):
>
> MERGING CONFLICT IN SPARSE 'baz/'
> ---------------------------------
> twoway_merge(src = [INDEX:baz/, my-base:baz/, other-commit:baz/] ...)
> -> merged_sparse_dir(src = [INDEX:baz/, my-base:baz/, other-commit:baz/], n = 3, ...)
>   -> traverse_trees(..., n = 3, t = [INDEX^{tree}, my-base, other-commit], ...)
>     -> unpack_sparse_callback(n = 3, ...,
>                               names = [INDEX^{tree}:baz/deep/, my-base:baz/deep/, other-commit:baz/deep/])
>       -> unpack_single_entry(n = 2, ..., src = [INDEX^{tree}:baz/deep/, NULL, NULL],
>                              names = [my-base:baz/deep/, other-commit:baz/deep/], ...)
>         -> call_unpack_fn(src = [INDEX^{tree}:baz/deep/, my-base:baz/deep/, other-commit:baz/deep/], ...)
>           -> twoway_merge(src = [INDEX^{tree}:baz/deep/, my-base:baz/deep/, other-commit:baz/deep/] ...)
>     -> unpack_sparse_callback(n = 3, ...,
>                               names = [INDEX^{tree}:baz/f2, my-base:baz/f2, other-commit:baz/f2])
>       -> unpack_single_entry(n = 2, ..., src = [INDEX^{tree}:baz/f2, NULL, NULL],
>                              names = [my-base:baz/f2, other-commit:baz/f2], ...)
>         -> call_unpack_fn(src = [INDEX^{tree}:baz/f2, my-base:baz/f2, other-commit:baz/f2], ...)
>           -> twoway_merge(src = [INDEX^{tree}:baz/f2, my-base:baz/f2, other-commit:baz/f2] ...)
>
> Here, `unpack_sparse_callback` transforms its inputs (from `traverse_trees`)
> into the those needed by `unpack_single_entry`. Unlike `unpack_callback`,
> which extracts its `src[0]` from the index, `unpack_sparse_callback` creates
> `src[0]` from the first tree in `names`. Then, because `unpack_single_entry`
> expects `n = 2` with a populated `src[0]` and the two remaining trees in
> `names`, it subtracts 1 from its value of `n` and shifts `names` (along with
> other arguments like `mask` and `dirmask`). As you can see, this could
> continue recursing without losing trees, since the first tree in the
> `merged_sparse_dir` path is basically storage for the index as the traversal
> continues.
>
> Please let me know if I can clarify anything - this explanation more-or-less
> corresponds to how I understood the problem while solving it, but it's still
> pretty dense and there could be details I'm not conveying well or taking for
> granted.

Thanks for taking the time to explain this.  For some reason, I tend
to get confused by the extra index entry.  I blame the fact that I
mostly only ever looked at threeway_merge() (since it was the one used
by merge-recursive.c), and there we get to assume that the HEAD tree
is equal to the index always, so there really isn't a separate fourth
argument from the index to worry about.  So the combination of having
to look at twoway_merge() as well as the special index entry for some
reason was throwing me a bit.  This clears it up nicely.

(Once upon a time we only claimed but didn't enforce that index was
equal to the HEAD tree for the three way merging, but that was an
abject failure; if you're curious about it, see the fun commit message
at 9822175d2b ("Ensure index matches head before invoking merge
machinery, round N", 2019-08-17) explaining it.)

> >> +
> >> +       if (src[0])
> >> +               discard_cache_entry(src[0]);
> >> +
> >> +       return ret >= 0 ? mask : -1;
> >> +}
> >> +
> >>  /*
> >>   * Note that traverse_by_cache_tree() duplicates some logic in this function
> >>   * without actually calling it. If you change the logic here you may need to
> >> @@ -2464,6 +2500,37 @@ static int merged_entry(const struct cache_entry *ce,
> >>         return 1;
> >>  }
> >>
> >> +static int merged_sparse_dir(const struct cache_entry * const *src, int n,
> >> +                            struct unpack_trees_options *o)
> >> +{
> >> +       struct tree_desc t[MAX_UNPACK_TREES + 1];
> >> +       void * tree_bufs[MAX_UNPACK_TREES + 1];
> >> +       struct traverse_info info;
> >> +       int i, ret;
> >> +
> >> +       /*
> >> +        * Create the tree traversal information for traversing into *only* the
> >> +        * sparse directory.
> >> +        */
> >> +       setup_traverse_info(&info, src[0]->name);
> >> +       info.fn = unpack_sparse_callback;
> >> +       info.data = o;
> >> +       info.show_all_errors = o->show_all_errors;
> >> +       info.pathspec = o->pathspec;
> >> +
> >> +       /* Get the tree descriptors of the sparse directory in each of the merging trees */
> >> +       for (i = 0; i < n; i++)
> >> +               tree_bufs[i] = fill_tree_descriptor(o->src_index->repo, &t[i],
> >> +                                                   src[i] && !is_null_oid(&src[i]->oid) ? &src[i]->oid : NULL);
> >> +
> >> +       ret = traverse_trees(o->src_index, n, t, &info);
> >> +
> >> +       for (i = 0; i < n; i++)
> >> +               free(tree_bufs[i]);
> >> +
> >> +       return ret;
> >> +}
> >> +
> >>  static int deleted_entry(const struct cache_entry *ce,
> >>                          const struct cache_entry *old,
> >>                          struct unpack_trees_options *o)
> >> @@ -2734,6 +2801,14 @@ int twoway_merge(const struct cache_entry * const *src,
> >>                          * reject the merge instead.
> >>                          */
> >>                         return merged_entry(newtree, current, o);
> >> +               } else if (S_ISSPARSEDIR(current->ce_mode)) {
> >> +                       /*
> >> +                        * The sparse directories differ, but we don't know whether that's
> >> +                        * because of two different files in the directory being modified
> >> +                        * (can be trivially merged) or if there is a real file conflict.
> >> +                        * Merge the sparse directory by OID to compare file-by-file.
> >> +                        */
> >> +                       return merged_sparse_dir(src, 3, o);
> >>                 } else
> >>                         return reject_merge(current, o);
> >>         }
> >> --
> >> gitgitgadget
> >
> > It would be nice to have a couple of tests.  In particular, one
> > designed to see what happens when we need to traverse into
> > subdirectories of sparse directory entries and paths different between
> > the two trees being merged.
>
> There were supposed to be "ensure_not_expanded" tests added in this patch,
> but they ended up in [7/7] - I'll move them back in my next version.

Awesome.

> Additionally, the 't1092' test 'read-tree --merge with edit/edit conflicts
> in sparse directories' contains examples of merges that require recursing
> into sparse directories (added back in [3/7] to establish expected behavior
> before changing `read-tree`).

Any chance you could reference the testcase from 3/7 in your commit
message just so other reviewers or our future selves can find it more
easily?

  reply	other threads:[~2022-03-01  2:57 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-23 18:25 [PATCH 0/7] Sparse index: integrate with 'read-tree' Victoria Dye via GitGitGadget
2022-02-23 18:25 ` [PATCH 1/7] sparse-index: prevent repo root from becoming sparse Victoria Dye via GitGitGadget
2022-02-24 16:48   ` Derrick Stolee
2022-02-24 21:42     ` Victoria Dye
2022-02-23 18:25 ` [PATCH 2/7] status: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-02-23 18:25 ` [PATCH 3/7] read-tree: expand sparse checkout test coverage Victoria Dye via GitGitGadget
2022-02-23 18:25 ` [PATCH 4/7] read-tree: integrate with sparse index Victoria Dye via GitGitGadget
2022-02-23 18:25 ` [PATCH 5/7] read-tree: narrow scope of index expansion for '--prefix' Victoria Dye via GitGitGadget
2022-02-23 18:25 ` [PATCH 6/7] read-tree: make two-way merge sparse-aware Victoria Dye via GitGitGadget
2022-02-26  8:05   ` Elijah Newren
2022-02-28 18:04     ` Victoria Dye
2022-03-01  2:56       ` Elijah Newren [this message]
2022-02-23 18:25 ` [PATCH 7/7] read-tree: make three-way " Victoria Dye via GitGitGadget
2022-02-24 16:59 ` [PATCH 0/7] Sparse index: integrate with 'read-tree' Derrick Stolee
2022-02-24 22:34 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-02-24 22:34   ` [PATCH v2 1/7] sparse-index: prevent repo root from becoming sparse Victoria Dye via GitGitGadget
2022-02-24 22:34   ` [PATCH v2 2/7] status: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-02-25  7:45     ` Elijah Newren
2022-02-28 23:17       ` Victoria Dye
2022-02-24 22:34   ` [PATCH v2 3/7] read-tree: expand sparse checkout test coverage Victoria Dye via GitGitGadget
2022-02-26  8:41     ` Elijah Newren
2022-02-28 18:14       ` Victoria Dye
2022-02-28 23:09     ` Ævar Arnfjörð Bjarmason
2022-02-28 23:27       ` Victoria Dye
2022-02-28 23:46         ` Ævar Arnfjörð Bjarmason
2022-02-24 22:34   ` [PATCH v2 4/7] read-tree: integrate with sparse index Victoria Dye via GitGitGadget
2022-02-24 22:34   ` [PATCH v2 5/7] read-tree: narrow scope of index expansion for '--prefix' Victoria Dye via GitGitGadget
2022-02-25  8:38     ` Elijah Newren
2022-02-25 20:25       ` Victoria Dye
2022-02-26  7:52         ` Elijah Newren
2022-02-28 18:44           ` Victoria Dye
2022-02-24 22:34   ` [PATCH v2 6/7] read-tree: make two-way merge sparse-aware Victoria Dye via GitGitGadget
2022-02-24 22:34   ` [PATCH v2 7/7] read-tree: make three-way " Victoria Dye via GitGitGadget
2022-02-26  8:46   ` [PATCH v2 0/7] Sparse index: integrate with 'read-tree' Elijah Newren
2022-03-01 20:24   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 1/8] sparse-index: prevent repo root from becoming sparse Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 2/8] status: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 3/8] read-tree: explicitly disallow prefixes with a leading '/' Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 4/8] read-tree: expand sparse checkout test coverage Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 5/8] read-tree: integrate with sparse index Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 6/8] read-tree: narrow scope of index expansion for '--prefix' Victoria Dye via GitGitGadget
2022-03-03 17:54       ` Glen Choo
2022-03-03 21:19         ` Victoria Dye
2022-03-04 18:47           ` Glen Choo
2022-03-01 20:24     ` [PATCH v3 7/8] read-tree: make two-way merge sparse-aware Victoria Dye via GitGitGadget
2022-03-01 20:24     ` [PATCH v3 8/8] read-tree: make three-way " Victoria Dye via GitGitGadget
2022-03-02  7:22     ` [PATCH v3 0/8] Sparse index: integrate with 'read-tree' Elijah Newren
2022-03-02 13:40       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BH4jf6z5umYpWB_A+gokUtDkjkiLOahB3FKed_Pe4sZ2g@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.