All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info
Date: Mon, 9 Nov 2020 15:05:48 -0800	[thread overview]
Message-ID: <CABPp-BFNk5ZQLyo5G-7vq3XD3s6z_rGUNM3QL14hXvMYTzK_Ug@mail.gmail.com> (raw)
In-Reply-To: <20201109220431.2534786-1-jonathantanmy@google.com>

On Mon, Nov 9, 2020 at 2:04 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > On Fri, Nov 6, 2020 at 2:52 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > >
> > > > +     /*
> > > > +      * Note: We only label files with df_conflict, not directories.
> > > > +      * Since directories stay where they are, and files move out of the
> > > > +      * way to make room for a directory, we don't care if there was a
> > > > +      * directory/file conflict for a parent directory of the current path.
> > > > +      */
> > > > +     unsigned df_conflict = (filemask != 0) && (dirmask != 0);
> > >
> > > Suppose you have:
> > >
> > >  [ours]
> > >   foo/
> > >     bar/
> > >       baz
> > >     quux
> > >  [theirs]
> > >   foo
> > >
> > > By "we only label files with df_conflict, not directories", are you
> > > referring to not labelling "foo/" in [ours], or to "bar/", "baz", and
> > > "quux" (so, the files and directories within a directory)? At first I
> > > thought you were referring to the former, but perhaps you are referring
> > > to the latter.
> >
> > The former.  I was drawing a distinction between how this code
> > operates, and how unpack_trees() operates, which probably only matters
> > to those familiar with unpack_trees() or who have been reading through
> > it recently.
>
> Just for clarification: do you mean "the latter"? (The "not" in my
> question might be confusing.)

Yeah, probably was confusing, so let me just state where you are
almost right below.

> To be more illustrative in what I meant, at first I thought that you
> were NOT labelling "foo/" in [ours], hence:
>
>  [ours]
>   foo/  <- unlabeled
>  [theirs]
>   foo   <- labeled
>
> In this way, in a sense, you are indeed labelling only the file, not the
> directory.
>
> But instead what you seem to be doing is this:
>
>  [ours]
>   foo/     <- labeled
>     bar/   <- unlabeled
>       baz  <- unlabeled
>     quux   <- unlabeled
>  [theirs]
>   foo      <- labeled
>
> which is what I meant by NOT labelling "bar/", "baz", and "quux".

I'm doing something /really/ close to this, yes.  However, just to be
pedantic, there is no "foo/".  '/' is an illegal character in a
filename to record in a tree.  One side has a "foo" whose mode and
object_id happen to reflect a tree rather than a blob.  But I only
have one conflict_info per pathname, not 3 (can't have three since
strmaps don't allow duplicate keys, and wouldn't want it if I could).
That one conflict_info stores 3 (mode, object_id) pairs, and also has
a single df_conflict bit.  So, I label "foo" by setting that
df_conflict bit.  But I only pay attention to it for the pairs
representing a blob, not the ones representing a tree.  And I don't
propagate the information down to paths below the foo directory.

> > unpack_trees() will note when there is a directory/file
> > conflict, and propagates that information to all subtrees, with every
> > path specially checking for the o->df_conflict_entry and then handling
> > it specially (e.g. keeping higher order stages instead of using an
> > aggressive or trivial resolutions).
>
> And here it seems like you're describing that unpack_trees() would label
> it in this way:
>
>  [ours]
>   foo/     <- labeled
>     bar/   <- labeled
>       baz  <- labeled
>     quux   <- labeled
>  [theirs]
>   foo      <- labeled
>
> (and you're emphasizing by contrast that merge-ort is NOT doing this).

Correct.

> > However, leaving both a file and
> > a directory at the same path, while allowed in the index, makes for
> > ugliness and difficulty for users to resolve.   Plus it isn't allowed
> > in the working tree anyway.  We decided a while ago that it'd be
> > better to represent these conflicts differently[1], [2].
> >
> > Also, at the time you are unpacking or traversing trees, you only know
> > if one side had a directory where the other side had a file.  You
> > don't know if the final merge result will actually have a
> > directory/file conflict.  If the file existed in both the base version
> > and unmodified on one side, for example, then the file will be removed
> > as part of the merge.  It is similarly possible that the entire
> > directory of files all need to be deleted or are all renamed
> > elsewhere.  So, you have to keep track of a df_conflict bit, but you
> > can't act on it until you've processed several other things first.
> >
> > Since I already know I'm not going to move a whole directory of files
> > out of the way so that a file can be placed in the working tree
> > instead of that whole directory, the directory doesn't need to be
> > tweaked.  I'm not going to propagate any information about a
> > directory/file conflict at some path down to all subpaths of the
> > directory.  I only track it for the file that immediately conflicts,
> > and then only take action on it after resolving all the paths under
> > the corresponding directory to see if the directory/file conflict
> > remains.
> >
> > [1] https://lore.kernel.org/git/xmqqbmabcuhf.fsf@gitster-ct.c.googlers.com/
> > and the thread surrounding it
> > [2] https://lore.kernel.org/git/f27f12e8e50e56c010c29caa00296475d4de205b.1603731704.git.gitgitgadget@gmail.com/,
> > which is now commit ef52778708 ("merge tests: expect improved
> > directory/file conflict handling in ort", 2020-10-26)
>
> Makes sense.
>
> > > > @@ -161,6 +179,13 @@ static int collect_merge_info_callback(int n,
> > > >               newinfo.name = p->path;
> > > >               newinfo.namelen = p->pathlen;
> > > >               newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
> > > > +             /*
> > > > +              * If we did care about parent directories having a D/F
> > > > +              * conflict, then we'd include
> > > > +              *    newinfo.df_conflicts |= (mask & ~dirmask);
> > > > +              * here.  But we don't.  (See comment near setting of local
> > > > +              * df_conflict variable near the beginning of this function).
> > > > +              */
> > >
> > > I'm not sure how "mask" and "dirmask" contains information about parent
> > > directories. "mask" represents the available entries, and "dirmask"
> > > represents which of them are directories, as far as I know. So we can
> > > notice when something is missing, but I don't see how this distinguishes
> > > between the case that something is missing because it was in a parent
> > > directory that got deleted, vs something is missing because it itself
> > > got deleted.
> >
> > Yeah, this is more comparisons to unpack_trees.  This code is about to
> > set up a recursive call into subdirectories.  newinfo is set based on
> > the mask and dirmask of the current entry, and then subdirectories can
> > consult newinfo.df_conflicts to see if that path is within a directory
> > that was involved in a directory/file conflict.  For example:
> >
> > Tree in base version:
> >     foo/
> >         bar
> >     stuff.txt
> > Tree on side 1: (adds foo/baz)
> >     foo/
> >         bar
> >         baz
> >     stuff.txt
> > Tree on side 2: (deletes foo/, adds new file foo)
> >    foo
> >    stuff.txt
> >
> > When processing 'foo', we have mask=7, dirmask = 3.  So, here
> > unpack_trees() would have set newinfo.df_conflicts = (mask & ~dirmask)
> > = 4.  Then when we process foo/bar or foo/baz, we have mask=2,
> > dirmask=0, which looks like there are no directory/file conflicts.
> > However, we can note that these paths are under a directory involved
> > in a directory/file conflict via info.df_conflicts whose value is 4.
> > unpack_trees() cared about paths under a directory that was involved
> > in a directory/file conflict, and someone familiar with that code
> > might ask why I don't also track the same information.  The answer is
> > that I don't track it, even though I thought about it, because it's
> > useless overhead since I'm going to leave the directory alone and move
> > the file out of the way.
> >
> > Does that make sense?
>
> Ah...yes, that makes sense. I think I didn't notice the "newinfo", so I
> didn't realize that we were setting the info of our children based on
> ourselves. Perhaps I would have noticed it sooner if the comment had
> read "If this file/directory cared about its parent directory (the
> current directory) having a D/F conflict, then we'd propagate the masks
> in this way:" instead of "If we did care about parent directories having
> a D/F conflict", but perhaps the point is already obvious enough.

I'm happy to reword it if that makes it clearer.  Thanks for the suggestion.

  reply	other threads:[~2020-11-09 23:06 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren [this message]
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren
2020-11-12 20:15   ` Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BFNk5ZQLyo5G-7vq3XD3s6z_rGUNM3QL14hXvMYTzK_Ug@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.