From: Jonathan Tan <jonathantanmy@google.com> To: newren@gmail.com Cc: jonathantanmy@google.com, git@vger.kernel.org Subject: Re: [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Date: Mon, 9 Nov 2020 14:04:31 -0800 [thread overview] Message-ID: <20201109220431.2534786-1-jonathantanmy@google.com> (raw) In-Reply-To: <CABPp-BHOvk+W0u_MYWtH8OC-Uh5bEfsRO6QuXZKWfDQ=8fyasA@mail.gmail.com> > On Fri, Nov 6, 2020 at 2:52 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > > > > + /* > > > + * Note: We only label files with df_conflict, not directories. > > > + * Since directories stay where they are, and files move out of the > > > + * way to make room for a directory, we don't care if there was a > > > + * directory/file conflict for a parent directory of the current path. > > > + */ > > > + unsigned df_conflict = (filemask != 0) && (dirmask != 0); > > > > Suppose you have: > > > > [ours] > > foo/ > > bar/ > > baz > > quux > > [theirs] > > foo > > > > By "we only label files with df_conflict, not directories", are you > > referring to not labelling "foo/" in [ours], or to "bar/", "baz", and > > "quux" (so, the files and directories within a directory)? At first I > > thought you were referring to the former, but perhaps you are referring > > to the latter. > > The former. I was drawing a distinction between how this code > operates, and how unpack_trees() operates, which probably only matters > to those familiar with unpack_trees() or who have been reading through > it recently. Just for clarification: do you mean "the latter"? (The "not" in my question might be confusing.) To be more illustrative in what I meant, at first I thought that you were NOT labelling "foo/" in [ours], hence: [ours] foo/ <- unlabeled [theirs] foo <- labeled In this way, in a sense, you are indeed labelling only the file, not the directory. But instead what you seem to be doing is this: [ours] foo/ <- labeled bar/ <- unlabeled baz <- unlabeled quux <- unlabeled [theirs] foo <- labeled which is what I meant by NOT labelling "bar/", "baz", and "quux". > unpack_trees() will note when there is a directory/file > conflict, and propagates that information to all subtrees, with every > path specially checking for the o->df_conflict_entry and then handling > it specially (e.g. keeping higher order stages instead of using an > aggressive or trivial resolutions). And here it seems like you're describing that unpack_trees() would label it in this way: [ours] foo/ <- labeled bar/ <- labeled baz <- labeled quux <- labeled [theirs] foo <- labeled (and you're emphasizing by contrast that merge-ort is NOT doing this). > However, leaving both a file and > a directory at the same path, while allowed in the index, makes for > ugliness and difficulty for users to resolve. Plus it isn't allowed > in the working tree anyway. We decided a while ago that it'd be > better to represent these conflicts differently[1], [2]. > > Also, at the time you are unpacking or traversing trees, you only know > if one side had a directory where the other side had a file. You > don't know if the final merge result will actually have a > directory/file conflict. If the file existed in both the base version > and unmodified on one side, for example, then the file will be removed > as part of the merge. It is similarly possible that the entire > directory of files all need to be deleted or are all renamed > elsewhere. So, you have to keep track of a df_conflict bit, but you > can't act on it until you've processed several other things first. > > Since I already know I'm not going to move a whole directory of files > out of the way so that a file can be placed in the working tree > instead of that whole directory, the directory doesn't need to be > tweaked. I'm not going to propagate any information about a > directory/file conflict at some path down to all subpaths of the > directory. I only track it for the file that immediately conflicts, > and then only take action on it after resolving all the paths under > the corresponding directory to see if the directory/file conflict > remains. > > [1] https://lore.kernel.org/git/xmqqbmabcuhf.fsf@gitster-ct.c.googlers.com/ > and the thread surrounding it > [2] https://lore.kernel.org/git/f27f12e8e50e56c010c29caa00296475d4de205b.1603731704.git.gitgitgadget@gmail.com/, > which is now commit ef52778708 ("merge tests: expect improved > directory/file conflict handling in ort", 2020-10-26) Makes sense. > > > @@ -161,6 +179,13 @@ static int collect_merge_info_callback(int n, > > > newinfo.name = p->path; > > > newinfo.namelen = p->pathlen; > > > newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1); > > > + /* > > > + * If we did care about parent directories having a D/F > > > + * conflict, then we'd include > > > + * newinfo.df_conflicts |= (mask & ~dirmask); > > > + * here. But we don't. (See comment near setting of local > > > + * df_conflict variable near the beginning of this function). > > > + */ > > > > I'm not sure how "mask" and "dirmask" contains information about parent > > directories. "mask" represents the available entries, and "dirmask" > > represents which of them are directories, as far as I know. So we can > > notice when something is missing, but I don't see how this distinguishes > > between the case that something is missing because it was in a parent > > directory that got deleted, vs something is missing because it itself > > got deleted. > > Yeah, this is more comparisons to unpack_trees. This code is about to > set up a recursive call into subdirectories. newinfo is set based on > the mask and dirmask of the current entry, and then subdirectories can > consult newinfo.df_conflicts to see if that path is within a directory > that was involved in a directory/file conflict. For example: > > Tree in base version: > foo/ > bar > stuff.txt > Tree on side 1: (adds foo/baz) > foo/ > bar > baz > stuff.txt > Tree on side 2: (deletes foo/, adds new file foo) > foo > stuff.txt > > When processing 'foo', we have mask=7, dirmask = 3. So, here > unpack_trees() would have set newinfo.df_conflicts = (mask & ~dirmask) > = 4. Then when we process foo/bar or foo/baz, we have mask=2, > dirmask=0, which looks like there are no directory/file conflicts. > However, we can note that these paths are under a directory involved > in a directory/file conflict via info.df_conflicts whose value is 4. > unpack_trees() cared about paths under a directory that was involved > in a directory/file conflict, and someone familiar with that code > might ask why I don't also track the same information. The answer is > that I don't track it, even though I thought about it, because it's > useless overhead since I'm going to leave the directory alone and move > the file out of the way. > > Does that make sense? Ah...yes, that makes sense. I think I didn't notice the "newinfo", so I didn't realize that we were setting the info of our children based on ourselves. Perhaps I would have noticed it sooner if the comment had read "If this file/directory cared about its parent directory (the current directory) having a D/F conflict, then we'd propagate the masks in this way:" instead of "If we did care about parent directories having a D/F conflict", but perhaps the point is already obvious enough.
next prev parent reply other threads:[~2020-11-09 22:04 UTC|newest] Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren 2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren 2020-11-06 22:05 ` Jonathan Tan 2020-11-06 22:45 ` Elijah Newren 2020-11-09 20:55 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren 2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren 2020-11-11 13:52 ` Derrick Stolee 2020-11-11 16:22 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren 2020-11-11 13:54 ` Derrick Stolee 2020-11-11 16:47 ` Elijah Newren 2020-11-11 16:51 ` Derrick Stolee 2020-11-11 17:03 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren 2020-11-11 13:58 ` Derrick Stolee 2020-11-11 17:07 ` Elijah Newren 2020-11-11 17:10 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren 2020-11-06 22:19 ` Jonathan Tan 2020-11-06 23:10 ` Elijah Newren 2020-11-09 20:59 ` Jonathan Tan 2020-11-11 14:38 ` Derrick Stolee 2020-11-11 17:02 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren 2020-11-11 14:51 ` Derrick Stolee 2020-11-11 17:13 ` Elijah Newren 2020-11-11 17:21 ` Eric Sunshine 2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren 2020-11-06 22:52 ` Jonathan Tan 2020-11-06 23:41 ` Elijah Newren 2020-11-09 22:04 ` Jonathan Tan [this message] 2020-11-09 23:05 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren 2020-11-06 22:58 ` Jonathan Tan 2020-11-07 0:26 ` Elijah Newren 2020-11-09 22:09 ` Jonathan Tan 2020-11-09 23:08 ` Elijah Newren 2020-11-11 15:26 ` Derrick Stolee 2020-11-11 18:16 ` Elijah Newren 2020-11-11 22:06 ` Elijah Newren 2020-11-12 18:23 ` Derrick Stolee 2020-11-12 18:39 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren 2020-11-11 15:31 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren 2020-11-11 19:51 ` Jonathan Tan 2020-11-12 1:48 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren 2020-11-11 16:09 ` Derrick Stolee 2020-11-11 18:58 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren 2020-11-11 20:01 ` Jonathan Tan 2020-11-11 20:24 ` Elijah Newren 2020-11-12 20:39 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren 2020-11-11 20:47 ` Jonathan Tan 2020-11-11 21:21 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren 2020-11-12 20:15 ` Jonathan Tan 2020-11-12 22:30 ` Elijah Newren 2020-11-24 20:19 ` Elijah Newren 2020-11-25 2:07 ` Jonathan Tan 2020-11-26 18:13 ` Elijah Newren 2020-11-30 18:41 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren 2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren 2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee 2020-11-03 16:36 ` Elijah Newren 2020-11-07 6:06 ` Elijah Newren 2020-11-07 15:02 ` Derrick Stolee 2020-11-07 19:39 ` Elijah Newren 2020-11-09 12:30 ` Derrick Stolee 2020-11-09 17:13 ` Elijah Newren 2020-11-09 19:51 ` Derrick Stolee 2020-11-09 22:44 ` Elijah Newren 2020-11-11 17:08 ` Derrick Stolee 2020-11-11 18:35 ` Elijah Newren 2020-11-11 20:48 ` Derrick Stolee 2020-11-11 21:18 ` Elijah Newren 2020-11-29 7:43 [PATCH " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20201109220431.2534786-1-jonathantanmy@google.com \ --to=jonathantanmy@google.com \ --cc=git@vger.kernel.org \ --cc=newren@gmail.com \ --subject='Re: [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).