Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go
Date: Mon,  2 Nov 2020 12:43:39 -0800
Message-ID: <20201102204344.342633-16-newren@gmail.com> (raw)
In-Reply-To: <20201102204344.342633-1-newren@gmail.com>

Our order for processing of entries means that if we have a tree of
files that looks like
   Makefile
   src/moduleA/foo.c
   src/moduleA/bar.c
   src/moduleB/baz.c
   src/moduleB/umm.c
   tokens.txt

Then we will process paths in the order of the leftmost column below.  I
have added two additional columns that help explain the algorithm that
follows; the 2nd column is there to remind us we have oid & mode info we
are tracking for each of these paths (which differs between the paths
which I'm not representing well here), and the third column annotates
the parent directory of the entry:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB
   src/moduleB              <version_info>    src
   src/moduleA/foo.c        <version_info>    src/moduleA
   src/moduleA/bar.c        <version_info>    src/moduleA
   src/moduleA              <version_info>    src
   src                      <version_info>    ""
   Makefile                 <version_info>    ""

When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.

So, for example, when we get to this point:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB

and note that the next entry (src/moduleB) has a different parent than
the last one that isn't a subdirectory, we should write out a tree for it
   100644 blob <HASH> umm.c
   100644 blob <HASH> baz.c

then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
   tokens.txt               <version_info>        ""
   src/moduleB              <new version_info>    src

This process repeats until at the end we get to
   tokens.txt               <version_info>        ""
   src                      <new version_info>    ""
   Makefile                 <version_info>        ""

and then we can write out the toplevel tree.  Since we potentially have
entries in our string_list corresponding to multiple different toplevel
directories, e.g. a slightly different repository might have:
   whizbang.txt             <version_info>        ""
   tokens.txt               <version_info>        ""
   src/moduleD              <new version_info>    src
   src/moduleC              <new version_info>    src
   src/moduleB              <new version_info>    src
   src/moduleA/foo.c        <version_info>        src/moduleA
   src/moduleA/bar.c        <version_info>        src/moduleA

When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory.  So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
   ""             0
   src            2
   src/moduleA    5

Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.

The idea is relatively simple, there's just a lot of accounting to
implement this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 106 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c560dd1634..20b7c0d8b0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -353,6 +353,9 @@ static int string_list_df_name_compare(const char *one, const char *two)
 
 struct directory_versions {
 	struct string_list versions;
+	struct string_list offsets;
+	const char *last_directory;
+	unsigned last_directory_len;
 };
 
 static void write_tree(struct object_id *result_oid,
@@ -409,12 +412,100 @@ static void record_entry_for_tree(struct directory_versions *dir_metadata,
 		/* nothing to record */
 		return;
 
+	/*
+	 * Note: write_completed_directories() already added
+	 * entries for directories to dir_metadata->versions,
+	 * so no need to handle ci->filemask == 0 again.
+	 */
+	if (!ci->merged.clean && !ci->filemask)
+		return;
+
 	basename = path + ci->merged.basename_offset;
 	assert(strchr(basename, '/') == NULL);
 	string_list_append(&dir_metadata->versions,
 			   basename)->util = &ci->merged.result;
 }
 
+static void write_completed_directories(struct merge_options *opt,
+					const char *new_directory_name,
+					struct directory_versions *info)
+{
+	const char *prev_dir;
+	struct merged_info *dir_info = NULL;
+	unsigned int offset;
+	int wrote_a_new_tree = 0;
+
+	if (new_directory_name == info->last_directory)
+		return;
+
+	/*
+	 * If we are just starting (last_directory is NULL), or last_directory
+	 * is a prefix of the current directory, then we can just update
+	 * last_directory and record the offset where we started this directory.
+	 */
+	if (info->last_directory == NULL ||
+	    !strncmp(new_directory_name, info->last_directory,
+		     info->last_directory_len)) {
+		uintptr_t offset = info->versions.nr;
+
+		info->last_directory = new_directory_name;
+		info->last_directory_len = strlen(info->last_directory);
+		string_list_append(&info->offsets,
+				   info->last_directory)->util = (void*)offset;
+		return;
+	}
+
+	/*
+	 * At this point, ne (next entry) is within a different directory
+	 * than the last entry, so we need to create a tree object for all
+	 * the entries in info->versions that are under info->last_directory.
+	 */
+	dir_info = strmap_get(&opt->priv->paths, info->last_directory);
+	assert(dir_info);
+	offset = (uintptr_t)info->offsets.items[info->offsets.nr-1].util;
+	if (offset == info->versions.nr) {
+		dir_info->is_null = 1;
+	} else {
+		dir_info->result.mode = S_IFDIR;
+		write_tree(&dir_info->result.oid, &info->versions, offset);
+		wrote_a_new_tree = 1;
+	}
+
+	/*
+	 * We've now used several entries from info->versions and one entry
+	 * from info->offsets, so we get rid of those values.
+	 */
+	info->offsets.nr--;
+	info->versions.nr = offset;
+
+	/*
+	 * Now we've got an OID for last_directory in dir_info.  We need to
+	 * add it to info->versions for it to be part of the computation of
+	 * its parent directories' OID.  But first, we have to find out what
+	 * its' parent name was and whether that matches the previous
+	 * info->offsets or we need to set up a new one.
+	 */
+	prev_dir = info->offsets.nr == 0 ? NULL :
+		   info->offsets.items[info->offsets.nr-1].string;
+	if (new_directory_name != prev_dir) {
+		uintptr_t c = info->versions.nr;
+		string_list_append(&info->offsets,
+				   new_directory_name)->util = (void*)c;
+	}
+
+	/*
+	 * Okay, finally record OID for last_directory in info->versions,
+	 * and update last_directory.
+	 */
+	if (wrote_a_new_tree) {
+		const char *dir_name = strrchr(info->last_directory, '/');
+		dir_name = dir_name ? dir_name+1 : info->last_directory;
+		string_list_append(&info->versions, dir_name)->util = dir_info;
+	}
+	info->last_directory = new_directory_name;
+	info->last_directory_len = strlen(info->last_directory);
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -528,6 +619,9 @@ static void process_entries(struct merge_options *opt,
 
 	/* other setup */
 	string_list_init(&dir_metadata.versions, 0);
+	string_list_init(&dir_metadata.offsets, 0);
+	dir_metadata.last_directory = NULL;
+	dir_metadata.last_directory_len = 0;
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -541,22 +635,27 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct conflict_info *ci = entry->util;
 
+		write_completed_directories(opt, ci->merged.directory_name,
+					    &dir_metadata);
 		if (ci->merged.clean)
 			record_entry_for_tree(&dir_metadata, path, ci);
 		else
 			process_entry(opt, path, ci, &dir_metadata);
 	}
 
-	/*
-	 * TODO: We can't actually write a tree yet, because dir_metadata just
-	 * contains all basenames of all files throughout the tree with their
-	 * mode and hash.  Not only is that a nonsensical tree, it will have
-	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
-	 */
-	die("Not yet implemented; need to process subtrees separately");
+	if (dir_metadata.offsets.nr != 1 ||
+	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
+		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
+		       dir_metadata.offsets.nr);
+		printf("dir_metadata.offsets.items[0].util = %u (should be 0)\n",
+		       (unsigned)(uintptr_t)dir_metadata.offsets.items[0].util);
+		fflush(stdout);
+		BUG("dir_metadata accounting completely off; shouldn't happen");
+	}
 	write_tree(result_oid, &dir_metadata.versions, 0);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
+	string_list_clear(&dir_metadata.offsets, 0);
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
2.29.0.471.ga4f56089c0


  parent reply index

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` Elijah Newren [this message]
2020-11-12 20:15   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201102204344.342633-16-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git