All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] More index cleanups
@ 2021-01-20 16:53 Derrick Stolee via GitGitGadget
  2021-01-20 16:53 ` [PATCH 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
                   ` (9 more replies)
  0 siblings, 10 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee

This is based on ds/cache-tree-basics.

Here are a few more cleanups that are vaguely related to the index. I
discovered these while preparing my sparse-index RFC that I intend to send
early next week.

The biggest patch is the final one, which creates a test script for
comparing sparse-checkouts to full checkouts. There are some commands that
do not behave similarly. This script will be the backbone of my testing
strategy for the sparse-index by adding a new mode to compare
sparse-checkouts with the two index types (full and sparse).

Thanks, -Stolee

Derrick Stolee (9):
  cache-tree: clean up cache_tree_update()
  cache-tree: extract subtree_pos()
  fsmonitor: de-duplicate BUG()s around dirty bits
  repository: add repo reference to index_state
  name-hash: use trace2 regions for init
  sparse-checkout: load sparse-checkout patterns
  sparse-checkout: hold pattern list in index
  test-lib: test_region looks for trace2 regions
  t1092: test interesting sparse-checkout scenarios

 builtin/sparse-checkout.c                |  22 +-
 cache-tree.c                             |  20 +-
 cache-tree.h                             |   2 +
 cache.h                                  |   3 +
 dir.c                                    |  17 ++
 dir.h                                    |   2 +
 fsmonitor.c                              |  27 +-
 name-hash.c                              |   3 +
 repository.c                             |   4 +
 t/t0500-progress-display.sh              |   3 +-
 t/t1092-sparse-checkout-compatibility.sh | 323 +++++++++++++++++++++++
 t/test-lib-functions.sh                  |  40 +++
 unpack-trees.c                           |   6 +-
 13 files changed, 431 insertions(+), 41 deletions(-)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh


base-commit: a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-839%2Fderrickstolee%2Fmore-index-cleanups-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-839/derrickstolee/more-index-cleanups-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/839
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH 1/9] cache-tree: clean up cache_tree_update()
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:21   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 2/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present.

Also drop local variables that are used exactly once and can be found
directly from the 'istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 3f1a8d4f1b7..c1e49901c17 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-	struct cache_tree *it = istate->cache_tree;
-	struct cache_entry **cache = istate->cache;
-	int entries = istate->cache_nr;
-	int skip, i = verify_cache(cache, entries, flags);
+	int skip, i;
+
+	i = verify_cache(istate->cache, istate->cache_nr, flags);
 
 	if (i)
 		return i;
+
+	if (!istate->cache_tree)
+		istate->cache_tree = cache_tree();
+
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", the_repository);
-	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
+		       "", 0, &skip, flags);
 	trace2_region_leave("cache_tree", "update", the_repository);
 	trace_performance_leave("cache_tree_update");
 	if (i < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 2/9] cache-tree: extract subtree_pos()
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
  2021-01-20 16:53 ` [PATCH 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:23   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 6 +++---
 cache-tree.h | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index c1e49901c17..2b130dd5e19 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -45,7 +45,7 @@ static int subtree_name_cmp(const char *one, int onelen,
 	return memcmp(one, two, onelen);
 }
 
-static int subtree_pos(struct cache_tree *it, const char *path, int pathlen)
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen)
 {
 	struct cache_tree_sub **down = it->down;
 	int lo, hi;
@@ -72,7 +72,7 @@ static struct cache_tree_sub *find_subtree(struct cache_tree *it,
 					   int create)
 {
 	struct cache_tree_sub *down;
-	int pos = subtree_pos(it, path, pathlen);
+	int pos = cache_tree_subtree_pos(it, path, pathlen);
 	if (0 <= pos)
 		return it->down[pos];
 	if (!create)
@@ -123,7 +123,7 @@ static int do_invalidate_path(struct cache_tree *it, const char *path)
 	it->entry_count = -1;
 	if (!*slash) {
 		int pos;
-		pos = subtree_pos(it, path, namelen);
+		pos = cache_tree_subtree_pos(it, path, namelen);
 		if (0 <= pos) {
 			cache_tree_free(&it->down[pos]->cache_tree);
 			free(it->down[pos]);
diff --git a/cache-tree.h b/cache-tree.h
index 639bfa5340e..8efeccebfc9 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -27,6 +27,8 @@ void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct index_state *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
  2021-01-20 16:53 ` [PATCH 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
  2021-01-20 16:53 ` [PATCH 2/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:26   ` Elijah Newren
  2021-01-21 12:53   ` Chris Torek
  2021-01-20 16:53 ` [PATCH 4/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The index has an fsmonitor_dirty bitmap that records which index entries
are "dirty" based on the response from the FSMonitor. If this bitmap
ever grows larger than the index, then there was an error in how it was
constructed, and it was probably a developer's bug.

There are several BUG() statements that are very similar, so replace
these uses with a simpler assert_index_minimum(). Since there is one
caller that uses a custom 'pos' value instead of the bit_size member, we
cannot simplify it too much. However, the error string is identical in
each, so this simplifies things.

The end result is that the code is simpler to read while also preserving
these assertions for developers in the FSMonitor space.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 fsmonitor.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index ca031c3abb8..52a50a9545a 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -13,14 +13,19 @@
 
 struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR);
 
+static void assert_index_minimum(struct index_state *istate, size_t pos)
+{
+	if (pos > istate->cache_nr)
+		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
+		    (uintmax_t)pos, istate->cache_nr);
+}
+
 static void fsmonitor_ewah_callback(size_t pos, void *is)
 {
 	struct index_state *istate = (struct index_state *)is;
 	struct cache_entry *ce;
 
-	if (pos >= istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
-		    (uintmax_t)pos, istate->cache_nr);
+	assert_index_minimum(istate, pos);
 
 	ce = istate->cache[pos];
 	ce->ce_flags &= ~CE_FSMONITOR_VALID;
@@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
 	}
 	istate->fsmonitor_dirty = fsmonitor_dirty;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
 	return 0;
@@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
 	uint32_t ewah_size = 0;
 	int fixup = 0;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
 	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
@@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate)
 			}
 
 			/* Mark all previously saved entries as dirty */
-			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+			assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
 
 			refresh_fsmonitor(istate);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 4/9] repository: add repo reference to index_state
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:46   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 5/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It will be helpful to add behavior to index opertations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.

This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h      | 1 +
 repository.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/cache.h b/cache.h
index 71097657489..f9c7a603841 100644
--- a/cache.h
+++ b/cache.h
@@ -328,6 +328,7 @@ struct index_state {
 	struct ewah_bitmap *fsmonitor_dirty;
 	struct mem_pool *ce_mem_pool;
 	struct progress *progress;
+	struct repository *repo;
 };
 
 /* Name hashing */
diff --git a/repository.c b/repository.c
index a4174ddb062..67a4c1da2d9 100644
--- a/repository.c
+++ b/repository.c
@@ -264,6 +264,10 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
+	/* Complete the double-reference */
+	if (!repo->index->repo)
+		repo->index->repo = repo;
+
 	return read_index_from(repo->index, repo->index_file, repo->gitdir);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 5/9] name-hash: use trace2 regions for init
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 4/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:47   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 6/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The lazy_init_name_hash() populates a hashset with all filenames and
another with all directories represented in the index. This is run only
if we need to use the hashsets to check for existence or case-folding
renames.

Place trace2 regions where there is already a performance trace.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 5d3c7b12c18..4e03fac9bb1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -7,6 +7,7 @@
  */
 #include "cache.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -577,6 +578,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	trace_performance_enter();
+	trace2_region_enter("index", "name-hash-init", istate->repo);
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -597,6 +599,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
+	trace2_region_leave("index", "name-hash-init", istate->repo);
 	trace_performance_leave("initialize name hash");
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 6/9] sparse-checkout: load sparse-checkout patterns
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 5/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 17:54   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 7/9] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.

The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.

The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c |  5 -----
 dir.c                     | 17 +++++++++++++++++
 dir.h                     |  2 ++
 unpack-trees.c            |  6 +-----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index e3140db2a0a..2306a9ad98e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -22,11 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = {
 	NULL
 };
 
-static char *get_sparse_checkout_filename(void)
-{
-	return git_pathdup("info/sparse-checkout");
-}
-
 static void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 {
 	int i;
diff --git a/dir.c b/dir.c
index d637461da5c..d153a63bbd1 100644
--- a/dir.c
+++ b/dir.c
@@ -2998,6 +2998,23 @@ void setup_standard_excludes(struct dir_struct *dir)
 	}
 }
 
+char *get_sparse_checkout_filename(void)
+{
+	return git_pathdup("info/sparse-checkout");
+}
+
+int get_sparse_checkout_patterns(struct pattern_list *pl)
+{
+	int res;
+	char *sparse_filename = get_sparse_checkout_filename();
+
+	pl->use_cone_patterns = core_sparse_checkout_cone;
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL);
+
+	free(sparse_filename);
+	return res;
+}
+
 int remove_path(const char *name)
 {
 	char *slash;
diff --git a/dir.h b/dir.h
index a3c40dec516..facfae47402 100644
--- a/dir.h
+++ b/dir.h
@@ -448,6 +448,8 @@ int is_empty_dir(const char *dir);
 
 void setup_standard_excludes(struct dir_struct *dir);
 
+char *get_sparse_checkout_filename(void);
+int get_sparse_checkout_patterns(struct pattern_list *pl);
 
 /* Constants for remove_dir_recursively: */
 
diff --git a/unpack-trees.c b/unpack-trees.c
index af6e9b9c2fd..837b8bb42fb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1549,14 +1549,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 static void populate_from_existing_patterns(struct unpack_trees_options *o,
 					    struct pattern_list *pl)
 {
-	char *sparse = git_pathdup("info/sparse-checkout");
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0)
+	if (get_sparse_checkout_patterns(pl) < 0)
 		o->skip_sparse_checkout = 1;
 	else
 		o->pl = pl;
-	free(sparse);
 }
 
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 7/9] sparse-checkout: hold pattern list in index
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 6/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 18:03   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

As we modify the sparse-checkout definition, we perform index operations
on a pattern_list that only exists in-memory. This allows easy backing
out in case the index update fails.

However, if the index write itself cares about the sparse-checkout
pattern set, we need access to that in-memory copy. Place a pointer to
a 'struct pattern_list' in the index so we can access this on-demand.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c | 17 ++++++++++-------
 cache.h                   |  2 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 2306a9ad98e..e00b82af727 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl)
 	if (is_index_unborn(r->index))
 		return UPDATE_SPARSITY_SUCCESS;
 
+	r->index->sparse_checkout_patterns = pl;
+
 	memset(&o, 0, sizeof(o));
 	o.verbose_update = isatty(2);
 	o.update = 1;
@@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl)
 	else
 		rollback_lock_file(&lock_file);
 
+	r->index->sparse_checkout_patterns = NULL;
 	return result;
 }
 
@@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
 {
 	int result;
 	int changed_config = 0;
-	struct pattern_list pl;
-	memset(&pl, 0, sizeof(pl));
+	struct pattern_list *pl = xcalloc(1, sizeof(*pl));
 
 	switch (m) {
 	case ADD:
 		if (core_sparse_checkout_cone)
-			add_patterns_cone_mode(argc, argv, &pl);
+			add_patterns_cone_mode(argc, argv, pl);
 		else
-			add_patterns_literal(argc, argv, &pl);
+			add_patterns_literal(argc, argv, pl);
 		break;
 
 	case REPLACE:
-		add_patterns_from_input(&pl, argc, argv);
+		add_patterns_from_input(pl, argc, argv);
 		break;
 	}
 
@@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
 		changed_config = 1;
 	}
 
-	result = write_patterns_and_update(&pl);
+	result = write_patterns_and_update(pl);
 
 	if (result && changed_config)
 		set_config(MODE_NO_PATTERNS);
 
-	clear_pattern_list(&pl);
+	clear_pattern_list(pl);
+	free(pl);
 	return result;
 }
 
diff --git a/cache.h b/cache.h
index f9c7a603841..bf4ec3de4b0 100644
--- a/cache.h
+++ b/cache.h
@@ -305,6 +305,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 struct split_index;
 struct untracked_cache;
 struct progress;
+struct pattern_list;
 
 struct index_state {
 	struct cache_entry **cache;
@@ -329,6 +330,7 @@ struct index_state {
 	struct mem_pool *ce_mem_pool;
 	struct progress *progress;
 	struct repository *repo;
+	struct pattern_list *sparse_checkout_patterns;
 };
 
 /* Name hashing */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 8/9] test-lib: test_region looks for trace2 regions
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 7/9] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 18:20   ` Elijah Newren
  2021-01-20 16:53 ` [PATCH 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.

Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.

There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. More uses will
be added in a later change.

t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t0500-progress-display.sh |  3 +--
 t/test-lib-functions.sh     | 40 +++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index 1ed1df351cb..c461b89dfaf 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' '
 		"Working hard" <in 2>stderr &&
 
 	# t0212/parse_events.perl intentionally omits regions and data.
-	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
-	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
+	test_region category progress trace.event &&
 	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
 	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 999982fe4a9..c878db93013 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1655,3 +1655,43 @@ test_subcommand () {
 		grep "\[$expr\]"
 	fi
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_region [!] <category> <label> git <command> <args>...
+#
+# For example, to look for trace2_region_enter("index", "do_read_index", repo)
+# in an invocation of "git checkout HEAD~1", run
+#
+#	GIT_TRACE2_EVENT="$(pwd)/trace.txt" GIT_TRACE2_EVENT_NESTING=10 \
+#		git checkout HEAD~1 &&
+#	test_region index do_read_index <trace.txt
+#
+# If the first parameter passed is !, this instead checks that
+# the given region was not entered.
+#
+test_region () {
+	local expect_exit=0
+	if test "$1" = "!"
+	then
+		expect_exit=1
+		shift
+	fi
+
+	grep -e "region_enter" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+
+	grep -e "region_leave" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 9/9] t1092: test interesting sparse-checkout scenarios
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-20 16:53 ` Derrick Stolee via GitGitGadget
  2021-01-20 19:40   ` Elijah Newren
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-20 16:53 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These also document some behaviors that differ from a full checkout, and
possibly in a way that is not intended.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 323 +++++++++++++++++++++++
 1 file changed, 323 insertions(+)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
new file mode 100755
index 00000000000..46f9dc2cdf3
--- /dev/null
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -0,0 +1,323 @@
+#!/bin/sh
+
+test_description='compare full workdir to sparse workdir'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init initial-repo &&
+	(
+		cd initial-repo &&
+		echo a >a &&
+		echo "after deep" >e &&
+		echo "after folder1" >g &&
+		mkdir folder1 folder2 deep x &&
+		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1/deepest &&
+		echo "after deeper1" >deep/e &&
+		echo "after deepest" >deep/deeper1/e &&
+		cp a folder1 &&
+		cp a folder2 &&
+		cp a deep &&
+		cp a deep/deeper1 &&
+		cp a deep/deeper2 &&
+		cp a deep/deeper1/deepest &&
+		git add . &&
+		git commit -m "initial commit" &&
+		git checkout -b base &&
+		for dir in folder1 folder2 deep
+		do
+			git checkout -b update-$dir &&
+			echo "updated $dir" >$dir/a &&
+			git commit -a -m "update $dir" || return 1
+		done &&
+
+		git checkout -b rename-base base &&
+		echo >folder1/larger-content <<-\EOF &&
+		matching
+		lines
+		help
+		inexact
+		renames
+		EOF
+		cp folder1/larger-content folder2/ &&
+		cp folder1/larger-content deep/deeper1/ &&
+		git add . &&
+		git commit -m "add interesting rename content" &&
+
+		git checkout -b rename-out-to-out rename-base &&
+		mv folder1/a folder2/b &&
+		mv folder1/larger-content folder2/edited-content &&
+		echo >>folder2/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to folder2/..." &&
+
+		git checkout -b rename-out-to-in rename-base &&
+		mv folder1/a deep/deeper1/b &&
+		mv folder1/larger-content deep/deeper1/edited-content &&
+		echo >>deep/deeper1/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to deep/deeper1/..." &&
+
+		git checkout -b rename-in-to-out rename-base &&
+		mv deep/deeper1/a folder1/b &&
+		mv deep/deeper1/larger-content folder1/edited-content &&
+		echo >>folder1/edited-content &&
+		git add . &&
+		git commit -m "rename deep/deeper1/... to folder1/..." &&
+
+		git checkout -b deepest base &&
+		echo "updated deepest" >deep/deeper1/deepest/a &&
+		git commit -a -m "update deepest" &&
+
+		git checkout -f base &&
+		git reset --hard
+	)
+'
+
+init_repos () {
+	rm -rf full-checkout sparse-checkout sparse-index &&
+
+	# create repos in initial state
+	cp -r initial-repo full-checkout &&
+	git -C full-checkout reset --hard &&
+
+	cp -r initial-repo sparse-checkout &&
+	git -C sparse-checkout reset --hard &&
+	git -C sparse-checkout sparse-checkout init --cone &&
+
+	# initialize sparse-checkout definitions
+	git -C sparse-checkout sparse-checkout set deep
+}
+
+run_on_sparse () {
+	(
+		cd sparse-checkout &&
+		$* >../sparse-checkout-out 2>../sparse-checkout-err
+	)
+}
+
+run_on_all () {
+	(
+		cd full-checkout &&
+		$* >../full-checkout-out 2>../full-checkout-err
+	) &&
+	run_on_sparse $*
+}
+
+test_all_match () {
+	run_on_all $* &&
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-err sparse-checkout-err
+}
+
+test_expect_success 'status with options' '
+	init_repos &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	run_on_all "touch README.md" &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno
+'
+
+test_expect_success 'add, commit, checkout' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>README.md
+	EOF
+	run_on_all "../edit-contents" &&
+
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Add README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents" &&
+
+	test_all_match git add -A &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Extend README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout -
+'
+
+test_expect_success 'add, commit, checkout' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Add README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add -A &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Extend README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents deep/newfile" &&
+
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "add deep/newfile" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout -
+'
+
+test_expect_success 'checkout and reset --hard' '
+	init_repos &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout update-deep &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout -b reset-test &&
+	test_all_match git reset --hard deepest &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset --hard update-folder2
+'
+
+test_expect_success 'diff --staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>README.md
+	EOF
+	run_on_all "../edit-contents" &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	test_all_match git add README.md &&
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
+test_expect_success 'diff with renames' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		test_all_match git checkout rename-base &&
+		test_all_match git checkout $branch -- .&&
+		test_all_match git diff --staged &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'log with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git log -- a &&
+	test_all_match git log -- folder1/a &&
+	test_all_match git log -- folder2/a &&
+	test_all_match git log -- deep/a &&
+	test_all_match git log -- deep/deeper1/a &&
+	test_all_match git log -- deep/deeper1/deepest/a &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git log -- folder1/a
+'
+
+test_expect_success 'blame with pathspec inside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame a &&
+	test_all_match git blame deep/a &&
+	test_all_match git blame deep/deeper1/a &&
+	test_all_match git blame deep/deeper1/deepest/a
+'
+
+# TODO: blame currently does not support blaming files outside of the
+# sparse definition. It complains that the file doesn't exist locally.
+test_expect_failure 'blame with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame folder1/a &&
+	test_all_match git blame folder2/a &&
+	test_all_match git blame deep/deeper2/a &&
+	test_all_match git blame deep/deeper2/deepest/a
+'
+
+# TODO: reset currently does not behave as expected when in a
+# sparse-checkout.
+test_expect_failure 'checkout and reset (mixed)' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset deepest &&
+	test_all_match git reset update-folder1 &&
+	test_all_match git reset update-folder2
+'
+
+test_expect_success 'merge' '
+	init_repos &&
+
+	test_all_match git checkout -b merge update-deep &&
+	test_all_match git merge -m "folder1" update-folder1 &&
+	test_all_match git rev-parse HEAD^{tree} &&
+	test_all_match git merge -m "folder2" update-folder2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
+test_expect_success 'merge with outside renames' '
+	init_repos &&
+
+	for type in out-to-out out-to-in in-to-out
+	do
+		test_all_match git reset --hard &&
+		test_all_match git checkout -f -b merge-$type update-deep &&
+		test_all_match git merge -m "$type" rename-$type &&
+		test_all_match git rev-parse HEAD^{tree} || return 1
+	done
+'
+
+test_expect_success 'clean' '
+	init_repos &&
+
+	echo bogus >>.gitignore &&
+	run_on_all cp ../.gitignore . &&
+	test_all_match git add .gitignore &&
+	test_all_match git commit -m ignore-bogus-files &&
+
+	run_on_sparse mkdir folder1 &&
+	run_on_all touch folder1/bogus &&
+
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git clean -f &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xdf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_path_is_dir sparse-checkout/folder1
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 1/9] cache-tree: clean up cache_tree_update()
  2021-01-20 16:53 ` [PATCH 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
@ 2021-01-20 17:21   ` Elijah Newren
  2021-01-20 19:10     ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Make the method safer by allocating a cache_tree member for the given
> index_state if it is not already present.
>
> Also drop local variables that are used exactly once and can be found
> directly from the 'istate' parameter.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache-tree.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 3f1a8d4f1b7..c1e49901c17 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it,
>
>  int cache_tree_update(struct index_state *istate, int flags)
>  {
> -       struct cache_tree *it = istate->cache_tree;
> -       struct cache_entry **cache = istate->cache;
> -       int entries = istate->cache_nr;
> -       int skip, i = verify_cache(cache, entries, flags);
> +       int skip, i;
> +
> +       i = verify_cache(istate->cache, istate->cache_nr, flags);

All mechanical changes so far; these look obviously correct.

>
>         if (i)
>                 return i;
> +
> +       if (!istate->cache_tree)
> +               istate->cache_tree = cache_tree();

This is the only substantive change.  It seems fairly innocuous, but
it makes me wonder the reasoning...I don't know/remember enough about
cache_tree handling to know when this would or wouldn't have already
been allocated.  It seems that this would have had to segfault below
if istate->cache_tree were ever NULL, and I don't see you mentioning
any bug you are fixing, so I presume this means you are going to be
adding new codepaths somewhere that cause this function to be reached
under different circumstances than previously had been and you need it
to be more safe for those.  Is that correct?  Or is it just an
abundance of caution thing that you're adding?  If the latter, any
reason you chose to allocate one rather than assume it's a violation
of design invariants and BUG() instead?  (Perhaps the commit message
could add a sentence about the rationale for the extra safety?)

> +
>         trace_performance_enter();
>         trace2_region_enter("cache_tree", "update", the_repository);
> -       i = update_one(it, cache, entries, "", 0, &skip, flags);
> +       i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> +                      "", 0, &skip, flags);

Another mechanical update; looks good.

>         trace2_region_leave("cache_tree", "update", the_repository);
>         trace_performance_leave("cache_tree_update");
>         if (i < 0)
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 2/9] cache-tree: extract subtree_pos()
  2021-01-20 16:53 ` [PATCH 2/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
@ 2021-01-20 17:23   ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:23 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This method will be helpful to use outside of cache-tree.c in a later
> feature. The implementation is subtle due to subtree_name_cmp() sorting
> by length and then lexicographically.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache-tree.c | 6 +++---
>  cache-tree.h | 2 ++
>  2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index c1e49901c17..2b130dd5e19 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -45,7 +45,7 @@ static int subtree_name_cmp(const char *one, int onelen,
>         return memcmp(one, two, onelen);
>  }
>
> -static int subtree_pos(struct cache_tree *it, const char *path, int pathlen)
> +int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen)
>  {
>         struct cache_tree_sub **down = it->down;
>         int lo, hi;
> @@ -72,7 +72,7 @@ static struct cache_tree_sub *find_subtree(struct cache_tree *it,
>                                            int create)
>  {
>         struct cache_tree_sub *down;
> -       int pos = subtree_pos(it, path, pathlen);
> +       int pos = cache_tree_subtree_pos(it, path, pathlen);
>         if (0 <= pos)
>                 return it->down[pos];
>         if (!create)
> @@ -123,7 +123,7 @@ static int do_invalidate_path(struct cache_tree *it, const char *path)
>         it->entry_count = -1;
>         if (!*slash) {
>                 int pos;
> -               pos = subtree_pos(it, path, namelen);
> +               pos = cache_tree_subtree_pos(it, path, namelen);
>                 if (0 <= pos) {
>                         cache_tree_free(&it->down[pos]->cache_tree);
>                         free(it->down[pos]);
> diff --git a/cache-tree.h b/cache-tree.h
> index 639bfa5340e..8efeccebfc9 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -27,6 +27,8 @@ void cache_tree_free(struct cache_tree **);
>  void cache_tree_invalidate_path(struct index_state *, const char *);
>  struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
>
> +int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
> +
>  void cache_tree_write(struct strbuf *, struct cache_tree *root);
>  struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
>
> --
> gitgitgadget

Simple, straight-forward patch for exposing the function outside the
file scope; looks good.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-20 16:53 ` [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
@ 2021-01-20 17:26   ` Elijah Newren
  2021-01-21 12:53   ` Chris Torek
  1 sibling, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The index has an fsmonitor_dirty bitmap that records which index entries
> are "dirty" based on the response from the FSMonitor. If this bitmap
> ever grows larger than the index, then there was an error in how it was
> constructed, and it was probably a developer's bug.
>
> There are several BUG() statements that are very similar, so replace
> these uses with a simpler assert_index_minimum(). Since there is one
> caller that uses a custom 'pos' value instead of the bit_size member, we
> cannot simplify it too much. However, the error string is identical in
> each, so this simplifies things.
>
> The end result is that the code is simpler to read while also preserving
> these assertions for developers in the FSMonitor space.

Indeed, looking through the patch, the end result is simpler to read.
Nice cleanup.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  fsmonitor.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index ca031c3abb8..52a50a9545a 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -13,14 +13,19 @@
>
>  struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR);
>
> +static void assert_index_minimum(struct index_state *istate, size_t pos)
> +{
> +       if (pos > istate->cache_nr)
> +               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> +                   (uintmax_t)pos, istate->cache_nr);
> +}
> +
>  static void fsmonitor_ewah_callback(size_t pos, void *is)
>  {
>         struct index_state *istate = (struct index_state *)is;
>         struct cache_entry *ce;
>
> -       if (pos >= istate->cache_nr)
> -               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> -                   (uintmax_t)pos, istate->cache_nr);
> +       assert_index_minimum(istate, pos);
>
>         ce = istate->cache[pos];
>         ce->ce_flags &= ~CE_FSMONITOR_VALID;
> @@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
>         }
>         istate->fsmonitor_dirty = fsmonitor_dirty;
>
> -       if (!istate->split_index &&
> -           istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -                   (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +       if (!istate->split_index)
> +               assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>
>         trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
>         return 0;
> @@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
>         uint32_t ewah_size = 0;
>         int fixup = 0;
>
> -       if (!istate->split_index &&
> -           istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -                   (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +       if (!istate->split_index)
> +               assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>
>         put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
>         strbuf_add(sb, &hdr_version, sizeof(uint32_t));
> @@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate)
>                         }
>
>                         /* Mark all previously saved entries as dirty */
> -                       if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -                               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -                                   (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +                       assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>                         ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
>
>                         refresh_fsmonitor(istate);
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 4/9] repository: add repo reference to index_state
  2021-01-20 16:53 ` [PATCH 4/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
@ 2021-01-20 17:46   ` Elijah Newren
  2021-01-20 19:16     ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> It will be helpful to add behavior to index opertations that might

s/opertations/operations/

> trigger an object lookup. Since each index belongs to a specific
> repository, add a 'repo' pointer to struct index_state that allows
> access to this repository.
>
> This will prevent future changes from needing to pass an additional
> 'struct repository *repo' parameter and instead rely only on the 'struct
> index_state *istate' parameter.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache.h      | 1 +
>  repository.c | 4 ++++
>  2 files changed, 5 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 71097657489..f9c7a603841 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -328,6 +328,7 @@ struct index_state {
>         struct ewah_bitmap *fsmonitor_dirty;
>         struct mem_pool *ce_mem_pool;
>         struct progress *progress;
> +       struct repository *repo;
>  };
>
>  /* Name hashing */
> diff --git a/repository.c b/repository.c
> index a4174ddb062..67a4c1da2d9 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -264,6 +264,10 @@ int repo_read_index(struct repository *repo)
>         if (!repo->index)
>                 repo->index = xcalloc(1, sizeof(*repo->index));
>
> +       /* Complete the double-reference */
> +       if (!repo->index->repo)
> +               repo->index->repo = repo;
> +
>         return read_index_from(repo->index, repo->index_file, repo->gitdir);
>  }
>
> --
> gitgitgadget

Since we have repo->index and we have index->repo, which are intended
to be circular...what if they aren't?  Do we want or need to add
assertions anywhere that repo == repo->index->repo or that index ==
index->repo->index ?

My initial implementations of --remerge-diff[1] played around with
creating a second repo, with a different primary object store but
everything else the same.  The index for the two repository objects
was thus the same, and thus clearly would have violated this assumed
invariant for one of the two repos.  I discarded that initial
implementation (which I didn't quite have working) because I
discovered tmp-objdir.h and was able to add some
tmp_objdir_make_primary() and tmp_objdir_remove_as_primary() functions
that merely altered the existing repo's primary object store, but I'm
curious if there might be other cases of folks doing stuff that might
have weird failures with this new invariant.

It's entirely possible that --remerge-diff was just so different, and
I was so unfamiliar with repo objects (and still kind of am) that I
was just doing weird stuff no one has done before, so perhaps no
additional checks are needed -- I'm just throwing my gut question out
there as food for thought.



[1] I have not yet submitted `--remerge-diff` to the list; you haven't
missed anything.  I'm waiting for merge-ort to be submitted, reviewed,
and merged first.  It's the remerge-diff branch in my fork on GitHub
if anyone is curious, though.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 5/9] name-hash: use trace2 regions for init
  2021-01-20 16:53 ` [PATCH 5/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
@ 2021-01-20 17:47   ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:47 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The lazy_init_name_hash() populates a hashset with all filenames and
> another with all directories represented in the index. This is run only
> if we need to use the hashsets to check for existence or case-folding
> renames.
>
> Place trace2 regions where there is already a performance trace.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  name-hash.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/name-hash.c b/name-hash.c
> index 5d3c7b12c18..4e03fac9bb1 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -7,6 +7,7 @@
>   */
>  #include "cache.h"
>  #include "thread-utils.h"
> +#include "trace2.h"
>
>  struct dir_entry {
>         struct hashmap_entry ent;
> @@ -577,6 +578,7 @@ static void lazy_init_name_hash(struct index_state *istate)
>         if (istate->name_hash_initialized)
>                 return;
>         trace_performance_enter();
> +       trace2_region_enter("index", "name-hash-init", istate->repo);
>         hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
>         hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
>
> @@ -597,6 +599,7 @@ static void lazy_init_name_hash(struct index_state *istate)
>         }
>
>         istate->name_hash_initialized = 1;
> +       trace2_region_leave("index", "name-hash-init", istate->repo);
>         trace_performance_leave("initialize name hash");
>  }
>
> --
> gitgitgadget

Yaay for trace2.  :-)

Total sidenote: Are we ever going to get rid of the trace1 stuff?  I'm
sure this was discussed somewhere, but I don't know where.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 6/9] sparse-checkout: load sparse-checkout patterns
  2021-01-20 16:53 ` [PATCH 6/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
@ 2021-01-20 17:54   ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 17:54 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> A future feature will want to load the sparse-checkout patterns into a
> pattern_list, but the current mechanism to do so is a bit complicated.
> This is made difficult due to needing to find the sparse-checkout file
> in different ways throughout the codebase.
>
> The logic implemented in the new get_sparse_checkout_patterns() was
> duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
> the new method instead, keeping the logic around handling the struct
> unpack_trees_options.
>
> The callers to get_sparse_checkout_filename() in
> builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
> so it is not appropriate to replace logic in that file with
> get_sparse_checkout_patterns().
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c |  5 -----
>  dir.c                     | 17 +++++++++++++++++
>  dir.h                     |  2 ++
>  unpack-trees.c            |  6 +-----
>  4 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index e3140db2a0a..2306a9ad98e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -22,11 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = {
>         NULL
>  };
>
> -static char *get_sparse_checkout_filename(void)
> -{
> -       return git_pathdup("info/sparse-checkout");
> -}
> -
>  static void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
>  {
>         int i;
> diff --git a/dir.c b/dir.c
> index d637461da5c..d153a63bbd1 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2998,6 +2998,23 @@ void setup_standard_excludes(struct dir_struct *dir)
>         }
>  }
>
> +char *get_sparse_checkout_filename(void)
> +{
> +       return git_pathdup("info/sparse-checkout");
> +}
> +
> +int get_sparse_checkout_patterns(struct pattern_list *pl)
> +{
> +       int res;
> +       char *sparse_filename = get_sparse_checkout_filename();
> +
> +       pl->use_cone_patterns = core_sparse_checkout_cone;
> +       res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL);
> +
> +       free(sparse_filename);
> +       return res;
> +}
> +
>  int remove_path(const char *name)
>  {
>         char *slash;
> diff --git a/dir.h b/dir.h
> index a3c40dec516..facfae47402 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -448,6 +448,8 @@ int is_empty_dir(const char *dir);
>
>  void setup_standard_excludes(struct dir_struct *dir);
>
> +char *get_sparse_checkout_filename(void);
> +int get_sparse_checkout_patterns(struct pattern_list *pl);
>
>  /* Constants for remove_dir_recursively: */
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index af6e9b9c2fd..837b8bb42fb 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1549,14 +1549,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  static void populate_from_existing_patterns(struct unpack_trees_options *o,
>                                             struct pattern_list *pl)
>  {
> -       char *sparse = git_pathdup("info/sparse-checkout");
> -
> -       pl->use_cone_patterns = core_sparse_checkout_cone;
> -       if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0)
> +       if (get_sparse_checkout_patterns(pl) < 0)
>                 o->skip_sparse_checkout = 1;
>         else
>                 o->pl = pl;
> -       free(sparse);
>  }
>
>
> --
> gitgitgadget

Looks straightforward and well motivated to me.

But the cherry on top that really sells this patch is that more lines
of dir.c will blame to someone besides me.  Win-win!

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 7/9] sparse-checkout: hold pattern list in index
  2021-01-20 16:53 ` [PATCH 7/9] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget
@ 2021-01-20 18:03   ` Elijah Newren
  2021-01-20 19:22     ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 18:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> As we modify the sparse-checkout definition, we perform index operations
> on a pattern_list that only exists in-memory. This allows easy backing
> out in case the index update fails.
>
> However, if the index write itself cares about the sparse-checkout
> pattern set, we need access to that in-memory copy. Place a pointer to
> a 'struct pattern_list' in the index so we can access this on-demand.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c | 17 ++++++++++-------
>  cache.h                   |  2 ++
>  2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 2306a9ad98e..e00b82af727 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl)
>         if (is_index_unborn(r->index))
>                 return UPDATE_SPARSITY_SUCCESS;
>
> +       r->index->sparse_checkout_patterns = pl;
> +
>         memset(&o, 0, sizeof(o));
>         o.verbose_update = isatty(2);
>         o.update = 1;
> @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl)
>         else
>                 rollback_lock_file(&lock_file);
>
> +       r->index->sparse_checkout_patterns = NULL;
>         return result;

The setting back to NULL made me curious; we don't want this
information to remain available later?  Is it only going to be used
for the updating of the working directory?

I dug a bit into the callers, and didn't find the answer to my
question...but I did notice that modify_pattern_list() will correctly
free the patterns after write_patterns_and_update() via calling
clear_pattern_list(&pl), but sparse_checkout_init() appears to leak
the patterns it allocates.  That's a separate issue from this patch,
but do you want to fix that up while working in this area (so I avoid
stepping on your toes with all your other patches)?

>  }
>
> @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>  {
>         int result;
>         int changed_config = 0;
> -       struct pattern_list pl;
> -       memset(&pl, 0, sizeof(pl));
> +       struct pattern_list *pl = xcalloc(1, sizeof(*pl));
>
>         switch (m) {
>         case ADD:
>                 if (core_sparse_checkout_cone)
> -                       add_patterns_cone_mode(argc, argv, &pl);
> +                       add_patterns_cone_mode(argc, argv, pl);
>                 else
> -                       add_patterns_literal(argc, argv, &pl);
> +                       add_patterns_literal(argc, argv, pl);
>                 break;
>
>         case REPLACE:
> -               add_patterns_from_input(&pl, argc, argv);
> +               add_patterns_from_input(pl, argc, argv);

Slightly annoying that the other functions are (argc, argv, pl) and
this one is (pl, argc, argv).  But again, that's outside the scope of
this patch and might not be worth the churn to fix.

>                 break;
>         }
>
> @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
>                 changed_config = 1;
>         }
>
> -       result = write_patterns_and_update(&pl);
> +       result = write_patterns_and_update(pl);
>
>         if (result && changed_config)
>                 set_config(MODE_NO_PATTERNS);
>
> -       clear_pattern_list(&pl);
> +       clear_pattern_list(pl);
> +       free(pl);
>         return result;
>  }
>
> diff --git a/cache.h b/cache.h
> index f9c7a603841..bf4ec3de4b0 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -305,6 +305,7 @@ static inline unsigned int canon_mode(unsigned int mode)
>  struct split_index;
>  struct untracked_cache;
>  struct progress;
> +struct pattern_list;
>
>  struct index_state {
>         struct cache_entry **cache;
> @@ -329,6 +330,7 @@ struct index_state {
>         struct mem_pool *ce_mem_pool;
>         struct progress *progress;
>         struct repository *repo;
> +       struct pattern_list *sparse_checkout_patterns;
>  };
>
>  /* Name hashing */
> --
> gitgitgadget

Otherwise, this patch looks good to me.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] test-lib: test_region looks for trace2 regions
  2021-01-20 16:53 ` [PATCH 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-20 18:20   ` Elijah Newren
  2021-01-20 19:24     ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 18:20 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Most test cases can verify Git's behavior using input/output
> expectations or changes to the .git directory. However, sometimes we
> want to check that Git did or did not run a certain section of code.
> This is particularly important for performance-only features that we
> want to ensure have been enabled in certain cases.
>
> Add a new 'test_region' function that checks if a trace2 region was
> entered and left in a given trace2 event log.

Ooh, others do this too?  Sounds like a helpful function to add, but
just checking for entered and left means that...

>
> There is one existing test (t0500-progress-display.sh) that performs
> this check already, so use the helper function instead. More uses will
> be added in a later change.
>
> t6423-merge-rename-directories.sh also greps for region_enter lines, but
> it verifies the number of such lines, which is not the same as an
> existence check.

...yeah, won't cover the case that I added.  That's fine, since it
appears to be a one-off for now and we don't know of any other cases,
current or planned, that want to do something like that yet.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t0500-progress-display.sh |  3 +--
>  t/test-lib-functions.sh     | 40 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index 1ed1df351cb..c461b89dfaf 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' '
>                 "Working hard" <in 2>stderr &&
>
>         # t0212/parse_events.perl intentionally omits regions and data.
> -       grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> -       grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +       test_region category progress trace.event &&

Sidenote: Hmm...about 40% of my region labels in merge-ort.c and 90%
in diffcore-rename.c have spaces in them.  This function could still
be used, but I'm curious if I should change the labels (but then
again, they are testing logical regions rather than individual
functions, and the spaces instead of underscores kind of convey
that...)

>         grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
>         grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
>  '
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 999982fe4a9..c878db93013 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1655,3 +1655,43 @@ test_subcommand () {
>                 grep "\[$expr\]"
>         fi
>  }
> +
> +# Check that the given command was invoked as part of the
> +# trace2-format trace on stdin.
> +#
> +#      test_region [!] <category> <label> git <command> <args>...
> +#
> +# For example, to look for trace2_region_enter("index", "do_read_index", repo)
> +# in an invocation of "git checkout HEAD~1", run
> +#
> +#      GIT_TRACE2_EVENT="$(pwd)/trace.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +#              git checkout HEAD~1 &&
> +#      test_region index do_read_index <trace.txt
> +#
> +# If the first parameter passed is !, this instead checks that
> +# the given region was not entered.
> +#
> +test_region () {
> +       local expect_exit=0
> +       if test "$1" = "!"
> +       then
> +               expect_exit=1
> +               shift
> +       fi
> +
> +       grep -e "region_enter" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
> +       exitcode=$?
> +
> +       if test $exitcode != $expect_exit
> +       then
> +               return 1
> +       fi
> +
> +       grep -e "region_leave" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
> +       exitcode=$?
> +
> +       if test $exitcode != $expect_exit
> +       then
> +               return 1
> +       fi
> +}
> --
> gitgitgadget

This patch looks good to me.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 1/9] cache-tree: clean up cache_tree_update()
  2021-01-20 17:21   ` Elijah Newren
@ 2021-01-20 19:10     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-20 19:10 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On 1/20/2021 12:21 PM, Elijah Newren wrote:
> On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:...
>> +
>> +       if (!istate->cache_tree)
>> +               istate->cache_tree = cache_tree();
> 
> This is the only substantive change.  It seems fairly innocuous, but
> it makes me wonder the reasoning...I don't know/remember enough about
> cache_tree handling to know when this would or wouldn't have already
> been allocated.  It seems that this would have had to segfault below
> if istate->cache_tree were ever NULL, and I don't see you mentioning
> any bug you are fixing, so I presume this means you are going to be
> adding new codepaths somewhere that cause this function to be reached
> under different circumstances than previously had been and you need it
> to be more safe for those.  Is that correct?  Or is it just an
> abundance of caution thing that you're adding?  If the latter, any
> reason you chose to allocate one rather than assume it's a violation
> of design invariants and BUG() instead?  (Perhaps the commit message
> could add a sentence about the rationale for the extra safety?)

It's something I need in the future when I use the cache_tree_update()
in more places. I think I call it two times, and either I need to
initialize the cache_tree member outside of both, or just make it a
feature of the method that it will re-initialize the cache-tree.

Note: the implementation treats an initialized, but empty cache-tree
as "invalid" so update_one() correctly populates the full tree.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 4/9] repository: add repo reference to index_state
  2021-01-20 17:46   ` Elijah Newren
@ 2021-01-20 19:16     ` Derrick Stolee
  2021-01-20 19:50       ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2021-01-20 19:16 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On 1/20/2021 12:46 PM, Elijah Newren wrote:
> On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> It will be helpful to add behavior to index opertations that might
> 
> s/opertations/operations/

Thanks.

>> trigger an object lookup. Since each index belongs to a specific
>> repository, add a 'repo' pointer to struct index_state that allows
>> access to this repository.
>>
>> This will prevent future changes from needing to pass an additional
>> 'struct repository *repo' parameter and instead rely only on the 'struct
>> index_state *istate' parameter.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  cache.h      | 1 +
>>  repository.c | 4 ++++
>>  2 files changed, 5 insertions(+)
>>
>> diff --git a/cache.h b/cache.h
>> index 71097657489..f9c7a603841 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -328,6 +328,7 @@ struct index_state {
>>         struct ewah_bitmap *fsmonitor_dirty;
>>         struct mem_pool *ce_mem_pool;
>>         struct progress *progress;
>> +       struct repository *repo;
>>  };
>>
>>  /* Name hashing */
>> diff --git a/repository.c b/repository.c
>> index a4174ddb062..67a4c1da2d9 100644
>> --- a/repository.c
>> +++ b/repository.c
>> @@ -264,6 +264,10 @@ int repo_read_index(struct repository *repo)
>>         if (!repo->index)
>>                 repo->index = xcalloc(1, sizeof(*repo->index));
>>
>> +       /* Complete the double-reference */
>> +       if (!repo->index->repo)
>> +               repo->index->repo = repo;
>> +
>>         return read_index_from(repo->index, repo->index_file, repo->gitdir);
>>  }
>>
>> --
>> gitgitgadget
> 
> Since we have repo->index and we have index->repo, which are intended
> to be circular...what if they aren't?  Do we want or need to add
> assertions anywhere that repo == repo->index->repo or that index ==
> index->repo->index ?

Here, we are pairing them together and the loop is complete. I don't
view that as a permanent thing. This only initializes istate->repo
when we are parsing an index from a file, but not when we create one
in memory.

I imagine it will be likely in some cases to have multiple index_state
instances for a single repository. However, having the pointer "this
index belongs to this repository" seems helpful (to me).

> My initial implementations of --remerge-diff[1] played around with
> creating a second repo, with a different primary object store but
> everything else the same.  The index for the two repository objects
> was thus the same, and thus clearly would have violated this assumed
> invariant for one of the two repos.  I discarded that initial
> implementation (which I didn't quite have working) because I
> discovered tmp-objdir.h and was able to add some
> tmp_objdir_make_primary() and tmp_objdir_remove_as_primary() functions
> that merely altered the existing repo's primary object store, but I'm
> curious if there might be other cases of folks doing stuff that might
> have weird failures with this new invariant.

This is an interesting concept, and definitely violates my expectations
that an index belongs to only one repository. I'd need to know more
about why this was a good design decision before being convinced that
the relationship should not be many-to-one (index-to-repo).

> It's entirely possible that --remerge-diff was just so different, and
> I was so unfamiliar with repo objects (and still kind of am) that I
> was just doing weird stuff no one has done before, so perhaps no
> additional checks are needed -- I'm just throwing my gut question out
> there as food for thought.
> 
> [1] I have not yet submitted `--remerge-diff` to the list; you haven't
> missed anything.  I'm waiting for merge-ort to be submitted, reviewed,
> and merged first.  It's the remerge-diff branch in my fork on GitHub
> if anyone is curious, though.
 
I'm interested in what others might say about this idea. I'd be able
to do most of what I want to do without this patch, but it just gets
a lot messier. (istate->repo is used in the very next patch in a way
that would be less clean without it.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 7/9] sparse-checkout: hold pattern list in index
  2021-01-20 18:03   ` Elijah Newren
@ 2021-01-20 19:22     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-20 19:22 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On 1/20/2021 1:03 PM, Elijah Newren wrote:
> On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> As we modify the sparse-checkout definition, we perform index operations
>> on a pattern_list that only exists in-memory. This allows easy backing
>> out in case the index update fails.
>>
>> However, if the index write itself cares about the sparse-checkout
>> pattern set, we need access to that in-memory copy. Place a pointer to
>> a 'struct pattern_list' in the index so we can access this on-demand.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  builtin/sparse-checkout.c | 17 ++++++++++-------
>>  cache.h                   |  2 ++
>>  2 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
>> index 2306a9ad98e..e00b82af727 100644
>> --- a/builtin/sparse-checkout.c
>> +++ b/builtin/sparse-checkout.c
>> @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl)
>>         if (is_index_unborn(r->index))
>>                 return UPDATE_SPARSITY_SUCCESS;
>>
>> +       r->index->sparse_checkout_patterns = pl;
>> +
>>         memset(&o, 0, sizeof(o));
>>         o.verbose_update = isatty(2);
>>         o.update = 1;
>> @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl)
>>         else
>>                 rollback_lock_file(&lock_file);
>>
>> +       r->index->sparse_checkout_patterns = NULL;
>>         return result;
> 
> The setting back to NULL made me curious; we don't want this
> information to remain available later?  Is it only going to be used
> for the updating of the working directory?
> 
> I dug a bit into the callers, and didn't find the answer to my
> question...but I did notice that modify_pattern_list() will correctly
> free the patterns after write_patterns_and_update() via calling
> clear_pattern_list(&pl), but sparse_checkout_init() appears to leak
> the patterns it allocates.  That's a separate issue from this patch,
> but do you want to fix that up while working in this area (so I avoid
> stepping on your toes with all your other patches)?

The thing that caught me here is that update_working_directory() uses
an in-memory pattern_list that hasn't been committed to the
sparse-checkout file yet. This means we need to (temporarily) point
to this pattern_list.

Perhaps this patch is premature, since nothing actually _uses_
sparse_checkout_patterns yet. When we do add such a use, it will
initialize a NULL value with the patterns in the sparse-checkout
file. In that case, we definitely want to inject our in-memory
patterns instead.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] test-lib: test_region looks for trace2 regions
  2021-01-20 18:20   ` Elijah Newren
@ 2021-01-20 19:24     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-20 19:24 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On 1/20/2021 1:20 PM, Elijah Newren wrote:
> On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>>         # t0212/parse_events.perl intentionally omits regions and data.
>> -       grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
>> -       grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
>> +       test_region category progress trace.event &&
> 
> Sidenote: Hmm...about 40% of my region labels in merge-ort.c and 90%
> in diffcore-rename.c have spaces in them.  This function could still
> be used, but I'm curious if I should change the labels (but then
> again, they are testing logical regions rather than individual
> functions, and the spaces instead of underscores kind of convey
> that...)

You should be able to use

	test_region "category with spaces" "progress with spaces" trace

but if not, then the test_region helper could be improved to match.

I do think that it's better to avoid spaces in these identifiers.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 9/9] t1092: test interesting sparse-checkout scenarios
  2021-01-20 16:53 ` [PATCH 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
@ 2021-01-20 19:40   ` Elijah Newren
  2021-01-21 11:59     ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 19:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> These also document some behaviors that differ from a full checkout, and
> possibly in a way that is not intended.

I'm in favor.  I should turn some of my noted weird behaviors from [1]
into testcases as well.

[1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 323 +++++++++++++++++++++++
>  1 file changed, 323 insertions(+)
>  create mode 100755 t/t1092-sparse-checkout-compatibility.sh
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> new file mode 100755
> index 00000000000..46f9dc2cdf3
> --- /dev/null
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -0,0 +1,323 @@
> +#!/bin/sh
> +
> +test_description='compare full workdir to sparse workdir'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +       git init initial-repo &&
> +       (
> +               cd initial-repo &&
> +               echo a >a &&
> +               echo "after deep" >e &&
> +               echo "after folder1" >g &&
> +               mkdir folder1 folder2 deep x &&
> +               mkdir deep/deeper1 deep/deeper2 &&
> +               mkdir deep/deeper1/deepest &&
> +               echo "after deeper1" >deep/e &&
> +               echo "after deepest" >deep/deeper1/e &&
> +               cp a folder1 &&
> +               cp a folder2 &&
> +               cp a deep &&
> +               cp a deep/deeper1 &&
> +               cp a deep/deeper2 &&
> +               cp a deep/deeper1/deepest &&
> +               git add . &&
> +               git commit -m "initial commit" &&
> +               git checkout -b base &&
> +               for dir in folder1 folder2 deep
> +               do
> +                       git checkout -b update-$dir &&
> +                       echo "updated $dir" >$dir/a &&
> +                       git commit -a -m "update $dir" || return 1
> +               done &&
> +
> +               git checkout -b rename-base base &&
> +               echo >folder1/larger-content <<-\EOF &&
> +               matching
> +               lines
> +               help
> +               inexact
> +               renames
> +               EOF
> +               cp folder1/larger-content folder2/ &&
> +               cp folder1/larger-content deep/deeper1/ &&
> +               git add . &&
> +               git commit -m "add interesting rename content" &&
> +
> +               git checkout -b rename-out-to-out rename-base &&
> +               mv folder1/a folder2/b &&
> +               mv folder1/larger-content folder2/edited-content &&
> +               echo >>folder2/edited-content &&
> +               git add . &&
> +               git commit -m "rename folder1/... to folder2/..." &&
> +
> +               git checkout -b rename-out-to-in rename-base &&
> +               mv folder1/a deep/deeper1/b &&
> +               mv folder1/larger-content deep/deeper1/edited-content &&
> +               echo >>deep/deeper1/edited-content &&
> +               git add . &&
> +               git commit -m "rename folder1/... to deep/deeper1/..." &&
> +
> +               git checkout -b rename-in-to-out rename-base &&
> +               mv deep/deeper1/a folder1/b &&
> +               mv deep/deeper1/larger-content folder1/edited-content &&
> +               echo >>folder1/edited-content &&
> +               git add . &&
> +               git commit -m "rename deep/deeper1/... to folder1/..." &&
> +
> +               git checkout -b deepest base &&
> +               echo "updated deepest" >deep/deeper1/deepest/a &&
> +               git commit -a -m "update deepest" &&
> +
> +               git checkout -f base &&
> +               git reset --hard
> +       )
> +'
> +
> +init_repos () {
> +       rm -rf full-checkout sparse-checkout sparse-index &&
> +
> +       # create repos in initial state
> +       cp -r initial-repo full-checkout &&
> +       git -C full-checkout reset --hard &&
> +
> +       cp -r initial-repo sparse-checkout &&
> +       git -C sparse-checkout reset --hard &&
> +       git -C sparse-checkout sparse-checkout init --cone &&
> +
> +       # initialize sparse-checkout definitions
> +       git -C sparse-checkout sparse-checkout set deep
> +}
> +
> +run_on_sparse () {
> +       (
> +               cd sparse-checkout &&
> +               $* >../sparse-checkout-out 2>../sparse-checkout-err
> +       )
> +}
> +
> +run_on_all () {
> +       (
> +               cd full-checkout &&
> +               $* >../full-checkout-out 2>../full-checkout-err
> +       ) &&
> +       run_on_sparse $*
> +}
> +
> +test_all_match () {
> +       run_on_all $* &&
> +       test_cmp full-checkout-out sparse-checkout-out &&
> +       test_cmp full-checkout-err sparse-checkout-err
> +}
> +
> +test_expect_success 'status with options' '
> +       init_repos &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno &&
> +       run_on_all "touch README.md" &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno &&
> +       test_all_match git add README.md &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno
> +'
> +
> +test_expect_success 'add, commit, checkout' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>README.md
> +       EOF
> +       run_on_all "../edit-contents" &&
> +
> +       test_all_match git add README.md &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Add README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout - &&
> +
> +       run_on_all "../edit-contents" &&
> +
> +       test_all_match git add -A &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Extend README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout -
> +'

I was going to add comments here, but I noticed the next test had the
same description and looked very similar, so I'll defer the
comments...

> +
> +test_expect_success 'add, commit, checkout' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all "../edit-contents README.md" &&
> +
> +       test_all_match git add README.md &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Add README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout - &&
> +
> +       run_on_all "../edit-contents README.md" &&
> +
> +       test_all_match git add -A &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Extend README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout - &&

Up to here, this test is identical to the previous one.  Why repeat it?

> +
> +       run_on_all "../edit-contents deep/newfile" &&
> +
> +       test_all_match git status --porcelain=v2 -uno &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "add deep/newfile" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout -
> +'

Think out loud...so you are only adding files that were not previously
tracked and that would have been part of the sparse cone.  You aren't
trying to add files that would be outside the sparse cone, or manually
creating files missing from the working tree due to sparseness and
then attempting to add them.  (Which is fine, we have to start
somewhere with our testing.  Also, I think my testcases didn't look at
the case you did, and only covered one of these other two cases.)

> +
> +test_expect_success 'checkout and reset --hard' '
> +       init_repos &&
> +
> +       test_all_match git checkout update-folder1 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git checkout update-deep &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git checkout -b reset-test &&
> +       test_all_match git reset --hard deepest &&
> +       test_all_match git reset --hard update-folder1 &&
> +       test_all_match git reset --hard update-folder2
> +'
> +
> +test_expect_success 'diff --staged' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>README.md
> +       EOF
> +       run_on_all "../edit-contents" &&
> +
> +       test_all_match git diff &&
> +       test_all_match git diff --staged &&
> +       test_all_match git add README.md &&
> +       test_all_match git diff &&
> +       test_all_match git diff --staged
> +'

Also a case where you're operating on a file that matches the sparsity
patterns (in cone mode, everything in the root directory is included).

> +test_expect_success 'diff with renames' '
> +       init_repos &&
> +
> +       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       do
> +               test_all_match git checkout rename-base &&
> +               test_all_match git checkout $branch -- .&&
> +               test_all_match git diff --staged &&
> +               test_all_match git diff --staged --find-renames || return 1

Aren't these last two lines the same? (diff.renames defaults to true
ever since commit 5404c116aa, "diff: activate diff.renames by
default", 2016-02-25)  Are they only different because you have a
tweaked config that turns off renames by default?

Perhaps the first diff line should have a --no-renames flag.

> +       done
> +'
> +
> +test_expect_success 'log with pathspec outside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git log -- a &&
> +       test_all_match git log -- folder1/a &&
> +       test_all_match git log -- folder2/a &&
> +       test_all_match git log -- deep/a &&
> +       test_all_match git log -- deep/deeper1/a &&
> +       test_all_match git log -- deep/deeper1/deepest/a &&
> +
> +       test_all_match git checkout update-folder1 &&
> +       test_all_match git log -- folder1/a
> +'
> +
> +test_expect_success 'blame with pathspec inside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git blame a &&
> +       test_all_match git blame deep/a &&
> +       test_all_match git blame deep/deeper1/a &&
> +       test_all_match git blame deep/deeper1/deepest/a
> +'

Good check.

On a side note going back to a piece of the other thread I didn't get
a response to, I'm still curious whether
    git blame -C -C $PATH_INSIDE_SPARSE_DEFINITION
should (optionally?) behave differently in a sparse checkout.  In
particular, should it limit its copy detection to other paths also in
the sparse checkout, or should it always search all other files within
the repository for copied lines?  Searching just within the sparse
checkout seems like it could be a really nice performance
optimization.

> +
> +# TODO: blame currently does not support blaming files outside of the
> +# sparse definition. It complains that the file doesn't exist locally.

Nice catch.  Yeah, blame tries to check the local working copy for
changes, and shows those lines with a changed in commit 0000000000.
We should add a check that says that if the file is SKIP_WORKTREE,
then we treat it the same as `git blame $PATH HEAD`.

> +test_expect_failure 'blame with pathspec outside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git blame folder1/a &&
> +       test_all_match git blame folder2/a &&
> +       test_all_match git blame deep/deeper2/a &&
> +       test_all_match git blame deep/deeper2/deepest/a
> +'
> +
> +# TODO: reset currently does not behave as expected when in a
> +# sparse-checkout.

I'm going to go to test this out to see what it does.  It's the first
testcase you listed that I didn't know how it worked and couldn't
figure it out from your comments.  However it turns out, definitely a
good test to have.

> +test_expect_failure 'checkout and reset (mixed)' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset deepest &&
> +       test_all_match git reset update-folder1 &&
> +       test_all_match git reset update-folder2
> +'
> +
> +test_expect_success 'merge' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b merge update-deep &&
> +       test_all_match git merge -m "folder1" update-folder1 &&
> +       test_all_match git rev-parse HEAD^{tree} &&
> +       test_all_match git merge -m "folder2" update-folder2 &&
> +       test_all_match git rev-parse HEAD^{tree}
> +'
> +
> +test_expect_success 'merge with outside renames' '
> +       init_repos &&
> +
> +       for type in out-to-out out-to-in in-to-out
> +       do
> +               test_all_match git reset --hard &&
> +               test_all_match git checkout -f -b merge-$type update-deep &&
> +               test_all_match git merge -m "$type" rename-$type &&
> +               test_all_match git rev-parse HEAD^{tree} || return 1
> +       done
> +'
> +
> +test_expect_success 'clean' '
> +       init_repos &&
> +
> +       echo bogus >>.gitignore &&
> +       run_on_all cp ../.gitignore . &&
> +       test_all_match git add .gitignore &&
> +       test_all_match git commit -m ignore-bogus-files &&
> +
> +       run_on_sparse mkdir folder1 &&
> +       run_on_all touch folder1/bogus &&
> +
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git clean -f &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git clean -xf &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git clean -xdf &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_path_is_dir sparse-checkout/folder1
> +'
> +
> +test_done
> --

I made lots of comments, but overall these tests look good to me other
than just one question about test duplication and another about using
--no-renames for diff when rename detection isn't wanted.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 4/9] repository: add repo reference to index_state
  2021-01-20 19:16     ` Derrick Stolee
@ 2021-01-20 19:50       ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-20 19:50 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 11:16 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/20/2021 12:46 PM, Elijah Newren wrote:
> > On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> It will be helpful to add behavior to index opertations that might
> >
> > s/opertations/operations/
>
> Thanks.
>
> >> trigger an object lookup. Since each index belongs to a specific
> >> repository, add a 'repo' pointer to struct index_state that allows
> >> access to this repository.
> >>
> >> This will prevent future changes from needing to pass an additional
> >> 'struct repository *repo' parameter and instead rely only on the 'struct
> >> index_state *istate' parameter.
> >>
> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> >> ---
> >>  cache.h      | 1 +
> >>  repository.c | 4 ++++
> >>  2 files changed, 5 insertions(+)
> >>
> >> diff --git a/cache.h b/cache.h
> >> index 71097657489..f9c7a603841 100644
> >> --- a/cache.h
> >> +++ b/cache.h
> >> @@ -328,6 +328,7 @@ struct index_state {
> >>         struct ewah_bitmap *fsmonitor_dirty;
> >>         struct mem_pool *ce_mem_pool;
> >>         struct progress *progress;
> >> +       struct repository *repo;
> >>  };
> >>
> >>  /* Name hashing */
> >> diff --git a/repository.c b/repository.c
> >> index a4174ddb062..67a4c1da2d9 100644
> >> --- a/repository.c
> >> +++ b/repository.c
> >> @@ -264,6 +264,10 @@ int repo_read_index(struct repository *repo)
> >>         if (!repo->index)
> >>                 repo->index = xcalloc(1, sizeof(*repo->index));
> >>
> >> +       /* Complete the double-reference */
> >> +       if (!repo->index->repo)
> >> +               repo->index->repo = repo;
> >> +
> >>         return read_index_from(repo->index, repo->index_file, repo->gitdir);
> >>  }
> >>
> >> --
> >> gitgitgadget
> >
> > Since we have repo->index and we have index->repo, which are intended
> > to be circular...what if they aren't?  Do we want or need to add
> > assertions anywhere that repo == repo->index->repo or that index ==
> > index->repo->index ?
>
> Here, we are pairing them together and the loop is complete. I don't
> view that as a permanent thing. This only initializes istate->repo
> when we are parsing an index from a file, but not when we create one
> in memory.
>
> I imagine it will be likely in some cases to have multiple index_state
> instances for a single repository. However, having the pointer "this
> index belongs to this repository" seems helpful (to me).
>
> > My initial implementations of --remerge-diff[1] played around with
> > creating a second repo, with a different primary object store but
> > everything else the same.  The index for the two repository objects
> > was thus the same, and thus clearly would have violated this assumed
> > invariant for one of the two repos.  I discarded that initial
> > implementation (which I didn't quite have working) because I
> > discovered tmp-objdir.h and was able to add some
> > tmp_objdir_make_primary() and tmp_objdir_remove_as_primary() functions
> > that merely altered the existing repo's primary object store, but I'm
> > curious if there might be other cases of folks doing stuff that might
> > have weird failures with this new invariant.
>
> This is an interesting concept, and definitely violates my expectations
> that an index belongs to only one repository. I'd need to know more
> about why this was a good design decision before being convinced that
> the relationship should not be many-to-one (index-to-repo).

I'm not sure what I did was a good design decision; I was kind of
exploring and trying to figure things out.  In retrospect, I think it
was probably a bad idea.  But we have various guard rails in the form
of BUG() calls and such when basic assumptions are violated, and here
it seems that you are now making a new basic assumption that an index
belongs to only one repository.  (Even if all current callers happen
to satisfy that assumption, it's not clear to me that git previously
cared if this condition were satisfied or not).  Hence my question
about safety checks.

> > It's entirely possible that --remerge-diff was just so different, and
> > I was so unfamiliar with repo objects (and still kind of am) that I
> > was just doing weird stuff no one has done before, so perhaps no
> > additional checks are needed -- I'm just throwing my gut question out
> > there as food for thought.
> >
> > [1] I have not yet submitted `--remerge-diff` to the list; you haven't
> > missed anything.  I'm waiting for merge-ort to be submitted, reviewed,
> > and merged first.  It's the remerge-diff branch in my fork on GitHub
> > if anyone is curious, though.
>
> I'm interested in what others might say about this idea. I'd be able
> to do most of what I want to do without this patch, but it just gets
> a lot messier. (istate->repo is used in the very next patch in a way
> that would be less clean without it.)

I'm less concerned with your patch as-is (I think your assumption
seems reasonable and I'm fine with labelling my former unsubmitted
patches as erroneous), and more wondering whether others in the future
will accidentally violate assumptions your patch starts encoding...and
whether we can or should do anything about it.  If there's a simple
place we can add a check for such an error, then it probably makes
sense to add one.  If there isn't...then at least we considered it?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 9/9] t1092: test interesting sparse-checkout scenarios
  2021-01-20 19:40   ` Elijah Newren
@ 2021-01-21 11:59     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-21 11:59 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Derrick Stolee

On 1/20/2021 2:40 PM, Elijah Newren wrote:
> On Wed, Jan 20, 2021 at 8:54 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> These also document some behaviors that differ from a full checkout, and
>> possibly in a way that is not intended.
> 
> I'm in favor.  I should turn some of my noted weird behaviors from [1]
> into testcases as well.
> 
> [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

That would be helpful, and adding them here would be helpful.

For the cases where things don't work correctly, it is good
to add "test_expect_failure" cases as a TODO list for the
feature space.

>> +
>> +test_expect_success 'add, commit, checkout' '
>> +       init_repos &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>$1
>> +       EOF
>> +       run_on_all "../edit-contents README.md" &&
>> +
>> +       test_all_match git add README.md &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git commit -m "Add README.md" &&
>> +
>> +       test_all_match git checkout HEAD~1 &&
>> +       test_all_match git checkout - &&
>> +
>> +       run_on_all "../edit-contents README.md" &&
>> +
>> +       test_all_match git add -A &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git commit -m "Extend README.md" &&
>> +
>> +       test_all_match git checkout HEAD~1 &&
>> +       test_all_match git checkout - &&
> 
> Up to here, this test is identical to the previous one.  Why repeat it?

Unintentional. I cleaned this up in a later commit of my
prototype, but forgot to delete that back in this patch.
Will fix.

>> +
>> +       run_on_all "../edit-contents deep/newfile" &&
>> +
>> +       test_all_match git status --porcelain=v2 -uno &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git add . &&
>> +       test_all_match git status --porcelain=v2 &&
>> +       test_all_match git commit -m "add deep/newfile" &&
>> +
>> +       test_all_match git checkout HEAD~1 &&
>> +       test_all_match git checkout -
>> +'
> 
> Think out loud...so you are only adding files that were not previously
> tracked and that would have been part of the sparse cone.  You aren't
> trying to add files that would be outside the sparse cone, or manually
> creating files missing from the working tree due to sparseness and
> then attempting to add them.  (Which is fine, we have to start
> somewhere with our testing.  Also, I think my testcases didn't look at
> the case you did, and only covered one of these other two cases.)

Yes, these tests are currently focusing on the "happy" cases
of what is happening within the sparse cone. I plan to expand
to the more complicated cases later, as I start implementing
them correctly with the sparse-index. However, it would be fine
to have the tests here earlier. Extra documentation of the
expected behavior (or how the current implementation is not
desirable) would be good.

>> +
>> +test_expect_success 'checkout and reset --hard' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout update-folder1 &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       test_all_match git checkout update-deep &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       test_all_match git checkout -b reset-test &&
>> +       test_all_match git reset --hard deepest &&
>> +       test_all_match git reset --hard update-folder1 &&
>> +       test_all_match git reset --hard update-folder2
>> +'
>> +
>> +test_expect_success 'diff --staged' '
>> +       init_repos &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>README.md
>> +       EOF
>> +       run_on_all "../edit-contents" &&
>> +
>> +       test_all_match git diff &&
>> +       test_all_match git diff --staged &&
>> +       test_all_match git add README.md &&
>> +       test_all_match git diff &&
>> +       test_all_match git diff --staged
>> +'
> 
> Also a case where you're operating on a file that matches the sparsity
> patterns (in cone mode, everything in the root directory is included).
> 
>> +test_expect_success 'diff with renames' '
>> +       init_repos &&
>> +
>> +       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
>> +       do
>> +               test_all_match git checkout rename-base &&
>> +               test_all_match git checkout $branch -- .&&
>> +               test_all_match git diff --staged &&
>> +               test_all_match git diff --staged --find-renames || return 1
> 
> Aren't these last two lines the same? (diff.renames defaults to true
> ever since commit 5404c116aa, "diff: activate diff.renames by
> default", 2016-02-25)  Are they only different because you have a
> tweaked config that turns off renames by default?
> 
> Perhaps the first diff line should have a --no-renames flag.

Thanks! Will do.

>> +       done
>> +'
>> +
>> +test_expect_success 'log with pathspec outside sparse definition' '
>> +       init_repos &&
>> +
>> +       test_all_match git log -- a &&
>> +       test_all_match git log -- folder1/a &&
>> +       test_all_match git log -- folder2/a &&
>> +       test_all_match git log -- deep/a &&
>> +       test_all_match git log -- deep/deeper1/a &&
>> +       test_all_match git log -- deep/deeper1/deepest/a &&
>> +
>> +       test_all_match git checkout update-folder1 &&
>> +       test_all_match git log -- folder1/a
>> +'
>> +
>> +test_expect_success 'blame with pathspec inside sparse definition' '
>> +       init_repos &&
>> +
>> +       test_all_match git blame a &&
>> +       test_all_match git blame deep/a &&
>> +       test_all_match git blame deep/deeper1/a &&
>> +       test_all_match git blame deep/deeper1/deepest/a
>> +'
> 
> Good check.
> 
> On a side note going back to a piece of the other thread I didn't get
> a response to, I'm still curious whether
>     git blame -C -C $PATH_INSIDE_SPARSE_DEFINITION
> should (optionally?) behave differently in a sparse checkout.  In
> particular, should it limit its copy detection to other paths also in
> the sparse checkout, or should it always search all other files within
> the repository for copied lines?  Searching just within the sparse
> checkout seems like it could be a really nice performance
> optimization.

All of the "find movements or copies" logic could benefit from a
"universal" option to restrict to the sparse-checkout definition.

>> +
>> +# TODO: blame currently does not support blaming files outside of the
>> +# sparse definition. It complains that the file doesn't exist locally.
> 
> Nice catch.  Yeah, blame tries to check the local working copy for
> changes, and shows those lines with a changed in commit 0000000000.
> We should add a check that says that if the file is SKIP_WORKTREE,
> then we treat it the same as `git blame $PATH HEAD`.

Right. If it's not in the working directory, then we should
interpret that as HEAD.

>> +test_expect_failure 'blame with pathspec outside sparse definition' '
>> +       init_repos &&
>> +
>> +       test_all_match git blame folder1/a &&
>> +       test_all_match git blame folder2/a &&
>> +       test_all_match git blame deep/deeper2/a &&
>> +       test_all_match git blame deep/deeper2/deepest/a
>> +'
>> +
>> +# TODO: reset currently does not behave as expected when in a
>> +# sparse-checkout.
> 
> I'm going to go to test this out to see what it does.  It's the first
> testcase you listed that I didn't know how it worked and couldn't
> figure it out from your comments.  However it turns out, definitely a
> good test to have.
>
>> +test_expect_failure 'checkout and reset (mixed)' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout -b reset-test update-deep &&
>> +       test_all_match git reset deepest &&
>> +       test_all_match git reset update-folder1 &&
>> +       test_all_match git reset update-folder2
>> +'

Here are the failures:

after "git checkout -b reset-test update-deep"

--- full-checkout-out   2021-01-21 11:51:30.059713246 +0000
+++ sparse-checkout-out 2021-01-21 11:51:30.063713235 +0000
@@ -1,5 +1,5 @@
 Unstaged changes after reset:
 M      deep/a
 M      deep/deeper1/deepest/a
-M      folder1/a
-M      folder2/a
+D      folder1/a
+D      folder2/a

If we comment out that line, the failures for
"git reset update-folder1" is:

--- full-checkout-out   2021-01-21 11:52:36.203509990 +0000
+++ sparse-checkout-out 2021-01-21 11:52:36.207509976 +0000
@@ -1,2 +1,2 @@
 Unstaged changes after reset:
-M      folder1/a
+D      folder1/a

If we comment out that line, then the failure for
"git reset update-folder2" is:

--- full-checkout-out   2021-01-21 11:53:55.199197608 +0000
+++ sparse-checkout-out 2021-01-21 11:53:55.207197573 +0000
@@ -1,3 +1,3 @@
 Unstaged changes after reset:
-M      folder1/a
-M      folder2/a
+D      folder1/a
+D      folder2/a

Oddly, when I merge this into our branch in microsoft/git,
these failures disappear. There is something in those commits
that resolve this particular case. I hope to figure that out
sometime.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-20 16:53 ` [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
  2021-01-20 17:26   ` Elijah Newren
@ 2021-01-21 12:53   ` Chris Torek
  2021-01-21 15:56     ` Derrick Stolee
  1 sibling, 1 reply; 65+ messages in thread
From: Chris Torek @ 2021-01-21 12:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git List, Elijah Newren, Derrick Stolee, Derrick Stolee

On Wed, Jan 20, 2021 at 8:58 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The index has an fsmonitor_dirty bitmap that records which index entries
> are "dirty" based on the response from the FSMonitor. If this bitmap
> ever grows larger than the index, then there was an error in how it was
> constructed, and it was probably a developer's bug.

Curious: some of the tests were >=, some were > (not >=).  Now
that they're shared in a function they are all ">".

It's pretty clear that for size-based ones, greater-than is the
right test, but for position ones, isn't it still greater-or-equal?  So
perhaps the calls that pass an actual position should add 1...

Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-21 12:53   ` Chris Torek
@ 2021-01-21 15:56     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-21 15:56 UTC (permalink / raw)
  To: Chris Torek, Derrick Stolee via GitGitGadget
  Cc: Git List, Elijah Newren, Derrick Stolee, Derrick Stolee

On 1/21/2021 7:53 AM, Chris Torek wrote:
> On Wed, Jan 20, 2021 at 8:58 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The index has an fsmonitor_dirty bitmap that records which index entries
>> are "dirty" based on the response from the FSMonitor. If this bitmap
>> ever grows larger than the index, then there was an error in how it was
>> constructed, and it was probably a developer's bug.
> 
> Curious: some of the tests were >=, some were > (not >=).  Now
> that they're shared in a function they are all ">".
> 
> It's pretty clear that for size-based ones, greater-than is the
> right test, but for position ones, isn't it still greater-or-equal?  So
> perhaps the calls that pass an actual position should add 1...

That's a good point. I should pass "pos + 1" in the appropriate
places.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v2 0/8] More index cleanups
  2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-01-20 16:53 ` [PATCH 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03 ` Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 1/8] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
                     ` (9 more replies)
  9 siblings, 10 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee

This is based on ds/cache-tree-basics.

Here are a few more cleanups that are vaguely related to the index. I
discovered these while preparing my sparse-index RFC that I intend to send
early next week.

The biggest patch is the final one, which creates a test script for
comparing sparse-checkouts to full checkouts. There are some commands that
do not behave similarly. This script will be the backbone of my testing
strategy for the sparse-index by adding a new mode to compare
sparse-checkouts with the two index types (full and sparse).


UPDATES IN V2
=============

 * Fixed duplicated test in t1092.

 * Changed the implementation of 'test_region' after I discovered the
   negation doesn't work correctly. (I updated the test to use what was in
   t0500-progress-display.sh at the last minute before v1, but that
   implementation was wrong.) The use of it in t0500-progress-display.sh was
   incorrect, as well.

 * Updated commit messages to be more informative and have fewer typos.

 * I dropped the patch that placed the sparse-checkout patterns in struct
   index_state. I'll re-introduce that in time for the actual use of the
   member.

Thanks, -Stolee

Derrick Stolee (8):
  cache-tree: clean up cache_tree_update()
  cache-tree: extract subtree_pos()
  fsmonitor: de-duplicate BUG()s around dirty bits
  repository: add repo reference to index_state
  name-hash: use trace2 regions for init
  sparse-checkout: load sparse-checkout patterns
  test-lib: test_region looks for trace2 regions
  t1092: test interesting sparse-checkout scenarios

 builtin/sparse-checkout.c                |   5 -
 cache-tree.c                             |  20 +-
 cache-tree.h                             |   2 +
 cache.h                                  |   1 +
 dir.c                                    |  17 ++
 dir.h                                    |   2 +
 fsmonitor.c                              |  27 +-
 name-hash.c                              |   3 +
 repository.c                             |   4 +
 t/t0500-progress-display.sh              |   3 +-
 t/t1092-sparse-checkout-compatibility.sh | 298 +++++++++++++++++++++++
 t/test-lib-functions.sh                  |  40 +++
 unpack-trees.c                           |   6 +-
 13 files changed, 394 insertions(+), 34 deletions(-)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh


base-commit: a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-839%2Fderrickstolee%2Fmore-index-cleanups-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-839/derrickstolee/more-index-cleanups-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/839

Range-diff vs v1:

  1:  0bccfd34ae5 !  1:  f9dccaed0ac cache-tree: clean up cache_tree_update()
     @@ Commit message
          cache-tree: clean up cache_tree_update()
      
          Make the method safer by allocating a cache_tree member for the given
     -    index_state if it is not already present.
     +    index_state if it is not already present. This is preferrable to a
     +    BUG() statement or returning with an error because future callers will
     +    want to populate an empty cache-tree using this method.
      
          Also drop local variables that are used exactly once and can be found
          directly from the 'istate' parameter.
  2:  a6f2406a795 =  2:  84323e04d08 cache-tree: extract subtree_pos()
  3:  838922de2e9 =  3:  31095f9aa0e fsmonitor: de-duplicate BUG()s around dirty bits
  4:  d4ff0468fc0 !  4:  a0d89d7a973 repository: add repo reference to index_state
     @@ Metadata
       ## Commit message ##
          repository: add repo reference to index_state
      
     -    It will be helpful to add behavior to index opertations that might
     +    It will be helpful to add behavior to index operations that might
          trigger an object lookup. Since each index belongs to a specific
          repository, add a 'repo' pointer to struct index_state that allows
          access to this repository.
  5:  3ba4b35f09c =  5:  bc092f5c703 name-hash: use trace2 regions for init
  6:  64358ec7ea2 =  6:  04d1daf7222 sparse-checkout: load sparse-checkout patterns
  7:  91344f5108c <  -:  ----------- sparse-checkout: hold pattern list in index
  8:  8326a9b5320 !  7:  8832ce84623 test-lib: test_region looks for trace2 regions
     @@ Commit message
          entered and left in a given trace2 event log.
      
          There is one existing test (t0500-progress-display.sh) that performs
     -    this check already, so use the helper function instead. More uses will
     -    be added in a later change.
     +    this check already, so use the helper function instead. Note that this
     +    changes the expectations slightly. The old test (incorrectly) used two
     +    patterns for the 'grep' invocation, but this performs an OR of the
     +    patterns, not an AND. This means that as long as one region_enter event
     +    was logged, the test would succeed, even if it was not due to the
     +    progress category.
     +
     +    More uses will be added in a later change.
      
          t6423-merge-rename-directories.sh also greps for region_enter lines, but
          it verifies the number of such lines, which is not the same as an
     @@ t/t0500-progress-display.sh: test_expect_success 'progress generates traces' '
       	# t0212/parse_events.perl intentionally omits regions and data.
      -	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
      -	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
     -+	test_region category progress trace.event &&
     ++	test_region progress "Working hard" trace.event &&
       	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
       	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
       '
     @@ t/test-lib-functions.sh: test_subcommand () {
      +		shift
      +	fi
      +
     -+	grep -e "region_enter" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
     ++	grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
      +	exitcode=$?
      +
      +	if test $exitcode != $expect_exit
     @@ t/test-lib-functions.sh: test_subcommand () {
      +		return 1
      +	fi
      +
     -+	grep -e "region_leave" -e "\"category\":\"$1\",\"label\":\"$2\"" "$3"
     ++	grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
      +	exitcode=$?
      +
      +	if test $exitcode != $expect_exit
  9:  555e210dc03 !  8:  984458007ed t1092: test interesting sparse-checkout scenarios
     @@ Commit message
          These also document some behaviors that differ from a full checkout, and
          possibly in a way that is not intended.
      
     +    The test is designed to be run with "--run=1,X" where 'X' is an
     +    interesting test case. Each test uses 'init_repos' to reset the full and
     +    sparse copies of the initial-repo that is created by the first test
     +    case. This also makes it possible to have test cases leave the working
     +    directory or index in unusual states without disturbing later cases.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh (new) ##
     @@ t/t1092-sparse-checkout-compatibility.sh (new)
      +	init_repos &&
      +
      +	write_script edit-contents <<-\EOF &&
     -+	echo text >>README.md
     -+	EOF
     -+	run_on_all "../edit-contents" &&
     -+
     -+	test_all_match git add README.md &&
     -+	test_all_match git status --porcelain=v2 &&
     -+	test_all_match git commit -m "Add README.md" &&
     -+
     -+	test_all_match git checkout HEAD~1 &&
     -+	test_all_match git checkout - &&
     -+
     -+	run_on_all "../edit-contents" &&
     -+
     -+	test_all_match git add -A &&
     -+	test_all_match git status --porcelain=v2 &&
     -+	test_all_match git commit -m "Extend README.md" &&
     -+
     -+	test_all_match git checkout HEAD~1 &&
     -+	test_all_match git checkout -
     -+'
     -+
     -+test_expect_success 'add, commit, checkout' '
     -+	init_repos &&
     -+
     -+	write_script edit-contents <<-\EOF &&
      +	echo text >>$1
      +	EOF
      +	run_on_all "../edit-contents README.md" &&
     @@ t/t1092-sparse-checkout-compatibility.sh (new)
      +	do
      +		test_all_match git checkout rename-base &&
      +		test_all_match git checkout $branch -- .&&
     -+		test_all_match git diff --staged &&
     ++		test_all_match git diff --staged --no-renames &&
      +		test_all_match git diff --staged --find-renames || return 1
      +	done
      +'

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v2 1/8] cache-tree: clean up cache_tree_update()
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 19:11     ` Junio C Hamano
  2021-01-22 13:03   ` [PATCH v2 2/8] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.

Also drop local variables that are used exactly once and can be found
directly from the 'istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 3f1a8d4f1b7..c1e49901c17 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-	struct cache_tree *it = istate->cache_tree;
-	struct cache_entry **cache = istate->cache;
-	int entries = istate->cache_nr;
-	int skip, i = verify_cache(cache, entries, flags);
+	int skip, i;
+
+	i = verify_cache(istate->cache, istate->cache_nr, flags);
 
 	if (i)
 		return i;
+
+	if (!istate->cache_tree)
+		istate->cache_tree = cache_tree();
+
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", the_repository);
-	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
+		       "", 0, &skip, flags);
 	trace2_region_leave("cache_tree", "update", the_repository);
 	trace_performance_leave("cache_tree_update");
 	if (i < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 2/8] cache-tree: extract subtree_pos()
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 1/8] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 6 +++---
 cache-tree.h | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index c1e49901c17..2b130dd5e19 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -45,7 +45,7 @@ static int subtree_name_cmp(const char *one, int onelen,
 	return memcmp(one, two, onelen);
 }
 
-static int subtree_pos(struct cache_tree *it, const char *path, int pathlen)
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen)
 {
 	struct cache_tree_sub **down = it->down;
 	int lo, hi;
@@ -72,7 +72,7 @@ static struct cache_tree_sub *find_subtree(struct cache_tree *it,
 					   int create)
 {
 	struct cache_tree_sub *down;
-	int pos = subtree_pos(it, path, pathlen);
+	int pos = cache_tree_subtree_pos(it, path, pathlen);
 	if (0 <= pos)
 		return it->down[pos];
 	if (!create)
@@ -123,7 +123,7 @@ static int do_invalidate_path(struct cache_tree *it, const char *path)
 	it->entry_count = -1;
 	if (!*slash) {
 		int pos;
-		pos = subtree_pos(it, path, namelen);
+		pos = cache_tree_subtree_pos(it, path, namelen);
 		if (0 <= pos) {
 			cache_tree_free(&it->down[pos]->cache_tree);
 			free(it->down[pos]);
diff --git a/cache-tree.h b/cache-tree.h
index 639bfa5340e..8efeccebfc9 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -27,6 +27,8 @@ void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct index_state *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 1/8] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 2/8] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 19:18     ` Junio C Hamano
  2021-01-22 13:03   ` [PATCH v2 4/8] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The index has an fsmonitor_dirty bitmap that records which index entries
are "dirty" based on the response from the FSMonitor. If this bitmap
ever grows larger than the index, then there was an error in how it was
constructed, and it was probably a developer's bug.

There are several BUG() statements that are very similar, so replace
these uses with a simpler assert_index_minimum(). Since there is one
caller that uses a custom 'pos' value instead of the bit_size member, we
cannot simplify it too much. However, the error string is identical in
each, so this simplifies things.

The end result is that the code is simpler to read while also preserving
these assertions for developers in the FSMonitor space.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 fsmonitor.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index ca031c3abb8..52a50a9545a 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -13,14 +13,19 @@
 
 struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR);
 
+static void assert_index_minimum(struct index_state *istate, size_t pos)
+{
+	if (pos > istate->cache_nr)
+		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
+		    (uintmax_t)pos, istate->cache_nr);
+}
+
 static void fsmonitor_ewah_callback(size_t pos, void *is)
 {
 	struct index_state *istate = (struct index_state *)is;
 	struct cache_entry *ce;
 
-	if (pos >= istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
-		    (uintmax_t)pos, istate->cache_nr);
+	assert_index_minimum(istate, pos);
 
 	ce = istate->cache[pos];
 	ce->ce_flags &= ~CE_FSMONITOR_VALID;
@@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
 	}
 	istate->fsmonitor_dirty = fsmonitor_dirty;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
 	return 0;
@@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
 	uint32_t ewah_size = 0;
 	int fixup = 0;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
 	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
@@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate)
 			}
 
 			/* Mark all previously saved entries as dirty */
-			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+			assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
 
 			refresh_fsmonitor(istate);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 4/8] repository: add repo reference to index_state
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 19:23     ` Junio C Hamano
  2021-01-22 13:03   ` [PATCH v2 5/8] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.

This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h      | 1 +
 repository.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/cache.h b/cache.h
index 71097657489..f9c7a603841 100644
--- a/cache.h
+++ b/cache.h
@@ -328,6 +328,7 @@ struct index_state {
 	struct ewah_bitmap *fsmonitor_dirty;
 	struct mem_pool *ce_mem_pool;
 	struct progress *progress;
+	struct repository *repo;
 };
 
 /* Name hashing */
diff --git a/repository.c b/repository.c
index a4174ddb062..67a4c1da2d9 100644
--- a/repository.c
+++ b/repository.c
@@ -264,6 +264,10 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
+	/* Complete the double-reference */
+	if (!repo->index->repo)
+		repo->index->repo = repo;
+
 	return read_index_from(repo->index, repo->index_file, repo->gitdir);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 5/8] name-hash: use trace2 regions for init
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 4/8] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 6/8] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The lazy_init_name_hash() populates a hashset with all filenames and
another with all directories represented in the index. This is run only
if we need to use the hashsets to check for existence or case-folding
renames.

Place trace2 regions where there is already a performance trace.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 5d3c7b12c18..4e03fac9bb1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -7,6 +7,7 @@
  */
 #include "cache.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -577,6 +578,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	trace_performance_enter();
+	trace2_region_enter("index", "name-hash-init", istate->repo);
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -597,6 +599,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
+	trace2_region_leave("index", "name-hash-init", istate->repo);
 	trace_performance_leave("initialize name hash");
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 6/8] sparse-checkout: load sparse-checkout patterns
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 5/8] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 13:03   ` [PATCH v2 7/8] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.

The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.

The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c |  5 -----
 dir.c                     | 17 +++++++++++++++++
 dir.h                     |  2 ++
 unpack-trees.c            |  6 +-----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index e3140db2a0a..2306a9ad98e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -22,11 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = {
 	NULL
 };
 
-static char *get_sparse_checkout_filename(void)
-{
-	return git_pathdup("info/sparse-checkout");
-}
-
 static void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 {
 	int i;
diff --git a/dir.c b/dir.c
index d637461da5c..d153a63bbd1 100644
--- a/dir.c
+++ b/dir.c
@@ -2998,6 +2998,23 @@ void setup_standard_excludes(struct dir_struct *dir)
 	}
 }
 
+char *get_sparse_checkout_filename(void)
+{
+	return git_pathdup("info/sparse-checkout");
+}
+
+int get_sparse_checkout_patterns(struct pattern_list *pl)
+{
+	int res;
+	char *sparse_filename = get_sparse_checkout_filename();
+
+	pl->use_cone_patterns = core_sparse_checkout_cone;
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL);
+
+	free(sparse_filename);
+	return res;
+}
+
 int remove_path(const char *name)
 {
 	char *slash;
diff --git a/dir.h b/dir.h
index a3c40dec516..facfae47402 100644
--- a/dir.h
+++ b/dir.h
@@ -448,6 +448,8 @@ int is_empty_dir(const char *dir);
 
 void setup_standard_excludes(struct dir_struct *dir);
 
+char *get_sparse_checkout_filename(void);
+int get_sparse_checkout_patterns(struct pattern_list *pl);
 
 /* Constants for remove_dir_recursively: */
 
diff --git a/unpack-trees.c b/unpack-trees.c
index af6e9b9c2fd..837b8bb42fb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1549,14 +1549,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 static void populate_from_existing_patterns(struct unpack_trees_options *o,
 					    struct pattern_list *pl)
 {
-	char *sparse = git_pathdup("info/sparse-checkout");
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0)
+	if (get_sparse_checkout_patterns(pl) < 0)
 		o->skip_sparse_checkout = 1;
 	else
 		o->pl = pl;
-	free(sparse);
 }
 
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 7/8] test-lib: test_region looks for trace2 regions
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 6/8] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 19:42     ` Junio C Hamano
  2021-01-22 13:03   ` [PATCH v2 8/8] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.

Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.

There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. Note that this
changes the expectations slightly. The old test (incorrectly) used two
patterns for the 'grep' invocation, but this performs an OR of the
patterns, not an AND. This means that as long as one region_enter event
was logged, the test would succeed, even if it was not due to the
progress category.

More uses will be added in a later change.

t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t0500-progress-display.sh |  3 +--
 t/test-lib-functions.sh     | 40 +++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index 1ed1df351cb..84cce345e7d 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' '
 		"Working hard" <in 2>stderr &&
 
 	# t0212/parse_events.perl intentionally omits regions and data.
-	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
-	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
+	test_region progress "Working hard" trace.event &&
 	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
 	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 999982fe4a9..1170f7108ac 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1655,3 +1655,43 @@ test_subcommand () {
 		grep "\[$expr\]"
 	fi
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_region [!] <category> <label> git <command> <args>...
+#
+# For example, to look for trace2_region_enter("index", "do_read_index", repo)
+# in an invocation of "git checkout HEAD~1", run
+#
+#	GIT_TRACE2_EVENT="$(pwd)/trace.txt" GIT_TRACE2_EVENT_NESTING=10 \
+#		git checkout HEAD~1 &&
+#	test_region index do_read_index <trace.txt
+#
+# If the first parameter passed is !, this instead checks that
+# the given region was not entered.
+#
+test_region () {
+	local expect_exit=0
+	if test "$1" = "!"
+	then
+		expect_exit=1
+		shift
+	fi
+
+	grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+
+	grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 8/8] t1092: test interesting sparse-checkout scenarios
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 7/8] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-22 13:03   ` Derrick Stolee via GitGitGadget
  2021-01-22 19:49   ` [PATCH v2 0/8] More index cleanups Elijah Newren
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-22 13:03 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These also document some behaviors that differ from a full checkout, and
possibly in a way that is not intended.

The test is designed to be run with "--run=1,X" where 'X' is an
interesting test case. Each test uses 'init_repos' to reset the full and
sparse copies of the initial-repo that is created by the first test
case. This also makes it possible to have test cases leave the working
directory or index in unusual states without disturbing later cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 298 +++++++++++++++++++++++
 1 file changed, 298 insertions(+)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
new file mode 100755
index 00000000000..141cbe822b6
--- /dev/null
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -0,0 +1,298 @@
+#!/bin/sh
+
+test_description='compare full workdir to sparse workdir'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init initial-repo &&
+	(
+		cd initial-repo &&
+		echo a >a &&
+		echo "after deep" >e &&
+		echo "after folder1" >g &&
+		mkdir folder1 folder2 deep x &&
+		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1/deepest &&
+		echo "after deeper1" >deep/e &&
+		echo "after deepest" >deep/deeper1/e &&
+		cp a folder1 &&
+		cp a folder2 &&
+		cp a deep &&
+		cp a deep/deeper1 &&
+		cp a deep/deeper2 &&
+		cp a deep/deeper1/deepest &&
+		git add . &&
+		git commit -m "initial commit" &&
+		git checkout -b base &&
+		for dir in folder1 folder2 deep
+		do
+			git checkout -b update-$dir &&
+			echo "updated $dir" >$dir/a &&
+			git commit -a -m "update $dir" || return 1
+		done &&
+
+		git checkout -b rename-base base &&
+		echo >folder1/larger-content <<-\EOF &&
+		matching
+		lines
+		help
+		inexact
+		renames
+		EOF
+		cp folder1/larger-content folder2/ &&
+		cp folder1/larger-content deep/deeper1/ &&
+		git add . &&
+		git commit -m "add interesting rename content" &&
+
+		git checkout -b rename-out-to-out rename-base &&
+		mv folder1/a folder2/b &&
+		mv folder1/larger-content folder2/edited-content &&
+		echo >>folder2/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to folder2/..." &&
+
+		git checkout -b rename-out-to-in rename-base &&
+		mv folder1/a deep/deeper1/b &&
+		mv folder1/larger-content deep/deeper1/edited-content &&
+		echo >>deep/deeper1/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to deep/deeper1/..." &&
+
+		git checkout -b rename-in-to-out rename-base &&
+		mv deep/deeper1/a folder1/b &&
+		mv deep/deeper1/larger-content folder1/edited-content &&
+		echo >>folder1/edited-content &&
+		git add . &&
+		git commit -m "rename deep/deeper1/... to folder1/..." &&
+
+		git checkout -b deepest base &&
+		echo "updated deepest" >deep/deeper1/deepest/a &&
+		git commit -a -m "update deepest" &&
+
+		git checkout -f base &&
+		git reset --hard
+	)
+'
+
+init_repos () {
+	rm -rf full-checkout sparse-checkout sparse-index &&
+
+	# create repos in initial state
+	cp -r initial-repo full-checkout &&
+	git -C full-checkout reset --hard &&
+
+	cp -r initial-repo sparse-checkout &&
+	git -C sparse-checkout reset --hard &&
+	git -C sparse-checkout sparse-checkout init --cone &&
+
+	# initialize sparse-checkout definitions
+	git -C sparse-checkout sparse-checkout set deep
+}
+
+run_on_sparse () {
+	(
+		cd sparse-checkout &&
+		$* >../sparse-checkout-out 2>../sparse-checkout-err
+	)
+}
+
+run_on_all () {
+	(
+		cd full-checkout &&
+		$* >../full-checkout-out 2>../full-checkout-err
+	) &&
+	run_on_sparse $*
+}
+
+test_all_match () {
+	run_on_all $* &&
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-err sparse-checkout-err
+}
+
+test_expect_success 'status with options' '
+	init_repos &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	run_on_all "touch README.md" &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno
+'
+
+test_expect_success 'add, commit, checkout' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Add README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add -A &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Extend README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents deep/newfile" &&
+
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "add deep/newfile" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout -
+'
+
+test_expect_success 'checkout and reset --hard' '
+	init_repos &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout update-deep &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout -b reset-test &&
+	test_all_match git reset --hard deepest &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset --hard update-folder2
+'
+
+test_expect_success 'diff --staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>README.md
+	EOF
+	run_on_all "../edit-contents" &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	test_all_match git add README.md &&
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
+test_expect_success 'diff with renames' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		test_all_match git checkout rename-base &&
+		test_all_match git checkout $branch -- .&&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'log with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git log -- a &&
+	test_all_match git log -- folder1/a &&
+	test_all_match git log -- folder2/a &&
+	test_all_match git log -- deep/a &&
+	test_all_match git log -- deep/deeper1/a &&
+	test_all_match git log -- deep/deeper1/deepest/a &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git log -- folder1/a
+'
+
+test_expect_success 'blame with pathspec inside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame a &&
+	test_all_match git blame deep/a &&
+	test_all_match git blame deep/deeper1/a &&
+	test_all_match git blame deep/deeper1/deepest/a
+'
+
+# TODO: blame currently does not support blaming files outside of the
+# sparse definition. It complains that the file doesn't exist locally.
+test_expect_failure 'blame with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame folder1/a &&
+	test_all_match git blame folder2/a &&
+	test_all_match git blame deep/deeper2/a &&
+	test_all_match git blame deep/deeper2/deepest/a
+'
+
+# TODO: reset currently does not behave as expected when in a
+# sparse-checkout.
+test_expect_failure 'checkout and reset (mixed)' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset deepest &&
+	test_all_match git reset update-folder1 &&
+	test_all_match git reset update-folder2
+'
+
+test_expect_success 'merge' '
+	init_repos &&
+
+	test_all_match git checkout -b merge update-deep &&
+	test_all_match git merge -m "folder1" update-folder1 &&
+	test_all_match git rev-parse HEAD^{tree} &&
+	test_all_match git merge -m "folder2" update-folder2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
+test_expect_success 'merge with outside renames' '
+	init_repos &&
+
+	for type in out-to-out out-to-in in-to-out
+	do
+		test_all_match git reset --hard &&
+		test_all_match git checkout -f -b merge-$type update-deep &&
+		test_all_match git merge -m "$type" rename-$type &&
+		test_all_match git rev-parse HEAD^{tree} || return 1
+	done
+'
+
+test_expect_success 'clean' '
+	init_repos &&
+
+	echo bogus >>.gitignore &&
+	run_on_all cp ../.gitignore . &&
+	test_all_match git add .gitignore &&
+	test_all_match git commit -m ignore-bogus-files &&
+
+	run_on_sparse mkdir folder1 &&
+	run_on_all touch folder1/bogus &&
+
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git clean -f &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xdf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_path_is_dir sparse-checkout/folder1
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 1/8] cache-tree: clean up cache_tree_update()
  2021-01-22 13:03   ` [PATCH v2 1/8] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
@ 2021-01-22 19:11     ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 19:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Make the method safer by allocating a cache_tree member for the given
> index_state if it is not already present. This is preferrable to a
> BUG() statement or returning with an error because future callers will
> want to populate an empty cache-tree using this method.

How do the current callers prepare the istate to be passed to this
function?  The implications of this question obviously are:

 - why is it unreasonable for future callers to follow suit?  or

 - now the callers are no longer responsible to give an empty
   cache_tree to istate before calling this function, shouldn't
   existing callers lose code that is no longer needed?

I would imagine that the first is easily answered with "but that
would be more code on the callers' side", but then the second one
becomes even more relevant, no?

> Also drop local variables that are used exactly once and can be found
> directly from the 'istate' parameter.

It actuall is used twice, no?

I do find it an improvement because it makes it clearer that
istate->{cache,cache_nr} comes in pairs.

I wonder if verify_cache() wants to take istate as a parameter,
replacing the first two?  update_one() shifts where it starts
working in the array and reduces the number of entries as it digs
deeper, so it still has to keep taking the (base, nr) pair (iow, its
second and third parameters cannot be replaced with an istate).

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache-tree.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 3f1a8d4f1b7..c1e49901c17 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it,
>  
>  int cache_tree_update(struct index_state *istate, int flags)
>  {
> -	struct cache_tree *it = istate->cache_tree;
> -	struct cache_entry **cache = istate->cache;
> -	int entries = istate->cache_nr;
> -	int skip, i = verify_cache(cache, entries, flags);
> +	int skip, i;
> +
> +	i = verify_cache(istate->cache, istate->cache_nr, flags);
>  
>  	if (i)
>  		return i;
> +
> +	if (!istate->cache_tree)
> +		istate->cache_tree = cache_tree();
> +
>  	trace_performance_enter();
>  	trace2_region_enter("cache_tree", "update", the_repository);
> -	i = update_one(it, cache, entries, "", 0, &skip, flags);
> +	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> +		       "", 0, &skip, flags);
>  	trace2_region_leave("cache_tree", "update", the_repository);
>  	trace_performance_leave("cache_tree_update");
>  	if (i < 0)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-22 13:03   ` [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
@ 2021-01-22 19:18     ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 19:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The index has an fsmonitor_dirty bitmap that records which index entries
> are "dirty" based on the response from the FSMonitor. If this bitmap
> ever grows larger than the index, then there was an error in how it was
> constructed, and it was probably a developer's bug.
>
> There are several BUG() statements that are very similar, so replace
> these uses with a simpler assert_index_minimum(). Since there is one
> caller that uses a custom 'pos' value instead of the bit_size member, we
> cannot simplify it too much. However, the error string is identical in
> each, so this simplifies things.

Also that single caller with a custom 'pos' used to allow 'pos' to
point one beyond the end of istate->cache[] array, but now it is
forbidden.  If this is a strict bugfix, it probably deserves a
mention here.

> The end result is that the code is simpler to read while also preserving
> these assertions for developers in the FSMonitor space.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  fsmonitor.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index ca031c3abb8..52a50a9545a 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -13,14 +13,19 @@
>  
>  struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR);
>  
> +static void assert_index_minimum(struct index_state *istate, size_t pos)
> +{
> +	if (pos > istate->cache_nr)
> +		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> +		    (uintmax_t)pos, istate->cache_nr);
> +}
> +
>  static void fsmonitor_ewah_callback(size_t pos, void *is)
>  {
>  	struct index_state *istate = (struct index_state *)is;
>  	struct cache_entry *ce;
>  
> -	if (pos >= istate->cache_nr)
> -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> -		    (uintmax_t)pos, istate->cache_nr);
> +	assert_index_minimum(istate, pos);
>  
>  	ce = istate->cache[pos];
>  	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> @@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
>  	}
>  	istate->fsmonitor_dirty = fsmonitor_dirty;
>  
> -	if (!istate->split_index &&
> -	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +	if (!istate->split_index)
> +		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>  
>  	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
>  	return 0;
> @@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
>  	uint32_t ewah_size = 0;
>  	int fixup = 0;
>  
> -	if (!istate->split_index &&
> -	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +	if (!istate->split_index)
> +		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>  
>  	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
>  	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
> @@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate)
>  			}
>  
>  			/* Mark all previously saved entries as dirty */
> -			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +			assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
>  			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
>  
>  			refresh_fsmonitor(istate);

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 4/8] repository: add repo reference to index_state
  2021-01-22 13:03   ` [PATCH v2 4/8] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
@ 2021-01-22 19:23     ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 19:23 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> It will be helpful to add behavior to index operations that might
> trigger an object lookup. Since each index belongs to a specific
> repository, add a 'repo' pointer to struct index_state that allows
> access to this repository.
>
> This will prevent future changes from needing to pass an additional
> 'struct repository *repo' parameter and instead rely only on the 'struct
> index_state *istate' parameter.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---

I think this makes sense, but shouldn't we insist on these
bidirectional links to point at each other?  Otherwise we cannot
simplify the function signatures safely later.  That is ...

> +	/* Complete the double-reference */
> +	if (!repo->index->repo)
> +		repo->index->repo = repo;
> +

	else if (repo->index->repo != repo)
		BUG("the repo->index instance does not belong to the repo???");

... a check like this?

>  	return read_index_from(repo->index, repo->index_file, repo->gitdir);
>  }

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 7/8] test-lib: test_region looks for trace2 regions
  2021-01-22 13:03   ` [PATCH v2 7/8] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-22 19:42     ` Junio C Hamano
  2021-01-23 18:36       ` Derrick Stolee
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 19:42 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> ... Note that this
> changes the expectations slightly. The old test (incorrectly) used two
> patterns for the 'grep' invocation, but this performs an OR of the
> patterns, not an AND. This means that as long as one region_enter event
> was logged, the test would succeed, even if it was not due to the
> progress category.

>  	# t0212/parse_events.perl intentionally omits regions and data.
> -	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
> -	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
> +	test_region progress "Working hard" trace.event &&

So we do want "region_enter" and the "category":"progress" on the
same line in the event file, but as long as "category":"progress"
exists, both will pass, regardless of enter/leave.  And ...

> +test_region () {
> +	local expect_exit=0
> +	if test "$1" = "!"
> +	then
> +		expect_exit=1
> +		shift
> +	fi
> +
> +	grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"

... this makes sure there is enter/category on a line (and
leave/category on a line with another check).  Makes sense.

But...

	test_region '!' '\(unmatching capture)' 'two' 'three'

would try to use an invalid regexp and cause grep to exit with 2,
which would mean ...

> +	exitcode=$?
> +
> +	if test $exitcode != $expect_exit

... this will not trigger and we return "success" (i.e. "failed as
expected")?

	Clarification.  The point is *NOT* that the grep pattern is
	not robust against funnies in $1 and $2---after all, these
	strings are under our control.  The point is what should
	happen when "grep" exits with an error when asked to ensure
	that there is no region detected.

> +	then
> +		return 1
> +	fi
> +
> +	grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"

The same comment on "what about an error from grep" applies to this
one.

It might be easier to read to avoid having to say too many
backslash-quoted double quotes:

	grep -e	'"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"

This comment applies to the earlier "grep", too.

> +	exitcode=$?
> +
> +	if test $exitcode != $expect_exit
> +	then
> +		return 1
> +	fi
> +}

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 0/8] More index cleanups
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-01-22 13:03   ` [PATCH v2 8/8] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
@ 2021-01-22 19:49   ` Elijah Newren
  2021-01-23 18:47     ` Derrick Stolee
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-22 19:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Derrick Stolee

On Fri, Jan 22, 2021 at 5:04 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is based on ds/cache-tree-basics.
>
> Here are a few more cleanups that are vaguely related to the index. I
> discovered these while preparing my sparse-index RFC that I intend to send
> early next week.
>
> The biggest patch is the final one, which creates a test script for
> comparing sparse-checkouts to full checkouts. There are some commands that
> do not behave similarly. This script will be the backbone of my testing
> strategy for the sparse-index by adding a new mode to compare
> sparse-checkouts with the two index types (full and sparse).
>
>
> UPDATES IN V2
> =============
>
>  * Fixed duplicated test in t1092.
>
>  * Changed the implementation of 'test_region' after I discovered the
>    negation doesn't work correctly. (I updated the test to use what was in
>    t0500-progress-display.sh at the last minute before v1, but that
>    implementation was wrong.) The use of it in t0500-progress-display.sh was
>    incorrect, as well.
>
>  * Updated commit messages to be more informative and have fewer typos.
>
>  * I dropped the patch that placed the sparse-checkout patterns in struct
>    index_state. I'll re-introduce that in time for the actual use of the
>    member.

You've addressed all my feedback from v1, but it looks like you missed
the pos + 1 changes highlighted by Chris in his review of patch 3.
Oversight?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 7/8] test-lib: test_region looks for trace2 regions
  2021-01-22 19:42     ` Junio C Hamano
@ 2021-01-23 18:36       ` Derrick Stolee
  2021-01-23 18:50         ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 18:36 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee

On 1/22/2021 2:42 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +	grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
> 
> ... this makes sure there is enter/category on a line (and
> leave/category on a line with another check).  Makes sense.
> 
> But...
> 
> 	test_region '!' '\(unmatching capture)' 'two' 'three'
> 
> would try to use an invalid regexp and cause grep to exit with 2,
> which would mean ...
> 
>> +	exitcode=$?
>> +
>> +	if test $exitcode != $expect_exit
> 
> ... this will not trigger and we return "success" (i.e. "failed as
> expected")?

Am I misunderstanding something here? If exitcode is 2, then this
will always trigger and return 1, signaling a failure. That would
propagate to the parent test and cause the test to fail. That seems
like the correct intention, but I'm not 100% confident about that.

> 	Clarification.  The point is *NOT* that the grep pattern is
> 	not robust against funnies in $1 and $2---after all, these
> 	strings are under our control.  The point is what should
> 	happen when "grep" exits with an error when asked to ensure
> 	that there is no region detected.

I'll be more robust to these in the next version. We'll expect
exit code equal to zero or _not_ equal to zero, depending on the
presence of '!'. This has the downside of returning success for
bad input strings when '!' is specified.

Basically, the approach I'm taking for v3 is here:

	if [test $expect_exit = 1] && [test $exitcode = 0]
	then
		return 1
	elif [test $expect_exit = 0] && [test $exitcode != 0]
	then
		return 1
	fi

>> +	grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
> 
> The same comment on "what about an error from grep" applies to this
> one.
> 
> It might be easier to read to avoid having to say too many
> backslash-quoted double quotes:
> 
> 	grep -e	'"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
> 
> This comment applies to the earlier "grep", too.

Thanks. This does look a bit cleaner.

-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 0/8] More index cleanups
  2021-01-22 19:49   ` [PATCH v2 0/8] More index cleanups Elijah Newren
@ 2021-01-23 18:47     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 18:47 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On 1/22/2021 2:49 PM, Elijah Newren wrote:
> 
> You've addressed all my feedback from v1, but it looks like you missed
> the pos + 1 changes highlighted by Chris in his review of patch 3.
> Oversight?
 
Yes, oversight. Thank you for the reminder.

-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 7/8] test-lib: test_region looks for trace2 regions
  2021-01-23 18:36       ` Derrick Stolee
@ 2021-01-23 18:50         ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 18:50 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren, Derrick Stolee,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>>> +	if test $exitcode != $expect_exit
>> 
>> ... this will not trigger and we return "success" (i.e. "failed as
>> expected")?
>
> Am I misunderstanding something here? If exitcode is 2, then this
> will always trigger and return 1, signaling a failure.

Are, please disregard.  Yes, the above does the right thing already.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v3 0/9] More index cleanups
  2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-01-22 19:49   ` [PATCH v2 0/8] More index cleanups Elijah Newren
@ 2021-01-23 19:58   ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
                       ` (9 more replies)
  9 siblings, 10 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee

This is based on ds/cache-tree-basics.

Here are a few more cleanups that are vaguely related to the index. I
discovered these while preparing my sparse-index RFC that I intend to send
early next week.

The biggest patch is the final one, which creates a test script for
comparing sparse-checkouts to full checkouts. There are some commands that
do not behave similarly. This script will be the backbone of my testing
strategy for the sparse-index by adding a new mode to compare
sparse-checkouts with the two index types (full and sparse).


UPDATES IN V3
=============

 * Callers to cache_tree_update() no longer initialize the cache_tree in
   advance.

 * Added a patch to update verify_cache() prototype.

 * Added missing "pos + 1" in fsmonitor.c.

 * Added a BUG() statement when repo->istate->repo is already populated, but
   not equal to repo.

 * Cleaned up test_region pattern quoting. Thanks, Junio!

Thanks, -Stolee

Derrick Stolee (9):
  cache-tree: clean up cache_tree_update()
  cache-tree: simplify verify_cache() prototype
  cache-tree: extract subtree_pos()
  fsmonitor: de-duplicate BUG()s around dirty bits
  repository: add repo reference to index_state
  name-hash: use trace2 regions for init
  sparse-checkout: load sparse-checkout patterns
  test-lib: test_region looks for trace2 regions
  t1092: test interesting sparse-checkout scenarios

 builtin/checkout.c                       |   3 -
 builtin/sparse-checkout.c                |   5 -
 cache-tree.c                             |  38 +--
 cache-tree.h                             |   2 +
 cache.h                                  |   1 +
 dir.c                                    |  17 ++
 dir.h                                    |   2 +
 fsmonitor.c                              |  27 +-
 name-hash.c                              |   3 +
 repository.c                             |   6 +
 sequencer.c                              |   3 -
 t/t0500-progress-display.sh              |   3 +-
 t/t1092-sparse-checkout-compatibility.sh | 301 +++++++++++++++++++++++
 t/test-lib-functions.sh                  |  42 ++++
 unpack-trees.c                           |   8 +-
 15 files changed, 408 insertions(+), 53 deletions(-)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh


base-commit: a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-839%2Fderrickstolee%2Fmore-index-cleanups-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-839/derrickstolee/more-index-cleanups-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/839

Range-diff vs v2:

  1:  f9dccaed0ac !  1:  bdc8ecca3d2 cache-tree: clean up cache_tree_update()
     @@ Commit message
          BUG() statement or returning with an error because future callers will
          want to populate an empty cache-tree using this method.
      
     -    Also drop local variables that are used exactly once and can be found
     -    directly from the 'istate' parameter.
     +    Callers can also remove their conditional allocations of cache_tree.
     +
     +    Also drop local variables that can be found directly from the 'istate'
     +    parameter.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## builtin/checkout.c ##
     +@@ builtin/checkout.c: static int merge_working_tree(const struct checkout_opts *opts,
     + 		}
     + 	}
     + 
     +-	if (!active_cache_tree)
     +-		active_cache_tree = cache_tree();
     +-
     + 	if (!cache_tree_fully_valid(active_cache_tree))
     + 		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
     + 
     +
       ## cache-tree.c ##
      @@ cache-tree.c: static int update_one(struct cache_tree *it,
       
     @@ cache-tree.c: static int update_one(struct cache_tree *it,
       	trace2_region_leave("cache_tree", "update", the_repository);
       	trace_performance_leave("cache_tree_update");
       	if (i < 0)
     +@@ cache-tree.c: static int write_index_as_tree_internal(struct object_id *oid,
     + 		cache_tree_valid = 0;
     + 	}
     + 
     +-	if (!index_state->cache_tree)
     +-		index_state->cache_tree = cache_tree();
     +-
     + 	if (!cache_tree_valid && cache_tree_update(index_state, flags) < 0)
     + 		return WRITE_TREE_UNMERGED_INDEX;
     + 
     +
     + ## sequencer.c ##
     +@@ sequencer.c: static int do_recursive_merge(struct repository *r,
     + 
     + static struct object_id *get_cache_tree_oid(struct index_state *istate)
     + {
     +-	if (!istate->cache_tree)
     +-		istate->cache_tree = cache_tree();
     +-
     + 	if (!cache_tree_fully_valid(istate->cache_tree))
     + 		if (cache_tree_update(istate, 0)) {
     + 			error(_("unable to update cache tree"));
     +
     + ## unpack-trees.c ##
     +@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
     + 		if (!ret) {
     + 			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
     + 				cache_tree_verify(the_repository, &o->result);
     +-			if (!o->result.cache_tree)
     +-				o->result.cache_tree = cache_tree();
     + 			if (!cache_tree_fully_valid(o->result.cache_tree))
     + 				cache_tree_update(&o->result,
     + 						  WRITE_TREE_SILENT |
  -:  ----------- >  2:  1b8b5680094 cache-tree: simplify verify_cache() prototype
  2:  84323e04d08 =  3:  314b6b34f75 cache-tree: extract subtree_pos()
  3:  31095f9aa0e !  4:  4e688d25f8c fsmonitor: de-duplicate BUG()s around dirty bits
     @@ Commit message
          cannot simplify it too much. However, the error string is identical in
          each, so this simplifies things.
      
     +    Be sure to add one when checking if a position if valid, since the
     +    minimum is a bound on the expected size.
     +
          The end result is that the code is simpler to read while also preserving
          these assertions for developers in the FSMonitor space.
      
     @@ fsmonitor.c
      -	if (pos >= istate->cache_nr)
      -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
      -		    (uintmax_t)pos, istate->cache_nr);
     -+	assert_index_minimum(istate, pos);
     ++	assert_index_minimum(istate, pos + 1);
       
       	ce = istate->cache[pos];
       	ce->ce_flags &= ~CE_FSMONITOR_VALID;
  4:  a0d89d7a973 !  5:  6373997e05c repository: add repo reference to index_state
     @@ Commit message
          repository, add a 'repo' pointer to struct index_state that allows
          access to this repository.
      
     +    Add a BUG() statement if the repo already has an index, and the index
     +    already has a repo, but somehow the index points to a different repo.
     +
          This will prevent future changes from needing to pass an additional
          'struct repository *repo' parameter and instead rely only on the 'struct
          index_state *istate' parameter.
     @@ repository.c: int repo_read_index(struct repository *repo)
      +	/* Complete the double-reference */
      +	if (!repo->index->repo)
      +		repo->index->repo = repo;
     ++	else if (repo->index->repo != repo)
     ++		BUG("repo's index should point back at itself");
      +
       	return read_index_from(repo->index, repo->index_file, repo->gitdir);
       }
  5:  bc092f5c703 =  6:  9b545d7dbec name-hash: use trace2 regions for init
  6:  04d1daf7222 =  7:  554cc7647e6 sparse-checkout: load sparse-checkout patterns
  7:  8832ce84623 !  8:  b37181bdec4 test-lib: test_region looks for trace2 regions
     @@ t/test-lib-functions.sh: test_subcommand () {
      +		shift
      +	fi
      +
     -+	grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
     ++	grep -e	'"region_enter".*"category":"'"$1"'","label":"'"$2"\" "$3"
      +	exitcode=$?
      +
     -+	if test $exitcode != $expect_exit
     ++	if test $exitcode != $expect_exit = 1]
      +	then
      +		return 1
      +	fi
      +
     -+	grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
     ++	grep -e	'"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
      +	exitcode=$?
      +
     -+	if test $exitcode != $expect_exit
     ++	if test $exitcode != $expect_exit = 1]
      +	then
      +		return 1
      +	fi
     ++
     ++	return 0
      +}
  8:  984458007ed !  9:  72f925353d3 t1092: test interesting sparse-checkout scenarios
     @@ t/t1092-sparse-checkout-compatibility.sh (new)
      +		echo a >a &&
      +		echo "after deep" >e &&
      +		echo "after folder1" >g &&
     ++		echo "after x" >z &&
      +		mkdir folder1 folder2 deep x &&
      +		mkdir deep/deeper1 deep/deeper2 &&
      +		mkdir deep/deeper1/deepest &&
     @@ t/t1092-sparse-checkout-compatibility.sh (new)
      +		echo "after deepest" >deep/deeper1/e &&
      +		cp a folder1 &&
      +		cp a folder2 &&
     ++		cp a x &&
      +		cp a deep &&
      +		cp a deep/deeper1 &&
      +		cp a deep/deeper2 &&
      +		cp a deep/deeper1/deepest &&
     ++		cp -r deep/deeper1/deepest deep/deeper2 &&
      +		git add . &&
      +		git commit -m "initial commit" &&
      +		git checkout -b base &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v3 1/9] cache-tree: clean up cache_tree_update()
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.

Callers can also remove their conditional allocations of cache_tree.

Also drop local variables that can be found directly from the 'istate'
parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout.c |  3 ---
 cache-tree.c       | 17 +++++++++--------
 sequencer.c        |  3 ---
 unpack-trees.c     |  2 --
 4 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index c9ba23c2794..2d6550bc3c8 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -821,9 +821,6 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		}
 	}
 
-	if (!active_cache_tree)
-		active_cache_tree = cache_tree();
-
 	if (!cache_tree_fully_valid(active_cache_tree))
 		cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
 
diff --git a/cache-tree.c b/cache-tree.c
index 3f1a8d4f1b7..60b6aefbf51 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-	struct cache_tree *it = istate->cache_tree;
-	struct cache_entry **cache = istate->cache;
-	int entries = istate->cache_nr;
-	int skip, i = verify_cache(cache, entries, flags);
+	int skip, i;
+
+	i = verify_cache(istate->cache, istate->cache_nr, flags);
 
 	if (i)
 		return i;
+
+	if (!istate->cache_tree)
+		istate->cache_tree = cache_tree();
+
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", the_repository);
-	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
+		       "", 0, &skip, flags);
 	trace2_region_leave("cache_tree", "update", the_repository);
 	trace_performance_leave("cache_tree_update");
 	if (i < 0)
@@ -635,9 +639,6 @@ static int write_index_as_tree_internal(struct object_id *oid,
 		cache_tree_valid = 0;
 	}
 
-	if (!index_state->cache_tree)
-		index_state->cache_tree = cache_tree();
-
 	if (!cache_tree_valid && cache_tree_update(index_state, flags) < 0)
 		return WRITE_TREE_UNMERGED_INDEX;
 
diff --git a/sequencer.c b/sequencer.c
index 8909a467700..aa3e4c81cf0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -679,9 +679,6 @@ static int do_recursive_merge(struct repository *r,
 
 static struct object_id *get_cache_tree_oid(struct index_state *istate)
 {
-	if (!istate->cache_tree)
-		istate->cache_tree = cache_tree();
-
 	if (!cache_tree_fully_valid(istate->cache_tree))
 		if (cache_tree_update(istate, 0)) {
 			error(_("unable to update cache tree"));
diff --git a/unpack-trees.c b/unpack-trees.c
index af6e9b9c2fd..a810b79657e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1726,8 +1726,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		if (!ret) {
 			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
 				cache_tree_verify(the_repository, &o->result);
-			if (!o->result.cache_tree)
-				o->result.cache_tree = cache_tree();
 			if (!cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 20:24       ` Elijah Newren
  2021-01-23 19:58     ` [PATCH v3 3/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The verify_cache() method takes an array of cache entries and a count,
but these are always provided directly from a struct index_state. Use
a pointer to the full structure instead.

There is a subtle point when istate->cache_nr is zero that subtracting
one will underflow. This triggers a failure in t0000-basic.sh, among
others. Use "i + 1 < istate->cache_nr" to avoid these strange
comparisons. Convert i to be unsigned as well, which also removes the
potential signed overflow in the unlikely case that cache_nr is over 2.1
billion entries. The 'funny' variable has a maximum value of 11, so
making it unsigned does not change anything of importance.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 60b6aefbf51..acac6d58c37 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -151,16 +151,15 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
 		istate->cache_changed |= CACHE_TREE_CHANGED;
 }
 
-static int verify_cache(struct cache_entry **cache,
-			int entries, int flags)
+static int verify_cache(struct index_state *istate, int flags)
 {
-	int i, funny;
+	unsigned i, funny;
 	int silent = flags & WRITE_TREE_SILENT;
 
 	/* Verify that the tree is merged */
 	funny = 0;
-	for (i = 0; i < entries; i++) {
-		const struct cache_entry *ce = cache[i];
+	for (i = 0; i < istate->cache_nr; i++) {
+		const struct cache_entry *ce = istate->cache[i];
 		if (ce_stage(ce)) {
 			if (silent)
 				return -1;
@@ -180,13 +179,13 @@ static int verify_cache(struct cache_entry **cache,
 	 * stage 0 entries.
 	 */
 	funny = 0;
-	for (i = 0; i < entries - 1; i++) {
+	for (i = 0; i + 1 < istate->cache_nr; i++) {
 		/* path/file always comes after path because of the way
 		 * the cache is sorted.  Also path can appear only once,
 		 * which means conflicting one would immediately follow.
 		 */
-		const struct cache_entry *this_ce = cache[i];
-		const struct cache_entry *next_ce = cache[i + 1];
+		const struct cache_entry *this_ce = istate->cache[i];
+		const struct cache_entry *next_ce = istate->cache[i + 1];
 		const char *this_name = this_ce->name;
 		const char *next_name = next_ce->name;
 		int this_len = ce_namelen(this_ce);
@@ -438,7 +437,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 {
 	int skip, i;
 
-	i = verify_cache(istate->cache, istate->cache_nr, flags);
+	i = verify_cache(istate, flags);
 
 	if (i)
 		return i;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 3/9] cache-tree: extract subtree_pos()
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 4/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c | 6 +++---
 cache-tree.h | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index acac6d58c37..2fb483d3c08 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -45,7 +45,7 @@ static int subtree_name_cmp(const char *one, int onelen,
 	return memcmp(one, two, onelen);
 }
 
-static int subtree_pos(struct cache_tree *it, const char *path, int pathlen)
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen)
 {
 	struct cache_tree_sub **down = it->down;
 	int lo, hi;
@@ -72,7 +72,7 @@ static struct cache_tree_sub *find_subtree(struct cache_tree *it,
 					   int create)
 {
 	struct cache_tree_sub *down;
-	int pos = subtree_pos(it, path, pathlen);
+	int pos = cache_tree_subtree_pos(it, path, pathlen);
 	if (0 <= pos)
 		return it->down[pos];
 	if (!create)
@@ -123,7 +123,7 @@ static int do_invalidate_path(struct cache_tree *it, const char *path)
 	it->entry_count = -1;
 	if (!*slash) {
 		int pos;
-		pos = subtree_pos(it, path, namelen);
+		pos = cache_tree_subtree_pos(it, path, namelen);
 		if (0 <= pos) {
 			cache_tree_free(&it->down[pos]->cache_tree);
 			free(it->down[pos]);
diff --git a/cache-tree.h b/cache-tree.h
index 639bfa5340e..8efeccebfc9 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -27,6 +27,8 @@ void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct index_state *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
+int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 4/9] fsmonitor: de-duplicate BUG()s around dirty bits
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 3/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 5/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The index has an fsmonitor_dirty bitmap that records which index entries
are "dirty" based on the response from the FSMonitor. If this bitmap
ever grows larger than the index, then there was an error in how it was
constructed, and it was probably a developer's bug.

There are several BUG() statements that are very similar, so replace
these uses with a simpler assert_index_minimum(). Since there is one
caller that uses a custom 'pos' value instead of the bit_size member, we
cannot simplify it too much. However, the error string is identical in
each, so this simplifies things.

Be sure to add one when checking if a position if valid, since the
minimum is a bound on the expected size.

The end result is that the code is simpler to read while also preserving
these assertions for developers in the FSMonitor space.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 fsmonitor.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index ca031c3abb8..fe9e9d7baf4 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -13,14 +13,19 @@
 
 struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR);
 
+static void assert_index_minimum(struct index_state *istate, size_t pos)
+{
+	if (pos > istate->cache_nr)
+		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
+		    (uintmax_t)pos, istate->cache_nr);
+}
+
 static void fsmonitor_ewah_callback(size_t pos, void *is)
 {
 	struct index_state *istate = (struct index_state *)is;
 	struct cache_entry *ce;
 
-	if (pos >= istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
-		    (uintmax_t)pos, istate->cache_nr);
+	assert_index_minimum(istate, pos + 1);
 
 	ce = istate->cache[pos];
 	ce->ce_flags &= ~CE_FSMONITOR_VALID;
@@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
 	}
 	istate->fsmonitor_dirty = fsmonitor_dirty;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
 	return 0;
@@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
 	uint32_t ewah_size = 0;
 	int fixup = 0;
 
-	if (!istate->split_index &&
-	    istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index)
+		assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 
 	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
 	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
@@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate)
 			}
 
 			/* Mark all previously saved entries as dirty */
-			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+			assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size);
 			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
 
 			refresh_fsmonitor(istate);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 5/9] repository: add repo reference to index_state
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 4/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 6/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.

Add a BUG() statement if the repo already has an index, and the index
already has a repo, but somehow the index points to a different repo.

This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h      | 1 +
 repository.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/cache.h b/cache.h
index 71097657489..f9c7a603841 100644
--- a/cache.h
+++ b/cache.h
@@ -328,6 +328,7 @@ struct index_state {
 	struct ewah_bitmap *fsmonitor_dirty;
 	struct mem_pool *ce_mem_pool;
 	struct progress *progress;
+	struct repository *repo;
 };
 
 /* Name hashing */
diff --git a/repository.c b/repository.c
index a4174ddb062..c98298acd01 100644
--- a/repository.c
+++ b/repository.c
@@ -264,6 +264,12 @@ int repo_read_index(struct repository *repo)
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
+	/* Complete the double-reference */
+	if (!repo->index->repo)
+		repo->index->repo = repo;
+	else if (repo->index->repo != repo)
+		BUG("repo's index should point back at itself");
+
 	return read_index_from(repo->index, repo->index_file, repo->gitdir);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 6/9] name-hash: use trace2 regions for init
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 5/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 7/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The lazy_init_name_hash() populates a hashset with all filenames and
another with all directories represented in the index. This is run only
if we need to use the hashsets to check for existence or case-folding
renames.

Place trace2 regions where there is already a performance trace.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 5d3c7b12c18..4e03fac9bb1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -7,6 +7,7 @@
  */
 #include "cache.h"
 #include "thread-utils.h"
+#include "trace2.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -577,6 +578,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	trace_performance_enter();
+	trace2_region_enter("index", "name-hash-init", istate->repo);
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -597,6 +599,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
+	trace2_region_leave("index", "name-hash-init", istate->repo);
 	trace_performance_leave("initialize name hash");
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 7/9] sparse-checkout: load sparse-checkout patterns
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 6/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 19:58     ` [PATCH v3 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.

The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.

The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c |  5 -----
 dir.c                     | 17 +++++++++++++++++
 dir.h                     |  2 ++
 unpack-trees.c            |  6 +-----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index e3140db2a0a..2306a9ad98e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -22,11 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = {
 	NULL
 };
 
-static char *get_sparse_checkout_filename(void)
-{
-	return git_pathdup("info/sparse-checkout");
-}
-
 static void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 {
 	int i;
diff --git a/dir.c b/dir.c
index d637461da5c..d153a63bbd1 100644
--- a/dir.c
+++ b/dir.c
@@ -2998,6 +2998,23 @@ void setup_standard_excludes(struct dir_struct *dir)
 	}
 }
 
+char *get_sparse_checkout_filename(void)
+{
+	return git_pathdup("info/sparse-checkout");
+}
+
+int get_sparse_checkout_patterns(struct pattern_list *pl)
+{
+	int res;
+	char *sparse_filename = get_sparse_checkout_filename();
+
+	pl->use_cone_patterns = core_sparse_checkout_cone;
+	res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL);
+
+	free(sparse_filename);
+	return res;
+}
+
 int remove_path(const char *name)
 {
 	char *slash;
diff --git a/dir.h b/dir.h
index a3c40dec516..facfae47402 100644
--- a/dir.h
+++ b/dir.h
@@ -448,6 +448,8 @@ int is_empty_dir(const char *dir);
 
 void setup_standard_excludes(struct dir_struct *dir);
 
+char *get_sparse_checkout_filename(void);
+int get_sparse_checkout_patterns(struct pattern_list *pl);
 
 /* Constants for remove_dir_recursively: */
 
diff --git a/unpack-trees.c b/unpack-trees.c
index a810b79657e..f5f668f532d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1549,14 +1549,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
 static void populate_from_existing_patterns(struct unpack_trees_options *o,
 					    struct pattern_list *pl)
 {
-	char *sparse = git_pathdup("info/sparse-checkout");
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0)
+	if (get_sparse_checkout_patterns(pl) < 0)
 		o->skip_sparse_checkout = 1;
 	else
 		o->pl = pl;
-	free(sparse);
 }
 
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 8/9] test-lib: test_region looks for trace2 regions
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 7/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-23 21:07       ` Derrick Stolee
  2021-01-23 19:58     ` [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
  2021-01-23 20:29     ` [PATCH v3 0/9] More index cleanups Elijah Newren
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.

Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.

There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. Note that this
changes the expectations slightly. The old test (incorrectly) used two
patterns for the 'grep' invocation, but this performs an OR of the
patterns, not an AND. This means that as long as one region_enter event
was logged, the test would succeed, even if it was not due to the
progress category.

More uses will be added in a later change.

t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t0500-progress-display.sh |  3 +--
 t/test-lib-functions.sh     | 42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index 1ed1df351cb..84cce345e7d 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' '
 		"Working hard" <in 2>stderr &&
 
 	# t0212/parse_events.perl intentionally omits regions and data.
-	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
-	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
+	test_region progress "Working hard" trace.event &&
 	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
 	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 999982fe4a9..3f425deba18 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1655,3 +1655,45 @@ test_subcommand () {
 		grep "\[$expr\]"
 	fi
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_region [!] <category> <label> git <command> <args>...
+#
+# For example, to look for trace2_region_enter("index", "do_read_index", repo)
+# in an invocation of "git checkout HEAD~1", run
+#
+#	GIT_TRACE2_EVENT="$(pwd)/trace.txt" GIT_TRACE2_EVENT_NESTING=10 \
+#		git checkout HEAD~1 &&
+#	test_region index do_read_index <trace.txt
+#
+# If the first parameter passed is !, this instead checks that
+# the given region was not entered.
+#
+test_region () {
+	local expect_exit=0
+	if test "$1" = "!"
+	then
+		expect_exit=1
+		shift
+	fi
+
+	grep -e	'"region_enter".*"category":"'"$1"'","label":"'"$2"\" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit = 1]
+	then
+		return 1
+	fi
+
+	grep -e	'"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit = 1]
+	then
+		return 1
+	fi
+
+	return 0
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-23 19:58     ` Derrick Stolee via GitGitGadget
  2021-01-25 18:45       ` Elijah Newren
  2021-01-23 20:29     ` [PATCH v3 0/9] More index cleanups Elijah Newren
  9 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-01-23 19:58 UTC (permalink / raw)
  To: git; +Cc: newren, Derrick Stolee, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These also document some behaviors that differ from a full checkout, and
possibly in a way that is not intended.

The test is designed to be run with "--run=1,X" where 'X' is an
interesting test case. Each test uses 'init_repos' to reset the full and
sparse copies of the initial-repo that is created by the first test
case. This also makes it possible to have test cases leave the working
directory or index in unusual states without disturbing later cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 301 +++++++++++++++++++++++
 1 file changed, 301 insertions(+)
 create mode 100755 t/t1092-sparse-checkout-compatibility.sh

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
new file mode 100755
index 00000000000..8cd3e5a8d22
--- /dev/null
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -0,0 +1,301 @@
+#!/bin/sh
+
+test_description='compare full workdir to sparse workdir'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init initial-repo &&
+	(
+		cd initial-repo &&
+		echo a >a &&
+		echo "after deep" >e &&
+		echo "after folder1" >g &&
+		echo "after x" >z &&
+		mkdir folder1 folder2 deep x &&
+		mkdir deep/deeper1 deep/deeper2 &&
+		mkdir deep/deeper1/deepest &&
+		echo "after deeper1" >deep/e &&
+		echo "after deepest" >deep/deeper1/e &&
+		cp a folder1 &&
+		cp a folder2 &&
+		cp a x &&
+		cp a deep &&
+		cp a deep/deeper1 &&
+		cp a deep/deeper2 &&
+		cp a deep/deeper1/deepest &&
+		cp -r deep/deeper1/deepest deep/deeper2 &&
+		git add . &&
+		git commit -m "initial commit" &&
+		git checkout -b base &&
+		for dir in folder1 folder2 deep
+		do
+			git checkout -b update-$dir &&
+			echo "updated $dir" >$dir/a &&
+			git commit -a -m "update $dir" || return 1
+		done &&
+
+		git checkout -b rename-base base &&
+		echo >folder1/larger-content <<-\EOF &&
+		matching
+		lines
+		help
+		inexact
+		renames
+		EOF
+		cp folder1/larger-content folder2/ &&
+		cp folder1/larger-content deep/deeper1/ &&
+		git add . &&
+		git commit -m "add interesting rename content" &&
+
+		git checkout -b rename-out-to-out rename-base &&
+		mv folder1/a folder2/b &&
+		mv folder1/larger-content folder2/edited-content &&
+		echo >>folder2/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to folder2/..." &&
+
+		git checkout -b rename-out-to-in rename-base &&
+		mv folder1/a deep/deeper1/b &&
+		mv folder1/larger-content deep/deeper1/edited-content &&
+		echo >>deep/deeper1/edited-content &&
+		git add . &&
+		git commit -m "rename folder1/... to deep/deeper1/..." &&
+
+		git checkout -b rename-in-to-out rename-base &&
+		mv deep/deeper1/a folder1/b &&
+		mv deep/deeper1/larger-content folder1/edited-content &&
+		echo >>folder1/edited-content &&
+		git add . &&
+		git commit -m "rename deep/deeper1/... to folder1/..." &&
+
+		git checkout -b deepest base &&
+		echo "updated deepest" >deep/deeper1/deepest/a &&
+		git commit -a -m "update deepest" &&
+
+		git checkout -f base &&
+		git reset --hard
+	)
+'
+
+init_repos () {
+	rm -rf full-checkout sparse-checkout sparse-index &&
+
+	# create repos in initial state
+	cp -r initial-repo full-checkout &&
+	git -C full-checkout reset --hard &&
+
+	cp -r initial-repo sparse-checkout &&
+	git -C sparse-checkout reset --hard &&
+	git -C sparse-checkout sparse-checkout init --cone &&
+
+	# initialize sparse-checkout definitions
+	git -C sparse-checkout sparse-checkout set deep
+}
+
+run_on_sparse () {
+	(
+		cd sparse-checkout &&
+		$* >../sparse-checkout-out 2>../sparse-checkout-err
+	)
+}
+
+run_on_all () {
+	(
+		cd full-checkout &&
+		$* >../full-checkout-out 2>../full-checkout-err
+	) &&
+	run_on_sparse $*
+}
+
+test_all_match () {
+	run_on_all $* &&
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-err sparse-checkout-err
+}
+
+test_expect_success 'status with options' '
+	init_repos &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	run_on_all "touch README.md" &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git status --porcelain=v2 -z -u &&
+	test_all_match git status --porcelain=v2 -uno
+'
+
+test_expect_success 'add, commit, checkout' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add README.md &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Add README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents README.md" &&
+
+	test_all_match git add -A &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "Extend README.md" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout - &&
+
+	run_on_all "../edit-contents deep/newfile" &&
+
+	test_all_match git status --porcelain=v2 -uno &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git add . &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git commit -m "add deep/newfile" &&
+
+	test_all_match git checkout HEAD~1 &&
+	test_all_match git checkout -
+'
+
+test_expect_success 'checkout and reset --hard' '
+	init_repos &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout update-deep &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git checkout -b reset-test &&
+	test_all_match git reset --hard deepest &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset --hard update-folder2
+'
+
+test_expect_success 'diff --staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>README.md
+	EOF
+	run_on_all "../edit-contents" &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	test_all_match git add README.md &&
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
+test_expect_success 'diff with renames' '
+	init_repos &&
+
+	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	do
+		test_all_match git checkout rename-base &&
+		test_all_match git checkout $branch -- .&&
+		test_all_match git diff --staged --no-renames &&
+		test_all_match git diff --staged --find-renames || return 1
+	done
+'
+
+test_expect_success 'log with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git log -- a &&
+	test_all_match git log -- folder1/a &&
+	test_all_match git log -- folder2/a &&
+	test_all_match git log -- deep/a &&
+	test_all_match git log -- deep/deeper1/a &&
+	test_all_match git log -- deep/deeper1/deepest/a &&
+
+	test_all_match git checkout update-folder1 &&
+	test_all_match git log -- folder1/a
+'
+
+test_expect_success 'blame with pathspec inside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame a &&
+	test_all_match git blame deep/a &&
+	test_all_match git blame deep/deeper1/a &&
+	test_all_match git blame deep/deeper1/deepest/a
+'
+
+# TODO: blame currently does not support blaming files outside of the
+# sparse definition. It complains that the file doesn't exist locally.
+test_expect_failure 'blame with pathspec outside sparse definition' '
+	init_repos &&
+
+	test_all_match git blame folder1/a &&
+	test_all_match git blame folder2/a &&
+	test_all_match git blame deep/deeper2/a &&
+	test_all_match git blame deep/deeper2/deepest/a
+'
+
+# TODO: reset currently does not behave as expected when in a
+# sparse-checkout.
+test_expect_failure 'checkout and reset (mixed)' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset deepest &&
+	test_all_match git reset update-folder1 &&
+	test_all_match git reset update-folder2
+'
+
+test_expect_success 'merge' '
+	init_repos &&
+
+	test_all_match git checkout -b merge update-deep &&
+	test_all_match git merge -m "folder1" update-folder1 &&
+	test_all_match git rev-parse HEAD^{tree} &&
+	test_all_match git merge -m "folder2" update-folder2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
+test_expect_success 'merge with outside renames' '
+	init_repos &&
+
+	for type in out-to-out out-to-in in-to-out
+	do
+		test_all_match git reset --hard &&
+		test_all_match git checkout -f -b merge-$type update-deep &&
+		test_all_match git merge -m "$type" rename-$type &&
+		test_all_match git rev-parse HEAD^{tree} || return 1
+	done
+'
+
+test_expect_success 'clean' '
+	init_repos &&
+
+	echo bogus >>.gitignore &&
+	run_on_all cp ../.gitignore . &&
+	test_all_match git add .gitignore &&
+	test_all_match git commit -m ignore-bogus-files &&
+
+	run_on_sparse mkdir folder1 &&
+	run_on_all touch folder1/bogus &&
+
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git clean -f &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git clean -xdf &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_path_is_dir sparse-checkout/folder1
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 19:58     ` [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype Derrick Stolee via GitGitGadget
@ 2021-01-23 20:24       ` Elijah Newren
  2021-01-23 21:02         ` Derrick Stolee
  2021-01-23 21:10         ` Junio C Hamano
  0 siblings, 2 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-23 20:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Derrick Stolee,
	Derrick Stolee

On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The verify_cache() method takes an array of cache entries and a count,
> but these are always provided directly from a struct index_state. Use
> a pointer to the full structure instead.
>
> There is a subtle point when istate->cache_nr is zero that subtracting
> one will underflow. This triggers a failure in t0000-basic.sh, among
> others. Use "i + 1 < istate->cache_nr" to avoid these strange
> comparisons. Convert i to be unsigned as well, which also removes the
> potential signed overflow in the unlikely case that cache_nr is over 2.1
> billion entries. The 'funny' variable has a maximum value of 11, so

AND a minimum value of 0 (which is important for the type change to be valid).

> making it unsigned does not change anything of importance.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  cache-tree.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 60b6aefbf51..acac6d58c37 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -151,16 +151,15 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
>                 istate->cache_changed |= CACHE_TREE_CHANGED;
>  }
>
> -static int verify_cache(struct cache_entry **cache,
> -                       int entries, int flags)
> +static int verify_cache(struct index_state *istate, int flags)
>  {
> -       int i, funny;
> +       unsigned i, funny;
>         int silent = flags & WRITE_TREE_SILENT;
>
>         /* Verify that the tree is merged */
>         funny = 0;
> -       for (i = 0; i < entries; i++) {
> -               const struct cache_entry *ce = cache[i];
> +       for (i = 0; i < istate->cache_nr; i++) {
> +               const struct cache_entry *ce = istate->cache[i];
>                 if (ce_stage(ce)) {
>                         if (silent)
>                                 return -1;
> @@ -180,13 +179,13 @@ static int verify_cache(struct cache_entry **cache,
>          * stage 0 entries.
>          */
>         funny = 0;
> -       for (i = 0; i < entries - 1; i++) {
> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
>                 /* path/file always comes after path because of the way
>                  * the cache is sorted.  Also path can appear only once,
>                  * which means conflicting one would immediately follow.
>                  */
> -               const struct cache_entry *this_ce = cache[i];
> -               const struct cache_entry *next_ce = cache[i + 1];
> +               const struct cache_entry *this_ce = istate->cache[i];
> +               const struct cache_entry *next_ce = istate->cache[i + 1];
>                 const char *this_name = this_ce->name;
>                 const char *next_name = next_ce->name;
>                 int this_len = ce_namelen(this_ce);
> @@ -438,7 +437,7 @@ int cache_tree_update(struct index_state *istate, int flags)
>  {
>         int skip, i;
>
> -       i = verify_cache(istate->cache, istate->cache_nr, flags);
> +       i = verify_cache(istate, flags);
>
>         if (i)
>                 return i;
> --
> gitgitgadget

Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
in the commit message; made it easier to read through quickly.  I'm
curious if it deserves a comment in the code too, since it does feel
slightly unusual.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 0/9] More index cleanups
  2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-01-23 19:58     ` [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
@ 2021-01-23 20:29     ` Elijah Newren
  2021-01-23 21:05       ` Derrick Stolee
  9 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2021-01-23 20:29 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Derrick Stolee

On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is based on ds/cache-tree-basics.
>
> Here are a few more cleanups that are vaguely related to the index. I
> discovered these while preparing my sparse-index RFC that I intend to send
> early next week.
>
> The biggest patch is the final one, which creates a test script for
> comparing sparse-checkouts to full checkouts. There are some commands that
> do not behave similarly. This script will be the backbone of my testing
> strategy for the sparse-index by adding a new mode to compare
> sparse-checkouts with the two index types (full and sparse).
>
>
> UPDATES IN V3
> =============
>
>  * Callers to cache_tree_update() no longer initialize the cache_tree in
>    advance.
>
>  * Added a patch to update verify_cache() prototype.
>
>  * Added missing "pos + 1" in fsmonitor.c.
>
>  * Added a BUG() statement when repo->istate->repo is already populated, but
>    not equal to repo.
>
>  * Cleaned up test_region pattern quoting. Thanks, Junio!
>
> Thanks, -Stolee
>
> Derrick Stolee (9):
>   cache-tree: clean up cache_tree_update()
>   cache-tree: simplify verify_cache() prototype
>   cache-tree: extract subtree_pos()
>   fsmonitor: de-duplicate BUG()s around dirty bits
>   repository: add repo reference to index_state
>   name-hash: use trace2 regions for init
>   sparse-checkout: load sparse-checkout patterns
>   test-lib: test_region looks for trace2 regions
>   t1092: test interesting sparse-checkout scenarios
>
>  builtin/checkout.c                       |   3 -
>  builtin/sparse-checkout.c                |   5 -
>  cache-tree.c                             |  38 +--
>  cache-tree.h                             |   2 +
>  cache.h                                  |   1 +
>  dir.c                                    |  17 ++
>  dir.h                                    |   2 +
>  fsmonitor.c                              |  27 +-
>  name-hash.c                              |   3 +
>  repository.c                             |   6 +
>  sequencer.c                              |   3 -
>  t/t0500-progress-display.sh              |   3 +-
>  t/t1092-sparse-checkout-compatibility.sh | 301 +++++++++++++++++++++++
>  t/test-lib-functions.sh                  |  42 ++++
>  unpack-trees.c                           |   8 +-
>  15 files changed, 408 insertions(+), 53 deletions(-)
>  create mode 100755 t/t1092-sparse-checkout-compatibility.sh
>
>
> base-commit: a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-839%2Fderrickstolee%2Fmore-index-cleanups-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-839/derrickstolee/more-index-cleanups-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/839
>
> Range-diff vs v2:
>
>   1:  f9dccaed0ac !  1:  bdc8ecca3d2 cache-tree: clean up cache_tree_update()
>      @@ Commit message
>           BUG() statement or returning with an error because future callers will
>           want to populate an empty cache-tree using this method.
>
>      -    Also drop local variables that are used exactly once and can be found
>      -    directly from the 'istate' parameter.
>      +    Callers can also remove their conditional allocations of cache_tree.
>      +
>      +    Also drop local variables that can be found directly from the 'istate'
>      +    parameter.
>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>      + ## builtin/checkout.c ##
>      +@@ builtin/checkout.c: static int merge_working_tree(const struct checkout_opts *opts,
>      +          }
>      +  }
>      +
>      +- if (!active_cache_tree)
>      +-         active_cache_tree = cache_tree();
>      +-
>      +  if (!cache_tree_fully_valid(active_cache_tree))
>      +          cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
>      +
>      +
>        ## cache-tree.c ##
>       @@ cache-tree.c: static int update_one(struct cache_tree *it,
>
>      @@ cache-tree.c: static int update_one(struct cache_tree *it,
>         trace2_region_leave("cache_tree", "update", the_repository);
>         trace_performance_leave("cache_tree_update");
>         if (i < 0)
>      +@@ cache-tree.c: static int write_index_as_tree_internal(struct object_id *oid,
>      +          cache_tree_valid = 0;
>      +  }
>      +
>      +- if (!index_state->cache_tree)
>      +-         index_state->cache_tree = cache_tree();
>      +-
>      +  if (!cache_tree_valid && cache_tree_update(index_state, flags) < 0)
>      +          return WRITE_TREE_UNMERGED_INDEX;
>      +
>      +
>      + ## sequencer.c ##
>      +@@ sequencer.c: static int do_recursive_merge(struct repository *r,
>      +
>      + static struct object_id *get_cache_tree_oid(struct index_state *istate)
>      + {
>      +- if (!istate->cache_tree)
>      +-         istate->cache_tree = cache_tree();
>      +-
>      +  if (!cache_tree_fully_valid(istate->cache_tree))
>      +          if (cache_tree_update(istate, 0)) {
>      +                  error(_("unable to update cache tree"));
>      +
>      + ## unpack-trees.c ##
>      +@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>      +          if (!ret) {
>      +                  if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
>      +                          cache_tree_verify(the_repository, &o->result);
>      +-                 if (!o->result.cache_tree)
>      +-                         o->result.cache_tree = cache_tree();
>      +                  if (!cache_tree_fully_valid(o->result.cache_tree))
>      +                          cache_tree_update(&o->result,
>      +                                            WRITE_TREE_SILENT |
>   -:  ----------- >  2:  1b8b5680094 cache-tree: simplify verify_cache() prototype
>   2:  84323e04d08 =  3:  314b6b34f75 cache-tree: extract subtree_pos()
>   3:  31095f9aa0e !  4:  4e688d25f8c fsmonitor: de-duplicate BUG()s around dirty bits
>      @@ Commit message
>           cannot simplify it too much. However, the error string is identical in
>           each, so this simplifies things.
>
>      +    Be sure to add one when checking if a position if valid, since the
>      +    minimum is a bound on the expected size.
>      +
>           The end result is that the code is simpler to read while also preserving
>           these assertions for developers in the FSMonitor space.
>
>      @@ fsmonitor.c
>       - if (pos >= istate->cache_nr)
>       -         BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
>       -             (uintmax_t)pos, istate->cache_nr);
>      -+ assert_index_minimum(istate, pos);
>      ++ assert_index_minimum(istate, pos + 1);
>
>         ce = istate->cache[pos];
>         ce->ce_flags &= ~CE_FSMONITOR_VALID;
>   4:  a0d89d7a973 !  5:  6373997e05c repository: add repo reference to index_state
>      @@ Commit message
>           repository, add a 'repo' pointer to struct index_state that allows
>           access to this repository.
>
>      +    Add a BUG() statement if the repo already has an index, and the index
>      +    already has a repo, but somehow the index points to a different repo.
>      +
>           This will prevent future changes from needing to pass an additional
>           'struct repository *repo' parameter and instead rely only on the 'struct
>           index_state *istate' parameter.
>      @@ repository.c: int repo_read_index(struct repository *repo)
>       + /* Complete the double-reference */
>       + if (!repo->index->repo)
>       +         repo->index->repo = repo;
>      ++ else if (repo->index->repo != repo)
>      ++         BUG("repo's index should point back at itself");
>       +
>         return read_index_from(repo->index, repo->index_file, repo->gitdir);
>        }
>   5:  bc092f5c703 =  6:  9b545d7dbec name-hash: use trace2 regions for init
>   6:  04d1daf7222 =  7:  554cc7647e6 sparse-checkout: load sparse-checkout patterns
>   7:  8832ce84623 !  8:  b37181bdec4 test-lib: test_region looks for trace2 regions
>      @@ t/test-lib-functions.sh: test_subcommand () {
>       +         shift
>       + fi
>       +
>      -+ grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
>      ++ grep -e '"region_enter".*"category":"'"$1"'","label":"'"$2"\" "$3"
>       + exitcode=$?
>       +
>      -+ if test $exitcode != $expect_exit
>      ++ if test $exitcode != $expect_exit = 1]

I don't understand this change.  Is it even valid code?  What does it mean?

>       + then
>       +         return 1
>       + fi
>       +
>      -+ grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
>      ++ grep -e '"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
>       + exitcode=$?
>       +
>      -+ if test $exitcode != $expect_exit
>      ++ if test $exitcode != $expect_exit = 1]

Same comment.

>       + then
>       +         return 1
>       + fi
>      ++
>      ++ return 0
>       +}
>   8:  984458007ed !  9:  72f925353d3 t1092: test interesting sparse-checkout scenarios
>      @@ t/t1092-sparse-checkout-compatibility.sh (new)
>       +         echo a >a &&
>       +         echo "after deep" >e &&
>       +         echo "after folder1" >g &&
>      ++         echo "after x" >z &&
>       +         mkdir folder1 folder2 deep x &&
>       +         mkdir deep/deeper1 deep/deeper2 &&
>       +         mkdir deep/deeper1/deepest &&
>      @@ t/t1092-sparse-checkout-compatibility.sh (new)
>       +         echo "after deepest" >deep/deeper1/e &&
>       +         cp a folder1 &&
>       +         cp a folder2 &&
>      ++         cp a x &&
>       +         cp a deep &&
>       +         cp a deep/deeper1 &&
>       +         cp a deep/deeper2 &&
>       +         cp a deep/deeper1/deepest &&
>      ++         cp -r deep/deeper1/deepest deep/deeper2 &&
>       +         git add . &&
>       +         git commit -m "initial commit" &&
>       +         git checkout -b base &&
>

Having read the previous rounds, the rest of the range-diff looks good
to me; I sent out separate comments on the new patch.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 20:24       ` Elijah Newren
@ 2021-01-23 21:02         ` Derrick Stolee
  2021-01-23 21:10           ` Elijah Newren
  2021-01-23 21:41           ` Junio C Hamano
  2021-01-23 21:10         ` Junio C Hamano
  1 sibling, 2 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 21:02 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 1/23/2021 3:24 PM, Elijah Newren wrote:
> On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> -       for (i = 0; i < entries - 1; i++) {
>> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
>>                 /* path/file always comes after path because of the way
>>                  * the cache is sorted.  Also path can appear only once,
>>                  * which means conflicting one would immediately follow.
>>                  */
>> -               const struct cache_entry *this_ce = cache[i];
>> -               const struct cache_entry *next_ce = cache[i + 1];
>> +               const struct cache_entry *this_ce = istate->cache[i];
>> +               const struct cache_entry *next_ce = istate->cache[i + 1];
>>                 const char *this_name = this_ce->name;
>>                 const char *next_name = next_ce->name;
>>                 int this_len = ce_namelen(this_ce);
> Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
> in the commit message; made it easier to read through quickly.  I'm
> curious if it deserves a comment in the code too, since it does feel
> slightly unusual.

I would argue that "i + 1 < N" is a more natural way to write this,
because we use "i + 1" as an index, so we want to ensure the index
we are about to use is within range. "i < N - 1" is the backwards
way to write that statement.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 0/9] More index cleanups
  2021-01-23 20:29     ` [PATCH v3 0/9] More index cleanups Elijah Newren
@ 2021-01-23 21:05       ` Derrick Stolee
  2021-01-23 21:42         ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 21:05 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On 1/23/2021 3:29 PM, Elijah Newren wrote:
> On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is based on ds/cache-tree-basics.
...
>>      -+ grep -e "\"region_enter\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
>>      ++ grep -e '"region_enter".*"category":"'"$1"'","label":"'"$2"\" "$3"
>>       + exitcode=$?
>>       +
>>      -+ if test $exitcode != $expect_exit
>>      ++ if test $exitcode != $expect_exit = 1]
> 
> I don't understand this change.  Is it even valid code?  What does it mean?
> 
>>       + then
>>       +         return 1
>>       + fi
>>       +
>>      -+ grep -e "\"region_leave\".*\"category\":\"$1\",\"label\":\"$2\"" "$3"
>>      ++ grep -e '"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
>>       + exitcode=$?
>>       +
>>      -+ if test $exitcode != $expect_exit
>>      ++ if test $exitcode != $expect_exit = 1]
> 
> Same comment

Wow. I am sorry this snuck in. It's an artifact of what I was trying [1]
in response to Junio's comments, but then did not completely undo. I'm
surprised this ran without error. Will fix.

[1] https://lore.kernel.org/git/8406512b-3d9f-e899-24fd-8a09c4af3569@gmail.com/

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 8/9] test-lib: test_region looks for trace2 regions
  2021-01-23 19:58     ` [PATCH v3 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
@ 2021-01-23 21:07       ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 21:07 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: newren, gitster, Derrick Stolee, Derrick Stolee

On 1/23/2021 2:58 PM, Derrick Stolee via GitGitGadget wrote:
...
> +	if test $exitcode != $expect_exit = 1]
...
> +	if test $exitcode != $expect_exit = 1]
As Elijah pointed out, these lines are bogus. I'm not sure how
they passed the tests without failure, but here is a replacement
for this patch:

--- >8 ---


From ff15d509b89edd4830d85d53cea3079a6b0c1c08 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Mon, 11 Jan 2021 08:53:09 -0500
Subject: [PATCH 8/9] test-lib: test_region looks for trace2 regions

Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.

Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.

There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. Note that this
changes the expectations slightly. The old test (incorrectly) used two
patterns for the 'grep' invocation, but this performs an OR of the
patterns, not an AND. This means that as long as one region_enter event
was logged, the test would succeed, even if it was not due to the
progress category.

More uses will be added in a later change.

t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t0500-progress-display.sh |  3 +--
 t/test-lib-functions.sh     | 42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index 1ed1df351c..84cce345e7 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' '
 		"Working hard" <in 2>stderr &&
 
 	# t0212/parse_events.perl intentionally omits regions and data.
-	grep -e "region_enter" -e "\"category\":\"progress\"" trace.event &&
-	grep -e "region_leave" -e "\"category\":\"progress\"" trace.event &&
+	test_region progress "Working hard" trace.event &&
 	grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event &&
 	grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 999982fe4a..9fc4cf8476 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1655,3 +1655,45 @@ test_subcommand () {
 		grep "\[$expr\]"
 	fi
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_region [!] <category> <label> git <command> <args>...
+#
+# For example, to look for trace2_region_enter("index", "do_read_index", repo)
+# in an invocation of "git checkout HEAD~1", run
+#
+#	GIT_TRACE2_EVENT="$(pwd)/trace.txt" GIT_TRACE2_EVENT_NESTING=10 \
+#		git checkout HEAD~1 &&
+#	test_region index do_read_index <trace.txt
+#
+# If the first parameter passed is !, this instead checks that
+# the given region was not entered.
+#
+test_region () {
+	local expect_exit=0
+	if test "$1" = "!"
+	then
+		expect_exit=1
+		shift
+	fi
+
+	grep -e	'"region_enter".*"category":"'"$1"'","label":"'"$2"\" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+
+	grep -e	'"region_leave".*"category":"'"$1"'","label":"'"$2"\" "$3"
+	exitcode=$?
+
+	if test $exitcode != $expect_exit
+	then
+		return 1
+	fi
+
+	return 0
+}
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 20:24       ` Elijah Newren
  2021-01-23 21:02         ` Derrick Stolee
@ 2021-01-23 21:10         ` Junio C Hamano
  2021-01-23 21:14           ` Derrick Stolee
  1 sibling, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 21:10 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

Elijah Newren <newren@gmail.com> writes:

>> -       for (i = 0; i < entries - 1; i++) {
>> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
>>                 /* path/file always comes after path because of the way
>>                  * the cache is sorted.  Also path can appear only once,
>>                  * which means conflicting one would immediately follow.
>>                  */
>> -               const struct cache_entry *this_ce = cache[i];
>> -               const struct cache_entry *next_ce = cache[i + 1];
>> +               const struct cache_entry *this_ce = istate->cache[i];
>> +               const struct cache_entry *next_ce = istate->cache[i + 1];
>
> Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
> in the commit message; made it easier to read through quickly.  I'm
> curious if it deserves a comment in the code too, since it does feel
> slightly unusual.

I think this is entirely my fault.

It probably reads more natural to start from 1 and interate through
to the end of the array, comparing the current one with the previous
entry, i.e.

	for (i = 1; i < istate->cache_nr; i++) {
        	prev = cache[i - 1];
		this = cache[i];
                compare(prev, this);


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 21:02         ` Derrick Stolee
@ 2021-01-23 21:10           ` Elijah Newren
  2021-01-23 21:41           ` Junio C Hamano
  1 sibling, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-23 21:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Sat, Jan 23, 2021 at 1:02 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/23/2021 3:24 PM, Elijah Newren wrote:
> > On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> -       for (i = 0; i < entries - 1; i++) {
> >> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
> >>                 /* path/file always comes after path because of the way
> >>                  * the cache is sorted.  Also path can appear only once,
> >>                  * which means conflicting one would immediately follow.
> >>                  */
> >> -               const struct cache_entry *this_ce = cache[i];
> >> -               const struct cache_entry *next_ce = cache[i + 1];
> >> +               const struct cache_entry *this_ce = istate->cache[i];
> >> +               const struct cache_entry *next_ce = istate->cache[i + 1];
> >>                 const char *this_name = this_ce->name;
> >>                 const char *next_name = next_ce->name;
> >>                 int this_len = ce_namelen(this_ce);
> > Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
> > in the commit message; made it easier to read through quickly.  I'm
> > curious if it deserves a comment in the code too, since it does feel
> > slightly unusual.
>
> I would argue that "i + 1 < N" is a more natural way to write this,
> because we use "i + 1" as an index, so we want to ensure the index
> we are about to use is within range. "i < N - 1" is the backwards
> way to write that statement.

Oh, right, I think I was reading too quickly and assuming one thing in
my head (about what the code was going to do), and forgetting that
assumption when I got to the actual code.  Sorry about that; I agree
with you, so ignore my previous comment.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 21:10         ` Junio C Hamano
@ 2021-01-23 21:14           ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2021-01-23 21:14 UTC (permalink / raw)
  To: Junio C Hamano, Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee, Derrick Stolee

On 1/23/2021 4:10 PM, Junio C Hamano wrote:
> Elijah Newren <newren@gmail.com> writes:
> 
>>> -       for (i = 0; i < entries - 1; i++) {
>>> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
>>>                 /* path/file always comes after path because of the way
>>>                  * the cache is sorted.  Also path can appear only once,
>>>                  * which means conflicting one would immediately follow.
>>>                  */
>>> -               const struct cache_entry *this_ce = cache[i];
>>> -               const struct cache_entry *next_ce = cache[i + 1];
>>> +               const struct cache_entry *this_ce = istate->cache[i];
>>> +               const struct cache_entry *next_ce = istate->cache[i + 1];
>>
>> Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
>> in the commit message; made it easier to read through quickly.  I'm
>> curious if it deserves a comment in the code too, since it does feel
>> slightly unusual.
> 
> I think this is entirely my fault.
> 
> It probably reads more natural to start from 1 and interate through
> to the end of the array, comparing the current one with the previous
> entry, i.e.
> 
> 	for (i = 1; i < istate->cache_nr; i++) {
>         	prev = cache[i - 1];
> 		this = cache[i];
>                 compare(prev, this);

This would be another natural way to make the loop extremely clear.

It's a bigger diff, changing the names of 'this' and 'next', but
perhaps worthwhile.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype
  2021-01-23 21:02         ` Derrick Stolee
  2021-01-23 21:10           ` Elijah Newren
@ 2021-01-23 21:41           ` Junio C Hamano
  1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 21:41 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 1/23/2021 3:24 PM, Elijah Newren wrote:
>> On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>> -       for (i = 0; i < entries - 1; i++) {
>>> +       for (i = 0; i + 1 < istate->cache_nr; i++) {
>>>                 /* path/file always comes after path because of the way
>>>                  * the cache is sorted.  Also path can appear only once,
>>>                  * which means conflicting one would immediately follow.
>>>                  */
>>> -               const struct cache_entry *this_ce = cache[i];
>>> -               const struct cache_entry *next_ce = cache[i + 1];
>>> +               const struct cache_entry *this_ce = istate->cache[i];
>>> +               const struct cache_entry *next_ce = istate->cache[i + 1];
>>>                 const char *this_name = this_ce->name;
>>>                 const char *next_name = next_ce->name;
>>>                 int this_len = ce_namelen(this_ce);
>> Makes sense.  Thanks for explaining the i + 1 < istate->cache_nr bit
>> in the commit message; made it easier to read through quickly.  I'm
>> curious if it deserves a comment in the code too, since it does feel
>> slightly unusual.
>
> I would argue that "i + 1 < N" is a more natural way to write this,
> because we use "i + 1" as an index, so we want to ensure the index
> we are about to use is within range. "i < N - 1" is the backwards
> way to write that statement.

Our mails have crossed, I guess.  Comparing i+1 and N is also good.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 0/9] More index cleanups
  2021-01-23 21:05       ` Derrick Stolee
@ 2021-01-23 21:42         ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 21:42 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>>>      -+ if test $exitcode != $expect_exit
>>>      ++ if test $exitcode != $expect_exit = 1]
>> 
>> Same comment
>
> Wow. I am sorry this snuck in. It's an artifact of what I was trying [1]
> in response to Junio's comments, but then did not completely undo.

Sorry about that.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios
  2021-01-23 19:58     ` [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
@ 2021-01-25 18:45       ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2021-01-25 18:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Derrick Stolee,
	Derrick Stolee

On Sat, Jan 23, 2021 at 11:58 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> These also document some behaviors that differ from a full checkout, and
> possibly in a way that is not intended.
>
> The test is designed to be run with "--run=1,X" where 'X' is an

Random sidenote: Why not suggest "--run=setup,X" instead?  Then if you
ever add additional "setup" steps, it'd run all of them.

(See https://lore.kernel.org/git/0355c888828aeec331a6b99a26d8aa04cff59859.1602980628.git.gitgitgadget@gmail.com/)

> interesting test case. Each test uses 'init_repos' to reset the full and
> sparse copies of the initial-repo that is created by the first test
> case. This also makes it possible to have test cases leave the working
> directory or index in unusual states without disturbing later cases.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 301 +++++++++++++++++++++++
>  1 file changed, 301 insertions(+)
>  create mode 100755 t/t1092-sparse-checkout-compatibility.sh
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> new file mode 100755
> index 00000000000..8cd3e5a8d22
> --- /dev/null
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -0,0 +1,301 @@
> +#!/bin/sh
> +
> +test_description='compare full workdir to sparse workdir'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +       git init initial-repo &&
> +       (
> +               cd initial-repo &&
> +               echo a >a &&
> +               echo "after deep" >e &&
> +               echo "after folder1" >g &&
> +               echo "after x" >z &&
> +               mkdir folder1 folder2 deep x &&
> +               mkdir deep/deeper1 deep/deeper2 &&
> +               mkdir deep/deeper1/deepest &&
> +               echo "after deeper1" >deep/e &&
> +               echo "after deepest" >deep/deeper1/e &&
> +               cp a folder1 &&
> +               cp a folder2 &&
> +               cp a x &&
> +               cp a deep &&
> +               cp a deep/deeper1 &&
> +               cp a deep/deeper2 &&
> +               cp a deep/deeper1/deepest &&
> +               cp -r deep/deeper1/deepest deep/deeper2 &&
> +               git add . &&
> +               git commit -m "initial commit" &&
> +               git checkout -b base &&
> +               for dir in folder1 folder2 deep
> +               do
> +                       git checkout -b update-$dir &&
> +                       echo "updated $dir" >$dir/a &&
> +                       git commit -a -m "update $dir" || return 1
> +               done &&
> +
> +               git checkout -b rename-base base &&
> +               echo >folder1/larger-content <<-\EOF &&
> +               matching
> +               lines
> +               help
> +               inexact
> +               renames
> +               EOF
> +               cp folder1/larger-content folder2/ &&
> +               cp folder1/larger-content deep/deeper1/ &&
> +               git add . &&
> +               git commit -m "add interesting rename content" &&
> +
> +               git checkout -b rename-out-to-out rename-base &&
> +               mv folder1/a folder2/b &&
> +               mv folder1/larger-content folder2/edited-content &&
> +               echo >>folder2/edited-content &&
> +               git add . &&
> +               git commit -m "rename folder1/... to folder2/..." &&
> +
> +               git checkout -b rename-out-to-in rename-base &&
> +               mv folder1/a deep/deeper1/b &&
> +               mv folder1/larger-content deep/deeper1/edited-content &&
> +               echo >>deep/deeper1/edited-content &&
> +               git add . &&
> +               git commit -m "rename folder1/... to deep/deeper1/..." &&
> +
> +               git checkout -b rename-in-to-out rename-base &&
> +               mv deep/deeper1/a folder1/b &&
> +               mv deep/deeper1/larger-content folder1/edited-content &&
> +               echo >>folder1/edited-content &&
> +               git add . &&
> +               git commit -m "rename deep/deeper1/... to folder1/..." &&
> +
> +               git checkout -b deepest base &&
> +               echo "updated deepest" >deep/deeper1/deepest/a &&
> +               git commit -a -m "update deepest" &&
> +
> +               git checkout -f base &&
> +               git reset --hard
> +       )
> +'
> +
> +init_repos () {
> +       rm -rf full-checkout sparse-checkout sparse-index &&
> +
> +       # create repos in initial state
> +       cp -r initial-repo full-checkout &&
> +       git -C full-checkout reset --hard &&
> +
> +       cp -r initial-repo sparse-checkout &&
> +       git -C sparse-checkout reset --hard &&
> +       git -C sparse-checkout sparse-checkout init --cone &&
> +
> +       # initialize sparse-checkout definitions
> +       git -C sparse-checkout sparse-checkout set deep
> +}
> +
> +run_on_sparse () {
> +       (
> +               cd sparse-checkout &&
> +               $* >../sparse-checkout-out 2>../sparse-checkout-err
> +       )
> +}
> +
> +run_on_all () {
> +       (
> +               cd full-checkout &&
> +               $* >../full-checkout-out 2>../full-checkout-err
> +       ) &&
> +       run_on_sparse $*
> +}
> +
> +test_all_match () {
> +       run_on_all $* &&
> +       test_cmp full-checkout-out sparse-checkout-out &&
> +       test_cmp full-checkout-err sparse-checkout-err
> +}
> +
> +test_expect_success 'status with options' '
> +       init_repos &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno &&
> +       run_on_all "touch README.md" &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno &&
> +       test_all_match git add README.md &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git status --porcelain=v2 -z -u &&
> +       test_all_match git status --porcelain=v2 -uno
> +'
> +
> +test_expect_success 'add, commit, checkout' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +       run_on_all "../edit-contents README.md" &&
> +
> +       test_all_match git add README.md &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Add README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout - &&
> +
> +       run_on_all "../edit-contents README.md" &&
> +
> +       test_all_match git add -A &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "Extend README.md" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout - &&
> +
> +       run_on_all "../edit-contents deep/newfile" &&
> +
> +       test_all_match git status --porcelain=v2 -uno &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git add . &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git commit -m "add deep/newfile" &&
> +
> +       test_all_match git checkout HEAD~1 &&
> +       test_all_match git checkout -
> +'
> +
> +test_expect_success 'checkout and reset --hard' '
> +       init_repos &&
> +
> +       test_all_match git checkout update-folder1 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git checkout update-deep &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git checkout -b reset-test &&
> +       test_all_match git reset --hard deepest &&
> +       test_all_match git reset --hard update-folder1 &&
> +       test_all_match git reset --hard update-folder2
> +'
> +
> +test_expect_success 'diff --staged' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>README.md
> +       EOF
> +       run_on_all "../edit-contents" &&
> +
> +       test_all_match git diff &&
> +       test_all_match git diff --staged &&
> +       test_all_match git add README.md &&
> +       test_all_match git diff &&
> +       test_all_match git diff --staged
> +'
> +
> +test_expect_success 'diff with renames' '
> +       init_repos &&
> +
> +       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       do
> +               test_all_match git checkout rename-base &&
> +               test_all_match git checkout $branch -- .&&
> +               test_all_match git diff --staged --no-renames &&
> +               test_all_match git diff --staged --find-renames || return 1
> +       done
> +'
> +
> +test_expect_success 'log with pathspec outside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git log -- a &&
> +       test_all_match git log -- folder1/a &&
> +       test_all_match git log -- folder2/a &&
> +       test_all_match git log -- deep/a &&
> +       test_all_match git log -- deep/deeper1/a &&
> +       test_all_match git log -- deep/deeper1/deepest/a &&
> +
> +       test_all_match git checkout update-folder1 &&
> +       test_all_match git log -- folder1/a
> +'
> +
> +test_expect_success 'blame with pathspec inside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git blame a &&
> +       test_all_match git blame deep/a &&
> +       test_all_match git blame deep/deeper1/a &&
> +       test_all_match git blame deep/deeper1/deepest/a
> +'
> +
> +# TODO: blame currently does not support blaming files outside of the
> +# sparse definition. It complains that the file doesn't exist locally.
> +test_expect_failure 'blame with pathspec outside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git blame folder1/a &&
> +       test_all_match git blame folder2/a &&
> +       test_all_match git blame deep/deeper2/a &&
> +       test_all_match git blame deep/deeper2/deepest/a
> +'
> +
> +# TODO: reset currently does not behave as expected when in a
> +# sparse-checkout.
> +test_expect_failure 'checkout and reset (mixed)' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset deepest &&
> +       test_all_match git reset update-folder1 &&
> +       test_all_match git reset update-folder2
> +'
> +
> +test_expect_success 'merge' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b merge update-deep &&
> +       test_all_match git merge -m "folder1" update-folder1 &&
> +       test_all_match git rev-parse HEAD^{tree} &&
> +       test_all_match git merge -m "folder2" update-folder2 &&
> +       test_all_match git rev-parse HEAD^{tree}
> +'
> +
> +test_expect_success 'merge with outside renames' '
> +       init_repos &&
> +
> +       for type in out-to-out out-to-in in-to-out
> +       do
> +               test_all_match git reset --hard &&
> +               test_all_match git checkout -f -b merge-$type update-deep &&
> +               test_all_match git merge -m "$type" rename-$type &&
> +               test_all_match git rev-parse HEAD^{tree} || return 1
> +       done
> +'
> +
> +test_expect_success 'clean' '
> +       init_repos &&
> +
> +       echo bogus >>.gitignore &&
> +       run_on_all cp ../.gitignore . &&
> +       test_all_match git add .gitignore &&
> +       test_all_match git commit -m ignore-bogus-files &&
> +
> +       run_on_sparse mkdir folder1 &&
> +       run_on_all touch folder1/bogus &&
> +
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git clean -f &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git clean -xf &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git clean -xdf &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_path_is_dir sparse-checkout/folder1
> +'
> +
> +test_done
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2021-01-26 20:52 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 16:53 [PATCH 0/9] More index cleanups Derrick Stolee via GitGitGadget
2021-01-20 16:53 ` [PATCH 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
2021-01-20 17:21   ` Elijah Newren
2021-01-20 19:10     ` Derrick Stolee
2021-01-20 16:53 ` [PATCH 2/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
2021-01-20 17:23   ` Elijah Newren
2021-01-20 16:53 ` [PATCH 3/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
2021-01-20 17:26   ` Elijah Newren
2021-01-21 12:53   ` Chris Torek
2021-01-21 15:56     ` Derrick Stolee
2021-01-20 16:53 ` [PATCH 4/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
2021-01-20 17:46   ` Elijah Newren
2021-01-20 19:16     ` Derrick Stolee
2021-01-20 19:50       ` Elijah Newren
2021-01-20 16:53 ` [PATCH 5/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
2021-01-20 17:47   ` Elijah Newren
2021-01-20 16:53 ` [PATCH 6/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
2021-01-20 17:54   ` Elijah Newren
2021-01-20 16:53 ` [PATCH 7/9] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget
2021-01-20 18:03   ` Elijah Newren
2021-01-20 19:22     ` Derrick Stolee
2021-01-20 16:53 ` [PATCH 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
2021-01-20 18:20   ` Elijah Newren
2021-01-20 19:24     ` Derrick Stolee
2021-01-20 16:53 ` [PATCH 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
2021-01-20 19:40   ` Elijah Newren
2021-01-21 11:59     ` Derrick Stolee
2021-01-22 13:03 ` [PATCH v2 0/8] More index cleanups Derrick Stolee via GitGitGadget
2021-01-22 13:03   ` [PATCH v2 1/8] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
2021-01-22 19:11     ` Junio C Hamano
2021-01-22 13:03   ` [PATCH v2 2/8] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
2021-01-22 13:03   ` [PATCH v2 3/8] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
2021-01-22 19:18     ` Junio C Hamano
2021-01-22 13:03   ` [PATCH v2 4/8] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
2021-01-22 19:23     ` Junio C Hamano
2021-01-22 13:03   ` [PATCH v2 5/8] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
2021-01-22 13:03   ` [PATCH v2 6/8] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
2021-01-22 13:03   ` [PATCH v2 7/8] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
2021-01-22 19:42     ` Junio C Hamano
2021-01-23 18:36       ` Derrick Stolee
2021-01-23 18:50         ` Junio C Hamano
2021-01-22 13:03   ` [PATCH v2 8/8] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
2021-01-22 19:49   ` [PATCH v2 0/8] More index cleanups Elijah Newren
2021-01-23 18:47     ` Derrick Stolee
2021-01-23 19:58   ` [PATCH v3 0/9] " Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 1/9] cache-tree: clean up cache_tree_update() Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 2/9] cache-tree: simplify verify_cache() prototype Derrick Stolee via GitGitGadget
2021-01-23 20:24       ` Elijah Newren
2021-01-23 21:02         ` Derrick Stolee
2021-01-23 21:10           ` Elijah Newren
2021-01-23 21:41           ` Junio C Hamano
2021-01-23 21:10         ` Junio C Hamano
2021-01-23 21:14           ` Derrick Stolee
2021-01-23 19:58     ` [PATCH v3 3/9] cache-tree: extract subtree_pos() Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 4/9] fsmonitor: de-duplicate BUG()s around dirty bits Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 5/9] repository: add repo reference to index_state Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 6/9] name-hash: use trace2 regions for init Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 7/9] sparse-checkout: load sparse-checkout patterns Derrick Stolee via GitGitGadget
2021-01-23 19:58     ` [PATCH v3 8/9] test-lib: test_region looks for trace2 regions Derrick Stolee via GitGitGadget
2021-01-23 21:07       ` Derrick Stolee
2021-01-23 19:58     ` [PATCH v3 9/9] t1092: test interesting sparse-checkout scenarios Derrick Stolee via GitGitGadget
2021-01-25 18:45       ` Elijah Newren
2021-01-23 20:29     ` [PATCH v3 0/9] More index cleanups Elijah Newren
2021-01-23 21:05       ` Derrick Stolee
2021-01-23 21:42         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.