All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] In grep, no adding submodule ODB as alternates
@ 2021-08-10 18:28 Jonathan Tan
  2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
                   ` (10 more replies)
  0 siblings, 11 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This patch series removes the need to add submodule ODBs as alternates
in all codepaths covered by t7814. I believe that that is also all the
codepaths covered by "git grep", but if it isn't, the uncovered
codepaths will still work - they will just not benefit from the
performance improvements.

In doing this work of migrating away from adding submodule ODBs as
alternates, I'm mainly motivated by the possibility of adding partial
clone submodule support, but this has benefits even for those who do not
use partial clones, as described in the documentation in patch 1.

To reviewers: you can cherry-pick the last patch onto one of the earlier
ones to observe what happens when the code still accesses a submodule
object as if it were in the_repository.

Jonathan Tan (7):
  submodule: lazily add submodule ODBs as alternates
  grep: use submodule-ODB-as-alternate lazy-addition
  grep: typesafe versions of grep_source_init
  grep: read submodule entry with explicit repo
  grep: allocate subrepos on heap
  grep: add repository to OID grep sources
  t7814: show lack of alternate ODB-adding

 builtin/grep.c                     | 49 +++++++++++++++++++++---------
 grep.c                             | 48 ++++++++++++++++++-----------
 grep.h                             | 10 ++++--
 object-file.c                      |  5 +++
 submodule.c                        | 25 ++++++++++++++-
 submodule.h                        |  8 +++++
 t/README                           | 10 ++++++
 t/t7814-grep-recurse-submodules.sh |  3 ++
 8 files changed, 122 insertions(+), 36 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 1/7] submodule: lazily add submodule ODBs as alternates
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-10 21:13   ` Junio C Hamano
  2021-08-11 21:33   ` Emily Shaffer
  2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Teach Git to add submodule ODBs as alternates to the object store of
the_repository only upon the first access of an object not in
the_repository, and not when add_submodule_odb() is called.

This provides a means of gradually migrating from accessing a
submodule's object through alternates to accessing a submodule's object
by explicitly passing its repository object. Any Git command can declare
that it might access submodule objects by calling add_submodule_odb()
(as they do now), but the submodule ODBs themselves will not be added
until needed, so individual commands and/or combinations of arguments
can be migrated one by one.

[The advantage of explicit repository-object passing is code clarity (it
is clear which repository an object read is from), performance (there is
no need to linearly search through all submodule ODBs whenever an object
is accessed from any repository, whether superproject or submodule), and
the possibility of future features like partial clone submodules (which
right now is not possible because if an object is missing, we do not
know which repository to lazy-fetch into).]

This commit also introduces an environment variable that a test may set
to make the actual registration of alternates fatal, in order to
demonstrate that its codepaths do not need this registration.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-file.c |  5 +++++
 submodule.c   | 20 +++++++++++++++++++-
 submodule.h   |  7 +++++++
 t/README      | 10 ++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 3d27dc8dea..621b121bcb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -32,6 +32,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "submodule.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1592,6 +1593,10 @@ static int do_oid_object_info_extended(struct repository *r,
 				break;
 		}
 
+		if (register_all_submodule_odb_as_alternates())
+			/* We added some alternates; retry */
+			continue;
+
 		/* Check if it is a missing object */
 		if (fetch_if_missing && repo_has_promisor_remote(r) &&
 		    !already_retried &&
diff --git a/submodule.c b/submodule.c
index 8e611fe1db..8fde90e906 100644
--- a/submodule.c
+++ b/submodule.c
@@ -165,6 +165,8 @@ void stage_updated_gitmodules(struct index_state *istate)
 		die(_("staging updated .gitmodules failed"));
 }
 
+static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
+
 /* TODO: remove this function, use repo_submodule_init instead. */
 int add_submodule_odb(const char *path)
 {
@@ -178,12 +180,28 @@ int add_submodule_odb(const char *path)
 		ret = -1;
 		goto done;
 	}
-	add_to_alternates_memory(objects_directory.buf);
+	string_list_insert(&added_submodule_odb_paths,
+			   strbuf_detach(&objects_directory, NULL));
 done:
 	strbuf_release(&objects_directory);
 	return ret;
 }
 
+int register_all_submodule_odb_as_alternates(void)
+{
+	int i;
+	int ret = added_submodule_odb_paths.nr;
+
+	for (i = 0; i < added_submodule_odb_paths.nr; i++)
+		add_to_alternates_memory(added_submodule_odb_paths.items[i].string);
+	if (ret) {
+		string_list_clear(&added_submodule_odb_paths, 0);
+		if (git_env_bool("GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB", 0))
+			BUG("register_all_submodule_odb_as_alternates() called");
+	}
+	return ret;
+}
+
 void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 					     const char *path)
 {
diff --git a/submodule.h b/submodule.h
index 84640c49c1..c252784bc2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -97,7 +97,14 @@ int submodule_uses_gitfile(const char *path);
 #define SUBMODULE_REMOVAL_IGNORE_IGNORED_UNTRACKED (1<<2)
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
+/*
+ * Call add_submodule_odb() to add the submodule at the given path to a list.
+ * When register_all_submodule_odb_as_alternates() is called, the object stores
+ * of all submodules in that list will be added as alternates in
+ * the_repository.
+ */
 int add_submodule_odb(const char *path);
+int register_all_submodule_odb_as_alternates(void);
 
 /*
  * Checks if there are submodule changes in a..b. If a is the null OID,
diff --git a/t/README b/t/README
index 9e70122302..8b67b4f00b 100644
--- a/t/README
+++ b/t/README
@@ -448,6 +448,16 @@ GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
 to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
 execution of the parallel-checkout code.
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=<boolean>, when true, makes
+registering submodule ODBs as alternates a fatal action. Support for
+this environment variable can be removed once the migration to
+explicitly providing repositories when accessing submodule objects is
+complete (in which case we might want to replace this with a trace2
+call so that users can make it visible if accessing submodule objects
+without an explicit repository still happens) or needs to be abandoned
+for whatever reason (in which case the migrated codepaths still retain
+their performance benefits).
+
 Naming Tests
 ------------
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
  2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-11 21:36   ` Emily Shaffer
  2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 2 +-
 submodule.c    | 5 +++++
 submodule.h    | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7d2f8e5adb..87bcb934a2 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -450,7 +450,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_to_alternates_memory(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo.objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
diff --git a/submodule.c b/submodule.c
index 8fde90e906..8de1aecaeb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -187,6 +187,11 @@ int add_submodule_odb(const char *path)
 	return ret;
 }
 
+void add_submodule_odb_by_path(const char *path)
+{
+	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
+}
+
 int register_all_submodule_odb_as_alternates(void)
 {
 	int i;
diff --git a/submodule.h b/submodule.h
index c252784bc2..17a06cc43b 100644
--- a/submodule.h
+++ b/submodule.h
@@ -104,6 +104,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags);
  * the_repository.
  */
 int add_submodule_odb(const char *path);
+void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
 /*
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
  2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
  2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-10 21:38   ` Junio C Hamano
                     ` (2 more replies)
  2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c |  4 ++--
 grep.c         | 43 +++++++++++++++++++++++++++----------------
 grep.h         |  8 +++++---
 3 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 87bcb934a2..e454335e9d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -333,7 +333,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init(&gs, GREP_SOURCE_OID, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
@@ -359,7 +359,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, 0, &buf);
-	grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
+	grep_source_init_file(&gs, buf.buf, filename);
 	strbuf_release(&buf);
 
 	if (num_threads > 1) {
diff --git a/grep.c b/grep.c
index 424a39591b..ba3711dc56 100644
--- a/grep.c
+++ b/grep.c
@@ -1830,7 +1830,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
+	grep_source_init_buf(&gs);
 	gs.buf = buf;
 	gs.size = size;
 
@@ -1840,28 +1840,39 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	return r;
 }
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier)
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path)
 {
-	gs->type = type;
+	gs->type = GREP_SOURCE_FILE;
 	gs->name = xstrdup_or_null(name);
 	gs->path = xstrdup_or_null(path);
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
+	gs->identifier = xstrdup(path);
+}
 
-	switch (type) {
-	case GREP_SOURCE_FILE:
-		gs->identifier = xstrdup(identifier);
-		break;
-	case GREP_SOURCE_OID:
-		gs->identifier = oiddup(identifier);
-		break;
-	case GREP_SOURCE_BUF:
-		gs->identifier = NULL;
-		break;
-	}
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid)
+{
+	gs->type = GREP_SOURCE_OID;
+	gs->name = xstrdup_or_null(name);
+	gs->path = xstrdup_or_null(path);
+	gs->buf = NULL;
+	gs->size = 0;
+	gs->driver = NULL;
+	gs->identifier = oiddup(oid);
+}
+
+void grep_source_init_buf(struct grep_source *gs)
+{
+	gs->type = GREP_SOURCE_BUF;
+	gs->name = NULL;
+	gs->path = NULL;
+	gs->buf = NULL;
+	gs->size = 0;
+	gs->driver = NULL;
+	gs->identifier = NULL;
 }
 
 void grep_source_clear(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 72f82b1e30..f4a3090f1c 100644
--- a/grep.h
+++ b/grep.h
@@ -195,9 +195,11 @@ struct grep_source {
 	struct userdiff_driver *driver;
 };
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier);
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path);
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid);
+void grep_source_init_buf(struct grep_source *gs);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs,
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 4/7] grep: read submodule entry with explicit repo
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-11 21:44   ` Emily Shaffer
  2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Replace an existing parse_object_or_die() call (which implicitly works
on the_repository) with a function call that allows a repository to be
passed in. There is no such direct equivalent to parse_object_or_die(),
but we only need the type of the object, so replace with
oid_object_info().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index e454335e9d..9e61c7c993 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -457,27 +457,27 @@ static int grep_submodule(struct grep_opt *opt,
 	subopt.repo = &subrepo;
 
 	if (oid) {
-		struct object *object;
+		enum object_type object_type;
 		struct tree_desc tree;
 		void *data;
 		unsigned long size;
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object = parse_object_or_die(oid, NULL);
+		object_type = oid_object_info(&subrepo, oid, NULL);
 		obj_read_unlock();
 		data = read_object_with_reference(&subrepo,
-						  &object->oid, tree_type,
+						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
-			die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
+			die(_("unable to read tree (%s)"), oid_to_hex(oid));
 
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
-				object->type == OBJ_COMMIT);
+				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
 		free(data);
 	} else {
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 5/7] grep: allocate subrepos on heap
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (3 preceding siblings ...)
  2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-11 21:50   ` Emily Shaffer
  2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Currently, struct repository objects corresponding to submodules are
allocated on the stack in grep_submodule(). This currently works because
they will not be used once grep_submodule() exits, but a subsequent
patch will require these structs to be accessible for longer (perhaps
even in another thread). Allocate them on the heap and clear them only
at the very end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9e61c7c993..5a40e18e47 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -65,6 +65,9 @@ static int todo_done;
 /* Has all work items been added? */
 static int all_work_added;
 
+static struct repository **repos_to_free;
+static size_t repos_to_free_nr, repos_to_free_alloc;
+
 /* This lock protects all the variables above. */
 static pthread_mutex_t grep_mutex;
 
@@ -168,6 +171,17 @@ static void work_done(struct work_item *w)
 	grep_unlock();
 }
 
+static void free_repos(void)
+{
+	int i;
+
+	for (i = 0; i < repos_to_free_nr; i++) {
+		repo_clear(repos_to_free[i]);
+		free(repos_to_free[i]);
+	}
+	free(repos_to_free);
+}
+
 static void *run(void *arg)
 {
 	int hit = 0;
@@ -415,19 +429,24 @@ static int grep_submodule(struct grep_opt *opt,
 			  const struct object_id *oid,
 			  const char *filename, const char *path, int cached)
 {
-	struct repository subrepo;
+	struct repository *subrepo;
 	struct repository *superproject = opt->repo;
 	const struct submodule *sub;
 	struct grep_opt subopt;
-	int hit;
+	int hit = 0;
 
 	sub = submodule_from_path(superproject, null_oid(), path);
 
 	if (!is_submodule_active(superproject, path))
 		return 0;
 
-	if (repo_submodule_init(&subrepo, superproject, sub))
+	subrepo = xmalloc(sizeof(*subrepo));
+	if (repo_submodule_init(subrepo, superproject, sub)) {
+		free(subrepo);
 		return 0;
+	}
+	ALLOC_GROW(repos_to_free, repos_to_free_nr + 1, repos_to_free_alloc);
+	repos_to_free[repos_to_free_nr++] = subrepo;
 
 	/*
 	 * NEEDSWORK: repo_read_gitmodules() might call
@@ -438,7 +457,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * subrepo's odbs to the in-memory alternates list.
 	 */
 	obj_read_lock();
-	repo_read_gitmodules(&subrepo, 0);
+	repo_read_gitmodules(subrepo, 0);
 
 	/*
 	 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -450,11 +469,11 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_submodule_odb_by_path(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo->objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
-	subopt.repo = &subrepo;
+	subopt.repo = subrepo;
 
 	if (oid) {
 		enum object_type object_type;
@@ -464,9 +483,9 @@ static int grep_submodule(struct grep_opt *opt,
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object_type = oid_object_info(&subrepo, oid, NULL);
+		object_type = oid_object_info(subrepo, oid, NULL);
 		obj_read_unlock();
-		data = read_object_with_reference(&subrepo,
+		data = read_object_with_reference(subrepo,
 						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
@@ -484,7 +503,6 @@ static int grep_submodule(struct grep_opt *opt,
 		hit = grep_cache(&subopt, pathspec, cached);
 	}
 
-	repo_clear(&subrepo);
 	return hit;
 }
 
@@ -1182,5 +1200,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		run_pager(&opt, prefix);
 	clear_pathspec(&pathspec);
 	free_grep_patterns(&opt);
+	free_repos();
 	return !hit;
 }
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 6/7] grep: add repository to OID grep sources
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (4 preceding siblings ...)
  2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-11 21:52   ` Emily Shaffer
  2021-08-11 23:28   ` Matheus Tavares Bernardino
  2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 2 +-
 grep.c         | 7 +++++--
 grep.h         | 4 +++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 5a40e18e47..ea6df6dca4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -347,7 +347,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
diff --git a/grep.c b/grep.c
index ba3711dc56..14c677b4ae 100644
--- a/grep.c
+++ b/grep.c
@@ -1853,7 +1853,8 @@ void grep_source_init_file(struct grep_source *gs, const char *name,
 }
 
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid)
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo)
 {
 	gs->type = GREP_SOURCE_OID;
 	gs->name = xstrdup_or_null(name);
@@ -1862,6 +1863,7 @@ void grep_source_init_oid(struct grep_source *gs, const char *name,
 	gs->size = 0;
 	gs->driver = NULL;
 	gs->identifier = oiddup(oid);
+	gs->repo = repo;
 }
 
 void grep_source_init_buf(struct grep_source *gs)
@@ -1901,7 +1903,8 @@ static int grep_source_load_oid(struct grep_source *gs)
 {
 	enum object_type type;
 
-	gs->buf = read_object_file(gs->identifier, &type, &gs->size);
+	gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type,
+					&gs->size);
 	if (!gs->buf)
 		return error(_("'%s': unable to read %s"),
 			     gs->name,
diff --git a/grep.h b/grep.h
index f4a3090f1c..c5234f9b38 100644
--- a/grep.h
+++ b/grep.h
@@ -187,6 +187,7 @@ struct grep_source {
 		GREP_SOURCE_BUF,
 	} type;
 	void *identifier;
+	struct repository *repo; /* if GREP_SOURCE_OID */
 
 	char *buf;
 	unsigned long size;
@@ -198,7 +199,8 @@ struct grep_source {
 void grep_source_init_file(struct grep_source *gs, const char *name,
 			   const char *path);
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid);
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo);
 void grep_source_init_buf(struct grep_source *gs);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 7/7] t7814: show lack of alternate ODB-adding
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (5 preceding siblings ...)
  2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
@ 2021-08-10 18:28 ` Jonathan Tan
  2021-08-11 21:55   ` Emily Shaffer
  2021-08-11 21:29 ` [PATCH 0/7] In grep, no adding submodule ODB as alternates Emily Shaffer
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-10 18:28 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The previous patches have made "git grep" no longer need to add
submodule ODBs as alternates, at least for the code paths tested in
t7814. Demonstrate this by making adding a submodule ODB as an alternate
fatal in this test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 t/t7814-grep-recurse-submodules.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 828cb3ba58..3172f5b936 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -8,6 +8,9 @@ submodules.
 
 . ./test-lib.sh
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
+export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
+
 test_expect_success 'setup directory structure and submodule' '
 	echo "(1|2)d(3|4)" >a &&
 	mkdir b &&
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH 1/7] submodule: lazily add submodule ODBs as alternates
  2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
@ 2021-08-10 21:13   ` Junio C Hamano
  2021-08-13 16:53     ` Jonathan Tan
  2021-08-11 21:33   ` Emily Shaffer
  1 sibling, 1 reply; 71+ messages in thread
From: Junio C Hamano @ 2021-08-10 21:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> @@ -1592,6 +1593,10 @@ static int do_oid_object_info_extended(struct repository *r,
>  				break;
>  		}
>  
> +		if (register_all_submodule_odb_as_alternates())
> +			/* We added some alternates; retry */
> +			continue;
> +

OK.  Unless we are running in the "we no longer are relying on the
submodule-odb-as-alternates hack" mode, the control may reach this
point number of times, so this caller expects the function to return
how many new odbs were added with this single invocation.

> diff --git a/submodule.c b/submodule.c
> index 8e611fe1db..8fde90e906 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -165,6 +165,8 @@ void stage_updated_gitmodules(struct index_state *istate)
>  		die(_("staging updated .gitmodules failed"));
>  }
>  
> +static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
> +

I see 2/7 allows callers to add paths for submodule odb to this
list.

> @@ -178,12 +180,28 @@ int add_submodule_odb(const char *path)
>  		ret = -1;
>  		goto done;
>  	}
> -	add_to_alternates_memory(objects_directory.buf);
> +	string_list_insert(&added_submodule_odb_paths,
> +			   strbuf_detach(&objects_directory, NULL));

OK.  By default, any codepath that still call add_submodule_odb()
will add the paths to submodules to the list (so that they will
lazily be loaded).

> +int register_all_submodule_odb_as_alternates(void)
> +{
> +	int i;
> +	int ret = added_submodule_odb_paths.nr;
> +
> +	for (i = 0; i < added_submodule_odb_paths.nr; i++)
> +		add_to_alternates_memory(added_submodule_odb_paths.items[i].string);
> +	if (ret) {
> +		string_list_clear(&added_submodule_odb_paths, 0);
> +		if (git_env_bool("GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB", 0))
> +			BUG("register_all_submodule_odb_as_alternates() called");
> +	}
> +	return ret;
> +}

OK.  We add new ones that were added to the list since we were
called for the last time and clear the list.  When we didn't add
anything, we will return 0.  Seems to mean well.

I wonder if we need to prepare ourselves to catch callers that call
add_submodule_odb() to the same path twice or more.  Probably not.

Thanks.

The "pretend as if objects in submodule odb are locally available",
even though it may have been useful, was an ugly hack invented
before we started adding the "access more than one repository at the
time with the 'repo' arg" extended API.  It is pleasing to see this
step shows an approach to incrementally migrate the users of the
hack.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
@ 2021-08-10 21:38   ` Junio C Hamano
  2021-08-11 21:42   ` Emily Shaffer
  2021-08-11 22:45   ` Matheus Tavares Bernardino
  2 siblings, 0 replies; 71+ messages in thread
From: Junio C Hamano @ 2021-08-10 21:38 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> grep_source_init() can create "struct grep_source" objects and,
> depending on the value of the type passed, some void-pointer parameters have
> different meanings. Because one of these types (GREP_SOURCE_OID) will
> require an additional parameter in a subsequent patch, take the
> opportunity to increase clarity and type safety by replacing this
> function with individual functions for each type.

Nice clean-up.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 0/7] In grep, no adding submodule ODB as alternates
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (6 preceding siblings ...)
  2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
@ 2021-08-11 21:29 ` Emily Shaffer
  2021-08-11 22:49 ` Josh Steadmon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:38AM -0700, Jonathan Tan wrote:
> 
> This patch series removes the need to add submodule ODBs as alternates
> in all codepaths covered by t7814. I believe that that is also all the
> codepaths covered by "git grep", but if it isn't, the uncovered
> codepaths will still work - they will just not benefit from the
> performance improvements.
> 
> In doing this work of migrating away from adding submodule ODBs as
> alternates, I'm mainly motivated by the possibility of adding partial
> clone submodule support, but this has benefits even for those who do not
> use partial clones, as described in the documentation in patch 1.
> 
> To reviewers: you can cherry-pick the last patch onto one of the earlier
> ones to observe what happens when the code still accesses a submodule
> object as if it were in the_repository.

One note - I liked this tip, I think that is a useful thing for
reviewers to see. Thanks.

 - Emily

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 1/7] submodule: lazily add submodule ODBs as alternates
  2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
  2021-08-10 21:13   ` Junio C Hamano
@ 2021-08-11 21:33   ` Emily Shaffer
  2021-08-13 16:23     ` Jonathan Tan
  1 sibling, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:39AM -0700, Jonathan Tan wrote:
> diff --git a/object-file.c b/object-file.c
> index 3d27dc8dea..621b121bcb 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -32,6 +32,7 @@
>  #include "packfile.h"
>  #include "object-store.h"
>  #include "promisor-remote.h"
> +#include "submodule.h"
>  
>  /* The maximum size for an object header. */
>  #define MAX_HEADER_LEN 32
> @@ -1592,6 +1593,10 @@ static int do_oid_object_info_extended(struct repository *r,
>  				break;
>  		}
>  
> +		if (register_all_submodule_odb_as_alternates())
> +			/* We added some alternates; retry */
> +			continue;
> +

Ok, this is where we finally get around to loading the alternate much
later. Fine.

>  		/* Check if it is a missing object */
>  		if (fetch_if_missing && repo_has_promisor_remote(r) &&
>  		    !already_retried &&
> diff --git a/submodule.c b/submodule.c
> index 8e611fe1db..8fde90e906 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -165,6 +165,8 @@ void stage_updated_gitmodules(struct index_state *istate)
>  		die(_("staging updated .gitmodules failed"));
>  }
>  
> +static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
> +
>  /* TODO: remove this function, use repo_submodule_init instead. */
>  int add_submodule_odb(const char *path)
>  {
> @@ -178,12 +180,28 @@ int add_submodule_odb(const char *path)
>  		ret = -1;
>  		goto done;
>  	}
> -	add_to_alternates_memory(objects_directory.buf);
> +	string_list_insert(&added_submodule_odb_paths,
> +			   strbuf_detach(&objects_directory, NULL));

And here is where we hijack the usual alternate load and put the path
into the lazy load list instead. Ok.

>  done:
>  	strbuf_release(&objects_directory);
>  	return ret;
>  }
>  
> +int register_all_submodule_odb_as_alternates(void)
> +{
> +	int i;
> +	int ret = added_submodule_odb_paths.nr;
> +
> +	for (i = 0; i < added_submodule_odb_paths.nr; i++)
> +		add_to_alternates_memory(added_submodule_odb_paths.items[i].string);
> +	if (ret) {
> +		string_list_clear(&added_submodule_odb_paths, 0);
> +		if (git_env_bool("GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB", 0))
> +			BUG("register_all_submodule_odb_as_alternates() called");

Nice, and this is the flag you mentioned in the cover letter to complain
if we're trying to use alternates to access submodule objects. This will
be useful for finding out other random places where we are using that
weird hack.

> +	}
> +	return ret;
> +}
> +
>  void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
>  					     const char *path)
>  {
> diff --git a/submodule.h b/submodule.h
> index 84640c49c1..c252784bc2 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -97,7 +97,14 @@ int submodule_uses_gitfile(const char *path);
>  #define SUBMODULE_REMOVAL_IGNORE_IGNORED_UNTRACKED (1<<2)
>  int bad_to_remove_submodule(const char *path, unsigned flags);
>  
> +/*
> + * Call add_submodule_odb() to add the submodule at the given path to a list.
> + * When register_all_submodule_odb_as_alternates() is called, the object stores
> + * of all submodules in that list will be added as alternates in
> + * the_repository.
> + */
>  int add_submodule_odb(const char *path);
> +int register_all_submodule_odb_as_alternates(void);

Does it need to be public? Could this be a static in submodule.c
instead?

>  
>  /*
>   * Checks if there are submodule changes in a..b. If a is the null OID,
> diff --git a/t/README b/t/README
> index 9e70122302..8b67b4f00b 100644
> --- a/t/README
> +++ b/t/README
> @@ -448,6 +448,16 @@ GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
>  to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
>  execution of the parallel-checkout code.
>  
> +GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=<boolean>, when true, makes
> +registering submodule ODBs as alternates a fatal action. Support for
> +this environment variable can be removed once the migration to
> +explicitly providing repositories when accessing submodule objects is
> +complete (in which case we might want to replace this with a trace2
> +call so that users can make it visible if accessing submodule objects
> +without an explicit repository still happens) or needs to be abandoned
> +for whatever reason (in which case the migrated codepaths still retain
> +their performance benefits).
> +
>  Naming Tests
>  ------------
>  
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition
  2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
@ 2021-08-11 21:36   ` Emily Shaffer
  2021-08-13 16:31     ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:36 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:40AM -0700, Jonathan Tan wrote:
> 
> In the parent commit, Git was taught to add submodule ODBs as alternates
> lazily, but grep does not use this because it computes the path to add
> directly, not going through add_submodule_odb(). Add an equivalent to
> add_submodule_odb() that takes the exact ODB path and teach grep to use
> it.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/grep.c | 2 +-
>  submodule.c    | 5 +++++
>  submodule.h    | 1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 7d2f8e5adb..87bcb934a2 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -450,7 +450,7 @@ static int grep_submodule(struct grep_opt *opt,
>  	 * store is no longer global and instead is a member of the repository
>  	 * object.
>  	 */
> -	add_to_alternates_memory(subrepo.objects->odb->path);
> +	add_submodule_odb_by_path(subrepo.objects->odb->path);

I had wondered whether we can entirely drop add_to_alternates_memory()
but I see on second reading that that's still used by
register_all_submodule_odb...(). I wonder if it can become static in
submodule.c to prevent users from dodging the envvar+BUG()?

>  	obj_read_unlock();
>  
>  	memcpy(&subopt, opt, sizeof(subopt));
> diff --git a/submodule.c b/submodule.c
> index 8fde90e906..8de1aecaeb 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -187,6 +187,11 @@ int add_submodule_odb(const char *path)
>  	return ret;
>  }
>  
> +void add_submodule_odb_by_path(const char *path)
> +{
> +	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
> +}
> +
>  int register_all_submodule_odb_as_alternates(void)
>  {
>  	int i;
> diff --git a/submodule.h b/submodule.h
> index c252784bc2..17a06cc43b 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -104,6 +104,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags);
>   * the_repository.
>   */
>  int add_submodule_odb(const char *path);
Curious whether add_submodule_odb() can be reduced by calling
add_submodule_odb_by_path() internally.
> +void add_submodule_odb_by_path(const char *path);
>  int register_all_submodule_odb_as_alternates(void);
>  
>  /*
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

These are all pretty minor comments; with or without them,

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
  2021-08-10 21:38   ` Junio C Hamano
@ 2021-08-11 21:42   ` Emily Shaffer
  2021-08-11 23:07     ` Ramsay Jones
  2021-08-11 22:45   ` Matheus Tavares Bernardino
  2 siblings, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:42 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:41AM -0700, Jonathan Tan wrote:
> 
> grep_source_init() can create "struct grep_source" objects and,
> depending on the value of the type passed, some void-pointer parameters have
> different meanings. Because one of these types (GREP_SOURCE_OID) will
> require an additional parameter in a subsequent patch, take the
> opportunity to increase clarity and type safety by replacing this
> function with individual functions for each type.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Like Junio said, it is very neat.

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 4/7] grep: read submodule entry with explicit repo
  2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
@ 2021-08-11 21:44   ` Emily Shaffer
  2021-08-13 16:39     ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:44 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:42AM -0700, Jonathan Tan wrote:
> 
> Replace an existing parse_object_or_die() call (which implicitly works
> on the_repository) with a function call that allows a repository to be
> passed in. There is no such direct equivalent to parse_object_or_die(),
> but we only need the type of the object, so replace with
> oid_object_info().

Always exciting to see less implicit use of the_repository ;)

> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/grep.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index e454335e9d..9e61c7c993 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -457,27 +457,27 @@ static int grep_submodule(struct grep_opt *opt,
>  	subopt.repo = &subrepo;
>  
>  	if (oid) {
> -		struct object *object;
> +		enum object_type object_type;
>  		struct tree_desc tree;
>  		void *data;
>  		unsigned long size;
>  		struct strbuf base = STRBUF_INIT;
>  
>  		obj_read_lock();
> -		object = parse_object_or_die(oid, NULL);
> +		object_type = oid_object_info(&subrepo, oid, NULL);

One thing I wonder is whether we are missing out on some error
conditions we used to get with parse_object_or_die() by using
oid_object_info() instead. Do we need to be more defensive in
investigating 'oid' before calling that helper, now?

>  		obj_read_unlock();
>  		data = read_object_with_reference(&subrepo,
> -						  &object->oid, tree_type,
> +						  oid, tree_type,

And a handful of instances where we were using object->oid instead of
the oid we were already passed. Ok.

>  						  &size, NULL);
>  		if (!data)
> -			die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
> +			die(_("unable to read tree (%s)"), oid_to_hex(oid));
>  
>  		strbuf_addstr(&base, filename);
>  		strbuf_addch(&base, '/');
>  
>  		init_tree_desc(&tree, data, size);
>  		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
> -				object->type == OBJ_COMMIT);
> +				object_type == OBJ_COMMIT);

And finally, using the type we got from oid_object_info instead. Ok.

>  		strbuf_release(&base);
>  		free(data);
>  	} else {
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

LGTM. Thanks.

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 5/7] grep: allocate subrepos on heap
  2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
@ 2021-08-11 21:50   ` Emily Shaffer
  2021-08-13 16:42     ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:43AM -0700, Jonathan Tan wrote:
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 9e61c7c993..5a40e18e47 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -65,6 +65,9 @@ static int todo_done;
>  /* Has all work items been added? */
>  static int all_work_added;
>  
> +static struct repository **repos_to_free;
> +static size_t repos_to_free_nr, repos_to_free_alloc;

One thing I was curious about was whether it would make more sense to
use an existing utility library in Git to handle this. But I think we
kind of are, since ALLOC_GROW is used, so this is idiomatic for Git
project. Ok, fine, I guess this is life at C ;)

> +
>  /* This lock protects all the variables above. */
>  static pthread_mutex_t grep_mutex;
>  
> @@ -168,6 +171,17 @@ static void work_done(struct work_item *w)
>  	grep_unlock();
>  }
>  
> +static void free_repos(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < repos_to_free_nr; i++) {
> +		repo_clear(repos_to_free[i]);
> +		free(repos_to_free[i]);
> +	}
> +	free(repos_to_free);
> +}

Should repos_to_free_nr be reset here? I guess it doesn't make sense to,
since we'd be trying to use-after-free the initial repos_to_free head
pointer too if we wanted to reuse this array.

> +
>  static void *run(void *arg)
>  {
>  	int hit = 0;
> @@ -415,19 +429,24 @@ static int grep_submodule(struct grep_opt *opt,
>  			  const struct object_id *oid,
>  			  const char *filename, const char *path, int cached)
>  {
> -	struct repository subrepo;
> +	struct repository *subrepo;
>  	struct repository *superproject = opt->repo;
>  	const struct submodule *sub;
>  	struct grep_opt subopt;
> -	int hit;
> +	int hit = 0;
>  
>  	sub = submodule_from_path(superproject, null_oid(), path);
>  
>  	if (!is_submodule_active(superproject, path))
>  		return 0;
>  
> -	if (repo_submodule_init(&subrepo, superproject, sub))
> +	subrepo = xmalloc(sizeof(*subrepo));
> +	if (repo_submodule_init(subrepo, superproject, sub)) {
> +		free(subrepo);
>  		return 0;
> +	}
> +	ALLOC_GROW(repos_to_free, repos_to_free_nr + 1, repos_to_free_alloc);
> +	repos_to_free[repos_to_free_nr++] = subrepo;

Is this the only place we add to the repos_to_free array? It looks like
yes, so I guess this doesn't need a helper.

>  
>  	/*
>  	 * NEEDSWORK: repo_read_gitmodules() might call
> @@ -438,7 +457,7 @@ static int grep_submodule(struct grep_opt *opt,
>  	 * subrepo's odbs to the in-memory alternates list.
>  	 */
>  	obj_read_lock();
> -	repo_read_gitmodules(&subrepo, 0);
> +	repo_read_gitmodules(subrepo, 0);
>  
>  	/*
>  	 * NEEDSWORK: This adds the submodule's object directory to the list of
> @@ -450,11 +469,11 @@ static int grep_submodule(struct grep_opt *opt,
>  	 * store is no longer global and instead is a member of the repository
>  	 * object.
>  	 */
> -	add_submodule_odb_by_path(subrepo.objects->odb->path);
> +	add_submodule_odb_by_path(subrepo->objects->odb->path);
>  	obj_read_unlock();
>  
>  	memcpy(&subopt, opt, sizeof(subopt));
> -	subopt.repo = &subrepo;
> +	subopt.repo = subrepo;
>  
>  	if (oid) {
>  		enum object_type object_type;
> @@ -464,9 +483,9 @@ static int grep_submodule(struct grep_opt *opt,
>  		struct strbuf base = STRBUF_INIT;
>  
>  		obj_read_lock();
> -		object_type = oid_object_info(&subrepo, oid, NULL);
> +		object_type = oid_object_info(subrepo, oid, NULL);
>  		obj_read_unlock();
> -		data = read_object_with_reference(&subrepo,
> +		data = read_object_with_reference(subrepo,
>  						  oid, tree_type,
>  						  &size, NULL);
>  		if (!data)
> @@ -484,7 +503,6 @@ static int grep_submodule(struct grep_opt *opt,
>  		hit = grep_cache(&subopt, pathspec, cached);
>  	}
>  
> -	repo_clear(&subrepo);
>  	return hit;
>  }
>  
> @@ -1182,5 +1200,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		run_pager(&opt, prefix);
>  	clear_pathspec(&pathspec);
>  	free_grep_patterns(&opt);
> +	free_repos();
>  	return !hit;
>  }

The rest of the diff seems to be a pretty clear object-becomes-reference
conversion, so...

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 6/7] grep: add repository to OID grep sources
  2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
@ 2021-08-11 21:52   ` Emily Shaffer
  2021-08-13 16:44     ` Jonathan Tan
  2021-08-11 23:28   ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote:
> 
> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.

Since this extra information is not used by this series at all, does it
make sense to only include this patch with the series covering the rest
of your partial-clone-with-submodules work?

 - Emily

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 7/7] t7814: show lack of alternate ODB-adding
  2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
@ 2021-08-11 21:55   ` Emily Shaffer
  2021-08-11 22:22     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 71+ messages in thread
From: Emily Shaffer @ 2021-08-11 21:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 11:28:45AM -0700, Jonathan Tan wrote:
> 
> The previous patches have made "git grep" no longer need to add
> submodule ODBs as alternates, at least for the code paths tested in
> t7814. Demonstrate this by making adding a submodule ODB as an alternate
> fatal in this test.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  t/t7814-grep-recurse-submodules.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 828cb3ba58..3172f5b936 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -8,6 +8,9 @@ submodules.
>  
>  . ./test-lib.sh
>  
> +GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
> +export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
> +
>  test_expect_success 'setup directory structure and submodule' '
>  	echo "(1|2)d(3|4)" >a &&
>  	mkdir b &&
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

This proof seems pretty handy, assuming nobody else is directly calling
add_to_alternates_memory() (and therefore skipping the envvar check). I
would feel slightly more convinced if that function was file-static
somewhere, but I am not familiar with the problem space enough to say
whether that's possible.

This patch by itself, though:
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

 - Emily

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 7/7] t7814: show lack of alternate ODB-adding
  2021-08-11 21:55   ` Emily Shaffer
@ 2021-08-11 22:22     ` Matheus Tavares Bernardino
  2021-08-13 16:50       ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-11 22:22 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, git

On Wed, Aug 11, 2021 at 6:55 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Tue, Aug 10, 2021 at 11:28:45AM -0700, Jonathan Tan wrote:
> >
> > The previous patches have made "git grep" no longer need to add
> > submodule ODBs as alternates, at least for the code paths tested in
> > t7814. Demonstrate this by making adding a submodule ODB as an alternate
> > fatal in this test.
> >
> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> > ---
> >  t/t7814-grep-recurse-submodules.sh | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> > index 828cb3ba58..3172f5b936 100755
> > --- a/t/t7814-grep-recurse-submodules.sh
> > +++ b/t/t7814-grep-recurse-submodules.sh
> > @@ -8,6 +8,9 @@ submodules.
> >
> >  . ./test-lib.sh
> >
> > +GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
> > +export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
> > +
> >  test_expect_success 'setup directory structure and submodule' '
> >       echo "(1|2)d(3|4)" >a &&
> >       mkdir b &&
> > --
> > 2.33.0.rc1.237.g0d66db33f3-goog
> >
>
> This proof seems pretty handy, assuming nobody else is directly calling
> add_to_alternates_memory() (and therefore skipping the envvar check).

Hmm, there is at least one call chain in grep which might end up
calling add_to_alternates_memory() directly (although it only seems to
happen on a very specific case):

        grep_submodule > repo_read_gitmodules >
        config_from_gitmodules > add_to_alternates_memory

We can check that with the following:

git init A
git init A/B
git init A/B/C

echo f >A/B/C/f
git -C A/B/C add f
git -C A/B/C commit -m f

git -C A/B submodule add ./C
git -C A/B commit -m C

git -C A submodule add ./B
git -C A commit -m B

rm B/.gitmodules

gdb -ex 'break add_to_alternates_memory' -ex 'run' --args \
    git grep --recurse-submodules .

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
  2021-08-10 21:38   ` Junio C Hamano
  2021-08-11 21:42   ` Emily Shaffer
@ 2021-08-11 22:45   ` Matheus Tavares Bernardino
  2021-08-12 16:49     ` Junio C Hamano
  2 siblings, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-11 22:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> diff --git a/grep.c b/grep.c
> index 424a39591b..ba3711dc56 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1830,7 +1830,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
>         struct grep_source gs;
>         int r;
>
> -       grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
> +       grep_source_init_buf(&gs);
>         gs.buf = buf;
>         gs.size = size;

Small nit: perhaps `grep_source_init_buf()` could take `buf` and
`size` too, so that all the fields get initialized by the same
function.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 0/7] In grep, no adding submodule ODB as alternates
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (7 preceding siblings ...)
  2021-08-11 21:29 ` [PATCH 0/7] In grep, no adding submodule ODB as alternates Emily Shaffer
@ 2021-08-11 22:49 ` Josh Steadmon
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
  10 siblings, 0 replies; 71+ messages in thread
From: Josh Steadmon @ 2021-08-11 22:49 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2021.08.10 11:28, Jonathan Tan wrote:
> This patch series removes the need to add submodule ODBs as alternates
> in all codepaths covered by t7814. I believe that that is also all the
> codepaths covered by "git grep", but if it isn't, the uncovered
> codepaths will still work - they will just not benefit from the
> performance improvements.
> 
> In doing this work of migrating away from adding submodule ODBs as
> alternates, I'm mainly motivated by the possibility of adding partial
> clone submodule support, but this has benefits even for those who do not
> use partial clones, as described in the documentation in patch 1.
> 
> To reviewers: you can cherry-pick the last patch onto one of the earlier
> ones to observe what happens when the code still accesses a submodule
> object as if it were in the_repository.

Thanks for this series. I am very excited to see steps toward enabling
partial clones for submodules! And having fewer implicit references to
the_repository is great as well.

I don't have any comments that other reviewers haven't already
mentioned, so I'll just add:

Reviewed-by: Josh Steadmon <steadmon@google.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-11 21:42   ` Emily Shaffer
@ 2021-08-11 23:07     ` Ramsay Jones
  2021-08-13 16:32       ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Ramsay Jones @ 2021-08-11 23:07 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, git

On Wed, Aug 11, 2021 at 02:42:20PM -0700, Emily Shaffer wrote:
> On Tue, Aug 10, 2021 at 11:28:41AM -0700, Jonathan Tan wrote:
> > 
> > grep_source_init() can create "struct grep_source" objects and,
> > depending on the value of the type passed, some void-pointer parameters have
> > different meanings. Because one of these types (GREP_SOURCE_OID) will
> > require an additional parameter in a subsequent patch, take the
> > opportunity to increase clarity and type safety by replacing this
> > function with individual functions for each type.
> > 
> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Like Junio said, it is very neat.

[Sorry for piggy-backing, I have already deleted the original mail :( ]

Just a quick note: grep_source_init_buf() is only called from
grep.c:1833, before its definition at grep.c:1869, so it could be marked
as static (as things stand). Do you anticipate any future callers from
outside of grep.c? (after removing the declaration from grep.h, you
would need to add a forward declaration or, better, move the definition
to before the call (or the call (and grep_buffer()) after the definition)).

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 6/7] grep: add repository to OID grep sources
  2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
  2021-08-11 21:52   ` Emily Shaffer
@ 2021-08-11 23:28   ` Matheus Tavares Bernardino
  2021-08-13 16:47     ` Jonathan Tan
  1 sibling, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-11 23:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.

This patch fixes the second NEEDSWORK comment at grep_submodule(),
right? Maybe this comment could be replaced with a mention that
add_submodule_odb_by_path() is called for testing purposes, and it
should no longer produce a real addition to the alternates list?

> diff --git a/grep.h b/grep.h
> index f4a3090f1c..c5234f9b38 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -187,6 +187,7 @@ struct grep_source {
>                 GREP_SOURCE_BUF,
>         } type;
>         void *identifier;
> +       struct repository *repo; /* if GREP_SOURCE_OID */

Hmm, the grep threads now have two `struct repository` references, one
in `struct grep_source` and one in `struct grep_opt` (see
builtin/grep.c:run()). The one from `struct grep_opt` will always be
`the_repository` (in the worker threads). I wonder if, in the long
run, we could instruct the worker threads to use the new reference
from `struct grep_source` throughout grep.c, and perhaps even remove
the one from `struct grep_opt`.

This would solve the issue I mentioned in [1], about git grep
currently ignoring the textconv attributes from submodules, and
instead applying the ones from the superproject in the submodules'
files. (Here there is an example of this bug: [2] .)

It would also allow grep to use the textconv cache from each
submodule, instead of saving/reading everything from the
superproject's textconv cache.

While this doesn't happen, though, could we perhaps add a comment
somewhere to avoid any confusion regarding the two different
repository pointers that the worker threads hold?

[1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
[2]: https://gitlab.com/-/snippets/1896951

>         char *buf;
>         unsigned long size;
> @@ -198,7 +199,8 @@ struct grep_source {
>  void grep_source_init_file(struct grep_source *gs, const char *name,
>                            const char *path);
>  void grep_source_init_oid(struct grep_source *gs, const char *name,
> -                         const char *path, const struct object_id *oid);
> +                         const char *path, const struct object_id *oid,
> +                         struct repository *repo);
>  void grep_source_init_buf(struct grep_source *gs);
>  void grep_source_clear_data(struct grep_source *gs);
>  void grep_source_clear(struct grep_source *gs);
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-11 22:45   ` Matheus Tavares Bernardino
@ 2021-08-12 16:49     ` Junio C Hamano
  2021-08-13 16:33       ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Junio C Hamano @ 2021-08-12 16:49 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: Jonathan Tan, git

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>>
>> diff --git a/grep.c b/grep.c
>> index 424a39591b..ba3711dc56 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -1830,7 +1830,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
>>         struct grep_source gs;
>>         int r;
>>
>> -       grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
>> +       grep_source_init_buf(&gs);
>>         gs.buf = buf;
>>         gs.size = size;
>
> Small nit: perhaps `grep_source_init_buf()` could take `buf` and
> `size` too, so that all the fields get initialized by the same
> function.

Sounds sensible.  Thanks.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 1/7] submodule: lazily add submodule ODBs as alternates
  2021-08-11 21:33   ` Emily Shaffer
@ 2021-08-13 16:23     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:23 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> > +int register_all_submodule_odb_as_alternates(void);
> 
> Does it need to be public? Could this be a static in submodule.c
> instead?

Thanks for taking a look at this patch series.

To answer this question: no - in this patch, I need to use it from
object-file.c to actually register the submodule ODBs as alternates when
we try to read an object that is not in the superproject.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition
  2021-08-11 21:36   ` Emily Shaffer
@ 2021-08-13 16:31     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:31 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> I had wondered whether we can entirely drop add_to_alternates_memory()
> but I see on second reading that that's still used by
> register_all_submodule_odb...(). I wonder if it can become static in
> submodule.c to prevent users from dodging the envvar+BUG()?

You mean make add_to_alternates_memory() static? I'm not sure how we can
do that - alternates is a concern that extends beyond submodules.

> Curious whether add_submodule_odb() can be reduced by calling
> add_submodule_odb_by_path() internally.

That's a reasonable idea, but I don't think it works in this case - in
particular, add_submodule_odb_by_path() dups its argument whereas
add_submodule_odb() already has an allocated string that it is prepared
to give up ownership of. Also, add_submodule_odb_by_path() is only one
line, so it won't save us much.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-11 23:07     ` Ramsay Jones
@ 2021-08-13 16:32       ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:32 UTC (permalink / raw)
  To: ramsay; +Cc: emilyshaffer, jonathantanmy, git

> Just a quick note: grep_source_init_buf() is only called from
> grep.c:1833, before its definition at grep.c:1869, so it could be marked
> as static (as things stand). Do you anticipate any future callers from
> outside of grep.c? (after removing the declaration from grep.h, you
> would need to add a forward declaration or, better, move the definition
> to before the call (or the call (and grep_buffer()) after the definition)).

No, I do not - thanks for the note, and I'll make it static in the next
reroll.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 3/7] grep: typesafe versions of grep_source_init
  2021-08-12 16:49     ` Junio C Hamano
@ 2021-08-13 16:33       ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:33 UTC (permalink / raw)
  To: gitster; +Cc: matheus.bernardino, jonathantanmy, git

> > Small nit: perhaps `grep_source_init_buf()` could take `buf` and
> > `size` too, so that all the fields get initialized by the same
> > function.
> 
> Sounds sensible.  Thanks.

Makes sense - I'll do this.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 4/7] grep: read submodule entry with explicit repo
  2021-08-11 21:44   ` Emily Shaffer
@ 2021-08-13 16:39     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:39 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> > -		object = parse_object_or_die(oid, NULL);
> > +		object_type = oid_object_info(&subrepo, oid, NULL);
> 
> One thing I wonder is whether we are missing out on some error
> conditions we used to get with parse_object_or_die() by using
> oid_object_info() instead. Do we need to be more defensive in
> investigating 'oid' before calling that helper, now?

For the purposes of grep, I don't think that this matters - we just want
to find the ultimate tree that this object represents.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 5/7] grep: allocate subrepos on heap
  2021-08-11 21:50   ` Emily Shaffer
@ 2021-08-13 16:42     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:42 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> > +static struct repository **repos_to_free;
> > +static size_t repos_to_free_nr, repos_to_free_alloc;
> 
> One thing I was curious about was whether it would make more sense to
> use an existing utility library in Git to handle this. But I think we
> kind of are, since ALLOC_GROW is used, so this is idiomatic for Git
> project. Ok, fine, I guess this is life at C ;)

Yeah, this *is* the utility library :-)

> > +static void free_repos(void)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < repos_to_free_nr; i++) {
> > +		repo_clear(repos_to_free[i]);
> > +		free(repos_to_free[i]);
> > +	}
> > +	free(repos_to_free);
> > +}
> 
> Should repos_to_free_nr be reset here? I guess it doesn't make sense to,
> since we'd be trying to use-after-free the initial repos_to_free head
> pointer too if we wanted to reuse this array.

Hmm...I guess that if I'm going through the trouble of clearing the
repos instead of just letting all the memory be collected at the end of
a process, I should also reset _nr and _alloc. I'll make that change.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 6/7] grep: add repository to OID grep sources
  2021-08-11 21:52   ` Emily Shaffer
@ 2021-08-13 16:44     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:44 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote:
> > 
> > Record the repository whenever an OID grep source is created, and teach
> > the worker threads to explicitly provide the repository when accessing
> > objects.
> 
> Since this extra information is not used by this series at all, does it
> make sense to only include this patch with the series covering the rest
> of your partial-clone-with-submodules work?

It is used by this series - it prevents an instance in which the
submodule ODBs would be lazily registered as alternates. I'll add a note
in the commit message to this effect.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 6/7] grep: add repository to OID grep sources
  2021-08-11 23:28   ` Matheus Tavares Bernardino
@ 2021-08-13 16:47     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:47 UTC (permalink / raw)
  To: matheus.bernardino; +Cc: jonathantanmy, git

> On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > Record the repository whenever an OID grep source is created, and teach
> > the worker threads to explicitly provide the repository when accessing
> > objects.
> 
> This patch fixes the second NEEDSWORK comment at grep_submodule(),
> right? Maybe this comment could be replaced with a mention that
> add_submodule_odb_by_path() is called for testing purposes, and it
> should no longer produce a real addition to the alternates list?

Good catch. I'll update the comment.

> > diff --git a/grep.h b/grep.h
> > index f4a3090f1c..c5234f9b38 100644
> > --- a/grep.h
> > +++ b/grep.h
> > @@ -187,6 +187,7 @@ struct grep_source {
> >                 GREP_SOURCE_BUF,
> >         } type;
> >         void *identifier;
> > +       struct repository *repo; /* if GREP_SOURCE_OID */
> 
> Hmm, the grep threads now have two `struct repository` references, one
> in `struct grep_source` and one in `struct grep_opt` (see
> builtin/grep.c:run()). The one from `struct grep_opt` will always be
> `the_repository` (in the worker threads). I wonder if, in the long
> run, we could instruct the worker threads to use the new reference
> from `struct grep_source` throughout grep.c, and perhaps even remove
> the one from `struct grep_opt`.
> 
> This would solve the issue I mentioned in [1], about git grep
> currently ignoring the textconv attributes from submodules, and
> instead applying the ones from the superproject in the submodules'
> files. (Here there is an example of this bug: [2] .)
> 
> It would also allow grep to use the textconv cache from each
> submodule, instead of saving/reading everything from the
> superproject's textconv cache.
> 
> While this doesn't happen, though, could we perhaps add a comment
> somewhere to avoid any confusion regarding the two different
> repository pointers that the worker threads hold?
> 
> [1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
> [2]: https://gitlab.com/-/snippets/1896951

Ah...another good catch. Thanks also for a link to your email - I'll
mention it when I add the comment you suggested.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 7/7] t7814: show lack of alternate ODB-adding
  2021-08-11 22:22     ` Matheus Tavares Bernardino
@ 2021-08-13 16:50       ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:50 UTC (permalink / raw)
  To: matheus.bernardino; +Cc: emilyshaffer, jonathantanmy, git

> Hmm, there is at least one call chain in grep which might end up
> calling add_to_alternates_memory() directly (although it only seems to
> happen on a very specific case):
> 
>         grep_submodule > repo_read_gitmodules >
>         config_from_gitmodules > add_to_alternates_memory
> 
> We can check that with the following:

[snip reproduction recipe]

Thanks - it looks like my patch set is incomplete then. I'll make
config_from_gitmodules() use my new function and if the existing grep
test cases don't cover your reproduction recipe, I'll add yours in.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 1/7] submodule: lazily add submodule ODBs as alternates
  2021-08-10 21:13   ` Junio C Hamano
@ 2021-08-13 16:53     ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 16:53 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > +static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
> > +
> 
> I see 2/7 allows callers to add paths for submodule odb to this
> list.

Yes, and as you said below, add_submodule_odb() has been modified to add
to this list too.

> The "pretend as if objects in submodule odb are locally available",
> even though it may have been useful, was an ugly hack invented
> before we started adding the "access more than one repository at the
> time with the 'repo' arg" extended API.  It is pleasing to see this
> step shows an approach to incrementally migrate the users of the
> hack.

Thanks.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v2 0/8] In grep, no adding submodule ODB as alternates
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (8 preceding siblings ...)
  2021-08-11 22:49 ` Josh Steadmon
@ 2021-08-13 21:05 ` Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
                     ` (8 more replies)
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
  10 siblings, 9 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

Thanks everyone for your reviews. I believe I've addressed all of them.

The main change is that I've also made a change in submodule-config to
avoid registering submodule ODBs as alternates (thanks, Matheus, for
noticing this).

Jonathan Tan (8):
  submodule: lazily add submodule ODBs as alternates
  grep: use submodule-ODB-as-alternate lazy-addition
  grep: typesafe versions of grep_source_init
  grep: read submodule entry with explicit repo
  grep: allocate subrepos on heap
  grep: add repository to OID grep sources
  submodule-config: pass repo upon blob config read
  t7814: show lack of alternate ODB-adding

 builtin/grep.c                     | 64 +++++++++++++++++++-----------
 config.c                           | 18 ++++++---
 config.h                           |  3 ++
 grep.c                             | 51 +++++++++++++++---------
 grep.h                             | 18 +++++++--
 object-file.c                      |  5 +++
 submodule-config.c                 |  5 ++-
 submodule.c                        | 25 +++++++++++-
 submodule.h                        |  8 ++++
 t/README                           | 10 +++++
 t/t7814-grep-recurse-submodules.sh |  3 ++
 11 files changed, 156 insertions(+), 54 deletions(-)

Range-diff against v1:
1:  5994a517e8 = 1:  5994a517e8 submodule: lazily add submodule ODBs as alternates
2:  31e9b914c4 = 2:  31e9b914c4 grep: use submodule-ODB-as-alternate lazy-addition
3:  e5e6a0dc1e ! 3:  aa3f1f3c89 grep: typesafe versions of grep_source_init
    @@ builtin/grep.c: static int grep_file(struct grep_opt *opt, const char *filename)
      	if (num_threads > 1) {
     
      ## grep.c ##
    -@@ grep.c: int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
    +@@ grep.c: int grep_source(struct grep_opt *opt, struct grep_source *gs)
    + 	return grep_source_1(opt, gs, 0);
    + }
    + 
    ++static void grep_source_init_buf(struct grep_source *gs, char *buf,
    ++				 unsigned long size)
    ++{
    ++	gs->type = GREP_SOURCE_BUF;
    ++	gs->name = NULL;
    ++	gs->path = NULL;
    ++	gs->buf = buf;
    ++	gs->size = size;
    ++	gs->driver = NULL;
    ++	gs->identifier = NULL;
    ++}
    ++
    + int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
    + {
      	struct grep_source gs;
      	int r;
      
     -	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
    -+	grep_source_init_buf(&gs);
    - 	gs.buf = buf;
    - 	gs.size = size;
    +-	gs.buf = buf;
    +-	gs.size = size;
    ++	grep_source_init_buf(&gs, buf, size);
    + 
    + 	r = grep_source(opt, &gs);
      
     @@ grep.c: int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
      	return r;
    @@ grep.c: int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
     +	gs->size = 0;
     +	gs->driver = NULL;
     +	gs->identifier = oiddup(oid);
    -+}
    -+
    -+void grep_source_init_buf(struct grep_source *gs)
    -+{
    -+	gs->type = GREP_SOURCE_BUF;
    -+	gs->name = NULL;
    -+	gs->path = NULL;
    -+	gs->buf = NULL;
    -+	gs->size = 0;
    -+	gs->driver = NULL;
    -+	gs->identifier = NULL;
      }
      
      void grep_source_clear(struct grep_source *gs)
    @@ grep.h: struct grep_source {
     +			   const char *path);
     +void grep_source_init_oid(struct grep_source *gs, const char *name,
     +			  const char *path, const struct object_id *oid);
    -+void grep_source_init_buf(struct grep_source *gs);
      void grep_source_clear_data(struct grep_source *gs);
      void grep_source_clear(struct grep_source *gs);
      void grep_source_load_driver(struct grep_source *gs,
4:  30ead880b3 = 4:  050deacfb7 grep: read submodule entry with explicit repo
5:  f62608e88f ! 5:  3f24815224 grep: allocate subrepos on heap
    @@ builtin/grep.c: static void work_done(struct work_item *w)
     +		free(repos_to_free[i]);
     +	}
     +	free(repos_to_free);
    ++	repos_to_free_nr = 0;
    ++	repos_to_free_alloc = 0;
     +}
     +
      static void *run(void *arg)
6:  b2df245587 ! 6:  50c69a988b grep: add repository to OID grep sources
    @@ builtin/grep.c: static int grep_oid(struct grep_opt *opt, const struct object_id
      	strbuf_release(&pathbuf);
      
      	if (num_threads > 1) {
    +@@ builtin/grep.c: static int grep_submodule(struct grep_opt *opt,
    + 	repo_read_gitmodules(subrepo, 0);
    + 
    + 	/*
    +-	 * NEEDSWORK: This adds the submodule's object directory to the list of
    +-	 * alternates for the single in-memory object store.  This has some bad
    +-	 * consequences for memory (processed objects will never be freed) and
    +-	 * performance (this increases the number of pack files git has to pay
    +-	 * attention to, to the sum of the number of pack files in all the
    +-	 * repositories processed so far).  This can be removed once the object
    +-	 * store is no longer global and instead is a member of the repository
    +-	 * object.
    ++	 * All code paths tested by test code no longer need submodule ODBs to
    ++	 * be added as alternates, but add it to the list just in case.
    ++	 * Submodule ODBs added through add_submodule_odb_by_path() will be
    ++	 * lazily registered as alternates when needed (and except in an
    ++	 * unexpected code interaction, it won't be needed).
    + 	 */
    + 	add_submodule_odb_by_path(subrepo->objects->odb->path);
    + 	obj_read_unlock();
     
      ## grep.c ##
     @@ grep.c: void grep_source_init_file(struct grep_source *gs, const char *name,
    @@ grep.c: void grep_source_init_oid(struct grep_source *gs, const char *name,
     +	gs->repo = repo;
      }
      
    - void grep_source_init_buf(struct grep_source *gs)
    + void grep_source_clear(struct grep_source *gs)
     @@ grep.c: static int grep_source_load_oid(struct grep_source *gs)
      {
      	enum object_type type;
    @@ grep.c: static int grep_source_load_oid(struct grep_source *gs)
      			     gs->name,
     
      ## grep.h ##
    +@@ grep.h: struct grep_opt {
    + 	struct grep_pat *header_list;
    + 	struct grep_pat **header_tail;
    + 	struct grep_expr *pattern_expression;
    ++
    ++	/*
    ++	 * NEEDSWORK: See if we can remove this field, because the repository
    ++	 * should probably be per-source, not per-repo. This is potentially the
    ++	 * cause of at least one bug - "git grep" ignoring the textconv
    ++	 * attributes from submodules. See [1] for more information.
    ++	 * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
    ++	 */
    + 	struct repository *repo;
    ++
    + 	const char *prefix;
    + 	int prefix_length;
    + 	regex_t regexp;
     @@ grep.h: struct grep_source {
      		GREP_SOURCE_BUF,
      	} type;
    @@ grep.h: struct grep_source {
     -			  const char *path, const struct object_id *oid);
     +			  const char *path, const struct object_id *oid,
     +			  struct repository *repo);
    - void grep_source_init_buf(struct grep_source *gs);
      void grep_source_clear_data(struct grep_source *gs);
      void grep_source_clear(struct grep_source *gs);
    + void grep_source_load_driver(struct grep_source *gs,
-:  ---------- > 7:  94db10a4e5 submodule-config: pass repo upon blob config read
7:  f1fc89894b = 8:  4a51fcfb77 t7814: show lack of alternate ODB-adding
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v2 1/8] submodule: lazily add submodule ODBs as alternates
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

Teach Git to add submodule ODBs as alternates to the object store of
the_repository only upon the first access of an object not in
the_repository, and not when add_submodule_odb() is called.

This provides a means of gradually migrating from accessing a
submodule's object through alternates to accessing a submodule's object
by explicitly passing its repository object. Any Git command can declare
that it might access submodule objects by calling add_submodule_odb()
(as they do now), but the submodule ODBs themselves will not be added
until needed, so individual commands and/or combinations of arguments
can be migrated one by one.

[The advantage of explicit repository-object passing is code clarity (it
is clear which repository an object read is from), performance (there is
no need to linearly search through all submodule ODBs whenever an object
is accessed from any repository, whether superproject or submodule), and
the possibility of future features like partial clone submodules (which
right now is not possible because if an object is missing, we do not
know which repository to lazy-fetch into).]

This commit also introduces an environment variable that a test may set
to make the actual registration of alternates fatal, in order to
demonstrate that its codepaths do not need this registration.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-file.c |  5 +++++
 submodule.c   | 20 +++++++++++++++++++-
 submodule.h   |  7 +++++++
 t/README      | 10 ++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 3d27dc8dea..621b121bcb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -32,6 +32,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "submodule.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1592,6 +1593,10 @@ static int do_oid_object_info_extended(struct repository *r,
 				break;
 		}
 
+		if (register_all_submodule_odb_as_alternates())
+			/* We added some alternates; retry */
+			continue;
+
 		/* Check if it is a missing object */
 		if (fetch_if_missing && repo_has_promisor_remote(r) &&
 		    !already_retried &&
diff --git a/submodule.c b/submodule.c
index 8e611fe1db..8fde90e906 100644
--- a/submodule.c
+++ b/submodule.c
@@ -165,6 +165,8 @@ void stage_updated_gitmodules(struct index_state *istate)
 		die(_("staging updated .gitmodules failed"));
 }
 
+static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
+
 /* TODO: remove this function, use repo_submodule_init instead. */
 int add_submodule_odb(const char *path)
 {
@@ -178,12 +180,28 @@ int add_submodule_odb(const char *path)
 		ret = -1;
 		goto done;
 	}
-	add_to_alternates_memory(objects_directory.buf);
+	string_list_insert(&added_submodule_odb_paths,
+			   strbuf_detach(&objects_directory, NULL));
 done:
 	strbuf_release(&objects_directory);
 	return ret;
 }
 
+int register_all_submodule_odb_as_alternates(void)
+{
+	int i;
+	int ret = added_submodule_odb_paths.nr;
+
+	for (i = 0; i < added_submodule_odb_paths.nr; i++)
+		add_to_alternates_memory(added_submodule_odb_paths.items[i].string);
+	if (ret) {
+		string_list_clear(&added_submodule_odb_paths, 0);
+		if (git_env_bool("GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB", 0))
+			BUG("register_all_submodule_odb_as_alternates() called");
+	}
+	return ret;
+}
+
 void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 					     const char *path)
 {
diff --git a/submodule.h b/submodule.h
index 84640c49c1..c252784bc2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -97,7 +97,14 @@ int submodule_uses_gitfile(const char *path);
 #define SUBMODULE_REMOVAL_IGNORE_IGNORED_UNTRACKED (1<<2)
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
+/*
+ * Call add_submodule_odb() to add the submodule at the given path to a list.
+ * When register_all_submodule_odb_as_alternates() is called, the object stores
+ * of all submodules in that list will be added as alternates in
+ * the_repository.
+ */
 int add_submodule_odb(const char *path);
+int register_all_submodule_odb_as_alternates(void);
 
 /*
  * Checks if there are submodule changes in a..b. If a is the null OID,
diff --git a/t/README b/t/README
index 9e70122302..8b67b4f00b 100644
--- a/t/README
+++ b/t/README
@@ -448,6 +448,16 @@ GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
 to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
 execution of the parallel-checkout code.
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=<boolean>, when true, makes
+registering submodule ODBs as alternates a fatal action. Support for
+this environment variable can be removed once the migration to
+explicitly providing repositories when accessing submodule objects is
+complete (in which case we might want to replace this with a trace2
+call so that users can make it visible if accessing submodule objects
+without an explicit repository still happens) or needs to be abandoned
+for whatever reason (in which case the migrated codepaths still retain
+their performance benefits).
+
 Naming Tests
 ------------
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 2 +-
 submodule.c    | 5 +++++
 submodule.h    | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7d2f8e5adb..87bcb934a2 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -450,7 +450,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_to_alternates_memory(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo.objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
diff --git a/submodule.c b/submodule.c
index 8fde90e906..8de1aecaeb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -187,6 +187,11 @@ int add_submodule_odb(const char *path)
 	return ret;
 }
 
+void add_submodule_odb_by_path(const char *path)
+{
+	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
+}
+
 int register_all_submodule_odb_as_alternates(void)
 {
 	int i;
diff --git a/submodule.h b/submodule.h
index c252784bc2..17a06cc43b 100644
--- a/submodule.h
+++ b/submodule.h
@@ -104,6 +104,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags);
  * the_repository.
  */
 int add_submodule_odb(const char *path);
+void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
 /*
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 3/8] grep: typesafe versions of grep_source_init
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-16 15:06     ` Matheus Tavares Bernardino
  2021-08-13 21:05   ` [PATCH v2 4/8] grep: read submodule entry with explicit repo Jonathan Tan
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c |  4 ++--
 grep.c         | 46 ++++++++++++++++++++++++++++------------------
 grep.h         |  7 ++++---
 3 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 87bcb934a2..e454335e9d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -333,7 +333,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init(&gs, GREP_SOURCE_OID, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
@@ -359,7 +359,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, 0, &buf);
-	grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
+	grep_source_init_file(&gs, buf.buf, filename);
 	strbuf_release(&buf);
 
 	if (num_threads > 1) {
diff --git a/grep.c b/grep.c
index 424a39591b..8a8105c2eb 100644
--- a/grep.c
+++ b/grep.c
@@ -1825,14 +1825,24 @@ int grep_source(struct grep_opt *opt, struct grep_source *gs)
 	return grep_source_1(opt, gs, 0);
 }
 
+static void grep_source_init_buf(struct grep_source *gs, char *buf,
+				 unsigned long size)
+{
+	gs->type = GREP_SOURCE_BUF;
+	gs->name = NULL;
+	gs->path = NULL;
+	gs->buf = buf;
+	gs->size = size;
+	gs->driver = NULL;
+	gs->identifier = NULL;
+}
+
 int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 {
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
-	gs.buf = buf;
-	gs.size = size;
+	grep_source_init_buf(&gs, buf, size);
 
 	r = grep_source(opt, &gs);
 
@@ -1840,28 +1850,28 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	return r;
 }
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier)
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path)
 {
-	gs->type = type;
+	gs->type = GREP_SOURCE_FILE;
 	gs->name = xstrdup_or_null(name);
 	gs->path = xstrdup_or_null(path);
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
+	gs->identifier = xstrdup(path);
+}
 
-	switch (type) {
-	case GREP_SOURCE_FILE:
-		gs->identifier = xstrdup(identifier);
-		break;
-	case GREP_SOURCE_OID:
-		gs->identifier = oiddup(identifier);
-		break;
-	case GREP_SOURCE_BUF:
-		gs->identifier = NULL;
-		break;
-	}
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid)
+{
+	gs->type = GREP_SOURCE_OID;
+	gs->name = xstrdup_or_null(name);
+	gs->path = xstrdup_or_null(path);
+	gs->buf = NULL;
+	gs->size = 0;
+	gs->driver = NULL;
+	gs->identifier = oiddup(oid);
 }
 
 void grep_source_clear(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 72f82b1e30..480b3f5bba 100644
--- a/grep.h
+++ b/grep.h
@@ -195,9 +195,10 @@ struct grep_source {
 	struct userdiff_driver *driver;
 };
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier);
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path);
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs,
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 4/8] grep: read submodule entry with explicit repo
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

Replace an existing parse_object_or_die() call (which implicitly works
on the_repository) with a function call that allows a repository to be
passed in. There is no such direct equivalent to parse_object_or_die(),
but we only need the type of the object, so replace with
oid_object_info().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index e454335e9d..9e61c7c993 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -457,27 +457,27 @@ static int grep_submodule(struct grep_opt *opt,
 	subopt.repo = &subrepo;
 
 	if (oid) {
-		struct object *object;
+		enum object_type object_type;
 		struct tree_desc tree;
 		void *data;
 		unsigned long size;
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object = parse_object_or_die(oid, NULL);
+		object_type = oid_object_info(&subrepo, oid, NULL);
 		obj_read_unlock();
 		data = read_object_with_reference(&subrepo,
-						  &object->oid, tree_type,
+						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
-			die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
+			die(_("unable to read tree (%s)"), oid_to_hex(oid));
 
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
-				object->type == OBJ_COMMIT);
+				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
 		free(data);
 	} else {
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 5/8] grep: allocate subrepos on heap
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 4/8] grep: read submodule entry with explicit repo Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-13 21:44     ` Junio C Hamano
  2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

Currently, struct repository objects corresponding to submodules are
allocated on the stack in grep_submodule(). This currently works because
they will not be used once grep_submodule() exits, but a subsequent
patch will require these structs to be accessible for longer (perhaps
even in another thread). Allocate them on the heap and clear them only
at the very end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9e61c7c993..69d8ea0808 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -65,6 +65,9 @@ static int todo_done;
 /* Has all work items been added? */
 static int all_work_added;
 
+static struct repository **repos_to_free;
+static size_t repos_to_free_nr, repos_to_free_alloc;
+
 /* This lock protects all the variables above. */
 static pthread_mutex_t grep_mutex;
 
@@ -168,6 +171,19 @@ static void work_done(struct work_item *w)
 	grep_unlock();
 }
 
+static void free_repos(void)
+{
+	int i;
+
+	for (i = 0; i < repos_to_free_nr; i++) {
+		repo_clear(repos_to_free[i]);
+		free(repos_to_free[i]);
+	}
+	free(repos_to_free);
+	repos_to_free_nr = 0;
+	repos_to_free_alloc = 0;
+}
+
 static void *run(void *arg)
 {
 	int hit = 0;
@@ -415,19 +431,24 @@ static int grep_submodule(struct grep_opt *opt,
 			  const struct object_id *oid,
 			  const char *filename, const char *path, int cached)
 {
-	struct repository subrepo;
+	struct repository *subrepo;
 	struct repository *superproject = opt->repo;
 	const struct submodule *sub;
 	struct grep_opt subopt;
-	int hit;
+	int hit = 0;
 
 	sub = submodule_from_path(superproject, null_oid(), path);
 
 	if (!is_submodule_active(superproject, path))
 		return 0;
 
-	if (repo_submodule_init(&subrepo, superproject, sub))
+	subrepo = xmalloc(sizeof(*subrepo));
+	if (repo_submodule_init(subrepo, superproject, sub)) {
+		free(subrepo);
 		return 0;
+	}
+	ALLOC_GROW(repos_to_free, repos_to_free_nr + 1, repos_to_free_alloc);
+	repos_to_free[repos_to_free_nr++] = subrepo;
 
 	/*
 	 * NEEDSWORK: repo_read_gitmodules() might call
@@ -438,7 +459,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * subrepo's odbs to the in-memory alternates list.
 	 */
 	obj_read_lock();
-	repo_read_gitmodules(&subrepo, 0);
+	repo_read_gitmodules(subrepo, 0);
 
 	/*
 	 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -450,11 +471,11 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_submodule_odb_by_path(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo->objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
-	subopt.repo = &subrepo;
+	subopt.repo = subrepo;
 
 	if (oid) {
 		enum object_type object_type;
@@ -464,9 +485,9 @@ static int grep_submodule(struct grep_opt *opt,
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object_type = oid_object_info(&subrepo, oid, NULL);
+		object_type = oid_object_info(subrepo, oid, NULL);
 		obj_read_unlock();
-		data = read_object_with_reference(&subrepo,
+		data = read_object_with_reference(subrepo,
 						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
@@ -484,7 +505,6 @@ static int grep_submodule(struct grep_opt *opt,
 		hit = grep_cache(&subopt, pathspec, cached);
 	}
 
-	repo_clear(&subrepo);
 	return hit;
 }
 
@@ -1182,5 +1202,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		run_pager(&opt, prefix);
 	clear_pathspec(&pathspec);
 	free_grep_patterns(&opt);
+	free_repos();
 	return !hit;
 }
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 6/8] grep: add repository to OID grep sources
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-16 14:48     ` Matheus Tavares Bernardino
  2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 15 ++++++---------
 grep.c         |  7 +++++--
 grep.h         | 13 ++++++++++++-
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 69d8ea0808..d27e95e092 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -349,7 +349,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
@@ -462,14 +462,11 @@ static int grep_submodule(struct grep_opt *opt,
 	repo_read_gitmodules(subrepo, 0);
 
 	/*
-	 * NEEDSWORK: This adds the submodule's object directory to the list of
-	 * alternates for the single in-memory object store.  This has some bad
-	 * consequences for memory (processed objects will never be freed) and
-	 * performance (this increases the number of pack files git has to pay
-	 * attention to, to the sum of the number of pack files in all the
-	 * repositories processed so far).  This can be removed once the object
-	 * store is no longer global and instead is a member of the repository
-	 * object.
+	 * All code paths tested by test code no longer need submodule ODBs to
+	 * be added as alternates, but add it to the list just in case.
+	 * Submodule ODBs added through add_submodule_odb_by_path() will be
+	 * lazily registered as alternates when needed (and except in an
+	 * unexpected code interaction, it won't be needed).
 	 */
 	add_submodule_odb_by_path(subrepo->objects->odb->path);
 	obj_read_unlock();
diff --git a/grep.c b/grep.c
index 8a8105c2eb..79598f245f 100644
--- a/grep.c
+++ b/grep.c
@@ -1863,7 +1863,8 @@ void grep_source_init_file(struct grep_source *gs, const char *name,
 }
 
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid)
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo)
 {
 	gs->type = GREP_SOURCE_OID;
 	gs->name = xstrdup_or_null(name);
@@ -1872,6 +1873,7 @@ void grep_source_init_oid(struct grep_source *gs, const char *name,
 	gs->size = 0;
 	gs->driver = NULL;
 	gs->identifier = oiddup(oid);
+	gs->repo = repo;
 }
 
 void grep_source_clear(struct grep_source *gs)
@@ -1900,7 +1902,8 @@ static int grep_source_load_oid(struct grep_source *gs)
 {
 	enum object_type type;
 
-	gs->buf = read_object_file(gs->identifier, &type, &gs->size);
+	gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type,
+					&gs->size);
 	if (!gs->buf)
 		return error(_("'%s': unable to read %s"),
 			     gs->name,
diff --git a/grep.h b/grep.h
index 480b3f5bba..24c6b64cea 100644
--- a/grep.h
+++ b/grep.h
@@ -120,7 +120,16 @@ struct grep_opt {
 	struct grep_pat *header_list;
 	struct grep_pat **header_tail;
 	struct grep_expr *pattern_expression;
+
+	/*
+	 * NEEDSWORK: See if we can remove this field, because the repository
+	 * should probably be per-source, not per-repo. This is potentially the
+	 * cause of at least one bug - "git grep" ignoring the textconv
+	 * attributes from submodules. See [1] for more information.
+	 * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
+	 */
 	struct repository *repo;
+
 	const char *prefix;
 	int prefix_length;
 	regex_t regexp;
@@ -187,6 +196,7 @@ struct grep_source {
 		GREP_SOURCE_BUF,
 	} type;
 	void *identifier;
+	struct repository *repo; /* if GREP_SOURCE_OID */
 
 	char *buf;
 	unsigned long size;
@@ -198,7 +208,8 @@ struct grep_source {
 void grep_source_init_file(struct grep_source *gs, const char *name,
 			   const char *path);
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid);
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs,
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (5 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-16 14:32     ` Matheus Tavares Bernardino
  2021-08-16 15:48     ` Matheus Tavares Bernardino
  2021-08-13 21:05   ` [PATCH v2 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
  2021-08-16 15:14   ` [PATCH v2 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
  8 siblings, 2 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c           | 18 ++++++++++++------
 config.h           |  3 +++
 submodule-config.c |  5 +++--
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/config.c b/config.c
index f33abeab85..a85c12e6cc 100644
--- a/config.c
+++ b/config.c
@@ -1796,6 +1796,7 @@ int git_config_from_mem(config_fn_t fn,
 
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
+			      struct repository *repo,
 			      const struct object_id *oid,
 			      void *data)
 {
@@ -1804,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	unsigned long size;
 	int ret;
 
-	buf = read_object_file(oid, &type, &size);
+	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
 		return error(_("unable to load config blob object '%s'"), name);
 	if (type != OBJ_BLOB) {
@@ -1820,6 +1821,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 }
 
 static int git_config_from_blob_ref(config_fn_t fn,
+				    struct repository *repo,
 				    const char *name,
 				    void *data)
 {
@@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
 
 	if (get_oid(name, &oid) < 0)
 		return error(_("unable to resolve config blob '%s'"), name);
-	return git_config_from_blob_oid(fn, name, &oid, data);
+	return git_config_from_blob_oid(fn, name, repo, &oid, data);
 }
 
 char *git_system_config(void)
@@ -1958,12 +1960,16 @@ int config_with_options(config_fn_t fn, void *data,
 	 * If we have a specific filename, use it. Otherwise, follow the
 	 * regular lookup sequence.
 	 */
-	if (config_source && config_source->use_stdin)
+	if (config_source && config_source->use_stdin) {
 		return git_config_from_stdin(fn, data);
-	else if (config_source && config_source->file)
+	} else if (config_source && config_source->file) {
 		return git_config_from_file(fn, config_source->file, data);
-	else if (config_source && config_source->blob)
-		return git_config_from_blob_ref(fn, config_source->blob, data);
+	} else if (config_source && config_source->blob) {
+		struct repository *repo = config_source->repo ?
+			config_source->repo : the_repository;
+		return git_config_from_blob_ref(fn, repo, config_source->blob,
+						data);
+	}
 
 	return do_git_config_sequence(opts, fn, data);
 }
diff --git a/config.h b/config.h
index a2200f3111..147f5e0490 100644
--- a/config.h
+++ b/config.h
@@ -49,6 +49,8 @@ const char *config_scope_name(enum config_scope scope);
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
+	/* The repository if blob is not NULL; leave blank for the_repository */
+	struct repository *repo;
 	const char *blob;
 	enum config_scope scope;
 };
@@ -136,6 +138,7 @@ int git_config_from_mem(config_fn_t fn,
 			const char *buf, size_t len,
 			void *data, const struct config_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
+			     struct repository *repo,
 			     const struct object_id *oid, void *data);
 void git_config_push_parameter(const char *text);
 void git_config_push_env(const char *spec);
diff --git a/submodule-config.c b/submodule-config.c
index 2026120fb3..f95344028b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 			config_source.file = file;
 		} else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
 			   repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
+			config_source.repo = repo;
 			config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
 			if (repo != the_repository)
-				add_to_alternates_memory(repo->objects->odb->path);
+				add_submodule_odb_by_path(repo->objects->odb->path);
 		} else {
 			goto out;
 		}
@@ -702,7 +703,7 @@ void gitmodules_config_oid(const struct object_id *commit_oid)
 
 	if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) {
 		git_config_from_blob_oid(gitmodules_cb, rev.buf,
-					 &oid, the_repository);
+					 the_repository, &oid, the_repository);
 	}
 	strbuf_release(&rev);
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 8/8] t7814: show lack of alternate ODB-adding
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (6 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
@ 2021-08-13 21:05   ` Jonathan Tan
  2021-08-16 15:14   ` [PATCH v2 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
  8 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-13 21:05 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, matheus.bernardino, emilyshaffer, gitster, ramsay,
	steadmon

The previous patches have made "git grep" no longer need to add
submodule ODBs as alternates, at least for the code paths tested in
t7814. Demonstrate this by making adding a submodule ODB as an alternate
fatal in this test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 t/t7814-grep-recurse-submodules.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 828cb3ba58..3172f5b936 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -8,6 +8,9 @@ submodules.
 
 . ./test-lib.sh
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
+export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
+
 test_expect_success 'setup directory structure and submodule' '
 	echo "(1|2)d(3|4)" >a &&
 	mkdir b &&
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 5/8] grep: allocate subrepos on heap
  2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
@ 2021-08-13 21:44     ` Junio C Hamano
  2021-08-16 19:42       ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Junio C Hamano @ 2021-08-13 21:44 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matheus.bernardino, emilyshaffer, ramsay, steadmon

Jonathan Tan <jonathantanmy@google.com> writes:

> +static void free_repos(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < repos_to_free_nr; i++) {
> +		repo_clear(repos_to_free[i]);
> +		free(repos_to_free[i]);
> +	}
> +	free(repos_to_free);
> +	repos_to_free_nr = 0;
> +	repos_to_free_alloc = 0;

The clearing of nr/alloc is new in this round.

It does not matter if we won't using anything that allocates
repositories and accumulates them in repos_to_free after we call
free_repos() once, but then clearing the nr/alloc would not matter,
either, so it may be more consistent to FREE_AND_NULL(repos_to_free)
here, not just free(), to prepare for another call to ALLOC_GROW()
on the <repos_to_free, repos_to_free_nr, repos_to_free_alloc> tuple,
which eventually will call into REALLOC_ARRAY() on the pointer, I
would think.



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
@ 2021-08-16 14:32     ` Matheus Tavares Bernardino
  2021-08-16 19:57       ` Matheus Tavares Bernardino
  2021-08-16 20:02       ` Jonathan Tan
  2021-08-16 15:48     ` Matheus Tavares Bernardino
  1 sibling, 2 replies; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 14:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, steadmon

On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When reading the config of a submodule, if reading from a blob, read
> using an explicitly specified repository instead of by adding the
> submodule's ODB as an alternate and then reading an object from
> the_repository.

Great!

At first, I thought this would also allow us to remove another
NEEDSWORK comment in grep_submodule(), together with a lock
protection:

/*
 * NEEDSWORK: repo_read_gitmodules() might call
 * add_to_alternates_memory() via config_from_gitmodules(). This
 * operation causes a race condition with concurrent object readings
 * performed by the worker threads. That's why we need obj_read_lock()
 * here. It should be removed once it's no longer necessary to add the
 * subrepo's odbs to the in-memory alternates list.
 */
obj_read_lock();
repo_read_gitmodules(subrepo, 0);

Back when I wrote this comment, my conclusion was that the alternates
mechanics were the only thread-unsafe object-reading operations in
repo_read_gitmodules()'s call chains. So once the add-to-alternates
mechanics were gone, we could also remove the lock.

But with further inspection now, I see that this is not really the
case. For example, we have a few global variables in packfile.c
collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
which are updated on obj readings from both the_repository *and*
submodules. So I no longer think its safe to remove the
obj_read_lock() protection here, as the NEEDSWORK comment suggests,
even if we are not using the alternates list anymore.

Do you want to remove this comment in your patchset? I can also send a
follow-up patch explaining this situation and removing the comment
(but not the locking), if you prefer.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
[...]
> diff --git a/submodule-config.c b/submodule-config.c
> index 2026120fb3..f95344028b 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
>                         config_source.file = file;
>                 } else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
>                            repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
> +                       config_source.repo = repo;
>                         config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
>                         if (repo != the_repository)
> -                               add_to_alternates_memory(repo->objects->odb->path);
> +                               add_submodule_odb_by_path(repo->objects->odb->path);

Ok. Like in grep_submodule(), this should no longer add the submodule
ODB to the alternates list, so this call is now mostly used as a
fallback and also for testing.

To see if we are indeed testing this add-to-alternates case, I
reverted the change that made the code read from the submodule instead
of the_repository:

diff --git a/config.c b/config.c
index a85c12e6cc..cd37a9dcd9 100644
--- a/config.c
+++ b/config.c
@@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
        unsigned long size;
        int ret;

-       buf = repo_read_object_file(repo, oid, &type, &size);
+       buf = read_object_file(oid, &type, &size);

Then, I ran t7814-grep-recurse-submodules.sh , where you've added the
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly
produced the following error:

BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called
[...]
not ok 23 - grep --recurse-submodules with submodules without
.gitmodules in the working tree

Nice! So the change made by this patch is covered by test 23. I think
it would be nice to mention that in this patch's message.

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 6/8] grep: add repository to OID grep sources
  2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
@ 2021-08-16 14:48     ` Matheus Tavares Bernardino
  2021-08-16 19:44       ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 14:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, steadmon

On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/grep.c | 15 ++++++---------
>  grep.c         |  7 +++++--
>  grep.h         | 13 ++++++++++++-
>  3 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 69d8ea0808..d27e95e092 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -349,7 +349,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
>         struct grep_source gs;
>
>         grep_source_name(opt, filename, tree_name_len, &pathbuf);
> -       grep_source_init_oid(&gs, pathbuf.buf, path, oid);
> +       grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo);
>         strbuf_release(&pathbuf);
>
>         if (num_threads > 1) {
> @@ -462,14 +462,11 @@ static int grep_submodule(struct grep_opt *opt,
>         repo_read_gitmodules(subrepo, 0);
>
>         /*
> -        * NEEDSWORK: This adds the submodule's object directory to the list of
> -        * alternates for the single in-memory object store.  This has some bad
> -        * consequences for memory (processed objects will never be freed) and
> -        * performance (this increases the number of pack files git has to pay
> -        * attention to, to the sum of the number of pack files in all the
> -        * repositories processed so far).  This can be removed once the object
> -        * store is no longer global and instead is a member of the repository
> -        * object.
> +        * All code paths tested by test code no longer need submodule ODBs to
> +        * be added as alternates, but add it to the list just in case.
> +        * Submodule ODBs added through add_submodule_odb_by_path() will be
> +        * lazily registered as alternates when needed (and except in an
> +        * unexpected code interaction, it won't be needed).

Nice.

>          */
>         add_submodule_odb_by_path(subrepo->objects->odb->path);
>         obj_read_unlock();
> diff --git a/grep.c b/grep.c
> index 8a8105c2eb..79598f245f 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1863,7 +1863,8 @@ void grep_source_init_file(struct grep_source *gs, const char *name,
>  }
>
>  void grep_source_init_oid(struct grep_source *gs, const char *name,
> -                         const char *path, const struct object_id *oid)
> +                         const char *path, const struct object_id *oid,
> +                         struct repository *repo)
>  {
>         gs->type = GREP_SOURCE_OID;
>         gs->name = xstrdup_or_null(name);
> @@ -1872,6 +1873,7 @@ void grep_source_init_oid(struct grep_source *gs, const char *name,
>         gs->size = 0;
>         gs->driver = NULL;
>         gs->identifier = oiddup(oid);
> +       gs->repo = repo;
>  }
>
>  void grep_source_clear(struct grep_source *gs)
> @@ -1900,7 +1902,8 @@ static int grep_source_load_oid(struct grep_source *gs)
>  {
>         enum object_type type;
>
> -       gs->buf = read_object_file(gs->identifier, &type, &gs->size);
> +       gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type,
> +                                       &gs->size);
>         if (!gs->buf)
>                 return error(_("'%s': unable to read %s"),
>                              gs->name,
> diff --git a/grep.h b/grep.h
> index 480b3f5bba..24c6b64cea 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -120,7 +120,16 @@ struct grep_opt {
>         struct grep_pat *header_list;
>         struct grep_pat **header_tail;
>         struct grep_expr *pattern_expression;
> +
> +       /*
> +        * NEEDSWORK: See if we can remove this field, because the repository
> +        * should probably be per-source, not per-repo.

Hmm, I think the "not per-repo" part is a bit confusing, as it refers
to "the repository" ("the repository should not be per-repo"?) Could
we remove that part?

Maybe we could also be a bit more specific regarding the suggested
conversion:  "See if we can remove this field, because the repository
should probably be per-source. That is, grep.c functions using
`grep_opt.repo` should probably start using `grep_source.repo`
instead." (But that's nitpicking from my part, feel free to ignore
it.)

>             [...] This is potentially the
> +        * cause of at least one bug - "git grep" ignoring the textconv
> +        * attributes from submodules. See [1] for more information.
> +        * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
> +        */
>         struct repository *repo;
> +
>         const char *prefix;
>         int prefix_length;
>         regex_t regexp;
> @@ -187,6 +196,7 @@ struct grep_source {
>                 GREP_SOURCE_BUF,
>         } type;
>         void *identifier;
> +       struct repository *repo; /* if GREP_SOURCE_OID */
>
>         char *buf;
>         unsigned long size;
> @@ -198,7 +208,8 @@ struct grep_source {
>  void grep_source_init_file(struct grep_source *gs, const char *name,
>                            const char *path);
>  void grep_source_init_oid(struct grep_source *gs, const char *name,
> -                         const char *path, const struct object_id *oid);
> +                         const char *path, const struct object_id *oid,
> +                         struct repository *repo);
>  void grep_source_clear_data(struct grep_source *gs);
>  void grep_source_clear(struct grep_source *gs);
>  void grep_source_load_driver(struct grep_source *gs,
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 3/8] grep: typesafe versions of grep_source_init
  2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
@ 2021-08-16 15:06     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 15:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, steadmon

On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> diff --git a/grep.c b/grep.c
> index 424a39591b..8a8105c2eb 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1840,28 +1850,28 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
>         return r;
>  }
>
> -void grep_source_init(struct grep_source *gs, enum grep_source_type type,
> -                     const char *name, const char *path,
> -                     const void *identifier)
> +void grep_source_init_file(struct grep_source *gs, const char *name,
> +                          const char *path)
>  {
> -       gs->type = type;
> +       gs->type = GREP_SOURCE_FILE;
>         gs->name = xstrdup_or_null(name);
>         gs->path = xstrdup_or_null(path);
>         gs->buf = NULL;
>         gs->size = 0;
>         gs->driver = NULL;
> +       gs->identifier = xstrdup(path);

At first, it seemed a bit odd to me that we use
`xstrdup_or_null(path)` first but `xtsrdup(path)` later. But, I saw
that we already did this before this patch, and that the second call
is fine as `path` is required to be non-NULL in this case (i.e.
GREP_SOURCE_FILE), because we need it for grep_source_load_file()
later. (Also, the only caller is correctly passing a non-NULL buffer,
so all is good.)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 0/8] In grep, no adding submodule ODB as alternates
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
                     ` (7 preceding siblings ...)
  2021-08-13 21:05   ` [PATCH v2 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
@ 2021-08-16 15:14   ` Matheus Tavares Bernardino
  8 siblings, 0 replies; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 15:14 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, steadmon

On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks everyone for your reviews. I believe I've addressed all of them.
>
> The main change is that I've also made a change in submodule-config to
> avoid registering submodule ODBs as alternates (thanks, Matheus, for
> noticing this).

Thanks for working on this! I reviewed the entire series and left a
few comments on some small things, but overall it looks good to me.

Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
  2021-08-16 14:32     ` Matheus Tavares Bernardino
@ 2021-08-16 15:48     ` Matheus Tavares Bernardino
  2021-08-16 20:09       ` Jonathan Tan
  1 sibling, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 15:48 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, Josh Steadmon

On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>

Oops, I accidentally deleted this part in my previous reply:

> diff --git a/config.c b/config.c
> index f33abeab85..a85c12e6cc 100644
> --- a/config.c
> +++ b/config.c
> @@ -1820,6 +1821,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>  }
>
>  static int git_config_from_blob_ref(config_fn_t fn,
> +                                   struct repository *repo,
>                                     const char *name,
>                                     void *data)
>  {
> @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
>
>         if (get_oid(name, &oid) < 0)

This should be `repo_get_oid(repo, ...)` now.

>                 return error(_("unable to resolve config blob '%s'"), name);
> -       return git_config_from_blob_oid(fn, name, &oid, data);
> +       return git_config_from_blob_oid(fn, name, repo, &oid, data);
>  }

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 5/8] grep: allocate subrepos on heap
  2021-08-13 21:44     ` Junio C Hamano
@ 2021-08-16 19:42       ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 19:42 UTC (permalink / raw)
  To: gitster
  Cc: jonathantanmy, git, matheus.bernardino, emilyshaffer, ramsay, steadmon

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +static void free_repos(void)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < repos_to_free_nr; i++) {
> > +		repo_clear(repos_to_free[i]);
> > +		free(repos_to_free[i]);
> > +	}
> > +	free(repos_to_free);
> > +	repos_to_free_nr = 0;
> > +	repos_to_free_alloc = 0;
> 
> The clearing of nr/alloc is new in this round.
> 
> It does not matter if we won't using anything that allocates
> repositories and accumulates them in repos_to_free after we call
> free_repos() once, but then clearing the nr/alloc would not matter,
> either, so it may be more consistent to FREE_AND_NULL(repos_to_free)
> here, not just free(), to prepare for another call to ALLOC_GROW()
> on the <repos_to_free, repos_to_free_nr, repos_to_free_alloc> tuple,
> which eventually will call into REALLOC_ARRAY() on the pointer, I
> would think.

Yes, FREE_AND_NULL is more consistent. I'll change it.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 6/8] grep: add repository to OID grep sources
  2021-08-16 14:48     ` Matheus Tavares Bernardino
@ 2021-08-16 19:44       ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 19:44 UTC (permalink / raw)
  To: matheus.bernardino
  Cc: jonathantanmy, git, emilyshaffer, gitster, ramsay, steadmon

> > @@ -120,7 +120,16 @@ struct grep_opt {
> >         struct grep_pat *header_list;
> >         struct grep_pat **header_tail;
> >         struct grep_expr *pattern_expression;
> > +
> > +       /*
> > +        * NEEDSWORK: See if we can remove this field, because the repository
> > +        * should probably be per-source, not per-repo.
> 
> Hmm, I think the "not per-repo" part is a bit confusing, as it refers
> to "the repository" ("the repository should not be per-repo"?) Could
> we remove that part?
> 
> Maybe we could also be a bit more specific regarding the suggested
> conversion:  "See if we can remove this field, because the repository
> should probably be per-source. That is, grep.c functions using
> `grep_opt.repo` should probably start using `grep_source.repo`
> instead." (But that's nitpicking from my part, feel free to ignore
> it.)

Thanks - your suggestion is much clearer, so I'll use it.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-16 14:32     ` Matheus Tavares Bernardino
@ 2021-08-16 19:57       ` Matheus Tavares Bernardino
  2021-08-16 20:02       ` Jonathan Tan
  1 sibling, 0 replies; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-16 19:57 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Emily Shaffer, Junio C Hamano, Ramsay Jones, Josh Steadmon

On Mon, Aug 16, 2021 at 11:32 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > When reading the config of a submodule, if reading from a blob, read
> > using an explicitly specified repository instead of by adding the
> > submodule's ODB as an alternate and then reading an object from
> > the_repository.
>
> Great!
>
> At first, I thought this would also allow us to remove another
> NEEDSWORK comment in grep_submodule(), together with a lock
> protection:
>
> /*
>  * NEEDSWORK: repo_read_gitmodules() might call
>  * add_to_alternates_memory() via config_from_gitmodules(). This
>  * operation causes a race condition with concurrent object readings
>  * performed by the worker threads. That's why we need obj_read_lock()
>  * here. It should be removed once it's no longer necessary to add the
>  * subrepo's odbs to the in-memory alternates list.
>  */
> obj_read_lock();
> repo_read_gitmodules(subrepo, 0);
>
> Back when I wrote this comment, my conclusion was that the alternates
> mechanics were the only thread-unsafe object-reading operations in
> repo_read_gitmodules()'s call chains. So once the add-to-alternates
> mechanics were gone, we could also remove the lock.
>
> But with further inspection now, I see that this is not really the
> case. For example, we have a few global variables in packfile.c
> collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
> which are updated on obj readings from both the_repository *and*
> submodules.

Sorry, this is incorrect. I forgot that repo_read_object_file() (which
is part of repo_read_gitmodules()'s call chain) also acquires the
obj_read_mutex before accessing those global variables. So the
NEEDSWORK might be right.

Nevertheless, I think it might be better to look into
repo_read_gitmodules() more carefully before removing this lock. And
this is something for another series. Sorry about the noise.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-16 14:32     ` Matheus Tavares Bernardino
  2021-08-16 19:57       ` Matheus Tavares Bernardino
@ 2021-08-16 20:02       ` Jonathan Tan
  1 sibling, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 20:02 UTC (permalink / raw)
  To: matheus.bernardino
  Cc: jonathantanmy, git, emilyshaffer, gitster, ramsay, steadmon

> /*
>  * NEEDSWORK: repo_read_gitmodules() might call
>  * add_to_alternates_memory() via config_from_gitmodules(). This
>  * operation causes a race condition with concurrent object readings
>  * performed by the worker threads. That's why we need obj_read_lock()
>  * here. It should be removed once it's no longer necessary to add the
>  * subrepo's odbs to the in-memory alternates list.
>  */
> obj_read_lock();
> repo_read_gitmodules(subrepo, 0);
> 
> Back when I wrote this comment, my conclusion was that the alternates
> mechanics were the only thread-unsafe object-reading operations in
> repo_read_gitmodules()'s call chains. So once the add-to-alternates
> mechanics were gone, we could also remove the lock.
> 
> But with further inspection now, I see that this is not really the
> case. For example, we have a few global variables in packfile.c
> collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
> which are updated on obj readings from both the_repository *and*
> submodules. So I no longer think its safe to remove the
> obj_read_lock() protection here, as the NEEDSWORK comment suggests,
> even if we are not using the alternates list anymore.
> 
> Do you want to remove this comment in your patchset? I can also send a
> follow-up patch explaining this situation and removing the comment
> (but not the locking), if you prefer.

I think you can make the patch yourself - the comment change you
describe seems unrelated to this patch set.

> Ok. Like in grep_submodule(), this should no longer add the submodule
> ODB to the alternates list, so this call is now mostly used as a
> fallback and also for testing.
> 
> To see if we are indeed testing this add-to-alternates case, I
> reverted the change that made the code read from the submodule instead
> of the_repository:
> 
> diff --git a/config.c b/config.c
> index a85c12e6cc..cd37a9dcd9 100644
> --- a/config.c
> +++ b/config.c
> @@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>         unsigned long size;
>         int ret;
> 
> -       buf = repo_read_object_file(repo, oid, &type, &size);
> +       buf = read_object_file(oid, &type, &size);
> 
> Then, I ran t7814-grep-recurse-submodules.sh , where you've added the
> GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly
> produced the following error:
> 
> BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called
> [...]
> not ok 23 - grep --recurse-submodules with submodules without
> .gitmodules in the working tree
> 
> Nice! So the change made by this patch is covered by test 23. I think
> it would be nice to mention that in this patch's message.

Thanks for checking this. I'll mention this in the commit message.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-16 15:48     ` Matheus Tavares Bernardino
@ 2021-08-16 20:09       ` Jonathan Tan
  2021-08-16 20:57         ` Jonathan Tan
  0 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 20:09 UTC (permalink / raw)
  To: matheus.bernardino
  Cc: jonathantanmy, git, emilyshaffer, gitster, ramsay, steadmon

> > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
> >
> >         if (get_oid(name, &oid) < 0)
> 
> This should be `repo_get_oid(repo, ...)` now.

Ah, good catch! This wasn't caught by the tests because the submodule
config mechanism always passes a full-length hexadecimal string hash as
"name" - and probably would never be caught because
git_config_from_blob_ref() is only called from config_with_options(),
which is called with non-NULL source from only 2 files:
submodule-config.c (this one) and builtin/config.c (which most likely
will never operate on any repo other than the_repository). I'll refactor
the API to avoid this situation in the first place.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 7/8] submodule-config: pass repo upon blob config read
  2021-08-16 20:09       ` Jonathan Tan
@ 2021-08-16 20:57         ` Jonathan Tan
  0 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 20:57 UTC (permalink / raw)
  To: jonathantanmy
  Cc: matheus.bernardino, git, emilyshaffer, gitster, ramsay, steadmon

> > > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
> > >
> > >         if (get_oid(name, &oid) < 0)
> > 
> > This should be `repo_get_oid(repo, ...)` now.
> 
> Ah, good catch! This wasn't caught by the tests because the submodule
> config mechanism always passes a full-length hexadecimal string hash as
> "name" - and probably would never be caught because
> git_config_from_blob_ref() is only called from config_with_options(),
> which is called with non-NULL source from only 2 files:
> submodule-config.c (this one) and builtin/config.c (which most likely
> will never operate on any repo other than the_repository). I'll refactor
> the API to avoid this situation in the first place.

The refactoring I was thinking of was parsing the OID in
builtin/config.c and then passing only the OID (instead of a
user-provided string that could contain anything), but that doesn't
really work because the user-provided string is used in certain outputs.
I'll just change it to repo_get_oid() in this patch.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v3 0/8] In grep, no adding submodule ODB as alternates
  2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
                   ` (9 preceding siblings ...)
  2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
@ 2021-08-16 21:09 ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
                     ` (9 more replies)
  10 siblings, 10 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

Thanks for reviewing, everyone. Here are the requested changes.

Jonathan Tan (8):
  submodule: lazily add submodule ODBs as alternates
  grep: use submodule-ODB-as-alternate lazy-addition
  grep: typesafe versions of grep_source_init
  grep: read submodule entry with explicit repo
  grep: allocate subrepos on heap
  grep: add repository to OID grep sources
  submodule-config: pass repo upon blob config read
  t7814: show lack of alternate ODB-adding

 builtin/grep.c                     | 64 +++++++++++++++++++-----------
 config.c                           | 20 ++++++----
 config.h                           |  3 ++
 grep.c                             | 51 +++++++++++++++---------
 grep.h                             | 22 ++++++++--
 object-file.c                      |  5 +++
 submodule-config.c                 |  5 ++-
 submodule.c                        | 25 +++++++++++-
 submodule.h                        |  8 ++++
 t/README                           | 10 +++++
 t/t7814-grep-recurse-submodules.sh |  3 ++
 11 files changed, 161 insertions(+), 55 deletions(-)

Range-diff against v2:
1:  5994a517e8 = 1:  5994a517e8 submodule: lazily add submodule ODBs as alternates
2:  31e9b914c4 = 2:  31e9b914c4 grep: use submodule-ODB-as-alternate lazy-addition
3:  aa3f1f3c89 = 3:  aa3f1f3c89 grep: typesafe versions of grep_source_init
4:  050deacfb7 = 4:  050deacfb7 grep: read submodule entry with explicit repo
5:  3f24815224 ! 5:  7d1eeac4b5 grep: allocate subrepos on heap
    @@ builtin/grep.c: static void work_done(struct work_item *w)
     +		repo_clear(repos_to_free[i]);
     +		free(repos_to_free[i]);
     +	}
    -+	free(repos_to_free);
    ++	FREE_AND_NULL(repos_to_free);
     +	repos_to_free_nr = 0;
     +	repos_to_free_alloc = 0;
     +}
6:  50c69a988b ! 6:  f362fc278c grep: add repository to OID grep sources
    @@ grep.h: struct grep_opt {
     +
     +	/*
     +	 * NEEDSWORK: See if we can remove this field, because the repository
    -+	 * should probably be per-source, not per-repo. This is potentially the
    -+	 * cause of at least one bug - "git grep" ignoring the textconv
    -+	 * attributes from submodules. See [1] for more information.
    ++	 * should probably be per-source. That is, grep.c functions using this
    ++	 * field should probably start using "repo" in "struct grep_source"
    ++	 * instead.
    ++	 *
    ++	 * This is potentially the cause of at least one bug - "git grep"
    ++	 * ignoring the textconv attributes from submodules. See [1] for more
    ++	 * information.
     +	 * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
     +	 */
      	struct repository *repo;
7:  94db10a4e5 ! 7:  8b86618531 submodule-config: pass repo upon blob config read
    @@ Commit message
         submodule's ODB as an alternate and then reading an object from
         the_repository.
     
    +    This makes the "grep --recurse-submodules with submodules without
    +    .gitmodules in the working tree" test in t7814 work when
    +    GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## config.c ##
    @@ config.c: int git_config_from_blob_oid(config_fn_t fn,
      				    const char *name,
      				    void *data)
      {
    -@@ config.c: static int git_config_from_blob_ref(config_fn_t fn,
    + 	struct object_id oid;
      
    - 	if (get_oid(name, &oid) < 0)
    +-	if (get_oid(name, &oid) < 0)
    ++	if (repo_get_oid(repo, name, &oid) < 0)
      		return error(_("unable to resolve config blob '%s'"), name);
     -	return git_config_from_blob_oid(fn, name, &oid, data);
     +	return git_config_from_blob_oid(fn, name, repo, &oid, data);
8:  4a51fcfb77 = 8:  4b3176f99e t7814: show lack of alternate ODB-adding
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v3 1/8] submodule: lazily add submodule ODBs as alternates
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

Teach Git to add submodule ODBs as alternates to the object store of
the_repository only upon the first access of an object not in
the_repository, and not when add_submodule_odb() is called.

This provides a means of gradually migrating from accessing a
submodule's object through alternates to accessing a submodule's object
by explicitly passing its repository object. Any Git command can declare
that it might access submodule objects by calling add_submodule_odb()
(as they do now), but the submodule ODBs themselves will not be added
until needed, so individual commands and/or combinations of arguments
can be migrated one by one.

[The advantage of explicit repository-object passing is code clarity (it
is clear which repository an object read is from), performance (there is
no need to linearly search through all submodule ODBs whenever an object
is accessed from any repository, whether superproject or submodule), and
the possibility of future features like partial clone submodules (which
right now is not possible because if an object is missing, we do not
know which repository to lazy-fetch into).]

This commit also introduces an environment variable that a test may set
to make the actual registration of alternates fatal, in order to
demonstrate that its codepaths do not need this registration.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-file.c |  5 +++++
 submodule.c   | 20 +++++++++++++++++++-
 submodule.h   |  7 +++++++
 t/README      | 10 ++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 3d27dc8dea..621b121bcb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -32,6 +32,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "submodule.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1592,6 +1593,10 @@ static int do_oid_object_info_extended(struct repository *r,
 				break;
 		}
 
+		if (register_all_submodule_odb_as_alternates())
+			/* We added some alternates; retry */
+			continue;
+
 		/* Check if it is a missing object */
 		if (fetch_if_missing && repo_has_promisor_remote(r) &&
 		    !already_retried &&
diff --git a/submodule.c b/submodule.c
index 8e611fe1db..8fde90e906 100644
--- a/submodule.c
+++ b/submodule.c
@@ -165,6 +165,8 @@ void stage_updated_gitmodules(struct index_state *istate)
 		die(_("staging updated .gitmodules failed"));
 }
 
+static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
+
 /* TODO: remove this function, use repo_submodule_init instead. */
 int add_submodule_odb(const char *path)
 {
@@ -178,12 +180,28 @@ int add_submodule_odb(const char *path)
 		ret = -1;
 		goto done;
 	}
-	add_to_alternates_memory(objects_directory.buf);
+	string_list_insert(&added_submodule_odb_paths,
+			   strbuf_detach(&objects_directory, NULL));
 done:
 	strbuf_release(&objects_directory);
 	return ret;
 }
 
+int register_all_submodule_odb_as_alternates(void)
+{
+	int i;
+	int ret = added_submodule_odb_paths.nr;
+
+	for (i = 0; i < added_submodule_odb_paths.nr; i++)
+		add_to_alternates_memory(added_submodule_odb_paths.items[i].string);
+	if (ret) {
+		string_list_clear(&added_submodule_odb_paths, 0);
+		if (git_env_bool("GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB", 0))
+			BUG("register_all_submodule_odb_as_alternates() called");
+	}
+	return ret;
+}
+
 void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 					     const char *path)
 {
diff --git a/submodule.h b/submodule.h
index 84640c49c1..c252784bc2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -97,7 +97,14 @@ int submodule_uses_gitfile(const char *path);
 #define SUBMODULE_REMOVAL_IGNORE_IGNORED_UNTRACKED (1<<2)
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
+/*
+ * Call add_submodule_odb() to add the submodule at the given path to a list.
+ * When register_all_submodule_odb_as_alternates() is called, the object stores
+ * of all submodules in that list will be added as alternates in
+ * the_repository.
+ */
 int add_submodule_odb(const char *path);
+int register_all_submodule_odb_as_alternates(void);
 
 /*
  * Checks if there are submodule changes in a..b. If a is the null OID,
diff --git a/t/README b/t/README
index 9e70122302..8b67b4f00b 100644
--- a/t/README
+++ b/t/README
@@ -448,6 +448,16 @@ GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
 to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
 execution of the parallel-checkout code.
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=<boolean>, when true, makes
+registering submodule ODBs as alternates a fatal action. Support for
+this environment variable can be removed once the migration to
+explicitly providing repositories when accessing submodule objects is
+complete (in which case we might want to replace this with a trace2
+call so that users can make it visible if accessing submodule objects
+without an explicit repository still happens) or needs to be abandoned
+for whatever reason (in which case the migrated codepaths still retain
+their performance benefits).
+
 Naming Tests
 ------------
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 2 +-
 submodule.c    | 5 +++++
 submodule.h    | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7d2f8e5adb..87bcb934a2 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -450,7 +450,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_to_alternates_memory(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo.objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
diff --git a/submodule.c b/submodule.c
index 8fde90e906..8de1aecaeb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -187,6 +187,11 @@ int add_submodule_odb(const char *path)
 	return ret;
 }
 
+void add_submodule_odb_by_path(const char *path)
+{
+	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
+}
+
 int register_all_submodule_odb_as_alternates(void)
 {
 	int i;
diff --git a/submodule.h b/submodule.h
index c252784bc2..17a06cc43b 100644
--- a/submodule.h
+++ b/submodule.h
@@ -104,6 +104,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags);
  * the_repository.
  */
 int add_submodule_odb(const char *path);
+void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
 /*
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 3/8] grep: typesafe versions of grep_source_init
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 4/8] grep: read submodule entry with explicit repo Jonathan Tan
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c |  4 ++--
 grep.c         | 46 ++++++++++++++++++++++++++++------------------
 grep.h         |  7 ++++---
 3 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 87bcb934a2..e454335e9d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -333,7 +333,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init(&gs, GREP_SOURCE_OID, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
@@ -359,7 +359,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, 0, &buf);
-	grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename);
+	grep_source_init_file(&gs, buf.buf, filename);
 	strbuf_release(&buf);
 
 	if (num_threads > 1) {
diff --git a/grep.c b/grep.c
index 424a39591b..8a8105c2eb 100644
--- a/grep.c
+++ b/grep.c
@@ -1825,14 +1825,24 @@ int grep_source(struct grep_opt *opt, struct grep_source *gs)
 	return grep_source_1(opt, gs, 0);
 }
 
+static void grep_source_init_buf(struct grep_source *gs, char *buf,
+				 unsigned long size)
+{
+	gs->type = GREP_SOURCE_BUF;
+	gs->name = NULL;
+	gs->path = NULL;
+	gs->buf = buf;
+	gs->size = size;
+	gs->driver = NULL;
+	gs->identifier = NULL;
+}
+
 int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 {
 	struct grep_source gs;
 	int r;
 
-	grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
-	gs.buf = buf;
-	gs.size = size;
+	grep_source_init_buf(&gs, buf, size);
 
 	r = grep_source(opt, &gs);
 
@@ -1840,28 +1850,28 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)
 	return r;
 }
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier)
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path)
 {
-	gs->type = type;
+	gs->type = GREP_SOURCE_FILE;
 	gs->name = xstrdup_or_null(name);
 	gs->path = xstrdup_or_null(path);
 	gs->buf = NULL;
 	gs->size = 0;
 	gs->driver = NULL;
+	gs->identifier = xstrdup(path);
+}
 
-	switch (type) {
-	case GREP_SOURCE_FILE:
-		gs->identifier = xstrdup(identifier);
-		break;
-	case GREP_SOURCE_OID:
-		gs->identifier = oiddup(identifier);
-		break;
-	case GREP_SOURCE_BUF:
-		gs->identifier = NULL;
-		break;
-	}
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid)
+{
+	gs->type = GREP_SOURCE_OID;
+	gs->name = xstrdup_or_null(name);
+	gs->path = xstrdup_or_null(path);
+	gs->buf = NULL;
+	gs->size = 0;
+	gs->driver = NULL;
+	gs->identifier = oiddup(oid);
 }
 
 void grep_source_clear(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 72f82b1e30..480b3f5bba 100644
--- a/grep.h
+++ b/grep.h
@@ -195,9 +195,10 @@ struct grep_source {
 	struct userdiff_driver *driver;
 };
 
-void grep_source_init(struct grep_source *gs, enum grep_source_type type,
-		      const char *name, const char *path,
-		      const void *identifier);
+void grep_source_init_file(struct grep_source *gs, const char *name,
+			   const char *path);
+void grep_source_init_oid(struct grep_source *gs, const char *name,
+			  const char *path, const struct object_id *oid);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs,
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 4/8] grep: read submodule entry with explicit repo
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 5/8] grep: allocate subrepos on heap Jonathan Tan
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

Replace an existing parse_object_or_die() call (which implicitly works
on the_repository) with a function call that allows a repository to be
passed in. There is no such direct equivalent to parse_object_or_die(),
but we only need the type of the object, so replace with
oid_object_info().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index e454335e9d..9e61c7c993 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -457,27 +457,27 @@ static int grep_submodule(struct grep_opt *opt,
 	subopt.repo = &subrepo;
 
 	if (oid) {
-		struct object *object;
+		enum object_type object_type;
 		struct tree_desc tree;
 		void *data;
 		unsigned long size;
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object = parse_object_or_die(oid, NULL);
+		object_type = oid_object_info(&subrepo, oid, NULL);
 		obj_read_unlock();
 		data = read_object_with_reference(&subrepo,
-						  &object->oid, tree_type,
+						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
-			die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
+			die(_("unable to read tree (%s)"), oid_to_hex(oid));
 
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
-				object->type == OBJ_COMMIT);
+				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
 		free(data);
 	} else {
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 5/8] grep: allocate subrepos on heap
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 4/8] grep: read submodule entry with explicit repo Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

Currently, struct repository objects corresponding to submodules are
allocated on the stack in grep_submodule(). This currently works because
they will not be used once grep_submodule() exits, but a subsequent
patch will require these structs to be accessible for longer (perhaps
even in another thread). Allocate them on the heap and clear them only
at the very end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9e61c7c993..fa7fd08150 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -65,6 +65,9 @@ static int todo_done;
 /* Has all work items been added? */
 static int all_work_added;
 
+static struct repository **repos_to_free;
+static size_t repos_to_free_nr, repos_to_free_alloc;
+
 /* This lock protects all the variables above. */
 static pthread_mutex_t grep_mutex;
 
@@ -168,6 +171,19 @@ static void work_done(struct work_item *w)
 	grep_unlock();
 }
 
+static void free_repos(void)
+{
+	int i;
+
+	for (i = 0; i < repos_to_free_nr; i++) {
+		repo_clear(repos_to_free[i]);
+		free(repos_to_free[i]);
+	}
+	FREE_AND_NULL(repos_to_free);
+	repos_to_free_nr = 0;
+	repos_to_free_alloc = 0;
+}
+
 static void *run(void *arg)
 {
 	int hit = 0;
@@ -415,19 +431,24 @@ static int grep_submodule(struct grep_opt *opt,
 			  const struct object_id *oid,
 			  const char *filename, const char *path, int cached)
 {
-	struct repository subrepo;
+	struct repository *subrepo;
 	struct repository *superproject = opt->repo;
 	const struct submodule *sub;
 	struct grep_opt subopt;
-	int hit;
+	int hit = 0;
 
 	sub = submodule_from_path(superproject, null_oid(), path);
 
 	if (!is_submodule_active(superproject, path))
 		return 0;
 
-	if (repo_submodule_init(&subrepo, superproject, sub))
+	subrepo = xmalloc(sizeof(*subrepo));
+	if (repo_submodule_init(subrepo, superproject, sub)) {
+		free(subrepo);
 		return 0;
+	}
+	ALLOC_GROW(repos_to_free, repos_to_free_nr + 1, repos_to_free_alloc);
+	repos_to_free[repos_to_free_nr++] = subrepo;
 
 	/*
 	 * NEEDSWORK: repo_read_gitmodules() might call
@@ -438,7 +459,7 @@ static int grep_submodule(struct grep_opt *opt,
 	 * subrepo's odbs to the in-memory alternates list.
 	 */
 	obj_read_lock();
-	repo_read_gitmodules(&subrepo, 0);
+	repo_read_gitmodules(subrepo, 0);
 
 	/*
 	 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -450,11 +471,11 @@ static int grep_submodule(struct grep_opt *opt,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_submodule_odb_by_path(subrepo.objects->odb->path);
+	add_submodule_odb_by_path(subrepo->objects->odb->path);
 	obj_read_unlock();
 
 	memcpy(&subopt, opt, sizeof(subopt));
-	subopt.repo = &subrepo;
+	subopt.repo = subrepo;
 
 	if (oid) {
 		enum object_type object_type;
@@ -464,9 +485,9 @@ static int grep_submodule(struct grep_opt *opt,
 		struct strbuf base = STRBUF_INIT;
 
 		obj_read_lock();
-		object_type = oid_object_info(&subrepo, oid, NULL);
+		object_type = oid_object_info(subrepo, oid, NULL);
 		obj_read_unlock();
-		data = read_object_with_reference(&subrepo,
+		data = read_object_with_reference(subrepo,
 						  oid, tree_type,
 						  &size, NULL);
 		if (!data)
@@ -484,7 +505,6 @@ static int grep_submodule(struct grep_opt *opt,
 		hit = grep_cache(&subopt, pathspec, cached);
 	}
 
-	repo_clear(&subrepo);
 	return hit;
 }
 
@@ -1182,5 +1202,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		run_pager(&opt, prefix);
 	clear_pathspec(&pathspec);
 	free_grep_patterns(&opt);
+	free_repos();
 	return !hit;
 }
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 6/8] grep: add repository to OID grep sources
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 5/8] grep: allocate subrepos on heap Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
  2021-08-16 21:09   ` [PATCH v3 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 15 ++++++---------
 grep.c         |  7 +++++--
 grep.h         | 17 ++++++++++++++++-
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index fa7fd08150..51278b01fa 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -349,7 +349,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
@@ -462,14 +462,11 @@ static int grep_submodule(struct grep_opt *opt,
 	repo_read_gitmodules(subrepo, 0);
 
 	/*
-	 * NEEDSWORK: This adds the submodule's object directory to the list of
-	 * alternates for the single in-memory object store.  This has some bad
-	 * consequences for memory (processed objects will never be freed) and
-	 * performance (this increases the number of pack files git has to pay
-	 * attention to, to the sum of the number of pack files in all the
-	 * repositories processed so far).  This can be removed once the object
-	 * store is no longer global and instead is a member of the repository
-	 * object.
+	 * All code paths tested by test code no longer need submodule ODBs to
+	 * be added as alternates, but add it to the list just in case.
+	 * Submodule ODBs added through add_submodule_odb_by_path() will be
+	 * lazily registered as alternates when needed (and except in an
+	 * unexpected code interaction, it won't be needed).
 	 */
 	add_submodule_odb_by_path(subrepo->objects->odb->path);
 	obj_read_unlock();
diff --git a/grep.c b/grep.c
index 8a8105c2eb..79598f245f 100644
--- a/grep.c
+++ b/grep.c
@@ -1863,7 +1863,8 @@ void grep_source_init_file(struct grep_source *gs, const char *name,
 }
 
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid)
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo)
 {
 	gs->type = GREP_SOURCE_OID;
 	gs->name = xstrdup_or_null(name);
@@ -1872,6 +1873,7 @@ void grep_source_init_oid(struct grep_source *gs, const char *name,
 	gs->size = 0;
 	gs->driver = NULL;
 	gs->identifier = oiddup(oid);
+	gs->repo = repo;
 }
 
 void grep_source_clear(struct grep_source *gs)
@@ -1900,7 +1902,8 @@ static int grep_source_load_oid(struct grep_source *gs)
 {
 	enum object_type type;
 
-	gs->buf = read_object_file(gs->identifier, &type, &gs->size);
+	gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type,
+					&gs->size);
 	if (!gs->buf)
 		return error(_("'%s': unable to read %s"),
 			     gs->name,
diff --git a/grep.h b/grep.h
index 480b3f5bba..128007db65 100644
--- a/grep.h
+++ b/grep.h
@@ -120,7 +120,20 @@ struct grep_opt {
 	struct grep_pat *header_list;
 	struct grep_pat **header_tail;
 	struct grep_expr *pattern_expression;
+
+	/*
+	 * NEEDSWORK: See if we can remove this field, because the repository
+	 * should probably be per-source. That is, grep.c functions using this
+	 * field should probably start using "repo" in "struct grep_source"
+	 * instead.
+	 *
+	 * This is potentially the cause of at least one bug - "git grep"
+	 * ignoring the textconv attributes from submodules. See [1] for more
+	 * information.
+	 * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
+	 */
 	struct repository *repo;
+
 	const char *prefix;
 	int prefix_length;
 	regex_t regexp;
@@ -187,6 +200,7 @@ struct grep_source {
 		GREP_SOURCE_BUF,
 	} type;
 	void *identifier;
+	struct repository *repo; /* if GREP_SOURCE_OID */
 
 	char *buf;
 	unsigned long size;
@@ -198,7 +212,8 @@ struct grep_source {
 void grep_source_init_file(struct grep_source *gs, const char *name,
 			   const char *path);
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid);
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);
 void grep_source_load_driver(struct grep_source *gs,
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 7/8] submodule-config: pass repo upon blob config read
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (5 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-16 21:09   ` [PATCH v3 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.

This makes the "grep --recurse-submodules with submodules without
.gitmodules in the working tree" test in t7814 work when
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c           | 20 +++++++++++++-------
 config.h           |  3 +++
 submodule-config.c |  5 +++--
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/config.c b/config.c
index f33abeab85..f401c1d24e 100644
--- a/config.c
+++ b/config.c
@@ -1796,6 +1796,7 @@ int git_config_from_mem(config_fn_t fn,
 
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
+			      struct repository *repo,
 			      const struct object_id *oid,
 			      void *data)
 {
@@ -1804,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	unsigned long size;
 	int ret;
 
-	buf = read_object_file(oid, &type, &size);
+	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
 		return error(_("unable to load config blob object '%s'"), name);
 	if (type != OBJ_BLOB) {
@@ -1820,14 +1821,15 @@ int git_config_from_blob_oid(config_fn_t fn,
 }
 
 static int git_config_from_blob_ref(config_fn_t fn,
+				    struct repository *repo,
 				    const char *name,
 				    void *data)
 {
 	struct object_id oid;
 
-	if (get_oid(name, &oid) < 0)
+	if (repo_get_oid(repo, name, &oid) < 0)
 		return error(_("unable to resolve config blob '%s'"), name);
-	return git_config_from_blob_oid(fn, name, &oid, data);
+	return git_config_from_blob_oid(fn, name, repo, &oid, data);
 }
 
 char *git_system_config(void)
@@ -1958,12 +1960,16 @@ int config_with_options(config_fn_t fn, void *data,
 	 * If we have a specific filename, use it. Otherwise, follow the
 	 * regular lookup sequence.
 	 */
-	if (config_source && config_source->use_stdin)
+	if (config_source && config_source->use_stdin) {
 		return git_config_from_stdin(fn, data);
-	else if (config_source && config_source->file)
+	} else if (config_source && config_source->file) {
 		return git_config_from_file(fn, config_source->file, data);
-	else if (config_source && config_source->blob)
-		return git_config_from_blob_ref(fn, config_source->blob, data);
+	} else if (config_source && config_source->blob) {
+		struct repository *repo = config_source->repo ?
+			config_source->repo : the_repository;
+		return git_config_from_blob_ref(fn, repo, config_source->blob,
+						data);
+	}
 
 	return do_git_config_sequence(opts, fn, data);
 }
diff --git a/config.h b/config.h
index a2200f3111..147f5e0490 100644
--- a/config.h
+++ b/config.h
@@ -49,6 +49,8 @@ const char *config_scope_name(enum config_scope scope);
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
+	/* The repository if blob is not NULL; leave blank for the_repository */
+	struct repository *repo;
 	const char *blob;
 	enum config_scope scope;
 };
@@ -136,6 +138,7 @@ int git_config_from_mem(config_fn_t fn,
 			const char *buf, size_t len,
 			void *data, const struct config_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
+			     struct repository *repo,
 			     const struct object_id *oid, void *data);
 void git_config_push_parameter(const char *text);
 void git_config_push_env(const char *spec);
diff --git a/submodule-config.c b/submodule-config.c
index 2026120fb3..f95344028b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 			config_source.file = file;
 		} else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
 			   repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
+			config_source.repo = repo;
 			config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
 			if (repo != the_repository)
-				add_to_alternates_memory(repo->objects->odb->path);
+				add_submodule_odb_by_path(repo->objects->odb->path);
 		} else {
 			goto out;
 		}
@@ -702,7 +703,7 @@ void gitmodules_config_oid(const struct object_id *commit_oid)
 
 	if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) {
 		git_config_from_blob_oid(gitmodules_cb, rev.buf,
-					 &oid, the_repository);
+					 the_repository, &oid, the_repository);
 	}
 	strbuf_release(&rev);
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v3 8/8] t7814: show lack of alternate ODB-adding
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (6 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
@ 2021-08-16 21:09   ` Jonathan Tan
  2021-08-17 19:29   ` [PATCH v3 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
  2021-09-08  0:26   ` Junio C Hamano
  9 siblings, 0 replies; 71+ messages in thread
From: Jonathan Tan @ 2021-08-16 21:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, matheus.bernardino, gitster

The previous patches have made "git grep" no longer need to add
submodule ODBs as alternates, at least for the code paths tested in
t7814. Demonstrate this by making adding a submodule ODB as an alternate
fatal in this test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 t/t7814-grep-recurse-submodules.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 828cb3ba58..3172f5b936 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -8,6 +8,9 @@ submodules.
 
 . ./test-lib.sh
 
+GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
+export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
+
 test_expect_success 'setup directory structure and submodule' '
 	echo "(1|2)d(3|4)" >a &&
 	mkdir b &&
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v3 0/8] In grep, no adding submodule ODB as alternates
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (7 preceding siblings ...)
  2021-08-16 21:09   ` [PATCH v3 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
@ 2021-08-17 19:29   ` Matheus Tavares Bernardino
  2021-09-08  0:26   ` Junio C Hamano
  9 siblings, 0 replies; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-08-17 19:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Junio C Hamano

On Mon, Aug 16, 2021 at 6:10 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks for reviewing, everyone. Here are the requested changes.

Thanks. You've addressed all my comments from the previous rounds.

One minor suggestion which is not worth a re-roll on its own:

> Range-diff against v2:
[...]
> 7:  94db10a4e5 ! 7:  8b86618531 submodule-config: pass repo upon blob config read
>     @@ Commit message
>          When reading the config of a submodule, if reading from a blob, read
>          using an explicitly specified repository instead of by adding the
>          submodule's ODB as an alternate and then reading an object from
>          the_repository.
>
>     +    This makes the "grep --recurse-submodules with submodules without
>     +    .gitmodules in the working tree" test in t7814 work when
>     +    GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.

This sounds to me as if the test was previously failing when
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true, which is not the case.
I think an alternative could be:

This code path is exercised by the "grep --recurse-submodules with
submodules without .gitmodules in the working tree" test in t7814. The
test passes with GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1,
showing that we indeed are no longer accessing the submodule's ODB
through the alternates list.

But it's really a minor thing. With or without this change:

Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v3 0/8] In grep, no adding submodule ODB as alternates
  2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
                     ` (8 preceding siblings ...)
  2021-08-17 19:29   ` [PATCH v3 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
@ 2021-09-08  0:26   ` Junio C Hamano
  2021-09-08 15:31     ` Matheus Tavares Bernardino
  9 siblings, 1 reply; 71+ messages in thread
From: Junio C Hamano @ 2021-09-08  0:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matheus.bernardino

Jonathan Tan <jonathantanmy@google.com> writes:

> Thanks for reviewing, everyone. Here are the requested changes.
>
> Jonathan Tan (8):
>   submodule: lazily add submodule ODBs as alternates
>   grep: use submodule-ODB-as-alternate lazy-addition
>   grep: typesafe versions of grep_source_init
>   grep: read submodule entry with explicit repo
>   grep: allocate subrepos on heap
>   grep: add repository to OID grep sources
>   submodule-config: pass repo upon blob config read
>   t7814: show lack of alternate ODB-adding

I didn't see anybody comment on this round (and do not think I saw
anything glaringly wrong).

Is everybody happy with this version?  I am about to mark it for
'next' in the next issue of "What's cooking" report, so please
holler if I should wait.

Thanks.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v3 0/8] In grep, no adding submodule ODB as alternates
  2021-09-08  0:26   ` Junio C Hamano
@ 2021-09-08 15:31     ` Matheus Tavares Bernardino
  2021-09-08 18:45       ` Junio C Hamano
  0 siblings, 1 reply; 71+ messages in thread
From: Matheus Tavares Bernardino @ 2021-09-08 15:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

On Tue, Sep 7, 2021 at 9:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jonathan Tan <jonathantanmy@google.com> writes:
>
> > Thanks for reviewing, everyone. Here are the requested changes.
> >
> > Jonathan Tan (8):
> >   submodule: lazily add submodule ODBs as alternates
> >   grep: use submodule-ODB-as-alternate lazy-addition
> >   grep: typesafe versions of grep_source_init
> >   grep: read submodule entry with explicit repo
> >   grep: allocate subrepos on heap
> >   grep: add repository to OID grep sources
> >   submodule-config: pass repo upon blob config read
> >   t7814: show lack of alternate ODB-adding
>
> I didn't see anybody comment on this round (and do not think I saw
> anything glaringly wrong).
>
> Is everybody happy with this version?  I am about to mark it for
> 'next' in the next issue of "What's cooking" report, so please
> holler if I should wait.

I think it's ready for 'next'. Jonathan addressed all my comments
from previous rounds; I also just read the patches again and all
looks good to me. Feel free to include my:

Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v3 0/8] In grep, no adding submodule ODB as alternates
  2021-09-08 15:31     ` Matheus Tavares Bernardino
@ 2021-09-08 18:45       ` Junio C Hamano
  0 siblings, 0 replies; 71+ messages in thread
From: Junio C Hamano @ 2021-09-08 18:45 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: Jonathan Tan, git

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> On Tue, Sep 7, 2021 at 9:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>> > Thanks for reviewing, everyone. Here are the requested changes.
>> >
>> > Jonathan Tan (8):
>> >   submodule: lazily add submodule ODBs as alternates
>> >   grep: use submodule-ODB-as-alternate lazy-addition
>> >   grep: typesafe versions of grep_source_init
>> >   grep: read submodule entry with explicit repo
>> >   grep: allocate subrepos on heap
>> >   grep: add repository to OID grep sources
>> >   submodule-config: pass repo upon blob config read
>> >   t7814: show lack of alternate ODB-adding
>>
>> I didn't see anybody comment on this round (and do not think I saw
>> anything glaringly wrong).
>>
>> Is everybody happy with this version?  I am about to mark it for
>> 'next' in the next issue of "What's cooking" report, so please
>> holler if I should wait.
>
> I think it's ready for 'next'. Jonathan addressed all my comments
> from previous rounds; I also just read the patches again and all
> looks good to me. Feel free to include my:
>
> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>

Thanks, both.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v3 6/8] grep: add repository to OID grep sources
  2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
@ 2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
  2021-09-27 16:45       ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
  0 siblings, 1 reply; 71+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-27 12:08 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, matheus.bernardino, gitster


On Mon, Aug 16 2021, Jonathan Tan wrote:

> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.
> [...]
> diff --git a/grep.h b/grep.h
> index 480b3f5bba..128007db65 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -120,7 +120,20 @@ struct grep_opt {
>  	struct grep_pat *header_list;
>  	struct grep_pat **header_tail;
>  	struct grep_expr *pattern_expression;
> +
> +	/*
> +	 * NEEDSWORK: See if we can remove this field, because the repository
> +	 * should probably be per-source. That is, grep.c functions using this
> +	 * field should probably start using "repo" in "struct grep_source"
> +	 * instead.
> +	 *
> +	 * This is potentially the cause of at least one bug - "git grep"
> +	 * ignoring the textconv attributes from submodules. See [1] for more
> +	 * information.
> +	 * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
> +	 */
>  	struct repository *repo;
> +

I ran into this comment and read the linked E-Mail, and then the
downthread
https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@mail.gmail.com/;

Given Matheus's "I've somehow missed this guard and the..." there I'm
not quite sure what/if we should be doing here & what this comment is
recommending? I.e. do we still need to adjust the call chains as noted
in the E-Mail the comment links to, or not?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates
  2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
@ 2021-09-27 16:45       ` Matheus Tavares
  2021-09-27 17:30         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 71+ messages in thread
From: Matheus Tavares @ 2021-09-27 16:45 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, jonathantanmy, matheus.bernardino

On Mon, Sep 27, 2021 at 9:09 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Mon, Aug 16 2021, Jonathan Tan wrote:
>
> > Record the repository whenever an OID grep source is created, and teach
> > the worker threads to explicitly provide the repository when accessing
> > objects.
> > [...]
> > diff --git a/grep.h b/grep.h
> > index 480b3f5bba..128007db65 100644
> > --- a/grep.h
> > +++ b/grep.h
> > @@ -120,7 +120,20 @@ struct grep_opt {
> >       struct grep_pat *header_list;
> >       struct grep_pat **header_tail;
> >       struct grep_expr *pattern_expression;
> > +
> > +     /*
> > +      * NEEDSWORK: See if we can remove this field, because the repository
> > +      * should probably be per-source. That is, grep.c functions using this
> > +      * field should probably start using "repo" in "struct grep_source"
> > +      * instead.
> > +      *
> > +      * This is potentially the cause of at least one bug - "git grep"
> > +      * ignoring the textconv attributes from submodules. See [1] for more
> > +      * information.
> > +      * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
> > +      */
> >       struct repository *repo;
> > +
>
> I ran into this comment and read the linked E-Mail, and then the
> downthread
> https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@mail.gmail.com/;
>
> Given Matheus's "I've somehow missed this guard and the..." there I'm
> not quite sure what/if we should be doing here & what this comment is
> recommending? I.e. do we still need to adjust the call chains as noted
> in the E-Mail the comment links to, or not?

I think we should still adjust the call chains, yes. The downthread
message you mentioned is kind of a tangent about performance, where
Junio helped me understand something I had previously missed in the
code, regarding the persistence of the attributes stack.

But the issue that started the thread was about a correctness problem:
the superproject textconv attributes are being used on submodules'
files when running `git grep` with `--recurse-submodules --textconv`.
The three cases to consider are:

- .gitattributes from the working tree
- .gitattributes from the index
- .git/info/attributes

On all these cases, the superproject attributes are being used on the
submodule. Additionally, if the superproject does not define any
attribute, the submodule attributes are being ignored in all cases
except by the first one (but that is only because the code sees the
.gitattributes file on the submodule as if it were a "regular"
subdirectory of the surperproject. So the submodule's .gitattribures
takes higher precedence when evaluating the attributes for files in
that directory).

Another issue is that the textconv cache is always saved to (and read
from) the superproject gitdir, even for submodules' files.

Here are some test cases that demonstrate these issues:

-- snipsnap --
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 3172f5b936..d01a3bc5d8 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -441,4 +441,104 @@ test_expect_success 'grep --recurse-submodules with --cached ignores worktree mo
 	test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 &&
 	test_must_be_empty actual
 '
+
+test_expect_failure 'grep --textconv: superproject .gitattributes does not affect submodules' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+	echo "a diff=d2x" >.gitattributes &&
+
+	cat >expect <<-\EOF &&
+	a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'grep --textconv: superproject .gitattributes (from index) does not affect submodules' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+	echo "a diff=d2x" >.gitattributes &&
+	git add .gitattributes &&
+	rm .gitattributes &&
+
+	cat >expect <<-\EOF &&
+	a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'grep --textconv: superproject .git/info/attributes does not affect submodules' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+	super_attr="$(git rev-parse --path-format=relative --git-path info/attributes)" &&
+	test_when_finished rm -f "$super_attr" &&
+	echo "a diff=d2x" >"$super_attr" &&
+
+	cat >expect <<-\EOF &&
+	a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --textconv corectly reads submodule .gitattributes' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+	echo "a diff=d2x" >submodule/.gitattributes &&
+
+	cat >expect <<-\EOF &&
+	submodule/a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'grep --textconv corectly reads submodule .gitattributes (from index)' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+	echo "a diff=d2x" >submodule/.gitattributes &&
+	git -C submodule add .gitattributes &&
+	rm submodule/.gitattributes &&
+
+	cat >expect <<-\EOF &&
+	submodule/a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'grep --textconv corectly reads submodule .git/info/attributes' '
+	reset_and_clean &&
+	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
+
+	# Workaround: we use --path-format=relative because the absolute path
+	# contains whitespaces and that seems to confuse test_when_finished
+	#
+	submodule_attr="submodule/$(git -C submodule rev-parse --path-format=relative --git-path info/attributes)" &&
+	test_when_finished rm -f "$submodule_attr" &&
+	echo "a diff=d2x" >"$submodule_attr" &&
+
+	cat >expect <<-\EOF &&
+	submodule/a:(1|2)x(3|4)
+	EOF
+	git grep --textconv --recurse-submodules x >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'grep saves textconv cache in the appropriated repository' '
+	reset_and_clean &&
+	test_config_global diff.d2x_cached.textconv "sed -e \"s/d/x/\"" &&
+	test_config_global diff.d2x_cached.cachetextconv true &&
+	echo "a diff=d2x_cached" >submodule/.gitattributes &&
+
+	# Note: we only read/write to the textconv cache when grepping from an
+	# OID as the working tree file might have modifications. That is why
+	# we use --cached here.
+	#
+	git grep --textconv --cached --recurse-submodules x &&
+	test_path_is_missing "$(git rev-parse --git-path refs/notes/textconv/d2x_cached)" &&
+	test_path_is_file "$(git -C submodule rev-parse --git-path refs/notes/textconv/d2x_cached)"
+'
+
 test_done
--

Junio seemed to agree that the behavior I described above is not the
correct one:

"None of the attributes defined in the superproject
should affect the paths in the submodule, as it is a totally
separate project, oblivious to the existence of enclosing the
superproject."

But he raised an important concern regarding how to fix this without
affecting [too much] performance:

"As there is only one attribute cache IIUC, invalidating
the whole cache for the top-level and replacing it with the one for
a submodule, every time we cross the module boundary, would probably
have a negative effect on the performance, and I am not sure what
would happen if you run more than one threads working in different
repositories (i.e. top-level and submodules)."

Maybe we would need a different attributes stack for each repository?

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates
  2021-09-27 16:45       ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
@ 2021-09-27 17:30         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 71+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-27 17:30 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, gitster, jonathantanmy


On Mon, Sep 27 2021, Matheus Tavares wrote:

> On Mon, Sep 27, 2021 at 9:09 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Aug 16 2021, Jonathan Tan wrote:
>>
>> > Record the repository whenever an OID grep source is created, and teach
>> > the worker threads to explicitly provide the repository when accessing
>> > objects.
>> > [...]
>> > diff --git a/grep.h b/grep.h
>> > index 480b3f5bba..128007db65 100644
>> > --- a/grep.h
>> > +++ b/grep.h
>> > @@ -120,7 +120,20 @@ struct grep_opt {
>> >       struct grep_pat *header_list;
>> >       struct grep_pat **header_tail;
>> >       struct grep_expr *pattern_expression;
>> > +
>> > +     /*
>> > +      * NEEDSWORK: See if we can remove this field, because the repository
>> > +      * should probably be per-source. That is, grep.c functions using this
>> > +      * field should probably start using "repo" in "struct grep_source"
>> > +      * instead.
>> > +      *
>> > +      * This is potentially the cause of at least one bug - "git grep"
>> > +      * ignoring the textconv attributes from submodules. See [1] for more
>> > +      * information.
>> > +      * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
>> > +      */
>> >       struct repository *repo;
>> > +
>>
>> I ran into this comment and read the linked E-Mail, and then the
>> downthread
>> https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@mail.gmail.com/;
>>
>> Given Matheus's "I've somehow missed this guard and the..." there I'm
>> not quite sure what/if we should be doing here & what this comment is
>> recommending? I.e. do we still need to adjust the call chains as noted
>> in the E-Mail the comment links to, or not?
>
> I think we should still adjust the call chains, yes. The downthread
> message you mentioned is kind of a tangent about performance, where
> Junio helped me understand something I had previously missed in the
> code, regarding the persistence of the attributes stack.
>
> But the issue that started the thread was about a correctness problem:
> the superproject textconv attributes are being used on submodules'
> files when running `git grep` with `--recurse-submodules --textconv`.
> The three cases to consider are:
>
> - .gitattributes from the working tree
> - .gitattributes from the index
> - .git/info/attributes
>
> On all these cases, the superproject attributes are being used on the
> submodule. Additionally, if the superproject does not define any
> attribute, the submodule attributes are being ignored in all cases
> except by the first one (but that is only because the code sees the
> .gitattributes file on the submodule as if it were a "regular"
> subdirectory of the surperproject. So the submodule's .gitattribures
> takes higher precedence when evaluating the attributes for files in
> that directory).
>
> Another issue is that the textconv cache is always saved to (and read
> from) the superproject gitdir, even for submodules' files.
>
> Here are some test cases that demonstrate these issues:
>
> -- snipsnap --
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 3172f5b936..d01a3bc5d8 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -441,4 +441,104 @@ test_expect_success 'grep --recurse-submodules with --cached ignores worktree mo
>  	test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 &&
>  	test_must_be_empty actual
>  '
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes (from index) does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +	git add .gitattributes &&
> +	rm .gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .git/info/attributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	super_attr="$(git rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$super_attr" &&
> +	echo "a diff=d2x" >"$super_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep --textconv corectly reads submodule .gitattributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .gitattributes (from index)' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +	git -C submodule add .gitattributes &&
> +	rm submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .git/info/attributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +
> +	# Workaround: we use --path-format=relative because the absolute path
> +	# contains whitespaces and that seems to confuse test_when_finished
> +	#
> +	submodule_attr="submodule/$(git -C submodule rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$submodule_attr" &&
> +	echo "a diff=d2x" >"$submodule_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep saves textconv cache in the appropriated repository' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x_cached.textconv "sed -e \"s/d/x/\"" &&
> +	test_config_global diff.d2x_cached.cachetextconv true &&
> +	echo "a diff=d2x_cached" >submodule/.gitattributes &&
> +
> +	# Note: we only read/write to the textconv cache when grepping from an
> +	# OID as the working tree file might have modifications. That is why
> +	# we use --cached here.
> +	#
> +	git grep --textconv --cached --recurse-submodules x &&
> +	test_path_is_missing "$(git rev-parse --git-path refs/notes/textconv/d2x_cached)" &&
> +	test_path_is_file "$(git -C submodule rev-parse --git-path refs/notes/textconv/d2x_cached)"
> +'
> +
>  test_done

Thanks! I think it would be very good to have these tests in-tree along
with an updated comment pointing to them.


^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2021-09-27 17:46 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-10 21:13   ` Junio C Hamano
2021-08-13 16:53     ` Jonathan Tan
2021-08-11 21:33   ` Emily Shaffer
2021-08-13 16:23     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-11 21:36   ` Emily Shaffer
2021-08-13 16:31     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-10 21:38   ` Junio C Hamano
2021-08-11 21:42   ` Emily Shaffer
2021-08-11 23:07     ` Ramsay Jones
2021-08-13 16:32       ` Jonathan Tan
2021-08-11 22:45   ` Matheus Tavares Bernardino
2021-08-12 16:49     ` Junio C Hamano
2021-08-13 16:33       ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-11 21:44   ` Emily Shaffer
2021-08-13 16:39     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
2021-08-11 21:50   ` Emily Shaffer
2021-08-13 16:42     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
2021-08-11 21:52   ` Emily Shaffer
2021-08-13 16:44     ` Jonathan Tan
2021-08-11 23:28   ` Matheus Tavares Bernardino
2021-08-13 16:47     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-11 21:55   ` Emily Shaffer
2021-08-11 22:22     ` Matheus Tavares Bernardino
2021-08-13 16:50       ` Jonathan Tan
2021-08-11 21:29 ` [PATCH 0/7] In grep, no adding submodule ODB as alternates Emily Shaffer
2021-08-11 22:49 ` Josh Steadmon
2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 15:06     ` Matheus Tavares Bernardino
2021-08-13 21:05   ` [PATCH v2 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-13 21:44     ` Junio C Hamano
2021-08-16 19:42       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-08-16 14:48     ` Matheus Tavares Bernardino
2021-08-16 19:44       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 14:32     ` Matheus Tavares Bernardino
2021-08-16 19:57       ` Matheus Tavares Bernardino
2021-08-16 20:02       ` Jonathan Tan
2021-08-16 15:48     ` Matheus Tavares Bernardino
2021-08-16 20:09       ` Jonathan Tan
2021-08-16 20:57         ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-16 15:14   ` [PATCH v2 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
2021-09-27 16:45       ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
2021-09-27 17:30         ` Ævar Arnfjörð Bjarmason
2021-08-16 21:09   ` [PATCH v3 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-17 19:29   ` [PATCH v3 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-09-08  0:26   ` Junio C Hamano
2021-09-08 15:31     ` Matheus Tavares Bernardino
2021-09-08 18:45       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.