git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Submodules and partial clones
@ 2020-09-29 15:53 Andrew Oakley
  2020-09-29 15:53 ` [PATCH 1/7] refs: store owning repository for object lookup Andrew Oakley
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

I've been investigating what is required to get submodules and partial
clones to work well together.  The issue seems to be that the correct
repository is not passed around, so we sometimes end up trying to fetch
objects from the wrong place.

These patches don't make promisor_remote_get_direct handle different
repositories because I've not found a case where that is necessary.

The patches rework various cases where objects from a submodule are
added to the object store of the main repository.  There are some
remaining cases where add_to_alternates_memory is used to do this, but
add_submodule_odb has been removed.

I expect there will be some remaining issues, but these changes seem to
be enough to get the basics working.

Andrew Oakley (6):
  refs: store owning repository for object lookup
  submodule: use separate submodule repositories
  refs: use correct repo in refs_peel_ref
  merge-recursive: use separate submodule repository
  submodule: remove add_submodule_odb
  submodule: use partial clone filter

Luke Diamand (1):
  Add failing test for partial clones with submodules

 builtin/clone.c                     |   5 ++
 builtin/fsck.c                      |   2 +-
 builtin/pack-objects.c              |   2 +-
 builtin/submodule--helper.c         |  21 +++--
 git-submodule.sh                    |  20 ++++-
 http-push.c                         |   2 +-
 merge-recursive.c                   |  73 ++++++++-------
 object.c                            |   7 +-
 object.h                            |   2 +-
 refs.c                              |  37 ++++----
 refs.h                              |  37 +-------
 refs/debug.c                        |   3 +-
 refs/files-backend.c                |  22 +++--
 refs/iterator.c                     |  11 ++-
 refs/packed-backend.c               |  15 ++--
 refs/packed-backend.h               |   3 +-
 refs/ref-cache.c                    |   5 +-
 refs/refs-internal.h                |  18 +++-
 revision.c                          |  21 ++---
 revision.h                          |   1 -
 submodule.c                         | 135 +++++++++++++---------------
 submodule.h                         |  11 ++-
 t/helper/test-example-decorate.c    |   6 +-
 t/helper/test-reach.c               |   2 +-
 t/helper/test-ref-store.c           |   1 -
 t/t0411-partial-clone-submodules.sh |  44 +++++++++
 t/t1406-submodule-ref-store.sh      |  17 +---
 tag.c                               |   4 +-
 tag.h                               |   2 +-
 upload-pack.c                       |   2 +-
 walker.c                            |   3 +-
 31 files changed, 301 insertions(+), 233 deletions(-)
 create mode 100755 t/t0411-partial-clone-submodules.sh

-- 
2.26.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/7] refs: store owning repository for object lookup
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 2/7] submodule: use separate submodule repositories Andrew Oakley
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

When calling ref_resolves_to_object we want to be able to pass the
correct repository.  This is intended to allow correct handling of
promisors and alternates inside submodules.

This change continues to pass the_repository around, even in the case
where a submodule is being used.  It shouldn't change any behaviour by
itself.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 refs.c                | 21 +++++++++++++--------
 refs/debug.c          |  3 ++-
 refs/files-backend.c  | 22 +++++++++++++---------
 refs/iterator.c       | 11 ++++++++---
 refs/packed-backend.c | 10 +++++++---
 refs/packed-backend.h |  3 ++-
 refs/ref-cache.c      |  3 ++-
 refs/refs-internal.h  | 14 +++++++++++---
 8 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/refs.c b/refs.c
index fa01153151..e62da6f2de 100644
--- a/refs.c
+++ b/refs.c
@@ -254,13 +254,14 @@ int refname_is_safe(const char *refname)
  * be resolved to an object in the database. If the referred-to object
  * does not exist, emit a warning and return false.
  */
-int ref_resolves_to_object(const char *refname,
+int ref_resolves_to_object(struct repository *repo,
+			   const char *refname,
 			   const struct object_id *oid,
 			   unsigned int flags)
 {
 	if (flags & REF_ISBROKEN)
 		return 0;
-	if (!has_object_file(oid)) {
+	if (!repo_has_object_file(repo, oid)) {
 		error(_("%s does not point to a valid object!"), refname);
 		return 0;
 	}
@@ -1751,7 +1752,8 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(struct repository *repo,
+					const char *gitdir,
 					unsigned int flags)
 {
 	const char *be_name = "files";
@@ -1761,7 +1763,7 @@ static struct ref_store *ref_store_init(const char *gitdir,
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
-	refs = be->init(gitdir, flags);
+	refs = be->init(repo, gitdir, flags);
 	return refs;
 }
 
@@ -1773,7 +1775,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r, r->gitdir, REF_STORE_ALL_CAPS);
 	r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
 	return r->refs_private;
 }
@@ -1829,7 +1831,8 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(the_repository,
+			      submodule_sb.buf,
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1855,10 +1858,12 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 		return refs;
 
 	if (wt->id)
-		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
+		refs = ref_store_init(the_repository,
+				      git_common_path("worktrees/%s", wt->id),
 				      REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(the_repository,
+				      get_git_common_dir(),
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs/debug.c b/refs/debug.c
index 922e64fa6a..6525142cc4 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -230,7 +230,8 @@ debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
 	struct ref_iterator *res =
 		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
 	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
-	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable,
+			       ref_store->repo, 1);
 	diter->iter = res;
 	trace_printf_key(&trace_refs, "ref_iterator_begin: %s (0x%x)\n", prefix, flags);
 	return &diter->base;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 04e85e7002..2ddb680c5c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -79,13 +79,15 @@ static void clear_loose_ref_cache(struct files_ref_store *refs)
  * Create a new submodule ref cache and add it to the internal
  * set of caches.
  */
-static struct ref_store *files_ref_store_create(const char *gitdir,
+static struct ref_store *files_ref_store_create(struct repository *repo,
+						const char *gitdir,
 						unsigned int flags)
 {
 	struct files_ref_store *refs = xcalloc(1, sizeof(*refs));
 	struct ref_store *ref_store = (struct ref_store *)refs;
 	struct strbuf sb = STRBUF_INIT;
 
+	ref_store->repo = repo;
 	ref_store->gitdir = xstrdup(gitdir);
 	base_ref_store_init(ref_store, &refs_be_files);
 	refs->store_flags = flags;
@@ -93,7 +95,7 @@ static struct ref_store *files_ref_store_create(const char *gitdir,
 	get_common_dir_noenv(&sb, gitdir);
 	refs->gitcommondir = strbuf_detach(&sb, NULL);
 	strbuf_addf(&sb, "%s/packed-refs", refs->gitcommondir);
-	refs->packed_ref_store = packed_ref_store_create(sb.buf, flags);
+	refs->packed_ref_store = packed_ref_store_create(repo, sb.buf, flags);
 	strbuf_release(&sb);
 
 	chdir_notify_reparent("files-backend $GIT_DIR", &refs->base.gitdir);
@@ -745,7 +747,8 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-		    !ref_resolves_to_object(iter->iter0->refname,
+		    !ref_resolves_to_object(ref_iterator->repo,
+					    iter->iter0->refname,
 					    iter->iter0->oid,
 					    iter->iter0->flags))
 			continue;
@@ -846,7 +849,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable,
-			       overlay_iter->ordered);
+			       ref_store->repo, overlay_iter->ordered);
 	iter->iter0 = overlay_iter;
 	iter->flags = flags;
 
@@ -1115,7 +1118,7 @@ static void prune_refs(struct files_ref_store *refs, struct ref_to_prune **refs_
 /*
  * Return true if the specified reference should be packed.
  */
-static int should_pack_ref(const char *refname,
+static int should_pack_ref(struct repository* repo, const char *refname,
 			   const struct object_id *oid, unsigned int ref_flags,
 			   unsigned int pack_flags)
 {
@@ -1132,7 +1135,7 @@ static int should_pack_ref(const char *refname,
 		return 0;
 
 	/* Do not pack broken refs: */
-	if (!ref_resolves_to_object(refname, oid, ref_flags))
+	if (!ref_resolves_to_object(repo, refname, oid, ref_flags))
 		return 0;
 
 	return 1;
@@ -1162,8 +1165,8 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 		 * in the packed ref cache. If the reference should be
 		 * pruned, also add it to refs_to_prune.
 		 */
-		if (!should_pack_ref(iter->refname, iter->oid, iter->flags,
-				     flags))
+		if (!should_pack_ref(ref_store->repo, iter->refname, iter->oid,
+				     iter->flags, flags))
 			continue;
 
 		/*
@@ -2155,7 +2158,8 @@ static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable,
+			       ref_store->repo, 0);
 	iter->dir_iterator = diter;
 	iter->ref_store = ref_store;
 	strbuf_release(&sb);
diff --git a/refs/iterator.c b/refs/iterator.c
index 629e00a122..a68dd452b6 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -26,9 +26,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable,
+			    struct repository *repo,
 			    int ordered)
 {
 	iter->vtable = vtable;
+	iter->repo = repo;
 	iter->ordered = !!ordered;
 	iter->refname = NULL;
 	iter->oid = NULL;
@@ -74,7 +76,8 @@ struct ref_iterator *empty_ref_iterator_begin(void)
 	struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
 	struct ref_iterator *ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable,
+			       NULL, 1);
 	return ref_iterator;
 }
 
@@ -222,7 +225,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 * references through only if they exist in both iterators.
 	 */
 
-	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered);
+	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable,
+			       iter0->repo, ordered);
 	iter->iter0 = iter0;
 	iter->iter1 = iter1;
 	iter->select = select;
@@ -396,7 +400,8 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable, iter0->ordered);
+	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable,
+			       iter0->repo, iter0->ordered);
 
 	iter->iter0 = iter0;
 	iter->prefix = xstrdup(prefix);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b912f2505f..9743ee0155 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -193,13 +193,15 @@ static int release_snapshot(struct snapshot *snapshot)
 	}
 }
 
-struct ref_store *packed_ref_store_create(const char *path,
+struct ref_store *packed_ref_store_create(struct repository *repo,
+					  const char *path,
 					  unsigned int store_flags)
 {
 	struct packed_ref_store *refs = xcalloc(1, sizeof(*refs));
 	struct ref_store *ref_store = (struct ref_store *)refs;
 
 	base_ref_store_init(ref_store, &refs_be_packed);
+	ref_store->repo = repo;
 	ref_store->gitdir = xstrdup(path);
 	refs->store_flags = store_flags;
 
@@ -864,7 +866,8 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			continue;
 
 		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-		    !ref_resolves_to_object(iter->base.refname, &iter->oid,
+		    !ref_resolves_to_object(ref_iterator->repo,
+					    iter->base.refname, &iter->oid,
 					    iter->flags))
 			continue;
 
@@ -943,7 +946,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable,
+			       ref_store->repo, 1);
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
diff --git a/refs/packed-backend.h b/refs/packed-backend.h
index a01a0aff9c..942c908771 100644
--- a/refs/packed-backend.h
+++ b/refs/packed-backend.h
@@ -12,7 +12,8 @@ struct ref_transaction;
  * even among packed refs.
  */
 
-struct ref_store *packed_ref_store_create(const char *path,
+struct ref_store *packed_ref_store_create(struct repository *repo,
+					  const char *path,
 					  unsigned int store_flags);
 
 /*
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index b7052f72e2..974d37ee79 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -532,7 +532,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable, 1);
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable,
+			       cache->ref_store->repo, 1);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 467f4b3c93..9e9b2e8c76 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -65,7 +65,8 @@ int refname_is_safe(const char *refname);
  * oid and flags, can be resolved to an object in the database. If the
  * referred-to object does not exist, emit a warning and return false.
  */
-int ref_resolves_to_object(const char *refname,
+int ref_resolves_to_object(struct repository *repo,
+			   const char *refname,
 			   const struct object_id *oid,
 			   unsigned int flags);
 
@@ -299,6 +300,8 @@ int refs_rename_ref_available(struct ref_store *refs,
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
 
+	struct repository *repo;
+
 	/*
 	 * Does this `ref_iterator` iterate over references in order
 	 * by refname?
@@ -432,6 +435,7 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
  */
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable,
+			    struct repository* repo,
 			    int ordered);
 
 /*
@@ -518,11 +522,12 @@ struct ref_store;
 				 REF_STORE_MAIN)
 
 /*
- * Initialize the ref_store for the specified gitdir. These functions
+ * Initialize the ref_store for the specified repository. These functions
  * should call base_ref_store_init() to initialize the shared part of
  * the ref_store and to record the ref_store for later lookup.
  */
-typedef struct ref_store *ref_store_init_fn(const char *gitdir,
+typedef struct ref_store *ref_store_init_fn(struct repository* repo,
+					    char const *gitdir,
 					    unsigned int flags);
 
 typedef int ref_init_db_fn(struct ref_store *refs, struct strbuf *err);
@@ -680,6 +685,9 @@ struct ref_store {
 	/* The backend describing this ref_store's storage scheme: */
 	const struct ref_storage_be *be;
 
+	/* The repository that this ref_store is for: */
+	struct repository* repo;
+
 	/* The gitdir that this ref_store applies to: */
 	char *gitdir;
 };
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/7] submodule: use separate submodule repositories
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
  2020-09-29 15:53 ` [PATCH 1/7] refs: store owning repository for object lookup Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 3/7] Add failing test for partial clones with submodules Andrew Oakley
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

This commit allows `git fetch --recuse-submodules` to work correctly
with partial clones, including the case where it is only the parent
repository that is a partial clone.

Using a separate repository with its own object store should allow
any objects that need to be fetched from a promisor or an alternate
object store to work correctly.

Replacing get_submodule_ref_store with get_main_ref_store for the
correct repo makes the refs store lookup objects in the correct repo
(for at least the cases relevant for a fetch).  We still can't fetch
objects from promisor remotes, but do_oid_object_info_extended detects
this and fails gracefully.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 submodule.c | 105 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 45 deletions(-)

diff --git a/submodule.c b/submodule.c
index 543b1123ae..3889dc7d9a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -89,11 +89,11 @@ int is_staging_gitmodules_ok(struct index_state *istate)
 	return 1;
 }
 
-static int for_each_remote_ref_submodule(const char *submodule,
+static int for_each_remote_ref_submodule(struct repository *subrepo,
 					 each_ref_fn fn, void *cb_data)
 {
-	return refs_for_each_remote_ref(get_submodule_ref_store(submodule),
-					fn, cb_data);
+       return refs_for_each_remote_ref(get_main_ref_store(subrepo), fn,
+				       cb_data);
 }
 
 /*
@@ -879,6 +879,27 @@ static void free_submodules_oids(struct string_list *submodules)
 	string_list_clear(submodules, 1);
 }
 
+static struct repository* get_changed_submodule_repo(struct repository *r,
+						     const char *name_or_path)
+{
+	const struct submodule *submodule;
+	struct repository *subrepo;
+
+	submodule = submodule_from_name(r, &null_oid, name_or_path);
+	if (!submodule) {
+		/* Not a named submodule, try just using the path */
+		return open_submodule(name_or_path);
+	}
+
+	subrepo = xmalloc(sizeof(*subrepo));
+	if (repo_submodule_init(subrepo, r, submodule)) {
+		free(subrepo);
+		return NULL;
+	}
+
+	return subrepo;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
 		      int flags, void *cb_data)
 {
@@ -895,7 +916,6 @@ static int append_oid_to_argv(const struct object_id *oid, void *data)
 struct has_commit_data {
 	struct repository *repo;
 	int result;
-	const char *path;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -916,28 +936,22 @@ static int check_has_commit(const struct object_id *oid, void *data)
 		return 0;
 	default:
 		die(_("submodule entry '%s' (%s) is a %s, not a commit"),
-		    cb->path, oid_to_hex(oid), type_name(type));
+		    cb->repo->submodule_prefix, oid_to_hex(oid),
+		    type_name(type));
 	}
 }
 
-static int submodule_has_commits(struct repository *r,
-				 const char *path,
+static int submodule_has_commits(struct repository *subrepo,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = { subrepo, 1 };
 
 	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
+	 * Perform a check for the existence of 'commits' in the submodule's
+	 * object store.  If we do not have the commit object, there is no
+	 * chance we have it reachable from a ref, so we can fail early without
+	 * spawning rev-list which is expensive.
 	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
@@ -956,7 +970,7 @@ static int submodule_has_commits(struct repository *r,
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
-		cp.dir = path;
+		cp.dir = subrepo->submodule_prefix;
 
 		if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len)
 			has_commit.result = 0;
@@ -967,11 +981,12 @@ static int submodule_has_commits(struct repository *r,
 	return has_commit.result;
 }
 
-static int submodule_needs_pushing(struct repository *r,
-				   const char *path,
+static int submodule_needs_pushing(struct repository *subrepo,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	const char *path = subrepo->submodule_prefix;
+
+	if (!submodule_has_commits(subrepo, commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -985,7 +1000,7 @@ static int submodule_needs_pushing(struct repository *r,
 		 */
 		return 0;
 
-	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
+	if (for_each_remote_ref_submodule(subrepo, has_remote, NULL) > 0) {
 		struct child_process cp = CHILD_PROCESS_INIT;
 		struct strbuf buf = STRBUF_INIT;
 		int needs_pushing = 0;
@@ -1032,20 +1047,18 @@ int find_unpushed_submodules(struct repository *r,
 
 	for_each_string_list_item(name, &submodules) {
 		struct oid_array *commits = name->util;
-		const struct submodule *submodule;
-		const char *path = NULL;
-
-		submodule = submodule_from_name(r, &null_oid, name->string);
-		if (submodule)
-			path = submodule->path;
-		else
-			path = default_name_or_path(name->string);
+		struct repository *subrepo;
 
-		if (!path)
+		subrepo = get_changed_submodule_repo(r, name->string);
+		if (!subrepo)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
-			string_list_insert(needs_pushing, path);
+		if (submodule_needs_pushing(subrepo, commits))
+			string_list_insert(needs_pushing,
+					   subrepo->submodule_prefix);
+
+		repo_clear(subrepo);
+		free(subrepo);
 	}
 
 	free_submodules_oids(&submodules);
@@ -1060,7 +1073,12 @@ static int push_submodule(const char *path,
 			  const struct string_list *push_options,
 			  int dry_run)
 {
-	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
+	struct repository *subrepo = open_submodule(path);
+	int have_remote = for_each_remote_ref_submodule(subrepo, has_remote, NULL);
+	repo_clear(subrepo);
+	free(subrepo);
+
+	if (have_remote > 0) {
 		struct child_process cp = CHILD_PROCESS_INIT;
 		strvec_push(&cp.args, "push");
 		if (dry_run)
@@ -1219,22 +1237,19 @@ static void calculate_changed_submodule_paths(struct repository *r,
 
 	for_each_string_list_item(name, changed_submodule_names) {
 		struct oid_array *commits = name->util;
-		const struct submodule *submodule;
-		const char *path = NULL;
-
-		submodule = submodule_from_name(r, &null_oid, name->string);
-		if (submodule)
-			path = submodule->path;
-		else
-			path = default_name_or_path(name->string);
+		struct repository *subrepo;
 
-		if (!path)
+		subrepo = get_changed_submodule_repo(r, name->string);
+		if (!subrepo)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
+		if (submodule_has_commits(subrepo, commits)) {
 			oid_array_clear(commits);
 			*name->string = '\0';
 		}
+
+		repo_clear(subrepo);
+		free(subrepo);
 	}
 
 	string_list_remove_empty_items(changed_submodule_names, 1);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/7] Add failing test for partial clones with submodules
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
  2020-09-29 15:53 ` [PATCH 1/7] refs: store owning repository for object lookup Andrew Oakley
  2020-09-29 15:53 ` [PATCH 2/7] submodule: use separate submodule repositories Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 4/7] refs: use correct repo in refs_peel_ref Andrew Oakley
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

From: Luke Diamand <luke@diamand.org>

When using a partial clone with submodules, the initial clone works
fine, but subsequent updates fail with:

    fatal: git upload-pack: not our ref 5d54256650497d43cbeedc86648d6f16eaf556d2
    fatal: remote error: upload-pack: not our ref 5d54256650497d43cbeedc86648d6f16eaf556d2

Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 t/t0411-partial-clone-submodules.sh | 44 +++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100755 t/t0411-partial-clone-submodules.sh

diff --git a/t/t0411-partial-clone-submodules.sh b/t/t0411-partial-clone-submodules.sh
new file mode 100755
index 0000000000..e8fdcc4670
--- /dev/null
+++ b/t/t0411-partial-clone-submodules.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+test_description='partial clone with submodules'
+
+. ./test-lib.sh
+
+test_expect_success 'partial clone setup' '
+	test_create_repo super &&
+    test_create_repo submod &&
+
+	git -C super config uploadpack.allowfilter true &&
+	git -C super config uploadpack.allowanysha1inwant true &&
+
+    generate_zero_bytes 1024 >submod/bigfile &&
+    generate_zero_bytes 1 >submod/smallfile &&
+    git -C submod add bigfile smallfile &&
+    git -C submod commit -m "Adding files" &&
+    git -C super submodule add ../submod ./submod &&
+    git -C super commit -m "Adding submodule" &&
+    generate_zero_bytes 1025 >submod/bigfile &&
+    git -C submod commit -m "Extend bigfile" bigfile &&
+    git -C super submodule update --rebase --remote &&
+    git -C super add submod &&
+    git -C super commit -m "Extend bigfile"
+'
+
+test_expect_success 'partial clone of super' '
+    git clone --recursive --filter=blob:limit=512 "file://$(pwd)/super" cloned-super &&
+    test_path_exists cloned-super/submod/bigfile
+'
+
+test_expect_success 'update submodule' '
+    generate_zero_bytes 1026 >submod/bigfile &&
+    git -C submod commit -m "Further extend bigfile" bigfile &&
+    git -C super submodule update --rebase --remote &&
+    git -C super add submod &&
+    git -C super commit -m "Further extend bigfile"
+'
+
+test_expect_success 'update partial clone' '
+    git -C cloned-super pull --recurse-submodules --rebase
+'
+
+test_done
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/7] refs: use correct repo in refs_peel_ref
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
                   ` (2 preceding siblings ...)
  2020-09-29 15:53 ` [PATCH 3/7] Add failing test for partial clones with submodules Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 5/7] merge-recursive: use separate submodule repository Andrew Oakley
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

This required a few other functions to be updated to pass the corret
repo through.  If there is no repository, fail.

This change doesn't fix any known issue.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 builtin/fsck.c                   |  2 +-
 builtin/pack-objects.c           |  2 +-
 http-push.c                      |  2 +-
 object.c                         |  7 +++----
 object.h                         |  2 +-
 refs.c                           | 15 ++++++++++-----
 refs/packed-backend.c            |  5 +++--
 refs/ref-cache.c                 |  2 +-
 refs/refs-internal.h             |  4 +++-
 t/helper/test-example-decorate.c |  6 +++---
 t/helper/test-reach.c            |  2 +-
 tag.c                            |  4 ++--
 tag.h                            |  2 +-
 upload-pack.c                    |  2 +-
 walker.c                         |  3 ++-
 15 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index fbf26cafcf..9ce450d372 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -745,7 +745,7 @@ static int fsck_cache_tree(struct cache_tree *it)
 
 static void mark_object_for_connectivity(const struct object_id *oid)
 {
-	struct object *obj = lookup_unknown_object(oid);
+	struct object *obj = lookup_unknown_object(the_repository, oid);
 	obj->flags |= HAS_OBJ;
 }
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5617c01b5a..7ee63cb192 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3161,7 +3161,7 @@ static void add_objects_in_unpacked_packs(void)
 
 		for (i = 0; i < p->num_objects; i++) {
 			nth_packed_object_id(&oid, p, i);
-			o = lookup_unknown_object(&oid);
+			o = lookup_unknown_object(the_repository, &oid);
 			if (!(o->flags & OBJECT_ADDED))
 				mark_in_pack_object(o, p, &in_pack);
 			o->flags |= OBJECT_ADDED;
diff --git a/http-push.c b/http-push.c
index 6a4a43e07f..a9d1383165 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1436,7 +1436,7 @@ static void one_remote_ref(const char *refname)
 	 * may be required for updating server info later.
 	 */
 	if (repo->can_update_info_refs && !has_object_file(&ref->old_oid)) {
-		obj = lookup_unknown_object(&ref->old_oid);
+		obj = lookup_unknown_object(the_repository, &ref->old_oid);
 		fprintf(stderr,	"  fetch %s for %s\n",
 			oid_to_hex(&ref->old_oid), refname);
 		add_fetch_request(obj);
diff --git a/object.c b/object.c
index 3257518656..a538d2dd77 100644
--- a/object.c
+++ b/object.c
@@ -177,12 +177,11 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 }
 
-struct object *lookup_unknown_object(const struct object_id *oid)
+struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid)
 {
-	struct object *obj = lookup_object(the_repository, oid);
+	struct object *obj = lookup_object(r, oid);
 	if (!obj)
-		obj = create_object(the_repository, oid,
-				    alloc_object_node(the_repository));
+		obj = create_object(r, oid, alloc_object_node(r));
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 20b18805f0..8d6daf6df7 100644
--- a/object.h
+++ b/object.h
@@ -145,7 +145,7 @@ struct object *parse_object_or_die(const struct object_id *oid, const char *name
 struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p);
 
 /** Returns the object, with potentially excess memory allocated. **/
-struct object *lookup_unknown_object(const struct object_id *oid);
+struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid);
 
 struct object_list *object_list_insert(struct object *item,
 				       struct object_list **list_p);
diff --git a/refs.c b/refs.c
index e62da6f2de..ceff6d2af5 100644
--- a/refs.c
+++ b/refs.c
@@ -336,12 +336,14 @@ static int filter_refs(const char *refname, const struct object_id *oid,
 	return filter->fn(refname, oid, flags, filter->cb_data);
 }
 
-enum peel_status peel_object(const struct object_id *name, struct object_id *oid)
+enum peel_status peel_object(struct repository *repo,
+			     const struct object_id *name,
+			     struct object_id *oid)
 {
-	struct object *o = lookup_unknown_object(name);
+	struct object *o = lookup_unknown_object(repo, name);
 
 	if (o->type == OBJ_NONE) {
-		int type = oid_object_info(the_repository, name, NULL);
+		int type = oid_object_info(repo, name, NULL);
 		if (type < 0 || !object_as_type(o, type, 0))
 			return PEEL_INVALID;
 	}
@@ -349,7 +351,7 @@ enum peel_status peel_object(const struct object_id *name, struct object_id *oid
 	if (o->type != OBJ_TAG)
 		return PEEL_NON_TAG;
 
-	o = deref_tag_noverify(o);
+	o = deref_tag_noverify(repo, o);
 	if (!o)
 		return PEEL_INVALID;
 
@@ -1903,7 +1905,10 @@ int refs_peel_ref(struct ref_store *refs, const char *refname,
 			       RESOLVE_REF_READING, &base, &flag))
 		return -1;
 
-	return peel_object(&base, oid);
+	if (!refs->repo)
+		return -1;
+
+	return peel_object(refs->repo, &base, oid);
 }
 
 int peel_ref(const char *refname, struct object_id *oid)
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9743ee0155..0d598ee76c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -892,7 +892,7 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	} else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
 		return -1;
 	} else {
-		return !!peel_object(&iter->oid, peeled);
+		return !!peel_object(ref_iterator->repo, &iter->oid, peeled);
 	}
 }
 
@@ -1242,7 +1242,8 @@ static int write_with_updates(struct packed_ref_store *refs,
 			i++;
 		} else {
 			struct object_id peeled;
-			int peel_error = peel_object(&update->new_oid,
+			int peel_error = peel_object(refs->base.repo,
+						     &update->new_oid,
 						     &peeled);
 
 			if (write_packed_entry(out, update->refname,
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 974d37ee79..fc54c4eae3 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -491,7 +491,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
-	return peel_object(ref_iterator->oid, peeled);
+	return peel_object(ref_iterator->repo, ref_iterator->oid, peeled);
 }
 
 static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 9e9b2e8c76..b08fd7a247 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -102,7 +102,9 @@ enum peel_status {
  * or is not valid, return PEEL_NON_TAG or PEEL_INVALID, respectively,
  * and leave oid unchanged.
  */
-enum peel_status peel_object(const struct object_id *name, struct object_id *oid);
+enum peel_status peel_object(struct repository *repo,
+			     const struct object_id *name,
+			     struct object_id *oid);
 
 /**
  * Information needed for a single ref update. Set new_oid to the new
diff --git a/t/helper/test-example-decorate.c b/t/helper/test-example-decorate.c
index c8a1cde7d2..b9d1200eb9 100644
--- a/t/helper/test-example-decorate.c
+++ b/t/helper/test-example-decorate.c
@@ -26,8 +26,8 @@ int cmd__example_decorate(int argc, const char **argv)
 	 * Add 2 objects, one with a non-NULL decoration and one with a NULL
 	 * decoration.
 	 */
-	one = lookup_unknown_object(&one_oid);
-	two = lookup_unknown_object(&two_oid);
+	one = lookup_unknown_object(the_repository, &one_oid);
+	two = lookup_unknown_object(the_repository, &two_oid);
 	ret = add_decoration(&n, one, &decoration_a);
 	if (ret)
 		BUG("when adding a brand-new object, NULL should be returned");
@@ -56,7 +56,7 @@ int cmd__example_decorate(int argc, const char **argv)
 	ret = lookup_decoration(&n, two);
 	if (ret != &decoration_b)
 		BUG("lookup should return added declaration");
-	three = lookup_unknown_object(&three_oid);
+	three = lookup_unknown_object(the_repository, &three_oid);
 	ret = lookup_decoration(&n, three);
 	if (ret)
 		BUG("lookup for unknown object should return NULL");
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 14a3655442..9bae4b0682 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -61,7 +61,7 @@ int cmd__reach(int ac, const char **av)
 			die("failed to resolve %s", buf.buf + 2);
 
 		orig = parse_object(r, &oid);
-		peeled = deref_tag_noverify(orig);
+		peeled = deref_tag_noverify(r, orig);
 
 		if (!peeled)
 			die("failed to load commit for input %s resulting in oid %s\n",
diff --git a/tag.c b/tag.c
index 1ed2684e45..1ba7fda738 100644
--- a/tag.c
+++ b/tag.c
@@ -86,10 +86,10 @@ struct object *deref_tag(struct repository *r, struct object *o, const char *war
 	return o;
 }
 
-struct object *deref_tag_noverify(struct object *o)
+struct object *deref_tag_noverify(struct repository *r, struct object *o)
 {
 	while (o && o->type == OBJ_TAG) {
-		o = parse_object(the_repository, &o->oid);
+		o = parse_object(r, &o->oid);
 		if (o && o->type == OBJ_TAG && ((struct tag *)o)->tagged)
 			o = ((struct tag *)o)->tagged;
 		else
diff --git a/tag.h b/tag.h
index 3ce8e72192..c49d7c19ad 100644
--- a/tag.h
+++ b/tag.h
@@ -16,7 +16,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
 struct object *deref_tag(struct repository *r, struct object *, const char *, int);
-struct object *deref_tag_noverify(struct object *);
+struct object *deref_tag_noverify(struct repository *r, struct object *);
 int gpg_verify_tag(const struct object_id *oid,
 		   const char *name_to_report, unsigned flags);
 struct object_id *get_tagged_oid(struct tag *tag);
diff --git a/upload-pack.c b/upload-pack.c
index 3b858eb457..ccbb8887b0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1149,7 +1149,7 @@ static void receive_needs(struct upload_pack_data *data,
 static int mark_our_ref(const char *refname, const char *refname_full,
 			const struct object_id *oid)
 {
-	struct object *o = lookup_unknown_object(oid);
+	struct object *o = lookup_unknown_object(the_repository, oid);
 
 	if (ref_is_hidden(refname, refname_full)) {
 		o->flags |= HIDDEN_REF;
diff --git a/walker.c b/walker.c
index 4984bf8b3d..2a5cc90418 100644
--- a/walker.c
+++ b/walker.c
@@ -298,7 +298,8 @@ int walker_fetch(struct walker *walker, int targets, char **target,
 			error("Could not interpret response from server '%s' as something to pull", target[i]);
 			goto done;
 		}
-		if (process(walker, lookup_unknown_object(&oids[i])))
+		if (process(walker,
+			    lookup_unknown_object(the_repository, &oids[i])))
 			goto done;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/7] merge-recursive: use separate submodule repository
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
                   ` (3 preceding siblings ...)
  2020-09-29 15:53 ` [PATCH 4/7] refs: use correct repo in refs_peel_ref Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 6/7] submodule: remove add_submodule_odb Andrew Oakley
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

Using a separate repository with it's own object store should allow any
objects that need to be fetched from a promisor or an alternate object
store to work correctly.

This change is not expected to alter any behaviour.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 merge-recursive.c | 73 ++++++++++++++++++++++++++++-------------------
 revision.c        | 21 +++++---------
 revision.h        |  1 -
 submodule.c       | 11 +------
 submodule.h       | 11 +++++++
 5 files changed, 62 insertions(+), 55 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..4984c1e19e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -25,6 +25,7 @@
 #include "revision.h"
 #include "string-list.h"
 #include "submodule.h"
+#include "submodule-config.h"
 #include "tag.h"
 #include "tree-walk.h"
 #include "unpack-trees.h"
@@ -340,7 +341,8 @@ static void output(struct merge_options *opt, int v, const char *fmt, ...)
 		flush_output(opt);
 }
 
-static void output_commit_title(struct merge_options *opt, struct commit *commit)
+static void output_commit_title(struct merge_options *opt,
+				struct repository *repo, struct commit *commit)
 {
 	struct merge_remote_desc *desc;
 
@@ -352,15 +354,16 @@ static void output_commit_title(struct merge_options *opt, struct commit *commit
 		strbuf_add_unique_abbrev(&opt->obuf, &commit->object.oid,
 					 DEFAULT_ABBREV);
 		strbuf_addch(&opt->obuf, ' ');
-		if (parse_commit(commit) != 0)
+		if (repo_parse_commit(repo, commit) != 0)
 			strbuf_addstr(&opt->obuf, _("(bad commit)\n"));
 		else {
 			const char *title;
-			const char *msg = get_commit_buffer(commit, NULL);
+			const char *msg = repo_get_commit_buffer(repo, commit,
+								 NULL);
 			int len = find_commit_subject(msg, &title);
 			if (len)
 				strbuf_addf(&opt->obuf, "%.*s\n", len, title);
-			unuse_commit_buffer(commit, msg);
+			repo_unuse_commit_buffer(repo, commit, msg);
 		}
 	}
 	flush_output(opt);
@@ -1110,9 +1113,8 @@ static int find_first_merges(struct repository *repo,
 	xsnprintf(merged_revision, sizeof(merged_revision), "^%s",
 		  oid_to_hex(&a->object.oid));
 	repo_init_revisions(repo, &revs, NULL);
-	rev_opts.submodule = path;
 	/* FIXME: can't handle linked worktrees in submodules yet */
-	revs.single_worktree = path != NULL;
+	revs.single_worktree = repo != the_repository;
 	setup_revisions(ARRAY_SIZE(rev_args)-1, rev_args, &revs, &rev_opts);
 
 	/* save all revisions from the above list that contain b */
@@ -1149,12 +1151,12 @@ static int find_first_merges(struct repository *repo,
 	return result->nr;
 }
 
-static void print_commit(struct commit *commit)
+static void print_commit(struct repository *repo, struct commit *commit)
 {
 	struct strbuf sb = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
 	ctx.date_mode.type = DATE_NORMAL;
-	format_commit_message(commit, " %h: %m %s", &sb, &ctx);
+	repo_format_commit_message(repo, commit, " %h: %m %s", &sb, &ctx);
 	fprintf(stderr, "%s\n", sb.buf);
 	strbuf_release(&sb);
 }
@@ -1169,9 +1171,11 @@ static int merge_submodule(struct merge_options *opt,
 			   const struct object_id *base, const struct object_id *a,
 			   const struct object_id *b)
 {
+	struct repository *subrepo;
 	struct commit *commit_base, *commit_a, *commit_b;
 	int parent_count;
 	struct object_array merges;
+	int clean = 0;
 
 	int i;
 	int search = !opt->priv->call_depth;
@@ -1187,49 +1191,52 @@ static int merge_submodule(struct merge_options *opt,
 	if (is_null_oid(b))
 		return 0;
 
-	if (add_submodule_odb(path)) {
+	subrepo = open_submodule(path);
+	if (!subrepo) {
 		output(opt, 1, _("Failed to merge submodule %s (not checked out)"), path);
 		return 0;
 	}
 
-	if (!(commit_base = lookup_commit_reference(opt->repo, base)) ||
-	    !(commit_a = lookup_commit_reference(opt->repo, a)) ||
-	    !(commit_b = lookup_commit_reference(opt->repo, b))) {
+	if (!(commit_base = lookup_commit_reference(subrepo, base)) ||
+	    !(commit_a = lookup_commit_reference(subrepo, a)) ||
+	    !(commit_b = lookup_commit_reference(subrepo, b))) {
 		output(opt, 1, _("Failed to merge submodule %s (commits not present)"), path);
-		return 0;
+		goto cleanup;
 	}
 
 	/* check whether both changes are forward */
-	if (!in_merge_bases(commit_base, commit_a) ||
-	    !in_merge_bases(commit_base, commit_b)) {
+	if (!repo_in_merge_bases(subrepo, commit_base, commit_a) ||
+	    !repo_in_merge_bases(subrepo, commit_base, commit_b)) {
 		output(opt, 1, _("Failed to merge submodule %s (commits don't follow merge-base)"), path);
-		return 0;
+		goto cleanup;
 	}
 
 	/* Case #1: a is contained in b or vice versa */
-	if (in_merge_bases(commit_a, commit_b)) {
+	if (repo_in_merge_bases(subrepo, commit_a, commit_b)) {
 		oidcpy(result, b);
 		if (show(opt, 3)) {
 			output(opt, 3, _("Fast-forwarding submodule %s to the following commit:"), path);
-			output_commit_title(opt, commit_b);
+			output_commit_title(opt, subrepo, commit_b);
 		} else if (show(opt, 2))
 			output(opt, 2, _("Fast-forwarding submodule %s"), path);
 		else
 			; /* no output */
 
-		return 1;
+		clean = 1;
+		goto cleanup;
 	}
-	if (in_merge_bases(commit_b, commit_a)) {
+	if (repo_in_merge_bases(subrepo, commit_b, commit_a)) {
 		oidcpy(result, a);
 		if (show(opt, 3)) {
 			output(opt, 3, _("Fast-forwarding submodule %s to the following commit:"), path);
-			output_commit_title(opt, commit_a);
+			output_commit_title(opt, subrepo, commit_a);
 		} else if (show(opt, 2))
 			output(opt, 2, _("Fast-forwarding submodule %s"), path);
 		else
 			; /* no output */
 
-		return 1;
+		clean = 1;
+		goto cleanup;
 	}
 
 	/*
@@ -1241,10 +1248,10 @@ static int merge_submodule(struct merge_options *opt,
 
 	/* Skip the search if makes no sense to the calling context.  */
 	if (!search)
-		return 0;
+		goto cleanup;
 
 	/* find commit which merges them */
-	parent_count = find_first_merges(opt->repo, &merges, path,
+	parent_count = find_first_merges(subrepo, &merges, path,
 					 commit_a, commit_b);
 	switch (parent_count) {
 	case 0:
@@ -1254,7 +1261,8 @@ static int merge_submodule(struct merge_options *opt,
 	case 1:
 		output(opt, 1, _("Failed to merge submodule %s (not fast-forward)"), path);
 		output(opt, 2, _("Found a possible merge resolution for the submodule:\n"));
-		print_commit((struct commit *) merges.objects[0].item);
+		print_commit(subrepo,
+			     (struct commit *) merges.objects[0].item);
 		output(opt, 2, _(
 		       "If this is correct simply add it to the index "
 		       "for example\n"
@@ -1267,11 +1275,16 @@ static int merge_submodule(struct merge_options *opt,
 	default:
 		output(opt, 1, _("Failed to merge submodule %s (multiple merges found)"), path);
 		for (i = 0; i < merges.nr; i++)
-			print_commit((struct commit *) merges.objects[i].item);
+			print_commit(subrepo,
+				     (struct commit *) merges.objects[i].item);
 	}
 
 	object_array_clear(&merges);
-	return 0;
+
+cleanup:
+	repo_clear(subrepo);
+	free(subrepo);
+	return clean;
 }
 
 static int merge_mode_and_contents(struct merge_options *opt,
@@ -3548,8 +3561,8 @@ static int merge_recursive_internal(struct merge_options *opt,
 
 	if (show(opt, 4)) {
 		output(opt, 4, _("Merging:"));
-		output_commit_title(opt, h1);
-		output_commit_title(opt, h2);
+		output_commit_title(opt, opt->repo, h1);
+		output_commit_title(opt, opt->repo, h2);
 	}
 
 	if (!merge_bases) {
@@ -3563,7 +3576,7 @@ static int merge_recursive_internal(struct merge_options *opt,
 		output(opt, 5, Q_("found %u common ancestor:",
 				"found %u common ancestors:", cnt), cnt);
 		for (iter = merge_bases; iter; iter = iter->next)
-			output_commit_title(opt, iter->item);
+			output_commit_title(opt, opt->repo, iter->item);
 	}
 
 	merged_merge_bases = pop_commit(&merge_bases);
diff --git a/revision.c b/revision.c
index 067030e64c..a3747aaecc 100644
--- a/revision.c
+++ b/revision.c
@@ -2582,16 +2582,15 @@ static int for_each_good_bisect_ref(struct ref_store *refs, each_ref_fn fn, void
 	return for_each_bisect_ref(refs, fn, cb_data, term_good);
 }
 
-static int handle_revision_pseudo_opt(const char *submodule,
-				struct rev_info *revs,
-				int argc, const char **argv, int *flags)
+static int handle_revision_pseudo_opt(struct rev_info *revs,
+				      int argc, const char **argv, int *flags)
 {
 	const char *arg = argv[0];
 	const char *optarg;
 	struct ref_store *refs;
 	int argcount;
 
-	if (submodule) {
+	if (revs->repo != the_repository) {
 		/*
 		 * We need some something like get_submodule_worktrees()
 		 * before we can go through all worktrees of a submodule,
@@ -2600,9 +2599,8 @@ static int handle_revision_pseudo_opt(const char *submodule,
 		 */
 		if (!revs->single_worktree)
 			BUG("--single-worktree cannot be used together with submodule");
-		refs = get_submodule_ref_store(submodule);
-	} else
-		refs = get_main_ref_store(revs->repo);
+	}
+	refs = get_main_ref_store(revs->repo);
 
 	/*
 	 * NOTE!
@@ -2720,12 +2718,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 {
 	int i, flags, left, seen_dashdash, revarg_opt;
 	struct strvec prune_data = STRVEC_INIT;
-	const char *submodule = NULL;
 	int seen_end_of_options = 0;
 
-	if (opt)
-		submodule = opt->submodule;
-
 	/* First, search for "--" */
 	if (opt && opt->assume_dashdash) {
 		seen_dashdash = 1;
@@ -2754,9 +2748,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		if (!seen_end_of_options && *arg == '-') {
 			int opts;
 
-			opts = handle_revision_pseudo_opt(submodule,
-						revs, argc - i, argv + i,
-						&flags);
+			opts = handle_revision_pseudo_opt(revs, argc - i,
+							  argv + i, &flags);
 			if (opts > 0) {
 				i += opts - 1;
 				continue;
diff --git a/revision.h b/revision.h
index f6bf860d19..2b8396fc44 100644
--- a/revision.h
+++ b/revision.h
@@ -330,7 +330,6 @@ extern volatile show_early_output_fn_t show_early_output;
 struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
-	const char *submodule;	/* TODO: drop this and use rev_info->repo */
 	unsigned int	assume_dashdash:1,
 			allow_exclude_promisor_objects:1;
 	unsigned revarg_opt;
diff --git a/submodule.c b/submodule.c
index 3889dc7d9a..1671800340 100644
--- a/submodule.c
+++ b/submodule.c
@@ -505,16 +505,7 @@ static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
 	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
 }
 
-/*
- * Initialize a repository struct for a submodule based on the provided 'path'.
- *
- * Unlike repo_submodule_init, this tolerates submodules not present
- * in .gitmodules. This function exists only to preserve historical behavior,
- *
- * Returns the repository struct on success,
- * NULL when the submodule is not present.
- */
-static struct repository *open_submodule(const char *path)
+struct repository *open_submodule(const char *path)
 {
 	struct strbuf sb = STRBUF_INIT;
 	struct repository *out = xmalloc(sizeof(*out));
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1..6a66d3e05f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -99,6 +99,17 @@ int bad_to_remove_submodule(const char *path, unsigned flags);
 
 int add_submodule_odb(const char *path);
 
+/*
+ * Initialize a repository struct for a submodule based on the provided 'path'.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns the repository struct on success,
+ * NULL when the submodule is not present.
+ */
+struct repository *open_submodule(const char *path);
+
 /*
  * Checks if there are submodule changes in a..b. If a is the null OID,
  * checks b and all its ancestors instead.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/7] submodule: remove add_submodule_odb
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
                   ` (4 preceding siblings ...)
  2020-09-29 15:53 ` [PATCH 5/7] merge-recursive: use separate submodule repository Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 15:53 ` [PATCH 7/7] submodule: use partial clone filter Andrew Oakley
  2020-09-29 18:05 ` [PATCH 0/7] Submodules and partial clones Jonathan Tan
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

The remaining callers to get_submodule_ref_store do not require access
to the objects, and do not call add_submodule_odb.  This function can
therefore be removed and the refstore marked as not having access to the
objects.

The submodule ref-store tests have been updated to reflect the fact that
the objects are no longer available in the ref-store.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 refs.c                         |  5 +----
 refs.h                         | 37 +++-------------------------------
 submodule.c                    | 19 -----------------
 submodule.h                    |  2 --
 t/helper/test-ref-store.c      |  1 -
 t/t1406-submodule-ref-store.sh | 17 ++--------------
 6 files changed, 6 insertions(+), 75 deletions(-)

diff --git a/refs.c b/refs.c
index ceff6d2af5..0a1960b4fe 100644
--- a/refs.c
+++ b/refs.c
@@ -1832,10 +1832,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 	if (submodule_to_gitdir(&submodule_sb, submodule))
 		goto done;
 
-	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(the_repository,
-			      submodule_sb.buf,
-			      REF_STORE_READ | REF_STORE_ODB);
+	refs = ref_store_init(NULL, submodule_sb.buf, REF_STORE_READ);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
 
diff --git a/refs.h b/refs.h
index 6695518156..a1b381437b 100644
--- a/refs.h
+++ b/refs.h
@@ -831,40 +831,6 @@ int ref_storage_backend_exists(const char *name);
 
 struct ref_store *get_main_ref_store(struct repository *r);
 
-/**
- * Submodules
- * ----------
- *
- * If you want to iterate the refs of a submodule you first need to add the
- * submodules object database. You can do this by a code-snippet like
- * this:
- *
- * 	const char *path = "path/to/submodule"
- * 	if (add_submodule_odb(path))
- * 		die("Error submodule '%s' not populated.", path);
- *
- * `add_submodule_odb()` will return zero on success. If you
- * do not do this you will get an error for each ref that it does not point
- * to a valid object.
- *
- * Note: As a side-effect of this you cannot safely assume that all
- * objects you lookup are available in superproject. All submodule objects
- * will be available the same way as the superprojects objects.
- *
- * Example:
- * --------
- *
- * ----
- * static int handle_remote_ref(const char *refname,
- * 		const unsigned char *sha1, int flags, void *cb_data)
- * {
- * 	struct strbuf *output = cb_data;
- * 	strbuf_addf(output, "%s\n", refname);
- * 	return 0;
- * }
- *
- */
-
 /*
  * Return the ref_store instance for the specified submodule. For the
  * main repository, use submodule==NULL; such a call cannot fail. For
@@ -874,6 +840,9 @@ struct ref_store *get_main_ref_store(struct repository *r);
  *
  * For backwards compatibility, submodule=="" is treated the same as
  * submodule==NULL.
+ *
+ * The ref_store for a submodule will be read only and have no access to the
+ * object database.  Only raw access to the refs is possible.
  */
 struct ref_store *get_submodule_ref_store(const char *submodule);
 struct ref_store *get_worktree_ref_store(const struct worktree *wt);
diff --git a/submodule.c b/submodule.c
index 1671800340..cf7db6815e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -165,25 +165,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 		die(_("staging updated .gitmodules failed"));
 }
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	add_to_alternates_memory(objects_directory.buf);
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 					     const char *path)
 {
diff --git a/submodule.h b/submodule.h
index 6a66d3e05f..0260c75e06 100644
--- a/submodule.h
+++ b/submodule.h
@@ -97,8 +97,6 @@ int submodule_uses_gitfile(const char *path);
 #define SUBMODULE_REMOVAL_IGNORE_IGNORED_UNTRACKED (1<<2)
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
-int add_submodule_odb(const char *path);
-
 /*
  * Initialize a repository struct for a submodule based on the provided 'path'.
  *
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 759e69dc54..1ec9f1b764 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -32,7 +32,6 @@ static const char **get_store(const char **argv, struct ref_store **refs)
 		ret = strbuf_git_path_submodule(&sb, gitdir, "objects/");
 		if (ret)
 			die("strbuf_git_path_submodule failed: %d", ret);
-		add_to_alternates_memory(sb.buf);
 		strbuf_release(&sb);
 
 		*refs = get_submodule_ref_store(gitdir);
diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index 36b7ef5046..f4fbf26149 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -20,10 +20,8 @@ test_expect_success 'pack_refs() not allowed' '
 '
 
 test_expect_success 'peel_ref(new-tag)' '
-	git -C sub rev-parse HEAD >expected &&
 	git -C sub tag -a -m new-tag new-tag HEAD &&
-	$RUN peel-ref refs/tags/new-tag >actual &&
-	test_cmp expected actual
+	test_must_fail $RUN peel-ref refs/tags/new-tag
 '
 
 test_expect_success 'create_symref() not allowed' '
@@ -39,18 +37,7 @@ test_expect_success 'rename_refs() not allowed' '
 '
 
 test_expect_success 'for_each_ref(refs/heads/)' '
-	$RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
-	cat >expected <<-\EOF &&
-	master 0x0
-	new-master 0x0
-	EOF
-	test_cmp expected actual
-'
-
-test_expect_success 'for_each_ref() is sorted' '
-	$RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
-	sort actual > expected &&
-	test_cmp expected actual
+	test_must_fail $RUN for-each-ref refs/heads/
 '
 
 test_expect_success 'resolve_ref(master)' '
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/7] submodule: use partial clone filter
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
                   ` (5 preceding siblings ...)
  2020-09-29 15:53 ` [PATCH 6/7] submodule: remove add_submodule_odb Andrew Oakley
@ 2020-09-29 15:53 ` Andrew Oakley
  2020-09-29 18:05 ` [PATCH 0/7] Submodules and partial clones Jonathan Tan
  7 siblings, 0 replies; 11+ messages in thread
From: Andrew Oakley @ 2020-09-29 15:53 UTC (permalink / raw)
  To: git, Luke Diamand, Jonathan Tan; +Cc: Andrew Oakley

Pass the --filter option down when fetching submodules.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
---
 builtin/clone.c             |  5 +++++
 builtin/submodule--helper.c | 21 ++++++++++++++++-----
 git-submodule.sh            | 20 +++++++++++++++++++-
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fbfd6568cd..354dadc544 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -844,6 +844,11 @@ static int checkout(int submodule_progress)
 					       "--single-branch" :
 					       "--no-single-branch");
 
+		if (filter_options.choice)
+			strvec_pushf(&args, "--filter=%s",
+				     expand_list_objects_filter_spec(
+					&filter_options));
+
 		err = run_command_v_opt(args.v, RUN_GIT_CMD);
 		strvec_clear(&args);
 	}
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index de5ad73bb8..1ca03ec08b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1658,7 +1658,8 @@ static int module_deinit(int argc, const char **argv, const char *prefix)
 }
 
 static int clone_submodule(const char *path, const char *gitdir, const char *url,
-			   const char *depth, struct string_list *reference, int dissociate,
+			   const char *depth, const char *filter,
+			   struct string_list *reference, int dissociate,
 			   int quiet, int progress, int single_branch)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -1671,6 +1672,8 @@ static int clone_submodule(const char *path, const char *gitdir, const char *url
 		strvec_push(&cp.args, "--progress");
 	if (depth && *depth)
 		strvec_pushl(&cp.args, "--depth", depth, NULL);
+	if (filter && *filter)
+		strvec_pushl(&cp.args, "--filter", filter, NULL);
 	if (reference->nr) {
 		struct string_list_item *item;
 		for_each_string_list_item(item, reference)
@@ -1803,7 +1806,7 @@ static void prepare_possible_alternates(const char *sm_name,
 
 static int module_clone(int argc, const char **argv, const char *prefix)
 {
-	const char *name = NULL, *url = NULL, *depth = NULL;
+	const char *name = NULL, *url = NULL, *depth = NULL, *filter = NULL;
 	int quiet = 0;
 	int progress = 0;
 	char *p, *path = NULL, *sm_gitdir;
@@ -1834,6 +1837,9 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &depth,
 			   N_("string"),
 			   N_("depth for shallow clones")),
+		OPT_STRING(0, "filter", &filter,
+			   N_("string"),
+			   N_("object filtering")),
 		OPT__QUIET(&quiet, "Suppress output for cloning a submodule"),
 		OPT_BOOL(0, "progress", &progress,
 			   N_("force cloning progress")),
@@ -1847,7 +1853,7 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	const char *const git_submodule_helper_usage[] = {
 		N_("git submodule--helper clone [--prefix=<path>] [--quiet] "
 		   "[--reference <repository>] [--name <name>] [--depth <depth>] "
-		   "[--single-branch] "
+		   "[--filter <filter>] [--single-branch] "
 		   "--url <url> --path <path>"),
 		NULL
 	};
@@ -1879,8 +1885,8 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 
 		prepare_possible_alternates(name, &reference);
 
-		if (clone_submodule(path, sm_gitdir, url, depth, &reference, dissociate,
-				    quiet, progress, single_branch))
+		if (clone_submodule(path, sm_gitdir, url, depth, filter, &reference,
+				    dissociate, quiet, progress, single_branch))
 			die(_("clone of '%s' into submodule path '%s' failed"),
 			    url, path);
 	} else {
@@ -2002,6 +2008,7 @@ struct submodule_update_clone {
 	int dissociate;
 	unsigned require_init;
 	const char *depth;
+	const char *filter;
 	const char *recursive_prefix;
 	const char *prefix;
 	int single_branch;
@@ -2165,6 +2172,8 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 		strvec_push(&child->args, "--dissociate");
 	if (suc->depth)
 		strvec_push(&child->args, suc->depth);
+	if (suc->filter)
+		strvec_push(&child->args, suc->filter);
 	if (suc->single_branch >= 0)
 		strvec_push(&child->args, suc->single_branch ?
 					      "--single-branch" :
@@ -2339,6 +2348,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &suc.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_STRING(0, "filter", &suc.filter, "<filter>",
+			   N_("object filtering")),
 		OPT_INTEGER('j', "jobs", &suc.max_jobs,
 			    N_("parallel jobs")),
 		OPT_BOOL(0, "recommend-shallow", &suc.recommend_shallow,
diff --git a/git-submodule.sh b/git-submodule.sh
index 6fb12585cb..029f7e7f44 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -45,6 +45,7 @@ update=
 prefix=
 custom_name=
 depth=
+filter=
 progress=
 dissociate=
 single_branch=
@@ -132,6 +133,14 @@ cmd_add()
 		--depth=*)
 			depth=$1
 			;;
+		--filter)
+			case "$2" in '') usage ;; esac
+			filter="--filter=$2"
+			shift
+			;;
+		--filter=*)
+			filter=$1
+			;;
 		--)
 			shift
 			break
@@ -268,7 +277,7 @@ or you are unsure what this means choose another name with the '--name' option."
 				eval_gettextln "Reactivating local git directory for submodule '\$sm_name'."
 			fi
 		fi
-		git submodule--helper clone ${GIT_QUIET:+--quiet} ${progress:+"--progress"} --prefix "$wt_prefix" --path "$sm_path" --name "$sm_name" --url "$realrepo" ${reference:+"$reference"} ${dissociate:+"--dissociate"} ${depth:+"$depth"} || exit
+		git submodule--helper clone ${GIT_QUIET:+--quiet} ${progress:+"--progress"} --prefix "$wt_prefix" --path "$sm_path" --name "$sm_name" --url "$realrepo" ${reference:+"$reference"} ${dissociate:+"--dissociate"} ${depth:+"$depth"} ${filter:+"$filter"} || exit
 		(
 			sanitize_submodule_env
 			cd "$sm_path" &&
@@ -498,6 +507,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		--filter)
+			case "$2" in '') usage ;; esac
+			filter="--depth=$2"
+			shift
+			;;
+		--filter=*)
+			filter=$1
+			;;
 		-j|--jobs)
 			case "$2" in '') usage ;; esac
 			jobs="--jobs=$2"
@@ -540,6 +557,7 @@ cmd_update()
 		${reference:+"$reference"} \
 		${dissociate:+"--dissociate"} \
 		${depth:+--depth "$depth"} \
+		${filter:+--filter "$filter"} \
 		${require_init:+--require-init} \
 		$single_branch \
 		$recommend_shallow \
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/7] Submodules and partial clones
  2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
                   ` (6 preceding siblings ...)
  2020-09-29 15:53 ` [PATCH 7/7] submodule: use partial clone filter Andrew Oakley
@ 2020-09-29 18:05 ` Jonathan Tan
  2020-09-30 13:28   ` Andrew Oakley
  7 siblings, 1 reply; 11+ messages in thread
From: Jonathan Tan @ 2020-09-29 18:05 UTC (permalink / raw)
  To: andrew; +Cc: git, luke, jonathantanmy

> I've been investigating what is required to get submodules and partial
> clones to work well together.  The issue seems to be that the correct
> repository is not passed around, so we sometimes end up trying to fetch
> objects from the wrong place.
> 
> These patches don't make promisor_remote_get_direct handle different
> repositories because I've not found a case where that is necessary.

Anything that reads a submodule object without spawning another process
to do so (e.g. grep, which adds submodule object stores as alternates in
order to read from them) will need to be prepared to lazy-fetch objects
into those stores.

> The patches rework various cases where objects from a submodule are
> added to the object store of the main repository.  There are some
> remaining cases where add_to_alternates_memory is used to do this, but
> add_submodule_odb has been removed.
> 
> I expect there will be some remaining issues, but these changes seem to
> be enough to get the basics working.

What are the basics that work?

When I looked into this, my main difficulty lay in getting the
lazy fetch to work in another repository. Now that lazy fetches are done
using a separate process, the problem has shifted to being able to
invoke run_command() in a separate Git repository. I haven't figured out
the best way to ensure that run_command() is run with a clean set of
environment variables (so no inheriting of GIT_DIR etc.), but that
doesn't seem insurmountable.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/7] Submodules and partial clones
  2020-09-29 18:05 ` [PATCH 0/7] Submodules and partial clones Jonathan Tan
@ 2020-09-30 13:28   ` Andrew Oakley
  2020-09-30 20:41     ` Jonathan Tan
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Oakley @ 2020-09-30 13:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, luke

On Tue, 29 Sep 2020 11:05:08 -0700
Jonathan Tan <jonathantanmy@google.com> wrote:

> > I've been investigating what is required to get submodules and
> > partial clones to work well together.  The issue seems to be that
> > the correct repository is not passed around, so we sometimes end up
> > trying to fetch objects from the wrong place.
> > 
> > These patches don't make promisor_remote_get_direct handle different
> > repositories because I've not found a case where that is necessary.
> 
> Anything that reads a submodule object without spawning another
> process to do so (e.g. grep, which adds submodule object stores as
> alternates in order to read from them) will need to be prepared to
> lazy-fetch objects into those stores.

Yes, grep just calls `add_to_alternates_memory` and will be broken.

When handling nested submodules `config_from_gitmodules` does the same
thing, so that will also be broken if some of the .gitmodules files
need fetching.

Fixing these probably does require supporting fetching of objects from
submodules.

> > The patches rework various cases where objects from a submodule are
> > added to the object store of the main repository.  There are some
> > remaining cases where add_to_alternates_memory is used to do this,
> > but add_submodule_odb has been removed.
> > 
> > I expect there will be some remaining issues, but these changes
> > seem to be enough to get the basics working.  
> 
> What are the basics that work?

I've tried at least the following, in a repo with several submodules and
large objects (but no nested submodules):
- git clone --recursive --filter=blob:limit=1M ...
- git pull --rebase --recurse-submodules=on-demand
- git show --submodue=diff <commit-with-big-submodule-object>
- git push --recurse-submodules=check
- git push --recurse-submodules=on-demand

I used the partial clone for a while and didn't hit any problems, but I
can't say what (relevant) commands I might have used.

An important thing that I've not tried is a merge that needs to fetch
objects.  I should probably write a testcase for that.

> When I looked into this, my main difficulty lay in getting the
> lazy fetch to work in another repository. Now that lazy fetches are
> done using a separate process, the problem has shifted to being able
> to invoke run_command() in a separate Git repository. I haven't
> figured out the best way to ensure that run_command() is run with a
> clean set of environment variables (so no inheriting of GIT_DIR
> etc.), but that doesn't seem insurmountable.

Yes, I think that to fix promisor_remote_get_direct we need to:
- store the promisor configuration per-repository
- run the fetch process in the correct repository

AFAICT we just need to set cp.dir and call prepare_submodule_repo_env
to get the right environment for the fetch process. The per-repository
configuration looks more fiddly to do.  I'm happy to try and make these
additional changes (but it won't be quick as I'm busy with the day job).

In any case we need to pass the right repository around.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/7] Submodules and partial clones
  2020-09-30 13:28   ` Andrew Oakley
@ 2020-09-30 20:41     ` Jonathan Tan
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Tan @ 2020-09-30 20:41 UTC (permalink / raw)
  To: andrew; +Cc: jonathantanmy, git, luke

> Yes, grep just calls `add_to_alternates_memory` and will be broken.
> 
> When handling nested submodules `config_from_gitmodules` does the same
> thing, so that will also be broken if some of the .gitmodules files
> need fetching.
> 
> Fixing these probably does require supporting fetching of objects from
> submodules.

[snip]

> > > The patches rework various cases where objects from a submodule are
> > > added to the object store of the main repository.  There are some
> > > remaining cases where add_to_alternates_memory is used to do this,
> > > but add_submodule_odb has been removed.
> > > 
> > > I expect there will be some remaining issues, but these changes
> > > seem to be enough to get the basics working.  
> > 
> > What are the basics that work?
> 
> I've tried at least the following, in a repo with several submodules and
> large objects (but no nested submodules):
> - git clone --recursive --filter=blob:limit=1M ...
> - git pull --rebase --recurse-submodules=on-demand
> - git show --submodue=diff <commit-with-big-submodule-object>
> - git push --recurse-submodules=check
> - git push --recurse-submodules=on-demand
> 
> I used the partial clone for a while and didn't hit any problems, but I
> can't say what (relevant) commands I might have used.
> 
> An important thing that I've not tried is a merge that needs to fetch
> objects.  I should probably write a testcase for that.

Thanks - so it looks like what we need to have are:

 (1) propagate --filter when cloning (done here)
 (2) handle how Git lazy-fetches when accessing submodule objects
  (2a) access through add_submodule_odb (seems to have been done here -
       the patches here convert these accesses into (2c))
  (2b) access through add_to_alternates_memory (reading missing objects will
       trigger a likely-to-fail fetch)
  (2c) access through repo_.* and similar functions (reading missing objects
       will fail outright)
  (2d) access through invoking a subprocess (will work)

Having (1) and (2a) means that (as you described above) we can have
generally working partial-clone submodules (in that commands like
"clone", "pull", and "push" work), except that there may be some
commands like "grep" that will have strange behavior (lazy-fetching from
the wrong repo and failing). Currently, partial-clone submodules do not
work at all. My initial inclination was to say that we should resolve
(2c) first, but it is true that this change (even without (2b) and (2c))
would bring user-facing benefit, but at the cost of possible negative
surprises (even if we warn in the documentation that this feature is
experimental and might fail). I'm not sure what the rest of the Git
developer community thinks about this.

> > When I looked into this, my main difficulty lay in getting the
> > lazy fetch to work in another repository. Now that lazy fetches are
> > done using a separate process, the problem has shifted to being able
> > to invoke run_command() in a separate Git repository. I haven't
> > figured out the best way to ensure that run_command() is run with a
> > clean set of environment variables (so no inheriting of GIT_DIR
> > etc.), but that doesn't seem insurmountable.
> 
> Yes, I think that to fix promisor_remote_get_direct we need to:
> - store the promisor configuration per-repository
> - run the fetch process in the correct repository
> 
> AFAICT we just need to set cp.dir and call prepare_submodule_repo_env
> to get the right environment for the fetch process. The per-repository
> configuration looks more fiddly to do.  I'm happy to try and make these
> additional changes (but it won't be quick as I'm busy with the day job).
> 
> In any case we need to pass the right repository around.

Ah, good point about prepare_submodule_repo_env() - that does take care
of the environment variables.

I'll look at per-repository configuration and see what I can do too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-09-30 20:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29 15:53 [PATCH 0/7] Submodules and partial clones Andrew Oakley
2020-09-29 15:53 ` [PATCH 1/7] refs: store owning repository for object lookup Andrew Oakley
2020-09-29 15:53 ` [PATCH 2/7] submodule: use separate submodule repositories Andrew Oakley
2020-09-29 15:53 ` [PATCH 3/7] Add failing test for partial clones with submodules Andrew Oakley
2020-09-29 15:53 ` [PATCH 4/7] refs: use correct repo in refs_peel_ref Andrew Oakley
2020-09-29 15:53 ` [PATCH 5/7] merge-recursive: use separate submodule repository Andrew Oakley
2020-09-29 15:53 ` [PATCH 6/7] submodule: remove add_submodule_odb Andrew Oakley
2020-09-29 15:53 ` [PATCH 7/7] submodule: use partial clone filter Andrew Oakley
2020-09-29 18:05 ` [PATCH 0/7] Submodules and partial clones Jonathan Tan
2020-09-30 13:28   ` Andrew Oakley
2020-09-30 20:41     ` Jonathan Tan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).