git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] fetch: make sure submodule oids are fetched
@ 2018-08-08 22:17 Stefan Beller
  2018-08-08 22:17 ` [PATCH 01/10] string_list: print_string_list to use trace_printf Stefan Beller
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows, not so well in others,
which this series aims to fix.

The first patches provide new basic functionality and do some refactoring;
the interesting part is in the two last patches.

Thanks,
Stefan

Stefan Beller (10):
  string_list: print_string_list to use trace_printf
  string-list.h: add string_list_pop function.
  sha1-array: provide oid_array_remove_if
  submodule.c: convert submodule_move_head new argument to object id
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if sha1 were not fetched

 builtin/fetch.c             |   9 +-
 entry.c                     |   6 +-
 sha1-array.c                |  39 ++++++++
 sha1-array.h                |   3 +
 string-list.c               |  12 ++-
 string-list.h               |   6 ++
 submodule.c                 | 194 +++++++++++++++++++++++++++---------
 submodule.h                 |   2 +-
 t/t5526-fetch-submodules.sh |  23 ++++-
 unpack-trees.c              |  13 +--
 10 files changed, 241 insertions(+), 66 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/10] string_list: print_string_list to use trace_printf
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-09 21:16   ` Junio C Hamano
  2018-08-08 22:17 ` [PATCH 02/10] string-list.h: add string_list_pop function Stefan Beller
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

It is a debugging aid, so it should print to the debugging channel.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/string-list.c b/string-list.c
index 771c4550980..9f651bb4294 100644
--- a/string-list.c
+++ b/string-list.c
@@ -200,9 +200,9 @@ void print_string_list(const struct string_list *p, const char *text)
 {
 	int i;
 	if ( text )
-		printf("%s\n", text);
+		trace_printf("%s\n", text);
 	for (i = 0; i < p->nr; i++)
-		printf("%s:%p\n", p->items[i].string, p->items[i].util);
+		trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
 }
 
 struct string_list_item *string_list_append_nodup(struct string_list *list,
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
  2018-08-08 22:17 ` [PATCH 01/10] string_list: print_string_list to use trace_printf Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-09  7:35   ` Martin Ågren
  2018-08-08 22:17 ` [PATCH 03/10] sha1-array: provide oid_array_remove_if Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

A string list can be used as a stack, but should we? A later patch shows
how useful this will be.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 8 ++++++++
 string-list.h | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/string-list.c b/string-list.c
index 9f651bb4294..ea80afc8a0c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -80,6 +80,14 @@ void string_list_remove(struct string_list *list, const char *string,
 	}
 }
 
+struct string_list_item *string_list_pop(struct string_list *list)
+{
+	if (list->nr == 0)
+		return 0;
+
+	return &list->items[--list->nr];
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
 	int exact_match;
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..a1a41bc961a 100644
--- a/string-list.h
+++ b/string-list.h
@@ -191,6 +191,12 @@ extern void string_list_remove(struct string_list *list, const char *string,
  */
 struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
 
+/**
+ * Returns the last item inserted and removes it from the list.
+ * If the list is empty, return NULL.
+ */
+struct string_list_item *string_list_pop(struct string_list *list);
+
 /*
  * Remove all but the first of consecutive entries with the same
  * string value.  If free_util is true, call free() on the util
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
  2018-08-08 22:17 ` [PATCH 01/10] string_list: print_string_list to use trace_printf Stefan Beller
  2018-08-08 22:17 ` [PATCH 02/10] string-list.h: add string_list_pop function Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-09  7:39   ` Martin Ågren
  2018-08-09 21:44   ` Junio C Hamano
  2018-08-08 22:17 ` [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1-array.c | 39 +++++++++++++++++++++++++++++++++++++++
 sha1-array.h |  3 +++
 2 files changed, 42 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf40..10eb08b425e 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,42 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+int oid_array_remove_if(struct oid_array *array,
+			for_each_oid_fn fn,
+			void *data)
+{
+	int i, j;
+	char *to_remove = xcalloc(array->nr, sizeof(char));
+
+	/* No oid_array_sort() here! See the api-oid-array.txt docs! */
+
+	for (i = 0; i < array->nr; i++) {
+		int ret = fn(array->oid + i, data);
+		if (ret)
+			to_remove[i] = 1;
+	}
+
+	i = 0, j = 0;
+	while (i < array->nr && j < array->nr) {
+		while (i < array->nr && !to_remove[i])
+			i++;
+		/* i at first marked for deletion or out */
+		if (j < i)
+			j = i;
+		while (j < array->nr && to_remove[j])
+			j++;
+		/* j > i; j at first valid after first deletion range or out */
+		if (i < array->nr && j < array->nr)
+			oidcpy(&array->oid[i], &array->oid[j]);
+		else if (i >= array->nr)
+			assert(j >= array->nr);
+			/* no pruning happened, keep original array->nr */
+		else if (j >= array->nr)
+			array->nr = i;
+	}
+
+	free(to_remove);
+
+	return 0;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..151c7ad7f30 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
+int oid_array_remove_if(struct oid_array *array,
+			for_each_oid_fn fn,
+			void *data);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (2 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 03/10] sha1-array: provide oid_array_remove_if Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-09 22:00   ` Junio C Hamano
  2018-08-08 22:17 ` [PATCH 05/10] submodule.c: fix indentation Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

All callers use oid_to_hex to convert the desired oid to a string before
calling submodule_move_head. Defer the conversion to the
submodule_move_head as it will turn out to be useful in a bit.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 entry.c        |  6 +++---
 submodule.c    | 12 ++++++------
 submodule.h    |  2 +-
 unpack-trees.c | 13 +++++--------
 4 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/entry.c b/entry.c
index b5d1d3cf231..4b34dfd30df 100644
--- a/entry.c
+++ b/entry.c
@@ -358,7 +358,7 @@ static int write_entry(struct cache_entry *ce,
 		sub = submodule_from_ce(ce);
 		if (sub)
 			return submodule_move_head(ce->name,
-				NULL, oid_to_hex(&ce->oid),
+				NULL, &ce->oid,
 				state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 		break;
 
@@ -438,10 +438,10 @@ int checkout_entry(struct cache_entry *ce,
 					unlink_or_warn(ce->name);
 
 				return submodule_move_head(ce->name,
-					NULL, oid_to_hex(&ce->oid), 0);
+					NULL, &ce->oid, 0);
 			} else
 				return submodule_move_head(ce->name,
-					"HEAD", oid_to_hex(&ce->oid),
+					"HEAD", &ce->oid,
 					state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 		}
 
diff --git a/submodule.c b/submodule.c
index 6e14547e9e0..5b4e5227d90 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1597,9 +1597,9 @@ static void submodule_reset_index(const char *path)
  * pass NULL for old or new respectively.
  */
 int submodule_move_head(const char *path,
-			 const char *old_head,
-			 const char *new_head,
-			 unsigned flags)
+			const char *old_head,
+			const struct object_id *new_oid,
+			unsigned flags)
 {
 	int ret = 0;
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -1679,7 +1679,7 @@ int submodule_move_head(const char *path,
 	if (!(flags & SUBMODULE_MOVE_HEAD_FORCE))
 		argv_array_push(&cp.args, old_head ? old_head : empty_tree_oid_hex());
 
-	argv_array_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());
+	argv_array_push(&cp.args, new_oid ? oid_to_hex(new_oid) : empty_tree_oid_hex());
 
 	if (run_command(&cp)) {
 		ret = error(_("Submodule '%s' could not be updated."), path);
@@ -1687,7 +1687,7 @@ int submodule_move_head(const char *path,
 	}
 
 	if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
-		if (new_head) {
+		if (new_oid) {
 			child_process_init(&cp);
 			/* also set the HEAD accordingly */
 			cp.git_cmd = 1;
@@ -1696,7 +1696,7 @@ int submodule_move_head(const char *path,
 
 			prepare_submodule_repo_env(&cp.env_array);
 			argv_array_pushl(&cp.args, "update-ref", "HEAD",
-					 "--no-deref", new_head, NULL);
+					 "--no-deref", oid_to_hex(new_oid), NULL);
 
 			if (run_command(&cp)) {
 				ret = -1;
diff --git a/submodule.h b/submodule.h
index 4644683e6cb..d1cceabb9b7 100644
--- a/submodule.h
+++ b/submodule.h
@@ -118,7 +118,7 @@ int submodule_to_gitdir(struct strbuf *buf, const char *submodule);
 #define SUBMODULE_MOVE_HEAD_FORCE   (1<<1)
 extern int submodule_move_head(const char *path,
 			       const char *old,
-			       const char *new_head,
+			       const struct object_id *new_oid,
 			       unsigned flags);
 
 void submodule_unset_core_worktree(const struct submodule *sub);
diff --git a/unpack-trees.c b/unpack-trees.c
index f9efee0836a..6c76bd6162a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -256,7 +256,7 @@ static void display_error_msgs(struct unpack_trees_options *o)
 
 static int check_submodule_move_head(const struct cache_entry *ce,
 				     const char *old_id,
-				     const char *new_id,
+				     const struct object_id *new_id,
 				     struct unpack_trees_options *o)
 {
 	unsigned flags = SUBMODULE_MOVE_HEAD_DRY_RUN;
@@ -1513,7 +1513,7 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int r = check_submodule_move_head(ce,
-				"HEAD", oid_to_hex(&ce->oid), o);
+				"HEAD", &ce->oid, o);
 			if (r)
 				return o->gently ? -1 :
 					add_rejected_path(o, error_type, ce->name);
@@ -1576,8 +1576,7 @@ static int verify_clean_submodule(const char *old_sha1,
 	if (!submodule_from_ce(ce))
 		return 0;
 
-	return check_submodule_move_head(ce, old_sha1,
-					 oid_to_hex(&ce->oid), o);
+	return check_submodule_move_head(ce, old_sha1, &ce->oid, o);
 }
 
 static int verify_clean_subdirectory(const struct cache_entry *ce,
@@ -1821,8 +1820,7 @@ static int merged_entry(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int ret = check_submodule_move_head(ce, NULL,
-							    oid_to_hex(&ce->oid),
-							    o);
+							    &ce->oid, o);
 			if (ret)
 				return ret;
 		}
@@ -1850,8 +1848,7 @@ static int merged_entry(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int ret = check_submodule_move_head(ce, oid_to_hex(&old->oid),
-							    oid_to_hex(&ce->oid),
-							    o);
+							    &ce->oid, o);
 			if (ret)
 				return ret;
 		}
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/10] submodule.c: fix indentation
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (3 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-08 22:17 ` [PATCH 06/10] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5b4e5227d90..bceeba13217 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1254,8 +1255,9 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule || !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/10] submodule.c: sort changed_submodule_names before searching it
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (4 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 05/10] submodule.c: fix indentation Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-08 22:17 ` [PATCH 07/10] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

Instead of sorting it after we created an unsorted list, we could insert
correctly into the list.  As the unsorted append is in order of cache entry
names, this is already sorted if names were equal to paths for submodules.

As submodule names are often the same as their path, the input is sorted
pretty well already, so let's just do the sort afterwards.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index bceeba13217..89a46b8af50 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1255,7 +1255,7 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(
+			if (!submodule || !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1349,6 +1349,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/10] submodule: move global changed_submodule_names into fetch submodule struct
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (5 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 06/10] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-08 22:17 ` [PATCH 08/10] submodule.c: do not copy around submodule list Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 89a46b8af50..92988239f6b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1256,7 +1260,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule || !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1348,8 +1352,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1358,7 +1362,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/10] submodule.c: do not copy around submodule list
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (6 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 07/10] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-08 22:17 ` [PATCH 09/10] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
  2018-08-08 22:17 ` [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
  9 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

As a side effect, we'll have access to the util pointer for longer that
contains the commits that we need to fetch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 92988239f6b..21757e32908 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1362,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/10] submodule: fetch in submodules git directory instead of in worktree
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (7 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 08/10] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-08 22:17 ` [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
  9 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that git-fetch actually doesn't
need to be run in the worktree. So let's run it in the git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c                 | 43 ++++++++++++++++++++++++++-----------
 t/t5526-fetch-submodules.sh |  7 +++++-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 21757e32908..ec7ea6f8c2d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static const char *get_submodule_git_dir(struct repository *r, const char *path)
+{
+	struct repository subrepo;
+	const char *ret;
+
+	if (repo_submodule_init(&subrepo, r, path)) {
+		/* no entry in .gitmodules? */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
+		if (repo_init(&subrepo, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+	}
+
+	ret = xstrdup(subrepo.gitdir);
+	repo_clear(&subrepo);
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1273,16 +1298,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		git_dir = get_submodule_git_dir(spf->r, ce->name);
+		if (git_dir) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = git_dir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1294,8 +1315,6 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, submodule_prefix.buf);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 0f730d77815..4437cb17698 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -567,7 +567,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched
  2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (8 preceding siblings ...)
  2018-08-08 22:17 ` [PATCH 09/10] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-08-08 22:17 ` Stefan Beller
  2018-08-09  7:50   ` Martin Ågren
  9 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-08 22:17 UTC (permalink / raw)
  To: git; +Cc: hvoigt, Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their on ref (refs/changes/<int>).

Retry fetching a submodule if the object id that the superproject points
to, cannot be found.

Note: This is an RFC and doesn't support fetching to FETCH_HEAD yet, but
only into a local branch. To make fetching into FETCH_HEAD work, we need
some refactoring in builtin/fetch.c to adjust the calls to
'check_for_new_submodule_commits'.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |  9 ++---
 submodule.c                 | 79 ++++++++++++++++++++++++++++++++++++-
 t/t5526-fetch-submodules.sh | 16 ++++++++
 3 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 34d2bd123b3..9396e6a44c6 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index ec7ea6f8c2d..6cbd0b1a470 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,6 +1127,7 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list retry;
 };
 #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
 
@@ -1259,8 +1260,10 @@ static int get_next_submodule(struct child_process *cp,
 {
 	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
+	struct string_list_item *retry_it;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
+		int recurse_config;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1280,7 +1283,9 @@ static int get_next_submodule(struct child_process *cp,
 			}
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(submodule, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
@@ -1318,9 +1323,46 @@ static int get_next_submodule(struct child_process *cp,
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
+			if (submodule != &default_submodule)
+				/* discard const-ness: */
+				*task_cb = (void*)submodule;
 			return 1;
 		}
 	}
+
+retry_next:
+	retry_it = string_list_pop(&spf->retry);
+	if (retry_it) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		const struct submodule *sub =
+				submodule_from_name(spf->r,
+						    &null_oid,
+						    retry_it->string);
+
+		child_process_init(cp);
+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
+		if (!cp->dir)
+			goto retry_next;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(retry_it->util,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = NULL; /* make sure we do not recurse forever */
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1334,14 +1376,49 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type == OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct submodule *sub = task_cb;
+	struct repository subrepo;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!sub)
+		return 0;
+
+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
+			  sub->path, spf->r->worktree);
+	else {
+		struct string_list_item *it;
+		struct oid_array *commits;
+
+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+		if (!it)
+			return 0;
+
+		commits = it->util;
+		oid_array_remove_if(commits,
+				    commit_exists_in_sub,
+				    &subrepo);
+
+		if (commits->nr)
+			string_list_append(&spf->retry, sub->name)
+				->util = commits;
+	}
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 4437cb17698..ade4bf6d0bc 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -606,4 +606,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-08 22:17 ` [PATCH 02/10] string-list.h: add string_list_pop function Stefan Beller
@ 2018-08-09  7:35   ` Martin Ågren
  2018-08-09 21:29     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2018-08-09  7:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, hvoigt

On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> A string list can be used as a stack, but should we? A later patch shows
> how useful this will be.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  string-list.c | 8 ++++++++
>  string-list.h | 6 ++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/string-list.c b/string-list.c
> index 9f651bb4294..ea80afc8a0c 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -80,6 +80,14 @@ void string_list_remove(struct string_list *list, const char *string,
>         }
>  }
>
> +struct string_list_item *string_list_pop(struct string_list *list)
> +{
> +       if (list->nr == 0)
> +               return 0;

return NULL, not 0.

> +
> +       return &list->items[--list->nr];
> +}
> +

> +/**
> + * Returns the last item inserted and removes it from the list.
> + * If the list is empty, return NULL.
> + */
> +struct string_list_item *string_list_pop(struct string_list *list);
> +

The memory ownership is now with the caller. That is, the caller needs
to check/know `list->strdup_strings` and know `free_util` to be able to
properly free all memory.

OTOH, the pointer returned by this function is only guaranteed to be
valid until you start inserting into the list (well, you can do one
insertion per pop without worrying, but that's quite detailed
implementation knowledge).

Maybe these caveats should be documented, or is a hint that this
`_pop()` is not so nice to have after all, but let's see what happens in
the later patches...

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-08 22:17 ` [PATCH 03/10] sha1-array: provide oid_array_remove_if Stefan Beller
@ 2018-08-09  7:39   ` Martin Ågren
  2018-08-09 17:25     ` Stefan Beller
  2018-08-09 21:44   ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2018-08-09  7:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, hvoigt

On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> +int oid_array_remove_if(struct oid_array *array,
> +                       for_each_oid_fn fn,
> +                       void *data)
> +{
> +       int i, j;
> +       char *to_remove = xcalloc(array->nr, sizeof(char));

Do you really need this scratch space? Let's see..

> +       /* No oid_array_sort() here! See the api-oid-array.txt docs! */
> +
> +       for (i = 0; i < array->nr; i++) {
> +               int ret = fn(array->oid + i, data);
> +               if (ret)
> +                       to_remove[i] = 1;
> +       }
> +
> +       i = 0, j = 0;
> +       while (i < array->nr && j < array->nr) {
> +               while (i < array->nr && !to_remove[i])
> +                       i++;
> +               /* i at first marked for deletion or out */
> +               if (j < i)
> +                       j = i;
> +               while (j < array->nr && to_remove[j])
> +                       j++;
> +               /* j > i; j at first valid after first deletion range or out */
> +               if (i < array->nr && j < array->nr)
> +                       oidcpy(&array->oid[i], &array->oid[j]);
> +               else if (i >= array->nr)
> +                       assert(j >= array->nr);
> +                       /* no pruning happened, keep original array->nr */
> +               else if (j >= array->nr)
> +                       array->nr = i;
> +       }
> +
> +       free(to_remove);
> +
> +       return 0;
> +}

I can't entirely follow this index-fiddling, but then I haven't had my
morning coffee yet, so please forgive me if this is nonsense. Would it
suffice to let i point out where to place items (starting at the first
item not to keep) and j where to take them from (i.e., the items to
keep, after the initial run)?

> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..151c7ad7f30 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
>  int oid_array_for_each_unique(struct oid_array *array,
>                               for_each_oid_fn fn,
>                               void *data);
> +int oid_array_remove_if(struct oid_array *array,
> +                       for_each_oid_fn fn,
> +                       void *data);

Maybe some documentation here, but this seems to be following suit. ;-)

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched
  2018-08-08 22:17 ` [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
@ 2018-08-09  7:50   ` Martin Ågren
  2018-08-09 17:42     ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2018-08-09  7:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, hvoigt

On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> Currently when git-fetch is asked to recurse into submodules, it dispatches
> a plain "git-fetch -C <submodule-dir>" (and some submodule related options
> such as prefix and recusing strategy, but) without any information of the
> remote or the tip that should be fetched.
>
> This works surprisingly well in some workflows (such as using submodules
> as a third party library), while not so well in other scenarios, such
> as in a Gerrit topic-based workflow, that can tie together changes
> (potentially across repositories) on the server side. One of the parts
> of such a Gerrit workflow is to download a change when wanting to examine
> it, and you'd want to have its submodule changes that are in the same
> topic downloaded as well. However these submodule changes reside in their
> own repository in their on ref (refs/changes/<int>).

s/on/own/

> Retry fetching a submodule if the object id that the superproject points
> to, cannot be found.
>
> Note: This is an RFC and doesn't support fetching to FETCH_HEAD yet, but
> only into a local branch. To make fetching into FETCH_HEAD work, we need
> some refactoring in builtin/fetch.c to adjust the calls to
> 'check_for_new_submodule_commits'.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

> diff --git a/submodule.c b/submodule.c
> index ec7ea6f8c2d..6cbd0b1a470 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1127,6 +1127,7 @@ struct submodule_parallel_fetch {
>         int result;
>
>         struct string_list changed_submodule_names;
> +       struct string_list retry;
>  };
>  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }

`retry` will effectively be `STRING_LIST_INIT_NODUP`, but making that
explicit would be better and the next addition to the struct would be
easier to get right.

> +retry_next:
> +       retry_it = string_list_pop(&spf->retry);
> +       if (retry_it) {
> +               struct strbuf submodule_prefix = STRBUF_INIT;
> +               const struct submodule *sub =
> +                               submodule_from_name(spf->r,
> +                                                   &null_oid,
> +                                                   retry_it->string);
> +
> +               child_process_init(cp);
> +               cp->dir = get_submodule_git_dir(spf->r, sub->path);
> +               if (!cp->dir)
> +                       goto retry_next;

So here you just drop the string list item. Since it's NODUP, and since
the `util` pointers are owned elsewhere(?), this seems fine. Other uses
of `string_list_pop()` might not be so straightforward.

Just a thought, but rather than pop+if+goto, maybe

while ((retry_it = )) {
        ...
        if (!cp->dir) continue;
        ...
        return 1;
}

I haven't commented on any of the submodule stuff, which is probably
where you'd be most interested in comments. I don't use submodules, nor
do I know the code that runs them.. I guess my comments are more "if
those who know something about submodules find this series worthwhile,
you might want to consider my comments as well".

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-09  7:39   ` Martin Ågren
@ 2018-08-09 17:25     ` Stefan Beller
  2018-08-09 19:24       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-09 17:25 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Heiko Voigt

On Thu, Aug 9, 2018 at 12:39 AM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> > +int oid_array_remove_if(struct oid_array *array,
> > +                       for_each_oid_fn fn,
> > +                       void *data)
> > +{
> > +       int i, j;
> > +       char *to_remove = xcalloc(array->nr, sizeof(char));
>
> Do you really need this scratch space?

I don't think so, when we reorder the items while iterating over them.

I though reordering them later would be easier, but I am not sure anymore.

>
> > +       /* No oid_array_sort() here! See the api-oid-array.txt docs! */
> > +
> > +       for (i = 0; i < array->nr; i++) {
> > +               int ret = fn(array->oid + i, data);
> > +               if (ret)
> > +                       to_remove[i] = 1;
> > +       }
> > +
> > +       i = 0, j = 0;
> > +       while (i < array->nr && j < array->nr) {
> > +               while (i < array->nr && !to_remove[i])
> > +                       i++;
> > +               /* i at first marked for deletion or out */
> > +               if (j < i)
> > +                       j = i;
> > +               while (j < array->nr && to_remove[j])
> > +                       j++;
> > +               /* j > i; j at first valid after first deletion range or out */
> > +               if (i < array->nr && j < array->nr)
> > +                       oidcpy(&array->oid[i], &array->oid[j]);
> > +               else if (i >= array->nr)
> > +                       assert(j >= array->nr);
> > +                       /* no pruning happened, keep original array->nr */
> > +               else if (j >= array->nr)
> > +                       array->nr = i;
> > +       }
> > +
> > +       free(to_remove);
> > +
> > +       return 0;
> > +}
>
> I can't entirely follow this index-fiddling, but then I haven't had my
> morning coffee yet, so please forgive me if this is nonsense. Would it
> suffice to let i point out where to place items (starting at the first
> item not to keep) and j where to take them from (i.e., the items to
> keep, after the initial run)?

I thought this is what happens, just after the actual loop of calls.

> > +int oid_array_remove_if(struct oid_array *array,
> > +                       for_each_oid_fn fn,
> > +                       void *data);
>
> Maybe some documentation here, but this seems to be following suit. ;-)

Worth mentioning: the order is kept stable. (c.f. shrink_potential_moved_blocks
in diff.c which also "compacts an array", but without stable order).

Thanks for the review. I'll try to rewrite this to be more legible.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched
  2018-08-09  7:50   ` Martin Ågren
@ 2018-08-09 17:42     ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-09 17:42 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Heiko Voigt

On Thu, Aug 9, 2018 at 12:50 AM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> > Currently when git-fetch is asked to recurse into submodules, it dispatches
> > a plain "git-fetch -C <submodule-dir>" (and some submodule related options
> > such as prefix and recusing strategy, but) without any information of the
> > remote or the tip that should be fetched.
> >
> > This works surprisingly well in some workflows (such as using submodules
> > as a third party library), while not so well in other scenarios, such
> > as in a Gerrit topic-based workflow, that can tie together changes
> > (potentially across repositories) on the server side. One of the parts
> > of such a Gerrit workflow is to download a change when wanting to examine
> > it, and you'd want to have its submodule changes that are in the same
> > topic downloaded as well. However these submodule changes reside in their
> > own repository in their on ref (refs/changes/<int>).
>
> s/on/own/
>
> > Retry fetching a submodule if the object id that the superproject points
> > to, cannot be found.
> >
> > Note: This is an RFC and doesn't support fetching to FETCH_HEAD yet, but
> > only into a local branch. To make fetching into FETCH_HEAD work, we need
> > some refactoring in builtin/fetch.c to adjust the calls to
> > 'check_for_new_submodule_commits'.
> >
> > Signed-off-by: Stefan Beller <sbeller@google.com>
> > ---
>
> > diff --git a/submodule.c b/submodule.c
> > index ec7ea6f8c2d..6cbd0b1a470 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -1127,6 +1127,7 @@ struct submodule_parallel_fetch {
> >         int result;
> >
> >         struct string_list changed_submodule_names;
> > +       struct string_list retry;
> >  };
> >  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
>
> `retry` will effectively be `STRING_LIST_INIT_NODUP`, but making that
> explicit would be better and the next addition to the struct would be
> easier to get right.
>
> > +retry_next:
> > +       retry_it = string_list_pop(&spf->retry);
> > +       if (retry_it) {
> > +               struct strbuf submodule_prefix = STRBUF_INIT;
> > +               const struct submodule *sub =
> > +                               submodule_from_name(spf->r,
> > +                                                   &null_oid,
> > +                                                   retry_it->string);
> > +
> > +               child_process_init(cp);
> > +               cp->dir = get_submodule_git_dir(spf->r, sub->path);
> > +               if (!cp->dir)
> > +                       goto retry_next;
>
> So here you just drop the string list item. Since it's NODUP, and since
> the `util` pointers are owned elsewhere(?), this seems fine. Other uses
> of `string_list_pop()` might not be so straightforward.
>
> Just a thought, but rather than pop+if+goto, maybe
>
> while ((retry_it = )) {
>         ...
>         if (!cp->dir) continue;
>         ...
>         return 1;
> }

I really want to keep the retry list short and pruned, as this
function is called O(n) times with n the number of submodules
and the retry list will also be up to n.
And with that we'd run O(n^2) times into "if (!..) continue;".

When we use the 'pop-no-work items' logic, then we're still in O(n).

> I haven't commented on any of the submodule stuff, which is probably
> where you'd be most interested in comments. I don't use submodules, nor
> do I know the code that runs them.. I guess my comments are more "if
> those who know something about submodules find this series worthwhile,
> you might want to consider my comments as well".

Thanks for your comments! I'll try to think of another way to
represent this more easily in code.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-09 17:25     ` Stefan Beller
@ 2018-08-09 19:24       ` Jeff King
  2018-08-09 21:46         ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2018-08-09 19:24 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Martin Ågren, git, Heiko Voigt

On Thu, Aug 09, 2018 at 10:25:52AM -0700, Stefan Beller wrote:

> On Thu, Aug 9, 2018 at 12:39 AM Martin Ågren <martin.agren@gmail.com> wrote:
> >
> > On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
> > > +int oid_array_remove_if(struct oid_array *array,
> > > +                       for_each_oid_fn fn,
> > > +                       void *data)
> > > +{
> > > +       int i, j;
> > > +       char *to_remove = xcalloc(array->nr, sizeof(char));
> >
> > Do you really need this scratch space?
> 
> I don't think so, when we reorder the items while iterating over them.

Even with keeping the order this can be done in a single linear pass.
See filter_string_list() for an example.

The one twist here is that you cannot:

  oidcpy(array->oid[i], array->oid[j]);

when i==j, because of memcpy restrictions. With the current
implementation it would suffice to use struct assignment (and really,
every oidcpy() could just be struct assignment these days). But you
could also just do:

  if (i != j)
    oidcpy(array->oid[i], array->oid[j]);

> > I can't entirely follow this index-fiddling, but then I haven't had my
> > morning coffee yet, so please forgive me if this is nonsense. Would it
> > suffice to let i point out where to place items (starting at the first
> > item not to keep) and j where to take them from (i.e., the items to
> > keep, after the initial run)?
> 
> I thought this is what happens, just after the actual loop of calls.

I think the point is that we can just maintain those meanings during the
single walk through the array. The result is simpler to read and more
efficient.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] string_list: print_string_list to use trace_printf
  2018-08-08 22:17 ` [PATCH 01/10] string_list: print_string_list to use trace_printf Stefan Beller
@ 2018-08-09 21:16   ` Junio C Hamano
  2018-08-09 21:40     ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2018-08-09 21:16 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, hvoigt

Stefan Beller <sbeller@google.com> writes:

> It is a debugging aid, so it should print to the debugging channel.

Who says?  In-tree code may say so, and I do not think any in-flight
topic up to 'pu' uses this to produce non-debugging output, but I do
not think it is healthy attitude to assume that you can take over an
existing function and change what it does unilaterally.

As there is no in-tree or in-flight user, I think it makes sense if
the proposed change were to rename it to trace_string_list().  If
there weren't any print_string_list() and we were adding a debugging
aid to use trace_printf() to dump these, that would be the name we
would use anyway.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  string-list.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/string-list.c b/string-list.c
> index 771c4550980..9f651bb4294 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -200,9 +200,9 @@ void print_string_list(const struct string_list *p, const char *text)
>  {
>  	int i;
>  	if ( text )
> -		printf("%s\n", text);
> +		trace_printf("%s\n", text);
>  	for (i = 0; i < p->nr; i++)
> -		printf("%s:%p\n", p->items[i].string, p->items[i].util);
> +		trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
>  }
>  
>  struct string_list_item *string_list_append_nodup(struct string_list *list,

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-09  7:35   ` Martin Ågren
@ 2018-08-09 21:29     ` Junio C Hamano
  2018-08-09 21:41       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2018-08-09 21:29 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Stefan Beller, Git Mailing List, hvoigt

Martin Ågren <martin.agren@gmail.com> writes:

> On 9 August 2018 at 00:17, Stefan Beller <sbeller@google.com> wrote:
>> A string list can be used as a stack, but should we? A later patch shows
>> how useful this will be.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  string-list.c | 8 ++++++++
>>  string-list.h | 6 ++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/string-list.c b/string-list.c
>> index 9f651bb4294..ea80afc8a0c 100644
>> --- a/string-list.c
>> +++ b/string-list.c
>> @@ -80,6 +80,14 @@ void string_list_remove(struct string_list *list, const char *string,
>>         }
>>  }
>>
>> +struct string_list_item *string_list_pop(struct string_list *list)
>> +{
>> +       if (list->nr == 0)
>> +               return 0;
>
> return NULL, not 0.

It is OK to return NULL, which may make the caller a bit simpler,
i.e.

	while (item = list_pop(list))
		work_on(item);

but if we consider it a good discipline for the caller to see if
there are still things on the stack before attempting to pop, then
instead of returning NULL, we can have BUG("") there, which requires
the caller to be more like this:

	while (list->nr)
		work_on(list_pop(list));

which is not so bad.

>> +/**
>> + * Returns the last item inserted and removes it from the list.
>> + * If the list is empty, return NULL.
>> + */
>> +struct string_list_item *string_list_pop(struct string_list *list);
>> +
>
> The memory ownership is now with the caller. That is, the caller needs
> to check/know `list->strdup_strings` and know `free_util` to be able to
> properly free all memory.

> OTOH, the pointer returned by this function is only guaranteed to be
> valid until you start inserting into the list (well, you can do one
> insertion per pop without worrying, but that's quite detailed
> implementation knowledge).

Yes, it is a more grave limitation when using string_list as a
stack.  A single realloc() and you are dead X-<.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] string_list: print_string_list to use trace_printf
  2018-08-09 21:16   ` Junio C Hamano
@ 2018-08-09 21:40     ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-09 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Heiko Voigt

On Thu, Aug 9, 2018 at 2:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > It is a debugging aid, so it should print to the debugging channel.
>
> Who says?

The comment in the header file?

  /**
   * Dump a string_list to stdout, useful mainly for debugging
   * purposes. It can take an optional header argument and it writes out
   * the string-pointer pairs of the string_list, each one in its own
   * line.
   */

>  In-tree code may say so, and I do not think any in-flight
> topic up to 'pu' uses this to produce non-debugging output, but I do
> not think it is healthy attitude to assume that you can take over an
> existing function and change what it does unilaterally.

That is reasonable, as usual, and given the recent fallout of the
object store refactoring I have these concerns on my mind, too.

But for this instance, I do not see any risk of accidental collisions with
other series in flight:

$ git log -S print_string_list origin/pu --oneline
4f665f2cf33 string-list.h: move documentation from Documentation/api/
into header
d10cb3dfab9 help.c: rename function "pretty_print_string_list"
c455c87c5cd Rename path_list to string_list
1eb056905a2 include $PATH in generating list of commands for "help -a"
70827b15bfb Split up builtin commands into separate files from git.c
27dedf0c3b7 (tag: v1.0rc3, tag: v0.99.9j) GIT 0.99.9j aka 1.0rc3
8e49d503882 C implementation of the 'git' program, take two.

> As there is no in-tree or in-flight user, I think it makes sense if
> the proposed change were to rename it to trace_string_list().  If
> there weren't any print_string_list() and we were adding a debugging
> aid to use trace_printf() to dump these, that would be the name we
> would use anyway.

Good call. I'll rename and send separately (or might even drop this
patch completely; this patch is not used in the patches to follow;
it is more a "FYI: here is a thing that I found helpful for debugging,
unlike the existing function, which I found unhelpful and actively
harmful")

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-09 21:29     ` Junio C Hamano
@ 2018-08-09 21:41       ` Jeff King
  2018-08-09 21:52         ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2018-08-09 21:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Ågren, Stefan Beller, Git Mailing List, hvoigt

On Thu, Aug 09, 2018 at 02:29:32PM -0700, Junio C Hamano wrote:

> >> +struct string_list_item *string_list_pop(struct string_list *list)
> >> +{
> >> +       if (list->nr == 0)
> >> +               return 0;
> >
> > return NULL, not 0.
> 
> It is OK to return NULL, which may make the caller a bit simpler,
> i.e.
> 
> 	while (item = list_pop(list))
> 		work_on(item);
> 
> but if we consider it a good discipline for the caller to see if
> there are still things on the stack before attempting to pop, then
> instead of returning NULL, we can have BUG("") there, which requires
> the caller to be more like this:
> 
> 	while (list->nr)
> 		work_on(list_pop(list));
> 
> which is not so bad.

In many cases you can just do:

  while (list->nr) {
	work_on(list->items[list->nr - 1]);
	list_remove(list, list->nr - 1);
  }

and then all of those memory ownership issues like:

> > The memory ownership is now with the caller. That is, the caller needs
> > to check/know `list->strdup_strings` and know `free_util` to be able to
> > properly free all memory.
> 
> > OTOH, the pointer returned by this function is only guaranteed to be
> > valid until you start inserting into the list (well, you can do one
> > insertion per pop without worrying, but that's quite detailed
> > implementation knowledge).
> 
> Yes, it is a more grave limitation when using string_list as a
> stack.  A single realloc() and you are dead X-<.

just go away. :)

Where that falls down is if you really need work_on() to put more items
on the stack, but only after you've removed the current top. But then
writing it out may still be nicer, because it makes it clear you have to
do:

  const char *cur_string = xstrdup(list->items[list->nr-1].string);

if you want the data to live past the removal.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-08 22:17 ` [PATCH 03/10] sha1-array: provide oid_array_remove_if Stefan Beller
  2018-08-09  7:39   ` Martin Ågren
@ 2018-08-09 21:44   ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2018-08-09 21:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, hvoigt

Stefan Beller <sbeller@google.com> writes:

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sha1-array.c | 39 +++++++++++++++++++++++++++++++++++++++
>  sha1-array.h |  3 +++
>  2 files changed, 42 insertions(+)
>
> diff --git a/sha1-array.c b/sha1-array.c
> index 265941fbf40..10eb08b425e 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,42 @@ int oid_array_for_each_unique(struct oid_array *array,
>  	}
>  	return 0;
>  }
> +
> +int oid_array_remove_if(struct oid_array *array,
> +			for_each_oid_fn fn,
> +			void *data)
> +{
> +	int i, j;
> +	char *to_remove = xcalloc(array->nr, sizeof(char));
> +
> +	/* No oid_array_sort() here! See the api-oid-array.txt docs! */
> +
> +	for (i = 0; i < array->nr; i++) {
> +		int ret = fn(array->oid + i, data);
> +		if (ret)
> +			to_remove[i] = 1;
> +	}

Doing the same without this secondary array and loop, i.e.

	for (src = dst = 0; src < array->nr; src++) {
		if (!fn(&array->oid[src], cbdata)) {
			if (dst < src)
				oidcpy(&array->oid[dst], &array->oid[src]);
			dst++;
		}
	}
        array->nr = dst;

would be no less efficient.  The only reason why you might want to
measure move-span by a secondary array and preliminary counting loop
like your version does is that moving contiguous area of memory may
be more efficient than moving only by a single oid sized chunks, but
as far as I can tell you are not doing that "optimization", either.

I doubt that remove_if() is particularly a good name.  A version of
this function, for which polarity of fn() is reversed, can be called
"grep", or "filter", I think, and would be more understandable.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] sha1-array: provide oid_array_remove_if
  2018-08-09 19:24       ` Jeff King
@ 2018-08-09 21:46         ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2018-08-09 21:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Martin Ågren, git, Heiko Voigt

Jeff King <peff@peff.net> writes:

> Even with keeping the order this can be done in a single linear pass.
> See filter_string_list() for an example.

Heh, I just wasted a few minutes saying the same; I should have
pointed him at these two lines ;-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-09 21:41       ` Jeff King
@ 2018-08-09 21:52         ` Stefan Beller
  2018-08-09 21:56           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-08-09 21:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Martin Ågren, git, Heiko Voigt

On Thu, Aug 9, 2018 at 2:41 PM Jeff King <peff@peff.net> wrote:

> >
> >       while (list->nr)
> >               work_on(list_pop(list));
> >
> > which is not so bad.
>
> In many cases you can just do:
>
>   while (list->nr) {
>         work_on(list->items[list->nr - 1]);
>         list_remove(list, list->nr - 1);
>   }
>
> and then all of those memory ownership issues like:

[...]
>
> just go away. :)

The only complication here is the lack of list_remove(index),
we do have list_remove(string), which internally searches the
item and removes it. Hence I did not want to use it.

Another idea I had was to keep the list immutable (except amending,
just like a constitution ;-) and store an index of how far we got in that
list already. That wastes memory for keeping entries around, but is safe
for memory due to its nature.

> Where that falls down is if you really need work_on() to put more items
> on the stack, but only after you've removed the current top. But then
> writing it out may still be nicer, because it makes it clear you have to
> do:
>
>   const char *cur_string = xstrdup(list->items[list->nr-1].string);

Another way would be to use

  string_list_pop(&list, &string_dst, &util_dst);
i.e.
  /* Returns 0 if the dst was filled */
  int (struct string_list *, char **, void**)

as then we do not expose the internals and would not have issues
with reallocs.

> if you want the data to live past the removal.

In the code proposed there are no additions (hence no reallocs)
and the need for the data is short lived.

But I can see how the design was just fitting my purpose and
we could come up with some better API.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-09 21:52         ` Stefan Beller
@ 2018-08-09 21:56           ` Jeff King
  2018-08-09 22:10             ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2018-08-09 21:56 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Martin Ågren, git, Heiko Voigt

On Thu, Aug 09, 2018 at 02:52:29PM -0700, Stefan Beller wrote:

> > In many cases you can just do:
> >
> >   while (list->nr) {
> >         work_on(list->items[list->nr - 1]);
> >         list_remove(list, list->nr - 1);
> >   }
> >
> > and then all of those memory ownership issues like:
> 
> [...]
> >
> > just go away. :)
> 
> The only complication here is the lack of list_remove(index),
> we do have list_remove(string), which internally searches the
> item and removes it. Hence I did not want to use it.

Heh, I almost dug into that more.

I think you could have helpers to spell the two lines above even more
nicely:

  while (list->nr) {
        work_on(list_top(list));
	list_pop(list); /* note this doesn't return anything! */
  }

But yes, it's not possible with the current functions.

> Another idea I had was to keep the list immutable (except amending,
> just like a constitution ;-) and store an index of how far we got in that
> list already. That wastes memory for keeping entries around, but is safe
> for memory due to its nature.

You can also use a list.h linked-list. Then removal from the list and
freeing are two separate operations (but it exercises your malloc a lot
more if you're constantly pushing and popping).

> > Where that falls down is if you really need work_on() to put more items
> > on the stack, but only after you've removed the current top. But then
> > writing it out may still be nicer, because it makes it clear you have to
> > do:
> >
> >   const char *cur_string = xstrdup(list->items[list->nr-1].string);
> 
> Another way would be to use
> 
>   string_list_pop(&list, &string_dst, &util_dst);
> i.e.
>   /* Returns 0 if the dst was filled */
>   int (struct string_list *, char **, void**)
> 
> as then we do not expose the internals and would not have issues
> with reallocs.

Yes, I almost suggested that, but there's the question of memory
ownership of string_dst. Does it need freed or not? Is that answer
dependent on the strdup_strings flag?

> > if you want the data to live past the removal.
> 
> In the code proposed there are no additions (hence no reallocs)
> and the need for the data is short lived.
> 
> But I can see how the design was just fitting my purpose and
> we could come up with some better API.

Yeah, I didn't actually dig into your use case. I just want to make sure
we don't add a crappy function to our API. ;)

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id
  2018-08-08 22:17 ` [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
@ 2018-08-09 22:00   ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2018-08-09 22:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, hvoigt

Stefan Beller <sbeller@google.com> writes:

> All callers use oid_to_hex to convert the desired oid to a string before
> calling submodule_move_head. Defer the conversion to the
> submodule_move_head as it will turn out to be useful in a bit.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  entry.c        |  6 +++---
>  submodule.c    | 12 ++++++------
>  submodule.h    |  2 +-
>  unpack-trees.c | 13 +++++--------
>  4 files changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/entry.c b/entry.c
> index b5d1d3cf231..4b34dfd30df 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -358,7 +358,7 @@ static int write_entry(struct cache_entry *ce,
>  		sub = submodule_from_ce(ce);
>  		if (sub)
>  			return submodule_move_head(ce->name,
> -				NULL, oid_to_hex(&ce->oid),
> +				NULL, &ce->oid,
>  				state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
>  		break;

Nice.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] string-list.h: add string_list_pop function.
  2018-08-09 21:56           ` Jeff King
@ 2018-08-09 22:10             ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-08-09 22:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Martin Ågren, git, Heiko Voigt

On Thu, Aug 9, 2018 at 2:56 PM Jeff King <peff@peff.net> wrote:

>
> I think you could have helpers to spell the two lines above even more
> nicely:
>
>   while (list->nr) {
>         work_on(list_top(list));
>         list_pop(list); /* note this doesn't return anything! */
>   }
>
> But yes, it's not possible with the current functions.

I like this one most, and as we manage our own memory via the
alloc variable we also would not see a lot of memory churn for constant
push/pop traffic.

> You can also use a list.h linked-list. Then removal from the list and
> freeing are two separate operations (but it exercises your malloc a lot
> more if you're constantly pushing and popping).

For that I'd have to define my own type derived from list.h to carry
the string and the util pointer, which looks very similar to the string_list
we already have from a users POV.

> > > Where that falls down is if you really need work_on() to put more items
> > > on the stack, but only after you've removed the current top. But then
> > > writing it out may still be nicer, because it makes it clear you have to
> > > do:
> > >
> > >   const char *cur_string = xstrdup(list->items[list->nr-1].string);
> >
> > Another way would be to use
> >
> >   string_list_pop(&list, &string_dst, &util_dst);
> > i.e.
> >   /* Returns 0 if the dst was filled */
> >   int (struct string_list *, char **, void**)
> >
> > as then we do not expose the internals and would not have issues
> > with reallocs.
>
> Yes, I almost suggested that, but there's the question of memory
> ownership of string_dst. Does it need freed or not? Is that answer
> dependent on the strdup_strings flag?

Sure. But as the caller, you should know?
You constructed that string_list.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-08-09 22:10 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-08 22:17 [RFC PATCH 00/10] fetch: make sure submodule oids are fetched Stefan Beller
2018-08-08 22:17 ` [PATCH 01/10] string_list: print_string_list to use trace_printf Stefan Beller
2018-08-09 21:16   ` Junio C Hamano
2018-08-09 21:40     ` Stefan Beller
2018-08-08 22:17 ` [PATCH 02/10] string-list.h: add string_list_pop function Stefan Beller
2018-08-09  7:35   ` Martin Ågren
2018-08-09 21:29     ` Junio C Hamano
2018-08-09 21:41       ` Jeff King
2018-08-09 21:52         ` Stefan Beller
2018-08-09 21:56           ` Jeff King
2018-08-09 22:10             ` Stefan Beller
2018-08-08 22:17 ` [PATCH 03/10] sha1-array: provide oid_array_remove_if Stefan Beller
2018-08-09  7:39   ` Martin Ågren
2018-08-09 17:25     ` Stefan Beller
2018-08-09 19:24       ` Jeff King
2018-08-09 21:46         ` Junio C Hamano
2018-08-09 21:44   ` Junio C Hamano
2018-08-08 22:17 ` [PATCH 04/10] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
2018-08-09 22:00   ` Junio C Hamano
2018-08-08 22:17 ` [PATCH 05/10] submodule.c: fix indentation Stefan Beller
2018-08-08 22:17 ` [PATCH 06/10] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-08-08 22:17 ` [PATCH 07/10] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-08-08 22:17 ` [PATCH 08/10] submodule.c: do not copy around submodule list Stefan Beller
2018-08-08 22:17 ` [PATCH 09/10] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-08-08 22:17 ` [PATCH 10/10] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-08-09  7:50   ` Martin Ågren
2018-08-09 17:42     ` Stefan Beller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).