git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] fetch: make sure submodule oids are fetched
@ 2018-09-04 23:01 Stefan Beller
  2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
                   ` (10 more replies)
  0 siblings, 11 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This is a resend of [1] and was rebased to origin/master and all review comments
have been addressed.

Thanks,
Stefan

[1] https://public-inbox.org/git/20180808221752.195419-1-sbeller@google.com/

Stefan Beller (11):
  string_list: print_string_list to use trace_printf
  string-list.h: add string_list_{pop, last} functions
  sha1-array: provide oid_array_filter
  submodule.c: convert submodule_move_head new argument to object id
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if sha1 were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 builtin/fetch.c             |  14 +--
 entry.c                     |   6 +-
 sha1-array.c                |  18 ++++
 sha1-array.h                |   5 +
 string-list.c               |  20 +++-
 string-list.h               |  15 ++-
 submodule.c                 | 200 ++++++++++++++++++++++++++++--------
 submodule.h                 |   2 +-
 t/t5526-fetch-submodules.sh |  23 ++++-
 unpack-trees.c              |  13 +--
 10 files changed, 246 insertions(+), 70 deletions(-)

-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 16:52   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

It is a debugging aid, so it should print to the debugging channel.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 6 +++---
 string-list.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/string-list.c b/string-list.c
index 771c4550980..1ebbe1f56ea 100644
--- a/string-list.c
+++ b/string-list.c
@@ -196,13 +196,13 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 }
 
 
-void print_string_list(const struct string_list *p, const char *text)
+void trace_print_string_list(const struct string_list *p, const char *text)
 {
 	int i;
 	if ( text )
-		printf("%s\n", text);
+		trace_printf("%s\n", text);
 	for (i = 0; i < p->nr; i++)
-		printf("%s:%p\n", p->items[i].string, p->items[i].util);
+		trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
 }
 
 struct string_list_item *string_list_append_nodup(struct string_list *list,
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..5b22560cf19 100644
--- a/string-list.h
+++ b/string-list.h
@@ -114,12 +114,12 @@ void filter_string_list(struct string_list *list, int free_util,
 			string_list_each_func_t want, void *cb_data);
 
 /**
- * Dump a string_list to stdout, useful mainly for debugging
+ * Dump a string_list using the trace_print, useful mainly for debugging
  * purposes. It can take an optional header argument and it writes out
  * the string-pointer pairs of the string_list, each one in its own
  * line.
  */
-void print_string_list(const struct string_list *p, const char *text);
+void trace_print_string_list(const struct string_list *p, const char *text);
 
 /**
  * Free a string_list. The `string` pointer of the items will be freed
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/11] string-list.h: add string_list_{pop, last} functions
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 17:10   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

A string list can be used as a stack, but should we? A later patch shows
how useful this will be.

In an earlier iteration of this patch it was suggested to return the last
element or NULL (if empty), to enable a pattern of

  while ((item = string_list_pop(&list))
	work_on(item);

But it turns out that this pattern is error-prone as the memory model
of the returned item is unclear. The caller would need to free the
item->string if the string list is a duplicating list; and more importantly
if there are further calls to insert items into the list, a memory
reallocation might occur, invalidating the returned item at some time in
the future.

That is why the implementation doesn't return any item and the caller
needs to be worded as

  while (list.nr) {
    work_on(last(&list]));
    string_list_pop(&list, free_util);
  }

Also provide the function to access the last item in the list.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 14 ++++++++++++++
 string-list.h | 11 +++++++++++
 2 files changed, 25 insertions(+)

diff --git a/string-list.c b/string-list.c
index 1ebbe1f56ea..21559f222a7 100644
--- a/string-list.c
+++ b/string-list.c
@@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
 	}
 }
 
+void string_list_pop(struct string_list *list, int free_util)
+{
+	if (list->nr == 0)
+		BUG("tried to remove an item from empty string list");
+
+	if (list->strdup_strings)
+		free(list->items[list->nr - 1].string);
+
+	if (free_util)
+		free(list->items[list->nr - 1].util);
+
+	list->nr--;
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
 	int exact_match;
diff --git a/string-list.h b/string-list.h
index 5b22560cf19..d7cdf38e57a 100644
--- a/string-list.h
+++ b/string-list.h
@@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, const char *string,
  */
 struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
 
+/**
+ * Removes the last item from the list.
+ * The caller must ensure that the list is not empty.
+ */
+void string_list_pop(struct string_list *list, int free_util);
+
+inline struct string_list_item *string_list_last(struct string_list *list)
+{
+	return &list->items[list->nr - 1];
+}
+
 /*
  * Remove all but the first of consecutive entries with the same
  * string value.  If free_util is true, call free() on the util
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/11] sha1-array: provide oid_array_filter
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
  2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 17:20   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1-array.c | 18 ++++++++++++++++++
 sha1-array.h |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf40..7eada4d1811 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,21 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+int oid_array_filter(struct oid_array *array,
+		     for_each_oid_fn fn,
+		     void *cbdata)
+{
+	int src, dst;
+
+	for (src = dst = 0; src < array->nr; src++) {
+		if (!fn(&array->oid[src], cbdata)) {
+			if (dst < src)
+				oidcpy(&array->oid[dst], &array->oid[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+
+	return 0;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..b516ec4bffc 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -23,4 +23,9 @@ int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
 
+/* Call fn for each oid, and retain it if fn returns 0, remove it otherwise */
+int oid_array_filter(struct oid_array *array,
+		     for_each_oid_fn fn,
+		     void *cbdata);
+
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (2 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 17:31   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

All callers use oid_to_hex to convert the desired oid to a string before
calling submodule_move_head. Defer the conversion to the
submodule_move_head as it will turn out to be useful in a bit.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 entry.c        |  6 +++---
 submodule.c    | 12 ++++++------
 submodule.h    |  2 +-
 unpack-trees.c | 13 +++++--------
 4 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/entry.c b/entry.c
index 2a2ab6c8394..0b676f997b2 100644
--- a/entry.c
+++ b/entry.c
@@ -358,7 +358,7 @@ static int write_entry(struct cache_entry *ce,
 		sub = submodule_from_ce(ce);
 		if (sub)
 			return submodule_move_head(ce->name,
-				NULL, oid_to_hex(&ce->oid),
+				NULL, &ce->oid,
 				state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 		break;
 
@@ -439,10 +439,10 @@ int checkout_entry(struct cache_entry *ce,
 					unlink_or_warn(ce->name);
 
 				return submodule_move_head(ce->name,
-					NULL, oid_to_hex(&ce->oid), 0);
+					NULL, &ce->oid, 0);
 			} else
 				return submodule_move_head(ce->name,
-					"HEAD", oid_to_hex(&ce->oid),
+					"HEAD", &ce->oid,
 					state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 		}
 
diff --git a/submodule.c b/submodule.c
index 50cbf5f13ed..da2ed8696f5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1597,9 +1597,9 @@ static void submodule_reset_index(const char *path)
  * pass NULL for old or new respectively.
  */
 int submodule_move_head(const char *path,
-			 const char *old_head,
-			 const char *new_head,
-			 unsigned flags)
+			const char *old_head,
+			const struct object_id *new_oid,
+			unsigned flags)
 {
 	int ret = 0;
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -1679,7 +1679,7 @@ int submodule_move_head(const char *path,
 	if (!(flags & SUBMODULE_MOVE_HEAD_FORCE))
 		argv_array_push(&cp.args, old_head ? old_head : empty_tree_oid_hex());
 
-	argv_array_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());
+	argv_array_push(&cp.args, new_oid ? oid_to_hex(new_oid) : empty_tree_oid_hex());
 
 	if (run_command(&cp)) {
 		ret = error(_("Submodule '%s' could not be updated."), path);
@@ -1687,7 +1687,7 @@ int submodule_move_head(const char *path,
 	}
 
 	if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
-		if (new_head) {
+		if (new_oid) {
 			child_process_init(&cp);
 			/* also set the HEAD accordingly */
 			cp.git_cmd = 1;
@@ -1696,7 +1696,7 @@ int submodule_move_head(const char *path,
 
 			prepare_submodule_repo_env(&cp.env_array);
 			argv_array_pushl(&cp.args, "update-ref", "HEAD",
-					 "--no-deref", new_head, NULL);
+					 "--no-deref", oid_to_hex(new_oid), NULL);
 
 			if (run_command(&cp)) {
 				ret = -1;
diff --git a/submodule.h b/submodule.h
index 7d476cefa7e..ef34d5a7ca8 100644
--- a/submodule.h
+++ b/submodule.h
@@ -124,7 +124,7 @@ int submodule_to_gitdir(struct strbuf *buf, const char *submodule);
 #define SUBMODULE_MOVE_HEAD_FORCE   (1<<1)
 int submodule_move_head(const char *path,
 			const char *old,
-			const char *new_head,
+			const struct object_id *new_oid,
 			unsigned flags);
 
 void submodule_unset_core_worktree(const struct submodule *sub);
diff --git a/unpack-trees.c b/unpack-trees.c
index f25089b878a..75d1b294ade 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -256,7 +256,7 @@ static void display_error_msgs(struct unpack_trees_options *o)
 
 static int check_submodule_move_head(const struct cache_entry *ce,
 				     const char *old_id,
-				     const char *new_id,
+				     const struct object_id *new_id,
 				     struct unpack_trees_options *o)
 {
 	unsigned flags = SUBMODULE_MOVE_HEAD_DRY_RUN;
@@ -1517,7 +1517,7 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int r = check_submodule_move_head(ce,
-				"HEAD", oid_to_hex(&ce->oid), o);
+				"HEAD", &ce->oid, o);
 			if (r)
 				return o->gently ? -1 :
 					add_rejected_path(o, error_type, ce->name);
@@ -1591,8 +1591,7 @@ static int verify_clean_submodule(const char *old_sha1,
 	if (!submodule_from_ce(ce))
 		return 0;
 
-	return check_submodule_move_head(ce, old_sha1,
-					 oid_to_hex(&ce->oid), o);
+	return check_submodule_move_head(ce, old_sha1, &ce->oid, o);
 }
 
 static int verify_clean_subdirectory(const struct cache_entry *ce,
@@ -1836,8 +1835,7 @@ static int merged_entry(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int ret = check_submodule_move_head(ce, NULL,
-							    oid_to_hex(&ce->oid),
-							    o);
+							    &ce->oid, o);
 			if (ret)
 				return ret;
 		}
@@ -1865,8 +1863,7 @@ static int merged_entry(const struct cache_entry *ce,
 
 		if (submodule_from_ce(ce)) {
 			int ret = check_submodule_move_head(ce, oid_to_hex(&old->oid),
-							    oid_to_hex(&ce->oid),
-							    o);
+							    &ce->oid, o);
 			if (ret)
 				return ret;
 		}
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/11] submodule.c: fix indentation
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (3 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 17:34   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index da2ed8696f5..8345d423fda 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1254,8 +1255,9 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule || !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (4 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 18:03   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Instead of sorting it after we created an unsorted list, we could insert
correctly into the list.  As the unsorted append is in order of cache entry
names, this is already sorted if names were equal to paths for submodules.

As submodule names are often the same as their path, the input is sorted
pretty well already, so let's just do the sort afterwards.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 8345d423fda..8bee455137a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1255,7 +1255,7 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(
+			if (!submodule || !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1349,6 +1349,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (5 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 18:04   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8bee455137a..582c0263b91 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1256,7 +1260,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule || !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1348,8 +1352,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1358,7 +1362,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/11] submodule.c: do not copy around submodule list
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (6 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-06 18:20   ` Junio C Hamano
  2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

As a side effect, we'll have access to the util pointer for longer that
contains the commits that we need to fetch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 582c0263b91..0efe6711a8c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1362,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (7 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
  2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  10 siblings, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that git-fetch actually doesn't
need to be run in the worktree. So let's run it in the git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c                 | 43 ++++++++++++++++++++++++++-----------
 t/t5526-fetch-submodules.sh |  7 +++++-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0efe6711a8c..8b67f086895 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static const char *get_submodule_git_dir(struct repository *r, const char *path)
+{
+	struct repository subrepo;
+	const char *ret;
+
+	if (repo_submodule_init(&subrepo, r, path)) {
+		/* no entry in .gitmodules? */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
+		if (repo_init(&subrepo, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+	}
+
+	ret = xstrdup(subrepo.gitdir);
+	repo_clear(&subrepo);
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1273,16 +1298,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		git_dir = get_submodule_git_dir(spf->r, ce->name);
+		if (git_dir) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = git_dir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1294,8 +1315,6 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, submodule_prefix.buf);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (8 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  10 siblings, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule if the object id that the superproject points
to, cannot be found.

Note: This is an RFC and doesn't support fetching to FETCH_HEAD yet, but
only into a local branch. To make fetching into FETCH_HEAD work, we need
some refactoring in builtin/fetch.c to adjust the calls to
'check_for_new_submodule_commits'.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |  9 ++--
 submodule.c                 | 87 ++++++++++++++++++++++++++++++++++++-
 t/t5526-fetch-submodules.sh | 16 +++++++
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213d..95c44bf6ffa 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 8b67f086895..702cba138be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list retry;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  STRING_LIST_INIT_NODUP}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
 {
 	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
+	struct string_list_item *it;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
+		int recurse_config;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1280,7 +1285,9 @@ static int get_next_submodule(struct child_process *cp,
 			}
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(submodule, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
@@ -1318,9 +1325,50 @@ static int get_next_submodule(struct child_process *cp,
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
+			if (submodule != &default_submodule)
+				/* discard const-ness: */
+				*task_cb = (void*)submodule;
 			return 1;
 		}
 	}
+
+retry_next:
+
+	if (spf->retry.nr) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		const struct submodule *sub;
+
+		it = string_list_last(&spf->retry);
+		sub = submodule_from_name(spf->r, &null_oid,
+					  it->string);
+
+		child_process_init(cp);
+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
+		if (!cp->dir) {
+			string_list_pop(&spf->retry, 0);
+			goto retry_next;
+		}
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(it->util,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = NULL; /* make sure we do not recurse forever */
+		strbuf_release(&submodule_prefix);
+		string_list_pop(&spf->retry, 0);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1334,14 +1382,49 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type == OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct submodule *sub = task_cb;
+	struct repository subrepo;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!sub)
+		return 0;
+
+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
+			  sub->path, spf->r->worktree);
+	else {
+		struct string_list_item *it;
+		struct oid_array *commits;
+
+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+		if (!it)
+			return 0;
+
+		commits = it->util;
+		oid_array_filter(commits,
+				 commit_exists_in_sub,
+				 &subrepo);
+
+		if (commits->nr)
+			string_list_append(&spf->retry, sub->name)
+				->util = commits;
+	}
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1a..af12c50e7dd 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches
  2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (9 preceding siblings ...)
  2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
@ 2018-09-04 23:01 ` Stefan Beller
  10 siblings, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-04 23:01 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

For Gerrit users that use submodules the invocation of fetch without a
branch is their main use case.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ffa..ea6ecd123e7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7dd..a509eabb044 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
@ 2018-09-06 16:52   ` Junio C Hamano
  2018-09-06 16:56     ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 16:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> It is a debugging aid, so it should print to the debugging channel.

... and rename it with trace_ prefix.

Use of trace_printf() is nice, as we can control its behavior at
runtime ;-)

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  string-list.c | 6 +++---
>  string-list.h | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/string-list.c b/string-list.c
> index 771c4550980..1ebbe1f56ea 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -196,13 +196,13 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
>  }
>  
>  
> -void print_string_list(const struct string_list *p, const char *text)
> +void trace_print_string_list(const struct string_list *p, const char *text)
>  {
>  	int i;
>  	if ( text )
> -		printf("%s\n", text);
> +		trace_printf("%s\n", text);
>  	for (i = 0; i < p->nr; i++)
> -		printf("%s:%p\n", p->items[i].string, p->items[i].util);
> +		trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
>  }
>  
>  struct string_list_item *string_list_append_nodup(struct string_list *list,
> diff --git a/string-list.h b/string-list.h
> index ff8f6094a33..5b22560cf19 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -114,12 +114,12 @@ void filter_string_list(struct string_list *list, int free_util,
>  			string_list_each_func_t want, void *cb_data);
>  
>  /**
> - * Dump a string_list to stdout, useful mainly for debugging
> + * Dump a string_list using the trace_print, useful mainly for debugging
>   * purposes. It can take an optional header argument and it writes out
>   * the string-pointer pairs of the string_list, each one in its own
>   * line.
>   */
> -void print_string_list(const struct string_list *p, const char *text);
> +void trace_print_string_list(const struct string_list *p, const char *text);
>  
>  /**
>   * Free a string_list. The `string` pointer of the items will be freed

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-06 16:52   ` Junio C Hamano
@ 2018-09-06 16:56     ` Jeff King
  2018-09-06 22:16       ` Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-09-06 16:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git

On Thu, Sep 06, 2018 at 09:52:28AM -0700, Junio C Hamano wrote:

> Stefan Beller <sbeller@google.com> writes:
> 
> > It is a debugging aid, so it should print to the debugging channel.
> 
> ... and rename it with trace_ prefix.
> 
> Use of trace_printf() is nice, as we can control its behavior at
> runtime ;-)

Yes, though...

> > -void print_string_list(const struct string_list *p, const char *text)
> > +void trace_print_string_list(const struct string_list *p, const char *text)
> >  {
> >  	int i;
> >  	if ( text )
> > -		printf("%s\n", text);
> > +		trace_printf("%s\n", text);
> >  	for (i = 0; i < p->nr; i++)
> > -		printf("%s:%p\n", p->items[i].string, p->items[i].util);
> > +		trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
> >  }

It seems funny that we'd iterate through the list checking over and over
whether tracing is enabled.

Should this do:

  if (!trace_want(&trace_default_key))
	return;

at the top? (Or possibly even take a trace key from the caller, so that
it can use whatever context makes sense for this particular list?)

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] string-list.h: add string_list_{pop, last} functions
  2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-06 17:10   ` Junio C Hamano
  2018-09-06 22:29     ` Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 17:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> A string list can be used as a stack, but should we?

verb missing.  "We can use a string list as ..., but should we?" is
readable.  "A string list can be ..., but should it be?" also is.

> A later patch shows
> how useful this will be.

Instead of all of the above, how about being more direct, i.e. e.g.

	Add a few functions to allow a string-list to be used as a
	stack:

	- string_list_last() lets a caller peek the string_list_item
          at the end of the string list.  The caller needs to be
          aware that it is borrowing a pointer, which can become
          invalid if/when the string_list is resized.

	- string_list_pop() removes the string_list_item at the end
          of the string list.

	You can use them in this pattern:

		while (list.nr) {
			struct string_list_item *item = string_list_last(&list);

			work_on(item);
			string_list_pop(&list);
		}

Then the readers would immediately have enough information to judge
if the new API is useful without blindly trusting what you promise.

Two things to note in the above suggested rewrite are:

 - It is curious that _pop() is not accompanied by _push().  It
   probably is a good idea to add the third bullet point that says
   "existing string_list_append() can be used in place of the
   missing string_list_push()."

 - As it already lays out the memory ownership issue, it probably
   makes the description we see below unnecessary (besides, the code
   snippet in there were buggy in at least two counts).

> In an earlier iteration of this patch it was suggested to return the last
> ...
> Also provide the function to access the last item in the list.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  string-list.c | 14 ++++++++++++++
>  string-list.h | 11 +++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/string-list.c b/string-list.c
> index 1ebbe1f56ea..21559f222a7 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
>  	}
>  }
>  
> +void string_list_pop(struct string_list *list, int free_util)
> +{
> +	if (list->nr == 0)
> +		BUG("tried to remove an item from empty string list");
> +
> +	if (list->strdup_strings)
> +		free(list->items[list->nr - 1].string);
> +
> +	if (free_util)
> +		free(list->items[list->nr - 1].util);
> +
> +	list->nr--;
> +}

Conceptually, this allows string_list_clear() to be implemented in
terms of this function, i.e.

	string_list_clear(struct string_list *list, int free_util)
	{
		while (list->nr)
			string_list_pop(list, free_util);
		free(list->items);
		list->items = NULL;
		list->nr = list->alloc = 0;
	}

It is unfortunate that string_list has become such a kitchen-sink in
that a string list whose util field is used in such a way that
string_list_clear_func() needs to be used to clear the list cannot
be used as a stack.  Ideally each of these "features" (like
"sorted/unsorted", "owning/borrowing util", "owning/borrowing
string", etc.)  ought to be orthogonal, but they are not.

> diff --git a/string-list.h b/string-list.h
> index 5b22560cf19..d7cdf38e57a 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, const char *string,
>   */
>  struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
>  
> +/**
> + * Removes the last item from the list.
> + * The caller must ensure that the list is not empty.
> + */
> +void string_list_pop(struct string_list *list, int free_util);
> +
> +inline struct string_list_item *string_list_last(struct string_list *list)
> +{
> +	return &list->items[list->nr - 1];
> +}

inline functions declared in a header file must be marked "static
inline", I think.

>  /*
>   * Remove all but the first of consecutive entries with the same
>   * string value.  If free_util is true, call free() on the util

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] sha1-array: provide oid_array_filter
  2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-06 17:20   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 17:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> +/* Call fn for each oid, and retain it if fn returns 0, remove it otherwise */
> +int oid_array_filter(struct oid_array *array,
> +		     for_each_oid_fn fn,
> +		     void *cbdata);

Comparing this with object_array_filter(), which I think this was
modeled after, I notice that this is much harder to remember how to
use it correctly.

    void object_array_filter(struct object_array *array,
                             object_array_each_func_t want, void *cb_data)

It is clear to a potential user of this function who needs to
implement the callback function what the function should return,
because it calls it "want" (as opposed to "fn").  The function must
answer "I want to keep this?  Yes/no?".

Also polarity of oid_array_filter()'s callback is different.  It
retains elements for which "fn" returns false.

So one step of improvement may be to rename "fn" to "reject", but as
we do not have any caller of this new function yet, perhaps match
the polarity and call it "want"?

I think "x_filter" is perhaps trying to match other language's
filter(), e.g. Python's

	>>> filter(lambda x: x % 2 == 0, range(0,10))
	[0, 2, 4, 6, 8]

so "say yes if you want to keep it" may be more familiar than "say
yes if you want to filter it out".


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id
  2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
@ 2018-09-06 17:31   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 17:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> All callers use oid_to_hex to convert the desired oid to a string before
> calling submodule_move_head. Defer the conversion to the
> submodule_move_head as it will turn out to be useful in a bit.

I would think this is a good change even without "as it will turn
out..." which readers do not have enough information to judge for
themselves at this point.

> diff --git a/submodule.c b/submodule.c
> index 50cbf5f13ed..da2ed8696f5 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1597,9 +1597,9 @@ static void submodule_reset_index(const char *path)
>   * pass NULL for old or new respectively.
>   */
>  int submodule_move_head(const char *path,
> -			 const char *old_head,
> -			 const char *new_head,
> -			 unsigned flags)
> +			const char *old_head,
> +			const struct object_id *new_oid,
> +			unsigned flags)

The new calling convention feels somewhat uneven, though, that "new"
(does it mean "switching to this commit object"?) takes an object
id, but "old" ("switching from this commit object"?) still wants a
textual representation.

So, I no longer am sure that this is a good change by itself.  It
would be a good change by itself if we deferred _both_ to keep them
in sync (otherwise those who write (or read) callers will be forced
to wonder which one takes the object id and which one takes the
textual representation and why they need to remember the
difference).

Perhaps not all callers use oid_to_hex() to come up with old_head,
and some may even use branch names, etc., which is passed through
from this function to its callee, to convey more information than
mere object names?  If that is the case, then converting old_head to
old_oid would be a bad move as it can lose information and we'd need
to reconsider the strategy used here (e.g. perhaps we may be better
off sending both textual name and object id down the callchain, and
a caller without a meaningful textual name may indicate that by
passing NULL, or something like that).

> @@ -1865,8 +1863,7 @@ static int merged_entry(const struct cache_entry *ce,
>  
>  		if (submodule_from_ce(ce)) {
>  			int ret = check_submodule_move_head(ce, oid_to_hex(&old->oid),
> -							    oid_to_hex(&ce->oid),
> -							    o);
> +							    &ce->oid, o);
>  			if (ret)
>  				return ret;
>  		}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/11] submodule.c: fix indentation
  2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
@ 2018-09-06 17:34   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 17:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index da2ed8696f5..8345d423fda 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!submodule) {
>  			const char *name = default_name_or_path(ce->name);
>  			if (name) {
> -				default_submodule.path = default_submodule.name = name;
> +				default_submodule.path = name;
> +				default_submodule.name = name;

Besides indentation, it is good to avoid A = B = C assignment.
Written this way, it is more obvious that these two fields share the
same pointer.

Good.

>  				submodule = &default_submodule;
>  			}
>  		}
> @@ -1254,8 +1255,9 @@ static int get_next_submodule(struct child_process *cp,
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
> -			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
> -							 submodule->name))
> +			if (!submodule || !unsorted_string_list_lookup(
> +					&changed_submodule_names,
> +					submodule->name))

			if (!submodule ||
			    !unsorted_string_list_lookup(
					&changed_submodule_names,
					submodule->name))

would be easier to see what is going on.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
  2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-06 18:03   ` Junio C Hamano
  2018-09-11 18:31     ` Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 18:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Instead of sorting it after we created an unsorted list, we could insert
> correctly into the list.

It is unclear what problem you are solving, especially with
subjunctive "could" there.  We are creating an unsorted list and
then sorting it and you see it as a problem because it is just as
easy and efficient to do the insertion sort while building up the
list?  (don't react and answer without reading all the way to the
end; I think I know what is going on).

> As the unsorted append is in order of cache entry
> names, this is already sorted if names were equal to paths for submodules.

That may be a statement of a fact, but it is unclear how that fact
relates to what the code is doing before this patch, or what the
code updated by this patch is doing.

> As submodule names are often the same as their path, the input is sorted
> pretty well already, so let's just do the sort afterwards.

It is unclear what (performance?) trade-off this senttence is trying
to make.  It sounds as if it is claiming this:

	We can string_list_insert() to maintain sorted-ness of the
	list as we find new items, or we can string_list_append() to
	build an unsorted list and sort it at the end just once.

	To pick which one is more appropriate, we notice the fact
	that we discover new items more or less in the already
	sorted order.  That makes "append then sort" more
	appropriate.

But is that reasoning sensible?  

I'd imagine that append-then-sort would always be more efficient
than insert-at-the-right-place-as-we-go, and the reason why we
sometimes would need to do the latter is when we need to look up
elements in the middle of building the list (e.g. we may want to
de-dup, which requires us to look up a name from the ones we
collected so far).

And in this application, calculate_changed_submodule_paths()
discover paths by calling collect_changed_submodules() which finds a
mapping <submodule name, oid of commits> into a list sorted by
submodule name, and then goes through that list and builds a list of
submodules paths (which could be different from submodule names) by
appending.  Only after this list is fully built, get_next_submodule()
gets called, so making the latter use string_list_lookup() that assumes
a sorted list is safe if we built the list by append-then-sort (iow,
sortedness while building the list does not matter).

Having analysed all that, I find it somewhat iffy that _append() is
used there in the calculate_changed_submodule_paths() function.  It
would cause the resulting changed_submodule_names list to hold the
same name twice (or more), but I do not know if that would pose a
problem to the consumer of the list.  Using "accumulate then sort
before calling look-up" would not change it as string_list_sort()
would not dedup, so I do not think this patch would introduce a new
problem, though.



> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 8345d423fda..8bee455137a 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1255,7 +1255,7 @@ static int get_next_submodule(struct child_process *cp,
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
> -			if (!submodule || !unsorted_string_list_lookup(
> +			if (!submodule || !string_list_lookup(
>  					&changed_submodule_names,
>  					submodule->name))
>  				continue;
> @@ -1349,6 +1349,7 @@ int fetch_populated_submodules(struct repository *r,
>  	/* default value, "--submodule-prefix" and its value are added later */
>  
>  	calculate_changed_submodule_paths();
> +	string_list_sort(&changed_submodule_names);
>  	run_processes_parallel(max_parallel_jobs,
>  			       get_next_submodule,
>  			       fetch_start_failure,

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-06 18:04   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 18:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The `changed_submodule_names` are only used for fetching, so let's make it
> part of the struct that is passed around for fetching submodules.

Yay.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 08/11] submodule.c: do not copy around submodule list
  2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-06 18:20   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 18:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> 'calculate_changed_submodule_paths' uses a local list to compute the
> changed submodules, and then produces the result by copying appropriate
> items into the result list.
>
> Instead use the result list directly and prune items afterwards
> using string_list_remove_empty_items.

That may describe what the patch does, but does not explain why we
would even want to do so in the first place.  

It may be a safe thing to do to munge the list in place as there is
nobody that looks at the list after we are done in the current code,
but the above description does not tell that to readers and fails to
give readers warm and fuzzy feeling that the safety likely will stay
with us in the future.

> As a side effect, we'll have access to the util pointer for longer that
> contains the commits that we need to fetch.

If this patch does not move free-submodules-oids call to out: label
(i.e. instead do so after the call to string-list-remove-empty-items
is made, perhaps), then the list of oids will have the same lifespan
as the original code, no?  Then it is not a "side effect" but is
deliberate change of behaviour this patch makes.  We can access the
list for longer time than before, which may be a good thing, in
which case we'd want to explain the potential benefit we could reap
with future changes, no?

>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 582c0263b91..0efe6711a8c 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
>  	struct submodule_parallel_fetch *spf)
>  {
>  	struct argv_array argv = ARGV_ARRAY_INIT;
> -	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
> -	const struct string_list_item *name;
> +	struct string_list_item *name;
>  
>  	/* No need to check if there are no submodules configured */
>  	if (!submodule_from_path(the_repository, NULL, NULL))
> @@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
>  	 * Collect all submodules (whether checked out or not) for which new
>  	 * commits have been recorded upstream in "changed_submodule_names".
>  	 */
> -	collect_changed_submodules(&changed_submodules, &argv);
> +	collect_changed_submodules(&spf->changed_submodule_names, &argv);
>  
> -	for_each_string_list_item(name, &changed_submodules) {
> +	for_each_string_list_item(name, &spf->changed_submodule_names) {
>  		struct oid_array *commits = name->util;
>  		const struct submodule *submodule;
>  		const char *path = NULL;
> @@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
>  		if (!path)
>  			continue;
>  
> -		if (!submodule_has_commits(path, commits))
> -			string_list_append(&spf->changed_submodule_names,
> -					   name->string);
> +		if (submodule_has_commits(path, commits)) {
> +			oid_array_clear(commits);
> +			*name->string = '\0';
> +		}
>  	}
>  
> -	free_submodules_oids(&changed_submodules);
> +	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
> +
>  	argv_array_clear(&argv);
>  	oid_array_clear(&ref_tips_before_fetch);
>  	oid_array_clear(&ref_tips_after_fetch);
> @@ -1362,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
>  
>  	argv_array_clear(&spf.args);
>  out:
> -	string_list_clear(&spf.changed_submodule_names, 1);
> +	free_submodules_oids(&spf.changed_submodule_names);
>  	return spf.result;
>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-06 16:56     ` Jeff King
@ 2018-09-06 22:16       ` Stefan Beller
  2018-09-07  0:04         ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-06 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Thu, Sep 6, 2018 at 9:56 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Sep 06, 2018 at 09:52:28AM -0700, Junio C Hamano wrote:
>
> > Stefan Beller <sbeller@google.com> writes:
> >
> > > It is a debugging aid, so it should print to the debugging channel.
> >
> > ... and rename it with trace_ prefix.
> >
> > Use of trace_printf() is nice, as we can control its behavior at
> > runtime ;-)
>
> Yes, though...
>
> > > -void print_string_list(const struct string_list *p, const char *text)
> > > +void trace_print_string_list(const struct string_list *p, const char *text)
> > >  {
> > >     int i;
> > >     if ( text )
> > > -           printf("%s\n", text);
> > > +           trace_printf("%s\n", text);
> > >     for (i = 0; i < p->nr; i++)
> > > -           printf("%s:%p\n", p->items[i].string, p->items[i].util);
> > > +           trace_printf("%s:%p\n", p->items[i].string, p->items[i].util);
> > >  }
>
> It seems funny that we'd iterate through the list checking over and over
> whether tracing is enabled.
>
> Should this do:
>
>   if (!trace_want(&trace_default_key))
>         return;
>
> at the top? (Or possibly even take a trace key from the caller, so that
> it can use whatever context makes sense for this particular list?)

I added this check as well as rewording the commit message
to recite Junios understanding of the patch as well.

However I would want to not derail this patch any further.
This function was used as an aid by me some time ago, so I
am willing to share the modifications needed for efficient
printf debugging here, but I do not want to be dragged into
a rabbit hole.

For example taking the trace key is much overkill IMHO
for a pure debugging aid (and so is the shortcutting return
that you proposed, but I added that already for the resend),
so if anyone needs this function outside of printf-debugging,
I would recommend patches on top.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] string-list.h: add string_list_{pop, last} functions
  2018-09-06 17:10   ` Junio C Hamano
@ 2018-09-06 22:29     ` Stefan Beller
  0 siblings, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-06 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> > A later patch shows
> > how useful this will be.
>
> Instead of all of the above, how about being more direct, i.e. e.g.


ok.


> Conceptually, this allows string_list_clear() to be implemented in
> terms of this function, i.e.
>
>         string_list_clear(struct string_list *list, int free_util)
>         {
>                 while (list->nr)
>                         string_list_pop(list, free_util);
>                 free(list->items);
>                 list->items = NULL;
>                 list->nr = list->alloc = 0;
>         }
>
> It is unfortunate that string_list has become such a kitchen-sink in
> that a string list whose util field is used in such a way that
> string_list_clear_func() needs to be used to clear the list cannot
> be used as a stack.  Ideally each of these "features" (like
> "sorted/unsorted", "owning/borrowing util", "owning/borrowing
> string", etc.)  ought to be orthogonal, but they are not.

Oh, another neat string list feature I overlooked. :/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-06 22:16       ` Stefan Beller
@ 2018-09-07  0:04         ` Jeff King
  2018-09-07  9:53           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-09-07  0:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git

On Thu, Sep 06, 2018 at 03:16:21PM -0700, Stefan Beller wrote:

> > It seems funny that we'd iterate through the list checking over and over
> > whether tracing is enabled.
> >
> > Should this do:
> >
> >   if (!trace_want(&trace_default_key))
> >         return;
> >
> > at the top? (Or possibly even take a trace key from the caller, so that
> > it can use whatever context makes sense for this particular list?)
> 
> I added this check as well as rewording the commit message
> to recite Junios understanding of the patch as well.
> 
> However I would want to not derail this patch any further.
> This function was used as an aid by me some time ago, so I
> am willing to share the modifications needed for efficient
> printf debugging here, but I do not want to be dragged into
> a rabbit hole.
> 
> For example taking the trace key is much overkill IMHO
> for a pure debugging aid (and so is the shortcutting return
> that you proposed, but I added that already for the resend),
> so if anyone needs this function outside of printf-debugging,
> I would recommend patches on top.

I guess the question is: is this a thing we would want to make available
to code to leave in all the time? Or is it just for sticking in
temporarily for a quick dump?

If the former, then I think it needs the early-return at the least (and
probably _should_ have the key parameter).

But I get the feeling that you really just want the latter, and are only
grudgingly being pushed into the former by Junio's suggestion.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-07  0:04         ` Jeff King
@ 2018-09-07  9:53           ` Junio C Hamano
  2018-09-07 17:21             ` Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-07  9:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git

Jeff King <peff@peff.net> writes:

> I guess the question is: is this a thing we would want to make available
> to code to leave in all the time? Or is it just for sticking in
> temporarily for a quick dump?
>
> If the former, then I think it needs the early-return at the least (and
> probably _should_ have the key parameter).
>
> But I get the feeling that you really just want the latter, and are only
> grudgingly being pushed into the former by Junio's suggestion.

Well, if that is the case, I'd change "my suggestion" (although I
didn't mean to suggest anything concrete).  If this was needed
and/or was useful only during the defvelopment of the remainder of
the series, and if this is known to be a half-hearted change that is
not meant to be useful as a general solution, then let's not take
the main part of the series a hostage to this step; rather, we
should drop this change and leave it to another series that is
willing to do it right.  It would save both author's and reviewers'
time if we did so.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] string_list: print_string_list to use trace_printf
  2018-09-07  9:53           ` Junio C Hamano
@ 2018-09-07 17:21             ` Stefan Beller
  2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-07 17:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Fri, Sep 7, 2018 at 2:53 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > I guess the question is: is this a thing we would want to make available
> > to code to leave in all the time? Or is it just for sticking in
> > temporarily for a quick dump?
> >
> > If the former, then I think it needs the early-return at the least (and
> > probably _should_ have the key parameter).
> >
> > But I get the feeling that you really just want the latter, and are only
> > grudgingly being pushed into the former by Junio's suggestion.

That is not quite the case. I did take all but the one suggestion
as I think they are good suggestions and I appreciate it.

However having the trace key is not useful I would think.
(I did not need it for debugging, but presented an API
of what I would have found useful)

If the key is really that useful, I would just think we can add it on top.

> Well, if that is the case, I'd change "my suggestion" (although I
> didn't mean to suggest anything concrete).  If this was needed
> and/or was useful only during the defvelopment of the remainder of
> the series, and if this is known to be a half-hearted change that is
> not meant to be useful as a general solution, then let's not take
> the main part of the series a hostage to this step; rather, we
> should drop this change and leave it to another series that is
> willing to do it right.  It would save both author's and reviewers'
> time if we did so.

In that case, would we be interested in a separate patch
that kills off that function instead?

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-07 17:21             ` Stefan Beller
@ 2018-09-10 21:58               ` Stefan Beller
  2018-09-10 21:58                 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
  2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
  0 siblings, 2 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-10 21:58 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, peff

Similar to trace_strbuf or trace_argv_printf, we might want to output
a string list in tracing. Add such a function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

I separated this from the other series, making it into 2 patches:
This first patch adds tracing for string lists and the next patch that
removes the unused function from the string list API.
That way we can decide on these two patches separately if needed.

 Documentation/technical/api-trace.txt |  8 ++++++
 trace.c                               | 39 +++++++++++++++++++++++++++
 trace.h                               | 16 +++++++++++
 3 files changed, 63 insertions(+)

diff --git a/Documentation/technical/api-trace.txt b/Documentation/technical/api-trace.txt
index fadb5979c48..ad0ea99d930 100644
--- a/Documentation/technical/api-trace.txt
+++ b/Documentation/technical/api-trace.txt
@@ -62,6 +62,14 @@ Functions
 	Prints the strbuf, without additional formatting (i.e. doesn't
 	choke on `%` or even `\0`).
 
+`void trace_print_string_list_key(struct trace_key *key, const struct string_list *p, const char *text)::
+
+	Print the string-pointer pairs of the string_list,
+	each one in its own line.
+	It takes an optional key and header argument.
+	If the tracing key is not given, use the default key.
+	The header text is printed before the list itself.
+
 `uint64_t getnanotime(void)`::
 
 	Returns nanoseconds since the epoch (01/01/1970), typically used
diff --git a/trace.c b/trace.c
index fc623e91fdd..e3a12a092f9 100644
--- a/trace.c
+++ b/trace.c
@@ -176,6 +176,38 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 	strbuf_release(&buf);
 }
 
+void trace_print_string_list_key_fl(const char *file, int line,
+				    struct trace_key *key,
+				    const struct string_list *p,
+				    const char *text)
+{
+	int i, buf_prefix;
+	struct strbuf buf = STRBUF_INIT;
+
+	if (!key)
+		key = &trace_default_key;
+
+	if (!prepare_trace_line(file, line, key, &buf))
+		return;
+
+	buf_prefix = buf.len;
+
+	if (text) {
+		strbuf_addstr(&buf, text);
+		print_trace_line(key, &buf);
+	}
+
+	for (i = 0; i < p->nr; i++) {
+		strbuf_setlen(&buf, buf_prefix);
+		strbuf_addf(&buf, "%s:%p\n",
+			    p->items[i].string,
+			    p->items[i].util);
+		print_trace_line(key, &buf);
+	}
+
+	strbuf_release(&buf);
+}
+
 static void trace_performance_vprintf_fl(const char *file, int line,
 					 uint64_t nanos, const char *format,
 					 va_list ap)
@@ -227,6 +259,13 @@ void trace_strbuf(struct trace_key *key, const struct strbuf *data)
 	trace_strbuf_fl(NULL, 0, key, data);
 }
 
+void trace_print_string_list_key(struct trace_key *key,
+				 const struct string_list *p,
+				 const char *text)
+{
+	trace_print_string_list_key_fl(NULL, 0, key, p, list);
+}
+
 void trace_performance(uint64_t nanos, const char *format, ...)
 {
 	va_list ap;
diff --git a/trace.h b/trace.h
index 2b6a1bc17c2..0e083891e1f 100644
--- a/trace.h
+++ b/trace.h
@@ -37,6 +37,10 @@ extern void trace_argv_printf(const char **argv, const char *format, ...);
 
 extern void trace_strbuf(struct trace_key *key, const struct strbuf *data);
 
+extern void trace_print_string_list_key(struct trace_key *key,
+					const struct string_list *p,
+					const char *text);
+
 /* Prints elapsed time (in nanoseconds) if GIT_TRACE_PERFORMANCE is enabled. */
 __attribute__((format (printf, 2, 3)))
 extern void trace_performance(uint64_t nanos, const char *format, ...);
@@ -103,6 +107,13 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
 			trace_strbuf_fl(TRACE_CONTEXT, __LINE__, key, data);\
 	} while (0)
 
+#define trace_print_string_list_key(key, list, text) 			    \
+	do {								    \
+		if (trace_pass_fl(key))					    \
+			trace_print_string_list_key_fl(TRACE_CONTEXT,	    \
+						       __LINE__, key, data);\
+	} while (0)
+
 #define trace_performance(nanos, ...)					    \
 	do {								    \
 		if (trace_pass_fl(&trace_perf_key))			    \
@@ -127,6 +138,11 @@ extern void trace_argv_printf_fl(const char *file, int line, const char **argv,
 				 const char *format, ...);
 extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 			    const struct strbuf *data);
+extern void trace_print_string_list_key_fl(const char *file, int line,
+					   struct trace_key *key,
+					   const struct string_list *p,
+					   const char *text);
+
 __attribute__((format (printf, 4, 5)))
 extern void trace_performance_fl(const char *file, int line,
 				 uint64_t nanos, const char *fmt, ...);
-- 
2.19.0.rc2.392.g5ba43deb5a-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/2] string-list: remove unused function print_string_list
  2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
@ 2018-09-10 21:58                 ` Stefan Beller
  2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-10 21:58 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, peff

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 10 ----------
 string-list.h |  8 --------
 2 files changed, 18 deletions(-)

diff --git a/string-list.c b/string-list.c
index 771c4550980..1f6063f2a27 100644
--- a/string-list.c
+++ b/string-list.c
@@ -195,16 +195,6 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 	list->nr = list->alloc = 0;
 }
 
-
-void print_string_list(const struct string_list *p, const char *text)
-{
-	int i;
-	if ( text )
-		printf("%s\n", text);
-	for (i = 0; i < p->nr; i++)
-		printf("%s:%p\n", p->items[i].string, p->items[i].util);
-}
-
 struct string_list_item *string_list_append_nodup(struct string_list *list,
 						  char *string)
 {
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..18c718c12ce 100644
--- a/string-list.h
+++ b/string-list.h
@@ -113,14 +113,6 @@ typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 void filter_string_list(struct string_list *list, int free_util,
 			string_list_each_func_t want, void *cb_data);
 
-/**
- * Dump a string_list to stdout, useful mainly for debugging
- * purposes. It can take an optional header argument and it writes out
- * the string-pointer pairs of the string_list, each one in its own
- * line.
- */
-void print_string_list(const struct string_list *p, const char *text);
-
 /**
  * Free a string_list. The `string` pointer of the items will be freed
  * in case the `strdup_strings` member of the string_list is set. The
-- 
2.19.0.rc2.392.g5ba43deb5a-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
  2018-09-10 21:58                 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
@ 2018-09-10 22:32                 ` Junio C Hamano
  2018-09-10 22:38                   ` Stefan Beller
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-10 22:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <sbeller@google.com> writes:

> I separated this from the other series, making it into 2 patches:
> This first patch adds tracing for string lists and the next patch that
> removes the unused function from the string list API.
> That way we can decide on these two patches separately if needed.

Of course, even though these are 1/2 and 2/2, only one of them and
not both would apply.

Thanks for sticking to the topic.  

Given how simple that "dump them to standard output" code is, I am
inclined to say that anybody who needs to inspect the contents of
string list at various points in the code under development can
create one from scratch even if we did not have this implementation,
so perhaps 2/2 is a better choice between the two.

It is not costing us much to leave it in the code.  It's not like
the function costed a lot of maintenance burden since it was added
in 8fd2cb40 ("Extract helper bits from c-merge-recursive work",
2006-07-25), so the alternative #3 might be to do nothing.

I have no strong preference between #2 and #3.  The benefit of #2
compared to #3 is that, if we remove it today, there will not be
somebody else in the future to come and propose removing the
otherwise unused function, which would cost us time to review and
discuss, so unless somebody stops me, I am inclined to say we'd take
2/2 ;-)

Thanks.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
@ 2018-09-10 22:38                   ` Stefan Beller
  2018-09-11  0:52                     ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-10 22:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Sep 10, 2018 at 3:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > I separated this from the other series, making it into 2 patches:
> > This first patch adds tracing for string lists and the next patch that
> > removes the unused function from the string list API.
> > That way we can decide on these two patches separately if needed.
>
> Of course, even though these are 1/2 and 2/2, only one of them and
> not both would apply.

Or you could squash them once we reach consensus that both are good.

> Thanks for sticking to the topic.
>
> Given how simple that "dump them to standard output" code is, I am
> inclined to say that anybody who needs to inspect the contents of
> string list at various points in the code under development can
> create one from scratch even if we did not have this implementation,
> so perhaps 2/2 is a better choice between the two.

This sounds like the consensus is not to take both but only 2/2,
which I'd be happy with, too.

> It is not costing us much to leave it in the code.  It's not like
> the function costed a lot of maintenance burden since it was added
> in 8fd2cb40 ("Extract helper bits from c-merge-recursive work",
> 2006-07-25), so the alternative #3 might be to do nothing.

True, but ...

> somebody else in the future to propose removing

is what is actually happening here already, see

https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/

> I am inclined to say we'd take
> 2/2 ;-)
>
> Thanks.

Feel free to take Alexanders patch from 2015 instead.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-10 22:38                   ` Stefan Beller
@ 2018-09-11  0:52                     ` Junio C Hamano
  2018-09-11  3:08                       ` Stefan Beller
  2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11  0:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Jeff King

Stefan Beller <sbeller@google.com> writes:

>> Of course, even though these are 1/2 and 2/2, only one of them and
>> not both would apply.
>
> Or you could squash them once we reach consensus that both are good.

Ah, sorry, I completely misread the first one.  I thought that it
was extending the implementation of existing unused function by
using trace API, which explains why I mistook them as an either-or
choice.  I did not realize 1/2 was adding yet another unused one
without doing anything to the existing unused one.

So the choice being offered are:

 (0) take 2/2 only, keeping zero unused helper.
 (1) take 1/2 only, keeping two unused helpers.
 (2) do nothing, keeping the simple unused helper we had from the
     beginning of time.
 (3) take 1/2 and 2/2, replacing one simple unused helper with
     another unused helper that is more complex and capable.

Are you planning to, or do you know somebody who plans to, use one
or the other if available in a near future?  If so, it would be OK
to take choice (2) or (3), and it probably is preferrable to take
(3) between them.  A more complex and capable one would require
maintenance over time (trace API is being updated with the trace2 in
another topic that will start flying soon, so it would be expected a
user of trace API may need update), but as long as that is actually
used and help developers, that maintenance cost is worth paying.

If not, I would say that the option (1) or (3) are worse choices
than either (0) or (2).  It would be better to minimize maintenance
cost for unused helper(s), and the simpler one is likely to stay
maintenance free for longer than the more complex and capable one,
so (1) and (3) do not make much sense unless we plan to start using
these real soon.

>> It is not costing us much to leave it in the code.  It's not like
>> the function costed a lot of maintenance burden since it was added
>> in 8fd2cb40 ("Extract helper bits from c-merge-recursive work",
>> 2006-07-25), so the alternative #3 might be to do nothing.
>
> True, but ...
>
>> somebody else in the future to propose removing
>
> is what is actually happening here already, see
>
> https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
>
>> I am inclined to say we'd take
>> 2/2 ;-)
>>
>> Thanks.
>
> Feel free to take Alexanders patch from 2015 instead.

I prefer to take 2/2 over the one from 2015, especially if we can
explain the removal better.  We had three extra years that the
helper stayed unused and unchanged, which gives us a better
indication that it won't be missed.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-11  0:52                     ` Junio C Hamano
@ 2018-09-11  3:08                       ` Stefan Beller
  2018-09-11 21:03                         ` Jeff King
  2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
  1 sibling, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-11  3:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

On Mon, Sep 10, 2018 at 5:52 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> >> Of course, even though these are 1/2 and 2/2, only one of them and
> >> not both would apply.
> >
> > Or you could squash them once we reach consensus that both are good.
>
> Ah, sorry, I completely misread the first one.  I thought that it
> was extending the implementation of existing unused function by
> using trace API, which explains why I mistook them as an either-or
> choice.  I did not realize 1/2 was adding yet another unused one
> without doing anything to the existing unused one.
>
> So the choice being offered are:
>
>  (0) take 2/2 only, keeping zero unused helper.
>  (1) take 1/2 only, keeping two unused helpers.
>  (2) do nothing, keeping the simple unused helper we had from the
>      beginning of time.
>  (3) take 1/2 and 2/2, replacing one simple unused helper with
>      another unused helper that is more complex and capable.
>
> Are you planning to, or do you know somebody who plans to, use one
> or the other if available in a near future?  If so, it would be OK
> to take choice (2) or (3), and it probably is preferrable to take
> (3) between them.  A more complex and capable one would require
> maintenance over time (trace API is being updated with the trace2 in
> another topic that will start flying soon, so it would be expected a
> user of trace API may need update), but as long as that is actually
> used and help developers, that maintenance cost is worth paying.
>
> If not, I would say that the option (1) or (3) are worse choices
> than either (0) or (2).  It would be better to minimize maintenance
> cost for unused helper(s), and the simpler one is likely to stay
> maintenance free for longer than the more complex and capable one,
> so (1) and (3) do not make much sense unless we plan to start using
> these real soon.

Yes, I think (0) is the way to go, actually.

I wrote patch 1/2 to show Peff and you to prove otherwise that I am
not contributing "only grudgingly".

If the current unused function would be actually helpful in debugging
I would not remove it, but actually use it.

>
> >> It is not costing us much to leave it in the code.  It's not like
> >> the function costed a lot of maintenance burden since it was added
> >> in 8fd2cb40 ("Extract helper bits from c-merge-recursive work",
> >> 2006-07-25), so the alternative #3 might be to do nothing.
> >
> > True, but ...
> >
> >> somebody else in the future to propose removing
> >
> > is what is actually happening here already, see
> >
> > https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
> >
> >> I am inclined to say we'd take
> >> 2/2 ;-)
> >>
> >> Thanks.
> >
> > Feel free to take Alexanders patch from 2015 instead.
>
> I prefer to take 2/2 over the one from 2015, especially if we can
> explain the removal better.  We had three extra years that the
> helper stayed unused and unchanged, which gives us a better
> indication that it won't be missed.

Will send a patch with better reasons tomorrow,

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
  2018-09-06 18:03   ` Junio C Hamano
@ 2018-09-11 18:31     ` Stefan Beller
  0 siblings, 0 replies; 39+ messages in thread
From: Stefan Beller @ 2018-09-11 18:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Sep 6, 2018 at 11:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > Instead of sorting it after we created an unsorted list, we could insert
> > correctly into the list.
>
> It is unclear what problem you are solving, especially with
> subjunctive "could" there.  We are creating an unsorted list and
> then sorting it and you see it as a problem because it is just as
> easy and efficient to do the insertion sort while building up the
> list?  (don't react and answer without reading all the way to the
> end; I think I know what is going on).
>
> > As the unsorted append is in order of cache entry
> > names, this is already sorted if names were equal to paths for submodules.
>
> That may be a statement of a fact, but it is unclear how that fact
> relates to what the code is doing before this patch, or what the
> code updated by this patch is doing.
>
> > As submodule names are often the same as their path, the input is sorted
> > pretty well already, so let's just do the sort afterwards.
>
> It is unclear what (performance?) trade-off this senttence is trying
> to make.  It sounds as if it is claiming this:
>
>         We can string_list_insert() to maintain sorted-ness of the
>         list as we find new items, or we can string_list_append() to
>         build an unsorted list and sort it at the end just once.
>
>         To pick which one is more appropriate, we notice the fact
>         that we discover new items more or less in the already
>         sorted order.  That makes "append then sort" more
>         appropriate.
>
> But is that reasoning sensible?
>
> I'd imagine that append-then-sort would always be more efficient
> than insert-at-the-right-place-as-we-go, and the reason why we
> sometimes would need to do the latter is when we need to look up
> elements in the middle of building the list (e.g. we may want to
> de-dup, which requires us to look up a name from the ones we
> collected so far).

If we come across a (mostly) sorted list, then the insert-at-the-right-place
we'd come across the best case of insertion sort which is O(n), which
sounds better than append-then-sort as our sorting is a merge sort,
which has O(n log n) even in its best case (and needs to copy stuff
into a temp buffer and back).

By having the submodules named after its path, I strongly suspect
we have a mostly sorted list in nearly all cases except some really
interesting corner cases out there.

> And in this application, calculate_changed_submodule_paths()
> discover paths by calling collect_changed_submodules() which finds a
> mapping <submodule name, oid of commits> into a list sorted by
> submodule name, and then goes through that list and builds a list of
> submodules paths (which could be different from submodule names) by
> appending.  Only after this list is fully built, get_next_submodule()
> gets called, so making the latter use string_list_lookup() that assumes
> a sorted list is safe if we built the list by append-then-sort (iow,
> sortedness while building the list does not matter).
>
> Having analysed all that, I find it somewhat iffy that _append() is
> used there in the calculate_changed_submodule_paths() function.

Note that this is fixed in the later patch
"submodule.c: do not copy around submodule list"

>  It
> would cause the resulting changed_submodule_names list to hold the
> same name twice (or more),

This would be possible if there is a submodule at path A and another
submodule (at a different path) named "A", as we'll try hard to collect
names, but are also okay with path as we want to keep supporting the
historical use case of submodules.

> but I do not know if that would pose a
> problem to the consumer of the list.  Using "accumulate then sort
> before calling look-up" would not change it as string_list_sort()
> would not dedup, so I do not think this patch would introduce a new
> problem, though.

Yes, that is true, so we'd want to extend the message above to
mention the potential duplicates.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] string-list: remove unused function print_string_list
  2018-09-11  0:52                     ` Junio C Hamano
  2018-09-11  3:08                       ` Stefan Beller
@ 2018-09-11 18:48                       ` Stefan Beller
  2018-09-11 19:27                         ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-11 18:48 UTC (permalink / raw)
  To: gitster; +Cc: git, peff, sbeller

The helper function stayed unused for 3 years. A removal of that function
was proposed before[1], but now time has proven we really do not need the
function.

[1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 10 ----------
 string-list.h |  8 --------
 2 files changed, 18 deletions(-)

diff --git a/string-list.c b/string-list.c
index 771c4550980..1f6063f2a27 100644
--- a/string-list.c
+++ b/string-list.c
@@ -195,16 +195,6 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 	list->nr = list->alloc = 0;
 }
 
-
-void print_string_list(const struct string_list *p, const char *text)
-{
-	int i;
-	if ( text )
-		printf("%s\n", text);
-	for (i = 0; i < p->nr; i++)
-		printf("%s:%p\n", p->items[i].string, p->items[i].util);
-}
-
 struct string_list_item *string_list_append_nodup(struct string_list *list,
 						  char *string)
 {
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..18c718c12ce 100644
--- a/string-list.h
+++ b/string-list.h
@@ -113,14 +113,6 @@ typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
 void filter_string_list(struct string_list *list, int free_util,
 			string_list_each_func_t want, void *cb_data);
 
-/**
- * Dump a string_list to stdout, useful mainly for debugging
- * purposes. It can take an optional header argument and it writes out
- * the string-pointer pairs of the string_list, each one in its own
- * line.
- */
-void print_string_list(const struct string_list *p, const char *text);
-
 /**
  * Free a string_list. The `string` pointer of the items will be freed
  * in case the `strdup_strings` member of the string_list is set. The
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] string-list: remove unused function print_string_list
  2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
@ 2018-09-11 19:27                         ` Junio C Hamano
  2018-09-11 19:30                           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 19:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <sbeller@google.com> writes:

> The helper function stayed unused for 3 years. A removal of that function

I think it stayed unused for more than that before the previous
proposal to remove it was written (I do not bother going back to my
earlier message that identified which exact commit this was
introduced at).  It has stayed that way for 3 more years since then.

> was proposed before[1], but now time has proven we really do not need the
> function.
>
> [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
> Signed-off-by: Stefan Beller <sbeller@google.com>

I'll add a blank line before the sign-off.  Is this an example that
our "where is the existing trailer?" code misbehaving?

> ---
>  string-list.c | 10 ----------
>  string-list.h |  8 --------
>  2 files changed, 18 deletions(-)
>
> diff --git a/string-list.c b/string-list.c
> index 771c4550980..1f6063f2a27 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -195,16 +195,6 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
>  	list->nr = list->alloc = 0;
>  }
>  
> -
> -void print_string_list(const struct string_list *p, const char *text)
> -{
> -	int i;
> -	if ( text )
> -		printf("%s\n", text);
> -	for (i = 0; i < p->nr; i++)
> -		printf("%s:%p\n", p->items[i].string, p->items[i].util);
> -}
> -
>  struct string_list_item *string_list_append_nodup(struct string_list *list,
>  						  char *string)
>  {
> diff --git a/string-list.h b/string-list.h
> index ff8f6094a33..18c718c12ce 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -113,14 +113,6 @@ typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>  void filter_string_list(struct string_list *list, int free_util,
>  			string_list_each_func_t want, void *cb_data);
>  
> -/**
> - * Dump a string_list to stdout, useful mainly for debugging
> - * purposes. It can take an optional header argument and it writes out
> - * the string-pointer pairs of the string_list, each one in its own
> - * line.
> - */
> -void print_string_list(const struct string_list *p, const char *text);
> -
>  /**
>   * Free a string_list. The `string` pointer of the items will be freed
>   * in case the `strdup_strings` member of the string_list is set. The

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] string-list: remove unused function print_string_list
  2018-09-11 19:27                         ` Junio C Hamano
@ 2018-09-11 19:30                           ` Junio C Hamano
  2018-09-11 19:47                             ` Stefan Beller
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 19:30 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <sbeller@google.com> writes:
>
>> The helper function stayed unused for 3 years. A removal of that function
>
> I think it stayed unused for more than that before the previous
> proposal to remove it was written (I do not bother going back to my
> earlier message that identified which exact commit this was
> introduced at).  It has stayed that way for 3 more years since then.
>
>> was proposed before[1], but now time has proven we really do not need the
>> function.
>>
>> [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>
> I'll add a blank line before the sign-off.  Is this an example that
> our "where is the existing trailer?" code misbehaving?

I am still curious about this one.

In any case, I've reworded the above around "3 years".

    string-list: remove unused function print_string_list
    
    A removal of this helper function was proposed 3 years ago [1]; the
    function was never used since it was introduced in 2006 back then,
    and there is no new callers since.  Now time has proven we really do
    not need the function.
    
    [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
    
    Signed-off-by: Stefan Beller <sbeller@google.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] string-list: remove unused function print_string_list
  2018-09-11 19:30                           ` Junio C Hamano
@ 2018-09-11 19:47                             ` Stefan Beller
  2018-09-11 20:53                               ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Beller @ 2018-09-11 19:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

On Tue, Sep 11, 2018 at 12:31 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Stefan Beller <sbeller@google.com> writes:
> >
> >> The helper function stayed unused for 3 years. A removal of that function
> >
> > I think it stayed unused for more than that before the previous
> > proposal to remove it was written (I do not bother going back to my
> > earlier message that identified which exact commit this was
> > introduced at).  It has stayed that way for 3 more years since then.
> >
> >> was proposed before[1], but now time has proven we really do not need the
> >> function.
> >>
> >> [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
> >> Signed-off-by: Stefan Beller <sbeller@google.com>
> >
> > I'll add a blank line before the sign-off.  Is this an example that
> > our "where is the existing trailer?" code misbehaving?
>
> I am still curious about this one.

No, it's me who is misbehaving. ;-)

While I do have format.signoff set, such that I never forget to sign off on
a patch, I have the habit to sign it off manually, in git-gui while editing the
commit message so I can see the whole message.

> In any case, I've reworded the above around "3 years".
>
>     string-list: remove unused function print_string_list
>
>     A removal of this helper function was proposed 3 years ago [1]; the
>     function was never used since it was introduced in 2006 back then,
>     and there is no new callers since.  Now time has proven we really do
>     not need the function.
>
>     [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
>
>     Signed-off-by: Stefan Beller <sbeller@google.com>
>     Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> Thanks.

Thanks for rewording it.
Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] string-list: remove unused function print_string_list
  2018-09-11 19:47                             ` Stefan Beller
@ 2018-09-11 20:53                               ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 20:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Jeff King

Stefan Beller <sbeller@google.com> writes:

>> >>
>> >> [1] https://public-inbox.org/git/1421343725-3973-1-git-send-email-kuleshovmail@gmail.com/
>> >> Signed-off-by: Stefan Beller <sbeller@google.com>
>> >
>> > I'll add a blank line before the sign-off.  Is this an example that
>> > our "where is the existing trailer?" code misbehaving?
>>
>> I am still curious about this one.
>
> No, it's me who is misbehaving. ;-)

Whew.  That lets me worried about one fewer things ;-)  Thanks.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] trace: add trace_print_string_list_key
  2018-09-11  3:08                       ` Stefan Beller
@ 2018-09-11 21:03                         ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-09-11 21:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git

On Mon, Sep 10, 2018 at 08:08:35PM -0700, Stefan Beller wrote:

> > So the choice being offered are:
> >
> >  (0) take 2/2 only, keeping zero unused helper.
> >  (1) take 1/2 only, keeping two unused helpers.
> >  (2) do nothing, keeping the simple unused helper we had from the
> >      beginning of time.
> >  (3) take 1/2 and 2/2, replacing one simple unused helper with
> >      another unused helper that is more complex and capable.
> [...]
> 
> Yes, I think (0) is the way to go, actually.
> 
> I wrote patch 1/2 to show Peff and you to prove otherwise that I am
> not contributing "only grudgingly".

I am perfectly happy with 0. But for reference, your new trace helper
looks completely reasonable from a quick view. Perhaps we can let it
live on in the list archive, and somebody may find a good use for it in
the future (though there is a significant chance that they would not
think to search the archive -- it could even be of value to commit and
revert it so that they find it in "git log", but that may be getting
pretty hypothetical).

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-09-11 21:03 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
2018-09-06 16:52   ` Junio C Hamano
2018-09-06 16:56     ` Jeff King
2018-09-06 22:16       ` Stefan Beller
2018-09-07  0:04         ` Jeff King
2018-09-07  9:53           ` Junio C Hamano
2018-09-07 17:21             ` Stefan Beller
2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
2018-09-10 21:58                 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
2018-09-10 22:38                   ` Stefan Beller
2018-09-11  0:52                     ` Junio C Hamano
2018-09-11  3:08                       ` Stefan Beller
2018-09-11 21:03                         ` Jeff King
2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
2018-09-11 19:27                         ` Junio C Hamano
2018-09-11 19:30                           ` Junio C Hamano
2018-09-11 19:47                             ` Stefan Beller
2018-09-11 20:53                               ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
2018-09-06 17:10   ` Junio C Hamano
2018-09-06 22:29     ` Stefan Beller
2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
2018-09-06 17:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
2018-09-06 17:31   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
2018-09-06 17:34   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-06 18:03   ` Junio C Hamano
2018-09-11 18:31     ` Stefan Beller
2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-06 18:04   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
2018-09-06 18:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).