Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers
@ 2020-05-05  1:13 Taylor Blau
  2020-05-05  1:13 ` [PATCH 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
                   ` (10 more replies)
  0 siblings, 11 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Hi,

These are a few patches to address [1] by getting rid of the
'COMMIT_GRAPH_WRITE_CHECK_OIDS' flag. A more complete discussion about
why we might want to get rid of that flag can be found in the second
patch, as well as in [1], but the basic gist is as follows:

  - It is often convenient to pipe the output of 'git for-each-ref' into
    'git commit-graph write --stdin-commits', but handling multi-layer
    tags is frustrating (at the very least, you have to peel them
    yourself in 'for-each-ref', incurring a higher-than-necessary cost
    outside of the commit-graph invocation)

  - It would be much more convenient if 'git commit-graph write
    --stdin-commits' ignored OIDs that peel down to non-commit objects
    silently, while still warning the caller about objects that (a)
    don't exist, or (b) invalid hash inputs (for e.g., "HEAD", or
    "refs/heads/foo").

  - By making this change, we are allowing *more* possible behaviors
    than in previous versions, and the only case that we are breaking
    (that 'git commit-graph write --stdin-commits' complains on
    non-commit OIDs) can be addressed by peeling outside of
    'commit-graph write' [2].

The first six patches move peeling refs out from the commit-graph
internals, and into the callers.

This has no impact on the '--stdin-commits' case, but is a potential
improvement for '--reachable', where it is more efficient to call
'peel_ref()' than 'lookup_commit_reference_gently()'. There is an
intermediate state (that is resolved in the final patch) where we can
have more than one progress meter (since the progress meter moved with
the perhaps-expensive operations out of the commit-graph internals,
too), but the final patch addresses this.

The last two patches lift the restriction on input to '--stdin-commits'
resolving to commit objects, instead making the behavior to silently
drop non-commit OIDs, and continue complain about invalid OIDs, and
valid-but-missing OIDs.

Finally, this topic isn't based on anything that isn't already in
'master', so I think the days of many tangle commit-graph topics are
behind us.

Thanks in advance for your review.

Thanks,
Taylor

Taylor Blau (8):
  commit-graph.c: extract 'refs_cb_data'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: peel refs in 'add_ref_to_set'
  builtin/commit-graph.c: extract 'read_one_commit()'
  builtin/commit-graph.c: dereference tags in builtin
  commit-graph.c: simplify 'fill_oids_from_commits'
  t5318: reorder test below 'graph_read_expect'
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag

 Documentation/git-commit-graph.txt |  6 ++-
 builtin/commit-graph.c             | 73 ++++++++++++++++++++----------
 commit-graph.c                     | 61 +++++++++++--------------
 commit-graph.h                     | 10 ++--
 t/t5318-commit-graph.sh            | 25 ++++++----
 5 files changed, 98 insertions(+), 77 deletions(-)

--
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 1/8] commit-graph.c: extract 'refs_cb_data'
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-05  1:13 ` [PATCH 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In subsequent patches, we are going to update a progress meter when
'add_ref_to_set()' is called, and need a convenient way to pass a
'struct progress *' in from the caller.

Introduce 'refs_cb_data' as a catch-all for parameters that
'add_ref_to_set' may need, and wrap the existing single parameter in
that struct.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6dc777e2f3..00da281f39 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1318,13 +1318,17 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	stop_progress(&progress);
 }
 
+struct refs_cb_data {
+	struct oidset *commits;
+};
+
 static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
-	struct oidset *commits = (struct oidset *)cb_data;
+	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
-	oidset_insert(commits, oid);
+	oidset_insert(data->commits, oid);
 	return 0;
 }
 
@@ -1333,9 +1337,13 @@ int write_commit_graph_reachable(struct object_directory *odb,
 				 const struct split_commit_graph_opts *split_opts)
 {
 	struct oidset commits = OIDSET_INIT;
+	struct refs_cb_data data;
 	int result;
 
-	for_each_ref(add_ref_to_set, &commits);
+	memset(&data, 0, sizeof(data));
+	data.commits = &commits;
+
+	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 2/8] commit-graph.c: show progress of finding reachable commits
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
  2020-05-05  1:13 ` [PATCH 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-05 11:50   ` Derrick Stolee
  2020-05-05  1:13 ` [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When 'git commit-graph write --reachable' is invoked, the commit-graph
machinery calls 'for_each_ref()' to discover the set of reachable
commits.

Right now the 'add_ref_to_set' callback is not doing anything other than
adding an OID to the set of known-reachable OIDs. In a subsequent
commit, 'add_ref_to_set' will presumptively peel references. This
operation should be fast for repositories with an up-to-date
'$GIT_DIR/packed-refs', but may be slow in the general case.

So that it doesn't appear that 'git commit-graph write' is idling with
'--reachable' in the slow case, add a progress meter to provide some
output in the meantime.

In general, we don't expect a progress meter to appear at all, since
peeling references with a 'packed-refs' file is quick. If it's slow and
we do show a progress meter, the subsequent 'fill_oids_from_commits()'
will be fast, since all of the calls to
'lookup_commit_reference_gently()' will be no-ops.

Both progress meters are delayed, so it is unlikely that more than one
will appear. In either case, this intermediate state will go away in a
handful of patches, at which point there will be at most one progress
meter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 00da281f39..8f61256b0a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1320,6 +1320,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 
 struct refs_cb_data {
 	struct oidset *commits;
+	struct progress *progress;
 };
 
 static int add_ref_to_set(const char *refname,
@@ -1328,6 +1329,8 @@ static int add_ref_to_set(const char *refname,
 {
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
+	display_progress(data->progress, oidset_size(data->commits) + 1);
+
 	oidset_insert(data->commits, oid);
 	return 0;
 }
@@ -1342,12 +1345,17 @@ int write_commit_graph_reachable(struct object_directory *odb,
 
 	memset(&data, 0, sizeof(data));
 	data.commits = &commits;
+	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
+		data.progress = start_delayed_progress(
+			_("Finding reachable commits"), 0);
 
 	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
 	oidset_clear(&commits);
+	if (data.progress)
+		stop_progress(&data.progress);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
  2020-05-05  1:13 ` [PATCH 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
  2020-05-05  1:13 ` [PATCH 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-07 19:54   ` Jeff King
  2020-05-05  1:13 ` [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

While iterating references (to discover the set of commits to write to
the commit-graph with 'git commit-graph write --reachable'),
'add_ref_to_set' can save 'fill_oids_from_commits()' some time by
peeling the references beforehand.

Move peeling out of 'fill_oids_from_commits()' and into
'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing
so allows the commit-graph machinery to use the peeled value from
'$GIT_DIR/packed-refs' instead of having to load and parse tags.

While we're at it, discard non-commit objects reachable from ref tips.
This would be done automatically by 'fill_oids_from_commits()', but such
functionality will be removed in a subsequent patch after the call to
'lookup_commit_reference_gently' is dropped (at which point a non-commit
object in the commits oidset will become an error).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8f61256b0a..5c3fad0dd7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1327,11 +1327,15 @@ static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
+	struct object_id peeled;
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
 	display_progress(data->progress, oidset_size(data->commits) + 1);
 
-	oidset_insert(data->commits, oid);
+	if (peel_ref(refname, &peeled))
+		peeled = *oid;
+	if (oid_object_info(the_repository, &peeled, NULL) == OBJ_COMMIT)
+		oidset_insert(data->commits, &peeled);
 	return 0;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (2 preceding siblings ...)
  2020-05-05  1:13 ` [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-07 20:03   ` Jeff King
  2020-05-05  1:13 ` [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 54 +++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 15fe60317c..f550d8489a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -138,12 +138,25 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int read_one_commit(struct oidset *commits, char *hash)
+{
+	struct object_id oid;
+	const char *end;
+
+	if (parse_oid_hex(hash, &oid, &end)) {
+		error(_("unexpected non-hex object ID: %s"), hash);
+		return 1;
+	}
+
+	oidset_insert(commits, &oid);
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
-	struct string_list *pack_indexes = NULL;
+	struct string_list pack_indexes;
 	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
-	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
 
@@ -209,44 +222,35 @@ static int graph_write(int argc, const char **argv)
 		return 0;
 	}
 
-	string_list_init(&lines, 0);
+	string_list_init(&pack_indexes, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-
-		while (strbuf_getline(&buf, stdin) != EOF)
-			string_list_append(&lines, strbuf_detach(&buf, NULL));
-
-		if (opts.stdin_packs)
-			pack_indexes = &lines;
 		if (opts.stdin_commits) {
-			struct string_list_item *item;
-			oidset_init(&commits, lines.nr);
-			for_each_string_list_item(item, &lines) {
-				struct object_id oid;
-				const char *end;
-
-				if (parse_oid_hex(item->string, &oid, &end)) {
-					error(_("unexpected non-hex object ID: "
-						"%s"), item->string);
-					return 1;
-				}
-
-				oidset_insert(&commits, &oid);
-			}
+			oidset_init(&commits, 0);
 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
 
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			char *line = strbuf_detach(&buf, NULL);
+			if (opts.stdin_commits) {
+				int result = read_one_commit(&commits, line);
+				if (result)
+					return result;
+			} else
+				string_list_append(&pack_indexes, line);
+		}
+
 		UNLEAK(buf);
 	}
 
 	if (write_commit_graph(odb,
-			       pack_indexes,
+			       opts.stdin_packs ? &pack_indexes : NULL,
 			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;
 
-	UNLEAK(lines);
+	UNLEAK(pack_indexes);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (3 preceding siblings ...)
  2020-05-05  1:13 ` [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-05 12:01   ` Derrick Stolee
  2020-05-07 20:14   ` Jeff King
  2020-05-05  1:13 ` [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.

In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.

Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.

Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f550d8489a..9eec68572f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -6,6 +6,7 @@
 #include "repository.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "progress.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
@@ -138,8 +139,10 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
-static int read_one_commit(struct oidset *commits, char *hash)
+static int read_one_commit(struct oidset *commits, struct progress *progress,
+			   char *hash)
 {
+	struct commit *result;
 	struct object_id oid;
 	const char *end;
 
@@ -148,7 +151,15 @@ static int read_one_commit(struct oidset *commits, char *hash)
 		return 1;
 	}
 
-	oidset_insert(commits, &oid);
+	display_progress(progress, oidset_size(commits) + 1);
+
+	result = lookup_commit_reference_gently(the_repository, &oid, 1);
+	if (result)
+		oidset_insert(commits, &result->object.oid);
+	else {
+		error(_("invalid commit object id: %s"), hash);
+		return 1;
+	}
 	return 0;
 }
 
@@ -159,6 +170,7 @@ static int graph_write(int argc, const char **argv)
 	struct object_directory *odb = NULL;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
+	struct progress *progress = NULL;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -228,18 +240,25 @@ static int graph_write(int argc, const char **argv)
 		if (opts.stdin_commits) {
 			oidset_init(&commits, 0);
 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+			if (opts.progress)
+				progress = start_delayed_progress(
+					_("Analyzing commits from stdin"), 0);
 		}
 
 		while (strbuf_getline(&buf, stdin) != EOF) {
 			char *line = strbuf_detach(&buf, NULL);
 			if (opts.stdin_commits) {
-				int result = read_one_commit(&commits, line);
+				int result = read_one_commit(&commits, progress,
+							     line);
 				if (result)
 					return result;
 			} else
 				string_list_append(&pack_indexes, line);
 		}
 
+		if (progress)
+			stop_progress(&progress);
+
 		UNLEAK(buf);
 	}
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits'
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (4 preceding siblings ...)
  2020-05-05  1:13 ` [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
@ 2020-05-05  1:13 ` Taylor Blau
  2020-05-05 12:05   ` Derrick Stolee
  2020-05-07 20:21   ` Jeff King
  2020-05-05  1:14 ` [PATCH 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:13 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the previous handful of commits, both 'git commit-graph write
--reachable' and '--stdin-commits' learned to peel tags down to the
commits which they refer to before passing them into the commit-graph
internals.

This makes the call to 'lookup_commit_reference_gently()' inside of
'fill_oids_from_commits()' a noop, since all OIDs are commits by that
point.

As such, remove the call entirely, as well as the progress meter, which
has been split and moved out to the callers in the aforementioned
earlier commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 5c3fad0dd7..24c2d80935 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1411,46 +1411,19 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 static int fill_oids_from_commits(struct write_commit_graph_context *ctx,
 				  struct oidset *commits)
 {
-	uint32_t i = 0;
-	struct strbuf progress_title = STRBUF_INIT;
 	struct oidset_iter iter;
 	struct object_id *oid;
 
 	if (!oidset_size(commits))
 		return 0;
 
-	if (ctx->report_progress) {
-		strbuf_addf(&progress_title,
-			    Q_("Finding commits for commit graph from %d ref",
-			       "Finding commits for commit graph from %d refs",
-			       oidset_size(commits)),
-			    oidset_size(commits));
-		ctx->progress = start_delayed_progress(
-					progress_title.buf,
-					oidset_size(commits));
-	}
-
 	oidset_iter_init(commits, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
-		struct commit *result;
-
-		display_progress(ctx->progress, ++i);
-
-		result = lookup_commit_reference_gently(ctx->r, oid, 1);
-		if (result) {
-			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
-			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
-			ctx->oids.nr++;
-		} else if (ctx->check_oids) {
-			error(_("invalid commit object id: %s"),
-			      oid_to_hex(oid));
-			return -1;
-		}
+		ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+		oidcpy(&ctx->oids.list[ctx->oids.nr], oid);
+		ctx->oids.nr++;
 	}
 
-	stop_progress(&ctx->progress);
-	strbuf_release(&progress_title);
-
 	return 0;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 7/8] t5318: reorder test below 'graph_read_expect'
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (5 preceding siblings ...)
  2020-05-05  1:13 ` [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
@ 2020-05-05  1:14 ` Taylor Blau
  2020-05-05  1:14 ` [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:14 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the subsequent commit, we will introduce a dependency on
'graph_read_expect' from t5318.7. Preemptively move it below
'graph_read_expect()'s definition so that the test can call it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5318-commit-graph.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 39e2918a32..89020d3d44 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -42,15 +42,6 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
-test_expect_success 'exit with correct error on bad input to --stdin-commits' '
-	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
-'
-
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
@@ -91,6 +82,15 @@ graph_read_expect() {
 	test_cmp expect output
 }
 
+test_expect_success 'exit with correct error on bad input to --stdin-commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
+	# valid tree OID, but not a commit OID
+	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "invalid commit object id" stderr
+'
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (6 preceding siblings ...)
  2020-05-05  1:14 ` [PATCH 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
@ 2020-05-05  1:14 ` Taylor Blau
  2020-05-05 12:10   ` Derrick Stolee
  2020-05-07 20:40   ` Jeff King
  2020-05-05 11:35 ` [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Derrick Stolee
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05  1:14 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

    git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
    awk '/commit/ { print $1 }' |
    git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  6 ++++--
 builtin/commit-graph.c             | 10 +++++-----
 commit-graph.c                     |  2 --
 commit-graph.h                     | 10 ++++------
 t/t5318-commit-graph.sh            | 15 +++++++++++----
 5 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 53a650225a..fcac7d12e1 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
-of OIDs in hex, one OID per line. (Cannot be combined with
-`--stdin-packs` or `--reachable`.)
+of OIDs in hex, one OID per line. OIDs that resolve to non-commits
+(either directly, or by peeling tags) are silently ignored. OIDs that
+are malformed, or do not exist generate an error. (Cannot be combined
+with `--stdin-packs` or `--reachable`.)
 +
 With the `--reachable` option, generate the new commit graph by walking
 commits starting at all refs. (Cannot be combined with `--stdin-commits`
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9eec68572f..3637d079fb 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 
 	display_progress(progress, oidset_size(commits) + 1);
 
+	if (oid_object_info(the_repository, &oid, NULL) < 0) {
+		error(_("object %s does not exist"), hash);
+		return 1;
+	}
+
 	result = lookup_commit_reference_gently(the_repository, &oid, 1);
 	if (result)
 		oidset_insert(commits, &result->object.oid);
-	else {
-		error(_("invalid commit object id: %s"), hash);
-		return 1;
-	}
 	return 0;
 }
 
@@ -239,7 +240,6 @@ static int graph_write(int argc, const char **argv)
 		struct strbuf buf = STRBUF_INIT;
 		if (opts.stdin_commits) {
 			oidset_init(&commits, 0);
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 			if (opts.progress)
 				progress = start_delayed_progress(
 					_("Analyzing commits from stdin"), 0);
diff --git a/commit-graph.c b/commit-graph.c
index 24c2d80935..4e5539bac5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -880,7 +880,6 @@ struct write_commit_graph_context {
 	unsigned append:1,
 		 report_progress:1,
 		 split:1,
-		 check_oids:1,
 		 changed_paths:1,
 		 order_by_pack:1;
 
@@ -2001,7 +2000,6 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
 	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
-	ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
 	ctx->split_opts = split_opts;
 	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
 	ctx->total_bloom_filter_data_size = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 4212766a4f..659393e07f 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -88,12 +88,10 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 int generation_numbers_enabled(struct repository *r);
 
 enum commit_graph_write_flags {
-	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
-	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
-	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
-	/* Make sure that each OID in the input is a valid commit OID. */
-	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
-	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
+	COMMIT_GRAPH_WRITE_APPEND        = (1 << 0),
+	COMMIT_GRAPH_WRITE_PROGRESS      = (1 << 1),
+	COMMIT_GRAPH_WRITE_SPLIT         = (1 << 2),
+	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3)
 };
 
 enum commit_graph_split_flags {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 89020d3d44..74f93f0a17 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -84,11 +84,18 @@ graph_read_expect() {
 
 test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	# invalid, non-hex OID
+	echo HEAD >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
 	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
+	# non-existent OID
+	echo $ZERO_OID >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
+	test_i18ngrep "does not exist" stderr &&
+	# valid commit and tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits <in &&
+	graph_read_expect 3
 '
 
 test_expect_success 'write graph' '
-- 
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (7 preceding siblings ...)
  2020-05-05  1:14 ` [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
@ 2020-05-05 11:35 ` Derrick Stolee
  2020-05-05 16:11   ` Taylor Blau
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
  10 siblings, 1 reply; 54+ messages in thread
From: Derrick Stolee @ 2020-05-05 11:35 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff, szeder.dev

On 5/4/2020 9:13 PM, Taylor Blau wrote:
> Hi,
> 
> These are a few patches to address [1]

We seem to be missing links [1] and [2].

>   - It is often convenient to pipe the output of 'git for-each-ref' into
>     'git commit-graph write --stdin-commits', but handling multi-layer
>     tags is frustrating (at the very least, you have to peel them
>     yourself in 'for-each-ref', incurring a higher-than-necessary cost
>     outside of the commit-graph invocation)
> 
>   - It would be much more convenient if 'git commit-graph write
>     --stdin-commits' ignored OIDs that peel down to non-commit objects
>     silently, while still warning the caller about objects that (a)
>     don't exist, or (b) invalid hash inputs (for e.g., "HEAD", or
>     "refs/heads/foo").
> 
>   - By making this change, we are allowing *more* possible behaviors
>     than in previous versions, and the only case that we are breaking
>     (that 'git commit-graph write --stdin-commits' complains on
>     non-commit OIDs) can be addressed by peeling outside of
>     'commit-graph write' [2].

I agree that these are worthy goals. If someone was _depending_ on
the "git commit-graph write" command to _fail_, then they chose a
strange way to discover a ref that points to a non-commit.

> The first six patches move peeling refs out from the commit-graph
> internals, and into the callers.
> 
> This has no impact on the '--stdin-commits' case, but is a potential
> improvement for '--reachable', where it is more efficient to call
> 'peel_ref()' than 'lookup_commit_reference_gently()'. There is an
> intermediate state (that is resolved in the final patch) where we can
> have more than one progress meter (since the progress meter moved with
> the perhaps-expensive operations out of the commit-graph internals,
> too), but the final patch addresses this.

Interesting. I'll be sure to look carefully at this progress handling
as this has potential to lose our progress options. I expect that you
were careful about this.

> The last two patches lift the restriction on input to '--stdin-commits'
> resolving to commit objects, instead making the behavior to silently
> drop non-commit OIDs, and continue complain about invalid OIDs, and
> valid-but-missing OIDs.
> 
> Finally, this topic isn't based on anything that isn't already in
> 'master', so I think the days of many tangle commit-graph topics are
> behind us.

Woo!

-Stolee


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/8] commit-graph.c: show progress of finding reachable commits
  2020-05-05  1:13 ` [PATCH 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
@ 2020-05-05 11:50   ` Derrick Stolee
  2020-05-05 16:13     ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Derrick Stolee @ 2020-05-05 11:50 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff, szeder.dev

On 5/4/2020 9:13 PM, Taylor Blau wrote:
> When 'git commit-graph write --reachable' is invoked, the commit-graph
> machinery calls 'for_each_ref()' to discover the set of reachable
> commits.
> 
> Right now the 'add_ref_to_set' callback is not doing anything other than
> adding an OID to the set of known-reachable OIDs. In a subsequent
> commit, 'add_ref_to_set' will presumptively peel references. This
> operation should be fast for repositories with an up-to-date
> '$GIT_DIR/packed-refs', but may be slow in the general case.
> 
> So that it doesn't appear that 'git commit-graph write' is idling with
> '--reachable' in the slow case, add a progress meter to provide some
> output in the meantime.
> 
> In general, we don't expect a progress meter to appear at all, since
> peeling references with a 'packed-refs' file is quick. If it's slow and
> we do show a progress meter, the subsequent 'fill_oids_from_commits()'
> will be fast, since all of the calls to
> 'lookup_commit_reference_gently()' will be no-ops.
> 
> Both progress meters are delayed, so it is unlikely that more than one
> will appear. In either case, this intermediate state will go away in a
> handful of patches, at which point there will be at most one progress
> meter.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 00da281f39..8f61256b0a 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1320,6 +1320,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  
>  struct refs_cb_data {
>  	struct oidset *commits;
> +	struct progress *progress;
>  };
>  
>  static int add_ref_to_set(const char *refname,
> @@ -1328,6 +1329,8 @@ static int add_ref_to_set(const char *refname,
>  {
>  	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
>  
> +	display_progress(data->progress, oidset_size(data->commits) + 1);
> +
>  	oidset_insert(data->commits, oid);
>  	return 0;
>  }
> @@ -1342,12 +1345,17 @@ int write_commit_graph_reachable(struct object_directory *odb,
>  
>  	memset(&data, 0, sizeof(data));
>  	data.commits = &commits;
> +	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
> +		data.progress = start_delayed_progress(
> +			_("Finding reachable commits"), 0);

Is this the right phrase to use? This seems too close to the progress
indicator "Expanding reachable commits in commit graph" which is walking
commits.

An alternative could be: "Collecting referenced commits"

Outside of this string, I have no complaints. I also don't feel too
strongly about the string, either.

Thanks,
-Stolee

 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-05  1:13 ` [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
@ 2020-05-05 12:01   ` Derrick Stolee
  2020-05-05 16:14     ` Taylor Blau
  2020-05-07 20:14   ` Jeff King
  1 sibling, 1 reply; 54+ messages in thread
From: Derrick Stolee @ 2020-05-05 12:01 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff, szeder.dev

On 5/4/2020 9:13 PM, Taylor Blau wrote:
> @@ -228,18 +240,25 @@ static int graph_write(int argc, const char **argv)
>  		if (opts.stdin_commits) {
>  			oidset_init(&commits, 0);
>  			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
> +			if (opts.progress)
> +				progress = start_delayed_progress(
> +					_("Analyzing commits from stdin"), 0);
The code functions as you intend and is an improvement. Similar to my
earlier suggestion to use something like "Collecting referenced commits"
for the --reachable case, perhaps this could be "Collecting commits from input"?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits'
  2020-05-05  1:13 ` [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
@ 2020-05-05 12:05   ` Derrick Stolee
  2020-05-07 20:21   ` Jeff King
  1 sibling, 0 replies; 54+ messages in thread
From: Derrick Stolee @ 2020-05-05 12:05 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff, szeder.dev

On 5/4/2020 9:13 PM, Taylor Blau wrote:
> In the previous handful of commits, both 'git commit-graph write
> --reachable' and '--stdin-commits' learned to peel tags down to the
> commits which they refer to before passing them into the commit-graph
> internals.
> 
> This makes the call to 'lookup_commit_reference_gently()' inside of
> 'fill_oids_from_commits()' a noop, since all OIDs are commits by that
> point.
> 
> As such, remove the call entirely, as well as the progress meter, which
> has been split and moved out to the callers in the aforementioned
> earlier commits.
...
>  	oidset_iter_init(commits, &iter);
>  	while ((oid = oidset_iter_next(&iter))) {
[snip removed code]
> +		ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
> +		oidcpy(&ctx->oids.list[ctx->oids.nr], oid);
> +		ctx->oids.nr++;
>  	}

After looking at this loop, I wondered why we can't just use the
oidset inside the context struct. But then I realized that the
oids list expands with reachable commits and then is sorted
lexicographically. Thus, this is the best time to copy from the
oidset to the commit list.

I agree that dropping the progress is valuable here, since this
_should_ be a very fast operation even for enormous commit sets.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-05  1:14 ` [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
@ 2020-05-05 12:10   ` Derrick Stolee
  2020-05-05 16:16     ` Taylor Blau
  2020-05-07 20:40   ` Jeff King
  1 sibling, 1 reply; 54+ messages in thread
From: Derrick Stolee @ 2020-05-05 12:10 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff, szeder.dev

On 5/4/2020 9:14 PM, Taylor Blau wrote:
> Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
> 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
> receiving non-commit OIDs as input to '--stdin-commits'.
> 
> This behavior can be cumbersome to work around in, say, the case of
> piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
> the caller does not want to cull out non-commits themselves. In this
> situation, it would be ideal if 'git commit-graph write' wrote the graph
> containing the inputs that did pertain to commits, and silently ignored
> the remainder of the input.
> 
> Some options have been proposed to the effect of '--[no-]check-oids'
> which would allow callers to have the commit-graph builtin do just that.
> After some discussion, it is difficult to imagine a caller who wouldn't
> want to pass '--no-check-oids', suggesting that we should get rid of the
> behavior of complaining about non-commit inputs altogether.
> 
> If callers do wish to retain this behavior, they can easily work around
> this change by doing the following:
> 
>     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
>     awk '/commit/ { print $1 }' |
>     git commit-graph write --stdin-commits

I appreciate that you included the workaround here for posterity. That
allows anyone complaining to bisect to these instructions.

> To make it so that valid OIDs that refer to non-existent objects are
> indeed an error after loosening the error handling, perform an extra
> lookup to make sure that object indeed exists before sending it to the
> commit-graph internals.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-commit-graph.txt |  6 ++++--
>  builtin/commit-graph.c             | 10 +++++-----
>  commit-graph.c                     |  2 --
>  commit-graph.h                     | 10 ++++------
>  t/t5318-commit-graph.sh            | 15 +++++++++++----
>  5 files changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 53a650225a..fcac7d12e1 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
>  +
>  With the `--stdin-commits` option, generate the new commit graph by
>  walking commits starting at the commits specified in stdin as a list
> -of OIDs in hex, one OID per line. (Cannot be combined with
> -`--stdin-packs` or `--reachable`.)
> +of OIDs in hex, one OID per line. OIDs that resolve to non-commits
> +(either directly, or by peeling tags) are silently ignored. OIDs that
> +are malformed, or do not exist generate an error. (Cannot be combined
> +with `--stdin-packs` or `--reachable`.)
>  +
>  With the `--reachable` option, generate the new commit graph by walking
>  commits starting at all refs. (Cannot be combined with `--stdin-commits`
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 9eec68572f..3637d079fb 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
>  
>  	display_progress(progress, oidset_size(commits) + 1);
>  
> +	if (oid_object_info(the_repository, &oid, NULL) < 0) {
> +		error(_("object %s does not exist"), hash);
> +		return 1;
> +	}
> +

If we get a non-existent object, then this will cause us to fail
the command, right?

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 89020d3d44..74f93f0a17 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -84,11 +84,18 @@ graph_read_expect() {
>  
>  test_expect_success 'exit with correct error on bad input to --stdin-commits' '
>  	cd "$TRASH_DIRECTORY/full" &&
> -	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
> +	# invalid, non-hex OID
> +	echo HEAD >in &&
> +	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
>  	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
> -	# valid tree OID, but not a commit OID
> -	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
> -	test_i18ngrep "invalid commit object id" stderr
> +	# non-existent OID
> +	echo $ZERO_OID >in &&
> +	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
> +	test_i18ngrep "does not exist" stderr &&

And here you verify that it fails at that point. Excellent!

> +	# valid commit and tree OID
> +	git rev-parse HEAD HEAD^{tree} >in &&
> +	git commit-graph write --stdin-commits <in &&
> +	graph_read_expect 3
>  '

This is an excellent series! Thanks.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-05 11:35 ` [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Derrick Stolee
@ 2020-05-05 16:11   ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05 16:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff, szeder.dev

On Tue, May 05, 2020 at 07:35:29AM -0400, Derrick Stolee wrote:
> On 5/4/2020 9:13 PM, Taylor Blau wrote:
> > Hi,
> >
> > These are a few patches to address [1]
>
> We seem to be missing links [1] and [2].

Whoops. I wrote the cover letter and then postponed it for an hour or
two while I shuffled (what are now) the last two patches around. It
looks like I dropped the hyperlinks in the meantime, but here they are
now:

[1]: https://lore.kernel.org/git/20200501223848.GH41612@syl.local/
[2]: https://lore.kernel.org/git/20200504145937.GA11373@coredump.intra.peff.net/

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 2/8] commit-graph.c: show progress of finding reachable commits
  2020-05-05 11:50   ` Derrick Stolee
@ 2020-05-05 16:13     ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05 16:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff, szeder.dev

On Tue, May 05, 2020 at 07:50:50AM -0400, Derrick Stolee wrote:
> On 5/4/2020 9:13 PM, Taylor Blau wrote:
> > When 'git commit-graph write --reachable' is invoked, the commit-graph
> > machinery calls 'for_each_ref()' to discover the set of reachable
> > commits.
> >
> > Right now the 'add_ref_to_set' callback is not doing anything other than
> > adding an OID to the set of known-reachable OIDs. In a subsequent
> > commit, 'add_ref_to_set' will presumptively peel references. This
> > operation should be fast for repositories with an up-to-date
> > '$GIT_DIR/packed-refs', but may be slow in the general case.
> >
> > So that it doesn't appear that 'git commit-graph write' is idling with
> > '--reachable' in the slow case, add a progress meter to provide some
> > output in the meantime.
> >
> > In general, we don't expect a progress meter to appear at all, since
> > peeling references with a 'packed-refs' file is quick. If it's slow and
> > we do show a progress meter, the subsequent 'fill_oids_from_commits()'
> > will be fast, since all of the calls to
> > 'lookup_commit_reference_gently()' will be no-ops.
> >
> > Both progress meters are delayed, so it is unlikely that more than one
> > will appear. In either case, this intermediate state will go away in a
> > handful of patches, at which point there will be at most one progress
> > meter.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  commit-graph.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 00da281f39..8f61256b0a 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1320,6 +1320,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
> >
> >  struct refs_cb_data {
> >  	struct oidset *commits;
> > +	struct progress *progress;
> >  };
> >
> >  static int add_ref_to_set(const char *refname,
> > @@ -1328,6 +1329,8 @@ static int add_ref_to_set(const char *refname,
> >  {
> >  	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
> >
> > +	display_progress(data->progress, oidset_size(data->commits) + 1);
> > +
> >  	oidset_insert(data->commits, oid);
> >  	return 0;
> >  }
> > @@ -1342,12 +1345,17 @@ int write_commit_graph_reachable(struct object_directory *odb,
> >
> >  	memset(&data, 0, sizeof(data));
> >  	data.commits = &commits;
> > +	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
> > +		data.progress = start_delayed_progress(
> > +			_("Finding reachable commits"), 0);
>
> Is this the right phrase to use? This seems too close to the progress
> indicator "Expanding reachable commits in commit graph" which is walking
> commits.
>
> An alternative could be: "Collecting referenced commits"

That's a great suggestion. I'll apply that locally. For what it's worth,
I don't feel that strongly about it either, but it's not hard to change
;).

> Outside of this string, I have no complaints. I also don't feel too
> strongly about the string, either.
>
> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-05 12:01   ` Derrick Stolee
@ 2020-05-05 16:14     ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05 16:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff, szeder.dev

On Tue, May 05, 2020 at 08:01:29AM -0400, Derrick Stolee wrote:
> On 5/4/2020 9:13 PM, Taylor Blau wrote:
> > @@ -228,18 +240,25 @@ static int graph_write(int argc, const char **argv)
> >  		if (opts.stdin_commits) {
> >  			oidset_init(&commits, 0);
> >  			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
> > +			if (opts.progress)
> > +				progress = start_delayed_progress(
> > +					_("Analyzing commits from stdin"), 0);
> The code functions as you intend and is an improvement. Similar to my
> earlier suggestion to use something like "Collecting referenced commits"
> for the --reachable case, perhaps this could be "Collecting commits from input"?

Yep, making these consistent with one another is a good thing to do,
thanks.

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-05 12:10   ` Derrick Stolee
@ 2020-05-05 16:16     ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-05 16:16 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, gitster, peff, szeder.dev

On Tue, May 05, 2020 at 08:10:40AM -0400, Derrick Stolee wrote:
> On 5/4/2020 9:14 PM, Taylor Blau wrote:
> > Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
> > 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
> > receiving non-commit OIDs as input to '--stdin-commits'.
> >
> > This behavior can be cumbersome to work around in, say, the case of
> > piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
> > the caller does not want to cull out non-commits themselves. In this
> > situation, it would be ideal if 'git commit-graph write' wrote the graph
> > containing the inputs that did pertain to commits, and silently ignored
> > the remainder of the input.
> >
> > Some options have been proposed to the effect of '--[no-]check-oids'
> > which would allow callers to have the commit-graph builtin do just that.
> > After some discussion, it is difficult to imagine a caller who wouldn't
> > want to pass '--no-check-oids', suggesting that we should get rid of the
> > behavior of complaining about non-commit inputs altogether.
> >
> > If callers do wish to retain this behavior, they can easily work around
> > this change by doing the following:
> >
> >     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
> >     awk '/commit/ { print $1 }' |
> >     git commit-graph write --stdin-commits
>
> I appreciate that you included the workaround here for posterity. That
> allows anyone complaining to bisect to these instructions.

Well, now that I linked provided a link to [2] in the cover letter,
you'll see that it was Peff's suggestion ;).

But, I wrote something similar to this in one of GitHub's
post-processing jobs before we had this series, so I think I can take
some of the credit!

> > To make it so that valid OIDs that refer to non-existent objects are
> > indeed an error after loosening the error handling, perform an extra
> > lookup to make sure that object indeed exists before sending it to the
> > commit-graph internals.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Documentation/git-commit-graph.txt |  6 ++++--
> >  builtin/commit-graph.c             | 10 +++++-----
> >  commit-graph.c                     |  2 --
> >  commit-graph.h                     | 10 ++++------
> >  t/t5318-commit-graph.sh            | 15 +++++++++++----
> >  5 files changed, 24 insertions(+), 19 deletions(-)
> >
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 53a650225a..fcac7d12e1 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
> >  +
> >  With the `--stdin-commits` option, generate the new commit graph by
> >  walking commits starting at the commits specified in stdin as a list
> > -of OIDs in hex, one OID per line. (Cannot be combined with
> > -`--stdin-packs` or `--reachable`.)
> > +of OIDs in hex, one OID per line. OIDs that resolve to non-commits
> > +(either directly, or by peeling tags) are silently ignored. OIDs that
> > +are malformed, or do not exist generate an error. (Cannot be combined
> > +with `--stdin-packs` or `--reachable`.)
> >  +
> >  With the `--reachable` option, generate the new commit graph by walking
> >  commits starting at all refs. (Cannot be combined with `--stdin-commits`
> > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> > index 9eec68572f..3637d079fb 100644
> > --- a/builtin/commit-graph.c
> > +++ b/builtin/commit-graph.c
> > @@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
> >
> >  	display_progress(progress, oidset_size(commits) + 1);
> >
> > +	if (oid_object_info(the_repository, &oid, NULL) < 0) {
> > +		error(_("object %s does not exist"), hash);
> > +		return 1;
> > +	}
> > +
>
> If we get a non-existent object, then this will cause us to fail
> the command, right?
>
> > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> > index 89020d3d44..74f93f0a17 100755
> > --- a/t/t5318-commit-graph.sh
> > +++ b/t/t5318-commit-graph.sh
> > @@ -84,11 +84,18 @@ graph_read_expect() {
> >
> >  test_expect_success 'exit with correct error on bad input to --stdin-commits' '
> >  	cd "$TRASH_DIRECTORY/full" &&
> > -	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
> > +	# invalid, non-hex OID
> > +	echo HEAD >in &&
> > +	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
> >  	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
> > -	# valid tree OID, but not a commit OID
> > -	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
> > -	test_i18ngrep "invalid commit object id" stderr
> > +	# non-existent OID
> > +	echo $ZERO_OID >in &&
> > +	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
> > +	test_i18ngrep "does not exist" stderr &&
>
> And here you verify that it fails at that point. Excellent!
>
> > +	# valid commit and tree OID
> > +	git rev-parse HEAD HEAD^{tree} >in &&
> > +	git commit-graph write --stdin-commits <in &&
> > +	graph_read_expect 3
> >  '
>
> This is an excellent series! Thanks.
>
> Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

Thanks for your review. I applied your suggested changes locally (which
came down to tweaking the progress meter's messages in the 2nd and 5th
patches), but I'll sit on them for a day or so to make sure that nobody
else has any thoughts before sending a v2.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (8 preceding siblings ...)
  2020-05-05 11:35 ` [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Derrick Stolee
@ 2020-05-06  0:07 ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
                     ` (8 more replies)
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
  10 siblings, 9 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Hi,

Here's a small re-roll of my series to move peeling outside of the
commit-graph internals, and drop the CHECK_OIDS flag. This re-roll was
promised in [1], and only updates the messages used in the new progress
meters. For convenience, a range-diff from v1 is included below.

Thanks in advance for another review.

[1]: https://lore.kernel.org/git/20200505161649.GG69300@syl.local/

Taylor Blau (8):
  commit-graph.c: extract 'refs_cb_data'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: peel refs in 'add_ref_to_set'
  builtin/commit-graph.c: extract 'read_one_commit()'
  builtin/commit-graph.c: dereference tags in builtin
  commit-graph.c: simplify 'fill_oids_from_commits'
  t5318: reorder test below 'graph_read_expect'
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag

 Documentation/git-commit-graph.txt |  6 ++-
 builtin/commit-graph.c             | 73 ++++++++++++++++++++----------
 commit-graph.c                     | 61 +++++++++++--------------
 commit-graph.h                     | 10 ++--
 t/t5318-commit-graph.sh            | 25 ++++++----
 5 files changed, 98 insertions(+), 77 deletions(-)

Range-diff against v1:
1:  5bdbeaf374 ! 1:  cb56a9a73b commit-graph.c: show progress of finding reachable commits
    @@ commit-graph.c: int write_commit_graph_reachable(struct object_directory *odb,
      	data.commits = &commits;
     +	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
     +		data.progress = start_delayed_progress(
    -+			_("Finding reachable commits"), 0);
    ++			_("Collecting referenced commits"), 0);

      	for_each_ref(add_ref_to_set, &data);
      	result = write_commit_graph(odb, NULL, &commits,
2:  5ff56feab5 = 2:  85c388a077 commit-graph.c: peel refs in 'add_ref_to_set'
3:  9ae8745dc0 = 3:  cef441b465 builtin/commit-graph.c: extract 'read_one_commit()'
4:  513a634f14 ! 4:  d449d83ce2 builtin/commit-graph.c: dereference tags in builtin
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
      			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
     +			if (opts.progress)
     +				progress = start_delayed_progress(
    -+					_("Analyzing commits from stdin"), 0);
    ++					_("Collecting commits from input"), 0);
      		}

      		while (strbuf_getline(&buf, stdin) != EOF) {
5:  7e9d8c1f1a = 5:  61887870c7 commit-graph.c: simplify 'fill_oids_from_commits'
6:  f2f018b54c = 6:  e393b16097 t5318: reorder test below 'graph_read_expect'
7:  6c2d130b0c ! 7:  ad373f05ff commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
     -			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
      			if (opts.progress)
      				progress = start_delayed_progress(
    - 					_("Analyzing commits from stdin"), 0);
    + 					_("Collecting commits from input"), 0);

      ## commit-graph.c ##
     @@ commit-graph.c: struct write_commit_graph_context {
--
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 1/8] commit-graph.c: extract 'refs_cb_data'
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In subsequent patches, we are going to update a progress meter when
'add_ref_to_set()' is called, and need a convenient way to pass a
'struct progress *' in from the caller.

Introduce 'refs_cb_data' as a catch-all for parameters that
'add_ref_to_set' may need, and wrap the existing single parameter in
that struct.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6dc777e2f3..00da281f39 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1318,13 +1318,17 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	stop_progress(&progress);
 }
 
+struct refs_cb_data {
+	struct oidset *commits;
+};
+
 static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
-	struct oidset *commits = (struct oidset *)cb_data;
+	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
-	oidset_insert(commits, oid);
+	oidset_insert(data->commits, oid);
 	return 0;
 }
 
@@ -1333,9 +1337,13 @@ int write_commit_graph_reachable(struct object_directory *odb,
 				 const struct split_commit_graph_opts *split_opts)
 {
 	struct oidset commits = OIDSET_INIT;
+	struct refs_cb_data data;
 	int result;
 
-	for_each_ref(add_ref_to_set, &commits);
+	memset(&data, 0, sizeof(data));
+	data.commits = &commits;
+
+	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 2/8] commit-graph.c: show progress of finding reachable commits
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When 'git commit-graph write --reachable' is invoked, the commit-graph
machinery calls 'for_each_ref()' to discover the set of reachable
commits.

Right now the 'add_ref_to_set' callback is not doing anything other than
adding an OID to the set of known-reachable OIDs. In a subsequent
commit, 'add_ref_to_set' will presumptively peel references. This
operation should be fast for repositories with an up-to-date
'$GIT_DIR/packed-refs', but may be slow in the general case.

So that it doesn't appear that 'git commit-graph write' is idling with
'--reachable' in the slow case, add a progress meter to provide some
output in the meantime.

In general, we don't expect a progress meter to appear at all, since
peeling references with a 'packed-refs' file is quick. If it's slow and
we do show a progress meter, the subsequent 'fill_oids_from_commits()'
will be fast, since all of the calls to
'lookup_commit_reference_gently()' will be no-ops.

Both progress meters are delayed, so it is unlikely that more than one
will appear. In either case, this intermediate state will go away in a
handful of patches, at which point there will be at most one progress
meter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 00da281f39..d0397f2a23 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1320,6 +1320,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 
 struct refs_cb_data {
 	struct oidset *commits;
+	struct progress *progress;
 };
 
 static int add_ref_to_set(const char *refname,
@@ -1328,6 +1329,8 @@ static int add_ref_to_set(const char *refname,
 {
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
+	display_progress(data->progress, oidset_size(data->commits) + 1);
+
 	oidset_insert(data->commits, oid);
 	return 0;
 }
@@ -1342,12 +1345,17 @@ int write_commit_graph_reachable(struct object_directory *odb,
 
 	memset(&data, 0, sizeof(data));
 	data.commits = &commits;
+	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
+		data.progress = start_delayed_progress(
+			_("Collecting referenced commits"), 0);
 
 	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
 	oidset_clear(&commits);
+	if (data.progress)
+		stop_progress(&data.progress);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

While iterating references (to discover the set of commits to write to
the commit-graph with 'git commit-graph write --reachable'),
'add_ref_to_set' can save 'fill_oids_from_commits()' some time by
peeling the references beforehand.

Move peeling out of 'fill_oids_from_commits()' and into
'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing
so allows the commit-graph machinery to use the peeled value from
'$GIT_DIR/packed-refs' instead of having to load and parse tags.

While we're at it, discard non-commit objects reachable from ref tips.
This would be done automatically by 'fill_oids_from_commits()', but such
functionality will be removed in a subsequent patch after the call to
'lookup_commit_reference_gently' is dropped (at which point a non-commit
object in the commits oidset will become an error).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index d0397f2a23..2c3b5fd09d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1327,11 +1327,15 @@ static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
+	struct object_id peeled;
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
 	display_progress(data->progress, oidset_size(data->commits) + 1);
 
-	oidset_insert(data->commits, oid);
+	if (peel_ref(refname, &peeled))
+		peeled = *oid;
+	if (oid_object_info(the_repository, &peeled, NULL) == OBJ_COMMIT)
+		oidset_insert(data->commits, &peeled);
 	return 0;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (2 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 54 +++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 15fe60317c..f550d8489a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -138,12 +138,25 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int read_one_commit(struct oidset *commits, char *hash)
+{
+	struct object_id oid;
+	const char *end;
+
+	if (parse_oid_hex(hash, &oid, &end)) {
+		error(_("unexpected non-hex object ID: %s"), hash);
+		return 1;
+	}
+
+	oidset_insert(commits, &oid);
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
-	struct string_list *pack_indexes = NULL;
+	struct string_list pack_indexes;
 	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
-	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
 
@@ -209,44 +222,35 @@ static int graph_write(int argc, const char **argv)
 		return 0;
 	}
 
-	string_list_init(&lines, 0);
+	string_list_init(&pack_indexes, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-
-		while (strbuf_getline(&buf, stdin) != EOF)
-			string_list_append(&lines, strbuf_detach(&buf, NULL));
-
-		if (opts.stdin_packs)
-			pack_indexes = &lines;
 		if (opts.stdin_commits) {
-			struct string_list_item *item;
-			oidset_init(&commits, lines.nr);
-			for_each_string_list_item(item, &lines) {
-				struct object_id oid;
-				const char *end;
-
-				if (parse_oid_hex(item->string, &oid, &end)) {
-					error(_("unexpected non-hex object ID: "
-						"%s"), item->string);
-					return 1;
-				}
-
-				oidset_insert(&commits, &oid);
-			}
+			oidset_init(&commits, 0);
 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
 
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			char *line = strbuf_detach(&buf, NULL);
+			if (opts.stdin_commits) {
+				int result = read_one_commit(&commits, line);
+				if (result)
+					return result;
+			} else
+				string_list_append(&pack_indexes, line);
+		}
+
 		UNLEAK(buf);
 	}
 
 	if (write_commit_graph(odb,
-			       pack_indexes,
+			       opts.stdin_packs ? &pack_indexes : NULL,
 			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;
 
-	UNLEAK(lines);
+	UNLEAK(pack_indexes);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (3 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.

In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.

Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.

Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f550d8489a..6537e9acef 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -6,6 +6,7 @@
 #include "repository.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "progress.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
@@ -138,8 +139,10 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
-static int read_one_commit(struct oidset *commits, char *hash)
+static int read_one_commit(struct oidset *commits, struct progress *progress,
+			   char *hash)
 {
+	struct commit *result;
 	struct object_id oid;
 	const char *end;
 
@@ -148,7 +151,15 @@ static int read_one_commit(struct oidset *commits, char *hash)
 		return 1;
 	}
 
-	oidset_insert(commits, &oid);
+	display_progress(progress, oidset_size(commits) + 1);
+
+	result = lookup_commit_reference_gently(the_repository, &oid, 1);
+	if (result)
+		oidset_insert(commits, &result->object.oid);
+	else {
+		error(_("invalid commit object id: %s"), hash);
+		return 1;
+	}
 	return 0;
 }
 
@@ -159,6 +170,7 @@ static int graph_write(int argc, const char **argv)
 	struct object_directory *odb = NULL;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
+	struct progress *progress = NULL;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -228,18 +240,25 @@ static int graph_write(int argc, const char **argv)
 		if (opts.stdin_commits) {
 			oidset_init(&commits, 0);
 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+			if (opts.progress)
+				progress = start_delayed_progress(
+					_("Collecting commits from input"), 0);
 		}
 
 		while (strbuf_getline(&buf, stdin) != EOF) {
 			char *line = strbuf_detach(&buf, NULL);
 			if (opts.stdin_commits) {
-				int result = read_one_commit(&commits, line);
+				int result = read_one_commit(&commits, progress,
+							     line);
 				if (result)
 					return result;
 			} else
 				string_list_append(&pack_indexes, line);
 		}
 
+		if (progress)
+			stop_progress(&progress);
+
 		UNLEAK(buf);
 	}
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 6/8] commit-graph.c: simplify 'fill_oids_from_commits'
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (4 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the previous handful of commits, both 'git commit-graph write
--reachable' and '--stdin-commits' learned to peel tags down to the
commits which they refer to before passing them into the commit-graph
internals.

This makes the call to 'lookup_commit_reference_gently()' inside of
'fill_oids_from_commits()' a noop, since all OIDs are commits by that
point.

As such, remove the call entirely, as well as the progress meter, which
has been split and moved out to the callers in the aforementioned
earlier commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2c3b5fd09d..56a4a43b30 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1411,46 +1411,19 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 static int fill_oids_from_commits(struct write_commit_graph_context *ctx,
 				  struct oidset *commits)
 {
-	uint32_t i = 0;
-	struct strbuf progress_title = STRBUF_INIT;
 	struct oidset_iter iter;
 	struct object_id *oid;
 
 	if (!oidset_size(commits))
 		return 0;
 
-	if (ctx->report_progress) {
-		strbuf_addf(&progress_title,
-			    Q_("Finding commits for commit graph from %d ref",
-			       "Finding commits for commit graph from %d refs",
-			       oidset_size(commits)),
-			    oidset_size(commits));
-		ctx->progress = start_delayed_progress(
-					progress_title.buf,
-					oidset_size(commits));
-	}
-
 	oidset_iter_init(commits, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
-		struct commit *result;
-
-		display_progress(ctx->progress, ++i);
-
-		result = lookup_commit_reference_gently(ctx->r, oid, 1);
-		if (result) {
-			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
-			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
-			ctx->oids.nr++;
-		} else if (ctx->check_oids) {
-			error(_("invalid commit object id: %s"),
-			      oid_to_hex(oid));
-			return -1;
-		}
+		ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+		oidcpy(&ctx->oids.list[ctx->oids.nr], oid);
+		ctx->oids.nr++;
 	}
 
-	stop_progress(&ctx->progress);
-	strbuf_release(&progress_title);
-
 	return 0;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 7/8] t5318: reorder test below 'graph_read_expect'
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (5 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-06  0:07   ` [PATCH v2 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
  2020-05-07 20:42   ` [PATCH v2 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the subsequent commit, we will introduce a dependency on
'graph_read_expect' from t5318.7. Preemptively move it below
'graph_read_expect()'s definition so that the test can call it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5318-commit-graph.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 39e2918a32..89020d3d44 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -42,15 +42,6 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
-test_expect_success 'exit with correct error on bad input to --stdin-commits' '
-	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
-'
-
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
@@ -91,6 +82,15 @@ graph_read_expect() {
 	test_cmp expect output
 }
 
+test_expect_success 'exit with correct error on bad input to --stdin-commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
+	# valid tree OID, but not a commit OID
+	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "invalid commit object id" stderr
+'
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (6 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
@ 2020-05-06  0:07   ` Taylor Blau
  2020-05-07 20:42   ` [PATCH v2 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-06  0:07 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

    git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
    awk '/commit/ { print $1 }' |
    git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  6 ++++--
 builtin/commit-graph.c             | 10 +++++-----
 commit-graph.c                     |  2 --
 commit-graph.h                     | 10 ++++------
 t/t5318-commit-graph.sh            | 15 +++++++++++----
 5 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 53a650225a..fcac7d12e1 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
-of OIDs in hex, one OID per line. (Cannot be combined with
-`--stdin-packs` or `--reachable`.)
+of OIDs in hex, one OID per line. OIDs that resolve to non-commits
+(either directly, or by peeling tags) are silently ignored. OIDs that
+are malformed, or do not exist generate an error. (Cannot be combined
+with `--stdin-packs` or `--reachable`.)
 +
 With the `--reachable` option, generate the new commit graph by walking
 commits starting at all refs. (Cannot be combined with `--stdin-commits`
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 6537e9acef..a4685a1553 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 
 	display_progress(progress, oidset_size(commits) + 1);
 
+	if (oid_object_info(the_repository, &oid, NULL) < 0) {
+		error(_("object %s does not exist"), hash);
+		return 1;
+	}
+
 	result = lookup_commit_reference_gently(the_repository, &oid, 1);
 	if (result)
 		oidset_insert(commits, &result->object.oid);
-	else {
-		error(_("invalid commit object id: %s"), hash);
-		return 1;
-	}
 	return 0;
 }
 
@@ -239,7 +240,6 @@ static int graph_write(int argc, const char **argv)
 		struct strbuf buf = STRBUF_INIT;
 		if (opts.stdin_commits) {
 			oidset_init(&commits, 0);
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 			if (opts.progress)
 				progress = start_delayed_progress(
 					_("Collecting commits from input"), 0);
diff --git a/commit-graph.c b/commit-graph.c
index 56a4a43b30..3c3ada29b1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -880,7 +880,6 @@ struct write_commit_graph_context {
 	unsigned append:1,
 		 report_progress:1,
 		 split:1,
-		 check_oids:1,
 		 changed_paths:1,
 		 order_by_pack:1;
 
@@ -2001,7 +2000,6 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
 	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
-	ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
 	ctx->split_opts = split_opts;
 	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
 	ctx->total_bloom_filter_data_size = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 4212766a4f..659393e07f 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -88,12 +88,10 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 int generation_numbers_enabled(struct repository *r);
 
 enum commit_graph_write_flags {
-	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
-	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
-	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
-	/* Make sure that each OID in the input is a valid commit OID. */
-	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
-	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
+	COMMIT_GRAPH_WRITE_APPEND        = (1 << 0),
+	COMMIT_GRAPH_WRITE_PROGRESS      = (1 << 1),
+	COMMIT_GRAPH_WRITE_SPLIT         = (1 << 2),
+	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3)
 };
 
 enum commit_graph_split_flags {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 89020d3d44..74f93f0a17 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -84,11 +84,18 @@ graph_read_expect() {
 
 test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	# invalid, non-hex OID
+	echo HEAD >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
 	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
+	# non-existent OID
+	echo $ZERO_OID >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
+	test_i18ngrep "does not exist" stderr &&
+	# valid commit and tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits <in &&
+	graph_read_expect 3
 '
 
 test_expect_success 'write graph' '
-- 
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-05  1:13 ` [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
@ 2020-05-07 19:54   ` Jeff King
  2020-05-13 19:48     ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2020-05-07 19:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Mon, May 04, 2020 at 07:13:43PM -0600, Taylor Blau wrote:

> While iterating references (to discover the set of commits to write to
> the commit-graph with 'git commit-graph write --reachable'),
> 'add_ref_to_set' can save 'fill_oids_from_commits()' some time by
> peeling the references beforehand.
> 
> Move peeling out of 'fill_oids_from_commits()' and into
> 'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing
> so allows the commit-graph machinery to use the peeled value from
> '$GIT_DIR/packed-refs' instead of having to load and parse tags.

Or having to load and parse commits only to find out that they're not
tags. :)

> diff --git a/commit-graph.c b/commit-graph.c
> index 8f61256b0a..5c3fad0dd7 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1327,11 +1327,15 @@ static int add_ref_to_set(const char *refname,
>  			  const struct object_id *oid,
>  			  int flags, void *cb_data)
>  {
> +	struct object_id peeled;
>  	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
>  
>  	display_progress(data->progress, oidset_size(data->commits) + 1);
>  
> -	oidset_insert(data->commits, oid);
> +	if (peel_ref(refname, &peeled))
> +		peeled = *oid;

It may be the old-timer C programmer in me, but I always look slightly
suspicious at struct assignments. We know that object_id doesn't need a
deep copy, so it's obviously OK here. But should we use oidcpy() as a
style thing?

Alternatively, you could do this without a struct copy at all with:

  if (!peel_ref(...))
         oid = peeled;
  oidset_insert(..., oid);

which is actually a bit cheaper.

> +	if (oid_object_info(the_repository, &peeled, NULL) == OBJ_COMMIT)
> +		oidset_insert(data->commits, &peeled);

I probably would have left adding this "if" until a later step, but I
think it's OK here.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-05  1:13 ` [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
@ 2020-05-07 20:03   ` Jeff King
  2020-05-13 20:01     ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2020-05-07 20:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Mon, May 04, 2020 at 07:13:46PM -0600, Taylor Blau wrote:

> In the case of '--stdin-commits', feed each line to a new
> 'read_one_commit' helper, which (for now) will merely call
> 'parse_oid_hex'.

Makes sense.

> +static int read_one_commit(struct oidset *commits, char *hash)

This could be "const char *hash", couldn't it?

> +	struct object_id oid;
> +	const char *end;
> +
> +	if (parse_oid_hex(hash, &oid, &end)) {
> +		error(_("unexpected non-hex object ID: %s"), hash);
> +		return 1;
> +	}

Returning "-1" for error is more idiomatic in our code base (though I
know some of the commit-graph code doesn't follow that, I think we
should slowly try to move it back in the other direction.

> +		while (strbuf_getline(&buf, stdin) != EOF) {
> +			char *line = strbuf_detach(&buf, NULL);
> +			if (opts.stdin_commits) {
> +				int result = read_one_commit(&commits, line);
> +				if (result)
> +					return result;
> +			} else
> +				string_list_append(&pack_indexes, line);
> +		}

This leaks "line" for each commit in stdin_commits mode (it used to get
added to a string list). I think you want:

  while (strbuf_getline(&buf, stdin) != EOF) {
        if (opts.stdin_commits) {
	        if (read_one_commit(&commits, buf.buf)) {
			strbuf_release(&buf);
			return 1;
		}
	} else {
	        string_list_append(&pack_indexes, strbuf_detach(&buf));
	}
  }

Though I think it might be easier to follow if each mode simply has its
own while loop.

> +
>  		UNLEAK(buf);

Not new in your patch, but this UNLEAK() has always bugged me. ;) Why
not just strbuf_release() it?

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-05  1:13 ` [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
  2020-05-05 12:01   ` Derrick Stolee
@ 2020-05-07 20:14   ` Jeff King
  1 sibling, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-07 20:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Mon, May 04, 2020 at 07:13:49PM -0600, Taylor Blau wrote:

> When given a list of commits, the commit-graph machinery calls
> 'lookup_commit_reference_gently()' on each element in the set and treats
> the resulting set of OIDs as the base over which to close for
> reachability.
> 
> In an earlier collection of commits, the 'git commit-graph write
> --reachable' case made the inner-most call to
> 'lookup_commit_reference_gently()' by peeling references before they
> were passed over to the commit-graph internals.
> 
> Do the analog for 'git commit-graph write --stdin-commits' by calling
> 'lookup_commit_reference_gently()' outside of the commit-graph
> machinery, making the inner-most call a noop.

Yep, I think this is a good direction.

> @@ -148,7 +151,15 @@ static int read_one_commit(struct oidset *commits, char *hash)
>  		return 1;
>  	}
>  
> -	oidset_insert(commits, &oid);
> +	display_progress(progress, oidset_size(commits) + 1);

Most of our meters increment progress _after_ doing work.

This is especially important for meters with percentages (if we knew we
had 1 commit, we'd print "100%" and _then_ start to peel it, which is
silly).

For this instance we don't know the total number, so we're just counting
up. But I think we should be consistent about when we update meters.
Plus it's shorter to say:

  display_progress(progress, oidset_size(commits));

after having done the work. ;)

> +	result = lookup_commit_reference_gently(the_repository, &oid, 1);

Would we want to pass quiet==0 here? If we see an error we're going to
bail loudly below, so getting more details from the low-level functions
is helpful (I think the only one you'd really get is "I'm looking for a
commit, but it's a tree" or similar).

lookup_commit_reference_gently() is pretty aggressive about parsing
objects. We'll have to parse commits eventually, but we could possibly
do so using their graph representations. It may not be worth optimizing,
because it would only matter if you fed a lot of --stdin-commits inputs
that were already graphed. (And if it is worth optimizing, it should
probably come in a separate commit anyway; this is just about moving the
existing peeling).

> +	if (result)
> +		oidset_insert(commits, &result->object.oid);
> +	else {
> +		error(_("invalid commit object id: %s"), hash);
> +		return 1;
> +	}

If you follow my "return -1" suggestion from earlier, this would need
it, too.

>  		while (strbuf_getline(&buf, stdin) != EOF) {
>  			char *line = strbuf_detach(&buf, NULL);
>  			if (opts.stdin_commits) {
> -				int result = read_one_commit(&commits, line);
> +				int result = read_one_commit(&commits, progress,
> +							     line);
>  				if (result)
>  					return result;
>  			} else
>  				string_list_append(&pack_indexes, line);
>  		}
>  
> +		if (progress)
> +			stop_progress(&progress);

If we return early in the loop, we'd leave this progress meter hanging.
It might be worth converting that return to a break or goto that handles
cleanup (it also needs to handle releasing the strbuf).

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits'
  2020-05-05  1:13 ` [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
  2020-05-05 12:05   ` Derrick Stolee
@ 2020-05-07 20:21   ` Jeff King
  1 sibling, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-07 20:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Mon, May 04, 2020 at 07:13:54PM -0600, Taylor Blau wrote:

> In the previous handful of commits, both 'git commit-graph write
> --reachable' and '--stdin-commits' learned to peel tags down to the
> commits which they refer to before passing them into the commit-graph
> internals.
> 
> This makes the call to 'lookup_commit_reference_gently()' inside of
> 'fill_oids_from_commits()' a noop, since all OIDs are commits by that
> point.
> 
> As such, remove the call entirely, as well as the progress meter, which
> has been split and moved out to the callers in the aforementioned
> earlier commits.

Yep, all this makes sense. I agree with Stolee that it's unfortunate we
can't just reuse the oidset now, but we do need the flattened array view
here. We could perhaps create such an array from the beginning (perhaps
using an oid_array), but we do need to care about de-duping the input.
That can be done easily with the sorted list, but there are pathological
corner cases where the performance is worse (e.g., if you have a ton of
refs all pointing to the same tags, like if you happened to be storing
the refs for 20,000 forks of the kernel all in one giant repo).

I think we'd eventually turn all these into "struct commit" (and indeed,
we already do in --stdin-commits when we try to peel). So another
alternative would be passing in an object_array(). But I guess that
would require surgery to the rest of the commit-graph code to work with
that instead of the oid list.

>  	while ((oid = oidset_iter_next(&iter))) {
> -		struct commit *result;
> -
> -		display_progress(ctx->progress, ++i);
> -
> -		result = lookup_commit_reference_gently(ctx->r, oid, 1);
> -		if (result) {
> -			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
> -			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
> -			ctx->oids.nr++;
> -		} else if (ctx->check_oids) {
> -			error(_("invalid commit object id: %s"),
> -			      oid_to_hex(oid));
> -			return -1;
> -		}
> +		ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
> +		oidcpy(&ctx->oids.list[ctx->oids.nr], oid);
> +		ctx->oids.nr++;
>  	}

I wondered if it's worth asserting that everything we got here is a
commit. But it's not cheap to make that check, and anyway we'd
presumably just barf later on when we try to resolve the oids to
commits. So there's little point in spending cycles to catch it here.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-05  1:14 ` [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
  2020-05-05 12:10   ` Derrick Stolee
@ 2020-05-07 20:40   ` Jeff King
  2020-05-13 21:32     ` Taylor Blau
  1 sibling, 1 reply; 54+ messages in thread
From: Jeff King @ 2020-05-07 20:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Mon, May 04, 2020 at 07:14:03PM -0600, Taylor Blau wrote:

> If callers do wish to retain this behavior, they can easily work around
> this change by doing the following:
> 
>     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
>     awk '/commit/ { print $1 }' |
>     git commit-graph write --stdin-commits

I know this came from my earlier email, but I think that recipe actually
shows how to make your input work even if --check-oids were the default.
If you really want the --check-oids behavior, you'd want the opposite:
to complain about those ones. So it's something like:

     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
     awk '
       !/commit/ { print "not-a-commit:"$1 }
        /commit/ { print $1 }
     ' |
     git commit-graph write --stdin-commits

> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 53a650225a..fcac7d12e1 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
>  +
>  With the `--stdin-commits` option, generate the new commit graph by
>  walking commits starting at the commits specified in stdin as a list
> -of OIDs in hex, one OID per line. (Cannot be combined with
> -`--stdin-packs` or `--reachable`.)
> +of OIDs in hex, one OID per line. OIDs that resolve to non-commits
> +(either directly, or by peeling tags) are silently ignored. OIDs that
> +are malformed, or do not exist generate an error. (Cannot be combined
> +with `--stdin-packs` or `--reachable`.)

Yeah, I think these semantics are good.

> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 9eec68572f..3637d079fb 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
>  
>  	display_progress(progress, oidset_size(commits) + 1);
>  
> +	if (oid_object_info(the_repository, &oid, NULL) < 0) {
> +		error(_("object %s does not exist"), hash);
> +		return 1;
> +	}
> +
>  	result = lookup_commit_reference_gently(the_repository, &oid, 1);
>  	if (result)
>  		oidset_insert(commits, &result->object.oid);
> -	else {
> -		error(_("invalid commit object id: %s"), hash);
> -		return 1;
> -	}
>  	return 0;
>  }

We can avoid the object-existence check entirely if
lookup_commit_reference_gently() gives us an answer. And we'd expect
that to be the common path.

Also, using has_object_file() is cheaper than oid_object_info(), since
it doesn't have to resolve the type for deltas.

So perhaps:

  result = lookup_commit_reference_gently(...);
  if (result)
          oidset_insert(...);
  else if (has_object_file(&oid))
          ; /* not a commit; quietly ignore;
  else
          return error(no such object...);

That said, I think this technique misses some cases of corruption.
You're checking that the outer-most object exists, but not any
intermediate peeled objects. I.e., lookup_commit_reference_gently()
might have failed for two reasons:

  - an object it peeled to didn't exist

  - an object it peeled to wasn't a commit

To do it thoroughly, I think you'd have to call deref_tag() yourself and
distinguish NULL there (an error) from a result where obj->type isn't
OBJ_COMMIT (quietly ignore).

>  enum commit_graph_write_flags {
> -	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
> -	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
> -	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
> -	/* Make sure that each OID in the input is a valid commit OID. */
> -	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
> -	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
> +	COMMIT_GRAPH_WRITE_APPEND        = (1 << 0),
> +	COMMIT_GRAPH_WRITE_PROGRESS      = (1 << 1),
> +	COMMIT_GRAPH_WRITE_SPLIT         = (1 << 2),
> +	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3)

As much as I love looking at matched-indentation lists, I think this
diff is a good example of why it's not worth doing. It's much easier to
see what's going on if the first three items aren't touched. I'd
actually even leave BLOOM_FILTERS where it is, and accept the hole which
could be refilled later.

Your patch also loses the trailing comma after the final BLOOM_FILTERS
entry.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
                     ` (7 preceding siblings ...)
  2020-05-06  0:07   ` [PATCH v2 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
@ 2020-05-07 20:42   ` Jeff King
  8 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-07 20:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Tue, May 05, 2020 at 06:07:09PM -0600, Taylor Blau wrote:

> Here's a small re-roll of my series to move peeling outside of the
> commit-graph internals, and drop the CHECK_OIDS flag. This re-roll was
> promised in [1], and only updates the messages used in the new progress
> meters. For convenience, a range-diff from v1 is included below.

I just reviewed v1. :) But I think the revised progress messages in v2
are much better.

The overall direction looks very good to me. I left a couple of
comments. Nothing too major, but I think there are some things worth
addressing (leaks, and more robust error checking in the final patch).

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-07 19:54   ` Jeff King
@ 2020-05-13 19:48     ` Taylor Blau
  2020-05-13 20:17       ` Jeff King
  0 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 19:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, szeder.dev

On Thu, May 07, 2020 at 03:54:41PM -0400, Jeff King wrote:
> On Mon, May 04, 2020 at 07:13:43PM -0600, Taylor Blau wrote:
>
> > While iterating references (to discover the set of commits to write to
> > the commit-graph with 'git commit-graph write --reachable'),
> > 'add_ref_to_set' can save 'fill_oids_from_commits()' some time by
> > peeling the references beforehand.
> >
> > Move peeling out of 'fill_oids_from_commits()' and into
> > 'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing
> > so allows the commit-graph machinery to use the peeled value from
> > '$GIT_DIR/packed-refs' instead of having to load and parse tags.
>
> Or having to load and parse commits only to find out that they're not
> tags. :)
>
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 8f61256b0a..5c3fad0dd7 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1327,11 +1327,15 @@ static int add_ref_to_set(const char *refname,
> >  			  const struct object_id *oid,
> >  			  int flags, void *cb_data)
> >  {
> > +	struct object_id peeled;
> >  	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
> >
> >  	display_progress(data->progress, oidset_size(data->commits) + 1);
> >
> > -	oidset_insert(data->commits, oid);
> > +	if (peel_ref(refname, &peeled))
> > +		peeled = *oid;
>
> It may be the old-timer C programmer in me, but I always look slightly
> suspicious at struct assignments. We know that object_id doesn't need a
> deep copy, so it's obviously OK here. But should we use oidcpy() as a
> style thing?
>
> Alternatively, you could do this without a struct copy at all with:
>
>   if (!peel_ref(...))
>          oid = peeled;
>   oidset_insert(..., oid);
>
> which is actually a bit cheaper.

Makes sense, I think this version is the better of the two that you
suggested here. I noticed one small thing which is that since peeled is
only on the stack, I think we actually want 'oid = &peeled', but
otherwise I took this as-is.

> > +	if (oid_object_info(the_repository, &peeled, NULL) == OBJ_COMMIT)
> > +		oidset_insert(data->commits, &peeled);
>
> I probably would have left adding this "if" until a later step, but I
> think it's OK here.

Yeah, I did it here to avoid having to add a seemingly-unrelated change
later on. I agree it doesn't matter much, so in the interest of leaving
the series alone, I'll leave it where it is here.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-07 20:03   ` Jeff King
@ 2020-05-13 20:01     ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 20:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, szeder.dev

On Thu, May 07, 2020 at 04:03:05PM -0400, Jeff King wrote:
> On Mon, May 04, 2020 at 07:13:46PM -0600, Taylor Blau wrote:
>
> > In the case of '--stdin-commits', feed each line to a new
> > 'read_one_commit' helper, which (for now) will merely call
> > 'parse_oid_hex'.
>
> Makes sense.
>
> > +static int read_one_commit(struct oidset *commits, char *hash)
>
> This could be "const char *hash", couldn't it?

Yep, thanks.

> > +	struct object_id oid;
> > +	const char *end;
> > +
> > +	if (parse_oid_hex(hash, &oid, &end)) {
> > +		error(_("unexpected non-hex object ID: %s"), hash);
> > +		return 1;
> > +	}
>
> Returning "-1" for error is more idiomatic in our code base (though I
> know some of the commit-graph code doesn't follow that, I think we
> should slowly try to move it back in the other direction.

Yeah, I know the -1 is more idiomatic than what I had written here. This
was done so that I could use the return value from 'read_one_commit' as
an exit code from 'graph_write()', but I don't mind switching this to
'return error(...)' and then checking at the caller for a non-zero
return value and returning 1 there instead.

> > +		while (strbuf_getline(&buf, stdin) != EOF) {
> > +			char *line = strbuf_detach(&buf, NULL);
> > +			if (opts.stdin_commits) {
> > +				int result = read_one_commit(&commits, line);
> > +				if (result)
> > +					return result;
> > +			} else
> > +				string_list_append(&pack_indexes, line);
> > +		}
>
> This leaks "line" for each commit in stdin_commits mode (it used to get
> added to a string list). I think you want:
>
>   while (strbuf_getline(&buf, stdin) != EOF) {
>         if (opts.stdin_commits) {
> 	        if (read_one_commit(&commits, buf.buf)) {
> 			strbuf_release(&buf);
> 			return 1;
> 		}
> 	} else {
> 	        string_list_append(&pack_indexes, strbuf_detach(&buf));
> 	}
>   }
>
> Though I think it might be easier to follow if each mode simply has its
> own while loop.

Yeah, it's much clearer as two separate cases. I'll send something like
that shortly once I get to the rest of your review.

> > +
> >  		UNLEAK(buf);
>
> Not new in your patch, but this UNLEAK() has always bugged me. ;) Why
> not just strbuf_release() it?

I snuck it in! ;)

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-13 19:48     ` Taylor Blau
@ 2020-05-13 20:17       ` Jeff King
  0 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-13 20:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Wed, May 13, 2020 at 01:48:48PM -0600, Taylor Blau wrote:

> > Alternatively, you could do this without a struct copy at all with:
> >
> >   if (!peel_ref(...))
> >          oid = peeled;
> >   oidset_insert(..., oid);
> >
> > which is actually a bit cheaper.
> 
> Makes sense, I think this version is the better of the two that you
> suggested here. I noticed one small thing which is that since peeled is
> only on the stack, I think we actually want 'oid = &peeled', but
> otherwise I took this as-is.

Yes, definitely. That's what I get for writing code in my email editor. :)

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-07 20:40   ` Jeff King
@ 2020-05-13 21:32     ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, szeder.dev

On Thu, May 07, 2020 at 04:40:05PM -0400, Jeff King wrote:
> On Mon, May 04, 2020 at 07:14:03PM -0600, Taylor Blau wrote:
>
> > If callers do wish to retain this behavior, they can easily work around
> > this change by doing the following:
> >
> >     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
> >     awk '/commit/ { print $1 }' |
> >     git commit-graph write --stdin-commits
>
> I know this came from my earlier email, but I think that recipe actually
> shows how to make your input work even if --check-oids were the default.
> If you really want the --check-oids behavior, you'd want the opposite:
> to complain about those ones. So it's something like:
>
>      git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
>      awk '
>        !/commit/ { print "not-a-commit:"$1 }
>         /commit/ { print $1 }
>      ' |
>      git commit-graph write --stdin-commits
>
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 53a650225a..fcac7d12e1 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
> >  +
> >  With the `--stdin-commits` option, generate the new commit graph by
> >  walking commits starting at the commits specified in stdin as a list
> > -of OIDs in hex, one OID per line. (Cannot be combined with
> > -`--stdin-packs` or `--reachable`.)
> > +of OIDs in hex, one OID per line. OIDs that resolve to non-commits
> > +(either directly, or by peeling tags) are silently ignored. OIDs that
> > +are malformed, or do not exist generate an error. (Cannot be combined
> > +with `--stdin-packs` or `--reachable`.)
>
> Yeah, I think these semantics are good.
>
> > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> > index 9eec68572f..3637d079fb 100644
> > --- a/builtin/commit-graph.c
> > +++ b/builtin/commit-graph.c
> > @@ -153,13 +153,14 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
> >
> >  	display_progress(progress, oidset_size(commits) + 1);
> >
> > +	if (oid_object_info(the_repository, &oid, NULL) < 0) {
> > +		error(_("object %s does not exist"), hash);
> > +		return 1;
> > +	}
> > +
> >  	result = lookup_commit_reference_gently(the_repository, &oid, 1);
> >  	if (result)
> >  		oidset_insert(commits, &result->object.oid);
> > -	else {
> > -		error(_("invalid commit object id: %s"), hash);
> > -		return 1;
> > -	}
> >  	return 0;
> >  }
>
> We can avoid the object-existence check entirely if
> lookup_commit_reference_gently() gives us an answer. And we'd expect
> that to be the common path.
>
> Also, using has_object_file() is cheaper than oid_object_info(), since
> it doesn't have to resolve the type for deltas.
>
> So perhaps:
>
>   result = lookup_commit_reference_gently(...);
>   if (result)
>           oidset_insert(...);
>   else if (has_object_file(&oid))
>           ; /* not a commit; quietly ignore;
>   else
>           return error(no such object...);
>
> That said, I think this technique misses some cases of corruption.
> You're checking that the outer-most object exists, but not any
> intermediate peeled objects. I.e., lookup_commit_reference_gently()
> might have failed for two reasons:
>
>   - an object it peeled to didn't exist
>
>   - an object it peeled to wasn't a commit
>
> To do it thoroughly, I think you'd have to call deref_tag() yourself and
> distinguish NULL there (an error) from a result where obj->type isn't
> OBJ_COMMIT (quietly ignore).

Makes sense. I initially worried a little bit about what to call the
error message (i.e., is is "this object doesn't exist" or "this object
exists but peels to a non-commit, or an otherwise missing object"). But,
I think just saying "invalid object: <hash>" is good enough here.

> >  enum commit_graph_write_flags {
> > -	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
> > -	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
> > -	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
> > -	/* Make sure that each OID in the input is a valid commit OID. */
> > -	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
> > -	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
> > +	COMMIT_GRAPH_WRITE_APPEND        = (1 << 0),
> > +	COMMIT_GRAPH_WRITE_PROGRESS      = (1 << 1),
> > +	COMMIT_GRAPH_WRITE_SPLIT         = (1 << 2),
> > +	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3)
>
> As much as I love looking at matched-indentation lists, I think this
> diff is a good example of why it's not worth doing. It's much easier to
> see what's going on if the first three items aren't touched. I'd
> actually even leave BLOOM_FILTERS where it is, and accept the hole which
> could be refilled later.
>
> Your patch also loses the trailing comma after the final BLOOM_FILTERS
> entry.

Fixed both, thanks. I'll build these eight new patches and send them
shortly, thanks for your review.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
                   ` (9 preceding siblings ...)
  2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
@ 2020-05-13 21:59 ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
                     ` (8 more replies)
  10 siblings, 9 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Hi,

Here is a tiny reroll to my series to drop the commit-graph's behavior
of complaining about non-commit OIDs on input with '--stdin-commits'. I
promised this reroll after Peff's review beginning in [1].

Not a great deal has changed, but I mostly took every suggestion that
Peff put forward (in the sub-thread beginning at [1]). The "big deals"
since last time are:

  - 'add_ref_to_set()' got cleaned up to use 'oid_object_info'.

  - progress meter updates moved after their action

  - 'graph_write()' got cleaned up to break the loop into two loops
    executed conditionally (based on which of '--stdin-packs' or
    '--stdin-commits' was passed) , as well as a cleanup label was
    introduced and some UNLEAK cruft was cleaned up.

For convenience, a range-diff since v2 (with the changes on master
applied since then elided out) is included below.

Thanks again for your review.

-Taylor

[1]: https://lore.kernel.org/git/20200507195441.GA29683@coredump.intra.peff.net/

Taylor Blau (8):
  commit-graph.c: extract 'refs_cb_data'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: peel refs in 'add_ref_to_set'
  builtin/commit-graph.c: extract 'read_one_commit()'
  builtin/commit-graph.c: dereference tags in builtin
  commit-graph.c: simplify 'fill_oids_from_commits'
  t5318: reorder test below 'graph_read_expect'
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag

 Documentation/git-commit-graph.txt |  6 ++-
 builtin/commit-graph.c             | 73 +++++++++++++++++++-----------
 commit-graph.c                     | 62 +++++++++++--------------
 commit-graph.h                     |  4 +-
 t/t5318-commit-graph.sh            | 25 ++++++----
 5 files changed, 95 insertions(+), 75 deletions(-)

Range-diff against v2:
 1:  43286c3c45 = 60:  0efbcfcf3a commit-graph.c: extract 'refs_cb_data'
 2:  cb56a9a73b ! 61:  773522c745 commit-graph.c: show progress of finding reachable commits
    @@ commit-graph.c: static void compute_bloom_filters(struct write_commit_graph_cont

      static int add_ref_to_set(const char *refname,
     @@ commit-graph.c: static int add_ref_to_set(const char *refname,
    - {
      	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;

    -+	display_progress(data->progress, oidset_size(data->commits) + 1);
    -+
      	oidset_insert(data->commits, oid);
    ++
    ++	display_progress(data->progress, oidset_size(data->commits));
    ++
      	return 0;
      }
    +
     @@ commit-graph.c: int write_commit_graph_reachable(struct object_directory *odb,

      	memset(&data, 0, sizeof(data));
 3:  85c388a077 <  -:  ---------- commit-graph.c: peel refs in 'add_ref_to_set'
 -:  ---------- > 62:  74b424f1ac commit-graph.c: peel refs in 'add_ref_to_set'
 4:  cef441b465 ! 63:  c37e94907b builtin/commit-graph.c: extract 'read_one_commit()'
    @@ builtin/commit-graph.c: static int write_option_parse_split(const struct option
      	return 0;
      }

    -+static int read_one_commit(struct oidset *commits, char *hash)
    ++static int read_one_commit(struct oidset *commits, const char *hash)
     +{
     +	struct object_id oid;
     +	const char *end;
     +
    -+	if (parse_oid_hex(hash, &oid, &end)) {
    -+		error(_("unexpected non-hex object ID: %s"), hash);
    -+		return 1;
    -+	}
    ++	if (parse_oid_hex(hash, &oid, &end))
    ++		return error(_("unexpected non-hex object ID: %s"), hash);
     +
     +	oidset_insert(commits, &oid);
     +	return 0;
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
      	}

     -	string_list_init(&lines, 0);
    -+	string_list_init(&pack_indexes, 0);
    - 	if (opts.stdin_packs || opts.stdin_commits) {
    - 		struct strbuf buf = STRBUF_INIT;
    --
    --		while (strbuf_getline(&buf, stdin) != EOF)
    +-	if (opts.stdin_packs || opts.stdin_commits) {
    +-		struct strbuf buf = STRBUF_INIT;
    ++	struct strbuf buf = STRBUF_INIT;
    ++	if (opts.stdin_packs) {
    ++		string_list_init(&pack_indexes, 0);
    +
    + 		while (strbuf_getline(&buf, stdin) != EOF)
     -			string_list_append(&lines, strbuf_detach(&buf, NULL));
    --
    ++			string_list_append(&pack_indexes,
    ++					   strbuf_detach(&buf, NULL));
    ++	} else if (opts.stdin_commits) {
    ++		oidset_init(&commits, 0);
    ++		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
    +
     -		if (opts.stdin_packs)
     -			pack_indexes = &lines;
    - 		if (opts.stdin_commits) {
    +-		if (opts.stdin_commits) {
     -			struct string_list_item *item;
     -			oidset_init(&commits, lines.nr);
     -			for_each_string_list_item(item, &lines) {
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
     -				}
     -
     -				oidset_insert(&commits, &oid);
    --			}
    -+			oidset_init(&commits, 0);
    - 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
    - 		}
    -
     +		while (strbuf_getline(&buf, stdin) != EOF) {
    -+			char *line = strbuf_detach(&buf, NULL);
    -+			if (opts.stdin_commits) {
    -+				int result = read_one_commit(&commits, line);
    -+				if (result)
    -+					return result;
    -+			} else
    -+				string_list_append(&pack_indexes, line);
    -+		}
    -+
    - 		UNLEAK(buf);
    ++			if (read_one_commit(&commits, buf.buf)) {
    ++				result = 1;
    ++				goto cleanup;
    + 			}
    +-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
    + 		}
    +-
    +-		UNLEAK(buf);
      	}

      	if (write_commit_graph(odb,
    @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
      		result = 1;

     -	UNLEAK(lines);
    ++cleanup:
     +	UNLEAK(pack_indexes);
    ++	strbuf_release(&buf);
      	return result;
      }

 5:  d449d83ce2 <  -:  ---------- builtin/commit-graph.c: dereference tags in builtin
 -:  ---------- > 64:  56403dd377 builtin/commit-graph.c: dereference tags in builtin
 6:  61887870c7 = 65:  6afb08a927 commit-graph.c: simplify 'fill_oids_from_commits'
 7:  e393b16097 = 66:  8fdd2792df t5318: reorder test below 'graph_read_expect'
 8:  ad373f05ff ! 67:  3cb0bd306c commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
    @@ Commit message
         If callers do wish to retain this behavior, they can easily work around
         this change by doing the following:

    -        git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
    -        awk '/commit/ { print $1 }' |
    -        git commit-graph write --stdin-commits
    +         git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
    +         awk '
    +           !/commit/ { print "not-a-commit:"$1 }
    +            /commit/ { print $1 }
    +         ' |
    +         git commit-graph write --stdin-commits

         To make it so that valid OIDs that refer to non-existent objects are
         indeed an error after loosening the error handling, perform an extra
         lookup to make sure that object indeed exists before sending it to the
         commit-graph internals.

    +    Helped-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## Documentation/git-commit-graph.txt ##
    @@ Documentation/git-commit-graph.txt: with `--stdin-commits` or `--reachable`.)
      commits starting at all refs. (Cannot be combined with `--stdin-commits`

      ## builtin/commit-graph.c ##
    -@@ builtin/commit-graph.c: static int read_one_commit(struct oidset *commits, struct progress *progress,
    +@@
    + #include "commit-graph.h"
    + #include "object-store.h"
    + #include "progress.h"
    ++#include "tag.h"

    - 	display_progress(progress, oidset_size(commits) + 1);
    + static char const * const builtin_commit_graph_usage[] = {
    + 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
    +@@ builtin/commit-graph.c: static int write_option_parse_split(const struct option *opt, const char *arg,
    + static int read_one_commit(struct oidset *commits, struct progress *progress,
    + 			   const char *hash)
    + {
    +-	struct commit *result;
    ++	struct object *result;
    + 	struct object_id oid;
    + 	const char *end;

    -+	if (oid_object_info(the_repository, &oid, NULL) < 0) {
    -+		error(_("object %s does not exist"), hash);
    -+		return 1;
    -+	}
    -+
    - 	result = lookup_commit_reference_gently(the_repository, &oid, 1);
    - 	if (result)
    - 		oidset_insert(commits, &result->object.oid);
    --	else {
    --		error(_("invalid commit object id: %s"), hash);
    --		return 1;
    --	}
    - 	return 0;
    - }
    + 	if (parse_oid_hex(hash, &oid, &end))
    + 		return error(_("unexpected non-hex object ID: %s"), hash);
    +
    +-	result = lookup_commit_reference_gently(the_repository, &oid, 1);
    +-	if (result)
    +-		oidset_insert(commits, &result->object.oid);
    +-	else
    +-		return error(_("invalid commit object id: %s"), hash);
    ++	result = deref_tag(the_repository, parse_object(the_repository, &oid),
    ++			   NULL, 0);
    ++	if (!result)
    ++		return error(_("invalid object: %s"), hash);
    ++	else if (object_as_type(the_repository, result, OBJ_COMMIT, 1))
    ++		oidset_insert(commits, &result->oid);
    +
    + 	display_progress(progress, oidset_size(commits));

     @@ builtin/commit-graph.c: static int graph_write(int argc, const char **argv)
    - 		struct strbuf buf = STRBUF_INIT;
    - 		if (opts.stdin_commits) {
    - 			oidset_init(&commits, 0);
    --			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
    - 			if (opts.progress)
    - 				progress = start_delayed_progress(
    - 					_("Collecting commits from input"), 0);
    + 					   strbuf_detach(&buf, NULL));
    + 	} else if (opts.stdin_commits) {
    + 		oidset_init(&commits, 0);
    +-		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
    + 		if (opts.progress)
    + 			progress = start_delayed_progress(
    + 				_("Collecting commits from input"), 0);

      ## commit-graph.c ##
     @@ commit-graph.c: struct write_commit_graph_context {
    @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      	ctx->total_bloom_filter_data_size = 0;

      ## commit-graph.h ##
    -@@ commit-graph.h: struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
    - int generation_numbers_enabled(struct repository *r);
    -
    - enum commit_graph_write_flags {
    --	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
    --	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
    --	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
    +@@ commit-graph.h: enum commit_graph_write_flags {
    + 	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
    + 	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
    + 	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
     -	/* Make sure that each OID in the input is a valid commit OID. */
     -	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
     -	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
    -+	COMMIT_GRAPH_WRITE_APPEND        = (1 << 0),
    -+	COMMIT_GRAPH_WRITE_PROGRESS      = (1 << 1),
    -+	COMMIT_GRAPH_WRITE_SPLIT         = (1 << 2),
    -+	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3)
    ++	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3),
      };

      enum commit_graph_split_flags {
    @@ t/t5318-commit-graph.sh: graph_read_expect() {
     +	# non-existent OID
     +	echo $ZERO_OID >in &&
     +	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
    -+	test_i18ngrep "does not exist" stderr &&
    ++	test_i18ngrep "invalid object" stderr &&
     +	# valid commit and tree OID
     +	git rev-parse HEAD HEAD^{tree} >in &&
     +	git commit-graph write --stdin-commits <in &&
--
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 1/8] commit-graph.c: extract 'refs_cb_data'
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In subsequent patches, we are going to update a progress meter when
'add_ref_to_set()' is called, and need a convenient way to pass a
'struct progress *' in from the caller.

Introduce 'refs_cb_data' as a catch-all for parameters that
'add_ref_to_set' may need, and wrap the existing single parameter in
that struct.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e3420ddcbf..9693112063 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1319,13 +1319,17 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	stop_progress(&progress);
 }
 
+struct refs_cb_data {
+	struct oidset *commits;
+};
+
 static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
-	struct oidset *commits = (struct oidset *)cb_data;
+	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
-	oidset_insert(commits, oid);
+	oidset_insert(data->commits, oid);
 	return 0;
 }
 
@@ -1334,9 +1338,13 @@ int write_commit_graph_reachable(struct object_directory *odb,
 				 const struct split_commit_graph_opts *split_opts)
 {
 	struct oidset commits = OIDSET_INIT;
+	struct refs_cb_data data;
 	int result;
 
-	for_each_ref(add_ref_to_set, &commits);
+	memset(&data, 0, sizeof(data));
+	data.commits = &commits;
+
+	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 2/8] commit-graph.c: show progress of finding reachable commits
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When 'git commit-graph write --reachable' is invoked, the commit-graph
machinery calls 'for_each_ref()' to discover the set of reachable
commits.

Right now the 'add_ref_to_set' callback is not doing anything other than
adding an OID to the set of known-reachable OIDs. In a subsequent
commit, 'add_ref_to_set' will presumptively peel references. This
operation should be fast for repositories with an up-to-date
'$GIT_DIR/packed-refs', but may be slow in the general case.

So that it doesn't appear that 'git commit-graph write' is idling with
'--reachable' in the slow case, add a progress meter to provide some
output in the meantime.

In general, we don't expect a progress meter to appear at all, since
peeling references with a 'packed-refs' file is quick. If it's slow and
we do show a progress meter, the subsequent 'fill_oids_from_commits()'
will be fast, since all of the calls to
'lookup_commit_reference_gently()' will be no-ops.

Both progress meters are delayed, so it is unlikely that more than one
will appear. In either case, this intermediate state will go away in a
handful of patches, at which point there will be at most one progress
meter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 9693112063..f855911320 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1321,6 +1321,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 
 struct refs_cb_data {
 	struct oidset *commits;
+	struct progress *progress;
 };
 
 static int add_ref_to_set(const char *refname,
@@ -1330,6 +1331,9 @@ static int add_ref_to_set(const char *refname,
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
 	oidset_insert(data->commits, oid);
+
+	display_progress(data->progress, oidset_size(data->commits));
+
 	return 0;
 }
 
@@ -1343,12 +1347,17 @@ int write_commit_graph_reachable(struct object_directory *odb,
 
 	memset(&data, 0, sizeof(data));
 	data.commits = &commits;
+	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
+		data.progress = start_delayed_progress(
+			_("Collecting referenced commits"), 0);
 
 	for_each_ref(add_ref_to_set, &data);
 	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
 	oidset_clear(&commits);
+	if (data.progress)
+		stop_progress(&data.progress);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 3/8] commit-graph.c: peel refs in 'add_ref_to_set'
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

While iterating references (to discover the set of commits to write to
the commit-graph with 'git commit-graph write --reachable'),
'add_ref_to_set' can save 'fill_oids_from_commits()' some time by
peeling the references beforehand.

Move peeling out of 'fill_oids_from_commits()' and into
'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing
so allows the commit-graph machinery to use the peeled value from
'$GIT_DIR/packed-refs' instead of having to load and parse tags.

While we're at it, discard non-commit objects reachable from ref tips.
This would be done automatically by 'fill_oids_from_commits()', but such
functionality will be removed in a subsequent patch after the call to
'lookup_commit_reference_gently' is dropped (at which point a non-commit
object in the commits oidset will become an error).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index f855911320..e7de91f0f1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1328,9 +1328,13 @@ static int add_ref_to_set(const char *refname,
 			  const struct object_id *oid,
 			  int flags, void *cb_data)
 {
+	struct object_id peeled;
 	struct refs_cb_data *data = (struct refs_cb_data *)cb_data;
 
-	oidset_insert(data->commits, oid);
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) == OBJ_COMMIT)
+		oidset_insert(data->commits, oid);
 
 	display_progress(data->progress, oidset_size(data->commits));
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (2 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-14 17:56     ` Jeff King
  2020-05-13 21:59   ` [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 56 ++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 15fe60317c..f6647449ed 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -138,12 +138,23 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int read_one_commit(struct oidset *commits, const char *hash)
+{
+	struct object_id oid;
+	const char *end;
+
+	if (parse_oid_hex(hash, &oid, &end))
+		return error(_("unexpected non-hex object ID: %s"), hash);
+
+	oidset_insert(commits, &oid);
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
-	struct string_list *pack_indexes = NULL;
+	struct string_list pack_indexes;
 	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
-	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
 
@@ -209,44 +220,35 @@ static int graph_write(int argc, const char **argv)
 		return 0;
 	}
 
-	string_list_init(&lines, 0);
-	if (opts.stdin_packs || opts.stdin_commits) {
-		struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	if (opts.stdin_packs) {
+		string_list_init(&pack_indexes, 0);
 
 		while (strbuf_getline(&buf, stdin) != EOF)
-			string_list_append(&lines, strbuf_detach(&buf, NULL));
+			string_list_append(&pack_indexes,
+					   strbuf_detach(&buf, NULL));
+	} else if (opts.stdin_commits) {
+		oidset_init(&commits, 0);
+		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 
-		if (opts.stdin_packs)
-			pack_indexes = &lines;
-		if (opts.stdin_commits) {
-			struct string_list_item *item;
-			oidset_init(&commits, lines.nr);
-			for_each_string_list_item(item, &lines) {
-				struct object_id oid;
-				const char *end;
-
-				if (parse_oid_hex(item->string, &oid, &end)) {
-					error(_("unexpected non-hex object ID: "
-						"%s"), item->string);
-					return 1;
-				}
-
-				oidset_insert(&commits, &oid);
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			if (read_one_commit(&commits, buf.buf)) {
+				result = 1;
+				goto cleanup;
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
-
-		UNLEAK(buf);
 	}
 
 	if (write_commit_graph(odb,
-			       pack_indexes,
+			       opts.stdin_packs ? &pack_indexes : NULL,
 			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;
 
-	UNLEAK(lines);
+cleanup:
+	UNLEAK(pack_indexes);
+	strbuf_release(&buf);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (3 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-14 18:01     ` Jeff King
  2020-05-13 21:59   ` [PATCH v3 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.

In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.

Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.

Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f6647449ed..83c55d9227 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -6,6 +6,7 @@
 #include "repository.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "progress.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
@@ -138,15 +139,24 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }
 
-static int read_one_commit(struct oidset *commits, const char *hash)
+static int read_one_commit(struct oidset *commits, struct progress *progress,
+			   const char *hash)
 {
+	struct commit *result;
 	struct object_id oid;
 	const char *end;
 
 	if (parse_oid_hex(hash, &oid, &end))
 		return error(_("unexpected non-hex object ID: %s"), hash);
 
-	oidset_insert(commits, &oid);
+	result = lookup_commit_reference_gently(the_repository, &oid, 1);
+	if (result)
+		oidset_insert(commits, &result->object.oid);
+	else
+		return error(_("invalid commit object id: %s"), hash);
+
+	display_progress(progress, oidset_size(commits));
+
 	return 0;
 }
 
@@ -157,6 +167,7 @@ static int graph_write(int argc, const char **argv)
 	struct object_directory *odb = NULL;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;
+	struct progress *progress = NULL;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -230,13 +241,18 @@ static int graph_write(int argc, const char **argv)
 	} else if (opts.stdin_commits) {
 		oidset_init(&commits, 0);
 		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+		if (opts.progress)
+			progress = start_delayed_progress(
+				_("Collecting commits from input"), 0);
 
 		while (strbuf_getline(&buf, stdin) != EOF) {
-			if (read_one_commit(&commits, buf.buf)) {
+			if (read_one_commit(&commits, progress, buf.buf)) {
 				result = 1;
 				goto cleanup;
 			}
 		}
+
+
 	}
 
 	if (write_commit_graph(odb,
@@ -249,6 +265,8 @@ static int graph_write(int argc, const char **argv)
 cleanup:
 	UNLEAK(pack_indexes);
 	strbuf_release(&buf);
+	if (progress)
+		stop_progress(&progress);
 	return result;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 6/8] commit-graph.c: simplify 'fill_oids_from_commits'
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (4 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the previous handful of commits, both 'git commit-graph write
--reachable' and '--stdin-commits' learned to peel tags down to the
commits which they refer to before passing them into the commit-graph
internals.

This makes the call to 'lookup_commit_reference_gently()' inside of
'fill_oids_from_commits()' a noop, since all OIDs are commits by that
point.

As such, remove the call entirely, as well as the progress meter, which
has been split and moved out to the callers in the aforementioned
earlier commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e7de91f0f1..3ea70e6676 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1413,46 +1413,19 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 static int fill_oids_from_commits(struct write_commit_graph_context *ctx,
 				  struct oidset *commits)
 {
-	uint32_t i = 0;
-	struct strbuf progress_title = STRBUF_INIT;
 	struct oidset_iter iter;
 	struct object_id *oid;
 
 	if (!oidset_size(commits))
 		return 0;
 
-	if (ctx->report_progress) {
-		strbuf_addf(&progress_title,
-			    Q_("Finding commits for commit graph from %d ref",
-			       "Finding commits for commit graph from %d refs",
-			       oidset_size(commits)),
-			    oidset_size(commits));
-		ctx->progress = start_delayed_progress(
-					progress_title.buf,
-					oidset_size(commits));
-	}
-
 	oidset_iter_init(commits, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
-		struct commit *result;
-
-		display_progress(ctx->progress, ++i);
-
-		result = lookup_commit_reference_gently(ctx->r, oid, 1);
-		if (result) {
-			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
-			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
-			ctx->oids.nr++;
-		} else if (ctx->check_oids) {
-			error(_("invalid commit object id: %s"),
-			      oid_to_hex(oid));
-			return -1;
-		}
+		ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+		oidcpy(&ctx->oids.list[ctx->oids.nr], oid);
+		ctx->oids.nr++;
 	}
 
-	stop_progress(&ctx->progress);
-	strbuf_release(&progress_title);
-
 	return 0;
 }
 
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 7/8] t5318: reorder test below 'graph_read_expect'
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (5 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-13 21:59   ` [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
  2020-05-14 18:12   ` [PATCH v3 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King
  8 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

In the subsequent commit, we will introduce a dependency on
'graph_read_expect' from t5318.7. Preemptively move it below
'graph_read_expect()'s definition so that the test can call it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5318-commit-graph.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 424599959c..255e3bb1c9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -46,15 +46,6 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
-test_expect_success 'exit with correct error on bad input to --stdin-commits' '
-	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
-'
-
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
@@ -95,6 +86,15 @@ graph_read_expect() {
 	test_cmp expect output
 }
 
+test_expect_success 'exit with correct error on bad input to --stdin-commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
+	# valid tree OID, but not a commit OID
+	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	test_i18ngrep "invalid commit object id" stderr
+'
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-- 
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (6 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
@ 2020-05-13 21:59   ` Taylor Blau
  2020-05-14 18:09     ` Jeff King
  2020-05-14 18:12   ` [PATCH v3 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King
  8 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-13 21:59 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff, szeder.dev

Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
     awk '
       !/commit/ { print "not-a-commit:"$1 }
        /commit/ { print $1 }
     ' |
     git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  6 ++++--
 builtin/commit-graph.c             | 15 ++++++++-------
 commit-graph.c                     |  2 --
 commit-graph.h                     |  4 +---
 t/t5318-commit-graph.sh            | 15 +++++++++++----
 5 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 53a650225a..fcac7d12e1 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -47,8 +47,10 @@ with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
-of OIDs in hex, one OID per line. (Cannot be combined with
-`--stdin-packs` or `--reachable`.)
+of OIDs in hex, one OID per line. OIDs that resolve to non-commits
+(either directly, or by peeling tags) are silently ignored. OIDs that
+are malformed, or do not exist generate an error. (Cannot be combined
+with `--stdin-packs` or `--reachable`.)
 +
 With the `--reachable` option, generate the new commit graph by walking
 commits starting at all refs. (Cannot be combined with `--stdin-commits`
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 83c55d9227..d3bd1c31c9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,6 +7,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "progress.h"
+#include "tag.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
@@ -142,18 +143,19 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 static int read_one_commit(struct oidset *commits, struct progress *progress,
 			   const char *hash)
 {
-	struct commit *result;
+	struct object *result;
 	struct object_id oid;
 	const char *end;
 
 	if (parse_oid_hex(hash, &oid, &end))
 		return error(_("unexpected non-hex object ID: %s"), hash);
 
-	result = lookup_commit_reference_gently(the_repository, &oid, 1);
-	if (result)
-		oidset_insert(commits, &result->object.oid);
-	else
-		return error(_("invalid commit object id: %s"), hash);
+	result = deref_tag(the_repository, parse_object(the_repository, &oid),
+			   NULL, 0);
+	if (!result)
+		return error(_("invalid object: %s"), hash);
+	else if (object_as_type(the_repository, result, OBJ_COMMIT, 1))
+		oidset_insert(commits, &result->oid);
 
 	display_progress(progress, oidset_size(commits));
 
@@ -240,7 +242,6 @@ static int graph_write(int argc, const char **argv)
 					   strbuf_detach(&buf, NULL));
 	} else if (opts.stdin_commits) {
 		oidset_init(&commits, 0);
-		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		if (opts.progress)
 			progress = start_delayed_progress(
 				_("Collecting commits from input"), 0);
diff --git a/commit-graph.c b/commit-graph.c
index 3ea70e6676..2ff042fbf4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -881,7 +881,6 @@ struct write_commit_graph_context {
 	unsigned append:1,
 		 report_progress:1,
 		 split:1,
-		 check_oids:1,
 		 changed_paths:1,
 		 order_by_pack:1;
 
@@ -2011,7 +2010,6 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
 	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
-	ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
 	ctx->split_opts = split_opts;
 	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
 	ctx->total_bloom_filter_data_size = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 4212766a4f..3ba0da1e5f 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -91,9 +91,7 @@ enum commit_graph_write_flags {
 	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
 	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
 	COMMIT_GRAPH_WRITE_SPLIT      = (1 << 2),
-	/* Make sure that each OID in the input is a valid commit OID. */
-	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
-	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
+	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3),
 };
 
 enum commit_graph_split_flags {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 255e3bb1c9..a79c624875 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -88,11 +88,18 @@ graph_read_expect() {
 
 test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	cd "$TRASH_DIRECTORY/full" &&
-	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
+	# invalid, non-hex OID
+	echo HEAD >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
 	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
-	# valid tree OID, but not a commit OID
-	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr
+	# non-existent OID
+	echo $ZERO_OID >in &&
+	test_expect_code 1 git commit-graph write --stdin-commits <in 2>stderr &&
+	test_i18ngrep "invalid object" stderr &&
+	# valid commit and tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits <in &&
+	graph_read_expect 3
 '
 
 test_expect_success 'write graph' '
-- 
2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-13 21:59   ` [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
@ 2020-05-14 17:56     ` Jeff King
  2020-05-14 18:02       ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2020-05-14 17:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Wed, May 13, 2020 at 03:59:41PM -0600, Taylor Blau wrote:

> @@ -209,44 +220,35 @@ static int graph_write(int argc, const char **argv)
>  		return 0;
>  	}
>  
> -	string_list_init(&lines, 0);
> -	if (opts.stdin_packs || opts.stdin_commits) {
> -		struct strbuf buf = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	if (opts.stdin_packs) {

This is a decl-after-statement, isn't it? It should fail with
DEVELOPER=1.

The strbuf could be local in each block of the conditional, though that
makes your cleanup label trickier. Only the stdin_commits side needs to
clean up the buf, so we could just do it there. Or I'd be OK with having
the strbuf declared at the top of the function, and using the
whole-function cleanup label.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-13 21:59   ` [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
@ 2020-05-14 18:01     ` Jeff King
  2020-05-14 18:04       ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Jeff King @ 2020-05-14 18:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Wed, May 13, 2020 at 03:59:44PM -0600, Taylor Blau wrote:

> -static int read_one_commit(struct oidset *commits, const char *hash)
> +static int read_one_commit(struct oidset *commits, struct progress *progress,
> +			   const char *hash)
>  {
> +	struct commit *result;
>  	struct object_id oid;
>  	const char *end;
>  
>  	if (parse_oid_hex(hash, &oid, &end))
>  		return error(_("unexpected non-hex object ID: %s"), hash);
>  
> -	oidset_insert(commits, &oid);
> +	result = lookup_commit_reference_gently(the_repository, &oid, 1);
> +	if (result)
> +		oidset_insert(commits, &result->object.oid);
> +	else
> +		return error(_("invalid commit object id: %s"), hash);
> +
> +	display_progress(progress, oidset_size(commits));
> +

I expected this to switch to deref_tag() here, but it looks like you do
it in the final commit. That makes sense, since this step is really just
copying the existing logic.

> @@ -249,6 +265,8 @@ static int graph_write(int argc, const char **argv)
>  cleanup:
>  	UNLEAK(pack_indexes);
>  	strbuf_release(&buf);
> +	if (progress)
> +		stop_progress(&progress);
>  	return result;

Really minor nit, but stop_progress(), like display_progress(), handles
NULL already. So you can lose the "if" here.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-14 17:56     ` Jeff King
@ 2020-05-14 18:02       ` Taylor Blau
  2020-05-14 18:27         ` Junio C Hamano
  0 siblings, 1 reply; 54+ messages in thread
From: Taylor Blau @ 2020-05-14 18:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, szeder.dev

On Thu, May 14, 2020 at 01:56:46PM -0400, Jeff King wrote:
> On Wed, May 13, 2020 at 03:59:41PM -0600, Taylor Blau wrote:
>
> > @@ -209,44 +220,35 @@ static int graph_write(int argc, const char **argv)
> >  		return 0;
> >  	}
> >
> > -	string_list_init(&lines, 0);
> > -	if (opts.stdin_packs || opts.stdin_commits) {
> > -		struct strbuf buf = STRBUF_INIT;
> > +	struct strbuf buf = STRBUF_INIT;
> > +	if (opts.stdin_packs) {
>
> This is a decl-after-statement, isn't it? It should fail with
> DEVELOPER=1.

Whoops. I'm not sure how I missed this.

> The strbuf could be local in each block of the conditional, though that
> makes your cleanup label trickier. Only the stdin_commits side needs to
> clean up the buf, so we could just do it there. Or I'd be OK with having
> the strbuf declared at the top of the function, and using the
> whole-function cleanup label.

Thankfully, fixing this by moving the declaration/initialization up to
the top of the function didn't cause any conflicts with later patches.

Junio: if you're queuing this reroll, please use the below version
instead:

--- >8 ---

Subject: [PATCH] builtin/commit-graph.c: extract 'read_one_commit()'

With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 56 ++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 15fe60317c..aaa1d0c738 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -138,12 +138,24 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }

+static int read_one_commit(struct oidset *commits, const char *hash)
+{
+	struct object_id oid;
+	const char *end;
+
+	if (parse_oid_hex(hash, &oid, &end))
+		return error(_("unexpected non-hex object ID: %s"), hash);
+
+	oidset_insert(commits, &oid);
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
-	struct string_list *pack_indexes = NULL;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list pack_indexes;
 	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
-	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;

@@ -209,44 +221,34 @@ static int graph_write(int argc, const char **argv)
 		return 0;
 	}

-	string_list_init(&lines, 0);
-	if (opts.stdin_packs || opts.stdin_commits) {
-		struct strbuf buf = STRBUF_INIT;
+	if (opts.stdin_packs) {
+		string_list_init(&pack_indexes, 0);

 		while (strbuf_getline(&buf, stdin) != EOF)
-			string_list_append(&lines, strbuf_detach(&buf, NULL));
+			string_list_append(&pack_indexes,
+					   strbuf_detach(&buf, NULL));
+	} else if (opts.stdin_commits) {
+		oidset_init(&commits, 0);
+		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;

-		if (opts.stdin_packs)
-			pack_indexes = &lines;
-		if (opts.stdin_commits) {
-			struct string_list_item *item;
-			oidset_init(&commits, lines.nr);
-			for_each_string_list_item(item, &lines) {
-				struct object_id oid;
-				const char *end;
-
-				if (parse_oid_hex(item->string, &oid, &end)) {
-					error(_("unexpected non-hex object ID: "
-						"%s"), item->string);
-					return 1;
-				}
-
-				oidset_insert(&commits, &oid);
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			if (read_one_commit(&commits, buf.buf)) {
+				result = 1;
+				goto cleanup;
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
-
-		UNLEAK(buf);
 	}

 	if (write_commit_graph(odb,
-			       pack_indexes,
+			       opts.stdin_packs ? &pack_indexes : NULL,
 			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;

-	UNLEAK(lines);
+cleanup:
+	UNLEAK(pack_indexes);
+	strbuf_release(&buf);
 	return result;
 }

--
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin
  2020-05-14 18:01     ` Jeff King
@ 2020-05-14 18:04       ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-14 18:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, szeder.dev

On Thu, May 14, 2020 at 02:01:50PM -0400, Jeff King wrote:
> On Wed, May 13, 2020 at 03:59:44PM -0600, Taylor Blau wrote:
>
> > -static int read_one_commit(struct oidset *commits, const char *hash)
> > +static int read_one_commit(struct oidset *commits, struct progress *progress,
> > +			   const char *hash)
> >  {
> > +	struct commit *result;
> >  	struct object_id oid;
> >  	const char *end;
> >
> >  	if (parse_oid_hex(hash, &oid, &end))
> >  		return error(_("unexpected non-hex object ID: %s"), hash);
> >
> > -	oidset_insert(commits, &oid);
> > +	result = lookup_commit_reference_gently(the_repository, &oid, 1);
> > +	if (result)
> > +		oidset_insert(commits, &result->object.oid);
> > +	else
> > +		return error(_("invalid commit object id: %s"), hash);
> > +
> > +	display_progress(progress, oidset_size(commits));
> > +
>
> I expected this to switch to deref_tag() here, but it looks like you do
> it in the final commit. That makes sense, since this step is really just
> copying the existing logic.

I went back and forth on this. I figure that the behavior change should
come in the final commit, so this is keeping things consistent at this
point in the series.

> > @@ -249,6 +265,8 @@ static int graph_write(int argc, const char **argv)
> >  cleanup:
> >  	UNLEAK(pack_indexes);
> >  	strbuf_release(&buf);
> > +	if (progress)
> > +		stop_progress(&progress);
> >  	return result;
>
> Really minor nit, but stop_progress(), like display_progress(), handles
> NULL already. So you can lose the "if" here.

Mm. If there's nothing else (besides the review of 4/8 which I already
sent a replacement patch for), I'd prefer to leave this as-is to avoid
another reroll. Hopefully since this isn't hurting anything with the
extra if statement, it shouldn't matter too much.

If you feel strongly, I'm happy to re-send this series, or this patch.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  2020-05-13 21:59   ` [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
@ 2020-05-14 18:09     ` Jeff King
  0 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-14 18:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Wed, May 13, 2020 at 03:59:55PM -0600, Taylor Blau wrote:

> -	result = lookup_commit_reference_gently(the_repository, &oid, 1);
> -	if (result)
> -		oidset_insert(commits, &result->object.oid);
> -	else
> -		return error(_("invalid commit object id: %s"), hash);
> +	result = deref_tag(the_repository, parse_object(the_repository, &oid),
> +			   NULL, 0);
> +	if (!result)
> +		return error(_("invalid object: %s"), hash);

OK. As you noted earlier, this is really "we don't have that object"
_or_ "we don't have an object it points to". But since either is a
corruption, and I'd expect parse_object() to produce a more detailed
message anyway, I don't think it's worth trying to get more specific
here.

> +	else if (object_as_type(the_repository, result, OBJ_COMMIT, 1))
> +		oidset_insert(commits, &result->oid);

I suspect this could just be "if (result->type == OBJ_COMMIT)", as we'd
never see OBJ_NONE from a tag deref (which would have required the
casting behavior), but I don't think it hurts to use this function to be
on the safe side.

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 0/8] commit-graph: drop CHECK_OIDS, peel in callers
  2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
                     ` (7 preceding siblings ...)
  2020-05-13 21:59   ` [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
@ 2020-05-14 18:12   ` Jeff King
  8 siblings, 0 replies; 54+ messages in thread
From: Jeff King @ 2020-05-14 18:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, szeder.dev

On Wed, May 13, 2020 at 03:59:25PM -0600, Taylor Blau wrote:

> Here is a tiny reroll to my series to drop the commit-graph's behavior
> of complaining about non-commit OIDs on input with '--stdin-commits'. I
> promised this reroll after Peff's review beginning in [1].
> 
> Not a great deal has changed, but I mostly took every suggestion that
> Peff put forward (in the sub-thread beginning at [1]). The "big deals"
> since last time are:

Thanks. With the exception of the decl-after-statement (and the tiny
if() nit which we could take or leave), this looks good to me.

The fix for the decl-after-statement should be easy, but you might want
to double check your build setup ("make DEVELOPER=1" does complain about
it, and I think that means our CI should, too; if it doesn't, we should
fix that).

-Peff

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-14 18:02       ` Taylor Blau
@ 2020-05-14 18:27         ` Junio C Hamano
  2020-05-18 19:27           ` Taylor Blau
  0 siblings, 1 reply; 54+ messages in thread
From: Junio C Hamano @ 2020-05-14 18:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, dstolee, szeder.dev

Taylor Blau <me@ttaylorr.com> writes:

>  static int graph_write(int argc, const char **argv)
>  {
> -	struct string_list *pack_indexes = NULL;
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list pack_indexes;

I would strongly prefer this to be initialized here, not ...

>  	struct oidset commits = OIDSET_INIT;
>  	struct object_directory *odb = NULL;
> -	struct string_list lines;
>  	int result = 0;
>  	enum commit_graph_write_flags flags = 0;
>
> @@ -209,44 +221,34 @@ static int graph_write(int argc, const char **argv)
>  		return 0;
>  	}
>
> -	string_list_init(&lines, 0);
> -	if (opts.stdin_packs || opts.stdin_commits) {
> -		struct strbuf buf = STRBUF_INIT;
> +	if (opts.stdin_packs) {
> +		string_list_init(&pack_indexes, 0);

... down here.  I know that the use of it is guarded with a ternary
(opts.stdin_packs ? &pack_indexes : NULL), but that is why you need
to UNLEAK() it, intead of unconditionally releasing the strbuf, at
the end of this function, no?

>  		while (strbuf_getline(&buf, stdin) != EOF)
> -			string_list_append(&lines, strbuf_detach(&buf, NULL));
> +			string_list_append(&pack_indexes,
> +					   strbuf_detach(&buf, NULL));
> +	} else if (opts.stdin_commits) {
> +		oidset_init(&commits, 0);
> +		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
>
> -		if (opts.stdin_packs)
> -			pack_indexes = &lines;
> -		if (opts.stdin_commits) {
> -			struct string_list_item *item;
> -			oidset_init(&commits, lines.nr);
> -			for_each_string_list_item(item, &lines) {
> -				struct object_id oid;
> -				const char *end;
> -
> -				if (parse_oid_hex(item->string, &oid, &end)) {
> -					error(_("unexpected non-hex object ID: "
> -						"%s"), item->string);
> -					return 1;
> -				}
> -
> -				oidset_insert(&commits, &oid);
> +		while (strbuf_getline(&buf, stdin) != EOF) {
> +			if (read_one_commit(&commits, buf.buf)) {
> +				result = 1;
> +				goto cleanup;
>  			}
> -			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
>  		}
> -
> -		UNLEAK(buf);
>  	}
>
>  	if (write_commit_graph(odb,
> -			       pack_indexes,
> +			       opts.stdin_packs ? &pack_indexes : NULL,
>  			       opts.stdin_commits ? &commits : NULL,
>  			       flags,
>  			       &split_opts))
>  		result = 1;
>
> -	UNLEAK(lines);
> +cleanup:
> +	UNLEAK(pack_indexes);
> +	strbuf_release(&buf);
>  	return result;
>  }
>
> --
> 2.26.0.113.ge9739cdccc

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()'
  2020-05-14 18:27         ` Junio C Hamano
@ 2020-05-18 19:27           ` Taylor Blau
  0 siblings, 0 replies; 54+ messages in thread
From: Taylor Blau @ 2020-05-18 19:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jeff King, git, dstolee, szeder.dev

Hi Junio,

Sorry for a delayed response. I have been working on something that I
hope to send to the list shortly to discuss. In the meantime...

On Thu, May 14, 2020 at 11:27:00AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >  static int graph_write(int argc, const char **argv)
> >  {
> > -	struct string_list *pack_indexes = NULL;
> > +	struct strbuf buf = STRBUF_INIT;
> > +	struct string_list pack_indexes;
>
> I would strongly prefer this to be initialized here, not ...

Thanks for the suggestion. Your approach is much more favorable than
what I had written here, since it does allow me to unconditionally call
'string_list_clear(...)' instead of 'UNLEAK', which all three of us
dislike :).

Below is an amended patch; please use this as 4/8 instead of the
original when queuing. I'd send you a re-rolled series, but the rest of
the patches apply cleanly on top, so I don't think there is a need
unless it would make things more convenient for you.

-- >8 --

Subject: [PATCH] builtin/commit-graph.c: extract 'read_one_commit()'

With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c | 56 +++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 15fe60317c..a5c2332a86 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -138,12 +138,24 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 	return 0;
 }

+static int read_one_commit(struct oidset *commits, const char *hash)
+{
+	struct object_id oid;
+	const char *end;
+
+	if (parse_oid_hex(hash, &oid, &end))
+		return error(_("unexpected non-hex object ID: %s"), hash);
+
+	oidset_insert(commits, &oid);
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
-	struct string_list *pack_indexes = NULL;
+	struct string_list pack_indexes = STRING_LIST_INIT_NODUP;
+	struct strbuf buf = STRBUF_INIT;
 	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
-	struct string_list lines;
 	int result = 0;
 	enum commit_graph_write_flags flags = 0;

@@ -209,44 +221,32 @@ static int graph_write(int argc, const char **argv)
 		return 0;
 	}

-	string_list_init(&lines, 0);
-	if (opts.stdin_packs || opts.stdin_commits) {
-		struct strbuf buf = STRBUF_INIT;
-
+	if (opts.stdin_packs) {
 		while (strbuf_getline(&buf, stdin) != EOF)
-			string_list_append(&lines, strbuf_detach(&buf, NULL));
+			string_list_append(&pack_indexes,
+					   strbuf_detach(&buf, NULL));
+	} else if (opts.stdin_commits) {
+		oidset_init(&commits, 0);
+		flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;

-		if (opts.stdin_packs)
-			pack_indexes = &lines;
-		if (opts.stdin_commits) {
-			struct string_list_item *item;
-			oidset_init(&commits, lines.nr);
-			for_each_string_list_item(item, &lines) {
-				struct object_id oid;
-				const char *end;
-
-				if (parse_oid_hex(item->string, &oid, &end)) {
-					error(_("unexpected non-hex object ID: "
-						"%s"), item->string);
-					return 1;
-				}
-
-				oidset_insert(&commits, &oid);
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			if (read_one_commit(&commits, buf.buf)) {
+				result = 1;
+				goto cleanup;
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
-
-		UNLEAK(buf);
 	}

 	if (write_commit_graph(odb,
-			       pack_indexes,
+			       opts.stdin_packs ? &pack_indexes : NULL,
 			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;

-	UNLEAK(lines);
+cleanup:
+	string_list_clear(&pack_indexes, 0);
+	strbuf_release(&buf);
 	return result;
 }

--
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, back to index

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05  1:13 [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Taylor Blau
2020-05-05  1:13 ` [PATCH 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
2020-05-05  1:13 ` [PATCH 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
2020-05-05 11:50   ` Derrick Stolee
2020-05-05 16:13     ` Taylor Blau
2020-05-05  1:13 ` [PATCH 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
2020-05-07 19:54   ` Jeff King
2020-05-13 19:48     ` Taylor Blau
2020-05-13 20:17       ` Jeff King
2020-05-05  1:13 ` [PATCH 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
2020-05-07 20:03   ` Jeff King
2020-05-13 20:01     ` Taylor Blau
2020-05-05  1:13 ` [PATCH 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
2020-05-05 12:01   ` Derrick Stolee
2020-05-05 16:14     ` Taylor Blau
2020-05-07 20:14   ` Jeff King
2020-05-05  1:13 ` [PATCH 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
2020-05-05 12:05   ` Derrick Stolee
2020-05-07 20:21   ` Jeff King
2020-05-05  1:14 ` [PATCH 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
2020-05-05  1:14 ` [PATCH 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
2020-05-05 12:10   ` Derrick Stolee
2020-05-05 16:16     ` Taylor Blau
2020-05-07 20:40   ` Jeff King
2020-05-13 21:32     ` Taylor Blau
2020-05-05 11:35 ` [PATCH 0/8] commit-graph: drop CHECK_OIDS, peel in callers Derrick Stolee
2020-05-05 16:11   ` Taylor Blau
2020-05-06  0:07 ` [PATCH v2 " Taylor Blau
2020-05-06  0:07   ` [PATCH v2 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
2020-05-06  0:07   ` [PATCH v2 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
2020-05-06  0:07   ` [PATCH v2 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
2020-05-06  0:07   ` [PATCH v2 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
2020-05-06  0:07   ` [PATCH v2 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
2020-05-06  0:07   ` [PATCH v2 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
2020-05-06  0:07   ` [PATCH v2 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
2020-05-06  0:07   ` [PATCH v2 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
2020-05-07 20:42   ` [PATCH v2 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King
2020-05-13 21:59 ` [PATCH v3 " Taylor Blau
2020-05-13 21:59   ` [PATCH v3 1/8] commit-graph.c: extract 'refs_cb_data' Taylor Blau
2020-05-13 21:59   ` [PATCH v3 2/8] commit-graph.c: show progress of finding reachable commits Taylor Blau
2020-05-13 21:59   ` [PATCH v3 3/8] commit-graph.c: peel refs in 'add_ref_to_set' Taylor Blau
2020-05-13 21:59   ` [PATCH v3 4/8] builtin/commit-graph.c: extract 'read_one_commit()' Taylor Blau
2020-05-14 17:56     ` Jeff King
2020-05-14 18:02       ` Taylor Blau
2020-05-14 18:27         ` Junio C Hamano
2020-05-18 19:27           ` Taylor Blau
2020-05-13 21:59   ` [PATCH v3 5/8] builtin/commit-graph.c: dereference tags in builtin Taylor Blau
2020-05-14 18:01     ` Jeff King
2020-05-14 18:04       ` Taylor Blau
2020-05-13 21:59   ` [PATCH v3 6/8] commit-graph.c: simplify 'fill_oids_from_commits' Taylor Blau
2020-05-13 21:59   ` [PATCH v3 7/8] t5318: reorder test below 'graph_read_expect' Taylor Blau
2020-05-13 21:59   ` [PATCH v3 8/8] commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Taylor Blau
2020-05-14 18:09     ` Jeff King
2020-05-14 18:12   ` [PATCH v3 0/8] commit-graph: drop CHECK_OIDS, peel in callers Jeff King

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git