git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] for-each-ref optimizations & usability improvements
@ 2023-11-07  1:25 Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
                   ` (10 more replies)
  0 siblings, 11 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye

This series is a bit of an informal follow-up to [1], adding some more
substantial optimizations and usability fixes around ref
filtering/formatting. Some of the changes here affect user-facing behavior,
some are internal-only, but they're all interdependent enough to warrant
putting them together in one series.

[1]
https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/

Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
'tag', and 'branch'. Currently, it just removes previous sort keys and, if
no further keys are specified, falls back on ascending refname sort (which,
IMO, makes the name '--no-sort' somewhat misleading).

Patch 2 updates the 'for-each-ref' docs to clearly state what happens if you
use '--omit-empty' and '--count' together. I based the explanation on what
the current behavior is (i.e., refs omitted with '--omit-empty' do count
towards the total limited by '--count').

Patches 3-7 incrementally refactor various parts of the ref
filtering/formatting workflows in order to create a
'filter_and_format_refs()' function. If certain conditions are met (sorting
disabled, no reachability filtering or ahead-behind formatting), ref
filtering & formatting is done within a single 'for_each_fullref_in'
callback. Especially in large repositories, this makes a huge difference in
memory usage & runtime for certain usages of 'for-each-ref', since it's no
longer writing everything to a 'struct ref_array' then repeatedly whittling
down/updating its contents.

Patch 8 introduces a new option to 'for-each-ref' called '--full-deref'.
When provided, any format fields for the dereferenced value of a tag (e.g.
"%(*objectname)") will be populated with the fully peeled target of the tag;
right now, those fields are populated with the immediate target of a tag
(which can be another tag). This avoids the need to pipe 'for-each-ref'
results to 'cat-file --batch-check' to get fully-peeled tag information. It
also benefits from the 'filter_and_format_refs()' single-iteration
optimization, since 'peel_iterated_oid()' may be able to read the
pre-computed peeled OID from a packed ref. A couple notes on this one:

 * I went with a command line option for '--full-deref' rather than another
   format specifier (like ** instead of *) because it seems unlikely that a
   user is going to want to perform a shallow dereference and a full
   dereference in the same 'for-each-ref'. There's also a NEEDSWORK going
   all the way back to the introduction of 'for-each-ref' in 9f613ddd21c
   (Add git-for-each-ref: helper for language bindings, 2006-09-15) that (to
   me) implies different dereferencing behavior corresponds to different use
   cases/user needs.
 * I'm not attached to '--full-deref' as a name - if someone has an idea for
   a more descriptive name, please suggest it!

Finally, patch 9 adds performance tests for 'for-each-ref', showing the
effects of optimizations made throughout the series. Here are some sample
results from my Ubuntu VM (test names shortened for space):

Test                                                         this branch    
----------------------------------------------------------------------------
6300.2: (loose)                                              4.78(0.89+3.82)
6300.3: (loose, no sort)                                     4.51(0.86+3.58)
6300.4: (loose, --count=1)                                   4.70(0.90+3.73)
6300.5: (loose, --count=1, no sort)                          4.35(0.58+3.73)
6300.6: (loose, tags)                                        2.45(0.44+1.95)
6300.7: (loose, tags, no sort)                               2.38(0.44+1.90)
6300.8: (loose, tags, shallow deref)                         3.33(1.27+1.99)
6300.9: (loose, tags, shallow deref, no sort)                3.29(1.29+1.93)
6300.10: (loose, tags, full deref)                           3.76(1.69+1.99)
6300.11: (loose, tags, full deref, no sort)                  3.73(1.71+1.94)
6300.12: for-each-ref + cat-file (loose, tags)               4.25(2.16+2.17)
6300.14: (packed)                                            0.61(0.50+0.09)
6300.15: (packed, no sort)                                   0.46(0.40+0.04)
6300.16: (packed, --count=1)                                 0.59(0.44+0.13)
6300.17: (packed, --count=1, no sort)                        0.02(0.01+0.01)
6300.18: (packed, tags)                                      0.28(0.18+0.09)
6300.19: (packed, tags, no sort)                             0.29(0.24+0.03)
6300.20: (packed, tags, shallow deref)                       1.20(1.03+0.13)
6300.21: (packed, tags, shallow deref, no sort)              1.13(0.99+0.08)
6300.22: (packed, tags, full deref)                          1.57(1.45+0.11)
6300.23: (packed, tags, full deref, no sort)                 1.07(1.01+0.05)
6300.24: for-each-ref + cat-file (packed, tags)              2.01(1.81+0.33)


 * Victoria

Victoria Dye (9):
  ref-filter.c: really don't sort when using --no-sort
  for-each-ref: clarify interaction of --omit-empty & --count
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.h: move contains caches into filter
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: filter & format refs in the same callback
  for-each-ref: add option to fully dereference tags
  t/perf: add perf tests for for-each-ref

 Documentation/git-for-each-ref.txt |  12 +-
 builtin/branch.c                   |  42 +++--
 builtin/for-each-ref.c             |  41 ++---
 builtin/ls-remote.c                |  10 +-
 builtin/tag.c                      |  32 +---
 ref-filter.c                       | 277 ++++++++++++++++++++---------
 ref-filter.h                       |  26 +++
 t/perf/p6300-for-each-ref.sh       |  87 +++++++++
 t/t3200-branch.sh                  |  68 ++++++-
 t/t6300-for-each-ref.sh            |  55 ++++++
 t/t7004-tag.sh                     |  45 +++++
 11 files changed, 532 insertions(+), 163 deletions(-)
 create mode 100755 t/perf/p6300-for-each-ref.sh


base-commit: bc5204569f7db44d22477485afd52ea410d83743
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1609%2Fvdye%2Fvdye%2Ffor-each-ref-optimizations-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1609/vdye/vdye/for-each-ref-optimizations-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1609
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07  1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if
the string list provided to it is empty, rather than returning the default
refname sort structure. Also update 'ref_array_sort()' to explicitly skip
sorting if its 'struct ref_sorting *' arg is NULL. Other functions using
'struct ref_sorting *' do not need any changes because they already properly
ignore NULL values.

The goal of this change is to have the '--no-sort' option truly disable
sorting in commands like 'for-each-ref, 'tag', and 'branch'. Right now,
'--no-sort' will still trigger refname sorting by default in 'for-each-ref',
'tag', and 'branch'.

To match existing behavior as closely as possible, explicitly add "refname"
to the list of sort keys in 'for-each-ref', 'tag', and 'branch' before
parsing options (if no config-based sort keys are set). This ensures that
sorting will only be fully disabled if '--no-sort' is provided as an option;
otherwise, "refname" sorting will remain the default. Note: this also means
that even when sort keys are provided on the command line, "refname" will be
the final sort key in the sorting structure. This doesn't actually change
any behavior, since 'compare_refs()' already falls back on comparing
refnames if two refs are equal w.r.t all other sort keys.

Finally, remove the condition around sorting in 'ls-remote', since it's no
longer necessary. Unlike 'for-each-ref' et. al., it does *not* set any sort
keys by default. The default empty list of sort keys will produce a NULL
'struct ref_sorting *', which causes the sorting to be skipped in
'ref_array_sort()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c        |  6 ++++
 builtin/for-each-ref.c  |  3 ++
 builtin/ls-remote.c     | 10 ++----
 builtin/tag.c           |  6 ++++
 ref-filter.c            | 19 ++----------
 t/t3200-branch.sh       | 68 +++++++++++++++++++++++++++++++++++++++--
 t/t6300-for-each-ref.sh | 21 +++++++++++++
 t/t7004-tag.sh          | 45 +++++++++++++++++++++++++++
 8 files changed, 152 insertions(+), 26 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index e7ee9bd0f15..d67738bbcaa 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -767,7 +767,13 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_branch_usage, options);
 
+	/*
+	 * Try to set sort keys from config. If config does not set any,
+	 * fall back on default (refname) sorting.
+	 */
 	git_config(git_branch_config, &sorting_options);
+	if (!sorting_options.nr)
+		string_list_append(&sorting_options, "refname");
 
 	track = git_branch_track;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 350bfa6e811..93b370f550b 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -67,6 +67,9 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	/* Set default (refname) sorting */
+	string_list_append(&sorting_options, "refname");
+
 	parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
 	if (maxcount < 0) {
 		error("invalid --count argument: `%d'", maxcount);
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index fc765754305..436249b720c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -58,6 +58,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 	struct transport *transport;
 	const struct ref *ref;
 	struct ref_array ref_array;
+	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 
 	struct option options[] = {
@@ -141,13 +142,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		item->symref = xstrdup_or_null(ref->symref);
 	}
 
-	if (sorting_options.nr) {
-		struct ref_sorting *sorting;
-
-		sorting = ref_sorting_options(&sorting_options);
-		ref_array_sort(sorting, &ref_array);
-		ref_sorting_release(sorting);
-	}
+	sorting = ref_sorting_options(&sorting_options);
+	ref_array_sort(sorting, &ref_array);
 
 	for (i = 0; i < ref_array.nr; i++) {
 		const struct ref_array_item *ref = ref_array.items[i];
diff --git a/builtin/tag.c b/builtin/tag.c
index 3918eacbb57..64f3196cd4c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -501,7 +501,13 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
+	/*
+	 * Try to set sort keys from config. If config does not set any,
+	 * fall back on default (refname) sorting.
+	 */
 	git_config(git_tag_config, &sorting_options);
+	if (!sorting_options.nr)
+		string_list_append(&sorting_options, "refname");
 
 	memset(&opt, 0, sizeof(opt));
 	filter.lines = -1;
diff --git a/ref-filter.c b/ref-filter.c
index e4d3510e28e..7250089b7c6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3142,7 +3142,8 @@ void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting,
 
 void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
 {
-	QSORT_S(array->items, array->nr, compare_refs, sorting);
+	if (sorting)
+		QSORT_S(array->items, array->nr, compare_refs, sorting);
 }
 
 static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
@@ -3248,18 +3249,6 @@ static int parse_sorting_atom(const char *atom)
 	return res;
 }
 
-/*  If no sorting option is given, use refname to sort as default */
-static struct ref_sorting *ref_default_sorting(void)
-{
-	static const char cstr_name[] = "refname";
-
-	struct ref_sorting *sorting = xcalloc(1, sizeof(*sorting));
-
-	sorting->next = NULL;
-	sorting->atom = parse_sorting_atom(cstr_name);
-	return sorting;
-}
-
 static void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *arg)
 {
 	struct ref_sorting *s;
@@ -3283,9 +3272,7 @@ struct ref_sorting *ref_sorting_options(struct string_list *options)
 	struct string_list_item *item;
 	struct ref_sorting *sorting = NULL, **tail = &sorting;
 
-	if (!options->nr) {
-		sorting = ref_default_sorting();
-	} else {
+	if (options->nr) {
 		for_each_string_list_item(item, options)
 			parse_ref_sorting(tail, item->string);
 	}
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 3182abde27f..9918ba05dec 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1570,9 +1570,10 @@ test_expect_success 'tracking with unexpected .fetch refspec' '
 
 test_expect_success 'configured committerdate sort' '
 	git init -b main sort &&
+	test_config -C sort branch.sort "committerdate" &&
+
 	(
 		cd sort &&
-		git config branch.sort committerdate &&
 		test_commit initial &&
 		git checkout -b a &&
 		test_commit a &&
@@ -1592,9 +1593,10 @@ test_expect_success 'configured committerdate sort' '
 '
 
 test_expect_success 'option override configured sort' '
+	test_config -C sort branch.sort "committerdate" &&
+
 	(
 		cd sort &&
-		git config branch.sort committerdate &&
 		git branch --sort=refname >actual &&
 		cat >expect <<-\EOF &&
 		  a
@@ -1606,10 +1608,70 @@ test_expect_success 'option override configured sort' '
 	)
 '
 
+test_expect_success '--no-sort cancels config sort keys' '
+	test_config -C sort branch.sort "-refname" &&
+
+	(
+		cd sort &&
+
+		# objecttype is identical for all of them, so sort falls back on
+		# default (ascending refname)
+		git branch \
+			--no-sort \
+			--sort="objecttype" >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		* b
+		  c
+		  main
+		EOF
+		test_cmp expect actual
+	)
+
+'
+
+test_expect_success '--no-sort cancels command line sort keys' '
+	(
+		cd sort &&
+
+		# objecttype is identical for all of them, so sort falls back on
+		# default (ascending refname)
+		git branch \
+			--sort="-refname" \
+			--no-sort \
+			--sort="objecttype" >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		* b
+		  c
+		  main
+		EOF
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--no-sort without subsequent --sort prints expected branches' '
+	(
+		cd sort &&
+
+		# Sort the results with `sort` for a consistent comparison
+		# against expected
+		git branch --no-sort | sort >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		  c
+		  main
+		* b
+		EOF
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'invalid sort parameter in configuration' '
+	test_config -C sort branch.sort "v:notvalid" &&
+
 	(
 		cd sort &&
-		git config branch.sort "v:notvalid" &&
 
 		# this works in the "listing" mode, so bad sort key
 		# is a dying offence.
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 00a060df0b5..0613e5e3623 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1335,6 +1335,27 @@ test_expect_success '--no-sort cancels the previous sort keys' '
 	test_cmp expected actual
 '
 
+test_expect_success '--no-sort without subsequent --sort prints expected refs' '
+	cat >expected <<-\EOF &&
+	refs/tags/multi-ref1-100000-user1
+	refs/tags/multi-ref1-100000-user2
+	refs/tags/multi-ref1-200000-user1
+	refs/tags/multi-ref1-200000-user2
+	refs/tags/multi-ref2-100000-user1
+	refs/tags/multi-ref2-100000-user2
+	refs/tags/multi-ref2-200000-user1
+	refs/tags/multi-ref2-200000-user2
+	EOF
+
+	# Sort the results with `sort` for a consistent comparison against
+	# expected
+	git for-each-ref \
+		--format="%(refname)" \
+		--no-sort \
+		"refs/tags/multi-*" | sort >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'do not dereference NULL upon %(HEAD) on unborn branch' '
 	test_when_finished "git checkout main" &&
 	git for-each-ref --format="%(HEAD) %(refname:short)" refs/heads/ >actual &&
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index e689db42929..b41a47eb943 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1862,6 +1862,51 @@ test_expect_success 'option override configured sort' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-sort cancels config sort keys' '
+	test_config tag.sort "-refname" &&
+
+	# objecttype is identical for all of them, so sort falls back on
+	# default (ascending refname)
+	git tag -l \
+		--no-sort \
+		--sort="objecttype" \
+		"foo*" >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success '--no-sort cancels command line sort keys' '
+	# objecttype is identical for all of them, so sort falls back on
+	# default (ascending refname)
+	git tag -l \
+		--sort="-refname" \
+		--no-sort \
+		--sort="objecttype" \
+		"foo*" >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success '--no-sort without subsequent --sort prints expected tags' '
+	# Sort the results with `sort` for a consistent comparison against
+	# expected
+	git tag -l --no-sort "foo*" | sort >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'invalid sort parameter on command line' '
 	test_must_fail git tag -l --sort=notvalid "foo*" >actual
 '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07 19:23   ` Øystein Walle
  2023-11-07  1:25 ` [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update the 'for-each-ref' builtin documentation to clarify that refs
"omitted" by --omit-empty are still counted toward the limit specified by
--count. The use of the term "omit" would otherwise be somewhat ambiguous
and could incorrectly be construed as excluding empty refs entirely (i.e.
not counting them towards the total ref count).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-for-each-ref.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index e86d5700ddf..407f624fbaa 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -101,7 +101,8 @@ OPTIONS
 
 --omit-empty::
 	Do not print a newline after formatted refs where the format expands
-	to the empty string.
+	to the empty string. Although omitted refs are not shown in the output,
+	they still count toward the total limited by `--count`.
 
 --exclude=<pattern>::
 	If one or more patterns are given, only refs which do not match
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add an internal 'array_opts' struct to 'struct ref_format' containing
formatting options that pertain to the formatting of an entire ref array:
'max_count' and 'omit_empty'. These values are specified by the '--count'
and '--omit-empty' options, respectively, to 'for-each-ref'/'tag'/'branch'.
Storing these values in the 'ref_format' will simplify the consolidation of
ref array formatting logic across builtins in later patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c       |  5 ++---
 builtin/for-each-ref.c | 21 +++++++++++----------
 builtin/tag.c          |  5 ++---
 ref-filter.h           |  5 +++++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index d67738bbcaa..5a1ec1cd04f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -45,7 +45,6 @@ static const char *head;
 static struct object_id head_oid;
 static int recurse_submodules = 0;
 static int submodule_propagate_branches = 0;
-static int omit_empty = 0;
 
 static int branch_use_color = -1;
 static char branch_colors[][COLOR_MAXLEN] = {
@@ -480,7 +479,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 			string_list_append(output, out.buf);
 		} else {
 			fwrite(out.buf, 1, out.len, stdout);
-			if (out.len || !omit_empty)
+			if (out.len || !format->array_opts.omit_empty)
 				putchar('\n');
 		}
 	}
@@ -737,7 +736,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		OPT_BIT('D', NULL, &delete, N_("delete branch (even if not merged)"), 2),
 		OPT_BIT('m', "move", &rename, N_("move/rename a branch and its reflog"), 1),
 		OPT_BIT('M', NULL, &rename, N_("move/rename a branch, even if target exists"), 2),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 		OPT_BIT('c', "copy", &copy, N_("copy a branch and its reflog"), 1),
 		OPT_BIT('C', NULL, &copy, N_("copy a branch, even if target exists"), 2),
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 93b370f550b..881c3ee055f 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -19,10 +19,10 @@ static char const * const for_each_ref_usage[] = {
 
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
-	int i;
+	int i, total;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
-	int maxcount = 0, icase = 0, omit_empty = 0;
+	int icase = 0;
 	struct ref_array array;
 	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -40,11 +40,11 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 			N_("quote placeholders suitably for python"), QUOTE_PYTHON),
 		OPT_BIT(0 , "tcl",  &format.quote_style,
 			N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 
 		OPT_GROUP(""),
-		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
+		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
@@ -71,8 +71,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	string_list_append(&sorting_options, "refname");
 
 	parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
-	if (maxcount < 0) {
-		error("invalid --count argument: `%d'", maxcount);
+	if (format.array_opts.max_count < 0) {
+		error("invalid --count argument: `%d'", format.array_opts.max_count);
 		usage_with_options(for_each_ref_usage, opts);
 	}
 	if (HAS_MULTI_BITS(format.quote_style)) {
@@ -109,15 +109,16 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	ref_array_sort(sorting, &array);
 
-	if (!maxcount || array.nr < maxcount)
-		maxcount = array.nr;
-	for (i = 0; i < maxcount; i++) {
+	total = format.array_opts.max_count;
+	if (!total || array.nr < total)
+		total = array.nr;
+	for (i = 0; i < total; i++) {
 		strbuf_reset(&err);
 		strbuf_reset(&output);
 		if (format_ref_array_item(array.items[i], &format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !omit_empty)
+		if (output.len || !format.array_opts.omit_empty)
 			putchar('\n');
 	}
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 64f3196cd4c..2d599245d48 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -44,7 +44,6 @@ static const char * const git_tag_usage[] = {
 static unsigned int colopts;
 static int force_sign_annotate;
 static int config_sign_tag = -1; /* unspecified */
-static int omit_empty = 0;
 
 static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
@@ -83,7 +82,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		if (format_ref_array_item(array.items[i], format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !omit_empty)
+		if (output.len || !format->array_opts.omit_empty)
 			putchar('\n');
 	}
 
@@ -481,7 +480,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 		OPT_WITHOUT(&filter.no_commit, N_("print only tags that don't contain the commit")),
 		OPT_MERGED(&filter, N_("print only tags that are merged")),
 		OPT_NO_MERGED(&filter, N_("print only tags that are not merged")),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 		OPT_REF_SORT(&sorting_options),
 		{
diff --git a/ref-filter.h b/ref-filter.h
index 1524bc463a5..d87d61238b7 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,11 @@ struct ref_format {
 
 	/* List of bases for ahead-behind counts. */
 	struct string_list bases;
+
+	struct {
+		int max_count;
+		int omit_empty;
+	} array_opts;
 };
 
 #define REF_FILTER_INIT { \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 4/9] ref-filter.h: move contains caches into filter
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-11-07  1:25 ` [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07  1:25 ` [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into
an 'internal' struct of the 'struct ref_filter'. In later patches, the
'struct ref_filter *' will be a common data structure across multiple
filtering functions. These caches are part of the common functionality the
filter struct will support, so they are updated to be internally accessible
wherever the filter is used.

The design used here is mirrors what was introduced in 576de3d956
(unpack_trees: start splitting internal fields from public API, 2023-02-27)
for 'unpack_trees_options'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 14 ++++++--------
 ref-filter.h |  6 ++++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 7250089b7c6..5129b6986c9 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2764,8 +2764,6 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 struct ref_filter_cbdata {
 	struct ref_array *array;
 	struct ref_filter *filter;
-	struct contains_cache contains_cache;
-	struct contains_cache no_contains_cache;
 };
 
 /*
@@ -2816,11 +2814,11 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 			return 0;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
-		    !commit_contains(filter, commit, filter->with_commit, &ref_cbdata->contains_cache))
+		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
 			return 0;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
-		    commit_contains(filter, commit, filter->no_commit, &ref_cbdata->no_contains_cache))
+		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
 			return 0;
 	}
 
@@ -2989,8 +2987,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	save_commit_buffer_orig = save_commit_buffer;
 	save_commit_buffer = 0;
 
-	init_contains_cache(&ref_cbdata.contains_cache);
-	init_contains_cache(&ref_cbdata.no_contains_cache);
+	init_contains_cache(&filter->internal.contains_cache);
+	init_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Simple per-ref filtering */
 	if (!filter->kind)
@@ -3014,8 +3012,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 			head_ref(ref_filter_handler, &ref_cbdata);
 	}
 
-	clear_contains_cache(&ref_cbdata.contains_cache);
-	clear_contains_cache(&ref_cbdata.no_contains_cache);
+	clear_contains_cache(&filter->internal.contains_cache);
+	clear_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Filters that need revision walking */
 	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
diff --git a/ref-filter.h b/ref-filter.h
index d87d61238b7..0db3ff52889 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -7,6 +7,7 @@
 #include "commit.h"
 #include "string-list.h"
 #include "strvec.h"
+#include "commit-reach.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -75,6 +76,11 @@ struct ref_filter {
 		lines;
 	int abbrev,
 		verbose;
+
+	struct {
+		struct contains_cache contains_cache;
+		struct contains_cache no_contains_cache;
+	} internal;
 };
 
 struct ref_format {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-11-07  1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07  1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add two new public methods to 'ref-filter.h':

* 'print_formatted_ref_array()' which, given a format specification & array
  of ref items, formats and prints the items to stdout.
* 'filter_and_format_refs()' which combines 'filter_refs()',
  'ref_array_sort()', and 'print_formatted_ref_array()' into a single
  function.

This consolidates much of the code used to filter and format refs in
'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing
duplication and simplifying the future changes needed to optimize the filter
& format process.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c       | 33 +++++++++++++++++----------------
 builtin/for-each-ref.c | 27 +--------------------------
 builtin/tag.c          | 23 +----------------------
 ref-filter.c           | 35 +++++++++++++++++++++++++++++++++++
 ref-filter.h           | 14 ++++++++++++++
 5 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 5a1ec1cd04f..2ed59f16f1c 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -437,8 +437,6 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 {
 	int i;
 	struct ref_array array;
-	struct strbuf out = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	int maxwidth = 0;
 	const char *remote_prefix = "";
 	char *to_free = NULL;
@@ -468,24 +466,27 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
-	for (i = 0; i < array.nr; i++) {
-		strbuf_reset(&err);
-		strbuf_reset(&out);
-		if (format_ref_array_item(array.items[i], format, &out, &err))
-			die("%s", err.buf);
-		if (column_active(colopts)) {
-			assert(!filter->verbose && "--column and --verbose are incompatible");
-			 /* format to a string_list to let print_columns() do its job */
+	if (column_active(colopts)) {
+		struct strbuf out = STRBUF_INIT, err = STRBUF_INIT;
+
+		assert(!filter->verbose && "--column and --verbose are incompatible");
+
+		for (i = 0; i < array.nr; i++) {
+			strbuf_reset(&err);
+			strbuf_reset(&out);
+			if (format_ref_array_item(array.items[i], format, &out, &err))
+				die("%s", err.buf);
+
+			/* format to a string_list to let print_columns() do its job */
 			string_list_append(output, out.buf);
-		} else {
-			fwrite(out.buf, 1, out.len, stdout);
-			if (out.len || !format->array_opts.omit_empty)
-				putchar('\n');
 		}
+
+		strbuf_release(&err);
+		strbuf_release(&out);
+	} else {
+		print_formatted_ref_array(&array, format);
 	}
 
-	strbuf_release(&err);
-	strbuf_release(&out);
 	ref_array_clear(&array);
 	free(to_free);
 }
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 881c3ee055f..1c19cd5bd34 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -19,15 +19,11 @@ static char const * const for_each_ref_usage[] = {
 
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
-	int i, total;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int icase = 0;
-	struct ref_array array;
 	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
-	struct strbuf output = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	int from_stdin = 0;
 	struct strvec vec = STRVEC_INIT;
 
@@ -61,8 +57,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 
-	memset(&array, 0, sizeof(array));
-
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
 	git_config(git_default_config, NULL);
@@ -104,27 +98,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	}
 
 	filter.match_as_path = 1;
-	filter_refs(&array, &filter, FILTER_REFS_ALL);
-	filter_ahead_behind(the_repository, &format, &array);
-
-	ref_array_sort(sorting, &array);
-
-	total = format.array_opts.max_count;
-	if (!total || array.nr < total)
-		total = array.nr;
-	for (i = 0; i < total; i++) {
-		strbuf_reset(&err);
-		strbuf_reset(&output);
-		if (format_ref_array_item(array.items[i], &format, &output, &err))
-			die("%s", err.buf);
-		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !format.array_opts.omit_empty)
-			putchar('\n');
-	}
+	filter_and_format_refs(&filter, FILTER_REFS_ALL, sorting, &format);
 
-	strbuf_release(&err);
-	strbuf_release(&output);
-	ref_array_clear(&array);
 	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
diff --git a/builtin/tag.c b/builtin/tag.c
index 2d599245d48..2528d499dd8 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -48,13 +48,7 @@ static int config_sign_tag = -1; /* unspecified */
 static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
 {
-	struct ref_array array;
-	struct strbuf output = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	char *to_free = NULL;
-	int i;
-
-	memset(&array, 0, sizeof(array));
 
 	if (filter->lines == -1)
 		filter->lines = 0;
@@ -72,23 +66,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
-	filter_refs(&array, filter, FILTER_REFS_TAGS);
-	filter_ahead_behind(the_repository, format, &array);
-	ref_array_sort(sorting, &array);
-
-	for (i = 0; i < array.nr; i++) {
-		strbuf_reset(&output);
-		strbuf_reset(&err);
-		if (format_ref_array_item(array.items[i], format, &output, &err))
-			die("%s", err.buf);
-		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !format->array_opts.omit_empty)
-			putchar('\n');
-	}
+	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
 
-	strbuf_release(&err);
-	strbuf_release(&output);
-	ref_array_clear(&array);
 	free(to_free);
 
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index 5129b6986c9..8992fbf45b1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3023,6 +3023,18 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	return ret;
 }
 
+void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
+			    struct ref_sorting *sorting,
+			    struct ref_format *format)
+{
+	struct ref_array array = { 0 };
+	filter_refs(&array, filter, type);
+	filter_ahead_behind(the_repository, format, &array);
+	ref_array_sort(sorting, &array);
+	print_formatted_ref_array(&array, format);
+	ref_array_clear(&array);
+}
+
 static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
 {
 	if (!(a->kind ^ b->kind))
@@ -3212,6 +3224,29 @@ int format_ref_array_item(struct ref_array_item *info,
 	return 0;
 }
 
+void print_formatted_ref_array(struct ref_array *array, struct ref_format *format)
+{
+	int total;
+	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
+
+	total = format->array_opts.max_count;
+	if (!total || array->nr < total)
+		total = array->nr;
+	for (int i = 0; i < total; i++) {
+		strbuf_reset(&err);
+		strbuf_reset(&output);
+		if (format_ref_array_item(array->items[i], format, &output, &err))
+			die("%s", err.buf);
+		if (output.len || !format->array_opts.omit_empty) {
+			fwrite(output.buf, 1, output.len, stdout);
+			putchar('\n');
+		}
+	}
+
+	strbuf_release(&err);
+	strbuf_release(&output);
+}
+
 void pretty_print_ref(const char *name, const struct object_id *oid,
 		      struct ref_format *format)
 {
diff --git a/ref-filter.h b/ref-filter.h
index 0db3ff52889..0ce5af58ab3 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -137,6 +137,14 @@ struct ref_format {
  * filtered refs in the ref_array structure.
  */
 int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
+/*
+ * Filter refs using the given ref_filter and type, sort the contents
+ * according to the given ref_sorting, format the filtered refs with the
+ * given ref_format, and print them to stdout.
+ */
+void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
+			    struct ref_sorting *sorting,
+			    struct ref_format *format);
 /*  Clear all memory allocated to ref_array */
 void ref_array_clear(struct ref_array *array);
 /*  Used to verify if the given format is correct and to parse out the used atoms */
@@ -161,6 +169,12 @@ char *get_head_description(void);
 /*  Set up translated strings in the output. */
 void setup_ref_filter_porcelain_msg(void);
 
+/*
+ * Print up to maxcount ref_array elements to stdout using the given
+ * ref_format.
+ */
+void print_formatted_ref_array(struct ref_array *array, struct ref_format *format);
+
 /*
  * Print a single ref, outside of any ref-filter. Note that the
  * name must be a fully qualified refname.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 6/9] ref-filter.c: refactor to create common helper functions
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (4 preceding siblings ...)
  2023-11-07  1:25 ` [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07  1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
'filter_refs()' into new helper functions ('ref_array_append()',
'apply_ref_filter()', and 'do_filter_refs()' respectively), as well as
rename 'ref_filter_handler()' to 'filter_one()'. In this and later
patches, these helpers will be used by new ref-filter API functions. This
patch does not result in any user-facing behavior changes or changes to
callers outside of 'ref-filter.c'.

The changes are as follows:

* The logic to grow a 'struct ref_array' and append a given 'struct
  ref_array_item *' to it is extracted from 'ref_array_push()' into
  'ref_array_append()'.
* 'ref_filter_handler()' is renamed to 'filter_one()' to more clearly
  distinguish it from other ref filtering callbacks that will be added in
  later patches. The "*_one()" naming convention is common throughout the
  codebase for iteration callbacks.
* The code to filter a given ref by refname & object ID then create a new
  'struct ref_array_item' is moved out of 'filter_one()' and into
  'apply_ref_filter()'. 'apply_ref_filter()' returns either NULL (if the ref
  does not match the given filter) or a 'struct ref_array_item *' created
  with 'new_ref_array_item()'; 'filter_one()' appends that item to
  its ref array with 'ref_array_append()'.
* The filter pre-processing, contains cache creation, and ref iteration of
  'filter_refs()' is extracted into 'do_filter_refs()'. 'do_filter_refs()'
  takes its ref iterator function & callback data as an input from the
  caller, setting it up to be used with additional filtering callbacks in
  later patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 115 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 46 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 8992fbf45b1..ff00ab4b8d8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2716,15 +2716,18 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 	return ref;
 }
 
+static void ref_array_append(struct ref_array *array, struct ref_array_item *ref)
+{
+	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
+	array->items[array->nr++] = ref;
+}
+
 struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid)
 {
 	struct ref_array_item *ref = new_ref_array_item(refname, oid);
-
-	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
-	array->items[array->nr++] = ref;
-
+	ref_array_append(array, ref);
 	return ref;
 }
 
@@ -2761,46 +2764,36 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 	return ref_kind_from_refname(refname);
 }
 
-struct ref_filter_cbdata {
-	struct ref_array *array;
-	struct ref_filter *filter;
-};
-
-/*
- * A call-back given to for_each_ref().  Filter refs and keep them for
- * later object processing.
- */
-static int ref_filter_handler(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
+			    int flag, struct ref_filter *filter)
 {
-	struct ref_filter_cbdata *ref_cbdata = cb_data;
-	struct ref_filter *filter = ref_cbdata->filter;
 	struct ref_array_item *ref;
 	struct commit *commit = NULL;
 	unsigned int kind;
 
 	if (flag & REF_BAD_NAME) {
 		warning(_("ignoring ref with broken name %s"), refname);
-		return 0;
+		return NULL;
 	}
 
 	if (flag & REF_ISBROKEN) {
 		warning(_("ignoring broken ref %s"), refname);
-		return 0;
+		return NULL;
 	}
 
 	/* Obtain the current ref kind from filter_ref_kind() and ignore unwanted refs. */
 	kind = filter_ref_kind(filter, refname);
 	if (!(kind & filter->kind))
-		return 0;
+		return NULL;
 
 	if (!filter_pattern_match(filter, refname))
-		return 0;
+		return NULL;
 
 	if (filter_exclude_match(filter, refname))
-		return 0;
+		return NULL;
 
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
-		return 0;
+		return NULL;
 
 	/*
 	 * A merge filter is applied on refs pointing to commits. Hence
@@ -2811,15 +2804,15 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	    filter->with_commit || filter->no_commit || filter->verbose) {
 		commit = lookup_commit_reference_gently(the_repository, oid, 1);
 		if (!commit)
-			return 0;
+			return NULL;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
 		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
-			return 0;
+			return NULL;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
 		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
-			return 0;
+			return NULL;
 	}
 
 	/*
@@ -2827,11 +2820,32 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	 * to do its job and the resulting list may yet to be pruned
 	 * by maxcount logic.
 	 */
-	ref = ref_array_push(ref_cbdata->array, refname, oid);
+	ref = new_ref_array_item(refname, oid);
 	ref->commit = commit;
 	ref->flag = flag;
 	ref->kind = kind;
 
+	return ref;
+}
+
+struct ref_filter_cbdata {
+	struct ref_array *array;
+	struct ref_filter *filter;
+};
+
+/*
+ * A call-back given to for_each_ref().  Filter refs and keep them for
+ * later object processing.
+ */
+static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+{
+	struct ref_filter_cbdata *ref_cbdata = cb_data;
+	struct ref_array_item *ref;
+
+	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	if (ref)
+		ref_array_append(ref_cbdata->array, ref);
+
 	return 0;
 }
 
@@ -2967,26 +2981,12 @@ void filter_ahead_behind(struct repository *r,
 	free(commits);
 }
 
-/*
- * API for filtering a set of refs. Based on the type of refs the user
- * has requested, we iterate through those refs and apply filters
- * as per the given ref_filter structure and finally store the
- * filtered refs in the ref_array structure.
- */
-int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
+static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
 {
-	struct ref_filter_cbdata ref_cbdata;
-	int save_commit_buffer_orig;
 	int ret = 0;
 
-	ref_cbdata.array = array;
-	ref_cbdata.filter = filter;
-
 	filter->kind = type & FILTER_REFS_KIND_MASK;
 
-	save_commit_buffer_orig = save_commit_buffer;
-	save_commit_buffer = 0;
-
 	init_contains_cache(&filter->internal.contains_cache);
 	init_contains_cache(&filter->internal.no_contains_cache);
 
@@ -3001,20 +3001,43 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/heads/", fn, cb_data);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/remotes/", fn, cb_data);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/tags/", fn, cb_data);
 		else if (filter->kind & FILTER_REFS_ALL)
-			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
 		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			head_ref(ref_filter_handler, &ref_cbdata);
+			head_ref(fn, cb_data);
 	}
 
 	clear_contains_cache(&filter->internal.contains_cache);
 	clear_contains_cache(&filter->internal.no_contains_cache);
 
+	return ret;
+}
+
+/*
+ * API for filtering a set of refs. Based on the type of refs the user
+ * has requested, we iterate through those refs and apply filters
+ * as per the given ref_filter structure and finally store the
+ * filtered refs in the ref_array structure.
+ */
+int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
+{
+	struct ref_filter_cbdata ref_cbdata;
+	int save_commit_buffer_orig;
+	int ret = 0;
+
+	ref_cbdata.array = array;
+	ref_cbdata.filter = filter;
+
+	save_commit_buffer_orig = save_commit_buffer;
+	save_commit_buffer = 0;
+
+	ret = do_filter_refs(filter, type, filter_one, &ref_cbdata);
+
 	/*  Filters that need revision walking */
 	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
 	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 7/9] ref-filter.c: filter & format refs in the same callback
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (5 preceding siblings ...)
  2023-11-07  1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
@ 2023-11-07  1:25 ` Victoria Dye via GitGitGadget
  2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07  1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:25 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'filter_and_format_refs()' to try to perform ref filtering &
formatting in a single ref iteration, without an intermediate 'struct
ref_array'. This can only be done if no operations need to be performed on a
pre-filtered array; specifically, if the refs are

- filtered on reachability,
- sorted, or
- formatted with ahead-behind information

they cannot be filtered & formatted in the same iteration. In that case,
fall back on the current filter-then-sort-then-format flow.

This optimization substantially improves memory usage due to no longer
storing a ref array in memory. In some cases, it also dramatically reduces
runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
all refs into a 'struct ref_array' to printing only the first ref).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 74 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index ff00ab4b8d8..384cf1595ff 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2863,6 +2863,44 @@ static void free_array_item(struct ref_array_item *item)
 	free(item);
 }
 
+struct ref_filter_and_format_cbdata {
+	struct ref_filter *filter;
+	struct ref_format *format;
+
+	struct ref_filter_and_format_internal {
+		int count;
+	} internal;
+};
+
+static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+{
+	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
+	struct ref_array_item *ref;
+	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
+
+	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	if (!ref)
+		return 0;
+
+	if (format_ref_array_item(ref, ref_cbdata->format, &output, &err))
+		die("%s", err.buf);
+
+	if (output.len || !ref_cbdata->format->array_opts.omit_empty) {
+		fwrite(output.buf, 1, output.len, stdout);
+		putchar('\n');
+	}
+
+	strbuf_release(&output);
+	strbuf_release(&err);
+	free_array_item(ref);
+
+	if (ref_cbdata->format->array_opts.max_count &&
+	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
+		return -1;
+
+	return 0;
+}
+
 /* Free all memory allocated for ref_array */
 void ref_array_clear(struct ref_array *array)
 {
@@ -3046,16 +3084,46 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	return ret;
 }
 
+static inline int can_do_iterative_format(struct ref_filter *filter,
+					  struct ref_sorting *sorting,
+					  struct ref_format *format)
+{
+	/*
+	 * Refs can be filtered and formatted in the same iteration as long
+	 * as we aren't filtering on reachability, sorting the results, or
+	 * including ahead-behind information in the formatted output.
+	 */
+	return !(filter->reachable_from ||
+		 filter->unreachable_from ||
+		 sorting ||
+		 format->bases.nr);
+}
+
 void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 			    struct ref_sorting *sorting,
 			    struct ref_format *format)
 {
-	struct ref_array array = { 0 };
-	filter_refs(&array, filter, type);
-	filter_ahead_behind(the_repository, format, &array);
-	ref_array_sort(sorting, &array);
-	print_formatted_ref_array(&array, format);
-	ref_array_clear(&array);
+	if (can_do_iterative_format(filter, sorting, format)) {
+		int save_commit_buffer_orig;
+		struct ref_filter_and_format_cbdata ref_cbdata = {
+			.filter = filter,
+			.format = format,
+		};
+
+		save_commit_buffer_orig = save_commit_buffer;
+		save_commit_buffer = 0;
+
+		do_filter_refs(filter, type, filter_and_format_one, &ref_cbdata);
+
+		save_commit_buffer = save_commit_buffer_orig;
+	} else {
+		struct ref_array array = { 0 };
+		filter_refs(&array, filter, type);
+		filter_ahead_behind(the_repository, format, &array);
+		ref_array_sort(sorting, &array);
+		print_formatted_ref_array(&array, format);
+		ref_array_clear(&array);
+	}
 }
 
 static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (6 preceding siblings ...)
  2023-11-07  1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
@ 2023-11-07  1:26 ` Victoria Dye via GitGitGadget
  2023-11-07 10:50   ` Patrick Steinhardt
  2023-11-07  1:26 ` [PATCH 9/9] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:26 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a boolean flag '--full-deref' that, when enabled, fills '%(*fieldname)'
format fields using the fully peeled target of tag objects, rather than the
immediate target.

In other builtins ('rev-parse', 'show-ref'), "dereferencing" tags typically
means peeling them down to their non-tag target. Unlike these commands,
'for-each-ref' dereferences only one "level" of tags in '*' format fields
(like "%(*objectname)"). For most annotated tags, one level of dereferencing
is enough, since most tags point to commits or trees. However, nested tags
(annotated tags whose target is another annotated tag) dereferenced once
will point to their target tag, different a full peel to e.g. a commit.

Currently, if a user wants to filter & format refs and include information
about the fully dereferenced tag, they can do so with something like
'cat-file --batch-check':

    git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
        git cat-file --batch-check="%(objectname) %(rest)"

But the combination of commands is inefficient. So, to improve the
efficiency of this use case, add a '--full-deref' option that causes
'for-each-ref' to fully dereference tags when formatting with '*' fields.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-for-each-ref.txt |  9 ++++++++
 builtin/for-each-ref.c             |  2 ++
 ref-filter.c                       | 26 ++++++++++++++---------
 ref-filter.h                       |  1 +
 t/t6300-for-each-ref.sh            | 34 ++++++++++++++++++++++++++++++
 5 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 407f624fbaa..2714a87088e 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
 		   [(--sort=<key>)...] [--format=<format>]
 		   [ --stdin | <pattern>... ]
+		   [--full-deref]
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
@@ -77,6 +78,14 @@ OPTIONS
 	the specified host language.  This is meant to produce
 	a scriptlet that can directly be `eval`ed.
 
+--full-deref::
+	Populate dereferenced format fields (indicated with an asterisk (`*`)
+	prefix before the fieldname) with information about the fully-peeled
+	target object of a tag ref, rather than its immediate target object.
+	This only affects the output for nested annotated tags, where the tag's
+	immediate target is another tag but its fully-peeled target is another
+	object type (e.g. a commit).
+
 --points-at=<object>::
 	Only list refs which points at the given object.
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 1c19cd5bd34..7a2127a3bc4 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -43,6 +43,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
+		OPT_BOOL(0, "full-deref", &format.full_deref,
+			 N_("fully dereference tags to populate '*' format fields")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
 		OPT_REF_SORT(&sorting_options),
 		OPT_CALLBACK(0, "points-at", &filter.points_at,
diff --git a/ref-filter.c b/ref-filter.c
index 384cf1595ff..a66ac7921b1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -237,7 +237,14 @@ static struct used_atom {
 		char *head;
 	} u;
 } *used_atom;
-static int used_atom_cnt, need_tagged, need_symref;
+static int used_atom_cnt, need_symref;
+
+enum tag_dereference_mode {
+	NO_DEREF = 0,
+	DEREF_ONE,
+	DEREF_ALL
+};
+static enum tag_dereference_mode need_tagged;
 
 /*
  * Expand string, append it to strbuf *sb, then return error code ret.
@@ -1066,8 +1073,8 @@ static int parse_ref_filter_atom(struct ref_format *format,
 	memset(&used_atom[at].u, 0, sizeof(used_atom[at].u));
 	if (valid_atom[i].parser && valid_atom[i].parser(format, &used_atom[at], arg, err))
 		return -1;
-	if (*atom == '*')
-		need_tagged = 1;
+	if (*atom == '*' && !need_tagged)
+		need_tagged = format->full_deref ? DEREF_ALL : DEREF_ONE;
 	if (i == ATOM_SYMREF)
 		need_symref = 1;
 	return at;
@@ -2511,14 +2518,13 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	 * If it is a tag object, see if we use a value that derefs
 	 * the object, and if we do grab the object it refers to.
 	 */
-	oi_deref.oid = *get_tagged_oid((struct tag *)obj);
+	if (need_tagged == DEREF_ALL) {
+		if (peel_iterated_oid(&obj->oid, &oi_deref.oid))
+			die("bad tag");
+	} else {
+		oi_deref.oid = *get_tagged_oid((struct tag *)obj);
+	}
 
-	/*
-	 * NEEDSWORK: This derefs tag only once, which
-	 * is good to deal with chains of trust, but
-	 * is not consistent with what deref_tag() does
-	 * which peels the onion to the core.
-	 */
 	return get_object(ref, 1, &obj, &oi_deref, err);
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index 0ce5af58ab3..0caa39ecee5 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,7 @@ struct ref_format {
 	const char *rest;
 	int quote_style;
 	int use_color;
+	int full_deref;
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 0613e5e3623..3c2af785cdb 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1839,6 +1839,40 @@ test_expect_success 'git for-each-ref with non-existing refs' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'git for-each-ref with nested tags' '
+	git tag -am "Normal tag" nested/base HEAD &&
+	git tag -am "Nested tag" nested/nest1 refs/tags/nested/base &&
+	git tag -am "Double nested tag" nested/nest2 refs/tags/nested/nest1 &&
+
+	head_oid="$(git rev-parse HEAD)" &&
+	base_tag_oid="$(git rev-parse refs/tags/nested/base)" &&
+	nest1_tag_oid="$(git rev-parse refs/tags/nested/nest1)" &&
+	nest2_tag_oid="$(git rev-parse refs/tags/nested/nest2)" &&
+
+	# Without full dereference
+	cat >expect <<-EOF &&
+	refs/tags/nested/base $base_tag_oid tag $head_oid commit
+	refs/tags/nested/nest1 $nest1_tag_oid tag $base_tag_oid tag
+	refs/tags/nested/nest2 $nest2_tag_oid tag $nest1_tag_oid tag
+	EOF
+
+	git for-each-ref --format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
+		refs/tags/nested/ >actual &&
+	test_cmp expect actual &&
+
+	# With full dereference
+	cat >expect <<-EOF &&
+	refs/tags/nested/base $base_tag_oid tag $head_oid commit
+	refs/tags/nested/nest1 $nest1_tag_oid tag $head_oid commit
+	refs/tags/nested/nest2 $nest2_tag_oid tag $head_oid commit
+	EOF
+
+	git for-each-ref --full-deref \
+		--format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
+		refs/tags/nested/ >actual &&
+	test_cmp expect actual
+'
+
 GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
 TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 9/9] t/perf: add perf tests for for-each-ref
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (7 preceding siblings ...)
  2023-11-07  1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
@ 2023-11-07  1:26 ` Victoria Dye via GitGitGadget
  2023-11-07  2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
  10 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-07  1:26 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add performance tests for 'for-each-ref'. The tests exercise different
combinations of filters/formats/options, as well as the overall performance
of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
performance difference vs. 'git for-each-ref --full-deref'.

All tests are run against a repository with 40k loose refs - 10k commits,
each having a unique:

- branch
- custom ref (refs/custom/special_*)
- annotated tag pointing at the commit
- annotated tag pointing at the other annotated tag (i.e., a nested tag)

After those tests are finished, the refs are packed with 'pack-refs --all'
and the same tests are rerun.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p6300-for-each-ref.sh | 87 ++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100755 t/perf/p6300-for-each-ref.sh

diff --git a/t/perf/p6300-for-each-ref.sh b/t/perf/p6300-for-each-ref.sh
new file mode 100755
index 00000000000..172fd68a4e9
--- /dev/null
+++ b/t/perf/p6300-for-each-ref.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+
+test_description='performance of for-each-ref'
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+ref_count_per_type=10000
+test_iteration_count=10
+
+test_expect_success "setup" '
+	test_commit_bulk $(( 1 + $ref_count_per_type )) &&
+
+	# Create refs
+	test_seq $ref_count_per_type |
+		sed "s,.*,update refs/heads/branch_& HEAD~&\nupdate refs/custom/special_& HEAD~&," |
+		git update-ref --stdin &&
+
+	# Create annotated tags
+	for i in $(test_seq $ref_count_per_type)
+	do
+		# Base tags
+		echo "tag tag_$i" &&
+		echo "mark :$i" &&
+		echo "from HEAD~$i" &&
+		printf "tagger %s <%s> %s\n" \
+			"$GIT_COMMITTER_NAME" \
+			"$GIT_COMMITTER_EMAIL" \
+			"$GIT_COMMITTER_DATE" &&
+		echo "data <<EOF" &&
+		echo "tag $i" &&
+		echo "EOF" &&
+
+		# Nested tags
+		echo "tag nested_$i" &&
+		echo "from :$i" &&
+		printf "tagger %s <%s> %s\n" \
+			"$GIT_COMMITTER_NAME" \
+			"$GIT_COMMITTER_EMAIL" \
+			"$GIT_COMMITTER_DATE" &&
+		echo "data <<EOF" &&
+		echo "nested tag $i" &&
+		echo "EOF" || return 1
+	done | git fast-import
+'
+
+test_for_each_ref () {
+	title="for-each-ref"
+	if test $# -gt 0; then
+		title="$title ($1)"
+		shift
+	fi
+	args="$@"
+
+	test_perf "$title" "
+		for i in \$(test_seq $test_iteration_count); do
+			git for-each-ref $args >/dev/null
+		done
+	"
+}
+
+run_tests () {
+	test_for_each_ref "$1"
+	test_for_each_ref "$1, no sort" --no-sort
+	test_for_each_ref "$1, tags" refs/tags/
+	test_for_each_ref "$1, tags, no sort" --no-sort refs/tags/
+	test_for_each_ref "$1, tags, shallow deref" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+	test_for_each_ref "$1, tags, shallow deref, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+	test_for_each_ref "$1, tags, full deref" --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+	test_for_each_ref "$1, tags, full deref, no sort" --no-sort --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+
+	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (full deref)" "
+		for i in \$(test_seq $test_iteration_count); do
+			git for-each-ref --format='%(objectname)^{} %(refname) %(objectname)' refs/tags/ | \
+				git cat-file --batch-check='%(objectname) %(rest)' >/dev/null
+		done
+	"
+}
+
+run_tests "loose"
+
+test_expect_success 'pack refs' '
+	git pack-refs --all
+'
+run_tests "packed"
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/9] for-each-ref optimizations & usability improvements
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (8 preceding siblings ...)
  2023-11-07  1:26 ` [PATCH 9/9] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
@ 2023-11-07  2:36 ` Junio C Hamano
  2023-11-07  2:48   ` Victoria Dye
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
  10 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2023-11-07  2:36 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series is a bit of an informal follow-up to [1], adding some more
> substantial optimizations and usability fixes around ref
> filtering/formatting. Some of the changes here affect user-facing behavior,
> some are internal-only, but they're all interdependent enough to warrant
> putting them together in one series.
>
> [1]
> https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/
>
> Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
> 'tag', and 'branch'. Currently, it just removes previous sort keys and, if
> no further keys are specified, falls back on ascending refname sort (which,
> IMO, makes the name '--no-sort' somewhat misleading).

We can read it changes the behaviour and what the current behaviour
is, but I presume that the untold new behaviour with --no-sort is to
show the output in an unspecified order of implementation's
convenience?  I think it makes quite a lot of sense if that is what
is done.

> Patch 2 updates the 'for-each-ref' docs to clearly state what happens if you
> use '--omit-empty' and '--count' together. I based the explanation on what
> the current behavior is (i.e., refs omitted with '--omit-empty' do count
> towards the total limited by '--count').

OK.

> Patches 3-7 incrementally refactor various parts of the ref
> filtering/formatting workflows in order to create a
> 'filter_and_format_refs()' function. If certain conditions are met (sorting
> disabled, no reachability filtering or ahead-behind formatting), ref
> filtering & formatting is done within a single 'for_each_fullref_in'
> callback. Especially in large repositories, this makes a huge difference in
> memory usage & runtime for certain usages of 'for-each-ref', since it's no
> longer writing everything to a 'struct ref_array' then repeatedly whittling
> down/updating its contents.

OK.  I was wondering if you are going threaded implementation, until
I read into 6th line ;-)

> Patch 8 introduces a new option to 'for-each-ref' called '--full-deref'.
> When provided, any format fields for the dereferenced value of a tag (e.g.
> "%(*objectname)") will be populated with the fully peeled target of the tag;
> right now, those fields are populated with the immediate target of a tag
> (which can be another tag). This avoids the need to pipe 'for-each-ref'
> results to 'cat-file --batch-check' to get fully-peeled tag information. It
> also benefits from the 'filter_and_format_refs()' single-iteration
> optimization, since 'peel_iterated_oid()' may be able to read the
> pre-computed peeled OID from a packed ref. A couple notes on this one:
>
>  * I went with a command line option for '--full-deref' rather than another
>    format specifier (like ** instead of *) because it seems unlikely that a
>    user is going to want to perform a shallow dereference and a full
>    dereference in the same 'for-each-ref'. There's also a NEEDSWORK going
>    all the way back to the introduction of 'for-each-ref' in 9f613ddd21c
>    (Add git-for-each-ref: helper for language bindings, 2006-09-15) that (to
>    me) implies different dereferencing behavior corresponds to different use
>    cases/user needs.

Makes quite a lot of sense.

>  * I'm not attached to '--full-deref' as a name - if someone has an idea for
>    a more descriptive name, please suggest it!

Another candidate verb may be "to peel", and I have no strong
opinion between it and "to dereference".  But I have a mild aversion
to an abbreviation that is not strongly established.

> Finally, patch 9 adds performance tests for 'for-each-ref', showing the
> effects of optimizations made throughout the series. Here are some sample
> results from my Ubuntu VM (test names shortened for space):

Nice.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/9] for-each-ref optimizations & usability improvements
  2023-11-07  2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
@ 2023-11-07  2:48   ` Victoria Dye
  2023-11-07  3:04     ` Junio C Hamano
  2023-11-07 10:49     ` Patrick Steinhardt
  0 siblings, 2 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-07  2:48 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This series is a bit of an informal follow-up to [1], adding some more
>> substantial optimizations and usability fixes around ref
>> filtering/formatting. Some of the changes here affect user-facing behavior,
>> some are internal-only, but they're all interdependent enough to warrant
>> putting them together in one series.
>>
>> [1]
>> https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/
>>
>> Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
>> 'tag', and 'branch'. Currently, it just removes previous sort keys and, if
>> no further keys are specified, falls back on ascending refname sort (which,
>> IMO, makes the name '--no-sort' somewhat misleading).
> 
> We can read it changes the behaviour and what the current behaviour
> is, but I presume that the untold new behaviour with --no-sort is to
> show the output in an unspecified order of implementation's
> convenience?  I think it makes quite a lot of sense if that is what
> is done.

Ah sorry, I over-edited my cover letter and accidentally removed the
explanation of what this patch does! Yes - the new behavior is that
'--no-sort' (assuming there are no subsequent --sort=<something> options)
will completely skip sorting the filtered refs. 

>>  * I'm not attached to '--full-deref' as a name - if someone has an idea for
>>    a more descriptive name, please suggest it!
> 
> Another candidate verb may be "to peel", and I have no strong
> opinion between it and "to dereference".  But I have a mild aversion
> to an abbreviation that is not strongly established.
> 

Makes sense. I got the "deref" abbreviation for 'update-ref --no-deref', but
'show-ref' has a "--dereference" option and protocol v2's "ls-refs" includes
a "peel" arg. "Dereference" is the term already used in the 'for-each-ref'
documentation, though, so if no one comes in with an especially strong
opinion on this I'll change the option to '--full-dereference'. Thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/9] for-each-ref optimizations & usability improvements
  2023-11-07  2:48   ` Victoria Dye
@ 2023-11-07  3:04     ` Junio C Hamano
  2023-11-07 10:49     ` Patrick Steinhardt
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-07  3:04 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Victoria Dye via GitGitGadget, git

Victoria Dye <vdye@github.com> writes:

> Ah sorry, I over-edited my cover letter and accidentally removed the
> explanation of what this patch does! Yes - the new behavior is that
> '--no-sort' (assuming there are no subsequent --sort=<something> options)
> will completely skip sorting the filtered refs. 

Makes sense.

And the way to countermand "--no-sort" that appears earlier on the
command line to revert to the default sort order is "--sort" that
uses "refname" as the sort key, which is also nice.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/9] for-each-ref optimizations & usability improvements
  2023-11-07  2:48   ` Victoria Dye
  2023-11-07  3:04     ` Junio C Hamano
@ 2023-11-07 10:49     ` Patrick Steinhardt
  2023-11-08  1:31       ` Victoria Dye
  1 sibling, 1 reply; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:49 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, Victoria Dye via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 1790 bytes --]

On Mon, Nov 06, 2023 at 06:48:29PM -0800, Victoria Dye wrote:
> Junio C Hamano wrote:
> > "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
[snip]
> >>  * I'm not attached to '--full-deref' as a name - if someone has an idea for
> >>    a more descriptive name, please suggest it!
> > 
> > Another candidate verb may be "to peel", and I have no strong
> > opinion between it and "to dereference".  But I have a mild aversion
> > to an abbreviation that is not strongly established.
> > 
> 
> Makes sense. I got the "deref" abbreviation for 'update-ref --no-deref', but
> 'show-ref' has a "--dereference" option and protocol v2's "ls-refs" includes
> a "peel" arg. "Dereference" is the term already used in the 'for-each-ref'
> documentation, though, so if no one comes in with an especially strong
> opinion on this I'll change the option to '--full-dereference'. Thanks!

But doesn't dereferencing in the context of git-update-ref(1) refer to
something different? It's not about tags, but it is about symbolic
references and whether we want to update the symref or the pointee. But
true enough, in git-show-ref(1) "dereference" actually means that we
should peel the tag.

To me it feels like preexisting commands are confused already. In my
mind model:

    - "peel" means that an object gets resolved to one of its pointees.
      This also includes the case here, where a tag gets peeled to its
      pointee.

    - "dereference" means that a symbolic reference gets resolved to its
      pointee. This matches what we do in `git update-ref --no-deref`.

But after reading through the code I don't think we distinguish those
terms cleanly throughout our codebase. Still, "peeling" feels like a
better match in my opinion.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort
  2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
@ 2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07 18:13     ` Victoria Dye
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:49 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

[-- Attachment #1: Type: text/plain, Size: 13178 bytes --]

On Tue, Nov 07, 2023 at 01:25:53AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if
> the string list provided to it is empty, rather than returning the default
> refname sort structure. Also update 'ref_array_sort()' to explicitly skip
> sorting if its 'struct ref_sorting *' arg is NULL. Other functions using
> 'struct ref_sorting *' do not need any changes because they already properly
> ignore NULL values.
> 
> The goal of this change is to have the '--no-sort' option truly disable
> sorting in commands like 'for-each-ref, 'tag', and 'branch'. Right now,
> '--no-sort' will still trigger refname sorting by default in 'for-each-ref',
> 'tag', and 'branch'.
> 
> To match existing behavior as closely as possible, explicitly add "refname"
> to the list of sort keys in 'for-each-ref', 'tag', and 'branch' before
> parsing options (if no config-based sort keys are set). This ensures that
> sorting will only be fully disabled if '--no-sort' is provided as an option;
> otherwise, "refname" sorting will remain the default. Note: this also means
> that even when sort keys are provided on the command line, "refname" will be
> the final sort key in the sorting structure. This doesn't actually change
> any behavior, since 'compare_refs()' already falls back on comparing
> refnames if two refs are equal w.r.t all other sort keys.
> 
> Finally, remove the condition around sorting in 'ls-remote', since it's no
> longer necessary. Unlike 'for-each-ref' et. al., it does *not* set any sort
> keys by default. The default empty list of sort keys will produce a NULL
> 'struct ref_sorting *', which causes the sorting to be skipped in
> 'ref_array_sort()'.

I found the order in this commit message a bit funny because you first
explain what you're doing, then explain the goal, and then jump into the
changes again. The message might be a bit easier to read if the goal was
stated up front.

I was also briefly wondering whether it would make sense to split up
this commit, as you're doing two different things:

    - Refactor how git-for-each-ref(1), git-tag(1) and git-branch(1) set
      up their default sorting.

    - Change `ref_array_sort()` to not sort when its sorting option is
      `NULL`.

If this was split up into two commits, then the result might be a bit
easier to reason about. But I don't feel strongly about this.

> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/branch.c        |  6 ++++
>  builtin/for-each-ref.c  |  3 ++
>  builtin/ls-remote.c     | 10 ++----
>  builtin/tag.c           |  6 ++++
>  ref-filter.c            | 19 ++----------
>  t/t3200-branch.sh       | 68 +++++++++++++++++++++++++++++++++++++++--
>  t/t6300-for-each-ref.sh | 21 +++++++++++++
>  t/t7004-tag.sh          | 45 +++++++++++++++++++++++++++
>  8 files changed, 152 insertions(+), 26 deletions(-)
> 
> diff --git a/builtin/branch.c b/builtin/branch.c
> index e7ee9bd0f15..d67738bbcaa 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -767,7 +767,13 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>  	if (argc == 2 && !strcmp(argv[1], "-h"))
>  		usage_with_options(builtin_branch_usage, options);
>  
> +	/*
> +	 * Try to set sort keys from config. If config does not set any,
> +	 * fall back on default (refname) sorting.
> +	 */
>  	git_config(git_branch_config, &sorting_options);
> +	if (!sorting_options.nr)
> +		string_list_append(&sorting_options, "refname");
>  
>  	track = git_branch_track;
>  
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index 350bfa6e811..93b370f550b 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -67,6 +67,9 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  
>  	git_config(git_default_config, NULL);
>  
> +	/* Set default (refname) sorting */
> +	string_list_append(&sorting_options, "refname");
> +
>  	parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
>  	if (maxcount < 0) {
>  		error("invalid --count argument: `%d'", maxcount);
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index fc765754305..436249b720c 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -58,6 +58,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
>  	struct transport *transport;
>  	const struct ref *ref;
>  	struct ref_array ref_array;
> +	struct ref_sorting *sorting;
>  	struct string_list sorting_options = STRING_LIST_INIT_DUP;
>  
>  	struct option options[] = {
> @@ -141,13 +142,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
>  		item->symref = xstrdup_or_null(ref->symref);
>  	}
>  
> -	if (sorting_options.nr) {
> -		struct ref_sorting *sorting;
> -
> -		sorting = ref_sorting_options(&sorting_options);
> -		ref_array_sort(sorting, &ref_array);
> -		ref_sorting_release(sorting);
> -	}
> +	sorting = ref_sorting_options(&sorting_options);
> +	ref_array_sort(sorting, &ref_array);

We stopped calling `ref_sorting_release()`. Doesn't that cause us to
leak memory?

>  	for (i = 0; i < ref_array.nr; i++) {
>  		const struct ref_array_item *ref = ref_array.items[i];
> diff --git a/builtin/tag.c b/builtin/tag.c
> index 3918eacbb57..64f3196cd4c 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -501,7 +501,13 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>  
>  	setup_ref_filter_porcelain_msg();
>  
> +	/*
> +	 * Try to set sort keys from config. If config does not set any,
> +	 * fall back on default (refname) sorting.
> +	 */
>  	git_config(git_tag_config, &sorting_options);
> +	if (!sorting_options.nr)
> +		string_list_append(&sorting_options, "refname");
>  
>  	memset(&opt, 0, sizeof(opt));
>  	filter.lines = -1;
> diff --git a/ref-filter.c b/ref-filter.c
> index e4d3510e28e..7250089b7c6 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3142,7 +3142,8 @@ void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting,
>  
>  void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
>  {
> -	QSORT_S(array->items, array->nr, compare_refs, sorting);
> +	if (sorting)
> +		QSORT_S(array->items, array->nr, compare_refs, sorting);
>  }
>  
>  static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
> @@ -3248,18 +3249,6 @@ static int parse_sorting_atom(const char *atom)
>  	return res;
>  }
>  
> -/*  If no sorting option is given, use refname to sort as default */
> -static struct ref_sorting *ref_default_sorting(void)
> -{
> -	static const char cstr_name[] = "refname";
> -
> -	struct ref_sorting *sorting = xcalloc(1, sizeof(*sorting));
> -
> -	sorting->next = NULL;
> -	sorting->atom = parse_sorting_atom(cstr_name);
> -	return sorting;
> -}
> -
>  static void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *arg)
>  {
>  	struct ref_sorting *s;
> @@ -3283,9 +3272,7 @@ struct ref_sorting *ref_sorting_options(struct string_list *options)
>  	struct string_list_item *item;
>  	struct ref_sorting *sorting = NULL, **tail = &sorting;
>  
> -	if (!options->nr) {
> -		sorting = ref_default_sorting();
> -	} else {
> +	if (options->nr) {
>  		for_each_string_list_item(item, options)
>  			parse_ref_sorting(tail, item->string);
>  	}
> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 3182abde27f..9918ba05dec 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1570,9 +1570,10 @@ test_expect_success 'tracking with unexpected .fetch refspec' '
>  
>  test_expect_success 'configured committerdate sort' '
>  	git init -b main sort &&
> +	test_config -C sort branch.sort "committerdate" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort committerdate &&
>  		test_commit initial &&
>  		git checkout -b a &&
>  		test_commit a &&
> @@ -1592,9 +1593,10 @@ test_expect_success 'configured committerdate sort' '
>  '
>  
>  test_expect_success 'option override configured sort' '
> +	test_config -C sort branch.sort "committerdate" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort committerdate &&
>  		git branch --sort=refname >actual &&
>  		cat >expect <<-\EOF &&
>  		  a
> @@ -1606,10 +1608,70 @@ test_expect_success 'option override configured sort' '
>  	)
>  '
>  
> +test_expect_success '--no-sort cancels config sort keys' '
> +	test_config -C sort branch.sort "-refname" &&
> +
> +	(
> +		cd sort &&
> +
> +		# objecttype is identical for all of them, so sort falls back on
> +		# default (ascending refname)
> +		git branch \
> +			--no-sort \
> +			--sort="objecttype" >actual &&

This test is a bit confusing to me. Shouldn't we in fact ignore the
configured sorting order as soon as we pass `--sort=` anyway? In other
words, I would expect the `--no-sort` option to not make a difference
here. What should make a difference is if you _only_ passed `--no-sort`.

> +		cat >expect <<-\EOF &&
> +		  a
> +		* b
> +		  c
> +		  main
> +		EOF
> +		test_cmp expect actual
> +	)
> +
> +'
> +
> +test_expect_success '--no-sort cancels command line sort keys' '
> +	(
> +		cd sort &&
> +
> +		# objecttype is identical for all of them, so sort falls back on
> +		# default (ascending refname)
> +		git branch \
> +			--sort="-refname" \
> +			--no-sort \
> +			--sort="objecttype" >actual &&
> +		cat >expect <<-\EOF &&
> +		  a
> +		* b
> +		  c
> +		  main
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success '--no-sort without subsequent --sort prints expected branches' '
> +	(
> +		cd sort &&
> +
> +		# Sort the results with `sort` for a consistent comparison
> +		# against expected
> +		git branch --no-sort | sort >actual &&
> +		cat >expect <<-\EOF &&
> +		  a
> +		  c
> +		  main
> +		* b
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_expect_success 'invalid sort parameter in configuration' '
> +	test_config -C sort branch.sort "v:notvalid" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort "v:notvalid" &&
>  
>  		# this works in the "listing" mode, so bad sort key
>  		# is a dying offence.
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 00a060df0b5..0613e5e3623 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -1335,6 +1335,27 @@ test_expect_success '--no-sort cancels the previous sort keys' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success '--no-sort without subsequent --sort prints expected refs' '
> +	cat >expected <<-\EOF &&
> +	refs/tags/multi-ref1-100000-user1
> +	refs/tags/multi-ref1-100000-user2
> +	refs/tags/multi-ref1-200000-user1
> +	refs/tags/multi-ref1-200000-user2
> +	refs/tags/multi-ref2-100000-user1
> +	refs/tags/multi-ref2-100000-user2
> +	refs/tags/multi-ref2-200000-user1
> +	refs/tags/multi-ref2-200000-user2
> +	EOF
> +
> +	# Sort the results with `sort` for a consistent comparison against
> +	# expected
> +	git for-each-ref \
> +		--format="%(refname)" \
> +		--no-sort \
> +		"refs/tags/multi-*" | sort >actual &&
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'do not dereference NULL upon %(HEAD) on unborn branch' '
>  	test_when_finished "git checkout main" &&
>  	git for-each-ref --format="%(HEAD) %(refname:short)" refs/heads/ >actual &&
> diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
> index e689db42929..b41a47eb943 100755
> --- a/t/t7004-tag.sh
> +++ b/t/t7004-tag.sh
> @@ -1862,6 +1862,51 @@ test_expect_success 'option override configured sort' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--no-sort cancels config sort keys' '
> +	test_config tag.sort "-refname" &&
> +
> +	# objecttype is identical for all of them, so sort falls back on
> +	# default (ascending refname)
> +	git tag -l \
> +		--no-sort \
> +		--sort="objecttype" \
> +		"foo*" >actual &&
> +	cat >expect <<-\EOF &&
> +	foo1.10
> +	foo1.3
> +	foo1.6
> +	EOF
> +	test_cmp expect actual
> +'

Same question here.

Patrick

> +test_expect_success '--no-sort cancels command line sort keys' '
> +	# objecttype is identical for all of them, so sort falls back on
> +	# default (ascending refname)
> +	git tag -l \
> +		--sort="-refname" \
> +		--no-sort \
> +		--sort="objecttype" \
> +		"foo*" >actual &&
> +	cat >expect <<-\EOF &&
> +	foo1.10
> +	foo1.3
> +	foo1.6
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--no-sort without subsequent --sort prints expected tags' '
> +	# Sort the results with `sort` for a consistent comparison against
> +	# expected
> +	git tag -l --no-sort "foo*" | sort >actual &&
> +	cat >expect <<-\EOF &&
> +	foo1.10
> +	foo1.3
> +	foo1.6
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'invalid sort parameter on command line' '
>  	test_must_fail git tag -l --sort=notvalid "foo*" >actual
>  '
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 4/9] ref-filter.h: move contains caches into filter
  2023-11-07  1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
@ 2023-11-07 10:49   ` Patrick Steinhardt
  0 siblings, 0 replies; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:49 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

[-- Attachment #1: Type: text/plain, Size: 3709 bytes --]

On Tue, Nov 07, 2023 at 01:25:56AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into
> an 'internal' struct of the 'struct ref_filter'. In later patches, the
> 'struct ref_filter *' will be a common data structure across multiple
> filtering functions. These caches are part of the common functionality the
> filter struct will support, so they are updated to be internally accessible
> wherever the filter is used.
> 
> The design used here is mirrors what was introduced in 576de3d956

Nit: s/is //

Patrick

> (unpack_trees: start splitting internal fields from public API, 2023-02-27)
> for 'unpack_trees_options'.
> 
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  ref-filter.c | 14 ++++++--------
>  ref-filter.h |  6 ++++++
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index 7250089b7c6..5129b6986c9 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2764,8 +2764,6 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
>  struct ref_filter_cbdata {
>  	struct ref_array *array;
>  	struct ref_filter *filter;
> -	struct contains_cache contains_cache;
> -	struct contains_cache no_contains_cache;
>  };
>  
>  /*
> @@ -2816,11 +2814,11 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
>  			return 0;
>  		/* We perform the filtering for the '--contains' option... */
>  		if (filter->with_commit &&
> -		    !commit_contains(filter, commit, filter->with_commit, &ref_cbdata->contains_cache))
> +		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
>  			return 0;
>  		/* ...or for the `--no-contains' option */
>  		if (filter->no_commit &&
> -		    commit_contains(filter, commit, filter->no_commit, &ref_cbdata->no_contains_cache))
> +		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
>  			return 0;
>  	}
>  
> @@ -2989,8 +2987,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
>  	save_commit_buffer_orig = save_commit_buffer;
>  	save_commit_buffer = 0;
>  
> -	init_contains_cache(&ref_cbdata.contains_cache);
> -	init_contains_cache(&ref_cbdata.no_contains_cache);
> +	init_contains_cache(&filter->internal.contains_cache);
> +	init_contains_cache(&filter->internal.no_contains_cache);
>  
>  	/*  Simple per-ref filtering */
>  	if (!filter->kind)
> @@ -3014,8 +3012,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
>  			head_ref(ref_filter_handler, &ref_cbdata);
>  	}
>  
> -	clear_contains_cache(&ref_cbdata.contains_cache);
> -	clear_contains_cache(&ref_cbdata.no_contains_cache);
> +	clear_contains_cache(&filter->internal.contains_cache);
> +	clear_contains_cache(&filter->internal.no_contains_cache);
>  
>  	/*  Filters that need revision walking */
>  	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
> diff --git a/ref-filter.h b/ref-filter.h
> index d87d61238b7..0db3ff52889 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -7,6 +7,7 @@
>  #include "commit.h"
>  #include "string-list.h"
>  #include "strvec.h"
> +#include "commit-reach.h"
>  
>  /* Quoting styles */
>  #define QUOTE_NONE 0
> @@ -75,6 +76,11 @@ struct ref_filter {
>  		lines;
>  	int abbrev,
>  		verbose;
> +
> +	struct {
> +		struct contains_cache contains_cache;
> +		struct contains_cache no_contains_cache;
> +	} internal;
>  };
>  
>  struct ref_format {
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/9] ref-filter.c: refactor to create common helper functions
  2023-11-07  1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
@ 2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07 18:41     ` Victoria Dye
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:49 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

[-- Attachment #1: Type: text/plain, Size: 9785 bytes --]

On Tue, Nov 07, 2023 at 01:25:58AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
> 'filter_refs()' into new helper functions ('ref_array_append()',
> 'apply_ref_filter()', and 'do_filter_refs()' respectively), as well as
> rename 'ref_filter_handler()' to 'filter_one()'. In this and later
> patches, these helpers will be used by new ref-filter API functions. This
> patch does not result in any user-facing behavior changes or changes to
> callers outside of 'ref-filter.c'.
> 
> The changes are as follows:
> 
> * The logic to grow a 'struct ref_array' and append a given 'struct
>   ref_array_item *' to it is extracted from 'ref_array_push()' into
>   'ref_array_append()'.
> * 'ref_filter_handler()' is renamed to 'filter_one()' to more clearly
>   distinguish it from other ref filtering callbacks that will be added in
>   later patches. The "*_one()" naming convention is common throughout the
>   codebase for iteration callbacks.
> * The code to filter a given ref by refname & object ID then create a new
>   'struct ref_array_item' is moved out of 'filter_one()' and into
>   'apply_ref_filter()'. 'apply_ref_filter()' returns either NULL (if the ref
>   does not match the given filter) or a 'struct ref_array_item *' created
>   with 'new_ref_array_item()'; 'filter_one()' appends that item to
>   its ref array with 'ref_array_append()'.
> * The filter pre-processing, contains cache creation, and ref iteration of
>   'filter_refs()' is extracted into 'do_filter_refs()'. 'do_filter_refs()'
>   takes its ref iterator function & callback data as an input from the
>   caller, setting it up to be used with additional filtering callbacks in
>   later patches.

To me, a bulleted list spelling out the different changes I'm doing
often indicates that I might want to split up the commit into one for
each of the items. I don't feel strongly about this, but think that it
might help the reviewer in this case.

Patrick

> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  ref-filter.c | 115 ++++++++++++++++++++++++++++++---------------------
>  1 file changed, 69 insertions(+), 46 deletions(-)
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index 8992fbf45b1..ff00ab4b8d8 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2716,15 +2716,18 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
>  	return ref;
>  }
>  
> +static void ref_array_append(struct ref_array *array, struct ref_array_item *ref)
> +{
> +	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
> +	array->items[array->nr++] = ref;
> +}
> +
>  struct ref_array_item *ref_array_push(struct ref_array *array,
>  				      const char *refname,
>  				      const struct object_id *oid)
>  {
>  	struct ref_array_item *ref = new_ref_array_item(refname, oid);
> -
> -	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
> -	array->items[array->nr++] = ref;
> -
> +	ref_array_append(array, ref);
>  	return ref;
>  }
>  
> @@ -2761,46 +2764,36 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
>  	return ref_kind_from_refname(refname);
>  }
>  
> -struct ref_filter_cbdata {
> -	struct ref_array *array;
> -	struct ref_filter *filter;
> -};
> -
> -/*
> - * A call-back given to for_each_ref().  Filter refs and keep them for
> - * later object processing.
> - */
> -static int ref_filter_handler(const char *refname, const struct object_id *oid, int flag, void *cb_data)
> +static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
> +			    int flag, struct ref_filter *filter)
>  {
> -	struct ref_filter_cbdata *ref_cbdata = cb_data;
> -	struct ref_filter *filter = ref_cbdata->filter;
>  	struct ref_array_item *ref;
>  	struct commit *commit = NULL;
>  	unsigned int kind;
>  
>  	if (flag & REF_BAD_NAME) {
>  		warning(_("ignoring ref with broken name %s"), refname);
> -		return 0;
> +		return NULL;
>  	}
>  
>  	if (flag & REF_ISBROKEN) {
>  		warning(_("ignoring broken ref %s"), refname);
> -		return 0;
> +		return NULL;
>  	}
>  
>  	/* Obtain the current ref kind from filter_ref_kind() and ignore unwanted refs. */
>  	kind = filter_ref_kind(filter, refname);
>  	if (!(kind & filter->kind))
> -		return 0;
> +		return NULL;
>  
>  	if (!filter_pattern_match(filter, refname))
> -		return 0;
> +		return NULL;
>  
>  	if (filter_exclude_match(filter, refname))
> -		return 0;
> +		return NULL;
>  
>  	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
> -		return 0;
> +		return NULL;
>  
>  	/*
>  	 * A merge filter is applied on refs pointing to commits. Hence
> @@ -2811,15 +2804,15 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
>  	    filter->with_commit || filter->no_commit || filter->verbose) {
>  		commit = lookup_commit_reference_gently(the_repository, oid, 1);
>  		if (!commit)
> -			return 0;
> +			return NULL;
>  		/* We perform the filtering for the '--contains' option... */
>  		if (filter->with_commit &&
>  		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> -			return 0;
> +			return NULL;
>  		/* ...or for the `--no-contains' option */
>  		if (filter->no_commit &&
>  		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> -			return 0;
> +			return NULL;
>  	}
>  
>  	/*
> @@ -2827,11 +2820,32 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
>  	 * to do its job and the resulting list may yet to be pruned
>  	 * by maxcount logic.
>  	 */
> -	ref = ref_array_push(ref_cbdata->array, refname, oid);
> +	ref = new_ref_array_item(refname, oid);
>  	ref->commit = commit;
>  	ref->flag = flag;
>  	ref->kind = kind;
>  
> +	return ref;
> +}
> +
> +struct ref_filter_cbdata {
> +	struct ref_array *array;
> +	struct ref_filter *filter;
> +};
> +
> +/*
> + * A call-back given to for_each_ref().  Filter refs and keep them for
> + * later object processing.
> + */
> +static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
> +{
> +	struct ref_filter_cbdata *ref_cbdata = cb_data;
> +	struct ref_array_item *ref;
> +
> +	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
> +	if (ref)
> +		ref_array_append(ref_cbdata->array, ref);
> +
>  	return 0;
>  }
>  
> @@ -2967,26 +2981,12 @@ void filter_ahead_behind(struct repository *r,
>  	free(commits);
>  }
>  
> -/*
> - * API for filtering a set of refs. Based on the type of refs the user
> - * has requested, we iterate through those refs and apply filters
> - * as per the given ref_filter structure and finally store the
> - * filtered refs in the ref_array structure.
> - */
> -int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
> +static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
>  {
> -	struct ref_filter_cbdata ref_cbdata;
> -	int save_commit_buffer_orig;
>  	int ret = 0;
>  
> -	ref_cbdata.array = array;
> -	ref_cbdata.filter = filter;
> -
>  	filter->kind = type & FILTER_REFS_KIND_MASK;
>  
> -	save_commit_buffer_orig = save_commit_buffer;
> -	save_commit_buffer = 0;
> -
>  	init_contains_cache(&filter->internal.contains_cache);
>  	init_contains_cache(&filter->internal.no_contains_cache);
>  
> @@ -3001,20 +3001,43 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
>  		 * of filter_ref_kind().
>  		 */
>  		if (filter->kind == FILTER_REFS_BRANCHES)
> -			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
> +			ret = for_each_fullref_in("refs/heads/", fn, cb_data);
>  		else if (filter->kind == FILTER_REFS_REMOTES)
> -			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
> +			ret = for_each_fullref_in("refs/remotes/", fn, cb_data);
>  		else if (filter->kind == FILTER_REFS_TAGS)
> -			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
> +			ret = for_each_fullref_in("refs/tags/", fn, cb_data);
>  		else if (filter->kind & FILTER_REFS_ALL)
> -			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
> +			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
>  		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
> -			head_ref(ref_filter_handler, &ref_cbdata);
> +			head_ref(fn, cb_data);
>  	}
>  
>  	clear_contains_cache(&filter->internal.contains_cache);
>  	clear_contains_cache(&filter->internal.no_contains_cache);
>  
> +	return ret;
> +}
> +
> +/*
> + * API for filtering a set of refs. Based on the type of refs the user
> + * has requested, we iterate through those refs and apply filters
> + * as per the given ref_filter structure and finally store the
> + * filtered refs in the ref_array structure.
> + */
> +int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
> +{
> +	struct ref_filter_cbdata ref_cbdata;
> +	int save_commit_buffer_orig;
> +	int ret = 0;
> +
> +	ref_cbdata.array = array;
> +	ref_cbdata.filter = filter;
> +
> +	save_commit_buffer_orig = save_commit_buffer;
> +	save_commit_buffer = 0;
> +
> +	ret = do_filter_refs(filter, type, filter_one, &ref_cbdata);
> +
>  	/*  Filters that need revision walking */
>  	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
>  	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 7/9] ref-filter.c: filter & format refs in the same callback
  2023-11-07  1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
@ 2023-11-07 10:49   ` Patrick Steinhardt
  2023-11-07 19:45     ` Victoria Dye
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:49 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

[-- Attachment #1: Type: text/plain, Size: 5290 bytes --]

On Tue, Nov 07, 2023 at 01:25:59AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Update 'filter_and_format_refs()' to try to perform ref filtering &
> formatting in a single ref iteration, without an intermediate 'struct
> ref_array'. This can only be done if no operations need to be performed on a
> pre-filtered array; specifically, if the refs are
> 
> - filtered on reachability,
> - sorted, or
> - formatted with ahead-behind information
> 
> they cannot be filtered & formatted in the same iteration. In that case,
> fall back on the current filter-then-sort-then-format flow.
> 
> This optimization substantially improves memory usage due to no longer
> storing a ref array in memory. In some cases, it also dramatically reduces
> runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
> all refs into a 'struct ref_array' to printing only the first ref).
> 
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  ref-filter.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 74 insertions(+), 6 deletions(-)
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index ff00ab4b8d8..384cf1595ff 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2863,6 +2863,44 @@ static void free_array_item(struct ref_array_item *item)
>  	free(item);
>  }
>  
> +struct ref_filter_and_format_cbdata {
> +	struct ref_filter *filter;
> +	struct ref_format *format;
> +
> +	struct ref_filter_and_format_internal {
> +		int count;
> +	} internal;
> +};
> +
> +static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
> +{
> +	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
> +	struct ref_array_item *ref;
> +	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
> +
> +	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
> +	if (!ref)
> +		return 0;
> +
> +	if (format_ref_array_item(ref, ref_cbdata->format, &output, &err))
> +		die("%s", err.buf);
> +
> +	if (output.len || !ref_cbdata->format->array_opts.omit_empty) {
> +		fwrite(output.buf, 1, output.len, stdout);
> +		putchar('\n');
> +	}
> +
> +	strbuf_release(&output);
> +	strbuf_release(&err);
> +	free_array_item(ref);
> +
> +	if (ref_cbdata->format->array_opts.max_count &&
> +	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
> +		return -1;

It feels a bit weird to return a negative value here, which usually
indicates that an error has happened whereas we only use it here to
abort the iteration. But we ignore the return value of
`do_iterate_refs()` anyway, so it doesn't make much of a difference.

> +	return 0;
> +}
> +
>  /* Free all memory allocated for ref_array */
>  void ref_array_clear(struct ref_array *array)
>  {
> @@ -3046,16 +3084,46 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
>  	return ret;
>  }
>  
> +static inline int can_do_iterative_format(struct ref_filter *filter,
> +					  struct ref_sorting *sorting,
> +					  struct ref_format *format)
> +{
> +	/*
> +	 * Refs can be filtered and formatted in the same iteration as long
> +	 * as we aren't filtering on reachability, sorting the results, or
> +	 * including ahead-behind information in the formatted output.
> +	 */

Do we want to format this as a bulleted list so that it's more readily
extensible if we ever need to pay attention to new options here? Also, I
noted that this commit doesn't add any new tests -- do we already
exercise all of these conditions?

More generally, I worry a bit about maintainability of this code snippet
as we need to remember to always update this condition whenever we add a
new option, and this can be quite easy to miss. The performance benefit
might be worth the effort though.

Patrick

> +	return !(filter->reachable_from ||
> +		 filter->unreachable_from ||
> +		 sorting ||
> +		 format->bases.nr);
> +}
> +
>  void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
>  			    struct ref_sorting *sorting,
>  			    struct ref_format *format)
>  {
> -	struct ref_array array = { 0 };
> -	filter_refs(&array, filter, type);
> -	filter_ahead_behind(the_repository, format, &array);
> -	ref_array_sort(sorting, &array);
> -	print_formatted_ref_array(&array, format);
> -	ref_array_clear(&array);
> +	if (can_do_iterative_format(filter, sorting, format)) {
> +		int save_commit_buffer_orig;
> +		struct ref_filter_and_format_cbdata ref_cbdata = {
> +			.filter = filter,
> +			.format = format,
> +		};
> +
> +		save_commit_buffer_orig = save_commit_buffer;
> +		save_commit_buffer = 0;
> +
> +		do_filter_refs(filter, type, filter_and_format_one, &ref_cbdata);
> +
> +		save_commit_buffer = save_commit_buffer_orig;
> +	} else {
> +		struct ref_array array = { 0 };
> +		filter_refs(&array, filter, type);
> +		filter_ahead_behind(the_repository, format, &array);
> +		ref_array_sort(sorting, &array);
> +		print_formatted_ref_array(&array, format);
> +		ref_array_clear(&array);
> +	}
>  }
>  
>  static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-07  1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
@ 2023-11-07 10:50   ` Patrick Steinhardt
  2023-11-08  1:13     ` Victoria Dye
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-07 10:50 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye

[-- Attachment #1: Type: text/plain, Size: 8763 bytes --]

On Tue, Nov 07, 2023 at 01:26:00AM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add a boolean flag '--full-deref' that, when enabled, fills '%(*fieldname)'
> format fields using the fully peeled target of tag objects, rather than the
> immediate target.
> 
> In other builtins ('rev-parse', 'show-ref'), "dereferencing" tags typically
> means peeling them down to their non-tag target. Unlike these commands,
> 'for-each-ref' dereferences only one "level" of tags in '*' format fields
> (like "%(*objectname)"). For most annotated tags, one level of dereferencing
> is enough, since most tags point to commits or trees. However, nested tags
> (annotated tags whose target is another annotated tag) dereferenced once
> will point to their target tag, different a full peel to e.g. a commit.
> 
> Currently, if a user wants to filter & format refs and include information
> about the fully dereferenced tag, they can do so with something like
> 'cat-file --batch-check':
> 
>     git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
>         git cat-file --batch-check="%(objectname) %(rest)"
> 
> But the combination of commands is inefficient. So, to improve the
> efficiency of this use case, add a '--full-deref' option that causes
> 'for-each-ref' to fully dereference tags when formatting with '*' fields.

I do wonder whether it would make sense to introduce this feature in the
form of a separate field prefix, as you also mentioned in your cover
letter. It would buy the user more flexibility, but the question is
whether such flexibility would really ever be needed.

The only thing I could really think of where it might make sense is to
distinguish tags that peel to a commit immediately from ones that don't.
That feels rather esoteric to me and doesn't seem to be of much use. But
regardless of whether or not we can see the usefulness now, if this
wouldn't be significantly more complex I wonder whether it would make
more sense to use a new field prefix instead anyway.

In any case, I think it would be helpful if this was discussed in the
commit message.

Patrick

> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  Documentation/git-for-each-ref.txt |  9 ++++++++
>  builtin/for-each-ref.c             |  2 ++
>  ref-filter.c                       | 26 ++++++++++++++---------
>  ref-filter.h                       |  1 +
>  t/t6300-for-each-ref.sh            | 34 ++++++++++++++++++++++++++++++
>  5 files changed, 62 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 407f624fbaa..2714a87088e 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -11,6 +11,7 @@ SYNOPSIS
>  'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
>  		   [(--sort=<key>)...] [--format=<format>]
>  		   [ --stdin | <pattern>... ]
> +		   [--full-deref]
>  		   [--points-at=<object>]
>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>  		   [--contains[=<object>]] [--no-contains[=<object>]]
> @@ -77,6 +78,14 @@ OPTIONS
>  	the specified host language.  This is meant to produce
>  	a scriptlet that can directly be `eval`ed.
>  
> +--full-deref::
> +	Populate dereferenced format fields (indicated with an asterisk (`*`)
> +	prefix before the fieldname) with information about the fully-peeled
> +	target object of a tag ref, rather than its immediate target object.
> +	This only affects the output for nested annotated tags, where the tag's
> +	immediate target is another tag but its fully-peeled target is another
> +	object type (e.g. a commit).
> +
>  --points-at=<object>::
>  	Only list refs which points at the given object.
>  
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index 1c19cd5bd34..7a2127a3bc4 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -43,6 +43,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
>  		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
>  		OPT__COLOR(&format.use_color, N_("respect format colors")),
> +		OPT_BOOL(0, "full-deref", &format.full_deref,
> +			 N_("fully dereference tags to populate '*' format fields")),
>  		OPT_REF_FILTER_EXCLUDE(&filter),
>  		OPT_REF_SORT(&sorting_options),
>  		OPT_CALLBACK(0, "points-at", &filter.points_at,
> diff --git a/ref-filter.c b/ref-filter.c
> index 384cf1595ff..a66ac7921b1 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -237,7 +237,14 @@ static struct used_atom {
>  		char *head;
>  	} u;
>  } *used_atom;
> -static int used_atom_cnt, need_tagged, need_symref;
> +static int used_atom_cnt, need_symref;
> +
> +enum tag_dereference_mode {
> +	NO_DEREF = 0,
> +	DEREF_ONE,
> +	DEREF_ALL
> +};
> +static enum tag_dereference_mode need_tagged;
>  
>  /*
>   * Expand string, append it to strbuf *sb, then return error code ret.
> @@ -1066,8 +1073,8 @@ static int parse_ref_filter_atom(struct ref_format *format,
>  	memset(&used_atom[at].u, 0, sizeof(used_atom[at].u));
>  	if (valid_atom[i].parser && valid_atom[i].parser(format, &used_atom[at], arg, err))
>  		return -1;
> -	if (*atom == '*')
> -		need_tagged = 1;
> +	if (*atom == '*' && !need_tagged)
> +		need_tagged = format->full_deref ? DEREF_ALL : DEREF_ONE;
>  	if (i == ATOM_SYMREF)
>  		need_symref = 1;
>  	return at;
> @@ -2511,14 +2518,13 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>  	 * If it is a tag object, see if we use a value that derefs
>  	 * the object, and if we do grab the object it refers to.
>  	 */
> -	oi_deref.oid = *get_tagged_oid((struct tag *)obj);
> +	if (need_tagged == DEREF_ALL) {
> +		if (peel_iterated_oid(&obj->oid, &oi_deref.oid))
> +			die("bad tag");
> +	} else {
> +		oi_deref.oid = *get_tagged_oid((struct tag *)obj);
> +	}
>  
> -	/*
> -	 * NEEDSWORK: This derefs tag only once, which
> -	 * is good to deal with chains of trust, but
> -	 * is not consistent with what deref_tag() does
> -	 * which peels the onion to the core.
> -	 */
>  	return get_object(ref, 1, &obj, &oi_deref, err);
>  }
>  
> diff --git a/ref-filter.h b/ref-filter.h
> index 0ce5af58ab3..0caa39ecee5 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -92,6 +92,7 @@ struct ref_format {
>  	const char *rest;
>  	int quote_style;
>  	int use_color;
> +	int full_deref;
>  
>  	/* Internal state to ref-filter */
>  	int need_color_reset_at_eol;
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 0613e5e3623..3c2af785cdb 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -1839,6 +1839,40 @@ test_expect_success 'git for-each-ref with non-existing refs' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_success 'git for-each-ref with nested tags' '
> +	git tag -am "Normal tag" nested/base HEAD &&
> +	git tag -am "Nested tag" nested/nest1 refs/tags/nested/base &&
> +	git tag -am "Double nested tag" nested/nest2 refs/tags/nested/nest1 &&
> +
> +	head_oid="$(git rev-parse HEAD)" &&
> +	base_tag_oid="$(git rev-parse refs/tags/nested/base)" &&
> +	nest1_tag_oid="$(git rev-parse refs/tags/nested/nest1)" &&
> +	nest2_tag_oid="$(git rev-parse refs/tags/nested/nest2)" &&
> +
> +	# Without full dereference
> +	cat >expect <<-EOF &&
> +	refs/tags/nested/base $base_tag_oid tag $head_oid commit
> +	refs/tags/nested/nest1 $nest1_tag_oid tag $base_tag_oid tag
> +	refs/tags/nested/nest2 $nest2_tag_oid tag $nest1_tag_oid tag
> +	EOF
> +
> +	git for-each-ref --format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
> +		refs/tags/nested/ >actual &&
> +	test_cmp expect actual &&
> +
> +	# With full dereference
> +	cat >expect <<-EOF &&
> +	refs/tags/nested/base $base_tag_oid tag $head_oid commit
> +	refs/tags/nested/nest1 $nest1_tag_oid tag $head_oid commit
> +	refs/tags/nested/nest2 $nest2_tag_oid tag $head_oid commit
> +	EOF
> +
> +	git for-each-ref --full-deref \
> +		--format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
> +		refs/tags/nested/ >actual &&
> +	test_cmp expect actual
> +'
> +
>  GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
>  TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
>  
> -- 
> gitgitgadget
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort
  2023-11-07 10:49   ` Patrick Steinhardt
@ 2023-11-07 18:13     ` Victoria Dye
  0 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-07 18:13 UTC (permalink / raw)
  To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git

Patrick Steinhardt wrote:
> On Tue, Nov 07, 2023 at 01:25:53AM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if
>> the string list provided to it is empty, rather than returning the default
>> refname sort structure. Also update 'ref_array_sort()' to explicitly skip
>> sorting if its 'struct ref_sorting *' arg is NULL. Other functions using
>> 'struct ref_sorting *' do not need any changes because they already properly
>> ignore NULL values.
>>
>> The goal of this change is to have the '--no-sort' option truly disable
>> sorting in commands like 'for-each-ref, 'tag', and 'branch'. Right now,
>> '--no-sort' will still trigger refname sorting by default in 'for-each-ref',
>> 'tag', and 'branch'.
>>
>> To match existing behavior as closely as possible, explicitly add "refname"
>> to the list of sort keys in 'for-each-ref', 'tag', and 'branch' before
>> parsing options (if no config-based sort keys are set). This ensures that
>> sorting will only be fully disabled if '--no-sort' is provided as an option;
>> otherwise, "refname" sorting will remain the default. Note: this also means
>> that even when sort keys are provided on the command line, "refname" will be
>> the final sort key in the sorting structure. This doesn't actually change
>> any behavior, since 'compare_refs()' already falls back on comparing
>> refnames if two refs are equal w.r.t all other sort keys.
>>
>> Finally, remove the condition around sorting in 'ls-remote', since it's no
>> longer necessary. Unlike 'for-each-ref' et. al., it does *not* set any sort
>> keys by default. The default empty list of sort keys will produce a NULL
>> 'struct ref_sorting *', which causes the sorting to be skipped in
>> 'ref_array_sort()'.
> 
> I found the order in this commit message a bit funny because you first
> explain what you're doing, then explain the goal, and then jump into the
> changes again. The message might be a bit easier to read if the goal was
> stated up front.

I'll try to restructure it.

> 
> I was also briefly wondering whether it would make sense to split up
> this commit, as you're doing two different things:
> 
>     - Refactor how git-for-each-ref(1), git-tag(1) and git-branch(1) set
>       up their default sorting.
> 
>     - Change `ref_array_sort()` to not sort when its sorting option is
>       `NULL`.
> 
> If this was split up into two commits, then the result might be a bit
> easier to reason about. But I don't feel strongly about this.

The addition of "refname" to the sorting defaults really only makes sense in
the context of needing it to update 'ref_array_sort()', though. While you
can convey some of that in a commit message, when reading through commits
(mine and others') I find it much easier to contextualize small refactors
with their associated behavior change if they're done in a single patch.
There's a limit to that, of course; even within this series I have a lot of
"this will make sense later" commit messages (more than I'd like really)
because the refactors are large & varied enough that they'd be overwhelming
if squashed into a single patch.

So, while I definitely see where you're coming from, I think this patch is
better off not being split.

>> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
>> index fc765754305..436249b720c 100644
>> --- a/builtin/ls-remote.c
>> +++ b/builtin/ls-remote.c
>> @@ -58,6 +58,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
>>  	struct transport *transport;
>>  	const struct ref *ref;
>>  	struct ref_array ref_array;
>> +	struct ref_sorting *sorting;
>>  	struct string_list sorting_options = STRING_LIST_INIT_DUP;
>>  
>>  	struct option options[] = {
>> @@ -141,13 +142,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
>>  		item->symref = xstrdup_or_null(ref->symref);
>>  	}
>>  
>> -	if (sorting_options.nr) {
>> -		struct ref_sorting *sorting;
>> -
>> -		sorting = ref_sorting_options(&sorting_options);
>> -		ref_array_sort(sorting, &ref_array);
>> -		ref_sorting_release(sorting);
>> -	}
>> +	sorting = ref_sorting_options(&sorting_options);
>> +	ref_array_sort(sorting, &ref_array);
> 
> We stopped calling `ref_sorting_release()`. Doesn't that cause us to
> leak memory?

Nice catch, thanks! It should have been moved to the end of this function
(right before the 'ref_array_clear()').

>> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
>> index 3182abde27f..9918ba05dec 100755
>> --- a/t/t3200-branch.sh
>> +++ b/t/t3200-branch.sh
>> @@ -1606,10 +1608,70 @@ test_expect_success 'option override configured sort' '
>>  	)
>>  '
>>  
>> +test_expect_success '--no-sort cancels config sort keys' '
>> +	test_config -C sort branch.sort "-refname" &&
>> +
>> +	(
>> +		cd sort &&
>> +
>> +		# objecttype is identical for all of them, so sort falls back on
>> +		# default (ascending refname)
>> +		git branch \
>> +			--no-sort \
>> +			--sort="objecttype" >actual &&
> 
> This test is a bit confusing to me. Shouldn't we in fact ignore the
> configured sorting order as soon as we pass `--sort=` anyway? In other
> words, I would expect the `--no-sort` option to not make a difference
> here. What should make a difference is if you _only_ passed `--no-sort`.

The existing behavior (as demonstrated by this test) is that the command
line sort keys append to, rather than replace, the config-based sort keys. I
don't see any evidence in the commit history to indicate that this was an
intentional design decision, but it's not necessarily incorrect either.

For one, it's not universal in string list options that the command line
replaces the config. There are examples of both approaches to string list
options in other commands:

- in 'git push', specifying '--push-option' on the command line even once
  will remove any values set by 'push.pushoption'
- in 'git blame', any values specified with '--ignore-revs-file' are
  appended to those set by 'blame.ignorerevsfile'

In the case of 'git (tag|branch)', I can see why users might not want
command line sort keys to completely remove config-based ones. The only time
the config-based keys will come into play is when two entries are identical
w.r.t _all_ of the command line sort keys. In that scenario, I'd expect a
user would want to use their configured defaults to "break the tie" instead
of the hardcoded ascending refname sort. If they do actually want to remove
the config keys, they can set '--no-sort' before their other sort keys.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/9] ref-filter.c: refactor to create common helper functions
  2023-11-07 10:49   ` Patrick Steinhardt
@ 2023-11-07 18:41     ` Victoria Dye
  0 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-07 18:41 UTC (permalink / raw)
  To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git

Patrick Steinhardt wrote:
> On Tue, Nov 07, 2023 at 01:25:58AM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
>> 'filter_refs()' into new helper functions ('ref_array_append()',
>> 'apply_ref_filter()', and 'do_filter_refs()' respectively), as well as
>> rename 'ref_filter_handler()' to 'filter_one()'. In this and later
>> patches, these helpers will be used by new ref-filter API functions. This
>> patch does not result in any user-facing behavior changes or changes to
>> callers outside of 'ref-filter.c'.
>>
>> The changes are as follows:
>>
>> * The logic to grow a 'struct ref_array' and append a given 'struct
>>   ref_array_item *' to it is extracted from 'ref_array_push()' into
>>   'ref_array_append()'.
>> * 'ref_filter_handler()' is renamed to 'filter_one()' to more clearly
>>   distinguish it from other ref filtering callbacks that will be added in
>>   later patches. The "*_one()" naming convention is common throughout the
>>   codebase for iteration callbacks.
>> * The code to filter a given ref by refname & object ID then create a new
>>   'struct ref_array_item' is moved out of 'filter_one()' and into
>>   'apply_ref_filter()'. 'apply_ref_filter()' returns either NULL (if the ref
>>   does not match the given filter) or a 'struct ref_array_item *' created
>>   with 'new_ref_array_item()'; 'filter_one()' appends that item to
>>   its ref array with 'ref_array_append()'.
>> * The filter pre-processing, contains cache creation, and ref iteration of
>>   'filter_refs()' is extracted into 'do_filter_refs()'. 'do_filter_refs()'
>>   takes its ref iterator function & callback data as an input from the
>>   caller, setting it up to be used with additional filtering callbacks in
>>   later patches.
> 
> To me, a bulleted list spelling out the different changes I'm doing
> often indicates that I might want to split up the commit into one for
> each of the items. I don't feel strongly about this, but think that it
> might help the reviewer in this case.

While that's a good guideline to keep in mind, it's not universally
applicable. In this case, (almost) all of the changes are done the same way,
focused on the same goal: extract bits of 'filter_refs()' into generic,
internal helpers so we can use those bits elsewhere in later patches.
Splitting those extractions into multiple patches would essentially lead to
a handful of very small patches that more-or-less have the same commit
message. As I mentioned in [1], I think there's value to having the
immediate context of related changes in a single patch (as long as that
single patch doesn't become unwieldy), so I'm not inclined to split this up.

That said, I did say "(almost) all" of the changes are conceptually similar.
Looking at this now, the rename of 'ref_filter_handler()' => 'filter_one()'
doesn't really fit the "extract into helper functions" theme of the rest of
the patch, I'll pull that out into its own.

[1] https://lore.kernel.org/git/a833b5a7-0201-4c2e-8821-f2a1930cb403@github.com/


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count
  2023-11-07  1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
@ 2023-11-07 19:23   ` Øystein Walle
  2023-11-07 19:30     ` Victoria Dye
  0 siblings, 1 reply; 49+ messages in thread
From: Øystein Walle @ 2023-11-07 19:23 UTC (permalink / raw)
  To: gitgitgadget; +Cc: git, vdye, Øystein Walle

Hi Victoria,

Victoria Dye <vdye@github.com> writes:

> Update the 'for-each-ref' builtin documentation to clarify that refs
> "omitted" by --omit-empty are still counted toward the limit specified
> by --count. The use of the term "omit" would otherwise be somewhat
> ambiguous and could incorrectly be construed as excluding empty refs
> entirely (i.e. not counting them towards the total ref count).

I implemented --omit-empty and I completely overlooked --count!

(If I were to do it all over again I probably would have implemented it
so that so-called omitted refs did not count towards the total. It makes
sense to me since e.g.  `git log -3 -- git.c` prints the three most
recent commits that touch git.c regardless of how many commits were
walked in the process.)

This is a good and welcome clarification. 

Acked-by: Øystein Walle <oystwa@gmail.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count
  2023-11-07 19:23   ` Øystein Walle
@ 2023-11-07 19:30     ` Victoria Dye
  2023-11-08  7:53       ` Øystein Walle
  0 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye @ 2023-11-07 19:30 UTC (permalink / raw)
  To: Øystein Walle, gitgitgadget; +Cc: git

Øystein Walle wrote:
> Hi Victoria,
> 
> Victoria Dye <vdye@github.com> writes:
> 
>> Update the 'for-each-ref' builtin documentation to clarify that refs
>> "omitted" by --omit-empty are still counted toward the limit specified
>> by --count. The use of the term "omit" would otherwise be somewhat
>> ambiguous and could incorrectly be construed as excluding empty refs
>> entirely (i.e. not counting them towards the total ref count).
> 
> I implemented --omit-empty and I completely overlooked --count!
> 
> (If I were to do it all over again I probably would have implemented it
> so that so-called omitted refs did not count towards the total. It makes
> sense to me since e.g.  `git log -3 -- git.c` prints the three most
> recent commits that touch git.c regardless of how many commits were
> walked in the process.)

Since the interaction isn't clearly defined at the moment, we could probably
still update it to work like you're describing here. I'm happy to drop this
patch and implement your recommendation in a follow-up series. Let me know
what you think!

> 
> This is a good and welcome clarification. 
> 
> Acked-by: Øystein Walle <oystwa@gmail.com>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 7/9] ref-filter.c: filter & format refs in the same callback
  2023-11-07 10:49   ` Patrick Steinhardt
@ 2023-11-07 19:45     ` Victoria Dye
  0 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-07 19:45 UTC (permalink / raw)
  To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git

Patrick Steinhardt wrote:
>> diff --git a/ref-filter.c b/ref-filter.c
>> index ff00ab4b8d8..384cf1595ff 100644
>> --- a/ref-filter.c
>> +++ b/ref-filter.c
>> @@ -2863,6 +2863,44 @@ static void free_array_item(struct ref_array_item *item)
>>  	free(item);
>>  }
>>  
>> +struct ref_filter_and_format_cbdata {
>> +	struct ref_filter *filter;
>> +	struct ref_format *format;
>> +
>> +	struct ref_filter_and_format_internal {
>> +		int count;
>> +	} internal;
>> +};
>> +
>> +static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
>> +{
>> +	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
>> +	struct ref_array_item *ref;
>> +	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
>> +
>> +	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
>> +	if (!ref)
>> +		return 0;
>> +
>> +	if (format_ref_array_item(ref, ref_cbdata->format, &output, &err))
>> +		die("%s", err.buf);
>> +
>> +	if (output.len || !ref_cbdata->format->array_opts.omit_empty) {
>> +		fwrite(output.buf, 1, output.len, stdout);
>> +		putchar('\n');
>> +	}
>> +
>> +	strbuf_release(&output);
>> +	strbuf_release(&err);
>> +	free_array_item(ref);
>> +
>> +	if (ref_cbdata->format->array_opts.max_count &&
>> +	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
>> +		return -1;
> 
> It feels a bit weird to return a negative value here, which usually
> indicates that an error has happened whereas we only use it here to
> abort the iteration. But we ignore the return value of
> `do_iterate_refs()` anyway, so it doesn't make much of a difference.

I'll update it to 1, and also add a comment that the non-zero return value
stops iteration since it's not immediately clear from other 'each_ref_fn's
what that means. For reference, there appears to only be one other
'each_ref_fn' that even has the potential to return a nonzero return value
('ref_present()' in 'refs/files-backend.c).

> 
>> +	return 0;
>> +}
>> +
>>  /* Free all memory allocated for ref_array */
>>  void ref_array_clear(struct ref_array *array)
>>  {
>> @@ -3046,16 +3084,46 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
>>  	return ret;
>>  }
>>  
>> +static inline int can_do_iterative_format(struct ref_filter *filter,
>> +					  struct ref_sorting *sorting,
>> +					  struct ref_format *format)
>> +{
>> +	/*
>> +	 * Refs can be filtered and formatted in the same iteration as long
>> +	 * as we aren't filtering on reachability, sorting the results, or
>> +	 * including ahead-behind information in the formatted output.
>> +	 */
> 
> Do we want to format this as a bulleted list so that it's more readily
> extensible if we ever need to pay attention to new options here? Also, I
> noted that this commit doesn't add any new tests -- do we already
> exercise all of these conditions?

Sure, I'll convert it to a bulleted list. I don't really expect it to change
much, though; to have any effect on this condition, the new filter/format
would need to act on the pre-filtered ref_array, which isn't particularly
common.

And yes, the existing tests cover scenarios where this function returns true
(e.g. 'git for-each-ref --no-sort') & where it returns false (essentially
anything else).

> 
> More generally, I worry a bit about maintainability of this code snippet
> as we need to remember to always update this condition whenever we add a
> new option, and this can be quite easy to miss. The performance benefit
> might be worth the effort though.

I'll add more detailed comments to clarify what's going on here.

In practice, though, I don't think this would be all that easy to miss. As I
noted above, the only filters/formats that affect this are ones that need to
loop over an entire filtered ref_array after the initial
'for_each_fullref_in()'. To have it actually apply to commands that use
'filter_and_format_refs()', they'll need to add that behavior here (like
'filter_ahead_behind()'), where it should be apparent that
'can_do_iterative_format()' is relevant to their change. 

> 
> Patrick

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-07 10:50   ` Patrick Steinhardt
@ 2023-11-08  1:13     ` Victoria Dye
  2023-11-08  3:14       ` Junio C Hamano
  0 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye @ 2023-11-08  1:13 UTC (permalink / raw)
  To: Patrick Steinhardt, Victoria Dye via GitGitGadget; +Cc: git

Patrick Steinhardt wrote:
> On Tue, Nov 07, 2023 at 01:26:00AM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Add a boolean flag '--full-deref' that, when enabled, fills '%(*fieldname)'
>> format fields using the fully peeled target of tag objects, rather than the
>> immediate target.
>>
>> In other builtins ('rev-parse', 'show-ref'), "dereferencing" tags typically
>> means peeling them down to their non-tag target. Unlike these commands,
>> 'for-each-ref' dereferences only one "level" of tags in '*' format fields
>> (like "%(*objectname)"). For most annotated tags, one level of dereferencing
>> is enough, since most tags point to commits or trees. However, nested tags
>> (annotated tags whose target is another annotated tag) dereferenced once
>> will point to their target tag, different a full peel to e.g. a commit.
>>
>> Currently, if a user wants to filter & format refs and include information
>> about the fully dereferenced tag, they can do so with something like
>> 'cat-file --batch-check':
>>
>>     git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
>>         git cat-file --batch-check="%(objectname) %(rest)"
>>
>> But the combination of commands is inefficient. So, to improve the
>> efficiency of this use case, add a '--full-deref' option that causes
>> 'for-each-ref' to fully dereference tags when formatting with '*' fields.
> 
> I do wonder whether it would make sense to introduce this feature in the
> form of a separate field prefix, as you also mentioned in your cover
> letter. It would buy the user more flexibility, but the question is
> whether such flexibility would really ever be needed.
> 
> The only thing I could really think of where it might make sense is to
> distinguish tags that peel to a commit immediately from ones that don't.
> That feels rather esoteric to me and doesn't seem to be of much use. But
> regardless of whether or not we can see the usefulness now, if this
> wouldn't be significantly more complex I wonder whether it would make
> more sense to use a new field prefix instead anyway.
> 
> In any case, I think it would be helpful if this was discussed in the
> commit message.
I've been going back and forth on this, but I think a field specifier might
be the way to go after all. Using a field specifier would inherently be more
complex than the command line option (since the formatting code is a bit
complicated), but that's not an insurmountable problem. The thing I kept
getting caught up on was which symbol (or symbols?) to use to indicate a full
object peel. I mentioned `**fieldname` in the cover letter, but that looks
more like a double dereference than a recursive one.

I think `^{}fieldname` would be a good candidate, but it's *extremely*
important (for the sake of avoiding user confusion/frustration) that it
produces the same object & associated info as the standard revision parsing
machinery [1]. One notable difference (it might be the only one) from
`*fieldname` would be, if a ref points to a non-tag object, then that
object's information would printed (rather than an empty string). But maybe
that difference is what we'd want anyway, since it's a better one-for-one
replacement of 'git for-each-ref | git cat-file --batch-check'.

I'll try implementing that for V2. If it doesn't work for some reason,
though, I'll explain why in the commit message.

[1] https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt-emltrevgtemegemv0998em

> 
> Patrick
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/9] for-each-ref optimizations & usability improvements
  2023-11-07 10:49     ` Patrick Steinhardt
@ 2023-11-08  1:31       ` Victoria Dye
  0 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-08  1:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, Victoria Dye via GitGitGadget, git

Patrick Steinhardt wrote:
> On Mon, Nov 06, 2023 at 06:48:29PM -0800, Victoria Dye wrote:
>> Junio C Hamano wrote:
>>> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> [snip]
>>>>  * I'm not attached to '--full-deref' as a name - if someone has an idea for
>>>>    a more descriptive name, please suggest it!
>>>
>>> Another candidate verb may be "to peel", and I have no strong
>>> opinion between it and "to dereference".  But I have a mild aversion
>>> to an abbreviation that is not strongly established.
>>>
>>
>> Makes sense. I got the "deref" abbreviation for 'update-ref --no-deref', but
>> 'show-ref' has a "--dereference" option and protocol v2's "ls-refs" includes
>> a "peel" arg. "Dereference" is the term already used in the 'for-each-ref'
>> documentation, though, so if no one comes in with an especially strong
>> opinion on this I'll change the option to '--full-dereference'. Thanks!
> 
> But doesn't dereferencing in the context of git-update-ref(1) refer to
> something different? It's not about tags, but it is about symbolic
> references and whether we want to update the symref or the pointee. But
> true enough, in git-show-ref(1) "dereference" actually means that we
> should peel the tag.

Since both annotated tags and symbolic refs are essentially pointers, it's
not surprising that they both use the term "dereference." Even though
"deref" refers to symbolic refs in 'update-ref', its existence as an
abbreviation for "dereference" is relevant when coming up with a way to
abbreviate "dereference" when referring to tags.

> 
> To me it feels like preexisting commands are confused already. In my
> mind model:
> 
>     - "peel" means that an object gets resolved to one of its pointees.
>       This also includes the case here, where a tag gets peeled to its
>       pointee.
> 
>     - "dereference" means that a symbolic reference gets resolved to its
>       pointee. This matches what we do in `git update-ref --no-deref`.
> 
> But after reading through the code I don't think we distinguish those
> terms cleanly throughout our codebase. Still, "peeling" feels like a
> better match in my opinion.

Hmm. I think I mostly agree on your definition of "peel". In the docs, it's
used to refer to:

- recursively resolving an OID to an object of a specified type [1]
- recursively resolving a tag OID to a non-tag object [2]

Notably, there seems to be a strong association of "peeling" to "recursive
resolution". Which means it doesn't necessarily describe what "*" currently
does.

"Dereference" generally seems like a looser term than what you've suggested.
It does refer to symbolic ref resolution as you describe [3], but "recursive
dereference" is definitely also a synonym for "peel" [4]. That, combined
with the fact that "*" is the "dereference operator", leads me to believe
that "%(*fieldname)" would accurately be described as a "tag dereference"
field in the context of 'for-each-ref'.

As I mentioned in [5], I'm going to try adding this functionality with a
field specifier rather than a command line option, so the name of the option
might be moot. But, since dereferencing/peeling will still be relevant to
the changes, I'll make sure the terminology I use in the documentation is as
precise as possible (i.e., use "peel" where I previously used "fully
dereference").

Separately, this has inspired me to revisit something I've been putting off,
which is to add a definition for "peel" (and now probably "dereference" as
well) in 'gitglossary'. I'll try to send that out in the next couple days.

Thanks!

[1] https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---verify
[2] https://git-scm.com/docs/gitprotocol-v2#_ls_refs
[3] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefsymrefasymref
[4] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefcommit-ishacommit-ishalsocommittish
[5] https://lore.kernel.org/git/cf691b7c-288f-4cc9-a2ac-1a43972ae446@github.com/

> 
> Patrick


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-08  1:13     ` Victoria Dye
@ 2023-11-08  3:14       ` Junio C Hamano
  2023-11-08  7:19         ` Patrick Steinhardt
  2023-11-08 18:02         ` Victoria Dye
  0 siblings, 2 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-08  3:14 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git

Victoria Dye <vdye@github.com> writes:

> I think `^{}fieldname` would be a good candidate, but it's *extremely*

Gaah.  Why?  fieldname^{} I may understand, but in the prefix form?

In any case, has anybody considered that we may be better off to
declare that "*field" peeling a tag only once is a longstanding bug?

IOW, can we not add "fully peel" command line option or a new syntax
and instead just "fix" the bug to fully peel when "*field" is asked
for?

An application that cares about handling a chain of annotatetd tags
would want to be able to say "this is the outermost tag's
information; one level down, the tag was signed by this person;
another level down, the tag was signed by this person, etc."  which
would mean either

 * we have a syntax that shows the information from all levels
   (e.g., "**taggername" may say "Victoria\nPatrick\nGitster")

 * we have a syntax that allows to specify how many levels to peel,
   (e.g., "*0*taggername" may be the same as "taggername",
   "*1*taggername" may be the same as "*taggername") plus some
   programming construct like variables and loops.

but the repertoire being proposed that consists only of "peel only
once" and "peel all levels" is way too insufficient.

Note that I do not advocate for allowing inspection of each levels
separately.  Quite the contrary.  I would say that --format=<>
placeholder should not be a programming language to satisify such a
niche need.  And my conclusion from that stance is "peel once" plus
"peel all" are already one level too many, and "peel once" was a
very flawed implementation from day one, when 9f613ddd (Add
git-for-each-ref: helper for language bindings, 2006-09-15)
introduced it.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-08  3:14       ` Junio C Hamano
@ 2023-11-08  7:19         ` Patrick Steinhardt
  2023-11-08 18:02         ` Victoria Dye
  1 sibling, 0 replies; 49+ messages in thread
From: Patrick Steinhardt @ 2023-11-08  7:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Victoria Dye, Victoria Dye via GitGitGadget, git

[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]

On Wed, Nov 08, 2023 at 12:14:02PM +0900, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
> > I think `^{}fieldname` would be a good candidate, but it's *extremely*
> 
> Gaah.  Why?  fieldname^{} I may understand, but in the prefix form?
> 
> In any case, has anybody considered that we may be better off to
> declare that "*field" peeling a tag only once is a longstanding bug?
> 
> IOW, can we not add "fully peel" command line option or a new syntax
> and instead just "fix" the bug to fully peel when "*field" is asked
> for?

I see where you're coming from, but I wonder whether this wouldn't break
scripts. To me, the documentation seems to explicitly state that this
will only deref tags once:

    If fieldname is prefixed with an asterisk (*) and the ref points at
    a tag object, use the value for the field in the object which the
    tag object refers to (instead of the field in the tag object).

So changing that now would break both the documented and the actual
behaviour. Now whether anybody actually cares about such a breaking
change is of course a different question, and you're probably correct
that in practice nobody does.

Patrick

> An application that cares about handling a chain of annotatetd tags
> would want to be able to say "this is the outermost tag's
> information; one level down, the tag was signed by this person;
> another level down, the tag was signed by this person, etc."  which
> would mean either
> 
>  * we have a syntax that shows the information from all levels
>    (e.g., "**taggername" may say "Victoria\nPatrick\nGitster")
> 
>  * we have a syntax that allows to specify how many levels to peel,
>    (e.g., "*0*taggername" may be the same as "taggername",
>    "*1*taggername" may be the same as "*taggername") plus some
>    programming construct like variables and loops.
> 
> but the repertoire being proposed that consists only of "peel only
> once" and "peel all levels" is way too insufficient.
> 
> Note that I do not advocate for allowing inspection of each levels
> separately.  Quite the contrary.  I would say that --format=<>
> placeholder should not be a programming language to satisify such a
> niche need.  And my conclusion from that stance is "peel once" plus
> "peel all" are already one level too many, and "peel once" was a
> very flawed implementation from day one, when 9f613ddd (Add
> git-for-each-ref: helper for language bindings, 2006-09-15)
> introduced it.
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count
  2023-11-07 19:30     ` Victoria Dye
@ 2023-11-08  7:53       ` Øystein Walle
  2023-11-08 10:00         ` Kristoffer Haugsbakk
  0 siblings, 1 reply; 49+ messages in thread
From: Øystein Walle @ 2023-11-08  7:53 UTC (permalink / raw)
  To: Victoria Dye; +Cc: gitgitgadget, git

On Tue, 7 Nov 2023 at 20:30, Victoria Dye <vdye@github.com> wrote:

> Since the interaction isn't clearly defined at the moment, we could probably
> still update it to work like you're describing here. I'm happy to drop this
> patch and implement your recommendation in a follow-up series. Let me know
> what you think!

Regardless of whether the logic is changed in a follow-up series or not
I think the current behavior is worth documenting even if it doesn't
exist for much longer in the tree. So I am favor of having this patch as
part of this series.

I am also in favor of changing the behavior but that warrants a separate
series and discussion.

Øsse

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count
  2023-11-08  7:53       ` Øystein Walle
@ 2023-11-08 10:00         ` Kristoffer Haugsbakk
  0 siblings, 0 replies; 49+ messages in thread
From: Kristoffer Haugsbakk @ 2023-11-08 10:00 UTC (permalink / raw)
  To: Øystein Walle; +Cc: Josh Soref, git, Victoria Dye

On Wed, Nov 8, 2023, at 08:53, Øystein Walle wrote:
> On Tue, 7 Nov 2023 at 20:30, Victoria Dye <vdye@github.com> wrote:
>
>> Since the interaction isn't clearly defined at the moment, we could probably
>> still update it to work like you're describing here. I'm happy to drop this
>> patch and implement your recommendation in a follow-up series. Let me know
>> what you think!
>
> Regardless of whether the logic is changed in a follow-up series or not
> I think the current behavior is worth documenting even if it doesn't
> exist for much longer in the tree. So I am favor of having this patch as
> part of this series.

The funny thing though is that once it’s documented then you also kind of
commit yourself to it, right? That it’s how it’s supposed to behave.[1] If
you instead change the behavior (to the correct one) and document it in
the same series then there is no in-between time when people can claim to
rely on it via the documentation.

[1] Modulo “subject to change” hedging, but it seems that even
    experimental commands who are documented as that are now resistant to
    change in practice.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-08  3:14       ` Junio C Hamano
  2023-11-08  7:19         ` Patrick Steinhardt
@ 2023-11-08 18:02         ` Victoria Dye
  2023-11-09  1:22           ` Junio C Hamano
  2023-11-09  1:23           ` Junio C Hamano
  1 sibling, 2 replies; 49+ messages in thread
From: Victoria Dye @ 2023-11-08 18:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git

Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> I think `^{}fieldname` would be a good candidate, but it's *extremely*
> 
> Gaah.  Why?  fieldname^{} I may understand, but in the prefix form?

'fieldname^{}' seemed like more of a misuse of "^{}" than the prefixed form,
since we're not peeling "fieldname" but instead getting the value of
"fieldname" from the peeled tag. But then we're not dereferencing
"fieldname" in '*fieldname' either, so 'fieldname^{}' is no worse than what
already exists.

> 
> In any case, has anybody considered that we may be better off to
> declare that "*field" peeling a tag only once is a longstanding bug?
> 
> IOW, can we not add "fully peel" command line option or a new syntax
> and instead just "fix" the bug to fully peel when "*field" is asked
> for?

I'd certainly prefer that from a technical standpoint; it simplifies this
patch if I can just replace 'get_tagged_oid' with 'peel_iterated_oid'. The
two things that make me hesitate are:

1. There isn't a straightforward 1:1 substitute available for getting info
   on the immediate target of a list of tags. 
2. The performance of a recursive peel can be worse than that of a single
   tag dereference, since (unless the formatting is done in a ref_iterator
   iteration *and* the tag is a packed ref) the dereferenced object needs to
   be resolved to determine whether it's another tag or not.

#1 may not be an issue in practice, but I don't have enough information on
how applications use that formatting atom to say for sure. #2 is a bigger
issue, IMO, since one of the goals of this series was to improve performance
for some cases of 'for-each-ref' without hurting it in others.

> An application that cares about handling a chain of annotatetd tags
> would want to be able to say "this is the outermost tag's
> information; one level down, the tag was signed by this person;
> another level down, the tag was signed by this person, etc."  which
> would mean either
> 
>  * we have a syntax that shows the information from all levels
>    (e.g., "**taggername" may say "Victoria\nPatrick\nGitster")
> 
>  * we have a syntax that allows to specify how many levels to peel,
>    (e.g., "*0*taggername" may be the same as "taggername",
>    "*1*taggername" may be the same as "*taggername") plus some
>    programming construct like variables and loops.
> 
> but the repertoire being proposed that consists only of "peel only
> once" and "peel all levels" is way too insufficient.
> 
> Note that I do not advocate for allowing inspection of each levels
> separately.  Quite the contrary.  I would say that --format=<>
> placeholder should not be a programming language to satisify such a
> niche need.  And my conclusion from that stance is "peel once" plus
> "peel all" are already one level too many, and "peel once" was a
> very flawed implementation from day one, when 9f613ddd (Add
> git-for-each-ref: helper for language bindings, 2006-09-15)
> introduced it.

I can (and would like to) deprecate the "peel once" behavior and replace it
with "peel all", but with how long it's been around and the potential
performance impact, such a change should probably be clearly communicated.
How that happens depends on how aggressively we want to cut over. We could:

1. Change the behavior of '*' from single dereference to recursive
   dereference, make a note of it in the documentation.
2. Same as #1, but also add an option like '--no-recursive-dereference' or
   something to use the old behavior. Remove the option after 1-2 release
   cycles?
3. Add a new format specifier '^{}', note that '*' is deprecated in the
   docs.
4. Same as #3, but also show a warning/advice if '*' is used.
5. Same as #3, but die() if '*' is used.

I'm open to other options, those were just the first few I could think of. 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-08 18:02         ` Victoria Dye
@ 2023-11-09  1:22           ` Junio C Hamano
  2023-11-09  1:23           ` Junio C Hamano
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-09  1:22 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git

Victoria Dye <vdye@github.com> writes:

> I can (and would like to) deprecate the "peel once" behavior and replace it
> with "peel all", but with how long it's been around and the potential
> performance impact, such a change should probably be clearly communicated.

I've written a fairly detailed response on this about the reason why
I think that "leave a mention in the backward compatibility notes
section of the release notes" (your #1) is sufficient, but it seems
to have been lost in the ether.  I'll wait a bit and if the previous
response does not materialize, I may type it again.

But in addition to what I wrote there, there is this thread [*] from
2019 that indicates that our position is to mildly discourage
tag-to-tag in the first place.


[Reference]

* https://lore.kernel.org/git/20190404020226.GG4409@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-08 18:02         ` Victoria Dye
  2023-11-09  1:22           ` Junio C Hamano
@ 2023-11-09  1:23           ` Junio C Hamano
  2023-11-09  1:32             ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2023-11-09  1:23 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git

Victoria Dye <vdye@github.com> writes:

> I'd certainly prefer that from a technical standpoint; it simplifies this
> patch if I can just replace 'get_tagged_oid' with 'peel_iterated_oid'. The
> two things that make me hesitate are:
>
> 1. There isn't a straightforward 1:1 substitute available for getting info
>    on the immediate target of a list of tags. 
> 2. The performance of a recursive peel can be worse than that of a single
>    tag dereference, since (unless the formatting is done in a ref_iterator
>    iteration *and* the tag is a packed ref) the dereferenced object needs to
>    be resolved to determine whether it's another tag or not.
>
> #1 may not be an issue in practice, but I don't have enough information on
> how applications use that formatting atom to say for sure. #2 is a bigger
> issue, IMO, since one of the goals of this series was to improve performance
> for some cases of 'for-each-ref' without hurting it in others.

In a repository without any tag-to-tag at tips of refs, would #2
above still be an issue?  My assumption when I raised "isn't this
simply a bug?" question was that the use of tag-to-tag is a mere
intellectual curiosity, there is no serious use case, and they are
not heavily used.  Hence I was envisioning that #1 below (i.e., a
mention in the Release Notes' backward compatibility notes section)
would be sufficient.

If it weren't the case, then I do not think any "transition" would
work, either.

And stepping back a bit, even though "peel only once" is how
for-each-ref works, I do not think anybody who really cares about
tag-to-tag and inspecting each level of peeled tag is helped by it
all that much.  Yes, you can get the result of single level peeling
via "git format-patch --format=%(*objectname)", but then what would
you do to dig further from that point?  You cannot ask rev-parse to
peel the result with "^{}", as that will peel all the way down.

You have to feed it to "git cat-file tag" and parse the contents of
the tag obbject yourself to manually peel further levels of onion.
Anybody who do care must already have such a machinery, and such a
machinery does not depend on "git for-each-ref --format='%(*field)'"
peeling just once, I would say.  They would most likely learn the
"%(objectname) %(objecttype) %(refname)" from the command, and for
those that are tags, they would manually peel the object with such a
machinery, because they have to do that for second and further
levels anyway.

And that is why I am not so worried about "breaking" existing users
in this particular case.  Our existing support with tag-to-tag is so
poor that those who truly need it would have invented necessary
support without relying on for-each-ref's peeling (if such people
did exist, that is).

But perhaps I am so overly optimistic against Hyrum's law.

> I can (and would like to) deprecate the "peel once" behavior and replace it
> with "peel all", but with how long it's been around and the potential
> performance impact, such a change should probably be clearly communicated.
> How that happens depends on how aggressively we want to cut over. We could:
>
> 1. Change the behavior of '*' from single dereference to recursive
>    dereference, make a note of it in the documentation.
> 2. Same as #1, but also add an option like '--no-recursive-dereference' or
>    something to use the old behavior. Remove the option after 1-2 release
>    cycles?
> 3. Add a new format specifier '^{}', note that '*' is deprecated in the
>    docs.
> 4. Same as #3, but also show a warning/advice if '*' is used.
> 5. Same as #3, but die() if '*' is used.
>
> I'm open to other options, those were just the first few I could think of. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9] for-each-ref: add option to fully dereference tags
  2023-11-09  1:23           ` Junio C Hamano
@ 2023-11-09  1:32             ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-09  1:32 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git

Junio C Hamano <gitster@pobox.com> writes:

> You have to feed it to "git cat-file tag" and parse the contents of
> the tag obbject yourself to manually peel further levels of onion.

Alternatively, you can drive "git show -s" with "--format" and you
probably can produce a machine parseable output.

But it does not change the argument fundamentally.  The point is
that "for-each-ref --format=%(*field)" that peels only the first
layer would not have helped all that much, if somebody really cares
about each levels of nested tags.  They would have been relying on
a solution to deal with the second and further layers anyway, and
that solution would have been working with the first layer, too.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 00/10] for-each-ref optimizations & usability improvements
  2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
                   ` (9 preceding siblings ...)
  2023-11-07  2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
@ 2023-11-14 19:53 ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
                     ` (9 more replies)
  10 siblings, 10 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye

This series is a bit of an informal follow-up to [1], adding some more
substantial optimizations and usability fixes around ref
filtering/formatting. Some of the changes here affect user-facing behavior,
some are internal-only, but they're all interdependent enough to warrant
putting them together in one series.

[1]
https://lore.kernel.org/git/pull.1594.v2.git.1696888736.gitgitgadget@gmail.com/

Patch 1 changes the behavior of the '--no-sort' option in 'for-each-ref',
'tag', and 'branch'. Currently, it just removes previous sort keys and, if
no further keys are specified, falls back on ascending refname sort (which,
IMO, makes the name '--no-sort' somewhat misleading). Now, '--no-sort'
completely disables sorting (unless subsequent '--sort' options are
provided).

Patches 2-7 incrementally refactor various parts of the ref
filtering/formatting workflows in order to create a
'filter_and_format_refs()' function. If certain conditions are met (sorting
disabled, no reachability filtering or ahead-behind formatting), ref
filtering & formatting is done within a single 'for_each_fullref_in'
callback. Especially in large repositories, this makes a huge difference in
memory usage & runtime for certain usages of 'for-each-ref', since it's no
longer writing everything to a 'struct ref_array' then repeatedly whittling
down/updating its contents.

Patch 8 updates the 'for-each-ref' documentation, making the '--format'
description a bit less jumbled and more clearly explaining the '*' prefix
(to be updated in the next patch)

Patch 9 changes the dereferencing done by the '*' format prefix from a
single dereference to a recursive peel. See [1] + replies for the discussion
that led to this approach (as opposed to a command line option or new format
specifier).

[1] https://lore.kernel.org/git/ZUoWWo7IEKsiSx-C@tanuki/

Finally, patch 10 adds performance tests for 'for-each-ref', showing the
effects of optimizations made throughout the series. Here are some sample
results from my Ubuntu VM (test names shortened for space):

Test                                                         HEAD
----------------------------------------------------------------------------
6300.2: (loose)                                              4.68(0.98+3.64)
6300.3: (loose, no sort)                                     4.65(0.91+3.67)
6300.4: (loose, --count=1)                                   4.50(0.84+3.60)
6300.5: (loose, --count=1, no sort)                          4.24(0.46+3.71)
6300.6: (loose, tags)                                        2.41(0.45+1.93)
6300.7: (loose, tags, no sort)                               2.33(0.48+1.83)
6300.8: (loose, tags, dereferenced)                          3.65(1.66+1.95)
6300.9: (loose, tags, dereferenced, no sort)                 3.48(1.59+1.87)
6300.10: for-each-ref + cat-file (loose, tags)               4.48(2.27+2.22)
6300.12: (packed)                                            0.90(0.68+0.18)
6300.13: (packed, no sort)                                   0.71(0.55+0.06)
6300.14: (packed, --count=1)                                 0.77(0.52+0.16)
6300.15: (packed, --count=1, no sort)                        0.03(0.01+0.02)
6300.16: (packed, tags)                                      0.45(0.33+0.10)
6300.17: (packed, tags, no sort)                             0.39(0.33+0.03)
6300.18: (packed, tags, dereferenced)                        1.83(1.67+0.10)
6300.19: (packed, tags, dereferenced, no sort)               1.42(1.28+0.08)
6300.20: for-each-ref + cat-file (packed, tags)              2.36(2.11+0.29)


 * Victoria


Changes since V1
================

 * Restructured commit message of patch 1 for better readability
 * Re-added 'ref_sorting_release(sorting)' to 'ls-remote'
 * Dropped patch 2 so we don't commit to behavior we don't want in
   'for-each-ref --omit-empty --count'
 * Split patch 6 into one that renames 'ref_filter_handler()' to
   'filter_one()' and another that creates helper functions from existing
   code
 * Added/updated code comments in patch 7, changed ref iteration "break"
   return value from -1 to 1
 * Added a patch to reword 'for-each-ref' documentation in anticipation of
   updating the description of what '*' does in the format
 * Removed command-line option '--full-deref' for peeling tags in '*' format
   fields in favor of simply cutting over from the current single
   dereference to recursive dereference in all cases. Updated tests to match
   new behavior.
 * Added the '--count=1' tests back to p6300 (I must have unintentionally
   removed them before submitting V1)

Victoria Dye (10):
  ref-filter.c: really don't sort when using --no-sort
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.h: move contains caches into filter
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: filter & format refs in the same callback
  for-each-ref: clean up documentation of --format
  ref-filter.c: use peeled tag for '*' format fields
  t/perf: add perf tests for for-each-ref

 Documentation/git-for-each-ref.txt |  23 +--
 builtin/branch.c                   |  42 +++--
 builtin/for-each-ref.c             |  39 +----
 builtin/ls-remote.c                |  11 +-
 builtin/tag.c                      |  32 +---
 ref-filter.c                       | 272 ++++++++++++++++++++---------
 ref-filter.h                       |  25 +++
 t/perf/p6300-for-each-ref.sh       |  87 +++++++++
 t/t3200-branch.sh                  |  68 +++++++-
 t/t6300-for-each-ref.sh            |  43 +++++
 t/t6302-for-each-ref-filter.sh     |   4 +-
 t/t7004-tag.sh                     |  45 +++++
 12 files changed, 517 insertions(+), 174 deletions(-)
 create mode 100755 t/perf/p6300-for-each-ref.sh


base-commit: bc5204569f7db44d22477485afd52ea410d83743
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1609%2Fvdye%2Fvdye%2Ffor-each-ref-optimizations-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1609/vdye/vdye/for-each-ref-optimizations-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1609

Range-diff vs v1:

  1:  dea8d7d1e86 !  1:  074da1ff3e8 ref-filter.c: really don't sort when using --no-sort
     @@ Metadata
       ## Commit message ##
          ref-filter.c: really don't sort when using --no-sort
      
     -    Update 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if
     -    the string list provided to it is empty, rather than returning the default
     -    refname sort structure. Also update 'ref_array_sort()' to explicitly skip
     -    sorting if its 'struct ref_sorting *' arg is NULL. Other functions using
     -    'struct ref_sorting *' do not need any changes because they already properly
     -    ignore NULL values.
     +    When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
     +    printed refs are still sorted by ascending refname. Change the handling of
     +    sort options in these commands so that '--no-sort' to truly disables
     +    sorting.
     +
     +    '--no-sort' does not disable sorting in these commands is because their
     +    option parsing does not distinguish between "the absence of '--sort'"
     +    (and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
     +    an empty 'sorting_options' string list, which is parsed by
     +    'ref_sorting_options()' to create the 'struct ref_sorting *' for the
     +    command. If the string list is empty, 'ref_sorting_options()' interprets
     +    that as "the absence of '--sort'" and returns the default ref sorting
     +    structure (equivalent to "refname" sort).
      
     -    The goal of this change is to have the '--no-sort' option truly disable
     -    sorting in commands like 'for-each-ref, 'tag', and 'branch'. Right now,
     -    '--no-sort' will still trigger refname sorting by default in 'for-each-ref',
     -    'tag', and 'branch'.
     +    To handle '--no-sort' properly while preserving the "refname" sort in the
     +    "absence of --sort'" case, first explicitly add "refname" to the string list
     +    *before* parsing options. This alone doesn't actually change any behavior,
     +    since 'compare_refs()' already falls back on comparing refnames if two refs
     +    are equal w.r.t all other sort keys.
      
     -    To match existing behavior as closely as possible, explicitly add "refname"
     -    to the list of sort keys in 'for-each-ref', 'tag', and 'branch' before
     -    parsing options (if no config-based sort keys are set). This ensures that
     -    sorting will only be fully disabled if '--no-sort' is provided as an option;
     -    otherwise, "refname" sorting will remain the default. Note: this also means
     -    that even when sort keys are provided on the command line, "refname" will be
     -    the final sort key in the sorting structure. This doesn't actually change
     -    any behavior, since 'compare_refs()' already falls back on comparing
     -    refnames if two refs are equal w.r.t all other sort keys.
     +    Now that the string list is populated by default, '--no-sort' is the only
     +    way to empty the 'sorting_options' string list. Update
     +    'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
     +    string list is empty, and add a condition to 'ref_array_sort()' to skip the
     +    sort altogether if the sort structure is NULL. Note that other functions
     +    using 'struct ref_sorting *' do not need any changes because they already
     +    ignore NULL values.
      
          Finally, remove the condition around sorting in 'ls-remote', since it's no
     -    longer necessary. Unlike 'for-each-ref' et. al., it does *not* set any sort
     -    keys by default. The default empty list of sort keys will produce a NULL
     -    'struct ref_sorting *', which causes the sorting to be skipped in
     -    'ref_array_sort()'.
     +    longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
     +    sorting by default. This default is preserved by simply leaving its sort key
     +    string list empty before parsing options; if no additional sort keys are
     +    set, 'struct ref_sorting *' is NULL and sorting is skipped.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *
       
       	for (i = 0; i < ref_array.nr; i++) {
       		const struct ref_array_item *ref = ref_array.items[i];
     +@@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *prefix)
     + 		status = 0; /* we found something */
     + 	}
     + 
     ++	ref_sorting_release(sorting);
     + 	ref_array_clear(&ref_array);
     + 	if (transport_disconnect(transport))
     + 		status = 1;
      
       ## builtin/tag.c ##
      @@ builtin/tag.c: int cmd_tag(int argc, const char **argv, const char *prefix)
  2:  88eba4146cd <  -:  ----------- for-each-ref: clarify interaction of --omit-empty & --count
  3:  2e2f9738205 =  2:  adac101bc60 ref-filter.h: add max_count and omit_empty to ref_format
  4:  6c66445ee31 !  3:  f44c4b42c93 ref-filter.h: move contains caches into filter
     @@ Commit message
          filter struct will support, so they are updated to be internally accessible
          wherever the filter is used.
      
     -    The design used here is mirrors what was introduced in 576de3d956
     +    The design used here mirrors what was introduced in 576de3d956
          (unpack_trees: start splitting internal fields from public API, 2023-02-27)
          for 'unpack_trees_options'.
      
  5:  f5be57eea7d =  4:  187b1d6610f ref-filter.h: add functions for filter/format & format-only
  -:  ----------- >  5:  040d291ca45 ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  6:  8c77452e5dd !  6:  633c0c74c2e ref-filter.c: refactor to create common helper functions
     @@ Commit message
          ref-filter.c: refactor to create common helper functions
      
          Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
     -    'filter_refs()' into new helper functions ('ref_array_append()',
     -    'apply_ref_filter()', and 'do_filter_refs()' respectively), as well as
     -    rename 'ref_filter_handler()' to 'filter_one()'. In this and later
     -    patches, these helpers will be used by new ref-filter API functions. This
     -    patch does not result in any user-facing behavior changes or changes to
     -    callers outside of 'ref-filter.c'.
     +    'filter_refs()' into new helper functions:
      
     -    The changes are as follows:
     +    * Extract the code to grow a 'struct ref_array' and append a given 'struct
     +      ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'.
     +    * Extract the code to filter a given ref by refname & object ID then create
     +      a new 'struct ref_array_item *' from 'filter_one()' into
     +      'apply_ref_filter()'.
     +    * Extract the code for filter pre-processing, contains cache creation, and
     +      ref iteration from 'filter_refs()' into 'do_filter_refs()'.
      
     -    * The logic to grow a 'struct ref_array' and append a given 'struct
     -      ref_array_item *' to it is extracted from 'ref_array_push()' into
     -      'ref_array_append()'.
     -    * 'ref_filter_handler()' is renamed to 'filter_one()' to more clearly
     -      distinguish it from other ref filtering callbacks that will be added in
     -      later patches. The "*_one()" naming convention is common throughout the
     -      codebase for iteration callbacks.
     -    * The code to filter a given ref by refname & object ID then create a new
     -      'struct ref_array_item' is moved out of 'filter_one()' and into
     -      'apply_ref_filter()'. 'apply_ref_filter()' returns either NULL (if the ref
     -      does not match the given filter) or a 'struct ref_array_item *' created
     -      with 'new_ref_array_item()'; 'filter_one()' appends that item to
     -      its ref array with 'ref_array_append()'.
     -    * The filter pre-processing, contains cache creation, and ref iteration of
     -      'filter_refs()' is extracted into 'do_filter_refs()'. 'do_filter_refs()'
     -      takes its ref iterator function & callback data as an input from the
     -      caller, setting it up to be used with additional filtering callbacks in
     -      later patches.
     +    In later patches, these helpers will be used by new ref-filter API
     +    functions. This patch does not result in any user-facing behavior changes or
     +    changes to callers outside of 'ref-filter.c'.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     @@ ref-filter.c: static int filter_ref_kind(struct ref_filter *filter, const char *
      - * A call-back given to for_each_ref().  Filter refs and keep them for
      - * later object processing.
      - */
     --static int ref_filter_handler(const char *refname, const struct object_id *oid, int flag, void *cb_data)
     +-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
      +static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
      +			    int flag, struct ref_filter *filter)
       {
     @@ ref-filter.c: static int filter_ref_kind(struct ref_filter *filter, const char *
       
       	/*
       	 * A merge filter is applied on refs pointing to commits. Hence
     -@@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct object_id *oid,
     +@@ ref-filter.c: static int filter_one(const char *refname, const struct object_id *oid, int flag
       	    filter->with_commit || filter->no_commit || filter->verbose) {
       		commit = lookup_commit_reference_gently(the_repository, oid, 1);
       		if (!commit)
     @@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct ob
       	}
       
       	/*
     -@@ ref-filter.c: static int ref_filter_handler(const char *refname, const struct object_id *oid,
     +@@ ref-filter.c: static int filter_one(const char *refname, const struct object_id *oid, int flag
       	 * to do its job and the resulting list may yet to be pruned
       	 * by maxcount logic.
       	 */
     @@ ref-filter.c: int filter_refs(struct ref_array *array, struct ref_filter *filter
       		 * of filter_ref_kind().
       		 */
       		if (filter->kind == FILTER_REFS_BRANCHES)
     --			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/heads/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/heads/", fn, cb_data);
       		else if (filter->kind == FILTER_REFS_REMOTES)
     --			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/remotes/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/remotes/", fn, cb_data);
       		else if (filter->kind == FILTER_REFS_TAGS)
     --			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in("refs/tags/", filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in("refs/tags/", fn, cb_data);
       		else if (filter->kind & FILTER_REFS_ALL)
     --			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
     +-			ret = for_each_fullref_in_pattern(filter, filter_one, &ref_cbdata);
      +			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
       		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
     --			head_ref(ref_filter_handler, &ref_cbdata);
     +-			head_ref(filter_one, &ref_cbdata);
      +			head_ref(fn, cb_data);
       	}
       
  7:  84db440896c !  7:  91a77c1a834 ref-filter.c: filter & format refs in the same callback
     @@ ref-filter.c: static void free_array_item(struct ref_array_item *item)
      +	strbuf_release(&err);
      +	free_array_item(ref);
      +
     ++	/*
     ++	 * Increment the running count of refs that match the filter. If
     ++	 * max_count is set and we've reached the max, stop the ref
     ++	 * iteration by returning a nonzero value.
     ++	 */
      +	if (ref_cbdata->format->array_opts.max_count &&
      +	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
     -+		return -1;
     ++		return 1;
      +
      +	return 0;
      +}
     @@ ref-filter.c: int filter_refs(struct ref_array *array, struct ref_filter *filter
      +					  struct ref_format *format)
      +{
      +	/*
     -+	 * Refs can be filtered and formatted in the same iteration as long
     -+	 * as we aren't filtering on reachability, sorting the results, or
     -+	 * including ahead-behind information in the formatted output.
     ++	 * Filtering & formatting results within a single ref iteration
     ++	 * callback is not compatible with options that require
     ++	 * post-processing a filtered ref_array. These include:
     ++	 * - filtering on reachability
     ++	 * - sorting the filtered results
     ++	 * - including ahead-behind information in the formatted output
      +	 */
      +	return !(filter->reachable_from ||
      +		 filter->unreachable_from ||
  -:  ----------- >  8:  8eb2fc2950c for-each-ref: clean up documentation of --format
  8:  352b5c42ac3 !  9:  48254d8e161 for-each-ref: add option to fully dereference tags
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    for-each-ref: add option to fully dereference tags
     +    ref-filter.c: use peeled tag for '*' format fields
      
     -    Add a boolean flag '--full-deref' that, when enabled, fills '%(*fieldname)'
     -    format fields using the fully peeled target of tag objects, rather than the
     -    immediate target.
     -
     -    In other builtins ('rev-parse', 'show-ref'), "dereferencing" tags typically
     -    means peeling them down to their non-tag target. Unlike these commands,
     -    'for-each-ref' dereferences only one "level" of tags in '*' format fields
     -    (like "%(*objectname)"). For most annotated tags, one level of dereferencing
     -    is enough, since most tags point to commits or trees. However, nested tags
     -    (annotated tags whose target is another annotated tag) dereferenced once
     -    will point to their target tag, different a full peel to e.g. a commit.
     +    In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
     +    "dereferencing" a tag refers to a recursive peel of the tag object. Unlike
     +    these cases, the dereferencing prefix ('*') in 'for-each-ref' format
     +    specifiers triggers only a single, non-recursive dereference of a given tag
     +    object. For most annotated tags, a single dereference is all that is needed
     +    to access the tag's associated commit or tree; "recursive" and
     +    "non-recursive" dereferencing are functionally equivalent in these cases.
     +    However, nested tags (annotated tags whose target is another annotated tag)
     +    dereferenced once return another tag, where a recursive dereference would
     +    return the commit or tree.
      
          Currently, if a user wants to filter & format refs and include information
     -    about the fully dereferenced tag, they can do so with something like
     +    about a recursively-dereferenced tag, they can do so with something like
          'cat-file --batch-check':
      
              git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
                  git cat-file --batch-check="%(objectname) %(rest)"
      
          But the combination of commands is inefficient. So, to improve the
     -    efficiency of this use case, add a '--full-deref' option that causes
     -    'for-each-ref' to fully dereference tags when formatting with '*' fields.
     +    performance of this use case and align the defererencing behavior of
     +    'for-each-ref' with that of other commands, update the ref formatting code
     +    to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields
     +    rather than the tag's immediate target object (from 'get_tagged_oid()').
     +
     +    Additionally, add a test to 't6300-for-each-ref' to verify new nested tag
     +    behavior and update 't6302-for-each-ref-filter.sh' to print the correct
     +    value for nested dereferenced fields.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-for-each-ref.txt ##
     -@@ Documentation/git-for-each-ref.txt: SYNOPSIS
     - 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
     - 		   [(--sort=<key>)...] [--format=<format>]
     - 		   [ --stdin | <pattern>... ]
     -+		   [--full-deref]
     - 		   [--points-at=<object>]
     - 		   [--merged[=<object>]] [--no-merged[=<object>]]
     - 		   [--contains[=<object>]] [--no-contains[=<object>]]
     -@@ Documentation/git-for-each-ref.txt: OPTIONS
     - 	the specified host language.  This is meant to produce
     - 	a scriptlet that can directly be `eval`ed.
     +@@ Documentation/git-for-each-ref.txt: from the `committer` or `tagger` fields depending on the object type.
     + These are intended for working on a mix of annotated and lightweight tags.
       
     -+--full-deref::
     -+	Populate dereferenced format fields (indicated with an asterisk (`*`)
     -+	prefix before the fieldname) with information about the fully-peeled
     -+	target object of a tag ref, rather than its immediate target object.
     -+	This only affects the output for nested annotated tags, where the tag's
     -+	immediate target is another tag but its fully-peeled target is another
     -+	object type (e.g. a commit).
     -+
     - --points-at=<object>::
     - 	Only list refs which points at the given object.
     + For tag objects, a `fieldname` prefixed with an asterisk (`*`) expands to
     +-the `fieldname` value of object the tag points at, rather than that of the
     +-tag object itself.
     ++the `fieldname` value of the peeled object, rather than that of the tag
     ++object itself.
       
     -
     - ## builtin/for-each-ref.c ##
     -@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     - 		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
     - 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
     - 		OPT__COLOR(&format.use_color, N_("respect format colors")),
     -+		OPT_BOOL(0, "full-deref", &format.full_deref,
     -+			 N_("fully dereference tags to populate '*' format fields")),
     - 		OPT_REF_FILTER_EXCLUDE(&filter),
     - 		OPT_REF_SORT(&sorting_options),
     - 		OPT_CALLBACK(0, "points-at", &filter.points_at,
     + Fields that have name-email-date tuple as its value (`author`,
     + `committer`, and `tagger`) can be suffixed with `name`, `email`,
      
       ## ref-filter.c ##
     -@@ ref-filter.c: static struct used_atom {
     - 		char *head;
     - 	} u;
     - } *used_atom;
     --static int used_atom_cnt, need_tagged, need_symref;
     -+static int used_atom_cnt, need_symref;
     -+
     -+enum tag_dereference_mode {
     -+	NO_DEREF = 0,
     -+	DEREF_ONE,
     -+	DEREF_ALL
     -+};
     -+static enum tag_dereference_mode need_tagged;
     - 
     - /*
     -  * Expand string, append it to strbuf *sb, then return error code ret.
     -@@ ref-filter.c: static int parse_ref_filter_atom(struct ref_format *format,
     - 	memset(&used_atom[at].u, 0, sizeof(used_atom[at].u));
     - 	if (valid_atom[i].parser && valid_atom[i].parser(format, &used_atom[at], arg, err))
     - 		return -1;
     --	if (*atom == '*')
     --		need_tagged = 1;
     -+	if (*atom == '*' && !need_tagged)
     -+		need_tagged = format->full_deref ? DEREF_ALL : DEREF_ONE;
     - 	if (i == ATOM_SYMREF)
     - 		need_symref = 1;
     - 	return at;
      @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbuf *err)
     - 	 * If it is a tag object, see if we use a value that derefs
     - 	 * the object, and if we do grab the object it refers to.
     + 		return 0;
     + 
     + 	/*
     +-	 * If it is a tag object, see if we use a value that derefs
     +-	 * the object, and if we do grab the object it refers to.
     ++	 * If it is a tag object, see if we use the peeled value. If we do,
     ++	 * grab the peeled OID.
       	 */
      -	oi_deref.oid = *get_tagged_oid((struct tag *)obj);
     -+	if (need_tagged == DEREF_ALL) {
     -+		if (peel_iterated_oid(&obj->oid, &oi_deref.oid))
     -+			die("bad tag");
     -+	} else {
     -+		oi_deref.oid = *get_tagged_oid((struct tag *)obj);
     -+	}
     ++	if (need_tagged && peel_iterated_oid(&obj->oid, &oi_deref.oid))
     ++		die("bad tag");
       
      -	/*
      -	 * NEEDSWORK: This derefs tag only once, which
     @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbu
       }
       
      
     - ## ref-filter.h ##
     -@@ ref-filter.h: struct ref_format {
     - 	const char *rest;
     - 	int quote_style;
     - 	int use_color;
     -+	int full_deref;
     - 
     - 	/* Internal state to ref-filter */
     - 	int need_color_reset_at_eol;
     -
       ## t/t6300-for-each-ref.sh ##
      @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing refs' '
       	test_must_be_empty actual
     @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing
      +	nest1_tag_oid="$(git rev-parse refs/tags/nested/nest1)" &&
      +	nest2_tag_oid="$(git rev-parse refs/tags/nested/nest2)" &&
      +
     -+	# Without full dereference
     -+	cat >expect <<-EOF &&
     -+	refs/tags/nested/base $base_tag_oid tag $head_oid commit
     -+	refs/tags/nested/nest1 $nest1_tag_oid tag $base_tag_oid tag
     -+	refs/tags/nested/nest2 $nest2_tag_oid tag $nest1_tag_oid tag
     -+	EOF
     -+
     -+	git for-each-ref --format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
     -+		refs/tags/nested/ >actual &&
     -+	test_cmp expect actual &&
     -+
     -+	# With full dereference
      +	cat >expect <<-EOF &&
      +	refs/tags/nested/base $base_tag_oid tag $head_oid commit
      +	refs/tags/nested/nest1 $nest1_tag_oid tag $head_oid commit
      +	refs/tags/nested/nest2 $nest2_tag_oid tag $head_oid commit
      +	EOF
      +
     -+	git for-each-ref --full-deref \
     ++	git for-each-ref \
      +		--format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
      +		refs/tags/nested/ >actual &&
      +	test_cmp expect actual
     @@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with non-existing
       GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
       TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
       
     +
     + ## t/t6302-for-each-ref-filter.sh ##
     +@@ t/t6302-for-each-ref-filter.sh: test_expect_success 'check signed tags with --points-at' '
     + 	sed -e "s/Z$//" >expect <<-\EOF &&
     + 	refs/heads/side Z
     + 	refs/tags/annotated-tag four
     +-	refs/tags/doubly-annotated-tag An annotated tag
     +-	refs/tags/doubly-signed-tag A signed tag
     ++	refs/tags/doubly-annotated-tag four
     ++	refs/tags/doubly-signed-tag four
     + 	refs/tags/four Z
     + 	refs/tags/signed-tag four
     + 	EOF
  9:  a409d773057 ! 10:  d51d073aa4a t/perf: add perf tests for for-each-ref
     @@ Commit message
          Add performance tests for 'for-each-ref'. The tests exercise different
          combinations of filters/formats/options, as well as the overall performance
          of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
     -    performance difference vs. 'git for-each-ref --full-deref'.
     +    performance difference vs. 'git for-each-ref' with "%(*fieldname)" format
     +    specifiers.
      
          All tests are run against a repository with 40k loose refs - 10k commits,
          each having a unique:
     @@ t/perf/p6300-for-each-ref.sh (new)
      +run_tests () {
      +	test_for_each_ref "$1"
      +	test_for_each_ref "$1, no sort" --no-sort
     ++	test_for_each_ref "$1, --count=1" --count=1
     ++	test_for_each_ref "$1, --count=1, no sort" --no-sort --count=1
      +	test_for_each_ref "$1, tags" refs/tags/
      +	test_for_each_ref "$1, tags, no sort" --no-sort refs/tags/
     -+	test_for_each_ref "$1, tags, shallow deref" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, shallow deref, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, full deref" --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     -+	test_for_each_ref "$1, tags, full deref, no sort" --no-sort --full-deref '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     ++	test_for_each_ref "$1, tags, dereferenced" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
     ++	test_for_each_ref "$1, tags, dereferenced, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
      +
     -+	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (full deref)" "
     ++	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (dereferenced)" "
      +		for i in \$(test_seq $test_iteration_count); do
      +			git for-each-ref --format='%(objectname)^{} %(refname) %(objectname)' refs/tags/ | \
      +				git cat-file --batch-check='%(objectname) %(rest)' >/dev/null

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-16  5:29     ` Junio C Hamano
  2023-11-14 19:53   ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
printed refs are still sorted by ascending refname. Change the handling of
sort options in these commands so that '--no-sort' to truly disables
sorting.

'--no-sort' does not disable sorting in these commands is because their
option parsing does not distinguish between "the absence of '--sort'"
(and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
an empty 'sorting_options' string list, which is parsed by
'ref_sorting_options()' to create the 'struct ref_sorting *' for the
command. If the string list is empty, 'ref_sorting_options()' interprets
that as "the absence of '--sort'" and returns the default ref sorting
structure (equivalent to "refname" sort).

To handle '--no-sort' properly while preserving the "refname" sort in the
"absence of --sort'" case, first explicitly add "refname" to the string list
*before* parsing options. This alone doesn't actually change any behavior,
since 'compare_refs()' already falls back on comparing refnames if two refs
are equal w.r.t all other sort keys.

Now that the string list is populated by default, '--no-sort' is the only
way to empty the 'sorting_options' string list. Update
'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
string list is empty, and add a condition to 'ref_array_sort()' to skip the
sort altogether if the sort structure is NULL. Note that other functions
using 'struct ref_sorting *' do not need any changes because they already
ignore NULL values.

Finally, remove the condition around sorting in 'ls-remote', since it's no
longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
sorting by default. This default is preserved by simply leaving its sort key
string list empty before parsing options; if no additional sort keys are
set, 'struct ref_sorting *' is NULL and sorting is skipped.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c        |  6 ++++
 builtin/for-each-ref.c  |  3 ++
 builtin/ls-remote.c     | 11 +++----
 builtin/tag.c           |  6 ++++
 ref-filter.c            | 19 ++----------
 t/t3200-branch.sh       | 68 +++++++++++++++++++++++++++++++++++++++--
 t/t6300-for-each-ref.sh | 21 +++++++++++++
 t/t7004-tag.sh          | 45 +++++++++++++++++++++++++++
 8 files changed, 153 insertions(+), 26 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index e7ee9bd0f15..d67738bbcaa 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -767,7 +767,13 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_branch_usage, options);
 
+	/*
+	 * Try to set sort keys from config. If config does not set any,
+	 * fall back on default (refname) sorting.
+	 */
 	git_config(git_branch_config, &sorting_options);
+	if (!sorting_options.nr)
+		string_list_append(&sorting_options, "refname");
 
 	track = git_branch_track;
 
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 350bfa6e811..93b370f550b 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -67,6 +67,9 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	/* Set default (refname) sorting */
+	string_list_append(&sorting_options, "refname");
+
 	parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
 	if (maxcount < 0) {
 		error("invalid --count argument: `%d'", maxcount);
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index fc765754305..b416602b4d3 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -58,6 +58,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 	struct transport *transport;
 	const struct ref *ref;
 	struct ref_array ref_array;
+	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 
 	struct option options[] = {
@@ -141,13 +142,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		item->symref = xstrdup_or_null(ref->symref);
 	}
 
-	if (sorting_options.nr) {
-		struct ref_sorting *sorting;
-
-		sorting = ref_sorting_options(&sorting_options);
-		ref_array_sort(sorting, &ref_array);
-		ref_sorting_release(sorting);
-	}
+	sorting = ref_sorting_options(&sorting_options);
+	ref_array_sort(sorting, &ref_array);
 
 	for (i = 0; i < ref_array.nr; i++) {
 		const struct ref_array_item *ref = ref_array.items[i];
@@ -157,6 +153,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		status = 0; /* we found something */
 	}
 
+	ref_sorting_release(sorting);
 	ref_array_clear(&ref_array);
 	if (transport_disconnect(transport))
 		status = 1;
diff --git a/builtin/tag.c b/builtin/tag.c
index 3918eacbb57..64f3196cd4c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -501,7 +501,13 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 	setup_ref_filter_porcelain_msg();
 
+	/*
+	 * Try to set sort keys from config. If config does not set any,
+	 * fall back on default (refname) sorting.
+	 */
 	git_config(git_tag_config, &sorting_options);
+	if (!sorting_options.nr)
+		string_list_append(&sorting_options, "refname");
 
 	memset(&opt, 0, sizeof(opt));
 	filter.lines = -1;
diff --git a/ref-filter.c b/ref-filter.c
index e4d3510e28e..7250089b7c6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3142,7 +3142,8 @@ void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting,
 
 void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
 {
-	QSORT_S(array->items, array->nr, compare_refs, sorting);
+	if (sorting)
+		QSORT_S(array->items, array->nr, compare_refs, sorting);
 }
 
 static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
@@ -3248,18 +3249,6 @@ static int parse_sorting_atom(const char *atom)
 	return res;
 }
 
-/*  If no sorting option is given, use refname to sort as default */
-static struct ref_sorting *ref_default_sorting(void)
-{
-	static const char cstr_name[] = "refname";
-
-	struct ref_sorting *sorting = xcalloc(1, sizeof(*sorting));
-
-	sorting->next = NULL;
-	sorting->atom = parse_sorting_atom(cstr_name);
-	return sorting;
-}
-
 static void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *arg)
 {
 	struct ref_sorting *s;
@@ -3283,9 +3272,7 @@ struct ref_sorting *ref_sorting_options(struct string_list *options)
 	struct string_list_item *item;
 	struct ref_sorting *sorting = NULL, **tail = &sorting;
 
-	if (!options->nr) {
-		sorting = ref_default_sorting();
-	} else {
+	if (options->nr) {
 		for_each_string_list_item(item, options)
 			parse_ref_sorting(tail, item->string);
 	}
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 3182abde27f..9918ba05dec 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1570,9 +1570,10 @@ test_expect_success 'tracking with unexpected .fetch refspec' '
 
 test_expect_success 'configured committerdate sort' '
 	git init -b main sort &&
+	test_config -C sort branch.sort "committerdate" &&
+
 	(
 		cd sort &&
-		git config branch.sort committerdate &&
 		test_commit initial &&
 		git checkout -b a &&
 		test_commit a &&
@@ -1592,9 +1593,10 @@ test_expect_success 'configured committerdate sort' '
 '
 
 test_expect_success 'option override configured sort' '
+	test_config -C sort branch.sort "committerdate" &&
+
 	(
 		cd sort &&
-		git config branch.sort committerdate &&
 		git branch --sort=refname >actual &&
 		cat >expect <<-\EOF &&
 		  a
@@ -1606,10 +1608,70 @@ test_expect_success 'option override configured sort' '
 	)
 '
 
+test_expect_success '--no-sort cancels config sort keys' '
+	test_config -C sort branch.sort "-refname" &&
+
+	(
+		cd sort &&
+
+		# objecttype is identical for all of them, so sort falls back on
+		# default (ascending refname)
+		git branch \
+			--no-sort \
+			--sort="objecttype" >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		* b
+		  c
+		  main
+		EOF
+		test_cmp expect actual
+	)
+
+'
+
+test_expect_success '--no-sort cancels command line sort keys' '
+	(
+		cd sort &&
+
+		# objecttype is identical for all of them, so sort falls back on
+		# default (ascending refname)
+		git branch \
+			--sort="-refname" \
+			--no-sort \
+			--sort="objecttype" >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		* b
+		  c
+		  main
+		EOF
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--no-sort without subsequent --sort prints expected branches' '
+	(
+		cd sort &&
+
+		# Sort the results with `sort` for a consistent comparison
+		# against expected
+		git branch --no-sort | sort >actual &&
+		cat >expect <<-\EOF &&
+		  a
+		  c
+		  main
+		* b
+		EOF
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'invalid sort parameter in configuration' '
+	test_config -C sort branch.sort "v:notvalid" &&
+
 	(
 		cd sort &&
-		git config branch.sort "v:notvalid" &&
 
 		# this works in the "listing" mode, so bad sort key
 		# is a dying offence.
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 00a060df0b5..0613e5e3623 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1335,6 +1335,27 @@ test_expect_success '--no-sort cancels the previous sort keys' '
 	test_cmp expected actual
 '
 
+test_expect_success '--no-sort without subsequent --sort prints expected refs' '
+	cat >expected <<-\EOF &&
+	refs/tags/multi-ref1-100000-user1
+	refs/tags/multi-ref1-100000-user2
+	refs/tags/multi-ref1-200000-user1
+	refs/tags/multi-ref1-200000-user2
+	refs/tags/multi-ref2-100000-user1
+	refs/tags/multi-ref2-100000-user2
+	refs/tags/multi-ref2-200000-user1
+	refs/tags/multi-ref2-200000-user2
+	EOF
+
+	# Sort the results with `sort` for a consistent comparison against
+	# expected
+	git for-each-ref \
+		--format="%(refname)" \
+		--no-sort \
+		"refs/tags/multi-*" | sort >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'do not dereference NULL upon %(HEAD) on unborn branch' '
 	test_when_finished "git checkout main" &&
 	git for-each-ref --format="%(HEAD) %(refname:short)" refs/heads/ >actual &&
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index e689db42929..b41a47eb943 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1862,6 +1862,51 @@ test_expect_success 'option override configured sort' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-sort cancels config sort keys' '
+	test_config tag.sort "-refname" &&
+
+	# objecttype is identical for all of them, so sort falls back on
+	# default (ascending refname)
+	git tag -l \
+		--no-sort \
+		--sort="objecttype" \
+		"foo*" >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success '--no-sort cancels command line sort keys' '
+	# objecttype is identical for all of them, so sort falls back on
+	# default (ascending refname)
+	git tag -l \
+		--sort="-refname" \
+		--no-sort \
+		--sort="objecttype" \
+		"foo*" >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success '--no-sort without subsequent --sort prints expected tags' '
+	# Sort the results with `sort` for a consistent comparison against
+	# expected
+	git tag -l --no-sort "foo*" | sort >actual &&
+	cat >expect <<-\EOF &&
+	foo1.10
+	foo1.3
+	foo1.6
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'invalid sort parameter on command line' '
 	test_must_fail git tag -l --sort=notvalid "foo*" >actual
 '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-16 12:06     ` Øystein Walle
  2023-11-14 19:53   ` [PATCH v2 03/10] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add an internal 'array_opts' struct to 'struct ref_format' containing
formatting options that pertain to the formatting of an entire ref array:
'max_count' and 'omit_empty'. These values are specified by the '--count'
and '--omit-empty' options, respectively, to 'for-each-ref'/'tag'/'branch'.
Storing these values in the 'ref_format' will simplify the consolidation of
ref array formatting logic across builtins in later patches.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c       |  5 ++---
 builtin/for-each-ref.c | 21 +++++++++++----------
 builtin/tag.c          |  5 ++---
 ref-filter.h           |  5 +++++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index d67738bbcaa..5a1ec1cd04f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -45,7 +45,6 @@ static const char *head;
 static struct object_id head_oid;
 static int recurse_submodules = 0;
 static int submodule_propagate_branches = 0;
-static int omit_empty = 0;
 
 static int branch_use_color = -1;
 static char branch_colors[][COLOR_MAXLEN] = {
@@ -480,7 +479,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 			string_list_append(output, out.buf);
 		} else {
 			fwrite(out.buf, 1, out.len, stdout);
-			if (out.len || !omit_empty)
+			if (out.len || !format->array_opts.omit_empty)
 				putchar('\n');
 		}
 	}
@@ -737,7 +736,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
 		OPT_BIT('D', NULL, &delete, N_("delete branch (even if not merged)"), 2),
 		OPT_BIT('m', "move", &rename, N_("move/rename a branch and its reflog"), 1),
 		OPT_BIT('M', NULL, &rename, N_("move/rename a branch, even if target exists"), 2),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 		OPT_BIT('c', "copy", &copy, N_("copy a branch and its reflog"), 1),
 		OPT_BIT('C', NULL, &copy, N_("copy a branch, even if target exists"), 2),
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 93b370f550b..881c3ee055f 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -19,10 +19,10 @@ static char const * const for_each_ref_usage[] = {
 
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
-	int i;
+	int i, total;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
-	int maxcount = 0, icase = 0, omit_empty = 0;
+	int icase = 0;
 	struct ref_array array;
 	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
@@ -40,11 +40,11 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 			N_("quote placeholders suitably for python"), QUOTE_PYTHON),
 		OPT_BIT(0 , "tcl",  &format.quote_style,
 			N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 
 		OPT_GROUP(""),
-		OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
+		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
 		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
 		OPT__COLOR(&format.use_color, N_("respect format colors")),
 		OPT_REF_FILTER_EXCLUDE(&filter),
@@ -71,8 +71,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	string_list_append(&sorting_options, "refname");
 
 	parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
-	if (maxcount < 0) {
-		error("invalid --count argument: `%d'", maxcount);
+	if (format.array_opts.max_count < 0) {
+		error("invalid --count argument: `%d'", format.array_opts.max_count);
 		usage_with_options(for_each_ref_usage, opts);
 	}
 	if (HAS_MULTI_BITS(format.quote_style)) {
@@ -109,15 +109,16 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	ref_array_sort(sorting, &array);
 
-	if (!maxcount || array.nr < maxcount)
-		maxcount = array.nr;
-	for (i = 0; i < maxcount; i++) {
+	total = format.array_opts.max_count;
+	if (!total || array.nr < total)
+		total = array.nr;
+	for (i = 0; i < total; i++) {
 		strbuf_reset(&err);
 		strbuf_reset(&output);
 		if (format_ref_array_item(array.items[i], &format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !omit_empty)
+		if (output.len || !format.array_opts.omit_empty)
 			putchar('\n');
 	}
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 64f3196cd4c..2d599245d48 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -44,7 +44,6 @@ static const char * const git_tag_usage[] = {
 static unsigned int colopts;
 static int force_sign_annotate;
 static int config_sign_tag = -1; /* unspecified */
-static int omit_empty = 0;
 
 static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
@@ -83,7 +82,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		if (format_ref_array_item(array.items[i], format, &output, &err))
 			die("%s", err.buf);
 		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !omit_empty)
+		if (output.len || !format->array_opts.omit_empty)
 			putchar('\n');
 	}
 
@@ -481,7 +480,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 		OPT_WITHOUT(&filter.no_commit, N_("print only tags that don't contain the commit")),
 		OPT_MERGED(&filter, N_("print only tags that are merged")),
 		OPT_NO_MERGED(&filter, N_("print only tags that are not merged")),
-		OPT_BOOL(0, "omit-empty",  &omit_empty,
+		OPT_BOOL(0, "omit-empty",  &format.array_opts.omit_empty,
 			N_("do not output a newline after empty formatted refs")),
 		OPT_REF_SORT(&sorting_options),
 		{
diff --git a/ref-filter.h b/ref-filter.h
index 1524bc463a5..d87d61238b7 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -92,6 +92,11 @@ struct ref_format {
 
 	/* List of bases for ahead-behind counts. */
 	struct string_list bases;
+
+	struct {
+		int max_count;
+		int omit_empty;
+	} array_opts;
 };
 
 #define REF_FILTER_INIT { \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 03/10] ref-filter.h: move contains caches into filter
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into
an 'internal' struct of the 'struct ref_filter'. In later patches, the
'struct ref_filter *' will be a common data structure across multiple
filtering functions. These caches are part of the common functionality the
filter struct will support, so they are updated to be internally accessible
wherever the filter is used.

The design used here mirrors what was introduced in 576de3d956
(unpack_trees: start splitting internal fields from public API, 2023-02-27)
for 'unpack_trees_options'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 14 ++++++--------
 ref-filter.h |  6 ++++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 7250089b7c6..5129b6986c9 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2764,8 +2764,6 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 struct ref_filter_cbdata {
 	struct ref_array *array;
 	struct ref_filter *filter;
-	struct contains_cache contains_cache;
-	struct contains_cache no_contains_cache;
 };
 
 /*
@@ -2816,11 +2814,11 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 			return 0;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
-		    !commit_contains(filter, commit, filter->with_commit, &ref_cbdata->contains_cache))
+		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
 			return 0;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
-		    commit_contains(filter, commit, filter->no_commit, &ref_cbdata->no_contains_cache))
+		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
 			return 0;
 	}
 
@@ -2989,8 +2987,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	save_commit_buffer_orig = save_commit_buffer;
 	save_commit_buffer = 0;
 
-	init_contains_cache(&ref_cbdata.contains_cache);
-	init_contains_cache(&ref_cbdata.no_contains_cache);
+	init_contains_cache(&filter->internal.contains_cache);
+	init_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Simple per-ref filtering */
 	if (!filter->kind)
@@ -3014,8 +3012,8 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 			head_ref(ref_filter_handler, &ref_cbdata);
 	}
 
-	clear_contains_cache(&ref_cbdata.contains_cache);
-	clear_contains_cache(&ref_cbdata.no_contains_cache);
+	clear_contains_cache(&filter->internal.contains_cache);
+	clear_contains_cache(&filter->internal.no_contains_cache);
 
 	/*  Filters that need revision walking */
 	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
diff --git a/ref-filter.h b/ref-filter.h
index d87d61238b7..0db3ff52889 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -7,6 +7,7 @@
 #include "commit.h"
 #include "string-list.h"
 #include "strvec.h"
+#include "commit-reach.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -75,6 +76,11 @@ struct ref_filter {
 		lines;
 	int abbrev,
 		verbose;
+
+	struct {
+		struct contains_cache contains_cache;
+		struct contains_cache no_contains_cache;
+	} internal;
 };
 
 struct ref_format {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 03/10] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-16  5:39     ` Junio C Hamano
  2023-11-14 19:53   ` [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' Victoria Dye via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add two new public methods to 'ref-filter.h':

* 'print_formatted_ref_array()' which, given a format specification & array
  of ref items, formats and prints the items to stdout.
* 'filter_and_format_refs()' which combines 'filter_refs()',
  'ref_array_sort()', and 'print_formatted_ref_array()' into a single
  function.

This consolidates much of the code used to filter and format refs in
'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing
duplication and simplifying the future changes needed to optimize the filter
& format process.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/branch.c       | 33 +++++++++++++++++----------------
 builtin/for-each-ref.c | 27 +--------------------------
 builtin/tag.c          | 23 +----------------------
 ref-filter.c           | 35 +++++++++++++++++++++++++++++++++++
 ref-filter.h           | 14 ++++++++++++++
 5 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 5a1ec1cd04f..2ed59f16f1c 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -437,8 +437,6 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 {
 	int i;
 	struct ref_array array;
-	struct strbuf out = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	int maxwidth = 0;
 	const char *remote_prefix = "";
 	char *to_free = NULL;
@@ -468,24 +466,27 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
-	for (i = 0; i < array.nr; i++) {
-		strbuf_reset(&err);
-		strbuf_reset(&out);
-		if (format_ref_array_item(array.items[i], format, &out, &err))
-			die("%s", err.buf);
-		if (column_active(colopts)) {
-			assert(!filter->verbose && "--column and --verbose are incompatible");
-			 /* format to a string_list to let print_columns() do its job */
+	if (column_active(colopts)) {
+		struct strbuf out = STRBUF_INIT, err = STRBUF_INIT;
+
+		assert(!filter->verbose && "--column and --verbose are incompatible");
+
+		for (i = 0; i < array.nr; i++) {
+			strbuf_reset(&err);
+			strbuf_reset(&out);
+			if (format_ref_array_item(array.items[i], format, &out, &err))
+				die("%s", err.buf);
+
+			/* format to a string_list to let print_columns() do its job */
 			string_list_append(output, out.buf);
-		} else {
-			fwrite(out.buf, 1, out.len, stdout);
-			if (out.len || !format->array_opts.omit_empty)
-				putchar('\n');
 		}
+
+		strbuf_release(&err);
+		strbuf_release(&out);
+	} else {
+		print_formatted_ref_array(&array, format);
 	}
 
-	strbuf_release(&err);
-	strbuf_release(&out);
 	ref_array_clear(&array);
 	free(to_free);
 }
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 881c3ee055f..1c19cd5bd34 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -19,15 +19,11 @@ static char const * const for_each_ref_usage[] = {
 
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
-	int i, total;
 	struct ref_sorting *sorting;
 	struct string_list sorting_options = STRING_LIST_INIT_DUP;
 	int icase = 0;
-	struct ref_array array;
 	struct ref_filter filter = REF_FILTER_INIT;
 	struct ref_format format = REF_FORMAT_INIT;
-	struct strbuf output = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	int from_stdin = 0;
 	struct strvec vec = STRVEC_INIT;
 
@@ -61,8 +57,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 
-	memset(&array, 0, sizeof(array));
-
 	format.format = "%(objectname) %(objecttype)\t%(refname)";
 
 	git_config(git_default_config, NULL);
@@ -104,27 +98,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	}
 
 	filter.match_as_path = 1;
-	filter_refs(&array, &filter, FILTER_REFS_ALL);
-	filter_ahead_behind(the_repository, &format, &array);
-
-	ref_array_sort(sorting, &array);
-
-	total = format.array_opts.max_count;
-	if (!total || array.nr < total)
-		total = array.nr;
-	for (i = 0; i < total; i++) {
-		strbuf_reset(&err);
-		strbuf_reset(&output);
-		if (format_ref_array_item(array.items[i], &format, &output, &err))
-			die("%s", err.buf);
-		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !format.array_opts.omit_empty)
-			putchar('\n');
-	}
+	filter_and_format_refs(&filter, FILTER_REFS_ALL, sorting, &format);
 
-	strbuf_release(&err);
-	strbuf_release(&output);
-	ref_array_clear(&array);
 	ref_filter_clear(&filter);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
diff --git a/builtin/tag.c b/builtin/tag.c
index 2d599245d48..2528d499dd8 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -48,13 +48,7 @@ static int config_sign_tag = -1; /* unspecified */
 static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
 {
-	struct ref_array array;
-	struct strbuf output = STRBUF_INIT;
-	struct strbuf err = STRBUF_INIT;
 	char *to_free = NULL;
-	int i;
-
-	memset(&array, 0, sizeof(array));
 
 	if (filter->lines == -1)
 		filter->lines = 0;
@@ -72,23 +66,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
-	filter_refs(&array, filter, FILTER_REFS_TAGS);
-	filter_ahead_behind(the_repository, format, &array);
-	ref_array_sort(sorting, &array);
-
-	for (i = 0; i < array.nr; i++) {
-		strbuf_reset(&output);
-		strbuf_reset(&err);
-		if (format_ref_array_item(array.items[i], format, &output, &err))
-			die("%s", err.buf);
-		fwrite(output.buf, 1, output.len, stdout);
-		if (output.len || !format->array_opts.omit_empty)
-			putchar('\n');
-	}
+	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
 
-	strbuf_release(&err);
-	strbuf_release(&output);
-	ref_array_clear(&array);
 	free(to_free);
 
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index 5129b6986c9..8992fbf45b1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3023,6 +3023,18 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	return ret;
 }
 
+void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
+			    struct ref_sorting *sorting,
+			    struct ref_format *format)
+{
+	struct ref_array array = { 0 };
+	filter_refs(&array, filter, type);
+	filter_ahead_behind(the_repository, format, &array);
+	ref_array_sort(sorting, &array);
+	print_formatted_ref_array(&array, format);
+	ref_array_clear(&array);
+}
+
 static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
 {
 	if (!(a->kind ^ b->kind))
@@ -3212,6 +3224,29 @@ int format_ref_array_item(struct ref_array_item *info,
 	return 0;
 }
 
+void print_formatted_ref_array(struct ref_array *array, struct ref_format *format)
+{
+	int total;
+	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
+
+	total = format->array_opts.max_count;
+	if (!total || array->nr < total)
+		total = array->nr;
+	for (int i = 0; i < total; i++) {
+		strbuf_reset(&err);
+		strbuf_reset(&output);
+		if (format_ref_array_item(array->items[i], format, &output, &err))
+			die("%s", err.buf);
+		if (output.len || !format->array_opts.omit_empty) {
+			fwrite(output.buf, 1, output.len, stdout);
+			putchar('\n');
+		}
+	}
+
+	strbuf_release(&err);
+	strbuf_release(&output);
+}
+
 void pretty_print_ref(const char *name, const struct object_id *oid,
 		      struct ref_format *format)
 {
diff --git a/ref-filter.h b/ref-filter.h
index 0db3ff52889..0ce5af58ab3 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -137,6 +137,14 @@ struct ref_format {
  * filtered refs in the ref_array structure.
  */
 int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
+/*
+ * Filter refs using the given ref_filter and type, sort the contents
+ * according to the given ref_sorting, format the filtered refs with the
+ * given ref_format, and print them to stdout.
+ */
+void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
+			    struct ref_sorting *sorting,
+			    struct ref_format *format);
 /*  Clear all memory allocated to ref_array */
 void ref_array_clear(struct ref_array *array);
 /*  Used to verify if the given format is correct and to parse out the used atoms */
@@ -161,6 +169,12 @@ char *get_head_description(void);
 /*  Set up translated strings in the output. */
 void setup_ref_filter_porcelain_msg(void);
 
+/*
+ * Print up to maxcount ref_array elements to stdout using the given
+ * ref_format.
+ */
+void print_formatted_ref_array(struct ref_array *array, struct ref_format *format);
+
 /*
  * Print a single ref, outside of any ref-filter. Note that the
  * name must be a fully qualified refname.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Rename 'ref_filter_handler()' to 'filter_one()' to more clearly distinguish
it from other ref filtering callbacks that will be added in later patches.
The "*_one()" naming convention is common throughout the codebase for
iteration callbacks.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 8992fbf45b1..5186ee2687b 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2770,7 +2770,7 @@ struct ref_filter_cbdata {
  * A call-back given to for_each_ref().  Filter refs and keep them for
  * later object processing.
  */
-static int ref_filter_handler(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct ref_filter_cbdata *ref_cbdata = cb_data;
 	struct ref_filter *filter = ref_cbdata->filter;
@@ -3001,15 +3001,15 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/heads/", filter_one, &ref_cbdata);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/remotes/", filter_one, &ref_cbdata);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in("refs/tags/", filter_one, &ref_cbdata);
 		else if (filter->kind & FILTER_REFS_ALL)
-			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
+			ret = for_each_fullref_in_pattern(filter, filter_one, &ref_cbdata);
 		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			head_ref(ref_filter_handler, &ref_cbdata);
+			head_ref(filter_one, &ref_cbdata);
 	}
 
 	clear_contains_cache(&filter->internal.contains_cache);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (4 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
'filter_refs()' into new helper functions:

* Extract the code to grow a 'struct ref_array' and append a given 'struct
  ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'.
* Extract the code to filter a given ref by refname & object ID then create
  a new 'struct ref_array_item *' from 'filter_one()' into
  'apply_ref_filter()'.
* Extract the code for filter pre-processing, contains cache creation, and
  ref iteration from 'filter_refs()' into 'do_filter_refs()'.

In later patches, these helpers will be used by new ref-filter API
functions. This patch does not result in any user-facing behavior changes or
changes to callers outside of 'ref-filter.c'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 115 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 46 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 5186ee2687b..ff00ab4b8d8 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2716,15 +2716,18 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 	return ref;
 }
 
+static void ref_array_append(struct ref_array *array, struct ref_array_item *ref)
+{
+	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
+	array->items[array->nr++] = ref;
+}
+
 struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid)
 {
 	struct ref_array_item *ref = new_ref_array_item(refname, oid);
-
-	ALLOC_GROW(array->items, array->nr + 1, array->alloc);
-	array->items[array->nr++] = ref;
-
+	ref_array_append(array, ref);
 	return ref;
 }
 
@@ -2761,46 +2764,36 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 	return ref_kind_from_refname(refname);
 }
 
-struct ref_filter_cbdata {
-	struct ref_array *array;
-	struct ref_filter *filter;
-};
-
-/*
- * A call-back given to for_each_ref().  Filter refs and keep them for
- * later object processing.
- */
-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
+			    int flag, struct ref_filter *filter)
 {
-	struct ref_filter_cbdata *ref_cbdata = cb_data;
-	struct ref_filter *filter = ref_cbdata->filter;
 	struct ref_array_item *ref;
 	struct commit *commit = NULL;
 	unsigned int kind;
 
 	if (flag & REF_BAD_NAME) {
 		warning(_("ignoring ref with broken name %s"), refname);
-		return 0;
+		return NULL;
 	}
 
 	if (flag & REF_ISBROKEN) {
 		warning(_("ignoring broken ref %s"), refname);
-		return 0;
+		return NULL;
 	}
 
 	/* Obtain the current ref kind from filter_ref_kind() and ignore unwanted refs. */
 	kind = filter_ref_kind(filter, refname);
 	if (!(kind & filter->kind))
-		return 0;
+		return NULL;
 
 	if (!filter_pattern_match(filter, refname))
-		return 0;
+		return NULL;
 
 	if (filter_exclude_match(filter, refname))
-		return 0;
+		return NULL;
 
 	if (filter->points_at.nr && !match_points_at(&filter->points_at, oid, refname))
-		return 0;
+		return NULL;
 
 	/*
 	 * A merge filter is applied on refs pointing to commits. Hence
@@ -2811,15 +2804,15 @@ static int filter_one(const char *refname, const struct object_id *oid, int flag
 	    filter->with_commit || filter->no_commit || filter->verbose) {
 		commit = lookup_commit_reference_gently(the_repository, oid, 1);
 		if (!commit)
-			return 0;
+			return NULL;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
 		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
-			return 0;
+			return NULL;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
 		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
-			return 0;
+			return NULL;
 	}
 
 	/*
@@ -2827,11 +2820,32 @@ static int filter_one(const char *refname, const struct object_id *oid, int flag
 	 * to do its job and the resulting list may yet to be pruned
 	 * by maxcount logic.
 	 */
-	ref = ref_array_push(ref_cbdata->array, refname, oid);
+	ref = new_ref_array_item(refname, oid);
 	ref->commit = commit;
 	ref->flag = flag;
 	ref->kind = kind;
 
+	return ref;
+}
+
+struct ref_filter_cbdata {
+	struct ref_array *array;
+	struct ref_filter *filter;
+};
+
+/*
+ * A call-back given to for_each_ref().  Filter refs and keep them for
+ * later object processing.
+ */
+static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+{
+	struct ref_filter_cbdata *ref_cbdata = cb_data;
+	struct ref_array_item *ref;
+
+	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	if (ref)
+		ref_array_append(ref_cbdata->array, ref);
+
 	return 0;
 }
 
@@ -2967,26 +2981,12 @@ void filter_ahead_behind(struct repository *r,
 	free(commits);
 }
 
-/*
- * API for filtering a set of refs. Based on the type of refs the user
- * has requested, we iterate through those refs and apply filters
- * as per the given ref_filter structure and finally store the
- * filtered refs in the ref_array structure.
- */
-int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
+static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
 {
-	struct ref_filter_cbdata ref_cbdata;
-	int save_commit_buffer_orig;
 	int ret = 0;
 
-	ref_cbdata.array = array;
-	ref_cbdata.filter = filter;
-
 	filter->kind = type & FILTER_REFS_KIND_MASK;
 
-	save_commit_buffer_orig = save_commit_buffer;
-	save_commit_buffer = 0;
-
 	init_contains_cache(&filter->internal.contains_cache);
 	init_contains_cache(&filter->internal.no_contains_cache);
 
@@ -3001,20 +3001,43 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = for_each_fullref_in("refs/heads/", filter_one, &ref_cbdata);
+			ret = for_each_fullref_in("refs/heads/", fn, cb_data);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = for_each_fullref_in("refs/remotes/", filter_one, &ref_cbdata);
+			ret = for_each_fullref_in("refs/remotes/", fn, cb_data);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = for_each_fullref_in("refs/tags/", filter_one, &ref_cbdata);
+			ret = for_each_fullref_in("refs/tags/", fn, cb_data);
 		else if (filter->kind & FILTER_REFS_ALL)
-			ret = for_each_fullref_in_pattern(filter, filter_one, &ref_cbdata);
+			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
 		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			head_ref(filter_one, &ref_cbdata);
+			head_ref(fn, cb_data);
 	}
 
 	clear_contains_cache(&filter->internal.contains_cache);
 	clear_contains_cache(&filter->internal.no_contains_cache);
 
+	return ret;
+}
+
+/*
+ * API for filtering a set of refs. Based on the type of refs the user
+ * has requested, we iterate through those refs and apply filters
+ * as per the given ref_filter structure and finally store the
+ * filtered refs in the ref_array structure.
+ */
+int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type)
+{
+	struct ref_filter_cbdata ref_cbdata;
+	int save_commit_buffer_orig;
+	int ret = 0;
+
+	ref_cbdata.array = array;
+	ref_cbdata.filter = filter;
+
+	save_commit_buffer_orig = save_commit_buffer;
+	save_commit_buffer = 0;
+
+	ret = do_filter_refs(filter, type, filter_one, &ref_cbdata);
+
 	/*  Filters that need revision walking */
 	reach_filter(array, &filter->reachable_from, INCLUDE_REACHED);
 	reach_filter(array, &filter->unreachable_from, EXCLUDE_REACHED);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (5 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 08/10] for-each-ref: clean up documentation of --format Victoria Dye via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'filter_and_format_refs()' to try to perform ref filtering &
formatting in a single ref iteration, without an intermediate 'struct
ref_array'. This can only be done if no operations need to be performed on a
pre-filtered array; specifically, if the refs are

- filtered on reachability,
- sorted, or
- formatted with ahead-behind information

they cannot be filtered & formatted in the same iteration. In that case,
fall back on the current filter-then-sort-then-format flow.

This optimization substantially improves memory usage due to no longer
storing a ref array in memory. In some cases, it also dramatically reduces
runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
all refs into a 'struct ref_array' to printing only the first ref).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 ref-filter.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index ff00ab4b8d8..48453db24f7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2863,6 +2863,49 @@ static void free_array_item(struct ref_array_item *item)
 	free(item);
 }
 
+struct ref_filter_and_format_cbdata {
+	struct ref_filter *filter;
+	struct ref_format *format;
+
+	struct ref_filter_and_format_internal {
+		int count;
+	} internal;
+};
+
+static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+{
+	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
+	struct ref_array_item *ref;
+	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
+
+	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	if (!ref)
+		return 0;
+
+	if (format_ref_array_item(ref, ref_cbdata->format, &output, &err))
+		die("%s", err.buf);
+
+	if (output.len || !ref_cbdata->format->array_opts.omit_empty) {
+		fwrite(output.buf, 1, output.len, stdout);
+		putchar('\n');
+	}
+
+	strbuf_release(&output);
+	strbuf_release(&err);
+	free_array_item(ref);
+
+	/*
+	 * Increment the running count of refs that match the filter. If
+	 * max_count is set and we've reached the max, stop the ref
+	 * iteration by returning a nonzero value.
+	 */
+	if (ref_cbdata->format->array_opts.max_count &&
+	    ++ref_cbdata->internal.count >= ref_cbdata->format->array_opts.max_count)
+		return 1;
+
+	return 0;
+}
+
 /* Free all memory allocated for ref_array */
 void ref_array_clear(struct ref_array *array)
 {
@@ -3046,16 +3089,49 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 	return ret;
 }
 
+static inline int can_do_iterative_format(struct ref_filter *filter,
+					  struct ref_sorting *sorting,
+					  struct ref_format *format)
+{
+	/*
+	 * Filtering & formatting results within a single ref iteration
+	 * callback is not compatible with options that require
+	 * post-processing a filtered ref_array. These include:
+	 * - filtering on reachability
+	 * - sorting the filtered results
+	 * - including ahead-behind information in the formatted output
+	 */
+	return !(filter->reachable_from ||
+		 filter->unreachable_from ||
+		 sorting ||
+		 format->bases.nr);
+}
+
 void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 			    struct ref_sorting *sorting,
 			    struct ref_format *format)
 {
-	struct ref_array array = { 0 };
-	filter_refs(&array, filter, type);
-	filter_ahead_behind(the_repository, format, &array);
-	ref_array_sort(sorting, &array);
-	print_formatted_ref_array(&array, format);
-	ref_array_clear(&array);
+	if (can_do_iterative_format(filter, sorting, format)) {
+		int save_commit_buffer_orig;
+		struct ref_filter_and_format_cbdata ref_cbdata = {
+			.filter = filter,
+			.format = format,
+		};
+
+		save_commit_buffer_orig = save_commit_buffer;
+		save_commit_buffer = 0;
+
+		do_filter_refs(filter, type, filter_and_format_one, &ref_cbdata);
+
+		save_commit_buffer = save_commit_buffer_orig;
+	} else {
+		struct ref_array array = { 0 };
+		filter_refs(&array, filter, type);
+		filter_ahead_behind(the_repository, format, &array);
+		ref_array_sort(sorting, &array);
+		print_formatted_ref_array(&array, format);
+		ref_array_clear(&array);
+	}
 }
 
 static int compare_detached_head(struct ref_array_item *a, struct ref_array_item *b)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 08/10] for-each-ref: clean up documentation of --format
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (6 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
  2023-11-14 19:53   ` [PATCH v2 10/10] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Move the description of the `*` prefix from the --format option
documentation to the part of the command documentation that deals with other
object type-specific modifiers. Also reorganize and reword the remaining
--format documentation so that the explanation of the default format doesn't
interrupt the details on format string interpolation.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-for-each-ref.txt | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index e86d5700ddf..b136c9fa908 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -51,17 +51,14 @@ OPTIONS
 	key.
 
 --format=<format>::
-	A string that interpolates `%(fieldname)` from a ref being shown
-	and the object it points at.  If `fieldname`
-	is prefixed with an asterisk (`*`) and the ref points
-	at a tag object, use the value for the field in the object
-	which the tag object refers to (instead of the field in the tag object).
-	When unspecified, `<format>` defaults to
-	`%(objectname) SPC %(objecttype) TAB %(refname)`.
-	It also interpolates `%%` to `%`, and `%xx` where `xx`
-	are hex digits interpolates to character with hex code
-	`xx`; for example `%00` interpolates to `\0` (NUL),
-	`%09` to `\t` (TAB) and `%0a` to `\n` (LF).
+	A string that interpolates `%(fieldname)` from a ref being shown and
+	the object it points at. In addition, the string literal `%%`
+	renders as `%` and `%xx` - where `xx` are hex digits - renders as
+	the character with hex code `xx`. For example, `%00` interpolates to
+	`\0` (NUL), `%09` to `\t` (TAB), and `%0a` to `\n` (LF).
++
+When unspecified, `<format>` defaults to `%(objectname) SPC %(objecttype)
+TAB %(refname)`.
 
 --color[=<when>]::
 	Respect any colors specified in the `--format` option. The
@@ -298,6 +295,10 @@ fields will correspond to the appropriate date or name-email-date tuple
 from the `committer` or `tagger` fields depending on the object type.
 These are intended for working on a mix of annotated and lightweight tags.
 
+For tag objects, a `fieldname` prefixed with an asterisk (`*`) expands to
+the `fieldname` value of object the tag points at, rather than that of the
+tag object itself.
+
 Fields that have name-email-date tuple as its value (`author`,
 `committer`, and `tagger`) can be suffixed with `name`, `email`,
 and `date` to extract the named component.  For email fields (`authoremail`,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (7 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 08/10] for-each-ref: clean up documentation of --format Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  2023-11-16  5:48     ` Junio C Hamano
  2023-11-14 19:53   ` [PATCH v2 10/10] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
  9 siblings, 1 reply; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
"dereferencing" a tag refers to a recursive peel of the tag object. Unlike
these cases, the dereferencing prefix ('*') in 'for-each-ref' format
specifiers triggers only a single, non-recursive dereference of a given tag
object. For most annotated tags, a single dereference is all that is needed
to access the tag's associated commit or tree; "recursive" and
"non-recursive" dereferencing are functionally equivalent in these cases.
However, nested tags (annotated tags whose target is another annotated tag)
dereferenced once return another tag, where a recursive dereference would
return the commit or tree.

Currently, if a user wants to filter & format refs and include information
about a recursively-dereferenced tag, they can do so with something like
'cat-file --batch-check':

    git for-each-ref --format="%(objectname)^{} %(refname)" <pattern> |
        git cat-file --batch-check="%(objectname) %(rest)"

But the combination of commands is inefficient. So, to improve the
performance of this use case and align the defererencing behavior of
'for-each-ref' with that of other commands, update the ref formatting code
to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields
rather than the tag's immediate target object (from 'get_tagged_oid()').

Additionally, add a test to 't6300-for-each-ref' to verify new nested tag
behavior and update 't6302-for-each-ref-filter.sh' to print the correct
value for nested dereferenced fields.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-for-each-ref.txt |  4 ++--
 ref-filter.c                       | 13 ++++---------
 t/t6300-for-each-ref.sh            | 22 ++++++++++++++++++++++
 t/t6302-for-each-ref-filter.sh     |  4 ++--
 4 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index b136c9fa908..be9543f6840 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -296,8 +296,8 @@ from the `committer` or `tagger` fields depending on the object type.
 These are intended for working on a mix of annotated and lightweight tags.
 
 For tag objects, a `fieldname` prefixed with an asterisk (`*`) expands to
-the `fieldname` value of object the tag points at, rather than that of the
-tag object itself.
+the `fieldname` value of the peeled object, rather than that of the tag
+object itself.
 
 Fields that have name-email-date tuple as its value (`author`,
 `committer`, and `tagger`) can be suffixed with `name`, `email`,
diff --git a/ref-filter.c b/ref-filter.c
index 48453db24f7..fdaabb5bb45 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2508,17 +2508,12 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		return 0;
 
 	/*
-	 * If it is a tag object, see if we use a value that derefs
-	 * the object, and if we do grab the object it refers to.
+	 * If it is a tag object, see if we use the peeled value. If we do,
+	 * grab the peeled OID.
 	 */
-	oi_deref.oid = *get_tagged_oid((struct tag *)obj);
+	if (need_tagged && peel_iterated_oid(&obj->oid, &oi_deref.oid))
+		die("bad tag");
 
-	/*
-	 * NEEDSWORK: This derefs tag only once, which
-	 * is good to deal with chains of trust, but
-	 * is not consistent with what deref_tag() does
-	 * which peels the onion to the core.
-	 */
 	return get_object(ref, 1, &obj, &oi_deref, err);
 }
 
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 0613e5e3623..54e22812598 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1839,6 +1839,28 @@ test_expect_success 'git for-each-ref with non-existing refs' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'git for-each-ref with nested tags' '
+	git tag -am "Normal tag" nested/base HEAD &&
+	git tag -am "Nested tag" nested/nest1 refs/tags/nested/base &&
+	git tag -am "Double nested tag" nested/nest2 refs/tags/nested/nest1 &&
+
+	head_oid="$(git rev-parse HEAD)" &&
+	base_tag_oid="$(git rev-parse refs/tags/nested/base)" &&
+	nest1_tag_oid="$(git rev-parse refs/tags/nested/nest1)" &&
+	nest2_tag_oid="$(git rev-parse refs/tags/nested/nest2)" &&
+
+	cat >expect <<-EOF &&
+	refs/tags/nested/base $base_tag_oid tag $head_oid commit
+	refs/tags/nested/nest1 $nest1_tag_oid tag $head_oid commit
+	refs/tags/nested/nest2 $nest2_tag_oid tag $head_oid commit
+	EOF
+
+	git for-each-ref \
+		--format="%(refname) %(objectname) %(objecttype) %(*objectname) %(*objecttype)" \
+		refs/tags/nested/ >actual &&
+	test_cmp expect actual
+'
+
 GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
 TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
 
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index af223e44d67..82f3d1ea0f2 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -45,8 +45,8 @@ test_expect_success 'check signed tags with --points-at' '
 	sed -e "s/Z$//" >expect <<-\EOF &&
 	refs/heads/side Z
 	refs/tags/annotated-tag four
-	refs/tags/doubly-annotated-tag An annotated tag
-	refs/tags/doubly-signed-tag A signed tag
+	refs/tags/doubly-annotated-tag four
+	refs/tags/doubly-signed-tag four
 	refs/tags/four Z
 	refs/tags/signed-tag four
 	EOF
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 10/10] t/perf: add perf tests for for-each-ref
  2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
                     ` (8 preceding siblings ...)
  2023-11-14 19:53   ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
@ 2023-11-14 19:53   ` Victoria Dye via GitGitGadget
  9 siblings, 0 replies; 49+ messages in thread
From: Victoria Dye via GitGitGadget @ 2023-11-14 19:53 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Øystein Walle, Kristoffer Haugsbakk,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add performance tests for 'for-each-ref'. The tests exercise different
combinations of filters/formats/options, as well as the overall performance
of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
performance difference vs. 'git for-each-ref' with "%(*fieldname)" format
specifiers.

All tests are run against a repository with 40k loose refs - 10k commits,
each having a unique:

- branch
- custom ref (refs/custom/special_*)
- annotated tag pointing at the commit
- annotated tag pointing at the other annotated tag (i.e., a nested tag)

After those tests are finished, the refs are packed with 'pack-refs --all'
and the same tests are rerun.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p6300-for-each-ref.sh | 87 ++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100755 t/perf/p6300-for-each-ref.sh

diff --git a/t/perf/p6300-for-each-ref.sh b/t/perf/p6300-for-each-ref.sh
new file mode 100755
index 00000000000..fa7289c7522
--- /dev/null
+++ b/t/perf/p6300-for-each-ref.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+
+test_description='performance of for-each-ref'
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+ref_count_per_type=10000
+test_iteration_count=10
+
+test_expect_success "setup" '
+	test_commit_bulk $(( 1 + $ref_count_per_type )) &&
+
+	# Create refs
+	test_seq $ref_count_per_type |
+		sed "s,.*,update refs/heads/branch_& HEAD~&\nupdate refs/custom/special_& HEAD~&," |
+		git update-ref --stdin &&
+
+	# Create annotated tags
+	for i in $(test_seq $ref_count_per_type)
+	do
+		# Base tags
+		echo "tag tag_$i" &&
+		echo "mark :$i" &&
+		echo "from HEAD~$i" &&
+		printf "tagger %s <%s> %s\n" \
+			"$GIT_COMMITTER_NAME" \
+			"$GIT_COMMITTER_EMAIL" \
+			"$GIT_COMMITTER_DATE" &&
+		echo "data <<EOF" &&
+		echo "tag $i" &&
+		echo "EOF" &&
+
+		# Nested tags
+		echo "tag nested_$i" &&
+		echo "from :$i" &&
+		printf "tagger %s <%s> %s\n" \
+			"$GIT_COMMITTER_NAME" \
+			"$GIT_COMMITTER_EMAIL" \
+			"$GIT_COMMITTER_DATE" &&
+		echo "data <<EOF" &&
+		echo "nested tag $i" &&
+		echo "EOF" || return 1
+	done | git fast-import
+'
+
+test_for_each_ref () {
+	title="for-each-ref"
+	if test $# -gt 0; then
+		title="$title ($1)"
+		shift
+	fi
+	args="$@"
+
+	test_perf "$title" "
+		for i in \$(test_seq $test_iteration_count); do
+			git for-each-ref $args >/dev/null
+		done
+	"
+}
+
+run_tests () {
+	test_for_each_ref "$1"
+	test_for_each_ref "$1, no sort" --no-sort
+	test_for_each_ref "$1, --count=1" --count=1
+	test_for_each_ref "$1, --count=1, no sort" --no-sort --count=1
+	test_for_each_ref "$1, tags" refs/tags/
+	test_for_each_ref "$1, tags, no sort" --no-sort refs/tags/
+	test_for_each_ref "$1, tags, dereferenced" '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+	test_for_each_ref "$1, tags, dereferenced, no sort" --no-sort '--format="%(refname) %(objectname) %(*objectname)"' refs/tags/
+
+	test_perf "for-each-ref ($1, tags) + cat-file --batch-check (dereferenced)" "
+		for i in \$(test_seq $test_iteration_count); do
+			git for-each-ref --format='%(objectname)^{} %(refname) %(objectname)' refs/tags/ | \
+				git cat-file --batch-check='%(objectname) %(rest)' >/dev/null
+		done
+	"
+}
+
+run_tests "loose"
+
+test_expect_success 'pack refs' '
+	git pack-refs --all
+'
+run_tests "packed"
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort
  2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
@ 2023-11-16  5:29     ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-16  5:29 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, Patrick Steinhardt, Øystein Walle,
	Kristoffer Haugsbakk, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
> printed refs are still sorted by ascending refname. Change the handling of
> sort options in these commands so that '--no-sort' to truly disables
> sorting.

"to truly disables" -> "truly disables" I think?

> '--no-sort' does not disable sorting in these commands is because their

"'--no-sort' does not" -> "The reason why '--no-sort' does not", or
"is because" -> "because".

> option parsing does not distinguish between "the absence of '--sort'"
> (and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
> an empty 'sorting_options' string list, which is parsed by
> 'ref_sorting_options()' to create the 'struct ref_sorting *' for the
> command. If the string list is empty, 'ref_sorting_options()' interprets
> that as "the absence of '--sort'" and returns the default ref sorting
> structure (equivalent to "refname" sort).
>
> To handle '--no-sort' properly while preserving the "refname" sort in the
> "absence of --sort'" case, first explicitly add "refname" to the string list
> *before* parsing options. This alone doesn't actually change any behavior,
> since 'compare_refs()' already falls back on comparing refnames if two refs
> are equal w.r.t all other sort keys.
>
> Now that the string list is populated by default, '--no-sort' is the only
> way to empty the 'sorting_options' string list. Update
> 'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
> string list is empty, and add a condition to 'ref_array_sort()' to skip the
> sort altogether if the sort structure is NULL. Note that other functions
> using 'struct ref_sorting *' do not need any changes because they already
> ignore NULL values.

Nice.

> Finally, remove the condition around sorting in 'ls-remote', since it's no
> longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
> sorting by default. This default is preserved by simply leaving its sort key
> string list empty before parsing options; if no additional sort keys are
> set, 'struct ref_sorting *' is NULL and sorting is skipped.

Doubly nice.

> diff --git a/ref-filter.c b/ref-filter.c
> index e4d3510e28e..7250089b7c6 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3142,7 +3142,8 @@ void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting,
>  
>  void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
>  {
> -	QSORT_S(array->items, array->nr, compare_refs, sorting);
> +	if (sorting)
> +		QSORT_S(array->items, array->nr, compare_refs, sorting);
>  }

Nice.  We do allow passing NULL to ref_sorting_release(), and we can
return NULL from ref_sorting_options(), and allowing NULL to be
passed to this function makes it easier for the callers to deal with
the case where no sorting is specified.

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 3182abde27f..9918ba05dec 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1570,9 +1570,10 @@ test_expect_success 'tracking with unexpected .fetch refspec' '
>  
>  test_expect_success 'configured committerdate sort' '
>  	git init -b main sort &&
> +	test_config -C sort branch.sort "committerdate" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort committerdate &&
>  		test_commit initial &&
>  		git checkout -b a &&
>  		test_commit a &&
> @@ -1592,9 +1593,10 @@ test_expect_success 'configured committerdate sort' '
>  '
>  
>  test_expect_success 'option override configured sort' '
> +	test_config -C sort branch.sort "committerdate" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort committerdate &&
>  		git branch --sort=refname >actual &&
>  		cat >expect <<-\EOF &&
>  		  a

The above two are not strictly necessary for the purpose of this
patch, in that the tests that come after these tests do not care if
the branch.sort configuration variable is set in the "sort"
repository, as they set their own value before doing their test.

But of course, cleaning up after yourself with test_config and
friends is a good idea regardless, and a handful of new tests added
after this point follow the same pattern.  Good.

> @@ -1606,10 +1608,70 @@ test_expect_success 'option override configured sort' '
>  	)
>  '
>  
> +test_expect_success '--no-sort cancels config sort keys' '
> +	test_config -C sort branch.sort "-refname" &&
> +
> +	(
> +		cd sort &&
> +
> +		# objecttype is identical for all of them, so sort falls back on
> +		# default (ascending refname)

Interesting.

> +		git branch \
> +			--no-sort \
> +			--sort="objecttype" >actual &&
> +		cat >expect <<-\EOF &&
> +		  a
> +		* b
> +		  c
> +		  main
> +		EOF
> +		test_cmp expect actual
> +	)
> +
> +'
> +
> +test_expect_success '--no-sort cancels command line sort keys' '
> +	(
> +		cd sort &&
> +
> +		# objecttype is identical for all of them, so sort falls back on
> +		# default (ascending refname)
> +		git branch \
> +			--sort="-refname" \
> +			--no-sort \
> +			--sort="objecttype" >actual &&

OK, this exercises the same "--no-sort cleans the slate" as before,
and for this one it is essential that we lack branch.sort after the
previous step is done, which is ensured thanks to the use of
test_config in the previous one.  Nice.

> +		cat >expect <<-\EOF &&
> +		  a
> +		* b
> +		  c
> +		  main
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success '--no-sort without subsequent --sort prints expected branches' '
> +	(
> +		cd sort &&
> +
> +		# Sort the results with `sort` for a consistent comparison
> +		# against expected
> +		git branch --no-sort | sort >actual &&
> +		cat >expect <<-\EOF &&
> +		  a
> +		  c
> +		  main
> +		* b
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_expect_success 'invalid sort parameter in configuration' '
> +	test_config -C sort branch.sort "v:notvalid" &&
> +
>  	(
>  		cd sort &&
> -		git config branch.sort "v:notvalid" &&
>  
>  		# this works in the "listing" mode, so bad sort key
>  		# is a dying offence.

With such an invalid configuration value set, running the command
with "--no-sort" would stop the command from failing?  Is that worth
protecting with a new test, I wonder.

Overall very nicely done.

Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only
  2023-11-14 19:53   ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
@ 2023-11-16  5:39     ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-16  5:39 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, Patrick Steinhardt, Øystein Walle,
	Kristoffer Haugsbakk, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This consolidates much of the code used to filter and format refs in
> 'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing
> duplication and simplifying the future changes needed to optimize the filter
> & format process.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/branch.c       | 33 +++++++++++++++++----------------
>  builtin/for-each-ref.c | 27 +--------------------------
>  builtin/tag.c          | 23 +----------------------
>  ref-filter.c           | 35 +++++++++++++++++++++++++++++++++++
>  ref-filter.h           | 14 ++++++++++++++
>  5 files changed, 68 insertions(+), 64 deletions(-)

The amount of existing duplication of code is rather surprising, and
this patch nicely refactors to improve.  Good.

Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields
  2023-11-14 19:53   ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
@ 2023-11-16  5:48     ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-11-16  5:48 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, Patrick Steinhardt, Øystein Walle,
	Kristoffer Haugsbakk, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
> "dereferencing" a tag refers to a recursive peel of the tag object. Unlike
> these cases, the dereferencing prefix ('*') in 'for-each-ref' format
> specifiers triggers only a single, non-recursive dereference of a given tag
> object. For most annotated tags, a single dereference is all that is needed
> to access the tag's associated commit or tree; "recursive" and
> "non-recursive" dereferencing are functionally equivalent in these cases.
> However, nested tags (annotated tags whose target is another annotated tag)
> dereferenced once return another tag, where a recursive dereference would
> return the commit or tree.

This may be the only potentially controversial step in the series.

> -	/*
> -	 * NEEDSWORK: This derefs tag only once, which
> -	 * is good to deal with chains of trust, but
> -	 * is not consistent with what deref_tag() does
> -	 * which peels the onion to the core.
> -	 */
>  	return get_object(ref, 1, &obj, &oi_deref, err);
>  }

Very nice to see an ancient comment I added at 9f613ddd (Add
git-for-each-ref: helper for language bindings, 2006-09-15) finally
go.

Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format
  2023-11-14 19:53   ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
@ 2023-11-16 12:06     ` Øystein Walle
  0 siblings, 0 replies; 49+ messages in thread
From: Øystein Walle @ 2023-11-16 12:06 UTC (permalink / raw)
  To: gitgitgadget; +Cc: code, git, oystwa, ps, vdye

Victoria Dye <vdye@github.com> writes:

> diff --git a/ref-filter.h b/ref-filter.h
> index 1524bc463a5..d87d61238b7 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -92,6 +92,11 @@ struct ref_format {
>  
>  	/* List of bases for ahead-behind counts. */
>  	struct string_list bases;
> +
> +	struct {
> +		int max_count;
> +		int omit_empty;
> +	} array_opts;
>  };

What the benefit of having them in a nested struct is compared to just
two distinct members?

Regardless this is the kind of deduplication I wanted to achieve when I
added --omit-empty, but never did. Either way, I meant to ack this in
the last round never got around to it. Nice work.

Acked-by: Øystein Walle <oystwa@gmail.com>

Øsse

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2023-11-16 12:06 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-07  1:25 [PATCH 0/9] for-each-ref optimizations & usability improvements Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 1/9] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 18:13     ` Victoria Dye
2023-11-07  1:25 ` [PATCH 2/9] for-each-ref: clarify interaction of --omit-empty & --count Victoria Dye via GitGitGadget
2023-11-07 19:23   ` Øystein Walle
2023-11-07 19:30     ` Victoria Dye
2023-11-08  7:53       ` Øystein Walle
2023-11-08 10:00         ` Kristoffer Haugsbakk
2023-11-07  1:25 ` [PATCH 3/9] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 4/9] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07  1:25 ` [PATCH 5/9] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-07  1:25 ` [PATCH 6/9] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 18:41     ` Victoria Dye
2023-11-07  1:25 ` [PATCH 7/9] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-07 10:49   ` Patrick Steinhardt
2023-11-07 19:45     ` Victoria Dye
2023-11-07  1:26 ` [PATCH 8/9] for-each-ref: add option to fully dereference tags Victoria Dye via GitGitGadget
2023-11-07 10:50   ` Patrick Steinhardt
2023-11-08  1:13     ` Victoria Dye
2023-11-08  3:14       ` Junio C Hamano
2023-11-08  7:19         ` Patrick Steinhardt
2023-11-08 18:02         ` Victoria Dye
2023-11-09  1:22           ` Junio C Hamano
2023-11-09  1:23           ` Junio C Hamano
2023-11-09  1:32             ` Junio C Hamano
2023-11-07  1:26 ` [PATCH 9/9] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget
2023-11-07  2:36 ` [PATCH 0/9] for-each-ref optimizations & usability improvements Junio C Hamano
2023-11-07  2:48   ` Victoria Dye
2023-11-07  3:04     ` Junio C Hamano
2023-11-07 10:49     ` Patrick Steinhardt
2023-11-08  1:31       ` Victoria Dye
2023-11-14 19:53 ` [PATCH v2 00/10] " Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 01/10] ref-filter.c: really don't sort when using --no-sort Victoria Dye via GitGitGadget
2023-11-16  5:29     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 02/10] ref-filter.h: add max_count and omit_empty to ref_format Victoria Dye via GitGitGadget
2023-11-16 12:06     ` Øystein Walle
2023-11-14 19:53   ` [PATCH v2 03/10] ref-filter.h: move contains caches into filter Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 04/10] ref-filter.h: add functions for filter/format & format-only Victoria Dye via GitGitGadget
2023-11-16  5:39     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 05/10] ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 06/10] ref-filter.c: refactor to create common helper functions Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 07/10] ref-filter.c: filter & format refs in the same callback Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 08/10] for-each-ref: clean up documentation of --format Victoria Dye via GitGitGadget
2023-11-14 19:53   ` [PATCH v2 09/10] ref-filter.c: use peeled tag for '*' format fields Victoria Dye via GitGitGadget
2023-11-16  5:48     ` Junio C Hamano
2023-11-14 19:53   ` [PATCH v2 10/10] t/perf: add perf tests for for-each-ref Victoria Dye via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).