All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, me@ttaylorr.com,
	vdye@github.com, Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v2 1/8] for-each-ref: add --stdin option
Date: Wed, 15 Mar 2023 14:37:39 +0100	[thread overview]
Message-ID: <230315.86h6umxh7c.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <a1d9e0f6ff6660c9264673be18bc24956f74eb9c.1678468864.git.gitgitgadget@gmail.com>


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When a user wishes to input a large list of patterns to 'git
> for-each-ref' (likely a long list of exact refs) there are frequently
> system limits on the number of command-line arguments.

Okey, and the current API assumes you just assign "argv" to this, but...

> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

..sounds potentially nasty...

> @@ -75,7 +77,27 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>  	filter.ignore_case = icase;
>  
> -	filter.name_patterns = argv;
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +		size_t nr = 0, alloc = 16;
> +
> +		if (argv[0])
> +			die(_("unknown arguments supplied with --stdin"));
> +
> +		CALLOC_ARRAY(filter.name_patterns, alloc);
> +
> +		while (strbuf_getline(&line, stdin) != EOF) {
> +			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
> +		}
> +
> +		/* Add a terminating NULL string. */
> +		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +		filter.name_patterns[nr + 1] = NULL;
> +	} else {
> +		filter.name_patterns = argv;
> +	}
> +
>  	filter.match_as_path = 1;
>  	filter_refs(&array, &filter, FILTER_REFS_ALL);
>  	ref_array_sort(sorting, &array);
> @@ -97,5 +119,10 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	free_commit_list(filter.with_commit);
>  	free_commit_list(filter.no_commit);
>  	ref_sorting_release(sorting);
> +	if (from_stdin) {
> +		for (size_t i = 0; filter.name_patterns[i]; i++)
> +			free((char *)filter.name_patterns[i]);
> +		free(filter.name_patterns);
> +	}
>  	return 0;
>  }

Why do we need to seemingly re-invent a "struct strvec" here? I tried to
simplify this on top of this (well, "seen"), and we can get rid of all
of this manual memory management & trailing NULL juggling as a result:
	
	diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
	index cf5ba6ffc12..13b75eff28c 100644
	--- a/builtin/for-each-ref.c
	+++ b/builtin/for-each-ref.c
	@@ -7,6 +7,7 @@
	 #include "parse-options.h"
	 #include "ref-filter.h"
	 #include "commit-reach.h"
	+#include "strvec.h"
	 
	 static char const * const for_each_ref_usage[] = {
	 	N_("git for-each-ref [<options>] [<pattern>]"),
	@@ -27,6 +28,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 	struct ref_format format = REF_FORMAT_INIT;
	 	struct strbuf output = STRBUF_INIT;
	 	struct strbuf err = STRBUF_INIT;
	+	struct strvec stdin_pat = STRVEC_INIT;
	 	int from_stdin = 0;
	 
	 	struct option opts[] = {
	@@ -81,21 +83,13 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 
	 	if (from_stdin) {
	 		struct strbuf line = STRBUF_INIT;
	-		size_t nr = 0, alloc = 16;
	 
	 		if (argv[0])
	 			die(_("unknown arguments supplied with --stdin"));
	 
	-		CALLOC_ARRAY(filter.name_patterns, alloc);
	-
	-		while (strbuf_getline(&line, stdin) != EOF) {
	-			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
	-			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
	-		}
	-
	-		/* Add a terminating NULL string. */
	-		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
	-		filter.name_patterns[nr + 1] = NULL;
	+		while (strbuf_getline(&line, stdin) != EOF)
	+			strvec_push(&stdin_pat, line.buf);
	+		filter.name_patterns = stdin_pat.v;
	 	} else {
	 		filter.name_patterns = argv;
	 	}
	@@ -123,10 +117,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 	free_commit_list(filter.with_commit);
	 	free_commit_list(filter.no_commit);
	 	ref_sorting_release(sorting);
	-	if (from_stdin) {
	-		for (size_t i = 0; filter.name_patterns[i]; i++)
	-			free(filter.name_patterns[i]);
	-		free(filter.name_patterns);
	-	}
	+	strvec_clear(&stdin_pat);
	 	return 0;
	 }

It *is* an extra copy though, as your implementation re-uses the strbuf
we already allocated.

But presumably that's trivial in this case, and if we care I think we
should resurrect something like [1] instead, i.e. we could just teach
the strvec API to have a strvec_push_nodup(). But I doubt that in this
case it'll matter.

In any case, if you don't want to take this as-is, please fix this so
that we're not reaching into the "filter.name_patterns" and casting its
"const char" to "char".

If we're going to add a hack here that API should instead know how to
free its own resources (so we could clean up the free_commit_list() here
seen in the context), and we could carry some "my argv needs free-ing".

But none of that seems needed in this case, this is just another case
where we can pretend that we have a "normal" argv, and then clean up our
own strvec, no?

1. https://lore.kernel.org/git/65a620b08ef359e29d678497f1b529e3ce6477b1.1673475190.git.gitgitgadget@gmail.com/

  parent reply	other threads:[~2023-03-15 13:45 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
2023-03-06 18:48   ` Junio C Hamano
2023-03-07  0:40     ` Taylor Blau
2023-03-08 22:14       ` Derrick Stolee
2023-03-08 22:56         ` Junio C Hamano
2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
2023-03-07  0:43   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
2023-03-07  0:46   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-06 18:52   ` Junio C Hamano
2023-03-07  0:50     ` Taylor Blau
2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-07  1:05   ` Taylor Blau
2023-03-09 17:32     ` Derrick Stolee
2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
2023-03-06 20:18   ` Derrick Stolee
2023-03-06 22:24     ` Junio C Hamano
2023-03-07  0:36   ` Taylor Blau
2023-03-09  9:20     ` Jeff King
2023-03-09 21:51       ` Junio C Hamano
2023-03-07  0:33 ` Taylor Blau
2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-10 18:08     ` Junio C Hamano
2023-03-13 10:31     ` Phillip Wood
2023-03-13 13:33       ` Derrick Stolee
2023-03-13 21:10         ` Taylor Blau
2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason [this message]
2023-03-15 17:17       ` Jeff King
2023-03-15 17:49     ` Jeff King
2023-03-15 19:24       ` Junio C Hamano
2023-03-15 19:44         ` Jeff King
2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:03       ` Junio C Hamano
2023-03-15 16:13         ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-10 19:09     ` Junio C Hamano
2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:01       ` Junio C Hamano
2023-03-15 16:12         ` Derrick Stolee
2023-03-15 16:11       ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:17       ` Derrick Stolee
2023-03-15 16:18         ` Derrick Stolee
2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
2023-03-10 19:25     ` Derrick Stolee
2023-03-15 17:31       ` Jeff King
2023-03-15 17:44         ` Derrick Stolee
2023-03-15 19:34         ` Junio C Hamano
2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
2023-03-15 13:54     ` Derrick Stolee
2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-15 18:06       ` Jeff King
2023-03-15 19:14         ` Junio C Hamano
2023-03-15 22:41       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-15 22:49       ` Jonathan Tan
2023-03-17 18:30         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-15 22:58       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 23:28       ` Jonathan Tan
2023-03-17 18:44         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-20 20:40         ` Jonathan Tan
2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=230315.86h6umxh7c.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.