Re: [PATCH 4/5] config: return an empty list, not NULL

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 4/5] config: return an empty list, not NULL
Date: Wed, 28 Sep 2022 16:37:55 +0200	[thread overview]
Message-ID: <220928.86y1u3wnaz.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <26b3c9ef-5dd7-18f2-89c4-8d210a409ce4@github.com>

On Wed, Sep 28 2022, Derrick Stolee wrote:

> On 9/27/22 3:18 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Sep 27 2022, Derrick Stolee wrote:
>> 
>>> On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>>>
>>>>>  /**
>>>>>   * Finds and returns the value list, sorted in order of increasing priority
>>>>>   * for the configuration variable `key`. When the configuration variable
>>>>> - * `key` is not found, returns NULL. The caller should not free or modify
>>>>> - * the returned pointer, as it is owned by the cache.
>>>>> + * `key` is not found, returns an empty list. The caller should not free or
>>>>> + * modify the returned pointer, as it is owned by the cache.
>>>>>   */
>>>>>  const struct string_list *git_config_get_value_multi(const char *key);
>>>>
>>>> Aside from the "DWIM API" aspect of this (which I don't mind) I think
>>>> this is really taking the low-level function in the wrong direction, and
>>>> that we should just add a new simple wrapper instead.
>>>>
>>>> I.e. both the pre-image API docs & this series gloss over the fact that
>>>> we'd not just return NULL here if the config wasn't there, but also if
>>>> git_config_parse_key() failed.
>>>>
>>>> So it seems to me that a better direction would be starting with
>>>> something like the WIP below (which doesn't compile the whole code, I
>>>> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
>>>> and "dest" pattern that most other things in the config API have.
>>>
>>> Do you have an example where a caller would benefit from this
>>> distinction? Without such an example, I don't think it is worth
>>> creating such a huge change for purity's sake alone.
>> 
>> Not initially, I started poking at this because the CL/series/commits
>> says that we don't care about the case of non-existing keys, without
>> being clear as to why we want to conflate that with other errors we
>> might get from this API.
>> 
>> But after some digging I found:
>> 
>> 	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
>> 	error: key does not contain a section: a
>> 	0
>> 	error: key does not contain variable name: a.b.
>> 	0
>> 	error: invalid key: 'x.y
>> 	0
>> 	
>> I.e. the repo_config_get_value_multi() you added in for-each-repo
>> doesn't distinguish between bad keys and non-existing keys, and returns
>> 0 even though it printed an "error".
>
> I can understand wanting to inform the user that they provided an
> invalid key using a nonzero exit code. I can also understand that
> the command does what is asked: it did nothing because the given
> key has no values (because it can't). I think the use of an "error"
> message balances things towards wanting a nonzero exit code.

Right, to be clear I think 6c62f015520 (for-each-repo: do nothing on
empty config, 2021-01-08) is sensible, i.e. we want to return 0 on a
non-existing key.

We just shouldn't conflate that with e.g. these parse errors, which the
API squashing the underlying negative return values and the "NULL list"
imposes on the user.

>>> I'm pretty happy that the diff for this series is an overall
>>> reduction in code, while also not being too large in the interim:
>>>
>>>  12 files changed, 39 insertions(+), 57 deletions(-)
>>>
>>> If all callers that use the *_multi() methods would only use the
>>> wrapper, then what is the point of doing the low-level manipulations?
>> 
>> I hacked up something that's at least RFC-quality based on this
>> approach, but CI is running etc., so not submitting it
>> now:
>> 
>> 	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern
>> 
>> I think the resulting diff is more idiomatic API use, i.e. you ended up
>> with:
>> 
>> 	        /* submodule.active is set */
>> 	        sl = repo_config_get_value_multi(repo, "submodule.active");
>> 	-       if (sl) {
>> 	+       if (sl && sl->nr) {
>
> You're right that I forgot to change this one to "if (sl->nr)"
> in patch 5.

If I am I didn't mean to point that out, I ws just pointing out the
end-API use. I.e. int return value v.s. the "populate dest" pattern, but
yes, in your end-state you'd drop the "sl &&" part.

>> But I ended up doing:
>> 
>> 	        /* submodule.active is set */
>> 	-       sl = repo_config_get_value_multi(repo, "submodule.active");
>> 	-       if (sl) {
>> 	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
>> 
>> Note the "const" in the function name, i.e. there's wrappers that handle
>> the case where we have a hardcoded key name, in which case we can BUG()
>> out if we'd return < 0, so all we have left is just "does key exist".
>
> The problem here is that the block actually cares that the list is non-empty
> and should not run if the list is empty. In that case, you would need to add
> "&& sl->nr" to the condition.
>
> I'm of course assuming that an empty list is different from an error. In
> your for-each-repo example, we would not want to return a non-zero exit
> code on an empty list, only on a bad key (or other I/O problem).
>
> If we return a negative value on an error and the number of matches on
> success, then this change could instead be "if (repo_config....() > 0)".

Hrm, I think you're confusing the worldview your series here is
advocating for, and what I'm suggesting as an alternative.

There isn't any way on "master" to have "an empty list", that's a
worldview you're proposing. In particular your 1/5 here removes:

	assert(values->nr > 0);

More generally the config format has no notion of "an empty list", if
you have a valid key-value pair at all you have a list of ".nr >= 1".

The "empty list" is a construct you're introducing in this series,
because you wanted the idiom of passing things to
for_each_string_list_item.

I'm advocating for not going that route, and instead make the *_multi()
method like the rest of the config API. I.e. to use the "return int,
populate dest" pattern.

It's fine if we disagree, but I get the sense that it's not clear what
we're disagreeing *on* :)

>> In any case, I'm all for having some simple wrapper for the common cases
> A simple wrapper would be nice, and be exactly the method as it is
> updated in this series. The error-result version could be adopted when
> there is reason to do so.

Well, no :) We ended up with two different "simple wrapper[s]", mine
doesn't have this notion of a "struct string_list *list" with .nr == 0.

>> But I didn't find a single case where we actually needed this "never
>> give me a non-NULL list" behavior, it could just be generalized to
>> "let's have the API tell us if the key exist".
>
> Most cases want to feed the result into the for_each_string_list_item()
> macro. Based on the changes in patch 5, I think the empty list is a
> better pattern and leads to prettier code in almost all cases.

I updated the WIP RFC series I linked to upthread a bit since my initial
reply (the link is still good, I force-pushed), I then rebased your
series here on "master", below is a diff of some select files.

The overall diff is much bigger obviously (API changes and all), but the
below demonstrates some of the API changes (yours is "-", mine is
"+"). I've commented inline on some of it:

	diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
	index 635ea5e15fd..16e9a76d04a 100644
	--- a/builtin/for-each-repo.c
	+++ b/builtin/for-each-repo.c
	@@ -29,6 +29,7 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	static const char *config_key = NULL;
	 	int i, result = 0;
	 	const struct string_list *values;
	+	int err;

	 	const struct option options[] = {
	 		OPT_STRING(0, "config", &config_key, N_("config"),
	@@ -42,8 +43,13 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
	 	if (!config_key)
	 		die(_("missing --config=<config>"));

	-	values = repo_config_get_value_multi(the_repository,
	-					     config_key);
	+	err = repo_config_get_value_multi(the_repository, config_key, &values);
	+	if (err < 0)
	+		usage_msg_optf(_("got bad config --config=%s"),
	+			       for_each_repo_usage, options, config_key);
	+	else if (err)
	+		return 0;
	+
	 	for (i = 0; !result && i < values->nr; i++)
	 		result = run_command_on_repo(values->items[i].string, argc, argv);

Here we're relying an error to the user that we couldn't before, because
repo_config_get_value_multi() would return "NULL" for both "key is bad"
and "key doesn't exist". There's a corresponding test modification
below.

	diff --git a/builtin/gc.c b/builtin/gc.c
	index 1e9ac2ac7e3..94b77a88a99 100644
	--- a/builtin/gc.c
	+++ b/builtin/gc.c
	@@ -1472,9 +1472,7 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	};
	 	int found = 0;
	 	const char *key = "maintenance.repo";
	-	char *config_value;
	 	char *maintpath = get_maintpath();
	-	struct string_list_item *item;
	 	const struct string_list *list;

	 	argc = parse_options(argc, argv, prefix, options,
	@@ -1487,18 +1485,11 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
	 	git_config_set("maintenance.auto", "false");

	 	/* Set maintenance strategy, if unset */
	-	if (!git_config_get_string("maintenance.strategy", &config_value))
	-		free(config_value);
	-	else
	+	if (git_config_lookup_value("maintenance.strategy"))
	 		git_config_set("maintenance.strategy", "incremental");

In looking at this I thought we were way overdue for a "does this key
exist?" helper, this and a few other API users use it.

	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

So, it turns out that the initial reason you wanted the "pass NULL to
for_each_string_list_item" is actually something we can do with
unsorted_string_list_has_string(), which implements the same loop.

The difference here is *the* API difference we're discussing. I.e. we'll
never get a NULL "list", we'll instead always get a non-NULL list with
>= 1 item if we can get this key at all.

The "const value" helper is a wrapper that handles the "err < 0" case. I
cases where we hardcode the key it's a BUG() if we get "err < 0". The
wrapper is just:

	int err = git_configset_get_value_multi(cs, key, dest);
        if (err < 0)
		BUG("failed to parse constant key '%s'!", key);
	return err;

	[...]
	@@ -1547,13 +1537,8 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
	 		usage_with_options(builtin_maintenance_unregister_usage,
	 				   options);

	-	list = git_config_get_value_multi(key);
	-	for_each_string_list_item(item, list) {
	-		if (!strcmp(maintpath, item->string)) {
	-			found = 1;
	-			break;
	-		}
	-	}
	+	if (!git_config_get_const_value_multi(key, &list))
	+		found = unsorted_string_list_has_string(list, maintpath);

Ditto the same git_config_get_const_value_multi() &
unsorted_string_list_has_string() pattern.

	 	if (found) {
	 		int rc;
	diff --git a/builtin/log.c b/builtin/log.c
	index 719ef966045..bdb87f6c42b 100644
	--- a/builtin/log.c
	+++ b/builtin/log.c
	@@ -182,13 +182,15 @@ static void set_default_decoration_filter(struct decoration_filter *decoration_f
	 	int i;
	 	char *value = NULL;
	 	struct string_list *include = decoration_filter->include_ref_pattern;
	-	struct string_list_item *item;
	-	const struct string_list *config_exclude =
	-			git_config_get_value_multi("log.excludeDecoration");
	+	const struct string_list *config_exclude;

	-	for_each_string_list_item(item, config_exclude)
	-		string_list_append(decoration_filter->exclude_ref_config_pattern,
	-				   item->string);
	+	if (!git_config_get_const_value_multi("log.excludeDecoration",
	+					      &config_exclude)) {
	+		struct string_list_item *item;
	+		for_each_string_list_item(item, config_exclude)
	+			string_list_append(decoration_filter->exclude_ref_config_pattern,
	+					   item->string);
	+	}

Here's a case where we need to use for_each_string_list_item(), I think
it's nice how we can now scope the "item" variable.

	 	/*
	 	 * By default, decorate_all is disabled. Enable it if
	diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
	index 5a8b6120157..b758255f816 100644
	--- a/builtin/submodule--helper.c
	+++ b/builtin/submodule--helper.c
	@@ -552,7 +552,7 @@ static int module_init(int argc, const char **argv, const char *prefix)
	 	 * If there are no path args and submodule.active is set then,
	 	 * by default, only initialize 'active' modules.
	 	 */
	-	if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+	if (!argc && !git_config_lookup_value("submodule.active"))
	 		module_list_active(&list);

You changed these in your 2/5, but they really just wanted the new "does
this key exist?" API. No need to construct the string_list just to throw
it away...

	 	info.prefix = prefix;
	@@ -2720,7 +2720,7 @@ static int module_update(int argc, const char **argv, const char *prefix)
	 		 * If there are no path args and submodule.active is set then,
	 		 * by default, only initialize 'active' modules.
	 		 */
	-		if (!argc && git_config_get_value_multi("submodule.active")->nr)
	+		if (!argc && !git_config_lookup_value("submodule.active"))
	 			module_list_active(&list);

Ditto.

	 		info.prefix = opt.prefix;
	@@ -3164,7 +3164,6 @@ static int config_submodule_in_gitmodules(const char *name, const char *var, con
	 static void configure_added_submodule(struct add_data *add_data)
	 {
	 	char *key;
	-	const char *val;
	 	struct child_process add_submod = CHILD_PROCESS_INIT;
	 	struct child_process add_gitmodules = CHILD_PROCESS_INIT;

	@@ -3209,7 +3208,7 @@ static void configure_added_submodule(struct add_data *add_data)
	 	 * is_submodule_active(), since that function needs to find
	 	 * out the value of "submodule.active" again anyway.
	 	 */
	-	if (!git_config_get_string_tmp("submodule.active", &val)) {
	+	if (!git_config_lookup_value("submodule.active")) {
	 		/*
	 		 * If the submodule being added isn't already covered by the
	 		 * current configured pathspec, set the submodule's active flag

Ditto.

	diff --git a/submodule.c b/submodule.c
	index 06230961c80..4474cf9ed2d 100644
	--- a/submodule.c
	+++ b/submodule.c
	@@ -274,8 +274,7 @@ int is_tree_submodule_active(struct repository *repo,
	 	free(key);

	 	/* submodule.active is set */
	-	sl = repo_config_get_value_multi(repo, "submodule.active");
	-	if (sl && sl->nr) {
	+	if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
	 		struct pathspec ps;
	 		struct strvec args = STRVEC_INIT;
	 		const struct string_list_item *item;

Another "*the* API difference we're discussing". I.e. sure, your end
state would be "if (sl->nr)", but if we make it return "int"...

	diff --git a/t/helper/test-config.c b/t/helper/test-config.c
	index 90810946783..432ad047537 100644
	--- a/t/helper/test-config.c
	+++ b/t/helper/test-config.c
	@@ -95,8 +95,7 @@ int cmd__config(int argc, const char **argv)
	 			goto exit1;
	 		}
	 	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
	-		strptr = git_config_get_value_multi(argv[2]);
	-		if (strptr->nr) {
	+		if (!git_config_get_const_value_multi(argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, (this one converts away from your preferred API use).

	@@ -159,8 +158,7 @@ int cmd__config(int argc, const char **argv)
	 				goto exit2;
	 			}
	 		}
	-		strptr = git_configset_get_value_multi(&cs, argv[2]);
	-		if (strptr && strptr->nr) {
	+		if (!git_configset_get_const_value_multi(&cs, argv[2], &strptr)) {
	 			for (i = 0; i < strptr->nr; i++) {
	 				v = strptr->items[i].string;
	 				if (!v)

Ditto, sans that you'd presumably want s/strptr && // here.

	diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
	index 4675e852517..115221c9ca5 100755
	--- a/t/t0068-for-each-repo.sh
	+++ b/t/t0068-for-each-repo.sh
	@@ -33,4 +33,10 @@ test_expect_success 'do nothing on empty config' '
	 	git for-each-repo --config=bogus.config -- help --no-such-option
	 '

	+test_expect_success 'error on bad config keys' '
	+	test_expect_code 129 git for-each-repo --config=a &&
	+	test_expect_code 129 git for-each-repo --config=a.b. &&
	+	test_expect_code 129 git for-each-repo --config="'\''.b"
	+'
	+
	 test_done

A test showing behavior change we can implement now that we don't sweep
the "err < 0" under the rug.

That branch also grew to have some other changes we may or may not want,
one thing was to convert the various *_get_*() functionts that now
normalize the non-zero return value with e.g.:

	 int git_configset_get_int(struct config_set *cs, const char *key, int *dest)
	 {
	        const char *value;
	-       if (!git_configset_get_value(cs, key, &value)) {
	-               *dest = git_config_int(key, value);
	-               return 0;
	-       } else
	-               return 1;
	+       int err;
	+
	+       if ((err = git_configset_get_value(cs, key, &value)))
	+               return err;
	+       *dest = git_config_int(key, value);
	+       return 0;
	 }

No caller currently cares about it, but I think it makes sense generally
not to throw away errors if we can (whether that part is worth the churn
is another topic).

Anyway, the reason I started looking at this RFC to begin with was
because this *_multi() part of the config API has often seemed odd to
me, i.e. I wondered why we couldn't just have it use the "return int,
populate dest" pattern. I'd just never tried to see if I could get that
to work.

It's a bit of one-off churn to get to this point, but I think the end
result of having all the API functions act the same way to signal key
existence v.s. validity is worth it.