* [PATCH 0/2] config-parse: create config parsing library
@ 2023-07-20 22:17 Glen Choo via GitGitGadget
  2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Glen Choo via GitGitGadget @ 2023-07-20 22:17 UTC
  To: git; +Cc: Jonathan Tan, Calvin Wan, Glen Choo

Config parsing no longer uses global state as of gc/config-context, so the
natural next step for libification is to turn that into its own library.
This series starts that process by moving config parsing into
config-parse.[c|h] so that other programs can include this functionality
without pulling in all of config.[c|h].

To use config-parse.[c|h], an external caller has to obey our convention of
using "#include git-compat-util.h" at the start of the .c file. This is
doable by including the in-tree git-compat-util.h and linking against the
libgit.a Make target (which is admittedly cumbersome), and we've verified
this by compiling and linking to the library using the Google-internal
version of Bazel.

This series is not meant to distract from Calvin's git-std-lib series [1].
In fact, the two are complementary: git-std-lib will make it possible for
external callers to compile a smaller subset of files in order to use a
library. Doing this for config-parse will make config-parse easier to use,
while testing that git-std-lib does what we want it to.

I considered calling the library config-ll (like we do in other parts of the
codebase) instead of config-parse, with the intention of adding more "low
level" config code to it in the future. A benefit to that is that by having
fewer modules, dependency management is easier to reason about. However, I
struggled to think of what other config code could be considered "low level"
but doesn't make sense as its own module. (E.g. struct config_set is a low
level implementation detail, but I think it's well-scoped enough to be its
own config-set module.) I'd appreciate suggestions on how the config
libraries could be organized.

[1]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/

Glen Choo (2):
  config: return positive from git_config_parse_key()
  config-parse: split library out of config.[c|h]

 Makefile           |   1 +
 builtin/config.c   |   3 +-
 config-parse.c     | 611 +++++++++++++++++++++++++++++++++++++++++++
 config-parse.h     | 182 +++++++++++++
 config.c           | 636 +--------------------------------------------
 config.h           | 146 +----------
 submodule-config.c |   4 +-
 t/t1300-config.sh  |  16 ++
 8 files changed, 816 insertions(+), 783 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h


base-commit: aa9166bcc0ba654fc21f198a30647ec087f733ed
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1551%2Fchooglen%2Fconfig%2Fparse-lib-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1551/chooglen/config/parse-lib-v1
Pull-Request: https://github.com/git/git/pull/1551
-- 
gitgitgadget


* [PATCH 1/2] config: return positive from git_config_parse_key()
  2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
@ 2023-07-20 22:17 ` Glen Choo via GitGitGadget
  2023-07-20 23:44   ` Jonathan Tan
  2023-07-21  4:32   ` Junio C Hamano
  2023-07-20 22:17 ` [PATCH 2/2] config-parse: split library out of config.[c|h] Glen Choo via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 49+ messages in thread
From: Glen Choo via GitGitGadget @ 2023-07-20 22:17 UTC
  To: git; +Cc: Jonathan Tan, Calvin Wan, Glen Choo, Glen Choo

From: Glen Choo <chooglen@google.com>

git_config_parse_key() returns #define-d error codes, but negated. This
negation is merely a convenience to other parts of config.c that don't
bother inspecting the return value before passing it along. But:

a) There's no good reason why those callers couldn't negate the value
   themselves.

b) In other callers, this value eventually gets fed to exit(3), and
   those callers need to sanitize the negative value (and they sometimes
   do so lossily, by overriding the return value with
   CONFIG_INVALID_KEY).

c) We want to move this function into a separate library, and returning
   only negative values no longer makes as much sense.

Change git_config_parse_key() to return positive values instead, and
adjust callers accordingly. Callers that sanitize the negative sign for
exit(3) now pass the return value opaquely, fixing a bug where "git
config <key with no section or name>" results in a different exit code
depending on whether we are setting or getting config. Callers that
wanted to pass along a negative value now negate the return value
themselves.

Signed-off-by: Glen Choo <chooglen@google.com>
---
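For readers skimming the diff, the convention this patch moves to can be
sketched in isolation. This is only an illustration, not the code in this
patch: every sketch_* name and both SKETCH_* macros are hypothetical
stand-ins, and the key validation is deliberately simplified (it ignores
subsections and does not copy or lowercase the key).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins that mirror the values #define-d in config.h. */
#define SKETCH_INVALID_KEY 1
#define SKETCH_NO_SECTION_OR_NAME 2

/*
 * Simplified stand-in for git_config_parse_key(): it only validates the
 * key and reports errors as positive codes.
 */
int sketch_parse_key(const char *key)
{
	const char *last_dot = strrchr(key, '.');
	size_t i;

	if (!last_dot || last_dot == key || !last_dot[1])
		return SKETCH_NO_SECTION_OR_NAME;

	/* Simplification: subsections get no special treatment here. */
	for (i = 0; key[i]; i++) {
		unsigned char c = key[i];
		if (c != '.' && c != '-' && !isalnum(c))
			return SKETCH_INVALID_KEY;
	}
	return 0;
}

/* A caller that feeds exit(3) passes the code through unchanged. */
int sketch_exit_code_for(const char *key)
{
	return sketch_parse_key(key);
}

/* A caller whose convention is "negative means error" negates it itself. */
int sketch_lookup(const char *key)
{
	int ret = -sketch_parse_key(key);
	if (ret)
		return ret;
	return 0; /* a real caller would go on to query its data here */
}
```

With this shape, a key such as "bad." yields 2 from the exit-code path and
-2 from the lookup path, with no sign flipping or lossy override at the
call sites that feed exit(3).
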
 builtin/config.c   |  3 +--
 config.c           | 16 ++++++++--------
 config.h           |  2 +-
 submodule-config.c |  4 ++--
 t/t1300-config.sh  | 16 ++++++++++++++++
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 1c75cbc43df..8a2840f0a8c 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
 			goto free_strings;
 		}
 	} else {
-		if (git_config_parse_key(key_, &key, NULL)) {
-			ret = CONFIG_INVALID_KEY;
+		if ((ret = git_config_parse_key(key_, &key, NULL))) {
 			goto free_strings;
 		}
 	}
diff --git a/config.c b/config.c
index 85c5f35132c..ca77ca17a47 100644
--- a/config.c
+++ b/config.c
@@ -534,8 +534,9 @@ static inline int iskeychar(int c)
  * Auxiliary function to sanity-check and split the key into the section
  * identifier and variable name.
  *
- * Returns 0 on success, -1 when there is an invalid character in the key and
- * -2 if there is no section name in the key.
+ * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
  *
  * store_key - pointer to char* which will hold a copy of the key with
  *             lowercase section and variable name
@@ -555,12 +556,12 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 	if (last_dot == NULL || last_dot == key) {
 		error(_("key does not contain a section: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	if (!last_dot[1]) {
 		error(_("key does not contain variable name: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	baselen = last_dot - key;
@@ -596,7 +597,7 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 out_free_ret_1:
 	FREE_AND_NULL(*store_key);
-	return -CONFIG_INVALID_KEY;
+	return CONFIG_INVALID_KEY;
 }
 
 static int config_parse_pair(const char *key, const char *value,
@@ -2346,7 +2347,7 @@ static int configset_find_element(struct config_set *set, const char *key,
 	 * `key` may come from the user, so normalize it before using it
 	 * for querying entries from the hashmap.
 	 */
-	ret = git_config_parse_key(key, &normalized_key, NULL);
+	ret = -git_config_parse_key(key, &normalized_key, NULL);
 	if (ret)
 		return ret;
 
@@ -3334,8 +3335,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	size_t contents_sz;
 	struct config_store_data store = CONFIG_STORE_INIT;
 
-	/* parse-key returns negative; flip the sign to feed exit(3) */
-	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
+	ret = git_config_parse_key(key, &store.key, &store.baselen);
 	if (ret)
 		goto out_free;
 
diff --git a/config.h b/config.h
index 6332d749047..40966cb6828 100644
--- a/config.h
+++ b/config.h
@@ -23,7 +23,7 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these negated: */
+/* git_config_parse_key() returns these: */
 #define CONFIG_INVALID_KEY 1
 #define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
diff --git a/submodule-config.c b/submodule-config.c
index b6908e295f4..2aafc7f9cbe 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -824,8 +824,8 @@ int print_config_from_gitmodules(struct repository *repo, const char *key)
 	char *store_key;
 
 	ret = git_config_parse_key(key, &store_key, NULL);
-	if (ret < 0)
-		return CONFIG_INVALID_KEY;
+	if (ret)
+		return ret;
 
 	config_from_gitmodules(config_print_callback, repo, store_key);
 
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 387d336c91f..3202b0f8843 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
 	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
 '
 
+# Exit codes
+test_expect_success '--get with bad key' '
+	# Also exits with 1 if the value is not found
+	test_expect_code 1 git config --get "bad.name\n" 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config --get "bad." 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
+test_expect_success 'set with bad key' '
+	test_expect_code 1 git config "bad.name\n" var 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config "bad." var 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 2/2] config-parse: split library out of config.[c|h]
  2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
  2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
@ 2023-07-20 22:17 ` Glen Choo via GitGitGadget
  2023-07-21  0:31   ` Jonathan Tan
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Glen Choo via GitGitGadget @ 2023-07-20 22:17 UTC
  To: git; +Cc: Jonathan Tan, Calvin Wan, Glen Choo, Glen Choo

From: Glen Choo <chooglen@google.com>

The config parsing machinery (besides "include" directives) is usable by
programs other than Git - it works with any file written in
Git config syntax (IOW it doesn't rely on 'core' Git features like a
repository), and as of the series ending at 6e8e7981eb (config: pass
source to config_parser_event_fn_t, 2023-06-28), it no longer relies on
global state. Thus, we can and should start turning it into a library
other programs can use.

Begin this process by splitting the config parsing code out of
config.[c|h] and into config-parse.[c|h]. Do not change interfaces or
function bodies, but tweak visibility and includes where appropriate,
namely:

- git_config_from_stdin() is now non-static so that it can be seen by
  config.c.

- "struct config_source" is now defined in the .h file so that it can be
  seen by config.c. And as a result, config-parse.h needs to "#include
  strbuf.h".

In theory, this makes it possible for in-tree files to decide whether
they need all of the config functionality or only config parsing,
and bring in the smallest bit of functionality needed. But for now,
there are no in-tree files that can swap "#include config.h" for
"#include config-parse.h". E.g. Bundle URIs would only need config
parsing to parse bundle lists, but bundle-uri.c uses other config.h
functionality like key parsing and reading repo settings.

The resulting library is usable, though using it is unergonomic,
e.g. the caller needs to "#include git-compat-util.h" and other
dependencies, and we don't have an easy way of linking in the required
objects. This isn't the end state we want for our libraries, but at
least we have _some_ library whose usability we can improve in future
series.

Signed-off-by: Glen Choo <chooglen@google.com>
---
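As background on why the parser no longer needs global state: all read
state lives in a per-source struct that is accessed through callbacks. The
standalone sketch below shows that shape under stated assumptions. The
sketch_* names are hypothetical, the struct is a pared-down analogue of
struct config_source with only an in-memory backend, and the cast to
unsigned char is a choice of this sketch rather than something taken from
the moved code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, pared-down analogue of struct config_source. */
struct sketch_source {
	const char *buf;
	size_t len;
	size_t pos;
	int eof;
	int linenr;
	int (*do_fgetc)(struct sketch_source *src);
	int (*do_ungetc)(int c, struct sketch_source *src);
};

int sketch_buf_fgetc(struct sketch_source *src)
{
	/* The cast keeps bytes >= 0x80 distinct from EOF. */
	if (src->pos < src->len)
		return (unsigned char)src->buf[src->pos++];
	return EOF;
}

int sketch_buf_ungetc(int c, struct sketch_source *src)
{
	if (src->pos > 0) {
		src->pos--;
		return c;
	}
	return EOF;
}

/* Normalizes CRLF to '\n' and reports EOF as one final '\n'. */
int sketch_next_char(struct sketch_source *src)
{
	int c = src->do_fgetc(src);

	if (c == '\r') {
		c = src->do_fgetc(src);
		if (c != '\n') {
			/* A lone '\r': push back what followed it. */
			if (c != EOF)
				src->do_ungetc(c, src);
			c = '\r';
		}
	}
	if (c == '\n')
		src->linenr++;
	if (c == EOF) {
		src->eof = 1;
		src->linenr++;
		c = '\n';
	}
	return c;
}

void sketch_source_init(struct sketch_source *src, const char *buf, size_t len)
{
	src->buf = buf;
	src->len = len;
	src->pos = 0;
	src->eof = 0;
	src->linenr = 1;
	src->do_fgetc = sketch_buf_fgetc;
	src->do_ungetc = sketch_buf_ungetc;
}
```

Because the reader only ever touches the struct it is handed, two sources
can be parsed independently of each other, and a FILE-backed source would
only need a different pair of callbacks.
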
 Makefile       |   1 +
 config-parse.c | 611 +++++++++++++++++++++++++++++++++++++++++++++++
 config-parse.h | 182 ++++++++++++++
 config.c       | 632 -------------------------------------------------
 config.h       | 146 +-----------
 5 files changed, 795 insertions(+), 777 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

diff --git a/Makefile b/Makefile
index fb541dedc9f..67e05bcee57 100644
--- a/Makefile
+++ b/Makefile
@@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += compat/zlib-uncompress2.o
 LIB_OBJS += config.o
+LIB_OBJS += config-parse.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/config-parse.c b/config-parse.c
new file mode 100644
index 00000000000..9f26ca42079
--- /dev/null
+++ b/config-parse.c
@@ -0,0 +1,611 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "gettext.h"
+#include "hashmap.h"
+#include "utf8.h"
+#include "config-parse.h"
+
+static int config_file_fgetc(struct config_source *conf)
+{
+	return getc_unlocked(conf->u.file);
+}
+
+static int config_file_ungetc(int c, struct config_source *conf)
+{
+	return ungetc(c, conf->u.file);
+}
+
+static long config_file_ftell(struct config_source *conf)
+{
+	return ftell(conf->u.file);
+}
+
+
+static int config_buf_fgetc(struct config_source *conf)
+{
+	if (conf->u.buf.pos < conf->u.buf.len)
+		return conf->u.buf.buf[conf->u.buf.pos++];
+
+	return EOF;
+}
+
+static int config_buf_ungetc(int c, struct config_source *conf)
+{
+	if (conf->u.buf.pos > 0) {
+		conf->u.buf.pos--;
+		if (conf->u.buf.buf[conf->u.buf.pos] != c)
+			BUG("config_buf can only ungetc the same character");
+		return c;
+	}
+
+	return EOF;
+}
+
+static long config_buf_ftell(struct config_source *conf)
+{
+	return conf->u.buf.pos;
+}
+
+static inline int iskeychar(int c)
+{
+	return isalnum(c) || c == '-';
+}
+
+/*
+ * Auxiliary function to sanity-check and split the key into the section
+ * identifier and variable name.
+ *
+ * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
+ *
+ * store_key - pointer to char* which will hold a copy of the key with
+ *             lowercase section and variable name
+ * baselen - pointer to size_t which will hold the length of the
+ *           section + subsection part, can be NULL
+ */
+int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
+{
+	size_t i, baselen;
+	int dot;
+	const char *last_dot = strrchr(key, '.');
+
+	/*
+	 * Since "key" actually contains the section name and the real
+	 * key name separated by a dot, we have to know where the dot is.
+	 */
+
+	if (last_dot == NULL || last_dot == key) {
+		error(_("key does not contain a section: %s"), key);
+		return CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	if (!last_dot[1]) {
+		error(_("key does not contain variable name: %s"), key);
+		return CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	baselen = last_dot - key;
+	if (baselen_)
+		*baselen_ = baselen;
+
+	/*
+	 * Validate the key and while at it, lower case it for matching.
+	 */
+	*store_key = xmallocz(strlen(key));
+
+	dot = 0;
+	for (i = 0; key[i]; i++) {
+		unsigned char c = key[i];
+		if (c == '.')
+			dot = 1;
+		/* Leave the extended basename untouched.. */
+		if (!dot || i > baselen) {
+			if (!iskeychar(c) ||
+			    (i == baselen + 1 && !isalpha(c))) {
+				error(_("invalid key: %s"), key);
+				goto out_free_ret_1;
+			}
+			c = tolower(c);
+		} else if (c == '\n') {
+			error(_("invalid key (newline): %s"), key);
+			goto out_free_ret_1;
+		}
+		(*store_key)[i] = c;
+	}
+
+	return 0;
+
+out_free_ret_1:
+	FREE_AND_NULL(*store_key);
+	return CONFIG_INVALID_KEY;
+}
+
+static int get_next_char(struct config_source *cs)
+{
+	int c = cs->do_fgetc(cs);
+
+	if (c == '\r') {
+		/* DOS like systems */
+		c = cs->do_fgetc(cs);
+		if (c != '\n') {
+			if (c != EOF)
+				cs->do_ungetc(c, cs);
+			c = '\r';
+		}
+	}
+
+	if (c != EOF && ++cs->total_len > INT_MAX) {
+		/*
+		 * This is an absurdly long config file; refuse to parse
+		 * further in order to protect downstream code from integer
+		 * overflows. Note that we can't return an error specifically,
+		 * but we can mark EOF and put trash in the return value,
+		 * which will trigger a parse error.
+		 */
+		cs->eof = 1;
+		return 0;
+	}
+
+	if (c == '\n')
+		cs->linenr++;
+	if (c == EOF) {
+		cs->eof = 1;
+		cs->linenr++;
+		c = '\n';
+	}
+	return c;
+}
+
+static char *parse_value(struct config_source *cs)
+{
+	int quote = 0, comment = 0, space = 0;
+
+	strbuf_reset(&cs->value);
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n') {
+			if (quote) {
+				cs->linenr--;
+				return NULL;
+			}
+			return cs->value.buf;
+		}
+		if (comment)
+			continue;
+		if (isspace(c) && !quote) {
+			if (cs->value.len)
+				space++;
+			continue;
+		}
+		if (!quote) {
+			if (c == ';' || c == '#') {
+				comment = 1;
+				continue;
+			}
+		}
+		for (; space; space--)
+			strbuf_addch(&cs->value, ' ');
+		if (c == '\\') {
+			c = get_next_char(cs);
+			switch (c) {
+			case '\n':
+				continue;
+			case 't':
+				c = '\t';
+				break;
+			case 'b':
+				c = '\b';
+				break;
+			case 'n':
+				c = '\n';
+				break;
+			/* Some characters escape as themselves */
+			case '\\': case '"':
+				break;
+			/* Reject unknown escape sequences */
+			default:
+				return NULL;
+			}
+			strbuf_addch(&cs->value, c);
+			continue;
+		}
+		if (c == '"') {
+			quote = 1-quote;
+			continue;
+		}
+		strbuf_addch(&cs->value, c);
+	}
+}
+
+static int get_value(struct config_source *cs, struct key_value_info *kvi,
+		     config_fn_t fn, void *data, struct strbuf *name)
+{
+	int c;
+	char *value;
+	int ret;
+	struct config_context ctx = {
+		.kvi = kvi,
+	};
+
+	/* Get the full name */
+	for (;;) {
+		c = get_next_char(cs);
+		if (cs->eof)
+			break;
+		if (!iskeychar(c))
+			break;
+		strbuf_addch(name, tolower(c));
+	}
+
+	while (c == ' ' || c == '\t')
+		c = get_next_char(cs);
+
+	value = NULL;
+	if (c != '\n') {
+		if (c != '=')
+			return -1;
+		value = parse_value(cs);
+		if (!value)
+			return -1;
+	}
+	/*
+	 * We already consumed the \n, but we need linenr to point to
+	 * the line we just parsed during the call to fn to get
+	 * accurate line number in error messages.
+	 */
+	cs->linenr--;
+	kvi->linenr = cs->linenr;
+	ret = fn(name->buf, value, &ctx, data);
+	if (ret >= 0)
+		cs->linenr++;
+	return ret;
+}
+
+static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
+				 int c)
+{
+	cs->subsection_case_sensitive = 0;
+	do {
+		if (c == '\n')
+			goto error_incomplete_line;
+		c = get_next_char(cs);
+	} while (isspace(c));
+
+	/* We require the format to be '[base "extension"]' */
+	if (c != '"')
+		return -1;
+	strbuf_addch(name, '.');
+
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n')
+			goto error_incomplete_line;
+		if (c == '"')
+			break;
+		if (c == '\\') {
+			c = get_next_char(cs);
+			if (c == '\n')
+				goto error_incomplete_line;
+		}
+		strbuf_addch(name, c);
+	}
+
+	/* Final ']' */
+	if (get_next_char(cs) != ']')
+		return -1;
+	return 0;
+error_incomplete_line:
+	cs->linenr--;
+	return -1;
+}
+
+static int get_base_var(struct config_source *cs, struct strbuf *name)
+{
+	cs->subsection_case_sensitive = 1;
+	for (;;) {
+		int c = get_next_char(cs);
+		if (cs->eof)
+			return -1;
+		if (c == ']')
+			return 0;
+		if (isspace(c))
+			return get_extended_base_var(cs, name, c);
+		if (!iskeychar(c) && c != '.')
+			return -1;
+		strbuf_addch(name, tolower(c));
+	}
+}
+
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_options *opts;
+};
+
+static int do_event(struct config_source *cs, enum config_event_t type,
+		    struct parse_event_data *data)
+{
+	size_t offset;
+
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	offset = cs->do_ftell(cs);
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 offset, cs, data->opts->event_fn_data) < 0)
+		return -1;
+
+	data->previous_type = type;
+	data->previous_offset = offset;
+
+	return 0;
+}
+
+static void kvi_from_source(struct config_source *cs,
+			    enum config_scope scope,
+			    struct key_value_info *out)
+{
+	out->filename = strintern(cs->name);
+	out->origin_type = cs->origin_type;
+	out->linenr = cs->linenr;
+	out->scope = scope;
+	out->path = cs->path;
+}
+
+static int git_parse_source(struct config_source *cs, config_fn_t fn,
+			    struct key_value_info *kvi, void *data,
+			    const struct config_options *opts)
+{
+	int comment = 0;
+	size_t baselen = 0;
+	struct strbuf *var = &cs->var;
+	int error_return = 0;
+	char *error_msg = NULL;
+
+	/* U+FEFF Byte Order Mark in UTF8 */
+	const char *bomptr = utf8_bom;
+
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
+	for (;;) {
+		int c;
+
+		c = get_next_char(cs);
+		if (bomptr && *bomptr) {
+			/* We are at the file beginning; skip UTF8-encoded BOM
+			 * if present. Sane editors won't put this in on their
+			 * own, but e.g. Windows Notepad will do it happily. */
+			if (c == (*bomptr & 0377)) {
+				bomptr++;
+				continue;
+			} else {
+				/* Do not tolerate partial BOM. */
+				if (bomptr != utf8_bom)
+					break;
+				/* No BOM at file beginning. Cool. */
+				bomptr = NULL;
+			}
+		}
+		if (c == '\n') {
+			if (cs->eof) {
+				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
+				return 0;
+			}
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
+			comment = 0;
+			continue;
+		}
+		if (comment)
+			continue;
+		if (isspace(c)) {
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
+		if (c == '#' || c == ';') {
+			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
+			comment = 1;
+			continue;
+		}
+		if (c == '[') {
+			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
+			/* Reset prior to determining a new stem */
+			strbuf_reset(var);
+			if (get_base_var(cs, var) < 0 || var->len < 1)
+				break;
+			strbuf_addch(var, '.');
+			baselen = var->len;
+			continue;
+		}
+		if (!isalpha(c))
+			break;
+
+		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
+		/*
+		 * Truncate the var name back to the section header
+		 * stem prior to grabbing the suffix part of the name
+		 * and the value.
+		 */
+		strbuf_setlen(var, baselen);
+		strbuf_addch(var, tolower(c));
+		if (get_value(cs, kvi, fn, data, var) < 0)
+			break;
+	}
+
+	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
+		return -1;
+
+	switch (cs->origin_type) {
+	case CONFIG_ORIGIN_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in blob %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_FILE:
+		error_msg = xstrfmt(_("bad config line %d in file %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_STDIN:
+		error_msg = xstrfmt(_("bad config line %d in standard input"),
+				      cs->linenr);
+		break;
+	case CONFIG_ORIGIN_SUBMODULE_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
+				       cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_CMDLINE:
+		error_msg = xstrfmt(_("bad config line %d in command line %s"),
+				       cs->linenr, cs->name);
+		break;
+	default:
+		error_msg = xstrfmt(_("bad config line %d in %s"),
+				      cs->linenr, cs->name);
+	}
+
+	switch (opts && opts->error_action ?
+		opts->error_action :
+		cs->default_error_action) {
+	case CONFIG_ERROR_DIE:
+		die("%s", error_msg);
+		break;
+	case CONFIG_ERROR_ERROR:
+		error_return = error("%s", error_msg);
+		break;
+	case CONFIG_ERROR_SILENT:
+		error_return = -1;
+		break;
+	case CONFIG_ERROR_UNSET:
+		BUG("config error action unset");
+	}
+
+	free(error_msg);
+	return error_return;
+}
+
+/*
+ * All source specific fields in the union, die_on_error, name and the callbacks
+ * fgetc, ungetc, ftell of top need to be initialized before calling
+ * this function.
+ */
+static int do_config_from(struct config_source *top, config_fn_t fn,
+			  void *data, enum config_scope scope,
+			  const struct config_options *opts)
+{
+	struct key_value_info kvi = KVI_INIT;
+	int ret;
+
+	/* push config-file parsing state stack */
+	top->linenr = 1;
+	top->eof = 0;
+	top->total_len = 0;
+	strbuf_init(&top->value, 1024);
+	strbuf_init(&top->var, 1024);
+	kvi_from_source(top, scope, &kvi);
+
+	ret = git_parse_source(top, fn, &kvi, data, opts);
+
+	strbuf_release(&top->value);
+	strbuf_release(&top->var);
+
+	return ret;
+}
+
+static int do_config_from_file(config_fn_t fn,
+			       const enum config_origin_type origin_type,
+			       const char *name, const char *path, FILE *f,
+			       void *data, enum config_scope scope,
+			       const struct config_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+	int ret;
+
+	top.u.file = f;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = path;
+	top.default_error_action = CONFIG_ERROR_DIE;
+	top.do_fgetc = config_file_fgetc;
+	top.do_ungetc = config_file_ungetc;
+	top.do_ftell = config_file_ftell;
+
+	flockfile(f);
+	ret = do_config_from(&top, fn, data, scope, opts);
+	funlockfile(f);
+	return ret;
+}
+
+int git_config_from_stdin(config_fn_t fn, void *data,
+			  enum config_scope scope)
+{
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, scope, NULL);
+}
+
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data, enum config_scope scope,
+				      const struct config_options *opts)
+{
+	int ret = -1;
+	FILE *f;
+
+	if (!filename)
+		BUG("filename cannot be NULL");
+	f = fopen_or_warn(filename, "r");
+	if (f) {
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, scope, opts);
+		fclose(f);
+	}
+	return ret;
+}
+
+int git_config_from_file(config_fn_t fn, const char *filename, void *data)
+{
+	return git_config_from_file_with_options(fn, filename, data,
+						 CONFIG_SCOPE_UNKNOWN, NULL);
+}
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type origin_type,
+			const char *name, const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+
+	top.u.buf.buf = buf;
+	top.u.buf.len = len;
+	top.u.buf.pos = 0;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = NULL;
+	top.default_error_action = CONFIG_ERROR_ERROR;
+	top.do_fgetc = config_buf_fgetc;
+	top.do_ungetc = config_buf_ungetc;
+	top.do_ftell = config_buf_ftell;
+
+	return do_config_from(&top, fn, data, scope, opts);
+}
diff --git a/config-parse.h b/config-parse.h
new file mode 100644
index 00000000000..d6ee88583d8
--- /dev/null
+++ b/config-parse.h
@@ -0,0 +1,182 @@
+/*
+ * Low level config parsing.
+ */
+#ifndef CONFIG_LIB_H
+#define CONFIG_LIB_H
+
+#include "strbuf.h"
+
+/* git_config_parse_key() returns these: */
+#define CONFIG_INVALID_KEY 1
+#define CONFIG_NO_SECTION_OR_NAME 2
+
+int git_config_parse_key(const char *, char **, size_t *);
+
+enum config_scope {
+	CONFIG_SCOPE_UNKNOWN = 0,
+	CONFIG_SCOPE_SYSTEM,
+	CONFIG_SCOPE_GLOBAL,
+	CONFIG_SCOPE_LOCAL,
+	CONFIG_SCOPE_WORKTREE,
+	CONFIG_SCOPE_COMMAND,
+	CONFIG_SCOPE_SUBMODULE,
+};
+const char *config_scope_name(enum config_scope scope);
+
+enum config_origin_type {
+	CONFIG_ORIGIN_UNKNOWN = 0,
+	CONFIG_ORIGIN_BLOB,
+	CONFIG_ORIGIN_FILE,
+	CONFIG_ORIGIN_STDIN,
+	CONFIG_ORIGIN_SUBMODULE_BLOB,
+	CONFIG_ORIGIN_CMDLINE
+};
+
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+struct config_source;
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					struct config_source *cs,
+					void *event_fn_data);
+
+struct config_options {
+	unsigned int respect_includes : 1;
+	unsigned int ignore_repo : 1;
+	unsigned int ignore_worktree : 1;
+	unsigned int ignore_cmdline : 1;
+	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL, and when doing so, verify
+	 * that files included in this way do not configure any remote URLs
+	 * themselves.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
+	const char *commondir;
+	const char *git_dir;
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+	enum config_error_action {
+		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
+		CONFIG_ERROR_DIE, /* die() on error */
+		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+		CONFIG_ERROR_SILENT, /* return -1 */
+	} error_action;
+};
+
+struct config_source {
+	struct config_source *prev;
+	union {
+		FILE *file;
+		struct config_buf {
+			const char *buf;
+			size_t len;
+			size_t pos;
+		} buf;
+	} u;
+	enum config_origin_type origin_type;
+	const char *name;
+	const char *path;
+	enum config_error_action default_error_action;
+	int linenr;
+	int eof;
+	size_t total_len;
+	struct strbuf value;
+	struct strbuf var;
+	unsigned subsection_case_sensitive : 1;
+
+	int (*do_fgetc)(struct config_source *c);
+	int (*do_ungetc)(int c, struct config_source *conf);
+	long (*do_ftell)(struct config_source *c);
+};
+#define CONFIG_SOURCE_INIT { 0 }
+
+/* Config source metadata for a given config key-value pair */
+struct key_value_info {
+	const char *filename;
+	int linenr;
+	enum config_origin_type origin_type;
+	enum config_scope scope;
+	const char *path;
+};
+#define KVI_INIT { \
+	.filename = NULL, \
+	.linenr = -1, \
+	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
+	.scope = CONFIG_SCOPE_UNKNOWN, \
+	.path = NULL, \
+}
+
+/* Captures additional information that a config callback can use. */
+struct config_context {
+	/* Config source metadata for key and value. */
+	const struct key_value_info *kvi;
+};
+#define CONFIG_CONTEXT_INIT { 0 }
+
+/**
+ * A config callback function takes four parameters:
+ *
+ * - the name of the parsed variable. This is in canonical "flat" form: the
+ *   section, subsection, and variable segments will be separated by dots,
+ *   and the section and variable segments will be all lowercase. E.g.,
+ *   `core.ignorecase`, `diff.SomeType.textconv`.
+ *
+ * - the value of the found variable, as a string. If the variable had no
+ *   value specified, the value will be NULL (typically this means it
+ *   should be interpreted as boolean true).
+ *
+ * - the 'config context', that is, additional information about the config
+ *   iteration operation provided by the config machinery. For example, this
+ *   includes information about the config source being parsed (e.g. the
+ *   filename).
+ *
+ * - a void pointer passed in by the caller of the config API; this can
+ *   contain callback-specific data
+ *
+ * A config callback should return 0 for success, or -1 if the variable
+ * could not be parsed properly.
+ */
+typedef int (*config_fn_t)(const char *, const char *,
+			   const struct config_context *, void *);
+
+/**
+ * Read a specific file in git-config format.
+ */
+int git_config_from_file(config_fn_t fn, const char *, void *);
+
+int git_config_from_file_with_options(config_fn_t fn, const char *,
+				      void *, enum config_scope,
+				      const struct config_options *);
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type,
+			const char *name,
+			const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_options *opts);
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope);
+
+#endif /* CONFIG_LIB_H */
diff --git a/config.c b/config.c
index ca77ca17a47..b575dfd7ba0 100644
--- a/config.c
+++ b/config.c
@@ -42,33 +42,6 @@
 #include "wrapper.h"
 #include "write-or-die.h"
 
-struct config_source {
-	struct config_source *prev;
-	union {
-		FILE *file;
-		struct config_buf {
-			const char *buf;
-			size_t len;
-			size_t pos;
-		} buf;
-	} u;
-	enum config_origin_type origin_type;
-	const char *name;
-	const char *path;
-	enum config_error_action default_error_action;
-	int linenr;
-	int eof;
-	size_t total_len;
-	struct strbuf value;
-	struct strbuf var;
-	unsigned subsection_case_sensitive : 1;
-
-	int (*do_fgetc)(struct config_source *c);
-	int (*do_ungetc)(int c, struct config_source *conf);
-	long (*do_ftell)(struct config_source *c);
-};
-#define CONFIG_SOURCE_INIT { 0 }
-
 static int pack_compression_seen;
 static int zlib_compression_seen;
 
@@ -83,47 +56,6 @@ static int zlib_compression_seen;
  */
 static struct config_set protected_config;
 
-static int config_file_fgetc(struct config_source *conf)
-{
-	return getc_unlocked(conf->u.file);
-}
-
-static int config_file_ungetc(int c, struct config_source *conf)
-{
-	return ungetc(c, conf->u.file);
-}
-
-static long config_file_ftell(struct config_source *conf)
-{
-	return ftell(conf->u.file);
-}
-
-
-static int config_buf_fgetc(struct config_source *conf)
-{
-	if (conf->u.buf.pos < conf->u.buf.len)
-		return conf->u.buf.buf[conf->u.buf.pos++];
-
-	return EOF;
-}
-
-static int config_buf_ungetc(int c, struct config_source *conf)
-{
-	if (conf->u.buf.pos > 0) {
-		conf->u.buf.pos--;
-		if (conf->u.buf.buf[conf->u.buf.pos] != c)
-			BUG("config_buf can only ungetc the same character");
-		return c;
-	}
-
-	return EOF;
-}
-
-static long config_buf_ftell(struct config_source *conf)
-{
-	return conf->u.buf.pos;
-}
-
 struct config_include_data {
 	int depth;
 	config_fn_t fn;
@@ -525,81 +457,6 @@ void git_config_push_env(const char *spec)
 	free(key);
 }
 
-static inline int iskeychar(int c)
-{
-	return isalnum(c) || c == '-';
-}
-
-/*
- * Auxiliary function to sanity-check and split the key into the section
- * identifier and variable name.
- *
- * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
- * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
- * key.
- *
- * store_key - pointer to char* which will hold a copy of the key with
- *             lowercase section and variable name
- * baselen - pointer to size_t which will hold the length of the
- *           section + subsection part, can be NULL
- */
-int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
-{
-	size_t i, baselen;
-	int dot;
-	const char *last_dot = strrchr(key, '.');
-
-	/*
-	 * Since "key" actually contains the section name and the real
-	 * key name separated by a dot, we have to know where the dot is.
-	 */
-
-	if (last_dot == NULL || last_dot == key) {
-		error(_("key does not contain a section: %s"), key);
-		return CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	if (!last_dot[1]) {
-		error(_("key does not contain variable name: %s"), key);
-		return CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	baselen = last_dot - key;
-	if (baselen_)
-		*baselen_ = baselen;
-
-	/*
-	 * Validate the key and while at it, lower case it for matching.
-	 */
-	*store_key = xmallocz(strlen(key));
-
-	dot = 0;
-	for (i = 0; key[i]; i++) {
-		unsigned char c = key[i];
-		if (c == '.')
-			dot = 1;
-		/* Leave the extended basename untouched.. */
-		if (!dot || i > baselen) {
-			if (!iskeychar(c) ||
-			    (i == baselen + 1 && !isalpha(c))) {
-				error(_("invalid key: %s"), key);
-				goto out_free_ret_1;
-			}
-			c = tolower(c);
-		} else if (c == '\n') {
-			error(_("invalid key (newline): %s"), key);
-			goto out_free_ret_1;
-		}
-		(*store_key)[i] = c;
-	}
-
-	return 0;
-
-out_free_ret_1:
-	FREE_AND_NULL(*store_key);
-	return CONFIG_INVALID_KEY;
-}
-
 static int config_parse_pair(const char *key, const char *value,
 			     struct key_value_info *kvi,
 			     config_fn_t fn, void *data)
@@ -784,390 +641,6 @@ out:
 	return ret;
 }
 
-static int get_next_char(struct config_source *cs)
-{
-	int c = cs->do_fgetc(cs);
-
-	if (c == '\r') {
-		/* DOS like systems */
-		c = cs->do_fgetc(cs);
-		if (c != '\n') {
-			if (c != EOF)
-				cs->do_ungetc(c, cs);
-			c = '\r';
-		}
-	}
-
-	if (c != EOF && ++cs->total_len > INT_MAX) {
-		/*
-		 * This is an absurdly long config file; refuse to parse
-		 * further in order to protect downstream code from integer
-		 * overflows. Note that we can't return an error specifically,
-		 * but we can mark EOF and put trash in the return value,
-		 * which will trigger a parse error.
-		 */
-		cs->eof = 1;
-		return 0;
-	}
-
-	if (c == '\n')
-		cs->linenr++;
-	if (c == EOF) {
-		cs->eof = 1;
-		cs->linenr++;
-		c = '\n';
-	}
-	return c;
-}
-
-static char *parse_value(struct config_source *cs)
-{
-	int quote = 0, comment = 0, space = 0;
-
-	strbuf_reset(&cs->value);
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n') {
-			if (quote) {
-				cs->linenr--;
-				return NULL;
-			}
-			return cs->value.buf;
-		}
-		if (comment)
-			continue;
-		if (isspace(c) && !quote) {
-			if (cs->value.len)
-				space++;
-			continue;
-		}
-		if (!quote) {
-			if (c == ';' || c == '#') {
-				comment = 1;
-				continue;
-			}
-		}
-		for (; space; space--)
-			strbuf_addch(&cs->value, ' ');
-		if (c == '\\') {
-			c = get_next_char(cs);
-			switch (c) {
-			case '\n':
-				continue;
-			case 't':
-				c = '\t';
-				break;
-			case 'b':
-				c = '\b';
-				break;
-			case 'n':
-				c = '\n';
-				break;
-			/* Some characters escape as themselves */
-			case '\\': case '"':
-				break;
-			/* Reject unknown escape sequences */
-			default:
-				return NULL;
-			}
-			strbuf_addch(&cs->value, c);
-			continue;
-		}
-		if (c == '"') {
-			quote = 1-quote;
-			continue;
-		}
-		strbuf_addch(&cs->value, c);
-	}
-}
-
-static int get_value(struct config_source *cs, struct key_value_info *kvi,
-		     config_fn_t fn, void *data, struct strbuf *name)
-{
-	int c;
-	char *value;
-	int ret;
-	struct config_context ctx = {
-		.kvi = kvi,
-	};
-
-	/* Get the full name */
-	for (;;) {
-		c = get_next_char(cs);
-		if (cs->eof)
-			break;
-		if (!iskeychar(c))
-			break;
-		strbuf_addch(name, tolower(c));
-	}
-
-	while (c == ' ' || c == '\t')
-		c = get_next_char(cs);
-
-	value = NULL;
-	if (c != '\n') {
-		if (c != '=')
-			return -1;
-		value = parse_value(cs);
-		if (!value)
-			return -1;
-	}
-	/*
-	 * We already consumed the \n, but we need linenr to point to
-	 * the line we just parsed during the call to fn to get
-	 * accurate line number in error messages.
-	 */
-	cs->linenr--;
-	kvi->linenr = cs->linenr;
-	ret = fn(name->buf, value, &ctx, data);
-	if (ret >= 0)
-		cs->linenr++;
-	return ret;
-}
-
-static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
-				 int c)
-{
-	cs->subsection_case_sensitive = 0;
-	do {
-		if (c == '\n')
-			goto error_incomplete_line;
-		c = get_next_char(cs);
-	} while (isspace(c));
-
-	/* We require the format to be '[base "extension"]' */
-	if (c != '"')
-		return -1;
-	strbuf_addch(name, '.');
-
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n')
-			goto error_incomplete_line;
-		if (c == '"')
-			break;
-		if (c == '\\') {
-			c = get_next_char(cs);
-			if (c == '\n')
-				goto error_incomplete_line;
-		}
-		strbuf_addch(name, c);
-	}
-
-	/* Final ']' */
-	if (get_next_char(cs) != ']')
-		return -1;
-	return 0;
-error_incomplete_line:
-	cs->linenr--;
-	return -1;
-}
-
-static int get_base_var(struct config_source *cs, struct strbuf *name)
-{
-	cs->subsection_case_sensitive = 1;
-	for (;;) {
-		int c = get_next_char(cs);
-		if (cs->eof)
-			return -1;
-		if (c == ']')
-			return 0;
-		if (isspace(c))
-			return get_extended_base_var(cs, name, c);
-		if (!iskeychar(c) && c != '.')
-			return -1;
-		strbuf_addch(name, tolower(c));
-	}
-}
-
-struct parse_event_data {
-	enum config_event_t previous_type;
-	size_t previous_offset;
-	const struct config_options *opts;
-};
-
-static int do_event(struct config_source *cs, enum config_event_t type,
-		    struct parse_event_data *data)
-{
-	size_t offset;
-
-	if (!data->opts || !data->opts->event_fn)
-		return 0;
-
-	if (type == CONFIG_EVENT_WHITESPACE &&
-	    data->previous_type == type)
-		return 0;
-
-	offset = cs->do_ftell(cs);
-	/*
-	 * At EOF, the parser always "inserts" an extra '\n', therefore
-	 * the end offset of the event is the current file position, otherwise
-	 * we will already have advanced to the next event.
-	 */
-	if (type != CONFIG_EVENT_EOF)
-		offset--;
-
-	if (data->previous_type != CONFIG_EVENT_EOF &&
-	    data->opts->event_fn(data->previous_type, data->previous_offset,
-				 offset, cs, data->opts->event_fn_data) < 0)
-		return -1;
-
-	data->previous_type = type;
-	data->previous_offset = offset;
-
-	return 0;
-}
-
-static void kvi_from_source(struct config_source *cs,
-			    enum config_scope scope,
-			    struct key_value_info *out)
-{
-	out->filename = strintern(cs->name);
-	out->origin_type = cs->origin_type;
-	out->linenr = cs->linenr;
-	out->scope = scope;
-	out->path = cs->path;
-}
-
-static int git_parse_source(struct config_source *cs, config_fn_t fn,
-			    struct key_value_info *kvi, void *data,
-			    const struct config_options *opts)
-{
-	int comment = 0;
-	size_t baselen = 0;
-	struct strbuf *var = &cs->var;
-	int error_return = 0;
-	char *error_msg = NULL;
-
-	/* U+FEFF Byte Order Mark in UTF8 */
-	const char *bomptr = utf8_bom;
-
-	/* For the parser event callback */
-	struct parse_event_data event_data = {
-		CONFIG_EVENT_EOF, 0, opts
-	};
-
-	for (;;) {
-		int c;
-
-		c = get_next_char(cs);
-		if (bomptr && *bomptr) {
-			/* We are at the file beginning; skip UTF8-encoded BOM
-			 * if present. Sane editors won't put this in on their
-			 * own, but e.g. Windows Notepad will do it happily. */
-			if (c == (*bomptr & 0377)) {
-				bomptr++;
-				continue;
-			} else {
-				/* Do not tolerate partial BOM. */
-				if (bomptr != utf8_bom)
-					break;
-				/* No BOM at file beginning. Cool. */
-				bomptr = NULL;
-			}
-		}
-		if (c == '\n') {
-			if (cs->eof) {
-				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
-					return -1;
-				return 0;
-			}
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-				return -1;
-			comment = 0;
-			continue;
-		}
-		if (comment)
-			continue;
-		if (isspace(c)) {
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-					return -1;
-			continue;
-		}
-		if (c == '#' || c == ';') {
-			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
-					return -1;
-			comment = 1;
-			continue;
-		}
-		if (c == '[') {
-			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
-					return -1;
-
-			/* Reset prior to determining a new stem */
-			strbuf_reset(var);
-			if (get_base_var(cs, var) < 0 || var->len < 1)
-				break;
-			strbuf_addch(var, '.');
-			baselen = var->len;
-			continue;
-		}
-		if (!isalpha(c))
-			break;
-
-		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
-			return -1;
-
-		/*
-		 * Truncate the var name back to the section header
-		 * stem prior to grabbing the suffix part of the name
-		 * and the value.
-		 */
-		strbuf_setlen(var, baselen);
-		strbuf_addch(var, tolower(c));
-		if (get_value(cs, kvi, fn, data, var) < 0)
-			break;
-	}
-
-	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
-		return -1;
-
-	switch (cs->origin_type) {
-	case CONFIG_ORIGIN_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in blob %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_FILE:
-		error_msg = xstrfmt(_("bad config line %d in file %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_STDIN:
-		error_msg = xstrfmt(_("bad config line %d in standard input"),
-				      cs->linenr);
-		break;
-	case CONFIG_ORIGIN_SUBMODULE_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
-				       cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_CMDLINE:
-		error_msg = xstrfmt(_("bad config line %d in command line %s"),
-				       cs->linenr, cs->name);
-		break;
-	default:
-		error_msg = xstrfmt(_("bad config line %d in %s"),
-				      cs->linenr, cs->name);
-	}
-
-	switch (opts && opts->error_action ?
-		opts->error_action :
-		cs->default_error_action) {
-	case CONFIG_ERROR_DIE:
-		die("%s", error_msg);
-		break;
-	case CONFIG_ERROR_ERROR:
-		error_return = error("%s", error_msg);
-		break;
-	case CONFIG_ERROR_SILENT:
-		error_return = -1;
-		break;
-	case CONFIG_ERROR_UNSET:
-		BUG("config error action unset");
-	}
-
-	free(error_msg);
-	return error_return;
-}
-
 static uintmax_t get_unit_factor(const char *end)
 {
 	if (!*end)
@@ -1961,111 +1434,6 @@ int git_default_config(const char *var, const char *value,
 	return 0;
 }
 
-/*
- * All source specific fields in the union, die_on_error, name and the callbacks
- * fgetc, ungetc, ftell of top need to be initialized before calling
- * this function.
- */
-static int do_config_from(struct config_source *top, config_fn_t fn,
-			  void *data, enum config_scope scope,
-			  const struct config_options *opts)
-{
-	struct key_value_info kvi = KVI_INIT;
-	int ret;
-
-	/* push config-file parsing state stack */
-	top->linenr = 1;
-	top->eof = 0;
-	top->total_len = 0;
-	strbuf_init(&top->value, 1024);
-	strbuf_init(&top->var, 1024);
-	kvi_from_source(top, scope, &kvi);
-
-	ret = git_parse_source(top, fn, &kvi, data, opts);
-
-	strbuf_release(&top->value);
-	strbuf_release(&top->var);
-
-	return ret;
-}
-
-static int do_config_from_file(config_fn_t fn,
-			       const enum config_origin_type origin_type,
-			       const char *name, const char *path, FILE *f,
-			       void *data, enum config_scope scope,
-			       const struct config_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-	int ret;
-
-	top.u.file = f;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = path;
-	top.default_error_action = CONFIG_ERROR_DIE;
-	top.do_fgetc = config_file_fgetc;
-	top.do_ungetc = config_file_ungetc;
-	top.do_ftell = config_file_ftell;
-
-	flockfile(f);
-	ret = do_config_from(&top, fn, data, scope, opts);
-	funlockfile(f);
-	return ret;
-}
-
-static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope)
-{
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, NULL);
-}
-
-int git_config_from_file_with_options(config_fn_t fn, const char *filename,
-				      void *data, enum config_scope scope,
-				      const struct config_options *opts)
-{
-	int ret = -1;
-	FILE *f;
-
-	if (!filename)
-		BUG("filename cannot be NULL");
-	f = fopen_or_warn(filename, "r");
-	if (f) {
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
-					  filename, f, data, scope, opts);
-		fclose(f);
-	}
-	return ret;
-}
-
-int git_config_from_file(config_fn_t fn, const char *filename, void *data)
-{
-	return git_config_from_file_with_options(fn, filename, data,
-						 CONFIG_SCOPE_UNKNOWN, NULL);
-}
-
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type origin_type,
-			const char *name, const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-
-	top.u.buf.buf = buf;
-	top.u.buf.len = len;
-	top.u.buf.pos = 0;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = NULL;
-	top.default_error_action = CONFIG_ERROR_ERROR;
-	top.do_fgetc = config_buf_fgetc;
-	top.do_ungetc = config_buf_ungetc;
-	top.do_ftell = config_buf_ftell;
-
-	return do_config_from(&top, fn, data, scope, opts);
-}
-
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
 			      struct repository *repo,
diff --git a/config.h b/config.h
index 40966cb6828..1a7dcc24208 100644
--- a/config.h
+++ b/config.h
@@ -4,7 +4,7 @@
 #include "hashmap.h"
 #include "string-list.h"
 #include "repository.h"
-
+#include "config-parse.h"
 
 /**
  * The config API gives callers a way to access Git configuration files
@@ -23,9 +23,6 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these: */
-#define CONFIG_INVALID_KEY 1
-#define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
 #define CONFIG_NO_LOCK -1
 #define CONFIG_INVALID_FILE 3
@@ -36,17 +33,6 @@ struct object_id;
 
 #define CONFIG_REGEX_NONE ((void *)1)
 
-enum config_scope {
-	CONFIG_SCOPE_UNKNOWN = 0,
-	CONFIG_SCOPE_SYSTEM,
-	CONFIG_SCOPE_GLOBAL,
-	CONFIG_SCOPE_LOCAL,
-	CONFIG_SCOPE_WORKTREE,
-	CONFIG_SCOPE_COMMAND,
-	CONFIG_SCOPE_SUBMODULE,
-};
-const char *config_scope_name(enum config_scope scope);
-
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
@@ -54,137 +40,9 @@ struct git_config_source {
 	enum config_scope scope;
 };
 
-enum config_origin_type {
-	CONFIG_ORIGIN_UNKNOWN = 0,
-	CONFIG_ORIGIN_BLOB,
-	CONFIG_ORIGIN_FILE,
-	CONFIG_ORIGIN_STDIN,
-	CONFIG_ORIGIN_SUBMODULE_BLOB,
-	CONFIG_ORIGIN_CMDLINE
-};
-
-enum config_event_t {
-	CONFIG_EVENT_SECTION,
-	CONFIG_EVENT_ENTRY,
-	CONFIG_EVENT_WHITESPACE,
-	CONFIG_EVENT_COMMENT,
-	CONFIG_EVENT_EOF,
-	CONFIG_EVENT_ERROR
-};
-
-struct config_source;
-/*
- * The parser event function (if not NULL) is called with the event type and
- * the begin/end offsets of the parsed elements.
- *
- * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
- * character is considered part of the element.
- */
-typedef int (*config_parser_event_fn_t)(enum config_event_t type,
-					size_t begin_offset, size_t end_offset,
-					struct config_source *cs,
-					void *event_fn_data);
-
-struct config_options {
-	unsigned int respect_includes : 1;
-	unsigned int ignore_repo : 1;
-	unsigned int ignore_worktree : 1;
-	unsigned int ignore_cmdline : 1;
-	unsigned int system_gently : 1;
-
-	/*
-	 * For internal use. Include all includeif.hasremoteurl paths without
-	 * checking if the repo has that remote URL, and when doing so, verify
-	 * that files included in this way do not configure any remote URLs
-	 * themselves.
-	 */
-	unsigned int unconditional_remote_url : 1;
-
-	const char *commondir;
-	const char *git_dir;
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
-};
-
-/* Config source metadata for a given config key-value pair */
-struct key_value_info {
-	const char *filename;
-	int linenr;
-	enum config_origin_type origin_type;
-	enum config_scope scope;
-	const char *path;
-};
-#define KVI_INIT { \
-	.filename = NULL, \
-	.linenr = -1, \
-	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
-	.scope = CONFIG_SCOPE_UNKNOWN, \
-	.path = NULL, \
-}
-
-/* Captures additional information that a config callback can use. */
-struct config_context {
-	/* Config source metadata for key and value. */
-	const struct key_value_info *kvi;
-};
-#define CONFIG_CONTEXT_INIT { 0 }
-
-/**
- * A config callback function takes four parameters:
- *
- * - the name of the parsed variable. This is in canonical "flat" form: the
- *   section, subsection, and variable segments will be separated by dots,
- *   and the section and variable segments will be all lowercase. E.g.,
- *   `core.ignorecase`, `diff.SomeType.textconv`.
- *
- * - the value of the found variable, as a string. If the variable had no
- *   value specified, the value will be NULL (typically this means it
- *   should be interpreted as boolean true).
- *
- * - the 'config context', that is, additional information about the config
- *   iteration operation provided by the config machinery. For example, this
- *   includes information about the config source being parsed (e.g. the
- *   filename).
- *
- * - a void pointer passed in by the caller of the config API; this can
- *   contain callback-specific data
- *
- * A config callback should return 0 for success, or -1 if the variable
- * could not be parsed properly.
- */
-typedef int (*config_fn_t)(const char *, const char *,
-			   const struct config_context *, void *);
-
 int git_default_config(const char *, const char *,
 		       const struct config_context *, void *);
 
-/**
- * Read a specific file in git-config format.
- * This function takes the same callback and data parameters as `git_config`.
- *
- * Unlike git_config(), this function does not respect includes.
- */
-int git_config_from_file(config_fn_t fn, const char *, void *);
-
-int git_config_from_file_with_options(config_fn_t fn, const char *,
-				      void *, enum config_scope,
-				      const struct config_options *);
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type,
-			const char *name,
-			const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
@@ -322,8 +180,6 @@ int repo_config_set_worktree_gently(struct repository *, const char *, const cha
  */
 void git_config_set(const char *, const char *);
 
-int git_config_parse_key(const char *, char **, size_t *);
-
 /*
  * The following macros specify flag bits that alter the behavior
  * of the git_config_set_multivar*() methods.
-- 
gitgitgadget


* Re: [PATCH 1/2] config: return positive from git_config_parse_key()
  2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
@ 2023-07-20 23:44   ` Jonathan Tan
  2023-07-21  4:32   ` Junio C Hamano
  1 sibling, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-07-20 23:44 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget; +Cc: Jonathan Tan, git, Calvin Wan, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
> 
> a) There's no good reason why those callers couldn't negate the value
>    themselves.
> 
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).
> 
> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

I'm not sure if we ever concluded that functions returning errors should
return positive integers, but in this case I think it makes sense. We
can document what's returned as being the same as what's documented in
the config manpage.

The negative return dates back to when the function was first
introduced in b09c53a3e3 (Sanity-check config variable names,
2011-01-30), but there's no indication there as to why the author chose
negative values.

> Change git_config_parse_key() to return positive values instead, and
> adjust callers accordingly. Callers that sanitize the negative sign for
> exit(3) now pass the return value opaquely, fixing a bug where "git
> config <key with no section or name>" results in a different exit code
> depending on whether we are setting or getting config.

Can you be more precise as to which bug is being fixed? (I think
somewhere, a 1 is returned when it should be a 2.)

> Callers that
> wanted to pass along a negative value now negate the return value
> themselves.

OK.

> diff --git a/builtin/config.c b/builtin/config.c
> index 1c75cbc43df..8a2840f0a8c 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
>  			goto free_strings;
>  		}
>  	} else {
> -		if (git_config_parse_key(key_, &key, NULL)) {
> -			ret = CONFIG_INVALID_KEY;
> +		if ((ret = git_config_parse_key(key_, &key, NULL))) {
>  			goto free_strings;
>  		}
>  	}

Ah, here, the return value was sanitized in such a way that it lost
information. The change makes sense.

Besides the callers modified in this patch, there is another caller
config_parse_pair() in config.c, but that just checks whether the return
value is 0, so it remaining unmodified is fine.

> diff --git a/config.h b/config.h
> index 6332d749047..40966cb6828 100644
> --- a/config.h
> +++ b/config.h
> @@ -23,7 +23,7 @@
>  
>  struct object_id;
>  
> -/* git_config_parse_key() returns these negated: */
> +/* git_config_parse_key() returns these: */
>  #define CONFIG_INVALID_KEY 1
>  #define CONFIG_NO_SECTION_OR_NAME 2

Should these be turned into an enum? Also, it might be worth adding that
these match the return values as documented in the manpage.
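
To make the enum suggestion concrete, here is a minimal sketch. The enum
name, its member names, and classify_key() are all hypothetical; only the
values 1 and 2 come from the existing #defines. classify_key() is a
simplified stand-in for the checks in git_config_parse_key(): it validates
only the variable name, not the section.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical enum; values mirror the existing #defines. */
enum config_key_error {
	CONFIG_KEY_OK = 0,
	CONFIG_KEY_INVALID = 1,            /* CONFIG_INVALID_KEY */
	CONFIG_KEY_NO_SECTION_OR_NAME = 2  /* CONFIG_NO_SECTION_OR_NAME */
};

/*
 * Simplified stand-in for git_config_parse_key(): only checks that a
 * section and a variable name exist and that the variable name starts
 * with a letter and contains only alphanumerics or '-'.
 */
static enum config_key_error classify_key(const char *key)
{
	const char *last_dot = strrchr(key, '.');
	const char *p;

	if (!last_dot || last_dot == key || !last_dot[1])
		return CONFIG_KEY_NO_SECTION_OR_NAME;
	if (!isalpha((unsigned char)last_dot[1]))
		return CONFIG_KEY_INVALID;
	for (p = last_dot + 1; *p; p++)
		if (!isalnum((unsigned char)*p) && *p != '-')
			return CONFIG_KEY_INVALID;
	return CONFIG_KEY_OK;
}
```

With an enum, a caller could pass the result straight to exit(3) and a
switch over it would get compiler warnings for unhandled cases.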

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 387d336c91f..3202b0f8843 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
>  	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
>  '
>  
> +# Exit codes
> +test_expect_success '--get with bad key' '

Rather than put an "exit codes" title, maybe embed that in the test
description.

> +	# Also exits with 1 if the value is not found

I don't understand this comment - what's the difference between a bad
key and a value not being found? And if there's a difference, could we
test both?

> +	test_expect_code 1 git config --get "bad.name\n" 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config --get "bad." 2>err &&
> +	grep "error: key does not contain variable name" err
> +'
> +
> +test_expect_success 'set with bad key' '
> +	test_expect_code 1 git config "bad.name\n" var 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config "bad." var 2>err &&
> +	grep "error: key does not contain variable name" err
> +'

Makes sense.

From a libification perspective, I'm not sure that using positive values
to indicate error is an advantage over negative values, but it makes
sense in this particular context to have the return values match the
manpage exactly, since that is part of the benefit of this function. So
I think this patch is worth getting in by itself.


* Re: [PATCH 2/2] config-parse: split library out of config.[c|h]
  2023-07-20 22:17 ` [PATCH 2/2] config-parse: split library out of config.[c|h] Glen Choo via GitGitGadget
@ 2023-07-21  0:31   ` Jonathan Tan
  2023-07-21 15:55     ` Glen Choo
  0 siblings, 1 reply; 49+ messages in thread
From: Jonathan Tan @ 2023-07-21  0:31 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget; +Cc: Jonathan Tan, git, Calvin Wan, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Begin this process by splitting the config parsing code out of
> config.[c|h] and into config-parse.[c|h].

I think we need to be more careful in how we split. It would be easier
if there is a concrete use case, but some preliminary questions:

 - "respect_includes" is in the library, but callers might want to opt
   out of it or provide an alternative way to resolve includes.
 - There is a lot of error reporting capability with respect to the
   source of config, and not all sources are applicable to library
   users. How should we proceed? E.g. should we expect that all library
   users conform to the list here (e.g. even if the source is something
   like but not exactly STDIN, they should pick it), or allow users to
   customize sources?

In the absence of more information, I would envision one of two splits:

 - Something that can only parse a buffer, with very generic error
   messages (the caller should turn them into something more specific
   before showing them to the user). One problem here is that we must
   postprocess includes, which might be a problem if the output of
   parsing is a flat hashtable, since we wouldn't know which keys are
   overridden by the includes and which are not.
 - Something that can take in a callback that is invoked whenever
   something is included, and maybe also a callback for access to the
   object database, and that has full knowledge of all sources for error
   reporting (or allows the caller to customize the sources).
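
As a rough illustration of the callback idea, here is a toy sketch. Nothing
in it is git's API: toy_parse_buf(), the callback types, and the flat
"key = value" line format are all assumptions made for the example (real
config syntax has sections, quoting and so on). The point is only that the
parser reports an include to the caller instead of resolving it, and
returns a generic error (the negated line number) that the caller can turn
into a source-specific message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback types; not part of git's config API. */
typedef int (*toy_entry_fn)(const char *key, const char *value, void *data);
typedef int (*toy_include_fn)(const char *path, void *data);

/*
 * Parse flat "key = value" lines from a buffer. Returns 0 on success,
 * or the negated 1-based line number of the first line that is
 * malformed or rejected by a callback.
 */
static int toy_parse_buf(const char *buf, size_t len, toy_entry_fn on_entry,
			 toy_include_fn on_include, void *data)
{
	size_t pos = 0;
	int linenr = 0;

	while (pos < len) {
		char line[256];
		char *key, *eq, *value;
		const char *nl = memchr(buf + pos, '\n', len - pos);
		size_t end = nl ? (size_t)(nl - buf) : len;
		size_t n = end - pos;

		linenr++;
		if (n >= sizeof(line))
			return -linenr;
		memcpy(line, buf + pos, n);
		line[n] = '\0';
		pos = nl ? end + 1 : len;

		key = line + strspn(line, " \t");
		if (!*key || *key == '#' || *key == ';')
			continue; /* blank line or comment */
		eq = strchr(key, '=');
		if (!eq)
			return -linenr;
		value = eq + 1 + strspn(eq + 1, " \t");
		while (eq > key && (eq[-1] == ' ' || eq[-1] == '\t'))
			eq--; /* trim whitespace before '=' */
		*eq = '\0';
		if (!*key)
			return -linenr;

		if (!strcmp(key, "include.path")) {
			/* The caller decides how (or whether) to resolve it. */
			if (on_include && on_include(value, data) < 0)
				return -linenr;
		} else if (on_entry(key, value, data) < 0) {
			return -linenr;
		}
	}
	return 0;
}

/* Example callbacks: count entries and remember the last include. */
struct toy_result {
	int entries;
	char last_include[64];
};

static int count_entry(const char *key, const char *value, void *data)
{
	struct toy_result *r = data;
	(void)key;
	(void)value;
	r->entries++;
	return 0;
}

static int record_include(const char *path, void *data)
{
	struct toy_result *r = data;
	snprintf(r->last_include, sizeof(r->last_include), "%s", path);
	return 0;
}
```

Because the include callback runs in file order, between the entries that
precede and follow the include line, a caller that builds a hashtable would
still see overrides in the right order.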


* Re: [PATCH 1/2] config: return positive from git_config_parse_key()
  2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
  2023-07-20 23:44   ` Jonathan Tan
@ 2023-07-21  4:32   ` Junio C Hamano
  2023-07-21 16:12     ` Glen Choo
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2023-07-21  4:32 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget; +Cc: git, Jonathan Tan, Calvin Wan, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
>
> a) There's no good reason why those callers couldn't negate the value
>    themselves.

That is not a good reason to break from a widely adopted convention
in UNIXy library functions to signal success with 0 and failure with
negative values.  Callers that want positive values can flip the
polarity themselves, by the way.

>
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).

There is no reason to think that each and every minute difference
the direct callers of a library function may want to notice by
different error return values needs to be propagated to the calling
process via its exit value.  It is perfectly fine and expected for
the status values of the entire process to be more coarse grained
than those of individual library calls.  The latter may convey not
just "I failed" but "why I failed" to their callers, while the former
may not want to say "I made a call to some library functions and got
this error code", let alone "I called library function X and got
error code Y".

In other words, if your program does

	err = library_call_about_some_filesystem_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_database_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_parsing();
	if (err)
		exit(err); /* or exit(-err) */

there is something wrong.  The error codes from these different
library functions share the same "integer" namespace without being
segregated, and expecting the calling process to be able to tell
what error we discovered by relaying the literal translation of low
level library error code would not work.  The exit() codes would
need to be wider (i.e. not limited only to the possible errors from
a single library function) and would be coarser (i.e. a filesystem
operation may say "open failed due to permission error" or "open
failed because there was no such file"; at the end-user level, it
may be more appropriate to say "configuration file could not be
read", regardless of the reason why the filesystem operation
failed).

	err = library_call_about_some_filesystem_operation();
	if (err) {
		error("cannot open the filesystem entity");
		exit(ERR_FILESYSTEM);
	}
	err = library_call_about_some_database_operation();
	if (err)
		exit(ERR_DATABASE);
	err = library_call_about_some_parsing();
	if (err)
		exit(ERR_PARSING);

So, this is not a good reason, either.

> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

Quite the contrary, if it were a purely internal convention, we may
not care too much, as we have only ourselves to confuse by picking
an unusual convention, but if we are making more parts of our code
available in a library-ish interface, I would expect they follow
some convention, and that convention would be "0 for success,
negative for failure".
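
To illustrate that convention, here is a minimal sketch. Every name in
it (example_check_key, example_exit_status, the EXAMPLE_ERR_* codes) is
hypothetical and is not taken from config.c. The library-style function
returns 0 or a negated code, and the caller that wants a positive value
flips the polarity itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes, defined as positive constants. */
#define EXAMPLE_ERR_EMPTY_KEY  1
#define EXAMPLE_ERR_NO_SECTION 2

/*
 * Hypothetical library-style function: 0 on success, a negated
 * error code on failure.
 */
static int example_check_key(const char *key)
{
	if (!key || !*key)
		return -EXAMPLE_ERR_EMPTY_KEY;
	if (!strchr(key, '.'))
		return -EXAMPLE_ERR_NO_SECTION;
	return 0;
}

/*
 * Hypothetical caller that wants a positive value (e.g. for an exit
 * status): it flips the sign itself instead of asking the library to
 * return positive numbers.
 */
static int example_exit_status(const char *key)
{
	int err = example_check_key(key);

	return err < 0 ? -err : 0;
}
```

A caller that only cares about success can keep writing
"if (example_check_key(key) < 0)", which is the usual shape for this
convention.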

Thanks.


* Re: [PATCH 2/2] config-parse: split library out of config.[c|h]
  2023-07-21  0:31   ` Jonathan Tan
@ 2023-07-21 15:55     ` Glen Choo
  0 siblings, 0 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-21 15:55 UTC (permalink / raw)
  To: Jonathan Tan, Glen Choo via GitGitGadget; +Cc: Jonathan Tan, git, Calvin Wan

Jonathan Tan <jonathantanmy@google.com> writes:

>> Begin this process by splitting the config parsing code out of
>> config.[c|h] and into config-parse.[c|h].
>
> I think we need to be more careful in how we split. It would be easier
> if there is a concrete use case, but some preliminary questions:
>
>  - "respect_includes" is in the library, but callers might want to opt
>    out of it or provide an alternative way to resolve includes.

Makes sense. Or alternatively, we could choose not to support
"respect_includes" initially, and exclude it to avoid confusion.

>  - There is a lot of error reporting capability with respect to the
>    source of config, and not all sources are applicable to library
>    users. How should we proceed? E.g. should we expect that all library
>    users conform to the list here (e.g. even if the source is something
>    like but not exactly STDIN, they should pick it), or allow users to
>    customize sources?

Good point. I would also prefer to have the list of sources constrained
to the list of sources available via the library. Some possibilities I
can see are:

1. Move the Git-program-specific error message reporting to a level above
   the library (i.e. config.c).

2. Proceed as-is (with the additional sources in the library) and leave a
   FIXME to address this when we find a Git library-idiomatic way to
   handle errors. This won't be the last time we'll have to untangle
   Git-program-specific error reporting from the library - it might be
   useful to try to figure out all of that in one fell swoop.

3. Figure out library-idiomatic error handling mentioned in 2. right
   now.

I think 1. is the best option, but if that fails, 2. is also reasonable.
3. is too difficult to do with a sample size of 1.

> In the absence of more information, the split I would envision is
> either something that can only parse a buffer, its error messages being
> very generic (the caller should turn them into something more specific
> before showing them to the user) (but one problem here is that we must
> postprocess includes, which might be a problem if the output of parsing
> is a flat hashtable, since we wouldn't know which keys are overridden
> by the includes and which are not);

Hm, how does the include mechanism here differ from what's in this
patch? This also only parses a single file and ignores includes. I'm not
sure why this requires us to postprocess includes - in config.c,
includes are handled by 'pausing' parsing of the current source,
evaluating the included files, then 'resuming' parsing.

> or something that can take in a
> callback that is invoked whenever something is included and maybe also
> a callback for access to the object database and that has full knowledge
> of all sources for error reporting (or allows the caller to customize
> the sources).

Ah, I like this callback idea quite a lot. This lets config-parse.c
easily support unconditional includes and provides entry points for
program-specific behavior (like checking the odb). I will try this.
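
A rough sketch of what such a callback interface could look like
follows. This is a toy under stated assumptions: it parses flat
"key=value" lines rather than real git-config syntax, and toy_parse,
toy_value_fn and toy_include_fn are made-up names, not the
config-parse.h API. The point is only that the parser hands an include
to the caller instead of resolving it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types; not the actual config-parse.h API. */
typedef int (*toy_value_fn)(const char *key, const char *value, void *data);
typedef int (*toy_include_fn)(const char *path, void *data);

/*
 * Toy parser for "key=value" lines. A line whose key is "include.path"
 * is handed to include_fn instead of value_fn, so the caller decides
 * how (or whether) to resolve it. Returns 0 on success, -1 on failure.
 */
static int toy_parse(const char *buf, toy_value_fn value_fn,
		     toy_include_fn include_fn, void *data)
{
	char line[256];

	while (*buf) {
		size_t len = strcspn(buf, "\n");
		char *eq;

		if (len >= sizeof(line))
			return -1; /* line too long for this sketch */
		memcpy(line, buf, len);
		line[len] = '\0';
		buf += len + (buf[len] == '\n');

		if (!len)
			continue; /* skip blank lines */
		eq = strchr(line, '=');
		if (!eq)
			return -1; /* malformed line */
		*eq = '\0';

		if (!strcmp(line, "include.path")) {
			/* A NULL include_fn means includes are ignored. */
			if (include_fn && include_fn(eq + 1, data) < 0)
				return -1;
		} else if (value_fn(line, eq + 1, data) < 0) {
			return -1;
		}
	}
	return 0;
}

/* Example callbacks that only count what the parser reports. */
struct toy_counts {
	int values;
	int includes;
};

static int count_value(const char *key, const char *value, void *data)
{
	(void)key;
	(void)value;
	((struct toy_counts *)data)->values++;
	return 0;
}

static int count_include(const char *path, void *data)
{
	(void)path;
	((struct toy_counts *)data)->includes++;
	return 0;
}
```

Passing NULL for include_fn gives a parser that ignores includes, while
a program-specific callback could open the included file and call
toy_parse() again, which is roughly the 'pause and resume' behavior
described above.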


* Re: [PATCH 1/2] config: return positive from git_config_parse_key()
  2023-07-21  4:32   ` Junio C Hamano
@ 2023-07-21 16:12     ` Glen Choo
  2023-07-21 16:36       ` Junio C Hamano
  0 siblings, 1 reply; 49+ messages in thread
From: Glen Choo @ 2023-07-21 16:12 UTC (permalink / raw)
  To: Junio C Hamano, Glen Choo via GitGitGadget; +Cc: git, Jonathan Tan, Calvin Wan

Junio C Hamano <gitster@pobox.com> writes:

>> From: Glen Choo <chooglen@google.com>
>>
>> git_config_parse_key() returns #define-d error codes, but negated. This
>> negation is merely a convenience to other parts of config.c that don't
>> bother inspecting the return value before passing it along. But:
>>
>> a) There's no good reason why those callers couldn't negate the value
>>    themselves.
>
> That is not a good reason to break from a widely adopted convention
> in UNIXy library functions to signal success with 0 and failure with
> negative values.  The callers, if they want to have positive values,
> can flip the polarity themselves, by the way.

Oh, interesting. I was trying to follow the conventions of the
surrounding config.c code and many other parts of the codebase, which
return positive values. Why do we choose to return positive values
throughout the codebase, by the way? Is it because they were really
intended for exit(3), and not to be used as a library?

>>
>> b) In other callers, this value eventually gets fed to exit(3), and
>>    those callers need to sanitize the negative value (and they sometimes
>>    do so lossily, by overriding the return value with
>>    CONFIG_INVALID_KEY).
>
> There is no reason to think that each and every minute difference
> the direct callers of a library function may want to notice by
> different error return values needs to be propagated to the calling
> process via its exit value.

Yes, I fully agree. I didn't intend that to be a statement on how things
should be, but rather how they already are. The oddities in this case
are:

- No callers actually care about the sign of git_config_parse_key()
  since it can only return values of one sign. Only
  configset_find_element() benefits from the sign being negative (since
  it returns negative on config key errors), but instead of putting the
  burden on the function it depends on, it could just return the
  negative value itself. And this "burden" is real, in that other
  callers have to worry about the negative value.

- For better or worse, we've already wired git_config_parse_key()
  directly to the exit values, e.g. if one peeks into
  builtin/config.c:cmd_config(), we'll see "return
  git_config_set_multivar_in_file_gently()", which in turn may return
  the value from git_config_parse_key(). (And as a result, we also try
  hard to separate the error values returnable by git_config_parse_key()
  vs the others returnable by git_config_set_multivar_in_file_gently().)

  I would strongly prefer if builtin/config.c took more responsibility
  for noticing config.c errors and sanitizing them accordingly, but it
  seemed like too much churn for this series. If you think it is the
  right time for it, though (which I think it might be), I could try to
  make that change.

>> c) We want to move that into a separate library, and returning only
>>    negative values no longer makes as much sense.
>
> Quite the contrary, if it were a purely internal convention, we may
> not care too much, as we have only ourselves to confuse by picking
> an unusual convention, but if we are making more parts of our code
> available in a library-ish interface, I would expect they follow
> some convention, and that convention would be "0 for success,
> negative for failure".

Right. I do not care what the convention is, only that we pick one. I
chose the one that I saw in the surrounding code (positive), but I'm
happy to go with UNIXy (negative) if others think it is worth the churn.


* Re: [PATCH 1/2] config: return positive from git_config_parse_key()
  2023-07-21 16:12     ` Glen Choo
@ 2023-07-21 16:36       ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-07-21 16:36 UTC (permalink / raw)
  To: Glen Choo; +Cc: Glen Choo via GitGitGadget, git, Jonathan Tan, Calvin Wan

Glen Choo <chooglen@google.com> writes:

> Oh, interesting. I was trying to follow the conventions of the
> surrounding config.c code and many other parts of the codebase, which
> return positive values. Why do we choose to return positive values
> throughout the codebase, by the way? Is it because they were really
> intended for exit(3), and not to be used as a library?

If config.c does that, I'd say that was a poorly designed oddball.
Looking at read-cache.c (which is an older part of the codebase
written back when the developer base was smaller) may give you
better examples to follow.  After all, error() returns negative
exactly because we want to follow the usual "negative is an error"
convention.
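
A minimal sketch of why a negative-returning error helper is
convenient: report_error() below is a hypothetical stand-in written for
this illustration (it is not Git's error() implementation), and
example_parse_percent() is likewise made up. Because the helper returns
-1, a failing function can report and propagate in one statement.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for an error()-style helper: print the
 * message to stderr and return -1.
 */
static int report_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/*
 * Hypothetical library-style function: 0 on success, negative on
 * failure. "return report_error(...)" both reports and propagates.
 */
static int example_parse_percent(const char *s, int *out)
{
	int v;

	if (sscanf(s, "%d", &v) != 1)
		return report_error("not a number: '%s'", s);
	if (v < 0 || v > 100)
		return report_error("out of range: %d", v);
	*out = v;
	return 0;
}
```

Callers then only need "if (example_parse_percent(s, &v) < 0)" to
detect failure, whatever the specific cause was.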


* [RFC PATCH v1.5 0/5] config-parse: create config parsing library
  2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
  2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
  2023-07-20 22:17 ` [PATCH 2/2] config-parse: split library out of config.[c|h] Glen Choo via GitGitGadget
@ 2023-07-31 23:46 ` Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 1/5] config: return positive from git_config_parse_key() Glen Choo
                     ` (4 more replies)
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
  4 siblings, 5 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

I'll be leaving Google soon, so I won't be pushing this series through to
completion :/ As such, here's an rfc-quality 'reroll' to demonstrate how I
intended to respond to the comments on v1 2/2 that discussed where the library
boundary should be [1]. In particular:

> - "respect_includes" is in the library, but callers might want to opt
>   out of it or provide an alternative way to resolve includes.

I've punted on this altogether by removing "respect_includes" from the options
accepted by the library functions. As 2/5 describes, config_options is quite
bloated, so this split already makes sense even without config-parse.

Initially, I considered turning the "includes" machinery into a first-class
citizen in config-parse instead of having it implemented as a separate config
callback, but it turned out to be much more complicated than it first appeared.
The biggest challenge is that as "hasconfig:remote.*url" performs the additional
pass over the config files, it performs different actions - from the starting
file, it collects remote urls, then in files that could be imported via
"hasconfig:remote.*url", it forbids remote urls. This 'action switching' didn't
translate well into the lower level machinery, so I opted to leave this for
later.

> - There is a lot of error reporting capability with respect to the
>   source of config, and not all sources are applicable to library
>   users. How should we proceed? E.g. should we expect that all library
>   users conform to the list here (e.g. even if the source is something
>   like but not exactly STDIN, they should pick it), or allow users to
>   customize sources?

I've opted to move the error reporting to a callback function (3/5), so in-tree
callers will use the existing logic, but other callers can provide their own
error handling. I've reused the config 'event listeners' for this purpose since
they seemed pretty well-suited to the task - they already respond to errors, and
the in-tree error handling doesn't do anything that an event listener can't
express.

As a result, the library now excludes functions that use the in-tree error
handling; we either teach the function to accept event listeners or exclude it
altogether.
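
The general shape of this approach can be sketched as follows. This is
a simplified illustration, not the code in 3/5: the toy_* names are
hypothetical, the input is flat "key=value" lines rather than real
config syntax, and only the field names event_fn and event_fn_data are
borrowed from config_parse_options. The parser itself never prints or
dies; it notifies the listener and returns -1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event types for this sketch. */
enum toy_event {
	TOY_EVENT_ENTRY,
	TOY_EVENT_ERROR
};

typedef int (*toy_event_fn)(enum toy_event type, int linenr, void *data);

/* Hypothetical options struct; only the field names mirror the series. */
struct toy_parse_options {
	toy_event_fn event_fn;
	void *event_fn_data;
};

/*
 * Toy parser: every non-blank line must contain '='. On a bad line it
 * notifies the listener (if any) and returns -1; it never reports the
 * error itself. Returns 0 when all lines are well formed.
 */
static int toy_parse_lines(const char *buf,
			   const struct toy_parse_options *opts)
{
	int linenr = 0;

	while (*buf) {
		size_t len = strcspn(buf, "\n");

		linenr++;
		if (len) {
			int ok = memchr(buf, '=', len) != NULL;

			if (opts && opts->event_fn)
				opts->event_fn(ok ? TOY_EVENT_ENTRY :
						    TOY_EVENT_ERROR,
					       linenr, opts->event_fn_data);
			if (!ok)
				return -1;
		}
		buf += len + (buf[len] == '\n');
	}
	return 0;
}

/* Example listener: remember the line number of the first error. */
static int remember_error_line(enum toy_event type, int linenr, void *data)
{
	int *first_bad = data;

	if (type == TOY_EVENT_ERROR && !*first_bad)
		*first_bad = linenr;
	return 0;
}
```

An in-tree caller could install a listener that formats a message and
calls die(), while an external caller could install one like
remember_error_line() and decide later how to surface the failure.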

[1] https://lore.kernel.org/git/20230721003107.3095493-1-jonathantanmy@google.com

Glen Choo (5):
  config: return positive from git_config_parse_key()
  config: split out config_parse_options
  config: report config parse errors using cb
  config.c: accept config_parse_options in git_config_from_stdin
  config-parse: split library out of config.[c|h]

 Makefile           |   1 +
 builtin/config.c   |   7 +-
 bundle-uri.c       |   4 +-
 config-parse.c     | 561 ++++++++++++++++++++++++++++++++++++++
 config-parse.h     | 155 +++++++++++
 config.c           | 660 ++++-----------------------------------------
 config.h           | 134 +--------
 fsck.c             |   4 +-
 submodule-config.c |  13 +-
 t/t1300-config.sh  |  16 ++
 10 files changed, 816 insertions(+), 739 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

Range-diff:
1:  9924481630 = 1:  9924481630 config: return positive from git_config_parse_key()
-:  ---------- > 2:  8807be8319 config: split out config_parse_options
-:  ---------- > 3:  3d4a2357f3 config: report config parse errors using cb
-:  ---------- > 4:  ff03ee1de7 config.c: accept config_parse_options in git_config_from_stdin
2:  4461d2163e ! 5:  377acbfbb5 config-parse: split library out of config.[c|h]
    @@ config-parse.c (new)
     +struct parse_event_data {
     +	enum config_event_t previous_type;
     +	size_t previous_offset;
    -+	const struct config_options *opts;
    ++	const struct config_parse_options *opts;
     +};
     +
     +static int do_event(struct config_source *cs, enum config_event_t type,
    @@ config-parse.c (new)
     +
     +static int git_parse_source(struct config_source *cs, config_fn_t fn,
     +			    struct key_value_info *kvi, void *data,
    -+			    const struct config_options *opts)
    ++			    const struct config_parse_options *opts)
     +{
     +	int comment = 0;
     +	size_t baselen = 0;
     +	struct strbuf *var = &cs->var;
    -+	int error_return = 0;
    -+	char *error_msg = NULL;
     +
     +	/* U+FEFF Byte Order Mark in UTF8 */
     +	const char *bomptr = utf8_bom;
    @@ config-parse.c (new)
     +		if (get_value(cs, kvi, fn, data, var) < 0)
     +			break;
     +	}
    -+
    -+	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
    -+		return -1;
    -+
    -+	switch (cs->origin_type) {
    -+	case CONFIG_ORIGIN_BLOB:
    -+		error_msg = xstrfmt(_("bad config line %d in blob %s"),
    -+				      cs->linenr, cs->name);
    -+		break;
    -+	case CONFIG_ORIGIN_FILE:
    -+		error_msg = xstrfmt(_("bad config line %d in file %s"),
    -+				      cs->linenr, cs->name);
    -+		break;
    -+	case CONFIG_ORIGIN_STDIN:
    -+		error_msg = xstrfmt(_("bad config line %d in standard input"),
    -+				      cs->linenr);
    -+		break;
    -+	case CONFIG_ORIGIN_SUBMODULE_BLOB:
    -+		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
    -+				       cs->linenr, cs->name);
    -+		break;
    -+	case CONFIG_ORIGIN_CMDLINE:
    -+		error_msg = xstrfmt(_("bad config line %d in command line %s"),
    -+				       cs->linenr, cs->name);
    -+		break;
    -+	default:
    -+		error_msg = xstrfmt(_("bad config line %d in %s"),
    -+				      cs->linenr, cs->name);
    -+	}
    -+
    -+	switch (opts && opts->error_action ?
    -+		opts->error_action :
    -+		cs->default_error_action) {
    -+	case CONFIG_ERROR_DIE:
    -+		die("%s", error_msg);
    -+		break;
    -+	case CONFIG_ERROR_ERROR:
    -+		error_return = error("%s", error_msg);
    -+		break;
    -+	case CONFIG_ERROR_SILENT:
    -+		error_return = -1;
    -+		break;
    -+	case CONFIG_ERROR_UNSET:
    -+		BUG("config error action unset");
    -+	}
    -+
    -+	free(error_msg);
    -+	return error_return;
    ++	/*
    ++	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    ++	 * in order for our callback to receive the error event, we have to call
    ++	 * do_event twice
    ++	 */
    ++	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++	return -1;
     +}
     +
     +/*
    @@ config-parse.c (new)
     + */
     +static int do_config_from(struct config_source *top, config_fn_t fn,
     +			  void *data, enum config_scope scope,
    -+			  const struct config_options *opts)
    ++			  const struct config_parse_options *opts)
     +{
     +	struct key_value_info kvi = KVI_INIT;
     +	int ret;
    @@ config-parse.c (new)
     +			       const enum config_origin_type origin_type,
     +			       const char *name, const char *path, FILE *f,
     +			       void *data, enum config_scope scope,
    -+			       const struct config_options *opts)
    ++			       const struct config_parse_options *opts)
     +{
     +	struct config_source top = CONFIG_SOURCE_INIT;
     +	int ret;
    @@ config-parse.c (new)
     +	top.origin_type = origin_type;
     +	top.name = name;
     +	top.path = path;
    -+	top.default_error_action = CONFIG_ERROR_DIE;
     +	top.do_fgetc = config_file_fgetc;
     +	top.do_ungetc = config_file_ungetc;
     +	top.do_ftell = config_file_ftell;
    @@ config-parse.c (new)
     +	return ret;
     +}
     +
    -+int git_config_from_stdin(config_fn_t fn, void *data,
    -+			  enum config_scope scope)
    ++int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
    ++			  const struct config_parse_options *config_opts)
     +{
     +	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
    -+				   data, scope, NULL);
    ++				   data, scope, config_opts);
     +}
     +
     +int git_config_from_file_with_options(config_fn_t fn, const char *filename,
     +				      void *data, enum config_scope scope,
    -+				      const struct config_options *opts)
    ++				      const struct config_parse_options *opts)
     +{
     +	int ret = -1;
     +	FILE *f;
    @@ config-parse.c (new)
     +	return ret;
     +}
     +
    -+int git_config_from_file(config_fn_t fn, const char *filename, void *data)
    -+{
    -+	return git_config_from_file_with_options(fn, filename, data,
    -+						 CONFIG_SCOPE_UNKNOWN, NULL);
    -+}
    -+
     +int git_config_from_mem(config_fn_t fn,
     +			const enum config_origin_type origin_type,
     +			const char *name, const char *buf, size_t len,
     +			void *data, enum config_scope scope,
    -+			const struct config_options *opts)
    ++			const struct config_parse_options *opts)
     +{
     +	struct config_source top = CONFIG_SOURCE_INIT;
     +
    @@ config-parse.c (new)
     +	top.origin_type = origin_type;
     +	top.name = name;
     +	top.path = NULL;
    -+	top.default_error_action = CONFIG_ERROR_ERROR;
     +	top.do_fgetc = config_buf_fgetc;
     +	top.do_ungetc = config_buf_ungetc;
     +	top.do_ftell = config_buf_ftell;
    @@ config-parse.h (new)
     +/*
     + * Low level config parsing.
     + */
    -+#ifndef CONFIG_LIB_H
    -+#define CONFIG_LIB_H
    ++#ifndef CONFIG_PARSE_H
    ++#define CONFIG_PARSE_H
     +
     +#include "strbuf.h"
     +
    @@ config-parse.h (new)
     +					struct config_source *cs,
     +					void *event_fn_data);
     +
    -+struct config_options {
    -+	unsigned int respect_includes : 1;
    -+	unsigned int ignore_repo : 1;
    -+	unsigned int ignore_worktree : 1;
    -+	unsigned int ignore_cmdline : 1;
    -+	unsigned int system_gently : 1;
    -+
    -+	/*
    -+	 * For internal use. Include all includeif.hasremoteurl paths without
    -+	 * checking if the repo has that remote URL, and when doing so, verify
    -+	 * that files included in this way do not configure any remote URLs
    -+	 * themselves.
    -+	 */
    -+	unsigned int unconditional_remote_url : 1;
    -+
    -+	const char *commondir;
    -+	const char *git_dir;
    ++struct config_parse_options {
     +	/*
     +	 * event_fn and event_fn_data are for internal use only. Handles events
     +	 * emitted by the config parser.
     +	 */
     +	config_parser_event_fn_t event_fn;
     +	void *event_fn_data;
    -+	enum config_error_action {
    -+		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
    -+		CONFIG_ERROR_DIE, /* die() on error */
    -+		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
    -+		CONFIG_ERROR_SILENT, /* return -1 */
    -+	} error_action;
     +};
     +
     +struct config_source {
    @@ config-parse.h (new)
     +	enum config_origin_type origin_type;
     +	const char *name;
     +	const char *path;
    -+	enum config_error_action default_error_action;
     +	int linenr;
     +	int eof;
     +	size_t total_len;
    @@ config-parse.h (new)
     +typedef int (*config_fn_t)(const char *, const char *,
     +			   const struct config_context *, void *);
     +
    -+/**
    -+ * Read a specific file in git-config format.
    -+ */
    -+int git_config_from_file(config_fn_t fn, const char *, void *);
    -+
     +int git_config_from_file_with_options(config_fn_t fn, const char *,
     +				      void *, enum config_scope,
    -+				      const struct config_options *);
    ++				      const struct config_parse_options *);
     +
     +int git_config_from_mem(config_fn_t fn,
     +			const enum config_origin_type,
     +			const char *name,
     +			const char *buf, size_t len,
     +			void *data, enum config_scope scope,
    -+			const struct config_options *opts);
    ++			const struct config_parse_options *opts);
     +
    -+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope);
    ++int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
    ++			  const struct config_parse_options *config_opts);
     +
    -+#endif /* CONFIG_LIB_H */
    ++#endif /* CONFIG_PARSE_H */
     
      ## config.c ##
     @@
    @@ config.c
     -	enum config_origin_type origin_type;
     -	const char *name;
     -	const char *path;
    --	enum config_error_action default_error_action;
     -	int linenr;
     -	int eof;
     -	size_t total_len;
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -struct parse_event_data {
     -	enum config_event_t previous_type;
     -	size_t previous_offset;
    --	const struct config_options *opts;
    +-	const struct config_parse_options *opts;
     -};
     -
     -static int do_event(struct config_source *cs, enum config_event_t type,
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -	out->path = cs->path;
     -}
     -
    + int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
    + 		      size_t end_offset UNUSED, struct config_source *cs,
    + 		      void *data)
    +@@ config.c: int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
    + 	return error_return;
    + }
    + 
     -static int git_parse_source(struct config_source *cs, config_fn_t fn,
     -			    struct key_value_info *kvi, void *data,
    --			    const struct config_options *opts)
    +-			    const struct config_parse_options *opts)
     -{
     -	int comment = 0;
     -	size_t baselen = 0;
     -	struct strbuf *var = &cs->var;
    --	int error_return = 0;
    --	char *error_msg = NULL;
     -
     -	/* U+FEFF Byte Order Mark in UTF8 */
     -	const char *bomptr = utf8_bom;
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -			break;
     -	}
     -
    --	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
    --		return -1;
    --
    --	switch (cs->origin_type) {
    --	case CONFIG_ORIGIN_BLOB:
    --		error_msg = xstrfmt(_("bad config line %d in blob %s"),
    --				      cs->linenr, cs->name);
    --		break;
    --	case CONFIG_ORIGIN_FILE:
    --		error_msg = xstrfmt(_("bad config line %d in file %s"),
    --				      cs->linenr, cs->name);
    --		break;
    --	case CONFIG_ORIGIN_STDIN:
    --		error_msg = xstrfmt(_("bad config line %d in standard input"),
    --				      cs->linenr);
    --		break;
    --	case CONFIG_ORIGIN_SUBMODULE_BLOB:
    --		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
    --				       cs->linenr, cs->name);
    --		break;
    --	case CONFIG_ORIGIN_CMDLINE:
    --		error_msg = xstrfmt(_("bad config line %d in command line %s"),
    --				       cs->linenr, cs->name);
    --		break;
    --	default:
    --		error_msg = xstrfmt(_("bad config line %d in %s"),
    --				      cs->linenr, cs->name);
    --	}
    --
    --	switch (opts && opts->error_action ?
    --		opts->error_action :
    --		cs->default_error_action) {
    --	case CONFIG_ERROR_DIE:
    --		die("%s", error_msg);
    --		break;
    --	case CONFIG_ERROR_ERROR:
    --		error_return = error("%s", error_msg);
    --		break;
    --	case CONFIG_ERROR_SILENT:
    --		error_return = -1;
    --		break;
    --	case CONFIG_ERROR_UNSET:
    --		BUG("config error action unset");
    --	}
    --
    --	free(error_msg);
    --	return error_return;
    +-	/*
    +-	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    +-	 * in order for our callback to receive the error event, we have to call
    +-	 * do_event twice
    +-	 */
    +-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    +-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    +-	return -1;
     -}
     -
      static uintmax_t get_unit_factor(const char *end)
    @@ config.c: int git_default_config(const char *var, const char *value,
     - */
     -static int do_config_from(struct config_source *top, config_fn_t fn,
     -			  void *data, enum config_scope scope,
    --			  const struct config_options *opts)
    +-			  const struct config_parse_options *opts)
     -{
     -	struct key_value_info kvi = KVI_INIT;
     -	int ret;
    @@ config.c: int git_default_config(const char *var, const char *value,
     -			       const enum config_origin_type origin_type,
     -			       const char *name, const char *path, FILE *f,
     -			       void *data, enum config_scope scope,
    --			       const struct config_options *opts)
    +-			       const struct config_parse_options *opts)
     -{
     -	struct config_source top = CONFIG_SOURCE_INIT;
     -	int ret;
    @@ config.c: int git_default_config(const char *var, const char *value,
     -	top.origin_type = origin_type;
     -	top.name = name;
     -	top.path = path;
    --	top.default_error_action = CONFIG_ERROR_DIE;
     -	top.do_fgetc = config_file_fgetc;
     -	top.do_ungetc = config_file_ungetc;
     -	top.do_ftell = config_file_ftell;
    @@ config.c: int git_default_config(const char *var, const char *value,
     -}
     -
     -static int git_config_from_stdin(config_fn_t fn, void *data,
    --				 enum config_scope scope)
    +-				 enum config_scope scope,
    +-				 const struct config_parse_options *config_opts)
     -{
     -	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
    --				   data, scope, NULL);
    +-				   data, scope, config_opts);
     -}
     -
     -int git_config_from_file_with_options(config_fn_t fn, const char *filename,
     -				      void *data, enum config_scope scope,
    --				      const struct config_options *opts)
    +-				      const struct config_parse_options *opts)
     -{
     -	int ret = -1;
     -	FILE *f;
    @@ config.c: int git_default_config(const char *var, const char *value,
     -	return ret;
     -}
     -
    --int git_config_from_file(config_fn_t fn, const char *filename, void *data)
    --{
    --	return git_config_from_file_with_options(fn, filename, data,
    --						 CONFIG_SCOPE_UNKNOWN, NULL);
    --}
    --
    + int git_config_from_file(config_fn_t fn, const char *filename, void *data)
    + {
    + 	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
    +@@ config.c: int git_config_from_file(config_fn_t fn, const char *filename, void *data)
    + 						 CONFIG_SCOPE_UNKNOWN, &config_opts);
    + }
    + 
     -int git_config_from_mem(config_fn_t fn,
     -			const enum config_origin_type origin_type,
     -			const char *name, const char *buf, size_t len,
     -			void *data, enum config_scope scope,
    --			const struct config_options *opts)
    +-			const struct config_parse_options *opts)
     -{
     -	struct config_source top = CONFIG_SOURCE_INIT;
     -
    @@ config.c: int git_default_config(const char *var, const char *value,
     -	top.origin_type = origin_type;
     -	top.name = name;
     -	top.path = NULL;
    --	top.default_error_action = CONFIG_ERROR_ERROR;
     -	top.do_fgetc = config_buf_fgetc;
     -	top.do_ungetc = config_buf_ungetc;
     -	top.do_ftell = config_buf_ftell;
    @@ config.h: struct git_config_source {
     -					struct config_source *cs,
     -					void *event_fn_data);
     -
    --struct config_options {
    --	unsigned int respect_includes : 1;
    --	unsigned int ignore_repo : 1;
    --	unsigned int ignore_worktree : 1;
    --	unsigned int ignore_cmdline : 1;
    --	unsigned int system_gently : 1;
    --
    --	/*
    --	 * For internal use. Include all includeif.hasremoteurl paths without
    --	 * checking if the repo has that remote URL, and when doing so, verify
    --	 * that files included in this way do not configure any remote URLs
    --	 * themselves.
    --	 */
    --	unsigned int unconditional_remote_url : 1;
    --
    --	const char *commondir;
    --	const char *git_dir;
    +-struct config_parse_options {
     -	/*
     -	 * event_fn and event_fn_data are for internal use only. Handles events
     -	 * emitted by the config parser.
     -	 */
     -	config_parser_event_fn_t event_fn;
     -	void *event_fn_data;
    --	enum config_error_action {
    --		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
    --		CONFIG_ERROR_DIE, /* die() on error */
    --		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
    --		CONFIG_ERROR_SILENT, /* return -1 */
    --	} error_action;
     -};
    +-
    + #define CP_OPTS_INIT(error_action) { \
    + 	.event_fn = git_config_err_fn, \
    + 	.event_fn_data = (enum config_error_action []){(error_action)}, \
    +@@ config.h: enum config_error_action {
    + int git_config_err_fn(enum config_event_t type, size_t begin_offset,
    + 		      size_t end_offset, struct config_source *cs,
    + 		      void *event_fn_data);
     -
     -/* Config source metadata for a given config key-value pair */
     -struct key_value_info {
    @@ config.h: struct git_config_source {
     -
      int git_default_config(const char *, const char *,
      		       const struct config_context *, void *);
    - 
    --/**
    -- * Read a specific file in git-config format.
    -- * This function takes the same callback and data parameters as `git_config`.
    -- *
    -- * Unlike git_config(), this function does not respect includes.
    -- */
    --int git_config_from_file(config_fn_t fn, const char *, void *);
    +-
    + /**
    +  * Read a specific file in git-config format.
    +  * This function takes the same callback and data parameters as `git_config`.
    +@@ config.h: int git_default_config(const char *, const char *,
    +  * Unlike git_config(), this function does not respect includes.
    +  */
    + int git_config_from_file(config_fn_t fn, const char *, void *);
     -
     -int git_config_from_file_with_options(config_fn_t fn, const char *,
     -				      void *, enum config_scope,
    --				      const struct config_options *);
    +-				      const struct config_parse_options *);
     -int git_config_from_mem(config_fn_t fn,
     -			const enum config_origin_type,
     -			const char *name,
     -			const char *buf, size_t len,
     -			void *data, enum config_scope scope,
    --			const struct config_options *opts);
    +-			const struct config_parse_options *opts);
      int git_config_from_blob_oid(config_fn_t fn, const char *name,
      			     struct repository *repo,
      			     const struct object_id *oid, void *data,

base-commit: aa9166bcc0ba654fc21f198a30647ec087f733ed
-- 
2.41.0.585.gd2178a4bd4-goog



* [RFC PATCH v1.5 1/5] config: return positive from git_config_parse_key()
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
@ 2023-07-31 23:46   ` Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 2/5] config: split out config_parse_options Glen Choo
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

git_config_parse_key() returns #define-d error codes, but negated. This
negation is merely a convenience to other parts of config.c that don't
bother inspecting the return value before passing it along. But:

a) There's no good reason why those callers couldn't negate the value
   themselves.

b) In other callers, this value eventually gets fed to exit(3), and
   those callers need to sanitize the negative value (and they sometimes
   do so lossily, by overriding the return value with
   CONFIG_INVALID_KEY).

c) We want to move this function into a separate library, and returning
   only negative values no longer makes as much sense.

Change git_config_parse_key() to return positive values instead, and
adjust callers accordingly. Callers that sanitize the negative sign for
exit(3) now pass the return value opaquely, fixing a bug where "git
config <key with no section or name>" results in a different exit code
depending on whether we are setting or getting config. Callers that
wanted to pass along a negative value now negate the return value
themselves.
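
The convention described above can be sketched in isolation as follows.
This is a minimal illustration, not the code in config.c:
parse_key_sketch(), exit_code_sketch() and find_element_sketch() are
hypothetical stand-ins with simplified key validation, and only the two
CONFIG_* values are taken from config.h.

```c
#include <ctype.h>
#include <string.h>

/* Values as defined in config.h */
#define CONFIG_INVALID_KEY 1
#define CONFIG_NO_SECTION_OR_NAME 2

/*
 * Hypothetical, simplified stand-in for git_config_parse_key(): it only
 * validates the key and returns 0 or a positive CONFIG_* code.
 */
int parse_key_sketch(const char *key)
{
	const char *last_dot = strrchr(key, '.');
	const char *p;

	if (!last_dot || last_dot == key || !last_dot[1])
		return CONFIG_NO_SECTION_OR_NAME;

	/* Simplification: the variable name must be alphanumeric or '-' */
	for (p = last_dot + 1; *p; p++)
		if (!isalnum((unsigned char)*p) && *p != '-')
			return CONFIG_INVALID_KEY;
	return 0;
}

/* A caller that feeds exit(3) passes the positive code through as-is. */
int exit_code_sketch(const char *key)
{
	return parse_key_sketch(key);
}

/* A caller that wants "negative on error" negates the value itself. */
int find_element_sketch(const char *key)
{
	return -parse_key_sketch(key);
}
```

With this shape, the get and set paths can report the same exit code for
the same bad key, because neither has to guess how to undo a negation.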

Signed-off-by: Glen Choo <chooglen@google.com>
---
 builtin/config.c   |  3 +--
 config.c           | 16 ++++++++--------
 config.h           |  2 +-
 submodule-config.c |  4 ++--
 t/t1300-config.sh  | 16 ++++++++++++++++
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 1c75cbc43d..8a2840f0a8 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
 			goto free_strings;
 		}
 	} else {
-		if (git_config_parse_key(key_, &key, NULL)) {
-			ret = CONFIG_INVALID_KEY;
+		if ((ret = git_config_parse_key(key_, &key, NULL))) {
 			goto free_strings;
 		}
 	}
diff --git a/config.c b/config.c
index 85c5f35132..ca77ca17a4 100644
--- a/config.c
+++ b/config.c
@@ -534,8 +534,9 @@ static inline int iskeychar(int c)
  * Auxiliary function to sanity-check and split the key into the section
  * identifier and variable name.
  *
- * Returns 0 on success, -1 when there is an invalid character in the key and
- * -2 if there is no section name in the key.
+ * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
  *
  * store_key - pointer to char* which will hold a copy of the key with
  *             lowercase section and variable name
@@ -555,12 +556,12 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 	if (last_dot == NULL || last_dot == key) {
 		error(_("key does not contain a section: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	if (!last_dot[1]) {
 		error(_("key does not contain variable name: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	baselen = last_dot - key;
@@ -596,7 +597,7 @@ int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 out_free_ret_1:
 	FREE_AND_NULL(*store_key);
-	return -CONFIG_INVALID_KEY;
+	return CONFIG_INVALID_KEY;
 }
 
 static int config_parse_pair(const char *key, const char *value,
@@ -2346,7 +2347,7 @@ static int configset_find_element(struct config_set *set, const char *key,
 	 * `key` may come from the user, so normalize it before using it
 	 * for querying entries from the hashmap.
 	 */
-	ret = git_config_parse_key(key, &normalized_key, NULL);
+	ret = -git_config_parse_key(key, &normalized_key, NULL);
 	if (ret)
 		return ret;
 
@@ -3334,8 +3335,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 	size_t contents_sz;
 	struct config_store_data store = CONFIG_STORE_INIT;
 
-	/* parse-key returns negative; flip the sign to feed exit(3) */
-	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
+	ret = git_config_parse_key(key, &store.key, &store.baselen);
 	if (ret)
 		goto out_free;
 
diff --git a/config.h b/config.h
index 6332d74904..40966cb682 100644
--- a/config.h
+++ b/config.h
@@ -23,7 +23,7 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these negated: */
+/* git_config_parse_key() returns these: */
 #define CONFIG_INVALID_KEY 1
 #define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
diff --git a/submodule-config.c b/submodule-config.c
index b6908e295f..2aafc7f9cb 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -824,8 +824,8 @@ int print_config_from_gitmodules(struct repository *repo, const char *key)
 	char *store_key;
 
 	ret = git_config_parse_key(key, &store_key, NULL);
-	if (ret < 0)
-		return CONFIG_INVALID_KEY;
+	if (ret)
+		return ret;
 
 	config_from_gitmodules(config_print_callback, repo, store_key);
 
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 387d336c91..3202b0f884 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
 	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
 '
 
+# Exit codes
+test_expect_success '--get with bad key' '
+	# Also exits with 1 if the value is not found
+	test_expect_code 1 git config --get "bad.name\n" 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config --get "bad." 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
+test_expect_success 'set with bad key' '
+	test_expect_code 1 git config "bad.name\n" var 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config "bad." var 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
 test_done
-- 
2.41.0.585.gd2178a4bd4-goog



* [RFC PATCH v1.5 2/5] config: split out config_parse_options
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 1/5] config: return positive from git_config_parse_key() Glen Choo
@ 2023-07-31 23:46   ` Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 3/5] config: report config parse errors using cb Glen Choo
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

"struct config_options" combines two disjoint sets of options: those used
by the config parser (e.g. event listeners) and those used by
config_with_options() (e.g. to handle includes, choose which config
files to parse). Split parser-only options into config_parse_options.
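
The layering this split aims for might look roughly like the sketch
below. All names ending in _sketch are hypothetical and the "parsing" is
a toy; the point is only that the low-level entry point sees parser
options and nothing else, while the high-level one embeds them and
forwards them.

```c
#include <stddef.h>

/* Options the parser itself consumes. */
struct parser_opts_sketch {
	void (*event_fn)(int event, void *data);
	void *event_fn_data;
};

/* Options for the higher-level "which files, which includes" logic. */
struct reader_opts_sketch {
	unsigned int respect_includes : 1;
	unsigned int ignore_repo : 1;
	struct parser_opts_sketch parse_options; /* forwarded to the parser */
};

/* Example listener: counts how many events it has been given. */
void count_events_sketch(int event, void *data)
{
	(void)event;
	++*(int *)data;
}

/* Low-level entry point: sees only parser options. */
int parse_buffer_sketch(const char *buf, const struct parser_opts_sketch *opts)
{
	if (opts && opts->event_fn)
		opts->event_fn(0, opts->event_fn_data);
	return buf ? 0 : -1;
}

/* High-level entry point: decides what to read, then forwards. */
int read_config_sketch(const char *buf, const struct reader_opts_sketch *opts)
{
	if (opts->ignore_repo)
		return 0; /* nothing to parse in this toy model */
	return parse_buffer_sketch(buf, &opts->parse_options);
}
```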

Signed-off-by: Glen Choo <chooglen@google.com>
---
 bundle-uri.c |  2 +-
 config.c     | 14 +++++++-------
 config.h     | 37 ++++++++++++++++++++-----------------
 fsck.c       |  2 +-
 4 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4b5c49b93d..f93ca6a486 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,7 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_options opts = {
+	struct config_parse_options opts = {
 		.error_action = CONFIG_ERROR_ERROR,
 	};
 
diff --git a/config.c b/config.c
index ca77ca17a4..dc6cda03aa 100644
--- a/config.c
+++ b/config.c
@@ -983,7 +983,7 @@ static int get_base_var(struct config_source *cs, struct strbuf *name)
 struct parse_event_data {
 	enum config_event_t previous_type;
 	size_t previous_offset;
-	const struct config_options *opts;
+	const struct config_parse_options *opts;
 };
 
 static int do_event(struct config_source *cs, enum config_event_t type,
@@ -1031,7 +1031,7 @@ static void kvi_from_source(struct config_source *cs,
 
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
-			    const struct config_options *opts)
+			    const struct config_parse_options *opts)
 {
 	int comment = 0;
 	size_t baselen = 0;
@@ -1968,7 +1968,7 @@ int git_default_config(const char *var, const char *value,
  */
 static int do_config_from(struct config_source *top, config_fn_t fn,
 			  void *data, enum config_scope scope,
-			  const struct config_options *opts)
+			  const struct config_parse_options *opts)
 {
 	struct key_value_info kvi = KVI_INIT;
 	int ret;
@@ -1993,7 +1993,7 @@ static int do_config_from_file(config_fn_t fn,
 			       const enum config_origin_type origin_type,
 			       const char *name, const char *path, FILE *f,
 			       void *data, enum config_scope scope,
-			       const struct config_options *opts)
+			       const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 	int ret;
@@ -2022,7 +2022,7 @@ static int git_config_from_stdin(config_fn_t fn, void *data,
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 				      void *data, enum config_scope scope,
-				      const struct config_options *opts)
+				      const struct config_parse_options *opts)
 {
 	int ret = -1;
 	FILE *f;
@@ -2048,7 +2048,7 @@ int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type origin_type,
 			const char *name, const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts)
+			const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 
@@ -3380,7 +3380,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_options opts;
+		struct config_parse_options opts;
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
diff --git a/config.h b/config.h
index 40966cb682..b13586307d 100644
--- a/config.h
+++ b/config.h
@@ -85,6 +85,21 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					struct config_source *cs,
 					void *event_fn_data);
 
+struct config_parse_options {
+	enum config_error_action {
+		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
+		CONFIG_ERROR_DIE, /* die() on error */
+		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+		CONFIG_ERROR_SILENT, /* return -1 */
+	} error_action;
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -92,6 +107,9 @@ struct config_options {
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
 
+	const char *commondir;
+	const char *git_dir;
+	struct config_parse_options parse_options;
 	/*
 	 * For internal use. Include all includeif.hasremoteurl paths without
 	 * checking if the repo has that remote URL, and when doing so, verify
@@ -99,21 +117,6 @@ struct config_options {
 	 * themselves.
 	 */
 	unsigned int unconditional_remote_url : 1;
-
-	const char *commondir;
-	const char *git_dir;
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 };
 
 /* Config source metadata for a given config key-value pair */
@@ -178,13 +181,13 @@ int git_config_from_file(config_fn_t fn, const char *, void *);
 
 int git_config_from_file_with_options(config_fn_t fn, const char *,
 				      void *, enum config_scope,
-				      const struct config_options *);
+				      const struct config_parse_options *);
 int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type,
 			const char *name,
 			const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts);
+			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
diff --git a/fsck.c b/fsck.c
index 3be86616c5..522ee1c18a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,7 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_options config_opts = { 0 };
+		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
-- 
2.41.0.585.gd2178a4bd4-goog



* [RFC PATCH v1.5 3/5] config: report config parse errors using cb
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 1/5] config: return positive from git_config_parse_key() Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 2/5] config: split out config_parse_options Glen Choo
@ 2023-07-31 23:46   ` Glen Choo
  2023-08-04 21:34     ` Jonathan Tan
  2023-07-31 23:46   ` [RFC PATCH v1.5 4/5] config.c: accept config_parse_options in git_config_from_stdin Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 5/5] config-parse: split library out of config.[c|h] Glen Choo
  4 siblings, 1 reply; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

In a subsequent commit, config parsing will become its own library, and
it's likely that the caller will want flexibility in handling errors
(instead of being limited to the error handling we have in-tree).

Move the Git-specific error handling into a config_parser_event_fn_t
that responds to config errors, and make git_parse_source() always
return -1 (careful inspection shows that it was always returning -1
already). This makes CONFIG_ERROR_SILENT obsolete since that is
equivalent to not specifying an error event listener. Also, remove
CONFIG_ERROR_UNSET and the config_source 'default', since all callers
are now expected to specify the error handling they want.
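
As a rough, self-contained illustration of the pattern (not the code in
this patch), the sketch below has a toy parser that always returns -1 on
a bad line and leaves reporting entirely to an optional listener. Every
name ending in _sketch is hypothetical, and "a line without '=' is bad"
is an assumption made only to keep the example small. The initializer
macro mirrors the shape of CP_OPTS_INIT by storing the action in a
compound literal, which lives as long as the enclosing block when used
for a local variable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum event_type_sketch { EVENT_ENTRY_SKETCH, EVENT_ERROR_SKETCH };
enum error_action_sketch { ACTION_DIE_SKETCH, ACTION_ERROR_SKETCH };

typedef int (*event_fn_sketch)(enum event_type_sketch type, int linenr,
			       void *data);

struct parse_options_sketch {
	event_fn_sketch event_fn;
	void *event_fn_data;
};

/* Listener that owns all error reporting; data points at the action. */
int report_error_sketch(enum event_type_sketch type, int linenr, void *data)
{
	enum error_action_sketch *action = data;

	if (type != EVENT_ERROR_SKETCH)
		return 0;
	fprintf(stderr, "bad config line %d\n", linenr);
	if (*action == ACTION_DIE_SKETCH)
		exit(128);
	return -1;
}

#define OPTS_INIT_SKETCH(action) { \
	.event_fn = report_error_sketch, \
	.event_fn_data = (enum error_action_sketch []){(action)}, \
}

/* Toy parser: notifies the listener, if any, then always returns -1. */
int parse_sketch(const char *text, const struct parse_options_sketch *opts)
{
	int linenr = 1;
	const char *line = text;

	while (*line) {
		size_t len = strcspn(line, "\n");

		if (len && !memchr(line, '=', len)) {
			if (opts && opts->event_fn)
				opts->event_fn(EVENT_ERROR_SKETCH, linenr,
					       opts->event_fn_data);
			return -1;
		}
		line += len;
		if (*line == '\n')
			line++;
		linenr++;
	}
	return 0;
}
```

Passing NULL options (or a NULL event_fn) gives the silent behavior, which
is why a dedicated "silent" action is no longer needed in this model.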

Signed-off-by: Glen Choo <chooglen@google.com>
---
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config.c           | 175 ++++++++++++++++++++++++++-------------------
 config.h           |  20 ++++--
 fsck.c             |   4 +-
 submodule-config.c |   9 ++-
 6 files changed, 129 insertions(+), 87 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 8a2840f0a8..de1878b947 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -42,7 +42,9 @@ static int actions, type;
 static char *default_value;
 static int end_nul;
 static int respect_includes_opt = -1;
-static struct config_options config_options;
+static struct config_options config_options = {
+	.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE)
+};
 static int show_origin;
 static int show_scope;
 static int fixed_value;
diff --git a/bundle-uri.c b/bundle-uri.c
index f93ca6a486..856bffdcad 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_parse_options opts = {
-		.error_action = CONFIG_ERROR_ERROR,
-	};
+	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	if (!list->baseURI) {
 		struct strbuf baseURI = STRBUF_INIT;
diff --git a/config.c b/config.c
index dc6cda03aa..b6d0e16240 100644
--- a/config.c
+++ b/config.c
@@ -55,7 +55,6 @@ struct config_source {
 	enum config_origin_type origin_type;
 	const char *name;
 	const char *path;
-	enum config_error_action default_error_action;
 	int linenr;
 	int eof;
 	size_t total_len;
@@ -185,13 +184,15 @@ static int handle_path_include(const struct key_value_info *kvi,
 	}
 
 	if (!access_or_die(path, R_OK, 0)) {
+		struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 		if (++inc->depth > MAX_INCLUDE_DEPTH)
 			die(_(include_depth_advice), MAX_INCLUDE_DEPTH, path,
 			    !kvi ? "<unknown>" :
 			    kvi->filename ? kvi->filename :
 			    "the command line");
 		ret = git_config_from_file_with_options(git_config_include, path, inc,
-							kvi->scope, NULL);
+							kvi->scope, &config_opts);
 		inc->depth--;
 	}
 cleanup:
@@ -339,7 +340,9 @@ static int add_remote_url(const char *var, const char *value,
 
 static void populate_remote_urls(struct config_include_data *inc)
 {
-	struct config_options opts;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts = *inc->opts;
 	opts.unconditional_remote_url = 1;
@@ -1029,6 +1032,56 @@ static void kvi_from_source(struct config_source *cs,
 	out->path = cs->path;
 }
 
+int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
+		      size_t end_offset UNUSED, struct config_source *cs,
+		      void *data)
+{
+	char *error_msg = NULL;
+	int error_return = 0;
+	enum config_error_action *action = data;
+
+	if (type != CONFIG_EVENT_ERROR)
+		return 0;
+
+	switch (cs->origin_type) {
+	case CONFIG_ORIGIN_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in blob %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_FILE:
+		error_msg = xstrfmt(_("bad config line %d in file %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_STDIN:
+		error_msg = xstrfmt(_("bad config line %d in standard input"),
+				      cs->linenr);
+		break;
+	case CONFIG_ORIGIN_SUBMODULE_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
+				       cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_CMDLINE:
+		error_msg = xstrfmt(_("bad config line %d in command line %s"),
+				       cs->linenr, cs->name);
+		break;
+	default:
+		error_msg = xstrfmt(_("bad config line %d in %s"),
+				      cs->linenr, cs->name);
+	}
+
+	switch (*action) {
+	case CONFIG_ERROR_DIE:
+		die("%s", error_msg);
+		break;
+	case CONFIG_ERROR_ERROR:
+		error_return = error("%s", error_msg);
+		break;
+	}
+
+	free(error_msg);
+	return error_return;
+}
+
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
 			    const struct config_parse_options *opts)
@@ -1036,8 +1089,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 	int comment = 0;
 	size_t baselen = 0;
 	struct strbuf *var = &cs->var;
-	int error_return = 0;
-	char *error_msg = NULL;
 
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
@@ -1119,53 +1170,14 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			break;
 	}
 
-	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
-		return -1;
-
-	switch (cs->origin_type) {
-	case CONFIG_ORIGIN_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in blob %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_FILE:
-		error_msg = xstrfmt(_("bad config line %d in file %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_STDIN:
-		error_msg = xstrfmt(_("bad config line %d in standard input"),
-				      cs->linenr);
-		break;
-	case CONFIG_ORIGIN_SUBMODULE_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
-				       cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_CMDLINE:
-		error_msg = xstrfmt(_("bad config line %d in command line %s"),
-				       cs->linenr, cs->name);
-		break;
-	default:
-		error_msg = xstrfmt(_("bad config line %d in %s"),
-				      cs->linenr, cs->name);
-	}
-
-	switch (opts && opts->error_action ?
-		opts->error_action :
-		cs->default_error_action) {
-	case CONFIG_ERROR_DIE:
-		die("%s", error_msg);
-		break;
-	case CONFIG_ERROR_ERROR:
-		error_return = error("%s", error_msg);
-		break;
-	case CONFIG_ERROR_SILENT:
-		error_return = -1;
-		break;
-	case CONFIG_ERROR_UNSET:
-		BUG("config error action unset");
-	}
-
-	free(error_msg);
-	return error_return;
+	/*
+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
+	 * in order for our callback to receive the error event, we have to call
+	 * do_event twice
+	 */
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
 }
 
 static uintmax_t get_unit_factor(const char *end)
@@ -2002,7 +2014,6 @@ static int do_config_from_file(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = path;
-	top.default_error_action = CONFIG_ERROR_DIE;
 	top.do_fgetc = config_file_fgetc;
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
@@ -2016,8 +2027,10 @@ static int do_config_from_file(config_fn_t fn,
 static int git_config_from_stdin(config_fn_t fn, void *data,
 				 enum config_scope scope)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, NULL);
+				   data, scope, &config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2040,8 +2053,10 @@ int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return git_config_from_file_with_options(fn, filename, data,
-						 CONFIG_SCOPE_UNKNOWN, NULL);
+						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
 int git_config_from_mem(config_fn_t fn,
@@ -2058,7 +2073,6 @@ int git_config_from_mem(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = NULL;
-	top.default_error_action = CONFIG_ERROR_ERROR;
 	top.do_fgetc = config_buf_fgetc;
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
@@ -2077,6 +2091,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	char *buf;
 	unsigned long size;
 	int ret;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
@@ -2087,7 +2102,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	}
 
 	ret = git_config_from_mem(fn, CONFIG_ORIGIN_BLOB, name, buf, size,
-				  data, scope, NULL);
+				  data, scope, &config_opts);
 	free(buf);
 
 	return ret;
@@ -2188,29 +2203,32 @@ static int do_git_config_sequence(const struct config_options *opts,
 			   opts->system_gently ? ACCESS_EACCES_OK : 0))
 		ret += git_config_from_file_with_options(fn, system_config,
 							 data, CONFIG_SCOPE_SYSTEM,
-							 NULL);
+							 &opts->parse_options);
 
 	git_global_config(&user_config, &xdg_config);
 
 	if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, xdg_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, user_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_repo && repo_config &&
 	    !access_or_die(repo_config, R_OK, 0))
 		ret += git_config_from_file_with_options(fn, repo_config, data,
-							 CONFIG_SCOPE_LOCAL, NULL);
+							 CONFIG_SCOPE_LOCAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_worktree && worktree_config &&
 	    repo && repo->repository_format_worktree_config &&
 	    !access_or_die(worktree_config, R_OK, 0)) {
 			ret += git_config_from_file_with_options(fn, worktree_config, data,
 								 CONFIG_SCOPE_WORKTREE,
-								 NULL);
+								 &opts->parse_options);
 	}
 
 	if (!opts->ignore_cmdline && git_config_from_parameters(fn, data) < 0)
@@ -2251,7 +2269,7 @@ int config_with_options(config_fn_t fn, void *data,
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-							NULL);
+							&opts->parse_options);
 	} else if (config_source && config_source->blob) {
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 					       data, config_source->scope);
@@ -2289,9 +2307,11 @@ static void configset_iter(struct config_set *set, config_fn_t fn, void *data)
 
 void read_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = {0};
 	struct strbuf commondir = STRBUF_INIT;
 	struct strbuf gitdir = STRBUF_INIT;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 
@@ -2323,7 +2343,9 @@ void read_early_config(config_fn_t cb, void *data)
  */
 void read_very_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = { 0 };
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 	opts.ignore_repo = 1;
@@ -2614,7 +2636,9 @@ int git_configset_get_pathname(struct config_set *set, const char *key, const ch
 /* Functions use to read configuration from a repository */
 static void repo_read_config(struct repository *repo)
 {
-	struct config_options opts = { 0 };
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 	opts.commondir = repo->commondir;
@@ -2761,12 +2785,14 @@ int repo_config_get_pathname(struct repository *repo,
 static void read_protected_config(void)
 {
 	struct config_options opts = {
-		.respect_includes = 1,
-		.ignore_repo = 1,
-		.ignore_worktree = 1,
-		.system_gently = 1,
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
 	};
 
+	opts.respect_includes = 1;
+	opts.ignore_repo = 1;
+	opts.ignore_worktree = 1;
+	opts.system_gently = 1;
+
 	git_configset_init(&protected_config);
 	config_with_options(config_set_callback, &protected_config, NULL,
 			    NULL, &opts);
@@ -2977,6 +3003,7 @@ struct config_store_data {
 		enum config_event_t type;
 		int is_keys_section;
 	} *parsed;
+	enum config_error_action error_action;
 	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
@@ -3044,6 +3071,10 @@ static int store_aux_event(enum config_event_t type, size_t begin, size_t end,
 			store->seen[store->seen_nr] = store->parsed_nr;
 		}
 	}
+	if (type == CONFIG_EVENT_ERROR) {
+		return git_config_err_fn(type, begin, end, cs,
+					 &store->error_action);
+	}
 
 	store->parsed_nr++;
 
@@ -3380,7 +3411,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_parse_options opts;
+		struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
@@ -3407,8 +3438,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
 		store.parsed[0].end = 0;
+		store.error_action = CONFIG_ERROR_DIE;
 
-		memset(&opts, 0, sizeof(opts));
 		opts.event_fn = store_aux_event;
 		opts.event_fn_data = &store;
 
diff --git a/config.h b/config.h
index b13586307d..1aed02cd5d 100644
--- a/config.h
+++ b/config.h
@@ -86,12 +86,6 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					void *event_fn_data);
 
 struct config_parse_options {
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 	/*
 	 * event_fn and event_fn_data are for internal use only. Handles events
 	 * emitted by the config parser.
@@ -100,6 +94,11 @@ struct config_parse_options {
 	void *event_fn_data;
 };
 
+#define CP_OPTS_INIT(error_action) { \
+	.event_fn = git_config_err_fn, \
+	.event_fn_data = (enum config_error_action []){(error_action)}, \
+}
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -119,6 +118,15 @@ struct config_options {
 	unsigned int unconditional_remote_url : 1;
 };
 
+enum config_error_action {
+	CONFIG_ERROR_DIE, /* die() on error */
+	CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+};
+
+int git_config_err_fn(enum config_event_t type, size_t begin_offset,
+		      size_t end_offset, struct config_source *cs,
+		      void *event_fn_data);
+
 /* Config source metadata for a given config key-value pair */
 struct key_value_info {
 	const char *filename;
diff --git a/fsck.c b/fsck.c
index 522ee1c18a..bc0ca11421 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,6 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
@@ -1238,10 +1237,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		data.oid = oid;
 		data.options = options;
 		data.ret = 0;
-		config_opts.error_action = CONFIG_ERROR_SILENT;
 		if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 					".gitmodules", buf, size, &data,
-					CONFIG_SCOPE_UNKNOWN, &config_opts))
+					CONFIG_SCOPE_UNKNOWN, NULL))
 			data.ret |= report(options, oid, OBJ_BLOB,
 					FSCK_MSG_GITMODULES_PARSE,
 					"could not parse gitmodules blob");
diff --git a/submodule-config.c b/submodule-config.c
index 2aafc7f9cb..bcdd6feefa 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -565,6 +565,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	enum object_type type;
 	const struct submodule *submodule = NULL;
 	struct parse_config_parameter parameter;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
+
 
 	/*
 	 * If any parameter except the cache is a NULL pointer just
@@ -608,7 +610,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	parameter.gitmodules_oid = &oid;
 	parameter.overwrite = 0;
 	git_config_from_mem(parse_config, CONFIG_ORIGIN_SUBMODULE_BLOB, rev.buf,
-			    config, config_size, &parameter, CONFIG_SCOPE_UNKNOWN, NULL);
+			    config, config_size, &parameter,
+			    CONFIG_SCOPE_UNKNOWN, &config_opts);
 	strbuf_release(&rev);
 	free(config);
 
@@ -652,7 +655,9 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 		struct git_config_source config_source = {
 			0, .scope = CONFIG_SCOPE_SUBMODULE
 		};
-		const struct config_options opts = { 0 };
+		struct config_options opts = {
+			.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+		};
 		struct object_id oid;
 		char *file;
 		char *oidstr = NULL;
-- 
2.41.0.585.gd2178a4bd4-goog


* [RFC PATCH v1.5 4/5] config.c: accept config_parse_options in git_config_from_stdin
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
                     ` (2 preceding siblings ...)
  2023-07-31 23:46   ` [RFC PATCH v1.5 3/5] config: report config parse errors using cb Glen Choo
@ 2023-07-31 23:46   ` Glen Choo
  2023-07-31 23:46   ` [RFC PATCH v1.5 5/5] config-parse: split library out of config.[c|h] Glen Choo
  4 siblings, 0 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

A later commit will move git_config_from_stdin() to a library, so it
will need to accept event listeners.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 config.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
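
As a rough illustration of what "event listeners" means for a caller,
here is a standalone toy model of how the parser's do_event() hands
events to a listener. It is not Git code: every toy_* name is
hypothetical, and the offsets are made up. It only mirrors the logic
visible in do_event(): the listener is told about the _previous_ event
once its end offset is known, and nothing is reported while the
previous event is EOF (the initial state).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum config_event_t. */
enum toy_event { TOY_SECTION, TOY_ENTRY, TOY_WHITESPACE, TOY_EOF, TOY_ERROR };

typedef int (*toy_event_fn)(enum toy_event type, size_t begin, size_t end,
			    void *data);

/* Hypothetical stand-in for struct config_parse_options. */
struct toy_options {
	toy_event_fn event_fn;
	void *event_fn_data;
};

struct toy_state {
	enum toy_event previous_type;
	size_t previous_offset;
	const struct toy_options *opts;
};

/* Report the previous event, then remember the current one. */
int toy_do_event(struct toy_state *st, enum toy_event type, size_t offset)
{
	if (!st->opts || !st->opts->event_fn)
		return 0;
	/* Consecutive whitespace is coalesced into one event. */
	if (type == TOY_WHITESPACE && st->previous_type == type)
		return 0;
	if (st->previous_type != TOY_EOF &&
	    st->opts->event_fn(st->previous_type, st->previous_offset,
			       offset, st->opts->event_fn_data) < 0)
		return -1;
	st->previous_type = type;
	st->previous_offset = offset;
	return 0;
}

/* Example listener: record the event types it is handed. */
struct toy_log {
	enum toy_event seen[8];
	size_t nr;
};

int toy_record(enum toy_event type, size_t begin, size_t end, void *data)
{
	struct toy_log *log = data;

	assert(log != NULL);
	(void)begin;
	(void)end;
	if (log->nr >= 8)
		return -1;
	log->seen[log->nr++] = type;
	return 0;
}

/* Replay n (type, offset) pairs; return how many events were delivered. */
size_t toy_replay(const enum toy_event *types, const size_t *offsets,
		  size_t n, struct toy_log *log)
{
	struct toy_options opts = { toy_record, log };
	struct toy_state st = { TOY_EOF, 0, &opts };
	size_t i;

	log->nr = 0;
	for (i = 0; i < n; i++)
		if (toy_do_event(&st, types[i], offsets[i]) < 0)
			break;
	return log->nr;
}
```

Replaying SECTION, ENTRY, EOF delivers only SECTION and ENTRY to the
listener. Replaying SECTION, ERROR, ERROR delivers SECTION and then
ERROR, which matches the FIXME in git_parse_source() about calling
do_event() twice with CONFIG_EVENT_ERROR.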

diff --git a/config.c b/config.c
index b6d0e16240..cf0785a258 100644
--- a/config.c
+++ b/config.c
@@ -2025,12 +2025,11 @@ static int do_config_from_file(config_fn_t fn,
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope)
+				 enum config_scope scope,
+				 const struct config_parse_options *config_opts)
 {
-	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
-
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, &config_opts);
+				   data, scope, config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2265,7 +2264,8 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		ret = git_config_from_stdin(fn, data, config_source->scope);
+		ret = git_config_from_stdin(fn, data, config_source->scope,
+					    &opts->parse_options);
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-- 
2.41.0.585.gd2178a4bd4-goog


* [RFC PATCH v1.5 5/5] config-parse: split library out of config.[c|h]
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
                     ` (3 preceding siblings ...)
  2023-07-31 23:46   ` [RFC PATCH v1.5 4/5] config.c: accept config_parse_options in git_config_from_stdin Glen Choo
@ 2023-07-31 23:46   ` Glen Choo
  4 siblings, 0 replies; 49+ messages in thread
From: Glen Choo @ 2023-07-31 23:46 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Calvin Wan

The config parsing machinery (besides "include" directives) is usable by
programs other than Git - it works with any file written in Git config
syntax (IOW it doesn't rely on 'core' Git features like a
repository), and as of the series ending at 6e8e7981eb (config: pass
source to config_parser_event_fn_t, 2023-06-28), it no longer relies on
global state. Thus, we can and should start turning it into a library
other programs can use.

Begin this process by splitting the config parsing code out of
config.[c|h] and into config-parse.[c|h]. Do not change interfaces or
function bodies, but tweak visibility and includes where appropriate,
namely:

- git_config_from_stdin() is now non-static so that it can be seen by
  config.c.

- "struct config_source" is now defined in the .h file so that it can be
  seen by config.c. And as a result, config-parse.h needs to "#include
  strbuf.h".

In theory, this makes it possible for in-tree files to decide whether
they need all of the config functionality or only config parsing,
and bring in the smallest bit of functionality needed. But for now,
there are no in-tree files that can swap "#include config.h" for
"#include config-parse.h". E.g. Bundle URIs would only need config
parsing to parse bundle lists, but bundle-uri.c uses other config.h
functionality like key parsing and reading repo settings.

The resulting library is usable, though using it is unergonomic, e.g.
the caller needs to "#include git-compat-util.h" and other
dependencies, and we don't have an easy way of linking in the required
objects. This isn't the end state we want for our libraries, but at
least we have _some_ library whose usability we can improve in future
series.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 Makefile       |   1 +
 config-parse.c | 561 +++++++++++++++++++++++++++++++++++++++++++++++
 config-parse.h | 155 +++++++++++++
 config.c       | 583 -------------------------------------------------
 config.h       | 119 +---------
 5 files changed, 718 insertions(+), 701 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h
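
For readers who want to experiment with the key normalization that
moves into config-parse.c without linking against libgit.a, here is a
standalone approximation of git_config_parse_key() in plain C. It is a
sketch, not the library: the toy_* names are hypothetical, it prints no
error messages, and it calls abort() on allocation failure instead of
using xmallocz(). The rules follow the function's comment and body:
section and variable name are validated and lowercased, the subsection
is left untouched (only a newline is rejected), and failures return a
positive code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirrors of CONFIG_INVALID_KEY / CONFIG_NO_SECTION_OR_NAME. */
#define TOY_INVALID_KEY 1
#define TOY_NO_SECTION_OR_NAME 2

int toy_iskeychar(int c)
{
	return isalnum(c) || c == '-';
}

/*
 * On success, return 0 and store a malloc'd, normalized copy of key in
 * *out (the caller frees it). On failure, return a positive code and
 * leave *out set to NULL. baselen_ may be NULL.
 */
int toy_parse_key(const char *key, char **out, size_t *baselen_)
{
	const char *last_dot;
	size_t i, baselen;
	int dot = 0;
	char *copy;

	assert(key != NULL && out != NULL);
	*out = NULL;

	/* The last dot separates section[.subsection] from the name. */
	last_dot = strrchr(key, '.');
	if (!last_dot || last_dot == key || !last_dot[1])
		return TOY_NO_SECTION_OR_NAME;

	baselen = (size_t)(last_dot - key);
	if (baselen_)
		*baselen_ = baselen;

	copy = calloc(strlen(key) + 1, 1);
	if (!copy)
		abort();

	for (i = 0; key[i]; i++) {
		unsigned char c = (unsigned char)key[i];

		if (c == '.')
			dot = 1;
		if (!dot || i > baselen) {
			/* Section or variable name: validate, lowercase. */
			if (!toy_iskeychar(c) ||
			    (i == baselen + 1 && !isalpha(c))) {
				free(copy);
				return TOY_INVALID_KEY;
			}
			c = (unsigned char)tolower(c);
		} else if (c == '\n') {
			/* Subsection: kept as-is, but no newlines. */
			free(copy);
			return TOY_INVALID_KEY;
		}
		copy[i] = (char)c;
	}

	*out = copy;
	return 0;
}
```

With this sketch, "Diff.SomeType.TextConv" normalizes to
"diff.SomeType.textconv" with a base length of 13, while "nosection"
and "core." are rejected for lacking a section or a variable name.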

diff --git a/Makefile b/Makefile
index fb541dedc9..67e05bcee5 100644
--- a/Makefile
+++ b/Makefile
@@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += compat/zlib-uncompress2.o
 LIB_OBJS += config.o
+LIB_OBJS += config-parse.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/config-parse.c b/config-parse.c
new file mode 100644
index 0000000000..ff6abc7bd3
--- /dev/null
+++ b/config-parse.c
@@ -0,0 +1,561 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "gettext.h"
+#include "hashmap.h"
+#include "utf8.h"
+#include "config-parse.h"
+
+static int config_file_fgetc(struct config_source *conf)
+{
+	return getc_unlocked(conf->u.file);
+}
+
+static int config_file_ungetc(int c, struct config_source *conf)
+{
+	return ungetc(c, conf->u.file);
+}
+
+static long config_file_ftell(struct config_source *conf)
+{
+	return ftell(conf->u.file);
+}
+
+
+static int config_buf_fgetc(struct config_source *conf)
+{
+	if (conf->u.buf.pos < conf->u.buf.len)
+		return conf->u.buf.buf[conf->u.buf.pos++];
+
+	return EOF;
+}
+
+static int config_buf_ungetc(int c, struct config_source *conf)
+{
+	if (conf->u.buf.pos > 0) {
+		conf->u.buf.pos--;
+		if (conf->u.buf.buf[conf->u.buf.pos] != c)
+			BUG("config_buf can only ungetc the same character");
+		return c;
+	}
+
+	return EOF;
+}
+
+static long config_buf_ftell(struct config_source *conf)
+{
+	return conf->u.buf.pos;
+}
+
+static inline int iskeychar(int c)
+{
+	return isalnum(c) || c == '-';
+}
+
+/*
+ * Auxiliary function to sanity-check and split the key into the section
+ * identifier and variable name.
+ *
+ * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
+ *
+ * store_key - pointer to char* which will hold a copy of the key with
+ *             lowercase section and variable name
+ * baselen - pointer to size_t which will hold the length of the
+ *           section + subsection part, can be NULL
+ */
+int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
+{
+	size_t i, baselen;
+	int dot;
+	const char *last_dot = strrchr(key, '.');
+
+	/*
+	 * Since "key" actually contains the section name and the real
+	 * key name separated by a dot, we have to know where the dot is.
+	 */
+
+	if (last_dot == NULL || last_dot == key) {
+		error(_("key does not contain a section: %s"), key);
+		return CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	if (!last_dot[1]) {
+		error(_("key does not contain variable name: %s"), key);
+		return CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	baselen = last_dot - key;
+	if (baselen_)
+		*baselen_ = baselen;
+
+	/*
+	 * Validate the key and while at it, lower case it for matching.
+	 */
+	*store_key = xmallocz(strlen(key));
+
+	dot = 0;
+	for (i = 0; key[i]; i++) {
+		unsigned char c = key[i];
+		if (c == '.')
+			dot = 1;
+		/* Leave the extended basename untouched.. */
+		if (!dot || i > baselen) {
+			if (!iskeychar(c) ||
+			    (i == baselen + 1 && !isalpha(c))) {
+				error(_("invalid key: %s"), key);
+				goto out_free_ret_1;
+			}
+			c = tolower(c);
+		} else if (c == '\n') {
+			error(_("invalid key (newline): %s"), key);
+			goto out_free_ret_1;
+		}
+		(*store_key)[i] = c;
+	}
+
+	return 0;
+
+out_free_ret_1:
+	FREE_AND_NULL(*store_key);
+	return CONFIG_INVALID_KEY;
+}
+
+static int get_next_char(struct config_source *cs)
+{
+	int c = cs->do_fgetc(cs);
+
+	if (c == '\r') {
+		/* DOS like systems */
+		c = cs->do_fgetc(cs);
+		if (c != '\n') {
+			if (c != EOF)
+				cs->do_ungetc(c, cs);
+			c = '\r';
+		}
+	}
+
+	if (c != EOF && ++cs->total_len > INT_MAX) {
+		/*
+		 * This is an absurdly long config file; refuse to parse
+		 * further in order to protect downstream code from integer
+		 * overflows. Note that we can't return an error specifically,
+		 * but we can mark EOF and put trash in the return value,
+		 * which will trigger a parse error.
+		 */
+		cs->eof = 1;
+		return 0;
+	}
+
+	if (c == '\n')
+		cs->linenr++;
+	if (c == EOF) {
+		cs->eof = 1;
+		cs->linenr++;
+		c = '\n';
+	}
+	return c;
+}
+
+static char *parse_value(struct config_source *cs)
+{
+	int quote = 0, comment = 0, space = 0;
+
+	strbuf_reset(&cs->value);
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n') {
+			if (quote) {
+				cs->linenr--;
+				return NULL;
+			}
+			return cs->value.buf;
+		}
+		if (comment)
+			continue;
+		if (isspace(c) && !quote) {
+			if (cs->value.len)
+				space++;
+			continue;
+		}
+		if (!quote) {
+			if (c == ';' || c == '#') {
+				comment = 1;
+				continue;
+			}
+		}
+		for (; space; space--)
+			strbuf_addch(&cs->value, ' ');
+		if (c == '\\') {
+			c = get_next_char(cs);
+			switch (c) {
+			case '\n':
+				continue;
+			case 't':
+				c = '\t';
+				break;
+			case 'b':
+				c = '\b';
+				break;
+			case 'n':
+				c = '\n';
+				break;
+			/* Some characters escape as themselves */
+			case '\\': case '"':
+				break;
+			/* Reject unknown escape sequences */
+			default:
+				return NULL;
+			}
+			strbuf_addch(&cs->value, c);
+			continue;
+		}
+		if (c == '"') {
+			quote = 1-quote;
+			continue;
+		}
+		strbuf_addch(&cs->value, c);
+	}
+}
+
+static int get_value(struct config_source *cs, struct key_value_info *kvi,
+		     config_fn_t fn, void *data, struct strbuf *name)
+{
+	int c;
+	char *value;
+	int ret;
+	struct config_context ctx = {
+		.kvi = kvi,
+	};
+
+	/* Get the full name */
+	for (;;) {
+		c = get_next_char(cs);
+		if (cs->eof)
+			break;
+		if (!iskeychar(c))
+			break;
+		strbuf_addch(name, tolower(c));
+	}
+
+	while (c == ' ' || c == '\t')
+		c = get_next_char(cs);
+
+	value = NULL;
+	if (c != '\n') {
+		if (c != '=')
+			return -1;
+		value = parse_value(cs);
+		if (!value)
+			return -1;
+	}
+	/*
+	 * We already consumed the \n, but we need linenr to point to
+	 * the line we just parsed during the call to fn to get
+	 * accurate line number in error messages.
+	 */
+	cs->linenr--;
+	kvi->linenr = cs->linenr;
+	ret = fn(name->buf, value, &ctx, data);
+	if (ret >= 0)
+		cs->linenr++;
+	return ret;
+}
+
+static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
+				 int c)
+{
+	cs->subsection_case_sensitive = 0;
+	do {
+		if (c == '\n')
+			goto error_incomplete_line;
+		c = get_next_char(cs);
+	} while (isspace(c));
+
+	/* We require the format to be '[base "extension"]' */
+	if (c != '"')
+		return -1;
+	strbuf_addch(name, '.');
+
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n')
+			goto error_incomplete_line;
+		if (c == '"')
+			break;
+		if (c == '\\') {
+			c = get_next_char(cs);
+			if (c == '\n')
+				goto error_incomplete_line;
+		}
+		strbuf_addch(name, c);
+	}
+
+	/* Final ']' */
+	if (get_next_char(cs) != ']')
+		return -1;
+	return 0;
+error_incomplete_line:
+	cs->linenr--;
+	return -1;
+}
+
+static int get_base_var(struct config_source *cs, struct strbuf *name)
+{
+	cs->subsection_case_sensitive = 1;
+	for (;;) {
+		int c = get_next_char(cs);
+		if (cs->eof)
+			return -1;
+		if (c == ']')
+			return 0;
+		if (isspace(c))
+			return get_extended_base_var(cs, name, c);
+		if (!iskeychar(c) && c != '.')
+			return -1;
+		strbuf_addch(name, tolower(c));
+	}
+}
+
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_parse_options *opts;
+};
+
+static int do_event(struct config_source *cs, enum config_event_t type,
+		    struct parse_event_data *data)
+{
+	size_t offset;
+
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	offset = cs->do_ftell(cs);
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 offset, cs, data->opts->event_fn_data) < 0)
+		return -1;
+
+	data->previous_type = type;
+	data->previous_offset = offset;
+
+	return 0;
+}
+
+static void kvi_from_source(struct config_source *cs,
+			    enum config_scope scope,
+			    struct key_value_info *out)
+{
+	out->filename = strintern(cs->name);
+	out->origin_type = cs->origin_type;
+	out->linenr = cs->linenr;
+	out->scope = scope;
+	out->path = cs->path;
+}
+
+static int git_parse_source(struct config_source *cs, config_fn_t fn,
+			    struct key_value_info *kvi, void *data,
+			    const struct config_parse_options *opts)
+{
+	int comment = 0;
+	size_t baselen = 0;
+	struct strbuf *var = &cs->var;
+
+	/* U+FEFF Byte Order Mark in UTF8 */
+	const char *bomptr = utf8_bom;
+
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
+	for (;;) {
+		int c;
+
+		c = get_next_char(cs);
+		if (bomptr && *bomptr) {
+			/* We are at the file beginning; skip UTF8-encoded BOM
+			 * if present. Sane editors won't put this in on their
+			 * own, but e.g. Windows Notepad will do it happily. */
+			if (c == (*bomptr & 0377)) {
+				bomptr++;
+				continue;
+			} else {
+				/* Do not tolerate partial BOM. */
+				if (bomptr != utf8_bom)
+					break;
+				/* No BOM at file beginning. Cool. */
+				bomptr = NULL;
+			}
+		}
+		if (c == '\n') {
+			if (cs->eof) {
+				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
+				return 0;
+			}
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
+			comment = 0;
+			continue;
+		}
+		if (comment)
+			continue;
+		if (isspace(c)) {
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
+		if (c == '#' || c == ';') {
+			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
+			comment = 1;
+			continue;
+		}
+		if (c == '[') {
+			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
+			/* Reset prior to determining a new stem */
+			strbuf_reset(var);
+			if (get_base_var(cs, var) < 0 || var->len < 1)
+				break;
+			strbuf_addch(var, '.');
+			baselen = var->len;
+			continue;
+		}
+		if (!isalpha(c))
+			break;
+
+		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
+		/*
+		 * Truncate the var name back to the section header
+		 * stem prior to grabbing the suffix part of the name
+		 * and the value.
+		 */
+		strbuf_setlen(var, baselen);
+		strbuf_addch(var, tolower(c));
+		if (get_value(cs, kvi, fn, data, var) < 0)
+			break;
+	}
+	/*
+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
+	 * in order for our callback to receive the error event, we have to call
+	 * do_event twice
+	 */
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
+}
+
+/*
+ * All source specific fields in the union, die_on_error, name and the callbacks
+ * fgetc, ungetc, ftell of top need to be initialized before calling
+ * this function.
+ */
+static int do_config_from(struct config_source *top, config_fn_t fn,
+			  void *data, enum config_scope scope,
+			  const struct config_parse_options *opts)
+{
+	struct key_value_info kvi = KVI_INIT;
+	int ret;
+
+	/* push config-file parsing state stack */
+	top->linenr = 1;
+	top->eof = 0;
+	top->total_len = 0;
+	strbuf_init(&top->value, 1024);
+	strbuf_init(&top->var, 1024);
+	kvi_from_source(top, scope, &kvi);
+
+	ret = git_parse_source(top, fn, &kvi, data, opts);
+
+	strbuf_release(&top->value);
+	strbuf_release(&top->var);
+
+	return ret;
+}
+
+static int do_config_from_file(config_fn_t fn,
+			       const enum config_origin_type origin_type,
+			       const char *name, const char *path, FILE *f,
+			       void *data, enum config_scope scope,
+			       const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+	int ret;
+
+	top.u.file = f;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = path;
+	top.do_fgetc = config_file_fgetc;
+	top.do_ungetc = config_file_ungetc;
+	top.do_ftell = config_file_ftell;
+
+	flockfile(f);
+	ret = do_config_from(&top, fn, data, scope, opts);
+	funlockfile(f);
+	return ret;
+}
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts)
+{
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, scope, config_opts);
+}
+
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data, enum config_scope scope,
+				      const struct config_parse_options *opts)
+{
+	int ret = -1;
+	FILE *f;
+
+	if (!filename)
+		BUG("filename cannot be NULL");
+	f = fopen_or_warn(filename, "r");
+	if (f) {
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, scope, opts);
+		fclose(f);
+	}
+	return ret;
+}
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type origin_type,
+			const char *name, const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+
+	top.u.buf.buf = buf;
+	top.u.buf.len = len;
+	top.u.buf.pos = 0;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = NULL;
+	top.do_fgetc = config_buf_fgetc;
+	top.do_ungetc = config_buf_ungetc;
+	top.do_ftell = config_buf_ftell;
+
+	return do_config_from(&top, fn, data, scope, opts);
+}
diff --git a/config-parse.h b/config-parse.h
new file mode 100644
index 0000000000..d5623906ef
--- /dev/null
+++ b/config-parse.h
@@ -0,0 +1,155 @@
+/*
+ * Low level config parsing.
+ */
+#ifndef CONFIG_PARSE_H
+#define CONFIG_PARSE_H
+
+#include "strbuf.h"
+
+/* git_config_parse_key() returns these: */
+#define CONFIG_INVALID_KEY 1
+#define CONFIG_NO_SECTION_OR_NAME 2
+
+int git_config_parse_key(const char *, char **, size_t *);
+
+enum config_scope {
+	CONFIG_SCOPE_UNKNOWN = 0,
+	CONFIG_SCOPE_SYSTEM,
+	CONFIG_SCOPE_GLOBAL,
+	CONFIG_SCOPE_LOCAL,
+	CONFIG_SCOPE_WORKTREE,
+	CONFIG_SCOPE_COMMAND,
+	CONFIG_SCOPE_SUBMODULE,
+};
+const char *config_scope_name(enum config_scope scope);
+
+enum config_origin_type {
+	CONFIG_ORIGIN_UNKNOWN = 0,
+	CONFIG_ORIGIN_BLOB,
+	CONFIG_ORIGIN_FILE,
+	CONFIG_ORIGIN_STDIN,
+	CONFIG_ORIGIN_SUBMODULE_BLOB,
+	CONFIG_ORIGIN_CMDLINE
+};
+
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+struct config_source;
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					struct config_source *cs,
+					void *event_fn_data);
+
+struct config_parse_options {
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
+struct config_source {
+	struct config_source *prev;
+	union {
+		FILE *file;
+		struct config_buf {
+			const char *buf;
+			size_t len;
+			size_t pos;
+		} buf;
+	} u;
+	enum config_origin_type origin_type;
+	const char *name;
+	const char *path;
+	int linenr;
+	int eof;
+	size_t total_len;
+	struct strbuf value;
+	struct strbuf var;
+	unsigned subsection_case_sensitive : 1;
+
+	int (*do_fgetc)(struct config_source *c);
+	int (*do_ungetc)(int c, struct config_source *conf);
+	long (*do_ftell)(struct config_source *c);
+};
+#define CONFIG_SOURCE_INIT { 0 }
+
+/* Config source metadata for a given config key-value pair */
+struct key_value_info {
+	const char *filename;
+	int linenr;
+	enum config_origin_type origin_type;
+	enum config_scope scope;
+	const char *path;
+};
+#define KVI_INIT { \
+	.filename = NULL, \
+	.linenr = -1, \
+	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
+	.scope = CONFIG_SCOPE_UNKNOWN, \
+	.path = NULL, \
+}
+
+/* Captures additional information that a config callback can use. */
+struct config_context {
+	/* Config source metadata for key and value. */
+	const struct key_value_info *kvi;
+};
+#define CONFIG_CONTEXT_INIT { 0 }
+
+/**
+ * A config callback function takes four parameters:
+ *
+ * - the name of the parsed variable. This is in canonical "flat" form: the
+ *   section, subsection, and variable segments will be separated by dots,
+ *   and the section and variable segments will be all lowercase. E.g.,
+ *   `core.ignorecase`, `diff.SomeType.textconv`.
+ *
+ * - the value of the found variable, as a string. If the variable had no
+ *   value specified, the value will be NULL (typically this means it
+ *   should be interpreted as boolean true).
+ *
+ * - the 'config context', that is, additional information about the config
+ *   iteration operation provided by the config machinery. For example, this
+ *   includes information about the config source being parsed (e.g. the
+ *   filename).
+ *
+ * - a void pointer passed in by the caller of the config API; this can
+ *   contain callback-specific data
+ *
+ * A config callback should return 0 for success, or -1 if the variable
+ * could not be parsed properly.
+ */
+typedef int (*config_fn_t)(const char *, const char *,
+			   const struct config_context *, void *);
+
+int git_config_from_file_with_options(config_fn_t fn, const char *,
+				      void *, enum config_scope,
+				      const struct config_parse_options *);
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type,
+			const char *name,
+			const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts);
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts);
+
+#endif /* CONFIG_PARSE_H */
diff --git a/config.c b/config.c
index cf0785a258..9688e5eb8f 100644
--- a/config.c
+++ b/config.c
@@ -42,32 +42,6 @@
 #include "wrapper.h"
 #include "write-or-die.h"
 
-struct config_source {
-	struct config_source *prev;
-	union {
-		FILE *file;
-		struct config_buf {
-			const char *buf;
-			size_t len;
-			size_t pos;
-		} buf;
-	} u;
-	enum config_origin_type origin_type;
-	const char *name;
-	const char *path;
-	int linenr;
-	int eof;
-	size_t total_len;
-	struct strbuf value;
-	struct strbuf var;
-	unsigned subsection_case_sensitive : 1;
-
-	int (*do_fgetc)(struct config_source *c);
-	int (*do_ungetc)(int c, struct config_source *conf);
-	long (*do_ftell)(struct config_source *c);
-};
-#define CONFIG_SOURCE_INIT { 0 }
-
 static int pack_compression_seen;
 static int zlib_compression_seen;
 
@@ -82,47 +56,6 @@ static int zlib_compression_seen;
  */
 static struct config_set protected_config;
 
-static int config_file_fgetc(struct config_source *conf)
-{
-	return getc_unlocked(conf->u.file);
-}
-
-static int config_file_ungetc(int c, struct config_source *conf)
-{
-	return ungetc(c, conf->u.file);
-}
-
-static long config_file_ftell(struct config_source *conf)
-{
-	return ftell(conf->u.file);
-}
-
-
-static int config_buf_fgetc(struct config_source *conf)
-{
-	if (conf->u.buf.pos < conf->u.buf.len)
-		return conf->u.buf.buf[conf->u.buf.pos++];
-
-	return EOF;
-}
-
-static int config_buf_ungetc(int c, struct config_source *conf)
-{
-	if (conf->u.buf.pos > 0) {
-		conf->u.buf.pos--;
-		if (conf->u.buf.buf[conf->u.buf.pos] != c)
-			BUG("config_buf can only ungetc the same character");
-		return c;
-	}
-
-	return EOF;
-}
-
-static long config_buf_ftell(struct config_source *conf)
-{
-	return conf->u.buf.pos;
-}
-
 struct config_include_data {
 	int depth;
 	config_fn_t fn;
@@ -528,81 +461,6 @@ void git_config_push_env(const char *spec)
 	free(key);
 }
 
-static inline int iskeychar(int c)
-{
-	return isalnum(c) || c == '-';
-}
-
-/*
- * Auxiliary function to sanity-check and split the key into the section
- * identifier and variable name.
- *
- * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
- * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
- * key.
- *
- * store_key - pointer to char* which will hold a copy of the key with
- *             lowercase section and variable name
- * baselen - pointer to size_t which will hold the length of the
- *           section + subsection part, can be NULL
- */
-int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
-{
-	size_t i, baselen;
-	int dot;
-	const char *last_dot = strrchr(key, '.');
-
-	/*
-	 * Since "key" actually contains the section name and the real
-	 * key name separated by a dot, we have to know where the dot is.
-	 */
-
-	if (last_dot == NULL || last_dot == key) {
-		error(_("key does not contain a section: %s"), key);
-		return CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	if (!last_dot[1]) {
-		error(_("key does not contain variable name: %s"), key);
-		return CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	baselen = last_dot - key;
-	if (baselen_)
-		*baselen_ = baselen;
-
-	/*
-	 * Validate the key and while at it, lower case it for matching.
-	 */
-	*store_key = xmallocz(strlen(key));
-
-	dot = 0;
-	for (i = 0; key[i]; i++) {
-		unsigned char c = key[i];
-		if (c == '.')
-			dot = 1;
-		/* Leave the extended basename untouched.. */
-		if (!dot || i > baselen) {
-			if (!iskeychar(c) ||
-			    (i == baselen + 1 && !isalpha(c))) {
-				error(_("invalid key: %s"), key);
-				goto out_free_ret_1;
-			}
-			c = tolower(c);
-		} else if (c == '\n') {
-			error(_("invalid key (newline): %s"), key);
-			goto out_free_ret_1;
-		}
-		(*store_key)[i] = c;
-	}
-
-	return 0;
-
-out_free_ret_1:
-	FREE_AND_NULL(*store_key);
-	return CONFIG_INVALID_KEY;
-}
-
 static int config_parse_pair(const char *key, const char *value,
 			     struct key_value_info *kvi,
 			     config_fn_t fn, void *data)
@@ -787,251 +645,6 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 	return ret;
 }
 
-static int get_next_char(struct config_source *cs)
-{
-	int c = cs->do_fgetc(cs);
-
-	if (c == '\r') {
-		/* DOS like systems */
-		c = cs->do_fgetc(cs);
-		if (c != '\n') {
-			if (c != EOF)
-				cs->do_ungetc(c, cs);
-			c = '\r';
-		}
-	}
-
-	if (c != EOF && ++cs->total_len > INT_MAX) {
-		/*
-		 * This is an absurdly long config file; refuse to parse
-		 * further in order to protect downstream code from integer
-		 * overflows. Note that we can't return an error specifically,
-		 * but we can mark EOF and put trash in the return value,
-		 * which will trigger a parse error.
-		 */
-		cs->eof = 1;
-		return 0;
-	}
-
-	if (c == '\n')
-		cs->linenr++;
-	if (c == EOF) {
-		cs->eof = 1;
-		cs->linenr++;
-		c = '\n';
-	}
-	return c;
-}
-
-static char *parse_value(struct config_source *cs)
-{
-	int quote = 0, comment = 0, space = 0;
-
-	strbuf_reset(&cs->value);
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n') {
-			if (quote) {
-				cs->linenr--;
-				return NULL;
-			}
-			return cs->value.buf;
-		}
-		if (comment)
-			continue;
-		if (isspace(c) && !quote) {
-			if (cs->value.len)
-				space++;
-			continue;
-		}
-		if (!quote) {
-			if (c == ';' || c == '#') {
-				comment = 1;
-				continue;
-			}
-		}
-		for (; space; space--)
-			strbuf_addch(&cs->value, ' ');
-		if (c == '\\') {
-			c = get_next_char(cs);
-			switch (c) {
-			case '\n':
-				continue;
-			case 't':
-				c = '\t';
-				break;
-			case 'b':
-				c = '\b';
-				break;
-			case 'n':
-				c = '\n';
-				break;
-			/* Some characters escape as themselves */
-			case '\\': case '"':
-				break;
-			/* Reject unknown escape sequences */
-			default:
-				return NULL;
-			}
-			strbuf_addch(&cs->value, c);
-			continue;
-		}
-		if (c == '"') {
-			quote = 1-quote;
-			continue;
-		}
-		strbuf_addch(&cs->value, c);
-	}
-}
-
-static int get_value(struct config_source *cs, struct key_value_info *kvi,
-		     config_fn_t fn, void *data, struct strbuf *name)
-{
-	int c;
-	char *value;
-	int ret;
-	struct config_context ctx = {
-		.kvi = kvi,
-	};
-
-	/* Get the full name */
-	for (;;) {
-		c = get_next_char(cs);
-		if (cs->eof)
-			break;
-		if (!iskeychar(c))
-			break;
-		strbuf_addch(name, tolower(c));
-	}
-
-	while (c == ' ' || c == '\t')
-		c = get_next_char(cs);
-
-	value = NULL;
-	if (c != '\n') {
-		if (c != '=')
-			return -1;
-		value = parse_value(cs);
-		if (!value)
-			return -1;
-	}
-	/*
-	 * We already consumed the \n, but we need linenr to point to
-	 * the line we just parsed during the call to fn to get
-	 * accurate line number in error messages.
-	 */
-	cs->linenr--;
-	kvi->linenr = cs->linenr;
-	ret = fn(name->buf, value, &ctx, data);
-	if (ret >= 0)
-		cs->linenr++;
-	return ret;
-}
-
-static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
-				 int c)
-{
-	cs->subsection_case_sensitive = 0;
-	do {
-		if (c == '\n')
-			goto error_incomplete_line;
-		c = get_next_char(cs);
-	} while (isspace(c));
-
-	/* We require the format to be '[base "extension"]' */
-	if (c != '"')
-		return -1;
-	strbuf_addch(name, '.');
-
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n')
-			goto error_incomplete_line;
-		if (c == '"')
-			break;
-		if (c == '\\') {
-			c = get_next_char(cs);
-			if (c == '\n')
-				goto error_incomplete_line;
-		}
-		strbuf_addch(name, c);
-	}
-
-	/* Final ']' */
-	if (get_next_char(cs) != ']')
-		return -1;
-	return 0;
-error_incomplete_line:
-	cs->linenr--;
-	return -1;
-}
-
-static int get_base_var(struct config_source *cs, struct strbuf *name)
-{
-	cs->subsection_case_sensitive = 1;
-	for (;;) {
-		int c = get_next_char(cs);
-		if (cs->eof)
-			return -1;
-		if (c == ']')
-			return 0;
-		if (isspace(c))
-			return get_extended_base_var(cs, name, c);
-		if (!iskeychar(c) && c != '.')
-			return -1;
-		strbuf_addch(name, tolower(c));
-	}
-}
-
-struct parse_event_data {
-	enum config_event_t previous_type;
-	size_t previous_offset;
-	const struct config_parse_options *opts;
-};
-
-static int do_event(struct config_source *cs, enum config_event_t type,
-		    struct parse_event_data *data)
-{
-	size_t offset;
-
-	if (!data->opts || !data->opts->event_fn)
-		return 0;
-
-	if (type == CONFIG_EVENT_WHITESPACE &&
-	    data->previous_type == type)
-		return 0;
-
-	offset = cs->do_ftell(cs);
-	/*
-	 * At EOF, the parser always "inserts" an extra '\n', therefore
-	 * the end offset of the event is the current file position, otherwise
-	 * we will already have advanced to the next event.
-	 */
-	if (type != CONFIG_EVENT_EOF)
-		offset--;
-
-	if (data->previous_type != CONFIG_EVENT_EOF &&
-	    data->opts->event_fn(data->previous_type, data->previous_offset,
-				 offset, cs, data->opts->event_fn_data) < 0)
-		return -1;
-
-	data->previous_type = type;
-	data->previous_offset = offset;
-
-	return 0;
-}
-
-static void kvi_from_source(struct config_source *cs,
-			    enum config_scope scope,
-			    struct key_value_info *out)
-{
-	out->filename = strintern(cs->name);
-	out->origin_type = cs->origin_type;
-	out->linenr = cs->linenr;
-	out->scope = scope;
-	out->path = cs->path;
-}
-
 int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 		      size_t end_offset UNUSED, struct config_source *cs,
 		      void *data)
@@ -1082,104 +695,6 @@ int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 	return error_return;
 }
 
-static int git_parse_source(struct config_source *cs, config_fn_t fn,
-			    struct key_value_info *kvi, void *data,
-			    const struct config_parse_options *opts)
-{
-	int comment = 0;
-	size_t baselen = 0;
-	struct strbuf *var = &cs->var;
-
-	/* U+FEFF Byte Order Mark in UTF8 */
-	const char *bomptr = utf8_bom;
-
-	/* For the parser event callback */
-	struct parse_event_data event_data = {
-		CONFIG_EVENT_EOF, 0, opts
-	};
-
-	for (;;) {
-		int c;
-
-		c = get_next_char(cs);
-		if (bomptr && *bomptr) {
-			/* We are at the file beginning; skip UTF8-encoded BOM
-			 * if present. Sane editors won't put this in on their
-			 * own, but e.g. Windows Notepad will do it happily. */
-			if (c == (*bomptr & 0377)) {
-				bomptr++;
-				continue;
-			} else {
-				/* Do not tolerate partial BOM. */
-				if (bomptr != utf8_bom)
-					break;
-				/* No BOM at file beginning. Cool. */
-				bomptr = NULL;
-			}
-		}
-		if (c == '\n') {
-			if (cs->eof) {
-				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
-					return -1;
-				return 0;
-			}
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-				return -1;
-			comment = 0;
-			continue;
-		}
-		if (comment)
-			continue;
-		if (isspace(c)) {
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-					return -1;
-			continue;
-		}
-		if (c == '#' || c == ';') {
-			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
-					return -1;
-			comment = 1;
-			continue;
-		}
-		if (c == '[') {
-			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
-					return -1;
-
-			/* Reset prior to determining a new stem */
-			strbuf_reset(var);
-			if (get_base_var(cs, var) < 0 || var->len < 1)
-				break;
-			strbuf_addch(var, '.');
-			baselen = var->len;
-			continue;
-		}
-		if (!isalpha(c))
-			break;
-
-		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
-			return -1;
-
-		/*
-		 * Truncate the var name back to the section header
-		 * stem prior to grabbing the suffix part of the name
-		 * and the value.
-		 */
-		strbuf_setlen(var, baselen);
-		strbuf_addch(var, tolower(c));
-		if (get_value(cs, kvi, fn, data, var) < 0)
-			break;
-	}
-
-	/*
-	 * FIXME for whatever reason, do_event passes the _previous_ event, so
-	 * in order for our callback to receive the error event, we have to call
-	 * do_event twice
-	 */
-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
-	return -1;
-}
-
 static uintmax_t get_unit_factor(const char *end)
 {
 	if (!*end)
@@ -1973,83 +1488,6 @@ int git_default_config(const char *var, const char *value,
 	return 0;
 }
 
-/*
- * All source specific fields in the union, die_on_error, name and the callbacks
- * fgetc, ungetc, ftell of top need to be initialized before calling
- * this function.
- */
-static int do_config_from(struct config_source *top, config_fn_t fn,
-			  void *data, enum config_scope scope,
-			  const struct config_parse_options *opts)
-{
-	struct key_value_info kvi = KVI_INIT;
-	int ret;
-
-	/* push config-file parsing state stack */
-	top->linenr = 1;
-	top->eof = 0;
-	top->total_len = 0;
-	strbuf_init(&top->value, 1024);
-	strbuf_init(&top->var, 1024);
-	kvi_from_source(top, scope, &kvi);
-
-	ret = git_parse_source(top, fn, &kvi, data, opts);
-
-	strbuf_release(&top->value);
-	strbuf_release(&top->var);
-
-	return ret;
-}
-
-static int do_config_from_file(config_fn_t fn,
-			       const enum config_origin_type origin_type,
-			       const char *name, const char *path, FILE *f,
-			       void *data, enum config_scope scope,
-			       const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-	int ret;
-
-	top.u.file = f;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = path;
-	top.do_fgetc = config_file_fgetc;
-	top.do_ungetc = config_file_ungetc;
-	top.do_ftell = config_file_ftell;
-
-	flockfile(f);
-	ret = do_config_from(&top, fn, data, scope, opts);
-	funlockfile(f);
-	return ret;
-}
-
-static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope,
-				 const struct config_parse_options *config_opts)
-{
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, config_opts);
-}
-
-int git_config_from_file_with_options(config_fn_t fn, const char *filename,
-				      void *data, enum config_scope scope,
-				      const struct config_parse_options *opts)
-{
-	int ret = -1;
-	FILE *f;
-
-	if (!filename)
-		BUG("filename cannot be NULL");
-	f = fopen_or_warn(filename, "r");
-	if (f) {
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
-					  filename, f, data, scope, opts);
-		fclose(f);
-	}
-	return ret;
-}
-
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
 	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
@@ -2058,27 +1496,6 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type origin_type,
-			const char *name, const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-
-	top.u.buf.buf = buf;
-	top.u.buf.len = len;
-	top.u.buf.pos = 0;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = NULL;
-	top.do_fgetc = config_buf_fgetc;
-	top.do_ungetc = config_buf_ungetc;
-	top.do_ftell = config_buf_ftell;
-
-	return do_config_from(&top, fn, data, scope, opts);
-}
-
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
 			      struct repository *repo,
diff --git a/config.h b/config.h
index 1aed02cd5d..3bad5e1c32 100644
--- a/config.h
+++ b/config.h
@@ -4,7 +4,7 @@
 #include "hashmap.h"
 #include "string-list.h"
 #include "repository.h"
-
+#include "config-parse.h"
 
 /**
  * The config API gives callers a way to access Git configuration files
@@ -23,9 +23,6 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these: */
-#define CONFIG_INVALID_KEY 1
-#define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
 #define CONFIG_NO_LOCK -1
 #define CONFIG_INVALID_FILE 3
@@ -36,17 +33,6 @@ struct object_id;
 
 #define CONFIG_REGEX_NONE ((void *)1)
 
-enum config_scope {
-	CONFIG_SCOPE_UNKNOWN = 0,
-	CONFIG_SCOPE_SYSTEM,
-	CONFIG_SCOPE_GLOBAL,
-	CONFIG_SCOPE_LOCAL,
-	CONFIG_SCOPE_WORKTREE,
-	CONFIG_SCOPE_COMMAND,
-	CONFIG_SCOPE_SUBMODULE,
-};
-const char *config_scope_name(enum config_scope scope);
-
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
@@ -54,46 +40,6 @@ struct git_config_source {
 	enum config_scope scope;
 };
 
-enum config_origin_type {
-	CONFIG_ORIGIN_UNKNOWN = 0,
-	CONFIG_ORIGIN_BLOB,
-	CONFIG_ORIGIN_FILE,
-	CONFIG_ORIGIN_STDIN,
-	CONFIG_ORIGIN_SUBMODULE_BLOB,
-	CONFIG_ORIGIN_CMDLINE
-};
-
-enum config_event_t {
-	CONFIG_EVENT_SECTION,
-	CONFIG_EVENT_ENTRY,
-	CONFIG_EVENT_WHITESPACE,
-	CONFIG_EVENT_COMMENT,
-	CONFIG_EVENT_EOF,
-	CONFIG_EVENT_ERROR
-};
-
-struct config_source;
-/*
- * The parser event function (if not NULL) is called with the event type and
- * the begin/end offsets of the parsed elements.
- *
- * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
- * character is considered part of the element.
- */
-typedef int (*config_parser_event_fn_t)(enum config_event_t type,
-					size_t begin_offset, size_t end_offset,
-					struct config_source *cs,
-					void *event_fn_data);
-
-struct config_parse_options {
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-};
-
 #define CP_OPTS_INIT(error_action) { \
 	.event_fn = git_config_err_fn, \
 	.event_fn_data = (enum config_error_action []){(error_action)}, \
@@ -126,59 +72,8 @@ enum config_error_action {
 int git_config_err_fn(enum config_event_t type, size_t begin_offset,
 		      size_t end_offset, struct config_source *cs,
 		      void *event_fn_data);
-
-/* Config source metadata for a given config key-value pair */
-struct key_value_info {
-	const char *filename;
-	int linenr;
-	enum config_origin_type origin_type;
-	enum config_scope scope;
-	const char *path;
-};
-#define KVI_INIT { \
-	.filename = NULL, \
-	.linenr = -1, \
-	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
-	.scope = CONFIG_SCOPE_UNKNOWN, \
-	.path = NULL, \
-}
-
-/* Captures additional information that a config callback can use. */
-struct config_context {
-	/* Config source metadata for key and value. */
-	const struct key_value_info *kvi;
-};
-#define CONFIG_CONTEXT_INIT { 0 }
-
-/**
- * A config callback function takes four parameters:
- *
- * - the name of the parsed variable. This is in canonical "flat" form: the
- *   section, subsection, and variable segments will be separated by dots,
- *   and the section and variable segments will be all lowercase. E.g.,
- *   `core.ignorecase`, `diff.SomeType.textconv`.
- *
- * - the value of the found variable, as a string. If the variable had no
- *   value specified, the value will be NULL (typically this means it
- *   should be interpreted as boolean true).
- *
- * - the 'config context', that is, additional information about the config
- *   iteration operation provided by the config machinery. For example, this
- *   includes information about the config source being parsed (e.g. the
- *   filename).
- *
- * - a void pointer passed in by the caller of the config API; this can
- *   contain callback-specific data
- *
- * A config callback should return 0 for success, or -1 if the variable
- * could not be parsed properly.
- */
-typedef int (*config_fn_t)(const char *, const char *,
-			   const struct config_context *, void *);
-
 int git_default_config(const char *, const char *,
 		       const struct config_context *, void *);
-
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
@@ -186,16 +81,6 @@ int git_default_config(const char *, const char *,
  * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
-
-int git_config_from_file_with_options(config_fn_t fn, const char *,
-				      void *, enum config_scope,
-				      const struct config_parse_options *);
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type,
-			const char *name,
-			const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
@@ -333,8 +218,6 @@ int repo_config_set_worktree_gently(struct repository *, const char *, const cha
  */
 void git_config_set(const char *, const char *);
 
-int git_config_parse_key(const char *, char **, size_t *);
-
 /*
  * The following macros specify flag bits that alter the behavior
  * of the git_config_set_multivar*() methods.
-- 
2.41.0.585.gd2178a4bd4-goog


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v1.5 3/5] config: report config parse errors using cb
  2023-07-31 23:46   ` [RFC PATCH v1.5 3/5] config: report config parse errors using cb Glen Choo
@ 2023-08-04 21:34     ` Jonathan Tan
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-08-04 21:34 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Calvin Wan

Glen Choo <chooglen@google.com> writes:
> +	/*
> +	 * FIXME for whatever reason, do_event passes the _previous_ event, so
> +	 * in order for our callback to receive the error event, we have to call
> +	 * do_event twice
> +	 */
> +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> +	return -1;
>  }

I think this is because do_event() uses the current position in the
config source as the end_offset and emits the previous event if there is
one (that is, if do_event() has been called before). It can't emit the
current event because it still does not know what the current event's
end_offset is.

I also noticed some other hard-to-follow behavior: as of current master
(i.e. without this patch set), _EOF and _ERROR are never emitted,
because they are always the last events passed to do_event(). A
refactoring into separate "do" and "flush" steps would make things
clearer.
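
To illustrate, here is a minimal standalone sketch of such a split. It
is not the config.c code: event_state, start_event(), flush_event(),
record_event() and the EV_* names are hypothetical stand-ins, and
offsets are passed in explicitly instead of coming from do_ftell().
start_event() only records the new event after flushing the pending
one, so the caller has to call flush_event() once more at the end for
the last event (e.g. an error) to reach the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real config.c types. */
enum event_type { EV_SECTION, EV_ENTRY, EV_ERROR, EV_NONE };

typedef int (*event_fn)(enum event_type type, size_t begin, size_t end,
			void *data);

struct event_state {
	enum event_type pending_type; /* EV_NONE when nothing is pending */
	size_t pending_begin;
	event_fn fn;
	void *fn_data;
};

/* Emit the pending event (if any), using `offset` as its end offset. */
static int flush_event(struct event_state *st, size_t offset)
{
	int ret = 0;

	assert(st);
	if (st->pending_type != EV_NONE && st->fn)
		ret = st->fn(st->pending_type, st->pending_begin, offset,
			     st->fn_data);
	st->pending_type = EV_NONE;
	return ret;
}

/*
 * Begin a new event at `offset`. The previous event is emitted first,
 * because only now do we know where it ended.
 */
static int start_event(struct event_state *st, enum event_type type,
		       size_t offset)
{
	if (flush_event(st, offset) < 0)
		return -1;
	st->pending_type = type;
	st->pending_begin = offset;
	return 0;
}

/* Example callback: remember the order in which events were emitted. */
struct event_log {
	enum event_type types[8];
	size_t n;
};

static int record_event(enum event_type type, size_t begin, size_t end,
			void *data)
{
	struct event_log *log = data;

	(void)begin;
	(void)end;
	if (log->n < 8)
		log->types[log->n++] = type;
	return 0;
}
```

With this shape, the error path would need one start_event() for the
error followed by one flush_event(), rather than two identical
do_event() calls.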
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 0/4] config-parse: create config parsing library
  2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
@ 2023-08-23 21:53 ` Josh Steadmon
  2023-08-23 21:53   ` [PATCH v2 1/4] config: split out config_parse_options Josh Steadmon
                     ` (4 more replies)
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
  4 siblings, 5 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-08-23 21:53 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

I'll be taking over this series from Glen (thank you for the work so
far).

Here's what I see as the current open questions:
- Do we need a cleanup of return values in config.c? If so, does it need
  to be part of this series?
- Are we OK leaving the include machinery in config.c?
- If we expect non-Git-CLI projects to use config-parse as a library,
  should we provide alternative error-handling callbacks for (assumed)
  use cases, even if they won't be directly used in the Git project?
  (I lean to "No", but I want to make sure the option is considered.)
- Does this series need to include the config-parse.c:do_event()
  refactor Jonathan Tan mentioned in [1]?

[1]: https://lore.kernel.org/git/20230804213457.1174493-1-jonathantanmy@google.com/

Changes since v1.5:
- Dropped patch 1/5: config: return positive from git_config_parse_key()


Glen Choo (4):
  config: split out config_parse_options
  config: report config parse errors using cb
  config.c: accept config_parse_options in git_config_from_stdin
  config-parse: split library out of config.[c|h]

 Makefile           |   1 +
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config-parse.c     | 561 ++++++++++++++++++++++++++++++++++++++
 config-parse.h     | 155 +++++++++++
 config.c           | 655 ++++-----------------------------------------
 config.h           | 134 +---------
 fsck.c             |   4 +-
 submodule-config.c |   9 +-
 9 files changed, 795 insertions(+), 732 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

Range-diff against v1:
1:  9924481630 < -:  ---------- config: return positive from git_config_parse_key()
-:  ---------- > 1:  5c676fbac3 config: split out config_parse_options
-:  ---------- > 2:  cb92a1f2e3 config: report config parse errors using cb
-:  ---------- > 3:  e034d0780c config.c: accept config_parse_options in git_config_from_stdin
-:  ---------- > 4:  74c5dcd5a2 config-parse: split library out of config.[c|h]

base-commit: aa9166bcc0ba654fc21f198a30647ec087f733ed
-- 
2.42.0.rc1.204.g551eb34607-goog


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 1/4] config: split out config_parse_options
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
@ 2023-08-23 21:53   ` Josh Steadmon
  2023-08-23 23:26     ` Junio C Hamano
  2023-08-23 21:53   ` [PATCH v2 2/4] config: report config parse errors using cb Josh Steadmon
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-08-23 21:53 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

"struct config_options" mixes two disjoint sets of options: those used
by the config parser (e.g. event listeners) and those used by
config_with_options() (e.g. to handle includes, choose which config
files to parse). Split parser-only options into config_parse_options.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 bundle-uri.c |  2 +-
 config.c     | 14 +++++++-------
 config.h     | 37 ++++++++++++++++++++-----------------
 fsck.c       |  2 +-
 4 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4b5c49b93d..f93ca6a486 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,7 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_options opts = {
+	struct config_parse_options opts = {
 		.error_action = CONFIG_ERROR_ERROR,
 	};
 
diff --git a/config.c b/config.c
index 85c5f35132..1518f70fc2 100644
--- a/config.c
+++ b/config.c
@@ -982,7 +982,7 @@ static int get_base_var(struct config_source *cs, struct strbuf *name)
 struct parse_event_data {
 	enum config_event_t previous_type;
 	size_t previous_offset;
-	const struct config_options *opts;
+	const struct config_parse_options *opts;
 };
 
 static int do_event(struct config_source *cs, enum config_event_t type,
@@ -1030,7 +1030,7 @@ static void kvi_from_source(struct config_source *cs,
 
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
-			    const struct config_options *opts)
+			    const struct config_parse_options *opts)
 {
 	int comment = 0;
 	size_t baselen = 0;
@@ -1967,7 +1967,7 @@ int git_default_config(const char *var, const char *value,
  */
 static int do_config_from(struct config_source *top, config_fn_t fn,
 			  void *data, enum config_scope scope,
-			  const struct config_options *opts)
+			  const struct config_parse_options *opts)
 {
 	struct key_value_info kvi = KVI_INIT;
 	int ret;
@@ -1992,7 +1992,7 @@ static int do_config_from_file(config_fn_t fn,
 			       const enum config_origin_type origin_type,
 			       const char *name, const char *path, FILE *f,
 			       void *data, enum config_scope scope,
-			       const struct config_options *opts)
+			       const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 	int ret;
@@ -2021,7 +2021,7 @@ static int git_config_from_stdin(config_fn_t fn, void *data,
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 				      void *data, enum config_scope scope,
-				      const struct config_options *opts)
+				      const struct config_parse_options *opts)
 {
 	int ret = -1;
 	FILE *f;
@@ -2047,7 +2047,7 @@ int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type origin_type,
 			const char *name, const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts)
+			const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 
@@ -3380,7 +3380,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_options opts;
+		struct config_parse_options opts;
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
diff --git a/config.h b/config.h
index 6332d74904..2537516446 100644
--- a/config.h
+++ b/config.h
@@ -85,6 +85,21 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					struct config_source *cs,
 					void *event_fn_data);
 
+struct config_parse_options {
+	enum config_error_action {
+		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
+		CONFIG_ERROR_DIE, /* die() on error */
+		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+		CONFIG_ERROR_SILENT, /* return -1 */
+	} error_action;
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -92,6 +107,9 @@ struct config_options {
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
 
+	const char *commondir;
+	const char *git_dir;
+	struct config_parse_options parse_options;
 	/*
 	 * For internal use. Include all includeif.hasremoteurl paths without
 	 * checking if the repo has that remote URL, and when doing so, verify
@@ -99,21 +117,6 @@ struct config_options {
 	 * themselves.
 	 */
 	unsigned int unconditional_remote_url : 1;
-
-	const char *commondir;
-	const char *git_dir;
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 };
 
 /* Config source metadata for a given config key-value pair */
@@ -178,13 +181,13 @@ int git_config_from_file(config_fn_t fn, const char *, void *);
 
 int git_config_from_file_with_options(config_fn_t fn, const char *,
 				      void *, enum config_scope,
-				      const struct config_options *);
+				      const struct config_parse_options *);
 int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type,
 			const char *name,
 			const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts);
+			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
diff --git a/fsck.c b/fsck.c
index 3be86616c5..522ee1c18a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,7 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_options config_opts = { 0 };
+		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
-- 
2.42.0.rc1.204.g551eb34607-goog


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 2/4] config: report config parse errors using cb
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
  2023-08-23 21:53   ` [PATCH v2 1/4] config: split out config_parse_options Josh Steadmon
@ 2023-08-23 21:53   ` Josh Steadmon
  2023-08-24  1:19     ` Junio C Hamano
  2023-08-23 21:53   ` [PATCH v2 3/4] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-08-23 21:53 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

In a subsequent commit, config parsing will become its own library, and
it's likely that the caller will want flexibility in handling errors
(instead of being limited to the error handling we have in-tree).

Move the Git-specific error handling into a config_parser_event_fn_t
that responds to config errors, and make git_parse_source() always
return -1 (careful inspection shows that it was always returning -1
already). This makes CONFIG_ERROR_SILENT obsolete since that is
equivalent to not specifying an error event listener. Also, remove
CONFIG_ERROR_UNSET and the config_source 'default', since all callers
are now expected to specify the error handling they want.
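
As an illustration only (not part of this patch), a library caller that
wants neither die() nor error() could supply its own listener that
stores the message for later. The sketch below is standalone and uses
hypothetical stand-ins (sketch_event, sketch_source, sketch_error,
collect_error_fn) that merely mirror the shape of
config_parser_event_fn_t; it assumes the source exposes a name and a
line number, as struct config_source does in config.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins mirroring the parser's event API. */
enum sketch_event {
	SKETCH_EVENT_ENTRY,
	SKETCH_EVENT_EOF,
	SKETCH_EVENT_ERROR
};

struct sketch_source {
	const char *name;
	int linenr;
};

/* Caller-owned buffer that receives the error instead of stderr. */
struct sketch_error {
	char msg[128];
	int seen;
};

/*
 * Event listener: ignore everything except errors, and record the
 * error message in the caller's buffer rather than printing or dying.
 */
static int collect_error_fn(enum sketch_event type, size_t begin_offset,
			    size_t end_offset, struct sketch_source *cs,
			    void *data)
{
	struct sketch_error *err = data;

	(void)begin_offset;
	(void)end_offset;
	assert(cs && err);

	if (type != SKETCH_EVENT_ERROR)
		return 0;

	snprintf(err->msg, sizeof(err->msg), "bad config line %d in %s",
		 cs->linenr, cs->name);
	err->seen = 1;
	return -1;
}
```

The caller would pass such a function as event_fn and a pointer to its
own error struct as event_fn_data, then inspect the struct after
parsing returns -1.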

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config.c           | 175 ++++++++++++++++++++++++++-------------------
 config.h           |  20 ++++--
 fsck.c             |   4 +-
 submodule-config.c |   9 ++-
 6 files changed, 129 insertions(+), 87 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 1c75cbc43d..e2cf49de7a 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -42,7 +42,9 @@ static int actions, type;
 static char *default_value;
 static int end_nul;
 static int respect_includes_opt = -1;
-static struct config_options config_options;
+static struct config_options config_options = {
+	.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE)
+};
 static int show_origin;
 static int show_scope;
 static int fixed_value;
diff --git a/bundle-uri.c b/bundle-uri.c
index f93ca6a486..856bffdcad 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_parse_options opts = {
-		.error_action = CONFIG_ERROR_ERROR,
-	};
+	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	if (!list->baseURI) {
 		struct strbuf baseURI = STRBUF_INIT;
diff --git a/config.c b/config.c
index 1518f70fc2..6cf4dafc6c 100644
--- a/config.c
+++ b/config.c
@@ -55,7 +55,6 @@ struct config_source {
 	enum config_origin_type origin_type;
 	const char *name;
 	const char *path;
-	enum config_error_action default_error_action;
 	int linenr;
 	int eof;
 	size_t total_len;
@@ -185,13 +184,15 @@ static int handle_path_include(const struct key_value_info *kvi,
 	}
 
 	if (!access_or_die(path, R_OK, 0)) {
+		struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 		if (++inc->depth > MAX_INCLUDE_DEPTH)
 			die(_(include_depth_advice), MAX_INCLUDE_DEPTH, path,
 			    !kvi ? "<unknown>" :
 			    kvi->filename ? kvi->filename :
 			    "the command line");
 		ret = git_config_from_file_with_options(git_config_include, path, inc,
-							kvi->scope, NULL);
+							kvi->scope, &config_opts);
 		inc->depth--;
 	}
 cleanup:
@@ -339,7 +340,9 @@ static int add_remote_url(const char *var, const char *value,
 
 static void populate_remote_urls(struct config_include_data *inc)
 {
-	struct config_options opts;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts = *inc->opts;
 	opts.unconditional_remote_url = 1;
@@ -1028,6 +1031,56 @@ static void kvi_from_source(struct config_source *cs,
 	out->path = cs->path;
 }
 
+int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
+		      size_t end_offset UNUSED, struct config_source *cs,
+		      void *data)
+{
+	char *error_msg = NULL;
+	int error_return = 0;
+	enum config_error_action *action = data;
+
+	if (type != CONFIG_EVENT_ERROR)
+		return 0;
+
+	switch (cs->origin_type) {
+	case CONFIG_ORIGIN_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in blob %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_FILE:
+		error_msg = xstrfmt(_("bad config line %d in file %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_STDIN:
+		error_msg = xstrfmt(_("bad config line %d in standard input"),
+				      cs->linenr);
+		break;
+	case CONFIG_ORIGIN_SUBMODULE_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
+				       cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_CMDLINE:
+		error_msg = xstrfmt(_("bad config line %d in command line %s"),
+				       cs->linenr, cs->name);
+		break;
+	default:
+		error_msg = xstrfmt(_("bad config line %d in %s"),
+				      cs->linenr, cs->name);
+	}
+
+	switch (*action) {
+	case CONFIG_ERROR_DIE:
+		die("%s", error_msg);
+		break;
+	case CONFIG_ERROR_ERROR:
+		error_return = error("%s", error_msg);
+		break;
+	}
+
+	free(error_msg);
+	return error_return;
+}
+
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
 			    const struct config_parse_options *opts)
@@ -1035,8 +1088,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 	int comment = 0;
 	size_t baselen = 0;
 	struct strbuf *var = &cs->var;
-	int error_return = 0;
-	char *error_msg = NULL;
 
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
@@ -1118,53 +1169,14 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			break;
 	}
 
-	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
-		return -1;
-
-	switch (cs->origin_type) {
-	case CONFIG_ORIGIN_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in blob %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_FILE:
-		error_msg = xstrfmt(_("bad config line %d in file %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_STDIN:
-		error_msg = xstrfmt(_("bad config line %d in standard input"),
-				      cs->linenr);
-		break;
-	case CONFIG_ORIGIN_SUBMODULE_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
-				       cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_CMDLINE:
-		error_msg = xstrfmt(_("bad config line %d in command line %s"),
-				       cs->linenr, cs->name);
-		break;
-	default:
-		error_msg = xstrfmt(_("bad config line %d in %s"),
-				      cs->linenr, cs->name);
-	}
-
-	switch (opts && opts->error_action ?
-		opts->error_action :
-		cs->default_error_action) {
-	case CONFIG_ERROR_DIE:
-		die("%s", error_msg);
-		break;
-	case CONFIG_ERROR_ERROR:
-		error_return = error("%s", error_msg);
-		break;
-	case CONFIG_ERROR_SILENT:
-		error_return = -1;
-		break;
-	case CONFIG_ERROR_UNSET:
-		BUG("config error action unset");
-	}
-
-	free(error_msg);
-	return error_return;
+	/*
+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
+	 * in order for our callback to receive the error event, we have to call
+	 * do_event twice
+	 */
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
 }
 
 static uintmax_t get_unit_factor(const char *end)
@@ -2001,7 +2013,6 @@ static int do_config_from_file(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = path;
-	top.default_error_action = CONFIG_ERROR_DIE;
 	top.do_fgetc = config_file_fgetc;
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
@@ -2015,8 +2026,10 @@ static int do_config_from_file(config_fn_t fn,
 static int git_config_from_stdin(config_fn_t fn, void *data,
 				 enum config_scope scope)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, NULL);
+				   data, scope, &config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2039,8 +2052,10 @@ int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return git_config_from_file_with_options(fn, filename, data,
-						 CONFIG_SCOPE_UNKNOWN, NULL);
+						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
 int git_config_from_mem(config_fn_t fn,
@@ -2057,7 +2072,6 @@ int git_config_from_mem(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = NULL;
-	top.default_error_action = CONFIG_ERROR_ERROR;
 	top.do_fgetc = config_buf_fgetc;
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
@@ -2076,6 +2090,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	char *buf;
 	unsigned long size;
 	int ret;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
@@ -2086,7 +2101,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	}
 
 	ret = git_config_from_mem(fn, CONFIG_ORIGIN_BLOB, name, buf, size,
-				  data, scope, NULL);
+				  data, scope, &config_opts);
 	free(buf);
 
 	return ret;
@@ -2187,29 +2202,32 @@ static int do_git_config_sequence(const struct config_options *opts,
 			   opts->system_gently ? ACCESS_EACCES_OK : 0))
 		ret += git_config_from_file_with_options(fn, system_config,
 							 data, CONFIG_SCOPE_SYSTEM,
-							 NULL);
+							 &opts->parse_options);
 
 	git_global_config(&user_config, &xdg_config);
 
 	if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, xdg_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, user_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_repo && repo_config &&
 	    !access_or_die(repo_config, R_OK, 0))
 		ret += git_config_from_file_with_options(fn, repo_config, data,
-							 CONFIG_SCOPE_LOCAL, NULL);
+							 CONFIG_SCOPE_LOCAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_worktree && worktree_config &&
 	    repo && repo->repository_format_worktree_config &&
 	    !access_or_die(worktree_config, R_OK, 0)) {
 			ret += git_config_from_file_with_options(fn, worktree_config, data,
 								 CONFIG_SCOPE_WORKTREE,
-								 NULL);
+								 &opts->parse_options);
 	}
 
 	if (!opts->ignore_cmdline && git_config_from_parameters(fn, data) < 0)
@@ -2250,7 +2268,7 @@ int config_with_options(config_fn_t fn, void *data,
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-							NULL);
+							&opts->parse_options);
 	} else if (config_source && config_source->blob) {
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 					       data, config_source->scope);
@@ -2288,9 +2306,11 @@ static void configset_iter(struct config_set *set, config_fn_t fn, void *data)
 
 void read_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = {0};
 	struct strbuf commondir = STRBUF_INIT;
 	struct strbuf gitdir = STRBUF_INIT;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 
@@ -2322,7 +2342,9 @@ void read_early_config(config_fn_t cb, void *data)
  */
 void read_very_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = { 0 };
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 	opts.ignore_repo = 1;
@@ -2613,7 +2635,9 @@ int git_configset_get_pathname(struct config_set *set, const char *key, const ch
 /* Functions use to read configuration from a repository */
 static void repo_read_config(struct repository *repo)
 {
-	struct config_options opts = { 0 };
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 	opts.commondir = repo->commondir;
@@ -2760,12 +2784,14 @@ int repo_config_get_pathname(struct repository *repo,
 static void read_protected_config(void)
 {
 	struct config_options opts = {
-		.respect_includes = 1,
-		.ignore_repo = 1,
-		.ignore_worktree = 1,
-		.system_gently = 1,
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
 	};
 
+	opts.respect_includes = 1;
+	opts.ignore_repo = 1;
+	opts.ignore_worktree = 1;
+	opts.system_gently = 1;
+
 	git_configset_init(&protected_config);
 	config_with_options(config_set_callback, &protected_config, NULL,
 			    NULL, &opts);
@@ -2976,6 +3002,7 @@ struct config_store_data {
 		enum config_event_t type;
 		int is_keys_section;
 	} *parsed;
+	enum config_error_action error_action;
 	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
@@ -3043,6 +3070,10 @@ static int store_aux_event(enum config_event_t type, size_t begin, size_t end,
 			store->seen[store->seen_nr] = store->parsed_nr;
 		}
 	}
+	if (type == CONFIG_EVENT_ERROR) {
+		return git_config_err_fn(type, begin, end, cs,
+					 &store->error_action);
+	}
 
 	store->parsed_nr++;
 
@@ -3380,7 +3411,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_parse_options opts;
+		struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
@@ -3407,8 +3438,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
 		store.parsed[0].end = 0;
+		store.error_action = CONFIG_ERROR_DIE;
 
-		memset(&opts, 0, sizeof(opts));
 		opts.event_fn = store_aux_event;
 		opts.event_fn_data = &store;
 
diff --git a/config.h b/config.h
index 2537516446..8ad399580f 100644
--- a/config.h
+++ b/config.h
@@ -86,12 +86,6 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					void *event_fn_data);
 
 struct config_parse_options {
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 	/*
 	 * event_fn and event_fn_data are for internal use only. Handles events
 	 * emitted by the config parser.
@@ -100,6 +94,11 @@ struct config_parse_options {
 	void *event_fn_data;
 };
 
+#define CP_OPTS_INIT(error_action) { \
+	.event_fn = git_config_err_fn, \
+	.event_fn_data = (enum config_error_action []){(error_action)}, \
+}
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -119,6 +118,15 @@ struct config_options {
 	unsigned int unconditional_remote_url : 1;
 };
 
+enum config_error_action {
+	CONFIG_ERROR_DIE, /* die() on error */
+	CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+};
+
+int git_config_err_fn(enum config_event_t type, size_t begin_offset,
+		      size_t end_offset, struct config_source *cs,
+		      void *event_fn_data);
+
 /* Config source metadata for a given config key-value pair */
 struct key_value_info {
 	const char *filename;
diff --git a/fsck.c b/fsck.c
index 522ee1c18a..bc0ca11421 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,6 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
@@ -1238,10 +1237,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		data.oid = oid;
 		data.options = options;
 		data.ret = 0;
-		config_opts.error_action = CONFIG_ERROR_SILENT;
 		if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 					".gitmodules", buf, size, &data,
-					CONFIG_SCOPE_UNKNOWN, &config_opts))
+					CONFIG_SCOPE_UNKNOWN, NULL))
 			data.ret |= report(options, oid, OBJ_BLOB,
 					FSCK_MSG_GITMODULES_PARSE,
 					"could not parse gitmodules blob");
diff --git a/submodule-config.c b/submodule-config.c
index b6908e295f..d97135c917 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -565,6 +565,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	enum object_type type;
 	const struct submodule *submodule = NULL;
 	struct parse_config_parameter parameter;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
+
 
 	/*
 	 * If any parameter except the cache is a NULL pointer just
@@ -608,7 +610,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	parameter.gitmodules_oid = &oid;
 	parameter.overwrite = 0;
 	git_config_from_mem(parse_config, CONFIG_ORIGIN_SUBMODULE_BLOB, rev.buf,
-			    config, config_size, &parameter, CONFIG_SCOPE_UNKNOWN, NULL);
+			    config, config_size, &parameter,
+			    CONFIG_SCOPE_UNKNOWN, &config_opts);
 	strbuf_release(&rev);
 	free(config);
 
@@ -652,7 +655,9 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 		struct git_config_source config_source = {
 			0, .scope = CONFIG_SCOPE_SUBMODULE
 		};
-		const struct config_options opts = { 0 };
+		struct config_options opts = {
+			.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+		};
 		struct object_id oid;
 		char *file;
 		char *oidstr = NULL;
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v2 3/4] config.c: accept config_parse_options in git_config_from_stdin
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
  2023-08-23 21:53   ` [PATCH v2 1/4] config: split out config_parse_options Josh Steadmon
  2023-08-23 21:53   ` [PATCH v2 2/4] config: report config parse errors using cb Josh Steadmon
@ 2023-08-23 21:53   ` Josh Steadmon
  2023-08-23 21:53   ` [PATCH v2 4/4] config-parse: split library out of config.[c|h] Josh Steadmon
  2023-08-24 20:10   ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
  4 siblings, 0 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-08-23 21:53 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

A later commit will move git_config_from_stdin() to a library, so it
will need to accept event listeners.
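
For readers unfamiliar with the event listener mechanism, below is a
minimal, self-contained sketch (not code from this series) of a parser
that takes a caller-supplied listener through an options struct. The
names event_type, parse_options, emit_event(), count_errors() and
parse_and_fail() are made up for illustration. Only the "report the
previous event once the next offset is known" shape is modeled on
do_event(), including the need to emit the error event twice.

```c
#include <stddef.h>

/* Stand-ins for the types in config.h, trimmed for illustration. */
enum event_type { EV_SECTION, EV_ENTRY, EV_EOF, EV_ERROR };

typedef int (*event_fn_t)(enum event_type type, size_t begin, size_t end,
			  void *event_fn_data);

struct parse_options {
	event_fn_t event_fn;
	void *event_fn_data;
};

struct event_state {
	enum event_type previous_type; /* EV_EOF means "nothing pending" */
	size_t previous_offset;
	const struct parse_options *opts;
};

/*
 * Same shape as do_event(): the listener is handed the _previous_
 * event, because its end offset is only known when the next one starts.
 */
static int emit_event(struct event_state *st, enum event_type type,
		      size_t offset)
{
	if (!st->opts || !st->opts->event_fn)
		return 0;
	if (st->previous_type != EV_EOF &&
	    st->opts->event_fn(st->previous_type, st->previous_offset,
			       offset, st->opts->event_fn_data) < 0)
		return -1;
	st->previous_type = type;
	st->previous_offset = offset;
	return 0;
}

/* Hypothetical listener: counts the error events it receives. */
static int count_errors(enum event_type type, size_t begin, size_t end,
			void *event_fn_data)
{
	(void)begin;
	(void)end;
	if (type == EV_ERROR)
		++*(int *)event_fn_data;
	return 0;
}

/* Hypothetical parser that sees one entry and then hits bad input. */
static int parse_and_fail(const struct parse_options *opts)
{
	struct event_state st = { EV_EOF, 0, opts };

	emit_event(&st, EV_ENTRY, 0);
	emit_event(&st, EV_ERROR, 5); /* delivers the pending EV_ENTRY */
	emit_event(&st, EV_ERROR, 5); /* delivers EV_ERROR itself */
	return -1;
}
```

Calling parse_and_fail() with options whose event_fn is count_errors()
and whose event_fn_data points at an int leaves that int incremented
once. Passing NULL options skips the listener entirely, which is why
the function that owns the options has to be the one to decide what
happens on error.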

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 config.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/config.c b/config.c
index 6cf4dafc6c..40cc9dbc40 100644
--- a/config.c
+++ b/config.c
@@ -2024,12 +2024,11 @@ static int do_config_from_file(config_fn_t fn,
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope)
+				 enum config_scope scope,
+				 const struct config_parse_options *config_opts)
 {
-	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
-
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, &config_opts);
+				   data, scope, config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2264,7 +2263,8 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		ret = git_config_from_stdin(fn, data, config_source->scope);
+		ret = git_config_from_stdin(fn, data, config_source->scope,
+					    &opts->parse_options);
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v2 4/4] config-parse: split library out of config.[c|h]
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
                     ` (2 preceding siblings ...)
  2023-08-23 21:53   ` [PATCH v2 3/4] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
@ 2023-08-23 21:53   ` Josh Steadmon
  2023-08-24 20:10   ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
  4 siblings, 0 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-08-23 21:53 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

The config parsing machinery (besides "include" directives) is usable by
programs other than Git - it works with any file written in Git config
syntax (IOW it doesn't rely on 'core' Git features like a repository),
and as of the series ending at 6e8e7981eb (config: pass source to
config_parser_event_fn_t, 2023-06-28), it no longer relies on global
state. Thus, we can and should start turning it into a library other
programs can use.

Begin this process by splitting the config parsing code out of
config.[c|h] and into config-parse.[c|h]. Do not change interfaces or
function bodies, but tweak visibility and includes where appropriate,
namely:

- git_config_from_stdin() is now non-static so that it can be seen by
  config.c.

- "struct config_source" is now defined in the .h file so that it can be
  seen by config.c. And as a result, config-parse.h needs to "#include
  strbuf.h".

In theory, this makes it possible for in-tree files to decide whether
they need all of the config functionality or only config parsing,
and bring in the smallest bit of functionality needed. But for now,
there are no in-tree files that can swap "#include config.h" for
"#include config-parse.h". E.g. Bundle URIs would only need config
parsing to parse bundle lists, but bundle-uri.c uses other config.h
functionality like key parsing and reading repo settings.

The resulting library is usable, though not ergonomic, e.g. the
caller needs to "#include git-compat-util.h" and other
dependencies, and we don't have an easy way of linking in the required
objects. This isn't the end state we want for our libraries, but at
least we have _some_ library whose usability we can improve in future
series.
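
To give a feel for what a caller has to supply, here is a hedged,
self-contained sketch of a config callback. The struct and typedef
definitions are trimmed stand-ins for the ones in config-parse.h so
that the snippet compiles on its own. demo_settings and
demo_config_cb() are hypothetical, and the boolean handling is
deliberately simpler than what Git itself accepts.

```c
#include <stdio.h>
#include <string.h>

/* Trimmed stand-ins for the declarations in config-parse.h. */
struct key_value_info {
	const char *filename;
	int linenr;
};

struct config_context {
	const struct key_value_info *kvi;
};

typedef int (*config_fn_t)(const char *, const char *,
			   const struct config_context *, void *);

/* Hypothetical caller state: remembers a single setting. */
struct demo_settings {
	int ignorecase;
	int seen_at_line;
};

/* Returns 0 on success, -1 if the value cannot be interpreted. */
static int demo_config_cb(const char *var, const char *value,
			  const struct config_context *ctx, void *data)
{
	struct demo_settings *s = data;

	/* Keys arrive in flat, lowercased "section.name" form. */
	if (strcmp(var, "core.ignorecase"))
		return 0; /* not a key we care about */

	if (!value || !strcmp(value, "true")) {
		/* A NULL value means the key was given without "=". */
		s->ignorecase = 1;
	} else if (!strcmp(value, "false")) {
		s->ignorecase = 0;
	} else {
		fprintf(stderr, "bad value '%s' for %s at %s:%d\n",
			value, var, ctx->kvi->filename, ctx->kvi->linenr);
		return -1;
	}
	s->seen_at_line = ctx->kvi->linenr;
	return 0;
}
```

With the real headers, a function of this shape would be passed as the
config_fn_t argument of git_config_from_file_with_options() or
git_config_from_mem(), with a pointer to the settings struct as the
data argument.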

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Makefile       |   1 +
 config-parse.c | 561 +++++++++++++++++++++++++++++++++++++++++++++++
 config-parse.h | 155 +++++++++++++
 config.c       | 582 -------------------------------------------------
 config.h       | 119 +---------
 5 files changed, 718 insertions(+), 700 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

diff --git a/Makefile b/Makefile
index fb541dedc9..67e05bcee5 100644
--- a/Makefile
+++ b/Makefile
@@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += compat/zlib-uncompress2.o
 LIB_OBJS += config.o
+LIB_OBJS += config-parse.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/config-parse.c b/config-parse.c
new file mode 100644
index 0000000000..97ebd6d72b
--- /dev/null
+++ b/config-parse.c
@@ -0,0 +1,561 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "gettext.h"
+#include "hashmap.h"
+#include "utf8.h"
+#include "config-parse.h"
+
+static int config_file_fgetc(struct config_source *conf)
+{
+	return getc_unlocked(conf->u.file);
+}
+
+static int config_file_ungetc(int c, struct config_source *conf)
+{
+	return ungetc(c, conf->u.file);
+}
+
+static long config_file_ftell(struct config_source *conf)
+{
+	return ftell(conf->u.file);
+}
+
+
+static int config_buf_fgetc(struct config_source *conf)
+{
+	if (conf->u.buf.pos < conf->u.buf.len)
+		return conf->u.buf.buf[conf->u.buf.pos++];
+
+	return EOF;
+}
+
+static int config_buf_ungetc(int c, struct config_source *conf)
+{
+	if (conf->u.buf.pos > 0) {
+		conf->u.buf.pos--;
+		if (conf->u.buf.buf[conf->u.buf.pos] != c)
+			BUG("config_buf can only ungetc the same character");
+		return c;
+	}
+
+	return EOF;
+}
+
+static long config_buf_ftell(struct config_source *conf)
+{
+	return conf->u.buf.pos;
+}
+
+static inline int iskeychar(int c)
+{
+	return isalnum(c) || c == '-';
+}
+
+/*
+ * Auxiliary function to sanity-check and split the key into the section
+ * identifier and variable name.
+ *
+ * Returns 0 on success, -CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and -CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
+ *
+ * store_key - pointer to char* which will hold a copy of the key with
+ *             lowercase section and variable name
+ * baselen - pointer to size_t which will hold the length of the
+ *           section + subsection part, can be NULL
+ */
+int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
+{
+	size_t i, baselen;
+	int dot;
+	const char *last_dot = strrchr(key, '.');
+
+	/*
+	 * Since "key" actually contains the section name and the real
+	 * key name separated by a dot, we have to know where the dot is.
+	 */
+
+	if (last_dot == NULL || last_dot == key) {
+		error(_("key does not contain a section: %s"), key);
+		return -CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	if (!last_dot[1]) {
+		error(_("key does not contain variable name: %s"), key);
+		return -CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	baselen = last_dot - key;
+	if (baselen_)
+		*baselen_ = baselen;
+
+	/*
+	 * Validate the key and while at it, lower case it for matching.
+	 */
+	*store_key = xmallocz(strlen(key));
+
+	dot = 0;
+	for (i = 0; key[i]; i++) {
+		unsigned char c = key[i];
+		if (c == '.')
+			dot = 1;
+		/* Leave the extended basename untouched.. */
+		if (!dot || i > baselen) {
+			if (!iskeychar(c) ||
+			    (i == baselen + 1 && !isalpha(c))) {
+				error(_("invalid key: %s"), key);
+				goto out_free_ret_1;
+			}
+			c = tolower(c);
+		} else if (c == '\n') {
+			error(_("invalid key (newline): %s"), key);
+			goto out_free_ret_1;
+		}
+		(*store_key)[i] = c;
+	}
+
+	return 0;
+
+out_free_ret_1:
+	FREE_AND_NULL(*store_key);
+	return -CONFIG_INVALID_KEY;
+}
+
+static int get_next_char(struct config_source *cs)
+{
+	int c = cs->do_fgetc(cs);
+
+	if (c == '\r') {
+		/* DOS like systems */
+		c = cs->do_fgetc(cs);
+		if (c != '\n') {
+			if (c != EOF)
+				cs->do_ungetc(c, cs);
+			c = '\r';
+		}
+	}
+
+	if (c != EOF && ++cs->total_len > INT_MAX) {
+		/*
+		 * This is an absurdly long config file; refuse to parse
+		 * further in order to protect downstream code from integer
+		 * overflows. Note that we can't return an error specifically,
+		 * but we can mark EOF and put trash in the return value,
+		 * which will trigger a parse error.
+		 */
+		cs->eof = 1;
+		return 0;
+	}
+
+	if (c == '\n')
+		cs->linenr++;
+	if (c == EOF) {
+		cs->eof = 1;
+		cs->linenr++;
+		c = '\n';
+	}
+	return c;
+}
+
+static char *parse_value(struct config_source *cs)
+{
+	int quote = 0, comment = 0, space = 0;
+
+	strbuf_reset(&cs->value);
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n') {
+			if (quote) {
+				cs->linenr--;
+				return NULL;
+			}
+			return cs->value.buf;
+		}
+		if (comment)
+			continue;
+		if (isspace(c) && !quote) {
+			if (cs->value.len)
+				space++;
+			continue;
+		}
+		if (!quote) {
+			if (c == ';' || c == '#') {
+				comment = 1;
+				continue;
+			}
+		}
+		for (; space; space--)
+			strbuf_addch(&cs->value, ' ');
+		if (c == '\\') {
+			c = get_next_char(cs);
+			switch (c) {
+			case '\n':
+				continue;
+			case 't':
+				c = '\t';
+				break;
+			case 'b':
+				c = '\b';
+				break;
+			case 'n':
+				c = '\n';
+				break;
+			/* Some characters escape as themselves */
+			case '\\': case '"':
+				break;
+			/* Reject unknown escape sequences */
+			default:
+				return NULL;
+			}
+			strbuf_addch(&cs->value, c);
+			continue;
+		}
+		if (c == '"') {
+			quote = 1-quote;
+			continue;
+		}
+		strbuf_addch(&cs->value, c);
+	}
+}
+
+static int get_value(struct config_source *cs, struct key_value_info *kvi,
+		     config_fn_t fn, void *data, struct strbuf *name)
+{
+	int c;
+	char *value;
+	int ret;
+	struct config_context ctx = {
+		.kvi = kvi,
+	};
+
+	/* Get the full name */
+	for (;;) {
+		c = get_next_char(cs);
+		if (cs->eof)
+			break;
+		if (!iskeychar(c))
+			break;
+		strbuf_addch(name, tolower(c));
+	}
+
+	while (c == ' ' || c == '\t')
+		c = get_next_char(cs);
+
+	value = NULL;
+	if (c != '\n') {
+		if (c != '=')
+			return -1;
+		value = parse_value(cs);
+		if (!value)
+			return -1;
+	}
+	/*
+	 * We already consumed the \n, but we need linenr to point to
+	 * the line we just parsed during the call to fn to get
+	 * accurate line number in error messages.
+	 */
+	cs->linenr--;
+	kvi->linenr = cs->linenr;
+	ret = fn(name->buf, value, &ctx, data);
+	if (ret >= 0)
+		cs->linenr++;
+	return ret;
+}
+
+static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
+				 int c)
+{
+	cs->subsection_case_sensitive = 0;
+	do {
+		if (c == '\n')
+			goto error_incomplete_line;
+		c = get_next_char(cs);
+	} while (isspace(c));
+
+	/* We require the format to be '[base "extension"]' */
+	if (c != '"')
+		return -1;
+	strbuf_addch(name, '.');
+
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n')
+			goto error_incomplete_line;
+		if (c == '"')
+			break;
+		if (c == '\\') {
+			c = get_next_char(cs);
+			if (c == '\n')
+				goto error_incomplete_line;
+		}
+		strbuf_addch(name, c);
+	}
+
+	/* Final ']' */
+	if (get_next_char(cs) != ']')
+		return -1;
+	return 0;
+error_incomplete_line:
+	cs->linenr--;
+	return -1;
+}
+
+static int get_base_var(struct config_source *cs, struct strbuf *name)
+{
+	cs->subsection_case_sensitive = 1;
+	for (;;) {
+		int c = get_next_char(cs);
+		if (cs->eof)
+			return -1;
+		if (c == ']')
+			return 0;
+		if (isspace(c))
+			return get_extended_base_var(cs, name, c);
+		if (!iskeychar(c) && c != '.')
+			return -1;
+		strbuf_addch(name, tolower(c));
+	}
+}
+
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_parse_options *opts;
+};
+
+static int do_event(struct config_source *cs, enum config_event_t type,
+		    struct parse_event_data *data)
+{
+	size_t offset;
+
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	offset = cs->do_ftell(cs);
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 offset, cs, data->opts->event_fn_data) < 0)
+		return -1;
+
+	data->previous_type = type;
+	data->previous_offset = offset;
+
+	return 0;
+}
+
+static void kvi_from_source(struct config_source *cs,
+			    enum config_scope scope,
+			    struct key_value_info *out)
+{
+	out->filename = strintern(cs->name);
+	out->origin_type = cs->origin_type;
+	out->linenr = cs->linenr;
+	out->scope = scope;
+	out->path = cs->path;
+}
+
+static int git_parse_source(struct config_source *cs, config_fn_t fn,
+			    struct key_value_info *kvi, void *data,
+			    const struct config_parse_options *opts)
+{
+	int comment = 0;
+	size_t baselen = 0;
+	struct strbuf *var = &cs->var;
+
+	/* U+FEFF Byte Order Mark in UTF8 */
+	const char *bomptr = utf8_bom;
+
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
+	for (;;) {
+		int c;
+
+		c = get_next_char(cs);
+		if (bomptr && *bomptr) {
+			/* We are at the file beginning; skip UTF8-encoded BOM
+			 * if present. Sane editors won't put this in on their
+			 * own, but e.g. Windows Notepad will do it happily. */
+			if (c == (*bomptr & 0377)) {
+				bomptr++;
+				continue;
+			} else {
+				/* Do not tolerate partial BOM. */
+				if (bomptr != utf8_bom)
+					break;
+				/* No BOM at file beginning. Cool. */
+				bomptr = NULL;
+			}
+		}
+		if (c == '\n') {
+			if (cs->eof) {
+				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
+				return 0;
+			}
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
+			comment = 0;
+			continue;
+		}
+		if (comment)
+			continue;
+		if (isspace(c)) {
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
+		if (c == '#' || c == ';') {
+			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
+			comment = 1;
+			continue;
+		}
+		if (c == '[') {
+			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
+			/* Reset prior to determining a new stem */
+			strbuf_reset(var);
+			if (get_base_var(cs, var) < 0 || var->len < 1)
+				break;
+			strbuf_addch(var, '.');
+			baselen = var->len;
+			continue;
+		}
+		if (!isalpha(c))
+			break;
+
+		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
+		/*
+		 * Truncate the var name back to the section header
+		 * stem prior to grabbing the suffix part of the name
+		 * and the value.
+		 */
+		strbuf_setlen(var, baselen);
+		strbuf_addch(var, tolower(c));
+		if (get_value(cs, kvi, fn, data, var) < 0)
+			break;
+	}
+	/*
+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
+	 * in order for our callback to receive the error event, we have to call
+	 * do_event twice
+	 */
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
+}
+
+/*
+ * All source specific fields in the union, die_on_error, name and the callbacks
+ * fgetc, ungetc, ftell of top need to be initialized before calling
+ * this function.
+ */
+static int do_config_from(struct config_source *top, config_fn_t fn,
+			  void *data, enum config_scope scope,
+			  const struct config_parse_options *opts)
+{
+	struct key_value_info kvi = KVI_INIT;
+	int ret;
+
+	/* push config-file parsing state stack */
+	top->linenr = 1;
+	top->eof = 0;
+	top->total_len = 0;
+	strbuf_init(&top->value, 1024);
+	strbuf_init(&top->var, 1024);
+	kvi_from_source(top, scope, &kvi);
+
+	ret = git_parse_source(top, fn, &kvi, data, opts);
+
+	strbuf_release(&top->value);
+	strbuf_release(&top->var);
+
+	return ret;
+}
+
+static int do_config_from_file(config_fn_t fn,
+			       const enum config_origin_type origin_type,
+			       const char *name, const char *path, FILE *f,
+			       void *data, enum config_scope scope,
+			       const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+	int ret;
+
+	top.u.file = f;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = path;
+	top.do_fgetc = config_file_fgetc;
+	top.do_ungetc = config_file_ungetc;
+	top.do_ftell = config_file_ftell;
+
+	flockfile(f);
+	ret = do_config_from(&top, fn, data, scope, opts);
+	funlockfile(f);
+	return ret;
+}
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts)
+{
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, scope, config_opts);
+}
+
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data, enum config_scope scope,
+				      const struct config_parse_options *opts)
+{
+	int ret = -1;
+	FILE *f;
+
+	if (!filename)
+		BUG("filename cannot be NULL");
+	f = fopen_or_warn(filename, "r");
+	if (f) {
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, scope, opts);
+		fclose(f);
+	}
+	return ret;
+}
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type origin_type,
+			const char *name, const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+
+	top.u.buf.buf = buf;
+	top.u.buf.len = len;
+	top.u.buf.pos = 0;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = NULL;
+	top.do_fgetc = config_buf_fgetc;
+	top.do_ungetc = config_buf_ungetc;
+	top.do_ftell = config_buf_ftell;
+
+	return do_config_from(&top, fn, data, scope, opts);
+}
diff --git a/config-parse.h b/config-parse.h
new file mode 100644
index 0000000000..ac73a826d9
--- /dev/null
+++ b/config-parse.h
@@ -0,0 +1,155 @@
+/*
+ * Low level config parsing.
+ */
+#ifndef CONFIG_PARSE_H
+#define CONFIG_PARSE_H
+
+#include "strbuf.h"
+
+/* git_config_parse_key() returns these negated: */
+#define CONFIG_INVALID_KEY 1
+#define CONFIG_NO_SECTION_OR_NAME 2
+
+int git_config_parse_key(const char *, char **, size_t *);
+
+enum config_scope {
+	CONFIG_SCOPE_UNKNOWN = 0,
+	CONFIG_SCOPE_SYSTEM,
+	CONFIG_SCOPE_GLOBAL,
+	CONFIG_SCOPE_LOCAL,
+	CONFIG_SCOPE_WORKTREE,
+	CONFIG_SCOPE_COMMAND,
+	CONFIG_SCOPE_SUBMODULE,
+};
+const char *config_scope_name(enum config_scope scope);
+
+enum config_origin_type {
+	CONFIG_ORIGIN_UNKNOWN = 0,
+	CONFIG_ORIGIN_BLOB,
+	CONFIG_ORIGIN_FILE,
+	CONFIG_ORIGIN_STDIN,
+	CONFIG_ORIGIN_SUBMODULE_BLOB,
+	CONFIG_ORIGIN_CMDLINE
+};
+
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+struct config_source;
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					struct config_source *cs,
+					void *event_fn_data);
+
+struct config_parse_options {
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
+struct config_source {
+	struct config_source *prev;
+	union {
+		FILE *file;
+		struct config_buf {
+			const char *buf;
+			size_t len;
+			size_t pos;
+		} buf;
+	} u;
+	enum config_origin_type origin_type;
+	const char *name;
+	const char *path;
+	int linenr;
+	int eof;
+	size_t total_len;
+	struct strbuf value;
+	struct strbuf var;
+	unsigned subsection_case_sensitive : 1;
+
+	int (*do_fgetc)(struct config_source *c);
+	int (*do_ungetc)(int c, struct config_source *conf);
+	long (*do_ftell)(struct config_source *c);
+};
+#define CONFIG_SOURCE_INIT { 0 }
+
+/* Config source metadata for a given config key-value pair */
+struct key_value_info {
+	const char *filename;
+	int linenr;
+	enum config_origin_type origin_type;
+	enum config_scope scope;
+	const char *path;
+};
+#define KVI_INIT { \
+	.filename = NULL, \
+	.linenr = -1, \
+	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
+	.scope = CONFIG_SCOPE_UNKNOWN, \
+	.path = NULL, \
+}
+
+/* Captures additional information that a config callback can use. */
+struct config_context {
+	/* Config source metadata for key and value. */
+	const struct key_value_info *kvi;
+};
+#define CONFIG_CONTEXT_INIT { 0 }
+
+/**
+ * A config callback function takes four parameters:
+ *
+ * - the name of the parsed variable. This is in canonical "flat" form: the
+ *   section, subsection, and variable segments will be separated by dots,
+ *   and the section and variable segments will be all lowercase. E.g.,
+ *   `core.ignorecase`, `diff.SomeType.textconv`.
+ *
+ * - the value of the found variable, as a string. If the variable had no
+ *   value specified, the value will be NULL (typically this means it
+ *   should be interpreted as boolean true).
+ *
+ * - the 'config context', that is, additional information about the config
+ *   iteration operation provided by the config machinery. For example, this
+ *   includes information about the config source being parsed (e.g. the
+ *   filename).
+ *
+ * - a void pointer passed in by the caller of the config API; this can
+ *   contain callback-specific data
+ *
+ * A config callback should return 0 for success, or -1 if the variable
+ * could not be parsed properly.
+ */
+typedef int (*config_fn_t)(const char *, const char *,
+			   const struct config_context *, void *);
+
+int git_config_from_file_with_options(config_fn_t fn, const char *,
+				      void *, enum config_scope,
+				      const struct config_parse_options *);
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type,
+			const char *name,
+			const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts);
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts);
+
+#endif /* CONFIG_PARSE_H */
diff --git a/config.c b/config.c
index 40cc9dbc40..787b42c228 100644
--- a/config.c
+++ b/config.c
@@ -42,32 +42,6 @@
 #include "wrapper.h"
 #include "write-or-die.h"
 
-struct config_source {
-	struct config_source *prev;
-	union {
-		FILE *file;
-		struct config_buf {
-			const char *buf;
-			size_t len;
-			size_t pos;
-		} buf;
-	} u;
-	enum config_origin_type origin_type;
-	const char *name;
-	const char *path;
-	int linenr;
-	int eof;
-	size_t total_len;
-	struct strbuf value;
-	struct strbuf var;
-	unsigned subsection_case_sensitive : 1;
-
-	int (*do_fgetc)(struct config_source *c);
-	int (*do_ungetc)(int c, struct config_source *conf);
-	long (*do_ftell)(struct config_source *c);
-};
-#define CONFIG_SOURCE_INIT { 0 }
-
 static int pack_compression_seen;
 static int zlib_compression_seen;
 
@@ -82,47 +56,6 @@ static int zlib_compression_seen;
  */
 static struct config_set protected_config;
 
-static int config_file_fgetc(struct config_source *conf)
-{
-	return getc_unlocked(conf->u.file);
-}
-
-static int config_file_ungetc(int c, struct config_source *conf)
-{
-	return ungetc(c, conf->u.file);
-}
-
-static long config_file_ftell(struct config_source *conf)
-{
-	return ftell(conf->u.file);
-}
-
-
-static int config_buf_fgetc(struct config_source *conf)
-{
-	if (conf->u.buf.pos < conf->u.buf.len)
-		return conf->u.buf.buf[conf->u.buf.pos++];
-
-	return EOF;
-}
-
-static int config_buf_ungetc(int c, struct config_source *conf)
-{
-	if (conf->u.buf.pos > 0) {
-		conf->u.buf.pos--;
-		if (conf->u.buf.buf[conf->u.buf.pos] != c)
-			BUG("config_buf can only ungetc the same character");
-		return c;
-	}
-
-	return EOF;
-}
-
-static long config_buf_ftell(struct config_source *conf)
-{
-	return conf->u.buf.pos;
-}
-
 struct config_include_data {
 	int depth;
 	config_fn_t fn;
@@ -528,80 +461,6 @@ void git_config_push_env(const char *spec)
 	free(key);
 }
 
-static inline int iskeychar(int c)
-{
-	return isalnum(c) || c == '-';
-}
-
-/*
- * Auxiliary function to sanity-check and split the key into the section
- * identifier and variable name.
- *
- * Returns 0 on success, -1 when there is an invalid character in the key and
- * -2 if there is no section name in the key.
- *
- * store_key - pointer to char* which will hold a copy of the key with
- *             lowercase section and variable name
- * baselen - pointer to size_t which will hold the length of the
- *           section + subsection part, can be NULL
- */
-int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
-{
-	size_t i, baselen;
-	int dot;
-	const char *last_dot = strrchr(key, '.');
-
-	/*
-	 * Since "key" actually contains the section name and the real
-	 * key name separated by a dot, we have to know where the dot is.
-	 */
-
-	if (last_dot == NULL || last_dot == key) {
-		error(_("key does not contain a section: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	if (!last_dot[1]) {
-		error(_("key does not contain variable name: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	baselen = last_dot - key;
-	if (baselen_)
-		*baselen_ = baselen;
-
-	/*
-	 * Validate the key and while at it, lower case it for matching.
-	 */
-	*store_key = xmallocz(strlen(key));
-
-	dot = 0;
-	for (i = 0; key[i]; i++) {
-		unsigned char c = key[i];
-		if (c == '.')
-			dot = 1;
-		/* Leave the extended basename untouched.. */
-		if (!dot || i > baselen) {
-			if (!iskeychar(c) ||
-			    (i == baselen + 1 && !isalpha(c))) {
-				error(_("invalid key: %s"), key);
-				goto out_free_ret_1;
-			}
-			c = tolower(c);
-		} else if (c == '\n') {
-			error(_("invalid key (newline): %s"), key);
-			goto out_free_ret_1;
-		}
-		(*store_key)[i] = c;
-	}
-
-	return 0;
-
-out_free_ret_1:
-	FREE_AND_NULL(*store_key);
-	return -CONFIG_INVALID_KEY;
-}
-
 static int config_parse_pair(const char *key, const char *value,
 			     struct key_value_info *kvi,
 			     config_fn_t fn, void *data)
@@ -786,251 +645,6 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 	return ret;
 }
 
-static int get_next_char(struct config_source *cs)
-{
-	int c = cs->do_fgetc(cs);
-
-	if (c == '\r') {
-		/* DOS like systems */
-		c = cs->do_fgetc(cs);
-		if (c != '\n') {
-			if (c != EOF)
-				cs->do_ungetc(c, cs);
-			c = '\r';
-		}
-	}
-
-	if (c != EOF && ++cs->total_len > INT_MAX) {
-		/*
-		 * This is an absurdly long config file; refuse to parse
-		 * further in order to protect downstream code from integer
-		 * overflows. Note that we can't return an error specifically,
-		 * but we can mark EOF and put trash in the return value,
-		 * which will trigger a parse error.
-		 */
-		cs->eof = 1;
-		return 0;
-	}
-
-	if (c == '\n')
-		cs->linenr++;
-	if (c == EOF) {
-		cs->eof = 1;
-		cs->linenr++;
-		c = '\n';
-	}
-	return c;
-}
-
-static char *parse_value(struct config_source *cs)
-{
-	int quote = 0, comment = 0, space = 0;
-
-	strbuf_reset(&cs->value);
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n') {
-			if (quote) {
-				cs->linenr--;
-				return NULL;
-			}
-			return cs->value.buf;
-		}
-		if (comment)
-			continue;
-		if (isspace(c) && !quote) {
-			if (cs->value.len)
-				space++;
-			continue;
-		}
-		if (!quote) {
-			if (c == ';' || c == '#') {
-				comment = 1;
-				continue;
-			}
-		}
-		for (; space; space--)
-			strbuf_addch(&cs->value, ' ');
-		if (c == '\\') {
-			c = get_next_char(cs);
-			switch (c) {
-			case '\n':
-				continue;
-			case 't':
-				c = '\t';
-				break;
-			case 'b':
-				c = '\b';
-				break;
-			case 'n':
-				c = '\n';
-				break;
-			/* Some characters escape as themselves */
-			case '\\': case '"':
-				break;
-			/* Reject unknown escape sequences */
-			default:
-				return NULL;
-			}
-			strbuf_addch(&cs->value, c);
-			continue;
-		}
-		if (c == '"') {
-			quote = 1-quote;
-			continue;
-		}
-		strbuf_addch(&cs->value, c);
-	}
-}
-
-static int get_value(struct config_source *cs, struct key_value_info *kvi,
-		     config_fn_t fn, void *data, struct strbuf *name)
-{
-	int c;
-	char *value;
-	int ret;
-	struct config_context ctx = {
-		.kvi = kvi,
-	};
-
-	/* Get the full name */
-	for (;;) {
-		c = get_next_char(cs);
-		if (cs->eof)
-			break;
-		if (!iskeychar(c))
-			break;
-		strbuf_addch(name, tolower(c));
-	}
-
-	while (c == ' ' || c == '\t')
-		c = get_next_char(cs);
-
-	value = NULL;
-	if (c != '\n') {
-		if (c != '=')
-			return -1;
-		value = parse_value(cs);
-		if (!value)
-			return -1;
-	}
-	/*
-	 * We already consumed the \n, but we need linenr to point to
-	 * the line we just parsed during the call to fn to get
-	 * accurate line number in error messages.
-	 */
-	cs->linenr--;
-	kvi->linenr = cs->linenr;
-	ret = fn(name->buf, value, &ctx, data);
-	if (ret >= 0)
-		cs->linenr++;
-	return ret;
-}
-
-static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
-				 int c)
-{
-	cs->subsection_case_sensitive = 0;
-	do {
-		if (c == '\n')
-			goto error_incomplete_line;
-		c = get_next_char(cs);
-	} while (isspace(c));
-
-	/* We require the format to be '[base "extension"]' */
-	if (c != '"')
-		return -1;
-	strbuf_addch(name, '.');
-
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n')
-			goto error_incomplete_line;
-		if (c == '"')
-			break;
-		if (c == '\\') {
-			c = get_next_char(cs);
-			if (c == '\n')
-				goto error_incomplete_line;
-		}
-		strbuf_addch(name, c);
-	}
-
-	/* Final ']' */
-	if (get_next_char(cs) != ']')
-		return -1;
-	return 0;
-error_incomplete_line:
-	cs->linenr--;
-	return -1;
-}
-
-static int get_base_var(struct config_source *cs, struct strbuf *name)
-{
-	cs->subsection_case_sensitive = 1;
-	for (;;) {
-		int c = get_next_char(cs);
-		if (cs->eof)
-			return -1;
-		if (c == ']')
-			return 0;
-		if (isspace(c))
-			return get_extended_base_var(cs, name, c);
-		if (!iskeychar(c) && c != '.')
-			return -1;
-		strbuf_addch(name, tolower(c));
-	}
-}
-
-struct parse_event_data {
-	enum config_event_t previous_type;
-	size_t previous_offset;
-	const struct config_parse_options *opts;
-};
-
-static int do_event(struct config_source *cs, enum config_event_t type,
-		    struct parse_event_data *data)
-{
-	size_t offset;
-
-	if (!data->opts || !data->opts->event_fn)
-		return 0;
-
-	if (type == CONFIG_EVENT_WHITESPACE &&
-	    data->previous_type == type)
-		return 0;
-
-	offset = cs->do_ftell(cs);
-	/*
-	 * At EOF, the parser always "inserts" an extra '\n', therefore
-	 * the end offset of the event is the current file position, otherwise
-	 * we will already have advanced to the next event.
-	 */
-	if (type != CONFIG_EVENT_EOF)
-		offset--;
-
-	if (data->previous_type != CONFIG_EVENT_EOF &&
-	    data->opts->event_fn(data->previous_type, data->previous_offset,
-				 offset, cs, data->opts->event_fn_data) < 0)
-		return -1;
-
-	data->previous_type = type;
-	data->previous_offset = offset;
-
-	return 0;
-}
-
-static void kvi_from_source(struct config_source *cs,
-			    enum config_scope scope,
-			    struct key_value_info *out)
-{
-	out->filename = strintern(cs->name);
-	out->origin_type = cs->origin_type;
-	out->linenr = cs->linenr;
-	out->scope = scope;
-	out->path = cs->path;
-}
-
 int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 		      size_t end_offset UNUSED, struct config_source *cs,
 		      void *data)
@@ -1081,104 +695,6 @@ int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 	return error_return;
 }
 
-static int git_parse_source(struct config_source *cs, config_fn_t fn,
-			    struct key_value_info *kvi, void *data,
-			    const struct config_parse_options *opts)
-{
-	int comment = 0;
-	size_t baselen = 0;
-	struct strbuf *var = &cs->var;
-
-	/* U+FEFF Byte Order Mark in UTF8 */
-	const char *bomptr = utf8_bom;
-
-	/* For the parser event callback */
-	struct parse_event_data event_data = {
-		CONFIG_EVENT_EOF, 0, opts
-	};
-
-	for (;;) {
-		int c;
-
-		c = get_next_char(cs);
-		if (bomptr && *bomptr) {
-			/* We are at the file beginning; skip UTF8-encoded BOM
-			 * if present. Sane editors won't put this in on their
-			 * own, but e.g. Windows Notepad will do it happily. */
-			if (c == (*bomptr & 0377)) {
-				bomptr++;
-				continue;
-			} else {
-				/* Do not tolerate partial BOM. */
-				if (bomptr != utf8_bom)
-					break;
-				/* No BOM at file beginning. Cool. */
-				bomptr = NULL;
-			}
-		}
-		if (c == '\n') {
-			if (cs->eof) {
-				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
-					return -1;
-				return 0;
-			}
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-				return -1;
-			comment = 0;
-			continue;
-		}
-		if (comment)
-			continue;
-		if (isspace(c)) {
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-					return -1;
-			continue;
-		}
-		if (c == '#' || c == ';') {
-			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
-					return -1;
-			comment = 1;
-			continue;
-		}
-		if (c == '[') {
-			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
-					return -1;
-
-			/* Reset prior to determining a new stem */
-			strbuf_reset(var);
-			if (get_base_var(cs, var) < 0 || var->len < 1)
-				break;
-			strbuf_addch(var, '.');
-			baselen = var->len;
-			continue;
-		}
-		if (!isalpha(c))
-			break;
-
-		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
-			return -1;
-
-		/*
-		 * Truncate the var name back to the section header
-		 * stem prior to grabbing the suffix part of the name
-		 * and the value.
-		 */
-		strbuf_setlen(var, baselen);
-		strbuf_addch(var, tolower(c));
-		if (get_value(cs, kvi, fn, data, var) < 0)
-			break;
-	}
-
-	/*
-	 * FIXME for whatever reason, do_event passes the _previous_ event, so
-	 * in order for our callback to receive the error event, we have to call
-	 * do_event twice
-	 */
-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
-	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
-	return -1;
-}
-
 static uintmax_t get_unit_factor(const char *end)
 {
 	if (!*end)
@@ -1972,83 +1488,6 @@ int git_default_config(const char *var, const char *value,
 	return 0;
 }
 
-/*
- * All source specific fields in the union, die_on_error, name and the callbacks
- * fgetc, ungetc, ftell of top need to be initialized before calling
- * this function.
- */
-static int do_config_from(struct config_source *top, config_fn_t fn,
-			  void *data, enum config_scope scope,
-			  const struct config_parse_options *opts)
-{
-	struct key_value_info kvi = KVI_INIT;
-	int ret;
-
-	/* push config-file parsing state stack */
-	top->linenr = 1;
-	top->eof = 0;
-	top->total_len = 0;
-	strbuf_init(&top->value, 1024);
-	strbuf_init(&top->var, 1024);
-	kvi_from_source(top, scope, &kvi);
-
-	ret = git_parse_source(top, fn, &kvi, data, opts);
-
-	strbuf_release(&top->value);
-	strbuf_release(&top->var);
-
-	return ret;
-}
-
-static int do_config_from_file(config_fn_t fn,
-			       const enum config_origin_type origin_type,
-			       const char *name, const char *path, FILE *f,
-			       void *data, enum config_scope scope,
-			       const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-	int ret;
-
-	top.u.file = f;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = path;
-	top.do_fgetc = config_file_fgetc;
-	top.do_ungetc = config_file_ungetc;
-	top.do_ftell = config_file_ftell;
-
-	flockfile(f);
-	ret = do_config_from(&top, fn, data, scope, opts);
-	funlockfile(f);
-	return ret;
-}
-
-static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope,
-				 const struct config_parse_options *config_opts)
-{
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, config_opts);
-}
-
-int git_config_from_file_with_options(config_fn_t fn, const char *filename,
-				      void *data, enum config_scope scope,
-				      const struct config_parse_options *opts)
-{
-	int ret = -1;
-	FILE *f;
-
-	if (!filename)
-		BUG("filename cannot be NULL");
-	f = fopen_or_warn(filename, "r");
-	if (f) {
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
-					  filename, f, data, scope, opts);
-		fclose(f);
-	}
-	return ret;
-}
-
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
 	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
@@ -2057,27 +1496,6 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type origin_type,
-			const char *name, const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-
-	top.u.buf.buf = buf;
-	top.u.buf.len = len;
-	top.u.buf.pos = 0;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = NULL;
-	top.do_fgetc = config_buf_fgetc;
-	top.do_ungetc = config_buf_ungetc;
-	top.do_ftell = config_buf_ftell;
-
-	return do_config_from(&top, fn, data, scope, opts);
-}
-
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
 			      struct repository *repo,
diff --git a/config.h b/config.h
index 8ad399580f..3bad5e1c32 100644
--- a/config.h
+++ b/config.h
@@ -4,7 +4,7 @@
 #include "hashmap.h"
 #include "string-list.h"
 #include "repository.h"
-
+#include "config-parse.h"
 
 /**
  * The config API gives callers a way to access Git configuration files
@@ -23,9 +23,6 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these negated: */
-#define CONFIG_INVALID_KEY 1
-#define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
 #define CONFIG_NO_LOCK -1
 #define CONFIG_INVALID_FILE 3
@@ -36,17 +33,6 @@ struct object_id;
 
 #define CONFIG_REGEX_NONE ((void *)1)
 
-enum config_scope {
-	CONFIG_SCOPE_UNKNOWN = 0,
-	CONFIG_SCOPE_SYSTEM,
-	CONFIG_SCOPE_GLOBAL,
-	CONFIG_SCOPE_LOCAL,
-	CONFIG_SCOPE_WORKTREE,
-	CONFIG_SCOPE_COMMAND,
-	CONFIG_SCOPE_SUBMODULE,
-};
-const char *config_scope_name(enum config_scope scope);
-
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
@@ -54,46 +40,6 @@ struct git_config_source {
 	enum config_scope scope;
 };
 
-enum config_origin_type {
-	CONFIG_ORIGIN_UNKNOWN = 0,
-	CONFIG_ORIGIN_BLOB,
-	CONFIG_ORIGIN_FILE,
-	CONFIG_ORIGIN_STDIN,
-	CONFIG_ORIGIN_SUBMODULE_BLOB,
-	CONFIG_ORIGIN_CMDLINE
-};
-
-enum config_event_t {
-	CONFIG_EVENT_SECTION,
-	CONFIG_EVENT_ENTRY,
-	CONFIG_EVENT_WHITESPACE,
-	CONFIG_EVENT_COMMENT,
-	CONFIG_EVENT_EOF,
-	CONFIG_EVENT_ERROR
-};
-
-struct config_source;
-/*
- * The parser event function (if not NULL) is called with the event type and
- * the begin/end offsets of the parsed elements.
- *
- * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
- * character is considered part of the element.
- */
-typedef int (*config_parser_event_fn_t)(enum config_event_t type,
-					size_t begin_offset, size_t end_offset,
-					struct config_source *cs,
-					void *event_fn_data);
-
-struct config_parse_options {
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-};
-
 #define CP_OPTS_INIT(error_action) { \
 	.event_fn = git_config_err_fn, \
 	.event_fn_data = (enum config_error_action []){(error_action)}, \
@@ -126,59 +72,8 @@ enum config_error_action {
 int git_config_err_fn(enum config_event_t type, size_t begin_offset,
 		      size_t end_offset, struct config_source *cs,
 		      void *event_fn_data);
-
-/* Config source metadata for a given config key-value pair */
-struct key_value_info {
-	const char *filename;
-	int linenr;
-	enum config_origin_type origin_type;
-	enum config_scope scope;
-	const char *path;
-};
-#define KVI_INIT { \
-	.filename = NULL, \
-	.linenr = -1, \
-	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
-	.scope = CONFIG_SCOPE_UNKNOWN, \
-	.path = NULL, \
-}
-
-/* Captures additional information that a config callback can use. */
-struct config_context {
-	/* Config source metadata for key and value. */
-	const struct key_value_info *kvi;
-};
-#define CONFIG_CONTEXT_INIT { 0 }
-
-/**
- * A config callback function takes four parameters:
- *
- * - the name of the parsed variable. This is in canonical "flat" form: the
- *   section, subsection, and variable segments will be separated by dots,
- *   and the section and variable segments will be all lowercase. E.g.,
- *   `core.ignorecase`, `diff.SomeType.textconv`.
- *
- * - the value of the found variable, as a string. If the variable had no
- *   value specified, the value will be NULL (typically this means it
- *   should be interpreted as boolean true).
- *
- * - the 'config context', that is, additional information about the config
- *   iteration operation provided by the config machinery. For example, this
- *   includes information about the config source being parsed (e.g. the
- *   filename).
- *
- * - a void pointer passed in by the caller of the config API; this can
- *   contain callback-specific data
- *
- * A config callback should return 0 for success, or -1 if the variable
- * could not be parsed properly.
- */
-typedef int (*config_fn_t)(const char *, const char *,
-			   const struct config_context *, void *);
-
 int git_default_config(const char *, const char *,
 		       const struct config_context *, void *);
-
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
@@ -186,16 +81,6 @@ int git_default_config(const char *, const char *,
  * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
-
-int git_config_from_file_with_options(config_fn_t fn, const char *,
-				      void *, enum config_scope,
-				      const struct config_parse_options *);
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type,
-			const char *name,
-			const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
@@ -333,8 +218,6 @@ int repo_config_set_worktree_gently(struct repository *, const char *, const cha
  */
 void git_config_set(const char *, const char *);
 
-int git_config_parse_key(const char *, char **, size_t *);
-
 /*
  * The following macros specify flag bits that alter the behavior
  * of the git_config_set_multivar*() methods.
-- 
2.42.0.rc1.204.g551eb34607-goog


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 1/4] config: split out config_parse_options
  2023-08-23 21:53   ` [PATCH v2 1/4] config: split out config_parse_options Josh Steadmon
@ 2023-08-23 23:26     ` Junio C Hamano
  2023-09-21 21:08       ` Josh Steadmon
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2023-08-23 23:26 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz

Josh Steadmon <steadmon@google.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> "struct config_options" is a disjoint set of options options used by the
> config parser (e.g. event listners) and options used by
> config_with_options() (e.g. to handle includes, choose which config
> files to parse).

There is some punctuation missing on the first line.  Perhaps an em-dash
between "options---options" or something like that?

> Split parser-only options into config_parse_options.
>
> Signed-off-by: Glen Choo <chooglen@google.com>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  bundle-uri.c |  2 +-
>  config.c     | 14 +++++++-------
>  config.h     | 37 ++++++++++++++++++++-----------------
>  fsck.c       |  2 +-
>  4 files changed, 29 insertions(+), 26 deletions(-)

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 4b5c49b93d..f93ca6a486 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -237,7 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
>  				   struct bundle_list *list)
>  {
>  	int result;
> -	struct config_options opts = {
> +	struct config_parse_options opts = {
>  		.error_action = CONFIG_ERROR_ERROR,
>  	};

OK, this one only needs the parse_options half, and presumably all
hunks (other than the one in config.h that splits the struct into
two) convert the users of config_options that only need the
config_parse_options half.

As we do not see any funny casts in the patch text, compilers should
catch any questionable conversions in this step, if there are any.

OK.


* Re: [PATCH v2 2/4] config: report config parse errors using cb
  2023-08-23 21:53   ` [PATCH v2 2/4] config: report config parse errors using cb Josh Steadmon
@ 2023-08-24  1:19     ` Junio C Hamano
  2023-08-24 17:31       ` Jonathan Tan
  2023-09-21 21:11       ` Josh Steadmon
  0 siblings, 2 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-08-24  1:19 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz

Josh Steadmon <steadmon@google.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> In a subsequent commit, config parsing will become its own library, and
> it's likely that the caller will want flexibility in handling errors
> (instead of being limited to the error handling we have in-tree).

And the in-tree error handling is abstracted out as the
git_config_err_fn() function; in other words, we become the first
client of the library interface, which makes sense.
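
To illustrate what "client of the library interface" could look like
for an external caller, here is a minimal sketch of a callback that
records parse errors instead of dying.  Everything named sketch_* below,
as well as error_report and collect_errors(), is a hypothetical stand-in
that only mirrors the shape of config_parser_event_fn_t and
config_parse_options from the patch; it is not the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parser's event and source types. */
enum sketch_event { SKETCH_EVENT_ENTRY, SKETCH_EVENT_ERROR };

struct sketch_source {
	const char *name;
	int linenr;
};

typedef int (*sketch_event_fn)(enum sketch_event type,
			       size_t begin_offset, size_t end_offset,
			       struct sketch_source *src, void *data);

struct sketch_parse_options {
	sketch_event_fn event_fn;
	void *event_fn_data;
};

/* Caller-owned state: the caller decides what an error means. */
struct error_report {
	int nr_errors;
	int first_bad_line;
};

/* Records the first failing line; a negative return asks the parser to stop. */
static int collect_errors(enum sketch_event type,
			  size_t begin_offset, size_t end_offset,
			  struct sketch_source *src, void *data)
{
	struct error_report *report = data;

	(void)begin_offset;
	(void)end_offset;
	if (type != SKETCH_EVENT_ERROR)
		return 0;
	if (!report->nr_errors++)
		report->first_bad_line = src->linenr;
	return -1;
}

/* What the "library" side does: forward the event if a callback is set. */
static int sketch_report(const struct sketch_parse_options *opts,
			 enum sketch_event type, struct sketch_source *src)
{
	assert(src != NULL);
	if (!opts || !opts->event_fn)
		return 0;
	return opts->event_fn(type, 0, 0, src, opts->event_fn_data);
}
```

In this sketch the in-tree behaviour would simply be another callback
of the same type, which is the role git_config_err_fn() plays in the
patch.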

> @@ -1035,8 +1088,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
>  	int comment = 0;
>  	size_t baselen = 0;
>  	struct strbuf *var = &cs->var;
> ...
> +	/*
> +	 * FIXME for whatever reason, do_event passes the _previous_ event, so
> +	 * in order for our callback to receive the error event, we have to call
> +	 * do_event twice
> +	 */
> +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> +	return -1;
>  }

This indeed is very curious and needs to be looked into before we
proceed further.  How does the current control flow cope with the
behaviour?
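
For reference, a minimal sketch of the lag in question, modeled on the
do_event() in the patch: each call delivers the _previous_ event,
because an event's end offset is only known once the next one begins,
so a single trailing call never delivers the error itself.  All names
below (sketch_event, event_log, emit_event(), replay_failed_parse())
are hypothetical, and the whitespace coalescing and offset adjustment
of the real function are left out.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_event { EV_SECTION, EV_ENTRY, EV_EOF, EV_ERROR };

#define MAX_DELIVERED 8

struct event_log {
	enum sketch_event previous_type; /* EV_EOF means "nothing pending" */
	size_t previous_offset;
	enum sketch_event delivered[MAX_DELIVERED];
	size_t nr;
};
#define EVENT_LOG_INIT { .previous_type = EV_EOF }

/* Stands in for the event callback: just remember what was delivered. */
static void deliver(struct event_log *log, enum sketch_event type)
{
	assert(log->nr < MAX_DELIVERED);
	log->delivered[log->nr++] = type;
}

/* Deliver the pending event, then make the new one pending. */
static void emit_event(struct event_log *log, enum sketch_event type,
		       size_t offset)
{
	if (log->previous_type != EV_EOF)
		deliver(log, log->previous_type);
	log->previous_type = type;
	log->previous_offset = offset;
}

static void replay_failed_parse(struct event_log *log)
{
	emit_event(log, EV_SECTION, 0); /* delivers nothing yet */
	emit_event(log, EV_ENTRY, 10);  /* delivers SECTION */
	emit_event(log, EV_ERROR, 20);  /* delivers ENTRY, not the error */
	emit_event(log, EV_ERROR, 20);  /* only now delivers ERROR */
}
```

With only the first EV_ERROR call, the last event the callback sees in
this sketch is the entry, which is why the patch calls do_event() twice.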

> @@ -2322,7 +2342,9 @@ void read_early_config(config_fn_t cb, void *data)
>   */
>  void read_very_early_config(config_fn_t cb, void *data)
>  {
> -	struct config_options opts = { 0 };
> +	struct config_options opts = {
> +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
> +	};
>  
>  	opts.respect_includes = 1;
>  	opts.ignore_repo = 1;

This uses a few more assignments to individual members of opts to
initialize it, which could have been done with a designated
initializer, as read_protected_config() used to do.

> @@ -2760,12 +2784,14 @@ int repo_config_get_pathname(struct repository *repo,
>  static void read_protected_config(void)
>  {
>  	struct config_options opts = {
> -		.respect_includes = 1,
> -		.ignore_repo = 1,
> -		.ignore_worktree = 1,
> -		.system_gently = 1,
> +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
>  	};
>  
> +	opts.respect_includes = 1;
> +	opts.ignore_repo = 1;
> +	opts.ignore_worktree = 1;
> +	opts.system_gently = 1;
> +

It is curious why you want to switch to manual assignment instead
of keeping the designated initializer for this one.  I would have
expected the initialization in read_very_early_config() to start
using a designated initializer instead, to be consistent.
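
As a sketch of what that could look like, assuming nothing else in the
series prevents it: a macro that expands to a brace-enclosed
initializer can sit next to the other designated members.  The types
and names here (read_options, parse_options, PARSE_OPTS_INIT,
default_err_fn(), read_protected_sketch()) are hypothetical stand-ins
shaped like config_options and CP_OPTS_INIT, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

enum error_action { ACTION_DIE = 1, ACTION_ERROR };

struct parse_options {
	int (*event_fn)(int type, void *data);
	void *event_fn_data;
};

/* Stand-in error callback: "die" is modeled as returning -1. */
static int default_err_fn(int type, void *data)
{
	enum error_action *action = data;

	(void)type;
	return *action == ACTION_DIE ? -1 : 0;
}

/* Same shape as CP_OPTS_INIT: a compound literal carries the action. */
#define PARSE_OPTS_INIT(action) { \
	.event_fn = default_err_fn, \
	.event_fn_data = (enum error_action []){ (action) }, \
}

struct read_options {
	unsigned respect_includes : 1;
	unsigned ignore_repo : 1;
	unsigned ignore_worktree : 1;
	unsigned system_gently : 1;
	struct parse_options parse_options;
};

static int read_protected_sketch(void)
{
	/* Flags and the nested member are set in one initializer. */
	struct read_options opts = {
		.respect_includes = 1,
		.ignore_repo = 1,
		.ignore_worktree = 1,
		.system_gently = 1,
		.parse_options = PARSE_OPTS_INIT(ACTION_DIE),
	};

	assert(opts.parse_options.event_fn != NULL);
	if (!opts.respect_includes || !opts.ignore_repo ||
	    !opts.ignore_worktree || !opts.system_gently)
		return 1;
	return opts.parse_options.event_fn(0, opts.parse_options.event_fn_data);
}
```

The compound literal lives as long as the enclosing function body, so
the pointer stored in event_fn_data stays valid for the whole call.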

Thanks.


* Re: [PATCH v2 2/4] config: report config parse errors using cb
  2023-08-24  1:19     ` Junio C Hamano
@ 2023-08-24 17:31       ` Jonathan Tan
  2023-08-24 18:48         ` Junio C Hamano
  2023-09-21 21:11       ` Josh Steadmon
  1 sibling, 1 reply; 49+ messages in thread
From: Jonathan Tan @ 2023-08-24 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, Josh Steadmon, git, calvinwan, glencbz

Junio C Hamano <gitster@pobox.com> writes:
> > @@ -1035,8 +1088,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
> >  	int comment = 0;
> >  	size_t baselen = 0;
> >  	struct strbuf *var = &cs->var;
> > ...
> > +	/*
> > +	 * FIXME for whatever reason, do_event passes the _previous_ event, so
> > +	 * in order for our callback to receive the error event, we have to call
> > +	 * do_event twice
> > +	 */
> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> > +	return -1;
> >  }
> 
> This indeed is very curious and needs to be looked into before we
> proceed further.  How does the current control flow cope with the
> behaviour?

I plan to look at this more fully later, but I wrote about this in a
reply to the previous version:
https://lore.kernel.org/git/20230804213457.1174493-1-jonathantanmy@google.com/


* Re: [PATCH v2 2/4] config: report config parse errors using cb
  2023-08-24 17:31       ` Jonathan Tan
@ 2023-08-24 18:48         ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-08-24 18:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Josh Steadmon, git, calvinwan, glencbz

Jonathan Tan <jonathantanmy@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> > @@ -1035,8 +1088,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
>> >  	int comment = 0;
>> >  	size_t baselen = 0;
>> >  	struct strbuf *var = &cs->var;
>> > ...
>> > +	/*
>> > +	 * FIXME for whatever reason, do_event passes the _previous_ event, so
>> > +	 * in order for our callback to receive the error event, we have to call
>> > +	 * do_event twice
>> > +	 */
>> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
>> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
>> > +	return -1;
>> >  }
>> 
>> This indeed is very curious and needs to be looked into before we
>> proceed further.  How does the current control flow cope with the
>> behaviour?
>
> I plan to look at this more fully later, but I wrote about this in a
> reply to the previous version:
> https://lore.kernel.org/git/20230804213457.1174493-1-jonathantanmy@google.com/

Thanks.

Also, if you have time, can you comment on the latest iteration of
the fix to the Bloom filter hash functions that Taylor sent, either
to give it a "looks perfect" to nudge it toward merging down, to
suggest further polishing, or whatever.  Thanks.




* Re: [PATCH v2 0/4] config-parse: create config parsing library
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
                     ` (3 preceding siblings ...)
  2023-08-23 21:53   ` [PATCH v2 4/4] config-parse: split library out of config.[c|h] Josh Steadmon
@ 2023-08-24 20:10   ` Josh Steadmon
  4 siblings, 0 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-08-24 20:10 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

On 2023.08.23 14:53, Josh Steadmon wrote:
> I'll be taking over this series from Glen (thank you for the work so
> far).

BTW, I'm going to be AFK for a couple weeks, so it will be a while
before I'm able to address feedback on this series. Thanks in advance.


* Re: [PATCH v2 1/4] config: split out config_parse_options
  2023-08-23 23:26     ` Junio C Hamano
@ 2023-09-21 21:08       ` Josh Steadmon
  0 siblings, 0 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jonathantanmy, calvinwan, glencbz

On 2023.08.23 16:26, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > From: Glen Choo <chooglen@google.com>
> >
> > "struct config_options" is a disjoint set of options options used by the
> > config parser (e.g. event listners) and options used by
> > config_with_options() (e.g. to handle includes, choose which config
> > files to parse).
> 
> There is some punctuation missing on the first line.  Perhaps an em-dash
> between "options---options" or something like that?

Cleaned up this and an additional typo in the description.

> > Split parser-only options into config_parse_options.
> >
> > Signed-off-by: Glen Choo <chooglen@google.com>
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > ---
> >  bundle-uri.c |  2 +-
> >  config.c     | 14 +++++++-------
> >  config.h     | 37 ++++++++++++++++++++-----------------
> >  fsck.c       |  2 +-
> >  4 files changed, 29 insertions(+), 26 deletions(-)
> 
> > diff --git a/bundle-uri.c b/bundle-uri.c
> > index 4b5c49b93d..f93ca6a486 100644
> > --- a/bundle-uri.c
> > +++ b/bundle-uri.c
> > @@ -237,7 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
> >  				   struct bundle_list *list)
> >  {
> >  	int result;
> > -	struct config_options opts = {
> > +	struct config_parse_options opts = {
> >  		.error_action = CONFIG_ERROR_ERROR,
> >  	};
> 
> OK, and this one only needs the parse_options half, and presumably
> all hunks (other than the one that splits the struct into two in
> config.h) are about turning the users of config_options that only
> need config_parse_options half.
> 
> As we do not see any funny casts in the patch text, compilers should
> catch all questionable conversion in this step, if there were any.
> 
> OK.

* Re: [PATCH v2 2/4] config: report config parse errors using cb
  2023-08-24  1:19     ` Junio C Hamano
  2023-08-24 17:31       ` Jonathan Tan
@ 2023-09-21 21:11       ` Josh Steadmon
  2023-09-21 23:36         ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jonathantanmy, calvinwan, glencbz

On 2023.08.23 18:19, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> > From: Glen Choo <chooglen@google.com>
> >
> > In a subsequent commit, config parsing will become its own library, and
> > it's likely that the caller will want flexibility in handling errors
> > (instead of being limited to the error handling we have in-tree).
> 
> And the in-tree error handling is abstracted out as the
> git_config_err_fn() function; in other words, we become the first
> client of the library interface, which makes sense.
> 
> > @@ -1035,8 +1088,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
> >  	int comment = 0;
> >  	size_t baselen = 0;
> >  	struct strbuf *var = &cs->var;
> > ...
> > +	/*
> > +	 * FIXME for whatever reason, do_event passes the _previous_ event, so
> > +	 * in order for our callback to receive the error event, we have to call
> > +	 * do_event twice
> > +	 */
> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> > +	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
> > +	return -1;
> >  }
> 
> This indeed is very curious and needs to be looked into before we
> proceed further.  How does the current control flow cope with the
> behaviour?

As Jonathan Tan mentioned in [1], on calling do_event() we set the start
offset of the new event, and execute the callback for the previous event
whose end offset we now know.

I refactored this into "start_event()" and "flush_event()" functions as
suggested, and added a new "do_event_and_flush()" function for the case
where we want to immediately execute a callback for an event.

[1]: https://lore.kernel.org/git/20230804213457.1174493-1-jonathantanmy@google.com/
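
To illustrate the deferred-callback behavior described above, here is a minimal standalone sketch. It is not the code from the patch: every name in it (demo_event, demo_flush, demo_record and so on) is hypothetical, offsets are passed in by hand instead of coming from do_ftell(), and flushing simply clears the pending event, which is a simplification of what the series does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event kinds; not Git's enum config_event_t. */
enum demo_event { DEMO_NONE, DEMO_SECTION, DEMO_ENTRY, DEMO_ERROR };

typedef int (*demo_event_fn)(enum demo_event type, size_t begin,
			     size_t end, void *cb_data);

struct demo_state {
	enum demo_event pending_type;	/* event whose end is not known yet */
	size_t pending_begin;
	demo_event_fn fn;
	void *fn_data;
};

/* Remember where an event begins; its end offset is still unknown. */
static void demo_start(struct demo_state *st, enum demo_event type, size_t pos)
{
	st->pending_type = type;
	st->pending_begin = pos;
}

/* Report the pending event (if any), which ends at pos, then clear it. */
static int demo_flush(struct demo_state *st, size_t pos)
{
	int ret = 0;

	if (st->pending_type != DEMO_NONE && st->fn)
		ret = st->fn(st->pending_type, st->pending_begin, pos,
			     st->fn_data);
	st->pending_type = DEMO_NONE;
	return ret;
}

/* Deferred: close the previous event, then open the new one. */
static int demo_event(struct demo_state *st, enum demo_event type, size_t pos)
{
	if (demo_flush(st, pos) < 0)
		return -1;
	demo_start(st, type, pos);
	return 0;
}

/* Immediate: also report the new event right away, with zero length. */
static int demo_event_and_flush(struct demo_state *st, enum demo_event type,
				size_t pos)
{
	if (demo_event(st, type, pos) < 0)
		return -1;
	return demo_flush(st, pos);
}

/* Test helper: records every callback invocation. */
struct demo_log {
	size_t count;
	enum demo_event types[8];
	size_t begins[8];
	size_t ends[8];
};

static int demo_record(enum demo_event type, size_t begin, size_t end,
		       void *cb_data)
{
	struct demo_log *log = cb_data;

	if (log->count >= 8)
		return -1;
	log->types[log->count] = type;
	log->begins[log->count] = begin;
	log->ends[log->count] = end;
	log->count++;
	return 0;
}

static struct demo_log demo_run(void)
{
	struct demo_log log = { 0 };
	struct demo_state st = { DEMO_NONE, 0, demo_record, &log };

	demo_event(&st, DEMO_SECTION, 0);		/* nothing pending yet */
	demo_event(&st, DEMO_ENTRY, 10);		/* reports the section */
	demo_event_and_flush(&st, DEMO_ERROR, 25);	/* entry, then error */
	return log;
}
```

In this sketch, calling demo_event() alone for the error would leave it pending and never reported, which is roughly the situation the old double do_event() call was working around.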

> > @@ -2322,7 +2342,9 @@ void read_early_config(config_fn_t cb, void *data)
> >   */
> >  void read_very_early_config(config_fn_t cb, void *data)
> >  {
> > -	struct config_options opts = { 0 };
> > +	struct config_options opts = {
> > +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
> > +	};
> >  
> >  	opts.respect_includes = 1;
> >  	opts.ignore_repo = 1;
> 
> This uses a bit more assignments to various members of opts. to
> initialize it, which could have been done with designated
> initializer, like the one in read_protected_config() used to do.
> 
> > @@ -2760,12 +2784,14 @@ int repo_config_get_pathname(struct repository *repo,
> >  static void read_protected_config(void)
> >  {
> >  	struct config_options opts = {
> > -		.respect_includes = 1,
> > -		.ignore_repo = 1,
> > -		.ignore_worktree = 1,
> > -		.system_gently = 1,
> > +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
> >  	};
> >  
> > +	opts.respect_includes = 1;
> > +	opts.ignore_repo = 1;
> > +	opts.ignore_worktree = 1;
> > +	opts.system_gently = 1;
> > +
> 
> It is curious why you want to switch to manual assignment, instead
> of keeping the designated initializer for this one.  I would have
> expected the initialization in read_very_early_config() to start
> using designated initializer to be consistent, instead.
> 
> Thanks.

Agreed, fixed here and above.
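
For reference, a small sketch of the pattern being asked for: one designated initializer that sets both the bitfield members and the nested parse options. The struct, enum and macro names below are hypothetical stand-ins. In particular, DEMO_PARSE_OPTS_INIT is only assumed to resemble CP_OPTS_INIT, whose definition is not shown in this part of the thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for the option structs discussed above. */
enum demo_error_action { DEMO_ERROR_DIE = 1, DEMO_ERROR_ERROR };

struct demo_parse_options {
	enum demo_error_action error_action;
};

struct demo_options {
	unsigned int respect_includes : 1;
	unsigned int ignore_repo : 1;
	unsigned int system_gently : 1;
	struct demo_parse_options parse_options;
};

/* Hypothetical initializer macro for the nested struct. */
#define DEMO_PARSE_OPTS_INIT(action) { .error_action = (action) }

static struct demo_options demo_make_options(void)
{
	/* Members not named here (system_gently) are zero-initialized. */
	struct demo_options opts = {
		.respect_includes = 1,
		.ignore_repo = 1,
		.parse_options = DEMO_PARSE_OPTS_INIT(DEMO_ERROR_DIE),
	};

	return opts;
}
```

Keeping everything in the initializer means no member is set after the fact, and any member that is not named starts out as zero.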

* [PATCH v3 0/5] config-parse: create config parsing library
  2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
@ 2023-09-21 21:17 ` Josh Steadmon
  2023-09-21 21:17   ` [PATCH v3 1/5] config: split out config_parse_options Josh Steadmon
                     ` (5 more replies)
  4 siblings, 6 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

Config parsing no longer uses global state as of gc/config-context, so the
natural next step for libification is to turn that into its own library.
This series starts that process by moving config parsing into
config-parse.[c|h] so that other programs can include this functionality
without pulling in all of config.[c|h].

Open questions:
- How do folks feel about the do_event() refactor in patches 2 & 3?

Changes since v2:
- Added patch 2/5 to refactor do_event() into start_event() and
  flush_event().
- In patch 3/5, we can now add do_event_and_flush() to immediately run
  an event callback, rather than having to do_event() twice in a row.

Changes since v1.5:
- Dropped patch 1/5: config: return positive from git_config_parse_key()


Glen Choo (4):
  config: split out config_parse_options
  config: report config parse errors using cb
  config.c: accept config_parse_options in git_config_from_stdin
  config-parse: split library out of config.[c|h]

Josh Steadmon (1):
  config: split do_event() into start and flush operations

 Makefile           |   1 +
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config-parse.c     | 601 +++++++++++++++++++++++++++++++++++++++++
 config-parse.h     | 155 +++++++++++
 config.c           | 658 ++++-----------------------------------------
 config.h           | 134 +--------
 fsck.c             |   4 +-
 submodule-config.c |   9 +-
 9 files changed, 836 insertions(+), 734 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

Range-diff against v2:
1:  5c676fbac3 ! 1:  fa55b7836f config: split out config_parse_options
    @@ Metadata
      ## Commit message ##
         config: split out config_parse_options
     
    -    "struct config_options" is a disjoint set of options options used by the
    -    config parser (e.g. event listners) and options used by
    -    config_with_options() (e.g. to handle includes, choose which config
    -    files to parse). Split parser-only options into config_parse_options.
    +    "struct config_options" is a disjoint set of options used by the config
    +    parser (e.g. event listeners) and options used by config_with_options()
    +    (e.g. to handle includes, choose which config files to parse). Split
    +    parser-only options into config_parse_options.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## bundle-uri.c ##
-:  ---------- > 2:  8a1463c223 config: split do_event() into start and flush operations
2:  cb92a1f2e3 ! 3:  a888045c04 config: report config parse errors using cb
    @@ Commit message
         CONFIG_ERROR_UNSET and the config_source 'default', since all callers
         are now expected to specify the error handling they want.
     
    +    Add a new "do_event_and_flush" function for running event callbacks
    +    immediately, where the event does not need to calculate an end offset.
    +
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## builtin/config.c ##
    @@ config.c: static int add_remote_url(const char *var, const char *value,
      
      	opts = *inc->opts;
      	opts.unconditional_remote_url = 1;
    +@@ config.c: static int do_event(struct config_source *cs, enum config_event_t type,
    + 	return 0;
    + }
    + 
    ++static int do_event_and_flush(struct config_source *cs,
    ++			      enum config_event_t type,
    ++			      struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	/*
    ++	 * Not actually EOF, but this indicates we don't have a valid event
    ++	 * to flush next time around.
    ++	 */
    ++	data->previous_type = CONFIG_EVENT_EOF;
    ++
    ++	return 0;
    ++}
    ++
    + static void kvi_from_source(struct config_source *cs,
    + 			    enum config_scope scope,
    + 			    struct key_value_info *out)
     @@ config.c: static void kvi_from_source(struct config_source *cs,
      	out->path = cs->path;
      }
    @@ config.c: static int git_parse_source(struct config_source *cs, config_fn_t fn,
     -
     -	free(error_msg);
     -	return error_return;
    -+	/*
    -+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    -+	 * in order for our callback to receive the error event, we have to call
    -+	 * do_event twice
    -+	 */
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     +	return -1;
      }
      
    @@ config.c: void read_early_config(config_fn_t cb, void *data)
      void read_very_early_config(config_fn_t cb, void *data)
      {
     -	struct config_options opts = { 0 };
    +-
    +-	opts.respect_includes = 1;
    +-	opts.ignore_repo = 1;
    +-	opts.ignore_worktree = 1;
    +-	opts.ignore_cmdline = 1;
    +-	opts.system_gently = 1;
     +	struct config_options opts = {
    ++		.respect_includes = 1,
    ++		.ignore_repo = 1,
    ++		.ignore_worktree = 1,
    ++		.ignore_cmdline = 1,
    ++		.system_gently = 1,
     +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
     +	};
      
    - 	opts.respect_includes = 1;
    - 	opts.ignore_repo = 1;
    + 	config_with_options(cb, data, NULL, NULL, &opts);
    + }
     @@ config.c: int git_configset_get_pathname(struct config_set *set, const char *key, const ch
      /* Functions use to read configuration from a repository */
      static void repo_read_config(struct repository *repo)
    @@ config.c: int git_configset_get_pathname(struct config_set *set, const char *key
      
      	opts.respect_includes = 1;
      	opts.commondir = repo->commondir;
    -@@ config.c: int repo_config_get_pathname(struct repository *repo,
    - static void read_protected_config(void)
    - {
    - 	struct config_options opts = {
    --		.respect_includes = 1,
    --		.ignore_repo = 1,
    --		.ignore_worktree = 1,
    --		.system_gently = 1,
    +@@ config.c: static void read_protected_config(void)
    + 		.ignore_repo = 1,
    + 		.ignore_worktree = 1,
    + 		.system_gently = 1,
     +		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
      	};
      
    -+	opts.respect_includes = 1;
    -+	opts.ignore_repo = 1;
    -+	opts.ignore_worktree = 1;
    -+	opts.system_gently = 1;
     +
      	git_configset_init(&protected_config);
      	config_with_options(config_set_callback, &protected_config, NULL,
3:  e034d0780c = 4:  49d4b64991 config.c: accept config_parse_options in git_config_from_stdin
4:  74c5dcd5a2 ! 5:  e59ca992d0 config-parse: split library out of config.[c|h]
    @@ config-parse.c (new)
     +	const struct config_parse_options *opts;
     +};
     +
    -+static int do_event(struct config_source *cs, enum config_event_t type,
    -+		    struct parse_event_data *data)
    ++static size_t get_corrected_offset(struct config_source *cs,
    ++				   enum config_event_t type)
     +{
    -+	size_t offset;
    -+
    -+	if (!data->opts || !data->opts->event_fn)
    -+		return 0;
    ++	size_t offset = cs->do_ftell(cs);
     +
    -+	if (type == CONFIG_EVENT_WHITESPACE &&
    -+	    data->previous_type == type)
    -+		return 0;
    -+
    -+	offset = cs->do_ftell(cs);
     +	/*
     +	 * At EOF, the parser always "inserts" an extra '\n', therefore
     +	 * the end offset of the event is the current file position, otherwise
    @@ config-parse.c (new)
     +	 */
     +	if (type != CONFIG_EVENT_EOF)
     +		offset--;
    ++	return offset;
    ++}
    ++
    ++static void start_event(struct config_source *cs, enum config_event_t type,
    ++		       struct parse_event_data *data)
    ++{
    ++	data->previous_type = type;
    ++	data->previous_offset = get_corrected_offset(cs, type);
    ++}
    ++
    ++static int flush_event(struct config_source *cs, enum config_event_t type,
    ++		       struct parse_event_data *data)
    ++{
    ++	if (!data->opts || !data->opts->event_fn)
    ++		return 0;
    ++
    ++	if (type == CONFIG_EVENT_WHITESPACE &&
    ++	    data->previous_type == type)
    ++		return 0;
     +
     +	if (data->previous_type != CONFIG_EVENT_EOF &&
     +	    data->opts->event_fn(data->previous_type, data->previous_offset,
    -+				 offset, cs, data->opts->event_fn_data) < 0)
    ++				 get_corrected_offset(cs, type), cs,
    ++				 data->opts->event_fn_data) < 0)
     +		return -1;
     +
    -+	data->previous_type = type;
    -+	data->previous_offset = offset;
    ++	return 1;
    ++}
    ++
    ++static int do_event(struct config_source *cs, enum config_event_t type,
    ++		    struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	return 0;
    ++}
    ++
    ++static int do_event_and_flush(struct config_source *cs,
    ++			      enum config_event_t type,
    ++			      struct parse_event_data *data)
    ++{
    ++	int maybe_ret;
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	start_event(cs, type, data);
    ++
    ++	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    ++		return maybe_ret;
    ++
    ++	/*
    ++	 * Not actually EOF, but this indicates we don't have a valid event
    ++	 * to flush next time around.
    ++	 */
    ++	data->previous_type = CONFIG_EVENT_EOF;
     +
     +	return 0;
     +}
    @@ config-parse.c (new)
     +		if (get_value(cs, kvi, fn, data, var) < 0)
     +			break;
     +	}
    -+	/*
    -+	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    -+	 * in order for our callback to receive the error event, we have to call
    -+	 * do_event twice
    -+	 */
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    -+	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    ++
    ++	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     +	return -1;
     +}
     +
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -	const struct config_parse_options *opts;
     -};
     -
    --static int do_event(struct config_source *cs, enum config_event_t type,
    --		    struct parse_event_data *data)
    +-static size_t get_corrected_offset(struct config_source *cs,
    +-				   enum config_event_t type)
     -{
    --	size_t offset;
    --
    --	if (!data->opts || !data->opts->event_fn)
    --		return 0;
    +-	size_t offset = cs->do_ftell(cs);
     -
    --	if (type == CONFIG_EVENT_WHITESPACE &&
    --	    data->previous_type == type)
    --		return 0;
    --
    --	offset = cs->do_ftell(cs);
     -	/*
     -	 * At EOF, the parser always "inserts" an extra '\n', therefore
     -	 * the end offset of the event is the current file position, otherwise
    @@ config.c: int git_config_from_parameters(config_fn_t fn, void *data)
     -	 */
     -	if (type != CONFIG_EVENT_EOF)
     -		offset--;
    +-	return offset;
    +-}
    +-
    +-static void start_event(struct config_source *cs, enum config_event_t type,
    +-		       struct parse_event_data *data)
    +-{
    +-	data->previous_type = type;
    +-	data->previous_offset = get_corrected_offset(cs, type);
    +-}
    +-
    +-static int flush_event(struct config_source *cs, enum config_event_t type,
    +-		       struct parse_event_data *data)
    +-{
    +-	if (!data->opts || !data->opts->event_fn)
    +-		return 0;
    +-
    +-	if (type == CONFIG_EVENT_WHITESPACE &&
    +-	    data->previous_type == type)
    +-		return 0;
     -
     -	if (data->previous_type != CONFIG_EVENT_EOF &&
     -	    data->opts->event_fn(data->previous_type, data->previous_offset,
    --				 offset, cs, data->opts->event_fn_data) < 0)
    +-				 get_corrected_offset(cs, type), cs,
    +-				 data->opts->event_fn_data) < 0)
     -		return -1;
     -
    --	data->previous_type = type;
    --	data->previous_offset = offset;
    +-	return 1;
    +-}
    +-
    +-static int do_event(struct config_source *cs, enum config_event_t type,
    +-		    struct parse_event_data *data)
    +-{
    +-	int maybe_ret;
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	start_event(cs, type, data);
    +-
    +-	return 0;
    +-}
    +-
    +-static int do_event_and_flush(struct config_source *cs,
    +-			      enum config_event_t type,
    +-			      struct parse_event_data *data)
    +-{
    +-	int maybe_ret;
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	start_event(cs, type, data);
    +-
    +-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
    +-		return maybe_ret;
    +-
    +-	/*
    +-	 * Not actually EOF, but this indicates we don't have a valid event
    +-	 * to flush next time around.
    +-	 */
    +-	data->previous_type = CONFIG_EVENT_EOF;
     -
     -	return 0;
     -}
    @@ config.c: int git_config_err_fn(enum config_event_t type, size_t begin_offset UN
     -			break;
     -	}
     -
    --	/*
    --	 * FIXME for whatever reason, do_event passes the _previous_ event, so
    --	 * in order for our callback to receive the error event, we have to call
    --	 * do_event twice
    --	 */
    --	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    --	do_event(cs, CONFIG_EVENT_ERROR, &event_data);
    +-	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
     -	return -1;
     -}
     -

base-commit: aa9166bcc0ba654fc21f198a30647ec087f733ed
-- 
2.42.0.515.g380fc7ccd1-goog


* [PATCH v3 1/5] config: split out config_parse_options
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
@ 2023-09-21 21:17   ` Josh Steadmon
  2023-10-23 17:52     ` Jonathan Tan
  2023-09-21 21:17   ` [PATCH v3 2/5] config: split do_event() into start and flush operations Josh Steadmon
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

"struct config_options" is a disjoint set of options used by the config
parser (e.g. event listeners) and options used by config_with_options()
(e.g. to handle includes, choose which config files to parse). Split
parser-only options into config_parse_options.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 bundle-uri.c |  2 +-
 config.c     | 14 +++++++-------
 config.h     | 37 ++++++++++++++++++++-----------------
 fsck.c       |  2 +-
 4 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4b5c49b93d..f93ca6a486 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,7 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_options opts = {
+	struct config_parse_options opts = {
 		.error_action = CONFIG_ERROR_ERROR,
 	};
 
diff --git a/config.c b/config.c
index 85c5f35132..1518f70fc2 100644
--- a/config.c
+++ b/config.c
@@ -982,7 +982,7 @@ static int get_base_var(struct config_source *cs, struct strbuf *name)
 struct parse_event_data {
 	enum config_event_t previous_type;
 	size_t previous_offset;
-	const struct config_options *opts;
+	const struct config_parse_options *opts;
 };
 
 static int do_event(struct config_source *cs, enum config_event_t type,
@@ -1030,7 +1030,7 @@ static void kvi_from_source(struct config_source *cs,
 
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
-			    const struct config_options *opts)
+			    const struct config_parse_options *opts)
 {
 	int comment = 0;
 	size_t baselen = 0;
@@ -1967,7 +1967,7 @@ int git_default_config(const char *var, const char *value,
  */
 static int do_config_from(struct config_source *top, config_fn_t fn,
 			  void *data, enum config_scope scope,
-			  const struct config_options *opts)
+			  const struct config_parse_options *opts)
 {
 	struct key_value_info kvi = KVI_INIT;
 	int ret;
@@ -1992,7 +1992,7 @@ static int do_config_from_file(config_fn_t fn,
 			       const enum config_origin_type origin_type,
 			       const char *name, const char *path, FILE *f,
 			       void *data, enum config_scope scope,
-			       const struct config_options *opts)
+			       const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 	int ret;
@@ -2021,7 +2021,7 @@ static int git_config_from_stdin(config_fn_t fn, void *data,
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 				      void *data, enum config_scope scope,
-				      const struct config_options *opts)
+				      const struct config_parse_options *opts)
 {
 	int ret = -1;
 	FILE *f;
@@ -2047,7 +2047,7 @@ int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type origin_type,
 			const char *name, const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts)
+			const struct config_parse_options *opts)
 {
 	struct config_source top = CONFIG_SOURCE_INIT;
 
@@ -3380,7 +3380,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_options opts;
+		struct config_parse_options opts;
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
diff --git a/config.h b/config.h
index 6332d74904..2537516446 100644
--- a/config.h
+++ b/config.h
@@ -85,6 +85,21 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					struct config_source *cs,
 					void *event_fn_data);
 
+struct config_parse_options {
+	enum config_error_action {
+		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
+		CONFIG_ERROR_DIE, /* die() on error */
+		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+		CONFIG_ERROR_SILENT, /* return -1 */
+	} error_action;
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -92,6 +107,9 @@ struct config_options {
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
 
+	const char *commondir;
+	const char *git_dir;
+	struct config_parse_options parse_options;
 	/*
 	 * For internal use. Include all includeif.hasremoteurl paths without
 	 * checking if the repo has that remote URL, and when doing so, verify
@@ -99,21 +117,6 @@ struct config_options {
 	 * themselves.
 	 */
 	unsigned int unconditional_remote_url : 1;
-
-	const char *commondir;
-	const char *git_dir;
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 };
 
 /* Config source metadata for a given config key-value pair */
@@ -178,13 +181,13 @@ int git_config_from_file(config_fn_t fn, const char *, void *);
 
 int git_config_from_file_with_options(config_fn_t fn, const char *,
 				      void *, enum config_scope,
-				      const struct config_options *);
+				      const struct config_parse_options *);
 int git_config_from_mem(config_fn_t fn,
 			const enum config_origin_type,
 			const char *name,
 			const char *buf, size_t len,
 			void *data, enum config_scope scope,
-			const struct config_options *opts);
+			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
diff --git a/fsck.c b/fsck.c
index 3be86616c5..522ee1c18a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,7 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_options config_opts = { 0 };
+		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
-- 
2.42.0.515.g380fc7ccd1-goog


* [PATCH v3 2/5] config: split do_event() into start and flush operations
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
  2023-09-21 21:17   ` [PATCH v3 1/5] config: split out config_parse_options Josh Steadmon
@ 2023-09-21 21:17   ` Josh Steadmon
  2023-10-23 18:05     ` Jonathan Tan
  2023-09-21 21:17   ` [PATCH v3 3/5] config: report config parse errors using cb Josh Steadmon
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

When handling config-parsing events, the current do_event() handler is a
bit confusing; calling it with a specific event type records the initial
offset where the event occurred, and runs the supplied callback against
the previous event (whose end offset is now known).

Split this operation into "start_event" and "flush_event" functions.
Then reimplement "do_event" (preserving the original behavior) using the
newly split functions.

In a later change, we can use these building blocks to also handle
"immediate" events, where we want to run the callback without having to
calculate an end offset for the event.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 config.c | 50 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/config.c b/config.c
index 1518f70fc2..ff138500a2 100644
--- a/config.c
+++ b/config.c
@@ -985,19 +985,11 @@ struct parse_event_data {
 	const struct config_parse_options *opts;
 };
 
-static int do_event(struct config_source *cs, enum config_event_t type,
-		    struct parse_event_data *data)
+static size_t get_corrected_offset(struct config_source *cs,
+				   enum config_event_t type)
 {
-	size_t offset;
-
-	if (!data->opts || !data->opts->event_fn)
-		return 0;
-
-	if (type == CONFIG_EVENT_WHITESPACE &&
-	    data->previous_type == type)
-		return 0;
+	size_t offset = cs->do_ftell(cs);
 
-	offset = cs->do_ftell(cs);
 	/*
 	 * At EOF, the parser always "inserts" an extra '\n', therefore
 	 * the end offset of the event is the current file position, otherwise
@@ -1005,14 +997,44 @@ static int do_event(struct config_source *cs, enum config_event_t type,
 	 */
 	if (type != CONFIG_EVENT_EOF)
 		offset--;
+	return offset;
+}
+
+static void start_event(struct config_source *cs, enum config_event_t type,
+		       struct parse_event_data *data)
+{
+	data->previous_type = type;
+	data->previous_offset = get_corrected_offset(cs, type);
+}
+
+static int flush_event(struct config_source *cs, enum config_event_t type,
+		       struct parse_event_data *data)
+{
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
 
 	if (data->previous_type != CONFIG_EVENT_EOF &&
 	    data->opts->event_fn(data->previous_type, data->previous_offset,
-				 offset, cs, data->opts->event_fn_data) < 0)
+				 get_corrected_offset(cs, type), cs,
+				 data->opts->event_fn_data) < 0)
 		return -1;
 
-	data->previous_type = type;
-	data->previous_offset = offset;
+	return 1;
+}
+
+static int do_event(struct config_source *cs, enum config_event_t type,
+		    struct parse_event_data *data)
+{
+	int maybe_ret;
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	start_event(cs, type, data);
 
 	return 0;
 }
-- 
2.42.0.515.g380fc7ccd1-goog


* [PATCH v3 3/5] config: report config parse errors using cb
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
  2023-09-21 21:17   ` [PATCH v3 1/5] config: split out config_parse_options Josh Steadmon
  2023-09-21 21:17   ` [PATCH v3 2/5] config: split do_event() into start and flush operations Josh Steadmon
@ 2023-09-21 21:17   ` Josh Steadmon
  2023-10-23 18:41     ` Jonathan Tan
  2023-10-23 19:29     ` Taylor Blau
  2023-09-21 21:17   ` [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

In a subsequent commit, config parsing will become its own library, and
it's likely that the caller will want flexibility in handling errors
(instead of being limited to the error handling we have in-tree).

Move the Git-specific error handling into a config_parser_event_fn_t
that responds to config errors, and make git_parse_source() always
return -1 (careful inspection shows that it was always returning -1
already). This makes CONFIG_ERROR_SILENT obsolete since that is
equivalent to not specifying an error event listener. Also, remove
CONFIG_ERROR_UNSET and the config_source 'default', since all callers
are now expected to specify the error handling they want.

Add a new "do_event_and_flush" function for running event callbacks
immediately, where the event does not need to calculate an end offset.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 builtin/config.c   |   4 +-
 bundle-uri.c       |   4 +-
 config.c           | 195 ++++++++++++++++++++++++++++-----------------
 config.h           |  20 +++--
 fsck.c             |   4 +-
 submodule-config.c |   9 ++-
 6 files changed, 147 insertions(+), 89 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 1c75cbc43d..e2cf49de7a 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -42,7 +42,9 @@ static int actions, type;
 static char *default_value;
 static int end_nul;
 static int respect_includes_opt = -1;
-static struct config_options config_options;
+static struct config_options config_options = {
+	.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE)
+};
 static int show_origin;
 static int show_scope;
 static int fixed_value;
diff --git a/bundle-uri.c b/bundle-uri.c
index f93ca6a486..856bffdcad 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
 				   struct bundle_list *list)
 {
 	int result;
-	struct config_parse_options opts = {
-		.error_action = CONFIG_ERROR_ERROR,
-	};
+	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	if (!list->baseURI) {
 		struct strbuf baseURI = STRBUF_INIT;
diff --git a/config.c b/config.c
index ff138500a2..0c4f1a2874 100644
--- a/config.c
+++ b/config.c
@@ -55,7 +55,6 @@ struct config_source {
 	enum config_origin_type origin_type;
 	const char *name;
 	const char *path;
-	enum config_error_action default_error_action;
 	int linenr;
 	int eof;
 	size_t total_len;
@@ -185,13 +184,15 @@ static int handle_path_include(const struct key_value_info *kvi,
 	}
 
 	if (!access_or_die(path, R_OK, 0)) {
+		struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 		if (++inc->depth > MAX_INCLUDE_DEPTH)
 			die(_(include_depth_advice), MAX_INCLUDE_DEPTH, path,
 			    !kvi ? "<unknown>" :
 			    kvi->filename ? kvi->filename :
 			    "the command line");
 		ret = git_config_from_file_with_options(git_config_include, path, inc,
-							kvi->scope, NULL);
+							kvi->scope, &config_opts);
 		inc->depth--;
 	}
 cleanup:
@@ -339,7 +340,9 @@ static int add_remote_url(const char *var, const char *value,
 
 static void populate_remote_urls(struct config_include_data *inc)
 {
-	struct config_options opts;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts = *inc->opts;
 	opts.unconditional_remote_url = 1;
@@ -1039,6 +1042,29 @@ static int do_event(struct config_source *cs, enum config_event_t type,
 	return 0;
 }
 
+static int do_event_and_flush(struct config_source *cs,
+			      enum config_event_t type,
+			      struct parse_event_data *data)
+{
+	int maybe_ret;
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	start_event(cs, type, data);
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	/*
+	 * Not actually EOF, but this indicates we don't have a valid event
+	 * to flush next time around.
+	 */
+	data->previous_type = CONFIG_EVENT_EOF;
+
+	return 0;
+}
+
 static void kvi_from_source(struct config_source *cs,
 			    enum config_scope scope,
 			    struct key_value_info *out)
@@ -1050,6 +1076,56 @@ static void kvi_from_source(struct config_source *cs,
 	out->path = cs->path;
 }
 
+int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
+		      size_t end_offset UNUSED, struct config_source *cs,
+		      void *data)
+{
+	char *error_msg = NULL;
+	int error_return = 0;
+	enum config_error_action *action = data;
+
+	if (type != CONFIG_EVENT_ERROR)
+		return 0;
+
+	switch (cs->origin_type) {
+	case CONFIG_ORIGIN_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in blob %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_FILE:
+		error_msg = xstrfmt(_("bad config line %d in file %s"),
+				      cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_STDIN:
+		error_msg = xstrfmt(_("bad config line %d in standard input"),
+				      cs->linenr);
+		break;
+	case CONFIG_ORIGIN_SUBMODULE_BLOB:
+		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
+				       cs->linenr, cs->name);
+		break;
+	case CONFIG_ORIGIN_CMDLINE:
+		error_msg = xstrfmt(_("bad config line %d in command line %s"),
+				       cs->linenr, cs->name);
+		break;
+	default:
+		error_msg = xstrfmt(_("bad config line %d in %s"),
+				      cs->linenr, cs->name);
+	}
+
+	switch (*action) {
+	case CONFIG_ERROR_DIE:
+		die("%s", error_msg);
+		break;
+	case CONFIG_ERROR_ERROR:
+		error_return = error("%s", error_msg);
+		break;
+	}
+
+	free(error_msg);
+	return error_return;
+}
+
 static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			    struct key_value_info *kvi, void *data,
 			    const struct config_parse_options *opts)
@@ -1057,8 +1133,6 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 	int comment = 0;
 	size_t baselen = 0;
 	struct strbuf *var = &cs->var;
-	int error_return = 0;
-	char *error_msg = NULL;
 
 	/* U+FEFF Byte Order Mark in UTF8 */
 	const char *bomptr = utf8_bom;
@@ -1140,53 +1214,8 @@ static int git_parse_source(struct config_source *cs, config_fn_t fn,
 			break;
 	}
 
-	if (do_event(cs, CONFIG_EVENT_ERROR, &event_data) < 0)
-		return -1;
-
-	switch (cs->origin_type) {
-	case CONFIG_ORIGIN_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in blob %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_FILE:
-		error_msg = xstrfmt(_("bad config line %d in file %s"),
-				      cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_STDIN:
-		error_msg = xstrfmt(_("bad config line %d in standard input"),
-				      cs->linenr);
-		break;
-	case CONFIG_ORIGIN_SUBMODULE_BLOB:
-		error_msg = xstrfmt(_("bad config line %d in submodule-blob %s"),
-				       cs->linenr, cs->name);
-		break;
-	case CONFIG_ORIGIN_CMDLINE:
-		error_msg = xstrfmt(_("bad config line %d in command line %s"),
-				       cs->linenr, cs->name);
-		break;
-	default:
-		error_msg = xstrfmt(_("bad config line %d in %s"),
-				      cs->linenr, cs->name);
-	}
-
-	switch (opts && opts->error_action ?
-		opts->error_action :
-		cs->default_error_action) {
-	case CONFIG_ERROR_DIE:
-		die("%s", error_msg);
-		break;
-	case CONFIG_ERROR_ERROR:
-		error_return = error("%s", error_msg);
-		break;
-	case CONFIG_ERROR_SILENT:
-		error_return = -1;
-		break;
-	case CONFIG_ERROR_UNSET:
-		BUG("config error action unset");
-	}
-
-	free(error_msg);
-	return error_return;
+	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
 }
 
 static uintmax_t get_unit_factor(const char *end)
@@ -2023,7 +2052,6 @@ static int do_config_from_file(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = path;
-	top.default_error_action = CONFIG_ERROR_DIE;
 	top.do_fgetc = config_file_fgetc;
 	top.do_ungetc = config_file_ungetc;
 	top.do_ftell = config_file_ftell;
@@ -2037,8 +2065,10 @@ static int do_config_from_file(config_fn_t fn,
 static int git_config_from_stdin(config_fn_t fn, void *data,
 				 enum config_scope scope)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, NULL);
+				   data, scope, &config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2061,8 +2091,10 @@ int git_config_from_file_with_options(config_fn_t fn, const char *filename,
 
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
+
 	return git_config_from_file_with_options(fn, filename, data,
-						 CONFIG_SCOPE_UNKNOWN, NULL);
+						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
 int git_config_from_mem(config_fn_t fn,
@@ -2079,7 +2111,6 @@ int git_config_from_mem(config_fn_t fn,
 	top.origin_type = origin_type;
 	top.name = name;
 	top.path = NULL;
-	top.default_error_action = CONFIG_ERROR_ERROR;
 	top.do_fgetc = config_buf_fgetc;
 	top.do_ungetc = config_buf_ungetc;
 	top.do_ftell = config_buf_ftell;
@@ -2098,6 +2129,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	char *buf;
 	unsigned long size;
 	int ret;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
 
 	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
@@ -2108,7 +2140,7 @@ int git_config_from_blob_oid(config_fn_t fn,
 	}
 
 	ret = git_config_from_mem(fn, CONFIG_ORIGIN_BLOB, name, buf, size,
-				  data, scope, NULL);
+				  data, scope, &config_opts);
 	free(buf);
 
 	return ret;
@@ -2209,29 +2241,32 @@ static int do_git_config_sequence(const struct config_options *opts,
 			   opts->system_gently ? ACCESS_EACCES_OK : 0))
 		ret += git_config_from_file_with_options(fn, system_config,
 							 data, CONFIG_SCOPE_SYSTEM,
-							 NULL);
+							 &opts->parse_options);
 
 	git_global_config(&user_config, &xdg_config);
 
 	if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, xdg_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (user_config && !access_or_die(user_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, user_config, data,
-							 CONFIG_SCOPE_GLOBAL, NULL);
+							 CONFIG_SCOPE_GLOBAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_repo && repo_config &&
 	    !access_or_die(repo_config, R_OK, 0))
 		ret += git_config_from_file_with_options(fn, repo_config, data,
-							 CONFIG_SCOPE_LOCAL, NULL);
+							 CONFIG_SCOPE_LOCAL,
+							 &opts->parse_options);
 
 	if (!opts->ignore_worktree && worktree_config &&
 	    repo && repo->repository_format_worktree_config &&
 	    !access_or_die(worktree_config, R_OK, 0)) {
 			ret += git_config_from_file_with_options(fn, worktree_config, data,
 								 CONFIG_SCOPE_WORKTREE,
-								 NULL);
+								 &opts->parse_options);
 	}
 
 	if (!opts->ignore_cmdline && git_config_from_parameters(fn, data) < 0)
@@ -2272,7 +2307,7 @@ int config_with_options(config_fn_t fn, void *data,
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-							NULL);
+							&opts->parse_options);
 	} else if (config_source && config_source->blob) {
 		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 					       data, config_source->scope);
@@ -2310,9 +2345,11 @@ static void configset_iter(struct config_set *set, config_fn_t fn, void *data)
 
 void read_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = {0};
 	struct strbuf commondir = STRBUF_INIT;
 	struct strbuf gitdir = STRBUF_INIT;
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 
@@ -2344,13 +2381,14 @@ void read_early_config(config_fn_t cb, void *data)
  */
 void read_very_early_config(config_fn_t cb, void *data)
 {
-	struct config_options opts = { 0 };
-
-	opts.respect_includes = 1;
-	opts.ignore_repo = 1;
-	opts.ignore_worktree = 1;
-	opts.ignore_cmdline = 1;
-	opts.system_gently = 1;
+	struct config_options opts = {
+		.respect_includes = 1,
+		.ignore_repo = 1,
+		.ignore_worktree = 1,
+		.ignore_cmdline = 1,
+		.system_gently = 1,
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	config_with_options(cb, data, NULL, NULL, &opts);
 }
@@ -2635,7 +2673,9 @@ int git_configset_get_pathname(struct config_set *set, const char *key, const ch
 /* Functions use to read configuration from a repository */
 static void repo_read_config(struct repository *repo)
 {
-	struct config_options opts = { 0 };
+	struct config_options opts = {
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+	};
 
 	opts.respect_includes = 1;
 	opts.commondir = repo->commondir;
@@ -2786,8 +2826,10 @@ static void read_protected_config(void)
 		.ignore_repo = 1,
 		.ignore_worktree = 1,
 		.system_gently = 1,
+		.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
 	};
 
+
 	git_configset_init(&protected_config);
 	config_with_options(config_set_callback, &protected_config, NULL,
 			    NULL, &opts);
@@ -2998,6 +3040,7 @@ struct config_store_data {
 		enum config_event_t type;
 		int is_keys_section;
 	} *parsed;
+	enum config_error_action error_action;
 	unsigned int parsed_nr, parsed_alloc, *seen, seen_nr, seen_alloc;
 	unsigned int key_seen:1, section_seen:1, is_keys_section:1;
 };
@@ -3065,6 +3108,10 @@ static int store_aux_event(enum config_event_t type, size_t begin, size_t end,
 			store->seen[store->seen_nr] = store->parsed_nr;
 		}
 	}
+	if (type == CONFIG_EVENT_ERROR) {
+		return git_config_err_fn(type, begin, end, cs,
+					 &store->error_action);
+	}
 
 	store->parsed_nr++;
 
@@ -3402,7 +3449,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 		struct stat st;
 		size_t copy_begin, copy_end;
 		int i, new_line = 0;
-		struct config_parse_options opts;
+		struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
 
 		if (!value_pattern)
 			store.value_pattern = NULL;
@@ -3429,8 +3476,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 
 		ALLOC_GROW(store.parsed, 1, store.parsed_alloc);
 		store.parsed[0].end = 0;
+		store.error_action = CONFIG_ERROR_DIE;
 
-		memset(&opts, 0, sizeof(opts));
 		opts.event_fn = store_aux_event;
 		opts.event_fn_data = &store;
 
diff --git a/config.h b/config.h
index 2537516446..8ad399580f 100644
--- a/config.h
+++ b/config.h
@@ -86,12 +86,6 @@ typedef int (*config_parser_event_fn_t)(enum config_event_t type,
 					void *event_fn_data);
 
 struct config_parse_options {
-	enum config_error_action {
-		CONFIG_ERROR_UNSET = 0, /* use source-specific default */
-		CONFIG_ERROR_DIE, /* die() on error */
-		CONFIG_ERROR_ERROR, /* error() on error, return -1 */
-		CONFIG_ERROR_SILENT, /* return -1 */
-	} error_action;
 	/*
 	 * event_fn and event_fn_data are for internal use only. Handles events
 	 * emitted by the config parser.
@@ -100,6 +94,11 @@ struct config_parse_options {
 	void *event_fn_data;
 };
 
+#define CP_OPTS_INIT(error_action) { \
+	.event_fn = git_config_err_fn, \
+	.event_fn_data = (enum config_error_action []){(error_action)}, \
+}
+
 struct config_options {
 	unsigned int respect_includes : 1;
 	unsigned int ignore_repo : 1;
@@ -119,6 +118,15 @@ struct config_options {
 	unsigned int unconditional_remote_url : 1;
 };
 
+enum config_error_action {
+	CONFIG_ERROR_DIE, /* die() on error */
+	CONFIG_ERROR_ERROR, /* error() on error, return -1 */
+};
+
+int git_config_err_fn(enum config_event_t type, size_t begin_offset,
+		      size_t end_offset, struct config_source *cs,
+		      void *event_fn_data);
+
 /* Config source metadata for a given config key-value pair */
 struct key_value_info {
 	const char *filename;
diff --git a/fsck.c b/fsck.c
index 522ee1c18a..bc0ca11421 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1219,7 +1219,6 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 
 	if (oidset_contains(&options->gitmodules_found, oid)) {
-		struct config_parse_options config_opts = { 0 };
 		struct fsck_gitmodules_data data;
 
 		oidset_insert(&options->gitmodules_done, oid);
@@ -1238,10 +1237,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		data.oid = oid;
 		data.options = options;
 		data.ret = 0;
-		config_opts.error_action = CONFIG_ERROR_SILENT;
 		if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 					".gitmodules", buf, size, &data,
-					CONFIG_SCOPE_UNKNOWN, &config_opts))
+					CONFIG_SCOPE_UNKNOWN, NULL))
 			data.ret |= report(options, oid, OBJ_BLOB,
 					FSCK_MSG_GITMODULES_PARSE,
 					"could not parse gitmodules blob");
diff --git a/submodule-config.c b/submodule-config.c
index b6908e295f..d97135c917 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -565,6 +565,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	enum object_type type;
 	const struct submodule *submodule = NULL;
 	struct parse_config_parameter parameter;
+	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
+
 
 	/*
 	 * If any parameter except the cache is a NULL pointer just
@@ -608,7 +610,8 @@ static const struct submodule *config_from(struct submodule_cache *cache,
 	parameter.gitmodules_oid = &oid;
 	parameter.overwrite = 0;
 	git_config_from_mem(parse_config, CONFIG_ORIGIN_SUBMODULE_BLOB, rev.buf,
-			    config, config_size, &parameter, CONFIG_SCOPE_UNKNOWN, NULL);
+			    config, config_size, &parameter,
+			    CONFIG_SCOPE_UNKNOWN, &config_opts);
 	strbuf_release(&rev);
 	free(config);
 
@@ -652,7 +655,9 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 		struct git_config_source config_source = {
 			0, .scope = CONFIG_SCOPE_SUBMODULE
 		};
-		const struct config_options opts = { 0 };
+		struct config_options opts = {
+			.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE),
+		};
 		struct object_id oid;
 		char *file;
 		char *oidstr = NULL;
-- 
2.42.0.515.g380fc7ccd1-goog



* [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
                     ` (2 preceding siblings ...)
  2023-09-21 21:17   ` [PATCH v3 3/5] config: report config parse errors using cb Josh Steadmon
@ 2023-09-21 21:17   ` Josh Steadmon
  2023-10-23 18:52     ` Jonathan Tan
  2023-09-21 21:17   ` [PATCH v3 5/5] config-parse: split library out of config.[c|h] Josh Steadmon
  2023-10-17 17:13   ` [PATCH v3 0/5] config-parse: create config parsing library Junio C Hamano
  5 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

A later commit will move git_config_from_stdin() to a library, so it
will need to accept event listeners.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 config.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/config.c b/config.c
index 0c4f1a2874..50188f469a 100644
--- a/config.c
+++ b/config.c
@@ -2063,12 +2063,11 @@ static int do_config_from_file(config_fn_t fn,
 }
 
 static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope)
+				 enum config_scope scope,
+				 const struct config_parse_options *config_opts)
 {
-	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
-
 	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, &config_opts);
+				   data, scope, config_opts);
 }
 
 int git_config_from_file_with_options(config_fn_t fn, const char *filename,
@@ -2303,7 +2302,8 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		ret = git_config_from_stdin(fn, data, config_source->scope);
+		ret = git_config_from_stdin(fn, data, config_source->scope,
+					    &opts->parse_options);
 	} else if (config_source && config_source->file) {
 		ret = git_config_from_file_with_options(fn, config_source->file,
 							data, config_source->scope,
-- 
2.42.0.515.g380fc7ccd1-goog



* [PATCH v3 5/5] config-parse: split library out of config.[c|h]
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
                     ` (3 preceding siblings ...)
  2023-09-21 21:17   ` [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
@ 2023-09-21 21:17   ` Josh Steadmon
  2023-10-23 18:53     ` Jonathan Tan
  2023-10-17 17:13   ` [PATCH v3 0/5] config-parse: create config parsing library Junio C Hamano
  5 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-09-21 21:17 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, calvinwan, glencbz, gitster

From: Glen Choo <chooglen@google.com>

The config parsing machinery (besides "include" directives) is usable by
programs other than Git - it works with any file written in Git config
syntax (IOW it doesn't rely on 'core' Git features like a repository),
and as of the series ending at 6e8e7981eb (config: pass source to
config_parser_event_fn_t, 2023-06-28), it no longer relies on global
state. Thus, we can and should start turning it into a library other
programs can use.

Begin this process by splitting the config parsing code out of
config.[c|h] and into config-parse.[c|h]. Do not change interfaces or
function bodies, but tweak visibility and includes where appropriate,
namely:

- git_config_from_stdin() is now non-static so that it can be seen by
  config.c.

- "struct config_source" is now defined in the .h file so that it can be
  seen by config.c. As a result, config-parse.h needs to "#include
  strbuf.h".

In theory, this makes it possible for in-tree files to decide whether
they need all of the config functionality or only config parsing,
and bring in the smallest bit of functionality needed. But for now,
there are no in-tree files that can swap "#include config.h" for
"#include config-parse.h". E.g. Bundle URIs would only need config
parsing to parse bundle lists, but bundle-uri.c uses other config.h
functionality like key parsing and reading repo settings.

The resulting library is usable, though using it is unergonomic,
e.g. the caller needs to "#include git-compat-util.h" and other
dependencies, and we don't have an easy way of linking in the required
objects. This isn't the end state we want for our libraries, but at
least we have _some_ library whose usability we can improve in future
series.
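
To give a feel for the callback-driven interface an external caller
would work with, here is a toy, standalone sketch. It deliberately does
not use config-parse.[c|h]: toy_config_fn, toy_parse_config(),
struct toy_lookup and toy_lookup_cb() are hypothetical names, and the
parser handles only a tiny subset of Git config syntax ("[section]"
headers and "name = value" lines, with no quoting, escapes, comments or
subsections).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type, loosely modelled on config_fn_t. */
typedef int (*toy_config_fn)(const char *key, const char *value, void *data);

/* Returns 0 on success, -1 on a malformed line or callback failure. */
int toy_parse_config(const char *buf, toy_config_fn fn, void *data)
{
	char section[64] = "";
	char line[256], key[128];

	assert(fn);
	while (*buf) {
		size_t i, len = strcspn(buf, "\n");
		char *p = line, *eq, *end, *value;
		int n;

		if (len >= sizeof(line))
			return -1;
		memcpy(line, buf, len);
		line[len] = '\0';
		buf += len;
		if (*buf == '\n')
			buf++;

		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			continue; /* blank line */

		if (*p == '[') {
			end = strchr(p, ']');
			if (!end || end == p + 1 ||
			    (size_t)(end - p - 1) >= sizeof(section))
				return -1;
			len = end - p - 1;
			for (i = 0; i < len; i++)
				section[i] = tolower((unsigned char)p[i + 1]);
			section[len] = '\0';
			continue;
		}

		eq = strchr(p, '=');
		if (!eq || eq == p || !*section)
			return -1;
		/* Trim whitespace between the name and '='. */
		for (end = eq; isspace((unsigned char)end[-1]); end--)
			;
		*end = '\0';
		/* Trim whitespace around the value. */
		for (value = eq + 1; isspace((unsigned char)*value); value++)
			;
		end = value + strlen(value);
		while (end > value && isspace((unsigned char)end[-1]))
			*--end = '\0';

		n = snprintf(key, sizeof(key), "%s.%s", section, p);
		if (n < 0 || (size_t)n >= sizeof(key))
			return -1;
		for (i = 0; key[i]; i++)
			key[i] = tolower((unsigned char)key[i]);
		if (fn(key, value, data) < 0)
			return -1;
	}
	return 0;
}

/* Example callback: remember the value of one wanted key. */
struct toy_lookup {
	const char *want;
	char found[128];
	int hits;
};

int toy_lookup_cb(const char *key, const char *value, void *data)
{
	struct toy_lookup *l = data;

	if (!strcmp(key, l->want)) {
		snprintf(l->found, sizeof(l->found), "%s", value);
		l->hits++;
	}
	return 0;
}
```

A caller would fill in a struct toy_lookup with the key it wants and
pass toy_lookup_cb as the callback. The real library follows the same
overall pattern (the parser owns the syntax, the caller owns what
happens to each key/value pair) but with the full syntax, origin
tracking and event listeners.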

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Makefile       |   1 +
 config-parse.c | 601 +++++++++++++++++++++++++++++++++++++++++++++++
 config-parse.h | 155 ++++++++++++
 config.c       | 621 -------------------------------------------------
 config.h       | 119 +---------
 5 files changed, 758 insertions(+), 739 deletions(-)
 create mode 100644 config-parse.c
 create mode 100644 config-parse.h

diff --git a/Makefile b/Makefile
index fb541dedc9..67e05bcee5 100644
--- a/Makefile
+++ b/Makefile
@@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += compat/zlib-uncompress2.o
 LIB_OBJS += config.o
+LIB_OBJS += config-parse.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/config-parse.c b/config-parse.c
new file mode 100644
index 0000000000..66e5953e29
--- /dev/null
+++ b/config-parse.c
@@ -0,0 +1,601 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "gettext.h"
+#include "hashmap.h"
+#include "utf8.h"
+#include "config-parse.h"
+
+static int config_file_fgetc(struct config_source *conf)
+{
+	return getc_unlocked(conf->u.file);
+}
+
+static int config_file_ungetc(int c, struct config_source *conf)
+{
+	return ungetc(c, conf->u.file);
+}
+
+static long config_file_ftell(struct config_source *conf)
+{
+	return ftell(conf->u.file);
+}
+
+
+static int config_buf_fgetc(struct config_source *conf)
+{
+	if (conf->u.buf.pos < conf->u.buf.len)
+		return conf->u.buf.buf[conf->u.buf.pos++];
+
+	return EOF;
+}
+
+static int config_buf_ungetc(int c, struct config_source *conf)
+{
+	if (conf->u.buf.pos > 0) {
+		conf->u.buf.pos--;
+		if (conf->u.buf.buf[conf->u.buf.pos] != c)
+			BUG("config_buf can only ungetc the same character");
+		return c;
+	}
+
+	return EOF;
+}
+
+static long config_buf_ftell(struct config_source *conf)
+{
+	return conf->u.buf.pos;
+}
+
+static inline int iskeychar(int c)
+{
+	return isalnum(c) || c == '-';
+}
+
+/*
+ * Auxiliary function to sanity-check and split the key into the section
+ * identifier and variable name.
+ *
+ * Returns 0 on success, -CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and -CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
+ *
+ * store_key - pointer to char* which will hold a copy of the key with
+ *             lowercase section and variable name
+ * baselen - pointer to size_t which will hold the length of the
+ *           section + subsection part, can be NULL
+ */
+int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
+{
+	size_t i, baselen;
+	int dot;
+	const char *last_dot = strrchr(key, '.');
+
+	/*
+	 * Since "key" actually contains the section name and the real
+	 * key name separated by a dot, we have to know where the dot is.
+	 */
+
+	if (last_dot == NULL || last_dot == key) {
+		error(_("key does not contain a section: %s"), key);
+		return -CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	if (!last_dot[1]) {
+		error(_("key does not contain variable name: %s"), key);
+		return -CONFIG_NO_SECTION_OR_NAME;
+	}
+
+	baselen = last_dot - key;
+	if (baselen_)
+		*baselen_ = baselen;
+
+	/*
+	 * Validate the key and while at it, lower case it for matching.
+	 */
+	*store_key = xmallocz(strlen(key));
+
+	dot = 0;
+	for (i = 0; key[i]; i++) {
+		unsigned char c = key[i];
+		if (c == '.')
+			dot = 1;
+		/* Leave the extended basename untouched.. */
+		if (!dot || i > baselen) {
+			if (!iskeychar(c) ||
+			    (i == baselen + 1 && !isalpha(c))) {
+				error(_("invalid key: %s"), key);
+				goto out_free_ret_1;
+			}
+			c = tolower(c);
+		} else if (c == '\n') {
+			error(_("invalid key (newline): %s"), key);
+			goto out_free_ret_1;
+		}
+		(*store_key)[i] = c;
+	}
+
+	return 0;
+
+out_free_ret_1:
+	FREE_AND_NULL(*store_key);
+	return -CONFIG_INVALID_KEY;
+}
+
+static int get_next_char(struct config_source *cs)
+{
+	int c = cs->do_fgetc(cs);
+
+	if (c == '\r') {
+		/* DOS like systems */
+		c = cs->do_fgetc(cs);
+		if (c != '\n') {
+			if (c != EOF)
+				cs->do_ungetc(c, cs);
+			c = '\r';
+		}
+	}
+
+	if (c != EOF && ++cs->total_len > INT_MAX) {
+		/*
+		 * This is an absurdly long config file; refuse to parse
+		 * further in order to protect downstream code from integer
+		 * overflows. Note that we can't return an error specifically,
+		 * but we can mark EOF and put trash in the return value,
+		 * which will trigger a parse error.
+		 */
+		cs->eof = 1;
+		return 0;
+	}
+
+	if (c == '\n')
+		cs->linenr++;
+	if (c == EOF) {
+		cs->eof = 1;
+		cs->linenr++;
+		c = '\n';
+	}
+	return c;
+}
+
+static char *parse_value(struct config_source *cs)
+{
+	int quote = 0, comment = 0, space = 0;
+
+	strbuf_reset(&cs->value);
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n') {
+			if (quote) {
+				cs->linenr--;
+				return NULL;
+			}
+			return cs->value.buf;
+		}
+		if (comment)
+			continue;
+		if (isspace(c) && !quote) {
+			if (cs->value.len)
+				space++;
+			continue;
+		}
+		if (!quote) {
+			if (c == ';' || c == '#') {
+				comment = 1;
+				continue;
+			}
+		}
+		for (; space; space--)
+			strbuf_addch(&cs->value, ' ');
+		if (c == '\\') {
+			c = get_next_char(cs);
+			switch (c) {
+			case '\n':
+				continue;
+			case 't':
+				c = '\t';
+				break;
+			case 'b':
+				c = '\b';
+				break;
+			case 'n':
+				c = '\n';
+				break;
+			/* Some characters escape as themselves */
+			case '\\': case '"':
+				break;
+			/* Reject unknown escape sequences */
+			default:
+				return NULL;
+			}
+			strbuf_addch(&cs->value, c);
+			continue;
+		}
+		if (c == '"') {
+			quote = 1-quote;
+			continue;
+		}
+		strbuf_addch(&cs->value, c);
+	}
+}
+
+static int get_value(struct config_source *cs, struct key_value_info *kvi,
+		     config_fn_t fn, void *data, struct strbuf *name)
+{
+	int c;
+	char *value;
+	int ret;
+	struct config_context ctx = {
+		.kvi = kvi,
+	};
+
+	/* Get the full name */
+	for (;;) {
+		c = get_next_char(cs);
+		if (cs->eof)
+			break;
+		if (!iskeychar(c))
+			break;
+		strbuf_addch(name, tolower(c));
+	}
+
+	while (c == ' ' || c == '\t')
+		c = get_next_char(cs);
+
+	value = NULL;
+	if (c != '\n') {
+		if (c != '=')
+			return -1;
+		value = parse_value(cs);
+		if (!value)
+			return -1;
+	}
+	/*
+	 * We already consumed the \n, but we need linenr to point to
+	 * the line we just parsed during the call to fn to get
+	 * accurate line number in error messages.
+	 */
+	cs->linenr--;
+	kvi->linenr = cs->linenr;
+	ret = fn(name->buf, value, &ctx, data);
+	if (ret >= 0)
+		cs->linenr++;
+	return ret;
+}
+
+static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
+				 int c)
+{
+	cs->subsection_case_sensitive = 0;
+	do {
+		if (c == '\n')
+			goto error_incomplete_line;
+		c = get_next_char(cs);
+	} while (isspace(c));
+
+	/* We require the format to be '[base "extension"]' */
+	if (c != '"')
+		return -1;
+	strbuf_addch(name, '.');
+
+	for (;;) {
+		int c = get_next_char(cs);
+		if (c == '\n')
+			goto error_incomplete_line;
+		if (c == '"')
+			break;
+		if (c == '\\') {
+			c = get_next_char(cs);
+			if (c == '\n')
+				goto error_incomplete_line;
+		}
+		strbuf_addch(name, c);
+	}
+
+	/* Final ']' */
+	if (get_next_char(cs) != ']')
+		return -1;
+	return 0;
+error_incomplete_line:
+	cs->linenr--;
+	return -1;
+}
+
+static int get_base_var(struct config_source *cs, struct strbuf *name)
+{
+	cs->subsection_case_sensitive = 1;
+	for (;;) {
+		int c = get_next_char(cs);
+		if (cs->eof)
+			return -1;
+		if (c == ']')
+			return 0;
+		if (isspace(c))
+			return get_extended_base_var(cs, name, c);
+		if (!iskeychar(c) && c != '.')
+			return -1;
+		strbuf_addch(name, tolower(c));
+	}
+}
+
+struct parse_event_data {
+	enum config_event_t previous_type;
+	size_t previous_offset;
+	const struct config_parse_options *opts;
+};
+
+static size_t get_corrected_offset(struct config_source *cs,
+				   enum config_event_t type)
+{
+	size_t offset = cs->do_ftell(cs);
+
+	/*
+	 * At EOF, the parser always "inserts" an extra '\n', therefore
+	 * the end offset of the event is the current file position, otherwise
+	 * we will already have advanced to the next event.
+	 */
+	if (type != CONFIG_EVENT_EOF)
+		offset--;
+	return offset;
+}
+
+static void start_event(struct config_source *cs, enum config_event_t type,
+		       struct parse_event_data *data)
+{
+	data->previous_type = type;
+	data->previous_offset = get_corrected_offset(cs, type);
+}
+
+static int flush_event(struct config_source *cs, enum config_event_t type,
+		       struct parse_event_data *data)
+{
+	if (!data->opts || !data->opts->event_fn)
+		return 0;
+
+	if (type == CONFIG_EVENT_WHITESPACE &&
+	    data->previous_type == type)
+		return 0;
+
+	if (data->previous_type != CONFIG_EVENT_EOF &&
+	    data->opts->event_fn(data->previous_type, data->previous_offset,
+				 get_corrected_offset(cs, type), cs,
+				 data->opts->event_fn_data) < 0)
+		return -1;
+
+	return 1;
+}
+
+static int do_event(struct config_source *cs, enum config_event_t type,
+		    struct parse_event_data *data)
+{
+	int maybe_ret;
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	start_event(cs, type, data);
+
+	return 0;
+}
+
+static int do_event_and_flush(struct config_source *cs,
+			      enum config_event_t type,
+			      struct parse_event_data *data)
+{
+	int maybe_ret;
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	start_event(cs, type, data);
+
+	if ((maybe_ret = flush_event(cs, type, data)) < 1)
+		return maybe_ret;
+
+	/*
+	 * Not actually EOF, but this indicates we don't have a valid event
+	 * to flush next time around.
+	 */
+	data->previous_type = CONFIG_EVENT_EOF;
+
+	return 0;
+}
+
+static void kvi_from_source(struct config_source *cs,
+			    enum config_scope scope,
+			    struct key_value_info *out)
+{
+	out->filename = strintern(cs->name);
+	out->origin_type = cs->origin_type;
+	out->linenr = cs->linenr;
+	out->scope = scope;
+	out->path = cs->path;
+}
+
+static int git_parse_source(struct config_source *cs, config_fn_t fn,
+			    struct key_value_info *kvi, void *data,
+			    const struct config_parse_options *opts)
+{
+	int comment = 0;
+	size_t baselen = 0;
+	struct strbuf *var = &cs->var;
+
+	/* U+FEFF Byte Order Mark in UTF8 */
+	const char *bomptr = utf8_bom;
+
+	/* For the parser event callback */
+	struct parse_event_data event_data = {
+		CONFIG_EVENT_EOF, 0, opts
+	};
+
+	for (;;) {
+		int c;
+
+		c = get_next_char(cs);
+		if (bomptr && *bomptr) {
+			/* We are at the file beginning; skip UTF8-encoded BOM
+			 * if present. Sane editors won't put this in on their
+			 * own, but e.g. Windows Notepad will do it happily. */
+			if (c == (*bomptr & 0377)) {
+				bomptr++;
+				continue;
+			} else {
+				/* Do not tolerate partial BOM. */
+				if (bomptr != utf8_bom)
+					break;
+				/* No BOM at file beginning. Cool. */
+				bomptr = NULL;
+			}
+		}
+		if (c == '\n') {
+			if (cs->eof) {
+				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
+					return -1;
+				return 0;
+			}
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+				return -1;
+			comment = 0;
+			continue;
+		}
+		if (comment)
+			continue;
+		if (isspace(c)) {
+			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
+					return -1;
+			continue;
+		}
+		if (c == '#' || c == ';') {
+			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
+					return -1;
+			comment = 1;
+			continue;
+		}
+		if (c == '[') {
+			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
+					return -1;
+
+			/* Reset prior to determining a new stem */
+			strbuf_reset(var);
+			if (get_base_var(cs, var) < 0 || var->len < 1)
+				break;
+			strbuf_addch(var, '.');
+			baselen = var->len;
+			continue;
+		}
+		if (!isalpha(c))
+			break;
+
+		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
+			return -1;
+
+		/*
+		 * Truncate the var name back to the section header
+		 * stem prior to grabbing the suffix part of the name
+		 * and the value.
+		 */
+		strbuf_setlen(var, baselen);
+		strbuf_addch(var, tolower(c));
+		if (get_value(cs, kvi, fn, data, var) < 0)
+			break;
+	}
+
+	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
+	return -1;
+}
+
+/*
+ * All source specific fields in the union, die_on_error, name and the callbacks
+ * fgetc, ungetc, ftell of top need to be initialized before calling
+ * this function.
+ */
+static int do_config_from(struct config_source *top, config_fn_t fn,
+			  void *data, enum config_scope scope,
+			  const struct config_parse_options *opts)
+{
+	struct key_value_info kvi = KVI_INIT;
+	int ret;
+
+	/* push config-file parsing state stack */
+	top->linenr = 1;
+	top->eof = 0;
+	top->total_len = 0;
+	strbuf_init(&top->value, 1024);
+	strbuf_init(&top->var, 1024);
+	kvi_from_source(top, scope, &kvi);
+
+	ret = git_parse_source(top, fn, &kvi, data, opts);
+
+	strbuf_release(&top->value);
+	strbuf_release(&top->var);
+
+	return ret;
+}
+
+static int do_config_from_file(config_fn_t fn,
+			       const enum config_origin_type origin_type,
+			       const char *name, const char *path, FILE *f,
+			       void *data, enum config_scope scope,
+			       const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+	int ret;
+
+	top.u.file = f;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = path;
+	top.do_fgetc = config_file_fgetc;
+	top.do_ungetc = config_file_ungetc;
+	top.do_ftell = config_file_ftell;
+
+	flockfile(f);
+	ret = do_config_from(&top, fn, data, scope, opts);
+	funlockfile(f);
+	return ret;
+}
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts)
+{
+	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
+				   data, scope, config_opts);
+}
+
+int git_config_from_file_with_options(config_fn_t fn, const char *filename,
+				      void *data, enum config_scope scope,
+				      const struct config_parse_options *opts)
+{
+	int ret = -1;
+	FILE *f;
+
+	if (!filename)
+		BUG("filename cannot be NULL");
+	f = fopen_or_warn(filename, "r");
+	if (f) {
+		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
+					  filename, f, data, scope, opts);
+		fclose(f);
+	}
+	return ret;
+}
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type origin_type,
+			const char *name, const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts)
+{
+	struct config_source top = CONFIG_SOURCE_INIT;
+
+	top.u.buf.buf = buf;
+	top.u.buf.len = len;
+	top.u.buf.pos = 0;
+	top.origin_type = origin_type;
+	top.name = name;
+	top.path = NULL;
+	top.do_fgetc = config_buf_fgetc;
+	top.do_ungetc = config_buf_ungetc;
+	top.do_ftell = config_buf_ftell;
+
+	return do_config_from(&top, fn, data, scope, opts);
+}
diff --git a/config-parse.h b/config-parse.h
new file mode 100644
index 0000000000..ac73a826d9
--- /dev/null
+++ b/config-parse.h
@@ -0,0 +1,155 @@
+/*
+ * Low level config parsing.
+ */
+#ifndef CONFIG_PARSE_H
+#define CONFIG_PARSE_H
+
+#include "strbuf.h"
+
+/* git_config_parse_key() returns these negated: */
+#define CONFIG_INVALID_KEY 1
+#define CONFIG_NO_SECTION_OR_NAME 2
+
+int git_config_parse_key(const char *, char **, size_t *);
+
+enum config_scope {
+	CONFIG_SCOPE_UNKNOWN = 0,
+	CONFIG_SCOPE_SYSTEM,
+	CONFIG_SCOPE_GLOBAL,
+	CONFIG_SCOPE_LOCAL,
+	CONFIG_SCOPE_WORKTREE,
+	CONFIG_SCOPE_COMMAND,
+	CONFIG_SCOPE_SUBMODULE,
+};
+const char *config_scope_name(enum config_scope scope);
+
+enum config_origin_type {
+	CONFIG_ORIGIN_UNKNOWN = 0,
+	CONFIG_ORIGIN_BLOB,
+	CONFIG_ORIGIN_FILE,
+	CONFIG_ORIGIN_STDIN,
+	CONFIG_ORIGIN_SUBMODULE_BLOB,
+	CONFIG_ORIGIN_CMDLINE
+};
+
+enum config_event_t {
+	CONFIG_EVENT_SECTION,
+	CONFIG_EVENT_ENTRY,
+	CONFIG_EVENT_WHITESPACE,
+	CONFIG_EVENT_COMMENT,
+	CONFIG_EVENT_EOF,
+	CONFIG_EVENT_ERROR
+};
+
+struct config_source;
+/*
+ * The parser event function (if not NULL) is called with the event type and
+ * the begin/end offsets of the parsed elements.
+ *
+ * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
+ * character is considered part of the element.
+ */
+typedef int (*config_parser_event_fn_t)(enum config_event_t type,
+					size_t begin_offset, size_t end_offset,
+					struct config_source *cs,
+					void *event_fn_data);
+
+struct config_parse_options {
+	/*
+	 * event_fn and event_fn_data are for internal use only. Handles events
+	 * emitted by the config parser.
+	 */
+	config_parser_event_fn_t event_fn;
+	void *event_fn_data;
+};
+
+struct config_source {
+	struct config_source *prev;
+	union {
+		FILE *file;
+		struct config_buf {
+			const char *buf;
+			size_t len;
+			size_t pos;
+		} buf;
+	} u;
+	enum config_origin_type origin_type;
+	const char *name;
+	const char *path;
+	int linenr;
+	int eof;
+	size_t total_len;
+	struct strbuf value;
+	struct strbuf var;
+	unsigned subsection_case_sensitive : 1;
+
+	int (*do_fgetc)(struct config_source *c);
+	int (*do_ungetc)(int c, struct config_source *conf);
+	long (*do_ftell)(struct config_source *c);
+};
+#define CONFIG_SOURCE_INIT { 0 }
+
+/* Config source metadata for a given config key-value pair */
+struct key_value_info {
+	const char *filename;
+	int linenr;
+	enum config_origin_type origin_type;
+	enum config_scope scope;
+	const char *path;
+};
+#define KVI_INIT { \
+	.filename = NULL, \
+	.linenr = -1, \
+	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
+	.scope = CONFIG_SCOPE_UNKNOWN, \
+	.path = NULL, \
+}
+
+/* Captures additional information that a config callback can use. */
+struct config_context {
+	/* Config source metadata for key and value. */
+	const struct key_value_info *kvi;
+};
+#define CONFIG_CONTEXT_INIT { 0 }
+
+/**
+ * A config callback function takes four parameters:
+ *
+ * - the name of the parsed variable. This is in canonical "flat" form: the
+ *   section, subsection, and variable segments will be separated by dots,
+ *   and the section and variable segments will be all lowercase. E.g.,
+ *   `core.ignorecase`, `diff.SomeType.textconv`.
+ *
+ * - the value of the found variable, as a string. If the variable had no
+ *   value specified, the value will be NULL (typically this means it
+ *   should be interpreted as boolean true).
+ *
+ * - the 'config context', that is, additional information about the config
+ *   iteration operation provided by the config machinery. For example, this
+ *   includes information about the config source being parsed (e.g. the
+ *   filename).
+ *
+ * - a void pointer passed in by the caller of the config API; this can
+ *   contain callback-specific data
+ *
+ * A config callback should return 0 for success, or -1 if the variable
+ * could not be parsed properly.
+ */
+typedef int (*config_fn_t)(const char *, const char *,
+			   const struct config_context *, void *);
+
+int git_config_from_file_with_options(config_fn_t fn, const char *,
+				      void *, enum config_scope,
+				      const struct config_parse_options *);
+
+int git_config_from_mem(config_fn_t fn,
+			const enum config_origin_type,
+			const char *name,
+			const char *buf, size_t len,
+			void *data, enum config_scope scope,
+			const struct config_parse_options *opts);
+
+int git_config_from_stdin(config_fn_t fn, void *data, enum config_scope scope,
+			  const struct config_parse_options *config_opts);
+
+#endif /* CONFIG_PARSE_H */
diff --git a/config.c b/config.c
index 50188f469a..e10901514a 100644
--- a/config.c
+++ b/config.c
@@ -42,32 +42,6 @@
 #include "wrapper.h"
 #include "write-or-die.h"
 
-struct config_source {
-	struct config_source *prev;
-	union {
-		FILE *file;
-		struct config_buf {
-			const char *buf;
-			size_t len;
-			size_t pos;
-		} buf;
-	} u;
-	enum config_origin_type origin_type;
-	const char *name;
-	const char *path;
-	int linenr;
-	int eof;
-	size_t total_len;
-	struct strbuf value;
-	struct strbuf var;
-	unsigned subsection_case_sensitive : 1;
-
-	int (*do_fgetc)(struct config_source *c);
-	int (*do_ungetc)(int c, struct config_source *conf);
-	long (*do_ftell)(struct config_source *c);
-};
-#define CONFIG_SOURCE_INIT { 0 }
-
 static int pack_compression_seen;
 static int zlib_compression_seen;
 
@@ -82,47 +56,6 @@ static int zlib_compression_seen;
  */
 static struct config_set protected_config;
 
-static int config_file_fgetc(struct config_source *conf)
-{
-	return getc_unlocked(conf->u.file);
-}
-
-static int config_file_ungetc(int c, struct config_source *conf)
-{
-	return ungetc(c, conf->u.file);
-}
-
-static long config_file_ftell(struct config_source *conf)
-{
-	return ftell(conf->u.file);
-}
-
-
-static int config_buf_fgetc(struct config_source *conf)
-{
-	if (conf->u.buf.pos < conf->u.buf.len)
-		return conf->u.buf.buf[conf->u.buf.pos++];
-
-	return EOF;
-}
-
-static int config_buf_ungetc(int c, struct config_source *conf)
-{
-	if (conf->u.buf.pos > 0) {
-		conf->u.buf.pos--;
-		if (conf->u.buf.buf[conf->u.buf.pos] != c)
-			BUG("config_buf can only ungetc the same character");
-		return c;
-	}
-
-	return EOF;
-}
-
-static long config_buf_ftell(struct config_source *conf)
-{
-	return conf->u.buf.pos;
-}
-
 struct config_include_data {
 	int depth;
 	config_fn_t fn;
@@ -528,80 +461,6 @@ void git_config_push_env(const char *spec)
 	free(key);
 }
 
-static inline int iskeychar(int c)
-{
-	return isalnum(c) || c == '-';
-}
-
-/*
- * Auxiliary function to sanity-check and split the key into the section
- * identifier and variable name.
- *
- * Returns 0 on success, -1 when there is an invalid character in the key and
- * -2 if there is no section name in the key.
- *
- * store_key - pointer to char* which will hold a copy of the key with
- *             lowercase section and variable name
- * baselen - pointer to size_t which will hold the length of the
- *           section + subsection part, can be NULL
- */
-int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
-{
-	size_t i, baselen;
-	int dot;
-	const char *last_dot = strrchr(key, '.');
-
-	/*
-	 * Since "key" actually contains the section name and the real
-	 * key name separated by a dot, we have to know where the dot is.
-	 */
-
-	if (last_dot == NULL || last_dot == key) {
-		error(_("key does not contain a section: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	if (!last_dot[1]) {
-		error(_("key does not contain variable name: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
-	}
-
-	baselen = last_dot - key;
-	if (baselen_)
-		*baselen_ = baselen;
-
-	/*
-	 * Validate the key and while at it, lower case it for matching.
-	 */
-	*store_key = xmallocz(strlen(key));
-
-	dot = 0;
-	for (i = 0; key[i]; i++) {
-		unsigned char c = key[i];
-		if (c == '.')
-			dot = 1;
-		/* Leave the extended basename untouched.. */
-		if (!dot || i > baselen) {
-			if (!iskeychar(c) ||
-			    (i == baselen + 1 && !isalpha(c))) {
-				error(_("invalid key: %s"), key);
-				goto out_free_ret_1;
-			}
-			c = tolower(c);
-		} else if (c == '\n') {
-			error(_("invalid key (newline): %s"), key);
-			goto out_free_ret_1;
-		}
-		(*store_key)[i] = c;
-	}
-
-	return 0;
-
-out_free_ret_1:
-	FREE_AND_NULL(*store_key);
-	return -CONFIG_INVALID_KEY;
-}
-
 static int config_parse_pair(const char *key, const char *value,
 			     struct key_value_info *kvi,
 			     config_fn_t fn, void *data)
@@ -786,296 +645,6 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 	return ret;
 }
 
-static int get_next_char(struct config_source *cs)
-{
-	int c = cs->do_fgetc(cs);
-
-	if (c == '\r') {
-		/* DOS like systems */
-		c = cs->do_fgetc(cs);
-		if (c != '\n') {
-			if (c != EOF)
-				cs->do_ungetc(c, cs);
-			c = '\r';
-		}
-	}
-
-	if (c != EOF && ++cs->total_len > INT_MAX) {
-		/*
-		 * This is an absurdly long config file; refuse to parse
-		 * further in order to protect downstream code from integer
-		 * overflows. Note that we can't return an error specifically,
-		 * but we can mark EOF and put trash in the return value,
-		 * which will trigger a parse error.
-		 */
-		cs->eof = 1;
-		return 0;
-	}
-
-	if (c == '\n')
-		cs->linenr++;
-	if (c == EOF) {
-		cs->eof = 1;
-		cs->linenr++;
-		c = '\n';
-	}
-	return c;
-}
-
-static char *parse_value(struct config_source *cs)
-{
-	int quote = 0, comment = 0, space = 0;
-
-	strbuf_reset(&cs->value);
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n') {
-			if (quote) {
-				cs->linenr--;
-				return NULL;
-			}
-			return cs->value.buf;
-		}
-		if (comment)
-			continue;
-		if (isspace(c) && !quote) {
-			if (cs->value.len)
-				space++;
-			continue;
-		}
-		if (!quote) {
-			if (c == ';' || c == '#') {
-				comment = 1;
-				continue;
-			}
-		}
-		for (; space; space--)
-			strbuf_addch(&cs->value, ' ');
-		if (c == '\\') {
-			c = get_next_char(cs);
-			switch (c) {
-			case '\n':
-				continue;
-			case 't':
-				c = '\t';
-				break;
-			case 'b':
-				c = '\b';
-				break;
-			case 'n':
-				c = '\n';
-				break;
-			/* Some characters escape as themselves */
-			case '\\': case '"':
-				break;
-			/* Reject unknown escape sequences */
-			default:
-				return NULL;
-			}
-			strbuf_addch(&cs->value, c);
-			continue;
-		}
-		if (c == '"') {
-			quote = 1-quote;
-			continue;
-		}
-		strbuf_addch(&cs->value, c);
-	}
-}
-
-static int get_value(struct config_source *cs, struct key_value_info *kvi,
-		     config_fn_t fn, void *data, struct strbuf *name)
-{
-	int c;
-	char *value;
-	int ret;
-	struct config_context ctx = {
-		.kvi = kvi,
-	};
-
-	/* Get the full name */
-	for (;;) {
-		c = get_next_char(cs);
-		if (cs->eof)
-			break;
-		if (!iskeychar(c))
-			break;
-		strbuf_addch(name, tolower(c));
-	}
-
-	while (c == ' ' || c == '\t')
-		c = get_next_char(cs);
-
-	value = NULL;
-	if (c != '\n') {
-		if (c != '=')
-			return -1;
-		value = parse_value(cs);
-		if (!value)
-			return -1;
-	}
-	/*
-	 * We already consumed the \n, but we need linenr to point to
-	 * the line we just parsed during the call to fn to get
-	 * accurate line number in error messages.
-	 */
-	cs->linenr--;
-	kvi->linenr = cs->linenr;
-	ret = fn(name->buf, value, &ctx, data);
-	if (ret >= 0)
-		cs->linenr++;
-	return ret;
-}
-
-static int get_extended_base_var(struct config_source *cs, struct strbuf *name,
-				 int c)
-{
-	cs->subsection_case_sensitive = 0;
-	do {
-		if (c == '\n')
-			goto error_incomplete_line;
-		c = get_next_char(cs);
-	} while (isspace(c));
-
-	/* We require the format to be '[base "extension"]' */
-	if (c != '"')
-		return -1;
-	strbuf_addch(name, '.');
-
-	for (;;) {
-		int c = get_next_char(cs);
-		if (c == '\n')
-			goto error_incomplete_line;
-		if (c == '"')
-			break;
-		if (c == '\\') {
-			c = get_next_char(cs);
-			if (c == '\n')
-				goto error_incomplete_line;
-		}
-		strbuf_addch(name, c);
-	}
-
-	/* Final ']' */
-	if (get_next_char(cs) != ']')
-		return -1;
-	return 0;
-error_incomplete_line:
-	cs->linenr--;
-	return -1;
-}
-
-static int get_base_var(struct config_source *cs, struct strbuf *name)
-{
-	cs->subsection_case_sensitive = 1;
-	for (;;) {
-		int c = get_next_char(cs);
-		if (cs->eof)
-			return -1;
-		if (c == ']')
-			return 0;
-		if (isspace(c))
-			return get_extended_base_var(cs, name, c);
-		if (!iskeychar(c) && c != '.')
-			return -1;
-		strbuf_addch(name, tolower(c));
-	}
-}
-
-struct parse_event_data {
-	enum config_event_t previous_type;
-	size_t previous_offset;
-	const struct config_parse_options *opts;
-};
-
-static size_t get_corrected_offset(struct config_source *cs,
-				   enum config_event_t type)
-{
-	size_t offset = cs->do_ftell(cs);
-
-	/*
-	 * At EOF, the parser always "inserts" an extra '\n', therefore
-	 * the end offset of the event is the current file position, otherwise
-	 * we will already have advanced to the next event.
-	 */
-	if (type != CONFIG_EVENT_EOF)
-		offset--;
-	return offset;
-}
-
-static void start_event(struct config_source *cs, enum config_event_t type,
-		       struct parse_event_data *data)
-{
-	data->previous_type = type;
-	data->previous_offset = get_corrected_offset(cs, type);
-}
-
-static int flush_event(struct config_source *cs, enum config_event_t type,
-		       struct parse_event_data *data)
-{
-	if (!data->opts || !data->opts->event_fn)
-		return 0;
-
-	if (type == CONFIG_EVENT_WHITESPACE &&
-	    data->previous_type == type)
-		return 0;
-
-	if (data->previous_type != CONFIG_EVENT_EOF &&
-	    data->opts->event_fn(data->previous_type, data->previous_offset,
-				 get_corrected_offset(cs, type), cs,
-				 data->opts->event_fn_data) < 0)
-		return -1;
-
-	return 1;
-}
-
-static int do_event(struct config_source *cs, enum config_event_t type,
-		    struct parse_event_data *data)
-{
-	int maybe_ret;
-
-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
-		return maybe_ret;
-
-	start_event(cs, type, data);
-
-	return 0;
-}
-
-static int do_event_and_flush(struct config_source *cs,
-			      enum config_event_t type,
-			      struct parse_event_data *data)
-{
-	int maybe_ret;
-
-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
-		return maybe_ret;
-
-	start_event(cs, type, data);
-
-	if ((maybe_ret = flush_event(cs, type, data)) < 1)
-		return maybe_ret;
-
-	/*
-	 * Not actually EOF, but this indicates we don't have a valid event
-	 * to flush next time around.
-	 */
-	data->previous_type = CONFIG_EVENT_EOF;
-
-	return 0;
-}
-
-static void kvi_from_source(struct config_source *cs,
-			    enum config_scope scope,
-			    struct key_value_info *out)
-{
-	out->filename = strintern(cs->name);
-	out->origin_type = cs->origin_type;
-	out->linenr = cs->linenr;
-	out->scope = scope;
-	out->path = cs->path;
-}
-
 int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 		      size_t end_offset UNUSED, struct config_source *cs,
 		      void *data)
@@ -1126,98 +695,6 @@ int git_config_err_fn(enum config_event_t type, size_t begin_offset UNUSED,
 	return error_return;
 }
 
-static int git_parse_source(struct config_source *cs, config_fn_t fn,
-			    struct key_value_info *kvi, void *data,
-			    const struct config_parse_options *opts)
-{
-	int comment = 0;
-	size_t baselen = 0;
-	struct strbuf *var = &cs->var;
-
-	/* U+FEFF Byte Order Mark in UTF8 */
-	const char *bomptr = utf8_bom;
-
-	/* For the parser event callback */
-	struct parse_event_data event_data = {
-		CONFIG_EVENT_EOF, 0, opts
-	};
-
-	for (;;) {
-		int c;
-
-		c = get_next_char(cs);
-		if (bomptr && *bomptr) {
-			/* We are at the file beginning; skip UTF8-encoded BOM
-			 * if present. Sane editors won't put this in on their
-			 * own, but e.g. Windows Notepad will do it happily. */
-			if (c == (*bomptr & 0377)) {
-				bomptr++;
-				continue;
-			} else {
-				/* Do not tolerate partial BOM. */
-				if (bomptr != utf8_bom)
-					break;
-				/* No BOM at file beginning. Cool. */
-				bomptr = NULL;
-			}
-		}
-		if (c == '\n') {
-			if (cs->eof) {
-				if (do_event(cs, CONFIG_EVENT_EOF, &event_data) < 0)
-					return -1;
-				return 0;
-			}
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-				return -1;
-			comment = 0;
-			continue;
-		}
-		if (comment)
-			continue;
-		if (isspace(c)) {
-			if (do_event(cs, CONFIG_EVENT_WHITESPACE, &event_data) < 0)
-					return -1;
-			continue;
-		}
-		if (c == '#' || c == ';') {
-			if (do_event(cs, CONFIG_EVENT_COMMENT, &event_data) < 0)
-					return -1;
-			comment = 1;
-			continue;
-		}
-		if (c == '[') {
-			if (do_event(cs, CONFIG_EVENT_SECTION, &event_data) < 0)
-					return -1;
-
-			/* Reset prior to determining a new stem */
-			strbuf_reset(var);
-			if (get_base_var(cs, var) < 0 || var->len < 1)
-				break;
-			strbuf_addch(var, '.');
-			baselen = var->len;
-			continue;
-		}
-		if (!isalpha(c))
-			break;
-
-		if (do_event(cs, CONFIG_EVENT_ENTRY, &event_data) < 0)
-			return -1;
-
-		/*
-		 * Truncate the var name back to the section header
-		 * stem prior to grabbing the suffix part of the name
-		 * and the value.
-		 */
-		strbuf_setlen(var, baselen);
-		strbuf_addch(var, tolower(c));
-		if (get_value(cs, kvi, fn, data, var) < 0)
-			break;
-	}
-
-	do_event_and_flush(cs, CONFIG_EVENT_ERROR, &event_data);
-	return -1;
-}
-
 static uintmax_t get_unit_factor(const char *end)
 {
 	if (!*end)
@@ -2011,83 +1488,6 @@ int git_default_config(const char *var, const char *value,
 	return 0;
 }
 
-/*
- * All source specific fields in the union, die_on_error, name and the callbacks
- * fgetc, ungetc, ftell of top need to be initialized before calling
- * this function.
- */
-static int do_config_from(struct config_source *top, config_fn_t fn,
-			  void *data, enum config_scope scope,
-			  const struct config_parse_options *opts)
-{
-	struct key_value_info kvi = KVI_INIT;
-	int ret;
-
-	/* push config-file parsing state stack */
-	top->linenr = 1;
-	top->eof = 0;
-	top->total_len = 0;
-	strbuf_init(&top->value, 1024);
-	strbuf_init(&top->var, 1024);
-	kvi_from_source(top, scope, &kvi);
-
-	ret = git_parse_source(top, fn, &kvi, data, opts);
-
-	strbuf_release(&top->value);
-	strbuf_release(&top->var);
-
-	return ret;
-}
-
-static int do_config_from_file(config_fn_t fn,
-			       const enum config_origin_type origin_type,
-			       const char *name, const char *path, FILE *f,
-			       void *data, enum config_scope scope,
-			       const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-	int ret;
-
-	top.u.file = f;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = path;
-	top.do_fgetc = config_file_fgetc;
-	top.do_ungetc = config_file_ungetc;
-	top.do_ftell = config_file_ftell;
-
-	flockfile(f);
-	ret = do_config_from(&top, fn, data, scope, opts);
-	funlockfile(f);
-	return ret;
-}
-
-static int git_config_from_stdin(config_fn_t fn, void *data,
-				 enum config_scope scope,
-				 const struct config_parse_options *config_opts)
-{
-	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
-				   data, scope, config_opts);
-}
-
-int git_config_from_file_with_options(config_fn_t fn, const char *filename,
-				      void *data, enum config_scope scope,
-				      const struct config_parse_options *opts)
-{
-	int ret = -1;
-	FILE *f;
-
-	if (!filename)
-		BUG("filename cannot be NULL");
-	f = fopen_or_warn(filename, "r");
-	if (f) {
-		ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename,
-					  filename, f, data, scope, opts);
-		fclose(f);
-	}
-	return ret;
-}
-
 int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 {
 	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
@@ -2096,27 +1496,6 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 						 CONFIG_SCOPE_UNKNOWN, &config_opts);
 }
 
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type origin_type,
-			const char *name, const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts)
-{
-	struct config_source top = CONFIG_SOURCE_INIT;
-
-	top.u.buf.buf = buf;
-	top.u.buf.len = len;
-	top.u.buf.pos = 0;
-	top.origin_type = origin_type;
-	top.name = name;
-	top.path = NULL;
-	top.do_fgetc = config_buf_fgetc;
-	top.do_ungetc = config_buf_ungetc;
-	top.do_ftell = config_buf_ftell;
-
-	return do_config_from(&top, fn, data, scope, opts);
-}
-
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
 			      struct repository *repo,
diff --git a/config.h b/config.h
index 8ad399580f..3bad5e1c32 100644
--- a/config.h
+++ b/config.h
@@ -4,7 +4,7 @@
 #include "hashmap.h"
 #include "string-list.h"
 #include "repository.h"
-
+#include "config-parse.h"
 
 /**
  * The config API gives callers a way to access Git configuration files
@@ -23,9 +23,6 @@
 
 struct object_id;
 
-/* git_config_parse_key() returns these negated: */
-#define CONFIG_INVALID_KEY 1
-#define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
 #define CONFIG_NO_LOCK -1
 #define CONFIG_INVALID_FILE 3
@@ -36,17 +33,6 @@ struct object_id;
 
 #define CONFIG_REGEX_NONE ((void *)1)
 
-enum config_scope {
-	CONFIG_SCOPE_UNKNOWN = 0,
-	CONFIG_SCOPE_SYSTEM,
-	CONFIG_SCOPE_GLOBAL,
-	CONFIG_SCOPE_LOCAL,
-	CONFIG_SCOPE_WORKTREE,
-	CONFIG_SCOPE_COMMAND,
-	CONFIG_SCOPE_SUBMODULE,
-};
-const char *config_scope_name(enum config_scope scope);
-
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
@@ -54,46 +40,6 @@ struct git_config_source {
 	enum config_scope scope;
 };
 
-enum config_origin_type {
-	CONFIG_ORIGIN_UNKNOWN = 0,
-	CONFIG_ORIGIN_BLOB,
-	CONFIG_ORIGIN_FILE,
-	CONFIG_ORIGIN_STDIN,
-	CONFIG_ORIGIN_SUBMODULE_BLOB,
-	CONFIG_ORIGIN_CMDLINE
-};
-
-enum config_event_t {
-	CONFIG_EVENT_SECTION,
-	CONFIG_EVENT_ENTRY,
-	CONFIG_EVENT_WHITESPACE,
-	CONFIG_EVENT_COMMENT,
-	CONFIG_EVENT_EOF,
-	CONFIG_EVENT_ERROR
-};
-
-struct config_source;
-/*
- * The parser event function (if not NULL) is called with the event type and
- * the begin/end offsets of the parsed elements.
- *
- * Note: for CONFIG_EVENT_ENTRY (i.e. config variables), the trailing newline
- * character is considered part of the element.
- */
-typedef int (*config_parser_event_fn_t)(enum config_event_t type,
-					size_t begin_offset, size_t end_offset,
-					struct config_source *cs,
-					void *event_fn_data);
-
-struct config_parse_options {
-	/*
-	 * event_fn and event_fn_data are for internal use only. Handles events
-	 * emitted by the config parser.
-	 */
-	config_parser_event_fn_t event_fn;
-	void *event_fn_data;
-};
-
 #define CP_OPTS_INIT(error_action) { \
 	.event_fn = git_config_err_fn, \
 	.event_fn_data = (enum config_error_action []){(error_action)}, \
@@ -126,59 +72,8 @@ enum config_error_action {
 int git_config_err_fn(enum config_event_t type, size_t begin_offset,
 		      size_t end_offset, struct config_source *cs,
 		      void *event_fn_data);
-
-/* Config source metadata for a given config key-value pair */
-struct key_value_info {
-	const char *filename;
-	int linenr;
-	enum config_origin_type origin_type;
-	enum config_scope scope;
-	const char *path;
-};
-#define KVI_INIT { \
-	.filename = NULL, \
-	.linenr = -1, \
-	.origin_type = CONFIG_ORIGIN_UNKNOWN, \
-	.scope = CONFIG_SCOPE_UNKNOWN, \
-	.path = NULL, \
-}
-
-/* Captures additional information that a config callback can use. */
-struct config_context {
-	/* Config source metadata for key and value. */
-	const struct key_value_info *kvi;
-};
-#define CONFIG_CONTEXT_INIT { 0 }
-
-/**
- * A config callback function takes four parameters:
- *
- * - the name of the parsed variable. This is in canonical "flat" form: the
- *   section, subsection, and variable segments will be separated by dots,
- *   and the section and variable segments will be all lowercase. E.g.,
- *   `core.ignorecase`, `diff.SomeType.textconv`.
- *
- * - the value of the found variable, as a string. If the variable had no
- *   value specified, the value will be NULL (typically this means it
- *   should be interpreted as boolean true).
- *
- * - the 'config context', that is, additional information about the config
- *   iteration operation provided by the config machinery. For example, this
- *   includes information about the config source being parsed (e.g. the
- *   filename).
- *
- * - a void pointer passed in by the caller of the config API; this can
- *   contain callback-specific data
- *
- * A config callback should return 0 for success, or -1 if the variable
- * could not be parsed properly.
- */
-typedef int (*config_fn_t)(const char *, const char *,
-			   const struct config_context *, void *);
-
 int git_default_config(const char *, const char *,
 		       const struct config_context *, void *);
-
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
@@ -186,16 +81,6 @@ int git_default_config(const char *, const char *,
  * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
-
-int git_config_from_file_with_options(config_fn_t fn, const char *,
-				      void *, enum config_scope,
-				      const struct config_parse_options *);
-int git_config_from_mem(config_fn_t fn,
-			const enum config_origin_type,
-			const char *name,
-			const char *buf, size_t len,
-			void *data, enum config_scope scope,
-			const struct config_parse_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
 			     struct repository *repo,
 			     const struct object_id *oid, void *data,
@@ -333,8 +218,6 @@ int repo_config_set_worktree_gently(struct repository *, const char *, const cha
  */
 void git_config_set(const char *, const char *);
 
-int git_config_parse_key(const char *, char **, size_t *);
-
 /*
  * The following macros specify flag bits that alter the behavior
  * of the git_config_set_multivar*() methods.
-- 
2.42.0.515.g380fc7ccd1-goog


* Re: [PATCH v2 2/4] config: report config parse errors using cb
  2023-09-21 21:11       ` Josh Steadmon
@ 2023-09-21 23:36         ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-09-21 23:36 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz

Josh Steadmon <steadmon@google.com> writes:

> As Jonathan Tan mentioned in [1], on calling do_event() we set the start
> offset of the new event, and execute the callback for the previous event
> whose end offset we now know.
>
> I refactored this into "start_event()" and "flush_event()" functions as
> suggested, and added a new "do_event_and_flush()" function for the case
> where we want to immediately execute a callback for an event.

Very nicely done.  Thanks, both.
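
The start/flush split described above can be sketched in isolation.
This is a simplified model under stated assumptions, not the code
from the patch: the sketch_* names, the two-token "grammar" and the
fixed-size output array are all hypothetical.  What it tries to show
is only the deferral: an event is reported when the next one begins,
because that is the first moment its end offset is known.

```c
#include <stddef.h>

enum sketch_type { SKETCH_NONE, SKETCH_WORD, SKETCH_SPACE };

struct sketch_record {
	enum sketch_type type;
	size_t begin, end;
};

struct sketch_state {
	enum sketch_type previous_type;
	size_t previous_offset;
	struct sketch_record out[16];
	size_t nr;
};

/* Remember where the new event begins; nothing is reported yet. */
static void sketch_start_event(struct sketch_state *st,
			       enum sketch_type type, size_t offset)
{
	st->previous_type = type;
	st->previous_offset = offset;
}

/* Report the pending event, now that its end offset is known. */
static void sketch_flush_event(struct sketch_state *st, size_t offset)
{
	if (st->previous_type == SKETCH_NONE || st->nr >= 16)
		return;
	st->out[st->nr].type = st->previous_type;
	st->out[st->nr].begin = st->previous_offset;
	st->out[st->nr].end = offset;
	st->nr++;
}

/* Split s into runs of spaces and non-spaces; returns the event count. */
static size_t sketch_scan(const char *s, struct sketch_state *st)
{
	size_t i;

	st->previous_type = SKETCH_NONE;
	st->previous_offset = 0;
	st->nr = 0;
	for (i = 0; s[i]; i++) {
		enum sketch_type type =
			(s[i] == ' ') ? SKETCH_SPACE : SKETCH_WORD;

		if (type == st->previous_type)
			continue;
		sketch_flush_event(st, i);
		sketch_start_event(st, type, i);
	}
	/* The last event has no successor, so it is flushed explicitly. */
	sketch_flush_event(st, i);
	return st->nr;
}
```

The explicit flush after the loop plays roughly the role that an
immediate flush plays for an event that has no successor.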

* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
                     ` (4 preceding siblings ...)
  2023-09-21 21:17   ` [PATCH v3 5/5] config-parse: split library out of config.[c|h] Josh Steadmon
@ 2023-10-17 17:13   ` Junio C Hamano
  2023-10-23 19:34     ` Taylor Blau
  5 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2023-10-17 17:13 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz

Josh Steadmon <steadmon@google.com> writes:

> Config parsing no longer uses global state as of gc/config-context, so the
> natural next step for libification is to turn that into its own library.
> This series starts that process by moving config parsing into
> config-parse.[c|h] so that other programs can include this functionality
> without pulling in all of config.[c|h].

This has been sitting in the list archive collecting dust.  It is
unfortunate that not many people appear to be interested in
reviewing others' patches.

> Open questions:
> - How do folks feel about the do_event() refactor in patches 2 & 3?

I gave a quick re-read and found that the code after patch 2 made it
easier to see how config.c::do_event() does its thing (even though
the patch text of that exact step was somehow a bit hard to follow).

However, the helper added by patch 3, do_event_and_flush(), that
duplicates exactly what do_event() does, is hard to reason about, at
least for me.  It can return early without setting .previous_type to
EOF, and its return value signals whether that happened (both early
return points pass along what flush_event() gave us).  Yet the only
caller of the helper does not even inspect the return value, unlike
all the callers of do_event(), which also looks a bit fishy.

Thanks.



* Re: [PATCH v3 1/5] config: split out config_parse_options
  2023-09-21 21:17   ` [PATCH v3 1/5] config: split out config_parse_options Josh Steadmon
@ 2023-10-23 17:52     ` Jonathan Tan
  2023-10-23 18:46       ` Taylor Blau
  0 siblings, 1 reply; 49+ messages in thread
From: Jonathan Tan @ 2023-10-23 17:52 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git, calvinwan, glencbz, gitster

Josh Steadmon <steadmon@google.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> "struct config_options" is a disjoint set of options used by the config
> parser (e.g. event listeners) and options used by config_with_options()
> (e.g. to handle includes, choose which config files to parse).

Can this sentence be reworded? In particular, "disjoint" is a word
normally applied to two or more sets (meaning that they have no elements
in common), but here it is used for only one.

Everything else looks good, and the reasoning (some functions only use
a subset of the fields, and this subset is easily explained conceptually
as those related to parsing) makes sense.
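
As a rough illustration of the split being discussed: the parser-only
options can live in their own struct that is embedded in the larger one,
so parser code only ever sees the subset it uses. The sketch below is
simplified and partly hypothetical. The struct names, "event_fn",
"event_fn_data" and "parse_options" appear in the quoted patches; the
callback signature, the bitfields and the helper functions are
assumptions made for illustration only.

```c
#include <stddef.h>

enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

/* Simplified callback type (assumption: the real one takes more arguments). */
typedef int (*config_parser_event_fn_t)(enum config_event_t type,
					 size_t begin, size_t end,
					 void *event_fn_data);

/* Options used by the config parser itself (e.g. event listeners). */
struct config_parse_options {
	config_parser_event_fn_t event_fn;
	void *event_fn_data;
};

/*
 * Options used by config_with_options() (e.g. include handling, which
 * files to read). The bitfields are illustrative placeholders.
 */
struct config_options {
	unsigned int respect_includes : 1;
	unsigned int ignore_repo : 1;
	struct config_parse_options parse_options;
};

/* Hypothetical listener that accepts every event. */
static int ignore_event(enum config_event_t type, size_t begin, size_t end,
			void *event_fn_data)
{
	(void)type; (void)begin; (void)end; (void)event_fn_data;
	return 0;
}

/* Hypothetical parser-side helper: it needs only the parsing subset. */
static int parser_has_listener(const struct config_parse_options *opts)
{
	return opts && opts->event_fn != NULL;
}
```

A caller holding a full "struct config_options" would pass
"&opts.parse_options" down to the parser, which is the shape the later
git_config_from_stdin() patch in this series uses.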
 


* Re: [PATCH v3 2/5] config: split do_event() into start and flush operations
  2023-09-21 21:17   ` [PATCH v3 2/5] config: split do_event() into start and flush operations Josh Steadmon
@ 2023-10-23 18:05     ` Jonathan Tan
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-10-23 18:05 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git, calvinwan, glencbz, gitster

Josh Steadmon <steadmon@google.com> writes:
> +static void start_event(struct config_source *cs, enum config_event_t type,
> +		       struct parse_event_data *data)
> +{
> +	data->previous_type = type;
> +	data->previous_offset = get_corrected_offset(cs, type);
> +}

It's a pity that get_corrected_offset() has to be called twice (once
here and once below) but I think that's the best we can do given how the
code is laid out (and I can't think of a better code layout either).

> +static int flush_event(struct config_source *cs, enum config_event_t type,
> +		       struct parse_event_data *data)

One thing confusing here is that the "type" is not what's being flushed,
but used to change details about how we flush. Technically all we need
is is_whitespace_type and is_eof_type, but that's clumsier to code. I
think the best we can do is add some documentation to this function,
maybe 'Flush the event started by a prior start_event(), if one exists.
The type of the event being flushed is not "type" but the type that was
passed to the prior start_event(); "type" here may merely change how the
flush is performed' or something like that.

> +{
> +	if (!data->opts || !data->opts->event_fn)
> +		return 0;
> +
> +	if (type == CONFIG_EVENT_WHITESPACE &&
> +	    data->previous_type == type)
> +		return 0;
>  
>  	if (data->previous_type != CONFIG_EVENT_EOF &&
>  	    data->opts->event_fn(data->previous_type, data->previous_offset,
> -				 offset, cs, data->opts->event_fn_data) < 0)
> +				 get_corrected_offset(cs, type), cs,
> +				 data->opts->event_fn_data) < 0)
>  		return -1;

Another confusing point here is how EOF is used both to mean
"start_event() was never called" and a true EOF. I think for now it's
best to just document this where we define CONFIG_EVENT_EOF.
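
To make the two documentation points concrete, here is a minimal,
self-contained model of the start/flush pair. It is a sketch, not the
code in config.c: the "struct config_source" argument is dropped, offsets
are passed in directly instead of being computed by
get_corrected_offset(), and the return value 1 from flush_event()
("proceed to start the next event") is inferred from how the quoted
patches test its result. "struct event_log" and record_event() are
hypothetical helpers for observing the callbacks.

```c
#include <stddef.h>

enum config_event_t {
	CONFIG_EVENT_SECTION,
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_COMMENT,
	/* Doubles as "start_event() was never called": nothing is pending. */
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

typedef int (*event_fn_t)(enum config_event_t type, size_t begin,
			  size_t end, void *cb_data);

struct parse_event_data {
	enum config_event_t previous_type;
	size_t previous_offset;
	event_fn_t event_fn;
	void *event_fn_data;
};

/* Hypothetical recorder used to observe which events get reported. */
struct event_log {
	int count;
	enum config_event_t types[8];
	size_t begin[8], end[8];
};

static int record_event(enum config_event_t type, size_t begin, size_t end,
			void *cb_data)
{
	struct event_log *l = cb_data;
	if (l->count >= 8)
		return -1;
	l->types[l->count] = type;
	l->begin[l->count] = begin;
	l->end[l->count] = end;
	l->count++;
	return 0;
}

/* Remember where an event of "type" begins; nothing is reported yet. */
static void start_event(struct parse_event_data *data,
			enum config_event_t type, size_t offset)
{
	data->previous_type = type;
	data->previous_offset = offset;
}

/*
 * Flush the event started by a prior start_event(), if one exists.
 * The event reported is the one given to start_event(); "type" only
 * changes how the flush is performed (a whitespace run is not split).
 * Returns 1 if the caller should start a new event, 0 if there is
 * nothing to do, and -1 if the listener asked to stop.
 */
static int flush_event(struct parse_event_data *data,
		       enum config_event_t type, size_t offset)
{
	if (!data->event_fn)
		return 0;
	if (type == CONFIG_EVENT_WHITESPACE && data->previous_type == type)
		return 0;
	if (data->previous_type != CONFIG_EVENT_EOF &&
	    data->event_fn(data->previous_type, data->previous_offset,
			   offset, data->event_fn_data) < 0)
		return -1;
	return 1;
}

static int do_event(struct parse_event_data *data,
		    enum config_event_t type, size_t offset)
{
	int ret = flush_event(data, type, offset);
	if (ret < 1)
		return ret;
	start_event(data, type, offset);
	return 0;
}
```

With previous_type initialised to CONFIG_EVENT_EOF, the first do_event()
call reports nothing and only records a start; each later call reports
the previous event, with the new event's offset as its end.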


* Re: [PATCH v3 3/5] config: report config parse errors using cb
  2023-09-21 21:17   ` [PATCH v3 3/5] config: report config parse errors using cb Josh Steadmon
@ 2023-10-23 18:41     ` Jonathan Tan
  2023-10-23 19:29     ` Taylor Blau
  1 sibling, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-10-23 18:41 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git, calvinwan, glencbz, gitster

Josh Steadmon <steadmon@google.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> In a subsequent commit, config parsing will become its own library, and
> it's likely that the caller will want flexibility in handling errors
> (instead of being limited to the error handling we have in-tree).
> 
> Move the Git-specific error handling into a config_parser_event_fn_t
> that responds to config errors, and make git_parse_source() always
> return -1 (careful inspection shows that it was always returning -1
> already). This makes CONFIG_ERROR_SILENT obsolete since that is
> equivalent to not specifying an error event listener. Also, remove
> CONFIG_ERROR_UNSET and the config_source 'default', since all callers
> are now expected to specify the error handling they want.

I think this has to be better explained. So:

- There is already a config_parser_event_fn_t that can be configured
by a user to receive emitted config events. This callback can return
negative to halt further config parsing.

- Currently, it is git_parse_source() that detects when an error
occurs, and it emits a CONFIG_EVENT_ERROR and either dies, prints
an error, or swallows the error depending on error_action; no
matter what error_action is, it halts config parsing, as one would
expect. This commit moves the die/print/swallow handling to a
config_parser_event_fn_t that will see the CONFIG_EVENT_ERROR and die/
print/swallow.

- This new config_parser_event_fn_t does not need to swallow, since
that's the same as not passing in a callback. So it just needs to die/
print.
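
The three points above can be sketched roughly as follows. This is a toy
model under stated assumptions, not the code from the patch: the
callback signature is reduced to (type, data), parse_source() is a
stand-in for git_parse_source() with an invented error condition, and
"enum error_action", "struct error_cb_data" and report_parse_error() are
hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

enum config_event_t {
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_EOF,
	CONFIG_EVENT_ERROR
};

/* Simplified listener type; the real one also receives offsets and source. */
typedef int (*config_parser_event_fn_t)(enum config_event_t type,
					 void *event_fn_data);

/* Hypothetical: what the listener should do when it sees an error. */
enum error_action { ACTION_ERROR, ACTION_DIE };

struct error_cb_data {
	enum error_action action;
	int errors_seen;
};

/*
 * Listener holding the die/print policy. There is no "swallow" case:
 * swallowing is the same as not installing a listener at all.
 */
static int report_parse_error(enum config_event_t type, void *event_fn_data)
{
	struct error_cb_data *d = event_fn_data;

	if (type != CONFIG_EVENT_ERROR)
		return 0;
	d->errors_seen++;
	fprintf(stderr, "error: bad config\n");
	if (d->action == ACTION_DIE)
		exit(128);
	return -1;
}

/*
 * Parser stand-in. On bad input it emits CONFIG_EVENT_ERROR to the
 * listener (if any) and always returns -1, whatever the listener did.
 * Toy rule: text must start with '[' to count as valid.
 */
static int parse_source(const char *text, config_parser_event_fn_t fn,
			void *event_fn_data)
{
	if (!text || text[0] != '[') {
		if (fn)
			fn(CONFIG_EVENT_ERROR, event_fn_data);
		return -1;
	}
	if (fn && fn(CONFIG_EVENT_EOF, event_fn_data) < 0)
		return -1;
	return 0;
}
```

Passing NULL as the listener makes the parse fail silently, while
passing report_parse_error() with ACTION_ERROR prints and fails; in both
cases parsing halts at the error.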

> @@ -1039,6 +1042,29 @@ static int do_event(struct config_source *cs, enum config_event_t type,
>  	return 0;
>  }
>  
> +static int do_event_and_flush(struct config_source *cs,
> +			      enum config_event_t type,
> +			      struct parse_event_data *data)
> +{
> +	int maybe_ret;
> +
> +	if ((maybe_ret = flush_event(cs, type, data)) < 1)
> +		return maybe_ret;
> +
> +	start_event(cs, type, data);
> +
> +	if ((maybe_ret = flush_event(cs, type, data)) < 1)
> +		return maybe_ret;
> +
> +	/*
> +	 * Not actually EOF, but this indicates we don't have a valid event
> +	 * to flush next time around.
> +	 */
> +	data->previous_type = CONFIG_EVENT_EOF;
> +
> +	return 0;
> +}

A lot of this function only makes sense if the type is ERROR, so maybe
rename this as flush_and_emit_error() (and don't take in a type). As
it is, right now there is some confusion about how you can flush (I'm
referring to the second flush) with the same type as what you passed
to start_event().

Also, I don't think we should set data->previous_type here. Instead
there should be a comment saying that if you're emitting ERROR, you
should halt config parsing. The return value here is useless too (it
signals whether we should halt config parsing, but the caller should
always halt, so we don't need to return anything).
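
One possible shape for that suggestion, as a sketch only: the function
takes no type, returns nothing, and leaves previous_type alone. The
types are simplified stand-ins for those in config.c, offsets are passed
in directly, and two details are assumptions rather than anything stated
in the thread: the error event is reported as zero-length at the error
offset, and the error is not reported if the listener already asked to
stop while the pending event was being flushed. "struct event_log" and
record_event() are hypothetical test helpers.

```c
#include <stddef.h>

enum config_event_t {
	CONFIG_EVENT_ENTRY,
	CONFIG_EVENT_WHITESPACE,
	CONFIG_EVENT_EOF,	/* also means "no event pending" */
	CONFIG_EVENT_ERROR
};

typedef int (*event_fn_t)(enum config_event_t type, size_t begin,
			  size_t end, void *cb_data);

struct parse_event_data {
	enum config_event_t previous_type;
	size_t previous_offset;
	event_fn_t event_fn;
	void *event_fn_data;
};

/* Hypothetical recorder used to observe which events get reported. */
struct event_log {
	int count;
	enum config_event_t types[4];
	size_t begin[4], end[4];
};

static int record_event(enum config_event_t type, size_t begin, size_t end,
			void *cb_data)
{
	struct event_log *l = cb_data;
	if (l->count >= 4)
		return -1;
	l->types[l->count] = type;
	l->begin[l->count] = begin;
	l->end[l->count] = end;
	l->count++;
	return 0;
}

/*
 * Report the pending event (if any), then report CONFIG_EVENT_ERROR.
 * The caller must halt config parsing after calling this, so there is
 * no return value and no need to reset data->previous_type.
 */
static void flush_and_emit_error(struct parse_event_data *data,
				 size_t error_offset)
{
	if (!data->event_fn)
		return;
	if (data->previous_type != CONFIG_EVENT_EOF &&
	    data->event_fn(data->previous_type, data->previous_offset,
			   error_offset, data->event_fn_data) < 0)
		return;
	data->event_fn(CONFIG_EVENT_ERROR, error_offset, error_offset,
		       data->event_fn_data);
}
```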



* Re: [PATCH v3 1/5] config: split out config_parse_options
  2023-10-23 17:52     ` Jonathan Tan
@ 2023-10-23 18:46       ` Taylor Blau
  0 siblings, 0 replies; 49+ messages in thread
From: Taylor Blau @ 2023-10-23 18:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Josh Steadmon, git, calvinwan, glencbz, gitster

On Mon, Oct 23, 2023 at 10:52:17AM -0700, Jonathan Tan wrote:
> Josh Steadmon <steadmon@google.com> writes:
> > From: Glen Choo <chooglen@google.com>
> >
> > "struct config_options" is a disjoint set of options used by the config
> > parser (e.g. event listeners) and options used by config_with_options()
> > (e.g. to handle includes, choose which config files to parse).
>
> Can this sentence be reworded? In particular, "disjoint" is a word
> normally applied to two or more sets (meaning that they have no elements
> in common), but here it is used for only one.

The pedant in me agrees with you. I do think that the sentence reads a
little awkwardly. Perhaps instead:

    "struct config_options" has members which serve two distinct
    purposes. There is a set of members used by the configuration parser
    (e.g. event listeners). There is also a set used by
    config_with_options() (e.g. to handle includes, choose which config
    files to parse).

> Everything else looks good, and the reasoning (some functions only use
> a subset of the fields, and this subset is easily explained conceptually
> as those related to parsing) makes sense.

Yup.

Thanks,
Taylor


* Re: [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin
  2023-09-21 21:17   ` [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
@ 2023-10-23 18:52     ` Jonathan Tan
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-10-23 18:52 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git, calvinwan, glencbz, gitster

Josh Steadmon <steadmon@google.com> writes:
> diff --git a/config.c b/config.c
> index 0c4f1a2874..50188f469a 100644
> --- a/config.c
> +++ b/config.c
> @@ -2063,12 +2063,11 @@ static int do_config_from_file(config_fn_t fn,
>  }
>  
>  static int git_config_from_stdin(config_fn_t fn, void *data,
> -				 enum config_scope scope)
> +				 enum config_scope scope,
> +				 const struct config_parse_options *config_opts)
>  {
> -	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
> -
>  	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
> -				   data, scope, &config_opts);
> +				   data, scope, config_opts);
>  }
>  
>  int git_config_from_file_with_options(config_fn_t fn, const char *filename,
> @@ -2303,7 +2302,8 @@ int config_with_options(config_fn_t fn, void *data,
>  	 * regular lookup sequence.
>  	 */
>  	if (config_source && config_source->use_stdin) {
> -		ret = git_config_from_stdin(fn, data, config_source->scope);
> +		ret = git_config_from_stdin(fn, data, config_source->scope,
> +					    &opts->parse_options);
>  	} else if (config_source && config_source->file) {
>  		ret = git_config_from_file_with_options(fn, config_source->file,
>  							data, config_source->scope,

Does this change the behavior of stdin config parsing from "die" to
"silent" (since there is no event emitting callback)? The only user of
stdin parsing seems to be builtin/config.c, so maybe a corresponding
change needs to be made there.



* Re: [PATCH v3 5/5] config-parse: split library out of config.[c|h]
  2023-09-21 21:17   ` [PATCH v3 5/5] config-parse: split library out of config.[c|h] Josh Steadmon
@ 2023-10-23 18:53     ` Jonathan Tan
  0 siblings, 0 replies; 49+ messages in thread
From: Jonathan Tan @ 2023-10-23 18:53 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git, calvinwan, glencbz, gitster

Josh Steadmon <steadmon@google.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> The config parsing machinery (besides "include" directives) is usable by
> programs other than Git - it works with any file written in Git config
> syntax (IOW it doesn't rely on 'core' Git features like a repository),
> and as of the series ending at 6e8e7981eb (config: pass source to
> config_parser_event_fn_t, 2023-06-28), it no longer relies on global
> state. Thus, we can and should start turning it into a library other
> programs can use.

Checking this with --color-moved looks good, but we'll need to take
another look once my comments about the earlier patches have been
addressed.
 


* Re: [PATCH v3 3/5] config: report config parse errors using cb
  2023-09-21 21:17   ` [PATCH v3 3/5] config: report config parse errors using cb Josh Steadmon
  2023-10-23 18:41     ` Jonathan Tan
@ 2023-10-23 19:29     ` Taylor Blau
  2023-10-23 20:11       ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Taylor Blau @ 2023-10-23 19:29 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jonathantanmy, calvinwan, glencbz, gitster

On Thu, Sep 21, 2023 at 02:17:22PM -0700, Josh Steadmon wrote:
> diff --git a/bundle-uri.c b/bundle-uri.c
> index f93ca6a486..856bffdcad 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
>  				   struct bundle_list *list)
>  {
>  	int result;
> -	struct config_parse_options opts = {
> -		.error_action = CONFIG_ERROR_ERROR,
> -	};
> +	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);

I'm nit-picking, but I find this parameterized initializer macro to be a
little unusual w.r.t our usual conventions.

In terms of "usual conventions," I'm thinking about STRING_LIST_INIT_DUP
versus STRING_LIST_INIT_NODUP (as opposed to something like
STRING_LIST_INIT(DUP) or STRING_LIST_INIT(NODUP)).

Since there are only two possible values (the ones corresponding to
error() and die()) I wonder if something like CP_OPTS_INIT_ERROR and
CP_OPTS_INIT_DIE might be more appropriate. If you don't like either of
those, I'd suggest making the initializer a function instead of a
parameterized macro.

>  	if (!list->baseURI) {
>  		struct strbuf baseURI = STRBUF_INIT;
> diff --git a/config.c b/config.c
> index ff138500a2..0c4f1a2874 100644
> --- a/config.c
> +++ b/config.c
> @@ -55,7 +55,6 @@ struct config_source {
>  	enum config_origin_type origin_type;
>  	const char *name;
>  	const char *path;
> -	enum config_error_action default_error_action;
>  	int linenr;
>  	int eof;
>  	size_t total_len;
> @@ -185,13 +184,15 @@ static int handle_path_include(const struct key_value_info *kvi,
>  	}
>
>  	if (!access_or_die(path, R_OK, 0)) {
> +		struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
> +
>  		if (++inc->depth > MAX_INCLUDE_DEPTH)
>  			die(_(include_depth_advice), MAX_INCLUDE_DEPTH, path,
>  			    !kvi ? "<unknown>" :
>  			    kvi->filename ? kvi->filename :
>  			    "the command line");
>  		ret = git_config_from_file_with_options(git_config_include, path, inc,
> -							kvi->scope, NULL);
> +							kvi->scope, &config_opts);

...OK, so using the CONFIG_ERROR_DIE variant seems like the right choice
here because git_config_from_file_with_options() calls
do_config_from_file() which sets its default_error_action as
CONFIG_ERROR_DIE.

>  static uintmax_t get_unit_factor(const char *end)
> @@ -2023,7 +2052,6 @@ static int do_config_from_file(config_fn_t fn,
>  	top.origin_type = origin_type;
>  	top.name = name;
>  	top.path = path;
> -	top.default_error_action = CONFIG_ERROR_DIE;
>  	top.do_fgetc = config_file_fgetc;
>  	top.do_ungetc = config_file_ungetc;
>  	top.do_ftell = config_file_ftell;
> @@ -2037,8 +2065,10 @@ static int do_config_from_file(config_fn_t fn,
>  static int git_config_from_stdin(config_fn_t fn, void *data,
>  				 enum config_scope scope)
>  {
> +	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
> +
>  	return do_config_from_file(fn, CONFIG_ORIGIN_STDIN, "", NULL, stdin,
> -				   data, scope, NULL);
> +				   data, scope, &config_opts);

Same here.

>  int git_config_from_file_with_options(config_fn_t fn, const char *filename,
> @@ -2061,8 +2091,10 @@ int git_config_from_file_with_options(config_fn_t fn, const char *filename,
>
>  int git_config_from_file(config_fn_t fn, const char *filename, void *data)
>  {
> +	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_DIE);
> +
>  	return git_config_from_file_with_options(fn, filename, data,
> -						 CONFIG_SCOPE_UNKNOWN, NULL);
> +						 CONFIG_SCOPE_UNKNOWN, &config_opts);
>  }

And here.

> @@ -2098,6 +2129,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>  	char *buf;
>  	unsigned long size;
>  	int ret;
> +	struct config_parse_options config_opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
>
>  	buf = repo_read_object_file(repo, oid, &type, &size);
>  	if (!buf)
> @@ -2108,7 +2140,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>  	}
>
>  	ret = git_config_from_mem(fn, CONFIG_ORIGIN_BLOB, name, buf, size,
> -				  data, scope, NULL);
> +				  data, scope, &config_opts);
>  	free(buf);

This one uses git_config_from_mem(), which sets the default error action
to "CONFIG_ERROR_ERROR", so this transformation looks correct.

Thanks,
Taylor


* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-10-17 17:13   ` [PATCH v3 0/5] config-parse: create config parsing library Junio C Hamano
@ 2023-10-23 19:34     ` Taylor Blau
  2023-10-23 20:13       ` Junio C Hamano
  2023-10-24 22:50       ` Jonathan Tan
  0 siblings, 2 replies; 49+ messages in thread
From: Taylor Blau @ 2023-10-23 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git, jonathantanmy, calvinwan, glencbz

On Tue, Oct 17, 2023 at 10:13:49AM -0700, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
>
> > Open questions:
> > - How do folks feel about the do_event() refactor in patches 2 & 3?
>
> I gave a quick re-read and found that the code after patch 2 made it
> easier to see how config.c::do_event() does its thing (even though
> the patch text of that exact step was somehow a bit hard to follow).
>
> However, the helper added by patch 3, do_event_and_flush(), that
> duplicates exactly what do_event() does, is hard to reason about, at
> least for me.  It returns early without setting .previous_type to
> EOF and the value returned from the helper signals if that is the
> case (the two early return points both return what flush_event()
> gave us), but the only caller of the helper does not even inspect
> the return value, unlike all the callers of do_event(), which also
> looks a bit fishy.

I had similar thoughts while reviewing.

But I am not sure that I agree that this series is moving us in the
right direction necessarily. Or at least I am not convinced that
shipping the intermediate state is worth doing before we have callers
that could drop '#include "config.h"' for just the parser.

This feels like churn that does not yield a tangible pay-off, at least
in the sense that the refactoring and code movement delivers us
something that we can substantively use today.

I dunno.

Thanks,
Taylor


* Re: [PATCH v3 3/5] config: report config parse errors using cb
  2023-10-23 19:29     ` Taylor Blau
@ 2023-10-23 20:11       ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-10-23 20:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Josh Steadmon, git, jonathantanmy, calvinwan, glencbz

Taylor Blau <me@ttaylorr.com> writes:

>> +	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
>
> I'm nit-picking, but I find this parameterized initializer macro to be a
> little unusual w.r.t our usual conventions.
>
> In terms of "usual conventions," I'm thinking about STRING_LIST_INIT_DUP
> versus STRING_LIST_INIT_NODUP (as opposed to something like
> STRING_LIST_INIT(DUP) or STRING_LIST_INIT(NODUP)).

FWIW, I have always felt that the way STRING_LIST_INIT* was done was
quite ugly.  The new pattern does look superior, as long as (1) it
does not involve voodoo like token pasting, and (2) the parameters
passed do not grow.  The latter is especially important as there
is no equivalent to designated initializers in C preprocessor macros.
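
For readers comparing the three styles under discussion, here is a
small side-by-side sketch. It is illustrative only: the macro
expansions and the single "error_action" member are assumptions (the
thread does not show what CP_OPTS_INIT() expands to in v3), and
cp_opts_init() is a hypothetical name for the function alternative.

```c
enum config_error_action {
	CONFIG_ERROR_DIE,
	CONFIG_ERROR_ERROR
};

/* Reduced to one member for illustration. */
struct config_parse_options {
	enum config_error_action error_action;
};

/* Parameterized initializer: one macro, the variant is an argument. */
#define CP_OPTS_INIT(action) { .error_action = (action) }

/* One macro per variant, in the style of STRING_LIST_INIT_DUP/NODUP. */
#define CP_OPTS_INIT_DIE   { .error_action = CONFIG_ERROR_DIE }
#define CP_OPTS_INIT_ERROR { .error_action = CONFIG_ERROR_ERROR }

/* Function alternative (hypothetical name). */
static inline struct config_parse_options
cp_opts_init(enum config_error_action action)
{
	struct config_parse_options opts = { .error_action = action };
	return opts;
}
```

All three produce the same value for a given action. The macro
argument is positional, which is the concern raised above: if more
parameters were added, call sites could not name them the way a
designated initializer names struct members.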

Thanks.


* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-10-23 19:34     ` Taylor Blau
@ 2023-10-23 20:13       ` Junio C Hamano
  2023-10-24 22:50       ` Jonathan Tan
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-10-23 20:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Josh Steadmon, git, jonathantanmy, calvinwan, glencbz

Taylor Blau <me@ttaylorr.com> writes:

> This feels like churn that does not yield a tangible pay-off, at least
> in the sense that the refactoring and code movement delivers us
> something that we can substantively use today.
>
> I dunno.

That matches something I felt but was too polite to say aloud ;-)


* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-10-23 19:34     ` Taylor Blau
  2023-10-23 20:13       ` Junio C Hamano
@ 2023-10-24 22:50       ` Jonathan Tan
  2023-10-25 19:37         ` Josh Steadmon
  1 sibling, 1 reply; 49+ messages in thread
From: Jonathan Tan @ 2023-10-24 22:50 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Jonathan Tan, Junio C Hamano, Josh Steadmon, git, calvinwan, glencbz

Taylor Blau <me@ttaylorr.com> writes:
> But I am not sure that I agree that this series is moving us in the
> right direction necessarily. Or at least I am not convinced that
> shipping the intermediate state is worth doing before we have callers
> that could drop '#include "config.h"' for just the parser.
> 
> This feels like churn that does not yield a tangible pay-off, at least
> in the sense that the refactoring and code movement delivers us
> something that we can substantively use today.
> 
> I dunno.
> 
> Thanks,
> Taylor

Thanks for calling this out. We do want our changes to be good for both
the libification and the non-libification cases as much as possible. As
it is, I do agree that since we won't have callers that can use the new
parser header (I think the likeliest cause of having such a caller is
if we have an "interpret-config" command, like "interpret-trailers"), we
probably shouldn't merge this (at least, the last 2 patches).

I think patches 1-3 are still usable (they make some internals of config
parsing less confusing) but I'm also OK if we hold off on them until
we find a compelling use case that motivates refactoring on the config
parser.


* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-10-24 22:50       ` Jonathan Tan
@ 2023-10-25 19:37         ` Josh Steadmon
  2023-10-27 13:04           ` Junio C Hamano
  0 siblings, 1 reply; 49+ messages in thread
From: Josh Steadmon @ 2023-10-25 19:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Taylor Blau, Junio C Hamano, git, calvinwan, glencbz

On 2023.10.24 15:50, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > But I am not sure that I agree that this series is moving us in the
> > right direction necessarily. Or at least I am not convinced that
> > shipping the intermediate state is worth doing before we have callers
> > that could drop '#include "config.h"' for just the parser.
> > 
> > This feels like churn that does not yield a tangible pay-off, at least
> > in the sense that the refactoring and code movement delivers us
> > something that we can substantively use today.
> > 
> > I dunno.
> > 
> > Thanks,
> > Taylor
> 
> Thanks for calling this out. We do want our changes to be good for both
> the libification and the non-libification cases as much as possible. As
> it is, I do agree that since we won't have callers that can use the new
> parser header (I think the likeliest cause of having such a caller is
> if we have an "interpret-config" command, like "interpret-trailers"), we
> probably shouldn't merge this (at least, the last 2 patches).
> 
> I think patches 1-3 are still usable (they make some internals of config
> parsing less confusing) but I'm also OK if we hold off on them until
> we find a compelling use case that motivates refactoring on the config
> parser.

Thanks everyone for the revived discussion here. I think I agree, this
series is not going in the right direction. Additionally, our internal
use case for this change has evaporated, so let's just drop the series.
We can pick it up again later if interest returns.


* Re: [PATCH v3 0/5] config-parse: create config parsing library
  2023-10-25 19:37         ` Josh Steadmon
@ 2023-10-27 13:04           ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2023-10-27 13:04 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, Taylor Blau, git, calvinwan, glencbz

Josh Steadmon <steadmon@google.com> writes:

> Thanks everyone for the revived discussion here. I think I agree, this
> series is not going in the right direction. Additionally, our internal
> use case for this change has evaporated, so let's just drop the series.
> We can pick it up again later if interest returns.

OK.  Let's scrap it for now.

The "internal use case" behind a proposed feature changing so
quickly is a bit worrying.  What is good for this project should
ideally be good for everybody, not only for satisfying a particular
$CORP's needs of the day.  But I think the idea of giving enhanced
visibility into stakeholder companies' directions and priorities
that Emily (I think?) floated during the contributors' summit may
help reduce such a risk, hopefully.

Thanks.


Thread overview: 49+ messages
2023-07-20 22:17 [PATCH 0/2] config-parse: create config parsing library Glen Choo via GitGitGadget
2023-07-20 22:17 ` [PATCH 1/2] config: return positive from git_config_parse_key() Glen Choo via GitGitGadget
2023-07-20 23:44   ` Jonathan Tan
2023-07-21  4:32   ` Junio C Hamano
2023-07-21 16:12     ` Glen Choo
2023-07-21 16:36       ` Junio C Hamano
2023-07-20 22:17 ` [PATCH 2/2] config-parse: split library out of config.[c|h] Glen Choo via GitGitGadget
2023-07-21  0:31   ` Jonathan Tan
2023-07-21 15:55     ` Glen Choo
2023-07-31 23:46 ` [RFC PATCH v1.5 0/5] config-parse: create config parsing library Glen Choo
2023-07-31 23:46   ` [RFC PATCH v1.5 1/5] config: return positive from git_config_parse_key() Glen Choo
2023-07-31 23:46   ` [RFC PATCH v1.5 2/5] config: split out config_parse_options Glen Choo
2023-07-31 23:46   ` [RFC PATCH v1.5 3/5] config: report config parse errors using cb Glen Choo
2023-08-04 21:34     ` Jonathan Tan
2023-07-31 23:46   ` [RFC PATCH v1.5 4/5] config.c: accept config_parse_options in git_config_from_stdin Glen Choo
2023-07-31 23:46   ` [RFC PATCH v1.5 5/5] config-parse: split library out of config.[c|h] Glen Choo
2023-08-23 21:53 ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
2023-08-23 21:53   ` [PATCH v2 1/4] config: split out config_parse_options Josh Steadmon
2023-08-23 23:26     ` Junio C Hamano
2023-09-21 21:08       ` Josh Steadmon
2023-08-23 21:53   ` [PATCH v2 2/4] config: report config parse errors using cb Josh Steadmon
2023-08-24  1:19     ` Junio C Hamano
2023-08-24 17:31       ` Jonathan Tan
2023-08-24 18:48         ` Junio C Hamano
2023-09-21 21:11       ` Josh Steadmon
2023-09-21 23:36         ` Junio C Hamano
2023-08-23 21:53   ` [PATCH v2 3/4] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
2023-08-23 21:53   ` [PATCH v2 4/4] config-parse: split library out of config.[c|h] Josh Steadmon
2023-08-24 20:10   ` [PATCH v2 0/4] config-parse: create config parsing library Josh Steadmon
2023-09-21 21:17 ` [PATCH v3 0/5] " Josh Steadmon
2023-09-21 21:17   ` [PATCH v3 1/5] config: split out config_parse_options Josh Steadmon
2023-10-23 17:52     ` Jonathan Tan
2023-10-23 18:46       ` Taylor Blau
2023-09-21 21:17   ` [PATCH v3 2/5] config: split do_event() into start and flush operations Josh Steadmon
2023-10-23 18:05     ` Jonathan Tan
2023-09-21 21:17   ` [PATCH v3 3/5] config: report config parse errors using cb Josh Steadmon
2023-10-23 18:41     ` Jonathan Tan
2023-10-23 19:29     ` Taylor Blau
2023-10-23 20:11       ` Junio C Hamano
2023-09-21 21:17   ` [PATCH v3 4/5] config.c: accept config_parse_options in git_config_from_stdin Josh Steadmon
2023-10-23 18:52     ` Jonathan Tan
2023-09-21 21:17   ` [PATCH v3 5/5] config-parse: split library out of config.[c|h] Josh Steadmon
2023-10-23 18:53     ` Jonathan Tan
2023-10-17 17:13   ` [PATCH v3 0/5] config-parse: create config parsing library Junio C Hamano
2023-10-23 19:34     ` Taylor Blau
2023-10-23 20:13       ` Junio C Hamano
2023-10-24 22:50       ` Jonathan Tan
2023-10-25 19:37         ` Josh Steadmon
2023-10-27 13:04           ` Junio C Hamano
