git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Trailer readability cleanups
@ 2023-08-05  5:04 Linus Arver via GitGitGadget
  2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver

These patches were created while digging into the trailer code to better
understand how it works, in preparation for making the trailer.{c,h} files
as small as possible to make them available as a library for external users.
This series was originally created as part of [1], but are sent here
separately because the changes here are arguably more subjective in nature.
I think Patch 1 is the most important in this series. The others can wait,
if folks are opposed to adding them on their own merits at this point in
time.

These patches do not add or change any features. Instead, their goal is to
make the code easier to understand for new contributors (like myself), by
making various cleanups and improvements. Ultimately, my hope is that with
such cleanups, we are better positioned to make larger changes (especially
the broader libification effort, as in "Introduce Git Standard Library"
[2]).

Patch 1 was inspired by 576de3d956 (unpack_trees: start splitting internal
fields from public API, 2023-02-27) [3], and is in preparation for a
libification effort in the future around the trailer code. Independent of
libification, it still makes sense to discourage callers from peeking into
these trailer-internal fields.

Patches 2-3 aim to make some functions do a little less multitasking.

Patch 4 makes the find_patch_start function care about the "--no-divider"
option, because it that option matters for determining the start of the
"patch part" of the input.

Patch 5 is a renaming change to reduce overloaded language in the codebase.
It is inspired by 229d6ab6bf (doc: trailer: examples: avoid the word
"message" by itself, 2023-06-15) [4], which did a similar thing for the
interpret-trailers documentation.

[1]
https://lore.kernel.org/git/pull.1564.git.1691210737.gitgitgadget@gmail.com/T/#mb044012670663d8eb7a548924bbcc933bef116de
[2]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/
[3]
https://lore.kernel.org/git/pull.1149.git.1677143700.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/6b4cb31b17077181a311ca87e82464a1e2ad67dd.1686797630.git.gitgitgadget@gmail.com/

Linus Arver (5):
  trailer: separate public from internal portion of trailer_iterator
  trailer: split process_input_file into separate pieces
  trailer: split process_command_line_args into separate functions
  trailer: teach find_patch_start about --no-divider
  trailer: rename *_DEFAULT enums to *_UNSPECIFIED

 trailer.c | 113 ++++++++++++++++++++++++++++++------------------------
 trailer.h |  12 +++---
 2 files changed, 69 insertions(+), 56 deletions(-)


base-commit: 1b0a5129563ebe720330fdc8f5c6843d27641137
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1563%2Flistx%2Ftrailer-libification-prep-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1563/listx/trailer-libification-prep-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1563
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
@ 2023-08-05  5:04 ` Linus Arver via GitGitGadget
  2023-08-07 21:16   ` Glen Choo
  2023-08-05  5:04 ` [PATCH 2/5] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

The fields here are not meant to be used by downstream callers, so put
them behind an anonymous struct named as
"__private_to_trailer_c__do_not_use" to warn against their use.

Internally, use a "#define" to keep the code tidy.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 12 +++++++-----
 trailer.h |  6 ++++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index f408f9b058d..dff3fafe865 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1214,20 +1214,22 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	trailer_info_release(&info);
 }
 
+#define private __private_to_trailer_c__do_not_use
+
 void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
 {
 	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
 	strbuf_init(&iter->key, 0);
 	strbuf_init(&iter->val, 0);
 	opts.no_divider = 1;
-	trailer_info_get(&iter->info, msg, &opts);
-	iter->cur = 0;
+	trailer_info_get(&iter->private.info, msg, &opts);
+	iter->private.cur = 0;
 }
 
 int trailer_iterator_advance(struct trailer_iterator *iter)
 {
-	while (iter->cur < iter->info.trailer_nr) {
-		char *trailer = iter->info.trailers[iter->cur++];
+	while (iter->private.cur < iter->private.info.trailer_nr) {
+		char *trailer = iter->private.info.trailers[iter->private.cur++];
 		int separator_pos = find_separator(trailer, separators);
 
 		if (separator_pos < 1)
@@ -1245,7 +1247,7 @@ int trailer_iterator_advance(struct trailer_iterator *iter)
 
 void trailer_iterator_release(struct trailer_iterator *iter)
 {
-	trailer_info_release(&iter->info);
+	trailer_info_release(&iter->private.info);
 	strbuf_release(&iter->val);
 	strbuf_release(&iter->key);
 }
diff --git a/trailer.h b/trailer.h
index 795d2fccfd9..db57e028650 100644
--- a/trailer.h
+++ b/trailer.h
@@ -119,8 +119,10 @@ struct trailer_iterator {
 	struct strbuf val;
 
 	/* private */
-	struct trailer_info info;
-	size_t cur;
+	struct {
+		struct trailer_info info;
+		size_t cur;
+	} __private_to_trailer_c__do_not_use;
 };
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 2/5] trailer: split process_input_file into separate pieces
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
@ 2023-08-05  5:04 ` Linus Arver via GitGitGadget
  2023-08-07 22:39   ` Glen Choo
  2023-08-05  5:04 ` [PATCH 3/5] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Currently, process_input_file does three things:

    (1) parse the input string for trailers,
    (2) print text before the trailers, and
    (3) calculate the position of the input where the trailers end.

Rename this function to parse_trailers(), and make it only do
(1). The caller of this function, process_trailers, becomes responsible
for (2) and (3). These items belong inside process_trailers because they
are both concerned with printing the surrounding text around
trailers (which is already one of the immediate concerns of
process_trailers).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/trailer.c b/trailer.c
index dff3fafe865..16fbba03d07 100644
--- a/trailer.c
+++ b/trailer.c
@@ -961,28 +961,23 @@ static void unfold_value(struct strbuf *val)
 	strbuf_release(&out);
 }
 
-static size_t process_input_file(FILE *outfile,
-				 const char *str,
-				 struct list_head *head,
-				 const struct process_trailer_options *opts)
+/*
+ * Parse trailers in "str" and populate the "head" linked list structure.
+ */
+static void parse_trailers(struct trailer_info *info,
+			     const char *str,
+			     struct list_head *head,
+			     const struct process_trailer_options *opts)
 {
-	struct trailer_info info;
 	struct strbuf tok = STRBUF_INIT;
 	struct strbuf val = STRBUF_INIT;
 	size_t i;
 
-	trailer_info_get(&info, str, opts);
-
-	/* Print lines before the trailers as is */
-	if (!opts->only_trailers)
-		fwrite(str, 1, info.trailer_start - str, outfile);
+	trailer_info_get(info, str, opts);
 
-	if (!opts->only_trailers && !info.blank_line_before_trailer)
-		fprintf(outfile, "\n");
-
-	for (i = 0; i < info.trailer_nr; i++) {
+	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
-		char *trailer = info.trailers[i];
+		char *trailer = info->trailers[i];
 		if (trailer[0] == comment_line_char)
 			continue;
 		separator_pos = find_separator(trailer, separators);
@@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile,
 		}
 	}
 
-	trailer_info_release(&info);
-
-	return info.trailer_end - str;
+	trailer_info_release(info);
 }
 
 static void free_all(struct list_head *head)
@@ -1054,6 +1047,7 @@ void process_trailers(const char *file,
 {
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
+	struct trailer_info info;
 	size_t trailer_end;
 	FILE *outfile = stdout;
 
@@ -1064,8 +1058,16 @@ void process_trailers(const char *file,
 	if (opts->in_place)
 		outfile = create_in_place_tempfile(file);
 
+	parse_trailers(&info, sb.buf, &head, opts);
+	trailer_end = info.trailer_end - sb.buf;
+
 	/* Print the lines before the trailers */
-	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
+	if (!opts->only_trailers)
+		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+
+	if (!opts->only_trailers && !info.blank_line_before_trailer)
+		fprintf(outfile, "\n");
+
 
 	if (!opts->only_input) {
 		LIST_HEAD(arg_head);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 3/5] trailer: split process_command_line_args into separate functions
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
  2023-08-05  5:04 ` [PATCH 2/5] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
@ 2023-08-05  5:04 ` Linus Arver via GitGitGadget
  2023-08-07 22:51   ` Glen Choo
  2023-08-05  5:04 ` [PATCH 4/5] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, process_command_line_args did two things:

    (1) parse trailers from the configuration, and
    (2) parse trailers defined on the command line.

Separate these concerns into parse_trailers_from_config and
parse_trailers_from_command_line_args, respectively. Remove (now
redundant) process_command_line_args.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 16fbba03d07..89246a0d395 100644
--- a/trailer.c
+++ b/trailer.c
@@ -711,30 +711,35 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	list_add_tail(&new_item->list, arg_head);
 }
 
-static void process_command_line_args(struct list_head *arg_head,
-				      struct list_head *new_trailer_head)
+static void parse_trailers_from_config(struct list_head *config_head)
 {
 	struct arg_item *item;
-	struct strbuf tok = STRBUF_INIT;
-	struct strbuf val = STRBUF_INIT;
-	const struct conf_info *conf;
 	struct list_head *pos;
 
-	/*
-	 * In command-line arguments, '=' is accepted (in addition to the
-	 * separators that are defined).
-	 */
-	char *cl_separators = xstrfmt("=%s", separators);
-
 	/* Add an arg item for each configured trailer with a command */
 	list_for_each(pos, &conf_head) {
 		item = list_entry(pos, struct arg_item, list);
 		if (item->conf.command)
-			add_arg_item(arg_head,
+			add_arg_item(config_head,
 				     xstrdup(token_from_item(item, NULL)),
 				     xstrdup(""),
 				     &item->conf, NULL);
 	}
+}
+
+static void parse_trailers_from_command_line_args(struct list_head *arg_head,
+						  struct list_head *new_trailer_head)
+{
+	struct strbuf tok = STRBUF_INIT;
+	struct strbuf val = STRBUF_INIT;
+	const struct conf_info *conf;
+	struct list_head *pos;
+
+	/*
+	 * In command-line arguments, '=' is accepted (in addition to the
+	 * separators that are defined).
+	 */
+	char *cl_separators = xstrfmt("=%s", separators);
 
 	/* Add an arg item for each trailer on the command line */
 	list_for_each(pos, new_trailer_head) {
@@ -1070,8 +1075,11 @@ void process_trailers(const char *file,
 
 
 	if (!opts->only_input) {
+		LIST_HEAD(config_head);
 		LIST_HEAD(arg_head);
-		process_command_line_args(&arg_head, new_trailer_head);
+		parse_trailers_from_config(&config_head);
+		parse_trailers_from_command_line_args(&arg_head, new_trailer_head);
+		list_splice(&config_head, &arg_head);
 		process_trailers_lists(&head, &arg_head);
 	}
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 4/5] trailer: teach find_patch_start about --no-divider
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-08-05  5:04 ` [PATCH 3/5] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
@ 2023-08-05  5:04 ` Linus Arver via GitGitGadget
  2023-08-07 23:28   ` Glen Choo
  2023-08-05  5:04 ` [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
  5 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Currently, find_patch_start only finds the start of the patch part of
the input (by looking at the "---" divider) for cases where the
"--no-divider" flag has not been provided. If the user provides this
flag, we do not rely on find_patch_start at all and just call strlen()
directly on the input.

Instead, make find_patch_start aware of "--no-divider" and make it
handle that case as well. This means we no longer need to call strlen at
all and can just rely on the existing code in find_patch_start.

This patch will make unit testing a bit more pleasant in this area in
the future when we adopt a unit testing framework, because we would not
have to test multiple functions to check how finding the start of a
patch part works (we would only need to test find_patch_start).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index 89246a0d395..3b9ce199636 100644
--- a/trailer.c
+++ b/trailer.c
@@ -812,14 +812,14 @@ static ssize_t last_line(const char *buf, size_t len)
  * Return the position of the start of the patch or the length of str if there
  * is no patch in the message.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_patch_start(const char *str, int no_divider)
 {
 	const char *s;
 
 	for (s = str; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
 			return s - str;
 	}
 
@@ -1109,11 +1109,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
+	patch_start = find_patch_start(str, opts->no_divider);
 	trailer_end = find_trailer_end(str, patch_start);
 	trailer_start = find_trailer_start(str, trailer_end);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-08-05  5:04 ` [PATCH 4/5] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
@ 2023-08-05  5:04 ` Linus Arver via GitGitGadget
  2023-08-07 23:45   ` Glen Choo
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
  5 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-08-05  5:04 UTC (permalink / raw)
  To: git; +Cc: Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Do not use *_DEFAULT as a suffix to the enums, because the word
"default" is overloaded. The following are two examples of the ambiguity
of the word "default":

(1) "Default" can mean using the "default" values that are hardcoded
    in trailer.c as

        default_conf_info.where = WHERE_END;
        default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
        default_conf_info.if_missing = MISSING_ADD;

    in ensure_configured(). These values are referred to as "the
    default" in the docs for interpret-trailers. These defaults are used
    if no "trailer.*" configurations are defined.

(2) "Default" can also mean the "trailer.*" configurations themselves,
    because these configurations are used by "default" (ahead of the
    hardcoded defaults in (1)) if no command line arguments are
    provided.

In addition, the corresponding *_DEFAULT values are chosen when the user
provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
on the command line. These "--no-*" flags are used to clear previously
provided flags of the form "--where", "--if-exists", and "--if-missing".
Using these "--no-*" flags undoes the specifying of these flags (if
any), so using the word "UNSPECIFIED" is more natural here.

So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
signals to the reader that the *_UNSPECIFIED value by itself carries no
meaning (it's a zero value and by itself does not "default" to anything,
necessitating the need to have some other way of getting to a useful
value).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 17 ++++++++++-------
 trailer.h |  6 +++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/trailer.c b/trailer.c
index 3b9ce199636..c49826decae 100644
--- a/trailer.c
+++ b/trailer.c
@@ -388,7 +388,7 @@ static void process_trailers_lists(struct list_head *head,
 int trailer_set_where(enum trailer_where *item, const char *value)
 {
 	if (!value)
-		*item = WHERE_DEFAULT;
+		*item = WHERE_UNSPECIFIED;
 	else if (!strcasecmp("after", value))
 		*item = WHERE_AFTER;
 	else if (!strcasecmp("before", value))
@@ -405,7 +405,7 @@ int trailer_set_where(enum trailer_where *item, const char *value)
 int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 {
 	if (!value)
-		*item = EXISTS_DEFAULT;
+		*item = EXISTS_UNSPECIFIED;
 	else if (!strcasecmp("addIfDifferent", value))
 		*item = EXISTS_ADD_IF_DIFFERENT;
 	else if (!strcasecmp("addIfDifferentNeighbor", value))
@@ -424,7 +424,7 @@ int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 int trailer_set_if_missing(enum trailer_if_missing *item, const char *value)
 {
 	if (!value)
-		*item = MISSING_DEFAULT;
+		*item = MISSING_UNSPECIFIED;
 	else if (!strcasecmp("doNothing", value))
 		*item = MISSING_DO_NOTHING;
 	else if (!strcasecmp("add", value))
@@ -586,7 +586,10 @@ static void ensure_configured(void)
 	if (configured)
 		return;
 
-	/* Default config must be setup first */
+	/*
+	 * Default config must be setup first. These defaults are used if there
+	 * are no "trailer.*" or "trailer.<token>.*" options configured.
+	 */
 	default_conf_info.where = WHERE_END;
 	default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
 	default_conf_info.if_missing = MISSING_ADD;
@@ -701,11 +704,11 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	new_item->value = val;
 	duplicate_conf(&new_item->conf, conf);
 	if (new_trailer_item) {
-		if (new_trailer_item->where != WHERE_DEFAULT)
+		if (new_trailer_item->where != WHERE_UNSPECIFIED)
 			new_item->conf.where = new_trailer_item->where;
-		if (new_trailer_item->if_exists != EXISTS_DEFAULT)
+		if (new_trailer_item->if_exists != EXISTS_UNSPECIFIED)
 			new_item->conf.if_exists = new_trailer_item->if_exists;
-		if (new_trailer_item->if_missing != MISSING_DEFAULT)
+		if (new_trailer_item->if_missing != MISSING_UNSPECIFIED)
 			new_item->conf.if_missing = new_trailer_item->if_missing;
 	}
 	list_add_tail(&new_item->list, arg_head);
diff --git a/trailer.h b/trailer.h
index db57e028650..924cf5405c6 100644
--- a/trailer.h
+++ b/trailer.h
@@ -5,14 +5,14 @@
 #include "strbuf.h"
 
 enum trailer_where {
-	WHERE_DEFAULT,
+	WHERE_UNSPECIFIED,
 	WHERE_END,
 	WHERE_AFTER,
 	WHERE_BEFORE,
 	WHERE_START
 };
 enum trailer_if_exists {
-	EXISTS_DEFAULT,
+	EXISTS_UNSPECIFIED,
 	EXISTS_ADD_IF_DIFFERENT_NEIGHBOR,
 	EXISTS_ADD_IF_DIFFERENT,
 	EXISTS_ADD,
@@ -20,7 +20,7 @@ enum trailer_if_exists {
 	EXISTS_DO_NOTHING
 };
 enum trailer_if_missing {
-	MISSING_DEFAULT,
+	MISSING_UNSPECIFIED,
 	MISSING_ADD,
 	MISSING_DO_NOTHING
 };
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator
  2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
@ 2023-08-07 21:16   ` Glen Choo
  2023-08-08 12:19     ` Phillip Wood
  2023-08-10 22:50     ` Linus Arver
  0 siblings, 2 replies; 72+ messages in thread
From: Glen Choo @ 2023-08-07 21:16 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget, git; +Cc: Linus Arver

As someone who isn't that familiar with trailer code, and will have less
time for the ML soon, this is more of a quick drive-by..

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +#define private __private_to_trailer_c__do_not_use
> +
>  void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>  {
>  	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
>  	strbuf_init(&iter->key, 0);
>  	strbuf_init(&iter->val, 0);
>  	opts.no_divider = 1;
> -	trailer_info_get(&iter->info, msg, &opts);
> -	iter->cur = 0;
> +	trailer_info_get(&iter->private.info, msg, &opts);
> +	iter->private.cur = 0;
>  }
> --- a/trailer.h
> +++ b/trailer.h
> @@ -119,8 +119,10 @@ struct trailer_iterator {
>  	struct strbuf val;
>...
>  	/* private */
> -	struct trailer_info info;
> -	size_t cur;
> +	struct {
> +		struct trailer_info info;
> +		size_t cur;
> +	} __private_to_trailer_c__do_not_use;
>  };

Interesting approach to "private members". I like that it's fairly
lightweight and clear. On the other hand, I think this will fail to
autocomplete on most people's development setups, and I don't think this
is worth the tradeoff.

This is the first instance of this I could find in the codebase. I'm not
really opposed to having a new way of doing things, but it would be nice
for us to be consistent with how we handle private members. Other
approaches I've seen are:

- Using a "larger" struct to hold private members and "downcasting" for
  public users (struct dir_iterator and struct dir_iterator_int). I
  dislike this because I think this enables 'wrong' memory access too
  easily.

  (As an aside, if we really wanted to 'strictly' enforce privateness in
  this patch, shouldn't we move the "#define private" into the .c file,
  the way dir_iterator_int is in the .c file?)

- Prefixing private members with "__" (khash.h and other header-only
  libraries use this at least, not sure if we have this in the 'main
  tree'). I think this works pretty well most of the time.
- Just marking private members with a comment. IMO this is good enough
  the vast majority of the time - if something is private for a good
  reason, it's unlikely to get used accidentally anyway. But properly
  enforcing "privateness" is worthy goal anyway.

Personally, I think a decent tradeoff between enforcement and ergonomics
would be to use an inner struct like you do here, but name it something
autocomplete-friendly and obviously private, like "private" or
"_private". I suspect self-regulation and code review should be enough
to catch nearly all accidental uses of private members.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/5] trailer: split process_input_file into separate pieces
  2023-08-05  5:04 ` [PATCH 2/5] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
@ 2023-08-07 22:39   ` Glen Choo
  2023-08-11  0:41     ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Glen Choo @ 2023-08-07 22:39 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget, git; +Cc: Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Currently, process_input_file does three things:
>
>     (1) parse the input string for trailers,
>     (2) print text before the trailers, and
>     (3) calculate the position of the input where the trailers end.
>
> Rename this function to parse_trailers(), and make it only do
> (1).

Okay, process_input_file() is a very unhelpful name (What does it mean
to "process a file"?). In contrast, parse_trailers() is more
self-descriptive (It parses trailers into some appropriate format, so it
shouldn't do things like print.) Makes sense.

Is there some additional, unstated purpose behind this change besides
"move things around for readability"? E.g. do you intend to move
parse_trailers() to a future trailer parsing library? If so, that would
be useful context to evaluate the goodness of this split.

> The caller of this function, process_trailers, becomes responsible
> for (2) and (3). These items belong inside process_trailers because they
> are both concerned with printing the surrounding text around
> trailers (which is already one of the immediate concerns of
> process_trailers).

I agree that (2) doesn't belong in parse_trailers(). OTOH, (3) sounds
like something that belongs in parse_trailers() - you have to parse
trailers in order to tell where the trailers start and end, so it makes
sense for the parsing function to give those values.

> diff --git a/trailer.c b/trailer.c
> index dff3fafe865..16fbba03d07 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -961,28 +961,23 @@ static void unfold_value(struct strbuf *val)
>  	strbuf_release(&out);
>  }
>  
> -static size_t process_input_file(FILE *outfile,
> -				 const char *str,
> -				 struct list_head *head,
> -				 const struct process_trailer_options *opts)
> +/*
> + * Parse trailers in "str" and populate the "head" linked list structure.
> + */
> +static void parse_trailers(struct trailer_info *info,

"info" is an out parameter, and IIRC we typically put out parameters
towards the end. I didn't find a callout in CodingGuidelines, though, so
idk if this is an ironclad rule or not.

> +			     const char *str,
> +			     struct list_head *head,
> +			     const struct process_trailer_options *opts)
>  {
> -	struct trailer_info info;
>  	struct strbuf tok = STRBUF_INIT;
>  	struct strbuf val = STRBUF_INIT;
>  	size_t i;
>  
> -	trailer_info_get(&info, str, opts);
> -
> -	/* Print lines before the trailers as is */
> -	if (!opts->only_trailers)
> -		fwrite(str, 1, info.trailer_start - str, outfile);

We no longer fwrite the contents before the trailer, okay.

> +	trailer_info_get(info, str, opts);

This is where we actually get the start and end of trailers, and each
trailer string. This is parsing out the trailers from a string, so what
other parsing is left? Reading ahead shows that we're actually parsing
the trailer string into a "struct trailer_item". Okay, so this function
is basically a wrapper around trailer_info_get() that also "returns" the
parsed trailer_items.

> -	if (!opts->only_trailers && !info.blank_line_before_trailer)
> -		fprintf(outfile, "\n");
> -

So we don't print the trailing line. Also makes sense.

> @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile,
>  		}
>  	}
>  
> -	trailer_info_release(&info);
> -
> -	return info.trailer_end - str;
> +	trailer_info_release(info);
>  }
>  

Even though "info" is a pointer passed into this function, we are
_release-ing it. This is not an umabiguously good change, IMO. Before,
"info" was never used outside of this function, so we should obviously
release it before returning. However, now that "info" is an out
parameter, we should be more careful about releasing it. I don't think
it's obvious that the caller will see the right values for
info.trailer_end and info.trailer_start, but free()-d values for
info.trailers, and a meaningless value for info.trailer_nr (since the
items were free()-d).

I think it might be better to update the comment on parse_trailers()
like so:

  /*
   * Parse trailers in "str", populating the trailer info and "head"
   * linked list structure.
   */

and make it the caller's responsibility to call trailer_info_release().
We could move this call to where we "free_all(head)".

>  static void free_all(struct list_head *head)
> @@ -1054,6 +1047,7 @@ void process_trailers(const char *file,
>  {
>  	LIST_HEAD(head);
>  	struct strbuf sb = STRBUF_INIT;
> +	struct trailer_info info;
>  	size_t trailer_end;
>  	FILE *outfile = stdout;
>  
> @@ -1064,8 +1058,16 @@ void process_trailers(const char *file,
>  	if (opts->in_place)
>  		outfile = create_in_place_tempfile(file);

Thinking out loud, should we move the creation of outfile next to where
we first use it?

> +	parse_trailers(&info, sb.buf, &head, opts);
> +	trailer_end = info.trailer_end - sb.buf;
> +
>  	/* Print the lines before the trailers */
> -	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
> +	if (!opts->only_trailers)
> +		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);

I'm not sure if it is an unambiguously good change for the caller to
learn how to compute the start and end of the trailer sections by doing
pointer arithmetic, but I guess format_trailer_info() does this anyway,
so your proposal to move (3) outside of the parse_trailers() makes
sense.

It feels a bit non-obvious that trailer_start and trailer_end are
pointing inside the input string. I wonder if we should just return the
_start and _end offsets directly instead of returning pointers. I.e.:

   struct trailer_info {
     int blank_line_before_trailer;
 -  /*
 -   * Pointers to the start and end of the trailer block found. If there
 -   * is no trailer block found, these 2 pointers point to the end of the
 -   * input string.
 -   */
 -   const char *trailer_start, *trailer_end;
 +   /* Offsets to the trailer block start and end in the input string */
 +   size_t *trailer_start, *trailer_end;

Which makes their intended use fairly unambiguous. A quick grep suggests
that in trailer.c, we're roughly as likely to use the pointer directly
vs using it to do pointer arithmetic, so converging on one use might be
a win for readability. The only other user outside of trailer.c is
sequencer.c, which doesn't care about the return type - it only checks
if there are trailers.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/5] trailer: split process_command_line_args into separate functions
  2023-08-05  5:04 ` [PATCH 3/5] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
@ 2023-08-07 22:51   ` Glen Choo
  2023-08-11  0:59     ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Glen Choo @ 2023-08-07 22:51 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget, git; +Cc: Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Previously, process_command_line_args did two things:
>
>     (1) parse trailers from the configuration, and
>     (2) parse trailers defined on the command line.

It parses trailers from two places, but it still only does "one thing",
in that it only parses trailers.

> @@ -711,30 +711,35 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
>  	list_add_tail(&new_item->list, arg_head);
>  }
>  
> -static void process_command_line_args(struct list_head *arg_head,
> -				      struct list_head *new_trailer_head)
> +static void parse_trailers_from_config(struct list_head *config_head)
>  {
>  	struct arg_item *item;
> -	struct strbuf tok = STRBUF_INIT;
> -	struct strbuf val = STRBUF_INIT;
> -	const struct conf_info *conf;
>  	struct list_head *pos;
>  
> -	/*
> -	 * In command-line arguments, '=' is accepted (in addition to the
> -	 * separators that are defined).
> -	 */
> -	char *cl_separators = xstrfmt("=%s", separators);
> -
>  	/* Add an arg item for each configured trailer with a command */
>  	list_for_each(pos, &conf_head) {
>  		item = list_entry(pos, struct arg_item, list);
>  		if (item->conf.command)
> -			add_arg_item(arg_head,
> +			add_arg_item(config_head,
>  				     xstrdup(token_from_item(item, NULL)),
>  				     xstrdup(""),
>  				     &item->conf, NULL);
>  	}
> +}
> +
> +static void parse_trailers_from_command_line_args(struct list_head *arg_head,
> +						  struct list_head *new_trailer_head)
> +{
> +	struct strbuf tok = STRBUF_INIT;
> +	struct strbuf val = STRBUF_INIT;
> +	const struct conf_info *conf;
> +	struct list_head *pos;
> +
> +	/*
> +	 * In command-line arguments, '=' is accepted (in addition to the
> +	 * separators that are defined).
> +	 */
> +	char *cl_separators = xstrfmt("=%s", separators);
>  
>  	/* Add an arg item for each trailer on the command line */
>  	list_for_each(pos, new_trailer_head) {

I find this equally readable as the preimage, which IMO is adequately
scoped and commented.

> @@ -1070,8 +1075,11 @@ void process_trailers(const char *file,
>  
>  
>  	if (!opts->only_input) {
> +		LIST_HEAD(config_head);
>  		LIST_HEAD(arg_head);
> -		process_command_line_args(&arg_head, new_trailer_head);
> +		parse_trailers_from_config(&config_head);
> +		parse_trailers_from_command_line_args(&arg_head, new_trailer_head);
> +		list_splice(&config_head, &arg_head);
>  		process_trailers_lists(&head, &arg_head);
>  	}

But now, we have to remember to call two functions instead of just one.
This, and the slight additional churn makes me lean negative on this
change. I would be really happy if we had a use case where we only
wanted to call one function but not the other, but it seems like this
isn't the case.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 4/5] trailer: teach find_patch_start about --no-divider
  2023-08-05  5:04 ` [PATCH 4/5] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
@ 2023-08-07 23:28   ` Glen Choo
  2023-08-11  1:25     ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Glen Choo @ 2023-08-07 23:28 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget, git; +Cc: Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch will make unit testing a bit more pleasant in this area in
> the future when we adopt a unit testing framework, because we would not
> have to test multiple functions to check how finding the start of a
> patch part works (we would only need to test find_patch_start).

Unit tests typically only test external-facing interfaces, not
implementatino details, so without seeing the unit tests or library
boundary, it's hard to tell whether find_patch_start() is something we
want to unit test or not. I would have assumed it's not, given that it's
tiny and only has a single caller, so I'm hesitant to say that we should
definitely handle no_divider inside find_patch_start().

> @@ -812,14 +812,14 @@ static ssize_t last_line(const char *buf, size_t len)
>   * Return the position of the start of the patch or the length of str if there
>   * is no patch in the message.
>   */
> -static size_t find_patch_start(const char *str)
> +static size_t find_patch_start(const char *str, int no_divider)
>  {
>  	const char *s;
>  
>  	for (s = str; *s; s = next_line(s)) {
>  		const char *v;
>  
> -		if (skip_prefix(s, "---", &v) && isspace(*v))
> +		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
>  			return s - str;
>  	}

Assuming we wanted to make this unit-testable anyway, could we just move
the strlen() call into this function? Performance aside (I wouldn't be
surprised if a smart enough compiler could optimize away the noops), I
don't find this easier to understand. Now the reader needs to read the
code to see "if no_divider is given, noop until the end of the string,
at which point str will point to the end, and s - str will give us the
length of str", as opposed to "there are no dividers, so just return
strlen(str)".

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-08-05  5:04 ` [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
@ 2023-08-07 23:45   ` Glen Choo
  2023-08-11 18:00     ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Glen Choo @ 2023-08-07 23:45 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget, git; +Cc: Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> (2) "Default" can also mean the "trailer.*" configurations themselves,
>     because these configurations are used by "default" (ahead of the
>     hardcoded defaults in (1)) if no command line arguments are
>     provided.

Interesting, I would have never thought of config as 'default'. In fact,
I would have thought that this de facto behavior (which you also
clarified in [1]) is a bug if not for the fact that in an internal
version of this series, you cited a commit message that describes this
as expected behavior. That context would be very welcome in the ML, I
think.

[1] https://lore.kernel.org/git/6b427b4b1e82b1f01640f1f49fe8d1c2fd02111e.1691210737.git.gitgitgadget@gmail.com

> In addition, the corresponding *_DEFAULT values are chosen when the user
> provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
> on the command line. These "--no-*" flags are used to clear previously
> provided flags of the form "--where", "--if-exists", and "--if-missing".
> Using these "--no-*" flags undoes the specifying of these flags (if
> any), so using the word "UNSPECIFIED" is more natural here.
>
> So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
> signals to the reader that the *_UNSPECIFIED value by itself carries no
> meaning (it's a zero value and by itself does not "default" to anything,
> necessitating the need to have some other way of getting to a useful
> value).

Makse sense. This seems like a good change.

> @@ -586,7 +586,10 @@ static void ensure_configured(void)
>  	if (configured)
>  		return;
>  
> -	/* Default config must be setup first */
> +	/*
> +	 * Default config must be setup first. These defaults are used if there
> +	 * are no "trailer.*" or "trailer.<token>.*" options configured.
> +	 */
>  	default_conf_info.where = WHERE_END;
>  	default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
>  	default_conf_info.if_missing = MISSING_ADD;

As mentioned earlier, I find it a bit odd that we're calling config
'default' (and also that we're calling CLI args config), but
renaming default_conf_info to config_conf_info sounds worse, so let's
leave it as-is.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator
  2023-08-07 21:16   ` Glen Choo
@ 2023-08-08 12:19     ` Phillip Wood
  2023-08-10 23:15       ` Linus Arver
  2023-08-10 22:50     ` Linus Arver
  1 sibling, 1 reply; 72+ messages in thread
From: Phillip Wood @ 2023-08-08 12:19 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git; +Cc: Linus Arver

On 07/08/2023 22:16, Glen Choo wrote:
> As someone who isn't that familiar with trailer code, and will have less
> time for the ML soon, this is more of a quick drive-by..

This is a bit of a drive-by comment as well ...

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +#define private __private_to_trailer_c__do_not_use
>> +
>>   void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>>   {
>>   	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
>>   	strbuf_init(&iter->key, 0);
>>   	strbuf_init(&iter->val, 0);
>>   	opts.no_divider = 1;
>> -	trailer_info_get(&iter->info, msg, &opts);
>> -	iter->cur = 0;
>> +	trailer_info_get(&iter->private.info, msg, &opts);
>> +	iter->private.cur = 0;
>>   }
>> --- a/trailer.h
>> +++ b/trailer.h
>> @@ -119,8 +119,10 @@ struct trailer_iterator {
>>   	struct strbuf val;
>> ...
>>   	/* private */
>> -	struct trailer_info info;
>> -	size_t cur;
>> +	struct {
>> +		struct trailer_info info;
>> +		size_t cur;
>> +	} __private_to_trailer_c__do_not_use;
>>   };
> 
> Interesting approach to "private members". I like that it's fairly
> lightweight and clear. On the other hand, I think this will fail to
> autocomplete on most people's development setups, and I don't think this
> is worth the tradeoff.
> 
> This is the first instance of this I could find in the codebase. 

We have something similar in unpack_trees.h see 576de3d9560 
(unpack_trees: start splitting internal fields from public API, 
2023-02-27). That adds an "internal" member to "sturct unpack_trees" of 
type "struct unpack_trees_internal which seems to be a easier naming scheme.

> I'm not
> really opposed to having a new way of doing things, but it would be nice
> for us to be consistent with how we handle private members. Other
> approaches I've seen are:
> 
> - Using a "larger" struct to hold private members and "downcasting" for
>    public users (struct dir_iterator and struct dir_iterator_int). I
>    dislike this because I think this enables 'wrong' memory access too
>    easily.
> 
>    (As an aside, if we really wanted to 'strictly' enforce privateness in
>    this patch, shouldn't we move the "#define private" into the .c file,
>    the way dir_iterator_int is in the .c file?)

That #define is pretty ugly

Another common scheme is to have an opaque pointer to the private struct 
  in the public struct (aka pimpl idiom). The merge machinery uses this 
- see merge-recursive.h. (I'm working on something similar for the 
sequencer so we can change the internals without having to re-compile 
everything that includes "sequencer.h")

> - Prefixing private members with "__" (khash.h and other header-only
>    libraries use this at least, not sure if we have this in the 'main
>    tree'). I think this works pretty well most of the time.

It is common but I think the C standard reserves names beginning with "__"

> - Just marking private members with a comment. IMO this is good enough
>    the vast majority of the time - if something is private for a good
>    reason, it's unlikely to get used accidentally anyway. But properly
>    enforcing "privateness" is worthy goal anyway.
>
> Personally, I think a decent tradeoff between enforcement and ergonomics
> would be to use an inner struct like you do here, but name it something
> autocomplete-friendly and obviously private, like "private" or
> "_private".

I agree, something like that would match the unpack_trees example

> I suspect self-regulation and code review should be enough
> to catch nearly all accidental uses of private members.

Agreed

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator
  2023-08-07 21:16   ` Glen Choo
  2023-08-08 12:19     ` Phillip Wood
@ 2023-08-10 22:50     ` Linus Arver
  1 sibling, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-10 22:50 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Glen Choo <chooglen@google.com> writes:

> As someone who isn't that familiar with trailer code, and will have less
> time for the ML soon, this is more of a quick drive-by..

Aren't you also going on vacation soon? ;-)

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +#define private __private_to_trailer_c__do_not_use
>> +
>>  void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>>  {
>>  	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
>>  	strbuf_init(&iter->key, 0);
>>  	strbuf_init(&iter->val, 0);
>>  	opts.no_divider = 1;
>> -	trailer_info_get(&iter->info, msg, &opts);
>> -	iter->cur = 0;
>> +	trailer_info_get(&iter->private.info, msg, &opts);
>> +	iter->private.cur = 0;
>>  }
>> --- a/trailer.h
>> +++ b/trailer.h
>> @@ -119,8 +119,10 @@ struct trailer_iterator {
>>  	struct strbuf val;
>>...
>>  	/* private */
>> -	struct trailer_info info;
>> -	size_t cur;
>> +	struct {
>> +		struct trailer_info info;
>> +		size_t cur;
>> +	} __private_to_trailer_c__do_not_use;
>>  };
>
> [...]
>
> This is the first instance of this I could find in the codebase. I'm not
> really opposed to having a new way of doing things, but it would be nice
> for us to be consistent with how we handle private members. Other
> approaches I've seen are:
>
> - Using a "larger" struct to hold private members and "downcasting" for
>   public users (struct dir_iterator and struct dir_iterator_int). I
>   dislike this because I think this enables 'wrong' memory access too
>   easily.
>   [...]
> - Prefixing private members with "__" (khash.h and other header-only
>   libraries use this at least, not sure if we have this in the 'main
>   tree'). I think this works pretty well most of the time.
> - Just marking private members with a comment. IMO this is good enough
>   the vast majority of the time - if something is private for a good
>   reason, it's unlikely to get used accidentally anyway. But properly
>   enforcing "privateness" is worthy goal anyway.

Thanks for documenting these other approaches.

I prefer the "larger" struct to hold private members pattern. More
specifically I like the container_of approach pointed out by Jacob [2],
because it is an established pattern in the Linux Kernel and because we
already sort of use the same idea in the list_head type we imported from
the Kernel in 94e99012fc (http-walker: reduce O(n) ops with
doubly-linked list, 2016-07-11). That is, for example for the
new_trailer_item struct we have

    struct new_trailer_item {
        struct list_head list;
        <list item stuff>
    };

and to me this is symmetric to the container_of pattern described by Jacob:

    struct dir_entry_private {
        struct dir_entry entry;
        <private stuff>
    };

Accordingly, we are already doing the "structure pointer math" (which
Jacob described in [2]) for list_head in list.h:

    /* Get typed element from list at a given position. */
    #define list_entry(ptr, type, member) \
        ((type *) ((char *) (ptr) - offsetof(type, member)))

In this patch series though, I decided to just stick with giving the
struct a private-sounding name, because I don't think we reached
consensus on what the preferred approach is for separating
public/private fields.

>   (As an aside, if we really wanted to 'strictly' enforce privateness in
>   this patch, shouldn't we move the "#define private" into the .c file,
>   the way dir_iterator_int is in the .c file?)

I think you meant moving the struct into the .c file (the "#define" is
already in the .c file).

> Personally, I think a decent tradeoff between enforcement and ergonomics
> would be to use an inner struct like you do here, but name it something
> autocomplete-friendly and obviously private, like "private" or
> "_private".

SGTM. I think I'll go with "internal" though, to align with 576de3d956
(unpack_trees: start splitting internal fields from public API,
2023-02-27) which Phillip pointed out. Will reroll.

> I suspect self-regulation and code review should be enough
> to catch nearly all accidental uses of private members.

Ack. In the future, if and when we want compiler-level guarantees to
make it impossible (this was the discussion at [1]), we can revisit this
area.

[1] https://lore.kernel.org/git/16ff5069-0408-21cd-995c-8b47afb9810d@github.com/
[2] https://lore.kernel.org/git/CA+P7+xo02dGkjb5DwJ1Af_hoQ5HiuxASheZxoFz+r6B-6cQMug@mail.gmail.com/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator
  2023-08-08 12:19     ` Phillip Wood
@ 2023-08-10 23:15       ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-10 23:15 UTC (permalink / raw)
  To: phillip.wood, Glen Choo, Linus Arver via GitGitGadget, git

Phillip Wood <phillip.wood123@gmail.com> writes:

> We have something similar in unpack_trees.h see 576de3d9560 
> (unpack_trees: start splitting internal fields from public API, 
> 2023-02-27). That adds an "internal" member to "sturct unpack_trees" of 
> type "struct unpack_trees_internal which seems to be a easier naming scheme.

Ack, I will use "internal" as the member name in the next reroll.

>>> +#define private __private_to_trailer_c__do_not_use
> [...]
> That #define is pretty ugly

Haha, indeed. But I think that's the point though (i.e., the degree of
ugliness matches the strength of our codebase's posture to discourage
its use by external users).

I will drop the #define in the next reroll though, so, I guess it's a
moot point anyway.

> Another common scheme is to have an opaque pointer to the private struct 
>   in the public struct (aka pimpl idiom). The merge machinery uses this 
> - see merge-recursive.h. (I'm working on something similar for the 
> sequencer so we can change the internals without having to re-compile 
> everything that includes "sequencer.h")

Very interesting! I look forward to seeing your work. :)

>> - Prefixing private members with "__" (khash.h and other header-only
>>    libraries use this at least, not sure if we have this in the 'main
>>    tree'). I think this works pretty well most of the time.
>
> It is common but I think the C standard reserves names beginning with "__"

Indeed (see [1]).

[1] https://devblogs.microsoft.com/oldnewthing/20230109-00/?p=107685

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/5] trailer: split process_input_file into separate pieces
  2023-08-07 22:39   ` Glen Choo
@ 2023-08-11  0:41     ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-11  0:41 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Glen Choo <chooglen@google.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Currently, process_input_file does three things:
>>
>>     (1) parse the input string for trailers,
>>     (2) print text before the trailers, and
>>     (3) calculate the position of the input where the trailers end.
>>
>> Rename this function to parse_trailers(), and make it only do
>> (1).
>
> [...]
>
> Is there some additional, unstated purpose behind this change besides
> "move things around for readability"? E.g. do you intend to move
> parse_trailers() to a future trailer parsing library? If so, that would
> be useful context to evaluate the goodness of this split.

I think it's still too early to say whether certain functions will make
it (unmodified) into the public, libified API. So currently "move things
around for readability" is the most concrete reason.

>> The caller of this function, process_trailers, becomes responsible
>> for (2) and (3). These items belong inside process_trailers because they
>> are both concerned with printing the surrounding text around
>> trailers (which is already one of the immediate concerns of
>> process_trailers).
>
> I agree that (2) doesn't belong in parse_trailers(). OTOH, (3) sounds
> like something that belongs in parse_trailers() - you have to parse
> trailers in order to tell where the trailers start and end, so it makes
> sense for the parsing function to give those values.

I don't think (3) should belong in parse_trailers, mainly because the
"info" struct we pass into it gets this information populated by
parse_trailers already. Which is why we can do (3) in the caller with

    parse_trailers(&info, sb.buf, &head, opts);
    trailer_end = info.trailer_end - sb.buf;

to get the same information. Also, the endpoint of the trailers is no
more inherently special than, for example, the following other possible
return values:

- the number of trailers that were recognized and parsed
- whether we encountered any trailers or not
- the start position of when we first encountered a trailer in the input

which makes me want to avoid returning this "trailer_end" value from
parse_trailers.

One more thing: we already have a function named "find_trailer_end"
which is supposed to do this already. But it uses "ignore_non_trailer"
from commit.c (that function should probably use the trailer API later
on to figure this out...). I wanted to clean that part up in the future
as part of libifcation.

>> -static size_t process_input_file(FILE *outfile,
>> -				 const char *str,
>> -				 struct list_head *head,
>> -				 const struct process_trailer_options *opts)
>> +/*
>> + * Parse trailers in "str" and populate the "head" linked list structure.
>> + */
>> +static void parse_trailers(struct trailer_info *info,
>
> "info" is an out parameter, and IIRC we typically put out parameters
> towards the end. I didn't find a callout in CodingGuidelines, though, so
> idk if this is an ironclad rule or not.

I wanted to minimize churn as much as possible (hence my hesitation with
changing around the order of these parameters). But also,
trailer_info_get uses "info" as the first parameter, so I wanted to
align with that usage.

>> @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile,
>>  		}
>>  	}
>>
>> -	trailer_info_release(&info);
>> -
>> -	return info.trailer_end - str;
>> +	trailer_info_release(info);
>>  }
>>
>
> Even though "info" is a pointer passed into this function, we are
> _release-ing it. This is not an umabiguously good change, IMO. Before,
> "info" was never used outside of this function, so we should obviously
> release it before returning. However, now that "info" is an out
> parameter, we should be more careful about releasing it.

Hmm, agreed.

> I don't think
> it's obvious that the caller will see the right values for
> info.trailer_end and info.trailer_start, but free()-d values for
> info.trailers, and a meaningless value for info.trailer_nr (since the
> items were free()-d).

Agreed. Will update to avoid calling trailer_info_release() inside
parse_trailers() because the caller might still need that information. I
think the fix is to move the trailer_info_get outside to the caller,
much like how format_trailers_from_commit() does it.

> I think it might be better to update the comment on parse_trailers()
> like so:
>
>   /*
>    * Parse trailers in "str", populating the trailer info and "head"
>    * linked list structure.
>    */
>
> and make it the caller's responsibility to call trailer_info_release().
> We could move this call to where we "free_all(head)".

SGTM. (I regret not reading this text before drafting my response above.)

>>  static void free_all(struct list_head *head)
>> @@ -1054,6 +1047,7 @@ void process_trailers(const char *file,
>>  {
>>  	LIST_HEAD(head);
>>  	struct strbuf sb = STRBUF_INIT;
>> +	struct trailer_info info;
>>  	size_t trailer_end;
>>  	FILE *outfile = stdout;
>>
>> @@ -1064,8 +1058,16 @@ void process_trailers(const char *file,
>>  	if (opts->in_place)
>>  		outfile = create_in_place_tempfile(file);
>
> Thinking out loud, should we move the creation of outfile next to where
> we first use it?

Not sure what you mean here. Can you clarify?

>> +	parse_trailers(&info, sb.buf, &head, opts);
>> +	trailer_end = info.trailer_end - sb.buf;
>> +
>>  	/* Print the lines before the trailers */
>> -	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
>> +	if (!opts->only_trailers)
>> +		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
>
> I'm not sure if it is an unambiguously good change for the caller to
> learn how to compute the start and end of the trailer sections by doing
> pointer arithmetic,

I think a future cleanup (in a follow-up series) involving
find_trailer_end should simplify this area and avoid the need for
pointer arithmetic in the caller.

> It feels a bit non-obvious that trailer_start and trailer_end are
> pointing inside the input string. I wonder if we should just return the
> _start and _end offsets directly instead of returning pointers. I.e.:
>
>    struct trailer_info {
>      int blank_line_before_trailer;
>  -  /*
>  -   * Pointers to the start and end of the trailer block found. If there
>  -   * is no trailer block found, these 2 pointers point to the end of the
>  -   * input string.
>  -   */
>  -   const char *trailer_start, *trailer_end;
>  +   /* Offsets to the trailer block start and end in the input string */
>  +   size_t *trailer_start, *trailer_end;
>
> Which makes their intended use fairly unambiguous. A quick grep suggests
> that in trailer.c, we're roughly as likely to use the pointer directly
> vs using it to do pointer arithmetic, so converging on one use might be
> a win for readability.

Agreed! I would prefer to use offsets everywhere, as I think that is
more direct (because we are concerned with locations in the input).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/5] trailer: split process_command_line_args into separate functions
  2023-08-07 22:51   ` Glen Choo
@ 2023-08-11  0:59     ` Linus Arver
  2023-08-11  1:02       ` Linus Arver
  2023-08-11 21:11       ` Glen Choo
  0 siblings, 2 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-11  0:59 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Glen Choo <chooglen@google.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Previously, process_command_line_args did two things:
>>
>>     (1) parse trailers from the configuration, and
>>     (2) parse trailers defined on the command line.
>
> It parses trailers from two places, but it still only does "one thing",
> in that it only parses trailers.

More precisely, it parses trailers from the command line by first
parsing trailers from the configuration. In other words, parsing
trailers from the configuration (independent of the input string!) is a
required dependency for parsing trailers coming from the command line.

If we take a step back, we need to do 3 things:

   (1) parse trailers from the configuration
   (2) parse trailers from the command line
   (3) parse trailers from the input

I think these three should be separated into separate functions. I think
no one wants to combine all three into one function. And I can't think
of a good enough reason to combine (1) and (2) together either. Hence
this patch.

> I find this equally readable as the preimage, which IMO is adequately
> scoped and commented.

Aside: is "preimage" the status quo before applying the patch?

>> @@ -1070,8 +1075,11 @@ void process_trailers(const char *file,
>>  
>>  
>>  	if (!opts->only_input) {
>> +		LIST_HEAD(config_head);
>>  		LIST_HEAD(arg_head);
>> -		process_command_line_args(&arg_head, new_trailer_head);
>> +		parse_trailers_from_config(&config_head);
>> +		parse_trailers_from_command_line_args(&arg_head, new_trailer_head);
>> +		list_splice(&config_head, &arg_head);
>>  		process_trailers_lists(&head, &arg_head);
>>  	}
>
> But now, we have to remember to call two functions instead of just one.

But only inside interpret-trailers.c, right?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/5] trailer: split process_command_line_args into separate functions
  2023-08-11  0:59     ` Linus Arver
@ 2023-08-11  1:02       ` Linus Arver
  2023-08-11 21:11       ` Glen Choo
  1 sibling, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-11  1:02 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Linus Arver <linusa@google.com> writes:
>>
>> But now, we have to remember to call two functions instead of just one.
>
> But only inside interpret-trailers.c, right?

Oops, I meant trailer.c.

In the future I expect to move this to interpret-trailers.c.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 4/5] trailer: teach find_patch_start about --no-divider
  2023-08-07 23:28   ` Glen Choo
@ 2023-08-11  1:25     ` Linus Arver
  2023-08-11 20:51       ` Glen Choo
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-08-11  1:25 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Glen Choo <chooglen@google.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> @@ -812,14 +812,14 @@ static ssize_t last_line(const char *buf, size_t len)
>>   * Return the position of the start of the patch or the length of str if there
>>   * is no patch in the message.
>>   */
>> -static size_t find_patch_start(const char *str)
>> +static size_t find_patch_start(const char *str, int no_divider)
>>  {
>>  	const char *s;
>>  
>>  	for (s = str; *s; s = next_line(s)) {
>>  		const char *v;
>>  
>> -		if (skip_prefix(s, "---", &v) && isspace(*v))
>> +		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
>>  			return s - str;
>>  	}
>
> Assuming we wanted to make this unit-testable anyway, could we just move
> the strlen() call into this function?

I don't see why we should preserve the if-statement and associated
strlen() call if we can just get rid of it.

> [...] I
> don't find this easier to understand. Now the reader needs to read the
> code to see "if no_divider is given, noop until the end of the string,
> at which point str will point to the end, and s - str will give us the
> length of str", as opposed to "there are no dividers, so just return
> strlen(str)".

The main idea behind this patch is to make find_patch_start() return the
correct answer. Currently it does not in all cases (whether --no-divider
is provided), and so the caller has to calculate the
start of the patch with strlen manually. By moving the --no-divider flag
into this function, we force all callers to consider this important
option.

For additional context we recently had to fix a bug where we weren't
passing in this flag to the interpret-trailers builtin. See be3d654343
(commit: pass --no-divider to interpret-trailers, 2023-06-17). There we
acknowledged that some callers forgot to pass in --no-divider to
interpret-trailers (such as the caller that was fixed up in that
commit).

I mention the above example because although it's not the exact same
thing as here, I think the scenario of "sometimes callers can forget
about --no-divider" is an important one to prevent wherever possible.
That's why I like this patch (in addition to the reasons cited in the
commit message).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-08-07 23:45   ` Glen Choo
@ 2023-08-11 18:00     ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-08-11 18:00 UTC (permalink / raw)
  To: Glen Choo, Linus Arver via GitGitGadget, git

Glen Choo <chooglen@google.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> (2) "Default" can also mean the "trailer.*" configurations themselves,
>>     because these configurations are used by "default" (ahead of the
>>     hardcoded defaults in (1)) if no command line arguments are
>>     provided.
>
> Interesting, I would have never thought of config as 'default'. In fact,
> I would have thought that this de facto behavior (which you also
> clarified in [1]) is a bug if not for the fact that in an internal
> version of this series, you cited a commit message that describes this
> as expected behavior. That context would be very welcome in the ML, I
> think.
>
> [1] https://lore.kernel.org/git/6b427b4b1e82b1f01640f1f49fe8d1c2fd02111e.1691210737.git.gitgitgadget@gmail.com

I forget the internal version/discussion, but I assume you're thinking
of 0ea5292e6b (interpret-trailers: add options for actions, 2017-08-01).
I will reroll and mention it to the commit message.

>> In addition, the corresponding *_DEFAULT values are chosen when the user
>> provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
>> on the command line. These "--no-*" flags are used to clear previously
>> provided flags of the form "--where", "--if-exists", and "--if-missing".
>> Using these "--no-*" flags undoes the specifying of these flags (if
>> any), so using the word "UNSPECIFIED" is more natural here.
>>
>> So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
>> signals to the reader that the *_UNSPECIFIED value by itself carries no
>> meaning (it's a zero value and by itself does not "default" to anything,
>> necessitating the need to have some other way of getting to a useful
>> value).
>
> Makse sense. This seems like a good change.
>
>> @@ -586,7 +586,10 @@ static void ensure_configured(void)
>>  	if (configured)
>>  		return;
>>  
>> -	/* Default config must be setup first */
>> +	/*
>> +	 * Default config must be setup first. These defaults are used if there
>> +	 * are no "trailer.*" or "trailer.<token>.*" options configured.
>> +	 */
>>  	default_conf_info.where = WHERE_END;
>>  	default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
>>  	default_conf_info.if_missing = MISSING_ADD;
>
> As mentioned earlier, I find it a bit odd that we're calling config
> 'default' (and also that we're calling CLI args config), but
> renaming default_conf_info to config_conf_info sounds worse, so let's
> leave it as-is.

SGTM. Although, we could also just rename it to "conf_info" (same name
as the struct name). Unless such same-variable-name-as-the-struct is
discouraged in the codebase.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 4/5] trailer: teach find_patch_start about --no-divider
  2023-08-11  1:25     ` Linus Arver
@ 2023-08-11 20:51       ` Glen Choo
  0 siblings, 0 replies; 72+ messages in thread
From: Glen Choo @ 2023-08-11 20:51 UTC (permalink / raw)
  To: Linus Arver, Linus Arver via GitGitGadget, git

Linus Arver <linusa@google.com> writes:

> I don't see why we should preserve the if-statement and associated
> strlen() call if we can just get rid of it.

Here are some reasons:

- Without compiler optimizations, it is faster.
- Subjectively, I find the early return easier to understand.

I don't think I need to nitpick over such a tiny issue though, so I'm
okay either way.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/5] trailer: split process_command_line_args into separate functions
  2023-08-11  0:59     ` Linus Arver
  2023-08-11  1:02       ` Linus Arver
@ 2023-08-11 21:11       ` Glen Choo
  1 sibling, 0 replies; 72+ messages in thread
From: Glen Choo @ 2023-08-11 21:11 UTC (permalink / raw)
  To: Linus Arver, Linus Arver via GitGitGadget, git

Linus Arver <linusa@google.com> writes:

>> I find this equally readable as the preimage, which IMO is adequately
>> scoped and commented.
>
> Aside: is "preimage" the status quo before applying the patch?

Yup.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 0/6] Trailer readability cleanups
  2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
                   ` (4 preceding siblings ...)
  2023-08-05  5:04 ` [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
@ 2023-09-09  6:16 ` Linus Arver via GitGitGadget
  2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
                     ` (6 more replies)
  5 siblings, 7 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver

These patches were created while digging into the trailer code to better
understand how it works, in preparation for making the trailer.{c,h} files
as small as possible to make them available as a library for external users.
This series was originally created as part of [1], but are sent here
separately because the changes here are arguably more subjective in nature.
I think Patch 1 is the most important in this series. The others can wait,
if folks are opposed to adding them on their own merits at this point in
time.

These patches do not add or change any features. Instead, their goal is to
make the code easier to understand for new contributors (like myself), by
making various cleanups and improvements. Ultimately, my hope is that with
such cleanups, we are better positioned to make larger changes (especially
the broader libification effort, as in "Introduce Git Standard Library"
[2]).

Patch 1 was inspired by 576de3d956 (unpack_trees: start splitting internal
fields from public API, 2023-02-27) [3], and is in preparation for a
libification effort in the future around the trailer code. Independent of
libification, it still makes sense to discourage callers from peeking into
these trailer-internal fields.

Patches 2-3 aim to make some functions do a little less multitasking.

Patch 4 makes the find_patch_start function care about the "--no-divider"
option, because it that option matters for determining the start of the
"patch part" of the input.

Patch 5 is a renaming change to reduce overloaded language in the codebase.
It is inspired by 229d6ab6bf (doc: trailer: examples: avoid the word
"message" by itself, 2023-06-15) [4], which did a similar thing for the
interpret-trailers documentation.

Patch 6 makes trailer_info use offsets for trailer_start and trailer_end.


Updates in v2
=============

 * Patch 1: Drop the use of a #define. Instead just use an anonymous struct
   named internal.
 * Patch 2: Don't free info out parameter inside parse_trailers(). Instead
   free it from the caller, process_trailers(). Update comment in
   parse_trailers().
 * Patch 3: Reword commit message.
 * Patch 4: Mention be3d654343 (commit: pass --no-divider to
   interpret-trailers, 2023-06-17) in commit message.
 * Added Patch 6 to make trailer_info use offsets for trailer_start and
   trailer_end (thanks to Glen Choo for the suggestion).

[1]
https://lore.kernel.org/git/pull.1564.git.1691210737.gitgitgadget@gmail.com/T/#mb044012670663d8eb7a548924bbcc933bef116de
[2]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/
[3]
https://lore.kernel.org/git/pull.1149.git.1677143700.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/6b4cb31b17077181a311ca87e82464a1e2ad67dd.1686797630.git.gitgitgadget@gmail.com/

Linus Arver (6):
  trailer: separate public from internal portion of trailer_iterator
  trailer: split process_input_file into separate pieces
  trailer: split process_command_line_args into separate functions
  trailer: teach find_patch_start about --no-divider
  trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  trailer: use offsets for trailer_start/trailer_end

 trailer.c | 126 +++++++++++++++++++++++++++++-------------------------
 trailer.h |  19 ++++----
 2 files changed, 77 insertions(+), 68 deletions(-)


base-commit: 1b0a5129563ebe720330fdc8f5c6843d27641137
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1563%2Flistx%2Ftrailer-libification-prep-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1563/listx/trailer-libification-prep-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1563

Range-diff vs v1:

 1:  0bce4d4b0d5 ! 1:  4f116d2550f trailer: separate public from internal portion of trailer_iterator
     @@ Commit message
          trailer: separate public from internal portion of trailer_iterator
      
          The fields here are not meant to be used by downstream callers, so put
     -    them behind an anonymous struct named as
     -    "__private_to_trailer_c__do_not_use" to warn against their use.
     +    them behind an anonymous struct named as "internal" to warn against
     +    their use. This follows the pattern in 576de3d956 (unpack_trees: start
     +    splitting internal fields from public API, 2023-02-27).
      
     -    Internally, use a "#define" to keep the code tidy.
     -
     -    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Linus Arver <linusa@google.com>
      
       ## trailer.c ##
     -@@ trailer.c: void format_trailers_from_commit(struct strbuf *out, const char *msg,
     - 	trailer_info_release(&info);
     - }
     - 
     -+#define private __private_to_trailer_c__do_not_use
     -+
     - void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
     - {
     - 	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
     +@@ trailer.c: void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
       	strbuf_init(&iter->key, 0);
       	strbuf_init(&iter->val, 0);
       	opts.no_divider = 1;
      -	trailer_info_get(&iter->info, msg, &opts);
      -	iter->cur = 0;
     -+	trailer_info_get(&iter->private.info, msg, &opts);
     -+	iter->private.cur = 0;
     ++	trailer_info_get(&iter->internal.info, msg, &opts);
     ++	iter->internal.cur = 0;
       }
       
       int trailer_iterator_advance(struct trailer_iterator *iter)
       {
      -	while (iter->cur < iter->info.trailer_nr) {
      -		char *trailer = iter->info.trailers[iter->cur++];
     -+	while (iter->private.cur < iter->private.info.trailer_nr) {
     -+		char *trailer = iter->private.info.trailers[iter->private.cur++];
     ++	while (iter->internal.cur < iter->internal.info.trailer_nr) {
     ++		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
       		int separator_pos = find_separator(trailer, separators);
       
       		if (separator_pos < 1)
     @@ trailer.c: int trailer_iterator_advance(struct trailer_iterator *iter)
       void trailer_iterator_release(struct trailer_iterator *iter)
       {
      -	trailer_info_release(&iter->info);
     -+	trailer_info_release(&iter->private.info);
     ++	trailer_info_release(&iter->internal.info);
       	strbuf_release(&iter->val);
       	strbuf_release(&iter->key);
       }
     @@ trailer.h: struct trailer_iterator {
      +	struct {
      +		struct trailer_info info;
      +		size_t cur;
     -+	} __private_to_trailer_c__do_not_use;
     ++	} internal;
       };
       
       /*
 2:  d023c297dca ! 2:  c00f4623d0b trailer: split process_input_file into separate pieces
     @@ trailer.c: static void unfold_value(struct strbuf *val)
      -				 struct list_head *head,
      -				 const struct process_trailer_options *opts)
      +/*
     -+ * Parse trailers in "str" and populate the "head" linked list structure.
     ++ * Parse trailers in "str", populating the trailer info and "head"
     ++ * linked list structure.
      + */
      +static void parse_trailers(struct trailer_info *info,
      +			     const char *str,
     @@ trailer.c: static void unfold_value(struct strbuf *val)
       			continue;
       		separator_pos = find_separator(trailer, separators);
      @@ trailer.c: static size_t process_input_file(FILE *outfile,
     + 					 strbuf_detach(&val, NULL));
       		}
       	}
     - 
     +-
      -	trailer_info_release(&info);
      -
      -	return info.trailer_end - str;
     -+	trailer_info_release(info);
       }
       
       static void free_all(struct list_head *head)
     @@ trailer.c: void process_trailers(const char *file,
       
       	if (!opts->only_input) {
       		LIST_HEAD(arg_head);
     +@@ trailer.c: void process_trailers(const char *file,
     + 	print_all(outfile, &head, opts);
     + 
     + 	free_all(&head);
     ++	trailer_info_release(&info);
     + 
     + 	/* Print the lines after the trailers as is */
     + 	if (!opts->only_trailers)
 3:  c8bb0136621 ! 3:  f78c2345fad trailer: split process_command_line_args into separate functions
     @@ Commit message
              (1) parse trailers from the configuration, and
              (2) parse trailers defined on the command line.
      
     -    Separate these concerns into parse_trailers_from_config and
     -    parse_trailers_from_command_line_args, respectively. Remove (now
     -    redundant) process_command_line_args.
     +    Separate (1) outside to a new function, parse_trailers_from_config.
     +    Rename the remaining logic to parse_trailers_from_command_line_args.
      
          Signed-off-by: Linus Arver <linusa@google.com>
      
 4:  1fc060041db ! 4:  f5f507c4c6c trailer: teach find_patch_start about --no-divider
     @@ Commit message
      
          Instead, make find_patch_start aware of "--no-divider" and make it
          handle that case as well. This means we no longer need to call strlen at
     -    all and can just rely on the existing code in find_patch_start.
     +    all and can just rely on the existing code in find_patch_start. By
     +    forcing callers to consider this important option, we avoid the kind of
     +    mistake described in be3d654343 (commit: pass --no-divider to
     +    interpret-trailers, 2023-06-17).
      
          This patch will make unit testing a bit more pleasant in this area in
          the future when we adopt a unit testing framework, because we would not
 5:  7c9b63c2616 ! 5:  52958c3557c trailer: rename *_DEFAULT enums to *_UNSPECIFIED
     @@ Commit message
          (2) "Default" can also mean the "trailer.*" configurations themselves,
              because these configurations are used by "default" (ahead of the
              hardcoded defaults in (1)) if no command line arguments are
     -        provided.
     +        provided. This concept of defaulting back to the configurations was
     +        introduced in 0ea5292e6b (interpret-trailers: add options for
     +        actions, 2017-08-01).
      
          In addition, the corresponding *_DEFAULT values are chosen when the user
          provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
 -:  ----------- > 6:  0463066ebe0 trailer: use offsets for trailer_start/trailer_end

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-11 17:10     ` Junio C Hamano
  2023-09-09  6:16   ` [PATCH v2 2/6] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

The fields here are not meant to be used by downstream callers, so put
them behind an anonymous struct named as "internal" to warn against
their use. This follows the pattern in 576de3d956 (unpack_trees: start
splitting internal fields from public API, 2023-02-27).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 10 +++++-----
 trailer.h |  6 ++++--
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index f408f9b058d..de4bdece847 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1220,14 +1220,14 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
 	strbuf_init(&iter->key, 0);
 	strbuf_init(&iter->val, 0);
 	opts.no_divider = 1;
-	trailer_info_get(&iter->info, msg, &opts);
-	iter->cur = 0;
+	trailer_info_get(&iter->internal.info, msg, &opts);
+	iter->internal.cur = 0;
 }
 
 int trailer_iterator_advance(struct trailer_iterator *iter)
 {
-	while (iter->cur < iter->info.trailer_nr) {
-		char *trailer = iter->info.trailers[iter->cur++];
+	while (iter->internal.cur < iter->internal.info.trailer_nr) {
+		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
 		int separator_pos = find_separator(trailer, separators);
 
 		if (separator_pos < 1)
@@ -1245,7 +1245,7 @@ int trailer_iterator_advance(struct trailer_iterator *iter)
 
 void trailer_iterator_release(struct trailer_iterator *iter)
 {
-	trailer_info_release(&iter->info);
+	trailer_info_release(&iter->internal.info);
 	strbuf_release(&iter->val);
 	strbuf_release(&iter->key);
 }
diff --git a/trailer.h b/trailer.h
index 795d2fccfd9..ab2cd017567 100644
--- a/trailer.h
+++ b/trailer.h
@@ -119,8 +119,10 @@ struct trailer_iterator {
 	struct strbuf val;
 
 	/* private */
-	struct trailer_info info;
-	size_t cur;
+	struct {
+		struct trailer_info info;
+		size_t cur;
+	} internal;
 };
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v2 2/6] trailer: split process_input_file into separate pieces
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-11 17:10     ` Junio C Hamano
  2023-09-09  6:16   ` [PATCH v2 3/6] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Currently, process_input_file does three things:

    (1) parse the input string for trailers,
    (2) print text before the trailers, and
    (3) calculate the position of the input where the trailers end.

Rename this function to parse_trailers(), and make it only do
(1). The caller of this function, process_trailers, becomes responsible
for (2) and (3). These items belong inside process_trailers because they
are both concerned with printing the surrounding text around
trailers (which is already one of the immediate concerns of
process_trailers).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/trailer.c b/trailer.c
index de4bdece847..2c56cbc4a2e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -961,28 +961,24 @@ static void unfold_value(struct strbuf *val)
 	strbuf_release(&out);
 }
 
-static size_t process_input_file(FILE *outfile,
-				 const char *str,
-				 struct list_head *head,
-				 const struct process_trailer_options *opts)
+/*
+ * Parse trailers in "str", populating the trailer info and "head"
+ * linked list structure.
+ */
+static void parse_trailers(struct trailer_info *info,
+			     const char *str,
+			     struct list_head *head,
+			     const struct process_trailer_options *opts)
 {
-	struct trailer_info info;
 	struct strbuf tok = STRBUF_INIT;
 	struct strbuf val = STRBUF_INIT;
 	size_t i;
 
-	trailer_info_get(&info, str, opts);
-
-	/* Print lines before the trailers as is */
-	if (!opts->only_trailers)
-		fwrite(str, 1, info.trailer_start - str, outfile);
+	trailer_info_get(info, str, opts);
 
-	if (!opts->only_trailers && !info.blank_line_before_trailer)
-		fprintf(outfile, "\n");
-
-	for (i = 0; i < info.trailer_nr; i++) {
+	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
-		char *trailer = info.trailers[i];
+		char *trailer = info->trailers[i];
 		if (trailer[0] == comment_line_char)
 			continue;
 		separator_pos = find_separator(trailer, separators);
@@ -1002,10 +998,6 @@ static size_t process_input_file(FILE *outfile,
 					 strbuf_detach(&val, NULL));
 		}
 	}
-
-	trailer_info_release(&info);
-
-	return info.trailer_end - str;
 }
 
 static void free_all(struct list_head *head)
@@ -1054,6 +1046,7 @@ void process_trailers(const char *file,
 {
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
+	struct trailer_info info;
 	size_t trailer_end;
 	FILE *outfile = stdout;
 
@@ -1064,8 +1057,16 @@ void process_trailers(const char *file,
 	if (opts->in_place)
 		outfile = create_in_place_tempfile(file);
 
+	parse_trailers(&info, sb.buf, &head, opts);
+	trailer_end = info.trailer_end - sb.buf;
+
 	/* Print the lines before the trailers */
-	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
+	if (!opts->only_trailers)
+		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+
+	if (!opts->only_trailers && !info.blank_line_before_trailer)
+		fprintf(outfile, "\n");
+
 
 	if (!opts->only_input) {
 		LIST_HEAD(arg_head);
@@ -1076,6 +1077,7 @@ void process_trailers(const char *file,
 	print_all(outfile, &head, opts);
 
 	free_all(&head);
+	trailer_info_release(&info);
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v2 3/6] trailer: split process_command_line_args into separate functions
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
  2023-09-09  6:16   ` [PATCH v2 2/6] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-09  6:16   ` [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, process_command_line_args did two things:

    (1) parse trailers from the configuration, and
    (2) parse trailers defined on the command line.

Separate (1) outside to a new function, parse_trailers_from_config.
Rename the remaining logic to parse_trailers_from_command_line_args.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 2c56cbc4a2e..b6de5d9cb2d 100644
--- a/trailer.c
+++ b/trailer.c
@@ -711,30 +711,35 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	list_add_tail(&new_item->list, arg_head);
 }
 
-static void process_command_line_args(struct list_head *arg_head,
-				      struct list_head *new_trailer_head)
+static void parse_trailers_from_config(struct list_head *config_head)
 {
 	struct arg_item *item;
-	struct strbuf tok = STRBUF_INIT;
-	struct strbuf val = STRBUF_INIT;
-	const struct conf_info *conf;
 	struct list_head *pos;
 
-	/*
-	 * In command-line arguments, '=' is accepted (in addition to the
-	 * separators that are defined).
-	 */
-	char *cl_separators = xstrfmt("=%s", separators);
-
 	/* Add an arg item for each configured trailer with a command */
 	list_for_each(pos, &conf_head) {
 		item = list_entry(pos, struct arg_item, list);
 		if (item->conf.command)
-			add_arg_item(arg_head,
+			add_arg_item(config_head,
 				     xstrdup(token_from_item(item, NULL)),
 				     xstrdup(""),
 				     &item->conf, NULL);
 	}
+}
+
+static void parse_trailers_from_command_line_args(struct list_head *arg_head,
+						  struct list_head *new_trailer_head)
+{
+	struct strbuf tok = STRBUF_INIT;
+	struct strbuf val = STRBUF_INIT;
+	const struct conf_info *conf;
+	struct list_head *pos;
+
+	/*
+	 * In command-line arguments, '=' is accepted (in addition to the
+	 * separators that are defined).
+	 */
+	char *cl_separators = xstrfmt("=%s", separators);
 
 	/* Add an arg item for each trailer on the command line */
 	list_for_each(pos, new_trailer_head) {
@@ -1069,8 +1074,11 @@ void process_trailers(const char *file,
 
 
 	if (!opts->only_input) {
+		LIST_HEAD(config_head);
 		LIST_HEAD(arg_head);
-		process_command_line_args(&arg_head, new_trailer_head);
+		parse_trailers_from_config(&config_head);
+		parse_trailers_from_command_line_args(&arg_head, new_trailer_head);
+		list_splice(&config_head, &arg_head);
 		process_trailers_lists(&head, &arg_head);
 	}
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-09-09  6:16   ` [PATCH v2 3/6] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-11 17:25     ` Junio C Hamano
  2023-09-09  6:16   ` [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Currently, find_patch_start only finds the start of the patch part of
the input (by looking at the "---" divider) for cases where the
"--no-divider" flag has not been provided. If the user provides this
flag, we do not rely on find_patch_start at all and just call strlen()
directly on the input.

Instead, make find_patch_start aware of "--no-divider" and make it
handle that case as well. This means we no longer need to call strlen at
all and can just rely on the existing code in find_patch_start. By
forcing callers to consider this important option, we avoid the kind of
mistake described in be3d654343 (commit: pass --no-divider to
interpret-trailers, 2023-06-17).

This patch will make unit testing a bit more pleasant in this area in
the future when we adopt a unit testing framework, because we would not
have to test multiple functions to check how finding the start of a
patch part works (we would only need to test find_patch_start).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index b6de5d9cb2d..f646e484a23 100644
--- a/trailer.c
+++ b/trailer.c
@@ -812,14 +812,14 @@ static ssize_t last_line(const char *buf, size_t len)
  * Return the position of the start of the patch or the length of str if there
  * is no patch in the message.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_patch_start(const char *str, int no_divider)
 {
 	const char *s;
 
 	for (s = str; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
 			return s - str;
 	}
 
@@ -1109,11 +1109,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
+	patch_start = find_patch_start(str, opts->no_divider);
 	trailer_end = find_trailer_end(str, patch_start);
 	trailer_start = find_trailer_start(str, trailer_end);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-09-09  6:16   ` [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-11 18:54     ` Junio C Hamano
  2023-09-09  6:16   ` [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
  6 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Do not use *_DEFAULT as a suffix to the enums, because the word
"default" is overloaded. The following are two examples of the ambiguity
of the word "default":

(1) "Default" can mean using the "default" values that are hardcoded
    in trailer.c as

        default_conf_info.where = WHERE_END;
        default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
        default_conf_info.if_missing = MISSING_ADD;

    in ensure_configured(). These values are referred to as "the
    default" in the docs for interpret-trailers. These defaults are used
    if no "trailer.*" configurations are defined.

(2) "Default" can also mean the "trailer.*" configurations themselves,
    because these configurations are used by "default" (ahead of the
    hardcoded defaults in (1)) if no command line arguments are
    provided. This concept of defaulting back to the configurations was
    introduced in 0ea5292e6b (interpret-trailers: add options for
    actions, 2017-08-01).

In addition, the corresponding *_DEFAULT values are chosen when the user
provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
on the command line. These "--no-*" flags are used to clear previously
provided flags of the form "--where", "--if-exists", and "--if-missing".
Using these "--no-*" flags undoes the specifying of these flags (if
any), so using the word "UNSPECIFIED" is more natural here.

So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
signals to the reader that the *_UNSPECIFIED value by itself carries no
meaning (it's a zero value and by itself does not "default" to anything,
necessitating the need to have some other way of getting to a useful
value).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 17 ++++++++++-------
 trailer.h |  6 +++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/trailer.c b/trailer.c
index f646e484a23..6ad2fbca942 100644
--- a/trailer.c
+++ b/trailer.c
@@ -388,7 +388,7 @@ static void process_trailers_lists(struct list_head *head,
 int trailer_set_where(enum trailer_where *item, const char *value)
 {
 	if (!value)
-		*item = WHERE_DEFAULT;
+		*item = WHERE_UNSPECIFIED;
 	else if (!strcasecmp("after", value))
 		*item = WHERE_AFTER;
 	else if (!strcasecmp("before", value))
@@ -405,7 +405,7 @@ int trailer_set_where(enum trailer_where *item, const char *value)
 int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 {
 	if (!value)
-		*item = EXISTS_DEFAULT;
+		*item = EXISTS_UNSPECIFIED;
 	else if (!strcasecmp("addIfDifferent", value))
 		*item = EXISTS_ADD_IF_DIFFERENT;
 	else if (!strcasecmp("addIfDifferentNeighbor", value))
@@ -424,7 +424,7 @@ int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 int trailer_set_if_missing(enum trailer_if_missing *item, const char *value)
 {
 	if (!value)
-		*item = MISSING_DEFAULT;
+		*item = MISSING_UNSPECIFIED;
 	else if (!strcasecmp("doNothing", value))
 		*item = MISSING_DO_NOTHING;
 	else if (!strcasecmp("add", value))
@@ -586,7 +586,10 @@ static void ensure_configured(void)
 	if (configured)
 		return;
 
-	/* Default config must be setup first */
+	/*
+	 * Default config must be setup first. These defaults are used if there
+	 * are no "trailer.*" or "trailer.<token>.*" options configured.
+	 */
 	default_conf_info.where = WHERE_END;
 	default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
 	default_conf_info.if_missing = MISSING_ADD;
@@ -701,11 +704,11 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	new_item->value = val;
 	duplicate_conf(&new_item->conf, conf);
 	if (new_trailer_item) {
-		if (new_trailer_item->where != WHERE_DEFAULT)
+		if (new_trailer_item->where != WHERE_UNSPECIFIED)
 			new_item->conf.where = new_trailer_item->where;
-		if (new_trailer_item->if_exists != EXISTS_DEFAULT)
+		if (new_trailer_item->if_exists != EXISTS_UNSPECIFIED)
 			new_item->conf.if_exists = new_trailer_item->if_exists;
-		if (new_trailer_item->if_missing != MISSING_DEFAULT)
+		if (new_trailer_item->if_missing != MISSING_UNSPECIFIED)
 			new_item->conf.if_missing = new_trailer_item->if_missing;
 	}
 	list_add_tail(&new_item->list, arg_head);
diff --git a/trailer.h b/trailer.h
index ab2cd017567..a689d768c79 100644
--- a/trailer.h
+++ b/trailer.h
@@ -5,14 +5,14 @@
 #include "strbuf.h"
 
 enum trailer_where {
-	WHERE_DEFAULT,
+	WHERE_UNSPECIFIED,
 	WHERE_END,
 	WHERE_AFTER,
 	WHERE_BEFORE,
 	WHERE_START
 };
 enum trailer_if_exists {
-	EXISTS_DEFAULT,
+	EXISTS_UNSPECIFIED,
 	EXISTS_ADD_IF_DIFFERENT_NEIGHBOR,
 	EXISTS_ADD_IF_DIFFERENT,
 	EXISTS_ADD,
@@ -20,7 +20,7 @@ enum trailer_if_exists {
 	EXISTS_DO_NOTHING
 };
 enum trailer_if_missing {
-	MISSING_DEFAULT,
+	MISSING_UNSPECIFIED,
 	MISSING_ADD,
 	MISSING_DO_NOTHING
 };
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
                     ` (4 preceding siblings ...)
  2023-09-09  6:16   ` [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
@ 2023-09-09  6:16   ` Linus Arver via GitGitGadget
  2023-09-11 19:01     ` Junio C Hamano
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
  6 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-09  6:16 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 17 ++++++++---------
 trailer.h |  7 +++----
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 6ad2fbca942..00326720e81 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1055,7 +1055,6 @@ void process_trailers(const char *file,
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
 	struct trailer_info info;
-	size_t trailer_end;
 	FILE *outfile = stdout;
 
 	ensure_configured();
@@ -1066,11 +1065,10 @@ void process_trailers(const char *file,
 		outfile = create_in_place_tempfile(file);
 
 	parse_trailers(&info, sb.buf, &head, opts);
-	trailer_end = info.trailer_end - sb.buf;
 
 	/* Print the lines before the trailers */
 	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+		fwrite(sb.buf, 1, info.trailer_start, outfile);
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1092,7 +1090,7 @@ void process_trailers(const char *file,
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
+		fwrite(sb.buf + info.trailer_end, 1, sb.len - info.trailer_end, outfile);
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1104,7 +1102,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int patch_start, trailer_end, trailer_start;
+	size_t patch_start, trailer_end = 0, trailer_start = 0;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1139,8 +1137,8 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_start);
-	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + trailer_end;
+	info->trailer_start = trailer_start;
+	info->trailer_end = trailer_end;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
@@ -1155,6 +1153,7 @@ void trailer_info_release(struct trailer_info *info)
 
 static void format_trailer_info(struct strbuf *out,
 				const struct trailer_info *info,
+				const char *msg,
 				const struct process_trailer_options *opts)
 {
 	size_t origlen = out->len;
@@ -1164,7 +1163,7 @@ static void format_trailer_info(struct strbuf *out,
 	if (!opts->only_trailers && !opts->unfold && !opts->filter &&
 	    !opts->separator && !opts->key_only && !opts->value_only &&
 	    !opts->key_value_separator) {
-		strbuf_add(out, info->trailer_start,
+		strbuf_add(out, msg + info->trailer_start,
 			   info->trailer_end - info->trailer_start);
 		return;
 	}
@@ -1219,7 +1218,7 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	struct trailer_info info;
 
 	trailer_info_get(&info, msg, opts);
-	format_trailer_info(out, &info, opts);
+	format_trailer_info(out, &info, msg, opts);
 	trailer_info_release(&info);
 }
 
diff --git a/trailer.h b/trailer.h
index a689d768c79..13fbf0dcd12 100644
--- a/trailer.h
+++ b/trailer.h
@@ -37,11 +37,10 @@ struct trailer_info {
 	int blank_line_before_trailer;
 
 	/*
-	 * Pointers to the start and end of the trailer block found. If there
-	 * is no trailer block found, these 2 pointers point to the end of the
-	 * input string.
+	 * Offsets to the trailer block start and end positions in the input
+	 * string. If no trailer block is found, these are set to 0.
 	 */
-	const char *trailer_start, *trailer_end;
+	size_t trailer_start, trailer_end;
 
 	/*
 	 * Array of trailers found.
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator
  2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
@ 2023-09-11 17:10     ` Junio C Hamano
  0 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2023-09-11 17:10 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> The fields here are not meant to be used by downstream callers, so put
> them behind an anonymous struct named as "internal" to warn against
> their use. This follows the pattern in 576de3d956 (unpack_trees: start
> splitting internal fields from public API, 2023-02-27).

OK.  The patch shows that there exist no code external to this file
that touch these members that are marked "private", so it has some
auditing value from that point of view, which is nice.

But that is only about today's code and does not protect us from
future breakage.  In other words, "git grep internal\\." would not
be an effective way to find misuses of these members from the
sidelines.  But that is OK, as "git grep -E '([.]|->)info'" would
not be an effective way in today's code, either, and the patch is
not making things worse.

Queued.  Thanks.

> Signed-off-by: Linus Arver <linusa@google.com>
> ---
>  trailer.c | 10 +++++-----
>  trailer.h |  6 ++++--
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/trailer.c b/trailer.c
> index f408f9b058d..de4bdece847 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -1220,14 +1220,14 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>  	strbuf_init(&iter->key, 0);
>  	strbuf_init(&iter->val, 0);
>  	opts.no_divider = 1;
> -	trailer_info_get(&iter->info, msg, &opts);
> -	iter->cur = 0;
> +	trailer_info_get(&iter->internal.info, msg, &opts);
> +	iter->internal.cur = 0;
>  }
>  
>  int trailer_iterator_advance(struct trailer_iterator *iter)
>  {
> -	while (iter->cur < iter->info.trailer_nr) {
> -		char *trailer = iter->info.trailers[iter->cur++];
> +	while (iter->internal.cur < iter->internal.info.trailer_nr) {
> +		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
>  		int separator_pos = find_separator(trailer, separators);
>  
>  		if (separator_pos < 1)
> @@ -1245,7 +1245,7 @@ int trailer_iterator_advance(struct trailer_iterator *iter)
>  
>  void trailer_iterator_release(struct trailer_iterator *iter)
>  {
> -	trailer_info_release(&iter->info);
> +	trailer_info_release(&iter->internal.info);
>  	strbuf_release(&iter->val);
>  	strbuf_release(&iter->key);
>  }
> diff --git a/trailer.h b/trailer.h
> index 795d2fccfd9..ab2cd017567 100644
> --- a/trailer.h
> +++ b/trailer.h
> @@ -119,8 +119,10 @@ struct trailer_iterator {
>  	struct strbuf val;
>  
>  	/* private */
> -	struct trailer_info info;
> -	size_t cur;
> +	struct {
> +		struct trailer_info info;
> +		size_t cur;
> +	} internal;
>  };
>  
>  /*

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 2/6] trailer: split process_input_file into separate pieces
  2023-09-09  6:16   ` [PATCH v2 2/6] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
@ 2023-09-11 17:10     ` Junio C Hamano
  0 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2023-09-11 17:10 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Currently, process_input_file does three things:
>
>     (1) parse the input string for trailers,
>     (2) print text before the trailers, and
>     (3) calculate the position of the input where the trailers end.
>
> Rename this function to parse_trailers(), and make it only do
> (1). The caller of this function, process_trailers, becomes responsible
> for (2) and (3). These items belong inside process_trailers because they
> are both concerned with printing the surrounding text around
> trailers (which is already one of the immediate concerns of
> process_trailers).

Nicely explained and the resulting code reads well.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider
  2023-09-09  6:16   ` [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
@ 2023-09-11 17:25     ` Junio C Hamano
  2023-09-14  2:19       ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-11 17:25 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Currently, find_patch_start only finds the start of the patch part of
> the input (by looking at the "---" divider) for cases where the
> "--no-divider" flag has not been provided. If the user provides this
> flag, we do not rely on find_patch_start at all and just call strlen()
> directly on the input.
>
> Instead, make find_patch_start aware of "--no-divider" and make it
> handle that case as well. This means we no longer need to call strlen at
> all and can just rely on the existing code in find_patch_start. By
> forcing callers to consider this important option, we avoid the kind of
> mistake described in be3d654343 (commit: pass --no-divider to
> interpret-trailers, 2023-06-17).

OK.  The code pays attention to "---" so making it stop doing so
when the "--no-*" option is given will make the function responsible
for finding the beginning of the patch.

I wonder if we should rename this function while we are at it,
though.  When "--no-divider" is given, the expected use case is
*not* to have a patch at all, and it is dubious that a function
whose name is find_patch_start() can possibly do anything useful.

The real purpose of this function is to find the end of the log
message, isn't it?  And the caller uses the end of the log message
it found and gives it to find_trailer_start() and find_trailer_end()
as the upper boundary of the search for the trailer block.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-09  6:16   ` [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
@ 2023-09-11 18:54     ` Junio C Hamano
  2023-09-14  2:41       ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-11 18:54 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Do not use *_DEFAULT as a suffix to the enums, because the word
> "default" is overloaded. The following are two examples of the ambiguity
> of the word "default":

In this case these are left unspecified to use the default; while it
is not wrong per-se to say *_DEFAULT, using *_UNSPECIFIED makes it
more obvious.

> So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
> signals to the reader that the *_UNSPECIFIED value by itself carries no
> meaning (it's a zero value and by itself does not "default" to anything,
> necessitating the need to have some other way of getting to a useful
> value).

It gets tempting to initialize a variable to the default and arrange
the rest of the system so that the variable set to the default
triggers the default activity.  Such an obvious solution however
cannot be used when (1) being left unspecified to use the default
value and (2) explicitly set by the user to a value that happens to
be the same as the default have to behave differently.  I am not
sure if that applies to the trailers system, though.

Thanks.


PS.  Glen's old e-mail address is no longer valid and there is no
forwarding done by @google.com mailservers, it seems.  Can you tell
GGG to drop the address (optionally replace it with his new address)?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end
  2023-09-09  6:16   ` [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
@ 2023-09-11 19:01     ` Junio C Hamano
  2023-09-14  1:21       ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-11 19:01 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Previously these fields in the trailer_info struct were of type "const
> char *" and pointed to positions in the input string directly (to the
> start and end positions of the trailer block).
>
> Use offsets to make the intended usage less ambiguous. We only need to
> reference the input string in format_trailer_info(), so update that
> function to take a pointer to the input.

Hmm, I am not sure if this is an improvement.  If the underlying
buffer can be reallocated (to grow), the approach to use the offsets
certainly is easier to deal with, as they will stay valid even after
such a reallocation.  But you lose the obvious sentinel value NULL
that can mean something special, and have to make the readers aware
of the local convention you happened to have picked with a comment
like ...

> Signed-off-by: Linus Arver <linusa@google.com>
> ---
>  trailer.c | 17 ++++++++---------
>  trailer.h |  7 +++----
>  2 files changed, 11 insertions(+), 13 deletions(-)
> ...
>  	/*
> -	 * Pointers to the start and end of the trailer block found. If there
> -	 * is no trailer block found, these 2 pointers point to the end of the
> -	 * input string.
> +	 * Offsets to the trailer block start and end positions in the input
> +	 * string. If no trailer block is found, these are set to 0.
>  	 */

... this, simply because there is no obvious sentinel value for an
unsigned integral type; even if you count MAX_ULONG and its friends,
they are not as obvious as NULL for pointer types.

So, I dunno.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end
  2023-09-11 19:01     ` Junio C Hamano
@ 2023-09-14  1:21       ` Linus Arver
  2023-09-14  3:18         ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-09-14  1:21 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Linus Arver <linusa@google.com>
>>
>> Previously these fields in the trailer_info struct were of type "const
>> char *" and pointed to positions in the input string directly (to the
>> start and end positions of the trailer block).
>>
>> Use offsets to make the intended usage less ambiguous. We only need to
>> reference the input string in format_trailer_info(), so update that
>> function to take a pointer to the input.
>
> Hmm, I am not sure if this is an improvement.  If the underlying
> buffer can be reallocated (to grow), the approach to use the offsets
> certainly is easier to deal with, as they will stay valid even after
> such a reallocation.  But you lose the obvious sentinel value NULL
> that can mean something special

True.

> and have to make the readers aware
> of the local convention you happened to have picked with a comment
> like ...
>
>> Signed-off-by: Linus Arver <linusa@google.com>
>> ---
>>  trailer.c | 17 ++++++++---------
>>  trailer.h |  7 +++----
>>  2 files changed, 11 insertions(+), 13 deletions(-)
>> ...
>>  	/*
>> -	 * Pointers to the start and end of the trailer block found. If there
>> -	 * is no trailer block found, these 2 pointers point to the end of the
>> -	 * input string.
>> +	 * Offsets to the trailer block start and end positions in the input
>> +	 * string. If no trailer block is found, these are set to 0.
>>  	 */
>
> ... this, simply because there is no obvious sentinel value for an
> unsigned integral type; even if you count MAX_ULONG and its friends,
> they are not as obvious as NULL for pointer types.

I agree that there is no trustworthy sentinel value for an unsigned
integral type.

On the other hand, we never used NULL as a sentinel value before even
when they were const char pointers --- the current comment for these
fields which say ...

    If there is no trailer block found, these 2 pointers point to the end of the
    input string.

... sounds somewhat arbitrary to me (and I don't think we care about
this property in trailer.c, and AFAICS it's also not something that
consumers should be aware of). Consumers of the trailer_info struct
could also just see if "info->trailer_nr > 0" to check whether any
trailers were found, although if we're merging Patch 1 [1] of this
series the consumers will not have easy access any more to any of the
trailer_info fields, and they should instead be using a public-facing
function that does the "were trailers found" check.

> So, I dunno.

If the "we don't use NULL sentinel values currently anyway" argument is
convincing enough, I'm happy to add this to the commit message on a
reroll. But I'm also OK with dropping this patch. Thoughts?

[1] https://lore.kernel.org/git/pull.1563.git.1691211879.gitgitgadget@gmail.com/T/#m8f1b5f1eb346331658c8c7b3e057a4ee31223664

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider
  2023-09-11 17:25     ` Junio C Hamano
@ 2023-09-14  2:19       ` Linus Arver
  2023-09-14  3:12         ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-09-14  2:19 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Linus Arver <linusa@google.com>
>>
>> Currently, find_patch_start only finds the start of the patch part of
>> the input (by looking at the "---" divider) for cases where the
>> "--no-divider" flag has not been provided. If the user provides this
>> flag, we do not rely on find_patch_start at all and just call strlen()
>> directly on the input.
>>
>> Instead, make find_patch_start aware of "--no-divider" and make it
>> handle that case as well. This means we no longer need to call strlen at
>> all and can just rely on the existing code in find_patch_start. By
>> forcing callers to consider this important option, we avoid the kind of
>> mistake described in be3d654343 (commit: pass --no-divider to
>> interpret-trailers, 2023-06-17).
>
> OK.  The code pays attention to "---" so making it stop doing so
> when the "--no-*" option is given will make the function responsible
> for finding the beginning of the patch.
>
> I wonder if we should rename this function while we are at it,
> though.  When "--no-divider" is given, the expected use case is
> *not* to have a patch at all, and it is dubious that a function
> whose name is find_patch_start() can possibly do anything useful.

IOW, saying as the caller of this function, "find the patch start in
this input I'm giving you, but also FYI the input has no patch in it"
sounds wrong. Agreed.

> The real purpose of this function is to find the end of the log
> message, isn't it?

Indeed.

> And the caller uses the end of the log message
> it found and gives it to find_trailer_start() and find_trailer_end()
> as the upper boundary of the search for the trailer block.

Right! So a better name might be something like
"find_trailer_search_boundary" with a comment like

    Find the point at which we should stop searching for trailers (to
    parse them). This is either the end of the input string (obviously),
    or the point when we see "---" indicating the start of the "patch
    part".

Will update.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-11 18:54     ` Junio C Hamano
@ 2023-09-14  2:41       ` Linus Arver
  2023-09-14  3:16         ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-09-14  2:41 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Linus Arver <linusa@google.com>
>>
>> [...]
>> So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
>> signals to the reader that the *_UNSPECIFIED value by itself carries no
>> meaning (it's a zero value and by itself does not "default" to anything,
>> necessitating the need to have some other way of getting to a useful
>> value).
>
> It gets tempting to initialize a variable to the default and arrange
> the rest of the system so that the variable set to the default
> triggers the default activity.  Such an obvious solution however
> cannot be used when (1) being left unspecified to use the default
> value and (2) explicitly set by the user to a value that happens to
> be the same as the default have to behave differently.  I am not
> sure if that applies to the trailers system, though.
>
> Thanks.

I get the feeling that you wrote the "Such an obvious ... differently"
sentence after writing the last sentence in that paragraph, because when
you say

    I am not
    sure if that applies to the trailers system, though.

I read the "that" (emphasis added) in there as referring to the solution
described in the first sentence, and not the conditions (1) and (2) you
enumerated. IOW, you are OK with this patch.

Am I parsing your paragraph correctly?

> PS.  Glen's old e-mail address is no longer valid and there is no
> forwarding done by @google.com mailservers, it seems.  Can you tell
> GGG to drop the address (optionally replace it with his new address)?

Thanks for catching this. I've updated the GGG PR [1] to use Glen's new
address.

[1] https://github.com/gitgitgadget/git/pull/1563#issue-1837440424

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider
  2023-09-14  2:19       ` Linus Arver
@ 2023-09-14  3:12         ` Junio C Hamano
  2023-09-14  5:31           ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-14  3:12 UTC (permalink / raw)
  To: Linus Arver
  Cc: Linus Arver via GitGitGadget, git, Glen Choo, Christian Couder,
	Phillip Wood

Linus Arver <linusa@google.com> writes:

>> The real purpose of this function is to find the end of the log
>> message, isn't it?
>
> Indeed.
> ...
> Right! So a better name might be something like
> "find_trailer_search_boundary" with a comment like

Or "find_end_of_log_message()", which we agreed to be the real
purpose of this function ;-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-14  2:41       ` Linus Arver
@ 2023-09-14  3:16         ` Junio C Hamano
  2023-09-22 18:23           ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-14  3:16 UTC (permalink / raw)
  To: Linus Arver
  Cc: Linus Arver via GitGitGadget, git, Christian Couder, Phillip Wood

Linus Arver <linusa@google.com> writes:

>> It gets tempting to initialize a variable to the default and arrange
>> the rest of the system so that the variable set to the default
>> triggers the default activity.  Such an obvious solution however
>> cannot be used when (1) being left unspecified to use the default
>> value and (2) explicitly set by the user to a value that happens to
>> be the same as the default have to behave differently.  I am not
>> sure if that applies to the trailers system, though.
>>
>> Thanks.
>
> I get the feeling that you wrote the "Such an obvious ... differently"
> sentence after writing the last sentence in that paragraph, because when
> you say
>
>     I am not
>     sure if that applies to the trailers system, though.
>
> I read the "that" (emphasis added) in there as referring to the solution
> described in the first sentence, and not the conditions (1) and (2) you
> enumerated. IOW, you are OK with this patch.

"that" refers to "the reason not to use such an obvious solution".
I do not know if trailer subsystem wants to treat "left unspecified"
and "set to the value that happens to be the same as the default" in
a different way.  If it does want to do so, then I do not see a
strong reason not to use the "obvious solution".

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end
  2023-09-14  1:21       ` Linus Arver
@ 2023-09-14  3:18         ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-09-14  3:18 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood

Linus Arver <linusa@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Linus Arver <linusa@google.com>
>>>
>>> Previously these fields in the trailer_info struct were of type "const
>>> char *" and pointed to positions in the input string directly (to the
>>> start and end positions of the trailer block).
>>>
>>> Use offsets to make the intended usage less ambiguous. We only need to
>>> reference the input string in format_trailer_info(), so update that
>>> function to take a pointer to the input.
>>
>> Hmm, I am not sure if this is an improvement.  If the underlying
>> buffer can be reallocated (to grow), the approach to use the offsets
>> certainly is easier to deal with, as they will stay valid even after
>> such a reallocation.  But you lose the obvious sentinel value NULL
>> that can mean something special
>
> True.
>
>> and have to make the readers aware
>> of the local convention you happened to have picked with a comment
>> like ...
>>
>>> Signed-off-by: Linus Arver <linusa@google.com>
>>> ---
>>>  trailer.c | 17 ++++++++---------
>>>  trailer.h |  7 +++----
>>>  2 files changed, 11 insertions(+), 13 deletions(-)
>>> ...
>>>  	/*
>>> -	 * Pointers to the start and end of the trailer block found. If there
>>> -	 * is no trailer block found, these 2 pointers point to the end of the
>>> -	 * input string.
>>> +	 * Offsets to the trailer block start and end positions in the input
>>> +	 * string. If no trailer block is found, these are set to 0.
>>>  	 */

I've just realized that the new comment "If no trailer block is found,
these are set to 0" is perhaps not always true in the current version
of this patch. This is because this patch did a mechanical type
conversion of the old fields from "const char *" to "size_t", and we
still run

    trailer_end = find_trailer_end(str, trailer_search_boundary);
    trailer_start = find_trailer_start(str, trailer_end);

inside trailer_info_get() (not visible in the patch context lines). I
will update this patch to make the comment "If no trailer block is found,
these are set to 0" true.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider
  2023-09-14  3:12         ` Junio C Hamano
@ 2023-09-14  5:31           ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-09-14  5:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Arver via GitGitGadget, git, Glen Choo, Christian Couder,
	Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>>> The real purpose of this function is to find the end of the log
>>> message, isn't it?
>>
>> Indeed.
>> ...
>> Right! So a better name might be something like
>> "find_trailer_search_boundary" with a comment like
>
> Or "find_end_of_log_message()", which we agreed to be the real
> purpose of this function ;-)

I did this locally, but in doing so realized that we have in trailer.c

    /* Return the position of the end of the trailers. */
    static size_t find_trailer_end(const char *buf, size_t len)
    {
        return len - ignore_non_trailer(buf, len);
    }

and the ignore_non_trailer() comes from commit.c, which says

    /*
    * Inspect the given string and determine the true "end" of the log message, in
    * order to find where to put a new Signed-off-by trailer.  Ignored are
    * trailing comment lines and blank lines.  To support "git commit -s
    * --amend" on an existing commit, we also ignore "Conflicts:".  To
    * support "git commit -v", we truncate at cut lines.
    *
    * Returns the number of bytes from the tail to ignore, to be fed as
    * the second parameter to append_signoff().
    */
    size_t ignore_non_trailer(const char *buf, size_t len)
    {

...and I am not so sure the new "find_end_of_log_message" name for
find_patch_start() is desirable because of the overlap in meaning with
the comment for ignore_non_trailer(). To recap, we have in
trailer_info_get() in trailer.c which (without this patch) has

    if (opts->no_divider)
        patch_start = strlen(str);
    else
        patch_start = find_patch_start(str);

    trailer_end = find_trailer_end(str, patch_start);
    trailer_start = find_trailer_start(str, trailer_end);
                
to find the trailer_end and trailer_start positions in the input. These
positions are the boundaries for parsing for actual trailers (if any).
More precisely, the "patch_start" variable helps us _skip_ the "patch
part" of the input (if any, denoted by "---"). The call to
find_trailer_end() helps us (again) _skip_ any parts that are not part
of the actual log message (per the comment in ignore_non_trailer()). So
as far as trailer_info_get() is concerned, we are just trying to skip
over areas where we shouldn't search/parse for trailers.

The above analysis leads me to some new ideas:

(1) For "find_end_of_log_message()", I think this name should really
    belong to ignore_non_trailer() in commit.c instead of
    find_patch_start() in trailer.c. The existing comment for
    ignore_non_trailer() already says that its job is to determine the
    true end of the log message (and does a lot of the necessary work to
    do this job).

(2) find_patch_start() should be named something like
    "shrink_trailer_search_space" (although this meaning belongs equally
    well to find_trailer_end()), because the point is to reduce the
    search space for parsing trailers.

(3) "find_trailer_end" is not a great function name because on first
    glance it implies that it found the end of trailers, but this is
    only true for the "happy path" of actually finding trailers.

I need to consider all of the above ideas and reroll this patch. Because
the ideas here closely relate to the "trailer_end" and "trailer_start"
variables, I will probably reorder the series so that this patch and the
last patch are closer together.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-14  3:16         ` Junio C Hamano
@ 2023-09-22 18:23           ` Linus Arver
  2023-09-22 19:48             ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-09-22 18:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Arver via GitGitGadget, git, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>>> It gets tempting to initialize a variable to the default and arrange
>>> the rest of the system so that the variable set to the default
>>> triggers the default activity.  Such an obvious solution however
>>> cannot be used when (1) being left unspecified to use the default
>>> value and (2) explicitly set by the user to a value that happens to
>>> be the same as the default have to behave differently.  I am not
>>> sure if that applies to the trailers system, though.
>>>
>>> Thanks.
>>
>> I get the feeling that you wrote the "Such an obvious ... differently"
>> sentence after writing the last sentence in that paragraph, because when
>> you say
>>
>>     I am not
>>     sure if that applies to the trailers system, though.
>>
>> I read the "that" (emphasis added) in there as referring to the solution
>> described in the first sentence, and not the conditions (1) and (2) you
>> enumerated. IOW, you are OK with this patch.
>
> "that" refers to "the reason not to use such an obvious solution".
> I do not know if trailer subsystem wants to treat "left unspecified"
> and "set to the value that happens to be the same as the default" in
> a different way.  If it does want to do so, then I do not see a
> strong reason not to use the "obvious solution".

Currently we set the defaults (these take effect absent any
configuration or CLI options) in trailer.c like this:

    static void ensure_configured(void)
    {
            if (configured)
                    return;

            /* Default config must be setup first */
            default_conf_info.where = WHERE_END;
            default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
            default_conf_info.if_missing = MISSING_ADD;
            git_config(git_trailer_default_config, NULL);
            git_config(git_trailer_config, NULL);
            configured = 1;
    }

So technically we already sort of do the "obvious solution". And then
these get overriden by configuration (if any), and finally any CLI
options that are passed in (e.g., "--where after"). The reason why I
prefer the *_UNSPECIFIED style in this patch for these enums though is
because of this (and other similar functions) in trailer.c:

    int trailer_set_where(enum trailer_where *item, const char *value)
    {
            if (!value)
                    *item = WHERE_DEFAULT;
            else if (!strcasecmp("after", value))
                    *item = WHERE_AFTER;
            else if (!strcasecmp("before", value))
                    *item = WHERE_BEFORE;
            else if (!strcasecmp("end", value))
                    *item = WHERE_END;
            else if (!strcasecmp("start", value))
                    *item = WHERE_START;
            else
                    return -1;
            return 0;
    }

and this function is used as a callback to the "--where" flag, such that
the WHERE_DEFAULT gets chosen if "--no-where" is where. I prefer the
WHERE_UNSPECIFIED as in this patch because the WHERE_DEFAULT is
ambiguous on its own (i.e., WHERE_DEFAULT could mean that we either use
the default value WHERE_END in default_conf_info, or it could mean that
we fall back to the configuration variables (where it could be something
else)).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-22 18:23           ` Linus Arver
@ 2023-09-22 19:48             ` Junio C Hamano
  2023-09-26  5:30               ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-22 19:48 UTC (permalink / raw)
  To: Linus Arver
  Cc: Linus Arver via GitGitGadget, git, Christian Couder, Phillip Wood

Linus Arver <linusa@google.com> writes:

> ... I prefer the
> WHERE_UNSPECIFIED as in this patch because the WHERE_DEFAULT is
> ambiguous on its own (i.e., WHERE_DEFAULT could mean that we either use
> the default value WHERE_END in default_conf_info, or it could mean that
> we fall back to the configuration variables (where it could be something
> else)).

Yup.  "Turning something that is left UNSPECIFIED after command line
options and configuration files are processed into the hardcoded
DEFAULT" is one mental model that is easy to explain.

I however am not sure if it is easier than "Setting something to
hardcoded DEFAULT before command line options and configuration
files are allowed to tweak it, and if nobody touches it, then it
gets the hardcoded DEFAULT value in the end", which is another valid
mental model, though.  If both can be used, I'd personally prefer
the latter, and reserve the "UNSPECIFIED to DEFAULT" pattern to
signal that we are dealing with a case where the simpler pattern
without UNSPECIFIED cannot solve.

The simpler pattern would not work, when the default is defined
depending on a larger context.  Imagine we have two Boolean
variables, A and B, where A defaults to false, and B defaults to
some value derived from the value of A (say, opposite of A).

In the most natural implementation, you'd initialize A to false and
B to unspecified, let command line options and configuration
variables to set them to true or false, and after all that, you do
not have to tweak A's value (it will be left to false that is the
default unless the user or the configuration gave an explicit
value), but you need to check if B is left unspecified and tweak it
to true or false using the final value of A.

For a variable with such a need like B, we cannot avoid having
"unspecified".  If you initialize it to false (or true), after the
command line and the configuration files are read and you find B is
set to false (or true), you cannot tell if the user or the
configuration explicitly set B to false (or true), in which case you
do not want to futz with its value based on what is in A, or it is
false (or true) only because nobody touched it, in which case you
need to compute its value based on what is in A.

And that is why I asked if we need to special case "the user did not
touch and the variable is left untouched" in the trailer subsystem.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 0/9] Trailer readability cleanups
  2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
                     ` (5 preceding siblings ...)
  2023-09-09  6:16   ` [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
@ 2023-09-22 19:50   ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
                       ` (10 more replies)
  6 siblings, 11 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver

These patches were created while digging into the trailer code to better
understand how it works, in preparation for making the trailer.{c,h} files
as small as possible to make them available as a library for external users.
This series was originally created as part of [1], but are sent here
separately because the changes here are arguably more subjective in nature.
I think Patch 1 is the most important in this series. The others can wait,
if folks are opposed to adding them on their own merits at this point in
time.

These patches do not add or change any features. Instead, their goal is to
make the code easier to understand for new contributors (like myself), by
making various cleanups and improvements. Ultimately, my hope is that with
such cleanups, we are better positioned to make larger changes (especially
the broader libification effort, as in "Introduce Git Standard Library"
[2]).

Patch 1 was inspired by 576de3d956 (unpack_trees: start splitting internal
fields from public API, 2023-02-27) [3], and is in preparation for a
libification effort in the future around the trailer code. Independent of
libification, it still makes sense to discourage callers from peeking into
these trailer-internal fields.

Patches 2-3 aim to make some functions do a little less multitasking.

Patch 4 is a renaming change to reduce overloaded language in the codebase.
It is inspired by 229d6ab6bf (doc: trailer: examples: avoid the word
"message" by itself, 2023-06-15) [4], which did a similar thing for the
interpret-trailers documentation.

Patches 5-8 clean up the area around handling the trailer block start and
end of the input. In particular we rename find_patch_start() to
find_end_of_log_message(). These patches address the new approach I cited in
[5].


Updates in v3
=============

 * Patches 4 and 6 (--no-divider and trailer block start/end cleanups) have
   been reorganized to Patches 5-8. This ended up touching commit.c in a
   minor way, but otherwise all of the changes here are cleanups and do not
   change any behavior.


Updates in v2
=============

 * Patch 1: Drop the use of a #define. Instead just use an anonymous struct
   named internal.
 * Patch 2: Don't free info out parameter inside parse_trailers(). Instead
   free it from the caller, process_trailers(). Update comment in
   parse_trailers().
 * Patch 3: Reword commit message.
 * Patch 4: Mention be3d654343 (commit: pass --no-divider to
   interpret-trailers, 2023-06-17) in commit message.
 * Added Patch 6 to make trailer_info use offsets for trailer_start and
   trailer_end (thanks to Glen Choo for the suggestion).

[1]
https://lore.kernel.org/git/pull.1564.git.1691210737.gitgitgadget@gmail.com/T/#mb044012670663d8eb7a548924bbcc933bef116de
[2]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/
[3]
https://lore.kernel.org/git/pull.1149.git.1677143700.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/6b4cb31b17077181a311ca87e82464a1e2ad67dd.1686797630.git.gitgitgadget@gmail.com/
[5]
https://lore.kernel.org/git/pull.1563.git.1691211879.gitgitgadget@gmail.com/T/#m0131f9829c35d8e0103ffa88f07d8e0e43dd732c

Linus Arver (9):
  trailer: separate public from internal portion of trailer_iterator
  trailer: split process_input_file into separate pieces
  trailer: split process_command_line_args into separate functions
  trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  commit: ignore_non_trailer computes number of bytes to ignore
  trailer: find the end of the log message
  trailer: use offsets for trailer_start/trailer_end
  trailer: only use trailer_block_* variables if trailers were found
  trailer: make stack variable names match field names

 builtin/commit.c |   2 +-
 builtin/merge.c  |   2 +-
 commit.c         |   2 +-
 commit.h         |   4 +-
 sequencer.c      |   2 +-
 trailer.c        | 220 ++++++++++++++++++++++++++++-------------------
 trailer.h        |  27 +++---
 7 files changed, 154 insertions(+), 105 deletions(-)


base-commit: 1b0a5129563ebe720330fdc8f5c6843d27641137
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1563%2Flistx%2Ftrailer-libification-prep-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1563/listx/trailer-libification-prep-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1563

Range-diff vs v2:

  1:  4f116d2550f =  1:  4f116d2550f trailer: separate public from internal portion of trailer_iterator
  2:  c00f4623d0b =  2:  c00f4623d0b trailer: split process_input_file into separate pieces
  3:  f78c2345fad =  3:  f78c2345fad trailer: split process_command_line_args into separate functions
  5:  52958c3557c =  4:  47186a09b24 trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  -:  ----------- >  5:  da52cec42e1 commit: ignore_non_trailer computes number of bytes to ignore
  4:  f5f507c4c6c !  6:  ab8a6ced143 trailer: teach find_patch_start about --no-divider
     @@ Metadata
      Author: Linus Arver <linusa@google.com>
      
       ## Commit message ##
     -    trailer: teach find_patch_start about --no-divider
     +    trailer: find the end of the log message
      
     -    Currently, find_patch_start only finds the start of the patch part of
     -    the input (by looking at the "---" divider) for cases where the
     -    "--no-divider" flag has not been provided. If the user provides this
     -    flag, we do not rely on find_patch_start at all and just call strlen()
     -    directly on the input.
     +    Previously, trailer_info_get() computed the trailer block end position
     +    by
      
     -    Instead, make find_patch_start aware of "--no-divider" and make it
     -    handle that case as well. This means we no longer need to call strlen at
     -    all and can just rely on the existing code in find_patch_start. By
     -    forcing callers to consider this important option, we avoid the kind of
     -    mistake described in be3d654343 (commit: pass --no-divider to
     -    interpret-trailers, 2023-06-17).
     +    (1) checking for the opts->no_divider flag and optionally calling
     +        find_patch_start() to find the "patch start" location (patch_start), and
     +    (2) calling find_trailer_end() to find the end of the trailer block
     +        using patch_start as a guide, saving the return value into
     +        "trailer_end".
      
     -    This patch will make unit testing a bit more pleasant in this area in
     -    the future when we adopt a unit testing framework, because we would not
     -    have to test multiple functions to check how finding the start of a
     -    patch part works (we would only need to test find_patch_start).
     +    The logic in (1) was awkward because the variable "patch_start" is
     +    misleading if there is no patch in the input. The logic in (2) was
     +    misleading because it could be the case that no trailers are in the
     +    input (yet we are setting a "trailer_end" variable before even searching
     +    for trailers, which happens later in find_trailer_start()). The name
     +    "find_trailer_end" was misleading because that function did not look for
     +    any trailer block itself --- instead it just computed the end position
     +    of the log message in the input where the end of the trailer block (if
     +    it exists) would be (because trailer blocks must always come after the
     +    end of the log message).
      
     +    Combine the logic in (1) and (2) together into find_patch_start() by
     +    renaming it to find_end_of_log_message(). The end of the log message is
     +    the starting point which find_trailer_start() needs to start searching
     +    backward to parse individual trailers (if any).
     +
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Linus Arver <linusa@google.com>
      
       ## trailer.c ##
      @@ trailer.c: static ssize_t last_line(const char *buf, size_t len)
     -  * Return the position of the start of the patch or the length of str if there
     -  * is no patch in the message.
     + }
     + 
     + /*
     +- * Return the position of the start of the patch or the length of str if there
     +- * is no patch in the message.
     ++ * Find the end of the log message as an offset from the start of the input
     ++ * (where callers of this function are interested in looking for a trailers
     ++ * block in the same input). We have to consider two categories of content that
     ++ * can come at the end of the input which we want to ignore (because they don't
     ++ * belong in the log message):
     ++ *
     ++ * (1) the "patch part" which begins with a "---" divider and has patch
     ++ * information (like the output of git-format-patch), and
     ++ *
     ++ * (2) any trailing comment lines, blank lines like in the output of "git
     ++ * commit -v", or stuff below the "cut" (scissor) line.
     ++ *
     ++ * As a formula, the situation looks like this:
     ++ *
     ++ *     INPUT = LOG MESSAGE + IGNORED
     ++ *
     ++ * where IGNORED can be either of the two categories described above. It may be
     ++ * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE
     ++ * contains a trailer block, but that's not the concern of this function.
        */
      -static size_t find_patch_start(const char *str)
     -+static size_t find_patch_start(const char *str, int no_divider)
     ++static size_t find_end_of_log_message(const char *input, int no_divider)
       {
     ++	size_t end;
     ++
       	const char *s;
       
     - 	for (s = str; *s; s = next_line(s)) {
     +-	for (s = str; *s; s = next_line(s)) {
     ++	/* Assume the naive end of the input is already what we want. */
     ++	end = strlen(input);
     ++
     ++	/* Optionally skip over any patch part ("---" line and below). */
     ++	for (s = input; *s; s = next_line(s)) {
       		const char *v;
       
      -		if (skip_prefix(s, "---", &v) && isspace(*v))
     -+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
     - 			return s - str;
     +-			return s - str;
     ++		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
     ++			end = s - input;
     ++			break;
     ++		}
       	}
       
     +-	return s - str;
     ++	/* Skip over other ignorable bits. */
     ++	return end - ignored_log_message_bytes(input, end);
     + }
     + 
     + /*
     +@@ trailer.c: continue_outer_loop:
     + 	return len;
     + }
     + 
     +-/* Return the position of the end of the trailers. */
     +-static size_t find_trailer_end(const char *buf, size_t len)
     +-{
     +-	return len - ignored_log_message_bytes(buf, len);
     +-}
     +-
     + static int ends_with_blank_line(const char *buf, size_t len)
     + {
     + 	ssize_t ll = last_line(buf, len);
     +@@ trailer.c: void process_trailers(const char *file,
     + void trailer_info_get(struct trailer_info *info, const char *str,
     + 		      const struct process_trailer_options *opts)
     + {
     +-	int patch_start, trailer_end, trailer_start;
     ++	int end_of_log_message, trailer_start;
     + 	struct strbuf **trailer_lines, **ptr;
     + 	char **trailer_strings = NULL;
     + 	size_t nr = 0, alloc = 0;
      @@ trailer.c: void trailer_info_get(struct trailer_info *info, const char *str,
       
       	ensure_configured();
     @@ trailer.c: void trailer_info_get(struct trailer_info *info, const char *str,
      -	else
      -		patch_start = find_patch_start(str);
      -
     -+	patch_start = find_patch_start(str, opts->no_divider);
     - 	trailer_end = find_trailer_end(str, patch_start);
     - 	trailer_start = find_trailer_start(str, trailer_end);
     +-	trailer_end = find_trailer_end(str, patch_start);
     +-	trailer_start = find_trailer_start(str, trailer_end);
     ++	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
     ++	trailer_start = find_trailer_start(str, end_of_log_message);
       
     + 	trailer_lines = strbuf_split_buf(str + trailer_start,
     +-					 trailer_end - trailer_start,
     ++					 end_of_log_message - trailer_start,
     + 					 '\n',
     + 					 0);
     + 	for (ptr = trailer_lines; *ptr; ptr++) {
     +@@ trailer.c: void trailer_info_get(struct trailer_info *info, const char *str,
     + 	info->blank_line_before_trailer = ends_with_blank_line(str,
     + 							       trailer_start);
     + 	info->trailer_start = str + trailer_start;
     +-	info->trailer_end = str + trailer_end;
     ++	info->trailer_end = str + end_of_log_message;
     + 	info->trailers = trailer_strings;
     + 	info->trailer_nr = nr;
     + }
  6:  0463066ebe0 !  7:  091805eb7d9 trailer: use offsets for trailer_start/trailer_end
     @@ Commit message
          reference the input string in format_trailer_info(), so update that
          function to take a pointer to the input.
      
     +    While we're at it, rename trailer_start to trailer_block_start to be
     +    more explicit about these offsets (that they are for the entire trailer
     +    block including other trailers). Ditto for trailer_end.
     +
          Signed-off-by: Linus Arver <linusa@google.com>
      
     + ## sequencer.c ##
     +@@ sequencer.c: static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
     + 	if (ignore_footer)
     + 		sb->buf[sb->len - ignore_footer] = saved_char;
     + 
     +-	if (info.trailer_start == info.trailer_end)
     ++	if (info.trailer_block_start == info.trailer_block_end)
     + 		return 0;
     + 
     + 	for (i = 0; i < info.trailer_nr; i++)
     +
       ## trailer.c ##
     +@@ trailer.c: static size_t find_end_of_log_message(const char *input, int no_divider)
     +  * Return the position of the first trailer line or len if there are no
     +  * trailers.
     +  */
     +-static size_t find_trailer_start(const char *buf, size_t len)
     ++static size_t find_trailer_block_start(const char *buf, size_t len)
     + {
     + 	const char *s;
     + 	ssize_t end_of_title, l;
      @@ trailer.c: void process_trailers(const char *file,
       	LIST_HEAD(head);
       	struct strbuf sb = STRBUF_INIT;
     @@ trailer.c: void process_trailers(const char *file,
       	/* Print the lines before the trailers */
       	if (!opts->only_trailers)
      -		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
     -+		fwrite(sb.buf, 1, info.trailer_start, outfile);
     ++		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
       
       	if (!opts->only_trailers && !info.blank_line_before_trailer)
       		fprintf(outfile, "\n");
     @@ trailer.c: void process_trailers(const char *file,
       	/* Print the lines after the trailers as is */
       	if (!opts->only_trailers)
      -		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
     -+		fwrite(sb.buf + info.trailer_end, 1, sb.len - info.trailer_end, outfile);
     ++		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
       
       	if (opts->in_place)
       		if (rename_tempfile(&trailers_tempfile, file))
     @@ trailer.c: void process_trailers(const char *file,
       void trailer_info_get(struct trailer_info *info, const char *str,
       		      const struct process_trailer_options *opts)
       {
     --	int patch_start, trailer_end, trailer_start;
     -+	size_t patch_start, trailer_end = 0, trailer_start = 0;
     +-	int end_of_log_message, trailer_start;
     ++	size_t end_of_log_message = 0, trailer_block_start = 0;
       	struct strbuf **trailer_lines, **ptr;
       	char **trailer_strings = NULL;
       	size_t nr = 0, alloc = 0;
      @@ trailer.c: void trailer_info_get(struct trailer_info *info, const char *str,
     + 	ensure_configured();
     + 
     + 	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
     +-	trailer_start = find_trailer_start(str, end_of_log_message);
     ++	trailer_block_start = find_trailer_block_start(str, end_of_log_message);
     + 
     +-	trailer_lines = strbuf_split_buf(str + trailer_start,
     +-					 end_of_log_message - trailer_start,
     ++	trailer_lines = strbuf_split_buf(str + trailer_block_start,
     ++					 end_of_log_message - trailer_block_start,
     + 					 '\n',
     + 					 0);
     + 	for (ptr = trailer_lines; *ptr; ptr++) {
     +@@ trailer.c: void trailer_info_get(struct trailer_info *info, const char *str,
     + 	strbuf_list_free(trailer_lines);
       
       	info->blank_line_before_trailer = ends_with_blank_line(str,
     - 							       trailer_start);
     +-							       trailer_start);
      -	info->trailer_start = str + trailer_start;
     --	info->trailer_end = str + trailer_end;
     -+	info->trailer_start = trailer_start;
     -+	info->trailer_end = trailer_end;
     +-	info->trailer_end = str + end_of_log_message;
     ++							       trailer_block_start);
     ++	info->trailer_block_start = trailer_block_start;
     ++	info->trailer_block_end = end_of_log_message;
       	info->trailers = trailer_strings;
       	info->trailer_nr = nr;
       }
     @@ trailer.c: static void format_trailer_info(struct strbuf *out,
       	    !opts->separator && !opts->key_only && !opts->value_only &&
       	    !opts->key_value_separator) {
      -		strbuf_add(out, info->trailer_start,
     -+		strbuf_add(out, msg + info->trailer_start,
     - 			   info->trailer_end - info->trailer_start);
     +-			   info->trailer_end - info->trailer_start);
     ++		strbuf_add(out, msg + info->trailer_block_start,
     ++			   info->trailer_block_end - info->trailer_block_start);
       		return;
       	}
     + 
      @@ trailer.c: void format_trailers_from_commit(struct strbuf *out, const char *msg,
       	struct trailer_info info;
       
     @@ trailer.c: void format_trailers_from_commit(struct strbuf *out, const char *msg,
       
      
       ## trailer.h ##
     -@@ trailer.h: struct trailer_info {
     +@@ trailer.h: int trailer_set_if_missing(enum trailer_if_missing *item, const char *value);
     + struct trailer_info {
     + 	/*
     + 	 * True if there is a blank line before the location pointed to by
     +-	 * trailer_start.
     ++	 * trailer_block_start.
     + 	 */
       	int blank_line_before_trailer;
       
       	/*
     @@ trailer.h: struct trailer_info {
      -	 * is no trailer block found, these 2 pointers point to the end of the
      -	 * input string.
      +	 * Offsets to the trailer block start and end positions in the input
     -+	 * string. If no trailer block is found, these are set to 0.
     ++	 * string. If no trailer block is found, these are both set to the
     ++	 * "true" end of the input, per find_true_end_of_input().
     ++	 *
     ++	 * NOTE: This will be changed so that these point to 0 in the next
     ++	 * patch if no trailers are found.
       	 */
      -	const char *trailer_start, *trailer_end;
     -+	size_t trailer_start, trailer_end;
     ++	size_t trailer_block_start, trailer_block_end;
       
       	/*
       	 * Array of trailers found.
  -:  ----------- >  8:  1762f78a613 trailer: only use trailer_block_* variables if trailers were found
  -:  ----------- >  9:  a784c45ed71 trailer: make stack variable names match field names

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 2/9] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

The fields here are not meant to be used by downstream callers, so put
them behind an anonymous struct named as "internal" to warn against
their use. This follows the pattern in 576de3d956 (unpack_trees: start
splitting internal fields from public API, 2023-02-27).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 10 +++++-----
 trailer.h |  6 ++++--
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/trailer.c b/trailer.c
index f408f9b058d..de4bdece847 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1220,14 +1220,14 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
 	strbuf_init(&iter->key, 0);
 	strbuf_init(&iter->val, 0);
 	opts.no_divider = 1;
-	trailer_info_get(&iter->info, msg, &opts);
-	iter->cur = 0;
+	trailer_info_get(&iter->internal.info, msg, &opts);
+	iter->internal.cur = 0;
 }
 
 int trailer_iterator_advance(struct trailer_iterator *iter)
 {
-	while (iter->cur < iter->info.trailer_nr) {
-		char *trailer = iter->info.trailers[iter->cur++];
+	while (iter->internal.cur < iter->internal.info.trailer_nr) {
+		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
 		int separator_pos = find_separator(trailer, separators);
 
 		if (separator_pos < 1)
@@ -1245,7 +1245,7 @@ int trailer_iterator_advance(struct trailer_iterator *iter)
 
 void trailer_iterator_release(struct trailer_iterator *iter)
 {
-	trailer_info_release(&iter->info);
+	trailer_info_release(&iter->internal.info);
 	strbuf_release(&iter->val);
 	strbuf_release(&iter->key);
 }
diff --git a/trailer.h b/trailer.h
index 795d2fccfd9..ab2cd017567 100644
--- a/trailer.h
+++ b/trailer.h
@@ -119,8 +119,10 @@ struct trailer_iterator {
 	struct strbuf val;
 
 	/* private */
-	struct trailer_info info;
-	size_t cur;
+	struct {
+		struct trailer_info info;
+		size_t cur;
+	} internal;
 };
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 2/9] trailer: split process_input_file into separate pieces
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 3/9] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Currently, process_input_file does three things:

    (1) parse the input string for trailers,
    (2) print text before the trailers, and
    (3) calculate the position of the input where the trailers end.

Rename this function to parse_trailers(), and make it only do
(1). The caller of this function, process_trailers, becomes responsible
for (2) and (3). These items belong inside process_trailers because they
are both concerned with printing the surrounding text around
trailers (which is already one of the immediate concerns of
process_trailers).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/trailer.c b/trailer.c
index de4bdece847..2c56cbc4a2e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -961,28 +961,24 @@ static void unfold_value(struct strbuf *val)
 	strbuf_release(&out);
 }
 
-static size_t process_input_file(FILE *outfile,
-				 const char *str,
-				 struct list_head *head,
-				 const struct process_trailer_options *opts)
+/*
+ * Parse trailers in "str", populating the trailer info and "head"
+ * linked list structure.
+ */
+static void parse_trailers(struct trailer_info *info,
+			     const char *str,
+			     struct list_head *head,
+			     const struct process_trailer_options *opts)
 {
-	struct trailer_info info;
 	struct strbuf tok = STRBUF_INIT;
 	struct strbuf val = STRBUF_INIT;
 	size_t i;
 
-	trailer_info_get(&info, str, opts);
-
-	/* Print lines before the trailers as is */
-	if (!opts->only_trailers)
-		fwrite(str, 1, info.trailer_start - str, outfile);
+	trailer_info_get(info, str, opts);
 
-	if (!opts->only_trailers && !info.blank_line_before_trailer)
-		fprintf(outfile, "\n");
-
-	for (i = 0; i < info.trailer_nr; i++) {
+	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
-		char *trailer = info.trailers[i];
+		char *trailer = info->trailers[i];
 		if (trailer[0] == comment_line_char)
 			continue;
 		separator_pos = find_separator(trailer, separators);
@@ -1002,10 +998,6 @@ static size_t process_input_file(FILE *outfile,
 					 strbuf_detach(&val, NULL));
 		}
 	}
-
-	trailer_info_release(&info);
-
-	return info.trailer_end - str;
 }
 
 static void free_all(struct list_head *head)
@@ -1054,6 +1046,7 @@ void process_trailers(const char *file,
 {
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
+	struct trailer_info info;
 	size_t trailer_end;
 	FILE *outfile = stdout;
 
@@ -1064,8 +1057,16 @@ void process_trailers(const char *file,
 	if (opts->in_place)
 		outfile = create_in_place_tempfile(file);
 
+	parse_trailers(&info, sb.buf, &head, opts);
+	trailer_end = info.trailer_end - sb.buf;
+
 	/* Print the lines before the trailers */
-	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
+	if (!opts->only_trailers)
+		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+
+	if (!opts->only_trailers && !info.blank_line_before_trailer)
+		fprintf(outfile, "\n");
+
 
 	if (!opts->only_input) {
 		LIST_HEAD(arg_head);
@@ -1076,6 +1077,7 @@ void process_trailers(const char *file,
 	print_all(outfile, &head, opts);
 
 	free_all(&head);
+	trailer_info_release(&info);
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 3/9] trailer: split process_command_line_args into separate functions
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 2/9] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 4/9] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, process_command_line_args did two things:

    (1) parse trailers from the configuration, and
    (2) parse trailers defined on the command line.

Separate (1) outside to a new function, parse_trailers_from_config.
Rename the remaining logic to parse_trailers_from_command_line_args.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 2c56cbc4a2e..b6de5d9cb2d 100644
--- a/trailer.c
+++ b/trailer.c
@@ -711,30 +711,35 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	list_add_tail(&new_item->list, arg_head);
 }
 
-static void process_command_line_args(struct list_head *arg_head,
-				      struct list_head *new_trailer_head)
+static void parse_trailers_from_config(struct list_head *config_head)
 {
 	struct arg_item *item;
-	struct strbuf tok = STRBUF_INIT;
-	struct strbuf val = STRBUF_INIT;
-	const struct conf_info *conf;
 	struct list_head *pos;
 
-	/*
-	 * In command-line arguments, '=' is accepted (in addition to the
-	 * separators that are defined).
-	 */
-	char *cl_separators = xstrfmt("=%s", separators);
-
 	/* Add an arg item for each configured trailer with a command */
 	list_for_each(pos, &conf_head) {
 		item = list_entry(pos, struct arg_item, list);
 		if (item->conf.command)
-			add_arg_item(arg_head,
+			add_arg_item(config_head,
 				     xstrdup(token_from_item(item, NULL)),
 				     xstrdup(""),
 				     &item->conf, NULL);
 	}
+}
+
+static void parse_trailers_from_command_line_args(struct list_head *arg_head,
+						  struct list_head *new_trailer_head)
+{
+	struct strbuf tok = STRBUF_INIT;
+	struct strbuf val = STRBUF_INIT;
+	const struct conf_info *conf;
+	struct list_head *pos;
+
+	/*
+	 * In command-line arguments, '=' is accepted (in addition to the
+	 * separators that are defined).
+	 */
+	char *cl_separators = xstrfmt("=%s", separators);
 
 	/* Add an arg item for each trailer on the command line */
 	list_for_each(pos, new_trailer_head) {
@@ -1069,8 +1074,11 @@ void process_trailers(const char *file,
 
 
 	if (!opts->only_input) {
+		LIST_HEAD(config_head);
 		LIST_HEAD(arg_head);
-		process_command_line_args(&arg_head, new_trailer_head);
+		parse_trailers_from_config(&config_head);
+		parse_trailers_from_command_line_args(&arg_head, new_trailer_head);
+		list_splice(&config_head, &arg_head);
 		process_trailers_lists(&head, &arg_head);
 	}
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 4/9] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (2 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 3/9] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 5/9] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Do not use *_DEFAULT as a suffix to the enums, because the word
"default" is overloaded. The following are two examples of the ambiguity
of the word "default":

(1) "Default" can mean using the "default" values that are hardcoded
    in trailer.c as

        default_conf_info.where = WHERE_END;
        default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
        default_conf_info.if_missing = MISSING_ADD;

    in ensure_configured(). These values are referred to as "the
    default" in the docs for interpret-trailers. These defaults are used
    if no "trailer.*" configurations are defined.

(2) "Default" can also mean the "trailer.*" configurations themselves,
    because these configurations are used by "default" (ahead of the
    hardcoded defaults in (1)) if no command line arguments are
    provided. This concept of defaulting back to the configurations was
    introduced in 0ea5292e6b (interpret-trailers: add options for
    actions, 2017-08-01).

In addition, the corresponding *_DEFAULT values are chosen when the user
provides the "--no-where", "--no-if-exists", or "--no-if-missing" flags
on the command line. These "--no-*" flags are used to clear previously
provided flags of the form "--where", "--if-exists", and "--if-missing".
Using these "--no-*" flags undoes the specifying of these flags (if
any), so using the word "UNSPECIFIED" is more natural here.

So instead of using "*_DEFAULT", use "*_UNSPECIFIED" because this
signals to the reader that the *_UNSPECIFIED value by itself carries no
meaning (it's a zero value and by itself does not "default" to anything,
necessitating the need to have some other way of getting to a useful
value).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 17 ++++++++++-------
 trailer.h |  6 +++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/trailer.c b/trailer.c
index b6de5d9cb2d..0b66effceb5 100644
--- a/trailer.c
+++ b/trailer.c
@@ -388,7 +388,7 @@ static void process_trailers_lists(struct list_head *head,
 int trailer_set_where(enum trailer_where *item, const char *value)
 {
 	if (!value)
-		*item = WHERE_DEFAULT;
+		*item = WHERE_UNSPECIFIED;
 	else if (!strcasecmp("after", value))
 		*item = WHERE_AFTER;
 	else if (!strcasecmp("before", value))
@@ -405,7 +405,7 @@ int trailer_set_where(enum trailer_where *item, const char *value)
 int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 {
 	if (!value)
-		*item = EXISTS_DEFAULT;
+		*item = EXISTS_UNSPECIFIED;
 	else if (!strcasecmp("addIfDifferent", value))
 		*item = EXISTS_ADD_IF_DIFFERENT;
 	else if (!strcasecmp("addIfDifferentNeighbor", value))
@@ -424,7 +424,7 @@ int trailer_set_if_exists(enum trailer_if_exists *item, const char *value)
 int trailer_set_if_missing(enum trailer_if_missing *item, const char *value)
 {
 	if (!value)
-		*item = MISSING_DEFAULT;
+		*item = MISSING_UNSPECIFIED;
 	else if (!strcasecmp("doNothing", value))
 		*item = MISSING_DO_NOTHING;
 	else if (!strcasecmp("add", value))
@@ -586,7 +586,10 @@ static void ensure_configured(void)
 	if (configured)
 		return;
 
-	/* Default config must be setup first */
+	/*
+	 * Default config must be setup first. These defaults are used if there
+	 * are no "trailer.*" or "trailer.<token>.*" options configured.
+	 */
 	default_conf_info.where = WHERE_END;
 	default_conf_info.if_exists = EXISTS_ADD_IF_DIFFERENT_NEIGHBOR;
 	default_conf_info.if_missing = MISSING_ADD;
@@ -701,11 +704,11 @@ static void add_arg_item(struct list_head *arg_head, char *tok, char *val,
 	new_item->value = val;
 	duplicate_conf(&new_item->conf, conf);
 	if (new_trailer_item) {
-		if (new_trailer_item->where != WHERE_DEFAULT)
+		if (new_trailer_item->where != WHERE_UNSPECIFIED)
 			new_item->conf.where = new_trailer_item->where;
-		if (new_trailer_item->if_exists != EXISTS_DEFAULT)
+		if (new_trailer_item->if_exists != EXISTS_UNSPECIFIED)
 			new_item->conf.if_exists = new_trailer_item->if_exists;
-		if (new_trailer_item->if_missing != MISSING_DEFAULT)
+		if (new_trailer_item->if_missing != MISSING_UNSPECIFIED)
 			new_item->conf.if_missing = new_trailer_item->if_missing;
 	}
 	list_add_tail(&new_item->list, arg_head);
diff --git a/trailer.h b/trailer.h
index ab2cd017567..a689d768c79 100644
--- a/trailer.h
+++ b/trailer.h
@@ -5,14 +5,14 @@
 #include "strbuf.h"
 
 enum trailer_where {
-	WHERE_DEFAULT,
+	WHERE_UNSPECIFIED,
 	WHERE_END,
 	WHERE_AFTER,
 	WHERE_BEFORE,
 	WHERE_START
 };
 enum trailer_if_exists {
-	EXISTS_DEFAULT,
+	EXISTS_UNSPECIFIED,
 	EXISTS_ADD_IF_DIFFERENT_NEIGHBOR,
 	EXISTS_ADD_IF_DIFFERENT,
 	EXISTS_ADD,
@@ -20,7 +20,7 @@ enum trailer_if_exists {
 	EXISTS_DO_NOTHING
 };
 enum trailer_if_missing {
-	MISSING_DEFAULT,
+	MISSING_UNSPECIFIED,
 	MISSING_ADD,
 	MISSING_DO_NOTHING
 };
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 5/9] commit: ignore_non_trailer computes number of bytes to ignore
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (3 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 4/9] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 6/9] trailer: find the end of the log message Linus Arver via GitGitGadget
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

ignore_non_trailer() returns the _number of bytes_ that should be
ignored from the end of the log message. It does not by itself "ignore"
anything.

Rename this function to remove the leading "ignore" verb, to sound more
like a quantity than an action.

Signed-off-by: Linus Arver <linusa@google.com>
---
 builtin/commit.c | 2 +-
 builtin/merge.c  | 2 +-
 commit.c         | 2 +-
 commit.h         | 4 ++--
 trailer.c        | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484..d1785d32db1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -900,7 +900,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		strbuf_stripspace(&sb, '\0');
 
 	if (signoff)
-		append_signoff(&sb, ignore_non_trailer(sb.buf, sb.len), 0);
+		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
 
 	if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
 		die_errno(_("could not write commit template"));
diff --git a/builtin/merge.c b/builtin/merge.c
index de68910177f..6cbbebca13d 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -891,7 +891,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 				_(no_scissors_editor_comment), comment_line_char);
 	}
 	if (signoff)
-		append_signoff(&msg, ignore_non_trailer(msg.buf, msg.len), 0);
+		append_signoff(&msg, ignored_log_message_bytes(msg.buf, msg.len), 0);
 	write_merge_heads(remoteheads);
 	write_file_buf(git_path_merge_msg(the_repository), msg.buf, msg.len);
 	if (run_commit_hook(0 < option_edit, get_index_file(), NULL,
diff --git a/commit.c b/commit.c
index b3223478bc2..4440fbabb83 100644
--- a/commit.c
+++ b/commit.c
@@ -1769,7 +1769,7 @@ const char *find_commit_header(const char *msg, const char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-size_t ignore_non_trailer(const char *buf, size_t len)
+size_t ignored_log_message_bytes(const char *buf, size_t len)
 {
 	size_t boc = 0;
 	size_t bol = 0;
diff --git a/commit.h b/commit.h
index 28928833c54..1cc872f225f 100644
--- a/commit.h
+++ b/commit.h
@@ -294,8 +294,8 @@ const char *find_header_mem(const char *msg, size_t len,
 const char *find_commit_header(const char *msg, const char *key,
 			       size_t *out_len);
 
-/* Find the end of the log message, the right place for a new trailer. */
-size_t ignore_non_trailer(const char *buf, size_t len);
+/* Find the number of bytes to ignore from the end of a log message. */
+size_t ignored_log_message_bytes(const char *buf, size_t len);
 
 typedef int (*each_mergetag_fn)(struct commit *commit, struct commit_extra_header *extra,
 				void *cb_data);
diff --git a/trailer.c b/trailer.c
index 0b66effceb5..185b3e2707f 100644
--- a/trailer.c
+++ b/trailer.c
@@ -931,7 +931,7 @@ continue_outer_loop:
 /* Return the position of the end of the trailers. */
 static size_t find_trailer_end(const char *buf, size_t len)
 {
-	return len - ignore_non_trailer(buf, len);
+	return len - ignored_log_message_bytes(buf, len);
 }
 
 static int ends_with_blank_line(const char *buf, size_t len)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 6/9] trailer: find the end of the log message
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (4 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 5/9] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 7/9] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, trailer_info_get() computed the trailer block end position
by

(1) checking for the opts->no_divider flag and optionally calling
    find_patch_start() to find the "patch start" location (patch_start), and
(2) calling find_trailer_end() to find the end of the trailer block
    using patch_start as a guide, saving the return value into
    "trailer_end".

The logic in (1) was awkward because the variable "patch_start" is
misleading if there is no patch in the input. The logic in (2) was
misleading because it could be the case that no trailers are in the
input (yet we are setting a "trailer_end" variable before even searching
for trailers, which happens later in find_trailer_start()). The name
"find_trailer_end" was misleading because that function did not look for
any trailer block itself --- instead it just computed the end position
of the log message in the input where the end of the trailer block (if
it exists) would be (because trailer blocks must always come after the
end of the log message).

Combine the logic in (1) and (2) together into find_patch_start() by
renaming it to find_end_of_log_message(). The end of the log message is
the starting point which find_trailer_start() needs to start searching
backward to parse individual trailers (if any).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 61 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/trailer.c b/trailer.c
index 185b3e2707f..9da89df9d8a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -812,21 +812,47 @@ static ssize_t last_line(const char *buf, size_t len)
 }
 
 /*
- * Return the position of the start of the patch or the length of str if there
- * is no patch in the message.
+ * Find the end of the log message as an offset from the start of the input
+ * (where callers of this function are interested in looking for a trailers
+ * block in the same input). We have to consider two categories of content that
+ * can come at the end of the input which we want to ignore (because they don't
+ * belong in the log message):
+ *
+ * (1) the "patch part" which begins with a "---" divider and has patch
+ * information (like the output of git-format-patch), and
+ *
+ * (2) any trailing comment lines, blank lines like in the output of "git
+ * commit -v", or stuff below the "cut" (scissor) line.
+ *
+ * As a formula, the situation looks like this:
+ *
+ *     INPUT = LOG MESSAGE + IGNORED
+ *
+ * where IGNORED can be either of the two categories described above. It may be
+ * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE
+ * contains a trailer block, but that's not the concern of this function.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_end_of_log_message(const char *input, int no_divider)
 {
+	size_t end;
+
 	const char *s;
 
-	for (s = str; *s; s = next_line(s)) {
+	/* Assume the naive end of the input is already what we want. */
+	end = strlen(input);
+
+	/* Optionally skip over any patch part ("---" line and below). */
+	for (s = input; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
-			return s - str;
+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
+			end = s - input;
+			break;
+		}
 	}
 
-	return s - str;
+	/* Skip over other ignorable bits. */
+	return end - ignored_log_message_bytes(input, end);
 }
 
 /*
@@ -928,12 +954,6 @@ continue_outer_loop:
 	return len;
 }
 
-/* Return the position of the end of the trailers. */
-static size_t find_trailer_end(const char *buf, size_t len)
-{
-	return len - ignored_log_message_bytes(buf, len);
-}
-
 static int ends_with_blank_line(const char *buf, size_t len)
 {
 	ssize_t ll = last_line(buf, len);
@@ -1104,7 +1124,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int patch_start, trailer_end, trailer_start;
+	int end_of_log_message, trailer_start;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1112,16 +1132,11 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
-	trailer_end = find_trailer_end(str, patch_start);
-	trailer_start = find_trailer_start(str, trailer_end);
+	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
+	trailer_start = find_trailer_start(str, end_of_log_message);
 
 	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 trailer_end - trailer_start,
+					 end_of_log_message - trailer_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1144,7 +1159,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_start);
 	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + trailer_end;
+	info->trailer_end = str + end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 7/9] trailer: use offsets for trailer_start/trailer_end
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (5 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 6/9] trailer: find the end of the log message Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 8/9] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

While we're at it, rename trailer_start to trailer_block_start to be
more explicit about these offsets (that they are for the entire trailer
block including other trailers). Ditto for trailer_end.

Signed-off-by: Linus Arver <linusa@google.com>
---
 sequencer.c |  2 +-
 trailer.c   | 29 ++++++++++++++---------------
 trailer.h   | 13 ++++++++-----
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index adc9cfb4df3..77362f5cd5d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -331,7 +331,7 @@ static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
 	if (ignore_footer)
 		sb->buf[sb->len - ignore_footer] = saved_char;
 
-	if (info.trailer_start == info.trailer_end)
+	if (info.trailer_block_start == info.trailer_block_end)
 		return 0;
 
 	for (i = 0; i < info.trailer_nr; i++)
diff --git a/trailer.c b/trailer.c
index 9da89df9d8a..471c2722536 100644
--- a/trailer.c
+++ b/trailer.c
@@ -859,7 +859,7 @@ static size_t find_end_of_log_message(const char *input, int no_divider)
  * Return the position of the first trailer line or len if there are no
  * trailers.
  */
-static size_t find_trailer_start(const char *buf, size_t len)
+static size_t find_trailer_block_start(const char *buf, size_t len)
 {
 	const char *s;
 	ssize_t end_of_title, l;
@@ -1075,7 +1075,6 @@ void process_trailers(const char *file,
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
 	struct trailer_info info;
-	size_t trailer_end;
 	FILE *outfile = stdout;
 
 	ensure_configured();
@@ -1086,11 +1085,10 @@ void process_trailers(const char *file,
 		outfile = create_in_place_tempfile(file);
 
 	parse_trailers(&info, sb.buf, &head, opts);
-	trailer_end = info.trailer_end - sb.buf;
 
 	/* Print the lines before the trailers */
 	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1112,7 +1110,7 @@ void process_trailers(const char *file,
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
+		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1124,7 +1122,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int end_of_log_message, trailer_start;
+	size_t end_of_log_message = 0, trailer_block_start = 0;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1133,10 +1131,10 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	ensure_configured();
 
 	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
-	trailer_start = find_trailer_start(str, end_of_log_message);
+	trailer_block_start = find_trailer_block_start(str, end_of_log_message);
 
-	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 end_of_log_message - trailer_start,
+	trailer_lines = strbuf_split_buf(str + trailer_block_start,
+					 end_of_log_message - trailer_block_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1157,9 +1155,9 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	strbuf_list_free(trailer_lines);
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
-							       trailer_start);
-	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + end_of_log_message;
+							       trailer_block_start);
+	info->trailer_block_start = trailer_block_start;
+	info->trailer_block_end = end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
@@ -1174,6 +1172,7 @@ void trailer_info_release(struct trailer_info *info)
 
 static void format_trailer_info(struct strbuf *out,
 				const struct trailer_info *info,
+				const char *msg,
 				const struct process_trailer_options *opts)
 {
 	size_t origlen = out->len;
@@ -1183,8 +1182,8 @@ static void format_trailer_info(struct strbuf *out,
 	if (!opts->only_trailers && !opts->unfold && !opts->filter &&
 	    !opts->separator && !opts->key_only && !opts->value_only &&
 	    !opts->key_value_separator) {
-		strbuf_add(out, info->trailer_start,
-			   info->trailer_end - info->trailer_start);
+		strbuf_add(out, msg + info->trailer_block_start,
+			   info->trailer_block_end - info->trailer_block_start);
 		return;
 	}
 
@@ -1238,7 +1237,7 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	struct trailer_info info;
 
 	trailer_info_get(&info, msg, opts);
-	format_trailer_info(out, &info, opts);
+	format_trailer_info(out, &info, msg, opts);
 	trailer_info_release(&info);
 }
 
diff --git a/trailer.h b/trailer.h
index a689d768c79..4dcb9080327 100644
--- a/trailer.h
+++ b/trailer.h
@@ -32,16 +32,19 @@ int trailer_set_if_missing(enum trailer_if_missing *item, const char *value);
 struct trailer_info {
 	/*
 	 * True if there is a blank line before the location pointed to by
-	 * trailer_start.
+	 * trailer_block_start.
 	 */
 	int blank_line_before_trailer;
 
 	/*
-	 * Pointers to the start and end of the trailer block found. If there
-	 * is no trailer block found, these 2 pointers point to the end of the
-	 * input string.
+	 * Offsets to the trailer block start and end positions in the input
+	 * string. If no trailer block is found, these are both set to the
+	 * "true" end of the input, per find_true_end_of_input().
+	 *
+	 * NOTE: This will be changed so that these point to 0 in the next
+	 * patch if no trailers are found.
 	 */
-	const char *trailer_start, *trailer_end;
+	size_t trailer_block_start, trailer_block_end;
 
 	/*
 	 * Array of trailers found.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 8/9] trailer: only use trailer_block_* variables if trailers were found
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (6 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 7/9] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 19:50     ` [PATCH v3 9/9] trailer: make stack variable names match field names Linus Arver via GitGitGadget
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, these variables were overloaded to act as the end of the log
message even if no trailers were found.

Remove the overloaded meaning by adding a new end_of_log_message field
to the trailer_info struct. The trailer_info struct consumers now only
refer to the trailer_block_start and trailer_block_end fields if
trailers were found (trailer_nr > 0), and otherwise refer to the
end_of_log_message.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 31 +++++++++++++++++++++++--------
 trailer.h | 12 +++++++-----
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 471c2722536..9a3837be770 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1086,9 +1086,14 @@ void process_trailers(const char *file,
 
 	parse_trailers(&info, sb.buf, &head, opts);
 
-	/* Print the lines before the trailers */
-	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
+	/* Print the lines before the trailers (if any) as is. */
+	if (!opts->only_trailers) {
+		if (info.trailer_nr) {
+			fwrite(sb.buf, 1, info.trailer_block_start, outfile);
+		} else {
+			fwrite(sb.buf, 1, info.end_of_log_message, outfile);
+		}
+	}
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1108,9 +1113,14 @@ void process_trailers(const char *file,
 	free_all(&head);
 	trailer_info_release(&info);
 
-	/* Print the lines after the trailers as is */
-	if (!opts->only_trailers)
-		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
+	/* Print the lines after the trailers (if any) as is. */
+	if (!opts->only_trailers) {
+		if (info.trailer_nr) {
+			fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
+		} else {
+			fwrite(sb.buf + info.end_of_log_message, 1, sb.len - info.end_of_log_message, outfile);
+		}
+	}
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1156,8 +1166,13 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_block_start);
-	info->trailer_block_start = trailer_block_start;
-	info->trailer_block_end = end_of_log_message;
+	info->trailer_block_start = 0;
+	info->trailer_block_end = 0;
+	if (nr) {
+		info->trailer_block_start = trailer_block_start;
+		info->trailer_block_end = end_of_log_message;
+	}
+	info->end_of_log_message = end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
diff --git a/trailer.h b/trailer.h
index 4dcb9080327..5e2843d320a 100644
--- a/trailer.h
+++ b/trailer.h
@@ -38,14 +38,16 @@ struct trailer_info {
 
 	/*
 	 * Offsets to the trailer block start and end positions in the input
-	 * string. If no trailer block is found, these are both set to the
-	 * "true" end of the input, per find_true_end_of_input().
-	 *
-	 * NOTE: This will be changed so that these point to 0 in the next
-	 * patch if no trailers are found.
+	 * string. If no trailer block is found, these are set to 0.
 	 */
 	size_t trailer_block_start, trailer_block_end;
 
+	/*
+	 * Offset to the end of the log message in the input (may not be the
+	 * same as the end of the input).
+	 */
+	size_t end_of_log_message;
+
 	/*
 	 * Array of trailers found.
 	 */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 9/9] trailer: make stack variable names match field names
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (7 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 8/9] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
@ 2023-09-22 19:50     ` Linus Arver via GitGitGadget
  2023-09-22 22:47     ` [PATCH v3 0/9] Trailer readability cleanups Junio C Hamano
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
  10 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-22 19:50 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/trailer.c b/trailer.c
index 9a3837be770..739acafc4e9 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1134,8 +1134,8 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 {
 	size_t end_of_log_message = 0, trailer_block_start = 0;
 	struct strbuf **trailer_lines, **ptr;
-	char **trailer_strings = NULL;
-	size_t nr = 0, alloc = 0;
+	char **trailers = NULL;
+	size_t trailer_nr = 0, alloc = 0;
 	char **last = NULL;
 
 	ensure_configured();
@@ -1155,12 +1155,12 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 			*last = strbuf_detach(&sb, NULL);
 			continue;
 		}
-		ALLOC_GROW(trailer_strings, nr + 1, alloc);
-		trailer_strings[nr] = strbuf_detach(*ptr, NULL);
-		last = find_separator(trailer_strings[nr], separators) >= 1
-			? &trailer_strings[nr]
+		ALLOC_GROW(trailers, trailer_nr + 1, alloc);
+		trailers[trailer_nr] = strbuf_detach(*ptr, NULL);
+		last = find_separator(trailers[trailer_nr], separators) >= 1
+			? &trailers[trailer_nr]
 			: NULL;
-		nr++;
+		trailer_nr++;
 	}
 	strbuf_list_free(trailer_lines);
 
@@ -1168,13 +1168,13 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 							       trailer_block_start);
 	info->trailer_block_start = 0;
 	info->trailer_block_end = 0;
-	if (nr) {
+	if (trailer_nr) {
 		info->trailer_block_start = trailer_block_start;
 		info->trailer_block_end = end_of_log_message;
 	}
 	info->end_of_log_message = end_of_log_message;
-	info->trailers = trailer_strings;
-	info->trailer_nr = nr;
+	info->trailers = trailers;
+	info->trailer_nr = trailer_nr;
 }
 
 void trailer_info_release(struct trailer_info *info)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 0/9] Trailer readability cleanups
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (8 preceding siblings ...)
  2023-09-22 19:50     ` [PATCH v3 9/9] trailer: make stack variable names match field names Linus Arver via GitGitGadget
@ 2023-09-22 22:47     ` Junio C Hamano
  2023-09-22 23:13       ` Linus Arver
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
  10 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-22 22:47 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> These patches were created while digging into the trailer code to better
> understand how it works, in preparation for making the trailer.{c,h} files
> as small as possible to make them available as a library for external users.
> This series was originally created as part of [1], but are sent here
> separately because the changes here are arguably more subjective in nature.
> I think Patch 1 is the most important in this series. The others can wait,
> if folks are opposed to adding them on their own merits at this point in
> time.

Hmph, as we discussed, these changes have already been cooking in
'next' for some time:

    13211ae23f trailer: separate public from internal portion of trailer_iterator
    c2a8edf997 trailer: split process_input_file into separate pieces
    94430d03df trailer: split process_command_line_args into separate functions
    ee8c5ee08c trailer: teach find_patch_start about --no-divider
    d2be104085 trailer: rename *_DEFAULT enums to *_UNSPECIFIED
    b5e75f87b5 trailer: use offsets for trailer_start/trailer_end

and I thought we agreed that we'll park them in 'next' and do
whatever necessary fix-up on top as incremental patches?  The first
three patches in this iteration seems to be identical to the
previous round, so I can ignore them and keep the old iteration, but
the remainder of this series are replacements that are suitable when
the series was still out of 'next', but they are no longer usable
once the series is in 'next'.

I could revert and discard [4-6/6] of the previous iteration out of
'next' and have only the first three (which I thought have been
adequately reviewed without remaining issues) graduate to 'master',
if it makes it easier to fix this update on top, but I'd rather not
to encourage people to form a habit of reverting changes out of
'next'.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 0/9] Trailer readability cleanups
  2023-09-22 22:47     ` [PATCH v3 0/9] Trailer readability cleanups Junio C Hamano
@ 2023-09-22 23:13       ` Linus Arver
  2023-09-23  0:48         ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-09-22 23:13 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> These patches were created while digging into the trailer code to better
>> understand how it works, in preparation for making the trailer.{c,h} files
>> as small as possible to make them available as a library for external users.
>> This series was originally created as part of [1], but are sent here
>> separately because the changes here are arguably more subjective in nature.
>> I think Patch 1 is the most important in this series. The others can wait,
>> if folks are opposed to adding them on their own merits at this point in
>> time.
>
> Hmph, as we discussed, these changes have already been cooking in
> 'next' for some time:
>
>     13211ae23f trailer: separate public from internal portion of trailer_iterator
>     c2a8edf997 trailer: split process_input_file into separate pieces
>     94430d03df trailer: split process_command_line_args into separate functions
>     ee8c5ee08c trailer: teach find_patch_start about --no-divider
>     d2be104085 trailer: rename *_DEFAULT enums to *_UNSPECIFIED
>     b5e75f87b5 trailer: use offsets for trailer_start/trailer_end
>
> and I thought we agreed that we'll park them in 'next' and do
> whatever necessary fix-up on top as incremental patches?

Ahhh yes! I completely forgot. So sorry for the noise...

> The first
> three patches in this iteration seems to be identical to the
> previous round, so I can ignore them and keep the old iteration, but
> the remainder of this series are replacements that are suitable when
> the series was still out of 'next', but they are no longer usable
> once the series is in 'next'.

Right.

> I could revert and discard [4-6/6] of the previous iteration out of
> 'next' and have only the first three (which I thought have been
> adequately reviewed without remaining issues) graduate to 'master',
> if it makes it easier to fix this update on top, but I'd rather not
> to encourage people to form a habit of reverting changes out of
> 'next'.
>
> Thanks.

I totally agree that reverting changes out of next is undesirable. I
will do a reroll on top of 'next' with only those incremental (new)
patches.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 0/9] Trailer readability cleanups
  2023-09-22 23:13       ` Linus Arver
@ 2023-09-23  0:48         ` Junio C Hamano
  2023-09-26  5:40           ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-09-23  0:48 UTC (permalink / raw)
  To: Linus Arver
  Cc: Linus Arver via GitGitGadget, git, Glen Choo, Christian Couder,
	Phillip Wood

Linus Arver <linusa@google.com> writes:

>> I could revert and discard [4-6/6] of the previous iteration out of
>> 'next' and have only the first three (which I thought have been
>> adequately reviewed without remaining issues) graduate to 'master',
>> if it makes it easier to fix this update on top, but I'd rather not
>> to encourage people to form a habit of reverting changes out of
>> 'next'.
>>
>> Thanks.
>
> I totally agree that reverting changes out of next is undesirable. I
> will do a reroll on top of 'next' with only those incremental (new)
> patches.

OK, so the first 3 patches are now in 'master', and the remainder of
the previous series have been discarded.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  2023-09-22 19:48             ` Junio C Hamano
@ 2023-09-26  5:30               ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-09-26  5:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Arver via GitGitGadget, git, Christian Couder, Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>> ... I prefer the
>> WHERE_UNSPECIFIED as in this patch because the WHERE_DEFAULT is
>> ambiguous on its own (i.e., WHERE_DEFAULT could mean that we either use
>> the default value WHERE_END in default_conf_info, or it could mean that
>> we fall back to the configuration variables (where it could be something
>> else)).
>
> Yup.  "Turning something that is left UNSPECIFIED after command line
> options and configuration files are processed into the hardcoded
> DEFAULT" is one mental model that is easy to explain.
>
> I however am not sure if it is easier than "Setting something to
> hardcoded DEFAULT before command line options and configuration
> files are allowed to tweak it, and if nobody touches it, then it
> gets the hardcoded DEFAULT value in the end", which is another valid
> mental model, though.

True.

> If both can be used, I'd personally prefer
> the latter, and reserve the "UNSPECIFIED to DEFAULT" pattern to
> signal that we are dealing with a case where the simpler pattern
> without UNSPECIFIED cannot solve.

SGTM.

> The simpler pattern would not work, when the default is defined
> depending on a larger context.  Imagine we have two Boolean
> variables, A and B, where A defaults to false, and B defaults to
> some value derived from the value of A (say, opposite of A).
>
> In the most natural implementation, you'd initialize A to false and
> B to unspecified, let command line options and configuration
> variables to set them to true or false, and after all that, you do
> not have to tweak A's value (it will be left to false that is the
> default unless the user or the configuration gave an explicit
> value), but you need to check if B is left unspecified and tweak it
> to true or false using the final value of A.
>
> For a variable with such a need like B, we cannot avoid having
> "unspecified".  If you initialize it to false (or true), after the
> command line and the configuration files are read and you find B is
> set to false (or true), you cannot tell if the user or the
> configuration explicitly set B to false (or true), in which case you
> do not want to futz with its value based on what is in A, or it is
> false (or true) only because nobody touched it, in which case you
> need to compute its value based on what is in A.

Thanks for the illustrative example! I don't think we have a case of a
"B" variable here for trailers.

> And that is why I asked if we need to special case "the user did not
> touch and the variable is left untouched" in the trailer subsystem.

I think the answer is "no, we don't need to special case". I'll be
dropping this patch in the next re-roll.

> Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 0/9] Trailer readability cleanups
  2023-09-23  0:48         ` Junio C Hamano
@ 2023-09-26  5:40           ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-09-26  5:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Arver via GitGitGadget, git, Glen Choo, Christian Couder,
	Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>>> I could revert and discard [4-6/6] of the previous iteration out of
>>> 'next' and have only the first three (which I thought have been
>>> adequately reviewed without remaining issues) graduate to 'master',
>>> if it makes it easier to fix this update on top, but I'd rather not
>>> to encourage people to form a habit of reverting changes out of
>>> 'next'.
>>>
>>> Thanks.
>>
>> I totally agree that reverting changes out of next is undesirable. I
>> will do a reroll on top of 'next' with only those incremental (new)
>> patches.
>
> OK, so the first 3 patches are now in 'master', and the remainder of
> the previous series have been discarded.
>
> Thanks.

Oh, that simplifies things. I will re-roll on top of 'master' as the
starting point for the other remaining patches instead of using 'next'
as I suggested earlier.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 0/4] Trailer readability cleanups
  2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
                       ` (9 preceding siblings ...)
  2023-09-22 22:47     ` [PATCH v3 0/9] Trailer readability cleanups Junio C Hamano
@ 2023-09-26  6:22     ` Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
                         ` (4 more replies)
  10 siblings, 5 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-26  6:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver

These patches were created while digging into the trailer code to better
understand how it works, in preparation for making the trailer.{c,h} files
as small as possible to make them available as a library for external users.
This series was originally created as part of [1], but are sent here
separately because the changes here are arguably more subjective in nature.

These patches do not add or change any features. Instead, their goal is to
make the code easier to understand for new contributors (like myself), by
making various cleanups and improvements. Ultimately, my hope is that with
such cleanups, we are better positioned to make larger changes (especially
the broader libification effort, as in "Introduce Git Standard Library"
[2]).


Updates in v4
=============

 * The first 3 patches in v3 were merged into 'master'. Necessarily, those 3
   patches have been dropped.
 * Patch 4 in v3 ("trailer: rename *_DEFAULT enums to *_UNSPECIFIED") has
   been dropped, as well as Patch 9 in v3 ("trailer: make stack variable
   names match field names"). These were dropped to simplify this series for
   what I think is the more immediate, important change (see next bullet
   point).
 * Patches 5-8 in v3 are the only ones remaining in this series. They still
   solely deal with --no-divider and trailer block start/end cleanups.


Updates in v3
=============

 * Patches 4 and 6 (--no-divider and trailer block start/end cleanups) have
   been reorganized to Patches 5-8. This ended up touching commit.c in a
   minor way, but otherwise all of the changes here are cleanups and do not
   change any behavior.


Updates in v2
=============

 * Patch 1: Drop the use of a #define. Instead just use an anonymous struct
   named internal.
 * Patch 2: Don't free info out parameter inside parse_trailers(). Instead
   free it from the caller, process_trailers(). Update comment in
   parse_trailers().
 * Patch 3: Reword commit message.
 * Patch 4: Mention be3d654343 (commit: pass --no-divider to
   interpret-trailers, 2023-06-17) in commit message.
 * Added Patch 6 to make trailer_info use offsets for trailer_start and
   trailer_end (thanks to Glen Choo for the suggestion).

[1]
https://lore.kernel.org/git/pull.1564.git.1691210737.gitgitgadget@gmail.com/T/#mb044012670663d8eb7a548924bbcc933bef116de
[2]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/
[3]
https://lore.kernel.org/git/pull.1149.git.1677143700.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/6b4cb31b17077181a311ca87e82464a1e2ad67dd.1686797630.git.gitgitgadget@gmail.com/
[5]
https://lore.kernel.org/git/pull.1563.git.1691211879.gitgitgadget@gmail.com/T/#m0131f9829c35d8e0103ffa88f07d8e0e43dd732c

Linus Arver (4):
  commit: ignore_non_trailer computes number of bytes to ignore
  trailer: find the end of the log message
  trailer: use offsets for trailer_start/trailer_end
  trailer: only use trailer_block_* variables if trailers were found

 builtin/commit.c |   2 +-
 builtin/merge.c  |   2 +-
 commit.c         |   2 +-
 commit.h         |   4 +-
 sequencer.c      |   2 +-
 trailer.c        | 105 ++++++++++++++++++++++++++++++-----------------
 trailer.h        |  15 ++++---
 7 files changed, 83 insertions(+), 49 deletions(-)


base-commit: bcb6cae2966cc407ca1afc77413b3ef11103c175
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1563%2Flistx%2Ftrailer-libification-prep-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1563/listx/trailer-libification-prep-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1563

Range-diff vs v3:

  1:  4f116d2550f <  -:  ----------- trailer: separate public from internal portion of trailer_iterator
  2:  c00f4623d0b <  -:  ----------- trailer: split process_input_file into separate pieces
  3:  f78c2345fad <  -:  ----------- trailer: split process_command_line_args into separate functions
  4:  47186a09b24 <  -:  ----------- trailer: rename *_DEFAULT enums to *_UNSPECIFIED
  5:  da52cec42e1 =  1:  4ce5cf77005 commit: ignore_non_trailer computes number of bytes to ignore
  6:  ab8a6ced143 =  2:  c904caba7e1 trailer: find the end of the log message
  7:  091805eb7d9 =  3:  796e47c1e5f trailer: use offsets for trailer_start/trailer_end
  8:  1762f78a613 =  4:  64e1bd4e4be trailer: only use trailer_block_* variables if trailers were found
  9:  a784c45ed71 <  -:  ----------- trailer: make stack variable names match field names

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
@ 2023-09-26  6:22       ` Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 2/4] trailer: find the end of the log message Linus Arver via GitGitGadget
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-26  6:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

ignore_non_trailer() returns the _number of bytes_ that should be
ignored from the end of the log message. It does not by itself "ignore"
anything.

Rename this function to remove the leading "ignore" verb, to sound more
like a quantity than an action.

Signed-off-by: Linus Arver <linusa@google.com>
---
 builtin/commit.c | 2 +-
 builtin/merge.c  | 2 +-
 commit.c         | 2 +-
 commit.h         | 4 ++--
 trailer.c        | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484..d1785d32db1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -900,7 +900,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		strbuf_stripspace(&sb, '\0');
 
 	if (signoff)
-		append_signoff(&sb, ignore_non_trailer(sb.buf, sb.len), 0);
+		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
 
 	if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
 		die_errno(_("could not write commit template"));
diff --git a/builtin/merge.c b/builtin/merge.c
index 545da0c8a11..c654a29fe85 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -870,7 +870,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 				_(no_scissors_editor_comment), comment_line_char);
 	}
 	if (signoff)
-		append_signoff(&msg, ignore_non_trailer(msg.buf, msg.len), 0);
+		append_signoff(&msg, ignored_log_message_bytes(msg.buf, msg.len), 0);
 	write_merge_heads(remoteheads);
 	write_file_buf(git_path_merge_msg(the_repository), msg.buf, msg.len);
 	if (run_commit_hook(0 < option_edit, get_index_file(), NULL,
diff --git a/commit.c b/commit.c
index b3223478bc2..4440fbabb83 100644
--- a/commit.c
+++ b/commit.c
@@ -1769,7 +1769,7 @@ const char *find_commit_header(const char *msg, const char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-size_t ignore_non_trailer(const char *buf, size_t len)
+size_t ignored_log_message_bytes(const char *buf, size_t len)
 {
 	size_t boc = 0;
 	size_t bol = 0;
diff --git a/commit.h b/commit.h
index 28928833c54..1cc872f225f 100644
--- a/commit.h
+++ b/commit.h
@@ -294,8 +294,8 @@ const char *find_header_mem(const char *msg, size_t len,
 const char *find_commit_header(const char *msg, const char *key,
 			       size_t *out_len);
 
-/* Find the end of the log message, the right place for a new trailer. */
-size_t ignore_non_trailer(const char *buf, size_t len);
+/* Find the number of bytes to ignore from the end of a log message. */
+size_t ignored_log_message_bytes(const char *buf, size_t len);
 
 typedef int (*each_mergetag_fn)(struct commit *commit, struct commit_extra_header *extra,
 				void *cb_data);
diff --git a/trailer.c b/trailer.c
index b6de5d9cb2d..3c54b38a85a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -928,7 +928,7 @@ continue_outer_loop:
 /* Return the position of the end of the trailers. */
 static size_t find_trailer_end(const char *buf, size_t len)
 {
-	return len - ignore_non_trailer(buf, len);
+	return len - ignored_log_message_bytes(buf, len);
 }
 
 static int ends_with_blank_line(const char *buf, size_t len)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 2/4] trailer: find the end of the log message
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
@ 2023-09-26  6:22       ` Linus Arver via GitGitGadget
  2023-09-28 23:16         ` Jonathan Tan
  2023-09-26  6:22       ` [PATCH v4 3/4] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-26  6:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, trailer_info_get() computed the trailer block end position
by

(1) checking for the opts->no_divider flag and optionally calling
    find_patch_start() to find the "patch start" location (patch_start), and
(2) calling find_trailer_end() to find the end of the trailer block
    using patch_start as a guide, saving the return value into
    "trailer_end".

The logic in (1) was awkward because the variable "patch_start" is
misleading if there is no patch in the input. The logic in (2) was
misleading because it could be the case that no trailers are in the
input (yet we are setting a "trailer_end" variable before even searching
for trailers, which happens later in find_trailer_start()). The name
"find_trailer_end" was misleading because that function did not look for
any trailer block itself --- instead it just computed the end position
of the log message in the input where the end of the trailer block (if
it exists) would be (because trailer blocks must always come after the
end of the log message).

Combine the logic in (1) and (2) together into find_patch_start() by
renaming it to find_end_of_log_message(). The end of the log message is
the starting point which find_trailer_start() needs to start searching
backward to parse individual trailers (if any).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 61 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/trailer.c b/trailer.c
index 3c54b38a85a..96cb285a4ea 100644
--- a/trailer.c
+++ b/trailer.c
@@ -809,21 +809,47 @@ static ssize_t last_line(const char *buf, size_t len)
 }
 
 /*
- * Return the position of the start of the patch or the length of str if there
- * is no patch in the message.
+ * Find the end of the log message as an offset from the start of the input
+ * (where callers of this function are interested in looking for a trailers
+ * block in the same input). We have to consider two categories of content that
+ * can come at the end of the input which we want to ignore (because they don't
+ * belong in the log message):
+ *
+ * (1) the "patch part" which begins with a "---" divider and has patch
+ * information (like the output of git-format-patch), and
+ *
+ * (2) any trailing comment lines, blank lines like in the output of "git
+ * commit -v", or stuff below the "cut" (scissor) line.
+ *
+ * As a formula, the situation looks like this:
+ *
+ *     INPUT = LOG MESSAGE + IGNORED
+ *
+ * where IGNORED can be either of the two categories described above. It may be
+ * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE
+ * contains a trailer block, but that's not the concern of this function.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_end_of_log_message(const char *input, int no_divider)
 {
+	size_t end;
+
 	const char *s;
 
-	for (s = str; *s; s = next_line(s)) {
+	/* Assume the naive end of the input is already what we want. */
+	end = strlen(input);
+
+	/* Optionally skip over any patch part ("---" line and below). */
+	for (s = input; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
-			return s - str;
+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
+			end = s - input;
+			break;
+		}
 	}
 
-	return s - str;
+	/* Skip over other ignorable bits. */
+	return end - ignored_log_message_bytes(input, end);
 }
 
 /*
@@ -925,12 +951,6 @@ continue_outer_loop:
 	return len;
 }
 
-/* Return the position of the end of the trailers. */
-static size_t find_trailer_end(const char *buf, size_t len)
-{
-	return len - ignored_log_message_bytes(buf, len);
-}
-
 static int ends_with_blank_line(const char *buf, size_t len)
 {
 	ssize_t ll = last_line(buf, len);
@@ -1101,7 +1121,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int patch_start, trailer_end, trailer_start;
+	int end_of_log_message, trailer_start;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1109,16 +1129,11 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
-	trailer_end = find_trailer_end(str, patch_start);
-	trailer_start = find_trailer_start(str, trailer_end);
+	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
+	trailer_start = find_trailer_start(str, end_of_log_message);
 
 	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 trailer_end - trailer_start,
+					 end_of_log_message - trailer_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1141,7 +1156,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_start);
 	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + trailer_end;
+	info->trailer_end = str + end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 3/4] trailer: use offsets for trailer_start/trailer_end
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 2/4] trailer: find the end of the log message Linus Arver via GitGitGadget
@ 2023-09-26  6:22       ` Linus Arver via GitGitGadget
  2023-09-26  6:22       ` [PATCH v4 4/4] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
  2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
  4 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-26  6:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

While we're at it, rename trailer_start to trailer_block_start to be
more explicit about these offsets (that they are for the entire trailer
block including other trailers). Ditto for trailer_end.

Signed-off-by: Linus Arver <linusa@google.com>
---
 sequencer.c |  2 +-
 trailer.c   | 29 ++++++++++++++---------------
 trailer.h   | 13 ++++++++-----
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..8707a92204f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -345,7 +345,7 @@ static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
 	if (ignore_footer)
 		sb->buf[sb->len - ignore_footer] = saved_char;
 
-	if (info.trailer_start == info.trailer_end)
+	if (info.trailer_block_start == info.trailer_block_end)
 		return 0;
 
 	for (i = 0; i < info.trailer_nr; i++)
diff --git a/trailer.c b/trailer.c
index 96cb285a4ea..3dc2faa969c 100644
--- a/trailer.c
+++ b/trailer.c
@@ -856,7 +856,7 @@ static size_t find_end_of_log_message(const char *input, int no_divider)
  * Return the position of the first trailer line or len if there are no
  * trailers.
  */
-static size_t find_trailer_start(const char *buf, size_t len)
+static size_t find_trailer_block_start(const char *buf, size_t len)
 {
 	const char *s;
 	ssize_t end_of_title, l;
@@ -1072,7 +1072,6 @@ void process_trailers(const char *file,
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
 	struct trailer_info info;
-	size_t trailer_end;
 	FILE *outfile = stdout;
 
 	ensure_configured();
@@ -1083,11 +1082,10 @@ void process_trailers(const char *file,
 		outfile = create_in_place_tempfile(file);
 
 	parse_trailers(&info, sb.buf, &head, opts);
-	trailer_end = info.trailer_end - sb.buf;
 
 	/* Print the lines before the trailers */
 	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1109,7 +1107,7 @@ void process_trailers(const char *file,
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
+		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1121,7 +1119,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int end_of_log_message, trailer_start;
+	size_t end_of_log_message = 0, trailer_block_start = 0;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1130,10 +1128,10 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	ensure_configured();
 
 	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
-	trailer_start = find_trailer_start(str, end_of_log_message);
+	trailer_block_start = find_trailer_block_start(str, end_of_log_message);
 
-	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 end_of_log_message - trailer_start,
+	trailer_lines = strbuf_split_buf(str + trailer_block_start,
+					 end_of_log_message - trailer_block_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1154,9 +1152,9 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	strbuf_list_free(trailer_lines);
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
-							       trailer_start);
-	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + end_of_log_message;
+							       trailer_block_start);
+	info->trailer_block_start = trailer_block_start;
+	info->trailer_block_end = end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
@@ -1171,6 +1169,7 @@ void trailer_info_release(struct trailer_info *info)
 
 static void format_trailer_info(struct strbuf *out,
 				const struct trailer_info *info,
+				const char *msg,
 				const struct process_trailer_options *opts)
 {
 	size_t origlen = out->len;
@@ -1180,8 +1179,8 @@ static void format_trailer_info(struct strbuf *out,
 	if (!opts->only_trailers && !opts->unfold && !opts->filter &&
 	    !opts->separator && !opts->key_only && !opts->value_only &&
 	    !opts->key_value_separator) {
-		strbuf_add(out, info->trailer_start,
-			   info->trailer_end - info->trailer_start);
+		strbuf_add(out, msg + info->trailer_block_start,
+			   info->trailer_block_end - info->trailer_block_start);
 		return;
 	}
 
@@ -1235,7 +1234,7 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	struct trailer_info info;
 
 	trailer_info_get(&info, msg, opts);
-	format_trailer_info(out, &info, opts);
+	format_trailer_info(out, &info, msg, opts);
 	trailer_info_release(&info);
 }
 
diff --git a/trailer.h b/trailer.h
index ab2cd017567..70d7b8bf1d8 100644
--- a/trailer.h
+++ b/trailer.h
@@ -32,16 +32,19 @@ int trailer_set_if_missing(enum trailer_if_missing *item, const char *value);
 struct trailer_info {
 	/*
 	 * True if there is a blank line before the location pointed to by
-	 * trailer_start.
+	 * trailer_block_start.
 	 */
 	int blank_line_before_trailer;
 
 	/*
-	 * Pointers to the start and end of the trailer block found. If there
-	 * is no trailer block found, these 2 pointers point to the end of the
-	 * input string.
+	 * Offsets to the trailer block start and end positions in the input
+	 * string. If no trailer block is found, these are both set to the
+	 * "true" end of the input, per find_true_end_of_input().
+	 *
+	 * NOTE: This will be changed so that these point to 0 in the next
+	 * patch if no trailers are found.
 	 */
-	const char *trailer_start, *trailer_end;
+	size_t trailer_block_start, trailer_block_end;
 
 	/*
 	 * Array of trailers found.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v4 4/4] trailer: only use trailer_block_* variables if trailers were found
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
                         ` (2 preceding siblings ...)
  2023-09-26  6:22       ` [PATCH v4 3/4] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
@ 2023-09-26  6:22       ` Linus Arver via GitGitGadget
  2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
  4 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-09-26  6:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, these variables were overloaded to act as the end of the log
message even if no trailers were found.

Remove the overloaded meaning by adding a new end_of_log_message field
to the trailer_info struct. The trailer_info struct consumers now only
refer to the trailer_block_start and trailer_block_end fields if
trailers were found (trailer_nr > 0), and otherwise refer to the
end_of_log_message.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 31 +++++++++++++++++++++++--------
 trailer.h | 12 +++++++-----
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 3dc2faa969c..c11839ae365 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1083,9 +1083,14 @@ void process_trailers(const char *file,
 
 	parse_trailers(&info, sb.buf, &head, opts);
 
-	/* Print the lines before the trailers */
-	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
+	/* Print the lines before the trailers (if any) as is. */
+	if (!opts->only_trailers) {
+		if (info.trailer_nr) {
+			fwrite(sb.buf, 1, info.trailer_block_start, outfile);
+		} else {
+			fwrite(sb.buf, 1, info.end_of_log_message, outfile);
+		}
+	}
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1105,9 +1110,14 @@ void process_trailers(const char *file,
 	free_all(&head);
 	trailer_info_release(&info);
 
-	/* Print the lines after the trailers as is */
-	if (!opts->only_trailers)
-		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
+	/* Print the lines after the trailers (if any) as is. */
+	if (!opts->only_trailers) {
+		if (info.trailer_nr) {
+			fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
+		} else {
+			fwrite(sb.buf + info.end_of_log_message, 1, sb.len - info.end_of_log_message, outfile);
+		}
+	}
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1153,8 +1163,13 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_block_start);
-	info->trailer_block_start = trailer_block_start;
-	info->trailer_block_end = end_of_log_message;
+	info->trailer_block_start = 0;
+	info->trailer_block_end = 0;
+	if (nr) {
+		info->trailer_block_start = trailer_block_start;
+		info->trailer_block_end = end_of_log_message;
+	}
+	info->end_of_log_message = end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
diff --git a/trailer.h b/trailer.h
index 70d7b8bf1d8..d1e8751952b 100644
--- a/trailer.h
+++ b/trailer.h
@@ -38,14 +38,16 @@ struct trailer_info {
 
 	/*
 	 * Offsets to the trailer block start and end positions in the input
-	 * string. If no trailer block is found, these are both set to the
-	 * "true" end of the input, per find_true_end_of_input().
-	 *
-	 * NOTE: This will be changed so that these point to 0 in the next
-	 * patch if no trailers are found.
+	 * string. If no trailer block is found, these are set to 0.
 	 */
 	size_t trailer_block_start, trailer_block_end;
 
+	/*
+	 * Offset to the end of the log message in the input (may not be the
+	 * same as the end of the input).
+	 */
+	size_t end_of_log_message;
+
 	/*
 	 * Array of trailers found.
 	 */
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 2/4] trailer: find the end of the log message
  2023-09-26  6:22       ` [PATCH v4 2/4] trailer: find the end of the log message Linus Arver via GitGitGadget
@ 2023-09-28 23:16         ` Jonathan Tan
  2023-10-20  0:24           ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Jonathan Tan @ 2023-09-28 23:16 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: Jonathan Tan, git, Glen Choo, Christian Couder, Phillip Wood,
	Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Linus Arver <linusa@google.com>
> 
> Previously, trailer_info_get() computed the trailer block end position
> by
> 
> (1) checking for the opts->no_divider flag and optionally calling
>     find_patch_start() to find the "patch start" location (patch_start), and
> (2) calling find_trailer_end() to find the end of the trailer block
>     using patch_start as a guide, saving the return value into
>     "trailer_end".
> 
> The logic in (1) was awkward because the variable "patch_start" is
> misleading if there is no patch in the input. The logic in (2) was
> misleading because it could be the case that no trailers are in the
> input (yet we are setting a "trailer_end" variable before even searching
> for trailers, which happens later in find_trailer_start()). The name
> "find_trailer_end" was misleading because that function did not look for
> any trailer block itself --- instead it just computed the end position
> of the log message in the input where the end of the trailer block (if
> it exists) would be (because trailer blocks must always come after the
> end of the log message).

I might be biased since I wrote the text in question in 022349c3b0
(trailer: avoid unnecessary splitting on lines, 2016-11-02), but the
concept of patch_start and trailer_end being where the patch would start
and where the trailer would end (if they were present) goes back to
2013d8505d (trailer: parse trailers from file or stdin, 2014-10-13). I
don't remember exactly my thoughts in 2016, but today, this makes sense
to me.

As it is, the reader still needs to know that trailer_start is where
the trailer would start if it was present, and then I think it's quite
natural to have trailer_end be where the trailer would end if it was
present.

I believe the code is simpler this way, because trailer absence now no
longer needs to be special-cased when we use these variables (or maybe
they sometimes do, but not as often, since code that writes to the end
of the trailers, for example, can now just write at trailer_end instead
of having to check whether trailers exist). Same comment for patch 4
regarding using the special value 0 if no trailers are found (I think
the existing code makes more sense).

> Combine the logic in (1) and (2) together into find_patch_start() by
> renaming it to find_end_of_log_message(). The end of the log message is
> the starting point which find_trailer_start() needs to start searching
> backward to parse individual trailers (if any).

Having said that, if patch_start is too confusing for whatever reason,
this refactoring makes sense. (Avoid the confusing name by avoiding
needing to name it in the first place.)

> -static size_t find_patch_start(const char *str)
> +static size_t find_end_of_log_message(const char *input, int no_divider)
>  {
> +	size_t end;
> +
>  	const char *s;
>  
> -	for (s = str; *s; s = next_line(s)) {
> +	/* Assume the naive end of the input is already what we want. */
> +	end = strlen(input);
> +
> +	/* Optionally skip over any patch part ("---" line and below). */
> +	for (s = input; *s; s = next_line(s)) {
>  		const char *v;
>  
> -		if (skip_prefix(s, "---", &v) && isspace(*v))
> -			return s - str;
> +		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
> +			end = s - input;
> +			break;
> +		}
>  	}
>  
> -	return s - str;
> +	/* Skip over other ignorable bits. */
> +	return end - ignored_log_message_bytes(input, end);
>  }

This sometimes redundantly calls strlen and sometimes redundantly loops.
I think it's better to do as the code currently does - so, have a big
if/else at the beginning of this function that checks no_divider.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 2/4] trailer: find the end of the log message
  2023-09-28 23:16         ` Jonathan Tan
@ 2023-10-20  0:24           ` Linus Arver
  2023-10-20  0:36             ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-10-20  0:24 UTC (permalink / raw)
  To: Jonathan Tan, Linus Arver via GitGitGadget
  Cc: Jonathan Tan, git, Glen Choo, Christian Couder, Phillip Wood

Hi Jonathan, it's been a while because I was on vacation. I've forgotten
about most of the intricacies of this patch (I think this was a good
thing, read on below).

Jonathan Tan <jonathantanmy@google.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> From: Linus Arver <linusa@google.com>
>> 
>> Previously, trailer_info_get() computed the trailer block end position
>> by
>> 
>> (1) checking for the opts->no_divider flag and optionally calling
>>     find_patch_start() to find the "patch start" location (patch_start), and
>> (2) calling find_trailer_end() to find the end of the trailer block
>>     using patch_start as a guide, saving the return value into
>>     "trailer_end".
>> 
>> The logic in (1) was awkward because the variable "patch_start" is
>> misleading if there is no patch in the input. The logic in (2) was
>> misleading because it could be the case that no trailers are in the
>> input (yet we are setting a "trailer_end" variable before even searching
>> for trailers, which happens later in find_trailer_start()). The name
>> "find_trailer_end" was misleading because that function did not look for
>> any trailer block itself --- instead it just computed the end position
>> of the log message in the input where the end of the trailer block (if
>> it exists) would be (because trailer blocks must always come after the
>> end of the log message).
>
> [...]
>
> As it is, the reader still needs to know that trailer_start is where
> the trailer would start if it was present, and then I think it's quite
> natural to have trailer_end be where the trailer would end if it was
> present.
>
> I believe the code is simpler this way, because trailer absence now no
> longer needs to be special-cased when we use these variables (or maybe
> they sometimes do, but not as often, since code that writes to the end
> of the trailers, for example, can now just write at trailer_end instead
> of having to check whether trailers exist). Same comment for patch 4
> regarding using the special value 0 if no trailers are found (I think
> the existing code makes more sense).

I think the root cause of my confusion with this codebase is due to the
variables being named as if the things they refer to exist, but without
any guarantees that they indeed exist. This applies to "patch_start"
(the patch part might not be present in the input),
"trailer_{start,end}" (trailers block might not exist (yet)). IOW these
variables are named as if the intent is to always add new trailers into
the input, which may not be the case (we have "--parse", after all).

Looking again at patch 4, I'm now leaning toward dropping it. Other
than the reasons you cited, we also add a new struct field which by
itself does not add new information (the information can already be
deduced from the other fields). In the near future I want to simplify
the data structures as much as possible, and adding a new field seems to
go against this desire of mine.

>> Combine the logic in (1) and (2) together into find_patch_start() by
>> renaming it to find_end_of_log_message(). The end of the log message is
>> the starting point which find_trailer_start() needs to start searching
>> backward to parse individual trailers (if any).
>
> Having said that, if patch_start is too confusing for whatever reason,
> this refactoring makes sense. (Avoid the confusing name by avoiding
> needing to name it in the first place.)

I think the existing code is confusing, and would prefer to keep this
patch.

>> -static size_t find_patch_start(const char *str)
>> +static size_t find_end_of_log_message(const char *input, int no_divider)
>>  {
>> +	size_t end;
>> +
>>  	const char *s;
>>  
>> -	for (s = str; *s; s = next_line(s)) {
>> +	/* Assume the naive end of the input is already what we want. */
>> +	end = strlen(input);
>> +
>> +	/* Optionally skip over any patch part ("---" line and below). */
>> +	for (s = input; *s; s = next_line(s)) {
>>  		const char *v;
>>  
>> -		if (skip_prefix(s, "---", &v) && isspace(*v))
>> -			return s - str;
>> +		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
>> +			end = s - input;
>> +			break;
>> +		}
>>  	}
>>  
>> -	return s - str;
>> +	/* Skip over other ignorable bits. */
>> +	return end - ignored_log_message_bytes(input, end);
>>  }
>
> This sometimes redundantly calls strlen and sometimes redundantly loops.
> I think it's better to do as the code currently does - so, have a big
> if/else at the beginning of this function that checks no_divider.

Will update, thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v4 2/4] trailer: find the end of the log message
  2023-10-20  0:24           ` Linus Arver
@ 2023-10-20  0:36             ` Junio C Hamano
  0 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2023-10-20  0:36 UTC (permalink / raw)
  To: Linus Arver
  Cc: Jonathan Tan, Linus Arver via GitGitGadget, git, Glen Choo,
	Christian Couder, Phillip Wood

Linus Arver <linusa@google.com> writes:

> Hi Jonathan, it's been a while because I was on vacation. I've forgotten
> about most of the intricacies of this patch (I think this was a good
> thing, read on below).

Welcome back ;-).

> Will update, thanks.

Thanks.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 0/3] Trailer readability cleanups
  2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
                         ` (3 preceding siblings ...)
  2023-09-26  6:22       ` [PATCH v4 4/4] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
@ 2023-10-20 19:01       ` Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
                           ` (2 more replies)
  4 siblings, 3 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-10-20 19:01 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan, Linus Arver

These patches were created while digging into the trailer code to better
understand how it works, in preparation for making the trailer.{c,h} files
as small as possible to make them available as a library for external users.
This series was originally created as part of [1], but are sent here
separately because the changes here are arguably more subjective in nature.

These patches do not add or change any features. Instead, their goal is to
make the code easier to understand for new contributors (like myself), by
making various cleanups and improvements. Ultimately, my hope is that with
such cleanups, we are better positioned to make larger changes (especially
the broader libification effort, as in "Introduce Git Standard Library"
[2]).


Updates in v5
=============

 * Patch 4 ("trailer: only use trailer_block_* variables if trailers were
   found") has been dropped.
 * Patch 2 returns early if "--no-divider" is true, avoiding unnecessary
   loop iterations (thanks Jonathan).
 * Added missing Reported-by trailer for Patch 3 (it was originally Glen's
   idea to use offsets).
 * Patch 3: Fixed typo in "trailer.h" that referred to an obsolete function
   name ("find_true_end_of_input()", instead of
   "find_end_of_log_message()").


Updates in v4
=============

 * The first 3 patches in v3 were merged into 'master'. Necessarily, those 3
   patches have been dropped.
 * Patch 4 in v3 ("trailer: rename *_DEFAULT enums to *_UNSPECIFIED") has
   been dropped, as well as Patch 9 in v3 ("trailer: make stack variable
   names match field names"). These were dropped to simplify this series for
   what I think is the more immediate, important change (see next bullet
   point).
 * Patches 5-8 in v3 are the only ones remaining in this series. They still
   solely deal with --no-divider and trailer block start/end cleanups.


Updates in v3
=============

 * Patches 4 and 6 (--no-divider and trailer block start/end cleanups) have
   been reorganized to Patches 5-8. This ended up touching commit.c in a
   minor way, but otherwise all of the changes here are cleanups and do not
   change any behavior.


Updates in v2
=============

 * Patch 1: Drop the use of a #define. Instead just use an anonymous struct
   named internal.
 * Patch 2: Don't free info out parameter inside parse_trailers(). Instead
   free it from the caller, process_trailers(). Update comment in
   parse_trailers().
 * Patch 3: Reword commit message.
 * Patch 4: Mention be3d654343 (commit: pass --no-divider to
   interpret-trailers, 2023-06-17) in commit message.
 * Added Patch 6 to make trailer_info use offsets for trailer_start and
   trailer_end (thanks to Glen Choo for the suggestion).

[1]
https://lore.kernel.org/git/pull.1564.git.1691210737.gitgitgadget@gmail.com/T/#mb044012670663d8eb7a548924bbcc933bef116de
[2]
https://lore.kernel.org/git/20230627195251.1973421-1-calvinwan@google.com/
[3]
https://lore.kernel.org/git/pull.1149.git.1677143700.gitgitgadget@gmail.com/
[4]
https://lore.kernel.org/git/6b4cb31b17077181a311ca87e82464a1e2ad67dd.1686797630.git.gitgitgadget@gmail.com/
[5]
https://lore.kernel.org/git/pull.1563.git.1691211879.gitgitgadget@gmail.com/T/#m0131f9829c35d8e0103ffa88f07d8e0e43dd732c

Linus Arver (3):
  commit: ignore_non_trailer computes number of bytes to ignore
  trailer: find the end of the log message
  trailer: use offsets for trailer_start/trailer_end

 builtin/commit.c |  2 +-
 builtin/merge.c  |  2 +-
 commit.c         |  2 +-
 commit.h         |  4 +--
 sequencer.c      |  2 +-
 trailer.c        | 85 +++++++++++++++++++++++++++++-------------------
 trailer.h        | 10 +++---
 7 files changed, 62 insertions(+), 45 deletions(-)


base-commit: bcb6cae2966cc407ca1afc77413b3ef11103c175
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1563%2Flistx%2Ftrailer-libification-prep-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1563/listx/trailer-libification-prep-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1563

Range-diff vs v4:

 1:  4ce5cf77005 = 1:  4ce5cf77005 commit: ignore_non_trailer computes number of bytes to ignore
 2:  c904caba7e1 ! 2:  ce25420db29 trailer: find the end of the log message
     @@ Commit message
          the starting point which find_trailer_start() needs to start searching
          backward to parse individual trailers (if any).
      
     +    Helped-by: Jonathan Tan <jonathantanmy@google.com>
          Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Linus Arver <linusa@google.com>
      
     @@ trailer.c: static ssize_t last_line(const char *buf, size_t len)
      +static size_t find_end_of_log_message(const char *input, int no_divider)
       {
      +	size_t end;
     -+
       	const char *s;
       
      -	for (s = str; *s; s = next_line(s)) {
      +	/* Assume the naive end of the input is already what we want. */
      +	end = strlen(input);
      +
     ++	if (no_divider) {
     ++		return end;
     ++	}
     ++
      +	/* Optionally skip over any patch part ("---" line and below). */
      +	for (s = input; *s; s = next_line(s)) {
       		const char *v;
       
      -		if (skip_prefix(s, "---", &v) && isspace(*v))
      -			return s - str;
     -+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v)) {
     ++		if (skip_prefix(s, "---", &v) && isspace(*v)) {
      +			end = s - input;
      +			break;
      +		}
 3:  796e47c1e5f ! 3:  e3a7b150241 trailer: use offsets for trailer_start/trailer_end
     @@ Commit message
          more explicit about these offsets (that they are for the entire trailer
          block including other trailers). Ditto for trailer_end.
      
     +    Reported-by: Glen Choo <glencbz@gmail.com>
          Signed-off-by: Linus Arver <linusa@google.com>
      
       ## sequencer.c ##
     @@ trailer.h: int trailer_set_if_missing(enum trailer_if_missing *item, const char
      -	 * input string.
      +	 * Offsets to the trailer block start and end positions in the input
      +	 * string. If no trailer block is found, these are both set to the
     -+	 * "true" end of the input, per find_true_end_of_input().
     -+	 *
     -+	 * NOTE: This will be changed so that these point to 0 in the next
     -+	 * patch if no trailers are found.
     ++	 * "true" end of the input (find_end_of_log_message()).
       	 */
      -	const char *trailer_start, *trailer_end;
      +	size_t trailer_block_start, trailer_block_end;
 4:  64e1bd4e4be < -:  ----------- trailer: only use trailer_block_* variables if trailers were found

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore
  2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
@ 2023-10-20 19:01         ` Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 2/3] trailer: find the end of the log message Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 3/3] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
  2 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-10-20 19:01 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan,
	Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

ignore_non_trailer() returns the _number of bytes_ that should be
ignored from the end of the log message. It does not by itself "ignore"
anything.

Rename this function to remove the leading "ignore" verb, to sound more
like a quantity than an action.

Signed-off-by: Linus Arver <linusa@google.com>
---
 builtin/commit.c | 2 +-
 builtin/merge.c  | 2 +-
 commit.c         | 2 +-
 commit.h         | 4 ++--
 trailer.c        | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484..d1785d32db1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -900,7 +900,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		strbuf_stripspace(&sb, '\0');
 
 	if (signoff)
-		append_signoff(&sb, ignore_non_trailer(sb.buf, sb.len), 0);
+		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
 
 	if (fwrite(sb.buf, 1, sb.len, s->fp) < sb.len)
 		die_errno(_("could not write commit template"));
diff --git a/builtin/merge.c b/builtin/merge.c
index 545da0c8a11..c654a29fe85 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -870,7 +870,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 				_(no_scissors_editor_comment), comment_line_char);
 	}
 	if (signoff)
-		append_signoff(&msg, ignore_non_trailer(msg.buf, msg.len), 0);
+		append_signoff(&msg, ignored_log_message_bytes(msg.buf, msg.len), 0);
 	write_merge_heads(remoteheads);
 	write_file_buf(git_path_merge_msg(the_repository), msg.buf, msg.len);
 	if (run_commit_hook(0 < option_edit, get_index_file(), NULL,
diff --git a/commit.c b/commit.c
index b3223478bc2..4440fbabb83 100644
--- a/commit.c
+++ b/commit.c
@@ -1769,7 +1769,7 @@ const char *find_commit_header(const char *msg, const char *key, size_t *out_len
  * Returns the number of bytes from the tail to ignore, to be fed as
  * the second parameter to append_signoff().
  */
-size_t ignore_non_trailer(const char *buf, size_t len)
+size_t ignored_log_message_bytes(const char *buf, size_t len)
 {
 	size_t boc = 0;
 	size_t bol = 0;
diff --git a/commit.h b/commit.h
index 28928833c54..1cc872f225f 100644
--- a/commit.h
+++ b/commit.h
@@ -294,8 +294,8 @@ const char *find_header_mem(const char *msg, size_t len,
 const char *find_commit_header(const char *msg, const char *key,
 			       size_t *out_len);
 
-/* Find the end of the log message, the right place for a new trailer. */
-size_t ignore_non_trailer(const char *buf, size_t len);
+/* Find the number of bytes to ignore from the end of a log message. */
+size_t ignored_log_message_bytes(const char *buf, size_t len);
 
 typedef int (*each_mergetag_fn)(struct commit *commit, struct commit_extra_header *extra,
 				void *cb_data);
diff --git a/trailer.c b/trailer.c
index b6de5d9cb2d..3c54b38a85a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -928,7 +928,7 @@ continue_outer_loop:
 /* Return the position of the end of the trailers. */
 static size_t find_trailer_end(const char *buf, size_t len)
 {
-	return len - ignore_non_trailer(buf, len);
+	return len - ignored_log_message_bytes(buf, len);
 }
 
 static int ends_with_blank_line(const char *buf, size_t len)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 2/3] trailer: find the end of the log message
  2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
@ 2023-10-20 19:01         ` Linus Arver via GitGitGadget
  2023-10-20 21:29           ` Junio C Hamano
  2023-10-20 19:01         ` [PATCH v5 3/3] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
  2 siblings, 1 reply; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-10-20 19:01 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan,
	Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously, trailer_info_get() computed the trailer block end position
by

(1) checking for the opts->no_divider flag and optionally calling
    find_patch_start() to find the "patch start" location (patch_start), and
(2) calling find_trailer_end() to find the end of the trailer block
    using patch_start as a guide, saving the return value into
    "trailer_end".

The logic in (1) was awkward because the variable "patch_start" is
misleading if there is no patch in the input. The logic in (2) was
misleading because it could be the case that no trailers are in the
input (yet we are setting a "trailer_end" variable before even searching
for trailers, which happens later in find_trailer_start()). The name
"find_trailer_end" was misleading because that function did not look for
any trailer block itself --- instead it just computed the end position
of the log message in the input where the end of the trailer block (if
it exists) would be (because trailer blocks must always come after the
end of the log message).

Combine the logic in (1) and (2) together into find_patch_start() by
renaming it to find_end_of_log_message(). The end of the log message is
the starting point which find_trailer_start() needs to start searching
backward to parse individual trailers (if any).

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 64 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/trailer.c b/trailer.c
index 3c54b38a85a..70c81fda710 100644
--- a/trailer.c
+++ b/trailer.c
@@ -809,21 +809,50 @@ static ssize_t last_line(const char *buf, size_t len)
 }
 
 /*
- * Return the position of the start of the patch or the length of str if there
- * is no patch in the message.
+ * Find the end of the log message as an offset from the start of the input
+ * (where callers of this function are interested in looking for a trailers
+ * block in the same input). We have to consider two categories of content that
+ * can come at the end of the input which we want to ignore (because they don't
+ * belong in the log message):
+ *
+ * (1) the "patch part" which begins with a "---" divider and has patch
+ * information (like the output of git-format-patch), and
+ *
+ * (2) any trailing comment lines, blank lines like in the output of "git
+ * commit -v", or stuff below the "cut" (scissor) line.
+ *
+ * As a formula, the situation looks like this:
+ *
+ *     INPUT = LOG MESSAGE + IGNORED
+ *
+ * where IGNORED can be either of the two categories described above. It may be
+ * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE
+ * contains a trailer block, but that's not the concern of this function.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_end_of_log_message(const char *input, int no_divider)
 {
+	size_t end;
 	const char *s;
 
-	for (s = str; *s; s = next_line(s)) {
+	/* Assume the naive end of the input is already what we want. */
+	end = strlen(input);
+
+	if (no_divider) {
+		return end;
+	}
+
+	/* Optionally skip over any patch part ("---" line and below). */
+	for (s = input; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
-			return s - str;
+		if (skip_prefix(s, "---", &v) && isspace(*v)) {
+			end = s - input;
+			break;
+		}
 	}
 
-	return s - str;
+	/* Skip over other ignorable bits. */
+	return end - ignored_log_message_bytes(input, end);
 }
 
 /*
@@ -925,12 +954,6 @@ continue_outer_loop:
 	return len;
 }
 
-/* Return the position of the end of the trailers. */
-static size_t find_trailer_end(const char *buf, size_t len)
-{
-	return len - ignored_log_message_bytes(buf, len);
-}
-
 static int ends_with_blank_line(const char *buf, size_t len)
 {
 	ssize_t ll = last_line(buf, len);
@@ -1101,7 +1124,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int patch_start, trailer_end, trailer_start;
+	int end_of_log_message, trailer_start;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1109,16 +1132,11 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
-	trailer_end = find_trailer_end(str, patch_start);
-	trailer_start = find_trailer_start(str, trailer_end);
+	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
+	trailer_start = find_trailer_start(str, end_of_log_message);
 
 	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 trailer_end - trailer_start,
+					 end_of_log_message - trailer_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1141,7 +1159,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_start);
 	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + trailer_end;
+	info->trailer_end = str + end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 3/3] trailer: use offsets for trailer_start/trailer_end
  2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
  2023-10-20 19:01         ` [PATCH v5 2/3] trailer: find the end of the log message Linus Arver via GitGitGadget
@ 2023-10-20 19:01         ` Linus Arver via GitGitGadget
  2 siblings, 0 replies; 72+ messages in thread
From: Linus Arver via GitGitGadget @ 2023-10-20 19:01 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan,
	Linus Arver, Linus Arver

From: Linus Arver <linusa@google.com>

Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

While we're at it, rename trailer_start to trailer_block_start to be
more explicit about these offsets (that they are for the entire trailer
block including other trailers). Ditto for trailer_end.

Reported-by: Glen Choo <glencbz@gmail.com>
Signed-off-by: Linus Arver <linusa@google.com>
---
 sequencer.c |  2 +-
 trailer.c   | 29 ++++++++++++++---------------
 trailer.h   | 10 +++++-----
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index d584cac8ed9..8707a92204f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -345,7 +345,7 @@ static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
 	if (ignore_footer)
 		sb->buf[sb->len - ignore_footer] = saved_char;
 
-	if (info.trailer_start == info.trailer_end)
+	if (info.trailer_block_start == info.trailer_block_end)
 		return 0;
 
 	for (i = 0; i < info.trailer_nr; i++)
diff --git a/trailer.c b/trailer.c
index 70c81fda710..f7dc7c4c008 100644
--- a/trailer.c
+++ b/trailer.c
@@ -859,7 +859,7 @@ static size_t find_end_of_log_message(const char *input, int no_divider)
  * Return the position of the first trailer line or len if there are no
  * trailers.
  */
-static size_t find_trailer_start(const char *buf, size_t len)
+static size_t find_trailer_block_start(const char *buf, size_t len)
 {
 	const char *s;
 	ssize_t end_of_title, l;
@@ -1075,7 +1075,6 @@ void process_trailers(const char *file,
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
 	struct trailer_info info;
-	size_t trailer_end;
 	FILE *outfile = stdout;
 
 	ensure_configured();
@@ -1086,11 +1085,10 @@ void process_trailers(const char *file,
 		outfile = create_in_place_tempfile(file);
 
 	parse_trailers(&info, sb.buf, &head, opts);
-	trailer_end = info.trailer_end - sb.buf;
 
 	/* Print the lines before the trailers */
 	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+		fwrite(sb.buf, 1, info.trailer_block_start, outfile);
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1112,7 +1110,7 @@ void process_trailers(const char *file,
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
+		fwrite(sb.buf + info.trailer_block_end, 1, sb.len - info.trailer_block_end, outfile);
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1124,7 +1122,7 @@ void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int end_of_log_message, trailer_start;
+	size_t end_of_log_message = 0, trailer_block_start = 0;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1133,10 +1131,10 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	ensure_configured();
 
 	end_of_log_message = find_end_of_log_message(str, opts->no_divider);
-	trailer_start = find_trailer_start(str, end_of_log_message);
+	trailer_block_start = find_trailer_block_start(str, end_of_log_message);
 
-	trailer_lines = strbuf_split_buf(str + trailer_start,
-					 end_of_log_message - trailer_start,
+	trailer_lines = strbuf_split_buf(str + trailer_block_start,
+					 end_of_log_message - trailer_block_start,
 					 '\n',
 					 0);
 	for (ptr = trailer_lines; *ptr; ptr++) {
@@ -1157,9 +1155,9 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	strbuf_list_free(trailer_lines);
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
-							       trailer_start);
-	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + end_of_log_message;
+							       trailer_block_start);
+	info->trailer_block_start = trailer_block_start;
+	info->trailer_block_end = end_of_log_message;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
@@ -1174,6 +1172,7 @@ void trailer_info_release(struct trailer_info *info)
 
 static void format_trailer_info(struct strbuf *out,
 				const struct trailer_info *info,
+				const char *msg,
 				const struct process_trailer_options *opts)
 {
 	size_t origlen = out->len;
@@ -1183,8 +1182,8 @@ static void format_trailer_info(struct strbuf *out,
 	if (!opts->only_trailers && !opts->unfold && !opts->filter &&
 	    !opts->separator && !opts->key_only && !opts->value_only &&
 	    !opts->key_value_separator) {
-		strbuf_add(out, info->trailer_start,
-			   info->trailer_end - info->trailer_start);
+		strbuf_add(out, msg + info->trailer_block_start,
+			   info->trailer_block_end - info->trailer_block_start);
 		return;
 	}
 
@@ -1238,7 +1237,7 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	struct trailer_info info;
 
 	trailer_info_get(&info, msg, opts);
-	format_trailer_info(out, &info, opts);
+	format_trailer_info(out, &info, msg, opts);
 	trailer_info_release(&info);
 }
 
diff --git a/trailer.h b/trailer.h
index ab2cd017567..1644cd05f60 100644
--- a/trailer.h
+++ b/trailer.h
@@ -32,16 +32,16 @@ int trailer_set_if_missing(enum trailer_if_missing *item, const char *value);
 struct trailer_info {
 	/*
 	 * True if there is a blank line before the location pointed to by
-	 * trailer_start.
+	 * trailer_block_start.
 	 */
 	int blank_line_before_trailer;
 
 	/*
-	 * Pointers to the start and end of the trailer block found. If there
-	 * is no trailer block found, these 2 pointers point to the end of the
-	 * input string.
+	 * Offsets to the trailer block start and end positions in the input
+	 * string. If no trailer block is found, these are both set to the
+	 * "true" end of the input (find_end_of_log_message()).
 	 */
-	const char *trailer_start, *trailer_end;
+	size_t trailer_block_start, trailer_block_end;
 
 	/*
 	 * Array of trailers found.
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 2/3] trailer: find the end of the log message
  2023-10-20 19:01         ` [PATCH v5 2/3] trailer: find the end of the log message Linus Arver via GitGitGadget
@ 2023-10-20 21:29           ` Junio C Hamano
  2023-12-29  6:42             ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2023-10-20 21:29 UTC (permalink / raw)
  To: Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan,
	Linus Arver

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Previously, trailer_info_get() computed the trailer block end position
> by
>
> (1) checking for the opts->no_divider flag and optionally calling
>     find_patch_start() to find the "patch start" location (patch_start), and
> (2) calling find_trailer_end() to find the end of the trailer block
>     using patch_start as a guide, saving the return value into
>     "trailer_end".
>
> The logic in (1) was awkward because the variable "patch_start" is
> misleading if there is no patch in the input. The logic in (2) was
> misleading because it could be the case that no trailers are in the
> input (yet we are setting a "trailer_end" variable before even searching
> for trailers, which happens later in find_trailer_start()). The name
> "find_trailer_end" was misleading because that function did not look for
> any trailer block itself --- instead it just computed the end position
> of the log message in the input where the end of the trailer block (if
> it exists) would be (because trailer blocks must always come after the
> end of the log message).
>
> Combine the logic in (1) and (2) together into find_patch_start() by
> renaming it to find_end_of_log_message(). The end of the log message is
> the starting point which find_trailer_start() needs to start searching
> backward to parse individual trailers (if any).
>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Linus Arver <linusa@google.com>
> ---
>  trailer.c | 64 +++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/trailer.c b/trailer.c
> index 3c54b38a85a..70c81fda710 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -809,21 +809,50 @@ static ssize_t last_line(const char *buf, size_t len)
>  }
>  
>  /*
> - * Return the position of the start of the patch or the length of str if there
> - * is no patch in the message.
> + * Find the end of the log message as an offset from the start of the input
> + * (where callers of this function are interested in looking for a trailers
> + * block in the same input). We have to consider two categories of content that
> + * can come at the end of the input which we want to ignore (because they don't
> + * belong in the log message):
> + *
> + * (1) the "patch part" which begins with a "---" divider and has patch
> + * information (like the output of git-format-patch), and
> + *
> + * (2) any trailing comment lines, blank lines like in the output of "git
> + * commit -v", or stuff below the "cut" (scissor) line.
> + *
> + * As a formula, the situation looks like this:
> + *
> + *     INPUT = LOG MESSAGE + IGNORED
> + *
> + * where IGNORED can be either of the two categories described above. It may be
> + * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE
> + * contains a trailer block, but that's not the concern of this function.
>   */
> -static size_t find_patch_start(const char *str)
> +static size_t find_end_of_log_message(const char *input, int no_divider)
>  {
> +	size_t end;
>  	const char *s;
>  
> -	for (s = str; *s; s = next_line(s)) {
> +	/* Assume the naive end of the input is already what we want. */
> +	end = strlen(input);
> +
> +	if (no_divider) {
> +		return end;
> +	}

OK.  The early return may make sense, as we are essentially
declaring that everything is the "INPUT (= message + ignored)".

You do not need {braces} around a single-statement block, though.

Other than that, I didn't find anything quesionable in any of the
patches in this round.  Looking good.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 2/3] trailer: find the end of the log message
  2023-10-20 21:29           ` Junio C Hamano
@ 2023-12-29  6:42             ` Linus Arver
  2023-12-29 21:03               ` Linus Arver
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Arver @ 2023-12-29  6:42 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan


TL;DR: I'm working on a new approach.

Junio C Hamano <gitster@pobox.com> writes:
> Other than that, I didn't find anything quesionable in any of the
> patches in this round.  Looking good.

So actually, I'm now taking a much more aggressive approach to libifying
the trailer subsystem. Instead of incrementally simplifying/improving
things as in this series, I think I need to get to the root problem,
which is that the trailer.h API isn't rich enough to make it pleasant
for clients to use, including our own builtin/interpret-trailers.c
client. That is, the problem we have today is that the trailer subsystem
is not very ergonomic for internal use, much less external use (outside
of Git itself).

As an example, the current API exposes process_trailers() which does a
whole bunch of things that only builtin/interpret-trailers.c cares
about. Multiple other clients of trailer.h exist in our codebase (e.g.,
sequencer.c, pretty.c, ref-filter.c) but none of them use
process_trailers().

One really useful data structure is the trailer_iterator that was
introduced in f0939a0eb1 (trailer: add interface for iterating over
commit trailers, 2020-09-27). The only problem is that it is not generic
enough such that interpret-trailers.c can use it.

My new goal is to introduce a new API in trailer.h so that
interpret-trailers.c and everyone else can start using these new data
structures and associated functions (while preserving the
trailer_iterator interface). So the order of operations should be:

(1) enrich the trailer API (make trailer.h have simpler data structures
    and practical functions that clients can readily use), and
(2) make builtin/interpret-trailers.c, and other clients in the Git
    codebase use this new API.

This way when the unit test framework selection process is finalized we
can

(3) write unit tests for the functions in the (enriched) trailer API,

which is one of the major goals for my efforts around this area.

The work I've started locally for (1) does not depend on this series,
and I think it'll be cleaner (less churn) that way. So, feel free to
drop this series in favor of the forthcoming work described in this
message.

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 2/3] trailer: find the end of the log message
  2023-12-29  6:42             ` Linus Arver
@ 2023-12-29 21:03               ` Linus Arver
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Arver @ 2023-12-29 21:03 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan


> (1) enrich the trailer API (make trailer.h have simpler data structures
>     and practical functions that clients can readily use), and
> (2) make builtin/interpret-trailers.c, and other clients in the Git
>     codebase use this new API.

I've done some more thinking/hacking and I'm realizing that changing the
data structures significantly as a first step will be difficult to get
right without being able to unit-test things as we go. As we don't have
unit tests (sorry, I keep forgetting...), I think changing the shape of
data structures right now is a showstopper.

Step (1) will have to be done without changing any of the existing data
structures.

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2023-12-29 21:03 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-08-07 21:16   ` Glen Choo
2023-08-08 12:19     ` Phillip Wood
2023-08-10 23:15       ` Linus Arver
2023-08-10 22:50     ` Linus Arver
2023-08-05  5:04 ` [PATCH 2/5] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-08-07 22:39   ` Glen Choo
2023-08-11  0:41     ` Linus Arver
2023-08-05  5:04 ` [PATCH 3/5] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-08-07 22:51   ` Glen Choo
2023-08-11  0:59     ` Linus Arver
2023-08-11  1:02       ` Linus Arver
2023-08-11 21:11       ` Glen Choo
2023-08-05  5:04 ` [PATCH 4/5] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
2023-08-07 23:28   ` Glen Choo
2023-08-11  1:25     ` Linus Arver
2023-08-11 20:51       ` Glen Choo
2023-08-05  5:04 ` [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-08-07 23:45   ` Glen Choo
2023-08-11 18:00     ` Linus Arver
2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-09-11 17:10     ` Junio C Hamano
2023-09-09  6:16   ` [PATCH v2 2/6] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-09-11 17:10     ` Junio C Hamano
2023-09-09  6:16   ` [PATCH v2 3/6] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-09-09  6:16   ` [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
2023-09-11 17:25     ` Junio C Hamano
2023-09-14  2:19       ` Linus Arver
2023-09-14  3:12         ` Junio C Hamano
2023-09-14  5:31           ` Linus Arver
2023-09-09  6:16   ` [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-09-11 18:54     ` Junio C Hamano
2023-09-14  2:41       ` Linus Arver
2023-09-14  3:16         ` Junio C Hamano
2023-09-22 18:23           ` Linus Arver
2023-09-22 19:48             ` Junio C Hamano
2023-09-26  5:30               ` Linus Arver
2023-09-09  6:16   ` [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-11 19:01     ` Junio C Hamano
2023-09-14  1:21       ` Linus Arver
2023-09-14  3:18         ` Linus Arver
2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 2/9] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 3/9] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 4/9] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 5/9] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 6/9] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 7/9] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 8/9] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 9/9] trailer: make stack variable names match field names Linus Arver via GitGitGadget
2023-09-22 22:47     ` [PATCH v3 0/9] Trailer readability cleanups Junio C Hamano
2023-09-22 23:13       ` Linus Arver
2023-09-23  0:48         ` Junio C Hamano
2023-09-26  5:40           ` Linus Arver
2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 2/4] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-09-28 23:16         ` Jonathan Tan
2023-10-20  0:24           ` Linus Arver
2023-10-20  0:36             ` Junio C Hamano
2023-09-26  6:22       ` [PATCH v4 3/4] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 4/4] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
2023-10-20 19:01         ` [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-10-20 19:01         ` [PATCH v5 2/3] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-10-20 21:29           ` Junio C Hamano
2023-12-29  6:42             ` Linus Arver
2023-12-29 21:03               ` Linus Arver
2023-10-20 19:01         ` [PATCH v5 3/3] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).