* [PATCH 0/2] Add cat-file --batch-command flag
@ 2022-02-03 19:08 John Cai via GitGitGadget
  2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-03 19:08 UTC (permalink / raw)
  To: git; +Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

The latest change since [C] was replacing the flush command with a series of
commands to allow a read "session", so we can queue up a bunch of objects
and then get either their info or contents, then end by flushing. This is to
address Phillip's concern about deadlocks if we allow the caller to call
flush whenever they want.

This patch series has two parts:

 1. preparation patch to rename a variable
 2. logic to handle --batch-command flag, and adding different commands

[A] https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
[B] https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
[C] https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (2):
  cat-file.c: rename cmdmode to mode
  catfile.c: add --batch-command mode

 Documentation/git-cat-file.txt |  27 +++++
 builtin/cat-file.c             | 203 +++++++++++++++++++++++++++++----
 t/t1006-cat-file.sh            |  46 +++++++-
 3 files changed, 251 insertions(+), 25 deletions(-)


base-commit: 5d01301f2b865aa8dba1654d3f447ce9d21db0b5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v1
Pull-Request: https://github.com/git/git/pull/1212
-- 
gitgitgadget


* [PATCH 1/2] cat-file.c: rename cmdmode to mode
  2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-03 19:08 ` John Cai via GitGitGadget
  2022-02-03 19:28   ` Junio C Hamano
  2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
  2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
  2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2 siblings, 2 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-03 19:08 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

To prepare for a new flag --batch-command, we will add a flag that
indicates whether or not an interactive command mode will be used
that reads commands and arguments off of stdin.

An intuitive name for this flag would be "command", which can get
confusing with the already existing cmdmode.

Rename cmdmode->mode to prepare for this change.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d94050e6c18..858bca208ff 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -306,19 +306,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -330,7 +330,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid mode: %c", opt->mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -533,7 +533,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -695,10 +695,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 
 	batch.buffer_output = -1;
 	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
-
 	if (opt) {
 		if (batch.enabled && (opt == 'c' || opt == 'w'))
-			batch.cmdmode = opt;
+			batch.mode = opt;
 		else if (argc == 1)
 			obj_name = argv[0];
 		else
@@ -712,9 +711,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			usage_with_options(cat_file_usage, options);
 	}
 	if (batch.enabled) {
-		if (batch.cmdmode != opt || argc)
+		if (batch.mode != opt || argc)
 			usage_with_options(cat_file_usage, options);
-		if (batch.cmdmode && batch.all_objects)
+		if (batch.mode && batch.all_objects)
 			die("--batch-all-objects cannot be combined with "
 			    "--textconv nor with --filters");
 	}
-- 
gitgitgadget



* [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
@ 2022-02-03 19:08 ` John Cai via GitGitGadget
  2022-02-03 19:57   ` Junio C Hamano
                     ` (2 more replies)
  2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2 siblings, 3 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-03 19:08 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

Add new flag --batch-command that will accept commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content. One for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one process where we can flip
between getting object info, and getting object contents. This means we
can get rid of roughly half of long lived git cat-file processes. This
can lead to huge savings since on a given server there could be hundreds
of git cat-file processes running.

git cat-file --batch-command

would enter an interactive command mode whereby the user can enter in
commands and their arguments:

<command> [arg1] [arg2] NL

This patch adds the basic structure for adding commands, which can be
extended in the future to support more commands.
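
As a rough illustration of the table-driven structure described above, here is a minimal standalone sketch. It is not the code from this patch: the names cmd_id, cmd_entry, cmd_table and lookup_cmd are hypothetical, and the real patch stores function pointers and works on strbufs rather than plain C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command IDs; the patch dispatches via function pointers. */
enum cmd_id { CMD_UNKNOWN, CMD_CONTENTS, CMD_INFO };

struct cmd_entry {
	const char *name;
	enum cmd_id id;
	int takes_args;
};

/* Adding a command means adding one row here. */
static const struct cmd_entry cmd_table[] = {
	{ "contents", CMD_CONTENTS, 1 },
	{ "info", CMD_INFO, 1 },
};

/*
 * Match `line` (newline already stripped) against the table. On success,
 * return the command ID and point *arg at the text after the separating
 * space; *arg stays NULL for commands without arguments and on failure.
 */
static enum cmd_id lookup_cmd(const char *line, const char **arg)
{
	size_t i;

	assert(line && arg);
	*arg = NULL;
	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		const struct cmd_entry *e = &cmd_table[i];
		size_t n = strlen(e->name);

		if (strncmp(line, e->name, n))
			continue;
		if (e->takes_args) {
			/* Require "name SP arg" with a non-empty arg. */
			if (line[n] != ' ' || !line[n + 1])
				continue;
			*arg = line + n + 1;
		} else if (line[n]) {
			continue;	/* trailing text after a bare command */
		}
		return e->id;
	}
	return CMD_UNKNOWN;
}
```

With this sketch, lookup_cmd("info HEAD", &arg) yields CMD_INFO with arg pointing at "HEAD", while "infoHEAD" and a bare "info" are both rejected.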

This patch also adds the following commands:

contents <object> NL
info <object> NL

The contents command takes an <object> argument and prints out the object
contents.

The info command takes a <object> argument and prints out the object
metadata.

In addition, we need a set of commands that enable a "read session".

When a separate process (A) is connected to a git cat-file process (B)
and is interactively writing to and reading from it in --buffer mode,
(A) needs to be able to know when the buffer is flushed to stdout.
Currently, from (A)'s perspective, the only way is to either 1. exit
(B)'s process or 2. send an invalid object to stdin. 1. is not ideal
from a performance perspective as it will require spawning a new cat-file
process each time, and 2. is hacky and not a good long term solution.

With the following commands, process (A) can begin a "session" and
send a list of object names over stdin. When "get contents" or "get info"
is issued, this list of object names will be fed into batch_one_object()
to retrieve either info or contents. Finally an fflush() will be called
to end the session.

begin NL
get contents NL
get info NL

These can be used in the following way:

begin
<sha1>
<sha1>
<sha1>
<sha1>
get info

begin
<sha1>
<sha1>
<sha1>
<sha1>
get contents

With this mechanism, process (A) can be guaranteed to receive all of the
output even in --buffer mode.
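
The session flow above can be modeled as a two-state machine. The following is only a sketch under simplifying assumptions: struct session, session_feed and MAX_QUEUED are hypothetical names, the queue is a fixed-size array instead of a string_list, and object lookup and output are left out entirely.

```c
#include <assert.h>
#include <string.h>

enum session_state { STATE_COMMAND, STATE_INPUT };

#define MAX_QUEUED 16	/* arbitrary limit for this sketch */

struct session {
	enum session_state state;
	const char *queued[MAX_QUEUED];	/* object names seen since "begin" */
	size_t nr;
};

/*
 * Feed one input line (newline stripped). Returns -1 on a protocol error.
 * Otherwise returns the number of queued names that a terminating
 * "get info" or "get contents" line would process before flushing;
 * this is 0 for "begin" and for lines that are merely queued.
 */
static int session_feed(struct session *s, const char *line)
{
	assert(s && line);
	if (s->state == STATE_COMMAND) {
		if (strcmp(line, "begin"))
			return -1;	/* only "begin" is modeled here */
		s->state = STATE_INPUT;
		s->nr = 0;
		return 0;
	}
	if (!strcmp(line, "get info") || !strcmp(line, "get contents")) {
		int n = (int)s->nr;

		s->state = STATE_COMMAND;	/* session ends, output is flushed */
		s->nr = 0;
		return n;
	}
	if (s->nr == MAX_QUEUED)
		return -1;
	s->queued[s->nr++] = line;
	return 0;
}
```

Feeding "begin", two object names and then "get info" returns 2 on the last call and leaves the machine back in STATE_COMMAND.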

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  27 +++++
 builtin/cat-file.c             | 184 ++++++++++++++++++++++++++++++---
 t/t1006-cat-file.sh            |  46 ++++++++-
 3 files changed, 242 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 27b27e2b300..d1ba0b12e54 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -90,6 +90,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin.
+	May not be combined with any other options or arguments except
+	`--textconv` or `--filters`, in which case the input lines also need to
+	specify the path, separated by whitespace.  See the section
+	`BATCH OUTPUT` below for details.
+
+contents <object>::
+	Print object contents for object reference <object>
+
+info <object>::
+	Print object info for object reference <object>
+
+begin::
+	Begins a session to read object names off of stdin. A session can be
+	terminated with `get contents` or `get info`.
+
+get contents::
+	After a read session is begun with the `begin` command, and object
+	names have been fed into stdin, end the session and retrieve contents of
+	all the objects requested.
+
+get info::
+	After a read session is begun with the `begin` command, and object
+	names have been fed into stdin, end the session and retrieve info of
+	all the objects requested.
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 858bca208ff..29d5cd6857b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,6 +26,7 @@ struct batch_options {
 	int unordered;
 	int mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
+	int command;
 };
 
 static const char *force_path;
@@ -512,6 +513,151 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+static void parse_cmd_object(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data,
+			     struct string_list revs)
+{
+	opt->print_contents = 1;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct string_list revs)
+{
+	opt->print_contents = 0;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_begin(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct string_list revs)
+{
+	/* nothing needs to be done here */
+}
+
+static void parse_cmd_get(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct string_list revs)
+{
+	struct string_list_item *item;
+	for_each_string_list_item(item, &revs) {
+		batch_one_object(item->string, output, opt, data);
+	}
+	if (opt->buffer_output)
+		fflush(stdout);
+}
+static void parse_cmd_get_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct string_list revs)
+{
+	opt->print_contents = 0;
+	parse_cmd_get(opt, line, output, data, revs);
+}
+
+static void parse_cmd_get_objects(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct string_list revs)
+{
+	opt->print_contents = 1;
+	parse_cmd_get(opt, line, output, data, revs);
+	if (opt->buffer_output)
+		fflush(stdout);
+}
+
+enum batch_state {
+	BATCH_STATE_COMMAND,
+	BATCH_STATE_INPUT,
+};
+
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *,
+			       struct string_list revs);
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+	enum batch_state next_state;
+} commands[] = {
+	{ "contents", parse_cmd_object, 1, BATCH_STATE_COMMAND},
+	{ "info", parse_cmd_info, 1, BATCH_STATE_COMMAND},
+	{ "begin", parse_cmd_begin, 0, BATCH_STATE_INPUT},
+	{ "get info", parse_cmd_get_info, 0, BATCH_STATE_COMMAND},
+	{ "get contents", parse_cmd_get_objects, 0, BATCH_STATE_COMMAND},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	enum batch_state state = BATCH_STATE_COMMAND;
+	struct string_list revs = STRING_LIST_INIT_DUP;
+
+	/* Read each line and dispatch its command */
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p, *cmd_end;
+
+		if (state == BATCH_STATE_COMMAND) {
+			if (*input.buf == '\n')
+				die("empty command in input");
+			else if (isspace(*input.buf))
+				die("whitespace before command: %s", input.buf);
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			const char *prefix = commands[i].prefix;
+			char c;
+			if (!skip_prefix(input.buf, prefix, &cmd_end))
+				continue;
+			/*
+			 * If the command has arguments, verify that it's
+			 * followed by a space. Otherwise, it shall be followed
+			 * by a line terminator.
+			 */
+			c = commands[i].takes_args ? ' ' : '\n';
+			if (*cmd_end && *cmd_end != c)
+				die("arguments invalid for command: %s", commands[i].prefix);
+
+			cmd = &commands[i];
+			if (cmd->takes_args)
+				p = cmd_end + 1;
+			break;
+		}
+
+		if (input.buf[input.len - 1] == '\n')
+			input.buf[--input.len] = '\0';
+
+		if (state == BATCH_STATE_INPUT && !cmd){
+			string_list_append(&revs, input.buf);
+			continue;
+		}
+
+		if (!cmd)
+			die("unknown command: %s", input.buf);
+
+		state = cmd->next_state;
+		cmd->fn(opt, p, output, data, revs);
+	}
+	strbuf_release(&input);
+	string_list_clear(&revs, 0);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt)
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
+	const int command = opt->command;
 
 	if (!opt->format)
 		opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
-	while (strbuf_getline(&input, stdin) != EOF) {
-		if (data.split_on_whitespace) {
-			/*
-			 * Split at first whitespace, tying off the beginning
-			 * of the string and saving the remainder (or NULL) in
-			 * data.rest.
-			 */
-			char *p = strpbrk(input.buf, " \t");
-			if (p) {
-				while (*p && strchr(" \t", *p))
-					*p++ = '\0';
+	if (command)
+		batch_objects_command(opt, &output, &data);
+	else {
+		while (strbuf_getline(&input, stdin) != EOF) {
+			if (data.split_on_whitespace) {
+				/*
+				 * Split at first whitespace, tying off the beginning
+				 * of the string and saving the remainder (or NULL) in
+				 * data.rest.
+				 */
+				char *p = strpbrk(input.buf, " \t");
+				if (p) {
+					while (*p && strchr(" \t", *p))
+						*p++ = '\0';
+				}
+				data.rest = p;
 			}
-			data.rest = p;
+			batch_one_object(input.buf, &output, opt, &data);
 		}
-
-		batch_one_object(input.buf, &output, opt, &data);
 	}
 
 	strbuf_release(&input);
@@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt,
 
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
+	bo->command = !strcmp(opt->long_name, "batch-command");
 	bo->format = arg;
 
 	return 0;
@@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("show info about objects fed from the standard input"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_(""),
+			 N_("enters batch mode that accepts commands"),
+			 PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			 batch_option_callback),
+
 		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
 			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
 		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 39382fa1958..7360d049113 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -85,6 +85,34 @@ $content"
 	test_cmp expect actual
     '
 
+    test -z "$content" ||
+    test_expect_success "--batch-command output of $type content is correct" '
+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+	maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual &&
+	test_cmp expect actual
+    '
+
+    test -z "$content" ||
+    test_expect_success "--batch-command session for $type content is correct" '
+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+	maybe_remove_timestamp \
+		"$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \
+		$no_ts >actual &&
+	test_cmp expect actual
+    '
+
+    test_expect_success "--batch-command output of $type info is correct" '
+	echo "$sha1 $type $size" >expect &&
+	echo "info $sha1" | git cat-file --batch-command >actual &&
+	test_cmp expect actual
+    '
+
+    test_expect_success "--batch-command session for $type info is correct" '
+	echo "$sha1 $type $size" >expect &&
+	test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual &&
+	test_cmp expect actual
+    '
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
 '
 
 tree_sha1=$(git write-tree)
+
 tree_size=$(($(test_oid rawsz) + 13))
 tree_pretty_content="100644 blob $hello_sha1	hello"
 
@@ -175,7 +204,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success "--batch-command with multiple sha1s gives correct format" '
+    echo "$batch_check_output" >expect &&
+    echo begin >input &&
+    echo_without_newline "$batch_check_input" >>input &&
+    echo "get info" >>input &&
+    git cat-file --batch-command <input >actual &&
+    test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -872,4 +910,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	test_cmp expect actual
 '
 
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command < cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH 1/2] cat-file.c: rename cmdmode to mode
  2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
@ 2022-02-03 19:28   ` Junio C Hamano
  2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-03 19:28 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> To prepare for a new flag --batch-command, we will add a flag that
> indicates whether or not an interactive command mode will be used
> that reads commands and arguments off of stdin.
>
> An intuitive name for this flag would be "command", which can get
> confusing with the already existing cmdmode.

I am moderately against this change.  "mode" is too vague a word
(i.e. it does not answer "mode of what?" question with the name),
and renaming it to "mode" is probably a regression in readability.

The original name "cmdmode" is not all that better, either, as I am
sure that it was named after the irrelevant implementation detail
that it is originally read using the OPT_CMDMODE() feature of the
parse-options machinery, and did not mean to express mode of doing
"what" with the name.

So, let's think aloud what this "mode" is about.  The feature lets
the batch operation choose how the blob objects are mangled [*],
so "mangle-mode", "filter-mode", etc. (or find and use a better verb
than these) would be an improvement over the original "cmdmode".


[Footnote]

* 'w' to apply the smudge filter to make them as if they were
  written to the working tree, 'c' to apply the textconv filter, or
  0 to stream them as-is.
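
To make the naming idea concrete, one possible shape is an enum whose values keep the existing option letters. This is only a sketch: transform_mode and its constants are hypothetical names, not something proposed verbatim in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values mirror the option letters listed above. */
enum transform_mode {
	TRANSFORM_NONE = 0,		/* stream blobs as-is */
	TRANSFORM_FILTERS = 'w',	/* as if written to the working tree */
	TRANSFORM_TEXTCONV = 'c',	/* apply the textconv filter */
};

/* Human-readable label for a mode, e.g. for error messages. */
static const char *transform_mode_name(enum transform_mode mode)
{
	switch (mode) {
	case TRANSFORM_NONE:
		return "as-is";
	case TRANSFORM_FILTERS:
		return "filters";
	case TRANSFORM_TEXTCONV:
		return "textconv";
	}
	assert(!"invalid transform mode");
	return NULL;
}
```

Because the enum values equal the existing letters, code that compares the field against 'w' or 'c' would keep working under this assumption.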



* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-03 19:57   ` Junio C Hamano
  2022-02-04  4:11     ` John Cai
  2022-02-04  6:45   ` Eric Sunshine
  2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-03 19:57 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode

"cat-file: add --batch-command mode" perhaps.  The patch touching
the file "catfile.c" (which does not even exist) is an irrelevant
implementation detail to spend 2 extra bytes in "git shortlog"
output.

> From: John Cai <johncai86@gmail.com>
>
> Add new flag --batch-command that will accept commands and arguments
> from stdin, similar to git-update-ref --stdin.
>
> At GitLab, we use a pair of long running cat-file processes when
> accessing object content. One for iterating over object metadata with
> --batch-check, and the other to grab object contents with --batch.
>
> However, if we had --batch-command, we wouldn't need to keep both
> processes around, and instead just have one process where we can flip
> between getting object info, and getting object contents. This means we
> can get rid of roughly half of long lived git cat-file processes. This
> can lead to huge savings since on a given server there could be hundreds
> of git cat-file processes running.

Hmph, why hundreds, not two you listed?

Do you mean "we have two per repository, but by combining, we can do
with just one per repository, halving the number of processes"?

> git cat-file --batch-command
>
> would enter an interactive command mode whereby the user can enter in
> commands and their arguments:
>
> <command> [arg1] [arg2] NL
>
> This patch adds the basic structure for add command which can be
> extended in the future to add more commands.
>
> This patch also adds the following commands:
>
> contents <object> NL
> info <object> NL
>
> The contents command takes an <object> argument and prints out the object
> contents.
>
> The info command takes a <object> argument and prints out the object
> metadata.
>
> In addition, we need a set of commands that enable a "read session".
>
> When a separate process (A) is connected to a git cat-file process (B)

This description misleads readers into thinking that we have just
created a daemon process that is running, and an unrelated process
can connect to it, which obviously poses a question about securing
the connection.  It is my understanding that what happens is
just that a consumer process (A) starts the cat-file process (B) locally
on its behalf under process (A)'s privilege, and they talk over pipe
without allowing any third-party to participate in the exchange, so
we should avoid misleading users by saying "is connected to" here.

> and is interactively writing to and reading from it in --buffer mode,
> (A) needs to be able to know when the buffer is flushed to stdout.

If A and B are talking over a pair of pipes, in order to avoid
deadlocking, both ends need to be able to control whose turn it is
to speak (and whose turn it is to listen).  A needs to
be able to _control_ (not "know") when the buffer it uses to write
to B gets flushed, in order to reliably say "I am done for now, it
is your turn to speak" and be assured that it reaches B.  The story
is the same for the other side.  When a request by A needs to be
responded with multiple lines of output, B needs to be able to say
"And that concludes my response, and I am ready to accept a new
request from you" and make sure it reaches A.  "know when..." is
probably a wrong phrase here.
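
The point about controlling the flush can be seen in a small POSIX sketch. It assumes a fully buffered stdio stream on the write end of a pipe, and flush_demo is a hypothetical helper written for illustration, not code from cat-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write a request into a fully buffered stream on a pipe and report what
 * the reading end can see before and after an explicit fflush().
 * Returns 0 on success, -1 if setup fails.
 */
static int flush_demo(ssize_t *before, ssize_t *after)
{
	int fds[2];
	char buf[64];
	FILE *out;

	assert(before && after);
	if (pipe(fds) < 0)
		return -1;
	/* Non-blocking read end, so an empty pipe is observable. */
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0 ||
	    !(out = fdopen(fds[1], "w"))) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	setvbuf(out, NULL, _IOFBF, BUFSIZ);

	fputs("info HEAD\n", out);
	/* Still in the writer's buffer: the read fails with EAGAIN. */
	*before = read(fds[0], buf, sizeof(buf));
	fflush(out);
	/* Only now has the request reached the pipe. */
	*after = read(fds[0], buf, sizeof(buf));

	fclose(out);
	close(fds[0]);
	return 0;
}
```

Until the writer flushes, the other side has nothing to read, which is why each side must be able to decide when its turn ends.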

> Currently, from (A)'s perspective, the only way is to either 1. exit
> (B)'s process or 2. send an invalid object to stdin. 1. is not ideal
> from a performance perspective as it will require spawning a new cat-file
> process each time, and 2. is hacky and not a good long term solution.

Writing the enumeration as a bulleted or numbered list would make it
much easier to read, I would think.

    From (A)'s perspective, the only way is to either 

    1. exit (B)'s process or
    2. send an invalid object to stdin.

    1. is not ideal from a performance perspective, as it will
    require spawning a new cat-file process each time, and 2. is
    hacky and not a good long term solution.

I am not sure what you exactly mean by "exit" in the above.  Do you
mean "kill" instead?

> With the following commands, process (A) can begin a "session" and
> send a list of object names over stdin. When "get contents" or "get info"
> is issued, this list of object names will be fed into batch_one_object()
> to retrieve either info or contents. Finally an fflush() will be called
> to end the session.
>
> begin NL
> get contents NL
> get info NL
>
> These can be used in the following way:
>
> begin
> <sha1>
> <sha1>
> <sha1>
> <sha1>
> get info
>
> begin
> <sha1>
> <sha1>
> <sha1>
> <sha1>
> get contents
>
> With this mechanism, process (A) can be guaranteed to receive all of the
> output even in --buffer mode.

OK, so do these "get blah" serve both as command and an implicit
"flush"?

With an implicit "flush", do we really need "begin"?

Also, from the point of view of extensibility, not saying what kind
of operation is started when given "begin" is probably not a good
idea.  "get info" and "get contents" may happen to be the only
commands that are supported right now, and the parameters to them
may happen to be just list of object names and nothing else, but
what happens when a new "git frotz" command is added and its
operation is extended with something other than object names and
pathnames?  The way to parse these parameter lines for the "get"
would be different for different commands, and if "cat-file" knows
upfront what is to be done to these parameters, it can even start
prefetching and precomputing to reduce latency observed by the
client before the final "get info" command is given.

So, from that point of view,

	begin <cmd>
	<parameter>
	<parameter>
	...
	end

may be a better design, no?
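
A reader wanting to picture the "begin <cmd> ... end" shape could start from a sketch like the one below. It is purely illustrative: parse_block is a hypothetical helper, the input is assumed to be an array of lines with newlines already stripped, and nothing here reflects code that exists in git.

```c
#include <assert.h>
#include <string.h>

/*
 * Scan lines[0..nr) for one "begin <cmd>" ... "end" block starting at
 * lines[0]. On success, point *cmd at the command name and return the
 * number of parameter lines between the two markers. Return -1 if the
 * block is malformed or never terminated.
 */
static int parse_block(const char **lines, size_t nr, const char **cmd)
{
	size_t i;

	assert(lines && cmd);
	if (!nr || strncmp(lines[0], "begin ", 6) || !lines[0][6])
		return -1;
	/* The command is known up front, before any parameter arrives. */
	*cmd = lines[0] + 6;
	for (i = 1; i < nr; i++)
		if (!strcmp(lines[i], "end"))
			return (int)(i - 1);
	return -1;	/* no terminating "end" */
}
```

Since the command name arrives with "begin", a caller could pick a per-command parameter parser before reading the first parameter line.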


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-03 19:57   ` Junio C Hamano
@ 2022-02-04  4:11     ` John Cai
  2022-02-04 16:46       ` Phillip Wood
  0 siblings, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-04  4:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, avarab, e,
	bagasdotme

Hi Junio, thanks for the review!

On 3 Feb 2022, at 14:57, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode
>
> "cat-file: add --batch-command mode" perhaps.  The patch touching
> the file "catfile.c" (which does not even exist) is an irrelevant
> implementation detail to spend 2 extra bytes in "git shortlog"
> output.
>
>> From: John Cai <johncai86@gmail.com>
>>
>> Add new flag --batch-command that will accept commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>>
>> At GitLab, we use a pair of long running cat-file processes when
>> accessing object content. One for iterating over object metadata with
>> --batch-check, and the other to grab object contents with --batch.
>>
>> However, if we had --batch-command, we wouldn't need to keep both
>> processes around, and instead just have one process where we can flip
>> between getting object info, and getting object contents. This means we
>> can get rid of roughly half of long lived git cat-file processes. This
>> can lead to huge savings since on a given server there could be hundreds
>> of git cat-file processes running.
>
> Hmph, why hundreds, not two you listed?
>
> Do you mean "we have two per repository, but by combining, we can do
> with just one per repository, halving the number of processes"?

Yes, exactly. I'll reword this in the next version to be more clear.

>
>> git cat-file --batch-command
>>
>> would enter an interactive command mode whereby the user can enter in
>> commands and their arguments:
>>
>> <command> [arg1] [arg2] NL
>>
>> This patch adds the basic structure for add command which can be
>> extended in the future to add more commands.
>>
>> This patch also adds the following commands:
>>
>> contents <object> NL
>> info <object> NL
>>
>> The contents command takes an <object> argument and prints out the object
>> contents.
>>
>> The info command takes a <object> argument and prints out the object
>> metadata.
>>
>> In addition, we need a set of commands that enable a "read session".
>>
>> When a separate process (A) is connected to a git cat-file process (B)
>
> This description misleads readers into thinking as if we have just
> created a daemon process that is running, and an unrelated process
> can connect to it, which obviously poses a question about securing
> the connection.  It is my understanding that what this creates is
> just a consumer process (A) starts the cat-file process (B) locally
> on its behalf under process (A)'s privilege, and they talk over pipe
> without allowing any third-party to participate in the exchange, so
> we should avoid misleading users by saying "is connected to" here.

Yes, this understanding is correct. Will fix the wording in the next version.

>
>> and is interactively writing to and reading from it in --buffer mode,
>> (A) needs to be able to know when the buffer is flushed to stdout.
>
> If A and B are talking over a pair pipes, in order to avoid
> deadlocking, both ends need to be able to control whose turn it is
> to speak (and it is turn for the other side to listen).  A needs to
> be able to _control_ (not "know") when the buffer it uses to write
> to B gets flushed, in order to reliably say "I am done for now, it
> is your turn to speak" and be assured that it reaches B.  The story
> is the same for the other side.  When a request by A needs to be
> responded with multiple lines of output, B needs to be able to say
> "And that concludes my response, and I am ready to accept a new
> request from you" and make sure it reaches A.  "know when..." is
> probably a wrong phrase here.

Correct, "know" is not exactly right. _control_ would be the more accurate description.
>
>> Currently, from (A)'s perspective, the only way is to either 1. exit
>> (B)'s process or 2. send an invalid object to stdin. 1. is not ideal
>> from a performance perspective as it will require spawning a new cat-file
>> process each time, and 2. is hacky and not a good long term solution.
>
> Writing enumeration as bulletted or enumerated list would make it
> much easier to read, I would think.
>
>     From (A)'s perspective, the only way is to either
>
>     1. exit (B)'s process or
>     2. send an invalid object to stdin.
>
>     1. is not ideal from a performance perspective, as it will
>     require spawning a new cat-file process each time, and 2. is
>     hacky and not a good long term solution.
>
> I am not sure what you exactly mean by "exit" in the above.  Do you
> mean "kill" instead?

Yes

>
>> With the following commands, process (A) can begin a "session" and
>> send a list of object names over stdin. When "get contents" or "get info"
>> is issued, this list of object names will be fed into batch_one_object()
>> to retrieve either info or contents. Finally an fflush() will be called
>> to end the session.
>>
>> begin NL
>> get contents NL
>> get info NL
>>
>> These can be used in the following way:
>>
>> begin
>> <sha1>
>> <sha1>
>> <sha1>
>> <sha1>
>> get info
>>
>> begin
>> <sha1>
>> <sha1>
>> <sha1>
>> <sha1>
>> get contents
>>
>> With this mechanism, process (A) can be guaranteed to receive all of the
>> output even in --buffer mode.
>
> OK, so do these "get blah" serve both as command and an implicit
> "flush"?

yes, that's the idea.

>
> With an implicit "flush", do we really need "begin"?
>
> Also, from the point of view of extensibility, not saying what kind
> of operation is started when given "begin" is probably not a good
> idea.  "get info" and "get contents" may happen to be the only
> commands that are supported right now, and the parameters to them
> may happen to be just list of object names and nothing else, but
> what happens when a new "git frotz" command is added and its
> operation is extended with something other than object names and
> pathnames?  The way to parse these parameter lines for the "get"
> would be different for different commands, and if "cat-file" knows
> upfront what is to be done to these parameters, it can even start
> prefetching and precomputing to reduce latency observed by the
> client before the final "get info" command is given.
>
> So, from that point of view,
>
>     begin <cmd>
>     <parameter>
>     <parameter>
>     ...
>     end
>
> may be a better design, no?

Good point. Now I'm wondering if we can simplify this so that commands get queued up
and a "get" executes them along with an implicit flush.

<cmd> <parameter>
<cmd> <parameter>
<cmd> <parameter>
get


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
  2022-02-03 19:57   ` Junio C Hamano
@ 2022-02-04  6:45   ` Eric Sunshine
  2022-02-04 21:41     ` John Cai
  2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 97+ messages in thread
From: Eric Sunshine @ 2022-02-04  6:45 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, John Cai

On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Add new flag --batch-command that will accept commands and arguments
> from stdin, similar to git-update-ref --stdin.
>
> contents <object> NL
> info <object> NL
>
> With the following commands, process (A) can begin a "session" and
> send a list of object names over stdin. When "get contents" or "get info"
> is issued, this list of object names will be fed into batch_one_object()
> to retrieve either info or contents. Finally an fflush() will be called
> to end the session.
>
> begin
> <sha1>
> get info
>
> begin
> <sha1>
> get contents

I had the same reaction to this new command set as Junio expressed
upstream[1], and was prepared to suggest an alternative but Junio said
everything I wanted to say, so I won't say anything more about it
here.

That aside, the implementation presented here is overly loose about
malformed input and accepts all sorts of bogus invocations which it
should be reporting as errors. For instance, a lone:

    get info

without a preceding `begin` should be an error but such bogus input is
not diagnosed. `get contents` is likewise affected. Another example:

    begin
    <oid>
    <EOF>

which lacks a closing `get info` or `get contents` is silently
ignored. Similarly, malformed:

    begin
    begin
    ...

should be reported as an error but is not.

There is also a bug in which the accumulated list of OID's is never
cleared. Thus:

    begin
    <oid>
    get info
    get info

emits info about <oid> twice, once for each `get info` invocation. Similarly:

    begin
    <oid1>
    get info
    <oid2>
    get info

emits information about <oid1> twice and <oid2> once. Invoking `begin`
between the `get info` commands doesn't help because `begin` is a
no-op. Thus:

    begin
    <oid1>
    get info
    begin
    <oid2>
    get info

likewise emits information about <oid1> twice and <oid2> once.

It also incorrectly accepts non-"session" commands in the middle of a
session. For instance:

    begin
    <oid1>
    info <oid2>
    <oid3>

immediately emits information about <oid2> and then bombs out claiming
that <oid3> is an "unknown command" because the `info` command --
which should not have been allowed within a session -- prematurely
ended the "session".
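
To make the expected behaviour concrete, here is a minimal sketch of a session state machine that rejects the malformed sequences listed above and clears its queue after each `get`. It is only a model, not the code from the patch: `struct session`, `session_feed()` and `session_finish()` are hypothetical names, and it assumes that a `get` closes the session so that a new `begin` is required afterwards.

```c
#include <string.h>

enum session_state { SESSION_CLOSED, SESSION_OPEN };

struct session {
	enum session_state state;
	int queued; /* object names collected since `begin` */
};

/* Return 1 if `line` is an immediate `info` or `contents` command. */
static int is_immediate(const char *line)
{
	return !strncmp(line, "info ", 5) || !strncmp(line, "contents ", 9);
}

/*
 * Feed one input line (newline already stripped). Returns the number of
 * objects this line would print, or -1 if it is invalid in this state.
 */
static int session_feed(struct session *s, const char *line)
{
	if (!strcmp(line, "begin")) {
		if (s->state == SESSION_OPEN)
			return -1; /* nested `begin` */
		s->state = SESSION_OPEN;
		s->queued = 0;
		return 0;
	}
	if (!strcmp(line, "get info") || !strcmp(line, "get contents")) {
		int n;

		if (s->state != SESSION_OPEN)
			return -1; /* `get` without a preceding `begin` */
		n = s->queued;
		s->queued = 0; /* clear, so a later `get` cannot replay it */
		s->state = SESSION_CLOSED;
		return n;
	}
	if (s->state == SESSION_OPEN) {
		if (is_immediate(line))
			return -1; /* `info`/`contents` inside a session */
		s->queued++;
		return 0;
	}
	return is_immediate(line) ? 1 : -1; /* -1: unknown command */
}

/* At end of input: -1 if a session was left open, 0 otherwise. */
static int session_finish(const struct session *s)
{
	return s->state == SESSION_OPEN ? -1 : 0;
}
```

With this model each of the bogus inputs above yields -1 at the offending line (or at EOF), which is where a real implementation would presumably die() with a message.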

The `info` and `contents` commands neglect to do any sort of
validation of their arguments, thus any and all bogus invocations are
accepted. Thus:

    info
    info <arg1> <arg2>
    info <non-oid>

are all accepted as valid invocations, misleadingly producing "<foo>
missing" messages, rather than erroring out as they should. The "<oid>
missing" message should be reserved for the case when the lone
argument to `info` or `contents` is something which looks like a
legitimate OID.
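
As a rough illustration of the kind of check meant here, the sketch below classifies the argument of `info` or `contents`. It is an assumption-laden model rather than git code: `check_oid_arg()` and the `arg_status` values are invented names, and it only accepts a full hexadecimal object name of 40 (SHA-1) or 64 (SHA-256) characters. It does not model abbreviated names or revision expressions.

```c
#include <ctype.h>
#include <string.h>

enum arg_status { ARG_OK, ARG_MISSING, ARG_TOO_MANY, ARG_NOT_OID };

/* Classify the argument string that follows `info` or `contents`. */
static enum arg_status check_oid_arg(const char *arg)
{
	size_t len, i;

	if (!arg || !*arg)
		return ARG_MISSING;       /* "info" with nothing after it */
	if (strchr(arg, ' ') || strchr(arg, '\t'))
		return ARG_TOO_MANY;      /* "info <arg1> <arg2>" */
	len = strlen(arg);
	if (len != 40 && len != 64)
		return ARG_NOT_OID;       /* wrong length for a full hex name */
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return ARG_NOT_OID;
	return ARG_OK;
}
```

Only the ARG_OK case would go on to the object lookup and possibly report "<oid> missing"; the other three would be reported as usage errors.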

The above is a long-winded way of saying that it is important not only
to check the obvious "expect success" cases when implementing a new
feature, but it's also important to add checks for the "expect
failure" cases, such as all the above malformed inputs.

It's subjective, but it feels like this implementation is trying to be
too clever by handling all these cases via a single strbuf_getline()
loop in batch_objects_command(). It would be easier to reason about,
and would have avoided some of the above problems, for instance, if
handling of `begin` was split out to its own function which looped
over strbuf_getline() itself, thus could easily have detected
premature EOF (lacking `get info` or `get contents`) and likewise
would not have allowed `info` or `contents` commands to be executed
within a "session".

Similarly (again subjective), the generic command dispatcher seems
over-engineered and perhaps contributed to the oversight of `info` and
`contents` failing to perform validation of their arguments. A simpler
hand-rolled command-response loop is more common on this project and
often easier to reason about. Perhaps something like this
(pseudo-code):

    while (strbuf_getline(&input, stdin) != EOF) {
        /* split "<cmd> <args>" at the first space */
        char *end_cmd = strchrnul(input.buf, ' ');
        const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd;
        *end_cmd = '\0';
        if (!strcmp(input.buf, "info"))
            show_info(argstart);
        else if (!strcmp(input.buf, "contents"))
            show_contents(argstart);
        else if (!strcmp(input.buf, "begin"))
            begin_session(argstart);
        else
            die(...);
    }

and each of the command-handler functions would perform its own
argument validation (including the case when no argument should be
present).
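
For readers who want to try the splitting step outside of git, here is a portable sketch of it in plain C. It does not use strbuf or strchrnul(); `parse_line()` and the `cmd_id` values are hypothetical names chosen for this example only.

```c
#include <string.h>

enum cmd_id { CMD_INFO, CMD_CONTENTS, CMD_BEGIN, CMD_UNKNOWN };

/*
 * Split `line` in place at the first space. Returns the command id and
 * points *arg at the text after the space ("" if there is none).
 */
static enum cmd_id parse_line(char *line, const char **arg)
{
	char *sp = strchr(line, ' ');

	if (sp) {
		*sp = '\0';               /* terminate the command word */
		*arg = sp + 1;
	} else {
		*arg = line + strlen(line); /* empty argument string */
	}
	if (!strcmp(line, "info"))
		return CMD_INFO;
	if (!strcmp(line, "contents"))
		return CMD_CONTENTS;
	if (!strcmp(line, "begin"))
		return CMD_BEGIN;
	return CMD_UNKNOWN;
}
```

Because the whole command word is compared, an input such as "informal x" is reported as unknown rather than being treated as `info`. Each handler then receives the argument string and can validate it on its own.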

[1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/

> +static void parse_cmd_begin(struct batch_options *opt,
> +                          const char *line,
> +                          struct strbuf *output,
> +                          struct expand_data *data,
> +                          struct string_list revs)
> +{
> +       /* nothing needs to be done here */
> +}

Either this function should be clearing the list of collected OID's or...

> +static void parse_cmd_get(struct batch_options *opt,
> +                          const char *line,
> +                          struct strbuf *output,
> +                          struct expand_data *data,
> +                          struct string_list revs)
> +{
> +       struct string_list_item *item;
> +       for_each_string_list_item(item, &revs) {
> +               batch_one_object(item->string, output, opt, data);
> +       }
> +       if (opt->buffer_output)
> +               fflush(stdout);

... this function should do so after it's done processing the OID list.

> +}
> +static void parse_cmd_get_info(struct batch_options *opt,
> +                          const char *line,
> +                          struct strbuf *output,
> +                          struct expand_data *data,
> +                          struct string_list revs)

Missing blank line before the function definition.

> +static void parse_cmd_get_objects(struct batch_options *opt,
> +                          const char *line,
> +                          struct strbuf *output,
> +                          struct expand_data *data,
> +                          struct string_list revs)
> +{
> +       opt->print_contents = 1;
> +       parse_cmd_get(opt, line, output, data, revs);
> +       if (opt->buffer_output)
> +               fflush(stdout);
> +}

Why does this function duplicate the flushing logic of parse_cmd_get()
immediately after calling parse_cmd_get()?

> +static void batch_objects_command(struct batch_options *opt,
> +                                   struct strbuf *output,
> +                                   struct expand_data *data)
> +{
> +       /* Read each line dispatch its command */

Missing "and".

> +       while (!strbuf_getline(&input, stdin)) {
> +               if (state == BATCH_STATE_COMMAND) {
> +                       if (*input.buf == '\n')
> +                               die("empty command in input");

This never triggers because strbuf_getline() removes the line
terminator before returning.

> +                       else if (isspace(*input.buf))
> +                               die("whitespace before command: %s", input.buf);
> +               }
> +
> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +                       if (!skip_prefix(input.buf, prefix, &cmd_end))
> +                               continue;
> +                       /*
> +                        * If the command has arguments, verify that it's
> +                        * followed by a space. Otherwise, it shall be followed
> +                        * by a line terminator.
> +                        */
> +                       c = commands[i].takes_args ? ' ' : '\n';
> +                       if (*cmd_end && *cmd_end != c)
> +                               die("arguments invalid for command: %s", commands[i].prefix);

Ditto regarding strbuf_getline() removing line-terminator.

> +                       cmd = &commands[i];
> +                       if (cmd->takes_args)
> +                               p = cmd_end + 1;
> +                       break;
> +               }
> +
> +               if (input.buf[input.len - 1] == '\n')
> +                       input.buf[--input.len] = '\0';

Ditto again. This is especially scary since it's accessing element
`input.len-1` without even checking if `input.len` is greater than
zero.

Also, it's better to use published strbuf API rather than mucking with
the internals:

    strbuf_setlen(&input, input.len - 1);
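
For illustration, a guarded version of this kind of trimming might look like the sketch below. It works on a plain buffer and length instead of a strbuf, and `chomp()` is a name made up for this example. The point is only that the length is checked before `len - 1` is used as an index.

```c
#include <stddef.h>

/*
 * Strip one trailing "\n" or "\r\n" from buf[0..*len).
 * Safe to call with *len == 0; `buf` must have room for the terminator.
 */
static void chomp(char *buf, size_t *len)
{
	if (*len == 0 || buf[*len - 1] != '\n')
		return;                   /* nothing to strip */
	buf[--*len] = '\0';
	if (*len > 0 && buf[*len - 1] == '\r')
		buf[--*len] = '\0';       /* also accept a CRLF line ending */
}
```

With an empty line the function returns immediately, so it never reads the byte before the start of the buffer.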

> +               if (state == BATCH_STATE_INPUT && !cmd){
> +                       string_list_append(&revs, input.buf);
> +                       continue;
> +               }
> +
> +               if (!cmd)
> +                       die("unknown command: %s", input.buf);
> +
> +               state = cmd->next_state;
> +               cmd->fn(opt, p, output, data, revs);
> +       }
> +       strbuf_release(&input);
> +       string_list_clear(&revs, 0);
> +}


* Re: [PATCH 1/2] cat-file.c: rename cmdmode to mode
  2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
  2022-02-03 19:28   ` Junio C Hamano
@ 2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 97+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-04 12:10 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, e, bagasdotme, gitster, John Cai


On Thu, Feb 03 2022, John Cai via GitGitGadget wrote:

> From: John Cai <johncai86@gmail.com>
>
> To prepare for a new flag --batch-command, we will add a flag that
> indicates whether or not an interactive command mode will be used
> that reads commands and arguments off of stdin.
>
> An intuitive name for this flag would be "command", which can get
> confusing with the already existing cmdmode.
>
> Rename cmdmode->mode to prepare for this change.
>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  builtin/cat-file.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index d94050e6c18..858bca208ff 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -24,7 +24,7 @@ struct batch_options {
>  	int buffer_output;
>  	int all_objects;
>  	int unordered;
> -	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
> +	int mode; /* may be 'w' or 'c' for --filters or --textconv */
>  	const char *format;
>  };
>  
> @@ -306,19 +306,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  	if (data->type == OBJ_BLOB) {
>  		if (opt->buffer_output)
>  			fflush(stdout);
> -		if (opt->cmdmode) {
> +		if (opt->mode) {
>  			char *contents;
>  			unsigned long size;
>  
>  			if (!data->rest)
>  				die("missing path for '%s'", oid_to_hex(oid));
>  
> -			if (opt->cmdmode == 'w') {
> +			if (opt->mode == 'w') {
>  				if (filter_object(data->rest, 0100644, oid,
>  						  &contents, &size))
>  					die("could not convert '%s' %s",
>  					    oid_to_hex(oid), data->rest);
> -			} else if (opt->cmdmode == 'c') {
> +			} else if (opt->mode == 'c') {
>  				enum object_type type;
>  				if (!textconv_object(the_repository,
>  						     data->rest, 0100644, oid,
> @@ -330,7 +330,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  					die("could not convert '%s' %s",
>  					    oid_to_hex(oid), data->rest);
>  			} else
> -				BUG("invalid cmdmode: %c", opt->cmdmode);
> +				BUG("invalid mode: %c", opt->mode);
>  			batch_write(opt, contents, size);
>  			free(contents);
>  		} else {
> @@ -533,7 +533,7 @@ static int batch_objects(struct batch_options *opt)
>  	strbuf_expand(&output, opt->format, expand_format, &data);
>  	data.mark_query = 0;
>  	strbuf_release(&output);
> -	if (opt->cmdmode)
> +	if (opt->mode)
>  		data.split_on_whitespace = 1;
>  
>  	/*
> @@ -695,10 +695,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  
>  	batch.buffer_output = -1;
>  	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
> -
>  	if (opt) {
>  		if (batch.enabled && (opt == 'c' || opt == 'w'))
> -			batch.cmdmode = opt;
> +			batch.mode = opt;
>  		else if (argc == 1)
>  			obj_name = argv[0];
>  		else
> @@ -712,9 +711,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  			usage_with_options(cat_file_usage, options);
>  	}
>  	if (batch.enabled) {
> -		if (batch.cmdmode != opt || argc)
> +		if (batch.mode != opt || argc)
>  			usage_with_options(cat_file_usage, options);
> -		if (batch.cmdmode && batch.all_objects)
> +		if (batch.mode && batch.all_objects)
>  			die("--batch-all-objects cannot be combined with "
>  			    "--textconv nor with --filters");
>  	}

There's a rather major rewrite in-flight to this codepath in
ab/cat-file. I think it would be better to base this on top of that. Per
Junio's latest "What's Cooking" that may be getting merged down today or
tomorrow.


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
  2022-02-03 19:57   ` Junio C Hamano
  2022-02-04  6:45   ` Eric Sunshine
@ 2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
  2022-02-04 16:51     ` Phillip Wood
  2 siblings, 1 reply; 97+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-04 12:11 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, e, bagasdotme, gitster, John Cai


On Thu, Feb 03 2022, John Cai via GitGitGadget wrote:

> From: John Cai <johncai86@gmail.com>

[Trying not to pile on and mentioning some things others haven't, but
maybe there'll be duplications]

>  	requested batch operation on all objects in the repository and
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 858bca208ff..29d5cd6857b 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -26,6 +26,7 @@ struct batch_options {
>  	int unordered;
>  	int mode; /* may be 'w' or 'c' for --filters or --textconv */
>  	const char *format;
> +	int command;
>  };

Not sure why it's not added to "int mode", but in any case
post-ab/cat-file that might be clearer...

> +	/* Read each line dispatch its command */
> +	while (!strbuf_getline(&input, stdin)) {

I think comments that are obvious from the code are probably best
dropped. We're just doing a fairly obvious consume stdin pattern here.

> +		int i;
> +		const struct parse_cmd *cmd = NULL;
> +		const char *p, *cmd_end;
> +
> +		if (state == BATCH_STATE_COMMAND) {
> +			if (*input.buf == '\n')
> +				die("empty command in input");
> +			else if (isspace(*input.buf))
> +				die("whitespace before command: %s", input.buf);

These should use _() to mark strings for translation, and let's quote %s
like '%s'

> +		}
> +
> +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +			const char *prefix = commands[i].prefix;
> +			char c;

nit: add a \n between variable decls and code.

> +			if (!skip_prefix(input.buf, prefix, &cmd_end))
> +				continue;
> +			/*
> +			 * If the command has arguments, verify that it's
> +			 * followed by a space. Otherwise, it shall be followed
> +			 * by a line terminator.
> +			 */

I'd say ditto on the "the code says that" comments...

> +			c = commands[i].takes_args ? ' ' : '\n';
> +			if (*cmd_end && *cmd_end != c)
> +				die("arguments invalid for command: %s", commands[i].prefix);
> +
> +			cmd = &commands[i];
> +			if (cmd->takes_args)
> +				p = cmd_end + 1;
> +			break;
> +		}
> +
> +		if (input.buf[input.len - 1] == '\n')
> +			input.buf[--input.len] = '\0';

Don't we mean strbuf_trim_trailing_newline() here, or do we not want
Windows newlines to be accepted?

But more generally doesn't one of the strbuf_getline_*() functions do
the right thing here already?

> +
> +		if (state == BATCH_STATE_INPUT && !cmd){
> +			string_list_append(&revs, input.buf);

Nit: You can save yourself some malloc() churn here with:

    string_list_append_nodup(..., strbuf_detach(&input, NULL));

I.e. we're looping over the input, here we're done, so we might as well
steal the already alloc'd string....

> +			continue;
> +		}
> +
> +		if (!cmd)
> +			die("unknown command: %s", input.buf);
> +
> +		state = cmd->next_state;
> +		cmd->fn(opt, p, output, data, revs);
> +	}
> +	strbuf_release(&input);
> +	string_list_clear(&revs, 0);

...and these will do the right thing, as strbuf will notice the string
is stolen (it'll be the slopbuf again), and due to the combination of
*_DUP and *_nodup() we'll properly free it here too.

> +}
> +
>  static int batch_objects(struct batch_options *opt)
>  {
>  	struct strbuf input = STRBUF_INIT;
> @@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt)
>  	struct expand_data data;
>  	int save_warning;
>  	int retval = 0;
> +	const int command = opt->command;
>  
>  	if (!opt->format)
>  		opt->format = "%(objectname) %(objecttype) %(objectsize)";
> @@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt)
>  	save_warning = warn_on_object_refname_ambiguity;
>  	warn_on_object_refname_ambiguity = 0;
>  
> -	while (strbuf_getline(&input, stdin) != EOF) {
> -		if (data.split_on_whitespace) {
> -			/*
> -			 * Split at first whitespace, tying off the beginning
> -			 * of the string and saving the remainder (or NULL) in
> -			 * data.rest.
> -			 */
> -			char *p = strpbrk(input.buf, " \t");
> -			if (p) {
> -				while (*p && strchr(" \t", *p))
> -					*p++ = '\0';
> +	if (command)
> +		batch_objects_command(opt, &output, &data);
> +	else {

Style: {} braces for all arms if one requires it.

> +		while (strbuf_getline(&input, stdin) != EOF) {
> +			if (data.split_on_whitespace) {

diff nit: maybe we can find some way to not require re-indenting the existing code. E.g.:
	
	if (command) {
		batch_objects_command(...);
		goto cleanup;
	}

...

> +				/*
> +				 * Split at first whitespace, tying off the beginning
> +				 * of the string and saving the remainder (or NULL) in
> +				 * data.rest.
> +				 */
> +				char *p = strpbrk(input.buf, " \t");
> +				if (p) {
> +					while (*p && strchr(" \t", *p))
> +						*p++ = '\0';
> +				}
> +				data.rest = p;
>  			}
> -			data.rest = p;
> +			batch_one_object(input.buf, &output, opt, &data);
>  		}
> -
> -		batch_one_object(input.buf, &output, opt, &data);
>  	}
>  

...and then just add a "cleanup:" label here.
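
As a stand-alone illustration of that shape (not the actual batch_objects() code), the sketch below branches to a shared cleanup label so the pre-existing path keeps its indentation. `process()` and `handled_legacy` are hypothetical, and the malloc() merely stands in for whatever resources the real function releases at the end.

```c
#include <stdlib.h>

/*
 * Hypothetical model of the suggested control flow. Sets *handled_legacy
 * to 1 only when the pre-existing path ran. Returns 0 on success.
 */
static int process(int command_mode, int *handled_legacy)
{
	char *scratch = malloc(64); /* stands in for resources freed at the end */
	int ret = 0;

	if (!scratch)
		return -1;
	*handled_legacy = 0;

	if (command_mode) {
		/* the new code path would be called here */
		goto cleanup;
	}

	/* pre-existing loop, left at its original indentation */
	*handled_legacy = 1;

cleanup:
	free(scratch);
	return ret;
}
```

Both paths release `scratch` exactly once, and the diff for the old loop stays empty.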

>  	strbuf_release(&input);
> @@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt,
>  
>  	bo->enabled = 1;
>  	bo->print_contents = !strcmp(opt->long_name, "batch");
> +	bo->command = !strcmp(opt->long_name, "batch-command");
>  	bo->format = arg;
>  
>  	return 0;
> @@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  			N_("show info about objects fed from the standard input"),
>  			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>  			batch_option_callback),
> +		OPT_CALLBACK_F(0, "batch-command", &batch, N_(""),

You're either missing a string here in "", or we don't need N_() to mark
it for translation.

> +			 N_("enters batch mode that accepts commands"),
> +			 PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
> +			 batch_option_callback),
> +
>  		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
>  			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
>  		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 39382fa1958..7360d049113 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -85,6 +85,34 @@ $content"
>  	test_cmp expect actual
>      '
>  
> +    test -z "$content" ||
> +    test_expect_success "--batch-command output of $type content is correct" '
> +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +	maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual &&
> +	test_cmp expect actual
> +    '
> +
> +    test -z "$content" ||
> +    test_expect_success "--batch-command session for $type content is correct" '
> +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +	maybe_remove_timestamp \
> +		"$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \
> +		$no_ts >actual &&
> +	test_cmp expect actual
> +    '
> +
> +    test_expect_success "--batch-command output of $type info is correct" '
> +	echo "$sha1 $type $size" >expect &&
> +	echo "info $sha1" | git cat-file --batch-command >actual &&
> +	test_cmp expect actual
> +    '
> +
> +    test_expect_success "--batch-command session for $type info is correct" '
> +	echo "$sha1 $type $size" >expect &&
> +	test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual &&
> +	test_cmp expect actual
> +    '
> +
>      test_expect_success "custom --batch-check format" '
>  	echo "$type $sha1" >expect &&
>  	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
> @@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
>  '
>  
>  tree_sha1=$(git write-tree)
> +

stray newline addition.

>  tree_size=$(($(test_oid rawsz) + 13))
>  tree_pretty_content="100644 blob $hello_sha1	hello"
>  
> @@ -175,7 +204,7 @@ test_expect_success \
>      "Reach a blob from a tag pointing to it" \
>      "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>  
> -for batch in batch batch-check
> +for batch in batch batch-check batch-command
>  do
>      for opt in t s e p
>      do
> @@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>      "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>  '
>  
> +test_expect_success "--batch-command with multiple sha1s gives correct format" '
> +    echo "$batch_check_output" >expect &&
> +    echo begin >input &&
> +    echo_without_newline "$batch_check_input" >>input &&
> +    echo "get info" >>input &&
> +    git cat-file --batch-command <input >actual &&
> +    test_cmp expect actual
> +'

indentation with spaces, \t correctly used for the rest.


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-04  4:11     ` John Cai
@ 2022-02-04 16:46       ` Phillip Wood
  0 siblings, 0 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-04 16:46 UTC (permalink / raw)
  To: John Cai, Junio C Hamano
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme

Hi John

On 04/02/2022 04:11, John Cai wrote:
> Hi Junio, thanks for the review!
>  [...]
>> So, from that point of view,
>>
>>      begin <cmd>
>>      <parameter>
>>      <parameter>
>>      ...
>>      end
>>
>> may be a better design, no?
> 
> Good point. Now I'm wondering if we can simplify where commands get queued up
> and a "get" will execute them along with an implicit flush.
> 
> <cmd> <parameter>
> <cmd> <parameter>
> <cmd> <parameter>
> get

I think that would be an improvement (I'd suggest calling "get" "flush" 
instead); the begin ... get <cmd> sequence seems unnecessarily complex 
compared to the RFC of this series. If the user gives --buffer on the 
command line then we can buffer the input until we see a "flush" command 
and if the user does not give --buffer on the command line then we 
should flush the output after each command.
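
A minimal sketch of that flushing policy, under the assumption that "flush" is the command name and that every other line produces some output, could look like this. `handle_command()` is a hypothetical helper and the fprintf() is only a placeholder for the real object output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Handle one command line. `buffer_output` models whether --buffer was
 * given. Returns 1 if `out` was flushed as a result of this line, else 0.
 */
static int handle_command(FILE *out, int buffer_output, const char *line)
{
	if (!strcmp(line, "flush")) {
		fflush(out);              /* explicit flush requested by the caller */
		return 1;
	}
	fprintf(out, "response to: %s\n", line); /* placeholder output */
	if (!buffer_output) {
		fflush(out);              /* unbuffered mode: flush after each command */
		return 1;
	}
	return 0;                         /* buffered mode: wait for "flush" */
}
```

Whether "flush" without --buffer should be accepted silently, as here, or rejected is a design decision this sketch does not try to settle.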

Best Wishes

Phillip


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
@ 2022-02-04 16:51     ` Phillip Wood
  0 siblings, 0 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-04 16:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, John Cai via GitGitGadget
  Cc: git, me, e, bagasdotme, gitster, John Cai

Hi John/Ævar

On 04/02/2022 12:11, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Feb 03 2022, John Cai via GitGitGadget wrote:
> 
>> From: John Cai <johncai86@gmail.com>
>> +		if (state == BATCH_STATE_INPUT && !cmd){
>> +			string_list_append(&revs, input.buf);
> 
> Nit: You can save yourself some malloc() churn here with:
> 
>      string_list_append_nodup(..., strbuf_detach(&input, NULL));
> 
> I.e. we're looping over the input, here we're done, so we might as well
> steal the already alloc'd string....

If we do that then the strbuf will have to reallocate its buffer when it 
reads the next input line on the next iteration of the loop. As the 
strbuf will likely have over-allocated the buffer, using 
string_list_append_nodup() + strbuf_detach() will be less efficient 
overall than using string_list_append().
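
To make the trade-off easier to reason about, here is a toy model in plain C. It is not strbuf or string_list: `struct linebuf`, `collect()` and the 64-byte minimum capacity are all assumptions made for this sketch, and the real growth policy differs. It counts allocations and the bytes retained by the list for both strategies.

```c
#include <stdlib.h>
#include <string.h>

#define MIN_CAP 64 /* assumed minimum capacity of the reusable line buffer */

struct linebuf { char *buf; size_t cap; };
struct cost { size_t allocs; size_t retained; };

/* malloc() wrapper that counts calls; the list array itself is not counted. */
static char *xalloc(struct cost *c, size_t n)
{
	char *p = malloc(n);

	if (!p)
		abort();
	c->allocs++;
	return p;
}

/* Load `line` into the reusable buffer, allocating only if it does not fit. */
static void linebuf_read(struct linebuf *lb, const char *line, struct cost *c)
{
	size_t need = strlen(line) + 1;

	if (need > lb->cap) {
		free(lb->buf);
		lb->cap = need > MIN_CAP ? need : MIN_CAP;
		lb->buf = xalloc(c, lb->cap);
	}
	memcpy(lb->buf, line, need);
}

/* Collect `n` lines into a list; `detach` selects the hand-over strategy. */
static struct cost collect(const char **lines, size_t n, int detach)
{
	struct linebuf lb = { NULL, 0 };
	struct cost c = { 0, 0 };
	char **list = calloc(n ? n : 1, sizeof(*list));
	size_t i;

	if (!list)
		abort();
	for (i = 0; i < n; i++) {
		linebuf_read(&lb, lines[i], &c);
		if (detach) {
			list[i] = lb.buf;          /* steal the whole block */
			c.retained += lb.cap;
			lb.buf = NULL;             /* next read must allocate again */
			lb.cap = 0;
		} else {
			size_t len = strlen(lb.buf) + 1;

			list[i] = xalloc(&c, len); /* exact-size copy */
			memcpy(list[i], lb.buf, len);
			c.retained += len;
		}
	}
	for (i = 0; i < n; i++)
		free(list[i]);
	free(list);
	free(lb.buf);
	return c;
}
```

In this model, detaching still costs one allocation per line, because the buffer has to be recreated each time, and every list entry keeps the over-allocated block. Copying reuses one buffer and keeps exact-size strings. How this plays out with the real allocator and strbuf growth policy would need measuring.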

Best Wishes

Phillip

>> +			continue;
>> +		}
>> +
>> +		if (!cmd)
>> +			die("unknown command: %s", input.buf);
>> +
>> +		state = cmd->next_state;
>> +		cmd->fn(opt, p, output, data, revs);
>> +	}
>> +	strbuf_release(&input);
>> +	string_list_clear(&revs, 0);
> 
> ...and these will do the right thing, as strbuf will notice the string
> is stolen (it'll be the slopbuf again), and due to the combination of
> *_DUP and *_nodup() we'll properly free it here too.
> 
>> +}
>> +
>>   static int batch_objects(struct batch_options *opt)
>>   {
>>   	struct strbuf input = STRBUF_INIT;
>> @@ -519,6 +665,7 @@ static int batch_objects(struct batch_options *opt)
>>   	struct expand_data data;
>>   	int save_warning;
>>   	int retval = 0;
>> +	const int command = opt->command;
>>   
>>   	if (!opt->format)
>>   		opt->format = "%(objectname) %(objecttype) %(objectsize)";
>> @@ -594,22 +741,25 @@ static int batch_objects(struct batch_options *opt)
>>   	save_warning = warn_on_object_refname_ambiguity;
>>   	warn_on_object_refname_ambiguity = 0;
>>   
>> -	while (strbuf_getline(&input, stdin) != EOF) {
>> -		if (data.split_on_whitespace) {
>> -			/*
>> -			 * Split at first whitespace, tying off the beginning
>> -			 * of the string and saving the remainder (or NULL) in
>> -			 * data.rest.
>> -			 */
>> -			char *p = strpbrk(input.buf, " \t");
>> -			if (p) {
>> -				while (*p && strchr(" \t", *p))
>> -					*p++ = '\0';
>> +	if (command)
>> +		batch_objects_command(opt, &output, &data);
>> +	else {
> 
> Style: {} braces for all arms if one requires it.
> 
>> +		while (strbuf_getline(&input, stdin) != EOF) {
>> +			if (data.split_on_whitespace) {
> 
> diff nit: maybe we can find some way to not require re-indenting the existing code. E.g.:
> 	
> 	if (command) {
> 		batch_objects_command(...);
> 	        goto cleanup;
> 	}
> 
> ...
> 
>> +				/*
>> +				 * Split at first whitespace, tying off the beginning
>> +				 * of the string and saving the remainder (or NULL) in
>> +				 * data.rest.
>> +				 */
>> +				char *p = strpbrk(input.buf, " \t");
>> +				if (p) {
>> +					while (*p && strchr(" \t", *p))
>> +						*p++ = '\0';
>> +				}
>> +				data.rest = p;
>>   			}
>> -			data.rest = p;
>> +			batch_one_object(input.buf, &output, opt, &data);
>>   		}
>> -
>> -		batch_one_object(input.buf, &output, opt, &data);
>>   	}
>>   
> 
> ...and then just add a "cleanup:" label here.
> 
>>   	strbuf_release(&input);
>> @@ -646,6 +796,7 @@ static int batch_option_callback(const struct option *opt,
>>   
>>   	bo->enabled = 1;
>>   	bo->print_contents = !strcmp(opt->long_name, "batch");
>> +	bo->command = !strcmp(opt->long_name, "batch-command");
>>   	bo->format = arg;
>>   
>>   	return 0;
>> @@ -682,6 +833,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>>   			N_("show info about objects fed from the standard input"),
>>   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>>   			batch_option_callback),
>> +		OPT_CALLBACK_F(0, "batch-command", &batch, N_(""),
> 
> You're either missing a string here in "", or we don't need N_() to mark
> it for translation.
> 
>> +			 N_("enters batch mode that accepts commands"),
>> +			 PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>> +			 batch_option_callback),
>> +
>>   		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
>>   			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
>>   		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
>> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
>> index 39382fa1958..7360d049113 100755
>> --- a/t/t1006-cat-file.sh
>> +++ b/t/t1006-cat-file.sh
>> @@ -85,6 +85,34 @@ $content"
>>   	test_cmp expect actual
>>       '
>>   
>> +    test -z "$content" ||
>> +    test_expect_success "--batch-command output of $type content is correct" '
>> +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
>> +	maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual &&
>> +	test_cmp expect actual
>> +    '
>> +
>> +    test -z "$content" ||
>> +    test_expect_success "--batch-command session for $type content is correct" '
>> +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
>> +	maybe_remove_timestamp \
>> +		"$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \
>> +		$no_ts >actual &&
>> +	test_cmp expect actual
>> +    '
>> +
>> +    test_expect_success "--batch-command output of $type info is correct" '
>> +	echo "$sha1 $type $size" >expect &&
>> +	echo "info $sha1" | git cat-file --batch-command >actual &&
>> +	test_cmp expect actual
>> +    '
>> +
>> +    test_expect_success "--batch-command session for $type info is correct" '
>> +	echo "$sha1 $type $size" >expect &&
>> +	test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual &&
>> +	test_cmp expect actual
>> +    '
>> +
>>       test_expect_success "custom --batch-check format" '
>>   	echo "$type $sha1" >expect &&
>>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
>> @@ -141,6 +169,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
>>   '
>>   
>>   tree_sha1=$(git write-tree)
>> +
> 
> stray newline addition.
> 
>>   tree_size=$(($(test_oid rawsz) + 13))
>>   tree_pretty_content="100644 blob $hello_sha1	hello"
>>   
>> @@ -175,7 +204,7 @@ test_expect_success \
>>       "Reach a blob from a tag pointing to it" \
>>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>>   
>> -for batch in batch batch-check
>> +for batch in batch batch-check batch-command
>>   do
>>       for opt in t s e p
>>       do
>> @@ -281,6 +310,15 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>>   '
>>   
>> +test_expect_success "--batch-command with multiple sha1s gives correct format" '
>> +    echo "$batch_check_output" >expect &&
>> +    echo begin >input &&
>> +    echo_without_newline "$batch_check_input" >>input &&
>> +    echo "get info" >>input &&
>> +    git cat-file --batch-command <input >actual &&
>> +    test_cmp expect actual
>> +'
> 
> indentation with spaces, \t correctly used for the rest.


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-04  6:45   ` Eric Sunshine
@ 2022-02-04 21:41     ` John Cai
  2022-02-05  6:52       ` Eric Sunshine
  0 siblings, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-04 21:41 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: John Cai via GitGitGadget, Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano

Hi Eric,

I appreciate the feedback and the thorough review!

On 4 Feb 2022, at 1:45, Eric Sunshine wrote:

> On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Add new flag --batch-command that will accept commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>>
>> contents <object> NL
>> info <object> NL
>>
>> With the following commands, process (A) can begin a "session" and
>> send a list of object names over stdin. When "get contents" or "get info"
>> is issued, this list of object names will be fed into batch_one_object()
>> to retrieve either info or contents. Finally an fflush() will be called
>> to end the session.
>>
>> begin
>> <sha1>
>> get info
>>
>> begin
>> <sha1>
>> get contents
>
> I had the same reaction to this new command set as Junio expressed
> upstream[1], and was prepared to suggest an alternative but Junio said
> everything I wanted to say, so I won't say anything more about it
> here.
>
> That aside, the implementation presented here is overly loose about
> malformed input and accepts all sorts of bogus invocations which it
> should be reporting as errors. For instance, a lone:
>
>     get info
>
> without a preceding `begin` should be an error but such bogus input is
> not diagnosed. `get contents` is likewise affected. Another example:
>
>     begin
>     <oid>
>     <EOF>
>
> which lacks a closing `get info` or `get contents` is silently
> ignored. Similarly, malformed:
>
>     begin
>     begin
>     ...
>
> should be reported as an error but is not.
>
> There is also a bug in which the accumulated list of OID's is never
> cleared. Thus:
>
>     begin
>     <oid>
>     get info
>     get info
>
> emits info about <oid> twice, once for each `get info` invocation. Similarly:
>
>     begin
>     <oid1>
>     get info
>     <oid2>
>     get info
>
> emits information about <oid1> twice and <oid2> once. Invoking `begin`
> between the `get info` commands doesn't help because `begin` is a
> no-op. Thus:
>
>     begin
>     <oid1>
>     get info
>     begin
>     <oid2>
>     get info
>
> likewise emits information about <oid1> twice and <oid2> once.
>
> It also incorrectly accepts non-"session" commands in the middle of a
> session. For instance:
>
>     begin
>     <oid1>
>     info <oid2>
>     <oid3>
>
> immediately emits information about <oid2> and then bombs out claiming
> that <oid3> is an "unknown command" because the `info` command --
> which should not have been allowed within a session -- prematurely
> ended the "session".
>
> The `info` and `contents` commands neglect to do any sort of
> validation of their arguments, thus any and all bogus invocations are
> accepted. Thus:
>
>     info
>     info <arg1> <arg2>
>     info <non-oid>
>
> are all accepted as valid invocations, misleadingly producing "<foo>
> missing" messages, rather than erroring out as they should. The "<oid>
> missing" message should be reserved for the case when the lone
> argument to `info` or `contents` is something which looks like a
> legitimate OID.

So actually the argument to info and contents doesn't have to be an OID but an object name that
eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the
same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My
goal was to maintain this behavior in --batch-command.
>
> The above is a long-winded way of saying that it is important not only
> to check the obvious "expect success" cases when implementing a new
> feature, but it's also important to add checks for the "expect
> failure" cases, such as all the above malformed inputs.

I overall agree and will add test coverage for invalid input.

> It's subjective, but it feels like this implementation is trying to be
> too clever by handling all these cases via a single strbuf_getline()
> loop in batch_objects_command(). It would be easier to reason about,
> and would have avoided some of the above problems, for instance, if
> handling of `begin` was split out to its own function which looped
> over strbuf_getline() itself, thus could easily have detected
> premature EOF (lacking `get info` or `get contents`) and likewise
> would not have allowed `info` or `contents` commands to be executed
> within a "session".
>
> Similarly (again subjective), the generic command dispatcher seems
> over-engineered and perhaps contributed to the oversight of `info` and
> `contents` failing to perform validation of their arguments. A simpler
> hand-rolled command-response loop is more common on this project and
> often easier to reason about. Perhaps something like this
> (pseudo-code):
>
>     while (strbuf_getline(&input, stdin) != EOF) {
>         char *end_cmd = strchrnul(input.buf, ' ');
>         const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd;
>         *end_cmd = '\0';
>         if (!strcmp(input.buf, "info"))
>             show_info(argstart);
>         else if (!strcmp(input.buf, "contents"))
>             show_contents(argstart);
>         else if (!strcmp(input.buf, "begin"))
>             begin_session(argstart);
>         else
>             die(...);
>     }

I think I agree that the code in this patch is a little hard to follow,
but now I'm planning to simplify the design by getting rid of "begin"[2].

I'm hoping this change will make the code easier to reason about.

2. https://lore.kernel.org/git/767d5f5a-8395-78bc-865f-a39acc39e061@gmail.com/

>
> and each of the command-handler functions would perform its own
> argument validation (including the case when no argument should be
> present).
>
> [1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/
>
>> +static void parse_cmd_begin(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       /* nothing needs to be done here */
>> +}
>
> Either this function should be clearing the list of collected OID's or...
>
>> +static void parse_cmd_get(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       struct string_list_item *item;
>> +       for_each_string_list_item(item, &revs) {
>> +               batch_one_object(item->string, output, opt, data);
>> +       }
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>
> ... this function should do so after it's done processing the OID list.
>
>> +}
>> +static void parse_cmd_get_info(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>
> Missing blank line before the function definition.
>
>> +static void parse_cmd_get_objects(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       opt->print_contents = 1;
>> +       parse_cmd_get(opt, line, output, data, revs);
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>> +}
>
> Why does this function duplicate the flushing logic of parse_cmd_get()
> immediately after calling parse_cmd_get()?
>
>> +static void batch_objects_command(struct batch_options *opt,
>> +                                   struct strbuf *output,
>> +                                   struct expand_data *data)
>> +{
>> +       /* Read each line dispatch its command */
>
> Missing "and".
>
>> +       while (!strbuf_getline(&input, stdin)) {
>> +               if (state == BATCH_STATE_COMMAND) {
>> +                       if (*input.buf == '\n')
>> +                               die("empty command in input");
>
> This never triggers because strbuf_getline() removes the line
> terminator before returning.

Yes, this was an oversight. Thanks for catching it!

>
>> +                       else if (isspace(*input.buf))
>> +                               die("whitespace before command: %s", input.buf);
>> +               }
>> +
>> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
>> +                       if (!skip_prefix(input.buf, prefix, &cmd_end))
>> +                               continue;
>> +                       /*
>> +                        * If the command has arguments, verify that it's
>> +                        * followed by a space. Otherwise, it shall be followed
>> +                        * by a line terminator.
>> +                        */
>> +                       c = commands[i].takes_args ? ' ' : '\n';
>> +                       if (*cmd_end && *cmd_end != c)
>> +                               die("arguments invalid for command: %s", commands[i].prefix);
>
> Ditto regarding strbuf_getline() removing line-terminator.
>
>> +                       cmd = &commands[i];
>> +                       if (cmd->takes_args)
>> +                               p = cmd_end + 1;
>> +                       break;
>> +               }
>> +
>> +               if (input.buf[input.len - 1] == '\n')
>> +                       input.buf[--input.len] = '\0';
>
> Ditto again. This is especially scary since it's accessing element
> `input.len-1` without even checking if `input.len` is greater than
> zero.

I realize we actually don't need this if block at all because strbuf_getline()
strips the line terminator for us already.
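
To illustrate the point, here is a minimal standalone sketch, not code from
the patch. chomp_line() and is_empty_command() are hypothetical helpers; the
first only mimics the "terminator is already stripped" behavior for a plain
LF. Once the reader has removed the terminator, an empty command shows up as
a zero-length line, and any access to buf[len - 1] has to be guarded by a
len > 0 check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): strip one trailing '\n' from a
 * buffer of length len, the way a line reader would before handing the
 * line to its caller. Returns the new length. The len > 0 guard is what
 * keeps buf[len - 1] from reading before the start of the buffer.
 */
static size_t chomp_line(char *buf, size_t len)
{
	assert(buf != NULL);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}

/*
 * Caller-side check for an empty command. Because the terminator is
 * already gone, the test is on the length, not on a '\n' character.
 */
static int is_empty_command(const char *line, size_t len)
{
	(void)line; /* only the length matters once the line is chomped */
	return len == 0;
}
```

With this shape, a check such as `*input.buf == '\n'` can never fire, which
matches the review comment above.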

>
> Also, it's better to use published strbuf API rather than mucking with
> the internals:
>
>     strbuf_setlen(&input, input.len - 1);
>
>> +               if (state == BATCH_STATE_INPUT && !cmd){
>> +                       string_list_append(&revs, input.buf);
>> +                       continue;
>> +               }
>> +
>> +               if (!cmd)
>> +                       die("unknown command: %s", input.buf);
>> +
>> +               state = cmd->next_state;
>> +               cmd->fn(opt, p, output, data, revs);
>> +       }
>> +       strbuf_release(&input);
>> +       string_list_clear(&revs, 0);
>> +}


* Re: [PATCH 2/2] catfile.c: add --batch-command mode
  2022-02-04 21:41     ` John Cai
@ 2022-02-05  6:52       ` Eric Sunshine
  0 siblings, 0 replies; 97+ messages in thread
From: Eric Sunshine @ 2022-02-05  6:52 UTC (permalink / raw)
  To: John Cai
  Cc: John Cai via GitGitGadget, Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano

On Fri, Feb 4, 2022 at 4:42 PM John Cai <johncai86@gmail.com> wrote:
> On 4 Feb 2022, at 1:45, Eric Sunshine wrote:
> > The `info` and `contents` commands neglect to do any sort of
> > validation of their arguments, thus any and all bogus invocations are
> > accepted. Thus:
> >
> >     info
> >     info <arg1> <arg2>
> >     info <non-oid>
> >
> > are all accepted as valid invocations, misleadingly producing "<foo>
> > missing" messages, rather than erroring out as they should. The "<oid>
> > missing" message should be reserved for the case when the lone
> > argument to `info` or `contents` is something which looks like a
> > legitimate OID.
>
> So actually the argument to info and contents doesn't have to be an OID but an object name that
> eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the
> same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My
> goal was to maintain this behavior in --batch-command.

Okay, that makes sense. I was misled by the commit message talking
about "<sha1>" in its examples, and didn't do my due diligence by
digging more deeply into the existing documentation and code.


* [PATCH v2 0/2] Add cat-file --batch-command flag
  2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
  2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-07 16:33 ` John Cai via GitGitGadget
  2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-07 16:33 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has two parts:

 1. preparation patch to rename a variable
 2. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v1:

 * simplified the "session" mechanism: in --buffer mode, "flush" executes all
   commands entered since the last "flush"
 * when not in --buffer mode, each command is executed and its output flushed
   immediately
 * renamed cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * clarified verbiage in commit messages
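
The queue-then-flush behavior in --buffer mode can be pictured with a small
standalone sketch. This is not the code from the patch: every name in it
(cmd_fn, cmd_queue, queue_cmd, flush_cmds) is hypothetical, and the two
command handlers only print a label instead of looking up objects. It shows
commands being recorded as they arrive and executed in order only when a
flush is requested:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the code from the patch. */
typedef void (*cmd_fn)(const char *arg, FILE *out);

struct queued_cmd {
	cmd_fn fn;
	char *arg; /* owned copy of the command argument */
};

struct cmd_queue {
	struct queued_cmd *items;
	size_t nr, alloc;
};

/* Stand-ins for the real handlers: they only echo what was requested. */
static void cmd_info(const char *arg, FILE *out)
{
	fprintf(out, "info %s\n", arg);
}

static void cmd_contents(const char *arg, FILE *out)
{
	fprintf(out, "contents %s\n", arg);
}

/* Append one command to the queue, growing the array as needed. */
static void queue_cmd(struct cmd_queue *q, cmd_fn fn, const char *arg)
{
	size_t len = strlen(arg) + 1;
	char *copy = malloc(len);

	assert(copy != NULL);
	memcpy(copy, arg, len);

	if (q->nr == q->alloc) {
		q->alloc = q->alloc ? 2 * q->alloc : 4;
		q->items = realloc(q->items, q->alloc * sizeof(*q->items));
		assert(q->items != NULL);
	}
	q->items[q->nr].fn = fn;
	q->items[q->nr].arg = copy;
	q->nr++;
}

/*
 * Run every queued command in order, empty the queue, then flush the
 * output stream. Returns the number of commands that were executed.
 */
static size_t flush_cmds(struct cmd_queue *q, FILE *out)
{
	size_t i, ran = q->nr;

	for (i = 0; i < q->nr; i++) {
		q->items[i].fn(q->items[i].arg, out);
		free(q->items[i].arg);
	}
	q->nr = 0;
	fflush(out);
	return ran;
}
```

Without --buffer there would be nothing to queue: each command would run and
be flushed as soon as it is read, so the caller never has to send "flush".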

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/ B.
https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/ C.
https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (2):
  cat-file: rename cmdmode to transform_mode
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  19 ++++
 builtin/cat-file.c             | 138 +++++++++++++++++++++--
 t/t1006-cat-file.sh            | 197 ++++++++++++++++++++++++++++++++-
 3 files changed, 346 insertions(+), 8 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v2
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v1:

 1:  86df0c9e4df ! 1:  2d9a0b06ce5 cat-file.c: rename cmdmode to mode
     @@ Metadata
      Author: John Cai <johncai86@gmail.com>
      
       ## Commit message ##
     -    cat-file.c: rename cmdmode to mode
     +    cat-file: rename cmdmode to transform_mode
      
     -    To prepare for a new flag --batch-command, we will add a flag that
     -    indicates whether or not an interactive command mode will be used
     -    that reads commands and arguments off of stdin.
     +    When introducing a new flag --batch-command, we will add a flag on the
     +    batch_options struct that indicates whether or not an interactive
     +    command mode will be used that reads commands and arguments off of
     +    stdin.
      
          An intuitive name for this flag would be "command", which can get
          confusing with the already existing cmdmode.
      
     -    Rename cmdmode->mode to prepare for this change.
     +    cmdmode refers to how the result output of the blob will be transformed,
     +    either according to --filter or --textconv. So transform_mode is a more
     +    descriptive name for the flag, and will not get confused with the new
     +    command flag to be added in the next commit.
     +
     +    Rename cmdmode to transform_mode in cat-file.c
      
          Signed-off-by: John Cai <johncai86@gmail.com>
      
     @@ builtin/cat-file.c: struct batch_options {
       	int all_objects;
       	int unordered;
      -	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
     -+	int mode; /* may be 'w' or 'c' for --filters or --textconv */
     ++	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
       	const char *format;
       };
       
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
       		if (opt->buffer_output)
       			fflush(stdout);
      -		if (opt->cmdmode) {
     -+		if (opt->mode) {
     ++		if (opt->transform_mode) {
       			char *contents;
       			unsigned long size;
       
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
       				die("missing path for '%s'", oid_to_hex(oid));
       
      -			if (opt->cmdmode == 'w') {
     -+			if (opt->mode == 'w') {
     ++			if (opt->transform_mode == 'w') {
       				if (filter_object(data->rest, 0100644, oid,
       						  &contents, &size))
       					die("could not convert '%s' %s",
       					    oid_to_hex(oid), data->rest);
      -			} else if (opt->cmdmode == 'c') {
     -+			} else if (opt->mode == 'c') {
     ++			} else if (opt->transform_mode == 'c') {
       				enum object_type type;
       				if (!textconv_object(the_repository,
       						     data->rest, 0100644, oid,
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
       					    oid_to_hex(oid), data->rest);
       			} else
      -				BUG("invalid cmdmode: %c", opt->cmdmode);
     -+				BUG("invalid mode: %c", opt->mode);
     ++				BUG("invalid transform_mode: %c", opt->transform_mode);
       			batch_write(opt, contents, size);
       			free(contents);
       		} else {
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	data.mark_query = 0;
       	strbuf_release(&output);
      -	if (opt->cmdmode)
     -+	if (opt->mode)
     ++	if (opt->transform_mode)
       		data.split_on_whitespace = 1;
       
       	/*
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
     - 
     - 	batch.buffer_output = -1;
     - 	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
     --
     - 	if (opt) {
     - 		if (batch.enabled && (opt == 'c' || opt == 'w'))
     --			batch.cmdmode = opt;
     -+			batch.mode = opt;
     - 		else if (argc == 1)
     - 			obj_name = argv[0];
     - 		else
     -@@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
     - 			usage_with_options(cat_file_usage, options);
     - 	}
     + 	/* Return early if we're in batch mode? */
       	if (batch.enabled) {
     --		if (batch.cmdmode != opt || argc)
     -+		if (batch.mode != opt || argc)
     - 			usage_with_options(cat_file_usage, options);
     --		if (batch.cmdmode && batch.all_objects)
     -+		if (batch.mode && batch.all_objects)
     - 			die("--batch-all-objects cannot be combined with "
     - 			    "--textconv nor with --filters");
     - 	}
     + 		if (opt_cw)
     +-			batch.cmdmode = opt;
     ++			batch.transform_mode = opt;
     + 		else if (opt && opt != 'b')
     + 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
     + 				       usage, options, opt);
 2:  ebd2a135601 ! 2:  1b63164ad4d catfile.c: add --batch-command mode
     @@ Metadata
      Author: John Cai <johncai86@gmail.com>
      
       ## Commit message ##
     -    catfile.c: add --batch-command mode
     +    cat-file: add --batch-command mode
      
     -    Add new flag --batch-command that will accept commands and arguments
     +    Add a new flag --batch-command that accepts commands and arguments
          from stdin, similar to git-update-ref --stdin.
      
          At GitLab, we use a pair of long running cat-file processes when
          accessing object content. One for iterating over object metadata with
          --batch-check, and the other to grab object contents with --batch.
      
     -    However, if we had --batch-command, we wouldnt need to keep both
     -    processes around, and instead just have one process where we can flip
     -    between getting object info, and getting object contents. This means we
     -    can get rid of roughly half of long lived git cat-file processes. This
     -    can lead to huge savings since on a given server there could be hundreds
     -    of git cat-file processes running.
     +    However, if we had --batch-command, we wouldn't need to keep both
     +    processes around, and instead just have one --batch-command process
     +    where we can flip between getting object info, and getting object
     +    contents. Since we have a pair of cat-file processes per repository,
     +    this means we can get rid of roughly half of long lived git cat-file
     +    processes. Given there are many repositories being accessed at any given
     +    time, this can lead to huge savings on a given server.
      
          git cat-file --batch-command
      
     -    would enter an interactive command mode whereby the user can enter in
     -    commands and their arguments:
     +    will enter an interactive command mode whereby the user can enter in
     +    commands and their arguments that get queued in memory:
      
     -    <command> [arg1] [arg2] NL
     +    <command1> [arg1] [arg2] NL
     +    <command2> [arg1] [arg2] NL
      
     -    This patch adds the basic structure for add command which can be
     -    extended in the future to add more commands.
     +    When --buffer mode is used, commands will be queued in memory until a
     +    flush command is issued that executes them:
      
     -    This patch also adds the following commands:
     +    flush NL
     +
     +    The reason for a flush command is that when a consumer process (A)
     +    talks to a git cat-file process (B) and interactively writes to and
     +    reads from it in --buffer mode, (A) needs to be able to control when
     +    the buffer is flushed to stdout.
     +
     +    Currently, from (A)'s perspective, the only way is to either
     +
     +    1. kill (B)'s process
     +    2. send an invalid object to stdin.
     +
     +    1. is not ideal from a performance perspective as it will require
     +    spawning a new cat-file process each time, and 2. is hacky and not a
     +    good long term solution.
     +
     +    With this mechanism of queueing up commands and letting (A) issue a
     +    flush command, process (A) can control when the buffer is flushed and
     +    can guarantee it will receive all of the output when in --buffer mode.
     +
     +    This patch adds the basic structure for adding commands, which can be
     +    extended in the future to add more commands. It also adds the following
     +    two commands (on top of the flush command):
      
          contents <object> NL
          info <object> NL
     @@ Commit message
          The info command takes a <object> argument and prints out the object
          metadata.
      
     -    In addition, we need a set of commands that enable a "read session".
     -
     -    When a separate process (A) is connected to a git cat-file process (B)
     -    and is interactively writing to and reading from it in --buffer mode,
     -    (A) needs to be able to know when the buffer is flushed to stdout.
     -    Currently, from (A)'s perspective, the only way is to either 1. exit
     -    (B)'s process or 2. send an invalid object to stdin. 1. is not ideal
     -    from a performance perspective as it will require spawning a new cat-file
     -    process each time, and 2. is hacky and not a good long term solution.
     +    These can be used in the following way with --buffer:
      
     -    With the following commands, process (A) can begin a "session" and
     -    send a list of object names over stdin. When "get contents" or "get info"
     -    is issued, this list of object names will be fed into batch_one_object()
     -    to retrieve either info or contents. Finally an fflush() will be called
     -    to end the session.
     +    contents <sha1> NL
     +    info <sha1> NL
     +    info <sha1> NL
     +    contents <sha1> NL
     +    flush
     +    contents <sha1> NL
     +    flush
      
     -    begin NL
     -    get contents NL
     -    get info NL
     +    When used without --buffer:
      
     -    These can be used in the following way:
     -
     -    begin
     -    <sha1>
     -    <sha1>
     -    <sha1>
     -    <sha1>
     -    get info
     -
     -    begin
     -    <sha1>
     -    <sha1>
     -    <sha1>
     -    <sha1>
     -    get contents
     -
     -    With this mechanism, process (A) can be guaranteed to receive all of the
     -    output even in --buffer mode.
     +    contents <sha1> NL
     +    info <sha1> NL
     +    info <sha1> NL
     +    contents <sha1> NL
     +    contents <sha1> NL
      
          Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: John Cai <johncai86@gmail.com>
     @@ Documentation/git-cat-file.txt: OPTIONS
      +info <object>::
      +	Print object info for object reference <object>
      +
     -+begin::
     -+	Begins a session to read object names off of stdin. A session can be
     -+	terminated with `get contents` or `get info`.
     -+
     -+get contents::
     -+	After a read session is begun with the `begin` command, and object
     -+	names have been fed into stdin, end the session and retrieve contents of
     -+	all the objects requested.
     -+
     -+get info::
     -+	After a read session is begun with the `begin` command, and object
     -+	names have been fed into stdin, end the session and retrieve info of
     -+	all the objects requested.
     ++flush::
     ++	Execute all preceding commands that were issued since the beginning or
     ++	since the last flush command was issued. Only used with --buffer. When
     ++	--buffer is not used, commands are flushed each time without issuing
     ++	`flush`.
      +
       --batch-all-objects::
       	Instead of reading a list of objects on stdin, perform the
     @@ Documentation/git-cat-file.txt: OPTIONS
       ## builtin/cat-file.c ##
      @@ builtin/cat-file.c: struct batch_options {
       	int unordered;
     - 	int mode; /* may be 'w' or 'c' for --filters or --textconv */
     + 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
       	const char *format;
      +	int command;
       };
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
       				      data);
       }
       
     -+static void parse_cmd_object(struct batch_options *opt,
     ++typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
     ++			       struct strbuf *, struct expand_data *);
     ++
     ++struct queued_cmd {
     ++	parse_cmd_fn_t fn;
     ++	const char *line;
     ++};
     ++
     ++static void parse_cmd_contents(struct batch_options *opt,
      +			     const char *line,
      +			     struct strbuf *output,
     -+			     struct expand_data *data,
     -+			     struct string_list revs)
     ++			     struct expand_data *data)
      +{
      +	opt->print_contents = 1;
      +	batch_one_object(line, output, opt, data);
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +static void parse_cmd_info(struct batch_options *opt,
      +			   const char *line,
      +			   struct strbuf *output,
     -+			   struct expand_data *data,
     -+			   struct string_list revs)
     ++			   struct expand_data *data)
      +{
      +	opt->print_contents = 0;
      +	batch_one_object(line, output, opt, data);
      +}
      +
     -+static void parse_cmd_begin(struct batch_options *opt,
     -+			   const char *line,
     -+			   struct strbuf *output,
     -+			   struct expand_data *data,
     -+			   struct string_list revs)
     ++static void flush_batch_calls(struct batch_options *opt,
     ++		struct strbuf *output,
     ++		struct expand_data *data,
     ++		struct queued_cmd *cmds,
     ++		int queued)
      +{
     -+	/* nothing needs to be done here */
     -+}
     -+
     -+static void parse_cmd_get(struct batch_options *opt,
     -+			   const char *line,
     -+			   struct strbuf *output,
     -+			   struct expand_data *data,
     -+			   struct string_list revs)
     -+{
     -+	struct string_list_item *item;
     -+	for_each_string_list_item(item, &revs) {
     -+		batch_one_object(item->string, output, opt, data);
     ++	int i;
     ++	for(i = 0; i < queued; i++){
     ++		cmds[i].fn(opt, cmds[i].line, output, data);
      +	}
     -+	if (opt->buffer_output)
     -+		fflush(stdout);
     -+}
     -+static void parse_cmd_get_info(struct batch_options *opt,
     -+			   const char *line,
     -+			   struct strbuf *output,
     -+			   struct expand_data *data,
     -+			   struct string_list revs)
     -+{
     -+	opt->print_contents = 0;
     -+	parse_cmd_get(opt, line, output, data, revs);
     -+}
     -+
     -+static void parse_cmd_get_objects(struct batch_options *opt,
     -+			   const char *line,
     -+			   struct strbuf *output,
     -+			   struct expand_data *data,
     -+			   struct string_list revs)
     -+{
     -+	opt->print_contents = 1;
     -+	parse_cmd_get(opt, line, output, data, revs);
     -+	if (opt->buffer_output)
     -+		fflush(stdout);
     ++	fflush(stdout);
      +}
      +
     -+enum batch_state {
     -+	BATCH_STATE_COMMAND,
     -+	BATCH_STATE_INPUT,
     -+};
     -+
     -+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
     -+			       struct strbuf *, struct expand_data *,
     -+			       struct string_list revs);
     -+
      +static const struct parse_cmd {
      +	const char *prefix;
      +	parse_cmd_fn_t fn;
      +	unsigned takes_args;
     -+	enum batch_state next_state;
      +} commands[] = {
     -+	{ "contents", parse_cmd_object, 1, BATCH_STATE_COMMAND},
     -+	{ "info", parse_cmd_info, 1, BATCH_STATE_COMMAND},
     -+	{ "begin", parse_cmd_begin, 0, BATCH_STATE_INPUT},
     -+	{ "get info", parse_cmd_get_info, 0, BATCH_STATE_COMMAND},
     -+	{ "get contents", parse_cmd_get_objects, 0, BATCH_STATE_COMMAND},
     ++	{ "contents", parse_cmd_contents, 1},
     ++	{ "info", parse_cmd_info, 1},
      +};
      +
      +static void batch_objects_command(struct batch_options *opt,
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +				    struct expand_data *data)
      +{
      +	struct strbuf input = STRBUF_INIT;
     -+	enum batch_state state = BATCH_STATE_COMMAND;
     -+	struct string_list revs = STRING_LIST_INIT_DUP;
     ++	struct queued_cmd *cmds = NULL;
     ++	size_t alloc = 0, nr = 0;
     ++	int queued = 0;
      +
     -+	/* Read each line dispatch its command */
      +	while (!strbuf_getline(&input, stdin)) {
      +		int i;
      +		const struct parse_cmd *cmd = NULL;
      +		const char *p, *cmd_end;
     ++		struct queued_cmd call = {0};
     ++
     ++		if (!input.len)
     ++			die(_("empty command in input"));
     ++		if (isspace(*input.buf))
     ++			die(_("whitespace before command: '%s'"), input.buf);
      +
     -+		if (state == BATCH_STATE_COMMAND) {
     -+			if (*input.buf == '\n')
     -+				die("empty command in input");
     -+			else if (isspace(*input.buf))
     -+				die("whitespace before command: %s", input.buf);
     ++		if (skip_prefix(input.buf, "flush", &cmd_end)) {
     ++			if (!opt->buffer_output)
     ++				die(_("flush is only for --buffer mode"));
     ++			if (*cmd_end)
     ++				die(_("flush takes no arguments"));
     ++			if (!queued)
     ++				die(_("nothing to flush"));
     ++			flush_batch_calls(opt, output, data, cmds, queued);
     ++			queued = 0;
     ++			continue;
      +		}
      +
      +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
     -+			const char *prefix = commands[i].prefix;
     -+			char c;
     -+			if (!skip_prefix(input.buf, prefix, &cmd_end))
     ++			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
      +				continue;
     -+			/*
     -+			 * If the command has arguments, verify that it's
     -+			 * followed by a space. Otherwise, it shall be followed
     -+			 * by a line terminator.
     -+			 */
     -+			c = commands[i].takes_args ? ' ' : '\n';
     -+			if (*cmd_end && *cmd_end != c)
     -+				die("arguments invalid for command: %s", commands[i].prefix);
      +
      +			cmd = &commands[i];
      +			if (cmd->takes_args)
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			break;
      +		}
      +
     -+		if (input.buf[input.len - 1] == '\n')
     -+			input.buf[--input.len] = '\0';
     ++		if (!cmd)
     ++			die(_("unknown command: '%s'"), input.buf);
      +
     -+		if (state == BATCH_STATE_INPUT && !cmd){
     -+			string_list_append(&revs, input.buf);
     ++		if (!opt->buffer_output) {
     ++			cmd->fn(opt, p, output, data);
      +			continue;
      +		}
      +
     -+		if (!cmd)
     -+			die("unknown command: %s", input.buf);
     ++		queued++;
     ++		if (queued > nr) {
     ++			ALLOC_GROW(cmds, nr+1, alloc);
     ++			nr++;
     ++		}
      +
     -+		state = cmd->next_state;
     -+		cmd->fn(opt, p, output, data, revs);
     ++		call.fn = cmd->fn;
     ++		call.line = xstrdup(p);
     ++		cmds[queued-1] = call;
      +	}
     ++	free(cmds);
      +	strbuf_release(&input);
     -+	string_list_clear(&revs, 0);
      +}
      +
       static int batch_objects(struct batch_options *opt)
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	save_warning = warn_on_object_refname_ambiguity;
       	warn_on_object_refname_ambiguity = 0;
       
     --	while (strbuf_getline(&input, stdin) != EOF) {
     --		if (data.split_on_whitespace) {
     --			/*
     --			 * Split at first whitespace, tying off the beginning
     --			 * of the string and saving the remainder (or NULL) in
     --			 * data.rest.
     --			 */
     --			char *p = strpbrk(input.buf, " \t");
     --			if (p) {
     --				while (*p && strchr(" \t", *p))
     --					*p++ = '\0';
     -+	if (command)
     ++	if (command) {
      +		batch_objects_command(opt, &output, &data);
     -+	else {
     -+		while (strbuf_getline(&input, stdin) != EOF) {
     -+			if (data.split_on_whitespace) {
     -+				/*
     -+				 * Split at first whitespace, tying off the beginning
     -+				 * of the string and saving the remainder (or NULL) in
     -+				 * data.rest.
     -+				 */
     -+				char *p = strpbrk(input.buf, " \t");
     -+				if (p) {
     -+					while (*p && strchr(" \t", *p))
     -+						*p++ = '\0';
     -+				}
     -+				data.rest = p;
     - 			}
     --			data.rest = p;
     -+			batch_one_object(input.buf, &output, opt, &data);
     - 		}
     --
     --		batch_one_object(input.buf, &output, opt, &data);
     ++		goto cleanup;
     ++	}
     + 	while (strbuf_getline(&input, stdin) != EOF) {
     + 		if (data.split_on_whitespace) {
     + 			/*
     +@@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
     + 		batch_one_object(input.buf, &output, opt, &data);
       	}
       
     ++ cleanup:
       	strbuf_release(&input);
     + 	strbuf_release(&output);
     + 	warn_on_object_refname_ambiguity = save_warning;
      @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       
       	bo->enabled = 1;
     @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       
       	return 0;
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
     - 			N_("show info about objects fed from the standard input"),
     + 			N_("like --batch, but don't emit <contents>"),
       			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
       			batch_option_callback),
     -+		OPT_CALLBACK_F(0, "batch-command", &batch, N_(""),
     -+			 N_("enters batch mode that accepts commands"),
     -+			 PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     -+			 batch_option_callback),
     -+
     - 		OPT_BOOL(0, "follow-symlinks", &batch.follow_symlinks,
     - 			 N_("follow in-tree symlinks (used with --batch or --batch-check)")),
     - 		OPT_BOOL(0, "batch-all-objects", &batch.all_objects,
     ++		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
     ++			N_("read commands from stdin"),
     ++			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     ++			batch_option_callback),
     + 		OPT_CMDMODE(0, "batch-all-objects", &opt,
     + 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
     + 		/* Batch-specific options */
      
       ## t/t1006-cat-file.sh ##
      @@ t/t1006-cat-file.sh: $content"
     @@ t/t1006-cat-file.sh: $content"
      +    test -z "$content" ||
      +    test_expect_success "--batch-command output of $type content is correct" '
      +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
     -+	maybe_remove_timestamp "$(echo contents $sha1 | git cat-file --batch-command)" $no_ts >actual &&
     -+	test_cmp expect actual
     -+    '
     -+
     -+    test -z "$content" ||
     -+    test_expect_success "--batch-command session for $type content is correct" '
     -+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
     -+	maybe_remove_timestamp \
     -+		"$(test_write_lines "begin" "$sha1" "get contents" | git cat-file --batch-command)" \
     -+		$no_ts >actual &&
     ++	maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
     ++	| git cat-file --batch-command)" $no_ts >actual &&
      +	test_cmp expect actual
      +    '
      +
      +    test_expect_success "--batch-command output of $type info is correct" '
      +	echo "$sha1 $type $size" >expect &&
     -+	echo "info $sha1" | git cat-file --batch-command >actual &&
     -+	test_cmp expect actual
     -+    '
     -+
     -+    test_expect_success "--batch-command session for $type info is correct" '
     -+	echo "$sha1 $type $size" >expect &&
     -+	test_write_lines "begin" "$sha1" "get info" | git cat-file --batch-command >actual &&
     ++	test_write_lines "info $sha1" | git cat-file --batch-command >actual &&
      +	test_cmp expect actual
      +    '
      +
           test_expect_success "custom --batch-check format" '
       	echo "$type $sha1" >expect &&
       	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
     -@@ t/t1006-cat-file.sh: test_expect_success '--batch-check without %(rest) considers whole line' '
     - '
     +@@ t/t1006-cat-file.sh: $content"
     +     '
     + }
       
     - tree_sha1=$(git write-tree)
     ++run_buffer_test_flush () {
     ++	type=$1
     ++	sha1=$2
     ++	size=$3
     ++
     ++	mkfifo input &&
     ++	test_when_finished 'rm input; exec 8<&-' &&
     ++	mkfifo output &&
     ++	exec 9<>output &&
     ++	test_when_finished 'rm output; exec 9<&-'
     ++	(
     ++		git cat-file --buffer --batch-command <input 2>err &
     ++		echo $! &&
     ++		wait $!
     ++	) >&9 &
     ++	sh_pid=$! &&
     ++	read cat_file_pid <&9 &&
     ++	test_when_finished "kill $cat_file_pid
     ++			    kill $sh_pid; wait $sh_pid; :" &&
     ++	echo "$sha1 $type $size" >expect &&
     ++	test_write_lines "info $sha1" flush "info $sha1" >input
     ++	# TODO - consume all available input, not just one
     ++	# line (see above).
     ++	# check output is flushed on exit
     ++	read actual <&9 &&
     ++	echo "$actual" >actual &&
     ++	test_cmp expect actual &&
     ++	test_must_be_empty err
     ++}
     ++
     ++run_buffer_test_no_flush () {
     ++	type=$1
     ++	sha1=$2
     ++	size=$3
     ++
     ++	touch output &&
     ++	test_when_finished 'rm output' &&
     ++	mkfifo input &&
     ++	test_when_finished 'rm input' &&
     ++	mkfifo pid &&
     ++	exec 9<>pid &&
     ++	test_when_finished 'rm pid; exec 9<&-'
     ++	(
     ++		git cat-file --buffer --batch-command <input >output &
     ++		echo $! &&
     ++		wait $!
     ++		echo $?
     ++	) >&9 &
     ++	sh_pid=$! &&
     ++	read cat_file_pid <&9 &&
     ++	test_when_finished "kill $cat_file_pid
     ++			    kill $sh_pid; wait $sh_pid; :" &&
     ++	test_write_lines "info $sha1" "info $sha1" &&
     ++	kill $cat_file_pid &&
     ++	read status <&9 &&
     ++	test_must_be_empty output
     ++}
      +
     - tree_size=$(($(test_oid rawsz) + 13))
     - tree_pretty_content="100644 blob $hello_sha1	hello"
     + hello_content="Hello World"
     + hello_size=$(strlen "$hello_content")
     + hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
     +@@ t/t1006-cat-file.sh: test_expect_success "setup" '
       
     -@@ t/t1006-cat-file.sh: test_expect_success \
     + run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
     + 
     ++test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
     ++       run_buffer_test_flush blob $hello_sha1 $hello_size
     ++'
     ++
     ++test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
     ++       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
     ++'
     ++
     + test_expect_success '--batch-check without %(rest) considers whole line' '
     + 	echo "$hello_sha1 blob $hello_size" >expect &&
     + 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
     +@@ t/t1006-cat-file.sh: tree_pretty_content="100644 blob $hello_sha1	hello"
     + 
     + run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
     + 
     ++test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
     ++       run_buffer_test_flush tree $tree_sha1 $tree_size
     ++'
     ++
     ++test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
     ++       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
     ++'
     ++
     + commit_message="Initial commit"
     + commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
     + commit_size=$(($(test_oid hexsz) + 137))
     +@@ t/t1006-cat-file.sh: $commit_message"
     + 
     + run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
     + 
     ++test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
     ++       run_buffer_test_flush commit $commit_sha1 $commit_size
     ++'
     ++
     ++test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
     ++       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
     ++'
     ++
     + tag_header_without_timestamp="object $hello_sha1
     + type blob
     + tag hellotag
     +@@ t/t1006-cat-file.sh: tag_size=$(strlen "$tag_content")
     + 
     + run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
     + 
     ++test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
     ++       run_buffer_test_flush tag $tag_sha1 $tag_size
     ++'
     ++
     ++test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
     ++       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
     ++'
     ++
     + test_expect_success \
           "Reach a blob from a tag pointing to it" \
           "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
       
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
           "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
       '
       
     -+test_expect_success "--batch-command with multiple sha1s gives correct format" '
     -+    echo "$batch_check_output" >expect &&
     -+    echo begin >input &&
     -+    echo_without_newline "$batch_check_input" >>input &&
     -+    echo "get info" >>input &&
     -+    git cat-file --batch-command <input >actual &&
     -+    test_cmp expect actual
     ++batch_command_info_input="info $hello_sha1
     ++info $tree_sha1
     ++info $commit_sha1
     ++info $tag_sha1
     ++info deadbeef
     ++info 
     ++flush
     ++"
     ++
     ++test_expect_success "--batch-command with multiple info calls gives correct format" '
     ++	test "$batch_check_output" = "$(echo_without_newline \
     ++	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
     ++'
     ++
     ++batch_command_contents_input="contents $hello_sha1
     ++contents $commit_sha1
     ++contents $tag_sha1
     ++contents deadbeef
     ++contents 
     ++flush
     ++"
     ++
     ++test_expect_success "--batch-command with multiple contents calls gives correct format" '
     ++	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
     ++	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
     ++'
     ++
     ++batch_command_mixed_input="info $hello_sha1
     ++contents $hello_sha1
     ++info $commit_sha1
     ++contents $commit_sha1
     ++info $tag_sha1
     ++contents $tag_sha1
     ++contents deadbeef
     ++info 
     ++flush
     ++"
     ++
     ++batch_command_mixed_output="$hello_sha1 blob $hello_size
     ++$hello_sha1 blob $hello_size
     ++$hello_content
     ++$commit_sha1 commit $commit_size
     ++$commit_sha1 commit $commit_size
     ++$commit_content
     ++$tag_sha1 tag $tag_size
     ++$tag_sha1 tag $tag_size
     ++$tag_content
     ++deadbeef missing
     ++ missing"
     ++
     ++test_expect_success "--batch-command with mixed calls gives correct format" '
     ++	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
     ++	"$(maybe_remove_timestamp "$(echo_without_newline \
     ++	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
      +'
      +
       test_expect_success 'setup blobs which are likely to delta' '
       	test-tool genrandom foo 10240 >foo &&
       	{ cat foo && echo plus; } >foo-plus &&
      @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
     + 	echo "$orig commit $orig_size" >expect &&
       	test_cmp expect actual
       '
     - 
     ++test_expect_success 'batch-command empty command' '
     ++	echo "" >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     ++	grep -E "^fatal:.*empty command in input.*" err
     ++'
     ++
     ++test_expect_success 'batch-command whitespace before command' '
     ++	echo " info deadbeef" >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     ++	grep -E "^fatal:.*whitespace before command.*" err
     ++'
     ++
      +test_expect_success 'batch-command unknown command' '
      +	echo unknown_command >cmd &&
     -+	test_expect_code 128 git cat-file --batch-command < cmd 2>err &&
     ++	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
      +	grep -E "^fatal:.*unknown command.*" err
      +'
      +
     ++test_expect_success 'batch-command flush with arguments' '
     ++	echo "flush arg" >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
     ++	grep -E "^fatal:.*flush takes no arguments.*" err
     ++'
     ++
     ++test_expect_success 'batch-command flush without --buffer' '
     ++	echo "flush arg" >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     ++	grep -E "^fatal:.*flush is only for --buffer mode.*" err
     ++'
     ++
     ++test_expect_success 'batch-command flush empty queue' '
     ++	echo flush >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
     ++	grep -E "^fatal:.*nothing to flush.*" err
     ++'
     + 
       test_done

-- 
gitgitgadget


* [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode
  2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-07 16:33   ` John Cai via GitGitGadget
  2022-02-07 23:58     ` Junio C Hamano
  2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
  2 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-07 16:33 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

When introducing the new --batch-command flag, we will add a field to the
batch_options struct that indicates whether an interactive command mode,
which reads commands and arguments from stdin, is in use.

An intuitive name for this field would be "command", which is easily
confused with the already existing cmdmode.

cmdmode refers to how the output of the blob will be transformed,
either according to --filters or --textconv. So transform_mode is a more
descriptive name for the field, and will not be confused with the new
command field to be added in the next commit.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-07 16:33   ` John Cai via GitGitGadget
  2022-02-07 23:34     ` Jonathan Tan
                       ` (2 more replies)
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
  2 siblings, 3 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-07 16:33 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long-running cat-file processes when
accessing object content: one for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and could instead have one --batch-command process
that flips between getting object info and getting object contents.
Since we have a pair of cat-file processes per repository, this means we
can get rid of roughly half of the long-lived git cat-file processes.
Given that many repositories are being accessed at any given time, this
can lead to huge savings on a given server.

git cat-file --batch-command

will enter an interactive command mode in which the user can enter
commands and their arguments:

<command1> [arg1] [arg2] NL
<command2> [arg1] [arg2] NL

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush NL

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only ways to do so are to either

1. kill (B)'s process, or
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective, as it requires
spawning a new cat-file process each time, and option 2 is hacky and not
a good long-term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
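
As a rough illustration of the queue-then-flush behavior described
above, the following shell function is a toy model only. It does not
invoke git, and the name simulate_buffered_session and the "run:" prefix
are made up for this sketch. It queues every input line until it sees
"flush", then emits the queued lines in order.

```shell
# Toy model of --batch-command with --buffer; it does not call git.
# Each non-flush line is queued; "flush" emits the queue and resets it.
simulate_buffered_session () {
	queue=
	while IFS= read -r line
	do
		case "$line" in
		'')
			echo "fatal: empty command in input" >&2
			return 128
			;;
		flush)
			if test -z "$queue"
			then
				echo "fatal: nothing to flush" >&2
				return 128
			fi
			# "Execute" everything queued since the last flush.
			printf '%s' "$queue"
			queue=
			;;
		*)
			# Hypothetical stand-in for running the command.
			queue="${queue}run: ${line}
"
			;;
		esac
	done
	# Lines still queued at end of input are dropped in this model.
}
```

Feeding it "info A", "contents B", "flush", "info C" emits only the
first two commands, because "info C" is never followed by a flush.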

This patch adds the basic structure for commands, which can be extended
in the future to add more commands. It also adds the following two
commands (on top of the flush command):

contents <object> NL
info <object> NL

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

contents <sha1> NL
info <sha1> NL
info <sha1> NL
contents <sha1> NL
flush NL
contents <sha1> NL
flush NL

When used without --buffer:

contents <sha1> NL
info <sha1> NL
info <sha1> NL
contents <sha1> NL
contents <sha1> NL
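
A caller could generate such a buffered command stream with a small
helper. This is only a sketch: emit_batch_commands is a hypothetical
name, not part of git or its test suite, and the pipeline in the trailing
comment assumes a git built with this series applied.

```shell
# emit_batch_commands <verb> <object>...
# Print one "<verb> <object>" line per object, followed by "flush".
# <verb> is expected to be "info" or "contents"; it is not validated.
emit_batch_commands () {
	verb=$1
	shift
	for oid in "$@"
	do
		printf '%s %s\n' "$verb" "$oid"
	done
	printf 'flush\n'
}

# Possible use, assuming --batch-command is available:
#   emit_batch_commands info HEAD "HEAD^{tree}" |
#   git cat-file --batch-command --buffer
```

Without --buffer the trailing flush line would be rejected, so a caller
in that mode would send only the "<verb> <object>" lines.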

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  19 ++++
 builtin/cat-file.c             | 124 +++++++++++++++++++++
 t/t1006-cat-file.sh            | 197 ++++++++++++++++++++++++++++++++-
 3 files changed, 339 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..618dbd15338 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,25 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin.
+	May not be combined with any other options or arguments except
+	`--textconv` or `--filters`, in which case the input lines also need to
+	specify the path, separated by whitespace.  See the section
+	`BATCH OUTPUT` below for details.
+
+contents <object>::
+	Print object contents for object reference <object>
+
+info <object>::
+	Print object info for object reference <object>
+
+flush::
+	Execute all preceding commands that were issued since the beginning or
+	since the last flush command was issued. Only used with --buffer. When
+	--buffer is not used, commands are flushed each time without issuing
+	`flush`.
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..6bfab74b58a 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,6 +26,7 @@ struct batch_options {
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
+	int command;
 };
 
 static const char *force_path;
@@ -508,6 +509,118 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	const char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->print_contents = 1;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->print_contents = 0;
+	batch_one_object(line, output, opt, data);
+}
+
+static void flush_batch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmds,
+		int queued)
+{
+	int i;
+	for(i = 0; i < queued; i++){
+		cmds[i].fn(opt, cmds[i].line, output, data);
+	}
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *cmds = NULL;
+	size_t alloc = 0, nr = 0;
+	int queued = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
+			if (!opt->buffer_output)
+				die(_("flush is only for --buffer mode"));
+			if (*cmd_end)
+				die(_("flush takes no arguments"));
+			if (!queued)
+				die(_("nothing to flush"));
+			flush_batch_calls(opt, output, data, cmds, queued);
+			queued = 0;
+			continue;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args)
+				p = cmd_end + 1;
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		queued++;
+		if (queued > nr) {
+			ALLOC_GROW(cmds, nr+1, alloc);
+			nr++;
+		}
+
+		call.fn = cmd->fn;
+		call.line = xstrdup(p);
+		cmds[queued-1] = call;
+	}
+	free(cmds);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -515,6 +628,7 @@ static int batch_objects(struct batch_options *opt)
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
+	const int command = opt->command;
 
 	if (!opt->format)
 		opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -590,6 +704,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (command) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -608,6 +726,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -636,6 +755,7 @@ static int batch_option_callback(const struct option *opt,
 
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
+	bo->command = !strcmp(opt->long_name, "batch-command");
 	bo->format = arg;
 
 	return 0;
@@ -683,6 +803,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..c57a35ea20a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -177,6 +177,20 @@ $content"
 	test_cmp expect actual
     '
 
+    test -z "$content" ||
+    test_expect_success "--batch-command output of $type content is correct" '
+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+	maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+	| git cat-file --batch-command)" $no_ts >actual &&
+	test_cmp expect actual
+    '
+
+    test_expect_success "--batch-command output of $type info is correct" '
+	echo "$sha1 $type $size" >expect &&
+	test_write_lines "info $sha1" | git cat-file --batch-command >actual &&
+	test_cmp expect actual
+    '
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -213,6 +227,64 @@ $content"
     '
 }
 
+run_buffer_test_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	mkfifo input &&
+	test_when_finished 'rm input; exec 8<&-' &&
+	mkfifo output &&
+	exec 9<>output &&
+	test_when_finished 'rm output; exec 9<&-'
+	(
+		git cat-file --buffer --batch-command <input 2>err &
+		echo $! &&
+		wait $!
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	echo "$sha1 $type $size" >expect &&
+	test_write_lines "info $sha1" flush "info $sha1" >input
+	# TODO - consume all available input, not just one
+	# line (see above).
+	# check output is flushed on exit
+	read actual <&9 &&
+	echo "$actual" >actual &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+}
+
+run_buffer_test_no_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	touch output &&
+	test_when_finished 'rm output' &&
+	mkfifo input &&
+	test_when_finished 'rm input' &&
+	mkfifo pid &&
+	exec 9<>pid &&
+	test_when_finished 'rm pid; exec 9<&-'
+	(
+		git cat-file --buffer --batch-command <input >output &
+		echo $! &&
+		wait $!
+		echo $?
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	test_write_lines "info $sha1" "info $sha1" &&
+	kill $cat_file_pid &&
+	read status <&9 &&
+	test_must_be_empty output
+}
+
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
 hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
@@ -224,6 +296,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
+       run_buffer_test_flush blob $hello_sha1 $hello_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
+       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -238,6 +318,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
+       run_buffer_test_flush tree $tree_sha1 $tree_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
+       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
+'
+
 commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
@@ -249,6 +337,14 @@ $commit_message"
 
 run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
+       run_buffer_test_flush commit $commit_sha1 $commit_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
+       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
+'
+
 tag_header_without_timestamp="object $hello_sha1
 type blob
 tag hellotag
@@ -263,11 +359,19 @@ tag_size=$(strlen "$tag_content")
 
 run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
+       run_buffer_test_flush tag $tag_sha1 $tag_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
+       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
+'
+
 test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -373,6 +477,62 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+batch_command_info_input="info $hello_sha1
+info $tree_sha1
+info $commit_sha1
+info $tag_sha1
+info deadbeef
+info 
+flush
+"
+
+test_expect_success "--batch-command with multiple info calls gives correct format" '
+	test "$batch_check_output" = "$(echo_without_newline \
+	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
+'
+
+batch_command_contents_input="contents $hello_sha1
+contents $commit_sha1
+contents $tag_sha1
+contents deadbeef
+contents 
+flush
+"
+
+test_expect_success "--batch-command with multiple contents calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
+'
+
+batch_command_mixed_input="info $hello_sha1
+contents $hello_sha1
+info $commit_sha1
+contents $commit_sha1
+info $tag_sha1
+contents $tag_sha1
+contents deadbeef
+info 
+flush
+"
+
+batch_command_mixed_output="$hello_sha1 blob $hello_size
+$hello_sha1 blob $hello_size
+$hello_content
+$commit_sha1 commit $commit_size
+$commit_sha1 commit $commit_size
+$commit_content
+$tag_sha1 tag $tag_size
+$tag_sha1 tag $tag_size
+$tag_content
+deadbeef missing
+ missing"
+
+test_expect_success "--batch-command with mixed calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline \
+	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -963,5 +1123,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep -E "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*flush is only for --buffer mode.*" err
+'
+
+test_expect_success 'batch-command flush empty queue' '
+	echo flush >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep -E "^fatal:.*nothing to flush.*" err
+'
 
 test_done
-- 
gitgitgadget

* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-07 23:34     ` Jonathan Tan
  2022-02-08 11:00       ` Phillip Wood
  2022-02-08  0:49     ` Junio C Hamano
  2022-02-08 11:06     ` Phillip Wood
  2 siblings, 1 reply; 97+ messages in thread
From: Jonathan Tan @ 2022-02-07 23:34 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, gitster,
	sunshine, johncai86, Jonathan Tan

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> However, if we had --batch-command, we wouldn't need to keep both
> processes around, and instead just have one --batch-command process
> where we can flip between getting object info, and getting object
> contents. Since we have a pair of cat-file processes per repository,
> this means we can get rid of roughly half of long lived git cat-file
> processes. Given there are many repositories being accessed at any given
> time, this can lead to huge savings since on a given server.

One other benefit is that with explicit flushes, in a partial clone,
this makes it possible to batch prefetch objects.

> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index bef76f4dd06..618dbd15338 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -96,6 +96,25 @@ OPTIONS
>  	need to specify the path, separated by whitespace.  See the
>  	section `BATCH OUTPUT` below for details.
>  
> +--batch-command::
> +	Enter a command mode that reads commands and arguments from stdin.
> +	May not be combined with any other options or arguments except
> +	`--textconv` or `--filters`, in which case the input lines also need to
> +	specify the path, separated by whitespace.  See the section
> +	`BATCH OUTPUT` below for details.
> +
> +contents <object>::
> +	Print object contents for object reference <object>
> +
> +info <object>::
> +	Print object info for object reference <object>
> +
> +flush::
> +	Execute all preceding commands that were issued since the beginning or
> +	since the last flush command was issued. Only used with --buffer. When
> +	--buffer is not used, commands are flushed each time without issuing
> +	`flush`.

The way this is formatted leads me to think that "contents", etc. are
CLI arguments, not things written to stdin. Some of the commit message
probably needs to go here.

I just looked at the commit message and documentation for now.

If you have time and are interested, we at Google are thinking of a more
comprehensive "batch" process [1].

[1] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

* Re: [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode
  2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-07 23:58     ` Junio C Hamano
  0 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-07 23:58 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> When introducing a new flag --batch-command, we will add a flag on the
> batch_options struct that indicates whether or not an interactive
> command mode will be used that reads commands and arguments off of
> stdin.
>
> An intuitive name for this flag would be "command", which can get
> confusing with the already existing cmdmode.

Is "command" truly an intuitive name for "were we given '--batch-command'
option?" bit?  I dunno.

But I agree that "transform_mode" is a much better name than "cmdmode"
for what this member is about.

* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-07 23:34     ` Jonathan Tan
@ 2022-02-08  0:49     ` Junio C Hamano
  2022-02-08 11:06     ` Phillip Wood
  2 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-08  0:49 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> Add a new flag --batch-command that accepts commands and arguments
> from stdin, similar to git-update-ref --stdin.
>
> At GitLab, we use a pair of long running cat-file processes when
> accessing object content. One for iterating over object metadata with
> --batch-check, and the other to grab object contents with --batch.
>
> However, if we had --batch-command, we wouldn't need to keep both
> processes around, and instead just have one --batch-command process
> where we can flip between getting object info, and getting object
> contents. Since we have a pair of cat-file processes per repository,
> this means we can get rid of roughly half of long lived git cat-file
> processes. Given there are many repositories being accessed at any given
> time, this can lead to huge savings since on a given server.
>
> git cat-file --batch-command
>
> will enter an interactive command mode whereby the user can enter in
> commands and their arguments that get queued in memory:
>
> <command1> [arg1] [arg2] NL
> <command2> [arg1] [arg2] NL

If you mean you take one command with its args per line, use LF not
NL.

    $ git grep '\<NL\>' Documentation
    $ git grep '\<LF\>' Documentation

We may want to fix the sole offender in Documentation/config.txt
file (#leftoverbits).

> With this mechanism of queueing up commands and letting (A) issue a
> flush command, process (A) can control when the buffer is flushed and
> can guarantee it will receive all of the output when in --buffer mode.

Are we giving them a guarantee of when output will *not* come?  If (B) is
allowed to flush when it thinks it has too much in-core, it would
mean that (A) cannot keep issuing commands forever without reading
the response from (B), because (B) will eventually be blocked when
it tries to flush to a pipe that (A) is not reading.  I think there
should be some discussion on that, too.  IOW, --batch-command does
not allow (B) to flush until it gets "flush", or something like that.

> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index bef76f4dd06..618dbd15338 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -96,6 +96,25 @@ OPTIONS
>  	need to specify the path, separated by whitespace.  See the
>  	section `BATCH OUTPUT` below for details.
>  
> +--batch-command::
> +	Enter a command mode that reads commands and arguments from stdin.
> +	May not be combined with any other options or arguments except
> +	`--textconv` or `--filters`, in which case the input lines also need to
> +	specify the path, separated by whitespace.  See the section
> +	`BATCH OUTPUT` below for details.
> +
> +contents <object>::
> +	Print object contents for object reference <object>

Presumably this corresponds to what you get out of "--batch"?

> +info <object>::
> +	Print object info for object reference <object>

and this one "--batch-check"?

I expect that future readers will ask this same question because it
is not clear how "object contents" and "object info" are exactly
printed.  These two paragraphs may want to anticipate it and reduce
the need for readers to ask such a question.  

> +flush::
> +	Execute all preceding commands that were issued since the beginning or
> +	since the last flush command was issued. Only used with --buffer. When
> +	--buffer is not used, commands are flushed each time without issuing
> +	`flush`.

Here is a good place to also say "When --buffer is used, no output
will come until this is issued" or something.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5f015e71096..6bfab74b58a 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -26,6 +26,7 @@ struct batch_options {
>  	int unordered;
>  	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
>  	const char *format;
> +	int command;

I am not sure if "command" is a good name.  Does it answer this
question clearly? "'command' as opposed to what?"

> @@ -508,6 +509,118 @@ static int batch_unordered_packed(const struct object_id *oid,
>  				      data);
>  }
>  
> +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> +			       struct strbuf *, struct expand_data *);
> +
> +struct queued_cmd {
> +	parse_cmd_fn_t fn;
> +	const char *line;
> +};
> +
> +static void parse_cmd_contents(struct batch_options *opt,
> +			     const char *line,
> +			     struct strbuf *output,
> +			     struct expand_data *data)
> +{
> +	opt->print_contents = 1;
> +	batch_one_object(line, output, opt, data);
> +}
> +
> +static void parse_cmd_info(struct batch_options *opt,
> +			   const char *line,
> +			   struct strbuf *output,
> +			   struct expand_data *data)
> +{
> +	opt->print_contents = 0;
> +	batch_one_object(line, output, opt, data);
> +}

OK, these are as simple as expected.

> +static void flush_batch_calls(struct batch_options *opt,
> +		struct strbuf *output,
> +		struct expand_data *data,
> +		struct queued_cmd *cmds,
> +		int queued)
> +{
> +	int i;
> +	for(i = 0; i < queued; i++){

Missing SP around parentheses.

Excess brace pair around a single-statement block.

> +		cmds[i].fn(opt, cmds[i].line, output, data);
> +	}
> +	fflush(stdout);
> +}
> +
> +static const struct parse_cmd {
> +	const char *prefix;
> +	parse_cmd_fn_t fn;
> +	unsigned takes_args;
> +} commands[] = {
> +	{ "contents", parse_cmd_contents, 1},
> +	{ "info", parse_cmd_info, 1},
> +};
> +
> +static void batch_objects_command(struct batch_options *opt,
> +				    struct strbuf *output,
> +				    struct expand_data *data)
> +{
> +	struct strbuf input = STRBUF_INIT;
> +	struct queued_cmd *cmds = NULL;
> +	size_t alloc = 0, nr = 0;
> +	int queued = 0;
> +
> +	while (!strbuf_getline(&input, stdin)) {
> +		int i;
> +		const struct parse_cmd *cmd = NULL;
> +		const char *p, *cmd_end;
> +		struct queued_cmd call = {0};
> +
> +		if (!input.len)
> +			die(_("empty command in input"));
> +		if (isspace(*input.buf))
> +			die(_("whitespace before command: '%s'"), input.buf);
> +
> +		if (skip_prefix(input.buf, "flush", &cmd_end)) {
> +			if (!opt->buffer_output)
> +				die(_("flush is only for --buffer mode"));
> +			if (*cmd_end)
> +				die(_("flush takes no arguments"));
> +			if (!queued)
> +				die(_("nothing to flush"));

I am not sure if this is a good idea at all.  What do we gain from
punishing an automated stupid loop that issues flush every once in a
while after issuing a handful of real commands and issues another flush
after running out of the real commands for good measure?

> +			flush_batch_calls(opt, output, data, cmds, queued);
> +			queued = 0;
> +			continue;
> +		}
> +
> +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
> +				continue;
> +
> +			cmd = &commands[i];
> +			if (cmd->takes_args)
> +				p = cmd_end + 1;

If somebody adds an entry with .takes_args==false, p will stay
uninitialized and used in the call ->fn() below, or passed to
xstrdup(p).  It probably should be initialized to NULL, and
xstrdup(p) below replaced with xstrdup_or_null(p).
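
A minimal standalone sketch of that suggestion, for illustration only.
dup_or_null() is a stand-in for git's xstrdup_or_null() and
command_arg() is a hypothetical helper; neither is part of the patch.
The extra check for an empty remainder is an assumption of this sketch,
not something the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for git's xstrdup_or_null(); not the real helper. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	assert(copy); /* the real xstrdup() dies on allocation failure */
	strcpy(copy, s);
	return copy;
}

/*
 * Hypothetical helper: return the argument part of "line" that follows
 * "prefix" and one separator character, or NULL when the command takes
 * no arguments (or none were given).
 */
static const char *command_arg(const char *line, const char *prefix,
			       int takes_args)
{
	size_t len = strlen(prefix);
	const char *p = NULL; /* initialized, so never read while indeterminate */

	assert(!strncmp(line, prefix, len));
	if (takes_args && line[len])
		p = line + len + 1;
	return p;
}
```

With p starting out as NULL, a command without arguments hands NULL to
both the callback and the duplicating helper instead of an
indeterminate pointer.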

> +			break;
> +		}
> +
> +		if (!cmd)
> +			die(_("unknown command: '%s'"), input.buf);
> +
> +		if (!opt->buffer_output) {
> +			cmd->fn(opt, p, output, data);
> +			continue;
> +		}
> +


> +		queued++;
> +		if (queued > nr) {
> +			ALLOC_GROW(cmds, nr+1, alloc);
> +			nr++;
> +		}
> +
> +		call.fn = cmd->fn;
> +		call.line = xstrdup(p);
> +		cmds[queued-1] = call;

Can nr and queued ever go out of sync?

If cmds is the usual <array, nr, alloc> tuple we let ALLOC_GROW()
manage, alloc keeps track of how physically large the array is, and
nr indicates how many slots are filled.  Holding onto the block of
memory we used when discarding the accumulated items and reusing
that block without having to reallocate until we use the slots that
we have allocated is done by using only <nr, alloc> pair.

But the above code seems that it does not understand that, and
instead thinks it has to use "nr" for the "we have made the array
this big, so we do not have to realloc up to that point" pointer,
hence its use of a separate "queued".  IOW, the array growing code
above seems confused.

It is more customary (hence easier to follow by readers who work on
our code base) to lose queued and say

		call.fn = cmd->fn;
		call.line = xstrdup_or_null(p);
		ALLOC_GROW(cmds, nr + 1, alloc);
		cmds[nr++] = call;

instead of the above 9 lines.
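
To illustrate the <array, nr, alloc> idiom outside of git, here is a
small sketch.  The struct and function names are hypothetical, the
growth step is only an approximation of what ALLOC_GROW() does, and
plain ints stand in for the queued commands.

```c
#include <assert.h>
#include <stdlib.h>

struct queue {
	int *items;
	size_t nr;    /* how many slots are filled */
	size_t alloc; /* how many slots are physically allocated */
};

/* Simplified stand-in for ALLOC_GROW(); the growth factor is an assumption. */
static void queue_push(struct queue *q, int item)
{
	if (q->nr + 1 > q->alloc) {
		size_t new_alloc = (q->alloc + 16) * 3 / 2;
		int *grown = realloc(q->items, new_alloc * sizeof(*grown));

		assert(grown);
		q->items = grown;
		q->alloc = new_alloc;
	}
	q->items[q->nr++] = item;
}

/* What a "flush" would do: forget the items but keep the memory block. */
static void queue_reset(struct queue *q)
{
	q->nr = 0;
}
```

After queue_reset(), later pushes reuse the block already allocated
until nr catches up with alloc, so no separate "queued" counter is
needed.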

> @@ -515,6 +628,7 @@ static int batch_objects(struct batch_options *opt)
>  	struct expand_data data;
>  	int save_warning;
>  	int retval = 0;
> +	const int command = opt->command;

This tells me that it is quite a misnomer.  This single bit is used
to differentiate between "other batch modes" and "--batch-command"
mode, which already smells like a misdesign, because we have, from
an end-user's point of view, three modes:

    --batch
    --batch-check
    --batch-command

so it would be far cleaner to have a single batch_mode enum that can
represent these three "batch modes", no?
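
As a rough sketch of what such an enum could look like (the enumerator
names and the helper below are hypothetical, chosen only for
illustration, and are not taken from the patch):

```c
#include <assert.h>
#include <string.h>

enum batch_mode {
	BATCH_MODE_CONTENTS, /* --batch */
	BATCH_MODE_INFO,     /* --batch-check */
	BATCH_MODE_COMMAND   /* --batch-command */
};

/*
 * Hypothetical helper: map an option's long name to a batch mode.
 * Returns 0 on success and -1 for an unrecognized name.
 */
static int batch_mode_from_name(const char *long_name, enum batch_mode *mode)
{
	assert(long_name && mode);
	if (!strcmp(long_name, "batch"))
		*mode = BATCH_MODE_CONTENTS;
	else if (!strcmp(long_name, "batch-check"))
		*mode = BATCH_MODE_INFO;
	else if (!strcmp(long_name, "batch-command"))
		*mode = BATCH_MODE_COMMAND;
	else
		return -1;
	return 0;
}
```

A single member of this type can only ever hold one of the three
values, so combinations that make no sense cannot be expressed.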

>  	if (!opt->format)
>  		opt->format = "%(objectname) %(objecttype) %(objectsize)";
> @@ -590,6 +704,10 @@ static int batch_objects(struct batch_options *opt)
>  	save_warning = warn_on_object_refname_ambiguity;
>  	warn_on_object_refname_ambiguity = 0;
>  
> +	if (command) {
> +		batch_objects_command(opt, &output, &data);
> +		goto cleanup;
> +	}
>  	while (strbuf_getline(&input, stdin) != EOF) {
>  		if (data.split_on_whitespace) {
>  			/*
> @@ -608,6 +726,7 @@ static int batch_objects(struct batch_options *opt)
>  		batch_one_object(input.buf, &output, opt, &data);
>  	}
>  
> + cleanup:
>  	strbuf_release(&input);
>  	strbuf_release(&output);
>  	warn_on_object_refname_ambiguity = save_warning;
> @@ -636,6 +755,7 @@ static int batch_option_callback(const struct option *opt,
>  
>  	bo->enabled = 1;
>  	bo->print_contents = !strcmp(opt->long_name, "batch");
> +	bo->command = !strcmp(opt->long_name, "batch-command");

And this part needs fixing.  The original used to say

	we support "batch" and something else (it turns out that
	"batch-check" is that something else, but the above code is
	so sloppy that it does not even tell readers that), and
	the .print_contents member is how you can tell them apart.

This patch is making it worse by introducing another member that can
be independently set or unset, pretending that the four combinations
all can make sense, but that is not the case, right?

So, perhaps a good first step would be to lose .print_contents
member and add .batch_command member that is used to more explicitly
name one of the three possibilities?

* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-07 23:34     ` Jonathan Tan
@ 2022-02-08 11:00       ` Phillip Wood
  2022-02-08 17:56         ` Jonathan Tan
  0 siblings, 1 reply; 97+ messages in thread
From: Phillip Wood @ 2022-02-08 11:00 UTC (permalink / raw)
  To: Jonathan Tan, gitgitgadget
  Cc: git, me, avarab, e, bagasdotme, gitster, sunshine, johncai86

Hi Jonathan and John

On 07/02/2022 23:34, Jonathan Tan wrote:
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> However, if we had --batch-command, we wouldn't need to keep both
>> processes around, and instead just have one --batch-command process
>> where we can flip between getting object info, and getting object
>> contents. Since we have a pair of cat-file processes per repository,
>> this means we can get rid of roughly half of long lived git cat-file
>> processes. Given there are many repositories being accessed at any given
>> time, this can lead to huge savings since on a given server.
> 
> One other benefit is that with explicit flushes, in a partial clone,
> this makes it possible to batch prefetch objects.

Jonathan, is there any overlap between what this series is trying to do
and your proposal for a batch command[1]? For example, would extending
this series to get blob sizes be useful to you?

Best Wishes

Phillip

[1] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

>> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>> index bef76f4dd06..618dbd15338 100644
>> --- a/Documentation/git-cat-file.txt
>> +++ b/Documentation/git-cat-file.txt
>> @@ -96,6 +96,25 @@ OPTIONS
>>   	need to specify the path, separated by whitespace.  See the
>>   	section `BATCH OUTPUT` below for details.
>>   
>> +--batch-command::
>> +	Enter a command mode that reads commands and arguments from stdin.
>> +	May not be combined with any other options or arguments except
>> +	`--textconv` or `--filters`, in which case the input lines also need to
>> +	specify the path, separated by whitespace.  See the section
>> +	`BATCH OUTPUT` below for details.
>> +
>> +contents <object>::
>> +	Print object contents for object reference <object>
>> +
>> +info <object>::
>> +	Print object info for object reference <object>
>> +
>> +flush::
>> +	Execute all preceding commands that were issued since the beginning or
>> +	since the last flush command was issued. Only used with --buffer. When
>> +	--buffer is not used, commands are flushed each time without issuing
>> +	`flush`.
> 
> The way this is formatted leads me to think that "contents", etc. are
> CLI arguments, not things written to stdin. Some of the commit message
> probably needs to go here.
> 
> I just looked at the commit message and documentation for now.
> 
> If you have time and are interested, we at Google are thinking of a more
> comprehensive "batch" process [1].
> 
> [1] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/


* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-07 23:34     ` Jonathan Tan
  2022-02-08  0:49     ` Junio C Hamano
@ 2022-02-08 11:06     ` Phillip Wood
  2 siblings, 0 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-08 11:06 UTC (permalink / raw)
  To: John Cai via GitGitGadget, git
  Cc: me, avarab, e, bagasdotme, gitster, Eric Sunshine, John Cai

Hi John

On 07/02/2022 16:33, John Cai via GitGitGadget wrote:
> From: John Cai <johncai86@gmail.com>
> 
> Add a new flag --batch-command that accepts commands and arguments
> from stdin, similar to git-update-ref --stdin.
> 
> At GitLab, we use a pair of long running cat-file processes when
> accessing object content. One for iterating over object metadata with
> --batch-check, and the other to grab object contents with --batch.
> 
> However, if we had --batch-command, we wouldn't need to keep both
> processes around, and instead just have one --batch-command process
> where we can flip between getting object info, and getting object
> contents. Since we have a pair of cat-file processes per repository,
> this means we can get rid of roughly half of long lived git cat-file
> processes. Given there are many repositories being accessed at any given
> time, this can lead to huge savings since on a given server.
> 
> git cat-file --batch-command
> 
> will enter an interactive command mode whereby the user can enter in
> commands and their arguments that get queued in memory:
> 
> <command1> [arg1] [arg2] NL
> <command2> [arg1] [arg2] NL
> 
> When --buffer mode is used, commands will be queued in memory until a
> flush command is issued that execute them:
> 
> flush NL
> 
> The reason for a flush command is that when a consumer process (A)
> talks to a git cat-file process (B) and interactively writes to and
> reads from it in --buffer mode, (A) needs to be able to control when
> the buffer is flushed to stdout.
> 
> Currently, from (A)'s perspective, the only way is to either
> 
> 1. kill (B)'s process
> 2. send an invalid object to stdin.
> 
> 1. is not ideal from a performance perspective as it will require
> spawning a new cat-file process each time, and 2. is hacky and not a
> good long term solution.
> 
> With this mechanism of queueing up commands and letting (A) issue a
> flush command, process (A) can control when the buffer is flushed and
> can guarantee it will receive all of the output when in --buffer mode.
> 
> This patch adds the basic structure for adding command which can be
> extended in the future to add more commands. It also adds the following
> two commands (on top of the flush command):
> 
> contents <object> NL
> info <object> NL
> 
> The contents command takes an <object> argument and prints out the object
> contents.
> 
> The info command takes a <object> argument and prints out the object
> metadata.
> 
> These can be used in the following way with --buffer:
> 
> contents <sha1> NL
> object <sha1> NL

There is no object command

> [...]
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 145eee11df9..c57a35ea20a 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -177,6 +177,20 @@ $content"
>   	test_cmp expect actual
>       '
>   
> +    test -z "$content" ||
> +    test_expect_success "--batch-command output of $type content is correct" '
> +	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +	maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
> +	| git cat-file --batch-command)" $no_ts >actual &&
> +	test_cmp expect actual
> +    '
> +
> +    test_expect_success "--batch-command output of $type info is correct" '
> +	echo "$sha1 $type $size" >expect &&
> +	test_write_lines "info $sha1" | git cat-file --batch-command >actual &&
> +	test_cmp expect actual
> +    '
> +
>       test_expect_success "custom --batch-check format" '
>   	echo "$type $sha1" >expect &&
>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
> @@ -213,6 +227,64 @@ $content"
>       '
>   }
>   
> +run_buffer_test_flush () {
> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	mkfifo input &&
> +	test_when_finished 'rm input; exec 8<&-' &&
> +	mkfifo output &&
> +	exec 9<>output &&
> +	test_when_finished 'rm output; exec 9<&-'
> +	(
> +		git cat-file --buffer --batch-command <input 2>err &
> +		echo $! &&
> +		wait $!
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	echo "$sha1 $type $size" >expect &&
> +	test_write_lines "info $sha1" flush "info $sha1" >input
> +	# TODO - consume all available input, not just one
> +	# line (see above).
> +	# check output is flushed on exit

This test seems to have lost some code above this comment so the comment 
is no longer correct - we do not test if the output is flushed on exit 
and looking at the implementation I don't think it is.

Best Wishes

Phillip

> +	read actual <&9 &&
> +	echo "$actual" >actual &&
> +	test_cmp expect actual &&
> +	test_must_be_empty err
> +}
> +
> +run_buffer_test_no_flush () {
> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	touch output &&
> +	test_when_finished 'rm output' &&
> +	mkfifo input &&
> +	test_when_finished 'rm input' &&
> +	mkfifo pid &&
> +	exec 9<>pid &&
> +	test_when_finished 'rm pid; exec 9<&-'
> +	(
> +		git cat-file --buffer --batch-command <input >output &
> +		echo $! &&
> +		wait $!
> +		echo $?
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	test_write_lines "info $sha1" "info $sha1" &&
> +	kill $cat_file_pid &&
> +	read status <&9 &&
> +	test_must_be_empty output
> +}
> +
>   hello_content="Hello World"
>   hello_size=$(strlen "$hello_content")
>   hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
> @@ -224,6 +296,14 @@ test_expect_success "setup" '
>   
>   run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
> +       run_buffer_test_flush blob $hello_sha1 $hello_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
> +       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
> +'
> +
>   test_expect_success '--batch-check without %(rest) considers whole line' '
>   	echo "$hello_sha1 blob $hello_size" >expect &&
>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
> @@ -238,6 +318,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
>   
>   run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
> +       run_buffer_test_flush tree $tree_sha1 $tree_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
> +       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
> +'
> +
>   commit_message="Initial commit"
>   commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
>   commit_size=$(($(test_oid hexsz) + 137))
> @@ -249,6 +337,14 @@ $commit_message"
>   
>   run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
> +       run_buffer_test_flush commit $commit_sha1 $commit_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
> +       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
> +'
> +
>   tag_header_without_timestamp="object $hello_sha1
>   type blob
>   tag hellotag
> @@ -263,11 +359,19 @@ tag_size=$(strlen "$tag_content")
>   
>   run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
> +       run_buffer_test_flush tag $tag_sha1 $tag_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
> +       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
> +'
> +
>   test_expect_success \
>       "Reach a blob from a tag pointing to it" \
>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>   
> -for batch in batch batch-check
> +for batch in batch batch-check batch-command
>   do
>       for opt in t s e p
>       do
> @@ -373,6 +477,62 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>   '
>   
> +batch_command_info_input="info $hello_sha1
> +info $tree_sha1
> +info $commit_sha1
> +info $tag_sha1
> +info deadbeef
> +info
> +flush
> +"
> +
> +test_expect_success "--batch-command with multiple info calls gives correct format" '
> +	test "$batch_check_output" = "$(echo_without_newline \
> +	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
> +'
> +
> +batch_command_contents_input="contents $hello_sha1
> +contents $commit_sha1
> +contents $tag_sha1
> +contents deadbeef
> +contents
> +flush
> +"
> +
> +test_expect_success "--batch-command with multiple contents calls gives correct format" '
> +	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
> +	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
> +'
> +
> +batch_command_mixed_input="info $hello_sha1
> +contents $hello_sha1
> +info $commit_sha1
> +contents $commit_sha1
> +info $tag_sha1
> +contents $tag_sha1
> +contents deadbeef
> +info
> +flush
> +"
> +
> +batch_command_mixed_output="$hello_sha1 blob $hello_size
> +$hello_sha1 blob $hello_size
> +$hello_content
> +$commit_sha1 commit $commit_size
> +$commit_sha1 commit $commit_size
> +$commit_content
> +$tag_sha1 tag $tag_size
> +$tag_sha1 tag $tag_size
> +$tag_content
> +deadbeef missing
> + missing"
> +
> +test_expect_success "--batch-command with mixed calls gives correct format" '
> +	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
> +	"$(maybe_remove_timestamp "$(echo_without_newline \
> +	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
> +'
> +
>   test_expect_success 'setup blobs which are likely to delta' '
>   	test-tool genrandom foo 10240 >foo &&
>   	{ cat foo && echo plus; } >foo-plus &&
> @@ -963,5 +1123,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>   	echo "$orig commit $orig_size" >expect &&
>   	test_cmp expect actual
>   '
> +test_expect_success 'batch-command empty command' '
> +	echo "" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*empty command in input.*" err
> +'
> +
> +test_expect_success 'batch-command whitespace before command' '
> +	echo " info deadbeef" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*whitespace before command.*" err
> +'
> +
> +test_expect_success 'batch-command unknown command' '
> +	echo unknown_command >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*unknown command.*" err
> +'
> +
> +test_expect_success 'batch-command flush with arguments' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
> +	grep -E "^fatal:.*flush takes no arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush without --buffer' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*flush is only for --buffer mode.*" err
> +'
> +
> +test_expect_success 'batch-command flush empty queue' '
> +	echo flush >cmd &&
> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
> +	grep -E "^fatal:.*nothing to flush.*" err
> +'
>   
>   test_done



* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-08 11:00       ` Phillip Wood
@ 2022-02-08 17:56         ` Jonathan Tan
  2022-02-08 18:09           ` Junio C Hamano
  0 siblings, 1 reply; 97+ messages in thread
From: Jonathan Tan @ 2022-02-08 17:56 UTC (permalink / raw)
  To: phillip.wood123; +Cc: git, johncai86, Jonathan Tan

Phillip Wood <phillip.wood123@gmail.com> writes:
> Jonathan is there any overlap between what this series is trying to do 
> and your proposal for a batch command[1]? For example would extending 
> this series to get blob sizes be useful to you?
> 
> Best Wishes
> 
> Phillip
> 
> [1] 
> https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

There is overlap, yes. I'm not sure of the best way to resolve it,
though. John mentions a substantial reduction ("roughly half") of Git
processes [1], and if they foresee needing to access things other than
object info and contents, it might be better to start with something
more extensible, like my proposal for a specific batch command. (If not,
they will encounter another increase in the number of processes.) If
they think that they can make do with this patch for the time being, I
think that's fine too: once this is merged (which will be earlier than
any extensible batch command, for sure), they (and anyone else who needs
batched object info and contents without the overhead of initializing
all the data structures in Git) can make use of this improvement.

As for getting blob sizes, I think that --batch-check can already give
it to us. If that is the case, the series is fine as-is (at least in
that aspect).
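
For reference, a minimal sketch of how --batch-check can report blob
sizes today. The helper name blob_sizes is made up for illustration; the
%(objectname) and %(objectsize) format atoms are the ones documented for
cat-file's batch modes.

```shell
# Hypothetical helper (name is an assumption, not part of Git):
# read object names on stdin, print "<oid> <size>" for each one.
# Must be run inside a Git repository.
blob_sizes () {
	git cat-file --batch-check='%(objectname) %(objectsize)'
}
```

Usage would look like `printf '%s\n' "$oid" | blob_sizes`; an object that
cannot be found is reported as "<name> missing" instead of a size.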

[1] https://lore.kernel.org/git/1b63164ad4d9ec6b5fa6f733b6095b2779298b36.1644251611.git.gitgitgadget@gmail.com/


* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-08 17:56         ` Jonathan Tan
@ 2022-02-08 18:09           ` Junio C Hamano
  2022-02-09  0:11             ` Jonathan Tan
  0 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-08 18:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: phillip.wood123, git, johncai86

Jonathan Tan <jonathantanmy@google.com> writes:

> There is overlap, yes. I'm not sure of the best way to resolve it,
> though. John mentions a substantial reduction ("roughly half") of Git
> processes [1], and if they foresee needing to access things other than
> object info and contents, it might be better to start with something
> more extensible, like my proposal for a specific batch command.

I agree that it would be ideal to have just one way that is generic and
extensible enough.  I do not know if there is much difference in
that area between the two approaches, though.  The RFC I saw did
look more complex and rigidly specified with framing and such, but
that is only the syntax part---in the way in which interaction
between two processes happens, I didn't quite see fundamental
differences.  I'd expect it wouldn't be too much trouble to add new
commands to code written using either approach (although I haven't
seen yours yet ;-).

Thanks.


* [PATCH v3 0/3] Add cat-file --batch-command flag
  2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-08 20:58   ` John Cai via GitGitGadget
  2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                       ` (4 more replies)
  2 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-08 20:58 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has three parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. In --buffer mode, "flush" executes all
   commands entered since the last "flush"
 * when not in --buffer mode, each command is executed and flushed
   immediately
 * renamed cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * clarified verbiage in commit messages
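
The buffered behaviour described in the change lists above can be
sketched as follows. This is only an illustration: the helper names are
made up, and it assumes a Git build that includes this series and is run
inside a repository.

```shell
# Hypothetical helper: queue an "info" and a "contents" request for one
# object, then ask cat-file to execute both with a single "flush".
batch_info_and_contents () {
	printf 'info %s\ncontents %s\nflush\n' "$1" "$1" |
	git cat-file --batch-command --buffer
}

# Same "info" request, but without a trailing "flush": with the v3
# change, the queued command is still executed once cat-file sees EOF
# on stdin.
batch_info_at_eof () {
	printf 'info %s\n' "$1" |
	git cat-file --batch-command --buffer
}
```

For a blob, the first helper should print the "<oid> <type> <size>" line
twice (once for info, once as the header of contents) followed by the
blob contents; the second prints just the info line.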

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (3):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_command enum to replace print_contents
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  24 ++++
 builtin/cat-file.c             | 154 ++++++++++++++++++++++--
 t/t1006-cat-file.sh            | 207 ++++++++++++++++++++++++++++++++-
 3 files changed, 373 insertions(+), 12 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v3
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v2:

 1:  2d9a0b06ce5 ! 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
     @@ Metadata
       ## Commit message ##
          cat-file: rename cmdmode to transform_mode
      
     -    When introducing a new flag --batch-command, we will add a flag on the
     -    batch_options struct that indicates whether or not an interactive
     -    command mode will be used that reads commands and arguments off of
     -    stdin.
     +    In the next patch, we will add an enum on the batch_options struct that
     +    indicates which type of batch operation will be used: --batch,
     +    --batch-check and the soon to be  --batch-command that will read
     +    commands from stdin. --batch-command mode might get confused with
     +    the cmdmode flag.
      
     -    An intuitive name for this flag would be "command", which can get
     -    confusing with the already existing cmdmode.
     -
     -    cmdmode refers to how the result output of the blob will be transformed,
     -    either according to --filter or --textconv. So transform_mode is a more
     -    descriptive name for the flag, and will not get confused with the new
     -    command flag to be added in the next commit.
     +    There is value in renaming cmdmode in any case. cmdmode refers to how
     +    the result output of the blob will be transformed, either according to
     +    --filter or --textconv. So transform_mode is a more descriptive name
     +    for the flag.
      
          Rename cmdmode to transform_mode in cat-file.c
      
 -:  ----------- > 2:  ae2dfa512a7 cat-file: introduce batch_command enum to replace print_contents
 2:  1b63164ad4d ! 3:  1ab5524ee87 cat-file: add --batch-command mode
     @@ Commit message
          contents. Since we have a pair of cat-file processes per repository,
          this means we can get rid of roughly half of long lived git cat-file
          processes. Given there are many repositories being accessed at any given
     -    time, this can lead to huge savings since on a given server.
     +    time, this can lead to huge savings.
      
          git cat-file --batch-command
      
          will enter an interactive command mode whereby the user can enter in
          commands and their arguments that get queued in memory:
      
     -    <command1> [arg1] [arg2] NL
     -    <command2> [arg1] [arg2] NL
     +    <command1> [arg1] [arg2] LF
     +    <command2> [arg1] [arg2] LF
      
          When --buffer mode is used, commands will be queued in memory until a
          flush command is issued that execute them:
      
     -    flush NL
     +    flush LF
      
          The reason for a flush command is that when a consumer process (A)
          talks to a git cat-file process (B) and interactively writes to and
     @@ Commit message
          With this mechanism of queueing up commands and letting (A) issue a
          flush command, process (A) can control when the buffer is flushed and
          can guarantee it will receive all of the output when in --buffer mode.
     +    --batch-command also will not allow (B) to flush to stdout until a flush
     +    is received.
      
          This patch adds the basic structure for adding command which can be
          extended in the future to add more commands. It also adds the following
          two commands (on top of the flush command):
      
     -    contents <object> NL
     -    info <object> NL
     +    contents <object> LF
     +    info <object> LF
      
          The contents command takes an <object> argument and prints out the object
          contents.
     @@ Commit message
      
          These can be used in the following way with --buffer:
      
     -    contents <sha1> NL
     -    object <sha1> NL
     -    object <sha1> NL
     -    contents <sha1> NL
     +    info <sha1> LF
     +    contents <sha1> LF
     +    contents <sha1> LF
     +    info <sha1> LF
          flush
     -    contents <sha1> NL
     +    info <sha1> LF
          flush
      
          When used without --buffer:
      
     -    contents <sha1> NL
     -    object <sha1> NL
     -    object <sha1> NL
     -    contents <sha1> NL
     -    contents <sha1> NL
     +    info <sha1> LF
     +    contents <sha1> LF
     +    contents <sha1> LF
     +    info <sha1> LF
     +    info <sha1> LF
      
          Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: John Cai <johncai86@gmail.com>
     @@ Documentation/git-cat-file.txt: OPTIONS
      +	`--textconv` or `--filters`, in which case the input lines also need to
      +	specify the path, separated by whitespace.  See the section
      +	`BATCH OUTPUT` below for details.
     -+
     +++
     ++--
      +contents <object>::
     -+	Print object contents for object reference <object>
     ++	Print object contents for object reference <object>. This corresponds to
     ++	the output of --batch.
      +
      +info <object>::
     -+	Print object info for object reference <object>
     ++	Print object info for object reference <object>. This corresponds to the
     ++	output of --batch-check.
      +
      +flush::
     -+	Execute all preceding commands that were issued since the beginning or
     -+	since the last flush command was issued. Only used with --buffer. When
     -+	--buffer is not used, commands are flushed each time without issuing
     -+	`flush`.
     ++	Used in --buffer mode to execute all preceding commands that were issued
     ++	since the beginning or since the last flush was issued. When --buffer
     ++	is used, no output will come until flush is issued. When --buffer is not
     ++	used, commands are flushed each time without issuing `flush`.
     ++--
     +++
      +
       --batch-all-objects::
       	Instead of reading a list of objects on stdin, perform the
       	requested batch operation on all objects in the repository and
      
       ## builtin/cat-file.c ##
     -@@ builtin/cat-file.c: struct batch_options {
     - 	int unordered;
     - 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
     - 	const char *format;
     -+	int command;
     +@@
     + #include "object-store.h"
     + #include "promisor-remote.h"
     + 
     +-enum batch_command {
     +-	BATCH_COMMAND_CONTENTS,
     +-	BATCH_COMMAND_INFO,
     ++enum batch_mode {
     ++	BATCH_MODE_CONTENTS,
     ++	BATCH_MODE_INFO,
     ++	BATCH_MODE_PARSE_CMDS,
       };
       
     - static const char *force_path;
     + struct batch_options {
     + 	int enabled;
     + 	int follow_symlinks;
     +-	enum batch_command command_mode;
     ++	enum batch_mode batch_mode;
     + 	int buffer_output;
     + 	int all_objects;
     + 	int unordered;
     +@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     + 	strbuf_addch(scratch, '\n');
     + 	batch_write(opt, scratch->buf, scratch->len);
     + 
     +-	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
     ++	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
     + 		print_object_or_die(opt, data);
     + 		batch_write(opt, "\n", 1);
     + 	}
      @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oid,
       				      data);
       }
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			     struct strbuf *output,
      +			     struct expand_data *data)
      +{
     -+	opt->print_contents = 1;
     ++	opt->batch_mode = BATCH_MODE_CONTENTS;
      +	batch_one_object(line, output, opt, data);
      +}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			   struct strbuf *output,
      +			   struct expand_data *data)
      +{
     -+	opt->print_contents = 0;
     ++	opt->batch_mode = BATCH_MODE_INFO;
      +	batch_one_object(line, output, opt, data);
      +}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		struct strbuf *output,
      +		struct expand_data *data,
      +		struct queued_cmd *cmds,
     -+		int queued)
     ++		int nr)
      +{
      +	int i;
     -+	for(i = 0; i < queued; i++){
     ++	for (i = 0; i < nr; i++)
      +		cmds[i].fn(opt, cmds[i].line, output, data);
     -+	}
     ++
      +	fflush(stdout);
      +}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	struct strbuf input = STRBUF_INIT;
      +	struct queued_cmd *cmds = NULL;
      +	size_t alloc = 0, nr = 0;
     -+	int queued = 0;
      +
      +	while (!strbuf_getline(&input, stdin)) {
      +		int i;
      +		const struct parse_cmd *cmd = NULL;
     -+		const char *p, *cmd_end;
     ++		const char *p = NULL, *cmd_end;
      +		struct queued_cmd call = {0};
      +
      +		if (!input.len)
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +				die(_("flush is only for --buffer mode"));
      +			if (*cmd_end)
      +				die(_("flush takes no arguments"));
     -+			if (!queued)
     -+				die(_("nothing to flush"));
     -+			flush_batch_calls(opt, output, data, cmds, queued);
     -+			queued = 0;
     ++			if (!nr)
     ++				error(_("nothing to flush"));
     ++
     ++			flush_batch_calls(opt, output, data, cmds, nr);
     ++			nr = 0;
      +			continue;
      +		}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			cmd->fn(opt, p, output, data);
      +			continue;
      +		}
     -+
     -+		queued++;
     -+		if (queued > nr) {
     -+			ALLOC_GROW(cmds, nr+1, alloc);
     -+			nr++;
     -+		}
     -+
     ++		
     ++		ALLOC_GROW(cmds, nr + 1, alloc);
      +		call.fn = cmd->fn;
      +		call.line = xstrdup(p);
     -+		cmds[queued-1] = call;
     ++		cmds[nr++] = call;
      +	}
     ++
     ++	if (opt->buffer_output && nr)
     ++		flush_batch_calls(opt, output, data, cmds, nr);
     ++
      +	free(cmds);
      +	strbuf_release(&input);
      +}
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
       {
       	struct strbuf input = STRBUF_INIT;
      @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
     - 	struct expand_data data;
     - 	int save_warning;
     - 	int retval = 0;
     -+	const int command = opt->command;
     + 	 * If we are printing out the object, then always fill in the type,
     + 	 * since we will want to decide whether or not to stream.
     + 	 */
     +-	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
     ++	if (opt->batch_mode == BATCH_MODE_CONTENTS)
     + 		data.info.typep = &data.type;
       
     - 	if (!opt->format)
     - 		opt->format = "%(objectname) %(objecttype) %(objectsize)";
     + 	if (opt->all_objects) {
      @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	save_warning = warn_on_object_refname_ambiguity;
       	warn_on_object_refname_ambiguity = 0;
       
     -+	if (command) {
     ++	if (opt->batch_mode == BATCH_MODE_PARSE_CMDS) {
      +		batch_objects_command(opt, &output, &data);
      +		goto cleanup;
      +	}
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	strbuf_release(&output);
       	warn_on_object_refname_ambiguity = save_warning;
      @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
     + 	}
       
       	bo->enabled = 1;
     - 	bo->print_contents = !strcmp(opt->long_name, "batch");
     -+	bo->command = !strcmp(opt->long_name, "batch-command");
     +-
     + 	if (!strcmp(opt->long_name, "batch"))
     +-		bo->command_mode = BATCH_COMMAND_CONTENTS;
     ++		bo->batch_mode = BATCH_MODE_CONTENTS;
     + 	if (!strcmp(opt->long_name, "batch-check"))
     +-		bo->command_mode = BATCH_COMMAND_INFO;
     ++		bo->batch_mode = BATCH_MODE_INFO;
     ++	if (!strcmp(opt->long_name, "batch-command"))
     ++		bo->batch_mode = BATCH_MODE_PARSE_CMDS;
     + 
       	bo->format = arg;
       
     - 	return 0;
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
       			N_("like --batch, but don't emit <contents>"),
       			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     @@ t/t1006-cat-file.sh: $content"
       	test_cmp expect actual
           '
       
     -+    test -z "$content" ||
     -+    test_expect_success "--batch-command output of $type content is correct" '
     -+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
     -+	maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
     -+	| git cat-file --batch-command)" $no_ts >actual &&
     -+	test_cmp expect actual
     -+    '
     -+
     -+    test_expect_success "--batch-command output of $type info is correct" '
     -+	echo "$sha1 $type $size" >expect &&
     -+	test_write_lines "info $sha1" | git cat-file --batch-command >actual &&
     -+	test_cmp expect actual
     -+    '
     ++    for opt in --buffer --no-buffer
     ++    do
     ++	test -z "$content" ||
     ++		test_expect_success "--batch-command $opt output of $type content is correct" '
     ++		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
     ++		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
     ++		| git cat-file --batch-command $opt)" $no_ts >actual &&
     ++		test_cmp expect actual
     ++	'
     ++
     ++	test_expect_success "--batch-command $opt output of $type info is correct" '
     ++		echo "$sha1 $type $size" >expect &&
     ++		test_write_lines "info $sha1" \
     ++		| git cat-file --batch-command $opt >actual &&
     ++		test_cmp expect actual
     ++	'
     ++    done
      +
           test_expect_success "custom --batch-check format" '
       	echo "$type $sha1" >expect &&
     @@ t/t1006-cat-file.sh: $content"
      +	exec 9<>output &&
      +	test_when_finished 'rm output; exec 9<&-'
      +	(
     ++		# TODO - Ideally we'd pipe the output of cat-file
     ++		# through "sed s'/$/\\/'" to make sure that that read
     ++		# would consume all the available
     ++		# output. Unfortunately we cannot do this as we cannot
     ++		# control when sed flushes its output. We could write
     ++		# a test helper in C that appended a '\' to the end of
     ++		# each line and flushes its output after every line.
      +		git cat-file --buffer --batch-command <input 2>err &
      +		echo $! &&
      +		wait $!
     @@ t/t1006-cat-file.sh: $content"
      +	test_write_lines "info $sha1" flush "info $sha1" >input
      +	# TODO - consume all available input, not just one
      +	# line (see above).
     -+	# check output is flushed on exit
      +	read actual <&9 &&
      +	echo "$actual" >actual &&
      +	test_cmp expect actual &&
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch-all-objects --batch-c
      +
      +test_expect_success 'batch-command flush empty queue' '
      +	echo flush >cmd &&
     -+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
     -+	grep -E "^fatal:.*nothing to flush.*" err
     ++	test_expect_code 0 git cat-file --batch-command --buffer <cmd 2>err &&
     ++	grep -E "^error:.*nothing to flush.*" err
      +'
       
       test_done

-- 
gitgitgadget


* [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-08 20:58     ` John Cai via GitGitGadget
  2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-08 20:58 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be-added --batch-command that will read
commands from stdin. The --batch-command mode could be confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the result output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-08 20:58     ` John Cai via GitGitGadget
  2022-02-08 23:43       ` Junio C Hamano
  2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-08 20:58 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

The next patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. Currently,
from the batch_options struct's perspective, print_contents is the only
member used to distinguish between the different modes. This makes the
code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..1c673385868 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_command {
+	BATCH_COMMAND_CONTENTS,
+	BATCH_COMMAND_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_command command_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,12 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->command_mode = BATCH_COMMAND_CONTENTS;
+	if (!strcmp(opt->long_name, "batch-check"))
+		bo->command_mode = BATCH_COMMAND_INFO;
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v3 3/3] cat-file: add --batch-command mode
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-08 20:58     ` John Cai via GitGitGadget
  2022-02-08 23:59       ` Junio C Hamano
  2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-08 20:58 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content. One for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter in
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued to execute them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

1. is not ideal from a performance perspective as it will require
spawning a new cat-file process each time, and 2. is hacky and not a
good long term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for handling commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <sha1> LF
contents <sha1> LF
contents <sha1> LF
info <sha1> LF
flush LF
info <sha1> LF
flush LF

When used without --buffer:

info <sha1> LF
contents <sha1> LF
contents <sha1> LF
info <sha1> LF
info <sha1> LF
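
As a rough illustration of the "<command> [args] LF" grammar described
above, the following standalone C sketch splits one input line (with
the LF already stripped) into a command and its argument text. It is
not the code from this patch: every demo_* name is hypothetical, and it
assumes a stricter rule than the patch does, namely that a command name
must be followed by either the end of the line or a single space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the identifiers used in builtin/cat-file.c. */
enum demo_cmd {
	DEMO_CMD_UNKNOWN,
	DEMO_CMD_CONTENTS,
	DEMO_CMD_INFO,
	DEMO_CMD_FLUSH
};

struct demo_cmd_entry {
	const char *name;
	enum demo_cmd id;
	int takes_args;
};

static const struct demo_cmd_entry demo_cmds[] = {
	{ "contents", DEMO_CMD_CONTENTS, 1 },
	{ "info", DEMO_CMD_INFO, 1 },
	{ "flush", DEMO_CMD_FLUSH, 0 },
};

/*
 * Parse "<command> [args]". On success return the command id and point
 * *args at the argument text (possibly empty), or at NULL when the
 * command takes no arguments. Anything else yields DEMO_CMD_UNKNOWN.
 */
static enum demo_cmd demo_parse_line(const char *line, const char **args)
{
	size_t i;

	assert(line && args);
	*args = NULL;

	for (i = 0; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++) {
		size_t len = strlen(demo_cmds[i].name);

		if (strncmp(line, demo_cmds[i].name, len))
			continue;
		/* Require a word boundary so "information" is not "info". */
		if (line[len] != '\0' && line[len] != ' ')
			continue;

		if (line[len] == ' ') {
			if (!demo_cmds[i].takes_args)
				return DEMO_CMD_UNKNOWN; /* e.g. "flush arg" */
			*args = line + len + 1;
		} else if (demo_cmds[i].takes_args) {
			return DEMO_CMD_UNKNOWN; /* e.g. bare "info" */
		}
		return demo_cmds[i].id;
	}
	return DEMO_CMD_UNKNOWN;
}
```

With this sketch, "info deadbeef" yields DEMO_CMD_INFO with args
pointing at "deadbeef", while "information x" and "flush arg" are both
rejected. A real implementation would likely want distinct error
messages for those two cases rather than a single "unknown" result.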

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  24 ++++
 builtin/cat-file.c             | 140 ++++++++++++++++++++--
 t/t1006-cat-file.sh            | 207 ++++++++++++++++++++++++++++++++-
 3 files changed, 361 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..d77a61c47de 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,30 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin.
+	May not be combined with any other options or arguments except
+	`--textconv` or `--filters`, in which case the input lines also need to
+	specify the path, separated by whitespace.  See the section
+	`BATCH OUTPUT` below for details.
++
+--
+contents <object>::
+	Print object contents for object reference <object>. This corresponds to
+	the output of --batch.
+
+info <object>::
+	Print object info for object reference <object>. This corresponds to the
+	output of --batch-check.
+
+flush::
+	Used in --buffer mode to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When --buffer
+	is used, no output will come until flush is issued. When --buffer is not
+	used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1c673385868..ec266ff95e9 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,15 +17,16 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
-enum batch_command {
-	BATCH_COMMAND_CONTENTS,
-	BATCH_COMMAND_INFO,
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+	BATCH_MODE_PARSE_CMDS,
 };
 
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	enum batch_command command_mode;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -391,7 +392,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -513,6 +514,117 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	const char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void flush_batch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmds,
+		int nr)
+{
+	int i;
+	for (i = 0; i < nr; i++)
+		cmds[i].fn(opt, cmds[i].line, output, data);
+
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *cmds = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
+			if (!opt->buffer_output)
+				die(_("flush is only for --buffer mode"));
+			if (*cmd_end)
+				die(_("flush takes no arguments"));
+			if (!nr)
+				error(_("nothing to flush"));
+
+			flush_batch_calls(opt, output, data, cmds, nr);
+			nr = 0;
+			continue;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args)
+				p = cmd_end + 1;
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+		
+		ALLOC_GROW(cmds, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup(p);
+		cmds[nr++] = call;
+	}
+
+	if (opt->buffer_output && nr)
+		flush_batch_calls(opt, output, data, cmds, nr);
+
+	free(cmds);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -541,7 +653,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -595,6 +707,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_PARSE_CMDS) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +729,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -640,11 +757,12 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-
 	if (!strcmp(opt->long_name, "batch"))
-		bo->command_mode = BATCH_COMMAND_CONTENTS;
+		bo->batch_mode = BATCH_MODE_CONTENTS;
 	if (!strcmp(opt->long_name, "batch-check"))
-		bo->command_mode = BATCH_COMMAND_INFO;
+		bo->batch_mode = BATCH_MODE_INFO;
+	if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_PARSE_CMDS;
 
 	bo->format = arg;
 
@@ -693,6 +811,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..635667f8168 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -177,6 +177,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -213,6 +231,70 @@ $content"
     '
 }
 
+run_buffer_test_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	mkfifo input &&
+	test_when_finished 'rm input; exec 8<&-' &&
+	mkfifo output &&
+	exec 9<>output &&
+	test_when_finished 'rm output; exec 9<&-'
+	(
+		# TODO - Ideally we'd pipe the output of cat-file
+		# through "sed s'/$/\\/'" to make sure that that read
+		# would consume all the available
+		# output. Unfortunately we cannot do this as we cannot
+		# control when sed flushes its output. We could write
+		# a test helper in C that appended a '\' to the end of
+		# each line and flushes its output after every line.
+		git cat-file --buffer --batch-command <input 2>err &
+		echo $! &&
+		wait $!
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	echo "$sha1 $type $size" >expect &&
+	test_write_lines "info $sha1" flush "info $sha1" >input
+	# TODO - consume all available input, not just one
+	# line (see above).
+	read actual <&9 &&
+	echo "$actual" >actual &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+}
+
+run_buffer_test_no_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	touch output &&
+	test_when_finished 'rm output' &&
+	mkfifo input &&
+	test_when_finished 'rm input' &&
+	mkfifo pid &&
+	exec 9<>pid &&
+	test_when_finished 'rm pid; exec 9<&-'
+	(
+		git cat-file --buffer --batch-command <input >output &
+		echo $! &&
+		wait $!
+		echo $?
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	test_write_lines "info $sha1" "info $sha1" &&
+	kill $cat_file_pid &&
+	read status <&9 &&
+	test_must_be_empty output
+}
+
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
 hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
@@ -224,6 +306,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
+       run_buffer_test_flush blob $hello_sha1 $hello_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
+       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -238,6 +328,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
+       run_buffer_test_flush tree $tree_sha1 $tree_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
+       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
+'
+
 commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
@@ -249,6 +347,14 @@ $commit_message"
 
 run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
+       run_buffer_test_flush commit $commit_sha1 $commit_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
+       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
+'
+
 tag_header_without_timestamp="object $hello_sha1
 type blob
 tag hellotag
@@ -263,11 +369,19 @@ tag_size=$(strlen "$tag_content")
 
 run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
+       run_buffer_test_flush tag $tag_sha1 $tag_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
+       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
+'
+
 test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -373,6 +487,62 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+batch_command_info_input="info $hello_sha1
+info $tree_sha1
+info $commit_sha1
+info $tag_sha1
+info deadbeef
+info 
+flush
+"
+
+test_expect_success "--batch-command with multiple info calls gives correct format" '
+	test "$batch_check_output" = "$(echo_without_newline \
+	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
+'
+
+batch_command_contents_input="contents $hello_sha1
+contents $commit_sha1
+contents $tag_sha1
+contents deadbeef
+contents 
+flush
+"
+
+test_expect_success "--batch-command with multiple contents calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
+'
+
+batch_command_mixed_input="info $hello_sha1
+contents $hello_sha1
+info $commit_sha1
+contents $commit_sha1
+info $tag_sha1
+contents $tag_sha1
+contents deadbeef
+info 
+flush
+"
+
+batch_command_mixed_output="$hello_sha1 blob $hello_size
+$hello_sha1 blob $hello_size
+$hello_content
+$commit_sha1 commit $commit_size
+$commit_sha1 commit $commit_size
+$commit_content
+$tag_sha1 tag $tag_size
+$tag_sha1 tag $tag_size
+$tag_content
+deadbeef missing
+ missing"
+
+test_expect_success "--batch-command with mixed calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline \
+	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -963,5 +1133,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep -E "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*flush is only for --buffer mode.*" err
+'
+
+test_expect_success 'batch-command flush empty queue' '
+	echo flush >cmd &&
+	test_expect_code 0 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep -E "^error:.*nothing to flush.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents
  2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-08 23:43       ` Junio C Hamano
  0 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-08 23:43 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +enum batch_command {
> +	BATCH_COMMAND_CONTENTS,
> +	BATCH_COMMAND_INFO,
> +};
> +
>  struct batch_options {
>  	int enabled;
>  	int follow_symlinks;
> -	int print_contents;
> +	enum batch_command command_mode;
>  	int buffer_output;
>  	int all_objects;
>  	int unordered;
> @@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
>  	strbuf_addch(scratch, '\n');
>  	batch_write(opt, scratch->buf, scratch->len);
>  
> -	if (opt->print_contents) {
> +	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
>  		print_object_or_die(opt, data);
>  		batch_write(opt, "\n", 1);
>  	}

Nice.

> @@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
>  	 * If we are printing out the object, then always fill in the type,
>  	 * since we will want to decide whether or not to stream.
>  	 */
> -	if (opt->print_contents)
> +	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
>  		data.info.typep = &data.type;
>  
>  	if (opt->all_objects) {
> @@ -635,7 +640,12 @@ static int batch_option_callback(const struct option *opt,
>  	}
>  
>  	bo->enabled = 1;
> -	bo->print_contents = !strcmp(opt->long_name, "batch");
> +
> +	if (!strcmp(opt->long_name, "batch"))
> +		bo->command_mode = BATCH_COMMAND_CONTENTS;
> +	if (!strcmp(opt->long_name, "batch-check"))
> +		bo->command_mode = BATCH_COMMAND_INFO;

This may want to become if / else if / else cascade, whose last
"else" clause would say

	BUG("%s given to batch-option-callback", opt->long_name);

perhaps, but it is so minor that there is no need to reroll only for
this.
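
A minimal standalone sketch of such a cascade follows. The demo_* names
are hypothetical, and because BUG() is only available inside git, this
version returns -1 at the point where the real callback would call it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the two batch modes in this patch. */
enum demo_batch_mode {
	DEMO_MODE_CONTENTS,
	DEMO_MODE_INFO
};

/*
 * Map an option's long name to a batch mode. Returns 0 and sets *mode
 * for a known name, or -1 for anything else, so an option that is
 * wired to the callback without being handled cannot slip through.
 */
static int demo_mode_from_option(const char *long_name,
				 enum demo_batch_mode *mode)
{
	assert(long_name && mode);

	if (!strcmp(long_name, "batch"))
		*mode = DEMO_MODE_CONTENTS;
	else if (!strcmp(long_name, "batch-check"))
		*mode = DEMO_MODE_INFO;
	else
		return -1; /* git itself would call BUG() here */

	return 0;
}
```

The point of the final "else" is that each option name selects exactly
one branch, and an unexpected name fails loudly instead of leaving the
mode at whatever value it had before.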

Looking good.


* Re: [PATCH v3 3/3] cat-file: add --batch-command mode
  2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-08 23:59       ` Junio C Hamano
  0 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-08 23:59 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 1c673385868..ec266ff95e9 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -17,15 +17,16 @@
>  #include "object-store.h"
>  #include "promisor-remote.h"
>  
> -enum batch_command {
> -	BATCH_COMMAND_CONTENTS,
> -	BATCH_COMMAND_INFO,
> +enum batch_mode {
> +	BATCH_MODE_CONTENTS,
> +	BATCH_MODE_INFO,

Would have been better to introduce batch_mode at [2/3] instead of
having to rename it like this, I guess.

> +	BATCH_MODE_PARSE_CMDS,

What the new mode does looks more like queue-and-dispatch, as
opposed to a mode that can only do "info" or another mode that can
only do "contents".

> @@ -513,6 +514,117 @@ static int batch_unordered_packed(const struct object_id *oid,
>  				      data);
>  }
>  
> +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> +			       struct strbuf *, struct expand_data *);
> +
> +struct queued_cmd {
> +	parse_cmd_fn_t fn;
> +	const char *line;
> +};
> +
> +static void parse_cmd_contents(struct batch_options *opt,
> +			     const char *line,
> +			     struct strbuf *output,
> +			     struct expand_data *data)
> +{
> +	opt->batch_mode = BATCH_MODE_CONTENTS;
> +	batch_one_object(line, output, opt, data);
> +}
> +
> +static void parse_cmd_info(struct batch_options *opt,
> +			   const char *line,
> +			   struct strbuf *output,
> +			   struct expand_data *data)
> +{
> +	opt->batch_mode = BATCH_MODE_INFO;
> +	batch_one_object(line, output, opt, data);
> +}

OK.

> +static void flush_batch_calls(struct batch_options *opt,
> +		struct strbuf *output,
> +		struct expand_data *data,
> +		struct queued_cmd *cmds,
> +		int nr)
> +{
> +	int i;

Have a blank line here between the decl(s) and the first statement.

> +	for (i = 0; i < nr; i++)
> +		cmds[i].fn(opt, cmds[i].line, output, data);
> +
> +	fflush(stdout);
> +}
> +
> +static const struct parse_cmd {
> +	const char *prefix;
> +	parse_cmd_fn_t fn;
> +	unsigned takes_args;
> +} commands[] = {
> +	{ "contents", parse_cmd_contents, 1},
> +	{ "info", parse_cmd_info, 1},
> +};
> +
> +static void batch_objects_command(struct batch_options *opt,
> +				    struct strbuf *output,
> +				    struct expand_data *data)
> +{
> +	struct strbuf input = STRBUF_INIT;
> +	struct queued_cmd *cmds = NULL;
> +	size_t alloc = 0, nr = 0;
> +
> +	while (!strbuf_getline(&input, stdin)) {
> +		int i;
> +		const struct parse_cmd *cmd = NULL;
> +		const char *p = NULL, *cmd_end;
> +		struct queued_cmd call = {0};
> +
> +		if (!input.len)
> +			die(_("empty command in input"));
> +		if (isspace(*input.buf))
> +			die(_("whitespace before command: '%s'"), input.buf);
> +
> +		if (skip_prefix(input.buf, "flush", &cmd_end)) {
> +			if (!opt->buffer_output)
> +				die(_("flush is only for --buffer mode"));
> +			if (*cmd_end)
> +				die(_("flush takes no arguments"));
> +			if (!nr)
> +				error(_("nothing to flush"));

I am not sure if "you already gave us flush and haven't given a new
command, saying 'flush' in such a state is an error" is a good
interface.  What does it achieve to punish such a caller like this
(as opposed to just iterate the loop zero times)?

> +			flush_batch_calls(opt, output, data, cmds, nr);

This iterated cmds[] array and called .fn() for each.  For a command
in the cmds[] array that takes an argument, each element in cmds[]
has a pointer that holds memory obtained from xstrdup().

Rewinding the array with "nr = 0" to reuse the slots we have
allocated is good, but the memory pointed at by the .line member of
these elements must be freed when this happens.

Perhaps flush_batch_calls() should do so while iterating, i.e.

	for (i = 0; i < nr; i++) {
		cmds[i].fn(opt, cmds[i].line, output, data);
		free(cmds[i].line);
	}
	fflush(stdout);

or something like that?
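
To make the ownership concrete, here is a standalone sketch of a queue
whose flush runs each command, frees the duplicated line, and rewinds
for reuse. It uses plain malloc()/realloc() instead of git's xstrdup()
and ALLOC_GROW(), all demo_* names are hypothetical, and it assumes the
line member is declared as a non-const "char *" so that it can be
handed to free() without a cast. Flushing an empty queue simply
iterates zero times.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*demo_cmd_fn)(const char *line, void *ctx);

struct demo_queued_cmd {
	demo_cmd_fn fn;
	char *line; /* owned copy, released by the flush */
};

struct demo_queue {
	struct demo_queued_cmd *cmd;
	size_t nr, alloc;
};

/* Example command: add the length of the line to a size_t counter. */
static void demo_count_bytes(const char *line, void *ctx)
{
	*(size_t *)ctx += strlen(line);
}

/* Queue a command with its own copy of the line; -1 on allocation failure. */
static int demo_queue_push(struct demo_queue *q, demo_cmd_fn fn,
			   const char *line)
{
	char *copy;

	assert(q && fn && line);
	if (q->nr == q->alloc) {
		size_t alloc = q->alloc ? 2 * q->alloc : 4;
		struct demo_queued_cmd *grown =
			realloc(q->cmd, alloc * sizeof(*grown));

		if (!grown)
			return -1;
		q->cmd = grown;
		q->alloc = alloc;
	}
	copy = malloc(strlen(line) + 1);
	if (!copy)
		return -1;
	strcpy(copy, line);
	q->cmd[q->nr].fn = fn;
	q->cmd[q->nr].line = copy;
	q->nr++;
	return 0;
}

/* Run and release every queued command; returns how many were run. */
static size_t demo_queue_flush(struct demo_queue *q, void *ctx)
{
	size_t i, ran = q->nr;

	for (i = 0; i < q->nr; i++) {
		q->cmd[i].fn(q->cmd[i].line, ctx);
		free(q->cmd[i].line);
		q->cmd[i].line = NULL;
	}
	q->nr = 0; /* keep the array itself for reuse */
	return ran;
}

/* Drop anything still pending and free the array. */
static void demo_queue_release(struct demo_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++)
		free(q->cmd[i].line);
	free(q->cmd);
	q->cmd = NULL;
	q->nr = q->alloc = 0;
}
```

A caller would push one entry per input line, call demo_queue_flush()
on each "flush", and call demo_queue_release() once at end of input.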

A tangent, but naming an array as singular, cmd[], would work more
naturally when the code often works on individual elements of the
array, i.e. work_on(cmd[4]) would name the "4th cmd", which would
not work well if the array were named cmds[].



* Re: [PATCH v2 2/2] cat-file: add --batch-command mode
  2022-02-08 18:09           ` Junio C Hamano
@ 2022-02-09  0:11             ` Jonathan Tan
  0 siblings, 0 replies; 97+ messages in thread
From: Jonathan Tan @ 2022-02-09  0:11 UTC (permalink / raw)
  To: gitster; +Cc: git, johncai86, jonathantanmy, phillip.wood123

Junio C Hamano <gitster@pobox.com> writes:
> I agree that it would be ideal to have just one way generic and
> extensible enough.  I do not know if there are much difference in
> that area between the two approaches, though.  The RFC I saw did
> look more complex and rigidly specified with framing and such, but
> that is only the syntax part---in the way in which interaction
> between two processes happen, I didn't quite see fundamental
> differences.  I'd expect it wouldn't be too much trouble to add new
> commands to code written using either approach (although I haven't
> seen yours yet ;-).
> 
> Thanks.

It is similar to this approach, except:
 - the approach I sent out uses pkt-line, which might be difficult to
   retrofit to cat-file if we need it
 - in the future, we want the Git side to be able to initiate requests
 - (possibly minor) it may be confusing if we add functionality to
   cat-file that is not about reading objects

Other than that, yes, they are similar.


* Re: [PATCH v3 0/3] Add cat-file --batch-command flag
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-09 21:40     ` Junio C Hamano
  2022-02-09 22:22       ` John Cai
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-09 21:40 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> John Cai (3):
>   cat-file: rename cmdmode to transform_mode
>   cat-file: introduce batch_command enum to replace print_contents
>   cat-file: add --batch-command mode
>
>  Documentation/git-cat-file.txt |  24 ++++
>  builtin/cat-file.c             | 154 ++++++++++++++++++++++--
>  t/t1006-cat-file.sh            | 207 ++++++++++++++++++++++++++++++++-
>  3 files changed, 373 insertions(+), 12 deletions(-)

Does t1006-cat-file.sh pass the --stress test?  I have no time to
test it for you but I've seen "make test" got stuck and this is the
only cat-file related change in flight.



* Re: [PATCH v3 0/3] Add cat-file --batch-command flag
  2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
@ 2022-02-09 22:22       ` John Cai
  2022-02-09 23:10         ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-09 22:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan

Hi Junio

On 9 Feb 2022, at 16:40, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> John Cai (3):
>>   cat-file: rename cmdmode to transform_mode
>>   cat-file: introduce batch_command enum to replace print_contents
>>   cat-file: add --batch-command mode
>>
>>  Documentation/git-cat-file.txt |  24 ++++
>>  builtin/cat-file.c             | 154 ++++++++++++++++++++++--
>>  t/t1006-cat-file.sh            | 207 ++++++++++++++++++++++++++++++++-
>>  3 files changed, 373 insertions(+), 12 deletions(-)
>
> Does t1006-cat-file.sh pass the --stress test?  I have no time to
> test it for you but I've seen "make test" got stuck and this is the
> only cat-file related change in flight.

Yes it looks like there are some failures. Thanks for pointing this out. It
looks like the flush test is getting stuck. I can actually reproduce it on my
end when I do a make clean in t/ and then run the test. Will investigate.

thanks!


* Re: [PATCH v3 0/3] Add cat-file --batch-command flag
  2022-02-09 22:22       ` John Cai
@ 2022-02-09 23:10         ` John Cai
  0 siblings, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-09 23:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan

On 9 Feb 2022, at 17:22, John Cai wrote:

> Hi Junio
>
> On 9 Feb 2022, at 16:40, Junio C Hamano wrote:
>
>> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> John Cai (3):
>>>   cat-file: rename cmdmode to transform_mode
>>>   cat-file: introduce batch_command enum to replace print_contents
>>>   cat-file: add --batch-command mode
>>>
>>>  Documentation/git-cat-file.txt |  24 ++++
>>>  builtin/cat-file.c             | 154 ++++++++++++++++++++++--
>>>  t/t1006-cat-file.sh            | 207 ++++++++++++++++++++++++++++++++-
>>>  3 files changed, 373 insertions(+), 12 deletions(-)
>>
>> Does t1006-cat-file.sh pass the --stress test?  I have no time to
>> test it for you but I've seen "make test" got stuck and this is the
>> only cat-file related change in flight.
>
> Yes it looks like there are some failures. Thanks for pointing this out. It
> looks like the flush test is getting stuck. I can actually reproduce it on my
> end when I do a make clean in t/ and then run the test. Will investigate.

I believe this was the culprit, as the stress tests that failed passed once I
removed this:

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 9428a04482..a20c8dae85 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -237,7 +237,7 @@ run_buffer_test_flush () {
        size=$3

        mkfifo input &&
-       test_when_finished 'rm input; exec 8<&-' &&
+       test_when_finished 'rm input' &&
        mkfifo output &&
        exec 9<>output &&
        test_when_finished 'rm output; exec 9<&-'

I was closing file descriptor 8, which was never opened, but I don't fully
understand why that would cause problems.

>
> thanks!


* [PATCH v4 0/3] Add cat-file --batch-command flag
  2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
@ 2022-02-10  4:01     ` John Cai via GitGitGadget
  2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                         ` (4 more replies)
  4 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-10  4:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has three parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress
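
As an illustration of the "cascading logic" item, the sketch below maps an
option name to a batch mode with an if/else-if chain, so that an unexpected
name reaches a final else branch instead of silently leaving the mode
unset. The helper mode_from_option and its return convention are
assumptions made for this stand-alone example; the patch itself does this
inside the option callback and calls BUG() in the final branch.

```c
#include <assert.h>
#include <string.h>

enum batch_mode {
	BATCH_MODE_CONTENTS,
	BATCH_MODE_INFO,
	BATCH_MODE_QUEUE_AND_DISPATCH,
};

/*
 * Hypothetical stand-alone helper: map a long option name to a batch mode.
 * Returns 0 on success and -1 for an unexpected name (where the patch
 * calls BUG() instead).
 */
static int mode_from_option(const char *long_name, enum batch_mode *mode)
{
	assert(long_name && mode);
	if (!strcmp(long_name, "batch")) {
		*mode = BATCH_MODE_CONTENTS;
	} else if (!strcmp(long_name, "batch-check")) {
		*mode = BATCH_MODE_INFO;
	} else if (!strcmp(long_name, "batch-command")) {
		*mode = BATCH_MODE_QUEUE_AND_DISPATCH;
	} else {
		/* No earlier branch matched: report it rather than guess. */
		return -1;
	}
	return 0;
}
```

With independent if statements, a name that matches none of them would fall
through unnoticed; the chain makes that case explicit.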

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * clarified verbiage in commit messages
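
The queue-then-flush behaviour described in the first two items can be
sketched independently of cat-file. This is a simplified model under stated
assumptions, not the code from the patch: struct queue, run_one, flush_queue
and feed are hypothetical names, and "running" a command here only prints it
and bumps a counter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue of pending command lines. */
struct queue {
	char **line;
	size_t nr, alloc;
	size_t executed; /* how many commands have run so far */
};

/* Stand-in for real work: print the command and count it. */
static void run_one(struct queue *q, const char *line)
{
	printf("ran: %s\n", line);
	q->executed++;
}

/* Run and free everything queued since the last flush. */
static void flush_queue(struct queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++) {
		run_one(q, q->line[i]);
		free(q->line[i]);
	}
	q->nr = 0;
	fflush(stdout);
}

/*
 * Feed one input line. With `buffered` set, commands wait for "flush";
 * otherwise each one runs immediately. (The patch dies on "flush" without
 * --buffer; this sketch simply flushes an empty queue.)
 */
static void feed(struct queue *q, const char *line, int buffered)
{
	char *copy;
	size_t len;

	assert(q && line);
	if (!strcmp(line, "flush")) {
		flush_queue(q);
		return;
	}
	if (!buffered) {
		run_one(q, line);
		fflush(stdout);
		return;
	}
	if (q->nr == q->alloc) {
		q->alloc = q->alloc ? 2 * q->alloc : 4;
		q->line = realloc(q->line, q->alloc * sizeof(*q->line));
		assert(q->line);
	}
	/* Keep a private copy: the caller may reuse its input buffer. */
	len = strlen(line) + 1;
	copy = malloc(len);
	assert(copy);
	memcpy(copy, line, len);
	q->line[q->nr++] = copy;
}
```

Feeding two commands with buffered set leaves both pending until "flush"
arrives; with buffered clear, each command runs as soon as it is fed.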

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (3):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  24 ++++
 builtin/cat-file.c             | 159 +++++++++++++++++++++++--
 t/t1006-cat-file.sh            | 211 ++++++++++++++++++++++++++++++++-
 3 files changed, 382 insertions(+), 12 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v4
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v3:

 1:  fa6294387ab = 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
 2:  ae2dfa512a7 ! 2:  81bc5ae1fc1 cat-file: introduce batch_command enum to replace print_contents
     @@ Metadata
      Author: John Cai <johncai86@gmail.com>
      
       ## Commit message ##
     -    cat-file: introduce batch_command enum to replace print_contents
     +    cat-file: introduce batch_mode enum to replace print_contents
      
          The next patch introduces a new --batch-command flag. Including --batch
          and --batch-check, we will have a total of three batch modes. Currently,
     @@ builtin/cat-file.c
       #include "object-store.h"
       #include "promisor-remote.h"
       
     -+enum batch_command {
     -+	BATCH_COMMAND_CONTENTS,
     -+	BATCH_COMMAND_INFO,
     ++enum batch_mode {
     ++	BATCH_MODE_CONTENTS,
     ++	BATCH_MODE_INFO,
      +};
      +
       struct batch_options {
       	int enabled;
       	int follow_symlinks;
      -	int print_contents;
     -+	enum batch_command command_mode;
     ++	enum batch_mode batch_mode;
       	int buffer_output;
       	int all_objects;
       	int unordered;
     @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
       	batch_write(opt, scratch->buf, scratch->len);
       
      -	if (opt->print_contents) {
     -+	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
     ++	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
       		print_object_or_die(opt, data);
       		batch_write(opt, "\n", 1);
       	}
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	 * since we will want to decide whether or not to stream.
       	 */
      -	if (opt->print_contents)
     -+	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
     ++	if (opt->batch_mode == BATCH_MODE_CONTENTS)
       		data.info.typep = &data.type;
       
       	if (opt->all_objects) {
     @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       	bo->enabled = 1;
      -	bo->print_contents = !strcmp(opt->long_name, "batch");
      +
     -+	if (!strcmp(opt->long_name, "batch"))
     -+		bo->command_mode = BATCH_COMMAND_CONTENTS;
     -+	if (!strcmp(opt->long_name, "batch-check"))
     -+		bo->command_mode = BATCH_COMMAND_INFO;
     ++	if (!strcmp(opt->long_name, "batch")) {
     ++		bo->batch_mode = BATCH_MODE_CONTENTS;
     ++	} else if (!strcmp(opt->long_name, "batch-check")) {
     ++		bo->batch_mode = BATCH_MODE_INFO;
     ++	} else {
     ++		BUG("%s given to batch-option-callback", opt->long_name);
     ++	}
      +
       	bo->format = arg;
       
 3:  1ab5524ee87 ! 3:  6c51324a662 cat-file: add --batch-command mode
     @@ Documentation/git-cat-file.txt: OPTIONS
      
       ## builtin/cat-file.c ##
      @@
     - #include "object-store.h"
     - #include "promisor-remote.h"
     - 
     --enum batch_command {
     --	BATCH_COMMAND_CONTENTS,
     --	BATCH_COMMAND_INFO,
     -+enum batch_mode {
     -+	BATCH_MODE_CONTENTS,
     -+	BATCH_MODE_INFO,
     -+	BATCH_MODE_PARSE_CMDS,
     + enum batch_mode {
     + 	BATCH_MODE_CONTENTS,
     + 	BATCH_MODE_INFO,
     ++	BATCH_MODE_QUEUE_AND_DISPATCH,
       };
       
       struct batch_options {
     - 	int enabled;
     - 	int follow_symlinks;
     --	enum batch_command command_mode;
     -+	enum batch_mode batch_mode;
     - 	int buffer_output;
     - 	int all_objects;
     - 	int unordered;
     -@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
     - 	strbuf_addch(scratch, '\n');
     - 	batch_write(opt, scratch->buf, scratch->len);
     - 
     --	if (opt->command_mode == BATCH_COMMAND_CONTENTS) {
     -+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
     - 		print_object_or_die(opt, data);
     - 		batch_write(opt, "\n", 1);
     - 	}
      @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oid,
       				      data);
       }
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +
      +struct queued_cmd {
      +	parse_cmd_fn_t fn;
     -+	const char *line;
     ++	char *line;
      +};
      +
      +static void parse_cmd_contents(struct batch_options *opt,
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	batch_one_object(line, output, opt, data);
      +}
      +
     -+static void flush_batch_calls(struct batch_options *opt,
     ++static void dispatch_calls(struct batch_options *opt,
      +		struct strbuf *output,
      +		struct expand_data *data,
     -+		struct queued_cmd *cmds,
     ++		struct queued_cmd *cmd,
      +		int nr)
      +{
      +	int i;
     -+	for (i = 0; i < nr; i++)
     -+		cmds[i].fn(opt, cmds[i].line, output, data);
     ++
     ++	for (i = 0; i < nr; i++){
     ++		cmd[i].fn(opt, cmd[i].line, output, data);
     ++		free(cmd[i].line);
     ++	}
      +
      +	fflush(stdout);
      +}
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +				    struct expand_data *data)
      +{
      +	struct strbuf input = STRBUF_INIT;
     -+	struct queued_cmd *cmds = NULL;
     ++	struct queued_cmd *queued_cmd = NULL;
      +	size_t alloc = 0, nr = 0;
      +
      +	while (!strbuf_getline(&input, stdin)) {
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +				die(_("flush is only for --buffer mode"));
      +			if (*cmd_end)
      +				die(_("flush takes no arguments"));
     -+			if (!nr)
     -+				error(_("nothing to flush"));
      +
     -+			flush_batch_calls(opt, output, data, cmds, nr);
     ++			dispatch_calls(opt, output, data, queued_cmd, nr);
      +			nr = 0;
      +			continue;
      +		}
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			cmd->fn(opt, p, output, data);
      +			continue;
      +		}
     -+		
     -+		ALLOC_GROW(cmds, nr + 1, alloc);
     ++
     ++		ALLOC_GROW(queued_cmd, nr + 1, alloc);
      +		call.fn = cmd->fn;
     -+		call.line = xstrdup(p);
     -+		cmds[nr++] = call;
     ++		call.line = xstrdup_or_null(p);
     ++		queued_cmd[nr++] = call;
      +	}
      +
      +	if (opt->buffer_output && nr)
     -+		flush_batch_calls(opt, output, data, cmds, nr);
     ++		dispatch_calls(opt, output, data, queued_cmd, nr);
      +
     -+	free(cmds);
     ++	free(queued_cmd);
      +	strbuf_release(&input);
      +}
      +
       static int batch_objects(struct batch_options *opt)
       {
       	struct strbuf input = STRBUF_INIT;
     -@@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
     - 	 * If we are printing out the object, then always fill in the type,
     - 	 * since we will want to decide whether or not to stream.
     - 	 */
     --	if (opt->command_mode == BATCH_COMMAND_CONTENTS)
     -+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
     - 		data.info.typep = &data.type;
     - 
     - 	if (opt->all_objects) {
      @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	save_warning = warn_on_object_refname_ambiguity;
       	warn_on_object_refname_ambiguity = 0;
       
     -+	if (opt->batch_mode == BATCH_MODE_PARSE_CMDS) {
     ++	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
      +		batch_objects_command(opt, &output, &data);
      +		goto cleanup;
      +	}
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	strbuf_release(&output);
       	warn_on_object_refname_ambiguity = save_warning;
      @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
     + 		bo->batch_mode = BATCH_MODE_CONTENTS;
     + 	} else if (!strcmp(opt->long_name, "batch-check")) {
     + 		bo->batch_mode = BATCH_MODE_INFO;
     ++	} else if (!strcmp(opt->long_name, "batch-command")) {
     ++		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
     + 	} else {
     + 		BUG("%s given to batch-option-callback", opt->long_name);
       	}
     - 
     - 	bo->enabled = 1;
     --
     - 	if (!strcmp(opt->long_name, "batch"))
     --		bo->command_mode = BATCH_COMMAND_CONTENTS;
     -+		bo->batch_mode = BATCH_MODE_CONTENTS;
     - 	if (!strcmp(opt->long_name, "batch-check"))
     --		bo->command_mode = BATCH_COMMAND_INFO;
     -+		bo->batch_mode = BATCH_MODE_INFO;
     -+	if (!strcmp(opt->long_name, "batch-command"))
     -+		bo->batch_mode = BATCH_MODE_PARSE_CMDS;
     - 
     - 	bo->format = arg;
     - 
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
       			N_("like --batch, but don't emit <contents>"),
       			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     @@ t/t1006-cat-file.sh: $content"
      +	size=$3
      +
      +	mkfifo input &&
     -+	test_when_finished 'rm input; exec 8<&-' &&
     ++	test_when_finished 'rm input' &&
      +	mkfifo output &&
      +	exec 9<>output &&
      +	test_when_finished 'rm output; exec 9<&-'
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +info $commit_sha1
      +info $tag_sha1
      +info deadbeef
     -+info 
      +flush
      +"
      +
     ++batch_command_info_output="$hello_sha1 blob $hello_size
     ++$tree_sha1 tree $tree_size
     ++$commit_sha1 commit $commit_size
     ++$tag_sha1 tag $tag_size
     ++deadbeef missing"
     ++
      +test_expect_success "--batch-command with multiple info calls gives correct format" '
     -+	test "$batch_check_output" = "$(echo_without_newline \
     ++	test "$batch_command_info_output" = "$(echo_without_newline \
      +	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
      +'
      +
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +contents $commit_sha1
      +contents $tag_sha1
      +contents deadbeef
     -+contents 
      +flush
      +"
      +
     ++batch_command_output="$hello_sha1 blob $hello_size
     ++$hello_content
     ++$commit_sha1 commit $commit_size
     ++$commit_content
     ++$tag_sha1 tag $tag_size
     ++$tag_content
     ++deadbeef missing"
     ++
      +test_expect_success "--batch-command with multiple contents calls gives correct format" '
     -+	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
     ++	test "$(maybe_remove_timestamp "$batch_command_output" 1)" = \
      +	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
      +'
      +
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +info $tag_sha1
      +contents $tag_sha1
      +contents deadbeef
     -+info 
      +flush
      +"
      +
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +$tag_sha1 tag $tag_size
      +$tag_sha1 tag $tag_size
      +$tag_content
     -+deadbeef missing
     -+ missing"
     ++deadbeef missing"
      +
      +test_expect_success "--batch-command with mixed calls gives correct format" '
      +	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch-all-objects --batch-c
      +	echo "flush arg" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
      +	grep -E "^fatal:.*flush is only for --buffer mode.*" err
     -+'
     -+
     -+test_expect_success 'batch-command flush empty queue' '
     -+	echo flush >cmd &&
     -+	test_expect_code 0 git cat-file --batch-command --buffer <cmd 2>err &&
     -+	grep -E "^error:.*nothing to flush.*" err
      +'
       
       test_done

-- 
gitgitgadget


* [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
@ 2022-02-10  4:01       ` John Cai via GitGitGadget
  2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-10  4:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be --batch-command that will read
commands from stdin. --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the result output of the blob will be transformed, either according to
--filter or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
  2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-10  4:01       ` John Cai via GitGitGadget
  2022-02-10 10:10         ` Christian Couder
  2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-10  4:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

The next patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. Currently,
from the batch_options struct's perspective, print_contents is the only
member used to distinguish between the different modes. This makes the
code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..709510c6564 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,15 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch")) {
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	} else if (!strcmp(opt->long_name, "batch-check")) {
+		bo->batch_mode = BATCH_MODE_INFO;
+	} else {
+		BUG("%s given to batch-option-callback", opt->long_name);
+	}
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
  2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-10  4:01       ` John Cai via GitGitGadget
  2022-02-10 10:57         ` Phillip Wood
  2022-02-10 22:46         ` Eric Sunshine
  2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
  4 siblings, 2 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-10  4:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, John Cai, John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content. One for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective, as it requires
spawning a new cat-file process each time, and option 2 is hacky and not a
good long-term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.
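
The "<command> [args] LF" shape lends itself to a table-driven lookup, which
is one way new commands could be added later. The sketch below is a
stand-alone illustration rather than the code in this patch: cmd_entry,
cmd_table and lookup_cmd are hypothetical names, and unlike the patch it
keeps flush in the same table and insists on a word boundary after the
command name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command table; the names mirror the commands described above. */
struct cmd_entry {
	const char *name;
	int takes_args;
};

static const struct cmd_entry cmd_table[] = {
	{ "contents", 1 },
	{ "info", 1 },
	{ "flush", 0 },
};

/*
 * Look up the command at the start of `line` (without its trailing LF).
 * On success, return the table entry and point *args at the text after the
 * separating space, or leave *args NULL when the command takes no
 * arguments. Return NULL for unknown or malformed input.
 */
static const struct cmd_entry *lookup_cmd(const char *line, const char **args)
{
	size_t i;

	assert(line && args);
	*args = NULL;
	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		const struct cmd_entry *c = &cmd_table[i];
		size_t n = strlen(c->name);

		if (strncmp(line, c->name, n))
			continue;
		if (c->takes_args) {
			/* Require "<name> <arg>" with a non-empty argument. */
			if (line[n] != ' ' || !line[n + 1])
				continue;
			*args = line + n + 1;
		} else if (line[n]) {
			/* Reject trailing text such as "flush now". */
			continue;
		}
		return c;
	}
	return NULL;
}
```

Here "info deadbeef" resolves to the info entry with "deadbeef" as its
argument, while "information", "flush now" and a bare "contents" are all
rejected.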

These can be used in the following way with --buffer:

info <sha1> LF
contents <sha1> LF
contents <sha1> LF
info <sha1> LF
flush LF
info <sha1> LF
flush LF

When used without --buffer:

info <sha1> LF
contents <sha1> LF
contents <sha1> LF
info <sha1> LF
info <sha1> LF

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  24 ++++
 builtin/cat-file.c             | 124 +++++++++++++++++++
 t/t1006-cat-file.sh            | 211 ++++++++++++++++++++++++++++++++-
 3 files changed, 358 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..d77a61c47de 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,30 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin.
+	May not be combined with any other options or arguments except
+	`--textconv` or `--filters`, in which case the input lines also need to
+	specify the path, separated by whitespace.  See the section
+	`BATCH OUTPUT` below for details.
++
+--
+contents <object>::
+	Print object contents for object reference <object>. This corresponds to
+	the output of --batch.
+
+info <object>::
+	Print object info for object reference <object>. This corresponds to the
+	output of --batch-check.
+
+flush::
+	Used in --buffer mode to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When --buffer
+	is used, no output will come until flush is issued. When --buffer is not
+	used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 709510c6564..713658cc222 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,118 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++){
+		cmd[i].fn(opt, cmd[i].line, output, data);
+		free(cmd[i].line);
+	}
+
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
+			if (!opt->buffer_output)
+				die(_("flush is only for --buffer mode"));
+			if (*cmd_end)
+				die(_("flush takes no arguments"));
+
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			nr = 0;
+			continue;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args)
+				p = cmd_end + 1;
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output && nr)
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +708,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +730,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +763,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	} else if (!strcmp(opt->long_name, "batch-check")) {
 		bo->batch_mode = BATCH_MODE_INFO;
+	} else if (!strcmp(opt->long_name, "batch-command")) {
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	} else {
 		BUG("%s given to batch-option-callback", opt->long_name);
 	}
@@ -696,6 +816,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..a20c8dae85d 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -177,6 +177,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -213,6 +231,70 @@ $content"
     '
 }
 
+run_buffer_test_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	mkfifo input &&
+	test_when_finished 'rm input' &&
+	mkfifo output &&
+	exec 9<>output &&
+	test_when_finished 'rm output; exec 9<&-'
+	(
+		# TODO - Ideally we'd pipe the output of cat-file
+		# through "sed s'/$/\\/'" to make sure that that read
+		# would consume all the available
+		# output. Unfortunately we cannot do this as we cannot
+		# control when sed flushes its output. We could write
+		# a test helper in C that appended a '\' to the end of
+		# each line and flushes its output after every line.
+		git cat-file --buffer --batch-command <input 2>err &
+		echo $! &&
+		wait $!
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	echo "$sha1 $type $size" >expect &&
+	test_write_lines "info $sha1" flush "info $sha1" >input
+	# TODO - consume all available input, not just one
+	# line (see above).
+	read actual <&9 &&
+	echo "$actual" >actual &&
+	test_cmp expect actual &&
+	test_must_be_empty err
+}
+
+run_buffer_test_no_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	touch output &&
+	test_when_finished 'rm output' &&
+	mkfifo input &&
+	test_when_finished 'rm input' &&
+	mkfifo pid &&
+	exec 9<>pid &&
+	test_when_finished 'rm pid; exec 9<&-'
+	(
+		git cat-file --buffer --batch-command <input >output &
+		echo $! &&
+		wait $!
+		echo $?
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	test_write_lines "info $sha1" "info $sha1" &&
+	kill $cat_file_pid &&
+	read status <&9 &&
+	test_must_be_empty output
+}
+
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
 hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
@@ -224,6 +306,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
+       run_buffer_test_flush blob $hello_sha1 $hello_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
+       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -238,6 +328,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
+       run_buffer_test_flush tree $tree_sha1 $tree_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
+       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
+'
+
 commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
@@ -249,6 +347,14 @@ $commit_message"
 
 run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
+       run_buffer_test_flush commit $commit_sha1 $commit_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
+       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
+'
+
 tag_header_without_timestamp="object $hello_sha1
 type blob
 tag hellotag
@@ -263,11 +369,19 @@ tag_size=$(strlen "$tag_content")
 
 run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
 
+test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
+       run_buffer_test_flush tag $tag_sha1 $tag_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
+       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
+'
+
 test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -373,6 +487,72 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+batch_command_info_input="info $hello_sha1
+info $tree_sha1
+info $commit_sha1
+info $tag_sha1
+info deadbeef
+flush
+"
+
+batch_command_info_output="$hello_sha1 blob $hello_size
+$tree_sha1 tree $tree_size
+$commit_sha1 commit $commit_size
+$tag_sha1 tag $tag_size
+deadbeef missing"
+
+test_expect_success "--batch-command with multiple info calls gives correct format" '
+	test "$batch_command_info_output" = "$(echo_without_newline \
+	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
+'
+
+batch_command_contents_input="contents $hello_sha1
+contents $commit_sha1
+contents $tag_sha1
+contents deadbeef
+flush
+"
+
+batch_command_output="$hello_sha1 blob $hello_size
+$hello_content
+$commit_sha1 commit $commit_size
+$commit_content
+$tag_sha1 tag $tag_size
+$tag_content
+deadbeef missing"
+
+test_expect_success "--batch-command with multiple contents calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_command_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
+'
+
+batch_command_mixed_input="info $hello_sha1
+contents $hello_sha1
+info $commit_sha1
+contents $commit_sha1
+info $tag_sha1
+contents $tag_sha1
+contents deadbeef
+flush
+"
+
+batch_command_mixed_output="$hello_sha1 blob $hello_size
+$hello_sha1 blob $hello_size
+$hello_content
+$commit_sha1 commit $commit_size
+$commit_sha1 commit $commit_size
+$commit_content
+$tag_sha1 tag $tag_size
+$tag_sha1 tag $tag_size
+$tag_content
+deadbeef missing"
+
+test_expect_success "--batch-command with mixed calls gives correct format" '
+	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
+	"$(maybe_remove_timestamp "$(echo_without_newline \
+	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -963,5 +1143,34 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep -E "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep -E "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-10 10:10         ` Christian Couder
  0 siblings, 0 replies; 97+ messages in thread
From: Christian Couder @ 2022-02-10 10:10 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, Eric Sunshine, Jonathan Tan, John Cai

On Thu, Feb 10, 2022 at 9:46 AM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: John Cai <johncai86@gmail.com>
>
> The next patch introduces a new --batch-command flag. Including --batch
> and --batch-check, we will have a total of three batch modes. Currently,
> from the batch_options struct's perspective, print_options is the only

Here you talk about "print_options"...

> member used to distinguish between the different modes. This makes the
> code harder to read.
>
> To reduce potential confusion, replace print_contents with an enum to

...but here it's "print_contents".

Also it would perhaps be a bit clearer if you introduced it saying
something like "the print_contents flag (or boolean?) is the only
member..."

> help readability and clarity.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: John Cai <johncai86@gmail.com>

> @@ -635,7 +640,15 @@ static int batch_option_callback(const struct option *opt,
>         }
>
>         bo->enabled = 1;
> -       bo->print_contents = !strcmp(opt->long_name, "batch");
> +
> +       if (!strcmp(opt->long_name, "batch")) {
> +               bo->batch_mode = BATCH_MODE_CONTENTS;
> +       } else if (!strcmp(opt->long_name, "batch-check")) {
> +               bo->batch_mode = BATCH_MODE_INFO;
> +       } else {
> +               BUG("%s given to batch-option-callback", opt->long_name);
> +       }

I think we prefer to remove braces when there is only one instruction.
So the above could be just:

       if (!strcmp(opt->long_name, "batch"))
               bo->batch_mode = BATCH_MODE_CONTENTS;
       else if (!strcmp(opt->long_name, "batch-check"))
               bo->batch_mode = BATCH_MODE_INFO;
       else
               BUG("%s given to batch-option-callback", opt->long_name);


* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-10 10:57         ` Phillip Wood
  2022-02-10 17:05           ` Junio C Hamano
  2022-02-10 18:55           ` John Cai
  2022-02-10 22:46         ` Eric Sunshine
  1 sibling, 2 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-10 10:57 UTC (permalink / raw)
  To: John Cai via GitGitGadget, git
  Cc: me, avarab, e, bagasdotme, gitster, Eric Sunshine, Jonathan Tan,
	John Cai

Hi John

I've concentrated on the tests, as others have commented on the
implementation.

On 10/02/2022 04:01, John Cai via GitGitGadget wrote:
> From: John Cai <johncai86@gmail.com>
> [...]
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 145eee11df9..a20c8dae85d 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -177,6 +177,24 @@ $content"
>   	test_cmp expect actual
>       '
>   
> +    for opt in --buffer --no-buffer
> +    do
> +	test -z "$content" ||
> +		test_expect_success "--batch-command $opt output of $type content is correct" '
> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
> +		test_cmp expect actual
> +	'
> +
> +	test_expect_success "--batch-command $opt output of $type info is correct" '
> +		echo "$sha1 $type $size" >expect &&
> +		test_write_lines "info $sha1" \
> +		| git cat-file --batch-command $opt >actual &&
> +		test_cmp expect actual
> +	'
> +    done
> +
>       test_expect_success "custom --batch-check format" '
>   	echo "$type $sha1" >expect &&
>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
> @@ -213,6 +231,70 @@ $content"
>       '
>   }
>   
> +run_buffer_test_flush () {
> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	mkfifo input &&
> +	test_when_finished 'rm input' &&
> +	mkfifo output &&
> +	exec 9<>output &&
> +	test_when_finished 'rm output; exec 9<&-'
> +	(
> +		# TODO - Ideally we'd pipe the output of cat-file
> +		# through "sed s'/$/\\/'" to make sure that that read
> +		# would consume all the available
> +		# output. Unfortunately we cannot do this as we cannot
> +		# control when sed flushes its output. We could write
> +		# a test helper in C that appended a '\' to the end of
> +		# each line and flushes its output after every line.
> +		git cat-file --buffer --batch-command <input 2>err &
> +		echo $! &&
> +		wait $!
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	echo "$sha1 $type $size" >expect &&
> +	test_write_lines "info $sha1" flush "info $sha1" >input

This closes input and so cat-file exits and flushes its output - 
therefore you are not testing whether flush actually flushes. When I 
wrote this test in [1], this line was inside a subshell that was
redirected to the input fifo so that the read happened before cat-file 
exited. This test is also not testing the exit code of cat-file or that 
the output is flushed on exit. Is there a reason you can't just use the 
test as I wrote it? I'm happy to explain anything that isn't clear.
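
The difference between closing the input FIFO and holding it open can be
sketched without git. In the snippet below, buffered_server is a
hypothetical stand-in for "git cat-file --buffer --batch-command" (it is
not git code): it queues one reply per command and writes only on "flush"
or at end of input. Because the writer keeps the input FIFO open while it
reads the first reply, that reply can only have been produced by the flush.

```shell
#!/bin/sh
# Hypothetical stand-in for a buffering batch process: replies are queued
# and written only when "flush" is seen or when input reaches EOF.
buffered_server () {
	queue=
	while read -r cmd arg
	do
		case "$cmd" in
		flush)
			printf '%s' "$queue"
			queue=
			;;
		*)
			queue="$queue$arg ok
"
			;;
		esac
	done
	printf '%s' "$queue"
}

dir=$(mktemp -d) || exit 1
mkfifo "$dir/input" "$dir/output" || exit 1
exec 9<>"$dir/output"		# keep the output FIFO open for reading
buffered_server <"$dir/input" >&9 &
server_pid=$!

{
	echo "info one" &&
	echo flush &&
	# The input FIFO is still open here, so the server has not seen
	# EOF: a reply at this point proves that flush really flushed.
	read -r first <&9 &&
	echo "info two"
} >"$dir/input"			# leaving the block closes the input FIFO

wait "$server_pid"		# the server flushes the rest at EOF
read -r second <&9
exec 9<&-
rm -rf "$dir"

echo "reply read while input was open: $first"
echo "reply read after input was closed: $second"
```

Writing all commands with a single redirection and reading afterwards, as
in the quoted line, would make both replies arrive only after end of
input, which is why it cannot distinguish a working flush from a broken one.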

> +	# TODO - consume all available input, not just one
> +	# line (see above).
> +	read actual <&9 &&
> +	echo "$actual" >actual &&
> +	test_cmp expect actual &&
> +	test_must_be_empty err
> +}
> +
> +run_buffer_test_no_flush () {
> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	touch output &&
> +	test_when_finished 'rm output' &&
> +	mkfifo input &&
> +	test_when_finished 'rm input' &&
> +	mkfifo pid &&
> +	exec 9<>pid &&
> +	test_when_finished 'rm pid; exec 9<&-'
> +	(
> +		git cat-file --buffer --batch-command <input >output &
> +		echo $! &&
> +		wait $!
> +		echo $?
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	test_write_lines "info $sha1" "info $sha1" &&

This prints to stdout rather than piping into cat-file, so cat-file would
not produce any output even if it exited normally. In my original [1] this
line is inside a subshell that is redirected to the input fifo.

> +	kill $cat_file_pid &&
> +	read status <&9 &&
> +	test_must_be_empty output
> +}
> +
>   hello_content="Hello World"
>   hello_size=$(strlen "$hello_content")
>   hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
> @@ -224,6 +306,14 @@ test_expect_success "setup" '
>   
>   run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
> +       run_buffer_test_flush blob $hello_sha1 $hello_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
> +       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
> +'

If we need to run the flush tests for each object type then could they 
go inside run_tests? Personally I think I'd be happy just to test the 
flush command on one object type.

>   test_expect_success '--batch-check without %(rest) considers whole line' '
>   	echo "$hello_sha1 blob $hello_size" >expect &&
>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
> @@ -238,6 +328,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
>   
>   run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
> +       run_buffer_test_flush tree $tree_sha1 $tree_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
> +       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
> +'
> +
>   commit_message="Initial commit"
>   commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
>   commit_size=$(($(test_oid hexsz) + 137))
> @@ -249,6 +347,14 @@ $commit_message"
>   
>   run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
> +       run_buffer_test_flush commit $commit_sha1 $commit_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
> +       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
> +'
> +
>   tag_header_without_timestamp="object $hello_sha1
>   type blob
>   tag hellotag
> @@ -263,11 +369,19 @@ tag_size=$(strlen "$tag_content")
>   
>   run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
> +       run_buffer_test_flush tag $tag_sha1 $tag_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
> +       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
> +'
> +
>   test_expect_success \
>       "Reach a blob from a tag pointing to it" \
>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>   
> -for batch in batch batch-check
> +for batch in batch batch-check batch-command
>   do
>       for opt in t s e p
>       do
> @@ -373,6 +487,72 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>   '
>   
> +batch_command_info_input="info $hello_sha1
> +info $tree_sha1
> +info $commit_sha1
> +info $tag_sha1
> +info deadbeef

I know there are existing uses of the constant in the file but I'm not 
thrilled about adding more.

> +flush

This flush is redundant, isn't it?

> +"
> +
> +batch_command_info_output="$hello_sha1 blob $hello_size
> +$tree_sha1 tree $tree_size
> +$commit_sha1 commit $commit_size
> +$tag_sha1 tag $tag_size
> +deadbeef missing"
> +
> +test_expect_success "--batch-command with multiple info calls gives correct format" '

Double quotes are generally reserved for test titles that use parameter 
substitution, which this one does not.

> +	test "$batch_command_info_output" = "$(echo_without_newline \
> +	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
> +'

This test and the one below are quite hard to follow. These days we try 
to avoid using test to compare strings, as when it fails it does not 
provide any clues as to what went wrong. Instead we use here documents 
and test_cmp so that when a test fails you can see what went wrong. Also, 
the setup happens inside the test:

test_expect_success '--batch-command with multiple info calls gives correct format' '
	cat >input <<-EOF &&
	info $hello_sha1
	info $tree_sha1
	info $commit_sha1
	info $tag_sha1
	info deadbeef
	flush
	EOF

	cat >expect <<-EOF &&
	$hello_sha1 blob $hello_size
	$tree_sha1 tree $tree_size
	$commit_sha1 commit $commit_size
	$tag_sha1 tag $tag_size
	deadbeef missing
	EOF

	git cat-file --batch-command --buffer <input >actual &&
	test_cmp expect actual
'
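
To see why a file comparison is easier to debug than a string comparison,
here is a small sketch that runs outside the test suite. It uses plain
"diff -u" in place of test_cmp (an assumption made so the snippet runs
anywhere), and the object lines are made up for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1

# Made-up sample data: the second line differs between the two files.
printf '%s\n' "aaaa blob 11" "bbbb tree 33" "deadbeef missing" >"$dir/expect"
printf '%s\n' "aaaa blob 11" "bbbb tree 34" "deadbeef missing" >"$dir/actual"

# A string comparison yields nothing but an exit status.
if test "$(cat "$dir/expect")" = "$(cat "$dir/actual")"
then
	string_result=same
else
	string_result=different
fi

# A file comparison also reports which line differs.
diff_output=$(diff -u "$dir/expect" "$dir/actual")
diff_status=$?

echo "string comparison says: $string_result"
echo "$diff_output"
rm -rf "$dir"
```

On failure the string comparison only tells you that something differs,
while the diff output points at the offending line.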

> +batch_command_contents_input="contents $hello_sha1
> +contents $commit_sha1
> +contents $tag_sha1
> +contents deadbeef
> +flush
> +"
> +
> +batch_command_output="$hello_sha1 blob $hello_size
> +$hello_content
> +$commit_sha1 commit $commit_size
> +$commit_content
> +$tag_sha1 tag $tag_size
> +$tag_content
> +deadbeef missing"
> +
> +test_expect_success "--batch-command with multiple contents calls gives correct format" '
> +	test "$(maybe_remove_timestamp "$batch_command_output" 1)" = \
> +	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
> +'
> +
> +batch_command_mixed_input="info $hello_sha1
> +contents $hello_sha1
> +info $commit_sha1
> +contents $commit_sha1
> +info $tag_sha1
> +contents $tag_sha1
> +contents deadbeef
> +flush
> +"
> +
> +batch_command_mixed_output="$hello_sha1 blob $hello_size
> +$hello_sha1 blob $hello_size
> +$hello_content
> +$commit_sha1 commit $commit_size
> +$commit_sha1 commit $commit_size
> +$commit_content
> +$tag_sha1 tag $tag_size
> +$tag_sha1 tag $tag_size
> +$tag_content
> +deadbeef missing"
> +
> +test_expect_success "--batch-command with mixed calls gives correct format" '
> +	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
> +	"$(maybe_remove_timestamp "$(echo_without_newline \
> +	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
> +'
> +
>   test_expect_success 'setup blobs which are likely to delta' '
>   	test-tool genrandom foo 10240 >foo &&
>   	{ cat foo && echo plus; } >foo-plus &&
> @@ -963,5 +1143,34 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>   	echo "$orig commit $orig_size" >expect &&
>   	test_cmp expect actual
>   '
> +test_expect_success 'batch-command empty command' '
> +	echo "" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*empty command in input.*" err
> +'

This test and the ones below look good, but they don't need to pass -E to 
grep as they are not using an extended regex.
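
As a quick illustration that a basic regular expression is enough here:
"^", "." and "*" mean the same thing in basic and extended syntax, so the
patterns in these tests match without -E. The error text below is the
message the first test expects; the alternation pattern is hypothetical
and only shows a case where -E would actually matter.

```shell
#!/bin/sh
err=$(mktemp) || exit 1
echo "fatal: empty command in input" >"$err"

# Basic regex: anchors, "." and "*" need no -E.
grep "^fatal:.*empty command in input" "$err" >/dev/null
basic_status=$?

# Extended regex is needed for alternation with "(" ... "|" ... ")".
grep -E "^fatal:.*(empty|unknown) command" "$err" >/dev/null
extended_status=$?

# Without -E, "(", "|" and ")" are literal characters, so no match.
grep "^fatal:.*(empty|unknown) command" "$err" >/dev/null
literal_status=$?

echo "basic=$basic_status extended=$extended_status literal=$literal_status"
rm -f "$err"
```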

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com

> +test_expect_success 'batch-command whitespace before command' '
> +	echo " info deadbeef" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*whitespace before command.*" err
> +'
> +
> +test_expect_success 'batch-command unknown command' '
> +	echo unknown_command >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*unknown command.*" err
> +'
> +
> +test_expect_success 'batch-command flush with arguments' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
> +	grep -E "^fatal:.*flush takes no arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush without --buffer' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep -E "^fatal:.*flush is only for --buffer mode.*" err
> +'
>   
>   test_done



* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10 10:57         ` Phillip Wood
@ 2022-02-10 17:05           ` Junio C Hamano
  2022-02-11 17:45             ` John Cai
  2022-02-10 18:55           ` John Cai
  1 sibling, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-10 17:05 UTC (permalink / raw)
  To: Phillip Wood
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme,
	Eric Sunshine, Jonathan Tan, John Cai

Phillip Wood <phillip.wood123@gmail.com> writes:

>> +	type=$1
>> +	sha1=$2
>> +	size=$3
>> +
>> +	mkfifo input &&
>> +	test_when_finished 'rm input' &&
>> +	mkfifo output &&
>> +	exec 9<>output &&
>> +	test_when_finished 'rm output; exec 9<&-'
>> +	(
>> +		# TODO - Ideally we'd pipe the output of cat-file
>> +		# through "sed s'/$/\\/'" to make sure that that read
>> +		# would consume all the available
>> +		# output. Unfortunately we cannot do this as we cannot
>> +		# control when sed flushes its output. We could write
>> +		# a test helper in C that appended a '\' to the end of
>> +		# each line and flushes its output after every line.
>> +		git cat-file --buffer --batch-command <input 2>err &
>> +		echo $! &&
>> +		wait $!
>> +	) >&9 &
>> +	sh_pid=$! &&
>> +	read cat_file_pid <&9 &&
>> +	test_when_finished "kill $cat_file_pid
>> +			    kill $sh_pid; wait $sh_pid; :" &&
>> +	echo "$sha1 $type $size" >expect &&
>> +	test_write_lines "info $sha1" flush "info $sha1" >input
>
> This closes input and so cat-file exits and flushes its output -
> therefore you are not testing whether flush actually flushes. When I 
> wrote this test in[1] this line was inside a subshell that was
> redirected to the input fifo so that the read happened before cat-file 
> exited.

Yeah, very good point.

> This test is also not testing the exit code of cat-file or
> that the output is flushed on exit. Is there a reason you can't just
> use the test as I wrote it? I'm happy to explain anything that isn't
> clear.

I admit I do not offhand recall what your tests did but help with
this (and more) level of detail with an offer to collaborate is
something I am very happy to see.  Thanks for working well together.

One thing I wasn't quite sure about was how well failure cases are
tested.  If we ask, in a batch mode, "info" for two objects and then
"flush", does the asker get enough of a clue about when to read and
when to stop reading in all four combinations of states, i.e. two
missing objects, one good object followed by one bad object, one bad
object followed by one good object, and two good objects?

Testing such combinations reliably is tricky.  If the asker needs to
react differently to different responses, a test that expects good
and then bad may not just fail but can deadlock.  Suppose the
reaction to a good response is to read a lot, while the reaction to
a bad response is to just consume the "bad object" notice.  If a bug
in the program being tested makes it issue the response for the bad
case when the asker is expecting a response for a good object, the
asker will keep waiting for more output, which may never come.
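
One way to picture the reader's side of this: every reply starts with a
header line, and only a non-missing "contents" reply is followed by <size>
bytes of payload and a newline, so the header tells the asker how much
more to read. The sketch below parses canned replies from a pipe instead
of talking to git; good_reply, missing_reply, parse_replies and the object
names are all hypothetical.

```shell
#!/bin/sh
# Canned replies in the layout "<name> <type> <size>\n<payload>\n"
# and "<name> missing\n" (hypothetical helpers, no git involved).
good_reply () {
	printf '%s blob 11\nHello World\n' "$1"
}
missing_reply () {
	printf '%s missing\n' "$1"
}

# Read replies from stdin, deciding from each header how much to consume.
parse_replies () {
	while read -r name type size
	do
		if test "$type" = missing
		then
			echo "$name: missing"
			continue
		fi
		# Consume exactly <size> payload bytes, then the newline
		# that terminates the payload.
		dd bs=1 count="$size" >/dev/null 2>&1
		read -r rest
		echo "$name: $type, $size bytes"
	done
}

echo "bad, bad:"
{ missing_reply x; missing_reply y; } | parse_replies
echo "good, bad:"
{ good_reply x; missing_reply y; } | parse_replies
echo "bad, good:"
{ missing_reply x; good_reply y; } | parse_replies
echo "good, good:"
{ good_reply x; good_reply y; } | parse_replies
```

Because the canned input ends, a wrong header here produces wrong output
rather than a hang; against a live process the same mismatch would leave
the reader blocked waiting for payload bytes, which is the deadlock risk
described above.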


* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10 10:57         ` Phillip Wood
  2022-02-10 17:05           ` Junio C Hamano
@ 2022-02-10 18:55           ` John Cai
  1 sibling, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-10 18:55 UTC (permalink / raw)
  To: phillip.wood
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme,
	gitster, Eric Sunshine, Jonathan Tan

Hi Phillip,

Thanks again for helping with this! A few comments/questions below:

On 10 Feb 2022, at 5:57, Phillip Wood wrote:

> Hi John
>
> I've concentrated on the tests as others have commented on the implementation
>
> On 10/02/2022 04:01, John Cai via GitGitGadget wrote:
>> From: John Cai <johncai86@gmail.com>
>> [...]
>> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
>> index 145eee11df9..a20c8dae85d 100755
>> --- a/t/t1006-cat-file.sh
>> +++ b/t/t1006-cat-file.sh
>> @@ -177,6 +177,24 @@ $content"
>>   	test_cmp expect actual
>>       '
>>  +    for opt in --buffer --no-buffer
>> +    do
>> +	test -z "$content" ||
>> +		test_expect_success "--batch-command $opt output of $type content is correct" '
>> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
>> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
>> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
>> +		test_cmp expect actual
>> +	'
>> +
>> +	test_expect_success "--batch-command $opt output of $type info is correct" '
>> +		echo "$sha1 $type $size" >expect &&
>> +		test_write_lines "info $sha1" \
>> +		| git cat-file --batch-command $opt >actual &&
>> +		test_cmp expect actual
>> +	'
>> +    done
>> +
>>       test_expect_success "custom --batch-check format" '
>>   	echo "$type $sha1" >expect &&
>>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
>> @@ -213,6 +231,70 @@ $content"
>>       '
>>   }
>>  +run_buffer_test_flush () {
>> +	type=$1
>> +	sha1=$2
>> +	size=$3
>> +
>> +	mkfifo input &&
>> +	test_when_finished 'rm input' &&
>> +	mkfifo output &&
>> +	exec 9<>output &&
>> +	test_when_finished 'rm output; exec 9<&-'
>> +	(
>> +		# TODO - Ideally we'd pipe the output of cat-file
>> +		# through "sed s'/$/\\/'" to make sure that that read
>> +		# would consume all the available
>> +		# output. Unfortunately we cannot do this as we cannot
>> +		# control when sed flushes its output. We could write
>> +		# a test helper in C that appended a '\' to the end of
>> +		# each line and flushes its output after every line.
>> +		git cat-file --buffer --batch-command <input 2>err &
>> +		echo $! &&
>> +		wait $!
>> +	) >&9 &
>> +	sh_pid=$! &&
>> +	read cat_file_pid <&9 &&
>> +	test_when_finished "kill $cat_file_pid
>> +			    kill $sh_pid; wait $sh_pid; :" &&
>> +	echo "$sha1 $type $size" >expect &&
>> +	test_write_lines "info $sha1" flush "info $sha1" >input
>
> This closes input and so cat-file exits and flushes its output - therefore you are not testing whether flush actually flushes. When I wrote this test in[1] this line was inside a subshell that was redirected to the input fifo so that the read happened before cat-file exited. This test is also not testing the exit code of cat-file or that the output is flushed on exit. Is there a reason you can't just use the test as I wrote it? I'm happy to explain anything that isn't clear.

I've restored the tests in the form you suggested. I had removed some lines to simplify the test but as it turns out I removed some of the important aspects of the test.

Here are my modifications to the tests you helped me with. Let me know if these changes make sense, or if I'm missing something.

> @@ -3,6 +3,7 @@ run_buffer_test_flush () {
>         sha1=$2
>         size=$3
>
> +       rm -f input output &&

On my end some tests were hanging because these files were not getting removed
by test_when_finished.

>         mkfifo input &&
>         test_when_finished 'rm input'
>         mkfifo output &&
> @@ -26,7 +27,7 @@ run_buffer_test_flush () {
>         test_when_finished "kill $cat_file_pid
>                             kill $sh_pid; wait $sh_pid; :" &&
>         (
> -               test_write_lines "info $sha1" fflush "info $sha1" &&
> +               test_write_lines "info $sha1" flush "info $sha1" &&
>                 # TODO - consume all available input, not just one
>                 # line (see above).
>                 read actual <&9 &&
> @@ -48,13 +49,14 @@ run_buffer_test_no_flush () {
>         sha1=$2
>         size=$3
>
> +       touch output &&

It looks like test_must_be_empty expects a file, and if output is never written
to it doesn't open the file.

>         mkfifo input &&
>         test_when_finished 'rm input'
>         mkfifo pid &&
>         exec 9<>pid &&
>         test_when_finished 'rm pid; exec 9<&-'
>         (
> -               git cat-file --buffer --batch-command <input >output &
> +               git cat-file --buffer --batch-command <input >>output &
>                 echo $! &&
>                 wait $!
>                 echo $?
> @@ -67,7 +69,7 @@ run_buffer_test_no_flush () {
>                 test_write_lines "info $sha1" "info $sha1" &&
>                 kill $cat_file_pid &&
>                 read status <&9 &&
> -               test "$status" -ne 0 &&
> -               test_must_be_empty output
> -       ) >input
> +               test "$status" -ne 0
> +       ) >input &&
> +       test_must_be_empty output

I wanted to ask about this, because the test hung here. I surmised that it was
because we are checking the output before writing to the input.

>  }

>
>> +	# TODO - consume all available input, not just one
>> +	# line (see above).
>> +	read actual <&9 &&
>> +	echo "$actual" >actual &&
>> +	test_cmp expect actual &&
>> +	test_must_be_empty err
>> +}
>> +
>> +run_buffer_test_no_flush () {
>> +	type=$1
>> +	sha1=$2
>> +	size=$3
>> +
>> +	touch output &&
>> +	test_when_finished 'rm output' &&
>> +	mkfifo input &&
>> +	test_when_finished 'rm input' &&
>> +	mkfifo pid &&
>> +	exec 9<>pid &&
>> +	test_when_finished 'rm pid; exec 9<&-'
>> +	(
>> +		git cat-file --buffer --batch-command <input >output &
>> +		echo $! &&
>> +		wait $!
>> +		echo $?
>> +	) >&9 &
>> +	sh_pid=$! &&
>> +	read cat_file_pid <&9 &&
>> +	test_when_finished "kill $cat_file_pid
>> +			    kill $sh_pid; wait $sh_pid; :" &&
>> +	test_write_lines "info $sha1" "info $sha1" &&
>
> This prints to stdout rather than piping into cat-file so it would not produce any output even if it exited normally. In my original[1] this line is inside a subshell that is redirected to the input fifo.
>
>> +	kill $cat_file_pid &&
>> +	read status <&9 &&
>> +	test_must_be_empty output
>> +}
>> +
>>   hello_content="Hello World"
>>   hello_size=$(strlen "$hello_content")
>>   hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
>> @@ -224,6 +306,14 @@ test_expect_success "setup" '
>>    run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>>  +test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
>> +       run_buffer_test_flush blob $hello_sha1 $hello_size
>> +'
>> +
>> +test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
>> +       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
>> +'
>
> If we need to run the flush tests for each object type then could they go inside run_tests? Personally I think I'd be happy just to test the flush command on one object type.

yeah, that makes sense

>
>>   test_expect_success '--batch-check without %(rest) considers whole line' '
>>   	echo "$hello_sha1 blob $hello_size" >expect &&
>>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
>> @@ -238,6 +328,14 @@ tree_pretty_content="100644 blob $hello_sha1	hello"
>>    run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
>>  +test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
>> +       run_buffer_test_flush tree $tree_sha1 $tree_size
>> +'
>> +
>> +test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
>> +       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
>> +'
>> +
>>   commit_message="Initial commit"
>>   commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
>>   commit_size=$(($(test_oid hexsz) + 137))
>> @@ -249,6 +347,14 @@ $commit_message"
>>    run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
>>  +test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
>> +       run_buffer_test_flush commit $commit_sha1 $commit_size
>> +'
>> +
>> +test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
>> +       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
>> +'
>> +
>>   tag_header_without_timestamp="object $hello_sha1
>>   type blob
>>   tag hellotag
>> @@ -263,11 +369,19 @@ tag_size=$(strlen "$tag_content")
>>    run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
>>  +test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
>> +       run_buffer_test_flush tag $tag_sha1 $tag_size
>> +'
>> +
>> +test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
>> +       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
>> +'
>> +
>>   test_expect_success \
>>       "Reach a blob from a tag pointing to it" \
>>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>>  -for batch in batch batch-check
>> +for batch in batch batch-check batch-command
>>   do
>>       for opt in t s e p
>>       do
>> @@ -373,6 +487,72 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>>   '
>>  +batch_command_info_input="info $hello_sha1
>> +info $tree_sha1
>> +info $commit_sha1
>> +info $tag_sha1
>> +info deadbeef
>
> I know there are existing uses of the constant in the file but I'm not thrilled about adding more.
>
>> +flush
>
> This flush is redundant, isn't it?

true, we don't actually need it

>
>> +"
>> +
>> +batch_command_info_output="$hello_sha1 blob $hello_size
>> +$tree_sha1 tree $tree_size
>> +$commit_sha1 commit $commit_size
>> +$tag_sha1 tag $tag_size
>> +deadbeef missing"
>> +
>> +test_expect_success "--batch-command with multiple info calls gives correct format" '
>
> double quotes are generally reserved for test titles that use parameter substitution which this one does not.
>
>> +	test "$batch_command_info_output" = "$(echo_without_newline \
>> +	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
>> +'
>
> This test and the one below are quite hard to follow. These days we try to avoid using test to compare strings because when it fails it does not provide any clues as to what went wrong. Instead we use here documents and test_cmp so that when a test fails you can see what went wrong. Also, the setup happens inside the test:
>
> test_expect_success '--batch-command with multiple info calls gives correct format' '
>     batch_command_info_input="info $hello_sha1\
>     info $tree_sha1\
>     info $commit_sha1\
>     info $tag_sha1\
>     info deadbeef\
>     flush"
>
>     cat >expect <<-EOF &&
>     $hello_sha1 blob $hello_size
>     $tree_sha1 tree $tree_size
>     $commit_sha1 commit $commit_size
>     $tag_sha1 tag $tag_size
>     deadbeef missing
>     EOF
>
>     echo_without_newline "$batch_command_info_input" | git cat-file --batch-command --buffer >actual &&
>     test_cmp expect actual
> '

sounds good, will adjust
>
>> +batch_command_contents_input="contents $hello_sha1
>> +contents $commit_sha1
>> +contents $tag_sha1
>> +contents deadbeef
>> +flush
>> +"
>> +
>> +batch_command_output="$hello_sha1 blob $hello_size
>> +$hello_content
>> +$commit_sha1 commit $commit_size
>> +$commit_content
>> +$tag_sha1 tag $tag_size
>> +$tag_content
>> +deadbeef missing"
>> +
>> +test_expect_success "--batch-command with multiple contents calls gives correct format" '
>> +	test "$(maybe_remove_timestamp "$batch_command_output" 1)" = \
>> +	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
>> +'
>> +
>> +batch_command_mixed_input="info $hello_sha1
>> +contents $hello_sha1
>> +info $commit_sha1
>> +contents $commit_sha1
>> +info $tag_sha1
>> +contents $tag_sha1
>> +contents deadbeef
>> +flush
>> +"
>> +
>> +batch_command_mixed_output="$hello_sha1 blob $hello_size
>> +$hello_sha1 blob $hello_size
>> +$hello_content
>> +$commit_sha1 commit $commit_size
>> +$commit_sha1 commit $commit_size
>> +$commit_content
>> +$tag_sha1 tag $tag_size
>> +$tag_sha1 tag $tag_size
>> +$tag_content
>> +deadbeef missing"
>> +
>> +test_expect_success "--batch-command with mixed calls gives correct format" '
>> +	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
>> +	"$(maybe_remove_timestamp "$(echo_without_newline \
>> +	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
>> +'
>> +
>>   test_expect_success 'setup blobs which are likely to delta' '
>>   	test-tool genrandom foo 10240 >foo &&
>>   	{ cat foo && echo plus; } >foo-plus &&
>> @@ -963,5 +1143,34 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>>   	echo "$orig commit $orig_size" >expect &&
>>   	test_cmp expect actual
>>   '
>> +test_expect_success 'batch-command empty command' '
>> +	echo "" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep -E "^fatal:.*empty command in input.*" err
>> +'
>
> This test and the ones below look good but they don't need to pass -E to grep as they are not using an extended regex.
>
> Best Wishes
>
> Phillip
>
> [1] https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com
>
>> +test_expect_success 'batch-command whitespace before command' '
>> +	echo " info deadbeef" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep -E "^fatal:.*whitespace before command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command unknown command' '
>> +	echo unknown_command >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep -E "^fatal:.*unknown command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush with arguments' '
>> +	echo "flush arg" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
>> +	grep -E "^fatal:.*flush takes no arguments.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush without --buffer' '
>> +	echo "flush arg" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep -E "^fatal:.*flush is only for --buffer mode.*" err
>> +'
>>    test_done


* Re: [PATCH v4 0/3] Add cat-file --batch-command flag
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-10 20:30       ` Junio C Hamano
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
  4 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-10 20:30 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The feature proposal of adding a command interface to cat-file was first
> discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
> before moving forward with a new flag. An RFC was created [C] and the idea
> was discussed more thoroughly, and overall it seemed like it was headed in
> the right direction.
>
> This patch series consolidates the feedback from these different threads.
>
> This patch series has three parts:
>
>  1. preparation patch to rename a variable
>  2. adding an enum to keep track of batch modes
>  3. logic to handle --batch-command flag, adding contents, info, flush
>     commands
>
> Changes since v3 (thanks to Junio's feedback):
>
>  * added cascading logic in batch_options_callback()
>  * free memory for queued call input lines
>  * do not throw error when flushing an empty queue
>  * renamed cmds array to singular queued_cmd
>  * fixed flaky test that failed --stress

Unfortunately this round still seems to fail fairly reliably without
--stress.  FWIW, I usually run tests with "cd t && make prove" but
it seems to get stuck after passing 

    ok 69 - --batch without size (blob)
    ok 70 - --batch-command --buffer with flush for blob info

even when the test is run without prove.



* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-10 10:57         ` Phillip Wood
@ 2022-02-10 22:46         ` Eric Sunshine
  1 sibling, 0 replies; 97+ messages in thread
From: Eric Sunshine @ 2022-02-10 22:46 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, Jonathan Tan, John Cai

On Wed, Feb 9, 2022 at 11:01 PM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Add a new flag --batch-command that accepts commands and arguments
> from stdin, similar to git-update-ref --stdin.

Some comments not offered by other reviewers...

> This patch adds the basic structure for adding command which can be
> extended in the future to add more commands. It also adds the following
> two commands (on top of the flush command):
>
> contents <object> LF
> info <object> LF
>
> The contents command takes an <object> argument and prints out the object
> contents.
>
> The info command takes a <object> argument and prints out the object
> metadata.
>
> These can be used in the following way with --buffer:
>
> info <sha1> LF
> contents <sha1> LF
> contents <sha1> LF
> info <sha1> LF
> flush
> info <sha1> LF
> flush

s/<sha1>/<object>/ for consistency with the usage information earlier
in the commit message, and since Git is migrating to SHA-256, and to
avoid reviewer confusion as occurred earlier[1].

Also: s/flush$/flush LF/

> When used without --buffer:
>
> info <sha1> LF
> contents <sha1> LF
> contents <sha1> LF
> info <sha1> LF
> info <sha1> LF

Ditto.

[1]: https://lore.kernel.org/git/CAPig+cTeqhOYTu9WBiY=LnZtt35hAp3Qa5RduC2yLut6p01_1w@mail.gmail.com/

> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> @@ -96,6 +96,30 @@ OPTIONS
> +--batch-command::
> +       Enter a command mode that reads commands and arguments from stdin.
> +       May not be combined with any other options or arguments except
> +       `--textconv` or `--filters`, in which case the input lines also need to
> +       specify the path, separated by whitespace.  See the section
> +       `BATCH OUTPUT` below for details.

The SYNOPSIS probably needs an update too.

Perhaps say something like "Recognized commands include:" here before
enumerating the commands themselves?

> +--
> +contents <object>::
> +       Print object contents for object reference <object>. This corresponds to
> +       the output of --batch.

s/<object>/`<object>`/
s/--batch/`--batch`/

> +info <object>::
> +       Print object info for object reference <object>. This corresponds to the
> +       output of --batch-check.

s/<object>/`<object>`/
s/--batch-check/`--batch-check`/

> +flush::
> +       Used in --buffer mode to execute all preceding commands that were issued
> +       since the beginning or since the last flush was issued. When --buffer
> +       is used, no output will come until flush is issued. When --buffer is not
> +       used, commands are flushed each time without issuing `flush`.
> +--

s/--buffer/`--buffer`/g
s/flush/`flush`/g

This says that it's legal to use `--buffer` along with
`--batch-command`, but the description of `--batch-command` itself
just above says that it can be combined only with `--textconv` or
`--filters`. (I see you copied the problematic text from the other
batch options, so they also are guilty of not mentioning `--buffer`.
This series doesn't necessarily need to fix those existing
documentation problems, but perhaps don't repeat the problem with
newly-added text?)

The description of the `--buffer` option probably also needs to be
updated to mention the new `--batch-command` option, and there may be
other places in this document which should mention it, as well.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> +static const struct parse_cmd {
> +       const char *prefix;
> +       parse_cmd_fn_t fn;
> +       unsigned takes_args;
> +} commands[] = {
> +       { "contents", parse_cmd_contents, 1},
> +       { "info", parse_cmd_info, 1},
> +};
> +
> +static void batch_objects_command(struct batch_options *opt,
> +                                   struct strbuf *output,
> +                                   struct expand_data *data)
> +{
> +       while (!strbuf_getline(&input, stdin)) {
> +               if (!input.len)
> +                       die(_("empty command in input"));
> +               if (isspace(*input.buf))
> +                       die(_("whitespace before command: '%s'"), input.buf);
> +
> +               if (skip_prefix(input.buf, "flush", &cmd_end)) {
> +                       if (!opt->buffer_output)
> +                               die(_("flush is only for --buffer mode"));
> +                       if (*cmd_end)
> +                               die(_("flush takes no arguments"));
> +
> +                       dispatch_calls(opt, output, data, queued_cmd, nr);
> +                       nr = 0;
> +                       continue;
> +               }
> +
> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +                       if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
> +                               continue;

This prefix-matching is going to incorrectly match non-commands such
as "contentsify <object>" and "information <object>" and then treat
them as "contents fy <object>" and "info mation <object>",
respectively, with undesirable results. You need to verify that there
is a space or NUL at `*cmd_end` before treating `input.buf` as an
actual command.
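
As a rough illustration of that boundary check (a sketch only, not code from the patch or from Git itself): `skip_prefix_sketch()` below is a hypothetical stand-in for Git's skip_prefix(), and `match_command()` is an invented helper name. The idea is simply to reject a prefix match unless the next character is a space or the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's skip_prefix(): if str starts with
 * prefix, store a pointer to the remainder in *out and return 1;
 * otherwise return 0 and leave *out untouched.
 */
int skip_prefix_sketch(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Return 1 only when line is exactly the command name, or the command
 * name followed by a space. On success *rest points just past the name
 * (at the space or at the NUL).
 */
int match_command(const char *line, const char *name, const char **rest)
{
	const char *end;

	assert(line && name && rest);
	if (!skip_prefix_sketch(line, name, &end))
		return 0;
	if (*end && *end != ' ')
		return 0;	/* e.g. "contentsify": a longer word, not our command */
	*rest = end;
	return 1;
}
```

With a check of this shape, "info deadbeef" and a bare "contents" both match, while "information deadbeef" and "contentsify x" fall through to the "unknown command" path.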

> +                       cmd = &commands[i];
> +                       if (cmd->takes_args)

What happens if `cmd->takes_args` is true but no arguments follow the
command? Should that be diagnosed as an error?

> +                               p = cmd_end + 1;

This unconditional +1 is going to make `p` point beyond the NUL
character if the input is just a bare command, such as "contents" or
"info" without any space or any argument...

> +                       break;
> +               }
> +
> +               if (!cmd)
> +                       die(_("unknown command: '%s'"), input.buf);
> +
> +               if (!opt->buffer_output) {
> +                       cmd->fn(opt, p, output, data);
> +                       continue;
> +               }
> +
> +               ALLOC_GROW(queued_cmd, nr + 1, alloc);
> +               call.fn = cmd->fn;
> +               call.line = xstrdup_or_null(p);

... which means that xstrdup_or_null() will be copying whatever random
garbage is in memory following the bare command.
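
One possible way to avoid stepping past the NUL, sketched under the assumption that the caller has already verified that `cmd_end` points at either a space or the terminating NUL. The helper name `command_args()` is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * cmd_end points at the character right after a matched command name.
 * Return a pointer to the argument string, or NULL when the command
 * was given bare ("info") or with only a trailing space ("info ").
 */
const char *command_args(const char *cmd_end)
{
	assert(cmd_end);
	if (*cmd_end != ' ')
		return NULL;	/* bare command: never step past the NUL */
	if (!cmd_end[1])
		return NULL;	/* a space, but nothing after it */
	return cmd_end + 1;
}
```

A caller could then treat a NULL result for a command with `takes_args` set as a "missing arguments" error rather than passing an out-of-bounds pointer on to xstrdup_or_null().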

> +               queued_cmd[nr++] = call;
> +       }
> +
> +       if (opt->buffer_output && nr)
> +               dispatch_calls(opt, output, data, queued_cmd, nr);
> +
> +       free(queued_cmd);
> +       strbuf_release(&input);
> +}


* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-10 17:05           ` Junio C Hamano
@ 2022-02-11 17:45             ` John Cai
  2022-02-11 20:07               ` Junio C Hamano
  0 siblings, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-11 17:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, John Cai via GitGitGadget, git, me, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan

Hi Junio

On 10 Feb 2022, at 12:05, Junio C Hamano wrote:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>>> +	type=$1
>>> +	sha1=$2
>>> +	size=$3
>>> +
>>> +	mkfifo input &&
>>> +	test_when_finished 'rm input' &&
>>> +	mkfifo output &&
>>> +	exec 9<>output &&
>>> +	test_when_finished 'rm output; exec 9<&-'
>>> +	(
>>> +		# TODO - Ideally we'd pipe the output of cat-file
>>> +		# through "sed s'/$/\\/'" to make sure that that read
>>> +		# would consume all the available
>>> +		# output. Unfortunately we cannot do this as we cannot
>>> +		# control when sed flushes its output. We could write
>>> +		# a test helper in C that appended a '\' to the end of
>>> +		# each line and flushes its output after every line.
>>> +		git cat-file --buffer --batch-command <input 2>err &
>>> +		echo $! &&
>>> +		wait $!
>>> +	) >&9 &
>>> +	sh_pid=$! &&
>>> +	read cat_file_pid <&9 &&
>>> +	test_when_finished "kill $cat_file_pid
>>> +			    kill $sh_pid; wait $sh_pid; :" &&
>>> +	echo "$sha1 $type $size" >expect &&
>>> +	test_write_lines "info $sha1" flush "info $sha1" >input
>>
>> This closes input and so cat-file exits and flushes its output -
>> therefore you are not testing whether flush actually flushes. When I
>> wrote this test in[1] this line was inside a subshell that was
>> redirected to the input fifo so that the read happened before cat-file
>> exited.
>
> Yeah, very good point.
>
>> This test is also not testing the exit code of cat-file or
>> that the output is flushed on exit. Is there a reason you can't just
>> use the test as I wrote it? I'm happy to explain anything that isn't
>> clear.
>
> I admit I do not offhand recall what your tests did but help with
> this (and more) level of detail with an offer to collaborate is
> something I am very happy to see.  Thanks for working well together.
>
> One thing that I wasn't quite sure was how well failure cases are
> tested.  If we ask, in a batch mode, "info" for two objects and then
> "flush", does the asker get enough clue when to read and when to
> stop reading with all four combinations of states, i.e. asking for
> two missing objects, one good object and one bad object, one bad
> object and one good object, two good objects, for example?

This is a good point. We currently don't have tests that exercise these
combinations.

>
> Testing such combinations reliably is tricky---if the asker needs to
> react to different response differently, a test that expects good
> and then bad may not just fail but can get into deadlock, for
> example if the reaction to good response has to read a lot but the
> reaction to bad response is to just consume the "bad object" notice,
> when a bug in the program being tested makes it issue the response
> for a bad case when the asker is expecting a response for a good
> object, because the asker will keep waiting for more response to
> read which may not come.

Let me see if I understand you. What I'm hearing is that it's hard to test a git
process (A) that reads from and writes to pipes without knowing exactly how (A) will
behave. By necessity, the test logic will embed some assumptions about the
behavior of (A), which might or might not hold.

This can lead to a hanging test if, say, it is waiting around for (A) to output
data when due to a bug in the code, it never does. Did I get that right?

I still see value in having a test that hangs when it doesn't receive expected
output from the git process. If we had something that detected timeouts in tests,
it could catch such a case. Since we don't, tests like
run_buffer_test_flush() and run_buffer_test_no_flush() run the risk of
deadlocking if the code regresses in the future.
While they still provide value by showing that something is wrong, such deadlocked
tests can be inconvenient to debug.
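
To make the timeout idea concrete, here is a minimal sketch of a bounded read in C using POSIX poll(). It is only an illustration of the technique: nothing like this exists in the test suite as far as I know, and the function name `read_with_timeout()` and its return-value convention are assumptions made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the number of bytes read (0 on EOF), -1 on error, or -2
 * when nothing arrived before the timeout expired.
 */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready;

	assert(buf && len > 0);
	ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		return -1;
	if (ready == 0)
		return -2;	/* report the timeout instead of hanging */
	return read(fd, buf, len);
}
```

A reader built this way would turn "the process never answered" into an ordinary failure with a distinct return value, rather than a test that blocks forever.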





* [PATCH v5 0/3] Add cat-file --batch-command flag
  2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
@ 2022-02-11 20:01       ` John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                           ` (3 more replies)
  4 siblings, 4 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-11 20:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has three parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v4:

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered in since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * changed rename of cmdmode to transform_mode
 * clarified verbiage in commit messages

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (3):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  41 +++++++-
 builtin/cat-file.c             | 167 +++++++++++++++++++++++++++++--
 t/t1006-cat-file.sh            | 178 ++++++++++++++++++++++++++++++++-
 3 files changed, 370 insertions(+), 16 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v5
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v4:

 1:  fa6294387ab = 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
 2:  81bc5ae1fc1 ! 2:  5e0d1161df4 cat-file: introduce batch_mode enum to replace print_contents
     @@ Commit message
          cat-file: introduce batch_mode enum to replace print_contents
      
          The next patch introduces a new --batch-command flag. Including --batch
     -    and --batch-check, we will have a total of three batch modes. Currently,
     -    from the batch_options struct's perspective, print_options is the only
     -    member used to distinguish between the different modes. This makes the
     -    code harder to read.
     +    and --batch-check, we will have a total of three batch modes. print_contents
     +    is the only boolean on the batch_options struct used to distinguish
     +    between the different modes. This makes the code harder to read.
      
          To reduce potential confusion, replace print_contents with an enum to
          help readability and clarity.
     @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       	bo->enabled = 1;
      -	bo->print_contents = !strcmp(opt->long_name, "batch");
      +
     -+	if (!strcmp(opt->long_name, "batch")) {
     ++	if (!strcmp(opt->long_name, "batch"))
      +		bo->batch_mode = BATCH_MODE_CONTENTS;
     -+	} else if (!strcmp(opt->long_name, "batch-check")) {
     ++	else if (!strcmp(opt->long_name, "batch-check"))
      +		bo->batch_mode = BATCH_MODE_INFO;
     -+	} else {
     ++	else
      +		BUG("%s given to batch-option-callback", opt->long_name);
     -+	}
      +
       	bo->format = arg;
       
 3:  6c51324a662 ! 3:  ad66d1f3e2b cat-file: add --batch-command mode
     @@ Commit message
      
          These can be used in the following way with --buffer:
      
     -    info <sha1> LF
     -    contents <sha1> LF
     -    contents <sha1> LF
     -    info <sha1> LF
     -    flush
     -    info <sha1> LF
     -    flush
     +    info <object> LF
     +    contents <object> LF
     +    contents <object> LF
     +    info <object> LF
     +    flush LF
     +    info <object> LF
     +    flush LF
      
          When used without --buffer:
      
     -    info <sha1> LF
     -    contents <sha1> LF
     -    contents <sha1> LF
     -    info <sha1> LF
     -    info <sha1> LF
     +    info <object> LF
     +    contents <object> LF
     +    contents <object> LF
     +    info <object> LF
     +    info <object> LF
      
          Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: John Cai <johncai86@gmail.com>
     @@ Documentation/git-cat-file.txt: OPTIONS
       	section `BATCH OUTPUT` below for details.
       
      +--batch-command::
     -+	Enter a command mode that reads commands and arguments from stdin.
     -+	May not be combined with any other options or arguments except
     -+	`--textconv` or `--filters`, in which case the input lines also need to
     -+	specify the path, separated by whitespace.  See the section
     -+	`BATCH OUTPUT` below for details.
     ++	Enter a command mode that reads commands and arguments from stdin. May
     ++	only be combined with `--buffer`, `--textconv` or `--filters`. In the
     ++	case of `--textconv` or `--filters`, the input lines also need to specify
     ++	the path, separated by whitespace. See the section `BATCH OUTPUT` below
     ++	for details.
     +++
     ++`--batch-command` recognizes the following commands:
      ++
      +--
     -+contents <object>::
     -+	Print object contents for object reference <object>. This corresponds to
     -+	the output of --batch.
     ++contents `<object>`::
     ++	Print object contents for object reference `<object>`. This corresponds to
     ++	the output of `--batch`.
      +
     -+info <object>::
     -+	Print object info for object reference <object>. This corresponds to the
     -+	output of --batch-check.
     ++info `<object>`::
     ++	Print object info for object reference `<object>`. This corresponds to the
     ++	output of `--batch-check`.
      +
      +flush::
     -+	Used in --buffer mode to execute all preceding commands that were issued
     -+	since the beginning or since the last flush was issued. When --buffer
     -+	is used, no output will come until flush is issued. When --buffer is not
     -+	used, commands are flushed each time without issuing `flush`.
     ++	Used with `--buffer` to execute all preceding commands that were issued
     ++	since the beginning or since the last flush was issued. When `--buffer`
     ++	is used, no output will come until a `flush` is issued. When `--buffer`
     ++	is not used, commands are flushed each time without issuing `flush`.
      +--
      ++
      +
       --batch-all-objects::
       	Instead of reading a list of objects on stdin, perform the
       	requested batch operation on all objects in the repository and
     +@@ Documentation/git-cat-file.txt: OPTIONS
     + 	that a process can interactively read and write from
     + 	`cat-file`. With this option, the output uses normal stdio
     + 	buffering; this is much more efficient when invoking
     +-	`--batch-check` on a large number of objects.
     ++	`--batch-check` or `--batch-command` on a large number of objects.
     + 
     + --unordered::
     + 	When `--batch-all-objects` is in use, visit objects in an
     +@@ Documentation/git-cat-file.txt: from stdin, one per line, and print information about them. By default,
     + the whole line is considered as an object, as if it were fed to
     + linkgit:git-rev-parse[1].
     + 
     ++When `--batch-command` is given, `cat-file` will read commands from stdin,
     ++one per line, and print information based on the command given. With
     ++`--batch-command`, the `info` command followed by an object will print
     ++information about the object the same way `--batch-check` would, and the
     ++`contents` command followed by an object prints contents in the same way
     ++`--batch` would.
     ++
     + You can specify the information shown for each object by using a custom
     + `<format>`. The `<format>` is copied literally to stdout for each
     + object, with placeholders of the form `%(atom)` expanded, followed by a
     +@@ Documentation/git-cat-file.txt: newline. The available atoms are:
     + If no format is specified, the default format is `%(objectname)
     + %(objecttype) %(objectsize)`.
     + 
     +-If `--batch` is specified, the object information is followed by the
     +-object contents (consisting of `%(objectsize)` bytes), followed by a
     +-newline.
     ++If `--batch` is specified, or if `--batch-command` is used with the `contents`
     ++command, the object information is followed by the object contents (consisting
     ++of `%(objectsize)` bytes), followed by a newline.
     + 
     + For example, `--batch` without a custom format would produce:
     + 
      
       ## builtin/cat-file.c ##
      @@
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +				continue;
      +
      +			cmd = &commands[i];
     -+			if (cmd->takes_args)
     ++			if (cmd->takes_args) {
     ++				if (*cmd_end != ' ')
     ++					die(_("%s requires arguments"),
     ++					    commands[i].prefix);
     ++
      +				p = cmd_end + 1;
     ++			} else if (*cmd_end) {
     ++				die(_("%s takes no arguments"),
     ++				    commands[i].prefix);
     ++			}
     ++
      +			break;
      +		}
      +
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
       	warn_on_object_refname_ambiguity = save_warning;
      @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       		bo->batch_mode = BATCH_MODE_CONTENTS;
     - 	} else if (!strcmp(opt->long_name, "batch-check")) {
     + 	else if (!strcmp(opt->long_name, "batch-check"))
       		bo->batch_mode = BATCH_MODE_INFO;
     -+	} else if (!strcmp(opt->long_name, "batch-command")) {
     ++	else if (!strcmp(opt->long_name, "batch-command"))
      +		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
     - 	} else {
     + 	else
       		BUG("%s given to batch-option-callback", opt->long_name);
     - 	}
     + 
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
       			N_("like --batch, but don't emit <contents>"),
       			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     @@ t/t1006-cat-file.sh: $content"
      +	sha1=$2
      +	size=$3
      +
     ++	rm -f input output &&
      +	mkfifo input &&
     -+	test_when_finished 'rm input' &&
     ++	test_when_finished 'rm input'
      +	mkfifo output &&
      +	exec 9<>output &&
      +	test_when_finished 'rm output; exec 9<&-'
     @@ t/t1006-cat-file.sh: $content"
      +		git cat-file --buffer --batch-command <input 2>err &
      +		echo $! &&
      +		wait $!
     ++		echo $?
      +	) >&9 &
      +	sh_pid=$! &&
      +	read cat_file_pid <&9 &&
      +	test_when_finished "kill $cat_file_pid
      +			    kill $sh_pid; wait $sh_pid; :" &&
     -+	echo "$sha1 $type $size" >expect &&
     -+	test_write_lines "info $sha1" flush "info $sha1" >input
     -+	# TODO - consume all available input, not just one
     -+	# line (see above).
     ++	(
     ++		test_write_lines "info $sha1" flush "info $sha1" &&
     ++		# TODO - consume all available input, not just one
     ++		# line (see above).
     ++		read actual <&9 &&
     ++		echo "$actual" >actual &&
     ++		echo "$sha1 $type $size" >expect &&
     ++		test_cmp expect actual
     ++	) >input &&
     ++	# check output is flushed on exit
      +	read actual <&9 &&
      +	echo "$actual" >actual &&
      +	test_cmp expect actual &&
     -+	test_must_be_empty err
     ++	test_must_be_empty err &&
     ++	read status <&9 &&
     ++	test "$status" -eq 0
      +}
      +
      +run_buffer_test_no_flush () {
     @@ t/t1006-cat-file.sh: $content"
      +	size=$3
      +
      +	touch output &&
     -+	test_when_finished 'rm output' &&
     ++	test_when_finished 'rm output'
      +	mkfifo input &&
     -+	test_when_finished 'rm input' &&
     ++	test_when_finished 'rm input'
      +	mkfifo pid &&
      +	exec 9<>pid &&
      +	test_when_finished 'rm pid; exec 9<&-'
      +	(
     -+		git cat-file --buffer --batch-command <input >output &
     ++		git cat-file --buffer --batch-command <input >>output &
      +		echo $! &&
      +		wait $!
      +		echo $?
     @@ t/t1006-cat-file.sh: $content"
      +	read cat_file_pid <&9 &&
      +	test_when_finished "kill $cat_file_pid
      +			    kill $sh_pid; wait $sh_pid; :" &&
     -+	test_write_lines "info $sha1" "info $sha1" &&
     -+	kill $cat_file_pid &&
     -+	read status <&9 &&
     -+	test_must_be_empty output
     ++	(
     ++		test_write_lines "info $sha1" "info $sha1" &&
     ++		kill $cat_file_pid &&
     ++		read status <&9 &&
     ++		test "$status" -ne 0 &&
     ++		test_must_be_empty output
     ++	) >input
      +}
     ++
      +
       hello_content="Hello World"
       hello_size=$(strlen "$hello_content")
     @@ t/t1006-cat-file.sh: test_expect_success "setup" '
       test_expect_success '--batch-check without %(rest) considers whole line' '
       	echo "$hello_sha1 blob $hello_size" >expect &&
       	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
     -@@ t/t1006-cat-file.sh: tree_pretty_content="100644 blob $hello_sha1	hello"
     - 
     - run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
     - 
     -+test_expect_success PIPE '--batch-command --buffer with flush for tree info' '
     -+       run_buffer_test_flush tree $tree_sha1 $tree_size
     -+'
     -+
     -+test_expect_success PIPE '--batch-command --buffer without flush for tree info' '
     -+       run_buffer_test_no_flush tree $tree_sha1 $tree_size false
     -+'
     -+
     - commit_message="Initial commit"
     - commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
     - commit_size=$(($(test_oid hexsz) + 137))
     -@@ t/t1006-cat-file.sh: $commit_message"
     - 
     - run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
     - 
     -+test_expect_success PIPE '--batch-command --buffer with flush for commit info' '
     -+       run_buffer_test_flush commit $commit_sha1 $commit_size
     -+'
     -+
     -+test_expect_success PIPE '--batch-command --buffer without flush for commit info' '
     -+       run_buffer_test_no_flush commit $commit_sha1 $commit_size false
     -+'
     -+
     - tag_header_without_timestamp="object $hello_sha1
     - type blob
     - tag hellotag
     -@@ t/t1006-cat-file.sh: tag_size=$(strlen "$tag_content")
     - 
     - run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
     - 
     -+test_expect_success PIPE '--batch-command --buffer with flush for tag info' '
     -+       run_buffer_test_flush tag $tag_sha1 $tag_size
     -+'
     -+
     -+test_expect_success PIPE '--batch-command --buffer without flush for tag info' '
     -+       run_buffer_test_no_flush tag $tag_sha1 $tag_size false
     -+'
     -+
     - test_expect_success \
     +@@ t/t1006-cat-file.sh: test_expect_success \
           "Reach a blob from a tag pointing to it" \
           "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
       
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
           "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
       '
       
     -+batch_command_info_input="info $hello_sha1
     -+info $tree_sha1
     -+info $commit_sha1
     -+info $tag_sha1
     -+info deadbeef
     -+flush
     -+"
     -+
     -+batch_command_info_output="$hello_sha1 blob $hello_size
     -+$tree_sha1 tree $tree_size
     -+$commit_sha1 commit $commit_size
     -+$tag_sha1 tag $tag_size
     -+deadbeef missing"
     -+
     -+test_expect_success "--batch-command with multiple info calls gives correct format" '
     -+	test "$batch_command_info_output" = "$(echo_without_newline \
     -+	"$batch_command_info_input" | git cat-file --batch-command --buffer)"
     -+'
     -+
     -+batch_command_contents_input="contents $hello_sha1
     -+contents $commit_sha1
     -+contents $tag_sha1
     -+contents deadbeef
     -+flush
     -+"
     -+
     -+batch_command_output="$hello_sha1 blob $hello_size
     -+$hello_content
     -+$commit_sha1 commit $commit_size
     -+$commit_content
     -+$tag_sha1 tag $tag_size
     -+$tag_content
     -+deadbeef missing"
     -+
     -+test_expect_success "--batch-command with multiple contents calls gives correct format" '
     -+	test "$(maybe_remove_timestamp "$batch_command_output" 1)" = \
     -+	"$(maybe_remove_timestamp "$(echo_without_newline "$batch_command_contents_input" | git cat-file --batch-command)" 1)"
     ++test_expect_success '--batch-command with multiple info calls gives correct format' '
     ++	cat >expect <<-EOF &&
     ++	$hello_sha1 blob $hello_size
     ++	$tree_sha1 tree $tree_size
     ++	$commit_sha1 commit $commit_size
     ++	$tag_sha1 tag $tag_size
     ++	deadbeef missing
     ++	EOF
     ++
     ++	test_write_lines "info $hello_sha1"\
     ++	"info $tree_sha1"\
     ++	"info $commit_sha1"\
     ++	"info $tag_sha1"\
     ++	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
     ++	test_cmp expect actual
      +'
      +
     -+batch_command_mixed_input="info $hello_sha1
     -+contents $hello_sha1
     -+info $commit_sha1
     -+contents $commit_sha1
     -+info $tag_sha1
     -+contents $tag_sha1
     -+contents deadbeef
     -+flush
     -+"
     -+
     -+batch_command_mixed_output="$hello_sha1 blob $hello_size
     -+$hello_sha1 blob $hello_size
     -+$hello_content
     -+$commit_sha1 commit $commit_size
     -+$commit_sha1 commit $commit_size
     -+$commit_content
     -+$tag_sha1 tag $tag_size
     -+$tag_sha1 tag $tag_size
     -+$tag_content
     -+deadbeef missing"
     -+
     -+test_expect_success "--batch-command with mixed calls gives correct format" '
     -+	test "$(maybe_remove_timestamp "$batch_command_mixed_output" 1)" = \
     -+	"$(maybe_remove_timestamp "$(echo_without_newline \
     -+	"$batch_command_mixed_input" | git cat-file --batch-command --buffer)" 1)"
     ++test_expect_success '--batch-command with multiple command calls gives correct format' '
     ++	cat >expect <<-EOF &&
     ++	$hello_sha1 blob $hello_size
     ++	$hello_content
     ++	$commit_sha1 commit $commit_size
     ++	$commit_content
     ++	$tag_sha1 tag $tag_size
     ++	$tag_content
     ++	deadbeef missing
     ++	EOF
     ++
     ++	maybe_remove_timestamp "$(cat expect)" 1 >expect &&
     ++	maybe_remove_timestamp "$(test_write_lines "contents $hello_sha1"\
     ++	"contents $commit_sha1"\
     ++	"contents $tag_sha1"\
     ++	"contents deadbeef"\
     ++	"flush" | git cat-file --batch-command --buffer)" 1 >actual &&
     ++	test_cmp expect actual
      +'
      +
       test_expect_success 'setup blobs which are likely to delta' '
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch-all-objects --batch-c
      +test_expect_success 'batch-command empty command' '
      +	echo "" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     -+	grep -E "^fatal:.*empty command in input.*" err
     ++	grep "^fatal:.*empty command in input.*" err
      +'
      +
      +test_expect_success 'batch-command whitespace before command' '
      +	echo " info deadbeef" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     -+	grep -E "^fatal:.*whitespace before command.*" err
     ++	grep "^fatal:.*whitespace before command.*" err
      +'
      +
      +test_expect_success 'batch-command unknown command' '
      +	echo unknown_command >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     -+	grep -E "^fatal:.*unknown command.*" err
     ++	grep "^fatal:.*unknown command.*" err
     ++'
     ++
     ++test_expect_success 'batch-command missing arguments' '
     ++	echo "info" >cmd &&
     ++	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     ++	grep "^fatal:.*info requires arguments.*" err
      +'
      +
      +test_expect_success 'batch-command flush with arguments' '
      +	echo "flush arg" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
     -+	grep -E "^fatal:.*flush takes no arguments.*" err
     ++	grep "^fatal:.*flush takes no arguments.*" err
      +'
      +
      +test_expect_success 'batch-command flush without --buffer' '
      +	echo "flush arg" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
     -+	grep -E "^fatal:.*flush is only for --buffer mode.*" err
     ++	grep "^fatal:.*flush is only for --buffer mode.*" err
      +'
       
       test_done

-- 
gitgitgadget


* [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
@ 2022-02-11 20:01         ` John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-11 20:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check, or the soon-to-be-added --batch-command, which will read
commands from stdin. The --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-11 20:01         ` John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  3 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-11 20:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

The next patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v5 3/3] cat-file: add --batch-command mode
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-11 20:01         ` John Cai via GitGitGadget
  2022-02-14 13:59           ` Phillip Wood
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  3 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-11 20:01 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long-running cat-file processes when
accessing object content: one for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around. Instead, a single --batch-command process could
switch between getting object info and getting object contents. Since
we have a pair of cat-file processes per repository, this means we can
get rid of roughly half of the long-lived git cat-file processes. Given
that many repositories are being accessed at any given time, this can
lead to substantial savings.

git cat-file --batch-command

will enter an interactive command mode in which the user can enter
commands and their arguments, one per line:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued, which executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only ways to force a flush are to
either

1. kill (B)'s process
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective as it requires
spawning a new cat-file process each time, and option 2 is hacky and not
a good long-term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for handling commands, which can be
extended in the future with more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF
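
As a rough illustration of the --buffer sequence above, a caller could
generate its input stream with a small shell helper. This is only a
sketch: batch_command_input is a hypothetical name and is not part of
this patch; the command names (info, contents, flush) are the ones
described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): for every object name
# given as an argument, emit an "info" and a "contents" command, then a
# final "flush" so that a --buffer'ed cat-file writes out everything
# that has been queued so far.
batch_command_input () {
	for obj in "$@"
	do
		printf 'info %s\n' "$obj"
		printf 'contents %s\n' "$obj"
	done
	printf 'flush\n'
}

# Example use, assuming the current directory is a repository and the
# installed git supports --batch-command:
#
#   batch_command_input HEAD "HEAD^{tree}" |
#   git cat-file --batch-command --buffer
```

Without --buffer, the trailing flush would have to be dropped, since
flush is rejected outside of --buffer mode.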

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  41 +++++++-
 builtin/cat-file.c             | 133 ++++++++++++++++++++++++
 t/t1006-cat-file.sh            | 178 ++++++++++++++++++++++++++++++++-
 3 files changed, 347 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..e8da704477d 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,32 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents `<object>`::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info `<object>`::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +136,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +228,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +270,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..6d54a0eb38d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,127 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		cmd[i].fn(opt, cmd[i].line, output, data);
+		free(cmd[i].line);
+	}
+
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
+			if (!opt->buffer_output)
+				die(_("flush is only for --buffer mode"));
+			if (*cmd_end)
+				die(_("flush takes no arguments"));
+
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			nr = 0;
+			continue;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].prefix);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].prefix);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output && nr)
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +717,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +739,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +772,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -695,6 +824,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..a501dbcc39b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -177,6 +177,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -213,6 +231,84 @@ $content"
     '
 }
 
+run_buffer_test_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	rm -f input output &&
+	mkfifo input &&
+	test_when_finished 'rm input'
+	mkfifo output &&
+	exec 9<>output &&
+	test_when_finished 'rm output; exec 9<&-'
+	(
+		# TODO - Ideally we'd pipe the output of cat-file
+		# through "sed s'/$/\\/'" to make sure that that read
+		# would consume all the available
+		# output. Unfortunately we cannot do this as we cannot
+		# control when sed flushes its output. We could write
+		# a test helper in C that appended a '\' to the end of
+		# each line and flushes its output after every line.
+		git cat-file --buffer --batch-command <input 2>err &
+		echo $! &&
+		wait $!
+		echo $?
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	(
+		test_write_lines "info $sha1" flush "info $sha1" &&
+		# TODO - consume all available input, not just one
+		# line (see above).
+		read actual <&9 &&
+		echo "$actual" >actual &&
+		echo "$sha1 $type $size" >expect &&
+		test_cmp expect actual
+	) >input &&
+	# check output is flushed on exit
+	read actual <&9 &&
+	echo "$actual" >actual &&
+	test_cmp expect actual &&
+	test_must_be_empty err &&
+	read status <&9 &&
+	test "$status" -eq 0
+}
+
+run_buffer_test_no_flush () {
+	type=$1
+	sha1=$2
+	size=$3
+
+	touch output &&
+	test_when_finished 'rm output'
+	mkfifo input &&
+	test_when_finished 'rm input'
+	mkfifo pid &&
+	exec 9<>pid &&
+	test_when_finished 'rm pid; exec 9<&-'
+	(
+		git cat-file --buffer --batch-command <input >>output &
+		echo $! &&
+		wait $!
+		echo $?
+	) >&9 &
+	sh_pid=$! &&
+	read cat_file_pid <&9 &&
+	test_when_finished "kill $cat_file_pid
+			    kill $sh_pid; wait $sh_pid; :" &&
+	(
+		test_write_lines "info $sha1" "info $sha1" &&
+		kill $cat_file_pid &&
+		read status <&9 &&
+		test "$status" -ne 0 &&
+		test_must_be_empty output
+	) >input
+}
+
+
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
 hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
@@ -224,6 +320,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
+       run_buffer_test_flush blob $hello_sha1 $hello_size
+'
+
+test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
+       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -267,7 +371,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -373,6 +477,43 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	test_write_lines "info $hello_sha1"\
+	"info $tree_sha1"\
+	"info $commit_sha1"\
+	"info $tag_sha1"\
+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	maybe_remove_timestamp "$(cat expect)" 1 >expect &&
+	maybe_remove_timestamp "$(test_write_lines "contents $hello_sha1"\
+	"contents $commit_sha1"\
+	"contents $tag_sha1"\
+	"contents deadbeef"\
+	"flush" | git cat-file --batch-command --buffer)" 1 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -963,5 +1104,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-11 17:45             ` John Cai
@ 2022-02-11 20:07               ` Junio C Hamano
  2022-02-11 21:30                 ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-11 20:07 UTC (permalink / raw)
  To: John Cai
  Cc: Phillip Wood, John Cai via GitGitGadget, git, me, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan

John Cai <johncai86@gmail.com> writes:

> Let me see if I understand you. What I'm hearing is that it's hard to test a git
> process (A) that reads from and writes to pipes without knowing exactly how (A) will
> behave. By necessity, the test will have some logic embedded in it that
> assumes certain behavior from (A), which might or might not hold.
>
> This can lead to a hanging test if, say, it is waiting around for (A) to output
> data when, due to a bug in the code, it never does. Did I get that right?

Exactly.  And we've seen such tests that were designed to hang when
they detected bugs, which made us very unhappy, and we fixed them to
fail reliably instead of hanging.  Otherwise, such tests aren't very
useful in an unattended CI environment, where we do not want to wait
3 hours for a timeout and leave later steps in the same script
untested.

Thanks.


* Re: [PATCH v4 3/3] cat-file: add --batch-command mode
  2022-02-11 20:07               ` Junio C Hamano
@ 2022-02-11 21:30                 ` John Cai
  0 siblings, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-11 21:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, John Cai via GitGitGadget, git, me, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan

Hi Junio

On 11 Feb 2022, at 15:07, Junio C Hamano wrote:

> John Cai <johncai86@gmail.com> writes:
>
>> Let me see if I understand you. What I'm hearing is that it's hard to test a git
>> process (A) that reads from and writes to pipes without knowing exactly how (A) will
>> behave. By necessity, the test will have some logic embedded in it that
>> assumes certain behavior from (A), which might or might not hold.
>>
>> This can lead to a hanging test if, say, it is waiting around for (A) to output
>> data when, due to a bug in the code, it never does. Did I get that right?
>
> Exactly.  And we've seen such tests that were designed to hang when
> they detected bugs, which made us very unhappy, and we fixed them to
> fail reliably instead of hanging.  Otherwise, such tests aren't very
> useful in an unattended CI environment, where we do not want to wait
> 3 hours for a timeout and leave later steps in the same script
> untested.

That makes sense. Do you have an example of one of these tests? I'd like to see
how it was converted from a test that hung to a test that failed reliably. As
I'm thinking about converting run_buffer_test_flush() and run_buffer_test_no_flush()
into tests that fail rather than hang, I'm having a hard time avoiding the
pattern of A writes to B and waits for B to respond.

>
> Thanks.


* Re: [PATCH v5 3/3] cat-file: add --batch-command mode
  2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-14 13:59           ` Phillip Wood
  2022-02-14 16:19             ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Phillip Wood @ 2022-02-14 13:59 UTC (permalink / raw)
  To: John Cai via GitGitGadget, git
  Cc: me, avarab, e, bagasdotme, gitster, Eric Sunshine, Jonathan Tan,
	Christian Couder, John Cai

Hi John

I've concentrated on the tests again. I think the flush tests still need
some work, but the others are looking good.

> [...]
>   Documentation/git-cat-file.txt |  41 +++++++-
>   builtin/cat-file.c             | 133 ++++++++++++++++++++++++
>   t/t1006-cat-file.sh            | 178 ++++++++++++++++++++++++++++++++-
>   3 files changed, 347 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index bef76f4dd06..e8da704477d 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -96,6 +96,32 @@ OPTIONS
>   	need to specify the path, separated by whitespace.  See the
>   	section `BATCH OUTPUT` below for details.
>   
> +--batch-command::

+--batch-command=<format>::

as we also take an optional format string

> +	Enter a command mode that reads commands and arguments from stdin. May
> +	only be combined with `--buffer`, `--textconv` or `--filters`. In the
> +	case of `--textconv` or `--filters`, the input lines also need to specify
> +	the path, separated by whitespace. See the section `BATCH OUTPUT` below
> +	for details.
> ++
> +`--batch-command` recognizes the following commands:
> ++
> +--
> +contents `<object>`::
> +	Print object contents for object reference `<object>`. This corresponds to
> +	the output of `--batch`.
> +
> +info `<object>`::
> +	Print object info for object reference `<object>`. This corresponds to the
> +	output of `--batch-check`.
> +
> +flush::
> +	Used with `--buffer` to execute all preceding commands that were issued
> +	since the beginning or since the last flush was issued. When `--buffer`
> +	is used, no output will come until a `flush` is issued. When `--buffer`
> +	is not used, commands are flushed each time without issuing `flush`.
> +--
> ++
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 145eee11df9..a501dbcc39b 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -177,6 +177,24 @@ $content"
>   	test_cmp expect actual
>       '
>   
> +    for opt in --buffer --no-buffer
> +    do
> +	test -z "$content" ||
> +		test_expect_success "--batch-command $opt output of $type content is correct" '
> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
> +		test_cmp expect actual
> +	'
> +
> +	test_expect_success "--batch-command $opt output of $type info is correct" '
> +		echo "$sha1 $type $size" >expect &&
> +		test_write_lines "info $sha1" \
> +		| git cat-file --batch-command $opt >actual &&
> +		test_cmp expect actual
> +	'
> +    done
> +
>       test_expect_success "custom --batch-check format" '
>   	echo "$type $sha1" >expect &&
>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
> @@ -213,6 +231,84 @@ $content"
>       '
>   }
>   
> +run_buffer_test_flush () {
> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	rm -f input output &&

I think that this should not be needed with the addition of 
"test_when_finished 'rm input output'" in run_buffer_test_no_flush()

> +	mkfifo input &&
> +	test_when_finished 'rm input'
> +	mkfifo output &&
> +	exec 9<>output &&

To address the worries about this test hanging rather than failing if 
something goes wrong I wonder if we could do something like

	(
		sleep 10
		echo "error: timeout" >&2
		echo TIMEOUT >&9
	) &
	watchdog_pid=$! &&
	test_when_finished 'kill $watchdog_pid; wait $watchdog_pid'

That should unblock any reads from fd 9 if the test hangs
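
The watchdog idea above can be tried in isolation. The following is a minimal sketch, not part of the patch: the temporary directory, the 1-second timeout and the `TIMEOUT` sentinel are illustrative assumptions, and no process other than the watchdog writes to fd 9, which simulates a hung cat-file.

```shell
#!/bin/sh
# Sketch of a watchdog for a read that might otherwise block forever.
# The temporary directory, the 1-second timeout and the TIMEOUT
# sentinel are illustrative choices, not taken from the patch.
dir=$(mktemp -d) &&
mkfifo "$dir/output" &&
exec 9<>"$dir/output" || exit 1

# Watchdog: unless it is killed first, write a sentinel line into the
# fifo after the timeout so that a pending read returns.
(
	sleep 1
	echo TIMEOUT >&9
) &
watchdog_pid=$!

# Nothing else writes to fd 9 in this sketch (a simulated hang), so
# only the watchdog can satisfy this read.
read -r line <&9

# Clean up; the watchdog may already have exited.
kill "$watchdog_pid" 2>/dev/null || :
wait "$watchdog_pid" 2>/dev/null || :
exec 9<&-
rm -rf "$dir"

if test "$line" = TIMEOUT
then
	watchdog_result="timed out"
else
	watchdog_result="got: $line"
fi
echo "$watchdog_result"
```

A test built this way would treat the sentinel as a failure, so a regression shows up as a failed comparison rather than a stuck job.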

> +	test_when_finished 'rm output; exec 9<&-'
> +	(
> +		# TODO - Ideally we'd pipe the output of cat-file
> +		# through "sed s'/$/\\/'" to make sure that that read
> +		# would consume all the available
> +		# output. Unfortunately we cannot do this as we cannot
> +		# control when sed flushes its output. We could write
> +		# a test helper in C that appended a '\' to the end of
> +		# each line and flushes its output after every line.
> +		git cat-file --buffer --batch-command <input 2>err &
> +		echo $! &&
> +		wait $!
> +		echo $?
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	(
> +		test_write_lines "info $sha1" flush "info $sha1" &&
> +		# TODO - consume all available input, not just one
> +		# line (see above).
> +		read actual <&9 &&
> +		echo "$actual" >actual &&
> +		echo "$sha1 $type $size" >expect &&
> +		test_cmp expect actual
> +	) >input &&
> +	# check output is flushed on exit
> +	read actual <&9 &&
> +	echo "$actual" >actual &&
> +	test_cmp expect actual &&
> +	test_must_be_empty err &&
> +	read status <&9 &&
> +	test "$status" -eq 0
> +}
> +
> +run_buffer_test_no_flush () {

This test reliably hangs for me when running with --stress

> +	type=$1
> +	sha1=$2
> +	size=$3
> +
> +	touch output &&

If output is missing at the end, it means cat-file never ran, which is an
error that we do not want to hide. This is because the subshell creates
output after opening input and before it executes cat-file below. As
input is a fifo, the open will block until it is opened for writing by
another process, and nothing wrote to it in v4, so I think that is why you
saw an error there.
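
The ordering described above can be reproduced without git. This is a sketch under stated assumptions: `cat` stands in for cat-file, and the temporary directory and file names are made up for the illustration. Redirections are processed left to right, so the shell blocks opening the fifo and only creates `output` once a writer appears.

```shell
#!/bin/sh
# Sketch: "<fifo" blocks until a writer opens the fifo, so the
# ">>output" redirection that follows it is not processed (and the
# file is not created) until then. "cat" stands in for cat-file.
dir=$(mktemp -d) &&
mkfifo "$dir/input" || exit 1

( cat <"$dir/input" >>"$dir/output" ) &
reader_pid=$!

# Give the background shell time to reach the blocking open.
sleep 1
if test -e "$dir/output"
then
	before=exists
else
	before=missing
fi

# Opening the fifo for writing releases the reader.
echo hello >"$dir/input"
wait "$reader_pid"

if test -e "$dir/output"
then
	after=exists
else
	after=missing
fi
content=$(cat "$dir/output")
rm -rf "$dir"

echo "before writer: $before; after writer: $after; content: $content"
```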

> +	test_when_finished 'rm output'
> +	mkfifo input &&
> +	test_when_finished 'rm input'
> +	mkfifo pid &&
> +	exec 9<>pid &&
> +	test_when_finished 'rm pid; exec 9<&-'
> +	(
> +		git cat-file --buffer --batch-command <input >>output &
> +		echo $! &&
> +		wait $!
> +		echo $?
> +	) >&9 &
> +	sh_pid=$! &&
> +	read cat_file_pid <&9 &&
> +	test_when_finished "kill $cat_file_pid
> +			    kill $sh_pid; wait $sh_pid; :" &&
> +	(
> +		test_write_lines "info $sha1" "info $sha1" &&
> +		kill $cat_file_pid &&
> +		read status <&9 &&

This is where the test hangs. There seems to be a race (which I don't
understand) where we're able to read the pid of cat-file but it is not
killed by the kill above (the subshell above is blocked on "wait $!").
Adding "sleep 1" before the kill above makes everything work, but I'm not
very comfortable with it. I think we might be better off taking a different
approach and introducing an environment variable such as
GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT which stops cat-file flushing its
output on exit, and having a test along the lines of

	test_write_lines "info $sha1" |
	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 git cat-file --batch-command --buffer >output &&
	test_must_be_empty output
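
To show why such a variable would make the test deterministic, here is a sketch that uses a shell function as a stand-in for cat-file. Everything in it is hypothetical: `buffered_tool`, its queueing behavior and the `NO_FLUSH_ON_EXIT` variable are illustrative and do not describe how cat-file is implemented.

```shell
#!/bin/sh
# Hypothetical stand-in for "git cat-file --batch-command --buffer":
# queue the argument of each "info" line, emit the queue on "flush",
# and emit whatever is left at EOF unless NO_FLUSH_ON_EXIT is set.
buffered_tool () {
	queue=
	while read -r cmd arg
	do
		case "$cmd" in
		info)
			queue="$queue$arg
"
			;;
		flush)
			printf "%s" "$queue"
			queue=
			;;
		esac
	done
	test -n "${NO_FLUSH_ON_EXIT:-}" || printf "%s" "$queue"
}

unset NO_FLUSH_ON_EXIT

# No flush and no flush-on-exit: nothing may appear.
out_no_flush=$(printf "%s\n" "info a" | NO_FLUSH_ON_EXIT=1 buffered_tool)

# Explicit flush: only commands queued before it appear.
out_flushed=$(printf "%s\n" "info a" flush "info b" |
	NO_FLUSH_ON_EXIT=1 buffered_tool)

# Default behavior: the queue is emitted at EOF.
out_default=$(printf "%s\n" "info a" | buffered_tool)

echo "no flush: '$out_no_flush'; flushed: '$out_flushed'; default: '$out_default'"
```

Because the input is an ordinary pipe that reaches EOF, the process always terminates, and the check reduces to comparing captured output.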



> +		test "$status" -ne 0 &&
> +		test_must_be_empty output
> +	) >input
> +}
> +
> +
>   hello_content="Hello World"
>   hello_size=$(strlen "$hello_content")
>   hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
> @@ -224,6 +320,14 @@ test_expect_success "setup" '
>   
>   run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>   
> +test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
> +       run_buffer_test_flush blob $hello_sha1 $hello_size
> +'
> +
> +test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
> +       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
> +'
> +
>   test_expect_success '--batch-check without %(rest) considers whole line' '
>   	echo "$hello_sha1 blob $hello_size" >expect &&
>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
> @@ -267,7 +371,7 @@ test_expect_success \
>       "Reach a blob from a tag pointing to it" \
>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>   
> -for batch in batch batch-check
> +for batch in batch batch-check batch-command
>   do
>       for opt in t s e p
>       do
> @@ -373,6 +477,43 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>   '
>   
> +test_expect_success '--batch-command with multiple info calls gives correct format' '
> +	cat >expect <<-EOF &&
> +	$hello_sha1 blob $hello_size
> +	$tree_sha1 tree $tree_size
> +	$commit_sha1 commit $commit_size
> +	$tag_sha1 tag $tag_size
> +	deadbeef missing
> +	EOF
> +
> +	test_write_lines "info $hello_sha1"\
> +	"info $tree_sha1"\
> +	"info $commit_sha1"\
> +	"info $tag_sha1"\
> +	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--batch-command with multiple command calls gives correct format' '
> +	cat >expect <<-EOF &&
> +	$hello_sha1 blob $hello_size
> +	$hello_content
> +	$commit_sha1 commit $commit_size
> +	$commit_content
> +	$tag_sha1 tag $tag_size
> +	$tag_content
> +	deadbeef missing
> +	EOF
> +
> +	maybe_remove_timestamp "$(cat expect)" 1 >expect &&
> +	maybe_remove_timestamp "$(test_write_lines "contents $hello_sha1"\
> +	"contents $commit_sha1"\
> +	"contents $tag_sha1"\
> +	"contents deadbeef"\
> +	"flush" | git cat-file --batch-command --buffer)" 1 >actual &&
> +	test_cmp expect actual

It is a shame that maybe_remove_timestamp does not read from stdin; this
test would look much nicer if it did. Apart from that, these and the ones
below are looking good.

Best Wishes

Phillip

> +'
> +
>   test_expect_success 'setup blobs which are likely to delta' '
>   	test-tool genrandom foo 10240 >foo &&
>   	{ cat foo && echo plus; } >foo-plus &&
> @@ -963,5 +1104,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>   	echo "$orig commit $orig_size" >expect &&
>   	test_cmp expect actual
>   '
> +test_expect_success 'batch-command empty command' '
> +	echo "" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*empty command in input.*" err
> +'
> +
> +test_expect_success 'batch-command whitespace before command' '
> +	echo " info deadbeef" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*whitespace before command.*" err
> +'
> +
> +test_expect_success 'batch-command unknown command' '
> +	echo unknown_command >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*unknown command.*" err
> +'
> +
> +test_expect_success 'batch-command missing arguments' '
> +	echo "info" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*info requires arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush with arguments' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
> +	grep "^fatal:.*flush takes no arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush without --buffer' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*flush is only for --buffer mode.*" err
> +'
>   
>   test_done



* Re: [PATCH v5 3/3] cat-file: add --batch-command mode
  2022-02-14 13:59           ` Phillip Wood
@ 2022-02-14 16:19             ` John Cai
  0 siblings, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-14 16:19 UTC (permalink / raw)
  To: phillip.wood
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme,
	gitster, Eric Sunshine, Jonathan Tan, Christian Couder

Hi Phillip

On 14 Feb 2022, at 8:59, Phillip Wood wrote:

> Hi John
>
> I've concentrated on the tests again. I think the flush tests still need some work, but the others are looking good.
>
>> [...]
>>   Documentation/git-cat-file.txt |  41 +++++++-
>>   builtin/cat-file.c             | 133 ++++++++++++++++++++++++
>>   t/t1006-cat-file.sh            | 178 ++++++++++++++++++++++++++++++++-
>>   3 files changed, 347 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>> index bef76f4dd06..e8da704477d 100644
>> --- a/Documentation/git-cat-file.txt
>> +++ b/Documentation/git-cat-file.txt
>> @@ -96,6 +96,32 @@ OPTIONS
>>   	need to specify the path, separated by whitespace.  See the
>>   	section `BATCH OUTPUT` below for details.
>>  +--batch-command::
>
> +--batch-command=<format>::
>
> as we also take an optional format string

Good catch!

>
>> +	Enter a command mode that reads commands and arguments from stdin. May
>> +	only be combined with `--buffer`, `--textconv` or `--filters`. In the
>> +	case of `--textconv` or `--filters`, the input lines also need to specify
>> +	the path, separated by whitespace. See the section `BATCH OUTPUT` below
>> +	for details.
>> ++
>> +`--batch-command` recognizes the following commands:
>> ++
>> +--
>> +contents `<object>`::
>> +	Print object contents for object reference `<object>`. This corresponds to
>> +	the output of `--batch`.
>> +
>> +info `<object>`::
>> +	Print object info for object reference `<object>`. This corresponds to the
>> +	output of `--batch-check`.
>> +
>> +flush::
>> +	Used with `--buffer` to execute all preceding commands that were issued
>> +	since the beginning or since the last flush was issued. When `--buffer`
>> +	is used, no output will come until a `flush` is issued. When `--buffer`
>> +	is not used, commands are flushed each time without issuing `flush`.
>> +--
>> ++
>> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
>> index 145eee11df9..a501dbcc39b 100755
>> --- a/t/t1006-cat-file.sh
>> +++ b/t/t1006-cat-file.sh
>> @@ -177,6 +177,24 @@ $content"
>>   	test_cmp expect actual
>>       '
>>  +    for opt in --buffer --no-buffer
>> +    do
>> +	test -z "$content" ||
>> +		test_expect_success "--batch-command $opt output of $type content is correct" '
>> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
>> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
>> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
>> +		test_cmp expect actual
>> +	'
>> +
>> +	test_expect_success "--batch-command $opt output of $type info is correct" '
>> +		echo "$sha1 $type $size" >expect &&
>> +		test_write_lines "info $sha1" \
>> +		| git cat-file --batch-command $opt >actual &&
>> +		test_cmp expect actual
>> +	'
>> +    done
>> +
>>       test_expect_success "custom --batch-check format" '
>>   	echo "$type $sha1" >expect &&
>>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
>> @@ -213,6 +231,84 @@ $content"
>>       '
>>   }
>>  +run_buffer_test_flush () {
>> +	type=$1
>> +	sha1=$2
>> +	size=$3
>> +
>> +	rm -f input output &&
>
> I think that this should not be needed with the addition of "test_when_finished 'rm input output'" in run_buffer_test_no_flush()
>
>> +	mkfifo input &&
>> +	test_when_finished 'rm input'
>> +	mkfifo output &&
>> +	exec 9<>output &&
>
> To address the worries about this test hanging rather than failing if something goes wrong I wonder if we could do something like
>
>     (
>     	sleep 10
>     	echo "error: timeout" >&2
>     	echo TIMEOUT >&9
>     ) &
>     watchdog_pid=$! &&
>     test_when_finished 'kill $watchdog_pid; wait $watchdog_pid'
>
> That should unblock any reads from fd 9 if the test hangs
>
>> +	test_when_finished 'rm output; exec 9<&-'
>> +	(
>> +		# TODO - Ideally we'd pipe the output of cat-file
>> +		# through "sed s'/$/\\/'" to make sure that that read
>> +		# would consume all the available
>> +		# output. Unfortunately we cannot do this as we cannot
>> +		# control when sed flushes its output. We could write
>> +		# a test helper in C that appended a '\' to the end of
>> +		# each line and flushes its output after every line.
>> +		git cat-file --buffer --batch-command <input 2>err &
>> +		echo $! &&
>> +		wait $!
>> +		echo $?
>> +	) >&9 &
>> +	sh_pid=$! &&
>> +	read cat_file_pid <&9 &&
>> +	test_when_finished "kill $cat_file_pid
>> +			    kill $sh_pid; wait $sh_pid; :" &&
>> +	(
>> +		test_write_lines "info $sha1" flush "info $sha1" &&
>> +		# TODO - consume all available input, not just one
>> +		# line (see above).
>> +		read actual <&9 &&
>> +		echo "$actual" >actual &&
>> +		echo "$sha1 $type $size" >expect &&
>> +		test_cmp expect actual
>> +	) >input &&
>> +	# check output is flushed on exit
>> +	read actual <&9 &&
>> +	echo "$actual" >actual &&
>> +	test_cmp expect actual &&
>> +	test_must_be_empty err &&
>> +	read status <&9 &&
>> +	test "$status" -eq 0
>> +}
>> +
>> +run_buffer_test_no_flush () {
>
> This test reliably hangs for me when running with --stress
>
>> +	type=$1
>> +	sha1=$2
>> +	size=$3
>> +
>> +	touch output &&
>
> If output is missing at the end it means cat-file never ran which is an error which we do not want to hide. This is because the subshell creates output after opening input and before it executes cat-file below. As input is a fifo the open will block until it is opened for writing by another process and nothing wrote to it in V4 so I think that is why you saw an error there.
>
>> +	test_when_finished 'rm output'
>> +	mkfifo input &&
>> +	test_when_finished 'rm input'
>> +	mkfifo pid &&
>> +	exec 9<>pid &&
>> +	test_when_finished 'rm pid; exec 9<&-'
>> +	(
>> +		git cat-file --buffer --batch-command <input >>output &
>> +		echo $! &&
>> +		wait $!
>> +		echo $?
>> +	) >&9 &
>> +	sh_pid=$! &&
>> +	read cat_file_pid <&9 &&
>> +	test_when_finished "kill $cat_file_pid
>> +			    kill $sh_pid; wait $sh_pid; :" &&
>> +	(
>> +		test_write_lines "info $sha1" "info $sha1" &&
>> +		kill $cat_file_pid &&
>> +		read status <&9 &&
>
> This is where the test hangs. There seems to be a race (which I don't understand) where we're able to read the pid of cat-file but it is not killed by the kill above (the subshell above is blocked on "wait $!"). Adding "sleep 1" before the kill above makes everything work but I'm not very comfortable with it. I think we might be better taking a different approach and introducing an environment variable such as GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT which stops cat-file flushing its output on exit and having a test along the lines of
>
> test_write_lines "info $sha1" | GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 git cat-file --batch-command --buffer >output &&
> test_must_be_empty output

Thanks for this suggestion! This would allow us to test flushing in a much more
straightforward way without having to open fifos. It also addresses Junio's
concern in [1] about this test hanging in the future when there's a regression.

Above you suggested having a timeout with sleep, which I was considering as
well. However, I feel like using this env var is overall simpler and safer, so
maybe we can use it to test both the case where we get a flush and the case
where we do not.

1. https://lore.kernel.org/git/xmqqpmnt9ngx.fsf@gitster.g/
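
A toy shell model may make the intended behaviour concrete. It is only a
sketch, not git itself: toy_batch_command and NO_FLUSH_ON_EXIT are
hypothetical names (the latter standing in for
GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT), and the model echoes the queued
commands back instead of printing object info.

```shell
# Toy model only: mimics the queue/flush behaviour discussed above.
# NO_FLUSH_ON_EXIT is a stand-in for GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT.
toy_batch_command () {
	queue=
	while IFS= read -r line
	do
		case "$line" in
		flush)
			# Dispatch everything queued since the last flush.
			printf '%s' "$queue"
			queue=
			;;
		*)
			# Queue the command, one per line.
			queue="$queue$line
"
			;;
		esac
	done
	# At end of input, emit leftovers unless the test knob is set.
	if test -z "${NO_FLUSH_ON_EXIT:-}"
	then
		printf '%s' "$queue"
	fi
}

# "info a" is flushed explicitly; "info b" is dropped because the knob
# suppresses the flush at end of input.
printf 'info a\nflush\ninfo b\n' | NO_FLUSH_ON_EXIT=1 toy_batch_command
```

With the knob set, any output can only have come from an explicit flush,
so both the flush and the no-flush case can be checked with an ordinary
pipeline and no fifo.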

>
>
>
>> +		test "$status" -ne 0 &&
>> +		test_must_be_empty output
>> +	) >input
>> +}
>> +
>> +
>>   hello_content="Hello World"
>>   hello_size=$(strlen "$hello_content")
>>   hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
>> @@ -224,6 +320,14 @@ test_expect_success "setup" '
>>    run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>>  +test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
>> +       run_buffer_test_flush blob $hello_sha1 $hello_size
>> +'
>> +
>> +test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
>> +       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
>> +'
>> +
>>   test_expect_success '--batch-check without %(rest) considers whole line' '
>>   	echo "$hello_sha1 blob $hello_size" >expect &&
>>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
>> @@ -267,7 +371,7 @@ test_expect_success \
>>       "Reach a blob from a tag pointing to it" \
>>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>>  -for batch in batch batch-check
>> +for batch in batch batch-check batch-command
>>   do
>>       for opt in t s e p
>>       do
>> @@ -373,6 +477,43 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>>   '
>>  +test_expect_success '--batch-command with multiple info calls gives correct format' '
>> +	cat >expect <<-EOF &&
>> +	$hello_sha1 blob $hello_size
>> +	$tree_sha1 tree $tree_size
>> +	$commit_sha1 commit $commit_size
>> +	$tag_sha1 tag $tag_size
>> +	deadbeef missing
>> +	EOF
>> +
>> +	test_write_lines "info $hello_sha1"\
>> +	"info $tree_sha1"\
>> +	"info $commit_sha1"\
>> +	"info $tag_sha1"\
>> +	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success '--batch-command with multiple command calls gives correct format' '
>> +	cat >expect <<-EOF &&
>> +	$hello_sha1 blob $hello_size
>> +	$hello_content
>> +	$commit_sha1 commit $commit_size
>> +	$commit_content
>> +	$tag_sha1 tag $tag_size
>> +	$tag_content
>> +	deadbeef missing
>> +	EOF
>> +
>> +	maybe_remove_timestamp "$(cat expect)" 1 >expect &&
>> +	maybe_remove_timestamp "$(test_write_lines "contents $hello_sha1"\
>> +	"contents $commit_sha1"\
>> +	"contents $tag_sha1"\
>> +	"contents deadbeef"\
>> +	"flush" | git cat-file --batch-command --buffer)" 1 >actual &&
>> +	test_cmp expect actual
>
> It is a shame that maybe_remove_timestamp does not read from stdin; this test would look much nicer if it did. Apart from that, these and the ones below are looking good.

Good point. I'll see if I can adjust this in the next version.

>
> Best Wishes
>
> Phillip
>
>> +'
>> +
>>   test_expect_success 'setup blobs which are likely to delta' '
>>   	test-tool genrandom foo 10240 >foo &&
>>   	{ cat foo && echo plus; } >foo-plus &&
>> @@ -963,5 +1104,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>>   	echo "$orig commit $orig_size" >expect &&
>>   	test_cmp expect actual
>>   '
>> +test_expect_success 'batch-command empty command' '
>> +	echo "" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*empty command in input.*" err
>> +'
>> +
>> +test_expect_success 'batch-command whitespace before command' '
>> +	echo " info deadbeef" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*whitespace before command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command unknown command' '
>> +	echo unknown_command >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*unknown command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command missing arguments' '
>> +	echo "info" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*info requires arguments.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush with arguments' '
>> +	echo "flush arg" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
>> +	grep "^fatal:.*flush takes no arguments.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush without --buffer' '
>> +	echo "flush arg" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*flush is only for --buffer mode.*" err
>> +'
>>    test_done

* [PATCH v6 0/4] Add cat-file --batch-command flag
  2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-14 18:23         ` John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                             ` (4 more replies)
  3 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-14 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has four parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. add a remove_timestamp() helper that takes stdin and removes timestamps
 4. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v5

 * replaced the flush tests that used fifos with tests that use a GIT_TEST_
   env variable to control whether or not --batch-command flushes on exit.
 * added remove_timestamp helper in tests.
 * added documentation to show format can be used with --batch-command

Changes since v4

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered in since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * renamed cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * clarified verbiage in commit messages

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (4):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add remove_timestamp helper
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  42 +++++++-
 builtin/cat-file.c             | 169 ++++++++++++++++++++++++++++++---
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 122 ++++++++++++++++++++++--
 4 files changed, 315 insertions(+), 21 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v6
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v5:

 1:  fa6294387ab = 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
 2:  5e0d1161df4 ! 2:  1a038097bfc cat-file: introduce batch_mode enum to replace print_contents
     @@ Metadata
       ## Commit message ##
          cat-file: introduce batch_mode enum to replace print_contents
      
     -    The next patch introduces a new --batch-command flag. Including --batch
     +    A future patch introduces a new --batch-command flag. Including --batch
          and --batch-check, we will have a total of three batch modes. print_contents
          is the only boolean on the batch_options struct used to distinguish
          between the different modes. This makes the code harder to read.
 -:  ----------- > 3:  486ee847816 cat-file: add remove_timestamp helper
 3:  ad66d1f3e2b ! 4:  a6dd5d72fce cat-file: add --batch-command mode
     @@ Documentation/git-cat-file.txt: OPTIONS
       	section `BATCH OUTPUT` below for details.
       
      +--batch-command::
     ++--batch-command=<format>::
      +	Enter a command mode that reads commands and arguments from stdin. May
      +	only be combined with `--buffer`, `--textconv` or `--filters`. In the
      +	case of `--textconv` or `--filters`, the input lines also need to specify
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		queued_cmd[nr++] = call;
      +	}
      +
     -+	if (opt->buffer_output && nr)
     ++	if (opt->buffer_output &&
     ++	    nr &&
     ++	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
      +		dispatch_calls(opt, output, data, queued_cmd, nr);
      +
      +	free(queued_cmd);
     @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
       			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
       		/* Batch-specific options */
      
     + ## t/README ##
     +@@ t/README: a test and then fails then the whole test run will abort. This can help to make
     + sure the expected tests are executed and not silently skipped when their
     + dependency breaks or is simply not present in a new environment.
     + 
     ++GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
     ++--batch-command from flushing to output on exit.
     ++
     + Naming Tests
     + ------------
     + 
     +
       ## t/t1006-cat-file.sh ##
      @@ t/t1006-cat-file.sh: $content"
       	test_cmp expect actual
     @@ t/t1006-cat-file.sh: $content"
           test_expect_success "custom --batch-check format" '
       	echo "$type $sha1" >expect &&
       	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
     -@@ t/t1006-cat-file.sh: $content"
     -     '
     - }
     - 
     -+run_buffer_test_flush () {
     -+	type=$1
     -+	sha1=$2
     -+	size=$3
     -+
     -+	rm -f input output &&
     -+	mkfifo input &&
     -+	test_when_finished 'rm input'
     -+	mkfifo output &&
     -+	exec 9<>output &&
     -+	test_when_finished 'rm output; exec 9<&-'
     -+	(
     -+		# TODO - Ideally we'd pipe the output of cat-file
     -+		# through "sed s'/$/\\/'" to make sure that that read
     -+		# would consume all the available
     -+		# output. Unfortunately we cannot do this as we cannot
     -+		# control when sed flushes its output. We could write
     -+		# a test helper in C that appended a '\' to the end of
     -+		# each line and flushes its output after every line.
     -+		git cat-file --buffer --batch-command <input 2>err &
     -+		echo $! &&
     -+		wait $!
     -+		echo $?
     -+	) >&9 &
     -+	sh_pid=$! &&
     -+	read cat_file_pid <&9 &&
     -+	test_when_finished "kill $cat_file_pid
     -+			    kill $sh_pid; wait $sh_pid; :" &&
     -+	(
     -+		test_write_lines "info $sha1" flush "info $sha1" &&
     -+		# TODO - consume all available input, not just one
     -+		# line (see above).
     -+		read actual <&9 &&
     -+		echo "$actual" >actual &&
     -+		echo "$sha1 $type $size" >expect &&
     -+		test_cmp expect actual
     -+	) >input &&
     -+	# check output is flushed on exit
     -+	read actual <&9 &&
     -+	echo "$actual" >actual &&
     -+	test_cmp expect actual &&
     -+	test_must_be_empty err &&
     -+	read status <&9 &&
     -+	test "$status" -eq 0
     -+}
     -+
     -+run_buffer_test_no_flush () {
     -+	type=$1
     -+	sha1=$2
     -+	size=$3
     -+
     -+	touch output &&
     -+	test_when_finished 'rm output'
     -+	mkfifo input &&
     -+	test_when_finished 'rm input'
     -+	mkfifo pid &&
     -+	exec 9<>pid &&
     -+	test_when_finished 'rm pid; exec 9<&-'
     -+	(
     -+		git cat-file --buffer --batch-command <input >>output &
     -+		echo $! &&
     -+		wait $!
     -+		echo $?
     -+	) >&9 &
     -+	sh_pid=$! &&
     -+	read cat_file_pid <&9 &&
     -+	test_when_finished "kill $cat_file_pid
     -+			    kill $sh_pid; wait $sh_pid; :" &&
     -+	(
     -+		test_write_lines "info $sha1" "info $sha1" &&
     -+		kill $cat_file_pid &&
     -+		read status <&9 &&
     -+		test "$status" -ne 0 &&
     -+		test_must_be_empty output
     -+	) >input
     -+}
     -+
     -+
     - hello_content="Hello World"
     - hello_size=$(strlen "$hello_content")
     - hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
      @@ t/t1006-cat-file.sh: test_expect_success "setup" '
       
       run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
       
     -+test_expect_success PIPE '--batch-command --buffer with flush for blob info' '
     -+       run_buffer_test_flush blob $hello_sha1 $hello_size
     ++test_expect_success '--batch-command --buffer with flush for blob info' '
     ++	echo "$hello_sha1 blob $hello_size" >expect &&
     ++	test_write_lines "info $hello_sha1" "flush" | \
     ++	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
     ++	git cat-file --batch-command --buffer >actual &&
     ++	test_cmp expect actual
      +'
      +
     -+test_expect_success PIPE '--batch-command --buffer without flush for blob info' '
     -+       run_buffer_test_no_flush blob $hello_sha1 $hello_size false
     ++test_expect_success '--batch-command --buffer without flush for blob info' '
     ++	touch output &&
     ++	test_write_lines "info $hello_sha1" | \
     ++	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
     ++	git cat-file --batch-command --buffer >>output &&
     ++	test_must_be_empty output
      +'
      +
       test_expect_success '--batch-check without %(rest) considers whole line' '
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +'
      +
      +test_expect_success '--batch-command with multiple command calls gives correct format' '
     -+	cat >expect <<-EOF &&
     ++	remove_timestamp >expect <<-EOF &&
      +	$hello_sha1 blob $hello_size
      +	$hello_content
      +	$commit_sha1 commit $commit_size
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +	deadbeef missing
      +	EOF
      +
     -+	maybe_remove_timestamp "$(cat expect)" 1 >expect &&
     -+	maybe_remove_timestamp "$(test_write_lines "contents $hello_sha1"\
     ++	test_write_lines "contents $hello_sha1"\
      +	"contents $commit_sha1"\
      +	"contents $tag_sha1"\
      +	"contents deadbeef"\
     -+	"flush" | git cat-file --batch-command --buffer)" 1 >actual &&
     ++	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
      +	test_cmp expect actual
      +'
      +

-- 
gitgitgadget

* [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-14 18:23           ` John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-14 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be --batch-command that will read
commands from stdin. --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the result output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget


* [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-14 18:23           ` John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-14 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

A future patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget


* [PATCH v6 3/4] cat-file: add remove_timestamp helper
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-14 18:23           ` John Cai via GitGitGadget
  2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-14 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

maybe_remove_timestamp() takes arguments, but it would be useful to have
a function that reads from stdin and strips the timestamp. This would
allow tests to pipe data into a function to remove timestamps, without
always having to assign a variable. This is especially helpful
when the data spans multiple lines.

Keep maybe_remove_timestamp() the same, but add a remove_timestamp
helper that reads from stdin.

The tests in the next patch will make use of this.
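
As a rough illustration of the stdin-based usage described above, the
sketch below repeats the sed expression from this patch and pipes two
made-up lines through it. The sample input is hypothetical; only the
helper body comes from the patch.

```shell
# Same sed expression as the remove_timestamp helper added by this patch.
remove_timestamp () {
	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
}

# Hypothetical sample input: the first line ends in "<epoch> <tz>" and is
# trimmed; the second has no trailing timestamp and passes through unchanged.
printf '%s\n' \
	"author A U Thor <author@example.com> 1112911993 -0700" \
	"tree 0123abcd" |
remove_timestamp
```

Because the helper is a plain filter, multi-line output can be cleaned in
a pipeline without first capturing it in a variable.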

Signed-off-by: John Cai <johncai86@gmail.com>
---
 t/t1006-cat-file.sh | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..2d52851dadc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -105,13 +105,18 @@ strlen () {
 }
 
 maybe_remove_timestamp () {
-    if test -z "$2"; then
-        echo_without_newline "$1"
-    else
-	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
-    fi
+	if test -z "$2"; then
+		echo_without_newline "$1"
+	else
+		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
+	fi
 }
 
+remove_timestamp () {
+	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
+}
+
+
 run_tests () {
     type=$1
     sha1=$2
-- 
gitgitgadget


* [PATCH v6 4/4] cat-file: add --batch-command mode
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                             ` (2 preceding siblings ...)
  2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-14 18:23           ` John Cai via GitGitGadget
  2022-02-15 19:39             ` Eric Sunshine
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-14 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long-running cat-file processes when
accessing object content: one for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter in
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way to force a flush is to either

1. kill (B)'s process
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective as it requires
spawning a new cat-file process each time, and option 2 is hacky and not a
good long-term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF
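
The --buffer sequence above could be produced from a script along these
lines. This is a minimal sketch: queue_info_then_flush is a hypothetical
helper name, not something added by this patch, and it only builds the
command stream rather than invoking git.

```shell
# Hypothetical helper (not part of this patch): writes one "info <object>"
# line per argument, followed by a single "flush".
queue_info_then_flush () {
	for obj in "$@"
	do
		printf 'info %s\n' "$obj"
	done
	printf 'flush\n'
}

# Possible use against a Git build that includes --batch-command:
#   queue_info_then_flush HEAD HEAD^{tree} |
#   git cat-file --batch-command --buffer
```

Ending each batch with flush is what lets the caller know that all
output for the queued commands has been written.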

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  42 +++++++++-
 builtin/cat-file.c             | 135 +++++++++++++++++++++++++++++++++
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 107 +++++++++++++++++++++++++-
 4 files changed, 282 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..1553f65c656 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents `<object>`::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info `<object>`::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..2d433bb6180 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,129 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++){
+		cmd[i].fn(opt, cmd[i].line, output, data);
+		free(cmd[i].line);
+	}
+
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
+			if (!opt->buffer_output)
+				die(_("flush is only for --buffer mode"));
+			if (*cmd_end)
+				die(_("flush takes no arguments"));
+
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			nr = 0;
+			continue;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].prefix);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].prefix);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +719,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +741,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +774,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -695,6 +826,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/README b/t/README
index f48e0542cdc..bcd813b0c59 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
+--batch-command from flushing to output on exit.
+
 Naming Tests
 ------------
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..3575cc63cff 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -182,6 +182,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -229,6 +247,22 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	test_write_lines "info $hello_sha1" "flush" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command --buffer without flush for blob info' '
+	touch output &&
+	test_write_lines "info $hello_sha1" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -272,7 +306,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	test_write_lines "info $hello_sha1"\
+	"info $tree_sha1"\
+	"info $commit_sha1"\
+	"info $tag_sha1"\
+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	remove_timestamp >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	test_write_lines "contents $hello_sha1"\
+	"contents $commit_sha1"\
+	"contents $tag_sha1"\
+	"contents deadbeef"\
+	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v6 4/4] cat-file: add --batch-command mode
  2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-15 19:39             ` Eric Sunshine
  2022-02-15 22:58               ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Eric Sunshine @ 2022-02-15 19:39 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, Jonathan Tan, Christian Couder, John Cai

On Mon, Feb 14, 2022 at 1:23 PM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Add a new flag --batch-command that accepts commands and arguments
> from stdin, similar to git-update-ref --stdin.

Some relatively minor comments below. Not sure any of them are serious
enough to warrant a reroll...

> The contents command takes an <object> argument and prints out the object
> contents.
>
> The info command takes a <object> argument and prints out the object
> metadata.

s/a <object>/an <object>/

> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> @@ -96,6 +96,33 @@ OPTIONS
> +contents `<object>`::
> +       Print object contents for object reference `<object>`. This corresponds to
> +       the output of `--batch`.
> +
> +info `<object>`::
> +       Print object info for object reference `<object>`. This corresponds to the
> +       output of `--batch-check`.

Sorry if I wasn't clear in my earlier review, but when I suggested
s/<object>/`<object>`/, I was referring only to the body of each item,
not to the item itself (for which we do not -- I think -- ever use
`<...>`). So:

    contents <object>::
        Print object contents ... `<object>`. ...

As mentioned in my earlier review, I think the SYNOPSIS also needs an
update to mention --batch-command.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> @@ -513,6 +514,129 @@ static int batch_unordered_packed(const struct object_id *oid,
> +static void dispatch_calls(struct batch_options *opt,
> +               struct strbuf *output,
> +               struct expand_data *data,
> +               struct queued_cmd *cmd,
> +               int nr)
> +{
> +       int i;
> +
> +       for (i = 0; i < nr; i++){

Style: for (...) {

> +               cmd[i].fn(opt, cmd[i].line, output, data);
> +               free(cmd[i].line);
> +       }
> +
> +       fflush(stdout);
> +}

If I recall correctly, Junio suggested calling free() within this
loop, but this feels like an incorrect separation of concerns since it
doesn't also reset the caller's `nr` to 0. If (for some reason),
dispatch_calls() is invoked twice in a row without resetting `nr` to 0
in between the calls, then the dispatched commands will be called with
a pointer to freed memory.

One somewhat ugly way to fix this potential problem would be for this
function to clear `nr`:

    static void dispatch_calls(..., int *nr)
    {
        for (...) {
            cmd[i].fn(...);
            free(cmd[i].line);
        }
        *nr = 0;
        fflush(stdout);
    }

But, as this is a private helper, the code as presented in the patch
may be "good enough" even though it's a bit fragile.
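As a rough illustration of the `*nr = 0` variant described above, here is a minimal standalone sketch. It is not the code from the patch: `cmd_fn_t`, `count_call()`, `dup_line()` and `demo_double_dispatch()` are hypothetical stand-ins, and the callback signature is simplified so the example compiles without git's headers. The point is only that a second dispatch becomes a no-op once the helper resets the caller's count.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's callback type (simplified). */
typedef void (*cmd_fn_t)(const char *line, int *calls);

struct queued_cmd {
    cmd_fn_t fn;
    char *line;
};

/* Toy command: only counts how often it was invoked. */
static void count_call(const char *line, int *calls)
{
    (void)line;
    (*calls)++;
}

/* Portable replacement for strdup(); returns NULL on allocation failure. */
static char *dup_line(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Run each queued command, free its line, then reset the caller's count. */
static void dispatch_calls(struct queued_cmd *cmd, size_t *nr, int *calls)
{
    size_t i;

    for (i = 0; i < *nr; i++) {
        cmd[i].fn(cmd[i].line, calls);
        free(cmd[i].line);
        cmd[i].line = NULL;
    }
    *nr = 0;
    fflush(stdout);
}

/* Queue two commands and dispatch twice; returns the number of calls made. */
static int demo_double_dispatch(void)
{
    struct queued_cmd queue[2];
    size_t nr = 0;
    int calls = 0;

    queue[nr].fn = count_call;
    queue[nr++].line = dup_line("info HEAD");
    queue[nr].fn = count_call;
    queue[nr++].line = dup_line("contents HEAD");

    dispatch_calls(queue, &nr, &calls);
    dispatch_calls(queue, &nr, &calls); /* no-op: nr is already 0 */
    return calls;
}
```

With the by-value `int nr` from the patch, the second call would instead walk the same two entries again and hand freed pointers to the callbacks.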

> +static void batch_objects_command(struct batch_options *opt,
> +                                   struct strbuf *output,
> +                                   struct expand_data *data)
> +{
> +       while (!strbuf_getline(&input, stdin)) {
> +               if (!input.len)
> +                       die(_("empty command in input"));
> +               if (isspace(*input.buf))
> +                       die(_("whitespace before command: '%s'"), input.buf);
> +
> +               if (skip_prefix(input.buf, "flush", &cmd_end)) {
> +                       if (!opt->buffer_output)
> +                               die(_("flush is only for --buffer mode"));
> +                       if (*cmd_end)
> +                               die(_("flush takes no arguments"));

I didn't articulate it in my previous review since the thought was
only half-formed, but given "flushify", this will incorrectly complain
that "flush takes no arguments" instead of complaining "unknown
command: flushify" as is done below (or it will incorrectly complain
"flush is only for --buffer mode" if --buffer wasn't specified).

If I'm reading the code correctly, it seems as if these problems could
be avoided by treating `flush` as just another parse_cmd::commands[]
item so that it gets all the same parsing/checking as the other
commands rather than parsing it separately here.
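To make the "flushify" case concrete, here is a small standalone sketch of table-driven parsing with `flush` as an ordinary entry. It is an assumption-laden illustration, not the patch's code: the `parse_result` enum, `parse_line()` and `classify()` are hypothetical names, and the word-boundary check (the command name must be followed by a space or the end of the line) is an extra step that the loop quoted below does not perform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; the patch calls die() instead. */
enum parse_result {
    PARSE_OK,
    PARSE_UNKNOWN,
    PARSE_MISSING_ARGS,
    PARSE_UNEXPECTED_ARGS
};

struct parse_cmd {
    const char *name;
    unsigned takes_args;
};

/* `flush` is listed like any other command. */
static const struct parse_cmd commands[] = {
    { "contents", 1 },
    { "info", 1 },
    { "flush", 0 },
};

static enum parse_result parse_line(const char *line,
                                    const struct parse_cmd **cmd,
                                    const char **args)
{
    size_t i;

    *cmd = NULL;
    *args = NULL;
    for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
        size_t len = strlen(commands[i].name);
        const char *end;

        if (strncmp(line, commands[i].name, len))
            continue;
        end = line + len;
        /* Assumed extra check: "flushify" must not match "flush". */
        if (*end && *end != ' ')
            continue;

        *cmd = &commands[i];
        if (commands[i].takes_args) {
            if (*end != ' ')
                return PARSE_MISSING_ARGS;
            *args = end + 1;
        } else if (*end) {
            return PARSE_UNEXPECTED_ARGS;
        }
        return PARSE_OK;
    }
    return PARSE_UNKNOWN;
}

/* Convenience wrapper that discards the parsed command and arguments. */
static enum parse_result classify(const char *line)
{
    const struct parse_cmd *cmd;
    const char *args;

    return parse_line(line, &cmd, &args);
}
```

Under these assumptions `classify("flushify")` yields `PARSE_UNKNOWN`, while `classify("flush arg")` yields `PARSE_UNEXPECTED_ARGS`. The `--buffer` check for `flush` would then happen only after a line has been recognized as exactly that command.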

> +                       dispatch_calls(opt, output, data, queued_cmd, nr);
> +                       nr = 0;
> +                       continue;
> +               }
> +
> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +                       if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
> +                               continue;
> +
> +                       cmd = &commands[i];
> +                       if (cmd->takes_args) {
> +                               if (*cmd_end != ' ')
> +                                       die(_("%s requires arguments"),
> +                                           commands[i].prefix);
> +
> +                               p = cmd_end + 1;
> +                       } else if (*cmd_end) {
> +                               die(_("%s takes no arguments"),
> +                                   commands[i].prefix);
> +                       }

Good. Appears to be correctly handling the full matrix of
command-requires-arguments and the actual input having or not having
arguments.

> +                       break;
> +               }
> +
> +               if (!cmd)
> +                       die(_("unknown command: '%s'"), input.buf);

If you treat `flush` as just another parse_cmd::commands[], then right
here is where you would handle it (I think):

    if (!strcmp(cmd->prefix, "flush")) {
        dispatch_calls(opt, output, data, queued_cmd, nr);
        nr = 0;
        continue;
    }

> +               if (!opt->buffer_output) {
> +                       cmd->fn(opt, p, output, data);
> +                       continue;
> +               }
> +
> +               ALLOC_GROW(queued_cmd, nr + 1, alloc);
> +               call.fn = cmd->fn;
> +               call.line = xstrdup_or_null(p);
> +               queued_cmd[nr++] = call;
> +       }
> +
> +       if (opt->buffer_output &&
> +           nr &&
> +           !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
> +               dispatch_calls(opt, output, data, queued_cmd, nr);
> +
> +       free(queued_cmd);
> +       strbuf_release(&input);
> +}


* Re: [PATCH v6 4/4] cat-file: add --batch-command mode
  2022-02-15 19:39             ` Eric Sunshine
@ 2022-02-15 22:58               ` John Cai
  2022-02-15 23:20                 ` Eric Sunshine
  0 siblings, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-15 22:58 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: John Cai via GitGitGadget, Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, Jonathan Tan, Christian Couder

Hi Eric,

Thanks for taking another look!

On 15 Feb 2022, at 14:39, Eric Sunshine wrote:

> On Mon, Feb 14, 2022 at 1:23 PM John Cai via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Add a new flag --batch-command that accepts commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>
> Some relatively minor comments below. Not sure any of them are serious
> enough to warrant a reroll...
>
>> The contents command takes an <object> argument and prints out the object
>> contents.
>>
>> The info command takes a <object> argument and prints out the object
>> metadata.
>
> s/a <object>/an <object>/
>
>> Signed-off-by: John Cai <johncai86@gmail.com>
>> ---
>> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>> @@ -96,6 +96,33 @@ OPTIONS
>> +contents `<object>`::
>> +       Print object contents for object reference `<object>`. This corresponds to
>> +       the output of `--batch`.
>> +
>> +info `<object>`::
>> +       Print object info for object reference `<object>`. This corresponds to the
>> +       output of `--batch-check`.
>
> Sorry if I wasn't clear in my earlier review, but when I suggested
> s/<object>/`<object>`/, I was referring only to the body of each item,
> not to the item itself (for which we do not -- I think -- ever use
> `<...>`). So:
>
>     contents <object>::
>         Print object contents ... `<object>`. ...
>
> As mentioned in my earlier review, I think the SYNOPSIS also needs an
> update to mention --batch-command.

:face-palm: yes I forgot about that in the last version.

>
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> @@ -513,6 +514,129 @@ static int batch_unordered_packed(const struct object_id *oid,
>> +static void dispatch_calls(struct batch_options *opt,
>> +               struct strbuf *output,
>> +               struct expand_data *data,
>> +               struct queued_cmd *cmd,
>> +               int nr)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < nr; i++){
>
> Style: for (...) {
>
>> +               cmd[i].fn(opt, cmd[i].line, output, data);
>> +               free(cmd[i].line);
>> +       }
>> +
>> +       fflush(stdout);
>> +}
>
> If I recall correctly, Junio suggested calling free() within this
> loop, but this feels like an incorrect separation of concerns since it
> doesn't also reset the caller's `nr` to 0. If (for some reason),
> dispatch_calls() is invoked twice in a row without resetting `nr` to 0
> in between the calls, then the dispatched commands will be called with
> a pointer to freed memory.
>
> One somewhat ugly way to fix this potential problem would be for this
> function to clear `nr`:
>
>     static void dispatch_calls(..., int *nr)
>     {
>         for (...) {
>             cmd[i].fn(...);
>             free(cmd[i].line);
>         }
>         *nr = 0;
>         fflush(stdout);
>     }
>
> But, as this is a private helper, the code as presented in the patch
> may be "good enough" even though it's a bit fragile.

What you suggested makes sense from a separation of concerns point of view. I'm
still learning what looks ugly in C :), but I think this is easier to read
overall than what I had before.

>
>> +static void batch_objects_command(struct batch_options *opt,
>> +                                   struct strbuf *output,
>> +                                   struct expand_data *data)
>> +{
>> +       while (!strbuf_getline(&input, stdin)) {
>> +               if (!input.len)
>> +                       die(_("empty command in input"));
>> +               if (isspace(*input.buf))
>> +                       die(_("whitespace before command: '%s'"), input.buf);
>> +
>> +               if (skip_prefix(input.buf, "flush", &cmd_end)) {
>> +                       if (!opt->buffer_output)
>> +                               die(_("flush is only for --buffer mode"));
>> +                       if (*cmd_end)
>> +                               die(_("flush takes no arguments"));
>
> I didn't articulate it in my previous review since the thought was
> only half-formed, but given "flushify", this will incorrectly complain
> that "flush takes no arguments" instead of complaining "unknown
> command: flushify" as is done below (or it will incorrectly complain
> "flush is only for --buffer mode" if --buffer wasn't specified).
>
> If I'm reading the code correctly, it seems as if these problems could
> be avoided by treating `flush` as just another parse_cmd::commands[]
> item so that it gets all the same parsing/checking as the other
> commands rather than parsing it separately here.

This is a good idea. I like the reduced complexity.

>
>> +                       dispatch_calls(opt, output, data, queued_cmd, nr);
>> +                       nr = 0;
>> +                       continue;
>> +               }
>> +
>> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
>> +                       if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
>> +                               continue;
>> +
>> +                       cmd = &commands[i];
>> +                       if (cmd->takes_args) {
>> +                               if (*cmd_end != ' ')
>> +                                       die(_("%s requires arguments"),
>> +                                           commands[i].prefix);
>> +
>> +                               p = cmd_end + 1;
>> +                       } else if (*cmd_end) {
>> +                               die(_("%s takes no arguments"),
>> +                                   commands[i].prefix);
>> +                       }
>
> Good. Appears to be correctly handling the full matrix of
> command-requires-arguments and the actual input having or not having
> arguments.
>
>> +                       break;
>> +               }
>> +
>> +               if (!cmd)
>> +                       die(_("unknown command: '%s'"), input.buf);
>
> If you treat `flush` as just another parse_cmd::commands[], then right
> here is where you would handle it (I think):
>
>     if (!strcmp(cmd->prefix, "flush")) {
>         dispatch_calls(opt, output, data, queued_cmd, nr);
>         nr = 0;
>         continue;
>     }
>
>> +               if (!opt->buffer_output) {
>> +                       cmd->fn(opt, p, output, data);
>> +                       continue;
>> +               }
>> +
>> +               ALLOC_GROW(queued_cmd, nr + 1, alloc);
>> +               call.fn = cmd->fn;
>> +               call.line = xstrdup_or_null(p);
>> +               queued_cmd[nr++] = call;
>> +       }
>> +
>> +       if (opt->buffer_output &&
>> +           nr &&
>> +           !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
>> +               dispatch_calls(opt, output, data, queued_cmd, nr);
>> +
>> +       free(queued_cmd);
>> +       strbuf_release(&input);
>> +}


* Re: [PATCH v6 4/4] cat-file: add --batch-command mode
  2022-02-15 22:58               ` John Cai
@ 2022-02-15 23:20                 ` Eric Sunshine
  0 siblings, 0 replies; 97+ messages in thread
From: Eric Sunshine @ 2022-02-15 23:20 UTC (permalink / raw)
  To: John Cai
  Cc: John Cai via GitGitGadget, Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Junio C Hamano, Jonathan Tan, Christian Couder

On Tue, Feb 15, 2022 at 5:58 PM John Cai <johncai86@gmail.com> wrote:
> On 15 Feb 2022, at 14:39, Eric Sunshine wrote:
> > On Mon, Feb 14, 2022 at 1:23 PM John Cai via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> +static void dispatch_calls(struct batch_options *opt,
> >> +               int nr)
> >> +{
> >> +       for (i = 0; i < nr; i++){
> >> +               cmd[i].fn(opt, cmd[i].line, output, data);
> >> +               free(cmd[i].line);
> >> +       }
> >> +       fflush(stdout);
> >> +}
> >
> > If I recall correctly, Junio suggested calling free() within this
> > loop, but this feels like an incorrect separation of concerns since it
> > doesn't also reset the caller's `nr` to 0. If (for some reason),
> > dispatch_calls() is invoked twice in a row without resetting `nr` to 0
> > in between the calls, then the dispatched commands will be called with
> > a pointer to freed memory.
> >
> > One somewhat ugly way to fix this potential problem would be for this
> > function to clear `nr`:
> >
> >     static void dispatch_calls(..., int *nr)
> >     {
> >         for (...) {
> >             cmd[i].fn(...);
> >             free(cmd[i].line);
> >         }
> >         *nr = 0;
> >         fflush(stdout);
> >     }
> >
> > But, as this is a private helper, the code as presented in the patch
> > may be "good enough" even though it's a bit fragile.
>
> What you suggested makes sense from a separation of concerns point of view. I'm
> still learning what looks ugly in C :), but I think this is easier to read
> overall than what I had before.

Even with my suggestion, it's still rather ugly for a "dispatcher"
function to be freeing resources allocated by some other entity and
for it to be clearing the other entity's `nr` variable, but at least
it's less fragile. It would be less ugly, perhaps, if this function
was named dispatch_and_free().

A better way to make it less ugly would probably be to introduce a
structure which holds the array of queued_cmd and the `nr` and
`alloc` variables, and then have a dedicated function for
clearing/freeing that structure. However, while such an approach is
fine for reusable containers, it is probably way overkill for this
one-off case.
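For reference, the container approach mentioned above might look roughly like the following standalone sketch. Everything here is hypothetical (`struct cmd_queue`, `cmd_queue_push()`, `cmd_queue_clear()`, `cmd_queue_release()`, `demo_queue()`), the `fn` member is omitted for brevity, and plain `realloc()` stands in for git's `ALLOC_GROW()` so the example compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in: the real queued_cmd also carries a callback. */
struct queued_cmd {
    char *line;
};

/* Hypothetical container owning the array and its bookkeeping. */
struct cmd_queue {
    struct queued_cmd *items;
    size_t nr, alloc;
};

/* Append a copy of `line`; returns 0 on success, -1 on allocation failure. */
static int cmd_queue_push(struct cmd_queue *q, const char *line)
{
    size_t len = strlen(line) + 1;
    char *copy;

    if (q->nr == q->alloc) {
        size_t new_alloc = q->alloc ? q->alloc * 2 : 4;
        struct queued_cmd *grown =
            realloc(q->items, new_alloc * sizeof(*q->items));

        if (!grown)
            return -1;
        q->items = grown;
        q->alloc = new_alloc;
    }
    copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, line, len);
    q->items[q->nr++].line = copy;
    return 0;
}

/* Free the queued lines and reset the count; the array stays allocated. */
static void cmd_queue_clear(struct cmd_queue *q)
{
    size_t i;

    for (i = 0; i < q->nr; i++)
        free(q->items[i].line);
    q->nr = 0;
}

/* Free everything, including the array itself. */
static void cmd_queue_release(struct cmd_queue *q)
{
    cmd_queue_clear(q);
    free(q->items);
    q->items = NULL;
    q->alloc = 0;
}

/* Returns 1 if push, clear and release behave as described above. */
static int demo_queue(void)
{
    struct cmd_queue q = { NULL, 0, 0 };
    size_t i;
    int ok;

    for (i = 0; i < 5; i++) {
        if (cmd_queue_push(&q, "info HEAD")) {
            cmd_queue_release(&q);
            return 0;
        }
    }
    cmd_queue_clear(&q);
    ok = (q.nr == 0 && q.alloc >= 5);
    cmd_queue_release(&q);
    return ok && q.items == NULL && q.alloc == 0;
}
```

A dispatcher built on this would run the queued commands and then call `cmd_queue_clear()`, keeping ownership of the memory in one place.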

> > If I'm reading the code correctly, it seems as if these problems could
> > be avoided by treating `flush` as just another parse_cmd::commands[]
> > item so that it gets all the same parsing/checking as the other
> > commands rather than parsing it separately here.
>
> This is a good idea. I like the reduced complexity.
>
> > If you treat `flush` as just another parse_cmd::commands[], then right
> > here is where you would handle it (I think):
> >
> >     if (!strcmp(cmd->prefix, "flush")) {
> >         dispatch_calls(opt, output, data, queued_cmd, nr);
> >         nr = 0;
> >         continue;
> >     }

One other point which would make this suggested code even clearer is
to rename the parse_cmd::prefix member to `command` or `name` or
`token` or something other than "prefix" since it really _is_ the
command name, not a prefix of the command. (You're only treating it as
a prefix semantically at parse time.) Thus:

    if (!strcmp(cmd->name, "flush")) {
        dispatch_calls(opt, output, data, queued_cmd, nr);
        nr = 0;
        continue;
    }

is easier to understand at a glance than `!strcmp(cmd->prefix, "flush")`.


* [PATCH v7 0/4] Add cat-file --batch-command flag
  2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                             ` (3 preceding siblings ...)
  2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-16  0:53           ` John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                               ` (4 more replies)
  4 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16  0:53 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has four parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. add a remove_timestamp() helper that takes stdin and removes timestamps
 4. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v6 (thanks to Eric's feedback)

 * allow command parsing logic to handle the case of flush as well
 * fixed documentation by adding --batch-command to the synopsis and
   adjusting tick marks
 * set nr=0 within helper function

Changes since v5

 * replaced flush tests that used FIFOs with tests that use a GIT_TEST_ env
   variable to control whether or not --batch-command flushes on exit.
 * added remove_timestamp helper in tests.
 * added documentation to show format can be used with --batch-command

Changes since v4

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * changed rename of cmdmode to transform_mode
 * clarified verbiage in commit messages
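
For readers who want to see the queue-then-flush behavior described in the
changelog in isolation, here is a minimal standalone sketch. It is not the
code from builtin/cat-file.c: the names call_queue, queue_call, flush_calls,
release_queue, call_log and log_call are hypothetical, and plain
realloc/malloc stand in for git's allocation helpers. It only illustrates
that queued commands produce no output until a flush runs them in order, and
that the flush helper itself resets the count to zero.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: one queued command with its argument text. */
typedef void (*cmd_fn)(const char *arg, void *ctx);

struct queued_call {
	cmd_fn fn;
	char *arg;	/* owned copy of the argument text */
};

struct call_queue {
	struct queued_call *items;
	size_t nr, alloc;
};

/* Append a call to the queue. Returns 0 on success, -1 if out of memory. */
int queue_call(struct call_queue *q, cmd_fn fn, const char *arg)
{
	char *copy;

	assert(q && fn && arg);
	if (q->nr == q->alloc) {
		size_t new_alloc = q->alloc ? q->alloc * 2 : 8;
		struct queued_call *p;

		p = realloc(q->items, new_alloc * sizeof(*p));
		if (!p)
			return -1;
		q->items = p;
		q->alloc = new_alloc;
	}
	copy = malloc(strlen(arg) + 1);
	if (!copy)
		return -1;
	strcpy(copy, arg);
	q->items[q->nr].fn = fn;
	q->items[q->nr].arg = copy;
	q->nr++;
	return 0;
}

/*
 * Run every queued call in order, free its argument, and reset the
 * queue so the caller does not have to. Returns how many calls ran;
 * flushing an empty queue is not an error.
 */
size_t flush_calls(struct call_queue *q, void *ctx)
{
	size_t i, ran = q->nr;

	for (i = 0; i < q->nr; i++) {
		q->items[i].fn(q->items[i].arg, ctx);
		free(q->items[i].arg);
	}
	q->nr = 0;
	return ran;
}

/* Free anything still queued, then the array itself. */
void release_queue(struct call_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++)
		free(q->items[i].arg);
	free(q->items);
	q->items = NULL;
	q->nr = q->alloc = 0;
}

/* Example callback: append "<arg>;" to a fixed-size log buffer. */
struct call_log {
	char text[256];
};

void log_call(const char *arg, void *ctx)
{
	struct call_log *log = ctx;
	size_t used = strlen(log->text);

	snprintf(log->text + used, sizeof(log->text) - used, "%s;", arg);
}
```

Queueing two calls with log_call leaves the log empty until flush_calls()
runs, after which the log holds both arguments in the order they were queued
and a second flush is a no-op.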

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (4):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add remove_timestamp helper
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  42 +++++++-
 builtin/cat-file.c             | 170 ++++++++++++++++++++++++++++++---
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 122 +++++++++++++++++++++--
 4 files changed, 315 insertions(+), 22 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v7
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v6:

 1:  fa6294387ab = 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
 2:  1a038097bfc = 2:  1a038097bfc cat-file: introduce batch_mode enum to replace print_contents
 3:  486ee847816 = 3:  486ee847816 cat-file: add remove_timestamp helper
 4:  a6dd5d72fce ! 4:  aebaf7e3fe1 cat-file: add --batch-command mode
     @@ Commit message
          The contents command takes an <object> argument and prints out the object
          contents.
      
     -    The info command takes a <object> argument and prints out the object
     +    The info command takes an <object> argument and prints out the object
          metadata.
      
          These can be used in the following way with --buffer:
     @@ Documentation/git-cat-file.txt: OPTIONS
      +`--batch-command` recognizes the following commands:
      ++
      +--
     -+contents `<object>`::
     ++contents <object>::
      +	Print object contents for object reference `<object>`. This corresponds to
      +	the output of `--batch`.
      +
     -+info `<object>`::
     ++info <object>::
      +	Print object info for object reference `<object>`. This corresponds to the
      +	output of `--batch-check`.
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		struct strbuf *output,
      +		struct expand_data *data,
      +		struct queued_cmd *cmd,
     -+		int nr)
     ++		size_t *nr)
      +{
      +	int i;
      +
     -+	for (i = 0; i < nr; i++){
     ++	if (!opt->buffer_output)
     ++		die(_("flush is only for --buffer mode"));
     ++
     ++	for (i = 0; i < *nr; i++) {
      +		cmd[i].fn(opt, cmd[i].line, output, data);
      +		free(cmd[i].line);
      +	}
      +
     ++	*nr = 0;
      +	fflush(stdout);
      +}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +} commands[] = {
      +	{ "contents", parse_cmd_contents, 1},
      +	{ "info", parse_cmd_info, 1},
     ++	{ "flush", NULL, 0},
      +};
      +
      +static void batch_objects_command(struct batch_options *opt,
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		if (isspace(*input.buf))
      +			die(_("whitespace before command: '%s'"), input.buf);
      +
     -+		if (skip_prefix(input.buf, "flush", &cmd_end)) {
     -+			if (!opt->buffer_output)
     -+				die(_("flush is only for --buffer mode"));
     -+			if (*cmd_end)
     -+				die(_("flush takes no arguments"));
     -+
     -+			dispatch_calls(opt, output, data, queued_cmd, nr);
     -+			nr = 0;
     -+			continue;
     -+		}
     -+
      +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
      +			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
      +				continue;
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		if (!cmd)
      +			die(_("unknown command: '%s'"), input.buf);
      +
     ++		if (!strcmp(cmd->prefix, "flush")) {
     ++			dispatch_calls(opt, output, data, queued_cmd, &nr);
     ++			continue;
     ++		}
     ++
      +		if (!opt->buffer_output) {
      +			cmd->fn(opt, p, output, data);
      +			continue;
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	if (opt->buffer_output &&
      +	    nr &&
      +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
     -+		dispatch_calls(opt, output, data, queued_cmd, nr);
     ++		dispatch_calls(opt, output, data, queued_cmd, &nr);
      +
      +	free(queued_cmd);
      +	strbuf_release(&input);
     @@ builtin/cat-file.c: static int batch_option_callback(const struct option *opt,
       	else
       		BUG("%s given to batch-option-callback", opt->long_name);
       
     +@@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
     + 		N_("git cat-file <type> <object>"),
     + 		N_("git cat-file (-e | -p) <object>"),
     + 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
     +-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
     ++		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
     + 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
     + 		   "             [--textconv | --filters]"),
     + 		N_("git cat-file (--textconv | --filters)\n"
      @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
       			N_("like --batch, but don't emit <contents>"),
       			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch-all-objects --batch-c
      +'
      +
      +test_expect_success 'batch-command flush without --buffer' '
     -+	echo "flush arg" >cmd &&
     ++	echo "flush" >cmd &&
      +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
      +	grep "^fatal:.*flush is only for --buffer mode.*" err
      +'

-- 
gitgitgadget


* [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-16  0:53             ` John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16  0:53 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check, or the soon-to-be-added --batch-command, which will read
commands from stdin. The --batch-command mode could be confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-16  0:53             ` John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16  0:53 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

A future patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v7 3/4] cat-file: add remove_timestamp helper
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-16  0:53             ` John Cai via GitGitGadget
  2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16  0:53 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

maybe_remove_timestamp() takes arguments, but it would be useful to have
a function that reads from stdin and strips the timestamp. This would
allow tests to pipe data into a function to remove timestamps, so they
would not always have to assign the data to a variable first. This is
especially helpful when the data spans multiple lines.

Keep maybe_remove_timestamp() the same, but add a remove_timestamp
helper that reads from stdin.

The tests in the next patch will make use of this.

Signed-off-by: John Cai <johncai86@gmail.com>
---
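
For anyone less fluent in sed, the pattern used by remove_timestamp can be
read as "drop a trailing ' <digits> <sign><4 digits>' from the line". The
standalone C sketch below spells out the same rule for a single line. The
function name strip_timestamp is hypothetical and nothing like it is part of
this patch; it is only meant to make the matching rule explicit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * If `line` ends with " <digits> <+|-><4 digits>", cut that suffix off
 * in place and return 1. Otherwise leave the line alone and return 0.
 * This mirrors: sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
 */
int strip_timestamp(char *line)
{
	size_t len, i, pos;

	assert(line);
	len = strlen(line);

	/* Shortest possible match is " 1 +0000", i.e. 8 characters. */
	if (len < 8)
		return 0;

	/* The last four characters must be the timezone digits. */
	for (i = len - 4; i < len; i++)
		if (!isdigit((unsigned char)line[i]))
			return 0;

	/* They are preceded by a sign and a single space. */
	if (line[len - 5] != '+' && line[len - 5] != '-')
		return 0;
	if (line[len - 6] != ' ')
		return 0;

	/* Walk backwards over the run of epoch digits before that space. */
	pos = len - 6;
	i = pos;
	while (i > 0 && isdigit((unsigned char)line[i - 1]))
		i--;
	if (i == pos)
		return 0;	/* no digits at all */

	/* The digit run itself must be introduced by a space. */
	if (i == 0 || line[i - 1] != ' ')
		return 0;

	line[i - 1] = '\0';
	return 1;
}
```

A committer line ending in an epoch and a timezone offset loses exactly that
suffix, while a line without such a suffix is returned unchanged.
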
 t/t1006-cat-file.sh | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..2d52851dadc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -105,13 +105,18 @@ strlen () {
 }
 
 maybe_remove_timestamp () {
-    if test -z "$2"; then
-        echo_without_newline "$1"
-    else
-	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
-    fi
+	if test -z "$2"; then
+		echo_without_newline "$1"
+	else
+		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
+	fi
 }
 
+remove_timestamp () {
+	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
+}
+
+
 run_tests () {
     type=$1
     sha1=$2
-- 
gitgitgadget



* [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                               ` (2 preceding siblings ...)
  2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-16  0:53             ` John Cai via GitGitGadget
  2022-02-16  1:28               ` Junio C Hamano
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16  0:53 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long-running cat-file processes when
accessing object content: one for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of the long-lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter in
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

1. is not ideal from a performance perspective as it will require
spawning a new cat-file process each time, and 2. is hacky and not a
good long term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
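
As a reading aid for the parsing rules in this patch, here is a standalone
sketch of the table-driven command matching, separated from git's internals.
It is an approximation under stated assumptions, not the patch itself:
parse_status, cmd_spec, cmd_table and parse_line are hypothetical names,
strncmp() stands in for skip_prefix(), and status codes are returned where
the real code calls die().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; each maps to one die() message in the patch. */
enum parse_status {
	PARSE_OK,
	PARSE_EMPTY,		/* "empty command in input" */
	PARSE_LEADING_SPACE,	/* "whitespace before command" */
	PARSE_UNKNOWN,		/* "unknown command" */
	PARSE_NEEDS_ARGS,	/* "<cmd> requires arguments" */
	PARSE_NO_ARGS_ALLOWED	/* "<cmd> takes no arguments" */
};

struct cmd_spec {
	const char *prefix;
	unsigned takes_args;
};

static const struct cmd_spec cmd_table[] = {
	{ "contents", 1 },
	{ "info", 1 },
	{ "flush", 0 },
};

/*
 * Parse one input line (without its trailing LF). On PARSE_OK, *cmd
 * points at the matching table entry and *arg at the argument text,
 * or NULL for commands that take no arguments.
 */
enum parse_status parse_line(const char *line,
			     const struct cmd_spec **cmd,
			     const char **arg)
{
	size_t i;

	assert(line && cmd && arg);
	*cmd = NULL;
	*arg = NULL;

	if (!*line)
		return PARSE_EMPTY;
	if (isspace((unsigned char)*line))
		return PARSE_LEADING_SPACE;

	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		size_t len = strlen(cmd_table[i].prefix);
		const char *end;

		/* Prefix match only, like skip_prefix(). */
		if (strncmp(line, cmd_table[i].prefix, len))
			continue;

		end = line + len;
		if (cmd_table[i].takes_args) {
			/* Exactly one space separates command and args. */
			if (*end != ' ')
				return PARSE_NEEDS_ARGS;
			*arg = end + 1;
		} else if (*end) {
			return PARSE_NO_ARGS_ALLOWED;
		}
		*cmd = &cmd_table[i];
		return PARSE_OK;
	}
	return PARSE_UNKNOWN;
}
```

In this sketch "info deadbeef" yields the info entry with "deadbeef" as its
argument, "flush arg" is rejected because flush takes no arguments, and a bare
"info" is rejected because it requires one. Because matching is by prefix, a
line such as "information x" is reported as "requires arguments" rather than
as an unknown command; that appears to mirror how the loop in this patch
behaves, but treat it as an observation about the sketch.
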
 Documentation/git-cat-file.txt |  42 +++++++++-
 builtin/cat-file.c             | 136 ++++++++++++++++++++++++++++++++-
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 107 +++++++++++++++++++++++++-
 4 files changed, 282 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..70c5b4f12d1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents <object>::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info <object>::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..5dd876c5b09 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,128 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		size_t *nr)
+{
+	int i;
+
+	if (!opt->buffer_output)
+		die(_("flush is only for --buffer mode"));
+
+	for (i = 0; i < *nr; i++) {
+		cmd[i].fn(opt, cmd[i].line, output, data);
+		free(cmd[i].line);
+	}
+
+	*nr = 0;
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+	{ "flush", NULL, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].prefix);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].prefix);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!strcmp(cmd->prefix, "flush")) {
+			dispatch_calls(opt, output, data, queued_cmd, &nr);
+			continue;
+		}
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
+		dispatch_calls(opt, output, data, queued_cmd, &nr);
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +718,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +740,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +773,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -666,7 +796,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
+		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
 		   "             [--textconv | --filters]"),
 		N_("git cat-file (--textconv | --filters)\n"
@@ -695,6 +825,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/README b/t/README
index f48e0542cdc..bcd813b0c59 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
+--batch-command from flushing to output on exit.
+
 Naming Tests
 ------------
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..74f0e36b69e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -182,6 +182,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -229,6 +247,22 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	test_write_lines "info $hello_sha1" "flush" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command --buffer without flush for blob info' '
+	touch output &&
+	test_write_lines "info $hello_sha1" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -272,7 +306,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	test_write_lines "info $hello_sha1"\
+	"info $tree_sha1"\
+	"info $commit_sha1"\
+	"info $tag_sha1"\
+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	remove_timestamp >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	test_write_lines "contents $hello_sha1"\
+	"contents $commit_sha1"\
+	"contents $tag_sha1"\
+	"contents deadbeef"\
+	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-16  1:28               ` Junio C Hamano
  2022-02-16  2:48                 ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-16  1:28 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, Christian Couder, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static void dispatch_calls(struct batch_options *opt,
> +		struct strbuf *output,
> +		struct expand_data *data,
> +		struct queued_cmd *cmd,
> +		size_t *nr)
> +{
> +	int i;
> +
> +	if (!opt->buffer_output)
> +		die(_("flush is only for --buffer mode"));
> +
> +	for (i = 0; i < *nr; i++) {

If you updated the max number of items *nr to size_t, don't you need
to use 'i' with the same type to count up to it?

> +		cmd[i].fn(opt, cmd[i].line, output, data);
> +		free(cmd[i].line);
> +	}
> +
> +	*nr = 0;
> +	fflush(stdout);
> +}

Wouldn't it be easier to reason about what the caller/callee are
responsible for, if the function signature looked more like:

	static size_t dispatch_calls(struct batch_options *opt,
				     ...
				     struct queued_cmd cmd[], size_t nr)
	{
		size_t i;

		for (i = 0; i < nr; i++)
			... do stuff ...;

		return updated_nr;
	}

and make the caller do

	nr = dispatch_calls(opt, ..., nr);

or if the function *never* leaves anything in the queue when it
returns, then

	static void dispatch_calls(struct batch_options *opt,
				     ...
				     struct queued_cmd cmd[], size_t nr)
	{
		size_t i;

		for (i = 0; i < nr; i++)
			... do stuff ...;
	}

and make the caller do

	dispatch_calls(opt, ..., nr);
	nr = 0;

instead of passing a pointer to nr like the posted patch?
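
The second shape suggested above can be sketched as a small standalone
program. This is only an illustration: the demo_* names, the reduced
handler signature and the byte-counting handler are assumptions made for
the sketch and are not taken from builtin/cat-file.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's types, not Git's definitions. */
typedef void (*demo_cmd_fn)(const char *line, size_t *seen);

struct demo_queued_cmd {
	demo_cmd_fn fn;
	char *line;
};

/* Example handler: adds the length of the line it was handed. */
static void demo_count(const char *line, size_t *seen)
{
	*seen += strlen(line);
}

/* Heap-allocate a copy of s, standing in for a queued input line. */
static char *demo_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Run every queued command, then free its line. The queue is always
 * fully consumed, so nr is passed by value; resetting the caller's
 * counter is the caller's job.
 */
static void demo_dispatch_calls(struct demo_queued_cmd cmd[], size_t nr,
				size_t *seen)
{
	size_t i; /* same type as nr, so the loop comparison is not mixed */

	for (i = 0; i < nr; i++) {
		cmd[i].fn(cmd[i].line, seen);
		free(cmd[i].line);
	}
}

/* Queue two lines, dispatch, and reset nr as a caller reusing cmd[] would. */
static size_t demo_run(void)
{
	struct demo_queued_cmd queue[2];
	size_t nr = 0, seen = 0;

	queue[nr].fn = demo_count;
	queue[nr++].line = demo_dup("info deadbeef");
	queue[nr].fn = demo_count;
	queue[nr++].line = demo_dup("contents deadbeef");

	demo_dispatch_calls(queue, nr, &seen);
	nr = 0; /* the caller, not the helper, marks the queue empty */

	return seen + nr;
}
```

Calling demo_run() walks both queued entries once and leaves the caller's
counter at zero, which is the contract the helper's comment would need to
spell out.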


* Re: [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  1:28               ` Junio C Hamano
@ 2022-02-16  2:48                 ` John Cai
  2022-02-16  3:00                   ` Junio C Hamano
  2022-02-16  3:01                   ` Eric Sunshine
  0 siblings, 2 replies; 97+ messages in thread
From: John Cai @ 2022-02-16  2:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan, Christian Couder



On 15 Feb 2022, at 20:28, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +static void dispatch_calls(struct batch_options *opt,
>> +		struct strbuf *output,
>> +		struct expand_data *data,
>> +		struct queued_cmd *cmd,
>> +		size_t *nr)
>> +{
>> +	int i;
>> +
>> +	if (!opt->buffer_output)
>> +		die(_("flush is only for --buffer mode"));
>> +
>> +	for (i = 0; i < *nr; i++) {
>
> If you updated the max number of items *nr to size_t, don't you need
> to use 'i' with the same type to count up to it?
>
>> +		cmd[i].fn(opt, cmd[i].line, output, data);
>> +		free(cmd[i].line);
>> +	}
>> +
>> +	*nr = 0;
>> +	fflush(stdout);
>> +}
>
> Wouldn't it be easier to reason about what the caller/callee are
> responsible for, if the function signature looked more like:
>
>     static size_t dispatch_calls(struct batch_options *opt,
>     			     ...
>     			     struct queued_cmd cmd[], size_t nr)
>     {
>     	size_t i;
>
>     	for (i = 0; i < nr; i++)
>     		... do stuff ...;
>
>     	return updated_nr;
>     }
>
> and make the caller do
>
>     nr = dispatch_calls(opt, ..., nr);
>
> or if the function *never* leaves anything in the queue when it
> returns, then
>
>     static void dispatch_calls(struct batch_options *opt,
>     			     ...
>     			     struct queued_cmd cmd[], size_t nr)
>     {
>     	size_t i;
>
>     	for (i = 0; i < nr; i++)
>     		... do stuff ...;
>     }
>
> and make the caller do
>
>     dispatch_calls(opt, ..., nr);
>     nr = 0;
>
> instead of passing a pointer to nr like the posted patch?

Yeah, this is what I had before but there was discussion about separation of concerns in [1]. But perhaps it's preferable compared to passing a pointer to nr.

1. https://lore.kernel.org/git/CAPig+cTwLhn1GZ_=6s0FXL0z=Q=p1w9ZGK0hAV8wfK9RsQYjnA@mail.gmail.com/



* Re: [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  2:48                 ` John Cai
@ 2022-02-16  3:00                   ` Junio C Hamano
  2022-02-16  3:17                     ` Eric Sunshine
  2022-02-16  3:01                   ` Eric Sunshine
  1 sibling, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-16  3:00 UTC (permalink / raw)
  To: John Cai
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan, Christian Couder

John Cai <johncai86@gmail.com> writes:

> Yeah, this is what I had before but there was discussion about
> separation of concerns in [1]. But perhaps it's preferable
> compared to passing a pointer to nr.

Oh, I see.

I do not see any issue with separation of concerns here, actually.

As long as "dispatch_calls() consumes all the cmd[] before it
returns to the caller" is clearly understood between the function
and its caller(s) [*], clearing of "nr" the caller has is entirely
caller's problem.  It becomes needed only because this caller
decides to reuse cmd[] array.

	Side note: you do have a comment before the function to tell
	what to expect out of the helper for its callers, right?

If it were just "accumulate many cmd[] and call the function once"
caller, it would care to maintain the correct "nr" only up to the
point where the function is called (because <cmd[], nr> pair is the
way the function takes the list of commands and it needs a correct
"nr"), but not after making the call, as the only thing left to do
would be to free the cmd[] array itself, which does not even need
"nr".



* Re: [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  2:48                 ` John Cai
  2022-02-16  3:00                   ` Junio C Hamano
@ 2022-02-16  3:01                   ` Eric Sunshine
  1 sibling, 0 replies; 97+ messages in thread
From: Eric Sunshine @ 2022-02-16  3:01 UTC (permalink / raw)
  To: John Cai
  Cc: Junio C Hamano, John Cai via GitGitGadget, Git List, Taylor Blau,
	Phillip Wood, Ævar Arnfjörð Bjarmason, Eric Wong,
	Bagas Sanjaya, Jonathan Tan, Christian Couder

On Tue, Feb 15, 2022 at 9:48 PM John Cai <johncai86@gmail.com> wrote:
> On 15 Feb 2022, at 20:28, Junio C Hamano wrote:
> > and make the caller do
> >
> >     dispatch_calls(opt, ..., nr);
> >     nr = 0;
> >
> > instead of passing a pointer to nr like the posted patch?
>
> Yeah, this is what I had before but there was discussion about separation of concerns in [1]. But perhaps it's preferable compared to passing a pointer to nr.
>
> 1. https://lore.kernel.org/git/CAPig+cTwLhn1GZ_=6s0FXL0z=Q=p1w9ZGK0hAV8wfK9RsQYjnA@mail.gmail.com/

My biggest concern when mentioning it during review was that if a
caller forgets to do `nr = 0`, then a sequence such as:

    dispatch_calls(...);
    ...
    dispatch_calls(...);

will send dangling pointers to the command handlers in the second
dispatch_calls() invocation because the first call to dispatch_calls()
did free(cmd[i].line). In that sense, it's an accident waiting to
happen if people modifying this code in the future aren't paying close
attention.

That said, I don't feel too strongly about it and mentioned in my
review that it might be "good enough" as-is (with the caller having to
remember to `nr = 0`) since it's a local helper function.


* Re: [PATCH v7 4/4] cat-file: add --batch-command mode
  2022-02-16  3:00                   ` Junio C Hamano
@ 2022-02-16  3:17                     ` Eric Sunshine
  0 siblings, 0 replies; 97+ messages in thread
From: Eric Sunshine @ 2022-02-16  3:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai, John Cai via GitGitGadget, Git List, Taylor Blau,
	Phillip Wood, Ævar Arnfjörð Bjarmason, Eric Wong,
	Bagas Sanjaya, Jonathan Tan, Christian Couder

On Tue, Feb 15, 2022 at 10:00 PM Junio C Hamano <gitster@pobox.com> wrote:
> John Cai <johncai86@gmail.com> writes:
> > Yeah, this is what I had before but there was discussion about
> > separation of concerns in [1]. But perhaps it's preferable
> > compared to passing a pointer to nr.
>
> Oh, I see.
>
> I do not see any issue with separation of concerns here, actually.
>
> As long as "dispatch_calls() consumes all the cmd[] before it
> returns to the caller" is clearly understood between the function
> and its caller(s) [*], clearing of "nr" the caller has is entirely
> caller's problem.  It becomes needed only because this caller
> decides to reuse cmd[] array.
>
>         Side note: you do have a comment before the function to tell
>         what to expect out of the helper for its callers, right?

True, it is the caller's problem due to reusing the array, though
dispatch_calls() freeing `line` muddies things slightly. A comment
before the function could indeed be helpful, especially letting
callers know that `line` is being freed.

It probably also would be wise to replace the free() call with
FREE_AND_NULL() in order to address the dangling-pointer concern.
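
A minimal sketch of why clearing the pointer helps, assuming a caller
that forgets to reset nr between two dispatches. The macro below is a
local copy of the idiom for illustration, and skipping NULL entries is an
assumption of this sketch rather than something the posted patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the idiom for illustration; Git ships its own FREE_AND_NULL(). */
#define DEMO_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

struct demo_cmd {
	char *line; /* heap-allocated input line, owned by the queue */
};

/* Heap-allocate a copy of s, standing in for a queued input line. */
static char *demo_copy(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Consume up to nr entries and return how many still held a live line.
 * Skipping NULL entries is an assumption of this sketch.
 */
static size_t demo_dispatch(struct demo_cmd cmd[], size_t nr)
{
	size_t i, live = 0;

	for (i = 0; i < nr; i++) {
		if (!cmd[i].line)
			continue; /* already consumed by an earlier dispatch */
		live++;
		DEMO_FREE_AND_NULL(cmd[i].line);
	}
	return live;
}

/* Dispatch twice without resetting nr; returns 1 if the second pass was inert. */
static int demo_stale_dispatch_is_inert(void)
{
	struct demo_cmd queue[1];
	size_t first, second;

	queue[0].line = demo_copy("info deadbeef");
	first = demo_dispatch(queue, 1);
	second = demo_dispatch(queue, 1); /* caller "forgot" nr = 0 */

	return first == 1 && second == 0 && !queue[0].line;
}
```

With a plain free() the second pass would read a dangling pointer; with
the pointer cleared, a stale entry is at least detectable.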


* [PATCH v8 0/4] Add cat-file --batch-command flag
  2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                               ` (3 preceding siblings ...)
  2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-16 15:02             ` John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                                 ` (4 more replies)
  4 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 15:02 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has four parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. add a remove_timestamp() helper that takes stdin and removes timestamps
 4. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v7

 * revert back to having caller set nr to 0
 * add comment before dispatch_calls to clarify usage of helper
 * rename prefix->name

Changes since v6 (thanks to Eric's feedback)

 * allow command parsing logic to handle the case of flush as well
 * fixed documentation by adding --batch-command to the synopsis and
   adjusting tick marks
 * set nr=0 within helper function

Changes since v5

 * replaced flush tests that used fifo pipes with tests that use a
   GIT_TEST_ env variable to control whether or not --batch-command
   flushes on exit.
 * added remove_timestamp helper in tests.
 * added documentation to show format can be used with --batch-command

Changes since v4

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered in since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * changed rename of cmdmode to transform_mode
 * clarified verbiage in commit messages

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (4):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add remove_timestamp helper
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  42 +++++++-
 builtin/cat-file.c             | 174 ++++++++++++++++++++++++++++++---
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 122 +++++++++++++++++++++--
 4 files changed, 319 insertions(+), 22 deletions(-)


base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v8
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v7:

 1:  fa6294387ab = 1:  fa6294387ab cat-file: rename cmdmode to transform_mode
 2:  1a038097bfc = 2:  1a038097bfc cat-file: introduce batch_mode enum to replace print_contents
 3:  486ee847816 = 3:  486ee847816 cat-file: add remove_timestamp helper
 4:  aebaf7e3fe1 ! 4:  8edf80574b8 cat-file: add --batch-command mode
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	batch_one_object(line, output, opt, data);
      +}
      +
     ++/* Loop through each queued_cmd, dispatch the function, free the
     ++ * memory associated with line so the cmd array can be reused.
     ++ * Callers must set nr back to 0 in order to reuse the cmd array.
     ++ */
      +static void dispatch_calls(struct batch_options *opt,
      +		struct strbuf *output,
      +		struct expand_data *data,
      +		struct queued_cmd *cmd,
     -+		size_t *nr)
     ++		int nr)
      +{
      +	int i;
      +
      +	if (!opt->buffer_output)
      +		die(_("flush is only for --buffer mode"));
      +
     -+	for (i = 0; i < *nr; i++) {
     ++	for (i = 0; i < nr; i++) {
      +		cmd[i].fn(opt, cmd[i].line, output, data);
     -+		free(cmd[i].line);
     ++		FREE_AND_NULL(cmd[i].line);
      +	}
      +
     -+	*nr = 0;
      +	fflush(stdout);
      +}
      +
      +static const struct parse_cmd {
     -+	const char *prefix;
     ++	const char *name;
      +	parse_cmd_fn_t fn;
      +	unsigned takes_args;
      +} commands[] = {
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +			die(_("whitespace before command: '%s'"), input.buf);
      +
      +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
     -+			if (!skip_prefix(input.buf, commands[i].prefix, &cmd_end))
     ++			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
      +				continue;
      +
      +			cmd = &commands[i];
      +			if (cmd->takes_args) {
      +				if (*cmd_end != ' ')
      +					die(_("%s requires arguments"),
     -+					    commands[i].prefix);
     ++					    commands[i].name);
      +
      +				p = cmd_end + 1;
      +			} else if (*cmd_end) {
      +				die(_("%s takes no arguments"),
     -+				    commands[i].prefix);
     ++				    commands[i].name);
      +			}
      +
      +			break;
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +		if (!cmd)
      +			die(_("unknown command: '%s'"), input.buf);
      +
     -+		if (!strcmp(cmd->prefix, "flush")) {
     -+			dispatch_calls(opt, output, data, queued_cmd, &nr);
     ++		if (!strcmp(cmd->name, "flush")) {
     ++			dispatch_calls(opt, output, data, queued_cmd, nr);
     ++			nr = 0;
      +			continue;
      +		}
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	if (opt->buffer_output &&
      +	    nr &&
      +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
     -+		dispatch_calls(opt, output, data, queued_cmd, &nr);
     ++		dispatch_calls(opt, output, data, queued_cmd, nr);
      +
      +	free(queued_cmd);
      +	strbuf_release(&input);

-- 
gitgitgadget


* [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-16 15:02               ` John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 15:02 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be-added --batch-command that will read
commands from stdin. --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the result output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-16 15:02               ` John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 15:02 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

A future patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v8 3/4] cat-file: add remove_timestamp helper
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-16 15:02               ` John Cai via GitGitGadget
  2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 15:02 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

maybe_remove_timestamp() takes arguments, but it would be useful to have
a function that reads from stdin and strips the timestamp. This would
allow tests to pipe data into a function to remove timestamps, so they
would not always have to assign the data to a variable first. This is
especially helpful when the data spans multiple lines.

Keep maybe_remove_timestamp() the same, but add a remove_timestamp
helper that reads from stdin.

The tests in the next patch will make use of this.
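
For readers less used to sed, the expression behind remove_timestamp
(a space, one or more digits, a space, a sign and exactly four digits at
the end of the line) can be spelled out in C. This is only a sketch of
the matching rule; demo_strip_timestamp and demo_strips_to are
hypothetical names and are not part of the test suite.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Remove a trailing " <digits> <sign><4 digits>" from line in place.
 * Returns 1 if a timestamp was stripped, 0 if the line was left alone.
 */
static int demo_strip_timestamp(char *line)
{
	size_t len = strlen(line);
	size_t pos, i;

	if (len < 8) /* shortest possible match is " 1 +0000" */
		return 0;

	pos = len - 5; /* timezone: a sign followed by four digits */
	if (line[pos] != '+' && line[pos] != '-')
		return 0;
	for (i = pos + 1; i < len; i++)
		if (!isdigit((unsigned char)line[i]))
			return 0;

	if (line[pos - 1] != ' ') /* separator before the timezone */
		return 0;
	pos--; /* pos now indexes that separator */

	i = pos; /* walk back over the seconds-since-epoch digits */
	while (i > 0 && isdigit((unsigned char)line[i - 1]))
		i--;
	if (i == pos || i == 0 || line[i - 1] != ' ')
		return 0; /* no digits, or no space in front of them */

	line[i - 1] = '\0'; /* cut at the space before the digits */
	return 1;
}

/* Copy in to a scratch buffer, strip it, and compare with expected. */
static int demo_strips_to(const char *in, const char *expected)
{
	char buf[256];

	assert(strlen(in) < sizeof(buf));
	strcpy(buf, in);
	demo_strip_timestamp(buf);
	return !strcmp(buf, expected);
}
```

Lines without a trailing timestamp, such as a tree line of a commit
object, pass through unchanged, which is why the helper can be applied to
whole multi-line objects.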

Signed-off-by: John Cai <johncai86@gmail.com>
---
 t/t1006-cat-file.sh | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..2d52851dadc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -105,13 +105,18 @@ strlen () {
 }
 
 maybe_remove_timestamp () {
-    if test -z "$2"; then
-        echo_without_newline "$1"
-    else
-	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
-    fi
+	if test -z "$2"; then
+		echo_without_newline "$1"
+	else
+		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
+	fi
 }
 
+remove_timestamp () {
+	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
+}
+
+
 run_tests () {
     type=$1
     sha1=$2
-- 
gitgitgadget



* [PATCH v8 4/4] cat-file: add --batch-command mode
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                 ` (2 preceding siblings ...)
  2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-16 15:02               ` John Cai via GitGitGadget
  2022-02-16 17:15                 ` Junio C Hamano
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 15:02 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content: one for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way to do so is to either

1. kill (B)'s process
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective as it will require
spawning a new cat-file process each time, and option 2 is hacky and not
a good long term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF
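
The rules for a single input line can be sketched as a standalone parser.
It returns a status code where the patch calls die(), and every demo_*
name is an assumption made for illustration; only the command names, the
takes_args idea and the error categories come from this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum demo_parse {
	DEMO_OK,
	DEMO_EMPTY,         /* "empty command in input" */
	DEMO_LEADING_SPACE, /* "whitespace before command" */
	DEMO_UNKNOWN,       /* "unknown command" */
	DEMO_NEEDS_ARGS,    /* "<name> requires arguments" */
	DEMO_NO_ARGS        /* "<name> takes no arguments" */
};

static const struct demo_command {
	const char *name;
	unsigned takes_args;
} demo_commands[] = {
	{ "contents", 1 },
	{ "info", 1 },
	{ "flush", 0 },
};

/* Classify one line; on DEMO_OK, *args points at the argument or is NULL. */
static enum demo_parse demo_parse_line(const char *line, const char **args)
{
	size_t i;

	assert(line && args);
	*args = NULL;

	if (!*line)
		return DEMO_EMPTY;
	if (isspace((unsigned char)*line))
		return DEMO_LEADING_SPACE;

	for (i = 0; i < sizeof(demo_commands) / sizeof(demo_commands[0]); i++) {
		size_t len = strlen(demo_commands[i].name);
		const char *end;

		if (strncmp(line, demo_commands[i].name, len))
			continue; /* name is not a prefix of this line */

		end = line + len;
		if (demo_commands[i].takes_args) {
			if (*end != ' ')
				return DEMO_NEEDS_ARGS;
			*args = end + 1;
		} else if (*end) {
			return DEMO_NO_ARGS;
		}
		return DEMO_OK;
	}
	return DEMO_UNKNOWN;
}

/* Convenience wrappers so callers can ask for just one of the results. */
static enum demo_parse demo_status(const char *line)
{
	const char *args;

	return demo_parse_line(line, &args);
}

static const char *demo_args(const char *line)
{
	const char *args;

	return demo_parse_line(line, &args) == DEMO_OK ? args : NULL;
}
```

Each non-OK status corresponds to one of the fatal errors exercised by
the new tests in t1006, for example "info" alone or "flush arg".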

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  42 +++++++++-
 builtin/cat-file.c             | 140 ++++++++++++++++++++++++++++++++-
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 107 ++++++++++++++++++++++++-
 4 files changed, 286 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..70c5b4f12d1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents <object>::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info <object>::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..39d486dd737 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,132 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+/* Loop through each queued_cmd, dispatch the function, free the
+ * memory associated with line so the cmd array can be reused.
+ * Callers must set nr back to 0 in order to reuse the cmd array.
+ */
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	if (!opt->buffer_output)
+		die(_("flush is only for --buffer mode"));
+
+	for (i = 0; i < nr; i++) {
+		cmd[i].fn(opt, cmd[i].line, output, data);
+		FREE_AND_NULL(cmd[i].line);
+	}
+
+	fflush(stdout);
+}
+
+static const struct parse_cmd {
+	const char *name;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+	{ "flush", NULL, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].name);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].name);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!strcmp(cmd->name, "flush")) {
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			nr = 0;
+			continue;
+		}
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +722,10 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +744,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +777,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -666,7 +800,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
+		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
 		   "             [--textconv | --filters]"),
 		N_("git cat-file (--textconv | --filters)\n"
@@ -695,6 +829,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/README b/t/README
index f48e0542cdc..bcd813b0c59 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
+--batch-command from flushing to output on exit.
+
 Naming Tests
 ------------
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..74f0e36b69e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -182,6 +182,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -229,6 +247,22 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	test_write_lines "info $hello_sha1" "flush" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command --buffer without flush for blob info' '
+	touch output &&
+	test_write_lines "info $hello_sha1" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -272,7 +306,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	test_write_lines "info $hello_sha1"\
+	"info $tree_sha1"\
+	"info $commit_sha1"\
+	"info $tag_sha1"\
+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	remove_timestamp >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	test_write_lines "contents $hello_sha1"\
+	"contents $commit_sha1"\
+	"contents $tag_sha1"\
+	"contents deadbeef"\
+	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v8 4/4] cat-file: add --batch-command mode
  2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-16 17:15                 ` Junio C Hamano
  2022-02-16 17:25                   ` Eric Sunshine
  0 siblings, 1 reply; 97+ messages in thread
From: Junio C Hamano @ 2022-02-16 17:15 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, Christian Couder, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/* Loop through each queued_cmd, dispatch the function, free the
> + * memory associated with line so the cmd array can be reused.
> + * Callers must set nr back to 0 in order to reuse the cmd array.
> + */

Style.

	/*
	 * Our multi-line comments look like this;
	 * slash-asterisk on the opening line and
	 * asterisk-slash on the closing line sit
	 * on their own lines.
	 */

I am not sure if that is an accurate description.  The caller does
not have to reuse the cmd array.  The only thing they have to know
about this function is that the .line member is freed after the
function is done with it, so they do not have to free the member
themselves.

As it seems that this code structure and division of responsibility
between the caller and the callee is confusing even to the author of
this code, it may make sense to make the caller responsible for
freeing.

Then the caller becomes

> +		if (!strcmp(cmd->name, "flush")) {
> +			dispatch_calls(opt, output, data, queued_cmd, nr);

			for (i = 0; i < nr; i++)
				free(queued_cmd[i].line);

> +			nr = 0;
> +			continue;
> +		}

which is not too bad.  And then we'd free the array itself at the
end ...

> ...
> +		call.line = xstrdup_or_null(p);
> +		queued_cmd[nr++] = call;
> +	}
> +
> +	if (opt->buffer_output &&
> +	    nr &&
> +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
> +		dispatch_calls(opt, output, data, queued_cmd, nr);
> +
> +	free(queued_cmd);

... which may be easier to see what is going on.

Thanks.


* Re: [PATCH v8 4/4] cat-file: add --batch-command mode
  2022-02-16 17:15                 ` Junio C Hamano
@ 2022-02-16 17:25                   ` Eric Sunshine
  2022-02-16 20:30                     ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Eric Sunshine @ 2022-02-16 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: John Cai via GitGitGadget, Git List, Taylor Blau, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Eric Wong, Bagas Sanjaya,
	Jonathan Tan, Christian Couder, John Cai

On Wed, Feb 16, 2022 at 12:15 PM Junio C Hamano <gitster@pobox.com> wrote:
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> As it seems that this code structure and division of responsibility
> between the caller and the callee is confusing even to the author of
> this code, it may make sense to make the caller responsible for
> freeing.
>
> Then the caller becomes
>
> > +             if (!strcmp(cmd->name, "flush")) {
> > +                     dispatch_calls(opt, output, data, queued_cmd, nr);
>
>                         for (i = 0; i < nr; i++)
>                                 free(queued_cmd[i].line);
>
> > +                     nr = 0;
> > +                     continue;
> > +             }
>
> which is not too bad.  And then we'd free the array itself at the
> end ...
>
> > ...
> > +             call.line = xstrdup_or_null(p);
> > +             queued_cmd[nr++] = call;
> > +     }
> > +
> > +     if (opt->buffer_output &&
> > +         nr &&
> > +         !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
> > +             dispatch_calls(opt, output, data, queued_cmd, nr);
> > +
> > +     free(queued_cmd);
>
> ... which may be easier to see what is going on.

I agree that it is easier to see what is going on when the caller is
responsible for freeing `line`. It _may_ make sense to factor out the
free-line-loop to a separate function since the caller will have to do
so after both calls to dispatch_calls(), not just the one inside the
strbuf_getline() loop, but also the one after the loop. A separate
function might be overkill for this two-line loop; on the other hand,
it may clue in future readers that resource management needs to be
taken into account.
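
The division of responsibility discussed here could be sketched roughly
as follows. The signatures are simplified for illustration (the real
dispatch_calls() also takes the batch options, an output strbuf and the
expand data), and cmd_fn, count_call and dup_line are hypothetical
helpers:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*cmd_fn)(const char *line, int *counter);

struct queued_cmd {
	cmd_fn fn;
	char *line; /* heap copy of the argument, owned by the caller's queue */
};

/* Hypothetical command: just records that it ran. */
static void count_call(const char *line, int *counter)
{
	assert(line);
	(*counter)++;
}

/* Hypothetical helper: heap-allocated copy of s, or NULL if out of memory. */
static char *dup_line(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}

/* Run every queued command. The queue is neither freed nor modified. */
static void dispatch_calls(const struct queued_cmd *cmd, size_t nr, int *counter)
{
	size_t i;

	for (i = 0; i < nr; i++)
		cmd[i].fn(cmd[i].line, counter);
}

/* Release the per-command lines. The array itself stays usable. */
static void free_cmds(struct queued_cmd *cmd, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		free(cmd[i].line);
		cmd[i].line = NULL;
	}
}
```

With this split, each call site reads as dispatch_calls(), then
free_cmds(), then resetting nr to 0, so the resource management is
visible in the caller rather than hidden in the dispatcher.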


* Re: [PATCH v8 4/4] cat-file: add --batch-command mode
  2022-02-16 17:25                   ` Eric Sunshine
@ 2022-02-16 20:30                     ` John Cai
  0 siblings, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-16 20:30 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Junio C Hamano, John Cai via GitGitGadget, Git List, Taylor Blau,
	Phillip Wood, Ævar Arnfjörð Bjarmason, Eric Wong,
	Bagas Sanjaya, Jonathan Tan, Christian Couder



On 16 Feb 2022, at 12:25, Eric Sunshine wrote:

> On Wed, Feb 16, 2022 at 12:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> As it seems that this code structure and division of responsibility
>> between the caller and the callee is confusing even to the author of
>> this code, it may make sense to make the caller responsible for
>> freeing.
>>
>> Then the caller becomes
>>
>>> +             if (!strcmp(cmd->name, "flush")) {
>>> +                     dispatch_calls(opt, output, data, queued_cmd, nr);
>>
>>                         for (i = 0; i < nr; i++)
>>                                 free(queued_cmd[i].line);
>>
>>> +                     nr = 0;
>>> +                     continue;
>>> +             }
>>
>> which is not too bad.  And then we'd free the array itself at the
>> end ...
>>
>>> ...
>>> +             call.line = xstrdup_or_null(p);
>>> +             queued_cmd[nr++] = call;
>>> +     }
>>> +
>>> +     if (opt->buffer_output &&
>>> +         nr &&
>>> +         !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
>>> +             dispatch_calls(opt, output, data, queued_cmd, nr);
>>> +
>>> +     free(queued_cmd);
>>
>> ... which may be easier to see what is going on.
>
> I agree that it is easier to see what is going on when the caller is
> responsible for freeing `line`. It _may_ make sense to factor out the
> free-line-loop to a separate function since the caller will have to do
> so after both calls to dispatch_calls(), not just the one inside the
> strbuf_getline() loop, but also the one after the loop. A separate
> function might be overkill for this two-line loop; on the other hand,
> it may clue in future readers that resource management needs to be
> taken into account.

Both of these suggestions sound good to me. Thanks for the help down to these details. These are
all valuable learning points!


* [PATCH v9 0/4] Add cat-file --batch-command flag
  2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                 ` (3 preceding siblings ...)
  2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-16 20:59               ` John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                                   ` (4 more replies)
  4 siblings, 5 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 20:59 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has four parts:

 1. a preparation patch to rename a variable
 2. an enum to keep track of batch modes
 3. a remove_timestamp() test helper that reads stdin and removes timestamps
 4. logic to handle the --batch-command flag, adding the contents, info and
    flush commands

Changes since v8

 * have caller free line through a helper function for the sake of
   separation of concerns

Changes since v7

 * revert back to having caller set nr to 0
 * add comment before dispatch_calls to clarify usage of helper
 * rename prefix->name

Changes since v6 (thanks to Eric's feedback)

 * allow command parsing logic to handle the case of flush as well
 * fixed documentation by adding --batch-command to the synopsis and
   adjusting tick marks
 * set nr=0 within helper function

Changes since v5

 * replaced flush tests that used fifo pipes to using a GIT_TEST_ env
   variable to control whether or not --batch-command flushes on exit.
 * added remove_timestamp helper in tests.
 * added documentation to show format can be used with --batch-command

Changes since v4

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * clarified verbiage in commit messages

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/
B. https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
C. https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (4):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add remove_timestamp helper
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  42 +++++++-
 builtin/cat-file.c             | 181 ++++++++++++++++++++++++++++++---
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 122 ++++++++++++++++++++--
 4 files changed, 326 insertions(+), 22 deletions(-)


base-commit: b80121027d1247a0754b3cc46897fee75c050b44
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v9
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v9
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v8:

 1:  fa6294387ab = 1:  76d6e4fe517 cat-file: rename cmdmode to transform_mode
 2:  1a038097bfc = 2:  12084a335cb cat-file: introduce batch_mode enum to replace print_contents
 3:  486ee847816 = 3:  bf74b6cd75b cat-file: add remove_timestamp helper
 4:  8edf80574b8 ! 4:  dbe194f8a85 cat-file: add --batch-command mode
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	batch_one_object(line, output, opt, data);
      +}
      +
     -+/* Loop through each queued_cmd, dispatch the function, free the
     -+ * memory associated with line so the cmd array can be reused.
     -+ * Callers must set nr back to 0 in order to reuse the cmd array.
     -+ */
      +static void dispatch_calls(struct batch_options *opt,
      +		struct strbuf *output,
      +		struct expand_data *data,
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	if (!opt->buffer_output)
      +		die(_("flush is only for --buffer mode"));
      +
     -+	for (i = 0; i < nr; i++) {
     ++	for (i = 0; i < nr; i++)
      +		cmd[i].fn(opt, cmd[i].line, output, data);
     -+		FREE_AND_NULL(cmd[i].line);
     -+	}
      +
      +	fflush(stdout);
      +}
      +
     ++static void free_cmds(struct queued_cmd *cmd, int nr)
     ++{
     ++	int i;
     ++
     ++	for (i = 0; i < nr; i++)
     ++		FREE_AND_NULL(cmd[i].line);
     ++}
     ++
     ++
      +static const struct parse_cmd {
      +	const char *name;
      +	parse_cmd_fn_t fn;
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +
      +		if (!strcmp(cmd->name, "flush")) {
      +			dispatch_calls(opt, output, data, queued_cmd, nr);
     ++			free_cmds(queued_cmd, nr);
      +			nr = 0;
      +			continue;
      +		}
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +
      +	if (opt->buffer_output &&
      +	    nr &&
     -+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0))
     ++	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
      +		dispatch_calls(opt, output, data, queued_cmd, nr);
     ++		free_cmds(queued_cmd, nr);
     ++	}
      +
      +	free(queued_cmd);
      +	strbuf_release(&input);
     @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
      +		batch_objects_command(opt, &output, &data);
      +		goto cleanup;
      +	}
     ++
       	while (strbuf_getline(&input, stdin) != EOF) {
       		if (data.split_on_whitespace) {
       			/*

-- 
gitgitgadget


* [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-16 20:59                 ` John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 20:59 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be --batch-command that will read
commands from stdin. --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the output of the blob will be transformed, either according to
--filters or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-16 20:59                 ` John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 20:59 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

A future patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v9 3/4] cat-file: add remove_timestamp helper
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-16 20:59                 ` John Cai via GitGitGadget
  2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 20:59 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

maybe_remove_timestamp() takes arguments, but it would be useful to have
a function that reads from stdin and strips the timestamp. This would
allow tests to pipe data into a function to remove timestamps, so they
would not always have to assign a variable first. This is especially
helpful when the data spans multiple lines.

Keep maybe_remove_timestamp() the same, but add a remove_timestamp
helper that reads from stdin.

The tests in the next patch will make use of this.

Signed-off-by: John Cai <johncai86@gmail.com>
---
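Illustration only: a minimal sketch of how a stdin-reading helper like
this could be applied to multi-line data. The function body repeats the
sed expression from this patch; the sample_headers helper and the
identities it prints are made up for the example.

```shell
# Same sed expression as the remove_timestamp helper in this patch:
# strip a trailing " <epoch-seconds> <+/-hhmm>" from each input line.
remove_timestamp () {
	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
}

# Hypothetical multi-line input resembling commit header lines.
sample_headers () {
	printf '%s\n' \
		'author A U Thor <author@example.com> 1112911993 -0700' \
		'committer C O Mitter <committer@example.com> 1112911993 -0700'
}

# Pipe the whole block through the helper; no variable is needed.
sample_headers | remove_timestamp
```

Each line should come out without its trailing timestamp and timezone;
lines that do not end in that pattern pass through unchanged.
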
 t/t1006-cat-file.sh | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..2d52851dadc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -105,13 +105,18 @@ strlen () {
 }
 
 maybe_remove_timestamp () {
-    if test -z "$2"; then
-        echo_without_newline "$1"
-    else
-	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
-    fi
+	if test -z "$2"; then
+		echo_without_newline "$1"
+	else
+		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
+	fi
 }
 
+remove_timestamp () {
+	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
+}
+
+
 run_tests () {
     type=$1
     sha1=$2
-- 
gitgitgadget



* [PATCH v9 4/4] cat-file: add --batch-command mode
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                   ` (2 preceding siblings ...)
  2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-16 20:59                 ` John Cai via GitGitGadget
  2022-02-18 11:26                   ` Phillip Wood
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  4 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-16 20:59 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content. One for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter in
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued that executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

1. is not ideal from a performance perspective as it will require
spawning a new cat-file process each time, and 2. is hacky and not a
good long term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
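Illustration only: a minimal sketch of how a caller might build one
buffered session as described above. emit_session is a hypothetical
helper name, and the object names in the usage comment are assumptions;
only the info, contents and flush commands come from this patch.

```shell
# Emit one queued session: info and contents for each object name given,
# followed by a single flush so the queued commands get executed.
emit_session () {
	for obj in "$@"
	do
		printf 'info %s\n' "$obj"
		printf 'contents %s\n' "$obj"
	done
	printf 'flush\n'
}

# Assumed usage inside a repository, with a git that includes this series:
#   emit_session HEAD "HEAD^{tree}" | git cat-file --batch-command --buffer
```

Ending each group of commands with flush is what lets the caller know
that all output for that group has been written when --buffer is in use.
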
 Documentation/git-cat-file.txt |  42 +++++++++-
 builtin/cat-file.c             | 147 ++++++++++++++++++++++++++++++++-
 t/README                       |   3 +
 t/t1006-cat-file.sh            | 107 +++++++++++++++++++++++-
 4 files changed, 293 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..70c5b4f12d1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents <object>::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info <object>::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..3dc960e9f85 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,138 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	if (!opt->buffer_output)
+		die(_("flush is only for --buffer mode"));
+
+	for (i = 0; i < nr; i++)
+		cmd[i].fn(opt, cmd[i].line, output, data);
+
+	fflush(stdout);
+}
+
+static void free_cmds(struct queued_cmd *cmd, int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++)
+		FREE_AND_NULL(cmd[i].line);
+}
+
+
+static const struct parse_cmd {
+	const char *name;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+	{ "flush", NULL, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].name);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].name);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!strcmp(cmd->name, "flush")) {
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			free_cmds(queued_cmd, nr);
+			nr = 0;
+			continue;
+		}
+
+		if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+			continue;
+		}
+
+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
+		call.fn = cmd->fn;
+		call.line = xstrdup_or_null(p);
+		queued_cmd[nr++] = call;
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+		free_cmds(queued_cmd, nr);
+	}
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +728,11 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
+
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +751,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +784,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -666,7 +807,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
+		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
 		   "             [--textconv | --filters]"),
 		N_("git cat-file (--textconv | --filters)\n"
@@ -695,6 +836,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/README b/t/README
index f48e0542cdc..bcd813b0c59 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
+--batch-command from flushing to output on exit.
+
 Naming Tests
 ------------
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..74f0e36b69e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -182,6 +182,24 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
+		| git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" \
+		| git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -229,6 +247,22 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	test_write_lines "info $hello_sha1" "flush" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command --buffer without flush for blob info' '
+	touch output &&
+	test_write_lines "info $hello_sha1" | \
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -272,7 +306,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	test_write_lines "info $hello_sha1"\
+	"info $tree_sha1"\
+	"info $commit_sha1"\
+	"info $tag_sha1"\
+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	remove_timestamp >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	test_write_lines "contents $hello_sha1"\
+	"contents $commit_sha1"\
+	"contents $tag_sha1"\
+	"contents deadbeef"\
+	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v9 4/4] cat-file: add --batch-command mode
  2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-18 11:26                   ` Phillip Wood
  2022-02-18 16:53                     ` John Cai
  2022-02-18 17:23                     ` Junio C Hamano
  0 siblings, 2 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-18 11:26 UTC (permalink / raw)
  To: John Cai via GitGitGadget, git
  Cc: me, avarab, e, bagasdotme, gitster, Eric Sunshine, Jonathan Tan,
	Christian Couder, John Cai

Hi John

This is looking good. I think the only thing that is missing (and which 
I should have realized earlier) is that there are no tests for valid or 
invalid format arguments to --batch-command. I haven't checked but there 
must be some other tests in the t1006 that we can piggy back on to add 
that. I've left some stylistic comments below but I don't feel strongly 
about them apart from the README comment so please don't feel obliged to 
act on them, it's looking pretty good as is.

Best Wishes

Phillip

On 16/02/2022 20:59, John Cai via GitGitGadget wrote:
> From: John Cai <johncai86@gmail.com>
> 
> Add a new flag --batch-command that accepts commands and arguments
> from stdin, similar to git-update-ref --stdin.
> 
> At GitLab, we use a pair of long running cat-file processes when
> accessing object content. One for iterating over object metadata with
> --batch-check, and the other to grab object contents with --batch.
> 
> However, if we had --batch-command, we wouldn't need to keep both
> processes around, and instead just have one --batch-command process
> where we can flip between getting object info, and getting object
> contents. Since we have a pair of cat-file processes per repository,
> this means we can get rid of roughly half of long lived git cat-file
> processes. Given there are many repositories being accessed at any given
> time, this can lead to huge savings.
> 
> git cat-file --batch-command
> 
> will enter an interactive command mode whereby the user can enter in
> commands and their arguments that get queued in memory:
> 
> <command1> [arg1] [arg2] LF
> <command2> [arg1] [arg2] LF
> 
> When --buffer mode is used, commands will be queued in memory until a
> flush command is issued that executes them:
> 
> flush LF
> 
> The reason for a flush command is that when a consumer process (A)
> talks to a git cat-file process (B) and interactively writes to and
> reads from it in --buffer mode, (A) needs to be able to control when
> the buffer is flushed to stdout.
> 
> Currently, from (A)'s perspective, the only way is to either
> 
> 1. kill (B)'s process
> 2. send an invalid object to stdin.
> 
> 1. is not ideal from a performance perspective as it will require
> spawning a new cat-file process each time, and 2. is hacky and not a
> good long term solution.
> 
> With this mechanism of queueing up commands and letting (A) issue a
> flush command, process (A) can control when the buffer is flushed and
> can guarantee it will receive all of the output when in --buffer mode.
> --batch-command also will not allow (B) to flush to stdout until a flush
> is received.
> 
> This patch adds the basic structure for adding commands, which can be
> extended in the future to add more commands. It also adds the following
> two commands (on top of the flush command):
> 
> contents <object> LF
> info <object> LF
> 
> The contents command takes an <object> argument and prints out the object
> contents.
> 
> The info command takes an <object> argument and prints out the object
> metadata.
> 
> These can be used in the following way with --buffer:
> 
> info <object> LF
> contents <object> LF
> contents <object> LF
> info <object> LF
> flush LF
> info <object> LF
> flush LF
> 
> When used without --buffer:
> 
> info <object> LF
> contents <object> LF
> contents <object> LF
> info <object> LF
> info <object> LF
> 
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>   Documentation/git-cat-file.txt |  42 +++++++++-
>   builtin/cat-file.c             | 147 ++++++++++++++++++++++++++++++++-
>   t/README                       |   3 +
>   t/t1006-cat-file.sh            | 107 +++++++++++++++++++++++-
>   4 files changed, 293 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index bef76f4dd06..70c5b4f12d1 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -96,6 +96,33 @@ OPTIONS
>   	need to specify the path, separated by whitespace.  See the
>   	section `BATCH OUTPUT` below for details.
>   
> +--batch-command::
> +--batch-command=<format>::
> +	Enter a command mode that reads commands and arguments from stdin. May
> +	only be combined with `--buffer`, `--textconv` or `--filters`. In the
> +	case of `--textconv` or `--filters`, the input lines also need to specify
> +	the path, separated by whitespace. See the section `BATCH OUTPUT` below
> +	for details.
> ++
> +`--batch-command` recognizes the following commands:
> ++
> +--
> +contents <object>::
> +	Print object contents for object reference `<object>`. This corresponds to
> +	the output of `--batch`.
> +
> +info <object>::
> +	Print object info for object reference `<object>`. This corresponds to the
> +	output of `--batch-check`.
> +
> +flush::
> +	Used with `--buffer` to execute all preceding commands that were issued
> +	since the beginning or since the last flush was issued. When `--buffer`
> +	is used, no output will come until a `flush` is issued. When `--buffer`
> +	is not used, commands are flushed each time without issuing `flush`.
> +--
> ++
> +
>   --batch-all-objects::
>   	Instead of reading a list of objects on stdin, perform the
>   	requested batch operation on all objects in the repository and
> @@ -110,7 +137,7 @@ OPTIONS
>   	that a process can interactively read and write from
>   	`cat-file`. With this option, the output uses normal stdio
>   	buffering; this is much more efficient when invoking
> -	`--batch-check` on a large number of objects.
> +	`--batch-check` or `--batch-command` on a large number of objects.
>   
>   --unordered::
>   	When `--batch-all-objects` is in use, visit objects in an
> @@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
>   the whole line is considered as an object, as if it were fed to
>   linkgit:git-rev-parse[1].
>   
> +When `--batch-command` is given, `cat-file` will read commands from stdin,
> +one per line, and print information based on the command given. With
> +`--batch-command`, the `info` command followed by an object will print
> +information about the object the same way `--batch-check` would, and the
> +`contents` command followed by an object prints contents in the same way
> +`--batch` would.
> +
>   You can specify the information shown for each object by using a custom
>   `<format>`. The `<format>` is copied literally to stdout for each
>   object, with placeholders of the form `%(atom)` expanded, followed by a
> @@ -237,9 +271,9 @@ newline. The available atoms are:
>   If no format is specified, the default format is `%(objectname)
>   %(objecttype) %(objectsize)`.
>   
> -If `--batch` is specified, the object information is followed by the
> -object contents (consisting of `%(objectsize)` bytes), followed by a
> -newline.
> +If `--batch` is specified, or if `--batch-command` is used with the `contents`
> +command, the object information is followed by the object contents (consisting
> +of `%(objectsize)` bytes), followed by a newline.
>   
>   For example, `--batch` without a custom format would produce:
>   
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5e38af82af1..3dc960e9f85 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -20,6 +20,7 @@
>   enum batch_mode {
>   	BATCH_MODE_CONTENTS,
>   	BATCH_MODE_INFO,
> +	BATCH_MODE_QUEUE_AND_DISPATCH,
>   };
>   
>   struct batch_options {
> @@ -513,6 +514,138 @@ static int batch_unordered_packed(const struct object_id *oid,
>   				      data);
>   }
>   
> +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
> +			       struct strbuf *, struct expand_data *);
> +
> +struct queued_cmd {
> +	parse_cmd_fn_t fn;
> +	char *line;
> +};
> +
> +static void parse_cmd_contents(struct batch_options *opt,
> +			     const char *line,
> +			     struct strbuf *output,
> +			     struct expand_data *data)
> +{
> +	opt->batch_mode = BATCH_MODE_CONTENTS;
> +	batch_one_object(line, output, opt, data);
> +}
> +
> +static void parse_cmd_info(struct batch_options *opt,
> +			   const char *line,
> +			   struct strbuf *output,
> +			   struct expand_data *data)
> +{
> +	opt->batch_mode = BATCH_MODE_INFO;
> +	batch_one_object(line, output, opt, data);
> +}
> +
> +static void dispatch_calls(struct batch_options *opt,
> +		struct strbuf *output,
> +		struct expand_data *data,
> +		struct queued_cmd *cmd,
> +		int nr)
> +{
> +	int i;
> +
> +	if (!opt->buffer_output)
> +		die(_("flush is only for --buffer mode"));
> +
> +	for (i = 0; i < nr; i++)
> +		cmd[i].fn(opt, cmd[i].line, output, data);
> +
> +	fflush(stdout);
> +}
> +
> +static void free_cmds(struct queued_cmd *cmd, int nr)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr; i++)
> +		FREE_AND_NULL(cmd[i].line);
> +}
> +
> +
> +static const struct parse_cmd {
> +	const char *name;
> +	parse_cmd_fn_t fn;
> +	unsigned takes_args;
> +} commands[] = {
> +	{ "contents", parse_cmd_contents, 1},
> +	{ "info", parse_cmd_info, 1},
> +	{ "flush", NULL, 0},
> +};
> +
> +static void batch_objects_command(struct batch_options *opt,
> +				    struct strbuf *output,
> +				    struct expand_data *data)
> +{
> +	struct strbuf input = STRBUF_INIT;
> +	struct queued_cmd *queued_cmd = NULL;
> +	size_t alloc = 0, nr = 0;
> +
> +	while (!strbuf_getline(&input, stdin)) {
> +		int i;
> +		const struct parse_cmd *cmd = NULL;
> +		const char *p = NULL, *cmd_end;
> +		struct queued_cmd call = {0};
> +
> +		if (!input.len)
> +			die(_("empty command in input"));
> +		if (isspace(*input.buf))
> +			die(_("whitespace before command: '%s'"), input.buf);
> +
> +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
> +			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
> +				continue;
> +
> +			cmd = &commands[i];
> +			if (cmd->takes_args) {
> +				if (*cmd_end != ' ')
> +					die(_("%s requires arguments"),
> +					    commands[i].name);
> +
> +				p = cmd_end + 1;
> +			} else if (*cmd_end) {
> +				die(_("%s takes no arguments"),
> +				    commands[i].name);
> +			}
> +
> +			break;
> +		}
> +
> +		if (!cmd)
> +			die(_("unknown command: '%s'"), input.buf);
> +
> +		if (!strcmp(cmd->name, "flush")) {
> +			dispatch_calls(opt, output, data, queued_cmd, nr);
> +			free_cmds(queued_cmd, nr);
> +			nr = 0;

It'd be nice if free_cmds() zeroed nr for us rather than having to 
remember to do it separately as the two are intimately linked.

> +			continue;
> +		}
> +
> +		if (!opt->buffer_output) {
> +			cmd->fn(opt, p, output, data);
> +			continue;
> +		}
> +
> +		ALLOC_GROW(queued_cmd, nr + 1, alloc);
> +		call.fn = cmd->fn;
> +		call.line = xstrdup_or_null(p);
> +		queued_cmd[nr++] = call;

I found this a bit confusing to follow with all the "continue"s. For me 
it would be easier to follow if this was written as
	if (!strcmp(cmd->name, "flush")) {
		...
	} else if (!opt->buffer_output) {
		...
	} else {
		ALLOC_GROW ...
	}

> +	}
> +
> +	if (opt->buffer_output &&
> +	    nr &&
> +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
> +		dispatch_calls(opt, output, data, queued_cmd, nr);
> +		free_cmds(queued_cmd, nr);
> +	}
> +
> +	free(queued_cmd);
> +	strbuf_release(&input);
> +}
> +
>   static int batch_objects(struct batch_options *opt)
>   {
>   	struct strbuf input = STRBUF_INIT;
> @@ -595,6 +728,11 @@ static int batch_objects(struct batch_options *opt)
>   	save_warning = warn_on_object_refname_ambiguity;
>   	warn_on_object_refname_ambiguity = 0;
>   
> +	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
> +		batch_objects_command(opt, &output, &data);
> +		goto cleanup;
> +	}
> +
>   	while (strbuf_getline(&input, stdin) != EOF) {
>   		if (data.split_on_whitespace) {
>   			/*
> @@ -613,6 +751,7 @@ static int batch_objects(struct batch_options *opt)
>   		batch_one_object(input.buf, &output, opt, &data);
>   	}
>   
> + cleanup:
>   	strbuf_release(&input);
>   	strbuf_release(&output);
>   	warn_on_object_refname_ambiguity = save_warning;
> @@ -645,6 +784,8 @@ static int batch_option_callback(const struct option *opt,
>   		bo->batch_mode = BATCH_MODE_CONTENTS;
>   	else if (!strcmp(opt->long_name, "batch-check"))
>   		bo->batch_mode = BATCH_MODE_INFO;
> +	else if (!strcmp(opt->long_name, "batch-command"))
> +		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
>   	else
>   		BUG("%s given to batch-option-callback", opt->long_name);
>   
> @@ -666,7 +807,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   		N_("git cat-file <type> <object>"),
>   		N_("git cat-file (-e | -p) <object>"),
>   		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
> -		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
> +		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
>   		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
>   		   "             [--textconv | --filters]"),
>   		N_("git cat-file (--textconv | --filters)\n"
> @@ -695,6 +836,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   			N_("like --batch, but don't emit <contents>"),
>   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>   			batch_option_callback),
> +		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
> +			N_("read commands from stdin"),
> +			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
> +			batch_option_callback),
>   		OPT_CMDMODE(0, "batch-all-objects", &opt,
>   			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
>   		/* Batch-specific options */
> diff --git a/t/README b/t/README
> index f48e0542cdc..bcd813b0c59 100644
> --- a/t/README
> +++ b/t/README
> @@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
>   sure the expected tests are executed and not silently skipped when their
>   dependency breaks or is simply not present in a new environment.
>   
> +GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
> +--batch-command from flushing to output on exit.

I don't think you need to document this here. Looking at the other 
variables this is a list of things one can set to change the behavior of 
the tests when they are run. GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT is not in 
that category - we don't want anyone setting it when they run the tests, 
it's just an implementation detail.

> +
>   Naming Tests
>   ------------
>   
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 2d52851dadc..74f0e36b69e 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -182,6 +182,24 @@ $content"
>   	test_cmp expect actual
>       '
>   
> +    for opt in --buffer --no-buffer
> +    do
> +	test -z "$content" ||
> +		test_expect_success "--batch-command $opt output of $type content is correct" '
> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
> +		test_cmp expect actual
> +	'
> +
> +	test_expect_success "--batch-command $opt output of $type info is correct" '
> +		echo "$sha1 $type $size" >expect &&
> +		test_write_lines "info $sha1" \
> +		| git cat-file --batch-command $opt >actual &&
> +		test_cmp expect actual
> +	'
> +    done
> +
>       test_expect_success "custom --batch-check format" '
>   	echo "$type $sha1" >expect &&
>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
> @@ -229,6 +247,22 @@ test_expect_success "setup" '
>   
>   run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>   
> +test_expect_success '--batch-command --buffer with flush for blob info' '
> +	echo "$hello_sha1 blob $hello_size" >expect &&
> +	test_write_lines "info $hello_sha1" "flush" | \

You don't need a '\' after a '|', however it might be better to use the 
style from the tests above where the '|' is on the beginning of the next 
line.

> +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
> +	git cat-file --batch-command --buffer >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--batch-command --buffer without flush for blob info' '
> +	touch output &&
> +	test_write_lines "info $hello_sha1" | \
> +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
> +	git cat-file --batch-command --buffer >>output &&
> +	test_must_be_empty output
> +'
> +
>   test_expect_success '--batch-check without %(rest) considers whole line' '
>   	echo "$hello_sha1 blob $hello_size" >expect &&
>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
> @@ -272,7 +306,7 @@ test_expect_success \
>       "Reach a blob from a tag pointing to it" \
>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>   
> -for batch in batch batch-check
> +for batch in batch batch-check batch-command
>   do
>       for opt in t s e p
>       do
> @@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>   '
>   
> +test_expect_success '--batch-command with multiple info calls gives correct format' '
> +	cat >expect <<-EOF &&
> +	$hello_sha1 blob $hello_size
> +	$tree_sha1 tree $tree_size
> +	$commit_sha1 commit $commit_size
> +	$tag_sha1 tag $tag_size
> +	deadbeef missing
> +	EOF
> +
> +	test_write_lines "info $hello_sha1"\
> +	"info $tree_sha1"\
> +	"info $commit_sha1"\
> +	"info $tag_sha1"\
> +	"info deadbeef" | git cat-file --batch-command --buffer >actual &&

This is quite noisy with all the " and \, using a here document instead 
would match our usual style.

> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--batch-command with multiple command calls gives correct format' '
> +	remove_timestamp >expect <<-EOF &&
> +	$hello_sha1 blob $hello_size
> +	$hello_content
> +	$commit_sha1 commit $commit_size
> +	$commit_content
> +	$tag_sha1 tag $tag_size
> +	$tag_content
> +	deadbeef missing
> +	EOF
> +
> +	test_write_lines "contents $hello_sha1"\
> +	"contents $commit_sha1"\
> +	"contents $tag_sha1"\
> +	"contents deadbeef"\
> +	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&

This loses the exit code of the command we're trying to test, it would 
be better to have
	git cat-file ... >actual-raw &&
	remove_timestamp <actual-raw >actual

> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'setup blobs which are likely to delta' '
>   	test-tool genrandom foo 10240 >foo &&
>   	{ cat foo && echo plus; } >foo-plus &&
> @@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>   	echo "$orig commit $orig_size" >expect &&
>   	test_cmp expect actual
>   '
> +test_expect_success 'batch-command empty command' '
> +	echo "" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*empty command in input.*" err
> +'
> +
> +test_expect_success 'batch-command whitespace before command' '
> +	echo " info deadbeef" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*whitespace before command.*" err
> +'
> +
> +test_expect_success 'batch-command unknown command' '
> +	echo unknown_command >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*unknown command.*" err
> +'
> +
> +test_expect_success 'batch-command missing arguments' '
> +	echo "info" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*info requires arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush with arguments' '
> +	echo "flush arg" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
> +	grep "^fatal:.*flush takes no arguments.*" err
> +'
> +
> +test_expect_success 'batch-command flush without --buffer' '
> +	echo "flush" >cmd &&
> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
> +	grep "^fatal:.*flush is only for --buffer mode.*" err
> +'
>   
>   test_done



* Re: [PATCH v9 4/4] cat-file: add --batch-command mode
  2022-02-18 11:26                   ` Phillip Wood
@ 2022-02-18 16:53                     ` John Cai
  2022-02-18 17:32                       ` Junio C Hamano
  2022-02-18 17:23                     ` Junio C Hamano
  1 sibling, 1 reply; 97+ messages in thread
From: John Cai @ 2022-02-18 16:53 UTC (permalink / raw)
  To: phillip.wood
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme,
	gitster, Eric Sunshine, Jonathan Tan, Christian Couder

Hi Phillip,

On 18 Feb 2022, at 6:26, Phillip Wood wrote:

> Hi John
>
> This is looking good. I think the only thing that is missing (and which I should have realized earlier) is that there are no tests for valid or invalid format arguments to --batch-command. I haven't checked, but there must be some other tests in t1006 that we can piggyback on to add that. I've left some stylistic comments below, but I don't feel strongly about them apart from the README comment, so please don't feel obliged to act on them; it's looking pretty good as is.
>
> Best Wishes
>
> Phillip
>
> On 16/02/2022 20:59, John Cai via GitGitGadget wrote:
>> From: John Cai <johncai86@gmail.com>
>>
>> Add a new flag --batch-command that accepts commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>>
>> At GitLab, we use a pair of long running cat-file processes when
>> accessing object content. One for iterating over object metadata with
>> --batch-check, and the other to grab object contents with --batch.
>>
>> However, if we had --batch-command, we wouldn't need to keep both
>> processes around, and instead just have one --batch-command process
>> where we can flip between getting object info, and getting object
>> contents. Since we have a pair of cat-file processes per repository,
>> this means we can get rid of roughly half of long lived git cat-file
>> processes. Given there are many repositories being accessed at any given
>> time, this can lead to huge savings.
>>
>> git cat-file --batch-command
>>
>> will enter an interactive command mode whereby the user can enter in
>> commands and their arguments that get queued in memory:
>>
>> <command1> [arg1] [arg2] LF
>> <command2> [arg1] [arg2] LF
>>
>> When --buffer mode is used, commands will be queued in memory until a
>> flush command is issued that executes them:
>>
>> flush LF
>>
>> The reason for a flush command is that when a consumer process (A)
>> talks to a git cat-file process (B) and interactively writes to and
>> reads from it in --buffer mode, (A) needs to be able to control when
>> the buffer is flushed to stdout.
>>
>> Currently, from (A)'s perspective, the only way is to either
>>
>> 1. kill (B)'s process
>> 2. send an invalid object to stdin.
>>
>> 1. is not ideal from a performance perspective as it will require
>> spawning a new cat-file process each time, and 2. is hacky and not a
>> good long term solution.
>>
>> With this mechanism of queueing up commands and letting (A) issue a
>> flush command, process (A) can control when the buffer is flushed and
>> can guarantee it will receive all of the output when in --buffer mode.
>> --batch-command also will not allow (B) to flush to stdout until a flush
>> is received.
>>
>> This patch adds the basic structure for adding commands, which can be
>> extended in the future. It also adds the following
>> two commands (on top of the flush command):
>>
>> contents <object> LF
>> info <object> LF
>>
>> The contents command takes an <object> argument and prints out the object
>> contents.
>>
>> The info command takes an <object> argument and prints out the object
>> metadata.
>>
>> These can be used in the following way with --buffer:
>>
>> info <object> LF
>> contents <object> LF
>> contents <object> LF
>> info <object> LF
>> flush LF
>> info <object> LF
>> flush LF
>>
>> When used without --buffer:
>>
>> info <object> LF
>> contents <object> LF
>> contents <object> LF
>> info <object> LF
>> info <object> LF
>>
>> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Signed-off-by: John Cai <johncai86@gmail.com>
>> ---
>>   Documentation/git-cat-file.txt |  42 +++++++++-
>>   builtin/cat-file.c             | 147 ++++++++++++++++++++++++++++++++-
>>   t/README                       |   3 +
>>   t/t1006-cat-file.sh            | 107 +++++++++++++++++++++++-
>>   4 files changed, 293 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>> index bef76f4dd06..70c5b4f12d1 100644
>> --- a/Documentation/git-cat-file.txt
>> +++ b/Documentation/git-cat-file.txt
>> @@ -96,6 +96,33 @@ OPTIONS
>>   	need to specify the path, separated by whitespace.  See the
>>   	section `BATCH OUTPUT` below for details.
>>  +--batch-command::
>> +--batch-command=<format>::
>> +	Enter a command mode that reads commands and arguments from stdin. May
>> +	only be combined with `--buffer`, `--textconv` or `--filters`. In the
>> +	case of `--textconv` or `--filters`, the input lines also need to specify
>> +	the path, separated by whitespace. See the section `BATCH OUTPUT` below
>> +	for details.
>> ++
>> +`--batch-command` recognizes the following commands:
>> ++
>> +--
>> +contents <object>::
>> +	Print object contents for object reference `<object>`. This corresponds to
>> +	the output of `--batch`.
>> +
>> +info <object>::
>> +	Print object info for object reference `<object>`. This corresponds to the
>> +	output of `--batch-check`.
>> +
>> +flush::
>> +	Used with `--buffer` to execute all preceding commands that were issued
>> +	since the beginning or since the last flush was issued. When `--buffer`
>> +	is used, no output will come until a `flush` is issued. When `--buffer`
>> +	is not used, commands are flushed each time without issuing `flush`.
>> +--
>> ++
>> +
>>   --batch-all-objects::
>>   	Instead of reading a list of objects on stdin, perform the
>>   	requested batch operation on all objects in the repository and
>> @@ -110,7 +137,7 @@ OPTIONS
>>   	that a process can interactively read and write from
>>   	`cat-file`. With this option, the output uses normal stdio
>>   	buffering; this is much more efficient when invoking
>> -	`--batch-check` on a large number of objects.
>> +	`--batch-check` or `--batch-command` on a large number of objects.
>>    --unordered::
>>   	When `--batch-all-objects` is in use, visit objects in an
>> @@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
>>   the whole line is considered as an object, as if it were fed to
>>   linkgit:git-rev-parse[1].
>>  +When `--batch-command` is given, `cat-file` will read commands from stdin,
>> +one per line, and print information based on the command given. With
>> +`--batch-command`, the `info` command followed by an object will print
>> +information about the object the same way `--batch-check` would, and the
>> +`contents` command followed by an object prints contents in the same way
>> +`--batch` would.
>> +
>>   You can specify the information shown for each object by using a custom
>>   `<format>`. The `<format>` is copied literally to stdout for each
>>   object, with placeholders of the form `%(atom)` expanded, followed by a
>> @@ -237,9 +271,9 @@ newline. The available atoms are:
>>   If no format is specified, the default format is `%(objectname)
>>   %(objecttype) %(objectsize)`.
>>  -If `--batch` is specified, the object information is followed by the
>> -object contents (consisting of `%(objectsize)` bytes), followed by a
>> -newline.
>> +If `--batch` is specified, or if `--batch-command` is used with the `contents`
>> +command, the object information is followed by the object contents (consisting
>> +of `%(objectsize)` bytes), followed by a newline.
>>    For example, `--batch` without a custom format would produce:
>>  diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index 5e38af82af1..3dc960e9f85 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -20,6 +20,7 @@
>>   enum batch_mode {
>>   	BATCH_MODE_CONTENTS,
>>   	BATCH_MODE_INFO,
>> +	BATCH_MODE_QUEUE_AND_DISPATCH,
>>   };
>>    struct batch_options {
>> @@ -513,6 +514,138 @@ static int batch_unordered_packed(const struct object_id *oid,
>>   				      data);
>>   }
>>  +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
>> +			       struct strbuf *, struct expand_data *);
>> +
>> +struct queued_cmd {
>> +	parse_cmd_fn_t fn;
>> +	char *line;
>> +};
>> +
>> +static void parse_cmd_contents(struct batch_options *opt,
>> +			     const char *line,
>> +			     struct strbuf *output,
>> +			     struct expand_data *data)
>> +{
>> +	opt->batch_mode = BATCH_MODE_CONTENTS;
>> +	batch_one_object(line, output, opt, data);
>> +}
>> +
>> +static void parse_cmd_info(struct batch_options *opt,
>> +			   const char *line,
>> +			   struct strbuf *output,
>> +			   struct expand_data *data)
>> +{
>> +	opt->batch_mode = BATCH_MODE_INFO;
>> +	batch_one_object(line, output, opt, data);
>> +}
>> +
>> +static void dispatch_calls(struct batch_options *opt,
>> +		struct strbuf *output,
>> +		struct expand_data *data,
>> +		struct queued_cmd *cmd,
>> +		int nr)
>> +{
>> +	int i;
>> +
>> +	if (!opt->buffer_output)
>> +		die(_("flush is only for --buffer mode"));
>> +
>> +	for (i = 0; i < nr; i++)
>> +		cmd[i].fn(opt, cmd[i].line, output, data);
>> +
>> +	fflush(stdout);
>> +}
>> +
>> +static void free_cmds(struct queued_cmd *cmd, int nr)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < nr; i++)
>> +		FREE_AND_NULL(cmd[i].line);
>> +}
>> +
>> +
>> +static const struct parse_cmd {
>> +	const char *name;
>> +	parse_cmd_fn_t fn;
>> +	unsigned takes_args;
>> +} commands[] = {
>> +	{ "contents", parse_cmd_contents, 1},
>> +	{ "info", parse_cmd_info, 1},
>> +	{ "flush", NULL, 0},
>> +};
>> +
>> +static void batch_objects_command(struct batch_options *opt,
>> +				    struct strbuf *output,
>> +				    struct expand_data *data)
>> +{
>> +	struct strbuf input = STRBUF_INIT;
>> +	struct queued_cmd *queued_cmd = NULL;
>> +	size_t alloc = 0, nr = 0;
>> +
>> +	while (!strbuf_getline(&input, stdin)) {
>> +		int i;
>> +		const struct parse_cmd *cmd = NULL;
>> +		const char *p = NULL, *cmd_end;
>> +		struct queued_cmd call = {0};
>> +
>> +		if (!input.len)
>> +			die(_("empty command in input"));
>> +		if (isspace(*input.buf))
>> +			die(_("whitespace before command: '%s'"), input.buf);
>> +
>> +		for (i = 0; i < ARRAY_SIZE(commands); i++) {
>> +			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
>> +				continue;
>> +
>> +			cmd = &commands[i];
>> +			if (cmd->takes_args) {
>> +				if (*cmd_end != ' ')
>> +					die(_("%s requires arguments"),
>> +					    commands[i].name);
>> +
>> +				p = cmd_end + 1;
>> +			} else if (*cmd_end) {
>> +				die(_("%s takes no arguments"),
>> +				    commands[i].name);
>> +			}
>> +
>> +			break;
>> +		}
>> +
>> +		if (!cmd)
>> +			die(_("unknown command: '%s'"), input.buf);
>> +
>> +		if (!strcmp(cmd->name, "flush")) {
>> +			dispatch_calls(opt, output, data, queued_cmd, nr);
>> +			free_cmds(queued_cmd, nr);
>> +			nr = 0;
>
> It'd be nice if free_cmds() zeroed nr for us rather than having to remember to do it separately as the two are intimately linked.

This does feel cleaner. An earlier version did this inside of dispatch_calls,
and the feedback was that it wasn't clean. But now that free_cmds prepares the
queued_cmd array for reuse, it may make sense to do it there. Honestly, after
the back and forth around this, I'm not too sure what the best thing to do
stylistically would be.

>
>> +			continue;
>> +		}
>> +
>> +		if (!opt->buffer_output) {
>> +			cmd->fn(opt, p, output, data);
>> +			continue;
>> +		}
>> +
>> +		ALLOC_GROW(queued_cmd, nr + 1, alloc);
>> +		call.fn = cmd->fn;
>> +		call.line = xstrdup_or_null(p);
>> +		queued_cmd[nr++] = call;
>
> I found this a bit confusing to follow with all the "continue"s. For me it would be easier to follow if this was written as
>     if (!strcmp(cmd->name, "flush")) {
>     	...
>     } else if (!opt->buffer_output) {
>     	...
>     } else {
>     	ALLOC_GROW ...
>     }

Good point. I think this would be easier to follow.
>
>> +	}
>> +
>> +	if (opt->buffer_output &&
>> +	    nr &&
>> +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
>> +		dispatch_calls(opt, output, data, queued_cmd, nr);
>> +		free_cmds(queued_cmd, nr);
>> +	}
>> +
>> +	free(queued_cmd);
>> +	strbuf_release(&input);
>> +}
>> +
>>   static int batch_objects(struct batch_options *opt)
>>   {
>>   	struct strbuf input = STRBUF_INIT;
>> @@ -595,6 +728,11 @@ static int batch_objects(struct batch_options *opt)
>>   	save_warning = warn_on_object_refname_ambiguity;
>>   	warn_on_object_refname_ambiguity = 0;
>>  +	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
>> +		batch_objects_command(opt, &output, &data);
>> +		goto cleanup;
>> +	}
>> +
>>   	while (strbuf_getline(&input, stdin) != EOF) {
>>   		if (data.split_on_whitespace) {
>>   			/*
>> @@ -613,6 +751,7 @@ static int batch_objects(struct batch_options *opt)
>>   		batch_one_object(input.buf, &output, opt, &data);
>>   	}
>>  + cleanup:
>>   	strbuf_release(&input);
>>   	strbuf_release(&output);
>>   	warn_on_object_refname_ambiguity = save_warning;
>> @@ -645,6 +784,8 @@ static int batch_option_callback(const struct option *opt,
>>   		bo->batch_mode = BATCH_MODE_CONTENTS;
>>   	else if (!strcmp(opt->long_name, "batch-check"))
>>   		bo->batch_mode = BATCH_MODE_INFO;
>> +	else if (!strcmp(opt->long_name, "batch-command"))
>> +		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
>>   	else
>>   		BUG("%s given to batch-option-callback", opt->long_name);
>>  @@ -666,7 +807,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>>   		N_("git cat-file <type> <object>"),
>>   		N_("git cat-file (-e | -p) <object>"),
>>   		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
>> -		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
>> +		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
>>   		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
>>   		   "             [--textconv | --filters]"),
>>   		N_("git cat-file (--textconv | --filters)\n"
>> @@ -695,6 +836,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>>   			N_("like --batch, but don't emit <contents>"),
>>   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>>   			batch_option_callback),
>> +		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
>> +			N_("read commands from stdin"),
>> +			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>> +			batch_option_callback),
>>   		OPT_CMDMODE(0, "batch-all-objects", &opt,
>>   			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
>>   		/* Batch-specific options */
>> diff --git a/t/README b/t/README
>> index f48e0542cdc..bcd813b0c59 100644
>> --- a/t/README
>> +++ b/t/README
>> @@ -472,6 +472,9 @@ a test and then fails then the whole test run will abort. This can help to make
>>   sure the expected tests are executed and not silently skipped when their
>>   dependency breaks or is simply not present in a new environment.
>>  +GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
>> +--batch-command from flushing to output on exit.
>
> I don't think you need to document this here. Looking at the other variables this is a list of things one can set to change the behavior of the tests when they are run. GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT is not in that category - we don't want anyone setting it when they run the tests, it's just an implementation detail.

Okay, sounds good.

>
>> +
>>   Naming Tests
>>   ------------
>>  diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
>> index 2d52851dadc..74f0e36b69e 100755
>> --- a/t/t1006-cat-file.sh
>> +++ b/t/t1006-cat-file.sh
>> @@ -182,6 +182,24 @@ $content"
>>   	test_cmp expect actual
>>       '
>>  +    for opt in --buffer --no-buffer
>> +    do
>> +	test -z "$content" ||
>> +		test_expect_success "--batch-command $opt output of $type content is correct" '
>> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
>> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
>> +		| git cat-file --batch-command $opt)" $no_ts >actual &&
>> +		test_cmp expect actual
>> +	'
>> +
>> +	test_expect_success "--batch-command $opt output of $type info is correct" '
>> +		echo "$sha1 $type $size" >expect &&
>> +		test_write_lines "info $sha1" \
>> +		| git cat-file --batch-command $opt >actual &&
>> +		test_cmp expect actual
>> +	'
>> +    done
>> +
>>       test_expect_success "custom --batch-check format" '
>>   	echo "$type $sha1" >expect &&
>>   	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
>> @@ -229,6 +247,22 @@ test_expect_success "setup" '
>>    run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
>>  +test_expect_success '--batch-command --buffer with flush for blob info' '
>> +	echo "$hello_sha1 blob $hello_size" >expect &&
>> +	test_write_lines "info $hello_sha1" "flush" | \
>
> You don't need a '\' after a '|', however it might be better to use the style from the tests above where the '|' is on the beginning of the next line.

Good to know. I'll adjust this.
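
For reference, a small standalone sketch of the two continuation styles. It
is not taken from t1006; the input lines and the use of tr are only
placeholders so that it runs outside the test suite. As far as I understand
the shell grammar, a line ending in '|' is already incomplete, so the
backslash adds nothing there:

```shell
# Style 1: "|" ends the line; the shell keeps reading, so no "\" is needed.
trailing=$(printf "%s\n" "info abc" "flush" |
    tr a-z A-Z)

# Style 2: "\" continues the line and "|" starts the next one
# (the style used by the earlier tests in this patch).
leading=$(printf "%s\n" "info abc" "flush" \
    | tr a-z A-Z)

# Both pipelines are parsed the same way and give the same output.
test "$trailing" = "$leading" && echo "both styles behave the same"
```
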
>
>> +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
>> +	git cat-file --batch-command --buffer >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success '--batch-command --buffer without flush for blob info' '
>> +	touch output &&
>> +	test_write_lines "info $hello_sha1" | \
>> +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
>> +	git cat-file --batch-command --buffer >>output &&
>> +	test_must_be_empty output
>> +'
>> +
>>   test_expect_success '--batch-check without %(rest) considers whole line' '
>>   	echo "$hello_sha1 blob $hello_size" >expect &&
>>   	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
>> @@ -272,7 +306,7 @@ test_expect_success \
>>       "Reach a blob from a tag pointing to it" \
>>       "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
>>  -for batch in batch batch-check
>> +for batch in batch batch-check batch-command
>>   do
>>       for opt in t s e p
>>       do
>> @@ -378,6 +412,42 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>>       "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>>   '
>>  +test_expect_success '--batch-command with multiple info calls gives correct format' '
>> +	cat >expect <<-EOF &&
>> +	$hello_sha1 blob $hello_size
>> +	$tree_sha1 tree $tree_size
>> +	$commit_sha1 commit $commit_size
>> +	$tag_sha1 tag $tag_size
>> +	deadbeef missing
>> +	EOF
>> +
>> +	test_write_lines "info $hello_sha1"\
>> +	"info $tree_sha1"\
>> +	"info $commit_sha1"\
>> +	"info $tag_sha1"\
>> +	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
>
> This is quite noisy with all the " and \, using a here document instead would match our usual style.

That's a good suggestion. I can change this into a here doc.
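
A rough sketch of the equivalence, runnable outside the test suite. The
write_lines function is only a stand-in I am assuming behaves like
test_write_lines (one argument per output line), and the object names are
placeholders rather than real object IDs:

```shell
# Hypothetical stand-in for the test suite's test_write_lines helper.
write_lines () {
    printf "%s\n" "$@"
}

# Placeholder object names, for illustration only.
hello_sha1=1111111
tree_sha1=2222222

# Current style: one quoted argument per line, joined with backslashes.
from_args=$(write_lines "info $hello_sha1" \
    "info $tree_sha1" \
    "info deadbeef")

# Here-document style: same lines, no quotes or backslashes.
from_heredoc=$(cat <<EOF
info $hello_sha1
info $tree_sha1
info deadbeef
EOF
)

test "$from_args" = "$from_heredoc" && echo "same input either way"
```

In the actual test the here document would be fed straight to git cat-file
on stdin instead of being captured in a variable.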

>
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success '--batch-command with multiple command calls gives correct format' '
>> +	remove_timestamp >expect <<-EOF &&
>> +	$hello_sha1 blob $hello_size
>> +	$hello_content
>> +	$commit_sha1 commit $commit_size
>> +	$commit_content
>> +	$tag_sha1 tag $tag_size
>> +	$tag_content
>> +	deadbeef missing
>> +	EOF
>> +
>> +	test_write_lines "contents $hello_sha1"\
>> +	"contents $commit_sha1"\
>> +	"contents $tag_sha1"\
>> +	"contents deadbeef"\
>> +	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
>
> This loses the exit code of the command we're trying to test, it would be better to have
>     git cat-file ... >actual-raw &&
>     remove_timestamp <actual-raw >actual
>

I hadn't considered this effect. Thanks for pointing that out!
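
The exit-code point above can be seen in a small sketch. A pipeline reports
only the status of its last command, so a failure on the left-hand side is
hidden. `failing_producer` is a hypothetical stand-in for the command under
test; it is not part of the patch.

```shell
# Hypothetical stand-in for the command under test: prints output, then fails.
failing_producer () {
	echo "some output"
	return 1
}

# Piped: the pipeline's status is that of `cat`, so the failure is hidden.
failing_producer | cat >/dev/null
piped_status=$?

# Run on its own: the producer's status is visible to the caller.
direct_status=0
failing_producer >/dev/null || direct_status=$?

echo "piped=$piped_status direct=$direct_status"
```

This assumes the shell is not running with a `pipefail`-style option. Writing
to an intermediate file and post-processing it in a second step, as suggested
above, lets the `&&` chain see the first command's failure.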

>> +	test_cmp expect actual
>> +'
>> +
>>   test_expect_success 'setup blobs which are likely to delta' '
>>   	test-tool genrandom foo 10240 >foo &&
>>   	{ cat foo && echo plus; } >foo-plus &&
>> @@ -968,5 +1038,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
>>   	echo "$orig commit $orig_size" >expect &&
>>   	test_cmp expect actual
>>   '
>> +test_expect_success 'batch-command empty command' '
>> +	echo "" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*empty command in input.*" err
>> +'
>> +
>> +test_expect_success 'batch-command whitespace before command' '
>> +	echo " info deadbeef" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*whitespace before command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command unknown command' '
>> +	echo unknown_command >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*unknown command.*" err
>> +'
>> +
>> +test_expect_success 'batch-command missing arguments' '
>> +	echo "info" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*info requires arguments.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush with arguments' '
>> +	echo "flush arg" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
>> +	grep "^fatal:.*flush takes no arguments.*" err
>> +'
>> +
>> +test_expect_success 'batch-command flush without --buffer' '
>> +	echo "flush" >cmd &&
>> +	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
>> +	grep "^fatal:.*flush is only for --buffer mode.*" err
>> +'
>>    test_done


* Re: [PATCH v9 4/4] cat-file: add --batch-command mode
  2022-02-18 11:26                   ` Phillip Wood
  2022-02-18 16:53                     ` John Cai
@ 2022-02-18 17:23                     ` Junio C Hamano
  1 sibling, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-18 17:23 UTC (permalink / raw)
  To: Phillip Wood
  Cc: John Cai via GitGitGadget, git, me, avarab, e, bagasdotme,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

Phillip Wood <phillip.wood123@gmail.com> writes:

> This is looking good. I think the only thing that is missing (and
> which I should have realized earlier) is that there are no tests for
> valid or invalid format arguments to --batch-command. I haven't
> checked but there must be some other tests in the t1006 that we can
> piggy back on to add that. I've left some stylistic comments below but
> I don't feel strongly about them apart from the README comment so
> please don't feel obliged to act on them, it's looking pretty good as
> is.

Yes, I agree that this is mostly good.  I also agree with all the
points you raised in your review, including the stylistic ones.


* Re: [PATCH v9 4/4] cat-file: add --batch-command mode
  2022-02-18 16:53                     ` John Cai
@ 2022-02-18 17:32                       ` Junio C Hamano
  0 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-18 17:32 UTC (permalink / raw)
  To: John Cai
  Cc: phillip.wood, John Cai via GitGitGadget, git, me, avarab, e,
	bagasdotme, Eric Sunshine, Jonathan Tan, Christian Couder

John Cai <johncai86@gmail.com> writes:

>>> +		if (!strcmp(cmd->name, "flush")) {
>>> +			dispatch_calls(opt, output, data, queued_cmd, nr);
>>> +			free_cmds(queued_cmd, nr);
>>> +			nr = 0;
>>
>> It'd be nice if free_cmds() zeroed nr for us rather than having to remember to do it separately as the two are intimately linked.
>
> This does feel cleaner. Before there was a version where I did this inside of
> dispatch_calls and there was feedback that this wasn't clean. But now that
> free_cmds prepares the queued_cmd array for reuse, then it may make sense to do
> it inside. Though honestly from the back and forth around this, I'm not too sure
> what the best thing to do stylistically would be.

I am not sure about style, but at the semantic level, free_cmds()
that "frees" the queued_cmd by releasing the resources it holds and
resets its counter to zero would be a more complete "does one thing
and one thing well" helper function.

>>>  +test_expect_success '--batch-command --buffer with flush for blob info' '
>>> +	echo "$hello_sha1 blob $hello_size" >expect &&
>>> +	test_write_lines "info $hello_sha1" "flush" | \
>>
>> You don't need a '\' after a '|', however it might be better to use the style from the tests above where the '|' is on the beginning of the next line.

Please don't do

	producer \
	| consumer

instead, write

	producer |
	consumer

This is two bytes shorter and is far more common, judging from the
output of

    $ git grep -e '^[   ]*| [A-Za-z]' t

i.e. indent with whitespace or tab, pipe, space and alpha (i.e. the
beginning of the command, possibly a single-shot environment
assignment).
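
As a minimal sketch of the preferred style (the commands here are arbitrary
placeholders, not taken from the test script): a trailing '|' already tells
the shell that the command continues on the next line, so no backslash is
needed.

```shell
# Each line ends with '|', so the shell keeps reading the next line
# as part of the same pipeline; no '\' continuation is required.
result=$(printf '%s\n' b a c |
	sort |
	head -n 1)

echo "$result"
```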


* [PATCH v10 0/4] Add cat-file --batch-command flag
  2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                   ` (3 preceding siblings ...)
  2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-18 18:23                 ` John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
                                     ` (5 more replies)
  4 siblings, 6 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-18 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai

The feature proposal of adding a command interface to cat-file was first
discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
before moving forward with a new flag. An RFC was created [C] and the idea
was discussed more thoroughly, and overall it seemed like it was headed in
the right direction.

This patch series consolidates the feedback from these different threads.

This patch series has four parts:

 1. preparation patch to rename a variable
 2. adding an enum to keep track of batch modes
 3. add a remove_timestamp() helper that takes stdin and removes timestamps
 4. logic to handle --batch-command flag, adding contents, info, flush
    commands

Changes since v9

 * add test to exercise format for batch-command
 * minor semantic improvements
 * removed README entry for environment variable used in test

Changes since v8

 * have caller free line through a helper function for the sake of
   separation of concerns

Changes since v7

 * revert back to having caller set nr to 0
 * add comment before dispatch_calls to clarify usage of helper
 * rename prefix->name

Changes since v6 (thanks to Eric's feedback)

 * allow command parsing logic to handle the case of flush as well
 * fixed documentation by adding --batch-command to the synopsis and
   adjusting tick marks
 * set nr=0 within helper function

Changes since v5

 * replaced flush tests that used fifo pipes to using a GIT_TEST_ env
   variable to control whether or not --batch-command flushes on exit.
 * added remove_timestamp helper in tests.
 * added documentation to show format can be used with --batch-command

Changes since v4

 * added Phillip's suggested test for testing flush. This should have
   addressed the flaky test that was hanging. I tested it on my side and
   wasn't able to reproduce the deadlock.
 * plugged some holes in the logic that parsed the command and arguments,
   thanks to Eric's feedback
 * fixed verbiage in commit messages per Christian's feedback
 * clarified places in documentation that should mention --batch-command per
   Eric's feedback

Changes since v3 (thanks to Junio's feedback):

 * added cascading logic in batch_options_callback()
 * free memory for queued call input lines
 * do not throw error when flushing an empty queue
 * renamed cmds array to singular queued_cmd
 * fixed flaky test that failed --stress

Changes since v2:

 * added enum to keep track of which batch mode we are in (thanks to Junio's
   feedback)
 * fixed array allocation logic (thanks to Junio's feedback)
 * added code to flush commands when --batch-command receives an EOF and
   exits (thanks to Phillip's feedback)
 * fixed docs formatting (thanks to Jonathan's feedback)

Changes since v1:

 * simplified "session" mechanism. "flush" will execute all commands that
   were entered in since the last "flush" when in --buffer mode
 * when not in --buffer mode, each command is executed and flushed each time
 * rename cmdmode to transform_mode instead of just mode
 * simplified command parsing logic
 * changed rename of cmdmode to transform_mode
 * clarified verbiage in commit messages

A. https://lore.kernel.org/git/xmqqk0hitnkc.fsf@gitster.g/ B.
https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/ C.
https://lore.kernel.org/git/e75ba9ea-fdda-6e9f-4dd6-24190117d93b@gmail.com/

John Cai (4):
  cat-file: rename cmdmode to transform_mode
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: add remove_timestamp helper
  cat-file: add --batch-command mode

 Documentation/git-cat-file.txt |  42 +++++++-
 builtin/cat-file.c             | 178 ++++++++++++++++++++++++++++++---
 t/t1006-cat-file.sh            | 135 +++++++++++++++++++++++--
 3 files changed, 333 insertions(+), 22 deletions(-)


base-commit: b80121027d1247a0754b3cc46897fee75c050b44
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1212%2Fjohn-cai%2Fjc-cat-file-batch-command-v10
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1212/john-cai/jc-cat-file-batch-command-v10
Pull-Request: https://github.com/git/git/pull/1212

Range-diff vs v9:

 1:  76d6e4fe517 = 1:  76d6e4fe517 cat-file: rename cmdmode to transform_mode
 2:  12084a335cb = 2:  12084a335cb cat-file: introduce batch_mode enum to replace print_contents
 3:  bf74b6cd75b = 3:  bf74b6cd75b cat-file: add remove_timestamp helper
 4:  dbe194f8a85 ! 4:  c6ea1214062 cat-file: add --batch-command mode
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +	fflush(stdout);
      +}
      +
     -+static void free_cmds(struct queued_cmd *cmd, int nr)
     ++static void free_cmds(struct queued_cmd *cmd, size_t *nr)
      +{
     -+	int i;
     ++	size_t i;
      +
     -+	for (i = 0; i < nr; i++)
     ++	for (i = 0; i < *nr; i++)
      +		FREE_AND_NULL(cmd[i].line);
     ++
     ++	*nr = 0;
      +}
      +
      +
     @@ builtin/cat-file.c: static int batch_unordered_packed(const struct object_id *oi
      +
      +		if (!strcmp(cmd->name, "flush")) {
      +			dispatch_calls(opt, output, data, queued_cmd, nr);
     -+			free_cmds(queued_cmd, nr);
     -+			nr = 0;
     -+			continue;
     -+		}
     -+
     -+		if (!opt->buffer_output) {
     ++			free_cmds(queued_cmd, &nr);
     ++		} else if (!opt->buffer_output) {
      +			cmd->fn(opt, p, output, data);
     -+			continue;
     ++		} else {
     ++			ALLOC_GROW(queued_cmd, nr + 1, alloc);
     ++			call.fn = cmd->fn;
     ++			call.line = xstrdup_or_null(p);
     ++			queued_cmd[nr++] = call;
      +		}
     -+
     -+		ALLOC_GROW(queued_cmd, nr + 1, alloc);
     -+		call.fn = cmd->fn;
     -+		call.line = xstrdup_or_null(p);
     -+		queued_cmd[nr++] = call;
      +	}
      +
      +	if (opt->buffer_output &&
      +	    nr &&
      +	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
      +		dispatch_calls(opt, output, data, queued_cmd, nr);
     -+		free_cmds(queued_cmd, nr);
     ++		free_cmds(queued_cmd, &nr);
      +	}
      +
      +	free(queued_cmd);
     @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
       			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
       		/* Batch-specific options */
      
     - ## t/README ##
     -@@ t/README: a test and then fails then the whole test run will abort. This can help to make
     - sure the expected tests are executed and not silently skipped when their
     - dependency breaks or is simply not present in a new environment.
     - 
     -+GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=<boolean>, when true will prevent cat-file
     -+--batch-command from flushing to output on exit.
     -+
     - Naming Tests
     - ------------
     - 
     -
       ## t/t1006-cat-file.sh ##
      @@ t/t1006-cat-file.sh: $content"
       	test_cmp expect actual
     @@ t/t1006-cat-file.sh: $content"
      +	test -z "$content" ||
      +		test_expect_success "--batch-command $opt output of $type content is correct" '
      +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
     -+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" \
     -+		| git cat-file --batch-command $opt)" $no_ts >actual &&
     ++		maybe_remove_timestamp "$(test_write_lines "contents $sha1" |
     ++		git cat-file --batch-command $opt)" $no_ts >actual &&
      +		test_cmp expect actual
      +	'
      +
      +	test_expect_success "--batch-command $opt output of $type info is correct" '
      +		echo "$sha1 $type $size" >expect &&
     -+		test_write_lines "info $sha1" \
     -+		| git cat-file --batch-command $opt >actual &&
     ++		test_write_lines "info $sha1" |
     ++		git cat-file --batch-command $opt >actual &&
      +		test_cmp expect actual
      +	'
      +    done
     @@ t/t1006-cat-file.sh: $content"
           test_expect_success "custom --batch-check format" '
       	echo "$type $sha1" >expect &&
       	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
     + 	test_cmp expect actual
     +     '
     + 
     ++    test_expect_success "custom --batch-command format" '
     ++	echo "$type $sha1" >expect &&
     ++	echo "info $sha1" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
     ++	test_cmp expect actual
     ++    '
     ++
     +     test_expect_success '--batch-check with %(rest)' '
     + 	echo "$type this is some extra content" >expect &&
     + 	echo "$sha1    this is some extra content" |
      @@ t/t1006-cat-file.sh: test_expect_success "setup" '
       
       run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
       
      +test_expect_success '--batch-command --buffer with flush for blob info' '
      +	echo "$hello_sha1 blob $hello_size" >expect &&
     -+	test_write_lines "info $hello_sha1" "flush" | \
     ++	test_write_lines "info $hello_sha1" "flush" |
      +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
      +	git cat-file --batch-command --buffer >actual &&
      +	test_cmp expect actual
     @@ t/t1006-cat-file.sh: test_expect_success "setup" '
      +
      +test_expect_success '--batch-command --buffer without flush for blob info' '
      +	touch output &&
     -+	test_write_lines "info $hello_sha1" | \
     ++	test_write_lines "info $hello_sha1" |
      +	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
      +	git cat-file --batch-command --buffer >>output &&
      +	test_must_be_empty output
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +	deadbeef missing
      +	EOF
      +
     -+	test_write_lines "info $hello_sha1"\
     -+	"info $tree_sha1"\
     -+	"info $commit_sha1"\
     -+	"info $tag_sha1"\
     -+	"info deadbeef" | git cat-file --batch-command --buffer >actual &&
     ++	git cat-file --batch-command --buffer >actual <<-EOF &&
     ++	info $hello_sha1
     ++	info $tree_sha1
     ++	info $commit_sha1
     ++	info $tag_sha1
     ++	info deadbeef
     ++	EOF
     ++
      +	test_cmp expect actual
      +'
      +
     @@ t/t1006-cat-file.sh: test_expect_success "--batch-check with multiple sha1s give
      +	deadbeef missing
      +	EOF
      +
     -+	test_write_lines "contents $hello_sha1"\
     -+	"contents $commit_sha1"\
     -+	"contents $tag_sha1"\
     -+	"contents deadbeef"\
     -+	"flush" | git cat-file --batch-command --buffer | remove_timestamp >actual &&
     ++	git cat-file --batch-command --buffer >actual_raw <<-EOF &&
     ++	contents $hello_sha1
     ++	contents $commit_sha1
     ++	contents $tag_sha1
     ++	contents deadbeef
     ++	flush
     ++	EOF
     ++
     ++	remove_timestamp <actual_raw >actual &&
      +	test_cmp expect actual
      +'
      +

-- 
gitgitgadget


* [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
@ 2022-02-18 18:23                   ` John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
                                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-18 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

In the next patch, we will add an enum on the batch_options struct that
indicates which type of batch operation will be used: --batch,
--batch-check and the soon-to-be-added --batch-command that will read
commands from stdin. --batch-command mode might get confused with
the cmdmode flag.

There is value in renaming cmdmode in any case. cmdmode refers to how
the result output of the blob will be transformed, either according to
--filter or --textconv. So transform_mode is a more descriptive name
for the flag.

Rename cmdmode to transform_mode in cat-file.c.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..5f015e71096 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,7 +24,7 @@ struct batch_options {
 	int buffer_output;
 	int all_objects;
 	int unordered;
-	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
 };
 
@@ -302,19 +302,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
-		if (opt->cmdmode) {
+		if (opt->transform_mode) {
 			char *contents;
 			unsigned long size;
 
 			if (!data->rest)
 				die("missing path for '%s'", oid_to_hex(oid));
 
-			if (opt->cmdmode == 'w') {
+			if (opt->transform_mode == 'w') {
 				if (filter_object(data->rest, 0100644, oid,
 						  &contents, &size))
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
+			} else if (opt->transform_mode == 'c') {
 				enum object_type type;
 				if (!textconv_object(the_repository,
 						     data->rest, 0100644, oid,
@@ -326,7 +326,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 					die("could not convert '%s' %s",
 					    oid_to_hex(oid), data->rest);
 			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
+				BUG("invalid transform_mode: %c", opt->transform_mode);
 			batch_write(opt, contents, size);
 			free(contents);
 		} else {
@@ -529,7 +529,7 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&output, opt->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (opt->transform_mode)
 		data.split_on_whitespace = 1;
 
 	/*
@@ -742,7 +742,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
-			batch.cmdmode = opt;
+			batch.transform_mode = opt;
 		else if (opt && opt != 'b')
 			usage_msg_optf(_("'-%c' is incompatible with batch mode"),
 				       usage, options, opt);
-- 
gitgitgadget



* [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
@ 2022-02-18 18:23                   ` John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
                                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-18 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

A future patch introduces a new --batch-command flag. Including --batch
and --batch-check, we will have a total of three batch modes. print_contents
is the only boolean on the batch_options struct used to distinguish
between the different modes. This makes the code harder to read.

To reduce potential confusion, replace print_contents with an enum to
help readability and clarity.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5f015e71096..5e38af82af1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -17,10 +17,15 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 
+enum batch_mode {
+	BATCH_MODE_CONTENTS,
+	BATCH_MODE_INFO,
+};
+
 struct batch_options {
 	int enabled;
 	int follow_symlinks;
-	int print_contents;
+	enum batch_mode batch_mode;
 	int buffer_output;
 	int all_objects;
 	int unordered;
@@ -386,7 +391,7 @@ static void batch_object_write(const char *obj_name,
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
@@ -536,7 +541,7 @@ static int batch_objects(struct batch_options *opt)
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
@@ -635,7 +640,14 @@ static int batch_option_callback(const struct option *opt,
 	}
 
 	bo->enabled = 1;
-	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+	if (!strcmp(opt->long_name, "batch"))
+		bo->batch_mode = BATCH_MODE_CONTENTS;
+	else if (!strcmp(opt->long_name, "batch-check"))
+		bo->batch_mode = BATCH_MODE_INFO;
+	else
+		BUG("%s given to batch-option-callback", opt->long_name);
+
 	bo->format = arg;
 
 	return 0;
-- 
gitgitgadget



* [PATCH v10 3/4] cat-file: add remove_timestamp helper
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
  2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
@ 2022-02-18 18:23                   ` John Cai via GitGitGadget
  2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
  2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
                                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-18 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

maybe_remove_timestamp() takes arguments, but it would be useful to have
a function that reads from stdin and strips the timestamp. This would
allow tests to pipe data into a function to remove timestamps, without
always having to assign a variable first. This is especially helpful
when the data spans multiple lines.

Keep maybe_remove_timestamp() the same, but add a remove_timestamp
helper that reads from stdin.

The tests in the next patch will make use of this.
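
A minimal sketch of how the stdin-reading helper can be used. The sed
expression is the one from this patch; the two header lines are made-up
sample data, for illustration only.

```shell
# Same sed expression as the helper added by this patch.
remove_timestamp () {
	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
}

# Made-up commit header lines; only the trailing "<epoch> <tz>" is stripped.
stripped=$(printf '%s\n' \
	"author A U Thor <author@example.com> 1112911993 -0700" \
	"committer C O Mitter <committer@example.com> 1112911993 +0200" |
	remove_timestamp)

echo "$stripped"
```

Because the helper is a plain filter, multi-line input needs no intermediate
variable on the way in.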

Signed-off-by: John Cai <johncai86@gmail.com>
---
 t/t1006-cat-file.sh | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..2d52851dadc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -105,13 +105,18 @@ strlen () {
 }
 
 maybe_remove_timestamp () {
-    if test -z "$2"; then
-        echo_without_newline "$1"
-    else
-	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
-    fi
+	if test -z "$2"; then
+		echo_without_newline "$1"
+	else
+		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
+	fi
 }
 
+remove_timestamp () {
+	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
+}
+
+
 run_tests () {
     type=$1
     sha1=$2
-- 
gitgitgadget



* [PATCH v10 4/4] cat-file: add --batch-command mode
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                     ` (2 preceding siblings ...)
  2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-18 18:23                   ` John Cai via GitGitGadget
  2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
  2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
  2022-02-22 11:07                   ` Phillip Wood
  5 siblings, 1 reply; 97+ messages in thread
From: John Cai via GitGitGadget @ 2022-02-18 18:23 UTC (permalink / raw)
  To: git
  Cc: me, phillip.wood123, avarab, e, bagasdotme, gitster,
	Eric Sunshine, Jonathan Tan, Christian Couder, John Cai,
	John Cai

From: John Cai <johncai86@gmail.com>

Add a new flag --batch-command that accepts commands and arguments
from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when
accessing object content. One for iterating over object metadata with
--batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both
processes around, and instead just have one --batch-command process
where we can flip between getting object info, and getting object
contents. Since we have a pair of cat-file processes per repository,
this means we can get rid of roughly half of long lived git cat-file
processes. Given there are many repositories being accessed at any given
time, this can lead to huge savings.

git cat-file --batch-command

will enter an interactive command mode whereby the user can enter
commands and their arguments that get queued in memory:

<command1> [arg1] [arg2] LF
<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a
flush command is issued, which executes them:

flush LF

The reason for a flush command is that when a consumer process (A)
talks to a git cat-file process (B) and interactively writes to and
reads from it in --buffer mode, (A) needs to be able to control when
the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

1. is not ideal from a performance perspective as it will require
spawning a new cat-file process each time, and 2. is hacky and not a
good long term solution.

With this mechanism of queueing up commands and letting (A) issue a
flush command, process (A) can control when the buffer is flushed and
can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush
is received.

This patch adds the basic structure for adding commands, which can be
extended in the future to add more commands. It also adds the following
two commands (on top of the flush command):

contents <object> LF
info <object> LF

The contents command takes an <object> argument and prints out the object
contents.

The info command takes an <object> argument and prints out the object
metadata.

These can be used in the following way with --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
flush LF
info <object> LF
flush LF

When used without --buffer:

info <object> LF
contents <object> LF
contents <object> LF
info <object> LF
info <object> LF
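
The --buffer behaviour described above (queue commands until a flush, then
dispatch whatever is still queued at EOF) can be sketched with a toy model.
This does not call git and is not the implementation in builtin/cat-file.c;
`queue_and_dispatch` and the "batch N:" prefix are hypothetical names used
only to show which flush each command would be dispatched by.

```shell
# Toy model only: queue command lines and replay them on "flush".
queue_and_dispatch () {
	queued=
	batch=1
	while IFS= read -r line
	do
		if test "$line" = flush
		then
			# Dispatch everything queued since the previous flush.
			printf '%s' "$queued"
			queued=
			batch=$((batch + 1))
		else
			# Remember the command, tagged with its batch number.
			queued="${queued}batch $batch: $line
"
		fi
	done
	# Anything still queued at EOF is dispatched as well.
	printf '%s' "$queued"
}

out=$(printf '%s\n' "info A" "contents B" flush "info C" |
	queue_and_dispatch)

echo "$out"
```

In the model, the first two commands are emitted together by the explicit
flush, and the last one is emitted only when input ends.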

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  42 +++++++++-
 builtin/cat-file.c             | 144 ++++++++++++++++++++++++++++++++-
 t/t1006-cat-file.sh            | 120 ++++++++++++++++++++++++++-
 3 files changed, 300 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd06..70c5b4f12d1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents <object>::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info <object>::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.
 
 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].
 
+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.
 
 For example, `--batch` without a custom format would produce:
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5e38af82af1..e75e110302e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };
 
 struct batch_options {
@@ -513,6 +514,135 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			     const char *line,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+		struct strbuf *output,
+		struct expand_data *data,
+		struct queued_cmd *cmd,
+		int nr)
+{
+	int i;
+
+	if (!opt->buffer_output)
+		die(_("flush is only for --buffer mode"));
+
+	for (i = 0; i < nr; i++)
+		cmd[i].fn(opt, cmd[i].line, output, data);
+
+	fflush(stdout);
+}
+
+static void free_cmds(struct queued_cmd *cmd, size_t *nr)
+{
+	size_t i;
+
+	for (i = 0; i < *nr; i++)
+		FREE_AND_NULL(cmd[i].line);
+
+	*nr = 0;
+}
+
+
+static const struct parse_cmd {
+	const char *name;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+	{ "flush", NULL, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].name);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].name);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!strcmp(cmd->name, "flush")) {
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			free_cmds(queued_cmd, &nr);
+		} else if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+		} else {
+			ALLOC_GROW(queued_cmd, nr + 1, alloc);
+			call.fn = cmd->fn;
+			call.line = xstrdup_or_null(p);
+			queued_cmd[nr++] = call;
+		}
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+		free_cmds(queued_cmd, &nr);
+	}
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -595,6 +725,11 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
+
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +748,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+ cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +781,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);
 
@@ -666,7 +804,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
+		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
 		   "             [--textconv | --filters]"),
 		N_("git cat-file (--textconv | --filters)\n"
@@ -695,6 +833,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..1b852076944 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -182,12 +182,36 @@ $content"
 	test_cmp expect actual
     '
 
+    for opt in --buffer --no-buffer
+    do
+	test -z "$content" ||
+		test_expect_success "--batch-command $opt output of $type content is correct" '
+		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+		maybe_remove_timestamp "$(test_write_lines "contents $sha1" |
+		git cat-file --batch-command $opt)" $no_ts >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "--batch-command $opt output of $type info is correct" '
+		echo "$sha1 $type $size" >expect &&
+		test_write_lines "info $sha1" |
+		git cat-file --batch-command $opt >actual &&
+		test_cmp expect actual
+	'
+    done
+
     test_expect_success "custom --batch-check format" '
 	echo "$type $sha1" >expect &&
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
 
+    test_expect_success "custom --batch-command format" '
+	echo "$type $sha1" >expect &&
+	echo "info $sha1" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
+	test_cmp expect actual
+    '
+
     test_expect_success '--batch-check with %(rest)' '
 	echo "$type this is some extra content" >expect &&
 	echo "$sha1    this is some extra content" |
@@ -229,6 +253,22 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	test_write_lines "info $hello_sha1" "flush" |
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command --buffer without flush for blob info' '
+	touch output &&
+	test_write_lines "info $hello_sha1" |
+	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
+	git cat-file --batch-command --buffer >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_sha1 blob $hello_size" >expect &&
 	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
@@ -272,7 +312,7 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
-for batch in batch batch-check
+for batch in batch batch-check batch-command
 do
     for opt in t s e p
     do
@@ -378,6 +418,49 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success '--batch-command with multiple info calls gives correct format' '
+	cat >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$tree_sha1 tree $tree_size
+	$commit_sha1 commit $commit_size
+	$tag_sha1 tag $tag_size
+	deadbeef missing
+	EOF
+
+	git cat-file --batch-command --buffer >actual <<-EOF &&
+	info $hello_sha1
+	info $tree_sha1
+	info $commit_sha1
+	info $tag_sha1
+	info deadbeef
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success '--batch-command with multiple command calls gives correct format' '
+	remove_timestamp >expect <<-EOF &&
+	$hello_sha1 blob $hello_size
+	$hello_content
+	$commit_sha1 commit $commit_size
+	$commit_content
+	$tag_sha1 tag $tag_size
+	$tag_content
+	deadbeef missing
+	EOF
+
+	git cat-file --batch-command --buffer >actual_raw <<-EOF &&
+	contents $hello_sha1
+	contents $commit_sha1
+	contents $tag_sha1
+	contents deadbeef
+	flush
+	EOF
+
+	remove_timestamp <actual_raw >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&
@@ -968,5 +1051,40 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	echo "$orig commit $orig_size" >expect &&
 	test_cmp expect actual
 '
+test_expect_success 'batch-command empty command' '
+	echo "" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*empty command in input.*" err
+'
+
+test_expect_success 'batch-command whitespace before command' '
+	echo " info deadbeef" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*whitespace before command.*" err
+'
+
+test_expect_success 'batch-command unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'batch-command missing arguments' '
+	echo "info" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*info requires arguments.*" err
+'
+
+test_expect_success 'batch-command flush with arguments' '
+	echo "flush arg" >cmd &&
+	test_expect_code 128 git cat-file --batch-command --buffer <cmd 2>err &&
+	grep "^fatal:.*flush takes no arguments.*" err
+'
+
+test_expect_success 'batch-command flush without --buffer' '
+	echo "flush" >cmd &&
+	test_expect_code 128 git cat-file --batch-command <cmd 2>err &&
+	grep "^fatal:.*flush is only for --buffer mode.*" err
+'
 
 test_done
-- 
gitgitgadget
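
The input validation in batch_objects_command() above can be modelled
outside of Git. The following is only a sketch under stated assumptions:
classify_batch_line is a hypothetical helper, the "ok ..." output format is
invented for illustration, and it checks only for a leading space or tab
where the C code uses isspace(). It follows the same order of checks as the
patch (empty line, leading whitespace, prefix match against contents, info
and flush, then the argument check).

```shell
# Hypothetical model of the per-line checks in batch_objects_command().
# Prints "ok <command> [<argument>]" or the fatal message, and returns
# 128 on error (the exit code the tests in the patch expect from die()).
classify_batch_line () {
	line=$1
	tab=$(printf '\t')

	case "$line" in
	'')
		echo "fatal: empty command in input"
		return 128
		;;
	' '*|"$tab"*)
		# Assumption: only space and tab are checked here; the C
		# code uses isspace().
		echo "fatal: whitespace before command: '$line'"
		return 128
		;;
	esac

	# Same table order as commands[] in the patch.
	for name in contents info flush
	do
		case "$line" in
		"$name"*) ;;
		*) continue ;;
		esac

		rest=${line#"$name"}
		if test "$name" = flush
		then
			if test -n "$rest"
			then
				echo "fatal: flush takes no arguments"
				return 128
			fi
			echo "ok flush"
		else
			case "$rest" in
			' '*)
				echo "ok $name ${rest# }"
				;;
			*)
				echo "fatal: $name requires arguments"
				return 128
				;;
			esac
		fi
		return 0
	done

	echo "fatal: unknown command: '$line'"
	return 128
}

# Classify a few sample lines; "|| :" keeps the loop going after errors.
printf '%s\n' 'info deadbeef' 'contents HEAD' 'flush' 'flush arg' 'info' 'bogus' |
while IFS= read -r l
do
	classify_batch_line "$l" || :
done
```

One consequence of matching by prefix, if this reading of the loop is right,
is that a line such as "information" is reported as "info requires
arguments" rather than as an unknown command.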

* Re: [PATCH v10 0/4] Add cat-file --batch-command flag
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                     ` (3 preceding siblings ...)
  2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-18 19:38                   ` Junio C Hamano
  2022-02-22 11:07                   ` Phillip Wood
  5 siblings, 0 replies; 97+ messages in thread
From: Junio C Hamano @ 2022-02-18 19:38 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, avarab, e, bagasdotme, Eric Sunshine,
	Jonathan Tan, Christian Couder, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The feature proposal of adding a command interface to cat-file was first
> discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
> before moving forward with a new flag. An RFC was created [C] and the idea
> was discussed more thoroughly, and overall it seemed like it was headed in
> the right direction.
>
> This patch series consolidates the feedback from these different threads.
>
> This patch series has four parts:
>
>  1. preparation patch to rename a variable
>  2. adding an enum to keep track of batch modes
>  3. add a remove_timestamp() helper that takes stdin and removes timestamps
>  4. logic to handle --batch-command flag, adding contents, info, flush
>     commands
>
> Changes since v9
>
>  * add test to exercise format for batch-command
>  * minor semantic improvements
>  * removed README entry for environment variable used in test

Both the changes relative to the previous round and relative to
'master' look good to me.

Thanks.

* Re: [PATCH v10 3/4] cat-file: add remove_timestamp helper
  2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
@ 2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
  2022-02-22  3:31                       ` John Cai
  0 siblings, 1 reply; 97+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-19  6:33 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, e, bagasdotme, gitster, Eric Sunshine,
	Jonathan Tan, Christian Couder, John Cai


On Fri, Feb 18 2022, John Cai via GitGitGadget wrote:

> From: John Cai <johncai86@gmail.com>
>
> maybe_remove_timestamp() takes arguments, but it would be useful to have
> a function that reads from stdin and strips the timestamp. This would
> allow tests to pipe data into a function to remove timestamps, and
> wouldn't have to always assign a variable. This is especially helpful
> when the data is multiple lines.
>
> Keep maybe_remove_timestamp() the same, but add a remove_timestamp
> helper that reads from stdin.
>
> The tests in the next patch will make use of this.
>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  t/t1006-cat-file.sh | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 145eee11df9..2d52851dadc 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -105,13 +105,18 @@ strlen () {
>  }
>  
>  maybe_remove_timestamp () {
> -    if test -z "$2"; then
> -        echo_without_newline "$1"
> -    else
> -	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
> -    fi
> +	if test -z "$2"; then
> +		echo_without_newline "$1"
> +	else
> +		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
> +	fi
>  }
>  
> +remove_timestamp () {
> +	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
> +}
> +
> +
>  run_tests () {
>      type=$1
>      sha1=$2

I may have missed some previous discussion, but is there a reason this
echo_without_newline dance is needed? The following change, applied on
top of this patch, passes all tests for me on both dash and bash:

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d52851dadc..8266a939f99 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -104,18 +104,19 @@ strlen () {
     echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
 }
 
-maybe_remove_timestamp () {
-	if test -z "$2"; then
-		echo_without_newline "$1"
-	else
-		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
-	fi
-}
-
 remove_timestamp () {
 	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
 }
 
+maybe_remove_timestamp () {
+	if test -n "$2"
+	then
+		echo "$1" | remove_timestamp
+		return 0
+	fi
+
+	echo "$1"
+}
 
 run_tests () {
     type=$1

The move is a separate comment: if we're adding a remove_timestamp(), let's
define it before maybe_remove_timestamp(), which uses it, even though in
this case we can get away with it...
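
For readers following along, here is a small standalone sketch of the two
helpers as they would look with the change suggested above. The sample
author line is made up for illustration and is not taken from the test
suite. With echo instead of echo_without_newline the output gains a trailing
newline, which presumably does not matter as long as both "expect" and
"actual" are produced through the same helper.

```shell
# Same sed expression as the remove_timestamp helper in the patch:
# drop a trailing " <epoch> <+-hhmm>" from each line read on stdin.
remove_timestamp () {
	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
}

# maybe_remove_timestamp as suggested above: strip only when a non-empty
# second argument is given, otherwise pass the text through unchanged.
maybe_remove_timestamp () {
	if test -n "$2"
	then
		echo "$1" | remove_timestamp
		return 0
	fi

	echo "$1"
}

# Made-up sample input, for illustration only.
sample='author A U Thor <author@example.com> 1112911993 -0700'

maybe_remove_timestamp "$sample" 1	# timestamp stripped
maybe_remove_timestamp "$sample"	# printed as-is
```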

* Re: [PATCH v10 4/4] cat-file: add --batch-command mode
  2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
@ 2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 97+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-19  6:35 UTC (permalink / raw)
  To: John Cai via GitGitGadget
  Cc: git, me, phillip.wood123, e, bagasdotme, gitster, Eric Sunshine,
	Jonathan Tan, Christian Couder, John Cai


On Fri, Feb 18 2022, John Cai via GitGitGadget wrote:

> From: John Cai <johncai86@gmail.com>
> [....]
> +    for opt in --buffer --no-buffer
> +    do
> +	test -z "$content" ||
> +		test_expect_success "--batch-command $opt output of $type content is correct" '
> +		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
> +		maybe_remove_timestamp "$(test_write_lines "contents $sha1" |
> +		git cat-file --batch-command $opt)" $no_ts >actual &&
> +		test_cmp expect actual
> +	'
> +
> [...]
> +test_expect_success '--batch-command with multiple command calls gives correct format' '
> +	remove_timestamp >expect <<-EOF &&
> +	$hello_sha1 blob $hello_size
> +	$hello_content
> +	$commit_sha1 commit $commit_size
> +	$commit_content
> +	$tag_sha1 tag $tag_size
> +	$tag_content
> +	deadbeef missing
> +	EOF
> +
> +	git cat-file --batch-command --buffer >actual_raw <<-EOF &&
> +	contents $hello_sha1
> +	contents $commit_sha1
> +	contents $tag_sha1
> +	contents deadbeef
> +	flush
> +	EOF
> +
> +	remove_timestamp <actual_raw >actual &&
> +	test_cmp expect actual
> +'

Re my comment on 3/4: I then tried my suggested change to
maybe_remove_timestamp() on top of this patch, and it also works on this
commit...
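
As an aside, the here-doc in the quoted test is a list of commands followed
by "flush". A minimal sketch of generating such a stream follows;
batch_command_stream is a hypothetical helper that is not part of the patch,
and the object names are placeholders.

```shell
# Hypothetical helper (not part of the patch): emit one --batch-command
# line per object name, then a final "flush" so that buffered output is
# written out.
batch_command_stream () {
	cmd=$1
	shift
	for oid
	do
		printf '%s %s\n' "$cmd" "$oid"
	done
	printf 'flush\n'
}

# Print the stream that would be fed to cat-file's stdin.
batch_command_stream contents HEAD 'HEAD^{tree}' deadbeef
```

In a repository, and with a Git that includes this series, the stream could
be piped into "git cat-file --batch-command --buffer". Per the patch, the
trailing "flush" is only accepted together with --buffer; without it,
cat-file dies with "flush is only for --buffer mode".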

* Re: [PATCH v10 3/4] cat-file: add remove_timestamp helper
  2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
@ 2022-02-22  3:31                       ` John Cai
  0 siblings, 0 replies; 97+ messages in thread
From: John Cai @ 2022-02-22  3:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: John Cai via GitGitGadget, git, me, phillip.wood123, e,
	bagasdotme, gitster, Eric Sunshine, Jonathan Tan,
	Christian Couder

Hi Ævar,

On 19 Feb 2022, at 1:33, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Feb 18 2022, John Cai via GitGitGadget wrote:
>
>> From: John Cai <johncai86@gmail.com>
>>
>> maybe_remove_timestamp() takes arguments, but it would be useful to have
>> a function that reads from stdin and strips the timestamp. This would
>> allow tests to pipe data into a function to remove timestamps, and
>> wouldn't have to always assign a variable. This is especially helpful
>> when the data is multiple lines.
>>
>> Keep maybe_remove_timestamp() the same, but add a remove_timestamp
>> helper that reads from stdin.
>>
>> The tests in the next patch will make use of this.
>>
>> Signed-off-by: John Cai <johncai86@gmail.com>
>> ---
>>  t/t1006-cat-file.sh | 15 ++++++++++-----
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
>> index 145eee11df9..2d52851dadc 100755
>> --- a/t/t1006-cat-file.sh
>> +++ b/t/t1006-cat-file.sh
>> @@ -105,13 +105,18 @@ strlen () {
>>  }
>>
>>  maybe_remove_timestamp () {
>> -    if test -z "$2"; then
>> -        echo_without_newline "$1"
>> -    else
>> -	echo_without_newline "$(printf '%s\n' "$1" | sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//')"
>> -    fi
>> +	if test -z "$2"; then
>> +		echo_without_newline "$1"
>> +	else
>> +		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
>> +	fi
>>  }
>>
>> +remove_timestamp () {
>> +	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
>> +}
>> +
>> +
>>  run_tests () {
>>      type=$1
>>      sha1=$2
>
> I may have missed some previous discussion, but is there a reason this
> echo_without_newline dance is needed? The following change, applied on
> top of this patch, passes all tests for me on both dash and bash:
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 2d52851dadc..8266a939f99 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -104,18 +104,19 @@ strlen () {
>      echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
>  }
>
> -maybe_remove_timestamp () {
> -	if test -z "$2"; then
> -		echo_without_newline "$1"
> -	else
> -		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
> -	fi
> -}
> -
>  remove_timestamp () {
>  	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
>  }
>
> +maybe_remove_timestamp () {
> +	if test -n "$2"
> +	then
> +		echo "$1" | remove_timestamp
> +		return 0
> +	fi
> +
> +	echo "$1"
> +}
>
>  run_tests () {
>      type=$1
>
> The move is a separate comment: if we're adding a remove_timestamp(), let's
> define it before maybe_remove_timestamp(), which uses it, even though in
> this case we can get away with it...

Thanks for these suggestions! I'll adjust 3/4 to include these changes.

* Re: [PATCH v10 0/4] Add cat-file --batch-command flag
  2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
                                     ` (4 preceding siblings ...)
  2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
@ 2022-02-22 11:07                   ` Phillip Wood
  5 siblings, 0 replies; 97+ messages in thread
From: Phillip Wood @ 2022-02-22 11:07 UTC (permalink / raw)
  To: John Cai via GitGitGadget, git
  Cc: me, avarab, e, bagasdotme, gitster, Eric Sunshine, Jonathan Tan,
	Christian Couder, John Cai

On 18/02/2022 18:23, John Cai via GitGitGadget wrote:
> The feature proposal of adding a command interface to cat-file was first
> discussed in [A]. In [B], Taylor expressed the need for a fuller proposal
> before moving forward with a new flag. An RFC was created [C] and the idea
> was discussed more thoroughly, and overall it seemed like it was headed in
> the right direction.
> 
> This patch series consolidates the feedback from these different threads.
> 
> This patch series has four parts:
> 
>   1. preparation patch to rename a variable
>   2. adding an enum to keep track of batch modes
>   3. add a remove_timestamp() helper that takes stdin and removes timestamps
>   4. logic to handle --batch-command flag, adding contents, info, flush
>      commands
> 
> Changes since v9
> 
>   * add test to exercise format for batch-command
>   * minor semantic improvements
>   * removed README entry for environment variable used in test

The range-diff looks good to me.

Best Wishes

Phillip

Thread overview: 97+ messages
2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
2022-02-03 19:28   ` Junio C Hamano
2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
2022-02-03 19:57   ` Junio C Hamano
2022-02-04  4:11     ` John Cai
2022-02-04 16:46       ` Phillip Wood
2022-02-04  6:45   ` Eric Sunshine
2022-02-04 21:41     ` John Cai
2022-02-05  6:52       ` Eric Sunshine
2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
2022-02-04 16:51     ` Phillip Wood
2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-07 23:58     ` Junio C Hamano
2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-07 23:34     ` Jonathan Tan
2022-02-08 11:00       ` Phillip Wood
2022-02-08 17:56         ` Jonathan Tan
2022-02-08 18:09           ` Junio C Hamano
2022-02-09  0:11             ` Jonathan Tan
2022-02-08  0:49     ` Junio C Hamano
2022-02-08 11:06     ` Phillip Wood
2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
2022-02-08 23:43       ` Junio C Hamano
2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-08 23:59       ` Junio C Hamano
2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-09 22:22       ` John Cai
2022-02-09 23:10         ` John Cai
2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-10 10:10         ` Christian Couder
2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-10 10:57         ` Phillip Wood
2022-02-10 17:05           ` Junio C Hamano
2022-02-11 17:45             ` John Cai
2022-02-11 20:07               ` Junio C Hamano
2022-02-11 21:30                 ` John Cai
2022-02-10 18:55           ` John Cai
2022-02-10 22:46         ` Eric Sunshine
2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-14 13:59           ` Phillip Wood
2022-02-14 16:19             ` John Cai
2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-15 19:39             ` Eric Sunshine
2022-02-15 22:58               ` John Cai
2022-02-15 23:20                 ` Eric Sunshine
2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16  1:28               ` Junio C Hamano
2022-02-16  2:48                 ` John Cai
2022-02-16  3:00                   ` Junio C Hamano
2022-02-16  3:17                     ` Eric Sunshine
2022-02-16  3:01                   ` Eric Sunshine
2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16 17:15                 ` Junio C Hamano
2022-02-16 17:25                   ` Eric Sunshine
2022-02-16 20:30                     ` John Cai
2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-18 11:26                   ` Phillip Wood
2022-02-18 16:53                     ` John Cai
2022-02-18 17:32                       ` Junio C Hamano
2022-02-18 17:23                     ` Junio C Hamano
2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
2022-02-22  3:31                       ` John Cai
2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
2022-02-22 11:07                   ` Phillip Wood
