All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] cat-file: add a --stdin-cmd mode
@ 2022-01-19 17:58 John Cai via GitGitGadget
  2022-01-19 17:58 ` [PATCH 1/2] strvec.c: add a strvec_split_delim() John Cai via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: John Cai via GitGitGadget @ 2022-01-19 17:58 UTC (permalink / raw)
  To: git; +Cc: John Cai

This WIP patch is mostly stealing code from builtin/update-ref.c and
implementing the same sort of prefixed command-mode that it supports. I.e.
in addition to --batch now supporting:

<object> LF


It'll support with --stdin-cmd, with and without -z, respectively:

object <object> NL
object <object> NUL


The plus being that we can now implement additional commands:

fflush NL
fflush NUL


That command simply calls fflush(stdout), which could be done as an emergent
effect before by feeding the input a "NL".

I think this will be useful for other things, e.g. a not-trivial part of
"cat-file --batch" time is spent on parsing its argument and seeing if it's
a revision, ref etc.

So we could e.g. add a command that only accepts a full-length 40 character
SHA-1, or switch the --format output mid-request etc.

 1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/

requires ee4d43041d ab/cat-file

John Cai (2):
  strvec.c: add a strvec_split_delim()
  cat-file: add a --stdin-cmd mode

 builtin/cat-file.c  | 128 +++++++++++++++++++++++++++++++++++++++++++-
 strvec.c            |  23 ++++++++
 strvec.h            |   8 +++
 t/t1006-cat-file.sh |  72 +++++++++++++++++++++++++
 4 files changed, 230 insertions(+), 1 deletion(-)


base-commit: 00780c9af44409a68481c82f63a97bd18bb2593e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1191%2Fjohn-cai%2Fjc-cat-file-stdin-cmd-mode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1191/john-cai/jc-cat-file-stdin-cmd-mode-v1
Pull-Request: https://github.com/git/git/pull/1191
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] strvec.c: add a strvec_split_delim()
  2022-01-19 17:58 [PATCH 0/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
@ 2022-01-19 17:58 ` John Cai via GitGitGadget
  2022-01-19 17:58 ` [PATCH 2/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
  2022-01-19 19:38 ` [PATCH 0/2] " Taylor Blau
  2 siblings, 0 replies; 5+ messages in thread
From: John Cai via GitGitGadget @ 2022-01-19 17:58 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

This is just string_list_split() copy/pasted with the "push" part
adjusted. The next commit will call this function to parse a line of
command arguments.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 strvec.c | 23 +++++++++++++++++++++++
 strvec.h |  8 ++++++++
 2 files changed, 31 insertions(+)

diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..7dca04bf7a5 100644
--- a/strvec.c
+++ b/strvec.c
@@ -85,6 +85,29 @@ void strvec_split(struct strvec *array, const char *to_split)
 	}
 }
 
+size_t strvec_split_delim(struct strvec *array, const char *string,
+			  int delim, int maxsplit)
+{
+	size_t count = 0;
+	const char *p = string, *end;
+
+	for (;;) {
+		count++;
+		if (maxsplit >= 0 && count > maxsplit) {
+			strvec_push(array, p);
+			return count;
+		}
+		end = strchr(p, delim);
+		if (end) {
+			strvec_push_nodup(array, xmemdupz(p, end - p));
+			p = end + 1;
+		} else {
+			strvec_push(array, p);
+			return count;
+		}
+	}
+}
+
 void strvec_clear(struct strvec *array)
 {
 	if (array->v != empty_strvec) {
diff --git a/strvec.h b/strvec.h
index 9f55c8766ba..c474918b91a 100644
--- a/strvec.h
+++ b/strvec.h
@@ -73,6 +73,14 @@ void strvec_pop(struct strvec *);
 /* Splits by whitespace; does not handle quoted arguments! */
 void strvec_split(struct strvec *, const char *);
 
+/**
+ * strvec_split_delim() is a split function that behaves more like its
+ * string_list_split() cousin than the whitespace-splitting
+ * strvec_split().
+ */
+size_t strvec_split_delim(struct strvec *array, const char *string,
+			  int delim, int maxsplit);
+
 /**
  * Free all memory associated with the array and return it to the
  * initial, empty state.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] cat-file: add a --stdin-cmd mode
  2022-01-19 17:58 [PATCH 0/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
  2022-01-19 17:58 ` [PATCH 1/2] strvec.c: add a strvec_split_delim() John Cai via GitGitGadget
@ 2022-01-19 17:58 ` John Cai via GitGitGadget
  2022-01-19 19:38 ` [PATCH 0/2] " Taylor Blau
  2 siblings, 0 replies; 5+ messages in thread
From: John Cai via GitGitGadget @ 2022-01-19 17:58 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

Teach cat-file to accept a --stdin-cmd which causes cat-file to enter a
command mode to receive commands and arguments on stdin..

batch_objects_stdin_cmd() handles a line from stdin, which it expects a
command name and arguments. -z will tell --stdin-cmd to look for NUL
as the line termination.

This commit adds two commands, fflush and object. fflush immediately
flushes the buffer. object takes a sha as an argument and passes it to
batch_one_object().

The code allows for additional commands to be added as parse_cmd structs.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/cat-file.c  | 128 +++++++++++++++++++++++++++++++++++++++++++-
 t/t1006-cat-file.sh |  72 +++++++++++++++++++++++++
 2 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950ec..f56dbc85388 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "strvec.h"
 
 struct batch_options {
 	int enabled;
@@ -26,7 +27,10 @@ struct batch_options {
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
 	const char *format;
+	int stdin_cmd;
+	int end_null;
 };
+static char line_termination = '\n';
 
 static const char *force_path;
 
@@ -508,6 +512,117 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }
 
+enum batch_state {
+	/* Non-transactional state open for commands. */
+	BATCH_STATE_OPEN,
+};
+
+static void parse_cmd_object(struct batch_options *opt,
+			     const int argc, const char **argv,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	batch_one_object(argv[0], output, opt, data);
+}
+
+static void parse_cmd_fflush(struct batch_options *opt,
+			     const int argc, const char **argv,
+			     struct strbuf *output,
+			     struct expand_data *data)
+{
+	fflush(stdout);
+}
+
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const int,
+			       const char **, struct strbuf *,
+			       struct expand_data *);
+
+static const struct parse_cmd {
+	const char *prefix;
+	parse_cmd_fn_t fn;
+	unsigned args;
+	enum batch_state state;
+} command[] = {
+	{ "object", parse_cmd_object, 1, BATCH_STATE_OPEN },
+	{ "fflush", parse_cmd_fflush, 0, BATCH_STATE_OPEN },
+};
+
+static void batch_objects_stdin_cmd(struct batch_options *opt,
+				    struct strbuf *output,
+				    struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	enum batch_state state = BATCH_STATE_OPEN;
+
+	/* Read each line dispatch its command */
+	while (!strbuf_getwholeline(&input, stdin, line_termination)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		struct strvec args = STRVEC_INIT;
+		size_t n;
+		const char *p;
+
+		if (*input.buf == line_termination)
+			die("empty command in input");
+		else if (isspace(*input.buf))
+			die("whitespace before command: %s", input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(command); i++) {
+			const char *prefix = command[i].prefix;
+			char c;
+
+			if (!starts_with(input.buf, prefix))
+				continue;
+
+			/*
+			 * If the command has arguments, verify that it's
+			 * followed by a space. Otherwise, it shall be followed
+			 * by a line terminator.
+			 */
+			c = command[i].args ? ' ' : line_termination;
+			if (input.buf[strlen(prefix)] != c)
+				continue;
+
+			cmd = &command[i];
+			break;
+		}
+		if (!cmd)
+			die("unknown command: %s", input.buf);
+
+		/*
+		 * Read additional arguments. Do not raise an error in
+		 * case there is an early EOF to let the command
+		 * handle missing arguments with a proper error message
+		 */
+		p = input.buf + strlen(cmd->prefix) + 1;
+		if (*p == line_termination || !*p) {
+			n = 0;
+		} else {
+			const char *pos = strstr(p, "\n");
+			char *str = xstrndup(p, pos - p);
+
+			n = strvec_split_delim(&args, str, ' ', - 1);
+			free(str);
+		}
+		for (i = 0; i < args.nr; i++)
+			fprintf(stderr, "%d: <%s>\n", i, args.v[i]);
+		if (n < cmd->args)
+			die("%s: too few arguments", cmd->prefix);
+		if (n > cmd->args)
+			die("%s: too many arguments", cmd->prefix);
+
+		switch (state) {
+		case BATCH_STATE_OPEN:
+			/* TODO: command state management */
+			break;
+		}
+		cmd->fn(opt, args.nr, args.v, output, data);
+		strvec_clear(&args);
+	}
+
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
@@ -515,6 +630,7 @@ static int batch_objects(struct batch_options *opt)
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
+	const int stdin_cmd = opt->stdin_cmd;
 
 	if (!opt->format)
 		opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -590,7 +706,8 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
 
-	while (strbuf_getline(&input, stdin) != EOF) {
+	while (!stdin_cmd &&
+	       strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
 			 * Split at first whitespace, tying off the beginning
@@ -608,6 +725,9 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}
 
+	if (stdin_cmd)
+		batch_objects_stdin_cmd(opt, &output, &data);
+
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -685,6 +805,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
+		OPT_BOOL(0, "stdin-cmd", &batch.stdin_cmd,
+			 N_("with --batch[-check]: enters stdin 'command mode")),
+		OPT_BOOL('z', NULL, &batch.end_null, N_("with --stdin-cmd, use NUL termination")),
+
 		/* Batch-specific options */
 		OPT_GROUP(N_("Change or optimize batch output")),
 		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
@@ -738,6 +862,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
 		batch.buffer_output = batch.all_objects;
+	if (batch.end_null)
+		line_termination = '\0';
 
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df9..8f339746ecf 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -964,4 +964,76 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
 	test_cmp expect actual
 '
 
+F='%s\0'
+
+test_expect_success 'stdin-cmd not enough arguments' '
+	echo "object " >cmd &&
+	test_expect_code 128 git cat-file --batch --stdin-cmd < cmd 2>err &&
+	grep -E "^fatal:.*too few arguments" err
+'
+
+test_expect_success 'stdin-cmd too many arguments' '
+	echo "object foo bar" >cmd &&
+	test_expect_code 128 git cat-file --batch --stdin-cmd < cmd 2>err &&
+	grep -E "^fatal:.*too many arguments" err
+'
+
+test_expect_success 'stdin-cmd unknown command' '
+	echo unknown_command >cmd &&
+	test_expect_code 128 git cat-file --batch --stdin-cmd < cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err &&
+	test_expect_code 128 git cat-file --batch --stdin-cmd -z < cmd 2>err &&
+	grep -E "^fatal:.*unknown command.*" err
+'
+
+test_expect_success 'setup object data' '
+	content="Object Data" &&
+	size=$(strlen "$content") &&
+	sha1=$(echo_without_newline "$content" | git hash-object -w --stdin)
+'
+
+test_expect_success 'stdin-cmd calling object works' '
+	echo "object $sha1" | git cat-file --batch --stdin-cmd >actual &&
+	echo "$sha1 blob $size" >expect &&
+	echo `git cat-file -p "$sha1"` >>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin-cmd -z calling object works' '
+	printf $F "object $sha1" | git cat-file --batch --stdin-cmd -z >actual &&
+	echo "$sha1 blob $size" >expect &&
+	echo `git cat-file -p "$sha1"` >>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup object data' '
+	content="Object Data" &&
+	size=$(strlen "$content") &&
+	sha1=$(echo_without_newline "$content" | git hash-object -w --stdin)
+'
+
+test_expect_success 'stdin-cmd calling object works' '
+	echo "object $sha1" | git cat-file --batch --stdin-cmd >actual &&
+	echo "$sha1 blob $size" >expect &&
+	echo `git cat-file -p "$sha1"` >>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin-cmd -z calling object works' '
+	printf $F "object $sha1" | git cat-file --batch --stdin-cmd -z >actual &&
+	echo "$sha1 blob $size" >expect &&
+	echo `git cat-file -p "$sha1"` >>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin-cmd fflush works' '
+	printf "fflush\n" > cmd &&
+	test_expect_code 0 git cat-file --batch --stdin-cmd < cmd 2>err
+'
+
+test_expect_success 'stdin-cmd -z fflush works' '
+	printf $F 'fflush' > cmd &&
+	test_expect_code 0 git cat-file --batch --stdin-cmd -z < cmd 2>err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] cat-file: add a --stdin-cmd mode
  2022-01-19 17:58 [PATCH 0/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
  2022-01-19 17:58 ` [PATCH 1/2] strvec.c: add a strvec_split_delim() John Cai via GitGitGadget
  2022-01-19 17:58 ` [PATCH 2/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
@ 2022-01-19 19:38 ` Taylor Blau
  2022-01-21 17:46   ` John Cai
  2 siblings, 1 reply; 5+ messages in thread
From: Taylor Blau @ 2022-01-19 19:38 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

On Wed, Jan 19, 2022 at 05:58:40PM +0000, John Cai via GitGitGadget wrote:
> I think this will be useful for other things, e.g. a not-trivial part of
> "cat-file --batch" time is spent on parsing its argument and seeing if it's
> a revision, ref etc.
>
> So we could e.g. add a command that only accepts a full-length 40 character
> SHA-1, or switch the --format output mid-request etc.

I would like to see a more concrete proposal or need for this sort of
thing before we go too far down adding a new mode to a command so
fundamental as cat-file is.

Between your two proposals for other commands that you could add, I am
not convinced that either of them needs to be in cat-file itself. IOW,
you could easily inject another process in between which verifies that
the provided objects are 40 character SHA-1s.

The latter, changing the output format in-process, seems dubious to me.
Is the start-up time of cat-file so slow (and you need to change formats
so often) that the two together are unbearable? I'd be surprised if they
were (and if so, we should focus our efforts on improving Git's start-up
time).

In the meantime, this is quite an invasive way to provide callers a way
to control the output stream. If there really is a need to just force
cat-file to flush more often, perhaps consider designating a special
signal that we could treat as a request to flush the output stream?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] cat-file: add a --stdin-cmd mode
  2022-01-19 19:38 ` [PATCH 0/2] " Taylor Blau
@ 2022-01-21 17:46   ` John Cai
  0 siblings, 0 replies; 5+ messages in thread
From: John Cai @ 2022-01-21 17:46 UTC (permalink / raw)
  To: Taylor Blau
  Cc: John Cai via GitGitGadget, git, Ævar Arnfjörð Bjarmason



On 19 Jan 2022, at 14:38, Taylor Blau wrote:

> On Wed, Jan 19, 2022 at 05:58:40PM +0000, John Cai via GitGitGadget wrote:
>> I think this will be useful for other things, e.g. a not-trivial part of
>> "cat-file --batch" time is spent on parsing its argument and seeing if it's
>> a revision, ref etc.
>>
>> So we could e.g. add a command that only accepts a full-length 40 character
>> SHA-1, or switch the --format output mid-request etc.
>
> I would like to see a more concrete proposal or need for this sort of
> thing before we go too far down adding a new mode to a command so
> fundamental as cat-file is.

Thanks for the feedback! I realized I should have made this an RFC first.
I’ll submit an RFC separately.

>
> Between your two proposals for other commands that you could add, I am
> not convinced that either of them needs to be in cat-file itself. IOW,
> you could easily inject another process in between which verifies that
> the provided objects are 40 character SHA-1s.
>
> The latter, changing the output format in-process, seems dubious to me.
> Is the start-up time of cat-file so slow (and you need to change formats
> so often) that the two together are unbearable? I'd be surprised if they
> were (and if so, we should focus our efforts on improving Git's start-up
> time).
>
> In the meantime, this is quite an invasive way to provide callers a way
> to control the output stream. If there really is a need to just force
> cat-file to flush more often, perhaps consider designating a special
> signal that we could treat as a request to flush the output stream?
>
> Thanks,
> Taylor

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-01-21 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-19 17:58 [PATCH 0/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
2022-01-19 17:58 ` [PATCH 1/2] strvec.c: add a strvec_split_delim() John Cai via GitGitGadget
2022-01-19 17:58 ` [PATCH 2/2] cat-file: add a --stdin-cmd mode John Cai via GitGitGadget
2022-01-19 19:38 ` [PATCH 0/2] " Taylor Blau
2022-01-21 17:46   ` John Cai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.