From: John Cai <johncai86@gmail.com>
To: git@vger.kernel.org
Cc: avarab@gmail.com, me@ttaylorr.com, phillip.wood123@gmail.com,
e@80x24.org, John Cai <johncai86@gmail.com>
Subject: [RFC v3] cat-file: add a --stdin-cmd mode
Date: Fri, 28 Jan 2022 13:33:19 -0500 [thread overview]
Message-ID: <20220128183319.43496-1-johncai86@gmail.com> (raw)
This RFC patch proposes a new flag --batch-command that works with
git-cat-file --batch. Similar to git-update-ref --stdin, it will accept
commands and arguments from stdin.
The start of this idea was discussed in [1], where the original
motivation was to be able to control when the buffer was flushed to
stdout in --buffer mode.
However, this can actually be much more useful in situations when
git-cat-file --batch is being used as a long lived backend query
process. At GitLab, we use a pair of cat-file processes. One for
iterating over object metadata with --batch-check, and the other to grab
object contents with --batch. However, if we had --batch-command, we could
get rid of the second --batch-check process, and just have one process
where we can flip between getting object info, and getting object contents.
This can lead to huge savings since on a given server there could be hundreds to
thousands of git cat-file processes at a time.
git cat-file --batch-command
$ <command> [arg1] [arg2] NL
This patch adds three commands: object, info, fflush
$ object <sha1> NL
$ info <sha1> NL
$ fflush NL
These three would be immediately useful in GitLab's context, but one can
imagine this mode to be further extended for other things.
Future improvements:
- a non-trivial part of "cat-file --batch" time is spent
on parsing its argument and seeing if it's a revision, ref etc. So we
could add a command that only accepts a full-length 40
character SHA-1.
This would be the first step in adding such an interface to
git-cat-file.
[1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
---
Taylor, I'd be interested in your thoughts on this proposal since you helped
review the previous patch that turned into this RFC. Thanks!
Changes from v2:
- refactored tests to be within run_tests()
- added a test to test --buffer behavior with fflush
- cleaned up cat-file.c: clarified var names, removed unnecessary code
based on suggestions from Phillip Wood
- removed strvec changes
Changes from v1:
- changed option name to batch-command.
- changed command function interface to receive the whole line after the command
name to put the onus of parsing arguments to each individual command function.
- pass in whole line to batch_one_object in both parse_cmd_object and
parse_cmd_info to support spaces in the object reference.
- removed addition of -z to include in a separate patch series
- added documentation.
---
Documentation/git-cat-file.txt | 15 +++++
builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++--
t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++
3 files changed, 205 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index bef76f4dd0..254e546c79 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,6 +96,21 @@ OPTIONS
need to specify the path, separated by whitespace. See the
section `BATCH OUTPUT` below for details.
+--batch-command::
+ Enter a command mode that reads from stdin. May not be combined with any
+ other options or arguments except `--textconv` or `--filters`, in which
+ case the input lines also need to specify the path, separated by
+ whitespace. See the section `BATCH OUTPUT` below for details.
+
+object <object>::
+ Print object contents for object reference <object>
+
+info <object>::
+ Print object info for object reference <object>
+
+flush::
+ Flush to stdout immediately when used with --buffer
+
--batch-all-objects::
Instead of reading a list of objects on stdin, perform the
requested batch operation on all objects in the repository and
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7b3f42950e..cc9e47943b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -24,9 +24,11 @@ struct batch_options {
int buffer_output;
int all_objects;
int unordered;
- int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+ int mode; /* may be 'w' or 'c' for --filters or --textconv */
const char *format;
+ int command;
};
+static char line_termination = '\n';
static const char *force_path;
@@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
if (data->type == OBJ_BLOB) {
if (opt->buffer_output)
fflush(stdout);
- if (opt->cmdmode) {
+ if (opt->mode) {
char *contents;
unsigned long size;
if (!data->rest)
die("missing path for '%s'", oid_to_hex(oid));
- if (opt->cmdmode == 'w') {
+ if (opt->mode == 'w') {
if (filter_object(data->rest, 0100644, oid,
&contents, &size))
die("could not convert '%s' %s",
oid_to_hex(oid), data->rest);
- } else if (opt->cmdmode == 'c') {
+ } else if (opt->mode == 'c') {
enum object_type type;
if (!textconv_object(the_repository,
data->rest, 0100644, oid,
@@ -326,7 +328,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
die("could not convert '%s' %s",
oid_to_hex(oid), data->rest);
} else
- BUG("invalid cmdmode: %c", opt->cmdmode);
+ BUG("invalid mode: %c", opt->mode);
batch_write(opt, contents, size);
free(contents);
} else {
@@ -508,6 +510,95 @@ static int batch_unordered_packed(const struct object_id *oid,
data);
}
+static void parse_cmd_object(struct batch_options *opt,
+ const char *line,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ opt->print_contents = 1;
+ batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+ const char *line,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ opt->print_contents = 0;
+ batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_fflush(struct batch_options *opt,
+ const char *line,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ fflush(stdout);
+}
+
+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+ struct strbuf *, struct expand_data *);
+
+static const struct parse_cmd {
+ const char *prefix;
+ parse_cmd_fn_t fn;
+ unsigned takes_args;
+} commands[] = {
+ { "object", parse_cmd_object, 1},
+ { "info", parse_cmd_info, 1},
+ { "fflush", parse_cmd_fflush, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+ struct strbuf *output,
+ struct expand_data *data)
+{
+ struct strbuf input = STRBUF_INIT;
+
+ /* Read each line dispatch its command */
+ while (!strbuf_getwholeline(&input, stdin, line_termination)) {
+ int i;
+ const struct parse_cmd *cmd = NULL;
+ const char *p, *cmd_end;
+
+ if (*input.buf == line_termination)
+ die("empty command in input");
+ else if (isspace(*input.buf))
+ die("whitespace before command: %s", input.buf);
+
+ for (i = 0; i < ARRAY_SIZE(commands); i++) {
+ const char *prefix = commands[i].prefix;
+ char c;
+ if (!skip_prefix(input.buf, prefix, &cmd_end))
+ continue;
+ /*
+ * If the command has arguments, verify that it's
+ * followed by a space. Otherwise, it shall be followed
+ * by a line terminator.
+ */
+ c = commands[i].takes_args ? ' ' : line_termination;
+ if (input.buf[strlen(prefix)] != c)
+ die("arguments invalid for command: %s", commands[i].prefix);
+
+ cmd = &commands[i];
+ if (cmd->takes_args) {
+ p = cmd_end + 1;
+ // strip newline before handing it to the
+ // handling function
+ input.buf[strcspn(input.buf, "\n")] = '\0';
+ }
+
+ break;
+ }
+
+ if (!cmd)
+ die("unknown command: %s", input.buf);
+
+ cmd->fn(opt, p, output, data);
+ }
+ strbuf_release(&input);
+}
+
static int batch_objects(struct batch_options *opt)
{
struct strbuf input = STRBUF_INIT;
@@ -515,6 +606,7 @@ static int batch_objects(struct batch_options *opt)
struct expand_data data;
int save_warning;
int retval = 0;
+ const int command = opt->command;
if (!opt->format)
opt->format = "%(objectname) %(objecttype) %(objectsize)";
@@ -529,7 +621,7 @@ static int batch_objects(struct batch_options *opt)
strbuf_expand(&output, opt->format, expand_format, &data);
data.mark_query = 0;
strbuf_release(&output);
- if (opt->cmdmode)
+ if (opt->mode)
data.split_on_whitespace = 1;
/*
@@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt)
save_warning = warn_on_object_refname_ambiguity;
warn_on_object_refname_ambiguity = 0;
+ if (command)
+ batch_objects_command(opt, &output, &data);
+
while (strbuf_getline(&input, stdin) != EOF) {
if (data.split_on_whitespace) {
/*
@@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt,
bo->enabled = 1;
bo->print_contents = !strcmp(opt->long_name, "batch");
+ bo->command = !strcmp(opt->long_name, "batch-command");
bo->format = arg;
return 0;
@@ -683,6 +779,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
N_("like --batch, but don't emit <contents>"),
PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
batch_option_callback),
+ OPT_CALLBACK_F(0, "batch-command", &batch, N_(""),
+ N_("enters batch mode that accepts commands"),
+ PARSE_OPT_NOARG | PARSE_OPT_NONEG,
+ batch_option_callback),
OPT_CMDMODE(0, "batch-all-objects", &opt,
N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
/* Batch-specific options */
@@ -742,7 +842,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
/* Return early if we're in batch mode? */
if (batch.enabled) {
if (opt_cw)
- batch.cmdmode = opt;
+ batch.mode = opt;
else if (opt && opt != 'b')
usage_msg_optf(_("'-%c' is incompatible with batch mode"),
usage, options, opt);
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 145eee11df..92f0b14a95 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -112,6 +112,46 @@ maybe_remove_timestamp () {
fi
}
+run_buffer_test () {
+ type=$1
+ sha1=$2
+ size=$3
+ flush=$4
+
+ mkfifo V.input
+ exec 8<>V.input
+ rm V.input
+
+ mkfifo V.output
+ exec 9<>V.output
+ rm V.output
+
+ (
+ git cat-file --buffer --batch-command <&8 >&9 &
+ echo $! >&9 &&
+ wait $!
+ echo >&9 EARLY_EXIT
+ ) &
+ sh_pid=$!
+ read fi_pid <&9
+ test_when_finished "
+ exec 8>&-; exec 9>&-;
+ kill $sh_pid && wait $sh_pid
+ kill $fi_pid && wait $fi_pid
+ true"
+ expect=$(echo "$sha1 $type $size")
+ echo "info $sha1" >&8
+ if test "$flush" = "true"
+ then
+ echo "fflush" >&8
+ else
+ expect="EARLY_EXIT"
+ kill $fi_pid && wait $fi_pid
+ fi
+ read actual <&9
+ test "$actual" = "$expect"
+}
+
run_tests () {
type=$1
sha1=$2
@@ -177,6 +217,18 @@ $content"
test_cmp expect actual
'
+ test -z "$content" ||
+ test_expect_success "--batch-command output of $type content is correct" '
+ maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+ maybe_remove_timestamp "$(echo object $sha1 | git cat-file --batch-command)" $no_ts >actual &&
+ test_cmp expect actual
+ '
+
+ test_expect_success "batch-command output of $type info is correct" '
+ echo "$sha1 $type $size" >expect &&
+ echo "info $sha1" | git cat-file --batch-command >actual &&
+ test_cmp expect actual
+'
test_expect_success "custom --batch-check format" '
echo "$type $sha1" >expect &&
echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
@@ -232,12 +284,29 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
test_cmp expect actual
'
+test_expect_success '--batch-command --buffer with flush is correct for blob' '
+ run_buffer_test 'blob' $hello_sha1 $hello_size true
+'
+
+test_expect_success '--batch-command --buffer without flush is correct for blob' '
+ run_buffer_test 'blob' $hello_sha1 $hello_size false
+'
+
tree_sha1=$(git write-tree)
+
tree_size=$(($(test_oid rawsz) + 13))
tree_pretty_content="100644 blob $hello_sha1 hello"
run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+test_expect_success '--batch-command --buffer with flush is correct for tree' '
+ run_buffer_test 'tree' $tree_sha1 $tree_size true
+'
+
+test_expect_success '--batch-command --buffer without flush is correct for tree' '
+ run_buffer_test 'tree' $tree_sha1 $tree_size false
+'
+
commit_message="Initial commit"
commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
commit_size=$(($(test_oid hexsz) + 137))
@@ -263,6 +332,14 @@ tag_size=$(strlen "$tag_content")
run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
+test_expect_success '--batch-command --buffer with flush is correct for tag' '
+ run_buffer_test 'tag' $tag_sha1 $tag_size true
+'
+
+test_expect_success '--batch-command --buffer without flush is correct for tag' '
+ run_buffer_test 'tag' $tag_sha1 $tag_size false
+'
+
test_expect_success \
"Reach a blob from a tag pointing to it" \
"test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
@@ -964,4 +1041,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace'
test_cmp expect actual
'
+test_expect_success 'batch-command unknown command' '
+ echo unknown_command >cmd &&
+ test_expect_code 128 git cat-file --batch-command < cmd 2>err &&
+ grep -E "^fatal:.*unknown command.*" err
+'
+
test_done
--
2.34.1
next reply other threads:[~2022-01-28 18:33 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-28 18:33 John Cai [this message]
2022-01-31 11:44 ` [RFC v3] cat-file: add a --stdin-cmd mode Bagas Sanjaya
2022-01-31 18:24 ` Junio C Hamano
2022-02-01 9:48 ` Ævar Arnfjörð Bjarmason
2022-02-01 9:39 ` Christian Couder
2022-02-01 17:52 ` Taylor Blau
2022-02-01 19:27 ` John Cai
2022-02-01 20:14 ` Taylor Blau
[not found] ` <3FE1D509-8AD0-4F0E-9298-DFD3552A98EF@gmail.com>
2022-02-02 1:45 ` Taylor Blau
2022-02-02 13:11 ` John Cai
2022-02-01 10:43 ` Phillip Wood
2022-02-02 15:05 ` John Cai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220128183319.43496-1-johncai86@gmail.com \
--to=johncai86@gmail.com \
--cc=avarab@gmail.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=phillip.wood123@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).