All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Cai <johncai86@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "John Cai via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>, "Taylor Blau" <me@ttaylorr.com>,
	"Phillip Wood" <phillip.wood123@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode
Date: Fri, 04 Feb 2022 16:41:56 -0500	[thread overview]
Message-ID: <3F83E252-D99C-47CB-B47F-11B8EA554A4F@gmail.com> (raw)
In-Reply-To: <CAPig+cQHcstj28F-==oCakp3iMLHtd1SVQV+snzewYihHxNG6g@mail.gmail.com>

Hi Eric,

I appreciate the feedback and the thorough review!

On 4 Feb 2022, at 1:45, Eric Sunshine wrote:

> ()()On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Add new flag --batch-command that will accept commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>>
>> contents <object> NL
>> info <object> NL
>>
>> With the following commands, process (A) can begin a "session" and
>> send a list of object names over stdin. When "get contents" or "get info"
>> is issued, this list of object names will be fed into batch_one_object()
>> to retrieve either info or contents. Finally an fflush() will be called
>> to end the session.
>>
>> begin
>> <sha1>
>> get info
>>
>> begin
>> <sha1>
>> get contents
>
> I had the same reaction to this new command set as Junio expressed
> upstream[1], and was prepared to suggest an alternative but Junio said
> everything I wanted to say, so I won't say anything more about it
> here.
>
> That aside, the implementation presented here is overly loose about
> malformed input and accepts all sorts of bogus invocations which it
> should be reporting as errors. For instance, a lone:
>
>     get info
>
> without a preceding `begin` should be an error but such bogus input is
> not diagnosed. `get contents` is likewise affected. Another example:
>
>     begin
>     <oid>
>     <EOF>
>
> which lacks a closing `get info` or `get contents` is silently
> ignored. Similarly, malformed:
>
>     begin
>     begin
>     ...
>
> should be reported as an error but is not.
>
> There is also a bug in which the accumulated list of OID's is never
> cleared. Thus:
>
>     begin
>     <oid>
>     get info
>     get info
>
> emits info about <oid> twice, once for each `get info` invocation. Similarly:
>
>     begin
>     <oid1>
>     get info
>     <oid2>
>     get info
>
> emits information about <oid1> twice and <oid2> once. Invoking `begin`
> between the `get info` commands doesn't help because `begin` is a
> no-op. Thus:
>
>     begin
>     <oid1>
>     get info
>     begin
>     <oid2>
>     get info
>
> likewise emits information about <oid1> twice and <oid2> once.
>
> It also incorrectly accepts non-"session" commands in the middle of a
> session. For instance:
>
>     begin
>     <oid1>
>     info <oid2>
>     <oid3>
>
> immediately emits information about <oid2> and then bombs out claiming
> that <oid3> is an "unknown command" because the `info` command --
> which should not have been allowed within a session -- prematurely
> ended the "session".
>
> The `info` and `contents` commands neglect to do any sort of
> validation of their arguments, thus any and all bogus invocations are
> accepted. Thus:
>
>     info
>     info <arg1> <arg2>
>     info <non-oid>
>
> are all accepted as valid invocations, misleadingly producing "<foo>
> missing" messages, rather than erroring out as they should. The "<oid>
> missing" message should be reserved for the case when the lone
> argument to `info` or `contents` is something which looks like a
> legitimate OID.

So actually the argument to info and contents doesn't have to be an OID but an object name that
eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the
same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My
goal was to maintain this behavior in --batch-command.
>
> The above is a long-winded way of saying that it is important not only
> to check the obvious "expect success" cases when implementing a new
> feature, but it's also important to add checks for the "expect
> failure" cases, such as all the above malformed inputs.

I overall agree and will add test coverage for invalid input.

> It's subjective, but it feels like this implementation is trying to be
> too clever by handling all these cases via a single strbuf_getline()
> loop in batch_objects_command(). It would be easier to reason about,
> and would have avoided some of the above problems, for instance, if
> handling of `begin` was split out to its own function which looped
> over strbuf_getline() itself, thus could easily have detected
> premature EOF (lacking `get info` or `get contents`) and likewise
> would not have allowed `info` or `contents` commands to be executed
> within a "session".
>
> Similarly (again subjective), the generic command dispatcher seems
> over-engineered and perhaps contributed to the oversight of `info` and
> `contents` failing to perform validation of their arguments. A simpler
> hand-rolled command-response loop is more common on this project and
> often easier to reason about. Perhaps something like this
> (pseudo-code):
>
>     while (strbuf_getline(&input, stdin)) {
>         char *end_cmd = strchrnul(&input.buf, ' ');
>         const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd;
>         *end_cmd = '\0';
>         if (strcmp(input.buf, "info"))
>             show_info(argstart);
>         else if (strcmp(input.buf, "contents"))
>             show_contents(argstart);
>         else if (strcmp(input.buf, "begin"))
>             begin_session(argstart);
>         else
>             die(...);
>     }

I think I agree that the code in this patch is a little hard to follow,
but now I'm planning to simplify the design by getting rid of "begin"[2].

I'm hoping this change will make the code easier to reason about.

2. https://lore.kernel.org/git/767d5f5a-8395-78bc-865f-a39acc39e061@gmail.com/

>
> and each of the command-handler functions would perform its own
> argument validation (including the case when no argument should be
> present).
>
> [1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/
>
>> +static void parse_cmd_begin(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       /* nothing needs to be done here */
>> +}
>
> Either this function should be clearing the list of collected OID's or...
>
>> +static void parse_cmd_get(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       struct string_list_item *item;
>> +       for_each_string_list_item(item, &revs) {
>> +               batch_one_object(item->string, output, opt, data);
>> +       }
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>
> ... this function should do so after it's done processing the OID list.
>
>> +}
>> +static void parse_cmd_get_info(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>
> Missing blank line before the function definition.
>
>> +static void parse_cmd_get_objects(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       opt->print_contents = 1;
>> +       parse_cmd_get(opt, line, output, data, revs);
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>> +}
>
> Why does this function duplicate the flushing logic of parse_cmd_get()
> immediately after calling parse_cmd_get()?
>
>> +static void batch_objects_command(struct batch_options *opt,
>> +                                   struct strbuf *output,
>> +                                   struct expand_data *data)
>> +{
>> +       /* Read each line dispatch its command */
>
> Missing "and".
>
>> +       while (!strbuf_getline(&input, stdin)) {
>> +               if (state == BATCH_STATE_COMMAND) {
>> +                       if (*input.buf == '\n')
>> +                               die("empty command in input");
>
> This never triggers because strbuf_getline() removes the line
> terminator before returning.

yes, this was an oversight. thanks for catching it!

>
>> +                       else if (isspace(*input.buf))
>> +                               die("whitespace before command: %s", input.buf);
>> +               }
>> +
>> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
>> +                       if (!skip_prefix(input.buf, prefix, &cmd_end))
>> +                               continue;
>> +                       /*
>> +                        * If the command has arguments, verify that it's
>> +                        * followed by a space. Otherwise, it shall be followed
>> +                        * by a line terminator.
>> +                        */
>> +                       c = commands[i].takes_args ? ' ' : '\n';
>> +                       if (*cmd_end && *cmd_end != c)
>> +                               die("arguments invalid for command: %s", commands[i].prefix);
>
> Ditto regarding strbuf_getline() removing line-terminator.
>
>> +                       cmd = &commands[i];
>> +                       if (cmd->takes_args)
>> +                               p = cmd_end + 1;
>> +                       break;
>> +               }
>> +
>> +               if (input.buf[input.len - 1] == '\n')
>> +                       input.buf[--input.len] = '\0';
>
> Ditto again. This is especially scary since it's accessing element
> `input.len-1` without even checking if `input.len` is greater than
> zero.

I realize we actually don't need this if block at all because strubf_getline()
strips the line terminator for us already.

>
> Also, it's better to use published strbuf API rather than mucking with
> the internals:
>
>     strbuf_setlen(&input, input.len - 1);
>
>> +               if (state == BATCH_STATE_INPUT && !cmd){
>> +                       string_list_append(&revs, input.buf);
>> +                       continue;
>> +               }
>> +
>> +               if (!cmd)
>> +                       die("unknown command: %s", input.buf);
>> +
>> +               state = cmd->next_state;
>> +               cmd->fn(opt, p, output, data, revs);
>> +       }
>> +       strbuf_release(&input);
>> +       string_list_clear(&revs, 0);
>> +}

  reply	other threads:[~2022-02-04 21:42 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
2022-02-03 19:28   ` Junio C Hamano
2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
2022-02-03 19:57   ` Junio C Hamano
2022-02-04  4:11     ` John Cai
2022-02-04 16:46       ` Phillip Wood
2022-02-04  6:45   ` Eric Sunshine
2022-02-04 21:41     ` John Cai [this message]
2022-02-05  6:52       ` Eric Sunshine
2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
2022-02-04 16:51     ` Phillip Wood
2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-07 23:58     ` Junio C Hamano
2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-07 23:34     ` Jonathan Tan
2022-02-08 11:00       ` Phillip Wood
2022-02-08 17:56         ` Jonathan Tan
2022-02-08 18:09           ` Junio C Hamano
2022-02-09  0:11             ` Jonathan Tan
2022-02-08  0:49     ` Junio C Hamano
2022-02-08 11:06     ` Phillip Wood
2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
2022-02-08 23:43       ` Junio C Hamano
2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-08 23:59       ` Junio C Hamano
2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-09 22:22       ` John Cai
2022-02-09 23:10         ` John Cai
2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-10 10:10         ` Christian Couder
2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-10 10:57         ` Phillip Wood
2022-02-10 17:05           ` Junio C Hamano
2022-02-11 17:45             ` John Cai
2022-02-11 20:07               ` Junio C Hamano
2022-02-11 21:30                 ` John Cai
2022-02-10 18:55           ` John Cai
2022-02-10 22:46         ` Eric Sunshine
2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-14 13:59           ` Phillip Wood
2022-02-14 16:19             ` John Cai
2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-15 19:39             ` Eric Sunshine
2022-02-15 22:58               ` John Cai
2022-02-15 23:20                 ` Eric Sunshine
2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16  1:28               ` Junio C Hamano
2022-02-16  2:48                 ` John Cai
2022-02-16  3:00                   ` Junio C Hamano
2022-02-16  3:17                     ` Eric Sunshine
2022-02-16  3:01                   ` Eric Sunshine
2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16 17:15                 ` Junio C Hamano
2022-02-16 17:25                   ` Eric Sunshine
2022-02-16 20:30                     ` John Cai
2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-18 11:26                   ` Phillip Wood
2022-02-18 16:53                     ` John Cai
2022-02-18 17:32                       ` Junio C Hamano
2022-02-18 17:23                     ` Junio C Hamano
2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
2022-02-22  3:31                       ` John Cai
2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
2022-02-22 11:07                   ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3F83E252-D99C-47CB-B47F-11B8EA554A4F@gmail.com \
    --to=johncai86@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=phillip.wood123@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.