git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "John Cai via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, phillip.wood123@gmail.com,
	avarab@gmail.com, e@80x24.org, bagasdotme@gmail.com,
	John Cai <johncai86@gmail.com>
Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode
Date: Thu, 03 Feb 2022 11:57:51 -0800	[thread overview]
Message-ID: <xmqqo83nsoxs.fsf@gitster.g> (raw)
In-Reply-To: <ebd2a1356017905245744babf32c5b78a0af1639.1643915286.git.gitgitgadget@gmail.com> (John Cai via GitGitGadget's message of "Thu, 03 Feb 2022 19:08:06 +0000")

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode

"cat-file: add --batch-command mode" perhaps.  The patch touching
the file "catfile.c" (which does not even exist) is an irrelevant
implementation detail to spend 2 extra bytes in "git shortlog"
output.

> From: John Cai <johncai86@gmail.com>
>
> Add new flag --batch-command that will accept commands and arguments
> from stdin, similar to git-update-ref --stdin.
>
> At GitLab, we use a pair of long running cat-file processes when
> accessing object content. One for iterating over object metadata with
> --batch-check, and the other to grab object contents with --batch.
>
> However, if we had --batch-command, we wouldnt need to keep both
> processes around, and instead just have one process where we can flip
> between getting object info, and getting object contents. This means we
> can get rid of roughly half of long lived git cat-file processes. This
> can lead to huge savings since on a given server there could be hundreds
> of git cat-file processes running.

Hmph, why hundreds, not two you listed?

Do you mean "we have two per repository, but by combining, we can do
with just one per repository, halving the number of processes"?

> git cat-file --batch-command
>
> would enter an interactive command mode whereby the user can enter in
> commands and their arguments:
>
> <command> [arg1] [arg2] NL
>
> This patch adds the basic structure for add command which can be
> extended in the future to add more commands.
>
> This patch also adds the following commands:
>
> contents <object> NL
> info <object> NL
>
> The contents command takes an <object> argument and prints out the object
> contents.
>
> The info command takes a <object> argument and prints out the object
> metadata.
>
> In addition, we need a set of commands that enable a "read session".
>
> When a separate process (A) is connected to a git cat-file process (B)

This description misleads readers into thinking as if we have just
created a daemon process that is running, and an unrelated process
can connect to it, which obviously poses a question about securing
the connection.  It is my understanding that what this creates is
just a consumer process (A) starts the cat-file process (B) locally
on its behalf under process (A)'s privilege, and they talk over pipe
without allowing any third-party to participate in the exchange, so
we should avoid misleading users by saying "is connected to" here.

> and is interactively writing to and reading from it in --buffer mode,
> (A) needs to be able to know when the buffer is flushed to stdout.

If A and B are talking over a pair pipes, in order to avoid
deadlocking, both ends need to be able to control whose turn it is
to speak (and it is turn for the other side to listen).  A needs to
be able to _control_ (not "know") when the buffer it uses to write
to B gets flushed, in order to reliably say "I am done for now, it
is your turn to speak" and be assured that it reaches B.  The story
is the same for the other side.  When a request by A needs to be
responded with multiple lines of output, B needs to be able to say
"And that concludes my response, and I am ready to accept a new
request from you" and make sure it reaches A.  "know when..." is
probably a wrong phrase here.

> Currently, from (A)'s perspective, the only way is to either 1. exit
> (B)'s process or 2. send an invalid object to stdin. 1. is not ideal
> from a performance perspective as it will require spawning a new cat-file
> process each time, and 2. is hacky and not a good long term solution.

Writing enumeration as bulletted or enumerated list would make it
much easier to read, I would think.

    From (A)'s perspective, the only way is to either 

    1. exit (B)'s process or
    2. send an invalid object to stdin.

    1. is not ideal from a performance perspective, as it will
    require spawning a new cat-file process each time, and 2. is
    hacky and not a good long term solution.

I am not sure what you exactly mean by "exit" in the above.  Do you
mean "kill" instead?

> With the following commands, process (A) can begin a "session" and
> send a list of object names over stdin. When "get contents" or "get info"
> is issued, this list of object names will be fed into batch_one_object()
> to retrieve either info or contents. Finally an fflush() will be called
> to end the session.
>
> begin NL
> get contents NL
> get info NL
>
> These can be used in the following way:
>
> begin
> <sha1>
> <sha1>
> <sha1>
> <sha1>
> get info
>
> begin
> <sha1>
> <sha1>
> <sha1>
> <sha1>
> get contents
>
> With this mechanism, process (A) can be guaranteed to receive all of the
> output even in --buffer mode.

OK, so do these "get blah" serve both as command and an implicit
"flush"?

With an implicit "flush", do we really need "begin"?

Also, from the point of view of extensibility, not saying what kind
of operation is started when given "begin" is probably not a good
idea.  "get info" and "get contents" may happen to be the only
commands that are supported right now, and the parameters to them
may happen to be just list of object names and nothing else, but
what happens when a new "git frotz" command is added and its
operation is extended with something other than object names and
pathnames?  The way to parse these parameter lines for the "get"
would be different for different commands, and if "cat-file" knows
upfront what is to be done to these parameters, it can even start
prefetching and precomputing to reduce latency observed by the
client before the final "get info" command is given.

So, from that point of view,

	begin <cmd>
	<parameter>
	<parameter>
	...
	end

may be a better design, no?

  reply	other threads:[~2022-02-03 19:57 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
2022-02-03 19:28   ` Junio C Hamano
2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
2022-02-03 19:57   ` Junio C Hamano [this message]
2022-02-04  4:11     ` John Cai
2022-02-04 16:46       ` Phillip Wood
2022-02-04  6:45   ` Eric Sunshine
2022-02-04 21:41     ` John Cai
2022-02-05  6:52       ` Eric Sunshine
2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
2022-02-04 16:51     ` Phillip Wood
2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-07 23:58     ` Junio C Hamano
2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-07 23:34     ` Jonathan Tan
2022-02-08 11:00       ` Phillip Wood
2022-02-08 17:56         ` Jonathan Tan
2022-02-08 18:09           ` Junio C Hamano
2022-02-09  0:11             ` Jonathan Tan
2022-02-08  0:49     ` Junio C Hamano
2022-02-08 11:06     ` Phillip Wood
2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
2022-02-08 23:43       ` Junio C Hamano
2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-08 23:59       ` Junio C Hamano
2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-09 22:22       ` John Cai
2022-02-09 23:10         ` John Cai
2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-10 10:10         ` Christian Couder
2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-10 10:57         ` Phillip Wood
2022-02-10 17:05           ` Junio C Hamano
2022-02-11 17:45             ` John Cai
2022-02-11 20:07               ` Junio C Hamano
2022-02-11 21:30                 ` John Cai
2022-02-10 18:55           ` John Cai
2022-02-10 22:46         ` Eric Sunshine
2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-14 13:59           ` Phillip Wood
2022-02-14 16:19             ` John Cai
2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-15 19:39             ` Eric Sunshine
2022-02-15 22:58               ` John Cai
2022-02-15 23:20                 ` Eric Sunshine
2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16  1:28               ` Junio C Hamano
2022-02-16  2:48                 ` John Cai
2022-02-16  3:00                   ` Junio C Hamano
2022-02-16  3:17                     ` Eric Sunshine
2022-02-16  3:01                   ` Eric Sunshine
2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16 17:15                 ` Junio C Hamano
2022-02-16 17:25                   ` Eric Sunshine
2022-02-16 20:30                     ` John Cai
2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-18 11:26                   ` Phillip Wood
2022-02-18 16:53                     ` John Cai
2022-02-18 17:32                       ` Junio C Hamano
2022-02-18 17:23                     ` Junio C Hamano
2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
2022-02-22  3:31                       ` John Cai
2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
2022-02-22 11:07                   ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqo83nsoxs.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).