From: "Ævar Arnfjörð Bjarmason" <email@example.com>
To: Junio C Hamano <firstname.lastname@example.org>
Cc: email@example.com, Emily Shaffer <firstname.lastname@example.org>,
Eric Sunshine <email@example.com>,
Felipe Contreras <firstname.lastname@example.org>,
Taylor Blau <email@example.com>,
Michael Strawbridge <firstname.lastname@example.org>,
Phillip Wood <email@example.com>
Subject: Re: [PATCH v2 5/5] hook: support a --to-stdin=<path> option
Date: Thu, 09 Feb 2023 02:56:24 +0100 [thread overview]
Message-ID: <firstname.lastname@example.org> (raw)
On Wed, Feb 08 2023, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <email@example.com> writes:
>> From: Emily Shaffer <firstname.lastname@example.org>
>> Expose the "path_to_stdin" API added in the preceding commit in the
>> "git hook run" command.
>> For now we won't be using this command interface outside of the tests,
>> but exposing this functionality makes it easier to test the hook
>> API. The plan is to use this to extend the "sendemail-validate"
> What does it take to tackle the obvious leftover bits of [4/5]? Use
> tempfile API to allocate a temporary file, slurp the input and close
> it, and then use the "path_to_stdin" feature to spawn the hook?
You did ask for it :)
The below is something I wrote for the end of the initial v2 CL, but
then decided it was way too long and dropped it.
The tl;dr is that no, that would be the shortest way forward, and
arguably what we should do now.
But those hooks currently print straight to the pipe, so having the API
force them to use a tempfile would suck.
But since this is all going for supporting N hooks in parallel the next
step requires an API that's future-proofing the feeding of the same
content to multiple hooks.
So, without further adieu, that dropped part of the v2 CL:
The rest of this is a large digression about the future API design of
this topic, please don't read ahead unless you're very curious about
that (mainly I wanted to brain-dump this somewhere).
I considered expanding this series to include the rest of the
remaining "post-rewrite" hook in sequencer.c, but as that needs at
least a couple of prep patches to expand the API I've left it out for
The main reason I didn't include (aside from the "let's start small"
of this topic) it is that I still don't like the API we'll eventually
need for the parallel hooks that take "stdin", and would like to mull
it over a bit.
What comes after this series eventually wants to convert these hooks
struct child_process proc = CHILD_PROCESS_INIT;
for item in transaction:
write_in_full(proc.in, item, strlen(item));
struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT_PARALLEL;
struct string_list to_stdin = STRING_LIST_INIT_DUP;
opt.feed_pipe = pipe_from_string_list;
opt.feed_pipe_ctx = &to_stdin;
for item in transaction:
The reason is that if we're producing data for stdin we'll need to
give it to N hooks. For this topic we neatly side-step that with a
"to-stdin", as Junio notes in  and .
I think even that is arguably a bit ugly, but it's all internally
changable uglyness, i.e. for the "sendemail-validate" whether we feed
"git hook" with content on stdin or are forced to create a file and
feed it with "--to-stdin" is something we can change later.
But the answer to the question raised in  is that we'll eventually
need something like the above. But I don't like it, because:
A) The currently proposed API wants to represent lines to the hooks
are a "struct string_list", so you add "\n"-less items to it, and
we'll always print "%s\n" for each item.
This is an arbitrary limitation over the write_in_full() we do now,
and will be a hassle e.g. if you have already prepared content,
you'll first need to line-split it.
Maybe I'm missing some reason for why the hook interface needs to
intrinsically promise that it'll be doing write()'s to the hooks
ending in \n's, but right now I don't see that, we could leave that
up to each hook, and if they're spewing content larger than that
the hook itself should just be line-buffering if they care about
B) Even if we internally line-split a "struct string_list *" is just a
bad fit as we lose the string size, we should pass something that
give us a "size_t len" (which we could stuff in the "util" field,
C) We need to buffer up the stdin in full before we feed the first
line to the hook, which seems to me to be an API design that
creates the problem the second paragraph of  claims to be trying
to avoid. I.e. "simply taking a string_list or strbuf is not as
scalable as using a callback".
That would be true if the result of the callback were streamed to
the N hooks we have, but it's not the case. It's a callback
mechanism that amounts to just handing off a big "struct
string_list", which we need to fully populate before we start the
D) Most importantly, the API seems to be structured around a problem
we don't actually have.
The more general problem *could be* that you'd want to feed N hooks
with the same content, we ourselves get that content streamed into
us on "stdin", and therefore need to either buffer it in full
up-front, or as we read it re-spew it into the N hooks we're
But e.g. for the "reference-transaction" hook all we need to
support N hooks is to allocate a single "size_t pos" for them, as
when we're executing them we have a "struct ref_transaction
*transaction" that doesn't change for the duration of the hook. The
same goes for the "pre-push.
Actually, the only eventual API users that really could have used
buffering to ensure consistent results won't use the buffering
API. E.g. for the "post-rewrite" and "rebase" hooks we consume a
file in ".git/" to give to the hooks on stdin. Currently (and still
with this topic) we'll only have one hook, so the file's content
will always be the same.
But if we were being paranoid we'd buffer it up, so that we could
ensure that our N hooks all get the same input, but for the API
users that could get a benefit from that we don't use it, but only
for those that are guaranteed not to need it.
So before the next iteration I'll try to find some time to play with
that. I.e. I think we can rip out the whole "struct string_list" feeder,
and just have an eventual mechanism for each of the N hooks to
init/release their "feed stdin" state with callbacks, and to save away
their state in their own "void *".
Then e.g. for the reference-transaction we'd just allocate a "size_t
pos" per hook, and then just spew content at them on the fly from the
transaction struct, no pre-generation necessary.
Such an interface will nicely support streaming without pre-buffering
delay, and could even run lock-less while multi-threaded (the source
data being const, and (almost) all state per-thread.
The interface would then be general enough to support
pre-slurping/buffering content at the start, and then streaming to N
hooks, e.g. for the "read a file, but guarantee that everyone gets the
same version". We won't need a string list for that, at most a strbuf,
but maybe we can just mmap() those...
next prev parent reply other threads:[~2023-02-09 2:10 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-23 17:15 [PATCH 0/5] hook API: support stdin, convert post-rewrite Ævar Arnfjörð Bjarmason
2023-01-23 17:15 ` [PATCH 1/5] run-command.c: remove dead assignment in while-loop Ævar Arnfjörð Bjarmason
2023-01-23 22:48 ` Junio C Hamano
2023-01-23 17:15 ` [PATCH 2/5] run-command: allow stdin for run_processes_parallel Ævar Arnfjörð Bjarmason
2023-01-23 22:52 ` Junio C Hamano
2023-01-23 17:15 ` [PATCH 3/5] hook API: support passing stdin to hooks, convert am's 'post-rewrite' Ævar Arnfjörð Bjarmason
2023-01-23 23:10 ` Junio C Hamano
2023-01-23 23:13 ` Junio C Hamano
2023-01-23 17:15 ` [PATCH 4/5] sequencer: use the new hook API for the simpler "post-rewrite" call Ævar Arnfjörð Bjarmason
2023-01-24 14:46 ` Phillip Wood
2023-01-27 15:08 ` Phillip Wood
2023-01-23 17:15 ` [PATCH 5/5] hook: support a --to-stdin=<path> option for testing Ævar Arnfjörð Bjarmason
2023-01-24 0:55 ` Junio C Hamano
2023-01-24 0:59 ` Michael Strawbridge
2023-02-08 19:21 ` [PATCH v2 0/5] hook API: support stdin, convert post-rewrite Ævar Arnfjörð Bjarmason
2023-02-08 19:21 ` [PATCH v2 1/5] run-command.c: remove dead assignment in while-loop Ævar Arnfjörð Bjarmason
2023-02-08 21:03 ` Junio C Hamano
2023-02-08 19:21 ` [PATCH v2 2/5] run-command: allow stdin for run_processes_parallel Ævar Arnfjörð Bjarmason
2023-02-08 21:09 ` Junio C Hamano
2023-02-08 19:21 ` [PATCH v2 3/5] hook API: support passing stdin to hooks, convert am's 'post-rewrite' Ævar Arnfjörð Bjarmason
2023-02-08 21:12 ` Junio C Hamano
2023-02-08 19:21 ` [PATCH v2 4/5] sequencer: use the new hook API for the simpler "post-rewrite" call Ævar Arnfjörð Bjarmason
2023-02-08 21:17 ` Junio C Hamano
2023-02-08 19:21 ` [PATCH v2 5/5] hook: support a --to-stdin=<path> option Ævar Arnfjörð Bjarmason
2023-02-08 21:22 ` Junio C Hamano
2023-02-09 1:56 ` Ævar Arnfjörð Bjarmason [this message]
2023-02-08 21:23 ` [PATCH v2 0/5] hook API: support stdin, convert post-rewrite Junio C Hamano
2023-02-03 12:15 Ævar Arnfjörð Bjarmason
2023-02-03 12:15 ` [PATCH v2 5/5] hook: support a --to-stdin=<path> option Ævar Arnfjörð Bjarmason
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).