git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Phillip Wood" <phillip.wood123@gmail.com>,
	"Anthony Sottile" <asottile@umich.edu>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: git 2.36.0 regression: pre-commit hooks no longer have stdout/stderr as tty
Date: Wed, 20 Apr 2022 10:09:38 -0700	[thread overview]
Message-ID: <CAJoAoZm7p32Hn=TLQeWUqp_nMjo_TQ2whR4F=cXk4c6PV1M5bA@mail.gmail.com> (raw)
In-Reply-To: <xmqqr15rr9k6.fsf@gitster.g>

On Wed, Apr 20, 2022 at 9:42 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
> >> It *is* true that run-command.c:pp_start_one() sets child_process:err=-1
> >> for the child and run-command.c:run_hook_ve() didn't do that; that -1
> >> means that start_command() will create a new fd for the child's stderr.
> >> Since run_hook_ve() didn't care about the child's stderr before, I
> >> wonder if that is why? Could it be that now that we're processing the
> >> child's stderr, the child no longer thinks stderr is in tty, because the
> >> parent is consuming its output?
> >
> > Exactly, stderr is redirected to a pipe so that we can buffer the
> > output from each process and then write it to the real stdout when the
> > process has finished to avoid the output from different processes
> > getting mixed together. Ideally in this case we'd see that stdout is a
> > tty and create a pty rather than a pipe when buffering the output from
> > the process.
>
> Ah, thanks, and sigh.  That means this was an unintended regression
> caused by use of parallel infrastructure, mixed with a bit of "the
> original problem report wrote hook properly so that when it is not
> connected to a terminal (such as in this new implementation) it
> refrains to do terminal-y things like coloring, so everything is
> working as intended" ;-).
>
> IIRC, the parallel subprocess stuff was invented to spawn multiple
> tasks we internally need (like "checkout these submodules") that are
> not interactive (hence does not need access to stdin) en masse, and
> the output buffering is there to avoid interleaving the output that
> would make it unreadable.
>
> Use of the parallel subprocess API means that we inherently cannot
> give access to the standard input to the hooks.  The users of the
> original run_hooks_ve() API would be OK with that, because it did
> .no_stdin=1 before the problematic hooks API rewrite, but I wonder
> what our plans should be for hooks that want to go interactive.
> They could open /dev/tty themselves (and that would have been the
> only way to go interactive even in the old world order, so it is
> perfectly acceptable to keep it that way with .no_stdin=1), but if
> they run in parallel, the end-user would not know whom they are
> typing to (and which output lines are the prompts they are expected
> to respond to).
>
> In the longer term, there are multiple possible action items.
>
>  * We probably would want to design a bit better anti-interleaving
>    machinery than "buffer everything and show only after the process
>    exists", if we want to keep using the parallel subprocess API.
>    And that would help the original "do this thing in multiple
>    submodules at the same time" use case, too.

I've noticed this too, but for very noisy things which are
parallelized, I'm not sure a better user experience is possible. I
suppose we could pick the "first" job in the task queue and print that
output as it comes in, so that users are aware that *something* is
happening?

[job 0 starts]
[job 1 starts]
job 0 says 0-foo
[job 1 says 1-foo, but it's buffered]
job 0 says 0-bar
[job 1 says 1-bar, but it's buffered]
[job 0 finishes]
[we replay the buffer from job 1 so far:]
job 1 says 1-foo
job 1 says 1-bar
job 1 says 1-baz
[job 1 finishes]

I think it could be possible, but then job 1 still will never learn
that it's a tty, because it's being buffered to prevent interleaving,
even if we have the illusion of non-buffering.

>
>  * We should teach hooks API to make it _optional_ to use the
>    parallel subprocess API.  If we are not spawning hooks in
>    parallel today, there is no reason to incur this regression by
>    using the parallel subprocess API---this was a needress bug, and
>    I am angry.

To counter, I think that having hooks invoked via two different
mechanisms depending on how many are provided or whether they are
parallelized is a mess to debug and maintain. I still stand by the
decision to use the parallel subprocess API, which I think was
reasonable to expect to do the same thing when jobs=1, and I think we
should continue to do so. It simplifies the hook code significantly.

>
>  * the hooks API should learn a mechanism for multiple hooks to
>    coordinate their executions.  Perhaps they indicate their
>    preference if they are OK to be run in parallel, and those that
>    want isolation will be run one-at-a-time before or after others
>    run in parallel, or something.

There is such a mechanism for hooks overall, but not yet for
individual hooks. I know we discussed it at length[1] before, and
decided it would be okay to figure this out later on. I suppose "later
on" may have come :)

>
>  * The hooks API should learn a mechanism for us to tell what
>    execution environment they are in.  Ideally, the hooks, if it is
>    sane to run under the parallel subprocess API, shouldn't have
>    been learning if they are talking to an interactive human user by
>    looking at isatty(), but we should have been explicitly telling
>    them that they are, perhaps by exporting an environment
>    variable.  There may probably be more clue hooks writers want
>    other than "am I talking to human user?" that we would want to
>    enumerate before going this route.

Hm. I was going to mention that Ævar and I discussed the possibility
of setting an environment variable for hook child processes, telling
them which hook they are being run as - e.g.
"GIT_HOOK=prepare-commit-msg" - but I suppose that relying on that
alone doesn't tell us anything about whether the parent is being run
in tty. I agree it could be very useful to simply pass
GIT_PARENT_ISATTY to hooks (and I suppose other child processes).
Could we simply do that from start_command() or something else deep in
run-command.h machinery? Then Anthony's use case becomes

if [-t 1|| GIT_PARENT_ISATTY]
 ...

and no need to examine Git version.

 - Emily

1: https://lore.kernel.org/git/20210527000856.695702-2-emilyshaffer%40google.com
under "Parallelization with dependencies" (and preceding
conversations)

  reply	other threads:[~2022-04-20 17:09 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-19 18:59 git 2.36.0 regression: pre-commit hooks no longer have stdout/stderr as tty Anthony Sottile
2022-04-19 23:37 ` Emily Shaffer
2022-04-19 23:52   ` Anthony Sottile
2022-04-20  9:00   ` Phillip Wood
2022-04-20 12:25     ` Ævar Arnfjörð Bjarmason
2022-04-20 16:22       ` Emily Shaffer
2022-04-20 16:42     ` Junio C Hamano
2022-04-20 17:09       ` Emily Shaffer [this message]
2022-04-20 17:25         ` Junio C Hamano
2022-04-20 17:41           ` Emily Shaffer
2022-04-21 12:03             ` Ævar Arnfjörð Bjarmason
2022-04-21 17:24               ` Junio C Hamano
2022-04-21 18:40                 ` Junio C Hamano
2022-04-20  4:23 ` Junio C Hamano
2022-04-21 12:25 ` [PATCH 0/6] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Ævar Arnfjörð Bjarmason
2022-04-21 12:25   ` [PATCH 1/6] run-command API: replace run_processes_parallel_tr2() with opts struct Ævar Arnfjörð Bjarmason
2022-04-23  4:24     ` Junio C Hamano
2022-04-28 23:16     ` Emily Shaffer
2022-04-29 16:44       ` Junio C Hamano
2022-04-21 12:25   ` [PATCH 2/6] run-command tests: test stdout of run_command_parallel() Ævar Arnfjörð Bjarmason
2022-04-23  4:24     ` Junio C Hamano
2022-04-21 12:25   ` [PATCH 3/6] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-04-23  3:54     ` Junio C Hamano
2022-04-28 23:26     ` Emily Shaffer
2022-04-21 12:25   ` [PATCH 4/6] hook tests: fix redirection logic error in 96e7225b310 Ævar Arnfjörð Bjarmason
2022-04-23  3:54     ` Junio C Hamano
2022-04-21 12:25   ` [PATCH 5/6] hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr" Ævar Arnfjörð Bjarmason
2022-04-29 22:54     ` Junio C Hamano
2022-04-21 12:25   ` [PATCH 6/6] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-04-28 23:31     ` Emily Shaffer
2022-04-29 23:09     ` Junio C Hamano
2022-04-21 17:35   ` [PATCH 0/6] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Junio C Hamano
2022-04-21 18:50     ` Ævar Arnfjörð Bjarmason
2022-05-18 20:05   ` [PATCH v2 0/8] " Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 1/8] run-command tests: change if/if/... to if/else if/else Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 2/8] run-command API: use "opts" struct for run_processes_parallel{,_tr2}() Ævar Arnfjörð Bjarmason
2022-05-18 21:45       ` Junio C Hamano
2022-05-25 13:18       ` Emily Shaffer
2022-05-18 20:05     ` [PATCH v2 3/8] run-command tests: test stdout of run_command_parallel() Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 4/8] run-command.c: add an initializer for "struct parallel_processes" Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 5/8] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-05-18 21:51       ` Junio C Hamano
2022-05-26 17:18       ` Emily Shaffer
2022-05-27 16:08         ` Junio C Hamano
2022-05-18 20:05     ` [PATCH v2 6/8] hook tests: fix redirection logic error in 96e7225b310 Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 7/8] hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr" Ævar Arnfjörð Bjarmason
2022-05-18 20:05     ` [PATCH v2 8/8] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-05-18 21:53       ` Junio C Hamano
2022-05-26 17:23       ` Emily Shaffer
2022-05-26 18:23         ` Ævar Arnfjörð Bjarmason
2022-05-26 18:54           ` Emily Shaffer
2022-05-25 11:30     ` [PATCH v2 0/8] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Johannes Schindelin
2022-05-25 13:00       ` Ævar Arnfjörð Bjarmason
2022-05-25 16:57       ` Junio C Hamano
2022-05-26  1:10         ` Junio C Hamano
2022-05-26 10:16           ` Ævar Arnfjörð Bjarmason
2022-05-26 16:36             ` Junio C Hamano
2022-05-26 19:13               ` Ævar Arnfjörð Bjarmason
2022-05-26 19:56                 ` Junio C Hamano
2022-05-27  9:14     ` [PATCH v3 0/2] " Ævar Arnfjörð Bjarmason
2022-05-27  9:14       ` [PATCH v3 1/2] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-05-27 16:58         ` Junio C Hamano
2022-05-27  9:14       ` [PATCH v3 2/2] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-05-27 17:17       ` [PATCH v3 0/2] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Junio C Hamano
2022-05-31 17:32       ` [PATCH v4 " Ævar Arnfjörð Bjarmason
2022-05-31 17:32         ` [PATCH v4 1/2] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-06-01 16:49           ` Johannes Schindelin
2022-06-01 17:09             ` Junio C Hamano
2022-05-31 17:32         ` [PATCH v4 2/2] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-06-01 16:50           ` Johannes Schindelin
2022-06-01 16:53         ` [PATCH v4 0/2] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Johannes Schindelin
2022-06-02 14:07         ` [PATCH v5 " Ævar Arnfjörð Bjarmason
2022-06-02 14:07           ` [PATCH v5 1/2] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-06-02 14:07           ` [PATCH v5 2/2] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-06-02 20:05           ` [PATCH v5 0/2] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Junio C Hamano
2022-06-03  8:51           ` Phillip Wood
2022-06-03  9:20             ` Ævar Arnfjörð Bjarmason
2022-06-03 13:21               ` Phillip Wood
2022-06-07  8:49                 ` Ævar Arnfjörð Bjarmason
2022-06-07  8:48           ` [PATCH v6 " Ævar Arnfjörð Bjarmason
2022-06-07  8:48             ` [PATCH v6 1/2] run-command: add an "ungroup" option to run_process_parallel() Ævar Arnfjörð Bjarmason
2022-06-17  0:07               ` Emily Shaffer
2022-06-07  8:48             ` [PATCH v6 2/2] hook API: fix v2.36.0 regression: hooks should be connected to a TTY Ævar Arnfjörð Bjarmason
2022-06-07 17:08               ` Junio C Hamano
2022-06-07 17:02             ` [PATCH v6 0/2] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJoAoZm7p32Hn=TLQeWUqp_nMjo_TQ2whR4F=cXk4c6PV1M5bA@mail.gmail.com' \
    --to=emilyshaffer@google.com \
    --cc=asottile@umich.edu \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).