Re: [RFC PATCH] receive-pack: run post-receive before reporting status

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Robin Jarry <robin.jarry@6wind.com>
Cc: git@vger.kernel.org, Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	Jan Smets <jan.smets@nokia.com>,
	Stephen Morton <stephen.morton@nokia.com>,
	Jeff King <peff@peff.net>
Subject: Re: [RFC PATCH] receive-pack: run post-receive before reporting status
Date: Sat, 06 Nov 2021 06:03:22 +0100	[thread overview]
Message-ID: <211106.86y261g88n.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20211104133546.1967308-1-robin.jarry@6wind.com>


On Thu, Nov 04 2021, Robin Jarry wrote:

> When a remote client exits while the pre-receive hook is running,
> receive-pack is not killed by SIGPIPE because the signal is ignored.
> This is a side effect of commit ec7dbd145bd8 ("receive-pack: allow hooks
> to ignore its standard input stream").

FWIW we include the date when mentioning commits. E.g. ec7dbd145bd
(receive-pack: allow hooks to ignore its standard input stream,
2014-09-12).

> The pre-receive hook is not interrupted and does not receive any error
> since its stdout is a pipe which is read in an async thread and output
> back to the client socket in a side band channel. When writing the data
> in the socket, the async thread gets a SIGPIPE which also seems ignored.
> This may be a race between the main and the async threads. I do not know
> the code well enough to be sure.
>
> After the pre-receive has exited the SIGPIPE default handler is restored
> and if the hook did not report any error, objects are migrated from
> temporary to permanent storage.
>
> Before running the post-receive hook, status info is reported back to
> the client. Since the client has died, receive-pack is killed by SIGPIPE
> and post-receive is never executed.
>
> The post-receive hook is often used to send email notifications (see
> contrib/hooks/post-receive-email), update bug trackers, start automatic
> builds, etc. Not executing it after an interrupted yet "successful" push
> can lead to inconsistencies.
>
> Execute the post-receive hook before reporting status to the client to
> avoid this issue. This is not an ideal solution but I don't know if
> allowing hooks to be killed when a client exits is a good idea. Maybe
> for pre-receive but definitely not for post-receive.
>
> Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>  builtin/receive-pack.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 49b846d96052..df8bedf71319 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2564,14 +2564,14 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
>  		use_keepalive = KEEPALIVE_ALWAYS;
>  		execute_commands(commands, unpack_status, &si,
>  				 &push_options);
> +		run_receive_hook(commands, "post-receive", 1,
> +				 &push_options);
>  		if (pack_lockfile)
>  			unlink_or_warn(pack_lockfile);
>  		if (report_status_v2)
>  			report_v2(commands, unpack_status);
>  		else if (report_status)
>  			report(commands, unpack_status);
> -		run_receive_hook(commands, "post-receive", 1,
> -				 &push_options);
>  		run_update_post_hook(commands);
>  		string_list_clear(&push_options, 0);
>  		if (auto_gc) {

I think the discussion at [1] is current to everything you're seeing
here.

tl;dr: Even with this change you're not guaranteed to run your hook.

Personally I've implemented something (independent of) what Junio
suggested downthread[2] of that. I.e. to simply insert a DB record on
pre-receive/post-receive, and have all "real" work done async by a job
that's following that.

I used MySQL as a dumb queue, but this can also be done with a text
file. I'd end up with 3x records:

 A. pre-receive: what the client wants to push
 B. pre-receive (at the very end): what we accepted the client pushing (after running all checks)
 C. post-receive: logging the same rev range, hopefully

As you've found you won't always get a "C", so such a following job
currently needs to repair such records after the fact, i.e. be ready to
inspect the repo and see if the push actually happened. You also won't
get "C" if you OK'd a push, but had a race for the *.lock file, or other
similar push contention.

I think one objection some might have to this is that we'd like to not
wait for the post-receive hook, which I falsely recalled would be
impacted by this, but as Jeff King points out at [3] we'd do the same
either way, so this change won't impact that either way.

But I think one thing that will be negatively impacted is touched upon
by your:

    "Since the client has died[...]Not executing it after an interrupted
    yet "successful" push can lead to inconsistencies".

You don't know if it died, you just got killed by SIGPIPE. That can
happen for any number of reasons, the client might have gotten its data,
you just can't reach it anymore.

I think you're right that it's inconsistent, but wrong about this
"fixing" the inconsistency. I.e. the inconsistency is just being moved
from the server-side to the client-side.

I'd think that in this case we'd very much want to give the client the
benefit of the doubt, because the server can more easily work around
issues with its hook workflow.

But a client inherently can't work around not getting an "OK & flush"
while waiting for the hook to execute, and in the meantime the cat
unplugged the WiFi, so we won't be getting the "OK" at all.

I.e. if put a "sleep 30" in a post-receive hook, push, and in the middle
of that sleep have the client disconnect from the network the push will
have gone through.

But aren't we changing what gets shown to the client from being a
successful push to a non-successful one, since they never got the
report() (and therefore flush) they were expecting? *Goes and tests*

Yes, e.g. with this:

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 49b846d9605..fc273e7dc4d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2567,9 +2567,9 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
                if (pack_lockfile)
                        unlink_or_warn(pack_lockfile);
                if (report_status_v2)
-                       report_v2(commands, unpack_status);
+                       exit(0);
                else if (report_status)
-                       report(commands, unpack_status);
+                       exit(0);
                run_receive_hook(commands, "post-receive", 1,
                                 &push_options);
                run_update_post_hook(commands);

I've made an attempt to emulate that, and running that we'll get various
test suite failures with e.g.:
    
    + git push dest HEAD
    Enumerating objects: 4, done.
    Counting objects: 100% (4/4), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (4/4), 1.25 KiB | 1.25 MiB/s, done.
    Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
    send-pack: unexpected disconnect while reading sideband packet
    fatal: the remote end hung up unexpectedly
    error: last command exited with $?=128

Which is a race we'll definitely see now, but would increase in
frequency if we wait longer in sending the OK.

But as noted in [1] there's a way forward to having our cake & eating it
too. I.e. when we into that on the server-side we can try a little
harder not to die, and run post-receive anyway, and in either case it
would be nice if we'd run it after disconnecting from the client, so it
doesn't have to wait for it.

1. https://lore.kernel.org/git/5795EB1C.1080102@nokia.com/
2. https://lore.kernel.org/git/xmqqlh0d8w6v.fsf@gitster.mtv.corp.google.com/
3. https://lore.kernel.org/git/20160803193018.ydhmxntikhyowmjz@sigill.intra.peff.net/