git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: do people find t5504.8 flaky?
Date: Tue, 12 Nov 2019 20:03:37 -0500	[thread overview]
Message-ID: <20191113010337.GA20023@sigill.intra.peff.net> (raw)
In-Reply-To: <20191113000747.GQ4348@szeder.dev>

On Wed, Nov 13, 2019 at 01:07:47AM +0100, SZEDER Gábor wrote:

> > No, I haven't seen it fail, nor does running with --stress turn up
> > anything.
> 
> I can reproduce the failure fairly quickly with '-r 1,8 --stress' (and
> nr of jobs = 4x cores).

I was able to reproduce with that, too (I wonder why I had so much
trouble before? Maybe "-r 1,8" simply gives many more opportunities per
second).

Running with "-x", I don't see any sign of a premature SIGPIPE death,
which is good:

  + git push --porcelain dst master:refs/heads/test
  remote: fatal: object of unexpected type        
  error: remote unpack failed: unpack-objects abnormal exit
  error: failed to push some refs to 'dst'
  + exit_code=1

The "exit_code" assignment there is from test_must_fail. And looking at
the code, I think we handle a remote hangup well, because the writer is
actually a pack-objects sub-process. So we hit this code in send-pack.c:

          rc = finish_command(&po);
          if (rc) {
                  /*
                   * For a normal non-zero exit, we assume pack-objects wrote
                   * something useful to stderr. For death by signal, though,
                   * we should mention it to the user. The exception is SIGPIPE
                   * (141), because that's a normal occurrence if the remote end
                   * hangs up (and we'll report that by trying to read the unpack
                   * status).
                   */
                  if (rc > 128 && rc != 141)
                          error("pack-objects died of signal %d", rc - 128);
                  return -1;
          }

But that error return might explain things.

The difference in your traces seems to be that in the failing case we
exit immediately after seeing the unpack error:

>   packet:     sideband< \2fatal: object of unexpected type
>   packet: receive-pack> unpack unpack-objects abnormal exit
>   packet: receive-pack> ng refs/heads/test unpacker error
>   packet: receive-pack> 0000
>   packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
>   packet: receive-pack> 0000
>   packet:     sideband< 0000
>   packet:         push< unpack unpack-objects abnormal exit
>   + test_cmp exp act
>   --- exp 2019-11-12 23:40:33.131679990 +0000
>   +++ act 2019-11-12 23:40:33.203680114 +0000
>   @@ -1,2 +0,0 @@
>   -To dst
>   -!      refs/heads/master:refs/heads/test       [remote rejected] (unpacker error)
>   error: last command exited with $?=1
>   not ok 8 - push with receive.fsckobjects

Note that "receive-pack" sent us the "ng refs/heads/test" line. And our
sideband demuxer even saw it! But it never made it to push.

Whereas in the successful case:

>   packet:     sideband< \2fatal: object of unexpected type
>   packet: receive-pack> unpack unpack-objects abnormal exit
>   packet: receive-pack> ng refs/heads/test unpacker error
>   packet: receive-pack> 0000
>   packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
>   packet:         push< unpack unpack-objects abnormal exit
>   packet:         push< ng refs/heads/test unpacker error
>   packet:         push< 0000
>   packet: receive-pack> 0000
>   packet:     sideband< 0000

We do see "push" getting it.

So for some reason we're deciding not to read the status from out
sideband demuxer. And that is probably because of this code in
send_pack():

                  if (pack_objects(out, remote_refs, extra_have, args) < 0) {
                          for (ref = remote_refs; ref; ref = ref->next)
                                  ref->status = REF_STATUS_NONE;
                          if (args->stateless_rpc)
                                  close(out);
                          if (git_connection_is_socket(conn))
                                  shutdown(fd[0], SHUT_WR);
  
                          /*
                           * Do not even bother with the return value; we know we
                           * are failing, and just want the error() side effects.
                           */
                          if (status_report)
                                  receive_unpack_status(&reader);
  
                          if (use_sideband) {
                                  close(demux.out);
                                  finish_async(&demux);
                          }
                          fd[1] = -1;
                          return -1;
                  }

If we saw a pack-objects error, then we do bother to receive the unpack
status, but we don't stick around to get the individual ref status, and
everything stays as REF_STATUS_NONE.

And looking in transport_print_push_status(), NONE is skipped when
printing. We probably ought to be seeding it with a different status
there, but there's not really one for "hey, _we_ failed". Let's hack it
to use REMOTE_REJECT, put put in our own message, like so:

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..f6d609a244 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -565,8 +565,10 @@ int send_pack(struct send_pack_args *args,
 
 	if (need_pack_data && cmds_sent) {
 		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
+			for (ref = remote_refs; ref; ref = ref->next) {
+				ref->status = REF_STATUS_REMOTE_REJECT;
+				ref->remote_status = xstrdup("pack-objects error");
+			}
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))

Now when the stress-test fails, I get:

  --- exp	2019-11-13 00:48:53.269943875 +0000
  +++ act	2019-11-13 00:48:53.317943957 +0000
  @@ -1,2 +1,2 @@
   To dst
  -!	refs/heads/master:refs/heads/test	[remote rejected] (unpacker error)
  +!	refs/heads/master:refs/heads/test	[remote rejected] (pack-objects error)

So we're definitely on the right track! Of course we could say "unpacker
error" and the test failure would go away. But it feels a bit like
lying, since the remote didn't tell us that.

So really, what we want for this case is to just get the remote status,
like so:

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..d4db965b96 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -565,19 +565,19 @@ int send_pack(struct send_pack_args *args,
 
 	if (need_pack_data && cmds_sent) {
 		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))
 				shutdown(fd[0], SHUT_WR);
 
 			/*
 			 * Do not even bother with the return value; we know we
-			 * are failing, and just want the error() side effects.
+			 * are failing, and just want the error() side effects,
+			 * as well as marking refs with the remote status (if
+			 * we get one).
 			 */
 			if (status_report)
-				receive_unpack_status(&reader);
+				receive_status(&reader, remote_refs);
 
 			if (use_sideband) {
 				close(demux.out);

I was worried at first that we might make things worse for the case that
the network has hung up completely (which would cause pack-objects to
fail, but also cause us to not get anything from the remote). But this
is really no worse. Even in the existing code, we'd complain to stderr
about trying to read the unpack status. And then when we read the remote
ref status, as soon as we see a bad packet we just quietly stop reading
(thus leaving any unmentioned refs as EXPECTING_REPORT).

So with that second patch above, the test failure goes away for me.

-Peff

  reply	other threads:[~2019-11-13  1:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-23  2:45 do people find t5504.8 flaky? Junio C Hamano
2019-04-23  3:02 ` Jeff King
2019-11-13  0:07   ` SZEDER Gábor
2019-11-13  1:03     ` Jeff King [this message]
2019-11-13  2:07       ` Jeff King
2019-11-18 22:30         ` SZEDER Gábor
2019-11-18 23:25           ` Randall S. Becker
2019-11-13  3:47       ` Junio C Hamano
2019-04-29 13:36 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191113010337.GA20023@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).