All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
@ 2019-02-14 21:33 Johannes Schindelin via GitGitGadget
  2019-02-14 21:33 ` [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it Johannes Schindelin via GitGitGadget
  2019-02-14 22:17 ` [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Randall S. Becker
  0 siblings, 2 replies; 21+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-02-14 21:33 UTC (permalink / raw)
  To: git; +Cc: Randall Becker, Junio C Hamano

The last-minute patch to replace /dev/zero with a Perl script snippet broke
the Linux part of the CI builds on Azure Pipelines: it timed out. The
culprit is the rb/no-dev-zero-in-test branch (see the build for this branch 
here [https://dev.azure.com/gitgitgadget/git/_build/results?buildId=1727]).

All of master, next, jch and pu are broken that way. You might see it in the
commit status of the active branches
[https://github.com/gitgitgadget/git/branches/active].

Turns out that it is that particular Perl script snippet which for some
reason hangs the build. If you kill it, t5562.15 succeeds, if you don't kill
it, it will hang indefinitely (or until killed).

Sadly, despite my earnest attempts, I could not figure out why it hangs in
those Linux agents (I could not reproduce that hang locally), or for that
matter, why it does not hang in the Windows and macOS agents.

Let's avoid that hang. This patch fixes things on Azure Pipelines, and my
hope is that it also fixes the hang on NonStop.

Johannes Schindelin (1):
  tests: teach the test-tool to generate NUL bytes and use it

 Makefile                               |  1 +
 t/helper/test-genzeros.c               | 22 ++++++++++++++++++++++
 t/helper/test-tool.c                   |  1 +
 t/helper/test-tool.h                   |  1 +
 t/t5562-http-backend-content-length.sh |  2 +-
 t/test-lib-functions.sh                |  8 +-------
 6 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 t/helper/test-genzeros.c


base-commit: 8989e1950a845ceeb186d490321a4f917ca4de47
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-126%2Fdscho%2Ffix-t5562-hang-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-126/dscho/fix-t5562-hang-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/126
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it
  2019-02-14 21:33 [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Johannes Schindelin via GitGitGadget
@ 2019-02-14 21:33 ` Johannes Schindelin via GitGitGadget
  2019-02-14 22:13   ` Junio C Hamano
  2019-02-14 22:17 ` [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Randall S. Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-02-14 21:33 UTC (permalink / raw)
  To: git; +Cc: Randall Becker, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In cc95bc2025 (t5562: replace /dev/zero with a pipe from
generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
is not available on NonStop, apparently) by a Perl script snippet to
generate NUL bytes.

Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.

Worse, this also hangs in the Ubuntu 16.04 agents of the CI builds on
Azure Pipelines: for some reason, the Perl script snippet that is run
via `generate_zero_bytes` in t5562's 'CONTENT_LENGTH overflow ssite_t'
test case tries to write out an infinite amount of NUL bytes unless a
broken pipe is encountered, that snippet never encounters the broken
pipe, and keeps going until the build times out.

Oddly enough, this does not reproduce on the Windows and macOS agents,
nor in a local Ubuntu 18.04.

This developer tried for a day to figure out the exact circumstances
under which this hang happens, to no avail, the details remain a
mystery.

In the end, though, what counts is that this here change incidentally
fixes that hang (maybe also on NonStop?). Even more positively, it gets
rid of yet another unnecessary Perl invocation.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Makefile                               |  1 +
 t/helper/test-genzeros.c               | 22 ++++++++++++++++++++++
 t/helper/test-tool.c                   |  1 +
 t/helper/test-tool.h                   |  1 +
 t/t5562-http-backend-content-length.sh |  2 +-
 t/test-lib-functions.sh                |  8 +-------
 6 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 t/helper/test-genzeros.c

diff --git a/Makefile b/Makefile
index f0b2299172..c5240942f2 100644
--- a/Makefile
+++ b/Makefile
@@ -740,6 +740,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-genzeros.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-hash-speed.o
diff --git a/t/helper/test-genzeros.c b/t/helper/test-genzeros.c
new file mode 100644
index 0000000000..f607f800a9
--- /dev/null
+++ b/t/helper/test-genzeros.c
@@ -0,0 +1,22 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+
+int cmd__genzeros(int argc, const char **argv)
+{
+	long count;
+
+	if (argc > 2) {
+		fprintf(stderr, "usage: %s [<count>]\n", argv[0]);
+		return 1;
+	}
+
+	count = argc > 1 ? strtol(argv[1], NULL, 0) : -1L;
+
+	while (count < 0 || count--) {
+		if (putchar(0) == EOF)
+			return -1;
+	}
+
+	return 0;
+}
+
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 50c55f8b1a..99db7409b8 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,6 +19,7 @@ static struct test_cmd cmds[] = {
 	{ "dump-untracked-cache", cmd__dump_untracked_cache },
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
+	{ "genzeros", cmd__genzeros },
 	{ "hashmap", cmd__hashmap },
 	{ "hash-speed", cmd__hash_speed },
 	{ "index-version", cmd__index_version },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index a563df49bf..25abed1cf2 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
+int cmd__genzeros(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
 int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index bbadde2c6e..c0105423e6 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -143,7 +143,7 @@ test_expect_success GZIP 'push gzipped empty' '
 
 test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	NOT_FIT_IN_SSIZE=$(ssize_b100dots) &&
-	generate_zero_bytes infinity  | env \
+	generate_zero_bytes | env \
 		CONTENT_TYPE=application/x-git-upload-pack-request \
 		QUERY_STRING=/repo.git/git-upload-pack \
 		PATH_TRANSLATED="$PWD"/.git/git-upload-pack \
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 094c07748a..80402a428f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -120,13 +120,7 @@ remove_cr () {
 # If $1 is 'infinity', output forever or until the receiving pipe stops reading,
 # whichever comes first.
 generate_zero_bytes () {
-	perl -e 'if ($ARGV[0] == "infinity") {
-		while (-1) {
-			print "\0"
-		}
-	} else {
-		print "\0" x $ARGV[0]
-	}' "$@"
+	test-tool genzeros "$@"
 }
 
 # In some bourne shell implementations, the "unset" builtin returns
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it
  2019-02-14 21:33 ` [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it Johannes Schindelin via GitGitGadget
@ 2019-02-14 22:13   ` Junio C Hamano
  2019-02-15 14:59     ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2019-02-14 22:13 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Randall Becker, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In cc95bc2025 (t5562: replace /dev/zero with a pipe from
> generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
> is not available on NonStop, apparently) by a Perl script snippet to
> generate NUL bytes.
>
> Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.
> ...
> In the end, though, what counts is that this here change incidentally
> fixes that hang (maybe also on NonStop?). Even more positively, it gets
> rid of yet another unnecessary Perl invocation.

Thanks for a quick band-aid.

Will apply directly to 'master' so that we won't forget before -rc2.

In the meantime, perhaps somebody who knows Perl interpreter's
quirks well can tell us what's different between the obvious and
simple C program and an equivalent in Perl to convince us why this
is a good solution to the problem.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 21:33 [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Johannes Schindelin via GitGitGadget
  2019-02-14 21:33 ` [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it Johannes Schindelin via GitGitGadget
@ 2019-02-14 22:17 ` Randall S. Becker
  2019-02-14 22:33   ` Max Kirillov
  2019-02-14 22:38   ` Junio C Hamano
  1 sibling, 2 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-14 22:17 UTC (permalink / raw)
  To: 'Johannes Schindelin via GitGitGadget', git
  Cc: 'Junio C Hamano', 'Max Kirillov'

On February 14, 2019 16:33, Johannes Schindelin wrote:
> To: git@vger.kernel.org
> Cc: Randall Becker <rsbecker@nexbridge.com>; Junio C Hamano
> <gitster@pobox.com>
> Subject: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> 
> The last-minute patch to replace /dev/zero with a Perl script snippet broke
> the Linux part of the CI builds on Azure Pipelines: it timed out. The culprit is
> the rb/no-dev-zero-in-test branch (see the build for this branch here
> [https://dev.azure.com/gitgitgadget/git/_build/results?buildId=1727]).
> 
> All of master, next, jch and pu are broken that way. You might see it in the
> commit status of the active branches
> [https://github.com/gitgitgadget/git/branches/active].
> 
> Turns out that it is that particular Perl script snippet which for some reason
> hangs the build. If you kill it, t5562.15 succeeds, if you don't kill it, it will hang
> indefinitely (or until killed).
> 
> Sadly, despite my earnest attempts, I could not figure out why it hangs in
> those Linux agents (I could not reproduce that hang locally), or for that
> matter, why it does not hang in the Windows and macOS agents.
> 
> Let's avoid that hang. This patch fixes things on Azure Pipelines, and my hope
> is that it also fixes the hang on NonStop.
> 
> Johannes Schindelin (1):
>   tests: teach the test-tool to generate NUL bytes and use it
> 
>  Makefile                               |  1 +
>  t/helper/test-genzeros.c               | 22 ++++++++++++++++++++++
>  t/helper/test-tool.c                   |  1 +
>  t/helper/test-tool.h                   |  1 +
>  t/t5562-http-backend-content-length.sh |  2 +-
>  t/test-lib-functions.sh                |  8 +-------
>  6 files changed, 27 insertions(+), 8 deletions(-)  create mode 100644
> t/helper/test-genzeros.c
> 
> 
> base-commit: 8989e1950a845ceeb186d490321a4f917ca4de47
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-
> 126%2Fdscho%2Ffix-t5562-hang-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-126/dscho/fix-
> t5562-hang-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/126

Unfortunately, subtest 13 still hangs on NonStop, even with this patch, so our Pipeline still hangs. I'm glad it's better on Azure, but I don't think this actually addresses the root cause of the hang. This is now the fourth attempt at fixing this. Is it possible this is not the test that is failing, but actually the git-http-backend? The code is not in a loop, if that helps. It is not consuming any significant cycles. I don't know that part of the code at all, sadly. The code is here:

* in the operating system from here up *
  cleanup_children + 0x5D0 (UCr)
  cleanup_children_on_exit + 0x70 (UCr)
  git_atexit_dispatch + 0x200 (UCr)
  __process_atexit_functions + 0xA0 (DLL zcredll)
  CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
  exit + 0x2A0 (DLL zcrtldll)
  die_webcgi + 0x240 (UCr)
  die_errno + 0x360 (UCr)
  write_or_die + 0x1C0 (UCr)
  end_headers + 0x1A0 (UCr)
  die_webcgi + 0x220 (UCr)
  die + 0x320 (UCr)
  inflate_request + 0x520 (UCr)
  run_service + 0xC20 (UCr)
  service_rpc + 0x530 (UCr)
  cmd_main + 0xD00 (UCr)
  main + 0x190 (UCr)

Best guess is that a signal (SIGCHLD?) is possibly getting eaten or neglected somewhere between the test, perl, and git-http-backend.

Stuck,
Randall


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:17 ` [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Randall S. Becker
@ 2019-02-14 22:33   ` Max Kirillov
  2019-02-14 22:59     ` Randall S. Becker
  2019-02-14 23:04     ` Randall S. Becker
  2019-02-14 22:38   ` Junio C Hamano
  1 sibling, 2 replies; 21+ messages in thread
From: Max Kirillov @ 2019-02-14 22:33 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Junio C Hamano', 'Max Kirillov'

On Thu, Feb 14, 2019 at 05:17:26PM -0500, Randall S. Becker wrote:
> Unfortunately, subtest 13 still hangs on NonStop, even
> with this patch, so our Pipeline still hangs. I'm glad
> it's better on Azure, but I don't think this actually
> addresses the root cause of the hang. This is now the
> fourth attempt at fixing this. Is it possible this is not
> the test that is failing, but actually the
> git-http-backend? The code is not in a loop, if that
> helps. It is not consuming any significant cycles. I don't
> know that part of the code at all, sadly. The code is
> here:
> 
> * in the operating system from here up *
>   cleanup_children + 0x5D0 (UCr)

... so does the process which the stack was taken from has
any children processes still?

I could imagine if a child somehow manages to end up in
uninterruptible sleep, then probably it would never complete
this way, wouldn't it?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:17 ` [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Randall S. Becker
  2019-02-14 22:33   ` Max Kirillov
@ 2019-02-14 22:38   ` Junio C Hamano
  2019-02-14 23:01     ` Randall S. Becker
  2019-02-18 21:06     ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2019-02-14 22:38 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> Unfortunately, subtest 13 still hangs on NonStop, even with this
> patch, so our Pipeline still hangs. I'm glad it's better on Azure,
> but I don't think this actually addresses the root cause of the
> hang.

Sigh.

> possible this is not the test that is failing, but actually the
> git-http-backend? The code is not in a loop, if that helps. It is
> not consuming any significant cycles. I don't know that part of
> the code at all, sadly. The code is here:
>
> * in the operating system from here up *
>   cleanup_children + 0x5D0 (UCr)
>   cleanup_children_on_exit + 0x70 (UCr)
>   git_atexit_dispatch + 0x200 (UCr)
>   __process_atexit_functions + 0xA0 (DLL zcredll)
>   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
>   exit + 0x2A0 (DLL zcrtldll)
>   die_webcgi + 0x240 (UCr)
>   die_errno + 0x360 (UCr)
>   write_or_die + 0x1C0 (UCr)
>   end_headers + 0x1A0 (UCr)
>   die_webcgi + 0x220 (UCr)
>   die + 0x320 (UCr)
>   inflate_request + 0x520 (UCr)
>   run_service + 0xC20 (UCr)
>   service_rpc + 0x530 (UCr)
>   cmd_main + 0xD00 (UCr)
>   main + 0x190 (UCr)
>
> Best guess is that a signal (SIGCHLD?) is possibly getting eaten
> or neglected somewhere between the test, perl, and
> git-http-backend.

So we are trying to die(), which actually happens in die_webcgi(),
and then try to write some message _but_ notice an error inside
write_or_dir() and try to exit because we do not want to recurse
forever trying to die, giving a message to say how/why we died, and
die because failing to give that message, forever.

But in our attempt to exit(), we try to "cleanup children" and that
is what gets stuck.

One big difference before and after the /dev/zero change is that the
process is now on a downstream of the pipe.  If we prepare a large
file with a finite size full of NULs and replace /dev/null with it,
instead of feeding NULs from the pipe, would it change the equation?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:33   ` Max Kirillov
@ 2019-02-14 22:59     ` Randall S. Becker
  2019-02-14 23:04     ` Randall S. Becker
  1 sibling, 0 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-14 22:59 UTC (permalink / raw)
  To: 'Max Kirillov'
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Junio C Hamano'

On February 14, 2019 17:34, Max Kirillov wrote:
> On Thu, Feb 14, 2019 at 05:17:26PM -0500, Randall S. Becker wrote:
> > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > I don't think this actually addresses the root cause of the hang. This
> > is now the fourth attempt at fixing this. Is it possible this is not
> > the test that is failing, but actually the git-http-backend? The code
> > is not in a loop, if that helps. It is not consuming any significant
> > cycles. I don't know that part of the code at all, sadly. The code is
> > here:
> >
> > * in the operating system from here up *
> >   cleanup_children + 0x5D0 (UCr)
> 
> ... so does the process which the stack was taken from has any children
> processes still?
> 
> I could imagine if a child somehow manages to end up in uninterruptible
> sleep, then probably it would never complete this way, wouldn't it?

From what I can tell (previously reported), none of the children are dead.
git-http-backend is waiting and the others are in a read state. I can try to
get full stack traces once the current cycle ends.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:38   ` Junio C Hamano
@ 2019-02-14 23:01     ` Randall S. Becker
  2019-02-18 20:41       ` Johannes Schindelin
  2019-02-18 21:06     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 21+ messages in thread
From: Randall S. Becker @ 2019-02-14 23:01 UTC (permalink / raw)
  To: 'Junio C Hamano'
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On February 14, 2019 17:39, Junio C Hamano wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@gmail.com>;
> git@vger.kernel.org; 'Max Kirillov' <max@max630.net>
> Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> 
> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
> > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > I don't think this actually addresses the root cause of the hang.
> 
> Sigh.
> 
> > possible this is not the test that is failing, but actually the
> > git-http-backend? The code is not in a loop, if that helps. It is not
> > consuming any significant cycles. I don't know that part of the code
> > at all, sadly. The code is here:
> >
> > * in the operating system from here up *
> >   cleanup_children + 0x5D0 (UCr)
> >   cleanup_children_on_exit + 0x70 (UCr)
> >   git_atexit_dispatch + 0x200 (UCr)
> >   __process_atexit_functions + 0xA0 (DLL zcredll)
> >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> >   exit + 0x2A0 (DLL zcrtldll)
> >   die_webcgi + 0x240 (UCr)
> >   die_errno + 0x360 (UCr)
> >   write_or_die + 0x1C0 (UCr)
> >   end_headers + 0x1A0 (UCr)
> >   die_webcgi + 0x220 (UCr)
> >   die + 0x320 (UCr)
> >   inflate_request + 0x520 (UCr)
> >   run_service + 0xC20 (UCr)
> >   service_rpc + 0x530 (UCr)
> >   cmd_main + 0xD00 (UCr)
> >   main + 0x190 (UCr)
> >
> > Best guess is that a signal (SIGCHLD?) is possibly getting eaten or
> > neglected somewhere between the test, perl, and git-http-backend.
> 
> So we are trying to die(), which actually happens in die_webcgi(), and
then try
> to write some message _but_ notice an error inside
> write_or_dir() and try to exit because we do not want to recurse forever
> trying to die, giving a message to say how/why we died, and die because
> failing to give that message, forever.
> 
> But in our attempt to exit(), we try to "cleanup children" and that is
what gets
> stuck.
> 
> One big difference before and after the /dev/zero change is that the
process
> is now on a downstream of the pipe.  If we prepare a large file with a
finite
> size full of NULs and replace /dev/null with it, instead of feeding NULs
from
> the pipe, would it change the equation?

Doubtful. The processes are still around, and are waiting on read but not
actively reading (CPU time is not going up, so we're not reading an infinite
stream). To me, this is a pipe situation where there is simply nothing
waiting on the pipe (maybe a flush missing?). I'm grasping are straws
without knowing the actual process architecture of the test to debug it.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:33   ` Max Kirillov
  2019-02-14 22:59     ` Randall S. Becker
@ 2019-02-14 23:04     ` Randall S. Becker
  1 sibling, 0 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-14 23:04 UTC (permalink / raw)
  To: 'Max Kirillov'
  Cc: 'Johannes Schindelin via GitGitGadget',
	git, 'Junio C Hamano'

On February 14, 2019 17:34, Max Kirillov wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@gmail.com>;
> git@vger.kernel.org; 'Junio C Hamano' <gitster@pobox.com>; 'Max Kirillov'
> <max@max630.net>
> Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> 
> On Thu, Feb 14, 2019 at 05:17:26PM -0500, Randall S. Becker wrote:
> > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > I don't think this actually addresses the root cause of the hang. This
> > is now the fourth attempt at fixing this. Is it possible this is not
> > the test that is failing, but actually the git-http-backend? The code
> > is not in a loop, if that helps. It is not consuming any significant
> > cycles. I don't know that part of the code at all, sadly. The code is
> > here:
> >
> > * in the operating system from here up *
> >   cleanup_children + 0x5D0 (UCr)
> 
> ... so does the process which the stack was taken from has any children
> processes still?
> 
> I could imagine if a child somehow manages to end up in uninterruptible
> sleep, then probably it would never complete this way, wouldn't it?

Yes, this is typical of a hang. Two processes reading on the same pipe, or
one reading on a pipe and the other waiting for something that never shows.
Or one process attempting both reading and writing on the same pipe (no
kernel threads here). I did not see anything actually in sleep. perl is in a
close call, waiting for its output to be consumed - which never happens,
making me suspect this is a pipe setup issue, but I can't demonstrate that,
sorry.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it
  2019-02-14 22:13   ` Junio C Hamano
@ 2019-02-15 14:59     ` Johannes Schindelin
  2019-02-15 17:41       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2019-02-15 14:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git, Randall Becker

Hi Junio,

On Thu, 14 Feb 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In cc95bc2025 (t5562: replace /dev/zero with a pipe from
> > generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
> > is not available on NonStop, apparently) by a Perl script snippet to
> > generate NUL bytes.
> >
> > Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.
> > ...
> > In the end, though, what counts is that this here change incidentally
> > fixes that hang (maybe also on NonStop?). Even more positively, it gets
> > rid of yet another unnecessary Perl invocation.
> 
> Thanks for a quick band-aid.
> 
> Will apply directly to 'master' so that we won't forget before -rc2.

Thank you, that will be good, as the builds still seem to fail. All of
them.

> In the meantime, perhaps somebody who knows Perl interpreter's
> quirks well can tell us what's different between the obvious and
> simple C program and an equivalent in Perl to convince us why this
> is a good solution to the problem.

I would be *really* curious about that, too.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it
  2019-02-15 14:59     ` Johannes Schindelin
@ 2019-02-15 17:41       ` Junio C Hamano
  2019-02-18 15:55         ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2019-02-15 17:41 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Randall Becker

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Thu, 14 Feb 2019, Junio C Hamano wrote:
>
>> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>> writes:
>> 
>> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> >
>> > In cc95bc2025 (t5562: replace /dev/zero with a pipe from
>> > generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
>> > is not available on NonStop, apparently) by a Perl script snippet to
>> > generate NUL bytes.
>> >
>> > Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.
>> > ...
>> > In the end, though, what counts is that this here change incidentally
>> > fixes that hang (maybe also on NonStop?). Even more positively, it gets
>> > rid of yet another unnecessary Perl invocation.
>> 
>> Thanks for a quick band-aid.
>> 
>> Will apply directly to 'master' so that we won't forget before -rc2.
>
> Thank you, that will be good, as the builds still seem to fail. All of
> them.

Actually, I am really tempted to instead not apply this, but revert
that genzerobytes Perl thing.  This assumes that your Azure thing
did not have the breakage before we applied that patchset.  What do
you think?

Trying four or more possible band-aids that may or may not work
without knowing what the real cause of the hangs are is not
something I want to see people spend excessive time of theirs on
this close to the final.  I'd rather avoid distraction and see
people spend their cycles on bugs that matter, instead of trying to
chase test breakages that have always been present for those without
/dev/zero.  I am not fundamentally opposed to supporting those
without /dev/zero but I'd prefer to see it happen in 'pu' until we
identify and fix the real cause---which may well be a real bug in
the http-backend stuff---and the time to do that is not during the
rc period where we close the tree for new features and non-regression
fixes.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it
  2019-02-15 17:41       ` Junio C Hamano
@ 2019-02-18 15:55         ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2019-02-18 15:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git, Randall Becker

Hi Junio,

On Fri, 15 Feb 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Thu, 14 Feb 2019, Junio C Hamano wrote:
> >
> >> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> >> writes:
> >> 
> >> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >> >
> >> > In cc95bc2025 (t5562: replace /dev/zero with a pipe from
> >> > generate_zero_bytes, 2019-02-09), we replaced usage of /dev/zero (which
> >> > is not available on NonStop, apparently) by a Perl script snippet to
> >> > generate NUL bytes.
> >> >
> >> > Sadly, it does not seem to work on NonStop, as t5562 reportedly hangs.
> >> > ...
> >> > In the end, though, what counts is that this here change incidentally
> >> > fixes that hang (maybe also on NonStop?). Even more positively, it gets
> >> > rid of yet another unnecessary Perl invocation.
> >> 
> >> Thanks for a quick band-aid.
> >> 
> >> Will apply directly to 'master' so that we won't forget before -rc2.
> >
> > Thank you, that will be good, as the builds still seem to fail. All of
> > them.
> 
> Actually, I am really tempted to instead not apply this, but revert
> that genzerobytes Perl thing.  This assumes that your Azure thing
> did not have the breakage before we applied that patchset.  What do
> you think?

Honestly, I don't care, as long as we can stop *all* of the CI builds
failing, soon.

So whether you revert that commit, or apply mine, it is up to you, but
I'd really rather not see another batch of useless builds.

> Trying four or more possible band-aids that may or may not work
> without knowing what the real cause of the hangs are is not
> something I want to see people spend excessive time of theirs on
> this close to the final.  I'd rather avoid distraction and see
> people spend their cycles on bugs that matter, instead of trying to
> chase test breakages that have always been present for those without
> /dev/zero.  I am not fundamentally opposed to supporting those
> without /dev/zero but I'd prefer to see it happen in 'pu' until we
> identify and fix the real cause---which may well be a real bug in
> the http-backend stuff---and the time to do that is not during the
> rc period where we close the tree for new features and non-regression
> fixes.

While I am quite positive that my patch helps (even on NonStop, because
reportedly the hang is in .13 while my fix is about .15, which the commit
that caused the regression also touches), I am okay with holding off until
after v2.21.0.

But in the long run, the only sane thing really is to move more and more
functionality in the test suite for which we rely on Perl into test-tool.
It is really the only thing that makes sense, from the perspectives of
performance, robustness and portability.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 23:01     ` Randall S. Becker
@ 2019-02-18 20:41       ` Johannes Schindelin
  2019-02-18 20:46         ` Randall S. Becker
  2019-02-18 21:49         ` Randall S. Becker
  0 siblings, 2 replies; 21+ messages in thread
From: Johannes Schindelin @ 2019-02-18 20:41 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

Hi Randall,

On Thu, 14 Feb 2019, Randall S. Becker wrote:

> On February 14, 2019 17:39, Junio C Hamano wrote:
> > To: Randall S. Becker <rsbecker@nexbridge.com>
> > Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@gmail.com>;
> > git@vger.kernel.org; 'Max Kirillov' <max@max630.net>
> > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> > 
> > "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> > 
> > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > > I don't think this actually addresses the root cause of the hang.
> > 
> > Sigh.
> > 
> > > possible this is not the test that is failing, but actually the
> > > git-http-backend? The code is not in a loop, if that helps. It is not
> > > consuming any significant cycles. I don't know that part of the code
> > > at all, sadly. The code is here:
> > >
> > > * in the operating system from here up *
> > >   cleanup_children + 0x5D0 (UCr)
> > >   cleanup_children_on_exit + 0x70 (UCr)
> > >   git_atexit_dispatch + 0x200 (UCr)
> > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > >   exit + 0x2A0 (DLL zcrtldll)
> > >   die_webcgi + 0x240 (UCr)
> > >   die_errno + 0x360 (UCr)
> > >   write_or_die + 0x1C0 (UCr)
> > >   end_headers + 0x1A0 (UCr)
> > >   die_webcgi + 0x220 (UCr)
> > >   die + 0x320 (UCr)
> > >   inflate_request + 0x520 (UCr)
> > >   run_service + 0xC20 (UCr)
> > >   service_rpc + 0x530 (UCr)
> > >   cmd_main + 0xD00 (UCr)
> > >   main + 0x190 (UCr)
> > >
> > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten or
> > > neglected somewhere between the test, perl, and git-http-backend.
> > 
> > So we are trying to die(), which actually happens in die_webcgi(), and
> then try
> > to write some message _but_ notice an error inside
> > write_or_dir() and try to exit because we do not want to recurse forever
> > trying to die, giving a message to say how/why we died, and die because
> > failing to give that message, forever.
> > 
> > But in our attempt to exit(), we try to "cleanup children" and that is
> what gets
> > stuck.
> > 
> > One big difference before and after the /dev/zero change is that the
> process
> > is now on a downstream of the pipe.  If we prepare a large file with a
> finite
> > size full of NULs and replace /dev/null with it, instead of feeding NULs
> from
> > the pipe, would it change the equation?
> 
> Doubtful. The processes are still around, and are waiting on read but not
> actively reading (CPU time is not going up, so we're not reading an infinite
> stream). To me, this is a pipe situation where there is simply nothing
> waiting on the pipe (maybe a flush missing?). I'm grasping are straws
> without knowing the actual process architecture of the test to debug it.

So could you try with this patch?

-- snipsnap --
diff --git a/http-backend.c b/http-backend.c
index d5cea0329a..7c1b4a2555 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name, int out, int buffer_input, ss
 
 done:
 	git_inflate_end(&stream);
+	close(0);
 	close(out);
 	free(full_request);
 }


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 20:41       ` Johannes Schindelin
@ 2019-02-18 20:46         ` Randall S. Becker
  2019-02-18 20:57           ` Max Kirillov
  2019-02-18 20:57           ` Randall S. Becker
  2019-02-18 21:49         ` Randall S. Becker
  1 sibling, 2 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-18 20:46 UTC (permalink / raw)
  To: 'Johannes Schindelin'
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On February 18, 2019 15:41, Johannes Schindelin wrote:
> On Thu, 14 Feb 2019, Randall S. Becker wrote:
> 
> > On February 14, 2019 17:39, Junio C Hamano wrote:
> > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@gmail.com>;
> > > git@vger.kernel.org; 'Max Kirillov' <max@max630.net>
> > > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in
> > > v2.21.0-rc1
> > >
> > > "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> > >
> > > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > > patch, so our Pipeline still hangs. I'm glad it's better on Azure,
> > > > but I don't think this actually addresses the root cause of the
hang.
> > >
> > > Sigh.
> > >
> > > > possible this is not the test that is failing, but actually the
> > > > git-http-backend? The code is not in a loop, if that helps. It is
> > > > not consuming any significant cycles. I don't know that part of
> > > > the code at all, sadly. The code is here:
> > > >
> > > > * in the operating system from here up *
> > > >   cleanup_children + 0x5D0 (UCr)
> > > >   cleanup_children_on_exit + 0x70 (UCr)
> > > >   git_atexit_dispatch + 0x200 (UCr)
> > > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > > >   exit + 0x2A0 (DLL zcrtldll)
> > > >   die_webcgi + 0x240 (UCr)
> > > >   die_errno + 0x360 (UCr)
> > > >   write_or_die + 0x1C0 (UCr)
> > > >   end_headers + 0x1A0 (UCr)
> > > >   die_webcgi + 0x220 (UCr)
> > > >   die + 0x320 (UCr)
> > > >   inflate_request + 0x520 (UCr)
> > > >   run_service + 0xC20 (UCr)
> > > >   service_rpc + 0x530 (UCr)
> > > >   cmd_main + 0xD00 (UCr)
> > > >   main + 0x190 (UCr)
> > > >
> > > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten
> > > > or neglected somewhere between the test, perl, and git-http-backend.
> > >
> > > So we are trying to die(), which actually happens in die_webcgi(),
> > > and
> > then try
> > > to write some message _but_ notice an error inside
> > > write_or_dir() and try to exit because we do not want to recurse
> > > forever trying to die, giving a message to say how/why we died, and
> > > die because failing to give that message, forever.
> > >
> > > But in our attempt to exit(), we try to "cleanup children" and that
> > > is
> > what gets
> > > stuck.
> > >
> > > One big difference before and after the /dev/zero change is that the
> > process
> > > is now on a downstream of the pipe.  If we prepare a large file with
> > > a
> > finite
> > > size full of NULs and replace /dev/null with it, instead of feeding
> > > NULs
> > from
> > > the pipe, would it change the equation?
> >
> > Doubtful. The processes are still around, and are waiting on read but
> > not actively reading (CPU time is not going up, so we're not reading
> > an infinite stream). To me, this is a pipe situation where there is
> > simply nothing waiting on the pipe (maybe a flush missing?). I'm
> > grasping are straws without knowing the actual process architecture of
the
> test to debug it.
> 
> So could you try with this patch?
> 
> -- snipsnap --
> diff --git a/http-backend.c b/http-backend.c index d5cea0329a..7c1b4a2555
> 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> int out, int buffer_input, ss
> 
>  done:
>  	git_inflate_end(&stream);
> +	close(0);
>  	close(out);
>  	free(full_request);
>  }

In isolation or with the other fixes associated with t5562? Or, which
baseline commit should I use? 8989e1950a or d92031209a or some other?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 20:46         ` Randall S. Becker
@ 2019-02-18 20:57           ` Max Kirillov
  2019-02-19 14:09             ` Johannes Schindelin
  2019-02-18 20:57           ` Randall S. Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Max Kirillov @ 2019-02-18 20:57 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Johannes Schindelin', 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On Mon, Feb 18, 2019 at 03:46:34PM -0500, Randall S. Becker wrote:
> On February 18, 2019 15:41, Johannes Schindelin wrote:
> > So could you try with this patch?
> > 
> > -- snipsnap --
> > diff --git a/http-backend.c b/http-backend.c index d5cea0329a..7c1b4a2555
> > 100644
> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> > int out, int buffer_input, ss
> > 
> >  done:
> >  	git_inflate_end(&stream);
> > +	close(0);
> >  	close(out);
> >  	free(full_request);
> >  }
> 
> In isolation or with the other fixes associated with t5562? Or, which
> baseline commit should I use? 8989e1950a or d92031209a or some other?

As far as I understand, it should be tried instead of 
https://public-inbox.org/git/20181124093719.10705-1-max@max630.net/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 20:46         ` Randall S. Becker
  2019-02-18 20:57           ` Max Kirillov
@ 2019-02-18 20:57           ` Randall S. Becker
  1 sibling, 0 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-18 20:57 UTC (permalink / raw)
  To: 'Johannes Schindelin'
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On February 18, 2019 15:47, I wrote:
> On February 18, 2019 15:41, Johannes Schindelin wrote:
> > On Thu, 14 Feb 2019, Randall S. Becker wrote:
> >
> > > On February 14, 2019 17:39, Junio C Hamano wrote:
> > > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > > Cc: 'Johannes Schindelin via GitGitGadget'
> > > > <gitgitgadget@gmail.com>; git@vger.kernel.org; 'Max Kirillov'
> > > > <max@max630.net>
> > > > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in
> > > > v2.21.0-rc1
> > > >
> > > > "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> > > >
> > > > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > > > patch, so our Pipeline still hangs. I'm glad it's better on
> > > > > Azure, but I don't think this actually addresses the root cause
> > > > > of the
> hang.
> > > >
> > > > Sigh.
> > > >
> > > > > possible this is not the test that is failing, but actually the
> > > > > git-http-backend? The code is not in a loop, if that helps. It
> > > > > is not consuming any significant cycles. I don't know that part
> > > > > of the code at all, sadly. The code is here:
> > > > >
> > > > > * in the operating system from here up *
> > > > >   cleanup_children + 0x5D0 (UCr)
> > > > >   cleanup_children_on_exit + 0x70 (UCr)
> > > > >   git_atexit_dispatch + 0x200 (UCr)
> > > > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > > > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > > > >   exit + 0x2A0 (DLL zcrtldll)
> > > > >   die_webcgi + 0x240 (UCr)
> > > > >   die_errno + 0x360 (UCr)
> > > > >   write_or_die + 0x1C0 (UCr)
> > > > >   end_headers + 0x1A0 (UCr)
> > > > >   die_webcgi + 0x220 (UCr)
> > > > >   die + 0x320 (UCr)
> > > > >   inflate_request + 0x520 (UCr)
> > > > >   run_service + 0xC20 (UCr)
> > > > >   service_rpc + 0x530 (UCr)
> > > > >   cmd_main + 0xD00 (UCr)
> > > > >   main + 0x190 (UCr)
> > > > >
> > > > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten
> > > > > or neglected somewhere between the test, perl, and git-http-
> backend.
> > > >
> > > > So we are trying to die(), which actually happens in die_webcgi(),
> > > > and
> > > then try
> > > > to write some message _but_ notice an error inside
> > > > write_or_dir() and try to exit because we do not want to recurse
> > > > forever trying to die, giving a message to say how/why we died,
> > > > and die because failing to give that message, forever.
> > > >
> > > > But in our attempt to exit(), we try to "cleanup children" and
> > > > that is
> > > what gets
> > > > stuck.
> > > >
> > > > One big difference before and after the /dev/zero change is that
> > > > the
> > > process
> > > > is now on a downstream of the pipe.  If we prepare a large file
> > > > with a
> > > finite
> > > > size full of NULs and replace /dev/null with it, instead of
> > > > feeding NULs
> > > from
> > > > the pipe, would it change the equation?
> > >
> > > Doubtful. The processes are still around, and are waiting on read
> > > but not actively reading (CPU time is not going up, so we're not
> > > reading an infinite stream). To me, this is a pipe situation where
> > > there is simply nothing waiting on the pipe (maybe a flush
> > > missing?). I'm grasping are straws without knowing the actual
> > > process architecture of
> the
> > test to debug it.
> >
> > So could you try with this patch?
> >
> > -- snipsnap --
> > diff --git a/http-backend.c b/http-backend.c index
> > d5cea0329a..7c1b4a2555
> > 100644
> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> > int out, int buffer_input, ss
> >
> >  done:
> >  	git_inflate_end(&stream);
> > +	close(0);
> >  	close(out);
> >  	free(full_request);
> >  }
> 
> In isolation or with the other fixes associated with t5562? Or, which
baseline
> commit should I use? 8989e1950a or d92031209a or some other?

It works fine based off of d92031209a - the test actually runs a bit faster
based on the wall clock.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-14 22:38   ` Junio C Hamano
  2019-02-14 23:01     ` Randall S. Becker
@ 2019-02-18 21:06     ` Ævar Arnfjörð Bjarmason
  2019-02-18 21:17       ` Max Kirillov
  1 sibling, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-18 21:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Randall S. Becker, 'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'


On Thu, Feb 14 2019, Junio C Hamano wrote:

> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
>
>> Unfortunately, subtest 13 still hangs on NonStop, even with this
>> patch, so our Pipeline still hangs. I'm glad it's better on Azure,
>> but I don't think this actually addresses the root cause of the
>> hang.
>
> Sigh.
>
>> possible this is not the test that is failing, but actually the
>> git-http-backend? The code is not in a loop, if that helps. It is
>> not consuming any significant cycles. I don't know that part of
>> the code at all, sadly. The code is here:
>>
>> * in the operating system from here up *
>>   cleanup_children + 0x5D0 (UCr)
>>   cleanup_children_on_exit + 0x70 (UCr)
>>   git_atexit_dispatch + 0x200 (UCr)
>>   __process_atexit_functions + 0xA0 (DLL zcredll)
>>   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
>>   exit + 0x2A0 (DLL zcrtldll)
>>   die_webcgi + 0x240 (UCr)
>>   die_errno + 0x360 (UCr)
>>   write_or_die + 0x1C0 (UCr)
>>   end_headers + 0x1A0 (UCr)
>>   die_webcgi + 0x220 (UCr)
>>   die + 0x320 (UCr)
>>   inflate_request + 0x520 (UCr)
>>   run_service + 0xC20 (UCr)
>>   service_rpc + 0x530 (UCr)
>>   cmd_main + 0xD00 (UCr)
>>   main + 0x190 (UCr)
>>
>> Best guess is that a signal (SIGCHLD?) is possibly getting eaten
>> or neglected somewhere between the test, perl, and
>> git-http-backend.
>
> So we are trying to die(), which actually happens in die_webcgi(),
> and then try to write some message _but_ notice an error inside
> write_or_dir() and try to exit because we do not want to recurse
> forever trying to die, giving a message to say how/why we died, and
> die because failing to give that message, forever.
>
> But in our attempt to exit(), we try to "cleanup children" and that
> is what gets stuck.

I have not paid enough attention to this thread to say if this is dumb,
but just in case it's useful. For this class of problem where cleanup
bites you for whatever reason in Perl, you can sometimes use this:

    use POSIX ();
    POSIX::_exit($code);

This will call "exit" from "stdlib" instead of Perl's "exit". So go away
*now* and let the OS deal with the mess. Perl's will run around cleaning
up stuff, freeing memory, running destructors etc, all of which might
have side effects you don't want/care about, and might (as maybe in this
case?) cause some hang.

> One big difference before and after the /dev/zero change is that the
> process is now on a downstream of the pipe.  If we prepare a large
> file with a finite size full of NULs and replace /dev/null with it,
> instead of feeding NULs from the pipe, would it change the equation?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 21:06     ` Ævar Arnfjörð Bjarmason
@ 2019-02-18 21:17       ` Max Kirillov
  2019-02-19 14:13         ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Max Kirillov @ 2019-02-18 21:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Randall S. Becker,
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On Mon, Feb 18, 2019 at 10:06:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> But in our attempt to exit(), we try to "cleanup children" and that
>> is what gets stuck.
> 
> I have not paid enough attention to this thread to say if this is dumb,
> but just in case it's useful. For this class of problem where cleanup
> bites you for whatever reason in Perl, you can sometimes use this:
> 
>     use POSIX ();
>     POSIX::_exit($code);
> 
> This will call "exit" from "stdlib" instead of Perl's "exit". So go away
> *now* and let the OS deal with the mess. Perl's will run around cleaning
> up stuff, freeing memory, running destructors etc, all of which might
> have side effects you don't want/care about, and might (as maybe in this
> case?) cause some hang.

* Perl is running in foreground, so it cannot outlive test
  case and spoil the subsequent ones.
* From the dumps I have an impression that it waits
  legitimately - there are other processes to wait for.
  And anyway the waits happen before perl script comes to
  its exit.

Though I am already convinced that I should have done the
helper in C. Let's see when I have time to fix it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 20:41       ` Johannes Schindelin
  2019-02-18 20:46         ` Randall S. Becker
@ 2019-02-18 21:49         ` Randall S. Becker
  1 sibling, 0 replies; 21+ messages in thread
From: Randall S. Becker @ 2019-02-18 21:49 UTC (permalink / raw)
  To: 'Johannes Schindelin'
  Cc: 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git, 'Max Kirillov'

On February 18, 2019 15:41, Johannes Schindelin wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: 'Junio C Hamano' <gitster@pobox.com>; 'Johannes Schindelin via
> GitGitGadget' <gitgitgadget@gmail.com>; git@vger.kernel.org; 'Max
Kirillov'
> <max@max630.net>
> Subject: RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> 
> Hi Randall,
> 
> On Thu, 14 Feb 2019, Randall S. Becker wrote:
> 
> > On February 14, 2019 17:39, Junio C Hamano wrote:
> > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@gmail.com>;
> > > git@vger.kernel.org; 'Max Kirillov' <max@max630.net>
> > > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in
> > > v2.21.0-rc1
> > >
> > > "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> > >
> > > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > > patch, so our Pipeline still hangs. I'm glad it's better on Azure,
> > > > but I don't think this actually addresses the root cause of the
hang.
> > >
> > > Sigh.
> > >
> > > > possible this is not the test that is failing, but actually the
> > > > git-http-backend? The code is not in a loop, if that helps. It is
> > > > not consuming any significant cycles. I don't know that part of
> > > > the code at all, sadly. The code is here:
> > > >
> > > > * in the operating system from here up *
> > > >   cleanup_children + 0x5D0 (UCr)
> > > >   cleanup_children_on_exit + 0x70 (UCr)
> > > >   git_atexit_dispatch + 0x200 (UCr)
> > > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > > >   exit + 0x2A0 (DLL zcrtldll)
> > > >   die_webcgi + 0x240 (UCr)
> > > >   die_errno + 0x360 (UCr)
> > > >   write_or_die + 0x1C0 (UCr)
> > > >   end_headers + 0x1A0 (UCr)
> > > >   die_webcgi + 0x220 (UCr)
> > > >   die + 0x320 (UCr)
> > > >   inflate_request + 0x520 (UCr)
> > > >   run_service + 0xC20 (UCr)
> > > >   service_rpc + 0x530 (UCr)
> > > >   cmd_main + 0xD00 (UCr)
> > > >   main + 0x190 (UCr)
> > > >
> > > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten
> > > > or neglected somewhere between the test, perl, and git-http-backend.
> > >
> > > So we are trying to die(), which actually happens in die_webcgi(),
> > > and
> > then try
> > > to write some message _but_ notice an error inside
> > > write_or_dir() and try to exit because we do not want to recurse
> > > forever trying to die, giving a message to say how/why we died, and
> > > die because failing to give that message, forever.
> > >
> > > But in our attempt to exit(), we try to "cleanup children" and that
> > > is
> > what gets
> > > stuck.
> > >
> > > One big difference before and after the /dev/zero change is that the
> > process
> > > is now on a downstream of the pipe.  If we prepare a large file with
> > > a
> > finite
> > > size full of NULs and replace /dev/null with it, instead of feeding
> > > NULs
> > from
> > > the pipe, would it change the equation?
> >
> > Doubtful. The processes are still around, and are waiting on read but
> > not actively reading (CPU time is not going up, so we're not reading
> > an infinite stream). To me, this is a pipe situation where there is
> > simply nothing waiting on the pipe (maybe a flush missing?). I'm
> > grasping are straws without knowing the actual process architecture of
the
> test to debug it.
> 
> So could you try with this patch?
> 
> -- snipsnap --
> diff --git a/http-backend.c b/http-backend.c index d5cea0329a..7c1b4a2555
> 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> int out, int buffer_input, ss
> 
>  done:
>  	git_inflate_end(&stream);
> +	close(0);
>  	close(out);
>  	free(full_request);
>  }

Based on d62dad7a7d (v2.21.0-rc0) undoing all of the fixes, this change on
its own makes no difference to the hang situation - it is still there as it
was when originally reported. Using POSIX::_exit does not change the outcome
of the test either on its own or in conjunction with this fix.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 20:57           ` Max Kirillov
@ 2019-02-19 14:09             ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2019-02-19 14:09 UTC (permalink / raw)
  To: Max Kirillov
  Cc: Randall S. Becker, 'Junio C Hamano',
	'Johannes Schindelin via GitGitGadget',
	git

Hi Max & Randall,

On Mon, 18 Feb 2019, Max Kirillov wrote:

> On Mon, Feb 18, 2019 at 03:46:34PM -0500, Randall S. Becker wrote:
> > On February 18, 2019 15:41, Johannes Schindelin wrote:
> > > So could you try with this patch?
> > > 
> > > -- snipsnap --
> > > diff --git a/http-backend.c b/http-backend.c index d5cea0329a..7c1b4a2555
> > > 100644
> > > --- a/http-backend.c
> > > +++ b/http-backend.c
> > > @@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name,
> > > int out, int buffer_input, ss
> > > 
> > >  done:
> > >  	git_inflate_end(&stream);
> > > +	close(0);
> > >  	close(out);
> > >  	free(full_request);
> > >  }
> > 
> > In isolation or with the other fixes associated with t5562? Or, which
> > baseline commit should I use? 8989e1950a or d92031209a or some other?
> 
> As far as I understand, it should be tried instead of 
> https://public-inbox.org/git/20181124093719.10705-1-max@max630.net/

Don't ask me which patches you need to try this with. I was just answering
to the observation that the hangs happen in the gzip-encoding test cases,
and this was my guess as to what is going wrong there. I have no idea
whether other patches try to address the same thing, are obsoleted by this
diff, or whatever, as I have not been able to pay attention to the Git
mailing list in the past 5 days.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
  2019-02-18 21:17       ` Max Kirillov
@ 2019-02-19 14:13         ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2019-02-19 14:13 UTC (permalink / raw)
  To: Max Kirillov
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Randall S. Becker, 'Johannes Schindelin via GitGitGadget',
	git

[-- Attachment #1: Type: text/plain, Size: 1410 bytes --]

Hi Max,

On Mon, 18 Feb 2019, Max Kirillov wrote:

> On Mon, Feb 18, 2019 at 10:06:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> But in our attempt to exit(), we try to "cleanup children" and that
> >> is what gets stuck.
> > 
> > I have not paid enough attention to this thread to say if this is dumb,
> > but just in case it's useful. For this class of problem where cleanup
> > bites you for whatever reason in Perl, you can sometimes use this:
> > 
> >     use POSIX ();
> >     POSIX::_exit($code);
> > 
> > This will call "exit" from "stdlib" instead of Perl's "exit". So go away
> > *now* and let the OS deal with the mess. Perl's will run around cleaning
> > up stuff, freeing memory, running destructors etc, all of which might
> > have side effects you don't want/care about, and might (as maybe in this
> > case?) cause some hang.
> 
> * Perl is running in foreground, so it cannot outlive test
>   case and spoil the subsequent ones.
> * From the dumps I have an impression that it waits
>   legitimately - there are other processes to wait for.
>   And anyway the waits happen before perl script comes to
>   its exit.
> 
> Though I am already convinced that I should have done the
> helper in C. Let's see when I have time to fix it.

Perl has this nasty habit of causing unexpected problems, doesn't it?

I look forward to that dependency on Perl going away, thank you so much!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-02-19 14:13 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14 21:33 [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Johannes Schindelin via GitGitGadget
2019-02-14 21:33 ` [PATCH 1/1] tests: teach the test-tool to generate NUL bytes and use it Johannes Schindelin via GitGitGadget
2019-02-14 22:13   ` Junio C Hamano
2019-02-15 14:59     ` Johannes Schindelin
2019-02-15 17:41       ` Junio C Hamano
2019-02-18 15:55         ` Johannes Schindelin
2019-02-14 22:17 ` [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1 Randall S. Becker
2019-02-14 22:33   ` Max Kirillov
2019-02-14 22:59     ` Randall S. Becker
2019-02-14 23:04     ` Randall S. Becker
2019-02-14 22:38   ` Junio C Hamano
2019-02-14 23:01     ` Randall S. Becker
2019-02-18 20:41       ` Johannes Schindelin
2019-02-18 20:46         ` Randall S. Becker
2019-02-18 20:57           ` Max Kirillov
2019-02-19 14:09             ` Johannes Schindelin
2019-02-18 20:57           ` Randall S. Becker
2019-02-18 21:49         ` Randall S. Becker
2019-02-18 21:06     ` Ævar Arnfjörð Bjarmason
2019-02-18 21:17       ` Max Kirillov
2019-02-19 14:13         ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.