* t5800-*.sh: Intermittent test failures
@ 2011-08-09 18:30 Ramsay Jones
  2011-08-11 21:39 ` Sverre Rabbelier
  0 siblings, 1 reply; 12+ messages in thread
From: Ramsay Jones @ 2011-08-09 18:30 UTC (permalink / raw)
  To: GIT Mailing-list; +Cc: srabbelier, Jeff King, Jonathan Nieder, Junio C Hamano


I've noticed some intermittent test failures in t5800-*.sh on Linux
recently. The failures (test #7 onwards) are due to a git-push to a
remote, via the git-remote-test helper, hanging in git-fast-import.

git-bisect fingers the following commit:

    a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
    commit a515ebe9f1ac9bc248c12a291dc008570de505ca
    Author: Sverre Rabbelier <srabbelier@gmail.com>
    Date:   Sat Jul 16 15:03:40 2011 +0200

        transport-helper: implement marks location as capability

        Now that the gitdir location is exported as an environment variable
        this can be implemented elegantly without requiring any explicit
        flushes nor an ad-hoc exchange of values.

        Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
        Acked-by: Jeff King <peff@peff.net>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

    :100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23 e9c832bfd3da7db771cc2113027d3e590dc51d59 M      git-remote-testgit.py
    :100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c 74c3122df1835c45a6b621205fb18b4fc89af366 M      transport-helper.c

which didn't seem too likely at first, but it does reduce the size of the
fast-import stream (by moving the import/export marks filenames to the
command line). This could change the timings enough to cause the problem.

I set various environment variables (e.g. GIT_TRANSLOOP_DEBUG, GIT_DEBUG_TESTGIT etc.)
to get some additional clues, and also looked at the stack frames of all of the
processes in the hung pipeline, which looks like:

    git(push)->git-remote-test->git(fast-import)->git-fast-import

The git-fast-import is hung in the read() syscall waiting for data which will
never arrive. This is because the git(fast-export) process, started by the above
git(push), executes (producing its data on stdout) and completes successfully
and exits *before* the above git-fast-import process starts.

I haven't looked to see how the git(fast-export)/git-fast-import processes are
plumbed together, but there seems to be a synchronization problem somewhere ...

Unfortunately, I don't have time at the moment to finish debugging this, so I
was hoping someone who knows the code better than me could fix it up ...
Thanks! :-P

[I've included the stack frames (from the above pipeline) below in case they help]

ATB,
Ramsay Jones


[git-fast-import]
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7dd6033 in read () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d774f8 in _IO_file_read () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7d788c0 in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7d78fbb in _IO_default_uflow () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7d7a31d in __uflow () from /lib/tls/i686/cmov/libc.so.6
#6  0xb7d742a0 in getc () from /lib/tls/i686/cmov/libc.so.6
#7  0x0807e203 in strbuf_getwholeline (sb=0x80e348c, fp=0xb7e53420, term=10)
    at strbuf.c:361
#8  0x0807e262 in strbuf_getline (sb=0x80e348c, fp=0xb7e53420, term=10)
    at strbuf.c:376
#9  0x0804f681 in read_next_command () at fast-import.c:1853
#10 0x0805368b in main (argc=4, argv=0xbf8eac74) at fast-import.c:3295
(gdb)

[git(fast-import)]
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7dcf0b3 in __waitpid_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08129706 in wait_or_whine (pid=6200, argv0=0x81e4070 "git-fast-import", 
    silent_exec_failure=1) at run-command.c:105
#3  0x0812a08f in finish_command (cmd=0xbfe42874) at run-command.c:415
#4  0x0812a0be in run_command (cmd=0xbfe42874) at run-command.c:423
#5  0x0812a1bf in run_command_v_opt (argv=0xbfe429dc, opt=8)
    at run-command.c:443
#6  0x0804c12d in execv_dashed_external (argv=0xbfe429dc) at git.c:489
#7  0x0804c192 in run_argv (argcp=0xbfe42950, argv=0xbfe42954) at git.c:507
#8  0x0804c321 in main (argc=4, argv=0xbfe429dc) at git.c:577
(gdb)

[git-remote-test]
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f230b3 in __waitpid_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x080f8fc0 in posix_waitpid (self=0x0, args=0xb7d615ec)
    at ../Modules/posixmodule.c:5636
... [snipped as uninteresting!]
(gdb) 

[git(push)]
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7dde033 in read () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d7f4f8 in _IO_file_read () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7d808c0 in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7d80fbb in _IO_default_uflow () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7d8231d in __uflow () from /lib/tls/i686/cmov/libc.so.6
#6  0xb7d7c2a0 in getc () from /lib/tls/i686/cmov/libc.so.6
#7  0x08138d6b in strbuf_getwholeline (sb=0xbfb662c8, fp=0x81e4760, term=10)
    at strbuf.c:361
#8  0x08138dca in strbuf_getline (sb=0xbfb662c8, fp=0x81e4760, term=10)
    at strbuf.c:376
#9  0x0813ffe3 in recvline_fh (helper=0x81e4760, buffer=0xbfb662c8)
    at transport-helper.c:51
#10 0x081400be in recvline (helper=0x81e44a0, buffer=0xbfb662c8)
    at transport-helper.c:64
#11 0x08141a6e in push_update_refs_status (data=0x81e44a0, 
    remote_refs=0x81e48e8) at transport-helper.c:652
#12 0x08141e80 in push_refs_with_export (transport=0x81e4450, 
    remote_refs=0x81e48e8, flags=0) at transport-helper.c:759
#13 0x08141f74 in push_refs (transport=0x81e4450, remote_refs=0x81e48e8, 
    flags=0) at transport-helper.c:783
#14 0x0813f846 in transport_push (transport=0x81e4450, refspec_nr=1, 
    refspec=0x81e43e8, flags=0, nonfastforward=0xbfb6642c) at transport.c:1044
#15 0x080a3bda in push_with_options (transport=0x81e4450, flags=0)
    at builtin/push.c:131
#16 0x080a3ea7 in do_push (repo=0x0, flags=0) at builtin/push.c:209
#17 0x080a4377 in cmd_push (argc=0, argv=0xbfb668c8, prefix=0x0)
    at builtin/push.c:265
#18 0x0804bf3f in run_builtin (p=0x81977b4, argc=1, argv=0xbfb668c8)
    at git.c:302
#19 0x0804c0a5 in handle_internal_command (argc=1, argv=0xbfb668c8)
    at git.c:460
#20 0x0804c185 in run_argv (argcp=0xbfb66840, argv=0xbfb66844) at git.c:504
#21 0x0804c321 in main (argc=1, argv=0xbfb668c8) at git.c:577
(gdb)


* Re: t5800-*.sh: Intermittent test failures
  2011-08-09 18:30 t5800-*.sh: Intermittent test failures Ramsay Jones
@ 2011-08-11 21:39 ` Sverre Rabbelier
  2011-08-13 20:51   ` Ramsay Jones
  2011-09-04 19:06   ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Sverre Rabbelier @ 2011-08-11 21:39 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: GIT Mailing-list, Jeff King, Jonathan Nieder, Junio C Hamano

Heya,

On Tue, Aug 9, 2011 at 20:30, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
> The git-fast-import is hung in the read() syscall waiting for data which will
> never arrive. This is because the git(fast-export) process, started by the above
> git(push), executes (producing its data on stdout) and completes successfully
> and exits *before* the above git-fast-import process starts.
>
> I haven't looked to see how the git(fast-export)/git-fast-import processes are
> plumbed together, but there seems to be a synchronization problem somewhere ...

This seems odd: before the fast-export process is even started, its
stdout is wired to the stdin of the helper (and thus the fast-import
process). What indication do you have that fast-import hasn't started
and that fast-export has finished?
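
A minimal Python sketch of the pipe behaviour this objection relies on. It is not git code; the function name and payload are illustrative assumptions. Once a pipe exists, bytes written to it stay in the kernel buffer even if the writer closes its end before anyone reads, so a reader that starts late should still get the data followed by end-of-file (assuming the payload fits in the pipe buffer).

```python
import os

def late_reader_roundtrip(payload):
    """Write to a pipe, close the write end, and only then start reading."""
    read_fd, write_fd = os.pipe()
    try:
        # The "exporter" finishes completely before the "importer" reads.
        os.write(write_fd, payload)
    finally:
        os.close(write_fd)
    try:
        data = os.read(read_fd, 4096)  # buffered bytes are still delivered
        eof = os.read(read_fd, 4096)   # b"" because no write end remains open
    finally:
        os.close(read_fd)
    return data, eof

if __name__ == "__main__":
    print(late_reader_roundtrip(b"done\n"))  # (b'done\n', b'')
```

If this holds for the real pipeline, an exporter that finishes early cannot by itself lose data; the hang would need some other cause, such as a write end that is never closed.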

Also, you say git remote-test everywhere, but it should be git
remote-testgit, typo?

-- 
Cheers,

Sverre Rabbelier


* Re: t5800-*.sh: Intermittent test failures
  2011-08-11 21:39 ` Sverre Rabbelier
@ 2011-08-13 20:51   ` Ramsay Jones
  2011-09-04 19:06   ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Ramsay Jones @ 2011-08-13 20:51 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: GIT Mailing-list, Jeff King, Jonathan Nieder, Junio C Hamano

Sverre Rabbelier wrote:
>> I haven't looked to see how the git(fast-export)/git-fast-import processes are
>> plumbed together, but there seems to be a synchronization problem somewhere ...
> 
> This seems odd: before the fast-export process is even started, its
> stdout is wired to the stdin of the helper (and thus the fast-import
> process). What indication do you have that fast-import hasn't started
> and that fast-export has finished?

I indulged in a spot of "printf debugging". ;-)  See more below.

> Also, you say git remote-test everywhere, but it should be git
> remote-testgit, typo?

Yep. [It was actually caused by a cut/paste/edit of pstree output (pstree
truncates long fields); not that you could guess that! ;-P ]

So ...

I added some additional debug code to transport-helper.c (see below) in
addition to creating debug output files from the git-fast-import/export
commands. (I won't show the code for this debug output; it wouldn't be
hard to imagine! :-)

In addition to the uninteresting "printf debugging" info, I used
gettimeofday() to show the start and end times for the git(fast-export)
process and the start time for git-fast-import. The last hunk below,
for instance, shows the code to output the git(fast-export) end time ...

--- >8 ----
diff --git a/transport-helper.c b/transport-helper.c
index 74c3122..7c9d881 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -132,6 +132,8 @@ static struct child_process *get_helper(struct transport *transport)
 	snprintf(git_dir_buf, sizeof(git_dir_buf), "%s=%s", GIT_DIR_ENVIRONMENT, get_git_dir());
 	helper->env = helper_env;
 
+	if (debug)
+		fprintf(stderr, "Debug: start remote helper: <%s>\n", helper->argv[0]);
 	code = start_command(helper);
 	if (code < 0 && errno == ENOENT)
 		die("Unable to find remote helper for '%s'", data->name);
@@ -376,6 +378,8 @@ static int get_importer(struct transport *transport, struct child_process *fasti
 	fastimport->argv[1] = "--quiet";
 
 	fastimport->git_cmd = 1;
+	if (debug)
+		fprintf(stderr, "Debug: get_importer, start fast-import\n");
 	return start_command(fastimport);
 }
 
@@ -403,6 +407,8 @@ static int get_exporter(struct transport *transport,
 		fastexport->argv[argc++] = revlist_args->items[i].string;
 
 	fastexport->git_cmd = 1;
+	if (debug)
+		fprintf(stderr, "Debug: get_exporter, start fast-export\n");
 	return start_command(fastexport);
 }
 
@@ -756,6 +762,11 @@ static int push_refs_with_export(struct transport *transport,
 
 	if (finish_command(&exporter))
 		die("Error while running fast-export");
+	if (debug) {
+		struct timeval tv;
+		gettimeofday(&tv, NULL);
+		fprintf(stderr, "fast-export finished @ %lds %ldus\n", tv.tv_sec, tv.tv_usec);
+	}
 	push_update_refs_status(data, remote_refs);
 	return 0;
 }
--- >8 ----

The debug output from "./t5800-remote-helpers.sh -v" ends like this:

... [snipped]
Debug: Capabilities complete.
Debug: Remote helper: Waiting...
Got command 'list' with args ''
? refs/heads/new
? refs/heads/master
@refs/heads/master HEAD
Debug: Remote helper: <- ? refs/heads/new
Debug: Remote helper: Waiting...
Debug: Remote helper: <- ? refs/heads/master
Debug: Remote helper: Waiting...
Debug: Remote helper: <- @refs/heads/master HEAD
Debug: Remote helper: Waiting...
Debug: Remote helper: <- 
Debug: Read ref listing.
Debug: Remote helper: -> export
Debug: get_exporter, start fast-export
fast-export finished @ 1313178956s 366398us
Debug: Remote helper: Waiting...
Got command 'export' with args ''

The fast-export debug file looks like:

--- >8 ----
fast-export: pid = 11096 (ppid 11090)
started @ 1313178956s 364790us
arg: <fast-export>
arg: <--use-done-feature>
arg: <--export-marks=.git/info/fast-import/a08486a77c5cf1b4aa17fa9e64673e352ebe1a96/testgit.marks>
arg: <--import-marks=.git/info/fast-import/a08486a77c5cf1b4aa17fa9e64673e352ebe1a96/testgit.marks>
arg: <^refs/testgit/origin/master>
arg: <refs/heads/master>
----end args----: <>
handle object: <ab28ce7f215103f3f4bf70fd439541590dccc91b>
handle commit: <refs/heads/master>
main: <done!>
--- >8 ----

The fast-import debug file looks like:

--- >8 ----
fast-import: pid = 11104 (ppid = 11103)
started @ 1313178956s 382392us
main: <start-up>
main: <start-up #1>
main: <before loop>
--- >8 ----

Note that git(fast-export) executes in 1608 micro-seconds and finishes
15994 micro-seconds before git-fast-import starts.
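
The two intervals can be re-derived from the timestamps in the debug output above. This is only a worked check of the arithmetic; the helper name `to_us` is made up for the illustration.

```python
def to_us(seconds, microseconds):
    """Convert a (tv_sec, tv_usec) pair to a single microsecond count."""
    return seconds * 1_000_000 + microseconds

# Timestamps copied from the debug output quoted above.
export_start = to_us(1313178956, 364790)  # fast-export "started @"
export_end = to_us(1313178956, 366398)    # "fast-export finished @"
import_start = to_us(1313178956, 382392)  # fast-import "started @"

export_runtime_us = export_end - export_start  # 1608
gap_us = import_start - export_end             # 15994

if __name__ == "__main__":
    print(export_runtime_us, gap_us)  # 1608 15994
```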

ATB,
Ramsay Jones


* Re: t5800-*.sh: Intermittent test failures
  2011-08-11 21:39 ` Sverre Rabbelier
  2011-08-13 20:51   ` Ramsay Jones
@ 2011-09-04 19:06   ` Junio C Hamano
  2011-09-08 17:42     ` Ramsay Jones
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-09-04 19:06 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Ramsay Jones, GIT Mailing-list, Jeff King, Jonathan Nieder

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Heya,
>
> On Tue, Aug 9, 2011 at 20:30, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>> The git-fast-import is hung in the read() syscall waiting for data which will
>> never arrive. This is because the git(fast-export) process, started by the above
>> git(push), executes (producing its data on stdout) and completes successfully
>> and exits *before* the above git-fast-import process starts.
>>
>> I haven't looked to see how the git(fast-export)/git-fast-import processes are
>> plumbed together, but there seems to be a synchronization problem somewhere ...
>
> This seems odd: before the fast-export process is even started, its
> stdout is wired to the stdin of the helper (and thus the fast-import
> process). What indication do you have that fast-import hasn't started
> and that fast-export has finished?
>
> Also, you say git remote-test everywhere, but it should be git
> remote-testgit, typo?

FWIW, I have been seeing this every once in a while.


* Re: t5800-*.sh: Intermittent test failures
  2011-09-04 19:06   ` Junio C Hamano
@ 2011-09-08 17:42     ` Ramsay Jones
  2011-09-08 18:20       ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Ramsay Jones @ 2011-09-08 17:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Sverre Rabbelier, GIT Mailing-list, Jeff King, Jonathan Nieder

Junio C Hamano wrote:
> Sverre Rabbelier <srabbelier@gmail.com> writes:
>> On Tue, Aug 9, 2011 at 20:30, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>> The git-fast-import is hung in the read() syscall waiting for data which will
>>> never arrive. This is because the git(fast-export) process, started by the above
>>> git(push), executes (producing its data on stdout) and completes successfully
>>> and exits *before* the above git-fast-import process starts.
>>>
>>> I haven't looked to see how the git(fast-export)/git-fast-import processes are
>>> plumbed together, but there seems to be a synchronization problem somewhere ...
>> This seems odd: before the fast-export process is even started, its
>> stdout is wired to the stdin of the helper (and thus the fast-import
>> process). What indication do you have that fast-import hasn't started
>> and that fast-export has finished?
>>
>> Also, you say git remote-test everywhere, but it should be git
>> remote-testgit, typo?
> 
> FWIW, I have been seeing this every once in a while.

Good to know I'm not alone ;-P

Unfortunately, I haven't had the time to debug this further than I've
already reported ...

As I said, it's obviously a process plumbing/synchronization problem; the reading
end of the fast-export output pipe must be open for read by someone (probably by
its parent), otherwise fast-export would receive SIGPIPE (also, the output is small
enough not to fill the pipe) rather than exiting with success.
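
The reasoning above can be sketched in Python. This is an illustration only, not git code, and `write_small` is a hypothetical name. Python ignores SIGPIPE by default, so a write to a pipe with no reader shows up as `BrokenPipeError` (EPIPE); a C program with the default signal disposition would instead be killed by SIGPIPE.

```python
import os

def write_small(keep_reader_open):
    """Write a small payload to a pipe, with or without an open read end."""
    read_fd, write_fd = os.pipe()
    if not keep_reader_open:
        os.close(read_fd)  # nobody holds the read end any more
    try:
        # Small payload: it fits in the pipe buffer, so the write never blocks.
        os.write(write_fd, b"commit refs/heads/master\n")
        return "ok"
    except BrokenPipeError:
        return "EPIPE"
    finally:
        os.close(write_fd)
        if keep_reader_open:
            os.close(read_fd)

if __name__ == "__main__":
    print(write_small(True))   # ok
    print(write_small(False))  # EPIPE
```

A writer that succeeds therefore only shows that some process still had the read end open, not that anything ever read the data.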

When I run the tests with "make test >test-out", I see a failure rate of about
1 in 10. If I then set the debug environment variables (GIT_TRANSPORT_HELPER_DEBUG,
GIT_TRANSLOOP_DEBUG and GIT_DEBUG_TESTGIT) and run the test script directly (-v),
then the failure rate goes up to about 1 in 3.

Well, ... I added debug code to git-fast-{im,ex}port which writes the debug info
to a file (can't write to stdout/stderr obviously), so that may well be affecting
the timing enough to increase the chance of a failure. Having said that, if I'm
listening to music (rhythmbox) at the same time, then the failure rate seems to
increase ...

ATB,
Ramsay Jones


* Re: t5800-*.sh: Intermittent test failures
  2011-09-08 17:42     ` Ramsay Jones
@ 2011-09-08 18:20       ` Jeff King
  2011-09-11 19:14         ` Ramsay Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-09-08 18:20 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Junio C Hamano, Sverre Rabbelier, GIT Mailing-list, Jonathan Nieder

On Thu, Sep 08, 2011 at 06:42:11PM +0100, Ramsay Jones wrote:

> When I run the tests with "make test >test-out", I see a failure rate of about
> 1 in 10. If I then set the debug environment variables (GIT_TRANSPORT_HELPER_DEBUG,
> GIT_TRANSLOOP_DEBUG and GIT_DEBUG_TESTGIT) and run the test script directly (-v),
> then the failure rate goes up to about 1 in 3.

Hmm. I can't reproduce a failure here, but I do get some weirdness. My
recipe is:

-- >8 --
cat >foo.sh <<\EOF
#!/bin/sh

exec >$1.out 2>&1

n=0
while test $n -lt 100; do
	n=$(($n+1))
	GIT_TRANSPORT_HELPER_DEBUG=1 \
	GIT_TRANSLOOP_DEBUG=1 \
	GIT_DEBUG_TESTGIT=1 \
	./t5800-remote-helpers.sh --root=/run/shm/git-tests-$1 -v || {
		echo FAIL $n
		exit 1
	}
	echo OK $n
done
EOF

# try to keep an 8-core machine busy
for i in `seq 1 16`; do
  sh foo.sh $i &
done
-- 8< --

I never see a test failure, but a few of the 16 end up hanging. The
process tree for the hung tests looks like:

  t5800-remote-helper
    git push
      git-remote-testgit
        git fast-import
          git-fast-import

All of them are blocked on wait(), except for the final fast-import,
which is blocked trying to read() from stdin.
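
One general pipe property that would produce this picture, offered as background rather than as a diagnosis of the bug: a reader only sees end-of-file once every copy of the write end has been closed. The Python sketch below uses a non-blocking read to show the state in which a blocking read() would hang; `spare_write_fd` stands in for a write end still held by some other process, and all names are illustrative.

```python
import os

def read_state(fd):
    """Classify a non-blocking read: data, EOF, or would-block."""
    try:
        chunk = os.read(fd, 4096)
    except BlockingIOError:
        return "would block"  # a blocking read() would hang at this point
    return "EOF" if chunk == b"" else "data"

def eof_needs_all_writers_closed():
    read_fd, write_fd = os.pipe()
    spare_write_fd = os.dup(write_fd)  # second copy of the write end
    os.set_blocking(read_fd, False)

    os.close(write_fd)                 # one writer goes away ...
    before = read_state(read_fd)       # ... but the spare copy prevents EOF

    os.close(spare_write_fd)           # last write end closed
    after = read_state(read_fd)

    os.close(read_fd)
    return before, after

if __name__ == "__main__":
    print(eof_needs_all_writers_closed())  # ('would block', 'EOF')
```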

-Peff


* Re: t5800-*.sh: Intermittent test failures
  2011-09-08 18:20       ` Jeff King
@ 2011-09-11 19:14         ` Ramsay Jones
  2011-11-01 21:57           ` Alex Riesen
  0 siblings, 1 reply; 12+ messages in thread
From: Ramsay Jones @ 2011-09-11 19:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Sverre Rabbelier, GIT Mailing-list, Jonathan Nieder

Jeff King wrote:
> On Thu, Sep 08, 2011 at 06:42:11PM +0100, Ramsay Jones wrote:
> 
>> When I run the tests with "make test >test-out", I see a failure rate of about
>> 1 in 10. If I then set the debug environment variables (GIT_TRANSPORT_HELPER_DEBUG,
>> GIT_TRANSLOOP_DEBUG and GIT_DEBUG_TESTGIT) and run the test script directly (-v),
>> then the failure rate goes up to about 1 in 3.
> 
> Hmm. I can't reproduce a failure here, but I do get some weirdness. My
> recipe is:

Ah, sorry, ... I didn't make myself clear then, because ...

> -- >8 --
> cat >foo.sh <<\EOF
> #!/bin/sh
> 
> exec >$1.out 2>&1
> 
> n=0
> while test $n -lt 100; do
> 	n=$(($n+1))
> 	GIT_TRANSPORT_HELPER_DEBUG=1 \
> 	GIT_TRANSLOOP_DEBUG=1 \
> 	GIT_DEBUG_TESTGIT=1 \
> 	./t5800-remote-helpers.sh --root=/run/shm/git-tests-$1 -v || {
> 		echo FAIL $n
> 		exit 1
> 	}
> 	echo OK $n
> done
> EOF
> 
> # try to keep an 8-core machine busy
> for i in `seq 1 16`; do
>   sh foo.sh $i &
> done
> -- 8< --
> 
> I never see a test failure, but a few of the 16 end up hanging. The
> process tree for the hung tests looks like:
> 
>   t5800-remote-helper
>     git push
>       git-remote-testgit
>         git fast-import
>           git-fast-import
> 
> All of them are blocked on wait(), except for the final fast-import,
> which is blocked trying to read() from stdin.

... these hangs *are* the failures of which I speak!  Yes, the script
doesn't get to declare a failure, but AFAIAC a hanging test (and it
isn't the same test # each time) is a failing test. :-D
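
A reproduction loop can report hangs as failures by putting a time limit on each run. The following is a minimal Python sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, and the demo commands are stand-ins for the real test script. `subprocess.run` kills only the direct child when the timeout expires, so descendants further down the process tree may need separate cleanup.

```python
import subprocess
import sys

def run_with_timeout(argv, timeout_s):
    """Run a command and classify the result as OK, FAIL, or HANG."""
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "HANG"  # the direct child has been killed by subprocess.run
    return "OK" if result.returncode == 0 else "FAIL"

if __name__ == "__main__":
    # Stand-in commands; a real loop would invoke the test script instead.
    print(run_with_timeout([sys.executable, "-c", "pass"], 5))
    print(run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```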

ATB,
Ramsay Jones


* Re: t5800-*.sh: Intermittent test failures
  2011-09-11 19:14         ` Ramsay Jones
@ 2011-11-01 21:57           ` Alex Riesen
  2011-11-01 22:18             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Riesen @ 2011-11-01 21:57 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Jeff King, Junio C Hamano, Sverre Rabbelier, GIT Mailing-list,
	Jonathan Nieder

On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
> ... these hangs *are* the failures of which I speak!  Yes, the script
> doesn't get to declare a failure, but AFAIAC a hanging test (and it
> isn't the same test # each time) is a failing test. :-D

Was there any outcome of this discussion? I'm asking because I
can reproduce this very reliably on a little server here.


* Re: t5800-*.sh: Intermittent test failures
  2011-11-01 21:57           ` Alex Riesen
@ 2011-11-01 22:18             ` Junio C Hamano
  2011-11-01 23:02               ` Alex Riesen
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-11-01 22:18 UTC (permalink / raw)
  To: Alex Riesen
  Cc: Ramsay Jones, Jeff King, Sverre Rabbelier, GIT Mailing-list,
	Jonathan Nieder

Alex Riesen <raa.lkml@gmail.com> writes:

> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>> ... these hangs *are* the failures of which I speak!  Yes, the script
>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>> isn't the same test # each time) is a failing test. :-D
>
> Was there any outcome of this discussion? I'm asking because I
> can reproduce this very reliably on a little server here.

I do remember this discussion and recall seeing _no_ outcome.

I did see the hang myself once or twice but did not and do not have a
reliable reproduction. I have been waiting for somebody to raise the issue
again ;-).


* Re: t5800-*.sh: Intermittent test failures
  2011-11-01 22:18             ` Junio C Hamano
@ 2011-11-01 23:02               ` Alex Riesen
  2011-11-02 23:35                 ` Sverre Rabbelier
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Riesen @ 2011-11-01 23:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramsay Jones, Jeff King, Sverre Rabbelier, GIT Mailing-list,
	Jonathan Nieder

On Tue, Nov 1, 2011 at 23:18, Junio C Hamano <gitster@pobox.com> wrote:
> Alex Riesen <raa.lkml@gmail.com> writes:
>
>> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>> ... these hangs *are* the failures of which I speak!  Yes, the script
>>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>>> isn't the same test # each time) is a failing test. :-D
>>
>> Was there any outcome of this discussion? I'm asking because I
>> can reproduce this very reliably on a little server here.
>
> I do remember this discussion and recall seeing _no_ outcome.
>
> I did see the hang myself once or twice but did not and do not have a
> reliable reproduction. I have been waiting for somebody to raise the issue
> again ;-).
>

I think I managed to bisect it (between 1.7.6 and 1.7.7):

$ git bisect start v1.7.7 v1.7.6
...
$ git bisect good
a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
commit a515ebe9f1ac9bc248c12a291dc008570de505ca
Author: Sverre Rabbelier <srabbelier@gmail.com>
Date:   Sat Jul 16 15:03:40 2011 +0200

    transport-helper: implement marks location as capability

    Now that the gitdir location is exported as an environment variable
    this can be implemented elegantly without requiring any explicit
    flushes nor an ad-hoc exchange of values.

    Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
    Acked-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23
e9c832bfd3da7db771cc2113027d3e590dc51d59 M	git-remote-testgit.py
:100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c
74c3122df1835c45a6b621205fb18b4fc89af366 M	transport-helper.c

Sadly, I'm only going to be able to repeat the test in about 20 hours.


* Re: t5800-*.sh: Intermittent test failures
  2011-11-01 23:02               ` Alex Riesen
@ 2011-11-02 23:35                 ` Sverre Rabbelier
  2011-11-03  1:30                   ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Sverre Rabbelier @ 2011-11-02 23:35 UTC (permalink / raw)
  To: Alex Riesen, Ævar Arnfjörð
  Cc: Junio C Hamano, Ramsay Jones, Jeff King, GIT Mailing-list,
	Jonathan Nieder

Heya,

On Wed, Nov 2, 2011 at 00:02, Alex Riesen <raa.lkml@gmail.com> wrote:
> On Tue, Nov 1, 2011 at 23:18, Junio C Hamano <gitster@pobox.com> wrote:
>> Alex Riesen <raa.lkml@gmail.com> writes:
>>
>>> On Sun, Sep 11, 2011 at 21:14, Ramsay Jones <ramsay@ramsay1.demon.co.uk> wrote:
>>>> ... these hangs *are* the failures of which I speak!  Yes, the script
>>>> doesn't get to declare a failure, but AFAIAC a hanging test (and it
>>>> isn't the same test # each time) is a failing test. :-D
>>>
>>> Was there any outcome of this discussion? I'm asking because I
>>> can reproduce this very reliably on a little server here.
>>
>> I do remember this discussion and recall seeing _no_ outcome.
>>
>> I did see the hang myself once or twice but did not and do not have a
>> reliable reproduction. I have been waiting for somebody to raise the issue
>> again ;-).
>>
>
> I think I managed to bisect it (between 1.7.6 and 1.7.7):
>
> $ git bisect start v1.7.7 v1.7.6
> ...
> $ git bisect good
> a515ebe9f1ac9bc248c12a291dc008570de505ca is the first bad commit
> commit a515ebe9f1ac9bc248c12a291dc008570de505ca
> Author: Sverre Rabbelier <srabbelier@gmail.com>
> Date:   Sat Jul 16 15:03:40 2011 +0200
>
>    transport-helper: implement marks location as capability
>
>    Now that the gitdir location is exported as an environment variable
>    this can be implemented elegantly without requiring any explicit
>    flushes nor an ad-hoc exchange of values.
>
>    Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
>    Acked-by: Jeff King <peff@peff.net>
>    Signed-off-by: Junio C Hamano <gitster@pobox.com>
>
> :100644 100644 1ed7a5651ef5a2320c56856b5a1fe784e178ab23
> e9c832bfd3da7db771cc2113027d3e590dc51d59 M      git-remote-testgit.py
> :100644 100644 0cfc9ae9059ce121b567406d7941b71cd54b961c
> 74c3122df1835c45a6b621205fb18b4fc89af366 M      transport-helper.c
>
> Sadly, I'm only going to be able to repeat the test in about 20 hours.

Ævar, this seems like something we could look at during the mini
GitTogether in Amsterdam this Saturday, no?

-- 
Cheers,

Sverre Rabbelier


* Re: t5800-*.sh: Intermittent test failures
  2011-11-02 23:35                 ` Sverre Rabbelier
@ 2011-11-03  1:30                   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-11-03  1:30 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Alex Riesen, Ævar Arnfjörð,
	Ramsay Jones, Jeff King, GIT Mailing-list, Jonathan Nieder

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Ævar, this seems like something we could look at during the mini
> GitTogether in Amsterdam this Saturday, no?

Have fun.

I think I happened to hit this while testing today's 'pu' that hasn't been
pushed out. The process chain looks like this:

pid  command                     stuck at
4767 sh t5800-remote-helpers.sh  wait4(-1)
 4793 git push                   read(6)
  4809 git-remote-testgit        wait4(4906)
   4906 git fast-import          wait4(4912)
    4912 git-fast-import         read(0)

lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/6 -> pipe:[133037701]
l-wx------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/7 -> pipe:[133037700]
lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/8 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/1 -> pipe:[133037701]

So "git push (4793)" is stuck reading from pipe:[133037701], expecting the
innermost "git-fast-import (4912)" to write to it via its standard output,
but the latter is waiting to read from pipe:[133037700], hoping the former
will write to it via its fd#7.
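
The same reading of the fd listing can be done mechanically. The Python sketch below groups the entries by pipe inode and by direction, taking `lr-x` as a read end and `l-wx` as a write end. It only sees the descriptors pasted above, so any fd not listed is invisible to it; the function and variable names are illustrative.

```python
import re
from collections import defaultdict

# Listing copied from the message above.
LISTING = """\
lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/6 -> pipe:[133037701]
l-wx------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/7 -> pipe:[133037700]
lr-x------ 1 junio junio 64 Nov  2 18:21 /proc/4793/fd/8 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4809/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:05 /proc/4906/fd/1 -> pipe:[133037701]
lr-x------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/0 -> pipe:[133037700]
l-wx------ 1 junio junio 64 Nov  2 18:03 /proc/4912/fd/1 -> pipe:[133037701]
"""

LINE_RE = re.compile(
    r"^l([r-])([w-])x\S* .* /proc/(\d+)/fd/(\d+) -> pipe:\[(\d+)\]$")

def group_pipe_ends(listing):
    """Map pipe inode -> lists of (pid, fd) holding its read and write ends."""
    ends = defaultdict(lambda: {"readers": [], "writers": []})
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        readable, writable, pid, fd, inode = match.groups()
        if readable == "r":
            ends[inode]["readers"].append((int(pid), int(fd)))
        if writable == "w":
            ends[inode]["writers"].append((int(pid), int(fd)))
    return dict(ends)

if __name__ == "__main__":
    for inode, holders in sorted(group_pipe_ends(LISTING).items()):
        print(inode, holders)
```

In the pasted listing, the only write end of pipe:[133037700] is fd 7 of pid 4793 (git push), while all three downstream processes hold its read end as their stdin.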

Does this deadlock ring a bell to anybody who's involved in these
codepaths?
