* [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
@ 2018-07-09 13:45 Cornelia Huck
From: Cornelia Huck @ 2018-07-09 13:45 UTC (permalink / raw)
To: Kevin Wolf, Max Reitz, Stefan Hajnoczi
Cc: qemu-block, qemu-devel, Eric Blake, Paolo Bonzini
Hi,
I recently noticed that iotest 147 was hanging on my laptop, but worked
fine on my s390x LPAR. Turned out that the architecture was a red
herring; on both platforms, things fail with the 'simple' trace backend
and work with e.g. the 'log' trace backend. Some details on the
failures with the 'simple' backend:
- The first run of 147 passes. However, there are two processes hanging
around, one using a unix socket and one using an inet socket:
cohuck 22912 0.0 0.0 156580 3836 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
cohuck 22925 0.0 0.0 156580 3840 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
Attaching a gdb shows that we seem to be waiting on flushing:
(gdb) bt
#0 0x00007f461c078b99 in syscall () from /lib64/libc.so.6
#1 0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
#2 0x0000560cf3a1caf2 in flush_trace_file (wait=255)
at /home/cohuck/git/qemu/trace/simple.c:139
#3 st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
#4 0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
#5 0x00007f461bfc022a in exit () from /lib64/libc.so.6
#6 0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
at /home/cohuck/git/qemu/qemu-nbd.c:1076
(for both processes)
- When I run 147 again, the test using the inet socket is unhappy
because the port is already in use:
test_inet (__main__.QemuNBD) ... DEBUG:QMP:>>> {'execute': 'qmp_capabilities'}
DEBUG:QMP:<<< {u'return': {}}
Failed to find an available port: Address already in use
...but then it sits there, with ps showing a zombie qemu-nbd process,
until I Ctrl-C out of it. There's now a new qemu-nbd process (command
line identical to the first process above), and it's hanging with the
same backtrace (so same root problem, presumably). I would have
expected 147 to either fail or continue, though, instead of hanging
around.
Running under Fedora 27, current QEMU master.
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Stefan Hajnoczi @ 2018-07-11 13:06 UTC (permalink / raw)
To: Cornelia Huck
Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
qemu-devel, qemu-block
On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:
> Hi,
>
> I recently noticed that iotest 147 was hanging on my laptop, but worked
> fine on my s390x LPAR. Turned out that the architecture was a red
> herring; on both platforms, things fail with the 'simple' trace backend
> and work with e.g. the 'log' trace backend. Some details on the
> failures with the 'simple' backend:
>
> - The first run of 147 passes. However, there are two processes hanging
> around, one using a unix socket and one using an inet socket:
>
> cohuck 22912 0.0 0.0 156580 3836 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> cohuck 22925 0.0 0.0 156580 3840 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
>
> Attaching a gdb shows that we seem to be waiting on flushing:
>
> (gdb) bt
> #0 0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> #1 0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> #2 0x0000560cf3a1caf2 in flush_trace_file (wait=255)
> at /home/cohuck/git/qemu/trace/simple.c:139
> #3 st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> #4 0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> #5 0x00007f461bfc022a in exit () from /lib64/libc.so.6
> #6 0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
> at /home/cohuck/git/qemu/qemu-nbd.c:1076
>
> (for both processes)
Please also print backtraces for the other threads:
(gdb) thread apply all bt
There should be another thread in writeout_thread() so I'm surprised
that flush_trace_file() is getting stuck in g_cond_wait().
Stefan
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Cornelia Huck @ 2018-07-11 13:15 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
qemu-devel, qemu-block
On Wed, 11 Jul 2018 14:06:17 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:
> > Hi,
> >
> > I recently noticed that iotest 147 was hanging on my laptop, but worked
> > fine on my s390x LPAR. Turned out that the architecture was a red
> > herring; on both platforms, things fail with the 'simple' trace backend
> > and work with e.g. the 'log' trace backend. Some details on the
> > failures with the 'simple' backend:
> >
> > - The first run of 147 passes. However, there are two processes hanging
> > around, one using a unix socket and one using an inet socket:
> >
> > cohuck 22912 0.0 0.0 156580 3836 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> > cohuck 22925 0.0 0.0 156580 3840 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
> >
> > Attaching a gdb shows that we seem to be waiting on flushing:
> >
> > (gdb) bt
> > #0 0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> > #1 0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> > #2 0x0000560cf3a1caf2 in flush_trace_file (wait=255)
> > at /home/cohuck/git/qemu/trace/simple.c:139
> > #3 st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> > #4 0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> > #5 0x00007f461bfc022a in exit () from /lib64/libc.so.6
> > #6 0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
> > at /home/cohuck/git/qemu/qemu-nbd.c:1076
> >
> > (for both processes)
>
> Please also print backtraces for the other threads:
>
> (gdb) thread apply all bt
>
> There should be another thread in writeout_thread() so I'm surprised
> that flush_trace_file() is getting stuck in g_cond_wait().
I'll re-run to check, but there was only one thread in the process in
question.
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Cornelia Huck @ 2018-07-11 13:33 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
qemu-devel, qemu-block
On Wed, 11 Jul 2018 15:15:45 +0200
Cornelia Huck <cohuck@redhat.com> wrote:
> On Wed, 11 Jul 2018 14:06:17 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> > On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:
> > > Hi,
> > >
> > > I recently noticed that iotest 147 was hanging on my laptop, but worked
> > > fine on my s390x LPAR. Turned out that the architecture was a red
> > > herring; on both platforms, things fail with the 'simple' trace backend
> > > and work with e.g. the 'log' trace backend. Some details on the
> > > failures with the 'simple' backend:
> > >
> > > - The first run of 147 passes. However, there are two processes hanging
> > > around, one using a unix socket and one using an inet socket:
> > >
> > > cohuck 22912 0.0 0.0 156580 3836 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> > > cohuck 22925 0.0 0.0 156580 3840 ? Ss 14:32 0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
> > >
> > > Attaching a gdb shows that we seem to be waiting on flushing:
> > >
> > > (gdb) bt
> > > #0 0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> > > #1 0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> > > #2 0x0000560cf3a1caf2 in flush_trace_file (wait=255)
> > > at /home/cohuck/git/qemu/trace/simple.c:139
> > > #3 st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> > > #4 0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> > > #5 0x00007f461bfc022a in exit () from /lib64/libc.so.6
> > > #6 0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
> > > at /home/cohuck/git/qemu/qemu-nbd.c:1076
> > >
> > > (for both processes)
> >
> > Please also print backtraces for the other threads:
> >
> > (gdb) thread apply all bt
> >
> > There should be another thread in writeout_thread() so I'm surprised
> > that flush_trace_file() is getting stuck in g_cond_wait().
>
> I'll re-run to check, but there was only one thread in the process in
> question.
OK, I now see two threads for one of the qemu-nbd processes using inet
that was created on the second run (when it fails with the 'port
already in use' message):
(gdb) thread apply all bt
Thread 2 (Thread 0x7f639be49700 (LWP 3091)):
#0 0x00007f639d549b99 in syscall () from /lib64/libc.so.6
#1 0x00007f639e60750f in g_cond_wait () from /lib64/libglib-2.0.so.0
#2 0x00005619516d298f in wait_for_trace_records_available ()
at /home/cohuck/git/qemu/trace/simple.c:150
#3 writeout_thread (opaque=<optimized out>)
at /home/cohuck/git/qemu/trace/simple.c:169
#4 0x00007f639e5e9486 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5 0x00007f639d81750b in start_thread () from /lib64/libpthread.so.0
#6 0x00007f639d54f16f in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f63a05bba40 (LWP 3090)):
#0 0x00007f639d820d68 in read () from /lib64/libpthread.so.0
#1 0x00005619515ec61c in main (argc=<optimized out>, argv=0x7ffc1220bfb8)
at /home/cohuck/git/qemu/qemu-nbd.c:881
That one goes away after I Ctrl-C out of the hanging iotest (together
with the zombie qemu-nbd).
The other qemu-nbds (the inet and the unix socket ones from the first
run, the second inet one from the second run) have a single thread with
the same backtrace I posted above.
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Stefan Hajnoczi @ 2018-07-12 16:30 UTC (permalink / raw)
To: Cornelia Huck
Cc: Stefan Hajnoczi, Kevin Wolf, Max Reitz, Paolo Bonzini,
qemu-devel, qemu-block
On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
> The other qemu-nbds (the inet and the unix socket ones from the first
> run, the second inet one from the second run) have a single thread with
> the same backtrace I posted above.
We just discussed this on IRC, but for the record:
qemu-nbd --fork will fork the process after the simpletrace write-out
thread has been spawned. The child process lacks this thread (due to
how fork(2) handles multithreading). Either qemu-nbd needs to
initialize tracing later (but that means we cannot trace early init) or
simpletrace needs a way to respawn the write-out thread.
I have put this on my TODO list but I'm not sure how soon I'll get
around to it.
Stefan
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Paolo Bonzini @ 2018-07-13 6:40 UTC (permalink / raw)
To: Stefan Hajnoczi, Cornelia Huck
Cc: Stefan Hajnoczi, Kevin Wolf, Max Reitz, qemu-devel, qemu-block
On 12/07/2018 18:30, Stefan Hajnoczi wrote:
> On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
>> The other qemu-nbds (the inet and the unix socket ones from the first
>> run, the second inet one from the second run) have a single thread with
>> the same backtrace I posted above.
>
> We just discussed this on IRC, but for the record:
>
> qemu-nbd --fork will fork the process after the simpletrace write-out
> thread has been spawned. The child process lacks this thread (due to
> how fork(2) handles multithreading). Either qemu-nbd needs to
> initialize tracing later (but that means we cannot trace early init) or
> simpletrace needs a way to respawn the write-out thread.
You can use pthread_atfork for this.
Paolo
* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
From: Stefan Hajnoczi @ 2018-07-13 13:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Cornelia Huck, Stefan Hajnoczi, Kevin Wolf, Max Reitz,
qemu-devel, qemu-block
On Fri, Jul 13, 2018 at 08:40:19AM +0200, Paolo Bonzini wrote:
> On 12/07/2018 18:30, Stefan Hajnoczi wrote:
> > On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
> >> The other qemu-nbds (the inet and the unix socket ones from the first
> >> run, the second inet one from the second run) have a single thread with
> >> the same backtrace I posted above.
> >
> > We just discussed this on IRC, but for the record:
> >
> > qemu-nbd --fork will fork the process after the simpletrace write-out
> > thread has been spawned. The child process lacks this thread (due to
> > how fork(2) handles multithreading). Either qemu-nbd needs to
> > initialize tracing later (but that means we cannot trace early init) or
> > simpletrace needs a way to respawn the write-out thread.
>
> You can use pthread_atfork for this.
Thanks. The man page says:

    The intent of pthread_atfork() was to provide a mechanism whereby
    the application (or a library) could ensure that mutexes and other
    process and thread state would be restored to a consistent state.
    In practice, this task is generally too difficult to be
    practicable.
Challenge accepted!
Stefan