All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
@ 2018-07-09 13:45 Cornelia Huck
  2018-07-11 13:06 ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Cornelia Huck @ 2018-07-09 13:45 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz, Stefan Hajnoczi
  Cc: qemu-block, qemu-devel, Eric Blake, Paolo Bonzini

Hi,

I recently noticed that iotest 147 was hanging on my laptop, but worked
fine on my s390x LPAR. Turned out that the architecture was a red
herring; on both platforms, things fail with the 'simple' trace backend
and work with e.g. the 'log' trace backend. Some details on the
failures with the 'simple' backend:

- The first run of 147 passes. However, there are two processes hanging
  around, one using a unix socket and one using an inet socket:

cohuck   22912  0.0  0.0 156580  3836 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
cohuck   22925  0.0  0.0 156580  3840 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket

Attaching a gdb shows that we seem to be waiting on flushing:

(gdb) bt
#0  0x00007f461c078b99 in syscall () from /lib64/libc.so.6
#1  0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x0000560cf3a1caf2 in flush_trace_file (wait=255)
    at /home/cohuck/git/qemu/trace/simple.c:139
#3  st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
#4  0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x00007f461bfc022a in exit () from /lib64/libc.so.6
#6  0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
    at /home/cohuck/git/qemu/qemu-nbd.c:1076

(for both processes)

- When I'm running 147 again, the test using the inet socket is unhappy
  as the port is already in use:

test_inet (__main__.QemuNBD) ... DEBUG:QMP:>>> {'execute': 'qmp_capabilities'}
DEBUG:QMP:<<< {u'return': {}}
Failed to find an available port: Address already in use

...but then it sits there, with ps showing a zombie qemu-nbd process,
until I Ctrl-C out of it. There's now a new qemu-nbd process (command
line identical to the first process above), and it's hanging with the
same backtrace (so same root problem, presumably). I would have
expected 147 to either fail or continue, though, instead of hanging
around.

Running under Fedora 27, current QEMU master.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-09 13:45 [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147 Cornelia Huck
@ 2018-07-11 13:06 ` Stefan Hajnoczi
  2018-07-11 13:15   ` Cornelia Huck
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-07-11 13:06 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:
> Hi,
> 
> I recently noticed that iotest 147 was hanging on my laptop, but worked
> fine on my s390x LPAR. Turned out that the architecture was a red
> herring; on both platforms, things fail with the 'simple' trace backend
> and work with e.g. the 'log' trace backend. Some details on the
> failures with the 'simple' backend:
> 
> - The first run of 147 passes. However, there are two processes hanging
>   around, one using a unix socket and one using an inet socket:
> 
> cohuck   22912  0.0  0.0 156580  3836 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> cohuck   22925  0.0  0.0 156580  3840 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
> 
> Attaching a gdb shows that we seem to be waiting on flushing:
> 
> (gdb) bt
> #0  0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> #1  0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> #2  0x0000560cf3a1caf2 in flush_trace_file (wait=255)
>     at /home/cohuck/git/qemu/trace/simple.c:139
> #3  st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> #4  0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> #5  0x00007f461bfc022a in exit () from /lib64/libc.so.6
> #6  0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
>     at /home/cohuck/git/qemu/qemu-nbd.c:1076
> 
> (for both processes)

Please also print backtraces for the other threads:

  (gdb) thread apply all bt

There should be another thread in writeout_thread() so I'm surprised
that flush_trace_file() is getting stuck in g_cond_wait().

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-11 13:06 ` Stefan Hajnoczi
@ 2018-07-11 13:15   ` Cornelia Huck
  2018-07-11 13:33     ` Cornelia Huck
  0 siblings, 1 reply; 7+ messages in thread
From: Cornelia Huck @ 2018-07-11 13:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	qemu-devel, qemu-block

On Wed, 11 Jul 2018 14:06:17 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:
> > Hi,
> > 
> > I recently noticed that iotest 147 was hanging on my laptop, but worked
> > fine on my s390x LPAR. Turned out that the architecture was a red
> > herring; on both platforms, things fail with the 'simple' trace backend
> > and work with e.g. the 'log' trace backend. Some details on the
> > failures with the 'simple' backend:
> > 
> > - The first run of 147 passes. However, there are two processes hanging
> >   around, one using a unix socket and one using an inet socket:
> > 
> > cohuck   22912  0.0  0.0 156580  3836 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> > cohuck   22925  0.0  0.0 156580  3840 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
> > 
> > Attaching a gdb shows that we seem to be waiting on flushing:
> > 
> > (gdb) bt
> > #0  0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> > #1  0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> > #2  0x0000560cf3a1caf2 in flush_trace_file (wait=255)
> >     at /home/cohuck/git/qemu/trace/simple.c:139
> > #3  st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> > #4  0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> > #5  0x00007f461bfc022a in exit () from /lib64/libc.so.6
> > #6  0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
> >     at /home/cohuck/git/qemu/qemu-nbd.c:1076
> > 
> > (for both processes)  
> 
> Please also print backtraces for the other threads:
> 
>   (gdb) thread apply all bt
> 
> There should be another thread in writeout_thread() so I'm surprised
> that flush_trace_file() is getting stuck in g_cond_wait().

I'll re-run to check, but there was only one thread in the process in
question.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-11 13:15   ` Cornelia Huck
@ 2018-07-11 13:33     ` Cornelia Huck
  2018-07-12 16:30       ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Cornelia Huck @ 2018-07-11 13:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	qemu-devel, qemu-block

On Wed, 11 Jul 2018 15:15:45 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 11 Jul 2018 14:06:17 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Mon, Jul 09, 2018 at 03:45:49PM +0200, Cornelia Huck wrote:  
> > > Hi,
> > > 
> > > I recently noticed that iotest 147 was hanging on my laptop, but worked
> > > fine on my s390x LPAR. Turned out that the architecture was a red
> > > herring; on both platforms, things fail with the 'simple' trace backend
> > > and work with e.g. the 'log' trace backend. Some details on the
> > > failures with the 'simple' backend:
> > > 
> > > - The first run of 147 passes. However, there are two processes hanging
> > >   around, one using a unix socket and one using an inet socket:
> > > 
> > > cohuck   22912  0.0  0.0 156580  3836 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -p 10811
> > > cohuck   22925  0.0  0.0 156580  3840 ?        Ss   14:32   0:00 /home/cohuck/git/qemu/build/tests/qemu-iotests/../../qemu-nbd --fork -f qcow2 /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/test.img -k /home/cohuck/git/qemu/build/tests/qemu-iotests/scratch/nbd.socket
> > > 
> > > Attaching a gdb shows that we seem to be waiting on flushing:
> > > 
> > > (gdb) bt
> > > #0  0x00007f461c078b99 in syscall () from /lib64/libc.so.6
> > > #1  0x00007f461d13650f in g_cond_wait () from /lib64/libglib-2.0.so.0
> > > #2  0x0000560cf3a1caf2 in flush_trace_file (wait=255)
> > >     at /home/cohuck/git/qemu/trace/simple.c:139
> > > #3  st_flush_trace_buffer () at /home/cohuck/git/qemu/trace/simple.c:374
> > > #4  0x00007f461bfc01d8 in __run_exit_handlers () from /lib64/libc.so.6
> > > #5  0x00007f461bfc022a in exit () from /lib64/libc.so.6
> > > #6  0x0000560cf392eb7e in main (argc=<optimized out>, argv=<optimized out>)
> > >     at /home/cohuck/git/qemu/qemu-nbd.c:1076
> > > 
> > > (for both processes)    
> > 
> > Please also print backtraces for the other threads:
> > 
> >   (gdb) thread apply all bt
> > 
> > There should be another thread in writeout_thread() so I'm surprised
> > that flush_trace_file() is getting stuck in g_cond_wait().  
> 
> I'll re-run to check, but there was only one thread in the process in
> question.

OK, I have two threads for one of the qemu-nbds using inet created on
the second run (when it fails with the 'port already in use' message):

(gdb) thread apply all bt

Thread 2 (Thread 0x7f639be49700 (LWP 3091)):
#0  0x00007f639d549b99 in syscall () from /lib64/libc.so.6
#1  0x00007f639e60750f in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x00005619516d298f in wait_for_trace_records_available ()
    at /home/cohuck/git/qemu/trace/simple.c:150
#3  writeout_thread (opaque=<optimized out>)
    at /home/cohuck/git/qemu/trace/simple.c:169
#4  0x00007f639e5e9486 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5  0x00007f639d81750b in start_thread () from /lib64/libpthread.so.0
#6  0x00007f639d54f16f in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f63a05bba40 (LWP 3090)):
#0  0x00007f639d820d68 in read () from /lib64/libpthread.so.0
#1  0x00005619515ec61c in main (argc=<optimized out>, argv=0x7ffc1220bfb8)
    at /home/cohuck/git/qemu/qemu-nbd.c:881

That one goes away after I Ctrl-C out of the hanging iotest (together
with the zombie qemu-ndb).

The other qemu-nbds (the inet and the unix socket ones from the first
run, the second inet one from the second run) have a single thread with
the same backtrace I posted above.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-11 13:33     ` Cornelia Huck
@ 2018-07-12 16:30       ` Stefan Hajnoczi
  2018-07-13  6:40         ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-07-12 16:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Stefan Hajnoczi, Kevin Wolf, Max Reitz, Paolo Bonzini,
	qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
> The other qemu-nbds (the inet and the unix socket ones from the first
> run, the second inet one from the second run) have a single thread with
> the same backtrace I posted above.

We just discussed this on IRC, but for the record:

qemu-nbd --fork will fork the process after the simpletrace write-out
thread has been spawned.  The child process lacks this thread (due to
how fork(2) handles multithreading).  Either qemu-nbd needs to
initialize tracing later (but that means we cannot trace early init) or
simpletrace needs a way to respawn the write-out thread.

I have put this on my TODO list but I'm not sure how soon I'll get
around to it.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-12 16:30       ` Stefan Hajnoczi
@ 2018-07-13  6:40         ` Paolo Bonzini
  2018-07-13 13:53           ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2018-07-13  6:40 UTC (permalink / raw)
  To: Stefan Hajnoczi, Cornelia Huck
  Cc: Stefan Hajnoczi, Kevin Wolf, Max Reitz, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

On 12/07/2018 18:30, Stefan Hajnoczi wrote:
> On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
>> The other qemu-nbds (the inet and the unix socket ones from the first
>> run, the second inet one from the second run) have a single thread with
>> the same backtrace I posted above.
> 
> We just discussed this on IRC, but for the record:
> 
> qemu-nbd --fork will fork the process after the simpletrace write-out
> thread has been spawned.  The child process lacks this thread (due to
> how fork(2) handles multithreading).  Either qemu-nbd needs to
> initialize tracing later (but that means we cannot trace early init) or
> simpletrace needs a way to respawn the write-out thread.

You can use pthread_atfork for this.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147
  2018-07-13  6:40         ` Paolo Bonzini
@ 2018-07-13 13:53           ` Stefan Hajnoczi
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-07-13 13:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Cornelia Huck, Stefan Hajnoczi, Kevin Wolf, Max Reitz,
	qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1199 bytes --]

On Fri, Jul 13, 2018 at 08:40:19AM +0200, Paolo Bonzini wrote:
> On 12/07/2018 18:30, Stefan Hajnoczi wrote:
> > On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
> >> The other qemu-nbds (the inet and the unix socket ones from the first
> >> run, the second inet one from the second run) have a single thread with
> >> the same backtrace I posted above.
> > 
> > We just discussed this on IRC, but for the record:
> > 
> > qemu-nbd --fork will fork the process after the simpletrace write-out
> > thread has been spawned.  The child process lacks this thread (due to
> > how fork(2) handles multithreading).  Either qemu-nbd needs to
> > initialize tracing later (but that means we cannot trace early init) or
> > simpletrace needs a way to respawn the write-out thread.
> 
> You can use pthread_atfork for this.

Thanks.  The man page says:

  The intent of  pthread_atfork()  was  to provide  a  mechanism
  whereby  the application (or a library) could ensure that mutexes and
  other process and thread state would be restored to a consistent
  state.  In practice, this task is generally too difficult to be
  practicable.

Challenge accepted!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-07-13 13:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-09 13:45 [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147 Cornelia Huck
2018-07-11 13:06 ` Stefan Hajnoczi
2018-07-11 13:15   ` Cornelia Huck
2018-07-11 13:33     ` Cornelia Huck
2018-07-12 16:30       ` Stefan Hajnoczi
2018-07-13  6:40         ` Paolo Bonzini
2018-07-13 13:53           ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.