* [Qemu-devel] test-filter-mirror hangs
@ 2019-01-11 15:01 Peter Maydell
  2019-01-11 16:15 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Maydell @ 2019-01-11 15:01 UTC
  To: QEMU Developers
  Cc: Zhang Chen, Li Zhijian, Paolo Bonzini, Dr. David Alan Gilbert

Recently I've noticed that test-filter-mirror has been hanging
intermittently, typically when run on some other TCG architecture.
In the instance I've just looked at, this was with s390x guest on
x86-64 host, though I've also seen it on other host archs and
perhaps with other guests.

Below is a backtrace, though it seems to be pretty unhelpful.
Anybody got any theories? Does the mirror test rely on dirty
memory bitmaps like the migration test does (which also hangs
occasionally with TCG, due to some bug I'm sure we've investigated
in the past)?

Processes:
petmay01 10533  0.0  0.0  15624  1400 pts/16   S+   14:22   0:00
    \_ tests/test-filter-mirror --quiet --keep-going -m=quick --GTestLogFD=6
petmay01 10534  0.0  0.2 979892 36492 pts/16   Sl+  14:22   0:00
        \_ s390x-softmmu/qemu-system-s390x
             -qtest unix:/tmp/qtest-10533.sock,nowait -qtest-log /dev/null
             -chardev socket,path=/tmp/qtest-10533.qmp,nowait,id=char0
             -mon chardev=char0,mode=control
             -machine accel=qtest -display none
             -netdev socket,id=qtest-bn0,fd=4
             -device virtio-net-ccw,netdev=qtest-bn0,id=qtest-e0
             -chardev socket,id=mirror0,path=filter-mirror.Ongzms,server,nowait
             -object filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,outdev=mirror0

Backtrace of all threads in the QEMU process:
Thread 4 (Thread 0x7f59b67ff700 (LWP 10539)):
#0  0x00007f59cf908156 in __sigwait (sig=0x7f59b67fc510, set=<optimised out>)
    at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  0x00007f59cf908156 in __sigwait (set=<optimised out>, sig=0x7f59b67fc510)
    at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x00005639a87c4c8e in qemu_dummy_cpu_thread_fn (arg=0x5639aa544250)
    at /home/petmay01/linaro/qemu-for-merges/cpus.c:1326
#3  0x00005639a8b8b911 in qemu_thread_start (args=0x5639aa569010)
    at /home/petmay01/linaro/qemu-for-merges/util/qemu-thread-posix.c:502
#4  0x00007f59cf8fe6ba in start_thread (arg=0x7f59b67ff700)
    at pthread_create.c:333
#5  0x00007f59cf63441d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7f59bf1f0700 (LWP 10538)):
#0  0x00007f59cf62874d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f59e64e638c in g_main_context_iterate (priority=2147483647, n_fds=3, fds=0x7f59b0001300, timeout=<optimised out>, context=0x7f59b00008c0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:4135
#2  0x00007f59e64e638c in g_main_context_iterate (context=0x7f59b00008c0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimised out>)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:3835
#3  0x00007f59e64e6712 in g_main_loop_run (loop=0x7f59b00012e0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:4034
#4  0x00005639a88d187a in iothread_run (opaque=0x5639aa512710)
    at /home/petmay01/linaro/qemu-for-merges/iothread.c:74
#5  0x00005639a8b8b911 in qemu_thread_start (args=0x5639aa49d6a0)
    at /home/petmay01/linaro/qemu-for-merges/util/qemu-thread-posix.c:502
#6  0x00007f59cf8fe6ba in start_thread (arg=0x7f59bf1f0700)
    at pthread_create.c:333
#7  0x00007f59cf63441d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f59bf9f1700 (LWP 10537)):
#0  0x00007f59cf62e4d9 in syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005639a8b8b58f in qemu_futex_wait (f=0x5639a930a418 <rcu_call_ready_event>, val=4294967295)
    at /home/petmay01/linaro/qemu-for-merges/include/qemu/futex.h:29
#2  0x00005639a8b8b75e in qemu_event_wait (ev=0x5639a930a418 <rcu_call_ready_event>)
    at /home/petmay01/linaro/qemu-for-merges/util/qemu-thread-posix.c:442
#3  0x00005639a8ba41d3 in call_rcu_thread (opaque=0x0)
    at /home/petmay01/linaro/qemu-for-merges/util/rcu.c:261
#4  0x00005639a8b8b911 in qemu_thread_start (args=0x5639aa449ac0)
    at /home/petmay01/linaro/qemu-for-merges/util/qemu-thread-posix.c:502
#5  0x00007f59cf8fe6ba in start_thread (arg=0x7f59bf9f1700)
    at pthread_create.c:333
#6  0x00007f59cf63441d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f59e9017fc0 (LWP 10534)):
#0  0x00007f59cf628811 in __GI_ppoll (fds=0x5639aa5a1b90, nfds=6, timeout=<optimised out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:50
#1  0x00005639a8b8534b in qemu_poll_ns (fds=0x5639aa5a1b90, nfds=6, timeout=-1)
    at /home/petmay01/linaro/qemu-for-merges/util/qemu-timer.c:322
#2  0x00005639a8b86527 in os_host_main_loop_wait (timeout=-1)
    at /home/petmay01/linaro/qemu-for-merges/util/main-loop.c:233
#3  0x00005639a8b865fe in main_loop_wait (nonblocking=0)
    at /home/petmay01/linaro/qemu-for-merges/util/main-loop.c:497
#4  0x00005639a88dab83 in main_loop ()
    at /home/petmay01/linaro/qemu-for-merges/vl.c:1925
#5  0x00005639a88e25b9 in main (argc=21, argv=0x7ffd344d6138, envp=0x7ffd344d61e8)
    at /home/petmay01/linaro/qemu-for-merges/vl.c:4669

Backtrace in test-filter-mirror:
Thread 1 (Thread 0x7fd24ab03740 (LWP 10533)):
#0  0x00007fd24a0eb81d in __libc_recv (fd=8, buf=0x7ffcef3a8c58, n=4, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
#1  0x000055e065d2d26a in test_mirror ()
    at /home/petmay01/linaro/qemu-for-merges/tests/test-filter-mirror.c:70
#2  0x00007fd24a67087b in test_case_run (tc=0x55e0676b6a00)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2158
#3  g_test_run_suite_internal (suite=suite@entry=0x55e0676b6060, path=path@entry=0x0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2241
#4  0x00007fd24a670a43 in g_test_run_suite_internal (suite=suite@entry=0x55e0676b6040, path=path@entry=0x0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2253
#5  0x00007fd24a670a43 in g_test_run_suite_internal (suite=suite@entry=0x55e0676b6020, path=path@entry=0x0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2253
#6  0x00007fd24a670c4e in g_test_run_suite (suite=0x55e0676b6020)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2328
#7  0x00007fd24a670c71 in g_test_run ()
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:1596
#8  0x000055e065d2d4af in main (argc=1, argv=0x7ffcef3a9128)
    at /home/petmay01/linaro/qemu-for-merges/tests/test-filter-mirror.c:92
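Frame #0 of the test process shows it blocked in recv(fd=8, ..., n=4) at test-filter-mirror.c:70, while the QEMU main thread sits idle in ppoll() with timeout=-1. One plausible reading (an assumption, not confirmed in this thread) is that the test is waiting for a 4-byte length that filter-mirror prepends to each mirrored packet; because the socket has no receive timeout, a packet that never arrives turns into a hang rather than a test failure. The sketch below shows that general technique with plain POSIX sockets. recv_all() and recv_framed() are hypothetical helpers, not functions from the QEMU tree, and the big-endian length framing is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical: read exactly n bytes; -1 on error, EOF or receive timeout. */
static int recv_all(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        }
        if (r <= 0) {
            return -1;              /* EAGAIN/EWOULDBLOCK means the timeout fired */
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/*
 * Hypothetical: receive one [4-byte big-endian length][payload] packet
 * with SO_RCVTIMEO set, so a packet that never arrives fails after
 * timeout_sec seconds instead of blocking forever.
 * Returns the payload length, or -1 on timeout, error or oversize packet.
 */
static long recv_framed(int fd, void *buf, size_t bufsize, int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    uint32_t netlen;
    size_t len;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        return -1;
    }
    if (recv_all(fd, &netlen, sizeof(netlen)) < 0) {
        return -1;
    }
    len = ntohl(netlen);
    if (len > bufsize || recv_all(fd, buf, len) < 0) {
        return -1;
    }
    return (long)len;
}
```

With SO_RCVTIMEO set, recv() fails with EAGAIN or EWOULDBLOCK once the timeout expires, so the caller gets -1 and can report a failure instead of hanging until an outer test-suite timeout kills it.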


thanks
-- PMM


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-11 15:01 [Qemu-devel] test-filter-mirror hangs Peter Maydell
@ 2019-01-11 16:15 ` Dr. David Alan Gilbert
  2019-01-14 16:33   ` Zhang Chen
  2019-01-15 10:28   ` Peter Maydell
  0 siblings, 2 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-11 16:15 UTC
  To: Peter Maydell; +Cc: QEMU Developers, Zhang Chen, Li Zhijian, Paolo Bonzini

* Peter Maydell (peter.maydell@linaro.org) wrote:
> Recently I've noticed that test-filter-mirror has been hanging
> intermittently, typically when run on some other TCG architecture.
> In the instance I've just looked at, this was with s390x guest on
> x86-64 host, though I've also seen it on other host archs and
> perhaps with other guests.

Watch out to see whether you really do see it with other guests:
the test carefully avoids virtio-net so as not to use vhost, but on
s390x it uses virtio-net-ccw. Could that hit the vhost it was trying
to avoid?

> Below is a backtrace, though it seems to be pretty unhelpful.
> Anybody got any theories? Does the mirror test rely on dirty
> memory bitmaps like the migration test does (which also hangs
> occasionally with TCG, due to some bug I'm sure we've investigated
> in the past)?

I don't think it relies on the CPU at all.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-11 16:15 ` Dr. David Alan Gilbert
@ 2019-01-14 16:33   ` Zhang Chen
  2019-01-17  9:46     ` Jason Wang
  2019-01-15 10:28   ` Peter Maydell
  1 sibling, 1 reply; 26+ messages in thread
From: Zhang Chen @ 2019-01-14 16:33 UTC
  To: Dr. David Alan Gilbert, Jason Wang
  Cc: Peter Maydell, QEMU Developers, Li Zhijian, Paolo Bonzini

On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:

> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > Recently I've noticed that test-filter-mirror has been hanging
> > intermittently, typically when run on some other TCG architecture.
> > In the instance I've just looked at, this was with s390x guest on
> > x86-64 host, though I've also seen it on other host archs and
> > perhaps with other guests.
>
> Watch out to see whether you really do see it with other guests:
> the test carefully avoids virtio-net so as not to use vhost, but on
> s390x it uses virtio-net-ccw. Could that hit the vhost it was trying
> to avoid?
>
> > Below is a backtrace, though it seems to be pretty unhelpful.
> > Anybody got any theories? Does the mirror test rely on dirty
> > memory bitmaps like the migration test does (which also hangs
> > occasionally with TCG, due to some bug I'm sure we've investigated
> > in the past)?
>
> I don't think it relies on the CPU at all.
>
> Dave
>
>
I have no idea about this at the moment, but Jason and I designed the
test case. Adding Jason: do you have any comments on this?

Thanks
Zhang Chen





* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-11 16:15 ` Dr. David Alan Gilbert
  2019-01-14 16:33   ` Zhang Chen
@ 2019-01-15 10:28   ` Peter Maydell
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Maydell @ 2019-01-15 10:28 UTC
  To: Dr. David Alan Gilbert
  Cc: QEMU Developers, Zhang Chen, Li Zhijian, Paolo Bonzini

On Fri, 11 Jan 2019 at 16:15, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>
> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > Recently I've noticed that test-filter-mirror has been hanging
> > intermittently, typically when run on some other TCG architecture.
> > In the instance I've just looked at, this was with s390x guest on
> > x86-64 host, though I've also seen it on other host archs and
> > perhaps with other guests.
>
> Watch out to see whether you really do see it with other guests:
> the test carefully avoids virtio-net so as not to use vhost, but on
> s390x it uses virtio-net-ccw. Could that hit the vhost it was trying
> to avoid?

I've seen several hangs in the last few days, all with s390x guests.
It is intermittent, though, so sometimes s390x works fine.
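Because the hang is intermittent, one way to catch it is to rerun the test many times, each run under a wall-clock limit, and stop at the first run that overruns. A minimal POSIX sketch, assuming a Linux host; run_with_timeout() is a hypothetical helper, and the 100 ms polling interval is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical: run argv[0] (looked up in PATH) in its own process group
 * and wait at most timeout_sec seconds for it to finish.
 * Returns the exit status, -1 on fork failure or abnormal termination,
 * or -2 if the timeout expired and the whole group was killed.
 */
static int run_with_timeout(char *const argv[], int timeout_sec)
{
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100L * 1000 * 1000 };
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* New process group, so any QEMU the test spawns can be killed too. */
        setpgid(0, 0);
        execvp(argv[0], argv);
        _exit(127);                  /* exec failed */
    }
    setpgid(pid, pid);               /* also set from the parent to avoid a race */

    for (long waited_ms = 0; waited_ms < timeout_sec * 1000L; waited_ms += 100) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        }
        if (r < 0 && errno != EINTR) {
            return -1;
        }
        nanosleep(&tick, NULL);      /* child still running: poll again later */
    }
    /* Still running after the limit: treat it as a hang, kill the group. */
    kill(-pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -2;
}
```

For example, calling it in a loop with argv = { "tests/test-filter-mirror", NULL } and QTEST_QEMU_BINARY=s390x-softmmu/qemu-system-s390x in the environment, and stopping at the first -2, marks a reproduction. Putting the child in its own process group means kill(-pid, SIGKILL) also reaches the QEMU process the test started, assuming QEMU stays in that group; to inspect the hang instead, one would attach gdb before the kill.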

thanks
-- PMM


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-14 16:33   ` Zhang Chen
@ 2019-01-17  9:46     ` Jason Wang
  2019-01-21 18:56       ` Peter Maydell
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-17  9:46 UTC
  To: Zhang Chen, Dr. David Alan Gilbert
  Cc: Peter Maydell, QEMU Developers, Li Zhijian, Paolo Bonzini


On 2019/1/15 12:33 AM, Zhang Chen wrote:
>
>
> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>
>     * Peter Maydell (peter.maydell@linaro.org) wrote:
>     > Recently I've noticed that test-filter-mirror has been hanging
>     > intermittently, typically when run on some other TCG architecture.
>     > In the instance I've just looked at, this was with s390x guest on
>     > x86-64 host, though I've also seen it on other host archs and
>     > perhaps with other guests.
>
>     Watch out to see whether you really do see it with other guests:
>     the test carefully avoids virtio-net so as not to use vhost, but on
>     s390x it uses virtio-net-ccw. Could that hit the vhost it was trying
>     to avoid?
>
>     > Below is a backtrace, though it seems to be pretty unhelpful.
>     > Anybody got any theories? Does the mirror test rely on dirty
>     > memory bitmaps like the migration test does (which also hangs
>     > occasionally with TCG, due to some bug I'm sure we've investigated
>     > in the past)?
>
>     I don't think it relies on the CPU at all.
>
>     Dave
>
>
> I have no idea about this at the moment, but Jason and I designed the
> test case. Adding Jason: do you have any comments on this?


I can't reproduce this locally with s390x-softmmu. It looks to me like
the test should be independent of any kind of emulation: it should pass
as long as the main loop works.
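Jason's point can be made concrete. As far as can be told from the command line earlier in the thread, the test hands QEMU one end of a socket as the socket netdev (fd=4), pushes a packet into it, and expects filter-mirror to copy that packet to the mirror0 chardev. Nothing on that path needs guest CPU execution; it only needs the main loop to service the two sockets. The sketch below shows the stream framing assumed on both sockets, a 4-byte big-endian length followed by the payload. write_all() and send_framed() are hypothetical names, not QEMU APIs, and the framing is an assumption rather than something checked against this QEMU version.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical: write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0 && errno == EINTR) {
            continue;
        }
        if (w <= 0) {
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/*
 * Hypothetical: emit one packet as [4-byte big-endian length][payload],
 * the stream framing assumed here for both the socket netdev input and
 * the filter-mirror output.
 */
static int send_framed(int fd, const void *pkt, uint32_t len)
{
    uint32_t netlen = htonl(len);

    if (write_all(fd, &netlen, sizeof(netlen)) < 0) {
        return -1;
    }
    return write_all(fd, pkt, len);
}
```

If the framing assumption holds, the mirrored output should carry the same length header followed by the same payload bytes the test wrote in, which is what a reader like recv_framed() on the test side would check.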

Thanks




* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-17  9:46     ` Jason Wang
@ 2019-01-21 18:56       ` Peter Maydell
  2019-01-21 20:01         ` Dr. David Alan Gilbert
  2019-01-23  2:43         ` Jason Wang
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Maydell @ 2019-01-21 18:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhang Chen, Dr. David Alan Gilbert, QEMU Developers, Li Zhijian,
	Paolo Bonzini

On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/1/15 上午12:33, Zhang Chen wrote:
> >
> >
> > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
> >
> >     * Peter Maydell (peter.maydell@linaro.org
> >     <mailto:peter.maydell@linaro.org>) wrote:
> >     > Recently I've noticed that test-filter-mirror has been hanging
> >     > intermittently, typically when run on some other TCG architecture.
> >     > In the instance I've just looked at, this was with s390x guest on
> >     > x86-64 host, though I've also seen it on other host archs and
> >     > perhaps with other guests.
> >
> >     Watch out to see if you really do see it for other guests;
> >     it carefully avoids using virtio-net to avoid vhost; but on s390x it
> >     uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> >
> >     > Below is a backtrace, though it seems to be pretty unhelpful.
> >     > Anybody got any theories ? Does the mirror test rely on dirty
> >     > memory bitmaps like the migration test (which also hangs
> >     > occasionally with TCG due to some bug I'm sure we've investigated
> >     > in the past) ?
> >
> >     I don't think it relies on the CPU at all.

> >  I have no idea about this currently, but Jason and I designed the
> > test case.
> > Add Jason: Have any comments about this ?
>
>
> I can't reproduce this locally with s390x-softmmu. It looks to me the
> test should be independent to any kinds of emulation. It should pass
> when mainloop work.

I've just seen a hang with ppc64 guest on s390x host, so it is
indeed not specific to s390x guest (and so not specific to
virtio-net either, since the ppc64 guest setup uses e1000).

thanks
-- PMM


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-21 18:56       ` Peter Maydell
@ 2019-01-21 20:01         ` Dr. David Alan Gilbert
  2019-01-22  9:06           ` Peter Maydell
  2019-01-23  2:43         ` Jason Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-21 20:01 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jason Wang, Zhang Chen, QEMU Developers, Li Zhijian, Paolo Bonzini

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > >
> > >
> > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
> > >
> > >     * Peter Maydell (peter.maydell@linaro.org
> > >     <mailto:peter.maydell@linaro.org>) wrote:
> > >     > Recently I've noticed that test-filter-mirror has been hanging
> > >     > intermittently, typically when run on some other TCG architecture.
> > >     > In the instance I've just looked at, this was with s390x guest on
> > >     > x86-64 host, though I've also seen it on other host archs and
> > >     > perhaps with other guests.
> > >
> > >     Watch out to see if you really do see it for other guests;
> > >     it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > >     uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > >
> > >     > Below is a backtrace, though it seems to be pretty unhelpful.
> > >     > Anybody got any theories ? Does the mirror test rely on dirty
> > >     > memory bitmaps like the migration test (which also hangs
> > >     > occasionally with TCG due to some bug I'm sure we've investigated
> > >     > in the past) ?
> > >
> > >     I don't think it relies on the CPU at all.
> 
> > >  I have no idea about this currently, but Jason and I designed the
> > > test case.
> > > Add Jason: Have any comments about this ?
> >
> >
> > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > test should be independent to any kinds of emulation. It should pass
> > when mainloop work.
> 
> I've just seen a hang with ppc64 guest on s390x host, so it is
> indeed not specific to s390x guest (and so not specific to
> virtio-net either, since the ppc64 guest setup uses e1000).

Hmph, there goes that idea.

I guess we need some tracing of the packet flow;  do you build with
tracing on and can we enable it for a test?

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-21 20:01         ` Dr. David Alan Gilbert
@ 2019-01-22  9:06           ` Peter Maydell
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Maydell @ 2019-01-22  9:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jason Wang, Zhang Chen, QEMU Developers, Li Zhijian, Paolo Bonzini

On Mon, 21 Jan 2019 at 20:01, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> I guess we need some tracing of the packet flow;  do you build with
> tracing on and can we enable it for a test?

What you get is "make && make check", effectively. If you want
better debuggability of the test you need to improve the test...

thanks
-- PMM


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-21 18:56       ` Peter Maydell
  2019-01-21 20:01         ` Dr. David Alan Gilbert
@ 2019-01-23  2:43         ` Jason Wang
  2019-01-23 19:53           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-23  2:43 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Zhang Chen, Dr. David Alan Gilbert, QEMU Developers, Li Zhijian,
	Paolo Bonzini, Peter Xu


On 2019/1/22 上午2:56, Peter Maydell wrote:
> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/1/15 上午12:33, Zhang Chen wrote:
>>>
>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>> <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
>>>
>>>      * Peter Maydell (peter.maydell@linaro.org
>>>      <mailto:peter.maydell@linaro.org>) wrote:
>>>      > Recently I've noticed that test-filter-mirror has been hanging
>>>      > intermittently, typically when run on some other TCG architecture.
>>>      > In the instance I've just looked at, this was with s390x guest on
>>>      > x86-64 host, though I've also seen it on other host archs and
>>>      > perhaps with other guests.
>>>
>>>      Watch out to see if you really do see it for other guests;
>>>      it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>      uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>
>>>      > Below is a backtrace, though it seems to be pretty unhelpful.
>>>      > Anybody got any theories ? Does the mirror test rely on dirty
>>>      > memory bitmaps like the migration test (which also hangs
>>>      > occasionally with TCG due to some bug I'm sure we've investigated
>>>      > in the past) ?
>>>
>>>      I don't think it relies on the CPU at all.
>>>   I have no idea about this currently, but Jason and I designed the
>>> test case.
>>> Add Jason: Have any comments about this ?
>>
>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>> test should be independent to any kinds of emulation. It should pass
>> when mainloop work.
> I've just seen a hang with ppc64 guest on s390x host, so it is
> indeed not specific to s390x guest (and so not specific to
> virtio-net either, since the ppc64 guest setup uses e1000).
>
> thanks
> -- PMM


Finally reproduced it locally, after hundreds (sometimes thousands) of runs.

Bisection points to OOB monitor[1].

It looks to me that, now that OOB is used unconditionally, we have lost a
barrier that made sure the socket was connected before test-filter-mirror.c
sends packets. Is there any other similarly simple thing we could do to
kick the main loop?

Thanks

[1]

commit 8258292e18c39480b64eba9f3551ab772ce29b5d (HEAD, refs/bisect/bad)
Author: Peter Xu <peterx@redhat.com>
Date:   Tue Oct 9 14:27:15 2018 +0800

     monitor: Remove "x-oob", offer capability "oob" unconditionally

     Out-of-band command execution was introduced in commit cf869d53172.
     Unfortunately, we ran into a regression, and had to turn it into an
     experimental option for 2.12 (commit be933ffc23).

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

     The regression has since been fixed (commit 951702f39c7 "monitor: bind
     dispatch bh to iohandler context").  A thorough re-review of OOB
     commands led to a few more issues, which have also been addressed.

     This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
     and makes QMP monitors again offer capability "oob" whenever they can
     provide it, i.e. when the monitor's character device is capable of
     running in an I/O thread.

     Some trivial touch-up in the test code is required to make sure qmp-test
     won't break.

     Reviewed-by: Markus Armbruster <armbru@redhat.com>
     Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
     Signed-off-by: Peter Xu <peterx@redhat.com>
     Message-Id: <20181009062718.1914-4-peterx@redhat.com>
     [Conflict with "monitor: check if chardev can switch gcontext for OOB"
     resolved, commit message updated]
     Signed-off-by: Markus Armbruster <armbru@redhat.com>


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-23  2:43         ` Jason Wang
@ 2019-01-23 19:53           ` Dr. David Alan Gilbert
  2019-01-24  4:01             ` Jason Wang
  2019-01-24 10:11             ` Daniel P. Berrangé
  0 siblings, 2 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-23 19:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Maydell, Zhang Chen, QEMU Developers, Li Zhijian,
	Paolo Bonzini, Peter Xu

* Jason Wang (jasowang@redhat.com) wrote:
> 
> On 2019/1/22 上午2:56, Peter Maydell wrote:
> > On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
> > > 
> > > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > > > 
> > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
> > > > 
> > > >      * Peter Maydell (peter.maydell@linaro.org
> > > >      <mailto:peter.maydell@linaro.org>) wrote:
> > > >      > Recently I've noticed that test-filter-mirror has been hanging
> > > >      > intermittently, typically when run on some other TCG architecture.
> > > >      > In the instance I've just looked at, this was with s390x guest on
> > > >      > x86-64 host, though I've also seen it on other host archs and
> > > >      > perhaps with other guests.
> > > > 
> > > >      Watch out to see if you really do see it for other guests;
> > > >      it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > >      uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > 
> > > >      > Below is a backtrace, though it seems to be pretty unhelpful.
> > > >      > Anybody got any theories ? Does the mirror test rely on dirty
> > > >      > memory bitmaps like the migration test (which also hangs
> > > >      > occasionally with TCG due to some bug I'm sure we've investigated
> > > >      > in the past) ?
> > > > 
> > > >      I don't think it relies on the CPU at all.
> > > >   I have no idea about this currently, but Jason and I designed the
> > > > test case.
> > > > Add Jason: Have any comments about this ?
> > > 
> > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > test should be independent to any kinds of emulation. It should pass
> > > when mainloop work.
> > I've just seen a hang with ppc64 guest on s390x host, so it is
> > indeed not specific to s390x guest (and so not specific to
> > virtio-net either, since the ppc64 guest setup uses e1000).
> > 
> > thanks
> > -- PMM
> 
> 
> Finally reproduced locally after hundreds (sometimes thousands) times of
> running.
> 
> Bisection points to OOB monitor[1].
> 
> It looks to me after OOB is used unconditionally we lose a barrier to make
> sure socket is connected before sending packets in test-filter-mirror.c. Is
> there any other similar and simple thing that we could do to kick the
> mainloop?

Do you mean the:

    /* send a qmp command to guarantee that 'connected' is setting to true. */
    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");

why was that ever sufficient to know the socket was ready?

Dave

> Thanks
> 
> [1]
> 
> commit 8258292e18c39480b64eba9f3551ab772ce29b5d (HEAD, refs/bisect/bad)
> Author: Peter Xu <peterx@redhat.com>
> Date:   Tue Oct 9 14:27:15 2018 +0800
> 
>     monitor: Remove "x-oob", offer capability "oob" unconditionally
> 
>     Out-of-band command execution was introduced in commit cf869d53172.
>     Unfortunately, we ran into a regression, and had to turn it into an
>     experimental option for 2.12 (commit be933ffc23).
> 
> http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
> 
>     The regression has since been fixed (commit 951702f39c7 "monitor: bind
>     dispatch bh to iohandler context").  A thorough re-review of OOB
>     commands led to a few more issues, which have also been addressed.
> 
>     This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
>     and makes QMP monitors again offer capability "oob" whenever they can
>     provide it, i.e. when the monitor's character device is capable of
>     running in an I/O thread.
> 
>     Some trivial touch-up in the test code is required to make sure qmp-test
>     won't break.
> 
>     Reviewed-by: Markus Armbruster <armbru@redhat.com>
>     Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>     Signed-off-by: Peter Xu <peterx@redhat.com>
>     Message-Id: <20181009062718.1914-4-peterx@redhat.com>
>     [Conflict with "monitor: check if chardev can switch gcontext for OOB"
>     resolved, commit message updated]
>     Signed-off-by: Markus Armbruster <armbru@redhat.com>
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-23 19:53           ` Dr. David Alan Gilbert
@ 2019-01-24  4:01             ` Jason Wang
  2019-01-24  9:11               ` Dr. David Alan Gilbert
  2019-01-24  9:47               ` Markus Armbruster
  2019-01-24 10:11             ` Daniel P. Berrangé
  1 sibling, 2 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-24  4:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu, Zhang Chen,
	Paolo Bonzini


On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
>> On 2019/1/22 上午2:56, Peter Maydell wrote:
>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>> On 2019/1/15 上午12:33, Zhang Chen wrote:
>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>> <dgilbert@redhat.com  <mailto:dgilbert@redhat.com>> wrote:
>>>>>
>>>>>       * Peter Maydell (peter.maydell@linaro.org
>>>>>       <mailto:peter.maydell@linaro.org>) wrote:
>>>>>       > Recently I've noticed that test-filter-mirror has been hanging
>>>>>       > intermittently, typically when run on some other TCG architecture.
>>>>>       > In the instance I've just looked at, this was with s390x guest on
>>>>>       > x86-64 host, though I've also seen it on other host archs and
>>>>>       > perhaps with other guests.
>>>>>
>>>>>       Watch out to see if you really do see it for other guests;
>>>>>       it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>       uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>
>>>>>       > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>       > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>       > memory bitmaps like the migration test (which also hangs
>>>>>       > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>       > in the past) ?
>>>>>
>>>>>       I don't think it relies on the CPU at all.
>>>>>    I have no idea about this currently, but Jason and I designed the
>>>>> test case.
>>>>> Add Jason: Have any comments about this ?
>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>> test should be independent to any kinds of emulation. It should pass
>>>> when mainloop work.
>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>> indeed not specific to s390x guest (and so not specific to
>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>
>>> thanks
>>> -- PMM
>> Finally reproduced locally after hundreds (sometimes thousands) times of
>> running.
>>
>> Bisection points to OOB monitor[1].
>>
>> It looks to me after OOB is used unconditionally we lose a barrier to make
>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>> there any other similar and simple thing that we could do to kick the
>> mainloop?
> Do you mean the:
>
>      /* send a qmp command to guarantee that 'connected' is setting to true. */
>      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");


Yes.


>
> why was that ever sufficient to know the socket was ready?


It was suggested by Fam; I don't remember the details. With a non-OOB
monitor, can we be sure that all pending events have been processed (i.e.
the UNIX socket has been marked connected) once query-status returns?

Thanks


>
> Dave
>


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  4:01             ` Jason Wang
@ 2019-01-24  9:11               ` Dr. David Alan Gilbert
  2019-01-24  9:51                 ` Peter Xu
  2019-01-25  3:45                 ` Jason Wang
  2019-01-24  9:47               ` Markus Armbruster
  1 sibling, 2 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-24  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu, Zhang Chen,
	Paolo Bonzini

* Jason Wang (jasowang@redhat.com) wrote:
> 
> On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> > > On 2019/1/22 上午2:56, Peter Maydell wrote:
> > > > On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
> > > > > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > > > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > > > <dgilbert@redhat.com  <mailto:dgilbert@redhat.com>> wrote:
> > > > > > 
> > > > > >       * Peter Maydell (peter.maydell@linaro.org
> > > > > >       <mailto:peter.maydell@linaro.org>) wrote:
> > > > > >       > Recently I've noticed that test-filter-mirror has been hanging
> > > > > >       > intermittently, typically when run on some other TCG architecture.
> > > > > >       > In the instance I've just looked at, this was with s390x guest on
> > > > > >       > x86-64 host, though I've also seen it on other host archs and
> > > > > >       > perhaps with other guests.
> > > > > > 
> > > > > >       Watch out to see if you really do see it for other guests;
> > > > > >       it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > > > >       uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > > > 
> > > > > >       > Below is a backtrace, though it seems to be pretty unhelpful.
> > > > > >       > Anybody got any theories ? Does the mirror test rely on dirty
> > > > > >       > memory bitmaps like the migration test (which also hangs
> > > > > >       > occasionally with TCG due to some bug I'm sure we've investigated
> > > > > >       > in the past) ?
> > > > > > 
> > > > > >       I don't think it relies on the CPU at all.
> > > > > >    I have no idea about this currently, but Jason and I designed the
> > > > > > test case.
> > > > > > Add Jason: Have any comments about this ?
> > > > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > > > test should be independent to any kinds of emulation. It should pass
> > > > > when mainloop work.
> > > > I've just seen a hang with ppc64 guest on s390x host, so it is
> > > > indeed not specific to s390x guest (and so not specific to
> > > > virtio-net either, since the ppc64 guest setup uses e1000).
> > > > 
> > > > thanks
> > > > -- PMM
> > > Finally reproduced locally after hundreds (sometimes thousands) times of
> > > running.
> > > 
> > > Bisection points to OOB monitor[1].
> > > 
> > > It looks to me after OOB is used unconditionally we lose a barrier to make
> > > sure socket is connected before sending packets in test-filter-mirror.c. Is
> > > there any other similar and simple thing that we could do to kick the
> > > mainloop?
> > Do you mean the:
> > 
> >      /* send a qmp command to guarantee that 'connected' is setting to true. */
> >      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> 
> 
> Yes.
> 
> 
> > 
> > why was that ever sufficient to know the socket was ready?
> 
> 
> It was suggested by Fam, I don't remember the details. Can we make sure all
> pending events has been processed (UNIX socket was set to connected) after
> query-status is returned with an non OOB monitor?

I'm not sure - it doesn't sound like a 'query-status' should ensure
anything else.
How about something like a 'query-chardev' - can that tell you what you
need and loop until it's ready?

Dave

> Thanks
> 
> 
> > 
> > Dave
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  4:01             ` Jason Wang
  2019-01-24  9:11               ` Dr. David Alan Gilbert
@ 2019-01-24  9:47               ` Markus Armbruster
  2019-01-25  3:56                 ` Jason Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Markus Armbruster @ 2019-01-24  9:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dr. David Alan Gilbert, Peter Maydell, Li Zhijian,
	QEMU Developers, Peter Xu, Zhang Chen, Paolo Bonzini

Please cc: me on QMP issues.

Jason Wang <jasowang@redhat.com> writes:

> On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
>> * Jason Wang (jasowang@redhat.com) wrote:
>>> On 2019/1/22 上午2:56, Peter Maydell wrote:
>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>>> On 2019/1/15 上午12:33, Zhang Chen wrote:
>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>> <dgilbert@redhat.com  <mailto:dgilbert@redhat.com>> wrote:
>>>>>>
>>>>>>       * Peter Maydell (peter.maydell@linaro.org
>>>>>>       <mailto:peter.maydell@linaro.org>) wrote:
>>>>>>       > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>       > intermittently, typically when run on some other TCG architecture.
>>>>>>       > In the instance I've just looked at, this was with s390x guest on
>>>>>>       > x86-64 host, though I've also seen it on other host archs and
>>>>>>       > perhaps with other guests.
>>>>>>
>>>>>>       Watch out to see if you really do see it for other guests;
>>>>>>       it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>       uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>
>>>>>>       > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>       > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>       > memory bitmaps like the migration test (which also hangs
>>>>>>       > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>       > in the past) ?
>>>>>>
>>>>>>       I don't think it relies on the CPU at all.
>>>>>>    I have no idea about this currently, but Jason and I designed the
>>>>>> test case.
>>>>>> Add Jason: Have any comments about this ?
>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>> test should be independent to any kinds of emulation. It should pass
>>>>> when mainloop work.
>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>> indeed not specific to s390x guest (and so not specific to
>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>
>>>> thanks
>>>> -- PMM
>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>> running.
>>>
>>> Bisection points to OOB monitor[1].
>>>
>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>> there any other similar and simple thing that we could do to kick the
>>> mainloop?
>> Do you mean the:
>>
>>      /* send a qmp command to guarantee that 'connected' is setting to true. */
>>      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>
>
> Yes.
>
>
>>
>> why was that ever sufficient to know the socket was ready?
>
>
> It was suggested by Fam, I don't remember the details. Can we make
> sure all pending events has been processed (UNIX socket was set to
> connected) after query-status is returned with an non OOB monitor?

I'm afraid I lack context.  Which socket are you talking about?  The
test has at least the QMP socket, the send_sock[], and recv_sock.  What
exactly are you trying to accomplish?

By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
looks rather fishy.  Why create a temporary file only to create a Unix
domain socket right over it?  Why is ignoring errors a good idea?


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  9:11               ` Dr. David Alan Gilbert
@ 2019-01-24  9:51                 ` Peter Xu
  2019-01-25  3:55                   ` Jason Wang
  2019-01-25  3:45                 ` Jason Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Xu @ 2019-01-24  9:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jason Wang, Peter Maydell, Li Zhijian, QEMU Developers,
	Zhang Chen, Paolo Bonzini

On Thu, Jan 24, 2019 at 09:11:15AM +0000, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
> > 
> > On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
> > > * Jason Wang (jasowang@redhat.com) wrote:
> > > > On 2019/1/22 上午2:56, Peter Maydell wrote:
> > > > > On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
> > > > > > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > > > > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > > > > <dgilbert@redhat.com  <mailto:dgilbert@redhat.com>> wrote:
> > > > > > > 
> > > > > > >       * Peter Maydell (peter.maydell@linaro.org
> > > > > > >       <mailto:peter.maydell@linaro.org>) wrote:
> > > > > > >       > Recently I've noticed that test-filter-mirror has been hanging
> > > > > > >       > intermittently, typically when run on some other TCG architecture.
> > > > > > >       > In the instance I've just looked at, this was with s390x guest on
> > > > > > >       > x86-64 host, though I've also seen it on other host archs and
> > > > > > >       > perhaps with other guests.
> > > > > > > 
> > > > > > >       Watch out to see if you really do see it for other guests;
> > > > > > >       it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > > > > >       uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > > > > 
> > > > > > >       > Below is a backtrace, though it seems to be pretty unhelpful.
> > > > > > >       > Anybody got any theories ? Does the mirror test rely on dirty
> > > > > > >       > memory bitmaps like the migration test (which also hangs
> > > > > > >       > occasionally with TCG due to some bug I'm sure we've investigated
> > > > > > >       > in the past) ?
> > > > > > > 
> > > > > > >       I don't think it relies on the CPU at all.
> > > > > > >    I have no idea about this currently, but Jason and I designed the
> > > > > > > test case.
> > > > > > > Add Jason: Have any comments about this ?
> > > > > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > > > > test should be independent to any kinds of emulation. It should pass
> > > > > > when mainloop work.
> > > > > I've just seen a hang with ppc64 guest on s390x host, so it is
> > > > > indeed not specific to s390x guest (and so not specific to
> > > > > virtio-net either, since the ppc64 guest setup uses e1000).
> > > > > 
> > > > > thanks
> > > > > -- PMM
> > > > Finally reproduced locally after hundreds (sometimes thousands) times of
> > > > running.
> > > > 
> > > > Bisection points to OOB monitor[1].
> > > > 
> > > > It looks to me after OOB is used unconditionally we lose a barrier to make
> > > > sure socket is connected before sending packets in test-filter-mirror.c. Is
> > > > there any other similar and simple thing that we could do to kick the
> > > > mainloop?
> > > Do you mean the:
> > > 
> > >      /* send a qmp command to guarantee that 'connected' is setting to true. */
> > >      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> > 
> > 
> > Yes.
> > 
> > 
> > > 
> > > why was that ever sufficient to know the socket was ready?
> > 
> > 
> > It was suggested by Fam, I don't remember the details. Can we make sure all
> > pending events has been processed (UNIX socket was set to connected) after
> > query-status is returned with an non OOB monitor?
> 
> I'm not sure - it doesn't sound like a 'query-status' should ensure
> anything else.
> How about something like a 'query-chardev' - can that tell you what you
> need and loop until it's ready?

Yeah, it sounds hacky to use "query-status" to make sure a specific
chardev is connected, even before OOB...

I saw that currently the chardev requires "nowait":

    qts = qtest_initf(
        "-netdev socket,id=qtest-bn0,fd=%d "
        "-device %s,netdev=qtest-bn0,id=qtest-e0 "
        "-chardev socket,id=mirror0,path=%s,server,nowait "
        "-object filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,outdev=mirror0 "
        , send_sock[1], devstr, sock_path);

Could it work without "nowait"?  Would that make QEMU wait until the
connection is established before going on?

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-23 19:53           ` Dr. David Alan Gilbert
  2019-01-24  4:01             ` Jason Wang
@ 2019-01-24 10:11             ` Daniel P. Berrangé
  2019-01-24 10:30               ` Daniel P. Berrangé
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2019-01-24 10:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jason Wang, Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu,
	Zhang Chen, Paolo Bonzini

On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
> > 
> > On 2019/1/22 上午2:56, Peter Maydell wrote:
> > > On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
> > > > 
> > > > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > > > > 
> > > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > > <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
> > > > > 
> > > > >      * Peter Maydell (peter.maydell@linaro.org
> > > > >      <mailto:peter.maydell@linaro.org>) wrote:
> > > > >      > Recently I've noticed that test-filter-mirror has been hanging
> > > > >      > intermittently, typically when run on some other TCG architecture.
> > > > >      > In the instance I've just looked at, this was with s390x guest on
> > > > >      > x86-64 host, though I've also seen it on other host archs and
> > > > >      > perhaps with other guests.
> > > > > 
> > > > >      Watch out to see if you really do see it for other guests;
> > > > >      it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > > >      uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > > 
> > > > >      > Below is a backtrace, though it seems to be pretty unhelpful.
> > > > >      > Anybody got any theories ? Does the mirror test rely on dirty
> > > > >      > memory bitmaps like the migration test (which also hangs
> > > > >      > occasionally with TCG due to some bug I'm sure we've investigated
> > > > >      > in the past) ?
> > > > > 
> > > > >      I don't think it relies on the CPU at all.
> > > > >   I have no idea about this currently, but Jason and I designed the
> > > > > test case.
> > > > > Add Jason: Have any comments about this ?
> > > > 
> > > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > > test should be independent to any kinds of emulation. It should pass
> > > > when mainloop work.
> > > I've just seen a hang with ppc64 guest on s390x host, so it is
> > > indeed not specific to s390x guest (and so not specific to
> > > virtio-net either, since the ppc64 guest setup uses e1000).
> > > 
> > > thanks
> > > -- PMM
> > 
> > 
> > Finally reproduced locally after hundreds (sometimes thousands) times of
> > running.
> > 
> > Bisection points to OOB monitor[1].
> > 
> > It looks to me after OOB is used unconditionally we lose a barrier to make
> > sure socket is connected before sending packets in test-filter-mirror.c. Is
> > there any other similar and simple thing that we could do to kick the
> > mainloop?
> 
> Do you mean the:
> 
>     /* send a qmp command to guarantee that 'connected' is setting to true. */
>     qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> 
> why was that ever sufficient to know the socket was ready?

This doesn't make any sense to me.

There's the netdev socket, which has been passed in as a pre-opened socket
FD, so that's guaranteed connected.

There's the chardev server socket, to which we've just done a unix_connect()
call to establish a connection. If unix_connect() has succeeded, then at least
the socket is connected & ready for I/O from the test's side. This is a
reliable stream socket, so even if the test sends data on the socket right away
and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
as soon as QEMU starts to monitor for incoming data on the socket.
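A minimal sketch of the point above, using socketpair() as a stand-in for the test's unix_connect() to QEMU's listening chardev: bytes written on a connected AF_UNIX stream socket before the peer starts reading are queued by the kernel and handed over on the peer's first read. The helper name is illustrative only and is not taken from test-filter-mirror.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical): write msg on one end of a connected
 * AF_UNIX stream pair *before* anyone reads the other end, then read it
 * back. Returns the number of bytes received, or -1 on error.
 */
static ssize_t send_before_peer_reads(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    size_t len = strlen(msg);
    ssize_t n = -1;

    assert(outlen > len);   /* room for the message plus a terminating NUL */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    /* The reader (sv[1]) is not polling yet: the kernel buffers the bytes. */
    if (write(sv[0], msg, len) == (ssize_t)len) {
        /* Later the reader starts monitoring the socket and gets them. */
        n = read(sv[1], out, outlen - 1);
        if (n >= 0) {
            out[n] = '\0';
        }
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

This only demonstrates the kernel side of the argument; it says nothing about what QEMU does with the data once it has read it.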

So I don't get what trying to wait for a "connected" state actually achieves.
It feels like a mistaken attempt to paper over some other unknown flaw that
just worked by some lucky side-effect.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24 10:11             ` Daniel P. Berrangé
@ 2019-01-24 10:30               ` Daniel P. Berrangé
  2019-01-24 11:01                 ` Daniel P. Berrangé
  2019-01-25  7:00                 ` Jason Wang
  0 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2019-01-24 10:30 UTC
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, Jason Wang, QEMU Developers, Peter Xu,
	Zhang Chen, Paolo Bonzini

On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
> On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> > > 
> > > On 2019/1/22 2:56 AM, Peter Maydell wrote:
> > > > On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
> > > > > 
> > > > > On 2019/1/15 12:33 AM, Zhang Chen wrote:
> > > > > > 
> > > > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > > > <dgilbert@redhat.com> wrote:
> > > > > > 
> > > > > >      * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > > > >      > Recently I've noticed that test-filter-mirror has been hanging
> > > > > >      > intermittently, typically when run on some other TCG architecture.
> > > > > >      > In the instance I've just looked at, this was with s390x guest on
> > > > > >      > x86-64 host, though I've also seen it on other host archs and
> > > > > >      > perhaps with other guests.
> > > > > > 
> > > > > >      Watch out to see if you really do see it for other guests;
> > > > > >      it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > > > >      uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > > > 
> > > > > >      > Below is a backtrace, though it seems to be pretty unhelpful.
> > > > > >      > Anybody got any theories ? Does the mirror test rely on dirty
> > > > > >      > memory bitmaps like the migration test (which also hangs
> > > > > >      > occasionally with TCG due to some bug I'm sure we've investigated
> > > > > >      > in the past) ?
> > > > > > 
> > > > > >      I don't think it relies on the CPU at all.
> > > > > >   I have no idea about this currently, but Jason and I designed the
> > > > > > test case.
> > > > > > Add Jason: Have any comments about this ?
> > > > > 
> > > > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > > > test should be independent to any kinds of emulation. It should pass
> > > > > when mainloop work.
> > > > I've just seen a hang with ppc64 guest on s390x host, so it is
> > > > indeed not specific to s390x guest (and so not specific to
> > > > virtio-net either, since the ppc64 guest setup uses e1000).
> > > > 
> > > > thanks
> > > > -- PMM
> > > 
> > > 
> > > Finally reproduced locally after hundreds (sometimes thousands) times of
> > > running.
> > > 
> > > Bisection points to OOB monitor[1].
> > > 
> > > It looks to me after OOB is used unconditionally we lose a barrier to make
> > > sure socket is connected before sending packets in test-filter-mirror.c. Is
> > > there any other similar and simple thing that we could do to kick the
> > > mainloop?
> > 
> > Do you mean the:
> > 
> >     /* send a qmp command to guarantee that 'connected' is setting to true. */
> >     qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> > 
> > why was that ever sufficient to know the socket was ready?
> 
> This doesn't make any sense to me.
> 
> There's the netdev socket, which has been passed in as a pre-opened socket
> FD, so that's guaranteed connected.
> 
> There's the chardev server socket, to which we've just done a unix_connect()
> call to establish a connection. If unix_connect() has succeeded, then at least
> the socket is connected & ready for I/O from the test's side. This is a
> reliable stream socket, so even if the test sends data on the socket right away
> and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
> as soon as QEMU starts to monitor for incoming data on the socket.
> 
> So I don't get what trying to wait for a "connected" state actually achieves.
> It feels like a mistaken attempt to paper over some other unknown flaw that
> just worked by some lucky side-effect.

Immediately after writing that, I see what's happened.

The  filter_redirector_receive_iov() method is triggered when QEMU reads
from the -netdev socket (which we passed in as an FD and immediately
write to).

This method will discard all data, however, if the chr_out -chardev is
not in a connected state. So we do indeed have a race condition in this
test suite.

In fact I'd say this filter-mirror object is racy by design even in
normal usage, if your chardev is in server mode with "nowait" set,
or in client mode with "reconnect" set. It will simply discard data.
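To make the ordering problem concrete, here is a toy model of the behaviour described above. The type and function names are hypothetical and this is not the real filter-mirror/redirector code; it only shows that a packet arriving while the output chardev is disconnected ends up counted as dropped, with no error reported to anyone. In the test, that presumably leaves the receiving side waiting for a packet that never arrives, which would be consistent with the observed hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the mirror's output path; NOT QEMU code. */
typedef struct {
    bool connected;     /* state of the output chardev */
    size_t delivered;   /* packets written to the chardev */
    size_t dropped;     /* packets silently discarded */
} MirrorOutModel;

/* Called for each packet the main loop reads from the netdev socket. */
static void mirror_model_receive(MirrorOutModel *m)
{
    assert(m != NULL);
    if (m->connected) {
        m->delivered++;
    } else {
        m->dropped++;   /* nobody is told that the packet vanished */
    }
}
```

Whether the first packet lands in the dropped or delivered branch depends only on whether the main loop reads the netdev socket before or after it processes the chardev connection, which is exactly the race.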

We can fix the test suite by using FD passing for the -chardev
too, so we're guaranteed to be connected immediately.  It might be
possible to remove "nowait" flag, but I'm not sure if that will cause
problems with the qtest handshake as it might block QEMU at startup
preventing qtest handshake from being performed.
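A sketch of the FD-passing variant, assuming the socket chardev backend accepts a pre-opened descriptor through an fd= option (that option is an assumption here, by analogy with the -netdev socket,fd= the test already uses; check it against your QEMU version). Both pairs come from socketpair() without SOCK_CLOEXEC, so a QEMU child spawned by the test inherits the descriptors and the chardev is connected before QEMU even starts. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create both pairs; [0] stays in the test, [1] is handed to QEMU.
 * No SOCK_CLOEXEC, so a fork()+exec()'d QEMU inherits the [1] ends. */
static int open_mirror_socketpairs(int send_sock[2], int recv_sock[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, send_sock) < 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, recv_sock) < 0) {
        close(send_sock[0]);
        close(send_sock[1]);
        return -1;
    }
    return 0;
}

/* Format the QEMU arguments. ASSUMPTION: "-chardev socket,...,fd=N" exists. */
static int build_mirror_args(char *buf, size_t len, const char *devstr,
                             int netdev_fd, int chardev_fd)
{
    int n;

    assert(buf != NULL && devstr != NULL);
    n = snprintf(buf, len,
                 "-netdev socket,id=qtest-bn0,fd=%d "
                 "-device %s,netdev=qtest-bn0,id=qtest-e0 "
                 "-chardev socket,id=mirror0,fd=%d "
                 "-object filter-mirror,id=qtest-f0,netdev=qtest-bn0,"
                 "queue=tx,outdev=mirror0",
                 netdev_fd, devstr, chardev_fd);
    return (n > 0 && (size_t)n < len) ? 0 : -1;   /* -1 on truncation */
}
```

The test would then read mirrored packets from recv_sock[0] instead of calling unix_connect() on a path.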

If we care about the race in real QEMU execution, then we must either
document that "nowait" or "reconnect" should never be used with
filter-mirror, or perhaps make use of "qemu_chr_wait_connected"
to synchronize startup of the filter-mirror object with the chardev
initialization. That could fix the test suite too.

Regards,
Daniel
-- 


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24 10:30               ` Daniel P. Berrangé
@ 2019-01-24 11:01                 ` Daniel P. Berrangé
  2019-01-25  7:12                   ` Jason Wang
  2019-01-25  7:00                 ` Jason Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2019-01-24 11:01 UTC
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, Jason Wang, QEMU Developers, Peter Xu,
	Zhang Chen, Paolo Bonzini

On Thu, Jan 24, 2019 at 10:30:23AM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
> > On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
> > > Do you mean the:
> > > 
> > >     /* send a qmp command to guarantee that 'connected' is setting to true. */
> > >     qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> > > 
> > > why was that ever sufficient to know the socket was ready?
> > 
> > This doesn't make any sense to me.
> > 
> > There's the netdev socket, which has been passed in as a pre-opened socket
> > FD, so that's guaranteed connected.
> > 
> > There's the chardev server socket, to which we've just done a unix_connect()
> > call to establish a connection. If unix_connect() has succeeded, then at least
> > the socket is connected & ready for I/O from the test's side. This is a
> > reliable stream socket, so even if the test sends data on the socket right away
> > and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
> > as soon as QEMU starts to monitor for incoming data on the socket.
> > 
> > So I don't get what trying to wait for a "connected" state actually achieves.
> > It feels like a mistaken attempt to paper over some other unknown flaw that
> > just worked by some lucky side-effect.
> 
> Immediately after writing that, I see what's happened.
> 
> The  filter_redirector_receive_iov() method is triggered when QEMU reads
> from the -netdev socket (which we passed in as an FD and immediately
> write to).
> 
> This method will discard all data, however, if the chr_out -chardev is
> not in a connected state. So we do indeed have a race condition in this
> test suite.
> 
> In fact I'd say this filter-mirror object is racy by design even in
> normal usage, if your chardev is in server mode with "nowait" set,
> or in client mode with "reconnect" set. It will simply discard data.
> 
> We can fix the test suite by using FD passing for the -chardev
> too, so we're guaranteed to be connected immediately.  It might be
> possible to remove "nowait" flag, but I'm not sure if that will cause
> problems with the qtest handshake as it might block QEMU at startup
> preventing qtest handshake from being performed.
> 
> If we care about the race in real QEMU execution, then we must either
> document that "nowait" or "reconnect" should never be used with
> filter-mirror, or perhaps make use of "qemu_chr_wait_connected"
> to synchronize startup of the filter-mirror object with the chardev
> initialization. That could fix the test suite too.

Actually using qemu_chr_wait_connected would cause the test suite to
hang, and it wouldn't fix data loss in the case where the chardev
disconnected and then waited to connect again.

I think the core problem here is that the netdev code assumes that the
filters are always able to process packets. A proper solution would
involve the filters having a "bool ready" state and a callback to notify
the netdev whenever this state changes.

The filter-mirror should *not* report ready until the chardev has been
opened.

The netdevs should then not read packets off the wire unless all the
registered filters are reporting that they are ready. If a filter then
transitions to not-ready, the netdev should again stop reading packets
off the wire & queue any that it might have had in flight, until the
filter becomes ready again.
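A rough sketch of the proposed readiness protocol, as a self-contained model rather than a patch: every type and function name below is hypothetical, and none of it is QEMU's actual NetFilter or NetClientState API. The netdev reads only while all filters report ready, queues packets that were already in flight otherwise, and flushes them when the last filter becomes ready again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FILTERS 4
#define MODEL_MAX_PENDING 16

/* Hypothetical filter: only the "bool ready" state matters here. */
typedef struct {
    bool ready;
} FilterModel;

/* Hypothetical netdev with its registered filters. */
typedef struct {
    FilterModel *filters[MODEL_MAX_FILTERS];
    size_t nfilters;
    bool reading;        /* is the netdev reading packets off the wire? */
    size_t npending;     /* in-flight packets queued while not ready */
    size_t forwarded;    /* packets passed through the filters */
} NetdevModel;

static bool netdev_model_all_ready(const NetdevModel *nd)
{
    for (size_t i = 0; i < nd->nfilters; i++) {
        if (!nd->filters[i]->ready) {
            return false;
        }
    }
    return true;
}

/* Callback a filter invokes whenever its ready state changes. */
static void netdev_model_ready_changed(NetdevModel *nd)
{
    nd->reading = netdev_model_all_ready(nd);
    if (nd->reading) {
        nd->forwarded += nd->npending;   /* flush the queued packets */
        nd->npending = 0;
    }
}

static void filter_model_set_ready(NetdevModel *nd, FilterModel *f, bool ready)
{
    assert(nd != NULL && f != NULL);
    f->ready = ready;
    netdev_model_ready_changed(nd);
}

/* An in-flight packet arrives; returns false only if the queue overflows. */
static bool netdev_model_deliver(NetdevModel *nd)
{
    if (nd->reading) {
        nd->forwarded++;
        return true;
    }
    if (nd->npending == MODEL_MAX_PENDING) {
        return false;
    }
    nd->npending++;
    return true;
}
```

In this model filter-mirror would call filter_model_set_ready(..., true) only once its chardev is open, and false again on disconnect.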

Without this kind of setup, the filters are inherently racy in several
of the possible -chardev configurations.

In that sense the flaky test has actually done us a favour by showing that
the code is broken. It is not in fact the test that is broken, and although
we could work around it in the test, that doesn't fix the root cause.

Regards,
Daniel
-- 


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  9:11               ` Dr. David Alan Gilbert
  2019-01-24  9:51                 ` Peter Xu
@ 2019-01-25  3:45                 ` Jason Wang
  1 sibling, 0 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-25  3:45 UTC
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu, Zhang Chen,
	Paolo Bonzini


On 2019/1/24 5:11 PM, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>
>>>>>>>        * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>        > intermittently, typically when run on some other TCG architecture.
>>>>>>>        > In the instance I've just looked at, this was with s390x guest on
>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>        > perhaps with other guests.
>>>>>>>
>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>
>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>        > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>        > in the past) ?
>>>>>>>
>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>     I have no idea about this currently, but Jason and I designed the
>>>>>>> test case.
>>>>>>> Add Jason: Have any comments about this ?
>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>> when mainloop work.
>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>> indeed not specific to s390x guest (and so not specific to
>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>
>>>>> thanks
>>>>> -- PMM
>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>> running.
>>>>
>>>> Bisection points to OOB monitor[1].
>>>>
>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>> there any other similar and simple thing that we could do to kick the
>>>> mainloop?
>>> Do you mean the:
>>>
>>>       /* send a qmp command to guarantee that 'connected' is setting to true. */
>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>
>> Yes.
>>
>>
>>> why was that ever sufficient to know the socket was ready?
>>
>> It was suggested by Fam, I don't remember the details. Can we make sure all
>> pending events has been processed (UNIX socket was set to connected) after
>> query-status is returned with an non OOB monitor?
> I'm not sure - it doesn't sound like a 'query-status' should ensure
> anything else.
> How about something like a 'query-chardev' - can that tell you what you
> need and loop until it's ready?
>
> Dave


That may work.
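A sketch of the polling approach suggested above: retry a readiness probe a bounded number of times instead of relying on a single query-status round trip. The probe is deliberately abstract and hypothetical; one option might be a callback that issues query-chardev and inspects the mirror0 entry, but which field reliably reflects the socket's connection state is an assumption to verify. countdown_probe exists only to exercise the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe: returns true once the chardev looks connected. */
typedef bool (*ReadyProbe)(void *opaque);

/* Poll probe up to max_tries times, sleeping interval_ms between tries. */
static bool wait_until_ready(ReadyProbe probe, void *opaque,
                             int max_tries, long interval_ms)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    assert(probe != NULL && max_tries > 0 && interval_ms >= 0);
    for (int i = 0; i < max_tries; i++) {
        if (probe(opaque)) {
            return true;
        }
        if (i + 1 < max_tries) {
            nanosleep(&ts, NULL);   /* no sleep after the final attempt */
        }
    }
    return false;
}

/* Stand-in probe for illustration: ready on the *opaque-th call. */
static bool countdown_probe(void *opaque)
{
    int *remaining = opaque;
    return --(*remaining) <= 0;
}
```

Bounding the loop means a broken setup fails the test instead of turning into another silent hang.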

Thanks


>
>> Thanks
>>
>>
>>> Dave
>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  9:51                 ` Peter Xu
@ 2019-01-25  3:55                   ` Jason Wang
  2019-01-25  7:14                     ` Markus Armbruster
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-25  3:55 UTC
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Zhang Chen, Paolo Bonzini


On 2019/1/24 5:51 PM, Peter Xu wrote:
> On Thu, Jan 24, 2019 at 09:11:15AM +0000, Dr. David Alan Gilbert wrote:
>> * Jason Wang (jasowang@redhat.com) wrote:
>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>>
>>>>>>>>        * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>        > intermittently, typically when run on some other TCG architecture.
>>>>>>>>        > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>        > perhaps with other guests.
>>>>>>>>
>>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>
>>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>>        > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>        > in the past) ?
>>>>>>>>
>>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>>     I have no idea about this currently, but Jason and I designed the
>>>>>>>> test case.
>>>>>>>> Add Jason: Have any comments about this ?
>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>>> when mainloop work.
>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>
>>>>>> thanks
>>>>>> -- PMM
>>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>>> running.
>>>>>
>>>>> Bisection points to OOB monitor[1].
>>>>>
>>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>>> there any other similar and simple thing that we could do to kick the
>>>>> mainloop?
>>>> Do you mean the:
>>>>
>>>>       /* send a qmp command to guarantee that 'connected' is setting to true. */
>>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>
>>> Yes.
>>>
>>>
>>>> why was that ever sufficient to know the socket was ready?
>>>
>>> It was suggested by Fam, I don't remember the details. Can we make sure all
>>> pending events has been processed (UNIX socket was set to connected) after
>>> query-status is returned with an non OOB monitor?
>> I'm not sure - it doesn't sound like a 'query-status' should ensure
>> anything else.
>> How about something like a 'query-chardev' - can that tell you what you
>> need and loop until it's ready?
> Yeah, it sounds hacky to use "query-status" to make sure a specific
> chardev is connected, even before the OOB change...


Probably, but it did work before OOB.


>
> I saw that currently the chardev requires "nowait":
>
>      qts = qtest_initf(
>          "-netdev socket,id=qtest-bn0,fd=%d "
>          "-device %s,netdev=qtest-bn0,id=qtest-e0 "
>          "-chardev socket,id=mirror0,path=%s,server,nowait "
>          "-object filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,outdev=mirror0 "
>          , send_sock[1], devstr, sock_path);
>
> Could it work without "nowait"?  Would that make sure QEMU will wait
> until connection established before going on?


That doesn't work for qtest: without "nowait", QEMU would block waiting for
the chardev connection, while qtest is itself waiting for QEMU.

Thanks


>
> Regards,
>


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24  9:47               ` Markus Armbruster
@ 2019-01-25  3:56                 ` Jason Wang
  2019-01-25  7:12                   ` Markus Armbruster
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-25  3:56 UTC
  To: Markus Armbruster
  Cc: Dr. David Alan Gilbert, Peter Maydell, Li Zhijian,
	QEMU Developers, Peter Xu, Zhang Chen, Paolo Bonzini


On 2019/1/24 5:47 PM, Markus Armbruster wrote:
> Please cc: me on QMP issues.


Ok.


>
> Jason Wang <jasowang@redhat.com> writes:
>
>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>
>>>>>>>        * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>        > intermittently, typically when run on some other TCG architecture.
>>>>>>>        > In the instance I've just looked at, this was with s390x guest on
>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>        > perhaps with other guests.
>>>>>>>
>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>
>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>        > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>        > in the past) ?
>>>>>>>
>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>     I have no idea about this currently, but Jason and I designed the
>>>>>>> test case.
>>>>>>> Add Jason: Have any comments about this ?
>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>> when mainloop work.
>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>> indeed not specific to s390x guest (and so not specific to
>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>
>>>>> thanks
>>>>> -- PMM
>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>> running.
>>>>
>>>> Bisection points to OOB monitor[1].
>>>>
>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>> there any other similar and simple thing that we could do to kick the
>>>> mainloop?
>>> Do you mean the:
>>>
>>>       /* send a qmp command to guarantee that 'connected' is setting to true. */
>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>
>> Yes.
>>
>>
>>> why was that ever sufficient to know the socket was ready?
>>
>> It was suggested by Fam, I don't remember the details. Can we make
>> sure all pending events has been processed (UNIX socket was set to
>> connected) after query-status is returned with an non OOB monitor?
> I'm afraid I lack context.  Which socket are you talking about?  The
> test has at least the QMP socket, the send_sock[], and recv_sock.  What
> exactly are you trying to accomplish?


I mean recv_sock. If the mirror tries to send a packet to it before its
is_connected flag is set to true, the packet will be dropped.


>
> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
> looks rather fishy.  Why create a temporary file only to create a Unix
> domain socket right over it?


I vaguely remember that passing an fd created by a Unix domain socket
didn't work when the test was introduced. So my understanding is that the
author needed a way to create a unique file name to be used by the Unix
domain socket at that time.
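For reference, a common way to get a unique socket name without first creating a regular file is mkdtemp(): make a private directory and put the socket path inside it, leaving QEMU to create the socket when it listens. A sketch only; the helper and the mirror.sock file name are illustrative, not what the test does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * dir_tmpl must end in "XXXXXX" and is modified in place by mkdtemp().
 * On success, path holds "<dir>/mirror.sock" and 0 is returned.
 * Unlink the socket and rmdir(dir_tmpl) when done.
 */
static int make_private_socket_path(char *dir_tmpl, char *path, size_t pathlen)
{
    size_t max = sizeof(((struct sockaddr_un *)0)->sun_path);
    int n;

    assert(dir_tmpl != NULL && path != NULL);
    if (mkdtemp(dir_tmpl) == NULL) {   /* creates the directory, mode 0700 */
        return -1;
    }
    n = snprintf(path, pathlen, "%s/mirror.sock", dir_tmpl);
    if (n < 0 || (size_t)n >= pathlen || (size_t)n >= max) {
        rmdir(dir_tmpl);               /* path too long for sun_path */
        return -1;
    }
    return 0;
}
```

Because the directory is unique and private, nothing else can race to create a file at that path.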



>   Why is ignoring errors a good idea?


I don't follow: which error is ignored? It checks the return value of
both mkstemp() and unix_connect().

Thanks


* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24 10:30               ` Daniel P. Berrangé
  2019-01-24 11:01                 ` Daniel P. Berrangé
@ 2019-01-25  7:00                 ` Jason Wang
  1 sibling, 0 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-25  7:00 UTC
  To: Daniel P. Berrangé, Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu, Zhang Chen,
	Paolo Bonzini


On 2019/1/24 6:30 PM, Daniel P. Berrangé wrote:
> On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
>> On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>
>>>>>>>       * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>       > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>       > intermittently, typically when run on some other TCG architecture.
>>>>>>>       > In the instance I've just looked at, this was with s390x guest on
>>>>>>>       > x86-64 host, though I've also seen it on other host archs and
>>>>>>>       > perhaps with other guests.
>>>>>>>
>>>>>>>       Watch out to see if you really do see it for other guests;
>>>>>>>       it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>       uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>
>>>>>>>       > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>       > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>       > memory bitmaps like the migration test (which also hangs
>>>>>>>       > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>       > in the past) ?
>>>>>>>
>>>>>>>       I don't think it relies on the CPU at all.
>>>>>>>    I have no idea about this currently, but Jason and I designed the
>>>>>>> test case.
>>>>>>> Add Jason: Have any comments about this ?
>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>> when mainloop work.
>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>> indeed not specific to s390x guest (and so not specific to
>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>
>>>>> thanks
>>>>> -- PMM
>>>>
>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>> running.
>>>>
>>>> Bisection points to OOB monitor[1].
>>>>
>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>> sure socket is connected before sending packets in test-filter-mirror.c. Is
>>>> there any other similar and simple thing that we could do to kick the
>>>> mainloop?
>>> Do you mean the:
>>>
>>>      /* send a qmp command to guarantee that 'connected' is setting to true. */
>>>      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>
>>> why was that ever sufficient to know the socket was ready?
>> This doesn't make any sense to me.
>>
>> There's the netdev socket, which has been passed in as a pre-opened socket
>> FD, so that's guaranteed connected.
>>
>> There's the chardev server socket, to which we've just done a unix_connect()
>> call to establish a connection. If unix_connect() has succeeded, then at least
>> the socket is connected & ready for I/O from the test's side. This is a
>> reliable stream socket, so even if the test sends data on the socket right away
>> and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
>> as soon as QEMU starts to monitor for incoming data on the socket.
>>
>> So I don't get what trying to wait for a "connected" state actually achieves.
>> It feels like a mistaken attempt to paper over some other unknown flaw that
>> just worked by some lucky side-effect.
> Immediately after writing that, I see what's happened.
>
> The  filter_redirector_receive_iov() method is triggered when QEMU reads
> from the -netdev socket (which we passed in as an FD and immediately
> write to).
>
> This method will discard all data, however, if the chr_out -chardev is
> not in a connected state. So we do indeed have a race condition in this
> test suite.
>
> In fact I'd say this filter-mirror object is racy by design even in
> normal usage, if your chardev is in server mode with "nowait" set,
> or in client mode with "reconnect" set. It will simply discard data.


Does this issue only exist in the case of mirror? It looks to me like some
other chardev users make the same assumption: they neither wait for the
socket to be connected nor handle CHR_EVENT_OPENED.


>
> We can fix the test suite by using FD passing for the -chardev
> too, so we're guaranteed to be connected immediately.


Good to know. I remember this didn't work when the test case was
introduced. Will post a fix shortly.


>    It might be
> possible to remove "nowait" flag, but I'm not sure if that will cause
> problems with the qtest handshake as it might block QEMU at startup
> preventing qtest handshake from being performed.


Yes, dropping "nowait" doesn't work, since qtest waits for QEMU in this case.


>
> If we care about the race in real QEMU execution, then we must either
> document that "nowait" or "reconnect" should never be used with
> filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
> to synchronize startup of the filter-mirror object with the chardev
> initialization. That could fix the test suite too.


From my point of view, the issue is that tcp_chr_write() drops packets
silently. If it returned an error in this case, the caller could decide
what to do, e.g. queue the packets and resubmit them once the connection
is established?
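Here is a minimal sketch of that suggestion. It assumes a write call that reports failure, instead of discarding data, while the chardev is not connected. Everything in it (`FakeChardev`, `fake_chr_write()`, `Sender`, the fixed-size queue) is hypothetical and only models the idea. It is not QEMU's chardev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 16
#define PKT_MAX   64

/* Hypothetical stand-in for a socket chardev backend. */
typedef struct {
    bool connected;
    size_t delivered;          /* packets actually handed to the peer */
} FakeChardev;

/* Hypothetical sender (think: a mirror filter) that keeps packets the
 * chardev could not accept yet, in arrival order. */
typedef struct {
    FakeChardev *chr;
    char queue[QUEUE_MAX][PKT_MAX];
    size_t qlen[QUEUE_MAX];
    size_t nqueued;
} Sender;

/* Report failure while not connected instead of silently discarding. */
static int fake_chr_write(FakeChardev *chr, const char *buf, size_t len)
{
    (void)buf;
    if (!chr->connected) {
        return -1;
    }
    chr->delivered++;
    return (int)len;
}

/* Send now if possible, otherwise queue. Returns false only when the
 * queue is full and the caller has to decide what to drop. */
static bool sender_send(Sender *s, const char *pkt, size_t len)
{
    assert(len <= PKT_MAX);
    /* Never bypass packets that are already waiting. */
    if (s->nqueued == 0 && fake_chr_write(s->chr, pkt, len) >= 0) {
        return true;
    }
    if (s->nqueued == QUEUE_MAX) {
        return false;
    }
    memcpy(s->queue[s->nqueued], pkt, len);
    s->qlen[s->nqueued] = len;
    s->nqueued++;
    return true;
}

/* Call when the chardev becomes connected: resubmit queued packets in order. */
static void sender_on_connected(Sender *s)
{
    size_t i;

    s->chr->connected = true;
    for (i = 0; i < s->nqueued; i++) {
        if (fake_chr_write(s->chr, s->queue[i], s->qlen[i]) < 0) {
            break;
        }
    }
    /* Move anything that still could not be sent to the front. */
    memmove(s->queue, s->queue + i, (s->nqueued - i) * sizeof(s->queue[0]));
    memmove(s->qlen, s->qlen + i, (s->nqueued - i) * sizeof(s->qlen[0]));
    s->nqueued -= i;
}
```

The main design point is preserving order. Once anything is queued, new packets go to the back of the queue rather than straight to the chardev. The queue is flushed when the connection is reported. A real implementation would also need a policy for when the queue is full.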

Thanks


>
> Regards,
> Daniel

* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-25  3:56                 ` Jason Wang
@ 2019-01-25  7:12                   ` Markus Armbruster
  2019-01-25  8:12                     ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Markus Armbruster @ 2019-01-25  7:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Maydell, Li Zhijian, Dr. David Alan Gilbert, Peter Xu,
	QEMU Developers, Zhang Chen, Paolo Bonzini

Jason Wang <jasowang@redhat.com> writes:

> On 2019/1/24 5:47 PM, Markus Armbruster wrote:
>> Please cc: me on QMP issues.
>
>
> Ok.
>
>
>>
>> Jason Wang <jasowang@redhat.com> writes:
>>
>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>>
>>>>>>>>        * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>        > intermittently, typically when run on some other TCG architecture.
>>>>>>>>        > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>        > perhaps with other guests.
>>>>>>>>
>>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>
>>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>>        > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>        > in the past) ?
>>>>>>>>
>>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>> test case. Adding Jason: any comments on this?
>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me like the
>>>>>>> test should be independent of any kind of emulation. It should pass
>>>>>>> as long as the main loop works.
>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>
>>>>>> thanks
>>>>>> -- PMM
>>>>> Finally reproduced locally after hundreds (sometimes thousands) of
>>>>> runs.
>>>>>
>>>>> Bisection points to OOB monitor[1].
>>>>>
>>>>> It looks to me like, now that OOB is used unconditionally, we have lost
>>>>> a barrier that made sure the socket is connected before sending packets
>>>>> in test-filter-mirror.c. Is there any other similarly simple thing we
>>>>> could do to kick the main loop?
>>>> Do you mean the:
>>>>
>>>>       /* send a qmp command to guarantee that 'connected' is set to true. */
>>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>
>>> Yes.
>>>
>>>
>>>> why was that ever sufficient to know the socket was ready?
>>>
>>> It was suggested by Fam; I don't remember the details. Can we make
>>> sure all pending events have been processed (UNIX socket set to
>>> connected) after query-status returns with a non-OOB monitor?
>> I'm afraid I lack context.  Which socket are you talking about?  The
>> test has at least the QMP socket, the send_sock[], and recv_sock.  What
>> exactly are you trying to accomplish?
>
>
> I mean recv_sock. If mirror tries to send a packet to it before its
> is_connected is set to true, the packet will be dropped.

So the *socket* is connected (in the TCP sense), but something else
(whatever owns is_connected) is not.  Can you point me to where
is_connected is set to true?

>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>> looks rather fishy.  Why create a temporary file only to create a Unix
>> domain socket right over it?
>
>
> I vaguely remember that passing an fd created for a Unix domain socket
> didn't work when the test was introduced. So my understanding is that the
> author needed a way to create a unique file name to be used by the Unix
> domain socket at that time.

We should really, really, really improve the test harness to run each
test program in its very own temporary directory.  Then tests can simply
create files with fixed names, and leave cleanup to the test harness.
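Here is one way a harness could do this, sketched under some assumptions. The helper names `enter_scratch_dir()` and `leave_scratch_dir()` are hypothetical, and this is not how QEMU's test harness works today. The helpers create a private directory with mkdtemp() and make it the working directory, so the test can use fixed file names. Afterwards they delete the whole tree with nftw().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callback: delete one entry. With FTW_DEPTH, directory contents are
 * visited before the directory itself, so remove() sees empty directories. */
static int rm_entry(const char *path, const struct stat *sb, int type,
                    struct FTW *ftwbuf)
{
    (void)sb;
    (void)type;
    (void)ftwbuf;
    return remove(path);
}

/* Create a private scratch directory from 'tmpl' (which must end in
 * "XXXXXX" and is modified in place) and make it the working directory,
 * so the test can use fixed names. Returns 0 on success, -1 on error. */
static int enter_scratch_dir(char *tmpl)
{
    assert(tmpl != NULL);
    if (!mkdtemp(tmpl)) {
        return -1;
    }
    return chdir(tmpl);
}

/* Return to 'prev_cwd' and delete the scratch directory together with
 * everything the test left in it. Returns 0 on success. */
static int leave_scratch_dir(const char *dir, const char *prev_cwd)
{
    if (chdir(prev_cwd) < 0) {
        return -1;
    }
    return nftw(dir, rm_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

The directory name is unique, so fixed names inside it cannot collide between tests running concurrently. Cleanup no longer depends on each test remembering to unlink what it created.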

>>   Why is ignoring errors a good idea?
>
>
> I don't get it: which error is missed? It checks the return value of both
> mkstemp() and unix_connect().

Now I neglected to provide enough context for you :)

I read

    recv_sock = unix_connect(sock_path, NULL);

and immediately went "why are errors ignored".  If I had read on (as I
should've), I would've seen they are not:

    g_assert_cmpint(recv_sock, !=, -1);

Sorry for the noise.

I'd replace both lines by

    recv_sock = unix_connect(sock_path, &error_abort);

Reports the actual error, which is an obvious improvement, with the
location pointing to the failing spot within unix_connect().  To find
where unix_connect() was called, you need to examine the stack
backtrace.  Strictly more information, but your actual mileage may vary.

* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-24 11:01                 ` Daniel P. Berrangé
@ 2019-01-25  7:12                   ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2019-01-25  7:12 UTC (permalink / raw)
  To: Daniel P. Berrangé, Dr. David Alan Gilbert
  Cc: Peter Maydell, Li Zhijian, QEMU Developers, Peter Xu, Zhang Chen,
	Paolo Bonzini


On 2019/1/24 7:01 PM, Daniel P. Berrangé wrote:
> On Thu, Jan 24, 2019 at 10:30:23AM +0000, Daniel P. Berrangé wrote:
>> On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
>>> On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
>>>> Do you mean the:
>>>>
>>>>      /* send a qmp command to guarantee that 'connected' is set to true. */
>>>>      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>
>>>> why was that ever sufficient to know the socket was ready?
>>> This doesn't make any sense to me.
>>>
>>> There's the netdev socket, which has been passed in as a pre-opened socket
>>> FD, so that's guaranteed connected.
>>>
>>> There's the chardev server socket, to which we've just done a unix_connect()
>>> call to establish a connection. If unix_connect() has succeeded, then at least
>>> the socket is connected & ready for I/O from the test's side. This is a
>>> reliable stream socket, so even if the test sends data on the socket right away
>>> and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
>>> as soon as QEMU starts to monitor for incoming data on the socket.
>>>
>>> So I don't get what trying to wait for a "connected" state actually achieves.
>>> It feels like a mistaken attempt to paper over some other unknown flaw that
>>> just worked by some lucky side-effect.
>> Immediately after writing that, I see what's happened.
>>
>> The  filter_redirector_receive_iov() method is triggered when QEMU reads
>> from the -netdev socket (which we passed in as an FD and immediately
>> write to).
>>
>> This method will discard all data, however, if the chr_out -chardev is
>> not in a connected state. So we do indeed have a race condition in this
>> test suite.
>>
>> In fact I'd say this filter-mirror object is racy by design even when
>> run in normal usage, if your chardev is a server mode with "nowait" set,
>> or is a client mode with "reconnect" set. It will simply discard data.
>>
>> We can fix the test suite by using FD passing for the -chardev
>> too, so we're guaranteed to be connected immediately.  It might be
>> possible to remove "nowait" flag, but I'm not sure if that will cause
>> problems with the qtest handshake as it might block QEMU at startup
>> preventing qtest handshake from being performed.
>>
>> If we care about the race in real QEMU execution, then we must either
>> document that "nowait" or "reconnect" should never be used with
>> filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
>> to synchronize startup of the filter-mirror object with the chardev
>> initialization. That could fix the test suite too.
> Actually using qemu_chr_wait_connected would cause the test suite to
> hang, and it wouldn't fix data loss in the case where the chardev
> disconnected and then waited to connect again.
>
> I think the core problem here is that the netdev code assumes that the
> filters are always able to process packets. A proper solution would
> involve the filters having a "bool ready" state and callback to notify
> the netdev anytime this state changes.
>
> The filter-mirror should *not* report ready until the chardev has been
> opened.
>
> The netdevs should then not read packets off the wire unless all the
> registered filters are reporting that they are ready.
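The readiness gating described here could look roughly like the sketch below. The types and functions (`Filter`, `NetDev`, `filter_set_ready()`, `netdev_filter_ready_changed()`) are hypothetical and do not exist in QEMU. The sketch does not queue in-flight packets, and it does not address the race Jason raises in his reply, between checking the filters and a disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILTERS 4

/* Hypothetical filter with a readiness flag; a mirror filter would only
 * become ready once its output chardev has been opened. */
typedef struct {
    bool ready;
} Filter;

/* Hypothetical netdev that polls its backend for packets only while every
 * attached filter is ready. */
typedef struct {
    Filter *filters[MAX_FILTERS];
    size_t nfilters;
    bool reading;              /* true while packets are read off the wire */
} NetDev;

static bool all_filters_ready(const NetDev *nd)
{
    size_t i;

    assert(nd->nfilters <= MAX_FILTERS);
    for (i = 0; i < nd->nfilters; i++) {
        if (!nd->filters[i]->ready) {
            return false;
        }
    }
    return true;
}

/* Notification a filter sends whenever its readiness changes: start or
 * stop reading accordingly. */
static void netdev_filter_ready_changed(NetDev *nd)
{
    nd->reading = all_filters_ready(nd);
}

static void filter_set_ready(Filter *f, NetDev *nd, bool ready)
{
    f->ready = ready;
    netdev_filter_ready_changed(nd);
}
```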


Netdev should know nothing about filters. And there would still be a race
between iterating over all filters and handling a disconnection if we did this.


>   If a filter then
> transitions to not-ready, the netdev should again stop reading packets
> off the wire & queue any that it might have had in flight, until the
> filter becomes ready again.


I agree that we should queue the packets in this case.

Thanks


>
> Without this kind of setup the filters are inherently racy in several
> of the possible -chardev configurations.
>
> In that sense the flaky test has actually done us a favour by showing that
> the code is broken. It is not in fact the test that is broken, and though
> we could work around it in the test, that doesn't fix the root cause.
>
> Regards,
> Daniel

* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-25  3:55                   ` Jason Wang
@ 2019-01-25  7:14                     ` Markus Armbruster
  0 siblings, 0 replies; 26+ messages in thread
From: Markus Armbruster @ 2019-01-25  7:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Xu, Dr. David Alan Gilbert, Peter Maydell, Paolo Bonzini,
	QEMU Developers, Li Zhijian, Zhang Chen

Jason Wang <jasowang@redhat.com> writes:

> On 2019/1/24 5:51 PM, Peter Xu wrote:
>> On Thu, Jan 24, 2019 at 09:11:15AM +0000, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>        * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>>        > intermittently, typically when run on some other TCG architecture.
>>>>>>>>>        > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>>        > perhaps with other guests.
>>>>>>>>>
>>>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>>
>>>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>>>        > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>>        > in the past) ?
>>>>>>>>>
>>>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>>> test case. Adding Jason: any comments on this?
>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me like the
>>>>>>>> test should be independent of any kind of emulation. It should pass
>>>>>>>> as long as the main loop works.
>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>
>>>>>>> thanks
>>>>>>> -- PMM
>>>>>> Finally reproduced locally after hundreds (sometimes thousands) of
>>>>>> runs.
>>>>>>
>>>>>> Bisection points to OOB monitor[1].
>>>>>>
>>>>>> It looks to me like, now that OOB is used unconditionally, we have lost
>>>>>> a barrier that made sure the socket is connected before sending packets
>>>>>> in test-filter-mirror.c. Is there any other similarly simple thing we
>>>>>> could do to kick the main loop?
>>>>> Do you mean the:
>>>>>
>>>>>       /* send a qmp command to guarantee that 'connected' is set to true. */
>>>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>
>>>> Yes.
>>>>
>>>>
>>>>> why was that ever sufficient to know the socket was ready?
>>>>
>>>> It was suggested by Fam; I don't remember the details. Can we make sure all
>>>> pending events have been processed (UNIX socket set to connected) after
>>>> query-status returns with a non-OOB monitor?
>>> I'm not sure - it doesn't sound like a 'query-status' should ensure
>>> anything else.
>>> How about something like a 'query-chardev' - can that tell you what you
>>> need and loop until it's ready?
>> Yeah it sounds hacky to use "query status" to make sure a specific
>> chardev is connected even before the OOB...
>
>
> Probably, but it did work before OOB.

I don't doubt it worked.  Relying on inappropriate assumptions always
works just fine right until the assumptions become invalid :)

[...]
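Dave's suggestion, polling something like query-chardev until the chardev is ready, amounts to a bounded wait loop. Below is a generic sketch. `wait_until()` and the `ready` callback are hypothetical. In a real qtest the callback would issue the QMP query and inspect the reply, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Poll 'ready' every 'interval_ms' milliseconds until it returns true or
 * roughly 'timeout_ms' milliseconds have passed. Returns true if the
 * condition was observed. 'ready' is a hypothetical callback.
 */
static bool wait_until(bool (*ready)(void *opaque), void *opaque,
                       long timeout_ms, long interval_ms)
{
    struct timespec delay;
    long waited = 0;

    assert(ready != NULL && interval_ms > 0);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        if (ready(opaque)) {
            return true;
        }
        if (waited >= timeout_ms) {
            return false;
        }
        nanosleep(&delay, NULL);
        waited += interval_ms;
    }
}

/* Example predicate: reports ready once it has been polled *(int *)opaque
 * times without success. */
static bool ready_after_countdown(void *opaque)
{
    int *remaining = opaque;

    return (*remaining)-- <= 0;
}
```

The timeout matters. An unbounded loop would turn a condition that never becomes true back into the kind of hang this thread started with.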

* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-25  7:12                   ` Markus Armbruster
@ 2019-01-25  8:12                     ` Jason Wang
  2019-01-25  8:44                       ` Markus Armbruster
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2019-01-25  8:12 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Maydell, Li Zhijian, Dr. David Alan Gilbert, Peter Xu,
	QEMU Developers, Zhang Chen, Paolo Bonzini


On 2019/1/25 3:12 PM, Markus Armbruster wrote:
> Jason Wang <jasowang@redhat.com> writes:
>
>> On 2019/1/24 5:47 PM, Markus Armbruster wrote:
>>> Please cc: me on QMP issues.
>>
>> Ok.
>>
>>
>>> Jason Wang <jasowang@redhat.com> writes:
>>>
>>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>         * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>>>         > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>>         > intermittently, typically when run on some other TCG architecture.
>>>>>>>>>         > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>>         > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>>         > perhaps with other guests.
>>>>>>>>>
>>>>>>>>>         Watch out to see if you really do see it for other guests;
>>>>>>>>>         it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>>         uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>>
>>>>>>>>>         > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>>         > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>>         > memory bitmaps like the migration test (which also hangs
>>>>>>>>>         > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>>         > in the past) ?
>>>>>>>>>
>>>>>>>>>         I don't think it relies on the CPU at all.
>>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>>> test case. Adding Jason: any comments on this?
>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me like the
>>>>>>>> test should be independent of any kind of emulation. It should pass
>>>>>>>> as long as the main loop works.
>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>
>>>>>>> thanks
>>>>>>> -- PMM
>>>>>> Finally reproduced locally after hundreds (sometimes thousands) of
>>>>>> runs.
>>>>>>
>>>>>> Bisection points to OOB monitor[1].
>>>>>>
>>>>>> It looks to me like, now that OOB is used unconditionally, we have lost
>>>>>> a barrier that made sure the socket is connected before sending packets
>>>>>> in test-filter-mirror.c. Is there any other similarly simple thing we
>>>>>> could do to kick the main loop?
>>>>> Do you mean the:
>>>>>
>>>>>        /* send a qmp command to guarantee that 'connected' is set to true. */
>>>>>        qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>> Yes.
>>>>
>>>>
>>>>> why was that ever sufficient to know the socket was ready?
>>>> It was suggested by Fam; I don't remember the details. Can we make
>>>> sure all pending events have been processed (UNIX socket set to
>>>> connected) after query-status returns with a non-OOB monitor?
>>> I'm afraid I lack context.  Which socket are you talking about?  The
>>> test has at least the QMP socket, the send_sock[], and recv_sock.  What
>>> exactly are you trying to accomplish?
>>
>> I mean recv_sock. If mirror tries to send a packet to it before its
>> is_connected is set to true, the packet will be dropped.
> So the *socket* is connected (in the TCP sense),


Actually, it's a UNIX domain socket in the case of this test.


> but something else
> (whatever owns is_connected) is not.  Can you point me to where
> is_connected is set to true?


Sorry, it should be "connected". It is set in tcp_chr_connect(). So if a
filter wants to send a packet to the socket chardev before tcp_chr_connect()
is called, the packet will be silently dropped by tcp_chr_write(). This
makes the unit test fail.


>
>>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>>> looks rather fishy.  Why create a temporary file only to create a Unix
>>> domain socket right over it?
>>
>> I vaguely remember that passing an fd created for a Unix domain socket
>> didn't work when the test was introduced. So my understanding is that the
>> author needed a way to create a unique file name to be used by the Unix
>> domain socket at that time.
> We should really, really, really improve the test harness to run each
> test program in its very own temporary directory.  Then tests can simply
> create files with fixed names, and leave cleanup to the test harness.


Agreed, but for this test, since passing an fd works now, I'd prefer to
use socketpair().
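Here is a sketch of the socketpair() approach. It assumes the test hands one end to the child process as a pre-opened fd. How that fd is named on the QEMU command line is left out, and `make_pair_for_child()` is a hypothetical helper. Both ends are connected as soon as socketpair() returns, and there is no filesystem path to clean up.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected UNIX stream socket pair. '*child_end' is meant to be
 * inherited by the process under test; '*test_end' stays with the test and
 * is marked close-on-exec so it does not leak into the child.
 * Returns 0 on success, -1 on error.
 */
static int make_pair_for_child(int *test_end, int *child_end)
{
    int sv[2];

    assert(test_end != NULL && child_end != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (fcntl(sv[0], F_SETFD, FD_CLOEXEC) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    *test_end = sv[0];
    *child_end = sv[1];
    return 0;
}
```

Marking the test's end close-on-exec keeps the child from holding a copy of it. Otherwise the child would not see end-of-file when the test closes its end. For the same reason, the test would normally close its own copy of the child's end once the child has started.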


>>>    Why is ignoring errors a good idea?
>>
>> I don't get it: which error is missed? It checks the return value of both
>> mkstemp() and unix_connect().
> Now I neglected to provide enough context for you :)
>
> I read
>
>      recv_sock = unix_connect(sock_path, NULL);
>
> and immediately went "why are errors ignored".  If I had read on (as I
> should've), I would've seen they are not:
>
>      g_assert_cmpint(recv_sock, !=, -1);
>
> Sorry for the noise.
>
> I'd replace both lines by
>
>      recv_sock = unix_connect(sock_path, &error_abort);
>
> Reports the actual error, which is an obvious improvement, with the
> location pointing to the failing spot within unix_connect().  To find
> where unix_connect() was called, you need to examine the stack
> backtrace.  Strictly more information, but your actual mileage may vary.
>

I see.

Thanks.

* Re: [Qemu-devel] test-filter-mirror hangs
  2019-01-25  8:12                     ` Jason Wang
@ 2019-01-25  8:44                       ` Markus Armbruster
  0 siblings, 0 replies; 26+ messages in thread
From: Markus Armbruster @ 2019-01-25  8:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Maydell, Li Zhijian, Dr. David Alan Gilbert, Peter Xu,
	QEMU Developers, Zhang Chen, Paolo Bonzini

Jason Wang <jasowang@redhat.com> writes:

> On 2019/1/25 3:12 PM, Markus Armbruster wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>>
>>> On 2019/1/24 5:47 PM, Markus Armbruster wrote:
>>>> Please cc: me on QMP issues.
>>>
>>> Ok.
>>>
>>>
>>>> Jason Wang <jasowang@redhat.com> writes:
>>>>
>>>>> On 2019/1/24 3:53 AM, Dr. David Alan Gilbert wrote:
>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>> On 2019/1/22 2:56 AM, Peter Maydell wrote:
>>>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<jasowang@redhat.com>  wrote:
>>>>>>>>> On 2019/1/15 12:33 AM, Zhang Chen wrote:
>>>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>>>> <dgilbert@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>         * Peter Maydell (peter.maydell@linaro.org) wrote:
>>>>>>>>>>         > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>>>         > intermittently, typically when run on some other TCG architecture.
>>>>>>>>>>         > In the instance I've just looked at, this was with s390x guest on
>>>>>>>>>>         > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>>>         > perhaps with other guests.
>>>>>>>>>>
>>>>>>>>>>         Watch out to see if you really do see it for other guests;
>>>>>>>>>>         it carefully avoids using virtio-net to avoid vhost; but on s390x it
>>>>>>>>>>         uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
>>>>>>>>>>
>>>>>>>>>>         > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>>>         > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>>>         > memory bitmaps like the migration test (which also hangs
>>>>>>>>>>         > occasionally with TCG due to some bug I'm sure we've investigated
>>>>>>>>>>         > in the past) ?
>>>>>>>>>>
>>>>>>>>>>         I don't think it relies on the CPU at all.
>>>>>>>>>> I have no idea about this currently, but Jason and I designed the
>>>>>>>>>> test case. Adding Jason: any comments on this?
>>>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me like the
>>>>>>>>> test should be independent of any kind of emulation. It should pass
>>>>>>>>> as long as the main loop works.
>>>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> -- PMM
>>>>>>> Finally reproduced locally after hundreds (sometimes thousands) of
>>>>>>> runs.
>>>>>>>
>>>>>>> Bisection points to OOB monitor[1].
>>>>>>>
>>>>>>> It looks to me like, now that OOB is used unconditionally, we have lost
>>>>>>> a barrier that made sure the socket is connected before sending packets
>>>>>>> in test-filter-mirror.c. Is there any other similarly simple thing we
>>>>>>> could do to kick the main loop?
>>>>>> Do you mean the:
>>>>>>
>>>>>>        /* send a qmp command to guarantee that 'connected' is set to true. */
>>>>>>        qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>> Yes.
>>>>>
>>>>>
>>>>>> why was that ever sufficient to know the socket was ready?
>>>>> It was suggested by Fam; I don't remember the details. Can we make
>>>>> sure all pending events have been processed (UNIX socket set to
>>>>> connected) after query-status returns with a non-OOB monitor?
>>>> I'm afraid I lack context.  Which socket are you talking about?  The
>>>> test has at least the QMP socket, the send_sock[], and recv_sock.  What
>>>> exactly are you trying to accomplish?
>>>
>>> I mean recv_sock. If mirror tries to send a packet to it before its
>>> is_connected is set to true, the packet will be dropped.
>> So the *socket* is connected (in the TCP sense),
>
>
> Actually, it's a UNIX domain socket in the case of this test.

Yes.

>> but something else
>> (whatever owns is_connected) is not.  Can you point me to where
>> is_connected is set to true?
>
>
> Sorry, it should be "connected". It is set in tcp_chr_connect(). So if a
> filter wants to send a packet to the socket chardev before tcp_chr_connect()
> is called, the packet will be silently dropped by tcp_chr_write(). This
> makes the unit test fail.

Aha: the thing that isn't connected is the character device.

>>>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>>>> looks rather fishy.  Why create a temporary file only to create a Unix
>>>> domain socket right over it?
>>>
>>> I vaguely remember that passing an fd created for a Unix domain socket
>>> didn't work when the test was introduced. So my understanding is that the
>>> author needed a way to create a unique file name to be used by the Unix
>>> domain socket at that time.
>> We should really, really, really improve the test harness to run each
>> test program in its very own temporary directory.  Then tests can simply
>> create files with fixed names, and leave cleanup to the test harness.
>
>
> Agreed, but for this test, since passing an fd works now, I'd prefer to
> use socketpair().

Resources that don't require manual cleanup (such as file descriptors
obtained with socketpair() or pipe()) are the best choice when they
work.

[...]

end of thread, other threads:[~2019-01-25  8:47 UTC | newest]

Thread overview: 26+ messages
2019-01-11 15:01 [Qemu-devel] test-filter-mirror hangs Peter Maydell
2019-01-11 16:15 ` Dr. David Alan Gilbert
2019-01-14 16:33   ` Zhang Chen
2019-01-17  9:46     ` Jason Wang
2019-01-21 18:56       ` Peter Maydell
2019-01-21 20:01         ` Dr. David Alan Gilbert
2019-01-22  9:06           ` Peter Maydell
2019-01-23  2:43         ` Jason Wang
2019-01-23 19:53           ` Dr. David Alan Gilbert
2019-01-24  4:01             ` Jason Wang
2019-01-24  9:11               ` Dr. David Alan Gilbert
2019-01-24  9:51                 ` Peter Xu
2019-01-25  3:55                   ` Jason Wang
2019-01-25  7:14                     ` Markus Armbruster
2019-01-25  3:45                 ` Jason Wang
2019-01-24  9:47               ` Markus Armbruster
2019-01-25  3:56                 ` Jason Wang
2019-01-25  7:12                   ` Markus Armbruster
2019-01-25  8:12                     ` Jason Wang
2019-01-25  8:44                       ` Markus Armbruster
2019-01-24 10:11             ` Daniel P. Berrangé
2019-01-24 10:30               ` Daniel P. Berrangé
2019-01-24 11:01                 ` Daniel P. Berrangé
2019-01-25  7:12                   ` Jason Wang
2019-01-25  7:00                 ` Jason Wang
2019-01-15 10:28   ` Peter Maydell