qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* migration-test hang, s390x host, aarch64 guest
@ 2021-07-29 19:37 Peter Maydell
  2021-08-02  8:55 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Maydell @ 2021-07-29 19:37 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Richard Henderson, Dr. David Alan Gilbert, Juan Quintela

migration-test hung again during 'make check'.

Process tree:

ubuntu   42067  0.0  0.0   5460  3156 ?        S    13:55   0:00
           \_ bash -o pipefail -c echo
'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
QTEST_QEMU_IMG=./qemu-img
G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
QTEST_QEMU_BINARY=./qemu-system-aarch64
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
tests/qtest/migration-test --tap -k' &&
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
QTEST_QEMU_IMG=./qemu-img
G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
QTEST_QEMU_BINARY=./qemu-system-aarch64
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
tests/qtest/migration-test --tap -k -m quick < /dev/null |
./scripts/tap-driver.pl --test-name="qtest-aarch64/migration-test"
ubuntu   42070  0.0  0.0  78844  3184 ?        Sl   13:55   0:01
               \_ tests/qtest/migration-test --tap -k -m quick
ubuntu   43248  0.0  4.1 1401368 160852 ?      Sl   13:55   0:03
               |   \_ ./qemu-system-aarch64 -qtest
unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg
-machine virt,gic-version=max -name source,debug-threads=on -m 150M
-serial file:/tmp/migration-test-SL7EkN/src_serial -cpu max -kernel
/tmp/migration-test-SL7EkN/bootsect -accel qtest
ubuntu   43256  0.0  1.4 1394208 57648 ?       Sl   13:55   0:00
               |   \_ ./qemu-system-aarch64 -qtest
unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg
-machine virt,gic-version=max -name target,debug-threads=on -m 150M
-serial file:/tmp/migration-test-SL7EkN/dest_serial -incoming
unix:/tmp/migration-test-SL7EkN/migsocket -cpu max -kernel
/tmp/migration-test-SL7EkN/bootsect -accel qtest
ubuntu   42071  0.0  0.2  14116 11372 ?        S    13:55   0:00
               \_ perl ./scripts/tap-driver.pl
--test-name=qtest-aarch64/migration-test

Backtrace, migration-test process:

Thread 2 (Thread 0x3ff8ff7f910 (LWP 42075)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
#1  0x000002aa1d0d1c1c in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x2aa1d0fda58 <rcu_call_ready_event>)
at ../../util/qemu-thread-posix.c:480
#3  0x000002aa1d0d0884 in call_rcu_thread (opaque=opaque@entry=0x0) at
../../util/rcu.c:258
#4  0x000002aa1d0d0c32 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#5  0x000003ff90207aa8 in start_thread (arg=0x3ff8ff7f910) at
pthread_create.c:463
#6  0x000003ff900fa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff905f7c00 (LWP 42070)):
#0  0x000003ff90212978 in __waitpid (pid=<optimized out>,
stat_loc=stat_loc@entry=0x2aa1d5e06bc, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x000002aa1d0a2176 in qtest_kill_qemu (s=0x2aa1d5e06b0) at
../../tests/qtest/libqtest.c:144
#2  0x000003ff903c0de6 in g_hook_list_invoke () from
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x000003ff9003f3ca in __GI_abort () at abort.c:79
#6  0x000003ff903fcbb2 in g_assertion_message () from
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#7  0x000003ff903fd2b4 in g_assertion_message_cmpstr () from
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#8  0x000002aa1d0a10f6 in check_migration_status
(ungoals=0x3ffffdfe198, goal=0x2aa1d0d75c0 "postcopy-paused",
who=0x2aa1d5dee90)
    at ../../tests/qtest/migration-helpers.c:146
#9  wait_for_migration_status (who=0x2aa1d5dee90, goal=<optimized
out>, ungoals=0x3ffffdfe198)
    at ../../tests/qtest/migration-helpers.c:156
#10 0x000002aa1d09f610 in test_postcopy_recovery () at
../../tests/qtest/migration-test.c:773
#11 0x000003ff903fc70c in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#12 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x000003ff903fc8ee in g_test_run_suite () from
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#16 0x000003ff903fc928 in g_test_run () from
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#17 0x000002aa1d09d8f2 in main (argc=<optimized out>, argv=<optimized
out>) at ../../tests/qtest/migration-test.c:1495


Backtrace, QEMU process 43248:

Thread 5 (Thread 0x3ff8d7fe910 (LWP 43361)):
#0  0x000003ff9d110582 in futex_abstimed_wait_cancelable (private=0,
abstime=0x0, expected=0, futex_word=0x2aa1e27dfac)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  0x000003ff9d110582 in do_futex_wait (sem=sem@entry=0x2aa1e27dfa8,
abstime=0x0) at sem_waitcommon.c:111
#2  0x000003ff9d11068c in __new_sem_wait_slow
(sem=sem@entry=0x2aa1e27dfa8, abstime=0x0) at sem_waitcommon.c:181
#3  0x000003ff9d110734 in __new_sem_wait (sem=sem@entry=0x2aa1e27dfa8)
at sem_wait.c:42
#4  0x000002aa1b6dfb96 in qemu_sem_wait (sem=0x2aa1e27dfa8,
sem@entry=<error reading variable: value has been optimized out>)
    at ../../util/qemu-thread-posix.c:357
#5  0x000002aa1aefe5b0 in postcopy_pause_return_path_thread
(s=0x2aa1e27dc00) at ../../migration/migration.c:2654
#6  0x000002aa1aefe5b0 in source_return_path_thread
(opaque=opaque@entry=0x2aa1e27dc00) at
../../migration/migration.c:2854
#7  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#8  0x000003ff9d107aa8 in start_thread (arg=0x3ff8d7fe910) at
pthread_create.c:463
#9  0x000003ff9cffa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 4 (Thread 0x3ff8ed42910 (LWP 43252)):
#0  0x000003ff9d10d9fe in futex_wait_cancelable (private=<optimized
out>, expected=0, futex_word=0x2aa1e49ca68)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x000003ff9d10d9fe in __pthread_cond_wait_common (abstime=0x0,
mutex=0x2aa1c1a4840 <qemu_global_mutex>, cond=0x2aa1e49ca40)
    at pthread_cond_wait.c:502
#2  0x000003ff9d10d9fe in __pthread_cond_wait
(cond=cond@entry=0x2aa1e49ca40, mutex=mutex@entry=0x2aa1c1a4840
<qemu_global_mutex>)
    at pthread_cond_wait.c:655
#3  0x000002aa1b6df59e in qemu_cond_wait_impl (cond=0x2aa1e49ca40,
mutex=0x2aa1c1a4840 <qemu_global_mutex>, file=0x2aa1b8a93c6
"../../softmmu/cpus.c", line=<optimized out>) at
../../util/qemu-thread-posix.c:194
#4  0x000002aa1b40764e in qemu_wait_io_event
(cpu=cpu@entry=0x2aa1e43aa50) at ../../softmmu/cpus.c:419
#5  0x000002aa1b406916 in mttcg_cpu_thread_fn
(arg=arg@entry=0x2aa1e43aa50) at
../../accel/tcg/tcg-accel-ops-mttcg.c:98
#6  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#7  0x000003ff9d107aa8 in start_thread (arg=0x3ff8ed42910) at
pthread_create.c:463
#8  0x000003ff9cffa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 3 (Thread 0x3ff8f735910 (LWP 43251)):
#0  0x000003ff9cfef1b0 in __GI___poll (fds=0x3ff64003620, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x000003ff9e852624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#2  0x000003ff9e852aa8 in g_main_loop_run () at
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  0x000002aa1b4eaf0e in iothread_run
(opaque=opaque@entry=0x2aa1df83b80) at ../../iothread.c:73
#4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#5  0x000003ff9d107aa8 in start_thread (arg=0x3ff8f735910) at
pthread_create.c:463
#6  0x000003ff9cffa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 2 (Thread 0x3ff90037910 (LWP 43250)):
#0  0x000003ff9cff52ea in syscall () at
../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
#1  0x000002aa1b6dfd5c in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  0x000002aa1b6dfd5c in qemu_event_wait (ev=ev@entry=0x2aa1c1c3888
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
#3  0x000002aa1b6d917c in call_rcu_thread (opaque=opaque@entry=0x0) at
../../util/rcu.c:258
#4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#5  0x000003ff9d107aa8 in start_thread (arg=0x3ff90037910) at
pthread_create.c:463
#6  0x000003ff9cffa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff9feb5470 (LWP 43248)):
#0  0x000003ff9cfef302 in __GI_ppoll (fds=0x2aa1ed28d00, nfds=6,
timeout=<optimized out>,
    timeout@entry=0x3fff517ecf8, sigmask=sigmask@entry=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x000002aa1b701fec in ppoll (__ss=0x0, __timeout=0x3fff517ecf8,
__nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/s390x-linux-gnu/bits/poll2.h:77
#2  0x000002aa1b701fec in qemu_poll_ns (fds=<optimized out>,
nfds=<optimized out>, timeout=timeout@entry=1000000000)
    at ../../util/qemu-timer.c:348
#3  0x000002aa1b6d2f98 in os_host_main_loop_wait (timeout=1000000000)
at ../../util/main-loop.c:250
#4  0x000002aa1b6d2f98 in main_loop_wait
(nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
#5  0x000002aa1b496d44 in qemu_main_loop () at ../../softmmu/runstate.c:726
#6  0x000002aa1ae6d3f0 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at ../../softmmu/main.c:50

Backtrace, QEMU process 43256:

Thread 6 (Thread 0x3ff726ff910 (LWP 43440)):
#0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
abstime=0x0, expected=0, futex_word=0x2aa1a75d2d4)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2d0,
abstime=0x0) at sem_waitcommon.c:111
#2  0x000003ffb9f1068c in __new_sem_wait_slow
(sem=sem@entry=0x2aa1a75d2d0, abstime=0x0) at sem_waitcommon.c:181
#3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
at sem_wait.c:42
#4  0x000002aa17ddfb96 in qemu_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
at ../../util/qemu-thread-posix.c:357
#5  0x000002aa17599f06 in postcopy_pause_incoming (mis=<optimized
out>) at ../../migration/savevm.c:2600
#6  0x000002aa17599f06 in qemu_loadvm_state_main
(f=f@entry=0x2aa1a751610, mis=mis@entry=0x2aa1a75d130)
    at ../../migration/savevm.c:2674
#7  0x000002aa1759af4a in postcopy_ram_listen_thread
(opaque=opaque@entry=0x0) at ../../migration/savevm.c:1872
#8  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#9  0x000003ffb9f07aa8 in start_thread (arg=0x3ff726ff910) at
pthread_create.c:463
#10 0x000003ffb9dfa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 5 (Thread 0x3ff88ffe910 (LWP 43439)):
#0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
abstime=0x0, expected=0, futex_word=0x2aa1a75d2fc)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2f8,
abstime=0x0) at sem_waitcommon.c:111
#2  0x000003ffb9f1068c in __new_sem_wait_slow
(sem=sem@entry=0x2aa1a75d2f8, abstime=0x0) at sem_waitcommon.c:181
#3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2f8)
at sem_wait.c:42
#4  0x000002aa17ddfb96 in qemu_sem_wait (sem=0x2aa1a75d2f8,
sem@entry=<error reading variable: value has been optimized out>)
    at ../../util/qemu-thread-posix.c:357
#5  0x000002aa177ae6aa in postcopy_pause_fault_thread (mis=<optimized
out>) at ../../migration/postcopy-ram.c:847
#6  0x000002aa177ae6aa in postcopy_ram_fault_thread
(opaque=opaque@entry=0x2aa1a75d130) at
../../migration/postcopy-ram.c:911
#7  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#8  0x000003ffb9f07aa8 in start_thread (arg=0x3ff88ffe910) at
pthread_create.c:463
#9  0x000003ffb9dfa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 4 (Thread 0x3ff89b18910 (LWP 43271)):
Python Exception <class 'gdb.error'> PC not saved:

Thread 3 (Thread 0x3ff8a319910 (LWP 43270)):
#0  0x000003ffb9def1b0 in __GI___poll (fds=0x3ff84003620, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x000003ffbb652624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#2  0x000003ffbb652aa8 in g_main_loop_run () at
/usr/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  0x000002aa17beaf0e in iothread_run
(opaque=opaque@entry=0x2aa1a3aeb80) at ../../iothread.c:73
#4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#5  0x000003ffb9f07aa8 in start_thread (arg=0x3ff8a319910) at
pthread_create.c:463
#6  0x000003ffb9dfa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 2 (Thread 0x3fface37910 (LWP 43267)):
#0  0x000003ffb9df52ea in syscall () at
../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
#1  0x000002aa17ddfd5c in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  0x000002aa17ddfd5c in qemu_event_wait (ev=ev@entry=0x2aa188c3888
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
#3  0x000002aa17dd917c in call_rcu_thread (opaque=opaque@entry=0x0) at
../../util/rcu.c:258
#4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
../../util/qemu-thread-posix.c:541
#5  0x000003ffb9f07aa8 in start_thread (arg=0x3fface37910) at
pthread_create.c:463
#6  0x000003ffb9dfa11e in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ffbccb5470 (LWP 43256)):
#0  0x000003ffb9f0d9fe in futex_wait_cancelable (private=<optimized
out>, expected=0, futex_word=0x2aa188a47e8 <qemu_pause_cond+40>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x000003ffb9f0d9fe in __pthread_cond_wait_common (abstime=0x0,
mutex=0x2aa188a4840 <qemu_global_mutex>, cond=0x2aa188a47c0
<qemu_pause_cond>) at pthread_cond_wait.c:502
#2  0x000003ffb9f0d9fe in __pthread_cond_wait
(cond=cond@entry=0x2aa188a47c0 <qemu_pause_cond>,
mutex=mutex@entry=0x2aa188a4840 <qemu_global_mutex>) at
pthread_cond_wait.c:655
#3  0x000002aa17ddf59e in qemu_cond_wait_impl (cond=0x2aa188a47c0
<qemu_pause_cond>, mutex=0x2aa188a4840 <qemu_global_mutex>,
file=0x2aa17fa93c6 "../../softmmu/cpus.c", line=<optimized out>) at
../../util/qemu-thread-posix.c:194
#4  0x000002aa17b079e2 in pause_all_vcpus () at ../../softmmu/cpus.c:562
#5  0x000002aa17b07aec in do_vm_stop
(state=state@entry=RUN_STATE_SHUTDOWN,
send_stop=send_stop@entry=false)
    at ../../softmmu/cpus.c:261
#6  0x000002aa17b07bb6 in vm_shutdown () at ../../softmmu/cpus.c:280
#7  0x000002aa17b9716e in qemu_cleanup () at ../../softmmu/runstate.c:812
#8  0x000002aa1756d3f6 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at ../../softmmu/main.c:51

No, I dunno why thread 4 was odd. Backtracing that thread seems to
only work about one time in two. Here's the backtrace from a
success:
#0  0x000002aa17b1ca2e in load_memop (op=MO_8, haddr=0x3ff80c6d000) at
../../accel/tcg/cputlb.c:1860
#1  0x000002aa17b1ca2e in load_helper (full_load=<optimized out>,
code_read=false, op=MO_8, retaddr=4396070052416, oi=<optimized out>,
addr=1177997312, env=0x2aa1a86f890) at ../../accel/tcg/cputlb.c:1980
#2  0x000002aa17b1ca2e in full_ldub_mmu (env=0x2aa1a86f890,
addr=1177997312, oi=<optimized out>, retaddr=4396070052416)
    at ../../accel/tcg/cputlb.c:1996
#3  0x000003ff8a31a35e in code_gen_buffer ()
#4  0x000002aa17bbabb4 in cpu_tb_exec (tb_exit=<synthetic pointer>,
itb=<optimized out>, cpu=<optimized out>)
    at ../../accel/tcg/cpu-exec.c:353
#5  0x000002aa17bbabb4 in cpu_loop_exec_tb (tb_exit=<synthetic
pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
cpu=<optimized out>) at ../../accel/tcg/cpu-exec.c:812
#6  0x000002aa17bbabb4 in cpu_exec (cpu=<optimized out>) at
../../accel/tcg/cpu-exec.c:970
#7  0x000002aa18764620 in  ()

-- PMM


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: migration-test hang, s390x host, aarch64 guest
  2021-07-29 19:37 migration-test hang, s390x host, aarch64 guest Peter Maydell
@ 2021-08-02  8:55 ` Dr. David Alan Gilbert
  2021-08-03  9:11   ` Dr. David Alan Gilbert
  2021-08-03 20:53   ` Peter Xu
  0 siblings, 2 replies; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-02  8:55 UTC (permalink / raw)
  To: Peter Maydell, peterx; +Cc: Richard Henderson, QEMU Developers, Juan Quintela

* Peter Maydell (peter.maydell@linaro.org) wrote:
> migration-test hung again during 'make check'.

ccing in Peter Xu

> Process tree:
> 
> ubuntu   42067  0.0  0.0   5460  3156 ?        S    13:55   0:00
>            \_ bash -o pipefail -c echo
> 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-aarch64
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/migration-test --tap -k' &&
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-aarch64
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/migration-test --tap -k -m quick < /dev/null |
> ./scripts/tap-driver.pl --test-name="qtest-aarch64/migration-test"
> ubuntu   42070  0.0  0.0  78844  3184 ?        Sl   13:55   0:01
>                \_ tests/qtest/migration-test --tap -k -m quick
> ubuntu   43248  0.0  4.1 1401368 160852 ?      Sl   13:55   0:03
>                |   \_ ./qemu-system-aarch64 -qtest
> unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg
> -machine virt,gic-version=max -name source,debug-threads=on -m 150M
> -serial file:/tmp/migration-test-SL7EkN/src_serial -cpu max -kernel
> /tmp/migration-test-SL7EkN/bootsect -accel qtest
> ubuntu   43256  0.0  1.4 1394208 57648 ?       Sl   13:55   0:00
>                |   \_ ./qemu-system-aarch64 -qtest
> unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg
> -machine virt,gic-version=max -name target,debug-threads=on -m 150M
> -serial file:/tmp/migration-test-SL7EkN/dest_serial -incoming
> unix:/tmp/migration-test-SL7EkN/migsocket -cpu max -kernel
> /tmp/migration-test-SL7EkN/bootsect -accel qtest
> ubuntu   42071  0.0  0.2  14116 11372 ?        S    13:55   0:00
>                \_ perl ./scripts/tap-driver.pl
> --test-name=qtest-aarch64/migration-test
> 
> Backtrace, migration-test process:
> 
> Thread 2 (Thread 0x3ff8ff7f910 (LWP 42075)):
> #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa1d0d1c1c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  qemu_event_wait (ev=ev@entry=0x2aa1d0fda58 <rcu_call_ready_event>)
> at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa1d0d0884 in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa1d0d0c32 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff90207aa8 in start_thread (arg=0x3ff8ff7f910) at
> pthread_create.c:463
> #6  0x000003ff900fa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff905f7c00 (LWP 42070)):
> #0  0x000003ff90212978 in __waitpid (pid=<optimized out>,
> stat_loc=stat_loc@entry=0x2aa1d5e06bc, options=options@entry=0)
>     at ../sysdeps/unix/sysv/linux/waitpid.c:30
> #1  0x000002aa1d0a2176 in qtest_kill_qemu (s=0x2aa1d5e06b0) at
> ../../tests/qtest/libqtest.c:144
> #2  0x000003ff903c0de6 in g_hook_list_invoke () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  <signal handler called>
> #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #5  0x000003ff9003f3ca in __GI_abort () at abort.c:79
> #6  0x000003ff903fcbb2 in g_assertion_message () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #7  0x000003ff903fd2b4 in g_assertion_message_cmpstr () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #8  0x000002aa1d0a10f6 in check_migration_status
> (ungoals=0x3ffffdfe198, goal=0x2aa1d0d75c0 "postcopy-paused",
> who=0x2aa1d5dee90)

            g_assert_cmpstr(current_status, !=,  *ungoal);

>     at ../../tests/qtest/migration-helpers.c:146
> #9  wait_for_migration_status (who=0x2aa1d5dee90, goal=<optimized
> out>, ungoals=0x3ffffdfe198)
>     at ../../tests/qtest/migration-helpers.c:156
> #10 0x000002aa1d09f610 in test_postcopy_recovery () at
> ../../tests/qtest/migration-test.c:773

    wait_for_migration_status(from, "postcopy-paused",
                              (const char * []) { "failed", "active",
                                                  "completed", NULL });

so I think that means it hit one of failed/active/completed rather than
the postcopy-paused it expected (shame we can't tell which)

> #11 0x000003ff903fc70c in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #12 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #13 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #14 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #15 0x000003ff903fc8ee in g_test_run_suite () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #16 0x000003ff903fc928 in g_test_run () from
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #17 0x000002aa1d09d8f2 in main (argc=<optimized out>, argv=<optimized
> out>) at ../../tests/qtest/migration-test.c:1495
> 
> 
> Backtrace, QEMU process 43248:
> 
> Thread 5 (Thread 0x3ff8d7fe910 (LWP 43361)):
> #0  0x000003ff9d110582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1e27dfac)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ff9d110582 in do_futex_wait (sem=sem@entry=0x2aa1e27dfa8,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ff9d11068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1e27dfa8, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ff9d110734 in __new_sem_wait (sem=sem@entry=0x2aa1e27dfa8)
> at sem_wait.c:42
> #4  0x000002aa1b6dfb96 in qemu_sem_wait (sem=0x2aa1e27dfa8,
> sem@entry=<error reading variable: value has been optimized out>)
>     at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa1aefe5b0 in postcopy_pause_return_path_thread
> (s=0x2aa1e27dc00) at ../../migration/migration.c:2654
> #6  0x000002aa1aefe5b0 in source_return_path_thread
> (opaque=opaque@entry=0x2aa1e27dc00) at
> ../../migration/migration.c:2854
> #7  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #8  0x000003ff9d107aa8 in start_thread (arg=0x3ff8d7fe910) at
> pthread_create.c:463
> #9  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

OK, source waiting for recovery - my reading is we don't send the
recovery until the next command in the test.

> 
> Thread 4 (Thread 0x3ff8ed42910 (LWP 43252)):
> #0  0x000003ff9d10d9fe in futex_wait_cancelable (private=<optimized
> out>, expected=0, futex_word=0x2aa1e49ca68)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  0x000003ff9d10d9fe in __pthread_cond_wait_common (abstime=0x0,
> mutex=0x2aa1c1a4840 <qemu_global_mutex>, cond=0x2aa1e49ca40)
>     at pthread_cond_wait.c:502
> #2  0x000003ff9d10d9fe in __pthread_cond_wait
> (cond=cond@entry=0x2aa1e49ca40, mutex=mutex@entry=0x2aa1c1a4840
> <qemu_global_mutex>)
>     at pthread_cond_wait.c:655
> #3  0x000002aa1b6df59e in qemu_cond_wait_impl (cond=0x2aa1e49ca40,
> mutex=0x2aa1c1a4840 <qemu_global_mutex>, file=0x2aa1b8a93c6
> "../../softmmu/cpus.c", line=<optimized out>) at
> ../../util/qemu-thread-posix.c:194
> #4  0x000002aa1b40764e in qemu_wait_io_event
> (cpu=cpu@entry=0x2aa1e43aa50) at ../../softmmu/cpus.c:419
> #5  0x000002aa1b406916 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x2aa1e43aa50) at
> ../../accel/tcg/tcg-accel-ops-mttcg.c:98
> #6  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #7  0x000003ff9d107aa8 in start_thread (arg=0x3ff8ed42910) at
> pthread_create.c:463
> #8  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 3 (Thread 0x3ff8f735910 (LWP 43251)):
> #0  0x000003ff9cfef1b0 in __GI___poll (fds=0x3ff64003620, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ff9e852624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ff9e852aa8 in g_main_loop_run () at
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa1b4eaf0e in iothread_run
> (opaque=opaque@entry=0x2aa1df83b80) at ../../iothread.c:73
> #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff8f735910) at
> pthread_create.c:463
> #6  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3ff90037910 (LWP 43250)):
> #0  0x000003ff9cff52ea in syscall () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa1b6dfd5c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  0x000002aa1b6dfd5c in qemu_event_wait (ev=ev@entry=0x2aa1c1c3888
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa1b6d917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff90037910) at
> pthread_create.c:463
> #6  0x000003ff9cffa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff9feb5470 (LWP 43248)):
> #0  0x000003ff9cfef302 in __GI_ppoll (fds=0x2aa1ed28d00, nfds=6,
> timeout=<optimized out>,
>     timeout@entry=0x3fff517ecf8, sigmask=sigmask@entry=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1  0x000002aa1b701fec in ppoll (__ss=0x0, __timeout=0x3fff517ecf8,
> __nfds=<optimized out>, __fds=<optimized out>)
>     at /usr/include/s390x-linux-gnu/bits/poll2.h:77
> #2  0x000002aa1b701fec in qemu_poll_ns (fds=<optimized out>,
> nfds=<optimized out>, timeout=timeout@entry=1000000000)
>     at ../../util/qemu-timer.c:348
> #3  0x000002aa1b6d2f98 in os_host_main_loop_wait (timeout=1000000000)
> at ../../util/main-loop.c:250
> #4  0x000002aa1b6d2f98 in main_loop_wait
> (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
> #5  0x000002aa1b496d44 in qemu_main_loop () at ../../softmmu/runstate.c:726
> #6  0x000002aa1ae6d3f0 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../../softmmu/main.c:50
> 
> Backtrace, QEMU process 43256:
> 
> Thread 6 (Thread 0x3ff726ff910 (LWP 43440)):
> #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1a75d2d4)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2d0,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ffb9f1068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1a75d2d0, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> at sem_wait.c:42
> #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa17599f06 in postcopy_pause_incoming (mis=<optimized
> out>) at ../../migration/savevm.c:2600

So the dest is still waiting to be woken up from pause? Hmm I thought
the test had already given it a kick.

> #6  0x000002aa17599f06 in qemu_loadvm_state_main
> (f=f@entry=0x2aa1a751610, mis=mis@entry=0x2aa1a75d130)
>     at ../../migration/savevm.c:2674
> #7  0x000002aa1759af4a in postcopy_ram_listen_thread
> (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1872
> #8  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #9  0x000003ffb9f07aa8 in start_thread (arg=0x3ff726ff910) at
> pthread_create.c:463
> #10 0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 5 (Thread 0x3ff88ffe910 (LWP 43439)):
> #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, expected=0, futex_word=0x2aa1a75d2fc)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2f8,
> abstime=0x0) at sem_waitcommon.c:111
> #2  0x000003ffb9f1068c in __new_sem_wait_slow
> (sem=sem@entry=0x2aa1a75d2f8, abstime=0x0) at sem_waitcommon.c:181
> #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2f8)
> at sem_wait.c:42
> #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=0x2aa1a75d2f8,
> sem@entry=<error reading variable: value has been optimized out>)
>     at ../../util/qemu-thread-posix.c:357
> #5  0x000002aa177ae6aa in postcopy_pause_fault_thread (mis=<optimized
> out>) at ../../migration/postcopy-ram.c:847
> #6  0x000002aa177ae6aa in postcopy_ram_fault_thread
> (opaque=opaque@entry=0x2aa1a75d130) at
> ../../migration/postcopy-ram.c:911
> #7  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #8  0x000003ffb9f07aa8 in start_thread (arg=0x3ff88ffe910) at
> pthread_create.c:463
> #9  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 4 (Thread 0x3ff89b18910 (LWP 43271)):
> Python Exception <class 'gdb.error'> PC not saved:
> 
> Thread 3 (Thread 0x3ff8a319910 (LWP 43270)):
> #0  0x000003ffb9def1b0 in __GI___poll (fds=0x3ff84003620, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ffbb652624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ffbb652aa8 in g_main_loop_run () at
> /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa17beaf0e in iothread_run
> (opaque=opaque@entry=0x2aa1a3aeb80) at ../../iothread.c:73
> #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ffb9f07aa8 in start_thread (arg=0x3ff8a319910) at
> pthread_create.c:463
> #6  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3fface37910 (LWP 43267)):
> #0  0x000003ffb9df52ea in syscall () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> #1  0x000002aa17ddfd5c in qemu_futex_wait (val=<optimized out>,
> f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> #2  0x000002aa17ddfd5c in qemu_event_wait (ev=ev@entry=0x2aa188c3888
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> #3  0x000002aa17dd917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../../util/rcu.c:258
> #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> ../../util/qemu-thread-posix.c:541
> #5  0x000003ffb9f07aa8 in start_thread (arg=0x3fface37910) at
> pthread_create.c:463
> #6  0x000003ffb9dfa11e in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ffbccb5470 (LWP 43256)):
> #0  0x000003ffb9f0d9fe in futex_wait_cancelable (private=<optimized
> out>, expected=0, futex_word=0x2aa188a47e8 <qemu_pause_cond+40>)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  0x000003ffb9f0d9fe in __pthread_cond_wait_common (abstime=0x0,
> mutex=0x2aa188a4840 <qemu_global_mutex>, cond=0x2aa188a47c0
> <qemu_pause_cond>) at pthread_cond_wait.c:502
> #2  0x000003ffb9f0d9fe in __pthread_cond_wait
> (cond=cond@entry=0x2aa188a47c0 <qemu_pause_cond>,
> mutex=mutex@entry=0x2aa188a4840 <qemu_global_mutex>) at
> pthread_cond_wait.c:655
> #3  0x000002aa17ddf59e in qemu_cond_wait_impl (cond=0x2aa188a47c0
> <qemu_pause_cond>, mutex=0x2aa188a4840 <qemu_global_mutex>,
> file=0x2aa17fa93c6 "../../softmmu/cpus.c", line=<optimized out>) at
> ../../util/qemu-thread-posix.c:194
> #4  0x000002aa17b079e2 in pause_all_vcpus () at ../../softmmu/cpus.c:562
> #5  0x000002aa17b07aec in do_vm_stop
> (state=state@entry=RUN_STATE_SHUTDOWN,
> send_stop=send_stop@entry=false)
>     at ../../softmmu/cpus.c:261

Hang on; how is the dest (?) in shutdown?

> #6  0x000002aa17b07bb6 in vm_shutdown () at ../../softmmu/cpus.c:280
> #7  0x000002aa17b9716e in qemu_cleanup () at ../../softmmu/runstate.c:812
> #8  0x000002aa1756d3f6 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../../softmmu/main.c:51
> 
> No, I dunno why thread 4 was odd. Backtracing that thread seems to
> only work about one time in two. Here's the backtrace from a
> success:

That maybe due to postcopy userfault; it can confuse things sometimes.

> #0  0x000002aa17b1ca2e in load_memop (op=MO_8, haddr=0x3ff80c6d000) at
> ../../accel/tcg/cputlb.c:1860

So just a read, on a page boundary?  So possibly triggered a page read
over postcopy?

Dave

> #1  0x000002aa17b1ca2e in load_helper (full_load=<optimized out>,
> code_read=false, op=MO_8, retaddr=4396070052416, oi=<optimized out>,
> addr=1177997312, env=0x2aa1a86f890) at ../../accel/tcg/cputlb.c:1980
> #2  0x000002aa17b1ca2e in full_ldub_mmu (env=0x2aa1a86f890,
> addr=1177997312, oi=<optimized out>, retaddr=4396070052416)
>     at ../../accel/tcg/cputlb.c:1996
> #3  0x000003ff8a31a35e in code_gen_buffer ()
> #4  0x000002aa17bbabb4 in cpu_tb_exec (tb_exit=<synthetic pointer>,
> itb=<optimized out>, cpu=<optimized out>)
>     at ../../accel/tcg/cpu-exec.c:353
> #5  0x000002aa17bbabb4 in cpu_loop_exec_tb (tb_exit=<synthetic
> pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
> cpu=<optimized out>) at ../../accel/tcg/cpu-exec.c:812
> #6  0x000002aa17bbabb4 in cpu_exec (cpu=<optimized out>) at
> ../../accel/tcg/cpu-exec.c:970
> #7  0x000002aa18764620 in  ()
> 
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: migration-test hang, s390x host, aarch64 guest
  2021-08-02  8:55 ` Dr. David Alan Gilbert
@ 2021-08-03  9:11   ` Dr. David Alan Gilbert
  2021-08-03 20:53   ` Peter Xu
  1 sibling, 0 replies; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-03  9:11 UTC (permalink / raw)
  To: Peter Maydell, peterx; +Cc: Richard Henderson, QEMU Developers, Juan Quintela

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > migration-test hung again during 'make check'.
> 
> ccing in Peter Xu

I've not managed to reproduce this here; I've had the migration tests
running in a loop for the last day just running aarch64 on s390's
migration test;  I've hit a multifd/cancel hang a few times (which I
think might be the same one Leo is fighting), but no problems in
the postcopy test.

[2 vcpu, ~2G RAM zvm guest???]

Dave

> > Process tree:
> > 
> > ubuntu   42067  0.0  0.0   5460  3156 ?        S    13:55   0:00
> >            \_ bash -o pipefail -c echo
> > 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-aarch64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/migration-test --tap -k' &&
> > MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-aarch64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/migration-test --tap -k -m quick < /dev/null |
> > ./scripts/tap-driver.pl --test-name="qtest-aarch64/migration-test"
> > ubuntu   42070  0.0  0.0  78844  3184 ?        Sl   13:55   0:01
> >                \_ tests/qtest/migration-test --tap -k -m quick
> > ubuntu   43248  0.0  4.1 1401368 160852 ?      Sl   13:55   0:03
> >                |   \_ ./qemu-system-aarch64 -qtest
> > unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> > socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg
> > -machine virt,gic-version=max -name source,debug-threads=on -m 150M
> > -serial file:/tmp/migration-test-SL7EkN/src_serial -cpu max -kernel
> > /tmp/migration-test-SL7EkN/bootsect -accel qtest
> > ubuntu   43256  0.0  1.4 1394208 57648 ?       Sl   13:55   0:00
> >                |   \_ ./qemu-system-aarch64 -qtest
> > unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> > socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg
> > -machine virt,gic-version=max -name target,debug-threads=on -m 150M
> > -serial file:/tmp/migration-test-SL7EkN/dest_serial -incoming
> > unix:/tmp/migration-test-SL7EkN/migsocket -cpu max -kernel
> > /tmp/migration-test-SL7EkN/bootsect -accel qtest
> > ubuntu   42071  0.0  0.2  14116 11372 ?        S    13:55   0:00
> >                \_ perl ./scripts/tap-driver.pl
> > --test-name=qtest-aarch64/migration-test
> > 
> > Backtrace, migration-test process:
> > 
> > Thread 2 (Thread 0x3ff8ff7f910 (LWP 42075)):
> > #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa1d0d1c1c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  qemu_event_wait (ev=ev@entry=0x2aa1d0fda58 <rcu_call_ready_event>)
> > at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa1d0d0884 in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa1d0d0c32 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff90207aa8 in start_thread (arg=0x3ff8ff7f910) at
> > pthread_create.c:463
> > #6  0x000003ff900fa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ff905f7c00 (LWP 42070)):
> > #0  0x000003ff90212978 in __waitpid (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa1d5e06bc, options=options@entry=0)
> >     at ../sysdeps/unix/sysv/linux/waitpid.c:30
> > #1  0x000002aa1d0a2176 in qtest_kill_qemu (s=0x2aa1d5e06b0) at
> > ../../tests/qtest/libqtest.c:144
> > #2  0x000003ff903c0de6 in g_hook_list_invoke () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  <signal handler called>
> > #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #5  0x000003ff9003f3ca in __GI_abort () at abort.c:79
> > #6  0x000003ff903fcbb2 in g_assertion_message () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #7  0x000003ff903fd2b4 in g_assertion_message_cmpstr () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #8  0x000002aa1d0a10f6 in check_migration_status
> > (ungoals=0x3ffffdfe198, goal=0x2aa1d0d75c0 "postcopy-paused",
> > who=0x2aa1d5dee90)
> 
>             g_assert_cmpstr(current_status, !=,  *ungoal);
> 
> >     at ../../tests/qtest/migration-helpers.c:146
> > #9  wait_for_migration_status (who=0x2aa1d5dee90, goal=<optimized
> > out>, ungoals=0x3ffffdfe198)
> >     at ../../tests/qtest/migration-helpers.c:156
> > #10 0x000002aa1d09f610 in test_postcopy_recovery () at
> > ../../tests/qtest/migration-test.c:773
> 
>     wait_for_migration_status(from, "postcopy-paused",
>                               (const char * []) { "failed", "active",
>                                                   "completed", NULL });
> 
> so I think that means it hit one of failed/active/completed rather than
> the postcopy-paused it expected (shame we can't tell which)
> 
> > #11 0x000003ff903fc70c in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #12 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #13 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #14 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #15 0x000003ff903fc8ee in g_test_run_suite () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #16 0x000003ff903fc928 in g_test_run () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #17 0x000002aa1d09d8f2 in main (argc=<optimized out>, argv=<optimized
> > out>) at ../../tests/qtest/migration-test.c:1495
> > 
> > 
> > Backtrace, QEMU process 43248:
> > 
> > Thread 5 (Thread 0x3ff8d7fe910 (LWP 43361)):
> > #0  0x000003ff9d110582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1e27dfac)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ff9d110582 in do_futex_wait (sem=sem@entry=0x2aa1e27dfa8,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ff9d11068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1e27dfa8, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ff9d110734 in __new_sem_wait (sem=sem@entry=0x2aa1e27dfa8)
> > at sem_wait.c:42
> > #4  0x000002aa1b6dfb96 in qemu_sem_wait (sem=0x2aa1e27dfa8,
> > sem@entry=<error reading variable: value has been optimized out>)
> >     at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa1aefe5b0 in postcopy_pause_return_path_thread
> > (s=0x2aa1e27dc00) at ../../migration/migration.c:2654
> > #6  0x000002aa1aefe5b0 in source_return_path_thread
> > (opaque=opaque@entry=0x2aa1e27dc00) at
> > ../../migration/migration.c:2854
> > #7  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #8  0x000003ff9d107aa8 in start_thread (arg=0x3ff8d7fe910) at
> > pthread_create.c:463
> > #9  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> OK, source waiting for recovery - my reading is we don't send the
> recovery until the next command in the test.
> 
> > 
> > Thread 4 (Thread 0x3ff8ed42910 (LWP 43252)):
> > #0  0x000003ff9d10d9fe in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa1e49ca68)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ff9d10d9fe in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa1c1a4840 <qemu_global_mutex>, cond=0x2aa1e49ca40)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ff9d10d9fe in __pthread_cond_wait
> > (cond=cond@entry=0x2aa1e49ca40, mutex=mutex@entry=0x2aa1c1a4840
> > <qemu_global_mutex>)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa1b6df59e in qemu_cond_wait_impl (cond=0x2aa1e49ca40,
> > mutex=0x2aa1c1a4840 <qemu_global_mutex>, file=0x2aa1b8a93c6
> > "../../softmmu/cpus.c", line=<optimized out>) at
> > ../../util/qemu-thread-posix.c:194
> > #4  0x000002aa1b40764e in qemu_wait_io_event
> > (cpu=cpu@entry=0x2aa1e43aa50) at ../../softmmu/cpus.c:419
> > #5  0x000002aa1b406916 in mttcg_cpu_thread_fn
> > (arg=arg@entry=0x2aa1e43aa50) at
> > ../../accel/tcg/tcg-accel-ops-mttcg.c:98
> > #6  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #7  0x000003ff9d107aa8 in start_thread (arg=0x3ff8ed42910) at
> > pthread_create.c:463
> > #8  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 3 (Thread 0x3ff8f735910 (LWP 43251)):
> > #0  0x000003ff9cfef1b0 in __GI___poll (fds=0x3ff64003620, nfds=3,
> > timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> > #1  0x000003ff9e852624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #2  0x000003ff9e852aa8 in g_main_loop_run () at
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  0x000002aa1b4eaf0e in iothread_run
> > (opaque=opaque@entry=0x2aa1df83b80) at ../../iothread.c:73
> > #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff8f735910) at
> > pthread_create.c:463
> > #6  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 2 (Thread 0x3ff90037910 (LWP 43250)):
> > #0  0x000003ff9cff52ea in syscall () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa1b6dfd5c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  0x000002aa1b6dfd5c in qemu_event_wait (ev=ev@entry=0x2aa1c1c3888
> > <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa1b6d917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff90037910) at
> > pthread_create.c:463
> > #6  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ff9feb5470 (LWP 43248)):
> > #0  0x000003ff9cfef302 in __GI_ppoll (fds=0x2aa1ed28d00, nfds=6,
> > timeout=<optimized out>,
> >     timeout@entry=0x3fff517ecf8, sigmask=sigmask@entry=0x0) at
> > ../sysdeps/unix/sysv/linux/ppoll.c:39
> > #1  0x000002aa1b701fec in ppoll (__ss=0x0, __timeout=0x3fff517ecf8,
> > __nfds=<optimized out>, __fds=<optimized out>)
> >     at /usr/include/s390x-linux-gnu/bits/poll2.h:77
> > #2  0x000002aa1b701fec in qemu_poll_ns (fds=<optimized out>,
> > nfds=<optimized out>, timeout=timeout@entry=1000000000)
> >     at ../../util/qemu-timer.c:348
> > #3  0x000002aa1b6d2f98 in os_host_main_loop_wait (timeout=1000000000)
> > at ../../util/main-loop.c:250
> > #4  0x000002aa1b6d2f98 in main_loop_wait
> > (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
> > #5  0x000002aa1b496d44 in qemu_main_loop () at ../../softmmu/runstate.c:726
> > #6  0x000002aa1ae6d3f0 in main (argc=<optimized out>, argv=<optimized
> > out>, envp=<optimized out>) at ../../softmmu/main.c:50
> > 
> > Backtrace, QEMU process 43256:
> > 
> > Thread 6 (Thread 0x3ff726ff910 (LWP 43440)):
> > #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1a75d2d4)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2d0,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ffb9f1068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1a75d2d0, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> > at sem_wait.c:42
> > #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> > at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa17599f06 in postcopy_pause_incoming (mis=<optimized
> > out>) at ../../migration/savevm.c:2600
> 
> So the dest is still waiting to be woken up from pause? Hmm I thought
> the test had already given it a kick.
> 
> > #6  0x000002aa17599f06 in qemu_loadvm_state_main
> > (f=f@entry=0x2aa1a751610, mis=mis@entry=0x2aa1a75d130)
> >     at ../../migration/savevm.c:2674
> > #7  0x000002aa1759af4a in postcopy_ram_listen_thread
> > (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1872
> > #8  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #9  0x000003ffb9f07aa8 in start_thread (arg=0x3ff726ff910) at
> > pthread_create.c:463
> > #10 0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 5 (Thread 0x3ff88ffe910 (LWP 43439)):
> > #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1a75d2fc)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2f8,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ffb9f1068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1a75d2f8, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2f8)
> > at sem_wait.c:42
> > #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=0x2aa1a75d2f8,
> > sem@entry=<error reading variable: value has been optimized out>)
> >     at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa177ae6aa in postcopy_pause_fault_thread (mis=<optimized
> > out>) at ../../migration/postcopy-ram.c:847
> > #6  0x000002aa177ae6aa in postcopy_ram_fault_thread
> > (opaque=opaque@entry=0x2aa1a75d130) at
> > ../../migration/postcopy-ram.c:911
> > #7  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #8  0x000003ffb9f07aa8 in start_thread (arg=0x3ff88ffe910) at
> > pthread_create.c:463
> > #9  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 4 (Thread 0x3ff89b18910 (LWP 43271)):
> > Python Exception <class 'gdb.error'> PC not saved:
> > 
> > Thread 3 (Thread 0x3ff8a319910 (LWP 43270)):
> > #0  0x000003ffb9def1b0 in __GI___poll (fds=0x3ff84003620, nfds=3,
> > timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> > #1  0x000003ffbb652624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #2  0x000003ffbb652aa8 in g_main_loop_run () at
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  0x000002aa17beaf0e in iothread_run
> > (opaque=opaque@entry=0x2aa1a3aeb80) at ../../iothread.c:73
> > #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ffb9f07aa8 in start_thread (arg=0x3ff8a319910) at
> > pthread_create.c:463
> > #6  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 2 (Thread 0x3fface37910 (LWP 43267)):
> > #0  0x000003ffb9df52ea in syscall () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa17ddfd5c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  0x000002aa17ddfd5c in qemu_event_wait (ev=ev@entry=0x2aa188c3888
> > <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa17dd917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ffb9f07aa8 in start_thread (arg=0x3fface37910) at
> > pthread_create.c:463
> > #6  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ffbccb5470 (LWP 43256)):
> > #0  0x000003ffb9f0d9fe in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa188a47e8 <qemu_pause_cond+40>)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ffb9f0d9fe in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa188a4840 <qemu_global_mutex>, cond=0x2aa188a47c0
> > <qemu_pause_cond>) at pthread_cond_wait.c:502
> > #2  0x000003ffb9f0d9fe in __pthread_cond_wait
> > (cond=cond@entry=0x2aa188a47c0 <qemu_pause_cond>,
> > mutex=mutex@entry=0x2aa188a4840 <qemu_global_mutex>) at
> > pthread_cond_wait.c:655
> > #3  0x000002aa17ddf59e in qemu_cond_wait_impl (cond=0x2aa188a47c0
> > <qemu_pause_cond>, mutex=0x2aa188a4840 <qemu_global_mutex>,
> > file=0x2aa17fa93c6 "../../softmmu/cpus.c", line=<optimized out>) at
> > ../../util/qemu-thread-posix.c:194
> > #4  0x000002aa17b079e2 in pause_all_vcpus () at ../../softmmu/cpus.c:562
> > #5  0x000002aa17b07aec in do_vm_stop
> > (state=state@entry=RUN_STATE_SHUTDOWN,
> > send_stop=send_stop@entry=false)
> >     at ../../softmmu/cpus.c:261
> 
> Hang on; how is the dest (?) in shutdown?
> 
> > #6  0x000002aa17b07bb6 in vm_shutdown () at ../../softmmu/cpus.c:280
> > #7  0x000002aa17b9716e in qemu_cleanup () at ../../softmmu/runstate.c:812
> > #8  0x000002aa1756d3f6 in main (argc=<optimized out>, argv=<optimized
> > out>, envp=<optimized out>) at ../../softmmu/main.c:51
> > 
> > No, I dunno why thread 4 was odd. Backtracing that thread seems to
> > only work about one time in two. Here's the backtrace from a
> > success:
> 
> That maybe due to postcopy userfault; it can confuse things sometimes.
> 
> > #0  0x000002aa17b1ca2e in load_memop (op=MO_8, haddr=0x3ff80c6d000) at
> > ../../accel/tcg/cputlb.c:1860
> 
> So just a read, on a page boundary?  So possibly triggered a page read
> over postcopy?
> 
> Dave
> 
> > #1  0x000002aa17b1ca2e in load_helper (full_load=<optimized out>,
> > code_read=false, op=MO_8, retaddr=4396070052416, oi=<optimized out>,
> > addr=1177997312, env=0x2aa1a86f890) at ../../accel/tcg/cputlb.c:1980
> > #2  0x000002aa17b1ca2e in full_ldub_mmu (env=0x2aa1a86f890,
> > addr=1177997312, oi=<optimized out>, retaddr=4396070052416)
> >     at ../../accel/tcg/cputlb.c:1996
> > #3  0x000003ff8a31a35e in code_gen_buffer ()
> > #4  0x000002aa17bbabb4 in cpu_tb_exec (tb_exit=<synthetic pointer>,
> > itb=<optimized out>, cpu=<optimized out>)
> >     at ../../accel/tcg/cpu-exec.c:353
> > #5  0x000002aa17bbabb4 in cpu_loop_exec_tb (tb_exit=<synthetic
> > pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
> > cpu=<optimized out>) at ../../accel/tcg/cpu-exec.c:812
> > #6  0x000002aa17bbabb4 in cpu_exec (cpu=<optimized out>) at
> > ../../accel/tcg/cpu-exec.c:970
> > #7  0x000002aa18764620 in  ()
> > 
> > -- PMM
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: migration-test hang, s390x host, aarch64 guest
  2021-08-02  8:55 ` Dr. David Alan Gilbert
  2021-08-03  9:11   ` Dr. David Alan Gilbert
@ 2021-08-03 20:53   ` Peter Xu
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Xu @ 2021-08-03 20:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, Richard Henderson, QEMU Developers, Juan Quintela

On Mon, Aug 02, 2021 at 09:55:55AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > migration-test hung again during 'make check'.
> 
> ccing in Peter Xu
> 
> > Process tree:
> > 
> > ubuntu   42067  0.0  0.0   5460  3156 ?        S    13:55   0:00
> >            \_ bash -o pipefail -c echo
> > 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-aarch64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/migration-test --tap -k' &&
> > MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-aarch64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/migration-test --tap -k -m quick < /dev/null |
> > ./scripts/tap-driver.pl --test-name="qtest-aarch64/migration-test"
> > ubuntu   42070  0.0  0.0  78844  3184 ?        Sl   13:55   0:01
> >                \_ tests/qtest/migration-test --tap -k -m quick
> > ubuntu   43248  0.0  4.1 1401368 160852 ?      Sl   13:55   0:03
> >                |   \_ ./qemu-system-aarch64 -qtest
> > unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> > socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg
> > -machine virt,gic-version=max -name source,debug-threads=on -m 150M
> > -serial file:/tmp/migration-test-SL7EkN/src_serial -cpu max -kernel
> > /tmp/migration-test-SL7EkN/bootsect -accel qtest
> > ubuntu   43256  0.0  1.4 1394208 57648 ?       Sl   13:55   0:00
> >                |   \_ ./qemu-system-aarch64 -qtest
> > unix:/tmp/qtest-42070.sock -qtest-log /dev/null -chardev
> > socket,path=/tmp/qtest-42070.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg
> > -machine virt,gic-version=max -name target,debug-threads=on -m 150M
> > -serial file:/tmp/migration-test-SL7EkN/dest_serial -incoming
> > unix:/tmp/migration-test-SL7EkN/migsocket -cpu max -kernel
> > /tmp/migration-test-SL7EkN/bootsect -accel qtest
> > ubuntu   42071  0.0  0.2  14116 11372 ?        S    13:55   0:00
> >                \_ perl ./scripts/tap-driver.pl
> > --test-name=qtest-aarch64/migration-test
> > 
> > Backtrace, migration-test process:
> > 
> > Thread 2 (Thread 0x3ff8ff7f910 (LWP 42075)):
> > #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa1d0d1c1c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  qemu_event_wait (ev=ev@entry=0x2aa1d0fda58 <rcu_call_ready_event>)
> > at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa1d0d0884 in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa1d0d0c32 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff90207aa8 in start_thread (arg=0x3ff8ff7f910) at
> > pthread_create.c:463
> > #6  0x000003ff900fa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ff905f7c00 (LWP 42070)):
> > #0  0x000003ff90212978 in __waitpid (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa1d5e06bc, options=options@entry=0)
> >     at ../sysdeps/unix/sysv/linux/waitpid.c:30
> > #1  0x000002aa1d0a2176 in qtest_kill_qemu (s=0x2aa1d5e06b0) at
> > ../../tests/qtest/libqtest.c:144
> > #2  0x000003ff903c0de6 in g_hook_list_invoke () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  <signal handler called>
> > #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #5  0x000003ff9003f3ca in __GI_abort () at abort.c:79
> > #6  0x000003ff903fcbb2 in g_assertion_message () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #7  0x000003ff903fd2b4 in g_assertion_message_cmpstr () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #8  0x000002aa1d0a10f6 in check_migration_status
> > (ungoals=0x3ffffdfe198, goal=0x2aa1d0d75c0 "postcopy-paused",
> > who=0x2aa1d5dee90)
> 
>             g_assert_cmpstr(current_status, !=,  *ungoal);
> 
> >     at ../../tests/qtest/migration-helpers.c:146
> > #9  wait_for_migration_status (who=0x2aa1d5dee90, goal=<optimized
> > out>, ungoals=0x3ffffdfe198)
> >     at ../../tests/qtest/migration-helpers.c:156
> > #10 0x000002aa1d09f610 in test_postcopy_recovery () at
> > ../../tests/qtest/migration-test.c:773
> 
>     wait_for_migration_status(from, "postcopy-paused",
>                               (const char * []) { "failed", "active",
>                                                   "completed", NULL });
> 
> so I think that means it hit one of failed/active/completed rather than
> the postcopy-paused it expected (shame we can't tell which)
> 
> > #11 0x000003ff903fc70c in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #12 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #13 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #14 0x000003ff903fc648 in ?? () from /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #15 0x000003ff903fc8ee in g_test_run_suite () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #16 0x000003ff903fc928 in g_test_run () from
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #17 0x000002aa1d09d8f2 in main (argc=<optimized out>, argv=<optimized
> > out>) at ../../tests/qtest/migration-test.c:1495
> > 
> > 
> > Backtrace, QEMU process 43248:
> > 
> > Thread 5 (Thread 0x3ff8d7fe910 (LWP 43361)):
> > #0  0x000003ff9d110582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1e27dfac)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ff9d110582 in do_futex_wait (sem=sem@entry=0x2aa1e27dfa8,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ff9d11068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1e27dfa8, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ff9d110734 in __new_sem_wait (sem=sem@entry=0x2aa1e27dfa8)
> > at sem_wait.c:42
> > #4  0x000002aa1b6dfb96 in qemu_sem_wait (sem=0x2aa1e27dfa8,
> > sem@entry=<error reading variable: value has been optimized out>)
> >     at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa1aefe5b0 in postcopy_pause_return_path_thread
> > (s=0x2aa1e27dc00) at ../../migration/migration.c:2654
> > #6  0x000002aa1aefe5b0 in source_return_path_thread
> > (opaque=opaque@entry=0x2aa1e27dc00) at
> > ../../migration/migration.c:2854
> > #7  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #8  0x000003ff9d107aa8 in start_thread (arg=0x3ff8d7fe910) at
> > pthread_create.c:463
> > #9  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> OK, source waiting for recovery - my reading is we don't send the
> recovery until the next command in the test.
> 
> > 
> > Thread 4 (Thread 0x3ff8ed42910 (LWP 43252)):
> > #0  0x000003ff9d10d9fe in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa1e49ca68)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ff9d10d9fe in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa1c1a4840 <qemu_global_mutex>, cond=0x2aa1e49ca40)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ff9d10d9fe in __pthread_cond_wait
> > (cond=cond@entry=0x2aa1e49ca40, mutex=mutex@entry=0x2aa1c1a4840
> > <qemu_global_mutex>)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa1b6df59e in qemu_cond_wait_impl (cond=0x2aa1e49ca40,
> > mutex=0x2aa1c1a4840 <qemu_global_mutex>, file=0x2aa1b8a93c6
> > "../../softmmu/cpus.c", line=<optimized out>) at
> > ../../util/qemu-thread-posix.c:194
> > #4  0x000002aa1b40764e in qemu_wait_io_event
> > (cpu=cpu@entry=0x2aa1e43aa50) at ../../softmmu/cpus.c:419
> > #5  0x000002aa1b406916 in mttcg_cpu_thread_fn
> > (arg=arg@entry=0x2aa1e43aa50) at
> > ../../accel/tcg/tcg-accel-ops-mttcg.c:98
> > #6  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #7  0x000003ff9d107aa8 in start_thread (arg=0x3ff8ed42910) at
> > pthread_create.c:463
> > #8  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 3 (Thread 0x3ff8f735910 (LWP 43251)):
> > #0  0x000003ff9cfef1b0 in __GI___poll (fds=0x3ff64003620, nfds=3,
> > timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> > #1  0x000003ff9e852624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #2  0x000003ff9e852aa8 in g_main_loop_run () at
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  0x000002aa1b4eaf0e in iothread_run
> > (opaque=opaque@entry=0x2aa1df83b80) at ../../iothread.c:73
> > #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff8f735910) at
> > pthread_create.c:463
> > #6  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 2 (Thread 0x3ff90037910 (LWP 43250)):
> > #0  0x000003ff9cff52ea in syscall () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa1b6dfd5c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  0x000002aa1b6dfd5c in qemu_event_wait (ev=ev@entry=0x2aa1c1c3888
> > <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa1b6d917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa1b6ded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ff9d107aa8 in start_thread (arg=0x3ff90037910) at
> > pthread_create.c:463
> > #6  0x000003ff9cffa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ff9feb5470 (LWP 43248)):
> > #0  0x000003ff9cfef302 in __GI_ppoll (fds=0x2aa1ed28d00, nfds=6,
> > timeout=<optimized out>,
> >     timeout@entry=0x3fff517ecf8, sigmask=sigmask@entry=0x0) at
> > ../sysdeps/unix/sysv/linux/ppoll.c:39
> > #1  0x000002aa1b701fec in ppoll (__ss=0x0, __timeout=0x3fff517ecf8,
> > __nfds=<optimized out>, __fds=<optimized out>)
> >     at /usr/include/s390x-linux-gnu/bits/poll2.h:77
> > #2  0x000002aa1b701fec in qemu_poll_ns (fds=<optimized out>,
> > nfds=<optimized out>, timeout=timeout@entry=1000000000)
> >     at ../../util/qemu-timer.c:348
> > #3  0x000002aa1b6d2f98 in os_host_main_loop_wait (timeout=1000000000)
> > at ../../util/main-loop.c:250
> > #4  0x000002aa1b6d2f98 in main_loop_wait
> > (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
> > #5  0x000002aa1b496d44 in qemu_main_loop () at ../../softmmu/runstate.c:726
> > #6  0x000002aa1ae6d3f0 in main (argc=<optimized out>, argv=<optimized
> > out>, envp=<optimized out>) at ../../softmmu/main.c:50
> > 
> > Backtrace, QEMU process 43256:
> > 
> > Thread 6 (Thread 0x3ff726ff910 (LWP 43440)):
> > #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1a75d2d4)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2d0,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ffb9f1068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1a75d2d0, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> > at sem_wait.c:42
> > #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=sem@entry=0x2aa1a75d2d0)
> > at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa17599f06 in postcopy_pause_incoming (mis=<optimized
> > out>) at ../../migration/savevm.c:2600
> 
> So the dest is still waiting to be woken up from pause? Hmm I thought
> the test had already given it a kick.
> 
> > #6  0x000002aa17599f06 in qemu_loadvm_state_main
> > (f=f@entry=0x2aa1a751610, mis=mis@entry=0x2aa1a75d130)
> >     at ../../migration/savevm.c:2674
> > #7  0x000002aa1759af4a in postcopy_ram_listen_thread
> > (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1872
> > #8  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #9  0x000003ffb9f07aa8 in start_thread (arg=0x3ff726ff910) at
> > pthread_create.c:463
> > #10 0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 5 (Thread 0x3ff88ffe910 (LWP 43439)):
> > #0  0x000003ffb9f10582 in futex_abstimed_wait_cancelable (private=0,
> > abstime=0x0, expected=0, futex_word=0x2aa1a75d2fc)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> > #1  0x000003ffb9f10582 in do_futex_wait (sem=sem@entry=0x2aa1a75d2f8,
> > abstime=0x0) at sem_waitcommon.c:111
> > #2  0x000003ffb9f1068c in __new_sem_wait_slow
> > (sem=sem@entry=0x2aa1a75d2f8, abstime=0x0) at sem_waitcommon.c:181
> > #3  0x000003ffb9f10734 in __new_sem_wait (sem=sem@entry=0x2aa1a75d2f8)
> > at sem_wait.c:42
> > #4  0x000002aa17ddfb96 in qemu_sem_wait (sem=0x2aa1a75d2f8,
> > sem@entry=<error reading variable: value has been optimized out>)
> >     at ../../util/qemu-thread-posix.c:357
> > #5  0x000002aa177ae6aa in postcopy_pause_fault_thread (mis=<optimized
> > out>) at ../../migration/postcopy-ram.c:847
> > #6  0x000002aa177ae6aa in postcopy_ram_fault_thread
> > (opaque=opaque@entry=0x2aa1a75d130) at
> > ../../migration/postcopy-ram.c:911
> > #7  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #8  0x000003ffb9f07aa8 in start_thread (arg=0x3ff88ffe910) at
> > pthread_create.c:463
> > #9  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 4 (Thread 0x3ff89b18910 (LWP 43271)):
> > Python Exception <class 'gdb.error'> PC not saved:
> > 
> > Thread 3 (Thread 0x3ff8a319910 (LWP 43270)):
> > #0  0x000003ffb9def1b0 in __GI___poll (fds=0x3ff84003620, nfds=3,
> > timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> > #1  0x000003ffbb652624 in  () at /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #2  0x000003ffbb652aa8 in g_main_loop_run () at
> > /usr/lib/s390x-linux-gnu/libglib-2.0.so.0
> > #3  0x000002aa17beaf0e in iothread_run
> > (opaque=opaque@entry=0x2aa1a3aeb80) at ../../iothread.c:73
> > #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ffb9f07aa8 in start_thread (arg=0x3ff8a319910) at
> > pthread_create.c:463
> > #6  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 2 (Thread 0x3fface37910 (LWP 43267)):
> > #0  0x000003ffb9df52ea in syscall () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:53
> > #1  0x000002aa17ddfd5c in qemu_futex_wait (val=<optimized out>,
> > f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
> > #2  0x000002aa17ddfd5c in qemu_event_wait (ev=ev@entry=0x2aa188c3888
> > <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
> > #3  0x000002aa17dd917c in call_rcu_thread (opaque=opaque@entry=0x0) at
> > ../../util/rcu.c:258
> > #4  0x000002aa17dded72 in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:541
> > #5  0x000003ffb9f07aa8 in start_thread (arg=0x3fface37910) at
> > pthread_create.c:463
> > #6  0x000003ffb9dfa11e in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> > 
> > Thread 1 (Thread 0x3ffbccb5470 (LWP 43256)):
> > #0  0x000003ffb9f0d9fe in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa188a47e8 <qemu_pause_cond+40>)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ffb9f0d9fe in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa188a4840 <qemu_global_mutex>, cond=0x2aa188a47c0
> > <qemu_pause_cond>) at pthread_cond_wait.c:502
> > #2  0x000003ffb9f0d9fe in __pthread_cond_wait
> > (cond=cond@entry=0x2aa188a47c0 <qemu_pause_cond>,
> > mutex=mutex@entry=0x2aa188a4840 <qemu_global_mutex>) at
> > pthread_cond_wait.c:655
> > #3  0x000002aa17ddf59e in qemu_cond_wait_impl (cond=0x2aa188a47c0
> > <qemu_pause_cond>, mutex=0x2aa188a4840 <qemu_global_mutex>,
> > file=0x2aa17fa93c6 "../../softmmu/cpus.c", line=<optimized out>) at
> > ../../util/qemu-thread-posix.c:194
> > #4  0x000002aa17b079e2 in pause_all_vcpus () at ../../softmmu/cpus.c:562
> > #5  0x000002aa17b07aec in do_vm_stop
> > (state=state@entry=RUN_STATE_SHUTDOWN,
> > send_stop=send_stop@entry=false)
> >     at ../../softmmu/cpus.c:261
> 
> Hang on; how is the dest (?) in shutdown?

Yeh, the other "mistery" is we don't have migration thread running on src at
all...  So I'm thinking maybe the migration failed somehow, and migration
thread quitted already.

Normally for recovery test we'll be stuck in postcopy_pause() on the src, if
postcopy_do_resume() failed we'll stick with the postcopy_pause_sem and wait
forever there; that won't eat the migration thread.

So it's gone only if we went the other way to MIG_THR_ERR_FATAL, I think.  That
can possibly be caused by not reaching MIGRATION_STATUS_POSTCOPY_RECOVER state
for the previous "migrate -r" command.  However I've no clue on why.

Maybe.. the stderr could help on what has failed; but by default they're
dropped.  So I'm wondering whether we should apply some patch like this to the
test temporarily:

---8<---
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cc5e83d98a..d4e3c78acc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -725,7 +725,7 @@ static void test_postcopy_recovery(void)
     QTestState *from, *to;
     g_autofree char *uri = NULL;
 
-    args->hide_stderr = true;
+    args->hide_stderr = false;
 
     if (migrate_postcopy_prepare(&from, &to, args)) {
         return;
---8<---

So if postcopy recovery is the only thing that fails, then we enable stderr for
this test. When it crashes we have a chance to get the stderr of both src/dst
qemu, maybe?

Thanks,

> 
> > #6  0x000002aa17b07bb6 in vm_shutdown () at ../../softmmu/cpus.c:280
> > #7  0x000002aa17b9716e in qemu_cleanup () at ../../softmmu/runstate.c:812
> > #8  0x000002aa1756d3f6 in main (argc=<optimized out>, argv=<optimized
> > out>, envp=<optimized out>) at ../../softmmu/main.c:51
> > 
> > No, I dunno why thread 4 was odd. Backtracing that thread seems to
> > only work about one time in two. Here's the backtrace from a
> > success:
> 
> That maybe due to postcopy userfault; it can confuse things sometimes.
> 
> > #0  0x000002aa17b1ca2e in load_memop (op=MO_8, haddr=0x3ff80c6d000) at
> > ../../accel/tcg/cputlb.c:1860
> 
> So just a read, on a page boundary?  So possibly triggered a page read
> over postcopy?
> 
> Dave
> 
> > #1  0x000002aa17b1ca2e in load_helper (full_load=<optimized out>,
> > code_read=false, op=MO_8, retaddr=4396070052416, oi=<optimized out>,
> > addr=1177997312, env=0x2aa1a86f890) at ../../accel/tcg/cputlb.c:1980
> > #2  0x000002aa17b1ca2e in full_ldub_mmu (env=0x2aa1a86f890,
> > addr=1177997312, oi=<optimized out>, retaddr=4396070052416)
> >     at ../../accel/tcg/cputlb.c:1996
> > #3  0x000003ff8a31a35e in code_gen_buffer ()
> > #4  0x000002aa17bbabb4 in cpu_tb_exec (tb_exit=<synthetic pointer>,
> > itb=<optimized out>, cpu=<optimized out>)
> >     at ../../accel/tcg/cpu-exec.c:353
> > #5  0x000002aa17bbabb4 in cpu_loop_exec_tb (tb_exit=<synthetic
> > pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
> > cpu=<optimized out>) at ../../accel/tcg/cpu-exec.c:812
> > #6  0x000002aa17bbabb4 in cpu_exec (cpu=<optimized out>) at
> > ../../accel/tcg/cpu-exec.c:970
> > #7  0x000002aa18764620 in  ()
> > 
> > -- PMM
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-03 20:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 19:37 migration-test hang, s390x host, aarch64 guest Peter Maydell
2021-08-02  8:55 ` Dr. David Alan Gilbert
2021-08-03  9:11   ` Dr. David Alan Gilbert
2021-08-03 20:53   ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).