From: 贞贵李 <1824053@bugs.launchpad.net>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability
Date: Sat, 20 Apr 2019 02:55:47 -0000	[thread overview]
Message-ID: <155572894823.14003.12733061980284058304.launchpad@wampee.canonical.com> (raw)
In-Reply-To: 155486495593.20543.13567634487304856304.malonedeb@chaenomeles.canonical.com

** Description changed:

  Hi, I found a problem where qemu-img convert appears to get stuck on an
  aarch64 host with low probability.
  
  The convert command line is "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw".
  
  The bt is below:
  
  Thread 2 (Thread 0x40000b776e50 (LWP 27215)):
  #0  0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37
  #3  0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496
  #4  0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x000040000a492b2c in thread_start () from /lib64/libc.so.6
  
  Thread 1 (Thread 0x40000b573370 (LWP 27214)):
  #0  0x000040000a489020 in ppoll () from /lib64/libc.so.6
  #1  0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
  #6  0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305
  
- 
- The problem seems to be very similar to the phenomenon described by this patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch), 
+ The problem seems to be very similar to the phenomenon described by this
+ patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025
+ -aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
  
  which forces main loop wakeup with SIGIO.  But this patch was reverted by
  the patch (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-
  aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).
  
- The problem still seems to exist in aarch64 host. The qemu version I used is 2.8.1. The host version is 4.19.28-1.2.108.aarch64.
-  Do you have any solutions to fix it?  Thanks for your reply !
+ I can reproduce this problem with qemu.git/master; it still exists there.
+ I found that when an I/O request completes in a worker thread and the
+ thread calls aio_notify to wake up the main loop, it can find that
+ ctx->notify_me has already been cleared to 0 by the main loop in
+ aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). The worker thread
+ therefore skips writing the eventfd to notify the main loop. If this
+ interleaving happens, the main loop hangs:
+     main loop                          worker thread1                         worker thread2
+     -----------------------------------------------------------------------------------------
+     qemu_poll_ns                       aio_worker
+                                        qemu_bh_schedule(pool->completion_bh)
+     glib_pollfds_poll
+     g_main_context_check
+     aio_ctx_check                                                             aio_worker
+     atomic_and(&ctx->notify_me, ~1)                                           qemu_bh_schedule(pool->completion_bh)
+
+     /* do something for event */
+     qemu_poll_ns
+     /* hangs !!! */
+ 
+ As we know, ctx->notify_me is accessed by both the worker threads and the
+ main loop. I think we should add lock protection for ctx->notify_me to
+ avoid this happening.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed

Bug description:
  Hi, I found a problem where qemu-img convert appears to get stuck on an
  aarch64 host with low probability.

  The convert command line is "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw".

  The bt is below:

  Thread 2 (Thread 0x40000b776e50 (LWP 27215)):
  #0  0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37
  #3  0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496
  #4  0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x000040000a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x40000b573370 (LWP 27214)):
  #0  0x000040000a489020 in ppoll () from /lib64/libc.so.6
  #1  0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
  #6  0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

  The problem seems to be very similar to the phenomenon described by
  this patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-
  ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),

  which forces main loop wakeup with SIGIO.  But this patch was reverted
  by the patch (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-
  Revert-aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

  I can reproduce this problem with qemu.git/master; it still exists there.
  I found that when an I/O request completes in a worker thread and the
  thread calls aio_notify to wake up the main loop, it can find that
  ctx->notify_me has already been cleared to 0 by the main loop in
  aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). The worker thread
  therefore skips writing the eventfd to notify the main loop. If this
  interleaving happens, the main loop hangs:
      main loop                          worker thread1                         worker thread2
      -----------------------------------------------------------------------------------------
      qemu_poll_ns                       aio_worker
                                         qemu_bh_schedule(pool->completion_bh)
      glib_pollfds_poll
      g_main_context_check
      aio_ctx_check                                                             aio_worker
      atomic_and(&ctx->notify_me, ~1)                                           qemu_bh_schedule(pool->completion_bh)

      /* do something for event */
      qemu_poll_ns
      /* hangs !!! */

  As we know, ctx->notify_me is accessed by both the worker threads and the
  main loop. I think we should add lock protection for ctx->notify_me to
  avoid this happening.
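
  The lost-wakeup interleaving above can be replayed deterministically in a
  minimal model. Names mirror the QEMU ones (ctx->notify_me, aio_notify, the
  eventfd write), but this is an illustrative sketch of the reported race,
  not QEMU's actual code: the `eventfd_written` flag simply stands in for
  the real eventfd write that wakes the poller.

  ```c
  #include <assert.h>
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  struct ctx {
      atomic_int notify_me;   /* poller sets bit 0 before going to sleep */
      bool eventfd_written;   /* stand-in for the real eventfd write */
  };

  /* Worker side: only write the eventfd if the poller announced that it
   * is (about to be) sleeping in ppoll(). */
  static void aio_notify(struct ctx *c)
  {
      if (atomic_load(&c->notify_me)) {
          c->eventfd_written = true;   /* would wake the poller */
      }
      /* else: the wakeup is silently skipped -- this is the hang. */
  }

  int main(void)
  {
      struct ctx c = { .notify_me = 1, .eventfd_written = false };

      /* Replay the interleaving from the report, step by step: */
      atomic_fetch_and(&c.notify_me, ~1);  /* main loop: aio_ctx_check */
      aio_notify(&c);                      /* worker: completion bh fires now */

      /* The poller re-enters qemu_poll_ns with no pending notification. */
      assert(!c.eventfd_written);
      printf("notification lost -> main loop would block in ppoll()\n");
      return 0;
  }
  ```

  Run sequentially like this, the skipped write is guaranteed rather than
  probabilistic; on a real host the same ordering only occurs rarely, which
  matches the "low probability" hangs seen with qemu-img convert.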

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions


Thread overview: 11+ messages
2019-04-10  2:55 [Qemu-devel] [Bug 1824053] [NEW] Qemu-img convert appears to be stuck on aarch64 host with low probability 贞贵李
2019-04-13  2:04 ` [Qemu-devel] [Bug 1824053] " 贞贵李
2019-04-15  4:07 ` 贞贵李
2019-04-15 19:02 ` John Snow
2019-04-16  5:31 ` Thomas Huth
2019-04-16  7:53 ` 贞贵李
2019-04-16  8:08 ` Thomas Huth
2019-04-20  1:46 ` 贞贵李
2019-04-20  2:49 ` 贞贵李
2019-04-20  2:55 ` 贞贵李 [this message]
2019-06-06 22:57 ` dann frazier
