All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bug 1805256 <1805256@bugs.launchpad.net>,
	QEMU Developers - ARM <qemu-arm@nongnu.org>,
	QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 11 Sep 2019 04:05:41 -0300	[thread overview]
Message-ID: <c3bf1a4e-4fa7-2c13-35fb-f63429ee39d1@ubuntu.com> (raw)
In-Reply-To: <cbe46ad6-ef6c-d155-e79a-672182c725ad@ubuntu.com>

Quick update...

> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():
> 
> - EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *,
> unsigned), is a two's complement, making argument into a INT_MAX when
> that's not what is expected (unless I missed something).
> 
> *** If that is the case, unsure if you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay
> here ***
> 
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

I realized this might be intentional, but, still, I tried:

    https://pastebin.ubuntu.com/p/6rkkY6fJdm/

looking for anything that could have misbehaved in arm64 (specially
concerned on casting and type conversions between the functions).

> QUESTION:
> 
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?

And I also tried:

-    qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+    while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0)
+        continue;

and it made little difference (took way more time for me to reproduce
the issue though):

"""
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]

[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]

<HUNG w/ 3 threads in the stack trace showed before>
"""

All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).

Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).

I'll try to check why those tasks exited.

Any thoughts ?

Tks


WARNING: multiple messages have this Message-ID (diff)
From: Rafael David Tinoco <rafaeldtinoco@kernelpath.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [Bug 1805256] Re: qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 11 Sep 2019 07:05:41 -0000	[thread overview]
Message-ID: <c3bf1a4e-4fa7-2c13-35fb-f63429ee39d1@ubuntu.com> (raw)
Message-ID: <20190911070541.6tcAZSFPvX65xG3PVMKNiEoddDsHwCBryx90bqLKwlI@z> (raw)
In-Reply-To: cbe46ad6-ef6c-d155-e79a-672182c725ad@ubuntu.com

Quick update...

> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():
> 
> - EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *,
> unsigned), is a two's complement, making argument into a INT_MAX when
> that's not what is expected (unless I missed something).
> 
> *** If that is the case, unsure if you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay
> here ***
> 
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

I realized this might be intentional, but, still, I tried:

    https://pastebin.ubuntu.com/p/6rkkY6fJdm/

looking for anything that could have misbehaved in arm64 (specially
concerned on casting and type conversions between the functions).

> QUESTION:
> 
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?

And I also tried:

-    qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+    while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0)
+        continue;

and it made little difference (took way more time for me to reproduce
the issue though):

"""
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]

[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]

<HUNG w/ 3 threads in the stack trace showed before>
"""

All the tasks left are blocked in a system call, so no task left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
thread #1 (doing poll() in a pipe with thread #2).

Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).

I'll try to check why those tasks exited.

Any thoughts ?

Tks

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, 
      timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, 
      __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
      timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
      at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5  0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7  0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions


  reply	other threads:[~2019-09-11  7:13 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11  2:15 [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Rafael David Tinoco
2019-09-11  2:15 ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11  7:05 ` Rafael David Tinoco [this message]
2019-09-11  7:05   ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11 13:17 ` [Qemu-devel] " Paolo Bonzini
2019-09-11 14:48   ` Rafael David Tinoco
2019-09-11 19:09   ` Rafael David Tinoco
2019-09-11 19:09     ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-24 20:25     ` [Qemu-devel] " dann frazier
2019-09-24 20:25       ` [Bug 1805256] " dann frazier
2019-10-02  9:23       ` Jan Glauber
2019-10-02  9:23         ` Jan Glauber
2019-10-02  9:45         ` Paolo Bonzini
2019-10-02 11:05           ` Jan Glauber
2019-10-02 11:05             ` [Bug 1805256] " Jan Glauber
2019-10-02 13:20             ` memory barriers and ATOMIC_SEQ_CST on aarch64 (was Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues) Paolo Bonzini
2019-10-02 14:58               ` Torvald Riegel
2019-10-02 16:30                 ` Paolo Bonzini
2019-10-07 11:06         ` [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Paolo Bonzini
2019-10-07 14:36           ` Jan Glauber
2019-10-07 14:36             ` [Bug 1805256] " Jan Glauber
2019-10-07 14:44           ` dann frazier
2019-10-07 14:44             ` [Bug 1805256] " dann frazier
2019-10-07 14:58             ` Paolo Bonzini
2019-10-09  8:02               ` Jan Glauber
2019-10-09  8:02                 ` [Bug 1805256] " Jan Glauber
2019-10-09  9:15                 ` Paolo Bonzini
2019-10-11  6:05                   ` Jan Glauber
2019-10-11  6:05                     ` [Bug 1805256] " Jan Glauber
2019-10-11  8:18                     ` Paolo Bonzini
2019-10-11  8:30                       ` Jan Glauber
2019-10-11  8:30                         ` [Bug 1805256] " Jan Glauber
2019-10-11 17:55                         ` dann frazier
2019-10-11 17:55                           ` dann frazier
2019-10-12  0:24                           ` [Bug 1805256] " no-reply
2019-10-12  0:49                           ` no-reply
2019-10-11 17:50                     ` dann frazier
2019-10-11 17:50                       ` [Bug 1805256] " dann frazier
  -- strict thread matches above, loose matches on Subject: below --
2018-11-26 22:53 [Qemu-devel] [Bug 1805256] [NEW] qemu-img hangs on high core count ARM system dann frazier
2018-11-26 23:26 ` [Qemu-devel] [Bug 1805256] " John Snow
2018-11-26 23:54 ` dann frazier
2018-12-05 11:20 ` Alex Bennée
2019-04-15 12:59 ` 贞贵李
2019-04-15 14:37 ` 贞贵李
2019-04-15 22:25 ` dann frazier
2019-04-15 23:37 ` dann frazier
2019-04-16  8:16 ` 贞贵李
2019-04-16 13:32 ` 贞贵李
2019-04-23  1:29 ` 贞贵李
2019-06-05 16:16 ` dann frazier
2019-09-05 15:03 ` Rafael David Tinoco
2019-09-06 15:12 ` Rafael David Tinoco
2019-09-06 15:16 ` Rafael David Tinoco
2019-09-06 21:22 ` Rafael David Tinoco
2019-09-09 16:47 ` Rafael David Tinoco
2019-09-10  2:04 ` Rafael David Tinoco
2019-09-10 14:16 ` Rafael David Tinoco
2019-09-10 18:15 ` [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Rafael David Tinoco
2019-09-10 22:56 ` Rafael David Tinoco
2019-09-11  2:17 ` Rafael David Tinoco
2019-09-11 11:19 ` Rafael David Tinoco
2019-09-11 19:23 ` Rafael David Tinoco
2019-10-02 11:02 ` Jan Glauber
2019-10-03 12:28 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 21:35 ` dann frazier
2019-12-13 14:24 ` dann frazier
2019-12-17  1:34 ` Fred Kimmy
2019-12-17 19:17 ` dann frazier
2019-12-18  2:40 ` Rafael David Tinoco
2019-12-18  9:52 ` iveskim
2019-12-18 14:52 ` dann frazier
2019-12-18 16:21 ` Ubuntu Foundations Team Bug Bot
2020-02-13  8:41 ` Ike Panhc
2020-02-13  8:42 ` Andrew Cloke
2020-02-13  9:20 ` Fred Kimmy
2020-04-15  2:47 ` Rafael David Tinoco
2020-05-04  7:24 ` Ike Panhc
2020-05-05  0:54 ` Ike Panhc
2020-05-05  1:22 ` Ying Fang
2020-05-05  6:15 ` Ike Panhc
2020-05-05 15:01 ` Ike Panhc
2020-05-05 18:48 ` Rafael David Tinoco
2020-05-05 23:55 ` dann frazier
2020-05-06 13:08 ` Rafael David Tinoco
2020-05-06 13:23 ` Rafael David Tinoco
2020-05-06 15:45 ` Ike Panhc
2020-05-06 16:42 ` dann frazier
2020-05-06 19:04 ` Launchpad Bug Tracker
2020-05-06 19:09 ` Philippe Mathieu-Daudé
2020-05-06 19:57   ` dann frazier
2020-05-06 20:11 ` Rafael David Tinoco
2020-05-06 21:10 ` Launchpad Bug Tracker
2020-05-06 21:44 ` Launchpad Bug Tracker
2020-05-07  3:37 ` Launchpad Bug Tracker
2020-05-07  7:00 ` Ike Panhc
2020-05-07 22:27 ` dann frazier
2020-05-14  8:05 ` Andrew Cloke
2020-05-27  4:55 ` Christian Ehrhardt 
2020-05-28 14:58 ` Christian Ehrhardt 
2020-05-29  7:55 ` Launchpad Bug Tracker
2020-05-29  8:01 ` Christian Ehrhardt 
2020-06-02 22:45 ` Brian Murray
2020-06-02 22:49 ` [Bug 1805256] Please test proposed package Brian Murray
2020-06-02 22:54 ` Brian Murray
2020-06-03  4:09 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.0+dfsg-0ubuntu9.7) Ubuntu SRU Bot
2020-06-03  6:35 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Ike Panhc
2020-06-03  8:40 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.2-3ubuntu6.2) Ubuntu SRU Bot
2020-06-05  3:51 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Christian Ehrhardt 
2020-06-11  8:04 ` Andrew Cloke
2020-06-17  5:16 ` Christian Ehrhardt 
2020-06-18  9:23 ` Launchpad Bug Tracker
2020-06-18  9:23 ` [Bug 1805256] Update Released Łukasz Zemczak
2020-06-18  9:38 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Launchpad Bug Tracker
2020-06-18  9:39 ` Launchpad Bug Tracker
2020-06-18 10:27 ` Andrew Cloke
2020-06-30  6:54 ` Christian Ehrhardt 
2020-07-01  7:01 ` Ike Panhc
2020-07-12 13:16 ` Rafael David Tinoco
2020-07-13  3:59 ` Launchpad Bug Tracker
2020-07-13  4:12 ` Rafael David Tinoco
2020-07-15 15:31 ` dann frazier
2020-07-20 12:22 ` Rafael David Tinoco
2020-07-21 20:02 ` Rafael David Tinoco
2020-07-21 20:03 ` Rafael David Tinoco
2020-07-31 18:51 ` Rafael David Tinoco
2020-07-31 21:42 ` Rafael David Tinoco
2020-08-07  9:53 ` Timo Aaltonen
2020-08-07 14:41 ` [Bug 1805256] Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.30) Ubuntu SRU Bot
2020-08-07 20:13 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images dann frazier
2020-08-14 19:49 ` dann frazier
2020-08-19 16:36 ` Launchpad Bug Tracker
2020-08-19 17:16 ` Andrew Cloke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3bf1a4e-4fa7-2c13-35fb-f63429ee39d1@ubuntu.com \
    --to=rafaeldtinoco@ubuntu.com \
    --cc=1805256@bugs.launchpad.net \
    --cc=pbonzini@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.