From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.7 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11657C43331 for ; Fri, 6 Sep 2019 21:37:54 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D95082070C for ; Fri, 6 Sep 2019 21:37:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D95082070C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernelpath.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:60512 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i6LvU-0008RU-Tv for qemu-devel@archiver.kernel.org; Fri, 06 Sep 2019 17:37:52 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:43742) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1i6Ltx-0007XR-QD for qemu-devel@nongnu.org; Fri, 06 Sep 2019 17:36:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1i6Ltw-0000Kp-FG for qemu-devel@nongnu.org; Fri, 06 Sep 2019 17:36:17 -0400 Received: from indium.canonical.com ([91.189.90.7]:54398) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1i6Ltw-0000Jv-9P for qemu-devel@nongnu.org; Fri, 06 Sep 2019 17:36:16 -0400 Received: from loganberry.canonical.com ([91.189.90.37]) by indium.canonical.com with esmtp (Exim 4.86_2 #2 (Debian)) id 1i6Ltu-0006I4-KG for ; Fri, 06 Sep 2019 21:36:14 +0000 Received: from loganberry.canonical.com (localhost [127.0.0.1]) by loganberry.canonical.com (Postfix) with ESMTP id 97BB22E80C8 for ; Fri, 6 Sep 2019 21:36:14 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Date: Fri, 06 Sep 2019 21:22:42 -0000 From: Rafael David Tinoco To: qemu-devel@nongnu.org X-Launchpad-Notification-Type: bug X-Launchpad-Bug: product=qemu; status=Confirmed; importance=Undecided; assignee=None; X-Launchpad-Bug: distribution=ubuntu; sourcepackage=qemu; component=main; status=In Progress; importance=Medium; assignee=rafaeldtinoco@kernelpath.com; X-Launchpad-Bug-Tags: qemu-img X-Launchpad-Bug-Information-Type: Public X-Launchpad-Bug-Private: no X-Launchpad-Bug-Security-Vulnerability: no X-Launchpad-Bug-Commenters: dannf jnsnow lizhengui rafaeldtinoco X-Launchpad-Bug-Reporter: dann frazier (dannf) X-Launchpad-Bug-Modifier: Rafael David Tinoco (rafaeldtinoco) References: <154327283728.15443.11625169757714443608.malonedeb@soybean.canonical.com> Message-Id: <156780496240.864.10410116882949423627.malone@wampee.canonical.com> X-Launchpad-Message-Rationale: Subscriber (QEMU) @qemu-devel-ml X-Launchpad-Message-For: qemu-devel-ml Precedence: bulk X-Generated-By: Launchpad (canonical.com); Revision="19044"; Instance="production-secrets-lazr.conf" X-Launchpad-Hash: a3bb8ced42f7c6f57a5fb674b12ce77cd21477dd X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 91.189.90.7 Subject: [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Bug 1805256 <1805256@bugs.launchpad.net> Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Alright, I couldn't reproduce this yet, I'm running same test case in a 24 cores box and causing lots of context switches and CPU migrations in parallel (trying to exhaust the logic). Will let this running for sometime to check. Unfortunately this can be related QEMU AIO BH locking/primitives and cache coherency in the HW in question (which I got specs from: https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616): l1$ size 8 MiB l1d$ size 4 MiB l1i$ size 4 MiB l2$ size 32 MiB l3$ size 64 MiB like for example when having 2 threads in different NUMA domains, or some other situation. I can't simulate the same since I have a SOC with: Cortex-A53 MPCore 24cores, L1 I/D=3D32KB/32KB L2 =3D256KB L3 =3D4MB and I'm not even close to L1/L2/L3 cache numbers from D06 =3Do). Just got a note that I'll be able to reproduce this in the real HW, will get back soon with real gdb debugging. -- = You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1805256 Title: qemu-img hangs on high core count ARM system Status in QEMU: Confirmed Status in qemu package in Ubuntu: In Progress Bug description: On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. Once hung, attaching gdb gives the following backtrace: (gdb) bt #0 0x0000ffffae4f8154 in __GI_ppoll (fds=3D0xaaaae8a67dc0, nfds=3D187650= 274213760, = timeout=3D, timeout@entry=3D0x0, sigmask=3D0xffffc123b= 950) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x0000aaaabbefaf00 in ppoll (__ss=3D0x0, __timeout=3D0x0, __nfds=3D, = __fds=3D) at /usr/include/aarch64-linux-gnu/bits/poll2= .h:77 #2 qemu_poll_ns (fds=3D, nfds=3D, = timeout=3Dtimeout@entry=3D-1) at util/qemu-timer.c:322 #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=3D-1) at util/main-loop.c:233 #4 main_loop_wait (nonblocking=3D) at util/main-loop.c:497 #5 0x0000aaaabbe2aa30 in convert_do_copy (s=3D0xffffc123bb58) at qemu-im= g.c:1980 #6 img_convert (argc=3D, argv=3D) at qemu-= img.c:2456 #7 0x0000aaaabbe2333c in main (argc=3D7, argv=3D) at qemu= -img.c:4975 Reproduced w/ latest QEMU git (@ 53744e0a182) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions