From: David Gibson <david@gibson.dropbear.id.au>
To: peter.maydell@linaro.org
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, lvivier@redhat.com,
	groug@kaod.org, clg@kaod.org, mark.cave-ayland@ilande.co.uk,
	Jose Ricardo Ziviani <joserz@linux.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [PULL 12/14] Fix a deadlock case in the CPU hotplug flow
Date: Fri,  7 Sep 2018 17:31:53 +1000
Message-ID: <20180907073155.26200-13-david@gibson.dropbear.id.au>
In-Reply-To: <20180907073155.26200-1-david@gibson.dropbear.id.au>

From: Jose Ricardo Ziviani <joserz@linux.ibm.com>

We need to set cs->halted to 1 before calling ppc_set_compat. The reason
is that ppc_set_compat kicks the new thread created to manage the
hotplugged KVM virtual CPU, and that thread heads straight for the
KVM_RUN ioctl. When cs->halted is 1, the code:

int kvm_cpu_exec(CPUState *cpu)
...
     if (kvm_arch_process_async_events(cpu)) {
         atomic_set(&cpu->exit_request, 0);
         return EXCP_HLT;
     }
...

returns before it reaches KVM_RUN, giving the main thread time to
finish its job. Otherwise we can end up in a deadlock, because the KVM
thread issues the KVM_RUN ioctl while the main thread is still setting
up KVM registers. Depending on how these jobs are scheduled, QEMU
freezes.
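
The early return described above can be modelled in a few lines of C.
This is only a sketch under stated assumptions: model_cpu,
model_cpu_exec, model_process_async_events and the MODEL_* constants are
hypothetical stand-ins, not QEMU definitions, and the real kvm_cpu_exec
does considerably more. It shows only that a CPU marked halted returns
before the step that would enter the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes for this model; not QEMU's values. */
enum { MODEL_EXCP_HLT = 1, MODEL_ENTERED_RUN = 2 };

/* Hypothetical stand-in for the few CPUState fields used here. */
struct model_cpu {
    int halted;        /* mirrors cs->halted */
    int exit_request;  /* mirrors cpu->exit_request */
    bool entered_run;  /* set where the model would have issued KVM_RUN */
};

/* Assumption for this model: a halted CPU reports "do not run". */
static bool model_process_async_events(const struct model_cpu *cpu)
{
    return cpu->halted != 0;
}

/* Returns early for a halted CPU, before the "KVM_RUN" step. */
static int model_cpu_exec(struct model_cpu *cpu)
{
    assert(cpu != NULL);

    if (model_process_async_events(cpu)) {
        cpu->exit_request = 0;
        return MODEL_EXCP_HLT;
    }

    /* The real code would issue the KVM_RUN ioctl at this point. */
    cpu->entered_run = true;
    return MODEL_ENTERED_RUN;
}
```

With halted set to 1 before the thread is kicked, model_cpu_exec never
reaches the run step; with halted left at 0 it does, which corresponds
to the window in which the two threads can collide.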

The following output shows kvm_vcpu_ioctl sleeping because it cannot get
the mutex, and it never will.
Note: this kvm_vcpu_ioctl was triggered by kvm_set_one_reg (compat_pvr).

STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL

PID: 61564  TASK: c000003e981e0780  CPU: 48  COMMAND: "qemu-system-ppc"
 #0 [c000003e982679a0] __schedule at c000000000b10a44
 #1 [c000003e98267a60] schedule at c000000000b113a8
 #2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910
 #3 [c000003e98267ab0] __mutex_lock at c000000000b132ec
 #4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm]
 #5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30
 #6 [c000003e98267dc0] ksys_ioctl at c000000000408674
 #7 [c000003e98267e10] sys_ioctl at c0000000004086f8
 #8 [c000003e98267e30] system_call at c00000000000b488

crash> struct -x kvm.vcpus 0xc000003da0000000
vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...}

crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000
  mutex.owner = {
    counter = 0xc000003a23a5c881 <- flag 1: waiters
  },

crash> bt 0xc000003a23a5c880
PID: 61579  TASK: c000003a23a5c880  CPU: 9   COMMAND: "CPU 4/KVM"
(active)

crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000
  mutex.wait_list = {
    next = 0xc000003e98267b10,
    prev = 0xc000003e98267b10
  },

crash> struct -x mutex_waiter.task 0xc000003e98267b10
  task = 0xc000003e981e0780
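
For readers following the crash session: the task address passed to bt
is the mutex owner word with its low flag bits cleared. A minimal
sketch of that arithmetic, assuming the owner word keeps its flags in
the low three bits with bit 0 meaning "waiters" (as in Linux kernels of
this period); the MODEL_* names and helper functions are hypothetical,
not kernel symbols:

```c
#include <stdint.h>

/* Assumed layout: low three bits of the owner word are flags. */
#define MODEL_MUTEX_FLAG_WAITERS 0x01ULL
#define MODEL_MUTEX_FLAGS        0x07ULL

/* Address of the owning task: owner word with the flag bits cleared. */
static uint64_t model_owner_task(uint64_t owner)
{
    return owner & ~MODEL_MUTEX_FLAGS;
}

/* Flag bits carried in the owner word. */
static uint64_t model_owner_flags(uint64_t owner)
{
    return owner & MODEL_MUTEX_FLAGS;
}
```

Applied to the counter above, 0xc000003a23a5c881 splits into task
0xc000003a23a5c880 (the "CPU 4/KVM" thread) and flags 0x1 (waiters).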

The following command line was used to reproduce the problem (note:
running under gdb or with tracing enabled can change the results).

 $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \
     -enable-kvm -m 4096 \
     -smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \
     -display none -nographic \
     -drive file=disk1.qcow2,format=qcow2
 ...
 (qemu) device_add host-spapr-cpu-core,core-id=4
[No interaction is possible after this; only SIGKILL gets the terminal
back.]

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_cpu_core.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 876f0b3d9d..a73b244a3f 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -34,16 +34,16 @@ static void spapr_cpu_reset(void *opaque)
 
     cpu_reset(cs);
 
-    /* Set compatibility mode to match the boot CPU, which was either set
-     * by the machine reset code or by CAS. This should never fail.
-     */
-    ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
-
     /* All CPUs start halted.  CPU0 is unhalted from the machine level
      * reset code and the rest are explicitly started up by the guest
      * using an RTAS call */
     cs->halted = 1;
 
+    /* Set compatibility mode to match the boot CPU, which was either set
+     * by the machine reset code or by CAS. This should never fail.
+     */
+    ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
+
     env->spr[SPR_HIOR] = 0;
 
     lpcr = env->spr[SPR_LPCR];
-- 
2.17.1
