qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] ppc: Fix 'info pic' crash
@ 2019-10-24 14:27 Greg Kurz
  2019-10-24 14:27 ` [PATCH 1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip Greg Kurz
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Greg Kurz @ 2019-10-24 14:27 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Cédric Le Goater, qemu-devel

The interrupt presenters are currently parented to their associated
VCPU, and we rely on CPU_FOREACH() when we need to perform a specific
task with them. Like exposing their state with 'info pic', or finding
the target VCPU for an interrupt when using the XIVE controller.

We recently realized that the latter could crash QEMU because CPU_FOREACH()
can race with CPU hotplug. This got fixed by checking the presenter pointer
under the CPU was set (commit 627fa61746f7), but I'm not that sure that
this is enough since the presenter pointers also get stale at some point
during CPU unplug. And we still have other users of CPU_FOREACH(), namely
'info pic' with both XICS and XIVE, that have the very same problem:

With XIVE:

Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
0x00000001003d2848 in xive_tctx_pic_print_info (tctx=0x101ae5280, 
    mon=0x7fffffffe180) at /home/greg/Work/qemu/qemu-spapr/hw/intc/xive.c:526
526         int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
(gdb) p tctx
$1 = (XiveTCTX *) 0x101ae5280
(gdb) p tctx->cs
$2 = (CPUState *) 0x2057512020203a5d <-- tctx is stale
(gdb) p tctx->cs->cpu_index
Cannot access memory at address 0x205751202020bead

With XICS:

Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
0x00000001003cc39c in icp_pic_print_info (icp=0x10244ccf0, mon=0x7fffffffe940)
    at /home/greg/Work/qemu/qemu-spapr/hw/intc/xics.c:47
47          int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
(gdb) p icp
$1 = (ICPState *) 0x10244ccf0
(gdb) p icp->cs
$2 = (CPUState *) 0x524958203220 <-- icp is stale
(gdb) p icp->cs->cpu_index
Cannot access memory at address 0x52495820b670

It may be worth finding a way to address this globally instead of
open-coding the check of the presenter pointer everywhere because
this is fragile. I gave a try with this series:

	[0/6] ppc: Reparent the interrupt presenter

	https://patchwork.ozlabs.org/cover/1182224/

but it requires some more reflexion. Also, we're about to enter
softfreeze, and it seems better to come up with a simpler fix.

Let's forget the reparenting and check the presenter pointers
where needed instead. Patch 1 from the previous series was changed
to also NULLify presenter pointers, so that they can be used to
filter out unwanted vCPUs in patch 3. I've kept patch 2 because
it's a fix in the same area, but it isn't related to the QEMU
crashes.

--
Greg

---

Greg Kurz (3):
      ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip
      xive, xics: Fix reference counting on CPU objects
      ppc: Skip partially initialized vCPUs in 'info pic'


 hw/intc/spapr_xive.c       |   10 ++++++++++
 hw/intc/xics.c             |   22 +++++++++++++++++++++-
 hw/intc/xics_spapr.c       |   10 ++++++++++
 hw/intc/xive.c             |   20 +++++++++++++++++++-
 hw/ppc/pnv.c               |   21 +++++++++++++++++++++
 hw/ppc/pnv_core.c          |    7 ++++---
 hw/ppc/spapr_cpu_core.c    |    7 +------
 hw/ppc/spapr_irq.c         |   14 ++++++++++++++
 include/hw/ppc/pnv.h       |    1 +
 include/hw/ppc/spapr_irq.h |    2 ++
 include/hw/ppc/xics.h      |    1 +
 include/hw/ppc/xive.h      |    1 +
 12 files changed, 105 insertions(+), 11 deletions(-)



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip
  2019-10-24 14:27 [PATCH 0/3] ppc: Fix 'info pic' crash Greg Kurz
@ 2019-10-24 14:27 ` Greg Kurz
  2019-10-24 14:27 ` [PATCH 2/3] xive, xics: Fix reference counting on CPU objects Greg Kurz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Greg Kurz @ 2019-10-24 14:27 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Cédric Le Goater, qemu-devel

SpaprInterruptControllerClass and PnvChipClass have an intc_create() method
that calls the appropriate routine, ie. icp_create() or xive_tctx_create(),
to establish the link between the VCPU and the presenter component of the
interrupt controller during realize.

There aren't any symmetrical call to be called when the VCPU gets unrealized
though. It is assumed that object_unparent() is the only thing to do.

This is questionable because the parenting logic around the CPU and
presenter objects is really an implementation detail of the interrupt
controller. It shouldn't be open-coded in the machine code.

Fix this by adding an intc_destroy() method that undoes what was done in
intc_create(). Also NULLify the presenter pointers to avoid having
stale pointers around. This will allow to reliably check if a vCPU has
a valid presenter.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/intc/spapr_xive.c       |   10 ++++++++++
 hw/intc/xics.c             |    5 +++++
 hw/intc/xics_spapr.c       |   10 ++++++++++
 hw/intc/xive.c             |    5 +++++
 hw/ppc/pnv.c               |   21 +++++++++++++++++++++
 hw/ppc/pnv_core.c          |    7 ++++---
 hw/ppc/spapr_cpu_core.c    |    7 +------
 hw/ppc/spapr_irq.c         |   14 ++++++++++++++
 include/hw/ppc/pnv.h       |    1 +
 include/hw/ppc/spapr_irq.h |    2 ++
 include/hw/ppc/xics.h      |    1 +
 include/hw/ppc/xive.h      |    1 +
 12 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index d8e1291905c3..9cb8d38a3bab 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -555,6 +555,15 @@ static void spapr_xive_cpu_intc_reset(SpaprInterruptController *intc,
     xive_tctx_set_os_cam(tctx, xive_nvt_cam_line(nvt_blk, nvt_idx));
 }
 
+static void spapr_xive_cpu_intc_destroy(SpaprInterruptController *intc,
+                                        PowerPCCPU *cpu)
+{
+    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+    xive_tctx_destroy(spapr_cpu->tctx);
+    spapr_cpu->tctx = NULL;
+}
+
 static void spapr_xive_set_irq(SpaprInterruptController *intc, int irq, int val)
 {
     SpaprXive *xive = SPAPR_XIVE(intc);
@@ -692,6 +701,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     sicc->deactivate = spapr_xive_deactivate;
     sicc->cpu_intc_create = spapr_xive_cpu_intc_create;
     sicc->cpu_intc_reset = spapr_xive_cpu_intc_reset;
+    sicc->cpu_intc_destroy = spapr_xive_cpu_intc_destroy;
     sicc->claim_irq = spapr_xive_claim_irq;
     sicc->free_irq = spapr_xive_free_irq;
     sicc->set_irq = spapr_xive_set_irq;
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 6da05763f9db..935f325749cb 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -401,6 +401,11 @@ Object *icp_create(Object *cpu, const char *type, XICSFabric *xi, Error **errp)
     return obj;
 }
 
+void icp_destroy(ICPState *icp)
+{
+    object_unparent(OBJECT(icp));
+}
+
 /*
  * ICS: Source layer
  */
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 7418fb9f370c..b3705dab0e8a 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -352,6 +352,15 @@ static void xics_spapr_cpu_intc_reset(SpaprInterruptController *intc,
     icp_reset(spapr_cpu_state(cpu)->icp);
 }
 
+static void xics_spapr_cpu_intc_destroy(SpaprInterruptController *intc,
+                                        PowerPCCPU *cpu)
+{
+    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+    icp_destroy(spapr_cpu->icp);
+    spapr_cpu->icp = NULL;
+}
+
 static int xics_spapr_claim_irq(SpaprInterruptController *intc, int irq,
                                 bool lsi, Error **errp)
 {
@@ -440,6 +449,7 @@ static void ics_spapr_class_init(ObjectClass *klass, void *data)
     sicc->deactivate = xics_spapr_deactivate;
     sicc->cpu_intc_create = xics_spapr_cpu_intc_create;
     sicc->cpu_intc_reset = xics_spapr_cpu_intc_reset;
+    sicc->cpu_intc_destroy = xics_spapr_cpu_intc_destroy;
     sicc->claim_irq = xics_spapr_claim_irq;
     sicc->free_irq = xics_spapr_free_irq;
     sicc->set_irq = xics_spapr_set_irq;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index f066be5eb5e3..38257aa02083 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -696,6 +696,11 @@ error:
     return NULL;
 }
 
+void xive_tctx_destroy(XiveTCTX *tctx)
+{
+    object_unparent(OBJECT(tctx));
+}
+
 /*
  * XIVE ESB helpers
  */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 4a51fb65a834..68cc3c81aa75 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -778,6 +778,7 @@ static void pnv_chip_power8_intc_create(PnvChip *chip, PowerPCCPU *cpu,
     pnv_cpu->intc = obj;
 }
 
+
 static void pnv_chip_power8_intc_reset(PnvChip *chip, PowerPCCPU *cpu)
 {
     PnvCPUState *pnv_cpu = pnv_cpu_state(cpu);
@@ -785,6 +786,14 @@ static void pnv_chip_power8_intc_reset(PnvChip *chip, PowerPCCPU *cpu)
     icp_reset(ICP(pnv_cpu->intc));
 }
 
+static void pnv_chip_power8_intc_destroy(PnvChip *chip, PowerPCCPU *cpu)
+{
+    PnvCPUState *pnv_cpu = pnv_cpu_state(cpu);
+
+    icp_destroy(ICP(pnv_cpu->intc));
+    pnv_cpu->intc = NULL;
+}
+
 /*
  *    0:48  Reserved - Read as zeroes
  *   49:52  Node ID
@@ -829,6 +838,14 @@ static void pnv_chip_power9_intc_reset(PnvChip *chip, PowerPCCPU *cpu)
     xive_tctx_reset(XIVE_TCTX(pnv_cpu->intc));
 }
 
+static void pnv_chip_power9_intc_destroy(PnvChip *chip, PowerPCCPU *cpu)
+{
+    PnvCPUState *pnv_cpu = pnv_cpu_state(cpu);
+
+    xive_tctx_destroy(XIVE_TCTX(pnv_cpu->intc));
+    pnv_cpu->intc = NULL;
+}
+
 /*
  * Allowed core identifiers on a POWER8 Processor Chip :
  *
@@ -999,6 +1016,7 @@ static void pnv_chip_power8e_class_init(ObjectClass *klass, void *data)
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
+    k->intc_destroy = pnv_chip_power8_intc_destroy;
     k->isa_create = pnv_chip_power8_isa_create;
     k->dt_populate = pnv_chip_power8_dt_populate;
     k->pic_print_info = pnv_chip_power8_pic_print_info;
@@ -1019,6 +1037,7 @@ static void pnv_chip_power8_class_init(ObjectClass *klass, void *data)
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
+    k->intc_destroy = pnv_chip_power8_intc_destroy;
     k->isa_create = pnv_chip_power8_isa_create;
     k->dt_populate = pnv_chip_power8_dt_populate;
     k->pic_print_info = pnv_chip_power8_pic_print_info;
@@ -1039,6 +1058,7 @@ static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
     k->core_pir = pnv_chip_core_pir_p8;
     k->intc_create = pnv_chip_power8_intc_create;
     k->intc_reset = pnv_chip_power8_intc_reset;
+    k->intc_destroy = pnv_chip_power8_intc_destroy;
     k->isa_create = pnv_chip_power8nvl_isa_create;
     k->dt_populate = pnv_chip_power8_dt_populate;
     k->pic_print_info = pnv_chip_power8_pic_print_info;
@@ -1209,6 +1229,7 @@ static void pnv_chip_power9_class_init(ObjectClass *klass, void *data)
     k->core_pir = pnv_chip_core_pir_p9;
     k->intc_create = pnv_chip_power9_intc_create;
     k->intc_reset = pnv_chip_power9_intc_reset;
+    k->intc_destroy = pnv_chip_power9_intc_destroy;
     k->isa_create = pnv_chip_power9_isa_create;
     k->dt_populate = pnv_chip_power9_dt_populate;
     k->pic_print_info = pnv_chip_power9_pic_print_info;
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index e81cd3a3e047..61b3d3ce2250 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -269,11 +269,12 @@ err:
     error_propagate(errp, local_err);
 }
 
-static void pnv_core_cpu_unrealize(PowerPCCPU *cpu)
+static void pnv_core_cpu_unrealize(PowerPCCPU *cpu, PnvChip *chip)
 {
     PnvCPUState *pnv_cpu = pnv_cpu_state(cpu);
+    PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
 
-    object_unparent(OBJECT(pnv_cpu_state(cpu)->intc));
+    pcc->intc_destroy(chip, cpu);
     cpu_remove_sync(CPU(cpu));
     cpu->machine_data = NULL;
     g_free(pnv_cpu);
@@ -289,7 +290,7 @@ static void pnv_core_unrealize(DeviceState *dev, Error **errp)
     qemu_unregister_reset(pnv_core_reset, pc);
 
     for (i = 0; i < cc->nr_threads; i++) {
-        pnv_core_cpu_unrealize(pc->threads[i]);
+        pnv_core_cpu_unrealize(pc->threads[i], pc->chip);
     }
     g_free(pc->threads);
 }
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ef7b27a66d56..8339c4c0f86b 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -195,12 +195,7 @@ static void spapr_unrealize_vcpu(PowerPCCPU *cpu, SpaprCpuCore *sc)
     if (!sc->pre_3_0_migration) {
         vmstate_unregister(NULL, &vmstate_spapr_cpu_state, cpu->machine_data);
     }
-    if (spapr_cpu_state(cpu)->icp) {
-        object_unparent(OBJECT(spapr_cpu_state(cpu)->icp));
-    }
-    if (spapr_cpu_state(cpu)->tctx) {
-        object_unparent(OBJECT(spapr_cpu_state(cpu)->tctx));
-    }
+    spapr_irq_cpu_intc_destroy(SPAPR_MACHINE(qdev_get_machine()), cpu);
     cpu_remove_sync(CPU(cpu));
     object_unparent(OBJECT(cpu));
 }
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index b941608b69ba..168044be853a 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -234,6 +234,20 @@ void spapr_irq_cpu_intc_reset(SpaprMachineState *spapr, PowerPCCPU *cpu)
     }
 }
 
+void spapr_irq_cpu_intc_destroy(SpaprMachineState *spapr, PowerPCCPU *cpu)
+{
+    SpaprInterruptController *intcs[] = ALL_INTCS(spapr);
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(intcs); i++) {
+        SpaprInterruptController *intc = intcs[i];
+        if (intc) {
+            SpaprInterruptControllerClass *sicc = SPAPR_INTC_GET_CLASS(intc);
+            sicc->cpu_intc_destroy(intc, cpu);
+        }
+    }
+}
+
 static void spapr_set_irq(void *opaque, int irq, int level)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(opaque);
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 2a780e633f23..0b4c722e6b48 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -112,6 +112,7 @@ typedef struct PnvChipClass {
     uint32_t (*core_pir)(PnvChip *chip, uint32_t core_id);
     void (*intc_create)(PnvChip *chip, PowerPCCPU *cpu, Error **errp);
     void (*intc_reset)(PnvChip *chip, PowerPCCPU *cpu);
+    void (*intc_destroy)(PnvChip *chip, PowerPCCPU *cpu);
     ISABus *(*isa_create)(PnvChip *chip, Error **errp);
     void (*dt_populate)(PnvChip *chip, void *fdt);
     void (*pic_print_info)(PnvChip *chip, Monitor *mon);
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 09232999b07e..ff814d13de37 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -53,6 +53,7 @@ typedef struct SpaprInterruptControllerClass {
     int (*cpu_intc_create)(SpaprInterruptController *intc,
                             PowerPCCPU *cpu, Error **errp);
     void (*cpu_intc_reset)(SpaprInterruptController *intc, PowerPCCPU *cpu);
+    void (*cpu_intc_destroy)(SpaprInterruptController *intc, PowerPCCPU *cpu);
     int (*claim_irq)(SpaprInterruptController *intc, int irq, bool lsi,
                      Error **errp);
     void (*free_irq)(SpaprInterruptController *intc, int irq);
@@ -70,6 +71,7 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr);
 int spapr_irq_cpu_intc_create(SpaprMachineState *spapr,
                               PowerPCCPU *cpu, Error **errp);
 void spapr_irq_cpu_intc_reset(SpaprMachineState *spapr, PowerPCCPU *cpu);
+void spapr_irq_cpu_intc_destroy(SpaprMachineState *spapr, PowerPCCPU *cpu);
 void spapr_irq_print_info(SpaprMachineState *spapr, Monitor *mon);
 void spapr_irq_dt(SpaprMachineState *spapr, uint32_t nr_servers,
                   void *fdt, uint32_t phandle);
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 602173c12250..48a75aa4ab75 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -181,6 +181,7 @@ void icp_resend(ICPState *ss);
 
 Object *icp_create(Object *cpu, const char *type, XICSFabric *xi,
                    Error **errp);
+void icp_destroy(ICPState *icp);
 
 /* KVM */
 void icp_get_kvm_state(ICPState *icp);
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 99381639f50c..8fd439ec9bba 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -416,6 +416,7 @@ uint64_t xive_tctx_tm_read(XiveTCTX *tctx, hwaddr offset, unsigned size);
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
 Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
 void xive_tctx_reset(XiveTCTX *tctx);
+void xive_tctx_destroy(XiveTCTX *tctx);
 
 static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 {



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] xive, xics: Fix reference counting on CPU objects
  2019-10-24 14:27 [PATCH 0/3] ppc: Fix 'info pic' crash Greg Kurz
  2019-10-24 14:27 ` [PATCH 1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip Greg Kurz
@ 2019-10-24 14:27 ` Greg Kurz
  2019-10-24 14:27 ` [PATCH 3/3] ppc: Skip partially initialized vCPUs in 'info pic' Greg Kurz
  2019-10-27 17:10 ` [PATCH 0/3] ppc: Fix 'info pic' crash David Gibson
  3 siblings, 0 replies; 5+ messages in thread
From: Greg Kurz @ 2019-10-24 14:27 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Cédric Le Goater, qemu-devel

When a VCPU gets connected to the XIVE interrupt controller, we add a
const link targetting the CPU object to the TCTX object. Similar links
are added to the ICP object when using the XICS interrupt controller.

As explained in <qom/object.h>:

 * The caller must ensure that @target stays alive as long as
 * this property exists.  In the case @target is a child of @obj,
 * this will be the case.  Otherwise, the caller is responsible for
 * taking a reference.

We're in the latter case for both XICS and XIVE. Add the missing
calls to object_ref() and object_unref().

This doesn't fix any known issue because the life cycle of the TCTX or
ICP happens to be shorter than the one of the CPU or XICS fabric, but
better safe than sorry.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/intc/xics.c |    8 +++++++-
 hw/intc/xive.c |    6 +++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 935f325749cb..5f746079be46 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -388,8 +388,10 @@ Object *icp_create(Object *cpu, const char *type, XICSFabric *xi, Error **errp)
     obj = object_new(type);
     object_property_add_child(cpu, type, obj, &error_abort);
     object_unref(obj);
+    object_ref(OBJECT(xi));
     object_property_add_const_link(obj, ICP_PROP_XICS, OBJECT(xi),
                                    &error_abort);
+    object_ref(cpu);
     object_property_add_const_link(obj, ICP_PROP_CPU, cpu, &error_abort);
     object_property_set_bool(obj, true, "realized", &local_err);
     if (local_err) {
@@ -403,7 +405,11 @@ Object *icp_create(Object *cpu, const char *type, XICSFabric *xi, Error **errp)
 
 void icp_destroy(ICPState *icp)
 {
-    object_unparent(OBJECT(icp));
+    Object *obj = OBJECT(icp);
+
+    object_unref(object_property_get_link(obj, ICP_PROP_CPU, &error_abort));
+    object_unref(object_property_get_link(obj, ICP_PROP_XICS, &error_abort));
+    object_unparent(obj);
 }
 
 /*
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 38257aa02083..952a461d5329 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -682,6 +682,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
     obj = object_new(TYPE_XIVE_TCTX);
     object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
     object_unref(obj);
+    object_ref(cpu);
     object_property_add_const_link(obj, "cpu", cpu, &error_abort);
     object_property_set_bool(obj, true, "realized", &local_err);
     if (local_err) {
@@ -698,7 +699,10 @@ error:
 
 void xive_tctx_destroy(XiveTCTX *tctx)
 {
-    object_unparent(OBJECT(tctx));
+    Object *obj = OBJECT(tctx);
+
+    object_unref(object_property_get_link(obj, "cpu", &error_abort));
+    object_unparent(obj);
 }
 
 /*



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] ppc: Skip partially initialized vCPUs in 'info pic'
  2019-10-24 14:27 [PATCH 0/3] ppc: Fix 'info pic' crash Greg Kurz
  2019-10-24 14:27 ` [PATCH 1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip Greg Kurz
  2019-10-24 14:27 ` [PATCH 2/3] xive, xics: Fix reference counting on CPU objects Greg Kurz
@ 2019-10-24 14:27 ` Greg Kurz
  2019-10-27 17:10 ` [PATCH 0/3] ppc: Fix 'info pic' crash David Gibson
  3 siblings, 0 replies; 5+ messages in thread
From: Greg Kurz @ 2019-10-24 14:27 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Cédric Le Goater, qemu-devel

CPU_FOREACH() can race with vCPU hotplug/unplug on sPAPR machines, ie.
we may try to print out info about a vCPU with a NULL presenter pointer.

Check that in order to prevent QEMU from crashing.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/intc/xics.c |   11 ++++++++++-
 hw/intc/xive.c |   11 ++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 5f746079be46..e7ac9ba618fa 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -44,7 +44,16 @@
 
 void icp_pic_print_info(ICPState *icp, Monitor *mon)
 {
-    int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
+    int cpu_index;
+
+    /* Skip partially initialized vCPUs. This can happen on sPAPR when vCPUs
+     * are hot plugged or unplugged.
+     */
+    if (!icp) {
+        return;
+    }
+
+    cpu_index = icp->cs ? icp->cs->cpu_index : -1;
 
     if (!icp->output) {
         return;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 952a461d5329..75dce82fb205 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -523,9 +523,18 @@ static const char * const xive_tctx_ring_names[] = {
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
 {
-    int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
+    int cpu_index;
     int i;
 
+    /* Skip partially initialized vCPUs. This can happen on sPAPR when vCPUs
+     * are hot plugged or unplugged.
+     */
+    if (!tctx) {
+        return;
+    }
+
+    cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
+
     if (kvm_irqchip_in_kernel()) {
         Error *local_err = NULL;
 



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/3] ppc: Fix 'info pic' crash
  2019-10-24 14:27 [PATCH 0/3] ppc: Fix 'info pic' crash Greg Kurz
                   ` (2 preceding siblings ...)
  2019-10-24 14:27 ` [PATCH 3/3] ppc: Skip partially initialized vCPUs in 'info pic' Greg Kurz
@ 2019-10-27 17:10 ` David Gibson
  3 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2019-10-27 17:10 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, Cédric Le Goater, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2855 bytes --]

On Thu, Oct 24, 2019 at 04:27:16PM +0200, Greg Kurz wrote:
> The interrupt presenters are currently parented to their associated
> VCPU, and we rely on CPU_FOREACH() when we need to perform a specific
> task with them. Like exposing their state with 'info pic', or finding
> the target VCPU for an interrupt when using the XIVE controller.
> 
> We recently realized that the latter could crash QEMU because CPU_FOREACH()
> can race with CPU hotplug. This got fixed by checking the presenter pointer
> under the CPU was set (commit 627fa61746f7), but I'm not that sure that
> this is enough since the presenter pointers also get stale at some point
> during CPU unplug. And we still have other users of CPU_FOREACH(), namely
> 'info pic' with both XICS and XIVE, that have the very same problem:
> 
> With XIVE:
> 
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x00000001003d2848 in xive_tctx_pic_print_info (tctx=0x101ae5280, 
>     mon=0x7fffffffe180) at /home/greg/Work/qemu/qemu-spapr/hw/intc/xive.c:526
> 526         int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
> (gdb) p tctx
> $1 = (XiveTCTX *) 0x101ae5280
> (gdb) p tctx->cs
> $2 = (CPUState *) 0x2057512020203a5d <-- tctx is stale
> (gdb) p tctx->cs->cpu_index
> Cannot access memory at address 0x205751202020bead
> 
> With XICS:
> 
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x00000001003cc39c in icp_pic_print_info (icp=0x10244ccf0, mon=0x7fffffffe940)
>     at /home/greg/Work/qemu/qemu-spapr/hw/intc/xics.c:47
> 47          int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
> (gdb) p icp
> $1 = (ICPState *) 0x10244ccf0
> (gdb) p icp->cs
> $2 = (CPUState *) 0x524958203220 <-- icp is stale
> (gdb) p icp->cs->cpu_index
> Cannot access memory at address 0x52495820b670
> 
> It may be worth finding a way to address this globally instead of
> open-coding the check of the presenter pointer everywhere because
> this is fragile. I gave a try with this series:
> 
> 	[0/6] ppc: Reparent the interrupt presenter
> 
> 	https://patchwork.ozlabs.org/cover/1182224/
> 
> but it requires some more reflexion. Also, we're about to enter
> softfreeze, and it seems better to come up with a simpler fix.
> 
> Let's forget the reparenting and check the presenter pointers
> where needed instead. Patch 1 from the previous series was changed
> to also NULLify presenter pointers, so that they can be used to
> filter out unwanted vCPUs in patch 3. I've kept patch 2 because
> it's a fix in the same area, but it isn't related to the QEMU
> crashes.

Applied to ppc-for-4.2, thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-27 17:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 14:27 [PATCH 0/3] ppc: Fix 'info pic' crash Greg Kurz
2019-10-24 14:27 ` [PATCH 1/3] ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip Greg Kurz
2019-10-24 14:27 ` [PATCH 2/3] xive, xics: Fix reference counting on CPU objects Greg Kurz
2019-10-24 14:27 ` [PATCH 3/3] ppc: Skip partially initialized vCPUs in 'info pic' Greg Kurz
2019-10-27 17:10 ` [PATCH 0/3] ppc: Fix 'info pic' crash David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).