All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support
@ 2016-10-25  4:47 Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers Michael Roth
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

This series is based on David's ppc-for-2.8 branch, and is also available from:

  https://github.com/mdroth/qemu/commits/spapr-hotplug-event-update

Changes since RFC:
  * Submit as v1 now that PAPR Hotplug ACR is accepted
  * Rebase on latest ppc-for-2.8 (with device-tree refactoring)
  * address Patchew warnings
  * add comments to clarify spapr->ov5/ov5_cas usage. (David)
  * revise comment to clarify intent when setting spapr->ov5
    OV5_HP_EVT bit. (Bharata)
  * drop internal usage of spapr_ovec_from_bitmap() in favor of
    directly assigning bitmap to sPAPROptionVector instances. (David)
  * standardize meaning of 'vector_len' variable through spapr_ovec_*
    functions to be the byte-wise length of option vectors entries,
    and not including the preceeding length byte itself. (David)
  * fix spapr_ovec_populate_dt() to parse up to OV_MAXBITS bits
    rather than OV_MAXBITS - 1. (David)
  * fix spapr_ovec_populate_dt() encode the minimum of 1 option
    vector byte instead of the max of OV_MAXBYTES in cases where
    no option bits are set. (David)
  * add some comments to spapr_ovec_populate_dt() to clarify what
    is being encoded into length byte of ibm,architecture-vec-5
  * switch 'legacy-hotplug-events' option to
    'modern-hotplug-events' (David)
  * modify rtas_event_log_to_source() to check for OV5_HP_EVT
    option rather than relying on whether the hotplug source is
    specifically enabled. Assert the latter in cases where
    OV5_HP_EVT is set. (Bharata)
  * drop global EventSource list in favor of an sPAPREventSource
    list field within sPAPRMachineState (David)
  * add CPU unplug hook in mc->unplug_request (Bharata)


Patches 1-4 address various deficiencies in how we currently handle option
vectors via ibm,client-architecture-support. This is done here in preparation
for a new option vector bit introduced later in this series, as well as a
number of future option vector bits related to other features, but I can
break this out into a separate series if preferred.

Patches 5-7 add support for an updated event format for hotplug events,
which includes a new way to specify a range of DRCs/LMBs to hotplug/unplug
using a starting position and count, which is necessary for memory unplug.
The format for this new event format is still in draft form, but slated
for inclusion in the PAPR/LoPAPR.

Patches 8-10 add support for memory unplug using the new event format.

In addition to kernel 4.8 or later, there are a number of patches required
to enable support on the guest kernel side. I've including the minimum set
of patches in my branch here:

   https://github.com/mdroth/linux/commits/spapr-hotplug-event-update

   *powerpc/pseries: advertise Hot Plug Event support to firmware
   powerpc/pseries: Implement indexed-count hotplug memory remove
   powerpc/pseries: Implement indexed-count hotplug memory add

Note that there is currently an issue that arises when attempting to
offline an LMB that was onlined using a guest kernel's auto-onlining
mechanism, which can prevent full completion of memory unplug requests.
This is being investigated, but for the purposes of testing this can
be worked around currently by disabling auto-onlining in guests via:

  "echo offline >/sys/devices/system/memory/auto_online_blocks"

and instead onlining the blocks manually or via udev.

 docs/specs/ppc-spapr-hotplug.txt |  55 +++++++++++++++++++++-----
 hw/ppc/Makefile.objs             |   2 +-
 hw/ppc/spapr.c                   | 243 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 hw/ppc/spapr_drc.c               |  17 ++++++++
 hw/ppc/spapr_events.c            | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
 hw/ppc/spapr_hcall.c             |  70 ++++++++++++++-------------------
 hw/ppc/spapr_ovec.c              | 242 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h           |  17 ++++++--
 include/hw/ppc/spapr_ovec.h      |  67 ++++++++++++++++++++++++++++++++
 9 files changed, 868 insertions(+), 123 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-25 23:54   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 02/10] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.

Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".

The logical format of these these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Thus the bits of each of
these bytes are stored in reverse order. Additionally, the first
byte of each option vector is encodes the length of the option vector,
so byte offsets begin at 1, and bit offset at 0.

This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
options vectors and testing for individual capabilities.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/Makefile.objs        |   2 +-
 hw/ppc/spapr_ovec.c         | 242 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_ovec.h |  62 ++++++++++++
 3 files changed, 305 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_ovec.c
 create mode 100644 include/hw/ppc/spapr_ovec.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index ebc72af..8025129 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
-obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
+obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
 # IBM PowerNV
 obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
new file mode 100644
index 0000000..c2a0d18
--- /dev/null
+++ b/hw/ppc/spapr_ovec.c
@@ -0,0 +1,242 @@
+/*
+ * QEMU SPAPR Architecture Option Vector Helper Functions
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * Authors:
+ *  Bharata B Rao     <bharata@linux.vnet.ibm.com>
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/ppc/spapr_ovec.h"
+#include "qemu/bitmap.h"
+#include "exec/address-spaces.h"
+#include "qemu/error-report.h"
+#include <libfdt.h>
+
+/* #define DEBUG_SPAPR_OVEC */
+
+#ifdef DEBUG_SPAPR_OVEC
+#define DPRINTFN(fmt, ...) \
+    do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTFN(fmt, ...) \
+    do { } while (0)
+#endif
+
+#define OV_MAXBYTES 256 /* not including length byte */
+#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
+
+/* we *could* work with bitmaps directly, but handling the bitmap privately
+ * allows us to more safely make assumptions about the bitmap size and
+ * simplify the calling code somewhat
+ */
+struct sPAPROptionVector {
+    unsigned long *bitmap;
+};
+
+sPAPROptionVector *spapr_ovec_new(void)
+{
+    sPAPROptionVector *ov;
+
+    ov = g_new0(sPAPROptionVector, 1);
+    ov->bitmap = bitmap_new(OV_MAXBITS);
+
+    return ov;
+}
+
+sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
+{
+    sPAPROptionVector *ov;
+
+    g_assert(ov_orig);
+
+    ov = spapr_ovec_new();
+    bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
+
+    return ov;
+}
+
+void spapr_ovec_intersect(sPAPROptionVector *ov,
+                          sPAPROptionVector *ov1,
+                          sPAPROptionVector *ov2)
+{
+    g_assert(ov);
+    g_assert(ov1);
+    g_assert(ov2);
+
+    bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
+}
+
+/* returns true if options bits were removed, false otherwise */
+bool spapr_ovec_diff(sPAPROptionVector *ov,
+                     sPAPROptionVector *ov_old,
+                     sPAPROptionVector *ov_new)
+{
+    unsigned long *change_mask = bitmap_new(OV_MAXBITS);
+    unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
+    bool bits_were_removed = false;
+
+    g_assert(ov);
+    g_assert(ov_old);
+    g_assert(ov_new);
+
+    bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
+    bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
+    bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
+
+    if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
+        bits_were_removed = true;
+    }
+
+    g_free(change_mask);
+    g_free(removed_bits);
+
+    return bits_were_removed;
+}
+
+void spapr_ovec_cleanup(sPAPROptionVector *ov)
+{
+    if (ov) {
+        g_free(ov->bitmap);
+        g_free(ov);
+    }
+}
+
+void spapr_ovec_set(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    set_bit(bitnr, ov->bitmap);
+}
+
+void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    clear_bit(bitnr, ov->bitmap);
+}
+
+bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    return test_bit(bitnr, ov->bitmap) ? true : false;
+}
+
+static void guest_byte_to_bitmap(uint8_t entry, unsigned long *bitmap,
+                                 long bitmap_offset)
+{
+    int i;
+
+    for (i = 0; i < BITS_PER_BYTE; i++) {
+        if (entry & (1 << (BITS_PER_BYTE - 1 - i))) {
+            bitmap_set(bitmap, bitmap_offset + i, 1);
+        }
+    }
+}
+
+static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
+{
+    uint8_t entry = 0;
+    int i;
+
+    for (i = 0; i < BITS_PER_BYTE; i++) {
+        if (test_bit(bitmap_offset + i, bitmap)) {
+            entry |= (1 << (BITS_PER_BYTE - 1 - i));
+        }
+    }
+
+    return entry;
+}
+
+static target_ulong vector_addr(target_ulong table_addr, int vector)
+{
+    uint16_t vector_count, vector_len;
+    int i;
+
+    vector_count = ldub_phys(&address_space_memory, table_addr) + 1;
+    if (vector > vector_count) {
+        return 0;
+    }
+    table_addr++; /* skip nr option vectors */
+
+    for (i = 0; i < vector - 1; i++) {
+        vector_len = ldub_phys(&address_space_memory, table_addr) + 1;
+        table_addr += vector_len + 1; /* bit-vector + length byte */
+    }
+    return table_addr;
+}
+
+sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector)
+{
+    sPAPROptionVector *ov;
+    target_ulong addr;
+    uint16_t vector_len;
+    int i;
+
+    g_assert(table_addr);
+    g_assert_cmpint(vector, >=, 1); /* vector numbering starts at 1 */
+
+    addr = vector_addr(table_addr, vector);
+    if (!addr) {
+        /* specified vector isn't present */
+        return NULL;
+    }
+
+    vector_len = ldub_phys(&address_space_memory, addr++) + 1;
+    g_assert_cmpint(vector_len, <=, OV_MAXBYTES);
+    ov = spapr_ovec_new();
+
+    for (i = 0; i < vector_len; i++) {
+        uint8_t entry = ldub_phys(&address_space_memory, addr + i);
+        if (entry) {
+            DPRINTFN("read guest vector %2d, byte %3d / %3d: 0x%.2x",
+                     vector, i + 1, vector_len, entry);
+            guest_byte_to_bitmap(entry, ov->bitmap, i * BITS_PER_BYTE);
+        }
+    }
+
+    return ov;
+}
+
+int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
+                           sPAPROptionVector *ov, const char *name)
+{
+    uint8_t vec[OV_MAXBYTES + 1];
+    uint16_t vec_len;
+    unsigned long lastbit;
+    int i;
+
+    g_assert(ov);
+
+    lastbit = find_last_bit(ov->bitmap, OV_MAXBITS);
+    /* if no bits are set, include at least 1 byte of the vector so we can
+     * still encoded this in the device tree while abiding by the same
+     * encoding/sizing expected in ibm,client-architecture-support
+     */
+    vec_len = (lastbit == OV_MAXBITS) ? 1 : lastbit / BITS_PER_BYTE + 1;
+    g_assert_cmpint(vec_len, <=, OV_MAXBYTES);
+    /* guest expects vector len encoded as vec_len - 1, since the length byte
+     * is assumed and not included, and the first byte of the vector
+     * is assumed as well
+     */
+    vec[0] = vec_len - 1;
+
+    for (i = 1; i < vec_len + 1; i++) {
+        vec[i] = guest_byte_from_bitmap(ov->bitmap, (i - 1) * BITS_PER_BYTE);
+        if (vec[i]) {
+            DPRINTFN("encoding guest vector byte %3d / %3d: 0x%.2x",
+                     i, vec_len, vec[i]);
+        }
+    }
+
+    return fdt_setprop(fdt, fdt_offset, name, vec, vec_len);
+}
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
new file mode 100644
index 0000000..fba2d98
--- /dev/null
+++ b/include/hw/ppc/spapr_ovec.h
@@ -0,0 +1,62 @@
+/*
+ * QEMU SPAPR Option/Architecture Vector Definitions
+ *
+ * Each architecture option is organized/documented by the following
+ * in LoPAPR 1.1, Table 244:
+ *
+ *   <vector number>: the bit-vector in which the option is located
+ *   <vector byte>: the byte offset of the vector entry
+ *   <vector bit>: the bit offset within the vector entry
+ *
+ * where each vector entry can be one or more bytes.
+ *
+ * Firmware expects a somewhat literal encoding of this bit-vector
+ * structure, where each entry is stored in little-endian so that the
+ * byte ordering reflects that of the documentation, but where each bit
+ * offset is from "left-to-right" in the traditional representation of
+ * a byte value where the MSB is the left-most bit. Thus, each
+ * individual byte encodes the option bits in reverse order of the
+ * documented bit.
+ *
+ * These definitions/helpers attempt to abstract away this internal
+ * representation so that we can define/set/test for individual option
+ * bits using only the documented values. This is done mainly by relying
+ * on a bitmap to approximate the documented "bit-vector" structure and
+ * handling conversations to-from the internal representation under the
+ * covers.
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)
+#define __HW_SPAPR_OPTION_VECTORS_H__
+
+#include "cpu.h"
+
+typedef struct sPAPROptionVector sPAPROptionVector;
+
+#define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
+
+/* interfaces */
+sPAPROptionVector *spapr_ovec_new(void);
+sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
+void spapr_ovec_intersect(sPAPROptionVector *ov,
+                          sPAPROptionVector *ov1,
+                          sPAPROptionVector *ov2);
+bool spapr_ovec_diff(sPAPROptionVector *ov,
+                     sPAPROptionVector *ov_old,
+                     sPAPROptionVector *ov_new);
+void spapr_ovec_cleanup(sPAPROptionVector *ov);
+void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
+void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
+bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
+sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
+int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
+                           sPAPROptionVector *ov, const char *name);
+
+#endif /* !defined (__HW_SPAPR_OPTION_VECTORS_H__) */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 02/10] spapr_hcall: use spapr_ovec_* interfaces for CAS options
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets Michael Roth
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 10 ++++++--
 hw/ppc/spapr_hcall.c        | 56 ++++++++++++---------------------------------
 include/hw/ppc/spapr.h      |  5 +++-
 include/hw/ppc/spapr_ovec.h |  3 +++
 4 files changed, 30 insertions(+), 44 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 593d437..af5a239 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -657,7 +657,7 @@ out:
 
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update, bool memory_update)
+                                 bool cpu_update)
 {
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
@@ -681,7 +681,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
     }
 
     /* Generate ibm,dynamic-reconfiguration-memory node if required */
-    if (memory_update && smc->dr_lmb_enabled) {
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
+        g_assert(smc->dr_lmb_enabled);
         _FDT((spapr_populate_drconf_memory(spapr, fdt)));
     }
 
@@ -1740,7 +1741,12 @@ static void ppc_spapr_init(MachineState *machine)
                                    DIV_ROUND_UP(max_cpus * smt, smp_threads),
                                    XICS_IRQS_SPAPR, &error_fatal);
 
+    /* Set up containers for ibm,client-set-architecture negotiated options */
+    spapr->ov5 = spapr_ovec_new();
+    spapr->ov5_cas = spapr_ovec_new();
+
     if (smc->dr_lmb_enabled) {
+        spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
         spapr_validate_node_memory(machine, &error_fatal);
     }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index c5e7e8c..f1d081b 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -11,6 +11,7 @@
 #include "trace.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
+#include "hw/ppc/spapr_ovec.h"
 
 struct SPRSyncState {
     int spr;
@@ -880,32 +881,6 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return ret;
 }
 
-/*
- * Return the offset to the requested option vector @vector in the
- * option vector table @table.
- */
-static target_ulong cas_get_option_vector(int vector, target_ulong table)
-{
-    int i;
-    char nr_vectors, nr_entries;
-
-    if (!table) {
-        return 0;
-    }
-
-    nr_vectors = (ldl_phys(&address_space_memory, table) >> 24) + 1;
-    if (!vector || vector > nr_vectors) {
-        return 0;
-    }
-    table++; /* skip nr option vectors */
-
-    for (i = 0; i < vector - 1; i++) {
-        nr_entries = ldl_phys(&address_space_memory, table) >> 24;
-        table += nr_entries + 2;
-    }
-    return table;
-}
-
 typedef struct {
     uint32_t cpu_version;
     Error *err;
@@ -961,23 +936,21 @@ static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, uint32_t pvr,
     }
 }
 
-#define OV5_DRCONF_MEMORY 0x20
-
 static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
                                                   sPAPRMachineState *spapr,
                                                   target_ulong opcode,
                                                   target_ulong *args)
 {
     target_ulong list = ppc64_phys_to_real(args[0]);
-    target_ulong ov_table, ov5;
+    target_ulong ov_table;
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu_);
     CPUState *cs;
-    bool cpu_match = false, cpu_update = true, memory_update = false;
+    bool cpu_match = false, cpu_update = true;
     unsigned old_cpu_version = cpu_->cpu_version;
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
-    char ov5_byte2;
+    sPAPROptionVector *ov5_guest;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -1033,19 +1006,20 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     /* For the future use: here @ov_table points to the first option vector */
     ov_table = list;
 
-    ov5 = cas_get_option_vector(5, ov_table);
-    if (!ov5) {
-        return H_SUCCESS;
-    }
+    ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
 
-    /* @list now points to OV 5 */
-    ov5_byte2 = ldub_phys(&address_space_memory, ov5 + 2);
-    if (ov5_byte2 & OV5_DRCONF_MEMORY) {
-        memory_update = true;
-    }
+    /* NOTE: there are actually a number of ov5 bits where input from the
+     * guest is always zero, and the platform/QEMU enables them independently
+     * of guest input. To model these properly we'd want some sort of mask,
+     * but since they only currently apply to memory migration as defined
+     * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
+     * to worry about this.
+     */
+    spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
+    spapr_ovec_cleanup(ov5_guest);
 
     if (spapr_h_cas_compose_response(spapr, args[1], args[2],
-                                     cpu_update, memory_update)) {
+                                     cpu_update)) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d5d6e57..a37eee8 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -6,6 +6,7 @@
 #include "hw/ppc/xics.h"
 #include "hw/ppc/spapr_drc.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/ppc/spapr_ovec.h"
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
@@ -72,6 +73,8 @@ struct sPAPRMachineState {
     uint64_t rtc_offset; /* Now used only during incoming migration */
     struct PPCTimebase tb;
     bool has_graphics;
+    sPAPROptionVector *ov5;         /* QEMU-supported option vectors */
+    sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
@@ -583,7 +586,7 @@ void spapr_events_init(sPAPRMachineState *sm);
 void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update, bool memory_update);
+                                 bool cpu_update);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index fba2d98..09afd59 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -42,6 +42,9 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 
 #define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
 
+/* option vector 5 */
+#define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
+
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
 sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 02/10] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-25 23:56   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling Michael Roth
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_dt_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_dt_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call the
   spapr_dt_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from previous call
   being accounted for in the initial boot-time device tree.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c         | 40 ++++++++++++++++++++++++++++++++++------
 hw/ppc/spapr_hcall.c   | 22 ++++++++++++++++++----
 include/hw/ppc/spapr.h |  4 +++-
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index af5a239..3b64580 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -655,13 +655,28 @@ out:
     return ret;
 }
 
+static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
+                                sPAPROptionVector *ov5_updates)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    int ret = 0;
+
+    /* Generate ibm,dynamic-reconfiguration-memory node if required */
+    if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
+        g_assert(smc->dr_lmb_enabled);
+        ret = spapr_populate_drconf_memory(spapr, fdt);
+    }
+
+    return ret;
+}
+
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update)
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates)
 {
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
 
     size -= sizeof(hdr);
 
@@ -680,10 +695,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
         _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
     }
 
-    /* Generate ibm,dynamic-reconfiguration-memory node if required */
-    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
-        g_assert(smc->dr_lmb_enabled);
-        _FDT((spapr_populate_drconf_memory(spapr, fdt)));
+    if (spapr_dt_cas_updates(spapr, fdt, ov5_updates)) {
+        return -1;
     }
 
     /* Pack resulting tree */
@@ -972,6 +985,13 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
         _FDT((fdt_add_mem_rsv(fdt, spapr->initrd_base, spapr->initrd_size)));
     }
 
+    /* ibm,client-architecture-support updates */
+    ret = spapr_dt_cas_updates(spapr, fdt, spapr->ov5_cas);
+    if (ret < 0) {
+        error_report("couldn't setup CAS properties fdt");
+        exit(1);
+    }
+
     return fdt;
 }
 
@@ -1137,6 +1157,13 @@ static void ppc_spapr_reset(void)
     rtas_addr = rtas_limit - RTAS_MAX_SIZE;
     fdt_addr = rtas_addr - FDT_MAX_SIZE;
 
+    /* if this reset wasn't generated by CAS, we should reset our
+     * negotiated options and start from scratch */
+    if (!spapr->cas_reboot) {
+        spapr_ovec_cleanup(spapr->ov5_cas);
+        spapr->ov5_cas = spapr_ovec_new();
+    }
+
     fdt = spapr_build_fdt(spapr, rtas_addr, spapr->rtas_size);
 
     spapr_load_rtas(spapr, fdt, rtas_addr);
@@ -1164,6 +1191,7 @@ static void ppc_spapr_reset(void)
     first_cpu->halted = 0;
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
 
+    spapr->cas_reboot = false;
 }
 
 static void spapr_create_nvram(sPAPRMachineState *spapr)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f1d081b..d277813 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -950,7 +950,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
-    sPAPROptionVector *ov5_guest;
+    sPAPROptionVector *ov5_guest, *ov5_cas_old, *ov5_updates;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -1013,13 +1013,27 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
      * of guest input. To model these properly we'd want some sort of mask,
      * but since they only currently apply to memory migration as defined
      * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
-     * to worry about this.
+     * to worry about this for now.
      */
+    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
+    /* full range of negotiated ov5 capabilities */
     spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
     spapr_ovec_cleanup(ov5_guest);
+    /* capabilities that have been added since CAS-generated guest reset.
+     * if capabilities have since been removed, generate another reset
+     */
+    ov5_updates = spapr_ovec_new();
+    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
+                                        ov5_cas_old, spapr->ov5_cas);
+
+    if (!spapr->cas_reboot) {
+        spapr->cas_reboot =
+            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
+                                         ov5_updates);
+    }
+    spapr_ovec_cleanup(ov5_updates);
 
-    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
-                                     cpu_update)) {
+    if (spapr->cas_reboot) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a37eee8..b6f9f1b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -75,6 +75,7 @@ struct sPAPRMachineState {
     bool has_graphics;
     sPAPROptionVector *ov5;         /* QEMU-supported option vectors */
     sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
+    bool cas_reboot;
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
@@ -586,7 +587,8 @@ void spapr_events_init(sPAPRMachineState *sm);
 void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update);
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (2 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-25 23:58   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 05/10] spapr: update spapr hotplug documentation Michael Roth
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.

Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 23 +++++++++++++++++------
 include/hw/ppc/spapr_ovec.h |  1 +
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3b64580..828072a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -659,14 +659,28 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
                                 sPAPROptionVector *ov5_updates)
 {
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-    int ret = 0;
+    int ret = 0, offset;
 
     /* Generate ibm,dynamic-reconfiguration-memory node if required */
     if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
         g_assert(smc->dr_lmb_enabled);
         ret = spapr_populate_drconf_memory(spapr, fdt);
+        if (ret) {
+            goto out;
+        }
     }
 
+    offset = fdt_path_offset(fdt, "/chosen");
+    if (offset < 0) {
+        offset = fdt_add_subnode(fdt, 0, "chosen");
+        if (offset < 0) {
+            return offset;
+        }
+    }
+    ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
+                                 "ibm,architecture-vec-5");
+
+out:
     return ret;
 }
 
@@ -792,14 +806,9 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
     char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
     size_t cb = 0;
     char *bootlist = get_boot_devices_list(&cb, true);
-    unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
 
     _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
 
-    /* Set Form1_affinity */
-    _FDT(fdt_setprop(fdt, chosen, "ibm,architecture-vec-5",
-                     vec5, sizeof(vec5)));
-
     _FDT(fdt_setprop_string(fdt, chosen, "bootargs", machine->kernel_cmdline));
     _FDT(fdt_setprop_cell(fdt, chosen, "linux,initrd-start",
                           spapr->initrd_base));
@@ -1778,6 +1787,8 @@ static void ppc_spapr_init(MachineState *machine)
         spapr_validate_node_memory(machine, &error_fatal);
     }
 
+    spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
+
     /* init CPUs */
     if (machine->cpu_model == NULL) {
         machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index 09afd59..47fa04c 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 
 /* option vector 5 */
 #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
+#define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
 
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 05/10] spapr: update spapr hotplug documentation
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (3 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options Michael Roth
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

This updates the existing documentation to reflect recent updates to
the hotplug event structure, which are in draft form but slated
for inclusion in PAPR/LoPAPR.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 docs/specs/ppc-spapr-hotplug.txt | 55 +++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
index 631b0ca..f57e2a0 100644
--- a/docs/specs/ppc-spapr-hotplug.txt
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -233,12 +233,27 @@ tools by host-level management such as an HMC. This level of management is not
 applicable to PowerKVM, hence the reason for extending the notification
 framework to support hotplug events.
 
-Note that these events are not yet formally part of the PAPR+ specification,
-but support for this format has already been implemented in DR-related
-guest tools such as powerpc-utils/librtas, as well as kernel patches that have
-been submitted to handle in-kernel processing of memory/cpu-related hotplug
-events[1], and is planned for formal inclusion is PAPR+ specification. The
-hotplug-specific payload is QEMU implemented as follows (with all values
+The format for these EPOW-signalled events is described below under
+"hotplug/unplug event structure". Note that these events are not
+formally part of the PAPR+ specification, and have been superseded by a
+newer format, also described below under "hotplug/unplug event structure",
+and so are now deemed a "legacy" format. The formats are similar, but the
+"modern" format contains additional fields/flags, which are denoted for the
+purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards.
+
+QEMU should assume support only for "legacy" fields/flags unless the guest
+advertises support for the "modern" format via ibm,client-architecture-support
+hcall by setting byte 5, bit 6 of it's ibm,architecture-vec-5 option vector
+structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events,
+"modern" format events are surfaced to the guest via check-exception RTAS calls,
+but use a dedicated event source to signal the guest. This event source is
+advertised to the guest by the addition of a "hot-plug-events" node under
+"/event-sources" node of the guest's device tree using the standard format
+described in LoPAPR v11, B.6.12.1.
+
+== hotplug/unplug event structure ==
+
+The hotplug-specific payload in QEMU is implemented as follows (with all values
 encoded in big-endian format):
 
 struct rtas_event_log_v6_hp {
@@ -263,14 +278,23 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_ACTION_ADD       1
 #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
     uint8_t hotplug_action;             /* action (add/remove) */
-#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
-#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
-#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
+#define RTAS_LOG_V6_HP_ID_DRC_NAME          1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
+#ifdef GUEST_SUPPORTS_MODERN
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
+#endif
     uint8_t hotplug_identifier;         /* type of the resource identifier,
                                          * which serves as the discriminator
                                          * for the 'drc' union field below
                                          */
+#ifdef GUEST_SUPPORTS_MODERN
+    uint8_t capabilities;               /* capability flags, currently unused
+                                         * by QEMU
+                                         */
+#else
     uint8_t reserved;
+#endif
     union {
         uint32_t index;                 /* DRC index of resource to take action
                                          * on
@@ -278,6 +302,19 @@ struct rtas_event_log_v6_hp {
         uint32_t count;                 /* number of DR resources to take
                                          * action on (guest chooses which)
                                          */
+#ifdef GUEST_SUPPORTS_MODERN
+        struct {
+            uint32_t count;             /* number of DR resources to take
+                                         * action on
+                                         */
+            uint32_t index;             /* DRC index of first resource to take
+                                         * action on. guest will take action
+                                         * on DRC index <index> through
+                                         * DRC index <index + count - 1> in
+                                         * sequential order
+                                         */
+        } count_indexed;
+#endif
         char name[1];                   /* string representing the name of the
                                          * DRC to take action on
                                          */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (4 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 05/10] spapr: update spapr hotplug documentation Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-26  0:25   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source Michael Roth
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

This adds machine options of the form:

  -machine pseries,modern-hotplug-events=true
  -machine pseries,modern-hotplug-events=false

If false, QEMU will force the use of "legacy" style hotplug events,
which are surfaced through EPOW events instead of a dedicated
hot plug event source, and lack certain features necessary, mainly,
for memory unplug support.

If true, QEMU will enable support for "modern" dedicated hot plug
event source. Note that we will still default to "legacy" style unless
the guest advertises support for the "modern" hotplug events via
ibm,client-architecture-support hcall during early boot.

For pseries-2.7 and earlier we default to false, for newer machine
types we default to true.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 33 +++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h      |  1 +
 include/hw/ppc/spapr_ovec.h |  1 +
 3 files changed, 35 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 828072a..a3ea140 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1789,6 +1789,11 @@ static void ppc_spapr_init(MachineState *machine)
 
     spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
 
+    /* advertise support for dedicated HP event source to guests */
+    if (spapr->use_hotplug_event_source) {
+        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
+    }
+
     /* init CPUs */
     if (machine->cpu_model == NULL) {
         machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
@@ -2138,16 +2143,41 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
     spapr->kvm_type = g_strdup(value);
 }
 
+static bool spapr_get_modern_hotplug_events(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return spapr->use_hotplug_event_source;
+}
+
+static void spapr_set_modern_hotplug_events(Object *obj, bool value,
+                                            Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->use_hotplug_event_source = value;
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
 
     spapr->htab_fd = -1;
+    spapr->use_hotplug_event_source = true;
     object_property_add_str(obj, "kvm-type",
                             spapr_get_kvm_type, spapr_set_kvm_type, NULL);
     object_property_set_description(obj, "kvm-type",
                                     "Specifies the KVM virtualization mode (HV, PR)",
                                     NULL);
+    object_property_add_bool(obj, "modern-hotplug-events",
+                            spapr_get_modern_hotplug_events,
+                            spapr_set_modern_hotplug_events,
+                            NULL);
+    object_property_set_description(obj, "modern-hotplug-events",
+                                    "Use dedicated hotplug event mechanism in"
+                                    " place of standard EPOW events when possible"
+                                    " (required for memory hot-unplug support)",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -2594,7 +2624,10 @@ static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
 
 static void spapr_machine_2_7_instance_options(MachineState *machine)
 {
+    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
     spapr_machine_2_8_instance_options(machine);
+    spapr->use_hotplug_event_source = false;
 }
 
 static void spapr_machine_2_7_class_options(MachineClass *mc)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b6f9f1b..851f536 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -80,6 +80,7 @@ struct sPAPRMachineState {
     uint32_t check_exception_irq;
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
+    bool use_hotplug_event_source;
 
     /* Migration state */
     int htab_save_index;
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index 47fa04c..92167c6 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 /* option vector 5 */
 #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
 #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
+#define OV5_HP_EVT              OV_BIT(6, 5)
 
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (5 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-26  0:42   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type Michael Roth
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.

Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   4 +-
 hw/ppc/spapr_events.c  | 202 ++++++++++++++++++++++++++++++++++++++++---------
 include/hw/ppc/spapr.h |   5 +-
 3 files changed, 170 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a3ea140..dc4224b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -973,7 +973,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     }
 
     /* /event-sources */
-    spapr_dt_events(fdt, spapr->check_exception_irq);
+    spapr_dt_events(spapr, fdt);
 
     /* /rtas */
     spapr_dt_rtas(spapr, fdt);
@@ -1917,7 +1917,7 @@ static void ppc_spapr_init(MachineState *machine)
     }
     g_free(filename);
 
-    /* Set up EPOW events infrastructure */
+    /* Set up RTAS event infrastructure */
     spapr_events_init(spapr);
 
     /* Set up the RTC RTAS interfaces */
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 89aa5a7..b6b3511 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -40,6 +40,7 @@
 #include "hw/ppc/spapr_drc.h"
 #include "qemu/help_option.h"
 #include "qemu/bcd.h"
+#include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
 
 struct rtas_error_log {
@@ -206,27 +207,140 @@ struct hp_log_full {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
-#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
-#define EVENT_MASK_EPOW                      0x40000000
-#define EVENT_MASK_HOTPLUG                   0x10000000
-#define EVENT_MASK_IO                        0x08000000
+typedef enum EventClass {
+    EVENT_CLASS_INTERNAL_ERRORS     = 0,
+    EVENT_CLASS_EPOW                = 1,
+    EVENT_CLASS_RESERVED            = 2,
+    EVENT_CLASS_HOT_PLUG            = 3,
+    EVENT_CLASS_IO                  = 4,
+    EVENT_CLASS_MAX
+} EventClassIndex;
+#define EVENT_CLASS_MASK(index) (1 << (31 - index))
+
+static const char *event_names[EVENT_CLASS_MAX] = {
+    [EVENT_CLASS_INTERNAL_ERRORS]       = "internal-errors",
+    [EVENT_CLASS_EPOW]                  = "epow-events",
+    [EVENT_CLASS_HOT_PLUG]              = "hot-plug-events",
+    [EVENT_CLASS_IO]                    = "ibm,io-events",
+};
+
+struct sPAPREventSource {
+    const char *name;
+    int irq;
+    uint32_t mask;
+    bool enabled;
+};
+
+static sPAPREventSource *spapr_event_sources_new(void)
+{
+    sPAPREventSource *event_sources = g_new0(sPAPREventSource,
+                                             EVENT_CLASS_MAX);
+    int i;
+
+    for (i = 0; i < EVENT_CLASS_MAX; i++) {
+        event_sources[i].name = event_names[i];
+    }
 
-void spapr_dt_events(void *fdt, uint32_t check_exception_irq)
+    return event_sources;
+}
+
+static void spapr_event_sources_register(sPAPREventSource *event_sources,
+                                        EventClassIndex index, int irq)
 {
-    int event_sources, epow_events;
-    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
-    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
+    /* we only support 1 irq per event class at the moment */
+    g_assert(event_sources);
+    g_assert(!event_sources[index].enabled);
+    event_sources[index].irq = irq;
+    event_sources[index].mask = EVENT_CLASS_MASK(index);
+    event_sources[index].enabled = true;
+}
+
+static const sPAPREventSource
+*spapr_event_sources_get_source(sPAPREventSource *event_sources,
+                                EventClassIndex index)
+{
+    g_assert(index < EVENT_CLASS_MAX);
+    g_assert(event_sources);
+
+    return &event_sources[index];
+}
+
+void spapr_dt_events(sPAPRMachineState *spapr, void *fdt)
+{
+    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
+    int i, count = 0, event_sources;
+    sPAPREventSource *events = spapr->event_sources;
+
+    g_assert(events);
 
     _FDT(event_sources = fdt_add_subnode(fdt, 0, "event-sources"));
 
-    _FDT(fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0));
-    _FDT(fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2));
-    _FDT(fdt_setprop(fdt, event_sources, "interrupt-ranges",
-                     irq_ranges, sizeof(irq_ranges)));
+    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
+        int node_offset;
+        uint32_t interrupts[2];
+        const sPAPREventSource *source =
+            spapr_event_sources_get_source(events, i);
+
+        if (!source->enabled) {
+            continue;
+        }
+
+        interrupts[0] = cpu_to_be32(source->irq);
+        interrupts[1] = 0;
+
+        _FDT(node_offset = fdt_add_subnode(fdt, event_sources, source->name));
+        _FDT(fdt_setprop(fdt, node_offset, "interrupts", interrupts,
+                         sizeof(interrupts)));
+
+        irq_ranges[count++] = interrupts[0];
+        irq_ranges[count++] = cpu_to_be32(1);
+    }
+
+    irq_ranges[count] = cpu_to_be32(count);
+    count++;
+
+    _FDT((fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0)));
+    _FDT((fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2)));
+    _FDT((fdt_setprop(fdt, event_sources, "interrupt-ranges",
+                      irq_ranges, count * sizeof(uint32_t))));
+}
+
+static const sPAPREventSource
+*rtas_event_log_to_source(sPAPRMachineState *spapr, int log_type)
+{
+    const sPAPREventSource *source;
+
+    g_assert(spapr->event_sources);
+
+    switch (log_type) {
+    case RTAS_LOG_TYPE_HOTPLUG:
+        source = spapr_event_sources_get_source(spapr->event_sources,
+                                                EVENT_CLASS_HOT_PLUG);
+        if (spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT)) {
+            g_assert(source->enabled);
+            break;
+        }
+        /* fall back to epow for legacy hotplug interrupt source */
+    case RTAS_LOG_TYPE_EPOW:
+        source = spapr_event_sources_get_source(spapr->event_sources,
+                                                EVENT_CLASS_EPOW);
+        break;
+    default:
+        source = NULL;
+    }
 
-    _FDT(epow_events = fdt_add_subnode(fdt, event_sources, "epow-events"));
-    _FDT(fdt_setprop(fdt, epow_events, "interrupts",
-                     interrupts, sizeof(interrupts)));
+    return source;
+}
+
+static int rtas_event_log_to_irq(sPAPRMachineState *spapr, int log_type)
+{
+    const sPAPREventSource *source;
+
+    source = rtas_event_log_to_source(spapr, log_type);
+    g_assert(source);
+    g_assert(source->enabled);
+
+    return source->irq;
 }
 
 static void rtas_event_log_queue(int log_type, void *data, bool exception)
@@ -247,19 +361,15 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = NULL;
 
-    /* we only queue EPOW events atm. */
-    if ((event_mask & EVENT_MASK_EPOW) == 0) {
-        return NULL;
-    }
-
     QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
+        const sPAPREventSource *source =
+            rtas_event_log_to_source(spapr, entry->log_type);
+
         if (entry->exception != exception) {
             continue;
         }
 
-        /* EPOW and hotplug events are surfaced in the same manner */
-        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
-            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
+        if (source->mask & event_mask) {
             break;
         }
     }
@@ -276,19 +386,15 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = NULL;
 
-    /* we only queue EPOW events atm. */
-    if ((event_mask & EVENT_MASK_EPOW) == 0) {
-        return false;
-    }
-
     QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
+        const sPAPREventSource *source =
+            rtas_event_log_to_source(spapr, entry->log_type);
+
         if (entry->exception != exception) {
             continue;
         }
 
-        /* EPOW and hotplug events are surfaced in the same manner */
-        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
-            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
+        if (source->mask & event_mask) {
             return true;
         }
     }
@@ -376,7 +482,9 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
 
     rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
 
-    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->xics,
+                                 rtas_event_log_to_irq(spapr,
+                                                       RTAS_LOG_TYPE_EPOW)));
 }
 
 static void spapr_hotplug_set_signalled(uint32_t drc_index)
@@ -458,7 +566,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
 
     rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
 
-    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->xics,
+                                 rtas_event_log_to_irq(spapr,
+                                                       RTAS_LOG_TYPE_HOTPLUG)));
 }
 
 void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
@@ -504,6 +614,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     uint64_t xinfo;
     sPAPREventLogEntry *event;
     struct rtas_error_log *hdr;
+    int i;
 
     if ((nargs < 6) || (nargs > 7) || nret != 1) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -540,8 +651,14 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * do the latter here, since our code relies on edge-triggered
      * interrupts.
      */
-    if (rtas_event_log_contains(mask, true)) {
-        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    for (i = 0; i < EVENT_CLASS_MAX; i++) {
+        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
+            const sPAPREventSource *source =
+                spapr_event_sources_get_source(spapr->event_sources, i);
+
+            g_assert(source->enabled);
+            qemu_irq_pulse(xics_get_qirq(spapr->xics, source->irq));
+        }
     }
 
     return;
@@ -593,8 +710,19 @@ out_no_events:
 void spapr_events_init(sPAPRMachineState *spapr)
 {
     QTAILQ_INIT(&spapr->pending_events);
-    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, false,
-                                            &error_fatal);
+
+    spapr->event_sources = spapr_event_sources_new();
+
+    spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
+                                 xics_spapr_alloc(spapr->xics, 0, false,
+                                                  &error_fatal));
+
+    if (spapr->use_hotplug_event_source) {
+        spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
+                                     xics_spapr_alloc(spapr->xics, 0, false,
+                                                      &error_fatal));
+    }
+
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 851f536..21af971 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -13,6 +13,7 @@ struct sPAPRPHBState;
 struct sPAPRNVRAM;
 typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
+typedef struct sPAPREventSource sPAPREventSource;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -77,10 +78,10 @@ struct sPAPRMachineState {
     sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
     bool cas_reboot;
 
-    uint32_t check_exception_irq;
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
     bool use_hotplug_event_source;
+    sPAPREventSource *event_sources;
 
     /* Migration state */
     int htab_save_index;
@@ -585,7 +586,7 @@ struct sPAPREventLogEntry {
 };
 
 void spapr_events_init(sPAPRMachineState *sm);
-void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
+void spapr_dt_events(sPAPRMachineState *sm, void *fdt);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
                                  bool cpu_update,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (6 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-26  0:48   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug Michael Roth
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
  used in PAPR hotplug ACR
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_events.c  | 76 ++++++++++++++++++++++++++++++++++++++++----------
 include/hw/ppc/spapr.h |  4 +++
 2 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index b6b3511..596e991 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -175,6 +175,16 @@ struct epow_log_full {
     struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+union drc_identifier {
+    uint32_t index;
+    uint32_t count;
+    struct {
+        uint32_t count;
+        uint32_t index;
+    } count_indexed;
+    char name[1];
+} QEMU_PACKED;
+
 struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
     struct rtas_event_log_v6_section_header hdr;
@@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
 #define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
 #define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED              4
     uint8_t reserved;
-    union {
-        uint32_t index;
-        uint32_t count;
-        char name[1];
-    } drc;
+    union drc_identifier drc_id;
 } QEMU_PACKED;
 
 struct hp_log_full {
@@ -496,7 +503,7 @@ static void spapr_hotplug_set_signalled(uint32_t drc_index)
 
 static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
                                     sPAPRDRConnectorType drc_type,
-                                    uint32_t drc)
+                                    union drc_identifier *drc_id)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     struct hp_log_full *new_hp;
@@ -541,7 +548,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     case SPAPR_DR_CONNECTOR_TYPE_PCI:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
         if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
-            spapr_hotplug_set_signalled(drc);
+            spapr_hotplug_set_signalled(drc_id->index);
         }
         break;
     case SPAPR_DR_CONNECTOR_TYPE_LMB:
@@ -559,9 +566,18 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     }
 
     if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
-        hp->drc.count = cpu_to_be32(drc);
+        hp->drc_id.count = cpu_to_be32(drc_id->count);
     } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
-        hp->drc.index = cpu_to_be32(drc);
+        hp->drc_id.index = cpu_to_be32(drc_id->index);
+    } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
+        /* we should not be using count_indexed value unless the guest
+         * supports dedicated hotplug event source
+         */
+        g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
+        hp->drc_id.count_indexed.count =
+            cpu_to_be32(drc_id->count_indexed.count);
+        hp->drc_id.count_indexed.index =
+            cpu_to_be32(drc_id->count_indexed.index);
     }
 
     rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
@@ -575,34 +591,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
 {
     sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
     sPAPRDRConnectorType drc_type = drck->get_type(drc);
-    uint32_t index = drck->get_index(drc);
+    union drc_identifier drc_id;
 
+    drc_id.index = drck->get_index(drc);
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
-                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
 {
     sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
     sPAPRDRConnectorType drc_type = drck->get_type(drc);
-    uint32_t index = drck->get_index(drc);
+    union drc_identifier drc_id;
 
+    drc_id.index = drck->get_index(drc);
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
-                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
                                        uint32_t count)
 {
+    union drc_identifier drc_id;
+
+    drc_id.count = count;
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
-                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, count);
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
                                           uint32_t count)
 {
+    union drc_identifier drc_id;
+
+    drc_id.count = count;
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
-                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, count);
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
+}
+
+void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                            uint32_t count, uint32_t index)
+{
+    union drc_identifier drc_id;
+
+    drc_id.count_indexed.count = count;
+    drc_id.count_indexed.index = index;
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
+}
+
+void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                               uint32_t count, uint32_t index)
+{
+    union drc_identifier drc_id;
+
+    drc_id.count_indexed.count = count;
+    drc_id.count_indexed.index = index;
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
 static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 21af971..bd5bcf7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -610,6 +610,10 @@ void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
                                        uint32_t count);
 void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
                                           uint32_t count);
+void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                            uint32_t count, uint32_t index);
+void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                               uint32_t count, uint32_t index);
 void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
                                     sPAPRMachineState *spapr);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (7 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-26  0:49   ` David Gibson
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support Michael Roth
  2016-10-26  0:02 ` [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support David Gibson
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

Commit 0a417869:

    spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type

dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.

With newer support for dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unecessary work. This patch makes use of that approach when the
capability is available.

Cc: bharata@linux.vnet.ibm.com
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index dc4224b..0b3aa2f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2202,14 +2202,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
-static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
-                           uint32_t node, Error **errp)
+static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
+                           uint32_t node, bool dedicated_hp_event_source,
+                           Error **errp)
 {
     sPAPRDRConnector *drc;
     sPAPRDRConnectorClass *drck;
     uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
     int i, fdt_offset, fdt_size;
     void *fdt;
+    uint64_t addr = addr_start;
 
     for (i = 0; i < nr_lmbs; i++) {
         drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
@@ -2228,7 +2230,17 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
      * guest only in case of hotplugged memory
      */
     if (dev->hotplugged) {
-       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+        if (dedicated_hp_event_source) {
+            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                    addr_start / SPAPR_MEMORY_BLOCK_SIZE);
+            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+            spapr_hotplug_req_add_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                                   nr_lmbs,
+                                                   drck->get_index(drc));
+        } else {
+            spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                           nr_lmbs);
+        }
     }
 }
 
@@ -2261,7 +2273,9 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    spapr_add_lmbs(dev, addr, size, node, &error_abort);
+    spapr_add_lmbs(dev, addr, size, node,
+                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
+                   &error_abort);
 
 out:
     error_propagate(errp, local_err);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (8 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug Michael Roth
@ 2016-10-25  4:47 ` Michael Roth
  2016-10-26  0:57   ` David Gibson
  2016-10-26  0:02 ` [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support David Gibson
  10 siblings, 1 reply; 22+ messages in thread
From: Michael Roth @ 2016-10-25  4:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: nfont, david, qemu-ppc, jallen, bharata

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Add support to hot remove pc-dimm memory devices.

Since we're introducing a machine-level unplug_request hook, we also
had handling for CPU unplug there as well to ensure CPU unplug
continues to work as it did before.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* add hooks to CAS/cmdline enablement of hotplug ACR support
* add hook for CPU unplug
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c     | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_drc.c |  17 ++++++++
 2 files changed, 135 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0b3aa2f..a4a6058 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2281,6 +2281,90 @@ out:
     error_propagate(errp, local_err);
 }
 
+typedef struct sPAPRDIMMState {
+    uint32_t nr_lmbs;
+} sPAPRDIMMState;
+
+static void spapr_lmb_release(DeviceState *dev, void *opaque)
+{
+    sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
+    HotplugHandler *hotplug_ctrl = NULL;
+
+    if (--ds->nr_lmbs) {
+        return;
+    }
+
+    g_free(ds);
+
+    /*
+     * Now that all the LMBs have been removed by the guest, call the
+     * pc-dimm unplug handler to cleanup up the pc-dimm device.
+     */
+    hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
+                           Error **errp)
+{
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
+    int i;
+    sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
+    uint64_t addr = addr_start;
+
+    ds->nr_lmbs = nr_lmbs;
+    for (i = 0; i < nr_lmbs; i++) {
+        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                addr / SPAPR_MEMORY_BLOCK_SIZE);
+        g_assert(drc);
+
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+        drck->detach(drc, dev, spapr_lmb_release, ds, errp);
+        addr += SPAPR_MEMORY_BLOCK_SIZE;
+    }
+
+    drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                              nr_lmbs,
+                                              drck->get_index(drc));
+}
+
+static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                Error **errp)
+{
+    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+
+    pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
+                                        DeviceState *dev, Error **errp)
+{
+    Error *local_err = NULL;
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+    uint64_t size = memory_region_size(mr);
+    uint64_t addr;
+
+    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    spapr_del_lmbs(dev, addr, size, &error_abort);
+out:
+    error_propagate(errp, local_err);
+}
+
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
                                     sPAPRMachineState *spapr)
 {
@@ -2354,10 +2438,42 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
+    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
     MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        error_setg(errp, "Memory hot unplug not supported by sPAPR");
+        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
+            spapr_memory_unplug(hotplug_dev, dev, errp);
+        } else {
+            error_setg(errp, "Memory hot unplug not supported for this guest");
+        }
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
+        if (!mc->query_hotpluggable_cpus) {
+            error_setg(errp, "CPU hot unplug not supported on this machine");
+            return;
+        }
+        spapr_core_unplug(hotplug_dev, dev, errp);
+    }
+}
+
+static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
+                                                DeviceState *dev, Error **errp)
+{
+    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
+    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
+            spapr_memory_unplug_request(hotplug_dev, dev, errp);
+        } else {
+            /* NOTE: this means there is a window after guest reset, prior to
+             * CAS negotiation, where unplug requests will fail due to the
+             * capability not being detected yet. This is a bit different than
+             * the case with PCI unplug, where the events will be queued and
+             * eventually handled by the guest after boot
+             */
+            error_setg(errp, "Memory hot unplug not supported for this guest");
+        }
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         if (!mc->query_hotpluggable_cpus) {
             error_setg(errp, "CPU hot unplug not supported on this machine");
@@ -2503,6 +2619,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->plug = spapr_machine_device_plug;
     hc->unplug = spapr_machine_device_unplug;
     mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+    hc->unplug_request = spapr_machine_device_unplug_request;
 
     smc->dr_lmb_enabled = true;
     smc->tcg_default_cpu = "POWER8";
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 6e54fd4..a0c44ee 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -68,6 +68,23 @@ static uint32_t set_isolation_state(sPAPRDRConnector *drc,
         }
     }
 
+    /*
+     * Fail any requests to ISOLATE the LMB DRC if this LMB doesn't
+     * belong to a DIMM device that is marked for removal.
+     *
+     * Currently the guest userspace tool drmgr that drives the memory
+     * hotplug/unplug will just try to remove a set of 'removable' LMBs
+     * in response to a hot unplug request that is based on drc-count.
+     * If the LMB being removed doesn't belong to a DIMM device that is
+     * actually being unplugged, fail the isolation request here.
+     */
+    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_LMB) {
+        if ((state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
+             !drc->awaiting_release) {
+            return RTAS_OUT_HW_ERROR;
+        }
+    }
+
     drc->isolation_state = state;
 
     if (drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers Michael Roth
@ 2016-10-25 23:54   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-25 23:54 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 13212 bytes --]

On Mon, Oct 24, 2016 at 11:47:27PM -0500, Michael Roth wrote:
> PAPR guests advertise their capabilities to the platform by passing
> an ibm,architecture-vec structure via an
> ibm,client-architecture-support hcall as described by LoPAPR v11,
> B.6.2.3. during early boot.
> 
> Using this information, the platform enables the capabilities it
> supports, then encodes a subset of those enabled capabilities (the
> 5th option vector of the ibm,architecture-vec structure passed to
> ibm,client-architecture-support) into the guest device tree via
> "/chosen/ibm,architecture-vec-5".
> 
> The logical format of these these option vectors is a bit-vector,
> where individual bits are addressed/documented based on the byte-wise
> offset from the beginning of the bit-vector, followed by the bit-wise
> index starting from the byte-wise offset. Thus the bits of each of
> these bytes are stored in reverse order. Additionally, the first
> byte of each option vector is encodes the length of the option vector,
> so byte offsets begin at 1, and bit offset at 0.
> 
> This is not very intuitive for the purposes of mapping these bits to
> a particular documented capability, so this patch introduces a set
> of abstractions that encapsulate the work of parsing/encoding these
> options vectors and testing for individual capabilities.
> 
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/Makefile.objs        |   2 +-
>  hw/ppc/spapr_ovec.c         | 242 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_ovec.h |  62 ++++++++++++
>  3 files changed, 305 insertions(+), 1 deletion(-)
>  create mode 100644 hw/ppc/spapr_ovec.c
>  create mode 100644 include/hw/ppc/spapr_ovec.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index ebc72af..8025129 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> -obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> +obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
>  # IBM PowerNV
>  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> new file mode 100644
> index 0000000..c2a0d18
> --- /dev/null
> +++ b/hw/ppc/spapr_ovec.c
> @@ -0,0 +1,242 @@
> +/*
> + * QEMU SPAPR Architecture Option Vector Helper Functions
> + *
> + * Copyright IBM Corp. 2016
> + *
> + * Authors:
> + *  Bharata B Rao     <bharata@linux.vnet.ibm.com>
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/ppc/spapr_ovec.h"
> +#include "qemu/bitmap.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/error-report.h"
> +#include <libfdt.h>
> +
> +/* #define DEBUG_SPAPR_OVEC */
> +
> +#ifdef DEBUG_SPAPR_OVEC
> +#define DPRINTFN(fmt, ...) \
> +    do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTFN(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +#define OV_MAXBYTES 256 /* not including length byte */
> +#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
> +
> +/* we *could* work with bitmaps directly, but handling the bitmap privately
> + * allows us to more safely make assumptions about the bitmap size and
> + * simplify the calling code somewhat
> + */
> +struct sPAPROptionVector {
> +    unsigned long *bitmap;
> +};
> +
> +sPAPROptionVector *spapr_ovec_new(void)
> +{
> +    sPAPROptionVector *ov;
> +
> +    ov = g_new0(sPAPROptionVector, 1);
> +    ov->bitmap = bitmap_new(OV_MAXBITS);
> +
> +    return ov;
> +}
> +
> +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
> +{
> +    sPAPROptionVector *ov;
> +
> +    g_assert(ov_orig);
> +
> +    ov = spapr_ovec_new();
> +    bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
> +
> +    return ov;
> +}
> +
> +void spapr_ovec_intersect(sPAPROptionVector *ov,
> +                          sPAPROptionVector *ov1,
> +                          sPAPROptionVector *ov2)
> +{
> +    g_assert(ov);
> +    g_assert(ov1);
> +    g_assert(ov2);
> +
> +    bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
> +}
> +
> +/* returns true if options bits were removed, false otherwise */
> +bool spapr_ovec_diff(sPAPROptionVector *ov,
> +                     sPAPROptionVector *ov_old,
> +                     sPAPROptionVector *ov_new)
> +{
> +    unsigned long *change_mask = bitmap_new(OV_MAXBITS);
> +    unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
> +    bool bits_were_removed = false;
> +
> +    g_assert(ov);
> +    g_assert(ov_old);
> +    g_assert(ov_new);
> +
> +    bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
> +    bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
> +    bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
> +
> +    if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
> +        bits_were_removed = true;
> +    }
> +
> +    g_free(change_mask);
> +    g_free(removed_bits);
> +
> +    return bits_were_removed;
> +}
> +
> +void spapr_ovec_cleanup(sPAPROptionVector *ov)
> +{
> +    if (ov) {
> +        g_free(ov->bitmap);
> +        g_free(ov);
> +    }
> +}
> +
> +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    set_bit(bitnr, ov->bitmap);
> +}
> +
> +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    clear_bit(bitnr, ov->bitmap);
> +}
> +
> +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    return test_bit(bitnr, ov->bitmap) ? true : false;
> +}
> +
> +static void guest_byte_to_bitmap(uint8_t entry, unsigned long *bitmap,
> +                                 long bitmap_offset)
> +{
> +    int i;
> +
> +    for (i = 0; i < BITS_PER_BYTE; i++) {
> +        if (entry & (1 << (BITS_PER_BYTE - 1 - i))) {
> +            bitmap_set(bitmap, bitmap_offset + i, 1);
> +        }
> +    }
> +}
> +
> +static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
> +{
> +    uint8_t entry = 0;
> +    int i;
> +
> +    for (i = 0; i < BITS_PER_BYTE; i++) {
> +        if (test_bit(bitmap_offset + i, bitmap)) {
> +            entry |= (1 << (BITS_PER_BYTE - 1 - i));
> +        }
> +    }
> +
> +    return entry;
> +}
> +
> +static target_ulong vector_addr(target_ulong table_addr, int vector)
> +{
> +    uint16_t vector_count, vector_len;
> +    int i;
> +
> +    vector_count = ldub_phys(&address_space_memory, table_addr) + 1;
> +    if (vector > vector_count) {
> +        return 0;
> +    }
> +    table_addr++; /* skip nr option vectors */
> +
> +    for (i = 0; i < vector - 1; i++) {
> +        vector_len = ldub_phys(&address_space_memory, table_addr) + 1;
> +        table_addr += vector_len + 1; /* bit-vector + length byte */
> +    }
> +    return table_addr;
> +}
> +
> +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector)
> +{
> +    sPAPROptionVector *ov;
> +    target_ulong addr;
> +    uint16_t vector_len;
> +    int i;
> +
> +    g_assert(table_addr);
> +    g_assert_cmpint(vector, >=, 1); /* vector numbering starts at 1 */
> +
> +    addr = vector_addr(table_addr, vector);
> +    if (!addr) {
> +        /* specified vector isn't present */
> +        return NULL;
> +    }
> +
> +    vector_len = ldub_phys(&address_space_memory, addr++) + 1;
> +    g_assert_cmpint(vector_len, <=, OV_MAXBYTES);
> +    ov = spapr_ovec_new();
> +
> +    for (i = 0; i < vector_len; i++) {
> +        uint8_t entry = ldub_phys(&address_space_memory, addr + i);
> +        if (entry) {
> +            DPRINTFN("read guest vector %2d, byte %3d / %3d: 0x%.2x",
> +                     vector, i + 1, vector_len, entry);
> +            guest_byte_to_bitmap(entry, ov->bitmap, i * BITS_PER_BYTE);
> +        }
> +    }
> +
> +    return ov;
> +}
> +
> +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> +                           sPAPROptionVector *ov, const char *name)
> +{
> +    uint8_t vec[OV_MAXBYTES + 1];
> +    uint16_t vec_len;
> +    unsigned long lastbit;
> +    int i;
> +
> +    g_assert(ov);
> +
> +    lastbit = find_last_bit(ov->bitmap, OV_MAXBITS);
> +    /* if no bits are set, include at least 1 byte of the vector so we can
> +     * still encoded this in the device tree while abiding by the same
> +     * encoding/sizing expected in ibm,client-architecture-support
> +     */
> +    vec_len = (lastbit == OV_MAXBITS) ? 1 : lastbit / BITS_PER_BYTE + 1;
> +    g_assert_cmpint(vec_len, <=, OV_MAXBYTES);
> +    /* guest expects vector len encoded as vec_len - 1, since the length byte
> +     * is assumed and not included, and the first byte of the vector
> +     * is assumed as well
> +     */
> +    vec[0] = vec_len - 1;
> +
> +    for (i = 1; i < vec_len + 1; i++) {
> +        vec[i] = guest_byte_from_bitmap(ov->bitmap, (i - 1) * BITS_PER_BYTE);
> +        if (vec[i]) {
> +            DPRINTFN("encoding guest vector byte %3d / %3d: 0x%.2x",
> +                     i, vec_len, vec[i]);
> +        }
> +    }
> +
> +    return fdt_setprop(fdt, fdt_offset, name, vec, vec_len);
> +}
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> new file mode 100644
> index 0000000..fba2d98
> --- /dev/null
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -0,0 +1,62 @@
> +/*
> + * QEMU SPAPR Option/Architecture Vector Definitions
> + *
> + * Each architecture option is organized/documented by the following
> + * in LoPAPR 1.1, Table 244:
> + *
> + *   <vector number>: the bit-vector in which the option is located
> + *   <vector byte>: the byte offset of the vector entry
> + *   <vector bit>: the bit offset within the vector entry
> + *
> + * where each vector entry can be one or more bytes.
> + *
> + * Firmware expects a somewhat literal encoding of this bit-vector
> + * structure, where each entry is stored in little-endian so that the
> + * byte ordering reflects that of the documentation, but where each bit
> + * offset is from "left-to-right" in the traditional representation of
> + * a byte value where the MSB is the left-most bit. Thus, each
> + * individual byte encodes the option bits in reverse order of the
> + * documented bit.
> + *
> + * These definitions/helpers attempt to abstract away this internal
> + * representation so that we can define/set/test for individual option
> + * bits using only the documented values. This is done mainly by relying
> + * on a bitmap to approximate the documented "bit-vector" structure and
> + * handling conversations to-from the internal representation under the
> + * covers.
> + *
> + * Copyright IBM Corp. 2016
> + *
> + * Authors:
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)
> +#define __HW_SPAPR_OPTION_VECTORS_H__
> +
> +#include "cpu.h"
> +
> +typedef struct sPAPROptionVector sPAPROptionVector;
> +
> +#define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
> +
> +/* interfaces */
> +sPAPROptionVector *spapr_ovec_new(void);
> +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> +void spapr_ovec_intersect(sPAPROptionVector *ov,
> +                          sPAPROptionVector *ov1,
> +                          sPAPROptionVector *ov2);
> +bool spapr_ovec_diff(sPAPROptionVector *ov,
> +                     sPAPROptionVector *ov_old,
> +                     sPAPROptionVector *ov_new);
> +void spapr_ovec_cleanup(sPAPROptionVector *ov);
> +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
> +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
> +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
> +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
> +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> +                           sPAPROptionVector *ov, const char *name);
> +
> +#endif /* !defined (__HW_SPAPR_OPTION_VECTORS_H__) */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets Michael Roth
@ 2016-10-25 23:56   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-25 23:56 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 6483 bytes --]

On Mon, Oct 24, 2016 at 11:47:29PM -0500, Michael Roth wrote:
> In some cases, ibm,client-architecture-support calls can fail. This
> could happen in the current code for situations where the modified
> device tree segment exceeds the buffer size provided by the guest
> via the call parameters. In these cases, QEMU will reset, allowing
> an opportunity to regenerate the device tree from scratch via
> boot-time handling. There are potentially other scenarios as well,
> not currently reachable in the current code, but possible in theory,
> such as cases where device-tree properties or nodes need to be removed.
> 
> We currently don't handle either of these properly for option vector
> capabilities however. Instead of carrying the negotiated capability
> beyond the reset and creating the boot-time device tree accordingly,
> we start from scratch, generating the same boot-time device tree as we
> did prior to the CAS-generated and the same device tree updates as we
> did before. This could (in theory) cause us to get stuck in a reset
> loop. This hasn't been observed, but depending on the extensiveness
> of CAS-induced device tree updates in the future, could eventually
> become an issue.
> 
> Address this by pulling capability-related device tree
> updates resulting from CAS calls into a common routine,
> spapr_dt_cas_updates(), and adding an sPAPROptionVector*
> parameter that allows us to test for newly-negotiated capabilities.
> We invoke it as follows:
> 
> 1) When ibm,client-architecture-support gets called, we
>    call spapr_dt_cas_updates() with the set of capabilities
>    added since the previous call to ibm,client-architecture-support.
>    For the initial boot, or a system reset generated by something
>    other than the CAS call itself, this set will consist of *all*
>    options supported both the platform and the guest. For calls
>    to ibm,client-architecture-support immediately after a CAS-induced
>    reset, we call spapr_dt_cas_updates() with only the set
>    of capabilities added since the previous call, since the other
>    capabilities will have already been addressed by the boot-time
>    device-tree this time around. In the unlikely event that
>    capabilities are *removed* since the previous CAS, we will
>    generate a CAS-induced reset. In the unlikely event that we
>    cannot fit the device-tree updates into the buffer provided
>    by the guest, well generate a CAS-induced reset.
> 
> 2) When a CAS update results in the need to reset the machine and
>    include the updates in the boot-time device tree, we call the
>    spapr_dt_cas_updates() using the full set of negotiated
>    capabilities as part of the reset path. At initial boot, or after
>    a reset generated by something other than the CAS call itself,
>    this set will be empty, resulting in what should be the same
>    boot-time device-tree as we generated prior to this patch. For
>    CAS-induced reset, this routine will be called with the full set of
>    capabilities negotiated by the platform/guest in the previous
>    CAS call, which should result in CAS updates from previous call
>    being accounted for in the initial boot-time device tree.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

One little nit..

[snip]
> @@ -1013,13 +1013,27 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>       * of guest input. To model these properly we'd want some sort of mask,
>       * but since they only currently apply to memory migration as defined
>       * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
> -     * to worry about this.
> +     * to worry about this for now.
>       */
> +    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
> +    /* full range of negotiated ov5 capabilities */
>      spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
>      spapr_ovec_cleanup(ov5_guest);
> +    /* capabilities that have been added since CAS-generated guest reset.
> +     * if capabilities have since been removed, generate another reset
> +     */
> +    ov5_updates = spapr_ovec_new();
> +    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
> +                                        ov5_cas_old, spapr->ov5_cas);
> +
> +    if (!spapr->cas_reboot) {
> +        spapr->cas_reboot =
> +            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
> +                                         ov5_updates);

spapr->cas_reboot is a bool, whereas spapr_h_cas_compose_response()
returns an int error code.  Now that C has real bools, I think the
compiler will do the right thing here, but I'd prefer to see an explicit

cas_reboot = (spapr_h_cas_compose_response() != 0)

for clarity.

> +    }
> +    spapr_ovec_cleanup(ov5_updates);
>  
> -    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
> -                                     cpu_update)) {
> +    if (spapr->cas_reboot) {
>          qemu_system_reset_request();
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a37eee8..b6f9f1b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -75,6 +75,7 @@ struct sPAPRMachineState {
>      bool has_graphics;
>      sPAPROptionVector *ov5;         /* QEMU-supported option vectors */
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
> +    bool cas_reboot;
>  
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
> @@ -586,7 +587,8 @@ void spapr_events_init(sPAPRMachineState *sm);
>  void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update);
> +                                 bool cpu_update,
> +                                 sPAPROptionVector *ov5_updates);
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
>  void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                              uint32_t page_shift, uint64_t bus_offset,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling Michael Roth
@ 2016-10-25 23:58   ` David Gibson
  2016-10-26  1:34     ` David Gibson
  0 siblings, 1 reply; 22+ messages in thread
From: David Gibson @ 2016-10-25 23:58 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 3723 bytes --]

On Mon, Oct 24, 2016 at 11:47:30PM -0500, Michael Roth wrote:
> ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
> negotiated between platform/guest. Currently we hardcode this property
> in the boot-time device tree to advertise a single negotiated
> capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
> has been invoked or that capability has actually been negotiated.
> 
> Improve this by generating ibm,architecture-vec-5 based on the full
> set of option vector 5 capabilities negotiated via CAS.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c              | 23 +++++++++++++++++------
>  include/hw/ppc/spapr_ovec.h |  1 +
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3b64580..828072a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -659,14 +659,28 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
>                                  sPAPROptionVector *ov5_updates)
>  {
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -    int ret = 0;
> +    int ret = 0, offset;
>  
>      /* Generate ibm,dynamic-reconfiguration-memory node if required */
>      if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
>          g_assert(smc->dr_lmb_enabled);
>          ret = spapr_populate_drconf_memory(spapr, fdt);
> +        if (ret) {
> +            goto out;
> +        }
>      }
>  
> +    offset = fdt_path_offset(fdt, "/chosen");
> +    if (offset < 0) {
> +        offset = fdt_add_subnode(fdt, 0, "chosen");
> +        if (offset < 0) {
> +            return offset;
> +        }

Just asserting offset >= 0 would be fine here.  We always create a
/chosen node.

> +    }
> +    ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
> +                                 "ibm,architecture-vec-5");
> +
> +out:
>      return ret;
>  }
>  
> @@ -792,14 +806,9 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
>      char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
>      size_t cb = 0;
>      char *bootlist = get_boot_devices_list(&cb, true);
> -    unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
>  
>      _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
>  
> -    /* Set Form1_affinity */
> -    _FDT(fdt_setprop(fdt, chosen, "ibm,architecture-vec-5",
> -                     vec5, sizeof(vec5)));
> -
>      _FDT(fdt_setprop_string(fdt, chosen, "bootargs", machine->kernel_cmdline));
>      _FDT(fdt_setprop_cell(fdt, chosen, "linux,initrd-start",
>                            spapr->initrd_base));
> @@ -1778,6 +1787,8 @@ static void ppc_spapr_init(MachineState *machine)
>          spapr_validate_node_memory(machine, &error_fatal);
>      }
>  
> +    spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> +
>      /* init CPUs */
>      if (machine->cpu_model == NULL) {
>          machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 09afd59..47fa04c 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  
>  /* option vector 5 */
>  #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
> +#define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
>  
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support
  2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (9 preceding siblings ...)
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support Michael Roth
@ 2016-10-26  0:02 ` David Gibson
  10 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:02 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

On Mon, Oct 24, 2016 at 11:47:26PM -0500, Michael Roth wrote:
> This series is based on David's ppc-for-2.8 branch, and is also available from:
> 
>   https://github.com/mdroth/qemu/commits/spapr-hotplug-event-update
> 
> Changes since RFC:
>   * Submit as v1 now that PAPR Hotplug ACR is accepted
>   * Rebase on latest ppc-for-2.8 (with device-tree refactoring)
>   * address Patchew warnings
>   * add comments to clarify spapr->ov5/ov5_cas usage. (David)
>   * revise comment to clarify intent when setting spapr->ov5
>     OV5_HP_EVT bit. (Bharata)
>   * drop internal usage of spapr_ovec_from_bitmap() in favor of
>     directly assigning bitmap to sPAPROptionVector instances. (David)
>   * standardize meaning of 'vector_len' variable through spapr_ovec_*
>     functions to be the byte-wise length of option vectors entries,
>     and not including the preceeding length byte itself. (David)
>   * fix spapr_ovec_populate_dt() to parse up to OV_MAXBITS bits
>     rather than OV_MAXBITS - 1. (David)
>   * fix spapr_ovec_populate_dt() encode the minimum of 1 option
>     vector byte instead of the max of OV_MAXBYTES in cases where
>     no option bits are set. (David)
>   * add some comments to spapr_ovec_populate_dt() to clarify what
>     is being encoded into length byte of ibm,architecture-vec-5
>   * switch 'legacy-hotplug-events' option to
>     'modern-hotplug-events' (David)
>   * modify rtas_event_log_to_source() to check for OV5_HP_EVT
>     option rather than relying on whether the hotplug source is
>     specifically enabled. Assert the latter in cases where
>     OV5_HP_EVT is set. (Bharata)
>   * drop global EventSource list in favor of an sPAPREventSource
>     list field within sPAPRMachineState (David)
>   * add CPU unplug hook in mc->unplug_request (Bharata)
> 
> 
> Patches 1-4 address various deficiencies in how we currently handle option
> vectors via ibm,client-architecture-support. This is done here in preparation
> for a new option vector bit introduced later in this series, as well as a
> number of future option vector bits related to other features, but I can
> break this out into a separate series if preferred.

I've now merged these 4 (adjusting for a couple of tiny nits mentioned
in comments).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options Michael Roth
@ 2016-10-26  0:25   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:25 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 5265 bytes --]

On Mon, Oct 24, 2016 at 11:47:32PM -0500, Michael Roth wrote:
> This adds machine options of the form:
> 
>   -machine pseries,modern-hotplug-events=true
>   -machine pseries,modern-hotplug-events=false
> 
> If false, QEMU will force the use of "legacy" style hotplug events,
> which are surfaced through EPOW events instead of a dedicated
> hot plug event source, and lack certain features necessary, mainly,
> for memory unplug support.
> 
> If true, QEMU will enable support for "modern" dedicated hot plug
> event source. Note that we will still default to "legacy" style unless
> the guest advertises support for the "modern" hotplug events via
> ibm,client-architecture-support hcall during early boot.
> 
> For pseries-2.7 and earlier we default to false, for newer machine
> types we default to true.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

I think this either needs to go after the next patch, or be merged
with it.

As it stands, after this patch, you're advertising availability of the
new mechanism without having actually implemented it.

> ---
>  hw/ppc/spapr.c              | 33 +++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h      |  1 +
>  include/hw/ppc/spapr_ovec.h |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828072a..a3ea140 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1789,6 +1789,11 @@ static void ppc_spapr_init(MachineState *machine)
>  
>      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
>  
> +    /* advertise support for dedicated HP event source to guests */
> +    if (spapr->use_hotplug_event_source) {
> +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> +    }
> +
>      /* init CPUs */
>      if (machine->cpu_model == NULL) {
>          machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> @@ -2138,16 +2143,41 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
>      spapr->kvm_type = g_strdup(value);
>  }
>  
> +static bool spapr_get_modern_hotplug_events(Object *obj, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    return spapr->use_hotplug_event_source;
> +}
> +
> +static void spapr_set_modern_hotplug_events(Object *obj, bool value,
> +                                            Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    spapr->use_hotplug_event_source = value;
> +}
> +
>  static void spapr_machine_initfn(Object *obj)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>  
>      spapr->htab_fd = -1;
> +    spapr->use_hotplug_event_source = true;
>      object_property_add_str(obj, "kvm-type",
>                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
>      object_property_set_description(obj, "kvm-type",
>                                      "Specifies the KVM virtualization mode (HV, PR)",
>                                      NULL);
> +    object_property_add_bool(obj, "modern-hotplug-events",
> +                            spapr_get_modern_hotplug_events,
> +                            spapr_set_modern_hotplug_events,
> +                            NULL);
> +    object_property_set_description(obj, "modern-hotplug-events",
> +                                    "Use dedicated hotplug event mechanism in"
> +                                    " place of standard EPOW events when possible"
> +                                    " (required for memory hot-unplug support)",
> +                                    NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -2594,7 +2624,10 @@ static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
>  
>  static void spapr_machine_2_7_instance_options(MachineState *machine)
>  {
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +
>      spapr_machine_2_8_instance_options(machine);
> +    spapr->use_hotplug_event_source = false;
>  }
>  
>  static void spapr_machine_2_7_class_options(MachineClass *mc)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b6f9f1b..851f536 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -80,6 +80,7 @@ struct sPAPRMachineState {
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> +    bool use_hotplug_event_source;
>  
>      /* Migration state */
>      int htab_save_index;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 47fa04c..92167c6 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  /* option vector 5 */
>  #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
>  #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
> +#define OV5_HP_EVT              OV_BIT(6, 5)
>  
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source Michael Roth
@ 2016-10-26  0:42   ` David Gibson
  2016-10-26 21:44     ` Michael Roth
  0 siblings, 1 reply; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:42 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 15405 bytes --]

On Mon, Oct 24, 2016 at 11:47:33PM -0500, Michael Roth wrote:
> Hotplug events were previously delivered using an EPOW interrupt
> and were queued by linux guests into a circular buffer. For traditional
> EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> events there are cases where this buffer can be exhausted, resulting
> in the loss of hotplug events, resets, etc.
> 
> Newer-style hotplug event are delivered using a dedicated event source.
> We enable this in supported guests by adding standard an additional
> event source in the guest device-tree via /event-sources, and, if
> the guest advertises support for the newer-style hotplug events,
> using the corresponding interrupt to signal the available of
> hotplug/unplug events.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   4 +-
>  hw/ppc/spapr_events.c  | 202 ++++++++++++++++++++++++++++++++++++++++---------
>  include/hw/ppc/spapr.h |   5 +-
>  3 files changed, 170 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a3ea140..dc4224b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -973,7 +973,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>      }
>  
>      /* /event-sources */
> -    spapr_dt_events(fdt, spapr->check_exception_irq);
> +    spapr_dt_events(spapr, fdt);
>  
>      /* /rtas */
>      spapr_dt_rtas(spapr, fdt);
> @@ -1917,7 +1917,7 @@ static void ppc_spapr_init(MachineState *machine)
>      }
>      g_free(filename);
>  
> -    /* Set up EPOW events infrastructure */
> +    /* Set up RTAS event infrastructure */
>      spapr_events_init(spapr);
>  
>      /* Set up the RTC RTAS interfaces */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 89aa5a7..b6b3511 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -40,6 +40,7 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qemu/help_option.h"
>  #include "qemu/bcd.h"
> +#include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
>  
>  struct rtas_error_log {
> @@ -206,27 +207,140 @@ struct hp_log_full {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> -#define EVENT_MASK_EPOW                      0x40000000
> -#define EVENT_MASK_HOTPLUG                   0x10000000
> -#define EVENT_MASK_IO                        0x08000000
> +typedef enum EventClass {
> +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> +    EVENT_CLASS_EPOW                = 1,
> +    EVENT_CLASS_RESERVED            = 2,
> +    EVENT_CLASS_HOT_PLUG            = 3,
> +    EVENT_CLASS_IO                  = 4,
> +    EVENT_CLASS_MAX
> +} EventClassIndex;
> +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> +
> +static const char *event_names[EVENT_CLASS_MAX] = {
> +    [EVENT_CLASS_INTERNAL_ERRORS]       = "internal-errors",
> +    [EVENT_CLASS_EPOW]                  = "epow-events",
> +    [EVENT_CLASS_HOT_PLUG]              = "hot-plug-events",
> +    [EVENT_CLASS_IO]                    = "ibm,io-events",
> +};
> +
> +struct sPAPREventSource {
> +    const char *name;
> +    int irq;
> +    uint32_t mask;
> +    bool enabled;
> +};
> +
> +static sPAPREventSource *spapr_event_sources_new(void)
> +{
> +    sPAPREventSource *event_sources = g_new0(sPAPREventSource,
> +                                             EVENT_CLASS_MAX);
> +    int i;
> +
> +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> +        event_sources[i].name = event_names[i];

You don't really need to have the pointer to the name in
sPAPREventSource.  You only need it for building the DT, and you can
look up event_names in parallel just as easily there.

> +    }
>  
> -void spapr_dt_events(void *fdt, uint32_t check_exception_irq)
> +    return event_sources;
> +}
> +
> +static void spapr_event_sources_register(sPAPREventSource *event_sources,
> +                                        EventClassIndex index, int irq)
>  {
> -    int event_sources, epow_events;
> -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> +    /* we only support 1 irq per event class at the moment */
> +    g_assert(event_sources);
> +    g_assert(!event_sources[index].enabled);
> +    event_sources[index].irq = irq;
> +    event_sources[index].mask = EVENT_CLASS_MASK(index);
> +    event_sources[index].enabled = true;
> +}
> +
> +static const sPAPREventSource
> +*spapr_event_sources_get_source(sPAPREventSource *event_sources,
> +                                EventClassIndex index)

function return type on previous line or same line as the function
name is fine by me.  But please don't split it across lines as you
have here (with the '*' on the second line).

> +{
> +    g_assert(index < EVENT_CLASS_MAX);
> +    g_assert(event_sources);
> +
> +    return &event_sources[index];
> +}
> +
> +void spapr_dt_events(sPAPRMachineState *spapr, void *fdt)
> +{
> +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> +    int i, count = 0, event_sources;
> +    sPAPREventSource *events = spapr->event_sources;
> +
> +    g_assert(events);
>  
>      _FDT(event_sources = fdt_add_subnode(fdt, 0, "event-sources"));
>  
> -    _FDT(fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0));
> -    _FDT(fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2));
> -    _FDT(fdt_setprop(fdt, event_sources, "interrupt-ranges",
> -                     irq_ranges, sizeof(irq_ranges)));
> +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> +        int node_offset;
> +        uint32_t interrupts[2];
> +        const sPAPREventSource *source =
> +            spapr_event_sources_get_source(events, i);
> +
> +        if (!source->enabled) {
> +            continue;
> +        }
> +
> +        interrupts[0] = cpu_to_be32(source->irq);
> +        interrupts[1] = 0;
> +
> +        _FDT(node_offset = fdt_add_subnode(fdt, event_sources, source->name));
> +        _FDT(fdt_setprop(fdt, node_offset, "interrupts", interrupts,
> +                         sizeof(interrupts)));
> +
> +        irq_ranges[count++] = interrupts[0];
> +        irq_ranges[count++] = cpu_to_be32(1);
> +    }
> +
> +    irq_ranges[count] = cpu_to_be32(count);
> +    count++;
> +
> +    _FDT((fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0)));
> +    _FDT((fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2)));
> +    _FDT((fdt_setprop(fdt, event_sources, "interrupt-ranges",
> +                      irq_ranges, count * sizeof(uint32_t))));
> +}
> +
> +static const sPAPREventSource
> +*rtas_event_log_to_source(sPAPRMachineState *spapr, int log_type)

And here.

> +{
> +    const sPAPREventSource *source;
> +
> +    g_assert(spapr->event_sources);
> +
> +    switch (log_type) {
> +    case RTAS_LOG_TYPE_HOTPLUG:
> +        source = spapr_event_sources_get_source(spapr->event_sources,
> +                                                EVENT_CLASS_HOT_PLUG);
> +        if (spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT)) {
> +            g_assert(source->enabled);
> +            break;
> +        }
> +        /* fall back to epow for legacy hotplug interrupt source */
> +    case RTAS_LOG_TYPE_EPOW:
> +        source = spapr_event_sources_get_source(spapr->event_sources,
> +                                                EVENT_CLASS_EPOW);
> +        break;
> +    default:
> +        source = NULL;
> +    }
>  
> -    _FDT(epow_events = fdt_add_subnode(fdt, event_sources, "epow-events"));
> -    _FDT(fdt_setprop(fdt, epow_events, "interrupts",
> -                     interrupts, sizeof(interrupts)));
> +    return source;
> +}
> +
> +static int rtas_event_log_to_irq(sPAPRMachineState *spapr, int log_type)
> +{
> +    const sPAPREventSource *source;
> +
> +    source = rtas_event_log_to_source(spapr, log_type);
> +    g_assert(source);
> +    g_assert(source->enabled);
> +
> +    return source->irq;
>  }
>  
>  static void rtas_event_log_queue(int log_type, void *data, bool exception)
> @@ -247,19 +361,15 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      sPAPREventLogEntry *entry = NULL;
>  
> -    /* we only queue EPOW events atm. */
> -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> -        return NULL;
> -    }
> -
>      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> +        const sPAPREventSource *source =
> +            rtas_event_log_to_source(spapr, entry->log_type);
> +
>          if (entry->exception != exception) {
>              continue;
>          }
>  
> -        /* EPOW and hotplug events are surfaced in the same manner */
> -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> +        if (source->mask & event_mask) {
>              break;
>          }
>      }
> @@ -276,19 +386,15 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      sPAPREventLogEntry *entry = NULL;
>  
> -    /* we only queue EPOW events atm. */
> -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> -        return false;
> -    }
> -
>      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> +        const sPAPREventSource *source =
> +            rtas_event_log_to_source(spapr, entry->log_type);
> +
>          if (entry->exception != exception) {
>              continue;
>          }
>  
> -        /* EPOW and hotplug events are surfaced in the same manner */
> -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> +        if (source->mask & event_mask) {
>              return true;
>          }
>      }
> @@ -376,7 +482,9 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> +                                 rtas_event_log_to_irq(spapr,
> +                                                       RTAS_LOG_TYPE_EPOW)));
>  }
>  
>  static void spapr_hotplug_set_signalled(uint32_t drc_index)
> @@ -458,7 +566,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> +                                 rtas_event_log_to_irq(spapr,
> +                                                       RTAS_LOG_TYPE_HOTPLUG)));
>  }
>  
>  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> @@ -504,6 +614,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      uint64_t xinfo;
>      sPAPREventLogEntry *event;
>      struct rtas_error_log *hdr;
> +    int i;
>  
>      if ((nargs < 6) || (nargs > 7) || nret != 1) {
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> @@ -540,8 +651,14 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>       * do the latter here, since our code relies on edge-triggered
>       * interrupts.
>       */
> -    if (rtas_event_log_contains(mask, true)) {
> -        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> +        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
> +            const sPAPREventSource *source =
> +                spapr_event_sources_get_source(spapr->event_sources, i);
> +
> +            g_assert(source->enabled);
> +            qemu_irq_pulse(xics_get_qirq(spapr->xics, source->irq));
> +        }
>      }
>  
>      return;
> @@ -593,8 +710,19 @@ out_no_events:
>  void spapr_events_init(sPAPRMachineState *spapr)
>  {
>      QTAILQ_INIT(&spapr->pending_events);
> -    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, false,
> -                                            &error_fatal);
> +
> +    spapr->event_sources = spapr_event_sources_new();
> +
> +    spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
> +                                 xics_spapr_alloc(spapr->xics, 0, false,
> +                                                  &error_fatal));
> +
> +    if (spapr->use_hotplug_event_source) {
> +        spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
> +                                     xics_spapr_alloc(spapr->xics, 0, false,
> +                                                      &error_fatal));
> +    }

If I'm following this correctly, this means that if modern events are
enabled on the qemu side, then the event source will be registered and
thus advertised in the device tree, regardless of whether the guest
supports the new-style events.

However, if the guest didn't advertise modern events support, the
hotplug events sources would be unused, and hotplug events would be
surfaced through epow.

That seems like it should work, just verifying that having the
new-but-unused event source show up in the guest device tree is
intended behaviour.

>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 851f536..21af971 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -13,6 +13,7 @@ struct sPAPRPHBState;
>  struct sPAPRNVRAM;
>  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
> +typedef struct sPAPREventSource sPAPREventSource;
>  
>  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
>  #define SPAPR_ENTRY_POINT       0x100
> @@ -77,10 +78,10 @@ struct sPAPRMachineState {
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
>      bool cas_reboot;
>  
> -    uint32_t check_exception_irq;
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
>      bool use_hotplug_event_source;
> +    sPAPREventSource *event_sources;
>  
>      /* Migration state */
>      int htab_save_index;
> @@ -585,7 +586,7 @@ struct sPAPREventLogEntry {
>  };
>  
>  void spapr_events_init(sPAPRMachineState *sm);
> -void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
> +void spapr_dt_events(sPAPRMachineState *sm, void *fdt);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
>                                   bool cpu_update,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type Michael Roth
@ 2016-10-26  0:48   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:48 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 8136 bytes --]

On Mon, Oct 24, 2016 at 11:47:34PM -0500, Michael Roth wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Add support for DRC count indexed hotplug ID type which is primarily
> needed for memory hot unplug. This type allows for specifying the
> number of DRs that should be plugged/unplugged starting from a given
> DRC index.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> * updated rtas_event_log_v6_hp to reflect count/index field ordering
>   used in PAPR hotplug ACR
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr_events.c  | 76 ++++++++++++++++++++++++++++++++++++++++----------
>  include/hw/ppc/spapr.h |  4 +++
>  2 files changed, 65 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index b6b3511..596e991 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -175,6 +175,16 @@ struct epow_log_full {
>      struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +union drc_identifier {
> +    uint32_t index;
> +    uint32_t count;
> +    struct {
> +        uint32_t count;
> +        uint32_t index;
> +    } count_indexed;
> +    char name[1];
> +} QEMU_PACKED;
> +
>  struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
>      struct rtas_event_log_v6_section_header hdr;
> @@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
>  #define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
>  #define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED              4
>      uint8_t reserved;
> -    union {
> -        uint32_t index;
> -        uint32_t count;
> -        char name[1];
> -    } drc;
> +    union drc_identifier drc_id;
>  } QEMU_PACKED;
>  
>  struct hp_log_full {
> @@ -496,7 +503,7 @@ static void spapr_hotplug_set_signalled(uint32_t drc_index)
>  
>  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>                                      sPAPRDRConnectorType drc_type,
> -                                    uint32_t drc)
> +                                    union drc_identifier *drc_id)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      struct hp_log_full *new_hp;
> @@ -541,7 +548,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      case SPAPR_DR_CONNECTOR_TYPE_PCI:
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
>          if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
> -            spapr_hotplug_set_signalled(drc);
> +            spapr_hotplug_set_signalled(drc_id->index);
>          }
>          break;
>      case SPAPR_DR_CONNECTOR_TYPE_LMB:
> @@ -559,9 +566,18 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      }
>  
>      if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
> -        hp->drc.count = cpu_to_be32(drc);
> +        hp->drc_id.count = cpu_to_be32(drc_id->count);
>      } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
> -        hp->drc.index = cpu_to_be32(drc);
> +        hp->drc_id.index = cpu_to_be32(drc_id->index);
> +    } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
> +        /* we should not be using count_indexed value unless the guest
> +         * supports dedicated hotplug event source
> +         */
> +        g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
> +        hp->drc_id.count_indexed.count =
> +            cpu_to_be32(drc_id->count_indexed.count);
> +        hp->drc_id.count_indexed.index =
> +            cpu_to_be32(drc_id->count_indexed.index);
>      }
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> @@ -575,34 +591,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
>  {
>      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -    uint32_t index = drck->get_index(drc);
> +    union drc_identifier drc_id;
>  
> +    drc_id.index = drck->get_index(drc);
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
>  {
>      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -    uint32_t index = drck->get_index(drc);
> +    union drc_identifier drc_id;
>  
> +    drc_id.index = drck->get_index(drc);
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
>                                         uint32_t count)
>  {
> +    union drc_identifier drc_id;
> +
> +    drc_id.count = count;
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, count);
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
>                                            uint32_t count)
>  {
> +    union drc_identifier drc_id;
> +
> +    drc_id.count = count;
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, count);
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> +}
> +
> +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                            uint32_t count, uint32_t index)
> +{
> +    union drc_identifier drc_id;
> +
> +    drc_id.count_indexed.count = count;
> +    drc_id.count_indexed.index = index;
> +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
> +}
> +
> +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                               uint32_t count, uint32_t index)
> +{
> +    union drc_identifier drc_id;
> +
> +    drc_id.count_indexed.count = count;
> +    drc_id.count_indexed.index = index;
> +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 21af971..bd5bcf7 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -610,6 +610,10 @@ void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
>                                         uint32_t count);
>  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
>                                            uint32_t count);
> +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                            uint32_t count, uint32_t index);
> +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                               uint32_t count, uint32_t index);
>  void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
>  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>                                      sPAPRMachineState *spapr);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug Michael Roth
@ 2016-10-26  0:49   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:49 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

On Mon, Oct 24, 2016 at 11:47:35PM -0500, Michael Roth wrote:
> Commit 0a417869:
> 
>     spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type
> 
> dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
> single LMB count value. This was to avoid overrunning the guest EPOW
> event queue with hotplug events. This works fine, but relies on the
> guest exhaustively scanning for pluggable LMBs to satisfy the
> requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
> until all the LMBs associated with the DIMM are identified.
> 
> With newer support for dedicated hotplug event source, this queue
> exhaustion is no longer as much of an issue due to implementation
> details on the guest side, but we still try to avoid excessive hotplug
> events by now supporting both a count and a starting index to avoid
> unecessary work. This patch makes use of that approach when the
> capability is available.
> 
> Cc: bharata@linux.vnet.ibm.com
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index dc4224b..0b3aa2f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2202,14 +2202,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>      }
>  }
>  
> -static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
> -                           uint32_t node, Error **errp)
> +static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> +                           uint32_t node, bool dedicated_hp_event_source,
> +                           Error **errp)
>  {
>      sPAPRDRConnector *drc;
>      sPAPRDRConnectorClass *drck;
>      uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
>      int i, fdt_offset, fdt_size;
>      void *fdt;
> +    uint64_t addr = addr_start;
>  
>      for (i = 0; i < nr_lmbs; i++) {
>          drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> @@ -2228,7 +2230,17 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
>       * guest only in case of hotplugged memory
>       */
>      if (dev->hotplugged) {
> -       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
> +        if (dedicated_hp_event_source) {
> +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                    addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +            spapr_hotplug_req_add_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                                   nr_lmbs,
> +                                                   drck->get_index(drc));
> +        } else {
> +            spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                           nr_lmbs);
> +        }
>      }
>  }
>  
> @@ -2261,7 +2273,9 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> -    spapr_add_lmbs(dev, addr, size, node, &error_abort);
> +    spapr_add_lmbs(dev, addr, size, node,
> +                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> +                   &error_abort);
>  
>  out:
>      error_propagate(errp, local_err);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support
  2016-10-25  4:47 ` [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support Michael Roth
@ 2016-10-26  0:57   ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  0:57 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 8349 bytes --]

On Mon, Oct 24, 2016 at 11:47:36PM -0500, Michael Roth wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Add support to hot remove pc-dimm memory devices.
> 
> Since we're introducing a machine-level unplug_request hook, we also
> had handling for CPU unplug there as well to ensure CPU unplug
> continues to work as it did before.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> * add hooks to CAS/cmdline enablement of hotplug ACR support
> * add hook for CPU unplug
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c     | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/ppc/spapr_drc.c |  17 ++++++++
>  2 files changed, 135 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0b3aa2f..a4a6058 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2281,6 +2281,90 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> +typedef struct sPAPRDIMMState {
> +    uint32_t nr_lmbs;
> +} sPAPRDIMMState;
> +
> +static void spapr_lmb_release(DeviceState *dev, void *opaque)
> +{
> +    sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
> +    HotplugHandler *hotplug_ctrl = NULL;

No reason for the NULL initializer here - you set the variable
unconditionally below.

Otherwise,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> +    if (--ds->nr_lmbs) {
> +        return;
> +    }
> +
> +    g_free(ds);
> +
> +    /*
> +     * Now that all the LMBs have been removed by the guest, call the
> +     * pc-dimm unplug handler to cleanup up the pc-dimm device.
> +     */
> +    hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> +                           Error **errp)
> +{
> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +    int i;
> +    sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
> +    uint64_t addr = addr_start;
> +
> +    ds->nr_lmbs = nr_lmbs;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +        drck->detach(drc, dev, spapr_lmb_release, ds, errp);
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +
> +    drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                              nr_lmbs,
> +                                              drck->get_index(drc));
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)
> +{
> +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +
> +    pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +    uint64_t size = memory_region_size(mr);
> +    uint64_t addr;
> +
> +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
> +    spapr_del_lmbs(dev, addr, size, &error_abort);
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
>  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>                                      sPAPRMachineState *spapr)
>  {
> @@ -2354,10 +2438,42 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>  static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp)
>  {
> +    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
>      MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
>  
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        error_setg(errp, "Memory hot unplug not supported by sPAPR");
> +        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
> +            spapr_memory_unplug(hotplug_dev, dev, errp);
> +        } else {
> +            error_setg(errp, "Memory hot unplug not supported for this guest");
> +        }
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +        if (!mc->query_hotpluggable_cpus) {
> +            error_setg(errp, "CPU hot unplug not supported on this machine");
> +            return;
> +        }
> +        spapr_core_unplug(hotplug_dev, dev, errp);
> +    }
> +}
> +
> +static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> +                                                DeviceState *dev, Error **errp)
> +{
> +    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
> +    MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
> +            spapr_memory_unplug_request(hotplug_dev, dev, errp);
> +        } else {
> +            /* NOTE: this means there is a window after guest reset, prior to
> +             * CAS negotiation, where unplug requests will fail due to the
> +             * capability not being detected yet. This is a bit different than
> +             * the case with PCI unplug, where the events will be queued and
> +             * eventually handled by the guest after boot
> +             */
> +            error_setg(errp, "Memory hot unplug not supported for this guest");
> +        }
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>          if (!mc->query_hotpluggable_cpus) {
>              error_setg(errp, "CPU hot unplug not supported on this machine");
> @@ -2503,6 +2619,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      hc->plug = spapr_machine_device_plug;
>      hc->unplug = spapr_machine_device_unplug;
>      mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +    hc->unplug_request = spapr_machine_device_unplug_request;
>  
>      smc->dr_lmb_enabled = true;
>      smc->tcg_default_cpu = "POWER8";
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 6e54fd4..a0c44ee 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -68,6 +68,23 @@ static uint32_t set_isolation_state(sPAPRDRConnector *drc,
>          }
>      }
>  
> +    /*
> +     * Fail any requests to ISOLATE the LMB DRC if this LMB doesn't
> +     * belong to a DIMM device that is marked for removal.
> +     *
> +     * Currently the guest userspace tool drmgr that drives the memory
> +     * hotplug/unplug will just try to remove a set of 'removable' LMBs
> +     * in response to a hot unplug request that is based on drc-count.
> +     * If the LMB being removed doesn't belong to a DIMM device that is
> +     * actually being unplugged, fail the isolation request here.
> +     */
> +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_LMB) {
> +        if ((state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> +             !drc->awaiting_release) {
> +            return RTAS_OUT_HW_ERROR;
> +        }
> +    }
> +
>      drc->isolation_state = state;
>  
>      if (drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling
  2016-10-25 23:58   ` David Gibson
@ 2016-10-26  1:34     ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2016-10-26  1:34 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

[-- Attachment #1: Type: text/plain, Size: 4226 bytes --]

On Wed, Oct 26, 2016 at 10:58:09AM +1100, David Gibson wrote:
> On Mon, Oct 24, 2016 at 11:47:30PM -0500, Michael Roth wrote:
> > ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
> > negotiated between platform/guest. Currently we hardcode this property
> > in the boot-time device tree to advertise a single negotiated
> > capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
> > has been invoked or that capability has actually been negotiated.
> > 
> > Improve this by generating ibm,architecture-vec-5 based on the full
> > set of option vector 5 capabilities negotiated via CAS.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c              | 23 +++++++++++++++++------
> >  include/hw/ppc/spapr_ovec.h |  1 +
> >  2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 3b64580..828072a 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -659,14 +659,28 @@ static int spapr_dt_cas_updates(sPAPRMachineState *spapr, void *fdt,
> >                                  sPAPROptionVector *ov5_updates)
> >  {
> >      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> > -    int ret = 0;
> > +    int ret = 0, offset;
> >  
> >      /* Generate ibm,dynamic-reconfiguration-memory node if required */
> >      if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
> >          g_assert(smc->dr_lmb_enabled);
> >          ret = spapr_populate_drconf_memory(spapr, fdt);
> > +        if (ret) {
> > +            goto out;
> > +        }
> >      }
> >  
> > +    offset = fdt_path_offset(fdt, "/chosen");
> > +    if (offset < 0) {
> > +        offset = fdt_add_subnode(fdt, 0, "chosen");
> > +        if (offset < 0) {
> > +            return offset;
> > +        }
> 
> Just asserting offset >= 0 would be fine here.  We always create a
> /chosen node.

Duh.  Realised during testing that of course this *is* necessary for
the case where we're just making a CAS patch to the tree, rather than
building the whole tree.  I've reverted my ill-considered change in my
tree back to your original patch.

> 
> > +    }
> > +    ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
> > +                                 "ibm,architecture-vec-5");
> > +
> > +out:
> >      return ret;
> >  }
> >  
> > @@ -792,14 +806,9 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt)
> >      char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
> >      size_t cb = 0;
> >      char *bootlist = get_boot_devices_list(&cb, true);
> > -    unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
> >  
> >      _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
> >  
> > -    /* Set Form1_affinity */
> > -    _FDT(fdt_setprop(fdt, chosen, "ibm,architecture-vec-5",
> > -                     vec5, sizeof(vec5)));
> > -
> >      _FDT(fdt_setprop_string(fdt, chosen, "bootargs", machine->kernel_cmdline));
> >      _FDT(fdt_setprop_cell(fdt, chosen, "linux,initrd-start",
> >                            spapr->initrd_base));
> > @@ -1778,6 +1787,8 @@ static void ppc_spapr_init(MachineState *machine)
> >          spapr_validate_node_memory(machine, &error_fatal);
> >      }
> >  
> > +    spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> > +
> >      /* init CPUs */
> >      if (machine->cpu_model == NULL) {
> >          machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> > diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> > index 09afd59..47fa04c 100644
> > --- a/include/hw/ppc/spapr_ovec.h
> > +++ b/include/hw/ppc/spapr_ovec.h
> > @@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
> >  
> >  /* option vector 5 */
> >  #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
> > +#define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
> >  
> >  /* interfaces */
> >  sPAPROptionVector *spapr_ovec_new(void);
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source
  2016-10-26  0:42   ` David Gibson
@ 2016-10-26 21:44     ` Michael Roth
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Roth @ 2016-10-26 21:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, nfont, qemu-ppc, jallen, bharata

Quoting David Gibson (2016-10-25 19:42:18)
> On Mon, Oct 24, 2016 at 11:47:33PM -0500, Michael Roth wrote:
> > Hotplug events were previously delivered using an EPOW interrupt
> > and were queued by linux guests into a circular buffer. For traditional
> > EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> > events there are cases where this buffer can be exhausted, resulting
> > in the loss of hotplug events, resets, etc.
> > 
> > Newer-style hotplug event are delivered using a dedicated event source.
> > We enable this in supported guests by adding standard an additional
> > event source in the guest device-tree via /event-sources, and, if
> > the guest advertises support for the newer-style hotplug events,
> > using the corresponding interrupt to signal the available of
> > hotplug/unplug events.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   4 +-
> >  hw/ppc/spapr_events.c  | 202 ++++++++++++++++++++++++++++++++++++++++---------
> >  include/hw/ppc/spapr.h |   5 +-
> >  3 files changed, 170 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index a3ea140..dc4224b 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -973,7 +973,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >      }
> >  
> >      /* /event-sources */
> > -    spapr_dt_events(fdt, spapr->check_exception_irq);
> > +    spapr_dt_events(spapr, fdt);
> >  
> >      /* /rtas */
> >      spapr_dt_rtas(spapr, fdt);
> > @@ -1917,7 +1917,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      }
> >      g_free(filename);
> >  
> > -    /* Set up EPOW events infrastructure */
> > +    /* Set up RTAS event infrastructure */
> >      spapr_events_init(spapr);
> >  
> >      /* Set up the RTC RTAS interfaces */
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 89aa5a7..b6b3511 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -40,6 +40,7 @@
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qemu/help_option.h"
> >  #include "qemu/bcd.h"
> > +#include "hw/ppc/spapr_ovec.h"
> >  #include <libfdt.h>
> >  
> >  struct rtas_error_log {
> > @@ -206,27 +207,140 @@ struct hp_log_full {
> >      struct rtas_event_log_v6_hp hp;
> >  } QEMU_PACKED;
> >  
> > -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> > -#define EVENT_MASK_EPOW                      0x40000000
> > -#define EVENT_MASK_HOTPLUG                   0x10000000
> > -#define EVENT_MASK_IO                        0x08000000
> > +typedef enum EventClass {
> > +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > +    EVENT_CLASS_EPOW                = 1,
> > +    EVENT_CLASS_RESERVED            = 2,
> > +    EVENT_CLASS_HOT_PLUG            = 3,
> > +    EVENT_CLASS_IO                  = 4,
> > +    EVENT_CLASS_MAX
> > +} EventClassIndex;
> > +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> > +
> > +static const char *event_names[EVENT_CLASS_MAX] = {
> > +    [EVENT_CLASS_INTERNAL_ERRORS]       = "internal-errors",
> > +    [EVENT_CLASS_EPOW]                  = "epow-events",
> > +    [EVENT_CLASS_HOT_PLUG]              = "hot-plug-events",
> > +    [EVENT_CLASS_IO]                    = "ibm,io-events",
> > +};
> > +
> > +struct sPAPREventSource {
> > +    const char *name;
> > +    int irq;
> > +    uint32_t mask;
> > +    bool enabled;
> > +};
> > +
> > +static sPAPREventSource *spapr_event_sources_new(void)
> > +{
> > +    sPAPREventSource *event_sources = g_new0(sPAPREventSource,
> > +                                             EVENT_CLASS_MAX);
> > +    int i;
> > +
> > +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> > +        event_sources[i].name = event_names[i];
> 
> You don't really need to have the pointer to the name in
> sPAPREventSource.  You only need it for building the DT, and you can
> look up event_names in parallel just as easily there.
> 
> > +    }
> >  
> > -void spapr_dt_events(void *fdt, uint32_t check_exception_irq)
> > +    return event_sources;
> > +}
> > +
> > +static void spapr_event_sources_register(sPAPREventSource *event_sources,
> > +                                        EventClassIndex index, int irq)
> >  {
> > -    int event_sources, epow_events;
> > -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> > +    /* we only support 1 irq per event class at the moment */
> > +    g_assert(event_sources);
> > +    g_assert(!event_sources[index].enabled);
> > +    event_sources[index].irq = irq;
> > +    event_sources[index].mask = EVENT_CLASS_MASK(index);
> > +    event_sources[index].enabled = true;
> > +}
> > +
> > +static const sPAPREventSource
> > +*spapr_event_sources_get_source(sPAPREventSource *event_sources,
> > +                                EventClassIndex index)
> 
> function return type on previous line or same line as the function
> name is fine by me.  But please don't split it across lines as you
> have here (with the '*' on the second line).
> 
> > +{
> > +    g_assert(index < EVENT_CLASS_MAX);
> > +    g_assert(event_sources);
> > +
> > +    return &event_sources[index];
> > +}
> > +
> > +void spapr_dt_events(sPAPRMachineState *spapr, void *fdt)
> > +{
> > +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> > +    int i, count = 0, event_sources;
> > +    sPAPREventSource *events = spapr->event_sources;
> > +
> > +    g_assert(events);
> >  
> >      _FDT(event_sources = fdt_add_subnode(fdt, 0, "event-sources"));
> >  
> > -    _FDT(fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0));
> > -    _FDT(fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2));
> > -    _FDT(fdt_setprop(fdt, event_sources, "interrupt-ranges",
> > -                     irq_ranges, sizeof(irq_ranges)));
> > +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> > +        int node_offset;
> > +        uint32_t interrupts[2];
> > +        const sPAPREventSource *source =
> > +            spapr_event_sources_get_source(events, i);
> > +
> > +        if (!source->enabled) {
> > +            continue;
> > +        }
> > +
> > +        interrupts[0] = cpu_to_be32(source->irq);
> > +        interrupts[1] = 0;
> > +
> > +        _FDT(node_offset = fdt_add_subnode(fdt, event_sources, source->name));
> > +        _FDT(fdt_setprop(fdt, node_offset, "interrupts", interrupts,
> > +                         sizeof(interrupts)));
> > +
> > +        irq_ranges[count++] = interrupts[0];
> > +        irq_ranges[count++] = cpu_to_be32(1);
> > +    }
> > +
> > +    irq_ranges[count] = cpu_to_be32(count);
> > +    count++;
> > +
> > +    _FDT((fdt_setprop(fdt, event_sources, "interrupt-controller", NULL, 0)));
> > +    _FDT((fdt_setprop_cell(fdt, event_sources, "#interrupt-cells", 2)));
> > +    _FDT((fdt_setprop(fdt, event_sources, "interrupt-ranges",
> > +                      irq_ranges, count * sizeof(uint32_t))));
> > +}
> > +
> > +static const sPAPREventSource
> > +*rtas_event_log_to_source(sPAPRMachineState *spapr, int log_type)
> 
> And here.
> 
> > +{
> > +    const sPAPREventSource *source;
> > +
> > +    g_assert(spapr->event_sources);
> > +
> > +    switch (log_type) {
> > +    case RTAS_LOG_TYPE_HOTPLUG:
> > +        source = spapr_event_sources_get_source(spapr->event_sources,
> > +                                                EVENT_CLASS_HOT_PLUG);
> > +        if (spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT)) {
> > +            g_assert(source->enabled);
> > +            break;
> > +        }
> > +        /* fall back to epow for legacy hotplug interrupt source */
> > +    case RTAS_LOG_TYPE_EPOW:
> > +        source = spapr_event_sources_get_source(spapr->event_sources,
> > +                                                EVENT_CLASS_EPOW);
> > +        break;
> > +    default:
> > +        source = NULL;
> > +    }
> >  
> > -    _FDT(epow_events = fdt_add_subnode(fdt, event_sources, "epow-events"));
> > -    _FDT(fdt_setprop(fdt, epow_events, "interrupts",
> > -                     interrupts, sizeof(interrupts)));
> > +    return source;
> > +}
> > +
> > +static int rtas_event_log_to_irq(sPAPRMachineState *spapr, int log_type)
> > +{
> > +    const sPAPREventSource *source;
> > +
> > +    source = rtas_event_log_to_source(spapr, log_type);
> > +    g_assert(source);
> > +    g_assert(source->enabled);
> > +
> > +    return source->irq;
> >  }
> >  
> >  static void rtas_event_log_queue(int log_type, void *data, bool exception)
> > @@ -247,19 +361,15 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >      sPAPREventLogEntry *entry = NULL;
> >  
> > -    /* we only queue EPOW events atm. */
> > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > -        return NULL;
> > -    }
> > -
> >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > +        const sPAPREventSource *source =
> > +            rtas_event_log_to_source(spapr, entry->log_type);
> > +
> >          if (entry->exception != exception) {
> >              continue;
> >          }
> >  
> > -        /* EPOW and hotplug events are surfaced in the same manner */
> > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > +        if (source->mask & event_mask) {
> >              break;
> >          }
> >      }
> > @@ -276,19 +386,15 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >      sPAPREventLogEntry *entry = NULL;
> >  
> > -    /* we only queue EPOW events atm. */
> > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > -        return false;
> > -    }
> > -
> >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > +        const sPAPREventSource *source =
> > +            rtas_event_log_to_source(spapr, entry->log_type);
> > +
> >          if (entry->exception != exception) {
> >              continue;
> >          }
> >  
> > -        /* EPOW and hotplug events are surfaced in the same manner */
> > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > +        if (source->mask & event_mask) {
> >              return true;
> >          }
> >      }
> > @@ -376,7 +482,9 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >  
> >      rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > +                                 rtas_event_log_to_irq(spapr,
> > +                                                       RTAS_LOG_TYPE_EPOW)));
> >  }
> >  
> >  static void spapr_hotplug_set_signalled(uint32_t drc_index)
> > @@ -458,7 +566,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >  
> >      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > +                                 rtas_event_log_to_irq(spapr,
> > +                                                       RTAS_LOG_TYPE_HOTPLUG)));
> >  }
> >  
> >  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> > @@ -504,6 +614,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >      uint64_t xinfo;
> >      sPAPREventLogEntry *event;
> >      struct rtas_error_log *hdr;
> > +    int i;
> >  
> >      if ((nargs < 6) || (nargs > 7) || nret != 1) {
> >          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > @@ -540,8 +651,14 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >       * do the latter here, since our code relies on edge-triggered
> >       * interrupts.
> >       */
> > -    if (rtas_event_log_contains(mask, true)) {
> > -        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> > +        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
> > +            const sPAPREventSource *source =
> > +                spapr_event_sources_get_source(spapr->event_sources, i);
> > +
> > +            g_assert(source->enabled);
> > +            qemu_irq_pulse(xics_get_qirq(spapr->xics, source->irq));
> > +        }
> >      }
> >  
> >      return;
> > @@ -593,8 +710,19 @@ out_no_events:
> >  void spapr_events_init(sPAPRMachineState *spapr)
> >  {
> >      QTAILQ_INIT(&spapr->pending_events);
> > -    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, false,
> > -                                            &error_fatal);
> > +
> > +    spapr->event_sources = spapr_event_sources_new();
> > +
> > +    spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
> > +                                 xics_spapr_alloc(spapr->xics, 0, false,
> > +                                                  &error_fatal));
> > +
> > +    if (spapr->use_hotplug_event_source) {
> > +        spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_HOT_PLUG,
> > +                                     xics_spapr_alloc(spapr->xics, 0, false,
> > +                                                      &error_fatal));
> > +    }
> 
> If I'm following this correctly, this means that if modern events are
> enabled on the qemu side, then the event source will be registered and
> thus advertised in the device tree, regardless of whether the guest
> supports the new-style events.
> 
> However, if the guest didn't advertise modern events support, the
> hotplug events sources would be unused, and hotplug events would be
> surfaced through epow.

Yes, exactly.

> 
> That seems like it should work, just verifying that having the
> new-but-unused event source show up in the guest device tree is
> intended behaviour.

Yes, that's the intent. It's up to the OS to set up an interrupt handler
for the event source in cases where it intends to handle modern hotplug
events, but whether or not we opt to include it in the device tree
before vs. after negotiation (via CAS) is arbitrary.

There was actually a bug in the RFC due to me forgetting this source was,
as a result of the above, always enabled for pseries-2.8+. I'll add a
small comment here to help clarify.

> 
> >      spapr->epow_notifier.notify = spapr_powerdown_req;
> >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> >      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 851f536..21af971 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -13,6 +13,7 @@ struct sPAPRPHBState;
> >  struct sPAPRNVRAM;
> >  typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
> >  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
> > +typedef struct sPAPREventSource sPAPREventSource;
> >  
> >  #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
> >  #define SPAPR_ENTRY_POINT       0x100
> > @@ -77,10 +78,10 @@ struct sPAPRMachineState {
> >      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
> >      bool cas_reboot;
> >  
> > -    uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> >      bool use_hotplug_event_source;
> > +    sPAPREventSource *event_sources;
> >  
> >      /* Migration state */
> >      int htab_save_index;
> > @@ -585,7 +586,7 @@ struct sPAPREventLogEntry {
> >  };
> >  
> >  void spapr_events_init(sPAPRMachineState *sm);
> > -void spapr_dt_events(void *fdt, uint32_t check_exception_irq);
> > +void spapr_dt_events(sPAPRMachineState *sm, void *fdt);
> >  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
> >                                   target_ulong addr, target_ulong size,
> >                                   bool cpu_update,
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-10-26 21:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-25  4:47 [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support Michael Roth
2016-10-25  4:47 ` [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers Michael Roth
2016-10-25 23:54   ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 02/10] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
2016-10-25  4:47 ` [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets Michael Roth
2016-10-25 23:56   ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling Michael Roth
2016-10-25 23:58   ` David Gibson
2016-10-26  1:34     ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 05/10] spapr: update spapr hotplug documentation Michael Roth
2016-10-25  4:47 ` [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options Michael Roth
2016-10-26  0:25   ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source Michael Roth
2016-10-26  0:42   ` David Gibson
2016-10-26 21:44     ` Michael Roth
2016-10-25  4:47 ` [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type Michael Roth
2016-10-26  0:48   ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug Michael Roth
2016-10-26  0:49   ` David Gibson
2016-10-25  4:47 ` [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support Michael Roth
2016-10-26  0:57   ` David Gibson
2016-10-26  0:02 ` [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.