* [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support
@ 2016-10-12 23:13 Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers Michael Roth
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

This series is based on David's ppc-for-2.8 branch, and is also available from:

  https://github.com/mdroth/qemu/commits/spapr-hotplug-event-update

Patches 1-4 address various deficiencies in how we currently handle option
vectors via ibm,client-architecture-support. This is done here in preparation
for a new option vector bit introduced later in this series, as well as a
number of future option vector bits related to other features, but I can
break this out into a separate series if preferred.

Patches 5-8 add support for an updated event format for hotplug events,
which includes a new way to specify a range of DRCs/LMBs to hotplug/unplug
using a starting position and count, which is necessary for memory unplug.
This new event format is still in draft form, but is slated
for inclusion in PAPR/LoPAPR.

Patches 9-11 add support for memory unplug using the new event format.

In addition to kernel 4.8 or later, there are a number of patches required
to enable support on the guest kernel side. I've included the minimum set
of patches in my branch here:

   https://github.com/mdroth/linux/commits/spapr-hotplug-event-update

   *powerpc/pseries: advertise Hot Plug Event support to firmware
   powerpc/pseries: Implement indexed-count hotplug memory remove
   powerpc/pseries: Implement indexed-count hotplug memory add

Note that there is currently an issue that arises when attempting to
offline an LMB that was onlined using a guest kernel's auto-onlining
mechanism, which can prevent full completion of memory unplug requests.
This is being investigated, but for the purposes of testing this can
be worked around currently by disabling auto-onlining in guests via:

  "echo offline >/sys/devices/system/memory/auto_online_blocks"

and instead onlining the blocks manually or via udev.

 docs/specs/ppc-spapr-hotplug.txt |  55 ++++++++++---
 hw/ppc/Makefile.objs             |   2 +-
 hw/ppc/spapr.c                   | 237 ++++++++++++++++++++++++++++++++++++++++++++++++------
 hw/ppc/spapr_drc.c               |  17 ++++
 hw/ppc/spapr_events.c            | 222 ++++++++++++++++++++++++++++++++++++++++-----------
 hw/ppc/spapr_hcall.c             |  70 +++++++---------
 hw/ppc/spapr_ovec.c              | 244 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h           |  15 +++-
 include/hw/ppc/spapr_ovec.h      |  67 ++++++++++++++++
 9 files changed, 804 insertions(+), 125 deletions(-)


* [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  2:39   ` David Gibson
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.

Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".

The logical format of these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Documented bit 0 is the most
significant bit of its byte, so the bits of each of these bytes are
stored in reverse order. Additionally, the first byte of each option
vector encodes the length of the option vector, so byte offsets begin
at 1, and bit offsets at 0.

This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
option vectors and testing for individual capabilities.
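
As a rough illustration of the addressing described above (not part of
this patch), the sketch below works on a raw guest-format vector where
vec[0] is the length byte and documented bit 0 of each data byte is the
most significant bit. The names ov_bit(), raw_vec_set() and
raw_vec_test() are hypothetical; ov_bit() only mirrors the arithmetic of
the OV_BIT() macro introduced in spapr_ovec.h.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Map a documented (byte, bit) pair to a flat bit index. Byte offsets
 * start at 1 because byte 0 of the raw vector holds the length.
 * Hypothetical helper mirroring the OV_BIT() arithmetic. */
static unsigned ov_bit(unsigned byte, unsigned bit)
{
    assert(byte >= 1 && bit < BITS_PER_BYTE);
    return (byte - 1) * BITS_PER_BYTE + bit;
}

/* Set a documented option bit directly in a raw guest-format vector.
 * Documented bit 0 is the MSB of the byte, hence the (7 - bit) shift. */
static void raw_vec_set(uint8_t *vec, unsigned byte, unsigned bit)
{
    assert(byte >= 1 && bit < BITS_PER_BYTE);
    vec[byte] |= (uint8_t)(1u << (BITS_PER_BYTE - 1 - bit));
}

/* Test a documented option bit in a raw guest-format vector. */
static int raw_vec_test(const uint8_t *vec, unsigned byte, unsigned bit)
{
    assert(byte >= 1 && bit < BITS_PER_BYTE);
    return (vec[byte] >> (BITS_PER_BYTE - 1 - bit)) & 1;
}
```

With this layout, setting byte 2, bit 2 produces the mask 0x20 in
vec[2], while the corresponding flat index from ov_bit(2, 2) is 10.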

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/Makefile.objs        |   2 +-
 hw/ppc/spapr_ovec.c         | 244 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_ovec.h |  62 +++++++++++
 3 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_ovec.c
 create mode 100644 include/hw/ppc/spapr_ovec.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 99a0d4e..2e0b0c9 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
-obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
+obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
new file mode 100644
index 0000000..ddc19f5
--- /dev/null
+++ b/hw/ppc/spapr_ovec.c
@@ -0,0 +1,244 @@
+/*
+ * QEMU SPAPR Architecture Option Vector Helper Functions
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * Authors:
+ *  Bharata B Rao     <bharata@linux.vnet.ibm.com>
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/ppc/spapr_ovec.h"
+#include "qemu/bitmap.h"
+#include "exec/address-spaces.h"
+#include "qemu/error-report.h"
+#include <libfdt.h>
+
+/* #define DEBUG_SPAPR_OVEC */
+
+#ifdef DEBUG_SPAPR_OVEC
+#define DPRINTFN(fmt, ...) \
+    do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTFN(fmt, ...) \
+    do { } while (0)
+#endif
+
+#define OV_MAXBYTES 256 /* not including length byte */
+#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
+
+/* we *could* work with bitmaps directly, but handling the bitmap privately
+ * allows us to more safely make assumptions about the bitmap size and
+ * simplify the calling code somewhat
+ */
+struct sPAPROptionVector {
+    unsigned long *bitmap;
+};
+
+static sPAPROptionVector *spapr_ovec_from_bitmap(unsigned long *bitmap)
+{
+    sPAPROptionVector *ov;
+
+    g_assert(bitmap);
+
+    ov = g_new0(sPAPROptionVector, 1);
+    ov->bitmap = bitmap;
+
+    return ov;
+}
+
+sPAPROptionVector *spapr_ovec_new(void)
+{
+    return spapr_ovec_from_bitmap(bitmap_new(OV_MAXBITS));
+}
+
+sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
+{
+    sPAPROptionVector *ov;
+
+    g_assert(ov_orig);
+
+    ov = spapr_ovec_new();
+    bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
+
+    return ov;
+}
+
+void spapr_ovec_intersect(sPAPROptionVector *ov,
+                          sPAPROptionVector *ov1,
+                          sPAPROptionVector *ov2)
+{
+    g_assert(ov);
+    g_assert(ov1);
+    g_assert(ov2);
+
+    bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
+}
+
+/* returns true if options bits were removed, false otherwise */
+bool spapr_ovec_diff(sPAPROptionVector *ov,
+                     sPAPROptionVector *ov_old,
+                     sPAPROptionVector *ov_new)
+{
+    unsigned long *change_mask = bitmap_new(OV_MAXBITS);
+    unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
+    bool bits_were_removed = false;
+
+    g_assert(ov);
+    g_assert(ov_old);
+    g_assert(ov_new);
+
+    bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
+    bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
+    bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
+
+    if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
+        bits_were_removed = true;
+    }
+
+    g_free(change_mask);
+    g_free(removed_bits);
+
+    return bits_were_removed;
+}
+
+void spapr_ovec_cleanup(sPAPROptionVector *ov)
+{
+    if (ov) {
+        g_free(ov->bitmap);
+        g_free(ov);
+    }
+}
+
+void spapr_ovec_set(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    set_bit(bitnr, ov->bitmap);
+}
+
+void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    clear_bit(bitnr, ov->bitmap);
+}
+
+bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert_cmpint(bitnr, <, OV_MAXBITS);
+
+    return test_bit(bitnr, ov->bitmap) ? true : false;
+}
+
+static void guest_byte_to_bitmap(uint8_t entry, unsigned long *bitmap,
+                                 long bitmap_offset)
+{
+    int i;
+
+    for (i = 0; i < BITS_PER_BYTE; i++) {
+        if (entry & (1 << (BITS_PER_BYTE - 1 - i))) {
+            bitmap_set(bitmap, bitmap_offset + i, 1);
+        }
+    }
+}
+
+static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
+{
+    uint8_t entry = 0;
+    int i;
+
+    for (i = 0; i < BITS_PER_BYTE; i++) {
+        if (test_bit(bitmap_offset + i, bitmap)) {
+            entry |= (1 << (BITS_PER_BYTE - 1 - i));
+        }
+    }
+
+    return entry;
+}
+
+static target_ulong vector_addr(target_ulong table_addr, int vector)
+{
+    uint16_t vector_count, vector_len;
+    int i;
+
+    vector_count = ldub_phys(&address_space_memory, table_addr) + 1;
+    if (vector > vector_count) {
+        return 0;
+    }
+    table_addr++; /* skip nr option vectors */
+
+    for (i = 0; i < vector - 1; i++) {
+        vector_len = ldub_phys(&address_space_memory, table_addr) + 2;
+        table_addr += vector_len;
+    }
+    return table_addr;
+}
+
+sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector)
+{
+    unsigned long *bitmap;
+    target_ulong addr;
+    uint16_t vector_len;
+    int i;
+
+    g_assert(table_addr);
+    g_assert_cmpint(vector, >=, 1); /* vector numbering starts at 1 */
+
+    addr = vector_addr(table_addr, vector);
+    if (!addr) {
+        /* specified vector isn't present */
+        return NULL;
+    }
+
+    vector_len = ldub_phys(&address_space_memory, addr++) + 1;
+    if (vector_len >= OV_MAXBYTES) {
+        error_report("guest option vector length %i exceeds max of %i",
+                     vector_len, OV_MAXBYTES);
+    }
+    bitmap = bitmap_new(OV_MAXBITS);
+
+    for (i = 0; i < vector_len; i++) {
+        uint8_t entry = ldub_phys(&address_space_memory, addr + i);
+        if (entry) {
+            DPRINTFN("read guest vector %2d, byte %3d / %3d: 0x%.2x",
+                     vector, i + 1, vector_len, entry);
+            guest_byte_to_bitmap(entry, bitmap, i * BITS_PER_BYTE);
+        }
+    }
+
+    return spapr_ovec_from_bitmap(bitmap);
+}
+
+int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
+                           sPAPROptionVector *ov, const char *name)
+{
+    uint8_t vec[OV_MAXBYTES + 1];
+    uint16_t vec_len;
+    unsigned long lastbit;
+    int i;
+
+    g_assert(ov);
+
+    lastbit = MIN(find_last_bit(ov->bitmap, OV_MAXBITS), OV_MAXBITS - 1);
+    vec_len = lastbit / BITS_PER_BYTE + 2;
+    g_assert_cmpint(vec_len - 2, <=, UINT8_MAX);
+    vec[0] = vec_len - 2; /* guest expects length encoded as n - 2 */
+
+    for (i = 1; i < vec_len; i++) {
+        vec[i] = guest_byte_from_bitmap(ov->bitmap, (i - 1) * BITS_PER_BYTE);
+        if (vec[i]) {
+            DPRINTFN("encoding guest vector byte %3d / %3d: 0x%.2x",
+                     i, vec_len, vec[i]);
+        }
+    }
+
+    return fdt_setprop(fdt, fdt_offset, name, vec, vec_len);
+}
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
new file mode 100644
index 0000000..fba2d98
--- /dev/null
+++ b/include/hw/ppc/spapr_ovec.h
@@ -0,0 +1,62 @@
+/*
+ * QEMU SPAPR Option/Architecture Vector Definitions
+ *
+ * Each architecture option is organized/documented by the following
+ * in LoPAPR 1.1, Table 244:
+ *
+ *   <vector number>: the bit-vector in which the option is located
+ *   <vector byte>: the byte offset of the vector entry
+ *   <vector bit>: the bit offset within the vector entry
+ *
+ * where each vector entry can be one or more bytes.
+ *
+ * Firmware expects a somewhat literal encoding of this bit-vector
+ * structure, where each entry is stored in little-endian so that the
+ * byte ordering reflects that of the documentation, but where each bit
+ * offset is from "left-to-right" in the traditional representation of
+ * a byte value where the MSB is the left-most bit. Thus, each
+ * individual byte encodes the option bits in reverse order of the
+ * documented bit.
+ *
+ * These definitions/helpers attempt to abstract away this internal
+ * representation so that we can define/set/test for individual option
+ * bits using only the documented values. This is done mainly by relying
+ * on a bitmap to approximate the documented "bit-vector" structure and
+ * handling conversions to-from the internal representation under the
+ * covers.
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)
+#define __HW_SPAPR_OPTION_VECTORS_H__
+
+#include "cpu.h"
+
+typedef struct sPAPROptionVector sPAPROptionVector;
+
+#define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
+
+/* interfaces */
+sPAPROptionVector *spapr_ovec_new(void);
+sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
+void spapr_ovec_intersect(sPAPROptionVector *ov,
+                          sPAPROptionVector *ov1,
+                          sPAPROptionVector *ov2);
+bool spapr_ovec_diff(sPAPROptionVector *ov,
+                     sPAPROptionVector *ov_old,
+                     sPAPROptionVector *ov_new);
+void spapr_ovec_cleanup(sPAPROptionVector *ov);
+void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
+void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
+bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
+sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
+int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
+                           sPAPROptionVector *ov, const char *name);
+
+#endif /* !defined (__HW_SPAPR_OPTION_VECTORS_H__) */
-- 
1.9.1


* [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  3:02   ` David Gibson
  2016-10-14  7:10   ` Bharata B Rao
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 03/11] spapr: add option vector handling in CAS-generated resets Michael Roth
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. At present this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, we would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.
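
A minimal sketch of the negotiation this enables, assuming a 64-bit
word as a stand-in for the real bitmap (so only the first 8 data bytes
are representable). The mini_ovec type and the mini_* / MINI_* names are
hypothetical and exist only for illustration; the actual helpers are the
spapr_ovec_* interfaces from patch 01.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sPAPROptionVector: a 64-bit set indexed by
 * flat bit number. */
typedef struct {
    uint64_t bits;
} mini_ovec;

/* Same arithmetic as OV_BIT(): byte offsets start at 1. */
#define MINI_OV_BIT(byte, bit) (((byte) - 1) * 8 + (bit))
#define MINI_OV5_DRCONF_MEMORY MINI_OV_BIT(2, 2)

static void mini_ovec_set(mini_ovec *ov, unsigned bitnr)
{
    assert(bitnr < 64);
    ov->bits |= UINT64_C(1) << bitnr;
}

static bool mini_ovec_test(const mini_ovec *ov, unsigned bitnr)
{
    assert(bitnr < 64);
    return (ov->bits >> bitnr) & 1;
}

/* Negotiated set: options offered by the platform AND requested by the
 * guest. Anything only one side knows about drops out. */
static void mini_ovec_intersect(mini_ovec *out, const mini_ovec *platform,
                                const mini_ovec *guest)
{
    out->bits = platform->bits & guest->bits;
}
```

The caller parses the guest vector once, intersects it with the
platform's set, and then tests any number of capability bits on the
result without growing the parameter list for each new bit.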

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 10 ++++++--
 hw/ppc/spapr_hcall.c        | 56 ++++++++++++---------------------------------
 include/hw/ppc/spapr.h      |  5 +++-
 include/hw/ppc/spapr_ovec.h |  3 +++
 4 files changed, 30 insertions(+), 44 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 03e3803..934d6b2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -856,7 +856,7 @@ out:
 
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update, bool memory_update)
+                                 bool cpu_update)
 {
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
@@ -880,7 +880,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
     }
 
     /* Generate ibm,dynamic-reconfiguration-memory node if required */
-    if (memory_update && smc->dr_lmb_enabled) {
+    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
+        g_assert(smc->dr_lmb_enabled);
         _FDT((spapr_populate_drconf_memory(spapr, fdt)));
     }
 
@@ -1769,7 +1770,12 @@ static void ppc_spapr_init(MachineState *machine)
                                    DIV_ROUND_UP(max_cpus * smt, smp_threads),
                                    XICS_IRQS_SPAPR, &error_fatal);
 
+    /* Set up containers for ibm,client-set-architecture negotiated options */
+    spapr->ov5 = spapr_ovec_new();
+    spapr->ov5_cas = spapr_ovec_new();
+
     if (smc->dr_lmb_enabled) {
+        spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
         spapr_validate_node_memory(machine, &error_fatal);
     }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index c5e7e8c..f1d081b 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -11,6 +11,7 @@
 #include "trace.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
+#include "hw/ppc/spapr_ovec.h"
 
 struct SPRSyncState {
     int spr;
@@ -880,32 +881,6 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return ret;
 }
 
-/*
- * Return the offset to the requested option vector @vector in the
- * option vector table @table.
- */
-static target_ulong cas_get_option_vector(int vector, target_ulong table)
-{
-    int i;
-    char nr_vectors, nr_entries;
-
-    if (!table) {
-        return 0;
-    }
-
-    nr_vectors = (ldl_phys(&address_space_memory, table) >> 24) + 1;
-    if (!vector || vector > nr_vectors) {
-        return 0;
-    }
-    table++; /* skip nr option vectors */
-
-    for (i = 0; i < vector - 1; i++) {
-        nr_entries = ldl_phys(&address_space_memory, table) >> 24;
-        table += nr_entries + 2;
-    }
-    return table;
-}
-
 typedef struct {
     uint32_t cpu_version;
     Error *err;
@@ -961,23 +936,21 @@ static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, uint32_t pvr,
     }
 }
 
-#define OV5_DRCONF_MEMORY 0x20
-
 static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
                                                   sPAPRMachineState *spapr,
                                                   target_ulong opcode,
                                                   target_ulong *args)
 {
     target_ulong list = ppc64_phys_to_real(args[0]);
-    target_ulong ov_table, ov5;
+    target_ulong ov_table;
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu_);
     CPUState *cs;
-    bool cpu_match = false, cpu_update = true, memory_update = false;
+    bool cpu_match = false, cpu_update = true;
     unsigned old_cpu_version = cpu_->cpu_version;
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
-    char ov5_byte2;
+    sPAPROptionVector *ov5_guest;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -1033,19 +1006,20 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     /* For the future use: here @ov_table points to the first option vector */
     ov_table = list;
 
-    ov5 = cas_get_option_vector(5, ov_table);
-    if (!ov5) {
-        return H_SUCCESS;
-    }
+    ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
 
-    /* @list now points to OV 5 */
-    ov5_byte2 = ldub_phys(&address_space_memory, ov5 + 2);
-    if (ov5_byte2 & OV5_DRCONF_MEMORY) {
-        memory_update = true;
-    }
+    /* NOTE: there are actually a number of ov5 bits where input from the
+     * guest is always zero, and the platform/QEMU enables them independently
+     * of guest input. To model these properly we'd want some sort of mask,
+     * but since they only currently apply to memory migration as defined
+     * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
+     * to worry about this.
+     */
+    spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
+    spapr_ovec_cleanup(ov5_guest);
 
     if (spapr_h_cas_compose_response(spapr, args[1], args[2],
-                                     cpu_update, memory_update)) {
+                                     cpu_update)) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 39dadaa..6c20d28 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -6,6 +6,7 @@
 #include "hw/ppc/xics.h"
 #include "hw/ppc/spapr_drc.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/ppc/spapr_ovec.h"
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
@@ -66,6 +67,8 @@ struct sPAPRMachineState {
     uint64_t rtc_offset; /* Now used only during incoming migration */
     struct PPCTimebase tb;
     bool has_graphics;
+    sPAPROptionVector *ov5;
+    sPAPROptionVector *ov5_cas;
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
@@ -577,7 +580,7 @@ void spapr_events_init(sPAPRMachineState *sm);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update, bool memory_update);
+                                 bool cpu_update);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index fba2d98..09afd59 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -42,6 +42,9 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 
 #define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
 
+/* option vector 5 */
+#define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
+
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
 sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
-- 
1.9.1


* [Qemu-devel] [PATCH 03/11] spapr: add option vector handling in CAS-generated resets
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:15   ` David Gibson
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 04/11] spapr: improve ibm, architecture-vec-5 property handling Michael Roth
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities, however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated reset and the same device tree updates
as we did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_populate_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_populate_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported by both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_populate_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, we'll generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call
   spapr_populate_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For a
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from the previous call
   being accounted for in the initial boot-time device tree.
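
The two cases above can be modelled roughly as follows. This is a
simplified sketch, not the code in this patch: capability sets are plain
64-bit masks, cas_state, on_reset() and on_cas() are hypothetical names,
and the response_fits flag is an assumed stand-in for whether
spapr_h_cas_compose_response() managed to fit the update into the
guest's buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store the bits present only in new_caps into *added; return true if
 * any bit present in old_caps is missing from new_caps. */
static bool caps_diff(uint64_t *added, uint64_t old_caps, uint64_t new_caps)
{
    uint64_t changed = old_caps ^ new_caps;

    *added = new_caps & changed;
    return (old_caps & changed) != 0;
}

/* Hypothetical model of the state carried across resets. */
typedef struct {
    uint64_t negotiated; /* plays the role of spapr->ov5_cas */
    bool cas_reboot;     /* set when CAS itself requested the reset */
} cas_state;

/* Reset path (case 2): forget negotiated options unless the reset was
 * CAS-induced. Returns the set to fold into the boot-time device tree. */
static uint64_t on_reset(cas_state *s)
{
    if (!s->cas_reboot) {
        s->negotiated = 0;
    }
    s->cas_reboot = false;
    return s->negotiated;
}

/* CAS path (case 1): returns the options added since the previous CAS.
 * A reset is requested if options were removed or the update did not
 * fit (response_fits is an assumption of this sketch). */
static uint64_t on_cas(cas_state *s, uint64_t platform, uint64_t guest,
                       bool response_fits)
{
    uint64_t old_caps = s->negotiated;
    uint64_t added;

    s->negotiated = platform & guest;
    s->cas_reboot = caps_diff(&added, old_caps, s->negotiated);
    if (!s->cas_reboot && !response_fits) {
        s->cas_reboot = true;
    }
    return added;
}
```

In this model, a CAS call whose update does not fit sets cas_reboot, the
following on_reset() keeps the negotiated set for the boot-time device
tree, and the next CAS call with the same guest options reports nothing
new, so the reset loop is avoided.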

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         | 43 ++++++++++++++++++++++++++++++++++---------
 hw/ppc/spapr_hcall.c   | 22 ++++++++++++++++++----
 include/hw/ppc/spapr.h |  4 +++-
 3 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 934d6b2..460c7a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -854,13 +854,28 @@ out:
     return ret;
 }
 
+static int spapr_populate_cas_updates(sPAPRMachineState *spapr, void *fdt,
+                                      sPAPROptionVector *ov5_updates)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    int ret = 0;
+
+    /* Generate ibm,dynamic-reconfiguration-memory node if required */
+    if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
+        g_assert(smc->dr_lmb_enabled);
+        ret = spapr_populate_drconf_memory(spapr, fdt);
+    }
+
+    return ret;
+}
+
 int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update)
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates)
 {
     void *fdt, *fdt_skel;
     sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
 
     size -= sizeof(hdr);
 
@@ -879,11 +894,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
         _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
     }
 
-    /* Generate ibm,dynamic-reconfiguration-memory node if required */
-    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
-        g_assert(smc->dr_lmb_enabled);
-        _FDT((spapr_populate_drconf_memory(spapr, fdt)));
-    }
+    spapr_populate_cas_updates(spapr, fdt, ov5_updates);
 
     /* Pack resulting tree */
     _FDT((fdt_pack(fdt)));
@@ -904,7 +915,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
 static void spapr_finalize_fdt(sPAPRMachineState *spapr,
                                hwaddr fdt_addr,
                                hwaddr rtas_addr,
-                               hwaddr rtas_size)
+                               hwaddr rtas_size,
+                               sPAPROptionVector *ov5_updates)
 {
     MachineState *machine = MACHINE(qdev_get_machine());
     MachineClass *mc = MACHINE_GET_CLASS(machine);
@@ -1000,6 +1012,11 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
         }
     }
 
+    ret = spapr_populate_cas_updates(spapr, fdt, ov5_updates);
+    if (ret < 0) {
+        error_report("couldn't setup CAS properties fdt");
+    }
+
     _FDT((fdt_pack(fdt)));
 
     if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
@@ -1174,9 +1191,16 @@ static void ppc_spapr_reset(void)
     spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
     spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
 
+    /* if this reset wasn't generated by CAS, we should reset our
+     * negotiated options and start from scratch */
+    if (!spapr->cas_reboot) {
+        spapr_ovec_cleanup(spapr->ov5_cas);
+        spapr->ov5_cas = spapr_ovec_new();
+    }
+
     /* Load the fdt */
     spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
-                       spapr->rtas_size);
+                       spapr->rtas_size, spapr->ov5_cas);
 
     /* Copy RTAS over */
     cpu_physical_memory_write(spapr->rtas_addr, spapr->rtas_blob,
@@ -1189,6 +1213,7 @@ static void ppc_spapr_reset(void)
     first_cpu->halted = 0;
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
 
+    spapr->cas_reboot = false;
 }
 
 static void spapr_create_nvram(sPAPRMachineState *spapr)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f1d081b..d277813 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -950,7 +950,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
-    sPAPROptionVector *ov5_guest;
+    sPAPROptionVector *ov5_guest, *ov5_cas_old, *ov5_updates;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -1013,13 +1013,27 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
      * of guest input. To model these properly we'd want some sort of mask,
      * but since they only currently apply to memory migration as defined
      * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
-     * to worry about this.
+     * to worry about this for now.
      */
+    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
+    /* full range of negotiated ov5 capabilities */
     spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
     spapr_ovec_cleanup(ov5_guest);
+    /* capabilities that have been added since CAS-generated guest reset.
+     * if capabilities have since been removed, generate another reset
+     */
+    ov5_updates = spapr_ovec_new();
+    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
+                                        ov5_cas_old, spapr->ov5_cas);
+
+    if (!spapr->cas_reboot) {
+        spapr->cas_reboot =
+            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
+                                         ov5_updates);
+    }
+    spapr_ovec_cleanup(ov5_updates);
 
-    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
-                                     cpu_update)) {
+    if (spapr->cas_reboot) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6c20d28..27a3328 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -69,6 +69,7 @@ struct sPAPRMachineState {
     bool has_graphics;
     sPAPROptionVector *ov5;
     sPAPROptionVector *ov5_cas;
+    bool cas_reboot;
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
@@ -580,7 +581,8 @@ void spapr_events_init(sPAPRMachineState *sm);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
-                                 bool cpu_update);
+                                 bool cpu_update,
+                                 sPAPROptionVector *ov5_updates);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,
-- 
1.9.1
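
The reset decision in the h_client_architecture_support() hunk above can be sketched as follows. This is only an illustration of the semantics described by the code comment (newly added capabilities are collected, and a removed capability triggers another reset). The ovec_t type and ovec_diff() helper are hypothetical stand-ins, not the spapr_ovec implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an option vector: one bit per capability. */
typedef uint32_t ovec_t;

/*
 * Compare the capabilities negotiated before (old_caps) and after
 * (new_caps) a CAS call. Capabilities that are newly present are stored
 * in *added. Returns true if any previously negotiated capability has
 * been removed, which is the case that calls for another reset.
 */
static bool ovec_diff(ovec_t *added, ovec_t old_caps, ovec_t new_caps)
{
    assert(added != NULL);
    *added = new_caps & ~old_caps;
    return (old_caps & ~new_caps) != 0;
}
```

With old_caps = 0x1 and new_caps = 0x3 the helper reports 0x2 as added and no reset; with the arguments swapped it reports nothing added and requests a reset.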


* [Qemu-devel] [PATCH 04/11] spapr: improve ibm,architecture-vec-5 property handling
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (2 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 03/11] spapr: add option vector handling in CAS-generated resets Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 05/11] spapr: fix inheritance chain for default machine options Michael Roth
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.

Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
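
Note: a minimal sketch of the bit layout this patch relies on, not the spapr_ovec code itself. It assumes that byte N of the option vector lands at array offset N and that bit 0 is the most significant bit of a byte, which is what the removed hardcoded array {0, 0, 0, 0, 0, 0x80} and the new OV_BIT(5, 0) definition together suggest. VEC_BYTES, vec_set() and vec_test() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 8   /* hypothetical size, large enough for this example */

/* Set capability (byte, bit), where bit 0 is the most significant bit. */
static void vec_set(uint8_t *vec, size_t byte, unsigned bit)
{
    assert(byte < VEC_BYTES && bit < 8);
    vec[byte] |= (uint8_t)(0x80u >> bit);
}

/* Test whether capability (byte, bit) is set. */
static bool vec_test(const uint8_t *vec, size_t byte, unsigned bit)
{
    assert(byte < VEC_BYTES && bit < 8);
    return (vec[byte] & (0x80u >> bit)) != 0;
}
```

Under these assumptions, setting (5, 0) on a zeroed vector yields 0x80 at offset 5, matching the previously hardcoded Form 1 affinity value.
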
 hw/ppc/spapr.c              | 22 +++++++++++++++++-----
 include/hw/ppc/spapr_ovec.h |  1 +
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 460c7a8..3b2a459 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -285,7 +285,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     GString *qemu_hypertas = g_string_sized_new(256);
     uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(max_cpus)};
-    unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
     char *buf;
 
     add_str(hypertas, "hcall-pft");
@@ -351,9 +350,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     /* /chosen */
     _FDT((fdt_begin_node(fdt, "chosen")));
 
-    /* Set Form1_affinity */
-    _FDT((fdt_property(fdt, "ibm,architecture-vec-5", vec5, sizeof(vec5))));
-
     _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
     _FDT((fdt_property(fdt, "linux,initrd-start",
                        &start_prop, sizeof(start_prop))));
@@ -858,14 +854,28 @@ static int spapr_populate_cas_updates(sPAPRMachineState *spapr, void *fdt,
                                       sPAPROptionVector *ov5_updates)
 {
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
-    int ret = 0;
+    int ret = 0, offset;
 
     /* Generate ibm,dynamic-reconfiguration-memory node if required */
     if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
         g_assert(smc->dr_lmb_enabled);
         ret = spapr_populate_drconf_memory(spapr, fdt);
+        if (ret) {
+            goto out;
+        }
     }
 
+    offset = fdt_path_offset(fdt, "/chosen");
+    if (offset < 0) {
+        offset = fdt_add_subnode(fdt, 0, "chosen");
+        if (offset < 0) {
+            return offset;
+        }
+    }
+    ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
+                                 "ibm,architecture-vec-5");
+
+out:
     return ret;
 }
 
@@ -1804,6 +1814,8 @@ static void ppc_spapr_init(MachineState *machine)
         spapr_validate_node_memory(machine, &error_fatal);
     }
 
+    spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
+
     /* init CPUs */
     if (machine->cpu_model == NULL) {
         machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index 09afd59..47fa04c 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 
 /* option vector 5 */
 #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
+#define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
 
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
-- 
1.9.1


* [Qemu-devel] [PATCH 05/11] spapr: fix inheritance chain for default machine options
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (3 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 04/11] spapr: improve ibm,architecture-vec-5 property handling Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:34   ` David Gibson
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 06/11] spapr: update spapr hotplug documentation Michael Roth
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

Rather than machine instances having backward-compatible option
defaults that need to be repeatedly re-enabled for every new machine
type we introduce, we set the defaults appropriate for newer machine
types, then add code to explicitly disable instance options as needed
to maintain compatibility with older machine types.

Currently pseries-2.5 does not inherit from pseries-2.6 in this
fashion, which is okay for now since we do not yet have any
instance compatibility options for pseries-2.6+.

We will make use of this in future patches though, so fix it here.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
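
Note: the chaining described above can be sketched in isolation as follows. The Machine type, its new_feature field, and the function names are hypothetical; the sketch only assumes that the newest machine type enables an option by default and that 2.7 is the first type that must disable it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical machine state with a single instance option. */
typedef struct Machine {
    bool new_feature;   /* enabled by default for the newest machine type */
} Machine;

static void machine_init(Machine *m)
{
    assert(m != NULL);
    m->new_feature = true;
}

/* In this sketch, 2.7 is the newest type that must disable the option. */
static void machine_2_7_instance_options(Machine *m)
{
    m->new_feature = false;
}

/* Each older type first applies the options of the next-newer type. */
static void machine_2_6_instance_options(Machine *m)
{
    machine_2_7_instance_options(m);
}

static void machine_2_5_instance_options(Machine *m)
{
    machine_2_6_instance_options(m);
}
```

Because every older type calls into the next-newer one, a compatibility setting added to 2.7 reaches 2.6 and 2.5 without being repeated.
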
 hw/ppc/spapr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3b2a459..f8cde92 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2544,6 +2544,7 @@ DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
 
 static void spapr_machine_2_6_instance_options(MachineState *machine)
 {
+    spapr_machine_2_7_instance_options(machine);
 }
 
 static void spapr_machine_2_6_class_options(MachineClass *mc)
@@ -2568,6 +2569,7 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", false);
 
 static void spapr_machine_2_5_instance_options(MachineState *machine)
 {
+    spapr_machine_2_6_instance_options(machine);
 }
 
 static void spapr_machine_2_5_class_options(MachineClass *mc)
-- 
1.9.1


* [Qemu-devel] [PATCH 06/11] spapr: update spapr hotplug documentation
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (4 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 05/11] spapr: fix inheritance chain for default machine options Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:35   ` David Gibson
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options Michael Roth
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

This updates the existing documentation to reflect recent updates to
the hotplug event structure, which are in draft form but slated
for inclusion in PAPR/LoPAPR.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 docs/specs/ppc-spapr-hotplug.txt | 55 +++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
index 631b0ca..f57e2a0 100644
--- a/docs/specs/ppc-spapr-hotplug.txt
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -233,12 +233,27 @@ tools by host-level management such as an HMC. This level of management is not
 applicable to PowerKVM, hence the reason for extending the notification
 framework to support hotplug events.
 
-Note that these events are not yet formally part of the PAPR+ specification,
-but support for this format has already been implemented in DR-related
-guest tools such as powerpc-utils/librtas, as well as kernel patches that have
-been submitted to handle in-kernel processing of memory/cpu-related hotplug
-events[1], and is planned for formal inclusion is PAPR+ specification. The
-hotplug-specific payload is QEMU implemented as follows (with all values
+The format for these EPOW-signalled events is described below under
+"hotplug/unplug event structure". Note that these events are not
+formally part of the PAPR+ specification, and have been superseded by a
+newer format, also described below under "hotplug/unplug event structure",
+and so are now deemed a "legacy" format. The formats are similar, but the
+"modern" format contains additional fields/flags, which are denoted for the
+purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards.
+
+QEMU should assume support only for "legacy" fields/flags unless the guest
+advertises support for the "modern" format via ibm,client-architecture-support
+hcall by setting byte 5, bit 6 of its ibm,architecture-vec-5 option vector
+structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events,
+"modern" format events are surfaced to the guest via check-exception RTAS calls,
+but use a dedicated event source to signal the guest. This event source is
+advertised to the guest by the addition of a "hot-plug-events" node under
+"/event-sources" node of the guest's device tree using the standard format
+described in LoPAPR v11, B.6.12.1.
+
+== hotplug/unplug event structure ==
+
+The hotplug-specific payload in QEMU is implemented as follows (with all values
 encoded in big-endian format):
 
 struct rtas_event_log_v6_hp {
@@ -263,14 +278,23 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_ACTION_ADD       1
 #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
     uint8_t hotplug_action;             /* action (add/remove) */
-#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
-#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
-#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
+#define RTAS_LOG_V6_HP_ID_DRC_NAME          1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
+#ifdef GUEST_SUPPORTS_MODERN
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
+#endif
     uint8_t hotplug_identifier;         /* type of the resource identifier,
                                          * which serves as the discriminator
                                          * for the 'drc' union field below
                                          */
+#ifdef GUEST_SUPPORTS_MODERN
+    uint8_t capabilities;               /* capability flags, currently unused
+                                         * by QEMU
+                                         */
+#else
     uint8_t reserved;
+#endif
     union {
         uint32_t index;                 /* DRC index of resource to take action
                                          * on
@@ -278,6 +302,19 @@ struct rtas_event_log_v6_hp {
         uint32_t count;                 /* number of DR resources to take
                                          * action on (guest chooses which)
                                          */
+#ifdef GUEST_SUPPORTS_MODERN
+        struct {
+            uint32_t count;             /* number of DR resources to take
+                                         * action on
+                                         */
+            uint32_t index;             /* DRC index of first resource to take
+                                         * action on. guest will take action
+                                         * on DRC index <index> through
+                                         * DRC index <index + count - 1> in
+                                         * sequential order
+                                         */
+        } count_indexed;
+#endif
         char name[1];                   /* string representing the name of the
                                          * DRC to take action on
                                          */
-- 
1.9.1


* [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (5 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 06/11] spapr: update spapr hotplug documentation Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:38   ` David Gibson
  2016-10-14  8:37   ` Bharata B Rao
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source Michael Roth
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

This adds machine options of the form:

  -machine pseries,legacy-hotplug-events=true
  -machine pseries,legacy-hotplug-events=false

to denote whether or not we wish to force the use of "legacy" style
hotplug events, which are surfaced through EPOW interrupts instead of
a dedicated interrupt source, and which lack certain features that are
necessary mainly for memory unplug support.

If false, QEMU will still default to "legacy" style unless the guest
advertises support for the newer events via the
ibm,client-architecture-support hcall during early boot.

For pseries-2.7 and earlier we default to true; for newer machine
types we default to false.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
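
Note: a small sketch of how the option and the guest's advertisement combine, as described above. The property is stored inverted (use_hotplug_event_source), as in the patch. HotplugConfig and use_modern_events() are hypothetical names that summarize the commit message; the patch itself only sets the OV5_HP_EVT bit in ov5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct HotplugConfig {
    bool use_hotplug_event_source;  /* inverse of legacy-hotplug-events */
} HotplugConfig;

/* Property setter: the user-visible value is the negation of the field. */
static void set_legacy_hotplug_events(HotplugConfig *cfg, bool legacy)
{
    assert(cfg != NULL);
    cfg->use_hotplug_event_source = !legacy;
}

/* Property getter: report the user-visible "legacy" value. */
static bool get_legacy_hotplug_events(const HotplugConfig *cfg)
{
    assert(cfg != NULL);
    return !cfg->use_hotplug_event_source;
}

/* Modern events need both the machine option and guest support. */
static bool use_modern_events(const HotplugConfig *cfg, bool guest_advertised)
{
    assert(cfg != NULL);
    return cfg->use_hotplug_event_source && guest_advertised;
}
```

With legacy-hotplug-events=false, a guest that never advertises support still receives legacy events; with legacy-hotplug-events=true, legacy events are used regardless of what the guest advertises.
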
 hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h      |  1 +
 include/hw/ppc/spapr_ovec.h |  1 +
 3 files changed, 33 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f8cde92..d80a6fa 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
 
     spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
 
+    /* use dedicated HP event source if guest supports it */
+    if (spapr->use_hotplug_event_source) {
+        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
+    }
+
     /* init CPUs */
     if (machine->cpu_model == NULL) {
         machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
@@ -2172,16 +2177,39 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
     spapr->kvm_type = g_strdup(value);
 }
 
+static bool spapr_get_legacy_hotplug_events(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return !spapr->use_hotplug_event_source;
+}
+
+static void spapr_set_legacy_hotplug_events(Object *obj, bool value,
+                                            Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->use_hotplug_event_source = !value;
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
 
     spapr->htab_fd = -1;
+    spapr->use_hotplug_event_source = true;
     object_property_add_str(obj, "kvm-type",
                             spapr_get_kvm_type, spapr_set_kvm_type, NULL);
     object_property_set_description(obj, "kvm-type",
                                     "Specifies the KVM virtualization mode (HV, PR)",
                                     NULL);
+    object_property_add_bool(obj, "legacy-hotplug-events",
+                            spapr_get_legacy_hotplug_events,
+                            spapr_set_legacy_hotplug_events,
+                            NULL);
+    object_property_set_description(obj, "legacy-hotplug-events",
+                                    "Use deprecated EPOW mechanism for hotplug events",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -2518,6 +2546,9 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
 
 static void spapr_machine_2_7_instance_options(MachineState *machine)
 {
+    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
+    spapr->use_hotplug_event_source = false;
 }
 
 static void spapr_machine_2_7_class_options(MachineClass *mc)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 27a3328..d1a4a14 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -74,6 +74,7 @@ struct sPAPRMachineState {
     uint32_t check_exception_irq;
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
+    bool use_hotplug_event_source;
 
     /* Migration state */
     int htab_save_index;
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index 47fa04c..92167c6 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 /* option vector 5 */
 #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
 #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
+#define OV5_HP_EVT              OV_BIT(6, 5)
 
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
-- 
1.9.1


* [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (6 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:56   ` David Gibson
  2016-10-14  8:46   ` Bharata B Rao
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type Michael Roth
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

Hotplug events were previously delivered using an EPOW interrupt
and were queued by Linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.

Newer-style hotplug events are delivered using a dedicated event source.
We enable this in supported guests by adding an additional standard
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the availability of
hotplug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
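
Note: the event class mask and the fallback to the EPOW source can be sketched on their own as below. The class indices follow the enum added by this patch; the Source type, class_mask() and hotplug_source() are simplified, hypothetical stand-ins for the EventSource handling in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    CLASS_INTERNAL_ERRORS = 0,
    CLASS_EPOW            = 1,
    CLASS_RESERVED        = 2,
    CLASS_HOT_PLUG        = 3,
    CLASS_IO              = 4,
    CLASS_MAX
};

/* Class index 0 maps to the most significant bit of the 32-bit mask. */
static uint32_t class_mask(unsigned index)
{
    assert(index < 32);
    return UINT32_C(1) << (31 - index);
}

typedef struct Source {
    int irq;
    bool enabled;
} Source;

/* Use the dedicated hotplug source if enabled, else fall back to EPOW. */
static const Source *hotplug_source(const Source *sources)
{
    assert(sources != NULL);
    if (sources[CLASS_HOT_PLUG].enabled) {
        return &sources[CLASS_HOT_PLUG];
    }
    return &sources[CLASS_EPOW];
}
```

The masks computed this way for the EPOW and hotplug classes are 0x40000000 and 0x10000000, the same values as the EVENT_MASK_EPOW and EVENT_MASK_HOTPLUG constants removed by this patch.
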
 hw/ppc/spapr.c         |  10 ++--
 hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
 include/hw/ppc/spapr.h |   3 +-
 3 files changed, 120 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d80a6fa..2037222 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
                                    hwaddr initrd_size,
                                    hwaddr kernel_size,
                                    bool little_endian,
-                                   const char *kernel_cmdline,
-                                   uint32_t epow_irq)
+                                   const char *kernel_cmdline)
 {
     void *fdt;
     uint32_t start_prop = cpu_to_be32(initrd_base);
@@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     _FDT((fdt_end_node(fdt)));
 
     /* event-sources */
-    spapr_events_fdt_skel(fdt, epow_irq);
+    spapr_events_fdt_skel(fdt);
 
     /* /hypervisor node */
     if (kvm_enabled()) {
@@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
     }
     g_free(filename);
 
-    /* Set up EPOW events infrastructure */
+    /* Set up RTAS event infrastructure */
     spapr_events_init(spapr);
 
     /* Set up the RTC RTAS interfaces */
@@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
                                             kernel_size, kernel_le,
-                                            kernel_cmdline,
-                                            spapr->check_exception_irq);
+                                            kernel_cmdline);
     assert(spapr->fdt_skel != NULL);
 
     /* used by RTAS */
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 4c7b6ae..f8bbec6 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -40,6 +40,7 @@
 #include "hw/ppc/spapr_drc.h"
 #include "qemu/help_option.h"
 #include "qemu/bcd.h"
+#include "hw/ppc/spapr_ovec.h"
 #include <libfdt.h>
 
 struct rtas_error_log {
@@ -206,28 +207,104 @@ struct hp_log_full {
     struct rtas_event_log_v6_hp hp;
 } QEMU_PACKED;
 
-#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
-#define EVENT_MASK_EPOW                      0x40000000
-#define EVENT_MASK_HOTPLUG                   0x10000000
-#define EVENT_MASK_IO                        0x08000000
+typedef enum EventClassIndex {
+    EVENT_CLASS_INTERNAL_ERRORS     = 0,
+    EVENT_CLASS_EPOW                = 1,
+    EVENT_CLASS_RESERVED            = 2,
+    EVENT_CLASS_HOT_PLUG            = 3,
+    EVENT_CLASS_IO                  = 4,
+    EVENT_CLASS_MAX
+} EventClassIndex;
+
+#define EVENT_CLASS_MASK(index) (1U << (31 - (index)))
+
+typedef struct EventSource {
+    const char *name;
+    int irq;
+    uint32_t mask;
+    bool enabled;
+} EventSource;
+
+static EventSource event_source[EVENT_CLASS_MAX] = {
+    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
+    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
+    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
+    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
+};
+
+static void rtas_event_source_register(EventClassIndex index, int irq)
+{
+    /* we only support 1 irq per event class at the moment */
+    g_assert(!event_source[index].enabled);
+    event_source[index].irq = irq;
+    event_source[index].mask = EVENT_CLASS_MASK(index);
+    event_source[index].enabled = true;
+}
 
-void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
+void spapr_events_fdt_skel(void *fdt)
 {
-    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
-    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
+    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
+    int i, count = 0;
 
     _FDT((fdt_begin_node(fdt, "event-sources")));
 
+    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
+        /* TODO: what does 0 entail? */
+        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
+
+        if (!event_source[i].enabled) {
+            continue;
+        }
+
+        _FDT((fdt_begin_node(fdt, event_source[i].name)));
+        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
+        _FDT((fdt_end_node(fdt)));
+
+        irq_ranges[count++] = interrupts[0];
+        irq_ranges[count++] = cpu_to_be32(1);
+    }
+
+    /* TODO: confirm the count is the last expected element */
+    irq_ranges[count] = cpu_to_be32(count);
+    count++;
+
     _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
     _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
     _FDT((fdt_property(fdt, "interrupt-ranges",
-                       irq_ranges, sizeof(irq_ranges))));
+                       irq_ranges, count * sizeof(uint32_t))));
 
-    _FDT((fdt_begin_node(fdt, "epow-events")));
-    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
     _FDT((fdt_end_node(fdt)));
+}
 
-    _FDT((fdt_end_node(fdt)));
+static const EventSource *rtas_event_log_to_source(int log_type)
+{
+    const EventSource *source;
+
+    switch (log_type) {
+        case RTAS_LOG_TYPE_HOTPLUG:
+            source = &event_source[EVENT_CLASS_HOT_PLUG];
+            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
+                break;
+            }
+            /* fall back to epow for legacy hotplug interrupt source */
+        case RTAS_LOG_TYPE_EPOW:
+            source = &event_source[EVENT_CLASS_EPOW];
+            break;
+        default:
+            source = NULL;
+    }
+
+    return source;
+}
+
+static int rtas_event_log_to_irq(int log_type)
+{
+    const EventSource *source = rtas_event_log_to_source(log_type);
+
+    g_assert(source);
+    g_assert(source->enabled);
+
+    return source->irq;
 }
 
 static void rtas_event_log_queue(int log_type, void *data, bool exception)
@@ -248,19 +325,14 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = NULL;
 
-    /* we only queue EPOW events atm. */
-    if ((event_mask & EVENT_MASK_EPOW) == 0) {
-        return NULL;
-    }
-
     QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
+        const EventSource *source = rtas_event_log_to_source(entry->log_type);
+
         if (entry->exception != exception) {
             continue;
         }
 
-        /* EPOW and hotplug events are surfaced in the same manner */
-        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
-            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
+        if (source->mask & event_mask) {
             break;
         }
     }
@@ -277,19 +349,14 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     sPAPREventLogEntry *entry = NULL;
 
-    /* we only queue EPOW events atm. */
-    if ((event_mask & EVENT_MASK_EPOW) == 0) {
-        return false;
-    }
-
     QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
+        const EventSource *source = rtas_event_log_to_source(entry->log_type);
+
         if (entry->exception != exception) {
             continue;
         }
 
-        /* EPOW and hotplug events are surfaced in the same manner */
-        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
-            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
+        if (source->mask & event_mask) {
             return true;
         }
     }
@@ -377,7 +444,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
 
     rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
 
-    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->xics,
+                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_EPOW)));
 }
 
 static void spapr_hotplug_set_signalled(uint32_t drc_index)
@@ -459,7 +527,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
 
     rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
 
-    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->xics,
+                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_HOTPLUG)));
 }
 
 void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
@@ -505,6 +574,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     uint64_t xinfo;
     sPAPREventLogEntry *event;
     struct rtas_error_log *hdr;
+    int i;
 
     if ((nargs < 6) || (nargs > 7) || nret != 1) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -541,8 +611,11 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
      * do the latter here, since our code relies on edge-triggered
      * interrupts.
      */
-    if (rtas_event_log_contains(mask, true)) {
-        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
+    for (i = 0; i < EVENT_CLASS_MAX; i++) {
+        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
+            g_assert(event_source[i].enabled);
+            qemu_irq_pulse(xics_get_qirq(spapr->xics, event_source[i].irq));
+        }
     }
 
     return;
@@ -594,8 +667,17 @@ out_no_events:
 void spapr_events_init(sPAPRMachineState *spapr)
 {
     QTAILQ_INIT(&spapr->pending_events);
-    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, 0, false,
-                                            &error_fatal);
+
+    rtas_event_source_register(EVENT_CLASS_EPOW,
+                               xics_spapr_alloc(spapr->xics, 0, 0, false,
+                                                &error_fatal));
+
+    if (spapr->use_hotplug_event_source) {
+        rtas_event_source_register(EVENT_CLASS_HOT_PLUG,
+                                   xics_spapr_alloc(spapr->xics, 0, 0, false,
+                                                    &error_fatal));
+    }
+
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d1a4a14..2295ac6 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -71,7 +71,6 @@ struct sPAPRMachineState {
     sPAPROptionVector *ov5_cas;
     bool cas_reboot;
 
-    uint32_t check_exception_irq;
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
     bool use_hotplug_event_source;
@@ -579,7 +578,7 @@ struct sPAPREventLogEntry {
 };
 
 void spapr_events_init(sPAPRMachineState *sm);
-void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
+void spapr_events_fdt_skel(void *fdt);
 int spapr_h_cas_compose_response(sPAPRMachineState *sm,
                                  target_ulong addr, target_ulong size,
                                  bool cpu_update,
-- 
1.9.1


* [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (7 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  4:59   ` David Gibson
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 10/11] spapr: use count+index for memory hotplug Michael Roth
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.
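
As a rough illustration of the identifier layout described above, the
following standalone sketch encodes a count followed by a starting index
as two big-endian 32-bit words, matching the field order of the
count_indexed struct in this patch. put_be32() and encode_count_indexed()
are hypothetical names used only here; QEMU itself uses cpu_to_be32() on
the struct fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_to_be32(): write v as 4 big-endian bytes. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Encode a count-indexed identifier: the number of DRs first, then the
 * DRC index of the first DR in the range (illustrative helper only).
 */
static void encode_count_indexed(uint8_t out[8], uint32_t count,
                                 uint32_t index)
{
    assert(out != NULL);
    put_be32(out, count);      /* bytes 0..3: count */
    put_be32(out + 4, index);  /* bytes 4..7: starting DRC index */
}
```

A caller would fill an 8-byte buffer with, say, count 4 and the index of
the first LMB, and the guest would read the count from the first word.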

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
  used in PAPR hotplug ACR
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_events.c  | 74 ++++++++++++++++++++++++++++++++++++++++----------
 include/hw/ppc/spapr.h |  4 +++
 2 files changed, 63 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index f8bbec6..eeca800 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -175,6 +175,16 @@ struct epow_log_full {
     struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+union drc_identifier {
+    uint32_t index;
+    uint32_t count;
+    struct {
+        uint32_t count;
+        uint32_t index;
+    } count_indexed;
+    char name[1];
+} QEMU_PACKED;
+
 struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
     struct rtas_event_log_v6_section_header hdr;
@@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
 #define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
 #define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
 #define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED              4
     uint8_t reserved;
-    union {
-        uint32_t index;
-        uint32_t count;
-        char name[1];
-    } drc;
+    union drc_identifier drc_id;
 } QEMU_PACKED;
 
 struct hp_log_full {
@@ -457,7 +464,7 @@ static void spapr_hotplug_set_signalled(uint32_t drc_index)
 
 static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
                                     sPAPRDRConnectorType drc_type,
-                                    uint32_t drc)
+                                    union drc_identifier *drc_id)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     struct hp_log_full *new_hp;
@@ -502,7 +509,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     case SPAPR_DR_CONNECTOR_TYPE_PCI:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
         if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
-            spapr_hotplug_set_signalled(drc);
+            spapr_hotplug_set_signalled(drc_id->index);
         }
         break;
     case SPAPR_DR_CONNECTOR_TYPE_LMB:
@@ -520,9 +527,16 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     }
 
     if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
-        hp->drc.count = cpu_to_be32(drc);
+        hp->drc_id.count = cpu_to_be32(drc_id->count);
     } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
-        hp->drc.index = cpu_to_be32(drc);
+        hp->drc_id.index = cpu_to_be32(drc_id->index);
+    } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
+        /* we should not be using count_indexed value unless the guest
+         * supports dedicated hotplug event source
+         */
+        g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
+        hp->drc_id.count_indexed.count = cpu_to_be32(drc_id->count_indexed.count);
+        hp->drc_id.count_indexed.index = cpu_to_be32(drc_id->count_indexed.index);
     }
 
     rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
@@ -535,34 +549,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
 {
     sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
     sPAPRDRConnectorType drc_type = drck->get_type(drc);
-    uint32_t index = drck->get_index(drc);
+    union drc_identifier drc_id;
 
+    drc_id.index = drck->get_index(drc);
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
-                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
 {
     sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
     sPAPRDRConnectorType drc_type = drck->get_type(drc);
-    uint32_t index = drck->get_index(drc);
+    union drc_identifier drc_id;
 
+    drc_id.index = drck->get_index(drc);
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
-                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
                                        uint32_t count)
 {
+    union drc_identifier drc_id;
+
+    drc_id.count = count;
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
-                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, count);
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
 }
 
 void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
                                           uint32_t count)
 {
+    union drc_identifier drc_id;
+
+    drc_id.count = count;
     spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
-                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, count);
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
+}
+
+void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                            uint32_t count, uint32_t index)
+{
+    union drc_identifier drc_id;
+
+    drc_id.count_indexed.count = count;
+    drc_id.count_indexed.index = index;
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
+                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
+}
+
+void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                               uint32_t count, uint32_t index)
+{
+    union drc_identifier drc_id;
+
+    drc_id.count_indexed.count = count;
+    drc_id.count_indexed.index = index;
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
 
 static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 2295ac6..11a2597 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -602,6 +602,10 @@ void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
                                        uint32_t count);
 void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
                                           uint32_t count);
+void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                            uint32_t count, uint32_t index);
+void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
+                                               uint32_t count, uint32_t index);
 void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
                                     sPAPRMachineState *spapr);
-- 
1.9.1


* [Qemu-devel] [PATCH 10/11] spapr: use count+index for memory hotplug
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (8 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 11/11] spapr: Memory hot-unplug support Michael Roth
  2016-10-14  4:10 ` [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support no-reply
  11 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

Commit 0a417869:

    spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type

dropped per-DRC/per-LMB hotplug events in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.

With newer support for a dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unnecessary work. This patch makes use of that approach when the
capability is available.
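
A minimal sketch of how a DIMM's address range maps to the count and
starting LMB used by this approach. It assumes a 256MB LMB size (the
value of SPAPR_MEMORY_BLOCK_SIZE is not shown in this patch), and
struct lmb_range and lmb_range_for_dimm() are hypothetical names. The
result is only the connector id passed to spapr_dr_connector_by_id();
the DRC index reported to the guest still comes from drck->get_index().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed LMB size of 256MB; stands in for SPAPR_MEMORY_BLOCK_SIZE. */
#define LMB_SIZE (UINT64_C(1) << 28)

/* Hypothetical result type: first LMB connector id and number of LMBs. */
struct lmb_range {
    uint32_t first_id;
    uint32_t count;
};

/* Derive the LMB range covered by a DIMM at addr_start of the given size. */
static struct lmb_range lmb_range_for_dimm(uint64_t addr_start, uint64_t size)
{
    struct lmb_range r;

    /* This sketch only handles LMB-aligned DIMMs. */
    assert(addr_start % LMB_SIZE == 0);
    assert(size % LMB_SIZE == 0);

    r.first_id = (uint32_t)(addr_start / LMB_SIZE);
    r.count = (uint32_t)(size / LMB_SIZE);
    return r;
}
```

With a single event carrying both values, the guest can go straight to
the named range instead of probing every LMB with DR_ENTITY_SENSE.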

Cc: bharata@linux.vnet.ibm.com
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2037222..9af4268 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2232,14 +2232,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
-static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
-                           uint32_t node, Error **errp)
+static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
+                           uint32_t node, bool dedicated_hp_event_source,
+                           Error **errp)
 {
     sPAPRDRConnector *drc;
     sPAPRDRConnectorClass *drck;
     uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
     int i, fdt_offset, fdt_size;
     void *fdt;
+    uint64_t addr = addr_start;
 
     for (i = 0; i < nr_lmbs; i++) {
         drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
@@ -2258,7 +2260,16 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
      * guest only in case of hotplugged memory
      */
     if (dev->hotplugged) {
-       spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+        if (dedicated_hp_event_source) {
+            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                           addr_start / SPAPR_MEMORY_BLOCK_SIZE);
+            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+            spapr_hotplug_req_add_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                                   nr_lmbs,
+                                                   drck->get_index(drc));
+        } else {
+            spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+        }
     }
 }
 
@@ -2291,7 +2302,9 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    spapr_add_lmbs(dev, addr, size, node, &error_abort);
+    spapr_add_lmbs(dev, addr, size, node,
+                   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
+                   &error_abort);
 
 out:
     error_propagate(errp, local_err);
-- 
1.9.1


* [Qemu-devel] [PATCH 11/11] spapr: Memory hot-unplug support
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (9 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 10/11] spapr: use count+index for memory hotplug Michael Roth
@ 2016-10-12 23:13 ` Michael Roth
  2016-10-14  7:05   ` Bharata B Rao
  2016-10-14  4:10 ` [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support no-reply
  11 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-12 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, bharata, david, nfont, jallen

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Add support to hot remove pc-dimm memory devices.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* add hooks to CAS/cmdline enablement of hotplug ACR support
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c     | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_drc.c |  17 +++++++++
 2 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9af4268..180fa3d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2310,6 +2310,90 @@ out:
     error_propagate(errp, local_err);
 }
 
+typedef struct sPAPRDIMMState {
+    uint32_t nr_lmbs;
+} sPAPRDIMMState;
+
+static void spapr_lmb_release(DeviceState *dev, void *opaque)
+{
+    sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
+    HotplugHandler *hotplug_ctrl = NULL;
+
+    if (--ds->nr_lmbs) {
+        return;
+    }
+
+    g_free(ds);
+
+    /*
+     * Now that all the LMBs have been removed by the guest, call the
+     * pc-dimm unplug handler to clean up the pc-dimm device.
+     */
+    hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
+                           Error **errp)
+{
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
+    int i;
+    sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
+    uint64_t addr = addr_start;
+
+    ds->nr_lmbs = nr_lmbs;
+    for (i = 0; i < nr_lmbs; i++) {
+        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                addr / SPAPR_MEMORY_BLOCK_SIZE);
+        g_assert(drc);
+
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+        drck->detach(drc, dev, spapr_lmb_release, ds, errp);
+        addr += SPAPR_MEMORY_BLOCK_SIZE;
+    }
+
+    drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                              nr_lmbs,
+                                              drck->get_index(drc));
+}
+
+static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                Error **errp)
+{
+    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+
+    pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
+                                        DeviceState *dev, Error **errp)
+{
+    Error *local_err = NULL;
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+    uint64_t size = memory_region_size(mr);
+    uint64_t addr;
+
+    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    spapr_del_lmbs(dev, addr, size, &error_abort);
+out:
+    error_propagate(errp, local_err);
+}
+
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
                                     sPAPRMachineState *spapr)
 {
@@ -2383,10 +2467,21 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
+    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
     MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        error_setg(errp, "Memory hot unplug not supported by sPAPR");
+        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
+            spapr_memory_unplug(hotplug_dev, dev, errp);
+        } else {
+            /* NOTE: this means there is a window after guest reset, prior to
+             * CAS negotiation, where unplug requests will fail due to the
+             * capability not being detected yet. This is a bit different than
+             * the case with PCI unplug, where the events will be queued and
+             * eventually handled by the guest after boot
+             */
+            error_setg(errp, "Memory hot unplug not supported for this guest");
+        }
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         if (!mc->query_hotpluggable_cpus) {
             error_setg(errp, "CPU hot unplug not supported on this machine");
@@ -2396,6 +2491,14 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
     }
 }
 
+static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
+                                                DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        spapr_memory_unplug_request(hotplug_dev, dev, errp);
+    }
+}
+
 static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
                                           DeviceState *dev, Error **errp)
 {
@@ -2482,6 +2585,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->plug = spapr_machine_device_plug;
     hc->unplug = spapr_machine_device_unplug;
     mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+    hc->unplug_request = spapr_machine_device_unplug_request;
 
     smc->dr_lmb_enabled = true;
     smc->tcg_default_cpu = "POWER8";
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 6e54fd4..a0c44ee 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -68,6 +68,23 @@ static uint32_t set_isolation_state(sPAPRDRConnector *drc,
         }
     }
 
+    /*
+     * Fail any requests to ISOLATE the LMB DRC if this LMB doesn't
+     * belong to a DIMM device that is marked for removal.
+     *
+     * Currently the guest userspace tool drmgr that drives the memory
+     * hotplug/unplug will just try to remove a set of 'removable' LMBs
+     * in response to a hot unplug request that is based on drc-count.
+     * If the LMB being removed doesn't belong to a DIMM device that is
+     * actually being unplugged, fail the isolation request here.
+     */
+    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_LMB) {
+        if ((state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
+             !drc->awaiting_release) {
+            return RTAS_OUT_HW_ERROR;
+        }
+    }
+
     drc->isolation_state = state;
 
     if (drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
-- 
1.9.1


* Re: [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers Michael Roth
@ 2016-10-14  2:39   ` David Gibson
  2016-10-14 17:49     ` Michael Roth
  0 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2016-10-14  2:39 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

[-- Attachment #1: Type: text/plain, Size: 14216 bytes --]

On Wed, Oct 12, 2016 at 06:13:49PM -0500, Michael Roth wrote:
> PAPR guests advertise their capabilities to the platform by passing
> an ibm,architecture-vec structure via an
> ibm,client-architecture-support hcall as described by LoPAPR v11,
> B.6.2.3. during early boot.
> 
> Using this information, the platform enables the capabilities it
> supports, then encodes a subset of those enabled capabilities (the
> 5th option vector of the ibm,architecture-vec structure passed to
> ibm,client-architecture-support) into the guest device tree via
> "/chosen/ibm,architecture-vec-5".
> 
> The logical format of these option vectors is a bit-vector,
> where individual bits are addressed/documented based on the byte-wise
> offset from the beginning of the bit-vector, followed by the bit-wise
> index starting from the byte-wise offset. Thus the bits of each of
> these bytes are stored in reverse order. Additionally, the first
> byte of each option vector encodes the length of the option vector,
> so byte offsets begin at 1, and bit offsets at 0.

Heh.. pity qemu doesn't use the ccan bitmap module
(http://ccodearchive.net/info/bitmap.html).  By design it always
stores the bitmaps in IBM bit number ordering, because that's most
obvious to a human reading a memory dump (for the purpose of bit
vectors - in most situations the IBM numbering is dumb).
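
For readers unfamiliar with the numbering discussed here, a small
standalone sketch: documented bit 0 of a byte is its most significant
bit, and documented byte offsets start at 1. ov_bit() mirrors the
OV_BIT() macro from the patch; ovec_byte_has_bit() is a hypothetical
helper that is not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/*
 * Flat bit number for a documented (byte, bit) pair; bytes are 1-based
 * because byte 0 of the guest vector holds the length.
 */
static unsigned ov_bit(unsigned byte, unsigned bit)
{
    assert(byte >= 1);
    assert(bit < BITS_PER_BYTE);
    return (byte - 1) * BITS_PER_BYTE + bit;
}

/*
 * Test a documented bit within one guest-encoded byte. Bit 0 is the MSB
 * (mask 0x80) and bit 7 is the LSB (mask 0x01).
 */
static int ovec_byte_has_bit(uint8_t guest_byte, unsigned bit)
{
    assert(bit < BITS_PER_BYTE);
    return (guest_byte >> (BITS_PER_BYTE - 1 - bit)) & 1;
}
```

This is the same shift that guest_byte_to_bitmap() applies in the
patch, written for a single bit.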

> This is not very intuitive for the purposes of mapping these bits to
> a particular documented capability, so this patch introduces a set
> of abstractions that encapsulate the work of parsing/encoding these
> options vectors and testing for individual capabilities.
> 
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

A handful of small nits.

> ---
>  hw/ppc/Makefile.objs        |   2 +-
>  hw/ppc/spapr_ovec.c         | 244 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_ovec.h |  62 +++++++++++
>  3 files changed, 307 insertions(+), 1 deletion(-)
>  create mode 100644 hw/ppc/spapr_ovec.c
>  create mode 100644 include/hw/ppc/spapr_ovec.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 99a0d4e..2e0b0c9 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> -obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> +obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> new file mode 100644
> index 0000000..ddc19f5
> --- /dev/null
> +++ b/hw/ppc/spapr_ovec.c
> @@ -0,0 +1,244 @@
> +/*
> + * QEMU SPAPR Architecture Option Vector Helper Functions
> + *
> + * Copyright IBM Corp. 2016
> + *
> + * Authors:
> + *  Bharata B Rao     <bharata@linux.vnet.ibm.com>
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/ppc/spapr_ovec.h"
> +#include "qemu/bitmap.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/error-report.h"
> +#include <libfdt.h>
> +
> +/* #define DEBUG_SPAPR_OVEC */
> +
> +#ifdef DEBUG_SPAPR_OVEC
> +#define DPRINTFN(fmt, ...) \
> +    do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTFN(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +#define OV_MAXBYTES 256 /* not including length byte */
> +#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
> +
> +/* we *could* work with bitmaps directly, but handling the bitmap privately
> + * allows us to more safely make assumptions about the bitmap size and
> + * simplify the calling code somewhat
> + */
> +struct sPAPROptionVector {
> +    unsigned long *bitmap;
> +};
> +
> +static sPAPROptionVector *spapr_ovec_from_bitmap(unsigned long *bitmap)
> +{
> +    sPAPROptionVector *ov;
> +
> +    g_assert(bitmap);
> +
> +    ov = g_new0(sPAPROptionVector, 1);
> +    ov->bitmap = bitmap;
> +
> +    return ov;
> +}
> +
> +sPAPROptionVector *spapr_ovec_new(void)
> +{
> +    return spapr_ovec_from_bitmap(bitmap_new(OV_MAXBITS));
> +}
> +
> +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
> +{
> +    sPAPROptionVector *ov;
> +
> +    g_assert(ov_orig);
> +
> +    ov = spapr_ovec_new();
> +    bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
> +
> +    return ov;
> +}
> +
> +void spapr_ovec_intersect(sPAPROptionVector *ov,
> +                          sPAPROptionVector *ov1,
> +                          sPAPROptionVector *ov2)
> +{
> +    g_assert(ov);
> +    g_assert(ov1);
> +    g_assert(ov2);
> +
> +    bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
> +}
> +
> +/* returns true if options bits were removed, false otherwise */
> +bool spapr_ovec_diff(sPAPROptionVector *ov,
> +                     sPAPROptionVector *ov_old,
> +                     sPAPROptionVector *ov_new)
> +{
> +    unsigned long *change_mask = bitmap_new(OV_MAXBITS);
> +    unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
> +    bool bits_were_removed = false;
> +
> +    g_assert(ov);
> +    g_assert(ov_old);
> +    g_assert(ov_new);
> +
> +    bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
> +    bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
> +    bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
> +
> +    if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
> +        bits_were_removed = true;
> +    }
> +
> +    g_free(change_mask);
> +    g_free(removed_bits);
> +
> +    return bits_were_removed;
> +}
> +
> +void spapr_ovec_cleanup(sPAPROptionVector *ov)
> +{
> +    if (ov) {
> +        g_free(ov->bitmap);
> +        g_free(ov);
> +    }
> +}
> +
> +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    set_bit(bitnr, ov->bitmap);
> +}
> +
> +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    clear_bit(bitnr, ov->bitmap);
> +}
> +
> +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> +
> +    return test_bit(bitnr, ov->bitmap) ? true : false;
> +}
> +
> +static void guest_byte_to_bitmap(uint8_t entry, unsigned long *bitmap,
> +                                 long bitmap_offset)
> +{
> +    int i;
> +
> +    for (i = 0; i < BITS_PER_BYTE; i++) {
> +        if (entry & (1 << (BITS_PER_BYTE - 1 - i))) {
> +            bitmap_set(bitmap, bitmap_offset + i, 1);
> +        }
> +    }
> +}
> +
> +static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
> +{
> +    uint8_t entry = 0;
> +    int i;
> +
> +    for (i = 0; i < BITS_PER_BYTE; i++) {
> +        if (test_bit(bitmap_offset + i, bitmap)) {
> +            entry |= (1 << (BITS_PER_BYTE - 1 - i));
> +        }
> +    }
> +
> +    return entry;
> +}
> +
> +static target_ulong vector_addr(target_ulong table_addr, int vector)
> +{
> +    uint16_t vector_count, vector_len;
> +    int i;
> +
> +    vector_count = ldub_phys(&address_space_memory, table_addr) + 1;
> +    if (vector > vector_count) {
> +        return 0;
> +    }
> +    table_addr++; /* skip nr option vectors */
> +
> +    for (i = 0; i < vector - 1; i++) {
> +        vector_len = ldub_phys(&address_space_memory, table_addr) + 2;
> +        table_addr += vector_len;
> +    }
> +    return table_addr;
> +}
> +
> +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector)
> +{
> +    unsigned long *bitmap;
> +    target_ulong addr;
> +    uint16_t vector_len;
> +    int i;
> +
> +    g_assert(table_addr);
> +    g_assert_cmpint(vector, >=, 1); /* vector numbering starts at 1 */
> +
> +    addr = vector_addr(table_addr, vector);
> +    if (!addr) {
> +        /* specified vector isn't present */
> +        return NULL;
> +    }
> +
> +    vector_len = ldub_phys(&address_space_memory, addr++) + 1;

Here you use vector_len to be the number of bytes _not_ including the
length byte, but in other places you use the same name including the
length byte, which is a little confusing.

> +    if (vector_len >= OV_MAXBYTES) {

Do you mean >= here, or >?  If so, what's wrong with vector_len ==
256, I thought that was explicitly permitted in the encoding?  If not,
then there's no need for the test since a byte load + 1 can't possibly
exceed 256 (you could have an assert if you want).

> +        error_report("guest option vector length %i exceeds max of %i",
> +                     vector_len, OV_MAXBYTES);
> +    }
> +    bitmap = bitmap_new(OV_MAXBITS);
> +
> +    for (i = 0; i < vector_len; i++) {
> +        uint8_t entry = ldub_phys(&address_space_memory, addr + i);
> +        if (entry) {
> +            DPRINTFN("read guest vector %2d, byte %3d / %3d: 0x%.2x",
> +                     vector, i + 1, vector_len, entry);
> +            guest_byte_to_bitmap(entry, bitmap, i * BITS_PER_BYTE);
> +        }
> +    }
> +
> +    return spapr_ovec_from_bitmap(bitmap);

This is the only caller of spapr_ovec_from_bitmap().  You could
equally well just use spapr_ovec_new() here and reach in to populate the
bitmap.  Means you don't need to expose spapr_ovec_from_bitmap() which
is only safe if the supplied bitmap is the right size.

> +}
> +
> +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> +                           sPAPROptionVector *ov, const char *name)
> +{
> +    uint8_t vec[OV_MAXBYTES + 1];
> +    uint16_t vec_len;
> +    unsigned long lastbit;
> +    int i;
> +
> +    g_assert(ov);
> +
> +    lastbit = MIN(find_last_bit(ov->bitmap, OV_MAXBITS), OV_MAXBITS - 1);
> +    vec_len = lastbit / BITS_PER_BYTE + 2;

If no bits are set at all, find_last_bit() will return 2048, which
means you'll include a max size vector when you actually want a
minimum size vector.
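
One possible way to handle the empty case, sketched standalone under
the assumption stated above that find_last_bit() returns the bitmap
size when no bit is set. ovec_encoded_len() is a hypothetical helper,
not something in the patch; it returns the total length including the
length byte, and falls back to a single zero data byte for an empty
vector so that the "n - 2" length encoding stays valid.

```c
#include <assert.h>

#define BITS_PER_BYTE 8
#define OV_MAXBYTES 256UL /* not including length byte */
#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)

/*
 * Compute the encoded vector length (length byte + data bytes) from the
 * index of the last set bit. lastbit == OV_MAXBITS means "no bits set".
 */
static unsigned ovec_encoded_len(unsigned long lastbit)
{
    assert(lastbit <= OV_MAXBITS);

    if (lastbit == OV_MAXBITS) {
        /* empty bitmap: length byte plus one zero data byte */
        return 2;
    }

    /* +1 to turn a byte index into a byte count, +1 for the length byte */
    return (unsigned)(lastbit / BITS_PER_BYTE) + 2;
}
```

With this shape the MIN() clamp in the quoted code would no longer be
needed, since the out-of-range return value is handled explicitly.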

> +    g_assert_cmpint(vec_len - 2, <=, UINT8_MAX);
> +    vec[0] = vec_len - 2; /* guest expects length encoded as n - 2 */
> +
> +    for (i = 1; i < vec_len; i++) {
> +        vec[i] = guest_byte_from_bitmap(ov->bitmap, (i - 1) * BITS_PER_BYTE);
> +        if (vec[i]) {
> +            DPRINTFN("encoding guest vector byte %3d / %3d: 0x%.2x",
> +                     i, vec_len, vec[i]);
> +        }
> +    }
> +
> +    return fdt_setprop(fdt, fdt_offset, name, vec, vec_len);
> +}
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> new file mode 100644
> index 0000000..fba2d98
> --- /dev/null
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -0,0 +1,62 @@
> +/*
> + * QEMU SPAPR Option/Architecture Vector Definitions
> + *
> + * Each architecture option is organized/documented by the following
> + * in LoPAPR 1.1, Table 244:
> + *
> + *   <vector number>: the bit-vector in which the option is located
> + *   <vector byte>: the byte offset of the vector entry
> + *   <vector bit>: the bit offset within the vector entry
> + *
> + * where each vector entry can be one or more bytes.
> + *
> + * Firmware expects a somewhat literal encoding of this bit-vector
> + * structure, where each entry is stored in little-endian so that the
> + * byte ordering reflects that of the documentation, but where each bit
> + * offset is from "left-to-right" in the traditional representation of
> + * a byte value where the MSB is the left-most bit. Thus, each
> + * individual byte encodes the option bits in reverse order of the
> + * documented bit.
> + *
> + * These definitions/helpers attempt to abstract away this internal
> + * representation so that we can define/set/test for individual option
> + * bits using only the documented values. This is done mainly by relying
> + * on a bitmap to approximate the documented "bit-vector" structure and
> + * handling conversions to/from the internal representation under the
> + * covers.
> + *
> + * Copyright IBM Corp. 2016
> + *
> + * Authors:
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)
> +#define __HW_SPAPR_OPTION_VECTORS_H__
> +
> +#include "cpu.h"
> +
> +typedef struct sPAPROptionVector sPAPROptionVector;
> +
> +#define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
> +
> +/* interfaces */
> +sPAPROptionVector *spapr_ovec_new(void);
> +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> +void spapr_ovec_intersect(sPAPROptionVector *ov,
> +                          sPAPROptionVector *ov1,
> +                          sPAPROptionVector *ov2);
> +bool spapr_ovec_diff(sPAPROptionVector *ov,
> +                     sPAPROptionVector *ov_old,
> +                     sPAPROptionVector *ov_new);
> +void spapr_ovec_cleanup(sPAPROptionVector *ov);
> +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
> +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
> +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
> +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
> +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> +                           sPAPROptionVector *ov, const char *name);
> +
> +#endif /* !defined (__HW_SPAPR_OPTION_VECTORS_H__) */
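
To make the bit numbering in the header comment concrete, here is a
small sketch of the "left-to-right" encoding it describes. The helpers
guest_mask_for_bit() and guest_byte_from_flags() are hypothetical and
only illustrate the mapping; they are not the patch's
guest_byte_from_bitmap(). OV_BIT() is copied from the header with extra
parentheses around its arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE 8
#define OV_BIT(byte, bit) (((byte) - 1) * BITS_PER_BYTE + (bit))

/* Hypothetical helper: mask within a guest-visible byte for a
 * documented bit offset (0..7), where offset 0 is the MSB. */
static uint8_t guest_mask_for_bit(int bit)
{
    assert(bit >= 0 && bit < BITS_PER_BYTE);
    return (uint8_t)(0x80u >> bit);
}

/* Hypothetical helper: build one guest-visible byte from eight
 * documented bits, where flags[0] is documented bit 0. */
uint8_t guest_byte_from_flags(const bool flags[BITS_PER_BYTE])
{
    uint8_t byte = 0;
    int i;

    for (i = 0; i < BITS_PER_BYTE; i++) {
        if (flags[i]) {
            byte |= guest_mask_for_bit(i);
        }
    }
    return byte;
}
```

For example, an option documented at byte 2, bit 2 gets bitmap index
OV_BIT(2, 2) == 10, and its mask within the second data byte is
0x80 >> 2 == 0x20.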

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
@ 2016-10-14  3:02   ` David Gibson
  2016-10-14  4:20     ` David Gibson
  2016-10-14  7:10   ` Bharata B Rao
  1 sibling, 1 reply; 35+ messages in thread
From: David Gibson @ 2016-10-14  3:02 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:50PM -0500, Michael Roth wrote:
> Currently we access individual bytes of an option vector via
> ldub_phys() to test for the presence of a particular capability
> within that byte. Currently this is only done for the "dynamic
> reconfiguration memory" capability bit. If that bit is present,
> we pass a boolean value to spapr_h_cas_compose_response()
> to generate a modified device tree segment with the additional
> properties required to enable this functionality.
> 
> As more capability bits are added, we would need to modify the
> code to add additional option vector accesses and extend the
> param list for spapr_h_cas_compose_response() to include similar
> boolean values for these parameters.
> 
> Avoid this by switching to spapr_ovec_* helpers so we can do all
> the parsing in one shot and then test for these additional bits
> within spapr_h_cas_compose_response() directly.
> 
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
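
For readers following along, the negotiation described in the commit
message amounts to intersecting what the platform supports with what
the guest advertises. A minimal sketch, assuming a simplified
byte-array vector; the OptionVector type and the ovec_*() helpers below
are hypothetical and are not the spapr_ovec_* implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OV_MAXBYTES 256
#define OV_MAXBITS  (OV_MAXBYTES * 8)

/* Hypothetical simplified vector: bytes kept in guest order, with
 * documented bit 0 of each byte stored in the MSB. */
typedef struct {
    uint8_t bytes[OV_MAXBYTES];
} OptionVector;

void ovec_set(OptionVector *ov, int bitnr)
{
    assert(ov && bitnr >= 0 && bitnr < OV_MAXBITS);
    ov->bytes[bitnr / 8] |= (uint8_t)(0x80u >> (bitnr % 8));
}

bool ovec_test(const OptionVector *ov, int bitnr)
{
    assert(ov && bitnr >= 0 && bitnr < OV_MAXBITS);
    return (ov->bytes[bitnr / 8] & (0x80u >> (bitnr % 8))) != 0;
}

/* out = a AND b: only options present in both vectors survive. */
void ovec_intersect(OptionVector *out, const OptionVector *a,
                    const OptionVector *b)
{
    int i;

    assert(out && a && b);
    for (i = 0; i < OV_MAXBYTES; i++) {
        out->bytes[i] = a->bytes[i] & b->bytes[i];
    }
}
```

The caller can then test any number of negotiated bits on the result
without a new boolean parameter for each capability.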

> ---
>  hw/ppc/spapr.c              | 10 ++++++--
>  hw/ppc/spapr_hcall.c        | 56 ++++++++++++---------------------------------
>  include/hw/ppc/spapr.h      |  5 +++-
>  include/hw/ppc/spapr_ovec.h |  3 +++
>  4 files changed, 30 insertions(+), 44 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 03e3803..934d6b2 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -856,7 +856,7 @@ out:
>  
>  int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update, bool memory_update)
> +                                 bool cpu_update)
>  {
>      void *fdt, *fdt_skel;
>      sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> @@ -880,7 +880,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>      }
>  
>      /* Generate ibm,dynamic-reconfiguration-memory node if required */
> -    if (memory_update && smc->dr_lmb_enabled) {
> +    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
> +        g_assert(smc->dr_lmb_enabled);
>          _FDT((spapr_populate_drconf_memory(spapr, fdt)));
>      }
>  
> @@ -1769,7 +1770,12 @@ static void ppc_spapr_init(MachineState *machine)
>                                     DIV_ROUND_UP(max_cpus * smt, smp_threads),
>                                     XICS_IRQS_SPAPR, &error_fatal);
>  
> +    /* Set up containers for ibm,client-set-architecture negotiated options */
> +    spapr->ov5 = spapr_ovec_new();
> +    spapr->ov5_cas = spapr_ovec_new();
> +
>      if (smc->dr_lmb_enabled) {
> +        spapr_ovec_set(spapr->ov5, OV5_DRCONF_MEMORY);
>          spapr_validate_node_memory(machine, &error_fatal);
>      }
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index c5e7e8c..f1d081b 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -11,6 +11,7 @@
>  #include "trace.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_ppc.h"
> +#include "hw/ppc/spapr_ovec.h"
>  
>  struct SPRSyncState {
>      int spr;
> @@ -880,32 +881,6 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      return ret;
>  }
>  
> -/*
> - * Return the offset to the requested option vector @vector in the
> - * option vector table @table.
> - */
> -static target_ulong cas_get_option_vector(int vector, target_ulong table)
> -{
> -    int i;
> -    char nr_vectors, nr_entries;
> -
> -    if (!table) {
> -        return 0;
> -    }
> -
> -    nr_vectors = (ldl_phys(&address_space_memory, table) >> 24) + 1;
> -    if (!vector || vector > nr_vectors) {
> -        return 0;
> -    }
> -    table++; /* skip nr option vectors */
> -
> -    for (i = 0; i < vector - 1; i++) {
> -        nr_entries = ldl_phys(&address_space_memory, table) >> 24;
> -        table += nr_entries + 2;
> -    }
> -    return table;
> -}
> -
>  typedef struct {
>      uint32_t cpu_version;
>      Error *err;
> @@ -961,23 +936,21 @@ static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, uint32_t pvr,
>      }
>  }
>  
> -#define OV5_DRCONF_MEMORY 0x20
> -
>  static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>                                                    sPAPRMachineState *spapr,
>                                                    target_ulong opcode,
>                                                    target_ulong *args)
>  {
>      target_ulong list = ppc64_phys_to_real(args[0]);
> -    target_ulong ov_table, ov5;
> +    target_ulong ov_table;
>      PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu_);
>      CPUState *cs;
> -    bool cpu_match = false, cpu_update = true, memory_update = false;
> +    bool cpu_match = false, cpu_update = true;
>      unsigned old_cpu_version = cpu_->cpu_version;
>      unsigned compat_lvl = 0, cpu_version = 0;
>      unsigned max_lvl = get_compat_level(cpu_->max_compat);
>      int counter;
> -    char ov5_byte2;
> +    sPAPROptionVector *ov5_guest;
>  
>      /* Parse PVR list */
>      for (counter = 0; counter < 512; ++counter) {
> @@ -1033,19 +1006,20 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>      /* For the future use: here @ov_table points to the first option vector */
>      ov_table = list;
>  
> -    ov5 = cas_get_option_vector(5, ov_table);
> -    if (!ov5) {
> -        return H_SUCCESS;
> -    }
> +    ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
>  
> -    /* @list now points to OV 5 */
> -    ov5_byte2 = ldub_phys(&address_space_memory, ov5 + 2);
> -    if (ov5_byte2 & OV5_DRCONF_MEMORY) {
> -        memory_update = true;
> -    }
> +    /* NOTE: there are actually a number of ov5 bits where input from the
> +     * guest is always zero, and the platform/QEMU enables them independently
> +     * of guest input. To model these properly we'd want some sort of mask,
> +     * but since they only currently apply to memory migration as defined
> +     * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
> +     * to worry about this.
> +     */
> +    spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
> +    spapr_ovec_cleanup(ov5_guest);
>  
>      if (spapr_h_cas_compose_response(spapr, args[1], args[2],
> -                                     cpu_update, memory_update)) {
> +                                     cpu_update)) {
>          qemu_system_reset_request();
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 39dadaa..6c20d28 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -6,6 +6,7 @@
>  #include "hw/ppc/xics.h"
>  #include "hw/ppc/spapr_drc.h"
>  #include "hw/mem/pc-dimm.h"
> +#include "hw/ppc/spapr_ovec.h"
>  
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> @@ -66,6 +67,8 @@ struct sPAPRMachineState {
>      uint64_t rtc_offset; /* Now used only during incoming migration */
>      struct PPCTimebase tb;
>      bool has_graphics;
> +    sPAPROptionVector *ov5;
> +    sPAPROptionVector *ov5_cas;
>  
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
> @@ -577,7 +580,7 @@ void spapr_events_init(sPAPRMachineState *sm);
>  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update, bool memory_update);
> +                                 bool cpu_update);
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
>  void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                              uint32_t page_shift, uint64_t bus_offset,
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index fba2d98..09afd59 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -42,6 +42,9 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  
>  #define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
>  
> +/* option vector 5 */
> +#define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
> +
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);
>  sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support
  2016-10-12 23:13 [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support Michael Roth
                   ` (10 preceding siblings ...)
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 11/11] spapr: Memory hot-unplug support Michael Roth
@ 2016-10-14  4:10 ` no-reply
  2016-10-14  5:43   ` David Gibson
  11 siblings, 1 reply; 35+ messages in thread
From: no-reply @ 2016-10-14  4:10 UTC (permalink / raw)
  To: mdroth; +Cc: famz, qemu-devel, nfont, david, qemu-ppc, jallen, bharata

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support
Type: series
Message-id: 1476314039-9520-1-git-send-email-mdroth@linux.vnet.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b6c6ecd spapr: Memory hot-unplug support
74753f0 spapr: use count+index for memory hotplug
2de1399 spapr: Add DRC count indexed hotplug identifier type
860a9e5 spapr_events: add support for dedicated hotplug event source
895d1aa spapr: add hotplug interrupt machine options
fffa858 spapr: update spapr hotplug documentation
e9df226 spapr: fix inheritance chain for default machine options
dc9b8b1 spapr: improve ibm, architecture-vec-5 property handling
be26f44 spapr: add option vector handling in CAS-generated resets
cc5d859 spapr_hcall: use spapr_ovec_* interfaces for CAS options
90daf38 spapr_ovec: initial implementation of option vector helpers

=== OUTPUT BEGIN ===
Checking PATCH 1/11: spapr_ovec: initial implementation of option vector helpers...
WARNING: architecture specific defines should be avoided
#338: FILE: include/hw/ppc/spapr_ovec.h:36:
+#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)

total: 0 errors, 1 warnings, 314 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/11: spapr_hcall: use spapr_ovec_* interfaces for CAS options...
Checking PATCH 3/11: spapr: add option vector handling in CAS-generated resets...
Checking PATCH 4/11: spapr: improve ibm, architecture-vec-5 property handling...
Checking PATCH 5/11: spapr: fix inheritance chain for default machine options...
Checking PATCH 6/11: spapr: update spapr hotplug documentation...
Checking PATCH 7/11: spapr: add hotplug interrupt machine options...
Checking PATCH 8/11: spapr_events: add support for dedicated hotplug event source...
ERROR: switch and case should be at the same indent
#164: FILE: hw/ppc/spapr_events.c:283:
+    switch (log_type) {
+        case RTAS_LOG_TYPE_HOTPLUG:
[...]
+        case RTAS_LOG_TYPE_EPOW:
[...]
+        default:

total: 1 errors, 0 warnings, 272 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 9/11: spapr: Add DRC count indexed hotplug identifier type...
WARNING: line over 80 characters
#85: FILE: hw/ppc/spapr_events.c:538:
+        hp->drc_id.count_indexed.count = cpu_to_be32(drc_id->count_indexed.count);

WARNING: line over 80 characters
#86: FILE: hw/ppc/spapr_events.c:539:
+        hp->drc_id.count_indexed.index = cpu_to_be32(drc_id->count_indexed.index);

total: 0 errors, 2 warnings, 144 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 10/11: spapr: use count+index for memory hotplug...
WARNING: line over 80 characters
#58: FILE: hw/ppc/spapr.c:2265:
+                                           addr_start / SPAPR_MEMORY_BLOCK_SIZE);

WARNING: line over 80 characters
#64: FILE: hw/ppc/spapr.c:2271:
+            spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);

total: 0 errors, 2 warnings, 45 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 11/11: spapr: Memory hot-unplug support...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH 03/11] spapr: add option vector handling in CAS-generated resets
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 03/11] spapr: add option vector handling in CAS-generated resets Michael Roth
@ 2016-10-14  4:15   ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:15 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:51PM -0500, Michael Roth wrote:
> In some cases, ibm,client-architecture-support calls can fail. This
> could happen in the current code for situations where the modified
> device tree segment exceeds the buffer size provided by the guest
> via the call parameters. In these cases, QEMU will reset, allowing
> an opportunity to regenerate the device tree from scratch via
> boot-time handling. There are potentially other scenarios as well,
> not currently reachable in the current code, but possible in theory,
> such as cases where device-tree properties or nodes need to be removed.
> 
> We currently don't handle either of these properly for option vector
> capabilities however. Instead of carrying the negotiated capability
> beyond the reset and creating the boot-time device tree accordingly,
> we start from scratch, generating the same boot-time device tree as we
> did prior to the CAS-generated reset and the same device tree updates as we
> did before. This could (in theory) cause us to get stuck in a reset
> loop. This hasn't been observed, but depending on the extensiveness
> of CAS-induced device tree updates in the future, could eventually
> become an issue.
> 
> Address this by pulling capability-related device tree
> updates resulting from CAS calls into a common routine,
> spapr_populate_cas_updates(), and adding an sPAPROptionVector*
> parameter that allows us to test for newly-negotiated capabilities.
> We invoke it as follows:
> 
> 1) When ibm,client-architecture-support gets called, we
>    call spapr_populate_cas_updates() with the set of capabilities
>    added since the previous call to ibm,client-architecture-support.
>    For the initial boot, or a system reset generated by something
>    other than the CAS call itself, this set will consist of *all*
>    options supported by both the platform and the guest. For calls
>    to ibm,client-architecture-support immediately after a CAS-induced
>    reset, we call spapr_populate_cas_updates() with only the set
>    of capabilities added since the previous call, since the other
>    capabilities will have already been addressed by the boot-time
>    device-tree this time around. In the unlikely event that
>    capabilities are *removed* since the previous CAS, we will
>    generate a CAS-induced reset. In the unlikely event that we
>    cannot fit the device-tree updates into the buffer provided
>    by the guest, we'll generate a CAS-induced reset.
> 
> 2) When a CAS update results in the need to reset the machine and
>    include the updates in the boot-time device tree, we call
>    spapr_populate_cas_updates() using the full set of negotiated
>    capabilities as part of the reset path. At initial boot, or after
>    a reset generated by something other than the CAS call itself,
>    this set will be empty, resulting in what should be the same
>    boot-time device-tree as we generated prior to this patch. For
>    CAS-induced reset, this routine will be called with the full set of
>    capabilities negotiated by the platform/guest in the previous
>    CAS call, which should result in CAS updates from previous call
>    being accounted for in the initial boot-time device tree.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

I suspect HPT resizing is also going to need actual CAS reboots
(rather than just adjusting the DT), so it's handy you've implemented
that here.
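
As I read the commit message, the added-versus-removed logic can be
sketched as below. This is an illustration only: it uses a plain
uint32_t mask instead of sPAPROptionVector, and caps_diff() is a
hypothetical name, not the spapr_ovec_diff() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: store in *added the capabilities present in
 * new_caps but not in old_caps, and return true if any capability in
 * old_caps is missing from new_caps (the case that forces a reset). */
bool caps_diff(uint32_t old_caps, uint32_t new_caps, uint32_t *added)
{
    assert(added);
    *added = new_caps & ~old_caps;
    return (old_caps & ~new_caps) != 0;
}
```

After a normal boot the old set is empty, so every negotiated
capability counts as added. After a CAS-induced reset only the newly
added ones do, and a removed capability makes the function return true
so the caller can request another reset.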

> ---
>  hw/ppc/spapr.c         | 43 ++++++++++++++++++++++++++++++++++---------
>  hw/ppc/spapr_hcall.c   | 22 ++++++++++++++++++----
>  include/hw/ppc/spapr.h |  4 +++-
>  3 files changed, 55 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 934d6b2..460c7a8 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -854,13 +854,28 @@ out:
>      return ret;
>  }
>  
> +static int spapr_populate_cas_updates(sPAPRMachineState *spapr, void *fdt,
> +                                      sPAPROptionVector *ov5_updates)
> +{
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    int ret = 0;
> +
> +    /* Generate ibm,dynamic-reconfiguration-memory node if required */
> +    if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
> +        g_assert(smc->dr_lmb_enabled);
> +        ret = spapr_populate_drconf_memory(spapr, fdt);
> +    }
> +
> +    return ret;
> +}
> +
>  int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update)
> +                                 bool cpu_update,
> +                                 sPAPROptionVector *ov5_updates)
>  {
>      void *fdt, *fdt_skel;
>      sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> -    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
>  
>      size -= sizeof(hdr);
>  
> @@ -879,11 +894,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>          _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
>      }
>  
> -    /* Generate ibm,dynamic-reconfiguration-memory node if required */
> -    if (spapr_ovec_test(spapr->ov5_cas, OV5_DRCONF_MEMORY)) {
> -        g_assert(smc->dr_lmb_enabled);
> -        _FDT((spapr_populate_drconf_memory(spapr, fdt)));
> -    }
> +    spapr_populate_cas_updates(spapr, fdt, ov5_updates);
>  
>      /* Pack resulting tree */
>      _FDT((fdt_pack(fdt)));
> @@ -904,7 +915,8 @@ int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
>  static void spapr_finalize_fdt(sPAPRMachineState *spapr,
>                                 hwaddr fdt_addr,
>                                 hwaddr rtas_addr,
> -                               hwaddr rtas_size)
> +                               hwaddr rtas_size,
> +                               sPAPROptionVector *ov5_updates)
>  {
>      MachineState *machine = MACHINE(qdev_get_machine());
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> @@ -1000,6 +1012,11 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
>          }
>      }
>  
> +    ret = spapr_populate_cas_updates(spapr, fdt, ov5_updates);
> +    if (ret < 0) {
> +        error_report("couldn't setup CAS properties fdt");
> +    }
> +
>      _FDT((fdt_pack(fdt)));
>  
>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> @@ -1174,9 +1191,16 @@ static void ppc_spapr_reset(void)
>      spapr->rtas_addr = rtas_limit - RTAS_MAX_SIZE;
>      spapr->fdt_addr = spapr->rtas_addr - FDT_MAX_SIZE;
>  
> +    /* if this reset wasn't generated by CAS, we should reset our
> +     * negotiated options and start from scratch */
> +    if (!spapr->cas_reboot) {
> +        spapr_ovec_cleanup(spapr->ov5_cas);
> +        spapr->ov5_cas = spapr_ovec_new();
> +    }
> +
>      /* Load the fdt */
>      spapr_finalize_fdt(spapr, spapr->fdt_addr, spapr->rtas_addr,
> -                       spapr->rtas_size);
> +                       spapr->rtas_size, spapr->ov5_cas);
>  
>      /* Copy RTAS over */
>      cpu_physical_memory_write(spapr->rtas_addr, spapr->rtas_blob,
> @@ -1189,6 +1213,7 @@ static void ppc_spapr_reset(void)
>      first_cpu->halted = 0;
>      first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
>  
> +    spapr->cas_reboot = false;
>  }
>  
>  static void spapr_create_nvram(sPAPRMachineState *spapr)
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index f1d081b..d277813 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -950,7 +950,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>      unsigned compat_lvl = 0, cpu_version = 0;
>      unsigned max_lvl = get_compat_level(cpu_->max_compat);
>      int counter;
> -    sPAPROptionVector *ov5_guest;
> +    sPAPROptionVector *ov5_guest, *ov5_cas_old, *ov5_updates;
>  
>      /* Parse PVR list */
>      for (counter = 0; counter < 512; ++counter) {
> @@ -1013,13 +1013,27 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
>       * of guest input. To model these properly we'd want some sort of mask,
>       * but since they only currently apply to memory migration as defined
>       * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
> -     * to worry about this.
> +     * to worry about this for now.
>       */
> +    ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
> +    /* full range of negotiated ov5 capabilities */
>      spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
>      spapr_ovec_cleanup(ov5_guest);
> +    /* capabilities that have been added since CAS-generated guest reset.
> +     * if capabilities have since been removed, generate another reset
> +     */
> +    ov5_updates = spapr_ovec_new();
> +    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
> +                                        ov5_cas_old, spapr->ov5_cas);
> +
> +    if (!spapr->cas_reboot) {
> +        spapr->cas_reboot =
> +            spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
> +                                         ov5_updates);
> +    }
> +    spapr_ovec_cleanup(ov5_updates);
>  
> -    if (spapr_h_cas_compose_response(spapr, args[1], args[2],
> -                                     cpu_update)) {
> +    if (spapr->cas_reboot) {
>          qemu_system_reset_request();
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 6c20d28..27a3328 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -69,6 +69,7 @@ struct sPAPRMachineState {
>      bool has_graphics;
>      sPAPROptionVector *ov5;
>      sPAPROptionVector *ov5_cas;
> +    bool cas_reboot;
>  
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
> @@ -580,7 +581,8 @@ void spapr_events_init(sPAPRMachineState *sm);
>  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
> -                                 bool cpu_update);
> +                                 bool cpu_update,
> +                                 sPAPROptionVector *ov5_updates);
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
>  void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                              uint32_t page_shift, uint64_t bus_offset,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options
  2016-10-14  3:02   ` David Gibson
@ 2016-10-14  4:20     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:20 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Fri, Oct 14, 2016 at 02:02:31PM +1100, David Gibson wrote:
> On Wed, Oct 12, 2016 at 06:13:50PM -0500, Michael Roth wrote:
> > Currently we access individual bytes of an option vector via
> > ldub_phys() to test for the presence of a particular capability
> > within that byte. Currently this is only done for the "dynamic
> > reconfiguration memory" capability bit. If that bit is present,
> > we pass a boolean value to spapr_h_cas_compose_response()
> > to generate a modified device tree segment with the additional
> > properties required to enable this functionality.
> > 
> > As more capability bits are added, we would need to modify the
> > code to add additional option vector accesses and extend the
> > param list for spapr_h_cas_compose_response() to include similar
> > boolean values for these parameters.
> > 
> > Avoid this by switching to spapr_ovec_* helpers so we can do all
> > the parsing in one shot and then test for these additional bits
> > within spapr_h_cas_compose_response() directly.
> > 
> > Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

That said, some comments explaining the overall scheme here might be
helpful.

Specifically..

[snip]
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 39dadaa..6c20d28 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -6,6 +6,7 @@
> >  #include "hw/ppc/xics.h"
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "hw/mem/pc-dimm.h"
> > +#include "hw/ppc/spapr_ovec.h"
> >  
> >  struct VIOsPAPRBus;
> >  struct sPAPRPHBState;
> > @@ -66,6 +67,8 @@ struct sPAPRMachineState {
> >      uint64_t rtc_offset; /* Now used only during incoming migration */
> >      struct PPCTimebase tb;
> >      bool has_graphics;
> > +    sPAPROptionVector *ov5;
> > +    sPAPROptionVector *ov5_cas;

IIUC, ov5 represents all the features QEMU is capable of
supporting, and ov5_cas records the ones that were actually negotiated
during CAS.  Some descriptions here could make that much easier to follow.


> >      uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> > @@ -577,7 +580,7 @@ void spapr_events_init(sPAPRMachineState *sm);
> >  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
> >  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
> >                                   target_ulong addr, target_ulong size,
> > -                                 bool cpu_update, bool memory_update);
> > +                                 bool cpu_update);
> >  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
> >  void spapr_tce_table_enable(sPAPRTCETable *tcet,
> >                              uint32_t page_shift, uint64_t bus_offset,
> > diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> > index fba2d98..09afd59 100644
> > --- a/include/hw/ppc/spapr_ovec.h
> > +++ b/include/hw/ppc/spapr_ovec.h
> > @@ -42,6 +42,9 @@ typedef struct sPAPROptionVector sPAPROptionVector;
> >  
> >  #define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
> >  
> > +/* option vector 5 */
> > +#define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
> > +
> >  /* interfaces */
> >  sPAPROptionVector *spapr_ovec_new(void);
> >  sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 05/11] spapr: fix inheritance chain for default machine options
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 05/11] spapr: fix inheritance chain for default machine options Michael Roth
@ 2016-10-14  4:34   ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:34 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:53PM -0500, Michael Roth wrote:
> Rather than machine instances having backward-compatible option
> defaults that need to be repeatedly re-enabled for every new machine
> type we introduce, we set the defaults appropriate for newer machine
> types, then add code to explicitly disable instance options as needed
> to maintain compatibility with older machine types.
> 
> Currently pseries-2.5 does not inherit from pseries-2.6 in this
> fashion, which is okay at the moment since we do not have any
> instance compatibility options for pseries-2.6+ currently.
> 
> We will make use of this in future patches though, so fix it here.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

This patch stands on its own, so I've applied it to ppc-for-2.8 (and
also extended it to make 2_7 inherit from 2_8).
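
For reference, the chaining pattern reduced to a standalone sketch. The
struct and the opt_new_in_2_7 field are hypothetical and exist only to show
how an older machine type picks up every newer type's compat tweaks;
use_hotplug_event_source is borrowed from patch 07 of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the per-instance machine options. */
struct machine_opts {
    bool use_hotplug_event_source; /* disabled for 2.7 and older (patch 07) */
    bool opt_new_in_2_7;           /* hypothetical option, illustration only */
};

/* Defaults are whatever suits the newest machine type. */
static inline void machine_defaults(struct machine_opts *m)
{
    m->use_hotplug_event_source = true;
    m->opt_new_in_2_7 = true;
}

static inline void machine_2_8_instance_options(struct machine_opts *m)
{
    (void)m; /* newest type: nothing to disable */
}

static inline void machine_2_7_instance_options(struct machine_opts *m)
{
    machine_2_8_instance_options(m);
    m->use_hotplug_event_source = false;
}

static inline void machine_2_6_instance_options(struct machine_opts *m)
{
    machine_2_7_instance_options(m); /* inherit the 2.7 compat settings */
    m->opt_new_in_2_7 = false;
}

static inline void machine_2_5_instance_options(struct machine_opts *m)
{
    machine_2_6_instance_options(m); /* inherit 2.6, and through it 2.7 */
}
```

Each version only states what it disables relative to the next newer one,
so adding a new machine type does not require touching the older ones.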

> ---
>  hw/ppc/spapr.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3b2a459..f8cde92 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2544,6 +2544,7 @@ DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
>  
>  static void spapr_machine_2_6_instance_options(MachineState *machine)
>  {
> +    spapr_machine_2_7_instance_options(machine);
>  }
>  
>  static void spapr_machine_2_6_class_options(MachineClass *mc)
> @@ -2568,6 +2569,7 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", false);
>  
>  static void spapr_machine_2_5_instance_options(MachineState *machine)
>  {
> +    spapr_machine_2_6_instance_options(machine);
>  }
>  
>  static void spapr_machine_2_5_class_options(MachineClass *mc)



* Re: [Qemu-devel] [PATCH 06/11] spapr: update spapr hotplug documentation
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 06/11] spapr: update spapr hotplug documentation Michael Roth
@ 2016-10-14  4:35   ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:35 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:54PM -0500, Michael Roth wrote:
> This updates the existing documentation to reflect recent updates to
> the hotplug event structure, which are in draft form but slated
> for inclusion in PAPR/LoPAPR.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  docs/specs/ppc-spapr-hotplug.txt | 55 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 46 insertions(+), 9 deletions(-)
> 
> diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
> index 631b0ca..f57e2a0 100644
> --- a/docs/specs/ppc-spapr-hotplug.txt
> +++ b/docs/specs/ppc-spapr-hotplug.txt
> @@ -233,12 +233,27 @@ tools by host-level management such as an HMC. This level of management is not
>  applicable to PowerKVM, hence the reason for extending the notification
>  framework to support hotplug events.
>  
> -Note that these events are not yet formally part of the PAPR+ specification,
> -but support for this format has already been implemented in DR-related
> -guest tools such as powerpc-utils/librtas, as well as kernel patches that have
> -been submitted to handle in-kernel processing of memory/cpu-related hotplug
> -events[1], and is planned for formal inclusion is PAPR+ specification. The
> -hotplug-specific payload is QEMU implemented as follows (with all values
> +The format for these EPOW-signalled events is described below under
> +"hotplug/unplug event structure". Note that these events are not
> +formally part of the PAPR+ specification, and have been superseded by a
> +newer format, also described below under "hotplug/unplug event structure",
> +and so are now deemed a "legacy" format. The formats are similar, but the
> +"modern" format contains additional fields/flags, which are denoted for the
> +purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards.
> +
> +QEMU should assume support only for "legacy" fields/flags unless the guest
> +advertises support for the "modern" format via ibm,client-architecture-support
> +hcall by setting byte 5, bit 6 of it's ibm,architecture-vec-5 option vector
> +structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events,
> +"modern" format events are surfaced to the guest via check-exception RTAS calls,
> +but use a dedicated event source to signal the guest. This event source is
> +advertised to the guest by the addition of a "hot-plug-events" node under
> +"/event-sources" node of the guest's device tree using the standard format
> +described in LoPAPR v11, B.6.12.1.
> +
> +== hotplug/unplug event structure ==
> +
> +The hotplug-specific payload in QEMU is implemented as follows (with all values
>  encoded in big-endian format):
>  
>  struct rtas_event_log_v6_hp {
> @@ -263,14 +278,23 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_ACTION_ADD       1
>  #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
>      uint8_t hotplug_action;             /* action (add/remove) */
> -#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
> -#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
> -#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
> +#define RTAS_LOG_V6_HP_ID_DRC_NAME          1
> +#define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
> +#ifdef GUEST_SUPPORTS_MODERN
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
> +#endif
>      uint8_t hotplug_identifier;         /* type of the resource identifier,
>                                           * which serves as the discriminator
>                                           * for the 'drc' union field below
>                                           */
> +#ifdef GUEST_SUPPORTS_MODERN
> +    uint8_t capabilities;               /* capability flags, currently unused
> +                                         * by QEMU
> +                                         */
> +#else
>      uint8_t reserved;
> +#endif
>      union {
>          uint32_t index;                 /* DRC index of resource to take action
>                                           * on
> @@ -278,6 +302,19 @@ struct rtas_event_log_v6_hp {
>          uint32_t count;                 /* number of DR resources to take
>                                           * action on (guest chooses which)
>                                           */
> +#ifdef GUEST_SUPPORTS_MODERN
> +        struct {
> +            uint32_t count;             /* number of DR resources to take
> +                                         * action on
> +                                         */
> +            uint32_t index;             /* DRC index of first resource to take
> +                                         * action on. guest will take action
> +                                         * on DRC index <index> through
> +                                         * DRC index <index + count - 1> in
> +                                         * sequential order
> +                                         */
> +        } count_indexed;
> +#endif
>          char name[1];                   /* string representing the name of the
>                                           * DRC to take action on
>                                           */
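
As an illustration of the count_indexed layout documented above, here is a
standalone sketch that packs the two fields in big-endian order (count
first, then index) and computes the last DRC index covered. The helper
names are hypothetical; QEMU itself uses cpu_to_be32() on the struct fields.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian order, independent of host endianness. */
static inline void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value. */
static inline uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode the count_indexed identifier: count first, then starting index. */
static inline void encode_count_indexed(uint8_t out[8], uint32_t count,
                                        uint32_t index)
{
    put_be32(out, count);
    put_be32(out + 4, index);
}

/* Last DRC index acted on: <index + count - 1>; count must be non-zero. */
static inline uint32_t last_drc_index(uint32_t index, uint32_t count)
{
    assert(count > 0);
    return index + count - 1;
}
```

For example, a request for 4 resources starting at DRC index 0x80000010
covers indexes 0x80000010 through 0x80000013.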



* Re: [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options Michael Roth
@ 2016-10-14  4:38   ` David Gibson
  2016-10-14 18:08     ` Michael Roth
  2016-10-14  8:37   ` Bharata B Rao
  1 sibling, 1 reply; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:38 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:55PM -0500, Michael Roth wrote:
> This adds machine options of the form:
> 
>   -machine pseries,legacy-hotplug-events=true
>   -machine pseries,legacy-hotplug-events=false
> 
> to denote whether or not we wish to force the use of "legacy" style
> hotplug events, which are surfaced through EPOW interrupts instead of
> a dedicated interrupt source, and lack certain features necessary,
> mainly, for memory unplug support.
> 
> If false, QEMU will default to "legacy" style unless the guest
> advertises support for the newer events via
> ibm,client-architecture-support hcall during early boot.
> 
> For pseries-2.7 and earlier we default to true, for newer machine
> types we default to false.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Hrm.. I think it would be a little clearer if you could find a wording
such that both the internal variable and the external property have
the same sense - i.e. get rid of the ! in the property getters /
setters.
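
Something along these lines, as a standalone sketch rather than QOM code:
the field is named after the property, so the getter and setter pass the
value straight through, and the only negation lives in one helper. The
struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced machine state; the field has the property's sense. */
struct machine {
    bool legacy_hotplug_events;
};

/* Getter: no negation, the field already means what the property means. */
static inline bool get_legacy_hotplug_events(const struct machine *m)
{
    return m->legacy_hotplug_events;
}

/* Setter: stores the value as given. */
static inline void set_legacy_hotplug_events(struct machine *m, bool value)
{
    m->legacy_hotplug_events = value;
}

/* The single place that flips the sense, for code that wants to ask
 * "should the dedicated hotplug event source be offered?" */
static inline bool wants_hotplug_event_source(const struct machine *m)
{
    return !m->legacy_hotplug_events;
}
```

Whether the field or the property gets renamed matters less than keeping
both with the same polarity.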

> ---
>  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h      |  1 +
>  include/hw/ppc/spapr_ovec.h |  1 +
>  3 files changed, 33 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f8cde92..d80a6fa 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
>  
>      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
>  
> +    /* use dedicated HP event source if guest supports it */
> +    if (spapr->use_hotplug_event_source) {
> +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> +    }
> +
>      /* init CPUs */
>      if (machine->cpu_model == NULL) {
>          machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> @@ -2172,16 +2177,39 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
>      spapr->kvm_type = g_strdup(value);
>  }
>  
> +static bool spapr_get_legacy_hotplug_events(Object *obj, Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    return !spapr->use_hotplug_event_source;
> +}
> +
> +static void spapr_set_legacy_hotplug_events(Object *obj, bool value,
> +                                            Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +    spapr->use_hotplug_event_source = !value;
> +}
> +
>  static void spapr_machine_initfn(Object *obj)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>  
>      spapr->htab_fd = -1;
> +    spapr->use_hotplug_event_source = true;
>      object_property_add_str(obj, "kvm-type",
>                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
>      object_property_set_description(obj, "kvm-type",
>                                      "Specifies the KVM virtualization mode (HV, PR)",
>                                      NULL);
> +    object_property_add_bool(obj, "legacy-hotplug-events",
> +                            spapr_get_legacy_hotplug_events,
> +                            spapr_set_legacy_hotplug_events,
> +                            NULL);
> +    object_property_set_description(obj, "legacy-hotplug-events",
> +                                    "Use deprecated EPOW mechanism for hotplug events",
> +                                    NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -2518,6 +2546,9 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>  
>  static void spapr_machine_2_7_instance_options(MachineState *machine)
>  {
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +
> +    spapr->use_hotplug_event_source = false;
>  }
>  
>  static void spapr_machine_2_7_class_options(MachineClass *mc)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 27a3328..d1a4a14 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -74,6 +74,7 @@ struct sPAPRMachineState {
>      uint32_t check_exception_irq;
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> +    bool use_hotplug_event_source;
>  
>      /* Migration state */
>      int htab_save_index;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 47fa04c..92167c6 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  /* option vector 5 */
>  #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
>  #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
> +#define OV5_HP_EVT              OV_BIT(6, 5)
>  
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);



* Re: [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source Michael Roth
@ 2016-10-14  4:56   ` David Gibson
  2016-10-14 18:44     ` Michael Roth
  2016-10-14  8:46   ` Bharata B Rao
  1 sibling, 1 reply; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:56 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:56PM -0500, Michael Roth wrote:
> Hotplug events were previously delivered using an EPOW interrupt
> and were queued by linux guests into a circular buffer. For traditional
> EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> events there are cases where this buffer can be exhausted, resulting
> in the loss of hotplug events, resets, etc.
> 
> Newer-style hotplug events are delivered using a dedicated event source.
> We enable this in supported guests by adding an additional standard
> event source in the guest device-tree via /event-sources, and, if
> the guest advertises support for the newer-style hotplug events,
> using the corresponding interrupt to signal the availability of
> hotplug/unplug events.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

So.. are you saying that as well as allowing new event types, the new
special hotplug event source effectively allows for a bigger queue?

Does that mean that we didn't even necessarily need the base+length
unplug events, because we could now have sent the many single-LMB
unplug requests that were necessary?  Or does it not increase the
effective queue enough for that?

> ---
>  hw/ppc/spapr.c         |  10 ++--
>  hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
>  include/hw/ppc/spapr.h |   3 +-
>  3 files changed, 120 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d80a6fa..2037222 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>                                     hwaddr initrd_size,
>                                     hwaddr kernel_size,
>                                     bool little_endian,
> -                                   const char *kernel_cmdline,
> -                                   uint32_t epow_irq)
> +                                   const char *kernel_cmdline)
>  {
>      void *fdt;
>      uint32_t start_prop = cpu_to_be32(initrd_base);
> @@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>      _FDT((fdt_end_node(fdt)));
>  
>      /* event-sources */
> -    spapr_events_fdt_skel(fdt, epow_irq);
> +    spapr_events_fdt_skel(fdt);
>  
>      /* /hypervisor node */
>      if (kvm_enabled()) {
> @@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
>      }
>      g_free(filename);
>  
> -    /* Set up EPOW events infrastructure */
> +    /* Set up RTAS event infrastructure */
>      spapr_events_init(spapr);
>  
>      /* Set up the RTC RTAS interfaces */
> @@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
>      /* Prepare the device tree */
>      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
>                                              kernel_size, kernel_le,
> -                                            kernel_cmdline,
> -                                            spapr->check_exception_irq);
> +                                            kernel_cmdline);
>      assert(spapr->fdt_skel != NULL);
>  
>      /* used by RTAS */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 4c7b6ae..f8bbec6 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -40,6 +40,7 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qemu/help_option.h"
>  #include "qemu/bcd.h"
> +#include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
>  
>  struct rtas_error_log {
> @@ -206,28 +207,104 @@ struct hp_log_full {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> -#define EVENT_MASK_EPOW                      0x40000000
> -#define EVENT_MASK_HOTPLUG                   0x10000000
> -#define EVENT_MASK_IO                        0x08000000
> +typedef enum EventClassIndex {
> +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> +    EVENT_CLASS_EPOW                = 1,
> +    EVENT_CLASS_RESERVED            = 2,
> +    EVENT_CLASS_HOT_PLUG            = 3,
> +    EVENT_CLASS_IO                  = 4,
> +    EVENT_CLASS_MAX
> +} EventClassIndex;
> +
> +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> +
> +typedef struct EventSource {
> +    const char *name;
> +    int irq;
> +    uint32_t mask;
> +    bool enabled;
> +} EventSource;
> +
> +static EventSource event_source[EVENT_CLASS_MAX] = {
> +    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
> +    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
> +    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
> +    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
> +};
> +
> +static void rtas_event_source_register(EventClassIndex index, int irq)
> +{
> +    /* we only support 1 irq per event class at the moment */
> +    g_assert(!event_source[index].enabled);
> +    event_source[index].irq = irq;
> +    event_source[index].mask = EVENT_CLASS_MASK(index);
> +    event_source[index].enabled = true;
> +}

I really don't like adding a mutable global table.  This should
probably be under the sPAPRMachineState.
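
A minimal standalone sketch of the per-machine alternative, assuming the
table simply becomes a member of the machine state. struct machine_events
is a hypothetical stand-in for that member; the class indexes match the
enum in the patch. The mask macro uses an unsigned constant so index 0 does
not shift into the sign bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum event_class {
    EVENT_CLASS_INTERNAL_ERRORS = 0,
    EVENT_CLASS_EPOW            = 1,
    EVENT_CLASS_RESERVED        = 2,
    EVENT_CLASS_HOT_PLUG        = 3,
    EVENT_CLASS_IO              = 4,
    EVENT_CLASS_MAX
};

/* Class 0 maps to the most significant bit of the 32-bit event mask. */
#define EVENT_CLASS_MASK(index) (UINT32_C(1) << (31 - (index)))

struct event_source {
    int irq;
    uint32_t mask;
    bool enabled;
};

/* Hypothetical container: one table per machine instead of a file-scope one. */
struct machine_events {
    struct event_source sources[EVENT_CLASS_MAX];
};

/* Register the single irq for an event class on a given machine. */
static inline void event_source_register(struct machine_events *ev,
                                         enum event_class index, int irq)
{
    assert(index < EVENT_CLASS_MAX);
    assert(!ev->sources[index].enabled); /* one irq per class */
    ev->sources[index].irq = irq;
    ev->sources[index].mask = EVENT_CLASS_MASK(index);
    ev->sources[index].enabled = true;
}
```

The computed masks match the EVENT_MASK_* constants the patch removes
(0x40000000 for EPOW, 0x10000000 for hotplug), and registering a source on
one machine leaves any other instance untouched.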

> -void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> +void spapr_events_fdt_skel(void *fdt)
>  {
> -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> +    int i, count = 0;
>  
>      _FDT((fdt_begin_node(fdt, "event-sources")));
>  
> +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> +        /* TODO: what does 0 entail? */

It's just part of the interrupt specifier format expected by the
event-sources binding.  It's not really important.
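
To make the shape concrete: with "#interrupt-cells" set to 2, as in this
function, each "interrupts" entry is two big-endian cells, the irq number
followed by that 0. A standalone sketch with hypothetical helper names
(to_be32() stands in for cpu_to_be32()):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INTERRUPT_CELLS 2 /* matches "#interrupt-cells" = 2 in the patch */

/* Host-endian to big-endian without relying on the host byte order. */
static inline uint32_t to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16), (uint8_t)(v >> 8), (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Fill a two-cell interrupt specifier: <irq, 0>. */
static inline void make_interrupt_specifier(uint32_t cells[INTERRUPT_CELLS],
                                            uint32_t irq)
{
    cells[0] = to_be32(irq);
    cells[1] = 0; /* second cell is 0; zero needs no byte swap */
}
```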

> +        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
> +
> +        if (!event_source[i].enabled) {
> +            continue;
> +        }
> +
> +        _FDT((fdt_begin_node(fdt, event_source[i].name)));
> +        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> +        _FDT((fdt_end_node(fdt)));
> +
> +        irq_ranges[count++] = interrupts[0];
> +        irq_ranges[count++] = cpu_to_be32(1);
> +    }
> +
> +    /* TODO: confirm the count is the last expected element */
> +    irq_ranges[count] = cpu_to_be32(count);
> +    count++;
> +
>      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
>      _FDT((fdt_property(fdt, "interrupt-ranges",
> -                       irq_ranges, sizeof(irq_ranges))));
> +                       irq_ranges, count * sizeof(uint32_t))));
>  
> -    _FDT((fdt_begin_node(fdt, "epow-events")));
> -    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
>      _FDT((fdt_end_node(fdt)));
> +}
>  
> -    _FDT((fdt_end_node(fdt)));
> +static const EventSource *rtas_event_log_to_source(int log_type)
> +{
> +    const EventSource *source;
> +
> +    switch (log_type) {
> +        case RTAS_LOG_TYPE_HOTPLUG:
> +            source = &event_source[EVENT_CLASS_HOT_PLUG];
> +            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
> +                break;
> +            }

This should probably be using the flag you already have in the
MachineState, rather than a global.
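
Roughly what I have in mind, as a standalone sketch: the lookup takes the
machine and consults its flag, with the EPOW fallback spelled out. The
log_type enum values and the reduced struct are hypothetical; only
use_hotplug_event_source and the class indexes come from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical log types standing in for RTAS_LOG_TYPE_*. */
enum log_type { LOG_TYPE_EPOW, LOG_TYPE_HOTPLUG, LOG_TYPE_OTHER };

/* Class indexes as in the patch; EVENT_CLASS_NONE is a hypothetical
 * "no source" marker. */
enum { EVENT_CLASS_NONE = -1, EVENT_CLASS_EPOW = 1, EVENT_CLASS_HOT_PLUG = 3 };

/* Hypothetical reduced machine state carrying the existing flag. */
struct machine {
    bool use_hotplug_event_source;
};

/* Pick the event class for a log type based on per-machine state. */
static inline int log_type_to_class(const struct machine *m, enum log_type t)
{
    switch (t) {
    case LOG_TYPE_HOTPLUG:
        if (m->use_hotplug_event_source) {
            return EVENT_CLASS_HOT_PLUG;
        }
        /* legacy behaviour: hotplug events are signalled via EPOW */
        return EVENT_CLASS_EPOW;
    case LOG_TYPE_EPOW:
        return EVENT_CLASS_EPOW;
    default:
        return EVENT_CLASS_NONE;
    }
}
```

This keeps the decision tied to the machine instance, so two machines with
different settings cannot influence each other.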

> +            /* fall back to epow for legacy hotplug interrupt source */
> +        case RTAS_LOG_TYPE_EPOW:
> +            source = &event_source[EVENT_CLASS_EPOW];
> +            break;
> +        default:
> +            source = NULL;
> +    }
> +
> +    return source;
> +}
> +
> +static int rtas_event_log_to_irq(int log_type)
> +{
> +    const EventSource *source = rtas_event_log_to_source(log_type);
> +
> +    g_assert(source);
> +    g_assert(source->enabled);
> +
> +    return source->irq;
>  }
>  
>  static void rtas_event_log_queue(int log_type, void *data, bool exception)
> @@ -248,19 +325,14 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      sPAPREventLogEntry *entry = NULL;
>  
> -    /* we only queue EPOW events atm. */
> -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> -        return NULL;
> -    }
> -
>      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> +
>          if (entry->exception != exception) {
>              continue;
>          }
>  
> -        /* EPOW and hotplug events are surfaced in the same manner */
> -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> +        if (source->mask & event_mask) {
>              break;
>          }
>      }
> @@ -277,19 +349,14 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      sPAPREventLogEntry *entry = NULL;
>  
> -    /* we only queue EPOW events atm. */
> -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> -        return false;
> -    }
> -
>      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> +
>          if (entry->exception != exception) {
>              continue;
>          }
>  
> -        /* EPOW and hotplug events are surfaced in the same manner */
> -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> +        if (source->mask & event_mask) {
>              return true;
>          }
>      }
> @@ -377,7 +444,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_EPOW)));
>  }
>  
>  static void spapr_hotplug_set_signalled(uint32_t drc_index)
> @@ -459,7 +527,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_HOTPLUG)));
>  }
>  
>  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> @@ -505,6 +574,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      uint64_t xinfo;
>      sPAPREventLogEntry *event;
>      struct rtas_error_log *hdr;
> +    int i;
>  
>      if ((nargs < 6) || (nargs > 7) || nret != 1) {
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> @@ -541,8 +611,11 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>       * do the latter here, since our code relies on edge-triggered
>       * interrupts.
>       */
> -    if (rtas_event_log_contains(mask, true)) {
> -        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> +        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
> +            g_assert(event_source[i].enabled);
> +            qemu_irq_pulse(xics_get_qirq(spapr->xics, event_source[i].irq));
> +        }
>      }
>  
>      return;
> @@ -594,8 +667,17 @@ out_no_events:
>  void spapr_events_init(sPAPRMachineState *spapr)
>  {
>      QTAILQ_INIT(&spapr->pending_events);
> -    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, 0, false,
> -                                            &error_fatal);
> +
> +    rtas_event_source_register(EVENT_CLASS_EPOW,
> +                               xics_spapr_alloc(spapr->xics, 0, 0, false,
> +                                                &error_fatal));
> +
> +    if (spapr->use_hotplug_event_source) {
> +        rtas_event_source_register(EVENT_CLASS_HOT_PLUG,
> +                                   xics_spapr_alloc(spapr->xics, 0, 0, false,
> +                                                    &error_fatal));
> +    }
> +
>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index d1a4a14..2295ac6 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -71,7 +71,6 @@ struct sPAPRMachineState {
>      sPAPROptionVector *ov5_cas;
>      bool cas_reboot;
>  
> -    uint32_t check_exception_irq;
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
>      bool use_hotplug_event_source;
> @@ -579,7 +578,7 @@ struct sPAPREventLogEntry {
>  };
>  
>  void spapr_events_init(sPAPRMachineState *sm);
> -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
> +void spapr_events_fdt_skel(void *fdt);
>  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
>                                   target_ulong addr, target_ulong size,
>                                   bool cpu_update,



* Re: [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type Michael Roth
@ 2016-10-14  4:59   ` David Gibson
  2016-10-14 18:52     ` Michael Roth
  0 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2016-10-14  4:59 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:57PM -0500, Michael Roth wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Add support for DRC count indexed hotplug ID type which is primarily
> needed for memory hot unplug. This type allows for specifying the
> number of DRs that should be plugged/unplugged starting from a given
> DRC index.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> * updated rtas_event_log_v6_hp to reflect count/index field ordering
>   used in PAPR hotplug ACR
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_events.c  | 74 ++++++++++++++++++++++++++++++++++++++++----------
>  include/hw/ppc/spapr.h |  4 +++
>  2 files changed, 63 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index f8bbec6..eeca800 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -175,6 +175,16 @@ struct epow_log_full {
>      struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +union drc_identifier {
> +    uint32_t index;
> +    uint32_t count;
> +    struct {
> +        uint32_t count;
> +        uint32_t index;
> +    } count_indexed;
> +    char name[1];
> +} QEMU_PACKED;
> +
>  struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
>      struct rtas_event_log_v6_section_header hdr;
> @@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
>  #define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
>  #define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED              4
>      uint8_t reserved;
> -    union {
> -        uint32_t index;
> -        uint32_t count;
> -        char name[1];
> -    } drc;
> +    union drc_identifier drc_id;
>  } QEMU_PACKED;
>  
>  struct hp_log_full {
> @@ -457,7 +464,7 @@ static void spapr_hotplug_set_signalled(uint32_t drc_index)
>  
>  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>                                      sPAPRDRConnectorType drc_type,
> -                                    uint32_t drc)
> +                                    union drc_identifier *drc_id)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>      struct hp_log_full *new_hp;
> @@ -502,7 +509,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      case SPAPR_DR_CONNECTOR_TYPE_PCI:
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
>          if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
> -            spapr_hotplug_set_signalled(drc);
> +            spapr_hotplug_set_signalled(drc_id->index);
>          }
>          break;
>      case SPAPR_DR_CONNECTOR_TYPE_LMB:
> @@ -520,9 +527,16 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>      }
>  
>      if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
> -        hp->drc.count = cpu_to_be32(drc);
> +        hp->drc_id.count = cpu_to_be32(drc_id->count);
>      } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
> -        hp->drc.index = cpu_to_be32(drc);
> +        hp->drc_id.index = cpu_to_be32(drc_id->index);
> +    } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
> +        /* we should not be using count_indexed value unless the guest
> +         * supports dedicated hotplug event source
> +         */
> +        g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
> +        hp->drc_id.count_indexed.count = cpu_to_be32(drc_id->count_indexed.count);
> +        hp->drc_id.count_indexed.index = cpu_to_be32(drc_id->count_indexed.index);
>      }
>  
>      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> @@ -535,34 +549,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
>  {
>      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -    uint32_t index = drck->get_index(drc);
> +    union drc_identifier drc_id;
>  
> +    drc_id.index = drck->get_index(drc);
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
>  {
>      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -    uint32_t index = drck->get_index(drc);
> +    union drc_identifier drc_id;
>  
> +    drc_id.index = drck->get_index(drc);
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
>                                         uint32_t count)
>  {
> +    union drc_identifier drc_id;
> +
> +    drc_id.count = count;
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, count);
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
>                                            uint32_t count)
>  {
> +    union drc_identifier drc_id;
> +
> +    drc_id.count = count;
>      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, count);
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> +}
> +
> +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                            uint32_t count, uint32_t index)
> +{
> +    union drc_identifier drc_id;
> +
> +    drc_id.count_indexed.count = count;
> +    drc_id.count_indexed.index = index;
> +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
> +}
> +
> +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                               uint32_t count, uint32_t index)
> +{
> +    union drc_identifier drc_id;
> +
> +    drc_id.count_indexed.count = count;
> +    drc_id.count_indexed.index = index;
> +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 2295ac6..11a2597 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -602,6 +602,10 @@ void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
>                                         uint32_t count);
>  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
>                                            uint32_t count);
> +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                            uint32_t count, uint32_t index);
> +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> +                                               uint32_t count, uint32_t index);
>  void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
>  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>                                      sPAPRMachineState *spapr);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support
  2016-10-14  4:10 ` [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support no-reply
@ 2016-10-14  5:43   ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-14  5:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: mdroth, famz, nfont, qemu-ppc, jallen, bharata

On Thu, Oct 13, 2016 at 09:10:22PM -0700, no-reply@patchew.org wrote:
> Hi,
> 
> Your series seems to have some coding style problems. See output below for
> more information:

checkpatch generates a fair few false positives.  However, having had
a look, these don't appear to be so.

> 
> Subject: [Qemu-devel] [RFC PATCH 00/11] spapr: option vector re-work and memory unplug support
> Type: series
> Message-id: 1476314039-9520-1-git-send-email-mdroth@linux.vnet.ibm.com
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> 
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
> 
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> 
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
>     echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
>     if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
>         failed=1
>         echo
>     fi
>     n=$((n+1))
> done
> 
> exit $failed
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> b6c6ecd spapr: Memory hot-unplug support
> 74753f0 spapr: use count+index for memory hotplug
> 2de1399 spapr: Add DRC count indexed hotplug identifier type
> 860a9e5 spapr_events: add support for dedicated hotplug event source
> 895d1aa spapr: add hotplug interrupt machine options
> fffa858 spapr: update spapr hotplug documentation
> e9df226 spapr: fix inheritance chain for default machine options
> dc9b8b1 spapr: improve ibm, architecture-vec-5 property handling
> be26f44 spapr: add option vector handling in CAS-generated resets
> cc5d859 spapr_hcall: use spapr_ovec_* interfaces for CAS options
> 90daf38 spapr_ovec: initial implementation of option vector helpers
> 
> === OUTPUT BEGIN ===
> Checking PATCH 1/11: spapr_ovec: initial implementation of option vector helpers...
> WARNING: architecture specific defines should be avoided
> #338: FILE: include/hw/ppc/spapr_ovec.h:36:
> +#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)

... uh, except for this one.  I think this one can be ignored; the
rest should be addressed.

> total: 0 errors, 1 warnings, 314 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> Checking PATCH 2/11: spapr_hcall: use spapr_ovec_* interfaces for CAS options...
> Checking PATCH 3/11: spapr: add option vector handling in CAS-generated resets...
> Checking PATCH 4/11: spapr: improve ibm, architecture-vec-5 property handling...
> Checking PATCH 5/11: spapr: fix inheritance chain for default machine options...
> Checking PATCH 6/11: spapr: update spapr hotplug documentation...
> Checking PATCH 7/11: spapr: add hotplug interrupt machine options...
> Checking PATCH 8/11: spapr_events: add support for dedicated hotplug event source...
> ERROR: switch and case should be at the same indent
> #164: FILE: hw/ppc/spapr_events.c:283:
> +    switch (log_type) {
> +        case RTAS_LOG_TYPE_HOTPLUG:
> [...]
> +        case RTAS_LOG_TYPE_EPOW:
> [...]
> +        default:
> 
> total: 1 errors, 0 warnings, 272 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> Checking PATCH 9/11: spapr: Add DRC count indexed hotplug identifier type...
> WARNING: line over 80 characters
> #85: FILE: hw/ppc/spapr_events.c:538:
> +        hp->drc_id.count_indexed.count = cpu_to_be32(drc_id->count_indexed.count);
> 
> WARNING: line over 80 characters
> #86: FILE: hw/ppc/spapr_events.c:539:
> +        hp->drc_id.count_indexed.index = cpu_to_be32(drc_id->count_indexed.index);
> 
> total: 0 errors, 2 warnings, 144 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> Checking PATCH 10/11: spapr: use count+index for memory hotplug...
> WARNING: line over 80 characters
> #58: FILE: hw/ppc/spapr.c:2265:
> +                                           addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> 
> WARNING: line over 80 characters
> #64: FILE: hw/ppc/spapr.c:2271:
> +            spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
> 
> total: 0 errors, 2 warnings, 45 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> Checking PATCH 11/11: spapr: Memory hot-unplug support...
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-devel@freelists.org

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 11/11] spapr: Memory hot-unplug support
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 11/11] spapr: Memory hot-unplug support Michael Roth
@ 2016-10-14  7:05   ` Bharata B Rao
  0 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2016-10-14  7:05 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:59PM -0500, Michael Roth wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Add support to hot remove pc-dimm memory devices.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> * add hooks to CAS/cmdline enablement of hotplug ACR support
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c     | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/ppc/spapr_drc.c |  17 +++++++++
>  2 files changed, 122 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 9af4268..180fa3d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2310,6 +2310,90 @@ out:
>      error_propagate(errp, local_err);
>  }
> 
> +typedef struct sPAPRDIMMState {
> +    uint32_t nr_lmbs;
> +} sPAPRDIMMState;
> +
> +static void spapr_lmb_release(DeviceState *dev, void *opaque)
> +{
> +    sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
> +    HotplugHandler *hotplug_ctrl = NULL;
> +
> +    if (--ds->nr_lmbs) {
> +        return;
> +    }
> +
> +    g_free(ds);
> +
> +    /*
> +     * Now that all the LMBs have been removed by the guest, call the
> +     * pc-dimm unplug handler to clean up the pc-dimm device.
> +     */
> +    hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
> +                           Error **errp)
> +{
> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +    int i;
> +    sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
> +    uint64_t addr = addr_start;
> +
> +    ds->nr_lmbs = nr_lmbs;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +        drck->detach(drc, dev, spapr_lmb_release, ds, errp);
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +
> +    drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                              nr_lmbs,
> +                                              drck->get_index(drc));
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)
> +{
> +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +
> +    pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +    uint64_t size = memory_region_size(mr);
> +    uint64_t addr;
> +
> +    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
> +    spapr_del_lmbs(dev, addr, size, &error_abort);
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
>  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>                                      sPAPRMachineState *spapr)
>  {
> @@ -2383,10 +2467,21 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>  static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp)
>  {
> +    sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
>      MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
> 
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        error_setg(errp, "Memory hot unplug not supported by sPAPR");
> +        if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
> +            spapr_memory_unplug(hotplug_dev, dev, errp);
> +        } else {
> +            /* NOTE: this means there is a window after guest reset, prior to
> +             * CAS negotiation, where unplug requests will fail due to the
> +             * capability not being detected yet. This is a bit different than
> +             * the case with PCI unplug, where the events will be queued and
> +             * eventually handled by the guest after boot
> +             */
> +            error_setg(errp, "Memory hot unplug not supported for this guest");
> +        }
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>          if (!mc->query_hotpluggable_cpus) {
>              error_setg(errp, "CPU hot unplug not supported on this machine");
> @@ -2396,6 +2491,14 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      }
>  }
> 
> +static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> +                                                DeviceState *dev, Error **errp)
> +{
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        spapr_memory_unplug_request(hotplug_dev, dev, errp);
> +    }
> +}
> +
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> @@ -2482,6 +2585,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      hc->plug = spapr_machine_device_plug;
>      hc->unplug = spapr_machine_device_unplug;
>      mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +    hc->unplug_request = spapr_machine_device_unplug_request;

After this, spapr_core_unplug() call needs to be moved to
->unplug_request() as ->unplug() won't be called if ->unplug_request()
is present.
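
To illustrate the dispatch rule described above, here is a minimal sketch.
It assumes a simplified handler struct; SketchHandler, SketchCalled and
sketch_dispatch_unplug() are hypothetical names made up for this example
and are not QEMU's actual types or functions. It only shows the ordering:
if a request hook is installed it is the one that gets called, and the
plain unplug hook is used only when no request hook exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hotplug handler class; not QEMU's real type. */
typedef struct SketchHandler {
    void (*unplug_request)(const char *dev);
    void (*unplug)(const char *dev);
} SketchHandler;

typedef enum {
    CALLED_NONE,
    CALLED_UNPLUG_REQUEST,
    CALLED_UNPLUG
} SketchCalled;

/* Counters so a caller can observe which hook actually ran. */
static int sketch_request_calls;
static int sketch_unplug_calls;

static void sketch_on_request(const char *dev)
{
    (void)dev;
    sketch_request_calls++;
}

static void sketch_on_unplug(const char *dev)
{
    (void)dev;
    sketch_unplug_calls++;
}

/*
 * Prefer the unplug_request hook; fall back to unplug only when no
 * request hook is installed.
 */
static SketchCalled sketch_dispatch_unplug(const SketchHandler *h,
                                           const char *dev)
{
    assert(h != NULL);
    if (h->unplug_request) {
        h->unplug_request(dev);
        return CALLED_UNPLUG_REQUEST;
    }
    if (h->unplug) {
        h->unplug(dev);
        return CALLED_UNPLUG;
    }
    return CALLED_NONE;
}
```

With both hooks set, only the request counter is incremented, which is
why anything that was previously done from ->unplug() (such as the CPU
core path) would have to be reachable from ->unplug_request() as well.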

However, I am not able to get CPU hotplug working with your patchset.
It looks like the RTAS event is either getting missed or not getting
generated. I will get back after further debugging.

Regards,
Bharata.

* Re: [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 02/11] spapr_hcall: use spapr_ovec_* interfaces for CAS options Michael Roth
  2016-10-14  3:02   ` David Gibson
@ 2016-10-14  7:10   ` Bharata B Rao
  1 sibling, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2016-10-14  7:10 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:50PM -0500, Michael Roth wrote:
> Currently we access individual bytes of an option vector via
> ldub_phys() to test for the presence of a particular capability
> within that byte. Currently this is only done for the "dynamic
> reconfiguration memory" capability bit. If that bit is present,
> we pass a boolean value to spapr_h_cas_compose_response()
> to generate a modified device tree segment with the additional
> properties required to enable this functionality.
> 
> As more capability bits are added, we would need to modify the
> code to add additional option vector accesses and extend the
> param list for spapr_h_cas_compose_response() to include similar
> boolean values for these parameters.
> 
> Avoid this by switching to spapr_ovec_* helpers so we can do all
> the parsing in one shot and then test for these additional bits
> within spapr_h_cas_compose_response() directly.
> 
> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Nicely done!

Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Regards,
Bharata.

* Re: [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options Michael Roth
  2016-10-14  4:38   ` David Gibson
@ 2016-10-14  8:37   ` Bharata B Rao
  2016-10-14 18:04     ` Michael Roth
  1 sibling, 1 reply; 35+ messages in thread
From: Bharata B Rao @ 2016-10-14  8:37 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:55PM -0500, Michael Roth wrote:
> This adds machine options of the form:
> 
>   -machine pseries,legacy-hotplug-events=true
>   -machine pseries,legacy-hotplug-events=false
> 
> to denote whether or not we wish to force the use of "legacy" style
> hotplug events, which are surfaced through EPOW interrupts instead of
> a dedicated interrupt source, and lack certain features necessary,
> mainly, for memory unplug support.
> 
> If false, QEMU will default to "legacy" style unless the guest
> advertises support for the newer events via
> ibm,client-architecture-support hcall during early boot.
> 
> For pseries-2.7 and earlier we default to true, for newer machine
> types we default to false.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h      |  1 +
>  include/hw/ppc/spapr_ovec.h |  1 +
>  3 files changed, 33 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f8cde92..d80a6fa 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
> 
>      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> 
> +    /* use dedicated HP event source if guest supports it */
> +    if (spapr->use_hotplug_event_source) {
> +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);

The above comment can be confusing. Here you really mean that
the machine type version supports OV5_HP_EVT, right? Guest support
for the same is only determined later, during the CAS call.
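
The distinction can be sketched as follows. This is only an illustration
using a plain bitmask; the CAP_* bits and the sketch_* functions are
hypothetical names and do not correspond to the spapr_ovec_* helpers'
real signatures. The point is that what the machine offers at init time
and what is actually negotiated at CAS time are two different sets, the
second being the intersection of the first with what the guest requests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, standing in for option vector 5 bits. */
#define CAP_FORM1_AFFINITY (1u << 0)
#define CAP_HP_EVT         (1u << 1)

/* What the machine type offers; decided at machine init, before any guest runs. */
static uint32_t sketch_machine_caps(bool use_hotplug_event_source)
{
    uint32_t caps = CAP_FORM1_AFFINITY;

    if (use_hotplug_event_source) {
        caps |= CAP_HP_EVT;
    }
    return caps;
}

/* What is actually in effect; only known once the guest has made its CAS call. */
static uint32_t sketch_negotiate(uint32_t machine, uint32_t guest)
{
    return machine & guest;
}

static bool sketch_cap_negotiated(uint32_t negotiated, uint32_t cap)
{
    return (negotiated & cap) != 0;
}
```

A machine that offers CAP_HP_EVT to a guest that never requests it ends
up with the bit clear in the negotiated set, so a comment at init time
can only speak for the machine side.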

Regards,
Bharata.

* Re: [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-12 23:13 ` [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source Michael Roth
  2016-10-14  4:56   ` David Gibson
@ 2016-10-14  8:46   ` Bharata B Rao
  2016-10-14 18:51     ` Michael Roth
  1 sibling, 1 reply; 35+ messages in thread
From: Bharata B Rao @ 2016-10-14  8:46 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

On Wed, Oct 12, 2016 at 06:13:56PM -0500, Michael Roth wrote:
> Hotplug events were previously delivered using an EPOW interrupt
> and were queued by linux guests into a circular buffer. For traditional
> EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> events there are cases where this buffer can be exhausted, resulting
> in the loss of hotplug events, resets, etc.
> 
> Newer-style hotplug events are delivered using a dedicated event source.
> We enable this in supported guests by adding an additional standard
> event source in the guest device-tree via /event-sources, and, if
> the guest advertises support for the newer-style hotplug events,
> using the corresponding interrupt to signal the availability of
> hotplug/unplug events.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |  10 ++--
>  hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
>  include/hw/ppc/spapr.h |   3 +-
>  3 files changed, 120 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d80a6fa..2037222 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>                                     hwaddr initrd_size,
>                                     hwaddr kernel_size,
>                                     bool little_endian,
> -                                   const char *kernel_cmdline,
> -                                   uint32_t epow_irq)
> +                                   const char *kernel_cmdline)
>  {
>      void *fdt;
>      uint32_t start_prop = cpu_to_be32(initrd_base);
> @@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>      _FDT((fdt_end_node(fdt)));
> 
>      /* event-sources */
> -    spapr_events_fdt_skel(fdt, epow_irq);
> +    spapr_events_fdt_skel(fdt);
> 
>      /* /hypervisor node */
>      if (kvm_enabled()) {
> @@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
>      }
>      g_free(filename);
> 
> -    /* Set up EPOW events infrastructure */
> +    /* Set up RTAS event infrastructure */
>      spapr_events_init(spapr);
> 
>      /* Set up the RTC RTAS interfaces */
> @@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
>      /* Prepare the device tree */
>      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
>                                              kernel_size, kernel_le,
> -                                            kernel_cmdline,
> -                                            spapr->check_exception_irq);
> +                                            kernel_cmdline);
>      assert(spapr->fdt_skel != NULL);
> 
>      /* used by RTAS */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 4c7b6ae..f8bbec6 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -40,6 +40,7 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qemu/help_option.h"
>  #include "qemu/bcd.h"
> +#include "hw/ppc/spapr_ovec.h"
>  #include <libfdt.h>
> 
>  struct rtas_error_log {
> @@ -206,28 +207,104 @@ struct hp_log_full {
>      struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
> 
> -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> -#define EVENT_MASK_EPOW                      0x40000000
> -#define EVENT_MASK_HOTPLUG                   0x10000000
> -#define EVENT_MASK_IO                        0x08000000
> +typedef enum EventClassIndex {
> +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> +    EVENT_CLASS_EPOW                = 1,
> +    EVENT_CLASS_RESERVED            = 2,
> +    EVENT_CLASS_HOT_PLUG            = 3,
> +    EVENT_CLASS_IO                  = 4,
> +    EVENT_CLASS_MAX
> +} EventClassIndex;
> +
> +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> +
> +typedef struct EventSource {
> +    const char *name;
> +    int irq;
> +    uint32_t mask;
> +    bool enabled;
> +} EventSource;
> +
> +static EventSource event_source[EVENT_CLASS_MAX] = {
> +    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
> +    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
> +    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
> +    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
> +};
> +
> +static void rtas_event_source_register(EventClassIndex index, int irq)
> +{
> +    /* we only support 1 irq per event class at the moment */
> +    g_assert(!event_source[index].enabled);
> +    event_source[index].irq = irq;
> +    event_source[index].mask = EVENT_CLASS_MASK(index);
> +    event_source[index].enabled = true;
> +}
> 
> -void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> +void spapr_events_fdt_skel(void *fdt)
>  {
> -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> +    int i, count = 0;
> 
>      _FDT((fdt_begin_node(fdt, "event-sources")));
> 
> +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> +        /* TODO: what does 0 entail? */
> +        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
> +
> +        if (!event_source[i].enabled) {
> +            continue;
> +        }
> +
> +        _FDT((fdt_begin_node(fdt, event_source[i].name)));
> +        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> +        _FDT((fdt_end_node(fdt)));
> +
> +        irq_ranges[count++] = interrupts[0];
> +        irq_ranges[count++] = cpu_to_be32(1);
> +    }
> +
> +    /* TODO: confirm the count is the last expected element */
> +    irq_ranges[count] = cpu_to_be32(count);
> +    count++;
> +
>      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
>      _FDT((fdt_property(fdt, "interrupt-ranges",
> -                       irq_ranges, sizeof(irq_ranges))));
> +                       irq_ranges, count * sizeof(uint32_t))));
> 
> -    _FDT((fdt_begin_node(fdt, "epow-events")));
> -    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
>      _FDT((fdt_end_node(fdt)));
> +}
> 
> -    _FDT((fdt_end_node(fdt)));
> +static const EventSource *rtas_event_log_to_source(int log_type)
> +{
> +    const EventSource *source;
> +
> +    switch (log_type) {
> +        case RTAS_LOG_TYPE_HOTPLUG:
> +            source = &event_source[EVENT_CLASS_HOT_PLUG];
> +            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
> +                break;
> +            }

In addition to the above .enabled check, shouldn't you be checking if
the guest indeed supports the dedicated hotplug interrupt source before
returning the source ?

I believe this is the reason for the CPU hotplug failures that I mentioned
in reply to your 11/11 thread. I am on a 4.7.x kernel, which probably doesn't
support the hotplug interrupt source, but QEMU ends up registering and raising
such an interrupt.
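
One possible shape for the check suggested above, as a rough sketch only.
The enum and function names here are hypothetical and this is not the
code from the patch; it just shows the hotplug source being selected
only when it is both registered and negotiated by the guest, with EPOW
as the fallback otherwise.

```c
#include <stdbool.h>

/* Hypothetical log and source kinds; not the RTAS_LOG_TYPE_* values. */
typedef enum { SKETCH_LOG_EPOW, SKETCH_LOG_HOTPLUG } SketchLogKind;
typedef enum { SKETCH_SRC_EPOW, SKETCH_SRC_HOT_PLUG } SketchSourceKind;

/*
 * Pick the interrupt source for an event. The dedicated hotplug source
 * is used only if it is registered AND the guest negotiated support
 * for it; every other case is delivered via the EPOW source.
 */
static SketchSourceKind sketch_pick_source(SketchLogKind log,
                                           bool hp_source_enabled,
                                           bool guest_negotiated_hp_evt)
{
    if (log == SKETCH_LOG_HOTPLUG &&
        hp_source_enabled &&
        guest_negotiated_hp_evt) {
        return SKETCH_SRC_HOT_PLUG;
    }
    return SKETCH_SRC_EPOW;
}
```

With an older guest kernel the third argument would be false, so hotplug
events would keep going through EPOW even though the dedicated source is
registered.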

Regards,
Bharata.

* Re: [Qemu-devel] [PATCH 01/11] spapr_ovec: initial implementation of option vector helpers
  2016-10-14  2:39   ` David Gibson
@ 2016-10-14 17:49     ` Michael Roth
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-14 17:49 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

Quoting David Gibson (2016-10-13 21:39:19)
> On Wed, Oct 12, 2016 at 06:13:49PM -0500, Michael Roth wrote:
> > PAPR guests advertise their capabilities to the platform by passing
> > an ibm,architecture-vec structure via an
> > ibm,client-architecture-support hcall as described by LoPAPR v11,
> > B.6.2.3. during early boot.
> > 
> > Using this information, the platform enables the capabilities it
> > supports, then encodes a subset of those enabled capabilities (the
> > 5th option vector of the ibm,architecture-vec structure passed to
> > ibm,client-architecture-support) into the guest device tree via
> > "/chosen/ibm,architecture-vec-5".
> > 
> > The logical format of these option vectors is a bit-vector,
> > where individual bits are addressed/documented based on the byte-wise
> > offset from the beginning of the bit-vector, followed by the bit-wise
> > index starting from the byte-wise offset. Thus the bits of each of
> > these bytes are stored in reverse order. Additionally, the first
> > byte of each option vector encodes the length of the option vector,
> > so byte offsets begin at 1, and bit offsets at 0.
> 
> Heh.. pity qemu doesn't use the ccan bitmap module
> (http://ccodearchive.net/info/bitmap.html).  By design it always
> stores the bitmaps in IBM bit number ordering, because that's most
> obvious to a human reading a memory dump (for the purpose of bit
> vectors - in most situations the IBM numbering is dumb).
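
For reference, the addressing scheme described in the commit message can
be sketched like this. It assumes a raw byte buffer whose byte 0 is the
length byte and where bit 0 of each byte is the most significant bit;
sketch_ovec_raw_test() and sketch_ovec_raw_set() are hypothetical names,
not the spapr_ovec_* helpers from the patch, and the sketch does not
interpret or validate the length byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * vec[0] holds the length byte, so capability bytes start at offset 1.
 * Bit 0 is the most significant bit of the byte (IBM bit numbering).
 */
static bool sketch_ovec_raw_test(const uint8_t *vec, size_t byte, unsigned bit)
{
    assert(vec != NULL && byte >= 1 && bit < 8);
    return ((vec[byte] >> (7 - bit)) & 1u) != 0;
}

static void sketch_ovec_raw_set(uint8_t *vec, size_t byte, unsigned bit)
{
    assert(vec != NULL && byte >= 1 && bit < 8);
    vec[byte] |= (uint8_t)(0x80u >> bit);
}
```

Setting byte 1, bit 0 therefore yields 0x80 in vec[1], which is what
makes the documented (byte, bit) pairs awkward to map onto a
conventional LSB-first bitmap without a translation step.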
> 
> > This is not very intuitive for the purposes of mapping these bits to
> > a particular documented capability, so this patch introduces a set
> > of abstractions that encapsulate the work of parsing/encoding these
> > options vectors and testing for individual capabilities.
> > 
> > Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> A handful of small nits.
> 
> > ---
> >  hw/ppc/Makefile.objs        |   2 +-
> >  hw/ppc/spapr_ovec.c         | 244 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_ovec.h |  62 +++++++++++
> >  3 files changed, 307 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/ppc/spapr_ovec.c
> >  create mode 100644 include/hw/ppc/spapr_ovec.h
> > 
> > diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> > index 99a0d4e..2e0b0c9 100644
> > --- a/hw/ppc/Makefile.objs
> > +++ b/hw/ppc/Makefile.objs
> > @@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
> >  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
> >  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> >  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> > -obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> > +obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
> >  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >  obj-y += spapr_pci_vfio.o
> >  endif
> > diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> > new file mode 100644
> > index 0000000..ddc19f5
> > --- /dev/null
> > +++ b/hw/ppc/spapr_ovec.c
> > @@ -0,0 +1,244 @@
> > +/*
> > + * QEMU SPAPR Architecture Option Vector Helper Functions
> > + *
> > + * Copyright IBM Corp. 2016
> > + *
> > + * Authors:
> > + *  Bharata B Rao     <bharata@linux.vnet.ibm.com>
> > + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/ppc/spapr_ovec.h"
> > +#include "qemu/bitmap.h"
> > +#include "exec/address-spaces.h"
> > +#include "qemu/error-report.h"
> > +#include <libfdt.h>
> > +
> > +/* #define DEBUG_SPAPR_OVEC */
> > +
> > +#ifdef DEBUG_SPAPR_OVEC
> > +#define DPRINTFN(fmt, ...) \
> > +    do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTFN(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> > +#define OV_MAXBYTES 256 /* not including length byte */
> > +#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
> > +
> > +/* we *could* work with bitmaps directly, but handling the bitmap privately
> > + * allows us to more safely make assumptions about the bitmap size and
> > + * simplify the calling code somewhat
> > + */
> > +struct sPAPROptionVector {
> > +    unsigned long *bitmap;
> > +};
> > +
> > +static sPAPROptionVector *spapr_ovec_from_bitmap(unsigned long *bitmap)
> > +{
> > +    sPAPROptionVector *ov;
> > +
> > +    g_assert(bitmap);
> > +
> > +    ov = g_new0(sPAPROptionVector, 1);
> > +    ov->bitmap = bitmap;
> > +
> > +    return ov;
> > +}
> > +
> > +sPAPROptionVector *spapr_ovec_new(void)
> > +{
> > +    return spapr_ovec_from_bitmap(bitmap_new(OV_MAXBITS));
> > +}
> > +
> > +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
> > +{
> > +    sPAPROptionVector *ov;
> > +
> > +    g_assert(ov_orig);
> > +
> > +    ov = spapr_ovec_new();
> > +    bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
> > +
> > +    return ov;
> > +}
> > +
> > +void spapr_ovec_intersect(sPAPROptionVector *ov,
> > +                          sPAPROptionVector *ov1,
> > +                          sPAPROptionVector *ov2)
> > +{
> > +    g_assert(ov);
> > +    g_assert(ov1);
> > +    g_assert(ov2);
> > +
> > +    bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
> > +}
> > +
> > +/* returns true if options bits were removed, false otherwise */
> > +bool spapr_ovec_diff(sPAPROptionVector *ov,
> > +                     sPAPROptionVector *ov_old,
> > +                     sPAPROptionVector *ov_new)
> > +{
> > +    unsigned long *change_mask = bitmap_new(OV_MAXBITS);
> > +    unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
> > +    bool bits_were_removed = false;
> > +
> > +    g_assert(ov);
> > +    g_assert(ov_old);
> > +    g_assert(ov_new);
> > +
> > +    bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
> > +    bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
> > +    bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
> > +
> > +    if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
> > +        bits_were_removed = true;
> > +    }
> > +
> > +    g_free(change_mask);
> > +    g_free(removed_bits);
> > +
> > +    return bits_were_removed;
> > +}
> > +
> > +void spapr_ovec_cleanup(sPAPROptionVector *ov)
> > +{
> > +    if (ov) {
> > +        g_free(ov->bitmap);
> > +        g_free(ov);
> > +    }
> > +}
> > +
> > +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr)
> > +{
> > +    g_assert(ov);
> > +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> > +
> > +    set_bit(bitnr, ov->bitmap);
> > +}
> > +
> > +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr)
> > +{
> > +    g_assert(ov);
> > +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> > +
> > +    clear_bit(bitnr, ov->bitmap);
> > +}
> > +
> > +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
> > +{
> > +    g_assert(ov);
> > +    g_assert_cmpint(bitnr, <, OV_MAXBITS);
> > +
> > +    return test_bit(bitnr, ov->bitmap) ? true : false;
> > +}
> > +
> > +static void guest_byte_to_bitmap(uint8_t entry, unsigned long *bitmap,
> > +                                 long bitmap_offset)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < BITS_PER_BYTE; i++) {
> > +        if (entry & (1 << (BITS_PER_BYTE - 1 - i))) {
> > +            bitmap_set(bitmap, bitmap_offset + i, 1);
> > +        }
> > +    }
> > +}
> > +
> > +static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
> > +{
> > +    uint8_t entry = 0;
> > +    int i;
> > +
> > +    for (i = 0; i < BITS_PER_BYTE; i++) {
> > +        if (test_bit(bitmap_offset + i, bitmap)) {
> > +            entry |= (1 << (BITS_PER_BYTE - 1 - i));
> > +        }
> > +    }
> > +
> > +    return entry;
> > +}
> > +
> > +static target_ulong vector_addr(target_ulong table_addr, int vector)
> > +{
> > +    uint16_t vector_count, vector_len;
> > +    int i;
> > +
> > +    vector_count = ldub_phys(&address_space_memory, table_addr) + 1;
> > +    if (vector > vector_count) {
> > +        return 0;
> > +    }
> > +    table_addr++; /* skip nr option vectors */
> > +
> > +    for (i = 0; i < vector - 1; i++) {
> > +        vector_len = ldub_phys(&address_space_memory, table_addr) + 2;
> > +        table_addr += vector_len;
> > +    }
> > +    return table_addr;
> > +}
> > +
> > +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector)
> > +{
> > +    unsigned long *bitmap;
> > +    target_ulong addr;
> > +    uint16_t vector_len;
> > +    int i;
> > +
> > +    g_assert(table_addr);
> > +    g_assert_cmpint(vector, >=, 1); /* vector numbering starts at 1 */
> > +
> > +    addr = vector_addr(table_addr, vector);
> > +    if (!addr) {
> > +        /* specified vector isn't present */
> > +        return NULL;
> > +    }
> > +
> > +    vector_len = ldub_phys(&address_space_memory, addr++) + 1;
> 
> Here you use vector_len to be the number of bytes _not_ including the
> length byte, but in other places you use the same name including the
> length byte, which is a little confusing.

True, the additional offset to account for the length byte should be added
to the table_addr in vector_addr() rather than the vector_len beforehand.
Will fix.
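
For illustration, a minimal standalone sketch of the table walk with a
single meaning for the length (data bytes only, length byte excluded).
It reads from a plain byte buffer instead of guest memory, and
ovec_vector_offset()/ovec_vector_len() are hypothetical names, not part
of the patch. The layout is assumed from the code under review:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed table layout (taken from the patch, not from the spec text):
 *   byte 0:      number of vectors, encoded as n - 1
 *   per vector:  one length byte (number of data bytes, encoded as n - 1)
 *                followed by that many data bytes
 */

/* Number of data bytes in the vector whose length byte sits at 'off'. */
static size_t ovec_vector_len(const uint8_t *table, size_t off)
{
    return (size_t)table[off] + 1;
}

/*
 * Returns the offset of the length byte of 'vector' (numbering starts
 * at 1), or 0 if the vector is not present. Offset 0 is the count byte,
 * so it can never be a valid result.
 */
static size_t ovec_vector_offset(const uint8_t *table, size_t table_size,
                                 int vector)
{
    size_t off = 1; /* skip the vector count byte */
    int vector_count, i;

    assert(table && table_size > 0);
    assert(vector >= 1);

    vector_count = table[0] + 1;
    if (vector > vector_count) {
        return 0;
    }

    for (i = 1; i < vector; i++) {
        if (off >= table_size) {
            return 0;
        }
        /* skip the length byte itself, then the data bytes */
        off += 1 + ovec_vector_len(table, off);
    }

    return off < table_size ? off : 0;
}
```

The extra byte for the length field is added to the offset, so the
length value itself always means "data bytes".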

> 
> > +    if (vector_len >= OV_MAXBYTES) {
> 
> Do you mean >= here, or >?  If so, what's wrong with vector_len ==
> 256, I thought that was explicitly permitted in the encoding?  If not,

Yes, you're right, that should be vector_len > OV_MAXBYTES. I documented
this in OV_MAXBYTES define but still managed to screw it up.

> then there's no need for the test since a byte load + 1 can't possibly
> exceed 256 (you could have an assert if you want).

Good point, will switch it to an assert.
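
Roughly what I have in mind, as a standalone sketch rather than the
actual respin. DemoOptionVector and demo_ovec_parse() are hypothetical
names, and a plain bool array stands in for the bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OV_MAXBYTES 256 /* not including length byte */
#define OV_MAXBITS (OV_MAXBYTES * 8)

/* Hypothetical simplified vector: one bool per documented bit. */
typedef struct {
    bool bits[OV_MAXBITS];
} DemoOptionVector;

/* 'vec' points at the length byte of a single option vector. */
static void demo_ovec_parse(DemoOptionVector *ov, const uint8_t *vec)
{
    size_t vector_len = (size_t)vec[0] + 1; /* data bytes, 1..256 */
    size_t i;
    int bit;

    /* a byte load + 1 can't exceed 256, so this is an invariant */
    assert(vector_len <= OV_MAXBYTES);

    memset(ov, 0, sizeof(*ov));
    for (i = 0; i < vector_len; i++) {
        uint8_t entry = vec[1 + i];

        for (bit = 0; bit < 8; bit++) {
            /* documented bit 0 is the MSB of each byte */
            if (entry & (1u << (7 - bit))) {
                ov->bits[i * 8 + bit] = true;
            }
        }
    }
}
```

With a length byte of 0x01 and data bytes 0x80 0x04, this sets
documented bit 0 of byte 1 and bit 5 of byte 2.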

> 
> > +        error_report("guest option vector length %i exceeds max of %i",
> > +                     vector_len, OV_MAXBYTES);
> > +    }
> > +    bitmap = bitmap_new(OV_MAXBITS);
> > +
> > +    for (i = 0; i < vector_len; i++) {
> > +        uint8_t entry = ldub_phys(&address_space_memory, addr + i);
> > +        if (entry) {
> > +            DPRINTFN("read guest vector %2d, byte %3d / %3d: 0x%.2x",
> > +                     vector, i + 1, vector_len, entry);
> > +            guest_byte_to_bitmap(entry, bitmap, i * BITS_PER_BYTE);
> > +        }
> > +    }
> > +
> > +    return spapr_ovec_from_bitmap(bitmap);
> 
> This is the only caller of spapr_ovec_from_bitmap().  You could
> equally well just use ovec_new() here and reach in to populate the
> bitmap.  Means you don't need to expose spapr_ovec_from_bitmap() which
> is only safe if the supplied bitmap is the right size.

Yes, we're already using internal knowledge when sizing the bitmap.
spapr_ovec_from_bitmap() might still be safer if we could actually
assert the bitmap size, but since we can't it's probably not useful.
Will drop it.

> 
> > +}
> > +
> > +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> > +                           sPAPROptionVector *ov, const char *name)
> > +{
> > +    uint8_t vec[OV_MAXBYTES + 1];
> > +    uint16_t vec_len;
> > +    unsigned long lastbit;
> > +    int i;
> > +
> > +    g_assert(ov);
> > +
> > +    lastbit = MIN(find_last_bit(ov->bitmap, OV_MAXBITS), OV_MAXBITS - 1);
> > +    vec_len = lastbit / BITS_PER_BYTE + 2;
> 
> If no bits are set at all, find_last_bit() will return 2048, which
> means you'll include a max size vector when you actually want a
> minimum size vector.

Ah, missed that. Will fix that and address the additional case of
inconsistent vector_len meaning here.
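
Something along these lines, as a standalone sketch only.
demo_ovec_encode() is a hypothetical name, a bool array stands in for
the bitmap, and I'm assuming an empty vector should be encoded as a
single zero data byte since the length encoding can't express zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OV_MAXBYTES 256 /* not including length byte */
#define OV_MAXBITS (OV_MAXBYTES * 8)

/*
 * Encodes 'bits' (OV_MAXBITS entries) into 'out', which must hold
 * OV_MAXBYTES + 1 bytes. Returns the total number of bytes written,
 * including the length byte.
 */
static size_t demo_ovec_encode(const bool *bits, uint8_t *out)
{
    size_t data_len = 1; /* minimum size: used when no bits are set */
    size_t i;

    /* find the highest set bit, if any */
    for (i = OV_MAXBITS; i > 0; i--) {
        if (bits[i - 1]) {
            data_len = (i - 1) / 8 + 1;
            break;
        }
    }
    assert(data_len >= 1 && data_len <= OV_MAXBYTES);

    out[0] = (uint8_t)(data_len - 1); /* length encoded as n - 1 */
    memset(out + 1, 0, data_len);
    for (i = 0; i < data_len * 8; i++) {
        if (bits[i]) {
            /* documented bit 0 is the MSB of each byte */
            out[1 + i / 8] |= (uint8_t)(1u << (7 - i % 8));
        }
    }

    return data_len + 1;
}
```

Here data_len only ever means data bytes, and the length byte is
accounted for separately in the return value.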

> 
> > +    g_assert_cmpint(vec_len - 2, <=, UINT8_MAX);
> > +    vec[0] = vec_len - 2; /* guest expects length encoded as n - 2 */
> > +
> > +    for (i = 1; i < vec_len; i++) {
> > +        vec[i] = guest_byte_from_bitmap(ov->bitmap, (i - 1) * BITS_PER_BYTE);
> > +        if (vec[i]) {
> > +            DPRINTFN("encoding guest vector byte %3d / %3d: 0x%.2x",
> > +                     i, vec_len, vec[i]);
> > +        }
> > +    }
> > +
> > +    return fdt_setprop(fdt, fdt_offset, name, vec, vec_len);
> > +}
> > diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> > new file mode 100644
> > index 0000000..fba2d98
> > --- /dev/null
> > +++ b/include/hw/ppc/spapr_ovec.h
> > @@ -0,0 +1,62 @@
> > +/*
> > + * QEMU SPAPR Option/Architecture Vector Definitions
> > + *
> > + * Each architecture option is organized/documented by the following
> > + * in LoPAPR 1.1, Table 244:
> > + *
> > + *   <vector number>: the bit-vector in which the option is located
> > + *   <vector byte>: the byte offset of the vector entry
> > + *   <vector bit>: the bit offset within the vector entry
> > + *
> > + * where each vector entry can be one or more bytes.
> > + *
> > + * Firmware expects a somewhat literal encoding of this bit-vector
> > + * structure, where each entry is stored in little-endian so that the
> > + * byte ordering reflects that of the documentation, but where each bit
> > + * offset is from "left-to-right" in the traditional representation of
> > + * a byte value where the MSB is the left-most bit. Thus, each
> > + * individual byte encodes the option bits in reverse order of the
> > + * documented bit.
> > + *
> > + * These definitions/helpers attempt to abstract away this internal
> > + * representation so that we can define/set/test for individual option
> > + * bits using only the documented values. This is done mainly by relying
> > + * on a bitmap to approximate the documented "bit-vector" structure and
> > + * handling conversions to-from the internal representation under the
> > + * covers.
> > + *
> > + * Copyright IBM Corp. 2016
> > + *
> > + * Authors:
> > + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#if !defined(__HW_SPAPR_OPTION_VECTORS_H__)
> > +#define __HW_SPAPR_OPTION_VECTORS_H__
> > +
> > +#include "cpu.h"
> > +
> > +typedef struct sPAPROptionVector sPAPROptionVector;
> > +
> > +#define OV_BIT(byte, bit) ((byte - 1) * BITS_PER_BYTE + bit)
> > +
> > +/* interfaces */
> > +sPAPROptionVector *spapr_ovec_new(void);
> > +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> > +void spapr_ovec_intersect(sPAPROptionVector *ov,
> > +                          sPAPROptionVector *ov1,
> > +                          sPAPROptionVector *ov2);
> > +bool spapr_ovec_diff(sPAPROptionVector *ov,
> > +                     sPAPROptionVector *ov_old,
> > +                     sPAPROptionVector *ov_new);
> > +void spapr_ovec_cleanup(sPAPROptionVector *ov);
> > +void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
> > +void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
> > +bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
> > +sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
> > +int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
> > +                           sPAPROptionVector *ov, const char *name);
> > +
> > +#endif /* !defined (__HW_SPAPR_OPTION_VECTORS_H__) */
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-14  8:37   ` Bharata B Rao
@ 2016-10-14 18:04     ` Michael Roth
  2016-10-17  2:51       ` Bharata B Rao
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-14 18:04 UTC (permalink / raw)
  To: bharata; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

Quoting Bharata B Rao (2016-10-14 03:37:32)
> On Wed, Oct 12, 2016 at 06:13:55PM -0500, Michael Roth wrote:
> > This adds machine options of the form:
> > 
> >   -machine pseries,legacy-hotplug-events=true
> >   -machine pseries,legacy-hotplug-events=false
> > 
> > to denote whether or not we wish to force the use of "legacy" style
> > hotplug events, which are surfaced through EPOW interrupts instead of
> > a dedicated interrupt source, and lack certain features necessary,
> > mainly, for memory unplug support.
> > 
> > If false, QEMU will default to "legacy" style unless the guest
> > advertises support for the newer events via
> > ibm,client-architecture-support hcall during early boot.
> > 
> > For pseries-2.7 and earlier we default to true, for newer machine
> > types we default to false.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h      |  1 +
> >  include/hw/ppc/spapr_ovec.h |  1 +
> >  3 files changed, 33 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index f8cde92..d80a6fa 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
> > 
> >      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> > 
> > +    /* use dedicated HP event source if guest supports it */
> > +    if (spapr->use_hotplug_event_source) {
> > +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> 
> The above comment can be confusing. Here you really mean that
> the machine type version supports OV5_HP_EVT right ? Because
> guest support for the same is determined during cas call later.

What I was trying to get across is that support would only be enabled if
the guest indicates support for it later. What about something like:

/* advertise support for dedicated HP event source to guests */

> 
> Regards,
> Bharata.


* Re: [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-14  4:38   ` David Gibson
@ 2016-10-14 18:08     ` Michael Roth
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-14 18:08 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

Quoting David Gibson (2016-10-13 23:38:19)
> On Wed, Oct 12, 2016 at 06:13:55PM -0500, Michael Roth wrote:
> > This adds machine options of the form:
> > 
> >   -machine pseries,legacy-hotplug-events=true
> >   -machine pseries,legacy-hotplug-events=false
> > 
> > to denote whether or not we wish to force the use of "legacy" style
> > hotplug events, which are surfaced through EPOW interrupts instead of
> > a dedicated interrupt source, and lack certain features necessary,
> > mainly, for memory unplug support.
> > 
> > If false, QEMU will default to "legacy" style unless the guest
> > advertises support for the newer events via
> > ibm,client-architecture-support hcall during early boot.
> > 
> > For pseries-2.7 and earlier we default to true, for newer machine
> > types we default to false.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> Hrm.. I think it would be a little clearer if you could find a wording
> such that both the internal variable and the external property have
> the same sense - i.e. get rid of the ! in the property getters /
> setters.

Ok, I wasn't sure which direction was more useful from a user standpoint.
I think something like modern-hotplug-events=true|false might be
reasonable though.
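
As a rough standalone sketch of what that would look like (the
property name is only a suggestion at this point, and DemoMachineState
and the demo_* helpers are hypothetical stand-ins for the QOM
getters/setters in the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bit of sPAPRMachineState. */
typedef struct {
    bool use_hotplug_event_source;
} DemoMachineState;

/* Defaults as in the patch: off for pseries-2.7 and earlier, on otherwise. */
static void demo_machine_init(DemoMachineState *s, bool legacy_machine_type)
{
    assert(s);
    s->use_hotplug_event_source = !legacy_machine_type;
}

/* Property and field share the same sense, so no negation is needed. */
static bool demo_get_modern_hotplug_events(const DemoMachineState *s)
{
    assert(s);
    return s->use_hotplug_event_source;
}

static void demo_set_modern_hotplug_events(DemoMachineState *s, bool value)
{
    assert(s);
    s->use_hotplug_event_source = value;
}
```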

> 
> > ---
> >  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h      |  1 +
> >  include/hw/ppc/spapr_ovec.h |  1 +
> >  3 files changed, 33 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index f8cde92..d80a6fa 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
> >  
> >      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> >  
> > +    /* use dedicated HP event source if guest supports it */
> > +    if (spapr->use_hotplug_event_source) {
> > +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> > +    }
> > +
> >      /* init CPUs */
> >      if (machine->cpu_model == NULL) {
> >          machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> > @@ -2172,16 +2177,39 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
> >      spapr->kvm_type = g_strdup(value);
> >  }
> >  
> > +static bool spapr_get_legacy_hotplug_events(Object *obj, Error **errp)
> > +{
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > +
> > +    return !spapr->use_hotplug_event_source;
> > +}
> > +
> > +static void spapr_set_legacy_hotplug_events(Object *obj, bool value,
> > +                                            Error **errp)
> > +{
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > +
> > +    spapr->use_hotplug_event_source = !value;
> > +}
> > +
> >  static void spapr_machine_initfn(Object *obj)
> >  {
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> >  
> >      spapr->htab_fd = -1;
> > +    spapr->use_hotplug_event_source = true;
> >      object_property_add_str(obj, "kvm-type",
> >                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
> >      object_property_set_description(obj, "kvm-type",
> >                                      "Specifies the KVM virtualization mode (HV, PR)",
> >                                      NULL);
> > +    object_property_add_bool(obj, "legacy-hotplug-events",
> > +                            spapr_get_legacy_hotplug_events,
> > +                            spapr_set_legacy_hotplug_events,
> > +                            NULL);
> > +    object_property_set_description(obj, "legacy-hotplug-events",
> > +                                    "Use deprecated EPOW mechanism for hotplug events",
> > +                                    NULL);
> >  }
> >  
> >  static void spapr_machine_finalizefn(Object *obj)
> > @@ -2518,6 +2546,9 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> >  
> >  static void spapr_machine_2_7_instance_options(MachineState *machine)
> >  {
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> > +
> > +    spapr->use_hotplug_event_source = false;
> >  }
> >  
> >  static void spapr_machine_2_7_class_options(MachineClass *mc)
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 27a3328..d1a4a14 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -74,6 +74,7 @@ struct sPAPRMachineState {
> >      uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> > +    bool use_hotplug_event_source;
> >  
> >      /* Migration state */
> >      int htab_save_index;
> > diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> > index 47fa04c..92167c6 100644
> > --- a/include/hw/ppc/spapr_ovec.h
> > +++ b/include/hw/ppc/spapr_ovec.h
> > @@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
> >  /* option vector 5 */
> >  #define OV5_DRCONF_MEMORY       OV_BIT(2, 2)
> >  #define OV5_FORM1_AFFINITY      OV_BIT(5, 0)
> > +#define OV5_HP_EVT              OV_BIT(6, 5)
> >  
> >  /* interfaces */
> >  sPAPROptionVector *spapr_ovec_new(void);
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-14  4:56   ` David Gibson
@ 2016-10-14 18:44     ` Michael Roth
  2016-10-16 23:39       ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Roth @ 2016-10-14 18:44 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

Quoting David Gibson (2016-10-13 23:56:43)
> On Wed, Oct 12, 2016 at 06:13:56PM -0500, Michael Roth wrote:
> > Hotplug events were previously delivered using an EPOW interrupt
> > and were queued by linux guests into a circular buffer. For traditional
> > EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> > events there are cases where this buffer can be exhausted, resulting
> > in the loss of hotplug events, resets, etc.
> > 
> > Newer-style hotplug events are delivered using a dedicated event source.
> > We enable this in supported guests by adding an additional standard
> > event source in the guest device-tree via /event-sources, and, if
> > the guest advertises support for the newer-style hotplug events,
> > using the corresponding interrupt to signal the availability of
> > hotplug/unplug events.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> So.. are you saying that as well as allowing new event types, the new
> special hotplug event source effectively allows for a bigger queue?
> 
> Does that mean that we didn't even necessarily need the base+length
> unplug events, because we could now have sent the many single-LMB
> unplug requests that were necessary?  Or does it not increase the
> effective queue enough for that?

I assume there are still some internal limits, but the events
are now processed using a workqueue which should provide a bit more
headroom than the RTAS event buffer used for EPOW events. Maybe
John (on cc:) can provide more insight into what the actual limits are.

In either case, the possibility of huge memory hotplug/unplug
operations still warrants the use of count+index I think, and since
support for the new event delivery mechanism is negotiated using the
same option bit as count+index, there's not really any reason not to
use it when we can. For situations where we can't (CPU/PCI), it
might give us a bit of breathing room there as well.

> 
> > ---
> >  hw/ppc/spapr.c         |  10 ++--
> >  hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
> >  include/hw/ppc/spapr.h |   3 +-
> >  3 files changed, 120 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d80a6fa..2037222 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >                                     hwaddr initrd_size,
> >                                     hwaddr kernel_size,
> >                                     bool little_endian,
> > -                                   const char *kernel_cmdline,
> > -                                   uint32_t epow_irq)
> > +                                   const char *kernel_cmdline)
> >  {
> >      void *fdt;
> >      uint32_t start_prop = cpu_to_be32(initrd_base);
> > @@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >      _FDT((fdt_end_node(fdt)));
> >  
> >      /* event-sources */
> > -    spapr_events_fdt_skel(fdt, epow_irq);
> > +    spapr_events_fdt_skel(fdt);
> >  
> >      /* /hypervisor node */
> >      if (kvm_enabled()) {
> > @@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      }
> >      g_free(filename);
> >  
> > -    /* Set up EPOW events infrastructure */
> > +    /* Set up RTAS event infrastructure */
> >      spapr_events_init(spapr);
> >  
> >      /* Set up the RTC RTAS interfaces */
> > @@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      /* Prepare the device tree */
> >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> >                                              kernel_size, kernel_le,
> > -                                            kernel_cmdline,
> > -                                            spapr->check_exception_irq);
> > +                                            kernel_cmdline);
> >      assert(spapr->fdt_skel != NULL);
> >  
> >      /* used by RTAS */
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 4c7b6ae..f8bbec6 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -40,6 +40,7 @@
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qemu/help_option.h"
> >  #include "qemu/bcd.h"
> > +#include "hw/ppc/spapr_ovec.h"
> >  #include <libfdt.h>
> >  
> >  struct rtas_error_log {
> > @@ -206,28 +207,104 @@ struct hp_log_full {
> >      struct rtas_event_log_v6_hp hp;
> >  } QEMU_PACKED;
> >  
> > -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> > -#define EVENT_MASK_EPOW                      0x40000000
> > -#define EVENT_MASK_HOTPLUG                   0x10000000
> > -#define EVENT_MASK_IO                        0x08000000
> > +typedef enum EventClassIndex {
> > +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > +    EVENT_CLASS_EPOW                = 1,
> > +    EVENT_CLASS_RESERVED            = 2,
> > +    EVENT_CLASS_HOT_PLUG            = 3,
> > +    EVENT_CLASS_IO                  = 4,
> > +    EVENT_CLASS_MAX
> > +} EventClassIndex;
> > +
> > +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> > +
> > +typedef struct EventSource {
> > +    const char *name;
> > +    int irq;
> > +    uint32_t mask;
> > +    bool enabled;
> > +} EventSource;
> > +
> > +static EventSource event_source[EVENT_CLASS_MAX] = {
> > +    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
> > +    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
> > +    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
> > +    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
> > +};
> > +
> > +static void rtas_event_source_register(EventClassIndex index, int irq)
> > +{
> > +    /* we only support 1 irq per event class at the moment */
> > +    g_assert(!event_source[index].enabled);
> > +    event_source[index].irq = irq;
> > +    event_source[index].mask = EVENT_CLASS_MASK(index);
> > +    event_source[index].enabled = true;
> > +}
> 
> I really don't like adding a mutable global table.  This should
> probably be under the sPAPRMachineState.

I think I started off going that route. Not sure what steered me in this
direction, but will take another look at that approach.
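
A standalone sketch of the per-machine variant, just to show the shape
of it. All Demo*/demo_* names are hypothetical, and the struct stands
in for the fields that would be added to sPAPRMachineState:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum DemoEventClassIndex {
    DEMO_EVENT_CLASS_INTERNAL_ERRORS = 0,
    DEMO_EVENT_CLASS_EPOW            = 1,
    DEMO_EVENT_CLASS_RESERVED        = 2,
    DEMO_EVENT_CLASS_HOT_PLUG        = 3,
    DEMO_EVENT_CLASS_IO              = 4,
    DEMO_EVENT_CLASS_MAX
} DemoEventClassIndex;

/* unsigned constant avoids shifting into the sign bit for index 0 */
#define DEMO_EVENT_CLASS_MASK(index) (UINT32_C(1) << (31 - (index)))

typedef struct DemoEventSource {
    int irq;
    uint32_t mask;
    bool enabled;
} DemoEventSource;

/* the table lives in the machine state instead of a mutable global */
typedef struct DemoMachineState {
    DemoEventSource event_sources[DEMO_EVENT_CLASS_MAX];
} DemoMachineState;

static void demo_event_source_register(DemoMachineState *s,
                                       DemoEventClassIndex index, int irq)
{
    DemoEventSource *source = &s->event_sources[index];

    /* only 1 irq per event class */
    assert(!source->enabled);
    source->irq = irq;
    source->mask = DEMO_EVENT_CLASS_MASK(index);
    source->enabled = true;
}

/* hotplug events fall back to the EPOW source if there's no dedicated one */
static const DemoEventSource *demo_hotplug_source(const DemoMachineState *s)
{
    if (s->event_sources[DEMO_EVENT_CLASS_HOT_PLUG].enabled) {
        return &s->event_sources[DEMO_EVENT_CLASS_HOT_PLUG];
    }
    return &s->event_sources[DEMO_EVENT_CLASS_EPOW];
}
```

The masks produced by the macro match the EVENT_MASK_* values the
patch removes (0x40000000 for EPOW, 0x10000000 for hotplug).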

> 
> > -void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> > +void spapr_events_fdt_skel(void *fdt)
> >  {
> > -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> > +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> > +    int i, count = 0;
> >  
> >      _FDT((fdt_begin_node(fdt, "event-sources")));
> >  
> > +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> > +        /* TODO: what does 0 entail? */
> 
> It's just part of the interrupt specifier format expected by the
> event-sources binding.  It's not really important.
> 
> > +        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
> > +
> > +        if (!event_source[i].enabled) {
> > +            continue;
> > +        }
> > +
> > +        _FDT((fdt_begin_node(fdt, event_source[i].name)));
> > +        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> > +        _FDT((fdt_end_node(fdt)));
> > +
> > +        irq_ranges[count++] = interrupts[0];
> > +        irq_ranges[count++] = cpu_to_be32(1);
> > +    }
> > +
> > +    /* TODO: confirm the count is the last expected element */
> > +    irq_ranges[count] = cpu_to_be32(count);
> > +    count++;
> > +
> >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > -                       irq_ranges, sizeof(irq_ranges))));
> > +                       irq_ranges, count * sizeof(uint32_t))));
> >  
> > -    _FDT((fdt_begin_node(fdt, "epow-events")));
> > -    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> >      _FDT((fdt_end_node(fdt)));
> > +}
> >  
> > -    _FDT((fdt_end_node(fdt)));
> > +static const EventSource *rtas_event_log_to_source(int log_type)
> > +{
> > +    const EventSource *source;
> > +
> > +    switch (log_type) {
> > +        case RTAS_LOG_TYPE_HOTPLUG:
> > +            source = &event_source[EVENT_CLASS_HOT_PLUG];
> > +            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
> > +                break;
> > +            }
> 
> This should probably be using the flag you already have in the
> MachineState, rather than a global.
> 
> > +            /* fall back to epow for legacy hotplug interrupt source */
> > +        case RTAS_LOG_TYPE_EPOW:
> > +            source = &event_source[EVENT_CLASS_EPOW];
> > +            break;
> > +        default:
> > +            source = NULL;
> > +    }
> > +
> > +    return source;
> > +}
> > +
> > +static int rtas_event_log_to_irq(int log_type)
> > +{
> > +    const EventSource *source = rtas_event_log_to_source(log_type);
> > +
> > +    g_assert(source);
> > +    g_assert(source->enabled);
> > +
> > +    return source->irq;
> >  }
> >  
> >  static void rtas_event_log_queue(int log_type, void *data, bool exception)
> > @@ -248,19 +325,14 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >      sPAPREventLogEntry *entry = NULL;
> >  
> > -    /* we only queue EPOW events atm. */
> > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > -        return NULL;
> > -    }
> > -
> >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> > +
> >          if (entry->exception != exception) {
> >              continue;
> >          }
> >  
> > -        /* EPOW and hotplug events are surfaced in the same manner */
> > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > +        if (source->mask & event_mask) {
> >              break;
> >          }
> >      }
> > @@ -277,19 +349,14 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >      sPAPREventLogEntry *entry = NULL;
> >  
> > -    /* we only queue EPOW events atm. */
> > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > -        return false;
> > -    }
> > -
> >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> > +
> >          if (entry->exception != exception) {
> >              continue;
> >          }
> >  
> > -        /* EPOW and hotplug events are surfaced in the same manner */
> > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > +        if (source->mask & event_mask) {
> >              return true;
> >          }
> >      }
> > @@ -377,7 +444,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >  
> >      rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_EPOW)));
> >  }
> >  
> >  static void spapr_hotplug_set_signalled(uint32_t drc_index)
> > @@ -459,7 +527,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >  
> >      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_HOTPLUG)));
> >  }
> >  
> >  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> > @@ -505,6 +574,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >      uint64_t xinfo;
> >      sPAPREventLogEntry *event;
> >      struct rtas_error_log *hdr;
> > +    int i;
> >  
> >      if ((nargs < 6) || (nargs > 7) || nret != 1) {
> >          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > @@ -541,8 +611,11 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> >       * do the latter here, since our code relies on edge-triggered
> >       * interrupts.
> >       */
> > -    if (rtas_event_log_contains(mask, true)) {
> > -        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> > +        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
> > +            g_assert(event_source[i].enabled);
> > +            qemu_irq_pulse(xics_get_qirq(spapr->xics, event_source[i].irq));
> > +        }
> >      }
> >  
> >      return;
> > @@ -594,8 +667,17 @@ out_no_events:
> >  void spapr_events_init(sPAPRMachineState *spapr)
> >  {
> >      QTAILQ_INIT(&spapr->pending_events);
> > -    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, 0, false,
> > -                                            &error_fatal);
> > +
> > +    rtas_event_source_register(EVENT_CLASS_EPOW,
> > +                               xics_spapr_alloc(spapr->xics, 0, 0, false,
> > +                                                &error_fatal));
> > +
> > +    if (spapr->use_hotplug_event_source) {
> > +        rtas_event_source_register(EVENT_CLASS_HOT_PLUG,
> > +                                   xics_spapr_alloc(spapr->xics, 0, 0, false,
> > +                                                    &error_fatal));
> > +    }
> > +
> >      spapr->epow_notifier.notify = spapr_powerdown_req;
> >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> >      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index d1a4a14..2295ac6 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -71,7 +71,6 @@ struct sPAPRMachineState {
> >      sPAPROptionVector *ov5_cas;
> >      bool cas_reboot;
> >  
> > -    uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> >      bool use_hotplug_event_source;
> > @@ -579,7 +578,7 @@ struct sPAPREventLogEntry {
> >  };
> >  
> >  void spapr_events_init(sPAPRMachineState *sm);
> > -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
> > +void spapr_events_fdt_skel(void *fdt);
> >  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
> >                                   target_ulong addr, target_ulong size,
> >                                   bool cpu_update,
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-14  8:46   ` Bharata B Rao
@ 2016-10-14 18:51     ` Michael Roth
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-14 18:51 UTC (permalink / raw)
  To: bharata; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

Quoting Bharata B Rao (2016-10-14 03:46:20)
> On Wed, Oct 12, 2016 at 06:13:56PM -0500, Michael Roth wrote:
> > Hotplug events were previously delivered using an EPOW interrupt
> > and were queued by linux guests into a circular buffer. For traditional
> > EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> > events there are cases where this buffer can be exhausted, resulting
> > in the loss of hotplug events, resets, etc.
> > 
> > Newer-style hotplug events are delivered using a dedicated event source.
> > We enable this in supported guests by adding an additional standard
> > event source in the guest device-tree via /event-sources, and, if
> > the guest advertises support for the newer-style hotplug events,
> > using the corresponding interrupt to signal the availability of
> > hotplug/unplug events.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |  10 ++--
> >  hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
> >  include/hw/ppc/spapr.h |   3 +-
> >  3 files changed, 120 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d80a6fa..2037222 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >                                     hwaddr initrd_size,
> >                                     hwaddr kernel_size,
> >                                     bool little_endian,
> > -                                   const char *kernel_cmdline,
> > -                                   uint32_t epow_irq)
> > +                                   const char *kernel_cmdline)
> >  {
> >      void *fdt;
> >      uint32_t start_prop = cpu_to_be32(initrd_base);
> > @@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >      _FDT((fdt_end_node(fdt)));
> > 
> >      /* event-sources */
> > -    spapr_events_fdt_skel(fdt, epow_irq);
> > +    spapr_events_fdt_skel(fdt);
> > 
> >      /* /hypervisor node */
> >      if (kvm_enabled()) {
> > @@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      }
> >      g_free(filename);
> > 
> > -    /* Set up EPOW events infrastructure */
> > +    /* Set up RTAS event infrastructure */
> >      spapr_events_init(spapr);
> > 
> >      /* Set up the RTC RTAS interfaces */
> > @@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      /* Prepare the device tree */
> >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> >                                              kernel_size, kernel_le,
> > -                                            kernel_cmdline,
> > -                                            spapr->check_exception_irq);
> > +                                            kernel_cmdline);
> >      assert(spapr->fdt_skel != NULL);
> > 
> >      /* used by RTAS */
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 4c7b6ae..f8bbec6 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -40,6 +40,7 @@
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qemu/help_option.h"
> >  #include "qemu/bcd.h"
> > +#include "hw/ppc/spapr_ovec.h"
> >  #include <libfdt.h>
> > 
> >  struct rtas_error_log {
> > @@ -206,28 +207,104 @@ struct hp_log_full {
> >      struct rtas_event_log_v6_hp hp;
> >  } QEMU_PACKED;
> > 
> > -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> > -#define EVENT_MASK_EPOW                      0x40000000
> > -#define EVENT_MASK_HOTPLUG                   0x10000000
> > -#define EVENT_MASK_IO                        0x08000000
> > +typedef enum EventClassIndex {
> > +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > +    EVENT_CLASS_EPOW                = 1,
> > +    EVENT_CLASS_RESERVED            = 2,
> > +    EVENT_CLASS_HOT_PLUG            = 3,
> > +    EVENT_CLASS_IO                  = 4,
> > +    EVENT_CLASS_MAX
> > +} EventClassIndex;
> > +
> > +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> > +
> > +typedef struct EventSource {
> > +    const char *name;
> > +    int irq;
> > +    uint32_t mask;
> > +    bool enabled;
> > +} EventSource;
> > +
> > +static EventSource event_source[EVENT_CLASS_MAX] = {
> > +    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
> > +    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
> > +    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
> > +    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
> > +};
> > +
> > +static void rtas_event_source_register(EventClassIndex index, int irq)
> > +{
> > +    /* we only support 1 irq per event class at the moment */
> > +    g_assert(!event_source[index].enabled);
> > +    event_source[index].irq = irq;
> > +    event_source[index].mask = EVENT_CLASS_MASK(index);
> > +    event_source[index].enabled = true;
> > +}
> > 
> > -void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> > +void spapr_events_fdt_skel(void *fdt)
> >  {
> > -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> > +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> > +    int i, count = 0;
> > 
> >      _FDT((fdt_begin_node(fdt, "event-sources")));
> > 
> > +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> > +        /* TODO: what does 0 entail? */
> > +        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
> > +
> > +        if (!event_source[i].enabled) {
> > +            continue;
> > +        }
> > +
> > +        _FDT((fdt_begin_node(fdt, event_source[i].name)));
> > +        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> > +        _FDT((fdt_end_node(fdt)));
> > +
> > +        irq_ranges[count++] = interrupts[0];
> > +        irq_ranges[count++] = cpu_to_be32(1);
> > +    }
> > +
> > +    /* TODO: confirm the count is the last expected element */
> > +    irq_ranges[count] = cpu_to_be32(count);
> > +    count++;
> > +
> >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > -                       irq_ranges, sizeof(irq_ranges))));
> > +                       irq_ranges, count * sizeof(uint32_t))));
> > 
> > -    _FDT((fdt_begin_node(fdt, "epow-events")));
> > -    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> >      _FDT((fdt_end_node(fdt)));
> > +}
> > 
> > -    _FDT((fdt_end_node(fdt)));
> > +static const EventSource *rtas_event_log_to_source(int log_type)
> > +{
> > +    const EventSource *source;
> > +
> > +    switch (log_type) {
> > +        case RTAS_LOG_TYPE_HOTPLUG:
> > +            source = &event_source[EVENT_CLASS_HOT_PLUG];
> > +            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
> > +                break;
> > +            }
> 
> In addition to the above .enabled check, shouldn't you be checking if
> the guest indeed supports the dedicated hotplug interrupt source before
> returning the source ?

Yes, I believe you're right. I'd been relying on legacy-hotplug-events=true
to test the old signalling mechanism (which leaves this event source
disabled), but haven't tried PCI/CPU since adding the ovec stuff, so
didn't notice those were broken with legacy-hotplug-events=false.
Will fix it up in the next submission.
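
A minimal sketch of the check discussed above, not the actual follow-up
patch: the dedicated hotplug source is used only when it is registered and
the guest has negotiated support for it; otherwise hotplug events fall back
to the EPOW source. All sketch_* names are hypothetical, and guest_has_hp_evt
stands in for a check along the lines of
spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT) as used in patch 09.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; the real types live in hw/ppc/spapr_events.c. */
enum sketch_log_type { SKETCH_LOG_EPOW, SKETCH_LOG_HOTPLUG, SKETCH_LOG_OTHER };
enum sketch_class { SKETCH_CLASS_EPOW, SKETCH_CLASS_HOT_PLUG, SKETCH_CLASS_MAX };

struct sketch_source {
    int irq;
    bool enabled;
};

/*
 * Pick the event source for a log type. guest_has_hp_evt is a hypothetical
 * flag standing in for the negotiated option vector 5 bit.
 */
static const struct sketch_source *
sketch_log_to_source(const struct sketch_source *sources,
                     enum sketch_log_type log_type, bool guest_has_hp_evt)
{
    if (log_type == SKETCH_LOG_HOTPLUG &&
        sources[SKETCH_CLASS_HOT_PLUG].enabled && guest_has_hp_evt) {
        return &sources[SKETCH_CLASS_HOT_PLUG];
    }
    /* EPOW events, and hotplug events for legacy guests, use the EPOW source */
    if (log_type == SKETCH_LOG_HOTPLUG || log_type == SKETCH_LOG_EPOW) {
        return &sources[SKETCH_CLASS_EPOW];
    }
    return NULL;
}

/* Resolve the irq to pulse; the selected source must exist and be enabled. */
static int sketch_log_to_irq(const struct sketch_source *sources,
                             enum sketch_log_type log_type,
                             bool guest_has_hp_evt)
{
    const struct sketch_source *source =
        sketch_log_to_source(sources, log_type, guest_has_hp_evt);

    assert(source);
    assert(source->enabled);
    return source->irq;
}
```

With both sources registered, a guest that did not negotiate the new event
format would still be signalled through the EPOW interrupt.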

> 
> This I believe is the reason for the CPU hotplug failures that I mentioned
> in reply to your 11/11 thread. I am on 4.7.x kernel which probably doesn't
> support hotplug interrupt source, but QEMU ends up registering and raising
> such an interrupt.
> 
> Regards,
> Bharata.


* Re: [Qemu-devel] [PATCH 09/11] spapr: Add DRC count indexed hotplug identifier type
  2016-10-14  4:59   ` David Gibson
@ 2016-10-14 18:52     ` Michael Roth
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Roth @ 2016-10-14 18:52 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen

Quoting David Gibson (2016-10-13 23:59:37)
> On Wed, Oct 12, 2016 at 06:13:57PM -0500, Michael Roth wrote:
> > From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > 
> > Add support for DRC count indexed hotplug ID type which is primarily
> > needed for memory hot unplug. This type allows for specifying the
> > number of DRs that should be plugged/unplugged starting from a given
> > DRC index.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > * updated rtas_event_log_v6_hp to reflect count/index field ordering
> >   used in PAPR hotplug ACR
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_events.c  | 74 ++++++++++++++++++++++++++++++++++++++++----------
> >  include/hw/ppc/spapr.h |  4 +++
> >  2 files changed, 63 insertions(+), 15 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index f8bbec6..eeca800 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -175,6 +175,16 @@ struct epow_log_full {
> >      struct rtas_event_log_v6_epow epow;
> >  } QEMU_PACKED;
> >  
> > +union drc_identifier {
> > +    uint32_t index;
> > +    uint32_t count;
> > +    struct {
> > +        uint32_t count;
> > +        uint32_t index;
> > +    } count_indexed;
> > +    char name[1];
> > +} QEMU_PACKED;
> > +
> >  struct rtas_event_log_v6_hp {
> >  #define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> >      struct rtas_event_log_v6_section_header hdr;
> > @@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
> >  #define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> >  #define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> >  #define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> > +#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED              4
> >      uint8_t reserved;
> > -    union {
> > -        uint32_t index;
> > -        uint32_t count;
> > -        char name[1];
> > -    } drc;
> > +    union drc_identifier drc_id;
> >  } QEMU_PACKED;
> >  
> >  struct hp_log_full {
> > @@ -457,7 +464,7 @@ static void spapr_hotplug_set_signalled(uint32_t drc_index)
> >  
> >  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >                                      sPAPRDRConnectorType drc_type,
> > -                                    uint32_t drc)
> > +                                    union drc_identifier *drc_id)
> >  {
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >      struct hp_log_full *new_hp;
> > @@ -502,7 +509,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >      case SPAPR_DR_CONNECTOR_TYPE_PCI:
> >          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> >          if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
> > -            spapr_hotplug_set_signalled(drc);
> > +            spapr_hotplug_set_signalled(drc_id->index);
> >          }
> >          break;
> >      case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > @@ -520,9 +527,16 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> >      }
> >  
> >      if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
> > -        hp->drc.count = cpu_to_be32(drc);
> > +        hp->drc_id.count = cpu_to_be32(drc_id->count);
> >      } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
> > -        hp->drc.index = cpu_to_be32(drc);
> > +        hp->drc_id.index = cpu_to_be32(drc_id->index);
> > +    } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
> > +        /* we should not be using count_indexed value unless the guest
> > +         * supports dedicated hotplug event source
> > +         */
> > +        g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
> > +        hp->drc_id.count_indexed.count = cpu_to_be32(drc_id->count_indexed.count);
> > +        hp->drc_id.count_indexed.index = cpu_to_be32(drc_id->count_indexed.index);
> >      }
> >  
> >      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> > @@ -535,34 +549,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> >  {
> >      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> >      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > -    uint32_t index = drck->get_index(drc);
> > +    union drc_identifier drc_id;
> >  
> > +    drc_id.index = drck->get_index(drc);
> >      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> > -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
> > +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
> >  }
> >  
> >  void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
> >  {
> >      sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> >      sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > -    uint32_t index = drck->get_index(drc);
> > +    union drc_identifier drc_id;
> >  
> > +    drc_id.index = drck->get_index(drc);
> >      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> > -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
> > +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >  }
> >  
> >  void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
> >                                         uint32_t count)
> >  {
> > +    union drc_identifier drc_id;
> > +
> > +    drc_id.count = count;
> >      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> > -                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, count);
> > +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
> >  }
> >  
> >  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
> >                                            uint32_t count)
> >  {
> > +    union drc_identifier drc_id;
> > +
> > +    drc_id.count = count;
> >      spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT,
> > -                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, count);
> > +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> > +}
> > +
> > +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> > +                                            uint32_t count, uint32_t index)
> > +{
> > +    union drc_identifier drc_id;
> > +
> > +    drc_id.count_indexed.count = count;
> > +    drc_id.count_indexed.index = index;
> > +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> > +                            RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
> > +}
> > +
> > +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> > +                                               uint32_t count, uint32_t index)
> > +{
> > +    union drc_identifier drc_id;
> > +
> > +    drc_id.count_indexed.count = count;
> > +    drc_id.count_indexed.index = index;
> > +    spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED,
> > +                            RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
> >  }
> >  
> >  static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 2295ac6..11a2597 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -602,6 +602,10 @@ void spapr_hotplug_req_add_by_count(sPAPRDRConnectorType drc_type,
> >                                         uint32_t count);
> >  void spapr_hotplug_req_remove_by_count(sPAPRDRConnectorType drc_type,
> >                                            uint32_t count);
> > +void spapr_hotplug_req_add_by_count_indexed(sPAPRDRConnectorType drc_type,
> > +                                            uint32_t count, uint32_t index);
> > +void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
> > +                                               uint32_t count, uint32_t index);
> >  void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error **errp);
> >  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
> >                                      sPAPRMachineState *spapr);

Was this a mis-fire?

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 08/11] spapr_events: add support for dedicated hotplug event source
  2016-10-14 18:44     ` Michael Roth
@ 2016-10-16 23:39       ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-16 23:39 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, bharata, nfont, jallen


On Fri, Oct 14, 2016 at 01:44:17PM -0500, Michael Roth wrote:
> Quoting David Gibson (2016-10-13 23:56:43)
> > On Wed, Oct 12, 2016 at 06:13:56PM -0500, Michael Roth wrote:
> > > Hotplug events were previously delivered using an EPOW interrupt
> > > and were queued by linux guests into a circular buffer. For traditional
> > > EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> > > events there are cases where this buffer can be exhausted, resulting
> > > in the loss of hotplug events, resets, etc.
> > > 
> > > Newer-style hotplug events are delivered using a dedicated event source.
> > > We enable this in supported guests by adding an additional standard
> > > event source in the guest device-tree via /event-sources, and, if
> > > the guest advertises support for the newer-style hotplug events,
> > > using the corresponding interrupt to signal the availability of
> > > hotplug/unplug events.
> > > 
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > 
> > So.. are you saying that as well as allowing new event types, the new
> > special hotplug event source effectively allows for a bigger queue?
> > 
> > Does that mean that we didn't even necessarily need the base+length
> > unplug events, because we could now have sent the many single-LMB
> > unplug requests that were necessary?  Or does it not increase the
> > effective queue enough for that?
> 
> I assume there are still some internal limits, but the events
> are now processed using a workqueue which should provide a bit more
> headroom than the RTAS event buffer used for EPOW events. Maybe
> John (on cc:) can provide more insight into what the actual limits are.
> 
> In either case, the possibility of huge memory hotplug/unplug
> situations still warrants the use of count+index I think, and since
> support for the new event delivery mechanism is negotiated using the
> same option bit as count+index, there's not really any reason not to
> use it when we can. For situations where we can't (CPU/PCI), it
> might give us a bit of breathing room there as well.

Ok, makes sense.  Thanks for the extra information.
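
To make the count+index argument concrete, here is a rough sketch comparing
the number of events queued per request. It assumes a 256 MiB LMB size and
that the LMBs being unplugged have consecutive DRC indexes; both are
assumptions for illustration, and the sketch_* names are hypothetical. Only
the count/index field order is taken from the count_indexed struct in
patch 09.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed LMB size for this sketch only. */
#define SKETCH_LMB_SIZE (256ULL * 1024 * 1024)

/* Same field order as count_indexed in union drc_identifier (patch 09). */
struct sketch_count_indexed {
    uint32_t count;
    uint32_t index;
};

/* Number of events queued if every LMB is signalled by its own DRC index. */
static uint32_t sketch_events_by_index(uint64_t bytes)
{
    assert(bytes % SKETCH_LMB_SIZE == 0);
    return (uint32_t)(bytes / SKETCH_LMB_SIZE);
}

/* A single count+index identifier covering the same range of LMBs. */
static struct sketch_count_indexed
sketch_range(uint32_t first_drc_index, uint64_t bytes)
{
    struct sketch_count_indexed id = {
        .count = sketch_events_by_index(bytes),
        .index = first_drc_index,
    };
    return id;
}
```

Under these assumptions, unplugging 16 GiB takes 64 index-based events but
only one count+index event, regardless of how deep the guest-side queue is.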

> 
> > 
> > > ---
> > >  hw/ppc/spapr.c         |  10 ++--
> > >  hw/ppc/spapr_events.c  | 148 ++++++++++++++++++++++++++++++++++++++-----------
> > >  include/hw/ppc/spapr.h |   3 +-
> > >  3 files changed, 120 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index d80a6fa..2037222 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -275,8 +275,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> > >                                     hwaddr initrd_size,
> > >                                     hwaddr kernel_size,
> > >                                     bool little_endian,
> > > -                                   const char *kernel_cmdline,
> > > -                                   uint32_t epow_irq)
> > > +                                   const char *kernel_cmdline)
> > >  {
> > >      void *fdt;
> > >      uint32_t start_prop = cpu_to_be32(initrd_base);
> > > @@ -437,7 +436,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> > >      _FDT((fdt_end_node(fdt)));
> > >  
> > >      /* event-sources */
> > > -    spapr_events_fdt_skel(fdt, epow_irq);
> > > +    spapr_events_fdt_skel(fdt);
> > >  
> > >      /* /hypervisor node */
> > >      if (kvm_enabled()) {
> > > @@ -1944,7 +1943,7 @@ static void ppc_spapr_init(MachineState *machine)
> > >      }
> > >      g_free(filename);
> > >  
> > > -    /* Set up EPOW events infrastructure */
> > > +    /* Set up RTAS event infrastructure */
> > >      spapr_events_init(spapr);
> > >  
> > >      /* Set up the RTC RTAS interfaces */
> > > @@ -2076,8 +2075,7 @@ static void ppc_spapr_init(MachineState *machine)
> > >      /* Prepare the device tree */
> > >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> > >                                              kernel_size, kernel_le,
> > > -                                            kernel_cmdline,
> > > -                                            spapr->check_exception_irq);
> > > +                                            kernel_cmdline);
> > >      assert(spapr->fdt_skel != NULL);
> > >  
> > >      /* used by RTAS */
> > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > > index 4c7b6ae..f8bbec6 100644
> > > --- a/hw/ppc/spapr_events.c
> > > +++ b/hw/ppc/spapr_events.c
> > > @@ -40,6 +40,7 @@
> > >  #include "hw/ppc/spapr_drc.h"
> > >  #include "qemu/help_option.h"
> > >  #include "qemu/bcd.h"
> > > +#include "hw/ppc/spapr_ovec.h"
> > >  #include <libfdt.h>
> > >  
> > >  struct rtas_error_log {
> > > @@ -206,28 +207,104 @@ struct hp_log_full {
> > >      struct rtas_event_log_v6_hp hp;
> > >  } QEMU_PACKED;
> > >  
> > > -#define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> > > -#define EVENT_MASK_EPOW                      0x40000000
> > > -#define EVENT_MASK_HOTPLUG                   0x10000000
> > > -#define EVENT_MASK_IO                        0x08000000
> > > +typedef enum EventClassIndex {
> > > +    EVENT_CLASS_INTERNAL_ERRORS     = 0,
> > > +    EVENT_CLASS_EPOW                = 1,
> > > +    EVENT_CLASS_RESERVED            = 2,
> > > +    EVENT_CLASS_HOT_PLUG            = 3,
> > > +    EVENT_CLASS_IO                  = 4,
> > > +    EVENT_CLASS_MAX
> > > +} EventClassIndex;
> > > +
> > > +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> > > +
> > > +typedef struct EventSource {
> > > +    const char *name;
> > > +    int irq;
> > > +    uint32_t mask;
> > > +    bool enabled;
> > > +} EventSource;
> > > +
> > > +static EventSource event_source[EVENT_CLASS_MAX] = {
> > > +    [EVENT_CLASS_INTERNAL_ERRORS]       = { .name = "internal-errors", },
> > > +    [EVENT_CLASS_EPOW]                  = { .name = "epow-events", },
> > > +    [EVENT_CLASS_HOT_PLUG]              = { .name = "hot-plug-events", },
> > > +    [EVENT_CLASS_IO]                    = { .name = "ibm,io-events", },
> > > +};
> > > +
> > > +static void rtas_event_source_register(EventClassIndex index, int irq)
> > > +{
> > > +    /* we only support 1 irq per event class at the moment */
> > > +    g_assert(!event_source[index].enabled);
> > > +    event_source[index].irq = irq;
> > > +    event_source[index].mask = EVENT_CLASS_MASK(index);
> > > +    event_source[index].enabled = true;
> > > +}
> > 
> > I really don't like adding a mutable global table.  This should
> > probably be under the sPAPRMachineState.
> 
> I think I started off going that route. Not sure what steered me in this
> direction, but will take another look at that approach.
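
One possible shape for the per-machine approach suggested here, as a sketch
only: the table becomes a member of a machine-state struct and registration
takes the machine as an argument. struct sketch_machine is a hypothetical
stand-in for sPAPRMachineState, and the mask macro uses an unsigned constant
so that class index 0 does not shift into the sign bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_event_class {
    SKETCH_EVENT_CLASS_INTERNAL_ERRORS = 0,
    SKETCH_EVENT_CLASS_EPOW            = 1,
    SKETCH_EVENT_CLASS_RESERVED        = 2,
    SKETCH_EVENT_CLASS_HOT_PLUG        = 3,
    SKETCH_EVENT_CLASS_IO              = 4,
    SKETCH_EVENT_CLASS_MAX
};

/* Unsigned constant, so that index 0 yields 0x80000000 without overflow. */
#define SKETCH_EVENT_CLASS_MASK(index) (UINT32_C(1) << (31 - (index)))

struct sketch_event_source {
    int irq;
    uint32_t mask;
    bool enabled;
};

/* Hypothetical per-machine container, in place of a file-scope table. */
struct sketch_machine {
    struct sketch_event_source event_sources[SKETCH_EVENT_CLASS_MAX];
};

static void sketch_source_register(struct sketch_machine *m,
                                   enum sketch_event_class index, int irq)
{
    struct sketch_event_source *src = &m->event_sources[index];

    /* only one irq per event class, as in the patch */
    assert(!src->enabled);
    src->irq = irq;
    src->mask = SKETCH_EVENT_CLASS_MASK(index);
    src->enabled = true;
}
```

The resulting masks match the EVENT_MASK_* constants the patch removes
(0x40000000 for EPOW, 0x10000000 for hotplug).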
> 
> > 
> > > -void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> > > +void spapr_events_fdt_skel(void *fdt)
> > >  {
> > > -    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > > -    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> > > +    uint32_t irq_ranges[EVENT_CLASS_MAX * 2];
> > > +    int i, count = 0;
> > >  
> > >      _FDT((fdt_begin_node(fdt, "event-sources")));
> > >  
> > > +    for (i = 0, count = 0; i < EVENT_CLASS_MAX; i++) {
> > > +        /* TODO: what does 0 entail? */
> > 
> > It's just part of the interrupt specifier format expected by the
> > event-sources binding.  It's not really important.
> > 
> > > +        uint32_t interrupts[] = { cpu_to_be32(event_source[i].irq), 0 };
> > > +
> > > +        if (!event_source[i].enabled) {
> > > +            continue;
> > > +        }
> > > +
> > > +        _FDT((fdt_begin_node(fdt, event_source[i].name)));
> > > +        _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> > > +        _FDT((fdt_end_node(fdt)));
> > > +
> > > +        irq_ranges[count++] = interrupts[0];
> > > +        irq_ranges[count++] = cpu_to_be32(1);
> > > +    }
> > > +
> > > +    /* TODO: confirm the count is the last expected element */
> > > +    irq_ranges[count] = cpu_to_be32(count);
> > > +    count++;
> > > +
> > >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> > >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> > >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > > -                       irq_ranges, sizeof(irq_ranges))));
> > > +                       irq_ranges, count * sizeof(uint32_t))));
> > >  
> > > -    _FDT((fdt_begin_node(fdt, "epow-events")));
> > > -    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> > >      _FDT((fdt_end_node(fdt)));
> > > +}
> > >  
> > > -    _FDT((fdt_end_node(fdt)));
> > > +static const EventSource *rtas_event_log_to_source(int log_type)
> > > +{
> > > +    const EventSource *source;
> > > +
> > > +    switch (log_type) {
> > > +        case RTAS_LOG_TYPE_HOTPLUG:
> > > +            source = &event_source[EVENT_CLASS_HOT_PLUG];
> > > +            if (event_source[EVENT_CLASS_HOT_PLUG].enabled) {
> > > +                break;
> > > +            }
> > 
> > This should probably be using the flag you already have in the
> > MachineState, rather than a global.
> > 
> > > +            /* fall back to epow for legacy hotplug interrupt source */
> > > +        case RTAS_LOG_TYPE_EPOW:
> > > +            source = &event_source[EVENT_CLASS_EPOW];
> > > +            break;
> > > +        default:
> > > +            source = NULL;
> > > +    }
> > > +
> > > +    return source;
> > > +}
> > > +
> > > +static int rtas_event_log_to_irq(int log_type)
> > > +{
> > > +    const EventSource *source = rtas_event_log_to_source(log_type);
> > > +
> > > +    g_assert(source);
> > > +    g_assert(source->enabled);
> > > +
> > > +    return source->irq;
> > >  }
> > >  
> > >  static void rtas_event_log_queue(int log_type, void *data, bool exception)
> > > @@ -248,19 +325,14 @@ static sPAPREventLogEntry *rtas_event_log_dequeue(uint32_t event_mask,
> > >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >      sPAPREventLogEntry *entry = NULL;
> > >  
> > > -    /* we only queue EPOW events atm. */
> > > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > > -        return NULL;
> > > -    }
> > > -
> > >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > > +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> > > +
> > >          if (entry->exception != exception) {
> > >              continue;
> > >          }
> > >  
> > > -        /* EPOW and hotplug events are surfaced in the same manner */
> > > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > > +        if (source->mask & event_mask) {
> > >              break;
> > >          }
> > >      }
> > > @@ -277,19 +349,14 @@ static bool rtas_event_log_contains(uint32_t event_mask, bool exception)
> > >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > >      sPAPREventLogEntry *entry = NULL;
> > >  
> > > -    /* we only queue EPOW events atm. */
> > > -    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > > -        return false;
> > > -    }
> > > -
> > >      QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
> > > +        const EventSource *source = rtas_event_log_to_source(entry->log_type);
> > > +
> > >          if (entry->exception != exception) {
> > >              continue;
> > >          }
> > >  
> > > -        /* EPOW and hotplug events are surfaced in the same manner */
> > > -        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > > -            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > > +        if (source->mask & event_mask) {
> > >              return true;
> > >          }
> > >      }
> > > @@ -377,7 +444,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> > >  
> > >      rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, new_epow, true);
> > >  
> > > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > > +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_EPOW)));
> > >  }
> > >  
> > >  static void spapr_hotplug_set_signalled(uint32_t drc_index)
> > > @@ -459,7 +527,8 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
> > >  
> > >      rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> > >  
> > > -    qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > > +    qemu_irq_pulse(xics_get_qirq(spapr->xics,
> > > +                                 rtas_event_log_to_irq(RTAS_LOG_TYPE_HOTPLUG)));
> > >  }
> > >  
> > >  void spapr_hotplug_req_add_by_index(sPAPRDRConnector *drc)
> > > @@ -505,6 +574,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > >      uint64_t xinfo;
> > >      sPAPREventLogEntry *event;
> > >      struct rtas_error_log *hdr;
> > > +    int i;
> > >  
> > >      if ((nargs < 6) || (nargs > 7) || nret != 1) {
> > >          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > > @@ -541,8 +611,11 @@ static void check_exception(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> > >       * do the latter here, since our code relies on edge-triggered
> > >       * interrupts.
> > >       */
> > > -    if (rtas_event_log_contains(mask, true)) {
> > > -        qemu_irq_pulse(xics_get_qirq(spapr->xics, spapr->check_exception_irq));
> > > +    for (i = 0; i < EVENT_CLASS_MAX; i++) {
> > > +        if (rtas_event_log_contains(EVENT_CLASS_MASK(i), true)) {
> > > +            g_assert(event_source[i].enabled);
> > > +            qemu_irq_pulse(xics_get_qirq(spapr->xics, event_source[i].irq));
> > > +        }
> > >      }
> > >  
> > >      return;
> > > @@ -594,8 +667,17 @@ out_no_events:
> > >  void spapr_events_init(sPAPRMachineState *spapr)
> > >  {
> > >      QTAILQ_INIT(&spapr->pending_events);
> > > -    spapr->check_exception_irq = xics_spapr_alloc(spapr->xics, 0, 0, false,
> > > -                                            &error_fatal);
> > > +
> > > +    rtas_event_source_register(EVENT_CLASS_EPOW,
> > > +                               xics_spapr_alloc(spapr->xics, 0, 0, false,
> > > +                                                &error_fatal));
> > > +
> > > +    if (spapr->use_hotplug_event_source) {
> > > +        rtas_event_source_register(EVENT_CLASS_HOT_PLUG,
> > > +                                   xics_spapr_alloc(spapr->xics, 0, 0, false,
> > > +                                                    &error_fatal));
> > > +    }
> > > +
> > >      spapr->epow_notifier.notify = spapr_powerdown_req;
> > >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> > >      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index d1a4a14..2295ac6 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -71,7 +71,6 @@ struct sPAPRMachineState {
> > >      sPAPROptionVector *ov5_cas;
> > >      bool cas_reboot;
> > >  
> > > -    uint32_t check_exception_irq;
> > >      Notifier epow_notifier;
> > >      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> > >      bool use_hotplug_event_source;
> > > @@ -579,7 +578,7 @@ struct sPAPREventLogEntry {
> > >  };
> > >  
> > >  void spapr_events_init(sPAPRMachineState *sm);
> > > -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
> > > +void spapr_events_fdt_skel(void *fdt);
> > >  int spapr_h_cas_compose_response(sPAPRMachineState *sm,
> > >                                   target_ulong addr, target_ulong size,
> > >                                   bool cpu_update,
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 07/11] spapr: add hotplug interrupt machine options
  2016-10-14 18:04     ` Michael Roth
@ 2016-10-17  2:51       ` Bharata B Rao
  0 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2016-10-17  2:51 UTC (permalink / raw)
  To: Michael Roth; +Cc: qemu-devel, qemu-ppc, david, nfont, jallen

On Fri, Oct 14, 2016 at 01:04:37PM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-10-14 03:37:32)
> > On Wed, Oct 12, 2016 at 06:13:55PM -0500, Michael Roth wrote:
> > > This adds machine options of the form:
> > > 
> > >   -machine pseries,legacy-hotplug-events=true
> > >   -machine pseries,legacy-hotplug-events=false
> > > 
> > > to denote whether or not we wish to force the use of "legacy" style
> > > hotplug events, which are surfaced through EPOW interrupts instead of
> > > a dedicated interrupt source and lack certain features needed
> > > mainly for memory unplug support.
> > > 
> > > If false, QEMU will default to "legacy" style unless the guest
> > > advertises support for the newer events via
> > > ibm,client-architecture-support hcall during early boot.
> > > 
> > > For pseries-2.7 and earlier we default to true; for newer machine
> > > types we default to false.
> > > 
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr.c              | 31 +++++++++++++++++++++++++++++++
> > >  include/hw/ppc/spapr.h      |  1 +
> > >  include/hw/ppc/spapr_ovec.h |  1 +
> > >  3 files changed, 33 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index f8cde92..d80a6fa 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1816,6 +1816,11 @@ static void ppc_spapr_init(MachineState *machine)
> > > 
> > >      spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> > > 
> > > +    /* use dedicated HP event source if guest supports it */
> > > +    if (spapr->use_hotplug_event_source) {
> > > +        spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> > 
> > The above comment can be confusing. Here you really mean that
> > the machine type version supports OV5_HP_EVT, right? Because
> > guest support for the same is determined later, during the CAS call.
> 
> I was trying to get across that support would only be enabled if the
> guest indicates support for it later. What about something like:
> 
> /* advertise support for dedicated HP event source to guests */

Sounds good. Thanks.

> > 
> > Regards,
> > Bharata.
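
For reference, the advertise-then-negotiate behaviour discussed in this
message can be sketched roughly as below. This is only an illustration of
the semantics described in the commit message, not the code from the patch:
DemoMachine, DEMO_OV5_HP_EVT and the demo_* helpers are hypothetical
stand-ins and do not use QEMU's spapr_ovec_* interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for OV5_HP_EVT; the value is arbitrary. */
enum { DEMO_OV5_HP_EVT = 0x1 };

/* Hypothetical model of the machine state relevant to the negotiation. */
typedef struct {
    bool legacy_hotplug_events; /* models legacy-hotplug-events=true|false */
    uint32_t ov5_advertised;    /* option bits the machine offers to guests */
    uint32_t ov5_negotiated;    /* option bits agreed on after the CAS call */
} DemoMachine;

/* At machine init we only advertise support; nothing is enabled yet. */
static void demo_machine_init(DemoMachine *m, bool legacy)
{
    m->legacy_hotplug_events = legacy;
    m->ov5_advertised = legacy ? 0 : DEMO_OV5_HP_EVT;
    m->ov5_negotiated = 0;
}

/* At CAS time the guest reports its own bits; keep only the common ones. */
static void demo_cas(DemoMachine *m, uint32_t guest_ov5)
{
    m->ov5_negotiated = m->ov5_advertised & guest_ov5;
    /* The negotiated set can never exceed what was advertised. */
    assert((m->ov5_negotiated & ~m->ov5_advertised) == 0);
}

/* The dedicated event source is used only if both sides opted in. */
static bool demo_use_dedicated_source(const DemoMachine *m)
{
    return (m->ov5_negotiated & DEMO_OV5_HP_EVT) != 0;
}
```

In this model, legacy-hotplug-events=true always yields legacy events, and
legacy-hotplug-events=false yields the dedicated source only when the guest
also sets the bit during CAS, which is why "advertise" reads better than
"if guest supports it" in the init-time comment.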
