* [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug
@ 2014-12-23 12:30 Michael Roth
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
                   ` (16 more replies)
  0 siblings, 17 replies; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

These patches are based on ppc-next, and can also be obtained from:

https://github.com/mdroth/qemu/commits/spapr-hotplug-pci-v4

v4:
 * added documentation for sPAPR-based hotplug (Alexey)
 * reworked DR Connectors to be QOM devices, where sensor/indicator
   states are accessed via RTAS via object methods and also exposed
   via composition tree for introspection via qom-get/qom-fuse.
   attached devices are managed via state transitions handled by
   the DRC device (Alex)
 * DRC-related constants now defined in separate header file,
   implemented as enum types where applicable
 * removed stub implementations of sensors that were not relevant
   to dynamic-reconfiguration. we now return "not implemented"
   if a guest attempts to access them via rtas-get-sensor or
   rtas-set-indicator-state
 * added DRC reset hooks to complete unplug for devices awaiting
   additional action from the guest before removal
 * incorporated endian fixes from Bharata and tested on ppc64le
   (Alex/Bharata)
 * used rtas_{ld,st} helpers in place of cpu_physical_memory_map
   for configure-connector implementation (Alex)
 * used b_* helper macros for properties related to OF PCI Binding
   (Alexey)
 * added dynamic-reconfiguration option to spapr-pci-host-bridge to
   enable/disable PCI hotplug for child bus
 * added pseries-2.3 machine and compat code to disable PCI hotplug by
   default for older machine types (Alex)
 * removed OF properties and DRC instances related to hotplugging of
   PHBs. this is not a prereq for PCI hotplug and will be handled as
   a separate series
 * moved generation of boot-time devices properties to common helper
   that can be re-used for memory, cpu, and phb. (Bharata)
 * re-organized patches so that pci, memory, cpu, phb should base
   cleanly on common set of patches implementing core DRC functionality
   (Bharata)
 * moved PCI 0-address fix to separate series (Alex)

v3:
 * dropped emulation of firmware-managed BAR allocation. this will be
   introduced via a follow-up series via a -machine flag and tied to
   a separate hotplug event to avoid a race condition with guest vs.
   "firmware"-managed BAR allocation, in conjunction with required
   fixes to rpaphp hotplug kernel module to utilize this mode.
 * moved drc_table into sPAPREnvironment (Alexey)
 * moved INDICATOR_* constants and friends into spapr_pci.c (Alexey)
 * use prefixes for global types (DrcEntry/ConfigureConnectorState) (Alexey)
 * updated for new hotplug interface (Alexey)
 * fixed get-power-level to report current power-level rather than
   desired (Alexey)
 * rebased to latest ppc-next

v2:
  * re-ordered patches to fix build bisectability (Alexey)
  * replaced g_warning with DPRINTF in RTAS calls for guest errors (Alexey)
  * replaced g_warning with fprintf for qemu errors (Alexey)
  * updated RTAS calls to use pre-existing error/success macros (Alexey)
  * replaced DR_*/SENSOR_* macros with INDICATOR_* for set-indicator/
    get-sensor-state (Alexey)

OVERVIEW

These patches add support for PCI hotplug for SPAPR guests. We advertise
each PHB as DR-capable (as defined by PAPR 13.5/13.6) with 32 hotpluggable
PCI slots per PHB, which models a standard PCI expansion device for Power
machines where the DRC name/loc-code/index for each slot are generated
based on bus/slot number.
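
As a rough illustration of the index part, the documentation added in
patch 01 places the resource type in bits 31:28 and a per-type id in
bits 27:0. The sketch below packs and unpacks such an index. The helper
and enum names are hypothetical, and how the id is derived from bus/slot
number is not shown here; only the field layout and type codes come from
the documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout from the patch 01 documentation:
 * bits 31:28 hold the resource type, bits 27:0 hold the per-type id. */
#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    ((UINT32_C(1) << DRC_INDEX_TYPE_SHIFT) - 1)

/* Type codes as listed in the documentation (enum name is hypothetical). */
enum drc_type_code {
    DRC_TYPE_CODE_CPU = 1,
    DRC_TYPE_CODE_PHB = 2,
    DRC_TYPE_CODE_VIO = 3,
    DRC_TYPE_CODE_PCI = 4,
    DRC_TYPE_CODE_MEM = 8,
};

/* Hypothetical helper: combine a type code and an id into a DRC index.
 * Any id bits above bit 27 are masked off. */
static uint32_t drc_index_pack(enum drc_type_code type, uint32_t id)
{
    assert((uint32_t)type <= 0xf);
    return ((uint32_t)type << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

/* Hypothetical helpers: split an index back into its two fields. */
static uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}

static uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}
```

With this layout, a PCI connector with id 3 would get index 0x40000003.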

This is compatible with existing guest kernels via the rpaphp hotplug
module, and existing userspace tools such as drmgr/librtas/rtas_errd for
managing devices.

NOTES / ADDITIONAL DEPENDENCIES

This series relies on v1.2.19 or later of powerpc-utils (drmgr, rtas_errd,
ppc64-diag, and librtas components, specifically), which will automate
guest-side hotplug setup in response to an EPOW event emitted by QEMU. For
guests with older versions of powerpc-utils, a manual workaround must be
used (documented below).

Note that this relies on a patch to core PCI code which allows for the
use of a 0-address IO BAR for PCI devices. Without this patch, the first
hotplugged device will likely fail. This patch will be handled separately,
but is included in the development tree below for testing:

https://github.com/mdroth/qemu/commits/spapr-hotplug-pci

PATCH LAYOUT

Patches
        1     Documentation for sPAPR Dynamic-Reconfiguration/hotplug
        2     Initial implementation for sPAPRDRConnector device
        3-7   Guest RTAS calls to interact with DRC devices
        8-9   Introduce RTAS events for signalling hotplug operations
              to guest, using existing infrastructure of
              EPOW/check-exception events
        10    DRC helper code to populate DT descriptions of present DRC
              devices
        11    pseries-2.3 machine type to enable hotplug functionality by
              default, and leave it disabled for pre-2.2 to maintain migration
              compatibility.
        12    spapr-pci-host-bridge option to selectively enable PCI hotplug/DR
              on a PHB-by-PHB basis
        13-17 PCI-specific hotplug hooks and DRC creation to enable PCI
              hotplug and hotplug events
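
For readers who want a concrete picture of the hotplug event payload
documented in patch 01, here is a minimal sketch that serializes the
"DRC index" variant of the section into a byte buffer in big-endian
order. The function and constant names are hypothetical, and the byte
offsets assume the packed struct layout shown in that documentation
(8-byte header, four one-byte fields, 4-byte union), giving 16 bytes in
total. It is not the code used by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HP_SECTION_ID       0x4850 /* "HP", per the documented struct */
#define HP_TYPE_PCI         5
#define HP_ACTION_ADD       1
#define HP_ID_DRC_INDEX     2
#define HP_SECTION_SIZE     16     /* assumed packed size, see lead-in */

static void be16_write(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void be32_write(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/* Hypothetical helper: encode a hotplug section that identifies the
 * resource by DRC index. Returns the number of bytes written, or 0 if
 * the buffer is too small. */
static size_t hp_section_encode_index(uint8_t *buf, size_t buf_len,
                                      uint8_t type, uint8_t action,
                                      uint32_t drc_index)
{
    assert(buf != NULL);
    if (buf_len < HP_SECTION_SIZE) {
        return 0;
    }
    be16_write(buf + 0, HP_SECTION_ID);   /* section_id */
    be16_write(buf + 2, HP_SECTION_SIZE); /* section_length */
    buf[4] = 1;                           /* section_version */
    buf[5] = 0;                           /* section_subtype, unused */
    be16_write(buf + 6, 0);               /* creator_component_id, unused */
    buf[8] = type;                        /* hotplug_type */
    buf[9] = action;                      /* hotplug_action */
    buf[10] = HP_ID_DRC_INDEX;            /* hotplug_identifier */
    buf[11] = 0;                          /* reserved */
    be32_write(buf + 12, drc_index);      /* drc.index */
    return HP_SECTION_SIZE;
}
```

A DRC-name variant would instead append the name string and add its
length to section_length, as the documentation describes.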

USAGE

For guests with powerpc-utils 1.2.19+:
  hotplug:
    qemu:
      device_add e1000,id=slot0
  unplug:
    qemu:
      device_del slot0

For guests with powerpc-utils prior to 1.2.19:
  hotplug:
    qemu:
      device_add e1000,id=slot0
    guest:
      drmgr -c pci -s "C0" -n -a
      echo 1 >/sys/bus/pci/rescan
  unplug:
    guest:
      drmgr -c pci -s "C0" -n -r
      echo 1 >/sys/bus/pci/devices/0000:00:00.0/remove
    qemu:
      device_del slot0


* [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-16  5:28   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This adds a general overview of hotplug/dynamic-reconfiguration
for sPAPR/pSeries guests.

As specified in PAPR+ v2.7.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 docs/specs/ppc-spapr-hotplug.txt | 287 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 docs/specs/ppc-spapr-hotplug.txt

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
new file mode 100644
index 0000000..6f2863f
--- /dev/null
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -0,0 +1,287 @@
+= sPAPR Dynamic Reconfiguration =
+
+sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration to
+handle hotplugging of dynamic "physical" resources like PCI cards, or
+"logical"/paravirtual resources like memory, CPUs, and "physical"
+host-bridges, which are generally managed by the host/hypervisor and provided
+to guests as virtualized resources. The specifics of dynamic-reconfiguration
+are documented extensively in PAPR+ v2.7, Section 13.1. This document
+provides a summary of that information as it applies to the implementation
+within QEMU.
+
+== Dynamic-reconfiguration Connectors ==
+
+To manage hotplug/unplug of these resources, a firmware abstraction known as
+a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
+resource to the guest, and provide an interface for the guest to manage
+configuration/removal of the resource associated with it.
+
+== Device-tree description of DRCs ==
+
+A set of four Open Firmware device tree array properties is used to describe
+the name/index/power-domain/type of each DRC allocated to a guest at
+boot-time. There may be multiple sets of these arrays, rooted at different
+paths in the device tree depending on the type of resource the DRCs manage.
+
+In some cases, the DRCs themselves may be provided by a dynamic resource,
+such as the DRCs managing PCI slots on a hotplugged PHB. In this case the
+arrays would be fetched as part of the device tree retrieval interfaces
+for hotplugged resources described under "Guest->Host interface".
+
+The array properties are described below. Each entry/element in an array
+describes the DRC identified by the element in the corresponding position
+of ibm,drc-indexes:
+
+ibm,drc-names:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <name> string encoded as a byte array
+
+  <name> values for logical/virtual resources are defined in PAPR+ v2.7,
+  Section 13.5.2.4, and basically consist of the type of the resource
+  followed by a space and a numerical value that's unique across resources
+  of that type.
+
+  <name> values for "physical" resources such as PCI or VIO devices are
+  defined as being "location codes", which are the "location labels" of
+  each encapsulating device, starting from the chassis down to the
+  individual slot for the device, concatenated by a hyphen. This provides
+  a mapping of resources to a physical location in a chassis for debugging
+  purposes. For QEMU, this mapping is less important, so we assign a
+  location code that conforms to naming specifications, but is simply a
+  location label for the slot by itself to simplify the implementation.
+  The naming convention for location labels is documented in detail in
+  PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
+  for PCI/VIO device slots, where <n> is unique across all PCI/VIO
+  device slots.
+
+ibm,drc-indexes:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
+    in the machine
+
+  <index> is arbitrary, but in the case of QEMU we try to maintain the
+  convention used to assign them to pSeries guests on pHyp:
+
+    bit[31:28]: integer encoding of <type>, where <type> is:
+                  1 for CPU resource
+                  2 for PHB resource
+                  3 for VIO resource
+                  4 for PCI resource
+                  8 for Memory resource
+    bit[27:0]: integer encoding of <id>, where <id> is unique across
+                 all resources of specified type
+
+ibm,drc-power-domains:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
+    power domain the resource will be assigned to. In the case of QEMU
+    we associate all resources with a "live insertion" domain, where the
+    power is assumed to be managed automatically. The integer value for
+    this domain is a special value of -1.
+
+
+ibm,drc-types:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <type> string encoded as a byte array
+
+  <type> is assigned as follows:
+    "CPU" for a CPU
+    "PHB" for a physical host-bridge
+    "SLOT" for a VIO slot
+    "28" for a PCI slot
+    "MEM" for memory resource
+
+== Guest->Host interface to manage dynamic resources ==
+
+Each DRC is given a globally unique DRC Index, and resources associated with
+a particular DRC are configured/managed by the guest via a number of RTAS
+calls which reference individual DRCs based on the DRC index. This can be
+considered the guest->host interface.
+
+rtas-set-power-level:
+  arg[0]: integer identifying power domain
+  arg[1]: new power level for the domain, 0-100
+  output[0]: status, 0 on success
+  output[1]: power level after command
+
+  Set the power level for a specified power domain
+
+rtas-get-power-level:
+  arg[0]: integer identifying power domain
+  output[0]: status, 0 on success
+  output[1]: current power level
+
+  Get the power level for a specified power domain
+
+rtas-set-indicator:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  arg[2]: desired sensor value
+  output[0]: status, 0 on success
+
+  Set the state of an indicator or sensor. For the purpose of this document we
+  focus on the indicator/sensor types associated with a DRC. The types are:
+
+    9001: isolation-state, controls/indicates whether a device has been made
+          accessible to a guest
+
+          supported sensor values:
+            0: isolate, device is made inaccessible to guest OS
+            1: unisolate, device is made available to guest OS
+
+    9002: dr-indicator, controls "visual" indicator associated with device
+
+          supported sensor values:
+            0: inactive, resource may be safely removed
+            1: active, resource is in use and cannot be safely removed
+            2: identify, used to visually identify slot for interactive hotplug
+            3: action, in most cases, used in the same manner as identify
+
+    9003: allocation-state, generally only used for "logical" DR resources to
+          request the allocation/deallocation of a resource prior to acquiring
+          it via isolation-state->unisolate, or after releasing it via
+          isolation-state->isolate, respectively. for "physical" DR (like PCI
+          hotplug/unplug) the pre-allocation of the resource is implied and
+          this sensor is unused.
+
+          supported sensor values:
+            0: unusable, tell firmware/system the resource can be
+               unallocated/reclaimed and added back to the system resource pool
+            1: usable, request the resource be allocated/reserved for use by
+               guest OS
+            2: exchange, used to allocate a spare resource to use for fail-over
+               in certain situations. unused in QEMU
+            3: recover, used to reclaim a previously allocated resource that's
+               not currently allocated to the guest OS. unused in QEMU
+
+rtas-get-sensor-state:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  output[0]: status, 0 on success
+
+  Used to read an indicator or sensor value.
+
+  For DR-related operations, the only noteworthy sensor is dr-entity-sense,
+  which has a type value of 9003, as allocation-state does in the case of
+  rtas-set-indicator. The semantics/encodings of the sensor values are distinct
+  however:
+
+  supported sensor values for dr-entity-sense (9003) sensor:
+    0: empty,
+         for physical resources: DRC/slot is empty
+         for logical resources: unused
+    1: present,
+         for physical resources: DRC/slot is populated with a device/resource
+         for logical resources: resource has been allocated to the DRC
+    2: unusable,
+         for physical resources: unused
+         for logical resources: DRC has no resource allocated to it
+    3: exchange,
+         for physical resources: unused
+         for logical resources: resource available for exchange (see
+           allocation-state sensor semantics above)
+    4: recovery,
+         for physical resources: unused
+         for logical resources: resource available for recovery (see
+           allocation-state sensor semantics above)
+
+rtas-ibm-configure-connector:
+  arg[0]: guest physical address of 4096-byte work area buffer
+  arg[1]: 0, or address of additional 4096-byte work area buffer. only non-zero
+          if a prior RTAS response indicated a need for additional memory
+  output[0]: status:
+               0: completed transmittal of device-tree node
+               1: instruct guest to prepare for next DT sibling node
+               2: instruct guest to prepare for next DT child node
+               3: instruct guest to prepare for next DT property
+               4: instruct guest to ascend to parent DT node
+               5: instruct guest to provide additional work-area buffer
+                  via arg[1]
+            990x: instruct guest that operation took too long and to try
+                  again later
+
+  Used to fetch an OF device-tree description of the resource associated with
+  a particular DRC. The DRC index is encoded in the first 4-bytes of the first
+  work area buffer.
+
+  Work area layout, using 4-byte offsets:
+    wa[0]: DRC index of the DRC to fetch device-tree nodes from
+    wa[1]: 0 (hard-coded)
+    wa[2]: for next-sibling/next-child response:
+             wa offset of null-terminated string denoting the new node's name
+           for next-property response:
+             wa offset of null-terminated string denoting new property's name
+    wa[3]: for next-property response (unused otherwise):
+             byte-length of new property's value
+    wa[4]: for next-property response (unused otherwise):
+             new property's value, encoded as an OFDT-compatible byte array
+
+== hotplug/unplug events ==
+
+For most DR operations, the hypervisor will issue host->guest add/remove events
+using the EPOW/check-exception notification framework, where the host issues a
+check-exception interrupt, then provides an RTAS event log via an
+rtas-check-exception call issued by the guest in response. This framework is
+documented by PAPR+ v2.7, and already in use by QEMU for generating powerdown
+requests via EPOW events.
+
+For DR, this framework has been extended to include hotplug events, which were
+previously unneeded due to direct manipulation of DR-related guest userspace
+tools by host-level management such as an HMC. This level of management is not
+applicable to PowerKVM, hence the reason for extending the notification
+framework to support hotplug events.
+
+Note that these events are not yet formally part of the PAPR+ specification,
+but support for this format has already been implemented in DR-related
+guest tools such as powerpc-utils/librtas, as well as kernel patches that have
+been submitted to handle in-kernel processing of memory/cpu-related hotplug
+events[1], and is planned for formal inclusion in the PAPR+ specification. The
+hotplug-specific payload in QEMU is implemented as follows (with all values
+encoded in big-endian format):
+
+struct rtas_event_log_v6_hp {
+#define SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct section_header {
+        uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
+        uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
+                                         * plus the length of the DRC name
+                                         * if a DRC name identifier is
+                                         * specified for hotplug_identifier
+                                         */
+        uint8_t section_version;        /* version 1 */
+        uint8_t section_subtype;        /* unused */
+        uint16_t creator_component_id;  /* unused */
+    } hdr;
+#define RTAS_LOG_V6_HP_TYPE_CPU         1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY      2
+#define RTAS_LOG_V6_HP_TYPE_SLOT        3
+#define RTAS_LOG_V6_HP_TYPE_PHB         4
+#define RTAS_LOG_V6_HP_TYPE_PCI         5
+    uint8_t hotplug_type;               /* type of resource/device */
+#define RTAS_LOG_V6_HP_ACTION_ADD       1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE    2
+    uint8_t hotplug_action;             /* action (add/remove) */
+#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
+    uint8_t hotplug_identifier;         /* type of the resource identifier,
+                                         * which serves as the discriminator
+                                         * for the 'drc' union field below
+                                         */
+    uint8_t reserved;
+    union {
+        uint32_t index;                 /* DRC index of resource to take action
+                                         * on
+                                         */
+        uint32_t count;                 /* number of DR resources to take
+                                         * action on (guest chooses which)
+                                         */
+        char name[1];                   /* string representing the name of the
+                                         * DRC to take action on
+                                         */
+    } drc;
+} QEMU_PACKED;
+
+[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
-- 
1.9.1
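
To illustrate the array encoding documented above for ibm,drc-indexes
(a 4-byte big-endian entry count followed by 4-byte big-endian indexes),
here is a minimal decoding sketch. The function names are hypothetical
and the input is assumed to be the raw property value already read from
the device tree. It is not taken from any guest tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: decode an ibm,drc-indexes property value.
 * Returns the number of indexes copied into out[], or -1 if the
 * property is shorter than its entry count claims or out[] is too small.
 */
static int drc_indexes_parse(const uint8_t *prop, size_t prop_len,
                             uint32_t *out, size_t out_max)
{
    uint32_t count, i;

    assert(prop != NULL && out != NULL);
    if (prop_len < 4) {
        return -1;
    }
    count = be32_read(prop);              /* leading entry count */
    if (count > (prop_len - 4) / 4 || count > out_max) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        out[i] = be32_read(prop + 4 + 4 * (size_t)i);
    }
    return (int)count;
}
```

The same count-then-entries framing applies to ibm,drc-power-domains.
ibm,drc-names and ibm,drc-types carry NULL-terminated strings after the
count instead of fixed-size integers.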


* [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-02 10:32   ` Bharata B Rao
  2015-01-16  6:19   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such as
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.

Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes the 'guest' interface.

On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.
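
The interaction between the two interfaces can be sketched as a small
state model: attach moves the connector from isolated to unisolated, and
a removal requested while the guest still has the device unisolated is
deferred until the guest isolates it. This is a simplified stand-alone
model with hypothetical names, not the code in this patch. In
particular, the body of model_detach() is an assumption about how the
deferral might work, based on the awaiting_release check in
set_isolation_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values follow the isolation-state encoding in the documentation. */
enum iso_state { ISO_ISOLATED = 0, ISO_UNISOLATED = 1 };

struct drc_model {
    enum iso_state isolation_state;
    bool awaiting_release;   /* removal requested but not yet completed */
    const void *dev;         /* attached device, NULL when empty */
};

/* 'Physical' side: associate a device with an empty, isolated connector. */
static void model_attach(struct drc_model *drc, const void *dev)
{
    assert(drc->isolation_state == ISO_ISOLATED && drc->dev == NULL);
    drc->isolation_state = ISO_UNISOLATED;
    drc->dev = dev;
}

/* 'Physical' side: request removal. Assumed behaviour: defer if the
 * guest has not isolated the device yet. */
static void model_detach(struct drc_model *drc)
{
    if (drc->isolation_state != ISO_ISOLATED) {
        drc->awaiting_release = true;
        return;
    }
    drc->awaiting_release = false;
    drc->dev = NULL;
}

/* 'Guest' side: isolation-state change, completing a pending removal. */
static void model_set_isolation_state(struct drc_model *drc,
                                      enum iso_state state)
{
    drc->isolation_state = state;
    if (drc->awaiting_release && state == ISO_ISOLATED) {
        model_detach(drc);
    }
}
```

In this model a device_del on an in-use device only marks the connector
as awaiting release. The device is dropped once the guest isolates it.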

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/Makefile.objs       |   2 +-
 hw/ppc/spapr_drc.c         | 503 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h | 201 ++++++++++++++++++
 3 files changed, 705 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_drc.c
 create mode 100644 include/hw/ppc/spapr_drc.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 19d9920..ea010fd 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_drc.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
new file mode 100644
index 0000000..f81c6d1
--- /dev/null
+++ b/hw/ppc/spapr_drc.c
@@ -0,0 +1,503 @@
+/*
+ * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/ppc/spapr_drc.h"
+#include "qom/object.h"
+#include "hw/qdev.h"
+#include "qapi/visitor.h"
+#include "qemu/error-report.h"
+
+/* #define DEBUG_SPAPR_DRC */
+
+#ifdef DEBUG_SPAPR_DRC
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#define DPRINTFN(fmt, ...) \
+    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#define DPRINTFN(fmt, ...) \
+    do { } while (0)
+#endif
+
+#define DRC_CONTAINER_PATH "/dr-connector"
+#define DRC_INDEX_TYPE_SHIFT 28
+#define DRC_INDEX_ID_MASK (~(~0U << DRC_INDEX_TYPE_SHIFT))
+
+static int set_isolation_state(sPAPRDRConnector *drc,
+                               sPAPRDRIsolationState state)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    DPRINTFN("set_isolation_state: %x", state);
+    drc->isolation_state = state;
+    if (drc->awaiting_release &&
+        drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
+        drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
+                     drc->detach_cb_opaque);
+    }
+    return 0;
+}
+
+static int set_indicator_state(sPAPRDRConnector *drc,
+                               sPAPRDRIndicatorState state)
+{
+    DPRINTFN("set_indicator_state: %x", state);
+    drc->indicator_state = state;
+    return 0;
+}
+
+static int set_allocation_state(sPAPRDRConnector *drc,
+                                sPAPRDRAllocationState state)
+{
+    DPRINTFN("set_allocation_state: %x", state);
+    drc->allocation_state = state;
+    return 0;
+}
+
+static uint32_t get_id(sPAPRDRConnector *drc)
+{
+    /* this value is unique for DRCs of a particular type, but may
+     * overlap with the id's of other DRCs. the value is used both
+     * to calculate a unique (across all DRC types) index, as well
+     * as generating the ibm,drc-names OFDT property that describes
+     * DRCs
+     */
+    return drc->id;
+}
+
+static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
+{
+    uint32_t shift = 0;
+
+    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
+    while (type != (1 << shift)) {
+        shift++;
+    }
+    return shift;
+}
+
+static uint32_t get_index(sPAPRDRConnector *drc)
+{
+    /* no set format for a drc index: it only needs to be globally
+     * unique. this is how we encode the DRC type on bare-metal
+     * however, so might as well do that here
+     */
+    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
+            (drc->id & DRC_INDEX_ID_MASK);
+}
+
+static uint32_t get_type(sPAPRDRConnector *drc)
+{
+    return drc->type;
+}
+
+/*
+ * dr-entity-sense sensor value
+ * returned via get-sensor-state RTAS calls
+ * as expected by state diagram in PAPR+ 2.7, 13.4
+ * based on the current allocation/indicator/power states
+ * for the DR connector.
+ */
+static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
+{
+    if (drc->dev) {
+        /* this assumes all PCI devices are assigned to
+         * a 'live insertion' power domain, where QEMU
+         * manages power state automatically as opposed
+         * to the guest. present, non-PCI resources are
+         * unaffected by power state.
+         */
+        return SPAPR_DR_ENTITY_SENSE_PRESENT;
+    }
+
+    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
+        /* PCI devices, and only PCI devices, use EMPTY
+         * in cases where we'd otherwise use UNUSABLE
+         */
+        return SPAPR_DR_ENTITY_SENSE_EMPTY;
+    }
+    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
+}
+
+static sPAPRDRCCResponse configure_connector_common(sPAPRDRCCState *ccs,
+                            char **prop_name, const struct fdt_property **prop,
+                            int *prop_len)
+{
+    sPAPRDRCCResponse resp = SPAPR_DR_CC_RESPONSE_CONTINUE;
+    int fdt_offset_next;
+
+    *prop_name = NULL;
+    *prop = NULL;
+    *prop_len = 0;
+
+    if (!ccs->fdt) {
+        return SPAPR_DR_CC_RESPONSE_ERROR;
+    }
+
+    while (resp == SPAPR_DR_CC_RESPONSE_CONTINUE) {
+        const char *name_cur;
+        uint32_t tag;
+        int name_cur_len;
+
+        tag = fdt_next_tag(ccs->fdt, ccs->fdt_offset, &fdt_offset_next);
+        switch (tag) {
+        case FDT_BEGIN_NODE:
+            ccs->fdt_depth++;
+            name_cur = fdt_get_name(ccs->fdt, ccs->fdt_offset, &name_cur_len);
+            *prop_name = g_strndup(name_cur, name_cur_len);
+            resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
+            break;
+        case FDT_END_NODE:
+            ccs->fdt_depth--;
+            if (ccs->fdt_depth == 0) {
+                resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
+            } else {
+                resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT;
+            }
+            break;
+        case FDT_PROP:
+            *prop = fdt_get_property_by_offset(ccs->fdt, ccs->fdt_offset,
+                                               prop_len);
+            name_cur = fdt_string(ccs->fdt, fdt32_to_cpu((*prop)->nameoff));
+            *prop_name = g_strdup(name_cur);
+            resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
+            break;
+        case FDT_END:
+            resp = SPAPR_DR_CC_RESPONSE_ERROR;
+            break;
+        default:
+            ccs->fdt_offset = fdt_offset_next;
+        }
+    }
+
+    ccs->fdt_offset = fdt_offset_next;
+    return resp;
+}
+
+static sPAPRDRCCResponse configure_connector(sPAPRDRConnector *drc,
+                                             char **prop_name,
+                                             const struct fdt_property **prop,
+                                             int *prop_len)
+{
+    return configure_connector_common(&drc->ccs, prop_name, prop, prop_len);
+}
+
+static void prop_get_id(Object *obj, Visitor *v, void *opaque,
+                                  const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    uint32_t value = get_id(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_index(Object *obj, Visitor *v, void *opaque,
+                                  const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->get_index(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_type(Object *obj, Visitor *v, void *opaque,
+                          const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->get_type(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_entity_sense(Object *obj, Visitor *v, void *opaque,
+                                  const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->entity_sense(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_fdt(Object *obj, Visitor *v, void *opaque,
+                        const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRCCState ccs = { 0 };
+    sPAPRDRCCResponse resp;
+
+    ccs.fdt = drc->ccs.fdt;
+    ccs.fdt_offset = ccs.fdt_start_offset = drc->ccs.fdt_start_offset;
+
+    do {
+        char *prop_name = NULL;
+        const struct fdt_property *prop = NULL;
+        int prop_len;
+
+        resp = configure_connector_common(&ccs, &prop_name, &prop, &prop_len);
+
+        switch (resp) {
+        case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
+            visit_start_struct(v, NULL, NULL, prop_name, 0, NULL);
+            break;
+        case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
+            visit_end_struct(v, NULL);
+            break;
+        case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY: {
+            int i;
+            visit_start_list(v, prop_name, NULL);
+            for (i = 0; i < prop_len; i++) {
+                visit_type_uint8(v, (uint8_t *)&prop->data[i], NULL, NULL);
+            }
+            visit_end_list(v, NULL);
+            break;
+        }
+        default:
+            resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
+            break;
+        }
+
+        g_free(prop_name);
+    } while (resp != SPAPR_DR_CC_RESPONSE_SUCCESS &&
+             resp != SPAPR_DR_CC_RESPONSE_ERROR);
+}
+
+static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
+                   int fdt_start_offset, bool coldplug)
+{
+    DPRINTFN("attach");
+
+    g_assert(drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED);
+    g_assert(drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE);
+    g_assert(drc->indicator_state == SPAPR_DR_INDICATOR_STATE_INACTIVE);
+    g_assert(fdt || coldplug);
+
+    /* NOTE: setting initial isolation state to UNISOLATED means we can't
+     * detach unless guest has a userspace/kernel that moves this state
+     * back to ISOLATED in response to an unplug event, or this is done
+     * manually by the admin prior. if we force things while the guest
+     * may be accessing the device, we can easily crash the guest, so we
+     * defer completion of removal in such cases to the reset() hook.
+     */
+    drc->isolation_state = SPAPR_DR_ISOLATION_STATE_UNISOLATED;
+    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
+    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_ACTIVE;
+
+    drc->dev = d;
+    drc->ccs.fdt = fdt;
+    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = fdt_start_offset;
+    drc->ccs.fdt_depth = 0;
+
+    object_property_add_link(OBJECT(drc), "device",
+                             object_get_typename(OBJECT(drc->dev)),
+                             (Object **)(&drc->dev),
+                             NULL, 0, NULL);
+}
+
+static void detach(sPAPRDRConnector *drc, DeviceState *d,
+                   spapr_drc_detach_cb *detach_cb,
+                   void *detach_cb_opaque)
+{
+    DPRINTFN("detach");
+
+    drc->detach_cb = detach_cb;
+    drc->detach_cb_opaque = detach_cb_opaque;
+
+    if (drc->isolation_state != SPAPR_DR_ISOLATION_STATE_ISOLATED) {
+        DPRINTFN("awaiting transition to isolated state before removal");
+        drc->awaiting_release = true;
+        return;
+    }
+
+    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_UNUSABLE;
+    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
+
+    if (drc->detach_cb) {
+        drc->detach_cb(drc->dev, drc->detach_cb_opaque);
+    }
+
+    drc->awaiting_release = false;
+    g_free(drc->ccs.fdt);
+    drc->ccs.fdt = NULL;
+    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = drc->ccs.fdt_depth = 0;
+    object_property_del(OBJECT(drc), "device", NULL);
+    drc->dev = NULL;
+    drc->detach_cb = NULL;
+    drc->detach_cb_opaque = NULL;
+}
+
+static void reset(DeviceState *d)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    DPRINTFN("drc reset: %x", drck->get_index(drc));
+    /* immediately upon reset we can safely assume DRCs whose devices are
+     * pending removal can be removed, and that they will subsequently be
+     * left in an ISOLATED state. move the DRC to this state in these cases
+     * (which will in turn complete any pending device removals)
+     */
+    if (drc->awaiting_release) {
+        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_ISOLATED);
+    }
+}
+
+static void realize(DeviceState *d, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    Object *root_container;
+    char link_name[256];
+    gchar *child_name;
+    Error *err = NULL;
+
+    DPRINTFN("drc realize: %x", drck->get_index(drc));
+    /* NOTE: we do this as part of realize/unrealize due to the fact
+     * that the guest will communicate with the DRC via RTAS calls
+     * referencing the global DRC index. By unlinking the DRC
+     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
+     * inaccessible by the guest, since lookups rely on this path
+     * existing in the composition tree
+     */
+    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+    snprintf(link_name, sizeof(link_name), "%x", drck->get_index(drc));
+    child_name = object_get_canonical_path_component(OBJECT(drc));
+    DPRINTFN("drc child name: %s", child_name);
+    object_property_add_alias(root_container, link_name,
+                              drc->owner, child_name, &err);
+    /*
+    object_property_add_link(root_container, name, TYPE_SPAPR_DR_CONNECTOR,
+                             (Object **)&drc, NULL,
+                             OBJ_PROP_LINK_UNREF_ON_RELEASE, &err);
+                             */
+    if (err) {
+        error_report("%s", error_get_pretty(err));
+        error_free(err);
+        object_unref(OBJECT(drc));
+    }
+    DPRINTFN("drc realize complete");
+}
+
+static void unrealize(DeviceState *d, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    Object *root_container;
+    char name[256];
+    Error *err = NULL;
+
+    DPRINTFN("drc unrealize: %x", drck->get_index(drc));
+    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+    snprintf(name, sizeof(name), "%x", drck->get_index(drc));
+    object_property_del(root_container, name, &err);
+    if (err) {
+        error_report("%s", error_get_pretty(err));
+        error_free(err);
+        object_unref(OBJECT(drc));
+    }
+}
+
+sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
+                                         sPAPRDRConnectorType type,
+                                         uint32_t id)
+{
+    sPAPRDRConnector *drc =
+        SPAPR_DR_CONNECTOR(object_new(TYPE_SPAPR_DR_CONNECTOR));
+
+    g_assert(type);
+
+    drc->type = type;
+    drc->id = id;
+    drc->owner = owner;
+    object_property_add_child(owner, "dr-connector[*]", OBJECT(drc), NULL);
+    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
+
+    return drc;
+}
+
+static void spapr_dr_connector_instance_init(Object *obj)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+
+    object_property_add_uint32_ptr(obj, "isolation-state",
+                                   &drc->isolation_state, NULL);
+    object_property_add_uint32_ptr(obj, "indicator-state",
+                                   &drc->indicator_state, NULL);
+    object_property_add_uint32_ptr(obj, "allocation-state",
+                                   &drc->allocation_state, NULL);
+    object_property_add(obj, "id", "uint32", prop_get_id,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "index", "uint32", prop_get_index,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "connector_type", "uint32", prop_get_type,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "entity-sense", "uint32", prop_get_entity_sense,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "fdt", "struct", prop_get_fdt,
+                        NULL, NULL, NULL, NULL);
+}
+
+static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
+{
+    DeviceClass *dk = DEVICE_CLASS(k);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+
+    dk->reset = reset;
+    dk->realize = realize;
+    dk->unrealize = unrealize;
+    drck->set_isolation_state = set_isolation_state;
+    drck->set_indicator_state = set_indicator_state;
+    drck->set_allocation_state = set_allocation_state;
+    drck->get_index = get_index;
+    drck->get_type = get_type;
+    drck->entity_sense = entity_sense;
+    drck->configure_connector = configure_connector;
+    drck->attach = attach;
+    drck->detach = detach;
+}
+
+static const TypeInfo spapr_dr_connector_info = {
+    .name          = TYPE_SPAPR_DR_CONNECTOR,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(sPAPRDRConnector),
+    .instance_init = spapr_dr_connector_instance_init,
+    .class_size    = sizeof(sPAPRDRConnectorClass),
+    .class_init    = spapr_dr_connector_class_init,
+};
+
+static void spapr_drc_register_types(void)
+{
+    type_register_static(&spapr_dr_connector_info);
+}
+
+type_init(spapr_drc_register_types)
+
+/* helper functions for external users */
+
+sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index)
+{
+    Object *obj;
+    char name[256];
+
+    snprintf(name, sizeof(name), "%s/%x", DRC_CONTAINER_PATH, index);
+    obj = object_resolve_path(name, NULL);
+
+    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
+}
+
+sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
+                                           uint32_t id)
+{
+    return spapr_dr_connector_by_index(
+            (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
+            (id & DRC_INDEX_ID_MASK));
+}
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
new file mode 100644
index 0000000..63ec687
--- /dev/null
+++ b/include/hw/ppc/spapr_drc.h
@@ -0,0 +1,201 @@
+/*
+ * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#if !defined(__HW_SPAPR_DRC_H__)
+#define __HW_SPAPR_DRC_H__
+
+#include "qom/object.h"
+#include "hw/qdev.h"
+#include "libfdt.h"
+
+#define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
+#define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DR_CONNECTOR)
+#define SPAPR_DR_CONNECTOR_CLASS(klass) \
+        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, \
+                           TYPE_SPAPR_DR_CONNECTOR)
+#define SPAPR_DR_CONNECTOR(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
+                                             TYPE_SPAPR_DR_CONNECTOR)
+
+/*
+ * Various hotplug types managed by sPAPRDRConnector
+ *
+ * these are somewhat arbitrary, but to make things easier
+ * when generating DRC indexes later we've aligned the bit
+ * positions with the values used to assign DRC indexes on
+ * pSeries. we use those values as bit shifts to allow for
+ * the OR'ing of these values in various QEMU routines, but
+ * for values exposed to the guest (via DRC indexes for
+ * instance) we will use the shift amounts.
+ */
+typedef enum {
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU = 1,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB = 2,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
+} sPAPRDRConnectorTypeShift;
+
+typedef enum {
+    SPAPR_DR_CONNECTOR_TYPE_ANY = ~0,
+    SPAPR_DR_CONNECTOR_TYPE_CPU = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU,
+    SPAPR_DR_CONNECTOR_TYPE_PHB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB,
+    SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
+    SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
+    SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
+} sPAPRDRConnectorType;
+
+/*
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177
+ *
+ * isolated: put device under firmware control
+ * unisolated: claim OS control of device (may or may not be in use)
+ */
+typedef enum {
+    SPAPR_DR_ISOLATION_STATE_ISOLATED   = 0,
+    SPAPR_DR_ISOLATION_STATE_UNISOLATED = 1
+} sPAPRDRIsolationState;
+
+/*
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177
+ *
+ * unusable: mark device as unavailable to OS
+ * usable: mark device as available to OS
+ * exchange: (currently unused)
+ * recover: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_ALLOCATION_STATE_UNUSABLE  = 0,
+    SPAPR_DR_ALLOCATION_STATE_USABLE    = 1,
+    SPAPR_DR_ALLOCATION_STATE_EXCHANGE  = 2,
+    SPAPR_DR_ALLOCATION_STATE_RECOVER   = 3
+} sPAPRDRAllocationState;
+
+/*
+ * LED/visual indicator state
+ *
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177,
+ * and PAPR+ 2.7 13.5.4.1, Table 180
+ *
+ * inactive: hotpluggable entity inactive and safely removable
+ * active: hotpluggable entity in use and not safely removable
+ * identify: (currently unused)
+ * action: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_INDICATOR_STATE_INACTIVE   = 0,
+    SPAPR_DR_INDICATOR_STATE_ACTIVE     = 1,
+    SPAPR_DR_INDICATOR_STATE_IDENTIFY   = 2,
+    SPAPR_DR_INDICATOR_STATE_ACTION     = 3,
+} sPAPRDRIndicatorState;
+
+/*
+ * returned via get-sensor-state RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.3, Table 175:
+ *
+ * empty: connector slot empty (e.g. empty hotpluggable PCI slot)
+ * present: connector slot populated and device available to OS
+ * unusable: device not currently available to OS
+ * exchange: (currently unused)
+ * recover: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_ENTITY_SENSE_EMPTY     = 0,
+    SPAPR_DR_ENTITY_SENSE_PRESENT   = 1,
+    SPAPR_DR_ENTITY_SENSE_UNUSABLE  = 2,
+    SPAPR_DR_ENTITY_SENSE_EXCHANGE  = 3,
+    SPAPR_DR_ENTITY_SENSE_RECOVER   = 4,
+} sPAPRDREntitySense;
+
+typedef enum {
+    SPAPR_DR_CC_RESPONSE_NEXT_SIB       = 1, /* currently unused */
+    SPAPR_DR_CC_RESPONSE_NEXT_CHILD     = 2,
+    SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY  = 3,
+    SPAPR_DR_CC_RESPONSE_PREV_PARENT    = 4,
+    SPAPR_DR_CC_RESPONSE_SUCCESS        = 0,
+    SPAPR_DR_CC_RESPONSE_ERROR          = -1,
+    SPAPR_DR_CC_RESPONSE_CONTINUE       = -2,
+} sPAPRDRCCResponse;
+
+typedef struct sPAPRDRCCState {
+    void *fdt;
+    int fdt_start_offset;
+    int fdt_offset;
+    int fdt_depth;
+} sPAPRDRCCState;
+
+typedef void (spapr_drc_detach_cb)(DeviceState *d, void *opaque);
+
+typedef struct sPAPRDRConnector {
+    /*< private >*/
+    DeviceState parent;
+
+    sPAPRDRConnectorType type;
+    uint32_t id;
+    Object *owner;
+
+    /* sensor/indicator states */
+    uint32_t isolation_state;
+    uint32_t allocation_state;
+    uint32_t indicator_state;
+
+    /* configure-connector state */
+    sPAPRDRCCState ccs;
+
+    bool awaiting_release;
+
+    /* device pointer, via link property */
+    DeviceState *dev;
+    spapr_drc_detach_cb *detach_cb;
+    void *detach_cb_opaque;
+} sPAPRDRConnector;
+
+typedef struct sPAPRDRConnectorClass {
+    /*< private >*/
+    DeviceClass parent;
+
+    /*< public >*/
+
+    /* accessors for guest-visible (generally via RTAS) DR state */
+    int (*set_isolation_state)(sPAPRDRConnector *drc,
+                               sPAPRDRIsolationState state);
+    int (*set_indicator_state)(sPAPRDRConnector *drc,
+                               sPAPRDRIndicatorState state);
+    int (*set_allocation_state)(sPAPRDRConnector *drc,
+                                sPAPRDRAllocationState state);
+    uint32_t (*get_index)(sPAPRDRConnector *drc);
+    uint32_t (*get_type)(sPAPRDRConnector *drc);
+
+    sPAPRDREntitySense (*entity_sense)(sPAPRDRConnector *drc);
+    sPAPRDRCCResponse (*configure_connector)(sPAPRDRConnector *drc,
+                                             char **prop_name,
+                                             const struct fdt_property **prop,
+                                             int *prop_len);
+
+    /* QEMU interfaces for managing hotplug operations */
+    void (*attach)(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
+                   int fdt_start_offset, bool coldplug);
+    void (*detach)(sPAPRDRConnector *drc, DeviceState *d,
+                   spapr_drc_detach_cb *detach_cb,
+                   void *detach_cb_opaque);
+} sPAPRDRConnectorClass;
+
+sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
+                                         sPAPRDRConnectorType type,
+                                         uint32_t id);
+sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
+sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
+                                           uint32_t id);
+
+#endif /* __HW_SPAPR_DRC_H__ */
-- 
1.9.1
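
The attach/detach/reset interplay in this patch can be sketched outside QEMU as a small state machine. This is a minimal illustration, not the patch's code: the ToyDRC type and every toy_* function are hypothetical stand-ins, and it assumes (as the comment in reset() implies) that moving a connector to the ISOLATED state completes a pending detach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the patch's types; not QEMU's definitions. */
typedef enum { ISOLATED = 0, UNISOLATED = 1 } IsolationState;

typedef struct ToyDRC {
    IsolationState isolation_state;
    bool awaiting_release;
    void *dev;          /* attached device, NULL when the slot is empty */
    int release_count;  /* number of completed removals */
} ToyDRC;

/* Attach hands the device to the guest, so the connector becomes UNISOLATED. */
static void toy_attach(ToyDRC *drc, void *dev)
{
    assert(drc->isolation_state == ISOLATED);
    assert(drc->dev == NULL);
    drc->isolation_state = UNISOLATED;
    drc->dev = dev;
}

/* Detach completes only if the guest has already isolated the connector;
 * otherwise it records that a release is pending and returns.
 */
static void toy_detach(ToyDRC *drc)
{
    if (drc->isolation_state != ISOLATED) {
        drc->awaiting_release = true;
        return;
    }
    drc->awaiting_release = false;
    drc->dev = NULL;
    drc->release_count++;
}

/* Guest-driven transition. Assumption: reaching ISOLATED finishes a
 * pending detach.
 */
static void toy_set_isolation_state(ToyDRC *drc, IsolationState state)
{
    drc->isolation_state = state;
    if (state == ISOLATED && drc->awaiting_release) {
        toy_detach(drc);
    }
}

/* Reset forces pending removals to complete, mirroring the reset() hook. */
static void toy_reset(ToyDRC *drc)
{
    if (drc->awaiting_release) {
        toy_set_isolation_state(drc, ISOLATED);
    }
}
```

Calling toy_detach() while the connector is still UNISOLATED leaves the device in place with awaiting_release set; a later toy_reset() (or a guest-initiated isolate) is what actually releases it.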

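For reference, the index composition done by spapr_dr_connector_by_id() can be sketched as below. The values chosen for TOY_INDEX_TYPE_SHIFT and TOY_INDEX_ID_MASK are assumptions, since DRC_INDEX_TYPE_SHIFT and DRC_INDEX_ID_MASK are defined outside the lines quoted here. toy_get_type_shift() is likewise only a guess at what get_type_shift() does, based on the one-hot layout of sPAPRDRConnectorType.

```c
#include <stdint.h>

/* Assumed layout: type shift in the top bits, connector id in the rest.
 * The real DRC_INDEX_* constants are not visible in this excerpt.
 */
#define TOY_INDEX_TYPE_SHIFT 28
#define TOY_INDEX_ID_MASK    ((1u << TOY_INDEX_TYPE_SHIFT) - 1)

/* Shift amounts as listed in sPAPRDRConnectorTypeShift. */
enum {
    TOY_TYPE_SHIFT_CPU = 1,
    TOY_TYPE_SHIFT_PHB = 2,
    TOY_TYPE_SHIFT_VIO = 3,
    TOY_TYPE_SHIFT_PCI = 4,
    TOY_TYPE_SHIFT_LMB = 8,
};

#define TOY_TYPE_PCI (1u << TOY_TYPE_SHIFT_PCI)
#define TOY_TYPE_LMB (1u << TOY_TYPE_SHIFT_LMB)

/* Recover the shift amount from a one-hot type value. */
static uint32_t toy_get_type_shift(uint32_t type)
{
    uint32_t shift = 0;

    while (type > 1) {
        type >>= 1;
        shift++;
    }
    return shift;
}

/* Combine the guest-visible type (the shift amount) with the connector id. */
static uint32_t toy_drc_index(uint32_t type, uint32_t id)
{
    return (toy_get_type_shift(type) << TOY_INDEX_TYPE_SHIFT) |
           (id & TOY_INDEX_ID_MASK);
}
```

Under these assumed constants, a PCI connector with id 5 maps to index 0x40000005, and id bits above the mask are discarded rather than leaking into the type field.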

* [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-16  6:21   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface Michael Roth
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 2ec2a8e..a2fb533 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
     rtas_st(rets, 0, ret);
 }
 
+static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args, uint32_t nret,
+                                 target_ulong rets)
+{
+    /* we currently only use a single, "live insert" powerdomain for
+     * hotplugged/dlpar'd resources, so the power is always live/full (100)
+     */
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, 100);
+}
+
+static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args, uint32_t nret,
+                                 target_ulong rets)
+{
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, 100);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
                         rtas_ibm_set_system_parameter);
     spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
                         rtas_ibm_os_term);
+    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
+                        rtas_set_power_level);
+    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
+                        rtas_get_power_level);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1


* [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (2 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-16  6:25   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state " Michael Roth
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Mike Day <ncmike@ncultra.org>

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index a2fb533..6aa325f 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -35,6 +35,18 @@
 #include "qapi-event.h"
 
 #include <libfdt.h>
+#include "hw/ppc/spapr_drc.h"
+
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -311,6 +323,72 @@ static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, 100);
 }
 
+/*
+ * indicator/sensor types
+ * as defined by PAPR+ 2.7 7.3.5.4, Table 41
+ *
+ * NOTE: currently only DR-related sensors are implemented here
+ */
+#define RTAS_SENSOR_TYPE_ISOLATION_STATE 9001
+#define RTAS_SENSOR_TYPE_DR 9002
+#define RTAS_SENSOR_TYPE_ALLOCATION_STATE 9003
+#define RTAS_SENSOR_TYPE_ENTITY_SENSE RTAS_SENSOR_TYPE_ALLOCATION_STATE
+
+static bool sensor_type_is_dr(uint32_t sensor_type)
+{
+    switch (sensor_type) {
+    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
+    case RTAS_SENSOR_TYPE_DR:
+    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
+        return true;
+    }
+
+    return false;
+}
+
+static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                               uint32_t token, uint32_t nargs,
+                               target_ulong args, uint32_t nret,
+                               target_ulong rets)
+{
+    uint32_t sensor_type = rtas_ld(args, 0);
+    uint32_t sensor_index = rtas_ld(args, 1);
+    uint32_t sensor_state = rtas_ld(args, 2);
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+
+    if (sensor_type_is_dr(sensor_type)) {
+        /* if this is a DR sensor we can assume sensor_index == drc_index */
+        drc = spapr_dr_connector_by_index(sensor_index);
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    } else {
+        /* currently only DR-related sensors are implemented */
+        goto out_unimplemented;
+    }
+
+    switch (sensor_type) {
+    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
+        drck->set_isolation_state(drc, sensor_state);
+        break;
+    case RTAS_SENSOR_TYPE_DR:
+        drck->set_indicator_state(drc, sensor_state);
+        break;
+    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
+        drck->set_allocation_state(drc, sensor_state);
+        break;
+    default:
+        goto out_unimplemented;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+out_unimplemented:
+    DPRINTF("rtas_set_indicator: sensor/indicator not implemented: %d\n",
+            sensor_type);
+    rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -444,6 +522,8 @@ static void core_rtas_register_types(void)
                         rtas_set_power_level);
     spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
                         rtas_get_power_level);
+    spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
+                        rtas_set_indicator);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1
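
One thing worth noting about the dispatch above: rtas_set_indicator() passes the result of spapr_dr_connector_by_index() straight to SPAPR_DR_CONNECTOR_GET_CLASS(), while rtas_get_sensor_state() in the next patch checks for a failed lookup first. A minimal sketch of the dispatch with that check added might look like the following. Everything named toy_* or TOY_* is hypothetical, the status codes are illustrative rather than the RTAS_OUT_* values, and a static array stands in for the QOM composition tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative status codes, not the RTAS_OUT_* values. */
enum { TOY_OK = 0, TOY_PARAM_ERROR = 1, TOY_NOT_SUPPORTED = 2 };

/* Sensor type numbers as used by the patch. */
enum {
    TOY_SENSOR_ISOLATION_STATE  = 9001,
    TOY_SENSOR_DR               = 9002,
    TOY_SENSOR_ALLOCATION_STATE = 9003,
};

typedef struct ToyConnector {
    uint32_t index;
    uint32_t isolation_state;
    uint32_t indicator_state;
    uint32_t allocation_state;
} ToyConnector;

/* Stand-in for the composition tree that the real lookup resolves against. */
static ToyConnector toy_connectors[] = {
    { 0x40000000u, 0, 0, 0 },
    { 0x40000001u, 0, 0, 0 },
};

static ToyConnector *toy_connector_by_index(uint32_t index)
{
    size_t i;

    for (i = 0; i < sizeof(toy_connectors) / sizeof(toy_connectors[0]); i++) {
        if (toy_connectors[i].index == index) {
            return &toy_connectors[i];
        }
    }
    return NULL;
}

static int toy_set_indicator(uint32_t type, uint32_t index, uint32_t state)
{
    ToyConnector *c;

    /* Only DR-related indicators are handled. */
    if (type != TOY_SENSOR_ISOLATION_STATE && type != TOY_SENSOR_DR &&
        type != TOY_SENSOR_ALLOCATION_STATE) {
        return TOY_NOT_SUPPORTED;
    }

    /* An unknown index is the caller's error; never dereference NULL. */
    c = toy_connector_by_index(index);
    if (!c) {
        return TOY_PARAM_ERROR;
    }

    switch (type) {
    case TOY_SENSOR_ISOLATION_STATE:
        c->isolation_state = state;
        break;
    case TOY_SENSOR_DR:
        c->indicator_state = state;
        break;
    default:
        c->allocation_state = state;
        break;
    }
    return TOY_OK;
}
```

Separating "indicator type not handled" from "no connector at that index" also keeps the two failure modes distinguishable to the caller.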


* [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state RTAS interface
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (3 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-16  6:28   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper Michael Roth
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Mike Day <ncmike@ncultra.org>

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 6aa325f..13e6e55 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -389,6 +389,39 @@ out_unimplemented:
     rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
 }
 
+static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args, uint32_t nret,
+                                  target_ulong rets)
+{
+    uint32_t sensor_type = rtas_ld(args, 0);
+    uint32_t sensor_index = rtas_ld(args, 1);
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    uint32_t entity_sense;
+
+    if (sensor_type != RTAS_SENSOR_TYPE_ENTITY_SENSE) {
+        /* currently only DR-related sensors are implemented */
+        DPRINTF("rtas_get_sensor_state: sensor/indicator not implemented: %d\n",
+                sensor_type);
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    drc = spapr_dr_connector_by_index(sensor_index);
+    if (!drc) {
+        DPRINTF("rtas_get_sensor_state: invalid sensor/DRC index: %xh\n",
+                sensor_index);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    entity_sense = drck->entity_sense(drc);
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, entity_sense);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -524,6 +557,8 @@ static void core_rtas_register_types(void)
                         rtas_get_power_level);
     spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
                         rtas_set_indicator);
+    spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
+                        rtas_get_sensor_state);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1


* [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (4 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state " Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  3:25   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm,configure-connector RTAS interface Michael Roth
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This is similar to the existing rtas_st_buffer(), but for the case where
the guest is not expecting a length-encoded byte array, namely
calls where a "work area" buffer is used to pass around
arbitrary fields/data.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 include/hw/ppc/spapr.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 716bff4..b4daa42 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -413,6 +413,13 @@ static inline void rtas_st(target_ulong phys, int n, uint32_t val)
     stl_be_phys(&address_space_memory, ppc64_phys_to_real(phys + 4*n), val);
 }
 
+static inline void rtas_st_buffer_direct(target_ulong phys,
+                                         target_ulong phys_len,
+                                         uint8_t *buffer, uint16_t buffer_len)
+{
+    cpu_physical_memory_write(ppc64_phys_to_real(phys), buffer,
+                              MIN(buffer_len, phys_len));
+}
 
 static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
                                   uint8_t *buffer, uint16_t buffer_len)
@@ -422,8 +429,7 @@ static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
     }
     stw_be_phys(&address_space_memory,
                 ppc64_phys_to_real(phys), buffer_len);
-    cpu_physical_memory_write(ppc64_phys_to_real(phys + 2),
-                              buffer, MIN(buffer_len, phys_len - 2));
+    rtas_st_buffer_direct(phys + 2, phys_len - 2, buffer, buffer_len);
 }
 
 typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
-- 
1.9.1
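
The difference between the two helpers can be illustrated with plain byte arrays in place of guest physical memory. This is only a sketch under that simplification: the toy_* functions are hypothetical, they return the number of bytes written for easy checking (the real helpers return void), and ordinary pointers replace the ppc64_phys_to_real() translation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t toy_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* "Direct" variant: copy raw bytes, truncated to the destination size. */
static size_t toy_st_buffer_direct(uint8_t *dst, size_t dst_len,
                                   const uint8_t *src, uint16_t src_len)
{
    size_t n = toy_min(src_len, dst_len);

    memcpy(dst, src, n);
    return n;
}

/* Length-encoded variant: a 2-byte big-endian length, then the payload. */
static size_t toy_st_buffer(uint8_t *dst, size_t dst_len,
                            const uint8_t *src, uint16_t src_len)
{
    if (dst_len < 2) {
        return 0;
    }
    dst[0] = (uint8_t)(src_len >> 8);
    dst[1] = (uint8_t)(src_len & 0xff);
    return 2 + toy_st_buffer_direct(dst + 2, dst_len - 2, src, src_len);
}
```

As in the patch, the length-encoded variant is just the direct variant applied two bytes further into the buffer, which is why the refactoring removes the duplicated copy.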


* [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm,configure-connector RTAS interface
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (5 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  3:44   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 13e6e55..d847f45 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -422,6 +422,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, entity_sense);
 }
 
+/* configure-connector work area offsets, int32_t units for field
+ * indexes, bytes for field offset/len values.
+ *
+ * as documented by PAPR+ v2.7, 13.5.3.5
+ */
+#define CC_IDX_NODE_NAME_OFFSET 2
+#define CC_IDX_PROP_NAME_OFFSET 2
+#define CC_IDX_PROP_LEN 3
+#define CC_IDX_PROP_DATA_OFFSET 4
+#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
+#define CC_WA_LEN 4096
+
+static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
+                                         sPAPREnvironment *spapr,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args, uint32_t nret,
+                                         target_ulong rets)
+{
+    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
+    uint64_t wa_offset;
+    uint32_t drc_index;
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    sPAPRDRCCResponse resp;
+    const struct fdt_property *prop = NULL;
+    char *prop_name = NULL;
+    int prop_len, rc;
+
+    drc_index = rtas_ld(wa_addr, 0);
+    drc = spapr_dr_connector_by_index(drc_index);
+    if (!drc) {
+        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
+                drc_index);
+        rc = RTAS_OUT_PARAM_ERROR;
+        goto out;
+    }
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
+
+    switch (resp) {
+    case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
+        /* provide the name of the next OF node */
+        wa_offset = CC_VAL_DATA_OFFSET;
+        rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)prop_name, strlen(prop_name) + 1);
+        break;
+    case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY:
+        /* provide the name of the next OF property */
+        wa_offset = CC_VAL_DATA_OFFSET;
+        rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)prop_name, strlen(prop_name) + 1);
+
+        /* provide the length and value of the OF property. data gets placed
+         * immediately after NULL terminator of the OF property's name string
+         */
+        wa_offset += strlen(prop_name) + 1;
+        rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
+        rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)((struct fdt_property *)prop)->data,
+                              prop_len);
+        break;
+    case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
+    case SPAPR_DR_CC_RESPONSE_ERROR:
+    case SPAPR_DR_CC_RESPONSE_SUCCESS:
+        break;
+    default:
+        /* drck->configure_connector() should not return anything else */
+        g_assert(false);
+    }
+
+    rc = resp;
+out:
+    g_free(prop_name);
+    rtas_st(rets, 0, rc);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -559,6 +638,8 @@ static void core_rtas_register_types(void)
                         rtas_set_indicator);
     spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
                         rtas_get_sensor_state);
+    spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
+                        rtas_ibm_configure_connector);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1
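
For illustration only (not part of the patch): a minimal standalone sketch
of how a NEXT_PROPERTY response ends up laid out in the work area, using
the CC_* offsets defined above. wa_st32() and cc_pack_property() are
hypothetical names that model rtas_st() and the two
rtas_st_buffer_direct() calls on a plain host buffer; the big-endian cell
stores and the explicit bounds checks are assumptions of this sketch, not
something the patch itself guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets copied from the patch: field indexes are in int32_t units,
 * offset/len values are in bytes.
 */
#define CC_IDX_PROP_NAME_OFFSET 2
#define CC_IDX_PROP_LEN 3
#define CC_IDX_PROP_DATA_OFFSET 4
#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
#define CC_WA_LEN 4096

/* Hypothetical stand-in for rtas_st(): store one big-endian 32-bit cell. */
static void wa_st32(uint8_t *wa, size_t idx, uint32_t val)
{
    wa[idx * 4 + 0] = (uint8_t)(val >> 24);
    wa[idx * 4 + 1] = (uint8_t)(val >> 16);
    wa[idx * 4 + 2] = (uint8_t)(val >> 8);
    wa[idx * 4 + 3] = (uint8_t)val;
}

/* Hypothetical helper: lay out one property in a CC_WA_LEN-byte work area.
 * Returns 0 on success, -1 if the name or value would not fit.
 */
static int cc_pack_property(uint8_t *wa, const char *name,
                            const void *data, size_t len)
{
    size_t name_off = CC_VAL_DATA_OFFSET;
    size_t name_len, data_off;

    assert(wa != NULL && name != NULL);

    name_len = strlen(name) + 1; /* include the NUL terminator */
    if (name_len > CC_WA_LEN - name_off) {
        return -1;
    }
    /* value is placed immediately after the name's NUL terminator */
    data_off = name_off + name_len;
    if (len > CC_WA_LEN - data_off) {
        return -1;
    }

    wa_st32(wa, CC_IDX_PROP_NAME_OFFSET, (uint32_t)name_off);
    memcpy(wa + name_off, name, name_len);
    wa_st32(wa, CC_IDX_PROP_LEN, (uint32_t)len);
    wa_st32(wa, CC_IDX_PROP_DATA_OFFSET, (uint32_t)data_off);
    if (len) {
        memcpy(wa + data_off, data, len);
    }
    return 0;
}
```

With a 4-byte "reg" value, the name lands at byte offset 20 and the value
at byte offset 24, and cells 2, 3 and 4 hold 20, 4 and 24 respectively.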

* [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (6 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm,configure-connector RTAS interface Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  4:31   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface Michael Roth
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interface to also include event types
for hotplug/unplug events.

The event format is currently undocumented and is being finalized for
inclusion in the PAPR specification, but we implement it here as an
extension for guest userspace tools to handle (existing guest kernels
simply log these events via a sysfs interface that's read by rtas_errd).
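
For illustration only: a minimal standalone sketch of the bytes the new
"HP" section would carry for a PCI add/remove identified by DRC index. The
field order after the section header follows struct rtas_event_log_v6_hp
below; the 8-byte section header layout (id, length, version, subtype,
creator component id) is an assumption of this sketch, as are the helper
names put_be16(), put_be32() and hp_section_pack_pci().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HP_SECTION_ID     0x4850 /* "HP" */
#define HP_SECTION_LEN    16     /* assumed 8-byte header + 8 bytes payload */
#define HP_TYPE_PCI       5
#define HP_ACTION_ADD     1
#define HP_ACTION_REMOVE  2
#define HP_ID_DRC_INDEX   2

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: serialize the hotplug section for a PCI DRC.
 * buf must hold at least HP_SECTION_LEN bytes; returns bytes written.
 */
static size_t hp_section_pack_pci(uint8_t *buf, uint8_t action,
                                  uint32_t drc_index)
{
    assert(buf != NULL);
    assert(action == HP_ACTION_ADD || action == HP_ACTION_REMOVE);

    put_be16(buf + 0, HP_SECTION_ID);   /* section_id */
    put_be16(buf + 2, HP_SECTION_LEN);  /* section_length */
    buf[4] = 1;                         /* section_version, as in the patch */
    buf[5] = 0;                         /* section_subtype (assumed unused) */
    put_be16(buf + 6, 0);               /* creator component id (assumed) */
    buf[8] = HP_TYPE_PCI;               /* hotplug_type */
    buf[9] = action;                    /* hotplug_action */
    buf[10] = HP_ID_DRC_INDEX;          /* hotplug_identifier */
    buf[11] = 0;                        /* reserved */
    put_be32(buf + 12, drc_index);      /* drc.index, big-endian */
    return HP_SECTION_LEN;
}
```

A guest-side tool would read hotplug_identifier first to decide whether
the trailing union holds an index, a count or a name.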

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   2 +-
 hw/ppc/spapr_events.c  | 211 ++++++++++++++++++++++++++++++++++++++++---------
 include/hw/ppc/spapr.h |   5 +-
 3 files changed, 177 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 361b914..1bc5773 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1601,7 +1601,7 @@ static void ppc_spapr_init(MachineState *machine)
     spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
                                             kernel_size, kernel_le,
                                             boot_device, kernel_cmdline,
-                                            spapr->epow_irq);
+                                            spapr->check_exception_irq);
     assert(spapr->fdt_skel != NULL);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 1b6157d..ebbf3a4 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -32,6 +32,9 @@
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
+#include "hw/pci/pci.h"
+#include "hw/pci-host/spapr.h"
+#include "hw/ppc/spapr_drc.h"
 
 #include <libfdt.h>
 
@@ -77,6 +80,7 @@ struct rtas_error_log {
 #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
 #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
 #define   RTAS_LOG_TYPE_EPOW                    0x00000040
+#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
     uint32_t extended_length;
 } QEMU_PACKED;
 
@@ -166,6 +170,38 @@ struct epow_log_full {
     struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_hp {
+#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct rtas_event_log_v6_section_header hdr;
+    uint8_t hotplug_type;
+#define RTAS_LOG_V6_HP_TYPE_CPU                          1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
+#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
+#define RTAS_LOG_V6_HP_TYPE_PHB                          4
+#define RTAS_LOG_V6_HP_TYPE_PCI                          5
+    uint8_t hotplug_action;
+#define RTAS_LOG_V6_HP_ACTION_ADD                        1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
+    uint8_t hotplug_identifier;
+#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
+    uint8_t reserved;
+    union {
+        uint32_t index;
+        uint32_t count;
+        char name[1];
+    } drc;
+} QEMU_PACKED;
+
+struct hp_log_full {
+    struct rtas_error_log hdr;
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_maina maina;
+    struct rtas_event_log_v6_mainb mainb;
+    struct rtas_event_log_v6_hp hp;
+} QEMU_PACKED;
+
 #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
 #define EVENT_MASK_EPOW                      0x40000000
 #define EVENT_MASK_HOTPLUG                   0x10000000
@@ -181,29 +217,61 @@ struct epow_log_full {
         }                                                          \
     } while (0)
 
-void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
+void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
 {
-    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
-    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
+    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
+    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
 
     _FDT((fdt_begin_node(fdt, "event-sources")));
 
     _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
     _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
     _FDT((fdt_property(fdt, "interrupt-ranges",
-                       epow_irq_ranges, sizeof(epow_irq_ranges))));
+                       irq_ranges, sizeof(irq_ranges))));
 
     _FDT((fdt_begin_node(fdt, "epow-events")));
-    _FDT((fdt_property(fdt, "interrupts",
-                       epow_interrupts, sizeof(epow_interrupts))));
+    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
     _FDT((fdt_end_node(fdt)));
 
     _FDT((fdt_end_node(fdt)));
 }
 
 static struct epow_log_full *pending_epow;
+static struct hp_log_full *pending_hp;
 static uint32_t next_plid;
 
+static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
+{
+    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
+        | RTAS_LOG_V6_B0_BIGENDIAN;
+    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
+        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
+    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
+}
+
+static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
+                             int section_count)
+{
+    struct tm tm;
+    int year;
+
+    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
+    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
+    /* FIXME: section version, subtype and creator id? */
+    qemu_get_timedate(&tm, spapr->rtc_offset);
+    year = tm.tm_year + 1900;
+    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
+                                       | (to_bcd(year % 100) << 16)
+                                       | (to_bcd(tm.tm_mon + 1) << 8)
+                                       | to_bcd(tm.tm_mday));
+    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
+                                       | (to_bcd(tm.tm_min) << 16)
+                                       | (to_bcd(tm.tm_sec) << 8));
+    maina->creator_id = 'H'; /* Hypervisor */
+    maina->section_count = section_count;
+    maina->plid = next_plid++;
+}
+
 static void spapr_powerdown_req(Notifier *n, void *opaque)
 {
     sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
@@ -212,8 +280,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     struct rtas_event_log_v6_maina *maina;
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_epow *epow;
-    struct tm tm;
-    int year;
 
     if (pending_epow) {
         /* For now, we just throw away earlier events if two come
@@ -237,27 +303,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
                                        - sizeof(pending_epow->hdr));
 
-    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
-        | RTAS_LOG_V6_B0_BIGENDIAN;
-    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
-        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
-    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
-
-    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
-    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
-    /* FIXME: section version, subtype and creator id? */
-    qemu_get_timedate(&tm, spapr->rtc_offset);
-    year = tm.tm_year + 1900;
-    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
-                                       | (to_bcd(year % 100) << 16)
-                                       | (to_bcd(tm.tm_mon + 1) << 8)
-                                       | to_bcd(tm.tm_mday));
-    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
-                                       | (to_bcd(tm.tm_min) << 16)
-                                       | (to_bcd(tm.tm_sec) << 8));
-    maina->creator_id = 'H'; /* Hypervisor */
-    maina->section_count = 3; /* Main-A, Main-B and EPOW */
-    maina->plid = next_plid++;
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
 
     mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
     mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
@@ -274,7 +321,82 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
+{
+    struct hp_log_full *new_hp;
+    struct rtas_error_log *hdr;
+    struct rtas_event_log_v6 *v6hdr;
+    struct rtas_event_log_v6_maina *maina;
+    struct rtas_event_log_v6_mainb *mainb;
+    struct rtas_event_log_v6_hp *hp;
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    sPAPRDRConnectorType drc_type = drck->get_type(drc);
+
+    new_hp = g_malloc0(sizeof(struct hp_log_full));
+    hdr = &new_hp->hdr;
+    v6hdr = &new_hp->v6hdr;
+    maina = &new_hp->maina;
+    mainb = &new_hp->mainb;
+    hp = &new_hp->hp;
+
+    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
+                               | RTAS_LOG_SEVERITY_EVENT
+                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
+                               | RTAS_LOG_OPTIONAL_PART_PRESENT
+                               | RTAS_LOG_INITIATOR_HOTPLUG
+                               | RTAS_LOG_TYPE_HOTPLUG);
+    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
+                                       - sizeof(new_hp->hdr));
+
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
+
+    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
+    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
+    mainb->subsystem_id = 0x80; /* External environment */
+    mainb->event_severity = 0x00; /* Informational / non-error */
+    mainb->event_subtype = 0x00; /* Normal shutdown */
+
+    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
+    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
+    hp->hdr.section_version = 1; /* includes extended modifier */
+    hp->hotplug_action = hp_action;
+
+
+    switch (drc_type) {
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        hp->drc.index = cpu_to_be32(drck->get_index(drc));
+        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
+        break;
+    default:
+        /* skip notification for unknown connector types */
+        g_free(new_hp);
+        return;
+    }
+
+    if (pending_hp) {
+        /* Just toss any pending hotplug events for now, this will
+         * need to be fixed later on.
+         */
+        g_free(pending_hp);
+    }
+    pending_hp = new_hp;
+
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
+{
+    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
+}
+
+void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
+{
+    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
 }
 
 static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
@@ -298,15 +420,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
     }
 
-    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
-        if (sizeof(*pending_epow) < len) {
-            len = sizeof(*pending_epow);
-        }
+    if (mask & EVENT_MASK_EPOW) {
+        if (pending_epow) {
+            if (sizeof(*pending_epow) < len) {
+                len = sizeof(*pending_epow);
+            }
 
-        cpu_physical_memory_write(buf, pending_epow, len);
-        g_free(pending_epow);
-        pending_epow = NULL;
-        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+            cpu_physical_memory_write(buf, pending_epow, len);
+            g_free(pending_epow);
+            pending_epow = NULL;
+            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        } else if (pending_hp) {
+            if (sizeof(*pending_hp) < len) {
+                len = sizeof(*pending_hp);
+            }
+
+            cpu_physical_memory_write(buf, pending_hp, len);
+            g_free(pending_hp);
+            pending_hp = NULL;
+            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        }
     } else {
         rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
     }
@@ -314,7 +447,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
 void spapr_events_init(sPAPREnvironment *spapr)
 {
-    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
+    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b4daa42..4d50e74 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -3,6 +3,7 @@
 
 #include "sysemu/dma.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/spapr_drc.h"
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
@@ -30,7 +31,7 @@ typedef struct sPAPREnvironment {
     struct PPCTimebase tb;
     bool has_graphics;
 
-    uint32_t epow_irq;
+    uint32_t check_exception_irq;
     Notifier epow_notifier;
 
     /* Migration state */
@@ -486,5 +487,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
                       sPAPRTCETable *tcet);
+void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
+void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
 
 #endif /* !defined (__HW_SPAPR_H__) */
-- 
1.9.1

* [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (7 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  4:34   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt() Michael Roth
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface as a stub.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         | 1 +
 hw/ppc/spapr_events.c  | 9 +++++++++
 include/hw/ppc/spapr.h | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1bc5773..a611616 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -541,6 +541,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
         refpoints, sizeof(refpoints))));
 
     _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
+    _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", RTAS_EVENT_SCAN_RATE)));
 
     /*
      * According to PAPR, rtas ibm,os-term does not guarantee a return
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index ebbf3a4..434a75d 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -445,6 +445,14 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     }
 }
 
+static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                       uint32_t token, uint32_t nargs,
+                       target_ulong args,
+                       uint32_t nret, target_ulong rets)
+{
+    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
+}
+
 void spapr_events_init(sPAPREnvironment *spapr)
 {
     spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
@@ -452,4 +460,5 @@ void spapr_events_init(sPAPREnvironment *spapr)
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
                         check_exception);
+    spapr_rtas_register(RTAS_EVENT_SCAN, "event-scan", event_scan);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4d50e74..973193d 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -453,6 +453,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+#define RTAS_EVENT_SCAN_RATE    1
+
 typedef struct sPAPRTCETable sPAPRTCETable;
 
 #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
-- 
1.9.1

* [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt()
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (8 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:15   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type Michael Roth
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This function handles generation of ibm,drc-* array device tree
properties to describe DRC topology to guests. This will be used
by the guest to direct RTAS calls to manage any dynamic resources
we associate with a particular DR Connector as part of
hotplug/unplug.

Since general management of boot-time device trees is handled
outside of sPAPRDRConnector, we insert these values blindly given
an FDT and offset. A mask of sPAPRDRConnector types tells us which
connector types to generate entries for, since descriptions for
different connectors may live in different parts of the device tree.
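
For illustration only: a minimal standalone sketch of the encoding used
for these properties, where each one is a big-endian entry count followed
by either 32-bit cells (ibm,drc-indexes, ibm,drc-power-domains) or
NUL-terminated strings (ibm,drc-names, ibm,drc-types). The helper names
and the example values are hypothetical; the real code builds the buffers
inline and hands them to fdt_setprop().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: <count> followed by <count> big-endian cells.
 * Returns the property length in bytes, or 0 if out is too small.
 */
static size_t drc_pack_indexes(uint8_t *out, size_t out_len,
                               const uint32_t *idx, uint32_t count)
{
    size_t need = ((size_t)count + 1) * 4;
    uint32_t i;

    assert(out != NULL);
    if (need > out_len) {
        return 0;
    }
    put_be32(out, count);
    for (i = 0; i < count; i++) {
        put_be32(out + ((size_t)i + 1) * 4, idx[i]);
    }
    return need;
}

/* Hypothetical helper: <count> followed by <count> NUL-terminated strings.
 * Returns the property length in bytes, or 0 if out is too small.
 */
static size_t drc_pack_names(uint8_t *out, size_t out_len,
                             const char *const *names, uint32_t count)
{
    size_t off = 4;
    uint32_t i;

    assert(out != NULL);
    if (out_len < off) {
        return 0;
    }
    put_be32(out, count);
    for (i = 0; i < count; i++) {
        size_t n = strlen(names[i]) + 1; /* include the NUL terminator */

        if (n > out_len - off) {
            return 0;
        }
        memcpy(out + off, names[i], n);
        off += n;
    }
    return off;
}
```

Entry i of every ibm,drc-* property describes the same connector, so the
guest can map a name such as "C8" to its DRC index by position.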

Based on code originally written by Nathan Fontenot.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_drc.c         | 225 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h |   3 +-
 2 files changed, 227 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index f81c6d1..b162184 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -501,3 +501,228 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
             (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
             (id & DRC_INDEX_ID_MASK));
 }
+
+/* internal helper to gather up DRC info specific to populating DRC
+ * topology information in the device tree.
+ */
+typedef struct DRConnectorDTInfo {
+    char drc_type[64];
+    char drc_name[64];
+    uint32_t drc_index;
+    uint32_t drc_power_domain;
+} DRConnectorDTInfo;
+
+/* generate a string that describes the DRC to encode into the
+ * device tree.
+ *
+ * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
+ */
+static void spapr_drc_get_type_str(char *buf, sPAPRDRConnectorType type)
+{
+    switch (type) {
+    case SPAPR_DR_CONNECTOR_TYPE_CPU:
+        sprintf(buf, "CPU");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PHB:
+        sprintf(buf, "PHB");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_VIO:
+        sprintf(buf, "SLOT");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        sprintf(buf, "28");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        sprintf(buf, "MEM");
+        break;
+    default:
+        g_assert(false);
+    }
+}
+
+/* generate a human-readable name for a DRC to encode into the DT
+ * description. this is mainly only used within a guest in place
+ * of the unique DRC index.
+ *
+ * in the case of VIO/PCI devices, it corresponds to a
+ * "location code" that maps a logical device/function (DRC index)
+ * to a physical (or virtual in the case of VIO) location in the
+ * system by chaining together the "location label" for each
+ * encapsulating component.
+ *
+ * since this is more to do with diagnosing physical hardware
+ * issues than guest compatibility, we choose location codes/DRC
+ * names that adhere to the documented format, but avoid encoding
+ * the entire topology information into the label/code, instead
+ * just using the location codes based on the labels for the
+ * endpoints (VIO/PCI adaptor connectors), which is basically
+ * just "C" followed by an integer ID.
+ *
+ * DRC names as documented by PAPR+ v2.7, 13.5.2.4
+ * location codes as documented by PAPR+ v2.7, 12.3.1.5
+ */
+static void spapr_drc_get_name_str(char *buf,
+                                   sPAPRDRConnectorType type,
+                                   uint32_t drc_index)
+{
+    uint32_t id = drc_index & DRC_INDEX_ID_MASK;
+
+    switch (type) {
+    case SPAPR_DR_CONNECTOR_TYPE_CPU:
+        sprintf(buf, "CPU %u", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PHB:
+        sprintf(buf, "PHB %u", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_VIO:
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        sprintf(buf, "C%u", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        sprintf(buf, "LMB %u", id);
+        break;
+    default:
+        g_assert(false);
+    }
+}
+
+static DRConnectorDTInfo *spapr_dr_connector_get_info(uint32_t drc_type_mask,
+                                                      unsigned int *count)
+{
+    Object *root_container;
+    ObjectProperty *prop;
+    GArray *drc_info_list = g_array_new(false, true,
+                                        sizeof(DRConnectorDTInfo));
+
+    /* aliases for all DRConnector objects will be rooted in QOM
+     * composition tree at /dr-connector
+     */
+    root_container = container_get(object_get_root(), "/dr-connector");
+
+    QTAILQ_FOREACH(prop, &root_container->properties, node) {
+        Object *obj;
+        sPAPRDRConnector *drc;
+        sPAPRDRConnectorClass *drck;
+        DRConnectorDTInfo drc_info;
+
+        if (!strstart(prop->type, "link<", NULL)) {
+            continue;
+        }
+
+        obj = object_property_get_link(root_container, prop->name, NULL);
+        drc = SPAPR_DR_CONNECTOR(obj);
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+        if ((drc->type & drc_type_mask) == 0) {
+            continue;
+        }
+
+        drc_info.drc_index = drck->get_index(drc);
+        drc_info.drc_power_domain = -1;
+        spapr_drc_get_type_str(drc_info.drc_type, drc->type);
+        spapr_drc_get_name_str(drc_info.drc_name, drc->type,
+                               drck->get_index(drc));
+        g_array_append_val(drc_info_list, drc_info);
+    }
+
+    if (count) {
+        *count = drc_info_list->len;
+    }
+
+    /* if no DRCs matched, free everything, including the array's
+     * internal storage, and return NULL */
+    return (DRConnectorDTInfo *)g_array_free(drc_info_list,
+                                             drc_info_list->len == 0);
+}
+
+/**
+ * spapr_drc_populate_dt
+ *
+ * @fdt: libfdt device tree
+ * @fdt_offset: offset of the DT node to generate properties in
+ * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
+ *   to the types of DRCs to generate entries for
+ *
+ * generate OF properties to describe DRC topology/indices to guests
+ *
+ * as documented in PAPR+ v2.1, 13.5.2
+ */
+int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask)
+{
+    DRConnectorDTInfo *drc_info_list;
+    unsigned int i, count;
+    char *char_buf;
+    uint32_t *char_buf_count;
+    uint32_t *int_buf;
+    int char_buf_offset, ret;
+
+    drc_info_list =
+        spapr_dr_connector_get_info(drc_type_mask, &count);
+
+    if (!count) {
+        return 0;
+    }
+
+    int_buf = g_new0(uint32_t, count + 1);
+    int_buf[0] = cpu_to_be32(count);
+    char_buf = g_new0(char, count * 128 + sizeof(uint32_t));
+    char_buf_count = (uint32_t *)&char_buf[0];
+    *char_buf_count = cpu_to_be32(count);
+
+    /* ibm,drc-indexes */
+    for (i = 0; i < count; i++) {
+        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_index);
+    }
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
+                      (count + 1) * sizeof(uint32_t));
+    if (ret) {
+        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
+        goto out;
+    }
+
+    /* ibm,drc-power-domains */
+    for (i = 0; i < count; i++) {
+        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_power_domain);
+    }
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
+                      (count + 1) * sizeof(uint32_t));
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
+        goto out;
+    }
+
+    /* ibm,drc-names */
+    char_buf_offset = sizeof(uint32_t);
+
+    for (i = 0; i < count; i++) {
+        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_name);
+        char_buf_offset += strlen(drc_info_list[i].drc_name) + 1;
+    }
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf,
+                      char_buf_offset);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
+        goto out;
+    }
+
+    /* ibm,drc-types */
+    char_buf_offset = sizeof(uint32_t);
+
+    for (i = 0; i < count; i++) {
+        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_type);
+        char_buf_offset += strlen(drc_info_list[i].drc_type) + 1;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf,
+                      char_buf_offset);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
+        goto out;
+    }
+
+out:
+    g_free(int_buf);
+    g_free(char_buf);
+    g_free(drc_info_list);
+    return ret;
+}
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 63ec687..5c70140 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -193,9 +193,10 @@ typedef struct sPAPRDRConnectorClass {
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
                                          sPAPRDRConnectorType type,
-                                         uint32_t token);
+                                         uint32_t id);
 sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
 sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
                                            uint32_t id);
+int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask);
 
 #endif /* __HW_SPAPR_DRC_H__ */
-- 
1.9.1

* [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (9 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt() Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:16   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

And make it the default. This is identical to pseries-2.2 for now,
but subsequent commits will use it to enable pseries-2.3+ features.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a611616..9eb0a94 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1777,8 +1777,6 @@ static void spapr_machine_2_2_class_init(ObjectClass *oc, void *data)
 
     mc->name = "pseries-2.2";
     mc->desc = "pSeries Logical Partition (PAPR compliant) v2.2";
-    mc->alias = "pseries";
-    mc->is_default = 1;
 }
 
 static const TypeInfo spapr_machine_2_2_info = {
@@ -1787,11 +1785,28 @@ static const TypeInfo spapr_machine_2_2_info = {
     .class_init    = spapr_machine_2_2_class_init,
 };
 
+static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->name = "pseries-2.3";
+    mc->desc = "pSeries Logical Partition (PAPR compliant) v2.3";
+    mc->alias = "pseries";
+    mc->is_default = 1;
+}
+
+static const TypeInfo spapr_machine_2_3_info = {
+    .name          = TYPE_SPAPR_MACHINE "2.3",
+    .parent        = TYPE_SPAPR_MACHINE,
+    .class_init    = spapr_machine_2_3_class_init,
+};
+
 static void spapr_machine_register_types(void)
 {
     type_register_static(&spapr_machine_info);
     type_register_static(&spapr_machine_2_1_info);
     type_register_static(&spapr_machine_2_2_info);
+    type_register_static(&spapr_machine_2_3_info);
 }
 
 type_init(spapr_machine_register_types)
-- 
1.9.1


* [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (10 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:18   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This option enables/disables PCI hotplug for a particular PHB.

Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.3.
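
As a rough illustration of the compat mechanism described above, the sketch
below models a per-machine override list that is consulted before falling
back to the property default. The CompatProp type and resolve_prop() helper
are hypothetical stand-ins invented for this example; they are not QEMU's
GlobalProperty machinery, only a model of the lookup it implies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a compat entry; not QEMU's GlobalProperty. */
typedef struct CompatProp {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Mirrors the single entry in PPC_HW_COMPAT_2_2 from this patch. */
const CompatProp compat_2_2[] = {
    { "spapr-pci-host-bridge", "dynamic-reconfiguration", "off" },
    { NULL, NULL, NULL } /* end of list */
};

/* pseries-2.3 carries no overrides, so the property default applies. */
const CompatProp compat_2_3[] = {
    { NULL, NULL, NULL } /* end of list */
};

/* Return the machine type's override for driver.property, or the default
 * when the list has no matching entry. */
const char *resolve_prop(const CompatProp *list, const char *driver,
                         const char *property, const char *dflt)
{
    assert(list && driver && property && dflt);
    for (; list->driver; list++) {
        if (!strcmp(list->driver, driver) &&
            !strcmp(list->property, property)) {
            return list->value;
        }
    }
    return dflt;
}
```

With these tables, a lookup against compat_2_2 yields "off" while the same
lookup against compat_2_3 falls through to the default, which matches the
intent of keeping hotplug disabled on older machine types.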

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 21 +++++++++++++++++++++
 hw/ppc/spapr_pci.c          |  2 ++
 include/hw/pci-host/spapr.h |  1 +
 3 files changed, 24 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9eb0a94..4478fa7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1752,11 +1752,27 @@ static const TypeInfo spapr_machine_info = {
     },
 };
 
+/* pSeries-specific hardware compatibility properties
+ *
+ * As with PC machines and general hardware properties, older
+ * machine types inherit backward-compatibility properties needed
+ * for subsequent machine types.
+ */
+#define PPC_HW_COMPAT_2_2 \
+        {\
+            .driver   = "spapr-pci-host-bridge",\
+            .property = "dynamic-reconfiguration",\
+            .value    = "off",\
+        }
+
+#define PPC_HW_COMPAT_2_1 PPC_HW_COMPAT_2_2
+
 static void spapr_machine_2_1_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
     static GlobalProperty compat_props[] = {
         HW_COMPAT_2_1,
+        PPC_HW_COMPAT_2_1,
         { /* end of list */ }
     };
 
@@ -1774,9 +1790,14 @@ static const TypeInfo spapr_machine_2_1_info = {
 static void spapr_machine_2_2_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    static GlobalProperty compat_props[] = {
+        PPC_HW_COMPAT_2_2,
+        { /* end of list */ }
+    };
 
     mc->name = "pseries-2.2";
     mc->desc = "pSeries Logical Partition (PAPR compliant) v2.2";
+    mc->compat_props = compat_props;
 }
 
 static const TypeInfo spapr_machine_2_2_info = {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 21b95b3..4850f9a 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -678,6 +678,8 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
                        SPAPR_PCI_IO_WIN_SIZE),
+    DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
+                     true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 4ea2a0d..ba0f3cc 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -67,6 +67,7 @@ struct sPAPRPHBState {
     int32_t index;
     uint64_t buid;
     char *dtbusname;
+    bool dr_enabled;
 
     MemoryRegion memspace, iospace;
     hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
-- 
1.9.1


* [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (11 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:20   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs Michael Roth
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

These will be used to support hotplug/unplug of PCI devices to the PCI
bus associated with a particular PHB.
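
The id passed to spapr_dr_connector_new() in this patch, and the DRC index
that get_index() in spapr_drc.c derives from it, can be sketched standalone
as follows. The helper names are made up for this example, and the numeric
value of the PCI connector type is left as a caller-supplied parameter since
it is defined in spapr_drc.h rather than here. The slot bound of 32 is an
assumption about PCI_SLOT_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    (~(~0u << DRC_INDEX_TYPE_SHIFT))

/* Connector id for a slot, as computed in spapr_phb_realize():
 * (phb index << 8) | (slot << 3). */
uint32_t pci_drc_id(uint32_t phb_index, uint32_t slot)
{
    assert(slot < 32); /* assumed PCI_SLOT_MAX */
    return (phb_index << 8) | (slot << 3);
}

/* Bit position of a one-hot connector type, like get_type_shift(). */
uint32_t drc_type_shift(uint32_t type)
{
    uint32_t shift = 0;

    assert(type != 0 && (type & (type - 1)) == 0); /* exactly one bit set */
    while (type != (1u << shift)) {
        shift++;
    }
    return shift;
}

/* Globally unique index: type shift in the top bits, id in the rest. */
uint32_t drc_index(uint32_t type, uint32_t id)
{
    return (drc_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
           (id & DRC_INDEX_ID_MASK);
}

/* Recover the slot number from a connector id. */
uint32_t drc_id_to_slot(uint32_t id)
{
    return (id >> 3) & 0x1f;
}
```

Note that for PHB index 0 the id of a slot equals the devfn of function 0
in that slot, since both are slot << 3.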

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4850f9a..73e86a4 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -47,6 +47,8 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
+#include "hw/ppc/spapr_drc.h"
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
     sPAPRPHBState *sphb;
@@ -622,6 +624,15 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         sphb->lsi_table[i].irq = irq;
     }
 
+    /* allocate connectors for child PCI devices */
+    if (sphb->dr_enabled) {
+        for (i = 0; i < PCI_SLOT_MAX; i++) {
+            spapr_dr_connector_new(OBJECT(phb),
+                                   SPAPR_DR_CONNECTOR_TYPE_PCI,
+                                   (sphb->index << 8) | (i << 3));
+        }
+    }
+
     if (!info->finish_realize) {
         error_setg(errp, "finish_realize not defined");
         return;
-- 
1.9.1


* [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (12 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:22   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c Michael Roth
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Reserve 32 entries of type PCI in each PHB's initial FDT. This
advertises to guests that each PHB is a DR-capable device with
physical hotpluggable slots. This is necessary to allow devices to
be hotplugged to it later via a bus rescan or the guest rpaphp
hotplug module.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 73e86a4..a5d7791 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -47,6 +47,8 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
+#define FDT_MAX_SIZE            0x10000
+
 #include "hw/ppc/spapr_drc.h"
 
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
@@ -872,7 +874,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
                           void *fdt)
 {
-    int bus_off, i, j;
+    int bus_off, i, j, ret;
     char nodename[256];
     uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
     struct {
@@ -951,6 +953,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     object_child_foreach(OBJECT(phb), spapr_phb_children_dt,
                          &((sPAPRTCEDT){ .fdt = fdt, .node_off = bus_off }));
 
+    ret = spapr_drc_populate_dt(fdt, bus_off, SPAPR_DR_CONNECTOR_TYPE_PCI);
+    if (ret) {
+        return ret;
+    }
+
     return 0;
 }
 
-- 
1.9.1


* [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (13 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:24   ` David Gibson
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/pci/pci.c         | 2 +-
 include/hw/pci/pci.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 371699c..bf16fc8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -122,7 +122,7 @@ static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
 
 static QLIST_HEAD(, PCIHostState) pci_host_bridges;
 
-static int pci_bar(PCIDevice *d, int reg)
+int pci_bar(PCIDevice *d, int reg)
 {
     uint8_t type;
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 97e4257..ae7c3fc 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -330,6 +330,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 MemoryRegion *pci_address_space(PCIDevice *dev);
 MemoryRegion *pci_address_space_io(PCIDevice *dev);
+int pci_bar(PCIDevice *d, int reg);
 
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
-- 
1.9.1


* [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (14 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  5:58   ` David Gibson
  2015-01-23  5:17   ` Alexey Kardashevskiy
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
  16 siblings, 2 replies; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

This enables PCI device hotplug on PHBs. Upon hotplug we generate the
OF nodes required by the PAPR specification and the IEEE 1275-1994
"PCI Bus Binding to Open Firmware" for the device.

We associate the FDT for these nodes with the DR connector
corresponding to the slot. The guest fetches it via
ibm,configure-connector RTAS calls as described by the PAPR
specification. The FDT is cleaned up in the case of unplug.
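
For reference, the first cell (phys.hi) of each "reg" and
"assigned-addresses" row built by fill_resource_props() can be sketched as
below. This is a minimal standalone model of the b_*() macros, not the code
in the patch: the function names are invented for the example, and the
macro is written with unsigned arithmetic so that setting bit 31 does not
shift into the sign bit of an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned variant of b_x(): place the low l bits of x at bit position p. */
#define B_X(x, p, l) ((((uint32_t)(x)) & ((1u << (l)) - 1)) << (p))

/* Space codes from the OF PCI binding, as used by the patch. */
#define SS_IO    1u
#define SS_MEM32 2u

/* for 'reg'/'assigned-addresses' rows: 3 address cells + 2 size cells */
#define CELLS_PER_ROW 5

/* Build phys.hi from its fields. non_reloc sets the 'n' bit, which the
 * patch sets only in assigned-addresses rows. reg is the config-space
 * offset of the BAR (what pci_bar() returns). */
uint32_t of_pci_phys_hi(uint32_t bus, uint32_t slot, uint32_t func,
                        uint32_t reg, uint32_t space, int non_reloc)
{
    return B_X(non_reloc ? 1 : 0, 31, 1) |
           B_X(space, 24, 2) |
           B_X(bus, 16, 8) |
           B_X(slot, 11, 5) |
           B_X(func, 8, 3) |
           B_X(reg, 0, 8);
}

/* Byte size of "reg": one config-space row plus one row per BAR. */
size_t reg_prop_size(int nbars)
{
    return (size_t)(nbars + 1) * CELLS_PER_ROW * sizeof(uint32_t);
}

/* Byte size of "assigned-addresses": one row per BAR. */
size_t assigned_prop_size(int nbars)
{
    return (size_t)nbars * CELLS_PER_ROW * sizeof(uint32_t);
}
```

For a 32-bit memory BAR at config offset 0x10 on bus 0, slot 2, function 0,
this gives 0x02001010 for the "reg" row and 0x82001010 for the matching
"assigned-addresses" row. The values are in host byte order; the patch
converts them with cpu_to_be32() before storing them in the FDT.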

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 249 insertions(+), 19 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index a5d7791..94e33b4 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -33,6 +33,7 @@
 #include <libfdt.h>
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qapi/qmp/qerror.h"
 
 #include "hw/pci/pci_bus.h"
 
@@ -51,6 +52,15 @@
 
 #include "hw/ppc/spapr_drc.h"
 
+#define FDT_MAX_SIZE            0x10000
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            return ret;                                            \
+        }                                                          \
+    } while (0)
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
     sPAPRPHBState *sphb;
@@ -483,6 +493,237 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &phb->iommu_as;
 }
 
+/* Macros to operate with address in OF binding to PCI */
+#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
+#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
+#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
+#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
+#define b_ss(x)         b_x((x), 24, 2) /* the space code */
+#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
+#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
+#define b_fff(x)        b_x((x), 8, 3)  /* function number */
+#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
+
+/* for 'reg'/'assigned-addresses' OF properties */
+#define RESOURCE_CELLS_SIZE 2
+#define RESOURCE_CELLS_ADDRESS 3
+#define RESOURCE_CELLS_TOTAL \
+    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
+
+static void fill_resource_props(PCIDevice *d, int bus_num,
+                                uint32_t *reg, int *reg_size,
+                                uint32_t *assigned, int *assigned_size)
+{
+    uint32_t *reg_row, *assigned_row;
+    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
+                       b_ddddd(PCI_SLOT(d->devfn)) |
+                       b_fff(PCI_FUNC(d->devfn)));
+    int i, idx = 0;
+
+    reg[0] = cpu_to_be32(dev_id);
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!d->io_regions[i].size) {
+            continue;
+        }
+        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
+        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
+        reg_row[0] = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
+        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
+            reg_row[0] |= cpu_to_be32(b_ss(1));
+        } else {
+            reg_row[0] |= cpu_to_be32(b_ss(2));
+        }
+        assigned_row[0] = cpu_to_be32(reg_row[0] | b_n(1));
+        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
+        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
+        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
+        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
+        idx++;
+    }
+
+    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
+    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
+}
+
+static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
+                                       int phb_index, int drc_index)
+{
+    int slot = PCI_SLOT(dev->devfn);
+    char slotname[16];
+    bool is_bridge = 1;
+    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+    int pci_status, reg_size, assigned_size;
+
+    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
+        PCI_HEADER_TYPE_NORMAL) {
+        is_bridge = 0;
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
+                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
+                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
+                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
+    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
+                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
+                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
+
+    /* if this device is NOT a bridge */
+    if (!is_bridge) {
+        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
+            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
+            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
+            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
+            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
+        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
+
+    /* the following fdt cells are masked off the pci status register */
+    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
+    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
+                          PCI_STATUS_DEVSEL_MASK & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
+                          PCI_STATUS_FAST_BACK & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
+                          PCI_STATUS_66MHZ & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
+                          PCI_STATUS_UDF & pci_status));
+
+    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
+    sprintf(slotname, "Slot %d", slot + phb_index * PCI_SLOT_MAX);
+    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
+                          RESOURCE_CELLS_ADDRESS));
+    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
+                          RESOURCE_CELLS_SIZE));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
+                          RESOURCE_CELLS_SIZE));
+    fill_resource_props(dev, phb_index, reg, &reg_size,
+                        assigned, &assigned_size);
+    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
+    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
+                     assigned, assigned_size));
+
+    return 0;
+}
+
+/* create OF node for pci device and required OF DT properties */
+static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
+                                       int drc_index, int *dt_offset)
+{
+    void *fdt_orig, *fdt;
+    int offset, ret;
+    int slot = PCI_SLOT(dev->devfn);
+    char nodename[512];
+
+    fdt_orig = g_malloc0(FDT_MAX_SIZE);
+    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
+    fdt_begin_node(fdt_orig, "");
+    fdt_end_node(fdt_orig);
+    fdt_finish(fdt_orig);
+
+    fdt = g_malloc0(FDT_MAX_SIZE);
+    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
+    sprintf(nodename, "pci@%d", slot);
+    offset = fdt_add_subnode(fdt, 0, nodename);
+    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
+    g_assert(!ret);
+    g_free(fdt_orig);
+
+    *dt_offset = offset;
+    return fdt;
+}
+
+static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
+                                     sPAPRPHBState *phb,
+                                     PCIDevice *pdev)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    DeviceState *dev = DEVICE(pdev);
+    int drc_index = drck->get_index(drc);
+    void *fdt = NULL;
+    int fdt_start_offset = 0;
+
+    /* boot-time devices get their device tree node created by SLOF, but for
+     * hotplugged devices we need QEMU to generate it so the guest can fetch
+     * it via RTAS
+     */
+    if (dev->hotplugged) {
+        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
+                                        &fdt_start_offset);
+    }
+    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
+}
+
+static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
+{
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
+                                        sPAPRPHBState *phb,
+                                        PCIDevice *pdev)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
+}
+
+static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
+                               DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnector *drc =
+        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
+
+    /* if DR is disabled we don't need to do anything in the case of
+     * hotplug or coldplug callbacks
+     */
+    if (!phb->dr_enabled) {
+        /* if this is a hotplug operation initiated by the user
+         * we need to let them know it's not enabled
+         */
+        if (plugged_dev->hotplugged) {
+            error_set(errp, QERR_BUS_NO_HOTPLUG,
+                      object_get_typename(OBJECT(phb)));
+        }
+        return;
+    }
+
+    g_assert(drc);
+    spapr_device_hotplug_add(drc, phb, pdev);
+}
+
+static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
+                                 DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnector *drc =
+        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
+
+    if (!phb->dr_enabled) {
+        error_set(errp, QERR_BUS_NO_HOTPLUG,
+                  object_get_typename(OBJECT(phb)));
+        return;
+    }
+
+    spapr_device_hotplug_remove(drc, phb, pdev);
+}
+
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
 {
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
@@ -570,6 +811,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
                            &sphb->memspace, &sphb->iospace,
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
+    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
 
     /*
      * Initialize PHB address space.
@@ -806,6 +1048,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
     sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
+    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
 
     hc->root_bus_path = spapr_phb_root_bus_path;
     dc->realize = spapr_phb_realize;
@@ -815,6 +1058,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
     spc->finish_realize = spapr_phb_finish_realize;
+    hp->plug = spapr_phb_hot_plug;
+    hp->unplug = spapr_phb_hot_unplug;
 }
 
 static const TypeInfo spapr_phb_info = {
@@ -823,6 +1068,10 @@ static const TypeInfo spapr_phb_info = {
     .instance_size = sizeof(sPAPRPHBState),
     .class_init    = spapr_phb_class_init,
     .class_size    = sizeof(sPAPRPHBClass),
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
@@ -836,17 +1085,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
     return PCI_HOST_BRIDGE(dev);
 }
 
-/* Macros to operate with address in OF binding to PCI */
-#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
-#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
-#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
-#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
-#define b_ss(x)         b_x((x), 24, 2) /* the space code */
-#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
-#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
-#define b_fff(x)        b_x((x), 8, 3)  /* function number */
-#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
-
 typedef struct sPAPRTCEDT {
     void *fdt;
     int node_off;
@@ -906,14 +1144,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
         return bus_off;
     }
 
-#define _FDT(exp) \
-    do { \
-        int ret = (exp);                                           \
-        if (ret < 0) {                                             \
-            return ret;                                            \
-        }                                                          \
-    } while (0)
-
     /* Write PHB properties */
     _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
     _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
-- 
1.9.1


* [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug
  2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
                   ` (15 preceding siblings ...)
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
@ 2014-12-23 12:30 ` Michael Roth
  2015-01-19  6:00   ` David Gibson
  16 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2014-12-23 12:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

This uses an extension of the existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr so that they can handle
in-guest configuration/cleanup operations in response to
device_add/device_del.

Userspace tools that don't implement this extension will need
to be run manually: after device_add to configure the device, and
before device_del to clean it up.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 94e33b4..f17f984 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -705,6 +705,9 @@ static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
 
     g_assert(drc);
     spapr_device_hotplug_add(drc, phb, pdev);
+    if (plugged_dev->hotplugged) {
+        spapr_hotplug_req_add_event(drc);
+    }
 }
 
 static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
@@ -722,6 +725,7 @@ static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
     }
 
     spapr_device_hotplug_remove(drc, phb, pdev);
+    spapr_hotplug_req_remove_event(drc);
 }
 
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
-- 
1.9.1


* Re: [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
@ 2015-01-02 10:32   ` Bharata B Rao
  2015-01-26  4:56     ` Michael Roth
  2015-01-16  6:19   ` David Gibson
  1 sibling, 1 reply; 55+ messages in thread
From: Bharata B Rao @ 2015-01-02 10:32 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, Alexander Graf, ncmike, qemu-ppc, tyreld,
	Nathan Fontenot

On Tue, Dec 23, 2014 at 6:00 PM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> This device emulates a firmware abstraction used by pSeries guests to
> manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
> memory, and CPUs. It is conceptually similar to an SHPC device,
> complete with LED indicators to identify individual slots to
> physical users and indicate when it is safe to remove a device. In
> some cases it is also used to manage virtualized resources, such as
> memory, CPUs, and physical-host bridges, which in the case of pSeries
> guests are virtualized resources where the physical components are
> managed by the host.
>
> Guests communicate with these DR Connectors using RTAS calls,
> generally by addressing the unique DRC index associated with a
> particular connector for a particular resource. For introspection
> purposes we expose this state initially as QOM properties, and
> in subsequent patches will introduce the RTAS calls that make use of
> it. This constitutes the 'guest' interface.
>
> On the QEMU side we provide an attach/detach interface to associate
> or cleanup a DeviceState with a particular sPAPRDRConnector in
> response to hotplug/unplug, respectively. This constitutes the
> 'physical' interface to the DR Connector.
>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/Makefile.objs       |   2 +-
>  hw/ppc/spapr_drc.c         | 503 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h | 201 ++++++++++++++++++
>  3 files changed, 705 insertions(+), 1 deletion(-)
>  create mode 100644 hw/ppc/spapr_drc.c
>  create mode 100644 include/hw/ppc/spapr_drc.h
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 19d9920..ea010fd 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
>  # IBM pSeries (sPAPR)
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> -obj-$(CONFIG_PSERIES) += spapr_pci.o
> +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_drc.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> new file mode 100644
> index 0000000..f81c6d1
> --- /dev/null
> +++ b/hw/ppc/spapr_drc.c
> @@ -0,0 +1,503 @@
> +/*
> + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> + *
> + * Copyright IBM Corp. 2014
> + *
> + * Authors:
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "hw/ppc/spapr_drc.h"
> +#include "qom/object.h"
> +#include "hw/qdev.h"
> +#include "qapi/visitor.h"
> +#include "qemu/error-report.h"
> +
> +/* #define DEBUG_SPAPR_DRC */
> +
> +#ifdef DEBUG_SPAPR_DRC
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#define DPRINTFN(fmt, ...) \
> +    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#define DPRINTFN(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +#define DRC_CONTAINER_PATH "/dr-connector"
> +#define DRC_INDEX_TYPE_SHIFT 28
> +#define DRC_INDEX_ID_MASK ~(~0 << DRC_INDEX_TYPE_SHIFT)
> +
> +static int set_isolation_state(sPAPRDRConnector *drc,
> +                               sPAPRDRIsolationState state)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    DPRINTFN("set_isolation_state: %x", state);
> +    drc->isolation_state = state;
> +    if (drc->awaiting_release &&
> +        drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> +        drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> +                     drc->detach_cb_opaque);
> +    }
> +    return 0;
> +}
> +
> +static int set_indicator_state(sPAPRDRConnector *drc,
> +                               sPAPRDRIndicatorState state)
> +{
> +    DPRINTFN("set_indicator_state: %x", state);
> +    drc->indicator_state = state;
> +    return 0;
> +}
> +
> +static int set_allocation_state(sPAPRDRConnector *drc,
> +                                sPAPRDRAllocationState state)
> +{
> +    DPRINTFN("set_allocation_state: %x", state);
> +    drc->indicator_state = state;
> +    return 0;
> +}
> +
> +static uint32_t get_id(sPAPRDRConnector *drc)
> +{
> +    /* this value is unique for DRCs of a particular type, but may
> +     * overlap with the id's of other DRCs. the value is used both
> +     * to calculate a unique (across all DRC types) index, as well
> +     * as generating the ibm,drc-names OFDT property that describes
> +     * DRCs
> +     */
> +    return drc->id;
> +}
> +
> +static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
> +{
> +    uint32_t shift = 0;
> +
> +    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
> +    while (type != (1 << shift)) {
> +        shift++;
> +    }
> +    return shift;
> +}
> +
> +static uint32_t get_index(sPAPRDRConnector *drc)
> +{
> +    /* no set format for a drc index: it only needs to be globally
> +     * unique. this is how we encode the DRC type on bare-metal
> +     * however, so might as well do that here
> +     */
> +    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
> +            (drc->id & DRC_INDEX_ID_MASK);
> +}
> +
> +static uint32_t get_type(sPAPRDRConnector *drc)
> +{
> +    return drc->type;
> +}
> +
> +/*
> + * dr-entity-sense sensor value
> + * returned via get-sensor-state RTAS calls
> + * as expected by state diagram in PAPR+ 2.7, 13.4
> + * based on the current allocation/indicator/power states
> + * for the DR connector.
> + */
> +static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
> +{
> +    if (drc->dev) {
> +        /* this assumes all PCI devices are assigned to
> +         * a 'live insertion' power domain, where QEMU
> +         * manages power state automatically as opposed
> +         * to the guest. present, non-PCI resources are
> +         * unaffected by power state.
> +         */
> +        return SPAPR_DR_ENTITY_SENSE_PRESENT;
> +    }

A hotplugged CPU comes with drc->dev set (it is set during
sPAPRDRConnectorClass->attach()), so entity_sense() ends up returning
PRESENT, but the kernel expects UNUSABLE from the get-sensor-state RTAS call.

> +
> +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
> +        /* PCI devices, and only PCI devices, use PRESENT
> +         * in cases where we'd otherwise use UNUSABLE
> +         */
> +        return SPAPR_DR_ENTITY_SENSE_EMPTY;
> +    }
> +    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
> +}


* Re: [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
@ 2015-01-16  5:28   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-16  5:28 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:15AM -0600, Michael Roth wrote:
> This adds a general overview of hotplug/dynamic-reconfiguration
> for sPAPR/pSeries guest.
> 
> As specified in PAPR+ v2.7.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  docs/specs/ppc-spapr-hotplug.txt | 287 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 287 insertions(+)
>  create mode 100644 docs/specs/ppc-spapr-hotplug.txt
> 
> diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
> new file mode 100644
> index 0000000..6f2863f
> --- /dev/null
> +++ b/docs/specs/ppc-spapr-hotplug.txt
> @@ -0,0 +1,287 @@
> += sPAPR Dynamic Reconfiguration =
> +
> +sPAPR/"pseries" guests make use a facility called dynamic-reconfiguration to
> +handle hotplugging of dynamic "physical" resources like PCI cards, or
> +"logical"/paravirtual resources like memory, CPUs, and "physical"
> +host-bridges, which are generally managed by the host/hypervisor and provided
> +to guests as virtualized resources. The specifics of dynamic-reconfiguration
> +are documented extensively in PAPR+ v2.7, Section 13.1. This document
> +provides a summary of that information as it applies to the implementation
> +within QEMU.
> +
> +== Dynamic-reconfiguration Connectors ==
> +
> +To manage hotplug/unplug of these resources, a firmware abstraction known as
> +a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
> +resource to the guest, and provide an interface for the guest to manage
> +configuration/removal of the resource associated with it.
> +
> +== Device-tree description of DRCs ==
> +
> +A set of 4 array Open Firmware device tree properties are used to describe
> +the name/index/power-domain/type of each DRC allocated to a guest at
> +boot-time. There may be multiple sets of these arrays, rooted at different
> +paths in the device tree depending on the type of resource the DRCs manage.
> +
> +In some cases, the DRCs themselves may be provided by a dynamic resource,
> +such as the DRCs managed PCI slots on a hotplugged PHB. In this case the
> +arrays would be fetched as part of the device tree retrieval interfaces
> +for hotplugged resources described under "Guest->Host interface".
> +
> +The array properties are described below. Each entry/element in an array
> +describes the DRC identified by the element in the corresponding position
> +of ibm,drc-indexes:
> +
> +ibm,drc-names:
> +  first 4-bytes: BE-encoded integer denoting the number of entries
> +  each entry: a NULL-terminated <name> string encoded as a byte array
> +
> +  <name> values for logical/virtual resources are defined in PAPR+ v2.7,
> +  Section 13.5.2.4, and basically consist of the type of the resource
> +  followed by a space and a numerical value that's unique across resources
> +  of that type.
> +
> +  <name> values for "physical" resources such as PCI or VIO devices are
> +  defined as being "location codes", which are the "location labels" of
> +  each encapsulating device, starting from the chassis down to the
> +  individual slot for the device, concatenated by a hyphen. This provides
> +  a mapping of resources to a physical location in a chassis for debugging
> +  purposes. For QEMU, this mapping is less important, so we assign a
> +  location code that confirms to naming specifications, but is simply a

s/confirms/conforms/

Otherwise, nice write up.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
  2015-01-02 10:32   ` Bharata B Rao
@ 2015-01-16  6:19   ` David Gibson
  2015-01-26  4:01     ` Michael Roth
  1 sibling, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-16  6:19 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:16AM -0600, Michael Roth wrote:
> This device emulates a firmware abstraction used by pSeries guests to
> manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
> memory, and CPUs. It is conceptually similar to an SHPC device,
> complete with LED indicators to identify individual slots to
> physical users and indicate when it is safe to remove a device. In
> some cases it is also used to manage virtualized resources, such as
> memory, CPUs, and physical-host bridges, which in the case of pSeries
> guests are virtualized resources where the physical components are
> managed by the host.
> 
> Guests communicate with these DR Connectors using RTAS calls,
> generally by addressing the unique DRC index associated with a
> particular connector for a particular resource. For introspection
> purposes we expose this state initially as QOM properties, and
> in subsequent patches will introduce the RTAS calls that make use of
> it. This constitutes the 'guest' interface.
> 
> On the QEMU side we provide an attach/detach interface to associate
> or cleanup a DeviceState with a particular sPAPRDRConnector in
> response to hotplug/unplug, respectively. This constitutes the
> 'physical' interface to the DR Connector.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/Makefile.objs       |   2 +-
>  hw/ppc/spapr_drc.c         | 503 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h | 201 ++++++++++++++++++
>  3 files changed, 705 insertions(+), 1 deletion(-)
>  create mode 100644 hw/ppc/spapr_drc.c
>  create mode 100644 include/hw/ppc/spapr_drc.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 19d9920..ea010fd 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
>  # IBM pSeries (sPAPR)
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> -obj-$(CONFIG_PSERIES) += spapr_pci.o
> +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_drc.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> new file mode 100644
> index 0000000..f81c6d1
> --- /dev/null
> +++ b/hw/ppc/spapr_drc.c
> @@ -0,0 +1,503 @@
> +/*
> + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> + *
> + * Copyright IBM Corp. 2014
> + *
> + * Authors:
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "hw/ppc/spapr_drc.h"
> +#include "qom/object.h"
> +#include "hw/qdev.h"
> +#include "qapi/visitor.h"
> +#include "qemu/error-report.h"
> +
> +/* #define DEBUG_SPAPR_DRC */
> +
> +#ifdef DEBUG_SPAPR_DRC
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#define DPRINTFN(fmt, ...) \
> +    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#define DPRINTFN(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +#define DRC_CONTAINER_PATH "/dr-connector"
> +#define DRC_INDEX_TYPE_SHIFT 28
> +#define DRC_INDEX_ID_MASK ~(~0 << DRC_INDEX_TYPE_SHIFT)

I'm not sure there are any situations where this can actually break,
but it's safest to put parens around the whole macro body, just in
case of operator-precedence surprises when the macro is expanded.
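
For illustration, a fully parenthesized form could look like the sketch
below. It is not taken from the patch: the U suffix is an addition here
(left-shifting the signed value ~0 is undefined behaviour in C), and
drc_index() is a hypothetical helper that mirrors how get_index() in the
quoted code combines the type shift and the id.

```c
#include <stdint.h>

#define DRC_INDEX_TYPE_SHIFT 28
/* Whole body parenthesized; ~0U keeps the shift on an unsigned value. */
#define DRC_INDEX_ID_MASK (~(~0U << DRC_INDEX_TYPE_SHIFT))

/* Hypothetical helper mirroring get_index() in the quoted patch:
 * the type shift goes in the top 4 bits, the id in the low 28 bits.
 */
static uint32_t drc_index(uint32_t type_shift, uint32_t id)
{
    return (type_shift << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}
```

With the PCI type shift of 4 from spapr_drc.h, drc_index(4, 5) evaluates
to 0x40000005.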

> +static int set_isolation_state(sPAPRDRConnector *drc,
> +                               sPAPRDRIsolationState state)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    DPRINTFN("set_isolation_state: %x", state);
> +    drc->isolation_state = state;
> +    if (drc->awaiting_release &&
> +        drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> +        drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> +                     drc->detach_cb_opaque);
> +    }
> +    return 0;
> +}
> +
> +static int set_indicator_state(sPAPRDRConnector *drc,
> +                               sPAPRDRIndicatorState state)
> +{
> +    DPRINTFN("set_indicator_state: %x", state);
> +    drc->indicator_state = state;
> +    return 0;
> +}
> +
> +static int set_allocation_state(sPAPRDRConnector *drc,
> +                                sPAPRDRAllocationState state)
> +{
> +    DPRINTFN("set_allocation_state: %x", state);
> +    drc->indicator_state = state;

This should be drc->allocation_state, surely.
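
A sketch of the intended setter, using a cut-down stand-in for
sPAPRDRConnector. The DRCSketch struct and the plain uint32_t state
type are simplifications for illustration, not the definitions from
spapr_drc.h.

```c
#include <stdint.h>

/* Simplified stand-in for sPAPRDRConnector; only the fields needed here. */
typedef struct {
    uint32_t isolation_state;
    uint32_t indicator_state;
    uint32_t allocation_state;
} DRCSketch;

/* Stores the new value in allocation_state, leaving indicator_state alone. */
static int set_allocation_state(DRCSketch *drc, uint32_t state)
{
    drc->allocation_state = state;
    return 0;
}
```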

> +    return 0;
> +}
> +
> +static uint32_t get_id(sPAPRDRConnector *drc)
> +{
> +    /* this value is unique for DRCs of a particular type, but may
> +     * overlap with the id's of other DRCs. the value is used both
> +     * to calculate a unique (across all DRC types) index, as well
> +     * as generating the ibm,drc-names OFDT property that describes
> +     * DRCs
> +     */
> +    return drc->id;
> +}

Since this is static anyway, why not just reference drc->id directly?

> +static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
> +{
> +    uint32_t shift = 0;
> +
> +    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
> +    while (type != (1 << shift)) {
> +        shift++;
> +    }
> +    return shift;
> +}
> +
> +static uint32_t get_index(sPAPRDRConnector *drc)
> +{
> +    /* no set format for a drc index: it only needs to be globally
> +     * unique. this is how we encode the DRC type on bare-metal
> +     * however, so might as well do that here
> +     */
> +    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
> +            (drc->id & DRC_INDEX_ID_MASK);
> +}
> +
> +static uint32_t get_type(sPAPRDRConnector *drc)
> +{
> +    return drc->type;
> +}
> +
> +/*
> + * dr-entity-sense sensor value
> + * returned via get-sensor-state RTAS calls
> + * as expected by state diagram in PAPR+ 2.7, 13.4
> + * based on the current allocation/indicator/power states
> + * for the DR connector.
> + */
> +static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
> +{
> +    if (drc->dev) {
> +        /* this assumes all PCI devices are assigned to
> +         * a 'live insertion' power domain, where QEMU
> +         * manages power state automatically as opposed
> +         * to the guest. present, non-PCI resources are
> +         * unaffected by power state.
> +         */

Is it possible to make an assert() to check that?

> +        return SPAPR_DR_ENTITY_SENSE_PRESENT;
> +    }
> +
> +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
> +        /* PCI devices, and only PCI devices, use PRESENT
> +         * in cases where we'd otherwise use UNUSABLE
> +         */
> +        return SPAPR_DR_ENTITY_SENSE_EMPTY;
> +    }
> +    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
> +}
> +
> +static sPAPRDRCCResponse configure_connector_common(sPAPRDRCCState *ccs,
> +                            char **prop_name, const struct fdt_property **prop,

Maybe rename prop_name to name, since it's also used for node names.

> +                            int *prop_len)
> +{
> +    sPAPRDRCCResponse resp = SPAPR_DR_CC_RESPONSE_CONTINUE;
> +    int fdt_offset_next;
> +
> +    *prop_name = NULL;
> +    *prop = NULL;
> +    *prop_len = 0;
> +
> +    if (!ccs->fdt) {
> +        return SPAPR_DR_CC_RESPONSE_ERROR;
> +    }
> +
> +    while (resp == SPAPR_DR_CC_RESPONSE_CONTINUE) {
> +        const char *name_cur;
> +        uint32_t tag;
> +        int name_cur_len;
> +
> +        tag = fdt_next_tag(ccs->fdt, ccs->fdt_offset, &fdt_offset_next);
> +        switch (tag) {
> +        case FDT_BEGIN_NODE:
> +            ccs->fdt_depth++;
> +            name_cur = fdt_get_name(ccs->fdt, ccs->fdt_offset, &name_cur_len);
> +            *prop_name = g_strndup(name_cur, name_cur_len);
> +            resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
> +            break;
> +        case FDT_END_NODE:
> +            ccs->fdt_depth--;
> +            if (ccs->fdt_depth == 0) {
> +                resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
> +            } else {
> +                resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT;
> +            }
> +            break;
> +        case FDT_PROP:
> +            *prop = fdt_get_property_by_offset(ccs->fdt, ccs->fdt_offset,
> +                                               prop_len);
> +            name_cur = fdt_string(ccs->fdt, fdt32_to_cpu((*prop)->nameoff));
> +            *prop_name = g_strdup(name_cur);
> +            resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
> +            break;
> +        case FDT_END:
> +            resp = SPAPR_DR_CC_RESPONSE_ERROR;
> +            break;

IIUC, the fdt fragment you're stepping through here is generated
within qemu.  In which case shouldn't this be an assert, rather than
reporting an error to the guest?

> +        default:
> +            ccs->fdt_offset = fdt_offset_next;
> +        }
> +    }
> +
> +    ccs->fdt_offset = fdt_offset_next;
> +    return resp;
> +}
> +
> +static sPAPRDRCCResponse configure_connector(sPAPRDRConnector *drc,
> +                                             char **prop_name,
> +                                             const struct fdt_property **prop,
> +                                             int *prop_len)
> +{
> +    return configure_connector_common(&drc->ccs, prop_name, prop, prop_len);
> +}
> +
> +static void prop_get_id(Object *obj, Visitor *v, void *opaque,
> +                                  const char *name, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +    uint32_t value = get_id(drc);
> +    visit_type_uint32(v, &value, name, errp);
> +}
> +
> +static void prop_get_index(Object *obj, Visitor *v, void *opaque,
> +                                  const char *name, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    uint32_t value = (uint32_t)drck->get_index(drc);
> +    visit_type_uint32(v, &value, name, errp);
> +}
> +
> +static void prop_get_type(Object *obj, Visitor *v, void *opaque,
> +                          const char *name, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    uint32_t value = (uint32_t)drck->get_type(drc);
> +    visit_type_uint32(v, &value, name, errp);
> +}
> +
> +static void prop_get_entity_sense(Object *obj, Visitor *v, void *opaque,
> +                                  const char *name, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    uint32_t value = (uint32_t)drck->entity_sense(drc);
> +    visit_type_uint32(v, &value, name, errp);
> +}
> +
> +static void prop_get_fdt(Object *obj, Visitor *v, void *opaque,
> +                        const char *name, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +    sPAPRDRCCState ccs = { 0 };
> +    sPAPRDRCCResponse resp;
> +
> +    ccs.fdt = drc->ccs.fdt;
> +    ccs.fdt_offset = ccs.fdt_start_offset = drc->ccs.fdt_start_offset;
> +
> +    do {
> +        char *prop_name = NULL;
> +        const struct fdt_property *prop = NULL;
> +        int prop_len;
> +
> +        resp = configure_connector_common(&ccs, &prop_name, &prop, &prop_len);
> +
> +        switch (resp) {
> +        case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
> +            visit_start_struct(v, NULL, NULL, prop_name, 0, NULL);
> +            break;
> +        case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
> +            visit_end_struct(v, NULL);
> +            break;
> +        case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY: {
> +            int i;
> +            visit_start_list(v, prop_name, NULL);
> +            for (i = 0; i < prop_len; i++) {
> +                visit_type_uint8(v, (uint8_t *)&prop->data[i], NULL, NULL);
> +            }
> +            visit_end_list(v, NULL);
> +            break;
> +        }
> +        default:
> +            resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
> +            break;
> +        }
> +
> +        g_free(prop_name);
> +    } while (resp != SPAPR_DR_CC_RESPONSE_SUCCESS &&
> +             resp != SPAPR_DR_CC_RESPONSE_ERROR);
> +}
> +
> +static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
> +                   int fdt_start_offset, bool coldplug)
> +{
> +    DPRINTFN("attach");
> +
> +    g_assert(drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED);
> +    g_assert(drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE);
> +    g_assert(drc->indicator_state == SPAPR_DR_INDICATOR_STATE_INACTIVE);
> +    g_assert(fdt || coldplug);
> +
> +    /* NOTE: setting initial isolation state to UNISOLATED means we can't
> +     * detach unless guest has a userspace/kernel that moves this state
> +     * back to ISOLATED in response to an unplug event, or this is done
> +     * manually by the admin prior. if we force things while the guest
> +     * may be accessing the device, we can easily crash the guest, so we
> +     * we defer completion of removal in such cases to the reset() hook.
> +     */

Given that, would it make more sense to start in ISOLATED state?  Or
is the initial state specified by PAPR?

> +    drc->isolation_state = SPAPR_DR_ISOLATION_STATE_UNISOLATED;
> +    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
> +    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_ACTIVE;
> +
> +    drc->dev = d;
> +    drc->ccs.fdt = fdt;
> +    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = fdt_start_offset;
> +    drc->ccs.fdt_depth = 0;
> +
> +    object_property_add_link(OBJECT(drc), "device",
> +                             object_get_typename(OBJECT(drc->dev)),
> +                             (Object **)(&drc->dev),
> +                             NULL, 0, NULL);
> +}
> +
> +static void detach(sPAPRDRConnector *drc, DeviceState *d,
> +                   spapr_drc_detach_cb *detach_cb,
> +                   void *detach_cb_opaque)
> +{
> +    DPRINTFN("detach");
> +
> +    drc->detach_cb = detach_cb;
> +    drc->detach_cb_opaque = detach_cb_opaque;
> +
> +    if (drc->isolation_state != SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> +        DPRINTFN("awaiting transition to isolated state before removal");
> +        drc->awaiting_release = true;
> +        return;
> +    }
> +
> +    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_UNUSABLE;
> +    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
> +
> +    if (drc->detach_cb) {
> +        drc->detach_cb(drc->dev, drc->detach_cb_opaque);
> +    }
> +
> +    drc->awaiting_release = false;
> +    g_free(drc->ccs.fdt);
> +    drc->ccs.fdt = NULL;
> +    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = drc->ccs.fdt_depth = 0;
> +    object_property_del(OBJECT(drc), "device", NULL);
> +    drc->dev = NULL;
> +    drc->detach_cb = NULL;
> +    drc->detach_cb_opaque = NULL;

Shouldn't all this code after the detach_cb call also be called from
set_isolation_state in the case of a deferred detach?  In which case
you probably want a helper.
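
Something along these lines, as a rough model only. The struct, enum
and function names are simplified stand-ins rather than the patch's
types, the detach callback is reduced to a counter, and the fdt/link
cleanup is reduced to clearing a pointer. In the patch as quoted,
set_isolation_state() re-enters detach() once the DRC is ISOLATED; the
model keeps that and only makes the shared cleanup explicit.

```c
#include <stdbool.h>
#include <stddef.h>

enum { ISOLATED = 0, UNISOLATED = 1 };

typedef struct {
    int isolation_state;
    bool awaiting_release;
    void *dev;            /* attached device, NULL when empty */
    int released_count;   /* stands in for invoking detach_cb */
} DRCModel;

/* Shared cleanup: everything that must happen once removal is safe. */
static void drc_release(DRCModel *drc)
{
    drc->released_count++;
    drc->awaiting_release = false;
    drc->dev = NULL;
}

static void drc_detach(DRCModel *drc)
{
    if (drc->isolation_state != ISOLATED) {
        /* Guest still owns the device: defer until it isolates the DRC. */
        drc->awaiting_release = true;
        return;
    }
    drc_release(drc);
}

static void drc_set_isolation_state(DRCModel *drc, int state)
{
    drc->isolation_state = state;
    if (drc->awaiting_release && state == ISOLATED) {
        drc_detach(drc);   /* completes the deferred removal */
    }
}
```

Calling drc_detach() on an UNISOLATED connector only marks it as
awaiting release; the later transition to ISOLATED runs the cleanup
exactly once through the same helper.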

> +}
> +
> +static void reset(DeviceState *d)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    DPRINTFN("drc reset: %x", drck->get_index(drc));
> +    /* immediately upon reset we can safely assume DRCs whose devices are pending
> +     * removal can be safely removed, and that they will subsequently be left in
> +     * an ISOLATED state. move the DRC to this state in these cases (which will in
> +     * turn complete any pending device removals)
> +     */
> +    if (drc->awaiting_release) {
> +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_ISOLATED);
> +    }
> +}
> +
> +static void realize(DeviceState *d, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    Object *root_container;
> +    char link_name[256];
> +    gchar *child_name;
> +    Error *err = NULL;
> +
> +    DPRINTFN("drc realize: %x", drck->get_index(drc));
> +    /* NOTE: we do this as part of realize/unrealize due to the fact
> +     * that the guest will communicate with the DRC via RTAS calls
> +     * referencing the global DRC index. By unlinking the DRC
> +     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> +     * inaccessible by the guest, since lookups rely on this path
> +     * existing in the composition tree
> +     */
> +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +    snprintf(link_name, sizeof(link_name), "%x", drck->get_index(drc));
> +    child_name = object_get_canonical_path_component(OBJECT(drc));
> +    DPRINTFN("drc child name: %s", child_name);
> +    object_property_add_alias(root_container, link_name,
> +                              drc->owner, child_name, &err);
> +    /*
> +    object_property_add_link(root_container, name, TYPE_SPAPR_DR_CONNECTOR,
> +                             (Object **)&drc, NULL,
> +                             OBJ_PROP_LINK_UNREF_ON_RELEASE, &err);
> +                             */
> +    if (err) {
> +        error_report("%s", error_get_pretty(err));
> +        error_free(err);
> +        object_unref(OBJECT(drc));
> +    }
> +    DPRINTFN("drc realize complete");
> +}
> +
> +static void unrealize(DeviceState *d, Error **errp)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    Object *root_container;
> +    char name[256];
> +    Error *err = NULL;
> +
> +    DPRINTFN("drc unrealize: %x", drck->get_index(drc));
> +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +    snprintf(name, sizeof(name), "%x", drck->get_index(drc));
> +    object_property_del(root_container, name, &err);
> +    if (err) {
> +        error_report("%s", error_get_pretty(err));
> +        error_free(err);
> +        object_unref(OBJECT(drc));
> +    }
> +}
> +
> +sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> +                                         sPAPRDRConnectorType type,
> +                                         uint32_t id)
> +{
> +    sPAPRDRConnector *drc =
> +        SPAPR_DR_CONNECTOR(object_new(TYPE_SPAPR_DR_CONNECTOR));
> +
> +    g_assert(type);
> +
> +    drc->type = type;
> +    drc->id = id;
> +    drc->owner = owner;
> +    object_property_add_child(owner, "dr-connector[*]", OBJECT(drc), NULL);
> +    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
> +
> +    return drc;
> +}
> +
> +static void spapr_dr_connector_instance_init(Object *obj)
> +{
> +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> +
> +    object_property_add_uint32_ptr(obj, "isolation-state",
> +                                   &drc->isolation_state, NULL);
> +    object_property_add_uint32_ptr(obj, "indicator-state",
> +                                   &drc->indicator_state, NULL);
> +    object_property_add_uint32_ptr(obj, "allocation-state",
> +                                   &drc->allocation_state, NULL);

Don't these QOM properties need to be bound to set_isolation_state
etc. for the write side?  Or does add_uint32_ptr only allow reads?

> +    object_property_add(obj, "id", "uint32", prop_get_id,
> +                        NULL, NULL, NULL, NULL);
> +    object_property_add(obj, "index", "uint32", prop_get_index,
> +                        NULL, NULL, NULL, NULL);
> +    object_property_add(obj, "index", "uint32", prop_get_type,
> +                        NULL, NULL, NULL, NULL);
> +    object_property_add(obj, "entity-sense", "uint32", prop_get_entity_sense,
> +                        NULL, NULL, NULL, NULL);
> +    object_property_add(obj, "fdt", "struct", prop_get_fdt,
> +                        NULL, NULL, NULL, NULL);
> +}
> +
> +static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> +{
> +    DeviceClass *dk = DEVICE_CLASS(k);
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +
> +    dk->reset = reset;
> +    dk->realize = realize;
> +    dk->unrealize = unrealize;
> +    drck->set_isolation_state = set_isolation_state;
> +    drck->set_indicator_state = set_indicator_state;
> +    drck->set_allocation_state = set_allocation_state;
> +    drck->get_index = get_index;
> +    drck->get_type = get_type;
> +    drck->entity_sense = entity_sense;
> +    drck->configure_connector = configure_connector;
> +    drck->attach = attach;
> +    drck->detach = detach;
> +}
> +
> +static const TypeInfo spapr_dr_connector_info = {
> +    .name          = TYPE_SPAPR_DR_CONNECTOR,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(sPAPRDRConnector),
> +    .instance_init = spapr_dr_connector_instance_init,
> +    .class_size    = sizeof(sPAPRDRConnectorClass),
> +    .class_init    = spapr_dr_connector_class_init,
> +};
> +
> +static void spapr_drc_register_types(void)
> +{
> +    type_register_static(&spapr_dr_connector_info);
> +}
> +
> +type_init(spapr_drc_register_types)
> +
> +/* helper functions for external users */
> +
> +sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index)
> +{
> +    Object *obj;
> +    char name[256];
> +
> +    snprintf(name, sizeof(name), "%s/%x", DRC_CONTAINER_PATH, index);
> +    obj = object_resolve_path(name, NULL);
> +
> +    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> +}
> +
> +sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> +                                           uint32_t id)
> +{
> +    return spapr_dr_connector_by_index(
> +            (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
> +            (id & DRC_INDEX_ID_MASK));
> +}
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> new file mode 100644
> index 0000000..63ec687
> --- /dev/null
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -0,0 +1,201 @@
> +/*
> + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> + *
> + * Copyright IBM Corp. 2014
> + *
> + * Authors:
> + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#if !defined(__HW_SPAPR_DRC_H__)
> +#define __HW_SPAPR_DRC_H__
> +
> +#include "qom/object.h"
> +#include "hw/qdev.h"
> +#include "libfdt.h"
> +
> +#define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
> +#define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DR_CONNECTOR)
> +#define SPAPR_DR_CONNECTOR_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, \
> +                           TYPE_SPAPR_DR_CONNECTOR)
> +#define SPAPR_DR_CONNECTOR(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> +                                             TYPE_SPAPR_DR_CONNECTOR)
> +
> +/*
> + * Various hotplug types managed by sPAPRDRConnector
> + *
> + * these are somewhat arbitrary, but to make things easier
> + * when generating DRC indexes later we've aligned the bit
> + * positions with the values used to assign DRC indexes on
> + * pSeries. we use those values as bit shifts to allow for
> + * the OR'ing of these values in various QEMU routines, but
> + * for values exposed to the guest (via DRC indexes for
> + * instance) we will use the shift amounts.
> + */
> +typedef enum {
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU = 1,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB = 2,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
> +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> +} sPAPRDRConnectorTypeShift;
> +
> +typedef enum {
> +    SPAPR_DR_CONNECTOR_TYPE_ANY = ~0,
> +    SPAPR_DR_CONNECTOR_TYPE_CPU = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU,
> +    SPAPR_DR_CONNECTOR_TYPE_PHB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB,
> +    SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
> +    SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
> +    SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> +} sPAPRDRConnectorType;
> +
> +/*
> + * set via set-indicator RTAS calls
> + * as documented by PAPR+ 2.7 13.5.3.4, Table 177
> + *
> + * isolated: put device under firmware control 
> + * unisolated: claim OS control of device (may or may not be in use)
> + */
> +typedef enum {
> +    SPAPR_DR_ISOLATION_STATE_ISOLATED   = 0,
> +    SPAPR_DR_ISOLATION_STATE_UNISOLATED = 1
> +} sPAPRDRIsolationState;
> +
> +/*
> + * set via set-indicator RTAS calls
> + * as documented by PAPR+ 2.7 13.5.3.4, Table 177
> + *
> + * unusable: mark device as unavailable to OS
> + * usable: mark device as available to OS
> + * exchange: (currently unused)
> + * recover: (currently unused)
> + */
> +typedef enum {
> +    SPAPR_DR_ALLOCATION_STATE_UNUSABLE  = 0,
> +    SPAPR_DR_ALLOCATION_STATE_USABLE    = 1,
> +    SPAPR_DR_ALLOCATION_STATE_EXCHANGE  = 2,
> +    SPAPR_DR_ALLOCATION_STATE_RECOVER   = 3
> +} sPAPRDRAllocationState;
> +
> +/*
> + * LED/visual indicator state
> + *
> + * set via set-indicator RTAS calls
> + * as documented by PAPR+ 2.7 13.5.3.4, Table 177,
> + * and PAPR+ 2.7 13.5.4.1, Table 180
> + *
> + * inactive: hotpluggable entity inactive and safely removable
> + * active: hotpluggable entity in use and not safely removable
> + * identify: (currently unused)
> + * action: (currently unused)
> + */
> +typedef enum {
> +    SPAPR_DR_INDICATOR_STATE_INACTIVE   = 0,
> +    SPAPR_DR_INDICATOR_STATE_ACTIVE     = 1,
> +    SPAPR_DR_INDICATOR_STATE_IDENTIFY   = 2,
> +    SPAPR_DR_INDICATOR_STATE_ACTION     = 3,
> +} sPAPRDRIndicatorState;
> +
> +/*
> + * returned via get-sensor-state RTAS calls
> + * as documented by PAPR+ 2.7 13.5.3.3, Table 175:
> + *
> + * empty: connector slot empty (e.g. empty hotpluggable PCI slot)
> + * present: connector slot populated and device available to OS
> + * unusable: device not currently available to OS
> + * exchange: (currently unused)
> + * recover: (currently unused)
> + */
> +typedef enum {
> +    SPAPR_DR_ENTITY_SENSE_EMPTY     = 0,
> +    SPAPR_DR_ENTITY_SENSE_PRESENT   = 1,
> +    SPAPR_DR_ENTITY_SENSE_UNUSABLE  = 2,
> +    SPAPR_DR_ENTITY_SENSE_EXCHANGE  = 3,
> +    SPAPR_DR_ENTITY_SENSE_RECOVER   = 4,
> +} sPAPRDREntitySense;
> +
> +typedef enum {
> +    SPAPR_DR_CC_RESPONSE_NEXT_SIB       = 1, /* currently unused */
> +    SPAPR_DR_CC_RESPONSE_NEXT_CHILD     = 2,
> +    SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY  = 3,
> +    SPAPR_DR_CC_RESPONSE_PREV_PARENT    = 4,
> +    SPAPR_DR_CC_RESPONSE_SUCCESS        = 0,
> +    SPAPR_DR_CC_RESPONSE_ERROR          = -1,
> +    SPAPR_DR_CC_RESPONSE_CONTINUE       = -2,
> +} sPAPRDRCCResponse;
> +
> +typedef struct sPAPRDRCCState {
> +    void *fdt;
> +    int fdt_start_offset;
> +    int fdt_offset;
> +    int fdt_depth;
> +} sPAPRDRCCState;
> +
> +typedef void (spapr_drc_detach_cb)(DeviceState *d, void *opaque);
> +
> +typedef struct sPAPRDRConnector {
> +    /*< private >*/
> +    DeviceState parent;
> +
> +    sPAPRDRConnectorType type;
> +    uint32_t id;
> +    Object *owner;
> +
> +    /* sensor/indicator states */
> +    uint32_t isolation_state;
> +    uint32_t allocation_state;
> +    uint32_t indicator_state;
> +
> +    /* configure-connector state */
> +    sPAPRDRCCState ccs;
> +
> +    bool awaiting_release;
> +
> +    /* device pointer, via link property */
> +    DeviceState *dev;
> +    spapr_drc_detach_cb *detach_cb;
> +    void *detach_cb_opaque;
> +} sPAPRDRConnector;
> +
> +typedef struct sPAPRDRConnectorClass {
> +    /*< private >*/
> +    DeviceClass parent;
> +
> +    /*< public >*/
> +
> +    /* accessors for guest-visible (generally via RTAS) DR state */
> +    int (*set_isolation_state)(sPAPRDRConnector *drc,
> +                               sPAPRDRIsolationState state);
> +    int (*set_indicator_state)(sPAPRDRConnector *drc,
> +                               sPAPRDRIndicatorState state);
> +    int (*set_allocation_state)(sPAPRDRConnector *drc,
> +                                sPAPRDRAllocationState state);
> +    uint32_t (*get_index)(sPAPRDRConnector *drc);
> +    uint32_t (*get_type)(sPAPRDRConnector *drc);
> +
> +    sPAPRDREntitySense (*entity_sense)(sPAPRDRConnector *drc);
> +    sPAPRDRCCResponse (*configure_connector)(sPAPRDRConnector *drc,
> +                                             char **prop_name,
> +                                             const struct fdt_property **prop,
> +                                             int *prop_len);
> +
> +    /* QEMU interfaces for managing hotplug operations */
> +    void (*attach)(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
> +                   int fdt_start_offset, bool coldplug);
> +    void (*detach)(sPAPRDRConnector *drc, DeviceState *d,
> +                   spapr_drc_detach_cb *detach_cb,
> +                   void *detach_cb_opaque);
> +} sPAPRDRConnectorClass;
> +
> +sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> +                                         sPAPRDRConnectorType type,
> +                                         uint32_t token);
> +sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
> +sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> +                                           uint32_t id);
> +
> +#endif /* __HW_SPAPR_DRC_H__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
@ 2015-01-16  6:21   ` David Gibson
  2015-01-26  5:21     ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-16  6:21 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 2ec2a8e..a2fb533 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>      rtas_st(rets, 0, ret);
>  }
>  
> +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                                 uint32_t token, uint32_t nargs,
> +                                 target_ulong args, uint32_t nret,
> +                                 target_ulong rets)
> +{
> +    /* we currently only use a single, "live insert" powerdomain for
> +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
> +     */

Even so, you should at least validate the number of args and rets, and
preferably check that the user isn't attempting to set something for
some other, non-existent power domain.
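
Roughly what I have in mind, as a standalone sketch rather than a drop-in
replacement. The helper name, the expected counts (2 in, 2 out), the
"live insert" domain token and the status values are all assumptions made
for illustration; the real definitions live in the QEMU headers and PAPR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes for this sketch; the real ones are in QEMU. */
#define RTAS_OUT_SUCCESS      0
#define RTAS_OUT_PARAM_ERROR  (-3)

/* Assumed token for the single "live insert" power domain. */
#define LIVE_INSERT_POWERDOMAIN  UINT32_MAX

/*
 * Hypothetical helper: work out the status for set-power-level.
 * The counts are checked first, so args[] is only read once we know
 * the guest actually supplied that many cells.
 */
static int check_set_power_level(uint32_t nargs, const uint32_t *args,
                                 uint32_t nret)
{
    assert(args != NULL);

    /* assumed call shape: in = (domain, level), out = (status, level) */
    if (nargs != 2 || nret != 2) {
        return RTAS_OUT_PARAM_ERROR;
    }
    /* only the one power domain exists, reject anything else */
    if (args[0] != LIVE_INSERT_POWERDOMAIN) {
        return RTAS_OUT_PARAM_ERROR;
    }
    return RTAS_OUT_SUCCESS;
}
```

The handler would then store that status in rets[0], and only store the
level in rets[1] on success.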

> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, 100);
> +}
> +
> +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args, uint32_t nret,
> +                                  target_ulong rets)
> +{
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, 100);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
>                          rtas_ibm_set_system_parameter);
>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
>                          rtas_ibm_os_term);
> +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
> +                        rtas_set_power_level);
> +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> +                        rtas_get_power_level);
>  }
>  
>  type_init(core_rtas_register_types)

This code should probably go in spapr_drc.c.  The idea was that
spapr_rtas would hold just the RTAS dispatch code, plus RTAS functions
that had no other home.  Generally RTAS functions should live with the
devices they're connected to.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface Michael Roth
@ 2015-01-16  6:25   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-16  6:25 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:18AM -0600, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>
> 
> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index a2fb533..6aa325f 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -35,6 +35,18 @@
>  #include "qapi-event.h"
>  
>  #include <libfdt.h>
> +#include "hw/ppc/spapr_drc.h"
> +
> +/* #define DEBUG_SPAPR */
> +
> +#ifdef DEBUG_SPAPR
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
>  
>  static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                                     uint32_t token, uint32_t nargs,
> @@ -311,6 +323,72 @@ static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 1, 100);
>  }
>  
> +/*
> + * indicator/sensor types
> + * as defined by PAPR+ 2.7 7.3.5.4, Table 41
> + *
> + * NOTE: currently only DR-related sensors are implemented here
> + */
> +#define RTAS_SENSOR_TYPE_ISOLATION_STATE 9001
> +#define RTAS_SENSOR_TYPE_DR 9002
> +#define RTAS_SENSOR_TYPE_ALLOCATION_STATE 9003
> +#define RTAS_SENSOR_TYPE_ENTITY_SENSE RTAS_SENSOR_TYPE_ALLOCATION_STATE

These should probably go in a header.

> +static bool sensor_type_is_dr(uint32_t sensor_type)
> +{
> +    switch (sensor_type) {
> +    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
> +    case RTAS_SENSOR_TYPE_DR:
> +    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                               uint32_t token, uint32_t nargs,
> +                               target_ulong args, uint32_t nret,
> +                               target_ulong rets)
> +{
> +    uint32_t sensor_type = rtas_ld(args, 0);
> +    uint32_t sensor_index = rtas_ld(args, 1);
> +    uint32_t sensor_state = rtas_ld(args, 2);

You must validate nargs and nret before reading the RTAS parameters.

> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +
> +    if (sensor_type_is_dr(sensor_type)) {
> +        /* if this is a DR sensor we can assume sensor_index == drc_index */
> +        drc = spapr_dr_connector_by_index(sensor_index);
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

I'd reverse the sense of the if, and move these initializations after
the if.  That makes it more obvious that these can't be uninitialized
when you use them below.

> +    } else {
> +        /* currently only DR-related sensors are implemented */
> +        goto out_unimplemented;
> +    }
> +
> +    switch (sensor_type) {
> +    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
> +        drck->set_isolation_state(drc, sensor_state);
> +        break;
> +    case RTAS_SENSOR_TYPE_DR:
> +        drck->set_indicator_state(drc, sensor_state);
> +        break;
> +    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
> +        drck->set_allocation_state(drc, sensor_state);
> +        break;
> +    default:
> +        goto out_unimplemented;
> +    }
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    return;
> +
> +out_unimplemented:
> +    DPRINTF("rtas_set_indicator: sensor/indicator not implemented: %d\n",
> +            sensor_type);
> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -444,6 +522,8 @@ static void core_rtas_register_types(void)
>                          rtas_set_power_level);
>      spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>                          rtas_get_power_level);
> +    spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
> +                        rtas_set_indicator);
>  }
>  
>  type_init(core_rtas_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state RTAS interface
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state " Michael Roth
@ 2015-01-16  6:28   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-16  6:28 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:19AM -0600, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>

Even simple patches should have commit messages.

> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 6aa325f..13e6e55 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -389,6 +389,39 @@ out_unimplemented:
>      rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>  }
>  
> +static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args, uint32_t nret,
> +                                  target_ulong rets)
> +{
> +    uint32_t sensor_type = rtas_ld(args, 0);
> +    uint32_t sensor_index = rtas_ld(args, 1);

Need to validate nargs and nret first.

> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    uint32_t entity_sense;
> +
> +    if (sensor_type != RTAS_SENSOR_TYPE_ENTITY_SENSE) {
> +        /* currently only DR-related sensors are implemented */
> +        DPRINTF("rtas_get_sensor_state: sensor/indicator not implemented: %d\n",
> +                sensor_type);
> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);

I think your previous patch used RTAS_OUT_PARAM_ERROR instead of
RTAS_OUT_NOT_SUPPORTED in the case of an unsupported indicator type.  I
imagine these should be consistent.

> +        return;
> +    }
> +
> +    drc = spapr_dr_connector_by_index(sensor_index);
> +    if (!drc) {
> +        DPRINTF("rtas_get_sensor_state: invalid sensor/DRC index: %xh\n",
> +                sensor_index);
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    entity_sense = drck->entity_sense(drc);
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, entity_sense);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -524,6 +557,8 @@ static void core_rtas_register_types(void)
>                          rtas_get_power_level);
>      spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
>                          rtas_set_indicator);
> +    spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
> +                        rtas_get_sensor_state);
>  }
>  
>  type_init(core_rtas_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper Michael Roth
@ 2015-01-19  3:25   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  3:25 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:20AM -0600, Michael Roth wrote:
> This is similar to the existing rtas_st_buffer(), but for the case where
> the guest is not expecting a length-encoded byte array. Namely,
> for calls where a "work area" buffer is used to pass around
> arbitrary fields/data.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Although I'm not entirely convinced that such a simple wrapper is
worth it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm, configure-connector RTAS interface
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
@ 2015-01-19  3:44   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  3:44 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:21AM -0600, Michael Roth wrote:

This really wants a commit message explaining in outline what
configure_connector does.

> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 13e6e55..d847f45 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -422,6 +422,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 1, entity_sense);
>  }
>  
> +/* configure-connector work area offsets, int32_t units for field
> + * indexes, bytes for field offset/len values.
> + *
> + * as documented by PAPR+ v2.7, 13.5.3.5
> + */
> +#define CC_IDX_NODE_NAME_OFFSET 2
> +#define CC_IDX_PROP_NAME_OFFSET 2
> +#define CC_IDX_PROP_LEN 3
> +#define CC_IDX_PROP_DATA_OFFSET 4
> +#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
> +#define CC_WA_LEN 4096

Any reason not to use a struct for this?
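
To illustrate, a sketch of the header layout implied by the CC_IDX_*
values above. The struct and field names are invented here, word 1 is
left opaque because the patch never touches it, and the fields would
still need big-endian conversion when read from or written to guest
memory, so treat this as a description of the layout only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CC_WA_LEN 4096

/* Hypothetical view of the first five 32-bit words of the work area. */
typedef struct CCWorkAreaHeader {
    uint32_t drc_index;        /* word 0: what rtas_ld(wa_addr, 0) reads */
    uint32_t word1;            /* word 1: not referenced by the patch */
    uint32_t name_offset;      /* word 2: node or property name offset */
    uint32_t prop_len;         /* word 3: property length in bytes */
    uint32_t prop_data_offset; /* word 4: property data offset */
} CCWorkAreaHeader;

/* Tie the struct back to the index macros it would replace. */
static_assert(offsetof(CCWorkAreaHeader, name_offset) == 2 * 4,
              "matches CC_IDX_NODE_NAME_OFFSET / CC_IDX_PROP_NAME_OFFSET");
static_assert(offsetof(CCWorkAreaHeader, prop_len) == 3 * 4,
              "matches CC_IDX_PROP_LEN");
static_assert(offsetof(CCWorkAreaHeader, prop_data_offset) == 4 * 4,
              "matches CC_IDX_PROP_DATA_OFFSET");
static_assert(sizeof(CCWorkAreaHeader) == (4 + 1) * 4,
              "matches CC_VAL_DATA_OFFSET");

/* Bytes left for names/data when writing at a given work area offset. */
static size_t cc_wa_room(size_t wa_offset)
{
    return wa_offset < CC_WA_LEN ? CC_WA_LEN - wa_offset : 0;
}
```

The name and property data then start at sizeof(CCWorkAreaHeader), which
is the same 20 bytes that CC_VAL_DATA_OFFSET works out to.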

> +
> +static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
> +                                         sPAPREnvironment *spapr,
> +                                         uint32_t token, uint32_t nargs,
> +                                         target_ulong args, uint32_t nret,
> +                                         target_ulong rets)
> +{

As always, you need to validate nargs and nret first.

> +    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
> +    uint64_t wa_offset;
> +    uint32_t drc_index;
> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    sPAPRDRCCResponse resp;
> +    const struct fdt_property *prop = NULL;
> +    char *prop_name = NULL;
> +    int prop_len, rc;
> +
> +    drc_index = rtas_ld(wa_addr, 0);

I'm trying to work out if this is an abuse of the rtas_ld interface.
As written it wasn't intended for accessing anything other than the
immediate rtas args and return values.

> +    drc = spapr_dr_connector_by_index(drc_index);
> +    if (!drc) {
> +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> +                drc_index);
> +        rc = RTAS_OUT_PARAM_ERROR;
> +        goto out;
> +    }
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);

So, the RTAS interface is stuck with the rather clunky iteration
through the device tree.  But I don't see a good reason to replicate
that clunky interface for communication between the DR core and the
individual connector drivers.

Wouldn't it make more sense for the class callback to just return the
fdt fragment, and then do all the iteration in the DR core?
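
As a very rough illustration of the split I mean, using a toy token
stream in place of a real fdt so it stands alone. Everything named Toy*
or toy_* is invented for this sketch; only the response values are taken
from sPAPRDRCCResponse, and finishing with SUCCESS when the top-level
node closes is my assumption about the intended behaviour:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the fdt structure block tokens. */
typedef enum { TOK_BEGIN_NODE, TOK_PROP, TOK_END_NODE, TOK_END } ToyTokKind;

typedef struct ToyTok {
    ToyTokKind kind;
    const char *name;   /* node or property name, NULL for END tokens */
} ToyTok;

/* Same values as the sPAPRDRCCResponse members of the same name. */
typedef enum {
    TOY_CC_SUCCESS       = 0,
    TOY_CC_NEXT_CHILD    = 2,
    TOY_CC_NEXT_PROPERTY = 3,
    TOY_CC_PREV_PARENT   = 4,
    TOY_CC_ERROR         = -1,
} ToyCCResponse;

/* Iteration state kept by the core; the connector only supplies frag. */
typedef struct ToyCCState {
    const ToyTok *frag;
    size_t pos;
    int depth;
} ToyCCState;

/* One configure-connector step: consume a token, report what it was. */
static ToyCCResponse toy_cc_step(ToyCCState *s, const char **name)
{
    const ToyTok *t;

    assert(s != NULL && s->frag != NULL && name != NULL);
    t = &s->frag[s->pos];
    *name = NULL;

    switch (t->kind) {
    case TOK_BEGIN_NODE:
        s->pos++;
        s->depth++;
        *name = t->name;
        return TOY_CC_NEXT_CHILD;
    case TOK_PROP:
        if (s->depth == 0) {
            return TOY_CC_ERROR;    /* property outside any node */
        }
        s->pos++;
        *name = t->name;
        return TOY_CC_NEXT_PROPERTY;
    case TOK_END_NODE:
        if (s->depth == 0) {
            return TOY_CC_ERROR;    /* unbalanced fragment */
        }
        s->pos++;
        s->depth--;
        /* closing the top-level node finishes the whole walk */
        return s->depth == 0 ? TOY_CC_SUCCESS : TOY_CC_PREV_PARENT;
    case TOK_END:
    default:
        return s->depth == 0 ? TOY_CC_SUCCESS : TOY_CC_ERROR;
    }
}
```

With something like this in the core, a connector class would only need
a "give me your fragment" hook, and the RTAS handler would translate each
step into the work area writes.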

> +    switch (resp) {
> +    case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
> +        /* provide the name of the next OF node */
> +        wa_offset = CC_VAL_DATA_OFFSET;
> +        rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> +        break;
> +    case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY:
> +        /* provide the name of the next OF property */
> +        wa_offset = CC_VAL_DATA_OFFSET;
> +        rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> +
> +        /* provide the length and value of the OF property. data gets placed
> +         * immediately after NULL terminator of the OF property's name string
> +         */
> +        wa_offset += strlen(prop_name) + 1;
> +        rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
> +        rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)((struct fdt_property *)prop)->data,
> +                              prop_len);
> +        break;
> +    case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
> +    case SPAPR_DR_CC_RESPONSE_ERROR:
> +    case SPAPR_DR_CC_RESPONSE_SUCCESS:
> +        break;
> +    default:
> +        /* drck->configure_connector() should not return anything else */
> +        g_assert(false);
> +    }
> +
> +    rc = resp;
> +out:
> +    g_free(prop_name);
> +    rtas_st(rets, 0, rc);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -559,6 +638,8 @@ static void core_rtas_register_types(void)
>                          rtas_set_indicator);
>      spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
>                          rtas_get_sensor_state);
> +    spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
> +                        rtas_ibm_configure_connector);
>  }
>  
>  type_init(core_rtas_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2015-01-19  4:31   ` David Gibson
  2015-01-26 16:56     ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-19  4:31 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:22AM -0600, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> This extends the data structures currently used to report EPOW events to
> guests via the check-exception RTAS interface to also include event types
> for hotplug/unplug events.
> 
> This is currently undocumented and being finalized for inclusion in PAPR
> specification, but we implement this here as an extension for guest
> userspace tools to implement (existing guest kernels simply log these
> events via a sysfs interface that's read by rtas_errd).
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   2 +-
>  hw/ppc/spapr_events.c  | 211 ++++++++++++++++++++++++++++++++++++++++---------
>  include/hw/ppc/spapr.h |   5 +-
>  3 files changed, 177 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 361b914..1bc5773 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1601,7 +1601,7 @@ static void ppc_spapr_init(MachineState *machine)
>      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
>                                              kernel_size, kernel_le,
>                                              boot_device, kernel_cmdline,
> -                                            spapr->epow_irq);
> +                                            spapr->check_exception_irq);
>      assert(spapr->fdt_skel != NULL);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 1b6157d..ebbf3a4 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -32,6 +32,9 @@
>  
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_vio.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci-host/spapr.h"
> +#include "hw/ppc/spapr_drc.h"
>  
>  #include <libfdt.h>
>  
> @@ -77,6 +80,7 @@ struct rtas_error_log {
>  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
>  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
>  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
>      uint32_t extended_length;
>  } QEMU_PACKED;
>  
> @@ -166,6 +170,38 @@ struct epow_log_full {
>      struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_hp {
> +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint8_t hotplug_type;
> +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +    uint8_t hotplug_action;
> +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> +    uint8_t hotplug_identifier;
> +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> +    uint8_t reserved;
> +    union {
> +        uint32_t index;
> +        uint32_t count;
> +        char name[1];
> +    } drc;
> +} QEMU_PACKED;
> +
> +struct hp_log_full {
> +    struct rtas_error_log hdr;
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_maina maina;
> +    struct rtas_event_log_v6_mainb mainb;
> +    struct rtas_event_log_v6_hp hp;
> +} QEMU_PACKED;
> +
>  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
>  #define EVENT_MASK_EPOW                      0x40000000
>  #define EVENT_MASK_HOTPLUG                   0x10000000
> @@ -181,29 +217,61 @@ struct epow_log_full {
>          }                                                          \
>      } while (0)
>  
> -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
>  {
> -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
>  
>      _FDT((fdt_begin_node(fdt, "event-sources")));
>  
>      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
>      _FDT((fdt_property(fdt, "interrupt-ranges",
> -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> +                       irq_ranges, sizeof(irq_ranges))));
>  
>      _FDT((fdt_begin_node(fdt, "epow-events")));
> -    _FDT((fdt_property(fdt, "interrupts",
> -                       epow_interrupts, sizeof(epow_interrupts))));
> +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
>      _FDT((fdt_end_node(fdt)));
>  
>      _FDT((fdt_end_node(fdt)));
>  }
>  
>  static struct epow_log_full *pending_epow;
> +static struct hp_log_full *pending_hp;
>  static uint32_t next_plid;
>  
> +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> +{
> +    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> +        | RTAS_LOG_V6_B0_BIGENDIAN;
> +    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> +        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> +    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> +}
> +
> +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> +                             int section_count)
> +{
> +    struct tm tm;
> +    int year;
> +
> +    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> +    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> +    /* FIXME: section version, subtype and creator id? */
> +    qemu_get_timedate(&tm, spapr->rtc_offset);
> +    year = tm.tm_year + 1900;
> +    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> +                                       | (to_bcd(year % 100) << 16)
> +                                       | (to_bcd(tm.tm_mon + 1) << 8)
> +                                       | to_bcd(tm.tm_mday));
> +    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> +                                       | (to_bcd(tm.tm_min) << 16)
> +                                       | (to_bcd(tm.tm_sec) << 8));
> +    maina->creator_id = 'H'; /* Hypervisor */
> +    maina->section_count = section_count;
> +    maina->plid = next_plid++;
> +}
> +
>  static void spapr_powerdown_req(Notifier *n, void *opaque)
>  {
>      sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> @@ -212,8 +280,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      struct rtas_event_log_v6_maina *maina;
>      struct rtas_event_log_v6_mainb *mainb;
>      struct rtas_event_log_v6_epow *epow;
> -    struct tm tm;
> -    int year;
>  
>      if (pending_epow) {
>          /* For now, we just throw away earlier events if two come
> @@ -237,27 +303,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
>                                         - sizeof(pending_epow->hdr));
>  
> -    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> -        | RTAS_LOG_V6_B0_BIGENDIAN;
> -    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> -        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> -    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> -
> -    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> -    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> -    /* FIXME: section version, subtype and creator id? */
> -    qemu_get_timedate(&tm, spapr->rtc_offset);
> -    year = tm.tm_year + 1900;
> -    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> -                                       | (to_bcd(year % 100) << 16)
> -                                       | (to_bcd(tm.tm_mon + 1) << 8)
> -                                       | to_bcd(tm.tm_mday));
> -    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> -                                       | (to_bcd(tm.tm_min) << 16)
> -                                       | (to_bcd(tm.tm_sec) << 8));
> -    maina->creator_id = 'H'; /* Hypervisor */
> -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> -    maina->plid = next_plid++;
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
>  
>      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
>      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> @@ -274,7 +321,82 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
>      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> +}
> +
> +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> +{
> +    struct hp_log_full *new_hp;
> +    struct rtas_error_log *hdr;
> +    struct rtas_event_log_v6 *v6hdr;
> +    struct rtas_event_log_v6_maina *maina;
> +    struct rtas_event_log_v6_mainb *mainb;
> +    struct rtas_event_log_v6_hp *hp;
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> +
> +    new_hp = g_malloc0(sizeof(struct hp_log_full));
> +    hdr = &new_hp->hdr;
> +    v6hdr = &new_hp->v6hdr;
> +    maina = &new_hp->maina;
> +    mainb = &new_hp->mainb;
> +    hp = &new_hp->hp;
> +
> +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> +                               | RTAS_LOG_SEVERITY_EVENT
> +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> +                               | RTAS_LOG_INITIATOR_HOTPLUG
> +                               | RTAS_LOG_TYPE_HOTPLUG);
> +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
> +                                       - sizeof(new_hp->hdr));
> +
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> +
> +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> +    mainb->subsystem_id = 0x80; /* External environment */
> +    mainb->event_severity = 0x00; /* Informational / non-error */
> +    mainb->event_subtype = 0x00; /* Normal shutdown */
> +
> +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> +    hp->hdr.section_version = 1; /* includes extended modifier */
> +    hp->hotplug_action = hp_action;
> +
> +
> +    switch (drc_type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> +        break;
> +    default:
> +        /* skip notification for unknown connector types */
> +        g_free(new_hp);
> +        return;
> +    }
> +
> +    if (pending_hp) {
> +        /* Just toss any pending hotplug events for now, this will
> +         * need to be fixed later on.
> +         */

So, we can get away with a 1-element queue for EPOW, because they're
just triggering a shutdown - so once the first one's processed, any
others aren't going to matter.  For hotplug you really do need a
proper queue.
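
As a rough illustration of the "proper queue" idea, here is a minimal
FIFO sketch. The HpEvent and HpQueue names, their fields, and the
push/pop helpers are hypothetical stand-ins, not anything from the
patch; inside QEMU a QTAILQ would likely be the more natural choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one pending hotplug event log. */
typedef struct HpEvent {
    uint32_t drc_index;
    uint8_t action;
    struct HpEvent *next;
} HpEvent;

/* FIFO of pending events: push at the tail, pop from the head. */
typedef struct HpQueue {
    HpEvent *head;
    HpEvent *tail;
} HpQueue;

/* Append a new event instead of overwriting the previous one. */
static void hp_queue_push(HpQueue *q, uint32_t drc_index, uint8_t action)
{
    HpEvent *ev = calloc(1, sizeof(*ev));

    assert(ev != NULL);
    ev->drc_index = drc_index;
    ev->action = action;
    if (q->tail) {
        q->tail->next = ev;
    } else {
        q->head = ev;
    }
    q->tail = ev;
}

/* Remove and return the oldest event, or NULL if none is pending.
 * The caller owns the returned event and must free() it.
 */
static HpEvent *hp_queue_pop(HpQueue *q)
{
    HpEvent *ev = q->head;

    if (ev) {
        q->head = ev->next;
        if (!q->head) {
            q->tail = NULL;
        }
        ev->next = NULL;
    }
    return ev;
}
```

With something like this, check-exception could pop one event per call
and leave the remaining ones pending for the next call.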

> +        g_free(pending_hp);
> +    }
> +    pending_hp = new_hp;
> +
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> +}
> +
> +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
> +{
> +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
> +}
> +
> +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
> +{
> +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> @@ -298,15 +420,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
>      }
>  
> -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> -        if (sizeof(*pending_epow) < len) {
> -            len = sizeof(*pending_epow);
> -        }
> +    if (mask & EVENT_MASK_EPOW) {
> +        if (pending_epow) {
> +            if (sizeof(*pending_epow) < len) {
> +                len = sizeof(*pending_epow);
> +            }
>  
> -        cpu_physical_memory_write(buf, pending_epow, len);
> -        g_free(pending_epow);
> -        pending_epow = NULL;
> -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +            cpu_physical_memory_write(buf, pending_epow, len);
> +            g_free(pending_epow);
> +            pending_epow = NULL;
> +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        } else if (pending_hp) {

So.. the hotplug messages are a different type from EPOW, but are
still selected by EVENT_MASK_EPOW?  Seems a bit odd.

> +            if (sizeof(*pending_hp) < len) {
> +                len = sizeof(*pending_hp);
> +            }
> +
> +            cpu_physical_memory_write(buf, pending_hp, len);
> +            g_free(pending_hp);
> +            pending_hp = NULL;
> +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        }
>      } else {
>          rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
>      }
> @@ -314,7 +447,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>  
>  void spapr_events_init(sPAPREnvironment *spapr)
>  {
> -    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
> +    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b4daa42..4d50e74 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -3,6 +3,7 @@
>  
>  #include "sysemu/dma.h"
>  #include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_drc.h"
>  
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> @@ -30,7 +31,7 @@ typedef struct sPAPREnvironment {
>      struct PPCTimebase tb;
>      bool has_graphics;
>  
> -    uint32_t epow_irq;
> +    uint32_t check_exception_irq;
>      Notifier epow_notifier;
>  
>      /* Migration state */
> @@ -486,5 +487,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                   uint32_t liobn, uint64_t window, uint32_t size);
>  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                        sPAPRTCETable *tcet);
> +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
> +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
>  
>  #endif /* !defined (__HW_SPAPR_H__) */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface Michael Roth
@ 2015-01-19  4:34   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  4:34 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:23AM -0600, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> We don't actually rely on this interface to surface hotplug events, and
> instead rely on the similar-but-interrupt-driven check-exception RTAS
> interface used for EPOW events. However, the existence of this interface
> is needed to ensure guest kernels initialize the event-reporting
> interfaces which will in turn be used by userspace tools to handle these
> events, so we implement this interface as a stub.

I dislike the idea of implementing a stub only, since if we do fully
implement it someday, the guest won't have an easy way of determining
if it has a real implementation or the stub.

> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         | 1 +
>  hw/ppc/spapr_events.c  | 9 +++++++++
>  include/hw/ppc/spapr.h | 2 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1bc5773..a611616 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -541,6 +541,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>          refpoints, sizeof(refpoints))));
>  
>      _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
> +    _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", RTAS_EVENT_SCAN_RATE)));

It'd be nice if a comment or the commit message described the units of
this property.

>      /*
>       * According to PAPR, rtas ibm,os-term does not guarantee a return
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index ebbf3a4..434a75d 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -445,6 +445,14 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      }
>  }
>  
> +static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                            uint32_t token, uint32_t nargs,
> +                            target_ulong args,
> +                            uint32_t nret, target_ulong rets)
> +{

You should at least validate nargs and nret.
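
A standalone sketch of such a check follows. The expected call shape
(4 arguments, 1 return value) and the status values are assumptions
defined locally for illustration; the real handler would store the
status with rtas_st() instead of writing through a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for RTAS status codes; the values are assumptions. */
#define SKETCH_OUT_NO_ERRORS_FOUND  1
#define SKETCH_OUT_PARAM_ERROR      (-3)

/* Assumed call shape for event-scan: 4 arguments, 1 return value. */
#define SKETCH_EVENT_SCAN_NARGS     4
#define SKETCH_EVENT_SCAN_NRET      1

/* Store the status a stub would report for the given call shape. */
static void sketch_event_scan(uint32_t nargs, uint32_t nret, int32_t *status)
{
    assert(status != NULL);

    if (nargs != SKETCH_EVENT_SCAN_NARGS || nret != SKETCH_EVENT_SCAN_NRET) {
        /* malformed call: reject it before touching any arguments */
        *status = SKETCH_OUT_PARAM_ERROR;
        return;
    }
    *status = SKETCH_OUT_NO_ERRORS_FOUND;
}
```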

> +    rtas_st(rets, 0, 1); /* no error events found */
> +}
> +
>  void spapr_events_init(sPAPREnvironment *spapr)
>  {
>      spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
> @@ -452,4 +460,5 @@ void spapr_events_init(sPAPREnvironment *spapr)
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
>                          check_exception);
> +    spapr_rtas_register(RTAS_EVENT_SCAN, "event-scan", event_scan);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4d50e74..973193d 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -453,6 +453,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> +#define RTAS_EVENT_SCAN_RATE    1
> +
>  typedef struct sPAPRTCETable sPAPRTCETable;
>  
>  #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"

* Re: [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt()
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt() Michael Roth
@ 2015-01-19  5:15   ` David Gibson
  2015-01-26 20:35     ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:15 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:24AM -0600, Michael Roth wrote:
> This function handles generation of ibm,drc-* array device tree
> properties to describe DRC topology to guests. This will be used
> by the guest to direct RTAS calls to manage any dynamic resources
> we associate with a particular DR Connector as part of
> hotplug/unplug.
> 
> Since general management of boot-time device trees is handled
> outside of sPAPRDRConnector, we insert these values blindly given
> an FDT and offset. A mask of sPAPRDRConnector types is given to
> instruct us on what types of connectors entries should be generated
> for, since descriptions for different connectors may live in
> different parts of the device tree.
> 
> Based on code originally written by Nathan Fontenot.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_drc.c         | 225 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |   3 +-
>  2 files changed, 227 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index f81c6d1..b162184 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -501,3 +501,228 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>              (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
>              (id & DRC_INDEX_ID_MASK));
>  }
> +
> +/* internal helper to gather up DRC info specific to populating DRC
> + * topology information in the device tree.
> + */
> +typedef struct DRConnectorDTInfo {
> +    char drc_type[64];
> +    char drc_name[64];
> +    uint32_t drc_index;
> +    uint32_t drc_power_domain;
> +} DRConnectorDTInfo;
> +
> +/* generate a string that describes the DRC to encode into the
> + * device tree.
> + *
> + * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
> + */
> +static void spapr_drc_get_type_str(char *buf, sPAPRDRConnectorType type)
> +{
> +    switch (type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +        sprintf(buf, "CPU");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> +        sprintf(buf, "PHB");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> +        sprintf(buf, "SLOT");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        sprintf(buf, "28");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        sprintf(buf, "MEM");
> +        break;
> +    default:
> +        g_assert(false);
> +    }

So this case is simple enough that you can probably get away with it,
but still - interfaces that involve writing to a buffer without any
length checks make me very nervous.
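
For illustration, a length-checked variant could look like the sketch
below. The enum and function names are hypothetical stand-ins for
sPAPRDRConnectorType and spapr_drc_get_type_str(); only the strings are
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for sPAPRDRConnectorType. */
typedef enum {
    SKETCH_DRC_CPU,
    SKETCH_DRC_PHB,
    SKETCH_DRC_VIO,
    SKETCH_DRC_PCI,
    SKETCH_DRC_LMB,
} SketchDrcType;

/* Write the type string into buf, never writing more than len bytes.
 * Returns 0 on success, -1 on an unknown type or if buf is too small.
 */
static int sketch_drc_type_str(char *buf, size_t len, SketchDrcType type)
{
    const char *str;
    int n;

    assert(buf != NULL);

    switch (type) {
    case SKETCH_DRC_CPU:
        str = "CPU";
        break;
    case SKETCH_DRC_PHB:
        str = "PHB";
        break;
    case SKETCH_DRC_VIO:
        str = "SLOT";
        break;
    case SKETCH_DRC_PCI:
        str = "28";
        break;
    case SKETCH_DRC_LMB:
        str = "MEM";
        break;
    default:
        return -1;
    }

    /* snprintf() reports the length it wanted, so truncation is visible */
    n = snprintf(buf, len, "%s", str);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The caller would pass sizeof(drc_info.drc_type) as the length, so a
future, longer type string fails loudly instead of overrunning the
buffer.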

> +}
> +
> +/* generate a human-readable name for a DRC to encode into the DT
> + * description. this is mainly only used within a guest in place
> + * of the unique DRC index.
> + *
> + * in the case of VIO/PCI devices, it corresponds to a
> + * "location code" that maps a logical device/function (DRC index)
> + * to a physical (or virtual in the case of VIO) location in the
> + * system by chaining together the "location label" for each
> + * encapsulating component.
> + *
> + * since this is more to do with diagnosing physical hardware
> + * issues than guest compatibility, we choose location codes/DRC
> + * names that adhere to the documented format, but avoid encoding
> + * the entire topology information into the label/code, instead
> + * just using the location codes based on the labels for the
> + * endpoints (VIO/PCI adaptor connectors), which is basically
> + * just "C" followed by an integer ID.

Hrm.. would it make sense to include here the qemu "id" value on the
DRC device?  That will make names which are matchable to specific
elements on the qemu command line, which is about as close an equivalent
to a physical location as I can think of.

> + * DRC names as documented by PAPR+ v2.7, 13.5.2.4
> + * location codes as documented by PAPR+ v2.7, 12.3.1.5
> + */
> +static void spapr_drc_get_name_str(char *buf,
> +                                   sPAPRDRConnectorType type,
> +                                   uint32_t drc_index)
> +{
> +    uint32_t id = drc_index & DRC_INDEX_ID_MASK;
> +
> +    switch (type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +        sprintf(buf, "CPU %d", id);
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> +        sprintf(buf, "PHB %d", id);
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        sprintf(buf, "C%d", id);
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        sprintf(buf, "LMB %d", id);
> +        break;
> +    default:
> +        g_assert(false);
> +    }
> +}
> +
> +static DRConnectorDTInfo *spapr_dr_connector_get_info(uint32_t drc_type_mask,
> +                                                      unsigned int *count)
> +{
> +    Object *root_container;
> +    ObjectProperty *prop;
> +    GArray *drc_info_list = g_array_new(false, true,
> +                                        sizeof(DRConnectorDTInfo));
> +
> +    /* aliases for all DRConnector objects will be rooted in QOM
> +     * composition tree at /dr-connector
> +     */
> +    root_container = container_get(object_get_root(), "/dr-connector");
> +
> +    QTAILQ_FOREACH(prop, &root_container->properties, node) {
> +        Object *obj;
> +        sPAPRDRConnector *drc;
> +        sPAPRDRConnectorClass *drck;
> +        DRConnectorDTInfo drc_info;
> +
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +
> +        obj = object_property_get_link(root_container, prop->name, NULL);
> +        drc = SPAPR_DR_CONNECTOR(obj);
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +        if ((drc->type & drc_type_mask) == 0) {
> +            continue;
> +        }
> +
> +        drc_info.drc_index = drck->get_index(drc);
> +        drc_info.drc_power_domain = -1;
> +        spapr_drc_get_type_str(drc_info.drc_type, drc->type);
> +        spapr_drc_get_name_str(drc_info.drc_name, drc->type,
> +                               drck->get_index(drc));
> +        g_array_append_val(drc_info_list, drc_info);
> +    }
> +
> +    if (count) {
> +        *count = drc_info_list->len;
> +    }
> +
> +    /* if count is zero, free everything, including internal storage
> +     * for array
> +     */
> +    return (DRConnectorDTInfo *)g_array_free(drc_info_list, count == 0);
> +}
> +
> +/**
> + * spapr_drc_populate_dt
> + *
> + * @fdt: libfdt device tree
> + * @fdt_offset: offset of the node in the DT to generate properties under
> + * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
> + *   to the types of DRCs to generate entries for
> + *
> + * generate OF properties to describe DRC topology/indices to guests
> + *
> + * as documented in PAPR+ v2.1, 13.5.2
> + */
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask)
> +{
> +    DRConnectorDTInfo *drc_info_list;
> +    unsigned int i, count;
> +    char *char_buf;
> +    uint32_t *char_buf_count;
> +    uint32_t *int_buf;
> +    int char_buf_offset, ret;
> +
> +    drc_info_list =
> +        spapr_dr_connector_get_info(drc_type_mask, &count);

This is the only call to spapr_dr_connector_get_info().  I don't see a
lot of point in splitting it out from this function, since it involves
a not particularly easy to work with array encoding of the information.
Why not go direct from the drc objects to the fdt?
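
To make that concrete, here is a sketch of packing a string-list
property in a single pass, with no intermediate info array. It assumes
the layout the quoted code builds for ibm,drc-names and ibm,drc-types:
a big-endian 32-bit count followed by NUL-terminated strings. The
function name and the plain array of strings are hypothetical; the real
code would walk the DRC objects instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack 'count' strings into buf as: be32 count, then each string with
 * its terminating NUL. Returns the number of bytes used, or 0 if buf
 * is too small.
 */
static size_t sketch_pack_string_list(uint8_t *buf, size_t len,
                                      const char *const *strs,
                                      uint32_t count)
{
    size_t off = sizeof(uint32_t);
    uint32_t i;

    assert(buf != NULL && strs != NULL);

    if (len < off) {
        return 0;
    }

    /* store the count big-endian, independent of host byte order */
    buf[0] = (uint8_t)(count >> 24);
    buf[1] = (uint8_t)(count >> 16);
    buf[2] = (uint8_t)(count >> 8);
    buf[3] = (uint8_t)count;

    for (i = 0; i < count; i++) {
        size_t n = strlen(strs[i]) + 1; /* include the NUL */

        if (n > len - off) {
            return 0;
        }
        memcpy(buf + off, strs[i], n);
        off += n;
    }
    return off;
}
```

The returned length is what would be handed to fdt_setprop() along with
the buffer.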

> +    if (!count) {
> +        return 0;
> +    }
> +
> +    int_buf = g_new0(uint32_t, count + 1);
> +    int_buf[0] = cpu_to_be32(count);
> +    char_buf = g_new0(char, count * 128 + sizeof(uint32_t));
> +    char_buf_count = (uint32_t *)&char_buf[0];
> +    *char_buf_count = cpu_to_be32(count);
> +
> +    /* ibm,drc-indexes */
> +    for (i = 0; i < count; i++) {
> +        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_index);
> +    }
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
> +                      (count + 1) * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
> +        goto out;
> +    }
> +
> +    /* ibm,drc-power-domains */
> +    for (i = 0; i < count; i++) {
> +        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_power_domain);
> +    }
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
> +                      (count + 1) * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> +        goto out;
> +    }
> +
> +    /* ibm,drc-names */
> +    char_buf_offset = sizeof(uint32_t);
> +
> +    for (i = 0; i < count; i++) {
> +        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_name);
> +        char_buf_offset += strlen(drc_info_list[i].drc_name) + 1;
> +    }
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf,
> +                      char_buf_offset);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> +        goto out;
> +    }
> +
> +    /* ibm,drc-types */
> +    char_buf_offset = sizeof(uint32_t);
> +
> +    for (i = 0; i < count; i++) {
> +        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_type);
> +        char_buf_offset += strlen(drc_info_list[i].drc_type) + 1;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf,
> +                      char_buf_offset);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> +        goto out;
> +    }
> +
> +out:
> +    g_free(int_buf);
> +    g_free(char_buf);
> +    g_free(drc_info_list);
> +    return ret;
> +}
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 63ec687..5c70140 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -193,9 +193,10 @@ typedef struct sPAPRDRConnectorClass {
>  
>  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
>                                           sPAPRDRConnectorType type,
> -                                         uint32_t token);
> +                                         uint32_t id);
>  sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
>  sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>                                             uint32_t id);
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask);
>  
>  #endif /* __HW_SPAPR_DRC_H__ */

* Re: [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type Michael Roth
@ 2015-01-19  5:16   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:16 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:25AM -0600, Michael Roth wrote:
> And make it the default. This is identical to pseries-2.2 for now,
> but subsequent commits will use it to enable pseries-2.3+ features.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

I think this should go in immediately, regardless of what revision is
needed for the rest of the series.

* Re: [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
@ 2015-01-19  5:18   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:18 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:26AM -0600, Michael Roth wrote:
> This option enables/disables PCI hotplug for a particular PHB.
> 
> Also add machine compatibility code to disable it by default for machine
> types prior to pseries-2.3.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
@ 2015-01-19  5:20   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:20 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:27AM -0600, Michael Roth wrote:
> These will be used to support hotplug/unplug of PCI devices to the PCI
> bus associated with a particular PHB.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

* Re: [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs Michael Roth
@ 2015-01-19  5:22   ` David Gibson
  2015-01-26 20:44     ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:22 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:28AM -0600, Michael Roth wrote:
> Reserve 32 entries of type PCI in each PHB's initial FDT. This
> advertises to guests that each PHB is a DR-capable device with
> physical hotpluggable slots. This is necessary for allowing
> hotplugging of devices to it later via bus rescan or guest rpaphp
> hotplug module.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 73e86a4..a5d7791 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -47,6 +47,8 @@
>  #define RTAS_TYPE_MSI           1
>  #define RTAS_TYPE_MSIX          2
>  
> +#define FDT_MAX_SIZE            0x10000

This define doesn't appear to be used in the new code.

>  #include "hw/ppc/spapr_drc.h"
>  
>  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> @@ -872,7 +874,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>                            uint32_t xics_phandle,
>                            void *fdt)
>  {
> -    int bus_off, i, j;
> +    int bus_off, i, j, ret;
>      char nodename[256];
>      uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
>      struct {
> @@ -951,6 +953,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      object_child_foreach(OBJECT(phb), spapr_phb_children_dt,
>                           &((sPAPRTCEDT){ .fdt = fdt, .node_off = bus_off }));
>  
> +    ret = spapr_drc_populate_dt(fdt, bus_off, SPAPR_DR_CONNECTOR_TYPE_PCI);

AFAICT this will add information for all PCI connectors in the
system.  Shouldn't it only add the ones belonging to this PHB?
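
A sketch of the kind of filtering meant here, under the assumption that
each connector records which object owns it. The SketchConnector type
and the helper are hypothetical; in the patch the owner would be the
PHB passed to spapr_dr_connector_new().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a DR connector: who owns it and
 * what its index is.
 */
typedef struct SketchConnector {
    const void *owner;
    uint32_t index;
} SketchConnector;

/* Copy into out[] the indexes of connectors owned by 'owner'.
 * Returns how many matched; at most out_len entries are written.
 */
static size_t sketch_collect_owned(const SketchConnector *all, size_t n,
                                   const void *owner,
                                   uint32_t *out, size_t out_len)
{
    size_t i, found = 0;

    assert(all != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        if (all[i].owner != owner) {
            continue; /* belongs to a different PHB */
        }
        if (found < out_len) {
            out[found] = all[i].index;
        }
        found++;
    }
    return found;
}
```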

* Re: [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c Michael Roth
@ 2015-01-19  5:24   ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:24 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:29AM -0600, Michael Roth wrote:

You need a commit message with a rationale for this.

> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/pci/pci.c         | 2 +-
>  include/hw/pci/pci.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 371699c..bf16fc8 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -122,7 +122,7 @@ static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
>  
>  static QLIST_HEAD(, PCIHostState) pci_host_bridges;
>  
> -static int pci_bar(PCIDevice *d, int reg)
> +int pci_bar(PCIDevice *d, int reg)
>  {
>      uint8_t type;
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 97e4257..ae7c3fc 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -330,6 +330,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
>  int pci_device_load(PCIDevice *s, QEMUFile *f);
>  MemoryRegion *pci_address_space(PCIDevice *dev);
>  MemoryRegion *pci_address_space_io(PCIDevice *dev);
> +int pci_bar(PCIDevice *d, int reg);
>  
>  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);

* Re: [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
@ 2015-01-19  5:58   ` David Gibson
  2015-01-26 21:17     ` Michael Roth
  2015-01-23  5:17   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-19  5:58 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:30AM -0600, Michael Roth wrote:
> This enables hotplug for PHB bridges. Upon hotplug we generate the
> OF-nodes required by PAPR specification and IEEE 1275-1994
> "PCI Bus Binding to Open Firmware" for the device.
> 
> We associate the corresponding FDT for these nodes with the DrcEntry
> corresponding to the slot, which will be fetched via
> ibm,configure-connector RTAS calls by the guest as described by PAPR
> specification. The FDT is cleaned up in the case of unplug.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 249 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index a5d7791..94e33b4 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -33,6 +33,7 @@
>  #include <libfdt.h>
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qapi/qmp/qerror.h"
>  
>  #include "hw/pci/pci_bus.h"
>  
> @@ -51,6 +52,15 @@
>  
>  #include "hw/ppc/spapr_drc.h"
>  
> +#define FDT_MAX_SIZE            0x10000
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            return ret;                                            \
> +        }                                                          \
> +    } while (0)
> +
>  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
>  {
>      sPAPRPHBState *sphb;
> @@ -483,6 +493,237 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &phb->iommu_as;
>  }
>  
> +/* Macros to operate with address in OF binding to PCI */
> +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> +
> +/* for 'reg'/'assigned-addresses' OF properties */
> +#define RESOURCE_CELLS_SIZE 2
> +#define RESOURCE_CELLS_ADDRESS 3
> +#define RESOURCE_CELLS_TOTAL \
> +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> +
> +static void fill_resource_props(PCIDevice *d, int bus_num,
> +                                uint32_t *reg, int *reg_size,
> +                                uint32_t *assigned, int *assigned_size)

This is another interface which writes to a buffer without any size
limit information being passed through, which makes me nervous.
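
A minimal sketch of one way to address this: pass the capacity of the
buffer (in 32-bit cells) and fail rather than write past it. The names
below (fill_rows, CELLS_PER_ROW, struct region) are hypothetical
stand-ins for illustration, not code from the patch or from QEMU, and
the endian conversion and phys.hi encoding are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS_PER_ROW 5   /* 3 address cells + 2 size cells, as in the patch */

/* Hypothetical stand-in for the per-BAR data the real code reads. */
struct region {
    uint64_t addr;
    uint64_t size;
};

/*
 * Write one row per non-empty region into buf, which can hold
 * buf_cells 32-bit cells.  Returns the number of bytes written,
 * or -1 if the rows would not fit.
 */
static int fill_rows(const struct region *regions, size_t n_regions,
                     uint32_t *buf, size_t buf_cells)
{
    size_t idx = 0;
    size_t i;

    assert(regions != NULL && buf != NULL);

    for (i = 0; i < n_regions; i++) {
        uint32_t *row;

        if (!regions[i].size) {
            continue;
        }
        if ((idx + 1) * CELLS_PER_ROW > buf_cells) {
            return -1;  /* caller's buffer is too small */
        }
        row = &buf[idx * CELLS_PER_ROW];
        row[0] = 0;  /* phys.hi would be encoded here */
        row[1] = (uint32_t)(regions[i].addr >> 32);
        row[2] = (uint32_t)regions[i].addr;
        row[3] = (uint32_t)(regions[i].size >> 32);
        row[4] = (uint32_t)regions[i].size;
        idx++;
    }
    return (int)(idx * CELLS_PER_ROW * sizeof(uint32_t));
}
```

With this shape the caller can pass the array length of its local
buffer and turn a negative return into an error instead of silently
overrunning the stack.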

> +{
> +    uint32_t *reg_row, *assigned_row;
> +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> +                       b_ddddd(PCI_SLOT(d->devfn)) |
> +                       b_fff(PCI_FUNC(d->devfn)));
> +    int i, idx = 0;
> +
> +    reg[0] = cpu_to_be32(dev_id);
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        if (!d->io_regions[i].size) {
> +            continue;
> +        }
> +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> +        reg_row[0] = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> +            reg_row[0] |= cpu_to_be32(b_ss(1));
> +        } else {
> +            reg_row[0] |= cpu_to_be32(b_ss(2));
> +        }
> +        assigned_row[0] = cpu_to_be32(reg_row[0] | b_n(1));
> +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);

You don't appear to ever fill in reg_row[1] and reg_row[2].

> +        idx++;
> +    }
> +
> +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +}
> +
> +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> +                                       int phb_index, int drc_index)
> +{
> +    int slot = PCI_SLOT(dev->devfn);
> +    char slotname[16];
> +    bool is_bridge = 1;

Should use the true and false macros for a bool type, not 0 and 1.

> +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    int pci_status, reg_size, assigned_size;
> +
> +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> +        PCI_HEADER_TYPE_NORMAL) {
> +        is_bridge = 0;
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> +
> +    /* if this device is NOT a bridge */
> +    if (!is_bridge) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> +
> +    /* the following fdt cells are masked off the pci status register */
> +    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> +                          PCI_STATUS_FAST_BACK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> +                          PCI_STATUS_66MHZ & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> +                          PCI_STATUS_UDF & pci_status));

These aren't quite right.  According to the OF PCI binding these are
boolean properties encoded in the usual way, which is to say absent
for false and present-but-empty for true.   They shouldn't contain an
actual value.
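
A minimal sketch of that encoding for the three single-bit flags: the
property is created empty when the status bit is set and not created
at all otherwise. To keep the example free of a libfdt dependency it
goes through a hypothetical callback (set_empty_prop_fn); in the patch
the callback body would be something like
fdt_setprop(fdt, offset, name, NULL, 0). The helper names are
assumptions for illustration, and devsel-speed is deliberately left
out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCI status register bits, same values as in pci_regs.h */
#define PCI_STATUS_66MHZ      0x20
#define PCI_STATUS_UDF        0x40
#define PCI_STATUS_FAST_BACK  0x80

/* Hypothetical hook that creates an empty (zero-length) property. */
typedef int (*set_empty_prop_fn)(void *opaque, const char *name);

/* Emit one empty property per status bit that is set; skip the rest. */
static int set_status_flags(uint16_t status, set_empty_prop_fn set,
                            void *opaque)
{
    static const struct {
        uint16_t mask;
        const char *name;
    } flags[] = {
        { PCI_STATUS_FAST_BACK, "fast-back-to-back" },
        { PCI_STATUS_66MHZ,     "66mhz-capable" },
        { PCI_STATUS_UDF,       "udf-supported" },
    };
    size_t i;

    assert(set != NULL);

    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (status & flags[i].mask) {
            int ret = set(opaque, flags[i].name);
            if (ret < 0) {
                return ret;     /* propagate the first failure */
            }
        }
    }
    return 0;
}

/* Example callback: just counts how many properties were requested. */
static int count_prop(void *opaque, const char *name)
{
    (void)name;
    (*(int *)opaque)++;
    return 0;
}
```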

> +
> +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> +    sprintf(slotname, "Slot %d", slot + phb_index * PCI_SLOT_MAX);
> +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> +                          RESOURCE_CELLS_ADDRESS));
> +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> +                          RESOURCE_CELLS_SIZE));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> +                          RESOURCE_CELLS_SIZE));
> +    fill_resource_props(dev, phb_index, reg, &reg_size,
> +                        assigned, &assigned_size);
> +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> +                     assigned, assigned_size));
> +
> +    return 0;
> +}
> +
> +/* create OF node for pci device and required OF DT properties */
> +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> +                                       int drc_index, int *dt_offset)
> +{
> +    void *fdt_orig, *fdt;
> +    int offset, ret;
> +    int slot = PCI_SLOT(dev->devfn);
> +    char nodename[512];
> +
> +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> +    fdt_begin_node(fdt_orig, "");
> +    fdt_end_node(fdt_orig);
> +    fdt_finish(fdt_orig);

Recent versions of libfdt have an fdt_create_empty_tree() function to
simplify that standard idiom.

> +    fdt = g_malloc0(FDT_MAX_SIZE);
> +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);

There's no need for a second malloc here - fdt_open_into() may be used
in place.

> +    sprintf(nodename, "pci@%d", slot);
> +    offset = fdt_add_subnode(fdt, 0, nodename);
> +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
> +    g_assert(!ret);
> +    g_free(fdt_orig);
> +
> +    *dt_offset = offset;
> +    return fdt;
> +}
> +
> +static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
> +                                     sPAPRPHBState *phb,
> +                                     PCIDevice *pdev)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    DeviceState *dev = DEVICE(pdev);
> +    int drc_index = drck->get_index(drc);
> +    void *fdt = NULL;
> +    int fdt_start_offset = 0;
> +
> +    /* boot-time devices get their device tree node created by SLOF, but for
> +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> +     * it via RTAS

Now that we have to have this code in qemu for the hotplug case we may
want to consider using it for boot-time devices as well, and removing
the corresponding code from SLOF, but that's a problem for another day.

> +     */
> +    if (dev->hotplugged) {
> +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
> +                                        &fdt_start_offset);
> +    }
> +    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
> +}
> +
> +static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
> +{
> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
> +                                        sPAPRPHBState *phb,
> +                                        PCIDevice *pdev)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
> +}
> +
> +static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> +                               DeviceState *plugged_dev, Error **errp)

So, this function is hotplugging a PCI device into an existing PHB,
rather than hotplugging a PHB itself.  Since the DR protocol does
support both operations, I could see this name becoming confusing.

> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);

Is it safe to call this before checking phb->dr_enabled?
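
One way to sidestep the question is to do the lookup only after the
dr_enabled check. A minimal sketch of that ordering, using stub types
and hypothetical names (struct phb, lookup_connector, plug) rather
than the real QEMU ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct connector { int id; };
struct phb { bool dr_enabled; };

/* Stub lookup that counts calls so the ordering can be observed. */
static int lookups;
static struct connector the_connector;

static struct connector *lookup_connector(int id)
{
    lookups++;
    the_connector.id = id;
    return &the_connector;
}

/* Returns 0 on success, -1 if a user-initiated hotplug is rejected. */
static int plug(const struct phb *phb, int devfn, bool hotplugged)
{
    struct connector *drc;

    if (!phb->dr_enabled) {
        /* Coldplug with DR disabled is a no-op; hotplug is an error. */
        return hotplugged ? -1 : 0;
    }

    /* Only touch the connector once we know DR is enabled. */
    drc = lookup_connector(devfn);
    assert(drc);
    return 0;
}
```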

> +    /* if DR is disabled we don't need to do anything in the case of
> +     * hotplug or coldplug callbacks
> +     */
> +    if (!phb->dr_enabled) {
> +        /* if this is a hotplug operation initiated by the user
> +         * we need to let them know it's not enabled
> +         */
> +        if (plugged_dev->hotplugged) {
> +            error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                      object_get_typename(OBJECT(phb)));
> +        }
> +        return;
> +    }
> +
> +    g_assert(drc);
> +    spapr_device_hotplug_add(drc, phb, pdev);
> +}
> +
> +static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> +                                 DeviceState *plugged_dev, Error **errp)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> +
> +    if (!phb->dr_enabled) {
> +        error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                  object_get_typename(OBJECT(phb)));
> +        return;
> +    }
> +
> +    spapr_device_hotplug_remove(drc, phb, pdev);
> +}
> +
>  static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
>      SysBusDevice *s = SYS_BUS_DEVICE(dev);
> @@ -570,6 +811,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>                             &sphb->memspace, &sphb->iospace,
>                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>      phb->bus = bus;
> +    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
>  
>      /*
>       * Initialize PHB address space.
> @@ -806,6 +1048,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> +    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>  
>      hc->root_bus_path = spapr_phb_root_bus_path;
>      dc->realize = spapr_phb_realize;
> @@ -815,6 +1058,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->cannot_instantiate_with_device_add_yet = false;
>      spc->finish_realize = spapr_phb_finish_realize;
> +    hp->plug = spapr_phb_hot_plug;
> +    hp->unplug = spapr_phb_hot_unplug;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> @@ -823,6 +1068,10 @@ static const TypeInfo spapr_phb_info = {
>      .instance_size = sizeof(sPAPRPHBState),
>      .class_init    = spapr_phb_class_init,
>      .class_size    = sizeof(sPAPRPHBClass),
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_HOTPLUG_HANDLER },
> +        { }
> +    }
>  };
>  
>  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> @@ -836,17 +1085,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
>      return PCI_HOST_BRIDGE(dev);
>  }
>  
> -/* Macros to operate with address in OF binding to PCI */
> -#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> -#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> -#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> -#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> -#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> -#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> -#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> -#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> -#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> -
>  typedef struct sPAPRTCEDT {
>      void *fdt;
>      int node_off;
> @@ -906,14 +1144,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>          return bus_off;
>      }
>  
> -#define _FDT(exp) \
> -    do { \
> -        int ret = (exp);                                           \
> -        if (ret < 0) {                                             \
> -            return ret;                                            \
> -        }                                                          \
> -    } while (0)
> -
>      /* Write PHB properties */
>      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
>      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
@ 2015-01-19  6:00   ` David Gibson
  2015-01-26 21:32     ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-19  6:00 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On Tue, Dec 23, 2014 at 06:30:31AM -0600, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This uses extension of existing EPOW interrupt/event mechanism
> to notify userspace tools like librtas/drmgr to handle
> in-guest configuration/cleanup operations in response to
> device_add/device_del.
> 
> Userspace tools that don't implement this extension will need
> to be run manually in response/advance of device_add/device_del,
> respectively.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 94e33b4..f17f984 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -705,6 +705,9 @@ static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
>  
>      g_assert(drc);
>      spapr_device_hotplug_add(drc, phb, pdev);
> +    if (plugged_dev->hotplugged) {
> +        spapr_hotplug_req_add_event(drc);
> +    }
>  }
>  
>  static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> @@ -722,6 +725,7 @@ static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
>      }
>  
>      spapr_device_hotplug_remove(drc, phb, pdev);
> +    spapr_hotplug_req_remove_event(drc);

The event is sent after the "physical" remove.  Is that correct?

>  }
>  
>  static void spapr_phb_realize(DeviceState *dev, Error **errp)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
  2015-01-19  5:58   ` David Gibson
@ 2015-01-23  5:17   ` Alexey Kardashevskiy
  2015-01-26 21:20     ` Michael Roth
  1 sibling, 1 reply; 55+ messages in thread
From: Alexey Kardashevskiy @ 2015-01-23  5:17 UTC (permalink / raw)
  To: Michael Roth, qemu-devel
  Cc: agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

On 12/23/2014 11:30 PM, Michael Roth wrote:
> This enables hotplug for PHB bridges. Upon hotplug we generate the
> OF-nodes required by PAPR specification and IEEE 1275-1994
> "PCI Bus Binding to Open Firmware" for the device.
> 
> We associate the corresponding FDT for these nodes with the DrcEntry
> corresponding to the slot, which will be fetched via
> ibm,configure-connector RTAS calls by the guest as described by PAPR
> specification. The FDT is cleaned up in the case of unplug.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 249 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index a5d7791..94e33b4 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -33,6 +33,7 @@
>  #include <libfdt.h>
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qapi/qmp/qerror.h"
>  
>  #include "hw/pci/pci_bus.h"
>  
> @@ -51,6 +52,15 @@
>  
>  #include "hw/ppc/spapr_drc.h"
>  
> +#define FDT_MAX_SIZE            0x10000
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            return ret;                                            \
> +        }                                                          \
> +    } while (0)
> +
>  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
>  {
>      sPAPRPHBState *sphb;
> @@ -483,6 +493,237 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &phb->iommu_as;
>  }
>  
> +/* Macros to operate with address in OF binding to PCI */
> +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> +
> +/* for 'reg'/'assigned-addresses' OF properties */
> +#define RESOURCE_CELLS_SIZE 2
> +#define RESOURCE_CELLS_ADDRESS 3
> +#define RESOURCE_CELLS_TOTAL \
> +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> +
> +static void fill_resource_props(PCIDevice *d, int bus_num,
> +                                uint32_t *reg, int *reg_size,
> +                                uint32_t *assigned, int *assigned_size)
> +{
> +    uint32_t *reg_row, *assigned_row;
> +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> +                       b_ddddd(PCI_SLOT(d->devfn)) |
> +                       b_fff(PCI_FUNC(d->devfn)));
> +    int i, idx = 0;
> +
> +    reg[0] = cpu_to_be32(dev_id);
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        if (!d->io_regions[i].size) {
> +            continue;
> +        }
> +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> +        reg_row[0] = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> +            reg_row[0] |= cpu_to_be32(b_ss(1));
> +        } else {
> +            reg_row[0] |= cpu_to_be32(b_ss(2));
> +        }
> +        assigned_row[0] = cpu_to_be32(reg_row[0] | b_n(1));
> +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
> +        idx++;
> +    }
> +
> +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +}
> +
> +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> +                                       int phb_index, int drc_index)
> +{
> +    int slot = PCI_SLOT(dev->devfn);
> +    char slotname[16];
> +    bool is_bridge = 1;
> +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    int pci_status, reg_size, assigned_size;
> +
> +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> +        PCI_HEADER_TYPE_NORMAL) {
> +        is_bridge = 0;
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> +
> +    /* if this device is NOT a bridge */
> +    if (!is_bridge) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> +
> +    /* the following fdt cells are masked off the pci status register */
> +    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> +                          PCI_STATUS_FAST_BACK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> +                          PCI_STATUS_66MHZ & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> +                          PCI_STATUS_UDF & pci_status));
> +
> +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> +    sprintf(slotname, "Slot %d", slot + phb_index * PCI_SLOT_MAX);
> +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> +                          RESOURCE_CELLS_ADDRESS));
> +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> +                          RESOURCE_CELLS_SIZE));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> +                          RESOURCE_CELLS_SIZE));
> +    fill_resource_props(dev, phb_index, reg, &reg_size,
> +                        assigned, &assigned_size);
> +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> +                     assigned, assigned_size));
> +
> +    return 0;
> +}
> +
> +/* create OF node for pci device and required OF DT properties */
> +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> +                                       int drc_index, int *dt_offset)
> +{
> +    void *fdt_orig, *fdt;
> +    int offset, ret;
> +    int slot = PCI_SLOT(dev->devfn);
> +    char nodename[512];
> +
> +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> +    fdt_begin_node(fdt_orig, "");
> +    fdt_end_node(fdt_orig);
> +    fdt_finish(fdt_orig);
> +
> +    fdt = g_malloc0(FDT_MAX_SIZE);
> +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> +    sprintf(nodename, "pci@%d", slot);
> +    offset = fdt_add_subnode(fdt, 0, nodename);
> +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
> +    g_assert(!ret);
> +    g_free(fdt_orig);
> +
> +    *dt_offset = offset;
> +    return fdt;
> +}
> +
> +static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
> +                                     sPAPRPHBState *phb,
> +                                     PCIDevice *pdev)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    DeviceState *dev = DEVICE(pdev);
> +    int drc_index = drck->get_index(drc);
> +    void *fdt = NULL;
> +    int fdt_start_offset = 0;
> +
> +    /* boot-time devices get their device tree node created by SLOF, but for
> +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> +     * it via RTAS
> +     */
> +    if (dev->hotplugged) {
> +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
> +                                        &fdt_start_offset);
> +    }
> +    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
> +}
> +
> +static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
> +{


I believe a pci_device_reset(ccs->dev) call is missing here: we need to
deassert INTx, otherwise we hit the assert in pcibus_reset().


> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
> +                                        sPAPRPHBState *phb,
> +                                        PCIDevice *pdev)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
> +}
> +
> +static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> +                               DeviceState *plugged_dev, Error **errp)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> +
> +    /* if DR is disabled we don't need to do anything in the case of
> +     * hotplug or coldplug callbacks
> +     */
> +    if (!phb->dr_enabled) {
> +        /* if this is a hotplug operation initiated by the user
> +         * we need to let them know it's not enabled
> +         */
> +        if (plugged_dev->hotplugged) {
> +            error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                      object_get_typename(OBJECT(phb)));
> +        }
> +        return;
> +    }
> +
> +    g_assert(drc);
> +    spapr_device_hotplug_add(drc, phb, pdev);
> +}
> +
> +static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> +                                 DeviceState *plugged_dev, Error **errp)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> +
> +    if (!phb->dr_enabled) {
> +        error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                  object_get_typename(OBJECT(phb)));
> +        return;
> +    }
> +
> +    spapr_device_hotplug_remove(drc, phb, pdev);
> +}
> +
>  static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
>      SysBusDevice *s = SYS_BUS_DEVICE(dev);
> @@ -570,6 +811,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>                             &sphb->memspace, &sphb->iospace,
>                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>      phb->bus = bus;
> +    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
>  
>      /*
>       * Initialize PHB address space.
> @@ -806,6 +1048,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> +    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>  
>      hc->root_bus_path = spapr_phb_root_bus_path;
>      dc->realize = spapr_phb_realize;
> @@ -815,6 +1058,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->cannot_instantiate_with_device_add_yet = false;
>      spc->finish_realize = spapr_phb_finish_realize;
> +    hp->plug = spapr_phb_hot_plug;
> +    hp->unplug = spapr_phb_hot_unplug;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> @@ -823,6 +1068,10 @@ static const TypeInfo spapr_phb_info = {
>      .instance_size = sizeof(sPAPRPHBState),
>      .class_init    = spapr_phb_class_init,
>      .class_size    = sizeof(sPAPRPHBClass),
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_HOTPLUG_HANDLER },
> +        { }
> +    }
>  };
>  
>  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> @@ -836,17 +1085,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
>      return PCI_HOST_BRIDGE(dev);
>  }
>  
> -/* Macros to operate with address in OF binding to PCI */
> -#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> -#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> -#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> -#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> -#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> -#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> -#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> -#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> -#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> -
>  typedef struct sPAPRTCEDT {
>      void *fdt;
>      int node_off;
> @@ -906,14 +1144,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>          return bus_off;
>      }
>  
> -#define _FDT(exp) \
> -    do { \
> -        int ret = (exp);                                           \
> -        if (ret < 0) {                                             \
> -            return ret;                                            \
> -        }                                                          \
> -    } while (0)
> -
>      /* Write PHB properties */
>      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
>      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device
  2015-01-16  6:19   ` David Gibson
@ 2015-01-26  4:01     ` Michael Roth
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26  4:01 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-16 00:19:08)
> On Tue, Dec 23, 2014 at 06:30:16AM -0600, Michael Roth wrote:
> > This device emulates a firmware abstraction used by pSeries guests to
> > manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
> > memory, and CPUs. It is conceptually similar to an SHPC device,
> > complete with LED indicators to identify individual slots to
> > physical users and indicate when it is safe to remove a device. In
> > some cases it is also used to manage virtualized resources, such as
> > memory, CPUs, and physical-host bridges, which in the case of pSeries
> > guests are virtualized resources where the physical components are
> > managed by the host.
> > 
> > Guests communicate with these DR Connectors using RTAS calls,
> > generally by addressing the unique DRC index associated with a
> > particular connector for a particular resource. For introspection
> > purposes we expose this state initially as QOM properties, and
> > in subsequent patches will introduce the RTAS calls that make use of
> > it. This constitutes the 'guest' interface.
> > 
> > On the QEMU side we provide an attach/detach interface to associate
> > or cleanup a DeviceState with a particular sPAPRDRConnector in
> > response to hotplug/unplug, respectively. This constitutes the
> > 'physical' interface to the DR Connector.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/Makefile.objs       |   2 +-
> >  hw/ppc/spapr_drc.c         | 503 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h | 201 ++++++++++++++++++
> >  3 files changed, 705 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/ppc/spapr_drc.c
> >  create mode 100644 include/hw/ppc/spapr_drc.h
> > 
> > diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> > index 19d9920..ea010fd 100644
> > --- a/hw/ppc/Makefile.objs
> > +++ b/hw/ppc/Makefile.objs
> > @@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
> >  # IBM pSeries (sPAPR)
> >  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
> >  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> > -obj-$(CONFIG_PSERIES) += spapr_pci.o
> > +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_drc.o
> >  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >  obj-y += spapr_pci_vfio.o
> >  endif
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > new file mode 100644
> > index 0000000..f81c6d1
> > --- /dev/null
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -0,0 +1,503 @@
> > +/*
> > + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> > + *
> > + * Copyright IBM Corp. 2014
> > + *
> > + * Authors:
> > + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "hw/ppc/spapr_drc.h"
> > +#include "qom/object.h"
> > +#include "hw/qdev.h"
> > +#include "qapi/visitor.h"
> > +#include "qemu/error-report.h"
> > +
> > +/* #define DEBUG_SPAPR_DRC */
> > +
> > +#ifdef DEBUG_SPAPR_DRC
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> > +#define DPRINTFN(fmt, ...) \
> > +    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#define DPRINTFN(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> > +#define DRC_CONTAINER_PATH "/dr-connector"
> > +#define DRC_INDEX_TYPE_SHIFT 28
> > +#define DRC_INDEX_ID_MASK ~(~0 << DRC_INDEX_TYPE_SHIFT)
> 
> I'm not sure if there are actually any situations where it can break,
> but safest to put parens around the whole macro body, just in case of
> macro vs. precedence weirdness.
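
For reference, a minimal standalone sketch of what the parenthesized
form could look like. The unsigned constant and the drc_index() helper
name are assumptions for illustration only, not what the patch
currently does. Using ~0u also sidesteps left-shifting a negative int,
and keeping the type shift unsigned matters for LMB, where 8 << 28 does
not fit in a signed 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

#define DRC_INDEX_TYPE_SHIFT 28
/* whole body parenthesized; ~0u keeps the shift on an unsigned value */
#define DRC_INDEX_ID_MASK (~(~0u << DRC_INDEX_TYPE_SHIFT))

/* hypothetical helper: type shift in the top 4 bits, id in the low 28 */
static uint32_t drc_index(uint32_t type_shift, uint32_t id)
{
    assert(type_shift < 16);
    return (type_shift << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}
```

With the shift values from the header, a PCI connector (shift 4) with
id 0x123 would encode as 0x40000123 under this scheme.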
> 
> > +static int set_isolation_state(sPAPRDRConnector *drc,
> > +                               sPAPRDRIsolationState state)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +    DPRINTFN("set_isolation_state: %x", state);
> > +    drc->isolation_state = state;
> > +    if (drc->awaiting_release &&
> > +        drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> > +        drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> > +                     drc->detach_cb_opaque);
> > +    }
> > +    return 0;
> > +}
> > +
> > +static int set_indicator_state(sPAPRDRConnector *drc,
> > +                               sPAPRDRIndicatorState state)
> > +{
> > +    DPRINTFN("set_indicator_state: %x", state);
> > +    drc->indicator_state = state;
> > +    return 0;
> > +}
> > +
> > +static int set_allocation_state(sPAPRDRConnector *drc,
> > +                                sPAPRDRAllocationState state)
> > +{
> > +    DPRINTFN("set_allocation_state: %x", state);
> > +    drc->indicator_state = state;
> 
> This should be drc->allocation_state, surely.
> 
> > +    return 0;
> > +}
> > +
> > +static uint32_t get_id(sPAPRDRConnector *drc)
> > +{
> > +    /* this value is unique for DRCs of a particular type, but may
> > +     * overlap with the id's of other DRCs. the value is used both
> > +     * to calculate a unique (across all DRC types) index, as well
> > +     * as generating the ibm,drc-names OFDT property that describes
> > +     * DRCs
> > +     */
> > +    return drc->id;
> > +}
> 
> Since this is static anyway, why not just reference drc->id directly?
> 
> > +static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
> > +{
> > +    uint32_t shift = 0;
> > +
> > +    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
> > +    while (type != (1 << shift)) {
> > +        shift++;
> > +    }
> > +    return shift;
> > +}
> > +
> > +static uint32_t get_index(sPAPRDRConnector *drc)
> > +{
> > +    /* no set format for a drc index: it only needs to be globally
> > +     * unique. this is how we encode the DRC type on bare-metal
> > +     * however, so might as well do that here
> > +     */
> > +    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
> > +            (drc->id & DRC_INDEX_ID_MASK);
> > +}
> > +
> > +static uint32_t get_type(sPAPRDRConnector *drc)
> > +{
> > +    return drc->type;
> > +}
> > +
> > +/*
> > + * dr-entity-sense sensor value
> > + * returned via get-sensor-state RTAS calls
> > + * as expected by state diagram in PAPR+ 2.7, 13.4
> > + * based on the current allocation/indicator/power states
> > + * for the DR connector.
> > + */
> > +static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
> > +{
> > +    if (drc->dev) {
> > +        /* this assumes all PCI devices are assigned to
> > +         * a 'live insertion' power domain, where QEMU
> > +         * manages power state automatically as opposed
> > +         * to the guest. present, non-PCI resources are
> > +         * unaffected by power state.
> > +         */
> 
> Is it possible to make an assert() to check that?

If we find a need to model anything other than live-insertion
domains, we could maybe track it via a link property of the DRC,
but until then it's basically a hard-coded value we only use for
device tree creation so there's not really any state to assert at
this point.
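
To make the current behaviour concrete, here is a standalone
restatement of the mapping entity_sense() implements. The enum and
function names are simplified stand-ins invented for this sketch; only
the numeric sense values are taken from the header in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* values mirror SPAPR_DR_ENTITY_SENSE_* from the patch */
enum sense { SENSE_EMPTY = 0, SENSE_PRESENT = 1, SENSE_UNUSABLE = 2 };
/* hypothetical, simplified connector types */
enum conn_type { CONN_CPU, CONN_PHB, CONN_PCI, CONN_LMB };

static enum sense entity_sense_sketch(bool has_dev, enum conn_type type)
{
    assert(type <= CONN_LMB);
    if (has_dev) {
        /* power state is not consulted: live-insertion is assumed */
        return SENSE_PRESENT;
    }
    /* only PCI slots report EMPTY; everything else is UNUSABLE */
    return type == CONN_PCI ? SENSE_EMPTY : SENSE_UNUSABLE;
}
```

Nothing in there depends on a power-domain setting, which is why there
is no state to check yet.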

> 
> > +        return SPAPR_DR_ENTITY_SENSE_PRESENT;
> > +    }
> > +
> > +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
> > +        /* PCI devices, and only PCI devices, use EMPTY
> > +         * in cases where we'd otherwise use UNUSABLE
> > +         */
> > +        return SPAPR_DR_ENTITY_SENSE_EMPTY;
> > +    }
> > +    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
> > +}
> > +
> > +static sPAPRDRCCResponse configure_connector_common(sPAPRDRCCState *ccs,
> > +                            char **prop_name, const struct fdt_property **prop,
> 
> Maybe rename prop_name to name, since it's also used for node names.
> 
> > +                            int *prop_len)
> > +{
> > +    sPAPRDRCCResponse resp = SPAPR_DR_CC_RESPONSE_CONTINUE;
> > +    int fdt_offset_next;
> > +
> > +    *prop_name = NULL;
> > +    *prop = NULL;
> > +    *prop_len = 0;
> > +
> > +    if (!ccs->fdt) {
> > +        return SPAPR_DR_CC_RESPONSE_ERROR;
> > +    }
> > +
> > +    while (resp == SPAPR_DR_CC_RESPONSE_CONTINUE) {
> > +        const char *name_cur;
> > +        uint32_t tag;
> > +        int name_cur_len;
> > +
> > +        tag = fdt_next_tag(ccs->fdt, ccs->fdt_offset, &fdt_offset_next);
> > +        switch (tag) {
> > +        case FDT_BEGIN_NODE:
> > +            ccs->fdt_depth++;
> > +            name_cur = fdt_get_name(ccs->fdt, ccs->fdt_offset, &name_cur_len);
> > +            *prop_name = g_strndup(name_cur, name_cur_len);
> > +            resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
> > +            break;
> > +        case FDT_END_NODE:
> > +            ccs->fdt_depth--;
> > +            if (ccs->fdt_depth == 0) {
> > +                resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
> > +            } else {
> > +                resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT;
> > +            }
> > +            break;
> > +        case FDT_PROP:
> > +            *prop = fdt_get_property_by_offset(ccs->fdt, ccs->fdt_offset,
> > +                                               prop_len);
> > +            name_cur = fdt_string(ccs->fdt, fdt32_to_cpu((*prop)->nameoff));
> > +            *prop_name = g_strdup(name_cur);
> > +            resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
> > +            break;
> > +        case FDT_END:
> > +            resp = SPAPR_DR_CC_RESPONSE_ERROR;
> > +            break;
> 
> IIUC, the fdt fragment you're stepping through here is generated
> within qemu.  In which case shouldn't this be an assert, rather than
> reporting an error to the guest?

It is, but I think it's possible for a buggy guest to continue making calls to
traverse the FDT fragment even though we've already reported completion in a
previous RTAS response, so I think it makes sense to just report an error to
the guest.

I think it does make sense to assert resp != SPAPR_DR_CC_RESPONSE_ERROR in
the prop_get_fdt user below on the basis you mentioned though, since that's
QEMU traversing the FDT in response to a QMP call. Will add for v5.
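
A rough model of the guest-facing case, using a plain token array in
place of a real FDT so it builds without libfdt. All names here
(cc_step, struct cc_state, the TOK_* tokens) are made up for the
sketch, and keeping the offset parked on the end token is an
assumption of the model rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>

enum tok { TOK_BEGIN_NODE, TOK_END_NODE, TOK_PROP, TOK_NOP, TOK_END };
enum cc_resp {
    CC_SUCCESS = 0, CC_NEXT_CHILD = 2, CC_NEXT_PROPERTY = 3,
    CC_PREV_PARENT = 4, CC_ERROR = -1, CC_CONTINUE = -2,
};

struct cc_state {
    const enum tok *toks;   /* stream terminated by TOK_END */
    size_t offset;
    int depth;
};

/* one configure-connector style step over the token stream */
static enum cc_resp cc_step(struct cc_state *s)
{
    enum cc_resp resp = CC_CONTINUE;

    assert(s && s->toks);
    while (resp == CC_CONTINUE) {
        switch (s->toks[s->offset]) {
        case TOK_BEGIN_NODE:
            s->depth++;
            resp = CC_NEXT_CHILD;
            break;
        case TOK_END_NODE:
            s->depth--;
            resp = s->depth == 0 ? CC_SUCCESS : CC_PREV_PARENT;
            break;
        case TOK_PROP:
            resp = CC_NEXT_PROPERTY;
            break;
        case TOK_END:
            /* caller kept going after completion: report, don't abort */
            return CC_ERROR;    /* offset stays on TOK_END */
        default:
            break;              /* TOK_NOP: skip and keep scanning */
        }
        s->offset++;
    }
    return resp;
}
```

A caller that stops at CC_SUCCESS never sees the error; one that keeps
calling gets CC_ERROR on every further step.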

> 
> > +        default:
> > +            ccs->fdt_offset = fdt_offset_next;
> > +        }
> > +    }
> > +
> > +    ccs->fdt_offset = fdt_offset_next;
> > +    return resp;
> > +}
> > +
> > +static sPAPRDRCCResponse configure_connector(sPAPRDRConnector *drc,
> > +                                             char **prop_name,
> > +                                             const struct fdt_property **prop,
> > +                                             int *prop_len)
> > +{
> > +    return configure_connector_common(&drc->ccs, prop_name, prop, prop_len);
> > +}
> > +
> > +static void prop_get_id(Object *obj, Visitor *v, void *opaque,
> > +                                  const char *name, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +    uint32_t value = get_id(drc);
> > +    visit_type_uint32(v, &value, name, errp);
> > +}
> > +
> > +static void prop_get_index(Object *obj, Visitor *v, void *opaque,
> > +                                  const char *name, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    uint32_t value = (uint32_t)drck->get_index(drc);
> > +    visit_type_uint32(v, &value, name, errp);
> > +}
> > +
> > +static void prop_get_type(Object *obj, Visitor *v, void *opaque,
> > +                          const char *name, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    uint32_t value = (uint32_t)drck->get_type(drc);
> > +    visit_type_uint32(v, &value, name, errp);
> > +}
> > +
> > +static void prop_get_entity_sense(Object *obj, Visitor *v, void *opaque,
> > +                                  const char *name, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    uint32_t value = (uint32_t)drck->entity_sense(drc);
> > +    visit_type_uint32(v, &value, name, errp);
> > +}
> > +
> > +static void prop_get_fdt(Object *obj, Visitor *v, void *opaque,
> > +                        const char *name, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +    sPAPRDRCCState ccs = { 0 };
> > +    sPAPRDRCCResponse resp;
> > +
> > +    ccs.fdt = drc->ccs.fdt;
> > +    ccs.fdt_offset = ccs.fdt_start_offset = drc->ccs.fdt_start_offset;
> > +
> > +    do {
> > +        char *prop_name = NULL;
> > +        const struct fdt_property *prop = NULL;
> > +        int prop_len;
> > +
> > +        resp = configure_connector_common(&ccs, &prop_name, &prop, &prop_len);
> > +
> > +        switch (resp) {
> > +        case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
> > +            visit_start_struct(v, NULL, NULL, prop_name, 0, NULL);
> > +            break;
> > +        case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
> > +            visit_end_struct(v, NULL);
> > +            break;
> > +        case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY: {
> > +            int i;
> > +            visit_start_list(v, prop_name, NULL);
> > +            for (i = 0; i < prop_len; i++) {
> > +                visit_type_uint8(v, (uint8_t *)&prop->data[i], NULL, NULL);
> > +            }
> > +            visit_end_list(v, NULL);
> > +            break;
> > +        }
> > +        default:
> > +            resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
> > +            break;
> > +        }
> > +
> > +        g_free(prop_name);
> > +    } while (resp != SPAPR_DR_CC_RESPONSE_SUCCESS &&
> > +             resp != SPAPR_DR_CC_RESPONSE_ERROR);
> > +}
> > +
> > +static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
> > +                   int fdt_start_offset, bool coldplug)
> > +{
> > +    DPRINTFN("attach");
> > +
> > +    g_assert(drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED);
> > +    g_assert(drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE);
> > +    g_assert(drc->indicator_state == SPAPR_DR_INDICATOR_STATE_INACTIVE);
> > +    g_assert(fdt || coldplug);
> > +
> > +    /* NOTE: setting initial isolation state to UNISOLATED means we can't
> > +     * detach unless guest has a userspace/kernel that moves this state
> > +     * back to ISOLATED in response to an unplug event, or this is done
> > +     * manually by the admin prior. if we force things while the guest
> > +     * may be accessing the device, we can easily crash the guest, so we
> > +     * defer completion of removal in such cases to the reset() hook.
> > +     */
> 
> Given that, would it make more sense to start in ISOLATED state?  Or
> is the initial state specified by PAPR?

From my read of the state machine in 13.4, it's possible to populate
a PCI slot and power on the device without immediately transitioning
to UNISOLATED, but in the case of QEMU, by the time we attach the
device via the PCI hotplug path it's already been exposed to the guest
in the sense that it can be probed/configured, so I think it's unsafe
to claim it's ISOLATED at this point since there's a risk of
unplugging while a guest thinks there's something there.

One possible way to work around this might be to start off with
ISOLATED, but add hooks in rtas_ibm_{read,write}_pci_config that
immediately transition the state to UNISOLATED if the guest actually
accesses/configures it. Seems hacky though... the real-world benefit
is that a guest that isn't capable of PCI hotplug will still allow
for immediate attach/detach on the QEMU side, as opposed to requiring a
reboot to complete the detach. I'm not sure if it's worth it, but
can look into that if this seems better.
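
Roughly what I have in mind for that alternative, as a toy model only.
The struct and function names below are hypothetical, and
slot_config_access() stands in for a hook that would be called from
the rtas_ibm_{read,write}_pci_config paths.

```c
#include <assert.h>
#include <stdbool.h>

enum iso { ISO_ISOLATED = 0, ISO_UNISOLATED = 1 };

struct slot {
    enum iso state;
    bool attached;
};

/* attach leaves the slot ISOLATED until the guest touches it */
static void slot_attach(struct slot *s)
{
    assert(s);
    s->attached = true;
    s->state = ISO_ISOLATED;
}

/* stand-in for a hook in the PCI config read/write RTAS paths */
static void slot_config_access(struct slot *s)
{
    assert(s);
    if (s->attached) {
        s->state = ISO_UNISOLATED;
    }
}

/* immediate detach is only safe while the guest never accessed it */
static bool slot_can_detach_now(const struct slot *s)
{
    assert(s);
    return s->attached && s->state == ISO_ISOLATED;
}
```

A guest that never probes the device leaves the slot detachable right
away; the first config access closes that window.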

> 
> > +    drc->isolation_state = SPAPR_DR_ISOLATION_STATE_UNISOLATED;
> > +    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
> > +    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_ACTIVE;
> > +
> > +    drc->dev = d;
> > +    drc->ccs.fdt = fdt;
> > +    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = fdt_start_offset;
> > +    drc->ccs.fdt_depth = 0;
> > +
> > +    object_property_add_link(OBJECT(drc), "device",
> > +                             object_get_typename(OBJECT(drc->dev)),
> > +                             (Object **)(&drc->dev),
> > +                             NULL, 0, NULL);
> > +}
> > +
> > +static void detach(sPAPRDRConnector *drc, DeviceState *d,
> > +                   spapr_drc_detach_cb *detach_cb,
> > +                   void *detach_cb_opaque)
> > +{
> > +    DPRINTFN("detach");
> > +
> > +    drc->detach_cb = detach_cb;
> > +    drc->detach_cb_opaque = detach_cb_opaque;
> > +
> > +    if (drc->isolation_state != SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> > +        DPRINTFN("awaiting transition to isolated state before removal");
> > +        drc->awaiting_release = true;
> > +        return;
> > +    }
> > +
> > +    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_UNUSABLE;
> > +    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
> > +
> > +    if (drc->detach_cb) {
> > +        drc->detach_cb(drc->dev, drc->detach_cb_opaque);
> > +    }
> > +
> > +    drc->awaiting_release = false;
> > +    g_free(drc->ccs.fdt);
> > +    drc->ccs.fdt = NULL;
> > +    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = drc->ccs.fdt_depth = 0;
> > +    object_property_del(OBJECT(drc), "device", NULL);
> > +    drc->dev = NULL;
> > +    drc->detach_cb = NULL;
> > +    drc->detach_cb_opaque = NULL;
> 
> Shouldn't all this code after the detach_cb call also be called from
> set_isolation_state in the case of a deferred detach?  In which case
> you probably want a helper.

set_isolation_state() actually calls drc->detach(), as opposed to calling
drc->detach_cb() directly, so we end up back here in both cases.
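
A stripped-down model of that flow, in case it helps. The struct and
function names are simplified stand-ins for this sketch, not the
patch's types; the control flow follows detach() and
set_isolation_state() as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iso { ISO_ISOLATED = 0, ISO_UNISOLATED = 1 };
typedef void (detach_cb_fn)(void *opaque);

struct drc {
    enum iso isolation_state;
    bool awaiting_release;
    bool has_dev;
    detach_cb_fn *detach_cb;
    void *detach_cb_opaque;
};

static void drc_detach(struct drc *d, detach_cb_fn *cb, void *opaque)
{
    assert(d);
    d->detach_cb = cb;
    d->detach_cb_opaque = opaque;

    if (d->isolation_state != ISO_ISOLATED) {
        /* guest still owns the device: defer, remember the callback */
        d->awaiting_release = true;
        return;
    }

    if (d->detach_cb) {
        d->detach_cb(d->detach_cb_opaque);
    }
    /* common cleanup, reached for both immediate and deferred detach */
    d->awaiting_release = false;
    d->has_dev = false;
    d->detach_cb = NULL;
    d->detach_cb_opaque = NULL;
}

static void drc_set_isolation_state(struct drc *d, enum iso state)
{
    assert(d);
    d->isolation_state = state;
    if (d->awaiting_release && state == ISO_ISOLATED) {
        /* re-enter detach with the saved callback, not the callback itself */
        drc_detach(d, d->detach_cb, d->detach_cb_opaque);
    }
}

/* example callback: counts how often removal actually completed */
static void count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

So the cleanup after the detach_cb call runs exactly once, whichever
path triggers it.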

> 
> > +}
> > +
> > +static void reset(DeviceState *d)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +    DPRINTFN("drc reset: %x", drck->get_index(drc));
> > +    /* immediately upon reset we can safely assume DRCs whose devices are pending
> > +     * removal can be safely removed, and that they will subsequently be left in
> > +     * an ISOLATED state. move the DRC to this state in these cases (which will in
> > +     * turn complete any pending device removals)
> > +     */
> > +    if (drc->awaiting_release) {
> > +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_ISOLATED);
> > +    }
> > +}
> > +
> > +static void realize(DeviceState *d, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    Object *root_container;
> > +    char link_name[256];
> > +    gchar *child_name;
> > +    Error *err = NULL;
> > +
> > +    DPRINTFN("drc realize: %x", drck->get_index(drc));
> > +    /* NOTE: we do this as part of realize/unrealize due to the fact
> > +     * that the guest will communicate with the DRC via RTAS calls
> > +     * referencing the global DRC index. By unlinking the DRC
> > +     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> > +     * inaccessible by the guest, since lookups rely on this path
> > +     * existing in the composition tree
> > +     */
> > +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > +    snprintf(link_name, sizeof(link_name), "%x", drck->get_index(drc));
> > +    child_name = object_get_canonical_path_component(OBJECT(drc));
> > +    DPRINTFN("drc child name: %s", child_name);
> > +    object_property_add_alias(root_container, link_name,
> > +                              drc->owner, child_name, &err);
> > +    /*
> > +    object_property_add_link(root_container, name, TYPE_SPAPR_DR_CONNECTOR,
> > +                             (Object **)&drc, NULL,
> > +                             OBJ_PROP_LINK_UNREF_ON_RELEASE, &err);
> > +                             */
> > +    if (err) {
> > +        error_report("%s", error_get_pretty(err));
> > +        error_free(err);
> > +        object_unref(OBJECT(drc));
> > +    }
> > +    DPRINTFN("drc realize complete");
> > +}
> > +
> > +static void unrealize(DeviceState *d, Error **errp)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    Object *root_container;
> > +    char name[256];
> > +    Error *err = NULL;
> > +
> > +    DPRINTFN("drc unrealize: %x", drck->get_index(drc));
> > +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > +    snprintf(name, sizeof(name), "%x", drck->get_index(drc));
> > +    object_property_del(root_container, name, &err);
> > +    if (err) {
> > +        error_report("%s", error_get_pretty(err));
> > +        error_free(err);
> > +        object_unref(OBJECT(drc));
> > +    }
> > +}
> > +
> > +sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> > +                                         sPAPRDRConnectorType type,
> > +                                         uint32_t id)
> > +{
> > +    sPAPRDRConnector *drc =
> > +        SPAPR_DR_CONNECTOR(object_new(TYPE_SPAPR_DR_CONNECTOR));
> > +
> > +    g_assert(type);
> > +
> > +    drc->type = type;
> > +    drc->id = id;
> > +    drc->owner = owner;
> > +    object_property_add_child(owner, "dr-connector[*]", OBJECT(drc), NULL);
> > +    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
> > +
> > +    return drc;
> > +}
> > +
> > +static void spapr_dr_connector_instance_init(Object *obj)
> > +{
> > +    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
> > +
> > +    object_property_add_uint32_ptr(obj, "isolation-state",
> > +                                   &drc->isolation_state, NULL);
> > +    object_property_add_uint32_ptr(obj, "indicator-state",
> > +                                   &drc->indicator_state, NULL);
> > +    object_property_add_uint32_ptr(obj, "allocation-state",
> > +                                   &drc->allocation_state, NULL);
> 
> Don't these QOM properties need to be bound to set_isolation_state
> etc. for the write side?  Or does add_uint32_ptr only allow reads?

Only reads on add_*_ptr properties. Easily worked around though, but in
this case we're only exposing the properties here for introspection, so
it's not really needed: the code RTAS uses to read/write the DRC state
doesn't go through these accessors, it uses object methods, similar to
the prop_get_* accessors below.

We could allow these to be writeable properties, but I can't think
of a use-case outside of debugging the emulation path (which is
probably better done via something like qtest).

> 
> > +    object_property_add(obj, "id", "uint32", prop_get_id,
> > +                        NULL, NULL, NULL, NULL);
> > +    object_property_add(obj, "index", "uint32", prop_get_index,
> > +                        NULL, NULL, NULL, NULL);
> > +    object_property_add(obj, "index", "uint32", prop_get_type,
> > +                        NULL, NULL, NULL, NULL);
> > +    object_property_add(obj, "entity-sense", "uint32", prop_get_entity_sense,
> > +                        NULL, NULL, NULL, NULL);
> > +    object_property_add(obj, "fdt", "struct", prop_get_fdt,
> > +                        NULL, NULL, NULL, NULL);
> > +}
> > +
> > +static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > +{
> > +    DeviceClass *dk = DEVICE_CLASS(k);
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> > +
> > +    dk->reset = reset;
> > +    dk->realize = realize;
> > +    dk->unrealize = unrealize;
> > +    drck->set_isolation_state = set_isolation_state;
> > +    drck->set_indicator_state = set_indicator_state;
> > +    drck->set_allocation_state = set_allocation_state;
> > +    drck->get_index = get_index;
> > +    drck->get_type = get_type;
> > +    drck->entity_sense = entity_sense;
> > +    drck->configure_connector = configure_connector;
> > +    drck->attach = attach;
> > +    drck->detach = detach;
> > +}
> > +
> > +static const TypeInfo spapr_dr_connector_info = {
> > +    .name          = TYPE_SPAPR_DR_CONNECTOR,
> > +    .parent        = TYPE_DEVICE,
> > +    .instance_size = sizeof(sPAPRDRConnector),
> > +    .instance_init = spapr_dr_connector_instance_init,
> > +    .class_size    = sizeof(sPAPRDRConnectorClass),
> > +    .class_init    = spapr_dr_connector_class_init,
> > +};
> > +
> > +static void spapr_drc_register_types(void)
> > +{
> > +    type_register_static(&spapr_dr_connector_info);
> > +}
> > +
> > +type_init(spapr_drc_register_types)
> > +
> > +/* helper functions for external users */
> > +
> > +sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index)
> > +{
> > +    Object *obj;
> > +    char name[256];
> > +
> > +    snprintf(name, sizeof(name), "%s/%x", DRC_CONTAINER_PATH, index);
> > +    obj = object_resolve_path(name, NULL);
> > +
> > +    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> > +}
> > +
> > +sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> > +                                           uint32_t id)
> > +{
> > +    return spapr_dr_connector_by_index(
> > +            (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
> > +            (id & DRC_INDEX_ID_MASK));
> > +}
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > new file mode 100644
> > index 0000000..63ec687
> > --- /dev/null
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -0,0 +1,201 @@
> > +/*
> > + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> > + *
> > + * Copyright IBM Corp. 2014
> > + *
> > + * Authors:
> > + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#if !defined(__HW_SPAPR_DRC_H__)
> > +#define __HW_SPAPR_DRC_H__
> > +
> > +#include "qom/object.h"
> > +#include "hw/qdev.h"
> > +#include "libfdt.h"
> > +
> > +#define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
> > +#define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
> > +        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DR_CONNECTOR)
> > +#define SPAPR_DR_CONNECTOR_CLASS(klass) \
> > +        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, \
> > +                           TYPE_SPAPR_DR_CONNECTOR)
> > +#define SPAPR_DR_CONNECTOR(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
> > +                                             TYPE_SPAPR_DR_CONNECTOR)
> > +
> > +/*
> > + * Various hotplug types managed by sPAPRDRConnector
> > + *
> > + * these are somewhat arbitrary, but to make things easier
> > + * when generating DRC indexes later we've aligned the bit
> > + * positions with the values used to assign DRC indexes on
> > + * pSeries. we use those values as bit shifts to allow for
> > + * the OR'ing of these values in various QEMU routines, but
> > + * for values exposed to the guest (via DRC indexes for
> > + * instance) we will use the shift amounts.
> > + */
> > +typedef enum {
> > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU = 1,
> > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB = 2,
> > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
> > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
> > +    SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
> > +} sPAPRDRConnectorTypeShift;
> > +
> > +typedef enum {
> > +    SPAPR_DR_CONNECTOR_TYPE_ANY = ~0,
> > +    SPAPR_DR_CONNECTOR_TYPE_CPU = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU,
> > +    SPAPR_DR_CONNECTOR_TYPE_PHB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB,
> > +    SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
> > +    SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
> > +    SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
> > +} sPAPRDRConnectorType;
> > +
> > +/*
> > + * set via set-indicator RTAS calls
> > + * as documented by PAPR+ 2.7 13.5.3.4, Table 177
> > + *
> > + * isolated: put device under firmware control 
> > + * unisolated: claim OS control of device (may or may not be in use)
> > + */
> > +typedef enum {
> > +    SPAPR_DR_ISOLATION_STATE_ISOLATED   = 0,
> > +    SPAPR_DR_ISOLATION_STATE_UNISOLATED = 1
> > +} sPAPRDRIsolationState;
> > +
> > +/*
> > + * set via set-indicator RTAS calls
> > + * as documented by PAPR+ 2.7 13.5.3.4, Table 177
> > + *
> > + * unusable: mark device as unavailable to OS
> > + * usable: mark device as available to OS
> > + * exchange: (currently unused)
> > + * recover: (currently unused)
> > + */
> > +typedef enum {
> > +    SPAPR_DR_ALLOCATION_STATE_UNUSABLE  = 0,
> > +    SPAPR_DR_ALLOCATION_STATE_USABLE    = 1,
> > +    SPAPR_DR_ALLOCATION_STATE_EXCHANGE  = 2,
> > +    SPAPR_DR_ALLOCATION_STATE_RECOVER   = 3
> > +} sPAPRDRAllocationState;
> > +
> > +/*
> > + * LED/visual indicator state
> > + *
> > + * set via set-indicator RTAS calls
> > + * as documented by PAPR+ 2.7 13.5.3.4, Table 177,
> > + * and PAPR+ 2.7 13.5.4.1, Table 180
> > + *
> > + * inactive: hotpluggable entity inactive and safely removable
> > + * active: hotpluggable entity in use and not safely removable
> > + * identify: (currently unused)
> > + * action: (currently unused)
> > + */
> > +typedef enum {
> > +    SPAPR_DR_INDICATOR_STATE_INACTIVE   = 0,
> > +    SPAPR_DR_INDICATOR_STATE_ACTIVE     = 1,
> > +    SPAPR_DR_INDICATOR_STATE_IDENTIFY   = 2,
> > +    SPAPR_DR_INDICATOR_STATE_ACTION     = 3,
> > +} sPAPRDRIndicatorState;
> > +
> > +/*
> > + * returned via get-sensor-state RTAS calls
> > + * as documented by PAPR+ 2.7 13.5.3.3, Table 175:
> > + *
> > + * empty: connector slot empty (e.g. empty hotpluggable PCI slot)
> > + * present: connector slot populated and device available to OS
> > + * unusable: device not currently available to OS
> > + * exchange: (currently unused)
> > + * recover: (currently unused)
> > + */
> > +typedef enum {
> > +    SPAPR_DR_ENTITY_SENSE_EMPTY     = 0,
> > +    SPAPR_DR_ENTITY_SENSE_PRESENT   = 1,
> > +    SPAPR_DR_ENTITY_SENSE_UNUSABLE  = 2,
> > +    SPAPR_DR_ENTITY_SENSE_EXCHANGE  = 3,
> > +    SPAPR_DR_ENTITY_SENSE_RECOVER   = 4,
> > +} sPAPRDREntitySense;
> > +
> > +typedef enum {
> > +    SPAPR_DR_CC_RESPONSE_NEXT_SIB       = 1, /* currently unused */
> > +    SPAPR_DR_CC_RESPONSE_NEXT_CHILD     = 2,
> > +    SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY  = 3,
> > +    SPAPR_DR_CC_RESPONSE_PREV_PARENT    = 4,
> > +    SPAPR_DR_CC_RESPONSE_SUCCESS        = 0,
> > +    SPAPR_DR_CC_RESPONSE_ERROR          = -1,
> > +    SPAPR_DR_CC_RESPONSE_CONTINUE       = -2,
> > +} sPAPRDRCCResponse;
> > +
> > +typedef struct sPAPRDRCCState {
> > +    void *fdt;
> > +    int fdt_start_offset;
> > +    int fdt_offset;
> > +    int fdt_depth;
> > +} sPAPRDRCCState;
> > +
> > +typedef void (spapr_drc_detach_cb)(DeviceState *d, void *opaque);
> > +
> > +typedef struct sPAPRDRConnector {
> > +    /*< private >*/
> > +    DeviceState parent;
> > +
> > +    sPAPRDRConnectorType type;
> > +    uint32_t id;
> > +    Object *owner;
> > +
> > +    /* sensor/indicator states */
> > +    uint32_t isolation_state;
> > +    uint32_t allocation_state;
> > +    uint32_t indicator_state;
> > +
> > +    /* configure-connector state */
> > +    sPAPRDRCCState ccs;
> > +
> > +    bool awaiting_release;
> > +
> > +    /* device pointer, via link property */
> > +    DeviceState *dev;
> > +    spapr_drc_detach_cb *detach_cb;
> > +    void *detach_cb_opaque;
> > +} sPAPRDRConnector;
> > +
> > +typedef struct sPAPRDRConnectorClass {
> > +    /*< private >*/
> > +    DeviceClass parent;
> > +
> > +    /*< public >*/
> > +
> > +    /* accessors for guest-visible (generally via RTAS) DR state */
> > +    int (*set_isolation_state)(sPAPRDRConnector *drc,
> > +                               sPAPRDRIsolationState state);
> > +    int (*set_indicator_state)(sPAPRDRConnector *drc,
> > +                               sPAPRDRIndicatorState state);
> > +    int (*set_allocation_state)(sPAPRDRConnector *drc,
> > +                                sPAPRDRAllocationState state);
> > +    uint32_t (*get_index)(sPAPRDRConnector *drc);
> > +    uint32_t (*get_type)(sPAPRDRConnector *drc);
> > +
> > +    sPAPRDREntitySense (*entity_sense)(sPAPRDRConnector *drc);
> > +    sPAPRDRCCResponse (*configure_connector)(sPAPRDRConnector *drc,
> > +                                             char **prop_name,
> > +                                             const struct fdt_property **prop,
> > +                                             int *prop_len);
> > +
> > +    /* QEMU interfaces for managing hotplug operations */
> > +    void (*attach)(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
> > +                   int fdt_start_offset, bool coldplug);
> > +    void (*detach)(sPAPRDRConnector *drc, DeviceState *d,
> > +                   spapr_drc_detach_cb *detach_cb,
> > +                   void *detach_cb_opaque);
> > +} sPAPRDRConnectorClass;
> > +
> > +sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> > +                                         sPAPRDRConnectorType type,
> > +                                         uint32_t token);
> > +sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
> > +sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> > +                                           uint32_t id);
> > +
> > +#endif /* __HW_SPAPR_DRC_H__ */
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device
  2015-01-02 10:32   ` Bharata B Rao
@ 2015-01-26  4:56     ` Michael Roth
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26  4:56 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: aik, qemu-devel, Alexander Graf, ncmike, qemu-ppc, tyreld,
	Nathan Fontenot

Quoting Bharata B Rao (2015-01-02 04:32:58)
> On Tue, Dec 23, 2014 at 6:00 PM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > This device emulates a firmware abstraction used by pSeries guests to
> > manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
> > memory, and CPUs. It is conceptually similar to an SHPC device,
> > complete with LED indicators to identify individual slots to
> > physical users and indicate when it is safe to remove a device. In
> > some cases it is also used to manage virtualized resources, such as
> > memory, CPUs, and physical-host bridges, which in the case of pSeries
> > guests are virtualized resources where the physical components are
> > managed by the host.
> >
> > Guests communicate with these DR Connectors using RTAS calls,
> > generally by addressing the unique DRC index associated with a
> > particular connector for a particular resource. For introspection
> > purposes we expose this state initially as QOM properties, and
> > in subsequent patches will introduce the RTAS calls that make use of
> > it. This constitutes the 'guest' interface.
> >
> > On the QEMU side we provide an attach/detach interface to associate
> > or clean up a DeviceState with a particular sPAPRDRConnector in
> > response to hotplug/unplug, respectively. This constitutes the
> > 'physical' interface to the DR Connector.
> >
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/Makefile.objs       |   2 +-
> >  hw/ppc/spapr_drc.c         | 503 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h | 201 ++++++++++++++++++
> >  3 files changed, 705 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/ppc/spapr_drc.c
> >  create mode 100644 include/hw/ppc/spapr_drc.h
> >
> > diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> > index 19d9920..ea010fd 100644
> > --- a/hw/ppc/Makefile.objs
> > +++ b/hw/ppc/Makefile.objs
> > @@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
> >  # IBM pSeries (sPAPR)
> >  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
> >  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> > -obj-$(CONFIG_PSERIES) += spapr_pci.o
> > +obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_drc.o
> >  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >  obj-y += spapr_pci_vfio.o
> >  endif
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > new file mode 100644
> > index 0000000..f81c6d1
> > --- /dev/null
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -0,0 +1,503 @@
> > +/*
> > + * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
> > + *
> > + * Copyright IBM Corp. 2014
> > + *
> > + * Authors:
> > + *  Michael Roth      <mdroth@linux.vnet.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "hw/ppc/spapr_drc.h"
> > +#include "qom/object.h"
> > +#include "hw/qdev.h"
> > +#include "qapi/visitor.h"
> > +#include "qemu/error-report.h"
> > +
> > +/* #define DEBUG_SPAPR_DRC */
> > +
> > +#ifdef DEBUG_SPAPR_DRC
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> > +#define DPRINTFN(fmt, ...) \
> > +    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#define DPRINTFN(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> > +#define DRC_CONTAINER_PATH "/dr-connector"
> > +#define DRC_INDEX_TYPE_SHIFT 28
> > +#define DRC_INDEX_ID_MASK ~(~0 << DRC_INDEX_TYPE_SHIFT)
> > +
> > +static int set_isolation_state(sPAPRDRConnector *drc,
> > +                               sPAPRDRIsolationState state)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +    DPRINTFN("set_isolation_state: %x", state);
> > +    drc->isolation_state = state;
> > +    if (drc->awaiting_release &&
> > +        drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
> > +        drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> > +                     drc->detach_cb_opaque);
> > +    }
> > +    return 0;
> > +}
> > +
> > +static int set_indicator_state(sPAPRDRConnector *drc,
> > +                               sPAPRDRIndicatorState state)
> > +{
> > +    DPRINTFN("set_indicator_state: %x", state);
> > +    drc->indicator_state = state;
> > +    return 0;
> > +}
> > +
> > +static int set_allocation_state(sPAPRDRConnector *drc,
> > +                                sPAPRDRAllocationState state)
> > +{
> > +    DPRINTFN("set_allocation_state: %x", state);
> > +    drc->allocation_state = state;
> > +    return 0;
> > +}
> > +
> > +static uint32_t get_id(sPAPRDRConnector *drc)
> > +{
> > +    /* this value is unique for DRCs of a particular type, but may
> > +     * overlap with the id's of other DRCs. the value is used both
> > +     * to calculate a unique (across all DRC types) index, as well
> > +     * as generating the ibm,drc-names OFDT property that describes
> > +     * DRCs
> > +     */
> > +    return drc->id;
> > +}
> > +
> > +static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
> > +{
> > +    uint32_t shift = 0;
> > +
> > +    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
> > +    while (type != (1 << shift)) {
> > +        shift++;
> > +    }
> > +    return shift;
> > +}
> > +
> > +static uint32_t get_index(sPAPRDRConnector *drc)
> > +{
> > +    /* no set format for a drc index: it only needs to be globally
> > +     * unique. this is how we encode the DRC type on bare-metal
> > +     * however, so might as well do that here
> > +     */
> > +    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
> > +            (drc->id & DRC_INDEX_ID_MASK);
> > +}
> > +
> > +static uint32_t get_type(sPAPRDRConnector *drc)
> > +{
> > +    return drc->type;
> > +}
> > +
> > +/*
> > + * dr-entity-sense sensor value
> > + * returned via get-sensor-state RTAS calls
> > + * as expected by state diagram in PAPR+ 2.7, 13.4
> > + * based on the current allocation/indicator/power states
> > + * for the DR connector.
> > + */
> > +static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
> > +{
> > +    if (drc->dev) {
> > +        /* this assumes all PCI devices are assigned to
> > +         * a 'live insertion' power domain, where QEMU
> > +         * manages power state automatically as opposed
> > +         * to the guest. present, non-PCI resources are
> > +         * unaffected by power state.
> > +         */
> > +        return SPAPR_DR_ENTITY_SENSE_PRESENT;
> > +    }
> 
> Hotplugged CPU comes with drc->dev set (set during
> sPAPRDRConnectorClass->attach()) and hence ends up returning PRESENT
> but kernel expects UNUSABLE during get-sensor-state rtas call.

That does seem to be in agreement with the state machine in PAPR+ 13.4.

I've updated the check to return UNUSABLE/EMPTY if the allocation
state is UNUSABLE when dealing with logical DR devices. For PCI we
transition to PRESENT after plug/power-on regardless of allocation
state.
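
As a rough sketch of the updated check described above (not the actual
v5 code): the types and names below are simplified, hypothetical
stand-ins for the ones in spapr_drc.h, and the choice to report
UNUSABLE for a logical connector with nothing attached is an
assumption carried over from the v4 fall-through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the enums in spapr_drc.h. */
typedef enum { DRC_TYPE_CPU, DRC_TYPE_PCI, DRC_TYPE_LMB } drc_type;
typedef enum { ALLOC_UNUSABLE = 0, ALLOC_USABLE = 1 } alloc_state;
typedef enum {
    SENSE_EMPTY = 0,
    SENSE_PRESENT = 1,
    SENSE_UNUSABLE = 2
} entity_sense_t;

typedef struct {
    drc_type type;
    alloc_state allocation_state;
    bool has_dev;               /* stands in for drc->dev != NULL */
} drc_sketch;

static entity_sense_t entity_sense_sketch(const drc_sketch *drc)
{
    assert(drc != NULL);

    if (drc->type == DRC_TYPE_PCI) {
        /* physical connector: presence follows plug/power-on only,
         * the allocation state is not consulted
         */
        return drc->has_dev ? SENSE_PRESENT : SENSE_EMPTY;
    }

    /* logical connector (CPU, LMB, ...): stays UNUSABLE until the
     * guest has marked it usable via set-indicator
     */
    if (!drc->has_dev || drc->allocation_state == ALLOC_UNUSABLE) {
        return SENSE_UNUSABLE;
    }
    return SENSE_PRESENT;
}
```

With this shape a hotplugged CPU whose allocation state is still
UNUSABLE reports UNUSABLE, which is what Bharata's kernel-side
observation asks for, while a plugged PCI device reports PRESENT
either way.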

> 
> > +
> > +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
> > +        /* PCI devices, and only PCI devices, use EMPTY
> > +         * in cases where we'd otherwise use UNUSABLE
> > +         */
> > +        return SPAPR_DR_ENTITY_SENSE_EMPTY;
> > +    }
> > +    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
> > +}


* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-16  6:21   ` David Gibson
@ 2015-01-26  5:21     ` Michael Roth
  2015-01-27  5:24       ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2015-01-26  5:21 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-16 00:21:55)
> On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
> > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > 
> > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index 2ec2a8e..a2fb533 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
> >      rtas_st(rets, 0, ret);
> >  }
> >  
> > +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > +                                 uint32_t token, uint32_t nargs,
> > +                                 target_ulong args, uint32_t nret,
> > +                                 target_ulong rets)
> > +{
> > +    /* we currently only use a single, "live insert" powerdomain for
> > +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
> > +     */
> 
> Even so, you should at least validate the number of args and rets, and
> preferably check that the user isn't attempting to set something for some
> other, non-existent power domain.
> 
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    rtas_st(rets, 1, 100);
> > +}
> > +
> > +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > +                                  uint32_t token, uint32_t nargs,
> > +                                  target_ulong args, uint32_t nret,
> > +                                  target_ulong rets)
> > +{
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    rtas_st(rets, 1, 100);
> > +}
> > +
> >  static struct rtas_call {
> >      const char *name;
> >      spapr_rtas_fn fn;
> > @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
> >                          rtas_ibm_set_system_parameter);
> >      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
> >                          rtas_ibm_os_term);
> > +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
> > +                        rtas_set_power_level);
> > +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> > +                        rtas_get_power_level);
> >  }
> >  
> >  type_init(core_rtas_register_types)
> 
> This code should probably go in spapr_drc.c.  The idea is that spapr_rtas
> is just the RTAS dispatch code, plus RTAS functions that have no other
> home.  Generally RTAS functions should live with the devices they're
> connected to.

In this particular case the calls act on a "power domain" which isn't
actually modeled in QEMU (we just assume a single "live-insertion" domain
which just magically does everything we want), so I think it makes
sense to leave these here.

But for the others it does make sense to tie them with spapr_drc.c, or
maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
well-defined accessors.
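
For the argument validation requested above, something along these
lines might do. This is only a sketch: the status codes and the
live-insertion domain id are placeholders (the real values would come
from spapr.h and PAPR), the 2-in/2-out argument counts are my reading
of the set-power-level interface, and the function takes plain arrays
instead of going through rtas_ld/rtas_st.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes, not taken from spapr.h. */
enum { OUT_SUCCESS = 0, OUT_PARAM_ERROR = -3 };

/* Assumed id of the single "live insertion" power domain. */
#define LIVE_INSERT_POWER_DOMAIN (-1)

/* Returns the RTAS status; *actual_level is written only on success. */
static int32_t set_power_level_sketch(uint32_t nargs, const int32_t *args,
                                      uint32_t nret, int32_t *actual_level)
{
    assert(args != NULL && actual_level != NULL);

    /* assumed interface: (domain, level) in, (status, level) out */
    if (nargs != 2 || nret != 2) {
        return OUT_PARAM_ERROR;
    }

    /* reject any domain other than the one we pretend to model */
    if (args[0] != LIVE_INSERT_POWER_DOMAIN) {
        return OUT_PARAM_ERROR;
    }

    /* the live-insertion domain is always at full power */
    *actual_level = 100;
    return OUT_SUCCESS;
}
```

The same two checks would apply to get-power-level, with the argument
count adjusted to whatever that call takes.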

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-01-19  4:31   ` David Gibson
@ 2015-01-26 16:56     ` Michael Roth
  2015-01-27  5:27       ` David Gibson
  2015-01-28  3:56       ` Bharata B Rao
  0 siblings, 2 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26 16:56 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-18 22:31:23)
> On Tue, Dec 23, 2014 at 06:30:22AM -0600, Michael Roth wrote:
> > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > 
> > This extends the data structures currently used to report EPOW events to
> > guests via the check-exception RTAS interfaces to also include event types
> > for hotplug/unplug events.
> > 
> > This is currently undocumented and being finalized for inclusion in PAPR
> > specification, but we implement this here as an extension for guest
> > userspace tools to implement (existing guest kernels simply log these
> > events via a sysfs interface that's read by rtas_errd).
> > 
> > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   2 +-
> >  hw/ppc/spapr_events.c  | 211 ++++++++++++++++++++++++++++++++++++++++---------
> >  include/hw/ppc/spapr.h |   5 +-
> >  3 files changed, 177 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 361b914..1bc5773 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1601,7 +1601,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> >                                              kernel_size, kernel_le,
> >                                              boot_device, kernel_cmdline,
> > -                                            spapr->epow_irq);
> > +                                            spapr->check_exception_irq);
> >      assert(spapr->fdt_skel != NULL);
> >  }
> >  
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 1b6157d..ebbf3a4 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -32,6 +32,9 @@
> >  
> >  #include "hw/ppc/spapr.h"
> >  #include "hw/ppc/spapr_vio.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci-host/spapr.h"
> > +#include "hw/ppc/spapr_drc.h"
> >  
> >  #include <libfdt.h>
> >  
> > @@ -77,6 +80,7 @@ struct rtas_error_log {
> >  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
> >  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
> >  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> > +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
> >      uint32_t extended_length;
> >  } QEMU_PACKED;
> >  
> > @@ -166,6 +170,38 @@ struct epow_log_full {
> >      struct rtas_event_log_v6_epow epow;
> >  } QEMU_PACKED;
> >  
> > +struct rtas_event_log_v6_hp {
> > +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> > +    struct rtas_event_log_v6_section_header hdr;
> > +    uint8_t hotplug_type;
> > +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> > +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> > +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> > +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> > +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> > +    uint8_t hotplug_action;
> > +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> > +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> > +    uint8_t hotplug_identifier;
> > +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> > +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> > +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> > +    uint8_t reserved;
> > +    union {
> > +        uint32_t index;
> > +        uint32_t count;
> > +        char name[1];
> > +    } drc;
> > +} QEMU_PACKED;
> > +
> > +struct hp_log_full {
> > +    struct rtas_error_log hdr;
> > +    struct rtas_event_log_v6 v6hdr;
> > +    struct rtas_event_log_v6_maina maina;
> > +    struct rtas_event_log_v6_mainb mainb;
> > +    struct rtas_event_log_v6_hp hp;
> > +} QEMU_PACKED;
> > +
> >  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> >  #define EVENT_MASK_EPOW                      0x40000000
> >  #define EVENT_MASK_HOTPLUG                   0x10000000
> > @@ -181,29 +217,61 @@ struct epow_log_full {
> >          }                                                          \
> >      } while (0)
> >  
> > -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> > +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> >  {
> > -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> > -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> > +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> >  
> >      _FDT((fdt_begin_node(fdt, "event-sources")));
> >  
> >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> > +                       irq_ranges, sizeof(irq_ranges))));
> >  
> >      _FDT((fdt_begin_node(fdt, "epow-events")));
> > -    _FDT((fdt_property(fdt, "interrupts",
> > -                       epow_interrupts, sizeof(epow_interrupts))));
> > +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> >      _FDT((fdt_end_node(fdt)));
> >  
> >      _FDT((fdt_end_node(fdt)));
> >  }
> >  
> >  static struct epow_log_full *pending_epow;
> > +static struct hp_log_full *pending_hp;
> >  static uint32_t next_plid;
> >  
> > +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> > +{
> > +    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> > +        | RTAS_LOG_V6_B0_BIGENDIAN;
> > +    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> > +        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> > +    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> > +}
> > +
> > +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> > +                             int section_count)
> > +{
> > +    struct tm tm;
> > +    int year;
> > +
> > +    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> > +    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> > +    /* FIXME: section version, subtype and creator id? */
> > +    qemu_get_timedate(&tm, spapr->rtc_offset);
> > +    year = tm.tm_year + 1900;
> > +    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> > +                                       | (to_bcd(year % 100) << 16)
> > +                                       | (to_bcd(tm.tm_mon + 1) << 8)
> > +                                       | to_bcd(tm.tm_mday));
> > +    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> > +                                       | (to_bcd(tm.tm_min) << 16)
> > +                                       | (to_bcd(tm.tm_sec) << 8));
> > +    maina->creator_id = 'H'; /* Hypervisor */
> > +    maina->section_count = section_count;
> > +    maina->plid = next_plid++;
> > +}
> > +
> >  static void spapr_powerdown_req(Notifier *n, void *opaque)
> >  {
> >      sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> > @@ -212,8 +280,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      struct rtas_event_log_v6_maina *maina;
> >      struct rtas_event_log_v6_mainb *mainb;
> >      struct rtas_event_log_v6_epow *epow;
> > -    struct tm tm;
> > -    int year;
> >  
> >      if (pending_epow) {
> >          /* For now, we just throw away earlier events if two come
> > @@ -237,27 +303,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> >                                         - sizeof(pending_epow->hdr));
> >  
> > -    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> > -        | RTAS_LOG_V6_B0_BIGENDIAN;
> > -    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> > -        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> > -    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> > -
> > -    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> > -    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> > -    /* FIXME: section version, subtype and creator id? */
> > -    qemu_get_timedate(&tm, spapr->rtc_offset);
> > -    year = tm.tm_year + 1900;
> > -    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> > -                                       | (to_bcd(year % 100) << 16)
> > -                                       | (to_bcd(tm.tm_mon + 1) << 8)
> > -                                       | to_bcd(tm.tm_mday));
> > -    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> > -                                       | (to_bcd(tm.tm_min) << 16)
> > -                                       | (to_bcd(tm.tm_sec) << 8));
> > -    maina->creator_id = 'H'; /* Hypervisor */
> > -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> > -    maina->plid = next_plid++;
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
> >  
> >      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> >      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > @@ -274,7 +321,82 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
> >      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > +}
> > +
> > +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > +{
> > +    struct hp_log_full *new_hp;
> > +    struct rtas_error_log *hdr;
> > +    struct rtas_event_log_v6 *v6hdr;
> > +    struct rtas_event_log_v6_maina *maina;
> > +    struct rtas_event_log_v6_mainb *mainb;
> > +    struct rtas_event_log_v6_hp *hp;
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > +
> > +    new_hp = g_malloc0(sizeof(struct hp_log_full));
> > +    hdr = &new_hp->hdr;
> > +    v6hdr = &new_hp->v6hdr;
> > +    maina = &new_hp->maina;
> > +    mainb = &new_hp->mainb;
> > +    hp = &new_hp->hp;
> > +
> > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > +                               | RTAS_LOG_SEVERITY_EVENT
> > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > +                               | RTAS_LOG_INITIATOR_HOTPLUG
> > +                               | RTAS_LOG_TYPE_HOTPLUG);
> > +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
> > +                                       - sizeof(new_hp->hdr));
> > +
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> > +
> > +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> > +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > +    mainb->subsystem_id = 0x80; /* External environment */
> > +    mainb->event_severity = 0x00; /* Informational / non-error */
> > +    mainb->event_subtype = 0x00; /* Normal shutdown */
> > +
> > +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> > +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> > +    hp->hdr.section_version = 1; /* includes extended modifier */
> > +    hp->hotplug_action = hp_action;
> > +
> > +
> > +    switch (drc_type) {
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> > +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> > +        break;
> > +    default:
> > +        /* skip notification for unknown connector types */
> > +        g_free(new_hp);
> > +        return;
> > +    }
> > +
> > +    if (pending_hp) {
> > +        /* Just toss any pending hotplug events for now, this will
> > +         * need to be fixed later on.
> > +         */
> 
> So, we can get away with a 1-element queue for EPOW, because they're
> just triggering a shutdown - so once the first one's processed, any
> others aren't going to matter.  For hotplug you really do need a
> proper queue.

Yah, this was discussed in the past, but until now I didn't notice how
easy it would be to trigger this when hotplugging multiple devices from
a script or management harness of some sort. Should be simple enough to
fix for v5 though.
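
To make the queueing idea concrete, here is a minimal FIFO sketch.
It is not what v5 will necessarily look like: hp_event is a
hypothetical stand-in for struct hp_log_full that keeps only the DRC
index, and a hand-rolled singly linked list is used here purely to
keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued hotplug event log. */
typedef struct hp_event {
    uint32_t drc_index;
    struct hp_event *next;
} hp_event;

typedef struct {
    hp_event *head;     /* oldest event, delivered first */
    hp_event *tail;     /* newest event */
} hp_queue;

/* Append an event; returns 0 on success, -1 if allocation fails. */
static int hp_queue_push(hp_queue *q, uint32_t drc_index)
{
    hp_event *ev;

    assert(q != NULL);
    ev = calloc(1, sizeof(*ev));
    if (!ev) {
        return -1;
    }
    ev->drc_index = drc_index;
    if (q->tail) {
        q->tail->next = ev;
    } else {
        q->head = ev;
    }
    q->tail = ev;
    return 0;
}

/* Remove the oldest event; returns 1 and fills *drc_index, 0 if empty. */
static int hp_queue_pop(hp_queue *q, uint32_t *drc_index)
{
    hp_event *ev;

    assert(q != NULL && drc_index != NULL);
    ev = q->head;
    if (!ev) {
        return 0;
    }
    q->head = ev->next;
    if (!q->head) {
        q->tail = NULL;
    }
    *drc_index = ev->drc_index;
    free(ev);
    return 1;
}
```

Each hotplug request would push one entry and pulse the IRQ, and each
check-exception call would pop one, so back-to-back device_adds from a
script no longer overwrite each other.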

> 
> > +        g_free(pending_hp);
> > +    }
> > +    pending_hp = new_hp;
> > +
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > +}
> > +
> > +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
> > +{
> > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
> > +}
> > +
> > +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
> > +{
> > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
> >  }
> >  
> >  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > @@ -298,15 +420,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
> >      }
> >  
> > -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> > -        if (sizeof(*pending_epow) < len) {
> > -            len = sizeof(*pending_epow);
> > -        }
> > +    if (mask & EVENT_MASK_EPOW) {
> > +        if (pending_epow) {
> > +            if (sizeof(*pending_epow) < len) {
> > +                len = sizeof(*pending_epow);
> > +            }
> >  
> > -        cpu_physical_memory_write(buf, pending_epow, len);
> > -        g_free(pending_epow);
> > -        pending_epow = NULL;
> > -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +            cpu_physical_memory_write(buf, pending_epow, len);
> > +            g_free(pending_epow);
> > +            pending_epow = NULL;
> > +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +        } else if (pending_hp) {
> 
> So.. the hotplug messages are a different type from EPOW, but are
> still selected by EVENT_MASK_EPOW?  Seems a bit odd.

It's a little odd, but it's mainly just due to the way we surface the
hotplug event. Rather than requiring patched guest kernels, we opted
to re-use and generalize the EPOW IRQ to also surface hotplug-related
RTAS events, which is why we still expect the EVENT_MASK_EPOW when
returning an HP event via check-exception.

EPOW events have well-defined behavior in how they're exposed to
userspace via rtas_errd, which allows us to add hotplug support for
older guests via patched userspace tools.
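
The selection order this implies can be sketched as below. The
pending_state struct and the status code values are placeholders for
illustration; only EVENT_MASK_EPOW is taken from the patch. The sketch
also returns "no errors found" when the EPOW bit is set but nothing is
pending, which is an assumption about the intended behaviour rather
than something the v4 hunk spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_MASK_EPOW 0x40000000u     /* value from the patch */

/* Placeholder status codes, not taken from spapr.h. */
enum { OUT_SUCCESS = 0, OUT_NO_ERRORS_FOUND = 1 };

typedef enum { EV_NONE, EV_EPOW, EV_HOTPLUG } delivered_event;

/* Hypothetical stand-in for the pending_epow/pending_hp pointers. */
typedef struct {
    bool epow_pending;
    bool hp_pending;
} pending_state;

static int check_exception_sketch(pending_state *st, uint32_t mask,
                                  delivered_event *out)
{
    assert(st != NULL && out != NULL);
    *out = EV_NONE;

    /* both event classes are surfaced under the EPOW mask bit */
    if (!(mask & EVENT_MASK_EPOW)) {
        return OUT_NO_ERRORS_FOUND;
    }

    if (st->epow_pending) {
        /* EPOW takes priority: a shutdown request trumps hotplug */
        st->epow_pending = false;
        *out = EV_EPOW;
        return OUT_SUCCESS;
    }

    if (st->hp_pending) {
        st->hp_pending = false;
        *out = EV_HOTPLUG;
        return OUT_SUCCESS;
    }

    return OUT_NO_ERRORS_FOUND;
}
```

A guest polling with the EPOW mask therefore sees at most one event
per call, EPOW first, and needs a second call to pick up a hotplug
event that was pending at the same time.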

> 
> > +            if (sizeof(*pending_hp) < len) {
> > +                len = sizeof(*pending_hp);
> > +            }
> > +
> > +            cpu_physical_memory_write(buf, pending_hp, len);
> > +            g_free(pending_hp);
> > +            pending_hp = NULL;
> > +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +        }
> >      } else {
> >          rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> >      }
> > @@ -314,7 +447,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >  
> >  void spapr_events_init(sPAPREnvironment *spapr)
> >  {
> > -    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
> > +    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
> >      spapr->epow_notifier.notify = spapr_powerdown_req;
> >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> >      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index b4daa42..4d50e74 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -3,6 +3,7 @@
> >  
> >  #include "sysemu/dma.h"
> >  #include "hw/ppc/xics.h"
> > +#include "hw/ppc/spapr_drc.h"
> >  
> >  struct VIOsPAPRBus;
> >  struct sPAPRPHBState;
> > @@ -30,7 +31,7 @@ typedef struct sPAPREnvironment {
> >      struct PPCTimebase tb;
> >      bool has_graphics;
> >  
> > -    uint32_t epow_irq;
> > +    uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >  
> >      /* Migration state */
> > @@ -486,5 +487,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
> >                   uint32_t liobn, uint64_t window, uint32_t size);
> >  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
> >                        sPAPRTCETable *tcet);
> > +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
> > +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
> >  
> >  #endif /* !defined (__HW_SPAPR_H__) */
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt()
  2015-01-19  5:15   ` David Gibson
@ 2015-01-26 20:35     ` Michael Roth
  2015-01-27  5:30       ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2015-01-26 20:35 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-18 23:15:28)
> On Tue, Dec 23, 2014 at 06:30:24AM -0600, Michael Roth wrote:
> > This function handles generation of ibm,drc-* array device tree
> > properties to describe DRC topology to guests. This will be used
> > by the guest to direct RTAS calls to manage any dynamic resources
> > we associate with a particular DR Connector as part of
> > hotplug/unplug.
> > 
> > Since general management of boot-time device trees are handled
> > outside of sPAPRDRConnector, we insert these values blindly given
> > an FDT and offset. A mask of sPAPRDRConnector types is given to
> > instruct us on what types of connectors entries should be generated
> > for, since descriptions for different connectors may live in
> > different parts of the device tree.
> > 
> > Based on code originally written by Nathan Fontenot.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_drc.c         | 225 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h |   3 +-
> >  2 files changed, 227 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index f81c6d1..b162184 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -501,3 +501,228 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> >              (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
> >              (id & DRC_INDEX_ID_MASK));
> >  }
> > +
> > +/* internal helper to gather up DRC info specific to populating DRC
> > + * topology information in the device tree.
> > + */
> > +typedef struct DRConnectorDTInfo {
> > +    char drc_type[64];
> > +    char drc_name[64];
> > +    uint32_t drc_index;
> > +    uint32_t drc_power_domain;
> > +} DRConnectorDTInfo;
> > +
> > +/* generate a string the describes the DRC to encode into the
> > + * device tree.
> > + *
> > + * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
> > + */
> > +static void spapr_drc_get_type_str(char *buf, sPAPRDRConnectorType type)
> > +{
> > +    switch (type) {
> > +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> > +        sprintf(buf, "CPU");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> > +        sprintf(buf, "PHB");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> > +        sprintf(buf, "SLOT");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        sprintf(buf, "28");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > +        sprintf(buf, "MEM");
> > +        break;
> > +    default:
> > +        g_assert(false);
> > +    }
> 
> So this case is simple enough that you can probably get away with it,
> but still - interfaces that involve writing to a buffer without any
> length checks make me very nervous.
> 
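
One way to address the length-check concern is to pass the buffer size
through and use snprintf(). The following is a sketch only:
model_drc_type_str() and the model_drc_type enum are hypothetical
stand-ins for spapr_drc_get_type_str() and sPAPRDRConnectorType, while
the strings are the ones used in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for sPAPRDRConnectorType. */
enum model_drc_type {
    MODEL_DRC_CPU,
    MODEL_DRC_PHB,
    MODEL_DRC_VIO,
    MODEL_DRC_PCI,
    MODEL_DRC_LMB
};

/*
 * Write the type string into buf, never writing more than buflen bytes.
 * Returns 0 on success, -1 for an unknown type or a buffer too small to
 * hold the whole string plus its terminating NUL.
 */
static int model_drc_type_str(char *buf, size_t buflen,
                              enum model_drc_type type)
{
    const char *s;
    int n;

    switch (type) {
    case MODEL_DRC_CPU: s = "CPU";  break;
    case MODEL_DRC_PHB: s = "PHB";  break;
    case MODEL_DRC_VIO: s = "SLOT"; break;
    case MODEL_DRC_PCI: s = "28";   break;
    case MODEL_DRC_LMB: s = "MEM";  break;
    default:
        return -1;
    }

    n = snprintf(buf, buflen, "%s", s);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

A caller would pass sizeof() of the destination field, so truncation
shows up as an error return rather than as a silent overflow.
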
> > +}
> > +
> > +/* generate a human-readable name for a DRC to encode into the DT
> > + * description. this is mainly only used within a guest in place
> > + * of the unique DRC index.
> > + *
> > + * in the case of VIO/PCI devices, it corresponds to a
> > + * "location code" that maps a logical device/function (DRC index)
> > + * to a physical (or virtual in the case of VIO) location in the
> > + * system by chaining together the "location label" for each
> > + * encapsulating component.
> > + *
> > + * since this is more to do with diagnosing physical hardware
> > + * issues than guest compatibility, we choose location codes/DRC
> > + * names that adhere to the documented format, but avoid encoding
> > + * the entire topology information into the label/code, instead
> > + * just using the location codes based on the labels for the
> > + * endpoints (VIO/PCI adaptor connectors), which is basically
> > + * just "C" followed by an integer ID.
> 
> Hrm.. would it make sense to include here the qemu "id" value on the
> DRC device?  That will make names which are matchable to specific
> elements on the qemu command line, which is about as close an
> equivalent to a physical location as I can think of.

I'm not sure I understand the suggestion. We do make use of the
drc->id values to generate this, though those don't really
correspond to "id"/DeviceState->id properties as specified on
the command-line. There are currently no plans to create the DRCs via
-device since the IDs are dependent on/chosen by the parent devices
in this case (DRC IDs for PCI slots inherit/encode parent bus/controller
index, for example). Did you have something else in mind?
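
As a rough illustration of how an ID chosen by the parent ends up in
the DRC index, here is a small sketch. The index composition mirrors
the expression in spapr_dr_connector_by_id() quoted above, but the
shift and mask values, the type-shift argument and the slot ID layout
in model_pci_slot_id() are assumptions made for the example, not values
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; see spapr_drc.c for the real ones. */
#define MODEL_DRC_INDEX_TYPE_SHIFT 28
#define MODEL_DRC_INDEX_ID_MASK    ((1u << MODEL_DRC_INDEX_TYPE_SHIFT) - 1u)

/* Hypothetical slot ID layout: parent PHB index above the devfn. */
static uint32_t model_pci_slot_id(uint32_t phb_index, uint32_t devfn)
{
    return (phb_index << 16) | (devfn & 0xffu);
}

/* Combine a per-type shift value and a parent-chosen ID into an index. */
static uint32_t model_drc_index(uint32_t type_shift, uint32_t id)
{
    return (type_shift << MODEL_DRC_INDEX_TYPE_SHIFT) |
           (id & MODEL_DRC_INDEX_ID_MASK);
}

/* Recover the ID portion, as spapr_drc_get_name_str() does. */
static uint32_t model_drc_index_to_id(uint32_t drc_index)
{
    return drc_index & MODEL_DRC_INDEX_ID_MASK;
}
```

With this layout the ID is fully determined by the parent's index and
the slot, which is why a user-supplied "id" string has no natural place
in it.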

> 
> > + * DRC names as documented by PAPR+ v2.7, 13.5.2.4
> > + * location codes as documented by PAPR+ v2.7, 12.3.1.5
> > + */
> > +static void spapr_drc_get_name_str(char *buf,
> > +                                   sPAPRDRConnectorType type,
> > +                                   uint32_t drc_index)
> > +{
> > +    uint32_t id = drc_index & DRC_INDEX_ID_MASK;
> > +
> > +    switch (type) {
> > +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> > +        sprintf(buf, "CPU %d", id);
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> > +        sprintf(buf, "PHB %d", id);
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        sprintf(buf, "C%d", id);
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > +        sprintf(buf, "LMB %d", id);
> > +        break;
> > +    default:
> > +        g_assert(false);
> > +    }
> > +}
> > +
> > +static DRConnectorDTInfo *spapr_dr_connector_get_info(uint32_t drc_type_mask,
> > +                                                      unsigned int *count)
> > +{
> > +    Object *root_container;
> > +    ObjectProperty *prop;
> > +    GArray *drc_info_list = g_array_new(false, true,
> > +                                        sizeof(DRConnectorDTInfo));
> > +
> > +    /* aliases for all DRConnector objects will be rooted in QOM
> > +     * composition tree at /dr-connector
> > +     */
> > +    root_container = container_get(object_get_root(), "/dr-connector");
> > +
> > +    QTAILQ_FOREACH(prop, &root_container->properties, node) {
> > +        Object *obj;
> > +        sPAPRDRConnector *drc;
> > +        sPAPRDRConnectorClass *drck;
> > +        DRConnectorDTInfo drc_info;
> > +
> > +        if (!strstart(prop->type, "link<", NULL)) {
> > +            continue;
> > +        }
> > +
> > +        obj = object_property_get_link(root_container, prop->name, NULL);
> > +        drc = SPAPR_DR_CONNECTOR(obj);
> > +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +        if ((drc->type & drc_type_mask) == 0) {
> > +            continue;
> > +        }
> > +
> > +        drc_info.drc_index = drck->get_index(drc);
> > +        drc_info.drc_power_domain = -1;
> > +        spapr_drc_get_type_str(drc_info.drc_type, drc->type);
> > +        spapr_drc_get_name_str(drc_info.drc_name, drc->type,
> > +                               drck->get_index(drc));
> > +        g_array_append_val(drc_info_list, drc_info);
> > +    }
> > +
> > +    if (count) {
> > +        *count = drc_info_list->len;
> > +    }
> > +
> > +    /* if count is zero, free everything, including internal storage
> > +     * for array
> > +     */
> > +    return (DRConnectorDTInfo *)g_array_free(drc_info_list, count == 0);
> > +}
> > +
> > +/**
> > + * spapr_drc_populate_dt
> > + *
> > + * @fdt: libfdt device tree
> > + * @path: path in the DT to generate properties
> > + * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
> > + *   to the types of DRCs to generate entries for
> > + *
> > + * generate OF properties to describe DRC topology/indices to guests
> > + *
> > + * as documented in PAPR+ v2.1, 13.5.2
> > + */
> > +int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask)
> > +{
> > +    DRConnectorDTInfo *drc_info_list;
> > +    unsigned int i, count;
> > +    char *char_buf;
> > +    uint32_t *char_buf_count;
> > +    uint32_t *int_buf;
> > +    int char_buf_offset, ret;
> > +
> > +    drc_info_list =
> > +        spapr_dr_connector_get_info(drc_type_mask, &count);
> 
> This is the only call to spapr_dr_connector_get_info().  I don't see a
> lot of point in splitting it out from this function, since it involves
> an array encoding of the information that is not particularly easy to
> work with.  Why not go direct from the drc objects to the fdt?
> 
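
For reference, the string-array properties built by the quoted loops
(ibm,drc-names and ibm,drc-types) consist of a big-endian 32-bit count
followed by the NUL-terminated strings back to back. The sketch below
packs that layout with explicit bounds checks; model_pack_strings() is
a hypothetical helper written for this example, and it writes the count
byte by byte rather than using cpu_to_be32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack `count` strings into buf as: be32(count), str0 NUL, str1 NUL, ...
 * Returns 0 and stores the number of bytes used in *out_len, or -1 if
 * the result would not fit in buflen bytes.
 */
static int model_pack_strings(uint8_t *buf, size_t buflen,
                              const char *const *strs, uint32_t count,
                              size_t *out_len)
{
    size_t off = sizeof(uint32_t);
    uint32_t i;

    if (buflen < off) {
        return -1;
    }
    buf[0] = (uint8_t)(count >> 24);
    buf[1] = (uint8_t)(count >> 16);
    buf[2] = (uint8_t)(count >> 8);
    buf[3] = (uint8_t)count;

    for (i = 0; i < count; i++) {
        size_t n = strlen(strs[i]) + 1; /* include the NUL */

        if (n > buflen - off) {
            return -1;
        }
        memcpy(buf + off, strs[i], n);
        off += n;
    }

    *out_len = off;
    return 0;
}
```

The resulting buffer and length could then be handed to fdt_setprop()
directly, whether the strings come from an intermediate array or
straight from the DRC objects.
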
> > +    if (!count) {
> > +        return 0;
> > +    }
> > +
> > +    int_buf = g_new0(uint32_t, count + 1);
> > +    int_buf[0] = cpu_to_be32(count);
> > +    char_buf = g_new0(char, count * 128 + sizeof(uint32_t));
> > +    char_buf_count = (uint32_t *)&char_buf[0];
> > +    *char_buf_count = cpu_to_be32(count);
> > +
> > +    /* ibm,drc-indexes */
> > +    for (i = 0; i < count; i++) {
> > +        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_index);
> > +    }
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
> > +                      (count + 1) * sizeof(uint32_t));
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
> > +        goto out;
> > +    }
> > +
> > +    /* ibm,drc-power-domains */
> > +    for (i = 0; i < count; i++) {
> > +        int_buf[i + 1] = cpu_to_be32(drc_info_list[i].drc_power_domain);
> > +    }
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
> > +                      (count + 1) * sizeof(uint32_t));
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> > +        goto out;
> > +    }
> > +
> > +    /* ibm,drc-names */
> > +    char_buf_offset = sizeof(uint32_t);
> > +
> > +    for (i = 0; i < count; i++) {
> > +        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_name);
> > +        char_buf_offset += strlen(drc_info_list[i].drc_name) + 1;
> > +    }
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf,
> > +                      char_buf_offset);
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> > +        goto out;
> > +    }
> > +
> > +    /* ibm,drc-types */
> > +    char_buf_offset = sizeof(uint32_t);
> > +
> > +    for (i = 0; i < count; i++) {
> > +        strcpy(char_buf + char_buf_offset, drc_info_list[i].drc_type);
> > +        char_buf_offset += strlen(drc_info_list[i].drc_type) + 1;
> > +    }
> > +
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf,
> > +                      char_buf_offset);
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    g_free(int_buf);
> > +    g_free(char_buf);
> > +    g_free(drc_info_list);
> > +    return ret;
> > +}
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index 63ec687..5c70140 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -193,9 +193,10 @@ typedef struct sPAPRDRConnectorClass {
> >  
> >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> >                                           sPAPRDRConnectorType type,
> > -                                         uint32_t token);
> > +                                         uint32_t id);
> >  sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
> >  sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> >                                             uint32_t id);
> > +int spapr_drc_populate_dt(void *fdt, int fdt_offset, uint32_t drc_type_mask);
> >  
> >  #endif /* __HW_SPAPR_DRC_H__ */
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs
  2015-01-19  5:22   ` David Gibson
@ 2015-01-26 20:44     ` Michael Roth
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26 20:44 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-18 23:22:54)
> On Tue, Dec 23, 2014 at 06:30:28AM -0600, Michael Roth wrote:
> > Reserve 32 entries of type PCI in each PHB's initial FDT. This
> > advertises to guests that each PHB is a DR-capable device with
> > physical hotpluggable slots. This is necessary for allowing
> > hotplugging of devices to it later via bus rescan or guest rpaphp
> > hotplug module.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 73e86a4..a5d7791 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -47,6 +47,8 @@
> >  #define RTAS_TYPE_MSI           1
> >  #define RTAS_TYPE_MSIX          2
> >  
> > +#define FDT_MAX_SIZE            0x10000
> 
> This define doesn't appear to be used in the new code.
> 
> >  #include "hw/ppc/spapr_drc.h"
> >  
> >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> > @@ -872,7 +874,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >                            uint32_t xics_phandle,
> >                            void *fdt)
> >  {
> > -    int bus_off, i, j;
> > +    int bus_off, i, j, ret;
> >      char nodename[256];
> >      uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
> >      struct {
> > @@ -951,6 +953,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >      object_child_foreach(OBJECT(phb), spapr_phb_children_dt,
> >                           &((sPAPRTCEDT){ .fdt = fdt, .node_off = bus_off }));
> >  
> > +    ret = spapr_drc_populate_dt(fdt, bus_off, SPAPR_DR_CONNECTOR_TYPE_PCI);
> 
> AFAICT this will add information for all PCI connectors in the
> system.  Shouldn't it only add the ones belonging to this PHB?

Argh, yes indeed. Since we pass in the parent device during
spapr_dr_connector_new() I think I can simply have this pass in the parent PHB
we want to generate entries for as a filter. Will add this for v5 and do some
testing with multiple PHBs.
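
A rough sketch of the kind of filtering I have in mind, using a plain
array in place of the QOM property walk. struct model_drc and
model_collect_indexes() are hypothetical names for this example only;
the real change would compare against the owner passed to
spapr_dr_connector_new().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a DR connector. */
struct model_drc {
    const void *owner; /* parent device, e.g. the PHB */
    uint32_t type;     /* single-bit connector type flag */
    uint32_t index;    /* DRC index reported to the guest */
};

/*
 * Collect the indexes of connectors whose type is in type_mask and whose
 * owner matches. A NULL owner disables the owner filter. At most out_cap
 * entries are written; the number written is returned.
 */
static size_t model_collect_indexes(const struct model_drc *drcs, size_t n,
                                    const void *owner, uint32_t type_mask,
                                    uint32_t *out, size_t out_cap)
{
    size_t i, found = 0;

    for (i = 0; i < n && found < out_cap; i++) {
        if ((drcs[i].type & type_mask) == 0) {
            continue;
        }
        if (owner != NULL && drcs[i].owner != owner) {
            continue;
        }
        out[found++] = drcs[i].index;
    }
    return found;
}
```

With two PHBs that each own PCI connectors, passing the PHB as the
owner yields only that PHB's entries, while a NULL owner reproduces the
current v4 behaviour of listing every PCI connector in the system.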

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2015-01-19  5:58   ` David Gibson
@ 2015-01-26 21:17     ` Michael Roth
  2015-01-27  5:37       ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2015-01-26 21:17 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-18 23:58:28)
> On Tue, Dec 23, 2014 at 06:30:30AM -0600, Michael Roth wrote:
> > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > OF-nodes required by PAPR specification and IEEE 1275-1994
> > "PCI Bus Binding to Open Firmware" for the device.
> > 
> > We associate the corresponding FDT for these nodes with the DrcEntry
> > corresponding to the slot, which will be fetched via
> > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > specification. The FDT is cleaned up in the case of unplug.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 249 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index a5d7791..94e33b4 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -33,6 +33,7 @@
> >  #include <libfdt.h>
> >  #include "trace.h"
> >  #include "qemu/error-report.h"
> > +#include "qapi/qmp/qerror.h"
> >  
> >  #include "hw/pci/pci_bus.h"
> >  
> > @@ -51,6 +52,15 @@
> >  
> >  #include "hw/ppc/spapr_drc.h"
> >  
> > +#define FDT_MAX_SIZE            0x10000
> > +#define _FDT(exp) \
> > +    do { \
> > +        int ret = (exp);                                           \
> > +        if (ret < 0) {                                             \
> > +            return ret;                                            \
> > +        }                                                          \
> > +    } while (0)
> > +
> >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> >  {
> >      sPAPRPHBState *sphb;
> > @@ -483,6 +493,237 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >      return &phb->iommu_as;
> >  }
> >  
> > +/* Macros to operate with address in OF binding to PCI */
> > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > +
> > +/* for 'reg'/'assigned-addresses' OF properties */
> > +#define RESOURCE_CELLS_SIZE 2
> > +#define RESOURCE_CELLS_ADDRESS 3
> > +#define RESOURCE_CELLS_TOTAL \
> > +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> > +
> > +static void fill_resource_props(PCIDevice *d, int bus_num,
> > +                                uint32_t *reg, int *reg_size,
> > +                                uint32_t *assigned, int *assigned_size)
> 
> This is another interface which writes to a buffer without any size
> limit information being passed through, which makes me nervous.
> 
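
For readers following the b_* macros, here is a small worked example
of how the first ("phys.hi") cell of a reg/assigned-addresses row is
composed. The B_X macro mirrors b_x() from the patch with unsigned
arithmetic; model_phys_hi() is a hypothetical helper added for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit-field placement as b_x() in the patch, kept unsigned. */
#define B_X(x, p, l) ((((uint32_t)(x)) & ((1u << (l)) - 1u)) << (p))

/*
 * Build the phys.hi cell from its fields:
 *   ss  (bits 25:24) space code, 1 = I/O, 2 = 32-bit memory
 *   bus (bits 23:16), dev (bits 15:11), fn (bits 10:8), reg (bits 7:0)
 */
static uint32_t model_phys_hi(unsigned ss, unsigned bus, unsigned dev,
                              unsigned fn, unsigned reg)
{
    return B_X(ss, 24, 2) | B_X(bus, 16, 8) | B_X(dev, 11, 5) |
           B_X(fn, 8, 3) | B_X(reg, 0, 8);
}
```

For example, a memory BAR at config offset 0x10 of device 3, function
0 on bus 0 gives 0x02001810, and OR-ing in B_X(1, 31, 1) sets the
non-relocatable bit used for assigned-addresses.
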
> > +{
> > +    uint32_t *reg_row, *assigned_row;
> > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > +                       b_fff(PCI_FUNC(d->devfn)));
> > +    int i, idx = 0;
> > +
> > +    reg[0] = cpu_to_be32(dev_id);
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        if (!d->io_regions[i].size) {
> > +            continue;
> > +        }
> > +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> > +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> > +        reg_row[0] = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > +            reg_row[0] |= cpu_to_be32(b_ss(1));
> > +        } else {
> > +            reg_row[0] |= cpu_to_be32(b_ss(2));
> > +        }
> > +        assigned_row[0] = cpu_to_be32(reg_row[0] | b_n(1));
> > +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> > +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> > +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> > +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
> 
> You don't appear to ever fill in reg_row[1] and reg_row[2].
> 
> > +        idx++;
> > +    }
> > +
> > +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +}
> > +
> > +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> > +                                       int phb_index, int drc_index)
> > +{
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    char slotname[16];
> > +    bool is_bridge = 1;
> 
> Should use the true and false macros for a bool type, not 0 and 1.
> 
> > +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    int pci_status, reg_size, assigned_size;
> > +
> > +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> > +        PCI_HEADER_TYPE_NORMAL) {
> > +        is_bridge = 0;
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> > +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> > +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> > +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> > +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> > +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> > +
> > +    /* if this device is NOT a bridge */
> > +    if (!is_bridge) {
> > +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> > +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> > +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> > +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> > +
> > +    /* the following fdt cells are masked off the pci status register */
> > +    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> > +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> > +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> > +                          PCI_STATUS_FAST_BACK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> > +                          PCI_STATUS_66MHZ & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> > +                          PCI_STATUS_UDF & pci_status));
> 
> These aren't quite right.  According to the OF PCI binding these are
> boolean properties encoded in the usual way, which is to say absent
> for false and present-but-empty for true.   They shouldn't contain an
> actual value.
> 
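
A sketch of what the presence-based encoding could look like for the
three single-bit status flags. model_bool_props() and the MODEL_*
constants are hypothetical names for this example (the bit values are
the standard PCI status register bits); devsel-speed is deliberately
left out of the sketch. With libfdt, each returned name would then be
created as an empty property, e.g. via
fdt_setprop(fdt, offset, name, NULL, 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI status register bits, renamed to avoid header clashes. */
#define MODEL_PCI_STATUS_66MHZ     0x20
#define MODEL_PCI_STATUS_UDF       0x40
#define MODEL_PCI_STATUS_FAST_BACK 0x80

/*
 * Return the names of the boolean properties that should be present for
 * the given status value. A flag that is clear produces no property at
 * all, rather than a property holding 0.
 */
static size_t model_bool_props(uint16_t status, const char **out, size_t cap)
{
    static const struct {
        uint16_t bit;
        const char *name;
    } flags[] = {
        { MODEL_PCI_STATUS_FAST_BACK, "fast-back-to-back" },
        { MODEL_PCI_STATUS_66MHZ,     "66mhz-capable" },
        { MODEL_PCI_STATUS_UDF,       "udf-supported" },
    };
    size_t i, n = 0;

    for (i = 0; i < sizeof(flags) / sizeof(flags[0]) && n < cap; i++) {
        if (status & flags[i].bit) {
            out[n++] = flags[i].name;
        }
    }
    return n;
}
```
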
> > +
> > +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> > +    sprintf(slotname, "Slot %d", slot + phb_index * PCI_SLOT_MAX);
> > +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> > +                          RESOURCE_CELLS_ADDRESS));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> > +                          RESOURCE_CELLS_SIZE));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> > +                          RESOURCE_CELLS_SIZE));
> > +    fill_resource_props(dev, phb_index, reg, &reg_size,
> > +                        assigned, &assigned_size);
> > +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> > +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> > +                     assigned, assigned_size));
> > +
> > +    return 0;
> > +}
> > +
> > +/* create OF node for pci device and required OF DT properties */
> > +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> > +                                       int drc_index, int *dt_offset)
> > +{
> > +    void *fdt_orig, *fdt;
> > +    int offset, ret;
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    char nodename[512];
> > +
> > +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> > +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> > +    fdt_begin_node(fdt_orig, "");
> > +    fdt_end_node(fdt_orig);
> > +    fdt_finish(fdt_orig);
> 
> Recent versions of libfdt have an fdt_create_empty_tree() function to
> simplify that standard idiom.

Hmm, it doesn't seem to be in the source that qemu.git/dtc points to, so I'm
hesitant to rely on it. Would it be viable to get the QEMU submodule
updated to v1.4.0?

> 
> > +    fdt = g_malloc0(FDT_MAX_SIZE);
> > +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> 
> There's no need for a second malloc here - fdt_open_into() may be used
> in place.
> 
> > +    sprintf(nodename, "pci@%d", slot);
> > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
> > +    g_assert(!ret);
> > +    g_free(fdt_orig);
> > +
> > +    *dt_offset = offset;
> > +    return fdt;
> > +}
> > +
> > +static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
> > +                                     sPAPRPHBState *phb,
> > +                                     PCIDevice *pdev)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    DeviceState *dev = DEVICE(pdev);
> > +    int drc_index = drck->get_index(drc);
> > +    void *fdt = NULL;
> > +    int fdt_start_offset = 0;
> > +
> > +    /* boot-time devices get their device tree node created by SLOF, but for
> > +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> > +     * it via RTAS
> 
> Now that we have to have this code in qemu for the hotplug case we may
> want to consider using it for boot-time devices as well, and removing
> the corresponding code from SLOF, but that's a problem for another day.

Makes sense, since we do this for PHBs already. Can look into it as a follow-up.

> 
> > +     */
> > +    if (dev->hotplugged) {
> > +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
> > +                                        &fdt_start_offset);
> > +    }
> > +    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
> > +}
> > +
> > +static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
> > +{
> > +    object_unparent(OBJECT(dev));
> > +}
> > +
> > +static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
> > +                                        sPAPRPHBState *phb,
> > +                                        PCIDevice *pdev)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
> > +}
> > +
> > +static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> > +                               DeviceState *plugged_dev, Error **errp)
> 
> So, this function is hotplugging a PCI device into an existing PHB,
> rather than hotplugging a PHB itself.  Since the DR protocol does
> support both operations, I could see this name becoming confusing.
> 
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> > +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> 
> Is it safe to call this before checking phb->dr_enabled?

It will be NULL if the DRC wasn't created, so the assertion below the check
should catch any misuse before it happens.

> 
> > +    /* if DR is disabled we don't need to do anything in the case of
> > +     * hotplug or coldplug callbacks
> > +     */
> > +    if (!phb->dr_enabled) {
> > +        /* if this is a hotplug operation initiated by the user
> > +         * we need to let them know it's not enabled
> > +         */
> > +        if (plugged_dev->hotplugged) {
> > +            error_set(errp, QERR_BUS_NO_HOTPLUG,
> > +                      object_get_typename(OBJECT(phb)));
> > +        }
> > +        return;
> > +    }
> > +
> > +    g_assert(drc);
> > +    spapr_device_hotplug_add(drc, phb, pdev);
> > +}
> > +
> > +static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> > +                                 DeviceState *plugged_dev, Error **errp)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> > +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> > +
> > +    if (!phb->dr_enabled) {
> > +        error_set(errp, QERR_BUS_NO_HOTPLUG,
> > +                  object_get_typename(OBJECT(phb)));
> > +        return;
> > +    }
> > +
> > +    spapr_device_hotplug_remove(drc, phb, pdev);
> > +}
> > +
> >  static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >  {
> >      SysBusDevice *s = SYS_BUS_DEVICE(dev);
> > @@ -570,6 +811,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >                             &sphb->memspace, &sphb->iospace,
> >                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> >      phb->bus = bus;
> > +    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
> >  
> >      /*
> >       * Initialize PHB address space.
> > @@ -806,6 +1048,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
> >      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> >      DeviceClass *dc = DEVICE_CLASS(klass);
> >      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> > +    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
> >  
> >      hc->root_bus_path = spapr_phb_root_bus_path;
> >      dc->realize = spapr_phb_realize;
> > @@ -815,6 +1058,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
> >      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> >      dc->cannot_instantiate_with_device_add_yet = false;
> >      spc->finish_realize = spapr_phb_finish_realize;
> > +    hp->plug = spapr_phb_hot_plug;
> > +    hp->unplug = spapr_phb_hot_unplug;
> >  }
> >  
> >  static const TypeInfo spapr_phb_info = {
> > @@ -823,6 +1068,10 @@ static const TypeInfo spapr_phb_info = {
> >      .instance_size = sizeof(sPAPRPHBState),
> >      .class_init    = spapr_phb_class_init,
> >      .class_size    = sizeof(sPAPRPHBClass),
> > +    .interfaces    = (InterfaceInfo[]) {
> > +        { TYPE_HOTPLUG_HANDLER },
> > +        { }
> > +    }
> >  };
> >  
> >  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> > @@ -836,17 +1085,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> >      return PCI_HOST_BRIDGE(dev);
> >  }
> >  
> > -/* Macros to operate with address in OF binding to PCI */
> > -#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > -#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > -#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > -#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > -#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > -#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > -#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > -#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > -#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > -
> >  typedef struct sPAPRTCEDT {
> >      void *fdt;
> >      int node_off;
> > @@ -906,14 +1144,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >          return bus_off;
> >      }
> >  
> > -#define _FDT(exp) \
> > -    do { \
> > -        int ret = (exp);                                           \
> > -        if (ret < 0) {                                             \
> > -            return ret;                                            \
> > -        }                                                          \
> > -    } while (0)
> > -
> >      /* Write PHB properties */
> >      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
> >      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2015-01-23  5:17   ` Alexey Kardashevskiy
@ 2015-01-26 21:20     ` Michael Roth
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26 21:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting Alexey Kardashevskiy (2015-01-22 23:17:21)
> On 12/23/2014 11:30 PM, Michael Roth wrote:
> > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > OF-nodes required by PAPR specification and IEEE 1275-1994
> > "PCI Bus Binding to Open Firmware" for the device.
> > 
> > We associate the corresponding FDT for these nodes with the DrcEntry
> > corresponding to the slot, which will be fetched via
> > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > specification. The FDT is cleaned up in the case of unplug.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 249 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index a5d7791..94e33b4 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -33,6 +33,7 @@
> >  #include <libfdt.h>
> >  #include "trace.h"
> >  #include "qemu/error-report.h"
> > +#include "qapi/qmp/qerror.h"
> >  
> >  #include "hw/pci/pci_bus.h"
> >  
> > @@ -51,6 +52,15 @@
> >  
> >  #include "hw/ppc/spapr_drc.h"
> >  
> > +#define FDT_MAX_SIZE            0x10000
> > +#define _FDT(exp) \
> > +    do { \
> > +        int ret = (exp);                                           \
> > +        if (ret < 0) {                                             \
> > +            return ret;                                            \
> > +        }                                                          \
> > +    } while (0)
> > +
> >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> >  {
> >      sPAPRPHBState *sphb;
> > @@ -483,6 +493,237 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >      return &phb->iommu_as;
> >  }
> >  
> > +/* Macros to operate with address in OF binding to PCI */
> > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > +
> > +/* for 'reg'/'assigned-addresses' OF properties */
> > +#define RESOURCE_CELLS_SIZE 2
> > +#define RESOURCE_CELLS_ADDRESS 3
> > +#define RESOURCE_CELLS_TOTAL \
> > +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> > +
> > +static void fill_resource_props(PCIDevice *d, int bus_num,
> > +                                uint32_t *reg, int *reg_size,
> > +                                uint32_t *assigned, int *assigned_size)
> > +{
> > +    uint32_t *reg_row, *assigned_row;
> > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > +                       b_fff(PCI_FUNC(d->devfn)));
> > +    int i, idx = 0;
> > +
> > +    reg[0] = cpu_to_be32(dev_id);
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        if (!d->io_regions[i].size) {
> > +            continue;
> > +        }
> > +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> > +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> > +        reg_row[0] = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > +            reg_row[0] |= cpu_to_be32(b_ss(1));
> > +        } else {
> > +            reg_row[0] |= cpu_to_be32(b_ss(2));
> > +        }
> > +        assigned_row[0] = cpu_to_be32(reg_row[0] | b_n(1));
> > +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> > +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> > +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> > +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
> > +        idx++;
> > +    }
> > +
> > +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +}
> > +
> > +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> > +                                       int phb_index, int drc_index)
> > +{
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    char slotname[16];
> > +    bool is_bridge = 1;
> > +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    int pci_status, reg_size, assigned_size;
> > +
> > +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> > +        PCI_HEADER_TYPE_NORMAL) {
> > +        is_bridge = 0;
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> > +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> > +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> > +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> > +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> > +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> > +
> > +    /* if this device is NOT a bridge */
> > +    if (!is_bridge) {
> > +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> > +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> > +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> > +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> > +
> > +    /* the following fdt cells are masked off the pci status register */
> > +    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> > +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> > +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> > +                          PCI_STATUS_FAST_BACK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> > +                          PCI_STATUS_66MHZ & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> > +                          PCI_STATUS_UDF & pci_status));
> > +
> > +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> > +    sprintf(slotname, "Slot %d", slot + phb_index * PCI_SLOT_MAX);
> > +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> > +                          RESOURCE_CELLS_ADDRESS));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> > +                          RESOURCE_CELLS_SIZE));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> > +                          RESOURCE_CELLS_SIZE));
> > +    fill_resource_props(dev, phb_index, reg, &reg_size,
> > +                        assigned, &assigned_size);
> > +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> > +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> > +                     assigned, assigned_size));
> > +
> > +    return 0;
> > +}
> > +
> > +/* create OF node for pci device and required OF DT properties */
> > +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> > +                                       int drc_index, int *dt_offset)
> > +{
> > +    void *fdt_orig, *fdt;
> > +    int offset, ret;
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    char nodename[512];
> > +
> > +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> > +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> > +    fdt_begin_node(fdt_orig, "");
> > +    fdt_end_node(fdt_orig);
> > +    fdt_finish(fdt_orig);
> > +
> > +    fdt = g_malloc0(FDT_MAX_SIZE);
> > +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> > +    sprintf(nodename, "pci@%d", slot);
> > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
> > +    g_assert(!ret);
> > +    g_free(fdt_orig);
> > +
> > +    *dt_offset = offset;
> > +    return fdt;
> > +}
> > +
> > +static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
> > +                                     sPAPRPHBState *phb,
> > +                                     PCIDevice *pdev)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    DeviceState *dev = DEVICE(pdev);
> > +    int drc_index = drck->get_index(drc);
> > +    void *fdt = NULL;
> > +    int fdt_start_offset = 0;
> > +
> > +    /* boot-time devices get their device tree node created by SLOF, but for
> > +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> > +     * it via RTAS
> > +     */
> > +    if (dev->hotplugged) {
> > +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
> > +                                        &fdt_start_offset);
> > +    }
> > +    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
> > +}
> > +
> > +static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
> > +{
> 
> 
> I believe pci_device_reset(ccs->dev) is missing here as we need to deassert
> INTx or otherwise we hit assert in pcibus_reset().

I was thinking this would be handled in the PCI unplug path elsewhere, but it
looks like we only do it for realize(). Strange that it doesn't seem to be
needed on x86...maybe a difference in guest kernels. Will confirm and add for
v5 if that's the case.
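
For reference, a rough standalone sketch of the ordering under discussion: deassert INTx before the device is dropped, so the bus-level interrupt count is back at zero by the time the bus is reset. Everything here (ToyBus, ToyPCIDevice, toy_set_irq, toy_device_reset, toy_remove_cb) is a hypothetical stand-in and not QEMU's API; in the real callback this would presumably amount to a pci_device_reset() call ahead of object_unparent().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus that tracks how many INTx lines are currently asserted. */
typedef struct ToyBus {
    int irq_count;
} ToyBus;

/* Hypothetical device with a single level-triggered INTx line. */
typedef struct ToyPCIDevice {
    ToyBus *bus;
    int irq_level;
} ToyPCIDevice;

/* Change the device's INTx level and keep the bus-wide count in sync. */
static void toy_set_irq(ToyPCIDevice *d, int level)
{
    d->bus->irq_count += level - d->irq_level;
    d->irq_level = level;
}

/* The part of a device reset that matters here: deassert INTx. */
static void toy_device_reset(ToyPCIDevice *d)
{
    toy_set_irq(d, 0);
}

/* Removal callback: reset first, then detach the device from its bus. */
static void toy_remove_cb(ToyPCIDevice *d)
{
    toy_device_reset(d);
    d->bus = NULL;
}
```

Without the reset step, a device removed while asserting INTx would leave the bus count non-zero, which is the condition the assert mentioned above would trip on.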

> 
> 
> > +    object_unparent(OBJECT(dev));
> > +}
> > +
> > +static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
> > +                                        sPAPRPHBState *phb,
> > +                                        PCIDevice *pdev)
> > +{
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
> > +}
> > +
> > +static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> > +                               DeviceState *plugged_dev, Error **errp)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> > +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> > +
> > +    /* if DR is disabled we don't need to do anything in the case of
> > +     * hotplug or coldplug callbacks
> > +     */
> > +    if (!phb->dr_enabled) {
> > +        /* if this is a hotplug operation initiated by the user
> > +         * we need to let them know it's not enabled
> > +         */
> > +        if (plugged_dev->hotplugged) {
> > +            error_set(errp, QERR_BUS_NO_HOTPLUG,
> > +                      object_get_typename(OBJECT(phb)));
> > +        }
> > +        return;
> > +    }
> > +
> > +    g_assert(drc);
> > +    spapr_device_hotplug_add(drc, phb, pdev);
> > +}
> > +
> > +static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> > +                                 DeviceState *plugged_dev, Error **errp)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> > +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> > +
> > +    if (!phb->dr_enabled) {
> > +        error_set(errp, QERR_BUS_NO_HOTPLUG,
> > +                  object_get_typename(OBJECT(phb)));
> > +        return;
> > +    }
> > +
> > +    spapr_device_hotplug_remove(drc, phb, pdev);
> > +}
> > +
> >  static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >  {
> >      SysBusDevice *s = SYS_BUS_DEVICE(dev);
> > @@ -570,6 +811,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >                             &sphb->memspace, &sphb->iospace,
> >                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> >      phb->bus = bus;
> > +    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
> >  
> >      /*
> >       * Initialize PHB address space.
> > @@ -806,6 +1048,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
> >      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> >      DeviceClass *dc = DEVICE_CLASS(klass);
> >      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> > +    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
> >  
> >      hc->root_bus_path = spapr_phb_root_bus_path;
> >      dc->realize = spapr_phb_realize;
> > @@ -815,6 +1058,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
> >      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> >      dc->cannot_instantiate_with_device_add_yet = false;
> >      spc->finish_realize = spapr_phb_finish_realize;
> > +    hp->plug = spapr_phb_hot_plug;
> > +    hp->unplug = spapr_phb_hot_unplug;
> >  }
> >  
> >  static const TypeInfo spapr_phb_info = {
> > @@ -823,6 +1068,10 @@ static const TypeInfo spapr_phb_info = {
> >      .instance_size = sizeof(sPAPRPHBState),
> >      .class_init    = spapr_phb_class_init,
> >      .class_size    = sizeof(sPAPRPHBClass),
> > +    .interfaces    = (InterfaceInfo[]) {
> > +        { TYPE_HOTPLUG_HANDLER },
> > +        { }
> > +    }
> >  };
> >  
> >  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> > @@ -836,17 +1085,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> >      return PCI_HOST_BRIDGE(dev);
> >  }
> >  
> > -/* Macros to operate with address in OF binding to PCI */
> > -#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > -#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > -#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > -#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > -#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > -#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > -#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > -#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > -#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > -
> >  typedef struct sPAPRTCEDT {
> >      void *fdt;
> >      int node_off;
> > @@ -906,14 +1144,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >          return bus_off;
> >      }
> >  
> > -#define _FDT(exp) \
> > -    do { \
> > -        int ret = (exp);                                           \
> > -        if (ret < 0) {                                             \
> > -            return ret;                                            \
> > -        }                                                          \
> > -    } while (0)
> > -
> >      /* Write PHB properties */
> >      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
> >      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
> > 
> 
> 
> -- 
> Alexey

* Re: [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug
  2015-01-19  6:00   ` David Gibson
@ 2015-01-26 21:32     ` Michael Roth
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2015-01-26 21:32 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-19 00:00:00)
> On Tue, Dec 23, 2014 at 06:30:31AM -0600, Michael Roth wrote:
> > From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> > 
> > This uses an extension of the existing EPOW interrupt/event mechanism
> > to notify userspace tools like librtas/drmgr so they can handle
> > in-guest configuration/cleanup operations in response to
> > device_add/device_del.
> > 
> > Userspace tools that don't implement this extension will need
> > to be run manually after device_add or before device_del,
> > respectively.
> > 
> > Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 94e33b4..f17f984 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -705,6 +705,9 @@ static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> >  
> >      g_assert(drc);
> >      spapr_device_hotplug_add(drc, phb, pdev);
> > +    if (plugged_dev->hotplugged) {
> > +        spapr_hotplug_req_add_event(drc);
> > +    }
> >  }
> >  
> >  static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> > @@ -722,6 +725,7 @@ static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
> >      }
> >  
> >      spapr_device_hotplug_remove(drc, phb, pdev);
> > +    spapr_hotplug_req_remove_event(drc);
> 
> The event is sent after the "physical" remove.  Is that correct?

From the guest perspective it doesn't really matter since we default to an
isolation state of UNISOLATED, so we always end up waiting for the
guest-induced transition to ISOLATED before completing the removal (or a
reboot, in which case the event is not needed).
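
To make the ordering argument concrete, here is a minimal standalone sketch of that gating. The ToyDRC type and the toy_drc_* helpers are hypothetical and only model the behaviour described above; they are not the DRC code from this series.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TOY_ISOLATED,
    TOY_UNISOLATED,
} ToyIsolationState;

/* Hypothetical connector model: one slot, one attached device. */
typedef struct ToyDRC {
    ToyIsolationState isolation_state;
    bool awaiting_release;   /* detach requested, waiting on the guest */
    bool device_present;
} ToyDRC;

static void toy_drc_init(ToyDRC *drc)
{
    drc->isolation_state = TOY_UNISOLATED;   /* the default noted above */
    drc->awaiting_release = false;
    drc->device_present = true;
}

/* Host-side unplug request: completes only once the guest has isolated. */
static void toy_drc_detach(ToyDRC *drc)
{
    if (drc->isolation_state != TOY_ISOLATED) {
        drc->awaiting_release = true;
        return;
    }
    drc->device_present = false;
    drc->awaiting_release = false;
}

/* Guest-side transition; finishes a pending detach on ISOLATED. */
static void toy_drc_set_isolation_state(ToyDRC *drc, ToyIsolationState state)
{
    drc->isolation_state = state;
    if (state == TOY_ISOLATED && drc->awaiting_release) {
        toy_drc_detach(drc);
    }
}

/* Reset path: finish any removal the guest never acknowledged. */
static void toy_drc_reset(ToyDRC *drc)
{
    if (drc->awaiting_release) {
        drc->isolation_state = TOY_ISOLATED;
        toy_drc_detach(drc);
    }
}
```

In this model, whether the remove event is raised before or after toy_drc_detach() makes no difference to the guest: the device stays present until the guest isolates the connector or the machine is reset.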

Thank you for the excellent review. I've responded where clarification
seemed warranted, but otherwise plan on addressing all the comments in
v5 which should go out soon.

> 
> >  }
> >  
> >  static void spapr_phb_realize(DeviceState *dev, Error **errp)
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-26  5:21     ` Michael Roth
@ 2015-01-27  5:24       ` David Gibson
  2015-01-27 21:36         ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2015-01-27  5:24 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 3541 bytes --]

On Sun, Jan 25, 2015 at 11:21:26PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-01-16 00:21:55)
> > On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
> > > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > > 
> > > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
> > >  1 file changed, 25 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index 2ec2a8e..a2fb533 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
> > >      rtas_st(rets, 0, ret);
> > >  }
> > >  
> > > +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > +                                 uint32_t token, uint32_t nargs,
> > > +                                 target_ulong args, uint32_t nret,
> > > +                                 target_ulong rets)
> > > +{
> > > +    /* we currently only use a single, "live insert" powerdomain for
> > > +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
> > > +     */
> > 
> > > Even so, you should at least validate the number of args and rets, and
> > > preferably check that the user isn't attempting to set something for some
> > > other, non-existent power domain.
> > 
> > > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +    rtas_st(rets, 1, 100);
> > > +}
> > > +
> > > +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > +                                  uint32_t token, uint32_t nargs,
> > > +                                  target_ulong args, uint32_t nret,
> > > +                                  target_ulong rets)
> > > +{
> > > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +    rtas_st(rets, 1, 100);
> > > +}
> > > +
> > >  static struct rtas_call {
> > >      const char *name;
> > >      spapr_rtas_fn fn;
> > > @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
> > >                          rtas_ibm_set_system_parameter);
> > >      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
> > >                          rtas_ibm_os_term);
> > > +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
> > > +                        rtas_set_power_level);
> > > +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> > > +                        rtas_get_power_level);
> > >  }
> > >  
> > >  type_init(core_rtas_register_types)
> > 
> > > This code should probably go in spapr_drc.c.  The idea was that spapr_rtas
> > > would hold just the RTAS dispatch code, plus RTAS functions that had no
> > > other home.  Generally RTAS functions should live with the devices they're
> > > connected to.
> 
> In this particular case the calls act on a "power domain" which isn't
> actually modeled in QEMU (we just assume a single "live-insertion" domain
> which just magically does everything we want), so I think it makes
> sense to leave these here.

Yeah, fair enough.

> But for the others it does make sense to tie them with spapr_drc.c, or
> maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
> well-defined accessors.

Ok.
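
On the earlier point about validating args/rets in the power-level calls, a rough standalone sketch of the kind of check meant. This assumes set-power-level takes two arguments (power domain, level) and returns two values, and that the single live-insertion domain is identified by -1; both assumptions should be checked against PAPR. The status values and toy_* names are hypothetical and are not QEMU's RTAS_OUT_* constants or rtas_ld()/rtas_st() helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary toy status codes, for illustration only. */
enum {
    TOY_SUCCESS       = 0,
    TOY_PARAM_ERROR   = -1,
    TOY_NOT_SUPPORTED = -2,
};

#define TOY_LIVE_INSERT_DOMAIN (-1)   /* assumed id of the only modeled domain */
#define TOY_FULL_POWER         100

static void toy_set_power_level(uint32_t nargs, const int32_t *args,
                                uint32_t nret, int32_t *rets)
{
    /* Reject calls with the wrong argument/return counts up front. */
    if (nargs != 2 || nret != 2) {
        if (nret > 0) {
            rets[0] = TOY_PARAM_ERROR;
        }
        return;
    }

    /* Only the single live-insertion power domain exists in this model. */
    if (args[0] != TOY_LIVE_INSERT_DOMAIN) {
        rets[0] = TOY_NOT_SUPPORTED;
        return;
    }

    /* Power is always live/full, whatever level was requested. */
    rets[0] = TOY_SUCCESS;
    rets[1] = TOY_FULL_POWER;
}
```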

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-01-26 16:56     ` Michael Roth
@ 2015-01-27  5:27       ` David Gibson
  2015-01-28  3:56       ` Bharata B Rao
  1 sibling, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-27  5:27 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 6042 bytes --]

On Mon, Jan 26, 2015 at 10:56:51AM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-01-18 22:31:23)
> > On Tue, Dec 23, 2014 at 06:30:22AM -0600, Michael Roth wrote:
> > > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[snip]
> > > +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > > +{
> > > +    struct hp_log_full *new_hp;
> > > +    struct rtas_error_log *hdr;
> > > +    struct rtas_event_log_v6 *v6hdr;
> > > +    struct rtas_event_log_v6_maina *maina;
> > > +    struct rtas_event_log_v6_mainb *mainb;
> > > +    struct rtas_event_log_v6_hp *hp;
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > > +
> > > +    new_hp = g_malloc0(sizeof(struct hp_log_full));
> > > +    hdr = &new_hp->hdr;
> > > +    v6hdr = &new_hp->v6hdr;
> > > +    maina = &new_hp->maina;
> > > +    mainb = &new_hp->mainb;
> > > +    hp = &new_hp->hp;
> > > +
> > > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > > +                               | RTAS_LOG_SEVERITY_EVENT
> > > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > > +                               | RTAS_LOG_INITIATOR_HOTPLUG
> > > +                               | RTAS_LOG_TYPE_HOTPLUG);
> > > +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
> > > +                                       - sizeof(new_hp->hdr));
> > > +
> > > +    spapr_init_v6hdr(v6hdr);
> > > +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> > > +
> > > +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> > > +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > > +    mainb->subsystem_id = 0x80; /* External environment */
> > > +    mainb->event_severity = 0x00; /* Informational / non-error */
> > > +    mainb->event_subtype = 0x00; /* Normal shutdown */
> > > +
> > > +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> > > +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> > > +    hp->hdr.section_version = 1; /* includes extended modifier */
> > > +    hp->hotplug_action = hp_action;
> > > +
> > > +
> > > +    switch (drc_type) {
> > > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> > > +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> > > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> > > +        break;
> > > +    default:
> > > +        /* skip notification for unknown connector types */
> > > +        g_free(new_hp);
> > > +        return;
> > > +    }
> > > +
> > > +    if (pending_hp) {
> > > +        /* Just toss any pending hotplug events for now, this will
> > > +         * need to be fixed later on.
> > > +         */
> > 
> > So, we can get away with a 1-element queue for EPOW, because they're
> > just triggering a shutdown - so once the first one's processed, any
> > others aren't going to matter.  For hotplug you really do need a
> > proper queue.
> 
> Yah, this was discussed in the past, but until now I didn't notice how
> easy it would be to trigger this when hotplugging multiple devices from
> a script or management harness of some sort. Should be simple enough to
> fix for v5 though.

Ok, good.
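
As a rough illustration of what a proper queue could look like in place of the single pending_hp slot, here is a standalone FIFO sketch. ToyHotplugEvent, ToyEventQueue and the toy_event_* helpers are hypothetical names; inside QEMU one would more likely reach for the existing QTAILQ macros than a hand-rolled list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event record; the real log entry carries far more state. */
typedef struct ToyHotplugEvent {
    uint32_t drc_index;
    uint8_t action;                  /* add or remove */
    struct ToyHotplugEvent *next;
} ToyHotplugEvent;

typedef struct ToyEventQueue {
    ToyHotplugEvent *head;
    ToyHotplugEvent *tail;
} ToyEventQueue;

/* Append an event; returns 0 on success, -1 if allocation fails. */
static int toy_event_push(ToyEventQueue *q, uint32_t drc_index, uint8_t action)
{
    ToyHotplugEvent *ev = calloc(1, sizeof(*ev));

    if (!ev) {
        return -1;
    }
    ev->drc_index = drc_index;
    ev->action = action;
    if (q->tail) {
        q->tail->next = ev;
    } else {
        q->head = ev;
    }
    q->tail = ev;
    return 0;
}

/* Remove and return the oldest event, or NULL if empty. Caller frees it. */
static ToyHotplugEvent *toy_event_pop(ToyEventQueue *q)
{
    ToyHotplugEvent *ev = q->head;

    if (!ev) {
        return NULL;
    }
    q->head = ev->next;
    if (!q->head) {
        q->tail = NULL;
    }
    ev->next = NULL;
    return ev;
}
```

With this shape, each hotplug request appends an entry and each check-exception call pops one, so back-to-back device_add calls from a script no longer overwrite each other.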

> > > +        g_free(pending_hp);
> > > +    }
> > > +    pending_hp = new_hp;
> > > +
> > > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > > +}
> > > +
> > > +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
> > > +{
> > > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
> > > +}
> > > +
> > > +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
> > > +{
> > > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
> > >  }
> > >  
> > >  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > @@ -298,15 +420,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > >          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
> > >      }
> > >  
> > > -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> > > -        if (sizeof(*pending_epow) < len) {
> > > -            len = sizeof(*pending_epow);
> > > -        }
> > > +    if (mask & EVENT_MASK_EPOW) {
> > > +        if (pending_epow) {
> > > +            if (sizeof(*pending_epow) < len) {
> > > +                len = sizeof(*pending_epow);
> > > +            }
> > >  
> > > -        cpu_physical_memory_write(buf, pending_epow, len);
> > > -        g_free(pending_epow);
> > > -        pending_epow = NULL;
> > > -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +            cpu_physical_memory_write(buf, pending_epow, len);
> > > +            g_free(pending_epow);
> > > +            pending_epow = NULL;
> > > +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > +        } else if (pending_hp) {
> > 
> > So.. the hotplug messages are a different type from EPOW, but are
> > still selected by EVENT_MASK_EPOW?  Seems a bit odd.
> 
> It's a little odd, but it's mainly just due to the way we surface the
> hotplug event. Rather than requiring patched guest kernels, we opted
> to re-use and generalize the EPOW IRQ to also surface hotplug-related
> RTAS events, which is why we still expect the EVENT_MASK_EPOW when
> returning an HP event via check-exception.
> 
> EPOW events have well-defined behavior in how they're exposed to
> userspace via rtas_errd, which allows us to add hotplug support for
> older guests via patched userspace tools.

Ok.  Kind of a hack, but for good reasons.  I kind of think it needs a
comment, but I'm not sure where it would belong.
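
One candidate spot is the dispatch itself. A standalone sketch of how the comment might sit next to the mask check follows; the mask value, the ToyEventKind enum and toy_check_exception() are hypothetical and only mirror the EPOW-first, hotplug-second ordering in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_EVENT_MASK_EPOW 0x10000000u   /* illustrative value */

typedef enum {
    TOY_EVENT_NONE,
    TOY_EVENT_EPOW,
    TOY_EVENT_HOTPLUG,
} ToyEventKind;

static ToyEventKind toy_check_exception(uint32_t mask,
                                        bool *pending_epow,
                                        bool *pending_hp)
{
    /*
     * Hotplug events are deliberately surfaced under the EPOW mask:
     * the EPOW interrupt is re-used so that unpatched guest kernels
     * pass the event through to userspace tools, which then handle
     * the hotplug request.
     */
    if (!(mask & TOY_EVENT_MASK_EPOW)) {
        return TOY_EVENT_NONE;
    }
    if (*pending_epow) {
        *pending_epow = false;
        return TOY_EVENT_EPOW;
    }
    if (*pending_hp) {
        *pending_hp = false;
        return TOY_EVENT_HOTPLUG;
    }
    return TOY_EVENT_NONE;
}
```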

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt()
  2015-01-26 20:35     ` Michael Roth
@ 2015-01-27  5:30       ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-27  5:30 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

On Mon, Jan 26, 2015 at 02:35:58PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-01-18 23:15:28)
> > On Tue, Dec 23, 2014 at 06:30:24AM -0600, Michael Roth wrote:
[snip]
> > > +/* generate a human-readable name for a DRC to encode into the DT
> > > + * description. this is mainly only used within a guest in place
> > > + * of the unique DRC index.
> > > + *
> > > + * in the case of VIO/PCI devices, it corresponds to a
> > > + * "location code" that maps a logical device/function (DRC index)
> > > + * to a physical (or virtual in the case of VIO) location in the
> > > + * system by chaining together the "location label" for each
> > > + * encapsulating component.
> > > + *
> > > + * since this is more to do with diagnosing physical hardware
> > > + * issues than guest compatibility, we choose location codes/DRC
> > > + * names that adhere to the documented format, but avoid encoding
> > > + * the entire topology information into the label/code, instead
> > > + * just using the location codes based on the labels for the
> > > + * endpoints (VIO/PCI adaptor connectors), which is basically
> > > + * just "C" followed by an integer ID.
> > 
> > Hrm.. would it make sense to include here the qemu "id" value on the
> > DRC device?  That will make names which are matchable to specific
> > elements on the qemu command line, which is about as close an equivalent
> > to a physical location as I can think of.
> 
> I'm not sure I understand the suggestion. We do make use of the
> drc->id values to generate this, though those don't really
> correspond to "id"/DeviceState->id properties as specified on
> the command-line. There are currently no plans to create the DRCs via
> -device since the IDs are dependent on/chosen by the parent devices
> in this case (DRC IDs for PCI slots inherit/encode parent bus/controller
> index for example). Did you have something else in mind?

I guess I was thinking of building this location code from the
DeviceState->id of the bus bridge (or whatever) which created the DRC
slot.  Basically giving the user a handle on which qemu parameter is
related to this hotplug slot.
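
To make the two options concrete, here is a rough sketch. With no parent id it produces the "C" followed by an integer ID form described in the patch comment. With a parent id it prepends that id, which is only one hypothetical way of encoding the bridge's DeviceState->id; the separator and the function name are assumptions for illustration, not a documented location-code format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a DRC name into buf.
 *
 * parent_id == NULL: "C<drc_id>", as in the patch comment.
 * parent_id != NULL: "<parent_id>-C<drc_id>", a hypothetical format that
 *                    ties the slot back to a qemu command-line id.
 *
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
static int sketch_drc_name(char *buf, size_t len,
                           const char *parent_id, uint32_t drc_id)
{
    int n;

    assert(buf != NULL && len > 0);

    if (parent_id != NULL) {
        n = snprintf(buf, len, "%s-C%u", parent_id, (unsigned)drc_id);
    } else {
        n = snprintf(buf, len, "C%u", (unsigned)drc_id);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Whether a guest-visible name should depend on a user-chosen id at all is a separate question that this sketch does not try to answer.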

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations
  2015-01-26 21:17     ` Michael Roth
@ 2015-01-27  5:37       ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2015-01-27  5:37 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 4650 bytes --]

On Mon, Jan 26, 2015 at 03:17:31PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-01-18 23:58:28)
> > On Tue, Dec 23, 2014 at 06:30:30AM -0600, Michael Roth wrote:
[snip]
> > > +/* create OF node for pci device and required OF DT properties */
> > > +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> > > +                                       int drc_index, int *dt_offset)
> > > +{
> > > +    void *fdt_orig, *fdt;
> > > +    int offset, ret;
> > > +    int slot = PCI_SLOT(dev->devfn);
> > > +    char nodename[512];
> > > +
> > > +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> > > +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> > > +    fdt_begin_node(fdt_orig, "");
> > > +    fdt_end_node(fdt_orig);
> > > +    fdt_finish(fdt_orig);
> > 
> > Recent versions of libfdt have an fdt_create_empty_tree() function to
> > simplify that standard idiom.
> 
> Hmm, it doesn't seem to be in the source that qemu.git/dtc points to, so I'm
> hesitant to rely on it. Would it be viable to get the QEMU submodule
> updated to v1.4.0?

Ah, right.  Yes, we should probably update the qemu submodule, but I
don't think your patches should have to wait on that.


> > > +    fdt = g_malloc0(FDT_MAX_SIZE);
> > > +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> > 
> > There's no need for a second malloc here - fdt_open_into() may be used
> > in place.
> > 
> > > +    sprintf(nodename, "pci@%d", slot);
> > > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > > +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index);
> > > +    g_assert(!ret);
> > > +    g_free(fdt_orig);
> > > +
> > > +    *dt_offset = offset;
> > > +    return fdt;
> > > +}
> > > +
> > > +static void spapr_device_hotplug_add(sPAPRDRConnector *drc,
> > > +                                     sPAPRPHBState *phb,
> > > +                                     PCIDevice *pdev)
> > > +{
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    DeviceState *dev = DEVICE(pdev);
> > > +    int drc_index = drck->get_index(drc);
> > > +    void *fdt = NULL;
> > > +    int fdt_start_offset = 0;
> > > +
> > > +    /* boot-time devices get their device tree node created by SLOF, but for
> > > +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> > > +     * it via RTAS
> > 
> > Now that we have to have this code in qemu for the hotplug case we may
> > want to consider using it for boot-time devices as well, and removing
> > the corresponding code from SLOF, but that's a problem for another day.
> 
> Makes sense, since we do this for PHBs already. Can look into it as
> a follow-up.

Ok, great.

> > > +     */
> > > +    if (dev->hotplugged) {
> > > +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index,
> > > +                                        &fdt_start_offset);
> > > +    }
> > > +    drck->attach(drc, DEVICE(pdev), fdt, fdt_start_offset, !dev->hotplugged);
> > > +}
> > > +
> > > +static void spapr_device_hotplug_remove_cb(DeviceState *dev, void *opaque)
> > > +{
> > > +    object_unparent(OBJECT(dev));
> > > +}
> > > +
> > > +static void spapr_device_hotplug_remove(sPAPRDRConnector *drc,
> > > +                                        sPAPRPHBState *phb,
> > > +                                        PCIDevice *pdev)
> > > +{
> > > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +
> > > +    drck->detach(drc, DEVICE(pdev), spapr_device_hotplug_remove_cb, phb);
> > > +}
> > > +
> > > +static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
> > > +                               DeviceState *plugged_dev, Error **errp)
> > 
> > So, this function is hotplugging a PCI device into an existing PHB,
> > rather than hotplugging a PHB itself.  Since the DR protocol does
> > support both operations, I could see this name becoming confusing.
> > 
> > > +{
> > > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> > > +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> > > +    sPAPRDRConnector *drc =
> > > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_PCI, pdev->devfn);
> > 
> > Is it safe to call this before checking phb->dr_enabled?
> 
> It will be NULL if the DRC wasn't created, so the assertion below the check
> should catch any misuse before it happens.

Ok.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-27  5:24       ` David Gibson
@ 2015-01-27 21:36         ` Michael Roth
  2015-01-27 22:05           ` Tyrel Datwyler
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2015-01-27 21:36 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-01-26 23:24:11)
> On Sun, Jan 25, 2015 at 11:21:26PM -0600, Michael Roth wrote:
> > Quoting David Gibson (2015-01-16 00:21:55)
> > > On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
> > > > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > > > 
> > > > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
> > > >  1 file changed, 25 insertions(+)
> > > > 
> > > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > > index 2ec2a8e..a2fb533 100644
> > > > --- a/hw/ppc/spapr_rtas.c
> > > > +++ b/hw/ppc/spapr_rtas.c
> > > > @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
> > > >      rtas_st(rets, 0, ret);
> > > >  }
> > > >  
> > > > +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > > +                                 uint32_t token, uint32_t nargs,
> > > > +                                 target_ulong args, uint32_t nret,
> > > > +                                 target_ulong rets)
> > > > +{
> > > > +    /* we currently only use a single, "live insert" powerdomain for
> > > > +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
> > > > +     */
> > > 
> > > Even so, you should at least validate the number of args and rets, and
> > > > preferably check that the user isn't attempting to set something for some
> > > other, non-existent power domain.
> > > 
> > > > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > > +    rtas_st(rets, 1, 100);
> > > > +}
> > > > +
> > > > +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > > +                                  uint32_t token, uint32_t nargs,
> > > > +                                  target_ulong args, uint32_t nret,
> > > > +                                  target_ulong rets)
> > > > +{
> > > > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > > +    rtas_st(rets, 1, 100);
> > > > +}
> > > > +
> > > >  static struct rtas_call {
> > > >      const char *name;
> > > >      spapr_rtas_fn fn;
> > > > @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
> > > >                          rtas_ibm_set_system_parameter);
> > > >      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
> > > >                          rtas_ibm_os_term);
> > > > +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
> > > > +                        rtas_set_power_level);
> > > > +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> > > > +                        rtas_get_power_level);
> > > >  }
> > > >  
> > > >  type_init(core_rtas_register_types)
> > > 
> > > > This code should probably go in spapr_drc.c.  The idea was that spapr_rtas
> > > was just the RTAS dispatch code, and RTAS functions that had no other
> > > home.  Generally RTAS functions should live with the devices they're
> > > connected to.
> > 
> > In this particular case the calls act on a "power domain" which isn't
> > actually modeled in QEMU (we just assume a single "live-insertion" domain
> > which just magically does everything we want), so I think it makes
> > sense to leave these here.
> 
> Yeah, fair enough.

Hmm, looking at it again, set-indicator and get-sensor-state aren't actually
specific to DR, but might be extended to handle a number of other types of
sensors in the future ("Reset Component", "Error Log", and "Global Interrupt
Queue Control" may be interesting in this regard).

So it looks like only configure-connector is specifically for DR. Still
planning on moving it to spapr_drc_rtas.c, unless you'd prefer not to
at this point (it'll be lonely for the foreseeable future).

> 
> > But for the others it does make sense to tie them with spapr_drc.c, or
> > maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
> > well-defined accessors.
> 
> Ok.
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-27 21:36         ` Michael Roth
@ 2015-01-27 22:05           ` Tyrel Datwyler
  2015-01-28  0:42             ` Michael Roth
  0 siblings, 1 reply; 55+ messages in thread
From: Tyrel Datwyler @ 2015-01-27 22:05 UTC (permalink / raw)
  To: Michael Roth, David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, bharata.rao, nfont

On 01/27/2015 01:36 PM, Michael Roth wrote:
> Quoting David Gibson (2015-01-26 23:24:11)
>> On Sun, Jan 25, 2015 at 11:21:26PM -0600, Michael Roth wrote:
>>> Quoting David Gibson (2015-01-16 00:21:55)
>>>> On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
>>>>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>>
>>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>>>>> ---
>>>>>  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
>>>>>  1 file changed, 25 insertions(+)
>>>>>
>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>> index 2ec2a8e..a2fb533 100644
>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>> @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>>>>      rtas_st(rets, 0, ret);
>>>>>  }
>>>>>  
>>>>> +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>> +                                 uint32_t token, uint32_t nargs,
>>>>> +                                 target_ulong args, uint32_t nret,
>>>>> +                                 target_ulong rets)
>>>>> +{
>>>>> +    /* we currently only use a single, "live insert" powerdomain for
>>>>> +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
>>>>> +     */
>>>>
>>>> Even so, you should at least validate the number of args and rets, and
>>>> preferably check that the user isn't attempting to set something for some
>>>> other, non-existent power domain.
>>>>
>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>> +    rtas_st(rets, 1, 100);
>>>>> +}
>>>>> +
>>>>> +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>> +                                  target_ulong args, uint32_t nret,
>>>>> +                                  target_ulong rets)
>>>>> +{
>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>> +    rtas_st(rets, 1, 100);
>>>>> +}
>>>>> +
>>>>>  static struct rtas_call {
>>>>>      const char *name;
>>>>>      spapr_rtas_fn fn;
>>>>> @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
>>>>>                          rtas_ibm_set_system_parameter);
>>>>>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
>>>>>                          rtas_ibm_os_term);
>>>>> +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
>>>>> +                        rtas_set_power_level);
>>>>> +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>>> +                        rtas_get_power_level);
>>>>>  }
>>>>>  
>>>>>  type_init(core_rtas_register_types)
>>>>
>>>> This code should probably go in spapr_drc.c.  The idea was that spapr_rtas
>>>> was just the RTAS dispatch code, and RTAS functions that had no other
>>>> home.  Generally RTAS functions should live with the devices they're
>>>> connected to.
>>>
>>> In this particular case the calls act on a "power domain" which isn't
>>> actually modeled in QEMU (we just assume a single "live-insertion" domain
>>> which just magically does everything we want), so I think it makes
>>> sense to leave these here.
>>
>> Yeah, fair enough.
> 
> Hmm, looking at it again, set-indicator and get-sensor-state aren't actually
> specific to DR, but might be extended to handle a number of other types of
> sensors in the future ("Reset Component", "Error Log", and "Global Interrupt
> Queue Control" may be interesting in this regard).
> 
> So it looks like only configure-connector is specifically for DR. Still
> planning on moving it to spapr_drc_rtas.c, unless you'd prefer not to
> at this point (it'll be lonely for the foreseeable future).

I will point out that configure-connector would be used for
hibernation/migration if we ever implement it as it is defined by PAPR.
I know for migration there had been some talk in the past about someday
doing it the right way.

-Tyrel

> 
>>
>>> But for the others it does make sense to tie them with spapr_drc.c, or
>>> maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
>>> well-defined accessors.
>>
>> Ok.
>>
>> -- 
>> David Gibson                    | I'll have my music baroque, and my code
>> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>>                                 | _way_ _around_!
>> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-27 22:05           ` Tyrel Datwyler
@ 2015-01-28  0:42             ` Michael Roth
  2015-02-09 16:29               ` Nathan Fontenot
  0 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2015-01-28  0:42 UTC (permalink / raw)
  To: Tyrel Datwyler, David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, bharata.rao, nfont

Quoting Tyrel Datwyler (2015-01-27 16:05:52)
> On 01/27/2015 01:36 PM, Michael Roth wrote:
> > Quoting David Gibson (2015-01-26 23:24:11)
> >> On Sun, Jan 25, 2015 at 11:21:26PM -0600, Michael Roth wrote:
> >>> Quoting David Gibson (2015-01-16 00:21:55)
> >>>> On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
> >>>>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >>>>>
> >>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >>>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> >>>>> ---
> >>>>>  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
> >>>>>  1 file changed, 25 insertions(+)
> >>>>>
> >>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>> index 2ec2a8e..a2fb533 100644
> >>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>> @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
> >>>>>      rtas_st(rets, 0, ret);
> >>>>>  }
> >>>>>  
> >>>>> +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >>>>> +                                 uint32_t token, uint32_t nargs,
> >>>>> +                                 target_ulong args, uint32_t nret,
> >>>>> +                                 target_ulong rets)
> >>>>> +{
> >>>>> +    /* we currently only use a single, "live insert" powerdomain for
> >>>>> +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
> >>>>> +     */
> >>>>
> >>>> Even so, you should at least validate the number of args and rets, and
> >>>> preferably check that the user isn't attempting to set something for some
> >>>> other, non-existent power domain.
> >>>>
> >>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>> +    rtas_st(rets, 1, 100);
> >>>>> +}
> >>>>> +
> >>>>> +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>> +                                  target_ulong args, uint32_t nret,
> >>>>> +                                  target_ulong rets)
> >>>>> +{
> >>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>> +    rtas_st(rets, 1, 100);
> >>>>> +}
> >>>>> +
> >>>>>  static struct rtas_call {
> >>>>>      const char *name;
> >>>>>      spapr_rtas_fn fn;
> >>>>> @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
> >>>>>                          rtas_ibm_set_system_parameter);
> >>>>>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
> >>>>>                          rtas_ibm_os_term);
> >>>>> +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
> >>>>> +                        rtas_set_power_level);
> >>>>> +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
> >>>>> +                        rtas_get_power_level);
> >>>>>  }
> >>>>>  
> >>>>>  type_init(core_rtas_register_types)
> >>>>
> >>>> This code should probably go in spapr_drc.c.  The idea was that spapr_rtas
> >>>> was just the RTAS dispatch code, and RTAS functions that had no other
> >>>> home.  Generally RTAS functions should live with the devices they're
> >>>> connected to.
> >>>
> >>> In this particular case the calls act on a "power domain" which isn't
> >>> actually modeled in QEMU (we just assume a single "live-insertion" domain
> >>> which just magically does everything we want), so I think it makes
> >>> sense to leave these here.
> >>
> >> Yeah, fair enough.
> > 
> > Hmm, looking at it again, set-indicator and get-sensor-state aren't actually
> > specific to DR, but might be extended to handle a number of other types of
> > sensors in the future ("Reset Component", "Error Log", and "Global Interrupt
> > Queue Control" may be interesting in this regard).
> > 
> > So it looks like only configure-connector is specifically for DR. Still
> > planning on moving it to spapr_drc_rtas.c, unless you'd prefer not to
> > at this point (it'll be lonely for the foreseeable future).
> 
> I will point out that configure-connector would be used for
> hibernation/migration if we ever implement it as it is defined by PAPR.
> I know for migration there had been some talk in the past about someday
> doing it the right way.

Interesting. Do you know if calls are still tied to DRCs in this case, or
is there something more general like fetching the entire device-tree? I
didn't see anything in PAPR+ 2.7 outside of the DR use-case where it acts
on a particular DRC.

> 
> -Tyrel
> 
> > 
> >>
> >>> But for the others it does make sense to tie them with spapr_drc.c, or
> >>> maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
> >>> well-defined accessors.
> >>
> >> Ok.
> >>
> >> -- 
> >> David Gibson                    | I'll have my music baroque, and my code
> >> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
> >>                                 | _way_ _around_!
> >> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-01-26 16:56     ` Michael Roth
  2015-01-27  5:27       ` David Gibson
@ 2015-01-28  3:56       ` Bharata B Rao
  1 sibling, 0 replies; 55+ messages in thread
From: Bharata B Rao @ 2015-01-28  3:56 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, Alexander Graf, qemu-devel, ncmike, qemu-ppc, tyreld,
	Nathan Fontenot, David Gibson

On Mon, Jan 26, 2015 at 10:26 PM, Michael Roth
<mdroth@linux.vnet.ibm.com> wrote:
> Quoting David Gibson (2015-01-18 22:31:23)
>> On Tue, Dec 23, 2014 at 06:30:22AM -0600, Michael Roth wrote:
>> > +
>> > +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
>> > +{
>> > +    struct hp_log_full *new_hp;
>> > +    struct rtas_error_log *hdr;
>> > +    struct rtas_event_log_v6 *v6hdr;
>> > +    struct rtas_event_log_v6_maina *maina;
>> > +    struct rtas_event_log_v6_mainb *mainb;
>> > +    struct rtas_event_log_v6_hp *hp;
>> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>> > +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
>> > +
>> > +    new_hp = g_malloc0(sizeof(struct hp_log_full));
>> > +    hdr = &new_hp->hdr;
>> > +    v6hdr = &new_hp->v6hdr;
>> > +    maina = &new_hp->maina;
>> > +    mainb = &new_hp->mainb;
>> > +    hp = &new_hp->hp;
>> > +
>> > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
>> > +                               | RTAS_LOG_SEVERITY_EVENT
>> > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
>> > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
>> > +                               | RTAS_LOG_INITIATOR_HOTPLUG
>> > +                               | RTAS_LOG_TYPE_HOTPLUG);
>> > +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
>> > +                                       - sizeof(new_hp->hdr));
>> > +
>> > +    spapr_init_v6hdr(v6hdr);
>> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
>> > +
>> > +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
>> > +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
>> > +    mainb->subsystem_id = 0x80; /* External environment */
>> > +    mainb->event_severity = 0x00; /* Informational / non-error */
>> > +    mainb->event_subtype = 0x00; /* Normal shutdown */
>> > +
>> > +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
>> > +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
>> > +    hp->hdr.section_version = 1; /* includes extended modifier */
>> > +    hp->hotplug_action = hp_action;
>> > +
>> > +
>> > +    switch (drc_type) {
>> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
>> > +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
>> > +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
>> > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
>> > +        break;
>> > +    default:
>> > +        /* skip notification for unknown connector types */
>> > +        g_free(new_hp);
>> > +        return;
>> > +    }
>> > +
>> > +    if (pending_hp) {
>> > +        /* Just toss any pending hotplug events for now, this will
>> > +         * need to be fixed later on.
>> > +         */
>>
>> So, we can get away with a 1-element queue for EPOW, because they're
>> just triggering a shutdown - so once the first one's processed, any
>> others aren't going to matter.  For hotplug you really do need a
>> proper queue.
>
> Yah, this was discussed in the past, but until now I didn't notice how
> easy it would be to trigger this when hotplugging multiple devices from
> a script or management harness of some sort. Should be simple enough to
> fix for v5 though.

In case of memory hotplug, each LMB is associated with a drc. Hence a
request to hotplug a chunk of memory will be broken down into multiple
requests (each of min LMB size). So supporting a queue will be needed
and can be tested with memory hotplug code.
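
As a rough idea of what replacing the single pending_hp slot with a queue involves, here is a minimal FIFO sketch in plain C. The struct and function names are hypothetical, and QEMU code would more likely use its own queue macros than a hand-rolled list; this only illustrates that events are kept in arrival order instead of being tossed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One queued hotplug request (hypothetical layout). */
struct sketch_hp_event {
    uint32_t drc_index;
    uint8_t action;
    struct sketch_hp_event *next;
};

/* FIFO of pending hotplug events; zero-initialise before use. */
struct sketch_hp_queue {
    struct sketch_hp_event *head;
    struct sketch_hp_event *tail;
};

/* Append an event. Returns 0 on success, -1 on allocation failure. */
static int sketch_hp_enqueue(struct sketch_hp_queue *q,
                             uint32_t drc_index, uint8_t action)
{
    struct sketch_hp_event *ev;

    assert(q != NULL);

    ev = calloc(1, sizeof(*ev));
    if (ev == NULL) {
        return -1;
    }
    ev->drc_index = drc_index;
    ev->action = action;

    if (q->tail != NULL) {
        q->tail->next = ev;
    } else {
        q->head = ev;
    }
    q->tail = ev;
    return 0;
}

/* Remove the oldest event. Returns 1 if one was returned, 0 if empty. */
static int sketch_hp_dequeue(struct sketch_hp_queue *q,
                             uint32_t *drc_index, uint8_t *action)
{
    struct sketch_hp_event *ev;

    assert(q != NULL && drc_index != NULL && action != NULL);

    ev = q->head;
    if (ev == NULL) {
        return 0;
    }
    q->head = ev->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    *drc_index = ev->drc_index;
    *action = ev->action;
    free(ev);
    return 1;
}
```

With something like this, a memory hotplug request split into one event per LMB would enqueue each DRC index in turn, and each check-exception call would dequeue the oldest one.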

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-01-28  0:42             ` Michael Roth
@ 2015-02-09 16:29               ` Nathan Fontenot
  0 siblings, 0 replies; 55+ messages in thread
From: Nathan Fontenot @ 2015-02-09 16:29 UTC (permalink / raw)
  To: Michael Roth, Tyrel Datwyler, David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, bharata.rao

On 01/27/2015 06:42 PM, Michael Roth wrote:
> Quoting Tyrel Datwyler (2015-01-27 16:05:52)
>> On 01/27/2015 01:36 PM, Michael Roth wrote:
>>> Quoting David Gibson (2015-01-26 23:24:11)
>>>> On Sun, Jan 25, 2015 at 11:21:26PM -0600, Michael Roth wrote:
>>>>> Quoting David Gibson (2015-01-16 00:21:55)
>>>>>> On Tue, Dec 23, 2014 at 06:30:17AM -0600, Michael Roth wrote:
>>>>>>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>>>>
>>>>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>  hw/ppc/spapr_rtas.c | 25 +++++++++++++++++++++++++
>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>>> index 2ec2a8e..a2fb533 100644
>>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>>> @@ -290,6 +290,27 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>>>>>>      rtas_st(rets, 0, ret);
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>>>> +                                 uint32_t token, uint32_t nargs,
>>>>>>> +                                 target_ulong args, uint32_t nret,
>>>>>>> +                                 target_ulong rets)
>>>>>>> +{
>>>>>>> +    /* we currently only use a single, "live insert" powerdomain for
>>>>>>> +     * hotplugged/dlpar'd resources, so the power is always live/full (100)
>>>>>>> +     */
>>>>>>
>>>>>> Even so, you should at least validate the number of args and rets, and
>>>>>> preferably check that the user isn't attempting to set something for some
>>>>>> other, non-existent power domain.
>>>>>>
>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>>> +    rtas_st(rets, 1, 100);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>>>> +                                  target_ulong args, uint32_t nret,
>>>>>>> +                                  target_ulong rets)
>>>>>>> +{
>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>>> +    rtas_st(rets, 1, 100);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static struct rtas_call {
>>>>>>>      const char *name;
>>>>>>>      spapr_rtas_fn fn;
>>>>>>> @@ -419,6 +440,10 @@ static void core_rtas_register_types(void)
>>>>>>>                          rtas_ibm_set_system_parameter);
>>>>>>>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
>>>>>>>                          rtas_ibm_os_term);
>>>>>>> +    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
>>>>>>> +                        rtas_set_power_level);
>>>>>>> +    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
>>>>>>> +                        rtas_get_power_level);
>>>>>>>  }
>>>>>>>  
>>>>>>>  type_init(core_rtas_register_types)
>>>>>>
>>>>>> This code should probably go in spapr_drc.c.  The idea was that spapr_rtas
>>>>>> was just the RTAS dispatch code, and RTAS functions that had no other
>>>>>> home.  Generally RTAS functions should live with the devices they're
>>>>>> connected to.
>>>>>
>>>>> In this particular case the calls act on a "power domain" which isn't
>>>>> actually modeled in QEMU (we just assume a single "live-insertion" domain
>>>>> which just magically does everything we want), so I think it makes
>>>>> sense to leave these here.
>>>>
>>>> Yeah, fair enough.
>>>
>>> Hmm, looking at it again, set-indicator and get-sensor-state aren't actually
>>> specific to DR, but might be extended to handle a number of other types of
>>> sensors in the future ("Reset Component", "Error Log", and "Global Interrupt
>>> Queue Control" may be interesting in this regard).
>>>
>>> So it looks like only configure-connector is specifically for DR. Still
>>> planning on moving it to spapr_drc_rtas.c, unless you'd prefer not to
>>> at this point (it'll be lonely for the foreseeable future).
>>
>> I will point out that configure-connector would be used for
>> hibernation/migration if we ever implement it as it is defined by PAPR.
>> I know for migration there had been some talk in the past about someday
>> doing it the right way.
> 
> Interesting. Do you know if calls are still tied to DRCs in this case, or
> is there something more general like fetching the entire device-tree? I
> didn't see anything in PAPR+ 2.7 outside of the DR use-case where it acts
> on a particular DRC.
> 

It does act on a specific DRC. When doing a device tree update for
migration/hibernation, we first call rtas_update_nodes, which passes back a
list containing, among other things, a drc index for each new node that needs
to be added to the device tree.

-Nathan

>>
>> -Tyrel
>>
>>>
>>>>
>>>>> But for the others it does make sense to tie them with spapr_drc.c, or
>>>>> maybe spapr_drc_rtas.c to maintain the encapsulation of DRC state behind
>>>>> well-defined accessors.
>>>>
>>>> Ok.
>>>>
>>>> -- 
>>>> David Gibson                    | I'll have my music baroque, and my code
>>>> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>>>>                                 | _way_ _around_!
>>>> http://www.ozlabs.org/~dgibson



Thread overview: 55+ messages
2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
2015-01-16  5:28   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
2015-01-02 10:32   ` Bharata B Rao
2015-01-26  4:56     ` Michael Roth
2015-01-16  6:19   ` David Gibson
2015-01-26  4:01     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
2015-01-16  6:21   ` David Gibson
2015-01-26  5:21     ` Michael Roth
2015-01-27  5:24       ` David Gibson
2015-01-27 21:36         ` Michael Roth
2015-01-27 22:05           ` Tyrel Datwyler
2015-01-28  0:42             ` Michael Roth
2015-02-09 16:29               ` Nathan Fontenot
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface Michael Roth
2015-01-16  6:25   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state " Michael Roth
2015-01-16  6:28   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper Michael Roth
2015-01-19  3:25   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm,configure-connector RTAS interface Michael Roth
2015-01-19  3:44   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
2015-01-19  4:31   ` David Gibson
2015-01-26 16:56     ` Michael Roth
2015-01-27  5:27       ` David Gibson
2015-01-28  3:56       ` Bharata B Rao
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface Michael Roth
2015-01-19  4:34   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt() Michael Roth
2015-01-19  5:15   ` David Gibson
2015-01-26 20:35     ` Michael Roth
2015-01-27  5:30       ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type Michael Roth
2015-01-19  5:16   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
2015-01-19  5:18   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
2015-01-19  5:20   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs Michael Roth
2015-01-19  5:22   ` David Gibson
2015-01-26 20:44     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c Michael Roth
2015-01-19  5:24   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
2015-01-19  5:58   ` David Gibson
2015-01-26 21:17     ` Michael Roth
2015-01-27  5:37       ` David Gibson
2015-01-23  5:17   ` Alexey Kardashevskiy
2015-01-26 21:20     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
2015-01-19  6:00   ` David Gibson
2015-01-26 21:32     ` Michael Roth
