All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug
@ 2015-02-16 14:27 Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
                   ` (15 more replies)
  0 siblings, 16 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

Re-sending due to an unfortunate s/qemu-ppc/qemu-pcc/ mishap. Sorry!

These patches are based on ppc-next, and can also be obtained from:

https://github.com/mdroth/qemu/commits/spapr-hotplug-pci-v5

v5:
* short version:
  - addressed, to my knowledge, all outstanding review comments
    from v4
  - fixed a number of issues which were made apparent when doing
    rapid device plug/unplug
  - allow for a device to be configured/unconfigured/configured
    reliably without ever being unplugged in between
  - implemented queueing of RTAS events, and suppression of subsequent
    unplug events when 1 is already in-flight for a device/DRC.
  - simplified spapr_drc_populate_dt() and 'reg'/'assigned-addresses'
    population code to clearer.
  - fixed OF device node properties in accordance with PCI-OF
    binding spec
  - better error-checking/reporting in hotplug path
* rebased/retested on ppc-next (2015-02-16)
* added support for multiple queued/in-flight EPOW/HP RTAS events
* grammar/typo fixes for documentation (David)
* fixed bug in drc->set_allocation_state() actually sets allocation
  state rather than indicator state (David)
* ensured all macro expressions were wrapped in parenthesis (David)
* dropped drc->get_id() in favor of direct access to drc->id (David)
* renamed prop_name to name to avoid naming inconsistencies in
  drc->configure_connector() (David)
* added an assert to ensure QEMU-side users of configure_connector_common()
  never reach an error condition (David)
* rtas-event-scan stub replaced with a functional implementation
* ensured unsupported sensor types always result in RTAS_OUT_NOT_SUPPORTED
  as opposed to RTAS_OUT_PARAM_ERROR (David)
* ensured all rtas arg/return counts are validated prior to access (David)
* validate power domain is live-insertion/-1 in
  rtas-{get,set}-power-level (David)
* simplified rtas_set_indicator() logic (David)
* moved RTAS_SENSOR_TYPE_* macros to spapr.h (David)
* modified drc->dr_entity_sense() to report UNUSABLE if the device
  is logical/non-PCI and the allocation state is UNUSABLE (Bharata)
* refactored spapr_drc_populate_dt() to avoid having a separate loop to
  gather DRC fields into a temporary data structure prior to generating
  OF array properties (David)
* added an Object *owner fields to spapr_drc_populate_dt() to allow limiting
  of OF DRC/slot descriptions to specific PHBs (David)
* ensure true/false are used in place of 1/0 (David)
* make naming of hotplug/unplug hooks clearer (David)
* re-worked 'reg'/'assigned-address' OF property population to
  avoid potential buffer-overrun and make it easier to understand
  the purpose of individual fields. also added documentation to
  further clarify logic (David)
* fixed boolean OF device properties to be
  present-but-empty/not-present rather than storing corresponding device
  register bits (David)
* fixed subsystem-id/subsystem-vendor-id OF properties to only be present when
  non-zero
* use actual bus num in place of phb index for 'reg'/'assigned-address' fields
  and DRC indexes

v4:
 * added documentation for sPAPR-based hotplug (Alexey)
 * reworked DR Connectors to be QOM devices, where sensor/indicator
   states are accessed via RTAS via object methods and also exposed
   via composition tree for introspection via qom-get/qom-fuse.
   attached devices are managed via state transitions handled by
   the DRC device (Alex)
 * DRC-related constants now defined in seperate header file,
   implemented as enum types where applicable
 * removed stub implementations of sensors that were not relevant
   to dynamic-reconfiguration. we now return "not implemented"
   if a guest attempts to access them via rtas-get-sensor or
   rtas-set-indicator-state
 * added DRC reset hooks to complete unplug for devices awaiting
   additional action from the guest before removal
 * incorporated endian fixes from Bharata and tested on ppc64le
   (Alex/Bharata)
 * used rtas_{ld,st} helpers in place of cpu_physical_memory_map
   for configure-connector implementation (Alex)
 * used b_* helper macros for properties related to OF PCI Binding
   (Alexey)
 * added dynamic-reconfiguration option to spapr-pci-host-bridge to
   enable/disable PCI hotplug for child bus
 * added pseries-2.3 machine and compat code to disable PCI hotplug by
   default for older machine types (Alex)
 * removed OF properties and DRC instances related to hotplugging of
   PHBs. this is not a prereq for PCI hotplug and will be handled as
   a separate series
 * moved generation of boot-time devices properties to common helper
   that can be re-used for memory, cpu, and phb. (Bharata)
 * re-organized patches so that pci, memory, cpu, phb should base
   cleanly on common set of patches implementing core DRC functionality
   (Bharata)
 * moved PCI 0-address fix to separate series (Alex)

v3:
 * dropped emulation of firmware-managed BAR allocation. this will be
   introduced via a follow-up series via a -machine flag and tied to
   a separate hotplug event to avoid a race condition with guest vs.
   "firmware"-managed BAR allocation, in conjunction with required
   fixes to rpaphp hotplug kernel module to utilize this mode.
 * moved drc_table into sPAPREnvironment (Alexey)
 * moved INDICATOR_* constants and friends into spapr_pci.c (Alexey)
 * use prefixes for global types (DrcEntry/ConfigureConnectorState) (Alexey)
 * updated for new hotplug interface (Alexey)
 * fixed get-power-level to report current power-level rather than
   desired (Alexey)
 * rebased to latest ppc-next

v2:
  * re-ordered patches to fix build bisectability (Alexey)
  * replaced g_warning with DPRINTF in RTAS calls for guest errors (Alexey)
  * replaced g_warning with fprintf for qemu errors (Alexey)
  * updated RTAS calls to use pre-existing error/success macros (Alexey)
  * replaced DR_*/SENSOR_* macros with INDICATOR_* for set-indicator/
    get-sensor-state (Alexey)

OVERVIEW

These patches add support for PCI hotplug for SPAPR guests. We advertise
each PHB as DR-capable (as defined by PAPR 13.5/13.6) with 32 hotpluggable
PCI slots per PHB, which models a standard PCI expansion device for Power
machines where the DRC name/loc-code/index for each slot are generated
based on bus/slot number.

This is compatible with existing guest kernel's via the rpaphp hotplug
module, and existing userspace tools such as drmgr/librtas/rtas_errd for
managing devices.

NOTES / ADDITIONAL DEPENDENCIES

This series relies on v1.2.19 or later of powerppc-utils (drmgr, rtas_errd,
ppc64-diag, and librtas components, specificially), which will automate
guest-side hotplug setup in response to an EPOW event emitted by QEMU. For
guests with older versions of powerpc-utils, a manual workaround must be
used (documented below).

Note that this relies on a patch to core PCI code which allows for the
use of a 0-address IO BAR for PCI devices. Without this patch, the first
hotplugged device will likely fail. This patch will be handled separately,
but is included in the in the development tree below for testing:

https://github.com/mdroth/qemu/commits/spapr-hotplug-pci

PATCH LAYOUT

Patches
        1     Documentation for sPAPR Dynamic-Reconfiguration/hotplug
        2     Initial implementation for sPAPRDRConnector device
        3-7   Guest RTAS calls to interact with DRC devices
        8-9   Introduce RTAS events for signalling hotplug operations
              to guest, using existing infrastructure of
              EPOW/check-exception events
        10    DRC helper code to populate DT descriptions of present DRC
              devices
        11    spapr-host-bridge option to selectively enable PCI hotplug/DR
              on a PHB-by-PHB basis
        12-16 PCI-specific hotplug hooks and DRC creation to enable PCI
              hotplug and hotplug events

USAGE

For guests with powerpc-utils 1.2.19+:
  hotplug:
    qemu:
      device_add e1000,id=slot0
  unplug:
    qemu:
      device_del slot0

For guests with powerpc-utils prior to 1.2.19:
  hotplug:
    qemu:
      device_add e1000,id=slot0
    guest:
      drmgr -c pci -s "C0" -n -a
      echo 1 >/sys/bus/pci/rescan
  unplug:
    guest:
      drmgr -c pci -s "C0" -n -r
      echo 1 >/sys/bus/pci/devices/0000:00:00.0/remove
    qemu:
      device_del slot0

 docs/specs/ppc-spapr-hotplug.txt | 287 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci/pci.c                     |   2 +-
 hw/ppc/Makefile.objs             |   2 +-
 hw/ppc/spapr.c                   |   8 +-
 hw/ppc/spapr_drc.c               | 736 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c            | 328 +++++++++++++++++++++++++++++++++++++++++++++++++---------
 hw/ppc/spapr_pci.c               | 414 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 hw/ppc/spapr_rtas.c              | 262 ++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci-host/spapr.h      |   1 +
 include/hw/pci/pci.h             |   1 +
 include/hw/ppc/spapr.h           |  28 ++++-
 include/hw/ppc/spapr_drc.h       | 207 +++++++++++++++++++++++++++++++++++++
 12 files changed, 2200 insertions(+), 76 deletions(-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This adds a general overview of hotplug/dynamic-reconfiguration
for sPAPR/pSeries guest.

As specified in PAPR+ v2.7.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 docs/specs/ppc-spapr-hotplug.txt | 287 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 docs/specs/ppc-spapr-hotplug.txt

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
new file mode 100644
index 0000000..d35771c
--- /dev/null
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -0,0 +1,287 @@
+= sPAPR Dynamic Reconfiguration =
+
+sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration
+to handle hotplugging of dynamic "physical" resources like PCI cards, or
+"logical"/paravirtual resources like memory, CPUs, and "physical"
+host-bridges, which are generally managed by the host/hypervisor and provided
+to guests as virtualized resources. The specifics of dynamic-reconfiguration
+are documented extensively in PAPR+ v2.7, Section 13.1. This document
+provides a summary of that information as it applies to the implementation
+within QEMU.
+
+== Dynamic-reconfiguration Connectors ==
+
+To manage hotplug/unplug of these resources, a firmware abstraction known as
+a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
+resource to the guest, and provide an interface for the guest to manage
+configuration/removal of the resource associated with it.
+
+== Device-tree description of DRCs ==
+
+A set of 4 Open Firmware device tree array properties are used to describe
+the name/index/power-domain/type of each DRC allocated to a guest at
+boot-time. There may be multiple sets of these arrays, rooted at different
+paths in the device tree depending on the type of resource the DRCs manage.
+
+In some cases, the DRCs themselves may be provided by a dynamic resource,
+such as the DRCs managing PCI slots on a hotplugged PHB. In this case the
+arrays would be fetched as part of the device tree retrieval interfaces
+for hotplugged resources described under "Guest->Host interface".
+
+The array properties are described below. Each entry/element in an array
+describes the DRC identified by the element in the corresponding position
+of ibm,drc-indexes:
+
+ibm,drc-names:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <name> string encoded as a byte array
+
+  <name> values for logical/virtual resources are defined in PAPR+ v2.7,
+  Section 13.5.2.4, and basically consist of the type of the resource
+  followed by a space and a numerical value that's unique across resources
+  of that type.
+
+  <name> values for "physical" resources such as PCI or VIO devices are
+  defined as being "location codes", which are the "location labels" of
+  each encapsulating device, starting from the chassis down to the
+  individual slot for the device, concatenated by a hyphen. This provides
+  a mapping of resources to a physical location in a chassis for debugging
+  purposes. For QEMU, this mapping is less important, so we assign a
+  location code that conforms to naming specifications, but is simply a
+  location label for the slot by itself to simplify the implementation.
+  The naming convention for location labels is documented in detail in
+  PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
+  for PCI/VIO device slots, where <n> is unique across all PCI/VIO
+  device slots.
+
+ibm,drc-indexes:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
+    in the machine
+
+  <index> is arbitrary, but in the case of QEMU we try to maintain the
+  convention used to assign them to pSeries guests on pHyp:
+
+    bit[31:28]: integer encoding of <type>, where <type> is:
+                  1 for CPU resource
+                  2 for PHB resource
+                  3 for VIO resource
+                  4 for PCI resource
+                  8 for Memory resource
+    bit[27:0]: integer encoding of <id>, where <id> is unique across
+                 all resources of specified type
+
+ibm,drc-power-domains:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
+    power domain the resource will be assigned to. In the case of QEMU
+    we associated all resources with a "live insertion" domain, where the
+    power is assumed to be managed automatically. The integer value for
+    this domain is a special value of -1.
+
+
+ibm,drc-types:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <type> string encoded as a byte array
+
+  <type> is assigned as follows:
+    "CPU" for a CPU
+    "PHB" for a physical host-bridge
+    "SLOT" for a VIO slot
+    "28" for a PCI slot
+    "MEM" for memory resource
+
+== Guest->Host interface to manage dynamic resources ==
+
+Each DRC is given a globally unique DRC Index, and resources associated with
+a particular DRC are configured/managed by the guest via a number of RTAS
+calls which reference individual DRCs based on the DRC index. This can be
+considered the guest->host interface.
+
+rtas-set-power-level:
+  arg[0]: integer identifying power domain
+  arg[1]: new power level for the domain, 0-100
+  output[0]: status, 0 on success
+  output[1]: power level after command
+
+  Set the power level for a specified power domain
+
+rtas-get-power-level:
+  arg[0]: integer identifying power domain
+  output[0]: status, 0 on success
+  output[1]: current power level
+
+  Get the power level for a specified power domain
+
+rtas-set-indicator:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  arg[2]: desired sensor value
+  output[0]: status, 0 on success
+
+  Set the state of an indicator or sensor. For the purpose of this document we
+  focus on the indicator/sensor types associated with a DRC. The types are:
+
+    9001: isolation-state, controls/indicates whether a device has been made
+          accessible to a guest
+
+          supported sensor values:
+            0: isolate, device is made unaccessible by guest OS
+            1: unisolate, device is made available to guest OS
+
+    9002: dr-indicator, controls "visual" indicator associated with device
+
+          supported sensor values:
+            0: inactive, resource may be safely removed
+            1: active, resource is in use and cannot be safely removed
+            2: identify, used to visually identify slot for interactive hotplug
+            3: action, in most cases, used in the same manner as identify
+
+    9003: allocation-state, generally only used for "logical" DR resources to
+          request the allocation/deallocation of a resource prior to acquiring
+          it via isolation-state->unisolate, or after releasing it via
+          isolation-state->isolate, respectively. for "physical" DR (like PCI
+          hotplug/unplug) the pre-allocation of the resource is implied and
+          this sensor is unused.
+
+          supported sensor values:
+            0: unusable, tell firmware/system the resource can be
+               unallocated/reclaimed and added back to the system resource pool
+            1: usable, request the resource be allocated/reserved for use by
+               guest OS
+            2: exchange, used to allocate a spare resource to use for fail-over
+               in certain situations. unused in QEMU
+            3: recover, used to reclaim a previously allocated resource that's
+               not currently allocated to the guest OS. unused in QEMU
+
+rtas-get-sensor-state:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  output[0]: status, 0 on success
+
+  Used to read an indicator or sensor value.
+
+  For DR-related operations, the only noteworthy sensor is dr-entity-sense,
+  which has a type value of 9003, as allocation-state does in the case of
+  rtas-set-indicator. The semantics/encodings of the sensor values are distinct
+  however:
+
+  supported sensor values for dr-entity-sense (9003) sensor:
+    0: empty,
+         for physical resources: DRC/slot is empty
+         for logical resources: unused
+    1: present,
+         for physical resources: DRC/slot is populated with a device/resource
+         for logical resources: resource has been allocated to the DRC
+    2: unusable,
+         for physical resources: unused
+         for logical resources: DRC has no resource allocated to it
+    3: exchange,
+         for physical resources: unused
+         for logical resources: resource available for exchange (see
+           allocation-state sensor semantics above)
+    4: recovery,
+         for physical resources: unused
+         for logical resources: resource available for recovery (see
+           allocation-state sensor semantics above)
+
+rtas-ibm-configure-connector:
+  arg[0]: guest physical address of 4096-byte work area buffer
+  arg[1]: 0, or address of additional 4096-byte work area buffer. only non-zero
+          if a prior RTAS response indicated a need for additional memory
+  output[0]: status:
+               0: completed transmittal of device-tree node
+               1: instruct guest to prepare for next DT sibling node
+               2: instruct guest to prepare for next DT child node
+               3: instruct guest to prepare for next DT property
+               4: instruct guest to ascend to parent DT node
+               5: instruct guest to provide additional work-area buffer
+                  via arg[1]
+            990x: instruct guest that operation took too long and to try
+                  again later
+
+  Used to fetch an OF device-tree description of the resource associated with
+  a particular DRC. The DRC index is encoded in the first 4-bytes of the first
+  work area buffer.
+
+  Work area layout, using 4-byte offsets:
+    wa[0]: DRC index of the DRC to fetch device-tree nodes from
+    wa[1]: 0 (hard-coded)
+    wa[2]: for next-sibling/next-child response:
+             wa offset of null-terminated string denoting the new node's name
+           for next-property response:
+             wa offset of null-terminated string denoting new property's name
+    wa[3]: for next-property response (unused otherwise):
+             byte-length of new property's value
+    wa[4]: for next-property response (unused otherwise):
+             new property's value, encoded as an OFDT-compatible byte array
+
+== hotplug/unplug events ==
+
+For most DR operations, the hypervisor will issue host->guest add/remove events
+using the EPOW/check-exception notification framework, where the host issues a
+check-exception interrupt, then provides an RTAS event log via an
+rtas-check-exception call issued by the guest in response. This framework is
+documented by PAPR+ v2.7, and already use in by QEMU for generating powerdown
+requests via EPOW events.
+
+For DR, this framework has been extended to include hotplug events, which were
+previously unneeded due to direct manipulation of DR-related guest userspace
+tools by host-level management such as an HMC. This level of management is not
+applicable to PowerKVM, hence the reason for extending the notification
+framework to support hotplug events.
+
+Note that these events are not yet formally part of the PAPR+ specification,
+but support for this format has already been implemented in DR-related
+guest tools such as powerpc-utils/librtas, as well as kernel patches that have
+been submitted to handle in-kernel processing of memory/cpu-related hotplug
+events[1], and is planned for formal inclusion is PAPR+ specification. The
+hotplug-specific payload is QEMU implemented as follows (with all values
+encoded in big-endian format):
+
+struct rtas_event_log_v6_hp {
+#define SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct section_header {
+        uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
+        uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
+                                         * plus the length of the DRC name
+                                         * if a DRC name identifier is
+                                         * specified for hotplug_identifier
+                                         */
+        uint8_t section_version;        /* version 1 */
+        uint8_t section_subtype;        /* unused */
+        uint16_t creator_component_id;  /* unused */
+    } hdr;
+#define RTAS_LOG_V6_HP_TYPE_CPU         1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY      2
+#define RTAS_LOG_V6_HP_TYPE_SLOT        3
+#define RTAS_LOG_V6_HP_TYPE_PHB         4
+#define RTAS_LOG_V6_HP_TYPE_PCI         5
+    uint8_t hotplug_type;               /* type of resource/device */
+#define RTAS_LOG_V6_HP_ACTION_ADD       1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE    2
+    uint8_t hotplug_action;             /* action (add/remove) */
+#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
+    uint8_t hotplug_identifier;         /* type of the resource identifier,
+                                         * which serves as the discriminator
+                                         * for the 'drc' union field below
+                                         */
+    uint8_t reserved;
+    union {
+        uint32_t index;                 /* DRC index of resource to take action
+                                         * on
+                                         */
+        uint32_t count;                 /* number of DR resources to take
+                                         * action on (guest chooses which)
+                                         */
+        char name[1];                   /* string representing the name of the
+                                         * DRC to take action on
+                                         */
+    } drc;
+} QEMU_PACKED;
+
+[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-23  6:05   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
physical users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such a
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.

Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes to the 'guest' interface.

On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/Makefile.objs       |   2 +-
 hw/ppc/spapr_drc.c         | 571 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h | 205 ++++++++++++++++
 3 files changed, 777 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_drc.c
 create mode 100644 include/hw/ppc/spapr_drc.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 437955d..c8ab06e 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
new file mode 100644
index 0000000..8b01200
--- /dev/null
+++ b/hw/ppc/spapr_drc.c
@@ -0,0 +1,571 @@
+/*
+ * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/ppc/spapr_drc.h"
+#include "qom/object.h"
+#include "hw/qdev.h"
+#include "qapi/visitor.h"
+#include "qemu/error-report.h"
+
+/* #define DEBUG_SPAPR_DRC */
+
+#ifdef DEBUG_SPAPR_DRC
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#define DPRINTFN(fmt, ...) \
+    do { DPRINTF(fmt, ## __VA_ARGS__); fprintf(stderr, "\n"); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#define DPRINTFN(fmt, ...) \
+    do { } while (0)
+#endif
+
+#define DRC_CONTAINER_PATH "/dr-connector"
+#define DRC_INDEX_TYPE_SHIFT 28
+#define DRC_INDEX_ID_MASK (~(~0 << DRC_INDEX_TYPE_SHIFT))
+
+static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
+{
+    uint32_t shift = 0;
+
+    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);
+    while (type != (1 << shift)) {
+        shift++;
+    }
+    return shift;
+}
+
+static uint32_t get_index(sPAPRDRConnector *drc)
+{
+    /* no set format for a drc index: it only needs to be globally
+     * unique. this is how we encode the DRC type on bare-metal
+     * however, so might as well do that here
+     */
+    return (get_type_shift(drc->type) << DRC_INDEX_TYPE_SHIFT) |
+            (drc->id & DRC_INDEX_ID_MASK);
+}
+
+static int set_isolation_state(sPAPRDRConnector *drc,
+                               sPAPRDRIsolationState state)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    DPRINTFN("drc: %x, set_isolation_state: %x", get_index(drc), state);
+
+    drc->isolation_state = state;
+
+    if (drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) {
+        /* if we're awaiting release, but still in an unconfigured state,
+         * it's likely the guest is still in the process of configuring
+         * the device and is transitioning the devices to an ISOLATED
+         * state as a part of that process. so we only complete the
+         * removal when this transition happens for a device in a
+         * configured state, as suggested by the state diagram from
+         * PAPR+ 2.7, 13.4
+         */
+        if (drc->awaiting_release && drc->ccs.configured) {
+            DPRINTFN("finalizing device removal");
+            drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
+                         drc->detach_cb_opaque, NULL);
+        } else if (!drc->ccs.configured) {
+            DPRINTFN("deferring device removal on unconfigured device\n");
+        }
+        drc->ccs.fdt_offset = drc->ccs.fdt_start_offset;
+        drc->ccs.fdt_depth = 0;
+        drc->ccs.configured = false;
+    }
+
+    return 0;
+}
+
+static int set_indicator_state(sPAPRDRConnector *drc,
+                               sPAPRDRIndicatorState state)
+{
+    DPRINTFN("drc: %x, set_indicator_state: %x", get_index(drc), state);
+    drc->indicator_state = state;
+    return 0;
+}
+
+static int set_allocation_state(sPAPRDRConnector *drc,
+                                sPAPRDRAllocationState state)
+{
+    DPRINTFN("drc: %x, set_allocation_state: %x", get_index(drc), state);
+    drc->allocation_state = state;
+    return 0;
+}
+
+static uint32_t get_type(sPAPRDRConnector *drc)
+{
+    return drc->type;
+}
+
+static const char *get_name(sPAPRDRConnector *drc)
+{
+    return drc->name;
+}
+
+/*
+ * dr-entity-sense sensor value
+ * returned via get-sensor-state RTAS calls
+ * as expected by state diagram in PAPR+ 2.7, 13.4
+ * based on the current allocation/indicator/power states
+ * for the DR connector.
+ */
+static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
+{
+    if (drc->dev) {
+        if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI &&
+            drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) {
+            /* for logical DR, we return a state of UNUSABLE
+             * iff the allocation state UNUSABLE.
+             * Otherwise, report the state as USABLE/PRESENT,
+             * as we would for PCI.
+             */
+            return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
+        }
+
+        /* this assumes all PCI devices are assigned to
+         * a 'live insertion' power domain, where QEMU
+         * manages power state automatically as opposed
+         * to the guest. present, non-PCI resources are
+         * unaffected by power state.
+         */
+        return SPAPR_DR_ENTITY_SENSE_PRESENT;
+    }
+
+    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
+        /* PCI devices, and only PCI devices, use PRESENT
+         * in cases where we'd otherwise use UNUSABLE
+         */
+        return SPAPR_DR_ENTITY_SENSE_EMPTY;
+    }
+    return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
+}
+
+static sPAPRDRCCResponse configure_connector_common(sPAPRDRCCState *ccs,
+                            char **name, const struct fdt_property **prop,
+                            int *prop_len)
+{
+    sPAPRDRCCResponse resp = SPAPR_DR_CC_RESPONSE_CONTINUE;
+    int fdt_offset_next;
+
+    *name = NULL;
+    *prop = NULL;
+    *prop_len = 0;
+
+    if (!ccs->fdt) {
+        return SPAPR_DR_CC_RESPONSE_ERROR;
+    }
+
+    while (resp == SPAPR_DR_CC_RESPONSE_CONTINUE) {
+        const char *name_cur;
+        uint32_t tag;
+        int name_cur_len;
+
+        tag = fdt_next_tag(ccs->fdt, ccs->fdt_offset, &fdt_offset_next);
+        switch (tag) {
+        case FDT_BEGIN_NODE:
+            ccs->fdt_depth++;
+            name_cur = fdt_get_name(ccs->fdt, ccs->fdt_offset, &name_cur_len);
+            *name = g_strndup(name_cur, name_cur_len);
+            resp = SPAPR_DR_CC_RESPONSE_NEXT_CHILD;
+            break;
+        case FDT_END_NODE:
+            ccs->fdt_depth--;
+            if (ccs->fdt_depth == 0) {
+                resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
+                ccs->configured = true;
+            } else {
+                resp = SPAPR_DR_CC_RESPONSE_PREV_PARENT;
+            }
+            break;
+        case FDT_PROP:
+            *prop = fdt_get_property_by_offset(ccs->fdt, ccs->fdt_offset,
+                                               prop_len);
+            name_cur = fdt_string(ccs->fdt, fdt32_to_cpu((*prop)->nameoff));
+            *name = g_strdup(name_cur);
+            resp = SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
+            break;
+        case FDT_END:
+            resp = SPAPR_DR_CC_RESPONSE_ERROR;
+            break;
+        default:
+            ccs->fdt_offset = fdt_offset_next;
+        }
+    }
+
+    ccs->fdt_offset = fdt_offset_next;
+    return resp;
+}
+
+static sPAPRDRCCResponse configure_connector(sPAPRDRConnector *drc,
+                                             char **name,
+                                             const struct fdt_property **prop,
+                                             int *prop_len)
+{
+    return configure_connector_common(&drc->ccs, name, prop, prop_len);
+}
+
+static void prop_get_index(Object *obj, Visitor *v, void *opaque,
+                                  const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->get_index(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_type(Object *obj, Visitor *v, void *opaque,
+                          const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->get_type(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static char *prop_get_name(Object *obj, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    return g_strdup(drck->get_name(drc));
+}
+
+static void prop_get_entity_sense(Object *obj, Visitor *v, void *opaque,
+                                  const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    uint32_t value = (uint32_t)drck->entity_sense(drc);
+    visit_type_uint32(v, &value, name, errp);
+}
+
+static void prop_get_fdt(Object *obj, Visitor *v, void *opaque,
+                        const char *name, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+    sPAPRDRCCState ccs = { 0 };
+    sPAPRDRCCResponse resp;
+
+    ccs.fdt = drc->ccs.fdt;
+    ccs.fdt_offset = ccs.fdt_start_offset = drc->ccs.fdt_start_offset;
+
+    do {
+        char *name = NULL;
+        const struct fdt_property *prop = NULL;
+        int prop_len;
+
+        resp = configure_connector_common(&ccs, &name, &prop, &prop_len);
+
+        switch (resp) {
+        case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
+            visit_start_struct(v, NULL, NULL, name, 0, NULL);
+            break;
+        case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
+            visit_end_struct(v, NULL);
+            break;
+        case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY: {
+            int i;
+            visit_start_list(v, name, NULL);
+            for (i = 0; i < prop_len; i++) {
+                visit_type_uint8(v, (uint8_t *)&prop->data[i], NULL, NULL);
+            }
+            visit_end_list(v, NULL);
+            break;
+        }
+        default:
+            resp = SPAPR_DR_CC_RESPONSE_SUCCESS;
+            break;
+        }
+
+        g_free(name);
+    } while (resp != SPAPR_DR_CC_RESPONSE_SUCCESS &&
+             resp != SPAPR_DR_CC_RESPONSE_ERROR);
+
+    g_assert(resp != SPAPR_DR_CC_RESPONSE_ERROR);
+}
+
+static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
+                   int fdt_start_offset, bool coldplug, Error **errp)
+{
+    DPRINTFN("drc: %x, attach", get_index(drc));
+
+    if (drc->isolation_state != SPAPR_DR_ISOLATION_STATE_ISOLATED) {
+        error_setg(errp, "an attached device is still awaiting release");
+        return;
+    }
+    g_assert(drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE);
+    g_assert(fdt || coldplug);
+
+    /* NOTE: setting initial isolation state to UNISOLATED means we can't
+     * detach unless guest has a userspace/kernel that moves this state
+     * back to ISOLATED in response to an unplug event, or this is done
+     * manually by the admin prior. if we force things while the guest
+     * may be accessing the device, we can easily crash the guest, so we
+     * we defer completion of removal in such cases to the reset() hook.
+     */
+    drc->isolation_state = SPAPR_DR_ISOLATION_STATE_UNISOLATED;
+    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_USABLE;
+    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_ACTIVE;
+
+    drc->dev = d;
+    drc->ccs.fdt = fdt;
+    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = fdt_start_offset;
+    drc->ccs.fdt_depth = 0;
+    drc->ccs.configured = false;
+
+    object_property_add_link(OBJECT(drc), "device",
+                             object_get_typename(OBJECT(drc->dev)),
+                             (Object **)(&drc->dev),
+                             NULL, 0, NULL);
+}
+
+static void detach(sPAPRDRConnector *drc, DeviceState *d,
+                   spapr_drc_detach_cb *detach_cb,
+                   void *detach_cb_opaque, Error **errp)
+{
+    DPRINTFN("drc: %x, detach", get_index(drc));
+
+    drc->detach_cb = detach_cb;
+    drc->detach_cb_opaque = detach_cb_opaque;
+
+    if (drc->isolation_state != SPAPR_DR_ISOLATION_STATE_ISOLATED) {
+        DPRINTFN("awaiting transition to isolated state before removal");
+        drc->awaiting_release = true;
+        return;
+    }
+
+    drc->allocation_state = SPAPR_DR_ALLOCATION_STATE_UNUSABLE;
+    drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
+
+    if (drc->detach_cb) {
+        drc->detach_cb(drc->dev, drc->detach_cb_opaque);
+    }
+
+    drc->awaiting_release = false;
+    g_free(drc->ccs.fdt);
+    drc->ccs.fdt = NULL;
+    drc->ccs.fdt_offset = drc->ccs.fdt_start_offset = drc->ccs.fdt_depth = 0;
+    object_property_del(OBJECT(drc), "device", NULL);
+    drc->dev = NULL;
+    drc->detach_cb = NULL;
+    drc->detach_cb_opaque = NULL;
+}
+
+static bool release_pending(sPAPRDRConnector *drc)
+{
+    return drc->awaiting_release;
+}
+
+static void reset(DeviceState *d)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    DPRINTFN("drc reset: %x", drck->get_index(drc));
+    /* immediately upon reset we can safely assume DRCs whose devices are pending
+     * removal can be safely removed, and that they will subsequently be left in
+     * an ISOLATED state. move the DRC to this state in these cases (which will in
+     * turn complete any pending device removals)
+     */
+    if (drc->awaiting_release) {
+        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_ISOLATED);
+    }
+}
+
+static void realize(DeviceState *d, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    Object *root_container;
+    char link_name[256];
+    gchar *child_name;
+    Error *err = NULL;
+
+    DPRINTFN("drc realize: %x", drck->get_index(drc));
+    /* NOTE: we do this as part of realize/unrealize due to the fact
+     * that the guest will communicate with the DRC via RTAS calls
+     * referencing the global DRC index. By unlinking the DRC
+     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
+     * inaccessible by the guest, since lookups rely on this path
+     * existing in the composition tree
+     */
+    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+    snprintf(link_name, sizeof(link_name), "%x", drck->get_index(drc));
+    child_name = object_get_canonical_path_component(OBJECT(drc));
+    DPRINTFN("drc child name: %s", child_name);
+    object_property_add_alias(root_container, link_name,
+                              drc->owner, child_name, &err);
+    if (err) {
+        error_report("%s", error_get_pretty(err));
+        error_free(err);
+        object_unref(OBJECT(drc));
+    }
+    DPRINTFN("drc realize complete");
+}
+
+static void unrealize(DeviceState *d, Error **errp)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(d);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    Object *root_container;
+    char name[256];
+    Error *err = NULL;
+
+    DPRINTFN("drc unrealize: %x", drck->get_index(drc));
+    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+    snprintf(name, sizeof(name), "%x", drck->get_index(drc));
+    object_property_del(root_container, name, &err);
+    if (err) {
+        error_report("%s", error_get_pretty(err));
+        error_free(err);
+        object_unref(OBJECT(drc));
+    }
+}
+
+sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
+                                         sPAPRDRConnectorType type,
+                                         uint32_t id)
+{
+    sPAPRDRConnector *drc =
+        SPAPR_DR_CONNECTOR(object_new(TYPE_SPAPR_DR_CONNECTOR));
+
+    g_assert(type);
+
+    drc->type = type;
+    drc->id = id;
+    drc->owner = owner;
+    object_property_add_child(owner, "dr-connector[*]", OBJECT(drc), NULL);
+    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
+
+    /* human-readable name for a DRC to encode into the DT
+     * description. this is mainly only used within a guest in place
+     * of the unique DRC index.
+     *
+     * in the case of VIO/PCI devices, it corresponds to a
+     * "location code" that maps a logical device/function (DRC index)
+     * to a physical (or virtual in the case of VIO) location in the
+     * system by chaining together the "location label" for each
+     * encapsulating component.
+     *
+     * since this is more to do with diagnosing physical hardware
+     * issues than guest compatibility, we choose location codes/DRC
+     * names that adhere to the documented format, but avoid encoding
+     * the entire topology information into the label/code, instead
+     * just using the location codes based on the labels for the
+     * endpoints (VIO/PCI adaptor connectors), which is basically
+     * just "C" followed by an integer ID.
+     *
+     * DRC names as documented by PAPR+ v2.7, 13.5.2.4
+     * location codes as documented by PAPR+ v2.7, 12.3.1.5
+     */
+    switch (drc->type) {
+    case SPAPR_DR_CONNECTOR_TYPE_CPU:
+        drc->name = g_strdup_printf("CPU %d", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PHB:
+        drc->name = g_strdup_printf("PHB %d", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_VIO:
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        drc->name = g_strdup_printf("C%d", id);
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        drc->name = g_strdup_printf("LMB %d", id);
+        break;
+    default:
+        g_assert(false);
+    }
+
+    return drc;
+}
+
+static void spapr_dr_connector_instance_init(Object *obj)
+{
+    sPAPRDRConnector *drc = SPAPR_DR_CONNECTOR(obj);
+
+    object_property_add_uint32_ptr(obj, "isolation-state",
+                                   &drc->isolation_state, NULL);
+    object_property_add_uint32_ptr(obj, "indicator-state",
+                                   &drc->indicator_state, NULL);
+    object_property_add_uint32_ptr(obj, "allocation-state",
+                                   &drc->allocation_state, NULL);
+    object_property_add_uint32_ptr(obj, "id", &drc->id, NULL);
+    object_property_add(obj, "index", "uint32", prop_get_index,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "connector_type", "uint32", prop_get_type,
+                        NULL, NULL, NULL, NULL);
+    object_property_add_str(obj, "name", prop_get_name, NULL, NULL);
+    object_property_add(obj, "entity-sense", "uint32", prop_get_entity_sense,
+                        NULL, NULL, NULL, NULL);
+    object_property_add(obj, "fdt", "struct", prop_get_fdt,
+                        NULL, NULL, NULL, NULL);
+}
+
+static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
+{
+    DeviceClass *dk = DEVICE_CLASS(k);
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+
+    dk->reset = reset;
+    dk->realize = realize;
+    dk->unrealize = unrealize;
+    drck->set_isolation_state = set_isolation_state;
+    drck->set_indicator_state = set_indicator_state;
+    drck->set_allocation_state = set_allocation_state;
+    drck->get_index = get_index;
+    drck->get_type = get_type;
+    drck->get_name = get_name;
+    drck->entity_sense = entity_sense;
+    drck->configure_connector = configure_connector;
+    drck->attach = attach;
+    drck->detach = detach;
+    drck->release_pending = release_pending;
+}
+
+static const TypeInfo spapr_dr_connector_info = {
+    .name          = TYPE_SPAPR_DR_CONNECTOR,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(sPAPRDRConnector),
+    .instance_init = spapr_dr_connector_instance_init,
+    .class_size    = sizeof(sPAPRDRConnectorClass),
+    .class_init    = spapr_dr_connector_class_init,
+};
+
+static void spapr_drc_register_types(void)
+{
+    type_register_static(&spapr_dr_connector_info);
+}
+
+type_init(spapr_drc_register_types)
+
+/* helper functions for external users */
+
+sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index)
+{
+    Object *obj;
+    char name[256];
+
+    snprintf(name, sizeof(name), "%s/%x", DRC_CONTAINER_PATH, index);
+    obj = object_resolve_path(name, NULL);
+
+    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
+}
+
+sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
+                                           uint32_t id)
+{
+    return spapr_dr_connector_by_index(
+            (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
+            (id & DRC_INDEX_ID_MASK));
+}
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
new file mode 100644
index 0000000..d6cf597
--- /dev/null
+++ b/include/hw/ppc/spapr_drc.h
@@ -0,0 +1,205 @@
+/*
+ * QEMU SPAPR Dynamic Reconfiguration Connector Implementation
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * Authors:
+ *  Michael Roth      <mdroth@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#if !defined(__HW_SPAPR_DRC_H__)
+#define __HW_SPAPR_DRC_H__
+
+#include "qom/object.h"
+#include "hw/qdev.h"
+#include "libfdt.h"
+
+#define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
+#define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DR_CONNECTOR)
+#define SPAPR_DR_CONNECTOR_CLASS(klass) \
+        OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, \
+                           TYPE_SPAPR_DR_CONNECTOR)
+#define SPAPR_DR_CONNECTOR(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
+                                             TYPE_SPAPR_DR_CONNECTOR)
+
+/*
+ * Various hotplug types managed by sPAPRDRConnector
+ *
+ * these are somewhat arbitrary, but to make things easier
+ * when generating DRC indexes later we've aligned the bit
+ * positions with the values used to assign DRC indexes on
+ * pSeries. we use those values as bit shifts to allow for
+ * the OR'ing of these values in various QEMU routines, but
+ * for values exposed to the guest (via DRC indexes for
+ * instance) we will use the shift amounts.
+ */
+typedef enum {
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU = 1,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB = 2,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO = 3,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI = 4,
+    SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB = 8,
+} sPAPRDRConnectorTypeShift;
+
+typedef enum {
+    SPAPR_DR_CONNECTOR_TYPE_ANY = ~0,
+    SPAPR_DR_CONNECTOR_TYPE_CPU = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_CPU,
+    SPAPR_DR_CONNECTOR_TYPE_PHB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB,
+    SPAPR_DR_CONNECTOR_TYPE_VIO = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_VIO,
+    SPAPR_DR_CONNECTOR_TYPE_PCI = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_PCI,
+    SPAPR_DR_CONNECTOR_TYPE_LMB = 1 << SPAPR_DR_CONNECTOR_TYPE_SHIFT_LMB,
+} sPAPRDRConnectorType;
+
+/*
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177
+ *
+ * isolated: put device under firmware control
+ * unisolated: claim OS control of device (may or may not be in use)
+ */
+typedef enum {
+    SPAPR_DR_ISOLATION_STATE_ISOLATED   = 0,
+    SPAPR_DR_ISOLATION_STATE_UNISOLATED = 1
+} sPAPRDRIsolationState;
+
+/*
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177
+ *
+ * unusable: mark device as unavailable to OS
+ * usable: mark device as available to OS
+ * exchange: (currently unused)
+ * recover: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_ALLOCATION_STATE_UNUSABLE  = 0,
+    SPAPR_DR_ALLOCATION_STATE_USABLE    = 1,
+    SPAPR_DR_ALLOCATION_STATE_EXCHANGE  = 2,
+    SPAPR_DR_ALLOCATION_STATE_RECOVER   = 3
+} sPAPRDRAllocationState;
+
+/*
+ * LED/visual indicator state
+ *
+ * set via set-indicator RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.4, Table 177,
+ * and PAPR+ 2.7 13.5.4.1, Table 180
+ *
+ * inactive: hotpluggable entity inactive and safely removable
+ * active: hotpluggable entity in use and not safely removable
+ * identify: (currently unused)
+ * action: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_INDICATOR_STATE_INACTIVE   = 0,
+    SPAPR_DR_INDICATOR_STATE_ACTIVE     = 1,
+    SPAPR_DR_INDICATOR_STATE_IDENTIFY   = 2,
+    SPAPR_DR_INDICATOR_STATE_ACTION     = 3,
+} sPAPRDRIndicatorState;
+
+/*
+ * returned via get-sensor-state RTAS calls
+ * as documented by PAPR+ 2.7 13.5.3.3, Table 175:
+ *
+ * empty: connector slot empty (e.g. empty hotpluggable PCI slot)
+ * present: connector slot populated and device available to OS
+ * unusable: device not currently available to OS
+ * exchange: (currently unused)
+ * recover: (currently unused)
+ */
+typedef enum {
+    SPAPR_DR_ENTITY_SENSE_EMPTY     = 0,
+    SPAPR_DR_ENTITY_SENSE_PRESENT   = 1,
+    SPAPR_DR_ENTITY_SENSE_UNUSABLE  = 2,
+    SPAPR_DR_ENTITY_SENSE_EXCHANGE  = 3,
+    SPAPR_DR_ENTITY_SENSE_RECOVER   = 4,
+} sPAPRDREntitySense;
+
+typedef enum {
+    SPAPR_DR_CC_RESPONSE_NEXT_SIB       = 1, /* currently unused */
+    SPAPR_DR_CC_RESPONSE_NEXT_CHILD     = 2,
+    SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY  = 3,
+    SPAPR_DR_CC_RESPONSE_PREV_PARENT    = 4,
+    SPAPR_DR_CC_RESPONSE_SUCCESS        = 0,
+    SPAPR_DR_CC_RESPONSE_ERROR          = -1,
+    SPAPR_DR_CC_RESPONSE_CONTINUE       = -2,
+} sPAPRDRCCResponse;
+
+typedef struct sPAPRDRCCState {
+    void *fdt;
+    int fdt_start_offset;
+    int fdt_offset;
+    int fdt_depth;
+    bool configured;
+} sPAPRDRCCState;
+
+typedef void (spapr_drc_detach_cb)(DeviceState *d, void *opaque);
+
+typedef struct sPAPRDRConnector {
+    /*< private >*/
+    DeviceState parent;
+
+    sPAPRDRConnectorType type;
+    uint32_t id;
+    Object *owner;
+    const char *name;
+
+    /* sensor/indicator states */
+    uint32_t isolation_state;
+    uint32_t allocation_state;
+    uint32_t indicator_state;
+
+    /* configure-connector state */
+    sPAPRDRCCState ccs;
+
+    bool awaiting_release;
+
+    /* device pointer, via link property */
+    DeviceState *dev;
+    spapr_drc_detach_cb *detach_cb;
+    void *detach_cb_opaque;
+} sPAPRDRConnector;
+
+typedef struct sPAPRDRConnectorClass {
+    /*< private >*/
+    DeviceClass parent;
+
+    /*< public >*/
+
+    /* accessors for guest-visible (generally via RTAS) DR state */
+    int (*set_isolation_state)(sPAPRDRConnector *drc,
+                               sPAPRDRIsolationState state);
+    int (*set_indicator_state)(sPAPRDRConnector *drc,
+                               sPAPRDRIndicatorState state);
+    int (*set_allocation_state)(sPAPRDRConnector *drc,
+                                sPAPRDRAllocationState state);
+    uint32_t (*get_index)(sPAPRDRConnector *drc);
+    uint32_t (*get_type)(sPAPRDRConnector *drc);
+    const char *(*get_name)(sPAPRDRConnector *drc);
+
+    sPAPRDREntitySense (*entity_sense)(sPAPRDRConnector *drc);
+    sPAPRDRCCResponse (*configure_connector)(sPAPRDRConnector *drc,
+                                             char **prop_name,
+                                             const struct fdt_property **prop,
+                                             int *prop_len);
+
+    /* QEMU interfaces for managing hotplug operations */
+    void (*attach)(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
+                   int fdt_start_offset, bool coldplug, Error **errp);
+    void (*detach)(sPAPRDRConnector *drc, DeviceState *d,
+                   spapr_drc_detach_cb *detach_cb,
+                   void *detach_cb_opaque, Error **errp);
+    bool (*release_pending)(sPAPRDRConnector *drc);
+} sPAPRDRConnectorClass;
+
+sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
+                                         sPAPRDRConnectorType type,
+                                         uint32_t token);
+sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
+sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
+                                           uint32_t id);
+
+#endif /* __HW_SPAPR_DRC_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  6:29   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface Michael Roth
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

These interfaces manage the power domains that guest devices are
assigned to and are used to power on/off devices. Currently we
only utilize 1 power domain, the 'live-insertion' domain, which
automates power management of plugged/unplugged devices, essentially
making these calls no-ops, but the RTAS interfaces are still required
by guest hotplug code and PAPR+.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
these interfaces.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 0f1ae55..d7694cd 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -245,6 +245,56 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
     rtas_st(rets, 0, ret);
 }
 
+static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args, uint32_t nret,
+                                 target_ulong rets)
+{
+    int32_t power_domain;
+
+    if (nargs != 2 || nret != 2) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    /* we currently only use a single, "live insert" powerdomain for
+     * hotplugged/dlpar'd resources, so the power is always live/full (100)
+     */
+    power_domain = rtas_ld(args, 0);
+    if (power_domain != -1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, 100);
+}
+
+static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args, uint32_t nret,
+                                  target_ulong rets)
+{
+    int32_t power_domain;
+
+    if (nargs != 1 || nret != 2) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    /* we currently only use a single, "live insert" powerdomain for
+     * hotplugged/dlpar'd resources, so the power is always live/full (100)
+     */
+    power_domain = rtas_ld(args, 0);
+    if (power_domain != -1) {
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, 100);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -370,6 +420,10 @@ static void core_rtas_register_types(void)
                         rtas_ibm_set_system_parameter);
     spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
                         rtas_ibm_os_term);
+    spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
+                        rtas_set_power_level);
+    spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
+                        rtas_get_power_level);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (2 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  6:32   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state " Michael Roth
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Mike Day <ncmike@ncultra.org>

This interface allows a guest to control various platform/device
sensors. Initially, we only implement support necessary to control
sensors that are required for hotplug: DR connector indicators/LEDs,
resource allocation state, and resource isolation state.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c    | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h | 11 +++++++
 2 files changed, 95 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d7694cd..6c741fa 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -35,6 +35,18 @@
 #include "qapi-event.h"
 
 #include <libfdt.h>
+#include "hw/ppc/spapr_drc.h"
+
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 
 static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                    uint32_t token, uint32_t nargs,
@@ -295,6 +307,76 @@ static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static bool sensor_type_is_dr(uint32_t sensor_type)
+{
+    switch (sensor_type) {
+    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
+    case RTAS_SENSOR_TYPE_DR:
+    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
+        return true;
+    }
+
+    return false;
+}
+
+static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                               uint32_t token, uint32_t nargs,
+                               target_ulong args, uint32_t nret,
+                               target_ulong rets)
+{
+    uint32_t sensor_type;
+    uint32_t sensor_index;
+    uint32_t sensor_state;
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+
+    if (nargs != 3 || nret != 1) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    sensor_type = rtas_ld(args, 0);
+    sensor_index = rtas_ld(args, 1);
+    sensor_state = rtas_ld(args, 2);
+
+    if (!sensor_type_is_dr(sensor_type)) {
+        goto out_unimplemented;
+    }
+
+    /* if this is a DR sensor we can assume sensor_index == drc_index */
+    drc = spapr_dr_connector_by_index(sensor_index);
+    if (!drc) {
+        DPRINTF("rtas_set_indicator: invalid sensor/DRC index: %xh\n",
+                sensor_index);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    switch (sensor_type) {
+    case RTAS_SENSOR_TYPE_ISOLATION_STATE:
+        drck->set_isolation_state(drc, sensor_state);
+        break;
+    case RTAS_SENSOR_TYPE_DR:
+        drck->set_indicator_state(drc, sensor_state);
+        break;
+    case RTAS_SENSOR_TYPE_ALLOCATION_STATE:
+        drck->set_allocation_state(drc, sensor_state);
+        break;
+    default:
+        goto out_unimplemented;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+out_unimplemented:
+    /* currently only DR-related sensors are implemented */
+    DPRINTF("rtas_set_indicator: sensor/indicator not implemented: %d\n",
+            sensor_type);
+    rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -424,6 +506,8 @@ static void core_rtas_register_types(void)
                         rtas_set_power_level);
     spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
                         rtas_get_power_level);
+    spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
+                        rtas_set_indicator);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a8e6817..96eb21e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -391,6 +391,17 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 #define RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE        42
 #define RTAS_SYSPARM_UUID                        48
 
+/* RTAS indicator/sensor types
+ *
+ * as defined by PAPR+ 2.7 7.3.5.4, Table 41
+ *
+ * NOTE: currently only DR-related sensors are implemented here
+ */
+#define RTAS_SENSOR_TYPE_ISOLATION_STATE        9001
+#define RTAS_SENSOR_TYPE_DR                     9002
+#define RTAS_SENSOR_TYPE_ALLOCATION_STATE       9003
+#define RTAS_SENSOR_TYPE_ENTITY_SENSE RTAS_SENSOR_TYPE_ALLOCATION_STATE
+
 /* Possible values for the platform-processor-diagnostics-run-mode parameter
  * of the RTAS ibm,get-system-parameter call.
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state RTAS interface
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (3 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  6:33   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 06/16] spapr: add rtas_st_buffer_direct() helper Michael Roth
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Mike Day <ncmike@ncultra.org>

This interface allows a guest to read various platform/device sensors.
initially, we only implement support necessary to support hotplug:
reading of the dr-entity-sense sensor, which communicates the state of
a hotplugged resource/device to the guest (EMPTY/PRESENT/UNUSABLE).

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 6c741fa..f80beb2 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -377,6 +377,47 @@ out_unimplemented:
     rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
 }
 
+static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args, uint32_t nret,
+                                  target_ulong rets)
+{
+    uint32_t sensor_type;
+    uint32_t sensor_index;
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    uint32_t entity_sense;
+
+    if (nargs != 2 || nret != 2) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    sensor_type = rtas_ld(args, 0);
+    sensor_index = rtas_ld(args, 1);
+
+    if (sensor_type != RTAS_SENSOR_TYPE_ENTITY_SENSE) {
+        /* currently only DR-related sensors are implemented */
+        DPRINTF("rtas_get_sensor_state: sensor/indicator not implemented: %d\n",
+                sensor_type);
+        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
+        return;
+    }
+
+    drc = spapr_dr_connector_by_index(sensor_index);
+    if (!drc) {
+        DPRINTF("rtas_get_sensor_state: invalid sensor/DRC index: %xh\n",
+                sensor_index);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    entity_sense = drck->entity_sense(drc);
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, entity_sense);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -508,6 +549,8 @@ static void core_rtas_register_types(void)
                         rtas_get_power_level);
     spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
                         rtas_set_indicator);
+    spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
+                        rtas_get_sensor_state);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 06/16] spapr: add rtas_st_buffer_direct() helper
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (4 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state " Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This is similar to the existing rtas_st_buffer(), but for cases
where the guest is not expecting a length-encoded byte array.
Namely, for calls where a "work area" buffer is used to pass
around arbitrary fields/data.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 include/hw/ppc/spapr.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 96eb21e..3cc4490 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -425,6 +425,13 @@ static inline void rtas_st(target_ulong phys, int n, uint32_t val)
     stl_be_phys(&address_space_memory, ppc64_phys_to_real(phys + 4*n), val);
 }
 
+static inline void rtas_st_buffer_direct(target_ulong phys,
+                                         target_ulong phys_len,
+                                         uint8_t *buffer, uint16_t buffer_len)
+{
+    cpu_physical_memory_write(ppc64_phys_to_real(phys), buffer,
+                              MIN(buffer_len, phys_len));
+}
 
 static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
                                   uint8_t *buffer, uint16_t buffer_len)
@@ -434,8 +441,7 @@ static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
     }
     stw_be_phys(&address_space_memory,
                 ppc64_phys_to_real(phys), buffer_len);
-    cpu_physical_memory_write(ppc64_phys_to_real(phys + 2),
-                              buffer, MIN(buffer_len, phys_len - 2));
+    rtas_st_buffer_direct(phys + 2, phys_len - 2, buffer, buffer_len);
 }
 
 typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (5 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 06/16] spapr: add rtas_st_buffer_direct() helper Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  6:40   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This interface is used to fetch an OF device-tree nodes that describes a
newly-attached device to guest. It is called multiple times to walk the
device-tree node and fetch individual properties into a 'workarea'/buffer
provided by the guest.

The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
the initial hotplug operation, and the state of these RTAS calls is tracked by
the sPAPRDRConnector. When the last of these properties is successfully
fetched, we report as special return value to the guest and transition
the device to a 'configured' state on the QEMU/DRC side.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index f80beb2..b8551b0 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -418,6 +418,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, entity_sense);
 }
 
+/* configure-connector work area offsets, int32_t units for field
+ * indexes, bytes for field offset/len values.
+ *
+ * as documented by PAPR+ v2.7, 13.5.3.5
+ */
+#define CC_IDX_NODE_NAME_OFFSET 2
+#define CC_IDX_PROP_NAME_OFFSET 2
+#define CC_IDX_PROP_LEN 3
+#define CC_IDX_PROP_DATA_OFFSET 4
+#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
+#define CC_WA_LEN 4096
+
+static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
+                                         sPAPREnvironment *spapr,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args, uint32_t nret,
+                                         target_ulong rets)
+{
+    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
+    uint64_t wa_offset;
+    uint32_t drc_index;
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    sPAPRDRCCResponse resp;
+    const struct fdt_property *prop = NULL;
+    char *prop_name = NULL;
+    int prop_len, rc;
+
+    drc_index = rtas_ld(wa_addr, 0);
+    drc = spapr_dr_connector_by_index(drc_index);
+    if (!drc) {
+        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
+                drc_index);
+        rc = RTAS_OUT_PARAM_ERROR;
+        goto out;
+    }
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
+
+    switch (resp) {
+    case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
+        /* provide the name of the next OF node */
+        wa_offset = CC_VAL_DATA_OFFSET;
+        rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)prop_name, strlen(prop_name) + 1);
+        break;
+    case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY:
+        /* provide the name of the next OF property */
+        wa_offset = CC_VAL_DATA_OFFSET;
+        rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)prop_name, strlen(prop_name) + 1);
+
+        /* provide the length and value of the OF property. data gets placed
+         * immediately after NULL terminator of the OF property's name string
+         */
+        wa_offset += strlen(prop_name) + 1,
+        rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
+        rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
+        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
+                              (uint8_t *)((struct fdt_property *)prop)->data,
+                              prop_len);
+        break;
+    case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
+    case SPAPR_DR_CC_RESPONSE_ERROR:
+    case SPAPR_DR_CC_RESPONSE_SUCCESS:
+        break;
+    default:
+        /* drck->configure_connector() should not return anything else */
+        g_assert(false);
+    }
+
+    rc = resp;
+out:
+    g_free(prop_name);
+    rtas_st(rets, 0, rc);
+}
+
 static struct rtas_call {
     const char *name;
     spapr_rtas_fn fn;
@@ -551,6 +630,8 @@ static void core_rtas_register_types(void)
                         rtas_set_indicator);
     spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
                         rtas_get_sensor_state);
+    spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
+                        rtas_ibm_configure_connector);
 }
 
 type_init(core_rtas_register_types)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (6 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  6:49   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface Michael Roth
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interfaces to also include event types
for hotplug/unplug events.

This is currently undocumented and being finalized for inclusion in PAPR
specification, but we implement this here as an extension for guest
userspace tools to implement (existing guest kernels simply log these
events via a sysfs interface that's read by rtas_errd, and current
versions of rtas_errd/powerpc-utils already support the use of this
mechanism for initiating hotplug operations).

We also add support for queues of pending RTAS events, since in the
case of hotplug there's chance for multiple events being in-flight
at any point in time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   2 +-
 hw/ppc/spapr_events.c  | 271 ++++++++++++++++++++++++++++++++++++++++---------
 include/hw/ppc/spapr.h |   5 +-
 3 files changed, 226 insertions(+), 52 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bfacc9d..861107e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1640,7 +1640,7 @@ static void ppc_spapr_init(MachineState *machine)
     spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
                                             kernel_size, kernel_le,
                                             boot_device, kernel_cmdline,
-                                            spapr->epow_irq);
+                                            spapr->check_exception_irq);
     assert(spapr->fdt_skel != NULL);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 283e96b..0eeb1d8 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -32,6 +32,9 @@
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
+#include "hw/pci/pci.h"
+#include "hw/pci-host/spapr.h"
+#include "hw/ppc/spapr_drc.h"
 
 #include <libfdt.h>
 
@@ -77,6 +80,7 @@ struct rtas_error_log {
 #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
 #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
 #define   RTAS_LOG_TYPE_EPOW                    0x00000040
+#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
     uint32_t extended_length;
 } QEMU_PACKED;
 
@@ -166,6 +170,38 @@ struct epow_log_full {
     struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_hp {
+#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct rtas_event_log_v6_section_header hdr;
+    uint8_t hotplug_type;
+#define RTAS_LOG_V6_HP_TYPE_CPU                          1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
+#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
+#define RTAS_LOG_V6_HP_TYPE_PHB                          4
+#define RTAS_LOG_V6_HP_TYPE_PCI                          5
+    uint8_t hotplug_action;
+#define RTAS_LOG_V6_HP_ACTION_ADD                        1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
+    uint8_t hotplug_identifier;
+#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
+    uint8_t reserved;
+    union {
+        uint32_t index;
+        uint32_t count;
+        char name[1];
+    } drc;
+} QEMU_PACKED;
+
+struct hp_log_full {
+    struct rtas_error_log hdr;
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_maina maina;
+    struct rtas_event_log_v6_mainb mainb;
+    struct rtas_event_log_v6_hp hp;
+} QEMU_PACKED;
+
 #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
 #define EVENT_MASK_EPOW                      0x40000000
 #define EVENT_MASK_HOTPLUG                   0x10000000
@@ -181,67 +217,84 @@ struct epow_log_full {
         }                                                          \
     } while (0)
 
-void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
+void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
 {
-    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
-    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
+    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
+    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
 
     _FDT((fdt_begin_node(fdt, "event-sources")));
 
     _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
     _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
     _FDT((fdt_property(fdt, "interrupt-ranges",
-                       epow_irq_ranges, sizeof(epow_irq_ranges))));
+                       irq_ranges, sizeof(irq_ranges))));
 
     _FDT((fdt_begin_node(fdt, "epow-events")));
-    _FDT((fdt_property(fdt, "interrupts",
-                       epow_interrupts, sizeof(epow_interrupts))));
+    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
     _FDT((fdt_end_node(fdt)));
 
     _FDT((fdt_end_node(fdt)));
 }
 
-static struct epow_log_full *pending_epow;
-static uint32_t next_plid;
+typedef struct EventLogEntry {
+    int log_type;
+    void *data;
+    QTAILQ_ENTRY(EventLogEntry) next;
+} EventLogEntry;
 
-static void spapr_powerdown_req(Notifier *n, void *opaque)
+static struct {
+    QTAILQ_HEAD(, EventLogEntry) pending_events;
+} rtas_event_log;
+
+static void rtas_event_log_queue(int log_type, void *data)
 {
-    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
-    struct rtas_error_log *hdr;
-    struct rtas_event_log_v6 *v6hdr;
-    struct rtas_event_log_v6_maina *maina;
-    struct rtas_event_log_v6_mainb *mainb;
-    struct rtas_event_log_v6_epow *epow;
-    struct tm tm;
-    int year;
+    EventLogEntry *entry = g_new(EventLogEntry, 1);
+
+    entry->log_type = log_type;
+    entry->data = data;
+    QTAILQ_INSERT_TAIL(&rtas_event_log.pending_events, entry, next);
+}
 
-    if (pending_epow) {
-        /* For now, we just throw away earlier events if two come
-         * along before any are consumed.  This is sufficient for our
-         * powerdown messages, but we'll need more if we do more
-         * general error/event logging */
-        g_free(pending_epow);
+static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask)
+{
+    EventLogEntry *entry = NULL;
+
+    /* we only queue EPOW events atm. */
+    if ((event_mask & EVENT_MASK_EPOW) == 0) {
+        return NULL;
     }
-    pending_epow = g_malloc0(sizeof(*pending_epow));
-    hdr = &pending_epow->hdr;
-    v6hdr = &pending_epow->v6hdr;
-    maina = &pending_epow->maina;
-    mainb = &pending_epow->mainb;
-    epow = &pending_epow->epow;
 
-    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
-                               | RTAS_LOG_SEVERITY_EVENT
-                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
-                               | RTAS_LOG_OPTIONAL_PART_PRESENT
-                               | RTAS_LOG_TYPE_EPOW);
-    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
-                                       - sizeof(pending_epow->hdr));
+    QTAILQ_FOREACH(entry, &rtas_event_log.pending_events, next) {
+        /* EPOW and hotplug events are surfaced in the same manner */
+        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
+            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
+            break;
+        }
+    }
+
+    if (entry) {
+        QTAILQ_REMOVE(&rtas_event_log.pending_events, entry, next);
+    }
 
+    return entry;
+}
+
+static uint32_t next_plid;
+
+static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
+{
     v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
         | RTAS_LOG_V6_B0_BIGENDIAN;
     v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
         | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
     v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
+}
+
+static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
+                             int section_count)
+{
+    struct tm tm;
+    int year;
 
     maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
     maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
@@ -256,8 +309,37 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
                                        | (to_bcd(tm.tm_min) << 16)
                                        | (to_bcd(tm.tm_sec) << 8));
     maina->creator_id = 'H'; /* Hypervisor */
-    maina->section_count = 3; /* Main-A, Main-B and EPOW */
+    maina->section_count = section_count;
     maina->plid = next_plid++;
+}
+
+static void spapr_powerdown_req(Notifier *n, void *opaque)
+{
+    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
+    struct rtas_error_log *hdr;
+    struct rtas_event_log_v6 *v6hdr;
+    struct rtas_event_log_v6_maina *maina;
+    struct rtas_event_log_v6_mainb *mainb;
+    struct rtas_event_log_v6_epow *epow;
+    struct epow_log_full *pending_epow;
+
+    pending_epow = g_malloc0(sizeof(*pending_epow));
+    hdr = &pending_epow->hdr;
+    v6hdr = &pending_epow->v6hdr;
+    maina = &pending_epow->maina;
+    mainb = &pending_epow->mainb;
+    epow = &pending_epow->epow;
+
+    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
+                               | RTAS_LOG_SEVERITY_EVENT
+                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
+                               | RTAS_LOG_OPTIONAL_PART_PRESENT
+                               | RTAS_LOG_TYPE_EPOW);
+    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
+                                       - sizeof(pending_epow->hdr));
+
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
 
     mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
     mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
@@ -274,7 +356,78 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
+    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow);
+
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
+{
+    struct hp_log_full *new_hp;
+    struct rtas_error_log *hdr;
+    struct rtas_event_log_v6 *v6hdr;
+    struct rtas_event_log_v6_maina *maina;
+    struct rtas_event_log_v6_mainb *mainb;
+    struct rtas_event_log_v6_hp *hp;
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    sPAPRDRConnectorType drc_type = drck->get_type(drc);
+
+    new_hp = g_malloc0(sizeof(struct hp_log_full));
+    hdr = &new_hp->hdr;
+    v6hdr = &new_hp->v6hdr;
+    maina = &new_hp->maina;
+    mainb = &new_hp->mainb;
+    hp = &new_hp->hp;
+
+    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
+                               | RTAS_LOG_SEVERITY_EVENT
+                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
+                               | RTAS_LOG_OPTIONAL_PART_PRESENT
+                               | RTAS_LOG_INITIATOR_HOTPLUG
+                               | RTAS_LOG_TYPE_HOTPLUG);
+    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
+                                       - sizeof(new_hp->hdr));
+
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
+
+    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
+    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
+    mainb->subsystem_id = 0x80; /* External environment */
+    mainb->event_severity = 0x00; /* Informational / non-error */
+    mainb->event_subtype = 0x00; /* Normal shutdown */
+
+    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
+    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
+    hp->hdr.section_version = 1; /* includes extended modifier */
+    hp->hotplug_action = hp_action;
+
+
+    switch (drc_type) {
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        hp->drc.index = cpu_to_be32(drck->get_index(drc));
+        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
+        break;
+    default:
+        /* skip notification for unknown connector types */
+        g_free(new_hp);
+        return;
+    }
+
+    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp);
+
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
+{
+    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
+}
+
+void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
+{
+    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
 }
 
 static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
@@ -282,8 +435,9 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                             target_ulong args,
                             uint32_t nret, target_ulong rets)
 {
-    uint32_t mask, buf, len;
+    uint32_t mask, buf, len, event_len;
     uint64_t xinfo;
+    EventLogEntry *event;
 
     if ((nargs < 6) || (nargs > 7) || nret != 1) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
@@ -298,23 +452,40 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
     }
 
-    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
-        if (sizeof(*pending_epow) < len) {
-            len = sizeof(*pending_epow);
-        }
+    event = rtas_event_log_dequeue(mask);
+    if (!event) {
+        goto out_no_events;
+    }
 
-        cpu_physical_memory_write(buf, pending_epow, len);
-        g_free(pending_epow);
-        pending_epow = NULL;
-        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
-    } else {
-        rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
+    switch (event->log_type) {
+        case RTAS_LOG_TYPE_EPOW:
+            event_len = sizeof(struct epow_log_full);
+            break;
+        case RTAS_LOG_TYPE_HOTPLUG:
+            event_len = sizeof(struct hp_log_full);
+            break;
+        default:
+            goto out_no_events;
     }
+
+    if (event_len < len) {
+        len = event_len;
+    }
+
+    cpu_physical_memory_write(buf, event->data, len);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    g_free(event->data);
+    g_free(event);
+    return;
+
+out_no_events:
+    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
 }
 
 void spapr_events_init(sPAPREnvironment *spapr)
 {
-    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
+    QTAILQ_INIT(&rtas_event_log.pending_events);
+    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3cc4490..1d27708 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -3,6 +3,7 @@
 
 #include "sysemu/dma.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/spapr_drc.h"
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
@@ -31,7 +32,7 @@ typedef struct sPAPREnvironment {
     struct PPCTimebase tb;
     bool has_graphics;
 
-    uint32_t epow_irq;
+    uint32_t check_exception_irq;
     Notifier epow_notifier;
 
     /* Migration state */
@@ -498,6 +499,8 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
                       sPAPRTCETable *tcet);
+void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
+void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
 
 #define TYPE_SPAPR_RTC "spapr-rtc"
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (7 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  9:11   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface here.

Since events surfaced by this call are mutually exclusive to those
surfaced via check-exception, we also update the RTAS event queue code
to accept a boolean to mark/filter for events accordingly.

Events of this sort are not currently generated by QEMU, but the interface
has been tested by surfacing hotplug events via event-scan in place
of check-exception.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |  1 +
 hw/ppc/spapr_events.c  | 67 ++++++++++++++++++++++++++++++++++++++++++++++----
 include/hw/ppc/spapr.h |  2 ++
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 861107e..4b7be72 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -541,6 +541,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
         refpoints, sizeof(refpoints))));
 
     _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
+    _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", RTAS_EVENT_SCAN_RATE)));
 
     /*
      * According to PAPR, rtas ibm,os-term does not guarantee a return
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 0eeb1d8..867c8e0 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -238,6 +238,11 @@ void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
 
 typedef struct EventLogEntry {
     int log_type;
+    /* according to PAPR+ 2.7, 10.1, events to be retrieved via
+     * check-exception are mutually exclusive with those to be
+     * retrieved via event-scan
+     */
+    bool exception;
     void *data;
     QTAILQ_ENTRY(EventLogEntry) next;
 } EventLogEntry;
@@ -246,16 +251,17 @@ static struct {
     QTAILQ_HEAD(, EventLogEntry) pending_events;
 } rtas_event_log;
 
-static void rtas_event_log_queue(int log_type, void *data)
+static void rtas_event_log_queue(int log_type, void *data, bool exception)
 {
     EventLogEntry *entry = g_new(EventLogEntry, 1);
 
     entry->log_type = log_type;
+    entry->exception = exception;
     entry->data = data;
     QTAILQ_INSERT_TAIL(&rtas_event_log.pending_events, entry, next);
 }
 
-static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask)
+static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask, bool exception)
 {
     EventLogEntry *entry = NULL;
 
@@ -265,6 +271,9 @@ static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask)
     }
 
     QTAILQ_FOREACH(entry, &rtas_event_log.pending_events, next) {
+        if (entry->exception != exception) {
+            continue;
+        }
         /* EPOW and hotplug events are surfaced in the same manner */
         if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
             entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
@@ -356,7 +365,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow);
+    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow, true);
 
     qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
 }
@@ -415,7 +424,7 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
         return;
     }
 
-    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp);
+    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
 
     qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
 }
@@ -452,7 +461,54 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
     }
 
-    event = rtas_event_log_dequeue(mask);
+    event = rtas_event_log_dequeue(mask, true);
+    if (!event) {
+        goto out_no_events;
+    }
+
+    switch (event->log_type) {
+        case RTAS_LOG_TYPE_EPOW:
+            event_len = sizeof(struct epow_log_full);
+            break;
+        case RTAS_LOG_TYPE_HOTPLUG:
+            event_len = sizeof(struct hp_log_full);
+            break;
+        default:
+            goto out_no_events;
+    }
+
+    if (event_len < len) {
+        len = event_len;
+    }
+
+    cpu_physical_memory_write(buf, event->data, len);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    g_free(event->data);
+    g_free(event);
+    return;
+
+out_no_events:
+    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
+}
+
+static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                       uint32_t token, uint32_t nargs,
+                       target_ulong args,
+                       uint32_t nret, target_ulong rets)
+{
+    uint32_t mask, buf, len, event_len;
+    EventLogEntry *event;
+
+    if (nargs != 4 || nret != 1) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    mask = rtas_ld(args, 0);
+    buf = rtas_ld(args, 2);
+    len = rtas_ld(args, 3);
+
+    event = rtas_event_log_dequeue(mask, false);
     if (!event) {
         goto out_no_events;
     }
@@ -490,4 +546,5 @@ void spapr_events_init(sPAPREnvironment *spapr)
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
                         check_exception);
+    spapr_rtas_register(RTAS_EVENT_SCAN, "event-scan", event_scan);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1d27708..8acf41a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -465,6 +465,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+#define RTAS_EVENT_SCAN_RATE    1
+
 typedef struct sPAPRTCETable sPAPRTCETable;
 
 #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt()
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (8 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  9:26   ` David Gibson
  2015-02-26 15:56   ` Nathan Fontenot
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 11/16] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This function handles generation of ibm,drc-* array device tree
properties to describe DRC topology to guests. This will by used
by the guest to direct RTAS calls to manage any dynamic resources
we associate with a particular DR Connector as part of
hotplug/unplug.

Since general management of boot-time device trees are handled
outside of sPAPRDRConnector, we insert these values blindly given
an FDT and offset. A mask of sPAPRDRConnector types is given to
instruct us on what types of connectors entries should be generated
for, since descriptions for different connectors may live in
different parts of the device tree.

Based on code originally written by Nathan Fontenot.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_drc.c         | 165 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h |   4 +-
 2 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 8b01200..2f71696 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -569,3 +569,168 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
             (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
             (id & DRC_INDEX_ID_MASK));
 }
+
+/* generate a string the describes the DRC to encode into the
+ * device tree.
+ *
+ * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
+ */
+static char *spapr_drc_get_type_str(sPAPRDRConnectorType type)
+{
+    char *type_str = NULL;
+
+    switch (type) {
+    case SPAPR_DR_CONNECTOR_TYPE_CPU:
+        type_str = g_strdup_printf("CPU");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PHB:
+        type_str = g_strdup_printf("PHB");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_VIO:
+        type_str = g_strdup_printf("SLOT");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_PCI:
+        type_str = g_strdup_printf("28");
+        break;
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        type_str = g_strdup_printf("MEM");
+        break;
+    default:
+        g_assert(false);
+    }
+
+    return type_str;
+}
+
+/**
+ * spapr_drc_populate_dt
+ *
+ * @fdt: libfdt device tree
+ * @path: path in the DT to generate properties
+ * @owner: parent Object/DeviceState for which to generate DRC
+ *         descriptions for
+ * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
+ *   to the types of DRCs to generate entries for
+ *
+ * generate OF properties to describe DRC topology/indices to guests
+ *
+ * as documented in PAPR+ v2.1, 13.5.2
+ */
+int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
+                          uint32_t drc_type_mask)
+{
+    Object *root_container;
+    ObjectProperty *prop;
+    uint32_t drc_count = 0;
+    GArray *drc_indexes, *drc_power_domains;
+    GString *drc_names, *drc_types;
+    int ret;
+
+    /* the first entry of each properties is a 32-bit integer encoding
+     * the number of elements in the array. we won't know this until
+     * we complete the iteration through all the matching DRCs, but
+     * reserve the space now and set the offsets accordingly so we
+     * can fill them in later.
+     */
+    drc_indexes = g_array_new(false, true, sizeof(uint32_t));
+    drc_indexes = g_array_set_size(drc_indexes, 1);
+    drc_power_domains = g_array_new(false, true, sizeof(uint32_t));
+    drc_power_domains = g_array_set_size(drc_power_domains, 1);
+    drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
+    drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
+
+    /* aliases for all DRConnector objects will be rooted in QOM
+     * composition tree at DRC_CONTAINER_PATH
+     */
+    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+
+    QTAILQ_FOREACH(prop, &root_container->properties, node) {
+        Object *obj;
+        sPAPRDRConnector *drc;
+        sPAPRDRConnectorClass *drck;
+        char *drc_type;
+        uint32_t drc_index, drc_power_domain;
+
+        if (!strstart(prop->type, "link<", NULL)) {
+            continue;
+        }
+
+        obj = object_property_get_link(root_container, prop->name, NULL);
+        drc = SPAPR_DR_CONNECTOR(obj);
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+        if (owner && (drc->owner != owner)) {
+            continue;
+        }
+
+        if ((drc->type & drc_type_mask) == 0) {
+            continue;
+        }
+
+        drc_count++;
+
+        /* ibm,drc-indexes */
+        drc_index = cpu_to_be32(drck->get_index(drc));
+        g_array_append_val(drc_indexes, drc_index);
+
+        /* ibm,drc-power-domains */
+        drc_power_domain = cpu_to_be32(-1);
+        g_array_append_val(drc_power_domains, drc_power_domain);
+
+        /* ibm,drc-names */
+        drc_names = g_string_append(drc_names, drck->get_name(drc));
+        drc_names = g_string_insert_len(drc_names, -1, "\0", 1);
+
+        /* ibm,drc-types */
+        drc_type = spapr_drc_get_type_str(drc->type);
+        drc_types = g_string_append(drc_types, drc_type);
+        drc_types = g_string_insert_len(drc_types, -1, "\0", 1);
+        g_free(drc_type);
+    }
+ 
+    /* now write the drc count into the space we reserved at the
+     * beginning of the arrays previously
+     */
+    *(uint32_t *)drc_indexes->data = cpu_to_be32(drc_count);
+    *(uint32_t *)drc_power_domains->data = cpu_to_be32(drc_count);
+    *(uint32_t *)drc_names->str = cpu_to_be32(drc_count);
+    *(uint32_t *)drc_types->str = cpu_to_be32(drc_count);
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes",
+                      drc_indexes->data,
+                      drc_indexes->len * sizeof(uint32_t));
+    if (ret) {
+        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
+        goto out;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains",
+                      drc_power_domains->data,
+                      drc_power_domains->len * sizeof(uint32_t));
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
+        goto out;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names",
+                      drc_names->str, drc_names->len);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
+        goto out;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types",
+                      drc_types->str, drc_types->len);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
+        goto out;
+    }
+
+out:
+    g_array_free(drc_indexes, true);
+    g_array_free(drc_power_domains, true);
+    g_string_free(drc_names, true);
+    g_string_free(drc_types, true);
+
+    return ret;
+}
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index d6cf597..5895534 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -197,9 +197,11 @@ typedef struct sPAPRDRConnectorClass {
 
 sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
                                          sPAPRDRConnectorType type,
-                                         uint32_t token);
+                                         uint32_t id);
 sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
 sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
                                            uint32_t id);
+int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
+                          uint32_t drc_type_mask);
 
 #endif /* __HW_SPAPR_DRC_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 11/16] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (9 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 12/16] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This option enables/disables PCI hotplug for a particular PHB.

Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.3.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 5 +++++
 hw/ppc/spapr_pci.c          | 2 ++
 include/hw/pci-host/spapr.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4b7be72..cdd77d9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1799,6 +1799,11 @@ static const TypeInfo spapr_machine_info = {
             .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
             .property = "mem_win_size",\
             .value    = "0x20000000",\
+        },\
+        {\
+            .driver   = "spapr-pci-host-bridge",\
+            .property = "dynamic-reconfiguration",\
+            .value    = "off",\
         }
 
 #define SPAPR_COMPAT_2_1 \
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index cebdeb3..ab67715 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -684,6 +684,8 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_UINT64("io_win_size", sPAPRPHBState, io_win_size,
                        SPAPR_PCI_IO_WIN_SIZE),
+    DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
+                     true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index d725f0e..24867b0 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -67,6 +67,7 @@ struct sPAPRPHBState {
     uint32_t index;
     uint64_t buid;
     char *dtbusname;
+    bool dr_enabled;
 
     MemoryRegion memspace, iospace;
     hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 12/16] spapr_pci: create DRConnectors for each PCI slot during PHB realize
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (10 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 11/16] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs Michael Roth
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

These will be used to support hotplug/unplug of PCI devices to the PCI
bus associated with a particular PHB.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index ab67715..7ee6f9e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -47,6 +47,8 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
+#include "hw/ppc/spapr_drc.h"
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
     sPAPRPHBState *sphb;
@@ -628,6 +630,15 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
         sphb->lsi_table[i].irq = irq;
     }
 
+    /* allocate connectors for child PCI devices */
+    if (sphb->dr_enabled) {
+        for (i = 0; i < PCI_SLOT_MAX; i++) {
+            spapr_dr_connector_new(OBJECT(phb),
+                                   SPAPR_DR_CONNECTOR_TYPE_PCI,
+                                   (sphb->index << 16) | (i << 3));
+        }
+    }
+
     if (!info->finish_realize) {
         error_setg(errp, "finish_realize not defined");
         return;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (11 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 12/16] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24  9:29   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c Michael Roth
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

Reserve 32 entries of type PCI in each PHB's initial FDT. This
advertises to guests that each PHB is DR-capable device with
physical hotpluggable slots. This is necessary for allowing
hotplugging of devices to it later via bus rescan or guest rpaphp
hotplug module.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7ee6f9e..6a3d917 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -878,7 +878,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
                           void *fdt)
 {
-    int bus_off, i, j;
+    int bus_off, i, j, ret;
     char nodename[256];
     uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
     const uint64_t mmiosize = memory_region_size(&phb->memwindow);
@@ -967,6 +967,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
     object_child_foreach(OBJECT(phb), spapr_phb_children_dt,
                          &((sPAPRTCEDT){ .fdt = fdt, .node_off = bus_off }));
 
+    ret = spapr_drc_populate_dt(fdt, bus_off, OBJECT(phb),
+                                SPAPR_DR_CONNECTOR_TYPE_PCI);
+    if (ret) {
+        return ret;
+    }
+
     return 0;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (12 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-24 10:20   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

We need to work with PCI BARs to generate OF properties
during PCI hotplug for sPAPR guests.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/pci/pci.c         | 2 +-
 include/hw/pci/pci.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d508930..f37d3e1 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -123,7 +123,7 @@ static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
 
 static QLIST_HEAD(, PCIHostState) pci_host_bridges;
 
-static int pci_bar(PCIDevice *d, int reg)
+int pci_bar(PCIDevice *d, int reg)
 {
     uint8_t type;
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 97a83d3..eba9200 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -330,6 +330,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 MemoryRegion *pci_address_space(PCIDevice *dev);
 MemoryRegion *pci_address_space_io(PCIDevice *dev);
+int pci_bar(PCIDevice *d, int reg);
 
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (13 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-25  3:11   ` David Gibson
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

This enables hotplug for PHB bridges. Upon hotplug we generate the
OF-nodes required by PAPR specification and IEEE 1275-1994
"PCI Bus Binding to Open Firmware" for the device.

We associate the corresponding FDT for these nodes with the DrcEntry
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification. The FDT is cleaned up in the case of unplug.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 371 insertions(+), 20 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6a3d917..b9af1cd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -33,8 +33,11 @@
 #include <libfdt.h>
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qapi/qmp/qerror.h"
 
 #include "hw/pci/pci_bus.h"
+#include "hw/ppc/spapr_drc.h"
+#include "sysemu/device_tree.h"
 
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN           0
@@ -47,7 +50,13 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
-#include "hw/ppc/spapr_drc.h"
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            return ret;                                            \
+        }                                                          \
+    } while (0)
 
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
@@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &phb->iommu_as;
 }
 
+/* Macros to operate with address in OF binding to PCI */
+#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
+#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
+#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
+#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
+#define b_ss(x)         b_x((x), 24, 2) /* the space code */
+#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
+#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
+#define b_fff(x)        b_x((x), 8, 3)  /* function number */
+#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
+
+/* for 'reg'/'assigned-addresses' OF properties */
+#define RESOURCE_CELLS_SIZE 2
+#define RESOURCE_CELLS_ADDRESS 3
+
+typedef struct ResourceFields {
+    uint32_t phys_hi;
+    uint32_t phys_mid;
+    uint32_t phys_lo;
+    uint32_t size_hi;
+    uint32_t size_lo;
+} ResourceFields;
+
+typedef struct ResourceProps {
+    union {
+        ResourceFields fields[8];
+        uint8_t data[sizeof(ResourceFields) * 8];
+    } reg;
+    union {
+        ResourceFields fields[7];
+        uint8_t data[sizeof(ResourceFields) * 7];
+    } assigned;
+    uint32_t reg_len;
+    uint32_t assigned_len;
+} ResourceProps;
+
+/* fill in the 'reg'/'assigned-resources' OF properties for
+ * a PCI device. 'reg' describes resource requirements for a
+ * device's IO/MEM regions, 'assigned-addresses' describes the
+ * actual resource assignments.
+ *
+ * the properties are arrays of ('phys-addr', 'size') pairs describing
+ * the addressable regions of the PCI device, where 'phys-addr' is a
+ * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
+ * (phys.hi, phys.mid, phys.lo), and 'size' is a
+ * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
+ *
+ * phys.hi = 0xYYXXXXZZ, where:
+ *   0xYY = npt000ss
+ *          |||   |
+ *          |||   +-- space code: 1 if IO region, 2 if MEM region
+ *          ||+------ for non-relocatable IO: 1 if aliased
+ *          ||        for relocatable IO: 1 if below 64KB
+ *          ||        for MEM: 1 if below 1MB
+ *          |+------- 1 if region is prefetchable
+ *          +-------- 1 if region is non-relocatable
+ *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
+ *            bits respectively
+ *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
+ *          to the region
+ *
+ * phys.mid and phys.lo correspond respectively to the hi/lo portions
+ * of the actual address of the region.
+ *
+ * how the phys-addr/size values are used differ slightly between
+ * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
+ * an additional description for the config space region of the
+ * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
+ * to describe the region as relocatable, with an address-mapping
+ * that corresponds directly to the PHB's address space for the
+ * resource. 'assigned-addresses' always has n=1 set with an absolute
+ * address assigned for the resource. in general, 'assigned-addresses'
+ * won't be populated, since addresses for PCI devices are generally
+ * unmapped initially and left to the guest to assign.
+ *
+ * in accordance with PCI Bus Binding to Open Firmware,
+ * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
+ * Appendix C.
+ */
+static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
+{
+    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
+    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
+                       b_ddddd(PCI_SLOT(d->devfn)) |
+                       b_fff(PCI_FUNC(d->devfn)));
+    ResourceFields *reg, *assigned;
+    int i, reg_idx = 0, assigned_idx = 0;
+
+    /* config space region */
+    reg = &rp->reg.fields[reg_idx++];
+    reg->phys_hi = cpu_to_be32(dev_id);
+    reg->phys_mid = 0;
+    reg->phys_lo = 0;
+    reg->size_hi = 0;
+    reg->size_lo = 0;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!d->io_regions[i].size) {
+            continue;
+        }
+
+        reg = &rp->reg.fields[reg_idx++];
+
+        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
+        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
+            reg->phys_hi |= cpu_to_be32(b_ss(1));
+        } else {
+            reg->phys_hi |= cpu_to_be32(b_ss(2));
+        }
+        reg->phys_mid = 0;
+        reg->phys_lo = 0;
+        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
+        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
+
+        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
+            continue;
+        }
+
+        assigned = &rp->assigned.fields[assigned_idx++];
+        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
+        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
+        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
+        assigned->size_hi = reg->size_hi;
+        assigned->size_lo = reg->size_lo;
+    }
+
+    rp->reg_len = reg_idx * sizeof(ResourceFields);
+    rp->assigned_len = assigned_idx * sizeof(ResourceFields);
+}
+
+static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
+                                       int phb_index, int drc_index,
+                                       const char *drc_name)
+{
+    ResourceProps rp;
+    bool is_bridge = false;
+    int pci_status;
+
+    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
+        PCI_HEADER_TYPE_BRIDGE) {
+        is_bridge = true;
+    }
+
+    /* in accordance with PAPR+ v2.7 13.6.3, Table 181 */
+    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
+                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
+                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
+                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
+    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
+                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2)
+                            << 8));
+    if (pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
+                 pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
+    }
+
+    if (!is_bridge) {
+        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
+            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
+            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
+    }
+
+    if (pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
+                 pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
+    }
+
+    if (pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
+                 pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
+        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
+
+    /* the following fdt cells are masked off the pci status register */
+    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
+    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
+                          PCI_STATUS_DEVSEL_MASK & pci_status));
+
+    if (pci_status & PCI_STATUS_FAST_BACK) {
+        _FDT(fdt_setprop(fdt, offset, "fast-back-to-back", NULL, 0));
+    }
+    if (pci_status & PCI_STATUS_66MHZ) {
+        _FDT(fdt_setprop(fdt, offset, "66mhz-capable", NULL, 0));
+    }
+    if (pci_status & PCI_STATUS_UDF) {
+        _FDT(fdt_setprop(fdt, offset, "udf-supported", NULL, 0));
+    }
+
+    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
+    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", drc_name, strlen(drc_name)));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
+                          RESOURCE_CELLS_ADDRESS));
+    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
+                          RESOURCE_CELLS_SIZE));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
+                          RESOURCE_CELLS_SIZE));
+
+    populate_resource_props(dev, &rp);
+    _FDT(fdt_setprop(fdt, offset, "reg", rp.reg.data, rp.reg_len));
+    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
+                     rp.assigned.data, rp.assigned_len));
+
+    return 0;
+}
+
+/* create OF node for pci device and required OF DT properties */
+static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
+                                       int drc_index, const char *drc_name,
+                                       int *dt_offset)
+{
+    void *fdt;
+    int offset, ret, fdt_size;
+    int slot = PCI_SLOT(dev->devfn);
+    char nodename[512];
+
+    fdt = create_device_tree(&fdt_size);
+    sprintf(nodename, "pci@%d", slot);
+    offset = fdt_add_subnode(fdt, 0, nodename);
+    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index,
+                                      drc_name);
+    g_assert(!ret);
+
+    *dt_offset = offset;
+    return fdt;
+}
+
+static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
+                                     sPAPRPHBState *phb,
+                                     PCIDevice *pdev,
+                                     Error **errp)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    DeviceState *dev = DEVICE(pdev);
+    int drc_index = drck->get_index(drc);
+    const char *drc_name = drck->get_name(drc);
+    void *fdt = NULL;
+    int fdt_start_offset = 0;
+
+    /* boot-time devices get their device tree node created by SLOF, but for
+     * hotplugged devices we need QEMU to generate it so the guest can fetch
+     * it via RTAS
+     */
+    if (dev->hotplugged) {
+        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index, drc_name,
+                                        &fdt_start_offset);
+    }
+
+    drck->attach(drc, DEVICE(pdev),
+                 fdt, fdt_start_offset, !dev->hotplugged, errp);
+    if (errp) {
+        g_free(fdt);
+    }
+}
+
+static void spapr_phb_remove_pci_device_cb(DeviceState *dev, void *opaque)
+{
+    /* some version guests do not wait for completion of a device
+     * cleanup (generally done asynchronously by the kernel) before
+     * signaling to QEMU that the device is safe, but instead sleep
+     * for some 'safe' period of time. unfortunately on a busy host
+     * this sleep isn't guaranteed to be long enough, resulting in
+     * bad things like IRQ lines being left asserted during final
+     * device removal. to deal with this we call reset just prior
+     * to finalizing the device, which will put the device back into
+     * an 'idle' state, as the device cleanup code expects.
+     */
+    pci_device_reset(PCI_DEVICE(dev));
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
+                                        sPAPRPHBState *phb,
+                                        PCIDevice *pdev,
+                                        Error **errp)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, errp);
+}
+
+static void spapr_phb_hot_plug_child(HotplugHandler *plug_handler,
+                                     DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
+        SPAPR_DR_CONNECTOR_TYPE_PCI,
+        (phb->index << 16) |
+        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
+        pdev->devfn);
+    Error *local_err = NULL;
+
+    /* if DR is disabled we don't need to do anything in the case of
+     * hotplug or coldplug callbacks
+     */
+    if (!phb->dr_enabled) {
+        /* if this is a hotplug operation initiated by the user
+         * we need to let them know it's not enabled
+         */
+        if (plugged_dev->hotplugged) {
+            error_set(errp, QERR_BUS_NO_HOTPLUG,
+                      object_get_typename(OBJECT(phb)));
+        }
+        return;
+    }
+
+    g_assert(drc);
+
+    spapr_phb_add_pci_device(drc, phb, pdev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
+                                       DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnectorClass *drck;
+    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
+        SPAPR_DR_CONNECTOR_TYPE_PCI,
+        (phb->index << 16) |
+        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
+        pdev->devfn);
+    Error *local_err = NULL;
+
+    if (!phb->dr_enabled) {
+        error_set(errp, QERR_BUS_NO_HOTPLUG,
+                  object_get_typename(OBJECT(phb)));
+        return;
+    }
+
+    g_assert(drc);
+
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    if (!drck->release_pending(drc)) {
+        spapr_phb_remove_pci_device(drc, phb, pdev, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
 {
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
@@ -574,6 +936,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
                            &sphb->memspace, &sphb->iospace,
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
+    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
 
     /*
      * Initialize PHB address space.
@@ -810,6 +1173,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
     sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
+    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
 
     hc->root_bus_path = spapr_phb_root_bus_path;
     dc->realize = spapr_phb_realize;
@@ -819,6 +1183,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
     spc->finish_realize = spapr_phb_finish_realize;
+    hp->plug = spapr_phb_hot_plug_child;
+    hp->unplug = spapr_phb_hot_unplug_child;
 }
 
 static const TypeInfo spapr_phb_info = {
@@ -827,6 +1193,10 @@ static const TypeInfo spapr_phb_info = {
     .instance_size = sizeof(sPAPRPHBState),
     .class_init    = spapr_phb_class_init,
     .class_size    = sizeof(sPAPRPHBClass),
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
@@ -840,17 +1210,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
     return PCI_HOST_BRIDGE(dev);
 }
 
-/* Macros to operate with address in OF binding to PCI */
-#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
-#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
-#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
-#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
-#define b_ss(x)         b_x((x), 24, 2) /* the space code */
-#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
-#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
-#define b_fff(x)        b_x((x), 8, 3)  /* function number */
-#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
-
 typedef struct sPAPRTCEDT {
     void *fdt;
     int node_off;
@@ -920,14 +1279,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
         return bus_off;
     }
 
-#define _FDT(exp) \
-    do { \
-        int ret = (exp);                                           \
-        if (ret < 0) {                                             \
-            return ret;                                            \
-        }                                                          \
-    } while (0)
-
     /* Write PHB properties */
     _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
     _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug
  2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
                   ` (14 preceding siblings ...)
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth
@ 2015-02-16 14:27 ` Michael Roth
  2015-02-25  3:18   ` David Gibson
  15 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont, david

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

This uses extension of existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr to handle
in-guest configuration/cleanup operations in response to
device_add/device_del.

Userspace tools that don't implement this extension will need
to be run manually in response/advance of device_add/device_del,
respectively.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index b9af1cd..7713bfb 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -810,6 +810,9 @@ static void spapr_phb_hot_plug_child(HotplugHandler *plug_handler,
         error_propagate(errp, local_err);
         return;
     }
+    if (plugged_dev->hotplugged) {
+        spapr_hotplug_req_add_event(drc);
+    }
 }
 
 static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
@@ -840,6 +843,7 @@ static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
             error_propagate(errp, local_err);
             return;
         }
+        spapr_hotplug_req_remove_event(drc);
     }
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
@ 2015-02-23  6:05   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-23  6:05 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 2986 bytes --]

On Mon, Feb 16, 2015 at 08:27:38AM -0600, Michael Roth wrote:
> This device emulates a firmware abstraction used by pSeries guests to
> manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
> memory, and CPUs. It is conceptually similar to an SHPC device,
> complete with LED indicators to identify individual slots to physical
> physical users and indicate when it is safe to remove a device. In
> some cases it is also used to manage virtualized resources, such a
> memory, CPUs, and physical-host bridges, which in the case of pSeries
> guests are virtualized resources where the physical components are
> managed by the host.
> 
> Guests communicate with these DR Connectors using RTAS calls,
> generally by addressing the unique DRC index associated with a
> particular connector for a particular resource. For introspection
> purposes we expose this state initially as QOM properties, and
> in subsequent patches will introduce the RTAS calls that make use of
> it. This constitutes to the 'guest' interface.
> 
> On the QEMU side we provide an attach/detach interface to associate
> or cleanup a DeviceState with a particular sPAPRDRConnector in
> response to hotplug/unplug, respectively. This constitutes the
> 'physical' interface to the DR Connector.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Two tiny warts:

[...]
> +static sPAPRDRConnectorTypeShift get_type_shift(sPAPRDRConnectorType type)
> +{
> +    uint32_t shift = 0;
> +
> +    g_assert(type != SPAPR_DR_CONNECTOR_TYPE_ANY);

assert(is_power_of_2(type)) maybe?  Nicer than an infinite loop if a
bad value is passed in here.

[...]
> +static sPAPRDREntitySense entity_sense(sPAPRDRConnector *drc)
> +{
> +    if (drc->dev) {
> +        if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI &&
> +            drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) {
> +            /* for logical DR, we return a state of UNUSABLE
> +             * iff the allocation state UNUSABLE.
> +             * Otherwise, report the state as USABLE/PRESENT,
> +             * as we would for PCI.
> +             */
> +            return SPAPR_DR_ENTITY_SENSE_UNUSABLE;
> +        }
> +
> +        /* this assumes all PCI devices are assigned to
> +         * a 'live insertion' power domain, where QEMU
> +         * manages power state automatically as opposed
> +         * to the guest. present, non-PCI resources are
> +         * unaffected by power state.
> +         */
> +        return SPAPR_DR_ENTITY_SENSE_PRESENT;
> +    }
> +
> +    if (drc->type == SPAPR_DR_CONNECTOR_TYPE_PCI) {
> +        /* PCI devices, and only PCI devices, use PRESENT

s/PRESENT/EMPTY/ ??

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
@ 2015-02-24  6:29   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-24  6:29 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

On Mon, Feb 16, 2015 at 08:27:39AM -0600, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> These interfaces manage the power domains that guest devices are
> assigned to and are used to power on/off devices. Currently we
> only utilize 1 power domain, the 'live-insertion' domain, which
> automates power management of plugged/unplugged devices, essentially
> making these calls no-ops, but the RTAS interfaces are still required
> by guest hotplug code and PAPR+.
> 
> See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> these interfaces.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface Michael Roth
@ 2015-02-24  6:32   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-24  6:32 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

On Mon, Feb 16, 2015 at 08:27:40AM -0600, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>
> 
> This interface allows a guest to control various platform/device
> sensors. Initially, we only implement support necessary to control
> sensors that are required for hotplug: DR connector indicators/LEDs,
> resource allocation state, and resource isolation state.
> 
> See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> this interface.
> 
> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state RTAS interface
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state " Michael Roth
@ 2015-02-24  6:33   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-24  6:33 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 854 bytes --]

On Mon, Feb 16, 2015 at 08:27:41AM -0600, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>
> 
> This interface allows a guest to read various platform/device sensors.
> initially, we only implement support necessary to support hotplug:
> reading of the dr-entity-sense sensor, which communicates the state of
> a hotplugged resource/device to the guest (EMPTY/PRESENT/UNUSABLE).
> 
> See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> this interface.
> 
> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
@ 2015-02-24  6:40   ` David Gibson
  2015-02-24 20:43     ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-24  6:40 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 5784 bytes --]

On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> This interface is used to fetch an OF device-tree nodes that describes a
> newly-attached device to guest. It is called multiple times to walk the
> device-tree node and fetch individual properties into a 'workarea'/buffer
> provided by the guest.
> 
> The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
> the initial hotplug operation, and the state of these RTAS calls is tracked by
> the sPAPRDRConnector. When the last of these properties is successfully
> fetched, we report as special return value to the guest and transition
> the device to a 'configured' state on the QEMU/DRC side.
> 
> See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> this interface.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index f80beb2..b8551b0 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -418,6 +418,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 1, entity_sense);
>  }
>  
> +/* configure-connector work area offsets, int32_t units for field
> + * indexes, bytes for field offset/len values.
> + *
> + * as documented by PAPR+ v2.7, 13.5.3.5
> + */
> +#define CC_IDX_NODE_NAME_OFFSET 2
> +#define CC_IDX_PROP_NAME_OFFSET 2
> +#define CC_IDX_PROP_LEN 3
> +#define CC_IDX_PROP_DATA_OFFSET 4
> +#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
> +#define CC_WA_LEN 4096
> +
> +static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
> +                                         sPAPREnvironment *spapr,
> +                                         uint32_t token, uint32_t nargs,
> +                                         target_ulong args, uint32_t nret,
> +                                         target_ulong rets)
> +{
> +    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);

You need to check nargs and nret first.

> +    uint64_t wa_offset;
> +    uint32_t drc_index;
> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    sPAPRDRCCResponse resp;
> +    const struct fdt_property *prop = NULL;
> +    char *prop_name = NULL;
> +    int prop_len, rc;
> +
> +    drc_index = rtas_ld(wa_addr, 0);
> +    drc = spapr_dr_connector_by_index(drc_index);
> +    if (!drc) {
> +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> +                drc_index);
> +        rc = RTAS_OUT_PARAM_ERROR;
> +        goto out;
> +    }
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);

You may have answered this last time round, but if so I forgot the
reason.

Why does the awkward iteration need to go down to the drck callback?
Coudln't the drck callback part just supply the fdt fragment blob,
then have generic code which streams it out via iteration?

Obviously we have to support the horrid PAPR interface, but it would
be nice to confine the PAPR derived horridness to as small an area as
we can.


> +    switch (resp) {
> +    case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
> +        /* provide the name of the next OF node */
> +        wa_offset = CC_VAL_DATA_OFFSET;
> +        rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> +        break;
> +    case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY:
> +        /* provide the name of the next OF property */
> +        wa_offset = CC_VAL_DATA_OFFSET;
> +        rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> +
> +        /* provide the length and value of the OF property. data gets placed
> +         * immediately after NULL terminator of the OF property's name string
> +         */
> +        wa_offset += strlen(prop_name) + 1,
> +        rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
> +        rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
> +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> +                              (uint8_t *)((struct fdt_property *)prop)->data,
> +                              prop_len);
> +        break;
> +    case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
> +    case SPAPR_DR_CC_RESPONSE_ERROR:
> +    case SPAPR_DR_CC_RESPONSE_SUCCESS:
> +        break;
> +    default:
> +        /* drck->configure_connector() should not return anything else */
> +        g_assert(false);
> +    }
> +
> +    rc = resp;
> +out:
> +    g_free(prop_name);
> +    rtas_st(rets, 0, rc);
> +}
> +
>  static struct rtas_call {
>      const char *name;
>      spapr_rtas_fn fn;
> @@ -551,6 +630,8 @@ static void core_rtas_register_types(void)
>                          rtas_set_indicator);
>      spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
>                          rtas_get_sensor_state);
> +    spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
> +                        rtas_ibm_configure_connector);
>  }
>  
>  type_init(core_rtas_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2015-02-24  6:49   ` David Gibson
  2015-02-24 20:04     ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-24  6:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 18069 bytes --]

On Mon, Feb 16, 2015 at 08:27:44AM -0600, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> This extends the data structures currently used to report EPOW events to
> guests via the check-exception RTAS interfaces to also include event types
> for hotplug/unplug events.
> 
> This is currently undocumented and being finalized for inclusion in PAPR
> specification, but we implement this here as an extension for guest
> userspace tools to implement (existing guest kernels simply log these
> events via a sysfs interface that's read by rtas_errd, and current
> versions of rtas_errd/powerpc-utils already support the use of this
> mechanism for initiating hotplug operations).
> 
> We also add support for queues of pending RTAS events, since in the
> case of hotplug there's chance for multiple events being in-flight
> at any point in time.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |   2 +-
>  hw/ppc/spapr_events.c  | 271 ++++++++++++++++++++++++++++++++++++++++---------
>  include/hw/ppc/spapr.h |   5 +-
>  3 files changed, 226 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bfacc9d..861107e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1640,7 +1640,7 @@ static void ppc_spapr_init(MachineState *machine)
>      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
>                                              kernel_size, kernel_le,
>                                              boot_device, kernel_cmdline,
> -                                            spapr->epow_irq);
> +                                            spapr->check_exception_irq);
>      assert(spapr->fdt_skel != NULL);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 283e96b..0eeb1d8 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -32,6 +32,9 @@
>  
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_vio.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci-host/spapr.h"
> +#include "hw/ppc/spapr_drc.h"
>  
>  #include <libfdt.h>
>  
> @@ -77,6 +80,7 @@ struct rtas_error_log {
>  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
>  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
>  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
>      uint32_t extended_length;
>  } QEMU_PACKED;
>  
> @@ -166,6 +170,38 @@ struct epow_log_full {
>      struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_hp {
> +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint8_t hotplug_type;
> +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +    uint8_t hotplug_action;
> +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> +    uint8_t hotplug_identifier;
> +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> +    uint8_t reserved;
> +    union {
> +        uint32_t index;
> +        uint32_t count;
> +        char name[1];
> +    } drc;
> +} QEMU_PACKED;
> +
> +struct hp_log_full {
> +    struct rtas_error_log hdr;
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_maina maina;
> +    struct rtas_event_log_v6_mainb mainb;
> +    struct rtas_event_log_v6_hp hp;
> +} QEMU_PACKED;
> +
>  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
>  #define EVENT_MASK_EPOW                      0x40000000
>  #define EVENT_MASK_HOTPLUG                   0x10000000
> @@ -181,67 +217,84 @@ struct epow_log_full {
>          }                                                          \
>      } while (0)
>  
> -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
>  {
> -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
>  
>      _FDT((fdt_begin_node(fdt, "event-sources")));
>  
>      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
>      _FDT((fdt_property(fdt, "interrupt-ranges",
> -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> +                       irq_ranges, sizeof(irq_ranges))));
>  
>      _FDT((fdt_begin_node(fdt, "epow-events")));
> -    _FDT((fdt_property(fdt, "interrupts",
> -                       epow_interrupts, sizeof(epow_interrupts))));
> +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
>      _FDT((fdt_end_node(fdt)));
>  
>      _FDT((fdt_end_node(fdt)));
>  }
>  
> -static struct epow_log_full *pending_epow;
> -static uint32_t next_plid;
> +typedef struct EventLogEntry {
> +    int log_type;
> +    void *data;
> +    QTAILQ_ENTRY(EventLogEntry) next;
> +} EventLogEntry;
>  
> -static void spapr_powerdown_req(Notifier *n, void *opaque)
> +static struct {
> +    QTAILQ_HEAD(, EventLogEntry) pending_events;
> +} rtas_event_log;

Adding new globals is never nice.  Couldn't this go into
sPAPREnvironment, or better yet the SPAPR_MACHINE structure?  (I'd
like to merge those two when we get the chance).

> +static void rtas_event_log_queue(int log_type, void *data)
>  {
> -    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> -    struct rtas_error_log *hdr;
> -    struct rtas_event_log_v6 *v6hdr;
> -    struct rtas_event_log_v6_maina *maina;
> -    struct rtas_event_log_v6_mainb *mainb;
> -    struct rtas_event_log_v6_epow *epow;
> -    struct tm tm;
> -    int year;
> +    EventLogEntry *entry = g_new(EventLogEntry, 1);
> +
> +    entry->log_type = log_type;
> +    entry->data = data;
> +    QTAILQ_INSERT_TAIL(&rtas_event_log.pending_events, entry, next);
> +}
>  
> -    if (pending_epow) {
> -        /* For now, we just throw away earlier events if two come
> -         * along before any are consumed.  This is sufficient for our
> -         * powerdown messages, but we'll need more if we do more
> -         * general error/event logging */
> -        g_free(pending_epow);
> +static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask)
> +{
> +    EventLogEntry *entry = NULL;
> +
> +    /* we only queue EPOW events atm. */
> +    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> +        return NULL;
>      }
> -    pending_epow = g_malloc0(sizeof(*pending_epow));
> -    hdr = &pending_epow->hdr;
> -    v6hdr = &pending_epow->v6hdr;
> -    maina = &pending_epow->maina;
> -    mainb = &pending_epow->mainb;
> -    epow = &pending_epow->epow;
>  
> -    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> -                               | RTAS_LOG_SEVERITY_EVENT
> -                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> -                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> -                               | RTAS_LOG_TYPE_EPOW);
> -    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> -                                       - sizeof(pending_epow->hdr));
> +    QTAILQ_FOREACH(entry, &rtas_event_log.pending_events, next) {
> +        /* EPOW and hotplug events are surfaced in the same manner */
> +        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> +            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> +            break;
> +        }
> +    }
> +
> +    if (entry) {
> +        QTAILQ_REMOVE(&rtas_event_log.pending_events, entry, next);
> +    }
>  
> +    return entry;
> +}
> +
> +static uint32_t next_plid;
> +
> +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> +{
>      v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
>          | RTAS_LOG_V6_B0_BIGENDIAN;
>      v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
>          | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
>      v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> +}
> +
> +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> +                             int section_count)
> +{
> +    struct tm tm;
> +    int year;
>  
>      maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
>      maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> @@ -256,8 +309,37 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>                                         | (to_bcd(tm.tm_min) << 16)
>                                         | (to_bcd(tm.tm_sec) << 8));
>      maina->creator_id = 'H'; /* Hypervisor */
> -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> +    maina->section_count = section_count;
>      maina->plid = next_plid++;
> +}
> +
> +static void spapr_powerdown_req(Notifier *n, void *opaque)
> +{
> +    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> +    struct rtas_error_log *hdr;
> +    struct rtas_event_log_v6 *v6hdr;
> +    struct rtas_event_log_v6_maina *maina;
> +    struct rtas_event_log_v6_mainb *mainb;
> +    struct rtas_event_log_v6_epow *epow;
> +    struct epow_log_full *pending_epow;
> +
> +    pending_epow = g_malloc0(sizeof(*pending_epow));

The name's no longer really right as a local.

> +    hdr = &pending_epow->hdr;
> +    v6hdr = &pending_epow->v6hdr;
> +    maina = &pending_epow->maina;
> +    mainb = &pending_epow->mainb;
> +    epow = &pending_epow->epow;
> +
> +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> +                               | RTAS_LOG_SEVERITY_EVENT
> +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> +                               | RTAS_LOG_TYPE_EPOW);
> +    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> +                                       - sizeof(pending_epow->hdr));
> +
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
>  
>      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
>      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> @@ -274,7 +356,78 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
>      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> +    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow);
> +
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));

Shouldn't the irq pulse go into rtas_event_log_queue()?

> +}
> +
> +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> +{
> +    struct hp_log_full *new_hp;
> +    struct rtas_error_log *hdr;
> +    struct rtas_event_log_v6 *v6hdr;
> +    struct rtas_event_log_v6_maina *maina;
> +    struct rtas_event_log_v6_mainb *mainb;
> +    struct rtas_event_log_v6_hp *hp;
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> +
> +    new_hp = g_malloc0(sizeof(struct hp_log_full));
> +    hdr = &new_hp->hdr;
> +    v6hdr = &new_hp->v6hdr;
> +    maina = &new_hp->maina;
> +    mainb = &new_hp->mainb;
> +    hp = &new_hp->hp;
> +
> +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> +                               | RTAS_LOG_SEVERITY_EVENT
> +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> +                               | RTAS_LOG_INITIATOR_HOTPLUG
> +                               | RTAS_LOG_TYPE_HOTPLUG);
> +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
> +                                       - sizeof(new_hp->hdr));
> +
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> +
> +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> +    mainb->subsystem_id = 0x80; /* External environment */
> +    mainb->event_severity = 0x00; /* Informational / non-error */
> +    mainb->event_subtype = 0x00; /* Normal shutdown */
> +
> +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> +    hp->hdr.section_version = 1; /* includes extended modifier */
> +    hp->hotplug_action = hp_action;
> +
> +
> +    switch (drc_type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> +        break;
> +    default:
> +        /* skip notification for unknown connector types */

This should be an assert, shouldn't it?  If code is added which can
fire this for non-PCI types, without the corresponding non-PCI option
here, you want to catch that.

> +        g_free(new_hp);
> +        return;
> +    }
> +
> +    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp);
> +
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> +}
> +
> +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
> +{
> +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
> +}
> +
> +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
> +{
> +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
>  }
>  
>  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> @@ -282,8 +435,9 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                              target_ulong args,
>                              uint32_t nret, target_ulong rets)
>  {
> -    uint32_t mask, buf, len;
> +    uint32_t mask, buf, len, event_len;
>      uint64_t xinfo;
> +    EventLogEntry *event;
>  
>      if ((nargs < 6) || (nargs > 7) || nret != 1) {
>          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> @@ -298,23 +452,40 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
>      }
>  
> -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> -        if (sizeof(*pending_epow) < len) {
> -            len = sizeof(*pending_epow);
> -        }
> +    event = rtas_event_log_dequeue(mask);
> +    if (!event) {
> +        goto out_no_events;
> +    }
>  
> -        cpu_physical_memory_write(buf, pending_epow, len);
> -        g_free(pending_epow);
> -        pending_epow = NULL;
> -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> -    } else {
> -        rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> +    switch (event->log_type) {
> +        case RTAS_LOG_TYPE_EPOW:
> +            event_len = sizeof(struct epow_log_full);
> +            break;
> +        case RTAS_LOG_TYPE_HOTPLUG:
> +            event_len = sizeof(struct hp_log_full);
> +            break;
> +        default:
> +            goto out_no_events;

Doesn't one of the headers include size information you could use
here, avoiding the switch?

>      }
> +
> +    if (event_len < len) {
> +        len = event_len;
> +    }
> +
> +    cpu_physical_memory_write(buf, event->data, len);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    g_free(event->data);
> +    g_free(event);
> +    return;
> +
> +out_no_events:
> +    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
>  }
>  
>  void spapr_events_init(sPAPREnvironment *spapr)
>  {
> -    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
> +    QTAILQ_INIT(&rtas_event_log.pending_events);
> +    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 3cc4490..1d27708 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -3,6 +3,7 @@
>  
>  #include "sysemu/dma.h"
>  #include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_drc.h"
>  
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> @@ -31,7 +32,7 @@ typedef struct sPAPREnvironment {
>      struct PPCTimebase tb;
>      bool has_graphics;
>  
> -    uint32_t epow_irq;
> +    uint32_t check_exception_irq;
>      Notifier epow_notifier;
>  
>      /* Migration state */
> @@ -498,6 +499,8 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                   uint32_t liobn, uint64_t window, uint32_t size);
>  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                        sPAPRTCETable *tcet);
> +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
> +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
>  
>  #define TYPE_SPAPR_RTC "spapr-rtc"
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface Michael Roth
@ 2015-02-24  9:11   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-24  9:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

On Mon, Feb 16, 2015 at 08:27:45AM -0600, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> We don't actually rely on this interface to surface hotplug events, and
> instead rely on the similar-but-interrupt-driven check-exception RTAS
> interface used for EPOW events. However, the existence of this interface
> is needed to ensure guest kernels initialize the event-reporting
> interfaces which will in turn be used by userspace tools to handle these
> events, so we implement this interface here.
> 
> Since events surfaced by this call are mutually exclusive to those
> surfaced via check-exception, we also update the RTAS event queue code
> to accept a boolean to mark/filter for events accordingly.
> 
> Events of this sort are not currently generated by QEMU, but the interface
> has been tested by surfacing hotplug events via event-scan in place
> of check-exception.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Hrm, seems a little odd to implement a queue which nothing will put
events into, but it looks correct.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt()
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
@ 2015-02-24  9:26   ` David Gibson
  2015-02-24 19:21     ` Michael Roth
  2015-02-26 15:56   ` Nathan Fontenot
  1 sibling, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-24  9:26 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 8584 bytes --]

On Mon, Feb 16, 2015 at 08:27:46AM -0600, Michael Roth wrote:
> This function handles generation of ibm,drc-* array device tree
> properties to describe DRC topology to guests. This will by used
> by the guest to direct RTAS calls to manage any dynamic resources
> we associate with a particular DR Connector as part of
> hotplug/unplug.
> 
> Since general management of boot-time device trees are handled
> outside of sPAPRDRConnector, we insert these values blindly given
> an FDT and offset. A mask of sPAPRDRConnector types is given to
> instruct us on what types of connectors entries should be generated
> for, since descriptions for different connectors may live in
> different parts of the device tree.
> 
> Based on code originally written by Nathan Fontenot.

Shouldn't he have a S-o-b then?  1st S-o-b is usually the same as
Author, but it doesn't have to be.

> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_drc.c         | 165 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |   4 +-
>  2 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8b01200..2f71696 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -569,3 +569,168 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>              (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
>              (id & DRC_INDEX_ID_MASK));
>  }
> +
> +/* generate a string the describes the DRC to encode into the
> + * device tree.
> + *
> + * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
> + */
> +static char *spapr_drc_get_type_str(sPAPRDRConnectorType type)
> +{
> +    char *type_str = NULL;
> +
> +    switch (type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +        type_str = g_strdup_printf("CPU");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> +        type_str = g_strdup_printf("PHB");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> +        type_str = g_strdup_printf("SLOT");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        type_str = g_strdup_printf("28");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        type_str = g_strdup_printf("MEM");
> +        break;
> +    default:
> +        g_assert(false);
> +    }
> +
> +    return type_str;
> +}
> +
> +/**
> + * spapr_drc_populate_dt
> + *
> + * @fdt: libfdt device tree
> + * @path: path in the DT to generate properties
> + * @owner: parent Object/DeviceState for which to generate DRC
> + *         descriptions for
> + * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
> + *   to the types of DRCs to generate entries for
> + *
> + * generate OF properties to describe DRC topology/indices to guests
> + *
> + * as documented in PAPR+ v2.1, 13.5.2
> + */
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> +                          uint32_t drc_type_mask)
> +{
> +    Object *root_container;
> +    ObjectProperty *prop;
> +    uint32_t drc_count = 0;
> +    GArray *drc_indexes, *drc_power_domains;
> +    GString *drc_names, *drc_types;
> +    int ret;
> +
> +    /* the first entry of each properties is a 32-bit integer encoding
> +     * the number of elements in the array. we won't know this until
> +     * we complete the iteration through all the matching DRCs, but
> +     * reserve the space now and set the offsets accordingly so we
> +     * can fill them in later.
> +     */
> +    drc_indexes = g_array_new(false, true, sizeof(uint32_t));
> +    drc_indexes = g_array_set_size(drc_indexes, 1);
> +    drc_power_domains = g_array_new(false, true, sizeof(uint32_t));
> +    drc_power_domains = g_array_set_size(drc_power_domains, 1);
> +    drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> +    drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> +
> +    /* aliases for all DRConnector objects will be rooted in QOM
> +     * composition tree at DRC_CONTAINER_PATH
> +     */
> +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +
> +    QTAILQ_FOREACH(prop, &root_container->properties, node) {
> +        Object *obj;
> +        sPAPRDRConnector *drc;
> +        sPAPRDRConnectorClass *drck;
> +        char *drc_type;
> +        uint32_t drc_index, drc_power_domain;
> +
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +
> +        obj = object_property_get_link(root_container, prop->name, NULL);
> +        drc = SPAPR_DR_CONNECTOR(obj);
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +        if (owner && (drc->owner != owner)) {
> +            continue;
> +        }
> +
> +        if ((drc->type & drc_type_mask) == 0) {
> +            continue;
> +        }
> +
> +        drc_count++;
> +
> +        /* ibm,drc-indexes */
> +        drc_index = cpu_to_be32(drck->get_index(drc));
> +        g_array_append_val(drc_indexes, drc_index);
> +
> +        /* ibm,drc-power-domains */
> +        drc_power_domain = cpu_to_be32(-1);
> +        g_array_append_val(drc_power_domains, drc_power_domain);
> +
> +        /* ibm,drc-names */
> +        drc_names = g_string_append(drc_names, drck->get_name(drc));
> +        drc_names = g_string_insert_len(drc_names, -1, "\0", 1);
> +
> +        /* ibm,drc-types */
> +        drc_type = spapr_drc_get_type_str(drc->type);
> +        drc_types = g_string_append(drc_types, drc_type);
> +        drc_types = g_string_insert_len(drc_types, -1, "\0", 1);
> +        g_free(drc_type);
> +    }
> + 
> +    /* now write the drc count into the space we reserved at the
> +     * beginning of the arrays previously
> +     */
> +    *(uint32_t *)drc_indexes->data = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_power_domains->data = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_names->str = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_types->str = cpu_to_be32(drc_count);
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes",
> +                      drc_indexes->data,
> +                      drc_indexes->len * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains",
> +                      drc_power_domains->data,
> +                      drc_power_domains->len * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names",
> +                      drc_names->str, drc_names->len);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types",
> +                      drc_types->str, drc_types->len);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> +        goto out;
> +    }
> +
> +out:
> +    g_array_free(drc_indexes, true);
> +    g_array_free(drc_power_domains, true);
> +    g_string_free(drc_names, true);
> +    g_string_free(drc_types, true);
> +
> +    return ret;
> +}
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index d6cf597..5895534 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -197,9 +197,11 @@ typedef struct sPAPRDRConnectorClass {
>  
>  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
>                                           sPAPRDRConnectorType type,
> -                                         uint32_t token);
> +                                         uint32_t id);

This appears to be an unrelated change.

>  sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
>  sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>                                             uint32_t id);
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> +                          uint32_t drc_type_mask);
>  
>  #endif /* __HW_SPAPR_DRC_H__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs Michael Roth
@ 2015-02-24  9:29   ` David Gibson
  2015-02-24 18:52     ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-24  9:29 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

On Mon, Feb 16, 2015 at 08:27:49AM -0600, Michael Roth wrote:
> Reserve 32 entries of type PCI in each PHB's initial FDT. This

I find this description odd.  It's not reallky reserving anything,
just publishing the DRCs you already created in the FDT.

I wonder if this should be merged with the previous patch.

> advertises to guests that each PHB is DR-capable device with
> physical hotpluggable slots. This is necessary for allowing
> hotplugging of devices to it later via bus rescan or guest rpaphp
> hotplug module.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

But, it looks correct.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c Michael Roth
@ 2015-02-24 10:20   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-24 10:20 UTC (permalink / raw)
  To: Michael Roth
  Cc: mst, aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld,
	bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 1662 bytes --]

On Mon, Feb 16, 2015 at 08:27:50AM -0600, Michael Roth wrote:
> We need to work with PCI BARs to generate OF properties
> during PCI hotplug for sPAPR guests.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Added Michael Tsirkin to the Cc list, since he usually handles the PCI
subsystem.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/pci/pci.c         | 2 +-
>  include/hw/pci/pci.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index d508930..f37d3e1 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -123,7 +123,7 @@ static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
>  
>  static QLIST_HEAD(, PCIHostState) pci_host_bridges;
>  
> -static int pci_bar(PCIDevice *d, int reg)
> +int pci_bar(PCIDevice *d, int reg)
>  {
>      uint8_t type;
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 97a83d3..eba9200 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -330,6 +330,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
>  int pci_device_load(PCIDevice *s, QEMUFile *f);
>  MemoryRegion *pci_address_space(PCIDevice *dev);
>  MemoryRegion *pci_address_space_io(PCIDevice *dev);
> +int pci_bar(PCIDevice *d, int reg);
>  
>  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs
  2015-02-24  9:29   ` David Gibson
@ 2015-02-24 18:52     ` Michael Roth
  0 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-24 18:52 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-24 03:29:40)
> On Mon, Feb 16, 2015 at 08:27:49AM -0600, Michael Roth wrote:
> > Reserve 32 entries of type PCI in each PHB's initial FDT. This
> 
> I find this description odd.  It's not reallky reserving anything,
> just publishing the DRCs you already created in the FDT.

Yah, I think this is an artifact from earlier versions where we
reserved the entries statically.

> 
> I wonder if this should be merged with the previous patch.

Makes sense. Will fix up the description and merge this into
previous patch.

> 
> > advertises to guests that each PHB is DR-capable device with
> > physical hotpluggable slots. This is necessary for allowing
> > hotplugging of devices to it later via bus rescan or guest rpaphp
> > hotplug module.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> 
> But, it looks correct.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt()
  2015-02-24  9:26   ` David Gibson
@ 2015-02-24 19:21     ` Michael Roth
  0 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-24 19:21 UTC (permalink / raw)
  To: David Gibson, nfont
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao

Quoting David Gibson (2015-02-24 03:26:42)
> On Mon, Feb 16, 2015 at 08:27:46AM -0600, Michael Roth wrote:
> > This function handles generation of ibm,drc-* array device tree
> > properties to describe DRC topology to guests. This will by used
> > by the guest to direct RTAS calls to manage any dynamic resources
> > we associate with a particular DR Connector as part of
> > hotplug/unplug.
> > 
> > Since general management of boot-time device trees are handled
> > outside of sPAPRDRConnector, we insert these values blindly given
> > an FDT and offset. A mask of sPAPRDRConnector types is given to
> > instruct us on what types of connectors entries should be generated
> > for, since descriptions for different connectors may live in
> > different parts of the device tree.
> > 
> > Based on code originally written by Nathan Fontenot.
> 
> Shouldn't he have a S-o-b then?  1st S-o-b is usually the same as
> Author, but it doesn't have to be.

Hi Nathan,

Can we get your S-o-b for this patch?

This is based on the code you wrote to reserve entries for additional
PHBs in the top-level DRC list in the device tree.

  http://lists.gnu.org/archive/html/qemu-ppc/2014-08/msg00162.html

I attributed the original code to you when we ended up dropping that patch
(to remove code that's specific to PHB hotplug), since it was the template
I had used for the PCI-specific counterparts of the DRC lists in
the device tree.

> 
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_drc.c         | 165 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h |   4 +-
> >  2 files changed, 168 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 8b01200..2f71696 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -569,3 +569,168 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> >              (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
> >              (id & DRC_INDEX_ID_MASK));
> >  }
> > +
> > +/* generate a string the describes the DRC to encode into the
> > + * device tree.
> > + *
> > + * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
> > + */
> > +static char *spapr_drc_get_type_str(sPAPRDRConnectorType type)
> > +{
> > +    char *type_str = NULL;
> > +
> > +    switch (type) {
> > +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> > +        type_str = g_strdup_printf("CPU");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> > +        type_str = g_strdup_printf("PHB");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> > +        type_str = g_strdup_printf("SLOT");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        type_str = g_strdup_printf("28");
> > +        break;
> > +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > +        type_str = g_strdup_printf("MEM");
> > +        break;
> > +    default:
> > +        g_assert(false);
> > +    }
> > +
> > +    return type_str;
> > +}
> > +
> > +/**
> > + * spapr_drc_populate_dt
> > + *
> > + * @fdt: libfdt device tree
> > + * @path: path in the DT to generate properties
> > + * @owner: parent Object/DeviceState for which to generate DRC
> > + *         descriptions for
> > + * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
> > + *   to the types of DRCs to generate entries for
> > + *
> > + * generate OF properties to describe DRC topology/indices to guests
> > + *
> > + * as documented in PAPR+ v2.1, 13.5.2
> > + */
> > +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> > +                          uint32_t drc_type_mask)
> > +{
> > +    Object *root_container;
> > +    ObjectProperty *prop;
> > +    uint32_t drc_count = 0;
> > +    GArray *drc_indexes, *drc_power_domains;
> > +    GString *drc_names, *drc_types;
> > +    int ret;
> > +
> > +    /* the first entry of each properties is a 32-bit integer encoding
> > +     * the number of elements in the array. we won't know this until
> > +     * we complete the iteration through all the matching DRCs, but
> > +     * reserve the space now and set the offsets accordingly so we
> > +     * can fill them in later.
> > +     */
> > +    drc_indexes = g_array_new(false, true, sizeof(uint32_t));
> > +    drc_indexes = g_array_set_size(drc_indexes, 1);
> > +    drc_power_domains = g_array_new(false, true, sizeof(uint32_t));
> > +    drc_power_domains = g_array_set_size(drc_power_domains, 1);
> > +    drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> > +    drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> > +
> > +    /* aliases for all DRConnector objects will be rooted in QOM
> > +     * composition tree at DRC_CONTAINER_PATH
> > +     */
> > +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > +
> > +    QTAILQ_FOREACH(prop, &root_container->properties, node) {
> > +        Object *obj;
> > +        sPAPRDRConnector *drc;
> > +        sPAPRDRConnectorClass *drck;
> > +        char *drc_type;
> > +        uint32_t drc_index, drc_power_domain;
> > +
> > +        if (!strstart(prop->type, "link<", NULL)) {
> > +            continue;
> > +        }
> > +
> > +        obj = object_property_get_link(root_container, prop->name, NULL);
> > +        drc = SPAPR_DR_CONNECTOR(obj);
> > +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +        if (owner && (drc->owner != owner)) {
> > +            continue;
> > +        }
> > +
> > +        if ((drc->type & drc_type_mask) == 0) {
> > +            continue;
> > +        }
> > +
> > +        drc_count++;
> > +
> > +        /* ibm,drc-indexes */
> > +        drc_index = cpu_to_be32(drck->get_index(drc));
> > +        g_array_append_val(drc_indexes, drc_index);
> > +
> > +        /* ibm,drc-power-domains */
> > +        drc_power_domain = cpu_to_be32(-1);
> > +        g_array_append_val(drc_power_domains, drc_power_domain);
> > +
> > +        /* ibm,drc-names */
> > +        drc_names = g_string_append(drc_names, drck->get_name(drc));
> > +        drc_names = g_string_insert_len(drc_names, -1, "\0", 1);
> > +
> > +        /* ibm,drc-types */
> > +        drc_type = spapr_drc_get_type_str(drc->type);
> > +        drc_types = g_string_append(drc_types, drc_type);
> > +        drc_types = g_string_insert_len(drc_types, -1, "\0", 1);
> > +        g_free(drc_type);
> > +    }
> > + 
> > +    /* now write the drc count into the space we reserved at the
> > +     * beginning of the arrays previously
> > +     */
> > +    *(uint32_t *)drc_indexes->data = cpu_to_be32(drc_count);
> > +    *(uint32_t *)drc_power_domains->data = cpu_to_be32(drc_count);
> > +    *(uint32_t *)drc_names->str = cpu_to_be32(drc_count);
> > +    *(uint32_t *)drc_types->str = cpu_to_be32(drc_count);
> > +
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes",
> > +                      drc_indexes->data,
> > +                      drc_indexes->len * sizeof(uint32_t));
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
> > +        goto out;
> > +    }
> > +
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains",
> > +                      drc_power_domains->data,
> > +                      drc_power_domains->len * sizeof(uint32_t));
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> > +        goto out;
> > +    }
> > +
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names",
> > +                      drc_names->str, drc_names->len);
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> > +        goto out;
> > +    }
> > +
> > +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types",
> > +                      drc_types->str, drc_types->len);
> > +    if (ret) {
> > +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    g_array_free(drc_indexes, true);
> > +    g_array_free(drc_power_domains, true);
> > +    g_string_free(drc_names, true);
> > +    g_string_free(drc_types, true);
> > +
> > +    return ret;
> > +}
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index d6cf597..5895534 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -197,9 +197,11 @@ typedef struct sPAPRDRConnectorClass {
> >  
> >  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
> >                                           sPAPRDRConnectorType type,
> > -                                         uint32_t token);
> > +                                         uint32_t id);
> 
> This appears to be an unrelated change.
> 
> >  sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
> >  sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
> >                                             uint32_t id);
> > +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> > +                          uint32_t drc_type_mask);
> >  
> >  #endif /* __HW_SPAPR_DRC_H__ */
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-02-24  6:49   ` David Gibson
@ 2015-02-24 20:04     ` Michael Roth
  2015-02-25  0:54       ` David Gibson
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-24 20:04 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-24 00:49:42)
> On Mon, Feb 16, 2015 at 08:27:44AM -0600, Michael Roth wrote:
> > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > 
> > This extends the data structures currently used to report EPOW events to
> > guests via the check-exception RTAS interfaces to also include event types
> > for hotplug/unplug events.
> > 
> > This is currently undocumented and being finalized for inclusion in PAPR
> > specification, but we implement this here as an extension for guest
> > userspace tools to implement (existing guest kernels simply log these
> > events via a sysfs interface that's read by rtas_errd, and current
> > versions of rtas_errd/powerpc-utils already support the use of this
> > mechanism for initiating hotplug operations).
> > 
> > We also add support for queues of pending RTAS events, since in the
> > case of hotplug there's chance for multiple events being in-flight
> > at any point in time.
> > 
> > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |   2 +-
> >  hw/ppc/spapr_events.c  | 271 ++++++++++++++++++++++++++++++++++++++++---------
> >  include/hw/ppc/spapr.h |   5 +-
> >  3 files changed, 226 insertions(+), 52 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index bfacc9d..861107e 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1640,7 +1640,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> >                                              kernel_size, kernel_le,
> >                                              boot_device, kernel_cmdline,
> > -                                            spapr->epow_irq);
> > +                                            spapr->check_exception_irq);
> >      assert(spapr->fdt_skel != NULL);
> >  }
> >  
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 283e96b..0eeb1d8 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -32,6 +32,9 @@
> >  
> >  #include "hw/ppc/spapr.h"
> >  #include "hw/ppc/spapr_vio.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci-host/spapr.h"
> > +#include "hw/ppc/spapr_drc.h"
> >  
> >  #include <libfdt.h>
> >  
> > @@ -77,6 +80,7 @@ struct rtas_error_log {
> >  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
> >  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
> >  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> > +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
> >      uint32_t extended_length;
> >  } QEMU_PACKED;
> >  
> > @@ -166,6 +170,38 @@ struct epow_log_full {
> >      struct rtas_event_log_v6_epow epow;
> >  } QEMU_PACKED;
> >  
> > +struct rtas_event_log_v6_hp {
> > +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> > +    struct rtas_event_log_v6_section_header hdr;
> > +    uint8_t hotplug_type;
> > +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> > +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> > +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> > +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> > +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> > +    uint8_t hotplug_action;
> > +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> > +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> > +    uint8_t hotplug_identifier;
> > +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> > +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> > +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> > +    uint8_t reserved;
> > +    union {
> > +        uint32_t index;
> > +        uint32_t count;
> > +        char name[1];
> > +    } drc;
> > +} QEMU_PACKED;
> > +
> > +struct hp_log_full {
> > +    struct rtas_error_log hdr;
> > +    struct rtas_event_log_v6 v6hdr;
> > +    struct rtas_event_log_v6_maina maina;
> > +    struct rtas_event_log_v6_mainb mainb;
> > +    struct rtas_event_log_v6_hp hp;
> > +} QEMU_PACKED;
> > +
> >  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> >  #define EVENT_MASK_EPOW                      0x40000000
> >  #define EVENT_MASK_HOTPLUG                   0x10000000
> > @@ -181,67 +217,84 @@ struct epow_log_full {
> >          }                                                          \
> >      } while (0)
> >  
> > -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> > +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> >  {
> > -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> > -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> > +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> >  
> >      _FDT((fdt_begin_node(fdt, "event-sources")));
> >  
> >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> > +                       irq_ranges, sizeof(irq_ranges))));
> >  
> >      _FDT((fdt_begin_node(fdt, "epow-events")));
> > -    _FDT((fdt_property(fdt, "interrupts",
> > -                       epow_interrupts, sizeof(epow_interrupts))));
> > +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> >      _FDT((fdt_end_node(fdt)));
> >  
> >      _FDT((fdt_end_node(fdt)));
> >  }
> >  
> > -static struct epow_log_full *pending_epow;
> > -static uint32_t next_plid;
> > +typedef struct EventLogEntry {
> > +    int log_type;
> > +    void *data;
> > +    QTAILQ_ENTRY(EventLogEntry) next;
> > +} EventLogEntry;
> >  
> > -static void spapr_powerdown_req(Notifier *n, void *opaque)
> > +static struct {
> > +    QTAILQ_HEAD(, EventLogEntry) pending_events;
> > +} rtas_event_log;
> 
> Adding new globals is never nice.  Couldn't this go into
> sPAPREnvironment, or better yet the SPAPR_MACHINE structure?  (I'd
> like to merge those two when we get the chance).

Makes sense, will roll into sPAPREnvironment initially

> 
> > +static void rtas_event_log_queue(int log_type, void *data)
> >  {
> > -    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> > -    struct rtas_error_log *hdr;
> > -    struct rtas_event_log_v6 *v6hdr;
> > -    struct rtas_event_log_v6_maina *maina;
> > -    struct rtas_event_log_v6_mainb *mainb;
> > -    struct rtas_event_log_v6_epow *epow;
> > -    struct tm tm;
> > -    int year;
> > +    EventLogEntry *entry = g_new(EventLogEntry, 1);
> > +
> > +    entry->log_type = log_type;
> > +    entry->data = data;
> > +    QTAILQ_INSERT_TAIL(&rtas_event_log.pending_events, entry, next);
> > +}
> >  
> > -    if (pending_epow) {
> > -        /* For now, we just throw away earlier events if two come
> > -         * along before any are consumed.  This is sufficient for our
> > -         * powerdown messages, but we'll need more if we do more
> > -         * general error/event logging */
> > -        g_free(pending_epow);
> > +static EventLogEntry *rtas_event_log_dequeue(uint32_t event_mask)
> > +{
> > +    EventLogEntry *entry = NULL;
> > +
> > +    /* we only queue EPOW events atm. */
> > +    if ((event_mask & EVENT_MASK_EPOW) == 0) {
> > +        return NULL;
> >      }
> > -    pending_epow = g_malloc0(sizeof(*pending_epow));
> > -    hdr = &pending_epow->hdr;
> > -    v6hdr = &pending_epow->v6hdr;
> > -    maina = &pending_epow->maina;
> > -    mainb = &pending_epow->mainb;
> > -    epow = &pending_epow->epow;
> >  
> > -    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > -                               | RTAS_LOG_SEVERITY_EVENT
> > -                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > -                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > -                               | RTAS_LOG_TYPE_EPOW);
> > -    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> > -                                       - sizeof(pending_epow->hdr));
> > +    QTAILQ_FOREACH(entry, &rtas_event_log.pending_events, next) {
> > +        /* EPOW and hotplug events are surfaced in the same manner */
> > +        if (entry->log_type == RTAS_LOG_TYPE_EPOW ||
> > +            entry->log_type == RTAS_LOG_TYPE_HOTPLUG) {
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (entry) {
> > +        QTAILQ_REMOVE(&rtas_event_log.pending_events, entry, next);
> > +    }
> >  
> > +    return entry;
> > +}
> > +
> > +static uint32_t next_plid;
> > +
> > +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> > +{
> >      v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> >          | RTAS_LOG_V6_B0_BIGENDIAN;
> >      v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> >          | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> >      v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> > +}
> > +
> > +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> > +                             int section_count)
> > +{
> > +    struct tm tm;
> > +    int year;
> >  
> >      maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> >      maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> > @@ -256,8 +309,37 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >                                         | (to_bcd(tm.tm_min) << 16)
> >                                         | (to_bcd(tm.tm_sec) << 8));
> >      maina->creator_id = 'H'; /* Hypervisor */
> > -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> > +    maina->section_count = section_count;
> >      maina->plid = next_plid++;
> > +}
> > +
> > +static void spapr_powerdown_req(Notifier *n, void *opaque)
> > +{
> > +    sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> > +    struct rtas_error_log *hdr;
> > +    struct rtas_event_log_v6 *v6hdr;
> > +    struct rtas_event_log_v6_maina *maina;
> > +    struct rtas_event_log_v6_mainb *mainb;
> > +    struct rtas_event_log_v6_epow *epow;
> > +    struct epow_log_full *pending_epow;
> > +
> > +    pending_epow = g_malloc0(sizeof(*pending_epow));
> 
> The name's no longer really right as a local.
> 
> > +    hdr = &pending_epow->hdr;
> > +    v6hdr = &pending_epow->v6hdr;
> > +    maina = &pending_epow->maina;
> > +    mainb = &pending_epow->mainb;
> > +    epow = &pending_epow->epow;
> > +
> > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > +                               | RTAS_LOG_SEVERITY_EVENT
> > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > +                               | RTAS_LOG_TYPE_EPOW);
> > +    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> > +                                       - sizeof(pending_epow->hdr));
> > +
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
> >  
> >      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> >      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > @@ -274,7 +356,78 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
> >      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> > +    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow);
> > +
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> 
> Shouldn't the irq pulse go into rtas_event_log_queue()?

It seems maybe it should...

"If a platform chooses to report multiple event or error sources through a
 single interrupt, it must ensure that the interrupt remains asserted
 or is re-asserted until check-exception has been used to process all
 out-standing errors or events for that interrupt."

But another alternative would be to have rtas-check-exception re-pulse if there
are still events present for that particular IRQ. I think I'd prefer this
approach, since it makes it easier to share the queueing functionality with
rtas-event-scan (which relies on polling instead of interrupts)

> 
> > +}
> > +
> > +static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
> > +{
> > +    struct hp_log_full *new_hp;
> > +    struct rtas_error_log *hdr;
> > +    struct rtas_event_log_v6 *v6hdr;
> > +    struct rtas_event_log_v6_maina *maina;
> > +    struct rtas_event_log_v6_mainb *mainb;
> > +    struct rtas_event_log_v6_hp *hp;
> > +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    sPAPRDRConnectorType drc_type = drck->get_type(drc);
> > +
> > +    new_hp = g_malloc0(sizeof(struct hp_log_full));
> > +    hdr = &new_hp->hdr;
> > +    v6hdr = &new_hp->v6hdr;
> > +    maina = &new_hp->maina;
> > +    mainb = &new_hp->mainb;
> > +    hp = &new_hp->hp;
> > +
> > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > +                               | RTAS_LOG_SEVERITY_EVENT
> > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > +                               | RTAS_LOG_INITIATOR_HOTPLUG
> > +                               | RTAS_LOG_TYPE_HOTPLUG);
> > +    hdr->extended_length = cpu_to_be32(sizeof(*new_hp)
> > +                                       - sizeof(new_hp->hdr));
> > +
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> > +
> > +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> > +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > +    mainb->subsystem_id = 0x80; /* External environment */
> > +    mainb->event_severity = 0x00; /* Informational / non-error */
> > +    mainb->event_subtype = 0x00; /* Normal shutdown */
> > +
> > +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> > +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> > +    hp->hdr.section_version = 1; /* includes extended modifier */
> > +    hp->hotplug_action = hp_action;
> > +
> > +
> > +    switch (drc_type) {
> > +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > +        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> > +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> > +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
> > +        break;
> > +    default:
> > +        /* skip notification for unknown connector types */
> 
> This should be an assert, shouldn't it?  If code is added which can
> fire this for non-PCI types, without the corresponding non-PCI option
> here, you want to catch that.

Agreed, if hotplug events are supported for that resource type, we should be
calling this.

> 
> > +        g_free(new_hp);
> > +        return;
> > +    }
> > +
> > +    rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp);
> > +
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > +}
> > +
> > +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc)
> > +{
> > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_ADD);
> > +}
> > +
> > +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc)
> > +{
> > +    spapr_hotplug_req_event(drc, RTAS_LOG_V6_HP_ACTION_REMOVE);
> >  }
> >  
> >  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > @@ -282,8 +435,9 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >                              target_ulong args,
> >                              uint32_t nret, target_ulong rets)
> >  {
> > -    uint32_t mask, buf, len;
> > +    uint32_t mask, buf, len, event_len;
> >      uint64_t xinfo;
> > +    EventLogEntry *event;
> >  
> >      if ((nargs < 6) || (nargs > 7) || nret != 1) {
> >          rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > @@ -298,23 +452,40 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
> >      }
> >  
> > -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> > -        if (sizeof(*pending_epow) < len) {
> > -            len = sizeof(*pending_epow);
> > -        }
> > +    event = rtas_event_log_dequeue(mask);
> > +    if (!event) {
> > +        goto out_no_events;
> > +    }
> >  
> > -        cpu_physical_memory_write(buf, pending_epow, len);
> > -        g_free(pending_epow);
> > -        pending_epow = NULL;
> > -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > -    } else {
> > -        rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> > +    switch (event->log_type) {
> > +        case RTAS_LOG_TYPE_EPOW:
> > +            event_len = sizeof(struct epow_log_full);
> > +            break;
> > +        case RTAS_LOG_TYPE_HOTPLUG:
> > +            event_len = sizeof(struct hp_log_full);
> > +            break;
> > +        default:
> > +            goto out_no_events;
> 
> Doesn't one of the headers include size information you could use
> here, avoiding the switch?

Looks like hdr->extended_length + sizeof(hdr) should do the trick. Nice!

> 
> >      }
> > +
> > +    if (event_len < len) {
> > +        len = event_len;
> > +    }
> > +
> > +    cpu_physical_memory_write(buf, event->data, len);
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    g_free(event->data);
> > +    g_free(event);
> > +    return;
> > +
> > +out_no_events:
> > +    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> >  }
> >  
> >  void spapr_events_init(sPAPREnvironment *spapr)
> >  {
> > -    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
> > +    QTAILQ_INIT(&rtas_event_log.pending_events);
> > +    spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
> >      spapr->epow_notifier.notify = spapr_powerdown_req;
> >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> >      spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 3cc4490..1d27708 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -3,6 +3,7 @@
> >  
> >  #include "sysemu/dma.h"
> >  #include "hw/ppc/xics.h"
> > +#include "hw/ppc/spapr_drc.h"
> >  
> >  struct VIOsPAPRBus;
> >  struct sPAPRPHBState;
> > @@ -31,7 +32,7 @@ typedef struct sPAPREnvironment {
> >      struct PPCTimebase tb;
> >      bool has_graphics;
> >  
> > -    uint32_t epow_irq;
> > +    uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >  
> >      /* Migration state */
> > @@ -498,6 +499,8 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
> >                   uint32_t liobn, uint64_t window, uint32_t size);
> >  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
> >                        sPAPRTCETable *tcet);
> > +void spapr_hotplug_req_add_event(sPAPRDRConnector *drc);
> > +void spapr_hotplug_req_remove_event(sPAPRDRConnector *drc);
> >  
> >  #define TYPE_SPAPR_RTC "spapr-rtc"
> >  
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-24  6:40   ` David Gibson
@ 2015-02-24 20:43     ` Michael Roth
  2015-02-25  0:48       ` David Gibson
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-24 20:43 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-24 00:40:32)
> On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> > This interface is used to fetch an OF device-tree nodes that describes a
> > newly-attached device to guest. It is called multiple times to walk the
> > device-tree node and fetch individual properties into a 'workarea'/buffer
> > provided by the guest.
> > 
> > The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
> > the initial hotplug operation, and the state of these RTAS calls is tracked by
> > the sPAPRDRConnector. When the last of these properties is successfully
> > fetched, we report as special return value to the guest and transition
> > the device to a 'configured' state on the QEMU/DRC side.
> > 
> > See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> > this interface.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 81 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index f80beb2..b8551b0 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -418,6 +418,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >      rtas_st(rets, 1, entity_sense);
> >  }
> >  
> > +/* configure-connector work area offsets, int32_t units for field
> > + * indexes, bytes for field offset/len values.
> > + *
> > + * as documented by PAPR+ v2.7, 13.5.3.5
> > + */
> > +#define CC_IDX_NODE_NAME_OFFSET 2
> > +#define CC_IDX_PROP_NAME_OFFSET 2
> > +#define CC_IDX_PROP_LEN 3
> > +#define CC_IDX_PROP_DATA_OFFSET 4
> > +#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
> > +#define CC_WA_LEN 4096
> > +
> > +static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
> > +                                         sPAPREnvironment *spapr,
> > +                                         uint32_t token, uint32_t nargs,
> > +                                         target_ulong args, uint32_t nret,
> > +                                         target_ulong rets)
> > +{
> > +    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
> 
> You need to check nargs and nret first.

Could've sworn I'd fixed all those cases. Will make sure to take care of it on
the next one.

> 
> > +    uint64_t wa_offset;
> > +    uint32_t drc_index;
> > +    sPAPRDRConnector *drc;
> > +    sPAPRDRConnectorClass *drck;
> > +    sPAPRDRCCResponse resp;
> > +    const struct fdt_property *prop = NULL;
> > +    char *prop_name = NULL;
> > +    int prop_len, rc;
> > +
> > +    drc_index = rtas_ld(wa_addr, 0);
> > +    drc = spapr_dr_connector_by_index(drc_index);
> > +    if (!drc) {
> > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > +                drc_index);
> > +        rc = RTAS_OUT_PARAM_ERROR;
> > +        goto out;
> > +    }
> > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> 
> You may have answered this last time round, but if so I forgot the
> reason.
> 
> Why does the awkward iteration need to go down to the drck callback?
> Coudln't the drck callback part just supply the fdt fragment blob,
> then have generic code which streams it out via iteration?
> 
> Obviously we have to support the horrid PAPR interface, but it would
> be nice to confine the PAPR derived horridness to as small an area as
> we can.

That horrid interface percolates all the way up the QEMU stack,
unfortunately :)

Upon successfully having it's device tree node received, a DRC transitions
to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).

We need to track that state, since it's used to differentiate between
a case where a device is set to 'isolated' as part of entity-sense/device
configuration, as opposed to 'isolated' as part the unplug path. The
overlap between the 2 can occur if we do device_add followed by an immediate
device_del, but since the 'configured' transition must occur before the latter,
it becomes unambiguous.

It's also possible that a guest might be reset in the middle of a series of
calls to configure-connector, in which case that state needs to be reset. This
is currently handled by sPAPRDRConnector's reset hook, so if we moved that
aspect out I think we'd need to wire up a reset hook for the configure-connector
state, which is kinda messy. We'd also need a list of some sort, keyed by
the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
only one device configuration is 'in-flight' at a time), so we end up
duplicating a lot of tracking/functionality.

> 
> 
> > +    switch (resp) {
> > +    case SPAPR_DR_CC_RESPONSE_NEXT_CHILD:
> > +        /* provide the name of the next OF node */
> > +        wa_offset = CC_VAL_DATA_OFFSET;
> > +        rtas_st(wa_addr, CC_IDX_NODE_NAME_OFFSET, wa_offset);
> > +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> > +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> > +        break;
> > +    case SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY:
> > +        /* provide the name of the next OF property */
> > +        wa_offset = CC_VAL_DATA_OFFSET;
> > +        rtas_st(wa_addr, CC_IDX_PROP_NAME_OFFSET, wa_offset);
> > +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> > +                              (uint8_t *)prop_name, strlen(prop_name) + 1);
> > +
> > +        /* provide the length and value of the OF property. data gets placed
> > +         * immediately after NULL terminator of the OF property's name string
> > +         */
> > +        wa_offset += strlen(prop_name) + 1,
> > +        rtas_st(wa_addr, CC_IDX_PROP_LEN, prop_len);
> > +        rtas_st(wa_addr, CC_IDX_PROP_DATA_OFFSET, wa_offset);
> > +        rtas_st_buffer_direct(wa_addr + wa_offset, CC_WA_LEN - wa_offset,
> > +                              (uint8_t *)((struct fdt_property *)prop)->data,
> > +                              prop_len);
> > +        break;
> > +    case SPAPR_DR_CC_RESPONSE_PREV_PARENT:
> > +    case SPAPR_DR_CC_RESPONSE_ERROR:
> > +    case SPAPR_DR_CC_RESPONSE_SUCCESS:
> > +        break;
> > +    default:
> > +        /* drck->configure_connector() should not return anything else */
> > +        g_assert(false);
> > +    }
> > +
> > +    rc = resp;
> > +out:
> > +    g_free(prop_name);
> > +    rtas_st(rets, 0, rc);
> > +}
> > +
> >  static struct rtas_call {
> >      const char *name;
> >      spapr_rtas_fn fn;
> > @@ -551,6 +630,8 @@ static void core_rtas_register_types(void)
> >                          rtas_set_indicator);
> >      spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
> >                          rtas_get_sensor_state);
> > +    spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, "ibm,configure-connector",
> > +                        rtas_ibm_configure_connector);
> >  }
> >  
> >  type_init(core_rtas_register_types)
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-24 20:43     ` Michael Roth
@ 2015-02-25  0:48       ` David Gibson
  2015-02-26 22:21         ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-25  0:48 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 5659 bytes --]

On Tue, Feb 24, 2015 at 02:43:45PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-02-24 00:40:32)
> > On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> > > This interface is used to fetch an OF device-tree nodes that describes a
> > > newly-attached device to guest. It is called multiple times to walk the
> > > device-tree node and fetch individual properties into a 'workarea'/buffer
> > > provided by the guest.
> > > 
> > > The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
> > > the initial hotplug operation, and the state of these RTAS calls is tracked by
> > > the sPAPRDRConnector. When the last of these properties is successfully
> > > fetched, we report as special return value to the guest and transition
> > > the device to a 'configured' state on the QEMU/DRC side.
> > > 
> > > See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> > > this interface.
> > > 
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 81 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index f80beb2..b8551b0 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -418,6 +418,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > >      rtas_st(rets, 1, entity_sense);
> > >  }
> > >  
> > > +/* configure-connector work area offsets, int32_t units for field
> > > + * indexes, bytes for field offset/len values.
> > > + *
> > > + * as documented by PAPR+ v2.7, 13.5.3.5
> > > + */
> > > +#define CC_IDX_NODE_NAME_OFFSET 2
> > > +#define CC_IDX_PROP_NAME_OFFSET 2
> > > +#define CC_IDX_PROP_LEN 3
> > > +#define CC_IDX_PROP_DATA_OFFSET 4
> > > +#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
> > > +#define CC_WA_LEN 4096
> > > +
> > > +static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
> > > +                                         sPAPREnvironment *spapr,
> > > +                                         uint32_t token, uint32_t nargs,
> > > +                                         target_ulong args, uint32_t nret,
> > > +                                         target_ulong rets)
> > > +{
> > > +    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
> > 
> > You need to check nargs and nret first.
> 
> Could've sworn I'd fixed all those cases. Will make sure to take care of it on
> the next one.
> 
> > 
> > > +    uint64_t wa_offset;
> > > +    uint32_t drc_index;
> > > +    sPAPRDRConnector *drc;
> > > +    sPAPRDRConnectorClass *drck;
> > > +    sPAPRDRCCResponse resp;
> > > +    const struct fdt_property *prop = NULL;
> > > +    char *prop_name = NULL;
> > > +    int prop_len, rc;
> > > +
> > > +    drc_index = rtas_ld(wa_addr, 0);
> > > +    drc = spapr_dr_connector_by_index(drc_index);
> > > +    if (!drc) {
> > > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > > +                drc_index);
> > > +        rc = RTAS_OUT_PARAM_ERROR;
> > > +        goto out;
> > > +    }
> > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> > 
> > You may have answered this last time round, but if so I forgot the
> > reason.
> > 
> > Why does the awkward iteration need to go down to the drck callback?
> > Coudln't the drck callback part just supply the fdt fragment blob,
> > then have generic code which streams it out via iteration?
> > 
> > Obviously we have to support the horrid PAPR interface, but it would
> > be nice to confine the PAPR derived horridness to as small an area as
> > we can.
> 
> That horrid interface percolates all the way up the QEMU stack,
> unfortunately :)
> 
> Upon successfully having it's device tree node received, a DRC transitions
> to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).
> 
> We need to track that state, since it's used to differentiate between
> a case where a device is set to 'isolated' as part of entity-sense/device
> configuration, as opposed to 'isolated' as part the unplug path. The
> overlap between the 2 can occur if we do device_add followed by an immediate
> device_del, but since the 'configured' transition must occur before the latter,
> it becomes unambiguous.
> 
> It's also possible that a guest might be reset in the middle of a series of
> calls to configure-connector, in which case that state needs to be reset. This
> is currently handled by sPAPRDRConnector's reset hook, so if we moved that
> aspect out I think we'd need to wire up a reset hook for the configure-connector
> state, which is kinda messy. We'd also need a list of some sort, keyed by
> the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
> only one device configuration is 'in-flight' at a time), so we end up
> duplicating a lot of tracking/functionality.

Hmm.  You should still be able to handle that with just 2 hooks and 1
bit of state right?  Say "start_configuration" and "end_configuration"
or similar.  start_configuration gets the blob from the backend, then
end_configuration is called once RTAS has finished streaming it out to
the guest and updates to the cnofigured state.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events
  2015-02-24 20:04     ` Michael Roth
@ 2015-02-25  0:54       ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-25  0:54 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 3988 bytes --]

On Tue, Feb 24, 2015 at 02:04:29PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-02-24 00:49:42)
> > On Mon, Feb 16, 2015 at 08:27:44AM -0600, Michael Roth wrote:
> > > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[snip]
> > > +    hdr = &pending_epow->hdr;
> > > +    v6hdr = &pending_epow->v6hdr;
> > > +    maina = &pending_epow->maina;
> > > +    mainb = &pending_epow->mainb;
> > > +    epow = &pending_epow->epow;
> > > +
> > > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > > +                               | RTAS_LOG_SEVERITY_EVENT
> > > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > > +                               | RTAS_LOG_TYPE_EPOW);
> > > +    hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> > > +                                       - sizeof(pending_epow->hdr));
> > > +
> > > +    spapr_init_v6hdr(v6hdr);
> > > +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
> > >  
> > >      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> > >      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > > @@ -274,7 +356,78 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> > >      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
> > >      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
> > >  
> > > -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> > > +    rtas_event_log_queue(RTAS_LOG_TYPE_EPOW, pending_epow);
> > > +
> > > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > 
> > Shouldn't the irq pulse go into rtas_event_log_queue()?
> 
> It seems maybe it should...
> 
> "If a platform chooses to report multiple event or error sources through a
>  single interrupt, it must ensure that the interrupt remains asserted
>  or is re-asserted until check-exception has been used to process all
>  out-standing errors or events for that interrupt."
> 
> But another alternative would be to have rtas-check-exception re-pulse if there
> are still events present for that particular IRQ. I think I'd prefer this
> approach, since it makes it easier to share the queueing functionality with
> rtas-event-scan (which relies on polling instead of interrupts)

Actually I wasn't thinking that deeply, just looking at the fact that
both callerss to rtas_event_log_queue() pulse the irq immediately
afterwards.

Hrm.  Now I'm a bit confused - the use of qemu_irq_pulse() suggests an
edge/message interrupt, but the reference to "remains asserted"
suggests level interrupt semantics.  I no longer remember enough of
PAPR to know why I used qemu_irq_pulse() in the first place.

[snip]
> > > -        cpu_physical_memory_write(buf, pending_epow, len);
> > > -        g_free(pending_epow);
> > > -        pending_epow = NULL;
> > > -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > > -    } else {
> > > -        rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> > > +    switch (event->log_type) {
> > > +        case RTAS_LOG_TYPE_EPOW:
> > > +            event_len = sizeof(struct epow_log_full);
> > > +            break;
> > > +        case RTAS_LOG_TYPE_HOTPLUG:
> > > +            event_len = sizeof(struct hp_log_full);
> > > +            break;
> > > +        default:
> > > +            goto out_no_events;
> > 
> > Doesn't one of the headers include size information you could use
> > here, avoiding the switch?
> 
> Looks like hdr->extended_length + sizeof(hdr) should do the
> trick. Nice!

Excellent. A switch on the same selector at multiple levels of the
same callstack always seems to be a warning sign of layering
violations.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth
@ 2015-02-25  3:11   ` David Gibson
  2015-02-25  5:17     ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-25  3:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 21058 bytes --]

On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> This enables hotplug for PHB bridges. Upon hotplug we generate the

"PCI Host Bridge bridges" :-p

> OF-nodes required by PAPR specification and IEEE 1275-1994
> "PCI Bus Binding to Open Firmware" for the device.
> 
> We associate the corresponding FDT for these nodes with the DrcEntry
> corresponding to the slot, which will be fetched via
> ibm,configure-connector RTAS calls by the guest as described by PAPR
> specification. The FDT is cleaned up in the case of unplug.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 371 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 6a3d917..b9af1cd 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -33,8 +33,11 @@
>  #include <libfdt.h>
>  #include "trace.h"
>  #include "qemu/error-report.h"
> +#include "qapi/qmp/qerror.h"
>  
>  #include "hw/pci/pci_bus.h"
> +#include "hw/ppc/spapr_drc.h"
> +#include "sysemu/device_tree.h"
>  
>  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
>  #define RTAS_QUERY_FN           0
> @@ -47,7 +50,13 @@
>  #define RTAS_TYPE_MSI           1
>  #define RTAS_TYPE_MSIX          2
>  
> -#include "hw/ppc/spapr_drc.h"
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            return ret;                                            \
> +        }                                                          \
> +    } while (0)
>  
>  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
>  {
> @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &phb->iommu_as;
>  }
>  
> +/* Macros to operate with address in OF binding to PCI */
> +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> +/* for 'reg'/'assigned-addresses' OF properties */
> +#define RESOURCE_CELLS_SIZE 2
> +#define RESOURCE_CELLS_ADDRESS 3
> +
> +typedef struct ResourceFields {
> +    uint32_t phys_hi;
> +    uint32_t phys_mid;
> +    uint32_t phys_lo;
> +    uint32_t size_hi;
> +    uint32_t size_lo;
> +} ResourceFields;

Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
platforms to be safe.

> +typedef struct ResourceProps {
> +    union {
> +        ResourceFields fields[8];
> +        uint8_t data[sizeof(ResourceFields) * 8];
> +    } reg;
> +    union {
> +        ResourceFields fields[7];
> +        uint8_t data[sizeof(ResourceFields) * 7];
> +    } assigned;
> +    uint32_t reg_len;
> +    uint32_t assigned_len;
> +} ResourceProps;

I'm a bit dubious about this union, rather than just casting when you
need the bytestring version.

> +
> +/* fill in the 'reg'/'assigned-resources' OF properties for
> + * a PCI device. 'reg' describes resource requirements for a
> + * device's IO/MEM regions, 'assigned-addresses' describes the
> + * actual resource assignments.
> + *
> + * the properties are arrays of ('phys-addr', 'size') pairs describing
> + * the addressable regions of the PCI device, where 'phys-addr' is a
> + * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
> + * (phys.hi, phys.mid, phys.lo), and 'size' is a
> + * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
> + * phys.hi = 0xYYXXXXZZ, where:
> + *   0xYY = npt000ss
> + *          |||   |
> + *          |||   +-- space code: 1 if IO region, 2 if MEM region
> + *          ||+------ for non-relocatable IO: 1 if aliased
> + *          ||        for relocatable IO: 1 if below 64KB
> + *          ||        for MEM: 1 if below 1MB
> + *          |+------- 1 if region is prefetchable
> + *          +-------- 1 if region is non-relocatable
> + *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
> + *            bits respectively
> + *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
> + *          to the region
> + *
> + * phys.mid and phys.lo correspond respectively to the hi/lo portions
> + * of the actual address of the region.
> + *
> + * how the phys-addr/size values are used differ slightly between
> + * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
> + * an additional description for the config space region of the
> + * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
> + * to describe the region as relocatable, with an address-mapping
> + * that corresponds directly to the PHB's address space for the
> + * resource. 'assigned-addresses' always has n=1 set with an absolute
> + * address assigned for the resource. in general, 'assigned-addresses'
> + * won't be populated, since addresses for PCI devices are generally
> + * unmapped initially and left to the guest to assign.
> + *
> + * in accordance with PCI Bus Binding to Open Firmware,
> + * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
> + * Appendix C.
> + */
> +static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
> +{
> +    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
> +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> +                       b_ddddd(PCI_SLOT(d->devfn)) |
> +                       b_fff(PCI_FUNC(d->devfn)));
> +    ResourceFields *reg, *assigned;
> +    int i, reg_idx = 0, assigned_idx = 0;
> +
> +    /* config space region */
> +    reg = &rp->reg.fields[reg_idx++];
> +    reg->phys_hi = cpu_to_be32(dev_id);
> +    reg->phys_mid = 0;
> +    reg->phys_lo = 0;
> +    reg->size_hi = 0;
> +    reg->size_lo = 0;
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        if (!d->io_regions[i].size) {
> +            continue;
> +        }
> +
> +        reg = &rp->reg.fields[reg_idx++];
> +
> +        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> +            reg->phys_hi |= cpu_to_be32(b_ss(1));
> +        } else {
> +            reg->phys_hi |= cpu_to_be32(b_ss(2));
> +        }
> +        reg->phys_mid = 0;
> +        reg->phys_lo = 0;
> +        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
> +        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
> +
> +        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
> +            continue;

This has inverted sense, doesn't it?  If the BAR *is* unmapped you
want to continue, skipping the assigned-addresses stuff.

> +        }
> +
> +        assigned = &rp->assigned.fields[assigned_idx++];
> +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);

Not sure if I understood the comment above properly; should these
addresses actually be translated through an mappings above the PHB?
Not that there are any such translations on sPAPR, but...

> +        assigned->size_hi = reg->size_hi;
> +        assigned->size_lo = reg->size_lo;
> +    }
> +
> +    rp->reg_len = reg_idx * sizeof(ResourceFields);
> +    rp->assigned_len = assigned_idx * sizeof(ResourceFields);
> +}
> +
> +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> +                                       int phb_index, int drc_index,
> +                                       const char *drc_name)
> +{
> +    ResourceProps rp;
> +    bool is_bridge = false;
> +    int pci_status;
> +
> +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> +        PCI_HEADER_TYPE_BRIDGE) {
> +        is_bridge = true;
> +    }
> +
> +    /* in accordance with PAPR+ v2.7 13.6.3, Table 181 */
> +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2)
> +                            << 8));
> +    if (pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> +                 pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> +    }
> +
> +    if (!is_bridge) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> +    }
> +
> +    if (pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> +                 pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> +    }
> +
> +    if (pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)) {
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> +                 pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> +
> +    /* the following fdt cells are masked off the pci status register */
> +    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> +
> +    if (pci_status & PCI_STATUS_FAST_BACK) {
> +        _FDT(fdt_setprop(fdt, offset, "fast-back-to-back", NULL, 0));
> +    }
> +    if (pci_status & PCI_STATUS_66MHZ) {
> +        _FDT(fdt_setprop(fdt, offset, "66mhz-capable", NULL, 0));
> +    }
> +    if (pci_status & PCI_STATUS_UDF) {
> +        _FDT(fdt_setprop(fdt, offset, "udf-supported", NULL, 0));
> +    }
> +
> +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));

Ah.. this is a bit weird.  Explicit "name" properties are usually
omitted from fdts.  But I guess you need it, because this will get
streamed into the guest via the RTAS interfaces, rather than into the
guest's fdt parsing code (which knowws to derive name properties from
the unit name).

That probably deserves a comment.

> +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", drc_name, strlen(drc_name)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> +                          RESOURCE_CELLS_ADDRESS));
> +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> +                          RESOURCE_CELLS_SIZE));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> +                          RESOURCE_CELLS_SIZE));
> +
> +    populate_resource_props(dev, &rp);
> +    _FDT(fdt_setprop(fdt, offset, "reg", rp.reg.data, rp.reg_len));
> +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> +                     rp.assigned.data, rp.assigned_len));
> +
> +    return 0;
> +}
> +
> +/* create OF node for pci device and required OF DT properties */
> +static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
> +                                       int drc_index, const char *drc_name,
> +                                       int *dt_offset)
> +{
> +    void *fdt;
> +    int offset, ret, fdt_size;
> +    int slot = PCI_SLOT(dev->devfn);
> +    char nodename[512];
> +
> +    fdt = create_device_tree(&fdt_size);
> +    sprintf(nodename, "pci@%d", slot);
> +    offset = fdt_add_subnode(fdt, 0, nodename);
> +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index,
> +                                      drc_name);
> +    g_assert(!ret);
> +
> +    *dt_offset = offset;
> +    return fdt;
> +}
> +
> +static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
> +                                     sPAPRPHBState *phb,
> +                                     PCIDevice *pdev,
> +                                     Error **errp)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    DeviceState *dev = DEVICE(pdev);
> +    int drc_index = drck->get_index(drc);
> +    const char *drc_name = drck->get_name(drc);
> +    void *fdt = NULL;
> +    int fdt_start_offset = 0;
> +
> +    /* boot-time devices get their device tree node created by SLOF, but for
> +     * hotplugged devices we need QEMU to generate it so the guest can fetch
> +     * it via RTAS
> +     */
> +    if (dev->hotplugged) {
> +        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index, drc_name,
> +                                        &fdt_start_offset);
> +    }
> +
> +    drck->attach(drc, DEVICE(pdev),
> +                 fdt, fdt_start_offset, !dev->hotplugged, errp);
> +    if (errp) {
> +        g_free(fdt);
> +    }
> +}
> +
> +static void spapr_phb_remove_pci_device_cb(DeviceState *dev, void *opaque)
> +{
> +    /* some version guests do not wait for completion of a device
> +     * cleanup (generally done asynchronously by the kernel) before
> +     * signaling to QEMU that the device is safe, but instead sleep
> +     * for some 'safe' period of time. unfortunately on a busy host
> +     * this sleep isn't guaranteed to be long enough, resulting in
> +     * bad things like IRQ lines being left asserted during final
> +     * device removal. to deal with this we call reset just prior
> +     * to finalizing the device, which will put the device back into
> +     * an 'idle' state, as the device cleanup code expects.
> +     */
> +    pci_device_reset(PCI_DEVICE(dev));
> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
> +                                        sPAPRPHBState *phb,
> +                                        PCIDevice *pdev,
> +                                        Error **errp)
> +{
> +    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, errp);
> +}
> +
> +static void spapr_phb_hot_plug_child(HotplugHandler *plug_handler,
> +                                     DeviceState *plugged_dev, Error **errp)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
> +        SPAPR_DR_CONNECTOR_TYPE_PCI,
> +        (phb->index << 16) |
> +        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
> +        pdev->devfn);

Helper function for that calculation might be clearer.

> +    Error *local_err = NULL;
> +
> +    /* if DR is disabled we don't need to do anything in the case of
> +     * hotplug or coldplug callbacks
> +     */
> +    if (!phb->dr_enabled) {
> +        /* if this is a hotplug operation initiated by the user
> +         * we need to let them know it's not enabled
> +         */
> +        if (plugged_dev->hotplugged) {
> +            error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                      object_get_typename(OBJECT(phb)));
> +        }
> +        return;
> +    }
> +
> +    g_assert(drc);
> +
> +    spapr_phb_add_pci_device(drc, phb, pdev, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
> +                                       DeviceState *plugged_dev, Error **errp)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
> +    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
> +    sPAPRDRConnectorClass *drck;
> +    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
> +        SPAPR_DR_CONNECTOR_TYPE_PCI,
> +        (phb->index << 16) |
> +        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
> +        pdev->devfn);
> +    Error *local_err = NULL;
> +
> +    if (!phb->dr_enabled) {
> +        error_set(errp, QERR_BUS_NO_HOTPLUG,
> +                  object_get_typename(OBJECT(phb)));
> +        return;
> +    }
> +
> +    g_assert(drc);
> +
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    if (!drck->release_pending(drc)) {
> +        spapr_phb_remove_pci_device(drc, phb, pdev, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +}
> +
>  static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
>      SysBusDevice *s = SYS_BUS_DEVICE(dev);
> @@ -574,6 +936,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>                             &sphb->memspace, &sphb->iospace,
>                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>      phb->bus = bus;
> +    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
>  
>      /*
>       * Initialize PHB address space.
> @@ -810,6 +1173,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
> +    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
>  
>      hc->root_bus_path = spapr_phb_root_bus_path;
>      dc->realize = spapr_phb_realize;
> @@ -819,6 +1183,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->cannot_instantiate_with_device_add_yet = false;
>      spc->finish_realize = spapr_phb_finish_realize;
> +    hp->plug = spapr_phb_hot_plug_child;
> +    hp->unplug = spapr_phb_hot_unplug_child;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> @@ -827,6 +1193,10 @@ static const TypeInfo spapr_phb_info = {
>      .instance_size = sizeof(sPAPRPHBState),
>      .class_init    = spapr_phb_class_init,
>      .class_size    = sizeof(sPAPRPHBClass),
> +    .interfaces    = (InterfaceInfo[]) {
> +        { TYPE_HOTPLUG_HANDLER },
> +        { }
> +    }
>  };
>  
>  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> @@ -840,17 +1210,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
>      return PCI_HOST_BRIDGE(dev);
>  }
>  
> -/* Macros to operate with address in OF binding to PCI */
> -#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> -#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> -#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> -#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> -#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> -#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> -#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> -#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> -#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> -
>  typedef struct sPAPRTCEDT {
>      void *fdt;
>      int node_off;
> @@ -920,14 +1279,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>          return bus_off;
>      }
>  
> -#define _FDT(exp) \
> -    do { \
> -        int ret = (exp);                                           \
> -        if (ret < 0) {                                             \
> -            return ret;                                            \
> -        }                                                          \
> -    } while (0)
> -
>      /* Write PHB properties */
>      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
>      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
@ 2015-02-25  3:18   ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-25  3:18 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Mon, Feb 16, 2015 at 08:27:52AM -0600, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This uses extension of existing EPOW interrupt/event mechanism
> to notify userspace tools like librtas/drmgr to handle
> in-guest configuration/cleanup operations in response to
> device_add/device_del.
> 
> Userspace tools that don't implement this extension will need
> to be run manually in response/advance of device_add/device_del,
> respectively.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>


> ---
>  hw/ppc/spapr_pci.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index b9af1cd..7713bfb 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -810,6 +810,9 @@ static void spapr_phb_hot_plug_child(HotplugHandler *plug_handler,
>          error_propagate(errp, local_err);
>          return;
>      }
> +    if (plugged_dev->hotplugged) {
> +        spapr_hotplug_req_add_event(drc);
> +    }
>  }
>  
>  static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
> @@ -840,6 +843,7 @@ static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
>              error_propagate(errp, local_err);
>              return;
>          }
> +        spapr_hotplug_req_remove_event(drc);
>      }
>  }
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-25  3:11   ` David Gibson
@ 2015-02-25  5:17     ` Michael Roth
  2015-02-25  6:40       ` Michael Roth
  2015-02-26  3:49       ` David Gibson
  0 siblings, 2 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-25  5:17 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-24 21:11:29)
> On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> > This enables hotplug for PHB bridges. Upon hotplug we generate the
> 
> "PCI Host Bridge bridges" :-p
> 
> > OF-nodes required by PAPR specification and IEEE 1275-1994
> > "PCI Bus Binding to Open Firmware" for the device.
> > 
> > We associate the corresponding FDT for these nodes with the DrcEntry
> > corresponding to the slot, which will be fetched via
> > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > specification. The FDT is cleaned up in the case of unplug.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 371 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 6a3d917..b9af1cd 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -33,8 +33,11 @@
> >  #include <libfdt.h>
> >  #include "trace.h"
> >  #include "qemu/error-report.h"
> > +#include "qapi/qmp/qerror.h"
> >  
> >  #include "hw/pci/pci_bus.h"
> > +#include "hw/ppc/spapr_drc.h"
> > +#include "sysemu/device_tree.h"
> >  
> >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> >  #define RTAS_QUERY_FN           0
> > @@ -47,7 +50,13 @@
> >  #define RTAS_TYPE_MSI           1
> >  #define RTAS_TYPE_MSIX          2
> >  
> > -#include "hw/ppc/spapr_drc.h"
> > +#define _FDT(exp) \
> > +    do { \
> > +        int ret = (exp);                                           \
> > +        if (ret < 0) {                                             \
> > +            return ret;                                            \
> > +        }                                                          \
> > +    } while (0)
> >  
> >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> >  {
> > @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >      return &phb->iommu_as;
> >  }
> >  
> > +/* Macros to operate with address in OF binding to PCI */
> > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > +/* for 'reg'/'assigned-addresses' OF properties */
> > +#define RESOURCE_CELLS_SIZE 2
> > +#define RESOURCE_CELLS_ADDRESS 3
> > +
> > +typedef struct ResourceFields {
> > +    uint32_t phys_hi;
> > +    uint32_t phys_mid;
> > +    uint32_t phys_lo;
> > +    uint32_t size_hi;
> > +    uint32_t size_lo;
> > +} ResourceFields;
> 
> Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
> platforms to be safe.

I seem to remember some rule (C99?) about padding only being used in cases
where an n-byte field doesn't naturally fall on an n-byte boundary, but
it doesn't hurt to be safe/explicit about what we're expecting.

> 
> > +typedef struct ResourceProps {
> > +    union {
> > +        ResourceFields fields[8];
> > +        uint8_t data[sizeof(ResourceFields) * 8];
> > +    } reg;
> > +    union {
> > +        ResourceFields fields[7];
> > +        uint8_t data[sizeof(ResourceFields) * 7];
> > +    } assigned;
> > +    uint32_t reg_len;
> > +    uint32_t assigned_len;
> > +} ResourceProps;
> 
> I'm a bit dubious about this union, rather than just casting when you
> need the bytestring version.

Yah, I may have gotten a bit carried away there...

> 
> > +
> > +/* fill in the 'reg'/'assigned-resources' OF properties for
> > + * a PCI device. 'reg' describes resource requirements for a
> > + * device's IO/MEM regions, 'assigned-addresses' describes the
> > + * actual resource assignments.
> > + *
> > + * the properties are arrays of ('phys-addr', 'size') pairs describing
> > + * the addressable regions of the PCI device, where 'phys-addr' is a
> > + * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
> > + * (phys.hi, phys.mid, phys.lo), and 'size' is a
> > + * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
> > + * phys.hi = 0xYYXXXXZZ, where:
> > + *   0xYY = npt000ss
> > + *          |||   |
> > + *          |||   +-- space code: 1 if IO region, 2 if MEM region
> > + *          ||+------ for non-relocatable IO: 1 if aliased
> > + *          ||        for relocatable IO: 1 if below 64KB
> > + *          ||        for MEM: 1 if below 1MB
> > + *          |+------- 1 if region is prefetchable
> > + *          +-------- 1 if region is non-relocatable
> > + *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
> > + *            bits respectively
> > + *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
> > + *          to the region
> > + *
> > + * phys.mid and phys.lo correspond respectively to the hi/lo portions
> > + * of the actual address of the region.
> > + *
> > + * how the phys-addr/size values are used differ slightly between
> > + * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
> > + * an additional description for the config space region of the
> > + * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
> > + * to describe the region as relocatable, with an address-mapping
> > + * that corresponds directly to the PHB's address space for the
> > + * resource. 'assigned-addresses' always has n=1 set with an absolute
> > + * address assigned for the resource. in general, 'assigned-addresses'
> > + * won't be populated, since addresses for PCI devices are generally
> > + * unmapped initially and left to the guest to assign.
> > + *
> > + * in accordance with PCI Bus Binding to Open Firmware,
> > + * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
> > + * Appendix C.
> > + */
> > +static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
> > +{
> > +    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
> > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > +                       b_fff(PCI_FUNC(d->devfn)));
> > +    ResourceFields *reg, *assigned;
> > +    int i, reg_idx = 0, assigned_idx = 0;
> > +
> > +    /* config space region */
> > +    reg = &rp->reg.fields[reg_idx++];
> > +    reg->phys_hi = cpu_to_be32(dev_id);
> > +    reg->phys_mid = 0;
> > +    reg->phys_lo = 0;
> > +    reg->size_hi = 0;
> > +    reg->size_lo = 0;
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        if (!d->io_regions[i].size) {
> > +            continue;
> > +        }
> > +
> > +        reg = &rp->reg.fields[reg_idx++];
> > +
> > +        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > +            reg->phys_hi |= cpu_to_be32(b_ss(1));
> > +        } else {
> > +            reg->phys_hi |= cpu_to_be32(b_ss(2));
> > +        }
> > +        reg->phys_mid = 0;
> > +        reg->phys_lo = 0;
> > +        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
> > +        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
> > +
> > +        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
> > +            continue;
> 
> This has inverted sense, doesn't it?  If the BAR *is* unmapped you
> want to continue, skipping the assigned-addresses stuff.

Yes, should be inverted :( It ends up not mattering much with the
current userspace code, since we rely on PCI rescan which always
re-assigns BARs, but definitely needs fixing.

> 
> > +        }
> > +
> > +        assigned = &rp->assigned.fields[assigned_idx++];
> > +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> > +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> > +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
> 
> Not sure if I understood the comment above properly; should these
> addresses actually be translated through an mappings above the PHB?
> Not that there are any such translations on sPAPR, but...

The io_regions[i].addr values are relative to the IO/MEM address spaces for
the device, which correspond to IO/MEM windows for the PHB. So I think the
memory API should handle any translation above that...

Or do you mean because of the n=0/relocatable IO regions described by 'reg'?

IIUC, when n=0, reg->phys_{mid,lo} can be used to encode a starting offset,
so a guest can re-assign/re-program an IO region that's already been assigned
and described via the correspond fields in 'assigned-addresses', so long as
uses an addr above reg->phys{mid,lo}.

So, we could use those reg->phys_{mid,lo} values as an alternative to the
PHBs IO/MEM windows (I guess for platforms that don't provide these windows
and just expose the full address space?).

But since we have those windows, we end up not needing this, so we always do
n=0, reg->phys_mid = reg->phys_lo = 0, so the values in
assigned->phys_{mid,lo} end up just being offsets into the IO/MEM windows,
which correspond directly to io_regions[n].addr.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-25  5:17     ` Michael Roth
@ 2015-02-25  6:40       ` Michael Roth
  2015-02-26 20:06         ` Michael Roth
  2015-02-26  3:49       ` David Gibson
  1 sibling, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-25  6:40 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting Michael Roth (2015-02-24 23:17:24)
> Quoting David Gibson (2015-02-24 21:11:29)
> > On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> > > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > 
> > "PCI Host Bridge bridges" :-p
> > 
> > > OF-nodes required by PAPR specification and IEEE 1275-1994
> > > "PCI Bus Binding to Open Firmware" for the device.
> > > 
> > > We associate the corresponding FDT for these nodes with the DrcEntry
> > > corresponding to the slot, which will be fetched via
> > > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > > specification. The FDT is cleaned up in the case of unplug.
> > > 
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 371 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 6a3d917..b9af1cd 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -33,8 +33,11 @@
> > >  #include <libfdt.h>
> > >  #include "trace.h"
> > >  #include "qemu/error-report.h"
> > > +#include "qapi/qmp/qerror.h"
> > >  
> > >  #include "hw/pci/pci_bus.h"
> > > +#include "hw/ppc/spapr_drc.h"
> > > +#include "sysemu/device_tree.h"
> > >  
> > >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> > >  #define RTAS_QUERY_FN           0
> > > @@ -47,7 +50,13 @@
> > >  #define RTAS_TYPE_MSI           1
> > >  #define RTAS_TYPE_MSIX          2
> > >  
> > > -#include "hw/ppc/spapr_drc.h"
> > > +#define _FDT(exp) \
> > > +    do { \
> > > +        int ret = (exp);                                           \
> > > +        if (ret < 0) {                                             \
> > > +            return ret;                                            \
> > > +        }                                                          \
> > > +    } while (0)
> > >  
> > >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> > >  {
> > > @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> > >      return &phb->iommu_as;
> > >  }
> > >  
> > > +/* Macros to operate with address in OF binding to PCI */
> > > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > > +/* for 'reg'/'assigned-addresses' OF properties */
> > > +#define RESOURCE_CELLS_SIZE 2
> > > +#define RESOURCE_CELLS_ADDRESS 3
> > > +
> > > +typedef struct ResourceFields {
> > > +    uint32_t phys_hi;
> > > +    uint32_t phys_mid;
> > > +    uint32_t phys_lo;
> > > +    uint32_t size_hi;
> > > +    uint32_t size_lo;
> > > +} ResourceFields;
> > 
> > Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
> > platforms to be safe.
> 
> I seem to remember some rule (C99?) about padding only being used in cases
> where an n-byte field doesn't naturally fall on an n-byte boundary, but
> it doesn't hurt to be safe/explicit about what we're expecting.
> 
> > 
> > > +typedef struct ResourceProps {
> > > +    union {
> > > +        ResourceFields fields[8];
> > > +        uint8_t data[sizeof(ResourceFields) * 8];
> > > +    } reg;
> > > +    union {
> > > +        ResourceFields fields[7];
> > > +        uint8_t data[sizeof(ResourceFields) * 7];
> > > +    } assigned;
> > > +    uint32_t reg_len;
> > > +    uint32_t assigned_len;
> > > +} ResourceProps;
> > 
> > I'm a bit dubious about this union, rather than just casting when you
> > need the bytestring version.
> 
> Yah, I may have gotten a bit carried away there...
> 
> > 
> > > +
> > > +/* fill in the 'reg'/'assigned-resources' OF properties for
> > > + * a PCI device. 'reg' describes resource requirements for a
> > > + * device's IO/MEM regions, 'assigned-addresses' describes the
> > > + * actual resource assignments.
> > > + *
> > > + * the properties are arrays of ('phys-addr', 'size') pairs describing
> > > + * the addressable regions of the PCI device, where 'phys-addr' is a
> > > + * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
> > > + * (phys.hi, phys.mid, phys.lo), and 'size' is a
> > > + * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
> > > + * phys.hi = 0xYYXXXXZZ, where:
> > > + *   0xYY = npt000ss
> > > + *          |||   |
> > > + *          |||   +-- space code: 1 if IO region, 2 if MEM region
> > > + *          ||+------ for non-relocatable IO: 1 if aliased
> > > + *          ||        for relocatable IO: 1 if below 64KB
> > > + *          ||        for MEM: 1 if below 1MB
> > > + *          |+------- 1 if region is prefetchable
> > > + *          +-------- 1 if region is non-relocatable
> > > + *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
> > > + *            bits respectively
> > > + *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
> > > + *          to the region
> > > + *
> > > + * phys.mid and phys.lo correspond respectively to the hi/lo portions
> > > + * of the actual address of the region.
> > > + *
> > > + * how the phys-addr/size values are used differ slightly between
> > > + * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
> > > + * an additional description for the config space region of the
> > > + * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
> > > + * to describe the region as relocatable, with an address-mapping
> > > + * that corresponds directly to the PHB's address space for the
> > > + * resource. 'assigned-addresses' always has n=1 set with an absolute
> > > + * address assigned for the resource. in general, 'assigned-addresses'
> > > + * won't be populated, since addresses for PCI devices are generally
> > > + * unmapped initially and left to the guest to assign.
> > > + *
> > > + * in accordance with PCI Bus Binding to Open Firmware,
> > > + * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
> > > + * Appendix C.
> > > + */
> > > +static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
> > > +{
> > > +    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
> > > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > > +                       b_fff(PCI_FUNC(d->devfn)));
> > > +    ResourceFields *reg, *assigned;
> > > +    int i, reg_idx = 0, assigned_idx = 0;
> > > +
> > > +    /* config space region */
> > > +    reg = &rp->reg.fields[reg_idx++];
> > > +    reg->phys_hi = cpu_to_be32(dev_id);
> > > +    reg->phys_mid = 0;
> > > +    reg->phys_lo = 0;
> > > +    reg->size_hi = 0;
> > > +    reg->size_lo = 0;
> > > +
> > > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > +        if (!d->io_regions[i].size) {
> > > +            continue;
> > > +        }
> > > +
> > > +        reg = &rp->reg.fields[reg_idx++];
> > > +
> > > +        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > > +            reg->phys_hi |= cpu_to_be32(b_ss(1));
> > > +        } else {
> > > +            reg->phys_hi |= cpu_to_be32(b_ss(2));
> > > +        }
> > > +        reg->phys_mid = 0;
> > > +        reg->phys_lo = 0;
> > > +        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
> > > +        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
> > > +
> > > +        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
> > > +            continue;
> > 
> > This has inverted sense, doesn't it?  If the BAR *is* unmapped you
> > want to continue, skipping the assigned-addresses stuff.
> 
> Yes, should be inverted :( It ends up not mattering much with the
> current userspace code, since we rely on PCI rescan which always
> re-assigns BARs, but definitely needs fixing.
> 
> > 
> > > +        }
> > > +
> > > +        assigned = &rp->assigned.fields[assigned_idx++];
> > > +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> > > +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> > > +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
> > 
> > Not sure if I understood the comment above properly; should these
> > addresses actually be translated through an mappings above the PHB?
> > Not that there are any such translations on sPAPR, but...
> 
> The io_regions[i].addr values are relative to the IO/MEM address spaces for
> the device, which correspond to IO/MEM windows for the PHB. So I think the
> memory API should handle any translation above that...
> 
> Or do you mean because of the n=0/relocatable IO regions described by 'reg'?
> 
> IIUC, when n=0, reg->phys_{mid,lo} can be used to encode a starting offset,
> so a guest can re-assign/re-program an IO region that's already been assigned
> and described via the correspond fields in 'assigned-addresses', so long as
> uses an addr above reg->phys{mid,lo}.
> 
> So, we could use those reg->phys_{mid,lo} values as an alternative to the
> PHBs IO/MEM windows (I guess for platforms that don't provide these windows
> and just expose the full address space?).
> 
> But since we have those windows, we end up not needing this, so we always do
> n=0, reg->phys_mid = reg->phys_lo = 0, so the values in
> assigned->phys_{mid,lo} end up just being offsets into the IO/MEM windows,
> which correspond directly to io_regions[n].addr.

Argh, not sure what I was thinking. io_regions[n].addr is relative to the
IO/MEM window, so from the guest perspective .addr is indeed a different
value...

So maybe reg->phys_mid/reg->phys_lo do in fact need to reflect the window
offsets so that the relocatable region assignments are offset
accordingly...

Will look into it more.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-25  5:17     ` Michael Roth
  2015-02-25  6:40       ` Michael Roth
@ 2015-02-26  3:49       ` David Gibson
  1 sibling, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-26  3:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 4232 bytes --]

On Tue, Feb 24, 2015 at 11:17:24PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-02-24 21:11:29)
> > On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> > > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > 
> > "PCI Host Bridge bridges" :-p
> > 
> > > OF-nodes required by PAPR specification and IEEE 1275-1994
> > > "PCI Bus Binding to Open Firmware" for the device.
> > > 
> > > We associate the corresponding FDT for these nodes with the DrcEntry
> > > corresponding to the slot, which will be fetched via
> > > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > > specification. The FDT is cleaned up in the case of unplug.
> > > 
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 371 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 6a3d917..b9af1cd 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -33,8 +33,11 @@
> > >  #include <libfdt.h>
> > >  #include "trace.h"
> > >  #include "qemu/error-report.h"
> > > +#include "qapi/qmp/qerror.h"
> > >  
> > >  #include "hw/pci/pci_bus.h"
> > > +#include "hw/ppc/spapr_drc.h"
> > > +#include "sysemu/device_tree.h"
> > >  
> > >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> > >  #define RTAS_QUERY_FN           0
> > > @@ -47,7 +50,13 @@
> > >  #define RTAS_TYPE_MSI           1
> > >  #define RTAS_TYPE_MSIX          2
> > >  
> > > -#include "hw/ppc/spapr_drc.h"
> > > +#define _FDT(exp) \
> > > +    do { \
> > > +        int ret = (exp);                                           \
> > > +        if (ret < 0) {                                             \
> > > +            return ret;                                            \
> > > +        }                                                          \
> > > +    } while (0)
> > >  
> > >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> > >  {
> > > @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> > >      return &phb->iommu_as;
> > >  }
> > >  
> > > +/* Macros to operate with address in OF binding to PCI */
> > > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > > +/* for 'reg'/'assigned-addresses' OF properties */
> > > +#define RESOURCE_CELLS_SIZE 2
> > > +#define RESOURCE_CELLS_ADDRESS 3
> > > +
> > > +typedef struct ResourceFields {
> > > +    uint32_t phys_hi;
> > > +    uint32_t phys_mid;
> > > +    uint32_t phys_lo;
> > > +    uint32_t size_hi;
> > > +    uint32_t size_lo;
> > > +} ResourceFields;
> > 
> > Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
> > platforms to be safe.
> 
> I seem to remember some rule (C99?) about padding only being used in cases
> where an n-byte field doesn't naturally fall on an n-byte boundary, but
> it doesn't hurt to be safe/explicit about what we're expecting.

So, the potential problem is end-of-structure padding rather than
between-field padding.  I know some platforms (ARMs I think) at least
used to round structures out to 8 bytes, even if all the members were
less than that.  Mind you at the time, I believe ((packed)) didn't
actually fix it, causing headaches with nested structures.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt()
  2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
  2015-02-24  9:26   ` David Gibson
@ 2015-02-26 15:56   ` Nathan Fontenot
  1 sibling, 0 replies; 46+ messages in thread
From: Nathan Fontenot @ 2015-02-26 15:56 UTC (permalink / raw)
  To: Michael Roth, qemu-devel
  Cc: aik, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, david

On 02/16/2015 08:27 AM, Michael Roth wrote:
> This function handles generation of ibm,drc-* array device tree
> properties to describe DRC topology to guests. This will by used
> by the guest to direct RTAS calls to manage any dynamic resources
> we associate with a particular DR Connector as part of
> hotplug/unplug.
> 
> Since general management of boot-time device trees are handled
> outside of sPAPRDRConnector, we insert these values blindly given
> an FDT and offset. A mask of sPAPRDRConnector types is given to
> instruct us on what types of connectors entries should be generated
> for, since descriptions for different connectors may live in
> different parts of the device tree.
> 
> Based on code originally written by Nathan Fontenot.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
>  hw/ppc/spapr_drc.c         | 165 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |   4 +-
>  2 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8b01200..2f71696 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -569,3 +569,168 @@ sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>              (get_type_shift(type) << DRC_INDEX_TYPE_SHIFT) |
>              (id & DRC_INDEX_ID_MASK));
>  }
> +
> +/* generate a string the describes the DRC to encode into the
> + * device tree.
> + *
> + * as documented by PAPR+ v2.7, 13.5.2.6 and C.6.1
> + */
> +static char *spapr_drc_get_type_str(sPAPRDRConnectorType type)
> +{
> +    char *type_str = NULL;
> +
> +    switch (type) {
> +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +        type_str = g_strdup_printf("CPU");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PHB:
> +        type_str = g_strdup_printf("PHB");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_VIO:
> +        type_str = g_strdup_printf("SLOT");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +        type_str = g_strdup_printf("28");
> +        break;
> +    case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +        type_str = g_strdup_printf("MEM");
> +        break;
> +    default:
> +        g_assert(false);
> +    }
> +
> +    return type_str;
> +}
> +
> +/**
> + * spapr_drc_populate_dt
> + *
> + * @fdt: libfdt device tree
> + * @path: path in the DT to generate properties
> + * @owner: parent Object/DeviceState for which to generate DRC
> + *         descriptions for
> + * @drc_type_mask: mask of sPAPRDRConnectorType values corresponding
> + *   to the types of DRCs to generate entries for
> + *
> + * generate OF properties to describe DRC topology/indices to guests
> + *
> + * as documented in PAPR+ v2.1, 13.5.2
> + */
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> +                          uint32_t drc_type_mask)
> +{
> +    Object *root_container;
> +    ObjectProperty *prop;
> +    uint32_t drc_count = 0;
> +    GArray *drc_indexes, *drc_power_domains;
> +    GString *drc_names, *drc_types;
> +    int ret;
> +
> +    /* the first entry of each properties is a 32-bit integer encoding
> +     * the number of elements in the array. we won't know this until
> +     * we complete the iteration through all the matching DRCs, but
> +     * reserve the space now and set the offsets accordingly so we
> +     * can fill them in later.
> +     */
> +    drc_indexes = g_array_new(false, true, sizeof(uint32_t));
> +    drc_indexes = g_array_set_size(drc_indexes, 1);
> +    drc_power_domains = g_array_new(false, true, sizeof(uint32_t));
> +    drc_power_domains = g_array_set_size(drc_power_domains, 1);
> +    drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> +    drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> +
> +    /* aliases for all DRConnector objects will be rooted in QOM
> +     * composition tree at DRC_CONTAINER_PATH
> +     */
> +    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +
> +    QTAILQ_FOREACH(prop, &root_container->properties, node) {
> +        Object *obj;
> +        sPAPRDRConnector *drc;
> +        sPAPRDRConnectorClass *drck;
> +        char *drc_type;
> +        uint32_t drc_index, drc_power_domain;
> +
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +
> +        obj = object_property_get_link(root_container, prop->name, NULL);
> +        drc = SPAPR_DR_CONNECTOR(obj);
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +        if (owner && (drc->owner != owner)) {
> +            continue;
> +        }
> +
> +        if ((drc->type & drc_type_mask) == 0) {
> +            continue;
> +        }
> +
> +        drc_count++;
> +
> +        /* ibm,drc-indexes */
> +        drc_index = cpu_to_be32(drck->get_index(drc));
> +        g_array_append_val(drc_indexes, drc_index);
> +
> +        /* ibm,drc-power-domains */
> +        drc_power_domain = cpu_to_be32(-1);
> +        g_array_append_val(drc_power_domains, drc_power_domain);
> +
> +        /* ibm,drc-names */
> +        drc_names = g_string_append(drc_names, drck->get_name(drc));
> +        drc_names = g_string_insert_len(drc_names, -1, "\0", 1);
> +
> +        /* ibm,drc-types */
> +        drc_type = spapr_drc_get_type_str(drc->type);
> +        drc_types = g_string_append(drc_types, drc_type);
> +        drc_types = g_string_insert_len(drc_types, -1, "\0", 1);
> +        g_free(drc_type);
> +    }
> + 
> +    /* now write the drc count into the space we reserved at the
> +     * beginning of the arrays previously
> +     */
> +    *(uint32_t *)drc_indexes->data = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_power_domains->data = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_names->str = cpu_to_be32(drc_count);
> +    *(uint32_t *)drc_types->str = cpu_to_be32(drc_count);
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes",
> +                      drc_indexes->data,
> +                      drc_indexes->len * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't create ibm,drc-indexes property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains",
> +                      drc_power_domains->data,
> +                      drc_power_domains->len * sizeof(uint32_t));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names",
> +                      drc_names->str, drc_names->len);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> +        goto out;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types",
> +                      drc_types->str, drc_types->len);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> +        goto out;
> +    }
> +
> +out:
> +    g_array_free(drc_indexes, true);
> +    g_array_free(drc_power_domains, true);
> +    g_string_free(drc_names, true);
> +    g_string_free(drc_types, true);
> +
> +    return ret;
> +}
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index d6cf597..5895534 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -197,9 +197,11 @@ typedef struct sPAPRDRConnectorClass {
> 
>  sPAPRDRConnector *spapr_dr_connector_new(Object *owner,
>                                           sPAPRDRConnectorType type,
> -                                         uint32_t token);
> +                                         uint32_t id);
>  sPAPRDRConnector *spapr_dr_connector_by_index(uint32_t index);
>  sPAPRDRConnector *spapr_dr_connector_by_id(sPAPRDRConnectorType type,
>                                             uint32_t id);
> +int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
> +                          uint32_t drc_type_mask);
> 
>  #endif /* __HW_SPAPR_DRC_H__ */
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-25  6:40       ` Michael Roth
@ 2015-02-26 20:06         ` Michael Roth
  2015-02-27  5:14           ` David Gibson
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-26 20:06 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting Michael Roth (2015-02-25 00:40:26)
> Quoting Michael Roth (2015-02-24 23:17:24)
> > Quoting David Gibson (2015-02-24 21:11:29)
> > > On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> > > > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > > 
> > > "PCI Host Bridge bridges" :-p
> > > 
> > > > OF-nodes required by PAPR specification and IEEE 1275-1994
> > > > "PCI Bus Binding to Open Firmware" for the device.
> > > > 
> > > > We associate the corresponding FDT for these nodes with the DrcEntry
> > > > corresponding to the slot, which will be fetched via
> > > > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > > > specification. The FDT is cleaned up in the case of unplug.
> > > > 
> > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 371 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > > index 6a3d917..b9af1cd 100644
> > > > --- a/hw/ppc/spapr_pci.c
> > > > +++ b/hw/ppc/spapr_pci.c
> > > > @@ -33,8 +33,11 @@
> > > >  #include <libfdt.h>
> > > >  #include "trace.h"
> > > >  #include "qemu/error-report.h"
> > > > +#include "qapi/qmp/qerror.h"
> > > >  
> > > >  #include "hw/pci/pci_bus.h"
> > > > +#include "hw/ppc/spapr_drc.h"
> > > > +#include "sysemu/device_tree.h"
> > > >  
> > > >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> > > >  #define RTAS_QUERY_FN           0
> > > > @@ -47,7 +50,13 @@
> > > >  #define RTAS_TYPE_MSI           1
> > > >  #define RTAS_TYPE_MSIX          2
> > > >  
> > > > -#include "hw/ppc/spapr_drc.h"
> > > > +#define _FDT(exp) \
> > > > +    do { \
> > > > +        int ret = (exp);                                           \
> > > > +        if (ret < 0) {                                             \
> > > > +            return ret;                                            \
> > > > +        }                                                          \
> > > > +    } while (0)
> > > >  
> > > >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> > > >  {
> > > > @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> > > >      return &phb->iommu_as;
> > > >  }
> > > >  
> > > > +/* Macros to operate with address in OF binding to PCI */
> > > > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > > > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > > > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > > > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > > > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > > > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > > > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > > > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > > > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > > > +/* for 'reg'/'assigned-addresses' OF properties */
> > > > +#define RESOURCE_CELLS_SIZE 2
> > > > +#define RESOURCE_CELLS_ADDRESS 3
> > > > +
> > > > +typedef struct ResourceFields {
> > > > +    uint32_t phys_hi;
> > > > +    uint32_t phys_mid;
> > > > +    uint32_t phys_lo;
> > > > +    uint32_t size_hi;
> > > > +    uint32_t size_lo;
> > > > +} ResourceFields;
> > > 
> > > Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
> > > platforms to be safe.
> > 
> > I seem to remember some rule (C99?) about padding only being used in cases
> > where an n-byte field doesn't naturally fall on an n-byte boundary, but
> > it doesn't hurt to be safe/explicit about what we're expecting.
> > 
> > > 
> > > > +typedef struct ResourceProps {
> > > > +    union {
> > > > +        ResourceFields fields[8];
> > > > +        uint8_t data[sizeof(ResourceFields) * 8];
> > > > +    } reg;
> > > > +    union {
> > > > +        ResourceFields fields[7];
> > > > +        uint8_t data[sizeof(ResourceFields) * 7];
> > > > +    } assigned;
> > > > +    uint32_t reg_len;
> > > > +    uint32_t assigned_len;
> > > > +} ResourceProps;
> > > 
> > > I'm a bit dubious about this union, rather than just casting when you
> > > need the bytestring version.
> > 
> > Yah, I may have gotten a bit carried away there...
> > 
> > > 
> > > > +
> > > > +/* fill in the 'reg'/'assigned-resources' OF properties for
> > > > + * a PCI device. 'reg' describes resource requirements for a
> > > > + * device's IO/MEM regions, 'assigned-addresses' describes the
> > > > + * actual resource assignments.
> > > > + *
> > > > + * the properties are arrays of ('phys-addr', 'size') pairs describing
> > > > + * the addressable regions of the PCI device, where 'phys-addr' is a
> > > > + * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
> > > > + * (phys.hi, phys.mid, phys.lo), and 'size' is a
> > > > + * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
> > > > + * phys.hi = 0xYYXXXXZZ, where:
> > > > + *   0xYY = npt000ss
> > > > + *          |||   |
> > > > + *          |||   +-- space code: 1 if IO region, 2 if MEM region
> > > > + *          ||+------ for non-relocatable IO: 1 if aliased
> > > > + *          ||        for relocatable IO: 1 if below 64KB
> > > > + *          ||        for MEM: 1 if below 1MB
> > > > + *          |+------- 1 if region is prefetchable
> > > > + *          +-------- 1 if region is non-relocatable
> > > > + *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
> > > > + *            bits respectively
> > > > + *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
> > > > + *          to the region
> > > > + *
> > > > + * phys.mid and phys.lo correspond respectively to the hi/lo portions
> > > > + * of the actual address of the region.
> > > > + *
> > > > + * how the phys-addr/size values are used differ slightly between
> > > > + * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
> > > > + * an additional description for the config space region of the
> > > > + * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
> > > > + * to describe the region as relocatable, with an address-mapping
> > > > + * that corresponds directly to the PHB's address space for the
> > > > + * resource. 'assigned-addresses' always has n=1 set with an absolute
> > > > + * address assigned for the resource. in general, 'assigned-addresses'
> > > > + * won't be populated, since addresses for PCI devices are generally
> > > > + * unmapped initially and left to the guest to assign.
> > > > + *
> > > > + * in accordance with PCI Bus Binding to Open Firmware,
> > > > + * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
> > > > + * Appendix C.
> > > > + */
> > > > +static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
> > > > +{
> > > > +    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
> > > > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > > > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > > > +                       b_fff(PCI_FUNC(d->devfn)));
> > > > +    ResourceFields *reg, *assigned;
> > > > +    int i, reg_idx = 0, assigned_idx = 0;
> > > > +
> > > > +    /* config space region */
> > > > +    reg = &rp->reg.fields[reg_idx++];
> > > > +    reg->phys_hi = cpu_to_be32(dev_id);
> > > > +    reg->phys_mid = 0;
> > > > +    reg->phys_lo = 0;
> > > > +    reg->size_hi = 0;
> > > > +    reg->size_lo = 0;
> > > > +
> > > > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > > +        if (!d->io_regions[i].size) {
> > > > +            continue;
> > > > +        }
> > > > +
> > > > +        reg = &rp->reg.fields[reg_idx++];
> > > > +
> > > > +        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > > > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > > > +            reg->phys_hi |= cpu_to_be32(b_ss(1));
> > > > +        } else {
> > > > +            reg->phys_hi |= cpu_to_be32(b_ss(2));
> > > > +        }
> > > > +        reg->phys_mid = 0;
> > > > +        reg->phys_lo = 0;
> > > > +        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
> > > > +        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
> > > > +
> > > > +        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
> > > > +            continue;
> > > 
> > > This has inverted sense, doesn't it?  If the BAR *is* unmapped you
> > > want to continue, skipping the assigned-addresses stuff.
> > 
> > Yes, should be inverted :( It ends up not mattering much with the
> > current userspace code, since we rely on PCI rescan which always
> > re-assigns BARs, but definitely needs fixing.
> > 
> > > 
> > > > +        }
> > > > +
> > > > +        assigned = &rp->assigned.fields[assigned_idx++];
> > > > +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> > > > +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> > > > +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
> > > 
> > > Not sure if I understood the comment above properly; should these
> > > addresses actually be translated through an mappings above the PHB?
> > > Not that there are any such translations on sPAPR, but...
> > 
> > The io_regions[i].addr values are relative to the IO/MEM address spaces for
> > the device, which correspond to IO/MEM windows for the PHB. So I think the
> > memory API should handle any translation above that...
> > 
> > Or do you mean because of the n=0/relocatable IO regions described by 'reg'?
> > 
> > IIUC, when n=0, reg->phys_{mid,lo} can be used to encode a starting offset,
> > so a guest can re-assign/re-program an IO region that's already been assigned
> > and described via the correspond fields in 'assigned-addresses', so long as
> > uses an addr above reg->phys{mid,lo}.
> > 
> > So, we could use those reg->phys_{mid,lo} values as an alternative to the
> > PHBs IO/MEM windows (I guess for platforms that don't provide these windows
> > and just expose the full address space?).
> > 
> > But since we have those windows, we end up not needing this, so we always do
> > n=0, reg->phys_mid = reg->phys_lo = 0, so the values in
> > assigned->phys_{mid,lo} end up just being offsets into the IO/MEM windows,
> > which correspond directly to io_regions[n].addr.
> 
> Argh, not sure what I was thinking. io_regions[n].addr is relative to the
> IO/MEM window, so from the guest perspective .addr is indeed a different
> value...
> 
> So maybe reg->phys_mid/reg->phys_lo do in fact need to reflect the window
> offsets so that the relocatable region assignments are offset
> accordingly...
> 
> Will look into it more.

So, maybe my first response wasn't as misguided as I thought. I haven't
found any documentation that lays out the specifics, but the kernel code makes
it fairly clear that the IO/MEM addresses in assigned-addresses are relative
to PHB's IO/MEM windows.

The code basically does this:

  arch/powerpc/kernel/pci_of_scan.c: of_pci_parse_addrs()
    addrs = of_get('assigned-addresses')
    config_offset = addrs[0] && 0xff (phys.hi)
    pci_bar_num = (config_offset - PCI_BASE_ADDRESS_0) / 4
    base = &addrs[1:2] (phys.mid/phys.lo)
    size = &addrs[3:4] (size.hi, size.lo)
    region.start = base
    region.end = base + size -1
    res = &dev->resource[pci_bar_num];
    pcibios_bus_to_resource(dev->bus, res, &region)
  drivers/pci/host-bridge.c:pcibios_bus_to_resource(bus, bar_res, bar_region)
    bridge = pci_host_bridge(bus)
    for window in bridge->windows:
      # skip if window type (IO/MEM), doesn't match device BAR's res)
      bus_region.start = window->res->start - window->offset
      bus_region.end = window->res->end - window->offset
      if bus_region contains device BAR region:
        offset = window->offset
        break
    //take the region, add the window offset, and use that as the resource addr
    res->start = region->start + offset
    res->end = region->end + offset

I tested this with the rpaphp hotplug module, which honors
assigned-addresses (current guest hotplug uses generic pci rescan instead
due to a long-standing issue with rpaphp only being able to handle
1 slot per PHB if there are devices present at module-load time), by adding
some test code to pre-assign the BARs during hotplug (with addresses relative
to PHB's IO/MEM windows) to make sure assigned-addresses values get properly
translated to the correct offset in guest's io/system address space.

Will add a note in the comments.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-25  0:48       ` David Gibson
@ 2015-02-26 22:21         ` Michael Roth
  2015-02-27  5:31           ` David Gibson
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-26 22:21 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-24 18:48:23)
> On Tue, Feb 24, 2015 at 02:43:45PM -0600, Michael Roth wrote:
> > Quoting David Gibson (2015-02-24 00:40:32)
> > > On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> > > > This interface is used to fetch an OF device-tree nodes that describes a
> > > > newly-attached device to guest. It is called multiple times to walk the
> > > > device-tree node and fetch individual properties into a 'workarea'/buffer
> > > > provided by the guest.
> > > > 
> > > > The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
> > > > the initial hotplug operation, and the state of these RTAS calls is tracked by
> > > > the sPAPRDRConnector. When the last of these properties is successfully
> > > > fetched, we report as special return value to the guest and transition
> > > > the device to a 'configured' state on the QEMU/DRC side.
> > > > 
> > > > See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> > > > this interface.
> > > > 
> > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/ppc/spapr_rtas.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 81 insertions(+)
> > > > 
> > > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > > index f80beb2..b8551b0 100644
> > > > --- a/hw/ppc/spapr_rtas.c
> > > > +++ b/hw/ppc/spapr_rtas.c
> > > > @@ -418,6 +418,85 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > > >      rtas_st(rets, 1, entity_sense);
> > > >  }
> > > >  
> > > > +/* configure-connector work area offsets, int32_t units for field
> > > > + * indexes, bytes for field offset/len values.
> > > > + *
> > > > + * as documented by PAPR+ v2.7, 13.5.3.5
> > > > + */
> > > > +#define CC_IDX_NODE_NAME_OFFSET 2
> > > > +#define CC_IDX_PROP_NAME_OFFSET 2
> > > > +#define CC_IDX_PROP_LEN 3
> > > > +#define CC_IDX_PROP_DATA_OFFSET 4
> > > > +#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
> > > > +#define CC_WA_LEN 4096
> > > > +
> > > > +static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
> > > > +                                         sPAPREnvironment *spapr,
> > > > +                                         uint32_t token, uint32_t nargs,
> > > > +                                         target_ulong args, uint32_t nret,
> > > > +                                         target_ulong rets)
> > > > +{
> > > > +    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
> > > 
> > > You need to check nargs and nret first.
> > 
> > Could've sworn I'd fixed all those cases. Will make sure to take care of it on
> > the next one.
> > 
> > > 
> > > > +    uint64_t wa_offset;
> > > > +    uint32_t drc_index;
> > > > +    sPAPRDRConnector *drc;
> > > > +    sPAPRDRConnectorClass *drck;
> > > > +    sPAPRDRCCResponse resp;
> > > > +    const struct fdt_property *prop = NULL;
> > > > +    char *prop_name = NULL;
> > > > +    int prop_len, rc;
> > > > +
> > > > +    drc_index = rtas_ld(wa_addr, 0);
> > > > +    drc = spapr_dr_connector_by_index(drc_index);
> > > > +    if (!drc) {
> > > > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > > > +                drc_index);
> > > > +        rc = RTAS_OUT_PARAM_ERROR;
> > > > +        goto out;
> > > > +    }
> > > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> > > 
> > > You may have answered this last time round, but if so I forgot the
> > > reason.
> > > 
> > > Why does the awkward iteration need to go down to the drck callback?
> > > Coudln't the drck callback part just supply the fdt fragment blob,
> > > then have generic code which streams it out via iteration?
> > > 
> > > Obviously we have to support the horrid PAPR interface, but it would
> > > be nice to confine the PAPR derived horridness to as small an area as
> > > we can.
> > 
> > That horrid interface percolates all the way up the QEMU stack,
> > unfortunately :)
> > 
> > Upon successfully having it's device tree node received, a DRC transitions
> > to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).
> > 
> > We need to track that state, since it's used to differentiate between
> > a case where a device is set to 'isolated' as part of entity-sense/device
> > configuration, as opposed to 'isolated' as part the unplug path. The
> > overlap between the 2 can occur if we do device_add followed by an immediate
> > device_del, but since the 'configured' transition must occur before the latter,
> > it becomes unambiguous.
> > 
> > It's also possible that a guest might be reset in the middle of a series of
> > calls to configure-connector, in which case that state needs to be reset. This
> > is currently handled by sPAPRDRConnector's reset hook, so if we moved that
> > aspect out I think we'd need to wire up a reset hook for the configure-connector
> > state, which is kinda messy. We'd also need a list of some sort, keyed by
> > the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
> > only one device configuration is 'in-flight' at a time), so we end up
> > duplicating a lot of tracking/functionality.
> 
> Hmm.  You should still be able to handle that with just 2 hooks and 1
> bit of state right?  Say "start_configuration" and "end_configuration"
> or similar.  start_configuration gets the blob from the backend, then
> end_configuration is called once RTAS has finished streaming it out to
> the guest and updates to the cnofigured state.

{start,end}_configuration callbacks would work for handling
the normal 'configured' state transitions induced by the call, but we'd also
need hooks in the "other direction" for a couple cases:

This scenario for instance:

      qemu:                         guest:
      add device0 to drc0
                                    drmgr0: rtas-configure-connector drc0
      drc0->start_configuration...
                                    drmgr0: rtas-configure-connector drc0
                                    drmgr0: rtas-configure-connector drc0
                                    ...
                                    drmgr0: rtas-configure-connector drc0
      del device0                   <device0 removal pending>
                                    drmgr0: rtas-configure-connector drc0
      system_reset
      device0 removed by drc0
        reset hook
      add device1 to drc0           drmgr0: rtas-configure-connector drc0
                                    <begins fetching stale FDT>

So I think we'd need at least a reset hook wired up to the RTAS state.

It's also possible for the guest to force a transition out of the
configured state by ISOLATE'ing the device. This can happen in the
middle of the guests configure-connector calls if there's an error. If
rtas-configure-connector is the one generating the error, it can anticipate
this and automatically reset the state, but in some cases the error is
guest internal: the get_node() in src/drmgr/rtas_calls.c can fail for
memory allocation errors, or unexpected workarea structure, after which
point it simply stops calling rtas-configure-connector and ISOLATEs the
device. If we don't track that, a subsequent unplug/plug could also
result in stale FDT fragments being sent to the guest.

So we'd the RTAS state tracking code to provide an interface of some
sort for the DRC code to call into, which results in a lot of
duplicated state-tracking.

It's a bit unintuitive, but those FDT bits are closely coupled to the DRC
state, so in the end I think that ends up being the most straight-forward
place to manage them.

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-26 20:06         ` Michael Roth
@ 2015-02-27  5:14           ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-02-27  5:14 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 4724 bytes --]

On Thu, Feb 26, 2015 at 02:06:04PM -0600, Michael Roth wrote:
> Quoting Michael Roth (2015-02-25 00:40:26)
> > Quoting Michael Roth (2015-02-24 23:17:24)
> > > Quoting David Gibson (2015-02-24 21:11:29)
> > > > On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
[snip]
> > > > > +        }
> > > > > +
> > > > > +        assigned = &rp->assigned.fields[assigned_idx++];
> > > > > +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> > > > > +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> > > > > +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
> > > > 
> > > > Not sure if I understood the comment above properly; should these
> > > > addresses actually be translated through an mappings above the PHB?
> > > > Not that there are any such translations on sPAPR, but...
> > > 
> > > The io_regions[i].addr values are relative to the IO/MEM address spaces for
> > > the device, which correspond to IO/MEM windows for the PHB. So I think the
> > > memory API should handle any translation above that...
> > > 
> > > Or do you mean because of the n=0/relocatable IO regions described by 'reg'?
> > > 
> > > IIUC, when n=0, reg->phys_{mid,lo} can be used to encode a starting offset,
> > > so a guest can re-assign/re-program an IO region that's already been assigned
> > > and described via the correspond fields in 'assigned-addresses', so long as
> > > uses an addr above reg->phys{mid,lo}.
> > > 
> > > So, we could use those reg->phys_{mid,lo} values as an alternative to the
> > > PHBs IO/MEM windows (I guess for platforms that don't provide these windows
> > > and just expose the full address space?).
> > > 
> > > But since we have those windows, we end up not needing this, so we always do
> > > n=0, reg->phys_mid = reg->phys_lo = 0, so the values in
> > > assigned->phys_{mid,lo} end up just being offsets into the IO/MEM windows,
> > > which correspond directly to io_regions[n].addr.
> > 
> > Argh, not sure what I was thinking. io_regions[n].addr is relative to the
> > IO/MEM window, so from the guest perspective .addr is indeed a different
> > value...
> > 
> > So maybe reg->phys_mid/reg->phys_lo do in fact need to reflect the window
> > offsets so that the relocatable region assignments are offset
> > accordingly...
> > 
> > Will look into it more.
> 
> So, maybe my first response wasn't as misguided as I thought. I haven't
> found any documentation that lays out the specifics, but the kernel code makes
> it fairly clear that the IO/MEM addresses in assigned-addresses are relative
> to PHB's IO/MEM windows.
> 
> The code basically does this:
> 
>   arch/powerpc/kernel/pci_of_scan.c: of_pci_parse_addrs()
>     addrs = of_get('assigned-addresses')
>     config_offset = addrs[0] && 0xff (phys.hi)
>     pci_bar_num = (config_offset - PCI_BASE_ADDRESS_0) / 4
>     base = &addrs[1:2] (phys.mid/phys.lo)
>     size = &addrs[3:4] (size.hi, size.lo)
>     region.start = base
>     region.end = base + size -1
>     res = &dev->resource[pci_bar_num];
>     pcibios_bus_to_resource(dev->bus, res, &region)
>   drivers/pci/host-bridge.c:pcibios_bus_to_resource(bus, bar_res, bar_region)
>     bridge = pci_host_bridge(bus)
>     for window in bridge->windows:
>       # skip if window type (IO/MEM), doesn't match device BAR's res)
>       bus_region.start = window->res->start - window->offset
>       bus_region.end = window->res->end - window->offset
>       if bus_region contains device BAR region:
>         offset = window->offset
>         break
>     //take the region, add the window offset, and use that as the resource addr
>     res->start = region->start + offset
>     res->end = region->end + offset
> 
> I tested this with the rpaphp hotplug module, which honors
> assigned-addresses (current guest hotplug uses generic pci rescan instead
> due to a long-standing issue with rpaphp only being able to handle
> 1 slot per PHB if there are devices present at module-load time), by adding
> some test code to pre-assign the BARs during hotplug (with addresses relative
> to PHB's IO/MEM windows) to make sure assigned-addresses values get properly
> translated to the correct offset in guest's io/system address space.
> 
> Will add a note in the comments.

Ok, sounds like the only problem was a confusing description in the
comment that made it seem like the assigned-address values were GPAs
rather than relative to the PHB windows.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-26 22:21         ` Michael Roth
@ 2015-02-27  5:31           ` David Gibson
  2015-02-27 17:37             ` Michael Roth
  0 siblings, 1 reply; 46+ messages in thread
From: David Gibson @ 2015-02-27  5:31 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 6245 bytes --]

On Thu, Feb 26, 2015 at 04:21:55PM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-02-24 18:48:23)
> > On Tue, Feb 24, 2015 at 02:43:45PM -0600, Michael Roth wrote:
> > > Quoting David Gibson (2015-02-24 00:40:32)
> > > > On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
[snip]
> > > > > +    uint64_t wa_offset;
> > > > > +    uint32_t drc_index;
> > > > > +    sPAPRDRConnector *drc;
> > > > > +    sPAPRDRConnectorClass *drck;
> > > > > +    sPAPRDRCCResponse resp;
> > > > > +    const struct fdt_property *prop = NULL;
> > > > > +    char *prop_name = NULL;
> > > > > +    int prop_len, rc;
> > > > > +
> > > > > +    drc_index = rtas_ld(wa_addr, 0);
> > > > > +    drc = spapr_dr_connector_by_index(drc_index);
> > > > > +    if (!drc) {
> > > > > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > > > > +                drc_index);
> > > > > +        rc = RTAS_OUT_PARAM_ERROR;
> > > > > +        goto out;
> > > > > +    }
> > > > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> > > > 
> > > > You may have answered this last time round, but if so I forgot the
> > > > reason.
> > > > 
> > > > Why does the awkward iteration need to go down to the drck callback?
> > > > Coudln't the drck callback part just supply the fdt fragment blob,
> > > > then have generic code which streams it out via iteration?
> > > > 
> > > > Obviously we have to support the horrid PAPR interface, but it would
> > > > be nice to confine the PAPR derived horridness to as small an area as
> > > > we can.
> > > 
> > > That horrid interface percolates all the way up the QEMU stack,
> > > unfortunately :)
> > > 
> > > Upon successfully having it's device tree node received, a DRC transitions
> > > to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).
> > > 
> > > We need to track that state, since it's used to differentiate between
> > > a case where a device is set to 'isolated' as part of entity-sense/device
> > > configuration, as opposed to 'isolated' as part the unplug path. The
> > > overlap between the 2 can occur if we do device_add followed by an immediate
> > > device_del, but since the 'configured' transition must occur before the latter,
> > > it becomes unambiguous.
> > > 
> > > It's also possible that a guest might be reset in the middle of a series of
> > > calls to configure-connector, in which case that state needs to be reset. This
> > > is currently handled by sPAPRDRConnector's reset hook, so if we moved that
> > > aspect out I think we'd need to wire up a reset hook for the configure-connector
> > > state, which is kinda messy. We'd also need a list of some sort, keyed by
> > > the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
> > > only one device configuration is 'in-flight' at a time), so we end up
> > > duplicating a lot of tracking/functionality.
> > 
> > Hmm.  You should still be able to handle that with just 2 hooks and 1
> > bit of state right?  Say "start_configuration" and "end_configuration"
> > or similar.  start_configuration gets the blob from the backend, then
> > end_configuration is called once RTAS has finished streaming it out to
> > the guest and updates to the cnofigured state.
> 
> {start,end}_configuration callbacks would work for handling
> the normal 'configured' state transitions induced by the call, but we'd also
> need hooks in the "other direction" for a couple cases:
> 
> This scenario for instance:
> 
>       qemu:                         guest:
>       add device0 to drc0
>                                     drmgr0: rtas-configure-connector drc0
>       drc0->start_configuration...
>                                     drmgr0: rtas-configure-connector drc0
>                                     drmgr0: rtas-configure-connector drc0
>                                     ...
>                                     drmgr0: rtas-configure-connector drc0
>       del device0                   <device0 removal pending>
>                                     drmgr0: rtas-configure-connector drc0
>       system_reset
>       device0 removed by drc0
>         reset hook
>       add device1 to drc0           drmgr0: rtas-configure-connector drc0
>                                     <begins fetching stale FDT>
> 
> So I think we'd need at least a reset hook wired up to the RTAS state.
> 
> It's also possible for the guest to force a transition out of the
> configured state by ISOLATE'ing the device. This can happen in the
> middle of the guests configure-connector calls if there's an error. If
> rtas-configure-connector is the one generating the error, it can anticipate
> this and automatically reset the state, but in some cases the error is
> guest internal: the get_node() in src/drmgr/rtas_calls.c can fail for
> memory allocation errors, or unexpected workarea structure, after which
> point it simply stops calling rtas-configure-connector and ISOLATEs the
> device. If we don't track that, a subsequent unplug/plug could also
> result in stale FDT fragments being sent to the guest.
> 
> So we'd the RTAS state tracking code to provide an interface of some
> sort for the DRC code to call into, which results in a lot of
> duplicated state-tracking.
> 
> It's a bit unintuitive, but those FDT bits are closely coupled to the DRC
> state, so in the end I think that ends up being the most straight-forward
> place to manage them.

Hrm.  I'm still not following.  I mean, I see that the back-end needs
to be aware of the three-valued state: CC-not-called /
CC-calls-in-progress / CC-calls-completed, but I'm not seeing why it
needs to be aware of the individual CC calls.

Could we have instead a set_state() callback of some sort, that
handles both some of the existing architected states like ISOLATED,
and also special states like CC-in-progress?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-27  5:31           ` David Gibson
@ 2015-02-27 17:37             ` Michael Roth
  2015-03-02  6:25               ` David Gibson
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Roth @ 2015-02-27 17:37 UTC (permalink / raw)
  To: David Gibson
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

Quoting David Gibson (2015-02-26 23:31:20)
> On Thu, Feb 26, 2015 at 04:21:55PM -0600, Michael Roth wrote:
> > Quoting David Gibson (2015-02-24 18:48:23)
> > > On Tue, Feb 24, 2015 at 02:43:45PM -0600, Michael Roth wrote:
> > > > Quoting David Gibson (2015-02-24 00:40:32)
> > > > > On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> [snip]
> > > > > > +    uint64_t wa_offset;
> > > > > > +    uint32_t drc_index;
> > > > > > +    sPAPRDRConnector *drc;
> > > > > > +    sPAPRDRConnectorClass *drck;
> > > > > > +    sPAPRDRCCResponse resp;
> > > > > > +    const struct fdt_property *prop = NULL;
> > > > > > +    char *prop_name = NULL;
> > > > > > +    int prop_len, rc;
> > > > > > +
> > > > > > +    drc_index = rtas_ld(wa_addr, 0);
> > > > > > +    drc = spapr_dr_connector_by_index(drc_index);
> > > > > > +    if (!drc) {
> > > > > > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > > > > > +                drc_index);
> > > > > > +        rc = RTAS_OUT_PARAM_ERROR;
> > > > > > +        goto out;
> > > > > > +    }
> > > > > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> > > > > 
> > > > > You may have answered this last time round, but if so I forgot the
> > > > > reason.
> > > > > 
> > > > > Why does the awkward iteration need to go down to the drck callback?
> > > > > Coudln't the drck callback part just supply the fdt fragment blob,
> > > > > then have generic code which streams it out via iteration?
> > > > > 
> > > > > Obviously we have to support the horrid PAPR interface, but it would
> > > > > be nice to confine the PAPR derived horridness to as small an area as
> > > > > we can.
> > > > 
> > > > That horrid interface percolates all the way up the QEMU stack,
> > > > unfortunately :)
> > > > 
> > > > Upon successfully having it's device tree node received, a DRC transitions
> > > > to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).
> > > > 
> > > > We need to track that state, since it's used to differentiate between
> > > > a case where a device is set to 'isolated' as part of entity-sense/device
> > > > configuration, as opposed to 'isolated' as part the unplug path. The
> > > > overlap between the 2 can occur if we do device_add followed by an immediate
> > > > device_del, but since the 'configured' transition must occur before the latter,
> > > > it becomes unambiguous.
> > > > 
> > > > It's also possible that a guest might be reset in the middle of a series of
> > > > calls to configure-connector, in which case that state needs to be reset. This
> > > > is currently handled by sPAPRDRConnector's reset hook, so if we moved that
> > > > aspect out I think we'd need to wire up a reset hook for the configure-connector
> > > > state, which is kinda messy. We'd also need a list of some sort, keyed by
> > > > the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
> > > > only one device configuration is 'in-flight' at a time), so we end up
> > > > duplicating a lot of tracking/functionality.
> > > 
> > > Hmm.  You should still be able to handle that with just 2 hooks and 1
> > > bit of state right?  Say "start_configuration" and "end_configuration"
> > > or similar.  start_configuration gets the blob from the backend, then
> > > end_configuration is called once RTAS has finished streaming it out to
> > > the guest and updates to the cnofigured state.
> > 
> > {start,end}_configuration callbacks would work for handling
> > the normal 'configured' state transitions induced by the call, but we'd also
> > need hooks in the "other direction" for a couple cases:
> > 
> > This scenario for instance:
> > 
> >       qemu:                         guest:
> >       add device0 to drc0
> >                                     drmgr0: rtas-configure-connector drc0
> >       drc0->start_configuration...
> >                                     drmgr0: rtas-configure-connector drc0
> >                                     drmgr0: rtas-configure-connector drc0
> >                                     ...
> >                                     drmgr0: rtas-configure-connector drc0
> >       del device0                   <device0 removal pending>
> >                                     drmgr0: rtas-configure-connector drc0
> >       system_reset
> >       device0 removed by drc0
> >         reset hook
> >       add device1 to drc0           drmgr0: rtas-configure-connector drc0
> >                                     <begins fetching stale FDT>
> > 
> > So I think we'd need at least a reset hook wired up to the RTAS state.
> > 
> > It's also possible for the guest to force a transition out of the
> > configured state by ISOLATE'ing the device. This can happen in the
> > middle of the guests configure-connector calls if there's an error. If
> > rtas-configure-connector is the one generating the error, it can anticipate
> > this and automatically reset the state, but in some cases the error is
> > guest internal: the get_node() in src/drmgr/rtas_calls.c can fail for
> > memory allocation errors, or unexpected workarea structure, after which
> > point it simply stops calling rtas-configure-connector and ISOLATEs the
> > device. If we don't track that, a subsequent unplug/plug could also
> > result in stale FDT fragments being sent to the guest.
> > 
> > So we'd the RTAS state tracking code to provide an interface of some
> > sort for the DRC code to call into, which results in a lot of
> > duplicated state-tracking.
> > 
> > It's a bit unintuitive, but those FDT bits are closely coupled to the DRC
> > state, so in the end I think that ends up being the most straight-forward
> > place to manage them.
> 
> Hrm.  I'm still not following.  I mean, I see that the back-end needs
> to be aware of the three-valued state: CC-not-called /
> CC-calls-in-progress / CC-calls-completed, but I'm not seeing why it
> needs to be aware of the individual CC calls.

It's not so much that the backend needs to be aware of the individual
CC calls, but that the front-end/RTAS needs to be aware of the transitions
made by the back-end (as with the example scenarios above), which can
affect the output of those individual CC calls at any point in time.

But maybe I'm misunderstanding your original suggestion. Is this more to
do with the awkward drc->connector_connector interface, rather than the
fact that sPAPRDRConnector manages the FDT/offsets. Because if so...

> 
> Could we have instead a set_state() callback of some sort, that
> handles both some of the existing architected states like ISOLATED,
> and also special states like CC-in-progress?

I suppose we could leave the fdt/fdt_offset/configured values in the
possession of sPAPRDRConnector, but provide a drc->get_cc_state(),
which exposes the following to rtas_ibm_configure_connector():

typedef struct sPAPRDRCCState {
    /* public */
    void *fdt;
    int fdt_offset;
    int fdt_depth;
    
    /* private (or just move these up into sPAPRDRConnector) */
    int fdt_start_offset;
    bool configured;
} sPAPRDRCCState;

If we do that, we can drop the FDT traversal stuff from sPAPRDRConnector,
but still allow it to reset fdt/fdt_offset/configured/etc when it needs
to based on it's internal state transitions.

And along with that provide a drc->set_configured(void) that
rtas_ibm_configure_connector() can call when it successfully completes
the traversal.

Is that the sort of thing you had in mind?

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface
  2015-02-27 17:37             ` Michael Roth
@ 2015-03-02  6:25               ` David Gibson
  0 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-03-02  6:25 UTC (permalink / raw)
  To: Michael Roth
  Cc: aik, qemu-devel, agraf, ncmike, qemu-ppc, tyreld, bharata.rao, nfont

[-- Attachment #1: Type: text/plain, Size: 9752 bytes --]

On Fri, Feb 27, 2015 at 11:37:56AM -0600, Michael Roth wrote:
> Quoting David Gibson (2015-02-26 23:31:20)
> > On Thu, Feb 26, 2015 at 04:21:55PM -0600, Michael Roth wrote:
> > > Quoting David Gibson (2015-02-24 18:48:23)
> > > > On Tue, Feb 24, 2015 at 02:43:45PM -0600, Michael Roth wrote:
> > > > > Quoting David Gibson (2015-02-24 00:40:32)
> > > > > > On Mon, Feb 16, 2015 at 08:27:43AM -0600, Michael Roth wrote:
> > [snip]
> > > > > > > +    uint64_t wa_offset;
> > > > > > > +    uint32_t drc_index;
> > > > > > > +    sPAPRDRConnector *drc;
> > > > > > > +    sPAPRDRConnectorClass *drck;
> > > > > > > +    sPAPRDRCCResponse resp;
> > > > > > > +    const struct fdt_property *prop = NULL;
> > > > > > > +    char *prop_name = NULL;
> > > > > > > +    int prop_len, rc;
> > > > > > > +
> > > > > > > +    drc_index = rtas_ld(wa_addr, 0);
> > > > > > > +    drc = spapr_dr_connector_by_index(drc_index);
> > > > > > > +    if (!drc) {
> > > > > > > +        DPRINTF("rtas_ibm_configure_connector: invalid sensor/DRC index: %xh\n",
> > > > > > > +                drc_index);
> > > > > > > +        rc = RTAS_OUT_PARAM_ERROR;
> > > > > > > +        goto out;
> > > > > > > +    }
> > > > > > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > > > +    resp = drck->configure_connector(drc, &prop_name, &prop, &prop_len);
> > > > > > 
> > > > > > You may have answered this last time round, but if so I forgot the
> > > > > > reason.
> > > > > > 
> > > > > > Why does the awkward iteration need to go down to the drck callback?
> > > > > > Coudln't the drck callback part just supply the fdt fragment blob,
> > > > > > then have generic code which streams it out via iteration?
> > > > > > 
> > > > > > Obviously we have to support the horrid PAPR interface, but it would
> > > > > > be nice to confine the PAPR derived horridness to as small an area as
> > > > > > we can.
> > > > > 
> > > > > That horrid interface percolates all the way up the QEMU stack,
> > > > > unfortunately :)
> > > > > 
> > > > > Upon successfully having it's device tree node received, a DRC transitions
> > > > > to a 'configured' state that's defined in the DR state machine (PAPR+ 13.4).
> > > > > 
> > > > > We need to track that state, since it's used to differentiate between
> > > > > a case where a device is set to 'isolated' as part of entity-sense/device
> > > > > configuration, as opposed to 'isolated' as part the unplug path. The
> > > > > overlap between the 2 can occur if we do device_add followed by an immediate
> > > > > device_del, but since the 'configured' transition must occur before the latter,
> > > > > it becomes unambiguous.
> > > > > 
> > > > > It's also possible that a guest might be reset in the middle of a series of
> > > > > calls to configure-connector, in which case that state needs to be reset. This
> > > > > is currently handled by sPAPRDRConnector's reset hook, so if we moved that
> > > > > aspect out I think we'd need to wire up a reset hook for the configure-connector
> > > > > state, which is kinda messy. We'd also need a list of some sort, keyed by
> > > > > the DRC indexes, to handle the subsequent call-per-iteration's (no guarantee
> > > > > only one device configuration is 'in-flight' at a time), so we end up
> > > > > duplicating a lot of tracking/functionality.
> > > > 
> > > > Hmm.  You should still be able to handle that with just 2 hooks and 1
> > > > bit of state right?  Say "start_configuration" and "end_configuration"
> > > > or similar.  start_configuration gets the blob from the backend, then
> > > > end_configuration is called once RTAS has finished streaming it out to
> > > > the guest and updates to the cnofigured state.
> > > 
> > > {start,end}_configuration callbacks would work for handling
> > > the normal 'configured' state transitions induced by the call, but we'd also
> > > need hooks in the "other direction" for a couple cases:
> > > 
> > > This scenario for instance:
> > > 
> > >       qemu:                         guest:
> > >       add device0 to drc0
> > >                                     drmgr0: rtas-configure-connector drc0
> > >       drc0->start_configuration...
> > >                                     drmgr0: rtas-configure-connector drc0
> > >                                     drmgr0: rtas-configure-connector drc0
> > >                                     ...
> > >                                     drmgr0: rtas-configure-connector drc0
> > >       del device0                   <device0 removal pending>
> > >                                     drmgr0: rtas-configure-connector drc0
> > >       system_reset
> > >       device0 removed by drc0
> > >         reset hook
> > >       add device1 to drc0           drmgr0: rtas-configure-connector drc0
> > >                                     <begins fetching stale FDT>
> > > 
> > > So I think we'd need at least a reset hook wired up to the RTAS state.
> > > 
> > > It's also possible for the guest to force a transition out of the
> > > configured state by ISOLATE'ing the device. This can happen in the
> > > middle of the guests configure-connector calls if there's an error. If
> > > rtas-configure-connector is the one generating the error, it can anticipate
> > > this and automatically reset the state, but in some cases the error is
> > > guest internal: the get_node() in src/drmgr/rtas_calls.c can fail for
> > > memory allocation errors, or unexpected workarea structure, after which
> > > point it simply stops calling rtas-configure-connector and ISOLATEs the
> > > device. If we don't track that, a subsequent unplug/plug could also
> > > result in stale FDT fragments being sent to the guest.
> > > 
> > > So we'd the RTAS state tracking code to provide an interface of some
> > > sort for the DRC code to call into, which results in a lot of
> > > duplicated state-tracking.
> > > 
> > > It's a bit unintuitive, but those FDT bits are closely coupled to the DRC
> > > state, so in the end I think that ends up being the most straight-forward
> > > place to manage them.
> > 
> > Hrm.  I'm still not following.  I mean, I see that the back-end needs
> > to be aware of the three-valued state: CC-not-called /
> > CC-calls-in-progress / CC-calls-completed, but I'm not seeing why it
> > needs to be aware of the individual CC calls.
> 
> It's not so much that the backend needs to be aware of the individual
> CC calls, but that the front-end/RTAS needs to be aware of the transitions
> made by the back-end (as with the example scenarios above), which can
> affect the output of those individual CC calls at any point in time.
> 
> But maybe I'm misunderstanding your original suggestion. Is this more to
> do with the awkward drc->connector_connector interface, rather than the
> fact that sPAPRDRConnector manages the FDT/offsets. Because if so...

So, the conceptual operation that needs to take place here is that the
device model needs to convey a device tree fragment to the guest.  If
we were designing an interface to do that today, the obvious method is
to just send a blob of fdt.  One operation, nice and simple.  But PAPR
doesn't do that, because it predates fdt, instead it has this rather
complex and awkward sequence of calls to iterate over the dt fragment.

My basic point here is that we have to live with the PAPR interface -
but we should try to restrict the ugliness to the smallest portion of
the code we can and use a nicer interface internally wherever we can.

So I'd like to restrict knowledge of the CC interface to just the
actual RTAS implementation.  The idea it gets a device tree fragment
from the device model as fdt, then handles the streaming out via
multiple RTAS calls.

> > 
> > Could we have instead a set_state() callback of some sort, that
> > handles both some of the existing architected states like ISOLATED,
> > and also special states like CC-in-progress?
> 
> I suppose we could leave the fdt/fdt_offset/configured values in the
> possession of sPAPRDRConnector, but provide a drc->get_cc_state(),
> which exposes the following to rtas_ibm_configure_connector():
> 
> typedef struct sPAPRDRCCState {
>     /* public */
>     void *fdt;
>     int fdt_offset;
>     int fdt_depth;
>     
>     /* private (or just move these up into sPAPRDRConnector) */
>     int fdt_start_offset;
>     bool configured;
> } sPAPRDRCCState;
> 
> If we do that, we can drop the FDT traversal stuff from sPAPRDRConnector,
> but still allow it to reset fdt/fdt_offset/configured/etc when it needs
> to based on it's internal state transitions.
> 
> And along with that provide a drc->set_configured(void) that
> rtas_ibm_configure_connector() can call when it successfully completes
> the traversal.
> 
> Is that the sort of thing you had in mind?

Um.. I'm not entirely following what you're suggesting.  What I'm
intending is:

RTAS code <-> DTC / device model code

 * Interface is just "give me the fdt" and possibly "set_state"
 * What the RTAS code gets from there is just a void * fdt, no offsets

guest <-> RTAS code

 * This portion fdt_offset and fdt_depth.  It's per-DRC, obviously, so
   it might still live in a DRC linked structure, but it's only ever
   touched by the RTAS code, nothing device backend specific.

 * Clearly, this section of code will need a reset hook so it can
   update its values.

I don't know if that's made anything clearer.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
  2015-02-16 14:10 [Qemu-devel] [PATCH v5 00/16] spapr: add support for pci hotplug Michael Roth
@ 2015-02-16 14:10 ` Michael Roth
  0 siblings, 0 replies; 46+ messages in thread
From: Michael Roth @ 2015-02-16 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, qemu-pcc, agraf, ncmike, tyreld, bharata.rao, nfont, david

This enables hotplug for PHB bridges. Upon hotplug we generate the
OF-nodes required by PAPR specification and IEEE 1275-1994
"PCI Bus Binding to Open Firmware" for the device.

We associate the corresponding FDT for these nodes with the DrcEntry
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification. The FDT is cleaned up in the case of unplug.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 371 insertions(+), 20 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6a3d917..b9af1cd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -33,8 +33,11 @@
 #include <libfdt.h>
 #include "trace.h"
 #include "qemu/error-report.h"
+#include "qapi/qmp/qerror.h"
 
 #include "hw/pci/pci_bus.h"
+#include "hw/ppc/spapr_drc.h"
+#include "sysemu/device_tree.h"
 
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN           0
@@ -47,7 +50,13 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
-#include "hw/ppc/spapr_drc.h"
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            return ret;                                            \
+        }                                                          \
+    } while (0)
 
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
@@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &phb->iommu_as;
 }
 
+/* Macros to operate with address in OF binding to PCI */
+#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
+#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
+#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
+#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
+#define b_ss(x)         b_x((x), 24, 2) /* the space code */
+#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
+#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
+#define b_fff(x)        b_x((x), 8, 3)  /* function number */
+#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
+
+/* for 'reg'/'assigned-addresses' OF properties */
+#define RESOURCE_CELLS_SIZE 2
+#define RESOURCE_CELLS_ADDRESS 3
+
+typedef struct ResourceFields {
+    uint32_t phys_hi;
+    uint32_t phys_mid;
+    uint32_t phys_lo;
+    uint32_t size_hi;
+    uint32_t size_lo;
+} ResourceFields;
+
+typedef struct ResourceProps {
+    union {
+        ResourceFields fields[8];
+        uint8_t data[sizeof(ResourceFields) * 8];
+    } reg;
+    union {
+        ResourceFields fields[7];
+        uint8_t data[sizeof(ResourceFields) * 7];
+    } assigned;
+    uint32_t reg_len;
+    uint32_t assigned_len;
+} ResourceProps;
+
+/* fill in the 'reg'/'assigned-resources' OF properties for
+ * a PCI device. 'reg' describes resource requirements for a
+ * device's IO/MEM regions, 'assigned-addresses' describes the
+ * actual resource assignments.
+ *
+ * the properties are arrays of ('phys-addr', 'size') pairs describing
+ * the addressable regions of the PCI device, where 'phys-addr' is a
+ * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
+ * (phys.hi, phys.mid, phys.lo), and 'size' is a
+ * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
+ *
+ * phys.hi = 0xYYXXXXZZ, where:
+ *   0xYY = npt000ss
+ *          |||   |
+ *          |||   +-- space code: 1 if IO region, 2 if MEM region
+ *          ||+------ for non-relocatable IO: 1 if aliased
+ *          ||        for relocatable IO: 1 if below 64KB
+ *          ||        for MEM: 1 if below 1MB
+ *          |+------- 1 if region is prefetchable
+ *          +-------- 1 if region is non-relocatable
+ *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
+ *            bits respectively
+ *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
+ *          to the region
+ *
+ * phys.mid and phys.lo correspond respectively to the hi/lo portions
+ * of the actual address of the region.
+ *
+ * how the phys-addr/size values are used differ slightly between
+ * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
+ * an additional description for the config space region of the
+ * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
+ * to describe the region as relocatable, with an address-mapping
+ * that corresponds directly to the PHB's address space for the
+ * resource. 'assigned-addresses' always has n=1 set with an absolute
+ * address assigned for the resource. in general, 'assigned-addresses'
+ * won't be populated, since addresses for PCI devices are generally
+ * unmapped initially and left to the guest to assign.
+ *
+ * in accordance with PCI Bus Binding to Open Firmware,
+ * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
+ * Appendix C.
+ */
+static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
+{
+    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
+    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
+                       b_ddddd(PCI_SLOT(d->devfn)) |
+                       b_fff(PCI_FUNC(d->devfn)));
+    ResourceFields *reg, *assigned;
+    int i, reg_idx = 0, assigned_idx = 0;
+
+    /* config space region */
+    reg = &rp->reg.fields[reg_idx++];
+    reg->phys_hi = cpu_to_be32(dev_id);
+    reg->phys_mid = 0;
+    reg->phys_lo = 0;
+    reg->size_hi = 0;
+    reg->size_lo = 0;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!d->io_regions[i].size) {
+            continue;
+        }
+
+        reg = &rp->reg.fields[reg_idx++];
+
+        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
+        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
+            reg->phys_hi |= cpu_to_be32(b_ss(1));
+        } else {
+            reg->phys_hi |= cpu_to_be32(b_ss(2));
+        }
+        reg->phys_mid = 0;
+        reg->phys_lo = 0;
+        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
+        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
+
+        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
+            continue;
+        }
+
+        assigned = &rp->assigned.fields[assigned_idx++];
+        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
+        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
+        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
+        assigned->size_hi = reg->size_hi;
+        assigned->size_lo = reg->size_lo;
+    }
+
+    rp->reg_len = reg_idx * sizeof(ResourceFields);
+    rp->assigned_len = assigned_idx * sizeof(ResourceFields);
+}
+
+static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
+                                       int phb_index, int drc_index,
+                                       const char *drc_name)
+{
+    ResourceProps rp;
+    bool is_bridge = false;
+    int pci_status;
+
+    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
+        PCI_HEADER_TYPE_BRIDGE) {
+        is_bridge = true;
+    }
+
+    /* in accordance with PAPR+ v2.7 13.6.3, Table 181 */
+    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
+                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
+                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
+                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
+    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
+                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2)
+                            << 8));
+    if (pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
+                 pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
+    }
+
+    if (!is_bridge) {
+        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
+            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
+            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
+    }
+
+    if (pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
+                 pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
+    }
+
+    if (pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)) {
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
+                 pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
+        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
+
+    /* the following fdt cells are masked off the pci status register */
+    pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
+    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
+                          PCI_STATUS_DEVSEL_MASK & pci_status));
+
+    if (pci_status & PCI_STATUS_FAST_BACK) {
+        _FDT(fdt_setprop(fdt, offset, "fast-back-to-back", NULL, 0));
+    }
+    if (pci_status & PCI_STATUS_66MHZ) {
+        _FDT(fdt_setprop(fdt, offset, "66mhz-capable", NULL, 0));
+    }
+    if (pci_status & PCI_STATUS_UDF) {
+        _FDT(fdt_setprop(fdt, offset, "udf-supported", NULL, 0));
+    }
+
+    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
+    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", drc_name, strlen(drc_name)));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
+                          RESOURCE_CELLS_ADDRESS));
+    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
+                          RESOURCE_CELLS_SIZE));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
+                          RESOURCE_CELLS_SIZE));
+
+    populate_resource_props(dev, &rp);
+    _FDT(fdt_setprop(fdt, offset, "reg", rp.reg.data, rp.reg_len));
+    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
+                     rp.assigned.data, rp.assigned_len));
+
+    return 0;
+}
+
+/* create OF node for pci device and required OF DT properties */
+static void *spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
+                                       int drc_index, const char *drc_name,
+                                       int *dt_offset)
+{
+    void *fdt;
+    int offset, ret, fdt_size;
+    int slot = PCI_SLOT(dev->devfn);
+    char nodename[512];
+
+    fdt = create_device_tree(&fdt_size);
+    sprintf(nodename, "pci@%d", slot);
+    offset = fdt_add_subnode(fdt, 0, nodename);
+    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index, drc_index,
+                                      drc_name);
+    g_assert(!ret);
+
+    *dt_offset = offset;
+    return fdt;
+}
+
+static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
+                                     sPAPRPHBState *phb,
+                                     PCIDevice *pdev,
+                                     Error **errp)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    DeviceState *dev = DEVICE(pdev);
+    int drc_index = drck->get_index(drc);
+    const char *drc_name = drck->get_name(drc);
+    void *fdt = NULL;
+    int fdt_start_offset = 0;
+
+    /* boot-time devices get their device tree node created by SLOF, but for
+     * hotplugged devices we need QEMU to generate it so the guest can fetch
+     * it via RTAS
+     */
+    if (dev->hotplugged) {
+        fdt = spapr_create_pci_child_dt(phb, pdev, drc_index, drc_name,
+                                        &fdt_start_offset);
+    }
+
+    drck->attach(drc, DEVICE(pdev),
+                 fdt, fdt_start_offset, !dev->hotplugged, errp);
+    if (errp) {
+        g_free(fdt);
+    }
+}
+
+static void spapr_phb_remove_pci_device_cb(DeviceState *dev, void *opaque)
+{
+    /* some version guests do not wait for completion of a device
+     * cleanup (generally done asynchronously by the kernel) before
+     * signaling to QEMU that the device is safe, but instead sleep
+     * for some 'safe' period of time. unfortunately on a busy host
+     * this sleep isn't guaranteed to be long enough, resulting in
+     * bad things like IRQ lines being left asserted during final
+     * device removal. to deal with this we call reset just prior
+     * to finalizing the device, which will put the device back into
+     * an 'idle' state, as the device cleanup code expects.
+     */
+    pci_device_reset(PCI_DEVICE(dev));
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_phb_remove_pci_device(sPAPRDRConnector *drc,
+                                        sPAPRPHBState *phb,
+                                        PCIDevice *pdev,
+                                        Error **errp)
+{
+    sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    drck->detach(drc, DEVICE(pdev), spapr_phb_remove_pci_device_cb, phb, errp);
+}
+
+static void spapr_phb_hot_plug_child(HotplugHandler *plug_handler,
+                                     DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
+        SPAPR_DR_CONNECTOR_TYPE_PCI,
+        (phb->index << 16) |
+        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
+        pdev->devfn);
+    Error *local_err = NULL;
+
+    /* if DR is disabled we don't need to do anything in the case of
+     * hotplug or coldplug callbacks
+     */
+    if (!phb->dr_enabled) {
+        /* if this is a hotplug operation initiated by the user
+         * we need to let them know it's not enabled
+         */
+        if (plugged_dev->hotplugged) {
+            error_set(errp, QERR_BUS_NO_HOTPLUG,
+                      object_get_typename(OBJECT(phb)));
+        }
+        return;
+    }
+
+    g_assert(drc);
+
+    spapr_phb_add_pci_device(drc, phb, pdev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static void spapr_phb_hot_unplug_child(HotplugHandler *plug_handler,
+                                       DeviceState *plugged_dev, Error **errp)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(DEVICE(plug_handler));
+    PCIDevice *pdev = PCI_DEVICE(plugged_dev);
+    sPAPRDRConnectorClass *drck;
+    sPAPRDRConnector *drc = spapr_dr_connector_by_id(
+        SPAPR_DR_CONNECTOR_TYPE_PCI,
+        (phb->index << 16) |
+        (pci_bus_num(PCI_BUS(qdev_get_parent_bus(plugged_dev))) << 8) |
+        pdev->devfn);
+    Error *local_err = NULL;
+
+    if (!phb->dr_enabled) {
+        error_set(errp, QERR_BUS_NO_HOTPLUG,
+                  object_get_typename(OBJECT(phb)));
+        return;
+    }
+
+    g_assert(drc);
+
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    if (!drck->release_pending(drc)) {
+        spapr_phb_remove_pci_device(drc, phb, pdev, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
 {
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
@@ -574,6 +936,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
                            &sphb->memspace, &sphb->iospace,
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
+    qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL);
 
     /*
      * Initialize PHB address space.
@@ -810,6 +1173,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
     DeviceClass *dc = DEVICE_CLASS(klass);
     sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
+    HotplugHandlerClass *hp = HOTPLUG_HANDLER_CLASS(klass);
 
     hc->root_bus_path = spapr_phb_root_bus_path;
     dc->realize = spapr_phb_realize;
@@ -819,6 +1183,8 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->cannot_instantiate_with_device_add_yet = false;
     spc->finish_realize = spapr_phb_finish_realize;
+    hp->plug = spapr_phb_hot_plug_child;
+    hp->unplug = spapr_phb_hot_unplug_child;
 }
 
 static const TypeInfo spapr_phb_info = {
@@ -827,6 +1193,10 @@ static const TypeInfo spapr_phb_info = {
     .instance_size = sizeof(sPAPRPHBState),
     .class_init    = spapr_phb_class_init,
     .class_size    = sizeof(sPAPRPHBClass),
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
@@ -840,17 +1210,6 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
     return PCI_HOST_BRIDGE(dev);
 }
 
-/* Macros to operate with address in OF binding to PCI */
-#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
-#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
-#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
-#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
-#define b_ss(x)         b_x((x), 24, 2) /* the space code */
-#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
-#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
-#define b_fff(x)        b_x((x), 8, 3)  /* function number */
-#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
-
 typedef struct sPAPRTCEDT {
     void *fdt;
     int node_off;
@@ -920,14 +1279,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
         return bus_off;
     }
 
-#define _FDT(exp) \
-    do { \
-        int ret = (exp);                                           \
-        if (ret < 0) {                                             \
-            return ret;                                            \
-        }                                                          \
-    } while (0)
-
     /* Write PHB properties */
     _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
     _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-03-02  7:02 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
2015-02-23  6:05   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
2015-02-24  6:29   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface Michael Roth
2015-02-24  6:32   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state " Michael Roth
2015-02-24  6:33   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 06/16] spapr: add rtas_st_buffer_direct() helper Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
2015-02-24  6:40   ` David Gibson
2015-02-24 20:43     ` Michael Roth
2015-02-25  0:48       ` David Gibson
2015-02-26 22:21         ` Michael Roth
2015-02-27  5:31           ` David Gibson
2015-02-27 17:37             ` Michael Roth
2015-03-02  6:25               ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
2015-02-24  6:49   ` David Gibson
2015-02-24 20:04     ` Michael Roth
2015-02-25  0:54       ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface Michael Roth
2015-02-24  9:11   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
2015-02-24  9:26   ` David Gibson
2015-02-24 19:21     ` Michael Roth
2015-02-26 15:56   ` Nathan Fontenot
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 11/16] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 12/16] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs Michael Roth
2015-02-24  9:29   ` David Gibson
2015-02-24 18:52     ` Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c Michael Roth
2015-02-24 10:20   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth
2015-02-25  3:11   ` David Gibson
2015-02-25  5:17     ` Michael Roth
2015-02-25  6:40       ` Michael Roth
2015-02-26 20:06         ` Michael Roth
2015-02-27  5:14           ` David Gibson
2015-02-26  3:49       ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
2015-02-25  3:18   ` David Gibson
  -- strict thread matches above, loose matches on Subject: below --
2015-02-16 14:10 [Qemu-devel] [PATCH v5 00/16] spapr: add support for pci hotplug Michael Roth
2015-02-16 14:10 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.