All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: qemu-devel@nongnu.org
Cc: aik@ozlabs.ru, agraf@suse.de, ncmike@ncultra.org,
	qemu-ppc@nongnu.org, tyreld@linux.vnet.ibm.com,
	bharata.rao@gmail.com, nfont@linux.vnet.ibm.com
Subject: [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation
Date: Tue, 23 Dec 2014 06:30:15 -0600	[thread overview]
Message-ID: <1419337831-16552-2-git-send-email-mdroth@linux.vnet.ibm.com> (raw)
In-Reply-To: <1419337831-16552-1-git-send-email-mdroth@linux.vnet.ibm.com>

This adds a general overview of hotplug/dynamic-reconfiguration
for sPAPR/pSeries guest.

As specified in PAPR+ v2.7.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 docs/specs/ppc-spapr-hotplug.txt | 287 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 docs/specs/ppc-spapr-hotplug.txt

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
new file mode 100644
index 0000000..6f2863f
--- /dev/null
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -0,0 +1,287 @@
+= sPAPR Dynamic Reconfiguration =
+
+sPAPR/"pseries" guests make use a facility called dynamic-reconfiguration to
+handle hotplugging of dynamic "physical" resources like PCI cards, or
+"logical"/paravirtual resources like memory, CPUs, and "physical"
+host-bridges, which are generally managed by the host/hypervisor and provided
+to guests as virtualized resources. The specifics of dynamic-reconfiguration
+are documented extensively in PAPR+ v2.7, Section 13.1. This document
+provides a summary of that information as it applies to the implementation
+within QEMU.
+
+== Dynamic-reconfiguration Connectors ==
+
+To manage hotplug/unplug of these resources, a firmware abstraction known as
+a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
+resource to the guest, and provide an interface for the guest to manage
+configuration/removal of the resource associated with it.
+
+== Device-tree description of DRCs ==
+
+A set of 4 array Open Firmware device tree properties are used to describe
+the name/index/power-domain/type of each DRC allocated to a guest at
+boot-time. There may be multiple sets of these arrays, rooted at different
+paths in the device tree depending on the type of resource the DRCs manage.
+
+In some cases, the DRCs themselves may be provided by a dynamic resource,
+such as the DRCs managed PCI slots on a hotplugged PHB. In this case the
+arrays would be fetched as part of the device tree retrieval interfaces
+for hotplugged resources described under "Guest->Host interface".
+
+The array properties are described below. Each entry/element in an array
+describes the DRC identified by the element in the corresponding position
+of ibm,drc-indexes:
+
+ibm,drc-names:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <name> string encoded as a byte array
+
+  <name> values for logical/virtual resources are defined in PAPR+ v2.7,
+  Section 13.5.2.4, and basically consist of the type of the resource
+  followed by a space and a numerical value that's unique across resources
+  of that type.
+
+  <name> values for "physical" resources such as PCI or VIO devices are
+  defined as being "location codes", which are the "location labels" of
+  each encapsulating device, starting from the chassis down to the
+  individual slot for the device, concatenated by a hyphen. This provides
+  a mapping of resources to a physical location in a chassis for debugging
+  purposes. For QEMU, this mapping is less important, so we assign a
+  location code that confirms to naming specifications, but is simply a
+  location label for the slot by itself to simplify the implementation.
+  The naming convention for location labels is documented in detail in
+  PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
+  for PCI/VIO device slots, where <n> is unique across all PCI/VIO
+  device slots.
+
+ibm,drc-indexes:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
+    in the machine
+
+  <index> is arbitrary, but in the case of QEMU we try to maintain the
+  convention used to assign them to pSeries guests on pHyp:
+
+    bit[31:28]: integer encoding of <type>, where <type> is:
+                  1 for CPU resource
+                  2 for PHB resource
+                  3 for VIO resource
+                  4 for PCI resource
+                  8 for Memory resource
+    bit[27:0]: integer encoding of <id>, where <id> is unique across
+                 all resources of specified type
+
+ibm,drc-power-domains:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
+    power domain the resource will be assigned to. In the case of QEMU
+    we associated all resources with a "live insertion" domain, where the
+    power is assumed to be managed automatically. The integer value for
+    this domain is a special value of -1.
+
+
+ibm,drc-types:
+  first 4-bytes: BE-encoded integer denoting the number of entries
+  each entry: a NULL-terminated <type> string encoded as a byte array
+
+  <type> is assigned as follows:
+    "CPU" for a CPU
+    "PHB" for a physical host-bridge
+    "SLOT" for a VIO slot
+    "28" for a PCI slot
+    "MEM" for memory resource
+
+== Guest->Host interface to manage dynamic resources ==
+
+Each DRC is given a globally unique DRC Index, and resources associated with
+a particular DRC are configured/managed by the guest via a number of RTAS
+calls which reference individual DRCs based on the DRC index. This can be
+considered the guest->host interface.
+
+rtas-set-power-level:
+  arg[0]: integer identifying power domain
+  arg[1]: new power level for the domain, 0-100
+  output[0]: status, 0 on success
+  output[1]: power level after command
+
+  Set the power level for a specified power domain
+
+rtas-get-power-level:
+  arg[0]: integer identifying power domain
+  output[0]: status, 0 on success
+  output[1]: current power level
+
+  Get the power level for a specified power domain
+
+rtas-set-indicator:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  arg[2]: desired sensor value
+  output[0]: status, 0 on success
+
+  Set the state of an indicator or sensor. For the purpose of this document we
+  focus on the indicator/sensor types associated with a DRC. The types are:
+
+    9001: isolation-state, controls/indicates whether a device has been made
+          accessible to a guest
+
+          supported sensor values:
+            0: isolate, device is made unaccessible by guest OS
+            1: unisolate, device is made available to guest OS
+
+    9002: dr-indicator, controls "visual" indicator associated with device
+
+          supported sensor values:
+            0: inactive, resource may be safely removed
+            1: active, resource is in use and cannot be safely removed
+            2: identify, used to visually identify slot for interactive hotplug
+            3: action, in most cases, used in the same manner as identify
+
+    9003: allocation-state, generally only used for "logical" DR resources to
+          request the allocation/deallocation of a resource prior to acquiring
+          it via isolation-state->unisolate, or after releasing it via
+          isolation-state->isolate, respectively. for "physical" DR (like PCI
+          hotplug/unplug) the pre-allocation of the resource is implied and
+          this sensor is unused.
+
+          supported sensor values:
+            0: unusable, tell firmware/system the resource can be
+               unallocated/reclaimed and added back to the system resource pool
+            1: usable, request the resource be allocated/reserved for use by
+               guest OS
+            2: exchange, used to allocate a spare resource to use for fail-over
+               in certain situations. unused in QEMU
+            3: recover, used to reclaim a previously allocated resource that's
+               not currently allocated to the guest OS. unused in QEMU
+
+rtas-get-sensor-state:
+  arg[0]: integer identifying sensor/indicator type
+  arg[1]: index of sensor, for DR-related sensors this is generally the
+          DRC index
+  output[0]: status, 0 on success
+
+  Used to read an indicator or sensor value.
+
+  For DR-related operations, the only noteworthy sensor is dr-entity-sense,
+  which has a type value of 9003, as allocation-state does in the case of
+  rtas-set-indicator. The semantics/encodings of the sensor values are distinct
+  however:
+
+  supported sensor values for dr-entity-sense (9003) sensor:
+    0: empty,
+         for physical resources: DRC/slot is empty
+         for logical resources: unused
+    1: present,
+         for physical resources: DRC/slot is populated with a device/resource
+         for logical resources: resource has been allocated to the DRC
+    2: unusable,
+         for physical resources: unused
+         for logical resources: DRC has no resource allocated to it
+    3: exchange,
+         for physical resources: unused
+         for logical resources: resource available for exchange (see
+           allocation-state sensor semantics above)
+    4: recovery,
+         for physical resources: unused
+         for logical resources: resource available for recovery (see
+           allocation-state sensor semantics above)
+
+rtas-ibm-configure-connector:
+  arg[0]: guest physical address of 4096-byte work area buffer
+  arg[1]: 0, or address of additional 4096-byte work area buffer. only non-zero
+          if a prior RTAS response indicated a need for additional memory
+  output[0]: status:
+               0: completed transmittal of device-tree node
+               1: instruct guest to prepare for next DT sibling node
+               2: instruct guest to prepare for next DT child node
+               3: instruct guest to prepare for next DT property
+               4: instruct guest to ascend to parent DT node
+               5: instruct guest to provide additional work-area buffer
+                  via arg[1]
+            990x: instruct guest that operation took too long and to try
+                  again later
+
+  Used to fetch an OF device-tree description of the resource associated with
+  a particular DRC. The DRC index is encoded in the first 4-bytes of the first
+  work area buffer.
+
+  Work area layout, using 4-byte offsets:
+    wa[0]: DRC index of the DRC to fetch device-tree nodes from
+    wa[1]: 0 (hard-coded)
+    wa[2]: for next-sibling/next-child response:
+             wa offset of null-terminated string denoting the new node's name
+           for next-property response:
+             wa offset of null-terminated string denoting new property's name
+    wa[3]: for next-property response (unused otherwise):
+             byte-length of new property's value
+    wa[4]: for next-property response (unused otherwise):
+             new property's value, encoded as an OFDT-compatible byte array
+
+== hotplug/unplug events ==
+
+For most DR operations, the hypervisor will issue host->guest add/remove events
+using the EPOW/check-exception notification framework, where the host issues a
+check-exception interrupt, then provides an RTAS event log via an
+rtas-check-exception call issued by the guest in response. This framework is
+documented by PAPR+ v2.7, and already use in by QEMU for generating powerdown
+requests via EPOW events.
+
+For DR, this framework has been extended to include hotplug events, which were
+previously unneeded due to direct manipulation of DR-related guest userspace
+tools by host-level management such as an HMC. This level of management is not
+applicable to PowerKVM, hence the reason for extending the notification
+framework to support hotplug events.
+
+Note that these events are not yet formally part of the PAPR+ specification,
+but support for this format has already been implemented in DR-related
+guest tools such as powerpc-utils/librtas, as well as kernel patches that have
+been submitted to handle in-kernel processing of memory/cpu-related hotplug
+events[1], and is planned for formal inclusion is PAPR+ specification. The
+hotplug-specific payload is QEMU implemented as follows (with all values
+encoded in big-endian format):
+
+struct rtas_event_log_v6_hp {
+#define SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct section_header {
+        uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
+        uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
+                                         * plus the length of the DRC name
+                                         * if a DRC name identifier is
+                                         * specified for hotplug_identifier
+                                         */
+        uint8_t section_version;        /* version 1 */
+        uint8_t section_subtype;        /* unused */
+        uint16_t creator_component_id;  /* unused */
+    } hdr;
+#define RTAS_LOG_V6_HP_TYPE_CPU         1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY      2
+#define RTAS_LOG_V6_HP_TYPE_SLOT        3
+#define RTAS_LOG_V6_HP_TYPE_PHB         4
+#define RTAS_LOG_V6_HP_TYPE_PCI         5
+    uint8_t hotplug_type;               /* type of resource/device */
+#define RTAS_LOG_V6_HP_ACTION_ADD       1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE    2
+    uint8_t hotplug_action;             /* action (add/remove) */
+#define RTAS_LOG_V6_HP_ID_DRC_NAME      1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT     3
+    uint8_t hotplug_identifier;         /* type of the resource identifier,
+                                         * which serves as the discriminator
+                                         * for the 'drc' union field below
+                                         */
+    uint8_t reserved;
+    union {
+        uint32_t index;                 /* DRC index of resource to take action
+                                         * on
+                                         */
+        uint32_t count;                 /* number of DR resources to take
+                                         * action on (guest chooses which)
+                                         */
+        char name[1];                   /* string representing the name of the
+                                         * DRC to take action on
+                                         */
+    } drc;
+} QEMU_PACKED;
+
+[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
-- 
1.9.1

  reply	other threads:[~2014-12-23 12:31 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-23 12:30 [Qemu-devel] [PATCH v4 00/17] spapr: add support for pci hotplug Michael Roth
2014-12-23 12:30 ` Michael Roth [this message]
2015-01-16  5:28   ` [Qemu-devel] [PATCH v4 01/17] docs: add sPAPR hotplug/dynamic-reconfiguration documentation David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 02/17] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
2015-01-02 10:32   ` Bharata B Rao
2015-01-26  4:56     ` Michael Roth
2015-01-16  6:19   ` David Gibson
2015-01-26  4:01     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 03/17] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
2015-01-16  6:21   ` David Gibson
2015-01-26  5:21     ` Michael Roth
2015-01-27  5:24       ` David Gibson
2015-01-27 21:36         ` Michael Roth
2015-01-27 22:05           ` Tyrel Datwyler
2015-01-28  0:42             ` Michael Roth
2015-02-09 16:29               ` Nathan Fontenot
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 04/17] spapr_rtas: add set-indicator RTAS interface Michael Roth
2015-01-16  6:25   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 05/17] spapr_rtas: add get-sensor-state " Michael Roth
2015-01-16  6:28   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 06/17] spapr: add rtas_st_buffer_direct() helper Michael Roth
2015-01-19  3:25   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 07/17] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
2015-01-19  3:44   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 08/17] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
2015-01-19  4:31   ` David Gibson
2015-01-26 16:56     ` Michael Roth
2015-01-27  5:27       ` David Gibson
2015-01-28  3:56       ` Bharata B Rao
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 09/17] spapr_events: event-scan RTAS interface Michael Roth
2015-01-19  4:34   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 10/17] spapr_drc: add spapr_drc_populate_dt() Michael Roth
2015-01-19  5:15   ` David Gibson
2015-01-26 20:35     ` Michael Roth
2015-01-27  5:30       ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 11/17] spapr: introduce pseries-2.3 machine type Michael Roth
2015-01-19  5:16   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 12/17] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
2015-01-19  5:18   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 13/17] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
2015-01-19  5:20   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 14/17] spapr_pci: populate DRC dt entries for PHBs Michael Roth
2015-01-19  5:22   ` David Gibson
2015-01-26 20:44     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 15/17] pci: make pci_bar useable outside pci.c Michael Roth
2015-01-19  5:24   ` David Gibson
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 16/17] spapr_pci: enable basic hotplug operations Michael Roth
2015-01-19  5:58   ` David Gibson
2015-01-26 21:17     ` Michael Roth
2015-01-27  5:37       ` David Gibson
2015-01-23  5:17   ` Alexey Kardashevskiy
2015-01-26 21:20     ` Michael Roth
2014-12-23 12:30 ` [Qemu-devel] [PATCH v4 17/17] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
2015-01-19  6:00   ` David Gibson
2015-01-26 21:32     ` Michael Roth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1419337831-16552-2-git-send-email-mdroth@linux.vnet.ibm.com \
    --to=mdroth@linux.vnet.ibm.com \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=bharata.rao@gmail.com \
    --cc=ncmike@ncultra.org \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=tyreld@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.