linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/3] EEH Support for VFIO PCI Device
@ 2014-06-06  5:00 Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 1/3] powerpc/eeh: Avoid event on passed PE Gavin Shan
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Gavin Shan @ 2014-06-06  5:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu

The series of patches adds support EEH for PCI devices, which are passed
through to PowerKVM based guest via VFIO. The implementation is straightforward
based on the issues or problems we have to resolve to support EEH for PowerKVM
based guest.

- Emulation for EEH RTAS requests. All EEH RTAS requests goes to QEMU firstly.
  If QEMU can't handle it, the request will be sent to host via newly introduced
  VFIO container IOCTL command (VFIO_EEH_OP) and gets handled in host kernel.

The series of patches requires corresponding QEMU changes.

Change log
==========
v1 -> v2:
        * EEH RTAS requests are routed to QEMU, and then possiblly to host kerenl.
          The mechanism KVM in-kernel handling is dropped.
        * Error injection is reimplemented based syscall, instead of KVM in-kerenl
          handling. The logic for error injection token management is moved to
          QEMU. The error injection request is routed to QEMU and then possiblly
          to host kernel.
v2 -> v3:
        * Make the fields in struct eeh_vfio_pci_addr, struct vfio_eeh_info based
          on the comments from Alexey.
        * Define macros for EEH VFIO operations (Alexey).
        * Clear frozen state after successful PE reset.
        * Merge original [PATCH 1/2/3] to one.
v3 -> v4:
        * Remove the error injection from the patchset. Mike or I will work on that
          later.
        * Rename CONFIG_VFIO_EEH to VFIO_PCI_EEH.
        * Rename the IOCTL command to VFIO_EEH_OP and it's handled by VFIO-PCI device
          instead of VFIO container.
        * Rename the IOCTL argument structure to "vfio_eeh_op" accordingly. Also, more
          fields added to hold return values for RTAS requests.
        * The address mapping stuff is totally removed. When opening or releasing VFIO
          PCI device, notification sent to EEH to update the flags indicates the device
          is passed to guest or not.
        * Change pr_warn() to pr_debug() to avoid DOS as pointed by Alex.W
        * Argument size check issue pointed by Alex.W.
v4 -> v5:
        * Functions for VFIO PCI EEH support are moved to eeh.c and exported from there.
          VFIO PCI driver just uses those functions to tackle IOCTL command VFIO_EEH_OP.
          All of this is to make the code organized in a good way as suggested by Alex.G.
          Another potential benefit is PowerNV/pSeries are sharing "eeh_ops" and same
          infrastructure could possiblly work for KVM_PR and KVM_HV mode at the same time.
        * Don't clear error injection registers after finishing PE reset as the patchset
          is doing nothing related to error injection.
        * Amending Documentation/vfio.txt, which was missed in last revision.
        * No QEMU changes for this revision. "v4" works well. Also, remove "RFC" from the
          subject as the design is basically recognized.
v5 -> v6:
        * CONFIG_VFIO_PCI_EEH removed. Instead to use CONFIG_EEH.
        * Split one ioctl command to 5.
        * In eeh.c, description has been added for those exported functions. Also, the
          functions have negative return values for error and information with other values.
          All digital numbers have been replaced by macros defined in eeh.h. The comments,
          including the function names have been amended not to mention "guest" or "vfio".
        * Add one mutex to protect flag in eeh_dev_open()/release().
        * More information on how to use those ioctl commands to Documentation/vfio.txt.
v6 -> v7:
        * Remove ioctl command VFIO_EEH_PE_GET_ADDR, the PE address will be figured out
          in userland (e.g. QEMU) as Alex.G suggested.
        * Let sPAPR VFIO container process the ioctl commands as VFIO container is naturally
          corresponds to IOMMU group (aka PE on sPAPR platform).
        * All VFIO PCI EEH ioctl commands have "argsz+flags" for its companion data struct.
        * For VFIO PCI EEH ioctl commands, ioctl() returns negative number to indicate error
          or zero for success. Additinal output information is transported by the companion
          data struct.
        * Explaining PE in Documentation/vfio.txt, typo fixes, more comments suggested by
          Alex.G.
        * Split/merge patches according to suggestions from Alex.G and Alex.W.
        * To have EEH stub in drivers/vfio/pci/, which was suggested by Alex.W.
        * Define various EEH options as macros in vfio.h for userland to use.
v7 -> v8:
        * Change ioctl commands back to combined one.
        * EEH related logic was put into drivers/vfio/vfio_eeh.c, which is only built with
          CONFIG_EEH. Otherwise, inline functions defined in include/linux/vfio.h
        * Change vfio.txt according to the source code changes.
        * Fix various comments from internal reviews by Alexey. Thanks to Alexey.
v8 -> v9:
	* Remove unused macros in asm/include/eeh.h
	* Missed to disable VFIO device on error from vfio_spapr_pci_eeh_open().
	* Don't include unused header files in drivers/vfio/vfio_spapr_eeh.c
	* Define inline PE state for VFIO_EEH_PE_GET_STATE.

Gavin Shan (3):
  powerpc/eeh: Avoid event on passed PE
  powerpc/eeh: EEH support for VFIO PCI device
  drivers/vfio: EEH support for VFIO PCI device

 Documentation/vfio.txt                    |  87 +++++++++-
 arch/powerpc/include/asm/eeh.h            |  19 ++
 arch/powerpc/kernel/eeh.c                 | 276 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/eeh-ioda.c |   3 +-
 drivers/vfio/Makefile                     |   1 +
 drivers/vfio/pci/vfio_pci.c               |  18 +-
 drivers/vfio/vfio_iommu_spapr_tce.c       |  17 +-
 drivers/vfio/vfio_spapr_eeh.c             |  87 ++++++++++
 include/linux/vfio.h                      |  23 +++
 include/uapi/linux/vfio.h                 |  34 ++++
 10 files changed, 556 insertions(+), 9 deletions(-)
 create mode 100644 drivers/vfio/vfio_spapr_eeh.c

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v9 1/3] powerpc/eeh: Avoid event on passed PE
  2014-06-06  5:00 [PATCH v9 0/3] EEH Support for VFIO PCI Device Gavin Shan
@ 2014-06-06  5:00 ` Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 2/3] powerpc/eeh: EEH support for VFIO PCI device Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 3/3] drivers/vfio: " Gavin Shan
  2 siblings, 0 replies; 6+ messages in thread
From: Gavin Shan @ 2014-06-06  5:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu

We must not handle EEH error on devices which are passed to somebody
else. Instead, we expect that the frozen device owner detects an EEH
error and recovers from it.

This avoids EEH error handling on passed through devices so the device
owner gets a chance to handle them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/eeh.h            | 7 +++++++
 arch/powerpc/kernel/eeh.c                 | 8 ++++++++
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7782056..653d981 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -25,6 +25,7 @@
 #include <linux/list.h>
 #include <linux/string.h>
 #include <linux/time.h>
+#include <linux/atomic.h>
 
 struct pci_dev;
 struct pci_bus;
@@ -84,6 +85,7 @@ struct eeh_pe {
 	int freeze_count;		/* Times of froze up		*/
 	struct timeval tstamp;		/* Time on first-time freeze	*/
 	int false_positives;		/* Times of reported #ff's	*/
+	atomic_t pass_dev_cnt;		/* Count of passed through devs	*/
 	struct eeh_pe *parent;		/* Parent PE			*/
 	struct list_head child_list;	/* Link PE to the child list	*/
 	struct list_head edevs;		/* Link list of EEH devices	*/
@@ -93,6 +95,11 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+static inline bool eeh_pe_passed(struct eeh_pe *pe)
+{
+	return pe ? !!atomic_read(&pe->pass_dev_cnt) : false;
+}
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9c6b899..3bc8b12 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	if (ret > 0)
 		return ret;
 
+	/*
+	 * If the PE isn't owned by us, we shouldn't check the
+	 * state. Instead, let the owner handle it if the PE has
+	 * been frozen.
+	 */
+	if (eeh_pe_passed(pe))
+		return 0;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index cab3e62..79193eb 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -892,7 +892,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no,
 					OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
 				ret = EEH_NEXT_ERR_NONE;
-			} else if ((*pe)->state & EEH_PE_ISOLATED) {
+			} else if ((*pe)->state & EEH_PE_ISOLATED ||
+				   eeh_pe_passed(*pe)) {
 				ret = EEH_NEXT_ERR_NONE;
 			} else {
 				pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n",
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v9 2/3] powerpc/eeh: EEH support for VFIO PCI device
  2014-06-06  5:00 [PATCH v9 0/3] EEH Support for VFIO PCI Device Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 1/3] powerpc/eeh: Avoid event on passed PE Gavin Shan
@ 2014-06-06  5:00 ` Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 3/3] drivers/vfio: " Gavin Shan
  2 siblings, 0 replies; 6+ messages in thread
From: Gavin Shan @ 2014-06-06  5:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu

The patch exports functions to be used by new VFIO ioctl command,
which will be introduced in subsequent patch, to support EEH
functinality for VFIO PCI devices.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/eeh.h |  12 ++
 arch/powerpc/kernel/eeh.c      | 268 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 280 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 653d981..b733044 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -173,6 +173,11 @@ enum {
 #define EEH_STATE_DMA_ACTIVE	(1 << 4)	/* Active DMA		*/
 #define EEH_STATE_MMIO_ENABLED	(1 << 5)	/* MMIO enabled		*/
 #define EEH_STATE_DMA_ENABLED	(1 << 6)	/* DMA enabled		*/
+#define EEH_PE_STATE_NORMAL		0	/* Normal state		*/
+#define EEH_PE_STATE_RESET		1	/* PE reset asserted	*/
+#define EEH_PE_STATE_STOPPED_IO_DMA	2	/* Frozen PE		*/
+#define EEH_PE_STATE_STOPPED_DMA	4	/* Stopped DMA, Enabled IO */
+#define EEH_PE_STATE_UNAVAIL		5	/* Unavailable		*/
 #define EEH_RESET_DEACTIVATE	0	/* Deactivate the PE reset	*/
 #define EEH_RESET_HOT		1	/* Hot reset			*/
 #define EEH_RESET_FUNDAMENTAL	3	/* Fundamental reset		*/
@@ -280,6 +285,13 @@ void eeh_add_device_late(struct pci_dev *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
 void eeh_remove_device(struct pci_dev *);
+int eeh_dev_open(struct pci_dev *pdev);
+void eeh_dev_release(struct pci_dev *pdev);
+struct eeh_pe *eeh_iommu_group_to_pe(struct iommu_group *group);
+int eeh_pe_set_option(struct eeh_pe *pe, int option);
+int eeh_pe_get_state(struct eeh_pe *pe);
+int eeh_pe_reset(struct eeh_pe *pe, int option);
+int eeh_pe_configure(struct eeh_pe *pe);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3bc8b12..fc90df0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -40,6 +40,7 @@
 #include <asm/eeh.h>
 #include <asm/eeh_event.h>
 #include <asm/io.h>
+#include <asm/iommu.h>
 #include <asm/machdep.h>
 #include <asm/ppc-pci.h>
 #include <asm/rtas.h>
@@ -108,6 +109,9 @@ struct eeh_ops *eeh_ops = NULL;
 /* Lock to avoid races due to multiple reports of an error */
 DEFINE_RAW_SPINLOCK(confirm_error_lock);
 
+/* Lock to protect passed flags */
+static DEFINE_MUTEX(eeh_dev_mutex);
+
 /* Buffer for reporting pci register dumps. Its here in BSS, and
  * not dynamically alloced, so that it ends up in RMO where RTAS
  * can access it.
@@ -1106,6 +1110,270 @@ void eeh_remove_device(struct pci_dev *dev)
 	edev->mode &= ~EEH_DEV_SYSFS;
 }
 
+/**
+ * eeh_dev_open - Increase count of pass through devices for PE
+ * @pdev: PCI device
+ *
+ * Increase count of passed through devices for the indicated
+ * PE. In the result, the EEH errors detected on the PE won't be
+ * reported. The PE owner will be responsible for detection
+ * and recovery.
+ */
+int eeh_dev_open(struct pci_dev *pdev)
+{
+	struct eeh_dev *edev;
+
+	mutex_lock(&eeh_dev_mutex);
+
+	/* No PCI device ? */
+	if (!pdev)
+		goto out;
+
+	/* No EEH device or PE ? */
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!edev || !edev->pe)
+		goto out;
+
+	/* Increase PE's pass through count */
+	atomic_inc(&edev->pe->pass_dev_cnt);
+	mutex_unlock(&eeh_dev_mutex);
+
+	return 0;
+out:
+	mutex_unlock(&eeh_dev_mutex);
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(eeh_dev_open);
+
+/**
+ * eeh_dev_release - Decrease count of pass through devices for PE
+ * @pdev: PCI device
+ *
+ * Decrease count of pass through devices for the indicated PE. If
+ * there is no passed through device in PE, the EEH errors detected
+ * on the PE will be reported and handled as usual.
+ */
+void eeh_dev_release(struct pci_dev *pdev)
+{
+	struct eeh_dev *edev;
+
+	mutex_lock(&eeh_dev_mutex);
+
+	/* No PCI device ? */
+	if (!pdev)
+		goto out;
+
+	/* No EEH device ? */
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!edev || !edev->pe || !eeh_pe_passed(edev->pe))
+		goto out;
+
+	/* Decrease PE's pass through count */
+	atomic_dec(&edev->pe->pass_dev_cnt);
+	WARN_ON(atomic_read(&edev->pe->pass_dev_cnt) < 0);
+out:
+	mutex_unlock(&eeh_dev_mutex);
+}
+EXPORT_SYMBOL(eeh_dev_release);
+
+/**
+ * eeh_iommu_group_to_pe - Convert IOMMU group to EEH PE
+ * @group: IOMMU group
+ *
+ * The routine is called to convert IOMMU group to EEH PE.
+ */
+struct eeh_pe *eeh_iommu_group_to_pe(struct iommu_group *group)
+{
+	struct iommu_table *tbl;
+	struct pci_dev *pdev = NULL;
+	struct eeh_dev *edev;
+	bool found = false;
+
+	/* No IOMMU group ? */
+	if (!group)
+		return NULL;
+
+	/* No PCI device ? */
+	for_each_pci_dev(pdev) {
+		tbl = get_iommu_table_base(&pdev->dev);
+		if (tbl && tbl->it_group == group) {
+			found = true;
+			break;
+		}
+	}
+	if (!found)
+		return NULL;
+
+	/* No EEH device or PE ? */
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!edev || !edev->pe)
+		return NULL;
+
+	return edev->pe;
+}
+
+/**
+ * eeh_pe_set_option - Set options for the indicated PE
+ * @pe: EEH PE
+ * @option: requested option
+ *
+ * The routine is called to enable or disable EEH functionality
+ * on the indicated PE, to enable IO or DMA for the frozen PE.
+ */
+int eeh_pe_set_option(struct eeh_pe *pe, int option)
+{
+	int ret = 0;
+
+	/* Invalid PE ? */
+	if (!pe)
+		return -ENODEV;
+
+	/*
+	 * EEH functionality could possibly be disabled, just
+	 * return error for the case. And the EEH functinality
+	 * isn't expected to be disabled on one specific PE.
+	 */
+	switch (option) {
+	case EEH_OPT_ENABLE:
+		if (eeh_enabled())
+			break;
+		ret = -EIO;
+		break;
+	case EEH_OPT_DISABLE:
+		break;
+	case EEH_OPT_THAW_MMIO:
+	case EEH_OPT_THAW_DMA:
+		if (!eeh_ops || !eeh_ops->set_option) {
+			ret = -ENOENT;
+			break;
+		}
+
+		ret = eeh_ops->set_option(pe, option);
+		break;
+	default:
+		pr_debug("%s: Option %d out of range (%d, %d)\n",
+			__func__, option, EEH_OPT_DISABLE, EEH_OPT_THAW_DMA);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_pe_set_option);
+
+/**
+ * eeh_pe_get_state - Retrieve PE's state
+ * @pe: EEH PE
+ *
+ * Retrieve the PE's state, which includes 3 aspects: enabled
+ * DMA, enabled IO and asserted reset.
+ */
+int eeh_pe_get_state(struct eeh_pe *pe)
+{
+	int result, ret = 0;
+	bool rst_active, dma_en, mmio_en;
+
+	/* Existing PE ? */
+	if (!pe)
+		return -ENODEV;
+
+	if (!eeh_ops || !eeh_ops->get_state)
+		return -ENOENT;
+
+	result = eeh_ops->get_state(pe, NULL);
+	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
+	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
+	mmio_en = !!(result & EEH_STATE_MMIO_ENABLED);
+
+	if (rst_active)
+		ret = EEH_PE_STATE_RESET;
+	else if (dma_en && mmio_en)
+		ret = EEH_PE_STATE_NORMAL;
+	else if (!dma_en && !mmio_en)
+		ret = EEH_PE_STATE_STOPPED_IO_DMA;
+	else if (!dma_en && mmio_en)
+		ret = EEH_PE_STATE_STOPPED_DMA;
+	else
+		ret = EEH_PE_STATE_UNAVAIL;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_pe_get_state);
+
+/**
+ * eeh_pe_reset - Issue PE reset according to specified type
+ * @pe: EEH PE
+ * @option: reset type
+ *
+ * The routine is called to reset the specified PE with the
+ * indicated type, either fundamental reset or hot reset.
+ * PE reset is the most important part for error recovery.
+ */
+int eeh_pe_reset(struct eeh_pe *pe, int option)
+{
+	int ret = 0;
+
+	/* Invalid PE ? */
+	if (!pe)
+		return -ENODEV;
+
+	if (!eeh_ops || !eeh_ops->set_option || !eeh_ops->reset)
+		return -ENOENT;
+
+	switch (option) {
+	case EEH_RESET_DEACTIVATE:
+		ret = eeh_ops->reset(pe, option);
+		if (ret)
+			break;
+
+		/*
+		 * The PE is still in frozen state and we need to clear
+		 * that. It's good to clear frozen state after deassert
+		 * to avoid messy IO access during reset, which might
+		 * cause recursive frozen PE.
+		 */
+		ret = eeh_ops->set_option(pe, EEH_OPT_THAW_MMIO);
+		if (!ret)
+			ret = eeh_ops->set_option(pe, EEH_OPT_THAW_DMA);
+		if (!ret)
+			eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+		break;
+	case EEH_RESET_HOT:
+	case EEH_RESET_FUNDAMENTAL:
+		ret = eeh_ops->reset(pe, option);
+		break;
+	default:
+		pr_debug("%s: Unsupported option %d\n",
+			__func__, option);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_pe_reset);
+
+/**
+ * eeh_pe_configure - Configure PCI bridges after PE reset
+ * @pe: EEH PE
+ *
+ * The routine is called to restore the PCI config space for
+ * those PCI devices, especially PCI bridges affected by PE
+ * reset issued previously.
+ */
+int eeh_pe_configure(struct eeh_pe *pe)
+{
+	int ret = 0;
+
+	/* Invalid PE ? */
+	if (!pe)
+		return -ENODEV;
+
+	/* Restore config space for the affected devices */
+	eeh_pe_restore_bars(pe);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_pe_configure);
+
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
 	if (!eeh_enabled()) {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v9 3/3] drivers/vfio: EEH support for VFIO PCI device
  2014-06-06  5:00 [PATCH v9 0/3] EEH Support for VFIO PCI Device Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 1/3] powerpc/eeh: Avoid event on passed PE Gavin Shan
  2014-06-06  5:00 ` [PATCH v9 2/3] powerpc/eeh: EEH support for VFIO PCI device Gavin Shan
@ 2014-06-06  5:00 ` Gavin Shan
  2014-06-06 18:20   ` Alex Williamson
  2 siblings, 1 reply; 6+ messages in thread
From: Gavin Shan @ 2014-06-06  5:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aik, agraf, Gavin Shan, alex.williamson, qiudayu

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
---
 Documentation/vfio.txt              | 87 +++++++++++++++++++++++++++++++++++--
 drivers/vfio/Makefile               |  1 +
 drivers/vfio/pci/vfio_pci.c         | 18 ++++++--
 drivers/vfio/vfio_iommu_spapr_tce.c | 17 +++++++-
 drivers/vfio/vfio_spapr_eeh.c       | 87 +++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h                | 23 ++++++++++
 include/uapi/linux/vfio.h           | 34 +++++++++++++++
 7 files changed, 259 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/vfio_spapr_eeh.c

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index b9ca023..3fa4538 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
 an excellent performance which has limitations such as inability to do
 locked pages accounting in real time.
 
-So 3 additional ioctls have been added:
+4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
+subtree that can be treated as a unit for the purposes of partitioning and
+error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
+function of a multi-function IOA, or multiple IOAs (possibly including switch
+and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
+and recover from them via EEH RTAS services, which works on the basis of
+additional ioctl commands.
+
+So 4 additional ioctls have been added:
 
 	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
 		of the DMA window on the PCI bus.
@@ -316,9 +324,12 @@ So 3 additional ioctls have been added:
 
 	VFIO_IOMMU_DISABLE - disables the container.
 
+	VFIO_EEH_PE_OP - provides an API for EEH setup, error detection and recovery.
 
 The code flow from the example above should be slightly changed:
 
+	struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op) };
+
 	.....
 	/* Add the group to the container */
 	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
@@ -342,9 +353,79 @@ The code flow from the example above should be slightly changed:
 	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 	/* Check here is .iova/.size are within DMA window from spapr_iommu_info */
-
 	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
-	.....
+
+	/* Get a file descriptor for the device */
+	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
+
+	....
+
+	/* Gratuitous device reset and go... */
+	ioctl(device, VFIO_DEVICE_RESET);
+
+	/* Make sure EEH is supported */
+	ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
+
+	/* Enable the EEH functionality on the device */
+	pe_op.op = VFIO_EEH_PE_ENABLE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* You're suggested to create additional data struct to represent
+	 * PE, and put child devices belonging to same IOMMU group to the
+	 * PE instance for later reference.
+	 */
+
+	/* Check the PE's state and make sure it's in functional state */
+	pe_op.op = VFIO_EEH_PE_GET_STATE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Save device state using pci_save_state().
+	 * EEH should be enabled on the specified device.
+	 */
+
+	....
+
+	/* When 0xFF's returned from reading PCI config space or IO BARs
+	 * of the PCI device. Check the PE's state to see if that has been
+	 * frozen.
+	 */
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Waiting for pending PCI transactions to be completed and don't
+	 * produce any more PCI traffic from/to the affected PE until
+	 * recovery is finished.
+	 */
+
+	/* Enable IO for the affected PE and collect logs. Usually, the
+	 * standard part of PCI config space, AER registers are dumped
+	 * as logs for further analysis.
+	 */
+	pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/*
+	 * Issue PE reset: hot or fundamental reset. Usually, hot reset
+	 * is enough. However, the firmware of some PCI adapters would
+	 * require fundamental reset.
+	 */
+	pe_op.op = VFIO_EEH_PE_RESET_HOT;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+	pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Configure the PCI bridges for the affected PE */
+	pe_op.op = VFIO_EEH_PE_CONFIGURE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Restored state we saved at initialization time. pci_restore_state()
+	 * is good enough as an example.
+	 */
+
+	/* Hopefully, error is recovered successfully. Now, you can resume to
+	 * start PCI traffic to/from the affected PE.
+	 */
+
+	....
 
 -------------------------------------------------------------------------------
 
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 72bfabc..50e30bc 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
+obj-$(CONFIG_EEH) += vfio_spapr_eeh.o
 obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7ba0424..0122665 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -156,8 +156,10 @@ static void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
 
-	if (atomic_dec_and_test(&vdev->refcnt))
+	if (atomic_dec_and_test(&vdev->refcnt)) {
+		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);
+	}
 
 	module_put(THIS_MODULE);
 }
@@ -165,19 +167,27 @@ static void vfio_pci_release(void *device_data)
 static int vfio_pci_open(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
+	int ret;
 
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
 	if (atomic_inc_return(&vdev->refcnt) == 1) {
-		int ret = vfio_pci_enable(vdev);
+		ret = vfio_pci_enable(vdev);
+		if (ret)
+			goto error;
+
+		ret = vfio_spapr_pci_eeh_open(vdev->pdev);
 		if (ret) {
-			module_put(THIS_MODULE);
-			return ret;
+			vfio_pci_disable(vdev);
+			goto error;
 		}
 	}
 
 	return 0;
+error:
+	module_put(THIS_MODULE);
+	return ret;
 }
 
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a84788b..730b4ef 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -156,7 +156,16 @@ static long tce_iommu_ioctl(void *iommu_data,
 
 	switch (cmd) {
 	case VFIO_CHECK_EXTENSION:
-		return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
+		switch (arg) {
+		case VFIO_SPAPR_TCE_IOMMU:
+			ret = 1;
+			break;
+		default:
+			ret = vfio_spapr_iommu_eeh_ioctl(NULL, cmd, arg);
+			break;
+		}
+
+		return (ret < 0) ? 0 : ret;
 
 	case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
 		struct vfio_iommu_spapr_tce_info info;
@@ -283,6 +292,12 @@ static long tce_iommu_ioctl(void *iommu_data,
 		tce_iommu_disable(container);
 		mutex_unlock(&container->lock);
 		return 0;
+	case VFIO_EEH_PE_OP:
+		if (!container->tbl || !container->tbl->it_group)
+			return -ENODEV;
+
+		return vfio_spapr_iommu_eeh_ioctl(container->tbl->it_group,
+						  cmd, arg);
 	}
 
 	return -ENOTTY;
diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
new file mode 100644
index 0000000..d438394
--- /dev/null
+++ b/drivers/vfio/vfio_spapr_eeh.c
@@ -0,0 +1,87 @@
+/*
+ * EEH functionality support for VFIO devices. The feature is only
+ * available on sPAPR compatible platforms.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2014.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <asm/eeh.h>
+
+/* We might build address mapping here for "fast" path later */
+int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
+{
+	return eeh_dev_open(pdev);
+}
+
+void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
+{
+	eeh_dev_release(pdev);
+}
+
+long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
+				unsigned int cmd, unsigned long arg)
+{
+	struct eeh_pe *pe;
+	struct vfio_eeh_pe_op op;
+	unsigned long minsz;
+	long ret = -EINVAL;
+
+	switch (cmd) {
+	case VFIO_CHECK_EXTENSION:
+		if (arg == VFIO_EEH)
+			ret = eeh_enabled() ? 1 : 0;
+		else
+			ret = 0;
+		break;
+	case VFIO_EEH_PE_OP:
+		pe = eeh_iommu_group_to_pe(group);
+		if (!pe)
+			return -ENODEV;
+
+		minsz = offsetofend(struct vfio_eeh_pe_op, op);
+		if (copy_from_user(&op, (void __user *)arg, minsz))
+			return -EFAULT;
+		if (op.argsz < minsz)
+			return -EINVAL;
+
+		switch (op.op) {
+		case VFIO_EEH_PE_DISABLE:
+			ret = eeh_pe_set_option(pe, EEH_OPT_DISABLE);
+			break;
+		case VFIO_EEH_PE_ENABLE:
+			ret = eeh_pe_set_option(pe, EEH_OPT_ENABLE);
+			break;
+		case VFIO_EEH_PE_UNFREEZE_IO:
+			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_MMIO);
+			break;
+		case VFIO_EEH_PE_UNFREEZE_DMA:
+			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_DMA);
+			break;
+		case VFIO_EEH_PE_GET_STATE:
+			ret = eeh_pe_get_state(pe);
+			break;
+		case VFIO_EEH_PE_RESET_DEACTIVATE:
+			ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE);
+			break;
+		case VFIO_EEH_PE_RESET_HOT:
+			ret = eeh_pe_reset(pe, EEH_RESET_HOT);
+			break;
+		case VFIO_EEH_PE_RESET_FUNDAMENTAL:
+			ret = eeh_pe_reset(pe, EEH_RESET_FUNDAMENTAL);
+			break;
+		case VFIO_EEH_PE_CONFIGURE:
+			ret = eeh_pe_configure(pe);
+			break;
+		default:
+			ret = -EINVAL;
+		}
+	}
+
+	return ret;
+}
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 81022a52..0d3bb8f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,4 +99,27 @@ extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
 
+#ifdef CONFIG_EEH
+extern int vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
+extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
+extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
+				       unsigned int cmd,
+				       unsigned long arg);
+#else
+static inline int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
+{
+	return 0;
+}
+
+static inline void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
+{
+}
+
+static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
+					      unsigned int cmd,
+					      unsigned long arg)
+{
+	return -ENOTTY;
+}
+#endif /* CONFIG_EEH */
 #endif /* VFIO_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..6612974 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -30,6 +30,9 @@
  */
 #define VFIO_DMA_CC_IOMMU		4
 
+/* Check if EEH is supported */
+#define VFIO_EEH			5
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
@@ -455,6 +458,37 @@ struct vfio_iommu_spapr_tce_info {
 
 #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
 
+/*
+ * EEH PE operation struct provides ways to:
+ * - enable/disable EEH functionality;
+ * - unfreeze IO/DMA for frozen PE;
+ * - read PE state;
+ * - reset PE;
+ * - configure PE.
+ */
+struct vfio_eeh_pe_op {
+	__u32 argsz;
+	__u32 flags;
+	__u32 op;
+};
+
+#define VFIO_EEH_PE_DISABLE		0	/* Disable EEH functionality */
+#define VFIO_EEH_PE_ENABLE		1	/* Enable EEH functionality  */
+#define VFIO_EEH_PE_UNFREEZE_IO		2	/* Enable IO for frozen PE   */
+#define VFIO_EEH_PE_UNFREEZE_DMA	3	/* Enable DMA for frozen PE  */
+#define VFIO_EEH_PE_GET_STATE		4	/* PE state retrieval        */
+#define  VFIO_EEH_PE_STATE_NORMAL	0	/* PE in functional state    */
+#define  VFIO_EEH_PE_STATE_RESET	1	/* PE reset in progress      */
+#define  VFIO_EEH_PE_STATE_STOPPED	2	/* Stopped DMA and IO        */
+#define  VFIO_EEH_PE_STATE_STOPPED_DMA	4	/* Stopped DMA only          */
+#define  VFIO_EEH_PE_STATE_UNAVAIL	5	/* State unavailable         */
+#define VFIO_EEH_PE_RESET_DEACTIVATE	5	/* Deassert PE reset         */
+#define VFIO_EEH_PE_RESET_HOT		6	/* Assert hot reset          */
+#define VFIO_EEH_PE_RESET_FUNDAMENTAL	7	/* Assert fundamental reset  */
+#define VFIO_EEH_PE_CONFIGURE		8	/* PE configuration          */
+
+#define VFIO_EEH_PE_OP			_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 3/3] drivers/vfio: EEH support for VFIO PCI device
  2014-06-06  5:00 ` [PATCH v9 3/3] drivers/vfio: " Gavin Shan
@ 2014-06-06 18:20   ` Alex Williamson
  2014-06-09 23:49     ` Gavin Shan
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2014-06-06 18:20 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, agraf, kvm-ppc, qiudayu, linuxppc-dev

On Fri, 2014-06-06 at 15:00 +1000, Gavin Shan wrote:
> The patch adds new IOCTL commands for sPAPR VFIO container device
> to support EEH functionality for PCI devices, which have been passed
> through from host to somebody else via VFIO.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Acked-by: Alexander Graf <agraf@suse.de>
> ---
>  Documentation/vfio.txt              | 87 +++++++++++++++++++++++++++++++++++--
>  drivers/vfio/Makefile               |  1 +
>  drivers/vfio/pci/vfio_pci.c         | 18 ++++++--
>  drivers/vfio/vfio_iommu_spapr_tce.c | 17 +++++++-
>  drivers/vfio/vfio_spapr_eeh.c       | 87 +++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h                | 23 ++++++++++
>  include/uapi/linux/vfio.h           | 34 +++++++++++++++
>  7 files changed, 259 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/vfio/vfio_spapr_eeh.c
> 
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index b9ca023..3fa4538 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
>  an excellent performance which has limitations such as inability to do
>  locked pages accounting in real time.
>  
> -So 3 additional ioctls have been added:
> +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
> +subtree that can be treated as a unit for the purposes of partitioning and
> +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
> +function of a multi-function IOA, or multiple IOAs (possibly including switch
> +and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
> +and recover from them via EEH RTAS services, which works on the basis of
> +additional ioctl commands.
> +
> +So 4 additional ioctls have been added:
>  
>  	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
>  		of the DMA window on the PCI bus.
> @@ -316,9 +324,12 @@ So 3 additional ioctls have been added:
>  
>  	VFIO_IOMMU_DISABLE - disables the container.
>  
> +	VFIO_EEH_PE_OP - provides an API for EEH setup, error detection and recovery.
>  
>  The code flow from the example above should be slightly changed:
>  
> +	struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op) };
> +
>  	.....
>  	/* Add the group to the container */
>  	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
> @@ -342,9 +353,79 @@ The code flow from the example above should be slightly changed:
>  	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>  
>  	/* Check here is .iova/.size are within DMA window from spapr_iommu_info */
> -
>  	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
> -	.....
> +
> +	/* Get a file descriptor for the device */
> +	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
> +
> +	....
> +
> +	/* Gratuitous device reset and go... */
> +	ioctl(device, VFIO_DEVICE_RESET);
> +
> +	/* Make sure EEH is supported */
> +	ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
> +
> +	/* Enable the EEH functionality on the device */
> +	pe_op.op = VFIO_EEH_PE_ENABLE;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/* You're suggested to create additional data struct to represent
> +	 * PE, and put child devices belonging to same IOMMU group to the
> +	 * PE instance for later reference.
> +	 */
> +
> +	/* Check the PE's state and make sure it's in functional state */
> +	pe_op.op = VFIO_EEH_PE_GET_STATE;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/* Save device state using pci_save_state().
> +	 * EEH should be enabled on the specified device.
> +	 */
> +
> +	....
> +
> +	/* When 0xFF's returned from reading PCI config space or IO BARs
> +	 * of the PCI device. Check the PE's state to see if that has been
> +	 * frozen.
> +	 */
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/* Waiting for pending PCI transactions to be completed and don't
> +	 * produce any more PCI traffic from/to the affected PE until
> +	 * recovery is finished.
> +	 */
> +
> +	/* Enable IO for the affected PE and collect logs. Usually, the
> +	 * standard part of PCI config space, AER registers are dumped
> +	 * as logs for further analysis.
> +	 */
> +	pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/*
> +	 * Issue PE reset: hot or fundamental reset. Usually, hot reset
> +	 * is enough. However, the firmware of some PCI adapters would
> +	 * require fundamental reset.
> +	 */
> +	pe_op.op = VFIO_EEH_PE_RESET_HOT;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +	pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/* Configure the PCI bridges for the affected PE */
> +	pe_op.op = VFIO_EEH_PE_CONFIGURE;
> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
> +
> +	/* Restored state we saved at initialization time. pci_restore_state()
> +	 * is good enough as an example.
> +	 */
> +
> +	/* Hopefully, error is recovered successfully. Now, you can resume to
> +	 * start PCI traffic to/from the affected PE.
> +	 */
> +
> +	....
>  
>  -------------------------------------------------------------------------------
>  
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 72bfabc..50e30bc 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,4 +1,5 @@
>  obj-$(CONFIG_VFIO) += vfio.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> +obj-$(CONFIG_EEH) += vfio_spapr_eeh.o
>  obj-$(CONFIG_VFIO_PCI) += pci/
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 7ba0424..0122665 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -156,8 +156,10 @@ static void vfio_pci_release(void *device_data)
>  {
>  	struct vfio_pci_device *vdev = device_data;
>  
> -	if (atomic_dec_and_test(&vdev->refcnt))
> +	if (atomic_dec_and_test(&vdev->refcnt)) {
> +		vfio_spapr_pci_eeh_release(vdev->pdev);
>  		vfio_pci_disable(vdev);
> +	}
>  
>  	module_put(THIS_MODULE);
>  }
> @@ -165,19 +167,27 @@ static void vfio_pci_release(void *device_data)
>  static int vfio_pci_open(void *device_data)
>  {
>  	struct vfio_pci_device *vdev = device_data;
> +	int ret;
>  
>  	if (!try_module_get(THIS_MODULE))
>  		return -ENODEV;
>  
>  	if (atomic_inc_return(&vdev->refcnt) == 1) {
> -		int ret = vfio_pci_enable(vdev);
> +		ret = vfio_pci_enable(vdev);
> +		if (ret)
> +			goto error;
> +
> +		ret = vfio_spapr_pci_eeh_open(vdev->pdev);
>  		if (ret) {
> -			module_put(THIS_MODULE);
> -			return ret;
> +			vfio_pci_disable(vdev);
> +			goto error;
>  		}
>  	}
>  
>  	return 0;
> +error:
> +	module_put(THIS_MODULE);
> +	return ret;
>  }
>  
>  static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index a84788b..730b4ef 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -156,7 +156,16 @@ static long tce_iommu_ioctl(void *iommu_data,
>  
>  	switch (cmd) {
>  	case VFIO_CHECK_EXTENSION:
> -		return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
> +		switch (arg) {
> +		case VFIO_SPAPR_TCE_IOMMU:
> +			ret = 1;
> +			break;
> +		default:
> +			ret = vfio_spapr_iommu_eeh_ioctl(NULL, cmd, arg);
> +			break;
> +		}
> +
> +		return (ret < 0) ? 0 : ret;
>  
>  	case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
>  		struct vfio_iommu_spapr_tce_info info;
> @@ -283,6 +292,12 @@ static long tce_iommu_ioctl(void *iommu_data,
>  		tce_iommu_disable(container);
>  		mutex_unlock(&container->lock);
>  		return 0;
> +	case VFIO_EEH_PE_OP:
> +		if (!container->tbl || !container->tbl->it_group)
> +			return -ENODEV;
> +
> +		return vfio_spapr_iommu_eeh_ioctl(container->tbl->it_group,
> +						  cmd, arg);
>  	}
>  
>  	return -ENOTTY;
> diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
> new file mode 100644
> index 0000000..d438394
> --- /dev/null
> +++ b/drivers/vfio/vfio_spapr_eeh.c
> @@ -0,0 +1,87 @@
> +/*
> + * EEH functionality support for VFIO devices. The feature is only
> + * available on sPAPR compatible platforms.
> + *
> + * Copyright Gavin Shan, IBM Corporation 2014.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <asm/eeh.h>
> +
> +/* We might build address mapping here for "fast" path later */
> +int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
> +{
> +	return eeh_dev_open(pdev);
> +}
> +
> +void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
> +{
> +	eeh_dev_release(pdev);
> +}
> +
> +long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> +				unsigned int cmd, unsigned long arg)
> +{
> +	struct eeh_pe *pe;
> +	struct vfio_eeh_pe_op op;
> +	unsigned long minsz;
> +	long ret = -EINVAL;
> +
> +	switch (cmd) {
> +	case VFIO_CHECK_EXTENSION:
> +		if (arg == VFIO_EEH)
> +			ret = eeh_enabled() ? 1 : 0;
> +		else
> +			ret = 0;
> +		break;
> +	case VFIO_EEH_PE_OP:
> +		pe = eeh_iommu_group_to_pe(group);
> +		if (!pe)
> +			return -ENODEV;
> +
> +		minsz = offsetofend(struct vfio_eeh_pe_op, op);
> +		if (copy_from_user(&op, (void __user *)arg, minsz))
> +			return -EFAULT;
> +		if (op.argsz < minsz)
> +			return -EINVAL;

I'm running low on comments, but I found one more.  flags should be
tested as zero here or else we'll run into problems with adding new
functionality later.  The caller could leave it uninitialized and pass
junk which would then break if we start using those flags.  New
userspace on an old kernel could also intentionally pass flag bits that
we ignore and let through here.


How are you expecting this to go in, an ack from me then pushed through
AlexG's tree?  Thanks,

Alex

> +
> +		switch (op.op) {
> +		case VFIO_EEH_PE_DISABLE:
> +			ret = eeh_pe_set_option(pe, EEH_OPT_DISABLE);
> +			break;
> +		case VFIO_EEH_PE_ENABLE:
> +			ret = eeh_pe_set_option(pe, EEH_OPT_ENABLE);
> +			break;
> +		case VFIO_EEH_PE_UNFREEZE_IO:
> +			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_MMIO);
> +			break;
> +		case VFIO_EEH_PE_UNFREEZE_DMA:
> +			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_DMA);
> +			break;
> +		case VFIO_EEH_PE_GET_STATE:
> +			ret = eeh_pe_get_state(pe);
> +			break;
> +		case VFIO_EEH_PE_RESET_DEACTIVATE:
> +			ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE);
> +			break;
> +		case VFIO_EEH_PE_RESET_HOT:
> +			ret = eeh_pe_reset(pe, EEH_RESET_HOT);
> +			break;
> +		case VFIO_EEH_PE_RESET_FUNDAMENTAL:
> +			ret = eeh_pe_reset(pe, EEH_RESET_FUNDAMENTAL);
> +			break;
> +		case VFIO_EEH_PE_CONFIGURE:
> +			ret = eeh_pe_configure(pe);
> +			break;
> +		default:
> +			ret = -EINVAL;
> +		}
> +	}
> +
> +	return ret;
> +}
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 81022a52..0d3bb8f 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -99,4 +99,27 @@ extern int vfio_external_user_iommu_id(struct vfio_group *group);
>  extern long vfio_external_check_extension(struct vfio_group *group,
>  					  unsigned long arg);
>  
> +#ifdef CONFIG_EEH
> +extern int vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
> +extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
> +extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> +				       unsigned int cmd,
> +				       unsigned long arg);
> +#else
> +static inline int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
> +{
> +}
> +
> +static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> +					      unsigned int cmd,
> +					      unsigned long arg)
> +{
> +	return -ENOTTY;
> +}
> +#endif /* CONFIG_EEH */
>  #endif /* VFIO_H */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index cb9023d..6612974 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -30,6 +30,9 @@
>   */
>  #define VFIO_DMA_CC_IOMMU		4
>  
> +/* Check if EEH is supported */
> +#define VFIO_EEH			5
> +
>  /*
>   * The IOCTL interface is designed for extensibility by embedding the
>   * structure length (argsz) and flags into structures passed between
> @@ -455,6 +458,37 @@ struct vfio_iommu_spapr_tce_info {
>  
>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
>  
> +/*
> + * EEH PE operation struct provides ways to:
> + * - enable/disable EEH functionality;
> + * - unfreeze IO/DMA for frozen PE;
> + * - read PE state;
> + * - reset PE;
> + * - configure PE.
> + */
> +struct vfio_eeh_pe_op {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u32 op;
> +};
> +
> +#define VFIO_EEH_PE_DISABLE		0	/* Disable EEH functionality */
> +#define VFIO_EEH_PE_ENABLE		1	/* Enable EEH functionality  */
> +#define VFIO_EEH_PE_UNFREEZE_IO		2	/* Enable IO for frozen PE   */
> +#define VFIO_EEH_PE_UNFREEZE_DMA	3	/* Enable DMA for frozen PE  */
> +#define VFIO_EEH_PE_GET_STATE		4	/* PE state retrieval        */
> +#define  VFIO_EEH_PE_STATE_NORMAL	0	/* PE in functional state    */
> +#define  VFIO_EEH_PE_STATE_RESET	1	/* PE reset in progress      */
> +#define  VFIO_EEH_PE_STATE_STOPPED	2	/* Stopped DMA and IO        */
> +#define  VFIO_EEH_PE_STATE_STOPPED_DMA	4	/* Stopped DMA only          */
> +#define  VFIO_EEH_PE_STATE_UNAVAIL	5	/* State unavailable         */
> +#define VFIO_EEH_PE_RESET_DEACTIVATE	5	/* Deassert PE reset         */
> +#define VFIO_EEH_PE_RESET_HOT		6	/* Assert hot reset          */
> +#define VFIO_EEH_PE_RESET_FUNDAMENTAL	7	/* Assert fundamental reset  */
> +#define VFIO_EEH_PE_CONFIGURE		8	/* PE configuration          */
> +
> +#define VFIO_EEH_PE_OP			_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /* ***************************************************************** */
>  
>  #endif /* _UAPIVFIO_H */

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 3/3] drivers/vfio: EEH support for VFIO PCI device
  2014-06-06 18:20   ` Alex Williamson
@ 2014-06-09 23:49     ` Gavin Shan
  0 siblings, 0 replies; 6+ messages in thread
From: Gavin Shan @ 2014-06-09 23:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: aik, Gavin Shan, kvm-ppc, agraf, qiudayu, linuxppc-dev

On Fri, Jun 06, 2014 at 12:20:04PM -0600, Alex Williamson wrote:
>On Fri, 2014-06-06 at 15:00 +1000, Gavin Shan wrote:
>> The patch adds new IOCTL commands for sPAPR VFIO container device
>> to support EEH functionality for PCI devices, which have been passed
>> through from host to somebody else via VFIO.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> Acked-by: Alexander Graf <agraf@suse.de>
>> ---
>>  Documentation/vfio.txt              | 87 +++++++++++++++++++++++++++++++++++--
>>  drivers/vfio/Makefile               |  1 +
>>  drivers/vfio/pci/vfio_pci.c         | 18 ++++++--
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 17 +++++++-
>>  drivers/vfio/vfio_spapr_eeh.c       | 87 +++++++++++++++++++++++++++++++++++++
>>  include/linux/vfio.h                | 23 ++++++++++
>>  include/uapi/linux/vfio.h           | 34 +++++++++++++++
>>  7 files changed, 259 insertions(+), 8 deletions(-)
>>  create mode 100644 drivers/vfio/vfio_spapr_eeh.c
>> 
>> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>> index b9ca023..3fa4538 100644
>> --- a/Documentation/vfio.txt
>> +++ b/Documentation/vfio.txt
>> @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in real mode which provides
>>  an excellent performance which has limitations such as inability to do
>>  locked pages accounting in real time.
>>  
>> -So 3 additional ioctls have been added:
>> +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
>> +subtree that can be treated as a unit for the purposes of partitioning and
>> +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
>> +function of a multi-function IOA, or multiple IOAs (possibly including switch
>> +and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
>> +and recover from them via EEH RTAS services, which works on the basis of
>> +additional ioctl commands.
>> +
>> +So 4 additional ioctls have been added:
>>  
>>  	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
>>  		of the DMA window on the PCI bus.
>> @@ -316,9 +324,12 @@ So 3 additional ioctls have been added:
>>  
>>  	VFIO_IOMMU_DISABLE - disables the container.
>>  
>> +	VFIO_EEH_PE_OP - provides an API for EEH setup, error detection and recovery.
>>  
>>  The code flow from the example above should be slightly changed:
>>  
>> +	struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op) };
>> +
>>  	.....
>>  	/* Add the group to the container */
>>  	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
>> @@ -342,9 +353,79 @@ The code flow from the example above should be slightly changed:
>>  	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>  
>>  	/* Check here is .iova/.size are within DMA window from spapr_iommu_info */
>> -
>>  	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
>> -	.....
>> +
>> +	/* Get a file descriptor for the device */
>> +	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
>> +
>> +	....
>> +
>> +	/* Gratuitous device reset and go... */
>> +	ioctl(device, VFIO_DEVICE_RESET);
>> +
>> +	/* Make sure EEH is supported */
>> +	ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
>> +
>> +	/* Enable the EEH functionality on the device */
>> +	pe_op.op = VFIO_EEH_PE_ENABLE;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/* You're suggested to create additional data struct to represent
>> +	 * PE, and put child devices belonging to same IOMMU group to the
>> +	 * PE instance for later reference.
>> +	 */
>> +
>> +	/* Check the PE's state and make sure it's in functional state */
>> +	pe_op.op = VFIO_EEH_PE_GET_STATE;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/* Save device state using pci_save_state().
>> +	 * EEH should be enabled on the specified device.
>> +	 */
>> +
>> +	....
>> +
>> +	/* When 0xFF's returned from reading PCI config space or IO BARs
>> +	 * of the PCI device. Check the PE's state to see if that has been
>> +	 * frozen.
>> +	 */
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/* Waiting for pending PCI transactions to be completed and don't
>> +	 * produce any more PCI traffic from/to the affected PE until
>> +	 * recovery is finished.
>> +	 */
>> +
>> +	/* Enable IO for the affected PE and collect logs. Usually, the
>> +	 * standard part of PCI config space, AER registers are dumped
>> +	 * as logs for further analysis.
>> +	 */
>> +	pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/*
>> +	 * Issue PE reset: hot or fundamental reset. Usually, hot reset
>> +	 * is enough. However, the firmware of some PCI adapters would
>> +	 * require fundamental reset.
>> +	 */
>> +	pe_op.op = VFIO_EEH_PE_RESET_HOT;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +	pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/* Configure the PCI bridges for the affected PE */
>> +	pe_op.op = VFIO_EEH_PE_CONFIGURE;
>> +	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
>> +
>> +	/* Restored state we saved at initialization time. pci_restore_state()
>> +	 * is good enough as an example.
>> +	 */
>> +
>> +	/* Hopefully, error is recovered successfully. Now, you can resume to
>> +	 * start PCI traffic to/from the affected PE.
>> +	 */
>> +
>> +	....
>>  
>>  -------------------------------------------------------------------------------
>>  
>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
>> index 72bfabc..50e30bc 100644
>> --- a/drivers/vfio/Makefile
>> +++ b/drivers/vfio/Makefile
>> @@ -1,4 +1,5 @@
>>  obj-$(CONFIG_VFIO) += vfio.o
>>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>> +obj-$(CONFIG_EEH) += vfio_spapr_eeh.o
>>  obj-$(CONFIG_VFIO_PCI) += pci/
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 7ba0424..0122665 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -156,8 +156,10 @@ static void vfio_pci_release(void *device_data)
>>  {
>>  	struct vfio_pci_device *vdev = device_data;
>>  
>> -	if (atomic_dec_and_test(&vdev->refcnt))
>> +	if (atomic_dec_and_test(&vdev->refcnt)) {
>> +		vfio_spapr_pci_eeh_release(vdev->pdev);
>>  		vfio_pci_disable(vdev);
>> +	}
>>  
>>  	module_put(THIS_MODULE);
>>  }
>> @@ -165,19 +167,27 @@ static void vfio_pci_release(void *device_data)
>>  static int vfio_pci_open(void *device_data)
>>  {
>>  	struct vfio_pci_device *vdev = device_data;
>> +	int ret;
>>  
>>  	if (!try_module_get(THIS_MODULE))
>>  		return -ENODEV;
>>  
>>  	if (atomic_inc_return(&vdev->refcnt) == 1) {
>> -		int ret = vfio_pci_enable(vdev);
>> +		ret = vfio_pci_enable(vdev);
>> +		if (ret)
>> +			goto error;
>> +
>> +		ret = vfio_spapr_pci_eeh_open(vdev->pdev);
>>  		if (ret) {
>> -			module_put(THIS_MODULE);
>> -			return ret;
>> +			vfio_pci_disable(vdev);
>> +			goto error;
>>  		}
>>  	}
>>  
>>  	return 0;
>> +error:
>> +	module_put(THIS_MODULE);
>> +	return ret;
>>  }
>>  
>>  static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index a84788b..730b4ef 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -156,7 +156,16 @@ static long tce_iommu_ioctl(void *iommu_data,
>>  
>>  	switch (cmd) {
>>  	case VFIO_CHECK_EXTENSION:
>> -		return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
>> +		switch (arg) {
>> +		case VFIO_SPAPR_TCE_IOMMU:
>> +			ret = 1;
>> +			break;
>> +		default:
>> +			ret = vfio_spapr_iommu_eeh_ioctl(NULL, cmd, arg);
>> +			break;
>> +		}
>> +
>> +		return (ret < 0) ? 0 : ret;
>>  
>>  	case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
>>  		struct vfio_iommu_spapr_tce_info info;
>> @@ -283,6 +292,12 @@ static long tce_iommu_ioctl(void *iommu_data,
>>  		tce_iommu_disable(container);
>>  		mutex_unlock(&container->lock);
>>  		return 0;
>> +	case VFIO_EEH_PE_OP:
>> +		if (!container->tbl || !container->tbl->it_group)
>> +			return -ENODEV;
>> +
>> +		return vfio_spapr_iommu_eeh_ioctl(container->tbl->it_group,
>> +						  cmd, arg);
>>  	}
>>  
>>  	return -ENOTTY;
>> diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
>> new file mode 100644
>> index 0000000..d438394
>> --- /dev/null
>> +++ b/drivers/vfio/vfio_spapr_eeh.c
>> @@ -0,0 +1,87 @@
>> +/*
>> + * EEH functionality support for VFIO devices. The feature is only
>> + * available on sPAPR compatible platforms.
>> + *
>> + * Copyright Gavin Shan, IBM Corporation 2014.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/uaccess.h>
>> +#include <linux/vfio.h>
>> +#include <asm/eeh.h>
>> +
>> +/* We might build address mapping here for "fast" path later */
>> +int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
>> +{
>> +	return eeh_dev_open(pdev);
>> +}
>> +
>> +void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
>> +{
>> +	eeh_dev_release(pdev);
>> +}
>> +
>> +long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>> +				unsigned int cmd, unsigned long arg)
>> +{
>> +	struct eeh_pe *pe;
>> +	struct vfio_eeh_pe_op op;
>> +	unsigned long minsz;
>> +	long ret = -EINVAL;
>> +
>> +	switch (cmd) {
>> +	case VFIO_CHECK_EXTENSION:
>> +		if (arg == VFIO_EEH)
>> +			ret = eeh_enabled() ? 1 : 0;
>> +		else
>> +			ret = 0;
>> +		break;
>> +	case VFIO_EEH_PE_OP:
>> +		pe = eeh_iommu_group_to_pe(group);
>> +		if (!pe)
>> +			return -ENODEV;
>> +
>> +		minsz = offsetofend(struct vfio_eeh_pe_op, op);
>> +		if (copy_from_user(&op, (void __user *)arg, minsz))
>> +			return -EFAULT;
>> +		if (op.argsz < minsz)
>> +			return -EINVAL;
>
>I'm running low on comments, but I found one more.  flags should be
>tested as zero here or else we'll run into problems with adding new
>functionality later.  The caller could leave it uninitialized and pass
>junk which would then break if we start using those flags.  New
>userspace on an old kernel could also intentionally pass flag bits that
>we ignore and let through here.
>

Thanks, Alex. I'll fix in next revision. The QEMU part needs corresponding
changes (set flags to zero), which will be updated and sent separately.

>
>How are you expecting this to go in, an ack from me then pushed through
>AlexG's tree?  Thanks,
>

I don't think it can catch 3.16 merge window. Perhaps, Patch[1/3] and
Patch[2/3] goes to Ben's tree and Patch[3/3] could be picked by AlexG
or you, but I'm not sure.

Thanks,
Gavin

>> +
>> +		switch (op.op) {
>> +		case VFIO_EEH_PE_DISABLE:
>> +			ret = eeh_pe_set_option(pe, EEH_OPT_DISABLE);
>> +			break;
>> +		case VFIO_EEH_PE_ENABLE:
>> +			ret = eeh_pe_set_option(pe, EEH_OPT_ENABLE);
>> +			break;
>> +		case VFIO_EEH_PE_UNFREEZE_IO:
>> +			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_MMIO);
>> +			break;
>> +		case VFIO_EEH_PE_UNFREEZE_DMA:
>> +			ret = eeh_pe_set_option(pe, EEH_OPT_THAW_DMA);
>> +			break;
>> +		case VFIO_EEH_PE_GET_STATE:
>> +			ret = eeh_pe_get_state(pe);
>> +			break;
>> +		case VFIO_EEH_PE_RESET_DEACTIVATE:
>> +			ret = eeh_pe_reset(pe, EEH_RESET_DEACTIVATE);
>> +			break;
>> +		case VFIO_EEH_PE_RESET_HOT:
>> +			ret = eeh_pe_reset(pe, EEH_RESET_HOT);
>> +			break;
>> +		case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +			ret = eeh_pe_reset(pe, EEH_RESET_FUNDAMENTAL);
>> +			break;
>> +		case VFIO_EEH_PE_CONFIGURE:
>> +			ret = eeh_pe_configure(pe);
>> +			break;
>> +		default:
>> +			ret = -EINVAL;
>> +		}
>> +	}
>> +
>> +	return ret;
>> +}
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 81022a52..0d3bb8f 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -99,4 +99,27 @@ extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>  					  unsigned long arg);
>>  
>> +#ifdef CONFIG_EEH
>> +extern int vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
>> +extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
>> +extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>> +				       unsigned int cmd,
>> +				       unsigned long arg);
>> +#else
>> +static inline int vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline void vfio_spapr_pci_eeh_release(struct pci_dev *pdev)
>> +{
>> +}
>> +
>> +static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
>> +					      unsigned int cmd,
>> +					      unsigned long arg)
>> +{
>> +	return -ENOTTY;
>> +}
>> +#endif /* CONFIG_EEH */
>>  #endif /* VFIO_H */
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index cb9023d..6612974 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -30,6 +30,9 @@
>>   */
>>  #define VFIO_DMA_CC_IOMMU		4
>>  
>> +/* Check if EEH is supported */
>> +#define VFIO_EEH			5
>> +
>>  /*
>>   * The IOCTL interface is designed for extensibility by embedding the
>>   * structure length (argsz) and flags into structures passed between
>> @@ -455,6 +458,37 @@ struct vfio_iommu_spapr_tce_info {
>>  
>>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
>>  
>> +/*
>> + * EEH PE operation struct provides ways to:
>> + * - enable/disable EEH functionality;
>> + * - unfreeze IO/DMA for frozen PE;
>> + * - read PE state;
>> + * - reset PE;
>> + * - configure PE.
>> + */
>> +struct vfio_eeh_pe_op {
>> +	__u32 argsz;
>> +	__u32 flags;
>> +	__u32 op;
>> +};
>> +
>> +#define VFIO_EEH_PE_DISABLE		0	/* Disable EEH functionality */
>> +#define VFIO_EEH_PE_ENABLE		1	/* Enable EEH functionality  */
>> +#define VFIO_EEH_PE_UNFREEZE_IO		2	/* Enable IO for frozen PE   */
>> +#define VFIO_EEH_PE_UNFREEZE_DMA	3	/* Enable DMA for frozen PE  */
>> +#define VFIO_EEH_PE_GET_STATE		4	/* PE state retrieval        */
>> +#define  VFIO_EEH_PE_STATE_NORMAL	0	/* PE in functional state    */
>> +#define  VFIO_EEH_PE_STATE_RESET	1	/* PE reset in progress      */
>> +#define  VFIO_EEH_PE_STATE_STOPPED	2	/* Stopped DMA and IO        */
>> +#define  VFIO_EEH_PE_STATE_STOPPED_DMA	4	/* Stopped DMA only          */
>> +#define  VFIO_EEH_PE_STATE_UNAVAIL	5	/* State unavailable         */
>> +#define VFIO_EEH_PE_RESET_DEACTIVATE	5	/* Deassert PE reset         */
>> +#define VFIO_EEH_PE_RESET_HOT		6	/* Assert hot reset          */
>> +#define VFIO_EEH_PE_RESET_FUNDAMENTAL	7	/* Assert fundamental reset  */
>> +#define VFIO_EEH_PE_CONFIGURE		8	/* PE configuration          */
>> +
>> +#define VFIO_EEH_PE_OP			_IO(VFIO_TYPE, VFIO_BASE + 21)
>> +
>>  /* ***************************************************************** */
>>  
>>  #endif /* _UAPIVFIO_H */
>
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-09 23:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-06  5:00 [PATCH v9 0/3] EEH Support for VFIO PCI Device Gavin Shan
2014-06-06  5:00 ` [PATCH v9 1/3] powerpc/eeh: Avoid event on passed PE Gavin Shan
2014-06-06  5:00 ` [PATCH v9 2/3] powerpc/eeh: EEH support for VFIO PCI device Gavin Shan
2014-06-06  5:00 ` [PATCH v9 3/3] drivers/vfio: " Gavin Shan
2014-06-06 18:20   ` Alex Williamson
2014-06-09 23:49     ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).