linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest
@ 2014-05-09  7:49 Gavin Shan
  2014-05-09  7:49 ` [PATCH 01/10] drivers/vfio: Introduce CONFIG_VFIO_EEH Gavin Shan
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

This series adds EEH support for PCI devices that are passed through to a
PowerKVM based guest via VFIO. The implementation follows directly from the
two problems we have to solve to support EEH for a PowerKVM based guest:

- Emulation of EEH RTAS requests. All EEH RTAS requests go to QEMU first.
  If QEMU can't handle a request, it is sent to the host via the newly
  introduced VFIO container IOCTL command (VFIO_EEH_INFO) and handled in the
  host kernel.

- The error injection infrastructure needs to support requests from both the
  userland utility "errinjct" and the PowerKVM based guest. "errinjct" already
  works well on the pSeries platform through a dedicated syscall, which invokes
  the RTAS service to inject errors from the kernel. From that perspective, it's
  reasonable to extend the syscall to support the PowerNV platform so that the
  OPAL call can be invoked in the host kernel for injecting errors. The data
  transported between userland and kernel still follows "struct rtas_args" in
  both cases, PowerNV (OPAL) and pSeries (RTAS).

The series requires the corresponding firmware changes from Mike Qiu to
support error injection, and QEMU changes to support EEH for the guest. The
QEMU patchset will be sent separately.

Change log
==========
v1 -> v2:
	* EEH RTAS requests are routed to QEMU, and then possibly to the host
	  kernel. The KVM in-kernel handling mechanism is dropped.
	* Error injection is reimplemented based on a syscall instead of KVM
	  in-kernel handling. The logic for error injection token management
	  is moved to QEMU. The error injection request is routed to QEMU and
	  then possibly to the host kernel.

Testing on P7
=============

- Emulex adapter

Testing on P8
=============

- Need more testing after design is finalized.

-----

Gavin Shan (10):
  drivers/vfio: Introduce CONFIG_VFIO_EEH
  powerpc/eeh: Info to trace passed devices
  powerpc/eeh: Search EEH device by guest address
  powerpc/eeh: Search EEH PE by guest address
  drivers/vfio: New IOCTL command VFIO_EEH_INFO
  powerpc/eeh: Avoid event on passed PE
  powerpc/powernv: Sync OPAL header file with firmware
  powerpc: Extend syscall ppc_rtas()
  powerpc/powernv: Implement ppc_call_opal()
  powerpc/powernv: Error injection infrastructure

arch/powerpc/include/asm/eeh.h                 |  52 +++++++++++++
arch/powerpc/include/asm/opal.h                |  74 +++++++++++++++++-
arch/powerpc/include/asm/rtas.h                |  10 ++-
arch/powerpc/include/asm/syscalls.h            |   2 +-
arch/powerpc/include/asm/systbl.h              |   2 +-
arch/powerpc/include/uapi/asm/unistd.h         |   2 +-
arch/powerpc/kernel/eeh.c                      |   8 ++
arch/powerpc/kernel/eeh_pe.c                   |  80 +++++++++++++++++++
arch/powerpc/kernel/rtas.c                     |  57 +++-----------
arch/powerpc/kernel/syscalls.c                 |  50 ++++++++++++
arch/powerpc/platforms/powernv/Makefile        |   3 +-
arch/powerpc/platforms/powernv/eeh-ioda.c      |   3 +-
arch/powerpc/platforms/powernv/eeh-vfio.c      | 584 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/errinject.c     | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
arch/powerpc/platforms/powernv/opal.c          |  93 ++++++++++++++++++++++
drivers/vfio/Kconfig                           |   6 ++
drivers/vfio/vfio_iommu_spapr_tce.c            |  12 +++
include/uapi/linux/vfio.h                      |  61 +++++++++++++++
kernel/sys_ni.c                                |   2 +-
20 files changed, 1271 insertions(+), 53 deletions(-)
create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c
create mode 100644 arch/powerpc/platforms/powernv/errinject.c

Thanks,
Gavin


* [PATCH 01/10] drivers/vfio: Introduce CONFIG_VFIO_EEH
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 02/10] powerpc/eeh: Info to trace passed devices Gavin Shan
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch introduces CONFIG_VFIO_EEH, which gates additional IOCTL
commands on tce_iommu_driver_ops to support EEH functionality for PCI
devices that are passed through from host to guest.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/vfio/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index af7b204..4f3293b 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
 	depends on VFIO && SPAPR_TCE_IOMMU
 	default n
 
+config VFIO_EEH
+	tristate
+	depends on EEH && VFIO_IOMMU_SPAPR_TCE
+	default n
+
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
 	select VFIO_IOMMU_TYPE1 if X86
 	select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
+	select VFIO_EEH if PPC_POWERNV
 	select ANON_INODES
 	help
 	  VFIO provides a framework for secure userspace device drivers.
-- 
1.8.3.2


* [PATCH 02/10] powerpc/eeh: Info to trace passed devices
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
  2014-05-09  7:49 ` [PATCH 01/10] drivers/vfio: Introduce CONFIG_VFIO_EEH Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 03/10] powerpc/eeh: Search EEH device by guest address Gavin Shan
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The address of a passed-through PCI device (domain:bus:slot:func) might
differ considerably between host and guest. We have to track the address
mapping so that we can emulate EEH RTAS requests from the guest. The
patch adds fields to eeh_pe and eeh_dev for that purpose.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7782056..3268692 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -48,6 +48,14 @@ struct device_node;
 #define EEH_PE_RST_HOLD_TIME		250
 #define EEH_PE_RST_SETTLE_TIME		1800
 
+#ifdef CONFIG_VFIO_EEH
+struct eeh_vfio_pci_addr {
+	uint64_t	buid;		/* PHB BUID			*/
+	uint16_t	bdn;		/* Bus/Device/Function number	*/
+	uint32_t	pe_addr;	/* PE configuration address	*/
+};
+#endif /* CONFIG_VFIO_EEH */
+
 /*
  * The struct is used to trace PE related EEH functionality.
  * In theory, there will have one instance of the struct to
@@ -72,6 +80,7 @@ struct device_node;
 #define EEH_PE_RESET		(1 << 2)	/* PE reset in progress	*/
 
 #define EEH_PE_KEEP		(1 << 8)	/* Keep PE on hotplug	*/
+#define EEH_PE_PASSTHROUGH	(1 << 9)	/* PE owned by guest	*/
 
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device	*/
@@ -85,6 +94,9 @@ struct eeh_pe {
 	struct timeval tstamp;		/* Time on first-time freeze	*/
 	int false_positives;		/* Times of reported #ff's	*/
 	struct eeh_pe *parent;		/* Parent PE			*/
+#ifdef CONFIG_VFIO_EEH
+	struct eeh_vfio_pci_addr gaddr;	/* Address in guest		*/
+#endif
 	struct list_head child_list;	/* Link PE to the child list	*/
 	struct list_head edevs;		/* Link list of EEH devices	*/
 	struct list_head child;		/* Child PEs			*/
@@ -93,6 +105,21 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
 		list_for_each_entry_safe(edev, tmp, &pe->edevs, list)
 
+static inline bool eeh_pe_passed(struct eeh_pe *pe)
+{
+	return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false;
+}
+
+static inline void eeh_pe_set_passed(struct eeh_pe *pe, bool passed)
+{
+	if (pe) {
+		if (passed)
+			pe->state |= EEH_PE_PASSTHROUGH;
+		else
+			pe->state &= ~EEH_PE_PASSTHROUGH;
+	}
+}
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
@@ -110,6 +137,7 @@ struct eeh_pe {
 #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
 #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
 #define EEH_DEV_FRESET		(1 << 11)	/* Fundamental reset	*/
+#define EEH_DEV_PASSTHROUGH	(1 << 12)	/* Owned by guest	*/
 
 struct eeh_dev {
 	int mode;			/* EEH mode			*/
@@ -126,6 +154,9 @@ struct eeh_dev {
 	struct device_node *dn;		/* Associated device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
 	struct pci_bus *bus;		/* PCI bus for partial hotplug	*/
+#ifdef CONFIG_VFIO_EEH
+	struct eeh_vfio_pci_addr gaddr;	/* Address in guest		*/
+#endif
 };
 
 static inline struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
@@ -138,6 +169,21 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 	return edev ? edev->pdev : NULL;
 }
 
+static inline bool eeh_dev_passed(struct eeh_dev *dev)
+{
+	return dev ? !!(dev->mode & EEH_DEV_PASSTHROUGH) : false;
+}
+
+static inline void eeh_dev_set_passed(struct eeh_dev *dev, bool passed)
+{
+	if (dev) {
+		if (passed)
+			dev->mode |= EEH_DEV_PASSTHROUGH;
+		else
+			dev->mode &= ~EEH_DEV_PASSTHROUGH;
+	}
+}
+
 /* Return values from eeh_ops::next_error */
 enum {
 	EEH_NEXT_ERR_NONE = 0,
-- 
1.8.3.2
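
The passthrough helpers in the patch above follow a simple NULL-safe
set/clear/test pattern on a flag word. A minimal userspace sketch of that
pattern, assuming a cut-down stand-in struct (struct demo_pe and the demo_*
names are hypothetical; only the EEH_PE_* bit values are taken from the diff):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values taken from the patch; struct demo_pe is a hypothetical stand-in. */
#define EEH_PE_RESET		(1 << 2)
#define EEH_PE_PASSTHROUGH	(1 << 9)

struct demo_pe {
	int state;	/* bitmask of EEH_PE_* flags */
};

/* NULL-safe query: a missing PE is reported as host-owned. */
static bool demo_pe_passed(const struct demo_pe *pe)
{
	return pe ? !!(pe->state & EEH_PE_PASSTHROUGH) : false;
}

/* Set or clear only the passthrough bit; other state bits are preserved. */
static void demo_pe_set_passed(struct demo_pe *pe, bool passed)
{
	if (!pe)
		return;
	if (passed)
		pe->state |= EEH_PE_PASSTHROUGH;
	else
		pe->state &= ~EEH_PE_PASSTHROUGH;
}

/* Exercise the helpers; call this from main() to run the checks. */
static void demo_pe_selfcheck(void)
{
	struct demo_pe pe = { .state = EEH_PE_RESET };

	demo_pe_set_passed(&pe, true);
	assert(demo_pe_passed(&pe));
	assert(pe.state & EEH_PE_RESET);	/* unrelated bit untouched */

	demo_pe_set_passed(&pe, false);
	assert(pe.state == EEH_PE_RESET);

	demo_pe_set_passed(NULL, true);		/* no-op, must not crash */
	assert(!demo_pe_passed(NULL));
}
```

Because the flag shares a word with the other PE state bits, both setter
branches touch only the one bit, which is the property the self-check verifies.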


* [PATCH 03/10] powerpc/eeh: Search EEH device by guest address
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
  2014-05-09  7:49 ` [PATCH 01/10] drivers/vfio: Introduce CONFIG_VFIO_EEH Gavin Shan
  2014-05-09  7:49 ` [PATCH 02/10] powerpc/eeh: Info to trace passed devices Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 04/10] powerpc/eeh: Search EEH PE " Gavin Shan
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch introduces function eeh_vfio_dev_get() to search the EEH
device according to its guest address, which is made up of PHB BUID,
bus, slot and function number. The function is useful in the backends
for EEH RTAS emulation.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |  5 +++++
 arch/powerpc/kernel/eeh_pe.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 3268692..8ffaf39 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -381,6 +381,11 @@ static inline void eeh_remove_device(struct pci_dev *dev) { }
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
 #endif /* CONFIG_EEH */
 
+
+#ifdef CONFIG_VFIO_EEH
+struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr);
+#endif /* CONFIG_VFIO_EEH */
+
 #ifdef CONFIG_PPC64
 /*
  * MMIO read/write operations with EEH support.
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index fbd01eb..d09f055 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -248,6 +248,48 @@ struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
 	return pe;
 }
 
+#ifdef CONFIG_VFIO_EEH
+static void *__eeh_vfio_dev_get(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	struct eeh_vfio_pci_addr *addr = (struct eeh_vfio_pci_addr *)flag;
+	struct eeh_dev *edev, *tmp;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (!eeh_dev_passed(edev))
+			continue;
+
+		/* Comparing the address in the guest */
+		if (addr->buid == edev->gaddr.buid &&
+		    addr->bdn  == edev->gaddr.bdn)
+			return edev;
+	}
+
+	return NULL;
+}
+
+/**
+ * eeh_vfio_dev_get - Search EEH device based on guest's address
+ * @addr: EEH device guest address
+ *
+ * Search the EEH device according to its guest's address, which
+ * is made up of PHB BUID, and PCI config address.
+ */
+struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr)
+{
+	struct eeh_pe *root;
+	struct eeh_dev *edev;
+
+	list_for_each_entry(root, &eeh_phb_pe, child) {
+		edev = eeh_pe_traverse(root, __eeh_vfio_dev_get, addr);
+		if (edev)
+			return edev;
+	}
+
+	return NULL;
+}
+#endif /* CONFIG_VFIO_EEH */
+
 /**
  * eeh_pe_get_parent - Retrieve the parent PE
  * @edev: EEH device
-- 
1.8.3.2
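
The lookup above matches on the pair (BUID, bdn) and skips devices that are
still owned by the host. A minimal userspace sketch of that comparison,
assuming a flat array in place of the eeh_pe_traverse() walk; the demo_*
names are hypothetical, and the bdn layout (bus in bits 15:8, devfn in bits
7:0) is the one patch 05 uses when it decodes the host address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as eeh_vfio_pci_addr in the patch. */
struct demo_addr {
	uint64_t buid;		/* PHB BUID */
	uint16_t bdn;		/* bus in bits 15:8, devfn in bits 7:0 */
	uint32_t pe_addr;	/* PE configuration address (unused here) */
};

struct demo_dev {
	bool passed;		/* stand-in for EEH_DEV_PASSTHROUGH */
	struct demo_addr gaddr;	/* address as seen by the guest */
};

/* Pack bus/slot/function: devfn is (slot << 3) | func, as in PCI_DEVFN(). */
static uint16_t demo_bdn(unsigned bus, unsigned slot, unsigned func)
{
	return (uint16_t)(((bus & 0xff) << 8) |
			  ((slot & 0x1f) << 3) | (func & 0x07));
}

/* Linear scan standing in for the per-PHB eeh_pe_traverse() walk. */
static struct demo_dev *demo_dev_get(struct demo_dev *devs, size_t n,
				     const struct demo_addr *addr)
{
	assert(addr != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].passed)
			continue;	/* host-owned devices never match */
		if (devs[i].gaddr.buid == addr->buid &&
		    devs[i].gaddr.bdn == addr->bdn)
			return &devs[i];
	}
	return NULL;
}
```

With this packing, bus 0x21, slot 3, function 2 becomes bdn 0x211a. A device
whose guest address matches but whose passthrough flag is clear is not
returned, mirroring the eeh_dev_passed() check in the patch.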


* [PATCH 04/10] powerpc/eeh: Search EEH PE by guest address
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (2 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 03/10] powerpc/eeh: Search EEH device by guest address Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 05/10] drivers/vfio: New IOCTL command VFIO_EEH_INFO Gavin Shan
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch introduces function eeh_vfio_pe_get() to search the EEH
PE according to its guest address, which is made up of PHB BUID and
PE configuration address. The function will be useful in backends
for EEH RTAS emulation.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h |  1 +
 arch/powerpc/kernel/eeh_pe.c   | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8ffaf39..750e028 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -384,6 +384,7 @@ static inline void eeh_remove_device(struct pci_dev *dev) { }
 
 #ifdef CONFIG_VFIO_EEH
 struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr);
+struct eeh_pe *eeh_vfio_pe_get(struct eeh_vfio_pci_addr *addr);
 #endif /* CONFIG_VFIO_EEH */
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d09f055..8dc58ac 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -288,6 +288,44 @@ struct eeh_dev *eeh_vfio_dev_get(struct eeh_vfio_pci_addr *addr)
 
 	return NULL;
 }
+
+static void *__eeh_vfio_pe_get(void *data, void *flag)
+{
+	struct eeh_pe *pe = (struct eeh_pe *)data;
+	struct eeh_vfio_pci_addr *addr = (struct eeh_vfio_pci_addr *)flag;
+
+	if (!eeh_pe_passed(pe))
+		return NULL;
+
+	/* Comparing the address */
+	if (addr->buid    == pe->gaddr.buid &&
+	    addr->pe_addr == pe->gaddr.pe_addr)
+		return pe;
+
+	return NULL;
+}
+
+/**
+ * eeh_vfio_pe_get - Search EEH PE based on guest's address
+ * @addr: EEH PE guest address
+ *
+ * Search the EEH PE according to the guest address, which
+ * is made up of PHB BUID and PE configuration address,
+ * as compared in __eeh_vfio_pe_get().
+ */
+struct eeh_pe *eeh_vfio_pe_get(struct eeh_vfio_pci_addr *addr)
+{
+	struct eeh_pe *root;
+	struct eeh_pe *pe;
+
+	list_for_each_entry(root, &eeh_phb_pe, child) {
+		pe = eeh_pe_traverse(root, __eeh_vfio_pe_get, addr);
+		if (pe)
+			return pe;
+	}
+
+	return NULL;
+}
 #endif /* CONFIG_VFIO_EEH */
 
 /**
-- 
1.8.3.2
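
The PE lookup above compares the guest BUID and the PE configuration
address. A minimal userspace sketch of that match, assuming the address
layout that patch 05 in this thread uses when it fills gaddr.pe_addr (PE bus
number shifted left by 16); struct demo_gpe and the demo_* helpers are
hypothetical names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest-address part of struct eeh_pe. */
struct demo_gpe {
	bool passed;		/* stand-in for EEH_PE_PASSTHROUGH */
	uint64_t buid;		/* guest PHB BUID */
	uint32_t pe_addr;	/* guest PE configuration address */
};

/* PE address as patch 05 builds it: PE bus number in bits 23:16. */
static uint32_t demo_pe_addr(uint8_t bus)
{
	return (uint32_t)bus << 16;
}

/* Same comparison as __eeh_vfio_pe_get(): host-owned PEs never match. */
static bool demo_pe_match(const struct demo_gpe *pe, uint64_t buid,
			  uint32_t pe_addr)
{
	assert(pe != NULL);
	return pe->passed && pe->buid == buid && pe->pe_addr == pe_addr;
}
```

For example, a PE whose bus number is 0x21 gets the guest PE address
0x00210000, and a request for that address on the same BUID matches only
while the PE is marked as passed through.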


* [PATCH 05/10] drivers/vfio: New IOCTL command VFIO_EEH_INFO
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (3 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 04/10] powerpc/eeh: Search EEH PE " Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 06/10] powerpc/eeh: Avoid event on passed PE Gavin Shan
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch adds new IOCTL command VFIO_EEH_INFO to VFIO container
to support EEH functionality for PCI devices, which have been
passed from host to guest via VFIO.
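
As an illustration of what sits behind this command, the PE state query in
this patch (powernv_eeh_vfio_get_state()) folds the backend's state flags
into a single value for the guest. A minimal userspace sketch of that
mapping follows. The DEMO_STATE_* bit values are placeholders chosen for the
sketch, not the kernel's EEH_STATE_* definitions, and the comments on each
return value are a reading of the branch conditions rather than text from
the patch:

```c
#include <assert.h>

/* Placeholder bits for this sketch only; the kernel defines its own values. */
#define DEMO_STATE_RESET_ACTIVE	(1 << 0)
#define DEMO_STATE_MMIO_ENABLED	(1 << 1)
#define DEMO_STATE_DMA_ENABLED	(1 << 2)

/* Map backend flags to the guest-visible state, same outcomes as the patch. */
static int demo_guest_state(int flags)
{
	int reset = !!(flags & DEMO_STATE_RESET_ACTIVE);
	int mmio  = !!(flags & DEMO_STATE_MMIO_ENABLED);
	int dma   = !!(flags & DEMO_STATE_DMA_ENABLED);

	if (reset)
		return 1;	/* reset in progress */
	if (mmio && dma)
		return 0;	/* MMIO and DMA both enabled */
	if (!mmio && !dma)
		return 2;	/* MMIO and DMA both stopped */
	if (!mmio && dma)
		return 4;	/* MMIO stopped, DMA still enabled */
	return 5;		/* remaining case: MMIO enabled, DMA stopped */
}

/* Walk every branch once; call this from main() to run the checks. */
static void demo_state_selfcheck(void)
{
	assert(demo_guest_state(DEMO_STATE_MMIO_ENABLED |
				DEMO_STATE_DMA_ENABLED) == 0);
	assert(demo_guest_state(DEMO_STATE_RESET_ACTIVE |
				DEMO_STATE_DMA_ENABLED) == 1);
	assert(demo_guest_state(0) == 2);
	assert(demo_guest_state(DEMO_STATE_DMA_ENABLED) == 4);
	assert(demo_guest_state(DEMO_STATE_MMIO_ENABLED) == 5);
}
```

Testing the reset flag first gives the same results as the if/else chain in
the patch, since every other branch there requires the reset flag to be
clear.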

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/eeh-vfio.c | 584 ++++++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_spapr_tce.c       |  12 +
 include/uapi/linux/vfio.h                 |  61 ++++
 4 files changed, 658 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 63cebb9..2b15a03 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,5 +6,6 @@ obj-y			+= opal-msglog.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
 obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_VFIO_EEH)	+= eeh-vfio.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/eeh-vfio.c b/arch/powerpc/platforms/powernv/eeh-vfio.c
new file mode 100644
index 0000000..5766715
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-vfio.c
@@ -0,0 +1,584 @@
+/*
+  * The file intends to support EEH functionality for those PCI devices,
+  * which have been passed through from host to guest via VFIO. So this
+  * file is naturally part of VFIO implementation on PowerNV platform.
+  *
+  * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014.
+  *
+  * This program is free software; you can redistribute it and/or modify
+  * it under the terms of the GNU General Public License as published by
+  * the Free Software Foundation; either version 2 of the License, or
+  * (at your option) any later version.
+  */
+
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+#include <linux/vfio.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/opal.h>
+#include <asm/msi_bitmap.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/tce.h>
+#include <asm/uaccess.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static int powernv_eeh_vfio_map(struct vfio_eeh_info *info)
+{
+	struct pci_bus *bus, *pe_bus;
+	struct pci_dev *pdev;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	int domain, bus_no, devfn;
+
+	/* Host address */
+	domain = info->map.domain;
+	bus_no = (info->map.bdn >> 8) & 0xff;
+	devfn = info->map.bdn & 0xff;
+
+	/* Find PCI bus */
+	bus = pci_find_bus(domain, bus_no);
+	if (!bus) {
+		pr_warn("%s: PCI bus %04x:%02x not found\n",
+			__func__, domain, bus_no);
+		return -ENODEV;
+	}
+
+	/* Find PCI device */
+	pdev = pci_get_slot(bus, devfn);
+	if (!pdev) {
+		pr_warn("%s: PCI device %04x:%02x:%02x.%01x not found\n",
+			__func__, domain, bus_no,
+			PCI_SLOT(devfn), PCI_FUNC(devfn));
+		return -ENODEV;
+	}
+
+	/* No EEH device - almost impossible */
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (unlikely(!edev)) {
+		pr_warn("%s: No EEH dev for PCI device %s\n",
+			__func__, pci_name(pdev));
+		pci_dev_put(pdev);
+		return -ENODEV;
+	}
+
+	/* Doesn't support PE migration between different PHBs */
+	pe = edev->pe;
+	if (!eeh_pe_passed(pe)) {
+		pe_bus = eeh_pe_bus_get(pe);
+		BUG_ON(!pe_bus);
+
+		/* PE# has format 00BBSS00 */
+		pe->gaddr.buid	  = info->map.gbuid;
+		pe->gaddr.pe_addr = pe_bus->number << 16;
+		eeh_pe_set_passed(pe, true);
+	} else if (pe->gaddr.buid != info->map.gbuid) {
+		pci_dev_put(pdev);
+		pr_warn("%s: Mismatched PHB BUID (0x%llx, 0x%llx)\n",
+			__func__, pe->gaddr.buid, info->map.gbuid);
+		return -EINVAL;
+	}
+
+	edev->gaddr.buid = info->map.gbuid;
+	edev->gaddr.bdn  = info->map.gbdn;
+	eeh_dev_set_passed(edev, true);
+
+	pr_debug("EEH: Host PCI dev %s to %llx-%02x:%02x.%01x\n",
+		 pci_name(pdev), info->map.gbuid,
+		 (info->map.gbdn >> 8) & 0xFF,
+		 PCI_SLOT(info->map.gbdn & 0xFF),
+		 PCI_FUNC(info->map.gbdn & 0xFF));
+
+	pci_dev_put(pdev);
+	return 0;
+}
+
+static int powernv_eeh_vfio_unmap(struct vfio_eeh_info *info)
+{
+	struct eeh_vfio_pci_addr addr;
+	struct pci_dev *pdev;
+	struct eeh_dev *edev, *tmp;
+	struct eeh_pe *pe;
+	bool passed;
+
+	/* Get EEH device */
+	addr.buid = info->unmap.buid;
+	addr.bdn  = info->unmap.bdn;
+	edev = eeh_vfio_dev_get(&addr);
+	if (!edev) {
+		pr_warn("%s: Cannot locate %llx:%02x:%02x.%01x\n",
+			__func__, info->unmap.buid,
+			(info->unmap.bdn >> 8) & 0xFF,
+			PCI_SLOT(info->unmap.bdn & 0xFF),
+			PCI_FUNC(info->unmap.bdn & 0xFF));
+		return -ENODEV;
+	}
+
+	/* Return EEH device */
+	memset(&edev->gaddr, 0, sizeof(edev->gaddr));
+	eeh_dev_set_passed(edev, false);
+	pdev = eeh_dev_to_pci_dev(edev);
+	pr_debug("EEH: Host PCI dev %s returned\n",
+		 pdev ? pci_name(pdev) : "NULL");
+
+	/* Return PE if no EEH device is owned by guest */
+	pe = edev->pe;
+	passed = false;
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		pdev = eeh_dev_to_pci_dev(edev);
+		if (pdev && pdev->subordinate)
+			continue;
+
+		if (eeh_dev_passed(edev)) {
+			passed = true;
+			break;
+		}
+	}
+
+	if (!passed) {
+		memset(&pe->gaddr, 0, sizeof(pe->gaddr));
+		eeh_pe_set_passed(pe, false);
+		pr_debug("EEH: PHB#%x-PE#%x returned to host\n",
+			 pe->phb->global_number, pe->addr);
+	}
+
+	return 0;
+}
+
+static int powernv_eeh_vfio_set_option(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->option.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode < EEH_OPT_DISABLE || opcode > EEH_OPT_THAW_DMA) {
+		pr_warn("%s: opcode %d out of range (%d, %d)\n",
+			__func__, opcode, EEH_OPT_DISABLE, EEH_OPT_THAW_DMA);
+		ret = 3;
+		goto out;
+	}
+
+	/* Option "enable" uses PCI config address */
+	if (opcode == EEH_OPT_ENABLE) {
+		addr.buid = info->option.buid;
+		addr.bdn  = (info->option.pe_addr >> 8) & 0xFFFF;
+		edev = eeh_vfio_dev_get(&addr);
+		if (!edev) {
+			pr_warn("%s: Cannot locate %llx:%02x:%02x.%01x\n",
+				__func__, addr.buid,
+				(addr.bdn >> 8) & 0xFF,
+				PCI_SLOT(addr.bdn & 0xFF),
+				PCI_FUNC(addr.bdn & 0xFF));
+			ret = 7;
+			goto out;
+		}
+		phb = edev->phb->private_data;
+	} else {
+		addr.buid    = info->option.buid;
+		addr.pe_addr = info->option.pe_addr;
+		pe = eeh_vfio_pe_get(&addr);
+		if (!pe) {
+			pr_warn("%s: Cannot find PE %llx:%x\n",
+				__func__, addr.buid, addr.pe_addr);
+			ret = 7;
+			goto out;
+		}
+		phb = pe->phb->private_data;
+	}
+
+	/* Ensure that the EEH stuff has been initialized */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 7;
+		goto out;
+	}
+
+	/*
+	 * The EEH functionality has been enabled on all PEs
+	 * by default. So just return success. The same situation
+	 * would be applied while we disable EEH functionality.
+	 * However, the guest isn't expected to disable that
+	 * at all.
+	 */
+	if (opcode == EEH_OPT_DISABLE ||
+	    opcode == EEH_OPT_ENABLE) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * Call into the IODA dependent backend in order
+	 * to enable DMA or MMIO for the indicated PE.
+	 */
+	if (phb->eeh_ops && phb->eeh_ops->set_option) {
+		if (phb->eeh_ops->set_option(pe, opcode)) {
+			pr_warn("%s: Failure from backend\n", __func__);
+			ret = 1;
+		}
+	} else {
+		pr_warn("%s: Unsupported request\n", __func__);
+		ret = 7;
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_get_addr(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_dev *edev;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->addr.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode != 0 && opcode != 1) {
+		pr_warn("%s: opcode %d out of range (0, 1)\n",
+			__func__, opcode);
+		ret = 3;
+		goto out;
+	}
+
+	/* Find EEH device */
+	addr.buid = info->addr.buid;
+	addr.bdn  = (info->addr.bdn >> 8) & 0xFFFF;
+	edev = eeh_vfio_dev_get(&addr);
+	if (!edev) {
+		pr_warn("%s: Cannot locate %llx:%02x:%02x.%01x\n",
+			__func__, addr.buid,
+			(addr.bdn >> 8) & 0xFF,
+			PCI_SLOT(addr.bdn & 0xFF),
+			PCI_FUNC(addr.bdn & 0xFF));
+		ret = 7;
+		goto out;
+	}
+	phb = edev->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* EEH device passed ? */
+	if (!eeh_dev_passed(edev)) {
+		pr_warn("%s: EEH dev %llx:%02x:%02x.%01x owned by host\n",
+			__func__, addr.buid,
+			(addr.bdn >> 8) & 0xFF,
+			PCI_SLOT(addr.bdn & 0xFF),
+			PCI_FUNC(addr.bdn & 0xFF));
+		ret = 3;
+		goto out;
+	}
+
+	/*
+	 * Fill result according to opcode. We don't differentiate
+	 * PCI bus and device sensitive PE here.
+	 */
+	if (opcode == 0)
+		info->addr.ret = edev->pe->gaddr.pe_addr;
+	else
+		info->addr.ret = 1;
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_get_state(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int result, ret = 0;
+
+	/* Locate the PE */
+	addr.buid    = info->state.buid;
+	addr.pe_addr = info->state.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* Call to the IOC dependent function */
+	if (phb->eeh_ops && phb->eeh_ops->get_state) {
+		result = phb->eeh_ops->get_state(pe);
+
+		if (!(result & EEH_STATE_RESET_ACTIVE) &&
+		     (result & EEH_STATE_DMA_ENABLED) &&
+		     (result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 0;
+		else if (result & EEH_STATE_RESET_ACTIVE)
+			info->state.state = 1;
+		else if (!(result & EEH_STATE_RESET_ACTIVE) &&
+			 !(result & EEH_STATE_DMA_ENABLED) &&
+			 !(result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 2;
+		else if (!(result & EEH_STATE_RESET_ACTIVE) &&
+			 (result & EEH_STATE_DMA_ENABLED) &&
+			 !(result & EEH_STATE_MMIO_ENABLED))
+			info->state.state = 4;
+		else
+			info->state.state = 5;
+
+		ret = 0;
+	} else {
+		pr_warn("%s: Unsupported request\n", __func__);
+		ret = 3;
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_pe_reset(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int opcode = info->reset.option;
+	int ret = 0;
+
+	/* Check opcode */
+	if (opcode != EEH_RESET_DEACTIVATE &&
+	    opcode != EEH_RESET_HOT &&
+	    opcode != EEH_RESET_FUNDAMENTAL) {
+		pr_warn("%s: Unsupported opcode %d\n", __func__, opcode);
+		ret = 3;
+		goto out;
+	}
+
+	/* Locate the PE */
+	addr.buid    = info->reset.buid;
+	addr.pe_addr = info->reset.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/* Call into the IODA dependent backend to do the reset */
+	if (!phb->eeh_ops ||
+	    !phb->eeh_ops->set_option ||
+	    !phb->eeh_ops->reset) {
+		pr_warn("%s: Unsupported request\n", __func__);
+		ret = 7;
+	} else {
+		/*
+		 * The frozen PE might be caused by the mechanism called
+		 * PAPR error injection, which is supposed to be one-shot
+		 * without "sticky" bit as stated by the spec. But that
+		 * isn't the reality, at least on P7IOC. So we have to
+		 * clear that to avoid recursive errors, which would fail
+		 * the recovery eventually.
+		 */
+		if (opcode == EEH_RESET_DEACTIVATE)
+			opal_pci_reset(phb->opal_id,
+				       OPAL_PHB_ERROR,
+				       OPAL_ASSERT_RESET);
+
+		if (phb->eeh_ops->reset(pe, opcode)) {
+			pr_warn("%s: Failure from backend\n", __func__);
+			ret = 1;
+			goto out;
+		}
+
+		/*
+		 * The PE is still in frozen state and we need to clear that.
+		 * It's good to clear frozen state after deassert to avoid
+		 * messy IO access during reset, which might cause recursive
+		 * frozen PE.
+		 */
+		if (opcode == EEH_RESET_DEACTIVATE) {
+			phb->eeh_ops->set_option(pe, EEH_OPT_THAW_MMIO);
+			phb->eeh_ops->set_option(pe, EEH_OPT_THAW_DMA);
+		}
+	}
+
+out:
+	return ret;
+}
+
+static int powernv_eeh_vfio_pe_config(struct vfio_eeh_info *info)
+{
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct eeh_vfio_pci_addr addr;
+	int ret = 0;
+
+	/* Locate the PE */
+	addr.buid    = info->config.buid;
+	addr.pe_addr = info->config.pe_addr;
+	pe = eeh_vfio_pe_get(&addr);
+	if (!pe) {
+		pr_warn("%s: Cannot locate %llx:%x\n",
+			__func__, addr.buid, addr.pe_addr);
+		ret = 3;
+		goto out;
+	}
+	phb = pe->phb->private_data;
+
+	/* EEH enabled ? */
+	if (!(phb->flags & PNV_PHB_FLAG_EEH)) {
+		pr_warn("%s: EEH disabled on PHB#%d\n",
+			__func__, phb->hose->global_number);
+		ret = 3;
+		goto out;
+	}
+
+	/*
+	 * The access to PCI config space on VFIO device has some
+	 * limitations. Part of PCI config space, including BAR
+	 * registers are not readable and writable. So the guest
+	 * should have stale values for those registers and we have
+	 * to restore them in host side.
+	 */
+	eeh_pe_restore_bars(pe);
+out:
+	return ret;
+}
+
+void eeh_vfio_release(struct iommu_table *tbl)
+{
+	struct pnv_ioda_pe *pnv_pe = container_of(tbl, struct pnv_ioda_pe,
+						  tce32_table);
+	struct pnv_phb *phb = pnv_pe->phb;
+	struct eeh_pe *phb_pe, *pe;
+	struct eeh_dev dev, *edev, *tmp;
+
+	/* Find PHB PE */
+	phb_pe = eeh_phb_pe_get(phb->hose);
+	if (unlikely(!phb_pe)) {
+		pr_warn("%s: Cannot find PHB#%d PE\n",
+			__func__, phb->hose->global_number);
+		return;
+	}
+
+	/* Find PE */
+	memset(&dev, 0, sizeof(struct eeh_dev));
+	dev.phb = phb->hose;
+	dev.pe_config_addr = pnv_pe->pe_number;
+	pe = eeh_pe_get(&dev);
+	if (unlikely(!pe)) {
+		pr_warn("%s: Cannot find PE instance for PHB#%d-PE#%d\n",
+			__func__, phb->hose->global_number,
+			pnv_pe->pe_number);
+		return;
+	}
+
+	/* Release it to host */
+	if (!eeh_pe_passed(pe))
+		return;
+
+	eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (!eeh_dev_passed(edev))
+			continue;
+
+		memset(&edev->gaddr, 0, sizeof(edev->gaddr));
+		eeh_dev_set_passed(edev, false);
+	}
+
+	memset(&pe->gaddr, 0, sizeof(pe->gaddr));
+	eeh_pe_set_passed(pe, false);
+}
+EXPORT_SYMBOL(eeh_vfio_release);
+
+int eeh_vfio_ioctl(unsigned long arg)
+{
+	struct vfio_eeh_info info;
+	int ret = -EINVAL;
+
+	/* Copy over user argument */
+	if (copy_from_user(&info, (void __user *)arg, sizeof(info))) {
+		pr_warn("%s: Cannot copy user argument 0x%lx\n",
+			__func__, arg);
+		return -EFAULT;
+	}
+
+	/* Sanity check */
+	if (info.argsz != sizeof(info)) {
+		pr_warn("%s: Invalid argument size (%d, %zu)\n",
+			__func__, info.argsz, sizeof(info));
+		return -EINVAL;
+	}
+
+	/* Route according to operation */
+	switch (info.op) {
+	case vfio_eeh_ops_map:
+		ret = powernv_eeh_vfio_map(&info);
+		break;
+	case vfio_eeh_ops_unmap:
+		ret = powernv_eeh_vfio_unmap(&info);
+		break;
+	case vfio_eeh_ops_set_option:
+		ret = powernv_eeh_vfio_set_option(&info);
+		break;
+	case vfio_eeh_ops_get_addr:
+		ret = powernv_eeh_vfio_get_addr(&info);
+		break;
+	case vfio_eeh_ops_get_state:
+		ret = powernv_eeh_vfio_get_state(&info);
+		break;
+	case vfio_eeh_ops_pe_reset:
+		ret = powernv_eeh_vfio_pe_reset(&info);
+		break;
+	case vfio_eeh_ops_pe_config:
+		ret = powernv_eeh_vfio_pe_config(&info);
+		break;
+	default:
+		pr_info("%s: Cannot handle op#%d (%d, %d)\n",
+			__func__, info.op, vfio_eeh_ops_min,
+			vfio_eeh_ops_max);
+	}
+
+	/* Copy data back */
+	if (!ret && copy_to_user((void __user *)arg, &info, sizeof(info))) {
+		pr_warn("%s: Cannot copy to user 0x%lx\n",
+			__func__, arg);
+		return -EFAULT;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(eeh_vfio_ioctl);
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a84788b..c45dece 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -26,6 +26,11 @@
 #define DRIVER_AUTHOR   "aik@ozlabs.ru"
 #define DRIVER_DESC     "VFIO IOMMU SPAPR TCE"
 
+#ifdef CONFIG_VFIO_EEH
+extern void eeh_vfio_release(struct iommu_table *tbl);
+extern int eeh_vfio_ioctl(unsigned long arg);
+#endif
+
 static void tce_iommu_detach_group(void *iommu_data,
 		struct iommu_group *iommu_group);
 
@@ -283,6 +288,10 @@ static long tce_iommu_ioctl(void *iommu_data,
 		tce_iommu_disable(container);
 		mutex_unlock(&container->lock);
 		return 0;
+#ifdef CONFIG_VFIO_EEH
+	case VFIO_EEH_INFO:
+		return eeh_vfio_ioctl(arg);
+#endif
 	}
 
 	return -ENOTTY;
@@ -342,6 +351,9 @@ static void tce_iommu_detach_group(void *iommu_data,
 		/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
 				iommu_group_id(iommu_group), iommu_group); */
 		container->tbl = NULL;
+#ifdef CONFIG_VFIO_EEH
+		eeh_vfio_release(tbl);
+#endif
 		iommu_release_ownership(tbl);
 	}
 	mutex_unlock(&container->lock);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..4e1c7f9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,67 @@ struct vfio_iommu_spapr_tce_info {
 
 #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
 
+/*
+ * The VFIO EEH info struct provides a way to support EEH functionality
+ * for a PCI device that is passed through from host to guest via VFIO.
+ */
+enum {
+	vfio_eeh_ops_min	= 0,
+	vfio_eeh_ops_map	= 0,
+	vfio_eeh_ops_unmap	= 1,
+	vfio_eeh_ops_set_option	= 2,
+	vfio_eeh_ops_get_addr	= 3,
+	vfio_eeh_ops_get_state	= 4,
+	vfio_eeh_ops_pe_reset	= 5,
+	vfio_eeh_ops_pe_config	= 6,
+	vfio_eeh_ops_max	= 6
+};
+
+struct vfio_eeh_info {
+	__u32 argsz;
+	__u32 op;
+
+	union {
+		struct vfio_eeh_map {
+			__u32 domain;
+			__u16 bdn;
+			__u64 gbuid;
+			__u16 gbdn;
+		} map;
+		struct vfio_eeh_unmap {
+			__u64 buid;
+			__u16 bdn;
+		} unmap;
+		struct vfio_eeh_set_option {
+			__u64 buid;
+			__u32 pe_addr;
+			__u32 option;
+		} option;
+		struct vfio_eeh_pe_addr {
+			__u64 buid;
+			__u32 bdn;
+			__u32 option;
+			__u32 ret;
+		} addr;
+		struct vfio_eeh_state {
+			__u64 buid;
+			__u32 pe_addr;
+			__u32 state;
+		} state;
+		struct vfio_eeh_reset {
+			__u64 buid;
+			__u32 pe_addr;
+			__u32 option;
+		} reset;
+		struct vfio_eeh_config {
+			__u64 buid;
+			__u32 pe_addr;
+		} config;
+	};
+};
+
+#define VFIO_EEH_INFO	_IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
-- 
1.8.3.2

* [PATCH 06/10] powerpc/eeh: Avoid event on passed PE
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (4 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 05/10] drivers/vfio: New IOCTL command VFIO_EEH_INFO Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 07/10] powerpc/powernv: Sync OPAL header file with firmware Gavin Shan
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

If we detect a frozen state on a PE that has been passed to a guest,
the host needn't handle it. Instead, we rely on the guest to detect the
error and recover from it. The patch suppresses the EEH event on a
frozen, passed PE so that the guest gets the chance to handle it.
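
For illustration only, the decision described above can be sketched in
plain user-space C. Everything prefixed demo_/DEMO_ is a hypothetical
stand-in for the kernel's EEH types and flags, not the real definitions;
the sketch only shows that a frozen PE produces no host event when it is
already isolated or has been passed to a guest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PE state flags. */
#define DEMO_PE_ISOLATED	(1 << 0)
#define DEMO_PE_PASSED		(1 << 1)

struct demo_pe {
	int state;
};

enum demo_next_err {
	DEMO_NEXT_ERR_NONE = 0,		/* host raises no EEH event */
	DEMO_NEXT_ERR_FROZEN_PE = 1	/* host handles the frozen PE */
};

/* True if the PE is owned by a guest (assumed flag, see above). */
static bool demo_pe_passed(const struct demo_pe *pe)
{
	return (pe->state & DEMO_PE_PASSED) != 0;
}

/* Decide whether the host should raise an event for a frozen PE. */
static enum demo_next_err demo_frozen_pe_event(const struct demo_pe *pe)
{
	assert(pe);

	/* Already isolated, or owned by a guest: leave it alone. */
	if ((pe->state & DEMO_PE_ISOLATED) || demo_pe_passed(pe))
		return DEMO_NEXT_ERR_NONE;

	return DEMO_NEXT_ERR_FROZEN_PE;
}
```

A PE with no flags set yields DEMO_NEXT_ERR_FROZEN_PE, while one with
DEMO_PE_PASSED set yields DEMO_NEXT_ERR_NONE.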

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c                 | 8 ++++++++
 arch/powerpc/platforms/powernv/eeh-ioda.c | 3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9c6b899..6543f05 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -400,6 +400,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	if (ret > 0)
 		return ret;
 
+	/*
+	 * If the PE has been passed to guest, we won't check the
+	 * state. Instead, let the guest handle it if the PE has
+	 * been frozen.
+	 */
+	if (eeh_pe_passed(pe))
+		return 0;
+
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
 	 * Do this checking under a lock; as multiple PCI devices
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 1b5982f..03a3ed2 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -890,7 +890,8 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 				opal_pci_eeh_freeze_clear(phb->opal_id, frozen_pe_no,
 					OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
 				ret = EEH_NEXT_ERR_NONE;
-			} else if ((*pe)->state & EEH_PE_ISOLATED) {
+			} else if ((*pe)->state & EEH_PE_ISOLATED ||
+				   eeh_pe_passed(*pe)) {
 				ret = EEH_NEXT_ERR_NONE;
 			} else {
 				pr_err("EEH: Frozen PHB#%x-PE#%x (%s) detected\n",
-- 
1.8.3.2

* [PATCH 07/10] powerpc/powernv: Sync OPAL header file with firmware
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (5 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 06/10] powerpc/eeh: Avoid event on passed PE Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 08/10] powerpc: Extend syscall ppc_rtas() Gavin Shan
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch synchronizes the OPAL header file with the firmware so that
the host kernel can make the OPAL call that performs error injection.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                | 65 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 2 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 66ad7a7..ca55d9c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -175,6 +175,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_SET_PARAM				90
 #define OPAL_DUMP_RESEND			91
 #define OPAL_DUMP_INFO2				94
+#define OPAL_ERR_INJECT				96
 
 #ifndef __ASSEMBLY__
 
@@ -219,6 +220,69 @@ enum OpalPciErrorSeverity {
 	OPAL_EEH_SEV_INF	= 5
 };
 
+enum OpalErrinjctType {
+	OpalErrinjctTypeFirst			= 0,
+	OpalErrinjctTypeFatal			= 1,
+	OpalErrinjctTypeRecoverRandomEvent	= 2,
+	OpalErrinjctTypeRecoverSpecialEvent	= 3,
+	OpalErrinjctTypeCorruptedPage		= 4,
+	OpalErrinjctTypeCorruptedSlb		= 5,
+	OpalErrinjctTypeTranslatorFailure	= 6,
+	OpalErrinjctTypeIoaBusError		= 7,
+	OpalErrinjctTypeIoaBusError64		= 8,
+	OpalErrinjctTypePlatformSpecific	= 9,
+	OpalErrinjctTypeDcacheStart		= 10,
+	OpalErrinjctTypeDcacheEnd		= 11,
+	OpalErrinjctTypeIcacheStart		= 12,
+	OpalErrinjctTypeIcacheEnd		= 13,
+	OpalErrinjctTypeTlbStart		= 14,
+	OpalErrinjctTypeTlbEnd			= 15,
+	OpalErrinjctTypeUpstreamIoError		= 16,
+	OpalErrinjctTypeLast			= 17,
+
+	/* IoaBusError & IoaBusError64 */
+	OpalEjtIoaLoadMemAddr			= 0,
+	OpalEjtIoaLoadMemData			= 1,
+	OpalEjtIoaLoadIoAddr			= 2,
+	OpalEjtIoaLoadIoData			= 3,
+	OpalEjtIoaLoadConfigAddr		= 4,
+	OpalEjtIoaLoadConfigData		= 5,
+	OpalEjtIoaStoreMemAddr			= 6,
+	OpalEjtIoaStoreMemData			= 7,
+	OpalEjtIoaStoreIoAddr			= 8,
+	OpalEjtIoaStoreIoData			= 9,
+	OpalEjtIoaStoreConfigAddr		= 10,
+	OpalEjtIoaStoreConfigData		= 11,
+	OpalEjtIoaDmaReadMemAddr		= 12,
+	OpalEjtIoaDmaReadMemData		= 13,
+	OpalEjtIoaDmaReadMemMaster		= 14,
+	OpalEjtIoaDmaReadMemTarget		= 15,
+	OpalEjtIoaDmaWriteMemAddr		= 16,
+	OpalEjtIoaDmaWriteMemData		= 17,
+	OpalEjtIoaDmaWriteMemMaster		= 18,
+	OpalEjtIoaDmaWriteMemTarget		= 19,
+};
+
+struct OpalErrinjct {
+	int32_t type;
+	union {
+		struct {
+			uint32_t addr;
+			uint32_t mask;
+			uint64_t phb_id;
+			uint32_t pe;
+			uint32_t function;
+		} ioa;
+		struct {
+			uint64_t addr;
+			uint64_t mask;
+			uint64_t phb_id;
+			uint32_t pe;
+			uint32_t function;
+		} ioa64;
+	};
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
@@ -839,6 +903,7 @@ int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
 				   uint64_t diag_buffer_len);
 int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
 				    uint64_t diag_buffer_len);
+int64_t opal_err_injct(void *data);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint64_t reinit_scope, uint64_t data);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index f531ffe..46265de 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -119,6 +119,7 @@ OPAL_CALL(opal_pci_next_error,			OPAL_PCI_NEXT_ERROR);
 OPAL_CALL(opal_pci_poll,			OPAL_PCI_POLL);
 OPAL_CALL(opal_pci_msi_eoi,			OPAL_PCI_MSI_EOI);
 OPAL_CALL(opal_pci_get_phb_diag_data2,		OPAL_PCI_GET_PHB_DIAG_DATA2);
+OPAL_CALL(opal_err_injct,			OPAL_ERR_INJECT);
 OPAL_CALL(opal_xscom_read,			OPAL_XSCOM_READ);
 OPAL_CALL(opal_xscom_write,			OPAL_XSCOM_WRITE);
 OPAL_CALL(opal_lpc_read,			OPAL_LPC_READ);
-- 
1.8.3.2

* [PATCH 08/10] powerpc: Extend syscall ppc_rtas()
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (6 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 07/10] powerpc/powernv: Sync OPAL header file with firmware Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 09/10] powerpc/powernv: Implement ppc_call_opal() Gavin Shan
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

Originally, the syscall ppc_rtas() could only be used to invoke RTAS
calls from user space. The utility "errinjct" uses it to inject various
errors into the system for testing purposes. The patch extends the
syscall to support both the pSeries and PowerNV platforms, so that RTAS
and OPAL calls can both be invoked from user space. In turn, "errinjct"
can be supported on pSeries and PowerNV at the same time.

The original syscall handler ppc_rtas() is renamed to ppc_firmware(),
which calls ppc_call_rtas() or ppc_call_opal() depending on the
running platform. The data transported between userland and kernel is
carried in "struct rtas_args". How the data is used is platform specific.
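
For illustration only, a minimal user-space sketch of the routing
described above. All demo_/DEMO_ names are hypothetical stand-ins rather
than the kernel's definitions, and the endianness conversion and the
copy_from_user()/copy_to_user() steps are left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ARGS 16

enum demo_platform { DEMO_PSERIES, DEMO_POWERNV, DEMO_OTHER };

/* Hypothetical, simplified counterpart of "struct rtas_args". */
struct demo_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[DEMO_MAX_ARGS];
	uint32_t *rets;
};

/* Stub back ends: each marks the first return slot, if there is one. */
static int demo_call_rtas(struct demo_args *a)
{
	if (a->nret)
		a->rets[0] = 1;
	return 0;
}

static int demo_call_opal(struct demo_args *a)
{
	if (a->nret)
		a->rets[0] = 2;
	return 0;
}

/* Validate the common header, then route by platform. */
static int demo_firmware(enum demo_platform p, struct demo_args *a)
{
	assert(a);

	/* Arguments plus return values must fit into the shared array. */
	if (a->nargs > DEMO_MAX_ARGS || a->nret > DEMO_MAX_ARGS ||
	    a->nargs + a->nret > DEMO_MAX_ARGS)
		return -EINVAL;

	/* Return values live right behind the input arguments. */
	a->rets = &a->args[a->nargs];
	memset(a->rets, 0, a->nret * sizeof(a->args[0]));

	switch (p) {
	case DEMO_PSERIES:
		return demo_call_rtas(a);
	case DEMO_POWERNV:
		return demo_call_opal(a);
	default:
		return -ENXIO;
	}
}
```

The common checks run once in the front end, so the per-platform back
ends only ever see a validated argument block.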

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h        | 10 +++++-
 arch/powerpc/include/asm/syscalls.h    |  2 +-
 arch/powerpc/include/asm/systbl.h      |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h |  2 +-
 arch/powerpc/kernel/rtas.c             | 57 +++++++---------------------------
 arch/powerpc/kernel/syscalls.c         | 50 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c  |  7 +++++
 kernel/sys_ni.c                        |  2 +-
 8 files changed, 82 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..3428524 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -20,7 +20,7 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
 
-/* Buffer size for ppc_rtas system call. */
+/* Buffer size for ppc_firmware system call. */
 #define RTAS_RMOBUF_MAX (64 * 1024)
 
 /* RTAS return status codes */
@@ -427,9 +427,17 @@ static inline int page_is_rtas_user_buf(unsigned long pfn)
 /* Not the best place to put pSeries_coalesce_init, will be fixed when we
  * move some of the rtas suspend-me stuff to pseries */
 extern void pSeries_coalesce_init(void);
+extern int ppc_call_rtas(struct rtas_args *args);
 #else
 static inline int page_is_rtas_user_buf(unsigned long pfn) { return 0;}
 static inline void pSeries_coalesce_init(void) { }
+static inline int ppc_call_rtas(struct rtas_args *args) { return -ENXIO; }
+#endif
+
+#ifdef CONFIG_PPC_POWERNV
+extern int ppc_call_opal(struct rtas_args *args);
+#else
+static inline int ppc_call_opal(struct rtas_args *args) { return -ENXIO; }
 #endif
 
 extern int call_rtas(const char *, int, int, unsigned long *, ...);
diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 23be8f1..3383e50 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -15,7 +15,7 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, size_t len,
 		unsigned long prot, unsigned long flags,
 		unsigned long fd, unsigned long pgoff);
 asmlinkage long ppc64_personality(unsigned long personality);
-asmlinkage int ppc_rtas(struct rtas_args __user *uargs);
+asmlinkage int ppc_firmware(struct rtas_args __user *uargs);
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_SYSCALLS_H */
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 3ddf702..00f8bb2 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -259,7 +259,7 @@ COMPAT_SYS_SPU(utimes)
 COMPAT_SYS_SPU(statfs64)
 COMPAT_SYS_SPU(fstatfs64)
 SYSX(sys_ni_syscall, ppc_fadvise64_64, ppc_fadvise64_64)
-PPC_SYS_SPU(rtas)
+PPC_SYS_SPU(firmware)
 OLDSYS(debug_setcontext)
 SYSCALL(ni_syscall)
 COMPAT_SYS(migrate_pages)
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 881bf2e..3aee765 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -273,7 +273,7 @@
 #ifndef __powerpc64__
 #define __NR_fadvise64_64	254
 #endif
-#define __NR_rtas		255
+#define __NR_firmware		255
 #define __NR_sys_debug_setcontext 256
 /* Number 257 is reserved for vserver */
 #define __NR_migrate_pages	258
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8cd5ed0..5d829a72 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1017,59 +1017,32 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 }
 
 /* We assume to be passed big endian arguments */
-asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
+int ppc_call_rtas(struct rtas_args *args)
 {
-	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
-	int nargs, nret, token;
 	int rc;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
-		return -EFAULT;
-
-	nargs = be32_to_cpu(args.nargs);
-	nret  = be32_to_cpu(args.nret);
-	token = be32_to_cpu(args.token);
-
-	if (nargs > ARRAY_SIZE(args.args)
-	    || nret > ARRAY_SIZE(args.args)
-	    || nargs + nret > ARRAY_SIZE(args.args))
-		return -EINVAL;
-
-	/* Copy in args. */
-	if (copy_from_user(args.args, uargs->args,
-			   nargs * sizeof(rtas_arg_t)) != 0)
-		return -EFAULT;
-
-	if (token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
-
-	args.rets = &args.args[nargs];
-	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
-
 	/* Need to handle ibm,suspend_me call specially */
-	if (token == ibm_suspend_me_token) {
-		rc = rtas_ibm_suspend_me(&args);
+	if (be32_to_cpu(args->token) == ibm_suspend_me_token) {
+		rc = rtas_ibm_suspend_me(args);
 		if (rc)
 			return rc;
-		goto copy_return;
+		goto out;
 	}
 
 	buff_copy = get_errorlog_buffer();
 
 	flags = lock_rtas();
-
-	rtas.args = args;
+	rtas.args = *args;
 	enter_rtas(__pa(&rtas.args));
-	args = rtas.args;
+	*args = rtas.args;
 
-	/* A -1 return code indicates that the last command couldn't
-	   be completed due to a hardware error. */
-	if (be32_to_cpu(args.rets[0]) == -1)
+	/*
+	 * A -1 return code indicates that the last command couldn't
+	 * be completed due to a hardware error.
+	 */
+	if (be32_to_cpu(args->rets[0]) == -1)
 		errbuf = __fetch_rtas_last_error(buff_copy);
 
 	unlock_rtas(flags);
@@ -1080,13 +1053,7 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
 		kfree(buff_copy);
 	}
 
- copy_return:
-	/* Copy out args. */
-	if (copy_to_user(uargs->args + nargs,
-			 args.args + nargs,
-			 nret * sizeof(rtas_arg_t)) != 0)
-		return -EFAULT;
-
+out:
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index cd9be9a..bcb7483 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -40,6 +40,56 @@
 #include <asm/syscalls.h>
 #include <asm/time.h>
 #include <asm/unistd.h>
+#include <asm/machdep.h>
+#include <asm/rtas.h>
+
+asmlinkage int ppc_firmware(struct rtas_args __user *uargs)
+{
+	int rc;
+	int nargs, nret, token;
+	struct rtas_args args;
+
+	/* Copy over common header */
+	if (copy_from_user(&args, uargs, 3 * sizeof(u32)))
+		return -EFAULT;
+	nargs = be32_to_cpu(args.nargs);
+	nret  = be32_to_cpu(args.nret);
+	token = be32_to_cpu(args.token);
+
+	/* Parameter overflow ? */
+	if (nargs > ARRAY_SIZE(args.args)
+	    || nret > ARRAY_SIZE(args.args)
+	    || nargs + nret > ARRAY_SIZE(args.args))
+		return -EINVAL;
+
+	/* Copy over all arguments */
+	if (copy_from_user(args.args, uargs->args,
+			   nargs * sizeof(rtas_arg_t)))
+		return -EFAULT;
+
+	/* Invalid token ? */
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return -EINVAL;
+
+	/* Clean out return values */
+	args.rets = &args.args[nargs];
+	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
+
+	/* Route to correct platform */
+	if (machine_is(pseries))
+		rc = ppc_call_rtas(&args);
+	else if (machine_is(powernv))
+		rc = ppc_call_opal(&args);
+	else
+		return -ENXIO;
+
+	/* Copy result to user space */
+	if (copy_to_user(uargs->args + nargs, args.args + nargs,
+			 nret * sizeof(rtas_arg_t)))
+		return -EFAULT;
+
+	return rc;
+}
 
 static inline unsigned long do_mmap2(unsigned long addr, size_t len,
 			unsigned long prot, unsigned long flags,
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 360ad80c..ad33c2b 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -25,6 +25,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/rtas.h>
 
 #include "powernv.h"
 
@@ -701,3 +702,9 @@ void opal_free_sg_list(struct opal_sg_list *sg)
 			sg = NULL;
 	}
 }
+
+/* Extend it later */
+int ppc_call_opal(struct rtas_args *args)
+{
+	return 0;
+}
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index bc8d1b7..2c5b3fa 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -159,7 +159,7 @@ cond_syscall(sys_pciconfig_read);
 cond_syscall(sys_pciconfig_write);
 cond_syscall(sys_pciconfig_iobase);
 cond_syscall(compat_sys_s390_ipc);
-cond_syscall(ppc_rtas);
+cond_syscall(ppc_firmware);
 cond_syscall(sys_spu_run);
 cond_syscall(sys_spu_create);
 cond_syscall(sys_subpage_prot);
-- 
1.8.3.2

* [PATCH 09/10] powerpc/powernv: Implement ppc_call_opal()
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (7 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 08/10] powerpc: Extend syscall ppc_rtas() Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:49 ` [PATCH 10/10] powerpc/powernv: Error injection infrastructure Gavin Shan
  2014-05-09  7:54 ` [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

If we're running on the PowerNV platform, ppc_firmware() is directed
to ppc_call_opal(), which parses the input arguments and calls the
appropriate OPAL API to handle the request. Each request passed to the
function is identified by its token. As the function can be reached
either from a host-owned application (e.g. errinjct) or from a VM, the
first parameter (the so-called "virtual" flag) always differentiates
the two cases.

The patch implements the above logic together with a dynamic
registration mechanism for OPAL call handlers, so that the handlers can
be distributed across the code.
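
For illustration only, the registration and lookup scheme can be
sketched in user-space C as follows. All demo_/DEMO_ names are
hypothetical. The sketch swaps the patch's linked list for a fixed-size
array and omits locking, so it shows the (virt, token) keying rather
than the actual kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_HANDLERS 8

typedef int (*demo_fn)(int arg);

struct demo_handler {
	bool virt;	/* request comes from a VM? */
	int token;	/* identifies the request type */
	demo_fn fn;
};

static struct demo_handler demo_handlers[DEMO_MAX_HANDLERS];
static size_t demo_nr_handlers;

/* Look up the handler registered for the (virt, token) pair. */
static struct demo_handler *demo_find(bool virt, int token)
{
	for (size_t i = 0; i < demo_nr_handlers; i++)
		if (demo_handlers[i].virt == virt &&
		    demo_handlers[i].token == token)
			return &demo_handlers[i];
	return NULL;
}

/* Register a handler; duplicates of a (virt, token) pair are refused. */
static int demo_register(bool virt, int token, demo_fn fn)
{
	if (!token || !fn)
		return -EINVAL;
	if (demo_find(virt, token))
		return -EEXIST;
	if (demo_nr_handlers == DEMO_MAX_HANDLERS)
		return -ENOMEM;

	demo_handlers[demo_nr_handlers++] =
		(struct demo_handler){ virt, token, fn };
	return 0;
}

/* Route a request to its handler, or fail if none is registered. */
static int demo_dispatch(bool virt, int token, int arg)
{
	struct demo_handler *h = demo_find(virt, token);

	assert(demo_nr_handlers <= DEMO_MAX_HANDLERS);
	return h ? h->fn(arg) : -ERANGE;
}

/* Two example handlers for the same token, one per caller type. */
static int demo_host_inject(int arg) { return arg + 1; }
static int demo_guest_inject(int arg) { return arg + 2; }
```

Keying on both fields lets the same token be served by different
handlers for host and guest callers.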

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h       |  3 +-
 arch/powerpc/platforms/powernv/opal.c | 90 ++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ca55d9c..7c4ffd0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -997,7 +997,8 @@ extern void opal_lpc_init(void);
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 					     unsigned long vmalloc_size);
 void opal_free_sg_list(struct opal_sg_list *sg);
-
+int opal_call_handler_register(bool virt, int token,
+			       int (*fn)(struct rtas_args *));
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index ad33c2b..c84823c 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -38,6 +38,13 @@ struct opal {
 	u64 size;
 } opal;
 
+struct opal_call_handler {
+	bool virt;
+	int token;
+	int (*fn)(struct rtas_args *args);
+	struct list_head list;
+};
+
 struct mcheck_recoverable_range {
 	u64 start_addr;
 	u64 end_addr;
@@ -47,6 +54,10 @@ struct mcheck_recoverable_range {
 static struct mcheck_recoverable_range *mc_recoverable_range;
 static int mc_recoverable_range_len;
 
+/* OPAL call handler */
+static LIST_HEAD(opal_call_handler_list);
+static DEFINE_SPINLOCK(opal_call_lock);
+
 struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
 extern u64 opal_mc_secondary_handler[];
@@ -703,8 +714,83 @@ void opal_free_sg_list(struct opal_sg_list *sg)
 	}
 }
 
-/* Extend it later */
-int ppc_call_opal(struct rtas_args *args)
+int opal_call_handler_register(bool virt, int token,
+			       int (*fn)(struct rtas_args *))
 {
+	struct opal_call_handler *h, *handler;
+
+	if (!token || !fn) {
+		pr_warn("%s: Invalid parameters\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	handler = kzalloc(sizeof(*handler), GFP_KERNEL);
+	if (!handler) {
+		pr_warn("%s: Out of memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+	handler->token = token;
+	handler->virt = virt;
+	handler->fn = fn;
+	INIT_LIST_HEAD(&handler->list);
+
+	spin_lock(&opal_call_lock);
+	list_for_each_entry(h, &opal_call_handler_list, list) {
+		if (h->token == token &&
+		    h->virt  == virt) {
+			spin_unlock(&opal_call_lock);
+			pr_warn("%s: Handler already exists (%s, %x)\n",
+				__func__, virt ? "T" : "F", token);
+			kfree(handler);
+			return -EEXIST;
+		}
+	}
+
+	list_add_tail(&handler->list, &opal_call_handler_list);
+	spin_unlock(&opal_call_lock);
+
 	return 0;
 }
+
+/*
+ * Usually invoked from syscall ppc_firmware(), either by a
+ * host-owned application or by a VM. The two callers carry
+ * different information in the input arguments, so the first
+ * argument always tells them apart.
+ *
+ * Also, 32-bit addresses have to be extended to 64 bits, so
+ * each address-sensitive field requires 8 bytes, i.e. two
+ * 32-bit arguments.
+ */
+int ppc_call_opal(struct rtas_args *args)
+{
+	bool virt, found;
+	int token;
+	struct opal_call_handler *h;
+
+	/* We should have "virt" at least */
+	if (args->nargs < 1)
+		return -EINVAL;
+	virt = !!args->args[0];
+	token = args->token;
+
+	/* Do we have handler ? */
+	found = false;
+	spin_lock(&opal_call_lock);
+	list_for_each_entry(h, &opal_call_handler_list, list) {
+		if (h->token == token &&
+		    h->virt == virt) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&opal_call_lock);
+
+	/* Call to handler */
+	if (!found)
+		return -ERANGE;
+
+	return h->fn(args);
+}
-- 
1.8.3.2

* [PATCH 10/10] powerpc/powernv: Error injection infrastructure
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (8 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 09/10] powerpc/powernv: Implement ppc_call_opal() Gavin Shan
@ 2014-05-09  7:49 ` Gavin Shan
  2014-05-09  7:54 ` [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:49 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: aik, alex.williamson, qiudayu, Gavin Shan

The patch implements the error injection infrastructure for the
PowerNV platform. Predetermined handlers are called according to the
type of the injected error (e.g. OpalErrinjctTypeIoaBusError). For now,
only PCI error injection is supported. Support for injecting other
types of errors needs to be added in the future.
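
For illustration only, the way a 64-bit injection request is assembled
from 32-bit words can be sketched in user-space C. The demo_/DEMO_ names
are hypothetical: struct demo_ioa64 merely mirrors the layout of the
64-bit member of struct OpalErrinjct, and the PHB and PE fields, which
the kernel takes from its PE lookup, are simply zeroed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the 64-bit member of struct OpalErrinjct. */
struct demo_ioa64 {
	uint64_t addr;
	uint64_t mask;
	uint64_t phb_id;
	uint32_t pe;
	uint32_t function;
};

/* Highest valid opcode (OpalEjtIoaDmaWriteMemTarget in the patch). */
#define DEMO_OP_LAST 19u

/* Combine two 32-bit halves into one 64-bit value. */
static uint64_t demo_join64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Validate the opcode and fill the 64-bit request descriptor. */
static int demo_fill_ioa64(struct demo_ioa64 *ej,
			   uint32_t addr_hi, uint32_t addr_lo,
			   uint32_t mask_hi, uint32_t mask_lo,
			   uint32_t op)
{
	assert(ej);

	if (op > DEMO_OP_LAST)
		return -EINVAL;

	ej->addr = demo_join64(addr_hi, addr_lo);
	ej->mask = demo_join64(mask_hi, mask_lo);
	ej->phb_id = 0;	/* placeholder: comes from the PE lookup */
	ej->pe = 0;	/* placeholder: comes from the PE lookup */
	ej->function = op;
	return 0;
}
```

Writing the joined values into 64-bit fields is what keeps the upper
halves of the address and mask from being truncated.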

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h            |   6 +
 arch/powerpc/platforms/powernv/Makefile    |   2 +-
 arch/powerpc/platforms/powernv/errinject.c | 224 +++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/errinject.c

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 7c4ffd0..7bf86ba 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -794,6 +794,12 @@ typedef struct oppanel_line {
 	uint64_t 	line_len;
 } oppanel_line_t;
 
+enum OpalCallToken {
+	OPAL_CALL_TOKEN_MIN = 0,
+	OPAL_CALL_TOKEN_ERRINJCT,
+	OPAL_CALL_TOKEN_MAX
+};
+
 /* /sys/firmware/opal */
 extern struct kobject *opal_kobj;
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 2b15a03..5ae8257 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y			+= opal-msglog.o
+obj-y			+= opal-msglog.o errinject.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/errinject.c b/arch/powerpc/platforms/powernv/errinject.c
new file mode 100644
index 0000000..aa892d4
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/errinject.c
@@ -0,0 +1,224 @@
+/*
+ * This file supports error injection requests from a host-owned
+ * utility (e.g. errinjct) or a VM. It parses the information passed
+ * from user space and calls the appropriate OPAL API accordingly.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2014.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/msi.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/rtas.h>
+#include <asm/tce.h>
+#include <asm/uaccess.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static int powernv_errinjct_ioa(struct rtas_args *args)
+{
+	return -ENXIO;
+}
+
+static int powernv_errinjct_ioa64(struct rtas_args *args)
+{
+	return -ENXIO;
+}
+
+#ifdef CONFIG_VFIO_EEH
+static int powernv_errinjct_ioa_virt(struct rtas_args *args)
+{
+	uint32_t addr, mask, cfg_addr;
+	uint32_t buid_hi, buid_lo, op;
+	uint64_t buf_addr = ((uint64_t)(args->args[3])) << 32 |
+			    args->args[4];
+	void __user *buf = (void __user *)buf_addr;
+	struct eeh_vfio_pci_addr vfio_addr;
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct OpalErrinjct ej;
+
+	/* Extract parameters */
+	if (get_user(addr, (uint32_t __user *)buf) ||
+	    get_user(mask, (uint32_t __user *)(buf + 4)) ||
+	    get_user(cfg_addr, (uint32_t __user *)(buf + 8)) ||
+	    get_user(buid_hi, (uint32_t __user *)(buf + 12)) ||
+	    get_user(buid_lo, (uint32_t __user *)(buf + 16)) ||
+	    get_user(op, (uint32_t __user *)(buf + 20)))
+		return -EFAULT;
+
+	/* Check opcode */
+	if (op < OpalEjtIoaLoadMemAddr ||
+	    op > OpalEjtIoaDmaWriteMemTarget)
+		return -EINVAL;
+
+	/* Find PE */
+	vfio_addr.buid = ((((uint64_t)buid_hi) << 32) | buid_lo);
+	vfio_addr.pe_addr = cfg_addr;
+	pe = eeh_vfio_pe_get(&vfio_addr);
+	if (!pe)
+		return -ENODEV;
+	phb = pe->phb->private_data;
+
+	/* OPAL call */
+	ej.type = OpalErrinjctTypeIoaBusError;
+	ej.ioa.addr = addr;
+	ej.ioa.mask = mask;
+	ej.ioa.phb_id = phb->opal_id;
+	ej.ioa.pe = pe->addr;
+	ej.ioa.function = op;
+	if (opal_err_injct(&ej) != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+
+static int powernv_errinjct_ioa64_virt(struct rtas_args *args)
+{
+	uint32_t addr_hi, addr_lo, mask_hi, mask_lo;
+	uint32_t cfg_addr, buid_hi, buid_lo, op;
+	uint64_t buf_addr = ((uint64_t)(args->args[3])) << 32 |
+			    args->args[4];
+	void __user *buf = (void __user *)buf_addr;
+	struct eeh_vfio_pci_addr vfio_addr;
+	struct pnv_phb *phb;
+	struct eeh_pe *pe;
+	struct OpalErrinjct ej;
+
+	/* Extract parameters */
+	if (get_user(addr_hi, (uint32_t __user *)buf) ||
+	    get_user(addr_lo, (uint32_t __user *)(buf + 4)) ||
+	    get_user(mask_hi, (uint32_t __user *)(buf + 8)) ||
+	    get_user(mask_lo, (uint32_t __user *)(buf + 12)) ||
+	    get_user(cfg_addr, (uint32_t __user *)(buf + 16)) ||
+	    get_user(buid_hi, (uint32_t __user *)(buf + 20)) ||
+	    get_user(buid_lo, (uint32_t __user *)(buf + 24)) ||
+	    get_user(op, (uint32_t __user *)(buf + 28)))
+		return -EFAULT;
+
+	/* Check opcode */
+	if (op < OpalEjtIoaLoadMemAddr ||
+	    op > OpalEjtIoaDmaWriteMemTarget)
+		return -EINVAL;
+
+	/* Find PE */
+	vfio_addr.buid = ((((uint64_t)buid_hi) << 32) | buid_lo);
+	vfio_addr.pe_addr = (cfg_addr >> 8) & 0xffff;
+	pe = eeh_vfio_pe_get(&vfio_addr);
+	if (!pe)
+		return -ENODEV;
+	phb = pe->phb->private_data;
+
+	/* OPAL call */
+	ej.type = OpalErrinjctTypeIoaBusError64;
+	ej.ioa.addr = (((uint64_t)addr_hi) << 32) | addr_lo;
+	ej.ioa.mask = (((uint64_t)mask_hi) << 32) | mask_lo;
+	ej.ioa.phb_id = phb->opal_id;
+	ej.ioa.pe = pe->addr;
+	ej.ioa.function = op;
+	if (opal_err_injct(&ej) != OPAL_SUCCESS)
+		return -EIO;
+
+	return 0;
+}
+#endif /* CONFIG_VFIO_EEH */
+
+struct errinjct_handler {
+	bool virt;
+	int token;
+	int (*fn)(struct rtas_args *arg);
+};
+
+static struct errinjct_handler handlers[] = {
+#ifdef CONFIG_EEH
+	{ false,
+	  OpalErrinjctTypeIoaBusError,
+	  powernv_errinjct_ioa
+	},
+	{ false,
+	  OpalErrinjctTypeIoaBusError64,
+	  powernv_errinjct_ioa64
+	},
+#endif
+#ifdef CONFIG_VFIO_EEH
+	{ true,
+	  OpalErrinjctTypeIoaBusError,
+	  powernv_errinjct_ioa_virt
+	},
+	{ true,
+	  OpalErrinjctTypeIoaBusError64,
+	  powernv_errinjct_ioa64_virt
+	},
+#endif
+};
+
+static int powernv_errinjct(struct rtas_args *args)
+{
+	struct errinjct_handler *h;
+	int token, ej_token, i;
+	bool virt;
+
+	/* Sanity check */
+	if (args->nargs != 5 || args->nret != 1)
+		return -EINVAL;
+
+	token = args->token;
+	virt = !!args->args[0];
+	if (token != OPAL_CALL_TOKEN_ERRINJCT)
+		return -EINVAL;
+
+	/* Call into specific handler */
+	ej_token = args->args[1];
+	for (i = 0; i < ARRAY_SIZE(handlers); i++) {
+		h = &handlers[i];
+		if (h->virt == virt &&
+		    h->token == ej_token &&
+		    h->fn)
+			return h->fn(args);
+	}
+
+	return -ENXIO;
+}
+
+static int __init powernv_errinjct_init(void)
+{
+	int ret;
+
+	ret = opal_call_handler_register(false, OPAL_CALL_TOKEN_ERRINJCT,
+					 powernv_errinjct);
+	if (ret) {
+		pr_warn("%s: Cannot register errinjct handler\n",
+			__func__);
+		return ret;
+	}
+
+	ret = opal_call_handler_register(true, OPAL_CALL_TOKEN_ERRINJCT,
+					 powernv_errinjct);
+	if (ret) {
+		pr_warn("%s: Cannot register errinjct virtual handler\n",
+			__func__);
+		return ret;
+	}
+
+	return 0;
+}
+
+module_init(powernv_errinjct_init);
-- 
1.8.3.2
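
For readers following the buffer handling in powernv_errinjct_ioa64_virt()
above, here is a minimal userland sketch of the same decoding. It assumes the
layout implied by the get_user() offsets in the patch (eight 32-bit words:
addr hi/lo, mask hi/lo, cfg_addr, buid hi/lo, op). The struct and helper names
(ioa64_virt_req, word_at, put_word, ioa64_virt_decode) are hypothetical and
not part of the series; no kernel or OPAL interface is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the 32-byte request buffer. */
struct ioa64_virt_req {
	uint64_t addr;      /* words at offsets 0 (hi) and 4 (lo) */
	uint64_t mask;      /* words at offsets 8 (hi) and 12 (lo) */
	uint64_t buid;      /* words at offsets 20 (hi) and 24 (lo) */
	uint32_t cfg_addr;  /* word at offset 16 */
	uint32_t pe_addr;   /* derived: (cfg_addr >> 8) & 0xffff */
	uint32_t op;        /* word at offset 28 */
};

/* Read one host-endian 32-bit word, as get_user() does in the patch. */
static uint32_t word_at(const uint8_t *buf, size_t off)
{
	uint32_t w;

	assert(off + sizeof(w) <= 32);
	memcpy(&w, buf + off, sizeof(w));
	return w;
}

/* Store one host-endian 32-bit word; used to build a request buffer. */
static void put_word(uint8_t *buf, size_t off, uint32_t w)
{
	assert(off + sizeof(w) <= 32);
	memcpy(buf + off, &w, sizeof(w));
}

/* Combine the hi/lo halves the same way the patch does. */
static void ioa64_virt_decode(const uint8_t *buf, struct ioa64_virt_req *r)
{
	r->addr = ((uint64_t)word_at(buf, 0) << 32) | word_at(buf, 4);
	r->mask = ((uint64_t)word_at(buf, 8) << 32) | word_at(buf, 12);
	r->cfg_addr = word_at(buf, 16);
	r->buid = ((uint64_t)word_at(buf, 20) << 32) | word_at(buf, 24);
	r->op = word_at(buf, 28);
	r->pe_addr = (r->cfg_addr >> 8) & 0xffff;
}
```

A caller would fill a 32-byte buffer with put_word() at offsets 0 to 28 and
pass it to ioa64_virt_decode(). The cast to uint64_t before the shift matters:
shifting a 32-bit value left by 32 is undefined in C.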


* Re: [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest
  2014-05-09  7:49 [PATCH RFC v2 00/10] EEH Support for VFIO PCI devices on PowerKVM guest Gavin Shan
                   ` (9 preceding siblings ...)
  2014-05-09  7:49 ` [PATCH 10/10] powerpc/powernv: Error injection infrastructure Gavin Shan
@ 2014-05-09  7:54 ` Gavin Shan
  10 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2014-05-09  7:54 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, agraf, kvm-ppc, alex.williamson, qiudayu, linuxppc-dev

On Fri, May 09, 2014 at 05:49:32PM +1000, Gavin Shan wrote:

Sorry for having missed cc'ing Alex Graf. Amending it.

>This series of patches adds EEH support for PCI devices that are passed
>through to PowerKVM based guests via VFIO. The implementation follows
>directly from the problems we have to solve to support EEH for PowerKVM
>based guests.
>
>- Emulation for EEH RTAS requests. All EEH RTAS requests go to QEMU first.
>  If QEMU can't handle a request, it is sent to the host via the newly
>  introduced VFIO container IOCTL command (VFIO_EEH_INFO) and handled in the
>  host kernel.
>
>- The error injection infrastructure needs to support requests from both the
>  userland utility "errinjct" and PowerKVM based guests. "errinjct" already
>  works well on the pSeries platform through a dedicated syscall, which invokes
>  the RTAS service to carry out error injection in the kernel. From that
>  perspective, it's reasonable to extend the syscall to support the PowerNV
>  platform so that an OPAL call can be invoked in the host kernel to inject
>  errors. The data passed between userland and kernel still follows
>  "struct rtas_args" in both cases, PowerNV (OPAL) and pSeries (RTAS).
>
>The series of patches requires corresponding firmware changes from Mike Qiu to
>support error injection, and QEMU changes to support EEH for guests. The QEMU
>patchset will be sent separately.
>
>Change log
>==========
>v1 -> v2:
>	* EEH RTAS requests are routed to QEMU, and then possibly to the host
>	  kernel. The KVM in-kernel handling mechanism is dropped.
>	* Error injection is reimplemented on top of the syscall, instead of KVM
>	  in-kernel handling. The logic for error injection token management is
>	  moved to QEMU. The error injection request is routed to QEMU and then
>	  possibly to the host kernel.
>
>Testing on P7
>=============
>
>- Emulex adapter
>
>Testing on P8
>=============
>
>- Need more testing after design is finalized.
>
>-----
>
>Gavin Shan (10):
>  drivers/vfio: Introduce CONFIG_VFIO_EEH
>  powerpc/eeh: Info to trace passed devices
>  powerpc/eeh: Search EEH device by guest address
>  powerpc/eeh: Search EEH PE by guest address
>  drivers/vfio: New IOCTL command VFIO_EEH_INFO
>  powerpc/eeh: Avoid event on passed PE
>  powerpc/powernv: Sync OPAL header file with firmware
>  powerpc: Extend syscall ppc_rtas()
>  powerpc/powernv: Implement ppc_call_opal()
>  powerpc/powernv: Error injection infrastructure
>
>arch/powerpc/include/asm/eeh.h                 |  52 +++++++++++++
>arch/powerpc/include/asm/opal.h                |  74 +++++++++++++++++-
>arch/powerpc/include/asm/rtas.h                |  10 ++-
>arch/powerpc/include/asm/syscalls.h            |   2 +-
>arch/powerpc/include/asm/systbl.h              |   2 +-
>arch/powerpc/include/uapi/asm/unistd.h         |   2 +-
>arch/powerpc/kernel/eeh.c                      |   8 ++
>arch/powerpc/kernel/eeh_pe.c                   |  80 +++++++++++++++++++
>arch/powerpc/kernel/rtas.c                     |  57 +++-----------
>arch/powerpc/kernel/syscalls.c                 |  50 ++++++++++++
>arch/powerpc/platforms/powernv/Makefile        |   3 +-
>arch/powerpc/platforms/powernv/eeh-ioda.c      |   3 +-
>arch/powerpc/platforms/powernv/eeh-vfio.c      | 584 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>arch/powerpc/platforms/powernv/errinject.c     | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>arch/powerpc/platforms/powernv/opal.c          |  93 ++++++++++++++++++++++
>drivers/vfio/Kconfig                           |   6 ++
>drivers/vfio/vfio_iommu_spapr_tce.c            |  12 +++
>include/uapi/linux/vfio.h                      |  61 +++++++++++++++
>kernel/sys_ni.c                                |   2 +-
>20 files changed, 1271 insertions(+), 53 deletions(-)
>create mode 100644 arch/powerpc/platforms/powernv/eeh-vfio.c
>create mode 100644 arch/powerpc/platforms/powernv/errinject.c
>
>Thanks,
>Gavin
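
To make the "struct rtas_args" transport described in the cover letter more
concrete, the sketch below packs a request in the order that powernv_errinjct()
in patch 10/10 reads it (nargs = 5, nret = 1, args[0] = virt flag, args[1] =
error injection token, args[3]/args[4] = buffer address hi/lo) and then looks
up a handler by (virt, token). It is only an illustration under assumptions:
demo_rtas_args is a hypothetical userland mirror of the fields used, the token
numbers are placeholders rather than real OPAL values, only the guest (virt)
table entries are modelled, and the stub handlers do nothing but return a tag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the rtas_args fields that the handler reads. */
struct demo_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[16];
};

/* Placeholder numbers, NOT the real OPAL token values. */
enum {
	DEMO_CALL_TOKEN_ERRINJCT = 1,
	DEMO_EJ_IOA_BUS_ERROR    = 10,
	DEMO_EJ_IOA_BUS_ERROR64  = 11,
};

/* Fill the five input words in the order the patch reads them. */
static void demo_pack(struct demo_rtas_args *a, bool virt,
		      uint32_t ej_token, uint64_t buf_addr)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));
	a->token = DEMO_CALL_TOKEN_ERRINJCT;
	a->nargs = 5;
	a->nret = 1;
	a->args[0] = virt ? 1 : 0;
	a->args[1] = ej_token;
	/* args[2] is not read by the code shown in the patch; left as 0. */
	a->args[3] = (uint32_t)(buf_addr >> 32);
	a->args[4] = (uint32_t)buf_addr;
}

struct demo_handler {
	bool virt;
	uint32_t token;
	int (*fn)(const struct demo_rtas_args *args);
};

/* Stub handlers: return a tag so the caller can see which one ran. */
static int demo_ioa(const struct demo_rtas_args *args)
{
	(void)args;
	return 32;
}

static int demo_ioa64(const struct demo_rtas_args *args)
{
	(void)args;
	return 64;
}

static const struct demo_handler demo_handlers[] = {
	{ true, DEMO_EJ_IOA_BUS_ERROR,   demo_ioa },
	{ true, DEMO_EJ_IOA_BUS_ERROR64, demo_ioa64 },
};

/* Sanity-check the request, then dispatch on (virt, token). */
static int demo_dispatch(const struct demo_rtas_args *a)
{
	bool virt = !!a->args[0];
	size_t i;

	if (a->nargs != 5 || a->nret != 1 ||
	    a->token != DEMO_CALL_TOKEN_ERRINJCT)
		return -EINVAL;

	for (i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++) {
		const struct demo_handler *h = &demo_handlers[i];

		if (h->virt == virt && h->token == a->args[1])
			return h->fn(a);
	}

	return -ENXIO;
}
```

Splitting the 64-bit buffer address across args[3] and args[4] keeps every
argument a 32-bit word, which is what lets the same structure serve both the
RTAS and the OPAL path.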


