[PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2)

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2)
@ 2018-10-15  9:32 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

This is a second set of patches required for passing through NVIDIA V100
with coherent memory. The full patchset is here:
https://github.com/aik/linux/tree/nv2
The matching QEMU is here:
https://github.com/aik/qemu/tree/nv2

This patchset reworks NPU DMA code to be used with VFIO. The exported symbols
are called from the NVIDIA driver so they must be able to work in both
powernv and pseries platforms.

This depends on skiboot's "npu2: Clear XTS_BDF_MAP when destroying context for next init_context"
posted earlier today.

Please comment. Thanks.

Alexey Kardashevskiy (5):
  powerpc/powernv/npu: Add helper to access struct npu for NPU device
  powerpc/powernv/npu: Collect all static symbols under one struct
  powerpc/powernv: Detach npu struct from pnv_phb
  powerpc/powernv/npu: Factor out OPAL calls from context manipulation
  powerpc/powernv/npu: Add helper to map GPU to LPAR

 arch/powerpc/include/asm/pci.h            |   4 +
 arch/powerpc/platforms/powernv/pci.h      |  18 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 298 ++++++++++++++++++++----------
 arch/powerpc/platforms/powernv/pci-ioda.c |   9 +-
 4 files changed, 212 insertions(+), 117 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2)
@ 2018-10-15  9:32 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

This is a second set of patches required for passing through NVIDIA V100
with coherent memory. The full patchset is here:
https://github.com/aik/linux/tree/nv2
The matching QEMU is here:
https://github.com/aik/qemu/tree/nv2

This patchset reworks NPU DMA code to be used with VFIO. The exported symbols
are called from the NVIDIA driver so they must be able to work in both
powernv and pseries platforms.

This depends on skiboot's "npu2: Clear XTS_BDF_MAP when destroying context for next init_context"
posted earlier today.

Please comment. Thanks.

Alexey Kardashevskiy (5):
  powerpc/powernv/npu: Add helper to access struct npu for NPU device
  powerpc/powernv/npu: Collect all static symbols under one struct
  powerpc/powernv: Detach npu struct from pnv_phb
  powerpc/powernv/npu: Factor out OPAL calls from context manipulation
  powerpc/powernv/npu: Add helper to map GPU to LPAR

 arch/powerpc/include/asm/pci.h            |   4 +
 arch/powerpc/platforms/powernv/pci.h      |  18 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 298 ++++++++++++++++++++----------
 arch/powerpc/platforms/powernv/pci-ioda.c |   9 +-
 4 files changed, 212 insertions(+), 117 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

This step is to help removing the npu struct from pnv_phb so it
can be used by pseries as well.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 3a5c4ed..13e5153 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -389,6 +389,18 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 	return gpe;
 }
 
+/*
+ * NPU2 ATS
+ */
+static struct npu *npdev_to_npu(struct pci_dev *npdev)
+{
+	struct pnv_phb *nphb;
+
+	nphb = pci_bus_to_host(npdev->bus)->private_data;
+
+	return &nphb->npu;
+}
+
 /* Maximum number of nvlinks per npu */
 #define NV_MAX_LINKS 6
 
@@ -546,7 +558,6 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 	int i, j;
 	struct npu *npu;
 	struct pci_dev *npdev;
-	struct pnv_phb *nphb;
 
 	for (i = 0; i <= max_npu2_index; i++) {
 		mmio_atsd_reg[i].reg = -1;
@@ -561,8 +572,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 			if (!npdev)
 				continue;
 
-			nphb = pci_bus_to_host(npdev->bus)->private_data;
-			npu = &nphb->npu;
+			npu = npdev_to_npu(npdev);
 			mmio_atsd_reg[i].npu = npu;
 			mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
 			while (mmio_atsd_reg[i].reg < 0) {
@@ -749,7 +759,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	}
 
 	nphb = pci_bus_to_host(npdev->bus)->private_data;
-	npu = &nphb->npu;
+	npu = npdev_to_npu(npdev);
 
 	/*
 	 * Setup the NPU context table for a particular GPU. These need to be
@@ -869,7 +879,7 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 		return;
 
 	nphb = pci_bus_to_host(npdev->bus)->private_data;
-	npu = &nphb->npu;
+	npu = npdev_to_npu(npdev);
 	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
 	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
 							&nvlink_index)))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

This step is to help removing the npu struct from pnv_phb so it
can be used by pseries as well.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 3a5c4ed..13e5153 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -389,6 +389,18 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 	return gpe;
 }
 
+/*
+ * NPU2 ATS
+ */
+static struct npu *npdev_to_npu(struct pci_dev *npdev)
+{
+	struct pnv_phb *nphb;
+
+	nphb = pci_bus_to_host(npdev->bus)->private_data;
+
+	return &nphb->npu;
+}
+
 /* Maximum number of nvlinks per npu */
 #define NV_MAX_LINKS 6
 
@@ -546,7 +558,6 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 	int i, j;
 	struct npu *npu;
 	struct pci_dev *npdev;
-	struct pnv_phb *nphb;
 
 	for (i = 0; i <= max_npu2_index; i++) {
 		mmio_atsd_reg[i].reg = -1;
@@ -561,8 +572,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 			if (!npdev)
 				continue;
 
-			nphb = pci_bus_to_host(npdev->bus)->private_data;
-			npu = &nphb->npu;
+			npu = npdev_to_npu(npdev);
 			mmio_atsd_reg[i].npu = npu;
 			mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
 			while (mmio_atsd_reg[i].reg < 0) {
@@ -749,7 +759,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	}
 
 	nphb = pci_bus_to_host(npdev->bus)->private_data;
-	npu = &nphb->npu;
+	npu = npdev_to_npu(npdev);
 
 	/*
 	 * Setup the NPU context table for a particular GPU. These need to be
@@ -869,7 +879,7 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 		return;
 
 	nphb = pci_bus_to_host(npdev->bus)->private_data;
-	npu = &nphb->npu;
+	npu = npdev_to_npu(npdev);
 	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
 	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
 							&nvlink_index)))
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

We are going to add a global list of NPUs in the system which is going
to be yet another static symbol. Let's reorganise the code and put all
static symbols into one struct for better tracking what is really needed
for NPU (this might become a driver data some day).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h            |  1 +
 arch/powerpc/platforms/powernv/npu-dma.c  | 77 ++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci-ioda.c |  2 +
 3 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 2af9ded..1a96075 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -129,5 +129,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
 
 extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
+extern void pnv_npu2_devices_init(void);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 13e5153..01402f9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -22,20 +22,6 @@
 #include "pci.h"
 
 /*
- * spinlock to protect initialisation of an npu_context for a particular
- * mm_struct.
- */
-static DEFINE_SPINLOCK(npu_context_lock);
-
-/*
- * When an address shootdown range exceeds this threshold we invalidate the
- * entire TLB on the GPU for the given PID rather than each specific address in
- * the range.
- */
-static uint64_t atsd_threshold = 2 * 1024 * 1024;
-static struct dentry *atsd_threshold_dentry;
-
-/*
  * Other types of TCE cache invalidation are not functional in the
  * hardware.
  */
@@ -392,6 +378,33 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 /*
  * NPU2 ATS
  */
+static struct {
+	/*
+	 * spinlock to protect initialisation of an npu_context for
+	 * a particular mm_struct.
+	 */
+	spinlock_t context_lock;
+
+	/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
+	int max_index;
+
+	/*
+	 * When an address shootdown range exceeds this threshold we invalidate the
+	 * entire TLB on the GPU for the given PID rather than each specific address in
+	 * the range.
+	 */
+	uint64_t atsd_threshold;
+	struct dentry *atsd_threshold_dentry;
+
+} npu2_devices;
+
+void pnv_npu2_devices_init(void)
+{
+	memset(&npu2_devices, 0, sizeof(npu2_devices));
+	spin_lock_init(&npu2_devices.context_lock);
+	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
+}
+
 static struct npu *npdev_to_npu(struct pci_dev *npdev)
 {
 	struct pnv_phb *nphb;
@@ -404,9 +417,6 @@ static struct npu *npdev_to_npu(struct pci_dev *npdev)
 /* Maximum number of nvlinks per npu */
 #define NV_MAX_LINKS 6
 
-/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
-static int max_npu2_index;
-
 struct npu_context {
 	struct mm_struct *mm;
 	struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
@@ -472,7 +482,7 @@ static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
 	int i;
 	unsigned long launch;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -503,7 +513,7 @@ static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
 	int i;
 	unsigned long launch;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -536,7 +546,7 @@ static void mmio_invalidate_wait(
 	int i, reg;
 
 	/* Wait for all invalidations to complete */
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -559,7 +569,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 	struct npu *npu;
 	struct pci_dev *npdev;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		mmio_atsd_reg[i].reg = -1;
 		for (j = 0; j < NV_MAX_LINKS; j++) {
 			/*
@@ -593,7 +603,7 @@ static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
 {
 	int i;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		/*
 		 * We can't rely on npu_context->npdev[][] being the same here
 		 * as when acquire_atsd_reg() was called, hence we use the
@@ -683,7 +693,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 	struct npu_context *npu_context = mn_to_npu_context(mn);
 	unsigned long address;
 
-	if (end - start > atsd_threshold) {
+	if (end - start > npu2_devices.atsd_threshold) {
 		/*
 		 * Just invalidate the entire PID if the address range is too
 		 * large.
@@ -777,12 +787,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 * We store the npu pci device so we can more easily get at the
 	 * associated npus.
 	 */
-	spin_lock(&npu_context_lock);
+	spin_lock(&npu2_devices.context_lock);
 	npu_context = mm->context.npu_context;
 	if (npu_context) {
 		if (npu_context->release_cb != cb ||
 			npu_context->priv != priv) {
-			spin_unlock(&npu_context_lock);
+			spin_unlock(&npu2_devices.context_lock);
 			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
 						PCI_DEVID(gpdev->bus->number,
 							gpdev->devfn));
@@ -791,12 +801,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 
 		WARN_ON(!kref_get_unless_zero(&npu_context->kref));
 	}
-	spin_unlock(&npu_context_lock);
+	spin_unlock(&npu2_devices.context_lock);
 
 	if (!npu_context) {
 		/*
 		 * We can set up these fields without holding the
-		 * npu_context_lock as the npu_context hasn't been returned to
+		 * npu2_devices.context_lock as the npu_context hasn't been returned to
 		 * the caller meaning it can't be destroyed. Parallel allocation
 		 * is protected against by mmap_sem.
 		 */
@@ -887,9 +897,9 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
 	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
 				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
-	spin_lock(&npu_context_lock);
+	spin_lock(&npu2_devices.context_lock);
 	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
-	spin_unlock(&npu_context_lock);
+	spin_unlock(&npu2_devices.context_lock);
 
 	/*
 	 * We need to do this outside of pnv_npu2_release_context so that it is
@@ -958,9 +968,10 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	static int npu_index;
 	uint64_t rc = 0;
 
-	if (!atsd_threshold_dentry) {
-		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
-				   0600, powerpc_debugfs_root, &atsd_threshold);
+	if (!npu2_devices.atsd_threshold_dentry) {
+		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
+				"atsd_threshold", 0600, powerpc_debugfs_root,
+				&npu2_devices.atsd_threshold);
 	}
 
 	phb->npu.nmmu_flush =
@@ -988,7 +999,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	npu_index++;
 	if (WARN_ON(npu_index >= NV_MAX_NPUS))
 		return -ENOSPC;
-	max_npu2_index = npu_index;
+	npu2_devices.max_index = npu_index;
 	phb->npu.index = npu_index;
 
 	return 0;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e37b9cc..0cc81c0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1279,6 +1279,8 @@ static void pnv_pci_ioda_setup_PEs(void)
 	struct pci_bus *bus;
 	struct pci_dev *pdev;
 
+	pnv_npu2_devices_init();
+
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 		if (phb->type == PNV_PHB_NPU_NVLINK) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

We are going to add a global list of NPUs in the system which is going
to be yet another static symbol. Let's reorganise the code and put all
static symbols into one struct for better tracking what is really needed
for NPU (this might become a driver data some day).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h            |  1 +
 arch/powerpc/platforms/powernv/npu-dma.c  | 77 ++++++++++++++++++-------------
 arch/powerpc/platforms/powernv/pci-ioda.c |  2 +
 3 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 2af9ded..1a96075 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -129,5 +129,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
 
 extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
+extern void pnv_npu2_devices_init(void);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 13e5153..01402f9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -22,20 +22,6 @@
 #include "pci.h"
 
 /*
- * spinlock to protect initialisation of an npu_context for a particular
- * mm_struct.
- */
-static DEFINE_SPINLOCK(npu_context_lock);
-
-/*
- * When an address shootdown range exceeds this threshold we invalidate the
- * entire TLB on the GPU for the given PID rather than each specific address in
- * the range.
- */
-static uint64_t atsd_threshold = 2 * 1024 * 1024;
-static struct dentry *atsd_threshold_dentry;
-
-/*
  * Other types of TCE cache invalidation are not functional in the
  * hardware.
  */
@@ -392,6 +378,33 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 /*
  * NPU2 ATS
  */
+static struct {
+	/*
+	 * spinlock to protect initialisation of an npu_context for
+	 * a particular mm_struct.
+	 */
+	spinlock_t context_lock;
+
+	/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
+	int max_index;
+
+	/*
+	 * When an address shootdown range exceeds this threshold we invalidate the
+	 * entire TLB on the GPU for the given PID rather than each specific address in
+	 * the range.
+	 */
+	uint64_t atsd_threshold;
+	struct dentry *atsd_threshold_dentry;
+
+} npu2_devices;
+
+void pnv_npu2_devices_init(void)
+{
+	memset(&npu2_devices, 0, sizeof(npu2_devices));
+	spin_lock_init(&npu2_devices.context_lock);
+	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
+}
+
 static struct npu *npdev_to_npu(struct pci_dev *npdev)
 {
 	struct pnv_phb *nphb;
@@ -404,9 +417,6 @@ static struct npu *npdev_to_npu(struct pci_dev *npdev)
 /* Maximum number of nvlinks per npu */
 #define NV_MAX_LINKS 6
 
-/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
-static int max_npu2_index;
-
 struct npu_context {
 	struct mm_struct *mm;
 	struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
@@ -472,7 +482,7 @@ static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
 	int i;
 	unsigned long launch;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -503,7 +513,7 @@ static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
 	int i;
 	unsigned long launch;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -536,7 +546,7 @@ static void mmio_invalidate_wait(
 	int i, reg;
 
 	/* Wait for all invalidations to complete */
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		if (mmio_atsd_reg[i].reg < 0)
 			continue;
 
@@ -559,7 +569,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
 	struct npu *npu;
 	struct pci_dev *npdev;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		mmio_atsd_reg[i].reg = -1;
 		for (j = 0; j < NV_MAX_LINKS; j++) {
 			/*
@@ -593,7 +603,7 @@ static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
 {
 	int i;
 
-	for (i = 0; i <= max_npu2_index; i++) {
+	for (i = 0; i <= npu2_devices.max_index; i++) {
 		/*
 		 * We can't rely on npu_context->npdev[][] being the same here
 		 * as when acquire_atsd_reg() was called, hence we use the
@@ -683,7 +693,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 	struct npu_context *npu_context = mn_to_npu_context(mn);
 	unsigned long address;
 
-	if (end - start > atsd_threshold) {
+	if (end - start > npu2_devices.atsd_threshold) {
 		/*
 		 * Just invalidate the entire PID if the address range is too
 		 * large.
@@ -777,12 +787,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 * We store the npu pci device so we can more easily get at the
 	 * associated npus.
 	 */
-	spin_lock(&npu_context_lock);
+	spin_lock(&npu2_devices.context_lock);
 	npu_context = mm->context.npu_context;
 	if (npu_context) {
 		if (npu_context->release_cb != cb ||
 			npu_context->priv != priv) {
-			spin_unlock(&npu_context_lock);
+			spin_unlock(&npu2_devices.context_lock);
 			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
 						PCI_DEVID(gpdev->bus->number,
 							gpdev->devfn));
@@ -791,12 +801,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 
 		WARN_ON(!kref_get_unless_zero(&npu_context->kref));
 	}
-	spin_unlock(&npu_context_lock);
+	spin_unlock(&npu2_devices.context_lock);
 
 	if (!npu_context) {
 		/*
 		 * We can set up these fields without holding the
-		 * npu_context_lock as the npu_context hasn't been returned to
+		 * npu2_devices.context_lock as the npu_context hasn't been returned to
 		 * the caller meaning it can't be destroyed. Parallel allocation
 		 * is protected against by mmap_sem.
 		 */
@@ -887,9 +897,9 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
 	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
 				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
-	spin_lock(&npu_context_lock);
+	spin_lock(&npu2_devices.context_lock);
 	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
-	spin_unlock(&npu_context_lock);
+	spin_unlock(&npu2_devices.context_lock);
 
 	/*
 	 * We need to do this outside of pnv_npu2_release_context so that it is
@@ -958,9 +968,10 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	static int npu_index;
 	uint64_t rc = 0;
 
-	if (!atsd_threshold_dentry) {
-		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
-				   0600, powerpc_debugfs_root, &atsd_threshold);
+	if (!npu2_devices.atsd_threshold_dentry) {
+		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
+				"atsd_threshold", 0600, powerpc_debugfs_root,
+				&npu2_devices.atsd_threshold);
 	}
 
 	phb->npu.nmmu_flush @@ -988,7 +999,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	npu_index++;
 	if (WARN_ON(npu_index >= NV_MAX_NPUS))
 		return -ENOSPC;
-	max_npu2_index = npu_index;
+	npu2_devices.max_index = npu_index;
 	phb->npu.index = npu_index;
 
 	return 0;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e37b9cc..0cc81c0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1279,6 +1279,8 @@ static void pnv_pci_ioda_setup_PEs(void)
 	struct pci_bus *bus;
 	struct pci_dev *pdev;
 
+	pnv_npu2_devices_init();
+
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
 		if (phb->type = PNV_PHB_NPU_NVLINK) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in in the pci_controller struct; and even if it did,
it would be a different data structure.

This adds a global list of NPUs so each platform can register and use
these in the same fashion.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci.h     | 16 -------
 arch/powerpc/platforms/powernv/npu-dma.c | 71 +++++++++++++++++++++++++-------
 2 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8b37b28..3b7617d 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -8,9 +8,6 @@
 
 struct pci_dn;
 
-/* Maximum possible number of ATSD MMIO registers per NPU */
-#define NV_NMMU_ATSD_REGS 8
-
 enum pnv_phb_type {
 	PNV_PHB_IODA1		= 0,
 	PNV_PHB_IODA2		= 1,
@@ -180,19 +177,6 @@ struct pnv_phb {
 	unsigned int		diag_data_size;
 	u8			*diag_data;
 
-	/* Nvlink2 data */
-	struct npu {
-		int index;
-		__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
-		unsigned int mmio_atsd_count;
-
-		/* Bitmask for MMIO register usage */
-		unsigned long mmio_atsd_usage;
-
-		/* Do we need to explicitly flush the nest mmu? */
-		bool nmmu_flush;
-	} npu;
-
 	int p2p_target_count;
 };
 
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 01402f9..cb2b4f9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -378,6 +378,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 /*
  * NPU2 ATS
  */
+/* Maximum possible number of ATSD MMIO registers per NPU */
+#define NV_NMMU_ATSD_REGS 8
+
+struct npu {
+	int index;
+	__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
+	unsigned int mmio_atsd_count;
+
+	/* Bitmask for MMIO register usage */
+	unsigned long mmio_atsd_usage;
+
+	/* Do we need to explicitly flush the nest mmu? */
+	bool nmmu_flush;
+
+	struct list_head next;
+
+	struct pci_controller *hose;
+};
+
 static struct {
 	/*
 	 * spinlock to protect initialisation of an npu_context for
@@ -396,22 +415,27 @@ static struct {
 	uint64_t atsd_threshold;
 	struct dentry *atsd_threshold_dentry;
 
+	struct list_head npu_list;
 } npu2_devices;
 
 void pnv_npu2_devices_init(void)
 {
 	memset(&npu2_devices, 0, sizeof(npu2_devices));
+	INIT_LIST_HEAD(&npu2_devices.npu_list);
 	spin_lock_init(&npu2_devices.context_lock);
 	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
 }
 
 static struct npu *npdev_to_npu(struct pci_dev *npdev)
 {
-	struct pnv_phb *nphb;
+	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
+	struct npu *npu;
 
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
+	list_for_each_entry(npu, &npu2_devices.npu_list, next)
+		if (hose == npu->hose)
+			return npu;
 
-	return &nphb->npu;
+	return NULL;
 }
 
 /* Maximum number of nvlinks per npu */
@@ -843,7 +867,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 */
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
 
-	if (!nphb->npu.nmmu_flush) {
+	if (!npu->nmmu_flush) {
 		/*
 		 * If we're not explicitly flushing ourselves we need to mark
 		 * the thread for global flushes
@@ -967,6 +991,13 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	struct pci_dev *gpdev;
 	static int npu_index;
 	uint64_t rc = 0;
+	struct pci_controller *hose = phb->hose;
+	struct npu *npu;
+	int ret;
+
+	npu = kzalloc(sizeof(*npu), GFP_KERNEL);
+	if (!npu)
+		return -ENOMEM;
 
 	if (!npu2_devices.atsd_threshold_dentry) {
 		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
@@ -974,8 +1005,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
 				&npu2_devices.atsd_threshold);
 	}
 
-	phb->npu.nmmu_flush =
-		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
+	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
 	for_each_child_of_node(phb->hose->dn, dn) {
 		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
 		if (gpdev) {
@@ -989,18 +1019,31 @@ int pnv_npu2_init(struct pnv_phb *phb)
 		}
 	}
 
-	for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
+	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
 							i, &mmio_atsd); i++)
-		phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
+		npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
 
-	pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
-	phb->npu.mmio_atsd_count = i;
-	phb->npu.mmio_atsd_usage = 0;
+	pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
+	npu->mmio_atsd_count = i;
+	npu->mmio_atsd_usage = 0;
 	npu_index++;
-	if (WARN_ON(npu_index >= NV_MAX_NPUS))
-		return -ENOSPC;
+	if (WARN_ON(npu_index >= NV_MAX_NPUS)) {
+		ret = -ENOSPC;
+		goto fail_exit;
+	}
 	npu2_devices.max_index = npu_index;
-	phb->npu.index = npu_index;
+	npu->index = npu_index;
+	npu->hose = hose;
+
+	list_add(&npu->next, &npu2_devices.npu_list);
 
 	return 0;
+
+fail_exit:
+	for (i = 0; i < npu->mmio_atsd_count; ++i)
+		iounmap(npu->mmio_atsd_regs[i]);
+
+	kfree(npu);
+
+	return ret;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb
@ 2018-10-15  9:32   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in in the pci_controller struct; and even if it did,
it would be a different data structure.

This adds a global list of NPUs so each platform can register and use
these in the same fashion.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci.h     | 16 -------
 arch/powerpc/platforms/powernv/npu-dma.c | 71 +++++++++++++++++++++++++-------
 2 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8b37b28..3b7617d 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -8,9 +8,6 @@
 
 struct pci_dn;
 
-/* Maximum possible number of ATSD MMIO registers per NPU */
-#define NV_NMMU_ATSD_REGS 8
-
 enum pnv_phb_type {
 	PNV_PHB_IODA1		= 0,
 	PNV_PHB_IODA2		= 1,
@@ -180,19 +177,6 @@ struct pnv_phb {
 	unsigned int		diag_data_size;
 	u8			*diag_data;
 
-	/* Nvlink2 data */
-	struct npu {
-		int index;
-		__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
-		unsigned int mmio_atsd_count;
-
-		/* Bitmask for MMIO register usage */
-		unsigned long mmio_atsd_usage;
-
-		/* Do we need to explicitly flush the nest mmu? */
-		bool nmmu_flush;
-	} npu;
-
 	int p2p_target_count;
 };
 
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 01402f9..cb2b4f9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -378,6 +378,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
 /*
  * NPU2 ATS
  */
+/* Maximum possible number of ATSD MMIO registers per NPU */
+#define NV_NMMU_ATSD_REGS 8
+
+struct npu {
+	int index;
+	__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
+	unsigned int mmio_atsd_count;
+
+	/* Bitmask for MMIO register usage */
+	unsigned long mmio_atsd_usage;
+
+	/* Do we need to explicitly flush the nest mmu? */
+	bool nmmu_flush;
+
+	struct list_head next;
+
+	struct pci_controller *hose;
+};
+
 static struct {
 	/*
 	 * spinlock to protect initialisation of an npu_context for
@@ -396,22 +415,27 @@ static struct {
 	uint64_t atsd_threshold;
 	struct dentry *atsd_threshold_dentry;
 
+	struct list_head npu_list;
 } npu2_devices;
 
 void pnv_npu2_devices_init(void)
 {
 	memset(&npu2_devices, 0, sizeof(npu2_devices));
+	INIT_LIST_HEAD(&npu2_devices.npu_list);
 	spin_lock_init(&npu2_devices.context_lock);
 	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
 }
 
 static struct npu *npdev_to_npu(struct pci_dev *npdev)
 {
-	struct pnv_phb *nphb;
+	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
+	struct npu *npu;
 
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
+	list_for_each_entry(npu, &npu2_devices.npu_list, next)
+		if (hose = npu->hose)
+			return npu;
 
-	return &nphb->npu;
+	return NULL;
 }
 
 /* Maximum number of nvlinks per npu */
@@ -843,7 +867,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 */
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
 
-	if (!nphb->npu.nmmu_flush) {
+	if (!npu->nmmu_flush) {
 		/*
 		 * If we're not explicitly flushing ourselves we need to mark
 		 * the thread for global flushes
@@ -967,6 +991,13 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	struct pci_dev *gpdev;
 	static int npu_index;
 	uint64_t rc = 0;
+	struct pci_controller *hose = phb->hose;
+	struct npu *npu;
+	int ret;
+
+	npu = kzalloc(sizeof(*npu), GFP_KERNEL);
+	if (!npu)
+		return -ENOMEM;
 
 	if (!npu2_devices.atsd_threshold_dentry) {
 		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
@@ -974,8 +1005,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
 				&npu2_devices.atsd_threshold);
 	}
 
-	phb->npu.nmmu_flush -		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
+	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
 	for_each_child_of_node(phb->hose->dn, dn) {
 		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
 		if (gpdev) {
@@ -989,18 +1019,31 @@ int pnv_npu2_init(struct pnv_phb *phb)
 		}
 	}
 
-	for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
+	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
 							i, &mmio_atsd); i++)
-		phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
+		npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
 
-	pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
-	phb->npu.mmio_atsd_count = i;
-	phb->npu.mmio_atsd_usage = 0;
+	pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
+	npu->mmio_atsd_count = i;
+	npu->mmio_atsd_usage = 0;
 	npu_index++;
-	if (WARN_ON(npu_index >= NV_MAX_NPUS))
-		return -ENOSPC;
+	if (WARN_ON(npu_index >= NV_MAX_NPUS)) {
+		ret = -ENOSPC;
+		goto fail_exit;
+	}
 	npu2_devices.max_index = npu_index;
-	phb->npu.index = npu_index;
+	npu->index = npu_index;
+	npu->hose = hose;
+
+	list_add(&npu->next, &npu2_devices.npu_list);
 
 	return 0;
+
+fail_exit:
+	for (i = 0; i < npu->mmio_atsd_count; ++i)
+		iounmap(npu->mmio_atsd_regs[i]);
+
+	kfree(npu);
+
+	return ret;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:33   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:33 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

At the moment the NPU context init/destroy code calls OPAL. The init
handler in OPAL configures the NPU to pass ATS requests to nested MMU,
the destroy handler does nothing besides sanity checks.

Since the init handler programs the NPU with a wildcard for LPID/PID,
this can be done at the point where a GPU is mapped to an LPAR; it also
makes calling opal_npu_destroy_context() unnecessary in this context
(this will change with VFIO later though).

Also, the pnv_npu2_init() helper does not really need to call OPAL as
well as it inialized an NPU structure and does not interact with GPU or
NPU at that moment.

This moves OPAL calls to a separate helper. With this change, the API
for GPUs does not do any OPAL calls and therefore can be used by both
pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
should be called on powernv only as it does OPAL calls and it takes
an MSR mask which NPU adds to ATS requests so nested MMU knows what
translations are permitted; the VFIO/KVM will not set MSR_HV.

This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h            |   1 +
 arch/powerpc/platforms/powernv/pci.h      |   2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
 arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
 4 files changed, 57 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 1a96075..f196df6 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
 extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
 extern void pnv_npu2_devices_init(void);
+extern int pnv_npu2_init(struct pci_controller *hose);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3b7617d..ca2ce4b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
 extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
 extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
 extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
-extern int pnv_npu2_init(struct pnv_phb *phb);
+extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
 
 /* pci-ioda-tce.c */
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index cb2b4f9..677f30a 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	u32 nvlink_index;
 	struct device_node *nvlink_dn;
 	struct mm_struct *mm = current->mm;
-	struct pnv_phb *nphb;
 	struct npu *npu;
 	struct npu_context *npu_context;
 
@@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 */
 	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return ERR_PTR(-ENODEV);
-
 	if (!npdev)
 		/* No nvlink associated with this GPU device */
 		return ERR_PTR(-ENODEV);
@@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 		return ERR_PTR(-EINVAL);
 	}
 
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
 	npu = npdev_to_npu(npdev);
 
 	/*
-	 * Setup the NPU context table for a particular GPU. These need to be
-	 * per-GPU as we need the tables to filter ATSDs when there are no
-	 * active contexts on a particular GPU. It is safe for these to be
-	 * called concurrently with destroy as the OPAL call takes appropriate
-	 * locks and refcounts on init/destroy.
-	 */
-	rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
-	if (rc < 0)
-		return ERR_PTR(-ENOSPC);
-
-	/*
 	 * We store the npu pci device so we can more easily get at the
 	 * associated npus.
 	 */
@@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 		if (npu_context->release_cb != cb ||
 			npu_context->priv != priv) {
 			spin_unlock(&npu2_devices.context_lock);
-			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
-						PCI_DEVID(gpdev->bus->number,
-							gpdev->devfn));
 			return ERR_PTR(-EINVAL);
 		}
 
@@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 
 		if (rc) {
 			kfree(npu_context);
-			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
-					PCI_DEVID(gpdev->bus->number,
-						gpdev->devfn));
 			return ERR_PTR(rc);
 		}
 
@@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 			struct pci_dev *gpdev)
 {
 	int removed;
-	struct pnv_phb *nphb;
 	struct npu *npu;
 	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
 	struct device_node *nvlink_dn;
@@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 	if (WARN_ON(!npdev))
 		return;
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return;
-
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
 	npu = npdev_to_npu(npdev);
 	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
 	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
 							&nvlink_index)))
 		return;
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
-	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
 	spin_lock(&npu2_devices.context_lock);
 	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
 	spin_unlock(&npu2_devices.context_lock);
@@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
 	/* mmap_sem should be held so the struct_mm must be present */
 	struct mm_struct *mm = context->mm;
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return -ENODEV;
-
 	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
 
 	for (i = 0; i < count; i++) {
@@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
 }
 EXPORT_SYMBOL(pnv_npu2_handle_fault);
 
-int pnv_npu2_init(struct pnv_phb *phb)
+int pnv_npu2_init(struct pci_controller *hose)
 {
 	unsigned int i;
 	u64 mmio_atsd;
-	struct device_node *dn;
-	struct pci_dev *gpdev;
 	static int npu_index;
-	uint64_t rc = 0;
-	struct pci_controller *hose = phb->hose;
 	struct npu *npu;
 	int ret;
 
@@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	}
 
 	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
-	for_each_child_of_node(phb->hose->dn, dn) {
-		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
-		if (gpdev) {
-			rc = opal_npu_map_lpar(phb->opal_id,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
-				0, 0);
-			if (rc)
-				dev_err(&gpdev->dev,
-					"Error %lld mapping device to LPAR\n",
-					rc);
-		}
-	}
 
 	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
 							i, &mmio_atsd); i++)
@@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
 
 	return ret;
 }
+
+void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
+{
+	struct pci_dev *gpdev;
+	struct device_node *dn;
+	int ret;
+	struct pci_controller *hose = nphb->hose;
+
+	for_each_child_of_node(hose->dn, dn) {
+		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
+		if (!gpdev)
+			continue;
+
+		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
+		ret = opal_npu_map_lpar(nphb->opal_id,
+				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
+				0, 0);
+		if (!ret)
+			continue;
+		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
+	}
+
+	/*
+	 * It seems that touching NPU2_XTS_BDF_MAP in the way
+	 * the opal_npu_map_lpar() does somehow affects the result of
+	 * what opal_npu_init_context() does so let's put the latter in
+	 * a separate loop.
+	 */
+	for_each_child_of_node(hose->dn, dn) {
+		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
+		if (!gpdev)
+			continue;
+
+		/*
+		 * Setup the NPU context table for a particular GPU. These need
+		 * to be per-GPU as we need the tables to filter ATSDs when
+		 * there are no active contexts on a particular GPU. It is safe
+		 * for these to be called concurrently with destroy as the OPAL
+		 * call takes appropriate locks and refcounts on init/destroy.
+		 */
+		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
+				nphb->opal_id);
+		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
+				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+		if (!ret)
+			continue;
+		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
+	}
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0cc81c0..56a1398 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
 			/* PE#0 is needed for error reporting */
 			pnv_ioda_reserve_pe(phb, 0);
 			pnv_ioda_setup_npu_PEs(hose->bus);
-			if (phb->model == PNV_PHB_MODEL_NPU2)
-				pnv_npu2_init(phb);
+			if (phb->model == PNV_PHB_MODEL_NPU2) {
+				pnv_npu2_init(hose);
+				pnv_npu2_map_lpar_phb(phb,
+						MSR_DR | MSR_PR | MSR_HV);
+			}
 		}
 		if (phb->type == PNV_PHB_NPU_OCAPI) {
 			bus = hose->bus;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
@ 2018-10-15  9:33   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:33 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

At the moment the NPU context init/destroy code calls OPAL. The init
handler in OPAL configures the NPU to pass ATS requests to nested MMU,
the destroy handler does nothing besides sanity checks.

Since the init handler programs the NPU with a wildcard for LPID/PID,
this can be done at the point where a GPU is mapped to an LPAR; it also
makes calling opal_npu_destroy_context() unnecessary in this context
(this will change with VFIO later though).

Also, the pnv_npu2_init() helper does not really need to call OPAL as
well as it inialized an NPU structure and does not interact with GPU or
NPU at that moment.

This moves OPAL calls to a separate helper. With this change, the API
for GPUs does not do any OPAL calls and therefore can be used by both
pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
should be called on powernv only as it does OPAL calls and it takes
an MSR mask which NPU adds to ATS requests so nested MMU knows what
translations are permitted; the VFIO/KVM will not set MSR_HV.

This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h            |   1 +
 arch/powerpc/platforms/powernv/pci.h      |   2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
 arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
 4 files changed, 57 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 1a96075..f196df6 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
 extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
 extern void pnv_npu2_devices_init(void);
+extern int pnv_npu2_init(struct pci_controller *hose);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3b7617d..ca2ce4b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
 extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
 extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
 extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
-extern int pnv_npu2_init(struct pnv_phb *phb);
+extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
 
 /* pci-ioda-tce.c */
 #define POWERNV_IOMMU_DEFAULT_LEVELS	1
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index cb2b4f9..677f30a 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	u32 nvlink_index;
 	struct device_node *nvlink_dn;
 	struct mm_struct *mm = current->mm;
-	struct pnv_phb *nphb;
 	struct npu *npu;
 	struct npu_context *npu_context;
 
@@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 	 */
 	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return ERR_PTR(-ENODEV);
-
 	if (!npdev)
 		/* No nvlink associated with this GPU device */
 		return ERR_PTR(-ENODEV);
@@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 		return ERR_PTR(-EINVAL);
 	}
 
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
 	npu = npdev_to_npu(npdev);
 
 	/*
-	 * Setup the NPU context table for a particular GPU. These need to be
-	 * per-GPU as we need the tables to filter ATSDs when there are no
-	 * active contexts on a particular GPU. It is safe for these to be
-	 * called concurrently with destroy as the OPAL call takes appropriate
-	 * locks and refcounts on init/destroy.
-	 */
-	rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
-	if (rc < 0)
-		return ERR_PTR(-ENOSPC);
-
-	/*
 	 * We store the npu pci device so we can more easily get at the
 	 * associated npus.
 	 */
@@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 		if (npu_context->release_cb != cb ||
 			npu_context->priv != priv) {
 			spin_unlock(&npu2_devices.context_lock);
-			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
-						PCI_DEVID(gpdev->bus->number,
-							gpdev->devfn));
 			return ERR_PTR(-EINVAL);
 		}
 
@@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 
 		if (rc) {
 			kfree(npu_context);
-			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
-					PCI_DEVID(gpdev->bus->number,
-						gpdev->devfn));
 			return ERR_PTR(rc);
 		}
 
@@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 			struct pci_dev *gpdev)
 {
 	int removed;
-	struct pnv_phb *nphb;
 	struct npu *npu;
 	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
 	struct device_node *nvlink_dn;
@@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
 	if (WARN_ON(!npdev))
 		return;
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return;
-
-	nphb = pci_bus_to_host(npdev->bus)->private_data;
 	npu = npdev_to_npu(npdev);
 	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
 	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
 							&nvlink_index)))
 		return;
 	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
-	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
 	spin_lock(&npu2_devices.context_lock);
 	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
 	spin_unlock(&npu2_devices.context_lock);
@@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
 	/* mmap_sem should be held so the struct_mm must be present */
 	struct mm_struct *mm = context->mm;
 
-	if (!firmware_has_feature(FW_FEATURE_OPAL))
-		return -ENODEV;
-
 	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
 
 	for (i = 0; i < count; i++) {
@@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
 }
 EXPORT_SYMBOL(pnv_npu2_handle_fault);
 
-int pnv_npu2_init(struct pnv_phb *phb)
+int pnv_npu2_init(struct pci_controller *hose)
 {
 	unsigned int i;
 	u64 mmio_atsd;
-	struct device_node *dn;
-	struct pci_dev *gpdev;
 	static int npu_index;
-	uint64_t rc = 0;
-	struct pci_controller *hose = phb->hose;
 	struct npu *npu;
 	int ret;
 
@@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
 	}
 
 	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
-	for_each_child_of_node(phb->hose->dn, dn) {
-		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
-		if (gpdev) {
-			rc = opal_npu_map_lpar(phb->opal_id,
-				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
-				0, 0);
-			if (rc)
-				dev_err(&gpdev->dev,
-					"Error %lld mapping device to LPAR\n",
-					rc);
-		}
-	}
 
 	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
 							i, &mmio_atsd); i++)
@@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
 
 	return ret;
 }
+
+void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
+{
+	struct pci_dev *gpdev;
+	struct device_node *dn;
+	int ret;
+	struct pci_controller *hose = nphb->hose;
+
+	for_each_child_of_node(hose->dn, dn) {
+		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
+		if (!gpdev)
+			continue;
+
+		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
+		ret = opal_npu_map_lpar(nphb->opal_id,
+				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
+				0, 0);
+		if (!ret)
+			continue;
+		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
+	}
+
+	/*
+	 * It seems that touching NPU2_XTS_BDF_MAP in the way
+	 * the opal_npu_map_lpar() does somehow affects the result of
+	 * what opal_npu_init_context() does so let's put the latter in
+	 * a separate loop.
+	 */
+	for_each_child_of_node(hose->dn, dn) {
+		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
+		if (!gpdev)
+			continue;
+
+		/*
+		 * Setup the NPU context table for a particular GPU. These need
+		 * to be per-GPU as we need the tables to filter ATSDs when
+		 * there are no active contexts on a particular GPU. It is safe
+		 * for these to be called concurrently with destroy as the OPAL
+		 * call takes appropriate locks and refcounts on init/destroy.
+		 */
+		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
+				nphb->opal_id);
+		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
+				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+		if (!ret)
+			continue;
+		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
+	}
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0cc81c0..56a1398 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
 			/* PE#0 is needed for error reporting */
 			pnv_ioda_reserve_pe(phb, 0);
 			pnv_ioda_setup_npu_PEs(hose->bus);
-			if (phb->model = PNV_PHB_MODEL_NPU2)
-				pnv_npu2_init(phb);
+			if (phb->model = PNV_PHB_MODEL_NPU2) {
+				pnv_npu2_init(hose);
+				pnv_npu2_map_lpar_phb(phb,
+						MSR_DR | MSR_PR | MSR_HV);
+			}
 		}
 		if (phb->type = PNV_PHB_NPU_OCAPI) {
 			bus = hose->bus;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 5/5] powerpc/powernv/npu: Add helper to map GPU to LPAR
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:33   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:33 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

In order to make ATS work and translate addresses for arbitrary
LPID and PID, we need to program an NPU with these.

This implements a helper to assign a GPU to LPAR and program the NPU
with a wildcard for PID. The helper also takes MSR (only DR/HV/PR/SF bits
are allowed) to program them into NPU2 for ATS checkout requests.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h           |  2 ++
 arch/powerpc/platforms/powernv/npu-dma.c | 38 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index f196df6..c3c9728 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -131,5 +131,7 @@ extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
 extern void pnv_npu2_devices_init(void);
 extern int pnv_npu2_init(struct pci_controller *hose);
+extern int pnv_npu2_map_lpar_dev(struct pci_controller *hose,
+		struct pci_dev *gpdev, unsigned int lparid, unsigned long msr);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 677f30a..1dde753 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -1047,3 +1047,41 @@ void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
 		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
 	}
 }
+
+int pnv_npu2_map_lpar_dev(struct pci_controller *hose, struct pci_dev *gpdev,
+		unsigned int lparid, unsigned long msr)
+{
+	int ret;
+	struct pnv_phb *nphb = hose->private_data;
+
+	dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu lparid=%u\n",
+			nphb->opal_id, lparid);
+	/*
+	 * Currently we only support radix and non-zero LPCR only makes sense
+	 * for hash tables so skiboot expects the LPCR parameter to be a zero.
+	 */
+	ret = opal_npu_map_lpar(nphb->opal_id,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid,
+			0 /* LPCR bits */);
+	if (ret) {
+		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
+		return ret;
+	}
+
+	dev_dbg(&gpdev->dev, "destroy context opalid=%llu msr=%lx\n",
+			nphb->opal_id, msr);
+	ret = opal_npu_destroy_context(nphb->opal_id, 0/*__unused*/,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+	if (ret)
+		dev_err(&gpdev->dev, "Failed to destroy context: %d\n", ret);
+
+	dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n",
+			nphb->opal_id, msr);
+	ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+	if (ret)
+		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pnv_npu2_map_lpar_dev);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 5/5] powerpc/powernv/npu: Add helper to map GPU to LPAR
@ 2018-10-15  9:33   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:33 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Frederic Barrat,
	Alex Williamson, kvm-ppc, David Gibson

In order to make ATS work and translate addresses for arbitrary
LPID and PID, we need to program an NPU with these.

This implements a helper to assign a GPU to LPAR and program the NPU
with a wildcard for PID. The helper also takes MSR (only DR/HV/PR/SF bits
are allowed) to program them into NPU2 for ATS checkout requests.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci.h           |  2 ++
 arch/powerpc/platforms/powernv/npu-dma.c | 38 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index f196df6..c3c9728 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -131,5 +131,7 @@ extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
 extern void pnv_npu2_devices_init(void);
 extern int pnv_npu2_init(struct pci_controller *hose);
+extern int pnv_npu2_map_lpar_dev(struct pci_controller *hose,
+		struct pci_dev *gpdev, unsigned int lparid, unsigned long msr);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 677f30a..1dde753 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -1047,3 +1047,41 @@ void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
 		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
 	}
 }
+
+int pnv_npu2_map_lpar_dev(struct pci_controller *hose, struct pci_dev *gpdev,
+		unsigned int lparid, unsigned long msr)
+{
+	int ret;
+	struct pnv_phb *nphb = hose->private_data;
+
+	dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu lparid=%u\n",
+			nphb->opal_id, lparid);
+	/*
+	 * Currently we only support radix and non-zero LPCR only makes sense
+	 * for hash tables so skiboot expects the LPCR parameter to be a zero.
+	 */
+	ret = opal_npu_map_lpar(nphb->opal_id,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid,
+			0 /* LPCR bits */);
+	if (ret) {
+		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
+		return ret;
+	}
+
+	dev_dbg(&gpdev->dev, "destroy context opalid=%llu msr=%lx\n",
+			nphb->opal_id, msr);
+	ret = opal_npu_destroy_context(nphb->opal_id, 0/*__unused*/,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+	if (ret)
+		dev_err(&gpdev->dev, "Failed to destroy context: %d\n", ret);
+
+	dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n",
+			nphb->opal_id, msr);
+	ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
+			PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+	if (ret)
+		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pnv_npu2_map_lpar_dev);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2)
  2018-10-15  9:32 ` Alexey Kardashevskiy
@ 2018-10-15  9:34   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:34 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alistair Popple, Frederic Barrat, Alex Williamson, kvm-ppc,
	Reza Arbab, David Gibson

Strangely Reza is missed by the get_maintainers.pl, adding in cc: :-/


On 15/10/2018 20:32, Alexey Kardashevskiy wrote:
> This is a second set of patches required for passing through NVIDIA V100
> with coherent memory. The full patchset is here:
> https://github.com/aik/linux/tree/nv2
> The matching QEMU is here:
> https://github.com/aik/qemu/tree/nv2
> 
> 
> This patchset reworks NPU DMA code to be used with VFIO. The exported symbols
> are called from the NVIDIA driver so they must be able to work in both
> powernv and pseries platforms.
> 
> This depends on skiboot's "npu2: Clear XTS_BDF_MAP when destroying context for next init_context"
> posted earlier today.
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (5):
>   powerpc/powernv/npu: Add helper to access struct npu for NPU device
>   powerpc/powernv/npu: Collect all static symbols under one struct
>   powerpc/powernv: Detach npu struct from pnv_phb
>   powerpc/powernv/npu: Factor out OPAL calls from context manipulation
>   powerpc/powernv/npu: Add helper to map GPU to LPAR
> 
>  arch/powerpc/include/asm/pci.h            |   4 +
>  arch/powerpc/platforms/powernv/pci.h      |  18 +-
>  arch/powerpc/platforms/powernv/npu-dma.c  | 298 ++++++++++++++++++++----------
>  arch/powerpc/platforms/powernv/pci-ioda.c |   9 +-
>  4 files changed, 212 insertions(+), 117 deletions(-)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2)
@ 2018-10-15  9:34   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  9:34 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alistair Popple, Frederic Barrat, Alex Williamson, kvm-ppc,
	Reza Arbab, David Gibson

Strangely Reza is missed by the get_maintainers.pl, adding in cc: :-/


On 15/10/2018 20:32, Alexey Kardashevskiy wrote:
> This is a second set of patches required for passing through NVIDIA V100
> with coherent memory. The full patchset is here:
> https://github.com/aik/linux/tree/nv2
> The matching QEMU is here:
> https://github.com/aik/qemu/tree/nv2
> 
> 
> This patchset reworks NPU DMA code to be used with VFIO. The exported symbols
> are called from the NVIDIA driver so they must be able to work in both
> powernv and pseries platforms.
> 
> This depends on skiboot's "npu2: Clear XTS_BDF_MAP when destroying context for next init_context"
> posted earlier today.
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (5):
>   powerpc/powernv/npu: Add helper to access struct npu for NPU device
>   powerpc/powernv/npu: Collect all static symbols under one struct
>   powerpc/powernv: Detach npu struct from pnv_phb
>   powerpc/powernv/npu: Factor out OPAL calls from context manipulation
>   powerpc/powernv/npu: Add helper to map GPU to LPAR
> 
>  arch/powerpc/include/asm/pci.h            |   4 +
>  arch/powerpc/platforms/powernv/pci.h      |  18 +-
>  arch/powerpc/platforms/powernv/npu-dma.c  | 298 ++++++++++++++++++++----------
>  arch/powerpc/platforms/powernv/pci-ioda.c |   9 +-
>  4 files changed, 212 insertions(+), 117 deletions(-)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device
  2018-10-15  9:32   ` Alexey Kardashevskiy
@ 2018-11-07  5:04     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 2595 bytes --]

On Mon, Oct 15, 2018 at 08:32:57PM +1100, Alexey Kardashevskiy wrote:
> This step is to help removing the npu struct from pnv_phb so it
> can be used by pseries as well.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 3a5c4ed..13e5153 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -389,6 +389,18 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  	return gpe;
>  }
>  
> +/*
> + * NPU2 ATS
> + */
> +static struct npu *npdev_to_npu(struct pci_dev *npdev)
> +{
> +	struct pnv_phb *nphb;
> +
> +	nphb = pci_bus_to_host(npdev->bus)->private_data;
> +
> +	return &nphb->npu;
> +}
> +
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> @@ -546,7 +558,6 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  	int i, j;
>  	struct npu *npu;
>  	struct pci_dev *npdev;
> -	struct pnv_phb *nphb;
>  
>  	for (i = 0; i <= max_npu2_index; i++) {
>  		mmio_atsd_reg[i].reg = -1;
> @@ -561,8 +572,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  			if (!npdev)
>  				continue;
>  
> -			nphb = pci_bus_to_host(npdev->bus)->private_data;
> -			npu = &nphb->npu;
> +			npu = npdev_to_npu(npdev);
>  			mmio_atsd_reg[i].npu = npu;
>  			mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
>  			while (mmio_atsd_reg[i].reg < 0) {
> @@ -749,7 +759,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	}
>  
>  	nphb = pci_bus_to_host(npdev->bus)->private_data;
> -	npu = &nphb->npu;
> +	npu = npdev_to_npu(npdev);
>  
>  	/*
>  	 * Setup the NPU context table for a particular GPU. These need to be
> @@ -869,7 +879,7 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  		return;
>  
>  	nphb = pci_bus_to_host(npdev->bus)->private_data;
> -	npu = &nphb->npu;
> +	npu = npdev_to_npu(npdev);
>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>  							&nvlink_index)))

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device
@ 2018-11-07  5:04     ` David Gibson
  0 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 2595 bytes --]

On Mon, Oct 15, 2018 at 08:32:57PM +1100, Alexey Kardashevskiy wrote:
> This step is to help removing the npu struct from pnv_phb so it
> can be used by pseries as well.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 3a5c4ed..13e5153 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -389,6 +389,18 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  	return gpe;
>  }
>  
> +/*
> + * NPU2 ATS
> + */
> +static struct npu *npdev_to_npu(struct pci_dev *npdev)
> +{
> +	struct pnv_phb *nphb;
> +
> +	nphb = pci_bus_to_host(npdev->bus)->private_data;
> +
> +	return &nphb->npu;
> +}
> +
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> @@ -546,7 +558,6 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  	int i, j;
>  	struct npu *npu;
>  	struct pci_dev *npdev;
> -	struct pnv_phb *nphb;
>  
>  	for (i = 0; i <= max_npu2_index; i++) {
>  		mmio_atsd_reg[i].reg = -1;
> @@ -561,8 +572,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  			if (!npdev)
>  				continue;
>  
> -			nphb = pci_bus_to_host(npdev->bus)->private_data;
> -			npu = &nphb->npu;
> +			npu = npdev_to_npu(npdev);
>  			mmio_atsd_reg[i].npu = npu;
>  			mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
>  			while (mmio_atsd_reg[i].reg < 0) {
> @@ -749,7 +759,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	}
>  
>  	nphb = pci_bus_to_host(npdev->bus)->private_data;
> -	npu = &nphb->npu;
> +	npu = npdev_to_npu(npdev);
>  
>  	/*
>  	 * Setup the NPU context table for a particular GPU. These need to be
> @@ -869,7 +879,7 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  		return;
>  
>  	nphb = pci_bus_to_host(npdev->bus)->private_data;
> -	npu = &nphb->npu;
> +	npu = npdev_to_npu(npdev);
>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>  							&nvlink_index)))

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct
  2018-10-15  9:32   ` Alexey Kardashevskiy
@ 2018-11-07  5:08     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 9072 bytes --]

On Mon, Oct 15, 2018 at 08:32:58PM +1100, Alexey Kardashevskiy wrote:
> We are going to add a global list of NPUs in the system which is going
> to be yet another static symbol. Let's reorganise the code and put all
> static symbols into one struct for better tracking what is really needed
> for NPU (this might become a driver data some day).
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

I'm not entirely convinced this is worthwhile, but maybe it'll become
clearer with the later patches.  There are some nits though.

> ---
>  arch/powerpc/include/asm/pci.h            |  1 +
>  arch/powerpc/platforms/powernv/npu-dma.c  | 77 ++++++++++++++++++-------------
>  arch/powerpc/platforms/powernv/pci-ioda.c |  2 +
>  3 files changed, 47 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 2af9ded..1a96075 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -129,5 +129,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>  
>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
> +extern void pnv_npu2_devices_init(void);
>  
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 13e5153..01402f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -22,20 +22,6 @@
>  #include "pci.h"
>  
>  /*
> - * spinlock to protect initialisation of an npu_context for a particular
> - * mm_struct.
> - */
> -static DEFINE_SPINLOCK(npu_context_lock);
> -
> -/*
> - * When an address shootdown range exceeds this threshold we invalidate the
> - * entire TLB on the GPU for the given PID rather than each specific address in
> - * the range.
> - */
> -static uint64_t atsd_threshold = 2 * 1024 * 1024;
> -static struct dentry *atsd_threshold_dentry;
> -
> -/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -392,6 +378,33 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +static struct {
> +	/*
> +	 * spinlock to protect initialisation of an npu_context for
> +	 * a particular mm_struct.
> +	 */
> +	spinlock_t context_lock;
> +
> +	/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> +	int max_index;
> +
> +	/*
> +	 * When an address shootdown range exceeds this threshold we invalidate the
> +	 * entire TLB on the GPU for the given PID rather than each specific address in
> +	 * the range.
> +	 */
> +	uint64_t atsd_threshold;
> +	struct dentry *atsd_threshold_dentry;
> +
> +} npu2_devices;

Even as a structu, it should be possible to statically initialize this.

> +
> +void pnv_npu2_devices_init(void)
> +{
> +	memset(&npu2_devices, 0, sizeof(npu2_devices));

The memset() is unnecessary.  The static structure lives in the BSS,
which means it is already initialized to zeroes.

> +	spin_lock_init(&npu2_devices.context_lock);
> +	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
> +}
> +
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
>  	struct pnv_phb *nphb;
> @@ -404,9 +417,6 @@ static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> -/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> -static int max_npu2_index;
> -
>  struct npu_context {
>  	struct mm_struct *mm;
>  	struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
> @@ -472,7 +482,7 @@ static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
>  	int i;
>  	unsigned long launch;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -503,7 +513,7 @@ static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
>  	int i;
>  	unsigned long launch;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -536,7 +546,7 @@ static void mmio_invalidate_wait(
>  	int i, reg;
>  
>  	/* Wait for all invalidations to complete */
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -559,7 +569,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  	struct npu *npu;
>  	struct pci_dev *npdev;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		mmio_atsd_reg[i].reg = -1;
>  		for (j = 0; j < NV_MAX_LINKS; j++) {
>  			/*
> @@ -593,7 +603,7 @@ static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
>  {
>  	int i;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		/*
>  		 * We can't rely on npu_context->npdev[][] being the same here
>  		 * as when acquire_atsd_reg() was called, hence we use the
> @@ -683,7 +693,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  	struct npu_context *npu_context = mn_to_npu_context(mn);
>  	unsigned long address;
>  
> -	if (end - start > atsd_threshold) {
> +	if (end - start > npu2_devices.atsd_threshold) {
>  		/*
>  		 * Just invalidate the entire PID if the address range is too
>  		 * large.
> @@ -777,12 +787,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 * We store the npu pci device so we can more easily get at the
>  	 * associated npus.
>  	 */
> -	spin_lock(&npu_context_lock);
> +	spin_lock(&npu2_devices.context_lock);
>  	npu_context = mm->context.npu_context;
>  	if (npu_context) {
>  		if (npu_context->release_cb != cb ||
>  			npu_context->priv != priv) {
> -			spin_unlock(&npu_context_lock);
> +			spin_unlock(&npu2_devices.context_lock);
>  			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>  						PCI_DEVID(gpdev->bus->number,
>  							gpdev->devfn));
> @@ -791,12 +801,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  
>  		WARN_ON(!kref_get_unless_zero(&npu_context->kref));
>  	}
> -	spin_unlock(&npu_context_lock);
> +	spin_unlock(&npu2_devices.context_lock);
>  
>  	if (!npu_context) {
>  		/*
>  		 * We can set up these fields without holding the
> -		 * npu_context_lock as the npu_context hasn't been returned to
> +		 * npu2_devices.context_lock as the npu_context hasn't been returned to
>  		 * the caller meaning it can't be destroyed. Parallel allocation
>  		 * is protected against by mmap_sem.
>  		 */
> @@ -887,9 +897,9 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
>  	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
>  				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> -	spin_lock(&npu_context_lock);
> +	spin_lock(&npu2_devices.context_lock);
>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
> -	spin_unlock(&npu_context_lock);
> +	spin_unlock(&npu2_devices.context_lock);
>  
>  	/*
>  	 * We need to do this outside of pnv_npu2_release_context so that it is
> @@ -958,9 +968,10 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	static int npu_index;
>  	uint64_t rc = 0;
>  
> -	if (!atsd_threshold_dentry) {
> -		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
> -				   0600, powerpc_debugfs_root, &atsd_threshold);
> +	if (!npu2_devices.atsd_threshold_dentry) {
> +		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
> +				"atsd_threshold", 0600, powerpc_debugfs_root,
> +				&npu2_devices.atsd_threshold);
>  	}
>  
>  	phb->npu.nmmu_flush =
> @@ -988,7 +999,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	npu_index++;
>  	if (WARN_ON(npu_index >= NV_MAX_NPUS))
>  		return -ENOSPC;
> -	max_npu2_index = npu_index;
> +	npu2_devices.max_index = npu_index;
>  	phb->npu.index = npu_index;
>  
>  	return 0;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e37b9cc..0cc81c0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1279,6 +1279,8 @@ static void pnv_pci_ioda_setup_PEs(void)
>  	struct pci_bus *bus;
>  	struct pci_dev *pdev;
>  
> +	pnv_npu2_devices_init();
> +
>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>  		phb = hose->private_data;
>  		if (phb->type == PNV_PHB_NPU_NVLINK) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct
@ 2018-11-07  5:08     ` David Gibson
  0 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 9072 bytes --]

On Mon, Oct 15, 2018 at 08:32:58PM +1100, Alexey Kardashevskiy wrote:
> We are going to add a global list of NPUs in the system which is going
> to be yet another static symbol. Let's reorganise the code and put all
> static symbols into one struct for better tracking what is really needed
> for NPU (this might become a driver data some day).
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

I'm not entirely convinced this is worthwhile, but maybe it'll become
clearer with the later patches.  There are some nits though.

> ---
>  arch/powerpc/include/asm/pci.h            |  1 +
>  arch/powerpc/platforms/powernv/npu-dma.c  | 77 ++++++++++++++++++-------------
>  arch/powerpc/platforms/powernv/pci-ioda.c |  2 +
>  3 files changed, 47 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 2af9ded..1a96075 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -129,5 +129,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>  
>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
> +extern void pnv_npu2_devices_init(void);
>  
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 13e5153..01402f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -22,20 +22,6 @@
>  #include "pci.h"
>  
>  /*
> - * spinlock to protect initialisation of an npu_context for a particular
> - * mm_struct.
> - */
> -static DEFINE_SPINLOCK(npu_context_lock);
> -
> -/*
> - * When an address shootdown range exceeds this threshold we invalidate the
> - * entire TLB on the GPU for the given PID rather than each specific address in
> - * the range.
> - */
> -static uint64_t atsd_threshold = 2 * 1024 * 1024;
> -static struct dentry *atsd_threshold_dentry;
> -
> -/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -392,6 +378,33 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +static struct {
> +	/*
> +	 * spinlock to protect initialisation of an npu_context for
> +	 * a particular mm_struct.
> +	 */
> +	spinlock_t context_lock;
> +
> +	/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> +	int max_index;
> +
> +	/*
> +	 * When an address shootdown range exceeds this threshold we invalidate the
> +	 * entire TLB on the GPU for the given PID rather than each specific address in
> +	 * the range.
> +	 */
> +	uint64_t atsd_threshold;
> +	struct dentry *atsd_threshold_dentry;
> +
> +} npu2_devices;

Even as a structu, it should be possible to statically initialize this.

> +
> +void pnv_npu2_devices_init(void)
> +{
> +	memset(&npu2_devices, 0, sizeof(npu2_devices));

The memset() is unnecessary.  The static structure lives in the BSS,
which means it is already initialized to zeroes.

> +	spin_lock_init(&npu2_devices.context_lock);
> +	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
> +}
> +
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
>  	struct pnv_phb *nphb;
> @@ -404,9 +417,6 @@ static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  /* Maximum number of nvlinks per npu */
>  #define NV_MAX_LINKS 6
>  
> -/* Maximum index of npu2 hosts in the system. Always < NV_MAX_NPUS */
> -static int max_npu2_index;
> -
>  struct npu_context {
>  	struct mm_struct *mm;
>  	struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS];
> @@ -472,7 +482,7 @@ static void mmio_invalidate_pid(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
>  	int i;
>  	unsigned long launch;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -503,7 +513,7 @@ static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
>  	int i;
>  	unsigned long launch;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -536,7 +546,7 @@ static void mmio_invalidate_wait(
>  	int i, reg;
>  
>  	/* Wait for all invalidations to complete */
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		if (mmio_atsd_reg[i].reg < 0)
>  			continue;
>  
> @@ -559,7 +569,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
>  	struct npu *npu;
>  	struct pci_dev *npdev;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		mmio_atsd_reg[i].reg = -1;
>  		for (j = 0; j < NV_MAX_LINKS; j++) {
>  			/*
> @@ -593,7 +603,7 @@ static void release_atsd_reg(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
>  {
>  	int i;
>  
> -	for (i = 0; i <= max_npu2_index; i++) {
> +	for (i = 0; i <= npu2_devices.max_index; i++) {
>  		/*
>  		 * We can't rely on npu_context->npdev[][] being the same here
>  		 * as when acquire_atsd_reg() was called, hence we use the
> @@ -683,7 +693,7 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  	struct npu_context *npu_context = mn_to_npu_context(mn);
>  	unsigned long address;
>  
> -	if (end - start > atsd_threshold) {
> +	if (end - start > npu2_devices.atsd_threshold) {
>  		/*
>  		 * Just invalidate the entire PID if the address range is too
>  		 * large.
> @@ -777,12 +787,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 * We store the npu pci device so we can more easily get at the
>  	 * associated npus.
>  	 */
> -	spin_lock(&npu_context_lock);
> +	spin_lock(&npu2_devices.context_lock);
>  	npu_context = mm->context.npu_context;
>  	if (npu_context) {
>  		if (npu_context->release_cb != cb ||
>  			npu_context->priv != priv) {
> -			spin_unlock(&npu_context_lock);
> +			spin_unlock(&npu2_devices.context_lock);
>  			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>  						PCI_DEVID(gpdev->bus->number,
>  							gpdev->devfn));
> @@ -791,12 +801,12 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  
>  		WARN_ON(!kref_get_unless_zero(&npu_context->kref));
>  	}
> -	spin_unlock(&npu_context_lock);
> +	spin_unlock(&npu2_devices.context_lock);
>  
>  	if (!npu_context) {
>  		/*
>  		 * We can set up these fields without holding the
> -		 * npu_context_lock as the npu_context hasn't been returned to
> +		 * npu2_devices.context_lock as the npu_context hasn't been returned to
>  		 * the caller meaning it can't be destroyed. Parallel allocation
>  		 * is protected against by mmap_sem.
>  		 */
> @@ -887,9 +897,9 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
>  	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
>  				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> -	spin_lock(&npu_context_lock);
> +	spin_lock(&npu2_devices.context_lock);
>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
> -	spin_unlock(&npu_context_lock);
> +	spin_unlock(&npu2_devices.context_lock);
>  
>  	/*
>  	 * We need to do this outside of pnv_npu2_release_context so that it is
> @@ -958,9 +968,10 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	static int npu_index;
>  	uint64_t rc = 0;
>  
> -	if (!atsd_threshold_dentry) {
> -		atsd_threshold_dentry = debugfs_create_x64("atsd_threshold",
> -				   0600, powerpc_debugfs_root, &atsd_threshold);
> +	if (!npu2_devices.atsd_threshold_dentry) {
> +		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
> +				"atsd_threshold", 0600, powerpc_debugfs_root,
> +				&npu2_devices.atsd_threshold);
>  	}
>  
>  	phb->npu.nmmu_flush =
> @@ -988,7 +999,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	npu_index++;
>  	if (WARN_ON(npu_index >= NV_MAX_NPUS))
>  		return -ENOSPC;
> -	max_npu2_index = npu_index;
> +	npu2_devices.max_index = npu_index;
>  	phb->npu.index = npu_index;
>  
>  	return 0;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e37b9cc..0cc81c0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1279,6 +1279,8 @@ static void pnv_pci_ioda_setup_PEs(void)
>  	struct pci_bus *bus;
>  	struct pci_dev *pdev;
>  
> +	pnv_npu2_devices_init();
> +
>  	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
>  		phb = hose->private_data;
>  		if (phb->type == PNV_PHB_NPU_NVLINK) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb
  2018-10-15  9:32   ` Alexey Kardashevskiy
@ 2018-11-07  5:14     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:14 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 6263 bytes --]

On Mon, Oct 15, 2018 at 08:32:59PM +1100, Alexey Kardashevskiy wrote:
> The powernv PCI code stores NPU data in the pnv_phb struct. The latter
> is referenced by pci_controller::private_data. We are going to have NPU2
> support in the pseries platform as well but it does not store any
> private_data in in the pci_controller struct; and even if it did,
> it would be a different data structure.
> 
> This adds a global list of NPUs so each platform can register and use
> these in the same fashion.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/platforms/powernv/pci.h     | 16 -------
>  arch/powerpc/platforms/powernv/npu-dma.c | 71 +++++++++++++++++++++++++-------
>  2 files changed, 57 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 8b37b28..3b7617d 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -8,9 +8,6 @@
>  
>  struct pci_dn;
>  
> -/* Maximum possible number of ATSD MMIO registers per NPU */
> -#define NV_NMMU_ATSD_REGS 8
> -
>  enum pnv_phb_type {
>  	PNV_PHB_IODA1		= 0,
>  	PNV_PHB_IODA2		= 1,
> @@ -180,19 +177,6 @@ struct pnv_phb {
>  	unsigned int		diag_data_size;
>  	u8			*diag_data;
>  
> -	/* Nvlink2 data */
> -	struct npu {
> -		int index;
> -		__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> -		unsigned int mmio_atsd_count;
> -
> -		/* Bitmask for MMIO register usage */
> -		unsigned long mmio_atsd_usage;
> -
> -		/* Do we need to explicitly flush the nest mmu? */
> -		bool nmmu_flush;
> -	} npu;
> -
>  	int p2p_target_count;
>  };
>  
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 01402f9..cb2b4f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -378,6 +378,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +/* Maximum possible number of ATSD MMIO registers per NPU */
> +#define NV_NMMU_ATSD_REGS 8
> +
> +struct npu {
> +	int index;
> +	__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> +	unsigned int mmio_atsd_count;
> +
> +	/* Bitmask for MMIO register usage */
> +	unsigned long mmio_atsd_usage;
> +
> +	/* Do we need to explicitly flush the nest mmu? */
> +	bool nmmu_flush;
> +
> +	struct list_head next;
> +
> +	struct pci_controller *hose;
> +};
> +
>  static struct {
>  	/*
>  	 * spinlock to protect initialisation of an npu_context for
> @@ -396,22 +415,27 @@ static struct {
>  	uint64_t atsd_threshold;
>  	struct dentry *atsd_threshold_dentry;
>  
> +	struct list_head npu_list;
>  } npu2_devices;
>  
>  void pnv_npu2_devices_init(void)
>  {
>  	memset(&npu2_devices, 0, sizeof(npu2_devices));
> +	INIT_LIST_HEAD(&npu2_devices.npu_list);
>  	spin_lock_init(&npu2_devices.context_lock);
>  	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
>  }
>  
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
> -	struct pnv_phb *nphb;
> +	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> +	struct npu *npu;
>  
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
> +	list_for_each_entry(npu, &npu2_devices.npu_list, next)
> +		if (hose == npu->hose)
> +			return npu;
>  
> -	return &nphb->npu;
> +	return NULL;

If you're introducing the possibility this can fail, should you also
be introducing checks for NULL in the callers?

>  }
>  
>  /* Maximum number of nvlinks per npu */
> @@ -843,7 +867,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 */
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
>  
> -	if (!nphb->npu.nmmu_flush) {
> +	if (!npu->nmmu_flush) {

Looks like this hunk belongs in the patch which introduced npdev_to_npu().

>  		/*
>  		 * If we're not explicitly flushing ourselves we need to mark
>  		 * the thread for global flushes
> @@ -967,6 +991,13 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	struct pci_dev *gpdev;
>  	static int npu_index;
>  	uint64_t rc = 0;
> +	struct pci_controller *hose = phb->hose;
> +	struct npu *npu;
> +	int ret;
> +
> +	npu = kzalloc(sizeof(*npu), GFP_KERNEL);
> +	if (!npu)
> +		return -ENOMEM;
>  
>  	if (!npu2_devices.atsd_threshold_dentry) {
>  		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
> @@ -974,8 +1005,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  				&npu2_devices.atsd_threshold);
>  	}
>  
> -	phb->npu.nmmu_flush =
> -		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
> +	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
>  	for_each_child_of_node(phb->hose->dn, dn) {
>  		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>  		if (gpdev) {
> @@ -989,18 +1019,31 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  		}
>  	}
>  
> -	for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
> +	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>  							i, &mmio_atsd); i++)
> -		phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
> +		npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
>  
> -	pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
> -	phb->npu.mmio_atsd_count = i;
> -	phb->npu.mmio_atsd_usage = 0;
> +	pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
> +	npu->mmio_atsd_count = i;
> +	npu->mmio_atsd_usage = 0;
>  	npu_index++;
> -	if (WARN_ON(npu_index >= NV_MAX_NPUS))
> -		return -ENOSPC;
> +	if (WARN_ON(npu_index >= NV_MAX_NPUS)) {
> +		ret = -ENOSPC;
> +		goto fail_exit;
> +	}
>  	npu2_devices.max_index = npu_index;
> -	phb->npu.index = npu_index;
> +	npu->index = npu_index;
> +	npu->hose = hose;
> +
> +	list_add(&npu->next, &npu2_devices.npu_list);
>  
>  	return 0;
> +
> +fail_exit:
> +	for (i = 0; i < npu->mmio_atsd_count; ++i)
> +		iounmap(npu->mmio_atsd_regs[i]);
> +
> +	kfree(npu);
> +
> +	return ret;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb
@ 2018-11-07  5:14     ` David Gibson
  0 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-07  5:14 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 6263 bytes --]

On Mon, Oct 15, 2018 at 08:32:59PM +1100, Alexey Kardashevskiy wrote:
> The powernv PCI code stores NPU data in the pnv_phb struct. The latter
> is referenced by pci_controller::private_data. We are going to have NPU2
> support in the pseries platform as well but it does not store any
> private_data in in the pci_controller struct; and even if it did,
> it would be a different data structure.
> 
> This adds a global list of NPUs so each platform can register and use
> these in the same fashion.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/platforms/powernv/pci.h     | 16 -------
>  arch/powerpc/platforms/powernv/npu-dma.c | 71 +++++++++++++++++++++++++-------
>  2 files changed, 57 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 8b37b28..3b7617d 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -8,9 +8,6 @@
>  
>  struct pci_dn;
>  
> -/* Maximum possible number of ATSD MMIO registers per NPU */
> -#define NV_NMMU_ATSD_REGS 8
> -
>  enum pnv_phb_type {
>  	PNV_PHB_IODA1		= 0,
>  	PNV_PHB_IODA2		= 1,
> @@ -180,19 +177,6 @@ struct pnv_phb {
>  	unsigned int		diag_data_size;
>  	u8			*diag_data;
>  
> -	/* Nvlink2 data */
> -	struct npu {
> -		int index;
> -		__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> -		unsigned int mmio_atsd_count;
> -
> -		/* Bitmask for MMIO register usage */
> -		unsigned long mmio_atsd_usage;
> -
> -		/* Do we need to explicitly flush the nest mmu? */
> -		bool nmmu_flush;
> -	} npu;
> -
>  	int p2p_target_count;
>  };
>  
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 01402f9..cb2b4f9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -378,6 +378,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
>  /*
>   * NPU2 ATS
>   */
> +/* Maximum possible number of ATSD MMIO registers per NPU */
> +#define NV_NMMU_ATSD_REGS 8
> +
> +struct npu {
> +	int index;
> +	__be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
> +	unsigned int mmio_atsd_count;
> +
> +	/* Bitmask for MMIO register usage */
> +	unsigned long mmio_atsd_usage;
> +
> +	/* Do we need to explicitly flush the nest mmu? */
> +	bool nmmu_flush;
> +
> +	struct list_head next;
> +
> +	struct pci_controller *hose;
> +};
> +
>  static struct {
>  	/*
>  	 * spinlock to protect initialisation of an npu_context for
> @@ -396,22 +415,27 @@ static struct {
>  	uint64_t atsd_threshold;
>  	struct dentry *atsd_threshold_dentry;
>  
> +	struct list_head npu_list;
>  } npu2_devices;
>  
>  void pnv_npu2_devices_init(void)
>  {
>  	memset(&npu2_devices, 0, sizeof(npu2_devices));
> +	INIT_LIST_HEAD(&npu2_devices.npu_list);
>  	spin_lock_init(&npu2_devices.context_lock);
>  	npu2_devices.atsd_threshold = 2 * 1024 * 1024;
>  }
>  
>  static struct npu *npdev_to_npu(struct pci_dev *npdev)
>  {
> -	struct pnv_phb *nphb;
> +	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> +	struct npu *npu;
>  
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
> +	list_for_each_entry(npu, &npu2_devices.npu_list, next)
> +		if (hose == npu->hose)
> +			return npu;
>  
> -	return &nphb->npu;
> +	return NULL;

If you're introducing the possibility this can fail, should you also
be introducing checks for NULL in the callers?

>  }
>  
>  /* Maximum number of nvlinks per npu */
> @@ -843,7 +867,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 */
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
>  
> -	if (!nphb->npu.nmmu_flush) {
> +	if (!npu->nmmu_flush) {

Looks like this hunk belongs in the patch which introduced npdev_to_npu().

>  		/*
>  		 * If we're not explicitly flushing ourselves we need to mark
>  		 * the thread for global flushes
> @@ -967,6 +991,13 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	struct pci_dev *gpdev;
>  	static int npu_index;
>  	uint64_t rc = 0;
> +	struct pci_controller *hose = phb->hose;
> +	struct npu *npu;
> +	int ret;
> +
> +	npu = kzalloc(sizeof(*npu), GFP_KERNEL);
> +	if (!npu)
> +		return -ENOMEM;
>  
>  	if (!npu2_devices.atsd_threshold_dentry) {
>  		npu2_devices.atsd_threshold_dentry = debugfs_create_x64(
> @@ -974,8 +1005,7 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  				&npu2_devices.atsd_threshold);
>  	}
>  
> -	phb->npu.nmmu_flush =
> -		of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
> +	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
>  	for_each_child_of_node(phb->hose->dn, dn) {
>  		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>  		if (gpdev) {
> @@ -989,18 +1019,31 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  		}
>  	}
>  
> -	for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
> +	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>  							i, &mmio_atsd); i++)
> -		phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
> +		npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
>  
> -	pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
> -	phb->npu.mmio_atsd_count = i;
> -	phb->npu.mmio_atsd_usage = 0;
> +	pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
> +	npu->mmio_atsd_count = i;
> +	npu->mmio_atsd_usage = 0;
>  	npu_index++;
> -	if (WARN_ON(npu_index >= NV_MAX_NPUS))
> -		return -ENOSPC;
> +	if (WARN_ON(npu_index >= NV_MAX_NPUS)) {
> +		ret = -ENOSPC;
> +		goto fail_exit;
> +	}
>  	npu2_devices.max_index = npu_index;
> -	phb->npu.index = npu_index;
> +	npu->index = npu_index;
> +	npu->hose = hose;
> +
> +	list_add(&npu->next, &npu2_devices.npu_list);
>  
>  	return 0;
> +
> +fail_exit:
> +	for (i = 0; i < npu->mmio_atsd_count; ++i)
> +		iounmap(npu->mmio_atsd_regs[i]);
> +
> +	kfree(npu);
> +
> +	return ret;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
  2018-10-15  9:33   ` Alexey Kardashevskiy
@ 2018-11-08  5:05     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-08  5:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 11133 bytes --]

On Mon, Oct 15, 2018 at 08:33:00PM +1100, Alexey Kardashevskiy wrote:
> At the moment the NPU context init/destroy code calls OPAL. The init
> handler in OPAL configures the NPU to pass ATS requests to nested MMU,
> the destroy handler does nothing besides sanity checks.
> 
> Since the init handler programs the NPU with a wildcard for LPID/PID,
> this can be done at the point where a GPU is mapped to an LPAR; it also
> makes calling opal_npu_destroy_context() unnecessary in this context
> (this will change with VFIO later though).

I think this wants a bit more explanation.  It certainly makes me
nervous removing the destroy_context calls with nothing replacing
them.

> Also, the pnv_npu2_init() helper does not really need to call OPAL as
> well as it inialized an NPU structure and does not interact with GPU or
> NPU at that moment.
> 
> This moves OPAL calls to a separate helper. With this change, the API
> for GPUs does not do any OPAL calls and therefore can be used by both
> pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
> should be called on powernv only as it does OPAL calls and it takes
> an MSR mask which NPU adds to ATS requests so nested MMU knows what
> translations are permitted; the VFIO/KVM will not set MSR_HV.
> 
> This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
> pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/pci.h            |   1 +
>  arch/powerpc/platforms/powernv/pci.h      |   2 +-
>  arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
>  arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
>  4 files changed, 57 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 1a96075..f196df6 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
>  extern void pnv_npu2_devices_init(void);
> +extern int pnv_npu2_init(struct pci_controller *hose);
>  
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 3b7617d..ca2ce4b 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
>  extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
>  extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
>  extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
> -extern int pnv_npu2_init(struct pnv_phb *phb);
> +extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
>  
>  /* pci-ioda-tce.c */
>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index cb2b4f9..677f30a 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	u32 nvlink_index;
>  	struct device_node *nvlink_dn;
>  	struct mm_struct *mm = current->mm;
> -	struct pnv_phb *nphb;
>  	struct npu *npu;
>  	struct npu_context *npu_context;
>  
> @@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 */
>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return ERR_PTR(-ENODEV);
> -
>  	if (!npdev)
>  		/* No nvlink associated with this GPU device */
>  		return ERR_PTR(-ENODEV);
> @@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>  	npu = npdev_to_npu(npdev);
>  
>  	/*
> -	 * Setup the NPU context table for a particular GPU. These need to be
> -	 * per-GPU as we need the tables to filter ATSDs when there are no
> -	 * active contexts on a particular GPU. It is safe for these to be
> -	 * called concurrently with destroy as the OPAL call takes appropriate
> -	 * locks and refcounts on init/destroy.
> -	 */
> -	rc = opal_npu_init_context(nphb->opal_id, mm->context.id,
> flags,

AFAICT this was the only use of 'flags' in this function, so it should
be removed from the signature, no?

> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> -	if (rc < 0)
> -		return ERR_PTR(-ENOSPC);
> -
> -	/*
>  	 * We store the npu pci device so we can more easily get at the
>  	 * associated npus.
>  	 */
> @@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  		if (npu_context->release_cb != cb ||
>  			npu_context->priv != priv) {
>  			spin_unlock(&npu2_devices.context_lock);
> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> -						PCI_DEVID(gpdev->bus->number,
> -							gpdev->devfn));
>  			return ERR_PTR(-EINVAL);
>  		}
>  
> @@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  
>  		if (rc) {
>  			kfree(npu_context);
> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> -					PCI_DEVID(gpdev->bus->number,
> -						gpdev->devfn));
>  			return ERR_PTR(rc);
>  		}
>  
> @@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  			struct pci_dev *gpdev)
>  {
>  	int removed;
> -	struct pnv_phb *nphb;
>  	struct npu *npu;
>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>  	struct device_node *nvlink_dn;
> @@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  	if (WARN_ON(!npdev))
>  		return;
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return;
> -
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>  	npu = npdev_to_npu(npdev);
>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>  							&nvlink_index)))
>  		return;
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
> -	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>  	spin_lock(&npu2_devices.context_lock);
>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
>  	spin_unlock(&npu2_devices.context_lock);
> @@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>  	/* mmap_sem should be held so the struct_mm must be present */
>  	struct mm_struct *mm = context->mm;
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return -ENODEV;
> -
>  	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
>  
>  	for (i = 0; i < count; i++) {
> @@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>  }
>  EXPORT_SYMBOL(pnv_npu2_handle_fault);
>  
> -int pnv_npu2_init(struct pnv_phb *phb)
> +int pnv_npu2_init(struct pci_controller *hose)
>  {
>  	unsigned int i;
>  	u64 mmio_atsd;
> -	struct device_node *dn;
> -	struct pci_dev *gpdev;
>  	static int npu_index;
> -	uint64_t rc = 0;
> -	struct pci_controller *hose = phb->hose;
>  	struct npu *npu;
>  	int ret;
>  
> @@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	}
>  
>  	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
> -	for_each_child_of_node(phb->hose->dn, dn) {
> -		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> -		if (gpdev) {
> -			rc = opal_npu_map_lpar(phb->opal_id,
> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
> -				0, 0);
> -			if (rc)
> -				dev_err(&gpdev->dev,
> -					"Error %lld mapping device to LPAR\n",
> -					rc);
> -		}
> -	}
>  
>  	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>  							i, &mmio_atsd); i++)
> @@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  
>  	return ret;
>  }
> +
> +void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
> +{
> +	struct pci_dev *gpdev;
> +	struct device_node *dn;
> +	int ret;
> +	struct pci_controller *hose = nphb->hose;
> +
> +	for_each_child_of_node(hose->dn, dn) {
> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> +		if (!gpdev)
> +			continue;
> +
> +		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
> +		ret = opal_npu_map_lpar(nphb->opal_id,
> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
> +				0, 0);
> +		if (!ret)
> +			continue;
> +		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
> +	}
> +
> +	/*
> +	 * It seems that touching NPU2_XTS_BDF_MAP in the way
> +	 * the opal_npu_map_lpar() does somehow affects the result of
> +	 * what opal_npu_init_context() does so let's put the latter in
> +	 * a separate loop.
> +	 */
> +	for_each_child_of_node(hose->dn, dn) {
> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> +		if (!gpdev)
> +			continue;
> +
> +		/*
> +		 * Setup the NPU context table for a particular GPU. These need
> +		 * to be per-GPU as we need the tables to filter ATSDs when
> +		 * there are no active contexts on a particular GPU. It is safe
> +		 * for these to be called concurrently with destroy as the OPAL
> +		 * call takes appropriate locks and refcounts on init/destroy.
> +		 */
> +		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
> +				nphb->opal_id);
> +		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> +		if (!ret)
> +			continue;
> +		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
> +	}
> +}
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0cc81c0..56a1398 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
>  			/* PE#0 is needed for error reporting */
>  			pnv_ioda_reserve_pe(phb, 0);
>  			pnv_ioda_setup_npu_PEs(hose->bus);
> -			if (phb->model == PNV_PHB_MODEL_NPU2)
> -				pnv_npu2_init(phb);
> +			if (phb->model == PNV_PHB_MODEL_NPU2) {
> +				pnv_npu2_init(hose);
> +				pnv_npu2_map_lpar_phb(phb,
> +						MSR_DR | MSR_PR | MSR_HV);

This is the only caller of pnv_np2_map_lpar_phb(), do we need the
second parameter?  I take it those "flags" to the function are
actually an MSR value; that wasn't clear from the previous code.

> +			}
>  		}
>  		if (phb->type == PNV_PHB_NPU_OCAPI) {
>  			bus = hose->bus;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
@ 2018-11-08  5:05     ` David Gibson
  0 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-11-08  5:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 11133 bytes --]

On Mon, Oct 15, 2018 at 08:33:00PM +1100, Alexey Kardashevskiy wrote:
> At the moment the NPU context init/destroy code calls OPAL. The init
> handler in OPAL configures the NPU to pass ATS requests to nested MMU,
> the destroy handler does nothing besides sanity checks.
> 
> Since the init handler programs the NPU with a wildcard for LPID/PID,
> this can be done at the point where a GPU is mapped to an LPAR; it also
> makes calling opal_npu_destroy_context() unnecessary in this context
> (this will change with VFIO later though).

I think this wants a bit more explanation.  It certainly makes me
nervous removing the destroy_context calls with nothing replacing
them.

> Also, the pnv_npu2_init() helper does not really need to call OPAL as
> well as it inialized an NPU structure and does not interact with GPU or
> NPU at that moment.
> 
> This moves OPAL calls to a separate helper. With this change, the API
> for GPUs does not do any OPAL calls and therefore can be used by both
> pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
> should be called on powernv only as it does OPAL calls and it takes
> an MSR mask which NPU adds to ATS requests so nested MMU knows what
> translations are permitted; the VFIO/KVM will not set MSR_HV.
> 
> This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
> pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/pci.h            |   1 +
>  arch/powerpc/platforms/powernv/pci.h      |   2 +-
>  arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
>  arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
>  4 files changed, 57 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 1a96075..f196df6 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
>  extern void pnv_npu2_devices_init(void);
> +extern int pnv_npu2_init(struct pci_controller *hose);
>  
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 3b7617d..ca2ce4b 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
>  extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
>  extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
>  extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
> -extern int pnv_npu2_init(struct pnv_phb *phb);
> +extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
>  
>  /* pci-ioda-tce.c */
>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index cb2b4f9..677f30a 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	u32 nvlink_index;
>  	struct device_node *nvlink_dn;
>  	struct mm_struct *mm = current->mm;
> -	struct pnv_phb *nphb;
>  	struct npu *npu;
>  	struct npu_context *npu_context;
>  
> @@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  	 */
>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return ERR_PTR(-ENODEV);
> -
>  	if (!npdev)
>  		/* No nvlink associated with this GPU device */
>  		return ERR_PTR(-ENODEV);
> @@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>  	npu = npdev_to_npu(npdev);
>  
>  	/*
> -	 * Setup the NPU context table for a particular GPU. These need to be
> -	 * per-GPU as we need the tables to filter ATSDs when there are no
> -	 * active contexts on a particular GPU. It is safe for these to be
> -	 * called concurrently with destroy as the OPAL call takes appropriate
> -	 * locks and refcounts on init/destroy.
> -	 */
> -	rc = opal_npu_init_context(nphb->opal_id, mm->context.id,
> flags,

AFAICT this was the only use of 'flags' in this function, so it should
be removed from the signature, no?

> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> -	if (rc < 0)
> -		return ERR_PTR(-ENOSPC);
> -
> -	/*
>  	 * We store the npu pci device so we can more easily get at the
>  	 * associated npus.
>  	 */
> @@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  		if (npu_context->release_cb != cb ||
>  			npu_context->priv != priv) {
>  			spin_unlock(&npu2_devices.context_lock);
> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> -						PCI_DEVID(gpdev->bus->number,
> -							gpdev->devfn));
>  			return ERR_PTR(-EINVAL);
>  		}
>  
> @@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>  
>  		if (rc) {
>  			kfree(npu_context);
> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> -					PCI_DEVID(gpdev->bus->number,
> -						gpdev->devfn));
>  			return ERR_PTR(rc);
>  		}
>  
> @@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  			struct pci_dev *gpdev)
>  {
>  	int removed;
> -	struct pnv_phb *nphb;
>  	struct npu *npu;
>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>  	struct device_node *nvlink_dn;
> @@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>  	if (WARN_ON(!npdev))
>  		return;
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return;
> -
> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>  	npu = npdev_to_npu(npdev);
>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>  							&nvlink_index)))
>  		return;
>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
> -	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>  	spin_lock(&npu2_devices.context_lock);
>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
>  	spin_unlock(&npu2_devices.context_lock);
> @@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>  	/* mmap_sem should be held so the struct_mm must be present */
>  	struct mm_struct *mm = context->mm;
>  
> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
> -		return -ENODEV;
> -
>  	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
>  
>  	for (i = 0; i < count; i++) {
> @@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>  }
>  EXPORT_SYMBOL(pnv_npu2_handle_fault);
>  
> -int pnv_npu2_init(struct pnv_phb *phb)
> +int pnv_npu2_init(struct pci_controller *hose)
>  {
>  	unsigned int i;
>  	u64 mmio_atsd;
> -	struct device_node *dn;
> -	struct pci_dev *gpdev;
>  	static int npu_index;
> -	uint64_t rc = 0;
> -	struct pci_controller *hose = phb->hose;
>  	struct npu *npu;
>  	int ret;
>  
> @@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  	}
>  
>  	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
> -	for_each_child_of_node(phb->hose->dn, dn) {
> -		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> -		if (gpdev) {
> -			rc = opal_npu_map_lpar(phb->opal_id,
> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
> -				0, 0);
> -			if (rc)
> -				dev_err(&gpdev->dev,
> -					"Error %lld mapping device to LPAR\n",
> -					rc);
> -		}
> -	}
>  
>  	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>  							i, &mmio_atsd); i++)
> @@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
>  
>  	return ret;
>  }
> +
> +void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
> +{
> +	struct pci_dev *gpdev;
> +	struct device_node *dn;
> +	int ret;
> +	struct pci_controller *hose = nphb->hose;
> +
> +	for_each_child_of_node(hose->dn, dn) {
> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> +		if (!gpdev)
> +			continue;
> +
> +		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
> +		ret = opal_npu_map_lpar(nphb->opal_id,
> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
> +				0, 0);
> +		if (!ret)
> +			continue;
> +		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
> +	}
> +
> +	/*
> +	 * It seems that touching NPU2_XTS_BDF_MAP in the way
> +	 * the opal_npu_map_lpar() does somehow affects the result of
> +	 * what opal_npu_init_context() does so let's put the latter in
> +	 * a separate loop.
> +	 */
> +	for_each_child_of_node(hose->dn, dn) {
> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
> +		if (!gpdev)
> +			continue;
> +
> +		/*
> +		 * Setup the NPU context table for a particular GPU. These need
> +		 * to be per-GPU as we need the tables to filter ATSDs when
> +		 * there are no active contexts on a particular GPU. It is safe
> +		 * for these to be called concurrently with destroy as the OPAL
> +		 * call takes appropriate locks and refcounts on init/destroy.
> +		 */
> +		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
> +				nphb->opal_id);
> +		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> +		if (!ret)
> +			continue;
> +		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
> +	}
> +}
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0cc81c0..56a1398 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
>  			/* PE#0 is needed for error reporting */
>  			pnv_ioda_reserve_pe(phb, 0);
>  			pnv_ioda_setup_npu_PEs(hose->bus);
> -			if (phb->model == PNV_PHB_MODEL_NPU2)
> -				pnv_npu2_init(phb);
> +			if (phb->model == PNV_PHB_MODEL_NPU2) {
> +				pnv_npu2_init(hose);
> +				pnv_npu2_map_lpar_phb(phb,
> +						MSR_DR | MSR_PR | MSR_HV);

This is the only caller of pnv_np2_map_lpar_phb(), do we need the
second parameter?  I take it those "flags" to the function are
actually an MSR value; that wasn't clear from the previous code.

> +			}
>  		}
>  		if (phb->type == PNV_PHB_NPU_OCAPI) {
>  			bus = hose->bus;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
  2018-11-08  5:05     ` David Gibson
@ 2018-11-13  6:13       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-11-13  6:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc



On 08/11/2018 16:05, David Gibson wrote:
> On Mon, Oct 15, 2018 at 08:33:00PM +1100, Alexey Kardashevskiy wrote:
>> At the moment the NPU context init/destroy code calls OPAL. The init
>> handler in OPAL configures the NPU to pass ATS requests to nested MMU,
>> the destroy handler does nothing besides sanity checks.
>>
>> Since the init handler programs the NPU with a wildcard for LPID/PID,
>> this can be done at the point where a GPU is mapped to an LPAR; it also
>> makes calling opal_npu_destroy_context() unnecessary in this context
>> (this will change with VFIO later though).
> 
> I think this wants a bit more explanation.  It certainly makes me
> nervous removing the destroy_context calls with nothing replacing
> them.
> 
>> Also, the pnv_npu2_init() helper does not really need to call OPAL as
>> well as it inialized an NPU structure and does not interact with GPU or
>> NPU at that moment.
>>
>> This moves OPAL calls to a separate helper. With this change, the API
>> for GPUs does not do any OPAL calls and therefore can be used by both
>> pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
>> should be called on powernv only as it does OPAL calls and it takes
>> an MSR mask which NPU adds to ATS requests so nested MMU knows what
>> translations are permitted; the VFIO/KVM will not set MSR_HV.
>>
>> This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
>> pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/pci.h            |   1 +
>>  arch/powerpc/platforms/powernv/pci.h      |   2 +-
>>  arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
>>  4 files changed, 57 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>> index 1a96075..f196df6 100644
>> --- a/arch/powerpc/include/asm/pci.h
>> +++ b/arch/powerpc/include/asm/pci.h
>> @@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
>>  extern void pnv_npu2_devices_init(void);
>> +extern int pnv_npu2_init(struct pci_controller *hose);
>>  
>>  #endif /* __ASM_POWERPC_PCI_H */
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 3b7617d..ca2ce4b 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
>>  extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
>>  extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
>>  extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
>> -extern int pnv_npu2_init(struct pnv_phb *phb);
>> +extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
>>  
>>  /* pci-ioda-tce.c */
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
>> index cb2b4f9..677f30a 100644
>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>> @@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  	u32 nvlink_index;
>>  	struct device_node *nvlink_dn;
>>  	struct mm_struct *mm = current->mm;
>> -	struct pnv_phb *nphb;
>>  	struct npu *npu;
>>  	struct npu_context *npu_context;
>>  
>> @@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  	 */
>>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return ERR_PTR(-ENODEV);
>> -
>>  	if (!npdev)
>>  		/* No nvlink associated with this GPU device */
>>  		return ERR_PTR(-ENODEV);
>> @@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  		return ERR_PTR(-EINVAL);
>>  	}
>>  
>> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>>  	npu = npdev_to_npu(npdev);
>>  
>>  	/*
>> -	 * Setup the NPU context table for a particular GPU. These need to be
>> -	 * per-GPU as we need the tables to filter ATSDs when there are no
>> -	 * active contexts on a particular GPU. It is safe for these to be
>> -	 * called concurrently with destroy as the OPAL call takes appropriate
>> -	 * locks and refcounts on init/destroy.
>> -	 */
>> -	rc = opal_npu_init_context(nphb->opal_id, mm->context.id,
>> flags,
> 
> AFAICT this was the only use of 'flags' in this function, so it should
> be removed from the signature, no?


It should, however this is ABI and the vendor driver is calling this
function with the flags so I cannot just ditch the flags.


> 
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>> -	if (rc < 0)
>> -		return ERR_PTR(-ENOSPC);
>> -
>> -	/*
>>  	 * We store the npu pci device so we can more easily get at the
>>  	 * associated npus.
>>  	 */
>> @@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  		if (npu_context->release_cb != cb ||
>>  			npu_context->priv != priv) {
>>  			spin_unlock(&npu2_devices.context_lock);
>> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>> -						PCI_DEVID(gpdev->bus->number,
>> -							gpdev->devfn));
>>  			return ERR_PTR(-EINVAL);
>>  		}
>>  
>> @@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  
>>  		if (rc) {
>>  			kfree(npu_context);
>> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>> -					PCI_DEVID(gpdev->bus->number,
>> -						gpdev->devfn));
>>  			return ERR_PTR(rc);
>>  		}
>>  
>> @@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>>  			struct pci_dev *gpdev)
>>  {
>>  	int removed;
>> -	struct pnv_phb *nphb;
>>  	struct npu *npu;
>>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>>  	struct device_node *nvlink_dn;
>> @@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>>  	if (WARN_ON(!npdev))
>>  		return;
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return;
>> -
>> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>>  	npu = npdev_to_npu(npdev);
>>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>>  							&nvlink_index)))
>>  		return;
>>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
>> -	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>>  	spin_lock(&npu2_devices.context_lock);
>>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
>>  	spin_unlock(&npu2_devices.context_lock);
>> @@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>>  	/* mmap_sem should be held so the struct_mm must be present */
>>  	struct mm_struct *mm = context->mm;
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return -ENODEV;
>> -
>>  	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
>>  
>>  	for (i = 0; i < count; i++) {
>> @@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>>  }
>>  EXPORT_SYMBOL(pnv_npu2_handle_fault);
>>  
>> -int pnv_npu2_init(struct pnv_phb *phb)
>> +int pnv_npu2_init(struct pci_controller *hose)
>>  {
>>  	unsigned int i;
>>  	u64 mmio_atsd;
>> -	struct device_node *dn;
>> -	struct pci_dev *gpdev;
>>  	static int npu_index;
>> -	uint64_t rc = 0;
>> -	struct pci_controller *hose = phb->hose;
>>  	struct npu *npu;
>>  	int ret;
>>  
>> @@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
>>  	}
>>  
>>  	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
>> -	for_each_child_of_node(phb->hose->dn, dn) {
>> -		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> -		if (gpdev) {
>> -			rc = opal_npu_map_lpar(phb->opal_id,
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
>> -				0, 0);
>> -			if (rc)
>> -				dev_err(&gpdev->dev,
>> -					"Error %lld mapping device to LPAR\n",
>> -					rc);
>> -		}
>> -	}
>>  
>>  	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>>  							i, &mmio_atsd); i++)
>> @@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
>>  
>>  	return ret;
>>  }
>> +
>> +void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
>> +{
>> +	struct pci_dev *gpdev;
>> +	struct device_node *dn;
>> +	int ret;
>> +	struct pci_controller *hose = nphb->hose;
>> +
>> +	for_each_child_of_node(hose->dn, dn) {
>> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> +		if (!gpdev)
>> +			continue;
>> +
>> +		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
>> +		ret = opal_npu_map_lpar(nphb->opal_id,
>> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
>> +				0, 0);
>> +		if (!ret)
>> +			continue;
>> +		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
>> +	}
>> +
>> +	/*
>> +	 * It seems that touching NPU2_XTS_BDF_MAP in the way
>> +	 * the opal_npu_map_lpar() does somehow affects the result of
>> +	 * what opal_npu_init_context() does so let's put the latter in
>> +	 * a separate loop.
>> +	 */
>> +	for_each_child_of_node(hose->dn, dn) {
>> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> +		if (!gpdev)
>> +			continue;
>> +
>> +		/*
>> +		 * Setup the NPU context table for a particular GPU. These need
>> +		 * to be per-GPU as we need the tables to filter ATSDs when
>> +		 * there are no active contexts on a particular GPU. It is safe
>> +		 * for these to be called concurrently with destroy as the OPAL
>> +		 * call takes appropriate locks and refcounts on init/destroy.
>> +		 */
>> +		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
>> +				nphb->opal_id);
>> +		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
>> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>> +		if (!ret)
>> +			continue;
>> +		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
>> +	}
>> +}
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 0cc81c0..56a1398 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
>>  			/* PE#0 is needed for error reporting */
>>  			pnv_ioda_reserve_pe(phb, 0);
>>  			pnv_ioda_setup_npu_PEs(hose->bus);
>> -			if (phb->model == PNV_PHB_MODEL_NPU2)
>> -				pnv_npu2_init(phb);
>> +			if (phb->model == PNV_PHB_MODEL_NPU2) {
>> +				pnv_npu2_init(hose);
>> +				pnv_npu2_map_lpar_phb(phb,
>> +						MSR_DR | MSR_PR | MSR_HV);
> 
> This is the only caller of pnv_np2_map_lpar_phb(), do we need the
> second parameter?  I take it those "flags" to the function are
> actually an MSR value; that wasn't clear from the previous code.


Correct, the flags are some MSR bits, the name was there before I
started touching this... The flags are (MSR_DR | MSR_PR) when VFIO
subdriver calls pnv_npu2_map_lpar_phb from later patches. I'll repost
the whole thing as a single patchset, splitting it in parts was a bad
idea. Thanks,




> 
>> +			}
>>  		}
>>  		if (phb->type == PNV_PHB_NPU_OCAPI) {
>>  			bus = hose->bus;
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation
@ 2018-11-13  6:13       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-11-13  6:13 UTC (permalink / raw)
  To: David Gibson
  Cc: Alistair Popple, Alex Williamson, linuxppc-dev, Frederic Barrat, kvm-ppc



On 08/11/2018 16:05, David Gibson wrote:
> On Mon, Oct 15, 2018 at 08:33:00PM +1100, Alexey Kardashevskiy wrote:
>> At the moment the NPU context init/destroy code calls OPAL. The init
>> handler in OPAL configures the NPU to pass ATS requests to nested MMU,
>> the destroy handler does nothing besides sanity checks.
>>
>> Since the init handler programs the NPU with a wildcard for LPID/PID,
>> this can be done at the point where a GPU is mapped to an LPAR; it also
>> makes calling opal_npu_destroy_context() unnecessary in this context
>> (this will change with VFIO later though).
> 
> I think this wants a bit more explanation.  It certainly makes me
> nervous removing the destroy_context calls with nothing replacing
> them.
> 
>> Also, the pnv_npu2_init() helper does not really need to call OPAL as
>> well as it inialized an NPU structure and does not interact with GPU or
>> NPU at that moment.
>>
>> This moves OPAL calls to a separate helper. With this change, the API
>> for GPUs does not do any OPAL calls and therefore can be used by both
>> pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
>> should be called on powernv only as it does OPAL calls and it takes
>> an MSR mask which NPU adds to ATS requests so nested MMU knows what
>> translations are permitted; the VFIO/KVM will not set MSR_HV.
>>
>> This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
>> pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/pci.h            |   1 +
>>  arch/powerpc/platforms/powernv/pci.h      |   2 +-
>>  arch/powerpc/platforms/powernv/npu-dma.c  | 100 +++++++++++++++---------------
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
>>  4 files changed, 57 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>> index 1a96075..f196df6 100644
>> --- a/arch/powerpc/include/asm/pci.h
>> +++ b/arch/powerpc/include/asm/pci.h
>> @@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
>>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
>>  extern void pnv_npu2_devices_init(void);
>> +extern int pnv_npu2_init(struct pci_controller *hose);
>>  
>>  #endif /* __ASM_POWERPC_PCI_H */
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 3b7617d..ca2ce4b 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
>>  extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
>>  extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
>>  extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
>> -extern int pnv_npu2_init(struct pnv_phb *phb);
>> +extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
>>  
>>  /* pci-ioda-tce.c */
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS	1
>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
>> index cb2b4f9..677f30a 100644
>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>> @@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  	u32 nvlink_index;
>>  	struct device_node *nvlink_dn;
>>  	struct mm_struct *mm = current->mm;
>> -	struct pnv_phb *nphb;
>>  	struct npu *npu;
>>  	struct npu_context *npu_context;
>>  
>> @@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  	 */
>>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return ERR_PTR(-ENODEV);
>> -
>>  	if (!npdev)
>>  		/* No nvlink associated with this GPU device */
>>  		return ERR_PTR(-ENODEV);
>> @@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  		return ERR_PTR(-EINVAL);
>>  	}
>>  
>> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>>  	npu = npdev_to_npu(npdev);
>>  
>>  	/*
>> -	 * Setup the NPU context table for a particular GPU. These need to be
>> -	 * per-GPU as we need the tables to filter ATSDs when there are no
>> -	 * active contexts on a particular GPU. It is safe for these to be
>> -	 * called concurrently with destroy as the OPAL call takes appropriate
>> -	 * locks and refcounts on init/destroy.
>> -	 */
>> -	rc = opal_npu_init_context(nphb->opal_id, mm->context.id,
>> flags,
> 
> AFAICT this was the only use of 'flags' in this function, so it should
> be removed from the signature, no?


It should, however this is ABI and the vendor driver is calling this
function with the flags so I cannot just ditch the flags.


> 
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>> -	if (rc < 0)
>> -		return ERR_PTR(-ENOSPC);
>> -
>> -	/*
>>  	 * We store the npu pci device so we can more easily get at the
>>  	 * associated npus.
>>  	 */
>> @@ -817,9 +800,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  		if (npu_context->release_cb != cb ||
>>  			npu_context->priv != priv) {
>>  			spin_unlock(&npu2_devices.context_lock);
>> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>> -						PCI_DEVID(gpdev->bus->number,
>> -							gpdev->devfn));
>>  			return ERR_PTR(-EINVAL);
>>  		}
>>  
>> @@ -845,9 +825,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>>  
>>  		if (rc) {
>>  			kfree(npu_context);
>> -			opal_npu_destroy_context(nphb->opal_id, mm->context.id,
>> -					PCI_DEVID(gpdev->bus->number,
>> -						gpdev->devfn));
>>  			return ERR_PTR(rc);
>>  		}
>>  
>> @@ -900,7 +877,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>>  			struct pci_dev *gpdev)
>>  {
>>  	int removed;
>> -	struct pnv_phb *nphb;
>>  	struct npu *npu;
>>  	struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>>  	struct device_node *nvlink_dn;
>> @@ -909,18 +885,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
>>  	if (WARN_ON(!npdev))
>>  		return;
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return;
>> -
>> -	nphb = pci_bus_to_host(npdev->bus)->private_data;
>>  	npu = npdev_to_npu(npdev);
>>  	nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
>>  	if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
>>  							&nvlink_index)))
>>  		return;
>>  	WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
>> -	opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>>  	spin_lock(&npu2_devices.context_lock);
>>  	removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
>>  	spin_unlock(&npu2_devices.context_lock);
>> @@ -952,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>>  	/* mmap_sem should be held so the struct_mm must be present */
>>  	struct mm_struct *mm = context->mm;
>>  
>> -	if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -		return -ENODEV;
>> -
>>  	WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
>>  
>>  	for (i = 0; i < count; i++) {
>> @@ -983,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
>>  }
>>  EXPORT_SYMBOL(pnv_npu2_handle_fault);
>>  
>> -int pnv_npu2_init(struct pnv_phb *phb)
>> +int pnv_npu2_init(struct pci_controller *hose)
>>  {
>>  	unsigned int i;
>>  	u64 mmio_atsd;
>> -	struct device_node *dn;
>> -	struct pci_dev *gpdev;
>>  	static int npu_index;
>> -	uint64_t rc = 0;
>> -	struct pci_controller *hose = phb->hose;
>>  	struct npu *npu;
>>  	int ret;
>>  
>> @@ -1006,18 +969,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
>>  	}
>>  
>>  	npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
>> -	for_each_child_of_node(phb->hose->dn, dn) {
>> -		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> -		if (gpdev) {
>> -			rc = opal_npu_map_lpar(phb->opal_id,
>> -				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
>> -				0, 0);
>> -			if (rc)
>> -				dev_err(&gpdev->dev,
>> -					"Error %lld mapping device to LPAR\n",
>> -					rc);
>> -		}
>> -	}
>>  
>>  	for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
>>  							i, &mmio_atsd); i++)
>> @@ -1047,3 +998,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
>>  
>>  	return ret;
>>  }
>> +
>> +void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr)
>> +{
>> +	struct pci_dev *gpdev;
>> +	struct device_node *dn;
>> +	int ret;
>> +	struct pci_controller *hose = nphb->hose;
>> +
>> +	for_each_child_of_node(hose->dn, dn) {
>> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> +		if (!gpdev)
>> +			continue;
>> +
>> +		dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu\n", nphb->opal_id);
>> +		ret = opal_npu_map_lpar(nphb->opal_id,
>> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn),
>> +				0, 0);
>> +		if (!ret)
>> +			continue;
>> +		dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
>> +	}
>> +
>> +	/*
>> +	 * It seems that touching NPU2_XTS_BDF_MAP in the way
>> +	 * the opal_npu_map_lpar() does somehow affects the result of
>> +	 * what opal_npu_init_context() does so let's put the latter in
>> +	 * a separate loop.
>> +	 */
>> +	for_each_child_of_node(hose->dn, dn) {
>> +		gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
>> +		if (!gpdev)
>> +			continue;
>> +
>> +		/*
>> +		 * Setup the NPU context table for a particular GPU. These need
>> +		 * to be per-GPU as we need the tables to filter ATSDs when
>> +		 * there are no active contexts on a particular GPU. It is safe
>> +		 * for these to be called concurrently with destroy as the OPAL
>> +		 * call takes appropriate locks and refcounts on init/destroy.
>> +		 */
>> +		dev_dbg(&gpdev->dev, "init context opalid=%llu\n",
>> +				nphb->opal_id);
>> +		ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
>> +				PCI_DEVID(gpdev->bus->number, gpdev->devfn));
>> +		if (!ret)
>> +			continue;
>> +		dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
>> +	}
>> +}
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 0cc81c0..56a1398 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -1287,8 +1287,11 @@ static void pnv_pci_ioda_setup_PEs(void)
>>  			/* PE#0 is needed for error reporting */
>>  			pnv_ioda_reserve_pe(phb, 0);
>>  			pnv_ioda_setup_npu_PEs(hose->bus);
>> -			if (phb->model = PNV_PHB_MODEL_NPU2)
>> -				pnv_npu2_init(phb);
>> +			if (phb->model = PNV_PHB_MODEL_NPU2) {
>> +				pnv_npu2_init(hose);
>> +				pnv_npu2_map_lpar_phb(phb,
>> +						MSR_DR | MSR_PR | MSR_HV);
> 
> This is the only caller of pnv_np2_map_lpar_phb(), do we need the
> second parameter?  I take it those "flags" to the function are
> actually an MSR value; that wasn't clear from the previous code.


Correct, the flags are some MSR bits, the name was there before I
started touching this... The flags are (MSR_DR | MSR_PR) when VFIO
subdriver calls pnv_npu2_map_lpar_phb from later patches. I'll repost
the whole thing as a single patchset, splitting it in parts was a bad
idea. Thanks,




> 
>> +			}
>>  		}
>>  		if (phb->type = PNV_PHB_NPU_OCAPI) {
>>  			bus = hose->bus;
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-11-13  6:15 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-15  9:32 [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2) Alexey Kardashevskiy
2018-10-15  9:32 ` Alexey Kardashevskiy
2018-10-15  9:32 ` [PATCH kernel 1/5] powerpc/powernv/npu: Add helper to access struct npu for NPU device Alexey Kardashevskiy
2018-10-15  9:32   ` Alexey Kardashevskiy
2018-11-07  5:04   ` David Gibson
2018-11-07  5:04     ` David Gibson
2018-10-15  9:32 ` [PATCH kernel 2/5] powerpc/powernv/npu: Collect all static symbols under one struct Alexey Kardashevskiy
2018-10-15  9:32   ` Alexey Kardashevskiy
2018-11-07  5:08   ` David Gibson
2018-11-07  5:08     ` David Gibson
2018-10-15  9:32 ` [PATCH kernel 3/5] powerpc/powernv: Detach npu struct from pnv_phb Alexey Kardashevskiy
2018-10-15  9:32   ` Alexey Kardashevskiy
2018-11-07  5:14   ` David Gibson
2018-11-07  5:14     ` David Gibson
2018-10-15  9:33 ` [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation Alexey Kardashevskiy
2018-10-15  9:33   ` Alexey Kardashevskiy
2018-11-08  5:05   ` David Gibson
2018-11-08  5:05     ` David Gibson
2018-11-13  6:13     ` Alexey Kardashevskiy
2018-11-13  6:13       ` Alexey Kardashevskiy
2018-10-15  9:33 ` [PATCH kernel 5/5] powerpc/powernv/npu: Add helper to map GPU to LPAR Alexey Kardashevskiy
2018-10-15  9:33   ` Alexey Kardashevskiy
2018-10-15  9:34 ` [PATCH kernel 0/5] powerpc/powernv/npu: Reworks for NVIDIA V100 + P9 passthrough (part 2) Alexey Kardashevskiy
2018-10-15  9:34   ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.