linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/2] Add support for Nvlink
@ 2015-10-28  5:00 Alistair Popple
  2015-10-28  5:00 ` [PATCH 1/2] Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field" Alistair Popple
  2015-10-28  5:00 ` [PATCH 2/2] platforms/powernv: Add support for Nvlink NPUs Alistair Popple
  0 siblings, 2 replies; 9+ messages in thread
From: Alistair Popple @ 2015-10-28  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: gwshan, benh, Alistair Popple

This series adds support for Nvlink, a high-speed interconnect that is
used in conjunction with PCIe to create a high-bandwidth interface
between the GPU and CPU.

As the Nvlink hardware interface is similar to IBM's existing PCIe
host bridges, no major new kernel or user interfaces are added by this
patch series. Instead, the links are treated as standard PCIe devices
sitting under an Nvlink-specific PHB type. This allows existing kernel
interfaces to be used for the management and control of the links.

Alistair Popple (2):
  Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"
  platforms/powernv: Add support for Nvlink NPUs

 arch/powerpc/include/asm/pci-bridge.h     |   1 +
 arch/powerpc/include/asm/pci.h            |   4 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 267 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 105 ++++++++++--
 arch/powerpc/platforms/powernv/pci.c      |   4 +
 arch/powerpc/platforms/powernv/pci.h      |  10 ++
 7 files changed, 381 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/npu-dma.c

--
2.1.4


* [PATCH 1/2] Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"
  2015-10-28  5:00 [PATCH 0/2] Add support for Nvlink Alistair Popple
@ 2015-10-28  5:00 ` Alistair Popple
  2015-10-28  5:00 ` [PATCH 2/2] platforms/powernv: Add support for Nvlink NPUs Alistair Popple
  1 sibling, 0 replies; 9+ messages in thread
From: Alistair Popple @ 2015-10-28  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: gwshan, benh, Alistair Popple

This commit removed the pcidev field from struct pci_dn as it was no
longer in use by the kernel. However, this field is required to
support finding the association of Nvlink devices with GPU devices
from the device-tree.

This reverts commit 250c7b277c65.
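
To illustrate why the back-pointer is needed (this sketch is
illustrative only and not part of the patch; the helper name is
invented), restoring pci_dn.pcidev lets a device-tree phandle such as
"ibm,gpu" be resolved all the way to the struct pci_dev it describes:

static struct pci_dev *example_dn_to_pci_dev(struct device_node *node)
{
	struct device_node *dn;
	struct pci_dev *pdev = NULL;

	/* "ibm,gpu" links an Nvlink device node to its GPU's PCI node */
	dn = of_parse_phandle(node, "ibm,gpu", 0);
	if (!dn)
		return NULL;

	/* pci_dn.pcidev is the back-pointer restored by this revert */
	if (PCI_DN(dn))
		pdev = PCI_DN(dn)->pcidev;

	of_node_put(dn);
	return pdev;
}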

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/include/asm/pci-bridge.h     | 1 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 37fc535..54843ca 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -205,6 +205,7 @@ struct pci_dn {
 
 	int	pci_ext_config_space;	/* for pci devices */
 
+	struct	pci_dev *pcidev;	/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
 	struct eeh_dev *edev;		/* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 414fd1a..42b4bb2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1007,6 +1007,7 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 				pci_name(dev));
 			continue;
 		}
+		pdn->pcidev = dev;
 		pdn->pe_number = pe->pe_number;
 		pe->dma_weight += pnv_ioda_dma_weight(dev);
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
2.1.4


* [PATCH 2/2] platforms/powernv: Add support for Nvlink NPUs
  2015-10-28  5:00 [PATCH 0/2] Add support for Nvlink Alistair Popple
  2015-10-28  5:00 ` [PATCH 1/2] Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field" Alistair Popple
@ 2015-10-28  5:00 ` Alistair Popple
  2015-11-10  2:28   ` [PATCH v2] " Alistair Popple
  1 sibling, 1 reply; 9+ messages in thread
From: Alistair Popple @ 2015-10-28  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: gwshan, benh, Alistair Popple

NV-Link is a high-speed interconnect that is used in conjunction with
a PCI-E connection to create an interface between the CPU and GPU that
provides very high data bandwidth. A PCI-E connection to a GPU is used
as the control path to initiate and report the status of large data
transfers sent via the NV-Link.

On IBM Power systems the NV-Link hardware interface is very similar
to the existing PHB3. This patch adds support for this new NPU PHB
type. DMA operations on the NPU are not supported, as this patch sets
the TCE translation tables to be the same as those of the related GPU
PCIe device for each Nvlink. Therefore all DMA operations are set up
and controlled via the PCIe device.

EEH is not presently supported for the NPU devices, although it may be
added in the future.
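
As a rough illustration of the resulting programming model (the probe
function below is hypothetical and not part of this patch), a GPU
driver only configures DMA on the GPU's PCIe device; the kernel
propagates the DMA mask to any linked NPU devices, and
pnv_get_pci_nvl_dev() merely enumerates the links:

#include <linux/pci.h>
#include <linux/dma-mapping.h>
/* pnv_get_pci_nvl_dev() is declared in asm/pci.h by this patch */

static int example_gpu_probe(struct pci_dev *gpdev)
{
	struct pci_dev *npdev;
	int i;

	/*
	 * Setting the mask on the GPU PCIe device also updates the DMA
	 * mask of every linked NPU device (see pnv_pci_ioda_dma_set_mask()
	 * in this patch), so the driver never programs the NPU directly.
	 */
	if (dma_set_mask(&gpdev->dev, DMA_BIT_MASK(64)))
		return -EIO;

	/* Enumerate the Nvlink devices advertised via "ibm,npu" phandles */
	for (i = 0; (npdev = pnv_get_pci_nvl_dev(gpdev, i)); i++)
		dev_info(&gpdev->dev, "linked to NPU device %s\n",
			 pci_name(npdev));

	return 0;
}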

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pci.h            |   4 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 267 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 104 ++++++++++--
 arch/powerpc/platforms/powernv/pci.c      |   4 +
 arch/powerpc/platforms/powernv/pci.h      |  10 ++
 6 files changed, 379 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/npu-dma.c

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 3453bd8..4409ca9 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -149,4 +149,8 @@ extern void pcibios_setup_phb_io_space(struct pci_controller *hose);
 extern void pcibios_scan_phb(struct pci_controller *hose);
 
 #endif	/* __KERNEL__ */
+
+extern struct pci_dev *pnv_get_nvl_pci_dev(struct pci_dev *nvl_dev);
+extern struct pci_dev *pnv_get_pci_nvl_dev(struct pci_dev *pci_dev, int index);
+
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 1c8cdb6..ee774e8 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,7 +4,7 @@ obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
new file mode 100644
index 0000000..4f8ec18
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -0,0 +1,267 @@
+/*
+ * This file implements the DMA operations for Nvlink devices. The NPU
+ * devices all point to the same iommu table as the parent PCI device.
+ *
+ * Copyright Alistair Popple, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/export.h>
+#include <linux/pci.h>
+#include <linux/memblock.h>
+
+#include <asm/iommu.h>
+#include <asm/pnv-pci.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static struct pci_dev *get_pci_dev(struct device_node *dn)
+{
+	return PCI_DN(dn)->pcidev;
+}
+
+/* Given an NPU device, get the associated PCI device. */
+struct pci_dev *pnv_get_nvl_pci_dev(struct pci_dev *nvl_dev)
+{
+	struct device_node *dn;
+	struct pci_dev *pci_dev;
+
+	/* Get associated PCI device */
+	dn = of_parse_phandle(nvl_dev->dev.of_node, "ibm,gpu", 0);
+	if (!dn)
+		return NULL;
+
+	pci_dev = get_pci_dev(dn);
+	of_node_put(dn);
+
+	return pci_dev;
+}
+EXPORT_SYMBOL(pnv_get_nvl_pci_dev);
+
+/* Given the real PCI device get a linked NPU device. */
+struct pci_dev *pnv_get_pci_nvl_dev(struct pci_dev *pci_dev, int index)
+{
+	struct device_node *dn;
+	struct pci_dev *nvl_dev;
+
+	/* Get associated PCI device */
+	dn = of_parse_phandle(pci_dev->dev.of_node, "ibm,npu", index);
+	if (!dn)
+		return NULL;
+
+	nvl_dev = get_pci_dev(dn);
+	of_node_put(dn);
+
+	return nvl_dev;
+}
+EXPORT_SYMBOL(pnv_get_pci_nvl_dev);
+
+const struct dma_map_ops *get_linked_pci_dma_map_ops(struct device *dev,
+					struct pci_dev **pci_dev)
+{
+	*pci_dev = pnv_get_nvl_pci_dev(to_pci_dev(dev));
+	if (!*pci_dev)
+		return NULL;
+
+	return get_dma_ops(&(*pci_dev)->dev);
+}
+
+#define NPU_DMA_OP_UNSUPPORTED()					\
+	dev_err_once(dev, "%s operation unsupported for Nvlink devices\n", \
+		__func__)
+
+static void *dma_npu_alloc(struct device *dev, size_t size,
+				      dma_addr_t *dma_handle, gfp_t flag,
+				      struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return NULL;
+}
+
+static void dma_npu_free(struct device *dev, size_t size,
+				    void *vaddr, dma_addr_t dma_handle,
+				    struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+}
+
+static dma_addr_t dma_npu_map_page(struct device *dev, struct page *page,
+				     unsigned long offset, size_t size,
+				     enum dma_data_direction direction,
+				     struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static int dma_npu_map_sg(struct device *dev, struct scatterlist *sglist,
+			    int nelems, enum dma_data_direction direction,
+			    struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static int dma_npu_dma_supported(struct device *dev, u64 mask)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static u64 dma_npu_get_required_mask(struct device *dev)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+struct dma_map_ops dma_npu_ops = {
+	.map_page		= dma_npu_map_page,
+	.map_sg			= dma_npu_map_sg,
+	.alloc			= dma_npu_alloc,
+	.free			= dma_npu_free,
+	.dma_supported		= dma_npu_dma_supported,
+	.get_required_mask	= dma_npu_get_required_mask,
+};
+
+/* Returns the PE associated with the PCI device of the given
+ * NPU. Returns the linked pci device if pci_dev != NULL.
+ */
+static struct pnv_ioda_pe *get_linked_pci_pe(struct pci_dev *npu_dev,
+					struct pci_dev **pci_dev)
+{
+	struct pci_dev *linked_pci_dev;
+	struct pci_controller *pci_hose;
+	struct pnv_phb *pci_phb;
+	struct pnv_ioda_pe *linked_pe;
+	unsigned long pe_num;
+
+	linked_pci_dev = pnv_get_nvl_pci_dev(npu_dev);
+	if (!linked_pci_dev)
+		return NULL;
+
+	pci_hose = pci_bus_to_host(linked_pci_dev->bus);
+	pci_phb = pci_hose->private_data;
+	pe_num = pci_get_pdn(linked_pci_dev)->pe_number;
+	if (pe_num == IODA_INVALID_PE)
+		return NULL;
+
+	linked_pe = &pci_phb->ioda.pe_array[pe_num];
+	if (pci_dev)
+		*pci_dev = linked_pci_dev;
+
+	return linked_pe;
+}
+
+/* For the NPU we want to point the TCE table at the same table as the
+ * real PCI device.
+ */
+void pnv_pci_npu_setup_dma_pe(struct pnv_phb *npu,
+			struct pnv_ioda_pe *npu_pe)
+{
+	void *addr;
+	struct pci_dev *pci_dev;
+	struct pnv_ioda_pe *pci_pe;
+	unsigned int tce_table_size;
+	int rc;
+
+	/* Find the associated PCI devices and get the dma window
+	 * information from there.
+	 */
+	if (!npu_pe->pdev || !(npu_pe->flags & PNV_IODA_PE_DEV))
+		return;
+
+	pci_pe = get_linked_pci_pe(npu_pe->pdev, &pci_dev);
+	if (!pci_pe)
+		return;
+
+	addr = (void *) pci_pe->table_group.tables[0]->it_base;
+	tce_table_size = pci_pe->table_group.tables[0]->it_size << 3;
+	rc = opal_pci_map_pe_dma_window(npu->opal_id, npu_pe->pe_number,
+					npu_pe->pe_number, 1, __pa(addr),
+					tce_table_size, 0x1000);
+	WARN_ON(rc != OPAL_SUCCESS);
+
+	/* We don't initialise npu_pe->tce32_table as we always use
+	 * dma_npu_ops which redirects to the actual pci device dma op
+	 * functions.
+	 */
+	set_dma_ops(&npu_pe->pdev->dev, &dma_npu_ops);
+}
+
+/* Enable/disable bypass mode on the NPU. The NPU only supports one
+ * window per brick, so bypass needs to be explicitly enabled or
+ * disabled. Unlike for a PHB3, bypass and non-bypass modes can't be
+ * active at the same time.
+ */
+int pnv_pci_npu_dma_set_bypass(struct pnv_phb *npu,
+			struct pnv_ioda_pe *npu_pe, bool enabled)
+{
+	int rc = 0;
+
+	if (npu->type != PNV_PHB_NPU)
+		return -EINVAL;
+
+	if (enabled) {
+		/* Enable the bypass window */
+		phys_addr_t top = memblock_end_of_DRAM();
+
+		npu_pe->tce_bypass_base = 0;
+		top = roundup_pow_of_two(top);
+		dev_info(&npu_pe->pdev->dev, "Enabling bypass for PE %d\n",
+			npu_pe->pe_number);
+		rc = opal_pci_map_pe_dma_window_real(npu->opal_id,
+						     npu_pe->pe_number,
+						     npu_pe->pe_number,
+						     npu_pe->tce_bypass_base,
+						     top);
+	} else
+		/* Disable the bypass window by replacing it with the
+		 * TCE32 window.
+		 */
+		pnv_pci_npu_setup_dma_pe(npu, npu_pe);
+
+	return rc;
+}
+
+int pnv_npu_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pnv_ioda_pe *pe, *linked_pe;
+	struct pci_dev *linked_pci_dev;
+	uint64_t top;
+	bool bypass = false;
+
+	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
+		return -ENODEV;
+
+
+	/* We only do bypass if it's enabled on the linked device */
+	linked_pe = get_linked_pci_pe(pdev, &linked_pci_dev);
+	if (!linked_pe)
+		return -ENODEV;
+
+	if (linked_pe->tce_bypass_enabled) {
+		top = linked_pe->tce_bypass_base + memblock_end_of_DRAM() - 1;
+		bypass = (dma_mask >= top);
+	}
+
+	if (bypass)
+		dev_info(&pdev->dev, "Using 64-bit DMA iommu bypass\n");
+	else
+		dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
+
+	pe = &phb->ioda.pe_array[pdn->pe_number];
+	pnv_pci_npu_dma_set_bypass(phb, pe, bypass);
+	*pdev->dev.dma_mask = dma_mask;
+
+	return 0;
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 42b4bb2..72b6ced 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -781,7 +781,8 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 	}
 
 	/* Configure PELTV */
-	pnv_ioda_set_peltv(phb, pe, true);
+	if (phb->type != PNV_PHB_NPU)
+		pnv_ioda_set_peltv(phb, pe, true);
 
 	/* Setup reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
@@ -924,7 +925,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */
 
-#if 0
 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
@@ -941,11 +941,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	if (pdn->pe_number != IODA_INVALID_PE)
 		return NULL;
 
-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
+	pe_num = pnv_ioda_alloc_pe(phb);
 	if (pe_num == IODA_INVALID_PE) {
 		pr_warning("%s: Not enough PE# available, disabling device\n",
 			   pci_name(dev));
@@ -963,6 +959,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	pci_dev_get(dev);
 	pdn->pcidev = dev;
 	pdn->pe_number = pe_num;
+	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
 	pe->pbus = NULL;
 	pe->tce32_seg = -1;
@@ -993,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 
 	return pe;
 }
-#endif /* Useful for SRIOV case */
 
 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
@@ -1084,6 +1080,18 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pnv_ioda_link_pe_by_weight(phb, pe);
 }
 
+static void pnv_ioda_setup_dev_PEs(struct pci_bus *bus)
+{
+	struct pci_bus *child;
+	struct pci_dev *pdev;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list)
+		pnv_ioda_setup_dev_PE(pdev);
+
+	list_for_each_entry(child, &bus->children, node)
+		pnv_ioda_setup_dev_PEs(child);
+}
+
 static void pnv_ioda_setup_PEs(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -1120,7 +1128,15 @@ static void pnv_pci_ioda_setup_PEs(void)
 		if (phb->reserve_m64_pe)
 			phb->reserve_m64_pe(hose->bus, NULL, true);
 
-		pnv_ioda_setup_PEs(hose->bus);
+		/*
+		 * On NPU PHB, we expect separate PEs for individual PCI
+		 * functions. PCI bus dependent PEs are required for the
+		 * remaining types of PHBs.
+		 */
+		if (phb->type == PNV_PHB_NPU)
+			pnv_ioda_setup_dev_PEs(hose->bus);
+		else
+			pnv_ioda_setup_PEs(hose->bus);
 	}
 }
 
@@ -1579,6 +1595,8 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 	struct pnv_ioda_pe *pe;
 	uint64_t top;
 	bool bypass = false;
+	struct pci_dev *linked_npu_dev;
+	int i;
 
 	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
 		return -ENODEV;;
@@ -1597,6 +1615,12 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 		set_dma_ops(&pdev->dev, &dma_iommu_ops);
 	}
 	*pdev->dev.dma_mask = dma_mask;
+
+	/* Update all associated npu devices */
+	for (i = 0; (linked_npu_dev = pnv_get_pci_nvl_dev(pdev, i)); i++)
+		if (dma_get_mask(&linked_npu_dev->dev) != dma_mask)
+			dma_set_mask(&linked_npu_dev->dev, dma_mask);
+
 	return 0;
 }
 
@@ -2437,10 +2461,16 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
 				pe->dma_weight, segs);
 			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
-		} else {
+		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
 			segs = 0;
 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
+		} else if (phb->type == PNV_PHB_NPU) {
+			/* We initialise the DMA space for an NPU PHB
+			 * after setup of the PHB is complete as we
+			 * point the NPU TVT to the same location
+			 * as the PHB3 TVT.
+			 */
 		}
 
 		remaining -= segs;
@@ -2882,6 +2912,11 @@ static void pnv_pci_ioda_setup_seg(void)
 
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
+
+		/* NPU PHB does not support IO or MMIO segmentation */
+		if (phb->type == PNV_PHB_NPU)
+			continue;
+
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 			pnv_ioda_setup_pe_seg(hose, pe);
 		}
@@ -2921,6 +2956,26 @@ static void pnv_pci_ioda_create_dbgfs(void)
 #endif /* CONFIG_DEBUG_FS */
 }
 
+static void pnv_npu_ioda_fixup(void)
+{
+	bool enable_bypass;
+	struct pci_controller *hose, *tmp;
+	struct pnv_phb *phb;
+	struct pnv_ioda_pe *pe;
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+		if (phb->type != PNV_PHB_NPU)
+			continue;
+
+		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
+			enable_bypass = dma_get_mask(&pe->pdev->dev) ==
+				DMA_BIT_MASK(64);
+			pnv_pci_npu_dma_set_bypass(phb, pe, enable_bypass);
+		}
+	}
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
 	pnv_pci_ioda_setup_PEs();
@@ -2933,6 +2988,9 @@ static void pnv_pci_ioda_fixup(void)
 	eeh_init();
 	eeh_addr_cache_build();
 #endif
+
+	/* Link NPU IODA tables to their PCI devices. */
+	pnv_npu_ioda_fixup();
 }
 
 /*
@@ -3047,6 +3105,19 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
        .shutdown = pnv_pci_ioda_shutdown,
 };
 
+static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
+	.dma_dev_setup = pnv_pci_dma_dev_setup,
+#ifdef CONFIG_PCI_MSI
+	.setup_msi_irqs = pnv_setup_msi_irqs,
+	.teardown_msi_irqs = pnv_teardown_msi_irqs,
+#endif
+	.enable_device_hook = pnv_pci_enable_device_hook,
+	.window_alignment = pnv_pci_window_alignment,
+	.reset_secondary_bus = pnv_pci_reset_secondary_bus,
+	.dma_set_mask = pnv_npu_dma_set_mask,
+	.shutdown = pnv_pci_ioda_shutdown,
+};
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
@@ -3102,6 +3173,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->model = PNV_PHB_MODEL_P7IOC;
 	else if (of_device_is_compatible(np, "ibm,power8-pciex"))
 		phb->model = PNV_PHB_MODEL_PHB3;
+	else if (of_device_is_compatible(np, "ibm,power8-npu-pciex"))
+		phb->model = PNV_PHB_MODEL_NPU;
 	else
 		phb->model = PNV_PHB_MODEL_UNKNOWN;
 
@@ -3202,7 +3275,11 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 * the child P2P bridges) can form individual PE.
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
-	hose->controller_ops = pnv_pci_ioda_controller_ops;
+
+	if (phb->type == PNV_PHB_NPU)
+		hose->controller_ops = pnv_npu_ioda_controller_ops;
+	else
+		hose->controller_ops = pnv_pci_ioda_controller_ops;
 
 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
@@ -3237,6 +3314,11 @@ void __init pnv_pci_init_ioda2_phb(struct device_node *np)
 	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }
 
+void __init pnv_pci_init_npu_phb(struct device_node *np)
+{
+	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU);
+}
+
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
 {
 	struct device_node *phbn;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index f2dd772..ff4e42d 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -807,6 +807,10 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);
 
+	/* Look for NPU PHBs */
+	for_each_compatible_node(np, NULL, "ibm,ioda2-npu-phb")
+		pnv_pci_init_npu_phb(np);
+
 	/* Setup the linkage between OF nodes and PHBs */
 	pci_devs_phb_init();
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c8ff50e..31e0f7e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -7,6 +7,7 @@ enum pnv_phb_type {
 	PNV_PHB_P5IOC2	= 0,
 	PNV_PHB_IODA1	= 1,
 	PNV_PHB_IODA2	= 2,
+	PNV_PHB_NPU	= 3,
 };
 
 /* Precise PHB model for error management */
@@ -15,6 +16,7 @@ enum pnv_phb_model {
 	PNV_PHB_MODEL_P5IOC2,
 	PNV_PHB_MODEL_P7IOC,
 	PNV_PHB_MODEL_PHB3,
+	PNV_PHB_MODEL_NPU,
 };
 
 #define PNV_PCI_DIAG_BUF_SIZE	8192
@@ -229,6 +231,7 @@ extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
 extern void pnv_pci_init_ioda_hub(struct device_node *np);
 extern void pnv_pci_init_ioda2_phb(struct device_node *np);
+extern void pnv_pci_init_npu_phb(struct device_node *np);
 extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 					__be64 *startp, __be64 *endp, bool rm);
 extern void pnv_pci_reset_secondary_bus(struct pci_dev *dev);
@@ -238,4 +241,11 @@ extern void pnv_pci_dma_dev_setup(struct pci_dev *pdev);
 extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
 
+/* Nvlink functions */
+extern void pnv_pci_npu_setup_dma_pe(struct pnv_phb *npu,
+				struct pnv_ioda_pe *npu_pe);
+extern int pnv_pci_npu_dma_set_bypass(struct pnv_phb *npu,
+				struct pnv_ioda_pe *npu_pe, bool enabled);
+extern int pnv_npu_dma_set_mask(struct pci_dev *pdev, u64 dma_mask);
+
 #endif /* __POWERNV_PCI_H */
-- 
2.1.4


* [PATCH v2] platforms/powernv: Add support for Nvlink NPUs
  2015-10-28  5:00 ` [PATCH 2/2] platforms/powernv: Add support for Nvlink NPUs Alistair Popple
@ 2015-11-10  2:28   ` Alistair Popple
  2015-11-10  8:51     ` kbuild test robot
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Alistair Popple @ 2015-11-10  2:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: benh, gwshan, Alistair Popple

NV-Link is a high-speed interconnect that is used in conjunction with
a PCI-E connection to create an interface between the CPU and GPU that
provides very high data bandwidth. A PCI-E connection to a GPU is used
as the control path to initiate and report the status of large data
transfers sent via the NV-Link.

On IBM Power systems the NV-Link hardware interface is very similar
to the existing PHB3. This patch adds support for this new NPU PHB
type. DMA operations on the NPU are not supported, as this patch sets
the TCE translation tables to be the same as those of the related GPU
PCIe device for each Nvlink. Therefore all DMA operations are set up
and controlled via the PCIe device.

EEH is not presently supported for the NPU devices, although it may be
added in the future.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---

This patch includes the following changes from v1:
 - Minor variable name updates and code refactors suggested by Gavin
 - Fixes for an issue with TCE cache invalidation

 arch/powerpc/include/asm/pci.h            |   4 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/npu-dma.c  | 339 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 132 +++++++++++-
 arch/powerpc/platforms/powernv/pci.c      |   4 +
 arch/powerpc/platforms/powernv/pci.h      |  19 ++
 6 files changed, 488 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/npu-dma.c

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 3453bd8..6f8065a 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -149,4 +149,8 @@ extern void pcibios_setup_phb_io_space(struct pci_controller *hose);
 extern void pcibios_scan_phb(struct pci_controller *hose);

 #endif	/* __KERNEL__ */
+
+extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
+extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
+
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 1c8cdb6..ee774e8 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,7 +4,7 @@ obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o

 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
new file mode 100644
index 0000000..a1e5ba5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -0,0 +1,339 @@
+/*
+ * This file implements the DMA operations for Nvlink devices. The NPU
+ * devices all point to the same iommu table as the parent PCI device.
+ *
+ * Copyright Alistair Popple, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/export.h>
+#include <linux/pci.h>
+#include <linux/memblock.h>
+
+#include <asm/iommu.h>
+#include <asm/pnv-pci.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+static struct pci_dev *get_pci_dev(struct device_node *dn)
+{
+	return PCI_DN(dn)->pcidev;
+}
+
+/* Given an NPU device, get the associated PCI device. */
+struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev)
+{
+	struct device_node *dn;
+	struct pci_dev *gpdev;
+
+	/* Get associated PCI device */
+	dn = of_parse_phandle(npdev->dev.of_node, "ibm,gpu", 0);
+	if (!dn)
+		return NULL;
+
+	gpdev = get_pci_dev(dn);
+	of_node_put(dn);
+
+	return gpdev;
+}
+EXPORT_SYMBOL(pnv_pci_get_gpu_dev);
+
+/* Given the real PCI device get a linked NPU device. */
+struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index)
+{
+	struct device_node *dn;
+	struct pci_dev *npdev;
+
+	/* Get associated PCI device */
+	dn = of_parse_phandle(gpdev->dev.of_node, "ibm,npu", index);
+	if (!dn)
+		return NULL;
+
+	npdev = get_pci_dev(dn);
+	of_node_put(dn);
+
+	return npdev;
+}
+EXPORT_SYMBOL(pnv_pci_get_npu_dev);
+
+#define NPU_DMA_OP_UNSUPPORTED()					\
+	dev_err_once(dev, "%s operation unsupported for Nvlink devices\n", \
+		__func__)
+
+static void *dma_npu_alloc(struct device *dev, size_t size,
+			   dma_addr_t *dma_handle, gfp_t flag,
+			   struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return NULL;
+}
+
+static void dma_npu_free(struct device *dev, size_t size,
+			 void *vaddr, dma_addr_t dma_handle,
+			 struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+}
+
+static dma_addr_t dma_npu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction direction,
+				   struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static int dma_npu_map_sg(struct device *dev, struct scatterlist *sglist,
+			  int nelems, enum dma_data_direction direction,
+			  struct dma_attrs *attrs)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static int dma_npu_dma_supported(struct device *dev, u64 mask)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+static u64 dma_npu_get_required_mask(struct device *dev)
+{
+	NPU_DMA_OP_UNSUPPORTED();
+	return 0;
+}
+
+struct dma_map_ops dma_npu_ops = {
+	.map_page		= dma_npu_map_page,
+	.map_sg			= dma_npu_map_sg,
+	.alloc			= dma_npu_alloc,
+	.free			= dma_npu_free,
+	.dma_supported		= dma_npu_dma_supported,
+	.get_required_mask	= dma_npu_get_required_mask,
+};
+
+/* Returns the PE associated with the PCI device of the given
+ * NPU. Returns the linked pci device if gpdev != NULL.
+ */
+static struct pnv_ioda_pe *get_gpu_pci_dev_and_pe(struct pnv_ioda_pe *npe,
+						  struct pci_dev **gpdev)
+{
+	struct pnv_phb *phb;
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+	struct pnv_ioda_pe *pe;
+	struct pci_dn *pdn;
+
+	if (npe->flags & PNV_IODA_PE_PEER) {
+		pe = npe->peers[0];
+		pdev = pe->pdev;
+	} else {
+		pdev = pnv_pci_get_gpu_dev(npe->pdev);
+		if (!pdev)
+			return NULL;
+
+		pdn = pci_get_pdn(pdev);
+		if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
+			return NULL;
+
+		hose = pci_bus_to_host(pdev->bus);
+		phb = hose->private_data;
+		pe = &phb->ioda.pe_array[pdn->pe_number];
+	}
+
+	if (gpdev)
+		*gpdev = pdev;
+
+	return pe;
+}
+
+void pnv_npu_tce_invalidate_entire(struct pnv_ioda_pe *npe)
+{
+	struct pnv_phb *phb = npe->phb;
+
+	/* We can only invalidate the whole cache on NPU */
+	unsigned long val = (0x8ull << 60);
+
+	if (phb->type != PNV_PHB_NPU ||
+	    !phb->ioda.tce_inval_reg ||
+	    !(npe->flags & PNV_IODA_PE_DEV))
+		return;
+
+	mb(); /* Ensure above stores are visible */
+	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
+}
+
+void pnv_npu_tce_invalidate(struct pnv_ioda_pe *npe,
+				struct iommu_table *tbl,
+				unsigned long index,
+				unsigned long npages,
+				bool rm)
+{
+	struct pnv_phb *phb = npe->phb;
+
+	/* We can only invalidate the whole cache on NPU */
+	unsigned long val = (0x8ull << 60);
+
+	if (phb->type != PNV_PHB_NPU ||
+	    !phb->ioda.tce_inval_reg ||
+	    !(npe->flags & PNV_IODA_PE_DEV))
+		return;
+
+	mb();
+	if (rm)
+		__asm__ __volatile__("stdcix %0,0,%1" : :
+				"r"(cpu_to_be64(val)),
+				"r" (phb->ioda.tce_inval_reg_phys) :
+				"memory");
+	else
+		__raw_writeq(cpu_to_be64(val),
+			phb->ioda.tce_inval_reg);
+}
+
+void pnv_npu_init_dma_pe(struct pnv_ioda_pe *npe)
+{
+	struct pnv_ioda_pe *gpe;
+	struct pci_dev *gpdev;
+	int i, avail = -1;
+
+	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
+		return;
+
+	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
+	if (!gpe)
+		return;
+
+	/* Nothing to do if the PEs are already connected */
+	for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
+		if (avail < 0 && !gpe->peers[i])
+			avail = i;
+
+		if (gpe->peers[i] == npe)
+			return;
+	}
+
+	if (WARN_ON(avail < 0))
+		return;
+
+	gpe->peers[avail] = npe;
+	gpe->flags |= PNV_IODA_PE_PEER;
+
+	/* We assume that the NPU devices only have a single peer PE
+	 * (the GPU PCIe device PE). */
+	npe->peers[0] = gpe;
+	npe->flags |= PNV_IODA_PE_PEER;
+}
+
+/* For the NPU we want to point the TCE table at the same table as the
+ * real PCI device.
+ */
+static void pnv_npu_disable_bypass(struct pnv_ioda_pe *npe)
+{
+	struct pnv_phb *phb = npe->phb;
+	struct pci_dev *gpdev;
+	struct pnv_ioda_pe *gpe;
+	void *addr;
+	unsigned int size;
+	int64_t rc;
+
+	/* Find the associated PCI devices and get the dma window
+	 * information from there.
+	 */
+	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
+		return;
+
+	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
+	if (!gpe)
+		return;
+
+	addr = (void *)gpe->table_group.tables[0]->it_base;
+	size = gpe->table_group.tables[0]->it_size << 3;
+	rc = opal_pci_map_pe_dma_window(phb->opal_id, npe->pe_number,
+					npe->pe_number, 1, __pa(addr),
+					size, 0x1000);
+	if (rc != OPAL_SUCCESS)
+		pr_warn("%s: Error %lld setting DMA window on PHB#%d-PE#%d\n",
+			__func__, rc, phb->hose->global_number, npe->pe_number);
+
+	/* We don't initialise npu_pe->tce32_table as we always use
+	 * dma_npu_ops which are nops.
+	 */
+	set_dma_ops(&npe->pdev->dev, &dma_npu_ops);
+}
+
+/* Enable/disable bypass mode on the NPU. The NPU only supports one
+ * window per brick, so bypass needs to be explicitly enabled or
+ * disabled. Unlike for a PHB3, bypass and non-bypass modes can't be
+ * active at the same time.
+ */
+int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe, bool enabled)
+{
+	struct pnv_phb *phb = npe->phb;
+	int64_t rc = 0;
+
+	if (phb->type != PNV_PHB_NPU || !npe->pdev)
+		return -EINVAL;
+
+	if (enabled) {
+		/* Enable the bypass window */
+		phys_addr_t top = memblock_end_of_DRAM();
+
+		npe->tce_bypass_base = 0;
+		top = roundup_pow_of_two(top);
+		dev_info(&npe->pdev->dev, "Enabling bypass for PE %d\n",
+			 npe->pe_number);
+		rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
+					npe->pe_number, npe->pe_number,
+					npe->tce_bypass_base, top);
+	} else {
+		/* Disable the bypass window by replacing it with the
+		 * TCE32 window.
+		 */
+		pnv_npu_disable_bypass(npe);
+	}
+
+	return rc;
+}
+
+int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask)
+{
+	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pci_dn *pdn = pci_get_pdn(npdev);
+	struct pnv_ioda_pe *npe, *gpe;
+	struct pci_dev *gpdev;
+	uint64_t top;
+	bool bypass = false;
+
+	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
+		return -ENXIO;
+
+	/* We only do bypass if it's enabled on the linked device */
+	npe = &phb->ioda.pe_array[pdn->pe_number];
+	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
+	if (!gpe)
+		return -ENODEV;
+
+	if (gpe->tce_bypass_enabled) {
+		top = gpe->tce_bypass_base + memblock_end_of_DRAM() - 1;
+		bypass = (dma_mask >= top);
+	}
+
+	if (bypass)
+		dev_info(&npdev->dev, "Using 64-bit DMA iommu bypass\n");
+	else
+		dev_info(&npdev->dev, "Using 32-bit DMA via iommu\n");
+
+	pnv_npu_dma_set_bypass(npe, bypass);
+	*npdev->dev.dma_mask = dma_mask;
+
+	return 0;
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 42b4bb2..8bed20d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -781,7 +781,8 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 	}

 	/* Configure PELTV */
-	pnv_ioda_set_peltv(phb, pe, true);
+	if (phb->type != PNV_PHB_NPU)
+		pnv_ioda_set_peltv(phb, pe, true);

 	/* Setup reverse map */
 	for (rid = pe->rid; rid < rid_end; rid++)
@@ -924,7 +925,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 }
 #endif /* CONFIG_PCI_IOV */

-#if 0
 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
@@ -941,11 +941,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	if (pdn->pe_number != IODA_INVALID_PE)
 		return NULL;

-	/* PE#0 has been pre-set */
-	if (dev->bus->number == 0)
-		pe_num = 0;
-	else
-		pe_num = pnv_ioda_alloc_pe(phb);
+	pe_num = pnv_ioda_alloc_pe(phb);
 	if (pe_num == IODA_INVALID_PE) {
 		pr_warning("%s: Not enough PE# available, disabling device\n",
 			   pci_name(dev));
@@ -963,6 +959,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 	pci_dev_get(dev);
 	pdn->pcidev = dev;
 	pdn->pe_number = pe_num;
+	pe->flags = PNV_IODA_PE_DEV;
 	pe->pdev = dev;
 	pe->pbus = NULL;
 	pe->tce32_seg = -1;
@@ -993,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)

 	return pe;
 }
-#endif /* Useful for SRIOV case */

 static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
 {
@@ -1084,6 +1080,18 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 	pnv_ioda_link_pe_by_weight(phb, pe);
 }

+static void pnv_ioda_setup_dev_PEs(struct pci_bus *bus)
+{
+	struct pci_bus *child;
+	struct pci_dev *pdev;
+
+	list_for_each_entry(pdev, &bus->devices, bus_list)
+		pnv_ioda_setup_dev_PE(pdev);
+
+	list_for_each_entry(child, &bus->children, node)
+		pnv_ioda_setup_dev_PEs(child);
+}
+
 static void pnv_ioda_setup_PEs(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -1120,7 +1128,15 @@ static void pnv_pci_ioda_setup_PEs(void)
 		if (phb->reserve_m64_pe)
 			phb->reserve_m64_pe(hose->bus, NULL, true);

-		pnv_ioda_setup_PEs(hose->bus);
+		/*
+		 * On NPU PHB, we expect separate PEs for individual PCI
+		 * functions. PCI bus dependent PEs are required for the
+		 * remaining types of PHBs.
+		 */
+		if (phb->type == PNV_PHB_NPU)
+			pnv_ioda_setup_dev_PEs(hose->bus);
+		else
+			pnv_ioda_setup_PEs(hose->bus);
 	}
 }

@@ -1579,6 +1595,8 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 	struct pnv_ioda_pe *pe;
 	uint64_t top;
 	bool bypass = false;
+	struct pci_dev *linked_npu_dev;
+	int i;

 	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
 		return -ENODEV;;
@@ -1597,6 +1615,15 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 		set_dma_ops(&pdev->dev, &dma_iommu_ops);
 	}
 	*pdev->dev.dma_mask = dma_mask;
+
+	/* Update peer npu devices */
+	if (pe->flags & PNV_IODA_PE_PEER)
+		for (i = 0; pe->peers[i]; i++) {
+			linked_npu_dev = pe->peers[i]->pdev;
+			if (dma_get_mask(&linked_npu_dev->dev) != dma_mask)
+				dma_set_mask(&linked_npu_dev->dev, dma_mask);
+		}
+
 	return 0;
 }

@@ -1741,12 +1768,23 @@ static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
 	/* 01xb - invalidate TCEs that match the specified PE# */
 	unsigned long val = (0x4ull << 60) | (pe->pe_number & 0xFF);
 	struct pnv_phb *phb = pe->phb;
+	struct pnv_ioda_pe *npe;
+	int i;

 	if (!phb->ioda.tce_inval_reg)
 		return;

 	mb(); /* Ensure above stores are visible */
 	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
+
+	if (pe->flags & PNV_IODA_PE_PEER)
+		for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
+			npe = pe->peers[i];
+			if (!npe || npe->phb->type != PNV_PHB_NPU)
+				continue;
+
+			pnv_npu_tce_invalidate_entire(npe);
+		}
 }

 static void pnv_pci_ioda2_do_tce_invalidate(unsigned pe_number, bool rm,
@@ -1781,15 +1819,28 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
 	struct iommu_table_group_link *tgl;

 	list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
+		struct pnv_ioda_pe *npe;
 		struct pnv_ioda_pe *pe = container_of(tgl->table_group,
 				struct pnv_ioda_pe, table_group);
 		__be64 __iomem *invalidate = rm ?
 			(__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
 			pe->phb->ioda.tce_inval_reg;
+		int i;

 		pnv_pci_ioda2_do_tce_invalidate(pe->pe_number, rm,
 			invalidate, tbl->it_page_shift,
 			index, npages);
+
+		if (pe->flags & PNV_IODA_PE_PEER)
+			/* Invalidate PEs using the same TCE table */
+			for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
+				npe = pe->peers[i];
+				if (!npe || npe->phb->type != PNV_PHB_NPU)
+					continue;
+
+				pnv_npu_tce_invalidate(npe, tbl, index,
+							npages, rm);
+			}
 	}
 }

@@ -2437,10 +2488,16 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
 				pe->dma_weight, segs);
 			pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
-		} else {
+		} else if (phb->type == PNV_PHB_IODA2) {
 			pe_info(pe, "Assign DMA32 space\n");
 			segs = 0;
 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
+		} else if (phb->type == PNV_PHB_NPU) {
+			/* We initialise the DMA space for an NPU PHB
+			 * after setup of the PHB is complete as we
+			 * point the NPU TVT to the same location
+			 * as the PHB3 TVT.
+			 */
 		}

 		remaining -= segs;
@@ -2882,6 +2939,11 @@ static void pnv_pci_ioda_setup_seg(void)

 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
 		phb = hose->private_data;
+
+		/* NPU PHB does not support IO or MMIO segmentation */
+		if (phb->type == PNV_PHB_NPU)
+			continue;
+
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
 			pnv_ioda_setup_pe_seg(hose, pe);
 		}
@@ -2921,6 +2983,27 @@ static void pnv_pci_ioda_create_dbgfs(void)
 #endif /* CONFIG_DEBUG_FS */
 }

+static void pnv_npu_ioda_fixup(void)
+{
+	bool enable_bypass;
+	struct pci_controller *hose, *tmp;
+	struct pnv_phb *phb;
+	struct pnv_ioda_pe *pe;
+
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+		phb = hose->private_data;
+		if (phb->type != PNV_PHB_NPU)
+			continue;
+
+		list_for_each_entry(pe, &phb->ioda.pe_dma_list, dma_link) {
+			enable_bypass = dma_get_mask(&pe->pdev->dev) ==
+				DMA_BIT_MASK(64);
+			pnv_npu_init_dma_pe(pe);
+			pnv_npu_dma_set_bypass(pe, enable_bypass);
+		}
+	}
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
 	pnv_pci_ioda_setup_PEs();
@@ -2933,6 +3016,9 @@ static void pnv_pci_ioda_fixup(void)
 	eeh_init();
 	eeh_addr_cache_build();
 #endif
+
+	/* Link NPU IODA tables to their PCI devices. */
+	pnv_npu_ioda_fixup();
 }

 /*
@@ -3047,6 +3133,19 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
        .shutdown = pnv_pci_ioda_shutdown,
 };

+static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
+	.dma_dev_setup = pnv_pci_dma_dev_setup,
+#ifdef CONFIG_PCI_MSI
+	.setup_msi_irqs = pnv_setup_msi_irqs,
+	.teardown_msi_irqs = pnv_teardown_msi_irqs,
+#endif
+	.enable_device_hook = pnv_pci_enable_device_hook,
+	.window_alignment = pnv_pci_window_alignment,
+	.reset_secondary_bus = pnv_pci_reset_secondary_bus,
+	.dma_set_mask = pnv_npu_dma_set_mask,
+	.shutdown = pnv_pci_ioda_shutdown,
+};
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
@@ -3102,6 +3201,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		phb->model = PNV_PHB_MODEL_P7IOC;
 	else if (of_device_is_compatible(np, "ibm,power8-pciex"))
 		phb->model = PNV_PHB_MODEL_PHB3;
+	else if (of_device_is_compatible(np, "ibm,power8-npu-pciex"))
+		phb->model = PNV_PHB_MODEL_NPU;
 	else
 		phb->model = PNV_PHB_MODEL_UNKNOWN;

@@ -3202,7 +3303,11 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 * the child P2P bridges) can form individual PE.
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
-	hose->controller_ops = pnv_pci_ioda_controller_ops;
+
+	if (phb->type == PNV_PHB_NPU)
+		hose->controller_ops = pnv_npu_ioda_controller_ops;
+	else
+		hose->controller_ops = pnv_pci_ioda_controller_ops;

 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
@@ -3237,6 +3342,11 @@ void __init pnv_pci_init_ioda2_phb(struct device_node *np)
 	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }

+void __init pnv_pci_init_npu_phb(struct device_node *np)
+{
+	pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU);
+}
+
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
 {
 	struct device_node *phbn;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index f2dd772..ff4e42d 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -807,6 +807,10 @@ void __init pnv_pci_init(void)
 	for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
 		pnv_pci_init_ioda2_phb(np);

+	/* Look for NPU PHBs */
+	for_each_compatible_node(np, NULL, "ibm,ioda2-npu-phb")
+		pnv_pci_init_npu_phb(np);
+
 	/* Setup the linkage between OF nodes and PHBs */
 	pci_devs_phb_init();

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index c8ff50e..7f56313 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -7,6 +7,7 @@ enum pnv_phb_type {
 	PNV_PHB_P5IOC2	= 0,
 	PNV_PHB_IODA1	= 1,
 	PNV_PHB_IODA2	= 2,
+	PNV_PHB_NPU	= 3,
 };

 /* Precise PHB model for error management */
@@ -15,6 +16,7 @@ enum pnv_phb_model {
 	PNV_PHB_MODEL_P5IOC2,
 	PNV_PHB_MODEL_P7IOC,
 	PNV_PHB_MODEL_PHB3,
+	PNV_PHB_MODEL_NPU,
 };

 #define PNV_PCI_DIAG_BUF_SIZE	8192
@@ -24,6 +26,7 @@ enum pnv_phb_model {
 #define PNV_IODA_PE_MASTER	(1 << 3)	/* Master PE in compound case	*/
 #define PNV_IODA_PE_SLAVE	(1 << 4)	/* Slave PE in compound case	*/
 #define PNV_IODA_PE_VF		(1 << 5)	/* PE for one VF 		*/
+#define PNV_IODA_PE_PEER	(1 << 6)	/* PE has peers			*/

 /* Data associated with a PE, including IOMMU tracking etc.. */
 struct pnv_phb;
@@ -31,6 +34,9 @@ struct pnv_ioda_pe {
 	unsigned long		flags;
 	struct pnv_phb		*phb;

+#define PNV_IODA_MAX_PEER_PES	8
+	struct pnv_ioda_pe	*peers[PNV_IODA_MAX_PEER_PES];
+
 	/* A PE can be associated with a single device or an
 	 * entire bus (& children). In the former case, pdev
 	 * is populated, in the later case, pbus is.
@@ -229,6 +235,7 @@ extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
 extern void pnv_pci_init_ioda_hub(struct device_node *np);
 extern void pnv_pci_init_ioda2_phb(struct device_node *np);
+extern void pnv_pci_init_npu_phb(struct device_node *np);
 extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 					__be64 *startp, __be64 *endp, bool rm);
 extern void pnv_pci_reset_secondary_bus(struct pci_dev *dev);
@@ -238,4 +245,16 @@ extern void pnv_pci_dma_dev_setup(struct pci_dev *pdev);
 extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);

+/* Nvlink functions */
+extern void pnv_npu_tce_invalidate_entire(struct pnv_ioda_pe *npe);
+extern void pnv_npu_tce_invalidate(struct pnv_ioda_pe *npe,
+				       struct iommu_table *tbl,
+				       unsigned long index,
+				       unsigned long npages,
+				       bool rm);
+extern void pnv_npu_init_dma_pe(struct pnv_ioda_pe *npe);
+extern void pnv_npu_setup_dma_pe(struct pnv_ioda_pe *npe);
+extern int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe, bool enabled);
+extern int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask);
+
 #endif /* __POWERNV_PCI_H */
--
2.1.4


* Re: [PATCH v2] platforms/powernv: Add support for Nvlink NPUs
  2015-11-10  2:28   ` [PATCH v2] " Alistair Popple
@ 2015-11-10  8:51     ` kbuild test robot
  2015-11-10 10:30       ` Michael Ellerman
  2015-11-10 10:43     ` kbuild test robot
  2015-12-14  9:26     ` [v2] " Michael Ellerman
  2 siblings, 1 reply; 9+ messages in thread
From: kbuild test robot @ 2015-11-10  8:51 UTC (permalink / raw)
  To: Alistair Popple; +Cc: kbuild-all, linuxppc-dev, gwshan, Alistair Popple

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

Hi Alistair,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.3 next-20151110]

url:    https://github.com/0day-ci/linux/commits/Alistair-Popple/platforms-powernv-Add-support-for-Nvlink-NPUs/20151110-103410
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/powernv/pci-ioda.c: In function 'pnv_ioda_setup_dev_PE':
>> arch/powerpc/platforms/powernv/pci-ioda.c:960:5: error: 'struct pci_dn' has no member named 'pcidev'
     pdn->pcidev = dev;
        ^
--
   arch/powerpc/platforms/powernv/npu-dma.c: In function 'get_pci_dev':
>> arch/powerpc/platforms/powernv/npu-dma.c:27:19: error: 'struct pci_dn' has no member named 'pcidev'
     return PCI_DN(dn)->pcidev;
                      ^
   arch/powerpc/platforms/powernv/npu-dma.c:28:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +960 arch/powerpc/platforms/powernv/pci-ioda.c

184cd4a3 Benjamin Herrenschmidt 2011-11-15  954  	 * once we actually start removing things (Hotplug, SR-IOV, ...)
184cd4a3 Benjamin Herrenschmidt 2011-11-15  955  	 *
184cd4a3 Benjamin Herrenschmidt 2011-11-15  956  	 * At some point we want to remove the PDN completely anyways
184cd4a3 Benjamin Herrenschmidt 2011-11-15  957  	 */
184cd4a3 Benjamin Herrenschmidt 2011-11-15  958  	pe = &phb->ioda.pe_array[pe_num];
184cd4a3 Benjamin Herrenschmidt 2011-11-15  959  	pci_dev_get(dev);
184cd4a3 Benjamin Herrenschmidt 2011-11-15 @960  	pdn->pcidev = dev;
184cd4a3 Benjamin Herrenschmidt 2011-11-15  961  	pdn->pe_number = pe_num;
644b77a9 Alistair Popple        2015-11-10  962  	pe->flags = PNV_IODA_PE_DEV;
184cd4a3 Benjamin Herrenschmidt 2011-11-15  963  	pe->pdev = dev;

:::::: The code at line 960 was first introduced by commit
:::::: 184cd4a3b962a4769889615430eaf40076b97969 powerpc/powernv: PCI support for p7IOC under OPAL v2

:::::: TO: Benjamin Herrenschmidt <benh@kernel.crashing.org>
:::::: CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21501 bytes --]


* Re: [PATCH v2] platforms/powernv: Add support for Nvlink NPUs
  2015-11-10  8:51     ` kbuild test robot
@ 2015-11-10 10:30       ` Michael Ellerman
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2015-11-10 10:30 UTC (permalink / raw)
  To: kbuild test robot, Alistair Popple; +Cc: linuxppc-dev, kbuild-all, gwshan

On Tue, 2015-11-10 at 16:51 +0800, kbuild test robot wrote:

> Hi Alistair,
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.3 next-20151110]

Hi Al,

In future if you can repost the full series as a v2 that would be my
preference. And it will also not confuse the bot like this :)

cheers


* Re: [PATCH v2] platforms/powernv: Add support for Nvlink NPUs
  2015-11-10  2:28   ` [PATCH v2] " Alistair Popple
  2015-11-10  8:51     ` kbuild test robot
@ 2015-11-10 10:43     ` kbuild test robot
  2015-12-14  9:26     ` [v2] " Michael Ellerman
  2 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2015-11-10 10:43 UTC (permalink / raw)
  To: Alistair Popple; +Cc: kbuild-all, linuxppc-dev, gwshan, Alistair Popple

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

Hi Alistair,

[auto build test WARNING on powerpc/next]
[also build test WARNING on v4.3 next-20151110]

url:    https://github.com/0day-ci/linux/commits/Alistair-Popple/platforms-powernv-Add-support-for-Nvlink-NPUs/20151110-103410
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   arch/powerpc/platforms/powernv/npu-dma.c: In function 'get_pci_dev':
   arch/powerpc/platforms/powernv/npu-dma.c:27:19: error: 'struct pci_dn' has no member named 'pcidev'
     return PCI_DN(dn)->pcidev;
                      ^
>> arch/powerpc/platforms/powernv/npu-dma.c:28:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +28 arch/powerpc/platforms/powernv/npu-dma.c

    21	
    22	#include "powernv.h"
    23	#include "pci.h"
    24	
    25	static struct pci_dev *get_pci_dev(struct device_node *dn)
    26	{
  > 27		return PCI_DN(dn)->pcidev;
  > 28	}
    29	
    30	/* Given a NPU device get the associated PCI device. */
    31	struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 46431 bytes --]


* Re: [v2] platforms/powernv: Add support for Nvlink NPUs
  2015-11-10  2:28   ` [PATCH v2] " Alistair Popple
  2015-11-10  8:51     ` kbuild test robot
  2015-11-10 10:43     ` kbuild test robot
@ 2015-12-14  9:26     ` Michael Ellerman
  2015-12-15  2:46       ` Alistair Popple
  2 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2015-12-14  9:26 UTC (permalink / raw)
  To: Alistair Popple, linuxppc-dev; +Cc: gwshan, Alistair Popple

Hi Alistair,

Just a few nitty things ...

On Tue, 2015-11-10 at 02:28:11 UTC, Alistair Popple wrote:
> NV-Link is a high speed interconnect that is used in conjunction with

Is it NV-Link or NVLink?

> a PCI-E connection to create an interface between CPU and GPU that
> provides very high data bandwidth. A PCI-E connection to a GPU is used
> as the control path to initiate and report status of large data
> transfers sent via the NV-Link.
> 
> On IBM Power systems the NV-Link hardware interface is very similar to
> the existing PHB3. This patch adds support for this new NPU PHB

NPU ?

> type. DMA operations on the NPU are not supported as this patch sets
> the TCE translation tables to be the same as the related GPU PCIe
> device for each Nvlink. Therefore all DMA operations are setup and
> controlled via the PCIe device.
> 
> EEH is not presently supported for the NPU devices, although it may be
> added in future.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
> 
> This patch includes the following changes from v1:
>  - Minor variable name updates and code refactors suggested by Gavin
>  - Fixes for an issue with TCE cache invalidation
> 
>  arch/powerpc/include/asm/pci.h            |   4 +
>  arch/powerpc/platforms/powernv/Makefile   |   2 +-
>  arch/powerpc/platforms/powernv/npu-dma.c  | 339 ++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 132 +++++++++++-
>  arch/powerpc/platforms/powernv/pci.c      |   4 +
>  arch/powerpc/platforms/powernv/pci.h      |  19 ++
>  6 files changed, 488 insertions(+), 12 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/npu-dma.c
> 
> --
> 2.1.4
> 
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 3453bd8..6f8065a 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -149,4 +149,8 @@ extern void pcibios_setup_phb_io_space(struct pci_controller *hose);
>  extern void pcibios_scan_phb(struct pci_controller *hose);
> 
>  #endif	/* __KERNEL__ */
> +
> +extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
> +extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
> +
>  #endif /* __ASM_POWERPC_PCI_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 1c8cdb6..ee774e8 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -4,7 +4,7 @@ obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> 
>  obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
> -obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
> +obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
>  obj-$(CONFIG_EEH)	+= eeh-powernv.o
>  obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
>  obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> new file mode 100644
> index 0000000..a1e5ba5
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -0,0 +1,339 @@
> +/*
> + * This file implements the DMA operations for Nvlink devices. The NPU
> + * devices all point to the same iommu table as the parent PCI device.
> + *
> + * Copyright Alistair Popple, IBM Corporation 2015.
> + *
> + * This program is free software; you can redistribute  it and/or modify it
> + * under  the terms of  the GNU General  Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.

Can you drop the "any later" part? That's not generally true, see COPYING.

eg:

+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.

> + */
> +
> +#include <linux/export.h>
> +#include <linux/pci.h>
> +#include <linux/memblock.h>
> +
> +#include <asm/iommu.h>
> +#include <asm/pnv-pci.h>
> +#include <asm/msi_bitmap.h>
> +#include <asm/opal.h>
> +
> +#include "powernv.h"
> +#include "pci.h"
> +
> +static struct pci_dev *get_pci_dev(struct device_node *dn)
> +{
> +	return PCI_DN(dn)->pcidev;
> +}
> +
> +/* Given a NPU device get the associated PCI device. */
> +struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev)
> +{
> +	struct device_node *dn;
> +	struct pci_dev *gpdev;
> +
> +	/* Get associated PCI device */
> +	dn = of_parse_phandle(npdev->dev.of_node, "ibm,gpu", 0);
> +	if (!dn)
> +		return NULL;
> +
> +	gpdev = get_pci_dev(dn);
> +	of_node_put(dn);
> +
> +	return gpdev;
> +}
> +EXPORT_SYMBOL(pnv_pci_get_gpu_dev);
> +
> +/* Given the real PCI device get a linked NPU device. */
> +struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index)
> +{
> +	struct device_node *dn;
> +	struct pci_dev *npdev;
> +
> +	/* Get associated PCI device */
> +	dn = of_parse_phandle(gpdev->dev.of_node, "ibm,npu", index);
> +	if (!dn)
> +		return NULL;
> +
> +	npdev = get_pci_dev(dn);
> +	of_node_put(dn);
> +
> +	return npdev;
> +}
> +EXPORT_SYMBOL(pnv_pci_get_npu_dev);
> +
> +#define NPU_DMA_OP_UNSUPPORTED()					\
> +	dev_err_once(dev, "%s operation unsupported for Nvlink devices\n", \
> +		__func__)
> +
> +static void *dma_npu_alloc(struct device *dev, size_t size,
> +			   dma_addr_t *dma_handle, gfp_t flag,
> +			   struct dma_attrs *attrs)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +	return NULL;
> +}
> +
> +static void dma_npu_free(struct device *dev, size_t size,
> +			 void *vaddr, dma_addr_t dma_handle,
> +			 struct dma_attrs *attrs)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +}
> +
> +static dma_addr_t dma_npu_map_page(struct device *dev, struct page *page,
> +				   unsigned long offset, size_t size,
> +				   enum dma_data_direction direction,
> +				   struct dma_attrs *attrs)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +	return 0;
> +}
> +
> +static int dma_npu_map_sg(struct device *dev, struct scatterlist *sglist,
> +			  int nelems, enum dma_data_direction direction,
> +			  struct dma_attrs *attrs)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +	return 0;
> +}
> +
> +static int dma_npu_dma_supported(struct device *dev, u64 mask)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +	return 0;
> +}
> +
> +static u64 dma_npu_get_required_mask(struct device *dev)
> +{
> +	NPU_DMA_OP_UNSUPPORTED();
> +	return 0;
> +}
> +
> +struct dma_map_ops dma_npu_ops = {
> +	.map_page		= dma_npu_map_page,
> +	.map_sg			= dma_npu_map_sg,
> +	.alloc			= dma_npu_alloc,
> +	.free			= dma_npu_free,
> +	.dma_supported		= dma_npu_dma_supported,
> +	.get_required_mask	= dma_npu_get_required_mask,
> +};
> +
> +/* Returns the PE associated with the PCI device of the given
> + * NPU. Returns the linked pci device if pci_dev != NULL.
> + */

Can you reformat all your block comments the right way:

> +/*
> + * Returns the PE associated with the PCI device of the given
> + * NPU. Returns the linked pci device if pci_dev != NULL.
> + */

> +static struct pnv_ioda_pe *get_gpu_pci_dev_and_pe(struct pnv_ioda_pe *npe,
> +						  struct pci_dev **gpdev)
> +{
> +	struct pnv_phb *phb;
> +	struct pci_controller *hose;

I thought we were trying to use phb rather than hose these days, but dunno.

> +	struct pci_dev *pdev;
> +	struct pnv_ioda_pe *pe;
> +	struct pci_dn *pdn;
> +
> +	if (npe->flags & PNV_IODA_PE_PEER) {
> +		pe = npe->peers[0];
> +		pdev = pe->pdev;
> +	} else {
> +		pdev = pnv_pci_get_gpu_dev(npe->pdev);
> +		if (!pdev)
> +			return NULL;
> +
> +		pdn = pci_get_pdn(pdev);
> +		if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> +			return NULL;
> +
> +		hose = pci_bus_to_host(pdev->bus);
> +		phb = hose->private_data;
> +		pe = &phb->ioda.pe_array[pdn->pe_number];
> +	}
> +
> +	if (gpdev)
> +		*gpdev = pdev;
> +
> +	return pe;
> +}
> +
> +void pnv_npu_tce_invalidate_entire(struct pnv_ioda_pe *npe)
> +{
> +	struct pnv_phb *phb = npe->phb;
> +
> +	/* We can only invalidate the whole cache on NPU */
> +	unsigned long val = (0x8ull << 60);

Are these masks and shifts documented anywhere?

> +	if (phb->type != PNV_PHB_NPU ||
> +	    !phb->ioda.tce_inval_reg ||
> +	    !(npe->flags & PNV_IODA_PE_DEV))
> +		return;

Should any of those ever happen, or could this be a WARN_ON() ?

> +	mb(); /* Ensure above stores are visible */

What stores?

> +	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
> +}
> +
> +void pnv_npu_tce_invalidate(struct pnv_ioda_pe *npe,
> +				struct iommu_table *tbl,
> +				unsigned long index,
> +				unsigned long npages,
> +				bool rm)
> +{
> +	struct pnv_phb *phb = npe->phb;
> +
> +	/* We can only invalidate the whole cache on NPU */
> +	unsigned long val = (0x8ull << 60);
> +
> +	if (phb->type != PNV_PHB_NPU ||
> +	    !phb->ioda.tce_inval_reg ||
> +	    !(npe->flags & PNV_IODA_PE_DEV))
> +		return;

Ditto.

> +
> +	mb();

What's the mb doing?

> +	if (rm)
> +		__asm__ __volatile__("stdcix %0,0,%1" : :
> +				"r"(cpu_to_be64(val)),
> +				"r" (phb->ioda.tce_inval_reg_phys) :
> +				"memory");

I see this in pci-ioda.c as __raw_rm_writeq(). Can you put that in a shared
header and use it?

> +	else
> +		__raw_writeq(cpu_to_be64(val),
> +			phb->ioda.tce_inval_reg);
> +}
> +
> +void pnv_npu_init_dma_pe(struct pnv_ioda_pe *npe)
> +{
> +	struct pnv_ioda_pe *gpe;
> +	struct pci_dev *gpdev;
> +	int i, avail = -1;
> +
> +	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
> +		return;
> +
> +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> +	if (!gpe)
> +		return;
> +
> +	/* Nothing to do if the PEs are already connected */

Should that comment be on the if below instead?

> +	for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
> +		if (avail < 0 && !gpe->peers[i])
> +			avail = i;
> +
> +		if (gpe->peers[i] == npe)
> +			return;
> +	}
> +
> +	if (WARN_ON(avail < 0))
> +		return;
> +
> +	gpe->peers[avail] = npe;
> +	gpe->flags |= PNV_IODA_PE_PEER;

I don't see any locking around peers, I assume we're only called single
threaded? What about hot plug?

> +
> +	/* We assume that the NPU devices only have a single peer PE
> +	 * (the GPU PCIe device PE). */
> +	npe->peers[0] = gpe;

How did we ensure avail wasn't 0 ?

> +	npe->flags |= PNV_IODA_PE_PEER;
> +}
> +
> +/* For the NPU we want to point the TCE table at the same table as the
> + * real PCI device.
> + */
> +static void pnv_npu_disable_bypass(struct pnv_ioda_pe *npe)
> +{
> +	struct pnv_phb *phb = npe->phb;
> +	struct pci_dev *gpdev;
> +	struct pnv_ioda_pe *gpe;
> +	void *addr;
> +	unsigned int size;
> +	int64_t rc;
> +
> +	/* Find the associated PCI devices and get the dma window
> +	 * information from there.
> +	 */
> +	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
> +		return;
> +
> +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> +	if (!gpe)
> +		return;
> +
> +	addr = (void *)gpe->table_group.tables[0]->it_base;
> +	size = gpe->table_group.tables[0]->it_size << 3;
> +	rc = opal_pci_map_pe_dma_window(phb->opal_id, npe->pe_number,
> +					npe->pe_number, 1, __pa(addr),
> +					size, 0x1000);
> +	if (rc != OPAL_SUCCESS)
> +		pr_warn("%s: Error %lld setting DMA window on PHB#%d-PE#%d\n",
> +			__func__, rc, phb->hose->global_number, npe->pe_number);
> +
> +	/* We don't initialise npu_pe->tce32_table as we always use
> +	 * dma_npu_ops which are nops.
> +	 */
> +	set_dma_ops(&npe->pdev->dev, &dma_npu_ops);
> +}
> +
> +/* Enable/disable bypass mode on the NPU. The NPU only supports one
> + * window per brick, so bypass needs to be explicitly enabled or

brick?

> + * disabled. Unlike for a PHB3 bypass and non-bypass modes can't be
> + * active at the same time.
> + */
> +int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe, bool enabled)

enabled should be "enable", you're asking for it to be enabled, it's not
already enabled.

> +{
> +	struct pnv_phb *phb = npe->phb;
> +	int64_t rc = 0;
> +
> +	if (phb->type != PNV_PHB_NPU || !npe->pdev)
> +		return -EINVAL;
> +
> +	if (enabled) {
> +		/* Enable the bypass window */
> +		phys_addr_t top = memblock_end_of_DRAM();
> +
> +		npe->tce_bypass_base = 0;
> +		top = roundup_pow_of_two(top);
> +		dev_info(&npe->pdev->dev, "Enabling bypass for PE %d\n",
> +			 npe->pe_number);
> +		rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
> +					npe->pe_number, npe->pe_number,
> +					npe->tce_bypass_base, top);
> +	} else {
> +		/* Disable the bypass window by replacing it with the
> +		 * TCE32 window.
> +		 */
> +		pnv_npu_disable_bypass(npe);
> +	}
> +
> +	return rc;
> +}
> +
> +int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct pci_dn *pdn = pci_get_pdn(npdev);
> +	struct pnv_ioda_pe *npe, *gpe;
> +	struct pci_dev *gpdev;
> +	uint64_t top;
> +	bool bypass = false;
> +
> +	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> +		return -ENXIO;
> +
> +	/* We only do bypass if it's enabled on the linked device */
> +	npe = &phb->ioda.pe_array[pdn->pe_number];
> +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> +	if (!gpe)
> +		return -ENODEV;
> +
> +	if (gpe->tce_bypass_enabled) {
> +		top = gpe->tce_bypass_base + memblock_end_of_DRAM() - 1;
> +		bypass = (dma_mask >= top);
> +	}
> +
> +	if (bypass)
> +		dev_info(&npdev->dev, "Using 64-bit DMA iommu bypass\n");
> +	else
> +		dev_info(&npdev->dev, "Using 32-bit DMA via iommu\n");
> +
> +	pnv_npu_dma_set_bypass(npe, bypass);
> +	*npdev->dev.dma_mask = dma_mask;
> +
> +	return 0;
> +}
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 42b4bb2..8bed20d 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -781,7 +781,8 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
>  	}
> 
>  	/* Configure PELTV */
> -	pnv_ioda_set_peltv(phb, pe, true);
> +	if (phb->type != PNV_PHB_NPU)

Why not?

> +		pnv_ioda_set_peltv(phb, pe, true);
> 
>  	/* Setup reverse map */
>  	for (rid = pe->rid; rid < rid_end; rid++)

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v2] platforms/powernv: Add support for Nvlink NPUs
  2015-12-14  9:26     ` [v2] " Michael Ellerman
@ 2015-12-15  2:46       ` Alistair Popple
  0 siblings, 0 replies; 9+ messages in thread
From: Alistair Popple @ 2015-12-15  2:46 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, gwshan

Thanks for the review, Michael. I'll do a respin to address the comments below.

On Mon, 14 Dec 2015 20:26:27 Michael Ellerman wrote:
> Hi Alistair,
> 
> Just a few nitty things ...
> 
> On Tue, 2015-11-10 at 02:28:11 UTC, Alistair Popple wrote:
> > NV-Link is a high speed interconnect that is used in conjunction with
> 
> Is it NV-Link or NVLink?

I've seen NV-Link, NVLink and NVLINK in various bits of documentation and 
probably used a mixture of those variations myself. Will standardise on NVLink 
for OPAL and Linux.

> > a PCI-E connection to create an interface between CPU and GPU that
> > provides very high data bandwidth. A PCI-E connection to a GPU is used
> > as the control path to initiate and report status of large data
> > transfers sent via the NV-Link.
> > 
> > On IBM Power systems the NV-Link hardware interface is very similar to
> > the existing PHB3. This patch adds support for this new NPU PHB
> 
> NPU ?

NVLink Processing Unit.

> > type. DMA operations on the NPU are not supported as this patch sets
> > the TCE translation tables to be the same as the related GPU PCIe
> > device for each Nvlink. Therefore all DMA operations are setup and
> > controlled via the PCIe device.
> > 
> > EEH is not presently supported for the NPU devices, although it may be
> > added in future.
> > 
> > Signed-off-by: Alistair Popple <alistair@popple.id.au>
> > Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > ---

<snip>

> > new file mode 100644
> > index 0000000..a1e5ba5
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -0,0 +1,339 @@
> > +/*
> > + * This file implements the DMA operations for Nvlink devices. The NPU
> > + * devices all point to the same iommu table as the parent PCI device.
> > + *
> > + * Copyright Alistair Popple, IBM Corporation 2015.
> > + *
> > + * This program is free software; you can redistribute  it and/or modify it
> > + * under  the terms of  the GNU General  Public License as published by the
> > + * Free Software Foundation;  either version 2 of the  License, or (at your
> > + * option) any later version.
> 
> Can you drop the "any later" part? That's not generally true, see COPYING.
> 
> eg:
> 
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.

Good point.

<snip>

> > +
> > +/* Returns the PE associated with the PCI device of the given
> > + * NPU. Returns the linked pci device if pci_dev != NULL.
> > + */
> 
> Can you reformat all your block comments the right way:
> 
> > +/*
> > + * Returns the PE associated with the PCI device of the given
> > + * NPU. Returns the linked pci device if pci_dev != NULL.
> > + */

Sure.

> > +static struct pnv_ioda_pe *get_gpu_pci_dev_and_pe(struct pnv_ioda_pe *npe,
> > +						  struct pci_dev **gpdev)
> > +{
> > +	struct pnv_phb *phb;
> > +	struct pci_controller *hose;
> 
> I thought we were trying to use phb rather than hose these days, but dunno.

I'm not sure I follow - what do you mean here?

hose = pci_bus_to_host(pdev->bus);
phb = hose->private_data;

Seems to be a fairly typical pattern in pci-ioda.c to get the struct pnv_phb 
from a struct pci_dev. Is there an alternative I should be using instead?
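
If it's mostly about reducing the boilerplate, a small helper would be easy to
add - a rough sketch below, where pnv_phb_from_pdev() is a name I've just made
up (no such helper exists today as far as I know):

static inline struct pnv_phb *pnv_phb_from_pdev(struct pci_dev *pdev)
{
	/* Same pattern as pci-ioda.c, just wrapped up in one place */
	struct pci_controller *hose = pci_bus_to_host(pdev->bus);

	return hose->private_data;
}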

> > +	struct pci_dev *pdev;
> > +	struct pnv_ioda_pe *pe;
> > +	struct pci_dn *pdn;
> > +
> > +	if (npe->flags & PNV_IODA_PE_PEER) {
> > +		pe = npe->peers[0];
> > +		pdev = pe->pdev;
> > +	} else {
> > +		pdev = pnv_pci_get_gpu_dev(npe->pdev);
> > +		if (!pdev)
> > +			return NULL;
> > +
> > +		pdn = pci_get_pdn(pdev);
> > +		if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> > +			return NULL;
> > +
> > +		hose = pci_bus_to_host(pdev->bus);
> > +		phb = hose->private_data;
> > +		pe = &phb->ioda.pe_array[pdn->pe_number];
> > +	}
> > +
> > +	if (gpdev)
> > +		*gpdev = pdev;
> > +
> > +	return pe;
> > +}
> > +
> > +void pnv_npu_tce_invalidate_entire(struct pnv_ioda_pe *npe)
> > +{
> > +	struct pnv_phb *phb = npe->phb;
> > +
> > +	/* We can only invalidate the whole cache on NPU */
> > +	unsigned long val = (0x8ull << 60);
> 
> Are these masks and shifts documented anywhere?

Not publicly afaik, but it would be good to add the definitions here so I'll 
add the #define's for that register.
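
Something along these lines, where the names are placeholders I've made up
until I can check what the workbook calls them (the value is unchanged, just
expressed with the usual big-endian bit macro since 0x8ull << 60 is bit 0):

/* NPU TCE kill register - only whole-cache invalidation is supported */
#define NPU_TCE_KILL_INVAL_ALL	PPC_BIT(0)

	unsigned long val = NPU_TCE_KILL_INVAL_ALL;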

> > +	if (phb->type != PNV_PHB_NPU ||
> > +	    !phb->ioda.tce_inval_reg ||
> > +	    !(npe->flags & PNV_IODA_PE_DEV))
> > +		return;
> 
> Should any of those ever happen, or could this be a WARN_ON() ?

Nope. WARN_ON() would be best.
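
ie. roughly:

	if (WARN_ON(phb->type != PNV_PHB_NPU ||
		    !phb->ioda.tce_inval_reg ||
		    !(npe->flags & PNV_IODA_PE_DEV)))
		return;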

> > +	mb(); /* Ensure above stores are visible */
> 
> What stores?

TCE invalidation requires a write to the TCE table in system memory before 
invalidating the NPU TCE cache by writing values to the tce_inval_reg. The 
barrier ensures that any writes to the TCE table (which occur in the callers 
of this function) are visible to the NPU prior to invalidating the cache.

At least I assume that's what it's there for - to be honest I copied it and 
the vague comment from pnv_pci_ioda2_do_tce_invalidate(). I will add a more 
descriptive comment for NPU :)
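
Something like this perhaps (exact wording to be tidied up in the respin):

	/*
	 * The callers of this function have already updated the TCE table
	 * in system memory. Order those stores before the store to the
	 * invalidate register below, so the NPU doesn't reload stale
	 * entries into its TCE cache.
	 */
	mb();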

> > +	__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
> > +}
> > +
> > +void pnv_npu_tce_invalidate(struct pnv_ioda_pe *npe,
> > +				struct iommu_table *tbl,
> > +				unsigned long index,
> > +				unsigned long npages,
> > +				bool rm)
> > +{
> > +	struct pnv_phb *phb = npe->phb;
> > +
> > +	/* We can only invalidate the whole cache on NPU */
> > +	unsigned long val = (0x8ull << 60);
> > +
> > +	if (phb->type != PNV_PHB_NPU ||
> > +	    !phb->ioda.tce_inval_reg ||
> > +	    !(npe->flags & PNV_IODA_PE_DEV))
> > +		return;
> 
> Ditto.
> 
> > +
> > +	mb();
> 
> What's the mb doing?

As above.

> > +	if (rm)
> > +		__asm__ __volatile__("stdcix %0,0,%1" : :
> > +				"r"(cpu_to_be64(val)),
> > +				"r" (phb->ioda.tce_inval_reg_phys) :
> > +				"memory");
> 
> I see this in pci-ioda.c as __raw_rm_writeq(). Can you put that in a shared
> header and use it?

Sure.
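
For reference it's only a few lines, so it should move into a shared header
(pci.h or wherever makes sense) pretty much as-is - something like:

static inline void __raw_rm_writeq(u64 val, volatile void __iomem *paddr)
{
	__asm__ __volatile__("stdcix %0,0,%1"
		: : "r" (val), "r" (paddr) : "memory");
}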

> > +	else
> > +		__raw_writeq(cpu_to_be64(val),
> > +			phb->ioda.tce_inval_reg);
> > +}
> > +
> > +void pnv_npu_init_dma_pe(struct pnv_ioda_pe *npe)
> > +{
> > +	struct pnv_ioda_pe *gpe;
> > +	struct pci_dev *gpdev;
> > +	int i, avail = -1;
> > +
> > +	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
> > +		return;
> > +
> > +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> > +	if (!gpe)
> > +		return;
> > +
> > +	/* Nothing to do if the PEs are already connected */
> 
> Should that comment be on the if below instead?

Yep. And it looks like that block could be simplified a bit as well.
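
Roughly (untested, mostly just moving the comment and reordering the checks):

	/*
	 * Nothing to do if the PEs are already connected, otherwise
	 * remember the first free peer slot on the GPU PE.
	 */
	for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
		if (gpe->peers[i] == npe)
			return;

		if (avail < 0 && !gpe->peers[i])
			avail = i;
	}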

> > +	for (i = 0; i < PNV_IODA_MAX_PEER_PES; i++) {
> > +		if (avail < 0 && !gpe->peers[i])
> > +			avail = i;
> > +
> > +		if (gpe->peers[i] == npe)
> > +			return;
> > +	}
> > +
> > +	if (WARN_ON(avail < 0))
> > +		return;
> > +
> > +	gpe->peers[avail] = npe;
> > +	gpe->flags |= PNV_IODA_PE_PEER;
> 
> I don't see any locking around peers, I assume we're only called single
> threaded? What about hot plug?

It should only be called single threaded. Hot plug is not supported at the 
moment.

> > +
> > +	/* We assume that the NPU devices only have a single peer PE
> > +	 * (the GPU PCIe device PE). */
> > +	npe->peers[0] = gpe;
> 
> How did we ensure avail wasn't 0 ?

We don't need to. This is the npe (ie. the NVLink PE), which only ever has a
single peer PE: the GPU PE (ie. gpe). In other words we never assign anything
to npe->peers[avail] (although we do to gpe->peers[avail]).

> > +	npe->flags |= PNV_IODA_PE_PEER;
> > +}
> > +
> > +/* For the NPU we want to point the TCE table at the same table as the
> > + * real PCI device.
> > + */
> > +static void pnv_npu_disable_bypass(struct pnv_ioda_pe *npe)
> > +{
> > +	struct pnv_phb *phb = npe->phb;
> > +	struct pci_dev *gpdev;
> > +	struct pnv_ioda_pe *gpe;
> > +	void *addr;
> > +	unsigned int size;
> > +	int64_t rc;
> > +
> > +	/* Find the associated PCI devices and get the dma window
> > +	 * information from there.
> > +	 */
> > +	if (!npe->pdev || !(npe->flags & PNV_IODA_PE_DEV))
> > +		return;
> > +
> > +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> > +	if (!gpe)
> > +		return;
> > +
> > +	addr = (void *)gpe->table_group.tables[0]->it_base;
> > +	size = gpe->table_group.tables[0]->it_size << 3;
> > +	rc = opal_pci_map_pe_dma_window(phb->opal_id, npe->pe_number,
> > +					npe->pe_number, 1, __pa(addr),
> > +					size, 0x1000);
> > +	if (rc != OPAL_SUCCESS)
> > +		pr_warn("%s: Error %lld setting DMA window on PHB#%d-PE#%d\n",
> > +			__func__, rc, phb->hose->global_number, npe->pe_number);
> > +
> > +	/* We don't initialise npu_pe->tce32_table as we always use
> > +	 * dma_npu_ops which are nops.
> > +	 */
> > +	set_dma_ops(&npe->pdev->dev, &dma_npu_ops);
> > +}
> > +
> > +/* Enable/disable bypass mode on the NPU. The NPU only supports one
> > + * window per brick, so bypass needs to be explicitly enabled or
> 
> brick?

Link. Brick is an old term which should have been removed, obviously I missed 
this one.

> > + * disabled. Unlike for a PHB3 bypass and non-bypass modes can't be
> > + * active at the same time.
> > + */
> > +int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe, bool enabled)
> 
> enabled should be "enable", you're asking for it to be enabled, it's not
> already enabled.

Yep.
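
So the prototype becomes:

int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe, bool enable)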

> > +{
> > +	struct pnv_phb *phb = npe->phb;
> > +	int64_t rc = 0;
> > +
> > +	if (phb->type != PNV_PHB_NPU || !npe->pdev)
> > +		return -EINVAL;
> > +
> > +	if (enabled) {
> > +		/* Enable the bypass window */
> > +		phys_addr_t top = memblock_end_of_DRAM();
> > +
> > +		npe->tce_bypass_base = 0;
> > +		top = roundup_pow_of_two(top);
> > +		dev_info(&npe->pdev->dev, "Enabling bypass for PE %d\n",
> > +			 npe->pe_number);
> > +		rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
> > +					npe->pe_number, npe->pe_number,
> > +					npe->tce_bypass_base, top);
> > +	} else {
> > +		/* Disable the bypass window by replacing it with the
> > +		 * TCE32 window.
> > +		 */
> > +		pnv_npu_disable_bypass(npe);
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> > +int pnv_npu_dma_set_mask(struct pci_dev *npdev, u64 dma_mask)
> > +{
> > +	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> > +	struct pnv_phb *phb = hose->private_data;
> > +	struct pci_dn *pdn = pci_get_pdn(npdev);
> > +	struct pnv_ioda_pe *npe, *gpe;
> > +	struct pci_dev *gpdev;
> > +	uint64_t top;
> > +	bool bypass = false;
> > +
> > +	if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> > +		return -ENXIO;
> > +
> > +	/* We only do bypass if it's enabled on the linked device */
> > +	npe = &phb->ioda.pe_array[pdn->pe_number];
> > +	gpe = get_gpu_pci_dev_and_pe(npe, &gpdev);
> > +	if (!gpe)
> > +		return -ENODEV;
> > +
> > +	if (gpe->tce_bypass_enabled) {
> > +		top = gpe->tce_bypass_base + memblock_end_of_DRAM() - 1;
> > +		bypass = (dma_mask >= top);
> > +	}
> > +
> > +	if (bypass)
> > +		dev_info(&npdev->dev, "Using 64-bit DMA iommu bypass\n");
> > +	else
> > +		dev_info(&npdev->dev, "Using 32-bit DMA via iommu\n");
> > +
> > +	pnv_npu_dma_set_bypass(npe, bypass);
> > +	*npdev->dev.dma_mask = dma_mask;
> > +
> > +	return 0;
> > +}
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 42b4bb2..8bed20d 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -781,7 +781,8 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
> >  	}
> > 
> >  	/* Configure PELTV */
> > -	pnv_ioda_set_peltv(phb, pe, true);
> > +	if (phb->type != PNV_PHB_NPU)
> 
> Why not?

Because NPUs don't support it. I will add a comment to that effect.
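
Something like:

	/*
	 * Configure PELTV. NPU PHBs don't support PELTV so skip it
	 * for them.
	 */
	if (phb->type != PNV_PHB_NPU)
		pnv_ioda_set_peltv(phb, pe, true);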

> > +		pnv_ioda_set_peltv(phb, pe, true);
> > 
> >  	/* Setup reverse map */
> >  	for (rid = pe->rid; rid < rid_end; rid++)
> 
> cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-12-15  2:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-28  5:00 [PATCH 0/2] Add support for Nvlink Alistair Popple
2015-10-28  5:00 ` [PATCH 1/2] Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field" Alistair Popple
2015-10-28  5:00 ` [PATCH 2/2] platforms/powernv: Add support for Nvlink NPUs Alistair Popple
2015-11-10  2:28   ` [PATCH v2] " Alistair Popple
2015-11-10  8:51     ` kbuild test robot
2015-11-10 10:30       ` Michael Ellerman
2015-11-10 10:43     ` kbuild test robot
2015-12-14  9:26     ` [v2] " Michael Ellerman
2015-12-15  2:46       ` Alistair Popple

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).