* powerpc / cxl: Add support for the Mellanox CX4 in cxl mode
@ 2016-07-04 13:21 Ian Munsie
  2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
                   ` (13 more replies)
  0 siblings, 14 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:21 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen

This series adds support for the Mellanox CX4 network adapter operating in cxl
mode to the cxl driver and the PowerNV PHB code. The Mellanox developers will
submit a separate patch series that makes use of this in the mlx5 driver.

The CX4 card can operate in either pci mode or cxl mode. In cxl mode, memory
accesses from the card go through the XSL (Translation Service Layer,
essentially a stripped down version of the Power Service Layer), allowing it to
transparently access unpinned memory with the cxl driver handling faulting in
pages as necessary, etc. Most of the support for the XSL is already upstream,
though this series does include a bug fix to enable bus mastering for this
(patch 3).

Patch 2 in this series provides an API which the mlx5 driver can query to check
if it is in a cxl capable slot. The card will come up in pci mode, and the mlx5
driver can choose to switch it to cxl mode, wherein it will reappear with an
additional physical function representing the XSL that the cxl driver will bind
to. Patches 12-14 add support for switching the card's mode, including using
the PCI hotplug support to re-enumerate the device tree and re-probe the
card.

Unlike previous users of the cxl kernel API where we used a virtual PHB and
exposed PCI devices under it, the Mellanox CX4 uses a peer model where cxl
binds to one of the physical functions of the card and the mlx5_core driver
binds to the other networking physical functions. Patches 6 and 7 add support
for using the cxl kernel API with the real PHB to enable this peer model.
Patches 4 and 5 are preparatory patches exposing some APIs that the PHB will need
to call.

While in cxl mode, interrupts from the CX4 are a little unusual - they are
neither pci interrupts, nor cxl interrupts, but rather a hybrid of the two. The
interrupts are passed from the networking hardware to the XSL using a custom
format in the MSIX table, and from there are treated as cxl interrupts. These
are configured mostly transparently using the standard msix APIs - the PHB
handles allocating and configuring the cxl interrupts, associating them with
the default context, and the mlx5 driver handles filling out the MSIX table
with their custom format (not included in this series). See patch 10.

Additionally, the CX4 has a hard limit on the number of interrupts that
can be associated with a given context. To overcome this, patches 8 and 9
expose an API to allow the mlx5 driver to inform us of the limit, and the
interrupt allocation code in patch 10 will allocate additional contexts to
associate these with.
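
As a rough illustration of the arithmetic involved, here is a userspace sketch
of how a per-context interrupt limit translates into a number of contexts. The
helper names (contexts_needed, irqs_in_context) and the assumption that
contexts are filled in order starting from the default context are
illustrative only and are not taken from patches 8-10.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of this series): number of cxl contexts
 * needed to carry nvec interrupts when each context can hold at most
 * max_per_ctx of them. This is a plain ceiling division.
 */
static unsigned int contexts_needed(unsigned int nvec, unsigned int max_per_ctx)
{
	assert(max_per_ctx > 0);
	return (nvec + max_per_ctx - 1) / max_per_ctx;
}

/*
 * Hypothetical helper: how many interrupts end up in context idx, assuming
 * contexts are filled in order and idx 0 is the default context.
 */
static unsigned int irqs_in_context(unsigned int nvec,
				    unsigned int max_per_ctx,
				    unsigned int idx)
{
	unsigned int start;

	assert(max_per_ctx > 0);
	start = idx * max_per_ctx;
	if (start >= nvec)
		return 0;
	return (nvec - start < max_per_ctx) ? nvec - start : max_per_ctx;
}
```

With a made-up limit of 4 interrupts per context, a request for 10 interrupts
would need three contexts under this model, holding 4, 4 and 2 interrupts.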

Patch 1 is a preparatory cleanup patch to reorganise cxl code in arch/powerpc
into a separate file.

Patch 11 is a workaround for a hardware limitation in the CX4 where a context
with PE=0 cannot be used.

Note that patch 2 depends on "cxl: Ignore CAPI adapters misplaced in switched
slot" by Philippe Bergheaud:
http://patchwork.ozlabs.org/patch/642920/

Additionally, the following stand-alone patches related to the CX4 are also
pending on the mailing list, but are *not* dependencies of this series:
- cxl: Fix bug where AFU disable operation had no effect
- cxl: Workaround XSL bug that does not clear the RA bit after a reset
- cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

The entire series is bisectable.


* [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
@ 2016-07-04 13:21 ` Ian Munsie
  2016-07-06  3:44   ` Andrew Donnellan
  2016-07-06 16:27   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 02/14] cxl: Add cxl_slot_is_supported API Ian Munsie
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:21 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The support for using the Mellanox CX4 in cxl mode will require
additions to the PHB code. In preparation for this, move the existing
cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
more organised.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/pci-cxl.c  | 163 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 159 +----------------------------
 arch/powerpc/platforms/powernv/pci.h      |   6 ++
 4 files changed, 173 insertions(+), 156 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-cxl.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index cd9711e..b5d98cb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,6 +6,7 @@ obj-y			+= opal-kmsg.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
new file mode 100644
index 0000000..ea8171f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/pnv-pci.h>
+#include <asm/opal.h>
+
+#include "pci.h"
+
+struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	return of_node_get(hose->dn);
+}
+EXPORT_SYMBOL(pnv_pci_get_phb_node);
+
+int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	pe = pnv_ioda_get_pe(dev);
+	if (!pe)
+		return -ENODEV;
+
+	pe_info(pe, "Switching PHB to CXL\n");
+
+	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, mode, pe->pe_number);
+	if (rc == OPAL_UNSUPPORTED)
+		dev_err(&dev->dev, "Required cxl mode not supported by firmware - update skiboot\n");
+	else if (rc)
+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
+
+	return rc;
+}
+EXPORT_SYMBOL(pnv_phb_to_cxl_mode);
+
+/* Find PHB for cxl dev and allocate MSI hwirqs?
+ * Returns the absolute hardware IRQ number
+ */
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
+
+	if (hwirq < 0) {
+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	return phb->msi_base + hwirq;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
+
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
+
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq;
+
+	for (i = 1; i < CXL_IRQ_RANGES; i++) {
+		if (!irqs->range[i])
+			continue;
+		pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
+			 i, irqs->offset[i],
+			 irqs->range[i]);
+		hwirq = irqs->offset[i] - phb->msi_base;
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+				       irqs->range[i]);
+	}
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
+
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq, try;
+
+	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
+
+	/* 0 is reserved for the multiplexed PSL DSI interrupt */
+	for (i = 1; i < CXL_IRQ_RANGES && num; i++) {
+		try = num;
+		while (try) {
+			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
+			if (hwirq >= 0)
+				break;
+			try /= 2;
+		}
+		if (!try)
+			goto fail;
+
+		irqs->offset[i] = phb->msi_base + hwirq;
+		irqs->range[i] = try;
+		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
+			 i, irqs->offset[i], irqs->range[i]);
+		num -= try;
+	}
+	if (num)
+		goto fail;
+
+	return 0;
+fail:
+	pnv_cxl_release_hwirq_ranges(irqs, dev);
+	return -ENOSPC;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
+
+int pnv_cxl_get_irq_count(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return phb->msi_bmp.irq_count;
+}
+EXPORT_SYMBOL(pnv_cxl_get_irq_count);
+
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	unsigned int xive_num = hwirq - phb->msi_base;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	if (!(pe = pnv_ioda_get_pe(dev)))
+		return -ENODEV;
+
+	/* Assign XIVE to PE */
+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
+	if (rc) {
+		pe_warn(pe, "%s: OPAL error %d setting msi_base 0x%x "
+			"hwirq 0x%x XIVE 0x%x PE\n",
+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
+		return -EIO;
+	}
+	pnv_set_msi_irq_chip(phb, virq);
+
+	return 0;
+}
+EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2115ed7..e0d8103 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -595,7 +595,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
  * but in the meantime, we need to protect them to avoid warnings
  */
 #ifdef CONFIG_PCI_MSI
-static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
+struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -2700,7 +2700,7 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
 }
 
 
-static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
+void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 {
 	struct irq_data *idata;
 	struct irq_chip *ichip;
@@ -2722,159 +2722,6 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	irq_set_chip(virq, &phb->ioda.irq_chip);
 }
 
-#ifdef CONFIG_CXL_BASE
-
-struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	return of_node_get(hose->dn);
-}
-EXPORT_SYMBOL(pnv_pci_get_phb_node);
-
-int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
-	int rc;
-
-	pe = pnv_ioda_get_pe(dev);
-	if (!pe)
-		return -ENODEV;
-
-	pe_info(pe, "Switching PHB to CXL\n");
-
-	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, mode, pe->pe_number);
-	if (rc == OPAL_UNSUPPORTED)
-		dev_err(&dev->dev, "Required cxl mode not supported by firmware - update skiboot\n");
-	else if (rc)
-		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
-
-	return rc;
-}
-EXPORT_SYMBOL(pnv_phb_to_cxl_mode);
-
-/* Find PHB for cxl dev and allocate MSI hwirqs?
- * Returns the absolute hardware IRQ number
- */
-int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
-
-	if (hwirq < 0) {
-		dev_warn(&dev->dev, "Failed to find a free MSI\n");
-		return -ENOSPC;
-	}
-
-	return phb->msi_base + hwirq;
-}
-EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
-
-void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-
-	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
-}
-EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
-
-void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
-				  struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int i, hwirq;
-
-	for (i = 1; i < CXL_IRQ_RANGES; i++) {
-		if (!irqs->range[i])
-			continue;
-		pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
-			 i, irqs->offset[i],
-			 irqs->range[i]);
-		hwirq = irqs->offset[i] - phb->msi_base;
-		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
-				       irqs->range[i]);
-	}
-}
-EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
-
-int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
-			       struct pci_dev *dev, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int i, hwirq, try;
-
-	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
-
-	/* 0 is reserved for the multiplexed PSL DSI interrupt */
-	for (i = 1; i < CXL_IRQ_RANGES && num; i++) {
-		try = num;
-		while (try) {
-			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
-			if (hwirq >= 0)
-				break;
-			try /= 2;
-		}
-		if (!try)
-			goto fail;
-
-		irqs->offset[i] = phb->msi_base + hwirq;
-		irqs->range[i] = try;
-		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
-			 i, irqs->offset[i], irqs->range[i]);
-		num -= try;
-	}
-	if (num)
-		goto fail;
-
-	return 0;
-fail:
-	pnv_cxl_release_hwirq_ranges(irqs, dev);
-	return -ENOSPC;
-}
-EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
-
-int pnv_cxl_get_irq_count(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-
-	return phb->msi_bmp.irq_count;
-}
-EXPORT_SYMBOL(pnv_cxl_get_irq_count);
-
-int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
-			   unsigned int virq)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	unsigned int xive_num = hwirq - phb->msi_base;
-	struct pnv_ioda_pe *pe;
-	int rc;
-
-	if (!(pe = pnv_ioda_get_pe(dev)))
-		return -ENODEV;
-
-	/* Assign XIVE to PE */
-	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
-	if (rc) {
-		pe_warn(pe, "%s: OPAL error %d setting msi_base 0x%x "
-			"hwirq 0x%x XIVE 0x%x PE\n",
-			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
-		return -EIO;
-	}
-	set_msi_irq_chip(phb, virq);
-
-	return 0;
-}
-EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
-#endif
-
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
@@ -2931,7 +2778,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
-	set_msi_irq_chip(phb, virq);
+	pnv_set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %d\n",
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3a97990..49c2997 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -1,6 +1,10 @@
 #ifndef __POWERNV_PCI_H
 #define __POWERNV_PCI_H
 
+#include <linux/iommu.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+
 struct pci_dn;
 
 enum pnv_phb_type {
@@ -212,6 +216,8 @@ extern void pnv_pci_dma_dev_setup(struct pci_dev *pdev);
 extern void pnv_pci_dma_bus_setup(struct pci_bus *bus);
 extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
+extern struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev);
+extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq);
 
 extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...);
-- 
2.8.1


* [PATCH 02/14] cxl: Add cxl_slot_is_supported API
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
  2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06  2:02   ` Andrew Donnellan
  2016-07-06 16:36   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Philippe Bergheaud

From: Ian Munsie <imunsie@au1.ibm.com>

This extends the check that the adapter is in a CAPI capable slot so
that it may be called by external users in the kernel API. This will be
used by the upcoming Mellanox CX4 support, which needs to know ahead of
time if the card can be switched to cxl mode so that it can leave it in
PCI mode if it is not.

This API takes a parameter to check if CAPP DMA mode is supported, which
it currently only allows on P8NVL systems, since that mode currently has
issues accessing memory < 4GB on P8, and we cannot realistically avoid
that.

This API does not currently check if a CAPP unit is available (i.e. not
already assigned to another PHB) on P8. Doing so would be racy since it
is assigned on a first come first serve basis, and so long as CAPP DMA
mode is not supported on P8 we don't need this, since the only
anticipated user of this API requires CAPP DMA mode.
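
The order of the checks can be modelled in userspace roughly as follows. This
is only a sketch: struct slot_facts and slot_is_supported_model() are
hypothetical stand-ins for the runtime queries the real function makes
(cpu_has_feature(), pvr_version_is() and cxl_slot_is_switched()), and only
CXL_SLOT_FLAG_DMA is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

#define CXL_SLOT_FLAG_DMA 0x1	/* same value as defined by this patch */

/* Hypothetical stand-ins for what the kernel queries at runtime. */
struct slot_facts {
	bool hv_mode;		/* models cpu_has_feature(CPU_FTR_HVMODE) */
	bool is_p8nvl;		/* models pvr_version_is(PVR_POWER8NVL) */
	bool behind_switch;	/* models cxl_slot_is_switched(dev) */
};

/* Userspace model of the decision order, not the kernel function itself. */
static bool slot_is_supported_model(const struct slot_facts *f, int flags)
{
	assert(f);

	if (!f->hv_mode)
		return false;

	/* CAPP DMA mode is only allowed on P8NVL for now */
	if ((flags & CXL_SLOT_FLAG_DMA) && !f->is_p8nvl)
		return false;

	if (f->behind_switch)
		return false;

	/* Note: CAPP availability on P8 is deliberately not checked */
	return true;
}
```

In this model a regular P8 slot passes when called with flags of 0, but fails
as soon as CXL_SLOT_FLAG_DMA is requested.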

Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/pci.c | 37 +++++++++++++++++++++++++++++++++++++
 include/misc/cxl.h     | 15 +++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 3a5f980..9530280 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1426,6 +1426,43 @@ static int cxl_slot_is_switched(struct pci_dev *dev)
 	return (depth > CXL_MAX_PCIEX_PARENT);
 }
 
+bool cxl_slot_is_supported(struct pci_dev *dev, int flags)
+{
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
+		return false;
+
+	if ((flags & CXL_SLOT_FLAG_DMA) && (!pvr_version_is(PVR_POWER8NVL))) {
+		/*
+		 * CAPP DMA mode is technically supported on regular P8, but
+		 * will EEH if the card attempts to access memory < 4GB, which
+		 * we cannot realistically avoid. We might be able to work
+		 * around the issue, but until then return unsupported:
+		 */
+		return false;
+	}
+
+	if (cxl_slot_is_switched(dev))
+		return false;
+
+	/*
+	 * XXX: This gets a little tricky on regular P8 (not POWER8NVL) since
+	 * the CAPP can be connected to PHB 0, 1 or 2 on a first come first
+	 * served basis, which is racy to check from here. If we need to
+	 * support this in future we might need to consider having this
+	 * function effectively reserve it ahead of time.
+	 *
+	 * Currently, the only user of this API is the Mellanox CX4, which is
+	 * only supported on P8NVL due to the above mentioned limitation of
+	 * CAPP DMA mode and therefore does not need to worry about this. If the
+	 * issue with CAPP DMA mode is later worked around on P8 we might need
+	 * to revisit this.
+	 */
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(cxl_slot_is_supported);
+
+
 static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	struct cxl *adapter;
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index b6d040f..dd9eebb 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -24,6 +24,21 @@
  * generic PCI API. This API is agnostic to the actual AFU.
  */
 
+#define CXL_SLOT_FLAG_DMA 0x1
+
+/*
+ * Checks if the given card is in a cxl capable slot. Pass CXL_SLOT_FLAG_DMA if
+ * the card requires CAPP DMA mode to also check if the system supports it.
+ * This is intended to be used by bi-modal devices to determine if they can use
+ * cxl mode or if they should continue running in PCI mode.
+ *
+ * Note that this only checks if the slot is cxl capable - it does not
+ * currently check if the CAPP is currently available for chips where it can be
+ * assigned to different PHBs on a first come first serve basis (i.e. P8)
+ */
+bool cxl_slot_is_supported(struct pci_dev *dev, int flags);
+
+
 /* Get the AFU associated with a pci_dev */
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev);
 
-- 
2.8.1


* [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
  2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
  2016-07-04 13:22 ` [PATCH 02/14] cxl: Add cxl_slot_is_supported API Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06  4:04   ` Andrew Donnellan
  2016-07-06 16:37   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
master to be enabled in order for the CAPI traffic to flow. This should
be harmless to enable for other cxl devices, so unconditionally enable
it in the adapter init flow.
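
For readers unfamiliar with what enabling bus mastering amounts to at the
register level, the following userspace sketch models the bit involved.
PCI_COMMAND_MASTER (0x4) is the bus master enable bit of the PCI command
register; the two helper functions are hypothetical and only operate on a
plain 16-bit value rather than on real config space as pci_set_master() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_MASTER 0x4	/* bus master enable bit in the command register */

/* Hypothetical helper: is the bus master bit set in this command value? */
static bool bus_master_enabled(uint16_t cmd)
{
	return (cmd & PCI_COMMAND_MASTER) != 0;
}

/*
 * Hypothetical helper modelling the read-modify-write: all other bits are
 * preserved, and setting an already set bit changes nothing.
 */
static uint16_t command_with_bus_master(uint16_t cmd)
{
	uint16_t out = cmd | PCI_COMMAND_MASTER;

	assert(bus_master_enabled(out));
	return out;
}
```

Because the operation is idempotent in this model, applying it to a device
that already has bus mastering enabled leaves the value unchanged, which is
consistent with it being harmless for other cxl devices.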

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 9530280..6c0597d 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1264,6 +1264,9 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = adapter->native->sl_ops->adapter_regs_init(adapter, dev)))
 		goto err;
 
+	/* Required for devices using CAPP DMA mode, harmless for others */
+	pci_set_master(dev);
+
 	if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode)))
 		goto err;
 
-- 
2.8.1


* [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (2 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-05  2:10   ` Andrew Donnellan
  2016-07-06 16:45   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 uses a model where the AFU is one physical function of
the device, and is used by other peer physical functions of the same
device. This will require those other devices to grab a reference on the
AFU when they are initialised to make sure that it does not go away
during their lifetime.

Move the AFU refcount functions to base.c so they can be called from
the PHB code.
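
The intended usage pattern can be sketched in userspace as below. Everything
here is a hypothetical model: struct afu_model uses a bare integer where the
kernel relies on the refcount of the embedded struct device, and peer_model
stands in for one of the peer physical functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted AFU. */
struct afu_model {
	int refcount;
};

/* Models cxl_afu_get(): take a reference, or return NULL on failure. */
static struct afu_model *afu_model_get(struct afu_model *afu)
{
	if (!afu)
		return NULL;
	afu->refcount++;
	return afu;
}

/* Models cxl_afu_put(): drop a reference previously taken. */
static void afu_model_put(struct afu_model *afu)
{
	assert(afu && afu->refcount > 0);
	afu->refcount--;
}

/* Hypothetical peer function that pins the AFU for its own lifetime. */
struct peer_model {
	struct afu_model *afu;
};

static int peer_model_init(struct peer_model *peer, struct afu_model *afu)
{
	peer->afu = afu_model_get(afu);
	return peer->afu ? 0 : -1;
}

static void peer_model_release(struct peer_model *peer)
{
	if (peer->afu) {
		afu_model_put(peer->afu);
		peer->afu = NULL;
	}
}
```

Each peer takes one reference when it is initialised and drops it on release,
so the AFU in this model cannot reach a zero count while any peer is alive.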

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/base.c | 13 +++++++++++++
 drivers/misc/cxl/cxl.h  | 12 ------------
 include/misc/cxl-base.h |  4 ++++
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index 9b90ec6..c35a52f 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -54,6 +54,19 @@ static inline void cxl_calls_put(struct cxl_calls *calls) { }
 
 #endif /* CONFIG_CXL_MODULE */
 
+/* AFU refcount management */
+struct cxl_afu *cxl_afu_get(struct cxl_afu *afu)
+{
+	return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+EXPORT_SYMBOL_GPL(cxl_afu_get);
+
+void cxl_afu_put(struct cxl_afu *afu)
+{
+	put_device(&afu->dev);
+}
+EXPORT_SYMBOL_GPL(cxl_afu_put);
+
 void cxl_slbia(struct mm_struct *mm)
 {
 	struct cxl_calls *calls;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index aafffa8..9e2621e 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -428,18 +428,6 @@ struct cxl_afu {
 	bool enabled;
 };
 
-/* AFU refcount management */
-static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu)
-{
-
-	return (get_device(&afu->dev) == NULL) ? NULL : afu;
-}
-
-static inline void  cxl_afu_put(struct cxl_afu *afu)
-{
-	put_device(&afu->dev);
-}
-
 
 struct cxl_irq_name {
 	struct list_head list;
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index 5ae9625..f53808f 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -36,11 +36,15 @@ static inline void cxl_ctx_put(void)
        atomic_dec(&cxl_use_count);
 }
 
+struct cxl_afu *cxl_afu_get(struct cxl_afu *afu);
+void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
 
 #else /* CONFIG_CXL_BASE */
 
 static inline bool cxl_ctx_in_use(void) { return false; }
+static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu) { return NULL; }
+static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
 
 #endif /* CONFIG_CXL_BASE */
-- 
2.8.1


* [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (3 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 16:51   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The cxl kernel API has a concept of a default context associated with
each PCI device under the virtual PHB. The Mellanox CX4 will also use
the cxl kernel API, but it does not use a virtual PHB - rather, the AFU
appears as a physical function as a peer to the networking functions.

In order to allow the kernel API to work with those networking
functions, we will need to associate a default context with them as
well. To this end, refactor the corresponding code to do this in vphb.c
and export it so that it can be called from the PHB code.
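
The indirection this relies on (a built-in wrapper that forwards to the
driver through a table of function pointers, and fails gracefully if the
driver is not registered) can be modelled in userspace as below. All names
are hypothetical, and the sketch leaves out the module lifetime handling that
the real cxl_calls_get() / cxl_calls_put() pair is there for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a call table filled in by the loadable driver. */
struct calls_model {
	bool (*associate_default_context)(int dev_id);
};

/* NULL until the "module" registers its table. */
static struct calls_model *registered_calls;

static void register_calls_model(struct calls_model *calls)
{
	assert(calls && calls->associate_default_context);
	registered_calls = calls;
}

static void unregister_calls_model(void)
{
	registered_calls = NULL;
}

/* Built-in wrapper: callable whether or not the driver is present. */
static bool associate_default_context_model(int dev_id)
{
	struct calls_model *calls = registered_calls;

	if (!calls)
		return false;

	return calls->associate_default_context(dev_id);
}

/* What the driver side might provide; the check on dev_id is made up. */
static bool driver_impl_model(int dev_id)
{
	return dev_id >= 0;
}
```

The caller only ever sees the wrapper, so in this model the PHB side does not
need to know whether the driver is built in, loaded as a module, or absent.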

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/base.c | 35 +++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h  |  6 ++++++
 drivers/misc/cxl/main.c |  2 ++
 drivers/misc/cxl/vphb.c | 37 +++++++++++++++++++++++--------------
 include/misc/cxl-base.h |  6 ++++++
 5 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index c35a52f..af20b34 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -106,6 +106,41 @@ int cxl_update_properties(struct device_node *dn,
 }
 EXPORT_SYMBOL_GPL(cxl_update_properties);
 
+/*
+ * API calls into the driver that may be called from the PHB code and must be
+ * built in.
+ */
+bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	bool ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return false;
+
+	ret = calls->cxl_pci_associate_default_context(dev, afu);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_pci_associate_default_context);
+
+void cxl_pci_disable_device(struct pci_dev *dev)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_pci_disable_device(dev);
+
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL_GPL(cxl_pci_disable_device);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 9e2621e..c94b54f 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -707,9 +707,15 @@ static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
 ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 				loff_t off, size_t count);
 
+/* Internal functions wrapped in cxl_base to allow PHB to call them */
+bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
+void _cxl_pci_disable_device(struct pci_dev *dev);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
+	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
+	void (*cxl_pci_disable_device)(struct pci_dev *dev);
+
 	struct module *owner;
 };
 int register_cxl_calls(struct cxl_calls *calls);
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index ae68c32..4e5474b 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -110,6 +110,8 @@ static inline void cxl_slbia_core(struct mm_struct *mm)
 
 static struct cxl_calls cxl_calls = {
 	.cxl_slbia = cxl_slbia_core,
+	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
+	.cxl_pci_disable_device = _cxl_pci_disable_device,
 	.owner = THIS_MODULE,
 };
 
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 012b6aa..c5b9c201 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -40,11 +40,28 @@ static void cxl_teardown_msi_irqs(struct pci_dev *pdev)
 	 */
 }
 
+bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	struct cxl_context *ctx;
+
+	/*
+	 * Allocate a context to do cxl things too. This is used for interrupts
+	 * in the peer model using a real phb, and if we eventually do DMA ops
+	 * in the virtual phb, we'll need a default context to attach them to.
+	 */
+	ctx = cxl_dev_context_init(dev);
+	if (!ctx)
+		return false;
+	dev->dev.archdata.cxl_ctx = ctx;
+
+	return (cxl_ops->afu_check_and_enable(afu) == 0);
+}
+/* exported via cxl_base */
+
 static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
 {
 	struct pci_controller *phb;
 	struct cxl_afu *afu;
-	struct cxl_context *ctx;
 
 	phb = pci_bus_to_host(dev->bus);
 	afu = (struct cxl_afu *)phb->private_data;
@@ -57,19 +74,10 @@ static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
 	set_dma_ops(&dev->dev, &dma_direct_ops);
 	set_dma_offset(&dev->dev, PAGE_OFFSET);
 
-	/*
-	 * Allocate a context to do cxl things too.  If we eventually do real
-	 * DMA ops, we'll need a default context to attach them to
-	 */
-	ctx = cxl_dev_context_init(dev);
-	if (!ctx)
-		return false;
-	dev->dev.archdata.cxl_ctx = ctx;
-
-	return (cxl_ops->afu_check_and_enable(afu) == 0);
+	return _cxl_pci_associate_default_context(dev, afu);
 }
 
-static void cxl_pci_disable_device(struct pci_dev *dev)
+void _cxl_pci_disable_device(struct pci_dev *dev)
 {
 	struct cxl_context *ctx = cxl_get_context(dev);
 
@@ -82,6 +90,7 @@ static void cxl_pci_disable_device(struct pci_dev *dev)
 		cxl_release_context(ctx);
 	}
 }
+/* exported via cxl_base */
 
 static resource_size_t cxl_pci_window_alignment(struct pci_bus *bus,
 						unsigned long type)
@@ -197,8 +206,8 @@ static struct pci_controller_ops cxl_pci_controller_ops =
 {
 	.probe_mode = cxl_pci_probe_mode,
 	.enable_device_hook = cxl_pci_enable_device_hook,
-	.disable_device = cxl_pci_disable_device,
-	.release_device = cxl_pci_disable_device,
+	.disable_device = _cxl_pci_disable_device,
+	.release_device = _cxl_pci_disable_device,
 	.window_alignment = cxl_pci_window_alignment,
 	.reset_secondary_bus = cxl_pci_reset_secondary_bus,
 	.setup_msi_irqs = cxl_setup_msi_irqs,
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index f53808f..bb7e629 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -10,6 +10,8 @@
 #ifndef _MISC_CXL_BASE_H
 #define _MISC_CXL_BASE_H
 
+#include <misc/cxl.h>
+
 #ifdef CONFIG_CXL_BASE
 
 #define CXL_IRQ_RANGES 4
@@ -39,6 +41,8 @@ static inline void cxl_ctx_put(void)
 struct cxl_afu *cxl_afu_get(struct cxl_afu *afu);
 void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
+bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
+void cxl_pci_disable_device(struct pci_dev *dev);
 
 #else /* CONFIG_CXL_BASE */
 
@@ -46,6 +50,8 @@ static inline bool cxl_ctx_in_use(void) { return false; }
 static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu) { return NULL; }
 static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
+static inline bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu) { return false; }
+static inline void cxl_pci_disable_device(struct pci_dev *dev) {}
 
 #endif /* CONFIG_CXL_BASE */
 
-- 
2.8.1


* [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (4 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 17:38   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB Ian Munsie
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

This adds support for the peer model of the cxl kernel api to the
PowerNV PHB, and exports APIs to enable the mode, check if a PCI device
is attached to a PHB in this mode, and to set and get the peer AFU for
this mode.

The cxl driver will enable this mode for supported cards by calling
pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
that this mode is enabled, and switch out its controller_ops for the
cxl version.

The cxl version of the controller_ops struct implements its own
versions of the enable_device_hook and release_device to handle
refcounting on the peer AFU and to allocate a default context for the
device.

Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
there is no safe way to disable cxl mode short of a reboot, so until
that changes there is no reason to support the disable path.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 arch/powerpc/include/asm/pnv-pci.h        |   7 ++
 arch/powerpc/platforms/powernv/pci-cxl.c  | 112 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |  22 +++++-
 arch/powerpc/platforms/powernv/pci.h      |  16 +++++
 4 files changed, 154 insertions(+), 3 deletions(-)
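
For illustration, the one-way enable and the peer AFU refcounting
described in the commit message can be modelled in plain userspace C.
This is only a sketch under stated assumptions: every model_* name is
hypothetical and stands in for a kernel structure, and the module
reference and the controller_ops switch are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel types. */
#define MODEL_PHB_FLAG_CXL	(1u << 1)

struct model_afu {
	int refcount;
};

struct model_phb {
	unsigned int flags;
	struct model_afu *peer_afu;	/* set once the cxl function has probed */
};

/* Enabling is one-way: a request to disable is refused with -EPERM. */
int model_enable_kernel_api(struct model_phb *phb, bool enable)
{
	if (!enable)
		return -EPERM;

	phb->flags |= MODEL_PHB_FLAG_CXL;
	return 0;
}

/* Function 0 is the cxl function itself; peers need an AFU to attach to. */
bool model_enable_device(struct model_phb *phb, unsigned int func)
{
	if (func == 0)
		return true;

	if (!phb->peer_afu)
		return false;

	/* Pin the peer AFU for as long as this function is active. */
	phb->peer_afu->refcount++;
	return true;
}

void model_disable_device(struct model_phb *phb, unsigned int func)
{
	if (func == 0)
		return;

	phb->peer_afu->refcount--;
}
```

In this model a peer function can only be enabled after the peer AFU
has been recorded on the PHB, and each successful enable is balanced by
one disable.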

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 791db1b..c47097f 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -38,6 +38,13 @@ int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
 			       struct pci_dev *dev, int num);
 void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
 				  struct pci_dev *dev);
+
+/* Support for the cxl kernel api on the real PHB (instead of vPHB) */
+int pnv_cxl_enable_phb_kernel_api(struct pci_controller *hose, bool enable);
+bool pnv_pci_on_cxl_phb(struct pci_dev *dev);
+struct cxl_afu *pnv_cxl_phb_to_afu(struct pci_controller *hose);
+void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu);
+
 #endif
 
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
index ea8171f..2f386f5 100644
--- a/arch/powerpc/platforms/powernv/pci-cxl.c
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -7,8 +7,11 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/module.h>
+#include <asm/pci-bridge.h>
 #include <asm/pnv-pci.h>
 #include <asm/opal.h>
+#include <misc/cxl.h>
 
 #include "pci.h"
 
@@ -161,3 +164,112 @@ int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 	return 0;
 }
 EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
+
+/*
+ * Sets flags and switches the controller ops to enable the cxl kernel api.
+ * Originally the cxl kernel API operated on a virtual PHB, but certain cards
+ * such as the Mellanox CX4 use a peer model instead and for these cards the
+ * cxl kernel api will operate on the real PHB.
+ */
+int pnv_cxl_enable_phb_kernel_api(struct pci_controller *hose, bool enable)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct module *cxl_module;
+
+	if (!enable) {
+		/*
+		 * Once cxl mode is enabled on the PHB, there is currently no
+		 * known safe method to disable it again, and trying risks a
+		 * checkstop. If we can find a way to safely disable cxl mode
+		 * in the future we can revisit this, but for now the only sane
+		 * thing to do is to refuse to disable cxl mode:
+		 */
+		return -EPERM;
+	}
+
+	/*
+	 * Hold a reference to the cxl module since several PHB operations now
+	 * depend on it, and it would be insane to allow it to be removed so
+	 * long as we are in this mode (and since we can't safely disable this
+	 * mode once enabled...).
+	 */
+	mutex_lock(&module_mutex);
+	cxl_module = find_module("cxl");
+	if (cxl_module)
+		__module_get(cxl_module);
+	mutex_unlock(&module_mutex);
+	if (!cxl_module)
+		return -ENODEV;
+
+	phb->flags |= PNV_PHB_FLAG_CXL;
+	hose->controller_ops = pnv_cxl_cx4_ioda_controller_ops;
+
+	return 0;
+}
+EXPORT_SYMBOL(pnv_cxl_enable_phb_kernel_api);
+
+bool pnv_pci_on_cxl_phb(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return !!(phb->flags & PNV_PHB_FLAG_CXL);
+}
+EXPORT_SYMBOL(pnv_pci_on_cxl_phb);
+
+struct cxl_afu *pnv_cxl_phb_to_afu(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	return (struct cxl_afu *)phb->cxl_afu;
+}
+EXPORT_SYMBOL_GPL(pnv_cxl_phb_to_afu);
+
+void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	phb->cxl_afu = afu;
+}
+EXPORT_SYMBOL_GPL(pnv_cxl_phb_set_peer_afu);
+
+bool pnv_cxl_enable_device_hook(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct cxl_afu *afu = phb->cxl_afu;
+
+	if (!pnv_pci_enable_device_hook(dev))
+		return false;
+
+	/* No special handling for cxl function: */
+	if (PCI_FUNC(dev->devfn) == 0)
+		return true;
+
+	if (!afu) {
+		dev_WARN(&dev->dev, "Attempted to enable function > 0 on CXL PHB without a peer AFU\n");
+		return false;
+	}
+
+	dev_info(&dev->dev, "Enabling function on CXL enabled PHB with peer AFU\n");
+
+	/* Make sure the peer AFU can't go away while this device is active */
+	cxl_afu_get(afu);
+
+	return cxl_pci_associate_default_context(dev, afu);
+}
+
+void pnv_cxl_disable_device(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct cxl_afu *afu = phb->cxl_afu;
+
+	/* No special handling for cxl function: */
+	if (PCI_FUNC(dev->devfn) == 0)
+		return;
+
+	cxl_pci_disable_device(dev);
+	cxl_afu_put(afu);
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e0d8103..467085f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3222,7 +3222,7 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
-static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
+bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -3396,7 +3396,7 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
 	pnv_ioda_free_pe(pe);
 }
 
-static void pnv_pci_release_device(struct pci_dev *pdev)
+void pnv_pci_release_device(struct pci_dev *pdev)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -3423,7 +3423,7 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 		       OPAL_ASSERT_RESET);
 }
 
-static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
+const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_dma_bus_setup,
 #ifdef CONFIG_PCI_MSI
@@ -3461,6 +3461,22 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
+#ifdef CONFIG_CXL_BASE
+const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
+	.dma_dev_setup		= pnv_pci_dma_dev_setup,
+	.dma_bus_setup		= pnv_pci_dma_bus_setup,
+	.enable_device_hook	= pnv_cxl_enable_device_hook,
+	.disable_device		= pnv_cxl_disable_device,
+	.release_device		= pnv_pci_release_device,
+	.window_alignment	= pnv_pci_window_alignment,
+	.setup_bridge		= pnv_pci_setup_bridge,
+	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
+	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
+	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
+	.shutdown		= pnv_pci_ioda_shutdown,
+};
+#endif
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 49c2997..4d003dc 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -76,6 +76,7 @@ struct pnv_ioda_pe {
 };
 
 #define PNV_PHB_FLAG_EEH	(1 << 0)
+#define PNV_PHB_FLAG_CXL	(1 << 1) /* Real PHB supporting the cxl kernel API */
 
 struct pnv_phb {
 	struct pci_controller	*hose;
@@ -177,6 +178,9 @@ struct pnv_phb {
 		struct OpalIoP7IOCErrorData 	hub_diag;
 	} diag;
 
+#ifdef CONFIG_CXL_BASE
+	struct cxl_afu *cxl_afu;
+#endif
 };
 
 extern struct pci_ops pnv_pci_ops;
@@ -218,6 +222,8 @@ extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
 extern struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev);
 extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq);
+extern bool pnv_pci_enable_device_hook(struct pci_dev *dev);
+extern void pnv_pci_release_device(struct pci_dev *pdev);
 
 extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...);
@@ -238,4 +244,14 @@ extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
 extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
 extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
 
+
+/* cxl functions */
+extern bool pnv_cxl_enable_device_hook(struct pci_dev *dev);
+extern void pnv_cxl_disable_device(struct pci_dev *dev);
+
+
+/* phb ops (cxl switches these when enabling the kernel api on the phb) */
+extern const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops;
+extern const struct pci_controller_ops pnv_pci_ioda_controller_ops;
+
 #endif /* __POWERNV_PCI_H */
-- 
2.8.1


* [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (5 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 17:39   ` Frederic Barrat
  2016-07-06 18:30   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

This hooks up support for using the kernel API with a real PHB. After
the AFU initialisation has completed it calls into the PHB code to pass
it the AFU that will be used by other peer physical functions on the
adapter.

The cxl_pci_to_afu API is extended to work with peer PCI devices,
retrieving the peer AFU from the PHB. This API may now also return an
error if it is called on a PCI device that is neither on a cxl vPHB nor
a peer PCI device of an AFU, and callers propagate this error.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/api.c  |  5 +++++
 drivers/misc/cxl/pci.c  |  6 ++++++
 drivers/misc/cxl/vphb.c | 16 ++++++++++++++--
 3 files changed, 25 insertions(+), 2 deletions(-)
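
For illustration, the three-way lookup described in the commit message
can be sketched in userspace C. The model_* helpers are hypothetical
stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(), and the enum replaces
the phb->ops and PHB flag checks that the real code performs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace equivalents of the kernel's error pointers. */
#define MODEL_MAX_ERRNO	4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

enum model_phb_kind { MODEL_PHB_PLAIN, MODEL_PHB_VPHB, MODEL_PHB_CXL_PEER };

struct model_afu {
	int id;
};

struct model_phb {
	enum model_phb_kind kind;
	struct model_afu *private_afu;	/* vPHB: AFU kept as private data */
	struct model_afu *peer_afu;	/* real PHB in cxl mode: peer AFU */
};

struct model_afu *model_pci_to_afu(struct model_phb *phb)
{
	if (phb->kind == MODEL_PHB_VPHB)
		return phb->private_afu;

	if (phb->kind == MODEL_PHB_CXL_PEER)
		return phb->peer_afu;

	return model_err_ptr(-ENODEV);
}

/* Callers have to check the result before dereferencing it. */
int model_afu_id(struct model_phb *phb)
{
	struct model_afu *afu = model_pci_to_afu(phb);

	if (model_is_err(afu))
		return (int)model_ptr_err(afu);

	return afu->id;
}
```

Note that NULL is not an error pointer: in this model a PHB in cxl mode
whose peer AFU has not been set yet yields NULL rather than -ENODEV, so
a caller that can run that early would need a separate NULL check.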

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 7707055..6a030bf 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -13,6 +13,7 @@
 #include <linux/file.h>
 #include <misc/cxl.h>
 #include <linux/fs.h>
+#include <asm/pnv-pci.h>
 
 #include "cxl.h"
 
@@ -24,6 +25,8 @@ struct cxl_context *cxl_dev_context_init(struct pci_dev *dev)
 	int rc;
 
 	afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return ERR_CAST(afu);
 
 	ctx = cxl_context_alloc();
 	if (IS_ERR(ctx)) {
@@ -438,6 +441,8 @@ EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
 ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count)
 {
 	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
 
 	return cxl_ops->read_adapter_vpd(afu->adapter, buf, count);
 }
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 6c0597d..02242be 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1502,6 +1502,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			dev_err(&dev->dev, "AFU %i failed to start: %i\n", slice, rc);
 	}
 
+	if (pnv_pci_on_cxl_phb(dev) && adapter->slices >= 1)
+		pnv_cxl_phb_set_peer_afu(dev, adapter->afu[0]);
+
 	return 0;
 }
 
@@ -1572,6 +1575,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 		 */
 		for (i = 0; i < adapter->slices; i++) {
 			afu = adapter->afu[i];
+			/* Only participate in EEH if we are on a virtual PHB */
+			if (afu->phb == NULL)
+				return PCI_ERS_RESULT_NONE;
 			cxl_vphb_error_detected(afu, state);
 		}
 		return PCI_ERS_RESULT_DISCONNECT;
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index c5b9c201..08e8db7 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -9,6 +9,7 @@
 
 #include <linux/pci.h>
 #include <misc/cxl.h>
+#include <asm/pnv-pci.h>
 #include "cxl.h"
 
 static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
@@ -280,13 +281,18 @@ void cxl_pci_vphb_remove(struct cxl_afu *afu)
 	pcibios_free_controller(phb);
 }
 
+static bool _cxl_pci_is_vphb_device(struct pci_controller *phb)
+{
+	return (phb->ops == &cxl_pcie_pci_ops);
+}
+
 bool cxl_pci_is_vphb_device(struct pci_dev *dev)
 {
 	struct pci_controller *phb;
 
 	phb = pci_bus_to_host(dev->bus);
 
-	return (phb->ops == &cxl_pcie_pci_ops);
+	return _cxl_pci_is_vphb_device(phb);
 }
 
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev)
@@ -295,7 +301,13 @@ struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev)
 
 	phb = pci_bus_to_host(dev->bus);
 
-	return (struct cxl_afu *)phb->private_data;
+	if (_cxl_pci_is_vphb_device(phb))
+		return (struct cxl_afu *)phb->private_data;
+
+	if (pnv_pci_on_cxl_phb(dev))
+		return pnv_cxl_phb_to_afu(phb);
+
+	return ERR_PTR(-ENODEV);
 }
 EXPORT_SYMBOL_GPL(cxl_pci_to_afu);
 
-- 
2.8.1


* [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (6 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 18:11   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

These APIs will be used by the Mellanox CX4 support. While they function
standalone to configure existing behaviour, their primary purpose is to
allow the Mellanox driver to inform the cxl driver of a hardware
limitation, which will be used in a future patch.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/api.c | 27 +++++++++++++++++++++++++++
 include/misc/cxl.h     | 10 ++++++++++
 2 files changed, 37 insertions(+)
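
For illustration, the "can only be lowered" behaviour of the setter can
be modelled in userspace C. The model_* structures are hypothetical and
keep only the two fields the setter touches.

```c
#include <errno.h>

/* Hypothetical stand-ins holding only the fields the setter touches. */
struct model_adapter {
	int user_irqs;	/* ceiling that userspace may configure up to */
};

struct model_afu {
	struct model_adapter *adapter;
	int irqs_max;	/* current per-context limit */
};

int model_set_max_irqs(struct model_afu *afu, int irqs)
{
	/* The limit can be lowered but never raised past the ceiling. */
	if (irqs > afu->adapter->user_irqs)
		return -EINVAL;

	/* Lower the ceiling too, so the limit cannot be raised again later. */
	afu->adapter->user_irqs = irqs;
	afu->irqs_max = irqs;

	return 0;
}

int model_get_max_irqs(const struct model_afu *afu)
{
	return afu->irqs_max;
}
```

Because the ceiling is lowered together with the limit, a later attempt
to set a larger value fails in this model.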

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 6a030bf..1e2c0d9 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -447,3 +447,30 @@ ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count)
 	return cxl_ops->read_adapter_vpd(afu->adapter, buf, count);
 }
 EXPORT_SYMBOL_GPL(cxl_read_adapter_vpd);
+
+int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs)
+{
+	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
+
+	if (irqs > afu->adapter->user_irqs)
+		return -EINVAL;
+
+	/* Limit user_irqs to prevent the user increasing this via sysfs */
+	afu->adapter->user_irqs = irqs;
+	afu->irqs_max = irqs;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_set_max_irqs_per_process);
+
+int cxl_get_max_irqs_per_process(struct pci_dev *dev)
+{
+	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
+
+	return afu->irqs_max;
+}
+EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index dd9eebb..fc07ed4 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -166,6 +166,16 @@ void cxl_psa_unmap(void __iomem *addr);
 /*  Get the process element for this context */
 int cxl_process_element(struct cxl_context *ctx);
 
+/*
+ * Limit the number of interrupts that a single context can allocate via
+ * cxl_start_work. If using the api with a real phb, this may be used to
+ * request that additional default contexts be created when allocating
+ * interrupts via pci_enable_msix_range. These will be set to the same running
+ * state as the default context, and if that is running it will reuse the
+ * parameters previously passed to cxl_start_context for the default context.
+ */
+int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs);
+int cxl_get_max_irqs_per_process(struct pci_dev *dev);
 
 /*
  * These calls allow drivers to create their own file descriptors and make them
-- 
2.8.1


* [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (7 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 18:34   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 has a hardware limitation where only 4 bits of the
AFU interrupt number can be passed to the XSL when sending an interrupt,
limiting it to only 15 interrupts per context (AFU interrupt number 0 is
invalid).

In order to overcome this, we will allocate additional contexts linked
to the default context as extra address space for the extra interrupts -
this will be implemented in the next patch.

This patch adds the preliminary support to allow this, by way of adding
a linked list in the context structure that we use to keep track of the
contexts dedicated to interrupts, and an API to simultaneously iterate
over the related context structures, AFU interrupt numbers and hardware
interrupt numbers. The point of using a single API to iterate these is
to hide some of the details of the iteration from external code, and to
reduce the number of APIs that need to be exported via base.c to allow
built-in code to call them.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/api.c     | 15 +++++++++++++++
 drivers/misc/cxl/base.c    | 17 +++++++++++++++++
 drivers/misc/cxl/context.c |  1 +
 drivers/misc/cxl/cxl.h     | 10 ++++++++++
 drivers/misc/cxl/main.c    |  1 +
 include/misc/cxl.h         |  9 +++++++++
 6 files changed, 53 insertions(+)
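
For illustration, the iteration order described in the commit message
can be modelled in userspace C. This sketch makes two assumptions that
the patch does not: the contexts sit in an array instead of a linked
list, and each context's hardware interrupts are contiguous. All
model_* names are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical model: contexts live in an array, and each context's
 * hardware interrupt numbers are assumed to be contiguous.
 */
struct model_ctx {
	int first_hwirq;
};

/*
 * Start with *ctx_idx == -1 and *afu_irq == 0. Each call advances to
 * the next (context, AFU interrupt) pair and returns its hardware
 * interrupt number. AFU interrupt numbers start at 1 in every context.
 * The caller must not iterate past the last context in ctxs[].
 */
int model_next_msi_hwirq(const struct model_ctx *ctxs, int max_irqs,
			 int *ctx_idx, int *afu_irq)
{
	if (*ctx_idx < 0 || *afu_irq == 0) {
		*ctx_idx = 0;
		*afu_irq = 1;
	} else {
		(*afu_irq)++;
		if (*afu_irq > max_irqs) {
			/* This context is full, move to the next one. */
			(*ctx_idx)++;
			*afu_irq = 1;
		}
	}

	return ctxs[*ctx_idx].first_hwirq + (*afu_irq - 1);
}
```

With a limit of 3 interrupts per context and two contexts starting at
hardware interrupts 100 and 200, five calls visit 100, 101, 102, 200
and 201 in that order.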

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1e2c0d9..f02a859 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -97,6 +97,21 @@ static irq_hw_number_t cxl_find_afu_irq(struct cxl_context *ctx, int num)
 	return 0;
 }
 
+int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq)
+{
+	if (*ctx == NULL || *afu_irq == 0) {
+		*afu_irq = 1;
+		*ctx = cxl_get_context(pdev);
+	} else {
+		(*afu_irq)++;
+		if (*afu_irq > cxl_get_max_irqs_per_process(pdev)) {
+			*ctx = list_next_entry(*ctx, extra_irq_contexts);
+			*afu_irq = 1;
+		}
+	}
+	return cxl_find_afu_irq(*ctx, *afu_irq);
+}
+/* Exported via cxl_base */
 
 int cxl_set_priv(struct cxl_context *ctx, void *priv)
 {
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index af20b34..0f89ea9 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -141,6 +141,23 @@ void cxl_pci_disable_device(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(cxl_pci_disable_device);
 
+int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq)
+{
+	int ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return -EBUSY;
+
+	ret = calls->cxl_next_msi_hwirq(pdev, ctx, afu_irq);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_next_msi_hwirq);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index edbb99e..2616cddb 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -68,6 +68,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 	ctx->pending_afu_err = false;
 
 	INIT_LIST_HEAD(&ctx->irq_names);
+	INIT_LIST_HEAD(&ctx->extra_irq_contexts);
 
 	/*
 	 * When we have to destroy all contexts in cxl_context_detach_all() we
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index c94b54f..67464c9 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -525,6 +525,14 @@ struct cxl_context {
 	atomic_t afu_driver_events;
 
 	struct rcu_head rcu;
+
+	/*
+	 * Only used when more interrupts are allocated via
+	 * pci_enable_msix_range than are supported in the default context, to
+	 * use additional contexts to overcome the limitation. i.e. Mellanox
+	 * CX4 only:
+	 */
+	struct list_head extra_irq_contexts;
 };
 
 struct cxl_service_layer_ops {
@@ -710,11 +718,13 @@ ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 /* Internal functions wrapped in cxl_base to allow PHB to call them */
 bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void _cxl_pci_disable_device(struct pci_dev *dev);
+int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
 	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
 	void (*cxl_pci_disable_device)(struct pci_dev *dev);
+	int (*cxl_next_msi_hwirq)(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
 
 	struct module *owner;
 };
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 4e5474b..66fac71 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -112,6 +112,7 @@ static struct cxl_calls cxl_calls = {
 	.cxl_slbia = cxl_slbia_core,
 	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
 	.cxl_pci_disable_device = _cxl_pci_disable_device,
+	.cxl_next_msi_hwirq = _cxl_next_msi_hwirq,
 	.owner = THIS_MODULE,
 };
 
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index fc07ed4..ed81a17 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -178,6 +178,15 @@ int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs);
 int cxl_get_max_irqs_per_process(struct pci_dev *dev);
 
 /*
+ * Simultaneously iterates over the hardware interrupt numbers, contexts and
+ * afu interrupt numbers allocated for the device via pci_enable_msix_range.
+ * This is a useful convenience function when working with hardware that has
+ * limitations on the number of interrupts per process. *ctx and *afu_irq
+ * should be NULL and 0 to start the iteration.
+ */
+int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+
+/*
  * These calls allow drivers to create their own file descriptors and make them
  * identical to the cxl file descriptor user API. An example use case:
  *
-- 
2.8.1


* [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (8 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06 18:41   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
interrupts are routed from the networking hardware to the XSL using the
MSIX table, and from there will be transformed back into an MSIX
interrupt using the cxl style interrupts (i.e. using IVTE entries and
ranges to map a PE and AFU interrupt number to an MSIX address).

We want to hide the implementation details of cxl interrupts as much as
possible. To this end, we use a special version of the MSI setup &
teardown routines in the PHB while in cxl mode to allocate the cxl
interrupts and configure the IVTE entries in the process element.

This function does not configure the MSIX table - the CX4 card uses a
custom format in that table and it would not be appropriate to fill that
out in generic code. The rest of the functionality is similar to the
"Full MSI-X mode" described in the CAIA, and this could be easily
extended to support other adapters that use that mode in the future.

The interrupts will be associated with the default context. If the
maximum number of interrupts per context has been limited (e.g. by the
mlx5 driver), it will automatically allocate additional kernel contexts
to associate extra interrupts as required. These contexts will be
started using the same WED that was used to start the default context.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-cxl.c  | 84 +++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |  4 ++
 arch/powerpc/platforms/powernv/pci.h      |  2 +
 drivers/misc/cxl/api.c                    | 71 ++++++++++++++++++++++++++
 drivers/misc/cxl/base.c                   | 31 ++++++++++++
 drivers/misc/cxl/cxl.h                    |  4 ++
 drivers/misc/cxl/main.c                   |  2 +
 include/misc/cxl-base.h                   |  4 ++
 8 files changed, 202 insertions(+)
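
For illustration, the number of contexts needed for a given MSI-X
request can be worked out as below. This is a sketch that assumes each
context is filled up to the per-context limit before another one is
allocated, which matches the order in which cxl_next_msi_hwirq walks
them. The model_* names and the macro are hypothetical.

```c
/*
 * Hypothetical constant: 4 bits of AFU interrupt number with 0 being
 * invalid leaves 15 usable interrupts per context on the CX4.
 */
#define MODEL_CX4_IRQS_PER_CTX	((1 << 4) - 1)

/* Total contexts needed for nvec interrupts, default context included. */
int model_contexts_needed(int nvec, int max_per_ctx)
{
	if (nvec <= 0 || max_per_ctx <= 0)
		return 0;

	/* Round up: a partly filled context still has to exist. */
	return (nvec + max_per_ctx - 1) / max_per_ctx;
}

/* Interrupts that land in context ctx_idx (0 is the default context). */
int model_irqs_in_ctx(int nvec, int max_per_ctx, int ctx_idx)
{
	int remaining;

	if (max_per_ctx <= 0 || ctx_idx < 0)
		return 0;

	remaining = nvec - ctx_idx * max_per_ctx;
	if (remaining <= 0)
		return 0;

	return remaining < max_per_ctx ? remaining : max_per_ctx;
}
```

Under these assumptions a request for 64 vectors with a limit of 15 per
context needs 5 contexts: the default context plus 4 extra ones, the
last of which carries only 4 interrupts.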

diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
index 2f386f5..1559ca2 100644
--- a/arch/powerpc/platforms/powernv/pci-cxl.c
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/msi.h>
 #include <asm/pci-bridge.h>
 #include <asm/pnv-pci.h>
 #include <asm/opal.h>
@@ -273,3 +274,86 @@ void pnv_cxl_disable_device(struct pci_dev *dev)
 	cxl_pci_disable_device(dev);
 	cxl_afu_put(afu);
 }
+
+/*
+ * This is a special version of pnv_setup_msi_irqs for cards in cxl mode. This
+ * function handles setting up the IVTE entries for the XSL to use.
+ *
+ * We are currently not filling out the MSIX table, since the only currently
+ * supported adapter (CX4) uses a custom MSIX table format in cxl mode and it
+ * is up to their driver to fill that out. In the future we may fill out the
+ * MSIX table (and change the IVTE entries to be an index to the MSIX table)
+ * for adapters implementing the Full MSI-X mode described in the CAIA.
+ */
+int pnv_cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+	struct cxl_context *ctx = NULL;
+	unsigned int virq;
+	int hwirq;
+	int afu_irq = 0;
+	int rc;
+
+	if (WARN_ON(!phb) || !phb->msi_bmp.bitmap)
+		return -ENODEV;
+
+	if (pdev->no_64bit_msi && !phb->msi32_support)
+		return -ENODEV;
+
+	rc = cxl_cx4_setup_msi_irqs(pdev, nvec, type);
+	if (rc)
+		return rc;
+
+	for_each_pci_msi_entry(entry, pdev) {
+		if (!entry->msi_attrib.is_64 && !phb->msi32_support) {
+			pr_warn("%s: Supports only 64-bit MSIs\n",
+				pci_name(pdev));
+			return -ENXIO;
+		}
+
+		hwirq = cxl_next_msi_hwirq(pdev, &ctx, &afu_irq);
+		if (WARN_ON(hwirq < 0))
+			return hwirq;
+
+		virq = irq_create_mapping(NULL, hwirq);
+		if (virq == NO_IRQ) {
+			pr_warn("%s: Failed to map cxl mode MSI to linux irq\n",
+				pci_name(pdev));
+			return -ENOMEM;
+		}
+
+		rc = pnv_cxl_ioda_msi_setup(pdev, hwirq, virq);
+		if (rc) {
+			pr_warn("%s: Failed to setup cxl mode MSI\n", pci_name(pdev));
+			irq_dispose_mapping(virq);
+			return rc;
+		}
+
+		irq_set_msi_desc(virq, entry);
+	}
+
+	return 0;
+}
+
+void pnv_cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+	irq_hw_number_t hwirq;
+
+	if (WARN_ON(!phb))
+		return;
+
+	for_each_pci_msi_entry(entry, pdev) {
+		if (entry->irq == NO_IRQ)
+			continue;
+		hwirq = virq_to_hw(entry->irq);
+		irq_set_msi_desc(entry->irq, NULL);
+		irq_dispose_mapping(entry->irq);
+	}
+
+	cxl_cx4_teardown_msi_irqs(pdev);
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 467085f..c8f3b5c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3465,6 +3465,10 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_dma_bus_setup,
+#ifdef CONFIG_PCI_MSI
+	.setup_msi_irqs		= pnv_cxl_cx4_setup_msi_irqs,
+	.teardown_msi_irqs	= pnv_cxl_cx4_teardown_msi_irqs,
+#endif
 	.enable_device_hook	= pnv_cxl_enable_device_hook,
 	.disable_device		= pnv_cxl_disable_device,
 	.release_device		= pnv_pci_release_device,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 4d003dc..4799127 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -248,6 +248,8 @@ extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
 /* cxl functions */
 extern bool pnv_cxl_enable_device_hook(struct pci_dev *dev);
 extern void pnv_cxl_disable_device(struct pci_dev *dev);
+extern int pnv_cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+extern void pnv_cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 
 /* phb ops (cxl switches these when enabling the kernel api on the phb) */
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index f02a859..f3d34b9 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -14,6 +14,7 @@
 #include <misc/cxl.h>
 #include <linux/fs.h>
 #include <asm/pnv-pci.h>
+#include <linux/msi.h>
 
 #include "cxl.h"
 
@@ -489,3 +490,73 @@ int cxl_get_max_irqs_per_process(struct pci_dev *dev)
 	return afu->irqs_max;
 }
 EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
+
+/*
+ * This is a special interrupt allocation routine called from the PHB's MSI
+ * setup function. When capi interrupts are allocated in this manner they must
+ * still be associated with a running context, but since the MSI APIs have no
+ * way to specify this we use the default context associated with the device.
+ *
+ * The Mellanox CX4 has a hardware limitation that restricts the maximum AFU
+ * interrupt number, so in order to overcome this their driver informs us of
+ * the restriction by setting the maximum interrupts per context, and we
+ * allocate additional contexts as necessary so that we can keep the AFU
+ * interrupt number within the supported range.
+ */
+int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	struct cxl_context *ctx, *new_ctx, *default_ctx;
+	int remaining;
+	int rc;
+
+	ctx = default_ctx = cxl_get_context(pdev);
+	if (WARN_ON(!default_ctx))
+		return -ENODEV;
+
+	remaining = nvec;
+	while (remaining > 0) {
+		rc = cxl_allocate_afu_irqs(ctx, min(remaining, ctx->afu->irqs_max));
+		if (rc) {
+			pr_warn("%s: Failed to find enough free MSIs\n", pci_name(pdev));
+			return rc;
+		}
+		remaining -= ctx->afu->irqs_max;
+
+		if (ctx != default_ctx && default_ctx->status == STARTED) {
+			WARN_ON(cxl_start_context(ctx,
+				be64_to_cpu(default_ctx->elem->common.wed),
+				NULL));
+		}
+
+		if (remaining > 0) {
+			new_ctx = cxl_dev_context_init(pdev);
+			if (!new_ctx) {
+				pr_warn("%s: Failed to allocate enough contexts for MSIs\n", pci_name(pdev));
+				return -ENOSPC;
+			}
+			list_add(&new_ctx->extra_irq_contexts, &ctx->extra_irq_contexts);
+			ctx = new_ctx;
+		}
+	}
+
+	return 0;
+}
+/* Exported via cxl_base */
+
+void _cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct cxl_context *ctx, *pos, *tmp;
+
+	ctx = cxl_get_context(pdev);
+	if (WARN_ON(!ctx))
+		return;
+
+	cxl_free_afu_irqs(ctx);
+	list_for_each_entry_safe(pos, tmp, &ctx->extra_irq_contexts, extra_irq_contexts) {
+		cxl_stop_context(pos);
+		cxl_free_afu_irqs(pos);
+		list_del(&pos->extra_irq_contexts);
+		cxl_release_context(pos);
+	}
+}
+/* Exported via cxl_base */
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index 0f89ea9..5778a60 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -158,6 +158,37 @@ int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_
 }
 EXPORT_SYMBOL_GPL(cxl_next_msi_hwirq);
 
+int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	int ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return -ENODEV;
+
+	ret = calls->cxl_cx4_setup_msi_irqs(pdev, nvec, type);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_cx4_setup_msi_irqs);
+
+void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_cx4_teardown_msi_irqs(pdev);
+
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL_GPL(cxl_cx4_teardown_msi_irqs);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 67464c9..078b268 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -719,12 +719,16 @@ ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void _cxl_pci_disable_device(struct pci_dev *dev);
 int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+void _cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
 	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
 	void (*cxl_pci_disable_device)(struct pci_dev *dev);
 	int (*cxl_next_msi_hwirq)(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+	int (*cxl_cx4_setup_msi_irqs)(struct pci_dev *pdev, int nvec, int type);
+	void (*cxl_cx4_teardown_msi_irqs)(struct pci_dev *pdev);
 
 	struct module *owner;
 };
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 66fac71..d9be23b2 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -113,6 +113,8 @@ static struct cxl_calls cxl_calls = {
 	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
 	.cxl_pci_disable_device = _cxl_pci_disable_device,
 	.cxl_next_msi_hwirq = _cxl_next_msi_hwirq,
+	.cxl_cx4_setup_msi_irqs = _cxl_cx4_setup_msi_irqs,
+	.cxl_cx4_teardown_msi_irqs = _cxl_cx4_teardown_msi_irqs,
 	.owner = THIS_MODULE,
 };
 
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index bb7e629..b2ebc91 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -43,6 +43,8 @@ void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
 bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void cxl_pci_disable_device(struct pci_dev *dev);
+int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 #else /* CONFIG_CXL_BASE */
 
@@ -52,6 +54,8 @@ static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
 static inline bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu) { return false; }
 static inline void cxl_pci_disable_device(struct pci_dev *dev) {}
+static inline int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) { return -ENODEV; }
+static inline void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev) {}
 
 #endif /* CONFIG_CXL_BASE */
 
-- 
2.8.1
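
A note for readers of patch 10/14: the way _cxl_cx4_setup_msi_irqs() spreads
nvec interrupts over the default context plus extra contexts can be modelled
in userspace. This is only a sketch of the chunking arithmetic, not driver
code; split_irqs(), min_int(), per_ctx[] and max_ctx are hypothetical names
made up for illustration, and no real contexts or AFU interrupts are
allocated.

```c
#include <assert.h>

/* Hypothetical helper standing in for the kernel's min() macro. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Userspace model of the loop in _cxl_cx4_setup_msi_irqs(): hand out nvec
 * interrupts in chunks of at most irqs_max per context.
 *
 * per_ctx[i] receives the number of interrupts given to context i (index 0
 * plays the role of the default context). Returns the number of contexts
 * used, or -1 if more than max_ctx contexts would be needed.
 */
static int split_irqs(int nvec, int irqs_max, int *per_ctx, int max_ctx)
{
	int remaining = nvec;
	int nctx = 0;

	assert(irqs_max > 0);

	while (remaining > 0) {
		if (nctx == max_ctx)
			return -1;
		per_ctx[nctx++] = min_int(remaining, irqs_max);
		/* As in the patch, this may go negative on the last chunk. */
		remaining -= irqs_max;
	}

	return nctx;
}
```

With nvec = 10 and irqs_max = 4 the model uses three contexts holding 4, 4
and 2 interrupts, which is how the AFU interrupt number stays within the
range the card supports.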


* [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (9 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06  4:42   ` Andrew Donnellan
  2016-07-06 18:42   ` Frederic Barrat
  2016-07-04 13:22 ` [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The CX4 card cannot cope with a context with PE=0 due to a hardware
limitation, resulting in:

[   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
[   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting

Since the kernel API allocates a default context very early during
device init, and that context will almost certainly get Process Element
ID 0, there is no easy way for us to extend the API to allow the Mellanox
driver to inform us of this limitation ahead of time.

Instead, work around the issue by extending the XSL structure to include
a minimum PE to allocate. Although the bug is not in the XSL, it is the
easiest place to work around this limitation given that the CX4 is
currently the only card that uses an XSL.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 drivers/misc/cxl/context.c | 3 ++-
 drivers/misc/cxl/cxl.h     | 1 +
 drivers/misc/cxl/pci.c     | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 2616cddb..bdee9a0 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -90,7 +90,8 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 	 */
 	mutex_lock(&afu->contexts_lock);
 	idr_preload(GFP_KERNEL);
-	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
+	i = idr_alloc(&ctx->afu->contexts_idr, ctx,
+		      ctx->afu->adapter->native->sl_ops->min_pe,
 		      ctx->afu->num_procs, GFP_NOWAIT);
 	idr_preload_end();
 	mutex_unlock(&afu->contexts_lock);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 078b268..19b132f 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -549,6 +549,7 @@ struct cxl_service_layer_ops {
 	u64 (*timebase_read)(struct cxl *adapter);
 	int capi_mode;
 	bool needs_reset_before_disable;
+	int min_pe;
 };
 
 struct cxl_native {
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 02242be..090eee8 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1321,6 +1321,7 @@ static const struct cxl_service_layer_ops xsl_ops = {
 	.write_timebase_ctrl = write_timebase_ctrl_xsl,
 	.timebase_read = timebase_read_xsl,
 	.capi_mode = OPAL_PHB_CAPI_MODE_DMA,
+	.min_pe = 1, /* Workaround for Mellanox CX4 HW bug */
 };
 
 static void set_sl_ops(struct cxl *adapter, struct pci_dev *dev)
-- 
2.8.1
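
A note for readers of patch 11/14: the effect of passing min_pe as the lower
bound of the allocation range can be illustrated with a toy allocator. This
is a minimal userspace sketch, assuming "lowest free ID in [start, end)"
behaviour for the allocation; struct pe_table, pe_alloc() and NUM_PROCS are
hypothetical names and are not part of the cxl driver.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PROCS 8	/* hypothetical table size for this sketch */

/* Toy stand-in for the AFU's context table: in_use[i] marks PE i as taken. */
struct pe_table {
	bool in_use[NUM_PROCS];
};

/*
 * Allocate the lowest free PE in [min_pe, NUM_PROCS), mirroring the lower
 * and upper bounds the patch passes to idr_alloc(). Returns the PE number,
 * or -1 if the range is exhausted.
 */
static int pe_alloc(struct pe_table *t, int min_pe)
{
	assert(min_pe >= 0 && min_pe <= NUM_PROCS);

	for (int pe = min_pe; pe < NUM_PROCS; pe++) {
		if (!t->in_use[pe]) {
			t->in_use[pe] = true;
			return pe;
		}
	}

	return -1;
}
```

With min_pe = 0 the first context gets PE 0, which is the case the CX4 cannot
handle. With min_pe = 1 the first context gets PE 1 and PE 0 is never handed
out, at the cost of one unused process element.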


* [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (10 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-05  0:03   ` Gavin Shan
  2016-07-04 13:22 ` [PATCH 13/14] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
  2016-07-04 13:22 ` [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
  13 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan, linux-pci, Bjorn Helgaas

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

The cxl driver will use infrastructure from pnv_php to handle device tree
updates when switching bi-modal CAPI cards into CAPI mode.

To enable this, export pnv_php_find_slot() and
pnv_php_set_slot_power_state(), and add corresponding declarations, as well
as the definition of struct pnv_php_slot, to asm/pnv-pci.h.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-pci.h | 28 ++++++++++++++++++++++++++++
 drivers/pci/hotplug/Kconfig        |  1 +
 drivers/pci/hotplug/pnv_php.c      | 32 +++++---------------------------
 3 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index c47097f..0cbd813 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -11,6 +11,7 @@
 #define _ASM_PNV_PCI_H
 
 #include <linux/pci.h>
+#include <linux/pci_hotplug.h>
 #include <misc/cxl-base.h>
 #include <asm/opal-api.h>
 
@@ -47,4 +48,31 @@ void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu);
 
 #endif
 
+struct pnv_php_slot {
+	struct hotplug_slot		slot;
+	struct hotplug_slot_info	slot_info;
+	uint64_t			id;
+	char				*name;
+	int				slot_no;
+	struct kref			kref;
+#define PNV_PHP_STATE_INITIALIZED	0
+#define PNV_PHP_STATE_REGISTERED	1
+#define PNV_PHP_STATE_POPULATED		2
+#define PNV_PHP_STATE_OFFLINE		3
+	int				state;
+	struct device_node		*dn;
+	struct pci_dev			*pdev;
+	struct pci_bus			*bus;
+	bool				power_state_check;
+	void				*fdt;
+	void				*dt;
+	struct of_changeset		ocs;
+	struct pnv_php_slot		*parent;
+	struct list_head		children;
+	struct list_head		link;
+};
+extern struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn);
+extern int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
+					uint8_t state);
+
 #endif
diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index aadce45..b719a72 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -117,6 +117,7 @@ config HOTPLUG_PCI_POWERNV
 	tristate "PowerPC PowerNV PCI Hotplug driver"
 	depends on PPC_POWERNV && EEH
 	select OF_DYNAMIC
+	select HOTPLUG_PCI_POWERNV_BASE
 	help
 	  Say Y here if you run PowerPC PowerNV platform that supports
 	  PCI Hotplug
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 6086db6..2d2f704 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -22,30 +22,6 @@
 #define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
 #define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
 
-struct pnv_php_slot {
-	struct hotplug_slot		slot;
-	struct hotplug_slot_info	slot_info;
-	uint64_t			id;
-	char				*name;
-	int				slot_no;
-	struct kref			kref;
-#define PNV_PHP_STATE_INITIALIZED	0
-#define PNV_PHP_STATE_REGISTERED	1
-#define PNV_PHP_STATE_POPULATED		2
-#define PNV_PHP_STATE_OFFLINE		3
-	int				state;
-	struct device_node		*dn;
-	struct pci_dev			*pdev;
-	struct pci_bus			*bus;
-	bool				power_state_check;
-	void				*fdt;
-	void				*dt;
-	struct of_changeset		ocs;
-	struct pnv_php_slot		*parent;
-	struct list_head		children;
-	struct list_head		link;
-};
-
 static LIST_HEAD(pnv_php_slot_list);
 static DEFINE_SPINLOCK(pnv_php_lock);
 
@@ -91,7 +67,7 @@ static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
 	return NULL;
 }
 
-static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
+struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
 {
 	struct pnv_php_slot *php_slot, *tmp;
 	unsigned long flags;
@@ -108,6 +84,7 @@ static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(pnv_php_find_slot);
 
 /*
  * Remove pdn for all children of the indicated device node.
@@ -316,8 +293,8 @@ out:
 	return ret;
 }
 
-static int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
-					uint8_t state)
+int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
+				 uint8_t state)
 {
 	struct pnv_php_slot *php_slot = slot->private;
 	struct opal_msg msg;
@@ -347,6 +324,7 @@ static int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state);
 
 static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
 {
-- 
2.8.1



* [PATCH 13/14] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (11 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-04 13:22 ` [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
  13 siblings, 0 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan, linux-pci, Bjorn Helgaas

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

When calling pnv_php_set_slot_power_state() with state ==
OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're
dealing with OPAL_PCI_SLOT_POWER_OFF.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 2d2f704..e6245b0 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -317,7 +317,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 		return ret;
 	}
 
-	if (state == OPAL_PCI_SLOT_POWER_OFF)
+	if (state == OPAL_PCI_SLOT_POWER_OFF || state == OPAL_PCI_SLOT_OFFLINE)
 		pnv_php_rmv_devtree(php_slot);
 	else
 		ret = pnv_php_add_devtree(php_slot);
-- 
2.8.1



* [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (12 preceding siblings ...)
  2016-07-04 13:22 ` [PATCH 13/14] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
@ 2016-07-04 13:22 ` Ian Munsie
  2016-07-06  3:55   ` Andrew Donnellan
  2016-07-06 18:51   ` Frederic Barrat
  13 siblings, 2 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-04 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Add a new API, cxl_check_and_switch_mode(), to allow for switching of
bi-modal CAPI cards, such as the Mellanox CX-4 network card.

When a driver requests to switch a card to CAPI mode, use PCI hotplug
infrastructure to remove all PCI devices underneath the slot. We then write
an updated mode control register to the CAPI VSEC, hot reset the card, and
reprobe the card.

As the card may present a different set of PCI devices after the mode
switch, use the infrastructure provided by the pnv_php driver and the OPAL
PCI slot management facilities to ensure that:

  * the old devices are removed from both the OPAL and Linux device trees
  * the new devices are probed by OPAL and added to the OPAL device tree
  * the new devices are added to the Linux device tree and probed through
    the regular PCI device probe path

As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on
the pnv_php driver.

Refactor existing code that touches the mode control register in the
regular single mode case into a new function, setup_cxl_protocol_area().
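
The mode control register update described above can be sketched in
userspace as follows. This is an illustration only: CXL_VSEC_PROTOCOL_MASK
is taken from the pci.c context, while the values used here for
CXL_VSEC_PROTOCOL_256TB and CXL_VSEC_PROTOCOL_ENABLE are assumptions (they
are not visible in this patch), and mode_control_for_cxl() is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Visible in the pci.c hunk context. */
#define CXL_VSEC_PROTOCOL_MASK   0xe0
/* Assumed values for this sketch; not shown in the patch itself. */
#define CXL_VSEC_PROTOCOL_256TB  0x20
#define CXL_VSEC_PROTOCOL_ENABLE 0x01

/*
 * Compute the mode control byte to write when switching to cxl mode:
 * clear the protocol area field, then select the 256TB protocol area and
 * set the enable bit. All other bits of the current value are preserved.
 */
static uint8_t mode_control_for_cxl(uint8_t val)
{
	val &= (uint8_t)~CXL_VSEC_PROTOCOL_MASK;
	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
	return val;
}
```

The read-modify-write shape matters because the byte also carries bits
outside the protocol field, which the switch must not disturb.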

Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/misc/cxl/Kconfig |   8 ++
 drivers/misc/cxl/pci.c   | 234 +++++++++++++++++++++++++++++++++++++++++++----
 include/misc/cxl.h       |  25 +++++
 3 files changed, 249 insertions(+), 18 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 560412c..6859723 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -38,3 +38,11 @@ config CXL
 	  CAPI adapters are found in POWER8 based systems.
 
 	  If unsure, say N.
+
+config CXL_BIMODAL
+	bool "Support for bi-modal CAPI cards"
+	depends on HOTPLUG_PCI_POWERNV = y && CXL || HOTPLUG_PCI_POWERNV = m && CXL = m
+	default y
+	help
+	  Select this option to enable support for bi-modal CAPI cards, such as
+	  the Mellanox CX-4.
\ No newline at end of file
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 090eee8..63abd26 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -55,6 +55,8 @@
 	pci_read_config_byte(dev, vsec + 0xa, dest)
 #define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
 	pci_write_config_byte(dev, vsec + 0xa, val)
+#define CXL_WRITE_VSEC_MODE_CONTROL_BUS(bus, devfn, vsec, val) \
+	pci_bus_write_config_byte(bus, devfn, vsec + 0xa, val)
 #define CXL_VSEC_PROTOCOL_MASK   0xe0
 #define CXL_VSEC_PROTOCOL_1024TB 0x80
 #define CXL_VSEC_PROTOCOL_512TB  0x40
@@ -614,36 +616,232 @@ static int setup_cxl_bars(struct pci_dev *dev)
 	return 0;
 }
 
-/* pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>; */
-static int switch_card_to_cxl(struct pci_dev *dev)
-{
+#ifdef CONFIG_CXL_BIMODAL
+
+struct cxl_switch_work {
+	struct pci_dev *dev;
+	struct work_struct work;
 	int vsec;
+	int mode;
+};
+
+static void switch_card_to_cxl(struct work_struct *work)
+{
+	struct cxl_switch_work *switch_work =
+		container_of(work, struct cxl_switch_work, work);
+	struct pci_dev *dev = switch_work->dev;
+	struct pci_bus *bus = dev->bus;
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pci_dev *bridge;
+	struct pnv_php_slot *php_slot;
+	unsigned int devfn;
 	u8 val;
 	int rc;
 
-	dev_info(&dev->dev, "switch card to CXL\n");
+	dev_info(&bus->dev, "cxl: Preparing for mode switch...\n");
+	bridge = list_first_entry_or_null(&hose->bus->devices, struct pci_dev,
+					  bus_list);
+	if (!bridge) {
+		dev_WARN(&bus->dev, "cxl: Couldn't find root port!\n");
+		goto err_free_work;
+	}
 
-	if (!(vsec = find_cxl_vsec(dev))) {
-		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
+	php_slot = pnv_php_find_slot(pci_device_to_OF_node(bridge));
+	if (!php_slot) {
+		dev_err(&bus->dev, "cxl: Failed to find slot hotplug "
+			           "information. You may need to upgrade "
+			           "skiboot. Aborting.\n");
+		pci_dev_put(dev);
+		goto err_free_work;
+	}
+
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, switch_work->vsec, &val);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: Failed to read CAPI mode control: %i\n", rc);
+		pci_dev_put(dev);
+		goto err_free_work;
+	}
+	devfn = dev->devfn;
+	pci_dev_put(dev);
+
+	dev_dbg(&bus->dev, "cxl: Removing PCI devices from kernel\n");
+	pci_lock_rescan_remove();
+	pci_hp_remove_devices(bridge->subordinate);
+	pci_unlock_rescan_remove();
+
+	/* Switch the CXL protocol on the card */
+	if (switch_work->mode == CXL_BIMODE_CXL) {
+		dev_info(&bus->dev, "cxl: Switching card to CXL mode\n");
+		val &= ~CXL_VSEC_PROTOCOL_MASK;
+		val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
+		rc = pnv_cxl_enable_phb_kernel_api(hose, true);
+		if (rc) {
+			dev_err(&bus->dev, "cxl: Failed to enable kernel API"
+				           " on real PHB, aborting\n");
+			goto err_free_work;
+		}
+	} else {
+		dev_WARN(&bus->dev, "cxl: Switching card to PCI mode not supported!\n");
+		goto err_free_work;
+	}
+
+	rc = CXL_WRITE_VSEC_MODE_CONTROL_BUS(bus, devfn, switch_work->vsec, val);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: Failed to configure CXL protocol: %i\n", rc);
+		goto err_free_work;
+	}
+
+	/*
+	 * The CAIA spec (v1.1, Section 10.6 Bi-modal Device Support) states
+	 * we must wait 100ms after this mode switch before touching PCIe config
+	 * space.
+	 */
+	msleep(100);
+
+	/*
+	 * Hot reset to cause the card to come back in cxl mode. A
+	 * OPAL_RESET_PCI_LINK would be sufficient, but currently lacks support
+	 * in skiboot, so we use a hot reset instead.
+	 *
+	 * We call pci_set_pcie_reset_state() on the bridge, as a CAPI card is
+	 * guaranteed to sit directly under the root port, and setting the reset
+	 * state on a device directly under the root port is equivalent to doing
+	 * it on the root port itself.
+	 */
+	dev_info(&bus->dev, "cxl: Configuration write complete, resetting card\n");
+	pci_set_pcie_reset_state(bridge, pcie_hot_reset);
+	pci_set_pcie_reset_state(bridge, pcie_deassert_reset);
+
+	dev_dbg(&bus->dev, "cxl: Offlining slot\n");
+	rc = pnv_php_set_slot_power_state(&php_slot->slot, OPAL_PCI_SLOT_OFFLINE);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: OPAL offlining call failed: %i\n", rc);
+		goto err_free_work;
+	}
+
+	dev_dbg(&bus->dev, "cxl: Onlining and probing slot\n");
+	rc = pnv_php_set_slot_power_state(&php_slot->slot, OPAL_PCI_SLOT_ONLINE);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: OPAL onlining call failed: %i\n", rc);
+		goto err_free_work;
+	}
+
+	pci_lock_rescan_remove();
+	pci_hp_add_devices(bridge->subordinate);
+	pci_unlock_rescan_remove();
+
+	dev_info(&bus->dev, "cxl: CAPI mode switch completed\n");
+	kfree(switch_work);
+	return;
+
+err_free_work:
+	kfree(switch_work);
+}
+
+int cxl_check_and_switch_mode(struct pci_dev *dev, int mode, int vsec)
+{
+	struct cxl_switch_work *work;
+	u8 val;
+	int rc;
+
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
 		return -ENODEV;
+
+	if (!vsec) {
+		vsec = find_cxl_vsec(dev);
+		if (!vsec) {
+			dev_info(&dev->dev, "CXL VSEC not found\n");
+			return -ENODEV;
+		}
 	}
 
-	if ((rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val))) {
-		dev_err(&dev->dev, "failed to read current mode control: %i", rc);
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val);
+	if (rc) {
+		dev_err(&dev->dev, "Failed to read current mode control: %i\n", rc);
 		return rc;
 	}
-	val &= ~CXL_VSEC_PROTOCOL_MASK;
-	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
-	if ((rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val))) {
-		dev_err(&dev->dev, "failed to enable CXL protocol: %i", rc);
-		return rc;
+
+	if (mode == CXL_BIMODE_PCI) {
+		if (!(val & CXL_VSEC_PROTOCOL_ENABLE)) {
+			dev_info(&dev->dev, "Card is already in PCI mode\n");
+			return 0;
+		}
+		/*
+		 * TODO: Before it's safe to switch the card back to PCI mode
+		 * we need to disable the CAPP and make sure any cachelines the
+		 * card holds have been flushed out. Needs skiboot support.
+		 */
+		dev_WARN(&dev->dev, "CXL mode switch to PCI unsupported!\n");
+		return -EIO;
 	}
+
+	if (val & CXL_VSEC_PROTOCOL_ENABLE) {
+		dev_info(&dev->dev, "Card is already in CXL mode\n");
+		return 0;
+	}
+
+	dev_info(&dev->dev, "Card is in PCI mode, scheduling kernel thread "
+			    "to switch to CXL mode\n");
+
+	work = kmalloc(sizeof(struct cxl_switch_work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	pci_dev_get(dev);
+	work->dev = dev;
+	work->vsec = vsec;
+	work->mode = mode;
+	INIT_WORK(&work->work, switch_card_to_cxl);
+
+	schedule_work(&work->work);
+
 	/*
-	 * The CAIA spec (v0.12 11.6 Bi-modal Device Support) states
-	 * we must wait 100ms after this mode switch before touching
-	 * PCIe config space.
+	 * We return a failure now to abort the driver init. Once the
+	 * link has been cycled and the card is in cxl mode we will
+	 * come back (possibly using the generic cxl driver), but
+	 * return success as the card should then be in cxl mode.
+	 *
+	 * TODO: What if the card comes back in PCI mode even after
+	 *       the switch?  Don't want to spin endlessly.
 	 */
-	msleep(100);
+	return -EBUSY;
+}
+EXPORT_SYMBOL_GPL(cxl_check_and_switch_mode);
+
+#endif /* CONFIG_CXL_BIMODAL */
+
+static int setup_cxl_protocol_area(struct pci_dev *dev)
+{
+	u8 val;
+	int rc;
+	int vsec = find_cxl_vsec(dev);
+
+	if (!vsec) {
+		dev_info(&dev->dev, "CXL VSEC not found\n");
+		return -ENODEV;
+	}
+
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val);
+	if (rc) {
+		dev_err(&dev->dev, "Failed to read current mode control: %i\n", rc);
+		return rc;
+	}
+
+	if (!(val & CXL_VSEC_PROTOCOL_ENABLE)) {
+		dev_err(&dev->dev, "Card not in CAPI mode!\n");
+		return -EIO;
+	}
+
+	/* Still configure the protocol area for single mode cards */
+	if ((val & CXL_VSEC_PROTOCOL_MASK) != CXL_VSEC_PROTOCOL_256TB) {
+		val &= ~CXL_VSEC_PROTOCOL_MASK;
+		val |= CXL_VSEC_PROTOCOL_256TB;
+		rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val);
+		if (rc) {
+			dev_err(&dev->dev, "Failed to set CXL protocol area: %i\n", rc);
+			return rc;
+		}
+	}
 
 	return 0;
 }
@@ -1249,7 +1447,7 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = setup_cxl_bars(dev)))
 		return rc;
 
-	if ((rc = switch_card_to_cxl(dev)))
+	if ((rc = setup_cxl_protocol_area(dev)))
 		return rc;
 
 	if ((rc = cxl_update_image_control(adapter)))
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index ed81a17..e5e17ed 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -39,6 +39,31 @@
 bool cxl_slot_is_supported(struct pci_dev *dev, int flags);
 
 
+#define CXL_BIMODE_CXL 1
+#define CXL_BIMODE_PCI 2
+
+/*
+ * Check the mode that the given bi-modal CXL adapter is currently in and
+ * change it if necessary. This does not apply to AFU drivers.
+ *
+ * If the mode matches the requested mode this function will return 0 - if the
+ * driver was expecting the generic CXL driver to have bound to the adapter and
+ * it gets this return value it should fail the probe function to give the CXL
+ * driver a chance to probe it.
+ *
+ * If the mode does not match it will start a background task to unplug the
+ * device from Linux and switch its mode, and will return -EBUSY. At this
+ * point the calling driver should make sure it has released the device and
+ * fail its probe function.
+ *
+ * The offset of the CXL VSEC can be provided to this function. If 0 is passed,
+ * this function will search for a CXL VSEC with ID 0x1280 and return -ENODEV
+ * if it is not found.
+ */
+#ifdef CONFIG_CXL_BIMODAL
+int cxl_check_and_switch_mode(struct pci_dev *dev, int mode, int vsec);
+#endif
+
 /* Get the AFU associated with a pci_dev */
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev);
 
-- 
2.8.1
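
A note for readers of patch 14/14: the return-value contract of
cxl_check_and_switch_mode(), as documented in include/misc/cxl.h, can be
condensed into a small decision function. This is a userspace sketch under
stated assumptions: check_mode() is a hypothetical name, the value of
CXL_VSEC_PROTOCOL_ENABLE is assumed, and the real function additionally
reads the VSEC and schedules the switch work before returning -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CXL_BIMODE_CXL 1
#define CXL_BIMODE_PCI 2
/* Assumed value for this sketch; not shown in the patch itself. */
#define CXL_VSEC_PROTOCOL_ENABLE 0x01

/*
 * Model of the decision made from the current mode control byte:
 *   0      - card is already in the requested mode
 *   -EIO   - switching back to PCI mode is not supported yet
 *   -EBUSY - card is in PCI mode and a switch to cxl mode would be
 *            scheduled; the caller should release the device and fail
 *            its probe
 * Unlike the patch, this sketch rejects mode values other than the two
 * defined constants.
 */
static int check_mode(uint8_t val, int mode)
{
	int in_cxl = !!(val & CXL_VSEC_PROTOCOL_ENABLE);

	assert(mode == CXL_BIMODE_CXL || mode == CXL_BIMODE_PCI);

	if (mode == CXL_BIMODE_PCI)
		return in_cxl ? -EIO : 0;

	return in_cxl ? 0 : -EBUSY;
}
```

A calling driver would treat -EBUSY as "fail this probe and wait to be
re-probed once the card reappears in cxl mode", rather than as a fatal
error.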


* Re: [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
  2016-07-04 13:22 ` [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
@ 2016-07-05  0:03   ` Gavin Shan
  2016-07-05  1:08     ` Andrew Donnellan
  0 siblings, 1 reply; 46+ messages in thread
From: Gavin Shan @ 2016-07-05  0:03 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen, Gavin Shan,
	linux-pci, Bjorn Helgaas

On Mon, Jul 04, 2016 at 11:22:10PM +1000, Ian Munsie wrote:
>From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
>
>The cxl driver will use infrastructure from pnv_php to handle device tree
>updates when switching bi-modal CAPI cards into CAPI mode.
>
>To enable this, export pnv_php_find_slot() and
>pnv_php_set_slot_power_state(), and add corresponding declarations, as well
>as the definition of struct pnv_php_slot, to asm/pnv-pci.h.
>
>Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
>Cc: linux-pci@vger.kernel.org
>Cc: Bjorn Helgaas <bhelgaas@google.com>
>Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
>Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>---

.../...

>diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
>index aadce45..b719a72 100644
>--- a/drivers/pci/hotplug/Kconfig
>+++ b/drivers/pci/hotplug/Kconfig
>@@ -117,6 +117,7 @@ config HOTPLUG_PCI_POWERNV
> 	tristate "PowerPC PowerNV PCI Hotplug driver"
> 	depends on PPC_POWERNV && EEH
> 	select OF_DYNAMIC
>+	select HOTPLUG_PCI_POWERNV_BASE
> 	help
> 	  Say Y here if you run PowerPC PowerNV platform that supports
> 	  PCI Hotplug

Andrew/Ian, it seems HOTPLUG_PCI_POWERNV_BASE isn't defined and we don't need it.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
  2016-07-05  0:03   ` Gavin Shan
@ 2016-07-05  1:08     ` Andrew Donnellan
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-05  1:08 UTC (permalink / raw)
  To: Gavin Shan, Ian Munsie
  Cc: Michael Neuling, Frederic Barrat, linux-pci, Bjorn Helgaas,
	Huy Nguyen, linuxppc-dev

On 05/07/16 10:03, Gavin Shan wrote:
> Andrew/Ian, it seems HOTPLUG_PCI_POWERNV_BASE isn't defined and we don't need it.

Argh, thanks for picking that up! I removed that option and all its 
occurrences in the code based on your earlier private feedback but 
forgot to drop this.

Will fix in V2.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base
  2016-07-04 13:22 ` [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
@ 2016-07-05  2:10   ` Andrew Donnellan
  2016-07-06 16:45   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-05  2:10 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 04/07/16 23:22, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The Mellanox CX4 uses a model where the AFU is one physical function of
> the device, and is used by other peer physical functions of the same
> device. This will require those other devices to grab a reference on the
> AFU when they are initialised to make sure that it does not go away
> during their lifetime.
>
> Move the AFU refcount functions to base.c so they can be called from
> the PHB code.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/14] cxl: Add cxl_slot_is_supported API
  2016-07-04 13:22 ` [PATCH 02/14] cxl: Add cxl_slot_is_supported API Ian Munsie
@ 2016-07-06  2:02   ` Andrew Donnellan
  2016-07-06 16:36   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-06  2:02 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen
  Cc: Philippe Bergheaud

On 04/07/16 23:22, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This extends the check that the adapter is in a CAPI capable slot so
> that it may be called by external users in the kernel API. This will be
> used by the upcoming Mellanox CX4 support, which needs to know ahead of
> time if the card can be switched to cxl mode so that it can leave it in
> PCI mode if it is not.
>
> This API takes a parameter to check if CAPP DMA mode is supported, which
> it currently only allows on P8NVL systems, since that mode currently has
> issues accessing memory < 4GB on P8, and we cannot realistically avoid
> that.
>
> This API does not currently check if a CAPP unit is available (i.e. not
> already assigned to another PHB) on P8. Doing so would be racy since it
> is assigned on a first come first serve basis, and so long as CAPP DMA
> mode is not supported on P8 we don't need this, since the only
> anticipated user of this API requires CAPP DMA mode.
>
> Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>  drivers/misc/cxl/pci.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/misc/cxl.h     | 15 +++++++++++++++
>  2 files changed, 52 insertions(+)
>
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 3a5f980..9530280 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1426,6 +1426,43 @@ static int cxl_slot_is_switched(struct pci_dev *dev)
>  	return (depth > CXL_MAX_PCIEX_PARENT);
>  }
>
> +bool cxl_slot_is_supported(struct pci_dev *dev, int flags)
> +{
> +	if (!cpu_has_feature(CPU_FTR_HVMODE))
> +		return false;
> +
> +	if ((flags & CXL_SLOT_FLAG_DMA) && (!pvr_version_is(PVR_POWER8NVL))) {
> +		/*
> +		 * CAPP DMA mode is technically supported on regular P8, but
> +		 * will EEH if the card attempts to acccess memory < 4GB, which

access

> +		 * we cannot realistically avoid. We might be able to work
> +		 * around the issue, but until then return unsupported:
> +		 */
> +		return false;
> +	}
> +
> +	if (cxl_slot_is_switched(dev))
> +		return false;
> +
> +	/*
> +	 * XXX: This gets a little tricky on regular P8 (not POWER8NVL) since
> +	 * the CAPP can be connected to PHB 0, 1 or 2 on a first come first
> +	 * served basis, which is racy to check from here. If we need to
> +	 * support this in future we might need to consider having this
> +	 * function effectively reserve it ahead of time.
> +	 *
> +	 * Currently, the only user of this API is the Mellanox CX4, which is
> +	 * only supported on P8NVL due to the above mentioned limitation of
> +	 * CAPP DMA mode and therefore does not need to worry about thi. If the

this

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited
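
For readers following the logic, the three checks visible in the quoted hunk
can be modelled in user space roughly as below. This is only a sketch: the
slot_facts structure and slot_is_supported_sketch() are hypothetical, the
booleans stand in for the kernel helpers named in the comments, and the
value of CXL_SLOT_FLAG_DMA is assumed. The quoted function continues past
the end of the hunk, so anything after the switched-slot check is not
represented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CXL_SLOT_FLAG_DMA 0x1   /* value assumed for this sketch */

/* Hypothetical snapshot of the facts the quoted checks depend on. */
struct slot_facts {
	bool hv_mode;        /* stands in for cpu_has_feature(CPU_FTR_HVMODE) */
	bool is_p8nvl;       /* stands in for pvr_version_is(PVR_POWER8NVL) */
	bool behind_switch;  /* stands in for cxl_slot_is_switched(dev) */
};

static bool slot_is_supported_sketch(const struct slot_facts *f, int flags)
{
	assert(f != NULL);

	/* cxl needs hypervisor mode */
	if (!f->hv_mode)
		return false;

	/* CAPP DMA mode is only allowed on P8NVL */
	if ((flags & CXL_SLOT_FLAG_DMA) && !f->is_p8nvl)
		return false;

	/* cards behind a PCIe switch are rejected */
	if (f->behind_switch)
		return false;

	return true;
}
```

Under these assumptions a regular P8 slot is reported as supported without
the DMA flag and unsupported with it, which matches the commit message.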

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file
  2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
@ 2016-07-06  3:44   ` Andrew Donnellan
  2016-07-06 16:27   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-06  3:44 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 04/07/16 23:21, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The support for using the Mellanox CX4 in cxl mode will require
> additions to the PHB code. In preparation for this, move the existing
> cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
> more organised.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> +++ b/arch/powerpc/platforms/powernv/pci-cxl.c
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright 2015 IBM Corp.

If you end up spinning a V2 of this, could probably put "2015, 2016" on 
there.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-04 13:22 ` [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
@ 2016-07-06  3:55   ` Andrew Donnellan
  2016-07-06 18:51   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-06  3:55 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan

On 04/07/16 23:22, Ian Munsie wrote:
> +static int setup_cxl_protocol_area(struct pci_dev *dev)
> +{
> +	u8 val;
> +	int rc;
> +	int vsec = find_cxl_vsec(dev);
> +
> +	if (!vsec) {
> +		dev_info(&dev->dev, "CXL VSEC not found\n");
> +		return -ENODEV;
> +	}
> +
> +	rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val);
> +	if (rc) {
> +		dev_err(&dev->dev, "Failed to read current mode control: %i\n", rc);
> +		return rc;
> +	}
> +
> +	if (!(val & CXL_VSEC_PROTOCOL_ENABLE)) {
> +		dev_err(&dev->dev, "Card not in CAPI mode!\n");
> +		return -EIO;
> +	}
> +
> +	/* Still configure the protocol area for single mode cards */

This comment is extraneous and will be dropped in V2.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode
  2016-07-04 13:22 ` [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
@ 2016-07-06  4:04   ` Andrew Donnellan
  2016-07-06 16:37   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-06  4:04 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 04/07/16 23:22, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
> master to be enabled in order for the CAPI traffic to flow. This should
> be harmless to enable for other cxl devices, so unconditionally enable
> it in the adapter init flow.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-04 13:22 ` [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
@ 2016-07-06  4:42   ` Andrew Donnellan
  2016-07-06 18:42   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-06  4:42 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 04/07/16 23:22, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The CX4 card cannot cope with a context with PE=0 due to a hardware
> limitation, resulting in:
>
> [   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
> [   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting
>
> Since the kernel API allocates a default context very early during
> device init that will almost certainly get Process Element ID 0 there is
> no easy way for us to extend the API to allow the Mellanox to inform us
> of this limitation ahead of time.
>
> Instead, work around the issue by extending the XSL structure to include
> a minimum PE to allocate. Although the bug is not in the XSL, it is the
> easiest place to work around this limitation given that the CX4 is
> currently the only card that uses an XSL.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited
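
The idea of a per-adapter minimum PE can be illustrated with a small
user-space allocator. This is a sketch under stated assumptions, not the
code from the patch: pe_table, pe_alloc() and MAX_PES are hypothetical
names, and the linear scan is chosen only for brevity. The point is that
starting the search at min_pe means PE 0 is never handed out when min_pe
is 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PES 8   /* tiny table, for illustration only */

struct pe_table {
	bool used[MAX_PES];
	int min_pe;         /* lowest PE this adapter may hand out */
};

/* Return the lowest free PE that is >= min_pe, or -1 if none is left. */
static int pe_alloc(struct pe_table *t)
{
	assert(t != NULL && t->min_pe >= 0);

	for (int pe = t->min_pe; pe < MAX_PES; pe++) {
		if (!t->used[pe]) {
			t->used[pe] = true;
			return pe;
		}
	}
	return -1;
}
```

An adapter with min_pe = 0 gets PE 0 for its first context, as before; one
with min_pe = 1 gets PE 1 and leaves PE 0 unused.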

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file
  2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
  2016-07-06  3:44   ` Andrew Donnellan
@ 2016-07-06 16:27   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 16:27 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:21, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The support for using the Mellanox CX4 in cxl mode will require
> additions to the PHB code. In preparation for this, move the existing
> cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
> more organised.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>


Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/14] cxl: Add cxl_slot_is_supported API
  2016-07-04 13:22 ` [PATCH 02/14] cxl: Add cxl_slot_is_supported API Ian Munsie
  2016-07-06  2:02   ` Andrew Donnellan
@ 2016-07-06 16:36   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 16:36 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Philippe Bergheaud



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This extends the check that the adapter is in a CAPI capable slot so
> that it may be called by external users in the kernel API. This will be
> used by the upcoming Mellanox CX4 support, which needs to know ahead of
> time if the card can be switched to cxl mode so that it can leave it in
> PCI mode if it is not.
>
> This API takes a parameter to check if CAPP DMA mode is supported, which
> it currently only allows on P8NVL systems, since that mode currently has
> issues accessing memory < 4GB on P8, and we cannot realistically avoid
> that.
>
> This API does not currently check if a CAPP unit is available (i.e. not
> already assigned to another PHB) on P8. Doing so would be racy since it
> is assigned on a first come first serve basis, and so long as CAPP DMA
> mode is not supported on P8 we don't need this, since the only
> anticipated user of this API requires CAPP DMA mode.

Is it just me, or is that last sentence more complicated than it should be? :-)
Anyway, I get it.

And the rest looks ok.

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode
  2016-07-04 13:22 ` [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
  2016-07-06  4:04   ` Andrew Donnellan
@ 2016-07-06 16:37   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 16:37 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen


Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
> master to be enabled in order for the CAPI traffic to flow. This should
> be harmless to enable for other cxl devices, so unconditionally enable
> it in the adapter init flow.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>


Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base
  2016-07-04 13:22 ` [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
  2016-07-05  2:10   ` Andrew Donnellan
@ 2016-07-06 16:45   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 16:45 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The Mellanox CX4 uses a model where the AFU is one physical function of
> the device, and is used by other peer physical functions of the same
> device. This will require those other devices to grab a reference on the
> AFU when they are initialised to make sure that it does not go away
> during their lifetime.
>
> Move the AFU refcount functions to base.c so they can be called from
> the PHB code.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev
  2016-07-04 13:22 ` [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
@ 2016-07-06 16:51   ` Frederic Barrat
  0 siblings, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 16:51 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The cxl kernel API has a concept of a default context associated with
> each PCI device under the virtual PHB. The Mellanox CX4 will also use
> the cxl kernel API, but it does not use a virtual PHB - rather, the AFU
> appears as a physical function as a peer to the networking functions.
>
> In order to allow the kernel API to work with those networking
> functions, we will need to associate a default context with them as
> well. To this end, refactor the corresponding code to do this in vphb.c
> and export it so that it can be called from the PHB code.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb
  2016-07-04 13:22 ` [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
@ 2016-07-06 17:38   ` Frederic Barrat
  2016-07-07  6:28     ` Ian Munsie
  0 siblings, 1 reply; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 17:38 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



> +	/* No special handling for cxl function: */
> +	if (PCI_FUNC(dev->devfn) == 0)
> +		return true;

I believe that is the first time we're getting a hint of the black magic 
which is going to occur when the card is switched to cxl mode and the 
appearance of a new pci function. I think a general comment explaining 
it is needed somewhere. In this patch or a later one. Also "peer model" 
is used several times in the commit messages, though it's not clear to 
the novice what it really means.

At this point of the review, I was a bit overwhelmed by all the new 
APIs, wondering how everything would end up working together. By the 
last patch, it's understandable, but a few extra comments would help.
For the vPHB model, pretty much all the relevant code is in one file,
which helps in grasping the full picture. But here it's spread between the
PHB platform code and the cxl driver.

   Fred

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB
  2016-07-04 13:22 ` [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB Ian Munsie
@ 2016-07-06 17:39   ` Frederic Barrat
  2016-07-06 18:30   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 17:39 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> This hooks up support for using the kernel API with a real PHB. After
> the AFU initialisation has completed it calls into the PHB code to pass
> it the AFU that will be used by other peer physical functions on the
> adapter.
>
> The cxl_pci_to_afu API is extended to work with peer PCI devices,
> retrieving the peer AFU from the PHB. This API may also now return an
> error if it is called on a PCI device that is not associated with either
> a cxl vPHB or a peer PCI device to an AFU, and this error is propagated
> down.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>


Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context
  2016-07-04 13:22 ` [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
@ 2016-07-06 18:11   ` Frederic Barrat
  2016-07-07  6:00     ` Ian Munsie
  0 siblings, 1 reply; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:11 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> These APIs will be used by the Mellanox CX4 support. While they function
> standalone to configure existing behaviour, their primary purpose is to
> allow the Mellanox driver to inform the cxl driver of a hardware
> limitation, which will be used in a future patch.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Any way to add a check that the "set max" API is called before the 
interrupts are allocated?

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB
  2016-07-04 13:22 ` [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB Ian Munsie
  2016-07-06 17:39   ` Frederic Barrat
@ 2016-07-06 18:30   ` Frederic Barrat
  2016-07-07  6:32     ` Ian Munsie
  1 sibling, 1 reply; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:30 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen


> @@ -1572,6 +1575,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>   		 */
>   		for (i = 0; i < adapter->slices; i++) {
>   			afu = adapter->afu[i];
> +			/* Only participate in EEH if we are on a virtual PHB */
> +			if (afu->phb == NULL)
> +				return PCI_ERS_RESULT_NONE;
>   			cxl_vphb_error_detected(afu, state);
>   		}


Sorry, I had my notes out of order, something is bugging me here. Don't 
we always define afu->phb, though for Mellanox (or if there's no config 
record in the general case), we don't have any devices attached to it?

Which raises the question of the handling of slot_reset and resume 
callbacks...


   Fred

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation
  2016-07-04 13:22 ` [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
@ 2016-07-06 18:34   ` Frederic Barrat
  0 siblings, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:34 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The Mellanox CX4 has a hardware limitation where only 4 bits of the
> AFU interrupt number can be passed to the XSL when sending an interrupt,
> limiting it to only 15 interrupts per context (AFU interrupt number 0 is
> invalid).
>
> In order to overcome this, we will allocate additional contexts linked
> to the default context as extra address space for the extra interrupts -
> this will be implemented in the next patch.
>
> This patch adds the preliminary support to allow this, by way of adding
> a linked list in the context structure that we use to keep track of the
> contexts dedicated to interrupts, and an API to simultaneously iterate
> over the related context structures, AFU interrupt numbers and hardware
> interrupt numbers. The point of using a single API to iterate these is
> to hide some of the details of the iteration from external code, and to
> reduce the number of APIs that need to be exported via base.c to allow
> built in code to call.
>

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Just one typo below


> diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> index fc07ed4..ed81a17 100644
> --- a/include/misc/cxl.h
> +++ b/include/misc/cxl.h
> @@ -178,6 +178,15 @@ int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs);
>   int cxl_get_max_irqs_per_process(struct pci_dev *dev);
>
>   /*
> + * Use to simultaneously iterate over hardware interrupt numbers, contexts and
> + * afu interrupt numbers allocated for the device via pci_enable_msix_range and
> + * is a useful convinience function when working with hardware that has

convenience
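
The arithmetic behind the limitation described in the commit message can be
worked through as follows. With 4 bits for the AFU interrupt number and
number 0 invalid, each context carries at most 15 interrupts, so extra
contexts are needed beyond that. The helper names below are hypothetical,
and the sketch assumes all 15 numbers in every context are available for
device interrupts, which the quoted text does not confirm.

```c
#include <assert.h>
#include <stddef.h>

#define AFU_IRQ_BITS     4
#define IRQS_PER_CONTEXT ((1 << AFU_IRQ_BITS) - 1)   /* 15: number 0 is invalid */

/* How many contexts are needed to carry nvec interrupts. */
static int contexts_needed(int nvec)
{
	assert(nvec >= 0);
	return (nvec + IRQS_PER_CONTEXT - 1) / IRQS_PER_CONTEXT;
}

/*
 * Map the n-th interrupt (counting from 0) to a context index and an AFU
 * interrupt number in 1..15. Context index 0 plays the role of the default
 * context in this sketch.
 */
static void irq_location(int n, int *ctx_index, int *afu_irq)
{
	assert(n >= 0 && ctx_index != NULL && afu_irq != NULL);
	*ctx_index = n / IRQS_PER_CONTEXT;
	*afu_irq = n % IRQS_PER_CONTEXT + 1;
}
```

For example, 16 interrupts need two contexts under this model: the first 15
land in the default context and the 16th becomes AFU interrupt 1 of the
first extra context.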

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-04 13:22 ` [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
@ 2016-07-06 18:41   ` Frederic Barrat
  2016-07-07  6:03     ` Ian Munsie
  0 siblings, 1 reply; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:41 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
> interrupts are routed from the networking hardware to the XSL using the
> MSIX table, and from there will be transformed back into an MSIX
> interrupt using the cxl style interrupts (i.e. using IVTE entries and
> ranges to map a PE and AFU interrupt number to an MSIX address).
>
> We want to hide the implementation details of cxl interrupts as much as
> possible. To this end, we use a special version of the MSI setup &
> teardown routines in the PHB while in cxl mode to allocate the cxl
> interrupts and configure the IVTE entries in the process element.
>
> This function does not configure the MSIX table - the CX4 card uses a
> custom format in that table and it would not be appropriate to fill that
> out in generic code. The rest of the functionality is similar to the
> "Full MSI-X mode" described in the CAIA, and this could be easily
> extended to support other adapters that use that mode in the future.
>
> The interrupts will be associated with the default context. If the
> maximum number of interrupts per context has been limited (e.g. by the
> mlx5 driver), it will automatically allocate additional kernel contexts
> to associate extra interrupts as required. These contexts will be
> started using the same WED that was used to start the default context.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/pci-cxl.c  | 84 +++++++++++++++++++++++++++++++
>   arch/powerpc/platforms/powernv/pci-ioda.c |  4 ++
>   arch/powerpc/platforms/powernv/pci.h      |  2 +
>   drivers/misc/cxl/api.c                    | 71 ++++++++++++++++++++++++++
>   drivers/misc/cxl/base.c                   | 31 ++++++++++++
>   drivers/misc/cxl/cxl.h                    |  4 ++
>   drivers/misc/cxl/main.c                   |  2 +
>   include/misc/cxl-base.h                   |  4 ++
>   8 files changed, 202 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
> index 2f386f5..1559ca2 100644
> --- a/arch/powerpc/platforms/powernv/pci-cxl.c
> +++ b/arch/powerpc/platforms/powernv/pci-cxl.c
> @@ -8,6 +8,7 @@
>    */
>
>   #include <linux/module.h>
> +#include <linux/msi.h>
>   #include <asm/pci-bridge.h>
>   #include <asm/pnv-pci.h>
>   #include <asm/opal.h>
> @@ -273,3 +274,86 @@ void pnv_cxl_disable_device(struct pci_dev *dev)
>   	cxl_pci_disable_device(dev);
>   	cxl_afu_put(afu);
>   }
> +
> +/*
> + * This is a special version of pnv_setup_msi_irqs for cards in cxl mode. This
> + * function handles setting up the IVTE entries for the XSL to use.
> + *
> + * We are currently not filling out the MSIX table, since the only currently
> + * supported adapter (CX4) uses a custom MSIX table format in cxl mode and it
> + * is up to their driver to fill that out. In the future we may fill out the
> + * MSIX table (and change the IVTE entries to be an index to the MSIX table)
> + * for adapters implementing the Full MSI-X mode described in the CAIA.
> + */
> +int pnv_cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	struct msi_desc *entry;
> +	struct cxl_context *ctx = NULL;
> +	unsigned int virq;
> +	int hwirq;
> +	int afu_irq = 0;
> +	int rc;
> +
> +	if (WARN_ON(!phb) || !phb->msi_bmp.bitmap)
> +		return -ENODEV;
> +
> +	if (pdev->no_64bit_msi && !phb->msi32_support)
> +		return -ENODEV;
> +
> +	rc = cxl_cx4_setup_msi_irqs(pdev, nvec, type);
> +	if (rc)
> +		return rc;
> +
> +	for_each_pci_msi_entry(entry, pdev) {
> +		if (!entry->msi_attrib.is_64 && !phb->msi32_support) {
> +			pr_warn("%s: Supports only 64-bit MSIs\n",
> +				pci_name(pdev));
> +			return -ENXIO;
> +		}
> +
> +		hwirq = cxl_next_msi_hwirq(pdev, &ctx, &afu_irq);
> +		if (WARN_ON(hwirq < 0))
> +			return hwirq;

I think we want:
	if (WARN_ON(hwirq <= 0))
cxl_find_afu_irq() returns 0 if it doesn't find the irq, which is not
supposed to happen here.

   Fred
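
To make the point about the return convention concrete, here is a small
user-space sketch. Both functions are hypothetical stand-ins rather than
the cxl driver's own: find_hwirq_sketch() mimics a lookup that returns 0
when nothing is found, and validate_hwirq() treats 0 as a failure alongside
negative values. Mapping the 0 case to -ENOENT is an assumption made for
the example; the thread does not say which error code, if any, should be
returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical lookup with the convention described in the review:
 * a positive hardware IRQ number on success, 0 when nothing is found.
 */
static int find_hwirq_sketch(const int *hwirqs, int count, int afu_irq)
{
	assert(hwirqs != NULL);
	if (afu_irq < 1 || afu_irq > count)
		return 0;
	return hwirqs[afu_irq - 1];
}

/* Treat both negative error codes and the 0 "not found" value as failure. */
static int validate_hwirq(int hwirq)
{
	if (hwirq < 0)
		return hwirq;     /* propagate a real error code */
	if (hwirq == 0)
		return -ENOENT;   /* assumption: error code chosen for this sketch */
	return 0;
}
```

A check written as "hwirq < 0" alone would let the 0 result through, which
is the case the review comment is concerned with.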

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-04 13:22 ` [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
  2016-07-06  4:42   ` Andrew Donnellan
@ 2016-07-06 18:42   ` Frederic Barrat
  1 sibling, 0 replies; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:42 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The CX4 card cannot cope with a context with PE=0 due to a hardware
> limitation, resulting in:
>
> [   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
> [   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting
>
> Since the kernel API allocates a default context very early during
> device init that will almost certainly get Process Element ID 0 there is
> no easy way for us to extend the API to allow the Mellanox to inform us
> of this limitation ahead of time.
>
> Instead, work around the issue by extending the XSL structure to include
> a minimum PE to allocate. Although the bug is not in the XSL, it is the
> easiest place to work around this limitation given that the CX4 is
> currently the only card that uses an XSL.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-04 13:22 ` [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
  2016-07-06  3:55   ` Andrew Donnellan
@ 2016-07-06 18:51   ` Frederic Barrat
  2016-07-07  1:18     ` Andrew Donnellan
  1 sibling, 1 reply; 46+ messages in thread
From: Frederic Barrat @ 2016-07-06 18:51 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan



Le 04/07/2016 15:22, Ian Munsie a écrit :
> From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
>
> Add a new API, cxl_check_and_switch_mode() to allow for switching of
> bi-modal CAPI cards, such as the Mellanox CX-4 network card.
>
> When a driver requests to switch a card to CAPI mode, use PCI hotplug
> infrastructure to remove all PCI devices underneath the slot. We then write
> an updated mode control register to the CAPI VSEC, hot reset the card, and
> reprobe the card.
>
> As the card may present a different set of PCI devices after the mode
> switch, use the infrastructure provided by the pnv_php driver and the OPAL
> PCI slot management facilities to ensure that:
>
>    * the old devices are removed from both the OPAL and Linux device trees
>    * the new devices are probed by OPAL and added to the OPAL device tree
>    * the new devices are added to the Linux device tree and probed through
>      the regular PCI device probe path
>
> As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on
> the pnv_php driver.
>
> Refactor existing code that touches the mode control register in the
> regular single mode case into a new function, setup_cxl_protocol_area().
>
> Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>   drivers/misc/cxl/Kconfig |   8 ++
>   drivers/misc/cxl/pci.c   | 234 +++++++++++++++++++++++++++++++++++++++++++----
>   include/misc/cxl.h       |  25 +++++
>   3 files changed, 249 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
> index 560412c..6859723 100644
> --- a/drivers/misc/cxl/Kconfig
> +++ b/drivers/misc/cxl/Kconfig
> @@ -38,3 +38,11 @@ config CXL
>   	  CAPI adapters are found in POWER8 based systems.
>
>   	  If unsure, say N.
> +
> +config CXL_BIMODAL
> +	bool "Support for bi-modal CAPI cards"
> +	depends on HOTPLUG_PCI_POWERNV = y && CXL || HOTPLUG_PCI_POWERNV = m && CXL = m
> +	default y
> +	help
> +	  Select this option to enable support for bi-modal CAPI cards, such as
> +	  the Mellanox CX-4.
> \ No newline at end of file
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 090eee8..63abd26 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -55,6 +55,8 @@
>   	pci_read_config_byte(dev, vsec + 0xa, dest)
>   #define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
>   	pci_write_config_byte(dev, vsec + 0xa, val)
> +#define CXL_WRITE_VSEC_MODE_CONTROL_BUS(bus, devfn, vsec, val) \
> +	pci_bus_write_config_byte(bus, devfn, vsec + 0xa, val)
>   #define CXL_VSEC_PROTOCOL_MASK   0xe0
>   #define CXL_VSEC_PROTOCOL_1024TB 0x80
>   #define CXL_VSEC_PROTOCOL_512TB  0x40
> @@ -614,36 +616,232 @@ static int setup_cxl_bars(struct pci_dev *dev)
>   	return 0;
>   }
>
> -/* pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>; */
> -static int switch_card_to_cxl(struct pci_dev *dev)
> -{
> +#ifdef CONFIG_CXL_BIMODAL
> +
> +struct cxl_switch_work {
> +	struct pci_dev *dev;
> +	struct work_struct work;
>   	int vsec;
> +	int mode;
> +};
> +
> +static void switch_card_to_cxl(struct work_struct *work)
> +{
> +	struct cxl_switch_work *switch_work =
> +		container_of(work, struct cxl_switch_work, work);
> +	struct pci_dev *dev = switch_work->dev;
> +	struct pci_bus *bus = dev->bus;
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pci_dev *bridge;
> +	struct pnv_php_slot *php_slot;
> +	unsigned int devfn;
>   	u8 val;
>   	int rc;
>
> -	dev_info(&dev->dev, "switch card to CXL\n");
> +	dev_info(&bus->dev, "cxl: Preparing for mode switch...\n");
> +	bridge = list_first_entry_or_null(&hose->bus->devices, struct pci_dev,
> +					  bus_list);
> +	if (!bridge) {
> +		dev_WARN(&bus->dev, "cxl: Couldn't find root port!\n");
> +		goto err_free_work;
> +	}
>
> -	if (!(vsec = find_cxl_vsec(dev))) {
> -		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
> +	php_slot = pnv_php_find_slot(pci_device_to_OF_node(bridge));
> +	if (!php_slot) {
> +		dev_err(&bus->dev, "cxl: Failed to find slot hotplug "
> +			           "information. You may need to upgrade "
> +			           "skiboot. Aborting.\n");
> +		pci_dev_put(dev);
> +		goto err_free_work;
> +	}
> +
> +	rc = CXL_READ_VSEC_MODE_CONTROL(dev, switch_work->vsec, &val);
> +	if (rc) {
> +		dev_err(&bus->dev, "cxl: Failed to read CAPI mode control: %i\n", rc);
> +		pci_dev_put(dev);
> +		goto err_free_work;
> +	}
> +	devfn = dev->devfn;
> +	pci_dev_put(dev);

This is to balance the 'get' done in cxl_check_and_switch_mode(), right? 
A comment wouldn't hurt. I think we're missing the 'put' on the first 
error path above (!bridge).

I was half-expecting to see a new entry in the cxl_pci_tbl pci ID table 
for the Mellanox entry, but no such thing. By what magic is cxl_probe() 
called after the switch? Because of the device class?

Out of curiosity, could you tell me what the 3rd pci function looks like 
(vendor ID, device ID, ....)?
Thanks!

   Fred


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-06 18:51   ` Frederic Barrat
@ 2016-07-07  1:18     ` Andrew Donnellan
  2016-07-07  6:26       ` Ian Munsie
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-07  1:18 UTC (permalink / raw)
  To: Frederic Barrat, Ian Munsie, Michael Ellerman, Michael Neuling,
	linuxppc-dev, Huy Nguyen
  Cc: Gavin Shan

Thanks for the review Fred!

On 07/07/16 04:51, Frederic Barrat wrote:
>> +    rc = CXL_READ_VSEC_MODE_CONTROL(dev, switch_work->vsec, &val);
>> +    if (rc) {
>> +        dev_err(&bus->dev, "cxl: Failed to read CAPI mode control:
>> %i\n", rc);
>> +        pci_dev_put(dev);
>> +        goto err_free_work;
>> +    }
>> +    devfn = dev->devfn;
>> +    pci_dev_put(dev);
>
> This is to balance the 'get' done in cxl_check_and_switch_mode(), right?
> A comment wouldn't hurt. I think we're missing the 'put' on the first
> error path above (!bridge).

Yep, it's to balance the pci_dev_get() in cxl_check_and_switch_mode() - 
you're right, a comment to that effect wouldn't hurt.

You're also right about the error path. Will fix in V2.

> I was half-expecting to see a new entry in the cxl_pci_tbl pci ID table
> for the Mellanox entry, but no such thing. By what magic is cxl_probe()
> called after the switch? Because of the device class?

It matches against the class, as function 0 of the device after reset 
comes up as a class 1200 processing accelerator.

Perhaps we should be a bit more explicit though...

> Out of curiosity, could you tell me what the 3rd pci function looks like
> (vendor ID, device ID, ....)?

Before:

root@io163:~# lspci -vnn
0000:00:00.0 PCI bridge [0604]: IBM Device [1014:03dc] (prog-if 00 
[Normal decode])
         Flags: fast devsel
         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
         I/O behind bridge: 00000000-00000fff
         Capabilities: [40] Power Management version 3
         Capabilities: [48] Express Root Port (Slot-), MSI 00
         Capabilities: [100] Advanced Error Reporting
         Capabilities: [148] #19

0000:01:00.0 Infiniband controller [0207]: Mellanox Technologies MT27700 
Family [ConnectX-4] [15b3:1013]
         Subsystem: IBM Device [1014:04f4]
         Flags: fast devsel, IRQ 502
         Memory at 200000000000 (64-bit, prefetchable) [disabled] [size=32M]
         Capabilities: [60] Express Endpoint, MSI 00
         Capabilities: [48] Vital Product Data
         Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
         Capabilities: [c0] Vendor Specific Information: Len=18 <?>
         Capabilities: [40] Power Management version 3
         Capabilities: [100] Device Serial Number ba-da-ce-55-de-ad-ca-fe
         Capabilities: [160] Vendor Specific Information: ID=1280 Rev=0 
Len=080 <?>
         Capabilities: [240] #19

0000:01:00.1 Infiniband controller [0207]: Mellanox Technologies MT27700 
Family [ConnectX-4] [15b3:1013]
         Subsystem: IBM Device [1014:04f4]
         Flags: fast devsel, IRQ 502
         Memory at 200002000000 (64-bit, prefetchable) [disabled] [size=32M]
         Capabilities: [60] Express Endpoint, MSI 00
         Capabilities: [48] Vital Product Data
         Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
         Capabilities: [40] Power Management version 3
         Capabilities: [100] Device Serial Number ba-da-ce-55-de-ad-ca-fe

After:

root@io163:~# lspci -vnn
0000:00:00.0 PCI bridge [0604]: IBM Device [1014:03dc] (prog-if 00 
[Normal decode])
         Flags: bus master, fast devsel, latency 0
         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
         I/O behind bridge: 00000000-00000fff
         Capabilities: [40] Power Management version 3
         Capabilities: [48] Express Root Port (Slot-), MSI 00
         Capabilities: [100] Advanced Error Reporting
         Capabilities: [148] #19

0000:01:00.0 Processing accelerators [1200]: Mellanox Technologies 
MT27700 Family [ConnectX-4] [15b3:1013]
         Subsystem: IBM Device [1014:04f4]
         Physical Slot: Slot3
         Flags: bus master, fast devsel, latency 0, IRQ 502
         Memory at 200004000000 (64-bit, prefetchable) [size=128K]
         Memory at 200004020000 (64-bit, prefetchable) [size=128K]
         Memory at <ignored> (64-bit, prefetchable) [size=256T]
         Capabilities: [60] Express Endpoint, MSI 00
         Capabilities: [48] Vital Product Data
         Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
         Capabilities: [100] Device Serial Number ba-da-ce-55-de-ad-ca-fe
         Capabilities: [160] Vendor Specific Information: ID=1280 Rev=0 
Len=080 <?>
         Kernel driver in use: cxl-pci

0000:01:00.1 Infiniband controller [0207]: Mellanox Technologies MT27700 
Family [ConnectX-4] [15b3:1013]
         Subsystem: IBM Device [1014:04f4]
         Physical Slot: Slot3
         Flags: bus master, fast devsel, latency 0, IRQ 502
         Memory at 200000000000 (64-bit, prefetchable) [size=32M]
         Capabilities: [60] Express Endpoint, MSI 00
         Capabilities: [48] Vital Product Data
         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
         Capabilities: [40] Power Management version 3
         Capabilities: [100] Device Serial Number ba-da-ce-55-de-ad-ca-fe
         Kernel driver in use: mlx5_core

0000:01:00.2 Infiniband controller [0207]: Mellanox Technologies MT27700 
Family [ConnectX-4] [15b3:1013]
         Subsystem: IBM Device [1014:04f4]
         Physical Slot: Slot3
         Flags: bus master, fast devsel, latency 0, IRQ 502
         Memory at 200002000000 (64-bit, prefetchable) [size=32M]
         Capabilities: [60] Express Endpoint, MSI 00
         Capabilities: [48] Vital Product Data
         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
         Capabilities: [40] Power Management version 3
         Capabilities: [100] Device Serial Number ba-da-ce-55-de-ad-ca-fe
         Kernel driver in use: mlx5_core

Andrew

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


* Re: [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context
  2016-07-06 18:11   ` Frederic Barrat
@ 2016-07-07  6:00     ` Ian Munsie
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-07  6:00 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Michael Neuling, andrew.donnellan,
	linuxppc-dev, Huy Nguyen

Excerpts from Frederic Barrat's message of 2016-07-06 20:11:48 +0200:
> 
> On 04/07/2016 15:22, Ian Munsie wrote:
> > From: Ian Munsie <imunsie@au1.ibm.com>
> >
> > These APIs will be used by the Mellanox CX4 support. While they function
> > standalone to configure existing behaviour, their primary purpose is to
> > allow the Mellanox driver to inform the cxl driver of a hardware
> > limitation, which will be used in a future patch.
> >
> > Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> 
> Any way to add a check that the "set max" API is called before the 
> interrupts are allocated?

I don't think there is any real need - if the set max API has not been
called then we use the maximum number of interrupts possible on the PHB,
which is the correct thing to do if we don't need the workaround. We
could try adding a WARN in the set max API if interrupts have previously
been allocated, but realistically - if a driver developer needs to use
this they already know it and will be testing for it.

Cheers,
-Ian


* Re: [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-06 18:41   ` Frederic Barrat
@ 2016-07-07  6:03     ` Ian Munsie
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-07  6:03 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Michael Neuling, andrew.donnellan,
	linuxppc-dev, Huy Nguyen

Excerpts from Frederic Barrat's message of 2016-07-06 20:41:42 +0200:
> I think we want:
>     if (WARN_ON(hwirq <= 0))
> cxl_find_afu_irq() returns 0 if it doesn't find the irq, which is not
> supposed to happen here.

Good catch - will fix in v2.
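
For illustration only, here is a small userspace sketch of why the
comparison operator matters when a lookup helper signals "not found"
with 0 rather than a negative value. Everything in it (mock_find_irq,
mock_irqs, the two check helpers) is a hypothetical stand-in and not
the cxl code; the only assumption taken from the review is that the
lookup returns 0 on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table of hardware irq numbers; valid entries are > 0. */
static const long mock_irqs[] = { 17, 18 };

/*
 * Hypothetical lookup: returns the hardware irq for slot idx, or 0 if
 * the slot does not exist (mirroring the "returns 0 if not found"
 * behaviour described in the review).
 */
static long mock_find_irq(int idx)
{
	int n = (int)(sizeof(mock_irqs) / sizeof(mock_irqs[0]));

	if (idx < 0 || idx >= n)
		return 0;
	return mock_irqs[idx];
}

/* A check that only looks for negative values misses the 0 case. */
static bool check_negative_only(long hwirq)
{
	return hwirq < 0;
}

/* A check for "<= 0" flags both error styles. */
static bool check_non_positive(long hwirq)
{
	return hwirq <= 0;
}
```

With a missing slot, check_negative_only() stays silent while
check_non_positive() fires, which is the case a "<= 0" test is meant
to cover.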

Cheers,
-Ian


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-07  1:18     ` Andrew Donnellan
@ 2016-07-07  6:26       ` Ian Munsie
  2016-07-07  6:44         ` Andrew Donnellan
  0 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-07  6:26 UTC (permalink / raw)
  To: andrew.donnellan
  Cc: Frederic Barrat, Michael Ellerman, Michael Neuling, linuxppc-dev,
	Huy Nguyen, Gavin Shan

Excerpts from andrew.donnellan's message of 2016-07-07 11:18:37 +1000:
> > This is to balance the 'get' done in cxl_check_and_switch_mode(), right?
> > A comment wouldn't hurt. I think we're missing the 'put' on the first
> > error path above (!bridge).
> 
> Yep, it's to balance the pci_dev_get() in cxl_check_and_switch_mode() - 
> you're right, a comment to that effect wouldn't hurt.
> 
> You're also right about the error path. Will fix in V2.

We could probably use a dedicated error label for all the error paths
before the pci_dev_put in the main function so we don't need it in every
error path.
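
A rough userspace sketch of that label layout, for illustration only.
The names (mock_dev, mock_switch_work, err_dev_put, fail_at) are
hypothetical and this is not the v2 code; it only assumes, as
discussed above, that the caller takes one reference before queuing
the work and that the work function must drop it exactly once on
every path.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted device. */
struct mock_dev {
	int refcount;
};

static void mock_dev_get(struct mock_dev *d)
{
	d->refcount++;
}

static void mock_dev_put(struct mock_dev *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/*
 * fail_at selects which step fails: 0 = none, 1..3 = a step before the
 * normal put, 4 = a step after it.
 */
static int mock_switch_work(struct mock_dev *dev, int fail_at)
{
	int rc = 0;

	if (fail_at == 1) { rc = -1; goto err_dev_put; }  /* e.g. no root port */
	if (fail_at == 2) { rc = -2; goto err_dev_put; }  /* e.g. no slot info */
	if (fail_at == 3) { rc = -3; goto err_dev_put; }  /* e.g. config read failed */

	/* Done with dev: balances the get taken by the caller. */
	mock_dev_put(dev);

	if (fail_at == 4) { rc = -4; goto err_free_work; }
	return 0;

err_dev_put:
	mock_dev_put(dev);
err_free_work:
	/* The work item would be freed here. */
	return rc;
}

/* Simulate the caller: own one reference, take another, run the work. */
static int mock_refs_after(int fail_at)
{
	struct mock_dev dev = { .refcount = 1 };

	mock_dev_get(&dev);
	(void)mock_switch_work(&dev, fail_at);
	return dev.refcount;
}
```

Whichever step fails, the count returns to the owner's single
reference, so no individual error path needs its own put.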

> > I was half-expecting to see a new entry in the cxl_pci_tbl pci ID table
> > for the Mellanox entry, but no such thing. By what magic is cxl_probe()
> > called after the switch? Because of the device class?
> 
> It matches against the class, as function 0 of the device after reset 
> comes up as a class 1200 processing accelerator.
> 
> Perhaps we should be a bit more explicit though...

If we explicitly match the Vendor + Device ID we will also match the
networking functions, which we can't do, because before the mode switch
there *IS* a CAPI VSEC in one of the networking functions and our driver
would mistake it as a generic accelerator and try to initialise it. We
could add a comment to this effect to the PCI ID table.

Cheers,
-Ian


* Re: [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb
  2016-07-06 17:38   ` Frederic Barrat
@ 2016-07-07  6:28     ` Ian Munsie
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-07  6:28 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Michael Neuling, andrew.donnellan,
	linuxppc-dev, Huy Nguyen

Excerpts from Frederic Barrat's message of 2016-07-06 19:38:18 +0200:
> 
> > +    /* No special handling for cxl function: */
> > +    if (PCI_FUNC(dev->devfn) == 0)
> > +        return true;
> 
> I believe that is the first time we're getting a hint of the black magic 
> which is going to occur when the card is switched to cxl mode and the 
> appearance of a new pci function. I think a general comment explaining 
> it is needed somewhere. In this patch or a later one. Also "peer model" 
> is used several times in the commit messages, though it's not clear to 
> the novice what it really means.
> 
> At this point of the review, I was a bit overwhelmed by all the new 
> APIs, wondering how everything would end up working together. By the 
> last patch, it's understandable, but a few extra comments would help.
> For the vPHB model, pretty much all the relevant code is in one file, 
> which helps grabbing the full picture. But here it's spread between the 
> phb platform code and the cxl driver.
> 
>    Fred

Ok, will see what I can do to clarify this.

-Ian


* Re: [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB
  2016-07-06 18:30   ` Frederic Barrat
@ 2016-07-07  6:32     ` Ian Munsie
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Munsie @ 2016-07-07  6:32 UTC (permalink / raw)
  To: Frederic Barrat
  Cc: Michael Ellerman, Michael Neuling, andrew.donnellan,
	linuxppc-dev, Huy Nguyen

Excerpts from Frederic Barrat's message of 2016-07-06 20:30:41 +0200:
> 
> > @@ -1572,6 +1575,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> >            */
> >           for (i = 0; i < adapter->slices; i++) {
> >               afu = adapter->afu[i];
> > +            /* Only participate in EEH if we are on a virtual PHB */
> > +            if (afu->phb == NULL)
> > +                return PCI_ERS_RESULT_NONE;
> >               cxl_vphb_error_detected(afu, state);
> >           }
> 
> 
> Sorry, I had my notes out of order, something is bugging me here. Don't 
> we always define afu->phb, though for Mellanox (or if there's no config 
> record in the general case), we don't have any devices attached to it?

I think you're right. I'll change the vPHB code to skip it if there are
no configuration records.

> Which raises the question of the handling of slot_reset and resume 
> callbacks...

We aren't going to support EEH (at least not yet) - the vPHB model makes
this (relatively) easy since we can notify the AFU drivers when we get
notified, but in the peer model it will be the real PHB notifying us and
the networking drivers. If we do end up supporting that, it will come
later.

Cheers,
-Ian


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-07  6:26       ` Ian Munsie
@ 2016-07-07  6:44         ` Andrew Donnellan
  2016-07-07  8:15           ` Andrew Donnellan
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-07  6:44 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Frederic Barrat, Michael Ellerman, Michael Neuling, linuxppc-dev,
	Huy Nguyen, Gavin Shan

On 07/07/16 16:26, Ian Munsie wrote:
> We could probably use a dedicated error label for all the error paths
> before the pci_dev_put in the main function so we don't need it in every
> error path.

Yep, I've added that.

> If we explicitly match the Vendor + Device ID we will also match the
> networking functions, which we can't do, because before the mode switch
> there *IS* a CAPI VSEC in one of the networking functions and our driver
> would mistake it as a generic accelerator and try to initialise it. We
> could add a comment to this effect to the PCI ID table.

We can match the vendor, device ID *and* class code - unfortunately 
there isn't a macro for this, which makes it a little bit less 
aesthetically pleasing, but I'm pretty sure this works.

I'm not entirely sure how I feel about our current strategy of matching 
on all class 1200 devices (though if it weren't a CAPI device we'd bail 
very quickly...) - my quick grepping tells me we're one of a very small 
set of drivers in the kernel that uses PCI_DEVICE_CLASS.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-07  6:44         ` Andrew Donnellan
@ 2016-07-07  8:15           ` Andrew Donnellan
  2016-07-11  9:19             ` Ian Munsie
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-07  8:15 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Michael Neuling, Gavin Shan, Frederic Barrat, Huy Nguyen, linuxppc-dev

On 07/07/16 16:44, Andrew Donnellan wrote:
> We can match the vendor, device ID *and* class code - unfortunately
> there isn't a macro for this, which makes it a little bit less
> aesthetically pleasing, but I'm pretty sure this works.

Something like the below, which works fine:

/*
  * Matches a given PCI vendor ID and device ID, but only for class 12
  * (processing accelerators). Useful for bi-modal cards, such as the
  * Mellanox ConnectX-4, which keep the same vendor/device ID
  * post-mode-switch.
  */
#define PCI_DEVICE_ACCEL(vend, dev) \
	.vendor = (vend), .device = (dev), \
	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, \
	.class = 0x120000, .class_mask = 0xff0000

static const struct pci_device_id cxl_pci_tbl[] = {
	/* FPGA devices */
	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0477), },
	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x044b), },
	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x04cf), },
	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0601), },
	/* Mellanox ConnectX-4 */
	{ PCI_DEVICE_ACCEL(PCI_VENDOR_ID_MELLANOX, 0x1013), },
	{ }
};
MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
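
To show why the class mask is what separates function 0 from the
networking functions, here is a userspace sketch of the matching rule
as I understand the PCI core applies it: vendor and device must be
equal (or wildcards), and the class is compared only under
class_mask. The types and names (mock_pci_id, mock_pci_dev,
mock_match, MOCK_ANY_ID) are hypothetical stand-ins, not kernel API.
The vendor/device IDs and class codes come from the lspci output
earlier in the thread; the prog-if byte is assumed to be 00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_ANY_ID 0xffffffffu	/* wildcard, in the role of PCI_ANY_ID */

/* Hypothetical, trimmed-down ID table entry. */
struct mock_pci_id {
	uint32_t vendor, device;
	uint32_t class_code, class_mask;
};

/* Hypothetical device: class_code is base class, subclass, prog-if. */
struct mock_pci_dev {
	uint32_t vendor, device;
	uint32_t class_code;
};

static bool mock_match(const struct mock_pci_id *id,
		       const struct mock_pci_dev *dev)
{
	return (id->vendor == MOCK_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == MOCK_ANY_ID || id->device == dev->device) &&
	       !((id->class_code ^ dev->class_code) & id->class_mask);
}

/* Vendor + device + base class 0x12, like the proposed macro. */
static const struct mock_pci_id accel_id = { 0x15b3, 0x1013, 0x120000, 0xff0000 };
/* Vendor + device only: a zero mask ignores the class entirely. */
static const struct mock_pci_id plain_id = { 0x15b3, 0x1013, 0, 0 };

/* Function 0 after the mode switch: processing accelerator [1200]. */
static const struct mock_pci_dev fn0_cxl = { 0x15b3, 0x1013, 0x120000 };
/* Networking function: Infiniband controller [0207]. */
static const struct mock_pci_dev fn1_net = { 0x15b3, 0x1013, 0x020700 };
```

The class-qualified entry matches only fn0_cxl, while the plain
vendor/device entry would also match fn1_net.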


-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-07  8:15           ` Andrew Donnellan
@ 2016-07-11  9:19             ` Ian Munsie
  2016-07-12  1:20               ` Andrew Donnellan
  0 siblings, 1 reply; 46+ messages in thread
From: Ian Munsie @ 2016-07-11  9:19 UTC (permalink / raw)
  To: andrew.donnellan
  Cc: Michael Neuling, Gavin Shan, Frederic Barrat, Huy Nguyen, linuxppc-dev

Excerpts from andrew.donnellan's message of 2016-07-07 18:15:06 +1000:
> On 07/07/16 16:44, Andrew Donnellan wrote:
> > We can match the vendor, device ID *and* class code - unfortunately
> > there isn't a macro for this, which makes it a little bit less
> > aesthetically pleasing, but I'm pretty sure this works.
> 
> Something like the below, which works fine:

I like this solution, but I'm not going to include it in v2 of this
series and would rather it be submitted separately. The reason is
that this series will work as is, and I'd like to see this undergo some
regression testing separate to the cx4 work, and a bit of scrutiny from
the hardware team just in case we are missing any device IDs that would
no longer be matched (I'm not aware of any, but you never know).

Cheers,
-Ian


* Re: [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-11  9:19             ` Ian Munsie
@ 2016-07-12  1:20               ` Andrew Donnellan
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Donnellan @ 2016-07-12  1:20 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Michael Neuling, Gavin Shan, Frederic Barrat, Huy Nguyen, linuxppc-dev

On 11/07/16 19:19, Ian Munsie wrote:
> I like this solution, but I'm not going to include it in v2 of this
> series and would rather it be submitted separately. The reason is
> that this series will work as is, and I'd like to see this undergo some
> regression testing separate to the cx4 work, and a bit of scrutiny from
> the hardware team just in case we are missing any device IDs that would
> no longer be matched (I'm not aware of any, but you never know).

Yep, I can send it separately.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


end of thread, other threads:[~2016-07-12  1:20 UTC | newest]

Thread overview: 46+ messages
2016-07-04 13:21 powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
2016-07-04 13:21 ` [PATCH 01/14] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
2016-07-06  3:44   ` Andrew Donnellan
2016-07-06 16:27   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 02/14] cxl: Add cxl_slot_is_supported API Ian Munsie
2016-07-06  2:02   ` Andrew Donnellan
2016-07-06 16:36   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 03/14] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
2016-07-06  4:04   ` Andrew Donnellan
2016-07-06 16:37   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 04/14] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
2016-07-05  2:10   ` Andrew Donnellan
2016-07-06 16:45   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 05/14] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
2016-07-06 16:51   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 06/14] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
2016-07-06 17:38   ` Frederic Barrat
2016-07-07  6:28     ` Ian Munsie
2016-07-04 13:22 ` [PATCH 07/14] cxl: Add support for using the kernel API with a real PHB Ian Munsie
2016-07-06 17:39   ` Frederic Barrat
2016-07-06 18:30   ` Frederic Barrat
2016-07-07  6:32     ` Ian Munsie
2016-07-04 13:22 ` [PATCH 08/14] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
2016-07-06 18:11   ` Frederic Barrat
2016-07-07  6:00     ` Ian Munsie
2016-07-04 13:22 ` [PATCH 09/14] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
2016-07-06 18:34   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 10/14] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
2016-07-06 18:41   ` Frederic Barrat
2016-07-07  6:03     ` Ian Munsie
2016-07-04 13:22 ` [PATCH 11/14] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
2016-07-06  4:42   ` Andrew Donnellan
2016-07-06 18:42   ` Frederic Barrat
2016-07-04 13:22 ` [PATCH 12/14] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
2016-07-05  0:03   ` Gavin Shan
2016-07-05  1:08     ` Andrew Donnellan
2016-07-04 13:22 ` [PATCH 13/14] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
2016-07-04 13:22 ` [PATCH 14/14] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
2016-07-06  3:55   ` Andrew Donnellan
2016-07-06 18:51   ` Frederic Barrat
2016-07-07  1:18     ` Andrew Donnellan
2016-07-07  6:26       ` Ian Munsie
2016-07-07  6:44         ` Andrew Donnellan
2016-07-07  8:15           ` Andrew Donnellan
2016-07-11  9:19             ` Ian Munsie
2016-07-12  1:20               ` Andrew Donnellan
