linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode
@ 2016-07-13 21:16 Ian Munsie
  2016-07-13 21:17 ` [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
                   ` (14 more replies)
  0 siblings, 15 replies; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:16 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

This series adds support for the Mellanox CX4 network adapter operating in cxl
mode to the cxl driver and the PowerNV PHB code. The Mellanox developers will
submit a separate patch series that makes use of this in the mlx5 driver.

The CX4 card can operate in either pci mode, or cxl mode. In cxl mode, memory
accesses from the card go through the XSL (Translation Service Layer,
essentially a stripped down version of the Power Service Layer), allowing it to
transparently access unpinned memory with the cxl driver handling faulting in
pages as necessary, etc. Most of the support for the XSL is already upstream,
though this series does include a bug fix to enable bus mastering for this
(patch 3).

Patch 2 in this series provides an API which the mlx5 driver can query to check
if it is in a cxl capable slot. The card will come up in pci mode, and the mlx5
driver can choose to switch it to cxl mode, wherein it will reappear with an
additional physical function representing the XSL that the cxl driver will bind
to. Patches 13-15 add support for switching the card's mode, including using
the PCI hotplug support to re-enumerate the device tree and re-probind the
card.

Unlike previous users of the cxl kernel API where we used a virtual PHB and
exposed PCI devices under it, the Mellanox CX4 uses a peer model where cxl
binds to one of the physical functions of the card and the mlx5_core driver
binds to the other networking physical functions. Patch 6 skips creating a vPHB
for AFUs without any AFU configuration records (including devices using the
peer model) and opts out of EEH handling. Patches 7 and 8 add support for using
the cxl kernel API with the real PHB to enable this peer model. Patches 4 and
5 are prepatory patches exposing some APIs that the PHB will need to call.

While in cxl mode, interrupts from the CX4 are a little unusual - they are
neither pci interrupts, nor cxl interrutps, but rather a hybrid of the two. The
interrupts are passed from the networking hardware to the XSL using a custom
format in the MSIX table, and from there are treated as cxl interrupts. These
are configured mostly transparently using the standard msix APIs - the PHB
handles allocating and configuring the cxl interrupts, associating them with
the default context, and the mlx5 driver handles filling out the MSIX table
with their custom format (not included in this series). See patch 11.

Additionally, the CX4 has a hard limitation of the number of interrupts that
can be associated with a given context, so to overcome this patches 9 and 10
expose an API to allow the mlx5 driver to inform us of the limit, and the
interrupt allocation code in patch 11 will allocate additional contexts to
associate these with.

Patch 1 is a prepatory cleanup patch to reorganise cxl code in arch/powerpc
into a separate file.

Patch 12 is a workaround for a hardware limitation in the CX4 where a context
with PE=0 cannot be used.

The entire series is bisectable.

Changes since v2:
	Addressed feedback from Andrew Donnellan:
	- Fixed typos in several comments
	- Moved _cxl_pci_associate_default_context and
	  _cxl_pci_disable_device from vphb.c to a new file phb.c since
	  they are used by both the vPHB and peer models. (Patch 5)
	- Changed two exported symbols to EXPORT_SYMBOL_GPL (Patch 7)
	- Undid change to remove static from pnv_pci_release_device and
	  pci_controller_ops and declare them in the header, both of
	  which were left over from an earlier cut. (Patch 7)

Changes since v1:
	- New patch 6 to skip creating a vPHB if there are no AFU configuration
	  records, and opt out of EEH handling (partially split from patch 8).
	- Updated comments in various patches (1, 2, 7, 10, 15) with feedback
	  from Andrew Donnellan and Frederic Barrat
	- Handle error case if cxl_next_msi_hwirq returns 0 signifying
	  that an AFU IRQ is not mapped to a hardware interrupt (Patch 11)
	- Dropped extraneous "select HOTPLUG_PCI_POWERNV_BASE" in Kconfig,
	  which was accidentally left in from an earlier non-public
	  revision. Thanks to Gavin Shan for pointing it out (Patch 13)
	- Added new error label for error paths calling pci_dev_put() -
	  suggested by Ian Munsie (Patch 15)
	- Added newline at end of Kconfig (Patch 15)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [01/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 02/15] cxl: Add cxl_slot_is_supported API Ian Munsie
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The support for using the Mellanox CX4 in cxl mode will require
additions to the PHB code. In preparation for this, move the existing
cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
more organised.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

---

V1 -> V2:
	Changed copyright message in new file to 2014-2016, since most
	of the code originated in other files written in 2014, and will
	be adding new code shortly.
---
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/pci-cxl.c  | 163 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c | 159 +----------------------------
 arch/powerpc/platforms/powernv/pci.h      |   6 ++
 4 files changed, 173 insertions(+), 156 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-cxl.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index cd9711e..b5d98cb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,6 +6,7 @@ obj-y			+= opal-kmsg.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
 obj-$(CONFIG_EEH)	+= eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
new file mode 100644
index 0000000..e0eeb00
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright 2014-2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/pnv-pci.h>
+#include <asm/opal.h>
+
+#include "pci.h"
+
+struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	return of_node_get(hose->dn);
+}
+EXPORT_SYMBOL(pnv_pci_get_phb_node);
+
+int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	pe = pnv_ioda_get_pe(dev);
+	if (!pe)
+		return -ENODEV;
+
+	pe_info(pe, "Switching PHB to CXL\n");
+
+	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, mode, pe->pe_number);
+	if (rc == OPAL_UNSUPPORTED)
+		dev_err(&dev->dev, "Required cxl mode not supported by firmware - update skiboot\n");
+	else if (rc)
+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
+
+	return rc;
+}
+EXPORT_SYMBOL(pnv_phb_to_cxl_mode);
+
+/* Find PHB for cxl dev and allocate MSI hwirqs?
+ * Returns the absolute hardware IRQ number
+ */
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
+
+	if (hwirq < 0) {
+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	return phb->msi_base + hwirq;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
+
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
+
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq;
+
+	for (i = 1; i < CXL_IRQ_RANGES; i++) {
+		if (!irqs->range[i])
+			continue;
+		pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
+			 i, irqs->offset[i],
+			 irqs->range[i]);
+		hwirq = irqs->offset[i] - phb->msi_base;
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+				       irqs->range[i]);
+	}
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
+
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq, try;
+
+	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
+
+	/* 0 is reserved for the multiplexed PSL DSI interrupt */
+	for (i = 1; i < CXL_IRQ_RANGES && num; i++) {
+		try = num;
+		while (try) {
+			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
+			if (hwirq >= 0)
+				break;
+			try /= 2;
+		}
+		if (!try)
+			goto fail;
+
+		irqs->offset[i] = phb->msi_base + hwirq;
+		irqs->range[i] = try;
+		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
+			 i, irqs->offset[i], irqs->range[i]);
+		num -= try;
+	}
+	if (num)
+		goto fail;
+
+	return 0;
+fail:
+	pnv_cxl_release_hwirq_ranges(irqs, dev);
+	return -ENOSPC;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
+
+int pnv_cxl_get_irq_count(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return phb->msi_bmp.irq_count;
+}
+EXPORT_SYMBOL(pnv_cxl_get_irq_count);
+
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	unsigned int xive_num = hwirq - phb->msi_base;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	if (!(pe = pnv_ioda_get_pe(dev)))
+		return -ENODEV;
+
+	/* Assign XIVE to PE */
+	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
+	if (rc) {
+		pe_warn(pe, "%s: OPAL error %d setting msi_base 0x%x "
+			"hwirq 0x%x XIVE 0x%x PE\n",
+			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
+		return -EIO;
+	}
+	pnv_set_msi_irq_chip(phb, virq);
+
+	return 0;
+}
+EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2115ed7..e0d8103 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -595,7 +595,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
  * but in the meantime, we need to protect them to avoid warnings
  */
 #ifdef CONFIG_PCI_MSI
-static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
+struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -2700,7 +2700,7 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
 }
 
 
-static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
+void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 {
 	struct irq_data *idata;
 	struct irq_chip *ichip;
@@ -2722,159 +2722,6 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	irq_set_chip(virq, &phb->ioda.irq_chip);
 }
 
-#ifdef CONFIG_CXL_BASE
-
-struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-	return of_node_get(hose->dn);
-}
-EXPORT_SYMBOL(pnv_pci_get_phb_node);
-
-int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	struct pnv_ioda_pe *pe;
-	int rc;
-
-	pe = pnv_ioda_get_pe(dev);
-	if (!pe)
-		return -ENODEV;
-
-	pe_info(pe, "Switching PHB to CXL\n");
-
-	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, mode, pe->pe_number);
-	if (rc == OPAL_UNSUPPORTED)
-		dev_err(&dev->dev, "Required cxl mode not supported by firmware - update skiboot\n");
-	else if (rc)
-		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
-
-	return rc;
-}
-EXPORT_SYMBOL(pnv_phb_to_cxl_mode);
-
-/* Find PHB for cxl dev and allocate MSI hwirqs?
- * Returns the absolute hardware IRQ number
- */
-int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
-
-	if (hwirq < 0) {
-		dev_warn(&dev->dev, "Failed to find a free MSI\n");
-		return -ENOSPC;
-	}
-
-	return phb->msi_base + hwirq;
-}
-EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
-
-void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-
-	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
-}
-EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
-
-void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
-				  struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int i, hwirq;
-
-	for (i = 1; i < CXL_IRQ_RANGES; i++) {
-		if (!irqs->range[i])
-			continue;
-		pr_devel("cxl release irq range 0x%x: offset: 0x%lx  limit: %ld\n",
-			 i, irqs->offset[i],
-			 irqs->range[i]);
-		hwirq = irqs->offset[i] - phb->msi_base;
-		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
-				       irqs->range[i]);
-	}
-}
-EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
-
-int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
-			       struct pci_dev *dev, int num)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	int i, hwirq, try;
-
-	memset(irqs, 0, sizeof(struct cxl_irq_ranges));
-
-	/* 0 is reserved for the multiplexed PSL DSI interrupt */
-	for (i = 1; i < CXL_IRQ_RANGES && num; i++) {
-		try = num;
-		while (try) {
-			hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, try);
-			if (hwirq >= 0)
-				break;
-			try /= 2;
-		}
-		if (!try)
-			goto fail;
-
-		irqs->offset[i] = phb->msi_base + hwirq;
-		irqs->range[i] = try;
-		pr_devel("cxl alloc irq range 0x%x: offset: 0x%lx  limit: %li\n",
-			 i, irqs->offset[i], irqs->range[i]);
-		num -= try;
-	}
-	if (num)
-		goto fail;
-
-	return 0;
-fail:
-	pnv_cxl_release_hwirq_ranges(irqs, dev);
-	return -ENOSPC;
-}
-EXPORT_SYMBOL(pnv_cxl_alloc_hwirq_ranges);
-
-int pnv_cxl_get_irq_count(struct pci_dev *dev)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-
-	return phb->msi_bmp.irq_count;
-}
-EXPORT_SYMBOL(pnv_cxl_get_irq_count);
-
-int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
-			   unsigned int virq)
-{
-	struct pci_controller *hose = pci_bus_to_host(dev->bus);
-	struct pnv_phb *phb = hose->private_data;
-	unsigned int xive_num = hwirq - phb->msi_base;
-	struct pnv_ioda_pe *pe;
-	int rc;
-
-	if (!(pe = pnv_ioda_get_pe(dev)))
-		return -ENODEV;
-
-	/* Assign XIVE to PE */
-	rc = opal_pci_set_xive_pe(phb->opal_id, pe->pe_number, xive_num);
-	if (rc) {
-		pe_warn(pe, "%s: OPAL error %d setting msi_base 0x%x "
-			"hwirq 0x%x XIVE 0x%x PE\n",
-			pci_name(dev), rc, phb->msi_base, hwirq, xive_num);
-		return -EIO;
-	}
-	set_msi_irq_chip(phb, virq);
-
-	return 0;
-}
-EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
-#endif
-
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
@@ -2931,7 +2778,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
-	set_msi_irq_chip(phb, virq);
+	pnv_set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %d\n",
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 3a97990..49c2997 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -1,6 +1,10 @@
 #ifndef __POWERNV_PCI_H
 #define __POWERNV_PCI_H
 
+#include <linux/iommu.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+
 struct pci_dn;
 
 enum pnv_phb_type {
@@ -212,6 +216,8 @@ extern void pnv_pci_dma_dev_setup(struct pci_dev *pdev);
 extern void pnv_pci_dma_bus_setup(struct pci_bus *bus);
 extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
+extern struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev);
+extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq);
 
 extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...);
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/15] cxl: Add cxl_slot_is_supported API
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
  2016-07-13 21:17 ` [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [02/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 03/15] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Philippe Bergheaud

From: Ian Munsie <imunsie@au1.ibm.com>

This extends the check that the adapter is in a CAPI capable slot so
that it may be called by external users in the kernel API. This will be
used by the upcoming Mellanox CX4 support, which needs to know ahead of
time if the card can be switched to cxl mode so that it can leave it in
PCI mode if it is not.

This API takes a parameter to check if CAPP DMA mode is supported, which
it currently only allows on P8NVL systems, since that mode currently has
issues accessing memory < 4GB on P8, and we cannot realistically avoid
that.

This API does not currently check if a CAPP unit is available (i.e. not
already assigned to another PHB) on P8. Doing so would be racy since it
is assigned on a first come first serve basis, and so long as CAPP DMA
mode is not supported on P8 we don't need this, since the only
anticipated user of this API requires CAPP DMA mode.

Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

---

V1->V2:
	- Fixed typos in comments spotted by Andrew
---
 drivers/misc/cxl/pci.c | 37 +++++++++++++++++++++++++++++++++++++
 include/misc/cxl.h     | 15 +++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 3a5f980..6ac6b05 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1426,6 +1426,43 @@ static int cxl_slot_is_switched(struct pci_dev *dev)
 	return (depth > CXL_MAX_PCIEX_PARENT);
 }
 
+bool cxl_slot_is_supported(struct pci_dev *dev, int flags)
+{
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
+		return false;
+
+	if ((flags & CXL_SLOT_FLAG_DMA) && (!pvr_version_is(PVR_POWER8NVL))) {
+		/*
+		 * CAPP DMA mode is technically supported on regular P8, but
+		 * will EEH if the card attempts to access memory < 4GB, which
+		 * we cannot realistically avoid. We might be able to work
+		 * around the issue, but until then return unsupported:
+		 */
+		return false;
+	}
+
+	if (cxl_slot_is_switched(dev))
+		return false;
+
+	/*
+	 * XXX: This gets a little tricky on regular P8 (not POWER8NVL) since
+	 * the CAPP can be connected to PHB 0, 1 or 2 on a first come first
+	 * served basis, which is racy to check from here. If we need to
+	 * support this in future we might need to consider having this
+	 * function effectively reserve it ahead of time.
+	 *
+	 * Currently, the only user of this API is the Mellanox CX4, which is
+	 * only supported on P8NVL due to the above mentioned limitation of
+	 * CAPP DMA mode and therefore does not need to worry about this. If the
+	 * issue with CAPP DMA mode is later worked around on P8 we might need
+	 * to revisit this.
+	 */
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(cxl_slot_is_supported);
+
+
 static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	struct cxl *adapter;
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index b6d040f..dd9eebb 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -24,6 +24,21 @@
  * generic PCI API. This API is agnostic to the actual AFU.
  */
 
+#define CXL_SLOT_FLAG_DMA 0x1
+
+/*
+ * Checks if the given card is in a cxl capable slot. Pass CXL_SLOT_FLAG_DMA if
+ * the card requires CAPP DMA mode to also check if the system supports it.
+ * This is intended to be used by bi-modal devices to determine if they can use
+ * cxl mode or if they should continue running in PCI mode.
+ *
+ * Note that this only checks if the slot is cxl capable - it does not
+ * currently check if the CAPP is currently available for chips where it can be
+ * assigned to different PHBs on a first come first serve basis (i.e. P8)
+ */
+bool cxl_slot_is_supported(struct pci_dev *dev, int flags);
+
+
 /* Get the AFU associated with a pci_dev */
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev);
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/15] cxl: Enable bus mastering for devices using CAPP DMA mode
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
  2016-07-13 21:17 ` [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
  2016-07-13 21:17 ` [PATCH 02/15] cxl: Add cxl_slot_is_supported API Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [03/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 04/15] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
master to be enabled in order for the CAPI traffic to flow. This should
be harmless to enable for other cxl devices, so unconditionally enable
it in the adapter init flow.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 6ac6b05..deef9c7 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1264,6 +1264,9 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = adapter->native->sl_ops->adapter_regs_init(adapter, dev)))
 		goto err;
 
+	/* Required for devices using CAPP DMA mode, harmless for others */
+	pci_set_master(dev);
+
 	if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode)))
 		goto err;
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/15] cxl: Move cxl_afu_get / cxl_afu_put to base
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (2 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 03/15] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [04/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 05/15] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 uses a model where the AFU is one physical function of
the device, and is used by other peer physical functions of the same
device. This will require those other devices to grab a reference on the
AFU when they are initialised to make sure that it does not go away
during their lifetime.

Move the AFU refcount functions to base.c so they can be called from
the PHB code.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/base.c | 13 +++++++++++++
 drivers/misc/cxl/cxl.h  | 12 ------------
 include/misc/cxl-base.h |  4 ++++
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index e6f49ac..d7dcf5b 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -54,6 +54,19 @@ static inline void cxl_calls_put(struct cxl_calls *calls) { }
 
 #endif /* CONFIG_CXL_MODULE */
 
+/* AFU refcount management */
+struct cxl_afu *cxl_afu_get(struct cxl_afu *afu)
+{
+	return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+EXPORT_SYMBOL_GPL(cxl_afu_get);
+
+void cxl_afu_put(struct cxl_afu *afu)
+{
+	put_device(&afu->dev);
+}
+EXPORT_SYMBOL_GPL(cxl_afu_put);
+
 void cxl_slbia(struct mm_struct *mm)
 {
 	struct cxl_calls *calls;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 36b3237..d4aae6f 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -440,18 +440,6 @@ struct cxl_afu {
 	bool enabled;
 };
 
-/* AFU refcount management */
-static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu)
-{
-
-	return (get_device(&afu->dev) == NULL) ? NULL : afu;
-}
-
-static inline void  cxl_afu_put(struct cxl_afu *afu)
-{
-	put_device(&afu->dev);
-}
-
 
 struct cxl_irq_name {
 	struct list_head list;
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index 5ae9625..f53808f 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -36,11 +36,15 @@ static inline void cxl_ctx_put(void)
        atomic_dec(&cxl_use_count);
 }
 
+struct cxl_afu *cxl_afu_get(struct cxl_afu *afu);
+void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
 
 #else /* CONFIG_CXL_BASE */
 
 static inline bool cxl_ctx_in_use(void) { return false; }
+static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu) { return NULL; }
+static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
 
 #endif /* CONFIG_CXL_BASE */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/15] cxl: Allow a default context to be associated with an external pci_dev
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (3 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 04/15] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [05/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 06/15] cxl: Do not create vPHB if there are no AFU configuration records Ian Munsie
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The cxl kernel API has a concept of a default context associated with
each PCI device under the virtual PHB. The Mellanox CX4 will also use
the cxl kernel API, but it does not use a virtual PHB - rather, the AFU
appears as a physical function as a peer to the networking functions.

In order to allow the kernel API to work with those networking
functions, we will need to associate a default context with them as
well. To this end, refactor the corresponding code to do this in vphb.c
and export it so that it can be called from the PHB code.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

---

V2->V3:
	Addressed feedback from Andrew Donnellan:
	- Fixed typo in comment
	- Moved _cxl_pci_associate_default_context and
	  _cxl_pci_disable_device from vphb.c to a new file phb.c since
	  they are used by both the vPHB and peer models.
---
 drivers/misc/cxl/Makefile |  2 +-
 drivers/misc/cxl/base.c   | 35 +++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/cxl.h    |  6 ++++++
 drivers/misc/cxl/main.c   |  2 ++
 drivers/misc/cxl/phb.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/vphb.c   | 30 +++---------------------------
 include/misc/cxl-base.h   |  6 ++++++
 7 files changed, 97 insertions(+), 28 deletions(-)
 create mode 100644 drivers/misc/cxl/phb.c

diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index 8a55c1a..56e9a47 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)	+= -Werror
 
 cxl-y				+= main.o file.o irq.o fault.o native.o
 cxl-y				+= context.o sysfs.o debugfs.o pci.o trace.o
-cxl-y				+= vphb.o api.o
+cxl-y				+= vphb.o phb.o api.o
 cxl-$(CONFIG_PPC_PSERIES)	+= flash.o guest.o of.o hcalls.o
 obj-$(CONFIG_CXL)		+= cxl.o
 obj-$(CONFIG_CXL_BASE)		+= base.o
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index d7dcf5b..1c3e737f 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -106,6 +106,41 @@ int cxl_update_properties(struct device_node *dn,
 }
 EXPORT_SYMBOL_GPL(cxl_update_properties);
 
+/*
+ * API calls into the driver that may be called from the PHB code and must be
+ * built in.
+ */
+bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	bool ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return false;
+
+	ret = calls->cxl_pci_associate_default_context(dev, afu);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_pci_associate_default_context);
+
+void cxl_pci_disable_device(struct pci_dev *dev)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_pci_disable_device(dev);
+
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL_GPL(cxl_pci_disable_device);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index d4aae6f..b81f476 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -719,9 +719,15 @@ static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
 ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 				loff_t off, size_t count);
 
+/* Internal functions wrapped in cxl_base to allow PHB to call them */
+bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
+void _cxl_pci_disable_device(struct pci_dev *dev);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
+	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
+	void (*cxl_pci_disable_device)(struct pci_dev *dev);
+
 	struct module *owner;
 };
 int register_cxl_calls(struct cxl_calls *calls);
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index ae68c32..4e5474b 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -110,6 +110,8 @@ static inline void cxl_slbia_core(struct mm_struct *mm)
 
 static struct cxl_calls cxl_calls = {
 	.cxl_slbia = cxl_slbia_core,
+	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
+	.cxl_pci_disable_device = _cxl_pci_disable_device,
 	.owner = THIS_MODULE,
 };
 
diff --git a/drivers/misc/cxl/phb.c b/drivers/misc/cxl/phb.c
new file mode 100644
index 0000000..0935d44
--- /dev/null
+++ b/drivers/misc/cxl/phb.c
@@ -0,0 +1,44 @@
+/*
+ * Copyright 2014-2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/pci.h>
+#include "cxl.h"
+
+bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	struct cxl_context *ctx;
+
+	/*
+	 * Allocate a context to do cxl things to. This is used for interrupts
+	 * in the peer model using a real phb, and if we eventually do DMA ops
+	 * in the virtual phb, we'll need a default context to attach them to.
+	 */
+	ctx = cxl_dev_context_init(dev);
+	if (!ctx)
+		return false;
+	dev->dev.archdata.cxl_ctx = ctx;
+
+	return (cxl_ops->afu_check_and_enable(afu) == 0);
+}
+/* exported via cxl_base */
+
+void _cxl_pci_disable_device(struct pci_dev *dev)
+{
+	struct cxl_context *ctx = cxl_get_context(dev);
+
+	if (ctx) {
+		if (ctx->status == STARTED) {
+			dev_err(&dev->dev, "Default context started\n");
+			return;
+		}
+		dev->dev.archdata.cxl_ctx = NULL;
+		cxl_release_context(ctx);
+	}
+}
+/* exported via cxl_base */
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 012b6aa..c8a759f 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -44,7 +44,6 @@ static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
 {
 	struct pci_controller *phb;
 	struct cxl_afu *afu;
-	struct cxl_context *ctx;
 
 	phb = pci_bus_to_host(dev->bus);
 	afu = (struct cxl_afu *)phb->private_data;
@@ -57,30 +56,7 @@ static bool cxl_pci_enable_device_hook(struct pci_dev *dev)
 	set_dma_ops(&dev->dev, &dma_direct_ops);
 	set_dma_offset(&dev->dev, PAGE_OFFSET);
 
-	/*
-	 * Allocate a context to do cxl things too.  If we eventually do real
-	 * DMA ops, we'll need a default context to attach them to
-	 */
-	ctx = cxl_dev_context_init(dev);
-	if (!ctx)
-		return false;
-	dev->dev.archdata.cxl_ctx = ctx;
-
-	return (cxl_ops->afu_check_and_enable(afu) == 0);
-}
-
-static void cxl_pci_disable_device(struct pci_dev *dev)
-{
-	struct cxl_context *ctx = cxl_get_context(dev);
-
-	if (ctx) {
-		if (ctx->status == STARTED) {
-			dev_err(&dev->dev, "Default context started\n");
-			return;
-		}
-		dev->dev.archdata.cxl_ctx = NULL;
-		cxl_release_context(ctx);
-	}
+	return _cxl_pci_associate_default_context(dev, afu);
 }
 
 static resource_size_t cxl_pci_window_alignment(struct pci_bus *bus,
@@ -197,8 +173,8 @@ static struct pci_controller_ops cxl_pci_controller_ops =
 {
 	.probe_mode = cxl_pci_probe_mode,
 	.enable_device_hook = cxl_pci_enable_device_hook,
-	.disable_device = cxl_pci_disable_device,
-	.release_device = cxl_pci_disable_device,
+	.disable_device = _cxl_pci_disable_device,
+	.release_device = _cxl_pci_disable_device,
 	.window_alignment = cxl_pci_window_alignment,
 	.reset_secondary_bus = cxl_pci_reset_secondary_bus,
 	.setup_msi_irqs = cxl_setup_msi_irqs,
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index f53808f..bb7e629 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -10,6 +10,8 @@
 #ifndef _MISC_CXL_BASE_H
 #define _MISC_CXL_BASE_H
 
+#include <misc/cxl.h>
+
 #ifdef CONFIG_CXL_BASE
 
 #define CXL_IRQ_RANGES 4
@@ -39,6 +41,8 @@ static inline void cxl_ctx_put(void)
 struct cxl_afu *cxl_afu_get(struct cxl_afu *afu);
 void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
+bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
+void cxl_pci_disable_device(struct pci_dev *dev);
 
 #else /* CONFIG_CXL_BASE */
 
@@ -46,6 +50,8 @@ static inline bool cxl_ctx_in_use(void) { return false; }
 static inline struct cxl_afu *cxl_afu_get(struct cxl_afu *afu) { return NULL; }
 static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
+static inline bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu) { return false; }
+static inline void cxl_pci_disable_device(struct pci_dev *dev) {}
 
 #endif /* CONFIG_CXL_BASE */
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/15] cxl: Do not create vPHB if there are no AFU configuration records
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (4 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 05/15] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [06/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The vPHB model of the cxl kernel API is a hierarchy where the AFU is
represented by the vPHB, and it's AFU configuration records are exposed
as functions under that vPHB. If there are no AFU configuration records
we will create a vPHB with nothing under it, which is a waste of
resources and will opt us into EEH handling despite not having anything
special to handle.

This also does not make sense for cards using the peer model of the cxl
kernel API, where the other functions of the device are exposed via
additional peer physical functions rather than AFU configuration
records. This model will also not work with the existing EEH handling in
the cxl driver, as that is designed around the vPHB model.

Skip creating the vPHB for AFUs without any AFU configuration records,
and opt out of EEH handling for them.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
---
 drivers/misc/cxl/pci.c  |  3 +++
 drivers/misc/cxl/vphb.c | 11 +++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index deef9c7..dd7ff22 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1572,6 +1572,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 		 */
 		for (i = 0; i < adapter->slices; i++) {
 			afu = adapter->afu[i];
+			/* Only participate in EEH if we are on a virtual PHB */
+			if (afu->phb == NULL)
+				return PCI_ERS_RESULT_NONE;
 			cxl_vphb_error_detected(afu, state);
 		}
 		return PCI_ERS_RESULT_DISCONNECT;
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index c8a759f..8865e8d 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -188,6 +188,17 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
 	struct device_node *vphb_dn;
 	struct device *parent;
 
+	/*
+	 * If there are no AFU configuration records we won't have anything to
+	 * expose under the vPHB, so skip creating one, returning success since
+	 * this is still a valid case. This will also opt us out of EEH
+	 * handling since we won't have anything special to do if there are no
+	 * kernel drivers attached to the vPHB, and EEH handling is not yet
+	 * supported in the peer model.
+	 */
+	if (!afu->crs_num)
+		return 0;
+
 	/* The parent device is the adapter. Reuse the device node of
 	 * the adapter.
 	 * We don't seem to care what device node is used for the vPHB,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (5 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 06/15] cxl: Do not create vPHB if there are no AFU configuration records Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [07/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 08/15] cxl: Add support for using the kernel API with a real PHB Ian Munsie
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

This adds support for the peer model of the cxl kernel api to the
PowerNV PHB, in which physical function 0 represents the cxl function on
the card (an XSL in the case of the CX4), which other physical functions
will use for memory access and interrupt services. It is referred to as
the peer model as these functions are peers of one another, as opposed
to the Virtual PHB model which forms a hierarchy.

This patch exports APIs to enable the peer mode, check if a PCI device
is attached to a PHB in this mode, and to set and get the peer AFU for
this mode.

The cxl driver will enable this mode for supported cards by calling
pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
that this mode is enabled, and switch out it's controller_ops for the
cxl version.

The cxl version of the controller_ops struct implements it's own
versions of the enable_device_hook and release_device to handle
refcounting on the peer AFU and to allocate a default context for the
device.

Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
there is no safe way to disable cxl mode short of a reboot, so until
that changes there is no reason to support the disable path.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

---

V1->V2:
	- Add an explanation of the peer model to the commit message,
	  and a comment above the pnv_cxl_enable_device_hook function.
V2->V3
	Addressed comments from Andrew Donnellan:
	- Fix typo in comment
	- Changed two exported symbols to EXPORT_SYMBOL_GPL
	- Undid change to remove static from pnv_pci_release_device and
	  pci_controller_ops and declare them in the header, both of
	  which were left over from an earlier cut.
---
 arch/powerpc/include/asm/pnv-pci.h        |   7 ++
 arch/powerpc/platforms/powernv/pci-cxl.c  | 120 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |  18 ++++-
 arch/powerpc/platforms/powernv/pci.h      |  14 ++++
 4 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 791db1b..c47097f 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -38,6 +38,13 @@ int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
 			       struct pci_dev *dev, int num);
 void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
 				  struct pci_dev *dev);
+
+/* Support for the cxl kernel api on the real PHB (instead of vPHB) */
+int pnv_cxl_enable_phb_kernel_api(struct pci_controller *hose, bool enable);
+bool pnv_pci_on_cxl_phb(struct pci_dev *dev);
+struct cxl_afu *pnv_cxl_phb_to_afu(struct pci_controller *hose);
+void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu);
+
 #endif
 
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
index e0eeb00..831bbfb 100644
--- a/arch/powerpc/platforms/powernv/pci-cxl.c
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -7,8 +7,11 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/module.h>
+#include <asm/pci-bridge.h>
 #include <asm/pnv-pci.h>
 #include <asm/opal.h>
+#include <misc/cxl.h>
 
 #include "pci.h"
 
@@ -161,3 +164,120 @@ int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
 	return 0;
 }
 EXPORT_SYMBOL(pnv_cxl_ioda_msi_setup);
+
+/*
+ * Sets flags and switches the controller ops to enable the cxl kernel api.
+ * Originally the cxl kernel API operated on a virtual PHB, but certain cards
+ * such as the Mellanox CX4 use a peer model instead and for these cards the
+ * cxl kernel api will operate on the real PHB.
+ */
+int pnv_cxl_enable_phb_kernel_api(struct pci_controller *hose, bool enable)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct module *cxl_module;
+
+	if (!enable) {
+		/*
+		 * Once cxl mode is enabled on the PHB, there is currently no
+		 * known safe method to disable it again, and trying risks a
+		 * checkstop. If we can find a way to safely disable cxl mode
+		 * in the future we can revisit this, but for now the only sane
+		 * thing to do is to refuse to disable cxl mode:
+		 */
+		return -EPERM;
+	}
+
+	/*
+	 * Hold a reference to the cxl module since several PHB operations now
+	 * depend on it, and it would be insane to allow it to be removed so
+	 * long as we are in this mode (and since we can't safely disable this
+	 * mode once enabled...).
+	 */
+	mutex_lock(&module_mutex);
+	cxl_module = find_module("cxl");
+	if (cxl_module)
+		__module_get(cxl_module);
+	mutex_unlock(&module_mutex);
+	if (!cxl_module)
+		return -ENODEV;
+
+	phb->flags |= PNV_PHB_FLAG_CXL;
+	hose->controller_ops = pnv_cxl_cx4_ioda_controller_ops;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_cxl_enable_phb_kernel_api);
+
+bool pnv_pci_on_cxl_phb(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	return !!(phb->flags & PNV_PHB_FLAG_CXL);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_on_cxl_phb);
+
+struct cxl_afu *pnv_cxl_phb_to_afu(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	return (struct cxl_afu *)phb->cxl_afu;
+}
+EXPORT_SYMBOL_GPL(pnv_cxl_phb_to_afu);
+
+void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	phb->cxl_afu = afu;
+}
+EXPORT_SYMBOL_GPL(pnv_cxl_phb_set_peer_afu);
+
+/*
+ * In the peer cxl model, the XSL/PSL is physical function 0, and will be used
+ * by other functions on the device for memory access and interrupts. When the
+ * other functions are enabled we explicitly take a reference on the cxl
+ * function since they will use it, and allocate a default context associated
+ * with that function just like the vPHB model of the cxl kernel API.
+ */
+bool pnv_cxl_enable_device_hook(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct cxl_afu *afu = phb->cxl_afu;
+
+	if (!pnv_pci_enable_device_hook(dev))
+		return false;
+
+
+	/* No special handling for the cxl function, which is always PF 0 */
+	if (PCI_FUNC(dev->devfn) == 0)
+		return true;
+
+	if (!afu) {
+		dev_WARN(&dev->dev, "Attempted to enable function > 0 on CXL PHB without a peer AFU\n");
+		return false;
+	}
+
+	dev_info(&dev->dev, "Enabling function on CXL enabled PHB with peer AFU\n");
+
+	/* Make sure the peer AFU can't go away while this device is active */
+	cxl_afu_get(afu);
+
+	return cxl_pci_associate_default_context(dev, afu);
+}
+
+void pnv_cxl_disable_device(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct cxl_afu *afu = phb->cxl_afu;
+
+	/* No special handling for cxl function: */
+	if (PCI_FUNC(dev->devfn) == 0)
+		return;
+
+	cxl_pci_disable_device(dev);
+	cxl_afu_put(afu);
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e0d8103..104c040 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3222,7 +3222,7 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
-static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
+bool pnv_pci_enable_device_hook(struct pci_dev *dev)
 {
 	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct pnv_phb *phb = hose->private_data;
@@ -3461,6 +3461,22 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
+#ifdef CONFIG_CXL_BASE
+const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
+	.dma_dev_setup		= pnv_pci_dma_dev_setup,
+	.dma_bus_setup		= pnv_pci_dma_bus_setup,
+	.enable_device_hook	= pnv_cxl_enable_device_hook,
+	.disable_device		= pnv_cxl_disable_device,
+	.release_device		= pnv_pci_release_device,
+	.window_alignment	= pnv_pci_window_alignment,
+	.setup_bridge		= pnv_pci_setup_bridge,
+	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
+	.dma_set_mask		= pnv_pci_ioda_dma_set_mask,
+	.dma_get_required_mask	= pnv_pci_ioda_dma_get_required_mask,
+	.shutdown		= pnv_pci_ioda_shutdown,
+};
+#endif
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 49c2997..20c9a6b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -76,6 +76,7 @@ struct pnv_ioda_pe {
 };
 
 #define PNV_PHB_FLAG_EEH	(1 << 0)
+#define PNV_PHB_FLAG_CXL	(1 << 1) /* Real PHB supporting the cxl kernel API */
 
 struct pnv_phb {
 	struct pci_controller	*hose;
@@ -177,6 +178,9 @@ struct pnv_phb {
 		struct OpalIoP7IOCErrorData 	hub_diag;
 	} diag;
 
+#ifdef CONFIG_CXL_BASE
+	struct cxl_afu *cxl_afu;
+#endif
 };
 
 extern struct pci_ops pnv_pci_ops;
@@ -218,6 +222,7 @@ extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
 extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
 extern struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev);
 extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq);
+extern bool pnv_pci_enable_device_hook(struct pci_dev *dev);
 
 extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...);
@@ -238,4 +243,13 @@ extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
 extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
 extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
 
+
+/* cxl functions */
+extern bool pnv_cxl_enable_device_hook(struct pci_dev *dev);
+extern void pnv_cxl_disable_device(struct pci_dev *dev);
+
+
+/* phb ops (cxl switches these when enabling the kernel api on the phb) */
+extern const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops;
+
 #endif /* __POWERNV_PCI_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/15] cxl: Add support for using the kernel API with a real PHB
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (6 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [08/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 09/15] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

This hooks up support for using the kernel API with a real PHB. After
the AFU initialisation has completed it calls into the PHB code to pass
it the AFU that will be used by other peer physical functions on the
adapter.

The cxl_pci_to_afu API is extended to work with peer PCI devices,
retrieving the peer AFU from the PHB. This API may also now return an
error if it is called on a PCI device that is not associated with either
a cxl vPHB or a peer PCI device to an AFU, and this error is propagated
down.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

---
V1->V2:
	- Removed change to skip participating in EEH without a vPHB -
	  moved out into an earlier patch.
---
 drivers/misc/cxl/api.c  |  5 +++++
 drivers/misc/cxl/pci.c  |  3 +++
 drivers/misc/cxl/vphb.c | 16 ++++++++++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 7707055..6a030bf 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -13,6 +13,7 @@
 #include <linux/file.h>
 #include <misc/cxl.h>
 #include <linux/fs.h>
+#include <asm/pnv-pci.h>
 
 #include "cxl.h"
 
@@ -24,6 +25,8 @@ struct cxl_context *cxl_dev_context_init(struct pci_dev *dev)
 	int rc;
 
 	afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return ERR_CAST(afu);
 
 	ctx = cxl_context_alloc();
 	if (IS_ERR(ctx)) {
@@ -438,6 +441,8 @@ EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
 ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count)
 {
 	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
 
 	return cxl_ops->read_adapter_vpd(afu->adapter, buf, count);
 }
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index dd7ff22..cb5d172 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1502,6 +1502,9 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			dev_err(&dev->dev, "AFU %i failed to start: %i\n", slice, rc);
 	}
 
+	if (pnv_pci_on_cxl_phb(dev) && adapter->slices >= 1)
+		pnv_cxl_phb_set_peer_afu(dev, adapter->afu[0]);
+
 	return 0;
 }
 
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 8865e8d..dee8def 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -9,6 +9,7 @@
 
 #include <linux/pci.h>
 #include <misc/cxl.h>
+#include <asm/pnv-pci.h>
 #include "cxl.h"
 
 static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
@@ -258,13 +259,18 @@ void cxl_pci_vphb_remove(struct cxl_afu *afu)
 	pcibios_free_controller(phb);
 }
 
+static bool _cxl_pci_is_vphb_device(struct pci_controller *phb)
+{
+	return (phb->ops == &cxl_pcie_pci_ops);
+}
+
 bool cxl_pci_is_vphb_device(struct pci_dev *dev)
 {
 	struct pci_controller *phb;
 
 	phb = pci_bus_to_host(dev->bus);
 
-	return (phb->ops == &cxl_pcie_pci_ops);
+	return _cxl_pci_is_vphb_device(phb);
 }
 
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev)
@@ -273,7 +279,13 @@ struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev)
 
 	phb = pci_bus_to_host(dev->bus);
 
-	return (struct cxl_afu *)phb->private_data;
+	if (_cxl_pci_is_vphb_device(phb))
+		return (struct cxl_afu *)phb->private_data;
+
+	if (pnv_pci_on_cxl_phb(dev))
+		return pnv_cxl_phb_to_afu(phb);
+
+	return ERR_PTR(-ENODEV);
 }
 EXPORT_SYMBOL_GPL(cxl_pci_to_afu);
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/15] cxl: Add kernel APIs to get & set the max irqs per context
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (7 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 08/15] cxl: Add support for using the kernel API with a real PHB Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [09/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 10/15] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

These APIs will be used by the Mellanox CX4 support. While they function
standalone to configure existing behaviour, their primary purpose is to
allow the Mellanox driver to inform the cxl driver of a hardware
limitation, which will be used in a future patch.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
---
 drivers/misc/cxl/api.c | 27 +++++++++++++++++++++++++++
 include/misc/cxl.h     | 10 ++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 6a030bf..1e2c0d9 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -447,3 +447,30 @@ ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count)
 	return cxl_ops->read_adapter_vpd(afu->adapter, buf, count);
 }
 EXPORT_SYMBOL_GPL(cxl_read_adapter_vpd);
+
+int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs)
+{
+	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
+
+	if (irqs > afu->adapter->user_irqs)
+		return -EINVAL;
+
+	/* Limit user_irqs to prevent the user increasing this via sysfs */
+	afu->adapter->user_irqs = irqs;
+	afu->irqs_max = irqs;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_set_max_irqs_per_process);
+
+int cxl_get_max_irqs_per_process(struct pci_dev *dev)
+{
+	struct cxl_afu *afu = cxl_pci_to_afu(dev);
+	if (IS_ERR(afu))
+		return -ENODEV;
+
+	return afu->irqs_max;
+}
+EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index dd9eebb..fc07ed4 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -166,6 +166,16 @@ void cxl_psa_unmap(void __iomem *addr);
 /*  Get the process element for this context */
 int cxl_process_element(struct cxl_context *ctx);
 
+/*
+ * Limit the number of interrupts that a single context can allocate via
+ * cxl_start_work. If using the api with a real phb, this may be used to
+ * request that additional default contexts be created when allocating
+ * interrupts via pci_enable_msix_range. These will be set to the same running
+ * state as the default context, and if that is running it will reuse the
+ * parameters previously passed to cxl_start_context for the default context.
+ */
+int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs);
+int cxl_get_max_irqs_per_process(struct pci_dev *dev);
 
 /*
  * These calls allow drivers to create their own file descriptors and make them
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/15] cxl: Add preliminary workaround for CX4 interrupt limitation
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (8 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 09/15] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [10/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 has a hardware limitation where only 4 bits of the
AFU interrupt number can be passed to the XSL when sending an interrupt,
limiting it to only 15 interrupts per context (AFU interrupt number 0 is
invalid).

In order to overcome this, we will allocate additional contexts linked
to the default context as extra address space for the extra interrupts -
this will be implemented in the next patch.

This patch adds the preliminary support to allow this, by way of adding
a linked list in the context structure that we use to keep track of the
contexts dedicated to interrupts, and an API to simultaneously iterate
over the related context structures, AFU interrupt numbers and hardware
interrupt numbers. The point of using a single API to iterate these is
to hide some of the details of the iteration from external code, and to
reduce the number of APIs that need to be exported via base.c to allow
built in code to call.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

---

V1->V2:
	- Fixed typo spotted by Fred
---
 drivers/misc/cxl/api.c     | 15 +++++++++++++++
 drivers/misc/cxl/base.c    | 17 +++++++++++++++++
 drivers/misc/cxl/context.c |  1 +
 drivers/misc/cxl/cxl.h     | 10 ++++++++++
 drivers/misc/cxl/main.c    |  1 +
 include/misc/cxl.h         |  9 +++++++++
 6 files changed, 53 insertions(+)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 1e2c0d9..f02a859 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -97,6 +97,21 @@ static irq_hw_number_t cxl_find_afu_irq(struct cxl_context *ctx, int num)
 	return 0;
 }
 
+int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq)
+{
+	if (*ctx == NULL || *afu_irq == 0) {
+		*afu_irq = 1;
+		*ctx = cxl_get_context(pdev);
+	} else {
+		(*afu_irq)++;
+		if (*afu_irq > cxl_get_max_irqs_per_process(pdev)) {
+			*ctx = list_next_entry(*ctx, extra_irq_contexts);
+			*afu_irq = 1;
+		}
+	}
+	return cxl_find_afu_irq(*ctx, *afu_irq);
+}
+/* Exported via cxl_base */
 
 int cxl_set_priv(struct cxl_context *ctx, void *priv)
 {
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index 1c3e737f..d7d9d02 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -141,6 +141,23 @@ void cxl_pci_disable_device(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(cxl_pci_disable_device);
 
+int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq)
+{
+	int ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return -EBUSY;
+
+	ret = calls->cxl_next_msi_hwirq(pdev, ctx, afu_irq);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_next_msi_hwirq);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index edbb99e..2616cddb 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -68,6 +68,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 	ctx->pending_afu_err = false;
 
 	INIT_LIST_HEAD(&ctx->irq_names);
+	INIT_LIST_HEAD(&ctx->extra_irq_contexts);
 
 	/*
 	 * When we have to destroy all contexts in cxl_context_detach_all() we
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index b81f476..73b9a55 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -537,6 +537,14 @@ struct cxl_context {
 	atomic_t afu_driver_events;
 
 	struct rcu_head rcu;
+
+	/*
+	 * Only used when more interrupts are allocated via
+	 * pci_enable_msix_range than are supported in the default context, to
+	 * use additional contexts to overcome the limitation. i.e. Mellanox
+	 * CX4 only:
+	 */
+	struct list_head extra_irq_contexts;
 };
 
 struct cxl_service_layer_ops {
@@ -722,11 +730,13 @@ ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 /* Internal functions wrapped in cxl_base to allow PHB to call them */
 bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void _cxl_pci_disable_device(struct pci_dev *dev);
+int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
 	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
 	void (*cxl_pci_disable_device)(struct pci_dev *dev);
+	int (*cxl_next_msi_hwirq)(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
 
 	struct module *owner;
 };
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 4e5474b..66fac71 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -112,6 +112,7 @@ static struct cxl_calls cxl_calls = {
 	.cxl_slbia = cxl_slbia_core,
 	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
 	.cxl_pci_disable_device = _cxl_pci_disable_device,
+	.cxl_next_msi_hwirq = _cxl_next_msi_hwirq,
 	.owner = THIS_MODULE,
 };
 
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index fc07ed4..6c52cbc 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -178,6 +178,15 @@ int cxl_set_max_irqs_per_process(struct pci_dev *dev, int irqs);
 int cxl_get_max_irqs_per_process(struct pci_dev *dev);
 
 /*
+ * Use to simultaneously iterate over hardware interrupt numbers, contexts and
+ * afu interrupt numbers allocated for the device via pci_enable_msix_range and
+ * is a useful convenience function when working with hardware that has
+ * limitations on the number of interrupts per process. *ctx and *afu_irq
+ * should be NULL and 0 to start the iteration.
+ */
+int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+
+/*
  * These calls allow drivers to create their own file descriptors and make them
  * identical to the cxl file descriptor user API. An example use case:
  *
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (9 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 10/15] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-14  5:34   ` Andrew Donnellan
  2016-07-15 10:53   ` [11/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
interrupts are routed from the networking hardware to the XSL using the
MSIX table, and from there will be transformed back into an MSIX
interrupt using the cxl style interrupts (i.e. using IVTE entries and
ranges to map a PE and AFU interrupt number to an MSIX address).

We want to hide the implementation details of cxl interrupts as much as
possible. To this end, we use a special version of the MSI setup &
teardown routines in the PHB while in cxl mode to allocate the cxl
interrupts and configure the IVTE entries in the process element.

This function does not configure the MSIX table - the CX4 card uses a
custom format in that table and it would not be appropriate to fill that
out in generic code. The rest of the functionality is similar to the
"Full MSI-X mode" described in the CAIA, and this could be easily
extended to support other adapters that use that mode in the future.

The interrupts will be associated with the default context. If the
maximum number of interrupts per context has been limited (e.g. by the
mlx5 driver), it will automatically allocate additional kernel contexts
to associate extra interrupts as required. These contexts will be
started using the same WED that was used to start the default context.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

---

V1->V2:
	- Handle error case if cxl_next_msi_hwirq returns 0 signifying
	  that an AFU IRQ is not mapped to a hardware interrupt.
---
 arch/powerpc/platforms/powernv/pci-cxl.c  | 84 +++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |  4 ++
 arch/powerpc/platforms/powernv/pci.h      |  2 +
 drivers/misc/cxl/api.c                    | 71 ++++++++++++++++++++++++++
 drivers/misc/cxl/base.c                   | 31 ++++++++++++
 drivers/misc/cxl/cxl.h                    |  4 ++
 drivers/misc/cxl/main.c                   |  2 +
 include/misc/cxl-base.h                   |  4 ++
 8 files changed, 202 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c b/arch/powerpc/platforms/powernv/pci-cxl.c
index 831bbfb..3f34207 100644
--- a/arch/powerpc/platforms/powernv/pci-cxl.c
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/msi.h>
 #include <asm/pci-bridge.h>
 #include <asm/pnv-pci.h>
 #include <asm/opal.h>
@@ -281,3 +282,86 @@ void pnv_cxl_disable_device(struct pci_dev *dev)
 	cxl_pci_disable_device(dev);
 	cxl_afu_put(afu);
 }
+
+/*
+ * This is a special version of pnv_setup_msi_irqs for cards in cxl mode. This
+ * function handles setting up the IVTE entries for the XSL to use.
+ *
+ * We are currently not filling out the MSIX table, since the only currently
+ * supported adapter (CX4) uses a custom MSIX table format in cxl mode and it
+ * is up to their driver to fill that out. In the future we may fill out the
+ * MSIX table (and change the IVTE entries to be an index to the MSIX table)
+ * for adapters implementing the Full MSI-X mode described in the CAIA.
+ */
+int pnv_cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+	struct cxl_context *ctx = NULL;
+	unsigned int virq;
+	int hwirq;
+	int afu_irq = 0;
+	int rc;
+
+	if (WARN_ON(!phb) || !phb->msi_bmp.bitmap)
+		return -ENODEV;
+
+	if (pdev->no_64bit_msi && !phb->msi32_support)
+		return -ENODEV;
+
+	rc = cxl_cx4_setup_msi_irqs(pdev, nvec, type);
+	if (rc)
+		return rc;
+
+	for_each_pci_msi_entry(entry, pdev) {
+		if (!entry->msi_attrib.is_64 && !phb->msi32_support) {
+			pr_warn("%s: Supports only 64-bit MSIs\n",
+				pci_name(pdev));
+			return -ENXIO;
+		}
+
+		hwirq = cxl_next_msi_hwirq(pdev, &ctx, &afu_irq);
+		if (WARN_ON(hwirq <= 0))
+			return (hwirq ? hwirq : -ENOMEM);
+
+		virq = irq_create_mapping(NULL, hwirq);
+		if (virq == NO_IRQ) {
+			pr_warn("%s: Failed to map cxl mode MSI to linux irq\n",
+				pci_name(pdev));
+			return -ENOMEM;
+		}
+
+		rc = pnv_cxl_ioda_msi_setup(pdev, hwirq, virq);
+		if (rc) {
+			pr_warn("%s: Failed to setup cxl mode MSI\n", pci_name(pdev));
+			irq_dispose_mapping(virq);
+			return rc;
+		}
+
+		irq_set_msi_desc(virq, entry);
+	}
+
+	return 0;
+}
+
+void pnv_cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct msi_desc *entry;
+	irq_hw_number_t hwirq;
+
+	if (WARN_ON(!phb))
+		return;
+
+	for_each_pci_msi_entry(entry, pdev) {
+		if (entry->irq == NO_IRQ)
+			continue;
+		hwirq = virq_to_hw(entry->irq);
+		irq_set_msi_desc(entry->irq, NULL);
+		irq_dispose_mapping(entry->irq);
+	}
+
+	cxl_cx4_teardown_msi_irqs(pdev);
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 104c040..530d4af 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3465,6 +3465,10 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_dma_bus_setup,
+#ifdef CONFIG_PCI_MSI
+	.setup_msi_irqs		= pnv_cxl_cx4_setup_msi_irqs,
+	.teardown_msi_irqs	= pnv_cxl_cx4_teardown_msi_irqs,
+#endif
 	.enable_device_hook	= pnv_cxl_enable_device_hook,
 	.disable_device		= pnv_cxl_disable_device,
 	.release_device		= pnv_pci_release_device,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 20c9a6b..f0c276c 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -247,6 +247,8 @@ extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
 /* cxl functions */
 extern bool pnv_cxl_enable_device_hook(struct pci_dev *dev);
 extern void pnv_cxl_disable_device(struct pci_dev *dev);
+extern int pnv_cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+extern void pnv_cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 
 /* phb ops (cxl switches these when enabling the kernel api on the phb) */
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index f02a859..f3d34b9 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -14,6 +14,7 @@
 #include <misc/cxl.h>
 #include <linux/fs.h>
 #include <asm/pnv-pci.h>
+#include <linux/msi.h>
 
 #include "cxl.h"
 
@@ -489,3 +490,73 @@ int cxl_get_max_irqs_per_process(struct pci_dev *dev)
 	return afu->irqs_max;
 }
 EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
+
+/*
+ * This is a special interrupt allocation routine called from the PHB's MSI
+ * setup function. When capi interrupts are allocated in this manner they must
+ * still be associated with a running context, but since the MSI APIs have no
+ * way to specify this we use the default context associated with the device.
+ *
+ * The Mellanox CX4 has a hardware limitation that restricts the maximum AFU
+ * interrupt number, so in order to overcome this their driver informs us of
+ * the restriction by setting the maximum interrupts per context, and we
+ * allocate additional contexts as necessary so that we can keep the AFU
+ * interrupt number within the supported range.
+ */
+int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	struct cxl_context *ctx, *new_ctx, *default_ctx;
+	int remaining;
+	int rc;
+
+	ctx = default_ctx = cxl_get_context(pdev);
+	if (WARN_ON(!default_ctx))
+		return -ENODEV;
+
+	remaining = nvec;
+	while (remaining > 0) {
+		rc = cxl_allocate_afu_irqs(ctx, min(remaining, ctx->afu->irqs_max));
+		if (rc) {
+			pr_warn("%s: Failed to find enough free MSIs\n", pci_name(pdev));
+			return rc;
+		}
+		remaining -= ctx->afu->irqs_max;
+
+		if (ctx != default_ctx && default_ctx->status == STARTED) {
+			WARN_ON(cxl_start_context(ctx,
+				be64_to_cpu(default_ctx->elem->common.wed),
+				NULL));
+		}
+
+		if (remaining > 0) {
+			new_ctx = cxl_dev_context_init(pdev);
+			if (!new_ctx) {
+				pr_warn("%s: Failed to allocate enough contexts for MSIs\n", pci_name(pdev));
+				return -ENOSPC;
+			}
+			list_add(&new_ctx->extra_irq_contexts, &ctx->extra_irq_contexts);
+			ctx = new_ctx;
+		}
+	}
+
+	return 0;
+}
+/* Exported via cxl_base */
+
+void _cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct cxl_context *ctx, *pos, *tmp;
+
+	ctx = cxl_get_context(pdev);
+	if (WARN_ON(!ctx))
+		return;
+
+	cxl_free_afu_irqs(ctx);
+	list_for_each_entry_safe(pos, tmp, &ctx->extra_irq_contexts, extra_irq_contexts) {
+		cxl_stop_context(pos);
+		cxl_free_afu_irqs(pos);
+		list_del(&pos->extra_irq_contexts);
+		cxl_release_context(pos);
+	}
+}
+/* Exported via cxl_base */
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index d7d9d02..43ec3a9 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -158,6 +158,37 @@ int cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_
 }
 EXPORT_SYMBOL_GPL(cxl_next_msi_hwirq);
 
+int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+{
+	int ret;
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return false;
+
+	ret = calls->cxl_cx4_setup_msi_irqs(pdev, nvec, type);
+
+	cxl_calls_put(calls);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cxl_cx4_setup_msi_irqs);
+
+void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	calls->cxl_cx4_teardown_msi_irqs(pdev);
+
+	cxl_calls_put(calls);
+}
+EXPORT_SYMBOL_GPL(cxl_cx4_teardown_msi_irqs);
+
 static int __init cxl_base_init(void)
 {
 	struct device_node *np = NULL;
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 73b9a55..d50cdb1 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -731,12 +731,16 @@ ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
 bool _cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void _cxl_pci_disable_device(struct pci_dev *dev);
 int _cxl_next_msi_hwirq(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+void _cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 struct cxl_calls {
 	void (*cxl_slbia)(struct mm_struct *mm);
 	bool (*cxl_pci_associate_default_context)(struct pci_dev *dev, struct cxl_afu *afu);
 	void (*cxl_pci_disable_device)(struct pci_dev *dev);
 	int (*cxl_next_msi_hwirq)(struct pci_dev *pdev, struct cxl_context **ctx, int *afu_irq);
+	int (*cxl_cx4_setup_msi_irqs)(struct pci_dev *pdev, int nvec, int type);
+	void (*cxl_cx4_teardown_msi_irqs)(struct pci_dev *pdev);
 
 	struct module *owner;
 };
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 66fac71..d9be23b2 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -113,6 +113,8 @@ static struct cxl_calls cxl_calls = {
 	.cxl_pci_associate_default_context = _cxl_pci_associate_default_context,
 	.cxl_pci_disable_device = _cxl_pci_disable_device,
 	.cxl_next_msi_hwirq = _cxl_next_msi_hwirq,
+	.cxl_cx4_setup_msi_irqs = _cxl_cx4_setup_msi_irqs,
+	.cxl_cx4_teardown_msi_irqs = _cxl_cx4_teardown_msi_irqs,
 	.owner = THIS_MODULE,
 };
 
diff --git a/include/misc/cxl-base.h b/include/misc/cxl-base.h
index bb7e629..b2ebc91 100644
--- a/include/misc/cxl-base.h
+++ b/include/misc/cxl-base.h
@@ -43,6 +43,8 @@ void cxl_afu_put(struct cxl_afu *afu);
 void cxl_slbia(struct mm_struct *mm);
 bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu);
 void cxl_pci_disable_device(struct pci_dev *dev);
+int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
+void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev);
 
 #else /* CONFIG_CXL_BASE */
 
@@ -52,6 +54,8 @@ static inline void cxl_afu_put(struct cxl_afu *afu) {}
 static inline void cxl_slbia(struct mm_struct *mm) {}
 static inline bool cxl_pci_associate_default_context(struct pci_dev *dev, struct cxl_afu *afu) { return false; }
 static inline void cxl_pci_disable_device(struct pci_dev *dev) {}
+static inline int cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) { return -ENODEV; }
+static inline void cxl_cx4_teardown_msi_irqs(struct pci_dev *pdev) {}
 
 #endif /* CONFIG_CXL_BASE */
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (10 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [12/15] " Michael Ellerman
  2016-07-28  1:48   ` [PATCH 12/15] " Andrew Donnellan
  2016-07-13 21:17 ` [PATCH 13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

From: Ian Munsie <imunsie@au1.ibm.com>

The CX4 card cannot cope with a context with PE=0 due to a hardware
limitation, resulting in:

[   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
[   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting

Since the kernel API allocates a default context very early during
device init that will almost certainly get Process Element ID 0 there is
no easy way for us to extend the API to allow the Mellanox to inform us
of this limitation ahead of time.

Instead, work around the issue by extending the XSL structure to include
a minimum PE to allocate. Although the bug is not in the XSL, it is the
easiest place to work around this limitation given that the CX4 is
currently the only card that uses an XSL.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/context.c | 3 ++-
 drivers/misc/cxl/cxl.h     | 1 +
 drivers/misc/cxl/pci.c     | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 2616cddb..bdee9a0 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -90,7 +90,8 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 	 */
 	mutex_lock(&afu->contexts_lock);
 	idr_preload(GFP_KERNEL);
-	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
+	i = idr_alloc(&ctx->afu->contexts_idr, ctx,
+		      ctx->afu->adapter->native->sl_ops->min_pe,
 		      ctx->afu->num_procs, GFP_NOWAIT);
 	idr_preload_end();
 	mutex_unlock(&afu->contexts_lock);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index d50cdb1..de09053 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -561,6 +561,7 @@ struct cxl_service_layer_ops {
 	u64 (*timebase_read)(struct cxl *adapter);
 	int capi_mode;
 	bool needs_reset_before_disable;
+	int min_pe;
 };
 
 struct cxl_native {
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index cb5d172..efe202f 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1321,6 +1321,7 @@ static const struct cxl_service_layer_ops xsl_ops = {
 	.write_timebase_ctrl = write_timebase_ctrl_xsl,
 	.timebase_read = timebase_read_xsl,
 	.capi_mode = OPAL_PHB_CAPI_MODE_DMA,
+	.min_pe = 1, /* Workaround for Mellanox CX4 HW bug */
 };
 
 static void set_sl_ops(struct cxl *adapter, struct pci_dev *dev)
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (11 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [13/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
  2016-07-13 21:17 ` [PATCH 15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Gavin Shan, linux-pci, Bjorn Helgaas

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

The cxl driver will use infrastructure from pnv_php to handle device tree
updates when switching bi-modal CAPI cards into CAPI mode.

To enable this, export pnv_php_find_slot() and
pnv_php_set_slot_power_state(), and add corresponding declarations, as well
as the definition of struct pnv_php_slot, to asm/pnv-pci.h.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

---

V1->V2:

	- Dropped extraneous "select HOTPLUG_PCI_POWERNV_BASE" in Kconfig,
	  which was accidentally left in from an earlier non-public
	  revision. Thanks to Gavin Shan for pointing it out
---
 arch/powerpc/include/asm/pnv-pci.h | 28 ++++++++++++++++++++++++++++
 drivers/pci/hotplug/pnv_php.c      | 32 +++++---------------------------
 2 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index c47097f..0cbd813 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -11,6 +11,7 @@
 #define _ASM_PNV_PCI_H
 
 #include <linux/pci.h>
+#include <linux/pci_hotplug.h>
 #include <misc/cxl-base.h>
 #include <asm/opal-api.h>
 
@@ -47,4 +48,31 @@ void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu);
 
 #endif
 
+struct pnv_php_slot {
+	struct hotplug_slot		slot;
+	struct hotplug_slot_info	slot_info;
+	uint64_t			id;
+	char				*name;
+	int				slot_no;
+	struct kref			kref;
+#define PNV_PHP_STATE_INITIALIZED	0
+#define PNV_PHP_STATE_REGISTERED	1
+#define PNV_PHP_STATE_POPULATED		2
+#define PNV_PHP_STATE_OFFLINE		3
+	int				state;
+	struct device_node		*dn;
+	struct pci_dev			*pdev;
+	struct pci_bus			*bus;
+	bool				power_state_check;
+	void				*fdt;
+	void				*dt;
+	struct of_changeset		ocs;
+	struct pnv_php_slot		*parent;
+	struct list_head		children;
+	struct list_head		link;
+};
+extern struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn);
+extern int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
+					uint8_t state);
+
 #endif
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 6086db6..2d2f704 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -22,30 +22,6 @@
 #define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
 #define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
 
-struct pnv_php_slot {
-	struct hotplug_slot		slot;
-	struct hotplug_slot_info	slot_info;
-	uint64_t			id;
-	char				*name;
-	int				slot_no;
-	struct kref			kref;
-#define PNV_PHP_STATE_INITIALIZED	0
-#define PNV_PHP_STATE_REGISTERED	1
-#define PNV_PHP_STATE_POPULATED		2
-#define PNV_PHP_STATE_OFFLINE		3
-	int				state;
-	struct device_node		*dn;
-	struct pci_dev			*pdev;
-	struct pci_bus			*bus;
-	bool				power_state_check;
-	void				*fdt;
-	void				*dt;
-	struct of_changeset		ocs;
-	struct pnv_php_slot		*parent;
-	struct list_head		children;
-	struct list_head		link;
-};
-
 static LIST_HEAD(pnv_php_slot_list);
 static DEFINE_SPINLOCK(pnv_php_lock);
 
@@ -91,7 +67,7 @@ static struct pnv_php_slot *pnv_php_match(struct device_node *dn,
 	return NULL;
 }
 
-static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
+struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
 {
 	struct pnv_php_slot *php_slot, *tmp;
 	unsigned long flags;
@@ -108,6 +84,7 @@ static struct pnv_php_slot *pnv_php_find_slot(struct device_node *dn)
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(pnv_php_find_slot);
 
 /*
  * Remove pdn for all children of the indicated device node.
@@ -316,8 +293,8 @@ out:
 	return ret;
 }
 
-static int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
-					uint8_t state)
+int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
+				 uint8_t state)
 {
 	struct pnv_php_slot *php_slot = slot->private;
 	struct opal_msg msg;
@@ -347,6 +324,7 @@ static int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state);
 
 static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
 {
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (12 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [14/15] " Michael Ellerman
  2016-07-13 21:17 ` [PATCH 15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Gavin Shan, linux-pci, Bjorn Helgaas

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

When calling pnv_php_set_slot_power_state() with state ==
OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're
dealing with OPAL_PCI_SLOT_POWER_OFF.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 2d2f704..e6245b0 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -317,7 +317,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 		return ret;
 	}
 
-	if (state == OPAL_PCI_SLOT_POWER_OFF)
+	if (state == OPAL_PCI_SLOT_POWER_OFF || state == OPAL_PCI_SLOT_OFFLINE)
 		pnv_php_rmv_devtree(php_slot);
 	else
 		ret = pnv_php_add_devtree(php_slot);
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
                   ` (13 preceding siblings ...)
  2016-07-13 21:17 ` [PATCH 14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
@ 2016-07-13 21:17 ` Ian Munsie
  2016-07-15 10:53   ` [15/15] " Michael Ellerman
  14 siblings, 1 reply; 33+ messages in thread
From: Ian Munsie @ 2016-07-13 21:17 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling, Frederic Barrat,
	Andrew Donnellan, linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Gavin Shan

From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Add a new API, cxl_check_and_switch_mode() to allow for switching of
bi-modal CAPI cards, such as the Mellanox CX-4 network card.

When a driver requests to switch a card to CAPI mode, use PCI hotplug
infrastructure to remove all PCI devices underneath the slot. We then write
an updated mode control register to the CAPI VSEC, hot reset the card, and
reprobe the card.

As the card may present a different set of PCI devices after the mode
switch, use the infrastructure provided by the pnv_php driver and the OPAL
PCI slot management facilities to ensure that:

  * the old devices are removed from both the OPAL and Linux device trees
  * the new devices are probed by OPAL and added to the OPAL device tree
  * the new devices are added to the Linux device tree and probed through
    the regular PCI device probe path

As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on
the pnv_php driver.

Refactor existing code that touches the mode control register in the
regular single mode case into a new function, setup_cxl_protocol_area().

Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

---

V1->V2:

	- Added comments around pci_dev_put() - suggested by Frederic
	  Barrat

	- Added new error label for error paths calling pci_dev_put() -
	  suggested by Ian Munsie

	- Added newline at end of Kconfig

	- Removed extraneous comment in setup_cxl_protocol_area()
---
 drivers/misc/cxl/Kconfig |   8 ++
 drivers/misc/cxl/pci.c   | 236 +++++++++++++++++++++++++++++++++++++++++++----
 include/misc/cxl.h       |  25 +++++
 3 files changed, 251 insertions(+), 18 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 560412c..8d76770 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -38,3 +38,11 @@ config CXL
 	  CAPI adapters are found in POWER8 based systems.
 
 	  If unsure, say N.
+
+config CXL_BIMODAL
+	bool "Support for bi-modal CAPI cards"
+	depends on HOTPLUG_PCI_POWERNV = y && CXL || HOTPLUG_PCI_POWERNV = m && CXL = m
+	default y
+	help
+	  Select this option to enable support for bi-modal CAPI cards, such as
+	  the Mellanox CX-4.
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index efe202f..d152e2d 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -55,6 +55,8 @@
 	pci_read_config_byte(dev, vsec + 0xa, dest)
 #define CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val) \
 	pci_write_config_byte(dev, vsec + 0xa, val)
+#define CXL_WRITE_VSEC_MODE_CONTROL_BUS(bus, devfn, vsec, val) \
+	pci_bus_write_config_byte(bus, devfn, vsec + 0xa, val)
 #define CXL_VSEC_PROTOCOL_MASK   0xe0
 #define CXL_VSEC_PROTOCOL_1024TB 0x80
 #define CXL_VSEC_PROTOCOL_512TB  0x40
@@ -614,36 +616,234 @@ static int setup_cxl_bars(struct pci_dev *dev)
 	return 0;
 }
 
-/* pciex node: ibm,opal-m64-window = <0x3d058 0x0 0x3d058 0x0 0x8 0x0>; */
-static int switch_card_to_cxl(struct pci_dev *dev)
-{
+#ifdef CONFIG_CXL_BIMODAL
+
+struct cxl_switch_work {
+	struct pci_dev *dev;
+	struct work_struct work;
 	int vsec;
+	int mode;
+};
+
+static void switch_card_to_cxl(struct work_struct *work)
+{
+	struct cxl_switch_work *switch_work =
+		container_of(work, struct cxl_switch_work, work);
+	struct pci_dev *dev = switch_work->dev;
+	struct pci_bus *bus = dev->bus;
+	struct pci_controller *hose = pci_bus_to_host(bus);
+	struct pci_dev *bridge;
+	struct pnv_php_slot *php_slot;
+	unsigned int devfn;
 	u8 val;
 	int rc;
 
-	dev_info(&dev->dev, "switch card to CXL\n");
+	dev_info(&bus->dev, "cxl: Preparing for mode switch...\n");
+	bridge = list_first_entry_or_null(&hose->bus->devices, struct pci_dev,
+					  bus_list);
+	if (!bridge) {
+		dev_WARN(&bus->dev, "cxl: Couldn't find root port!\n");
+		goto err_dev_put;
+	}
 
-	if (!(vsec = find_cxl_vsec(dev))) {
-		dev_err(&dev->dev, "ABORTING: CXL VSEC not found!\n");
+	php_slot = pnv_php_find_slot(pci_device_to_OF_node(bridge));
+	if (!php_slot) {
+		dev_err(&bus->dev, "cxl: Failed to find slot hotplug "
+			           "information. You may need to upgrade "
+			           "skiboot. Aborting.\n");
+		goto err_dev_put;
+	}
+
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, switch_work->vsec, &val);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: Failed to read CAPI mode control: %i\n", rc);
+		goto err_dev_put;
+	}
+	devfn = dev->devfn;
+
+	/* Release the reference obtained in cxl_check_and_switch_mode() */
+	pci_dev_put(dev);
+
+	dev_dbg(&bus->dev, "cxl: Removing PCI devices from kernel\n");
+	pci_lock_rescan_remove();
+	pci_hp_remove_devices(bridge->subordinate);
+	pci_unlock_rescan_remove();
+
+	/* Switch the CXL protocol on the card */
+	if (switch_work->mode == CXL_BIMODE_CXL) {
+		dev_info(&bus->dev, "cxl: Switching card to CXL mode\n");
+		val &= ~CXL_VSEC_PROTOCOL_MASK;
+		val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
+		rc = pnv_cxl_enable_phb_kernel_api(hose, true);
+		if (rc) {
+			dev_err(&bus->dev, "cxl: Failed to enable kernel API"
+				           " on real PHB, aborting\n");
+			goto err_free_work;
+		}
+	} else {
+		dev_WARN(&bus->dev, "cxl: Switching card to PCI mode not supported!\n");
+		goto err_free_work;
+	}
+
+	rc = CXL_WRITE_VSEC_MODE_CONTROL_BUS(bus, devfn, switch_work->vsec, val);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: Failed to configure CXL protocol: %i\n", rc);
+		goto err_free_work;
+	}
+
+	/*
+	 * The CAIA spec (v1.1, Section 10.6 Bi-modal Device Support) states
+	 * we must wait 100ms after this mode switch before touching PCIe config
+	 * space.
+	 */
+	msleep(100);
+
+	/*
+	 * Hot reset to cause the card to come back in cxl mode. A
+	 * OPAL_RESET_PCI_LINK would be sufficient, but currently lacks support
+	 * in skiboot, so we use a hot reset instead.
+	 *
+	 * We call pci_set_pcie_reset_state() on the bridge, as a CAPI card is
+	 * guaranteed to sit directly under the root port, and setting the reset
+	 * state on a device directly under the root port is equivalent to doing
+	 * it on the root port iself.
+	 */
+	dev_info(&bus->dev, "cxl: Configuration write complete, resetting card\n");
+	pci_set_pcie_reset_state(bridge, pcie_hot_reset);
+	pci_set_pcie_reset_state(bridge, pcie_deassert_reset);
+
+	dev_dbg(&bus->dev, "cxl: Offlining slot\n");
+	rc = pnv_php_set_slot_power_state(&php_slot->slot, OPAL_PCI_SLOT_OFFLINE);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: OPAL offlining call failed: %i\n", rc);
+		goto err_free_work;
+	}
+
+	dev_dbg(&bus->dev, "cxl: Onlining and probing slot\n");
+	rc = pnv_php_set_slot_power_state(&php_slot->slot, OPAL_PCI_SLOT_ONLINE);
+	if (rc) {
+		dev_err(&bus->dev, "cxl: OPAL onlining call failed: %i\n", rc);
+		goto err_free_work;
+	}
+
+	pci_lock_rescan_remove();
+	pci_hp_add_devices(bridge->subordinate);
+	pci_unlock_rescan_remove();
+
+	dev_info(&bus->dev, "cxl: CAPI mode switch completed\n");
+	kfree(switch_work);
+	return;
+
+err_dev_put:
+	/* Release the reference obtained in cxl_check_and_switch_mode() */
+	pci_dev_put(dev);
+err_free_work:
+	kfree(switch_work);
+}
+
+int cxl_check_and_switch_mode(struct pci_dev *dev, int mode, int vsec)
+{
+	struct cxl_switch_work *work;
+	u8 val;
+	int rc;
+
+	if (!cpu_has_feature(CPU_FTR_HVMODE))
 		return -ENODEV;
+
+	if (!vsec) {
+		vsec = find_cxl_vsec(dev);
+		if (!vsec) {
+			dev_info(&dev->dev, "CXL VSEC not found\n");
+			return -ENODEV;
+		}
 	}
 
-	if ((rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val))) {
-		dev_err(&dev->dev, "failed to read current mode control: %i", rc);
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val);
+	if (rc) {
+		dev_err(&dev->dev, "Failed to read current mode control: %i", rc);
 		return rc;
 	}
-	val &= ~CXL_VSEC_PROTOCOL_MASK;
-	val |= CXL_VSEC_PROTOCOL_256TB | CXL_VSEC_PROTOCOL_ENABLE;
-	if ((rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val))) {
-		dev_err(&dev->dev, "failed to enable CXL protocol: %i", rc);
-		return rc;
+
+	if (mode == CXL_BIMODE_PCI) {
+		if (!(val & CXL_VSEC_PROTOCOL_ENABLE)) {
+			dev_info(&dev->dev, "Card is already in PCI mode\n");
+			return 0;
+		}
+		/*
+		 * TODO: Before it's safe to switch the card back to PCI mode
+		 * we need to disable the CAPP and make sure any cachelines the
+		 * card holds have been flushed out. Needs skiboot support.
+		 */
+		dev_WARN(&dev->dev, "CXL mode switch to PCI unsupported!\n");
+		return -EIO;
 	}
+
+	if (val & CXL_VSEC_PROTOCOL_ENABLE) {
+		dev_info(&dev->dev, "Card is already in CXL mode\n");
+		return 0;
+	}
+
+	dev_info(&dev->dev, "Card is in PCI mode, scheduling kernel thread "
+			    "to switch to CXL mode\n");
+
+	work = kmalloc(sizeof(struct cxl_switch_work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	pci_dev_get(dev);
+	work->dev = dev;
+	work->vsec = vsec;
+	work->mode = mode;
+	INIT_WORK(&work->work, switch_card_to_cxl);
+
+	schedule_work(&work->work);
+
 	/*
-	 * The CAIA spec (v0.12 11.6 Bi-modal Device Support) states
-	 * we must wait 100ms after this mode switch before touching
-	 * PCIe config space.
+	 * We return a failure now to abort the driver init. Once the
+	 * link has been cycled and the card is in cxl mode we will
+	 * come back (possibly using the generic cxl driver), but
+	 * return success as the card should then be in cxl mode.
+	 *
+	 * TODO: What if the card comes back in PCI mode even after
+	 *       the switch?  Don't want to spin endlessly.
 	 */
-	msleep(100);
+	return -EBUSY;
+}
+EXPORT_SYMBOL_GPL(cxl_check_and_switch_mode);
+
+#endif /* CONFIG_CXL_BIMODAL */
+
+static int setup_cxl_protocol_area(struct pci_dev *dev)
+{
+	u8 val;
+	int rc;
+	int vsec = find_cxl_vsec(dev);
+
+	if (!vsec) {
+		dev_info(&dev->dev, "CXL VSEC not found\n");
+		return -ENODEV;
+	}
+
+	rc = CXL_READ_VSEC_MODE_CONTROL(dev, vsec, &val);
+	if (rc) {
+		dev_err(&dev->dev, "Failed to read current mode control: %i\n", rc);
+		return rc;
+	}
+
+	if (!(val & CXL_VSEC_PROTOCOL_ENABLE)) {
+		dev_err(&dev->dev, "Card not in CAPI mode!\n");
+		return -EIO;
+	}
+
+	if ((val & CXL_VSEC_PROTOCOL_MASK) != CXL_VSEC_PROTOCOL_256TB) {
+		val &= ~CXL_VSEC_PROTOCOL_MASK;
+		val |= CXL_VSEC_PROTOCOL_256TB;
+		rc = CXL_WRITE_VSEC_MODE_CONTROL(dev, vsec, val);
+		if (rc) {
+			dev_err(&dev->dev, "Failed to set CXL protocol area: %i\n", rc);
+			return rc;
+		}
+	}
 
 	return 0;
 }
@@ -1249,7 +1449,7 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = setup_cxl_bars(dev)))
 		return rc;
 
-	if ((rc = switch_card_to_cxl(dev)))
+	if ((rc = setup_cxl_protocol_area(dev)))
 		return rc;
 
 	if ((rc = cxl_update_image_control(adapter)))
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index 6c52cbc..480d50a 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -39,6 +39,31 @@
 bool cxl_slot_is_supported(struct pci_dev *dev, int flags);
 
 
+#define CXL_BIMODE_CXL 1
+#define CXL_BIMODE_PCI 2
+
+/*
+ * Check the mode that the given bi-modal CXL adapter is currently in and
+ * change it if necessary. This does not apply to AFU drivers.
+ *
+ * If the mode matches the requested mode this function will return 0 - if the
+ * driver was expecting the generic CXL driver to have bound to the adapter and
+ * it gets this return value it should fail the probe function to give the CXL
+ * driver a chance to probe it.
+ *
+ * If the mode does not match it will start a background task to unplug the
+ * device from Linux and switch its mode, and will return -EBUSY. At this
+ * point the calling driver should make sure it has released the device and
+ * fail its probe function.
+ *
+ * The offset of the CXL VSEC can be provided to this function. If 0 is passed,
+ * this function will search for a CXL VSEC with ID 0x1280 and return -ENODEV
+ * if it is not found.
+ */
+#ifdef CONFIG_CXL_BIMODAL
+int cxl_check_and_switch_mode(struct pci_dev *dev, int mode, int vsec);
+#endif
+
 /* Get the AFU associated with a pci_dev */
 struct cxl_afu *cxl_pci_to_afu(struct pci_dev *dev);
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-13 21:17 ` [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
@ 2016-07-14  5:34   ` Andrew Donnellan
  2016-07-15 10:53   ` [11/15] " Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Andrew Donnellan @ 2016-07-14  5:34 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 14/07/16 07:17, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
> interrupts are routed from the networking hardware to the XSL using the
> MSIX table, and from there will be transformed back into an MSIX
> interrupt using the cxl style interrupts (i.e. using IVTE entries and
> ranges to map a PE and AFU interrupt number to an MSIX address).
>
> We want to hide the implementation details of cxl interrupts as much as
> possible. To this end, we use a special version of the MSI setup &
> teardown routines in the PHB while in cxl mode to allocate the cxl
> interrupts and configure the IVTE entries in the process element.
>
> This function does not configure the MSIX table - the CX4 card uses a
> custom format in that table and it would not be appropriate to fill that
> out in generic code. The rest of the functionality is similar to the
> "Full MSI-X mode" described in the CAIA, and this could be easily
> extended to support other adapters that use that mode in the future.
>
> The interrupts will be associated with the default context. If the
> maximum number of interrupts per context has been limited (e.g. by the
> mlx5 driver), it will automatically allocate additional kernel contexts
> to associate extra interrupts as required. These contexts will be
> started using the same WED that was used to start the default context.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>

Some minor nitpicks below, which shouldn't block acceptance.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>


> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 104c040..530d4af 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3465,6 +3465,10 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>  const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
>  	.dma_dev_setup		= pnv_pci_dma_dev_setup,
>  	.dma_bus_setup		= pnv_pci_dma_bus_setup,
> +#ifdef CONFIG_PCI_MSI

If you've got CXL_BASE you've already got PCI_MSI.

> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index f02a859..f3d34b9 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -14,6 +14,7 @@
>  #include <misc/cxl.h>
>  #include <linux/fs.h>
>  #include <asm/pnv-pci.h>
> +#include <linux/msi.h>
>
>  #include "cxl.h"
>
> @@ -489,3 +490,73 @@ int cxl_get_max_irqs_per_process(struct pci_dev *dev)
>  	return afu->irqs_max;
>  }
>  EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
> +
> +/*
> + * This is a special interrupt allocation routine called from the PHB's MSI
> + * setup function. When capi interrupts are allocated in this manner they must
> + * still be associated with a running context, but since the MSI APIs have no
> + * way to specify this we use the default context associated with the device.
> + *
> + * The Mellanox CX4 has a hardware limitation that restricts the maximum AFU
> + * interrupt number, so in order to overcome this their driver informs us of
> + * the restriction by setting the maximum interrupts per context, and we
> + * allocate additional contexts as necessary so that we can keep the AFU
> + * interrupt number within the supported range.
> + */
> +int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> +{
> +	struct cxl_context *ctx, *new_ctx, *default_ctx;
> +	int remaining;
> +	int rc;
> +
> +	ctx = default_ctx = cxl_get_context(pdev);
> +	if (WARN_ON(!default_ctx))
> +		return -ENODEV;

I have a very slight preference for:

if (!default_ctx) {
	dev_WARN(&pdev->dev, "couldn't get default context");
	return -ENODEV;
}

(I see this in your arch/powerpc code too, but that's obviously copied 
from the regular powernv irq code. Also, why is there no dev_WARN_ON() 
function?)

> +
> +	remaining = nvec;
> +	while (remaining > 0) {
> +		rc = cxl_allocate_afu_irqs(ctx, min(remaining, ctx->afu->irqs_max));
> +		if (rc) {
> +			pr_warn("%s: Failed to find enough free MSIs\n", pci_name(pdev));

dev_warn(&pdev->dev, "failed to find enough free MSIs\n"); is more 
common in the cxl code.


-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [01/15] powerpc/powernv: Split cxl code out into a separate file
  2016-07-13 21:17 ` [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:00 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The support for using the Mellanox CX4 in cxl mode will require
> additions to the PHB code. In preparation for this, move the existing
> cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
> more organised.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f456834a6c1db36c290fdfe8ab

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [02/15] cxl: Add cxl_slot_is_supported API
  2016-07-13 21:17 ` [PATCH 02/15] cxl: Add cxl_slot_is_supported API Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Philippe Bergheaud, Ian Munsie

On Wed, 2016-13-07 at 21:17:01 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This extends the check that the adapter is in a CAPI capable slot so
> that it may be called by external users in the kernel API. This will be
> used by the upcoming Mellanox CX4 support, which needs to know ahead of
> time if the card can be switched to cxl mode so that it can leave it in
> PCI mode if it is not.
> 
> This API takes a parameter to check if CAPP DMA mode is supported, which
> it currently only allows on P8NVL systems, since that mode currently has
> issues accessing memory < 4GB on P8, and we cannot realistically avoid
> that.
> 
> This API does not currently check if a CAPP unit is available (i.e. not
> already assigned to another PHB) on P8. Doing so would be racy since it
> is assigned on a first come first serve basis, and so long as CAPP DMA
> mode is not supported on P8 we don't need this, since the only
> anticipated user of this API requires CAPP DMA mode.
> 
> Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4e56f858bdde5cbfb70f61badd

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [03/15] cxl: Enable bus mastering for devices using CAPP DMA mode
  2016-07-13 21:17 ` [PATCH 03/15] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:02 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus
> master to be enabled in order for the CAPI traffic to flow. This should
> be harmless to enable for other cxl devices, so unconditionally enable
> it in the adapter init flow.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/48b3adf33459c1c42766d9c206

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [04/15] cxl: Move cxl_afu_get / cxl_afu_put to base
  2016-07-13 21:17 ` [PATCH 04/15] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:03 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The Mellanox CX4 uses a model where the AFU is one physical function of
> the device, and is used by other peer physical functions of the same
> device. This will require those other devices to grab a reference on the
> AFU when they are initialised to make sure that it does not go away
> during their lifetime.
> 
> Move the AFU refcount functions to base.c so they can be called from
> the PHB code.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/62ccf2d2efefa01d0eb92cd6ec

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [05/15] cxl: Allow a default context to be associated with an external pci_dev
  2016-07-13 21:17 ` [PATCH 05/15] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:04 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The cxl kernel API has a concept of a default context associated with
> each PCI device under the virtual PHB. The Mellanox CX4 will also use
> the cxl kernel API, but it does not use a virtual PHB - rather, the AFU
> appears as a physical function as a peer to the networking functions.
> 
> In order to allow the kernel API to work with those networking
> functions, we will need to associate a default context with them as
> well. To this end, refactor the corresponding code to do this in vphb.c
> and export it so that it can be called from the PHB code.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a19bd79e31769626d288cc016e

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [06/15] cxl: Do not create vPHB if there are no AFU configuration records
  2016-07-13 21:17 ` [PATCH 06/15] cxl: Do not create vPHB if there are no AFU configuration records Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:05 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The vPHB model of the cxl kernel API is a hierarchy where the AFU is
> represented by the vPHB, and it's AFU configuration records are exposed
> as functions under that vPHB. If there are no AFU configuration records
> we will create a vPHB with nothing under it, which is a waste of
> resources and will opt us into EEH handling despite not having anything
> special to handle.
> 
> This also does not make sense for cards using the peer model of the cxl
> kernel API, where the other functions of the device are exposed via
> additional peer physical functions rather than AFU configuration
> records. This model will also not work with the existing EEH handling in
> the cxl driver, as that is designed around the vPHB model.
> 
> Skip creating the vPHB for AFUs without any AFU configuration records,
> and opt out of EEH handling for them.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e4f5fc001a6cb82bef91037245

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb
  2016-07-13 21:17 ` [PATCH 07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:06 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This adds support for the peer model of the cxl kernel api to the
> PowerNV PHB, in which physical function 0 represents the cxl function on
> the card (an XSL in the case of the CX4), which other physical functions
> will use for memory access and interrupt services. It is referred to as
> the peer model as these functions are peers of one another, as opposed
> to the Virtual PHB model which forms a hierarchy.
> 
> This patch exports APIs to enable the peer mode, check if a PCI device
> is attached to a PHB in this mode, and to set and get the peer AFU for
> this mode.
> 
> The cxl driver will enable this mode for supported cards by calling
> pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
> that this mode is enabled, and switch out it's controller_ops for the
> cxl version.
> 
> The cxl version of the controller_ops struct implements it's own
> versions of the enable_device_hook and release_device to handle
> refcounting on the peer AFU and to allocate a default context for the
> device.
> 
> Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
> there is no safe way to disable cxl mode short of a reboot, so until
> that changes there is no reason to support the disable path.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4361b03430d685610e5feea3ec

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [08/15] cxl: Add support for using the kernel API with a real PHB
  2016-07-13 21:17 ` [PATCH 08/15] cxl: Add support for using the kernel API with a real PHB Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:07 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This hooks up support for using the kernel API with a real PHB. After
> the AFU initialisation has completed it calls into the PHB code to pass
> it the AFU that will be used by other peer physical functions on the
> adapter.
> 
> The cxl_pci_to_afu API is extended to work with peer PCI devices,
> retrieving the peer AFU from the PHB. This API may also now return an
> error if it is called on a PCI device that is not associated with either
> a cxl vPHB or a peer PCI device to an AFU, and this error is propagated
> down.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/317f5ef1b363417b6f1e93b90d

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [09/15] cxl: Add kernel APIs to get & set the max irqs per context
  2016-07-13 21:17 ` [PATCH 09/15] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:08 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> These APIs will be used by the Mellanox CX4 support. While they function
> standalone to configure existing behaviour, their primary purpose is to
> allow the Mellanox driver to inform the cxl driver of a hardware
> limitation, which will be used in a future patch.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/79384e4b71240abf50c375eea5

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [10/15] cxl: Add preliminary workaround for CX4 interrupt limitation
  2016-07-13 21:17 ` [PATCH 10/15] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:09 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The Mellanox CX4 has a hardware limitation where only 4 bits of the
> AFU interrupt number can be passed to the XSL when sending an interrupt,
> limiting it to only 15 interrupts per context (AFU interrupt number 0 is
> invalid).
> 
> In order to overcome this, we will allocate additional contexts linked
> to the default context as extra address space for the extra interrupts -
> this will be implemented in the next patch.
> 
> This patch adds the preliminary support to allow this, by way of adding
> a linked list in the context structure that we use to keep track of the
> contexts dedicated to interrupts, and an API to simultaneously iterate
> over the related context structures, AFU interrupt numbers and hardware
> interrupt numbers. The point of using a single API to iterate these is
> to hide some of the details of the iteration from external code, and to
> reduce the number of APIs that need to be exported via base.c to allow
> built in code to call.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/cbce0917e2e47d4bf5aa3b5fd6

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [11/15] cxl: Add support for interrupts on the Mellanox CX4
  2016-07-13 21:17 ` [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
  2016-07-14  5:34   ` Andrew Donnellan
@ 2016-07-15 10:53   ` Michael Ellerman
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:10 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
> interrupts are routed from the networking hardware to the XSL using the
> MSIX table, and from there will be transformed back into an MSIX
> interrupt using the cxl style interrupts (i.e. using IVTE entries and
> ranges to map a PE and AFU interrupt number to an MSIX address).
> 
> We want to hide the implementation details of cxl interrupts as much as
> possible. To this end, we use a special version of the MSI setup &
> teardown routines in the PHB while in cxl mode to allocate the cxl
> interrupts and configure the IVTE entries in the process element.
> 
> This function does not configure the MSIX table - the CX4 card uses a
> custom format in that table and it would not be appropriate to fill that
> out in generic code. The rest of the functionality is similar to the
> "Full MSI-X mode" described in the CAIA, and this could be easily
> extended to support other adapters that use that mode in the future.
> 
> The interrupts will be associated with the default context. If the
> maximum number of interrupts per context has been limited (e.g. by the
> mlx5 driver), it will automatically allocate additional kernel contexts
> to associate extra interrupts as required. These contexts will be
> started using the same WED that was used to start the default context.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a2f67d5ee8d950caaa7a6144cf

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [12/15] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-13 21:17 ` [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  2016-07-28  1:48   ` [PATCH 12/15] " Andrew Donnellan
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie

On Wed, 2016-13-07 at 21:17:11 UTC, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> The CX4 card cannot cope with a context with PE=0 due to a hardware
> limitation, resulting in:
> 
> [   34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939
> [   34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting
> 
> Since the kernel API allocates a default context very early during
> device init that will almost certainly get Process Element ID 0 there is
> no easy way for us to extend the API to allow the Mellanox to inform us
> of this limitation ahead of time.
> 
> Instead, work around the issue by extending the XSL structure to include
> a minimum PE to allocate. Although the bug is not in the XSL, it is the
> easiest place to work around this limitation given that the CX4 is
> currently the only card that uses an XSL.
> 
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f67a6722d650b864b020b19b39

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
  2016-07-13 21:17 ` [PATCH 13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Bjorn Helgaas, linux-pci, Ian Munsie, Gavin Shan

On Wed, 2016-13-07 at 21:17:12 UTC, Ian Munsie wrote:
> From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> 
> The cxl driver will use infrastructure from pnv_php to handle device tree
> updates when switching bi-modal CAPI cards into CAPI mode.
> 
> To enable this, export pnv_php_find_slot() and
> pnv_php_set_slot_power_state(), and add corresponding declarations, as well
> as the definition of struct pnv_php_slot, to asm/pnv-pci.h.
> 
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/89379f165a1be13aa9b4731a90

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
  2016-07-13 21:17 ` [PATCH 14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Bjorn Helgaas, linux-pci, Ian Munsie, Gavin Shan

On Wed, 2016-13-07 at 21:17:13 UTC, Ian Munsie wrote:
> From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> 
> When calling pnv_php_set_slot_power_state() with state ==
> OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're
> dealing with OPAL_PCI_SLOT_POWER_OFF.
> 
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5473a6bf635d35d5c1d12d0e13

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards
  2016-07-13 21:17 ` [PATCH 15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
@ 2016-07-15 10:53   ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2016-07-15 10:53 UTC (permalink / raw)
  To: Ian Munsie, Michael Neuling, Frederic Barrat, Andrew Donnellan,
	linuxppc-dev, Huy Nguyen
  Cc: Ian Munsie, Gavin Shan

On Wed, 2016-13-07 at 21:17:14 UTC, Ian Munsie wrote:
> From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> 
> Add a new API, cxl_check_and_switch_mode() to allow for switching of
> bi-modal CAPI cards, such as the Mellanox CX-4 network card.
...
> 
> Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b0b5e5918ad1babfd1d43d98c7

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in Mellanox CX4
  2016-07-13 21:17 ` [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
  2016-07-15 10:53   ` [12/15] " Michael Ellerman
@ 2016-07-28  1:48   ` Andrew Donnellan
  1 sibling, 0 replies; 33+ messages in thread
From: Andrew Donnellan @ 2016-07-28  1:48 UTC (permalink / raw)
  To: Ian Munsie, Michael Ellerman, Michael Neuling, Frederic Barrat,
	linuxppc-dev, Huy Nguyen

On 14/07/16 07:17, Ian Munsie wrote:
>  	mutex_lock(&afu->contexts_lock);
>  	idr_preload(GFP_KERNEL);
> -	i = idr_alloc(&ctx->afu->contexts_idr, ctx, 0,
> +	i = idr_alloc(&ctx->afu->contexts_idr, ctx,
> +		      ctx->afu->adapter->native->sl_ops->min_pe,

As it turns out, dereferencing ctx->afu->adapter->native doesn't exactly 
work on PowerVM...

Working on a fix.


Andrew

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-07-28  1:49 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-13 21:16 [PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode Ian Munsie
2016-07-13 21:17 ` [PATCH 01/15] powerpc/powernv: Split cxl code out into a separate file Ian Munsie
2016-07-15 10:53   ` [01/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 02/15] cxl: Add cxl_slot_is_supported API Ian Munsie
2016-07-15 10:53   ` [02/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 03/15] cxl: Enable bus mastering for devices using CAPP DMA mode Ian Munsie
2016-07-15 10:53   ` [03/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 04/15] cxl: Move cxl_afu_get / cxl_afu_put to base Ian Munsie
2016-07-15 10:53   ` [04/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 05/15] cxl: Allow a default context to be associated with an external pci_dev Ian Munsie
2016-07-15 10:53   ` [05/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 06/15] cxl: Do not create vPHB if there are no AFU configuration records Ian Munsie
2016-07-15 10:53   ` [06/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 07/15] powerpc/powernv: Add support for the cxl kernel api on the real phb Ian Munsie
2016-07-15 10:53   ` [07/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 08/15] cxl: Add support for using the kernel API with a real PHB Ian Munsie
2016-07-15 10:53   ` [08/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 09/15] cxl: Add kernel APIs to get & set the max irqs per context Ian Munsie
2016-07-15 10:53   ` [09/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 10/15] cxl: Add preliminary workaround for CX4 interrupt limitation Ian Munsie
2016-07-15 10:53   ` [10/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4 Ian Munsie
2016-07-14  5:34   ` Andrew Donnellan
2016-07-15 10:53   ` [11/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 12/15] cxl: Workaround PE=0 hardware limitation in " Ian Munsie
2016-07-15 10:53   ` [12/15] " Michael Ellerman
2016-07-28  1:48   ` [PATCH 12/15] " Andrew Donnellan
2016-07-13 21:17 ` [PATCH 13/15] PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl Ian Munsie
2016-07-15 10:53   ` [13/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 14/15] PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state Ian Munsie
2016-07-15 10:53   ` [14/15] " Michael Ellerman
2016-07-13 21:17 ` [PATCH 15/15] cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Ian Munsie
2016-07-15 10:53   ` [15/15] " Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).