linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Add iProc PCIe PAXC and MSI support
@ 2015-11-18  0:31 Ray Jui
  2015-11-18  0:31 ` [PATCH 1/5] PCI: iproc: Update iProc PCIe device tree binding Ray Jui
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

This patch series adds support for the iProc PAXC interface and support for
event queue based MSI, integrated in the iProc PCIe core

Traditionally, all iProc PCIe root complexes use the PAXB interface, with an
integrated on-chip Serdes to support external endpoint devices. On newer iProc
platforms, a PAXC interface is introduced, for connection with internally
emulated PCIe endpoint devices in the ASIC
 
This iProc event queue based MSI support is meant for older platforms including
NS, NSP, Cygnus, and etc. Newer platforms with integrated MSI in the GIC (e.g.,
giv2m or gicv3-its) should not use iProc event queue based MSI

This patch series is based on Linux v4.4-rc1 and is avaliable here:
https://github.com/Broadcom/cygnus-linux/tree/iproc-msi-v1

Ray Jui (5):
  PCI: iproc: Update iProc PCIe device tree binding
  PCI: iproc: Add PAXC interface support
  PCI: iproc: Add iProc PCIe MSI device tree binding
  PCI: iproc: Add iProc PCIe MSI support
  ARM: dts: Enable MSI support for Broadcom Cygnus

 .../devicetree/bindings/pci/brcm,iproc-pcie.txt    |  45 ++-
 arch/arm/boot/dts/bcm-cygnus.dtsi                  |  26 ++
 drivers/pci/host/Kconfig                           |   9 +
 drivers/pci/host/Makefile                          |   1 +
 drivers/pci/host/pcie-iproc-msi.c                  | 434 +++++++++++++++++++++
 drivers/pci/host/pcie-iproc-platform.c             |   8 +
 drivers/pci/host/pcie-iproc.c                      | 221 +++++++++--
 drivers/pci/host/pcie-iproc.h                      |  31 ++
 8 files changed, 740 insertions(+), 35 deletions(-)
 create mode 100644 drivers/pci/host/pcie-iproc-msi.c

-- 
1.9.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] PCI: iproc: Update iProc PCIe device tree binding
  2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
@ 2015-11-18  0:31 ` Ray Jui
  2015-11-18  0:31 ` [PATCH 2/5] PCI: iproc: Add PAXC interface support Ray Jui
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

Add a new compatible string "brcm,iproc-pcie-paxc", for PAXC based iProc
PCIe root complex. A PAXC based PCIe root complex is connected to
emulated endpoint devices internal to the ASIC

Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
---
 Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
index 45c2a80..06eae0f 100644
--- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
@@ -1,7 +1,10 @@
 * Broadcom iProc PCIe controller with the platform bus interface
 
 Required properties:
-- compatible: Must be "brcm,iproc-pcie"
+- compatible: Must be "brcm,iproc-pcie" for PAXB, or "brcm,iproc-pcie-paxc"
+  for PAXC. PAXB based root complex is used for external endpoint devices.
+  PAXC based root complex is connected to emulated endpoint devices
+  internal to the ASIC
 - reg: base address and length of the PCIe controller I/O register space
 - #interrupt-cells: set to <1>
 - interrupt-map-mask and interrupt-map, standard PCI properties to define the
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] PCI: iproc: Add PAXC interface support
  2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
  2015-11-18  0:31 ` [PATCH 1/5] PCI: iproc: Update iProc PCIe device tree binding Ray Jui
@ 2015-11-18  0:31 ` Ray Jui
  2015-11-18  0:34   ` Florian Fainelli
  2015-11-18  0:31 ` [PATCH 3/5] PCI: iproc: Add iProc PCIe MSI device tree binding Ray Jui
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

Traditionally, all iProc PCIe root complexes use PAXB based wrapper,
with an integrated on-chip Serdes to support external endpoint devices.
On newer iProc platforms, a PAXC based wrapper is introduced, for
connection with internally emulated PCIe endpoint devices in the ASIC

This patch adds support for PAXC based iProc PCIe root complex in the
iProc PCIe core driver. This change fators out common logic between
PAXB and PAXC, and use tables to store register offsets that are
different between PAXB and PAXC. This allows the driver to be scaled to
support subsequent PAXC revisions in the future

Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
---
 drivers/pci/host/pcie-iproc-platform.c |   8 ++
 drivers/pci/host/pcie-iproc.c          | 202 +++++++++++++++++++++++++++------
 drivers/pci/host/pcie-iproc.h          |  19 ++++
 3 files changed, 195 insertions(+), 34 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
index c9550dc..716b56b 100644
--- a/drivers/pci/host/pcie-iproc-platform.c
+++ b/drivers/pci/host/pcie-iproc-platform.c
@@ -42,6 +42,13 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
 	pcie->dev = &pdev->dev;
 	platform_set_drvdata(pdev, pcie);
 
+	if (of_device_is_compatible(np, "brcm,iproc-pcie"))
+		pcie->type = IPROC_PCIE_PAXB;
+	else if (of_device_is_compatible(np, "brcm,iproc-pcie-paxc"))
+		pcie->type = IPROC_PCIE_PAXC;
+	else
+		return -ENODEV;
+
 	ret = of_address_to_resource(np, 0, &reg);
 	if (ret < 0) {
 		dev_err(pcie->dev, "unable to obtain controller resources\n");
@@ -116,6 +123,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
 
 static const struct of_device_id iproc_pcie_of_match_table[] = {
 	{ .compatible = "brcm,iproc-pcie", },
+	{ .compatible = "brcm,iproc-pcie-paxc", },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, iproc_pcie_of_match_table);
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index eac719a..24d5b62 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -30,20 +30,16 @@
 
 #include "pcie-iproc.h"
 
-#define CLK_CONTROL_OFFSET           0x000
 #define EP_PERST_SOURCE_SELECT_SHIFT 2
 #define EP_PERST_SOURCE_SELECT       BIT(EP_PERST_SOURCE_SELECT_SHIFT)
 #define EP_MODE_SURVIVE_PERST_SHIFT  1
 #define EP_MODE_SURVIVE_PERST        BIT(EP_MODE_SURVIVE_PERST_SHIFT)
 #define RC_PCIE_RST_OUTPUT_SHIFT     0
 #define RC_PCIE_RST_OUTPUT           BIT(RC_PCIE_RST_OUTPUT_SHIFT)
+#define PAXC_RESET_MASK              0x7f
 
-#define CFG_IND_ADDR_OFFSET          0x120
 #define CFG_IND_ADDR_MASK            0x00001ffc
 
-#define CFG_IND_DATA_OFFSET          0x124
-
-#define CFG_ADDR_OFFSET              0x1f8
 #define CFG_ADDR_BUS_NUM_SHIFT       20
 #define CFG_ADDR_BUS_NUM_MASK        0x0ff00000
 #define CFG_ADDR_DEV_NUM_SHIFT       15
@@ -55,12 +51,8 @@
 #define CFG_ADDR_CFG_TYPE_SHIFT      0
 #define CFG_ADDR_CFG_TYPE_MASK       0x00000003
 
-#define CFG_DATA_OFFSET              0x1fc
-
-#define SYS_RC_INTX_EN               0x330
 #define SYS_RC_INTX_MASK             0xf
 
-#define PCIE_LINK_STATUS_OFFSET      0xf0c
 #define PCIE_PHYLINKUP_SHIFT         3
 #define PCIE_PHYLINKUP               BIT(PCIE_PHYLINKUP_SHIFT)
 #define PCIE_DL_ACTIVE_SHIFT         2
@@ -71,12 +63,54 @@
 #define OARR_SIZE_CFG_SHIFT          1
 #define OARR_SIZE_CFG                BIT(OARR_SIZE_CFG_SHIFT)
 
-#define OARR_LO(window)              (0xd20 + (window) * 8)
-#define OARR_HI(window)              (0xd24 + (window) * 8)
-#define OMAP_LO(window)              (0xd40 + (window) * 8)
-#define OMAP_HI(window)              (0xd44 + (window) * 8)
-
 #define MAX_NUM_OB_WINDOWS           2
+#define MAX_NUM_PAXC_PF              4
+
+#define IPROC_PCIE_REG_INVALID 0xffff
+
+enum iproc_pcie_reg {
+	IPROC_PCIE_CLK_CTRL = 0,
+	IPROC_PCIE_CFG_IND_ADDR,
+	IPROC_PCIE_CFG_IND_DATA,
+	IPROC_PCIE_CFG_ADDR,
+	IPROC_PCIE_CFG_DATA,
+	IPROC_PCIE_INTX_EN,
+	IPROC_PCIE_OARR_LO,
+	IPROC_PCIE_OARR_HI,
+	IPROC_PCIE_OMAP_LO,
+	IPROC_PCIE_OMAP_HI,
+	IPROC_PCIE_LINK_STATUS,
+};
+
+/* iProc PCIe PAXB registers */
+static const u16 iproc_pcie_reg_paxb[] = {
+	[IPROC_PCIE_CLK_CTRL]     = 0x000,
+	[IPROC_PCIE_CFG_IND_ADDR] = 0x120,
+	[IPROC_PCIE_CFG_IND_DATA] = 0x124,
+	[IPROC_PCIE_CFG_ADDR]     = 0x1f8,
+	[IPROC_PCIE_CFG_DATA]     = 0x1fc,
+	[IPROC_PCIE_INTX_EN]      = 0x330,
+	[IPROC_PCIE_OARR_LO]      = 0xd20,
+	[IPROC_PCIE_OARR_HI]      = 0xd24,
+	[IPROC_PCIE_OMAP_LO]      = 0xd40,
+	[IPROC_PCIE_OMAP_HI]      = 0xd44,
+	[IPROC_PCIE_LINK_STATUS]  = 0xf0c,
+};
+
+/* iProc PCIe PAXC v1 registers */
+static const u16 iproc_pcie_reg_paxc[] = {
+	[IPROC_PCIE_CLK_CTRL]     = 0x000,
+	[IPROC_PCIE_CFG_IND_ADDR] = 0x1f0,
+	[IPROC_PCIE_CFG_IND_DATA] = 0x1f4,
+	[IPROC_PCIE_CFG_ADDR]     = 0x1f8,
+	[IPROC_PCIE_CFG_DATA]     = 0x1fc,
+	[IPROC_PCIE_INTX_EN]      = IPROC_PCIE_REG_INVALID,
+	[IPROC_PCIE_OARR_LO]      = IPROC_PCIE_REG_INVALID,
+	[IPROC_PCIE_OARR_HI]      = IPROC_PCIE_REG_INVALID,
+	[IPROC_PCIE_OMAP_LO]      = IPROC_PCIE_REG_INVALID,
+	[IPROC_PCIE_OMAP_HI]      = IPROC_PCIE_REG_INVALID,
+	[IPROC_PCIE_LINK_STATUS]  = IPROC_PCIE_REG_INVALID,
+};
 
 static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 {
@@ -91,6 +125,65 @@ static inline struct iproc_pcie *iproc_data(struct pci_bus *bus)
 	return pcie;
 }
 
+static inline bool iproc_pcie_reg_is_invalid(u16 reg_offset)
+{
+	return !!(reg_offset == IPROC_PCIE_REG_INVALID);
+}
+
+static inline u16 iproc_pcie_reg_offset(struct iproc_pcie *pcie,
+					enum iproc_pcie_reg reg)
+{
+	return pcie->reg_offsets[reg];
+}
+
+static inline u32 iproc_pcie_read_reg(struct iproc_pcie *pcie,
+				      enum iproc_pcie_reg reg)
+{
+	u16 offset = iproc_pcie_reg_offset(pcie, reg);
+
+	if (iproc_pcie_reg_is_invalid(offset))
+		return 0;
+
+	return readl(pcie->base + offset);
+}
+
+static inline void iproc_pcie_write_reg(struct iproc_pcie *pcie,
+					enum iproc_pcie_reg reg, u32 val)
+{
+	u16 offset = iproc_pcie_reg_offset(pcie, reg);
+
+	if (iproc_pcie_reg_is_invalid(offset))
+		return;
+
+	writel(val, pcie->base + offset);
+}
+
+static inline void iproc_pcie_ob_write(struct iproc_pcie *pcie,
+				       enum iproc_pcie_reg reg,
+				       unsigned window, u32 val)
+{
+	u16 offset = iproc_pcie_reg_offset(pcie, reg);
+
+	if (iproc_pcie_reg_is_invalid(offset))
+		return;
+
+	writel(val, pcie->base + offset + (window * 8));
+}
+
+static inline bool iproc_pcie_device_is_valid(struct iproc_pcie *pcie,
+					      unsigned int slot,
+					      unsigned int fn)
+{
+	if (slot > 0)
+		return false;
+
+	/* PAXC can only support limited number of functions */
+	if (pcie->type == IPROC_PCIE_PAXC && fn >= MAX_NUM_PAXC_PF)
+		return false;
+
+	return true;
+}
+
 /**
  * Note access to the configuration registers are protected at the higher layer
  * by 'pci_lock' in drivers/pci/access.c
@@ -104,28 +197,34 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
 	unsigned fn = PCI_FUNC(devfn);
 	unsigned busno = bus->number;
 	u32 val;
+	u16 offset;
+
+	if (!iproc_pcie_device_is_valid(pcie, slot, fn))
+		return NULL;
 
 	/* root complex access */
 	if (busno == 0) {
-		if (slot >= 1)
+		iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_IND_ADDR,
+				     where & CFG_IND_ADDR_MASK);
+		offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_IND_DATA);
+		if (iproc_pcie_reg_is_invalid(offset))
 			return NULL;
-		writel(where & CFG_IND_ADDR_MASK,
-		       pcie->base + CFG_IND_ADDR_OFFSET);
-		return (pcie->base + CFG_IND_DATA_OFFSET);
+		else
+			return (pcie->base + offset);
 	}
 
-	if (fn > 1)
-		return NULL;
-
 	/* EP device access */
 	val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
 		(slot << CFG_ADDR_DEV_NUM_SHIFT) |
 		(fn << CFG_ADDR_FUNC_NUM_SHIFT) |
 		(where & CFG_ADDR_REG_NUM_MASK) |
 		(1 & CFG_ADDR_CFG_TYPE_MASK);
-	writel(val, pcie->base + CFG_ADDR_OFFSET);
-
-	return (pcie->base + CFG_DATA_OFFSET);
+	iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
+	offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
+	if (iproc_pcie_reg_is_invalid(offset))
+		return NULL;
+	else
+		return (pcie->base + offset);
 }
 
 static struct pci_ops iproc_pcie_ops = {
@@ -138,18 +237,29 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
 {
 	u32 val;
 
+	if (pcie->type == IPROC_PCIE_PAXC) {
+		val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
+		val &= ~PAXC_RESET_MASK;
+		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
+		udelay(100);
+		val |= PAXC_RESET_MASK;
+		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
+		udelay(100);
+		return;
+	}
+
 	/*
 	 * Select perst_b signal as reset source. Put the device into reset,
 	 * and then bring it out of reset
 	 */
-	val = readl(pcie->base + CLK_CONTROL_OFFSET);
+	val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
 	val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
 		~RC_PCIE_RST_OUTPUT;
-	writel(val, pcie->base + CLK_CONTROL_OFFSET);
+	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
 	udelay(250);
 
 	val |= RC_PCIE_RST_OUTPUT;
-	writel(val, pcie->base + CLK_CONTROL_OFFSET);
+	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
 	msleep(100);
 }
 
@@ -160,7 +270,14 @@ static int iproc_pcie_check_link(struct iproc_pcie *pcie, struct pci_bus *bus)
 	u16 pos, link_status;
 	bool link_is_active = false;
 
-	val = readl(pcie->base + PCIE_LINK_STATUS_OFFSET);
+	/*
+	 * PAXC connects to emulated endpoint devices directly and does not
+	 * have a Serdes. Therefore skip the link detection logic here
+	 */
+	if (pcie->type == IPROC_PCIE_PAXC)
+		return 0;
+
+	val = iproc_pcie_read_reg(pcie, IPROC_PCIE_LINK_STATUS);
 	if (!(val & PCIE_PHYLINKUP) || !(val & PCIE_DL_ACTIVE)) {
 		dev_err(pcie->dev, "PHY or data link is INACTIVE!\n");
 		return -ENODEV;
@@ -221,7 +338,7 @@ static int iproc_pcie_check_link(struct iproc_pcie *pcie, struct pci_bus *bus)
 
 static void iproc_pcie_enable(struct iproc_pcie *pcie)
 {
-	writel(SYS_RC_INTX_MASK, pcie->base + SYS_RC_INTX_EN);
+	iproc_pcie_write_reg(pcie, IPROC_PCIE_INTX_EN, SYS_RC_INTX_MASK);
 }
 
 /**
@@ -272,11 +389,15 @@ static int iproc_pcie_setup_ob(struct iproc_pcie *pcie, u64 axi_addr,
 	axi_addr -= ob->axi_offset;
 
 	for (i = 0; i < MAX_NUM_OB_WINDOWS; i++) {
-		writel(lower_32_bits(axi_addr) | OARR_VALID |
-		       (ob->set_oarr_size ? 1 : 0), pcie->base + OARR_LO(i));
-		writel(upper_32_bits(axi_addr), pcie->base + OARR_HI(i));
-		writel(lower_32_bits(pci_addr), pcie->base + OMAP_LO(i));
-		writel(upper_32_bits(pci_addr), pcie->base + OMAP_HI(i));
+		iproc_pcie_ob_write(pcie, IPROC_PCIE_OARR_LO, i,
+				    lower_32_bits(axi_addr) | OARR_VALID |
+				    (ob->set_oarr_size ? 1 : 0));
+		iproc_pcie_ob_write(pcie, IPROC_PCIE_OARR_HI, i,
+				    upper_32_bits(axi_addr));
+		iproc_pcie_ob_write(pcie, IPROC_PCIE_OMAP_LO, i,
+				    lower_32_bits(pci_addr));
+		iproc_pcie_ob_write(pcie, IPROC_PCIE_OMAP_HI, i,
+				    upper_32_bits(pci_addr));
 
 		size -= ob->window_size;
 		if (size == 0)
@@ -340,6 +461,19 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
 		goto err_exit_phy;
 	}
 
+	switch (pcie->type) {
+	case IPROC_PCIE_PAXB:
+		pcie->reg_offsets = iproc_pcie_reg_paxb;
+		break;
+	case IPROC_PCIE_PAXC:
+		pcie->reg_offsets = iproc_pcie_reg_paxc;
+		break;
+	default:
+		dev_err(pcie->dev, "incompatible iProc PCIe interface\n");
+		ret = -EINVAL;
+		goto err_power_off_phy;
+	}
+
 	iproc_pcie_reset(pcie);
 
 	if (pcie->need_ob_cfg) {
diff --git a/drivers/pci/host/pcie-iproc.h b/drivers/pci/host/pcie-iproc.h
index d3dc940..051b651 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ -15,6 +15,20 @@
 #define _PCIE_IPROC_H
 
 /**
+ * iProc PCIe interface type
+ *
+ * PAXB is the wrapper used in root complex that can be connected to an
+ * external endpoint device
+ *
+ * PAXC is the wrapper used in root complex dedicated for internal emulated
+ * endpoint devices
+ */
+enum iproc_pcie_type {
+	IPROC_PCIE_PAXB = 0,
+	IPROC_PCIE_PAXC,
+};
+
+/**
  * iProc PCIe outbound mapping
  * @set_oarr_size: indicates the OARR size bit needs to be set
  * @axi_offset: offset from the AXI address to the internal address used by
@@ -29,7 +43,10 @@ struct iproc_pcie_ob {
 
 /**
  * iProc PCIe device
+ *
  * @dev: pointer to device data structure
+ * @type: iProc PCIe interface type
+ * @reg_offsets: register offsets
  * @base: PCIe host controller I/O register base
  * @sysdata: Per PCI controller data (ARM-specific)
  * @root_bus: pointer to root bus
@@ -41,6 +58,8 @@ struct iproc_pcie_ob {
  */
 struct iproc_pcie {
 	struct device *dev;
+	enum iproc_pcie_type type;
+	const u16 *reg_offsets;
 	void __iomem *base;
 #ifdef CONFIG_ARM
 	struct pci_sys_data sysdata;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] PCI: iproc: Add iProc PCIe MSI device tree binding
  2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
  2015-11-18  0:31 ` [PATCH 1/5] PCI: iproc: Update iProc PCIe device tree binding Ray Jui
  2015-11-18  0:31 ` [PATCH 2/5] PCI: iproc: Add PAXC interface support Ray Jui
@ 2015-11-18  0:31 ` Ray Jui
  2015-11-18  0:31 ` [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support Ray Jui
  2015-11-18  0:31 ` [PATCH 5/5] ARM: dts: Enable MSI support for Broadcom Cygnus Ray Jui
  4 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

This patch updates the iProc PCIe device tree bindings with added
binding information for MSI

Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
---
 .../devicetree/bindings/pci/brcm,iproc-pcie.txt    | 40 ++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
index 06eae0f..701d626 100644
--- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
@@ -35,6 +35,31 @@ Optional:
 - brcm,pcie-ob-oarr-size: Some iProc SoCs need the OARR size bit to be set to
 increase the outbound window size
 
+MSI support (optional):
+
+For older platforms without MSI integrated in the GIC, iProc PCIe core provides
+an event queue based MSI support. The iProc MSI uses host memories to store
+MSI posted writes and event queues
+
+- msi-parent: Link to the device node of the MSI controller. On newer iProc
+platforms, the MSI controller may be gicv2m or gicv3-its. On older iProc
+platforms without MSI support in its interrupt controller, one may use the
+event queue based MSI support integrated within the iProc PCIe core
+
+When the iProc event queue based MSI is used, one needs to define the
+following properties in the MSI device node:
+- compatible: Must be "brcm,iproc-msi"
+- msi-controller: claims itself as an MSI controller
+- interrupt-parent: Link to its parent interrupt device
+- interrupts: List of interrupt IDs from its parent interrupt device
+- brcm,num-eq-region: Required number of 4K aligned memory region for MSI event
+queue
+- brcm,num-msi-msg-region: Required number of 4K aligned memory region for MSI
+posted writes
+Optional:
+- brcm,pcie-msi-inten: Needs to be present for some older iProc platforms that
+require the interrupt enable registers to be set explicitly to enable MSI
+
 Example:
 	pcie0: pcie@18012000 {
 		compatible = "brcm,iproc-pcie";
@@ -61,6 +86,21 @@ Example:
 		brcm,pcie-ob-oarr-size;
 		brcm,pcie-ob-axi-offset = <0x00000000>;
 		brcm,pcie-ob-window-size = <256>;
+
+		msi-parent = <&msi0>;
+
+		/* iProc event queue based MSI */
+		msi0: msi@18012000 {
+			compatible = "brcm,iproc-msi";
+			msi-controller;
+			interrupt-parent = <&gic>;
+			interrupts = <GIC_SPI 96 IRQ_TYPE_NONE>,
+				     <GIC_SPI 97 IRQ_TYPE_NONE>,
+				     <GIC_SPI 98 IRQ_TYPE_NONE>,
+				     <GIC_SPI 99 IRQ_TYPE_NONE>,
+			brcm,num-eq-region = <1>;
+			brcm,num-msi-msg-region = <1>;
+		};
 	};
 
 	pcie1: pcie@18013000 {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
                   ` (2 preceding siblings ...)
  2015-11-18  0:31 ` [PATCH 3/5] PCI: iproc: Add iProc PCIe MSI device tree binding Ray Jui
@ 2015-11-18  0:31 ` Ray Jui
  2015-11-18  8:48   ` Marc Zyngier
  2015-11-18  0:31 ` [PATCH 5/5] ARM: dts: Enable MSI support for Broadcom Cygnus Ray Jui
  4 siblings, 1 reply; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
all iProc based platforms. The patch follows the latest trend in the
kernel to use MSI domain based implementation

This iProc event queue based MSI support should not be used with newer
platforms with integrated MSI support in the GIC (e.g., giv2m or
gicv3-its)

Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
---
 drivers/pci/host/Kconfig          |   9 +
 drivers/pci/host/Makefile         |   1 +
 drivers/pci/host/pcie-iproc-msi.c | 434 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/host/pcie-iproc.c     |  19 ++
 drivers/pci/host/pcie-iproc.h     |  12 ++
 5 files changed, 475 insertions(+)
 create mode 100644 drivers/pci/host/pcie-iproc-msi.c

diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index f131ba9..972e906 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -126,6 +126,15 @@ config PCIE_IPROC
 	  iProc family of SoCs. An appropriate bus interface driver also needs
 	  to be enabled
 
+config PCIE_IPROC_MSI
+	bool "Broadcom iProc PCIe MSI support"
+	depends on ARCH_BCM_IPROC && PCI_MSI
+	select PCI_MSI_IRQ_DOMAIN
+	default ARCH_BCM_IPROC
+	help
+	  Say Y here if you want to enable MSI support for Broadcom's iProc
+	  PCIe controller
+
 config PCIE_IPROC_PLATFORM
 	tristate "Broadcom iProc PCIe platform bus driver"
 	depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
index 9d4d3c6..0e4e95e 100644
--- a/drivers/pci/host/Makefile
+++ b/drivers/pci/host/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
 obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
 obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
 obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
+obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
 obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
 obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
 obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
diff --git a/drivers/pci/host/pcie-iproc-msi.c b/drivers/pci/host/pcie-iproc-msi.c
new file mode 100644
index 0000000..a55c707
--- /dev/null
+++ b/drivers/pci/host/pcie-iproc-msi.c
@@ -0,0 +1,434 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/irqchip/chained_irq.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
+#include <linux/of_irq.h>
+#include <linux/of_pci.h>
+#include <linux/pci.h>
+
+#include "pcie-iproc.h"
+
+#define IPROC_MSI_INTS_EN_OFFSET       0x208
+#define IPROC_MSI_INTR_EN_SHIFT        11
+#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
+#define IPROC_MSI_INT_N_EVENT_SHIFT    1
+#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
+#define IPROC_MSI_EQ_EN_SHIFT          0
+#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
+
+#define IPROC_MSI_EQ_MASK              0x3f
+
+/* number of queues in each event queue */
+#define IPROC_MSI_EQ_LEN               64
+
+/* size of each event queue memory region */
+#define EQ_MEM_REGION_SIZE           SZ_4K
+
+/* size of each MSI message memory region */
+#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
+
+enum iproc_msi_reg {
+	IPROC_MSI_EQ_PAGE = 0,
+	IPROC_MSI_EQ_PAGE_UPPER,
+	IPROC_MSI_PAGE,
+	IPROC_MSI_PAGE_UPPER,
+	IPROC_MSI_CTRL,
+	IPROC_MSI_EQ_HEAD,
+	IPROC_MSI_EQ_TAIL,
+	IPROC_MSI_REG_SIZE,
+};
+
+/**
+ * iProc event queue based MSI
+ *
+ * Only meant to be used on platforms without MSI support integrated into the
+ * GIC
+ *
+ * @pcie: pointer to iProc PCIe data
+ * @reg_offsets: MSI register offsets
+ * @irqs: pointer to an array that contains the interrupt IDs
+ * @nirqs: number of total interrupts
+ * @has_inten_reg: indicates the MSI interrupt enable register needs to be
+ * set explicitly (required for some legacy platforms)
+ * @used: bitmap to track usage of MSI
+ * @inner_domain: inner IRQ domain
+ * @msi_domain: MSI IRQ domain
+ * @bitmap_lock: lock to protect access to the IRQ bitmap
+ * @n_eq_region: required number of 4K aligned memory region for MSI event
+ * queues
+ * @n_msi_msg_region: required number of 4K aligned memory region for MSI
+ * posted writes
+ * @eq_base: pointer to allocated memory region for MSI event queues
+ * @msi_base: pointer to allocated memory region for MSI posted writes
+ */
+struct iproc_msi {
+	struct iproc_pcie *pcie;
+	const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
+	int *irqs;
+	int nirqs;
+	bool has_inten_reg;
+	DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
+	struct irq_domain *inner_domain;
+	struct irq_domain *msi_domain;
+	struct mutex bitmap_lock;
+	unsigned int n_eq_region;
+	unsigned int n_msi_msg_region;
+	void *eq_base;
+	void *msi_base;
+};
+
+static const u16
+iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
+	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
+};
+
+static const u16
+iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
+	{ 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
+	{ 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
+	{ 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
+	{ 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
+};
+
+static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
+				     enum iproc_msi_reg reg,
+				     unsigned int eq)
+{
+	struct iproc_pcie *pcie = msi->pcie;
+
+	return readl(pcie->base + msi->reg_offsets[eq][reg]);
+}
+
+static inline void iproc_msi_write_reg(struct iproc_msi *msi,
+				       enum iproc_msi_reg reg,
+				       int eq, u32 val)
+{
+	struct iproc_pcie *pcie = msi->pcie;
+
+	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
+}
+
+static struct irq_chip iproc_msi_top_irq_chip = {
+	.name = "iProc MSI",
+	.irq_enable = pci_msi_unmask_irq,
+	.irq_disable = pci_msi_mask_irq,
+	.irq_mask = pci_msi_mask_irq,
+	.irq_unmask = pci_msi_unmask_irq,
+};
+
+static struct msi_domain_info iproc_msi_domain_info = {
+	.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+		MSI_FLAG_PCI_MSIX,
+	.chip = &iproc_msi_top_irq_chip,
+};
+
+static int iproc_msi_irq_set_affinity(struct irq_data *data,
+				      const struct cpumask *mask, bool force)
+{
+	return -EINVAL;
+}
+
+static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
+					  struct msi_msg *msg)
+{
+	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
+	phys_addr_t addr;
+
+	addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
+	msg->address_lo = lower_32_bits(addr);
+	msg->address_hi = upper_32_bits(addr);
+	msg->data = data->hwirq;
+}
+
+static struct irq_chip iproc_msi_bottom_irq_chip = {
+	.name = "MSI",
+	.irq_set_affinity = iproc_msi_irq_set_affinity,
+	.irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
+};
+
+static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
+				      unsigned int virq, unsigned int nr_irqs,
+				      void *args)
+{
+	struct iproc_msi *msi = domain->host_data;
+	int i, msi_irq;
+
+	mutex_lock(&msi->bitmap_lock);
+
+	for (i = 0; i < nr_irqs; i++) {
+		msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
+		if (msi_irq < msi->nirqs) {
+			set_bit(msi_irq, msi->used);
+		} else {
+			mutex_unlock(&msi->bitmap_lock);
+			return -ENOSPC;
+		}
+
+		irq_domain_set_info(domain, virq + i, msi_irq,
+				    &iproc_msi_bottom_irq_chip,
+				    domain->host_data, handle_simple_irq,
+				    NULL, NULL);
+	}
+
+	mutex_unlock(&msi->bitmap_lock);
+	return 0;
+}
+
+static void iproc_msi_irq_domain_free(struct irq_domain *domain,
+				      unsigned int virq, unsigned int nr_irqs)
+{
+	struct irq_data *data = irq_domain_get_irq_data(domain, virq);
+	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
+	unsigned int i;
+
+	mutex_lock(&msi->bitmap_lock);
+
+	for (i = 0; i < nr_irqs; i++) {
+		struct irq_data *data = irq_domain_get_irq_data(domain,
+								virq + i);
+		if (!test_bit(data->hwirq, msi->used)) {
+			dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
+				 data->hwirq);
+		} else
+			clear_bit(data->hwirq, msi->used);
+	}
+
+	mutex_unlock(&msi->bitmap_lock);
+	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops msi_domain_ops = {
+	.alloc = iproc_msi_irq_domain_alloc,
+	.free = iproc_msi_irq_domain_free,
+};
+
+static void iproc_msi_enable(struct iproc_msi *msi)
+{
+	struct iproc_pcie *pcie = msi->pcie;
+	int i, eq;
+	u32 val;
+
+	/* program memory region for each event queue */
+	for (i = 0; i < msi->n_eq_region; i++) {
+		phys_addr_t addr =
+			virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
+
+		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
+				    lower_32_bits(addr));
+		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
+				    upper_32_bits(addr));
+	}
+
+	/* program memory region for MSI posted writes */
+	for (i = 0; i < msi->n_msi_msg_region; i++) {
+		phys_addr_t addr =
+			virt_to_phys(msi->msi_base +
+				     (i * MSI_MSG_MEM_REGION_SIZE));
+
+		iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
+				    lower_32_bits(addr));
+		iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
+				    upper_32_bits(addr));
+	}
+
+	for (eq = 0; eq < msi->nirqs; eq++) {
+		/* enable MSI event queue */
+		val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
+			IPROC_MSI_EQ_EN;
+		iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
+
+		/*
+		 * Some legacy platforms require the MSI interrupt enable
+		 * register to be set explicitly
+		 */
+		if (msi->has_inten_reg) {
+			val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
+			val |= BIT(eq);
+			writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
+		}
+	}
+}
+
+static void iproc_msi_handler(struct irq_desc *desc)
+{
+	unsigned int irq = irq_desc_get_irq(desc);
+	struct irq_chip *irq_chip = irq_desc_get_chip(desc);
+	struct iproc_msi *msi;
+	struct iproc_pcie *pcie;
+	u32 eq, head, tail, num_events;
+	int virq;
+
+	chained_irq_enter(irq_chip, desc);
+
+	msi = irq_get_handler_data(irq);
+	pcie = msi->pcie;
+
+	eq = irq - msi->irqs[0];
+	virq = irq_find_mapping(msi->inner_domain, eq);
+	head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
+		IPROC_MSI_EQ_MASK;
+	do {
+		tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
+			IPROC_MSI_EQ_MASK;
+
+		num_events = (tail < head) ?
+			(IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
+		if (!num_events)
+			break;
+
+		generic_handle_irq(virq);
+
+		head++;
+		head %= IPROC_MSI_EQ_LEN;
+		iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
+	} while (true);
+
+	chained_irq_exit(irq_chip, desc);
+}
+
+int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node)
+{
+	struct iproc_msi *msi;
+	struct device_node *parent_node;
+	struct irq_domain *parent_domain;
+	int i, ret;
+
+	if (!of_device_is_compatible(node, "brcm,iproc-msi"))
+		return -ENODEV;
+
+	if (!of_find_property(node, "msi-controller", NULL))
+		return -ENODEV;
+
+	parent_node = of_parse_phandle(node, "interrupt-parent", 0);
+	if (!parent_node) {
+		dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
+		return -ENODEV;
+	}
+
+	parent_domain = irq_find_host(parent_node);
+	if (!parent_domain) {
+		dev_err(pcie->dev, "unable to get MSI parent domain\n");
+		return -ENODEV;
+	}
+
+	msi = devm_kzalloc(pcie->dev, sizeof(*msi), GFP_KERNEL);
+	if (!msi)
+		return -ENOMEM;
+
+	msi->pcie = pcie;
+	mutex_init(&msi->bitmap_lock);
+
+	switch (pcie->type) {
+	case IPROC_PCIE_PAXB:
+		msi->reg_offsets = iproc_msi_reg_paxb;
+		break;
+	case IPROC_PCIE_PAXC:
+		msi->reg_offsets = iproc_msi_reg_paxc;
+		break;
+	default:
+		dev_err(pcie->dev, "incompatible iProc PCIe interface\n");
+		return -EINVAL;
+	}
+
+	ret = of_property_read_u32(node, "brcm,num-eq-region",
+				   &msi->n_eq_region);
+	if (ret || msi->n_eq_region == 0) {
+		dev_err(pcie->dev,
+			"invalid property 'brcm,num-eq-region' %u\n",
+			msi->n_eq_region);
+		return -ENODEV;
+	}
+
+	ret = of_property_read_u32(node, "brcm,num-msi-msg-region",
+				   &msi->n_msi_msg_region);
+	if (ret || msi->n_msi_msg_region == 0) {
+		dev_err(pcie->dev,
+			"invalid property 'brcm,num-msi-msg-region' %u\n",
+			msi->n_msi_msg_region);
+		return -ENODEV;
+	}
+
+	/* reserve memory for MSI event queue */
+	msi->eq_base = devm_kcalloc(pcie->dev, msi->n_eq_region + 1,
+				    EQ_MEM_REGION_SIZE, GFP_KERNEL);
+	if (!msi->eq_base)
+		return -ENOMEM;
+	msi->eq_base = PTR_ALIGN(msi->eq_base, EQ_MEM_REGION_SIZE);
+
+	/* reserve memory for MSI posted writes */
+	msi->msi_base = devm_kcalloc(pcie->dev, msi->n_msi_msg_region + 1,
+				     MSI_MSG_MEM_REGION_SIZE, GFP_KERNEL);
+	if (!msi->msi_base)
+		return -ENOMEM;
+	msi->msi_base = PTR_ALIGN(msi->msi_base, MSI_MSG_MEM_REGION_SIZE);
+
+	if (of_find_property(node, "brcm,pcie-msi-inten", NULL))
+		msi->has_inten_reg = true;
+
+	msi->nirqs = of_irq_count(node);
+	if (!msi->nirqs) {
+		dev_err(pcie->dev, "found no MSI interrupt in DT\n");
+		return -ENODEV;
+	}
+	if (msi->nirqs > IPROC_PCIE_MAX_NUM_IRQS) {
+		dev_warn(pcie->dev, "too many MSI interrupts defined %d\n",
+			 msi->nirqs);
+		msi->nirqs = IPROC_PCIE_MAX_NUM_IRQS;
+	}
+	msi->irqs = devm_kcalloc(pcie->dev, msi->nirqs, sizeof(*msi->irqs),
+				 GFP_KERNEL);
+	if (!msi->irqs)
+		return -ENOMEM;
+
+	for (i = 0; i < msi->nirqs; i++) {
+		msi->irqs[i] = irq_of_parse_and_map(node, i);
+		if (!msi->irqs[i]) {
+			dev_err(pcie->dev, "unable to parse/map interrupt\n");
+			return -ENODEV;
+		}
+	}
+
+	msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
+						     msi->nirqs, NULL,
+						     &msi_domain_ops,
+						     msi);
+	if (!msi->inner_domain) {
+		dev_err(pcie->dev, "failed to create inner domain\n");
+		return -ENOMEM;
+	}
+
+	msi->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(node),
+						    &iproc_msi_domain_info,
+						    msi->inner_domain);
+	if (!msi->msi_domain) {
+		dev_err(pcie->dev, "failed to create MSI domain\n");
+		irq_domain_remove(msi->inner_domain);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < msi->nirqs; i++)
+		irq_set_chained_handler_and_data(msi->irqs[i],
+						 iproc_msi_handler, msi);
+
+	iproc_msi_enable(msi);
+
+	return 0;
+}
+EXPORT_SYMBOL(iproc_msi_init);
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 24d5b62..a575ef3 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -440,6 +440,21 @@ static int iproc_pcie_map_ranges(struct iproc_pcie *pcie,
 	return 0;
 }
 
+static int iproc_pcie_msi_enable(struct iproc_pcie *pcie)
+{
+	struct device_node *msi_node;
+
+	msi_node = of_parse_phandle(pcie->dev->of_node, "msi-parent", 0);
+	if (!msi_node)
+		return -ENODEV;
+
+	/*
+	 * If another MSI controller is being used, the call below should fail
+	 * but that is okay
+	 */
+	return iproc_msi_init(pcie, msi_node);
+}
+
 int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
 {
 	int ret;
@@ -507,6 +522,10 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
 
 	iproc_pcie_enable(pcie);
 
+	if (IS_ENABLED(CONFIG_PCI_MSI))
+		if (iproc_pcie_msi_enable(pcie))
+			dev_info(pcie->dev, "not using iProc MSI\n");
+
 	pci_scan_child_bus(bus);
 	pci_assign_unassigned_bus_resources(bus);
 	pci_fixup_irqs(pci_common_swizzle, pcie->map_irq);
diff --git a/drivers/pci/host/pcie-iproc.h b/drivers/pci/host/pcie-iproc.h
index 051b651..17317ef 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ -14,6 +14,8 @@
 #ifndef _PCIE_IPROC_H
 #define _PCIE_IPROC_H
 
+#define IPROC_PCIE_MAX_NUM_IRQS 6
+
 /**
  * iProc PCIe interface type
  *
@@ -74,4 +76,14 @@ struct iproc_pcie {
 int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res);
 int iproc_pcie_remove(struct iproc_pcie *pcie);
 
+#ifdef CONFIG_PCI_MSI
+int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node);
+#else
+static inline int iproc_msi_init(struct iproc_pcie *pcie,
+				 struct device_node *node)
+{
+	return -ENODEV;
+}
+#endif
+
 #endif /* _PCIE_IPROC_H */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] ARM: dts: Enable MSI support for Broadcom Cygnus
  2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
                   ` (3 preceding siblings ...)
  2015-11-18  0:31 ` [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support Ray Jui
@ 2015-11-18  0:31 ` Ray Jui
  4 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci, Ray Jui

Enable MSI support for Broadcom Cygnus platforms

Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Pramod KUMAR <pramodku@broadcom.com>
Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
---
 arch/arm/boot/dts/bcm-cygnus.dtsi | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi
index 2778533..bae1ac2 100644
--- a/arch/arm/boot/dts/bcm-cygnus.dtsi
+++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
@@ -162,6 +162,19 @@
 				  0x82000000 0 0x20000000 0x20000000 0 0x04000000>;
 
 			status = "disabled";
+
+			msi-parent = <&msi0>;
+			msi0: msi@18012000 {
+				compatible = "brcm,iproc-msi";
+				msi-controller;
+				interrupt-parent = <&gic>;
+				interrupts = <GIC_SPI 96 IRQ_TYPE_NONE>,
+					     <GIC_SPI 97 IRQ_TYPE_NONE>,
+					     <GIC_SPI 98 IRQ_TYPE_NONE>,
+					     <GIC_SPI 99 IRQ_TYPE_NONE>;
+				brcm,num-eq-region = <1>;
+				brcm,num-msi-msg-region = <1>;
+			};
 		};
 
 		pcie1: pcie@18013000 {
@@ -183,6 +196,19 @@
 				  0x82000000 0 0x40000000 0x40000000 0 0x04000000>;
 
 			status = "disabled";
+
+			msi-parent = <&msi1>;
+			msi1: msi@18013000 {
+				compatible = "brcm,iproc-msi";
+				msi-controller;
+				interrupt-parent = <&gic>;
+				interrupts = <GIC_SPI 102 IRQ_TYPE_NONE>,
+					     <GIC_SPI 103 IRQ_TYPE_NONE>,
+					     <GIC_SPI 104 IRQ_TYPE_NONE>,
+					     <GIC_SPI 105 IRQ_TYPE_NONE>;
+				brcm,num-eq-region = <1>;
+				brcm,num-msi-msg-region = <1>;
+			};
 		};
 
 		uart0: serial@18020000 {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] PCI: iproc: Add PAXC interface support
  2015-11-18  0:31 ` [PATCH 2/5] PCI: iproc: Add PAXC interface support Ray Jui
@ 2015-11-18  0:34   ` Florian Fainelli
  2015-11-18  0:46     ` Ray Jui
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2015-11-18  0:34 UTC (permalink / raw)
  To: Ray Jui, Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On 17/11/15 16:31, Ray Jui wrote:
> Traditionally, all iProc PCIe root complexes use PAXB based wrapper,
> with an integrated on-chip Serdes to support external endpoint devices.
> On newer iProc platforms, a PAXC based wrapper is introduced, for
> connection with internally emulated PCIe endpoint devices in the ASIC
> 
> This patch adds support for PAXC based iProc PCIe root complex in the
> iProc PCIe core driver. This change fators out common logic between
> PAXB and PAXC, and use tables to store register offsets that are
> different between PAXB and PAXC. This allows the driver to be scaled to
> support subsequent PAXC revisions in the future
> 
> Signed-off-by: Ray Jui <rjui@broadcom.com>
> Reviewed-by: Scott Branden <sbranden@broadcom.com>
> ---
>  drivers/pci/host/pcie-iproc-platform.c |   8 ++
>  drivers/pci/host/pcie-iproc.c          | 202 +++++++++++++++++++++++++++------
>  drivers/pci/host/pcie-iproc.h          |  19 ++++
>  3 files changed, 195 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
> index c9550dc..716b56b 100644
> --- a/drivers/pci/host/pcie-iproc-platform.c
> +++ b/drivers/pci/host/pcie-iproc-platform.c
> @@ -42,6 +42,13 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
>  	pcie->dev = &pdev->dev;
>  	platform_set_drvdata(pdev, pcie);
>  
> +	if (of_device_is_compatible(np, "brcm,iproc-pcie"))
> +		pcie->type = IPROC_PCIE_PAXB;
> +	else if (of_device_is_compatible(np, "brcm,iproc-pcie-paxc"))
> +		pcie->type = IPROC_PCIE_PAXC;
> +	else
> +		return -ENODEV;

Sorry for not noticing earlier, but typically, to avoid repeating the
same compatible string twice (once if of_device_id, and somewhere else),
you would put the type in the .data member of the of_device_id lookup
table and you could fetch this directly here. So something like this:

 static const struct of_device_id iproc_pcie_of_match_table[] = {
 	{ .compatible = "brcm,iproc-pcie", .data = (int *)IPROC_PCIE_PAXB },
+	{ .compatible = "brcm,iproc-pcie-paxc", .data = (int *)IPROC_PCIE_PAXC },
 	{ /* sentinel */ }
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] PCI: iproc: Add PAXC interface support
  2015-11-18  0:46     ` Ray Jui
@ 2015-11-18  0:45       ` Florian Fainelli
  2015-11-18  0:47         ` Ray Jui
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2015-11-18  0:45 UTC (permalink / raw)
  To: Ray Jui, Florian Fainelli, Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On 17/11/15 16:46, Ray Jui wrote:
> 
> 
> On 11/17/2015 4:34 PM, Florian Fainelli wrote:
>> On 17/11/15 16:31, Ray Jui wrote:
>>> Traditionally, all iProc PCIe root complexes use PAXB based wrapper,
>>> with an integrated on-chip Serdes to support external endpoint devices.
>>> On newer iProc platforms, a PAXC based wrapper is introduced, for
>>> connection with internally emulated PCIe endpoint devices in the ASIC
>>>
>>> This patch adds support for PAXC based iProc PCIe root complex in the
>>> iProc PCIe core driver. This change fators out common logic between
>>> PAXB and PAXC, and use tables to store register offsets that are
>>> different between PAXB and PAXC. This allows the driver to be scaled to
>>> support subsequent PAXC revisions in the future
>>>
>>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>>> ---
>>>   drivers/pci/host/pcie-iproc-platform.c |   8 ++
>>>   drivers/pci/host/pcie-iproc.c          | 202
>>> +++++++++++++++++++++++++++------
>>>   drivers/pci/host/pcie-iproc.h          |  19 ++++
>>>   3 files changed, 195 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/drivers/pci/host/pcie-iproc-platform.c
>>> b/drivers/pci/host/pcie-iproc-platform.c
>>> index c9550dc..716b56b 100644
>>> --- a/drivers/pci/host/pcie-iproc-platform.c
>>> +++ b/drivers/pci/host/pcie-iproc-platform.c
>>> @@ -42,6 +42,13 @@ static int iproc_pcie_pltfm_probe(struct
>>> platform_device *pdev)
>>>       pcie->dev = &pdev->dev;
>>>       platform_set_drvdata(pdev, pcie);
>>>
>>> +    if (of_device_is_compatible(np, "brcm,iproc-pcie"))
>>> +        pcie->type = IPROC_PCIE_PAXB;
>>> +    else if (of_device_is_compatible(np, "brcm,iproc-pcie-paxc"))
>>> +        pcie->type = IPROC_PCIE_PAXC;
>>> +    else
>>> +        return -ENODEV;
>>
>> Sorry for not noticing earlier, but typically, to avoid repeating the
>> same compatible string twice (once if of_device_id, and somewhere else),
>> you would put the type in the .data member of the of_device_id lookup
>> table and you could fetch this directly here. So something like this:
>>
>>   static const struct of_device_id iproc_pcie_of_match_table[] = {
>>       { .compatible = "brcm,iproc-pcie", .data = (int
>> *)IPROC_PCIE_PAXB },
>> +    { .compatible = "brcm,iproc-pcie-paxc", .data = (int
>> *)IPROC_PCIE_PAXC },
>>       { /* sentinel */ }
>>
> 
> Just to confirm, if I do this, then the above code to check
> of_device_is_compatible becomes the following?
> 
> pcie->type = (enum iproc_pcie_type)of_id->data;

That is correct yes.
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] PCI: iproc: Add PAXC interface support
  2015-11-18  0:34   ` Florian Fainelli
@ 2015-11-18  0:46     ` Ray Jui
  2015-11-18  0:45       ` Florian Fainelli
  0 siblings, 1 reply; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:46 UTC (permalink / raw)
  To: Florian Fainelli, Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci



On 11/17/2015 4:34 PM, Florian Fainelli wrote:
> On 17/11/15 16:31, Ray Jui wrote:
>> Traditionally, all iProc PCIe root complexes use PAXB based wrapper,
>> with an integrated on-chip Serdes to support external endpoint devices.
>> On newer iProc platforms, a PAXC based wrapper is introduced, for
>> connection with internally emulated PCIe endpoint devices in the ASIC
>>
>> This patch adds support for PAXC based iProc PCIe root complex in the
>> iProc PCIe core driver. This change fators out common logic between
>> PAXB and PAXC, and use tables to store register offsets that are
>> different between PAXB and PAXC. This allows the driver to be scaled to
>> support subsequent PAXC revisions in the future
>>
>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>> ---
>>   drivers/pci/host/pcie-iproc-platform.c |   8 ++
>>   drivers/pci/host/pcie-iproc.c          | 202 +++++++++++++++++++++++++++------
>>   drivers/pci/host/pcie-iproc.h          |  19 ++++
>>   3 files changed, 195 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
>> index c9550dc..716b56b 100644
>> --- a/drivers/pci/host/pcie-iproc-platform.c
>> +++ b/drivers/pci/host/pcie-iproc-platform.c
>> @@ -42,6 +42,13 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
>>   	pcie->dev = &pdev->dev;
>>   	platform_set_drvdata(pdev, pcie);
>>
>> +	if (of_device_is_compatible(np, "brcm,iproc-pcie"))
>> +		pcie->type = IPROC_PCIE_PAXB;
>> +	else if (of_device_is_compatible(np, "brcm,iproc-pcie-paxc"))
>> +		pcie->type = IPROC_PCIE_PAXC;
>> +	else
>> +		return -ENODEV;
>
> Sorry for not noticing earlier, but typically, to avoid repeating the
> same compatible string twice (once if of_device_id, and somewhere else),
> you would put the type in the .data member of the of_device_id lookup
> table and you could fetch this directly here. So something like this:
>
>   static const struct of_device_id iproc_pcie_of_match_table[] = {
>   	{ .compatible = "brcm,iproc-pcie", .data = (int *)IPROC_PCIE_PAXB },
> +	{ .compatible = "brcm,iproc-pcie-paxc", .data = (int *)IPROC_PCIE_PAXC },
>   	{ /* sentinel */ }
>

Just to confirm, if I do this, then the above code to check 
of_device_is_compatible becomes the following?

pcie->type = (enum iproc_pcie_type)of_id->data;

Thanks,

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] PCI: iproc: Add PAXC interface support
  2015-11-18  0:45       ` Florian Fainelli
@ 2015-11-18  0:47         ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-18  0:47 UTC (permalink / raw)
  To: Florian Fainelli, Bjorn Helgaas
  Cc: Marc Zyngier, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci



On 11/17/2015 4:45 PM, Florian Fainelli wrote:
> On 17/11/15 16:46, Ray Jui wrote:
>>
>>
>> On 11/17/2015 4:34 PM, Florian Fainelli wrote:
>>> On 17/11/15 16:31, Ray Jui wrote:
>>>> Traditionally, all iProc PCIe root complexes use PAXB based wrapper,
>>>> with an integrated on-chip Serdes to support external endpoint devices.
>>>> On newer iProc platforms, a PAXC based wrapper is introduced, for
>>>> connection with internally emulated PCIe endpoint devices in the ASIC
>>>>
>>>> This patch adds support for PAXC based iProc PCIe root complex in the
>>>> iProc PCIe core driver. This change fators out common logic between
>>>> PAXB and PAXC, and use tables to store register offsets that are
>>>> different between PAXB and PAXC. This allows the driver to be scaled to
>>>> support subsequent PAXC revisions in the future
>>>>
>>>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>>>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>>>> ---
>>>>    drivers/pci/host/pcie-iproc-platform.c |   8 ++
>>>>    drivers/pci/host/pcie-iproc.c          | 202
>>>> +++++++++++++++++++++++++++------
>>>>    drivers/pci/host/pcie-iproc.h          |  19 ++++
>>>>    3 files changed, 195 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/host/pcie-iproc-platform.c
>>>> b/drivers/pci/host/pcie-iproc-platform.c
>>>> index c9550dc..716b56b 100644
>>>> --- a/drivers/pci/host/pcie-iproc-platform.c
>>>> +++ b/drivers/pci/host/pcie-iproc-platform.c
>>>> @@ -42,6 +42,13 @@ static int iproc_pcie_pltfm_probe(struct
>>>> platform_device *pdev)
>>>>        pcie->dev = &pdev->dev;
>>>>        platform_set_drvdata(pdev, pcie);
>>>>
>>>> +    if (of_device_is_compatible(np, "brcm,iproc-pcie"))
>>>> +        pcie->type = IPROC_PCIE_PAXB;
>>>> +    else if (of_device_is_compatible(np, "brcm,iproc-pcie-paxc"))
>>>> +        pcie->type = IPROC_PCIE_PAXC;
>>>> +    else
>>>> +        return -ENODEV;
>>>
>>> Sorry for not noticing earlier, but typically, to avoid repeating the
>>> same compatible string twice (once if of_device_id, and somewhere else),
>>> you would put the type in the .data member of the of_device_id lookup
>>> table and you could fetch this directly here. So something like this:
>>>
>>>    static const struct of_device_id iproc_pcie_of_match_table[] = {
>>>        { .compatible = "brcm,iproc-pcie", .data = (int
>>> *)IPROC_PCIE_PAXB },
>>> +    { .compatible = "brcm,iproc-pcie-paxc", .data = (int
>>> *)IPROC_PCIE_PAXC },
>>>        { /* sentinel */ }
>>>
>>
>> Just to confirm, if I do this, then the above code to check
>> of_device_is_compatible becomes the following?
>>
>> pcie->type = (enum iproc_pcie_type)of_id->data;
>
> That is correct yes.
>

Okay thanks. Will wait for more comments and include this in the next 
iteration!

Ray



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-18  0:31 ` [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support Ray Jui
@ 2015-11-18  8:48   ` Marc Zyngier
  2015-11-18  9:50     ` Arnd Bergmann
  2015-11-19  1:37     ` Ray Jui
  0 siblings, 2 replies; 20+ messages in thread
From: Marc Zyngier @ 2015-11-18  8:48 UTC (permalink / raw)
  To: Ray Jui
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On Tue, 17 Nov 2015 16:31:54 -0800
Ray Jui <rjui@broadcom.com> wrote:

Hi Ray,

A few comments below.

> This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
> all iProc based platforms. The patch follows the latest trend in the
> kernel to use MSI domain based implementation
> 
> This iProc event queue based MSI support should not be used with newer
> platforms with integrated MSI support in the GIC (e.g., giv2m or
> gicv3-its)
> 
> Signed-off-by: Ray Jui <rjui@broadcom.com>
> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
> Reviewed-by: Scott Branden <sbranden@broadcom.com>
> ---
>  drivers/pci/host/Kconfig          |   9 +
>  drivers/pci/host/Makefile         |   1 +
>  drivers/pci/host/pcie-iproc-msi.c | 434 ++++++++++++++++++++++++++++++++++++++
>  drivers/pci/host/pcie-iproc.c     |  19 ++
>  drivers/pci/host/pcie-iproc.h     |  12 ++
>  5 files changed, 475 insertions(+)
>  create mode 100644 drivers/pci/host/pcie-iproc-msi.c
> 
> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
> index f131ba9..972e906 100644
> --- a/drivers/pci/host/Kconfig
> +++ b/drivers/pci/host/Kconfig
> @@ -126,6 +126,15 @@ config PCIE_IPROC
>  	  iProc family of SoCs. An appropriate bus interface driver also needs
>  	  to be enabled
>  
> +config PCIE_IPROC_MSI
> +	bool "Broadcom iProc PCIe MSI support"
> +	depends on ARCH_BCM_IPROC && PCI_MSI
> +	select PCI_MSI_IRQ_DOMAIN
> +	default ARCH_BCM_IPROC
> +	help
> +	  Say Y here if you want to enable MSI support for Broadcom's iProc
> +	  PCIe controller
> +
>  config PCIE_IPROC_PLATFORM
>  	tristate "Broadcom iProc PCIe platform bus driver"
>  	depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
> index 9d4d3c6..0e4e95e 100644
> --- a/drivers/pci/host/Makefile
> +++ b/drivers/pci/host/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>  obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>  obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>  obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
> +obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
>  obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
>  obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
>  obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
> diff --git a/drivers/pci/host/pcie-iproc-msi.c b/drivers/pci/host/pcie-iproc-msi.c
> new file mode 100644
> index 0000000..a55c707
> --- /dev/null
> +++ b/drivers/pci/host/pcie-iproc-msi.c
> @@ -0,0 +1,434 @@
> +/*
> + * Copyright (C) 2015 Broadcom Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation version 2.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_pci.h>
> +#include <linux/pci.h>
> +
> +#include "pcie-iproc.h"
> +
> +#define IPROC_MSI_INTS_EN_OFFSET       0x208
> +#define IPROC_MSI_INTR_EN_SHIFT        11
> +#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
> +#define IPROC_MSI_INT_N_EVENT_SHIFT    1
> +#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
> +#define IPROC_MSI_EQ_EN_SHIFT          0
> +#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
> +
> +#define IPROC_MSI_EQ_MASK              0x3f
> +
> +/* number of queues in each event queue */
> +#define IPROC_MSI_EQ_LEN               64
> +
> +/* size of each event queue memory region */
> +#define EQ_MEM_REGION_SIZE           SZ_4K
> +
> +/* size of each MSI message memory region */
> +#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
> +
> +enum iproc_msi_reg {
> +	IPROC_MSI_EQ_PAGE = 0,
> +	IPROC_MSI_EQ_PAGE_UPPER,
> +	IPROC_MSI_PAGE,
> +	IPROC_MSI_PAGE_UPPER,
> +	IPROC_MSI_CTRL,
> +	IPROC_MSI_EQ_HEAD,
> +	IPROC_MSI_EQ_TAIL,
> +	IPROC_MSI_REG_SIZE,
> +};
> +
> +/**
> + * iProc event queue based MSI
> + *
> + * Only meant to be used on platforms without MSI support integrated into the
> + * GIC
> + *
> + * @pcie: pointer to iProc PCIe data
> + * @reg_offsets: MSI register offsets
> + * @irqs: pointer to an array that contains the interrupt IDs
> + * @nirqs: number of total interrupts
> + * @has_inten_reg: indicates the MSI interrupt enable register needs to be
> + * set explicitly (required for some legacy platforms)
> + * @used: bitmap to track usage of MSI
> + * @inner_domain: inner IRQ domain
> + * @msi_domain: MSI IRQ domain
> + * @bitmap_lock: lock to protect access to the IRQ bitmap
> + * @n_eq_region: required number of 4K aligned memory region for MSI event
> + * queues
> + * @n_msi_msg_region: required number of 4K aligned memory region for MSI
> + * posted writes
> + * @eq_base: pointer to allocated memory region for MSI event queues
> + * @msi_base: pointer to allocated memory region for MSI posted writes
> + */
> +struct iproc_msi {
> +	struct iproc_pcie *pcie;
> +	const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
> +	int *irqs;
> +	int nirqs;
> +	bool has_inten_reg;
> +	DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
> +	struct irq_domain *inner_domain;
> +	struct irq_domain *msi_domain;
> +	struct mutex bitmap_lock;
> +	unsigned int n_eq_region;
> +	unsigned int n_msi_msg_region;
> +	void *eq_base;
> +	void *msi_base;
> +};
> +
> +static const u16
> +iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
> +};
> +
> +static const u16
> +iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
> +	{ 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
> +	{ 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
> +	{ 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
> +	{ 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
> +};
> +
> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
> +				     enum iproc_msi_reg reg,
> +				     unsigned int eq)
> +{
> +	struct iproc_pcie *pcie = msi->pcie;
> +
> +	return readl(pcie->base + msi->reg_offsets[eq][reg]);

Do you need the extra barrier implied by readl? readl_relaxed should be
enough.

> +}
> +
> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
> +				       enum iproc_msi_reg reg,
> +				       int eq, u32 val)
> +{
> +	struct iproc_pcie *pcie = msi->pcie;
> +
> +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);

Same here for writel vs writel_relaxed.

> +}
> +
> +static struct irq_chip iproc_msi_top_irq_chip = {
> +	.name = "iProc MSI",
> +	.irq_enable = pci_msi_unmask_irq,
> +	.irq_disable = pci_msi_mask_irq,
> +	.irq_mask = pci_msi_mask_irq,
> +	.irq_unmask = pci_msi_unmask_irq,

There is no need to provide both enable/disable and mask/unmask. And
since pci_msi_{un}mask_irq is the default, you can get rid of these
function pointers anyway.

> +};
> +
> +static struct msi_domain_info iproc_msi_domain_info = {
> +	.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> +		MSI_FLAG_PCI_MSIX,
> +	.chip = &iproc_msi_top_irq_chip,
> +};
> +
> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
> +				      const struct cpumask *mask, bool force)
> +{
> +	return -EINVAL;

I wish people would stop building stupid HW that prevents proper
affinity setting for MSI...

> +}
> +
> +static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
> +					  struct msi_msg *msg)
> +{
> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
> +	phys_addr_t addr;
> +
> +	addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
> +	msg->address_lo = lower_32_bits(addr);
> +	msg->address_hi = upper_32_bits(addr);
> +	msg->data = data->hwirq;
> +}
> +
> +static struct irq_chip iproc_msi_bottom_irq_chip = {
> +	.name = "MSI",
> +	.irq_set_affinity = iproc_msi_irq_set_affinity,
> +	.irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
> +};
> +
> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
> +				      unsigned int virq, unsigned int nr_irqs,
> +				      void *args)
> +{
> +	struct iproc_msi *msi = domain->host_data;
> +	int i, msi_irq;
> +
> +	mutex_lock(&msi->bitmap_lock);
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		msi_irq = find_first_zero_bit(msi->used, msi->nirqs);

This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
we end up with a larger number of MSIs (32 or 64) multiplexed on top of
a small number of wired interrupts. Here, you seem to have a 1-1
mapping. Is that really the case?

If so (and assuming the wired interrupts are always contiguous), you
shouldn't represent this as a chained interrupt (a multiplexer), but as
a stacked irqchip, similar to what GICv2m does.

> +		if (msi_irq < msi->nirqs) {
> +			set_bit(msi_irq, msi->used);
> +		} else {
> +			mutex_unlock(&msi->bitmap_lock);
> +			return -ENOSPC;
> +		}
> +
> +		irq_domain_set_info(domain, virq + i, msi_irq,
> +				    &iproc_msi_bottom_irq_chip,
> +				    domain->host_data, handle_simple_irq,
> +				    NULL, NULL);
> +	}
> +
> +	mutex_unlock(&msi->bitmap_lock);
> +	return 0;
> +}
> +
> +static void iproc_msi_irq_domain_free(struct irq_domain *domain,
> +				      unsigned int virq, unsigned int nr_irqs)
> +{
> +	struct irq_data *data = irq_domain_get_irq_data(domain, virq);
> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
> +	unsigned int i;
> +
> +	mutex_lock(&msi->bitmap_lock);
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		struct irq_data *data = irq_domain_get_irq_data(domain,
> +								virq + i);
> +		if (!test_bit(data->hwirq, msi->used)) {
> +			dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
> +				 data->hwirq);
> +		} else
> +			clear_bit(data->hwirq, msi->used);
> +	}
> +
> +	mutex_unlock(&msi->bitmap_lock);
> +	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops msi_domain_ops = {
> +	.alloc = iproc_msi_irq_domain_alloc,
> +	.free = iproc_msi_irq_domain_free,
> +};
> +
> +static void iproc_msi_enable(struct iproc_msi *msi)
> +{
> +	struct iproc_pcie *pcie = msi->pcie;
> +	int i, eq;
> +	u32 val;
> +
> +	/* program memory region for each event queue */
> +	for (i = 0; i < msi->n_eq_region; i++) {
> +		phys_addr_t addr =
> +			virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
> +
> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
> +				    lower_32_bits(addr));
> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
> +				    upper_32_bits(addr));
> +	}
> +
> +	/* program memory region for MSI posted writes */
> +	for (i = 0; i < msi->n_msi_msg_region; i++) {
> +		phys_addr_t addr =
> +			virt_to_phys(msi->msi_base +
> +				     (i * MSI_MSG_MEM_REGION_SIZE));

You seem to be a victim of checkpatch. Please don't split statements
like this, it just make it harder to read. My terminal can wrap lines
very conveniently, and I don't care about the 80 character limit...

> +
> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
> +				    lower_32_bits(addr));
> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
> +				    upper_32_bits(addr));
> +	}
> +
> +	for (eq = 0; eq < msi->nirqs; eq++) {
> +		/* enable MSI event queue */
> +		val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
> +			IPROC_MSI_EQ_EN;
> +		iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
> +
> +		/*
> +		 * Some legacy platforms require the MSI interrupt enable
> +		 * register to be set explicitly
> +		 */
> +		if (msi->has_inten_reg) {
> +			val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
> +			val |= BIT(eq);
> +			writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
> +		}
> +	}
> +}
> +
> +static void iproc_msi_handler(struct irq_desc *desc)
> +{
> +	unsigned int irq = irq_desc_get_irq(desc);
> +	struct irq_chip *irq_chip = irq_desc_get_chip(desc);
> +	struct iproc_msi *msi;
> +	struct iproc_pcie *pcie;
> +	u32 eq, head, tail, num_events;
> +	int virq;
> +
> +	chained_irq_enter(irq_chip, desc);
> +
> +	msi = irq_get_handler_data(irq);
> +	pcie = msi->pcie;
> +
> +	eq = irq - msi->irqs[0];
> +	virq = irq_find_mapping(msi->inner_domain, eq);
> +	head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
> +		IPROC_MSI_EQ_MASK;
> +	do {
> +		tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
> +			IPROC_MSI_EQ_MASK;
> +
> +		num_events = (tail < head) ?
> +			(IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
> +		if (!num_events)
> +			break;
> +
> +		generic_handle_irq(virq);
> +
> +		head++;
> +		head %= IPROC_MSI_EQ_LEN;
> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
> +	} while (true);

That's unusual. You seem to get only one interrupt for a bunch of MSIs,
all representing the same IRQ? That feels very weird, as you can
usually collapse edge interrupts.

> +
> +	chained_irq_exit(irq_chip, desc);
> +}
> +
> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node)
> +{
> +	struct iproc_msi *msi;
> +	struct device_node *parent_node;
> +	struct irq_domain *parent_domain;
> +	int i, ret;
> +
> +	if (!of_device_is_compatible(node, "brcm,iproc-msi"))
> +		return -ENODEV;
> +
> +	if (!of_find_property(node, "msi-controller", NULL))
> +		return -ENODEV;
> +
> +	parent_node = of_parse_phandle(node, "interrupt-parent", 0);
> +	if (!parent_node) {
> +		dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
> +		return -ENODEV;
> +	}
> +
> +	parent_domain = irq_find_host(parent_node);
> +	if (!parent_domain) {
> +		dev_err(pcie->dev, "unable to get MSI parent domain\n");
> +		return -ENODEV;
> +	}
> +
> +	msi = devm_kzalloc(pcie->dev, sizeof(*msi), GFP_KERNEL);
> +	if (!msi)
> +		return -ENOMEM;
> +
> +	msi->pcie = pcie;
> +	mutex_init(&msi->bitmap_lock);
> +
> +	switch (pcie->type) {
> +	case IPROC_PCIE_PAXB:
> +		msi->reg_offsets = iproc_msi_reg_paxb;
> +		break;
> +	case IPROC_PCIE_PAXC:
> +		msi->reg_offsets = iproc_msi_reg_paxc;
> +		break;
> +	default:
> +		dev_err(pcie->dev, "incompatible iProc PCIe interface\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = of_property_read_u32(node, "brcm,num-eq-region",
> +				   &msi->n_eq_region);
> +	if (ret || msi->n_eq_region == 0) {
> +		dev_err(pcie->dev,
> +			"invalid property 'brcm,num-eq-region' %u\n",
> +			msi->n_eq_region);
> +		return -ENODEV;
> +	}
> +
> +	ret = of_property_read_u32(node, "brcm,num-msi-msg-region",
> +				   &msi->n_msi_msg_region);
> +	if (ret || msi->n_msi_msg_region == 0) {
> +		dev_err(pcie->dev,
> +			"invalid property 'brcm,num-msi-msg-region' %u\n",
> +			msi->n_msi_msg_region);
> +		return -ENODEV;
> +	}
> +
> +	/* reserve memory for MSI event queue */
> +	msi->eq_base = devm_kcalloc(pcie->dev, msi->n_eq_region + 1,
> +				    EQ_MEM_REGION_SIZE, GFP_KERNEL);
> +	if (!msi->eq_base)
> +		return -ENOMEM;
> +	msi->eq_base = PTR_ALIGN(msi->eq_base, EQ_MEM_REGION_SIZE);
> +
> +	/* reserve memory for MSI posted writes */
> +	msi->msi_base = devm_kcalloc(pcie->dev, msi->n_msi_msg_region + 1,
> +				     MSI_MSG_MEM_REGION_SIZE, GFP_KERNEL);
> +	if (!msi->msi_base)
> +		return -ENOMEM;
> +	msi->msi_base = PTR_ALIGN(msi->msi_base, MSI_MSG_MEM_REGION_SIZE);
> +
> +	if (of_find_property(node, "brcm,pcie-msi-inten", NULL))
> +		msi->has_inten_reg = true;
> +
> +	msi->nirqs = of_irq_count(node);
> +	if (!msi->nirqs) {
> +		dev_err(pcie->dev, "found no MSI interrupt in DT\n");
> +		return -ENODEV;
> +	}
> +	if (msi->nirqs > IPROC_PCIE_MAX_NUM_IRQS) {
> +		dev_warn(pcie->dev, "too many MSI interrupts defined %d\n",
> +			 msi->nirqs);
> +		msi->nirqs = IPROC_PCIE_MAX_NUM_IRQS;
> +	}
> +	msi->irqs = devm_kcalloc(pcie->dev, msi->nirqs, sizeof(*msi->irqs),
> +				 GFP_KERNEL);
> +	if (!msi->irqs)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < msi->nirqs; i++) {
> +		msi->irqs[i] = irq_of_parse_and_map(node, i);
> +		if (!msi->irqs[i]) {
> +			dev_err(pcie->dev, "unable to parse/map interrupt\n");
> +			return -ENODEV;
> +		}
> +	}
> +
> +	msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
> +						     msi->nirqs, NULL,
> +						     &msi_domain_ops,
> +						     msi);
> +	if (!msi->inner_domain) {
> +		dev_err(pcie->dev, "failed to create inner domain\n");
> +		return -ENOMEM;
> +	}
> +
> +	msi->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(node),
> +						    &iproc_msi_domain_info,
> +						    msi->inner_domain);
> +	if (!msi->msi_domain) {
> +		dev_err(pcie->dev, "failed to create MSI domain\n");
> +		irq_domain_remove(msi->inner_domain);
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < msi->nirqs; i++)
> +		irq_set_chained_handler_and_data(msi->irqs[i],
> +						 iproc_msi_handler, msi);
> +
> +	iproc_msi_enable(msi);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(iproc_msi_init);

Do you really intend for this to be built as a standalone module?

> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> index 24d5b62..a575ef3 100644
> --- a/drivers/pci/host/pcie-iproc.c
> +++ b/drivers/pci/host/pcie-iproc.c
> @@ -440,6 +440,21 @@ static int iproc_pcie_map_ranges(struct iproc_pcie *pcie,
>  	return 0;
>  }
>  
> +static int iproc_pcie_msi_enable(struct iproc_pcie *pcie)
> +{
> +	struct device_node *msi_node;
> +
> +	msi_node = of_parse_phandle(pcie->dev->of_node, "msi-parent", 0);
> +	if (!msi_node)
> +		return -ENODEV;
> +
> +	/*
> +	 * If another MSI controller is being used, the call below should fail
> +	 * but that is okay
> +	 */
> +	return iproc_msi_init(pcie, msi_node);
> +}
> +
>  int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>  {
>  	int ret;
> @@ -507,6 +522,10 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>  
>  	iproc_pcie_enable(pcie);
>  
> +	if (IS_ENABLED(CONFIG_PCI_MSI))
> +		if (iproc_pcie_msi_enable(pcie))
> +			dev_info(pcie->dev, "not using iProc MSI\n");
> +
>  	pci_scan_child_bus(bus);
>  	pci_assign_unassigned_bus_resources(bus);
>  	pci_fixup_irqs(pci_common_swizzle, pcie->map_irq);
> diff --git a/drivers/pci/host/pcie-iproc.h b/drivers/pci/host/pcie-iproc.h
> index 051b651..17317ef 100644
> --- a/drivers/pci/host/pcie-iproc.h
> +++ b/drivers/pci/host/pcie-iproc.h
> @@ -14,6 +14,8 @@
>  #ifndef _PCIE_IPROC_H
>  #define _PCIE_IPROC_H
>  
> +#define IPROC_PCIE_MAX_NUM_IRQS 6
> +

I don't see the point in putting this in an include file, as it is only
used in a single location.

>  /**
>   * iProc PCIe interface type
>   *
> @@ -74,4 +76,14 @@ struct iproc_pcie {
>  int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res);
>  int iproc_pcie_remove(struct iproc_pcie *pcie);
>  
> +#ifdef CONFIG_PCI_MSI
> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node);
> +#else
> +static inline int iproc_msi_init(struct iproc_pcie *pcie,
> +				 struct device_node *node)
> +{
> +	return -ENODEV;
> +}
> +#endif
> +
>  #endif /* _PCIE_IPROC_H */

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-18  8:48   ` Marc Zyngier
@ 2015-11-18  9:50     ` Arnd Bergmann
  2015-11-19 19:22       ` Ray Jui
  2015-11-19  1:37     ` Ray Jui
  1 sibling, 1 reply; 20+ messages in thread
From: Arnd Bergmann @ 2015-11-18  9:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ray Jui, Bjorn Helgaas, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On Wednesday 18 November 2015 08:48:45 Marc Zyngier wrote:
> > +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
> > +				     enum iproc_msi_reg reg,
> > +				     unsigned int eq)
> > +{
> > +	struct iproc_pcie *pcie = msi->pcie;
> > +
> > +	return readl(pcie->base + msi->reg_offsets[eq][reg]);
> 
> Do you need the extra barrier implied by readl? readl_relaxed should be
> enough.

I suspect this is the one place where it's needed for a lot of
drivers: when the PCI device sends DMA data followed by the MSI
message, the device driver can safely assume that the DMA data
has arrived in memory even without doing another readl() from
the device itself.

It really depends on how the MSI implementation here interacts
with the memory controller, and we should probably have a comment
to explain this either way.

> > +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
> > +				       enum iproc_msi_reg reg,
> > +				       int eq, u32 val)
> > +{
> > +	struct iproc_pcie *pcie = msi->pcie;
> > +
> > +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
> 
> Same here for writel vs writel_relaxed.

We probably want writel_relaxed() when calling this from
iproc_msi_handler(), but not when calling from
iproc_msi_enable(), which should default to a normal
writel(), so we can be sure it's actually configured right
at the time we return from iproc_msi_init(). You could
try to prove that using writel_relaxed is correct here, but
using writel makes it so much easier.

	Arnd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-18  8:48   ` Marc Zyngier
  2015-11-18  9:50     ` Arnd Bergmann
@ 2015-11-19  1:37     ` Ray Jui
  2015-11-19  2:56       ` Ray Jui
                         ` (3 more replies)
  1 sibling, 4 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-19  1:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Marc,

On 11/18/2015 12:48 AM, Marc Zyngier wrote:
> On Tue, 17 Nov 2015 16:31:54 -0800
> Ray Jui <rjui@broadcom.com> wrote:
>
> Hi Ray,
>
> A few comments below.
>
>> This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
>> all iProc based platforms. The patch follows the latest trend in the
>> kernel to use MSI domain based implementation
>>
>> This iProc event queue based MSI support should not be used with newer
>> platforms with integrated MSI support in the GIC (e.g., giv2m or
>> gicv3-its)
>>
>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
>> Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>> ---
>>   drivers/pci/host/Kconfig          |   9 +
>>   drivers/pci/host/Makefile         |   1 +
>>   drivers/pci/host/pcie-iproc-msi.c | 434 ++++++++++++++++++++++++++++++++++++++
>>   drivers/pci/host/pcie-iproc.c     |  19 ++
>>   drivers/pci/host/pcie-iproc.h     |  12 ++
>>   5 files changed, 475 insertions(+)
>>   create mode 100644 drivers/pci/host/pcie-iproc-msi.c
>>
>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
>> index f131ba9..972e906 100644
>> --- a/drivers/pci/host/Kconfig
>> +++ b/drivers/pci/host/Kconfig
>> @@ -126,6 +126,15 @@ config PCIE_IPROC
>>   	  iProc family of SoCs. An appropriate bus interface driver also needs
>>   	  to be enabled
>>
>> +config PCIE_IPROC_MSI
>> +	bool "Broadcom iProc PCIe MSI support"
>> +	depends on ARCH_BCM_IPROC && PCI_MSI
>> +	select PCI_MSI_IRQ_DOMAIN
>> +	default ARCH_BCM_IPROC
>> +	help
>> +	  Say Y here if you want to enable MSI support for Broadcom's iProc
>> +	  PCIe controller
>> +
>>   config PCIE_IPROC_PLATFORM
>>   	tristate "Broadcom iProc PCIe platform bus driver"
>>   	depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
>> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
>> index 9d4d3c6..0e4e95e 100644
>> --- a/drivers/pci/host/Makefile
>> +++ b/drivers/pci/host/Makefile
>> @@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>>   obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>>   obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>>   obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
>> +obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
>>   obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
>>   obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
>>   obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
>> diff --git a/drivers/pci/host/pcie-iproc-msi.c b/drivers/pci/host/pcie-iproc-msi.c
>> new file mode 100644
>> index 0000000..a55c707
>> --- /dev/null
>> +++ b/drivers/pci/host/pcie-iproc-msi.c
>> @@ -0,0 +1,434 @@
>> +/*
>> + * Copyright (C) 2015 Broadcom Corporation
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License as
>> + * published by the Free Software Foundation version 2.
>> + *
>> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
>> + * kind, whether express or implied; without even the implied warranty
>> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include <linux/irqchip/chained_irq.h>
>> +#include <linux/irqdomain.h>
>> +#include <linux/msi.h>
>> +#include <linux/of_irq.h>
>> +#include <linux/of_pci.h>
>> +#include <linux/pci.h>
>> +
>> +#include "pcie-iproc.h"
>> +
>> +#define IPROC_MSI_INTS_EN_OFFSET       0x208
>> +#define IPROC_MSI_INTR_EN_SHIFT        11
>> +#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
>> +#define IPROC_MSI_INT_N_EVENT_SHIFT    1
>> +#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
>> +#define IPROC_MSI_EQ_EN_SHIFT          0
>> +#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
>> +
>> +#define IPROC_MSI_EQ_MASK              0x3f
>> +
>> +/* number of queues in each event queue */
>> +#define IPROC_MSI_EQ_LEN               64
>> +
>> +/* size of each event queue memory region */
>> +#define EQ_MEM_REGION_SIZE           SZ_4K
>> +
>> +/* size of each MSI message memory region */
>> +#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
>> +
>> +enum iproc_msi_reg {
>> +	IPROC_MSI_EQ_PAGE = 0,
>> +	IPROC_MSI_EQ_PAGE_UPPER,
>> +	IPROC_MSI_PAGE,
>> +	IPROC_MSI_PAGE_UPPER,
>> +	IPROC_MSI_CTRL,
>> +	IPROC_MSI_EQ_HEAD,
>> +	IPROC_MSI_EQ_TAIL,
>> +	IPROC_MSI_REG_SIZE,
>> +};
>> +
>> +/**
>> + * iProc event queue based MSI
>> + *
>> + * Only meant to be used on platforms without MSI support integrated into the
>> + * GIC
>> + *
>> + * @pcie: pointer to iProc PCIe data
>> + * @reg_offsets: MSI register offsets
>> + * @irqs: pointer to an array that contains the interrupt IDs
>> + * @nirqs: number of total interrupts
>> + * @has_inten_reg: indicates the MSI interrupt enable register needs to be
>> + * set explicitly (required for some legacy platforms)
>> + * @used: bitmap to track usage of MSI
>> + * @inner_domain: inner IRQ domain
>> + * @msi_domain: MSI IRQ domain
>> + * @bitmap_lock: lock to protect access to the IRQ bitmap
>> + * @n_eq_region: required number of 4K aligned memory region for MSI event
>> + * queues
>> + * @n_msi_msg_region: required number of 4K aligned memory region for MSI
>> + * posted writes
>> + * @eq_base: pointer to allocated memory region for MSI event queues
>> + * @msi_base: pointer to allocated memory region for MSI posted writes
>> + */
>> +struct iproc_msi {
>> +	struct iproc_pcie *pcie;
>> +	const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
>> +	int *irqs;
>> +	int nirqs;
>> +	bool has_inten_reg;
>> +	DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
>> +	struct irq_domain *inner_domain;
>> +	struct irq_domain *msi_domain;
>> +	struct mutex bitmap_lock;
>> +	unsigned int n_eq_region;
>> +	unsigned int n_msi_msg_region;
>> +	void *eq_base;
>> +	void *msi_base;
>> +};
>> +
>> +static const u16
>> +iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
>> +};
>> +
>> +static const u16
>> +iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>> +	{ 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
>> +	{ 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
>> +	{ 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
>> +	{ 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
>> +};
>> +
>> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
>> +				     enum iproc_msi_reg reg,
>> +				     unsigned int eq)
>> +{
>> +	struct iproc_pcie *pcie = msi->pcie;
>> +
>> +	return readl(pcie->base + msi->reg_offsets[eq][reg]);
>
> Do you need the extra barrier implied by readl? readl_relaxed should be
> enough.
>
>> +}
>> +
>> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
>> +				       enum iproc_msi_reg reg,
>> +				       int eq, u32 val)
>> +{
>> +	struct iproc_pcie *pcie = msi->pcie;
>> +
>> +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
>
> Same here for writel vs writel_relaxed.
>

I probably do not need the barrier in most cases. But as we are dealing 
with MSI, I thought it's a lot safer to have the barrier in place so the 
CPU does not re-order the device register accesses with respect to other 
memory accesses?

>> +}
>> +
>> +static struct irq_chip iproc_msi_top_irq_chip = {
>> +	.name = "iProc MSI",
>> +	.irq_enable = pci_msi_unmask_irq,
>> +	.irq_disable = pci_msi_mask_irq,
>> +	.irq_mask = pci_msi_mask_irq,
>> +	.irq_unmask = pci_msi_unmask_irq,
>
> There is no need to provide both enable/disable and mask/unmask. And
> since pci_msi_{un}mask_irq is the default, you can get rid of these
> function pointers anyway.
>

Got it. Like you said, the mask/unmask callback are defaulted to 
pci_msi_{un}mask_irq in pci_msi_domain_update_chip_ops, called when the 
MSI irq domain is created.

I'll get rid of all the callback assignments here.

>> +};
>> +
>> +static struct msi_domain_info iproc_msi_domain_info = {
>> +	.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
>> +		MSI_FLAG_PCI_MSIX,
>> +	.chip = &iproc_msi_top_irq_chip,
>> +};
>> +
>> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
>> +				      const struct cpumask *mask, bool force)
>> +{
>> +	return -EINVAL;
>
> I wish people would stop building stupid HW that prevents proper
> affinity setting for MSI...
>

In fact, there's no reason why the HW should prevent us from setting the 
MSI affinity. This is currently more of a SW issue that I have not spent 
enough time figuring out.

Here's my understanding:

In our system, each MSI is linked to a dedicated interrupt line 
connected to the GIC upstream (e.g., the GIC from Cortex A9 in Cygnus). 
Correct me if I'm wrong, to get the MSI affinity to work, all I need 
should be propagating the affinity setting to the GIC (the 1-to-1 
mapping helps simply things quite a bit here)?

I tried to hook this up with irq_chip_set_affinity_parent. But it looks 
like the irq chip of the parent domain (i.e., the GIC) is pointing to 
NULL, and therefore it would crash when dereferencing it to get the 
irq_set_affinity callback.

I thought I did everything required by figuring out and linking to the 
correct parent domain in the iproc_msi_init routine:

parent_node = of_parse_phandle(node, "interrupt-parent", 0);
if (!parent_node) {
       dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
       return -ENODEV;
}

parent_domain = irq_find_host(parent_node);
if (!parent_domain) {
       dev_err(pcie->dev, "unable to get MSI parent domain\n");
       return -ENODEV;
}

...
...

msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
                                              msi->nirqs, NULL,
                                              &msi_domain_ops,
                                              msi);

I haven't spent too much time investigating, and am hoping to eventually 
enable affinity support with an incremental patch in the future when I 
have more time to investigate.

>> +}
>> +
>> +static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
>> +					  struct msi_msg *msg)
>> +{
>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>> +	phys_addr_t addr;
>> +
>> +	addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
>> +	msg->address_lo = lower_32_bits(addr);
>> +	msg->address_hi = upper_32_bits(addr);
>> +	msg->data = data->hwirq;
>> +}
>> +
>> +static struct irq_chip iproc_msi_bottom_irq_chip = {
>> +	.name = "MSI",
>> +	.irq_set_affinity = iproc_msi_irq_set_affinity,
>> +	.irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
>> +};
>> +
>> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>> +				      unsigned int virq, unsigned int nr_irqs,
>> +				      void *args)
>> +{
>> +	struct iproc_msi *msi = domain->host_data;
>> +	int i, msi_irq;
>> +
>> +	mutex_lock(&msi->bitmap_lock);
>> +
>> +	for (i = 0; i < nr_irqs; i++) {
>> +		msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
>
> This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
> we end up with a larger number of MSIs (32 or 64) multiplexed on top of
> a small number of wired interrupts. Here, you seem to have a 1-1
> mapping. Is that really the case?

Yes, based on the poorly written iProc PCIe arch doc, :), we seem to 
have 1-1 mapping between each wired interrupt and MSI, with each MSI 
handled by an event queue, that consists of 64x word entries allocated 
from host memory (DDR). The MSI data is stored in the low 16-bit of each 
entry, whereas the upper 16-bit of each entry is reserved for the iProc 
PCIe controller for its own use.

>
> If so (and assuming the wired interrupts are always contiguous), you
> shouldn't represent this as a chained interrupt (a multiplexer), but as
> a stacked irqchip, similar to what GICv2m does.
>

Okay, I think I might be missing something here, but I thought I 
currently have a stacked irqdomain (chip), i.e., GIC -> inner_domain -> 
MSI domain?

And does this imply I should expect 'nr_irqs' in this routine to be 
always zero and therefore I can get rid of the for loop here (same in 
the domain free routine)?

>> +		if (msi_irq < msi->nirqs) {
>> +			set_bit(msi_irq, msi->used);
>> +		} else {
>> +			mutex_unlock(&msi->bitmap_lock);
>> +			return -ENOSPC;
>> +		}
>> +
>> +		irq_domain_set_info(domain, virq + i, msi_irq,
>> +				    &iproc_msi_bottom_irq_chip,
>> +				    domain->host_data, handle_simple_irq,
>> +				    NULL, NULL);
>> +	}
>> +
>> +	mutex_unlock(&msi->bitmap_lock);
>> +	return 0;
>> +}
>> +
>> +static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>> +				      unsigned int virq, unsigned int nr_irqs)
>> +{
>> +	struct irq_data *data = irq_domain_get_irq_data(domain, virq);
>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>> +	unsigned int i;
>> +
>> +	mutex_lock(&msi->bitmap_lock);
>> +
>> +	for (i = 0; i < nr_irqs; i++) {
>> +		struct irq_data *data = irq_domain_get_irq_data(domain,
>> +								virq + i);
>> +		if (!test_bit(data->hwirq, msi->used)) {
>> +			dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
>> +				 data->hwirq);
>> +		} else
>> +			clear_bit(data->hwirq, msi->used);
>> +	}
>> +
>> +	mutex_unlock(&msi->bitmap_lock);
>> +	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>> +}
>> +
>> +static const struct irq_domain_ops msi_domain_ops = {
>> +	.alloc = iproc_msi_irq_domain_alloc,
>> +	.free = iproc_msi_irq_domain_free,
>> +};
>> +
>> +static void iproc_msi_enable(struct iproc_msi *msi)
>> +{
>> +	struct iproc_pcie *pcie = msi->pcie;
>> +	int i, eq;
>> +	u32 val;
>> +
>> +	/* program memory region for each event queue */
>> +	for (i = 0; i < msi->n_eq_region; i++) {
>> +		phys_addr_t addr =
>> +			virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
>> +
>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
>> +				    lower_32_bits(addr));
>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
>> +				    upper_32_bits(addr));
>> +	}
>> +
>> +	/* program memory region for MSI posted writes */
>> +	for (i = 0; i < msi->n_msi_msg_region; i++) {
>> +		phys_addr_t addr =
>> +			virt_to_phys(msi->msi_base +
>> +				     (i * MSI_MSG_MEM_REGION_SIZE));
>
> You seem to be a victim of checkpatch. Please don't split statements
> like this, it just make it harder to read. My terminal can wrap lines
> very conveniently, and I don't care about the 80 character limit...
>

Okay will fix.

>> +
>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
>> +				    lower_32_bits(addr));
>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
>> +				    upper_32_bits(addr));
>> +	}
>> +
>> +	for (eq = 0; eq < msi->nirqs; eq++) {
>> +		/* enable MSI event queue */
>> +		val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
>> +			IPROC_MSI_EQ_EN;
>> +		iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
>> +
>> +		/*
>> +		 * Some legacy platforms require the MSI interrupt enable
>> +		 * register to be set explicitly
>> +		 */
>> +		if (msi->has_inten_reg) {
>> +			val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>> +			val |= BIT(eq);
>> +			writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>> +		}
>> +	}
>> +}
>> +
>> +static void iproc_msi_handler(struct irq_desc *desc)
>> +{
>> +	unsigned int irq = irq_desc_get_irq(desc);
>> +	struct irq_chip *irq_chip = irq_desc_get_chip(desc);
>> +	struct iproc_msi *msi;
>> +	struct iproc_pcie *pcie;
>> +	u32 eq, head, tail, num_events;
>> +	int virq;
>> +
>> +	chained_irq_enter(irq_chip, desc);
>> +
>> +	msi = irq_get_handler_data(irq);
>> +	pcie = msi->pcie;
>> +
>> +	eq = irq - msi->irqs[0];
>> +	virq = irq_find_mapping(msi->inner_domain, eq);
>> +	head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
>> +		IPROC_MSI_EQ_MASK;
>> +	do {
>> +		tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
>> +			IPROC_MSI_EQ_MASK;
>> +
>> +		num_events = (tail < head) ?
>> +			(IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
>> +		if (!num_events)
>> +			break;
>> +
>> +		generic_handle_irq(virq);
>> +
>> +		head++;
>> +		head %= IPROC_MSI_EQ_LEN;
>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
>> +	} while (true);
>
> That's unusual. You seem to get only one interrupt for a bunch of MSIs,
> all representing the same IRQ? That feels very weird, as you can
> usually collapse edge interrupts.
>

I think we get one GIC interrupt per MSI line (1:1 mapping), but then 
the MSI data message can be more than one, stored in the event queue 
reserved for that MSI/interrupt line.

When you mentioned chained IRQ above, do you really mean the logic here? 
In fact, I don't think I really need to use the chained irq APIs here, 
as the MSI and GIC interrupt line has a 1-to-1 mapping.

>> +
>> +	chained_irq_exit(irq_chip, desc);
>> +}
>> +

Can probably get rid of the chained_irq_enter and exit?

>> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node)
>> +{
>> +	struct iproc_msi *msi;
>> +	struct device_node *parent_node;
>> +	struct irq_domain *parent_domain;
>> +	int i, ret;
>> +
>> +	if (!of_device_is_compatible(node, "brcm,iproc-msi"))
>> +		return -ENODEV;
>> +
>> +	if (!of_find_property(node, "msi-controller", NULL))
>> +		return -ENODEV;
>> +
>> +	parent_node = of_parse_phandle(node, "interrupt-parent", 0);
>> +	if (!parent_node) {
>> +		dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	parent_domain = irq_find_host(parent_node);
>> +	if (!parent_domain) {
>> +		dev_err(pcie->dev, "unable to get MSI parent domain\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	msi = devm_kzalloc(pcie->dev, sizeof(*msi), GFP_KERNEL);
>> +	if (!msi)
>> +		return -ENOMEM;
>> +
>> +	msi->pcie = pcie;
>> +	mutex_init(&msi->bitmap_lock);
>> +
>> +	switch (pcie->type) {
>> +	case IPROC_PCIE_PAXB:
>> +		msi->reg_offsets = iproc_msi_reg_paxb;
>> +		break;
>> +	case IPROC_PCIE_PAXC:
>> +		msi->reg_offsets = iproc_msi_reg_paxc;
>> +		break;
>> +	default:
>> +		dev_err(pcie->dev, "incompatible iProc PCIe interface\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = of_property_read_u32(node, "brcm,num-eq-region",
>> +				   &msi->n_eq_region);
>> +	if (ret || msi->n_eq_region == 0) {
>> +		dev_err(pcie->dev,
>> +			"invalid property 'brcm,num-eq-region' %u\n",
>> +			msi->n_eq_region);
>> +		return -ENODEV;
>> +	}
>> +
>> +	ret = of_property_read_u32(node, "brcm,num-msi-msg-region",
>> +				   &msi->n_msi_msg_region);
>> +	if (ret || msi->n_msi_msg_region == 0) {
>> +		dev_err(pcie->dev,
>> +			"invalid property 'brcm,num-msi-msg-region' %u\n",
>> +			msi->n_msi_msg_region);
>> +		return -ENODEV;
>> +	}
>> +
>> +	/* reserve memory for MSI event queue */
>> +	msi->eq_base = devm_kcalloc(pcie->dev, msi->n_eq_region + 1,
>> +				    EQ_MEM_REGION_SIZE, GFP_KERNEL);
>> +	if (!msi->eq_base)
>> +		return -ENOMEM;
>> +	msi->eq_base = PTR_ALIGN(msi->eq_base, EQ_MEM_REGION_SIZE);
>> +
>> +	/* reserve memory for MSI posted writes */
>> +	msi->msi_base = devm_kcalloc(pcie->dev, msi->n_msi_msg_region + 1,
>> +				     MSI_MSG_MEM_REGION_SIZE, GFP_KERNEL);
>> +	if (!msi->msi_base)
>> +		return -ENOMEM;
>> +	msi->msi_base = PTR_ALIGN(msi->msi_base, MSI_MSG_MEM_REGION_SIZE);
>> +
>> +	if (of_find_property(node, "brcm,pcie-msi-inten", NULL))
>> +		msi->has_inten_reg = true;
>> +
>> +	msi->nirqs = of_irq_count(node);
>> +	if (!msi->nirqs) {
>> +		dev_err(pcie->dev, "found no MSI interrupt in DT\n");
>> +		return -ENODEV;
>> +	}
>> +	if (msi->nirqs > IPROC_PCIE_MAX_NUM_IRQS) {
>> +		dev_warn(pcie->dev, "too many MSI interrupts defined %d\n",
>> +			 msi->nirqs);
>> +		msi->nirqs = IPROC_PCIE_MAX_NUM_IRQS;
>> +	}
>> +	msi->irqs = devm_kcalloc(pcie->dev, msi->nirqs, sizeof(*msi->irqs),
>> +				 GFP_KERNEL);
>> +	if (!msi->irqs)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < msi->nirqs; i++) {
>> +		msi->irqs[i] = irq_of_parse_and_map(node, i);
>> +		if (!msi->irqs[i]) {
>> +			dev_err(pcie->dev, "unable to parse/map interrupt\n");
>> +			return -ENODEV;
>> +		}
>> +	}
>> +
>> +	msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>> +						     msi->nirqs, NULL,
>> +						     &msi_domain_ops,
>> +						     msi);
>> +	if (!msi->inner_domain) {
>> +		dev_err(pcie->dev, "failed to create inner domain\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	msi->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(node),
>> +						    &iproc_msi_domain_info,
>> +						    msi->inner_domain);
>> +	if (!msi->msi_domain) {
>> +		dev_err(pcie->dev, "failed to create MSI domain\n");
>> +		irq_domain_remove(msi->inner_domain);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for (i = 0; i < msi->nirqs; i++)
>> +		irq_set_chained_handler_and_data(msi->irqs[i],
>> +						 iproc_msi_handler, msi);
>> +
>> +	iproc_msi_enable(msi);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(iproc_msi_init);
>
> Do you really intend for this to be built as a standalone module?
>

The iProc MSI driver is statically linked in, but the iProc PCIe core 
driver can be built as a module. The EXPORT_SYMBOL here is for 
iproc_msi_init to be exported symbol to iProc PCIe core driver when 
built as module.

>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> index 24d5b62..a575ef3 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -440,6 +440,21 @@ static int iproc_pcie_map_ranges(struct iproc_pcie *pcie,
>>   	return 0;
>>   }
>>
>> +static int iproc_pcie_msi_enable(struct iproc_pcie *pcie)
>> +{
>> +	struct device_node *msi_node;
>> +
>> +	msi_node = of_parse_phandle(pcie->dev->of_node, "msi-parent", 0);
>> +	if (!msi_node)
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * If another MSI controller is being used, the call below should fail
>> +	 * but that is okay
>> +	 */
>> +	return iproc_msi_init(pcie, msi_node);
>> +}
>> +
>>   int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>>   {
>>   	int ret;
>> @@ -507,6 +522,10 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>>
>>   	iproc_pcie_enable(pcie);
>>
>> +	if (IS_ENABLED(CONFIG_PCI_MSI))
>> +		if (iproc_pcie_msi_enable(pcie))
>> +			dev_info(pcie->dev, "not using iProc MSI\n");
>> +
>>   	pci_scan_child_bus(bus);
>>   	pci_assign_unassigned_bus_resources(bus);
>>   	pci_fixup_irqs(pci_common_swizzle, pcie->map_irq);
>> diff --git a/drivers/pci/host/pcie-iproc.h b/drivers/pci/host/pcie-iproc.h
>> index 051b651..17317ef 100644
>> --- a/drivers/pci/host/pcie-iproc.h
>> +++ b/drivers/pci/host/pcie-iproc.h
>> @@ -14,6 +14,8 @@
>>   #ifndef _PCIE_IPROC_H
>>   #define _PCIE_IPROC_H
>>
>> +#define IPROC_PCIE_MAX_NUM_IRQS 6
>> +
>
> I don't see the point in putting this in an include file, as it is only
> used in a single location.
>

Okay. Will move it into pcie-iproc-msi.c

>>   /**
>>    * iProc PCIe interface type
>>    *
>> @@ -74,4 +76,14 @@ struct iproc_pcie {
>>   int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res);
>>   int iproc_pcie_remove(struct iproc_pcie *pcie);
>>
>> +#ifdef CONFIG_PCI_MSI
>> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node);
>> +#else
>> +static inline int iproc_msi_init(struct iproc_pcie *pcie,
>> +				 struct device_node *node)
>> +{
>> +	return -ENODEV;
>> +}
>> +#endif
>> +
>>   #endif /* _PCIE_IPROC_H */
>
> Thanks,
>
> 	M.
>

Thanks a lot for the review. I really appreciate it!

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-19  1:37     ` Ray Jui
@ 2015-11-19  2:56       ` Ray Jui
  2015-11-19  7:23       ` Ray Jui
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-19  2:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Marc,

On 11/18/2015 5:37 PM, Ray Jui wrote:
> Hi Marc,
>
> On 11/18/2015 12:48 AM, Marc Zyngier wrote:
>> On Tue, 17 Nov 2015 16:31:54 -0800
>> Ray Jui <rjui@broadcom.com> wrote:

>>> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>>> +                      unsigned int virq, unsigned int nr_irqs,
>>> +                      void *args)
>>> +{
>>> +    struct iproc_msi *msi = domain->host_data;
>>> +    int i, msi_irq;
>>> +
>>> +    mutex_lock(&msi->bitmap_lock);
>>> +
>>> +    for (i = 0; i < nr_irqs; i++) {
>>> +        msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
>>
>> This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
>> we end up with a larger number of MSIs (32 or 64) multiplexed on top of
>> a small number of wired interrupts. Here, you seem to have a 1-1
>> mapping. Is that really the case?
>
> Yes, based on the poorly written iProc PCIe arch doc, :), we seem to
> have 1-1 mapping between each wired interrupt and MSI, with each MSI
> handled by an event queue, that consists of 64x word entries allocated
> from host memory (DDR). The MSI data is stored in the low 16-bit of each
> entry, whereas the upper 16-bit of each entry is reserved for the iProc
> PCIe controller for its own use.
>

In fact, let me confirm with our ASIC team on the above statement. The 
iProc PCIe arch doc is in fact not very clear on this....

>>
>> If so (and assuming the wired interrupts are always contiguous), you
>> shouldn't represent this as a chained interrupt (a multiplexer), but as
>> a stacked irqchip, similar to what GICv2m does.
>>
>
> Okay, I think I might be missing something here, but I thought I
> currently have a stacked irqdomain (chip), i.e., GIC -> inner_domain ->
> MSI domain?
>
> And does this imply I should expect 'nr_irqs' in this routine to be
> always zero and therefore I can get rid of the for loop here (same in
> the domain free routine)?
>

Thanks,

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-19  1:37     ` Ray Jui
  2015-11-19  2:56       ` Ray Jui
@ 2015-11-19  7:23       ` Ray Jui
  2015-11-19  8:31       ` Arnd Bergmann
  2015-11-20  8:56       ` Marc Zyngier
  3 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-19  7:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Marc,

On 11/18/2015 5:37 PM, Ray Jui wrote:
> Hi Marc,
>
> On 11/18/2015 12:48 AM, Marc Zyngier wrote:
>> On Tue, 17 Nov 2015 16:31:54 -0800
>> Ray Jui <rjui@broadcom.com> wrote:
>>
>> Hi Ray,
>>
>> A few comments below.
>>
>>> This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
>>> all iProc based platforms. The patch follows the latest trend in the
>>> kernel to use MSI domain based implementation
>>>
>>> This iProc event queue based MSI support should not be used with newer
>>> platforms with integrated MSI support in the GIC (e.g., giv2m or
>>> gicv3-its)
>>>
>>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>>> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
>>> Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
>>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>>> ---
>>>   drivers/pci/host/Kconfig          |   9 +
>>>   drivers/pci/host/Makefile         |   1 +
>>>   drivers/pci/host/pcie-iproc-msi.c | 434
>>> ++++++++++++++++++++++++++++++++++++++
>>>   drivers/pci/host/pcie-iproc.c     |  19 ++
>>>   drivers/pci/host/pcie-iproc.h     |  12 ++
>>>   5 files changed, 475 insertions(+)
>>>   create mode 100644 drivers/pci/host/pcie-iproc-msi.c
>>>
>>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
>>> index f131ba9..972e906 100644
>>> --- a/drivers/pci/host/Kconfig
>>> +++ b/drivers/pci/host/Kconfig
>>> @@ -126,6 +126,15 @@ config PCIE_IPROC
>>>         iProc family of SoCs. An appropriate bus interface driver
>>> also needs
>>>         to be enabled
>>>
>>> +config PCIE_IPROC_MSI
>>> +    bool "Broadcom iProc PCIe MSI support"
>>> +    depends on ARCH_BCM_IPROC && PCI_MSI
>>> +    select PCI_MSI_IRQ_DOMAIN
>>> +    default ARCH_BCM_IPROC
>>> +    help
>>> +      Say Y here if you want to enable MSI support for Broadcom's iProc
>>> +      PCIe controller
>>> +
>>>   config PCIE_IPROC_PLATFORM
>>>       tristate "Broadcom iProc PCIe platform bus driver"
>>>       depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
>>> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
>>> index 9d4d3c6..0e4e95e 100644
>>> --- a/drivers/pci/host/Makefile
>>> +++ b/drivers/pci/host/Makefile
>>> @@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>>>   obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>>>   obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>>>   obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
>>> +obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
>>>   obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
>>>   obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
>>>   obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
>>> diff --git a/drivers/pci/host/pcie-iproc-msi.c
>>> b/drivers/pci/host/pcie-iproc-msi.c
>>> new file mode 100644
>>> index 0000000..a55c707
>>> --- /dev/null
>>> +++ b/drivers/pci/host/pcie-iproc-msi.c
>>> @@ -0,0 +1,434 @@
>>> +/*
>>> + * Copyright (C) 2015 Broadcom Corporation
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU General Public License as
>>> + * published by the Free Software Foundation version 2.
>>> + *
>>> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
>>> + * kind, whether express or implied; without even the implied warranty
>>> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + */
>>> +
>>> +#include <linux/irqchip/chained_irq.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>> +#include <linux/of_irq.h>
>>> +#include <linux/of_pci.h>
>>> +#include <linux/pci.h>
>>> +
>>> +#include "pcie-iproc.h"
>>> +
>>> +#define IPROC_MSI_INTS_EN_OFFSET       0x208
>>> +#define IPROC_MSI_INTR_EN_SHIFT        11
>>> +#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
>>> +#define IPROC_MSI_INT_N_EVENT_SHIFT    1
>>> +#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
>>> +#define IPROC_MSI_EQ_EN_SHIFT          0
>>> +#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
>>> +
>>> +#define IPROC_MSI_EQ_MASK              0x3f
>>> +
>>> +/* number of queues in each event queue */
>>> +#define IPROC_MSI_EQ_LEN               64
>>> +
>>> +/* size of each event queue memory region */
>>> +#define EQ_MEM_REGION_SIZE           SZ_4K
>>> +
>>> +/* size of each MSI message memory region */
>>> +#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
>>> +
>>> +enum iproc_msi_reg {
>>> +    IPROC_MSI_EQ_PAGE = 0,
>>> +    IPROC_MSI_EQ_PAGE_UPPER,
>>> +    IPROC_MSI_PAGE,
>>> +    IPROC_MSI_PAGE_UPPER,
>>> +    IPROC_MSI_CTRL,
>>> +    IPROC_MSI_EQ_HEAD,
>>> +    IPROC_MSI_EQ_TAIL,
>>> +    IPROC_MSI_REG_SIZE,
>>> +};
>>> +
>>> +/**
>>> + * iProc event queue based MSI
>>> + *
>>> + * Only meant to be used on platforms without MSI support integrated
>>> into the
>>> + * GIC
>>> + *
>>> + * @pcie: pointer to iProc PCIe data
>>> + * @reg_offsets: MSI register offsets
>>> + * @irqs: pointer to an array that contains the interrupt IDs
>>> + * @nirqs: number of total interrupts
>>> + * @has_inten_reg: indicates the MSI interrupt enable register needs
>>> to be
>>> + * set explicitly (required for some legacy platforms)
>>> + * @used: bitmap to track usage of MSI
>>> + * @inner_domain: inner IRQ domain
>>> + * @msi_domain: MSI IRQ domain
>>> + * @bitmap_lock: lock to protect access to the IRQ bitmap
>>> + * @n_eq_region: required number of 4K aligned memory region for MSI
>>> event
>>> + * queues
>>> + * @n_msi_msg_region: required number of 4K aligned memory region
>>> for MSI
>>> + * posted writes
>>> + * @eq_base: pointer to allocated memory region for MSI event queues
>>> + * @msi_base: pointer to allocated memory region for MSI posted writes
>>> + */
>>> +struct iproc_msi {
>>> +    struct iproc_pcie *pcie;
>>> +    const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
>>> +    int *irqs;
>>> +    int nirqs;
>>> +    bool has_inten_reg;
>>> +    DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
>>> +    struct irq_domain *inner_domain;
>>> +    struct irq_domain *msi_domain;
>>> +    struct mutex bitmap_lock;
>>> +    unsigned int n_eq_region;
>>> +    unsigned int n_msi_msg_region;
>>> +    void *eq_base;
>>> +    void *msi_base;
>>> +};
>>> +
>>> +static const u16
>>> +iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
>>> +    { 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
>>> +};
>>> +
>>> +static const u16
>>> +iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>> +    { 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
>>> +    { 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
>>> +    { 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
>>> +    { 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
>>> +};
>>> +
>>> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
>>> +                     enum iproc_msi_reg reg,
>>> +                     unsigned int eq)
>>> +{
>>> +    struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +    return readl(pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Do you need the extra barrier implied by readl? readl_relaxed should be
>> enough.
>>
>>> +}
>>> +
>>> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
>>> +                       enum iproc_msi_reg reg,
>>> +                       int eq, u32 val)
>>> +{
>>> +    struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +    writel(val, pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Same here for writel vs writel_relaxed.
>>
>
> I probably do not need the barrier in most cases. But as we are dealing
> with MSI, I thought it's a lot safer to have the barrier in place so the
> CPU does not re-order the device register accesses with respect to other
> memory accesses?
>
>>> +}
>>> +
>>> +static struct irq_chip iproc_msi_top_irq_chip = {
>>> +    .name = "iProc MSI",
>>> +    .irq_enable = pci_msi_unmask_irq,
>>> +    .irq_disable = pci_msi_mask_irq,
>>> +    .irq_mask = pci_msi_mask_irq,
>>> +    .irq_unmask = pci_msi_unmask_irq,
>>
>> There is no need to provide both enable/disable and mask/unmask. And
>> since pci_msi_{un}mask_irq is the default, you can get rid of these
>> function pointers anyway.
>>
>
> Got it. Like you said, the mask/unmask callback are defaulted to
> pci_msi_{un}mask_irq in pci_msi_domain_update_chip_ops, called when the
> MSI irq domain is created.
>
> I'll get rid of all the callback assignments here.
>
>>> +};
>>> +
>>> +static struct msi_domain_info iproc_msi_domain_info = {
>>> +    .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
>>> +        MSI_FLAG_PCI_MSIX,
>>> +    .chip = &iproc_msi_top_irq_chip,
>>> +};
>>> +
>>> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
>>> +                      const struct cpumask *mask, bool force)
>>> +{
>>> +    return -EINVAL;
>>
>> I wish people would stop building stupid HW that prevents proper
>> affinity setting for MSI...
>>
>
> In fact, there's no reason why the HW should prevent us from setting the
> MSI affinity. This is currently more of a SW issue that I have not spent
> enough time figuring out.
>
> Here's my understanding:
>
> In our system, each MSI is linked to a dedicated interrupt line
> connected to the GIC upstream (e.g., the GIC from Cortex A9 in Cygnus).

Okay I need to take my words back. I just had a long meeting with our 
ASIC engineer. In fact, we are supposed to be able to support up to 64 
MSI vectors per GIC interrupt line.

> Correct me if I'm wrong, to get the MSI affinity to work, all I need
> should be propagating the affinity setting to the GIC (the 1-to-1
> mapping helps simply things quite a bit here)?

Now MSI affinity gets much more complicated and I'm not sure how the HW 
can support it. I need more meetings with our ASIC engineers to figure 
this out. I'm not planning to support MSI IRQ affinity at this point.

>
> I tried to hook this up with irq_chip_set_affinity_parent. But it looks
> like the irq chip of the parent domain (i.e., the GIC) is pointing to
> NULL, and therefore it would crash when dereferencing it to get the
> irq_set_affinity callback.
>
> I thought I did everything required by figuring out and linking to the
> correct parent domain in the iproc_msi_init routine:
>
> parent_node = of_parse_phandle(node, "interrupt-parent", 0);
> if (!parent_node) {
>        dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>        return -ENODEV;
> }
>
> parent_domain = irq_find_host(parent_node);
> if (!parent_domain) {
>        dev_err(pcie->dev, "unable to get MSI parent domain\n");
>        return -ENODEV;
> }
>
> ...
> ...
>
> msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>                                               msi->nirqs, NULL,
>                                               &msi_domain_ops,
>                                               msi);
>
> I haven't spent too much time investigating, and am hoping to eventually
> enable affinity support with an incremental patch in the future when I
> have more time to investigate.
>
>>> +}
>>> +
>>> +static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
>>> +                      struct msi_msg *msg)
>>> +{
>>> +    struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>> +    phys_addr_t addr;
>>> +
>>> +    addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
>>> +    msg->address_lo = lower_32_bits(addr);
>>> +    msg->address_hi = upper_32_bits(addr);
>>> +    msg->data = data->hwirq;
>>> +}
>>> +
>>> +static struct irq_chip iproc_msi_bottom_irq_chip = {
>>> +    .name = "MSI",
>>> +    .irq_set_affinity = iproc_msi_irq_set_affinity,
>>> +    .irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
>>> +};
>>> +
>>> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>>> +                      unsigned int virq, unsigned int nr_irqs,
>>> +                      void *args)
>>> +{
>>> +    struct iproc_msi *msi = domain->host_data;
>>> +    int i, msi_irq;
>>> +
>>> +    mutex_lock(&msi->bitmap_lock);
>>> +
>>> +    for (i = 0; i < nr_irqs; i++) {
>>> +        msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
>>
>> This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
>> we end up with a larger number of MSIs (32 or 64) multiplexed on top of
>> a small number of wired interrupts. Here, you seem to have a 1-1
>> mapping. Is that really the case?
>
> Yes, based on the poorly written iProc PCIe arch doc, :), we seem to
> have 1-1 mapping between each wired interrupt and MSI, with each MSI
> handled by an event queue, that consists of 64x word entries allocated
> from host memory (DDR). The MSI data is stored in the low 16-bit of each
> entry, whereas the upper 16-bit of each entry is reserved for the iProc
> PCIe controller for its own use.
>

So yeah, you are right. We should be able to support up to 64 MSI 
vectors per interrupt. This is just embarrassing.... :(

And thanks for pointing this out.

>>
>> If so (and assuming the wired interrupts are always contiguous), you
>> shouldn't represent this as a chained interrupt (a multiplexer), but as
>> a stacked irqchip, similar to what GICv2m does.
>>
>
> Okay, I think I might be missing something here, but I thought I
> currently have a stacked irqdomain (chip), i.e., GIC -> inner_domain ->
> MSI domain?
>
> And does this imply I should expect 'nr_irqs' in this routine to be
> always zero and therefore I can get rid of the for loop here (same in
> the domain free routine)?

And yeah, chained IRQ will be used for all MSI vectors in each interrupt 
line.

>
>>> +        if (msi_irq < msi->nirqs) {
>>> +            set_bit(msi_irq, msi->used);
>>> +        } else {
>>> +            mutex_unlock(&msi->bitmap_lock);
>>> +            return -ENOSPC;
>>> +        }
>>> +
>>> +        irq_domain_set_info(domain, virq + i, msi_irq,
>>> +                    &iproc_msi_bottom_irq_chip,
>>> +                    domain->host_data, handle_simple_irq,
>>> +                    NULL, NULL);
>>> +    }
>>> +
>>> +    mutex_unlock(&msi->bitmap_lock);
>>> +    return 0;
>>> +}
>>> +
>>> +static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>>> +                      unsigned int virq, unsigned int nr_irqs)
>>> +{
>>> +    struct irq_data *data = irq_domain_get_irq_data(domain, virq);
>>> +    struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>> +    unsigned int i;
>>> +
>>> +    mutex_lock(&msi->bitmap_lock);
>>> +
>>> +    for (i = 0; i < nr_irqs; i++) {
>>> +        struct irq_data *data = irq_domain_get_irq_data(domain,
>>> +                                virq + i);
>>> +        if (!test_bit(data->hwirq, msi->used)) {
>>> +            dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
>>> +                 data->hwirq);
>>> +        } else
>>> +            clear_bit(data->hwirq, msi->used);
>>> +    }
>>> +
>>> +    mutex_unlock(&msi->bitmap_lock);
>>> +    irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>>> +}
>>> +
>>> +static const struct irq_domain_ops msi_domain_ops = {
>>> +    .alloc = iproc_msi_irq_domain_alloc,
>>> +    .free = iproc_msi_irq_domain_free,
>>> +};
>>> +
>>> +static void iproc_msi_enable(struct iproc_msi *msi)
>>> +{
>>> +    struct iproc_pcie *pcie = msi->pcie;
>>> +    int i, eq;
>>> +    u32 val;
>>> +
>>> +    /* program memory region for each event queue */
>>> +    for (i = 0; i < msi->n_eq_region; i++) {
>>> +        phys_addr_t addr =
>>> +            virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
>>> +
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
>>> +                    lower_32_bits(addr));
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
>>> +                    upper_32_bits(addr));
>>> +    }
>>> +
>>> +    /* program memory region for MSI posted writes */
>>> +    for (i = 0; i < msi->n_msi_msg_region; i++) {
>>> +        phys_addr_t addr =
>>> +            virt_to_phys(msi->msi_base +
>>> +                     (i * MSI_MSG_MEM_REGION_SIZE));
>>
>> You seem to be a victim of checkpatch. Please don't split statements
>> like this, it just make it harder to read. My terminal can wrap lines
>> very conveniently, and I don't care about the 80 character limit...
>>
>
> Okay will fix.
>
>>> +
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
>>> +                    lower_32_bits(addr));
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
>>> +                    upper_32_bits(addr));
>>> +    }
>>> +
>>> +    for (eq = 0; eq < msi->nirqs; eq++) {
>>> +        /* enable MSI event queue */
>>> +        val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
>>> +            IPROC_MSI_EQ_EN;
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
>>> +
>>> +        /*
>>> +         * Some legacy platforms require the MSI interrupt enable
>>> +         * register to be set explicitly
>>> +         */
>>> +        if (msi->has_inten_reg) {
>>> +            val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>> +            val |= BIT(eq);
>>> +            writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void iproc_msi_handler(struct irq_desc *desc)
>>> +{
>>> +    unsigned int irq = irq_desc_get_irq(desc);
>>> +    struct irq_chip *irq_chip = irq_desc_get_chip(desc);
>>> +    struct iproc_msi *msi;
>>> +    struct iproc_pcie *pcie;
>>> +    u32 eq, head, tail, num_events;
>>> +    int virq;
>>> +
>>> +    chained_irq_enter(irq_chip, desc);
>>> +
>>> +    msi = irq_get_handler_data(irq);
>>> +    pcie = msi->pcie;
>>> +
>>> +    eq = irq - msi->irqs[0];
>>> +    virq = irq_find_mapping(msi->inner_domain, eq);
>>> +    head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
>>> +        IPROC_MSI_EQ_MASK;
>>> +    do {
>>> +        tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
>>> +            IPROC_MSI_EQ_MASK;
>>> +
>>> +        num_events = (tail < head) ?
>>> +            (IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
>>> +        if (!num_events)
>>> +            break;
>>> +
>>> +        generic_handle_irq(virq);
>>> +
>>> +        head++;
>>> +        head %= IPROC_MSI_EQ_LEN;
>>> +        iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
>>> +    } while (true);
>>
>> That's unusual. You seem to get only one interrupt for a bunch of MSIs,
>> all representing the same IRQ? That feels very weird, as you can
>> usually collapse edge interrupts.
>>
>
> I think we get one GIC interrupt per MSI line (1:1 mapping), but then
> the MSI data message can be more than one, stored in the event queue
> reserved for that MSI/interrupt line.
>
> When you mentioned chained IRQ above, do you really mean the logic here?
> In fact, I don't think I really need to use the chained irq APIs here,
> as the MSI and GIC interrupt line has a 1-to-1 mapping.
>

The above routine needs to be modified to:

go through the event queue and identify each MSI vector to be serviced, 
and service them by calling generic_handle_irq.

>>> +
>>> +    chained_irq_exit(irq_chip, desc);
>>> +}
>>> +
>
> Can probably get rid of the chained_irq_enter and exit?
>
>>> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node)
>>> +{
>>> +    struct iproc_msi *msi;
>>> +    struct device_node *parent_node;
>>> +    struct irq_domain *parent_domain;
>>> +    int i, ret;
>>> +
>>> +    if (!of_device_is_compatible(node, "brcm,iproc-msi"))
>>> +        return -ENODEV;
>>> +
>>> +    if (!of_find_property(node, "msi-controller", NULL))
>>> +        return -ENODEV;
>>> +
>>> +    parent_node = of_parse_phandle(node, "interrupt-parent", 0);
>>> +    if (!parent_node) {
>>> +        dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    parent_domain = irq_find_host(parent_node);
>>> +    if (!parent_domain) {
>>> +        dev_err(pcie->dev, "unable to get MSI parent domain\n");
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    msi = devm_kzalloc(pcie->dev, sizeof(*msi), GFP_KERNEL);
>>> +    if (!msi)
>>> +        return -ENOMEM;
>>> +
>>> +    msi->pcie = pcie;
>>> +    mutex_init(&msi->bitmap_lock);
>>> +
>>> +    switch (pcie->type) {
>>> +    case IPROC_PCIE_PAXB:
>>> +        msi->reg_offsets = iproc_msi_reg_paxb;
>>> +        break;
>>> +    case IPROC_PCIE_PAXC:
>>> +        msi->reg_offsets = iproc_msi_reg_paxc;
>>> +        break;
>>> +    default:
>>> +        dev_err(pcie->dev, "incompatible iProc PCIe interface\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    ret = of_property_read_u32(node, "brcm,num-eq-region",
>>> +                   &msi->n_eq_region);
>>> +    if (ret || msi->n_eq_region == 0) {
>>> +        dev_err(pcie->dev,
>>> +            "invalid property 'brcm,num-eq-region' %u\n",
>>> +            msi->n_eq_region);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    ret = of_property_read_u32(node, "brcm,num-msi-msg-region",
>>> +                   &msi->n_msi_msg_region);
>>> +    if (ret || msi->n_msi_msg_region == 0) {
>>> +        dev_err(pcie->dev,
>>> +            "invalid property 'brcm,num-msi-msg-region' %u\n",
>>> +            msi->n_msi_msg_region);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    /* reserve memory for MSI event queue */
>>> +    msi->eq_base = devm_kcalloc(pcie->dev, msi->n_eq_region + 1,
>>> +                    EQ_MEM_REGION_SIZE, GFP_KERNEL);
>>> +    if (!msi->eq_base)
>>> +        return -ENOMEM;
>>> +    msi->eq_base = PTR_ALIGN(msi->eq_base, EQ_MEM_REGION_SIZE);
>>> +
>>> +    /* reserve memory for MSI posted writes */
>>> +    msi->msi_base = devm_kcalloc(pcie->dev, msi->n_msi_msg_region + 1,
>>> +                     MSI_MSG_MEM_REGION_SIZE, GFP_KERNEL);
>>> +    if (!msi->msi_base)
>>> +        return -ENOMEM;
>>> +    msi->msi_base = PTR_ALIGN(msi->msi_base, MSI_MSG_MEM_REGION_SIZE);
>>> +
>>> +    if (of_find_property(node, "brcm,pcie-msi-inten", NULL))
>>> +        msi->has_inten_reg = true;
>>> +
>>> +    msi->nirqs = of_irq_count(node);
>>> +    if (!msi->nirqs) {
>>> +        dev_err(pcie->dev, "found no MSI interrupt in DT\n");
>>> +        return -ENODEV;
>>> +    }
>>> +    if (msi->nirqs > IPROC_PCIE_MAX_NUM_IRQS) {
>>> +        dev_warn(pcie->dev, "too many MSI interrupts defined %d\n",
>>> +             msi->nirqs);
>>> +        msi->nirqs = IPROC_PCIE_MAX_NUM_IRQS;
>>> +    }
>>> +    msi->irqs = devm_kcalloc(pcie->dev, msi->nirqs, sizeof(*msi->irqs),
>>> +                 GFP_KERNEL);
>>> +    if (!msi->irqs)
>>> +        return -ENOMEM;
>>> +
>>> +    for (i = 0; i < msi->nirqs; i++) {
>>> +        msi->irqs[i] = irq_of_parse_and_map(node, i);
>>> +        if (!msi->irqs[i]) {
>>> +            dev_err(pcie->dev, "unable to parse/map interrupt\n");
>>> +            return -ENODEV;
>>> +        }
>>> +    }
>>> +
>>> +    msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>>> +                             msi->nirqs, NULL,
>>> +                             &msi_domain_ops,
>>> +                             msi);
>>> +    if (!msi->inner_domain) {
>>> +        dev_err(pcie->dev, "failed to create inner domain\n");
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    msi->msi_domain =
>>> pci_msi_create_irq_domain(of_node_to_fwnode(node),
>>> +                            &iproc_msi_domain_info,
>>> +                            msi->inner_domain);
>>> +    if (!msi->msi_domain) {
>>> +        dev_err(pcie->dev, "failed to create MSI domain\n");
>>> +        irq_domain_remove(msi->inner_domain);
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    for (i = 0; i < msi->nirqs; i++)
>>> +        irq_set_chained_handler_and_data(msi->irqs[i],
>>> +                         iproc_msi_handler, msi);
>>> +
>>> +    iproc_msi_enable(msi);
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(iproc_msi_init);
>>
>> Do you really intend for this to be built as a standalone module?
>>
>
> The iProc MSI driver is statically linked in, but the iProc PCIe core
> driver can be built as a module. The EXPORT_SYMBOL here is for
> iproc_msi_init to be exported symbol to iProc PCIe core driver when
> built as module.
>
>>> diff --git a/drivers/pci/host/pcie-iproc.c
>>> b/drivers/pci/host/pcie-iproc.c
>>> index 24d5b62..a575ef3 100644
>>> --- a/drivers/pci/host/pcie-iproc.c
>>> +++ b/drivers/pci/host/pcie-iproc.c
>>> @@ -440,6 +440,21 @@ static int iproc_pcie_map_ranges(struct
>>> iproc_pcie *pcie,
>>>       return 0;
>>>   }
>>>
>>> +static int iproc_pcie_msi_enable(struct iproc_pcie *pcie)
>>> +{
>>> +    struct device_node *msi_node;
>>> +
>>> +    msi_node = of_parse_phandle(pcie->dev->of_node, "msi-parent", 0);
>>> +    if (!msi_node)
>>> +        return -ENODEV;
>>> +
>>> +    /*
>>> +     * If another MSI controller is being used, the call below
>>> should fail
>>> +     * but that is okay
>>> +     */
>>> +    return iproc_msi_init(pcie, msi_node);
>>> +}
>>> +
>>>   int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>>>   {
>>>       int ret;
>>> @@ -507,6 +522,10 @@ int iproc_pcie_setup(struct iproc_pcie *pcie,
>>> struct list_head *res)
>>>
>>>       iproc_pcie_enable(pcie);
>>>
>>> +    if (IS_ENABLED(CONFIG_PCI_MSI))
>>> +        if (iproc_pcie_msi_enable(pcie))
>>> +            dev_info(pcie->dev, "not using iProc MSI\n");
>>> +
>>>       pci_scan_child_bus(bus);
>>>       pci_assign_unassigned_bus_resources(bus);
>>>       pci_fixup_irqs(pci_common_swizzle, pcie->map_irq);
>>> diff --git a/drivers/pci/host/pcie-iproc.h
>>> b/drivers/pci/host/pcie-iproc.h
>>> index 051b651..17317ef 100644
>>> --- a/drivers/pci/host/pcie-iproc.h
>>> +++ b/drivers/pci/host/pcie-iproc.h
>>> @@ -14,6 +14,8 @@
>>>   #ifndef _PCIE_IPROC_H
>>>   #define _PCIE_IPROC_H
>>>
>>> +#define IPROC_PCIE_MAX_NUM_IRQS 6
>>> +
>>
>> I don't see the point in putting this in an include file, as it is only
>> used in a single location.
>>
>
> Okay. Will move it into pcie-iproc-msi.c
>
>>>   /**
>>>    * iProc PCIe interface type
>>>    *
>>> @@ -74,4 +76,14 @@ struct iproc_pcie {
>>>   int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res);
>>>   int iproc_pcie_remove(struct iproc_pcie *pcie);
>>>
>>> +#ifdef CONFIG_PCI_MSI
>>> +int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node);
>>> +#else
>>> +static inline int iproc_msi_init(struct iproc_pcie *pcie,
>>> +                 struct device_node *node)
>>> +{
>>> +    return -ENODEV;
>>> +}
>>> +#endif
>>> +
>>>   #endif /* _PCIE_IPROC_H */
>>
>> Thanks,
>>
>>     M.
>>
>
> Thanks a lot for the review. I really appreciate it!
>
> Ray

Thanks,

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-19  1:37     ` Ray Jui
  2015-11-19  2:56       ` Ray Jui
  2015-11-19  7:23       ` Ray Jui
@ 2015-11-19  8:31       ` Arnd Bergmann
  2015-11-19 23:05         ` Ray Jui
  2015-11-20  8:56       ` Marc Zyngier
  3 siblings, 1 reply; 20+ messages in thread
From: Arnd Bergmann @ 2015-11-19  8:31 UTC (permalink / raw)
  To: Ray Jui
  Cc: Marc Zyngier, Bjorn Helgaas, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On Wednesday 18 November 2015 17:37:16 Ray Jui wrote:
> >> +}
> >> +
> >> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
> >> +                                   enum iproc_msi_reg reg,
> >> +                                   int eq, u32 val)
> >> +{
> >> +    struct iproc_pcie *pcie = msi->pcie;
> >> +
> >> +    writel(val, pcie->base + msi->reg_offsets[eq][reg]);
> >
> > Same here for writel vs writel_relaxed.
> >
> 
> I probably do not need the barrier in most cases. But as we are dealing 
> with MSI, I thought it's a lot safer to have the barrier in place so the 
> CPU does not re-order the device register accesses with respect to other 
> memory accesses?

See my other reply on that. For the actual handler, it makes sense to
carefully think of all the possible side-effects and eliminate the
barrier if possible, but for all other callers the performance doesn't
matter and we should default to using readl/writel.

> >> +};
> >> +
> >> +static struct msi_domain_info iproc_msi_domain_info = {
> >> +    .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> >> +            MSI_FLAG_PCI_MSIX,
> >> +    .chip = &iproc_msi_top_irq_chip,
> >> +};
> >> +
> >> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
> >> +                                  const struct cpumask *mask, bool force)
> >> +{
> >> +    return -EINVAL;
> >
> > I wish people would stop building stupid HW that prevents proper
> > affinity setting for MSI...
> >
> 
> In fact, there's no reason why the HW should prevent us from setting the 
> MSI affinity. This is currently more of a SW issue that I have not spent 
> enough time figuring out.
> 
> Here's my understanding:
> 
> In our system, each MSI is linked to a dedicated interrupt line 
> connected to the GIC upstream (e.g., the GIC from Cortex A9 in Cygnus). 
> Correct me if I'm wrong, to get the MSI affinity to work, all I need 
> should be propagating the affinity setting to the GIC (the 1-to-1 
> mapping helps simply things quite a bit here)?
> 
> I tried to hook this up with irq_chip_set_affinity_parent. But it looks 
> like the irq chip of the parent domain (i.e., the GIC) is pointing to 
> NULL, and therefore it would crash when dereferencing it to get the 
> irq_set_affinity callback.
> 
> I thought I did everything required by figuring out and linking to the 
> correct parent domain in the iproc_msi_init routine:
> 
> parent_node = of_parse_phandle(node, "interrupt-parent", 0);
> if (!parent_node) {
>        dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>        return -ENODEV;
> }
> 
> parent_domain = irq_find_host(parent_node);
> if (!parent_domain) {
>        dev_err(pcie->dev, "unable to get MSI parent domain\n");
>        return -ENODEV;
> }
> 
> ...
> ...
> 
> msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>                                               msi->nirqs, NULL,
>                                               &msi_domain_ops,
>                                               msi);
> 
> I haven't spent too much time investigating, and am hoping to eventually 
> enable affinity support with an incremental patch in the future when I 
> have more time to investigate.

Is it possible that you have a set of MSIs per GIC interrupt (as Marc
suggested earlier) and that the way it is intended to be used is by
having each one of them target a different CPU? That way you can do
affinity by switching to a different MSI in .set_affinity(), I think
that is how the old style MSI all used to work when each CPU had its
own MSI register.

	Arnd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-18  9:50     ` Arnd Bergmann
@ 2015-11-19 19:22       ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-19 19:22 UTC (permalink / raw)
  To: Arnd Bergmann, Marc Zyngier
  Cc: Bjorn Helgaas, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Arnd,

On 11/18/2015 1:50 AM, Arnd Bergmann wrote:
> On Wednesday 18 November 2015 08:48:45 Marc Zyngier wrote:
>>> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
>>> +				     enum iproc_msi_reg reg,
>>> +				     unsigned int eq)
>>> +{
>>> +	struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +	return readl(pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Do you need the extra barrier implied by readl? readl_relaxed should be
>> enough.
>
> I suspect this is the one place where it's needed for a lot of
> drivers: when the PCI device sends DMA data followed by the MSI
> message, the device driver can safely assume that the DMA data
> has arrived in memory even without doing another readl() from
> the device itself.
>
> It really depends on how the MSI implementation here interacts
> with the memory controller, and we should probably have a comment
> to explain this either way.
>
>>> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
>>> +				       enum iproc_msi_reg reg,
>>> +				       int eq, u32 val)
>>> +{
>>> +	struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Same here for writel vs writel_relaxed.
>
> We probably want writel_relaxed() when calling this from
> iproc_msi_handler(), but not when calling from
> iproc_msi_enable(), which should default to a normal
> writel(), so we can be sure it's actually configured right
> at the time we return from iproc_msi_init(). You could
> try to prove that using writel_relaxed is correct here, but
> using writel makes it so much easier.
>
> 	Arnd
>

I need to think through the logic in iproc_msi_handler to make sure the 
correct accesses are used at the right place. The iproc_msi_handler 
needs to be re-written to support multiple MSI vectors per wired interrupt.

Thanks,

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-19  8:31       ` Arnd Bergmann
@ 2015-11-19 23:05         ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-19 23:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Bjorn Helgaas, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Arnd,

On 11/19/2015 12:31 AM, Arnd Bergmann wrote:
> On Wednesday 18 November 2015 17:37:16 Ray Jui wrote:

>> I haven't spent too much time investigating, and am hoping to eventually
>> enable affinity support with an incremental patch in the future when I
>> have more time to investigate.
>
> Is it possible that you have a set of MSIs per GIC interrupt (as Marc
> suggested earlier) and that the way it is intended to be used is by
> having each one of them target a different CPU? That way you can do
> affinity by switching to a different MSI in .set_affinity(), I think
> that is how the old style MSI all used to work when each CPU had its
> own MSI register.
>
> 	Arnd
>

Okay, I see that Xgene MSI has a very similar implementation to support 
MSI IRQ affinity. I plan to take a closer look and try it out in the 
future. But it likely won't be included in the current round of patch set.

Thanks,

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-19  1:37     ` Ray Jui
                         ` (2 preceding siblings ...)
  2015-11-19  8:31       ` Arnd Bergmann
@ 2015-11-20  8:56       ` Marc Zyngier
  2015-11-20 17:07         ` Ray Jui
  3 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2015-11-20  8:56 UTC (permalink / raw)
  To: Ray Jui
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

On 19/11/15 01:37, Ray Jui wrote:
> Hi Marc,
> 
> On 11/18/2015 12:48 AM, Marc Zyngier wrote:
>> On Tue, 17 Nov 2015 16:31:54 -0800
>> Ray Jui <rjui@broadcom.com> wrote:
>>
>> Hi Ray,
>>
>> A few comments below.
>>
>>> This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
>>> all iProc based platforms. The patch follows the latest trend in the
>>> kernel to use MSI domain based implementation
>>>
>>> This iProc event queue based MSI support should not be used with newer
>>> platforms with integrated MSI support in the GIC (e.g., giv2m or
>>> gicv3-its)
>>>
>>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>>> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
>>> Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
>>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>>> ---
>>>   drivers/pci/host/Kconfig          |   9 +
>>>   drivers/pci/host/Makefile         |   1 +
>>>   drivers/pci/host/pcie-iproc-msi.c | 434 ++++++++++++++++++++++++++++++++++++++
>>>   drivers/pci/host/pcie-iproc.c     |  19 ++
>>>   drivers/pci/host/pcie-iproc.h     |  12 ++
>>>   5 files changed, 475 insertions(+)
>>>   create mode 100644 drivers/pci/host/pcie-iproc-msi.c
>>>
>>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
>>> index f131ba9..972e906 100644
>>> --- a/drivers/pci/host/Kconfig
>>> +++ b/drivers/pci/host/Kconfig
>>> @@ -126,6 +126,15 @@ config PCIE_IPROC
>>>   	  iProc family of SoCs. An appropriate bus interface driver also needs
>>>   	  to be enabled
>>>
>>> +config PCIE_IPROC_MSI
>>> +	bool "Broadcom iProc PCIe MSI support"
>>> +	depends on ARCH_BCM_IPROC && PCI_MSI
>>> +	select PCI_MSI_IRQ_DOMAIN
>>> +	default ARCH_BCM_IPROC
>>> +	help
>>> +	  Say Y here if you want to enable MSI support for Broadcom's iProc
>>> +	  PCIe controller
>>> +
>>>   config PCIE_IPROC_PLATFORM
>>>   	tristate "Broadcom iProc PCIe platform bus driver"
>>>   	depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
>>> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
>>> index 9d4d3c6..0e4e95e 100644
>>> --- a/drivers/pci/host/Makefile
>>> +++ b/drivers/pci/host/Makefile
>>> @@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>>>   obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>>>   obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>>>   obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
>>> +obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
>>>   obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
>>>   obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
>>>   obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
>>> diff --git a/drivers/pci/host/pcie-iproc-msi.c b/drivers/pci/host/pcie-iproc-msi.c
>>> new file mode 100644
>>> index 0000000..a55c707
>>> --- /dev/null
>>> +++ b/drivers/pci/host/pcie-iproc-msi.c
>>> @@ -0,0 +1,434 @@
>>> +/*
>>> + * Copyright (C) 2015 Broadcom Corporation
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU General Public License as
>>> + * published by the Free Software Foundation version 2.
>>> + *
>>> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
>>> + * kind, whether express or implied; without even the implied warranty
>>> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + */
>>> +
>>> +#include <linux/irqchip/chained_irq.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>> +#include <linux/of_irq.h>
>>> +#include <linux/of_pci.h>
>>> +#include <linux/pci.h>
>>> +
>>> +#include "pcie-iproc.h"
>>> +
>>> +#define IPROC_MSI_INTS_EN_OFFSET       0x208
>>> +#define IPROC_MSI_INTR_EN_SHIFT        11
>>> +#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
>>> +#define IPROC_MSI_INT_N_EVENT_SHIFT    1
>>> +#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
>>> +#define IPROC_MSI_EQ_EN_SHIFT          0
>>> +#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
>>> +
>>> +#define IPROC_MSI_EQ_MASK              0x3f
>>> +
>>> +/* number of queues in each event queue */
>>> +#define IPROC_MSI_EQ_LEN               64
>>> +
>>> +/* size of each event queue memory region */
>>> +#define EQ_MEM_REGION_SIZE           SZ_4K
>>> +
>>> +/* size of each MSI message memory region */
>>> +#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
>>> +
>>> +enum iproc_msi_reg {
>>> +	IPROC_MSI_EQ_PAGE = 0,
>>> +	IPROC_MSI_EQ_PAGE_UPPER,
>>> +	IPROC_MSI_PAGE,
>>> +	IPROC_MSI_PAGE_UPPER,
>>> +	IPROC_MSI_CTRL,
>>> +	IPROC_MSI_EQ_HEAD,
>>> +	IPROC_MSI_EQ_TAIL,
>>> +	IPROC_MSI_REG_SIZE,
>>> +};
>>> +
>>> +/**
>>> + * iProc event queue based MSI
>>> + *
>>> + * Only meant to be used on platforms without MSI support integrated into the
>>> + * GIC
>>> + *
>>> + * @pcie: pointer to iProc PCIe data
>>> + * @reg_offsets: MSI register offsets
>>> + * @irqs: pointer to an array that contains the interrupt IDs
>>> + * @nirqs: number of total interrupts
>>> + * @has_inten_reg: indicates the MSI interrupt enable register needs to be
>>> + * set explicitly (required for some legacy platforms)
>>> + * @used: bitmap to track usage of MSI
>>> + * @inner_domain: inner IRQ domain
>>> + * @msi_domain: MSI IRQ domain
>>> + * @bitmap_lock: lock to protect access to the IRQ bitmap
>>> + * @n_eq_region: required number of 4K aligned memory region for MSI event
>>> + * queues
>>> + * @n_msi_msg_region: required number of 4K aligned memory region for MSI
>>> + * posted writes
>>> + * @eq_base: pointer to allocated memory region for MSI event queues
>>> + * @msi_base: pointer to allocated memory region for MSI posted writes
>>> + */
>>> +struct iproc_msi {
>>> +	struct iproc_pcie *pcie;
>>> +	const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
>>> +	int *irqs;
>>> +	int nirqs;
>>> +	bool has_inten_reg;
>>> +	DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
>>> +	struct irq_domain *inner_domain;
>>> +	struct irq_domain *msi_domain;
>>> +	struct mutex bitmap_lock;
>>> +	unsigned int n_eq_region;
>>> +	unsigned int n_msi_msg_region;
>>> +	void *eq_base;
>>> +	void *msi_base;
>>> +};
>>> +
>>> +static const u16
>>> +iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
>>> +};
>>> +
>>> +static const u16
>>> +iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>> +	{ 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
>>> +	{ 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
>>> +	{ 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
>>> +	{ 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
>>> +};
>>> +
>>> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
>>> +				     enum iproc_msi_reg reg,
>>> +				     unsigned int eq)
>>> +{
>>> +	struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +	return readl(pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Do you need the extra barrier implied by readl? readl_relaxed should be
>> enough.
>>
>>> +}
>>> +
>>> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
>>> +				       enum iproc_msi_reg reg,
>>> +				       int eq, u32 val)
>>> +{
>>> +	struct iproc_pcie *pcie = msi->pcie;
>>> +
>>> +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
>>
>> Same here for writel vs writel_relaxed.
>>
> 
> I probably do not need the barrier in most cases. But as we are dealing 
> with MSI, I thought it's a lot safer to have the barrier in place so the 
> CPU does not re-order the device register accesses with respect to other 
> memory accesses?
> 
>>> +}
>>> +
>>> +static struct irq_chip iproc_msi_top_irq_chip = {
>>> +	.name = "iProc MSI",
>>> +	.irq_enable = pci_msi_unmask_irq,
>>> +	.irq_disable = pci_msi_mask_irq,
>>> +	.irq_mask = pci_msi_mask_irq,
>>> +	.irq_unmask = pci_msi_unmask_irq,
>>
>> There is no need to provide both enable/disable and mask/unmask. And
>> since pci_msi_{un}mask_irq is the default, you can get rid of these
>> function pointers anyway.
>>
> 
> Got it. Like you said, the mask/unmask callback are defaulted to 
> pci_msi_{un}mask_irq in pci_msi_domain_update_chip_ops, called when the 
> MSI irq domain is created.
> 
> I'll get rid of all the callback assignments here.
> 
>>> +};
>>> +
>>> +static struct msi_domain_info iproc_msi_domain_info = {
>>> +	.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
>>> +		MSI_FLAG_PCI_MSIX,
>>> +	.chip = &iproc_msi_top_irq_chip,
>>> +};
>>> +
>>> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
>>> +				      const struct cpumask *mask, bool force)
>>> +{
>>> +	return -EINVAL;
>>
>> I wish people would stop building stupid HW that prevents proper
>> affinity setting for MSI...
>>
> 
> In fact, there's no reason why the HW should prevent us from setting the 
> MSI affinity. This is currently more of a SW issue that I have not spent 
> enough time figuring out.
> 
> Here's my understanding:
> 
> In our system, each MSI is linked to a dedicated interrupt line 
> connected to the GIC upstream (e.g., the GIC from Cortex A9 in Cygnus). 
> Correct me if I'm wrong, to get the MSI affinity to work, all I need 
> should be propagating the affinity setting to the GIC (the 1-to-1 
> mapping helps simply things quite a bit here)?
> 
> I tried to hook this up with irq_chip_set_affinity_parent. But it looks 
> like the irq chip of the parent domain (i.e., the GIC) is pointing to 
> NULL, and therefore it would crash when dereferencing it to get the 
> irq_set_affinity callback.
> 
> I thought I did everything required by figuring out and linking to the 
> correct parent domain in the iproc_msi_init routine:
> 
> parent_node = of_parse_phandle(node, "interrupt-parent", 0);
> if (!parent_node) {
>        dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>        return -ENODEV;
> }
> 
> parent_domain = irq_find_host(parent_node);
> if (!parent_domain) {
>        dev_err(pcie->dev, "unable to get MSI parent domain\n");
>        return -ENODEV;
> }
> 
> ...
> ...
> 
> msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>                                               msi->nirqs, NULL,
>                                               &msi_domain_ops,
>                                               msi);
> 
> I haven't spent too much time investigating, and am hoping to eventually 
> enable affinity support with an incremental patch in the future when I 
> have more time to investigate.

It fails because you're not implementing a fully stacked system, but
only a partial one (see below).

>>> +}
>>> +
>>> +static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
>>> +					  struct msi_msg *msg)
>>> +{
>>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>> +	phys_addr_t addr;
>>> +
>>> +	addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
>>> +	msg->address_lo = lower_32_bits(addr);
>>> +	msg->address_hi = upper_32_bits(addr);
>>> +	msg->data = data->hwirq;
>>> +}
>>> +
>>> +static struct irq_chip iproc_msi_bottom_irq_chip = {
>>> +	.name = "MSI",
>>> +	.irq_set_affinity = iproc_msi_irq_set_affinity,
>>> +	.irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
>>> +};
>>> +
>>> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>>> +				      unsigned int virq, unsigned int nr_irqs,
>>> +				      void *args)
>>> +{
>>> +	struct iproc_msi *msi = domain->host_data;
>>> +	int i, msi_irq;
>>> +
>>> +	mutex_lock(&msi->bitmap_lock);
>>> +
>>> +	for (i = 0; i < nr_irqs; i++) {
>>> +		msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
>>
>> This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
>> we end up with a larger number of MSIs (32 or 64) multiplexed on top of
>> a small number of wired interrupts. Here, you seem to have a 1-1
>> mapping. Is that really the case?
> 
> Yes, based on the poorly written iProc PCIe arch doc, :), we seem to 
> have 1-1 mapping between each wired interrupt and MSI, with each MSI 
> handled by an event queue, that consists of 64x word entries allocated 
> from host memory (DDR). The MSI data is stored in the low 16-bit of each 
> entry, whereas the upper 16-bit of each entry is reserved for the iProc 
> PCIe controller for its own use.

Based on your description, I have a completely different interpretation
of how your HW works:

You have 4 (or 6) doorbells (the location where your PCI device writes
its MSI), and whatever the device writes ends up in the event queue. I'm
pretty sure the HW doesn't interpret what the device writes at all. The
GIC irqs are just a way to indicate that there is something in one of
the queues.

This means that you wouldn't be limited to 4/6 MSIs, but that you could
actually have a lot more (apparently 16 bits worth of MSIs), and you
could use the doorbell address to change your affinity (one GIC
interrupt per CPU, assuming you don't have more than 4 or 6).

>>
>> If so (and assuming the wired interrupts are always contiguous), you
>> shouldn't represent this as a chained interrupt (a multiplexer), but as
>> a stacked irqchip, similar to what GICv2m does.
>>
> 
> Okay, I think I might be missing something here, but I thought I 
> currently have a stacked irqdomain (chip), i.e., GIC -> inner_domain -> 
> MSI domain?

At the moment, you have one stacked domain (MSI -> iProc) that is used
for a chained irqchip that feeds into the GIC.

What you could do is turning it into a stacked domain (MSI -> iProc ->
GIC), using the fact that you have a 1:1 mapping between your MSIs and
the GIC interrupts. But I'm now questionning that fact.

> And does this imply I should expect 'nr_irqs' in this routine to be 
> always zero and therefore I can get rid of the for loop here (same in 
> the domain free routine)?

It should never be zero. It is likely to be always 1, as it is the
number of contiguous MSIs that have to be allocated in one go, and is
exclusively used for multi-MSI (which you don't claim to support).

>>> +		if (msi_irq < msi->nirqs) {
>>> +			set_bit(msi_irq, msi->used);
>>> +		} else {
>>> +			mutex_unlock(&msi->bitmap_lock);
>>> +			return -ENOSPC;
>>> +		}
>>> +
>>> +		irq_domain_set_info(domain, virq + i, msi_irq,
>>> +				    &iproc_msi_bottom_irq_chip,
>>> +				    domain->host_data, handle_simple_irq,
>>> +				    NULL, NULL);
>>> +	}
>>> +
>>> +	mutex_unlock(&msi->bitmap_lock);
>>> +	return 0;
>>> +}
>>> +
>>> +static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>>> +				      unsigned int virq, unsigned int nr_irqs)
>>> +{
>>> +	struct irq_data *data = irq_domain_get_irq_data(domain, virq);
>>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>> +	unsigned int i;
>>> +
>>> +	mutex_lock(&msi->bitmap_lock);
>>> +
>>> +	for (i = 0; i < nr_irqs; i++) {
>>> +		struct irq_data *data = irq_domain_get_irq_data(domain,
>>> +								virq + i);
>>> +		if (!test_bit(data->hwirq, msi->used)) {
>>> +			dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
>>> +				 data->hwirq);
>>> +		} else
>>> +			clear_bit(data->hwirq, msi->used);
>>> +	}
>>> +
>>> +	mutex_unlock(&msi->bitmap_lock);
>>> +	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>>> +}
>>> +
>>> +static const struct irq_domain_ops msi_domain_ops = {
>>> +	.alloc = iproc_msi_irq_domain_alloc,
>>> +	.free = iproc_msi_irq_domain_free,
>>> +};
>>> +
>>> +static void iproc_msi_enable(struct iproc_msi *msi)
>>> +{
>>> +	struct iproc_pcie *pcie = msi->pcie;
>>> +	int i, eq;
>>> +	u32 val;
>>> +
>>> +	/* program memory region for each event queue */
>>> +	for (i = 0; i < msi->n_eq_region; i++) {
>>> +		phys_addr_t addr =
>>> +			virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
>>> +
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
>>> +				    lower_32_bits(addr));
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
>>> +				    upper_32_bits(addr));
>>> +	}
>>> +
>>> +	/* program memory region for MSI posted writes */
>>> +	for (i = 0; i < msi->n_msi_msg_region; i++) {
>>> +		phys_addr_t addr =
>>> +			virt_to_phys(msi->msi_base +
>>> +				     (i * MSI_MSG_MEM_REGION_SIZE));
>>
>> You seem to be a victim of checkpatch. Please don't split statements
>> like this, it just make it harder to read. My terminal can wrap lines
>> very conveniently, and I don't care about the 80 character limit...
>>
> 
> Okay will fix.
> 
>>> +
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
>>> +				    lower_32_bits(addr));
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
>>> +				    upper_32_bits(addr));
>>> +	}
>>> +
>>> +	for (eq = 0; eq < msi->nirqs; eq++) {
>>> +		/* enable MSI event queue */
>>> +		val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
>>> +			IPROC_MSI_EQ_EN;
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
>>> +
>>> +		/*
>>> +		 * Some legacy platforms require the MSI interrupt enable
>>> +		 * register to be set explicitly
>>> +		 */
>>> +		if (msi->has_inten_reg) {
>>> +			val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>> +			val |= BIT(eq);
>>> +			writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>> +		}
>>> +	}
>>> +}
>>> +
>>> +static void iproc_msi_handler(struct irq_desc *desc)
>>> +{
>>> +	unsigned int irq = irq_desc_get_irq(desc);
>>> +	struct irq_chip *irq_chip = irq_desc_get_chip(desc);
>>> +	struct iproc_msi *msi;
>>> +	struct iproc_pcie *pcie;
>>> +	u32 eq, head, tail, num_events;
>>> +	int virq;
>>> +
>>> +	chained_irq_enter(irq_chip, desc);
>>> +
>>> +	msi = irq_get_handler_data(irq);
>>> +	pcie = msi->pcie;
>>> +
>>> +	eq = irq - msi->irqs[0];
>>> +	virq = irq_find_mapping(msi->inner_domain, eq);
>>> +	head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
>>> +		IPROC_MSI_EQ_MASK;
>>> +	do {
>>> +		tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
>>> +			IPROC_MSI_EQ_MASK;
>>> +
>>> +		num_events = (tail < head) ?
>>> +			(IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
>>> +		if (!num_events)
>>> +			break;
>>> +
>>> +		generic_handle_irq(virq);
>>> +
>>> +		head++;
>>> +		head %= IPROC_MSI_EQ_LEN;
>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
>>> +	} while (true);
>>
>> That's unusual. You seem to get only one interrupt for a bunch of MSIs,
>> all representing the same IRQ? That feels very weird, as you can
>> usually collapse edge interrupts.
>>
> 
> I think we get one GIC interrupt per MSI line (1:1 mapping), but then 
> the MSI data message can be more than one, stored in the event queue 
> reserved for that MSI/interrupt line.

But can you have different messages? Buffering the same message N times
is pretty useless (you can coalesce MSIs), but I'd completely understand
the design if you could store N different messages. If that's the case,
then the HW design would actually be a lot saner (whoever thought that 4
or 6 MSIs would be enough needs a PCIe crash course...).

> When you mentioned chained IRQ above, do you really mean the logic here? 
> In fact, I don't think I really need to use the chained irq APIs here, 
> as the MSI and GIC interrupt line has a 1-to-1 mapping.

If you definitely have a 1:1 mapping, and that you cannot store
arbitrary messages in memory, then yes, you can switch to a fully
stacked configuration.

>>> +
>>> +	chained_irq_exit(irq_chip, desc);
>>> +}
>>> +
> 
> Can probably get rid of the chained_irq_enter and exit?

You could get rid of a lot more than that, but that depends on the above
discussion.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support
  2015-11-20  8:56       ` Marc Zyngier
@ 2015-11-20 17:07         ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2015-11-20 17:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Bjorn Helgaas, Arnd Bergmann, Hauke Mehrtens, linux-kernel,
	bcm-kernel-feedback-list, linux-pci

Hi Marc,

On 11/20/2015 12:56 AM, Marc Zyngier wrote:
> On 19/11/15 01:37, Ray Jui wrote:
>> Hi Marc,
>>
>> On 11/18/2015 12:48 AM, Marc Zyngier wrote:
>>> On Tue, 17 Nov 2015 16:31:54 -0800
>>> Ray Jui <rjui@broadcom.com> wrote:
>>>
>>> Hi Ray,
>>>
>>> A few comments below.
>>>
>>>> This patch adds PCIe MSI support for both PAXB and PAXC interfaces on
>>>> all iProc based platforms. The patch follows the latest trend in the
>>>> kernel to use MSI domain based implementation
>>>>
>>>> This iProc event queue based MSI support should not be used with newer
>>>> platforms with integrated MSI support in the GIC (e.g., giv2m or
>>>> gicv3-its)
>>>>
>>>> Signed-off-by: Ray Jui <rjui@broadcom.com>
>>>> Reviewed-by: Anup Patel <anup.patel@broadcom.com>
>>>> Reviewed-by: Vikram Prakash <vikramp@broadcom.com>
>>>> Reviewed-by: Scott Branden <sbranden@broadcom.com>
>>>> ---
>>>>    drivers/pci/host/Kconfig          |   9 +
>>>>    drivers/pci/host/Makefile         |   1 +
>>>>    drivers/pci/host/pcie-iproc-msi.c | 434 ++++++++++++++++++++++++++++++++++++++
>>>>    drivers/pci/host/pcie-iproc.c     |  19 ++
>>>>    drivers/pci/host/pcie-iproc.h     |  12 ++
>>>>    5 files changed, 475 insertions(+)
>>>>    create mode 100644 drivers/pci/host/pcie-iproc-msi.c
>>>>
>>>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
>>>> index f131ba9..972e906 100644
>>>> --- a/drivers/pci/host/Kconfig
>>>> +++ b/drivers/pci/host/Kconfig
>>>> @@ -126,6 +126,15 @@ config PCIE_IPROC
>>>>    	  iProc family of SoCs. An appropriate bus interface driver also needs
>>>>    	  to be enabled
>>>>
>>>> +config PCIE_IPROC_MSI
>>>> +	bool "Broadcom iProc PCIe MSI support"
>>>> +	depends on ARCH_BCM_IPROC && PCI_MSI
>>>> +	select PCI_MSI_IRQ_DOMAIN
>>>> +	default ARCH_BCM_IPROC
>>>> +	help
>>>> +	  Say Y here if you want to enable MSI support for Broadcom's iProc
>>>> +	  PCIe controller
>>>> +
>>>>    config PCIE_IPROC_PLATFORM
>>>>    	tristate "Broadcom iProc PCIe platform bus driver"
>>>>    	depends on ARCH_BCM_IPROC || (ARM && COMPILE_TEST)
>>>> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
>>>> index 9d4d3c6..0e4e95e 100644
>>>> --- a/drivers/pci/host/Makefile
>>>> +++ b/drivers/pci/host/Makefile
>>>> @@ -15,6 +15,7 @@ obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o
>>>>    obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
>>>>    obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
>>>>    obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
>>>> +obj-$(CONFIG_PCIE_IPROC_MSI) += pcie-iproc-msi.o
>>>>    obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
>>>>    obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
>>>>    obj-$(CONFIG_PCIE_ALTERA) += pcie-altera.o
>>>> diff --git a/drivers/pci/host/pcie-iproc-msi.c b/drivers/pci/host/pcie-iproc-msi.c
>>>> new file mode 100644
>>>> index 0000000..a55c707
>>>> --- /dev/null
>>>> +++ b/drivers/pci/host/pcie-iproc-msi.c
>>>> @@ -0,0 +1,434 @@
>>>> +/*
>>>> + * Copyright (C) 2015 Broadcom Corporation
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or
>>>> + * modify it under the terms of the GNU General Public License as
>>>> + * published by the Free Software Foundation version 2.
>>>> + *
>>>> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
>>>> + * kind, whether express or implied; without even the implied warranty
>>>> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + */
>>>> +
>>>> +#include <linux/irqchip/chained_irq.h>
>>>> +#include <linux/irqdomain.h>
>>>> +#include <linux/msi.h>
>>>> +#include <linux/of_irq.h>
>>>> +#include <linux/of_pci.h>
>>>> +#include <linux/pci.h>
>>>> +
>>>> +#include "pcie-iproc.h"
>>>> +
>>>> +#define IPROC_MSI_INTS_EN_OFFSET       0x208
>>>> +#define IPROC_MSI_INTR_EN_SHIFT        11
>>>> +#define IPROC_MSI_INTR_EN              BIT(IPROC_MSI_INTR_EN_SHIFT)
>>>> +#define IPROC_MSI_INT_N_EVENT_SHIFT    1
>>>> +#define IPROC_MSI_INT_N_EVENT          BIT(IPROC_MSI_INT_N_EVENT_SHIFT)
>>>> +#define IPROC_MSI_EQ_EN_SHIFT          0
>>>> +#define IPROC_MSI_EQ_EN                BIT(IPROC_MSI_EQ_EN_SHIFT)
>>>> +
>>>> +#define IPROC_MSI_EQ_MASK              0x3f
>>>> +
>>>> +/* number of queues in each event queue */
>>>> +#define IPROC_MSI_EQ_LEN               64
>>>> +
>>>> +/* size of each event queue memory region */
>>>> +#define EQ_MEM_REGION_SIZE           SZ_4K
>>>> +
>>>> +/* size of each MSI message memory region */
>>>> +#define MSI_MSG_MEM_REGION_SIZE      SZ_4K
>>>> +
>>>> +enum iproc_msi_reg {
>>>> +	IPROC_MSI_EQ_PAGE = 0,
>>>> +	IPROC_MSI_EQ_PAGE_UPPER,
>>>> +	IPROC_MSI_PAGE,
>>>> +	IPROC_MSI_PAGE_UPPER,
>>>> +	IPROC_MSI_CTRL,
>>>> +	IPROC_MSI_EQ_HEAD,
>>>> +	IPROC_MSI_EQ_TAIL,
>>>> +	IPROC_MSI_REG_SIZE,
>>>> +};
>>>> +
>>>> +/**
>>>> + * iProc event queue based MSI
>>>> + *
>>>> + * Only meant to be used on platforms without MSI support integrated into the
>>>> + * GIC
>>>> + *
>>>> + * @pcie: pointer to iProc PCIe data
>>>> + * @reg_offsets: MSI register offsets
>>>> + * @irqs: pointer to an array that contains the interrupt IDs
>>>> + * @nirqs: number of total interrupts
>>>> + * @has_inten_reg: indicates the MSI interrupt enable register needs to be
>>>> + * set explicitly (required for some legacy platforms)
>>>> + * @used: bitmap to track usage of MSI
>>>> + * @inner_domain: inner IRQ domain
>>>> + * @msi_domain: MSI IRQ domain
>>>> + * @bitmap_lock: lock to protect access to the IRQ bitmap
>>>> + * @n_eq_region: required number of 4K aligned memory region for MSI event
>>>> + * queues
>>>> + * @n_msi_msg_region: required number of 4K aligned memory region for MSI
>>>> + * posted writes
>>>> + * @eq_base: pointer to allocated memory region for MSI event queues
>>>> + * @msi_base: pointer to allocated memory region for MSI posted writes
>>>> + */
>>>> +struct iproc_msi {
>>>> +	struct iproc_pcie *pcie;
>>>> +	const u16 (*reg_offsets)[IPROC_MSI_REG_SIZE];
>>>> +	int *irqs;
>>>> +	int nirqs;
>>>> +	bool has_inten_reg;
>>>> +	DECLARE_BITMAP(used, IPROC_PCIE_MAX_NUM_IRQS);
>>>> +	struct irq_domain *inner_domain;
>>>> +	struct irq_domain *msi_domain;
>>>> +	struct mutex bitmap_lock;
>>>> +	unsigned int n_eq_region;
>>>> +	unsigned int n_msi_msg_region;
>>>> +	void *eq_base;
>>>> +	void *msi_base;
>>>> +};
>>>> +
>>>> +static const u16
>>>> +iproc_msi_reg_paxb[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x210, 0x250, 0x254 },
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x214, 0x258, 0x25c },
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x218, 0x260, 0x264 },
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x21c, 0x268, 0x26c },
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x220, 0x270, 0x274 },
>>>> +	{ 0x200, 0x2c0, 0x204, 0x2c4, 0x224, 0x278, 0x27c },
>>>> +};
>>>> +
>>>> +static const u16
>>>> +iproc_msi_reg_paxc[IPROC_PCIE_MAX_NUM_IRQS][IPROC_MSI_REG_SIZE] = {
>>>> +	{ 0xc00, 0xc04, 0xc08, 0xc0c, 0xc40, 0xc50, 0xc60 },
>>>> +	{ 0xc10, 0xc14, 0xc18, 0xc1c, 0xc44, 0xc54, 0xc64 },
>>>> +	{ 0xc20, 0xc24, 0xc28, 0xc2c, 0xc48, 0xc58, 0xc68 },
>>>> +	{ 0xc30, 0xc34, 0xc38, 0xc3c, 0xc4c, 0xc5c, 0xc6c },
>>>> +};
>>>> +
>>>> +static inline u32 iproc_msi_read_reg(struct iproc_msi *msi,
>>>> +				     enum iproc_msi_reg reg,
>>>> +				     unsigned int eq)
>>>> +{
>>>> +	struct iproc_pcie *pcie = msi->pcie;
>>>> +
>>>> +	return readl(pcie->base + msi->reg_offsets[eq][reg]);
>>>
>>> Do you need the extra barrier implied by readl? readl_relaxed should be
>>> enough.
>>>
>>>> +}
>>>> +
>>>> +static inline void iproc_msi_write_reg(struct iproc_msi *msi,
>>>> +				       enum iproc_msi_reg reg,
>>>> +				       int eq, u32 val)
>>>> +{
>>>> +	struct iproc_pcie *pcie = msi->pcie;
>>>> +
>>>> +	writel(val, pcie->base + msi->reg_offsets[eq][reg]);
>>>
>>> Same here for writel vs writel_relaxed.
>>>
>>
>> I probably do not need the barrier in most cases. But as we are dealing
>> with MSI, I thought it's a lot safer to have the barrier in place so the
>> CPU does not re-order the device register accesses with respect to other
>> memory accesses?
>>
>>>> +}
>>>> +
>>>> +static struct irq_chip iproc_msi_top_irq_chip = {
>>>> +	.name = "iProc MSI",
>>>> +	.irq_enable = pci_msi_unmask_irq,
>>>> +	.irq_disable = pci_msi_mask_irq,
>>>> +	.irq_mask = pci_msi_mask_irq,
>>>> +	.irq_unmask = pci_msi_unmask_irq,
>>>
>>> There is no need to provide both enable/disable and mask/unmask. And
>>> since pci_msi_{un}mask_irq is the default, you can get rid of these
>>> function pointers anyway.
>>>
>>
>> Got it. Like you said, the mask/unmask callback are defaulted to
>> pci_msi_{un}mask_irq in pci_msi_domain_update_chip_ops, called when the
>> MSI irq domain is created.
>>
>> I'll get rid of all the callback assignments here.
>>
>>>> +};
>>>> +
>>>> +static struct msi_domain_info iproc_msi_domain_info = {
>>>> +	.flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
>>>> +		MSI_FLAG_PCI_MSIX,
>>>> +	.chip = &iproc_msi_top_irq_chip,
>>>> +};
>>>> +
>>>> +static int iproc_msi_irq_set_affinity(struct irq_data *data,
>>>> +				      const struct cpumask *mask, bool force)
>>>> +{
>>>> +	return -EINVAL;
>>>
>>> I wish people would stop building stupid HW that prevents proper
>>> affinity setting for MSI...
>>>
>>
>> In fact, there's no reason why the HW should prevent us from setting the
>> MSI affinity. This is currently more of a SW issue that I have not spent
>> enough time figuring out.
>>
>> Here's my understanding:
>>
>> In our system, each MSI is linked to a dedicated interrupt line
>> connected to the GIC upstream (e.g., the GIC from Cortex A9 in Cygnus).
>> Correct me if I'm wrong, to get the MSI affinity to work, all I need
>> should be propagating the affinity setting to the GIC (the 1-to-1
>> mapping helps simply things quite a bit here)?
>>
>> I tried to hook this up with irq_chip_set_affinity_parent. But it looks
>> like the irq chip of the parent domain (i.e., the GIC) is pointing to
>> NULL, and therefore it would crash when dereferencing it to get the
>> irq_set_affinity callback.
>>
>> I thought I did everything required by figuring out and linking to the
>> correct parent domain in the iproc_msi_init routine:
>>
>> parent_node = of_parse_phandle(node, "interrupt-parent", 0);
>> if (!parent_node) {
>>         dev_err(pcie->dev, "unable to parse MSI interrupt parent\n");
>>         return -ENODEV;
>> }
>>
>> parent_domain = irq_find_host(parent_node);
>> if (!parent_domain) {
>>         dev_err(pcie->dev, "unable to get MSI parent domain\n");
>>         return -ENODEV;
>> }
>>
>> ...
>> ...
>>
>> msi->inner_domain = irq_domain_add_hierarchy(parent_domain, 0,
>>                                                msi->nirqs, NULL,
>>                                                &msi_domain_ops,
>>                                                msi);
>>
>> I haven't spent too much time investigating, and am hoping to eventually
>> enable affinity support with an incremental patch in the future when I
>> have more time to investigate.
>
> It fails because you're not implementing a fully stacked system, but
> only a partial one (see below).
>
>>>> +}
>>>> +
>>>> +static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
>>>> +					  struct msi_msg *msg)
>>>> +{
>>>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>>> +	phys_addr_t addr;
>>>> +
>>>> +	addr = virt_to_phys(msi->msi_base) | (data->hwirq * 4);
>>>> +	msg->address_lo = lower_32_bits(addr);
>>>> +	msg->address_hi = upper_32_bits(addr);
>>>> +	msg->data = data->hwirq;
>>>> +}
>>>> +
>>>> +static struct irq_chip iproc_msi_bottom_irq_chip = {
>>>> +	.name = "MSI",
>>>> +	.irq_set_affinity = iproc_msi_irq_set_affinity,
>>>> +	.irq_compose_msi_msg = iproc_msi_irq_compose_msi_msg,
>>>> +};
>>>> +
>>>> +static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>>>> +				      unsigned int virq, unsigned int nr_irqs,
>>>> +				      void *args)
>>>> +{
>>>> +	struct iproc_msi *msi = domain->host_data;
>>>> +	int i, msi_irq;
>>>> +
>>>> +	mutex_lock(&msi->bitmap_lock);
>>>> +
>>>> +	for (i = 0; i < nr_irqs; i++) {
>>>> +		msi_irq = find_first_zero_bit(msi->used, msi->nirqs);
>>>
>>> This is slightly puzzling. Do you really have at most 6 MSIs? Usually,
>>> we end up with a larger number of MSIs (32 or 64) multiplexed on top of
>>> a small number of wired interrupts. Here, you seem to have a 1-1
>>> mapping. Is that really the case?
>>
>> Yes, based on the poorly written iProc PCIe arch doc, :), we seem to
>> have 1-1 mapping between each wired interrupt and MSI, with each MSI
>> handled by an event queue, that consists of 64x word entries allocated
>> from host memory (DDR). The MSI data is stored in the low 16-bit of each
>> entry, whereas the upper 16-bit of each entry is reserved for the iProc
>> PCIe controller for its own use.
>
> Based on your description, I have a completely different interpretation
> of how your HW works:
>
> You have 4 (or 6) doorbells (the location where your PCI device writes
> its MSI), and whatever the device writes ends up in the event queue. I'm
> pretty sure the HW doesn't interpret what the device writes at all. The
> GIC irqs are just a way to indicate that there is something in one of
> the queues.

Yes, see my replies on another email, and your interpretation here is 
absolutely correct.

>
> This means that you wouldn't be limited to 4/6 MSIs, but that you could
> actually have a lot more (apparently 16 bits worth of MSIs), and you
> could use the doorbell address to change your affinity (one GIC
> interrupt per CPU, assuming you don't have more than 4 or 6).
>

Yes, you are right. It looks like the number of MSI vectors to be 
supported by each event queue can be up to 64.

Given that the max number of CPUs on any SoC that uses the iProc event 
queue is 4 (and I'm glad that an upcoming 8-core processor can use 
GICv3-its instead), I can evenly distribute 4 interrupts to each of the 
CPUs.

>>>
>>> If so (and assuming the wired interrupts are always contiguous), you
>>> shouldn't represent this as a chained interrupt (a multiplexer), but as
>>> a stacked irqchip, similar to what GICv2m does.
>>>
>>
>> Okay, I think I might be missing something here, but I thought I
>> currently have a stacked irqdomain (chip), i.e., GIC -> inner_domain ->
>> MSI domain?
>
> At the moment, you have one stacked domain (MSI -> iProc) that is used
> for a chained irqchip that feeds into the GIC.
>
> What you could do is turning it into a stacked domain (MSI -> iProc ->
> GIC), using the fact that you have a 1:1 mapping between your MSIs and
> the GIC interrupts. But I'm now questionning that fact.
>

Indeed, should be 64 MSI vectors per GIC interrupt instead of 1:1.

>> And does this imply I should expect 'nr_irqs' in this routine to be
>> always zero and therefore I can get rid of the for loop here (same in
>> the domain free routine)?
>
> It should never be zero. It is likely to be always 1, as it is the
> number of contiguous MSIs that have to be allocated in one go, and is
> exclusively used for multi-MSI (which you don't claim to support).
>
>>>> +		if (msi_irq < msi->nirqs) {
>>>> +			set_bit(msi_irq, msi->used);
>>>> +		} else {
>>>> +			mutex_unlock(&msi->bitmap_lock);
>>>> +			return -ENOSPC;
>>>> +		}
>>>> +
>>>> +		irq_domain_set_info(domain, virq + i, msi_irq,
>>>> +				    &iproc_msi_bottom_irq_chip,
>>>> +				    domain->host_data, handle_simple_irq,
>>>> +				    NULL, NULL);
>>>> +	}
>>>> +
>>>> +	mutex_unlock(&msi->bitmap_lock);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>>>> +				      unsigned int virq, unsigned int nr_irqs)
>>>> +{
>>>> +	struct irq_data *data = irq_domain_get_irq_data(domain, virq);
>>>> +	struct iproc_msi *msi = irq_data_get_irq_chip_data(data);
>>>> +	unsigned int i;
>>>> +
>>>> +	mutex_lock(&msi->bitmap_lock);
>>>> +
>>>> +	for (i = 0; i < nr_irqs; i++) {
>>>> +		struct irq_data *data = irq_domain_get_irq_data(domain,
>>>> +								virq + i);
>>>> +		if (!test_bit(data->hwirq, msi->used)) {
>>>> +			dev_warn(msi->pcie->dev, "freeing unused MSI %lu\n",
>>>> +				 data->hwirq);
>>>> +		} else
>>>> +			clear_bit(data->hwirq, msi->used);
>>>> +	}
>>>> +
>>>> +	mutex_unlock(&msi->bitmap_lock);
>>>> +	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>>>> +}
>>>> +
>>>> +static const struct irq_domain_ops msi_domain_ops = {
>>>> +	.alloc = iproc_msi_irq_domain_alloc,
>>>> +	.free = iproc_msi_irq_domain_free,
>>>> +};
>>>> +
>>>> +static void iproc_msi_enable(struct iproc_msi *msi)
>>>> +{
>>>> +	struct iproc_pcie *pcie = msi->pcie;
>>>> +	int i, eq;
>>>> +	u32 val;
>>>> +
>>>> +	/* program memory region for each event queue */
>>>> +	for (i = 0; i < msi->n_eq_region; i++) {
>>>> +		phys_addr_t addr =
>>>> +			virt_to_phys(msi->eq_base + (i * EQ_MEM_REGION_SIZE));
>>>> +
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE, i,
>>>> +				    lower_32_bits(addr));
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_PAGE_UPPER, i,
>>>> +				    upper_32_bits(addr));
>>>> +	}
>>>> +
>>>> +	/* program memory region for MSI posted writes */
>>>> +	for (i = 0; i < msi->n_msi_msg_region; i++) {
>>>> +		phys_addr_t addr =
>>>> +			virt_to_phys(msi->msi_base +
>>>> +				     (i * MSI_MSG_MEM_REGION_SIZE));
>>>
>>> You seem to be a victim of checkpatch. Please don't split statements
>>> like this, it just make it harder to read. My terminal can wrap lines
>>> very conveniently, and I don't care about the 80 character limit...
>>>
>>
>> Okay will fix.
>>
>>>> +
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE, i,
>>>> +				    lower_32_bits(addr));
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_PAGE_UPPER, i,
>>>> +				    upper_32_bits(addr));
>>>> +	}
>>>> +
>>>> +	for (eq = 0; eq < msi->nirqs; eq++) {
>>>> +		/* enable MSI event queue */
>>>> +		val = IPROC_MSI_INTR_EN | IPROC_MSI_INT_N_EVENT |
>>>> +			IPROC_MSI_EQ_EN;
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_CTRL, eq, val);
>>>> +
>>>> +		/*
>>>> +		 * Some legacy platforms require the MSI interrupt enable
>>>> +		 * register to be set explicitly
>>>> +		 */
>>>> +		if (msi->has_inten_reg) {
>>>> +			val = readl(pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>>> +			val |= BIT(eq);
>>>> +			writel(val, pcie->base + IPROC_MSI_INTS_EN_OFFSET);
>>>> +		}
>>>> +	}
>>>> +}
>>>> +
>>>> +static void iproc_msi_handler(struct irq_desc *desc)
>>>> +{
>>>> +	unsigned int irq = irq_desc_get_irq(desc);
>>>> +	struct irq_chip *irq_chip = irq_desc_get_chip(desc);
>>>> +	struct iproc_msi *msi;
>>>> +	struct iproc_pcie *pcie;
>>>> +	u32 eq, head, tail, num_events;
>>>> +	int virq;
>>>> +
>>>> +	chained_irq_enter(irq_chip, desc);
>>>> +
>>>> +	msi = irq_get_handler_data(irq);
>>>> +	pcie = msi->pcie;
>>>> +
>>>> +	eq = irq - msi->irqs[0];
>>>> +	virq = irq_find_mapping(msi->inner_domain, eq);
>>>> +	head = iproc_msi_read_reg(msi, IPROC_MSI_EQ_HEAD, eq) &
>>>> +		IPROC_MSI_EQ_MASK;
>>>> +	do {
>>>> +		tail = iproc_msi_read_reg(msi, IPROC_MSI_EQ_TAIL, eq) &
>>>> +			IPROC_MSI_EQ_MASK;
>>>> +
>>>> +		num_events = (tail < head) ?
>>>> +			(IPROC_MSI_EQ_LEN - (head - tail)) : (tail - head);
>>>> +		if (!num_events)
>>>> +			break;
>>>> +
>>>> +		generic_handle_irq(virq);
>>>> +
>>>> +		head++;
>>>> +		head %= IPROC_MSI_EQ_LEN;
>>>> +		iproc_msi_write_reg(msi, IPROC_MSI_EQ_HEAD, eq, head);
>>>> +	} while (true);
>>>
>>> That's unusual. You seem to get only one interrupt for a bunch of MSIs,
>>> all representing the same IRQ? That feels very weird, as you can
>>> usually collapse edge interrupts.
>>>
>>
>> I think we get one GIC interrupt per MSI line (1:1 mapping), but then
>> the MSI data message can be more than one, stored in the event queue
>> reserved for that MSI/interrupt line.
>
> But can you have different messages? Buffering the same message N times
> is pretty useless (you can coalesce MSIs), but I'd completely understand
> the design if you could store N different messages. If that's the case,
> then the HW design would actually be a lot saner (whoever thought that 4
> or 6 MSIs would be enough needs a PCIe crash course...).
>

This all comes from the confusion that we thought we have 1 MSI vector 
per GIC line. Now this starts to make much more sense after you pointed 
it out and discussions with our ASIC engineers. Each MSI message stored 
in the event queue can indicate a different MSI vector.

>> When you mentioned chained IRQ above, do you really mean the logic here?
>> In fact, I don't think I really need to use the chained irq APIs here,
>> as the MSI and GIC interrupt line has a 1-to-1 mapping.
>
> If you definitely have a 1:1 mapping, and that you cannot store
> arbitrary messages in memory, then yes, you can switch to a fully
> stacked configuration.
>

I will now chain all MSI vectors on the same GIC interrupt.

>>>> +
>>>> +	chained_irq_exit(irq_chip, desc);
>>>> +}
>>>> +
>>
>> Can probably get rid of the chained_irq_enter and exit?
>
> You could get rid of a lot more than that, but that depends on the above
> discussion.
>
> Thanks,
>
> 	M.
>

Thanks a lot, Marc!

Ray

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-11-20 17:07 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18  0:31 [PATCH 0/5] Add iProc PCIe PAXC and MSI support Ray Jui
2015-11-18  0:31 ` [PATCH 1/5] PCI: iproc: Update iProc PCIe device tree binding Ray Jui
2015-11-18  0:31 ` [PATCH 2/5] PCI: iproc: Add PAXC interface support Ray Jui
2015-11-18  0:34   ` Florian Fainelli
2015-11-18  0:46     ` Ray Jui
2015-11-18  0:45       ` Florian Fainelli
2015-11-18  0:47         ` Ray Jui
2015-11-18  0:31 ` [PATCH 3/5] PCI: iproc: Add iProc PCIe MSI device tree binding Ray Jui
2015-11-18  0:31 ` [PATCH 4/5] PCI: iproc: Add iProc PCIe MSI support Ray Jui
2015-11-18  8:48   ` Marc Zyngier
2015-11-18  9:50     ` Arnd Bergmann
2015-11-19 19:22       ` Ray Jui
2015-11-19  1:37     ` Ray Jui
2015-11-19  2:56       ` Ray Jui
2015-11-19  7:23       ` Ray Jui
2015-11-19  8:31       ` Arnd Bergmann
2015-11-19 23:05         ` Ray Jui
2015-11-20  8:56       ` Marc Zyngier
2015-11-20 17:07         ` Ray Jui
2015-11-18  0:31 ` [PATCH 5/5] ARM: dts: Enable MSI support for Broadcom Cygnus Ray Jui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).