linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
@ 2018-09-19 12:35 laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets laurentiu.tudor
                   ` (21 more replies)
  0 siblings, 22 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

This patch series adds SMMU support for NXP LS1043A and LS1046A chips
and consists mostly of important driver fixes and the required device
tree updates. It touches several subsystems and consists of three main
parts:
 - changes in the drivers/soc/fsl/qbman drivers adding iommu mapping of
   reserved memory areas, fixes and deferred probe support
 - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
   consisting of misc dma mapping related fixes and probe ordering
 - addition of the actual arm smmu device tree node together with
   various adjustments to the device trees

Performance impact

    Running iperf benchmarks in a back-to-back setup (both sides
    having smmu enabled) on a 10Gbps port shows a significant
    networking performance degradation of around 40% (9.48Gbps
    linerate vs 5.45Gbps). If you need the performance and can do
    without SMMU support, you can use "iommu.passthrough=1" to
    bypass SMMU translation for DMA.

USB issue and workaround

    There's a problem with the usb controllers in these chips
    generating smaller, 40-bit wide dma addresses instead of the 48-bit
    supported at the smmu input. So you end up in a situation where the
    smmu is mapped with 48-bit address translations, but the device
    generates transactions with clipped 40-bit addresses, thus smmu
    context faults are triggered. I encountered a similar situation for
    mmc that I managed to fix in software [1]; however, for USB I did not
    find a proper place in the code to add a similar fix. The only
    workaround I found was to add this kernel parameter which limits the
    usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
    This workaround is far from ideal, so any suggestions for a code
    based workaround in this area would be greatly appreciated.
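
    The address clipping described above can be illustrated in user
    space. This is only a minimal sketch, assuming the controller simply
    drops address bits 40..47; the helper names clip_to_bits() and
    iova_survives() are hypothetical and not part of any kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only the low `bits` bits of an address (hypothetical helper). */
uint64_t clip_to_bits(uint64_t addr, unsigned int bits)
{
	if (bits >= 64)
		return addr;
	return addr & ((UINT64_C(1) << bits) - 1);
}

/*
 * An IOVA handed to a device is only usable if the device can emit it
 * unchanged. Assumption: the device truncates to `dev_bits` address bits.
 */
bool iova_survives(uint64_t iova, unsigned int dev_bits)
{
	return clip_to_bits(iova, dev_bits) == iova;
}
```

    An IOVA allocated near the top of a 48-bit space fails this check
    for a 40-bit device, which is consistent with the context faults
    described above, while any address that fits in 32 bits passes.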

The patch set is based on net-next so, if generally agreed, I'd suggest
taking the patches through the netdev tree after getting all the Acks.

[1] https://patchwork.kernel.org/patch/10506627/

Laurentiu Tudor (21):
  soc/fsl/qman: fixup liodns only on ppc targets
  soc/fsl/bman: map FBPR area in the iommu
  soc/fsl/qman: map FQD and PFDR areas in the iommu
  soc/fsl/qman-portal: map CENA area in the iommu
  soc/fsl/qbman: add APIs to retrieve the probing status
  soc/fsl/qman_portals: defer probe after qman's probe
  soc/fsl/bman_portals: defer probe after bman's probe
  soc/fsl/qbman_portals: add APIs to retrieve the probing status
  fsl/fman: backup and restore ICID registers
  fsl/fman: add API to get the device behind a fman port
  dpaa_eth: defer probing after qbman
  dpaa_eth: base dma mappings on the fman rx port
  dpaa_eth: fix iova handling for contiguous frames
  dpaa_eth: fix iova handling for sg frames
  dpaa_eth: fix SG frame cleanup
  arm64: dts: ls1046a: add smmu node
  arm64: dts: ls1043a: add smmu node
  arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
  arm64: dts: ls104x: add missing dma ranges property
  arm64: dts: ls104x: add iommu-map to pci controllers
  arm64: dts: ls104x: make dma-coherent global to the SoC

 .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
 .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
 drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
 drivers/net/ethernet/freescale/fman/fman.h    |   4 +
 .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
 .../net/ethernet/freescale/fman/fman_port.h   |   2 +
 drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
 drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
 drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
 drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
 include/soc/fsl/bman.h                        |  16 +++
 include/soc/fsl/qman.h                        |  17 +++
 13 files changed, 379 insertions(+), 53 deletions(-)

-- 
2.17.1



* [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 02/21] soc/fsl/bman: map FBPR area in the iommu laurentiu.tudor
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

ARM SoCs use SMMU, so the liodn fixup done in the qman driver no
longer makes sense there and it also breaks the ICID settings inherited
from u-boot. Do the fixups only for PPC targets.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/qman_ccsr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 79cba58387a5..619e22030460 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -597,6 +597,7 @@ static int qman_init_ccsr(struct device *dev)
 #define LIO_CFG_LIODN_MASK 0x0fff0000
 void qman_liodn_fixup(u16 channel)
 {
+#ifdef CONFIG_PPC
 	static int done;
 	static u32 liodn_offset;
 	u32 before, after;
@@ -616,6 +617,7 @@ void qman_liodn_fixup(u16 channel)
 		qm_ccsr_out(REG_REV3_QCSP_LIO_CFG(idx), after);
 	else
 		qm_ccsr_out(REG_QCSP_LIO_CFG(idx), after);
+#endif
 }
 
 #define IO_CFG_SDEST_MASK 0x00ff0000
-- 
2.17.1



* [PATCH 02/21] soc/fsl/bman: map FBPR area in the iommu
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 03/21] soc/fsl/qman: map FQD and PFDR areas " laurentiu.tudor
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add a one-to-one iommu mapping for bman private data memory (FBPR).
This is required for BMAN to work without faults behind an iommu.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/bman_ccsr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
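
Note: the idea of a one-to-one (identity) mapping can be modelled in
user space. The following is only a toy sketch, not kernel code: the
toy_domain type and the toy_map()/toy_translate() helpers are
hypothetical names made up for illustration. It shows that once a region
is mapped with iova == paddr, a device access to a physical address
inside the region translates to the same address, while anything outside
it "faults".

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 8

/* One contiguous IOVA -> physical translation (toy model, not kernel code). */
struct toy_mapping {
	uint64_t iova;
	uint64_t paddr;
	uint64_t size;
};

struct toy_domain {
	struct toy_mapping maps[TOY_MAX_MAPPINGS];
	size_t count;
};

/* Record a mapping; returns 0 on success, -1 if the table is full. */
int toy_map(struct toy_domain *d, uint64_t iova, uint64_t paddr, uint64_t size)
{
	if (d->count >= TOY_MAX_MAPPINGS)
		return -1;
	d->maps[d->count++] = (struct toy_mapping){ iova, paddr, size };
	return 0;
}

/* Translate a device address; returns 0 and sets *paddr, or -1 on a "fault". */
int toy_translate(const struct toy_domain *d, uint64_t iova, uint64_t *paddr)
{
	for (size_t i = 0; i < d->count; i++) {
		const struct toy_mapping *m = &d->maps[i];

		if (iova >= m->iova && iova - m->iova < m->size) {
			*paddr = m->paddr + (iova - m->iova);
			return 0;
		}
	}
	return -1;
}
```

With toy_map(&d, base, base, size), every address in [base, base + size)
translates to itself, which is the property the private memory areas
need when the hardware is programmed with physical addresses.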

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index 05c42235dd41..680f67f04fb4 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -29,6 +29,7 @@
  */
 
 #include "bman_priv.h"
+#include <linux/iommu.h>
 
 u16 bman_ip_rev;
 EXPORT_SYMBOL(bman_ip_rev);
@@ -171,6 +172,7 @@ static int fsl_bman_probe(struct platform_device *pdev)
 	int ret, err_irq;
 	struct device *dev = &pdev->dev;
 	struct device_node *node = dev->of_node;
+	struct iommu_domain *domain;
 	struct resource *res;
 	u16 id, bm_pool_cnt;
 	u8 major, minor;
@@ -216,6 +218,16 @@ static int fsl_bman_probe(struct platform_device *pdev)
 
 	dev_dbg(dev, "Allocated FBPR 0x%llx 0x%zx\n", fbpr_a, fbpr_sz);
 
+	/* Create a 1-to-1 iommu mapping for the FBPR area */
+	domain = iommu_get_domain_for_dev(dev);
+	if (domain) {
+		ret = iommu_map(domain,
+				fbpr_a, fbpr_a, fbpr_sz,
+				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+		if (ret)
+			dev_warn(dev, "failed to iommu_map() %d\n", ret);
+	}
+
 	bm_set_memory(fbpr_a, fbpr_sz);
 
 	err_irq = platform_get_irq(pdev, 0);
-- 
2.17.1



* [PATCH 03/21] soc/fsl/qman: map FQD and PFDR areas in the iommu
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 02/21] soc/fsl/bman: map FBPR area in the iommu laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 04/21] soc/fsl/qman-portal: map CENA area " laurentiu.tudor
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add a one-to-one iommu mapping for qman private data memory areas
(FQD and PFDR). This is required for QMAN to work without faults
behind an iommu.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/qman_ccsr.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 619e22030460..7163f7511ce1 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -29,6 +29,7 @@
  */
 
 #include "qman_priv.h"
+#include <linux/iommu.h>
 
 u16 qman_ip_rev;
 EXPORT_SYMBOL(qman_ip_rev);
@@ -692,6 +693,7 @@ static int fsl_qman_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *node = dev->of_node;
+	struct iommu_domain *domain;
 	struct resource *res;
 	int ret, err_irq;
 	u16 id;
@@ -769,6 +771,21 @@ static int fsl_qman_probe(struct platform_device *pdev)
 	}
 	dev_dbg(dev, "Allocated PFDR 0x%llx 0x%zx\n", pfdr_a, pfdr_sz);
 
+	/* Create a 1-to-1 iommu mapping for the fqd and pfdr areas */
+	domain = iommu_get_domain_for_dev(dev);
+	if (domain) {
+		ret = iommu_map(domain,
+				fqd_a, fqd_a, fqd_sz,
+				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+		if (ret)
+			dev_warn(dev, "iommu_map(fqd) failed %d\n", ret);
+		ret = iommu_map(domain,
+				pfdr_a, pfdr_a, pfdr_sz,
+				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+		if (ret)
+			dev_warn(dev, "iommu_map(pfdr) failed %d\n", ret);
+	}
+
 	ret = qman_init_ccsr(dev);
 	if (ret) {
 		dev_err(dev, "CCSR setup failed\n");
-- 
2.17.1



* [PATCH 04/21] soc/fsl/qman-portal: map CENA area in the iommu
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (2 preceding siblings ...)
  2018-09-19 12:35 ` [PATCH 03/21] soc/fsl/qman: map FQD and PFDR areas " laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 05/21] soc/fsl/qbman: add APIs to retrieve the probing status laurentiu.tudor
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add a one-to-one iommu mapping for qman portal CENA register area.
This is required for QMAN stashing to work without faults behind
an iommu.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/qman_portal.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index a120002b630e..012bb95e87e1 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -29,6 +29,7 @@
  */
 
 #include "qman_priv.h"
+#include <linux/iommu.h>
 
 struct qman_portal *qman_dma_portal;
 EXPORT_SYMBOL(qman_dma_portal);
@@ -222,6 +223,7 @@ static int qman_portal_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *node = dev->of_node;
+	struct iommu_domain *domain;
 	struct qm_portal_config *pcfg;
 	struct resource *addr_phys[2];
 	int irq, cpu, err;
@@ -276,6 +278,21 @@ static int qman_portal_probe(struct platform_device *pdev)
 		goto err_ioremap2;
 	}
 
+	/* Create a 1-to-1 iommu mapping for the cena portal area */
+	domain = iommu_get_domain_for_dev(dev);
+	if (domain) {
+		/*
+		 * Note: not mapping this as cacheable triggers the infamous
+		 * QMan CIDE error.
+		 */
+		err = iommu_map(domain,
+				addr_phys[0]->start, addr_phys[0]->start,
+				resource_size(addr_phys[0]),
+				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
+		if (err)
+			dev_warn(dev, "failed to iommu_map() %d\n", err);
+	}
+
 	pcfg->pools = qm_get_pools_sdqcr();
 
 	spin_lock(&qman_lock);
-- 
2.17.1



* [PATCH 05/21] soc/fsl/qbman: add APIs to retrieve the probing status
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (3 preceding siblings ...)
  2018-09-19 12:35 ` [PATCH 04/21] soc/fsl/qman-portal: map CENA area " laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 06/21] soc/fsl/qman_portals: defer probe after qman's probe laurentiu.tudor
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add a couple of new APIs to check the probing status of qman and bman:
 'int bman_is_probed()' and 'int qman_is_probed()'.
They return the following values.
 *  1 if qman/bman were probed correctly
 *  0 if qman/bman were not yet probed
 * -1 if probing of qman/bman failed
Drivers that use qman/bman driver services are required to use these
APIs before calling any functions exported by qman or bman drivers
or otherwise they will crash the kernel.
The APIs will be used in the following couple of qbman portal patches
and later in the series in the dpaa1 ethernet driver.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/bman_ccsr.c | 11 +++++++++++
 drivers/soc/fsl/qbman/qman_ccsr.c | 11 +++++++++++
 include/soc/fsl/bman.h            |  8 ++++++++
 include/soc/fsl/qman.h            |  8 ++++++++
 4 files changed, 38 insertions(+)
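
Note: a consumer of these status functions might translate the
tri-state value into a probe return code roughly as sketched below.
This is a user-space illustration only; demo_check_dependency() and
DEMO_EPROBE_DEFER are hypothetical names (the kernel's EPROBE_DEFER is
an internal errno that user-space <errno.h> does not provide, so the
sketch defines its own copy of the value).

```c
#include <errno.h>

/* Linux-internal errno value for deferred probing; not in user-space errno.h. */
#define DEMO_EPROBE_DEFER 517

/*
 * Map the tri-state status (1 = probed, 0 = not yet, -1 = failed) to the
 * value a dependent probe function would return. 0 means "carry on".
 */
int demo_check_dependency(int status)
{
	if (status == 0)
		return -DEMO_EPROBE_DEFER;	/* dependency not ready: retry later */
	if (status < 0)
		return -ENODEV;			/* dependency failed: give up */
	return 0;
}
```

Keeping "not yet" and "failed" apart matters: the first should lead to a
retry, the second should not.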

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index 680f67f04fb4..2c11883d42a5 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -121,6 +121,7 @@ static void bm_set_memory(u64 ba, u32 size)
  */
 static dma_addr_t fbpr_a;
 static size_t fbpr_sz;
+static int __bman_probed;
 
 static int bman_fbpr(struct reserved_mem *rmem)
 {
@@ -167,6 +168,12 @@ static irqreturn_t bman_isr(int irq, void *ptr)
 	return IRQ_HANDLED;
 }
 
+int bman_is_probed(void)
+{
+	return __bman_probed;
+}
+EXPORT_SYMBOL_GPL(bman_is_probed);
+
 static int fsl_bman_probe(struct platform_device *pdev)
 {
 	int ret, err_irq;
@@ -177,6 +184,8 @@ static int fsl_bman_probe(struct platform_device *pdev)
 	u16 id, bm_pool_cnt;
 	u8 major, minor;
 
+	__bman_probed = -1;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!res) {
 		dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n",
@@ -267,6 +276,8 @@ static int fsl_bman_probe(struct platform_device *pdev)
 		return ret;
 	}
 
+	__bman_probed = 1;
+
 	return 0;
 };
 
diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 7163f7511ce1..0bfbe24b479a 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -274,6 +274,7 @@ static const struct qman_error_info_mdata error_mdata[] = {
 static u32 __iomem *qm_ccsr_start;
 /* A SDQCR mask comprising all the available/visible pool channels */
 static u32 qm_pools_sdqcr;
+static int __qman_probed;
 
 static inline u32 qm_ccsr_in(u32 offset)
 {
@@ -689,6 +690,12 @@ static int qman_resource_init(struct device *dev)
 	return 0;
 }
 
+int qman_is_probed(void)
+{
+	return __qman_probed;
+}
+EXPORT_SYMBOL_GPL(qman_is_probed);
+
 static int fsl_qman_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -699,6 +706,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
 	u16 id;
 	u8 major, minor;
 
+	__qman_probed = -1;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!res) {
 		dev_err(dev, "Can't get %pOF property 'IORESOURCE_MEM'\n",
@@ -847,6 +856,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	__qman_probed = 1;
+
 	return 0;
 }
 
diff --git a/include/soc/fsl/bman.h b/include/soc/fsl/bman.h
index eaaf56df4086..5b99cb2ea5ef 100644
--- a/include/soc/fsl/bman.h
+++ b/include/soc/fsl/bman.h
@@ -126,4 +126,12 @@ int bman_release(struct bman_pool *pool, const struct bm_buffer *bufs, u8 num);
  */
 int bman_acquire(struct bman_pool *pool, struct bm_buffer *bufs, u8 num);
 
+/**
+ * bman_is_probed - Check if bman is probed
+ *
+ * Returns 1 if the bman driver successfully probed, -1 if the bman driver
+ * failed to probe or 0 if the bman driver has not probed yet.
+ */
+int bman_is_probed(void);
+
 #endif	/* __FSL_BMAN_H */
diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index d4dfefdee6c1..597783b8a3a0 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -1186,4 +1186,12 @@ int qman_alloc_cgrid_range(u32 *result, u32 count);
  */
 int qman_release_cgrid(u32 id);
 
+/**
+ * qman_is_probed - Check if qman is probed
+ *
+ * Returns 1 if the qman driver successfully probed, -1 if the qman driver
+ * failed to probe or 0 if the qman driver has not probed yet.
+ */
+int qman_is_probed(void);
+
 #endif	/* __FSL_QMAN_H */
-- 
2.17.1



* [PATCH 06/21] soc/fsl/qman_portals: defer probe after qman's probe
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (4 preceding siblings ...)
  2018-09-19 12:35 ` [PATCH 05/21] soc/fsl/qbman: add APIs to retrieve the probing status laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:35 ` [PATCH 07/21] soc/fsl/bman_portals: defer probe after bman's probe laurentiu.tudor
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Defer probe of qman portals after qman probing. This fixes the crash
below, seen on NXP LS1043A SoCs:

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000004
Mem abort info:
  ESR = 0x96000004
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
[0000000000000004] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.18.0-rc1-next-20180622-00200-g986f5c179185 #9
Hardware name: LS1043A RDB Board (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : qman_set_sdest+0x74/0xa0
lr : qman_portal_probe+0x22c/0x470
sp : ffff00000803bbc0
x29: ffff00000803bbc0 x28: 0000000000000000
x27: ffff0000090c1b88 x26: ffff00000927cb68
x25: ffff00000927c000 x24: ffff00000927cb60
x23: 0000000000000000 x22: 0000000000000000
x21: ffff0000090e9000 x20: ffff800073b5c810
x19: ffff800027401298 x18: ffffffffffffffff
x17: 0000000000000001 x16: 0000000000000000
x15: ffff0000090e96c8 x14: ffff80002740138a
x13: ffff0000090f2000 x12: 0000000000000030
x11: ffff000008f25000 x10: 0000000000000000
x9 : ffff80007bdfd2c0 x8 : 0000000000004000
x7 : ffff80007393cc18 x6 : 0040000000000001
x5 : 0000000000000000 x4 : ffffffffffffffff
x3 : 0000000000000004 x2 : ffff00000927c900
x1 : 0000000000000000 x0 : 0000000000000004
Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
Call trace:
 qman_set_sdest+0x74/0xa0
 platform_drv_probe+0x50/0xa8
 driver_probe_device+0x214/0x2f8
 __driver_attach+0xd8/0xe0
 bus_for_each_dev+0x68/0xc8
 driver_attach+0x20/0x28
 bus_add_driver+0x108/0x228
 driver_register+0x60/0x110
 __platform_driver_register+0x40/0x48
 qman_portal_driver_init+0x20/0x84
 do_one_initcall+0x58/0x168
 kernel_init_freeable+0x184/0x22c
 kernel_init+0x10/0x108
 ret_from_fork+0x10/0x18
Code: f9400443 11001000 927e4800 8b000063 (b9400063)
---[ end trace 4f6d50489ecfb930 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/qman_portal.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 012bb95e87e1..7fd13f8c8da2 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -229,6 +229,14 @@ static int qman_portal_probe(struct platform_device *pdev)
 	int irq, cpu, err;
 	u32 val;
 
+	err = qman_is_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev, "failing probe due to qman probe error\n");
+		return -ENODEV;
+	}
+
 	pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
 	if (!pcfg)
 		return -ENOMEM;
-- 
2.17.1



* [PATCH 07/21] soc/fsl/bman_portals: defer probe after bman's probe
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (5 preceding siblings ...)
  2018-09-19 12:35 ` [PATCH 06/21] soc/fsl/qman_portals: defer probe after qman's probe laurentiu.tudor
@ 2018-09-19 12:35 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 08/21] soc/fsl/qbman_portals: add APIs to retrieve the probing status laurentiu.tudor
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:35 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Unlike with qman portals, a crash in bman portal probing could not be
triggered, but the probe does make calls [1] into the bman driver, so
let's make sure the bman portal probing happens after bman's.

[1]  bman_p_irqsource_add() (in bman) called by:
       init_pcfg() called by:
         bman_portal_probe()

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/bman_portal.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 2f71f7df3465..f9edd28894fd 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -91,7 +91,15 @@ static int bman_portal_probe(struct platform_device *pdev)
 	struct device_node *node = dev->of_node;
 	struct bm_portal_config *pcfg;
 	struct resource *addr_phys[2];
-	int irq, cpu;
+	int irq, cpu, err;
+
+	err = bman_is_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev, "failing probe due to bman probe error\n");
+		return -ENODEV;
+	}
 
 	pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
 	if (!pcfg)
-- 
2.17.1



* [PATCH 08/21] soc/fsl/qbman_portals: add APIs to retrieve the probing status
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (6 preceding siblings ...)
  2018-09-19 12:35 ` [PATCH 07/21] soc/fsl/bman_portals: defer probe after bman's probe laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 09/21] fsl/fman: backup and restore ICID registers laurentiu.tudor
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add a couple of new APIs to check the probing status of the required
cpu bound qman and bman portals:
 'int bman_portals_probed()' and 'int qman_portals_probed()'.
They return the following values.
 *  1 if qman/bman portals were all probed correctly
 *  0 if qman/bman portals were not yet probed
 * -1 if probing of qman/bman portals failed
Drivers that use qman/bman portal driver services are required to use
these APIs before calling any functions exported by these drivers or
otherwise they will crash the kernel.
First user will be the dpaa1 ethernet driver, coming in a subsequent
patch.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/soc/fsl/qbman/bman_portal.c | 10 ++++++++++
 drivers/soc/fsl/qbman/qman_portal.c | 10 ++++++++++
 include/soc/fsl/bman.h              |  8 ++++++++
 include/soc/fsl/qman.h              |  9 +++++++++
 4 files changed, 37 insertions(+)
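
Note: the way the "all portals probed" status is derived can be
modelled in user space. This is a simplified sketch under stated
assumptions, not the driver code: a 32-bit mask stands in for the
cpumask, and demo_portal_state, demo_next_free_cpu() and
demo_portal_probe() are hypothetical names. As in the diff below, the
status only becomes 1 when a portal probe finds that no cpu is left
without a portal, and becomes -1 when a portal fails to set up.

```c
#include <stdint.h>

/* Toy stand-in for a cpumask: bit n set means CPU n already owns a portal. */
struct demo_portal_state {
	uint32_t portal_cpus;
	unsigned int nr_cpus;	/* must be <= 32 in this sketch */
	int portals_probed;	/* 1 = all done, 0 = in progress, -1 = error */
};

/* Index of the first CPU without a portal, or nr_cpus if every CPU has one. */
unsigned int demo_next_free_cpu(const struct demo_portal_state *s)
{
	unsigned int cpu;

	for (cpu = 0; cpu < s->nr_cpus; cpu++)
		if (!(s->portal_cpus & (UINT32_C(1) << cpu)))
			break;
	return cpu;
}

/* Model one portal probe; `ok` simulates whether its setup succeeded. */
int demo_portal_probe(struct demo_portal_state *s, int ok)
{
	unsigned int cpu;

	if (!ok) {
		s->portals_probed = -1;
		return -1;
	}

	cpu = demo_next_free_cpu(s);
	if (cpu >= s->nr_cpus) {
		/* Spare portal: every CPU is served, so report completion. */
		s->portals_probed = 1;
		return 0;
	}

	s->portal_cpus |= UINT32_C(1) << cpu;
	return 0;
}
```

One consequence visible in this model: with exactly as many portals as
cpus, the status never reaches 1, because completion is only noticed by
a probe that finds no free cpu.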

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index f9edd28894fd..8048d35de8a2 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -32,6 +32,7 @@
 
 static struct bman_portal *affine_bportals[NR_CPUS];
 static struct cpumask portal_cpus;
+static int __bman_portals_probed;
 /* protect bman global registers and global data shared among portals */
 static DEFINE_SPINLOCK(bman_lock);
 
@@ -85,6 +86,12 @@ static int bman_online_cpu(unsigned int cpu)
 	return 0;
 }
 
+int bman_portals_probed(void)
+{
+	return __bman_portals_probed;
+}
+EXPORT_SYMBOL_GPL(bman_portals_probed);
+
 static int bman_portal_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -148,6 +155,7 @@ static int bman_portal_probe(struct platform_device *pdev)
 	spin_lock(&bman_lock);
 	cpu = cpumask_next_zero(-1, &portal_cpus);
 	if (cpu >= nr_cpu_ids) {
+		__bman_portals_probed = 1;
 		/* unassigned portal, skip init */
 		spin_unlock(&bman_lock);
 		return 0;
@@ -173,6 +181,8 @@ static int bman_portal_probe(struct platform_device *pdev)
 err_ioremap2:
 	memunmap(pcfg->addr_virt_ce);
 err_ioremap1:
+	__bman_portals_probed = -1;
+
 	return -ENXIO;
 }
 
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 7fd13f8c8da2..1a987aa2ec8c 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -39,6 +39,7 @@ EXPORT_SYMBOL(qman_dma_portal);
 #define CONFIG_FSL_DPA_PIRQ_FAST  1
 
 static struct cpumask portal_cpus;
+static int __qman_portals_probed;
 /* protect qman global registers and global data shared among portals */
 static DEFINE_SPINLOCK(qman_lock);
 
@@ -219,6 +220,12 @@ static int qman_online_cpu(unsigned int cpu)
 	return 0;
 }
 
+int qman_portals_probed(void)
+{
+	return __qman_portals_probed;
+}
+EXPORT_SYMBOL_GPL(qman_portals_probed);
+
 static int qman_portal_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -306,6 +313,7 @@ static int qman_portal_probe(struct platform_device *pdev)
 	spin_lock(&qman_lock);
 	cpu = cpumask_next_zero(-1, &portal_cpus);
 	if (cpu >= nr_cpu_ids) {
+		__qman_portals_probed = 1;
 		/* unassigned portal, skip init */
 		spin_unlock(&qman_lock);
 		return 0;
@@ -336,6 +344,8 @@ static int qman_portal_probe(struct platform_device *pdev)
 err_ioremap2:
 	memunmap(pcfg->addr_virt_ce);
 err_ioremap1:
+	__qman_portals_probed = -1;
+
 	return -ENXIO;
 }
 
diff --git a/include/soc/fsl/bman.h b/include/soc/fsl/bman.h
index 5b99cb2ea5ef..173e4049d963 100644
--- a/include/soc/fsl/bman.h
+++ b/include/soc/fsl/bman.h
@@ -133,5 +133,13 @@ int bman_acquire(struct bman_pool *pool, struct bm_buffer *bufs, u8 num);
  * failed to probe or 0 if the bman driver has not probed yet.
  */
 int bman_is_probed(void);
+/**
+ * bman_portals_probed - Check if all cpu bound bman portals are probed
+ *
+ * Returns 1 if all the required cpu bound bman portals successfully probed,
+ * -1 if probe errors appeared or 0 if the bman portals have not yet finished
+ * probing.
+ */
+int bman_portals_probed(void);
 
 #endif	/* __FSL_BMAN_H */
diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index 597783b8a3a0..7732e48081eb 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -1194,4 +1194,13 @@ int qman_release_cgrid(u32 id);
  */
 int qman_is_probed(void);
 
+/**
+ * qman_portals_probed - Check if all cpu bound qman portals are probed
+ *
+ * Returns 1 if all the required cpu bound qman portals successfully probed,
+ * -1 if probe errors appeared or 0 if the qman portals have not yet finished
+ * probing.
+ */
+int qman_portals_probed(void);
+
 #endif	/* __FSL_QMAN_H */
-- 
2.17.1



* [PATCH 09/21] fsl/fman: backup and restore ICID registers
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (7 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 08/21] soc/fsl/qbman_portals: add APIs to retrieve the probing status laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 10/21] fsl/fman: add API to get the device behind a fman port laurentiu.tudor
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

During probing, FMAN is reset thus losing all its register
settings. Backup port ICID registers before reset and restore
them after, similarly to how it's done on powerpc / PAMU based
platforms.
This also has the side effect of disabling the old code path
(liodn backup/restore handling), which makes no sense
in the context of SMMU on ARMs.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman.c | 35 +++++++++++++++++++++-
 drivers/net/ethernet/freescale/fman/fman.h |  4 +++
 2 files changed, 38 insertions(+), 1 deletion(-)
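
Note: the backup/restore pattern around a reset can be sketched in user
space as below. This is a toy model under stated assumptions, not the
FMan code: demo_dev, demo_idxes, demo_save_restore() and demo_reset()
are hypothetical names, and the index list is an arbitrary example
rather than the real port list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_REGS 64

/* Toy device: a register file plus a shadow copy kept by the driver. */
struct demo_dev {
	uint32_t regs[DEMO_NUM_REGS];
	uint32_t saved[DEMO_NUM_REGS];
};

/* Only these (hypothetical) register indexes hold ICID-like values. */
static const size_t demo_idxes[] = { 0x1, 0x2, 0x8, 0x28, 0x30 };

/* save == true copies registers to the shadow; false writes them back. */
void demo_save_restore(struct demo_dev *dev, bool save)
{
	for (size_t i = 0; i < sizeof(demo_idxes) / sizeof(demo_idxes[0]); i++) {
		size_t idx = demo_idxes[i];

		if (save)
			dev->saved[idx] = dev->regs[idx];
		else
			dev->regs[idx] = dev->saved[idx];
	}
}

/* A reset wipes every register, as a hardware reset would. */
void demo_reset(struct demo_dev *dev)
{
	for (size_t i = 0; i < DEMO_NUM_REGS; i++)
		dev->regs[i] = 0;
}
```

The expected call order is save, reset, restore; registers outside the
index list are not preserved and stay at their reset value.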

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index c415ac67cb7b..8f9136892d98 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -629,6 +629,7 @@ static void set_port_order_restoration(struct fman_fpm_regs __iomem *fpm_rg,
 	iowrite32be(tmp, &fpm_rg->fmfp_prc);
 }
 
+#ifdef CONFIG_PPC
 static void set_port_liodn(struct fman *fman, u8 port_id,
 			   u32 liodn_base, u32 liodn_ofst)
 {
@@ -646,6 +647,27 @@ static void set_port_liodn(struct fman *fman, u8 port_id,
 	iowrite32be(tmp, &fman->dma_regs->fmdmplr[port_id / 2]);
 	iowrite32be(liodn_ofst, &fman->bmi_regs->fmbm_spliodn[port_id - 1]);
 }
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+static void save_restore_port_icids(struct fman *fman, bool save)
+{
+	int port_idxes[] = {
+		0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc,
+		0xd, 0xe, 0xf, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
+		0x10, 0x11, 0x30, 0x31
+	};
+	int idx, i;
+
+	for (i = 0; i < ARRAY_SIZE(port_idxes); i++) {
+		idx = port_idxes[i];
+		if (save)
+			fman->sp_icids[idx] =
+				ioread32be(&fman->bmi_regs->fmbm_spliodn[idx]);
+		else
+			iowrite32be(fman->sp_icids[idx],
+				    &fman->bmi_regs->fmbm_spliodn[idx]);
+	}
+}
+#endif
 
 static void enable_rams_ecc(struct fman_fpm_regs __iomem *fpm_rg)
 {
@@ -1914,7 +1936,10 @@ static int fman_reset(struct fman *fman)
 static int fman_init(struct fman *fman)
 {
 	struct fman_cfg *cfg = NULL;
-	int err = 0, i, count;
+	int err = 0, count;
+#ifdef CONFIG_PPC
+	int i;
+#endif
 
 	if (is_init_done(fman->cfg))
 		return -EINVAL;
@@ -1934,6 +1959,7 @@ static int fman_init(struct fman *fman)
 	memset_io((void __iomem *)(fman->base_addr + CGP_OFFSET), 0,
 		  fman->state->fm_port_num_of_cg);
 
+#ifdef CONFIG_PPC
 	/* Save LIODN info before FMan reset
 	 * Skipping non-existent port 0 (i = 1)
 	 */
@@ -1953,6 +1979,9 @@ static int fman_init(struct fman *fman)
 		}
 		fman->liodn_base[i] = liodn_base;
 	}
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	save_restore_port_icids(fman, true);
+#endif
 
 	err = fman_reset(fman);
 	if (err)
@@ -2181,8 +2210,12 @@ int fman_set_port_params(struct fman *fman,
 	if (err)
 		goto return_err;
 
+#ifdef CONFIG_PPC
 	set_port_liodn(fman, port_id, fman->liodn_base[port_id],
 		       fman->liodn_offset[port_id]);
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	save_restore_port_icids(fman, false);
+#endif
 
 	if (fman->state->rev_info.major < 6)
 		set_port_order_restoration(fman->fpm_regs, port_id);
diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h
index 935c317fa696..19f20fa58053 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -346,8 +346,12 @@ struct fman {
 	unsigned long fifo_offset;
 	size_t fifo_size;
 
+#ifdef CONFIG_PPC
 	u32 liodn_base[64];
 	u32 liodn_offset[64];
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	u32 sp_icids[64];
+#endif
 
 	struct fman_dts_params dts_params;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
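
The backup/restore idea in patch 09 can be modelled in user space. The
sketch below is only an illustration under stated assumptions: the
struct fake_fman type, NUM_REGS and fake_reset() are hypothetical names
that do not exist in the driver, and a plain array stands in for the
memory-mapped fmbm_spliodn registers.

```c
/* Illustrative user-space model only; this is not the driver code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 64	/* hypothetical size, mirrors sp_icids[64] in the patch */

struct fake_fman {
	uint32_t regs[NUM_REGS];	/* stands in for the fmbm_spliodn registers */
	uint32_t sp_icids[NUM_REGS];	/* software shadow copy */
};

/* Copy the listed registers to the shadow (save) or back again (restore). */
static void save_restore_icids(struct fake_fman *f, const int *idx,
			       size_t n, int save)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(idx[i] >= 0 && idx[i] < NUM_REGS);
		if (save)
			f->sp_icids[idx[i]] = f->regs[idx[i]];
		else
			f->regs[idx[i]] = f->sp_icids[idx[i]];
	}
}

/* Model of a block reset: every register goes back to zero. */
static void fake_reset(struct fake_fman *f)
{
	memset(f->regs, 0, sizeof(f->regs));
}
```

Calling save_restore_icids() with save set, then fake_reset(), then
save_restore_icids() with save cleared brings the listed entries back
to their pre-reset values; entries that are not in the index list stay
at their reset value.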

* [PATCH 10/21] fsl/fman: add API to get the device behind a fman port
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (8 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 09/21] fsl/fman: backup and restore ICID registers laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 11/21] dpaa_eth: defer probing after qbman laurentiu.tudor
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Add an API that retrieves the 'struct device' that the specified fman
port probed against. The new API will be used in a subsequent iommu
enablement related patch.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman_port.c | 14 ++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_port.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index ee82ee1384eb..bd76c9730692 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1728,6 +1728,20 @@ u32 fman_port_get_qman_channel_id(struct fman_port *port)
 }
 EXPORT_SYMBOL(fman_port_get_qman_channel_id);
 
+/**
+ * fman_port_get_device
+ * @port:	Pointer to the FMan port device
+ *
+ * Get the 'struct device' associated with the specified FMan port device
+ *
+ * Return: pointer to associated 'struct device'
+ */
+struct device *fman_port_get_device(struct fman_port *port)
+{
+	return port->dev;
+}
+EXPORT_SYMBOL(fman_port_get_device);
+
 int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset)
 {
 	if (port->buffer_offsets.hash_result_offset == ILLEGAL_BASE)
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 9dbb69f40121..82f12661a46d 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -157,4 +157,6 @@ int fman_port_get_tstamp(struct fman_port *port, const void *data, u64 *tstamp);
 
 struct fman_port *fman_port_bind(struct device *dev);
 
+struct device *fman_port_get_device(struct fman_port *port);
+
 #endif /* __FMAN_PORT_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/21] dpaa_eth: defer probing after qbman
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (9 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 10/21] fsl/fman: add API to get the device behind a fman port laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 12/21] dpaa_eth: base dma mappings on the fman rx port laurentiu.tudor
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Enabling SMMU altered the order of device probing, causing the dpaa1
ethernet driver to be probed before qbman and resulting in a boot crash.
Make the probing order predictable by deferring the ethernet driver
probe until qbman and its portals have been probed, using the recently
introduced qbman APIs.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index a5131a510e8b..6ca3fdbef580 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2765,6 +2765,37 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	int err = 0, i, channel;
 	struct device *dev;
 
+	err = bman_is_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev, "failing probe due to bman probe error\n");
+		return -ENODEV;
+	}
+	err = qman_is_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev, "failing probe due to qman probe error\n");
+		return -ENODEV;
+	}
+	err = bman_portals_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev,
+			"failing probe due to bman portals probe error\n");
+		return -ENODEV;
+	}
+	err = qman_portals_probed();
+	if (!err)
+		return -EPROBE_DEFER;
+	if (err < 0) {
+		dev_err(&pdev->dev,
+			"failing probe due to qman portals probe error\n");
+		return -ENODEV;
+	}
+
 	/* device used for DMA mapping */
 	dev = pdev->dev.parent;
 	err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
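
The four checks added in patch 11 all follow one tri-state convention:
0 means "not probed yet", a negative value means "probe failed" and a
positive value means "ready". A minimal user-space sketch of that
mapping follows. The helper names are hypothetical, and
EPROBE_DEFER_MODEL is a local stand-in for the kernel's EPROBE_DEFER
(517), which is not available in user-space headers.

```c
/* Illustrative user-space model only; this is not the driver code. */
#include <assert.h>
#include <errno.h>

/* local copy of the kernel-internal EPROBE_DEFER value */
#define EPROBE_DEFER_MODEL 517

/* Translate one dependency state into a probe return code. */
static int dep_state_to_probe_result(int state)
{
	if (state == 0)
		return -EPROBE_DEFER_MODEL;	/* try again later */
	if (state < 0)
		return -ENODEV;			/* dependency failed for good */
	return 0;				/* dependency is ready */
}

/* Check the dependencies in order and stop at the first one not ready. */
static int check_deps(const int *states, int n)
{
	int i, ret;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		ret = dep_state_to_probe_result(states[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

The order matters: as in the patch, the first dependency that is not
ready decides the result, so an earlier failed dependency yields
-ENODEV even when a later one is merely not probed yet.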

* [PATCH 12/21] dpaa_eth: base dma mappings on the fman rx port
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (10 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 11/21] dpaa_eth: defer probing after qbman laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 13/21] dpaa_eth: fix iova handling for contiguous frames laurentiu.tudor
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

The initiator of the dma transactions is the rx fman port, so that's
the device the dma mappings should be done against. Previously the
mappings were done through the MAC device, which makes no sense because
it's neither dma-able nor connected in any way to the smmu.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6ca3fdbef580..ac9e50c8a556 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2796,8 +2796,15 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 		return -ENODEV;
 	}
 
+	mac_dev = dpaa_mac_dev_get(pdev);
+	if (IS_ERR(mac_dev)) {
+		dev_err(&pdev->dev, "dpaa_mac_dev_get() failed\n");
+		err = PTR_ERR(mac_dev);
+		goto probe_err;
+	}
+
 	/* device used for DMA mapping */
-	dev = pdev->dev.parent;
+	dev = fman_port_get_device(mac_dev->port[RX]);
 	err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
 	if (err) {
 		dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
@@ -2822,13 +2829,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 
 	priv->msg_enable = netif_msg_init(debug, DPAA_MSG_DEFAULT);
 
-	mac_dev = dpaa_mac_dev_get(pdev);
-	if (IS_ERR(mac_dev)) {
-		dev_err(dev, "dpaa_mac_dev_get() failed\n");
-		err = PTR_ERR(mac_dev);
-		goto free_netdev;
-	}
-
 	/* If fsl_fm_max_frm is set to a higher value than the all-common 1500,
 	 * we choose conservatively and let the user explicitly set a higher
 	 * MTU via ifconfig. Otherwise, the user may end up with different MTUs
@@ -2964,9 +2964,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	qman_release_cgrid(priv->cgr_data.cgr.cgrid);
 free_dpaa_bps:
 	dpaa_bps_free(priv);
-free_netdev:
 	dev_set_drvdata(dev, NULL);
 	free_netdev(net_dev);
+probe_err:
 
 	return err;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/21] dpaa_eth: fix iova handling for contiguous frames
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (11 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 12/21] dpaa_eth: base dma mappings on the fman rx port laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 14/21] dpaa_eth: fix iova handling for sg frames laurentiu.tudor
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

The driver relies on the no longer valid assumption that dma addresses
(iovas) are identical to physical addresses and uses phys_to_virt() to
make iova -> vaddr conversions. Fix this by adding a function that does
proper iova -> phys conversions using the iommu api and update the code
to use it.
Also, a dma_unmap_single() call had to be moved further down the code
because the iova -> vaddr conversion must happen before the unmap.
For now only the contiguous frame case is handled; the SG case is
handled in a following patch.
While at it, clean up a redundant dpaa_bpid2pool() call and pass the bp
as a parameter.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 44 ++++++++++---------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ac9e50c8a556..e9e081c3f8cc 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -50,6 +50,7 @@
 #include <linux/highmem.h>
 #include <linux/percpu.h>
 #include <linux/dma-mapping.h>
+#include <linux/iommu.h>
 #include <linux/sort.h>
 #include <soc/fsl/bman.h>
 #include <soc/fsl/qman.h>
@@ -1595,6 +1596,17 @@ static int dpaa_eth_refill_bpools(struct dpaa_priv *priv)
 	return 0;
 }
 
+static phys_addr_t dpaa_iova_to_phys(struct device *dev, dma_addr_t addr)
+{
+	struct iommu_domain *domain;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (domain)
+		return iommu_iova_to_phys(domain, addr);
+	else
+		return addr;
+}
+
 /* Cleanup function for outgoing frame descriptors that were built on Tx path,
  * either contiguous frames or scatter/gather ones.
  * Skb freeing is not handled here.
@@ -1617,7 +1629,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
 	int nr_frags, i;
 	u64 ns;
 
-	skbh = (struct sk_buff **)phys_to_virt(addr);
+	skbh = (struct sk_buff **)phys_to_virt(dpaa_iova_to_phys(dev, addr));
 	skb = *skbh;
 
 	if (priv->tx_tstamp && skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
@@ -1687,25 +1699,21 @@ static u8 rx_csum_offload(const struct dpaa_priv *priv, const struct qm_fd *fd)
  * accommodate the shared info area of the skb.
  */
 static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
-					const struct qm_fd *fd)
+					const struct qm_fd *fd,
+					struct dpaa_bp *dpaa_bp,
+					void *vaddr)
 {
 	ssize_t fd_off = qm_fd_get_offset(fd);
-	dma_addr_t addr = qm_fd_addr(fd);
-	struct dpaa_bp *dpaa_bp;
 	struct sk_buff *skb;
-	void *vaddr;
 
-	vaddr = phys_to_virt(addr);
 	WARN_ON(!IS_ALIGNED((unsigned long)vaddr, SMP_CACHE_BYTES));
 
-	dpaa_bp = dpaa_bpid2pool(fd->bpid);
-	if (!dpaa_bp)
-		goto free_buffer;
-
 	skb = build_skb(vaddr, dpaa_bp->size +
 			SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
-	if (WARN_ONCE(!skb, "Build skb failure on Rx\n"))
-		goto free_buffer;
+	if (WARN_ONCE(!skb, "Build skb failure on Rx\n")) {
+		skb_free_frag(vaddr);
+		return NULL;
+	}
 	WARN_ON(fd_off != priv->rx_headroom);
 	skb_reserve(skb, fd_off);
 	skb_put(skb, qm_fd_get_length(fd));
@@ -1713,10 +1721,6 @@ static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
 	skb->ip_summed = rx_csum_offload(priv, fd);
 
 	return skb;
-
-free_buffer:
-	skb_free_frag(vaddr);
-	return NULL;
 }
 
 /* Build an skb with the data of the first S/G entry in the linear portion and
@@ -2302,12 +2306,12 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	if (!dpaa_bp)
 		return qman_cb_dqrr_consume;
 
-	dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
-
 	/* prefetch the first 64 bytes of the frame or the SGT start */
-	vaddr = phys_to_virt(addr);
+	vaddr = phys_to_virt(dpaa_iova_to_phys(dpaa_bp->dev, addr));
 	prefetch(vaddr + qm_fd_get_offset(fd));
 
+	dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
+
 	/* The only FD types that we may receive are contig and S/G */
 	WARN_ON((fd_format != qm_fd_contig) && (fd_format != qm_fd_sg));
 
@@ -2318,7 +2322,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	(*count_ptr)--;
 
 	if (likely(fd_format == qm_fd_contig))
-		skb = contig_fd_to_skb(priv, fd);
+		skb = contig_fd_to_skb(priv, fd, dpaa_bp, vaddr);
 	else
 		skb = sg_fd_to_skb(priv, fd);
 	if (!skb)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
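
The reason phys_to_virt() can no longer be applied directly to a dma
address can be shown with a toy translation table. The sketch below is
a user-space model under stated assumptions: struct fake_domain, struct
fake_mapping and fake_iova_to_phys() are hypothetical stand-ins for an
iommu domain and its lookup, the page size is assumed to be 4 KiB, and
returning 0 for an unmapped iova is a choice made for this model only.

```c
/* Illustrative user-space model only; this is not the driver code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumed 4 KiB pages */
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT_MODEL) - 1)

struct fake_mapping {
	uint64_t iova_page;	/* iova >> PAGE_SHIFT_MODEL */
	uint64_t phys_page;	/* phys >> PAGE_SHIFT_MODEL */
};

struct fake_domain {
	const struct fake_mapping *maps;
	size_t n;
};

/*
 * Without a domain the device sees physical addresses, so the iova is
 * returned unchanged. With a domain, the page is looked up and the
 * offset within the page is preserved.
 */
static uint64_t fake_iova_to_phys(const struct fake_domain *d, uint64_t iova)
{
	size_t i;

	if (!d)
		return iova;

	for (i = 0; i < d->n; i++) {
		if (d->maps[i].iova_page == (iova >> PAGE_SHIFT_MODEL))
			return (d->maps[i].phys_page << PAGE_SHIFT_MODEL) |
			       (iova & PAGE_OFFSET_MASK);
	}
	return 0;	/* not mapped in this model */
}
```

With a domain attached, the iova and the physical address differ, which
is why the lookup has to come before any phys_to_virt() style
conversion, and before the mapping is torn down by the unmap.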

* [PATCH 14/21] dpaa_eth: fix iova handling for sg frames
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (12 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 13/21] dpaa_eth: fix iova handling for contiguous frames laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 15/21] dpaa_eth: fix SG frame cleanup laurentiu.tudor
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

The driver relies on the no longer valid assumption that dma addresses
(iovas) are identical to physical addresses and uses phys_to_virt() to
make iova -> vaddr conversions. Fix this also for scatter-gather frames
using the iova -> phys conversion function added in the previous patch.
While at it, clean up a redundant dpaa_bpid2pool() call and pass the bp
as a parameter.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 41 +++++++++++--------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index e9e081c3f8cc..8db861f281a0 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1646,14 +1646,17 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
 
 	if (unlikely(qm_fd_get_format(fd) == qm_fd_sg)) {
 		nr_frags = skb_shinfo(skb)->nr_frags;
-		dma_unmap_single(dev, addr,
-				 qm_fd_get_offset(fd) + DPAA_SGT_SIZE,
-				 dma_dir);
 
 		/* The sgt buffer has been allocated with netdev_alloc_frag(),
 		 * it's from lowmem.
 		 */
-		sgt = phys_to_virt(addr + qm_fd_get_offset(fd));
+		sgt = phys_to_virt(dpaa_iova_to_phys(dev,
+						     addr +
+						     qm_fd_get_offset(fd)));
+
+		dma_unmap_single(dev, addr,
+				 qm_fd_get_offset(fd) + DPAA_SGT_SIZE,
+				 dma_dir);
 
 		/* sgt[0] is from lowmem, was dma_map_single()-ed */
 		dma_unmap_single(dev, qm_sg_addr(&sgt[0]),
@@ -1668,7 +1671,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
 		}
 
 		/* Free the page frag that we allocated on Tx */
-		skb_free_frag(phys_to_virt(addr));
+		skb_free_frag(skbh);
 	} else {
 		dma_unmap_single(dev, addr,
 				 skb_tail_pointer(skb) - (u8 *)skbh, dma_dir);
@@ -1729,14 +1732,14 @@ static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
  * The page fragment holding the S/G Table is recycled here.
  */
 static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
-				    const struct qm_fd *fd)
+				    const struct qm_fd *fd,
+				    struct dpaa_bp *dpaa_bp,
+				    void *vaddr)
 {
 	ssize_t fd_off = qm_fd_get_offset(fd);
-	dma_addr_t addr = qm_fd_addr(fd);
 	const struct qm_sg_entry *sgt;
 	struct page *page, *head_page;
-	struct dpaa_bp *dpaa_bp;
-	void *vaddr, *sg_vaddr;
+	void *sg_vaddr;
 	int frag_off, frag_len;
 	struct sk_buff *skb;
 	dma_addr_t sg_addr;
@@ -1745,7 +1748,6 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
 	int *count_ptr;
 	int i;
 
-	vaddr = phys_to_virt(addr);
 	WARN_ON(!IS_ALIGNED((unsigned long)vaddr, SMP_CACHE_BYTES));
 
 	/* Iterate through the SGT entries and add data buffers to the skb */
@@ -1756,14 +1758,18 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
 		WARN_ON(qm_sg_entry_is_ext(&sgt[i]));
 
 		sg_addr = qm_sg_addr(&sgt[i]);
-		sg_vaddr = phys_to_virt(sg_addr);
-		WARN_ON(!IS_ALIGNED((unsigned long)sg_vaddr,
-				    SMP_CACHE_BYTES));
 
 		/* We may use multiple Rx pools */
 		dpaa_bp = dpaa_bpid2pool(sgt[i].bpid);
-		if (!dpaa_bp)
+		if (!dpaa_bp) {
+			pr_info("%s: fail to get dpaa_bp for sg bpid %d\n",
+				__func__, sgt[i].bpid);
 			goto free_buffers;
+		}
+		sg_vaddr = phys_to_virt(dpaa_iova_to_phys(dpaa_bp->dev,
+							  sg_addr));
+		WARN_ON(!IS_ALIGNED((unsigned long)sg_vaddr,
+				    SMP_CACHE_BYTES));
 
 		count_ptr = this_cpu_ptr(dpaa_bp->percpu_count);
 		dma_unmap_single(dpaa_bp->dev, sg_addr, dpaa_bp->size,
@@ -1835,10 +1841,11 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
 	/* free all the SG entries */
 	for (i = 0; i < DPAA_SGT_MAX_ENTRIES ; i++) {
 		sg_addr = qm_sg_addr(&sgt[i]);
-		sg_vaddr = phys_to_virt(sg_addr);
-		skb_free_frag(sg_vaddr);
 		dpaa_bp = dpaa_bpid2pool(sgt[i].bpid);
 		if (dpaa_bp) {
+			sg_addr = dpaa_iova_to_phys(dpaa_bp->dev, sg_addr);
+			sg_vaddr = phys_to_virt(sg_addr);
+			skb_free_frag(sg_vaddr);
 			count_ptr = this_cpu_ptr(dpaa_bp->percpu_count);
 			(*count_ptr)--;
 		}
@@ -2324,7 +2331,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	if (likely(fd_format == qm_fd_contig))
 		skb = contig_fd_to_skb(priv, fd, dpaa_bp, vaddr);
 	else
-		skb = sg_fd_to_skb(priv, fd);
+		skb = sg_fd_to_skb(priv, fd, dpaa_bp, vaddr);
 	if (!skb)
 		return qman_cb_dqrr_consume;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 15/21] dpaa_eth: fix SG frame cleanup
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (13 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 14/21] dpaa_eth: fix iova handling for sg frames laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 16/21] arm64: dts: ls1046a: add smmu node laurentiu.tudor
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

Fix issue with the entry indexing in the sg frame cleanup code being
off-by-1. This problem showed up when doing some basic iperf tests and
manifested in traffic coming to a halt.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 8db861f281a0..605f06f0def8 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1663,7 +1663,7 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct dpaa_priv *priv,
 				 qm_sg_entry_get_len(&sgt[0]), dma_dir);
 
 		/* remaining pages were mapped with skb_frag_dma_map() */
-		for (i = 1; i < nr_frags; i++) {
+		for (i = 1; i <= nr_frags; i++) {
 			WARN_ON(qm_sg_entry_is_ext(&sgt[i]));
 
 			dma_unmap_page(dev, qm_sg_addr(&sgt[i]),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
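
The off-by-1 fixed in patch 15 can be reproduced with a small model.
It assumes, as the comments in the diff suggest, that sgt[0] describes
the linear part of the skb and that sgt[1] to sgt[nr_frags] describe
the page fragments, so the table holds 1 + nr_frags entries. The name
cleanup_leftover() and the MAX_ENTRIES bound are hypothetical.

```c
/* Illustrative user-space model only; this is not the driver code. */
#include <assert.h>

#define MAX_ENTRIES 16	/* hypothetical table size for this model */

/*
 * Map 1 + nr_frags entries, run the cleanup loop with either the old
 * bound (i < nr_frags) or the fixed one (i <= nr_frags), and return
 * how many entries are still mapped afterwards.
 */
static int cleanup_leftover(int nr_frags, int inclusive)
{
	int mapped[MAX_ENTRIES] = { 0 };
	int i, last, leftover = 0;

	assert(nr_frags >= 0 && nr_frags + 1 <= MAX_ENTRIES);

	/* Tx side of the model: linear part plus every fragment */
	for (i = 0; i <= nr_frags; i++)
		mapped[i] = 1;

	/* cleanup: entry 0 first, then the fragment entries */
	mapped[0] = 0;
	last = inclusive ? nr_frags : nr_frags - 1;
	for (i = 1; i <= last; i++)
		mapped[i] = 0;

	for (i = 0; i <= nr_frags; i++)
		leftover += mapped[i];
	return leftover;
}
```

With the old bound one fragment entry stays mapped for every
scatter-gather frame that has at least one fragment; with the fixed
bound nothing is left over.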

* [PATCH 16/21] arm64: dts: ls1046a: add smmu node
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (14 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 15/21] dpaa_eth: fix SG frame cleanup laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 13:30   ` Robin Murphy
  2018-09-19 12:36 ` [PATCH 17/21] arm64: dts: ls1043a: " laurentiu.tudor
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

This allows for the SMMU device to be probed by the SMMU kernel driver.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index ef83786b8b90..06863d3e4a7d 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -228,6 +228,48 @@
 			bus-width = <4>;
 		};
 
+		mmu: iommu@9000000 {
+			compatible = "arm,mmu-500";
+			reg = <0 0x9000000 0 0x400000>;
+			dma-coherent;
+			#global-interrupts = <2>;
+			#iommu-cells = <1>;
+			interrupts = <0 142 4>, /* global secure fault */
+				     <0 143 4>, /* combined secure interrupt */
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>;
+		};
+
 		scfg: scfg@1570000 {
 			compatible = "fsl,ls1046a-scfg", "syscon";
 			reg = <0x0 0x1570000 0x0 0x10000>;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 17/21] arm64: dts: ls1043a: add smmu node
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (15 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 16/21] arm64: dts: ls1046a: add smmu node laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID laurentiu.tudor
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

This allows for the SMMU device to be probed by the SMMU kernel driver.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 7881e3d81a9a..8b3eba167508 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -222,6 +222,48 @@
 			clocks = <&sysclk>;
 		};
 
+		mmu: iommu@9000000 {
+			compatible = "arm,mmu-500";
+			reg = <0 0x9000000 0 0x400000>;
+			dma-coherent;
+			#global-interrupts = <2>;
+			#iommu-cells = <1>;
+			interrupts = <0 142 4>, /* global secure fault */
+				     <0 143 4>, /* combined secure interrupt */
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>,
+				     <0 142 4>;
+		};
+
 		scfg: scfg@1570000 {
 			compatible = "fsl,ls1043a-scfg", "syscon";
 			reg = <0x0 0x1570000 0x0 0x10000>;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (16 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 17/21] arm64: dts: ls1043a: " laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 13:41   ` Robin Murphy
  2018-09-19 12:36 ` [PATCH 19/21] arm64: dts: ls104x: add missing dma ranges property laurentiu.tudor
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

The StreamID entering the SMMU is actually a concatenation of the
SMMU TBU ID and the ICID configured in software.
Since the TBU ID is internal to the SoC, and since we want the ICID
configured in software to enter the SMMU without any additional set
bits, mask out the TBU ID bits and leave only the relevant ICID bits
to enter the SMMU.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 8b3eba167508..90296b9fb171 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -226,6 +226,7 @@
 			compatible = "arm,mmu-500";
 			reg = <0 0x9000000 0 0x400000>;
 			dma-coherent;
+			stream-match-mask = <0x7f00>;
 			#global-interrupts = <2>;
 			#iommu-cells = <1>;
 			interrupts = <0 142 4>, /* global secure fault */
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 06863d3e4a7d..15094dd8400e 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -232,6 +232,7 @@
 			compatible = "arm,mmu-500";
 			reg = <0 0x9000000 0 0x400000>;
 			dma-coherent;
+			stream-match-mask = <0x7f00>;
 			#global-interrupts = <2>;
 			#iommu-cells = <1>;
 			interrupts = <0 142 4>, /* global secure fault */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
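
The effect of the 0x7f00 mask in patch 18 can be sketched as a
stream-match comparison in which set mask bits are ignored. The bit
layout used here (ICID in bits 0-7, TBU ID in bits 8-14 of a 15-bit
StreamID) is an assumption inferred from the mask value, and
smr_matches() is a hypothetical helper, not a function of the SMMU
driver.

```c
/* Illustrative user-space model only; this is not the driver code. */
#include <assert.h>
#include <stdint.h>

#define SID_BITS_MASK 0x7fffu	/* assumed 15-bit StreamID */

/*
 * Return 1 when the incoming StreamID matches the programmed id.
 * Bits that are set in mask are excluded from the comparison.
 */
static int smr_matches(uint16_t incoming, uint16_t id, uint16_t mask)
{
	uint32_t diff = (uint32_t)incoming ^ (uint32_t)id;

	assert((id & ~SID_BITS_MASK) == 0);
	return (diff & ~(uint32_t)mask & SID_BITS_MASK) == 0;
}
```

With the mask set to 0x7f00, an incoming StreamID of 0x0342 (TBU ID 3,
ICID 0x42 under the assumed layout) matches an entry programmed with
the bare ICID 0x42, whereas with a mask of 0 it does not.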

* [PATCH 19/21] arm64: dts: ls104x: add missing dma ranges property
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (17 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 20/21] arm64: dts: ls104x: add iommu-map to pci controllers laurentiu.tudor
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

These chips have a 48-bit address size, so make sure that the dma-ranges
property reflects this. Otherwise the linux kernel's dma sub-system will
set the default dma masks to full 64-bit, badly breaking dma.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 90296b9fb171..48091409c472 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -214,6 +214,7 @@
 		#address-cells = <2>;
 		#size-cells = <2>;
 		ranges;
+		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
 
 		clockgen: clocking@1ee1000 {
 			compatible = "fsl,ls1043a-clockgen";
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 15094dd8400e..40484f6f6d42 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -187,6 +187,7 @@
 		#address-cells = <2>;
 		#size-cells = <2>;
 		ranges;
+		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
 
 		ddr: memory-controller@1080000 {
 			compatible = "fsl,qoriq-memory-controller";
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread
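
The dma-ranges value in patch 19 encodes a 2^48 byte window. With two
address cells and two size cells, the six cells read as child address
(0x0 0x0), parent address (0x0 0x0) and size (0x10000 0x00000000). The
arithmetic can be checked with the short sketch below; the helper
names are hypothetical.

```c
/* Illustrative arithmetic check only; this is not kernel code. */
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit devicetree cells into one 64-bit value. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Number of address bits covered by a power-of-two sized window. */
static int size_to_bits(uint64_t size)
{
	int bits = 0;

	assert(size != 0 && (size & (size - 1)) == 0);
	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}
```

Here cells_to_u64(0x10000, 0x00000000) gives 0x1000000000000, which
size_to_bits() reports as 48 bits, matching the 48-bit address size
mentioned in the commit message.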

* [PATCH 20/21] arm64: dts: ls104x: add iommu-map to pci controllers
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (18 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 19/21] arm64: dts: ls104x: add missing dma ranges property laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 12:36 ` [PATCH 21/21] arm64: dts: ls104x: make dma-coherent global to the SoC laurentiu.tudor
  2018-09-19 13:25 ` [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A Robin Murphy
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

The pci controllers are also behind the smmu so add the iommu-map
property to reflect this. The bootloader needs to patch the stream id
ranges to some sane values.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 3 +++
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 48091409c472..3b7b2e60bd9a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -716,6 +716,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <4>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x40 0x00010000 0x0 0x00010000   /* downstream I/O */
@@ -741,6 +742,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x48 0x00010000 0x0 0x00010000   /* downstream I/O */
@@ -766,6 +768,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x50 0x00010000 0x0 0x00010000   /* downstream I/O */
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 40484f6f6d42..890d1565791f 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -685,6 +685,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <4>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x40 0x00010000 0x0 0x00010000   /* downstream I/O */
@@ -710,6 +711,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x48 0x00010000 0x0 0x00010000   /* downstream I/O */
@@ -735,6 +737,7 @@
 			#size-cells = <2>;
 			device_type = "pci";
 			dma-coherent;
+			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
 			ranges = <0x81000000 0x0 0x00000000 0x50 0x00010000 0x0 0x00010000   /* downstream I/O */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 21/21] arm64: dts: ls104x: make dma-coherent global to the SoC
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (19 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 20/21] arm64: dts: ls104x: add iommu-map to pci controllers laurentiu.tudor
@ 2018-09-19 12:36 ` laurentiu.tudor
  2018-09-19 13:25 ` [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A Robin Murphy
  21 siblings, 0 replies; 34+ messages in thread
From: laurentiu.tudor @ 2018-09-19 12:36 UTC (permalink / raw)
  To: devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: roy.pledge, madalin.bucur, davem, shawnguo, leoyang.li, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

These SoCs are fully DMA coherent, so add the dma-coherent property at
the soc level in the device tree and drop the instances where it is
added to individual devices.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 5 +----
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 3b7b2e60bd9a..d02106cb2116 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -215,6 +215,7 @@
 		#size-cells = <2>;
 		ranges;
 		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
+		dma-coherent;
 
 		clockgen: clocking@1ee1000 {
 			compatible = "fsl,ls1043a-clockgen";
@@ -680,7 +681,6 @@
 			reg-names = "ahci", "sata-ecc";
 			interrupts = <0 69 0x4>;
 			clocks = <&clockgen 4 0>;
-			dma-coherent;
 		};
 
 		msi1: msi-controller1@1571000 {
@@ -715,7 +715,6 @@
 			#address-cells = <3>;
 			#size-cells = <2>;
 			device_type = "pci";
-			dma-coherent;
 			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <4>;
 			bus-range = <0x0 0xff>;
@@ -741,7 +740,6 @@
 			#address-cells = <3>;
 			#size-cells = <2>;
 			device_type = "pci";
-			dma-coherent;
 			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
@@ -767,7 +765,6 @@
 			#address-cells = <3>;
 			#size-cells = <2>;
 			device_type = "pci";
-			dma-coherent;
 			iommu-map = <0 &mmu 0 1>;
 			num-lanes = <2>;
 			bus-range = <0x0 0xff>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 890d1565791f..3bdea0470f69 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -188,6 +188,7 @@
 		#size-cells = <2>;
 		ranges;
 		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
+		dma-coherent;
 
 		ddr: memory-controller@1080000 {
 			compatible = "fsl,qoriq-memory-controller";
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
                   ` (20 preceding siblings ...)
  2018-09-19 12:36 ` [PATCH 21/21] arm64: dts: ls104x: make dma-coherent global to the SoC laurentiu.tudor
@ 2018-09-19 13:25 ` Robin Murphy
  2018-09-19 14:18   ` Laurentiu Tudor
  21 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2018-09-19 13:25 UTC (permalink / raw)
  To: laurentiu.tudor, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: madalin.bucur, roy.pledge, leoyang.li, shawnguo, davem

Hi Laurentiu,

On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
> and consists mostly of important driver fixes and the required device
> tree updates. It touches several subsystems and consists of three main
> parts:
>   - changes in drivers/soc/fsl/qbman drivers adding iommu mapping of
>     reserved memory areas, fixes and deferred probe support
>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>     consisting of misc dma mapping related fixes and probe ordering
>   - addition of the actual arm smmu device tree node together with
>     various adjustments to the device trees
> 
> Performance impact
> 
>      Running iperf benchmarks in a back-to-back setup (both sides
>      having smmu enabled) on a 10Gbps port shows a significant
>      networking performance degradation of around 40% (9.48Gbps
>      linerate vs 5.45Gbps). If you need the performance and can do
>      without SMMU support you can use "iommu.passthrough=1" to bypass
>      the SMMU.
> 
> USB issue and workaround
> 
>      There's a problem with the usb controllers in these chips
>      generating smaller, 40-bit wide dma addresses instead of the 48-bit
>      supported at the smmu input. So you end up in a situation where the
>      smmu is mapped with 48-bit address translations, but the device
>      generates transactions with clipped 40-bit addresses, thus smmu
>      context faults are triggered. I encountered a similar situation for
>      mmc that I managed to fix in software [1]; however, for USB I did not
>      find a proper place in the code to add a similar fix. The only
>      workaround I found was to add this kernel parameter which limits the
>      usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>      This workaround is far from ideal, so any suggestions for a code
>      based workaround in this area would be greatly appreciated.

If you have a nominally-64-bit device with a 
narrower-than-the-main-interconnect link in front of it, that should 
already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges, 
provided the interconnect hierarchy can be described appropriately (or 
at least massaged sufficiently to satisfy the binding), e.g.:

/ {
	...

	soc {
		ranges;
		dma-ranges = <0 0 10000 0>;

		dev_48bit { ... };

		periph_bus {
			ranges;
			dma-ranges = <0 0 100 0>;

			dev_40bit { ... };
		};
	};
};

and if that fails to work as expected (except for PCI hosts where 
handling dma-ranges properly still needs sorting out), please do let us 
know ;)

Robin.
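
For reference, the relationship between a dma-ranges size and the resulting address width can be checked with a little arithmetic. The following is a minimal sketch, assuming the size is given as two 32-bit cells (high, low) as in the 6-cell property from patch 19, and reading the shorthand sizes in the example above as hexadecimal. The helper name dma_window_bits is hypothetical and not part of any kernel or devicetree API.

```python
def dma_window_bits(size_hi: int, size_lo: int) -> int:
    """Return the address width implied by a dma-ranges size.

    The size is passed as two 32-bit cells (high, low), matching
    #size-cells = <2>. Only power-of-two sizes are handled here.
    """
    size = (size_hi << 32) | size_lo
    if size == 0 or size & (size - 1):
        raise ValueError("size is not a power of two")
    return size.bit_length() - 1


# Size cells from patch 19: <... 0x10000 0x00000000>
print(dma_window_bits(0x10000, 0x0))
# Size cells for a narrower peripheral bus: <... 0x100 0x00000000>
print(dma_window_bits(0x100, 0x0))
```

With these inputs the soc-level window comes out as 48 bits and the narrower bus as 40 bits, which matches the dev_48bit and dev_40bit labels in the example.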

> The patch set is based on net-next so, if generally agreed, I'd suggest
> to get the patches through the netdev tree after getting all the Acks.
> 
> [1] https://patchwork.kernel.org/patch/10506627/
> 
> Laurentiu Tudor (21):
>    soc/fsl/qman: fixup liodns only on ppc targets
>    soc/fsl/bman: map FBPR area in the iommu
>    soc/fsl/qman: map FQD and PFDR areas in the iommu
>    soc/fsl/qman-portal: map CENA area in the iommu
>    soc/fsl/qbman: add APIs to retrieve the probing status
>    soc/fsl/qman_portals: defer probe after qman's probe
>    soc/fsl/bman_portals: defer probe after bman's probe
>    soc/fsl/qbman_portals: add APIs to retrieve the probing status
>    fsl/fman: backup and restore ICID registers
>    fsl/fman: add API to get the device behind a fman port
>    dpaa_eth: defer probing after qbman
>    dpaa_eth: base dma mappings on the fman rx port
>    dpaa_eth: fix iova handling for contiguous frames
>    dpaa_eth: fix iova handling for sg frames
>    dpaa_eth: fix SG frame cleanup
>    arm64: dts: ls1046a: add smmu node
>    arm64: dts: ls1043a: add smmu node
>    arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
>    arm64: dts: ls104x: add missing dma ranges property
>    arm64: dts: ls104x: add iommu-map to pci controllers
>    arm64: dts: ls104x: make dma-coherent global to the SoC
> 
>   .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
>   .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
>   drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
>   drivers/net/ethernet/freescale/fman/fman.h    |   4 +
>   .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
>   .../net/ethernet/freescale/fman/fman_port.h   |   2 +
>   drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
>   drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
>   drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
>   drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
>   include/soc/fsl/bman.h                        |  16 +++
>   include/soc/fsl/qman.h                        |  17 +++
>   13 files changed, 379 insertions(+), 53 deletions(-)
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 16/21] arm64: dts: ls1046a: add smmu node
  2018-09-19 12:36 ` [PATCH 16/21] arm64: dts: ls1046a: add smmu node laurentiu.tudor
@ 2018-09-19 13:30   ` Robin Murphy
  2018-09-19 13:51     ` Laurentiu Tudor
  0 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2018-09-19 13:30 UTC (permalink / raw)
  To: laurentiu.tudor, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: madalin.bucur, roy.pledge, leoyang.li, shawnguo, davem

On 19/09/18 13:36, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> This allows for the SMMU device to be probed by the SMMU kernel driver.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> ---
>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
>   1 file changed, 42 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index ef83786b8b90..06863d3e4a7d 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -228,6 +228,48 @@
>   			bus-width = <4>;
>   		};
>   
> +		mmu: iommu@9000000 {
> +			compatible = "arm,mmu-500";
> +			reg = <0 0x9000000 0 0x400000>;
> +			dma-coherent;
> +			#global-interrupts = <2>;
> +			#iommu-cells = <1>;
> +			interrupts = <0 142 4>, /* global secure fault */

Either that's not really the secure global interrupt, or those context 
interrupts are wrong.

Robin.

> +				     <0 143 4>, /* combined secure interrupt */
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>,
> +				     <0 142 4>;
> +		};
> +
>   		scfg: scfg@1570000 {
>   			compatible = "fsl,ls1046a-scfg", "syscon";
>   			reg = <0x0 0x1570000 0x0 0x10000>;
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
  2018-09-19 12:36 ` [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID laurentiu.tudor
@ 2018-09-19 13:41   ` Robin Murphy
  2018-09-19 14:06     ` Laurentiu Tudor
  0 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2018-09-19 13:41 UTC (permalink / raw)
  To: laurentiu.tudor, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: madalin.bucur, roy.pledge, leoyang.li, shawnguo, davem

On 19/09/18 13:36, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> The StreamID entering the SMMU is actually a concatenation of the
> SMMU TBU ID and the ICID configured in software.
> Since the TBU ID is internal to the SoC and since we want the
> ICID configured in software to enter the SMMU without any
> additional set bits, mask out the TBU ID bits and leave only the
> relevant ICID bits to enter the SMMU.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> ---
>   arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
>   arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> index 8b3eba167508..90296b9fb171 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
> @@ -226,6 +226,7 @@
>   			compatible = "arm,mmu-500";
>   			reg = <0 0x9000000 0 0x400000>;
>   			dma-coherent;
> +			stream-match-mask = <0x7f00>;

The TBU ID only forms the top 5 bits, so also ignoring bits 9:8 raises 
an eyebrow - if the LS104x SMMU really is configured for 8-bit SID input 
then it's harmless, but if it's actually a 9 or 10-bit configuration 
then you probably want to avoid masking them (or at least document why) 
- IIRC there *was* stuff wired there on LS2085 at least.

Robin.

>   			#global-interrupts = <2>;
>   			#iommu-cells = <1>;
>   			interrupts = <0 142 4>, /* global secure fault */
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 06863d3e4a7d..15094dd8400e 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -232,6 +232,7 @@
>   			compatible = "arm,mmu-500";
>   			reg = <0 0x9000000 0 0x400000>;
>   			dma-coherent;
> +			stream-match-mask = <0x7f00>;
>   			#global-interrupts = <2>;
>   			#iommu-cells = <1>;
>   			interrupts = <0 142 4>, /* global secure fault */
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 16/21] arm64: dts: ls1046a: add smmu node
  2018-09-19 13:30   ` Robin Murphy
@ 2018-09-19 13:51     ` Laurentiu Tudor
  0 siblings, 0 replies; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-19 13:51 UTC (permalink / raw)
  To: Robin Murphy, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Madalin-cristian Bucur, Roy Pledge, Leo Li, shawnguo, davem

Hi Robin,

On 19.09.2018 16:30, Robin Murphy wrote:
> On 19/09/18 13:36, laurentiu.tudor@nxp.com wrote:
>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>
>> This allows for the SMMU device to be probed by the SMMU kernel driver.
>>
>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>> ---
>>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 42 +++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
>> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> index ef83786b8b90..06863d3e4a7d 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> @@ -228,6 +228,48 @@
>>               bus-width = <4>;
>>           };
>> +        mmu: iommu@9000000 {
>> +            compatible = "arm,mmu-500";
>> +            reg = <0 0x9000000 0 0x400000>;
>> +            dma-coherent;
>> +            #global-interrupts = <2>;
>> +            #iommu-cells = <1>;
>> +            interrupts = <0 142 4>, /* global secure fault */
> 
> Either that's not really the secure global interrupt, or those context 
> interrupts are wrong.

Now that you point it out, I realize that the comments don't make much 
sense. Actually, 142 is the non-secure interrupt (all ints are ORed on 
this IRQ) while 143 is the secure version. I'll update the comments in 
the next re-spin.

---
Thanks & Best Regards, Laurentiu


> 
>> +                     <0 143 4>, /* combined secure interrupt */
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>,
>> +                     <0 142 4>;
>> +        };
>> +
>>           scfg: scfg@1570000 {
>>               compatible = "fsl,ls1046a-scfg", "syscon";
>>               reg = <0x0 0x1570000 0x0 0x10000>;
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
  2018-09-19 13:41   ` Robin Murphy
@ 2018-09-19 14:06     ` Laurentiu Tudor
  0 siblings, 0 replies; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-19 14:06 UTC (permalink / raw)
  To: Robin Murphy, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Madalin-cristian Bucur, Roy Pledge, Leo Li, shawnguo, davem

Hi Robin,

On 19.09.2018 16:41, Robin Murphy wrote:
> On 19/09/18 13:36, laurentiu.tudor@nxp.com wrote:
>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>
>> The StreamID entering the SMMU is actually a concatenation of the
>> SMMU TBU ID and the ICID configured in software.
>> Since the TBU ID is internal to the SoC and since we want the
>> ICID configured in software to enter the SMMU without any
>> additional set bits, mask out the TBU ID bits and leave only the
>> relevant ICID bits to enter the SMMU.
>>
>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>> ---
>>   arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
>>   arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi 
>> b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> index 8b3eba167508..90296b9fb171 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>> @@ -226,6 +226,7 @@
>>               compatible = "arm,mmu-500";
>>               reg = <0 0x9000000 0 0x400000>;
>>               dma-coherent;
>> +            stream-match-mask = <0x7f00>;
> 
> The TBU ID only forms the top 5 bits, so also ignoring bits 9:8 raises 
> an eyebrow - if the LS104x SMMU really is configured for 8-bit SID input 
> then it's harmless, 

On these lower-end platforms the SID input is configured and documented 
as 8-bit.

> but if it's actually a 9 or 10-bit configuration 
> then you probably want to avoid masking them (or at least document why) 
> - IIRC there *was* stuff wired there on LS2085 at least.

Yes, on LS2s there are 2 extra bits in there carrying some signaling. 
However, on LS1s they are not present.
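
To make the bit layout under discussion concrete, here is a minimal sketch of how the 0x7f00 mask splits up. It assumes a 15-bit StreamID with the TBU ID in the top 5 bits (bits 14:10), as described above, and relies on stream-match-mask meaning "bits to ignore when matching". The constant and function names are hypothetical and chosen only for illustration.

```python
STREAM_ID_WIDTH_MASK = 0x7fff  # assumption: 15-bit StreamID
STREAM_MATCH_MASK = 0x7f00     # value from the patch; set bits are ignored
TBU_ID_MASK = 0x7c00           # assumption: TBU ID in bits 14:10
EXTRA_BITS_MASK = 0x0300       # bits 9:8, unused with an 8-bit SID input


def matched_bits(stream_id: int, mask: int = STREAM_MATCH_MASK) -> int:
    """Return the part of a StreamID that still takes part in matching."""
    return stream_id & ~mask & STREAM_ID_WIDTH_MASK


def same_stream(a: int, b: int, mask: int = STREAM_MATCH_MASK) -> bool:
    """Two StreamIDs match if they agree on all non-ignored bits."""
    return matched_bits(a, mask) == matched_bits(b, mask)


# The mask covers the TBU ID bits plus bits 9:8.
print(hex(TBU_ID_MASK | EXTRA_BITS_MASK))
# The same ICID (0x23) arriving through two different TBUs.
print(same_stream(0x0423, 0x7c23))
```

With an 8-bit SID input, only bits 7:0 carry the ICID, so ignoring bits 9:8 as well does not change which streams match.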

---
Thanks & Best Regards, Laurentiu

> 
>>               #global-interrupts = <2>;
>>               #iommu-cells = <1>;
>>               interrupts = <0 142 4>, /* global secure fault */
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
>> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> index 06863d3e4a7d..15094dd8400e 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> @@ -232,6 +232,7 @@
>>               compatible = "arm,mmu-500";
>>               reg = <0 0x9000000 0 0x400000>;
>>               dma-coherent;
>> +            stream-match-mask = <0x7f00>;
>>               #global-interrupts = <2>;
>>               #iommu-cells = <1>;
>>               interrupts = <0 142 4>, /* global secure fault */
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-19 13:25 ` [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A Robin Murphy
@ 2018-09-19 14:18   ` Laurentiu Tudor
  2018-09-19 14:37     ` Robin Murphy
  0 siblings, 1 reply; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-19 14:18 UTC (permalink / raw)
  To: Robin Murphy, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Madalin-cristian Bucur, Roy Pledge, Leo Li, shawnguo, davem

Hi Robin,

On 19.09.2018 16:25, Robin Murphy wrote:
> Hi Laurentiu,
> 
> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>
>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>> and consists mostly of important driver fixes and the required device
>> tree updates. It touches several subsystems and consists of three main
>> parts:
>>   - changes in drivers/soc/fsl/qbman drivers adding iommu mapping of
>>     reserved memory areas, fixes and deferred probe support
>>   - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>     consisting of misc dma mapping related fixes and probe ordering
>>   - addition of the actual arm smmu device tree node together with
>>     various adjustments to the device trees
>>
>> Performance impact
>>
>>      Running iperf benchmarks in a back-to-back setup (both sides
>>      having smmu enabled) on a 10Gbps port shows a significant
>>      networking performance degradation of around 40% (9.48Gbps
>>      linerate vs 5.45Gbps). If you need the performance and can do
>>      without SMMU support you can use "iommu.passthrough=1" to bypass
>>      the SMMU.
>>
>> USB issue and workaround
>>
>>      There's a problem with the usb controllers in these chips
>>      generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>      supported at the smmu input. So you end up in a situation where the
>>      smmu is mapped with 48-bit address translations, but the device
>>      generates transactions with clipped 40-bit addresses, thus smmu
>>      context faults are triggered. I encountered a similar situation for
>>      mmc that I managed to fix in software [1]; however, for USB I did not
>>      find a proper place in the code to add a similar fix. The only
>>      workaround I found was to add this kernel parameter which limits the
>>      usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>      This workaround is far from ideal, so any suggestions for a code
>>      based workaround in this area would be greatly appreciated.
> 
> If you have a nominally-64-bit device with a 
> narrower-than-the-main-interconnect link in front of it, that should 
> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges, 
> provided the interconnect hierarchy can be described appropriately (or 
> at least massaged sufficiently to satisfy the binding), e.g.:
> 
> / {
>      ...
> 
>      soc {
>          ranges;
>          dma-ranges = <0 0 10000 0>;
> 
>          dev_48bit { ... };
> 
>          periph_bus {
>              ranges;
>              dma-ranges = <0 0 100 0>;
> 
>              dev_40bit { ... };
>          };
>      };
> };
> 
> and if that fails to work as expected (except for PCI hosts where 
> handling dma-ranges properly still needs sorting out), please do let us 
> know ;)
> 

Just to confirm, is this [1] the change I was supposed to test?
Because if so, I'm still seeing context faults [2] with what look like 
addresses clipped to 40 bits. :-(
IIRC, the usb subsystem explicitly sets 64-bit dma masks which in turn 
will be limited to the SMMU input size of 48-bit. Won't that overwrite 
the default dma mask derived from dma-ranges?

---
Best Regards, Laurentiu

[1] -----------------------------------------------------------------

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 3bdea0470f69..a214c3df37fd 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -612,6 +612,7 @@
                         compatible = "snps,dwc3";
                         reg = <0x0 0x2f00000 0x0 0x10000>;
                         interrupts = <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>;
+                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
                         dr_mode = "host";
                         snps,quirk-frame-length-adjustment = <0x20>;
                         snps,dis_rxdet_inp3_quirk;
@@ -621,6 +622,7 @@
                         compatible = "snps,dwc3";
                         reg = <0x0 0x3000000 0x0 0x10000>;
                         interrupts = <GIC_SPI 61 IRQ_TYPE_LEVEL_HIGH>;
+                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
                         dr_mode = "host";
                         snps,quirk-frame-length-adjustment = <0x20>;
                         snps,dis_rxdet_inp3_quirk;
@@ -630,6 +632,7 @@
                         compatible = "snps,dwc3";
                         reg = <0x0 0x3100000 0x0 0x10000>;
                         interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_HIGH>;
+                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
                         dr_mode = "host";
                         snps,quirk-frame-length-adjustment = <0x20>;
                         snps,dis_rxdet_inp3_quirk;

[2] -----------------------------------------------------------------
[    2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[    2.103720] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0  SuperSpeed
[    2.110346] arm-smmu 9000000.iommu: Unhandled context fault: fsr=0x402, iova=0xffffffb000, fsynr=0x1b0000, cb=3
[    2.120449] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    2.128717] hub 2-0:1.0: USB hub found
[    2.132473] hub 2-0:1.0: 1 port detected
[    2.136527] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[    2.142014] xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 3
[    2.149747] xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000002010010
[    2.159149] xhci-hcd xhci-hcd.1.auto: irq 50, io mem 0x03000000
[    2.165284] hub 3-0:1.0: USB hub found
[    2.169039] hub 3-0:1.0: 1 port detected
[    2.173051] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[    2.178536] xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 4
[    2.186193] xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0  SuperSpeed
[    2.192809] arm-smmu 9000000.iommu: Unhandled context fault: fsr=0x402, iova=0xffffffb000, fsynr=0x1f0000, cb=4
[    2.192822] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
[    2.211141] hub 4-0:1.0: USB hub found
[    2.214896] hub 4-0:1.0: 1 port detected
[    2.218935] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
[    2.224425] xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 5
[    2.232153] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000002010010
[    2.241562] xhci-hcd xhci-hcd.2.auto: irq 51, io mem 0x03100000
[    2.247694] hub 5-0:1.0: USB hub found
[    2.251449] hub 5-0:1.0: 1 port detected
[    2.255458] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
[    2.260945] xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 6
[    2.268601] xhci-hcd xhci-hcd.2.auto: Host supports USB 3.0  SuperSpeed
[    2.275218] arm-smmu 9000000.iommu: Unhandled context fault: fsr=0x402, iova=0xffffffb000, fsynr=0x110000, cb=5
[    2.275230] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.
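
The faulting iova in the log above is consistent with a 48-bit address losing its top eight bits. A minimal sketch of that arithmetic follows, assuming the IOVA handed to the controller was 0xffffffffb000. That value is hypothetical: the log only shows the clipped address, and the helper name clip_to_width is made up for illustration.

```python
def clip_to_width(addr: int, bits: int) -> int:
    """Keep only the low `bits` bits of an address, as a narrower bus would."""
    return addr & ((1 << bits) - 1)


# Hypothetical IOVA near the top of the 48-bit SMMU input space.
assumed_iova = 0xffff_ffff_b000

# What a 40-bit master would actually put on the bus.
print(hex(clip_to_width(assumed_iova, 40)))
```

Under that assumption the clipped value equals the iova=0xffffffb000 reported in each context fault.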


>> The patch set is based on net-next so, if generally agreed, I'd suggest
>> to get the patches through the netdev tree after getting all the Acks.
>>
>> [1] https://patchwork.kernel.org/patch/10506627/
>>
>>
>> Laurentiu Tudor (21):
>>    soc/fsl/qman: fixup liodns only on ppc targets
>>    soc/fsl/bman: map FBPR area in the iommu
>>    soc/fsl/qman: map FQD and PFDR areas in the iommu
>>    soc/fsl/qman-portal: map CENA area in the iommu
>>    soc/fsl/qbman: add APIs to retrieve the probing status
>>    soc/fsl/qman_portals: defer probe after qman's probe
>>    soc/fsl/bman_portals: defer probe after bman's probe
>>    soc/fsl/qbman_portals: add APIs to retrieve the probing status
>>    fsl/fman: backup and restore ICID registers
>>    fsl/fman: add API to get the device behind a fman port
>>    dpaa_eth: defer probing after qbman
>>    dpaa_eth: base dma mappings on the fman rx port
>>    dpaa_eth: fix iova handling for contiguous frames
>>    dpaa_eth: fix iova handling for sg frames
>>    dpaa_eth: fix SG frame cleanup
>>    arm64: dts: ls1046a: add smmu node
>>    arm64: dts: ls1043a: add smmu node
>>    arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
>>    arm64: dts: ls104x: add missing dma ranges property
>>    arm64: dts: ls104x: add iommu-map to pci controllers
>>    arm64: dts: ls104x: make dma-coherent global to the SoC
>>
>>   .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
>>   .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
>>   .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
>>   drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
>>   drivers/net/ethernet/freescale/fman/fman.h    |   4 +
>>   .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
>>   .../net/ethernet/freescale/fman/fman_port.h   |   2 +
>>   drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
>>   drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
>>   drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
>>   drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
>>   include/soc/fsl/bman.h                        |  16 +++
>>   include/soc/fsl/qman.h                        |  17 +++
>>   13 files changed, 379 insertions(+), 53 deletions(-)
>>

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-19 14:18   ` Laurentiu Tudor
@ 2018-09-19 14:37     ` Robin Murphy
  2018-09-20 10:38       ` Laurentiu Tudor
  0 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2018-09-19 14:37 UTC (permalink / raw)
  To: Laurentiu Tudor, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Madalin-cristian Bucur, Roy Pledge, Leo Li, shawnguo, davem

On 19/09/18 15:18, Laurentiu Tudor wrote:
> Hi Robin,
> 
> On 19.09.2018 16:25, Robin Murphy wrote:
>> Hi Laurentiu,
>>
>> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>
>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>> and consists mostly in important driver fixes and the required device
>>> tree updates. It touches several subsystems and consists of three main
>>> parts:
>>>    - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>      reserved memory areas, fixes and deferred probe support
>>>    - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>      consisting in misc dma mapping related fixes and probe ordering
>>>    - addition of the actual arm smmu device tree node together with
>>>      various adjustments to the device trees
>>>
>>> Performance impact
>>>
>>>       Running iperf benchmarks in a back-to-back setup (both sides
>>>       having smmu enabled) on a 10Gbps port shows a significant
>>>       networking performance degradation of around 40% (9.48Gbps
>>>       linerate vs 5.45Gbps). If you need performance and can do without
>>>       SMMU support you can use "iommu.passthrough=1" to disable
>>>       SMMU.
>>>
>>> USB issue and workaround
>>>
>>>       There's a problem with the usb controllers in these chips
>>>       generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>>       supported at the smmu input. So you end up in a situation where the
>>>       smmu is mapped with 48-bit address translations, but the device
>>>       generates transactions with clipped 40-bit addresses, thus smmu
>>>       context faults are triggered. I encountered a similar situation for
>>>       mmc that I managed to fix in software [1]; however, for USB I did not
>>>       find a proper place in the code to add a similar fix. The only
>>>       workaround I found was to add this kernel parameter which limits the
>>>       usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>       This workaround is far from ideal, so any suggestions for a
>>>       code-based workaround in this area would be greatly appreciated.
>>
>> If you have a nominally-64-bit device with a
>> narrower-than-the-main-interconnect link in front of it, that should
>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>> provided the interconnect hierarchy can be described appropriately (or
>> at least massaged sufficiently to satisfy the binding), e.g.:
>>
>> / {
>>       ...
>>
>>       soc {
>>           ranges;
>>           dma-ranges = <0 0 10000 0>;
>>
>>           dev_48bit { ... };
>>
>>           periph_bus {
>>               ranges;
>>               dma-ranges = <0 0 100 0>;
>>
>>               dev_40bit { ... };
>>           };
>>       };
>> };
>>
>> and if that fails to work as expected (except for PCI hosts where
>> handling dma-ranges properly still needs sorting out), please do let us
>> know ;)
>>
> 
> Just to confirm, is this [1] the change I was supposed to test?

Not quite - dma-ranges is only valid for nodes representing a bus, so 
putting it directly in the USB device nodes doesn't work (FWIW that's 
why PCI is broken, because the parser doesn't expect the 
bus-as-leaf-node case). That's the point of that intermediate simple-bus 
node represented by "periph_bus" in my example (sorry, I should have put 
compatibles in to make it clearer) - often that's actually true to life 
(i.e. "soc" is something like a CCI and "periph_bus" is something like 
an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the 
CCI ports) but at worst it's just a necessary evil to make the binding 
happy (if it literally only represents the point-to-point link between 
the device master port and interconnect slave port).
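
For illustration, a minimal sketch of that layout with the compatibles and cell sizes spelled out. It assumes 2-cell addresses and sizes, as in the LS1046A nodes quoted further down; "periph_bus" is only the placeholder name from the example, not an existing node, and the USB node is abbreviated:

```dts
soc {
	compatible = "simple-bus";
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;
	/* child addr 0x0, parent addr 0x0, size 2^48 (48-bit window) */
	dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;

	/* hypothetical intermediate bus grouping the 40-bit DMA masters */
	periph_bus {
		compatible = "simple-bus";
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;
		/* child addr 0x0, parent addr 0x0, size 2^40 (40-bit window) */
		dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;

		usb@2f00000 {
			compatible = "snps,dwc3";
			reg = <0x0 0x2f00000 0x0 0x10000>;
			/* remaining properties unchanged, no dma-ranges here */
		};
	};
};
```

The dma-ranges property sits on the bus node and describes what its children can address, which is why the device nodes themselves carry none.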

> Because if so, I'm still seeing context faults [2] with what look like
> addresses clipped to 40 bits. :-(
> IIRC, the usb subsystem explicitly sets 64-bit dma masks which in turn
> will be limited to the SMMU input size of 48-bit. Won't that overwrite
> the default dma mask derived from dma-ranges?

Indeed it will, but those default masks were effectively only ever a 
best-effort thing anyway - it's an ease-of-implementation detail that 
bus_dma_mask is not currently reflected in the device masks, although we 
may eventually change that; the crucial part is that the DMA ops 
implementations know about it and should now enforce it properly 
regardless of whether drivers set something wider.
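
As a worked example of how a dma-ranges window turns into a bus mask: assuming the mask is the smallest all-ones mask that covers the window (an assumption about the implementation, not something spelled out in this thread), a window with base 0 and size 2^40 gives:

```latex
\[
\text{mask}
  = 2^{\lfloor \log_2(\text{base} + \text{size} - 1) \rfloor + 1} - 1
  = 2^{\lfloor \log_2(2^{40} - 1) \rfloor + 1} - 1
  = 2^{40} - 1
  = \texttt{0xffffffffff}
\]
```

The faulting iova=0xffffffb000 in the log [2] fits in 40 bits, which is consistent with the device truncating addresses allocated above that limit.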

Robin.

> 
> ---
> Best Regards, Laurentiu
> 
> [1] -----------------------------------------------------------------
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 3bdea0470f69..a214c3df37fd 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -612,6 +612,7 @@
>                           compatible = "snps,dwc3";
>                           reg = <0x0 0x2f00000 0x0 0x10000>;
>                           interrupts = <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>;
> +                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
>                           dr_mode = "host";
>                           snps,quirk-frame-length-adjustment = <0x20>;
>                           snps,dis_rxdet_inp3_quirk;
> @@ -621,6 +622,7 @@
>                           compatible = "snps,dwc3";
>                           reg = <0x0 0x3000000 0x0 0x10000>;
>                           interrupts = <GIC_SPI 61 IRQ_TYPE_LEVEL_HIGH>;
> +                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
>                           dr_mode = "host";
>                           snps,quirk-frame-length-adjustment = <0x20>;
>                           snps,dis_rxdet_inp3_quirk;
> @@ -630,6 +632,7 @@
>                           compatible = "snps,dwc3";
>                           reg = <0x0 0x3100000 0x0 0x10000>;
>                           interrupts = <GIC_SPI 63 IRQ_TYPE_LEVEL_HIGH>;
> +                       dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;
>                           dr_mode = "host";
>                           snps,quirk-frame-length-adjustment = <0x20>;
>                           snps,dis_rxdet_inp3_quirk;
> 
> [2] -----------------------------------------------------------------
> [    2.090577] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
> [    2.096064] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned bus number 2
> [    2.103720] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0  SuperSpeed
> [    2.110346] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x1b0000, cb=3
> [    2.120449] usb usb2: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [    2.128717] hub 2-0:1.0: USB hub found
> [    2.132473] hub 2-0:1.0: 1 port detected
> [    2.136527] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> [    2.142014] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
> assigned bus number 3
> [    2.149747] xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f66d hci
> version 0x100 quirks 0x0000000002010010
> [    2.159149] xhci-hcd xhci-hcd.1.auto: irq 50, io mem 0x03000000
> [    2.165284] hub 3-0:1.0: USB hub found
> [    2.169039] hub 3-0:1.0: 1 port detected
> [    2.173051] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> [    2.178536] xhci-hcd xhci-hcd.1.auto: new USB bus registered,
> assigned bus number 4
> [    2.186193] xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0  SuperSpeed
> [    2.192809] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x1f0000, cb=4
> [    2.192822] usb usb4: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [    2.211141] hub 4-0:1.0: USB hub found
> [    2.214896] hub 4-0:1.0: 1 port detected
> [    2.218935] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
> [    2.224425] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
> assigned bus number 5
> [    2.232153] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci
> version 0x100 quirks 0x0000000002010010
> [    2.241562] xhci-hcd xhci-hcd.2.auto: irq 51, io mem 0x03100000
> [    2.247694] hub 5-0:1.0: USB hub found
> [    2.251449] hub 5-0:1.0: 1 port detected
> [    2.255458] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
> [    2.260945] xhci-hcd xhci-hcd.2.auto: new USB bus registered,
> assigned bus number 6
> [    2.268601] xhci-hcd xhci-hcd.2.auto: Host supports USB 3.0  SuperSpeed
> [    2.275218] arm-smmu 9000000.iommu: Unhandled context fault:
> fsr=0x402, iova=0xffffffb000, fsynr=0x110000, cb=5
> [    2.275230] usb usb6: We don't know the algorithms for LPM for this
> host, disabling LPM.
> 
> 
>>> The patch set is based on net-next so, if generally agreed, I'd suggest
>>> to get the patches through the netdev tree after getting all the Acks.
>>>
>>> [1] https://patchwork.kernel.org/patch/10506627/
>>>
>>>
>>> Laurentiu Tudor (21):
>>>     soc/fsl/qman: fixup liodns only on ppc targets
>>>     soc/fsl/bman: map FBPR area in the iommu
>>>     soc/fsl/qman: map FQD and PFDR areas in the iommu
>>>     soc/fsl/qman-portal: map CENA area in the iommu
>>>     soc/fsl/qbman: add APIs to retrieve the probing status
>>>     soc/fsl/qman_portals: defer probe after qman's probe
>>>     soc/fsl/bman_portals: defer probe after bman's probe
>>>     soc/fsl/qbman_portals: add APIs to retrieve the probing status
>>>     fsl/fman: backup and restore ICID registers
>>>     fsl/fman: add API to get the device behind a fman port
>>>     dpaa_eth: defer probing after qbman
>>>     dpaa_eth: base dma mappings on the fman rx port
>>>     dpaa_eth: fix iova handling for contiguous frames
>>>     dpaa_eth: fix iova handling for sg frames
>>>     dpaa_eth: fix SG frame cleanup
>>>     arm64: dts: ls1046a: add smmu node
>>>     arm64: dts: ls1043a: add smmu node
>>>     arm64: dts: ls104xa: set mask to drop TBU ID from StreamID
>>>     arm64: dts: ls104x: add missing dma ranges property
>>>     arm64: dts: ls104x: add iommu-map to pci controllers
>>>     arm64: dts: ls104x: make dma-coherent global to the SoC
>>>
>>>    .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi |  52 ++++++-
>>>    .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi |  48 +++++++
>>>    .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 136 ++++++++++++------
>>>    drivers/net/ethernet/freescale/fman/fman.c    |  35 ++++-
>>>    drivers/net/ethernet/freescale/fman/fman.h    |   4 +
>>>    .../net/ethernet/freescale/fman/fman_port.c   |  14 ++
>>>    .../net/ethernet/freescale/fman/fman_port.h   |   2 +
>>>    drivers/soc/fsl/qbman/bman_ccsr.c             |  23 +++
>>>    drivers/soc/fsl/qbman/bman_portal.c           |  20 ++-
>>>    drivers/soc/fsl/qbman/qman_ccsr.c             |  30 ++++
>>>    drivers/soc/fsl/qbman/qman_portal.c           |  35 +++++
>>>    include/soc/fsl/bman.h                        |  16 +++
>>>    include/soc/fsl/qman.h                        |  17 +++
>>>    13 files changed, 379 insertions(+), 53 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-19 14:37     ` Robin Murphy
@ 2018-09-20 10:38       ` Laurentiu Tudor
  2018-09-20 11:49         ` Robin Murphy
  2018-09-20 19:07         ` Li Yang
  0 siblings, 2 replies; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-20 10:38 UTC (permalink / raw)
  To: Robin Murphy, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Madalin-cristian Bucur, Roy Pledge, Leo Li, shawnguo, davem



On 19.09.2018 17:37, Robin Murphy wrote:
> On 19/09/18 15:18, Laurentiu Tudor wrote:
>> Hi Robin,
>>
>> On 19.09.2018 16:25, Robin Murphy wrote:
>>> Hi Laurentiu,
>>>
>>> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>
>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>> and consists mostly in important driver fixes and the required device
>>>> tree updates. It touches several subsystems and consists of three main
>>>> parts:
>>>>    - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>>      reserved memory areas, fixes and deferred probe support
>>>>    - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>      consisting in misc dma mapping related fixes and probe ordering
>>>>    - addition of the actual arm smmu device tree node together with
>>>>      various adjustments to the device trees
>>>>
>>>> Performance impact
>>>>
>>>>       Running iperf benchmarks in a back-to-back setup (both sides
>>>>       having smmu enabled) on a 10Gbps port shows a significant
>>>>       networking performance degradation of around 40% (9.48Gbps
>>>>       linerate vs 5.45Gbps). If you need performance and can do without
>>>>       SMMU support you can use "iommu.passthrough=1" to disable
>>>>       SMMU.
>>>>
>>>> USB issue and workaround
>>>>
>>>>       There's a problem with the usb controllers in these chips
>>>>       generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>>>       supported at the smmu input. So you end up in a situation where the
>>>>       smmu is mapped with 48-bit address translations, but the device
>>>>       generates transactions with clipped 40-bit addresses, thus smmu
>>>>       context faults are triggered. I encountered a similar situation for
>>>>       mmc that I managed to fix in software [1]; however, for USB I did not
>>>>       find a proper place in the code to add a similar fix. The only
>>>>       workaround I found was to add this kernel parameter which limits the
>>>>       usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>       This workaround is far from ideal, so any suggestions for a
>>>>       code-based workaround in this area would be greatly appreciated.
>>>
>>> If you have a nominally-64-bit device with a
>>> narrower-than-the-main-interconnect link in front of it, that should
>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>> provided the interconnect hierarchy can be described appropriately (or
>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>
>>> / {
>>>       ...
>>>
>>>       soc {
>>>           ranges;
>>>           dma-ranges = <0 0 10000 0>;
>>>
>>>           dev_48bit { ... };
>>>
>>>           periph_bus {
>>>               ranges;
>>>               dma-ranges = <0 0 100 0>;
>>>
>>>               dev_40bit { ... };
>>>           };
>>>       };
>>> };
>>>
>>> and if that fails to work as expected (except for PCI hosts where
>>> handling dma-ranges properly still needs sorting out), please do let us
>>> know ;)
>>>
>>
>> Just to confirm, is this [1] the change I was supposed to test?
> 
> Not quite - dma-ranges is only valid for nodes representing a bus, so 
> putting it directly in the USB device nodes doesn't work (FWIW that's 
> why PCI is broken, because the parser doesn't expect the 
> bus-as-leaf-node case). That's the point of that intermediate simple-bus 
> node represented by "periph_bus" in my example (sorry, I should have put 
> compatibles in to make it clearer) - often that's actually true to life 
> (i.e. "soc" is something like a CCI and "periph_bus" is something like 
> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the 
> CCI ports) but at worst it's just a necessary evil to make the binding 
> happy (if it literally only represents the point-to-point link between 
> the device master port and interconnect slave port).
> 

Quick update: I adjusted the device tree according to your example and 
it works, so now I can get rid of that nasty kernel-arg-based workaround, 
yay! :-)
Thanks a lot, that was really helpful.

---
Best Regards, Laurentiu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-20 10:38       ` Laurentiu Tudor
@ 2018-09-20 11:49         ` Robin Murphy
  2018-09-20 14:33           ` Laurentiu Tudor
  2018-09-20 19:07         ` Li Yang
  1 sibling, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2018-09-20 11:49 UTC (permalink / raw)
  To: Laurentiu Tudor, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Roy Pledge, Leo Li, shawnguo, davem, Madalin-cristian Bucur

On 20/09/18 11:38, Laurentiu Tudor wrote:
> 
> 
> On 19.09.2018 17:37, Robin Murphy wrote:
>> On 19/09/18 15:18, Laurentiu Tudor wrote:
>>> Hi Robin,
>>>
>>> On 19.09.2018 16:25, Robin Murphy wrote:
>>>> Hi Laurentiu,
>>>>
>>>> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>
>>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>>> and consists mostly in important driver fixes and the required device
>>>>> tree updates. It touches several subsystems and consists of three main
>>>>> parts:
>>>>>     - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>>>       reserved memory areas, fixes and deferred probe support
>>>>>     - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>>       consisting in misc dma mapping related fixes and probe ordering
>>>>>     - addition of the actual arm smmu device tree node together with
>>>>>       various adjustments to the device trees
>>>>>
>>>>> Performance impact
>>>>>
>>>>>        Running iperf benchmarks in a back-to-back setup (both sides
>>>>>        having smmu enabled) on a 10Gbps port shows a significant
>>>>>        networking performance degradation of around 40% (9.48Gbps
>>>>>        linerate vs 5.45Gbps). If you need performance and can do without
>>>>>        SMMU support you can use "iommu.passthrough=1" to disable
>>>>>        SMMU.

I should have said before - thanks for the numbers there as well. Always 
good to add another datapoint to my collection. If you're interested 
I've added SMMUv2 support to the "non-strict mode" series (of which I 
should be posting v8 soon), so it might be fun to see how well that 
works on MMU-500 in the real world.

>>>>>
>>>>> USB issue and workaround
>>>>>
>>>>>        There's a problem with the usb controllers in these chips
>>>>>        generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>>>>        supported at the smmu input. So you end up in a situation where the
>>>>>        smmu is mapped with 48-bit address translations, but the device
>>>>>        generates transactions with clipped 40-bit addresses, thus smmu
>>>>>        context faults are triggered. I encountered a similar situation for
>>>>>        mmc that I managed to fix in software [1]; however, for USB I did not
>>>>>        find a proper place in the code to add a similar fix. The only
>>>>>        workaround I found was to add this kernel parameter which limits the
>>>>>        usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>>        This workaround is far from ideal, so any suggestions for a
>>>>>        code-based workaround in this area would be greatly appreciated.
>>>>
>>>> If you have a nominally-64-bit device with a
>>>> narrower-than-the-main-interconnect link in front of it, that should
>>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>>> provided the interconnect hierarchy can be described appropriately (or
>>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>>
>>>> / {
>>>>        ...
>>>>
>>>>        soc {
>>>>            ranges;
>>>>            dma-ranges = <0 0 10000 0>;
>>>>
>>>>            dev_48bit { ... };
>>>>
>>>>            periph_bus {
>>>>                ranges;
>>>>                dma-ranges = <0 0 100 0>;
>>>>
>>>>                dev_40bit { ... };
>>>>            };
>>>>        };
>>>> };
>>>>
>>>> and if that fails to work as expected (except for PCI hosts where
>>>> handling dma-ranges properly still needs sorting out), please do let us
>>>> know ;)
>>>>
>>>
>>> Just to confirm, is this [1] the change I was supposed to test?
>>
>> Not quite - dma-ranges is only valid for nodes representing a bus, so
>> putting it directly in the USB device nodes doesn't work (FWIW that's
>> why PCI is broken, because the parser doesn't expect the
>> bus-as-leaf-node case). That's the point of that intermediate simple-bus
>> node represented by "periph_bus" in my example (sorry, I should have put
>> compatibles in to make it clearer) - often that's actually true to life
>> (i.e. "soc" is something like a CCI and "periph_bus" is something like
>> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
>> CCI ports) but at worst it's just a necessary evil to make the binding
>> happy (if it literally only represents the point-to-point link between
>> the device master port and interconnect slave port).
>>
> 
> Quick update: I adjusted the device tree according to your example and
> it works, so now I can get rid of that nasty kernel-arg-based workaround,
> yay! :-)

Cool! In fact, judging by the block diagrams on the website, the "basic 
peripherals and interconnect" section hanging off the side of the CCI 
implies that probably is true to the real topology as I imagined, so it 
doesn't even count as a horrible hack :)

> Thanks a lot, that was really helpful.

No problem. FWIW if you ever come to doing ACPI support for these SoCs, 
the equivalent is merely a case of setting the device memory address 
size limit field appropriately for all the named components.

Robin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-20 11:49         ` Robin Murphy
@ 2018-09-20 14:33           ` Laurentiu Tudor
  0 siblings, 0 replies; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-20 14:33 UTC (permalink / raw)
  To: Robin Murphy, devicetree, netdev, linux-kernel, linux-arm-kernel
  Cc: Roy Pledge, Leo Li, shawnguo, davem, Madalin-cristian Bucur



On 20.09.2018 14:49, Robin Murphy wrote:
> On 20/09/18 11:38, Laurentiu Tudor wrote:
>>
>>
>> On 19.09.2018 17:37, Robin Murphy wrote:
>>> On 19/09/18 15:18, Laurentiu Tudor wrote:
>>>> Hi Robin,
>>>>
>>>> On 19.09.2018 16:25, Robin Murphy wrote:
>>>>> Hi Laurentiu,
>>>>>
>>>>> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>>
>>>>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
>>>>>> and consists mostly in important driver fixes and the required device
>>>>>> tree updates. It touches several subsystems and consists of three main
>>>>>> parts:
>>>>>>     - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
>>>>>>       reserved memory areas, fixes and deferred probe support
>>>>>>     - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
>>>>>>       consisting in misc dma mapping related fixes and probe ordering
>>>>>>     - addition of the actual arm smmu device tree node together with
>>>>>>       various adjustments to the device trees
>>>>>>
>>>>>> Performance impact
>>>>>>
>>>>>>        Running iperf benchmarks in a back-to-back setup (both sides
>>>>>>        having smmu enabled) on a 10Gbps port shows a significant
>>>>>>        networking performance degradation of around 40% (9.48Gbps
>>>>>>        linerate vs 5.45Gbps). If you need performance and can do without
>>>>>>        SMMU support you can use "iommu.passthrough=1" to disable
>>>>>>        SMMU.
> 
> I should have said before - thanks for the numbers there as well. Always 
> good to add another datapoint to my collection. If you're interested 
> I've added SMMUv2 support to the "non-strict mode" series (of which I 
> should be posting v8 soon), so it might be fun to see how well that 
> works on MMU-500 in the real world.

Hmm, I think I gave those a try some weeks ago and vaguely remember that 
I did see improvements. Can't remember the numbers off the top of my 
head but I'll re-test with the latest spin and update the numbers.

>>>>>>
>>>>>> USB issue and workaround
>>>>>>
>>>>>>        There's a problem with the usb controllers in these chips
>>>>>>        generating smaller, 40-bit wide dma addresses instead of the 48-bit
>>>>>>        supported at the smmu input. So you end up in a situation where the
>>>>>>        smmu is mapped with 48-bit address translations, but the device
>>>>>>        generates transactions with clipped 40-bit addresses, thus smmu
>>>>>>        context faults are triggered. I encountered a similar situation for
>>>>>>        mmc that I managed to fix in software [1]; however, for USB I did not
>>>>>>        find a proper place in the code to add a similar fix. The only
>>>>>>        workaround I found was to add this kernel parameter which limits the
>>>>>>        usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
>>>>>>        This workaround is far from ideal, so any suggestions for a
>>>>>>        code-based workaround in this area would be greatly appreciated.
>>>>>
>>>>> If you have a nominally-64-bit device with a
>>>>> narrower-than-the-main-interconnect link in front of it, that should
>>>>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
>>>>> provided the interconnect hierarchy can be described appropriately (or
>>>>> at least massaged sufficiently to satisfy the binding), e.g.:
>>>>>
>>>>> / {
>>>>>        ...
>>>>>
>>>>>        soc {
>>>>>            ranges;
>>>>>            dma-ranges = <0 0 10000 0>;
>>>>>
>>>>>            dev_48bit { ... };
>>>>>
>>>>>            periph_bus {
>>>>>                ranges;
>>>>>                dma-ranges = <0 0 100 0>;
>>>>>
>>>>>                dev_40bit { ... };
>>>>>            };
>>>>>        };
>>>>> };
>>>>>
>>>>> and if that fails to work as expected (except for PCI hosts where
>>>>> handling dma-ranges properly still needs sorting out), please do let us
>>>>> know ;)
>>>>>
>>>>
>>>> Just to confirm, is this [1] the change I was supposed to test?
>>>
>>> Not quite - dma-ranges is only valid for nodes representing a bus, so
>>> putting it directly in the USB device nodes doesn't work (FWIW that's
>>> why PCI is broken, because the parser doesn't expect the
>>> bus-as-leaf-node case). That's the point of that intermediate simple-bus
>>> node represented by "periph_bus" in my example (sorry, I should have put
>>> compatibles in to make it clearer) - often that's actually true to life
>>> (i.e. "soc" is something like a CCI and "periph_bus" is something like
>>> an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
>>> CCI ports) but at worst it's just a necessary evil to make the binding
>>> happy (if it literally only represents the point-to-point link between
>>> the device master port and interconnect slave port).
>>>
>>
>> Quick update: I adjusted the device tree according to your example and
>> it works, so now I can get rid of that nasty kernel-arg-based workaround,
>> yay! :-)
> 
> Cool! In fact, judging by the block diagrams on the website, the "basic 
> peripherals and interconnect" section hanging off the side of the CCI 
> implies that probably is true to the real topology as I imagined, so it 
> doesn't even count as a horrible hack :)

Indeed, on this chip there's a NoC with several low-speed devices 
lumped behind it, such as usb, sata and esdhc.

>> Thanks a lot, that was really helpful.
> 
> No problem. FWIW if you ever come to doing ACPI support for these SoCs, 
> the equivalent is merely a case of setting the device memory address 
> size limit field appropriately for all the named components.
> 

Thanks, I'll keep this in mind. If I remember correctly, there are 
people over here working on UEFI + ACPI support for some LS chips but 
progress appears to be slow.

---
Best Regards, Laurentiu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-20 10:38       ` Laurentiu Tudor
  2018-09-20 11:49         ` Robin Murphy
@ 2018-09-20 19:07         ` Li Yang
  2018-09-21  7:32           ` Laurentiu Tudor
  1 sibling, 1 reply; 34+ messages in thread
From: Li Yang @ 2018-09-20 19:07 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: robin.murphy,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Netdev, lkml,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	madalin.bucur, Roy Pledge, Shawn Guo, David Miller

On Thu, Sep 20, 2018 at 5:39 AM Laurentiu Tudor <laurentiu.tudor@nxp.com> wrote:
>
>
>
> On 19.09.2018 17:37, Robin Murphy wrote:
> > On 19/09/18 15:18, Laurentiu Tudor wrote:
> >> Hi Robin,
> >>
> >> On 19.09.2018 16:25, Robin Murphy wrote:
> >>> Hi Laurentiu,
> >>>
> >>> On 19/09/18 13:35, laurentiu.tudor@nxp.com wrote:
> >>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >>>>
> >>>> This patch series adds SMMU support for NXP LS1043A and LS1046A chips
> >>>> and consists mostly in important driver fixes and the required device
> >>>> tree updates. It touches several subsystems and consists of three main
> >>>> parts:
> >>>>    - changes in soc/drivers/fsl/qbman drivers adding iommu mapping of
> >>>>      reserved memory areas, fixes and deferred probe support
> >>>>    - changes in drivers/net/ethernet/freescale/dpaa_eth drivers
> >>>>      consisting in misc dma mapping related fixes and probe ordering
> >>>>    - addition of the actual arm smmu device tree node together with
> >>>>      various adjustments to the device trees
> >>>>
> >>>> Performance impact
> >>>>
> >>>>       Running iperf benchmarks in a back-to-back setup (both sides
> >>>>       having smmu enabled) on a 10Gbps port shows a significant
> >>>>       networking performance degradation of around 40% (9.48Gbps
> >>>>       linerate vs 5.45Gbps). If you need performance and can do without
> >>>>       SMMU support you can use "iommu.passthrough=1" to disable
> >>>>       SMMU.
> >>>>
> >>>> USB issue and workaround
> >>>>
> >>>>       There's a problem with the usb controllers in these chips
> >>>>       generating smaller, 40-bit wide dma addresses instead of the 48-bit
> >>>>       supported at the smmu input. So you end up in a situation where the
> >>>>       smmu is mapped with 48-bit address translations, but the device
> >>>>       generates transactions with clipped 40-bit addresses, thus smmu
> >>>>       context faults are triggered. I encountered a similar situation for
> >>>>       mmc that I managed to fix in software [1] however for USB I did not
> >>>>       find a proper place in the code to add a similar fix. The only
> >>>>       workaround I found was to add this kernel parameter which limits the
> >>>>       usb dma to 32-bit size: "xhci-hcd.quirks=0x800000".
> >>>>       This workaround is far from ideal, so any suggestions for a code
> >>>>       based workaround in this area would be greatly appreciated.
> >>>
> >>> If you have a nominally-64-bit device with a
> >>> narrower-than-the-main-interconnect link in front of it, that should
> >>> already be fixed in 4.19-rc by bus_dma_mask picking up DT dma-ranges,
> >>> provided the interconnect hierarchy can be described appropriately (or
> >>> at least massaged sufficiently to satisfy the binding), e.g.:
> >>>
> >>> / {
> >>>       ...
> >>>
> >>>       soc {
> >>>           ranges;
> >>>           dma-ranges = <0 0 10000 0>;
> >>>
> >>>           dev_48bit { ... };
> >>>
> >>>           periph_bus {
> >>>               ranges;
> >>>               dma-ranges = <0 0 100 0>;
> >>>
> >>>               dev_40bit { ... };
> >>>           };
> >>>       };
> >>> };
> >>>
> >>> and if that fails to work as expected (except for PCI hosts where
> >>> handling dma-ranges properly still needs sorting out), please do let us
> >>> know ;)
> >>>
> >>
> >> Just to confirm, is this [1] the change I was supposed to test?
> >
> > Not quite - dma-ranges is only valid for nodes representing a bus, so
> > putting it directly in the USB device nodes doesn't work (FWIW that's
> > why PCI is broken, because the parser doesn't expect the
> > bus-as-leaf-node case). That's the point of that intermediate simple-bus
> > node represented by "periph_bus" in my example (sorry, I should have put
> > compatibles in to make it clearer) - often that's actually true to life
> > (i.e. "soc" is something like a CCI and "periph_bus" is something like
> > an AXI NIC gluing a bunch of lower-bandwidth DMA masters to one of the
> > CCI ports) but at worst it's just a necessary evil to make the binding
> > happy (if it literally only represents the point-to-point link between
> > the device master port and interconnect slave port).
> >
>
> Quick update: so I adjusted the device tree according to your example and
> it works, so now I can get rid of that nasty kernel arg based workaround,
> yey! :-)

Great that we have a generic solution like I hoped for!  So you will
submit a new revision of the series to include these dts updates,
right?
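
For reference, the kind of dts update in question might look roughly
like the sketch below. It only spells out Robin's example with the
compatibles and cell counts filled in; it is not the actual LS104xA
change. The node names, unit addresses and "vendor,..." compatible
strings are made up for illustration, and #address-cells/#size-cells
of 2 are an assumption.

/dts-v1/;

/ {
	#address-cells = <2>;
	#size-cells = <2>;

	soc {
		compatible = "simple-bus";
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;
		/* child addr 0, parent addr 0, size 2^48: 48-bit DMA */
		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;

		/* hypothetical master that can drive all 48 bits */
		dev@1000000 {
			compatible = "vendor,dev-48bit";
			reg = <0x0 0x1000000 0x0 0x1000>;
		};

		/* intermediate bus node describing the narrower link */
		periph_bus: bus {
			compatible = "simple-bus";
			#address-cells = <2>;
			#size-cells = <2>;
			ranges;
			/* size 2^40: masters below are limited to 40 bits */
			dma-ranges = <0x0 0x0 0x0 0x0 0x100 0x00000000>;

			/* hypothetical master behind the 40-bit link */
			dev@2000000 {
				compatible = "vendor,dev-40bit";
				reg = <0x0 0x2000000 0x0 0x1000>;
			};
		};
	};
};

With two size cells, the high cell carries bits 32 and up, so 0x10000
in the high cell is a 2^48 byte window and 0x100 is a 2^40 byte one.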

Regards,
Leo


* RE: [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A
  2018-09-20 19:07         ` Li Yang
@ 2018-09-21  7:32           ` Laurentiu Tudor
  0 siblings, 0 replies; 34+ messages in thread
From: Laurentiu Tudor @ 2018-09-21  7:32 UTC (permalink / raw)
  To: Leo Li
  Cc: robin.murphy,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Netdev, lkml,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Madalin-cristian Bucur, Roy Pledge, Shawn Guo, David Miller



> -----Original Message-----
> From: Li Yang [mailto:leoyang.li@nxp.com]
> Sent: Thursday, September 20, 2018 10:07 PM
> 
> On Thu, Sep 20, 2018 at 5:39 AM Laurentiu Tudor <laurentiu.tudor@nxp.com>
> wrote:
> >
> >
> >
> > Quick update: so I adjusted the device tree according to your example and
> > it works, so now I can get rid of that nasty kernel arg based workaround,
> > yey! :-)
> 
> Great that we have a generic solution like I hoped for!  So you will
> submit a new revision of the series to include these dts updates,
> right?
> 

Yes, I already have it prepared. I'm just delaying the v2 for a few
days in case there is more feedback.

---
Best Regards, Laurentiu


end of thread, other threads:[~2018-09-21  7:32 UTC | newest]

Thread overview: 34+ messages
2018-09-19 12:35 [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A laurentiu.tudor
2018-09-19 12:35 ` [PATCH 01/21] soc/fsl/qman: fixup liodns only on ppc targets laurentiu.tudor
2018-09-19 12:35 ` [PATCH 02/21] soc/fsl/bman: map FBPR area in the iommu laurentiu.tudor
2018-09-19 12:35 ` [PATCH 03/21] soc/fsl/qman: map FQD and PFDR areas " laurentiu.tudor
2018-09-19 12:35 ` [PATCH 04/21] soc/fsl/qman-portal: map CENA area " laurentiu.tudor
2018-09-19 12:35 ` [PATCH 05/21] soc/fsl/qbman: add APIs to retrieve the probing status laurentiu.tudor
2018-09-19 12:35 ` [PATCH 06/21] soc/fsl/qman_portals: defer probe after qman's probe laurentiu.tudor
2018-09-19 12:35 ` [PATCH 07/21] soc/fsl/bman_portals: defer probe after bman's probe laurentiu.tudor
2018-09-19 12:36 ` [PATCH 08/21] soc/fsl/qbman_portals: add APIs to retrieve the probing status laurentiu.tudor
2018-09-19 12:36 ` [PATCH 09/21] fsl/fman: backup and restore ICID registers laurentiu.tudor
2018-09-19 12:36 ` [PATCH 10/21] fsl/fman: add API to get the device behind a fman port laurentiu.tudor
2018-09-19 12:36 ` [PATCH 11/21] dpaa_eth: defer probing after qbman laurentiu.tudor
2018-09-19 12:36 ` [PATCH 12/21] dpaa_eth: base dma mappings on the fman rx port laurentiu.tudor
2018-09-19 12:36 ` [PATCH 13/21] dpaa_eth: fix iova handling for contiguous frames laurentiu.tudor
2018-09-19 12:36 ` [PATCH 14/21] dpaa_eth: fix iova handling for sg frames laurentiu.tudor
2018-09-19 12:36 ` [PATCH 15/21] dpaa_eth: fix SG frame cleanup laurentiu.tudor
2018-09-19 12:36 ` [PATCH 16/21] arm64: dts: ls1046a: add smmu node laurentiu.tudor
2018-09-19 13:30   ` Robin Murphy
2018-09-19 13:51     ` Laurentiu Tudor
2018-09-19 12:36 ` [PATCH 17/21] arm64: dts: ls1043a: " laurentiu.tudor
2018-09-19 12:36 ` [PATCH 18/21] arm64: dts: ls104xa: set mask to drop TBU ID from StreamID laurentiu.tudor
2018-09-19 13:41   ` Robin Murphy
2018-09-19 14:06     ` Laurentiu Tudor
2018-09-19 12:36 ` [PATCH 19/21] arm64: dts: ls104x: add missing dma ranges property laurentiu.tudor
2018-09-19 12:36 ` [PATCH 20/21] arm64: dts: ls104x: add iommu-map to pci controllers laurentiu.tudor
2018-09-19 12:36 ` [PATCH 21/21] arm64: dts: ls104x: make dma-coherent global to the SoC laurentiu.tudor
2018-09-19 13:25 ` [PATCH 00/21] SMMU enablement for NXP LS1043A and LS1046A Robin Murphy
2018-09-19 14:18   ` Laurentiu Tudor
2018-09-19 14:37     ` Robin Murphy
2018-09-20 10:38       ` Laurentiu Tudor
2018-09-20 11:49         ` Robin Murphy
2018-09-20 14:33           ` Laurentiu Tudor
2018-09-20 19:07         ` Li Yang
2018-09-21  7:32           ` Laurentiu Tudor
