All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] fsl/qbman: ARM Enablement
@ 2017-01-18 22:39 Roy Pledge
  2017-01-18 22:39 ` [PATCH 01/10] soc/qbman: Use portable mapping for the FQD reserved memory Roy Pledge
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series enables DPAA1 QBMan devices for ARM and
ARM64 architectures. This allows the LS1043A and LS1046A to use
QBMan functionality.


Claudiu Manoil (4):
  soc/qbman: Use portable mapping for the FQD reserved memory
  soc/qbman: Drop L1_CACHE_BYTES compile time check
  soc/qbman: Add ARM equivalent for flush_dcache_range()
  soc/qbman: Add missing headers on ARM

Madalin Bucur (4):
  soc/qbman: Drop set/clear_bits usage
  soc/qbman: add QMAN_REV32
  soc/qbman: different register offsets on ARM
  fsl/qbman: Enable FSL_LAYERSCAPE config on ARM

Roy Pledge (1):
  soc/qbman: Rework ioremap() calls for ARM/PPC

Valentin Rothberg (1):
  soc/qbman: Fix ARM32 typo

 drivers/soc/fsl/qbman/Kconfig       |    2 +-
 drivers/soc/fsl/qbman/bman.c        |   24 +++++++++++++++++-
 drivers/soc/fsl/qbman/bman_portal.c |   16 +++++++++---
 drivers/soc/fsl/qbman/dpaa_sys.h    |    8 +++---
 drivers/soc/fsl/qbman/qman.c        |   46 ++++++++++++++++++++++++++++++++---
 drivers/soc/fsl/qbman/qman_ccsr.c   |   16 ++++++++----
 drivers/soc/fsl/qbman/qman_portal.c |   16 +++++++++---
 drivers/soc/fsl/qbman/qman_priv.h   |    1 +
 8 files changed, 107 insertions(+), 22 deletions(-)

--
1.7.9.5

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/10] soc/qbman: Use portable mapping for the FQD reserved memory
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 02/10] soc/qbman: Drop set/clear_bits usage Roy Pledge
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Claudiu Manoil <claudiu.manoil@nxp.com>

Use memremap/memset to zero the private QBMan areas to ensure
portability.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
---
 drivers/soc/fsl/qbman/qman_ccsr.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index f4e6e70d..43feaa9 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -441,16 +441,14 @@ static unsigned int qm_get_fqid_maxcnt(void)
 static int zero_priv_mem(struct device *dev, struct device_node *node,
 			 phys_addr_t addr, size_t sz)
 {
-	/* map as cacheable, non-guarded */
-	void __iomem *tmpp = ioremap_prot(addr, sz, 0);
-
+	void *tmpp = memremap(addr, sz, MEMREMAP_WB);
 	if (!tmpp)
 		return -ENOMEM;
 
-	memset_io(tmpp, 0, sz);
+	memset(tmpp, 0, sz);
 	flush_dcache_range((unsigned long)tmpp,
 			   (unsigned long)tmpp + sz);
-	iounmap(tmpp);
+	memunmap(tmpp);
 
 	return 0;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/10] soc/qbman: Drop set/clear_bits usage
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
  2017-01-18 22:39 ` [PATCH 01/10] soc/qbman: Use portable mapping for the FQD reserved memory Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 03/10] soc/qbman: Drop L1_CACHE_BYTES compile time check Roy Pledge
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Madalin Bucur <madalin.bucur@nxp.com>

Replace PPC specific set/clear_bits API with standard
bit twiddling so driver is portalable outside PPC.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
---
 drivers/soc/fsl/qbman/bman.c |    2 +-
 drivers/soc/fsl/qbman/qman.c |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index a3d6d7c..3acded1 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -607,7 +607,7 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits)
 	unsigned long irqflags;
 
 	local_irq_save(irqflags);
-	set_bits(bits & BM_PIRQ_VISIBLE, &p->irq_sources);
+	p->irq_sources |= bits & BM_PIRQ_VISIBLE;
 	bm_out(&p->p, BM_REG_IER, p->irq_sources);
 	local_irq_restore(irqflags);
 	return 0;
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 6f509f6..879e8f2 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -909,12 +909,12 @@ static inline int qm_mc_result_timeout(struct qm_portal *portal,
 
 static inline void fq_set(struct qman_fq *fq, u32 mask)
 {
-	set_bits(mask, &fq->flags);
+	fq->flags |= mask;
 }
 
 static inline void fq_clear(struct qman_fq *fq, u32 mask)
 {
-	clear_bits(mask, &fq->flags);
+	fq->flags &= ~mask;
 }
 
 static inline int fq_isset(struct qman_fq *fq, u32 mask)
@@ -1561,7 +1561,7 @@ void qman_p_irqsource_add(struct qman_portal *p, u32 bits)
 	unsigned long irqflags;
 
 	local_irq_save(irqflags);
-	set_bits(bits & QM_PIRQ_VISIBLE, &p->irq_sources);
+	p->irq_sources |= bits & QM_PIRQ_VISIBLE;
 	qm_out(&p->p, QM_REG_IER, p->irq_sources);
 	local_irq_restore(irqflags);
 }
@@ -1584,7 +1584,7 @@ void qman_p_irqsource_remove(struct qman_portal *p, u32 bits)
 	 */
 	local_irq_save(irqflags);
 	bits &= QM_PIRQ_VISIBLE;
-	clear_bits(bits, &p->irq_sources);
+	p->irq_sources &= ~bits;
 	qm_out(&p->p, QM_REG_IER, p->irq_sources);
 	ier = qm_in(&p->p, QM_REG_IER);
 	/*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/10] soc/qbman: Drop L1_CACHE_BYTES compile time check
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
  2017-01-18 22:39 ` [PATCH 01/10] soc/qbman: Use portable mapping for the FQD reserved memory Roy Pledge
  2017-01-18 22:39 ` [PATCH 02/10] soc/qbman: Drop set/clear_bits usage Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 04/10] soc/qbman: Fix ARM32 typo Roy Pledge
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Claudiu Manoil <claudiu.manoil@nxp.com>

Not relevant and arch dependent. Overkill for PPC.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
 drivers/soc/fsl/qbman/dpaa_sys.h |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 2eaf318..4f1623b 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -48,10 +48,6 @@
 #define DPAA_PORTAL_CE 0
 #define DPAA_PORTAL_CI 1
 
-#if (L1_CACHE_BYTES != 32) && (L1_CACHE_BYTES != 64)
-#error "Unsupported Cacheline Size"
-#endif
-
 static inline void dpaa_flush(void *p)
 {
 #ifdef CONFIG_PPC
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/10] soc/qbman: Fix ARM32 typo
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (2 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 03/10] soc/qbman: Drop L1_CACHE_BYTES compile time check Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 05/10] soc/qbman: Rework ioremap() calls for ARM/PPC Roy Pledge
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Valentin Rothberg <valentinrothberg@gmail.com>

The Kconfig symbol for 32bit ARM is 'ARM', not 'ARM32'.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
 drivers/soc/fsl/qbman/dpaa_sys.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 4f1623b..ae65758 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -52,7 +52,7 @@ static inline void dpaa_flush(void *p)
 {
 #ifdef CONFIG_PPC
 	flush_dcache_range((unsigned long)p, (unsigned long)p+64);
-#elif defined(CONFIG_ARM32)
+#elif defined(CONFIG_ARM)
 	__cpuc_flush_dcache_area(p, 64);
 #elif defined(CONFIG_ARM64)
 	__flush_dcache_area(p, 64);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/10] soc/qbman: Rework ioremap() calls for ARM/PPC
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (3 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 04/10] soc/qbman: Fix ARM32 typo Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range() Roy Pledge
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

Rework ioremap() for PPC and ARM. The PPC devices require a
non-coherent mapping while ARM will work with a non-cachable/write
combine mapping.

Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
---
 drivers/soc/fsl/qbman/bman_portal.c |   16 +++++++++++++---
 drivers/soc/fsl/qbman/qman_portal.c |   16 +++++++++++++---
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 8354d4d..a661f30 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -125,7 +125,18 @@ static int bman_portal_probe(struct platform_device *pdev)
 	}
 	pcfg->irq = irq;
 
-	va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0);
+#ifdef CONFIG_PPC
+	/* PPC requires a cacheable/non-coherent mapping of the portal */
+	va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]),
+			  (pgprot_val(PAGE_KERNEL) & ~_PAGE_COHERENT));
+#else
+	/*
+	 * For ARM we can use write combine mapping.  A cacheable/non shareable
+	 * mapping will perform better but equires additional platform
+	 * support which is not currently available
+	 */
+	va = ioremap_wc(addr_phys[0]->start, resource_size(addr_phys[0]));
+#endif
 	if (!va) {
 		dev_err(dev, "ioremap::CE failed\n");
 		goto err_ioremap1;
@@ -133,8 +144,7 @@ static int bman_portal_probe(struct platform_device *pdev)
 
 	pcfg->addr_virt[DPAA_PORTAL_CE] = va;
 
-	va = ioremap_prot(addr_phys[1]->start, resource_size(addr_phys[1]),
-			  _PAGE_GUARDED | _PAGE_NO_CACHE);
+	va = ioremap(addr_phys[1]->start, resource_size(addr_phys[1]));
 	if (!va) {
 		dev_err(dev, "ioremap::CI failed\n");
 		goto err_ioremap2;
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index adbaa30..e7a9eef 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -265,7 +265,18 @@ static int qman_portal_probe(struct platform_device *pdev)
 	}
 	pcfg->irq = irq;
 
-	va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0);
+#ifdef CONFIG_PPC
+	/* PPC requires a cacheable/non-coherent mapping of the portal */
+	va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]),
+			  (pgprot_val(PAGE_KERNEL) & ~_PAGE_COHERENT));
+#else
+	/*
+	 * For ARM we can use write combine mapping.  A cacheable/non shareable
+	 * mapping will perform better but equires additional platform
+	 * support which is not currently available
+	 */
+	va = ioremap_wc(addr_phys[0]->start, resource_size(addr_phys[0]));
+#endif
 	if (!va) {
 		dev_err(dev, "ioremap::CE failed\n");
 		goto err_ioremap1;
@@ -273,8 +284,7 @@ static int qman_portal_probe(struct platform_device *pdev)
 
 	pcfg->addr_virt[DPAA_PORTAL_CE] = va;
 
-	va = ioremap_prot(addr_phys[1]->start, resource_size(addr_phys[1]),
-			  _PAGE_GUARDED | _PAGE_NO_CACHE);
+	va = ioremap(addr_phys[1]->start, resource_size(addr_phys[1]));
 	if (!va) {
 		dev_err(dev, "ioremap::CI failed\n");
 		goto err_ioremap2;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (4 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 05/10] soc/qbman: Rework ioremap() calls for ARM/PPC Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 23:12   ` Russell King - ARM Linux
  2017-01-18 22:39 ` [PATCH 07/10] soc/qbman: add QMAN_REV32 Roy Pledge
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Claudiu Manoil <claudiu.manoil@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
---
 drivers/soc/fsl/qbman/qman_ccsr.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 43feaa9..67ae073 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node,
 		return -ENOMEM;
 
 	memset(tmpp, 0, sz);
+#ifdef CONFIG_PPC
 	flush_dcache_range((unsigned long)tmpp,
 			   (unsigned long)tmpp + sz);
+#elif defined(CONFIG_ARM)
+	__cpuc_flush_dcache_area(tmpp, sz);
+#elif defined(CONFIG_ARM64)
+	__flush_dcache_area(tmpp, sz);
+#endif
 	memunmap(tmpp);
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/10] soc/qbman: add QMAN_REV32
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (5 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range() Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 08/10] soc/qbman: different register offsets on ARM Roy Pledge
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Madalin Bucur <madalin.bucur@nxp.com>

Add revision 3.2 of the QBMan block.  This is the version
for LS1043A and LS1046A SoCs.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
---
 drivers/soc/fsl/qbman/qman_ccsr.c |    2 ++
 drivers/soc/fsl/qbman/qman_priv.h |    1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 67ae073..e68793d 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -719,6 +719,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
 		qman_ip_rev = QMAN_REV30;
 	else if (major == 3 && minor == 1)
 		qman_ip_rev = QMAN_REV31;
+	else if (major == 3 && minor == 2)
+		qman_ip_rev = QMAN_REV32;
 	else {
 		dev_err(dev, "Unknown QMan version\n");
 		return -ENODEV;
diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h
index 53685b5..554d5d4 100644
--- a/drivers/soc/fsl/qbman/qman_priv.h
+++ b/drivers/soc/fsl/qbman/qman_priv.h
@@ -247,6 +247,7 @@ struct qm_portal_config {
 #define QMAN_REV20 0x0200
 #define QMAN_REV30 0x0300
 #define QMAN_REV31 0x0301
+#define QMAN_REV32 0x0302
 extern u16 qman_ip_rev; /* 0 if uninitialised, otherwise QMAN_REVx */
 
 #define QM_FQID_RANGE_START 1 /* FQID 0 reserved for internal use */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/10] soc/qbman: different register offsets on ARM
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (6 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 07/10] soc/qbman: add QMAN_REV32 Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 09/10] soc/qbman: Add missing headers " Roy Pledge
  2017-01-18 22:39 ` [PATCH 10/10] fsl/qbman: Enable FSL_LAYERSCAPE config " Roy Pledge
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Madalin Bucur <madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
 drivers/soc/fsl/qbman/bman.c |   22 ++++++++++++++++++++++
 drivers/soc/fsl/qbman/qman.c |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index 3acded1..1fa9099 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -35,6 +35,27 @@
 
 /* Portal register assists */
 
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+/* Cache-inhibited register offsets */
+#define BM_REG_RCR_PI_CINH	0x3000
+#define BM_REG_RCR_CI_CINH	0x3100
+#define BM_REG_RCR_ITR		0x3200
+#define BM_REG_CFG		0x3300
+#define BM_REG_SCN(n)		(0x3400 + ((n) << 6))
+#define BM_REG_ISR		0x3e00
+#define BM_REG_IER		0x3e40
+#define BM_REG_ISDR		0x3e80
+#define BM_REG_IIR		0x3ec0
+
+/* Cache-enabled register offsets */
+#define BM_CL_CR		0x0000
+#define BM_CL_RR0		0x0100
+#define BM_CL_RR1		0x0140
+#define BM_CL_RCR		0x1000
+#define BM_CL_RCR_PI_CENA	0x3000
+#define BM_CL_RCR_CI_CENA	0x3100
+
+#else
 /* Cache-inhibited register offsets */
 #define BM_REG_RCR_PI_CINH	0x0000
 #define BM_REG_RCR_CI_CINH	0x0004
@@ -53,6 +74,7 @@
 #define BM_CL_RCR		0x1000
 #define BM_CL_RCR_PI_CENA	0x3000
 #define BM_CL_RCR_CI_CENA	0x3100
+#endif
 
 /*
  * Portal modes.
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 879e8f2..d67b8e1 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -41,6 +41,43 @@
 
 /* Portal register assists */
 
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+/* Cache-inhibited register offsets */
+#define QM_REG_EQCR_PI_CINH	0x3000
+#define QM_REG_EQCR_CI_CINH	0x3040
+#define QM_REG_EQCR_ITR		0x3080
+#define QM_REG_DQRR_PI_CINH	0x3100
+#define QM_REG_DQRR_CI_CINH	0x3140
+#define QM_REG_DQRR_ITR		0x3180
+#define QM_REG_DQRR_DCAP	0x31C0
+#define QM_REG_DQRR_SDQCR	0x3200
+#define QM_REG_DQRR_VDQCR	0x3240
+#define QM_REG_DQRR_PDQCR	0x3280
+#define QM_REG_MR_PI_CINH	0x3300
+#define QM_REG_MR_CI_CINH	0x3340
+#define QM_REG_MR_ITR		0x3380
+#define QM_REG_CFG		0x3500
+#define QM_REG_ISR		0x3600
+#define QM_REG_IER		0x3640
+#define QM_REG_ISDR		0x3680
+#define QM_REG_IIR		0x36C0
+#define QM_REG_ITPR		0x3740
+
+/* Cache-enabled register offsets */
+#define QM_CL_EQCR		0x0000
+#define QM_CL_DQRR		0x1000
+#define QM_CL_MR		0x2000
+#define QM_CL_EQCR_PI_CENA	0x3000
+#define QM_CL_EQCR_CI_CENA	0x3040
+#define QM_CL_DQRR_PI_CENA	0x3100
+#define QM_CL_DQRR_CI_CENA	0x3140
+#define QM_CL_MR_PI_CENA	0x3300
+#define QM_CL_MR_CI_CENA	0x3340
+#define QM_CL_CR		0x3800
+#define QM_CL_RR0		0x3900
+#define QM_CL_RR1		0x3940
+
+#else
 /* Cache-inhibited register offsets */
 #define QM_REG_EQCR_PI_CINH	0x0000
 #define QM_REG_EQCR_CI_CINH	0x0004
@@ -75,6 +112,7 @@
 #define QM_CL_CR		0x3800
 #define QM_CL_RR0		0x3900
 #define QM_CL_RR1		0x3940
+#endif
 
 /*
  * BTW, the drivers (and h/w programming model) already obtain the required
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/10] soc/qbman: Add missing headers on ARM
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (7 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 08/10] soc/qbman: different register offsets on ARM Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  2017-01-18 22:39 ` [PATCH 10/10] fsl/qbman: Enable FSL_LAYERSCAPE config " Roy Pledge
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Claudiu Manoil <claudiu.manoil@nxp.com>

Unlike PPC builds, ARM builds need following headers
explicitly:
+#include <linux/io.h>		for ioread32be()
+#include <linux/delay.h>		for udelay()

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
 drivers/soc/fsl/qbman/dpaa_sys.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index ae65758..6239e18 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -43,6 +43,8 @@
 #include <linux/prefetch.h>
 #include <linux/genalloc.h>
 #include <asm/cacheflush.h>
+#include <linux/io.h>
+#include <linux/delay.h>
 
 /* For 2-element tables related to cache-inhibited and cache-enabled mappings */
 #define DPAA_PORTAL_CE 0
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/10] fsl/qbman: Enable FSL_LAYERSCAPE config on ARM
  2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
                   ` (8 preceding siblings ...)
  2017-01-18 22:39 ` [PATCH 09/10] soc/qbman: Add missing headers " Roy Pledge
@ 2017-01-18 22:39 ` Roy Pledge
  9 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-01-18 22:39 UTC (permalink / raw)
  To: linux-arm-kernel

From: Madalin Bucur <madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
[Stuart: changed to use ARCH_LAYERSCAPE]
Signed-off-by: Stuart Yoder <stuart.yoder@nxp.com>
---
 drivers/soc/fsl/qbman/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig
index 757033c..fb4e6bf 100644
--- a/drivers/soc/fsl/qbman/Kconfig
+++ b/drivers/soc/fsl/qbman/Kconfig
@@ -1,6 +1,6 @@
 menuconfig FSL_DPAA
 	bool "Freescale DPAA 1.x support"
-	depends on FSL_SOC_BOOKE
+	depends on (FSL_SOC_BOOKE || ARCH_LAYERSCAPE)
 	select GENERIC_ALLOCATOR
 	help
 	  The Freescale Data Path Acceleration Architecture (DPAA) is a set of
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-18 22:39 ` [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range() Roy Pledge
@ 2017-01-18 23:12   ` Russell King - ARM Linux
  2017-01-18 23:36     ` Scott Wood
  0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-01-18 23:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote:
> From: Claudiu Manoil <claudiu.manoil@nxp.com>
> 
> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
> Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
> ---
>  drivers/soc/fsl/qbman/qman_ccsr.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
> index 43feaa9..67ae073 100644
> --- a/drivers/soc/fsl/qbman/qman_ccsr.c
> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c
> @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node,
>  		return -ENOMEM;
>  
>  	memset(tmpp, 0, sz);
> +#ifdef CONFIG_PPC
>  	flush_dcache_range((unsigned long)tmpp,
>  			   (unsigned long)tmpp + sz);
> +#elif defined(CONFIG_ARM)
> +	__cpuc_flush_dcache_area(tmpp, sz);

Please do not fiddle about under the covers, there be dragons there.

It looks to me like you're trying to use __cpuc_flush_dcache_area()
on an area that's been ioremap()'d, which is a waste of CPU cycles.
ioremap()'d areas are mapped as "device" memory type, which means
the region isn't even cached.  So I don't think this is necessary.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-18 23:12   ` Russell King - ARM Linux
@ 2017-01-18 23:36     ` Scott Wood
  2017-01-23 19:24       ` Roy Pledge
  2017-01-30 15:12       ` Russell King - ARM Linux
  0 siblings, 2 replies; 37+ messages in thread
From: Scott Wood @ 2017-01-18 23:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote:
> On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote:
>> From: Claudiu Manoil <claudiu.manoil@nxp.com>
>>
>> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
>> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
>> Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
>> ---
>>  drivers/soc/fsl/qbman/qman_ccsr.c |    6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
>> index 43feaa9..67ae073 100644
>> --- a/drivers/soc/fsl/qbman/qman_ccsr.c
>> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c
>> @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node,
>>  		return -ENOMEM;
>>  
>>  	memset(tmpp, 0, sz);
>> +#ifdef CONFIG_PPC
>>  	flush_dcache_range((unsigned long)tmpp,
>>  			   (unsigned long)tmpp + sz);
>> +#elif defined(CONFIG_ARM)
>> +	__cpuc_flush_dcache_area(tmpp, sz);
> 
> Please do not fiddle about under the covers, there be dragons there.
> 
> It looks to me like you're trying to use __cpuc_flush_dcache_area()
> on an area that's been ioremap()'d, which is a waste of CPU cycles.
> ioremap()'d areas are mapped as "device" memory type, which means
> the region isn't even cached.  So I don't think this is necessary.

It is mapped cachable, via memremap() (see patch 1/10).  This is RAM
that the device uses, non-coherently, for its own purposes (but requires
software to clear it first).

Is there a non-"under the covers" way to say "flush this region" without
the arch second-guessing whether it really needs to be flushed?

-Scott

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-18 23:36     ` Scott Wood
@ 2017-01-23 19:24       ` Roy Pledge
  2017-01-25 21:20         ` Arnd Bergmann
  2017-01-30 15:12       ` Russell King - ARM Linux
  1 sibling, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-01-23 19:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 1/18/2017 6:36 PM, Scott Wood wrote:
> On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote:
>> On Wed, Jan 18, 2017 at 05:39:36PM -0500, Roy Pledge wrote:
>>> From: Claudiu Manoil <claudiu.manoil@nxp.com>
>>>
>>> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
>>> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
>>> Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
>>> ---
>>>  drivers/soc/fsl/qbman/qman_ccsr.c |    6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
>>> index 43feaa9..67ae073 100644
>>> --- a/drivers/soc/fsl/qbman/qman_ccsr.c
>>> +++ b/drivers/soc/fsl/qbman/qman_ccsr.c
>>> @@ -446,8 +446,14 @@ static int zero_priv_mem(struct device *dev, struct device_node *node,
>>>  		return -ENOMEM;
>>>  
>>>  	memset(tmpp, 0, sz);
>>> +#ifdef CONFIG_PPC
>>>  	flush_dcache_range((unsigned long)tmpp,
>>>  			   (unsigned long)tmpp + sz);
>>> +#elif defined(CONFIG_ARM)
>>> +	__cpuc_flush_dcache_area(tmpp, sz);
>> Please do not fiddle about under the covers, there be dragons there.
>>
>> It looks to me like you're trying to use __cpuc_flush_dcache_area()
>> on an area that's been ioremap()'d, which is a waste of CPU cycles.
>> ioremap()'d areas are mapped as "device" memory type, which means
>> the region isn't even cached.  So I don't think this is necessary.
> It is mapped cachable, via memremap() (see patch 1/10).  This is RAM
> that the device uses, non-coherently, for its own purposes (but requires
> software to clear it first).
>
> Is there a non-"under the covers" way to say "flush this region" without
> the arch second-guessing whether it really needs to be flushed?
Any advice on how to resolve this? I looked into trying to do a
non-cacheable mapping of the
memory so that the flush wouldn't be required but the ioremap code
prevents mapping normal
memory in this way.  The QMan device requiresthis memory to be zeroed at
startup. Because the device
does non coherent reads and writes to the memory we must ensure that any
cache in the CPU cluster(s)
is flushed in order to prevent a future castout from overwriting data.
This happened before on PPC
platforms and made for some very unfun debug session trying to
understand what was causing the failure.

Roy

> -Scott
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-23 19:24       ` Roy Pledge
@ 2017-01-25 21:20         ` Arnd Bergmann
  2017-01-26  5:08           ` Scott Wood
  2017-01-30 15:04           ` Russell King - ARM Linux
  0 siblings, 2 replies; 37+ messages in thread
From: Arnd Bergmann @ 2017-01-25 21:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
> > Is there a non-"under the covers" way to say "flush this region" without
> > the arch second-guessing whether it really needs to be flushed?
> Any advice on how to resolve this? I looked into trying to do a
> non-cacheable mapping of the
> memory so that the flush wouldn't be required but the ioremap code
> prevents mapping normal
> memory in this way.  The QMan device requiresthis memory to be zeroed at
> startup. Because the device
> does non coherent reads and writes to the memory we must ensure that any
> cache in the CPU cluster(s)
> is flushed in order to prevent a future castout from overwriting data.
> This happened before on PPC
> platforms and made for some very unfun debug session trying to
> understand what was causing the failure.

If this is normal RAM, you should be able to just write zeroes, and then
do a dma_map_single() for initialization. Are there any other requirements
what to do with the memory later, is it used for communication at all,
or just required to be zero?

	Arnd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-25 21:20         ` Arnd Bergmann
@ 2017-01-26  5:08           ` Scott Wood
  2017-01-27 16:41             ` Arnd Bergmann
  2017-01-30 15:04           ` Russell King - ARM Linux
  1 sibling, 1 reply; 37+ messages in thread
From: Scott Wood @ 2017-01-26  5:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
>>> Is there a non-"under the covers" way to say "flush this region" without
>>> the arch second-guessing whether it really needs to be flushed?
>> Any advice on how to resolve this? I looked into trying to do a
>> non-cacheable mapping of the
>> memory so that the flush wouldn't be required but the ioremap code
>> prevents mapping normal
>> memory in this way.  The QMan device requiresthis memory to be zeroed at
>> startup. Because the device
>> does non coherent reads and writes to the memory we must ensure that any
>> cache in the CPU cluster(s)
>> is flushed in order to prevent a future castout from overwriting data.
>> This happened before on PPC
>> platforms and made for some very unfun debug session trying to
>> understand what was causing the failure.
> 
> If this is normal RAM, you should be able to just write zeroes, and then
> do a dma_map_single() for initialization.

The DMA API on PPC currently has no way of knowing that the device
accesses this memory incoherently.

If that were somehow fixed, we still couldn't use dma_map_single() as it
doesn't accept virtual addresses that come from memremap() or similar
dynamic mappings.  We'd have to convert the physical address to an array
of struct pages and pass each one to dma_map_page().

And even if we did all that, there would still be other manual cache
manipulation left in this driver, to deal with its cacheable register
interface.

> Are there any other requirements
> what to do with the memory later, is it used for communication at all,
> or just required to be zero?

It just needs to be zeroed before the device begins using it for
internal storage.

-Scott

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-26  5:08           ` Scott Wood
@ 2017-01-27 16:41             ` Arnd Bergmann
  2017-01-28  2:34               ` Scott Wood
  2017-01-30 15:19               ` Russell King - ARM Linux
  0 siblings, 2 replies; 37+ messages in thread
From: Arnd Bergmann @ 2017-01-27 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
> On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:

>> If this is normal RAM, you should be able to just write zeroes, and then
>> do a dma_map_single() for initialization.
>
> The DMA API on PPC currently has no way of knowing that the device
> accesses this memory incoherently.

Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
but just assumes that coherency is a global property of the platform,right?

> If that were somehow fixed, we still couldn't use dma_map_single() as it
> doesn't accept virtual addresses that come from memremap() or similar
> dynamic mappings.  We'd have to convert the physical address to an array
> of struct pages and pass each one to dma_map_page().

Sorry for my ignorance, but where does the memory come from to start
with? Is this in the normal linearly mapped RAM, in a carveout outside
of the linear mapping but the same memory, or in some on-chip buffer?

> And even if we did all that, there would still be other manual cache
> manipulation left in this driver, to deal with its cacheable register
> interface.

I thought we had concluded that "cacheable register" is something
that cannot work reliably on ARM at all when this came up before.
Any updates on that?

>> Are there any other requirements
>> what to do with the memory later, is it used for communication at all,
>> or just required to be zero?
>
> It just needs to be zeroed before the device begins using it for
> internal storage.

Ok.

    Arnd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-27 16:41             ` Arnd Bergmann
@ 2017-01-28  2:34               ` Scott Wood
  2017-01-30 15:31                 ` Robin Murphy
  2017-01-30 15:19               ` Russell King - ARM Linux
  1 sibling, 1 reply; 37+ messages in thread
From: Scott Wood @ 2017-01-28  2:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
> > 
> > On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
> > > 
> > > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
> > 
> > > 
> > > If this is normal RAM, you should be able to just write zeroes, and then
> > > do a dma_map_single() for initialization.
> > The DMA API on PPC currently has no way of knowing that the device
> > accesses this memory incoherently.
> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
> but just assumes that coherency is a global property of the platform,right?

Right.

> > If that were somehow fixed, we still couldn't use dma_map_single() as it
> > doesn't accept virtual addresses that come from memremap() or similar
> > dynamic mappings.??We'd have to convert the physical address to an array
> > of struct pages and pass each one to dma_map_page().
> Sorry for my ignorance, but where does the memory come from to start
> with? Is this in the normal linearly mapped RAM, in a carveout outside
> of the linear mapping but the same memory, or in some on-chip buffer?

It's RAM that comes from the device tree reserved memory mechanism
(drivers/of/of_reserved_mem.c). ?On a 32-bit kernel it is not guaranteed (or
likely) to be lowmem.

> > And even if we did all that, there would still be other manual cache
> > manipulation left in this driver, to deal with its cacheable register
> > interface.
> I thought we had concluded that "cacheable register" is something
> that cannot work reliably on ARM at all when this came up before.
> Any updates on that?

I'm not familiar with the details there... ?My understanding is that the
hardware people at NXP are convinced that it can work on these specific chips
due to implementation details.

-Scott

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-25 21:20         ` Arnd Bergmann
  2017-01-26  5:08           ` Scott Wood
@ 2017-01-30 15:04           ` Russell King - ARM Linux
  2017-02-01 12:52             ` Arnd Bergmann
  1 sibling, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-01-30 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 25, 2017 at 10:20:09PM +0100, Arnd Bergmann wrote:
> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
> > > Is there a non-"under the covers" way to say "flush this region" without
> > > the arch second-guessing whether it really needs to be flushed?
> > Any advice on how to resolve this? I looked into trying to do a
> > non-cacheable mapping of the
> > memory so that the flush wouldn't be required but the ioremap code
> > prevents mapping normal
> > memory in this way.  The QMan device requiresthis memory to be zeroed at
> > startup. Because the device
> > does non coherent reads and writes to the memory we must ensure that any
> > cache in the CPU cluster(s)
> > is flushed in order to prevent a future castout from overwriting data.
> > This happened before on PPC
> > platforms and made for some very unfun debug session trying to
> > understand what was causing the failure.
> 
> If this is normal RAM, you should be able to just write zeroes, and then
> do a dma_map_single() for initialization. Are there any other requirements
> what to do with the memory later, is it used for communication at all,
> or just required to be zero?

Please don't encourage incorrect DMA API usage.

dma_map_single() must always be paired with an unmap at some point,
otherwise it's a memory leak if DMA API debugging is enabled.  So,
if you wish to suggest using dma_map_single(), please also suggest
where to use dma_unmap_single() too.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-18 23:36     ` Scott Wood
  2017-01-23 19:24       ` Roy Pledge
@ 2017-01-30 15:12       ` Russell King - ARM Linux
  1 sibling, 0 replies; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-01-30 15:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 18, 2017 at 11:36:18PM +0000, Scott Wood wrote:
> On 01/18/2017 05:12 PM, Russell King - ARM Linux wrote:
> > Please do not fiddle about under the covers, there be dragons there.
> > 
> > It looks to me like you're trying to use __cpuc_flush_dcache_area()
> > on an area that's been ioremap()'d, which is a waste of CPU cycles.
> > ioremap()'d areas are mapped as "device" memory type, which means
> > the region isn't even cached.  So I don't think this is necessary.
> 
> It is mapped cachable, via memremap() (see patch 1/10).  This is RAM
> that the device uses, non-coherently, for its own purposes (but requires
> software to clear it first).

Ok.

> Is there a non-"under the covers" way to say "flush this region" without
> the arch second-guessing whether it really needs to be flushed?

memremap() doesn't have any associated cache flushing functionality
defined for it - it's a relatively new interface added to the kernel.

There are some interfaces defined for memory that was vmalloc'd, but
due to the way kernel/memremap.c works, you can't guarantee that the
pointer you end up with came from vmalloc'd space, so I don't think
suggesting to use those would be correct (as they may have assumptions
that they'll only be called with vmalloc'd addresses.)

As pointed out later in the thread, using dma_map_single() on the
returned address is not permissible either.

The only thing I can say is that it needs a proposal for a new arch-
independent interface.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-27 16:41             ` Arnd Bergmann
  2017-01-28  2:34               ` Scott Wood
@ 2017-01-30 15:19               ` Russell King - ARM Linux
  2017-02-01 12:47                 ` Arnd Bergmann
  1 sibling, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-01-30 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 27, 2017 at 05:41:10PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
> > And even if we did all that, there would still be other manual cache
> > manipulation left in this driver, to deal with its cacheable register
> > interface.
> 
> I thought we had concluded that "cacheable register" is something
> that cannot work reliably on ARM at all when this came up before.
> Any updates on that?

There's a big comment in arch/arm/include/asm/io.h which explains
everything.  Nothing there is likely to change.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-28  2:34               ` Scott Wood
@ 2017-01-30 15:31                 ` Robin Murphy
  2017-01-30 19:04                   ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Robin Murphy @ 2017-01-30 15:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/01/17 02:34, Scott Wood wrote:
> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
>>>
>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
>>>>
>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
>>>
>>>>
>>>> If this is normal RAM, you should be able to just write zeroes, and then
>>>> do a dma_map_single() for initialization.
>>> The DMA API on PPC currently has no way of knowing that the device
>>> accesses this memory incoherently.
>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
>> but just assumes that coherency is a global property of the platform,right?
> 
> Right.
> 
>>> If that were somehow fixed, we still couldn't use dma_map_single() as it
>>> doesn't accept virtual addresses that come from memremap() or similar
>>> dynamic mappings.  We'd have to convert the physical address to an array
>>> of struct pages and pass each one to dma_map_page().
>> Sorry for my ignorance, but where does the memory come from to start
>> with? Is this in the normal linearly mapped RAM, in a carveout outside
>> of the linear mapping but the same memory, or in some on-chip buffer?
> 
> It's RAM that comes from the device tree reserved memory mechanism
> (drivers/of/of_reserved_mem.c).  On a 32-bit kernel it is not guaranteed (or
> likely) to be lowmem.

Wouldn't dma_declare_coherent_memory() be the appropriate tool for that
job, then (modulo the PPC issue)? On ARM that should result in
dma_alloc_coherent() giving back a non-cacheable mapping if the device
is non-coherent, wherein a dmb_wmb() after writing the data from the CPU
side should be enough to ensure it is published to the device.

Robin.

>>> And even if we did all that, there would still be other manual cache
>>> manipulation left in this driver, to deal with its cacheable register
>>> interface.
>> I thought we had concluded that "cacheable register" is something
>> that cannot work reliably on ARM at all when this came up before.
>> Any updates on that?
> 
> I'm not familiar with the details there...  My understanding is that the
> hardware people at NXP are convinced that it can work on these specific chips
> due to implementation details.
> 
> -Scott
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-30 15:31                 ` Robin Murphy
@ 2017-01-30 19:04                   ` Roy Pledge
  2017-02-01 13:03                     ` Robin Murphy
  0 siblings, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-01-30 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 1/30/2017 10:31 AM, Robin Murphy wrote:
> On 28/01/17 02:34, Scott Wood wrote:
>> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
>>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
>>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
>>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
>>>>> If this is normal RAM, you should be able to just write zeroes, and then
>>>>> do a dma_map_single() for initialization.
>>>> The DMA API on PPC currently has no way of knowing that the device
>>>> accesses this memory incoherently.
>>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
>>> but just assumes that coherency is a global property of the platform,right?
>> Right.
>>
>>>> If that were somehow fixed, we still couldn't use dma_map_single() as it
>>>> doesn't accept virtual addresses that come from memremap() or similar
>>>> dynamic mappings.  We'd have to convert the physical address to an array
>>>> of struct pages and pass each one to dma_map_page().
>>> Sorry for my ignorance, but where does the memory come from to start
>>> with? Is this in the normal linearly mapped RAM, in a carveout outside
>>> of the linear mapping but the same memory, or in some on-chip buffer?
>> It's RAM that comes from the device tree reserved memory mechanism
>> (drivers/of/of_reserved_mem.c).  On a 32-bit kernel it is not guaranteed (or
>> likely) to be lowmem.
> Wouldn't dma_declare_coherent_memory() be the appropriate tool for that
> job, then (modulo the PPC issue)? On ARM that should result in
> dma_alloc_coherent() giving back a non-cacheable mapping if the device
> is non-coherent, wherein a dmb_wmb() after writing the data from the CPU
> side should be enough to ensure it is published to the device.
I think there is some confusion here (and it may very well be mine).

My understanding is that the dma_declare_coherent_memory() API sets up a
region of memory that will be managed by dma_alloc_coherent() and
friends.  This is useful if the driver needs to manage a region of on
device memory but that isn't what this specific region is used for.

The memory that is trying to be initialized here is a big chunk
(possible multiple megabytes) of RAM that only the QBMan device will
access with one exception - at initialization the device expects
software to zero the memory before starting to use the device. Since the
CPUs will never access this region again the device does
non-coherent/non-shareable accesses for performance reasons since QBMan
device is the only user - no need to maintain coherency with core side
caches.

We used the of_reserve memory as that seemed to be the best reliable way
to guarantee that we would get a properly aligned contiguous allocation
and it's been working well.  The downside is that the contents of that
memory is undefined so we had to map it, zero it and flush the cache in
order to get the RAM into the desired state and make sure we don't get
hit by a random castout in the future.

Would it make sense to add an option in the of_reserved_mem system to
fill the memory?  I haven't looked at the feasibility of that but it
seems like a generic solution that could be useful to others.  We could
add the fill value to the device tree so you could initialize to any
pattern.

- Roy

>
> Robin.
>
>>>> And even if we did all that, there would still be other manual cache
>>>> manipulation left in this driver, to deal with its cacheable register
>>>> interface.
>>> I thought we had concluded that "cacheable register" is something
>>> that cannot work reliably on ARM at all when this came up before.
>>> Any updates on that?
>> I'm not familiar with the details there...  My understanding is that the
>> hardware people at NXP are convinced that it can work on these specific chips
>> due to implementation details.
>>
>> -Scott
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-30 15:19               ` Russell King - ARM Linux
@ 2017-02-01 12:47                 ` Arnd Bergmann
  2017-02-01 23:16                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 37+ messages in thread
From: Arnd Bergmann @ 2017-02-01 12:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday, January 30, 2017 3:19:33 PM CET Russell King - ARM Linux wrote:
> On Fri, Jan 27, 2017 at 05:41:10PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
> > > And even if we did all that, there would still be other manual cache
> > > manipulation left in this driver, to deal with its cacheable register
> > > interface.
> > 
> > I thought we had concluded that "cacheable register" is something
> > that cannot work reliably on ARM at all when this came up before.
> > Any updates on that?
> 
> There's a big comment in arch/arm/include/asm/io.h which explains
> everything.  Nothing there is likely to change.

Ah right, so since there is no interface to map as "device writeback",
the driver would likely end up with an MMIO register in a
"normal writeback", with these properties (among others listed there):

 * - writes can be repeated (in certain cases) with no side effects
 * - writes can be merged before accessing the target
 * - unaligned accesses can be supported
 * - ordering is not guaranteed without explicit dependencies or barrier
 *   instructions
 * - writes may be delayed before they hit the endpoint memory

I think the ordering is the most critical here, as IIRC the driver
expects a whole cache line to be written into the MMIO register
in a single bus cycle specifically when we issue the flush from the
driver, but we can actually have partial writebacks at any time.

	Arnd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-30 15:04           ` Russell King - ARM Linux
@ 2017-02-01 12:52             ` Arnd Bergmann
  0 siblings, 0 replies; 37+ messages in thread
From: Arnd Bergmann @ 2017-02-01 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday, January 30, 2017 3:04:28 PM CET Russell King - ARM Linux wrote:
> On Wed, Jan 25, 2017 at 10:20:09PM +0100, Arnd Bergmann wrote:
> > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
> > > > Is there a non-"under the covers" way to say "flush this region" without
> > > > the arch second-guessing whether it really needs to be flushed?
> > > Any advice on how to resolve this? I looked into trying to do a
> > > non-cacheable mapping of the
> > > memory so that the flush wouldn't be required but the ioremap code
> > > prevents mapping normal
> > > memory in this way.  The QMan device requiresthis memory to be zeroed at
> > > startup. Because the device
> > > does non coherent reads and writes to the memory we must ensure that any
> > > cache in the CPU cluster(s)
> > > is flushed in order to prevent a future castout from overwriting data.
> > > This happened before on PPC
> > > platforms and made for some very unfun debug session trying to
> > > understand what was causing the failure.
> > 
> > If this is normal RAM, you should be able to just write zeroes, and then
> > do a dma_map_single() for initialization. Are there any other requirements
> > what to do with the memory later, is it used for communication at all,
> > or just required to be zero?
> 
> Please don't encourage incorrect DMA API usage.
> 
> dma_map_single() must always be paired with an unmap at some point,
> otherwise it's a memory leak if DMA API debugging is enabled.  So,
> if you wish to suggest using dma_map_single(), please also suggest
> where to use dma_unmap_single() too.

The mapping obviously has to exist the whole time the device is in use,
and we should not call dma_unmap_single() until the device is released
by the driver.

Unfortunately it seems we can't use this interface though, as the device
is already configured to a specific reserved memory range, and we can't
just use get_free_pages() here.

	Arnd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-01-30 19:04                   ` Roy Pledge
@ 2017-02-01 13:03                     ` Robin Murphy
  2017-02-01 22:51                       ` Scott Wood
  2017-02-06 22:26                       ` Roy Pledge
  0 siblings, 2 replies; 37+ messages in thread
From: Robin Murphy @ 2017-02-01 13:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/01/17 19:04, Roy Pledge wrote:
> On 1/30/2017 10:31 AM, Robin Murphy wrote:
>> On 28/01/17 02:34, Scott Wood wrote:
>>> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
>>>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com> wrote:
>>>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
>>>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
>>>>>> If this is normal RAM, you should be able to just write zeroes, and then
>>>>>> do a dma_map_single() for initialization.
>>>>> The DMA API on PPC currently has no way of knowing that the device
>>>>> accesses this memory incoherently.
>>>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
>>>> but just assumes that coherency is a global property of the platform,right?
>>> Right.
>>>
>>>>> If that were somehow fixed, we still couldn't use dma_map_single() as it
>>>>> doesn't accept virtual addresses that come from memremap() or similar
>>>>> dynamic mappings.  We'd have to convert the physical address to an array
>>>>> of struct pages and pass each one to dma_map_page().
>>>> Sorry for my ignorance, but where does the memory come from to start
>>>> with? Is this in the normal linearly mapped RAM, in a carveout outside
>>>> of the linear mapping but the same memory, or in some on-chip buffer?
>>> It's RAM that comes from the device tree reserved memory mechanism
>>> (drivers/of/of_reserved_mem.c).  On a 32-bit kernel it is not guaranteed (or
>>> likely) to be lowmem.
>> Wouldn't dma_declare_coherent_memory() be the appropriate tool for that
>> job, then (modulo the PPC issue)? On ARM that should result in
>> dma_alloc_coherent() giving back a non-cacheable mapping if the device
>> is non-coherent, wherein a dmb_wmb() after writing the data from the CPU
>> side should be enough to ensure it is published to the device.
> I think there is some confusion here (and it may very well be mine).
> 
> My understanding is that the dma_declare_coherent_memory() API sets up a
> region of memory that will be managed by dma_alloc_coherent() and
> friends.  This is useful if the driver needs to manage a region of on
> device memory but that isn't what this specific region is used for.

It's a bit more general than that - dma_alloc_coherent() can essentially
be considered "give me some memory for this device to use". We already
have use-cases where such buffers are only ever accessed by the device
(e.g. some display controllers, and future NVMe devices), hence
DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space.

A DMA allocation also inherently guarantees appropriate alignment,
regardless of whether you're using a per-device reservation or just
regular CMA, and will also zero the underlying memory (and for a
non-coherent device perform whatever cache maintenance is necessary, if
the clearing isn't already done via a non-cacheable mapping).

All you need to do in the driver is allocate your buffer and hand the
resulting address off to the device at probe (after optionally checking
for a reservation in DT and declaring it), then free it at remove, which
also ends up far more self-documenting (IMO) than a bunch of open-coded
remapping and #ifdef'ed architecture-private cache shenanigans.

> The memory that is trying to be initialized here is a big chunk
> (possible multiple megabytes) of RAM that only the QBMan device will
> access with one exception - at initialization the device expects
> software to zero the memory before starting to use the device. Since the
> CPUs will never access this region again the device does
> non-coherent/non-shareable accesses for performance reasons since QBMan
> device is the only user - no need to maintain coherency with core side
> caches.
> 
> We used the of_reserve memory as that seemed to be the best reliable way
> to guarantee that we would get a properly aligned contiguous allocation
> and it's been working well.  The downside is that the contents of that
> memory is undefined so we had to map it, zero it and flush the cache in
> order to get the RAM into the desired state and make sure we don't get
> hit by a random castout in the future.
> 
> Would it make sense to add an option in the of_reserved_mem system to
> fill the memory?  I haven't looked at the feasibility of that but it
> seems like a generic solution that could be useful to others.  We could
> add the fill value to the device tree so you could initialize to any
> pattern.

In short, said generic solution is right there already, only the PPC
arch code might need tweaking to accommodate it :)

Robin.

> 
> - Roy
> 
>>
>> Robin.
>>
>>>>> And even if we did all that, there would still be other manual cache
>>>>> manipulation left in this driver, to deal with its cacheable register
>>>>> interface.
>>>> I thought we had concluded that "cacheable register" is something
>>>> that cannot work reliably on ARM at all when this came up before.
>>>> Any updates on that?
>>> I'm not familiar with the details there...  My understanding is that the
>>> hardware people at NXP are convinced that it can work on these specific chips
>>> due to implementation details.
>>>
>>> -Scott
>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-01 13:03                     ` Robin Murphy
@ 2017-02-01 22:51                       ` Scott Wood
  2017-02-06 22:26                       ` Roy Pledge
  1 sibling, 0 replies; 37+ messages in thread
From: Scott Wood @ 2017-02-01 22:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-02-01 at 13:03 +0000, Robin Murphy wrote:
> On 30/01/17 19:04, Roy Pledge wrote:
> > 
> > On 1/30/2017 10:31 AM, Robin Murphy wrote:
> > > 
> > > On 28/01/17 02:34, Scott Wood wrote:
> > > > 
> > > > On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
> > > > > 
> > > > > On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood@nxp.com>
> > > > > wrote:
> > > > > > 
> > > > > > On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
> > > > > > > 
> > > > > > > On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
> > > > > > > If this is normal RAM, you should be able to just write zeroes,
> > > > > > > and then
> > > > > > > do a dma_map_single() for initialization.
> > > > > > The DMA API on PPC currently has no way of knowing that the device
> > > > > > accesses this memory incoherently.
> > > > > Ah, this is because PPC doesn't use the 'dma-coherent' property on
> > > > > devices
> > > > > but just assumes that coherency is a global property of the
> > > > > platform,right?
> > > > Right.
> > > > 
> > > > > 
> > > > > > 
> > > > > > If that were somehow fixed, we still couldn't use dma_map_single()
> > > > > > as it
> > > > > > doesn't accept virtual addresses that come from memremap() or
> > > > > > similar
> > > > > > dynamic mappings.??We'd have to convert the physical address to an
> > > > > > array
> > > > > > of struct pages and pass each one to dma_map_page().
> > > > > Sorry for my ignorance, but where does the memory come from to start
> > > > > with? Is this in the normal linearly mapped RAM, in a carveout
> > > > > outside
> > > > > of the linear mapping but the same memory, or in some on-chip
> > > > > buffer?
> > > > It's RAM that comes from the device tree reserved memory mechanism
> > > > (drivers/of/of_reserved_mem.c).??On a 32-bit kernel it is not
> > > > guaranteed (or
> > > > likely) to be lowmem.
> > > Wouldn't dma_declare_coherent_memory() be the appropriate tool for that
> > > job, then (modulo the PPC issue)? On ARM that should result in
> > > dma_alloc_coherent() giving back a non-cacheable mapping if the device
> > > is non-coherent, wherein a dmb_wmb() after writing the data from the CPU
> > > side should be enough to ensure it is published to the device.
> > I think there is some confusion here (and it may very well be mine).
> > 
> > My understanding is that the dma_declare_coherent_memory() API sets up a
> > region of memory that will be managed by dma_alloc_coherent() and
> > friends.??This is useful if the driver needs to manage a region of on
> > device memory but that isn't what this specific region is used for.
> It's a bit more general than that - dma_alloc_coherent() can essentially
> be considered "give me some memory for this device to use". We already
> have use-cases where such buffers are only ever accessed by the device
> (e.g. some display controllers, and future NVMe devices), hence
> DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space.

That doesn't deal with the fact that on PPC the DMA API will assume that DMA
is coherent -- and in fact providing non-cacheable memory is difficult on PPC
because the memory is covered by large-page cacheable mappings and mixing
cacheable and non-cacheable mappings is strictly forbidden (not just in terms
of coherence -- I've seen mysterious machine checks generated).

The idea behind the DMA API is that the platform knows better than the driver
how DMA needs to be handled, and usually that's correct -- but sometimes, with
integrated SoC devices whose programming model was designed from the
perspective of the entire platform, it isn't.

> A DMA allocation also inherently guarantees appropriate alignment,
> regardless of whether you're using a per-device reservation or just
> regular CMA, 

When "appropriate alignment" is many megabytes, you're more likely to waste
memory when you try to guarantee alignment on a secondary allocation than when
you're doing the aligned allocation directly from the main memory pool.

And how exactly does the DMA API know what alignment this particular
allocation needs? ?dma_alloc_coherent() doesn't take alignment as a parameter.

> and will also zero the underlying memory (and for a
> non-coherent device perform whatever cache maintenance is necessary, if
> the clearing isn't already done via a non-cacheable mapping).
> 
> All you need to do in the driver is allocate your buffer and hand the
> resulting address off to the device at probe (after optionally checking
> for a reservation in DT and declaring it), then free it at remove, which
> also ends up far more self-documenting (IMO)

It might be more self-documenting but as I pointed out earlier in this thread
it doesn't *work* without PPC arch work, and due to the mapping issues
mentioned above fixing the PPC arch (and in particular, this subarch) to
handle this would be difficult.

>  than a bunch of open-coded remapping and #ifdef'ed architecture-private
> cache shenanigans.

The only reason for the ifdefs is because arches can't agree on what to call
the function that actually does an unconditional cache flush.

And as I also pointed out earlier, this is not the only place where this
driver needs to do cache flushing.

-Scott

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-01 12:47                 ` Arnd Bergmann
@ 2017-02-01 23:16                   ` Russell King - ARM Linux
  2017-02-02 17:21                     ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-02-01 23:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 01, 2017 at 01:47:46PM +0100, Arnd Bergmann wrote:
> Ah right, so since there is no interface to map as "device writeback",
> the driver would likely end up with an MMIO register in a
> "normal writeback", with these properties (among others listed there):

It's not about there being no _interface_, there's no support in the
architecture for it.  The use of the "device" memory type does not
allow for any form of cache hints to be specified.

>  * - writes can be repeated (in certain cases) with no side effects
>  * - writes can be merged before accessing the target
>  * - unaligned accesses can be supported
>  * - ordering is not guaranteed without explicit dependencies or barrier
>  *   instructions
>  * - writes may be delayed before they hit the endpoint memory
> 
> I think the ordering is the most critical here, as IIRC the driver
> expects a whole cache line to be written into the MMIO register
> in a single bus cycle specifically when we issue the flush from the
> driver, but we can actually have partial writebacks at any time.

I don't think that's achievable, unless your data bus is as wide as
a cache line (iow, for a 32-byte cache line, your data bus all the
way to the peripheral would need to be 256 bits wide to write the
cache line in a single bus cycle.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-01 23:16                   ` Russell King - ARM Linux
@ 2017-02-02 17:21                     ` Roy Pledge
  0 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-02-02 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 2/1/2017 6:17 PM, Russell King - ARM Linux wrote:
> On Wed, Feb 01, 2017 at 01:47:46PM +0100, Arnd Bergmann wrote:
>> Ah right, so since there is no interface to map as "device writeback",
>> the driver would likely end up with an MMIO register in a
>> "normal writeback", with these properties (among others listed there):
> It's not about there being no _interface_, there's no support in the
> architecture for it.  The use of the "device" memory type does not
> allow for any form of cache hints to be specified.
>
>>  * - writes can be repeated (in certain cases) with no side effects
>>  * - writes can be merged before accessing the target
>>  * - unaligned accesses can be supported
>>  * - ordering is not guaranteed without explicit dependencies or barrier
>>  *   instructions
>>  * - writes may be delayed before they hit the endpoint memory
>>
>> I think the ordering is the most critical here, as IIRC the driver
>> expects a whole cache line to be written into the MMIO register
>> in a single bus cycle specifically when we issue the flush from the
>> driver, but we can actually have partial writebacks at any time.
> I don't think that's achievable, unless your data bus is as wide as
> a cache line (iow, for a 32-byte cache line, your data bus all the
> way to the peripheral would need to be 256 bits wide to write the
> cache line in a single bus cycle.)
>
The device doesn't expect a whole cache line to be written at once.  The
protocol protects against cast outs/write backs by using an alternating
valid bit that is set after a memory barrier is executed.  The driver
ensures that the valid bit is flipped last and the ARM architecture
protects that the device sees the writes in the proper order due to the
barrier. In this case portal memory is on device memory so the writes
from the CPU are backed by the device - there is no read transaction
from the device for any portal memory.

I also want to point out that the memory we've been discussion for the
flush has nothing to do with the software portal mechanism, the memory
here is for queue storage when queues grow in length.  All this patch is
trying to do is zero memory obtained from the of_reserved memory systems
and ensure it isn't in the cache once the device starts using it. I will
try looking at the dma_alloc_coherent() suggestions from Robin but I
beleive when we tried that before it couldn't deal with the multiple
megabyte allocations due to fragmentation.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-01 13:03                     ` Robin Murphy
  2017-02-01 22:51                       ` Scott Wood
@ 2017-02-06 22:26                       ` Roy Pledge
  2017-02-06 22:37                         ` Russell King - ARM Linux
  1 sibling, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-02-06 22:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 2/1/2017 8:03 AM, Robin Murphy wrote:
>
>> My understanding is that the dma_declare_coherent_memory() API sets up a
>> region of memory that will be managed by dma_alloc_coherent() and
>> friends.  This is useful if the driver needs to manage a region of on
>> device memory but that isn't what this specific region is used for.
> It's a bit more general than that - dma_alloc_coherent() can essentially
> be considered "give me some memory for this device to use". We already
> have use-cases where such buffers are only ever accessed by the device
> (e.g. some display controllers, and future NVMe devices), hence
> DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space.
>
> A DMA allocation also inherently guarantees appropriate alignment,
> regardless of whether you're using a per-device reservation or just
> regular CMA, and will also zero the underlying memory (and for a
> non-coherent device perform whatever cache maintenance is necessary, if
> the clearing isn't already done via a non-cacheable mapping).
>
> All you need to do in the driver is allocate your buffer and hand the
> resulting address off to the device at probe (after optionally checking
> for a reservation in DT and declaring it), then free it at remove, which
> also ends up far more self-documenting (IMO) than a bunch of open-coded
> remapping and #ifdef'ed architecture-private cache shenanigans.
I found some time to work on this but the dma_declare_coherent_memory()
API does not allow me to pass normal memory for the device to use.

Depending on which flag is used I get one of two behaviors. If I pass
DMA_MEMORY_MAP the path attempts to do a memremap() which fails with the
message "memremap attempted on ram". If I use DMA_MEMORY_IO then it
attempts to use ioremap() on the memory which fails for a similar reason
(Not allowed to map ram).  Using dma_alloc_coherent() fails in our basic
use case because it is unable to obtain 8 megabytes of RAM.

Why don't I simplify this and use dma_sync_single_for_device() to flush
the cache? This seems to me to be portable and the simplest solution to
this problem. The device reports non-coherent when queried using 
is_device_dma_coherent() and my very quick walk through the code seems
to indicate that the call will do the right thing.

Roy

Roy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-06 22:26                       ` Roy Pledge
@ 2017-02-06 22:37                         ` Russell King - ARM Linux
  2017-02-07 16:44                           ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2017-02-06 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote:
> Why don't I simplify this and use dma_sync_single_for_device() to flush
> the cache?

... which can only be used on direct-mapped memory.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-06 22:37                         ` Russell King - ARM Linux
@ 2017-02-07 16:44                           ` Roy Pledge
  2017-02-07 18:25                             ` Robin Murphy
  0 siblings, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-02-07 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote:
> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote:
>> Why don't I simplify this and use dma_sync_single_for_device() to flush
>> the cache?
> ... which can only be used on direct-mapped memory.
>
Then I don't see another (or understand) any other option. There is no
way I can add a large area of physical memory carved out by
of_reserve_mem to a DMA allocator since both ioremap and memremap throw
an error if I try to do so. I appreciate the suggestions from people to
resolve this but so far there isn't a workable solution to the problem.
As far as I can tell other users of dma_declare_coherent_memory() are
using it for device backed memory, not DDR.  It sounds like a solution
on the surface but when you dig into the details it just doesn't work. 
Please let me know if I'm missing something, I really want to close this
issue so we can get upstream support for this device on ARM.


Roy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-07 16:44                           ` Roy Pledge
@ 2017-02-07 18:25                             ` Robin Murphy
  2017-02-13 21:26                               ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Robin Murphy @ 2017-02-07 18:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/02/17 16:44, Roy Pledge wrote:
> On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote:
>> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote:
>>> Why don't I simplify this and use dma_sync_single_for_device() to flush
>>> the cache?
>> ... which can only be used on direct-mapped memory.
>>
> Then I don't see another (or understand) any other option. There is no
> way I can add a large area of physical memory carved out by
> of_reserve_mem to a DMA allocator since both ioremap and memremap throw
> an error if I try to do so. I appreciate the suggestions from people to
> resolve this but so far there isn't a workable solution to the problem.
> As far as I can tell other users of dma_declare_coherent_memory() are
> using it for device backed memory, not DDR.  It sounds like a solution
> on the surface but when you dig into the details it just doesn't work. 
> Please let me know if I'm missing something, I really want to close this
> issue so we can get upstream support for this device on ARM.

Apologies, having found time to take a proper look, it transpires I was
somewhat misremembering things. You're right that
dma_declare_coherent_memory() is for resources owned by the device, and
a DT reservation doesn't really count as such. What I had mixed up is
that a reserved-memory node has its own way of getting tied into CMA
automatically: ensure it has the "shared-dma-pool" compatible, point a
memory-region property at it from your device's node, then have the
driver call of_reserved_mem_device_init() - at that point, your DMA
allocations should be coming out of the given region so you should be
free to just grab the whole lot in one go.

I've just confirmed that experimentally by bodging a reserved-memory
node into my Juno DT for the ARM HDLCD - another non-coherent device
which needs ~8MB contiguous buffers - and it works as expected.

Robin.

> 
> 
> Roy
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-07 18:25                             ` Robin Murphy
@ 2017-02-13 21:26                               ` Roy Pledge
  2017-03-16  0:43                                 ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-02-13 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 2/7/2017 1:26 PM, Robin Murphy wrote:
> On 07/02/17 16:44, Roy Pledge wrote:
>> On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote:
>>> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote:
>>>> Why don't I simplify this and use dma_sync_single_for_device() to flush
>>>> the cache?
>>> ... which can only be used on direct-mapped memory.
>>>
>> Then I don't see another (or understand) any other option. There is no
>> way I can add a large area of physical memory carved out by
>> of_reserve_mem to a DMA allocator since both ioremap and memremap throw
>> an error if I try to do so. I appreciate the suggestions from people to
>> resolve this but so far there isn't a workable solution to the problem.
>> As far as I can tell other users of dma_declare_coherent_memory() are
>> using it for device backed memory, not DDR.  It sounds like a solution
>> on the surface but when you dig into the details it just doesn't work. 
>> Please let me know if I'm missing something, I really want to close this
>> issue so we can get upstream support for this device on ARM.
> Apologies, having found time to take a proper look, it transpires I was
> somewhat misremembering things. You're right that
> dma_declare_coherent_memory() is for resources owned by the device, and
> a DT reservation doesn't really count as such. What I had mixed up is
> that a reserved-memory node has its own way of getting tied into CMA
> automatically: ensure it has the "shared-dma-pool" compatible, point a
> memory-region property at it from your device's node, then have the
> driver call of_reserved_mem_device_init() - at that point, your DMA
> allocations should be coming out of the given region so you should be
> free to just grab the whole lot in one go.
>
> I've just confirmed that experimentally by bodging a reserved-memory
> node into my Juno DT for the ARM HDLCD - another non-coherent device
> which needs ~8MB contiguous buffers - and it works as expected.
>
> Robin.
This makes sense to me now, but I run into an issue when I try this on
my development system.

I added 'compatible = "shared-dma-pool" into my device tree and the set
the memory-region property for my device and I see this now at boottime:

[    0.000000] Reserved memory: created DMA memory pool at
0x00000009ff000000, size 8 MiB
[    0.000000] OF: reserved mem: initialized node qman-fqd, compatible
id shared-dma-pool

but when I call of_reserved_mem_device_init() during my device probe
routine it fails as follows:

[    0.738814] memremap attempted on ram 0x00000009ff000000 size: 0x800000
[    0.745485] ------------[ cut here ]------------
[    0.750132] WARNING: CPU: 0 PID: 1 at kernel/memremap.c:111
memremap+0x100/0x144

<....cut out a bunch of register dumps....>

[    0.973780] [<ffff00000816a748>] memremap+0x100/0x144
[    0.978859] [<ffff00000853c1d0>] dma_init_coherent_memory+0x60/0x194
[    0.985249] [<ffff00000853c7b0>] rmem_dma_device_init+0x64/0x94
[    0.991202] [<ffff000008775c88>]
of_reserved_mem_device_init_by_idx+0xf8/0x198
[    0.998468] [<ffff00000849b1ac>] fsl_qman_probe+0xdc/0x804
[    1.003983] [<ffff000008526538>] platform_drv_probe+0x50/0xbc
[    1.009760] [<ffff00000852489c>] driver_probe_device+0x220/0x2b8
[    1.015799] [<ffff0000085249d8>] __driver_attach+0xa4/0xa8
[    1.021313] [<ffff000008522908>] bus_for_each_dev+0x58/0x98
[    1.026915] [<ffff0000085241d8>] driver_attach+0x20/0x28
[    1.032254] [<ffff000008523d98>] bus_add_driver+0x1c8/0x22c
[    1.037856] [<ffff0000085252d8>] driver_register+0x60/0xf4
[    1.043371] [<ffff00000852647c>] __platform_driver_register+0x48/0x50
[    1.049849] [<ffff000008d5800c>] fsl_qman_driver_init+0x18/0x20
[    1.055802] [<ffff000008083148>] do_one_initcall+0x38/0x118
[    1.061405] [<ffff000008d20cec>] kernel_init_freeable+0x1a0/0x240
[    1.067533] [<ffff0000088f8298>] kernel_init+0x10/0xfc
[    1.072697] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
[    1.078055] Reserved memory: failed to init DMA memory pool at
0x00000009ff000000, size 8 MiB

Is there another attribute I need to set in the device tree? It seems to
be trying to setup the pool but failing to do the mapping similar to the
issues I faced earlier.

Thanks again

Roy

>
>>
>> Roy
>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-02-13 21:26                               ` Roy Pledge
@ 2017-03-16  0:43                                 ` Roy Pledge
  2017-03-16 20:08                                   ` Scott Wood
  0 siblings, 1 reply; 37+ messages in thread
From: Roy Pledge @ 2017-03-16  0:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 2/13/2017 4:26 PM, Roy Pledge wrote:
> On 2/7/2017 1:26 PM, Robin Murphy wrote:
>> On 07/02/17 16:44, Roy Pledge wrote:
>>> On 2/6/2017 5:37 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Feb 06, 2017 at 10:26:26PM +0000, Roy Pledge wrote:
>>>>> Why don't I simplify this and use dma_sync_single_for_device() to flush
>>>>> the cache?
>>>> ... which can only be used on direct-mapped memory.
>>>>
>>> Then I don't see another (or understand) any other option. There is no
>>> way I can add a large area of physical memory carved out by
>>> of_reserve_mem to a DMA allocator since both ioremap and memremap throw
>>> an error if I try to do so. I appreciate the suggestions from people to
>>> resolve this but so far there isn't a workable solution to the problem.
>>> As far as I can tell other users of dma_declare_coherent_memory() are
>>> using it for device backed memory, not DDR.  It sounds like a solution
>>> on the surface but when you dig into the details it just doesn't work. 
>>> Please let me know if I'm missing something, I really want to close this
>>> issue so we can get upstream support for this device on ARM.
>> Apologies, having found time to take a proper look, it transpires I was
>> somewhat misremembering things. You're right that
>> dma_declare_coherent_memory() is for resources owned by the device, and
>> a DT reservation doesn't really count as such. What I had mixed up is
>> that a reserved-memory node has its own way of getting tied into CMA
>> automatically: ensure it has the "shared-dma-pool" compatible, point a
>> memory-region property at it from your device's node, then have the
>> driver call of_reserved_mem_device_init() - at that point, your DMA
>> allocations should be coming out of the given region so you should be
>> free to just grab the whole lot in one go.
>>
>> I've just confirmed that experimentally by bodging a reserved-memory
>> node into my Juno DT for the ARM HDLCD - another non-coherent device
>> which needs ~8MB contiguous buffers - and it works as expected.
>>
>> Robin.
> This makes sense to me now, but I run into an issue when I try this on
> my development system.
>
> I added 'compatible = "shared-dma-pool" into my device tree and the set
> the memory-region property for my device and I see this now at boottime:
>
> [    0.000000] Reserved memory: created DMA memory pool at
> 0x00000009ff000000, size 8 MiB
> [    0.000000] OF: reserved mem: initialized node qman-fqd, compatible
> id shared-dma-pool
>
> but when I call of_reserved_mem_device_init() during my device probe
> routine it fails as follows:
>
> [    0.738814] memremap attempted on ram 0x00000009ff000000 size: 0x800000
> [    0.745485] ------------[ cut here ]------------
> [    0.750132] WARNING: CPU: 0 PID: 1 at kernel/memremap.c:111
> memremap+0x100/0x144
>
> <....cut out a bunch of register dumps....>
>
> [    0.973780] [<ffff00000816a748>] memremap+0x100/0x144
> [    0.978859] [<ffff00000853c1d0>] dma_init_coherent_memory+0x60/0x194
> [    0.985249] [<ffff00000853c7b0>] rmem_dma_device_init+0x64/0x94
> [    0.991202] [<ffff000008775c88>]
> of_reserved_mem_device_init_by_idx+0xf8/0x198
> [    0.998468] [<ffff00000849b1ac>] fsl_qman_probe+0xdc/0x804
> [    1.003983] [<ffff000008526538>] platform_drv_probe+0x50/0xbc
> [    1.009760] [<ffff00000852489c>] driver_probe_device+0x220/0x2b8
> [    1.015799] [<ffff0000085249d8>] __driver_attach+0xa4/0xa8
> [    1.021313] [<ffff000008522908>] bus_for_each_dev+0x58/0x98
> [    1.026915] [<ffff0000085241d8>] driver_attach+0x20/0x28
> [    1.032254] [<ffff000008523d98>] bus_add_driver+0x1c8/0x22c
> [    1.037856] [<ffff0000085252d8>] driver_register+0x60/0xf4
> [    1.043371] [<ffff00000852647c>] __platform_driver_register+0x48/0x50
> [    1.049849] [<ffff000008d5800c>] fsl_qman_driver_init+0x18/0x20
> [    1.055802] [<ffff000008083148>] do_one_initcall+0x38/0x118
> [    1.061405] [<ffff000008d20cec>] kernel_init_freeable+0x1a0/0x240
> [    1.067533] [<ffff0000088f8298>] kernel_init+0x10/0xfc
> [    1.072697] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
> [    1.078055] Reserved memory: failed to init DMA memory pool at
> 0x00000009ff000000, size 8 MiB
>
> Is there another attribute I need to set in the device tree? It seems to
> be trying to setup the pool but failing to do the mapping similar to the
> issues I faced earlier.
>
> Thanks again
>
> Roy
Just wanted to follow up on this - turns out the no-map; attribute is
what I was missing. I now have the suggested mechanism working for ARM.
However when I went to test on PowePC the region wasn't being enabled
properly. It turns out the Kconfig in arch/powerpc doesn't define
HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I
added it and everything worked on my PPC board as it did on ARM.

Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for
PowerPC? I did a quick search but didn't see previous discussion on this
topic.

>
>>> Roy
>>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-03-16  0:43                                 ` Roy Pledge
@ 2017-03-16 20:08                                   ` Scott Wood
  2017-03-29 21:19                                     ` Roy Pledge
  0 siblings, 1 reply; 37+ messages in thread
From: Scott Wood @ 2017-03-16 20:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2017-03-16 at 00:43 +0000, Roy Pledge wrote:
> Just wanted to follow up on this - turns out the no-map; attribute is
> what I was missing. I now have the suggested mechanism working for ARM.
> However when I went to test on PowePC the region wasn't being enabled
> properly. It turns out the Kconfig in arch/powerpc doesn't define
> HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I
> added it and everything worked on my PPC board as it did on ARM.
> 
> Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for
> PowerPC? I did a quick search but didn't see previous discussion on this
> topic.

Probably because nothing on PPC needed it.

In any case, the problems with doing this on PPC run deeper than that, as I
pointed out in:
https://www.spinics.net/lists/arm-kernel/msg560020.html

-Scott

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()
  2017-03-16 20:08                                   ` Scott Wood
@ 2017-03-29 21:19                                     ` Roy Pledge
  0 siblings, 0 replies; 37+ messages in thread
From: Roy Pledge @ 2017-03-29 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/16/2017 4:08 PM, Scott Wood wrote:
> On Thu, 2017-03-16 at 00:43 +0000, Roy Pledge wrote:
>> Just wanted to follow up on this - turns out the no-map; attribute is
>> what I was missing. I now have the suggested mechanism working for ARM.
>> However when I went to test on PowePC the region wasn't being enabled
>> properly. It turns out the Kconfig in arch/powerpc doesn't define
>> HAVE_GENERIC_DMA_COHERENT so this mechanism isn't available. However I
>> added it and everything worked on my PPC board as it did on ARM.
>>
>> Does anyone know why HAVE_GENERIC_DMA_COHERENT isn't defined for
>> PowerPC? I did a quick search but didn't see previous discussion on this
>> topic.
> Probably because nothing on PPC needed it.
>
> In any case, the problems with doing this on PPC run deeper than that, as I
> pointed out in:
> https://www.spinics.net/lists/arm-kernel/msg560020.html
I just pushed out an RFC patchset that uses dma_zalloc_coherent() to
setup the memory mappings as Robin suggested. This is working in our
regressions for both PPC and ARM but I only pushed the PPC changes for
now to keep the patchset small.

Scott, I'm not clear on why you don't think this will work for PPC. I
agree there are issues with flush/invalidate in the software portal area
but this patch just deals with private memory allocations. If you don't
think this approach is valid on PPC then I'm not sure how else to
implement this in a platform generic way.  I'm looking forward to your
feedback as well as Robin's and others.

Thanks - Roy
> -Scott
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-03-29 21:19 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-18 22:39 [PATCH 00/10] fsl/qbman: ARM Enablement Roy Pledge
2017-01-18 22:39 ` [PATCH 01/10] soc/qbman: Use portable mapping for the FQD reserved memory Roy Pledge
2017-01-18 22:39 ` [PATCH 02/10] soc/qbman: Drop set/clear_bits usage Roy Pledge
2017-01-18 22:39 ` [PATCH 03/10] soc/qbman: Drop L1_CACHE_BYTES compile time check Roy Pledge
2017-01-18 22:39 ` [PATCH 04/10] soc/qbman: Fix ARM32 typo Roy Pledge
2017-01-18 22:39 ` [PATCH 05/10] soc/qbman: Rework ioremap() calls for ARM/PPC Roy Pledge
2017-01-18 22:39 ` [PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range() Roy Pledge
2017-01-18 23:12   ` Russell King - ARM Linux
2017-01-18 23:36     ` Scott Wood
2017-01-23 19:24       ` Roy Pledge
2017-01-25 21:20         ` Arnd Bergmann
2017-01-26  5:08           ` Scott Wood
2017-01-27 16:41             ` Arnd Bergmann
2017-01-28  2:34               ` Scott Wood
2017-01-30 15:31                 ` Robin Murphy
2017-01-30 19:04                   ` Roy Pledge
2017-02-01 13:03                     ` Robin Murphy
2017-02-01 22:51                       ` Scott Wood
2017-02-06 22:26                       ` Roy Pledge
2017-02-06 22:37                         ` Russell King - ARM Linux
2017-02-07 16:44                           ` Roy Pledge
2017-02-07 18:25                             ` Robin Murphy
2017-02-13 21:26                               ` Roy Pledge
2017-03-16  0:43                                 ` Roy Pledge
2017-03-16 20:08                                   ` Scott Wood
2017-03-29 21:19                                     ` Roy Pledge
2017-01-30 15:19               ` Russell King - ARM Linux
2017-02-01 12:47                 ` Arnd Bergmann
2017-02-01 23:16                   ` Russell King - ARM Linux
2017-02-02 17:21                     ` Roy Pledge
2017-01-30 15:04           ` Russell King - ARM Linux
2017-02-01 12:52             ` Arnd Bergmann
2017-01-30 15:12       ` Russell King - ARM Linux
2017-01-18 22:39 ` [PATCH 07/10] soc/qbman: add QMAN_REV32 Roy Pledge
2017-01-18 22:39 ` [PATCH 08/10] soc/qbman: different register offsets on ARM Roy Pledge
2017-01-18 22:39 ` [PATCH 09/10] soc/qbman: Add missing headers " Roy Pledge
2017-01-18 22:39 ` [PATCH 10/10] fsl/qbman: Enable FSL_LAYERSCAPE config " Roy Pledge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.