All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARM: avoid l2x0 build warning for CONFIG_OF=n
@ 2014-09-08 14:53 Arnd Bergmann
  2014-09-08 15:39 ` Russell King - ARM Linux
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-08 14:53 UTC (permalink / raw)
  To: linux-arm-kernel

A few functions in the l2x0 code are now only used from code that
is DT-only, so we get harmless gcc warnings about unused static
symbols unless these are removed as well.

arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cf9ea8f130e2 ("ARM: l2c: remove obsolete l2x0 ops for non-OF init")

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 5f2c988a06ac..08670f96281d 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -164,6 +164,7 @@ static inline void debug_writel(unsigned long val)
 }
 #endif
 
+#ifdef CONFIG_OF
 static void l2x0_cache_sync(void)
 {
 	unsigned long flags;
@@ -201,6 +202,7 @@ static void l2x0_disable(void)
 	dsb(st);
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
 }
+#endif
 
 static void l2c_save(void __iomem *base)
 {

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] ARM: avoid l2x0 build warning for CONFIG_OF=n
  2014-09-08 14:53 [PATCH] ARM: avoid l2x0 build warning for CONFIG_OF=n Arnd Bergmann
@ 2014-09-08 15:39 ` Russell King - ARM Linux
  2014-09-08 20:36   ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2014-09-08 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 08, 2014 at 04:53:39PM +0200, Arnd Bergmann wrote:
> A few functions in the l2x0 code are now only used from code that
> is DT-only, so we get harmless gcc warnings about unused static
> symbols unless these are removed as well.
> 
> arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
> arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
> arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Fixes: cf9ea8f130e2 ("ARM: l2c: remove obsolete l2x0 ops for non-OF init")

I've already NAK'd patches like this.  This needs solving properly.

Never hide valid warnings like this.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARM: avoid l2x0 build warning for CONFIG_OF=n
  2014-09-08 15:39 ` Russell King - ARM Linux
@ 2014-09-08 20:36   ` Arnd Bergmann
  2014-09-08 20:42     ` [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling Arnd Bergmann
  2014-09-08 20:43     ` [PATCH] ARM: cache-l2x0: optimize aurora range operations Arnd Bergmann
  0 siblings, 2 replies; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-08 20:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 08 September 2014 16:39:46 Russell King - ARM Linux wrote:
> On Mon, Sep 08, 2014 at 04:53:39PM +0200, Arnd Bergmann wrote:
> > A few functions in the l2x0 code are now only used from code that
> > is DT-only, so we get harmless gcc warnings about unused static
> > symbols unless these are removed as well.
> > 
> > arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
> > arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
> > arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]
> > 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > Fixes: cf9ea8f130e2 ("ARM: l2c: remove obsolete l2x0 ops for non-OF init")
> 
> I've already NAK'd patches like this.  This needs solving properly.
> 
> Never hide valid warnings like this.

Ok, it wasn't clear to me that these were intentionally left in there. I've
spent a little more time looking at this code and managed to come up with
two different patches to clean this code up. I don't know if this is what you
had in mind, but it does feel like a noticeable improvement, so if I missed
the real point, they could at least be used as a starting point for the next
person to look at this.

I've put the Free-electrons and Marvell folks that are involved with the
aurora code on Cc, so I hope one of them can give try this out on real
hardware or inspect the code further.

	Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-08 20:36   ` Arnd Bergmann
@ 2014-09-08 20:42     ` Arnd Bergmann
  2014-09-11  8:54       ` Thomas Petazzoni
  2014-09-11 10:08       ` Thomas Petazzoni
  2014-09-08 20:43     ` [PATCH] ARM: cache-l2x0: optimize aurora range operations Arnd Bergmann
  1 sibling, 2 replies; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-08 20:42 UTC (permalink / raw)
  To: linux-arm-kernel

The aurora cache controller is the only remaining user of a couple
of functions in this file and are completely unused when that is
disabled, leading to build warnings:

arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]

With the knowledge that the code is now aurora-specific, we can
simplify it noticeably:

- The pl310 errata workarounds are not needed on aurora and can be removed
- The cache_wait() macro is never needed since this is not an l210/l220
- aurora_pa_range can keep the spinlock while syncing the cache
- We can load the l2x0_base into a local variable across operations

There should be no functional change in this patch, but readability
and the generated object code improves, along with avoiding the
warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
----
 arch/arm/mm/cache-l2x0.c | 115 ++++++++++++++++++++++++++-----------------------------------------------
 1 file changed, 41 insertions(+), 74 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 5f2c988a06ac..0964e83e0238 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -135,73 +135,6 @@ static void l2c_disable(void)
 	dsb(st);
 }
 
-#ifdef CONFIG_CACHE_PL310
-static inline void cache_wait(void __iomem *reg, unsigned long mask)
-{
-	/* cache operations by line are atomic on PL310 */
-}
-#else
-#define cache_wait	l2c_wait_mask
-#endif
-
-static inline void cache_sync(void)
-{
-	void __iomem *base = l2x0_base;
-
-	writel_relaxed(0, base + sync_reg_offset);
-	cache_wait(base + L2X0_CACHE_SYNC, 1);
-}
-
-#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915)
-static inline void debug_writel(unsigned long val)
-{
-	l2c_set_debug(l2x0_base, val);
-}
-#else
-/* Optimised out for non-errata case */
-static inline void debug_writel(unsigned long val)
-{
-}
-#endif
-
-static void l2x0_cache_sync(void)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&l2x0_lock, flags);
-	cache_sync();
-	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
-static void __l2x0_flush_all(void)
-{
-	debug_writel(0x03);
-	__l2c_op_way(l2x0_base + L2X0_CLEAN_INV_WAY);
-	cache_sync();
-	debug_writel(0x00);
-}
-
-static void l2x0_flush_all(void)
-{
-	unsigned long flags;
-
-	/* clean all ways */
-	raw_spin_lock_irqsave(&l2x0_lock, flags);
-	__l2x0_flush_all();
-	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
-static void l2x0_disable(void)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&l2x0_lock, flags);
-	__l2x0_flush_all();
-	l2c_write_sec(0, l2x0_base, L2X0_CTRL);
-	dsb(st);
-	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
 static void l2c_save(void __iomem *base)
 {
 	l2x0_saved_regs.aux_ctrl = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
@@ -1126,14 +1059,14 @@ static unsigned long calc_range_end(unsigned long start, unsigned long end)
 static void aurora_pa_range(unsigned long start, unsigned long end,
 			unsigned long offset)
 {
+	void __iomem *base = l2x0_base;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&l2x0_lock, flags);
-	writel_relaxed(start, l2x0_base + AURORA_RANGE_BASE_ADDR_REG);
-	writel_relaxed(end, l2x0_base + offset);
+	writel_relaxed(start, base + AURORA_RANGE_BASE_ADDR_REG);
+	writel_relaxed(end, base + offset);
+	writel_relaxed(0, base + AURORA_SYNC_REG);
 	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-
-	cache_sync();
 }
 
 static void aurora_inv_range(unsigned long start, unsigned long end)
@@ -1193,6 +1126,40 @@ static void aurora_flush_range(unsigned long start, unsigned long end)
 	}
 }
 
+static void aurora_flush_all(void)
+{
+	void __iomem *base = l2x0_base;
+	unsigned long flags;
+
+	/* clean all ways */
+	raw_spin_lock_irqsave(&l2x0_lock, flags);
+	__l2c_op_way(base + L2X0_CLEAN_INV_WAY);
+	writel_relaxed(0, base + AURORA_SYNC_REG);
+	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
+static void aurora_cache_sync(void)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&l2x0_lock, flags);
+	writel_relaxed(0, l2x0_base + AURORA_SYNC_REG);
+	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
+static void aurora_disable(void)
+{
+	void __iomem *base = l2x0_base;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&l2x0_lock, flags);
+	__l2c_op_way(base + L2X0_CLEAN_INV_WAY);
+	writel_relaxed(0, base + AURORA_SYNC_REG);
+	l2c_write_sec(0, base, L2X0_CTRL);
+	dsb(st);
+	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
 static void aurora_save(void __iomem *base)
 {
 	l2x0_saved_regs.ctrl = readl_relaxed(base + L2X0_CTRL);
@@ -1267,9 +1234,9 @@ static const struct l2c_init_data of_aurora_with_outer_data __initconst = {
 		.inv_range   = aurora_inv_range,
 		.clean_range = aurora_clean_range,
 		.flush_range = aurora_flush_range,
-		.flush_all   = l2x0_flush_all,
-		.disable     = l2x0_disable,
-		.sync        = l2x0_cache_sync,
+		.flush_all   = aurora_flush_all,
+		.disable     = aurora_disable,
+		.sync        = aurora_cache_sync,
 		.resume      = aurora_resume,
 	},
 };

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] ARM: cache-l2x0: optimize aurora range operations
  2014-09-08 20:36   ` Arnd Bergmann
  2014-09-08 20:42     ` [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling Arnd Bergmann
@ 2014-09-08 20:43     ` Arnd Bergmann
  2014-09-11  9:13       ` Thomas Petazzoni
  2014-09-11 10:08       ` Thomas Petazzoni
  1 sibling, 2 replies; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-08 20:43 UTC (permalink / raw)
  To: linux-arm-kernel

The aurora_inv_range(), aurora_clean_range() and aurora_flush_range()
functions are highly redundant, both in source and in object code, and
they are harder to understand than necessary.

By moving the range loop into the aurora_pa_range() function, they
become trivial wrappers, and the object code start looking like what
one would expect for an optimal implementation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/mm/cache-l2x0.c | 67 +++++++++++++++++-----------------------------------------
 1 file changed, 22 insertions(+), 45 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 0964e83e0238..82d724af294e 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1033,7 +1033,7 @@ static const struct l2c_init_data of_l2c310_coherent_data __initconst = {
  * noninclusive, while the hardware cache range operations use
  * inclusive start and end addresses.
  */
-static unsigned long calc_range_end(unsigned long start, unsigned long end)
+static unsigned long aurora_range_end(unsigned long start, unsigned long end)
 {
 	/*
 	 * Limit the number of cache lines processed at once,
@@ -1052,25 +1052,13 @@ static unsigned long calc_range_end(unsigned long start, unsigned long end)
 	return end;
 }
 
-/*
- * Make sure 'start' and 'end' reference the same page, as L2 is PIPT
- * and range operations only do a TLB lookup on the start address.
- */
 static void aurora_pa_range(unsigned long start, unsigned long end,
-			unsigned long offset)
+			    unsigned long offset)
 {
 	void __iomem *base = l2x0_base;
+	unsigned long range_end;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&l2x0_lock, flags);
-	writel_relaxed(start, base + AURORA_RANGE_BASE_ADDR_REG);
-	writel_relaxed(end, base + offset);
-	writel_relaxed(0, base + AURORA_SYNC_REG);
-	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
-static void aurora_inv_range(unsigned long start, unsigned long end)
-{
 	/*
 	 * round start and end adresses up to cache line size
 	 */
@@ -1078,15 +1066,24 @@ static void aurora_inv_range(unsigned long start, unsigned long end)
 	end = ALIGN(end, CACHE_LINE_SIZE);
 
 	/*
-	 * Invalidate all full cache lines between 'start' and 'end'.
+	 * perform operation on all full cache lines between 'start' and 'end'
 	 */
 	while (start < end) {
-		unsigned long range_end = calc_range_end(start, end);
-		aurora_pa_range(start, range_end - CACHE_LINE_SIZE,
-				AURORA_INVAL_RANGE_REG);
+		range_end = aurora_range_end(start, end);
+
+		raw_spin_lock_irqsave(&l2x0_lock, flags);
+		writel_relaxed(start, base + AURORA_RANGE_BASE_ADDR_REG);
+		writel_relaxed(range_end - CACHE_LINE_SIZE, base + offset);
+		writel_relaxed(0, base + AURORA_SYNC_REG);
+		raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+
 		start = range_end;
 	}
 }
+static void aurora_inv_range(unsigned long start, unsigned long end)
+{
+	aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
+}
 
 static void aurora_clean_range(unsigned long start, unsigned long end)
 {
@@ -1094,36 +1091,16 @@ static void aurora_clean_range(unsigned long start, unsigned long end)
 	 * If L2 is forced to WT, the L2 will always be clean and we
 	 * don't need to do anything here.
 	 */
-	if (!l2_wt_override) {
-		start &= ~(CACHE_LINE_SIZE - 1);
-		end = ALIGN(end, CACHE_LINE_SIZE);
-		while (start != end) {
-			unsigned long range_end = calc_range_end(start, end);
-			aurora_pa_range(start, range_end - CACHE_LINE_SIZE,
-					AURORA_CLEAN_RANGE_REG);
-			start = range_end;
-		}
-	}
+	if (!l2_wt_override)
+		aurora_pa_range(start, end, AURORA_CLEAN_RANGE_REG);
 }
 
 static void aurora_flush_range(unsigned long start, unsigned long end)
 {
-	start &= ~(CACHE_LINE_SIZE - 1);
-	end = ALIGN(end, CACHE_LINE_SIZE);
-	while (start != end) {
-		unsigned long range_end = calc_range_end(start, end);
-		/*
-		 * If L2 is forced to WT, the L2 will always be clean and we
-		 * just need to invalidate.
-		 */
-		if (l2_wt_override)
-			aurora_pa_range(start, range_end - CACHE_LINE_SIZE,
-							AURORA_INVAL_RANGE_REG);
-		else
-			aurora_pa_range(start, range_end - CACHE_LINE_SIZE,
-							AURORA_FLUSH_RANGE_REG);
-		start = range_end;
-	}
+	if (l2_wt_override)
+		aurora_pa_range(start, end, AURORA_INVAL_RANGE_REG);
+	else
+		aurora_pa_range(start, end, AURORA_FLUSH_RANGE_REG);
 }
 
 static void aurora_flush_all(void)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-08 20:42     ` [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling Arnd Bergmann
@ 2014-09-11  8:54       ` Thomas Petazzoni
  2014-09-11  9:50         ` Arnd Bergmann
  2014-09-11 10:16         ` Russell King - ARM Linux
  2014-09-11 10:08       ` Thomas Petazzoni
  1 sibling, 2 replies; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11  8:54 UTC (permalink / raw)
  To: linux-arm-kernel

Arnd,

Thanks for working on this.

On Mon, 08 Sep 2014 22:42:23 +0200, Arnd Bergmann wrote:

> - The pl310 errata workarounds are not needed on aurora and can be removed
> - The cache_wait() macro is never needed since this is not an l210/l220

I am not sure about this change (see below).

> - aurora_pa_range can keep the spinlock while syncing the cache
> - We can load the l2x0_base into a local variable across operations

I am fine with this one, but I'm wondering what is the benefit of doing
this? The l2x0_base is anyway a global variable that is available for
all those functions, so not sure what an additional local variable
brings us.


> -#ifdef CONFIG_CACHE_PL310
> -static inline void cache_wait(void __iomem *reg, unsigned long mask)
> -{
> -	/* cache operations by line are atomic on PL310 */
> -}
> -#else
> -#define cache_wait	l2c_wait_mask
> -#endif

How do you conclude from this that cache_wait() does nothing? When
we're on PL310, it indeed does nothing, but in our case, we're not on a
PL310, so cache_wait is equivalent to l2c_wait_mask, which does:

/*
 * Common code for all cache controllers.
 */
static inline void l2c_wait_mask(void __iomem *reg, unsigned long mask)
{
        /* wait for cache operation by line or way to complete */
        while (readl_relaxed(reg) & mask)
                cpu_relax();
}

And this does not seem to be in any conditional making it specific to
PL210/PL220 as your commit log suggests.

> -static inline void cache_sync(void)
> -{
> -	void __iomem *base = l2x0_base;
> -
> -	writel_relaxed(0, base + sync_reg_offset);
> -	cache_wait(base + L2X0_CACHE_SYNC, 1);

So to me, this line is actually doing something on an Aurora cache
controller.

Am I missing something?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARM: cache-l2x0: optimize aurora range operations
  2014-09-08 20:43     ` [PATCH] ARM: cache-l2x0: optimize aurora range operations Arnd Bergmann
@ 2014-09-11  9:13       ` Thomas Petazzoni
  2014-09-11 10:08       ` Thomas Petazzoni
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

Arnd,

On Mon, 08 Sep 2014 22:43:32 +0200, Arnd Bergmann wrote:
> The aurora_inv_range(), aurora_clean_range() and aurora_flush_range()
> functions are highly redundant, both in source and in object code, and
> they are harder to understand than necessary.
> 
> By moving the range loop into the aurora_pa_range() function, they
> become trivial wrappers, and the object code start looking like what
> one would expect for an optimal implementation.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/mm/cache-l2x0.c | 67 +++++++++++++++++-----------------------------------------
>  1 file changed, 22 insertions(+), 45 deletions(-)

Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Note that I also did some running testing on Armada 370 and Armada XP
of both of your patches, and it worked. But since I still have concerns
on the other patch, I'm not giving a formal Tested-by yet.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-11  8:54       ` Thomas Petazzoni
@ 2014-09-11  9:50         ` Arnd Bergmann
  2014-09-11 10:07           ` Thomas Petazzoni
  2014-09-11 10:16         ` Russell King - ARM Linux
  1 sibling, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-11  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 11 September 2014 10:54:41 Thomas Petazzoni wrote:
> 
> On Mon, 08 Sep 2014 22:42:23 +0200, Arnd Bergmann wrote:
> 
> > - The pl310 errata workarounds are not needed on aurora and can be removed
> > - The cache_wait() macro is never needed since this is not an l210/l220
> 
> I am not sure about this change (see below).
> 
> > - aurora_pa_range can keep the spinlock while syncing the cache
> > - We can load the l2x0_base into a local variable across operations
> 
> I am fine with this one, but I'm wondering what is the benefit of doing
> this? The l2x0_base is anyway a global variable that is available for
> all those functions, so not sure what an additional local variable
> brings us.

The compiler does not always know if the l2x0_base pointer has changed
between two accesses. In particular if we do a spin_lock/spin_unlock
as in aurora_pa_range() it has to reload it.

Using a local variable tells the compiler that the pointer should
in fact be read once from memory and then used from a register,
which results in slightly smaller and more efficient object code.

> > -#ifdef CONFIG_CACHE_PL310
> > -static inline void cache_wait(void __iomem *reg, unsigned long mask)
> > -{
> > -     /* cache operations by line are atomic on PL310 */
> > -}
> > -#else
> > -#define cache_wait   l2c_wait_mask
> > -#endif
> 
> How do you conclude from this that cache_wait() does nothing? When
> we're on PL310, it indeed does nothing, but in our case, we're not on a
> PL310, so cache_wait is equivalent to l2c_wait_mask, which does:
>
> /*
>  * Common code for all cache controllers.
>  */
> static inline void l2c_wait_mask(void __iomem *reg, unsigned long mask)
> {
>         /* wait for cache operation by line or way to complete */
>         while (readl_relaxed(reg) & mask)
>                 cpu_relax();
> }
> 
> And this does not seem to be in any conditional making it specific to
> PL210/PL220 as your commit log suggests.

Interesting. If that is the intention of the code, it is rather broken
because CONFIG_CACHE_PL310 gets set automatically on ARMv7:

if CACHE_L2X0

config CACHE_PL310
        bool
        default y if CPU_V7 && !(CPU_V6 || CPU_V6K)
        help
          This option enables optimisations for the PL310 cache
          controller.
...
endif

This is from the original l2x0 code that is no longer used by anything
other than aurora. The idea was apparently that if we are on a v6+v7
kernel, we might have a l210 and need to wait here, while on a v7-only
kernel, we know that we have a pl310 and never need to do it.

> > -static inline void cache_sync(void)
> > -{
> > -     void __iomem *base = l2x0_base;
> > -
> > -     writel_relaxed(0, base + sync_reg_offset);
> > -     cache_wait(base + L2X0_CACHE_SYNC, 1);
> 
> So to me, this line is actually doing something on an Aurora cache
> controller.
> 
> Am I missing something?

It might be needed, I haven't checked the data sheet. However we don't
run that code today, and my patch does not change that.

	Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-11  9:50         ` Arnd Bergmann
@ 2014-09-11 10:07           ` Thomas Petazzoni
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Thu, 11 Sep 2014 11:50:29 +0200, Arnd Bergmann wrote:

> > I am fine with this one, but I'm wondering what is the benefit of doing
> > this? The l2x0_base is anyway a global variable that is available for
> > all those functions, so not sure what an additional local variable
> > brings us.
> 
> The compiler does not always know if the l2x0_base pointer has changed
> between two accesses. In particular if we do a spin_lock/spin_unlock
> as in aurora_pa_range() it has to reload it.
> 
> Using a local variable tells the compiler that the pointer should
> in fact be read once from memory and then used from a register,
> which results in slightly smaller and more efficient object code.

Ok, thanks for the explanation, makes sense.

> Interesting. If that is the intention of the code, it is rather broken
> because CONFIG_CACHE_PL310 gets set automatically on ARMv7:
> 
> if CACHE_L2X0
> 
> config CACHE_PL310
>         bool
>         default y if CPU_V7 && !(CPU_V6 || CPU_V6K)
>         help
>           This option enables optimisations for the PL310 cache
>           controller.
> ...
> endif
> 
> This is from the original l2x0 code that is no longer used by anything
> other than aurora. The idea was apparently that if we are on a v6+v7
> kernel, we might have a l210 and need to wait here, while on a v7-only
> kernel, we know that we have a pl310 and never need to do it.
> 
> > > -static inline void cache_sync(void)
> > > -{
> > > -     void __iomem *base = l2x0_base;
> > > -
> > > -     writel_relaxed(0, base + sync_reg_offset);
> > > -     cache_wait(base + L2X0_CACHE_SYNC, 1);
> > 
> > So to me, this line is actually doing something on an Aurora cache
> > controller.
> > 
> > Am I missing something?
> 
> It might be needed, I haven't checked the data sheet. However we don't
> run that code today, and my patch does not change that.

I just checked the datasheet, and about the L2 Cache Sync register, it
says:

  Writing to this register performs a draining operation of the L2
  eviction buffer. The write operation is when all maintenance
  operations of the issuing CPU are finished, and the eviction buffer
  is empty. This is an automatic operation, the L2 slave ports are
  stalled until the operation is completed.
  The written data is ignored.

So there is indeed no need to poll this register to see when the Sync
operation has completed: writing to the register already stalls the L2
slave ports until the operation is completed.

Moreover, I checked Marvell's LSP, and then indeed also only write to
the L2 Cache Sync register, without any busy wait loop. So your patch
looks good to me.

Thanks for the additional explanations!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-08 20:42     ` [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling Arnd Bergmann
  2014-09-11  8:54       ` Thomas Petazzoni
@ 2014-09-11 10:08       ` Thomas Petazzoni
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Mon, 08 Sep 2014 22:42:23 +0200, Arnd Bergmann wrote:
> The aurora cache controller is the only remaining user of a couple
> of functions in this file and are completely unused when that is
> disabled, leading to build warnings:
> 
> arch/arm/mm/cache-l2x0.c:167:13: warning: 'l2x0_cache_sync' defined but not used [-Wunused-function]
> arch/arm/mm/cache-l2x0.c:184:13: warning: 'l2x0_flush_all' defined but not used [-Wunused-function]
> arch/arm/mm/cache-l2x0.c:194:13: warning: 'l2x0_disable' defined but not used [-Wunused-function]
> 
> With the knowledge that the code is now aurora-specific, we can
> simplify it noticeably:
> 
> - The pl310 errata workarounds are not needed on aurora and can be removed
> - The cache_wait() macro is never needed since this is not an l210/l220
> - aurora_pa_range can keep the spinlock while syncing the cache
> - We can load the l2x0_base into a local variable across operations
> 
> There should be no functional change in this patch, but readability
> and the generated object code improves, along with avoiding the
> warnings.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ----
>  arch/arm/mm/cache-l2x0.c | 115 ++++++++++++++++++++++++++-----------------------------------------------
>  1 file changed, 41 insertions(+), 74 deletions(-)

Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
 (on Armada 370 RD and Armada XP GP, boot tested, plus a little bit of
 DMA traffic by reading data from a SD card)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARM: cache-l2x0: optimize aurora range operations
  2014-09-08 20:43     ` [PATCH] ARM: cache-l2x0: optimize aurora range operations Arnd Bergmann
  2014-09-11  9:13       ` Thomas Petazzoni
@ 2014-09-11 10:08       ` Thomas Petazzoni
  2014-09-11 10:20         ` Arnd Bergmann
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Mon, 08 Sep 2014 22:43:32 +0200, Arnd Bergmann wrote:
> The aurora_inv_range(), aurora_clean_range() and aurora_flush_range()
> functions are highly redundant, both in source and in object code, and
> they are harder to understand than necessary.
> 
> By moving the range loop into the aurora_pa_range() function, they
> become trivial wrappers, and the object code start looking like what
> one would expect for an optimal implementation.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/mm/cache-l2x0.c | 67 +++++++++++++++++-----------------------------------------
>  1 file changed, 22 insertions(+), 45 deletions(-)

Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
 (on Armada 370 RD and Armada XP GP, boot tested, plus a little bit of
 DMA traffic by reading data from a SD card)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-11  8:54       ` Thomas Petazzoni
  2014-09-11  9:50         ` Arnd Bergmann
@ 2014-09-11 10:16         ` Russell King - ARM Linux
  2014-09-11 10:31           ` Thomas Petazzoni
  1 sibling, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2014-09-11 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 10:54:41AM +0200, Thomas Petazzoni wrote:
> Arnd,
> 
> Thanks for working on this.

You know, we wouldn't be in this situation today with these warnings
if people /cooperated/ with me when I was working on this code.  So
I take your thanks to Arnd as a bit of an insult to me.

> On Mon, 08 Sep 2014 22:42:23 +0200, Arnd Bergmann wrote:
> > -#ifdef CONFIG_CACHE_PL310
> > -static inline void cache_wait(void __iomem *reg, unsigned long mask)
> > -{
> > -	/* cache operations by line are atomic on PL310 */
> > -}
> > -#else
> > -#define cache_wait	l2c_wait_mask
> > -#endif
> 
> How do you conclude from this that cache_wait() does nothing? When
> we're on PL310, it indeed does nothing, but in our case, we're not on a
> PL310, so cache_wait is equivalent to l2c_wait_mask, which does:

Right, which is why aurora is broken today - the behaviour you end up with
is 100% dependent on whether you've also included L2C310 support or not.
So, using Aurora with a single zImage (which will have support for L2C310
enabled) is going to make this a no-op.

This is why I didn't touch the Aurora code - I don't know what the correct
answer is, and I was ignored when I was working on this code.  So, I
decided that the current warnings which the code produces will serve as a
lesson to people *not* to ignore me, and they can remain there until
people start taking an interest in this.  This seems to be the only way
to get stuff done - being nice and civil gets you nowhere.

> /*
>  * Common code for all cache controllers.
>  */
> static inline void l2c_wait_mask(void __iomem *reg, unsigned long mask)
> {
>         /* wait for cache operation by line or way to complete */
>         while (readl_relaxed(reg) & mask)
>                 cpu_relax();
> }
> 
> And this does not seem to be in any conditional making it specific to
> PL210/PL220 as your commit log suggests.

Why would this need to be moved under an ifdef?  __l2c_op_way() also makes
use of l2c_wait_mask, which itself is used by l2c210_flush_all() and
l2c220_op_way().

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARM: cache-l2x0: optimize aurora range operations
  2014-09-11 10:08       ` Thomas Petazzoni
@ 2014-09-11 10:20         ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2014-09-11 10:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 11 September 2014 12:08:39 Thomas Petazzoni wrote:
> On Mon, 08 Sep 2014 22:43:32 +0200, Arnd Bergmann wrote:
> > The aurora_inv_range(), aurora_clean_range() and aurora_flush_range()
> > functions are highly redundant, both in source and in object code, and
> > they are harder to understand than necessary.
> > 
> > By moving the range loop into the aurora_pa_range() function, they
> > become trivial wrappers, and the object code start looking like what
> > one would expect for an optimal implementation.
> > 
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> >  arch/arm/mm/cache-l2x0.c | 67 +++++++++++++++++-----------------------------------------
> >  1 file changed, 22 insertions(+), 45 deletions(-)
> 
> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
>  (on Armada 370 RD and Armada XP GP, boot tested, plus a little bit of
>  DMA traffic by reading data from a SD card)
> 

Ok, thanks!

I wonder if it's worth doing benchmarks over this, as there is still a
little optimization potential in the range function if we hold the
spinlock a little longer. Right now we drop and reacquire the lock
for every 1024 bytes being flushed, which can cause a lot of traffic
on the coherency bus if we flush really long ranges.

We could hold the lock across the entire loop to improve that, but
that can also cause extra latency if multiple CPUs try to do cache
operations simultaneously.

	Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling
  2014-09-11 10:16         ` Russell King - ARM Linux
@ 2014-09-11 10:31           ` Thomas Petazzoni
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Petazzoni @ 2014-09-11 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Russell,

On Thu, 11 Sep 2014 11:16:49 +0100, Russell King - ARM Linux wrote:

> > Thanks for working on this.
> 
> You know, we wouldn't be in this situation today with these warnings
> if people /cooperated/ with me when I was working on this code.  So
> I take your thanks to Arnd as a bit of an insult to me.

I apologize if you took it this way, because it clearly wasn't the
intention here. Clearly the L2 cache code looks much better after your
big cleanup, I don't think anybody will question that.

Your original patch series was very large and therefore a bit scary to
look at. At the time, I was not able to dedicate enough time to review
a 97 patches patch series. Maybe splitting the patch series into
smaller chunks, and get the cleanup progressively merged would have
helped getting more review.

That's not really an excuse, but I was always never Cc'ed on your patch
series (which isn't your fault: I'm not a maintainer), while I was
Cc'ed on Arnd patches.

Also, to be honest, my initial reaction to a 97 patches patch series
from Russell King cleaning up the L2 cache stuff is: I definitely trust
Russell when working on this sort of stuff, and I know he's very
careful when doing cleanups, and since I don't have enough time to
review this, I'll trust him to do the right thing.

Turns out my (maybe blind?) trust seems to have worked: as far as
Marvell EBU platforms are concerned, I believe nothing was broken by
your changes. All Arnd is doing right now are further cleanups.

> Right, which is why aurora is broken today - the behaviour you end up with
> is 100% dependent on whether you've also included L2C310 support or not.
> So, using Aurora with a single zImage (which will have support for L2C310
> enabled) is going to make this a no-op.

Which works, as per my previous answer to Arnd, after checking the
datasheet and vendor tree.

> This is why I didn't touch the Aurora code - I don't know what the correct
> answer is, and I was ignored when I was working on this code.  So, I
> decided that the current warnings which the code produces will serve as a
> lesson to people *not* to ignore me, and they can remain there until
> people start taking an interest in this.  This seems to be the only way
> to get stuff done - being nice and civil gets you nowhere.

I think all of the kernel developers are encouraged and told to not send
huge patch series, because it is well-known that large patch series
will discourage potential reviewers. I'm pretty sure it's the major
reason for the lack of review of your patch series. Had it been sent in
20 patches chunks, maybe merged across a few kernel releases, I'm sure
you would have had more reviews and comments.

> > /*
> >  * Common code for all cache controllers.
> >  */
> > static inline void l2c_wait_mask(void __iomem *reg, unsigned long mask)
> > {
> >         /* wait for cache operation by line or way to complete */
> >         while (readl_relaxed(reg) & mask)
> >                 cpu_relax();
> > }
> > 
> > And this does not seem to be in any conditional making it specific to
> > PL210/PL220 as your commit log suggests.
> 
> Why would this need to be moved under an ifdef?  __l2c_op_way() also makes
> use of l2c_wait_mask, which itself is used by l2c210_flush_all() and
> l2c220_op_way().

I'm not saying it should be under a #ifdef, but Arnd's commit log was
saying that the l2c_wait_mask code was only used on pl210/pl220, which
according to:

-#ifdef CONFIG_CACHE_PL310
-static inline void cache_wait(void __iomem *reg, unsigned long mask)
-{
-	/* cache operations by line are atomic on PL310 */
-}
-#else
-#define cache_wait	l2c_wait_mask
-#endif

is not correct, because l2c_wait_mask is in fact used on all !PL310
configurations, which includes Aurora.

So I was not suggesting to change how l2c_wait_mask is defined, I was
merely pointing out what the current implementation does.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-09-11 10:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-08 14:53 [PATCH] ARM: avoid l2x0 build warning for CONFIG_OF=n Arnd Bergmann
2014-09-08 15:39 ` Russell King - ARM Linux
2014-09-08 20:36   ` Arnd Bergmann
2014-09-08 20:42     ` [PATCH 1/2] ARM: cache-l2x0: clean up aurora cache handling Arnd Bergmann
2014-09-11  8:54       ` Thomas Petazzoni
2014-09-11  9:50         ` Arnd Bergmann
2014-09-11 10:07           ` Thomas Petazzoni
2014-09-11 10:16         ` Russell King - ARM Linux
2014-09-11 10:31           ` Thomas Petazzoni
2014-09-11 10:08       ` Thomas Petazzoni
2014-09-08 20:43     ` [PATCH] ARM: cache-l2x0: optimize aurora range operations Arnd Bergmann
2014-09-11  9:13       ` Thomas Petazzoni
2014-09-11 10:08       ` Thomas Petazzoni
2014-09-11 10:20         ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.