* [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support
@ 2015-03-07  0:54 Florian Fainelli
  2015-03-07  0:54 ` [PATCH 1/5] ARM: v7: allow setting different cache functions Florian Fainelli
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This patch series adds support for the Broadcom Brahma-B15 CPU read-ahead cache,
which is found in several Broadcom SoCs, including the brcmstb SoCs supported in mainline.

Let me know your thoughts on this; we have tried to minimize both the
compile-time and runtime impact for non-Broadcom CPU parts.

Florian Fainelli (5):
  ARM: v7: allow setting different cache functions
  ARM: Add Broadcom Brahma-B15 readahead cache support
  ARM: Hook B15 readahead cache functions based on processor
  ARM: B15: Add CPU hotplug awareness
  ARM: B15: Add suspend/resume hooks

 arch/arm/include/asm/cacheflush.h             |   2 +-
 arch/arm/include/asm/glue-cache.h             |   4 +
 arch/arm/include/asm/hardware/cache-b15-rac.h |  12 +
 arch/arm/mm/Kconfig                           |   8 +
 arch/arm/mm/Makefile                          |   1 +
 arch/arm/mm/cache-b15-rac.c                   | 329 ++++++++++++++++++++++++++
 arch/arm/mm/cache-v7.S                        |  21 ++
 arch/arm/mm/proc-v7.S                         |   6 +-
 8 files changed, 379 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/hardware/cache-b15-rac.h
 create mode 100644 arch/arm/mm/cache-b15-rac.c

-- 
2.1.0


* [PATCH 1/5] ARM: v7: allow setting different cache functions
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
@ 2015-03-07  0:54 ` Florian Fainelli
  2015-03-07  0:54 ` [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support Florian Fainelli
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation for adding support for the Broadcom Brahma-B15 read-ahead
cache which requires a different set of cache functions, allow the
__v7_proc macro to override the cache_fns settings, and default to
v7_cache_fns unless specified otherwise.
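The __v7_proc change relies on GNU as macro parameters with default values. A rough C analogue, purely for illustration, is a variadic macro that lists the default designated initializers first, so that a caller-supplied `.cache_fns = ...` overrides the default (C99 lets a later designator for the same member override an earlier one). The struct and macro names below are hypothetical, and strings stand in for the real function tables:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the function-table fields __v7_proc emits. */
struct proc_fns_sketch {
	const char *proc_fns;
	const char *tlb_fns;
	const char *user_fns;
	const char *cache_fns;
};

/*
 * Defaults come first; any designated initializers passed by the caller
 * come later and override the default for the same member.
 */
#define V7_PROC_SKETCH(...) {				\
	.proc_fns  = "v7_processor_functions",		\
	.tlb_fns   = "v7wbi_tlb_fns",			\
	.user_fns  = "v6_user_fns",			\
	.cache_fns = "v7_cache_fns",			\
	__VA_ARGS__					\
}

/* Generic ARMv7 entry: every field keeps its default. */
static const struct proc_fns_sketch generic_v7_entry = V7_PROC_SKETCH();

/* Brahma-B15 entry: only cache_fns is overridden, as patch 3/5 does. */
static const struct proc_fns_sketch b15_entry =
	V7_PROC_SKETCH(.cache_fns = "b15_cache_fns");
```

GCC may emit -Woverride-init warnings (enabled by -Wextra) for this pattern; the point is only that unspecified fields keep their defaults.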

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm/mm/proc-v7.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 8b4ee5e81c14..e0dc266579bc 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -467,7 +467,7 @@ __v7_setup_stack:
 	/*
 	 * Standard v7 proc info content
 	 */
-.macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
+.macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions, cache_fns = v7_cache_fns
 	ALT_SMP(.long	PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
 			PMD_SECT_AF | PMD_FLAGS_SMP | \mm_mmuflags)
 	ALT_UP(.long	PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
@@ -483,7 +483,7 @@ __v7_setup_stack:
 	.long	\proc_fns
 	.long	v7wbi_tlb_fns
 	.long	v6_user_fns
-	.long	v7_cache_fns
+	.long	\cache_fns
 .endm
 
 #ifndef CONFIG_ARM_LPAE
-- 
2.1.0


* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
  2015-03-07  0:54 ` [PATCH 1/5] ARM: v7: allow setting different cache functions Florian Fainelli
@ 2015-03-07  0:54 ` Florian Fainelli
  2015-03-16 21:02   ` Russell King - ARM Linux
  2015-03-17 17:29   ` Will Deacon
  2015-03-07  0:54 ` [PATCH 3/5] ARM: Hook B15 readahead cache functions based on processor Florian Fainelli
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
controller. This cache controller sits between the L2 and the memory bus,
and its purpose is to provide a friendlier burst size towards the DDR
interface than the native cache line size.

The readahead cache is mostly transparent, except for
flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
are precisely the operations we override here.

The readahead cache only intercepts reads, not writes. As such, some
data can remain stale in any of its buffers, so we need to flush it.
Flushing needs to happen in a particular order:

- disable the readahead cache
- flush it
- call the appropriate cache-v7.S function
- re-enable

This patch tries to minimize the impact on the cache-v7.S file by only
providing a stub in case CONFIG_CACHE_B15_RAC is enabled (the default for
ARCH_BRCMSTB, since it is the current user).

Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm/include/asm/cacheflush.h             |   2 +-
 arch/arm/include/asm/glue-cache.h             |   4 +
 arch/arm/include/asm/hardware/cache-b15-rac.h |  12 ++
 arch/arm/mm/Kconfig                           |   8 ++
 arch/arm/mm/Makefile                          |   1 +
 arch/arm/mm/cache-b15-rac.c                   | 181 ++++++++++++++++++++++++++
 6 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/include/asm/hardware/cache-b15-rac.h
 create mode 100644 arch/arm/mm/cache-b15-rac.c

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 2d46862e7bef..4d847e185cf6 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -199,7 +199,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
  */
 #if (defined(CONFIG_CPU_V7) && \
      (defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K))) || \
-	defined(CONFIG_SMP_ON_UP)
+	defined(CONFIG_SMP_ON_UP) || defined(CONFIG_CACHE_B15_RAC)
 #define __flush_icache_preferred	__cpuc_flush_icache_all
 #elif __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
 #define __flush_icache_preferred	__flush_icache_all_v7_smp
diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
index a3c24cd5b7c8..11f33b5f9284 100644
--- a/arch/arm/include/asm/glue-cache.h
+++ b/arch/arm/include/asm/glue-cache.h
@@ -117,6 +117,10 @@
 # endif
 #endif
 
+#if defined(CONFIG_CACHE_B15_RAC)
+# define MULTI_CACHE 1
+#endif
+
 #if defined(CONFIG_CPU_V7M)
 # ifdef _CACHE
 #  define MULTI_CACHE 1
diff --git a/arch/arm/include/asm/hardware/cache-b15-rac.h b/arch/arm/include/asm/hardware/cache-b15-rac.h
new file mode 100644
index 000000000000..76b888f53f90
--- /dev/null
+++ b/arch/arm/include/asm/hardware/cache-b15-rac.h
@@ -0,0 +1,12 @@
+#ifndef __ASM_ARM_HARDWARE_CACHE_B15_RAC_H
+#define __ASM_ARM_HARDWARE_CACHE_B15_RAC_H
+
+#ifndef __ASSEMBLY__
+
+void b15_flush_kern_cache_all(void);
+void b15_flush_kern_cache_louis(void);
+void b15_flush_icache_all(void);
+
+#endif
+
+#endif
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 9b4f29e595a4..4d5652a39304 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -853,6 +853,14 @@ config OUTER_CACHE_SYNC
 	  The outer cache has a outer_cache_fns.sync function pointer
 	  that can be used to drain the write buffer of the outer cache.
 
+config CACHE_B15_RAC
+	bool "Enable the Broadcom Brahma-B15 read-ahead cache controller"
+	depends on ARCH_BRCMSTB
+	default y
+	help
+	  This option enables the Broadcom Brahma-B15 read-ahead cache
+	  controller. If disabled, the read-ahead cache remains off.
+
 config CACHE_FEROCEON_L2
 	bool "Enable the Feroceon L2 cache controller"
 	depends on ARCH_MV78XX0 || ARCH_MVEBU
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index d3afdf9eb65a..a6797fdb6721 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -96,6 +96,7 @@ AFLAGS_proc-v6.o	:=-Wa,-march=armv6
 AFLAGS_proc-v7.o	:=-Wa,-march=armv7-a
 
 obj-$(CONFIG_OUTER_CACHE)	+= l2c-common.o
+obj-$(CONFIG_CACHE_B15_RAC)	+= cache-b15-rac.o
 obj-$(CONFIG_CACHE_FEROCEON_L2)	+= cache-feroceon-l2.o
 obj-$(CONFIG_CACHE_L2X0)	+= cache-l2x0.o l2c-l2x0-resume.o
 obj-$(CONFIG_CACHE_XSC3L2)	+= cache-xsc3l2.o
diff --git a/arch/arm/mm/cache-b15-rac.c b/arch/arm/mm/cache-b15-rac.c
new file mode 100644
index 000000000000..1c5bca6e906b
--- /dev/null
+++ b/arch/arm/mm/cache-b15-rac.c
@@ -0,0 +1,181 @@
+/*
+ * Broadcom Brahma-B15 CPU read-ahead cache management functions
+ *
+ * Copyright (C) 2015, Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/err.h>
+#include <linux/spinlock.h>
+#include <linux/io.h>
+#include <linux/bitops.h>
+#include <linux/of_address.h>
+
+#include <asm/cacheflush.h>
+#include <asm/hardware/cache-b15-rac.h>
+
+extern void v7_flush_kern_cache_all(void);
+extern void v7_flush_kern_cache_louis(void);
+extern void v7_flush_icache_all(void);
+
+/* RAC register offsets, relative to the HIF_CPU_BIUCTRL register base */
+#define RAC_CONFIG0_REG			(0x78)
+#define  RACENPREF_MASK			(0x3)
+#define  RACPREFINST_SHIFT		(0)
+#define  RACENINST_SHIFT		(2)
+#define  RACPREFDATA_SHIFT		(4)
+#define  RACENDATA_SHIFT		(6)
+#define  RAC_CPU_SHIFT			(8)
+#define  RACCFG_MASK			(0xff)
+#define RAC_CONFIG1_REG			(0x7c)
+#define RAC_FLUSH_REG			(0x80)
+#define  FLUSH_RAC			(1 << 0)
+
+/* Bitmask to enable instruction and data prefetching with a 256-bytes stride */
+#define RAC_DATA_INST_EN_MASK		(1 << RACPREFINST_SHIFT | \
+					 RACENPREF_MASK << RACENINST_SHIFT | \
+					 1 << RACPREFDATA_SHIFT | \
+					 RACENPREF_MASK << RACENDATA_SHIFT)
+
+#define RAC_ENABLED			(1 << 0)
+
+static void __iomem *b15_rac_base;
+static DEFINE_SPINLOCK(rac_lock);
+
+/* Initialization flag to avoid checking for b15_rac_base, and to prevent
+ * multi-platform kernels from crashing here as well.
+ */
+static unsigned long b15_rac_flags;
+
+static inline u32 __b15_rac_disable(void)
+{
+	u32 val = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
+	__raw_writel(0, b15_rac_base + RAC_CONFIG0_REG);
+	dmb();
+	return val;
+}
+
+static inline void __b15_rac_flush(void)
+{
+	u32 reg;
+
+	__raw_writel(FLUSH_RAC, b15_rac_base + RAC_FLUSH_REG);
+	do {
+		/* This dmb() is required to force the Bus Interface Unit
+		 * to clean outstanding writes, and forces an idle cycle
+		 * to be inserted.
+		 */
+		dmb();
+		reg = __raw_readl(b15_rac_base + RAC_FLUSH_REG);
+	} while (reg & FLUSH_RAC);
+}
+
+static inline u32 b15_rac_disable_and_flush(void)
+{
+	u32 reg;
+
+	reg = __b15_rac_disable();
+	__b15_rac_flush();
+	return reg;
+}
+
+static inline void __b15_rac_enable(u32 val)
+{
+	__raw_writel(val, b15_rac_base + RAC_CONFIG0_REG);
+	/* dsb() is required here to be consistent with __flush_icache_all() */
+	dsb();
+}
+
+#define BUILD_RAC_CACHE_OP(name, bar)				\
+void b15_flush_##name(void)					\
+{								\
+	unsigned int do_flush;					\
+	u32 val = 0;						\
+								\
+	spin_lock(&rac_lock);					\
+	do_flush = test_bit(RAC_ENABLED, &b15_rac_flags);	\
+	if (do_flush)						\
+		val = b15_rac_disable_and_flush();		\
+	v7_flush_##name();					\
+	if (!do_flush)						\
+		bar;						\
+	else							\
+		__b15_rac_enable(val);				\
+	spin_unlock(&rac_lock);					\
+}
+
+#define nobarrier
+
+/* The readahead cache present in the Brahma-B15 CPU is a special piece of
+ * hardware after the integrated L2 cache of the B15 CPU complex whose purpose
+ * is to prefetch instructions and/or data with a line size of either 64 bytes
+ * or 256 bytes. The rationale is that the data-bus of the CPU interface is
+ * optimized for 256-byte transactions; since enabling the readahead cache
+ * provides a significant performance boost (typically twice the performance
+ * for a memcpy benchmark application), we want it enabled.
+ *
+ * The readahead cache is transparent for cache maintenance operations by
+ * Modified Virtual Address: ICIMVAU, DCIMVAC, DCCMVAC, DCCMVAU and
+ * DCCIMVAC.
+ *
+ * It is, however, not transparent for the following cache maintenance
+ * operations: DCISW, DCCSW, DCCISW, ICIALLUIS and ICIALLU, which is precisely
+ * what we are patching here with BUILD_RAC_CACHE_OP.
+ */
+
+BUILD_RAC_CACHE_OP(kern_cache_all, nobarrier);
+BUILD_RAC_CACHE_OP(kern_cache_louis, nobarrier);
+BUILD_RAC_CACHE_OP(icache_all, dsb());
+
+static void b15_rac_enable(void)
+{
+	unsigned int cpu;
+	u32 enable = 0;
+
+	for_each_possible_cpu(cpu)
+		enable |= (RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT));
+
+	b15_rac_disable_and_flush();
+	__b15_rac_enable(enable);
+}
+
+static int __init b15_rac_init(void)
+{
+	struct device_node *dn;
+	int ret = 0, cpu;
+	u32 reg, en_mask = 0;
+
+	dn = of_find_compatible_node(NULL, NULL, "brcm,brcmstb-cpu-biu-ctrl");
+	if (!dn)
+		return -ENODEV;
+
+	WARN(num_possible_cpus() > 4, "RAC only supports 4 CPUs\n");
+
+	b15_rac_base = of_iomap(dn, 0);
+	if (!b15_rac_base) {
+		pr_err("failed to remap BIU control base\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	spin_lock(&rac_lock);
+	reg = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
+	for_each_possible_cpu(cpu)
+		en_mask |= ((1 << RACPREFDATA_SHIFT) << (cpu * RAC_CPU_SHIFT));
+	WARN(reg & en_mask, "Read-ahead cache not previously disabled\n");
+
+	b15_rac_enable();
+	set_bit(RAC_ENABLED, &b15_rac_flags);
+	spin_unlock(&rac_lock);
+
+	pr_info("Broadcom Brahma-B15 readahead cache at: 0x%p\n",
+		b15_rac_base + RAC_CONFIG0_REG);
+
+out:
+	of_node_put(dn);
+	return ret;
+}
+arch_initcall(b15_rac_init);
-- 
2.1.0
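As a worked check of the bit arithmetic in b15_rac_enable() and b15_rac_init() above, the following stand-alone sketch reuses the field shifts from the patch (with unsigned constants) and assumes possible CPUs are numbered contiguously from 0. RAC_DATA_INST_EN_MASK works out to 0xdd per CPU, so four CPUs fill all 32 bits of RAC_CONFIG0_REG, which is presumably why the driver warns when more than four CPUs are possible. The two helper functions are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* RAC_CONFIG0_REG field layout, copied from the patch (constants made unsigned). */
#define RACENPREF_MASK		0x3u
#define RACPREFINST_SHIFT	0
#define RACENINST_SHIFT		2
#define RACPREFDATA_SHIFT	4
#define RACENDATA_SHIFT		6
#define RAC_CPU_SHIFT		8

#define RAC_DATA_INST_EN_MASK	(1u << RACPREFINST_SHIFT | \
				 RACENPREF_MASK << RACENINST_SHIFT | \
				 1u << RACPREFDATA_SHIFT | \
				 RACENPREF_MASK << RACENDATA_SHIFT)

/* Hypothetical helper: the value b15_rac_enable() builds for CPUs 0..ncpus-1. */
static uint32_t rac_enable_value(unsigned int ncpus)
{
	uint32_t enable = 0;

	/* ncpus must not exceed 4: a shift by 32 would be undefined behaviour. */
	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		enable |= (uint32_t)RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT);
	return enable;
}

/* Hypothetical helper: the "data prefetch left on" mask from b15_rac_init(). */
static uint32_t rac_prefdata_mask(unsigned int ncpus)
{
	uint32_t mask = 0;

	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		mask |= (1u << RACPREFDATA_SHIFT) << (cpu * RAC_CPU_SHIFT);
	return mask;
}
```

For four CPUs, rac_enable_value(4) gives 0xdddddddd and rac_prefdata_mask(4) gives 0x10101010.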


* [PATCH 3/5] ARM: Hook B15 readahead cache functions based on processor
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
  2015-03-07  0:54 ` [PATCH 1/5] ARM: v7: allow setting different cache functions Florian Fainelli
  2015-03-07  0:54 ` [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support Florian Fainelli
@ 2015-03-07  0:54 ` Florian Fainelli
  2015-03-07  0:54 ` [PATCH 4/5] ARM: B15: Add CPU hotplug awareness Florian Fainelli
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

If we detect that we are running on a Broadcom Brahma-B15 CPU, and
CONFIG_CACHE_B15_RAC is enabled, make sure that we pick up the
b15_cache_fns function operations.

If CONFIG_CACHE_B15_RAC is enabled, but we are not running on a Broadcom
Brahma-B15 CPU, we fall back to calling into the regular
v7_cache_fns at no cost. If CONFIG_CACHE_B15_RAC is disabled, there is
no cost either and we just use the regular v7_cache_fns.
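The selection happens through the processor lookup done at boot, which matches the CPU ID register against each proc_info entry's value/mask pair. A simplified C model of that match could look like the following; it uses the Brahma-B15 value and mask from __v7_b15mp_proc_info plus a catch-all ARMv7 entry (assumed here to be val = mask = 0x000f0000), and the struct and function names are made up for the example. As with the real table, the first match wins, so the catch-all entry has to come last:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of a proc_info_list entry. */
struct proc_info_sketch {
	uint32_t cpu_val;	/* required ID value */
	uint32_t cpu_mask;	/* mask applied to the ID before comparing */
	const char *cache_fns;	/* name of the cpu_cache_fns table to use */
};

static const struct proc_info_sketch proc_table[] = {
	/* Value and mask from __v7_b15mp_proc_info in proc-v7.S */
	{ 0x420f00f0, 0xff0ffff0, "b15_cache_fns" },
	/* Assumed catch-all ARMv7 entry (architecture field == 0xf) */
	{ 0x000f0000, 0x000f0000, "v7_cache_fns" },
};

/* Returns the cache_fns name of the first matching entry, or NULL. */
static const char *lookup_cache_fns(uint32_t midr)
{
	for (size_t i = 0; i < sizeof(proc_table) / sizeof(proc_table[0]); i++) {
		if ((midr & proc_table[i].cpu_mask) == proc_table[i].cpu_val)
			return proc_table[i].cache_fns;
	}
	return NULL;
}
```

A B15 ID such as 0x420f00f2 selects b15_cache_fns, any other ID with the CPUID-scheme architecture field falls through to v7_cache_fns, and an ID without that field matches nothing.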

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm/mm/cache-v7.S | 21 +++++++++++++++++++++
 arch/arm/mm/proc-v7.S  |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index b966656d2c2d..86a065e5e5a5 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -15,6 +15,7 @@
 #include <asm/assembler.h>
 #include <asm/errno.h>
 #include <asm/unwind.h>
+#include <asm/hardware/cache-b15-rac.h>
 
 #include "proc-macros.S"
 
@@ -446,3 +447,23 @@ ENDPROC(v7_dma_unmap_area)
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions v7
+
+	/* The Broadcom Brahma-B15 read-ahead cache requires some modifications
+	 * to the v7_cache_fns; we only override the ones we need.
+	 */
+#ifndef CONFIG_CACHE_B15_RAC
+	globl_equ	b15_flush_icache_all,		v7_flush_icache_all
+	globl_equ	b15_flush_kern_cache_all,	v7_flush_kern_cache_all
+	globl_equ	b15_flush_kern_cache_louis,	v7_flush_kern_cache_louis
+#endif
+	globl_equ	b15_flush_user_cache_all,	v7_flush_user_cache_all
+	globl_equ	b15_flush_user_cache_range,	v7_flush_user_cache_range
+	globl_equ	b15_coherent_kern_range,	v7_coherent_kern_range
+	globl_equ	b15_coherent_user_range,	v7_coherent_user_range
+	globl_equ	b15_flush_kern_dcache_area,	v7_flush_kern_dcache_area
+
+	globl_equ	b15_dma_map_area,		v7_dma_map_area
+	globl_equ	b15_dma_unmap_area,		v7_dma_unmap_area
+	globl_equ	b15_dma_flush_range,		v7_dma_flush_range
+
+	define_cache_functions b15
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index e0dc266579bc..683ad5f2c0c5 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -568,7 +568,7 @@ __v7_ca15mp_proc_info:
 __v7_b15mp_proc_info:
 	.long	0x420f00f0
 	.long	0xff0ffff0
-	__v7_proc __v7_b15mp_setup
+	__v7_proc __v7_b15mp_setup, cache_fns = b15_cache_fns
 	.size	__v7_b15mp_proc_info, . - __v7_b15mp_proc_info
 
 	/*
-- 
2.1.0


* [PATCH 4/5] ARM: B15: Add CPU hotplug awareness
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
                   ` (2 preceding siblings ...)
  2015-03-07  0:54 ` [PATCH 3/5] ARM: Hook B15 readahead cache functions based on processor Florian Fainelli
@ 2015-03-07  0:54 ` Florian Fainelli
  2015-03-07  0:54 ` [PATCH 5/5] ARM: B15: Add suspend/resume hooks Florian Fainelli
  2015-03-16 18:33 ` [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
  5 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

The Broadcom Brahma-B15 readahead cache needs to be disabled and
re-enabled, respectively, around a CPU hotplug operation. If we did not
do so, CPU hotplug would occasionally fail with random crashes when a given CPU
exits the coherency domain while the RAC is still enabled, as it would
get stale data from the RAC.

In order to avoid adding any specific B15 readahead-cache awareness to
arch/arm/mach-bcm/hotplug-brcmstb.c, we use a CPU notifier which allows
us to catch CPU hotplug events and disable/flush and re-enable the RAC
accordingly.
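The notifier logic boils down to a small state machine: CPU_DYING saves and disables the RAC and clears RAC_ENABLED, while CPU_DEAD or CPU_DOWN_FAILED restores the saved value and sets RAC_ENABLED again. The sketch below models just that, using hypothetical `HP_*` action names instead of the kernel's CPU_* constants, a plain bool for the flag, and no flush step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU_* notifier actions. */
enum hp_action { HP_DOWN_PREPARE, HP_DYING, HP_DOWN_FAILED, HP_DEAD };

struct rac_state {
	uint32_t config0;	/* models RAC_CONFIG0_REG */
	uint32_t saved;		/* models rac_config0_reg */
	bool enabled;		/* models the RAC_ENABLED bit in b15_rac_flags */
};

/* CPU_DYING: stop the flush wrappers using the RAC, save and disable it. */
static void hotplug_start(struct rac_state *s)
{
	s->enabled = false;
	s->saved = s->config0;
	s->config0 = 0;		/* the real code also flushes the RAC here */
}

/* CPU_DEAD or CPU_DOWN_FAILED: restore the saved configuration. */
static void hotplug_end(struct rac_state *s)
{
	s->config0 = s->saved;
	s->enabled = true;
}

static void rac_cpu_notify(struct rac_state *s, enum hp_action action)
{
	switch (action) {
	case HP_DYING:
		hotplug_start(s);
		break;
	case HP_DOWN_FAILED:
	case HP_DEAD:
		hotplug_end(s);
		break;
	default:
		break;	/* other actions are ignored, as in the patch */
	}
}
```

Walking a successful offline through HP_DOWN_PREPARE, HP_DYING and HP_DEAD leaves the RAC untouched, then off, then restored to its saved configuration.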

Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm/mm/cache-b15-rac.c | 100 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/arch/arm/mm/cache-b15-rac.c b/arch/arm/mm/cache-b15-rac.c
index 1c5bca6e906b..73d29741f096 100644
--- a/arch/arm/mm/cache-b15-rac.c
+++ b/arch/arm/mm/cache-b15-rac.c
@@ -13,6 +13,8 @@
 #include <linux/io.h>
 #include <linux/bitops.h>
 #include <linux/of_address.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
 
 #include <asm/cacheflush.h>
 #include <asm/hardware/cache-b15-rac.h>
@@ -44,6 +46,7 @@ extern void v7_flush_icache_all(void);
 
 static void __iomem *b15_rac_base;
 static DEFINE_SPINLOCK(rac_lock);
+static u32 rac_config0_reg;
 
 /* Initialization flag to avoid checking for b15_rac_base, and to prevent
  * multi-platform kernels from crashing here as well.
@@ -142,6 +145,94 @@ static void b15_rac_enable(void)
 	__b15_rac_enable(enable);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void b15_rac_hotplug_start(void)
+{
+	/* Indicate that we are starting a hotplug procedure */
+	clear_bit(RAC_ENABLED, &b15_rac_flags);
+
+	/* Disable the readahead cache and save its value to a global */
+	rac_config0_reg = b15_rac_disable_and_flush();
+}
+
+static void b15_rac_hotplug_end(void)
+{
+	/* And enable it */
+	__b15_rac_enable(rac_config0_reg);
+	set_bit(RAC_ENABLED, &b15_rac_flags);
+}
+
+/* The CPU hotplug case is the most interesting one: we basically need to make
+ * sure that the RAC is disabled for the entire system prior to having a CPU
+ * die, in particular prior to this dying CPU having exited the coherency
+ * domain.
+ *
+ * Once this CPU is marked dead, we can safely re-enable the RAC for the
+ * remaining CPUs in the system which are still online.
+ *
+ * Offlining a CPU is the problematic case; onlining a CPU is not much of an
+ * issue since the CPU and its cache-level hierarchy will start filling with
+ * the RAC disabled, i.e. using L1 and L2 only.
+ *
+ * In this function, we should NOT have to verify any unsafe setting/condition:
+ * b15_rac_base:
+ *
+ *   It is protected by the RAC_ENABLED flag, which is cleared by default and
+ *   only set once the initialization procedure is done; b15_rac_base has been
+ *   set by then.
+ *
+ * RAC_ENABLED:
+ *   There is a small timing window in b15_rac_init() between
+ *      register_cpu_notifier(&b15_rac_cpu_nb);
+ *      ...
+ *      set RAC_ENABLED
+ *   However, no CPU hotplug activity can occur at that point of the boot.
+ *
+ * Regarding the notification actions, we will receive CPU_DOWN_PREPARE,
+ * CPU_DOWN_FAILED, CPU_DYING, CPU_DEAD, and CPU_POST_DEAD notifications (see
+ * _cpu_down() for details).
+ *
+ * Since we have to disable the RAC for all cores, we keep it on as long as
+ * possible (i.e. disable it as late as possible) to gain the cache benefit.
+ *
+ * Thus, the CPU_DYING/CPU_DEAD pair is chosen.
+ *
+ * We choose not to disable the RAC on a per-CPU basis here; if we did, we
+ * would want to consider disabling it as early as possible to benefit the
+ * other active CPUs.
+ */
+static int b15_rac_cpu_notify(struct notifier_block *self,
+			      unsigned long action, void *hcpu)
+{
+	action &= ~CPU_TASKS_FROZEN;
+
+	if (action != CPU_DYING && action != CPU_DOWN_FAILED &&
+	    action != CPU_DEAD)
+		return NOTIFY_OK;
+
+	spin_lock(&rac_lock);
+	switch (action) {
+	/* called on the dying CPU, exactly what we want */
+	case CPU_DYING:
+		b15_rac_hotplug_start();
+		break;
+
+	/* called on a non-dying CPU, what we want too */
+	case CPU_DOWN_FAILED:
+	case CPU_DEAD:
+		b15_rac_hotplug_end();
+		break;
+	}
+	spin_unlock(&rac_lock);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block b15_rac_cpu_nb = {
+	.notifier_call	= b15_rac_cpu_notify,
+};
+#endif /* CONFIG_HOTPLUG_CPU */
+
 static int __init b15_rac_init(void)
 {
 	struct device_node *dn;
@@ -161,6 +252,15 @@ static int __init b15_rac_init(void)
 		goto out;
 	}
 
+#ifdef CONFIG_HOTPLUG_CPU
+	ret = register_cpu_notifier(&b15_rac_cpu_nb);
+	if (ret) {
+		pr_err("failed to register notifier block\n");
+		iounmap(b15_rac_base);
+		goto out;
+	}
+#endif
+
 	spin_lock(&rac_lock);
 	reg = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
 	for_each_possible_cpu(cpu)
-- 
2.1.0


* [PATCH 5/5] ARM: B15: Add suspend/resume hooks
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
                   ` (3 preceding siblings ...)
  2015-03-07  0:54 ` [PATCH 4/5] ARM: B15: Add CPU hotplug awareness Florian Fainelli
@ 2015-03-07  0:54 ` Florian Fainelli
  2015-03-16 18:33 ` [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
  5 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-07  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

The Broadcom Brahma-B15 CPU readahead cache registers will be reset
to their Power-on-Reset values after an S3 suspend/resume cycle, so we
want to restore what we had enabled before.

Another thing we want to take care of is disabling the read-ahead cache
prior to suspending to avoid any sort of side effect with the spinlock
we need to grab to serialize register accesses.
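The suspend/resume interaction with the flush wrappers can be sketched the same way: while RAC_SUSPENDED is set, a wrapped cache operation skips the lock and the RAC handling entirely, and resume restores the saved RAC_CONFIG0_REG value. In this sketch the flags are bit numbers in an `unsigned long` (mirroring the test_bit()/set_bit() usage), `lock_taken` counts lock acquisitions, and the power-on-reset value of the register is assumed to be 0; all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers for the simulated flags word. */
#define SIM_RAC_ENABLED		0
#define SIM_RAC_SUSPENDED	1

struct rac_pm {
	uint32_t config0;	/* models RAC_CONFIG0_REG */
	uint32_t saved;		/* models rac_config0_reg */
	unsigned long flags;	/* models b15_rac_flags */
	int lock_taken;		/* how often the simulated spinlock was taken */
	int ops_run;		/* how often the underlying v7 op ran */
};

static int test_flag(const struct rac_pm *s, int bit)
{
	return (int)((s->flags >> bit) & 1UL);
}

static void set_flag(struct rac_pm *s, int bit) { s->flags |= 1UL << bit; }
static void clear_flag(struct rac_pm *s, int bit) { s->flags &= ~(1UL << bit); }

/* Simplified BUILD_RAC_CACHE_OP body with the suspend bypass. */
static void flush_op(struct rac_pm *s)
{
	if (test_flag(s, SIM_RAC_SUSPENDED)) {
		s->ops_run++;	/* plain v7 op: no lock, no RAC handling */
		return;
	}
	s->lock_taken++;
	if (test_flag(s, SIM_RAC_ENABLED)) {
		uint32_t val = s->config0;

		s->config0 = 0;		/* disable (flush omitted) */
		s->ops_run++;
		s->config0 = val;	/* re-enable */
	} else {
		s->ops_run++;
	}
}

static void rac_suspend(struct rac_pm *s)
{
	s->saved = s->config0;
	s->config0 = 0;
	set_flag(s, SIM_RAC_SUSPENDED);
}

static void rac_resume(struct rac_pm *s)
{
	s->config0 = s->saved;
	clear_flag(s, SIM_RAC_SUSPENDED);
}
```

Between rac_suspend() and rac_resume(), flush_op() still runs the underlying operation but never touches the lock counter.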

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm/mm/cache-b15-rac.c | 48 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/arch/arm/mm/cache-b15-rac.c b/arch/arm/mm/cache-b15-rac.c
index 73d29741f096..adde1808c478 100644
--- a/arch/arm/mm/cache-b15-rac.c
+++ b/arch/arm/mm/cache-b15-rac.c
@@ -15,6 +15,7 @@
 #include <linux/of_address.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/cacheflush.h>
 #include <asm/hardware/cache-b15-rac.h>
@@ -43,6 +44,10 @@ extern void v7_flush_icache_all(void);
 					 RACENPREF_MASK << RACENDATA_SHIFT)
 
 #define RAC_ENABLED			(1 << 0)
+/* Special state where we want to bypass the spinlock and call directly
+ * into the v7 cache maintenance operations during suspend/resume
+ */
+#define RAC_SUSPENDED			(1 << 1)
 
 static void __iomem *b15_rac_base;
 static DEFINE_SPINLOCK(rac_lock);
@@ -98,6 +103,12 @@ void b15_flush_##name(void)					\
 	unsigned int do_flush;					\
 	u32 val = 0;						\
 								\
+	if (test_bit(RAC_SUSPENDED, &b15_rac_flags)) {		\
+		v7_flush_##name();				\
+		bar;						\
+		return;						\
+	}							\
+								\
 	spin_lock(&rac_lock);					\
 	do_flush = test_bit(RAC_ENABLED, &b15_rac_flags);	\
 	if (do_flush)						\
@@ -233,6 +244,39 @@ static struct notifier_block b15_rac_cpu_nb = {
 };
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_PM_SLEEP
+static int b15_rac_suspend(void)
+{
+	/* Suspend the read-ahead cache operations, forcing our cache
+	 * implementation to fall back to the regular ARMv7 calls.
+	 *
+	 * We are guaranteed to be running on the boot CPU at this point and
+	 * with every other CPU quiesced, so setting RAC_SUSPENDED is not racy
+	 * here.
+	 */
+	rac_config0_reg = b15_rac_disable_and_flush();
+	set_bit(RAC_SUSPENDED, &b15_rac_flags);
+
+	return 0;
+}
+
+static void b15_rac_resume(void)
+{
+	/* Coming out of an S3 suspend/resume cycle, the read-ahead cache
+	 * register RAC_CONFIG0_REG will be reset to its default value, so make
+	 * sure we re-enable it and clear the suspended flag. We are also
+	 * guaranteed to run on the boot CPU, so this is not racy either.
+	 */
+	__b15_rac_enable(rac_config0_reg);
+	clear_bit(RAC_SUSPENDED, &b15_rac_flags);
+}
+
+static struct syscore_ops b15_rac_syscore_ops = {
+	.suspend	= b15_rac_suspend,
+	.resume		= b15_rac_resume,
+};
+#endif
+
 static int __init b15_rac_init(void)
 {
 	struct device_node *dn;
@@ -261,6 +305,10 @@ static int __init b15_rac_init(void)
 	}
 #endif
 
+#ifdef CONFIG_PM_SLEEP
+	register_syscore_ops(&b15_rac_syscore_ops);
+#endif
+
 	spin_lock(&rac_lock);
 	reg = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
 	for_each_possible_cpu(cpu)
-- 
2.1.0


* [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support
  2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
                   ` (4 preceding siblings ...)
  2015-03-07  0:54 ` [PATCH 5/5] ARM: B15: Add suspend/resume hooks Florian Fainelli
@ 2015-03-16 18:33 ` Florian Fainelli
  5 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-16 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/03/15 16:54, Florian Fainelli wrote:
> Hi all,
> 
> This patch series adds support for the Broadcom Brahma-B15 CPU read-ahead cache
> which is to be found in several Broadcom SoCs, and mainline brcmstb.
> 
> Let me know your thoughts on this, we have tried to minimize both the
> compile-time and runtime impact for non-Broadcom CPU parts.

Any feedback on this implementation? Thanks!

> 
> Florian Fainelli (5):
>   ARM: v7: allow setting different cache functions
>   ARM: Add Broadcom Brahma-B15 readahead cache support
>   ARM: Hook B15 readahead cache functions based on processor
>   ARM: B15: Add CPU hotplug awareness
>   ARM: B15: Add suspend/resume hooks
> 
>  arch/arm/include/asm/cacheflush.h             |   2 +-
>  arch/arm/include/asm/glue-cache.h             |   4 +
>  arch/arm/include/asm/hardware/cache-b15-rac.h |  12 +
>  arch/arm/mm/Kconfig                           |   8 +
>  arch/arm/mm/Makefile                          |   1 +
>  arch/arm/mm/cache-b15-rac.c                   | 329 ++++++++++++++++++++++++++
>  arch/arm/mm/cache-v7.S                        |  21 ++
>  arch/arm/mm/proc-v7.S                         |   6 +-
>  8 files changed, 379 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/include/asm/hardware/cache-b15-rac.h
>  create mode 100644 arch/arm/mm/cache-b15-rac.c
> 


-- 
Florian


* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-07  0:54 ` [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support Florian Fainelli
@ 2015-03-16 21:02   ` Russell King - ARM Linux
  2015-03-16 21:20     ` Florian Fainelli
  2015-03-17 17:29   ` Will Deacon
  1 sibling, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-03-16 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Mar 06, 2015 at 04:54:50PM -0800, Florian Fainelli wrote:
> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
> controller. This cache controller sits between the L2 and the memory bus
> and its purpose is to provide a friendler burst size towards the DDR
> interface than the native cache line size.
> 
> The readahead cache is mostly transparent, except for
> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
> is precisely what we are overriding here.
> 
> The readahead cache only intercepts reads, not writes, as such, some
> data can remain stale in any of its buffers, such that we need to flush
> it, which is an operation that needs to happen in a particular order:
> 
> - disable the readahead cache
> - flush it
> - call the appropriate cache-v7.S function
> - re-enable
> 
> This patch tries to minimize the impact to the cache-v7.S file by only
> providing a stub in case CONFIG_CACHE_B15_RAC is enabled (default for
> ARCH_BRCMSTB since it is the current user).
> 
> Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  arch/arm/include/asm/cacheflush.h             |   2 +-
>  arch/arm/include/asm/glue-cache.h             |   4 +
>  arch/arm/include/asm/hardware/cache-b15-rac.h |  12 ++
>  arch/arm/mm/Kconfig                           |   8 ++
>  arch/arm/mm/Makefile                          |   1 +
>  arch/arm/mm/cache-b15-rac.c                   | 181 ++++++++++++++++++++++++++
>  6 files changed, 207 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/include/asm/hardware/cache-b15-rac.h
>  create mode 100644 arch/arm/mm/cache-b15-rac.c
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 2d46862e7bef..4d847e185cf6 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -199,7 +199,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>   */
>  #if (defined(CONFIG_CPU_V7) && \
>       (defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K))) || \
> -	defined(CONFIG_SMP_ON_UP)
> +	defined(CONFIG_SMP_ON_UP) || defined(CONFIG_CACHE_B15_RAC)
>  #define __flush_icache_preferred	__cpuc_flush_icache_all
>  #elif __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
>  #define __flush_icache_preferred	__flush_icache_all_v7_smp
> diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
> index a3c24cd5b7c8..11f33b5f9284 100644
> --- a/arch/arm/include/asm/glue-cache.h
> +++ b/arch/arm/include/asm/glue-cache.h
> @@ -117,6 +117,10 @@
>  # endif
>  #endif
>  
> +#if defined(CONFIG_CACHE_B15_RAC)
> +# define MULTI_CACHE 1
> +#endif
> +
>  #if defined(CONFIG_CPU_V7M)
>  # ifdef _CACHE
>  #  define MULTI_CACHE 1
> diff --git a/arch/arm/include/asm/hardware/cache-b15-rac.h b/arch/arm/include/asm/hardware/cache-b15-rac.h
> new file mode 100644
> index 000000000000..76b888f53f90
> --- /dev/null
> +++ b/arch/arm/include/asm/hardware/cache-b15-rac.h
> @@ -0,0 +1,12 @@
> +#ifndef __ASM_ARM_HARDWARE_CACHE_B15_RAC_H
> +#define __ASM_ARM_HARDWARE_CACHE_B15_RAC_H
> +
> +#ifndef __ASSEMBLY__
> +
> +void b15_flush_kern_cache_all(void);
> +void b15_flush_kern_cache_louis(void);
> +void b15_flush_icache_all(void);
> +
> +#endif
> +
> +#endif
> diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
> index 9b4f29e595a4..4d5652a39304 100644
> --- a/arch/arm/mm/Kconfig
> +++ b/arch/arm/mm/Kconfig
> @@ -853,6 +853,14 @@ config OUTER_CACHE_SYNC
>  	  The outer cache has a outer_cache_fns.sync function pointer
>  	  that can be used to drain the write buffer of the outer cache.
>  
> +config CACHE_B15_RAC
> +	bool "Enable the Broadcom Brahma-B15 read-ahead cache controller"
> +	depends on ARCH_BRCMSTB
> +	default y
> +	help
> +	  This option enables the Broadcom Brahma-B15 read-ahead cache
> +	  controller. If disabled, the read-ahead cache remains off.
> +
>  config CACHE_FEROCEON_L2
>  	bool "Enable the Feroceon L2 cache controller"
>  	depends on ARCH_MV78XX0 || ARCH_MVEBU
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index d3afdf9eb65a..a6797fdb6721 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -96,6 +96,7 @@ AFLAGS_proc-v6.o	:=-Wa,-march=armv6
>  AFLAGS_proc-v7.o	:=-Wa,-march=armv7-a
>  
>  obj-$(CONFIG_OUTER_CACHE)	+= l2c-common.o
> +obj-$(CONFIG_CACHE_B15_RAC)	+= cache-b15-rac.o
>  obj-$(CONFIG_CACHE_FEROCEON_L2)	+= cache-feroceon-l2.o
>  obj-$(CONFIG_CACHE_L2X0)	+= cache-l2x0.o l2c-l2x0-resume.o
>  obj-$(CONFIG_CACHE_XSC3L2)	+= cache-xsc3l2.o
> diff --git a/arch/arm/mm/cache-b15-rac.c b/arch/arm/mm/cache-b15-rac.c
> new file mode 100644
> index 000000000000..1c5bca6e906b
> --- /dev/null
> +++ b/arch/arm/mm/cache-b15-rac.c
> @@ -0,0 +1,181 @@
> +/*
> + * Broadcom Brahma-B15 CPU read-ahead cache management functions
> + *
> + * Copyright (C) 2015, Broadcom Corporation
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/err.h>
> +#include <linux/spinlock.h>
> +#include <linux/io.h>
> +#include <linux/bitops.h>
> +#include <linux/of_address.h>
> +
> +#include <asm/cacheflush.h>
> +#include <asm/hardware/cache-b15-rac.h>
> +
> +extern void v7_flush_kern_cache_all(void);
> +extern void v7_flush_kern_cache_louis(void);
> +extern void v7_flush_icache_all(void);
> +
> +/* RAC register offsets, relative to the HIF_CPU_BIUCTRL register base */
> +#define RAC_CONFIG0_REG			(0x78)
> +#define  RACENPREF_MASK			(0x3)
> +#define  RACPREFINST_SHIFT		(0)
> +#define  RACENINST_SHIFT		(2)
> +#define  RACPREFDATA_SHIFT		(4)
> +#define  RACENDATA_SHIFT		(6)
> +#define  RAC_CPU_SHIFT			(8)
> +#define  RACCFG_MASK			(0xff)
> +#define RAC_CONFIG1_REG			(0x7c)
> +#define RAC_FLUSH_REG			(0x80)
> +#define  FLUSH_RAC			(1 << 0)

					BIT(0) ?

> +
> +/* Bitmask to enable instruction and data prefetching with a 256-bytes stride */
> +#define RAC_DATA_INST_EN_MASK		(1 << RACPREFINST_SHIFT | \
> +					 RACENPREF_MASK << RACENINST_SHIFT | \
> +					 1 << RACPREFDATA_SHIFT | \
> +					 RACENPREF_MASK << RACENDATA_SHIFT)
> +
> +#define RAC_ENABLED			(1 << 0)

					BIT(0) ?

However, you don't use RAC_ENABLED as a bitmask, but a bit index, so
shouldn't this be zero?

> +
> +static void __iomem *b15_rac_base;
> +static DEFINE_SPINLOCK(rac_lock);
> +
> +/* Initialization flag to avoid checking for b15_rac_base, and to prevent
> + * multi-platform kernels from crashing here as well.
> + */
> +static unsigned long b15_rac_flags;
> +
> +static inline u32 __b15_rac_disable(void)
> +{
> +	u32 val = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
> +	__raw_writel(0, b15_rac_base + RAC_CONFIG0_REG);
> +	dmb();
> +	return val;
> +}
> +
> +static inline void __b15_rac_flush(void)
> +{
> +	u32 reg;
> +
> +	__raw_writel(FLUSH_RAC, b15_rac_base + RAC_FLUSH_REG);
> +	do {
> +		/* This dmb() is required to force the Bus Interface Unit
> +		 * to clean outstanding writes, and forces an idle cycle
> +		 * to be inserted.
> +		 */
> +		dmb();
> +		reg = __raw_readl(b15_rac_base + RAC_FLUSH_REG);
> +	} while (reg & FLUSH_RAC);
> +}
> +
> +static inline u32 b15_rac_disable_and_flush(void)
> +{
> +	u32 reg;
> +
> +	reg = __b15_rac_disable();
> +	__b15_rac_flush();
> +	return reg;
> +}
> +
> +static inline void __b15_rac_enable(u32 val)
> +{
> +	__raw_writel(val, b15_rac_base + RAC_CONFIG0_REG);
> +	/* dsb() is required here to be consistent with __flush_icache_all() */
> +	dsb();
> +}
> +
> +#define BUILD_RAC_CACHE_OP(name, bar)				\
> +void b15_flush_##name(void)					\
> +{								\
> +	unsigned int do_flush;					\
> +	u32 val = 0;						\
> +								\
> +	spin_lock(&rac_lock);					\
> +	do_flush = test_bit(RAC_ENABLED, &b15_rac_flags);	\

Do you need to use test_bit() here?  You set and test this location
under a spinlock, so it's safe to use non-atomic ops here.
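Because rac_lock already serialises every access to b15_rac_flags, a plain bit test behaves the same as test_bit() inside the locked region. Below is a minimal userspace sketch of that pattern. It follows the disable / flush / v7 operation / re-enable order from the patch description. It assumes a C11 atomic_flag spinlock in place of rac_lock, and plain variables in place of RAC_CONFIG0_REG and the flush register. struct rac_model, RAC_MODEL_INIT, rac_flush_op() and fake_v7_flush() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RAC_ENABLED_BIT	0	/* a bit number, not a mask */

/* Hypothetical software model of the RAC; not the kernel driver. */
struct rac_model {
	atomic_flag lock;	/* stands in for rac_lock */
	unsigned long flags;	/* only accessed with lock held */
	uint32_t config0;	/* stands in for RAC_CONFIG0_REG */
	unsigned int flushes;	/* number of simulated RAC flushes */
};

#define RAC_MODEL_INIT(f, cfg)	{ ATOMIC_FLAG_INIT, (f), (cfg), 0 }

static void model_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static unsigned int v7_calls;

/* Hypothetical stand-in for v7_flush_kern_cache_all(). */
static void fake_v7_flush(void)
{
	v7_calls++;
}

/* Disable, flush, run the architected op, then restore the old config. */
static void rac_flush_op(struct rac_model *rac, void (*v7_op)(void))
{
	uint32_t saved = 0;
	int do_flush;

	model_spin_lock(&rac->lock);
	/* Plain, non-atomic test: the lock already serialises flags. */
	do_flush = (rac->flags >> RAC_ENABLED_BIT) & 1UL;
	if (do_flush) {
		saved = rac->config0;
		rac->config0 = 0;	/* 1. disable */
		rac->flushes++;		/* 2. flush (simulated) */
	}
	v7_op();			/* 3. v7 cache maintenance */
	if (do_flush)
		rac->config0 = saved;	/* 4. re-enable */
	model_spin_unlock(&rac->lock);
}
```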

> +static void b15_rac_enable(void)
> +{
> +	unsigned int cpu;
> +	u32 enable = 0;
> +
> +	for_each_possible_cpu(cpu)
> +		enable |= (RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT));

		enable |= RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT);

You don't need the additional parens - the right hand side of |= is
already expected to be an expression by the compiler.
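The extra parentheses are purely cosmetic, since << binds tighter than |=. The small sketch below computes the enable value from the field definitions copied out of the quoted patch, so the resulting bit pattern can be checked. rac_enable_value() and its ncpus parameter are hypothetical stand-ins for the for_each_possible_cpu() loop. The uint32_t cast is an addition of this sketch, so that the shift for CPU 3 stays in unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the quoted patch. */
#define RACENPREF_MASK		(0x3)
#define RACPREFINST_SHIFT	(0)
#define RACENINST_SHIFT		(2)
#define RACPREFDATA_SHIFT	(4)
#define RACENDATA_SHIFT		(6)
#define RAC_CPU_SHIFT		(8)

#define RAC_DATA_INST_EN_MASK	(1 << RACPREFINST_SHIFT | \
				 RACENPREF_MASK << RACENINST_SHIFT | \
				 1 << RACPREFDATA_SHIFT | \
				 RACENPREF_MASK << RACENDATA_SHIFT)

/* Enable value for CPUs 0..ncpus-1; 4 CPUs x 8 bits fill 32 bits here. */
static uint32_t rac_enable_value(unsigned int ncpus)
{
	uint32_t enable = 0;
	unsigned int cpu;

	if (ncpus > 4)
		ncpus = 4;
	for (cpu = 0; cpu < ncpus; cpu++)
		enable |= (uint32_t)RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT);
	return enable;
}
```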

> +	spin_lock(&rac_lock);
> +	reg = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
> +	for_each_possible_cpu(cpu)
> +		en_mask |= ((1 << RACPREFDATA_SHIFT) << (cpu * RAC_CPU_SHIFT));

		en_mask |= 1 << (RACPREFDATA_SHIFT + cpu * RAC_CPU_SHIFT);

looks nicer, rather than having two shifts.
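The two forms set the same bit. The quick check below compares them and builds the per-CPU data-prefetch mask. The shift values come from the quoted patch. The function names, and the limit of 4 CPUs, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define RACPREFDATA_SHIFT	(4)	/* from the quoted patch */
#define RAC_CPU_SHIFT		(8)

/* Original form: shift 1 into the field, then by the CPU stride. */
static uint32_t pref_data_two_shifts(unsigned int cpu)
{
	return ((uint32_t)1 << RACPREFDATA_SHIFT) << (cpu * RAC_CPU_SHIFT);
}

/* Suggested form: a single shift by the combined bit position. */
static uint32_t pref_data_one_shift(unsigned int cpu)
{
	return (uint32_t)1 << (RACPREFDATA_SHIFT + cpu * RAC_CPU_SHIFT);
}

/* Data-prefetch bits for CPUs 0..ncpus-1 (at most 4 in this sketch). */
static uint32_t pref_data_mask(unsigned int ncpus)
{
	uint32_t en_mask = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < ncpus && cpu < 4; cpu++)
		en_mask |= pref_data_one_shift(cpu);
	return en_mask;
}
```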

What happens when the system goes down (eg, for kexec?)  Does the RAC
need to be disabled for that?

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-16 21:02   ` Russell King - ARM Linux
@ 2015-03-16 21:20     ` Florian Fainelli
  2015-03-17  0:10       ` Russell King - ARM Linux
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2015-03-16 21:20 UTC
  To: linux-arm-kernel

On 16/03/15 14:02, Russell King - ARM Linux wrote:
> On Fri, Mar 06, 2015 at 04:54:50PM -0800, Florian Fainelli wrote:
>> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
>> controller. This cache controller sits between the L2 and the memory bus
>> and its purpose is to provide a friendlier burst size towards the DDR
>> interface than the native cache line size.
>>
>> The readahead cache is mostly transparent, except for
>> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
>> is precisely what we are overriding here.
>>
>> The readahead cache only intercepts reads, not writes; as such, some
>> data can remain stale in any of its buffers, such that we need to flush
>> it, which is an operation that needs to happen in a particular order:
>>
>> - disable the readahead cache
>> - flush it
>> - call the appropriate cache-v7.S function
>> - re-enable
>>
>> This patch tries to minimize the impact to the cache-v7.S file by only
>> providing a stub in case CONFIG_CACHE_B15_RAC is enabled (default for
>> ARCH_BRCMSTB since it is the current user).
>>
>> Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---

[snip]

>> +/* Bitmask to enable instruction and data prefetching with a 256-bytes stride */
>> +#define RAC_DATA_INST_EN_MASK		(1 << RACPREFINST_SHIFT | \
>> +					 RACENPREF_MASK << RACENINST_SHIFT | \
>> +					 1 << RACPREFDATA_SHIFT | \
>> +					 RACENPREF_MASK << RACENDATA_SHIFT)
>> +
>> +#define RAC_ENABLED			(1 << 0)
> 
> 					BIT(0) ?
> 
> However, you don't use RAC_ENABLED as a bitmask, but a bit index, so
> shouldn't this be zero?

In subsequent patches we have a need for distinguishing RAC_ENABLED from
RAC_SUSPENDED, so that's the primary reason for using it as a bitmask
(could make that clear somewhere).

[snip]

>> +#define BUILD_RAC_CACHE_OP(name, bar)				\
>> +void b15_flush_##name(void)					\
>> +{								\
>> +	unsigned int do_flush;					\
>> +	u32 val = 0;						\
>> +								\
>> +	spin_lock(&rac_lock);					\
>> +	do_flush = test_bit(RAC_ENABLED, &b15_rac_flags);	\
> 
> Do you need to use test_bit() here?  You set and test this location
> under a spinlock, so it's safe to use non-atomic ops here.

Right, we don't need the test_bit, it just felt a little nicer.

> 
>> +static void b15_rac_enable(void)
>> +{
>> +	unsigned int cpu;
>> +	u32 enable = 0;
>> +
>> +	for_each_possible_cpu(cpu)
>> +		enable |= (RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT));
> 
> 		enable |= RAC_DATA_INST_EN_MASK << (cpu * RAC_CPU_SHIFT);
> 
> You don't need the additional parens - the right hand side of |= is
> already expected to be an expression by the compiler.
> 
>> +	spin_lock(&rac_lock);
>> +	reg = __raw_readl(b15_rac_base + RAC_CONFIG0_REG);
>> +	for_each_possible_cpu(cpu)
>> +		en_mask |= ((1 << RACPREFDATA_SHIFT) << (cpu * RAC_CPU_SHIFT));
> 
> 		en_mask |= 1 << (RACPREFDATA_SHIFT + cpu * RAC_CPU_SHIFT);
> 
> looks nicer, rather than having two shifts.

Indeed, thanks.

> 
> What happens when the system goes down (eg, for kexec?)  Does the RAC
> need to be disabled for that?

Per boot convention, I would say so, yes, since this is another level of
instruction and data cache, we should turn it off. Can we register some
sort of notifier specifically for kexec?

Thanks!
-- 
Florian

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-16 21:20     ` Florian Fainelli
@ 2015-03-17  0:10       ` Russell King - ARM Linux
  2015-03-17  0:32         ` Florian Fainelli
  0 siblings, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux @ 2015-03-17  0:10 UTC
  To: linux-arm-kernel

On Mon, Mar 16, 2015 at 02:20:53PM -0700, Florian Fainelli wrote:
> On 16/03/15 14:02, Russell King - ARM Linux wrote:
> > On Fri, Mar 06, 2015 at 04:54:50PM -0800, Florian Fainelli wrote:
> >> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
> >> controller. This cache controller sits between the L2 and the memory bus
> >> and its purpose is to provide a friendlier burst size towards the DDR
> >> interface than the native cache line size.
> >>
> >> The readahead cache is mostly transparent, except for
> >> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
> >> is precisely what we are overriding here.
> >>
> >> The readahead cache only intercepts reads, not writes; as such, some
> >> data can remain stale in any of its buffers, such that we need to flush
> >> it, which is an operation that needs to happen in a particular order:
> >>
> >> - disable the readahead cache
> >> - flush it
> >> - call the appropriate cache-v7.S function
> >> - re-enable
> >>
> >> This patch tries to minimize the impact to the cache-v7.S file by only
> >> providing a stub in case CONFIG_CACHE_B15_RAC is enabled (default for
> >> ARCH_BRCMSTB since it is the current user).
> >>
> >> Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >> ---
> 
> [snip]
> 
> >> +/* Bitmask to enable instruction and data prefetching with a 256-bytes stride */
> >> +#define RAC_DATA_INST_EN_MASK		(1 << RACPREFINST_SHIFT | \
> >> +					 RACENPREF_MASK << RACENINST_SHIFT | \
> >> +					 1 << RACPREFDATA_SHIFT | \
> >> +					 RACENPREF_MASK << RACENDATA_SHIFT)
> >> +
> >> +#define RAC_ENABLED			(1 << 0)
> > 
> > 					BIT(0) ?
> > 
> > However, you don't use RAC_ENABLED as a bitmask, but a bit index, so
> > shouldn't this be zero?
> 
> In subsequent patches we have a need for distinguishing RAC_ENABLED from
> RAC_SUSPENDED, so that's the primary reason for using it as a bitmask
> (could make that clear somewhere).

However, test_bit() etc take a bit _number_ not a bit _mask_.  So:

Passing in 1 << 0 will test bit 1 rather than bit 0.
Passing in 1 << 1 will test bit 2 rather than bit 1.
Passing in 1 << 2 will test bit 4 rather than bit 2.
Passing in 1 << 3 will test bit 8 rather than bit 3.
etc.

This is not what you wanted.  Either use a mask directly, or use
test_bit() with a bit number etc.  Don't try and do both together. :)
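The two consistent styles, and the failure when they are mixed, can be checked in userspace. In the sketch below, model_test_bit() is a hypothetical imitation of test_bit() semantics: the first argument is a bit number. It is not the kernel implementation. The *_NR and *_MASK names are illustrative only.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))

/* Userspace imitation of test_bit(): nr is a bit NUMBER, not a mask. */
static int model_test_bit(unsigned long nr, const unsigned long *addr)
{
	return (addr[nr / MODEL_BITS_PER_LONG] >> (nr % MODEL_BITS_PER_LONG)) & 1UL;
}

/* Style 1: bit numbers, for test_bit()/set_bit()/clear_bit(). */
#define RAC_ENABLED_NR		0
#define RAC_SUSPENDED_NR	1

/* Style 2: masks, for plain & and | while holding rac_lock. */
#define RAC_ENABLED_MASK	(1UL << RAC_ENABLED_NR)
#define RAC_SUSPENDED_MASK	(1UL << RAC_SUSPENDED_NR)
```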

> > What happens when the system goes down (eg, for kexec?)  Does the RAC
> > need to be disabled for that?
> 
> Per boot convention, I would say so, yes, since this is another level of
> instruction and data cache, we should turn it off. Can we register some
> sort of notifier specifically for kexec?

The code at present doesn't expect there to be platform specific caches,
so that probably isn't catered for yet.  I mentioned the point to raise
the issue that there's an oversight here.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-17  0:10       ` Russell King - ARM Linux
@ 2015-03-17  0:32         ` Florian Fainelli
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-03-17  0:32 UTC
  To: linux-arm-kernel

On 16/03/15 17:10, Russell King - ARM Linux wrote:
> On Mon, Mar 16, 2015 at 02:20:53PM -0700, Florian Fainelli wrote:
>> On 16/03/15 14:02, Russell King - ARM Linux wrote:
>>> On Fri, Mar 06, 2015 at 04:54:50PM -0800, Florian Fainelli wrote:
>>>> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
>>>> controller. This cache controller sits between the L2 and the memory bus
>>>> and its purpose is to provide a friendlier burst size towards the DDR
>>>> interface than the native cache line size.
>>>>
>>>> The readahead cache is mostly transparent, except for
>>>> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
>>>> is precisely what we are overriding here.
>>>>
>>>> The readahead cache only intercepts reads, not writes; as such, some
>>>> data can remain stale in any of its buffers, such that we need to flush
>>>> it, which is an operation that needs to happen in a particular order:
>>>>
>>>> - disable the readahead cache
>>>> - flush it
>>>> - call the appropriate cache-v7.S function
>>>> - re-enable
>>>>
>>>> This patch tries to minimize the impact to the cache-v7.S file by only
>>>> providing a stub in case CONFIG_CACHE_B15_RAC is enabled (default for
>>>> ARCH_BRCMSTB since it is the current user).
>>>>
>>>> Signed-off-by: Alamy Liu <alamyliu@broadcom.com>
>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>>> ---
>>
>> [snip]
>>
>>>> +/* Bitmask to enable instruction and data prefetching with a 256-bytes stride */
>>>> +#define RAC_DATA_INST_EN_MASK		(1 << RACPREFINST_SHIFT | \
>>>> +					 RACENPREF_MASK << RACENINST_SHIFT | \
>>>> +					 1 << RACPREFDATA_SHIFT | \
>>>> +					 RACENPREF_MASK << RACENDATA_SHIFT)
>>>> +
>>>> +#define RAC_ENABLED			(1 << 0)
>>>
>>> 					BIT(0) ?
>>>
>>> However, you don't use RAC_ENABLED as a bitmask, but a bit index, so
>>> shouldn't this be zero?
>>
>> In subsequent patches we have a need for distinguishing RAC_ENABLED from
>> RAC_SUSPENDED, so that's the primary reason for using it as a bitmask
>> (could make that clear somewhere).
> 
> However, test_bit() etc take a bit _number_ not a bit _mask_.  So:
> 
> Passing in 1 << 0 will test bit 1 rather than bit 0.
> Passing in 1 << 1 will test bit 2 rather than bit 1.
> Passing in 1 << 2 will test bit 4 rather than bit 2.
> Passing in 1 << 3 will test bit 8 rather than bit 3.
> etc.
> 
> This is not what you wanted.  Either use a mask directly, or use
> test_bit() with a bit number etc.  Don't try and do both together. :)

Fixed, thanks.

> 
>>> What happens when the system goes down (eg, for kexec?)  Does the RAC
>>> need to be disabled for that?
>>
>> Per boot convention, I would say so, yes, since this is another level of
>> instruction and data cache, we should turn it off. Can we register some
>> sort of notifier specifically for kexec?
> 
> The code at present doesn't expect there to be platform specific caches,
> so that probably isn't catered for yet.  I mentioned the point to raise
> the issue that there's an oversight here.

When CONFIG_KEXEC_JUMP is set, kexec goes through the usual
suspend/resume path and will suspend the RAC by calling into
syscore_suspend (patch 5), but that option is not guaranteed to be set.

kernel_restart_prepare() calls a reboot notifier with SYS_RESTART, so if
we disable the RAC at this point, we should be good, I will add that.
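The proposed ordering can be modelled in userspace. A callback registered on the reboot chain sees the restart action before the next kernel starts, and it turns the RAC off. In the kernel this would presumably be a struct notifier_block passed to register_reboot_notifier(). In this sketch, the chain, MODEL_SYS_RESTART, rac_config0 and rac_reboot_cb() are all hypothetical stand-ins. The model covers only the reboot-notifier call made by kernel_restart_prepare(), nothing else that function does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; not the kernel's <linux/notifier.h>. */
#define MODEL_SYS_RESTART	1
#define MODEL_NOTIFY_DONE	0

struct model_notifier {
	int (*call)(struct model_notifier *nb, unsigned long action);
	struct model_notifier *next;
};

static uint32_t rac_config0 = 0xdd;	/* simulated RAC_CONFIG0_REG */
static struct model_notifier *reboot_chain;

static void model_register(struct model_notifier *nb)
{
	nb->next = reboot_chain;
	reboot_chain = nb;
}

/* Models only the reboot-chain walk done by kernel_restart_prepare(). */
static void model_restart_prepare(void)
{
	struct model_notifier *nb;

	for (nb = reboot_chain; nb; nb = nb->next)
		nb->call(nb, MODEL_SYS_RESTART);
}

/* Hypothetical RAC callback: turn the RAC off before handing over. */
static int rac_reboot_cb(struct model_notifier *nb, unsigned long action)
{
	(void)nb;
	if (action == MODEL_SYS_RESTART)
		rac_config0 = 0;	/* a real driver would also flush */
	return MODEL_NOTIFY_DONE;
}

static struct model_notifier rac_reboot_nb = { rac_reboot_cb, NULL };
```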
-- 
Florian

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-07  0:54 ` [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support Florian Fainelli
  2015-03-16 21:02   ` Russell King - ARM Linux
@ 2015-03-17 17:29   ` Will Deacon
  2015-03-17 18:02     ` Florian Fainelli
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-03-17 17:29 UTC
  To: linux-arm-kernel

On Sat, Mar 07, 2015 at 12:54:50AM +0000, Florian Fainelli wrote:
> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
> controller. This cache controller sits between the L2 and the memory bus
> and its purpose is to provide a friendlier burst size towards the DDR
> interface than the native cache line size.
> 
> The readahead cache is mostly transparent, except for
> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
> is precisely what we are overriding here.

I'm struggling to understand why you care about flush_kern_cache_louis
and flush_icache_all for a cache that sits the other side of the L2.

Can you explain why we need to do anything in these cases, please?

Will

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-17 17:29   ` Will Deacon
@ 2015-03-17 18:02     ` Florian Fainelli
  2015-03-23 11:14       ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2015-03-17 18:02 UTC
  To: linux-arm-kernel

On 17/03/15 10:29, Will Deacon wrote:
> On Sat, Mar 07, 2015 at 12:54:50AM +0000, Florian Fainelli wrote:
>> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
>> controller. This cache controller sits between the L2 and the memory bus
>> and its purpose is to provide a friendlier burst size towards the DDR
>> interface than the native cache line size.
>>
>> The readahead cache is mostly transparent, except for
>> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
>> is precisely what we are overriding here.
> 
> I'm struggling to understand why you care about flush_kern_cache_louis
> and flush_icache_all for a cache that sits the other side of the L2.
> 
> Can you explain why we need to do anything in these cases, please?

Let's try. As you may have read in the comment, all MVA-based cache
maintenance operations are snooped by the RAC, so they are effectively
"transparent" to software, all others are not.

flush_kern_cache_louis() and flush_icache_all() both use ICIALLUIS in the
SMP case and ICIALLU in the UP case, which were flagged as not being
transparently handled.

The concern is that, if you perform an L1 cache (data or instruction)
flush (essentially an invalidate), this will also flush (invalidate)
corresponding L2 cache lines, but the RAC has no way to be signaled that
it should also invalidate its own RAC cache lines pertaining to that
data, and RAC holds per-CPU "super" cache lines.
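The hazard described here can be illustrated with a toy model. Reads fill a RAC "super line", writes bypass it, MVA-based maintenance is snooped and drops the matching line, and whole-cache operations are invisible to it. This sketches the concern as stated in the thread. It does not model the real RAC geometry or the B15 coherency rules. Every name, the 4-word line, and the 16-word memory are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS	16
#define LINE_WORDS	4	/* toy "super line" size, not real geometry */

static uint32_t memory[MEM_WORDS];

struct toy_rac {
	bool valid;
	unsigned int base;	/* first word index covered by the line */
	uint32_t data[LINE_WORDS];
};

static struct toy_rac rac;

/* Reads are served by the RAC; a miss fetches the whole super line. */
static uint32_t toy_read(unsigned int addr)
{
	unsigned int base = addr - addr % LINE_WORDS;

	if (!rac.valid || rac.base != base) {
		memcpy(rac.data, &memory[base], sizeof(rac.data));
		rac.base = base;
		rac.valid = true;
	}
	return rac.data[addr - base];
}

/* Writes bypass the RAC entirely, so an existing line can go stale. */
static void toy_write(unsigned int addr, uint32_t val)
{
	memory[addr] = val;
}

/* MVA-based maintenance is snooped: the RAC drops a matching line. */
static void toy_mva_invalidate(unsigned int addr)
{
	if (rac.valid && rac.base == addr - addr % LINE_WORDS)
		rac.valid = false;
}

/* Whole-cache ops (e.g. ICIALLU) are not seen by the RAC at all. */
static void toy_invalidate_all_unsnooped(void)
{
	/* L1/L2 would be invalidated here; the RAC keeps its line. */
}
```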

In arch/arm/kernel/smp.c, all uses of flush_cache_louis() are for
writing-back data, so the RAC is not an issue. In
arch/arm/kernel/suspend.c, flush_cache_louis() is known not to guarantee
a "clean" all the way to main memory, so __cpu_flush_dcache_area is used
in conjunction. In arch/arm/mm/idmap.c and mmu.c, the use of
flush_cache_louis() seems to be meant to see fresh data, not write-back,
so not transparent to the RAC, is that right?

It may very well be that we are super cautious here and that the only
case to take care of is essentially flush_cache_all(), and nothing more.

Would you have suggestions on how to instrument/exercise whether we really
need to deal with flush_cache_louis() and flush_icache_all()?

Thanks!
-- 
Florian

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-17 18:02     ` Florian Fainelli
@ 2015-03-23 11:14       ` Will Deacon
  2015-07-27 18:47         ` Florian Fainelli
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-03-23 11:14 UTC
  To: linux-arm-kernel

On Tue, Mar 17, 2015 at 06:02:22PM +0000, Florian Fainelli wrote:
> On 17/03/15 10:29, Will Deacon wrote:
> > On Sat, Mar 07, 2015 at 12:54:50AM +0000, Florian Fainelli wrote:
> >> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
> >> controller. This cache controller sits between the L2 and the memory bus
> >> and its purpose is to provide a friendlier burst size towards the DDR
> >> interface than the native cache line size.
> >>
> >> The readahead cache is mostly transparent, except for
> >> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
> >> is precisely what we are overriding here.
> > 
> > I'm struggling to understand why you care about flush_kern_cache_louis
> > and flush_icache_all for a cache that sits the other side of the L2.
> > 
> > Can you explain why we need to do anything in these cases, please?
> 
> Let's try. As you may have read in the comment, all MVA-based cache
> maintenance operations are snooped by the RAC, so they are effectively
> "transparent" to software, all others are not.
> 
> flush_kern_cache_louis() and flush_icache_all() both use ICIALLUIS in the
> SMP case and ICIALLU in the UP case, which were flagged as not being
> transparently handled.
> 
> The concern is that, if you perform an L1 cache (data or instruction)
> flush (essentially an invalidate), this will also flush (invalidate)
> corresponding L2 cache lines, but the RAC has no way to be signaled that
> it should also invalidate its own RAC cache lines pertaining to that
> data, and RAC holds per-CPU "super" cache lines.
> 
> In arch/arm/kernel/smp.c, all uses of flush_cache_louis() are for
> writing-back data, so the RAC is not an issue. In
> arch/arm/kernel/suspend.c, flush_cache_louis() is known not to guarantee
> a "clean" all the way to main memory, so __cpu_flush_dcache_area is used
> in conjunction. In arch/arm/mm/idmap.c and mmu.c, the use of
> flush_cache_louis() seems to be meant to see fresh data, not write-back,
> so not transparent to the RAC, is that right?
> 
> It may very well be that we are super cautious here and that the only
> case to take care of is essentially flush_cache_all(), and nothing more.
> 
> Would you have suggestions on how to instrument/exercise whether we really
> need to deal with flush_cache_louis() and flush_icache_all()?

I think that both flush_cache_louis and flush_icache_all only care about
the inner-shareable domain, so you don't need to do anything with the
RAC. It's a bit like the PL310 outer-cache, which is also not affected
by these operations.

I don't think there's a good way to determine statically if we have
missing cacheflush calls. Maybe a better bet would be to implement a
RAC driver using the outer_cache framework and only implement the
flush_all callback.
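This suggestion amounts to an outer-cache driver that fills in only the flush_all hook. Below is a simplified userspace sketch. It assumes a cut-down ops table whose dispatchers skip NULL hooks, as the inline wrappers in arch/arm/include/asm/outercache.h do. struct toy_outer_cache_fns is not the kernel's struct outer_cache_fns, which has more members. b15_rac_outer_flush_all() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical subset of an outer-cache ops table. */
struct toy_outer_cache_fns {
	void (*inv_range)(unsigned long start, unsigned long end);
	void (*clean_range)(unsigned long start, unsigned long end);
	void (*flush_range)(unsigned long start, unsigned long end);
	void (*flush_all)(void);
};

static struct toy_outer_cache_fns outer_cache;
static unsigned int rac_flush_all_calls;

/* Dispatchers skip NULL hooks, so a driver may provide only flush_all. */
static void toy_outer_inv_range(unsigned long start, unsigned long end)
{
	if (outer_cache.inv_range)
		outer_cache.inv_range(start, end);
}

static void toy_outer_flush_all(void)
{
	if (outer_cache.flush_all)
		outer_cache.flush_all();
}

/* Hypothetical RAC hook: disable, flush and re-enable would go here. */
static void b15_rac_outer_flush_all(void)
{
	rac_flush_all_calls++;
}

static void toy_rac_outer_init(void)
{
	outer_cache.flush_all = b15_rac_outer_flush_all;
}
```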

Will

* [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support
  2015-03-23 11:14       ` Will Deacon
@ 2015-07-27 18:47         ` Florian Fainelli
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2015-07-27 18:47 UTC
  To: linux-arm-kernel

On 23/03/15 04:14, Will Deacon wrote:
> On Tue, Mar 17, 2015 at 06:02:22PM +0000, Florian Fainelli wrote:
>> On 17/03/15 10:29, Will Deacon wrote:
>>> On Sat, Mar 07, 2015 at 12:54:50AM +0000, Florian Fainelli wrote:
>>>> This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
>>>> controller. This cache controller sits between the L2 and the memory bus
>>>> and its purpose is to provide a friendlier burst size towards the DDR
>>>> interface than the native cache line size.
>>>>
>>>> The readahead cache is mostly transparent, except for
>>>> flush_kern_cache_all, flush_kern_cache_louis and flush_icache_all, which
>>>> is precisely what we are overriding here.
>>>
>>> I'm struggling to understand why you care about flush_kern_cache_louis
>>> and flush_icache_all for a cache that sits the other side of the L2.
>>>
>>> Can you explain why we need to do anything in these cases, please?
>>
>> Let's try. As you may have read in the comment, all MVA-based cache
>> maintenance operations are snooped by the RAC, so they are effectively
>> "transparent" to software, all others are not.
>>
>> flush_kern_cache_louis() and flush_icache_all() both use ICIALLUIS in the
>> SMP case and ICIALLU in the UP case, which were flagged as not being
>> transparently handled.
>>
>> The concern is that, if you perform an L1 cache (data or instruction)
>> flush (essentially an invalidate), this will also flush (invalidate)
>> corresponding L2 cache lines, but the RAC has no way to be signaled that
>> it should also invalidate its own RAC cache lines pertaining to that
>> data, and RAC holds per-CPU "super" cache lines.
>>
>> In arch/arm/kernel/smp.c, all uses of flush_cache_louis() are for
>> writing-back data, so the RAC is not an issue. In
>> arch/arm/kernel/suspend.c, flush_cache_louis() is known not to guarantee
>> a "clean" all the way to main memory, so __cpu_flush_dcache_area is used
>> in conjunction. In arch/arm/mm/idmap.c and mmu.c, the use of
>> flush_cache_louis() seems to be meant to see fresh data, not write-back,
>> so not transparent to the RAC, is that right?
>>
>> It may very well be that we are super cautious here and that the only
>> case to take care of is essentially flush_cache_all(), and nothing more.
>>
>> Would you have suggestions on how to instrument/exercise whether we really
>> need to deal with flush_cache_louis() and flush_icache_all()?
> 
> I think that both flush_cache_louis and flush_icache_all only care about
> the inner-shareable domain, so you don't need to do anything with the
> RAC. It's a bit like the PL310 outer-cache, which is also not affected
> by these operations.

I see, will keep experimenting with removing these two and see if
anything breaks.

> 
> I don't think there's a good way to determine statically if we have
> missing cacheflush calls. Maybe a better bet would be to implement a
> RAC driver using the outer_cache framework and only implement the
> flush_all callback.

Last I tried this, the performance became absolutely terrible for, e.g.,
networking, which involves doing frequent invalidation + write-back due
to DMA operations. Also, it did not seem to me like it was possible to
get information about the DMA transfer direction (at least not at
this level), which could help speed up the write-back case, since there
is nothing to do in that case (unlike in the PL310 case).
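Direction information would let a range hook skip the RAC for buffers the CPU only wrote. The sketch below shows that decision. It assumes a hook that receives the DMA direction, which, per the message above, was not available at the outer-cache level. enum toy_dma_dir mirrors the meaning of the kernel's DMA_TO_DEVICE, DMA_FROM_DEVICE and DMA_BIDIRECTIONAL, but it is a local, hypothetical type. The flush rule encodes the reasoning stated in this message; it has not been verified against the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the kernel's enum dma_data_direction. */
enum toy_dma_dir {
	TOY_DMA_BIDIRECTIONAL,
	TOY_DMA_TO_DEVICE,
	TOY_DMA_FROM_DEVICE,
};

/*
 * Following the reasoning above: for the write-back case (CPU wrote the
 * buffer for a device) there is nothing to do in the RAC, while anything
 * a device may have written must be dropped from the RAC before the CPU
 * reads it.
 */
static bool rac_needs_flush(enum toy_dma_dir dir)
{
	return dir != TOY_DMA_TO_DEVICE;
}

/* Count the RAC flushes a sequence of DMA maintenance calls would need. */
static size_t rac_flushes_for(const enum toy_dma_dir *dirs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (rac_needs_flush(dirs[i]))
			count++;
	return count;
}
```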
-- 
Florian

end of thread, other threads:[~2015-07-27 18:47 UTC | newest]

Thread overview: 15+ messages
2015-03-07  0:54 [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli
2015-03-07  0:54 ` [PATCH 1/5] ARM: v7: allow setting different cache functions Florian Fainelli
2015-03-07  0:54 ` [PATCH 2/5] ARM: Add Broadcom Brahma-B15 readahead cache support Florian Fainelli
2015-03-16 21:02   ` Russell King - ARM Linux
2015-03-16 21:20     ` Florian Fainelli
2015-03-17  0:10       ` Russell King - ARM Linux
2015-03-17  0:32         ` Florian Fainelli
2015-03-17 17:29   ` Will Deacon
2015-03-17 18:02     ` Florian Fainelli
2015-03-23 11:14       ` Will Deacon
2015-07-27 18:47         ` Florian Fainelli
2015-03-07  0:54 ` [PATCH 3/5] ARM: Hook B15 readahead cache functions based on processor Florian Fainelli
2015-03-07  0:54 ` [PATCH 4/5] ARM: B15: Add CPU hotplug awareness Florian Fainelli
2015-03-07  0:54 ` [PATCH 5/5] ARM: B15: Add suspend/resume hooks Florian Fainelli
2015-03-16 18:33 ` [PATCH 0/5] ARM: Broadcom Brahma-B15 readahead cache support Florian Fainelli