* [RFC PATCH 0/3] ARM: add cache level maintenance operations
@ 2012-04-12 13:08 Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 1/3] ARM: mm: define cache levels for cache maintenance ops Lorenzo Pieralisi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Lorenzo Pieralisi @ 2012-04-12 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM v7 architecture introduced the concept of cache levels and the related
control registers to manage them. Cache operations that operate by set/way
require the cache level at which the maintenance operation is carried out to
be specified through coprocessor registers.

Processors like A7/A15 integrate a unified L2 that is part of the cache
level hierarchy; this implies that cache operations acting on all levels
also end up cleaning the unified L2, which is very time consuming and is
not needed for some power-down operations such as single-CPU shutdown.

For v7, flush_kern_all() cleans all the cache levels up to the Level of
Coherency, which includes L2. This is suboptimal for code paths that end
up shutting down a single processor, like CPU hotplug and CPU idle, where only
per-CPU cache state (i.e. the integrated L1 cache) has to be cleaned and
invalidated.

To fix this performance issue, this patchset introduces cache level maintenance
operations in the kernel.

A new cache operations pointer is added to cpu_cache_fns:

void (*flush_kern_dcache_level)(int);

that takes an input parameter corresponding to the upper dcache level that has
to be cleaned/invalidated.

A preferred default level hook is introduced that corresponds to flushing all
cache levels leaving the current behaviour unchanged.

A v7 specific patch adds a preferred cache level hook that corresponds to
Level of Unification Inner Shareable; it represents the data cache levels that
are per-CPU (i.e. integrated L1) in most of the current v7 based systems.
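
As a rough illustration of the two boundaries discussed above, the sketch
below decodes the LoUIS (bits [23:21]) and LoC (bits [26:24]) fields of a
CLIDR value in plain C. The helper names and the sample value used in the
note after the code are hypothetical and not part of this patchset; on real
hardware CLIDR is read with an mrc instruction rather than passed in.

```c
#include <assert.h>
#include <stdint.h>

/* CLIDR field positions as defined by the ARMv7-A architecture. */
#define CLIDR_LOUIS_SHIFT	21	/* Level of Unification Inner Shareable */
#define CLIDR_LOC_SHIFT		24	/* Level of Coherency */
#define CLIDR_FIELD_MASK	0x7u	/* both fields are 3 bits wide */

/* Hypothetical helper: extract one 3-bit field from a CLIDR value. */
static unsigned int clidr_field(uint32_t clidr, unsigned int shift)
{
	assert(shift + 3 <= 32);	/* field must fit in the register */
	return (clidr >> shift) & CLIDR_FIELD_MASK;
}

/* Highest cache level that is private to the CPU on most v7 systems. */
static unsigned int clidr_louis(uint32_t clidr)
{
	return clidr_field(clidr, CLIDR_LOUIS_SHIFT);
}

/* Highest cache level cleaned by a flush of the whole hierarchy. */
static unsigned int clidr_loc(uint32_t clidr)
{
	return clidr_field(clidr, CLIDR_LOC_SHIFT);
}
```

For an illustrative CLIDR value of 0x0A200023, clidr_louis() yields 1 and
clidr_loc() yields 2, so a flush bounded by LoUIS would leave the level 2
cache untouched.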

Code is in the making to define the preferred cache level through a DT
binding since it is SoC specific.

The patchset updates cpu_suspend code accordingly, in order to flush only the
cache levels that are lost when a single CPU is shut down.

For A9/A5 processors, Level of Unification Inner Shareable and Level of
Coherency are equivalent, hence this patchset should not affect cpu_suspend
behaviour in any way when run on A9/A5 based systems.

Tested on an A15 dual cluster system through CPU soft-reboot.

TO BE DONE:

- Test it with CPUs going through power shutdown
- Test it on all existing A9/A5 implementations using cpu_suspend

Lorenzo Pieralisi (3):
  ARM: mm: define cache levels for cache maintenance ops
  ARM: mm: v7 cache level operations
  ARM: kernel: update cpu_suspend code to use dcache level operations

 arch/arm/include/asm/cacheflush.h |   38 ++++++++++++++++++++++++++++++++
 arch/arm/kernel/suspend.c         |   13 ++++++++++-
 arch/arm/mm/cache-v7.S            |   43 ++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/proc-macros.S         |    7 +++++-
 4 files changed, 98 insertions(+), 3 deletions(-)

-- 
1.7.9.5

* [RFC PATCH 1/3] ARM: mm: define cache levels for cache maintenance ops
  2012-04-12 13:08 [RFC PATCH 0/3] ARM: add cache level maintenance operations Lorenzo Pieralisi
@ 2012-04-12 13:08 ` Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 2/3] ARM: mm: v7 cache level operations Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 3/3] ARM: kernel: update cpu_suspend code to use dcache " Lorenzo Pieralisi
  2 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Pieralisi @ 2012-04-12 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM v7 architecture introduced the concept of cache levels and related
coherency requirements. In order to select which cache levels must be
cleaned and invalidated, a new kernel cache maintenance operation must be
added to the cpu_cache_fns structure of function pointers.

This patch adds flush_dcache_level(level) to the ARM kernel cache
maintenance API.

This function cleans and invalidates all data cache levels up to the one
passed as an input parameter.

The cpu_cache_fns struct reflects this change by adding a new function
pointer that is initialized by arch specific assembly files.

The preferred cache level to be cleaned/invalidated can be retrieved
using the function call:

flush_cache_level_cpu(void)

By default, this function returns -1 which causes all cache levels to
be cleaned and invalidated to main memory.

Architectures can override the cache level returned by default by
patching/defining the preferred cache level hook for the arch in
question.

By default, no existing arch instantiates a cache level function
pointer, and flush_dcache_level just falls back to flush_kern_all.
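
The fallback described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: the struct, the
counters and every model_* name are hypothetical stand-ins, and the wrapper
that discards the level argument models the way the macro in proc-macros.S
places the flush-all routine in the new slot.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the two relevant cpu_cache_fns members. */
struct cache_fns_model {
	void (*flush_kern_all)(void);
	void (*flush_kern_dcache_level)(int level);
};

static int full_flushes;	/* number of whole-hierarchy flushes */
static int last_level = -2;	/* last level seen by a level-aware hook */

static void model_flush_kern_all(void)
{
	full_flushes++;
}

/* Archs without a level-aware hook: ignore the level, flush everything. */
static void model_flush_level_fallback(int level)
{
	(void)level;
	model_flush_kern_all();
}

/* Level-aware hook, as a v7-like implementation might provide. */
static void model_flush_level_aware(int level)
{
	if (level == -1) {
		model_flush_kern_all();
		return;
	}
	last_level = level;
}

static const struct cache_fns_model legacy_fns = {
	.flush_kern_all = model_flush_kern_all,
	.flush_kern_dcache_level = model_flush_level_fallback,
};

static const struct cache_fns_model level_aware_fns = {
	.flush_kern_all = model_flush_kern_all,
	.flush_kern_dcache_level = model_flush_level_aware,
};

/* Model of flush_dcache_level() in the MULTI_CACHE case. */
static void model_flush_dcache_level(const struct cache_fns_model *fns,
				     int level)
{
	assert(fns != NULL && fns->flush_kern_dcache_level != NULL);
	fns->flush_kern_dcache_level(level);
}
```

Calling model_flush_dcache_level(&legacy_fns, 1) performs a full flush in
this model, while the same call with level_aware_fns only records level 1.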

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Colin Cross <ccross@android.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kachhap <amit.kachhap@linaro.org>
---
 arch/arm/include/asm/cacheflush.h |   22 ++++++++++++++++++++++
 arch/arm/mm/proc-macros.S         |    7 ++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 1252a26..741ae25 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -49,6 +49,10 @@
  *
  *		Unconditionally clean and invalidate the entire cache.
  *
+ *	flush_kern_dcache_level(level)
+ *
+ *		Flush data cache levels up to the level input parameter.
+ *
  *	flush_user_all()
  *
  *		Clean and invalidate all user space cache entries
@@ -97,6 +101,7 @@
 struct cpu_cache_fns {
 	void (*flush_icache_all)(void);
 	void (*flush_kern_all)(void);
+	void (*flush_kern_dcache_level)(int);
 	void (*flush_user_all)(void);
 	void (*flush_user_range)(unsigned long, unsigned long, unsigned int);
 
@@ -199,6 +204,23 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define __flush_icache_preferred	__flush_icache_all_generic
 #endif
 
+#define flush_cache_level_preferred()		(-1)
+
+static inline int flush_cache_level_cpu(void)
+{
+	return flush_cache_level_preferred();
+}
+/*
+ * Flush data cache up to a certain cache level
+ * level -	upper cache level to clean
+ *		if level == -1, default to flush_kern_all
+ */
+#ifdef MULTI_CACHE
+#define flush_dcache_level(level)	cpu_cache.flush_kern_dcache_level(level)
+#else
+#define flush_dcache_level(level)	__cpuc_flush_kern_all()
+#endif
+
 static inline void __flush_icache_all(void)
 {
 	__flush_icache_preferred();
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 2d8ff3a..f193bc3 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -293,12 +293,17 @@ ENTRY(\name\()_processor_functions)
 	.size	\name\()_processor_functions, . - \name\()_processor_functions
 .endm
 
-.macro define_cache_functions name:req
+.macro define_cache_functions name:req, cachelevel=0
 	.align 2
 	.type	\name\()_cache_fns, #object
 ENTRY(\name\()_cache_fns)
 	.long	\name\()_flush_icache_all
 	.long	\name\()_flush_kern_cache_all
+	.if \cachelevel
+	.long	\name\()_flush_kern_dcache_level
+	.else
+	.long	\name\()_flush_kern_cache_all
+	.endif
 	.long	\name\()_flush_user_cache_all
 	.long	\name\()_flush_user_cache_range
 	.long	\name\()_coherent_kern_range
-- 
1.7.9.5


* [RFC PATCH 2/3] ARM: mm: v7 cache level operations
  2012-04-12 13:08 [RFC PATCH 0/3] ARM: add cache level maintenance operations Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 1/3] ARM: mm: define cache levels for cache maintenance ops Lorenzo Pieralisi
@ 2012-04-12 13:08 ` Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 3/3] ARM: kernel: update cpu_suspend code to use dcache " Lorenzo Pieralisi
  2 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Pieralisi @ 2012-04-12 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM v7 architecture introduces the concept of cache levels and registers
to probe and manage cache levels accordingly.

This patch adds v7 support for dcache level operations and defines a
preferred cache level hook that by default is set to Level of
Unification Inner Shareable (LoUIS).

Allowed cache levels are:
	-1:	flush the entire cache system
	 0:	no-op
	 [1-7]:	flush the data cache levels up to the given level
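
The level handling listed above can be modelled in C as follows. This is a
sketch, not the patch's code: the enum and function names are hypothetical,
and the single unsigned comparison mirrors the "sub r2, r0, #1; cmp r2, #6;
blls" sequence in the assembly, where any value outside [1-7] other than -1
falls through without flushing.

```c
#include <assert.h>

enum flush_action { FLUSH_ALL, FLUSH_NONE, FLUSH_UP_TO_LEVEL };

/* Hypothetical helper: decide what a given level argument means. */
static enum flush_action classify_level(int level)
{
	if (level == -1)
		return FLUSH_ALL;
	/*
	 * Unsigned compare: for level <= 0 the subtraction wraps to a
	 * huge value, so one test covers exactly the range 1..7.
	 */
	if ((unsigned int)level - 1u <= 6u)
		return FLUSH_UP_TO_LEVEL;
	return FLUSH_NONE;
}

/*
 * Upper bound for the set/way loop, which counts cache levels in
 * steps of 2 (as in "mov r3, r0, lsl #1" in the assembly).
 */
static unsigned int level_loop_bound(int level)
{
	assert(classify_level(level) == FLUSH_UP_TO_LEVEL);
	return (unsigned int)level << 1;
}
```

With this model, classify_level(0) and classify_level(8) are both no-ops,
and level_loop_bound(1) is 2, so the loop visits only the first cache level.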

LoUIS has been chosen as the preferred level since it represents the cache
levels which are not shared by CPUs in most of the current systems (i.e.
cache levels that are per-CPU, such as an integrated L1).

Power-down operations like hotplug and CPU idle only need to clean/invalidate
the cache levels that are within the CPU power domain, and LoUIS reflects
this requirement properly in most systems.
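
To make the effect of bounding the flush at a given level concrete, the
sketch below walks the 3-bit Ctype fields of a CLIDR value the way the
existing set/way loop does, skipping levels that have no data or unified
cache. The function name and the sample CLIDR value in the note after the
code are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CLIDR_CTYPE_BITS	3
#define CLIDR_CTYPE_MASK	0x7u
/* Ctype: 0 = no cache, 1 = I-cache only, 2 and above = data or unified */
#define CTYPE_DATA_MIN		2u

/*
 * Hypothetical helper: return a bitmask (bit 0 = L1) of the cache
 * levels a set/way loop bounded by 'level' would clean and invalidate.
 */
static unsigned int levels_flushed(uint32_t clidr, int level)
{
	unsigned int mask = 0;
	int i;

	assert(level >= 1 && level <= 7);
	for (i = 0; i < level; i++) {
		unsigned int ctype =
			(clidr >> (CLIDR_CTYPE_BITS * i)) & CLIDR_CTYPE_MASK;

		if (ctype >= CTYPE_DATA_MIN)	/* skip levels with no dcache */
			mask |= 1u << i;
	}
	return mask;
}
```

For an illustrative CLIDR value of 0x0A200023 (separate L1 caches, unified
L2), a bound of 1 gives mask 0x1 and a bound of 2 gives mask 0x3, i.e. the
unified L2 is only touched when the bound extends past LoUIS.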

To improve configurability and possibly optimize cache operations a DT binding
is in the making to allow platforms to define the preferred cache level in a
more flexible way.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Colin Cross <ccross@android.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kachhap <amit.kachhap@linaro.org>
---
 arch/arm/include/asm/cacheflush.h |   16 ++++++++++++++
 arch/arm/mm/cache-v7.S            |   43 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 741ae25..0c9e4fa 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -204,7 +204,23 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define __flush_icache_preferred	__flush_icache_all_generic
 #endif
 
+#if __LINUX_ARM_ARCH__ >= 7
+/*
+ * Hotplug and CPU idle code requires to flush only cache levels
+ * impacted by power down operations. In v7 the upper level is
+ * retrieved by reading LoUIS field of CLIDR, since inner shareability
+ * represents the cache boundaries affected by per-CPU shutdown
+ * operations in the most common platforms.
+ */
+#define __cache_level_v7_uis ({ \
+	u32 val; \
+	asm volatile("mrc p15, 1, %0, c0, c0, 1" : "=r"(val)); \
+	((val & 0xe00000) >> 21); })
+
+#define flush_cache_level_preferred()		__cache_level_v7_uis
+#else
 #define flush_cache_level_preferred()		(-1)
+#endif
 
 static inline int flush_cache_level_cpu(void)
 {
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 07c4bc8..c79f152 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,6 +33,23 @@ ENTRY(v7_flush_icache_all)
 ENDPROC(v7_flush_icache_all)
 
 /*
+ *	v7_flush_dcache_level(level)
+ *
+ *	Flush the D-cache up to the level passed as first input parameter.
+ *
+ * 	r0 - cache level
+ *
+ *	Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
+ */
+
+ENTRY(v7_flush_dcache_level)
+	dmb
+	mov	r3, r0, lsl #1			@ level * 2
+	mrc	p15, 1, r0, c0, c0, 1		@ read clidr
+	b	__flush_level
+ENDPROC(v7_flush_dcache_level)
+
+/*
  *	v7_flush_dcache_all()
  *
  *	Flush the whole D-cache.
@@ -47,6 +64,7 @@ ENTRY(v7_flush_dcache_all)
 	ands	r3, r0, #0x7000000		@ extract loc from clidr
 	mov	r3, r3, lsr #23			@ left align loc bit field
 	beq	finished			@ if loc is 0, then no need to clean
+__flush_level:
 	mov	r10, #0				@ start clean at cache level 0
 loop1:
 	add	r2, r10, r10, lsr #1		@ work out 3x current cache level
@@ -114,6 +132,29 @@ ENTRY(v7_flush_kern_cache_all)
 ENDPROC(v7_flush_kern_cache_all)
 
 /*
+ *	v7_flush_kern_dcache_level(int level)
+ *      level - upper level that should be cleaned/invalidated
+ *		[valid values (-1,7)]
+ *              level == -1 forces a flush_kern_cache_all
+ *              level == 0 is a nop
+ *              1 <= level <= 7 flush dcache up to level
+ *	Flush the data cache up to a level passed as a platform
+ *	specific parameter
+ */
+ENTRY(v7_flush_kern_dcache_level)
+	cmp	r0, #-1				@ -1 defaults to flush all
+	beq	v7_flush_kern_cache_all
+ ARM(	stmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
+ THUMB(	stmfd	sp!, {r4-r7, r9-r11, lr}	)
+	sub	r2, r0, #1
+	cmp	r2, #6
+	blls	v7_flush_dcache_level		@ jump if 0 < level <=7
+ ARM(	ldmfd	sp!, {r4-r5, r7, r9-r11, lr}	)
+ THUMB(	ldmfd	sp!, {r4-r7, r9-r11, lr}	)
+	mov	pc, lr
+ENDPROC(v7_flush_kern_dcache_level)
+
+/*
  *	v7_flush_cache_all()
  *
  *	Flush all TLB entries in a particular address space
@@ -346,4 +387,4 @@ ENDPROC(v7_dma_unmap_area)
 	__INITDATA
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
-	define_cache_functions v7
+	define_cache_functions v7, cachelevel=1
-- 
1.7.9.5


* [RFC PATCH 3/3] ARM: kernel: update cpu_suspend code to use dcache level operations
  2012-04-12 13:08 [RFC PATCH 0/3] ARM: add cache level maintenance operations Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 1/3] ARM: mm: define cache levels for cache maintenance ops Lorenzo Pieralisi
  2012-04-12 13:08 ` [RFC PATCH 2/3] ARM: mm: v7 cache level operations Lorenzo Pieralisi
@ 2012-04-12 13:08 ` Lorenzo Pieralisi
  2 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Pieralisi @ 2012-04-12 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

In processors like A15/A7 the L2 cache is unified and integrated within the
processor cache hierarchy, so it is no longer considered an outer cache.
On such processors flush_cache_all() ends up cleaning all cache levels
up to the Level of Coherency (LoC), which includes the unified L2 cache.

When a single CPU is suspended (CPU idle) a complete L2 clean is not
required, so generic cpu_suspend code must clean the data cache using the
newly introduced dcache level function. The HW cache level is retrieved
through the hook

flush_cache_level_cpu(void)

that returns the preferred data cache level to be flushed for the respective
architecture/platform.

The context and the context pointer are cleaned to main memory
using cache area functions that operate by MVA and guarantee that the data
is written back to main memory (they perform cache cleaning up to the Point
of Coherency, PoC), so that the processor can fetch the context when the MMU
is off in the cpu_resume code path.

outer_cache management remains unchanged.
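
The ordering described above can be sketched as a userspace model that only
records which maintenance step runs when. All model_* names, the trace
buffer and the enum are hypothetical stand-ins for flush_dcache_level(),
__cpuc_flush_dcache_area() and outer_clean_range(); no real cache
maintenance is performed.

```c
#include <assert.h>
#include <stddef.h>

enum step { STEP_LEVEL_FLUSH, STEP_AREA_CLEAN, STEP_OUTER_CLEAN };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];
static size_t ntrace;

static void record(enum step s)
{
	assert(ntrace < MAX_STEPS);
	trace[ntrace++] = s;
}

/* Stand-in for flush_dcache_level(): set/way flush up to 'level'. */
static void model_flush_dcache_level(int level)
{
	(void)level;
	record(STEP_LEVEL_FLUSH);
}

/* Stand-in for __cpuc_flush_dcache_area(): by-MVA maintenance to PoC. */
static void model_flush_dcache_area(const void *addr, size_t size)
{
	(void)addr;
	(void)size;
	record(STEP_AREA_CLEAN);
}

/* Stand-in for outer_clean_range(): outer cache handling is unchanged. */
static void model_outer_clean(const void *addr, size_t size)
{
	(void)addr;
	(void)size;
	record(STEP_OUTER_CLEAN);
}

/* Model of the tail of __cpu_suspend_save() after this patch. */
static void model_cpu_suspend_save(void *ctx, size_t ctx_sz,
				   void **save_ptr, int level)
{
	model_flush_dcache_level(level);	/* per-CPU levels only */
	model_flush_dcache_area(ctx, ctx_sz);	/* context to memory */
	model_flush_dcache_area(save_ptr, sizeof(*save_ptr));
	model_outer_clean(ctx, ctx_sz);
	model_outer_clean(save_ptr, sizeof(*save_ptr));
}
```

The point of the model is the ordering: the level-bounded flush comes first,
and the context and its pointer are then pushed out explicitly, since the
bounded flush alone does not guarantee that they reach main memory.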

Tested on an A15 dual-core cluster through CPU soft-reset.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Colin Cross <ccross@android.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kachhap <amit.kachhap@linaro.org>
---
 arch/arm/kernel/suspend.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
index 1794cc3..2909bc8 100644
--- a/arch/arm/kernel/suspend.c
+++ b/arch/arm/kernel/suspend.c
@@ -26,7 +26,18 @@ void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
 
 	cpu_do_suspend(ptr);
 
-	flush_cache_all();
+	flush_dcache_level(flush_cache_level_cpu());
+	/*
+	 * flush_dcache_level does not guarantee that
+	 * save_ptr and ptr are cleaned to main memory,
+	 * just up to the required cache level.
+	 * Since the context pointer and context itself
+	 * are to be retrieved with the MMU off that
+	 * data must be cleaned from all cache levels
+	 * to main memory using "area" cache primitives.
+	 */
+	__cpuc_flush_dcache_area(ptr, ptrsz);
+	__cpuc_flush_dcache_area(save_ptr, sizeof(*save_ptr));
 	outer_clean_range(*save_ptr, *save_ptr + ptrsz);
 	outer_clean_range(virt_to_phys(save_ptr),
 			  virt_to_phys(save_ptr) + sizeof(*save_ptr));
-- 
1.7.9.5
