* [PATCH 0/4] MCPM backends updates
From: Nicolas Pitre @ 2013-07-18  3:28 UTC
  To: linux-arm-kernel

Patch 1:
Fixes to the existing Vexpress DCSCB backend.

Patch 2:
Lorenzo's minimal SPC driver required by the TC2 MCPM backend.
As discussed on the list, this is not considered to be an MFD driver.
It hardly qualifies as a bus driver either.  Conceptually, this is
a very platform-specific register multiplexing driver that subsystem
drivers can hook into.  I therefore moved it to drivers/platform/vexpress/.

Patch 3:
This is the MCPM backend enabling SMP secondary boot and CPU hotplug
on the VExpress TC2 big.LITTLE platform.

Patch 4:
Add MCPM suspend method to the TC2 backend allowing basic CPU idle/suspend.
The cpuidle driver that hooks into this will be submitted separately.

The TC2 MCPM patches have been reviewed here:
http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=242408

The latest SPC driver discussion is here:
http://news.gmane.org/group/gmane.linux.kernel/thread=1525836


Nicolas

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Nicolas Pitre @ 2013-07-18  3:28 UTC
  To: linux-arm-kernel

Unlike real A15/A7's, the RTSM simulation doesn't appear to hit the
cache when the CTRL.C bit is cleared.  Let's ensure there is no memory
access within the disable and flush cache sequence, including to the
stack.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/mach-vexpress/dcscb.c | 58 +++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 16d57a8a9d..9f01c04d58 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -136,14 +136,29 @@ static void dcscb_power_down(void)
 		/*
 		 * Flush all cache levels for this cluster.
 		 *
-		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
-		 * a preliminary flush here for those CPUs.  At least, that's
-		 * the theory -- without the extra flush, Linux explodes on
-		 * RTSM (to be investigated).
+		 * To do so we do:
+		 * - Clear the CTLR.C bit to prevent further cache allocations
+		 * - Flush the whole cache
+		 * - Disable local coherency by clearing the ACTLR "SMP" bit
+		 *
+		 * Let's do it in the safest possible way i.e. with
+		 * no memory access within the following sequence
+		 * including the stack.
 		 */
-		flush_cache_all();
-		set_cr(get_cr() & ~CR_C);
-		flush_cache_all();
+		asm volatile(
+		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
+		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
+		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
+		"isb	\n\t"
+		"bl	v7_flush_dcache_all \n\t"
+		"clrex	\n\t"
+		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
+		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
+		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
+		"isb	\n\t"
+		"dsb	"
+		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
+		      "r9","r10","r11","lr","memory");
 
 		/*
 		 * This is a harmless no-op.  On platforms with a real
@@ -152,9 +167,6 @@ static void dcscb_power_down(void)
 		 */
 		outer_flush_all();
 
-		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
-		set_auxcr(get_auxcr() & ~(1 << 6));
-
 		/*
 		 * Disable cluster-level coherency by masking
 		 * incoming snoops and DVM messages:
@@ -167,18 +179,22 @@ static void dcscb_power_down(void)
 
 		/*
 		 * Flush the local CPU cache.
-		 *
-		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
-		 * a preliminary flush here for those CPUs.  At least, that's
-		 * the theory -- without the extra flush, Linux explodes on
-		 * RTSM (to be investigated).
+		 * Let's do it in the safest possible way as above.
 		 */
-		flush_cache_louis();
-		set_cr(get_cr() & ~CR_C);
-		flush_cache_louis();
-
-		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
-		set_auxcr(get_auxcr() & ~(1 << 6));
+		asm volatile(
+		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
+		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
+		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
+		"isb	\n\t"
+		"bl	v7_flush_dcache_louis \n\t"
+		"clrex	\n\t"
+		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
+		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
+		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
+		"isb	\n\t"
+		"dsb	"
+		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
+		      "r9","r10","r11","lr","memory");
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
-- 
1.8.1.2

* [PATCH 2/4] drivers/platform: vexpress: add Serial Power Controller (SPC) support
From: Nicolas Pitre @ 2013-07-18  3:28 UTC
  To: linux-arm-kernel

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

The TC2 Versatile Express core tile integrates a logic block that provides
the interface between the dual-cluster test chip and the M3 microcontroller
that carries out power management. The logic block, called the Serial Power
Controller (SPC), contains several memory-mapped registers to control,
among other things, low-power states, wake-up IRQs and per-CPU jump
address registers.

This patch provides a driver that enables run-time control of features
implemented by the SPC power management control logic with an API to
be used by different subsystem drivers on top.

The SPC control logic must be programmed very early in the boot process
to reset secondary CPUs on the TC2 testchip and to set up jump addresses
and wake-up IRQs for power management. Hence, until core changes to the
device core code enable early registration of platform devices, the
driver puts in place an early-init scheme that allows kernel components
requiring the SPC to initialize it directly, if their initialization
routines are called before this driver's init function during the boot
process.
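
As a minimal illustration (hypothetical consumer, not part of this
patch), a component that needs the SPC early would simply do:

	static int __init my_pm_backend_init(void)
	{
		int ret = ve_spc_init();	/* safe to call more than once */
		if (ret)
			return ret;
		/* the SPC API is now usable, e.g.: */
		ve_spc_global_wakeup_irq(true);
		return 0;
	}
	early_initcall(my_pm_backend_init);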

Device tree bindings documentation for the SPC component is also provided.

Cc: Olof Johansson <olof@lixom.net>
Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Achin Gupta <achin.gupta@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
[ np: moved from drivers/mfd/ to drivers/platform/vexpress/ ]
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 .../devicetree/bindings/arm/vexpress-spc.txt       |  36 +++
 drivers/platform/Kconfig                           |   4 +-
 drivers/platform/Makefile                          |   1 +
 drivers/platform/vexpress/Kconfig                  |   9 +
 drivers/platform/vexpress/Makefile                 |   1 +
 drivers/platform/vexpress/spc.c                    | 253 +++++++++++++++++++++
 include/linux/vexpress.h                           |  17 ++
 7 files changed, 320 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/arm/vexpress-spc.txt
 create mode 100644 drivers/platform/vexpress/Kconfig
 create mode 100644 drivers/platform/vexpress/Makefile
 create mode 100644 drivers/platform/vexpress/spc.c

diff --git a/Documentation/devicetree/bindings/arm/vexpress-spc.txt b/Documentation/devicetree/bindings/arm/vexpress-spc.txt
new file mode 100644
index 0000000000..1614725918
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/vexpress-spc.txt
@@ -0,0 +1,36 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power states transitions,
+wake-up IRQs and resume addresses for ARM multiprocessor testchips.
+The serial controller can be programmed through a memory mapped interface
+that enables communication between firmware running on the microcontroller
+managing power states and the application processors.
+
+The SPC DT bindings are defined as follows:
+
+- spc node
+
+	- compatible:
+		Usage: required
+		Value type: <stringlist>
+		Definition: must be
+			    "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc"
+	- reg:
+		Usage: required
+		Value type: <prop-encoded-array>
+		Definition: A standard property that specifies the base address
+			    and the size of the SPC address space
+	- interrupts:
+		Usage: required
+		Value type: <prop-encoded-array>
+		Definition:  SPC interrupt configuration. A standard property
+			     that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff0000 {
+	compatible = "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc";
+	reg = <0x7fff0000 0x1000>;
+	interrupts = <0 95 4>;
+};
diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
index 69616aeaa9..449010e828 100644
--- a/drivers/platform/Kconfig
+++ b/drivers/platform/Kconfig
@@ -4,4 +4,6 @@ endif
 if GOLDFISH
 source "drivers/platform/goldfish/Kconfig"
 endif
-
+if ARCH_VEXPRESS
+source "drivers/platform/vexpress/Kconfig"
+endif
diff --git a/drivers/platform/Makefile b/drivers/platform/Makefile
index 8a44a4cd6d..c1173d6e15 100644
--- a/drivers/platform/Makefile
+++ b/drivers/platform/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_X86)		+= x86/
 obj-$(CONFIG_OLPC)		+= olpc/
 obj-$(CONFIG_GOLDFISH)		+= goldfish/
+obj-$(CONFIG_ARCH_VEXPRESS)	+= vexpress/
diff --git a/drivers/platform/vexpress/Kconfig b/drivers/platform/vexpress/Kconfig
new file mode 100644
index 0000000000..d09a7c45bd
--- /dev/null
+++ b/drivers/platform/vexpress/Kconfig
@@ -0,0 +1,9 @@
+config VEXPRESS_SPC
+	bool "Versatile Express SPC driver support"
+	help
+	  The Serial Power Controller (SPC) for ARM Ltd. test chips is
+	  an IP block that provides a memory-mapped interface to power
+	  controller HW. The driver provides an API abstraction meant to
+	  be used by subsystem drivers, allowing them to program registers
+	  controlling low-level power management features like power-down
+	  flags and global and per-cpu wake-up IRQs.
diff --git a/drivers/platform/vexpress/Makefile b/drivers/platform/vexpress/Makefile
new file mode 100644
index 0000000000..d31eca24f4
--- /dev/null
+++ b/drivers/platform/vexpress/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_VEXPRESS_SPC)	+= spc.o
diff --git a/drivers/platform/vexpress/spc.c b/drivers/platform/vexpress/spc.c
new file mode 100644
index 0000000000..aa8c2a4862
--- /dev/null
+++ b/drivers/platform/vexpress/spc.c
@@ -0,0 +1,253 @@
+/*
+ * Versatile Express Serial Power Controller (SPC) support
+ *
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * Authors: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
+ *          Achin Gupta           <achin.gupta@arm.com>
+ *          Lorenzo Pieralisi     <lorenzo.pieralisi@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/of_address.h>
+#include <linux/slab.h>
+#include <linux/vexpress.h>
+
+#include <asm/cacheflush.h>
+
+#define SPCLOG "vexpress-spc: "
+
+/* SCC conf registers */
+#define A15_CONF		0x400
+#define SYS_INFO		0x700
+
+/* SPC registers base */
+#define SPC_BASE		0xB00
+
+/* SPC wake-up IRQs status and mask */
+#define WAKE_INT_MASK		(SPC_BASE + 0x24)
+#define WAKE_INT_RAW		(SPC_BASE + 0x28)
+#define WAKE_INT_STAT		(SPC_BASE + 0x2c)
+/* SPC power down registers */
+#define A15_PWRDN_EN		(SPC_BASE + 0x30)
+#define A7_PWRDN_EN		(SPC_BASE + 0x34)
+/* SPC per-CPU mailboxes */
+#define A15_BX_ADDR0		(SPC_BASE + 0x68)
+#define A7_BX_ADDR0		(SPC_BASE + 0x78)
+
+/* wake-up interrupt masks */
+#define GBL_WAKEUP_INT_MSK	(0x3 << 10)
+
+/* TC2 static dual-cluster configuration */
+#define MAX_CLUSTERS		2
+
+struct ve_spc_drvdata {
+	void __iomem *baseaddr;
+	/*
+	 * A15s cluster identifier
+	 * It corresponds to A15 processors MPIDR[15:8] bitfield
+	 */
+	u32 a15_clusid;
+};
+
+static struct ve_spc_drvdata *info;
+
+static inline bool cluster_is_a15(u32 cluster)
+{
+	return cluster == info->a15_clusid;
+}
+
+/**
+ * ve_spc_global_wakeup_irq()
+ *
+ * Function to set/clear global wakeup IRQs. Not protected by locking since
+ * it might be used in code paths where normal cacheable locks are not
+ * working. Locking must be provided by the caller to ensure atomicity.
+ *
+ * @set: if true, global wake-up IRQs are set, if false they are cleared
+ */
+void ve_spc_global_wakeup_irq(bool set)
+{
+	u32 reg;
+
+	reg = readl_relaxed(info->baseaddr + WAKE_INT_MASK);
+
+	if (set)
+		reg |= GBL_WAKEUP_INT_MSK;
+	else
+		reg &= ~GBL_WAKEUP_INT_MSK;
+
+	writel_relaxed(reg, info->baseaddr + WAKE_INT_MASK);
+}
+
+/**
+ * ve_spc_cpu_wakeup_irq()
+ *
+ * Function to set/clear per-CPU wake-up IRQs. Not protected by locking since
+ * it might be used in code paths where normal cacheable locks are not
+ * working. Locking must be provided by the caller to ensure atomicity.
+ *
+ * @cluster: mpidr[15:8] bitfield describing cluster affinity level
+ * @cpu: mpidr[7:0] bitfield describing cpu affinity level
+ * @set: if true, wake-up IRQs are set, if false they are cleared
+ */
+void ve_spc_cpu_wakeup_irq(u32 cluster, u32 cpu, bool set)
+{
+	u32 mask, reg;
+
+	if (cluster >= MAX_CLUSTERS)
+		return;
+
+	mask = 1 << cpu;
+
+	if (!cluster_is_a15(cluster))
+		mask <<= 4;
+
+	reg = readl_relaxed(info->baseaddr + WAKE_INT_MASK);
+
+	if (set)
+		reg |= mask;
+	else
+		reg &= ~mask;
+
+	writel_relaxed(reg, info->baseaddr + WAKE_INT_MASK);
+}
+
+/**
+ * ve_spc_set_resume_addr() - set the jump address used for warm boot
+ *
+ * @cluster: mpidr[15:8] bitfield describing cluster affinity level
+ * @cpu: mpidr[7:0] bitfield describing cpu affinity level
+ * @addr: physical resume address
+ */
+void ve_spc_set_resume_addr(u32 cluster, u32 cpu, u32 addr)
+{
+	void __iomem *baseaddr;
+
+	if (cluster >= MAX_CLUSTERS)
+		return;
+
+	if (cluster_is_a15(cluster))
+		baseaddr = info->baseaddr + A15_BX_ADDR0 + (cpu << 2);
+	else
+		baseaddr = info->baseaddr + A7_BX_ADDR0 + (cpu << 2);
+
+	writel_relaxed(addr, baseaddr);
+}
+
+/**
+ * ve_spc_get_nr_cpus() - get number of cpus in a cluster
+ *
+ * @cluster: mpidr[15:8] bitfield describing cluster affinity level
+ *
+ * Return: > 0 number of cpus in the cluster
+ *         or 0 if cluster number invalid
+ */
+u32 ve_spc_get_nr_cpus(u32 cluster)
+{
+	u32 val;
+
+	if (cluster >= MAX_CLUSTERS)
+		return 0;
+
+	val = readl_relaxed(info->baseaddr + SYS_INFO);
+	val = cluster_is_a15(cluster) ? (val >> 16) : (val >> 20);
+	return val & 0xf;
+}
+
+/**
+ * ve_spc_powerdown()
+ *
+ * Function to enable/disable cluster powerdown. Not protected by locking
+ * since it might be used in code paths where normal cacheable locks are not
+ * working. Locking must be provided by the caller to ensure atomicity.
+ *
+ * @cluster: mpidr[15:8] bitfield describing cluster affinity level
+ * @enable: if true enables powerdown, if false disables it
+ */
+void ve_spc_powerdown(u32 cluster, bool enable)
+{
+	u32 pwdrn_reg;
+
+	if (cluster >= MAX_CLUSTERS)
+		return;
+
+	pwdrn_reg = cluster_is_a15(cluster) ? A15_PWRDN_EN : A7_PWRDN_EN;
+	writel_relaxed(enable, info->baseaddr + pwdrn_reg);
+}
+
+static const struct of_device_id ve_spc_ids[] __initconst = {
+	{ .compatible = "arm,vexpress-spc,v2p-ca15_a7" },
+	{ .compatible = "arm,vexpress-spc" },
+	{},
+};
+
+static int __init ve_spc_probe(void)
+{
+	int ret;
+	struct device_node *node = of_find_matching_node(NULL, ve_spc_ids);
+
+	if (!node)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		pr_err(SPCLOG "unable to allocate mem\n");
+		return -ENOMEM;
+	}
+
+	info->baseaddr = of_iomap(node, 0);
+	if (!info->baseaddr) {
+		pr_err(SPCLOG "unable to ioremap memory\n");
+		ret = -ENXIO;
+		goto mem_free;
+	}
+
+	info->a15_clusid = readl_relaxed(info->baseaddr + A15_CONF) & 0xf;
+
+	/*
+	 * Multi-cluster systems may need this data when non-coherent, during
+	 * cluster power-up/power-down. Make sure driver info reaches main
+	 * memory.
+	 */
+	sync_cache_w(info);
+	sync_cache_w(&info);
+	pr_info("vexpress-spc loaded at %p\n", info->baseaddr);
+	return 0;
+
+mem_free:
+	kfree(info);
+	return ret;
+}
+
+/**
+ * ve_spc_init()
+ *
+ * Function exported to manage pre early_initcall initialization.
+ * SPC code is needed very early in the boot process to bring CPUs out of
+ * reset and initialize power management back-end so an init interface is
+ * provided to platform code to allow early initialization. The init
+ * interface can be removed as soon as the DT layer and platform bus allow
+ * platform device creation and probing before SMP boot.
+ */
+int __init ve_spc_init(void)
+{
+	static int ve_spc_init_status __initdata = -EAGAIN;
+
+	if (ve_spc_init_status == -EAGAIN)
+		ve_spc_init_status = ve_spc_probe();
+
+	return ve_spc_init_status;
+}
+early_initcall(ve_spc_init);
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 617c01b8f7..3e35556b22 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -124,4 +124,21 @@ void vexpress_osc_of_setup(struct device_node *node);
 void vexpress_clk_init(void __iomem *sp810_base);
 void vexpress_clk_of_init(void);
 
+/* SPC */
+
+#ifdef CONFIG_VEXPRESS_SPC
+int __init ve_spc_init(void);
+void ve_spc_global_wakeup_irq(bool set);
+void ve_spc_cpu_wakeup_irq(u32 cluster, u32 cpu, bool set);
+void ve_spc_set_resume_addr(u32 cluster, u32 cpu, u32 addr);
+u32 ve_spc_get_nr_cpus(u32 cluster);
+void ve_spc_powerdown(u32 cluster, bool enable);
+#else
+static inline int ve_spc_init(void) { return -ENODEV; }
+static inline void ve_spc_global_wakeup_irq(bool set) { }
+static inline void ve_spc_cpu_wakeup_irq(u32 cluster, u32 cpu, bool set) { }
+static inline void ve_spc_set_resume_addr(u32 cluster, u32 cpu, u32 addr) { }
+static inline u32 ve_spc_get_nr_cpus(u32 cluster) { return 0; }
+static inline void ve_spc_powerdown(u32 cluster, bool enable) { }
+#endif
 #endif
-- 
1.8.1.2

* [PATCH 3/4] ARM: vexpress/TC2: basic PM support
From: Nicolas Pitre @ 2013-07-18  3:28 UTC
  To: linux-arm-kernel

This is the MCPM backend for the Versatile Express A15x2 A7x3 CoreTile,
aka TC2.  It provides cluster management for SMP secondary boot and
CPU hotplug.
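
For context, here is a rough sketch of how the generic MCPM SMP glue
ends up in this backend when booting a secondary CPU (simplified from
arch/arm/common/mcpm_platsmp.c; illustrative only, not part of this
patch):

	static int mcpm_boot_secondary(unsigned int cpu, struct task_struct *idle)
	{
		unsigned int mpidr = cpu_logical_map(cpu);
		unsigned int pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
		unsigned int pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);

		/* reaches tc2_pm_power_up() through the registered ops */
		return mcpm_cpu_power_up(pcpu, pcluster);
	}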

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/mach-vexpress/Kconfig  |   9 ++
 arch/arm/mach-vexpress/Makefile |   1 +
 arch/arm/mach-vexpress/tc2_pm.c | 287 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 297 insertions(+)
 create mode 100644 arch/arm/mach-vexpress/tc2_pm.c

diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig
index b8bbabec63..4b082f8912 100644
--- a/arch/arm/mach-vexpress/Kconfig
+++ b/arch/arm/mach-vexpress/Kconfig
@@ -66,4 +66,13 @@ config ARCH_VEXPRESS_DCSCB
 	  This is needed to provide CPU and cluster power management
 	  on RTSM implementing big.LITTLE.
 
+config ARCH_VEXPRESS_TC2_PM
+	bool "Versatile Express TC2 power management"
+	depends on MCPM
+	select VEXPRESS_SPC
+	select ARM_CCI
+	help
+	  Support for CPU and cluster power management on Versatile Express
+	  with a TC2 (A15x2 A7x3) big.LITTLE core tile.
+
 endmenu
diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile
index 48ba89a814..0853da6395 100644
--- a/arch/arm/mach-vexpress/Makefile
+++ b/arch/arm/mach-vexpress/Makefile
@@ -7,5 +7,6 @@ ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \
 obj-y					:= v2m.o
 obj-$(CONFIG_ARCH_VEXPRESS_CA9X4)	+= ct-ca9x4.o
 obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)	+= dcscb.o	dcscb_setup.o
+obj-$(CONFIG_ARCH_VEXPRESS_TC2_PM)	+= tc2_pm.o
 obj-$(CONFIG_SMP)			+= platsmp.o
 obj-$(CONFIG_HOTPLUG_CPU)		+= hotplug.o
diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
new file mode 100644
index 0000000000..26dbe782fb
--- /dev/null
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -0,0 +1,287 @@
+/*
+ * arch/arm/mach-vexpress/tc2_pm.c - TC2 power management support
+ *
+ * Created by:	Nicolas Pitre, October 2012
+ * Copyright:	(C) 2012-2013  Linaro Limited
+ *
+ * Some portions of this file were originally written by Achin Gupta
+ * Copyright:   (C) 2012  ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+
+#include <asm/mcpm.h>
+#include <asm/proc-fns.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/cp15.h>
+
+#include <linux/vexpress.h>
+#include <linux/arm-cci.h>
+
+/*
+ * We can't use regular spinlocks. In the switcher case, it is possible
+ * for an outbound CPU to call power_down() after its inbound counterpart
+ * is already live using the same logical CPU number which trips lockdep
+ * debugging.
+ */
+static arch_spinlock_t tc2_pm_lock = __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define TC2_CLUSTERS			2
+#define TC2_MAX_CPUS_PER_CLUSTER	3
+
+/* Keep per-cpu usage count to cope with unordered up/down requests */
+static int tc2_pm_use_count[TC2_MAX_CPUS_PER_CLUSTER][TC2_CLUSTERS];
+
+#define tc2_cluster_unused(cluster) \
+	(!tc2_pm_use_count[0][cluster] && \
+	 !tc2_pm_use_count[1][cluster] && \
+	 !tc2_pm_use_count[2][cluster])
+
+static int tc2_pm_power_up(unsigned int cpu, unsigned int cluster)
+{
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	if (cluster >= TC2_CLUSTERS || cpu >= ve_spc_get_nr_cpus(cluster))
+		return -EINVAL;
+
+	/*
+	 * Since this is called with IRQs enabled, and no arch_spin_lock_irq
+	 * variant exists, we need to disable IRQs manually here.
+	 */
+	local_irq_disable();
+	arch_spin_lock(&tc2_pm_lock);
+
+	if (tc2_cluster_unused(cluster))
+		ve_spc_powerdown(cluster, false);
+
+	tc2_pm_use_count[cpu][cluster]++;
+	if (tc2_pm_use_count[cpu][cluster] == 1) {
+		ve_spc_set_resume_addr(cluster, cpu,
+				       virt_to_phys(mcpm_entry_point));
+		ve_spc_cpu_wakeup_irq(cluster, cpu, true);
+	} else if (tc2_pm_use_count[cpu][cluster] != 2) {
+		/*
+		 * The only possible values are:
+		 * 0 = CPU down
+		 * 1 = CPU (still) up
+		 * 2 = CPU requested to be up before it had a chance
+		 *     to actually make itself down.
+		 * Any other value is a bug.
+		 */
+		BUG();
+	}
+
+	arch_spin_unlock(&tc2_pm_lock);
+	local_irq_enable();
+
+	return 0;
+}
+
+static void tc2_pm_power_down(void)
+{
+	unsigned int mpidr, cpu, cluster;
+	bool last_man = false, skip_wfi = false;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	BUG_ON(cluster >= TC2_CLUSTERS || cpu >= TC2_MAX_CPUS_PER_CLUSTER);
+
+	__mcpm_cpu_going_down(cpu, cluster);
+
+	arch_spin_lock(&tc2_pm_lock);
+	BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
+	tc2_pm_use_count[cpu][cluster]--;
+	if (tc2_pm_use_count[cpu][cluster] == 0) {
+		ve_spc_cpu_wakeup_irq(cluster, cpu, true);
+		if (tc2_cluster_unused(cluster)) {
+			ve_spc_powerdown(cluster, true);
+			ve_spc_global_wakeup_irq(true);
+			last_man = true;
+		}
+	} else if (tc2_pm_use_count[cpu][cluster] == 1) {
+		/*
+		 * A power_up request went ahead of us.
+		 * Even if we do not want to shut this CPU down,
+		 * the caller expects a certain state as if the WFI
+		 * was aborted.  So let's continue with cache cleaning.
+		 */
+		skip_wfi = true;
+	} else
+		BUG();
+
+	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
+		arch_spin_unlock(&tc2_pm_lock);
+
+		if (read_cpuid_part_number() == ARM_CPU_PART_CORTEX_A15) {
+			/*
+			 * On the Cortex-A15 we need to disable
+			 * L2 prefetching before flushing the cache.
+			 */
+			asm volatile(
+			"mcr	p15, 1, %0, c15, c0, 3 \n\t"
+			"isb	\n\t"
+			"dsb	"
+			: : "r" (0x400) );
+		}
+
+		/*
+		 * We need to disable and flush the whole (L1 and L2) cache.
+		 * Let's do it in the safest possible way i.e. with
+		 * no memory access within the following sequence
+		 * including the stack.
+		 */
+		asm volatile(
+		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
+		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
+		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
+		"isb	\n\t"
+		"bl	v7_flush_dcache_all \n\t"
+		"clrex	\n\t"
+		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
+		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
+		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
+		"isb	\n\t"
+		"dsb	"
+		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
+		      "r9","r10","r11","lr","memory");
+
+		cci_disable_port_by_cpu(mpidr);
+
+		__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
+	} else {
+		/*
+		 * If last man then undo any setup done previously.
+		 */
+		if (last_man) {
+			ve_spc_powerdown(cluster, false);
+			ve_spc_global_wakeup_irq(false);
+		}
+
+		arch_spin_unlock(&tc2_pm_lock);
+
+		/*
+		 * We need to disable and flush only the L1 cache.
+		 * Let's do it in the safest possible way as above.
+		 */
+		asm volatile(
+		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
+		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
+		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
+		"isb	\n\t"
+		"bl	v7_flush_dcache_louis \n\t"
+		"clrex	\n\t"
+		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
+		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
+		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
+		"isb	\n\t"
+		"dsb	"
+		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
+		      "r9","r10","r11","lr","memory");
+	}
+
+	__mcpm_cpu_down(cpu, cluster);
+
+	/* Now we are prepared for power-down, do it: */
+	if (!skip_wfi)
+		wfi();
+
+	/* Not dead at this point?  Let our caller cope. */
+}
+
+static void tc2_pm_powered_up(void)
+{
+	unsigned int mpidr, cpu, cluster;
+	unsigned long flags;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	BUG_ON(cluster >= TC2_CLUSTERS || cpu >= TC2_MAX_CPUS_PER_CLUSTER);
+
+	local_irq_save(flags);
+	arch_spin_lock(&tc2_pm_lock);
+
+	if (tc2_cluster_unused(cluster)) {
+		ve_spc_powerdown(cluster, false);
+		ve_spc_global_wakeup_irq(false);
+	}
+
+	if (!tc2_pm_use_count[cpu][cluster])
+		tc2_pm_use_count[cpu][cluster] = 1;
+
+	ve_spc_cpu_wakeup_irq(cluster, cpu, false);
+	ve_spc_set_resume_addr(cluster, cpu, 0);
+
+	arch_spin_unlock(&tc2_pm_lock);
+	local_irq_restore(flags);
+}
+
+static const struct mcpm_platform_ops tc2_pm_power_ops = {
+	.power_up	= tc2_pm_power_up,
+	.power_down	= tc2_pm_power_down,
+	.powered_up	= tc2_pm_powered_up,
+};
+
+static bool __init tc2_pm_usage_count_init(void)
+{
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	if (cluster >= TC2_CLUSTERS || cpu >= ve_spc_get_nr_cpus(cluster)) {
+		pr_err("%s: boot CPU is out of bounds!\n", __func__);
+		return false;
+	}
+	tc2_pm_use_count[cpu][cluster] = 1;
+	return true;
+}
+
+/*
+ * Enable cluster-level coherency, in preparation for turning on the MMU.
+ */
+static void __naked tc2_pm_power_up_setup(unsigned int affinity_level)
+{
+	asm volatile (" \n"
+"	cmp	r0, #1 \n"
+"	bxne	lr \n"
+"	b	cci_enable_port_for_self ");
+}
+
+static int __init tc2_pm_init(void)
+{
+	int ret;
+
+	ret = ve_spc_init();
+	if (ret)
+		return ret;
+
+	if (!cci_probed())
+		return -ENODEV;
+
+	if (!tc2_pm_usage_count_init())
+		return -EINVAL;
+
+	ret = mcpm_platform_register(&tc2_pm_power_ops);
+	if (!ret) {
+		mcpm_sync_init(tc2_pm_power_up_setup);
+		pr_info("TC2 power management initialized\n");
+	}
+	return ret;
+}
+
+early_initcall(tc2_pm_init);
-- 
1.8.1.2

* [PATCH 4/4] ARM: vexpress/TC2: implement PM suspend method
From: Nicolas Pitre @ 2013-07-18  3:28 UTC
  To: linux-arm-kernel

From: Nicolas Pitre <nico@linaro.org>

Similar to power_down(), except that for a suspend, the firmware
mailbox address has to be set prior to entering low-power mode.

The residency argument is not used yet, so the last man always shuts
down the cluster for now.
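
For reference, the cpuidle hook to be posted separately is expected to
use this roughly as follows (illustrative sketch only, mirroring the
generic big.LITTLE cpuidle approach; the finisher is invoked through
cpu_suspend() so the core context is saved first):

	static int tc2_powerdown_finisher(unsigned long arg)
	{
		unsigned int mpidr = read_cpuid_mpidr();
		unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
		unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);

		mcpm_set_entry_vector(cpu, cluster, cpu_resume);
		mcpm_cpu_suspend(0);	/* ends up in tc2_pm_suspend() */
		return 1;		/* only reached if suspend was aborted */
	}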

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Pawel Moll <pawel.moll@arm.com>
---
 arch/arm/mach-vexpress/tc2_pm.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
index 26dbe782fb..dfb55d45b6 100644
--- a/arch/arm/mach-vexpress/tc2_pm.c
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -84,7 +84,7 @@ static int tc2_pm_power_up(unsigned int cpu, unsigned int cluster)
 	return 0;
 }
 
-static void tc2_pm_power_down(void)
+static void tc2_pm_down(u64 residency)
 {
 	unsigned int mpidr, cpu, cluster;
 	bool last_man = false, skip_wfi = false;
@@ -198,6 +198,22 @@ static void tc2_pm_power_down(void)
 	/* Not dead at this point?  Let our caller cope. */
 }
 
+static void tc2_pm_power_down(void)
+{
+	tc2_pm_down(0);
+}
+
+static void tc2_pm_suspend(u64 residency)
+{
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+	ve_spc_set_resume_addr(cluster, cpu, virt_to_phys(mcpm_entry_point));
+	tc2_pm_down(residency);
+}
+
 static void tc2_pm_powered_up(void)
 {
 	unsigned int mpidr, cpu, cluster;
@@ -231,6 +247,7 @@ static void tc2_pm_powered_up(void)
 static const struct mcpm_platform_ops tc2_pm_power_ops = {
 	.power_up	= tc2_pm_power_up,
 	.power_down	= tc2_pm_power_down,
+	.suspend	= tc2_pm_suspend,
 	.powered_up	= tc2_pm_powered_up,
 };
 
-- 
1.8.1.2

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Dave Martin @ 2013-07-18 15:04 UTC
  To: linux-arm-kernel

On Wed, Jul 17, 2013 at 11:28:33PM -0400, Nicolas Pitre wrote:
> Unlike real A15/A7's, the RTSM simulation doesn't appear to hit the
> cache when the CTRL.C bit is cleared.  Let's ensure there is no memory
> access within the disable and flush cache sequence, including to the
> stack.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  arch/arm/mach-vexpress/dcscb.c | 58 +++++++++++++++++++++++++++---------------
>  1 file changed, 37 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
> index 16d57a8a9d..9f01c04d58 100644
> --- a/arch/arm/mach-vexpress/dcscb.c
> +++ b/arch/arm/mach-vexpress/dcscb.c
> @@ -136,14 +136,29 @@ static void dcscb_power_down(void)
>  		/*
>  		 * Flush all cache levels for this cluster.
>  		 *
> -		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
> -		 * a preliminary flush here for those CPUs.  At least, that's
> -		 * the theory -- without the extra flush, Linux explodes on
> -		 * RTSM (to be investigated).
> +		 * To do so we do:
> +		 * - Clear the CTLR.C bit to prevent further cache allocations

SCTLR

> +		 * - Flush the whole cache
> +		 * - Disable local coherency by clearing the ACTLR "SMP" bit
> +		 *
> +		 * Let's do it in the safest possible way i.e. with
> +		 * no memory access within the following sequence
> +		 * including the stack.
>  		 */
> -		flush_cache_all();
> -		set_cr(get_cr() & ~CR_C);
> -		flush_cache_all();
> +		asm volatile(
> +		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> +		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> +		"isb	\n\t"
> +		"bl	v7_flush_dcache_all \n\t"
> +		"clrex	\n\t"
> +		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> +		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> +		"isb	\n\t"
> +		"dsb	"
> +		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> +		      "r9","r10","r11","lr","memory");

Along with the TC2 support, we now have 4 copies of this code sequence.

This is basically the A15/A7 native "exit coherency and flush and
disable some levels of dcache" operation, whose only parameter is which
cache levels to flush.

That's a big mouthful -- we can probably come up with a better name --
but we've pretty much concluded that there is no way to break this
operation apart into bitesize pieces.  Nonetheless, any native
powerdown sequence for these processors will need to do this, or
something closely related.

Is it worth consolidating, or is that premature?

> +		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> +		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> +		"isb	\n\t"
> +		"bl	v7_flush_dcache_all \n\t"
> +		"clrex	\n\t"
> +		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> +		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> +		"isb	\n\t"
> +		"dsb	"
> +		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> +		      "r9","r10","r11","lr","memory");


>  
>  		/*
>  		 * This is a harmless no-op.  On platforms with a real
> @@ -152,9 +167,6 @@ static void dcscb_power_down(void)
>  		 */
>  		outer_flush_all();

Do you have any opinion on whether we should leave that for educational
purposes?

None of the relevant platforms has a non-architected outer cache, so
there could be an argument for removing it.  But that might mislead
people who use this code as a reference.

Cheers
---Dave

>  
> -		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
> -		set_auxcr(get_auxcr() & ~(1 << 6));
> -
>  		/*
>  		 * Disable cluster-level coherency by masking
>  		 * incoming snoops and DVM messages:
> @@ -167,18 +179,22 @@ static void dcscb_power_down(void)
>  
>  		/*
>  		 * Flush the local CPU cache.
> -		 *
> -		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
> -		 * a preliminary flush here for those CPUs.  At least, that's
> -		 * the theory -- without the extra flush, Linux explodes on
> -		 * RTSM (to be investigated).
> +		 * Let's do it in the safest possible way as above.
>  		 */
> -		flush_cache_louis();
> -		set_cr(get_cr() & ~CR_C);
> -		flush_cache_louis();
> -
> -		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
> -		set_auxcr(get_auxcr() & ~(1 << 6));
> +		asm volatile(
> +		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> +		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> +		"isb	\n\t"
> +		"bl	v7_flush_dcache_louis \n\t"
> +		"clrex	\n\t"
> +		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> +		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> +		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> +		"isb	\n\t"
> +		"dsb	"
> +		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> +		      "r9","r10","r11","lr","memory");
>  	}
>  
>  	__mcpm_cpu_down(cpu, cluster);
> -- 
> 1.8.1.2
> 

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Nicolas Pitre @ 2013-07-18 17:24 UTC
  To: linux-arm-kernel

[ added Russell for his opinion on the patch below ]

On Thu, 18 Jul 2013, Dave Martin wrote:

> On Wed, Jul 17, 2013 at 11:28:33PM -0400, Nicolas Pitre wrote:
> > Unlike real A15/A7's, the RTSM simulation doesn't appear to hit the
> > cache when the CTRL.C bit is cleared.  Let's ensure there is no memory
> > access within the disable and flush cache sequence, including to the
> > stack.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> >  arch/arm/mach-vexpress/dcscb.c | 58 +++++++++++++++++++++++++++---------------
> >  1 file changed, 37 insertions(+), 21 deletions(-)
> > 
> > diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
> > index 16d57a8a9d..9f01c04d58 100644
> > --- a/arch/arm/mach-vexpress/dcscb.c
> > +++ b/arch/arm/mach-vexpress/dcscb.c
> > @@ -136,14 +136,29 @@ static void dcscb_power_down(void)
> >  		/*
> >  		 * Flush all cache levels for this cluster.
> >  		 *
> > -		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
> > -		 * a preliminary flush here for those CPUs.  At least, that's
> > -		 * the theory -- without the extra flush, Linux explodes on
> > -		 * RTSM (to be investigated).
> > +		 * To do so we do:
> > +		 * - Clear the CTLR.C bit to prevent further cache allocations
> 
> SCTLR

Fixed.

> > +		 * - Flush the whole cache
> > +		 * - Disable local coherency by clearing the ACTLR "SMP" bit
> > +		 *
> > +		 * Let's do it in the safest possible way i.e. with
> > +		 * no memory access within the following sequence
> > +		 * including the stack.
> >  		 */
> > -		flush_cache_all();
> > -		set_cr(get_cr() & ~CR_C);
> > -		flush_cache_all();
> > +		asm volatile(
> > +		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> > +		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> > +		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> > +		"isb	\n\t"
> > +		"bl	v7_flush_dcache_all \n\t"
> > +		"clrex	\n\t"
> > +		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> > +		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> > +		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> > +		"isb	\n\t"
> > +		"dsb	"
> > +		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> > +		      "r9","r10","r11","lr","memory");
> 
> Along with the TC2 support, we now have 4 copies of this code sequence.
> 
> This is basically the A15/A7 native "exit coherency and flush and
> disable some levels of dcache" operation, whose only parameter is which
> cache levels to flush.
> 
> That's a big mouthful -- we can probably come up with a better name --
> but we've pretty much concluded that there is no way to break this
> operation apart into bitesize pieces.  Nonetheless, any native
> powerdown sequence for these processors will need to do this, or
> something closely related.
> 
> Is it worth consolidating, or is that premature?

It is probably worth consolidating.

What about this:

commit 390cf8b9b83eeeebdfef51912f5003a6a9b84115
Author: Nicolas Pitre <nicolas.pitre@linaro.org>
Date:   Thu Jul 18 13:12:48 2013 -0400

    ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling code
    
    This code is becoming duplicated in many places.  So let's consolidate
    it into a handy macro that is known to be right and available for reuse.
    
    Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 17d0ae8672..8a76933e80 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,4 +436,33 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
 #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
 #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
 
+/*
+ * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
+ * To do so we must:
+ *
+ * - Clear the SCTLR.C bit to prevent further cache allocations
+ * - Flush the desired level of cache
+ * - Clear the ACTLR "SMP" bit to disable local coherency
+ *
+ * ... and so without any intervening memory access in between those steps,
+ * not even to the stack.
+ *
+ * The clobber list is dictated by the call to v7_flush_dcache_*.
+ */
+#define v7_disable_flush_cache(level) \
+	asm volatile( \
+	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
+	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
+	"isb	\n\t" \
+	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
+	"clrex	\n\t" \
+	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
+	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
+	"isb	\n\t" \
+	"dsb	" \
+	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
+	      "r9","r10","r11","lr","memory" )
+
 #endif
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 85fffa702f..145d8237d5 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -133,32 +133,8 @@ static void dcscb_power_down(void)
 	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush all cache levels for this cluster.
-		 *
-		 * To do so we do:
-		 * - Clear the SCTLR.C bit to prevent further cache allocations
-		 * - Flush the whole cache
-		 * - Clear the ACTLR "SMP" bit to disable local coherency
-		 *
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including to the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Flush all cache levels for this cluster. */
+		v7_disable_flush_cache(all);
 
 		/*
 		 * This is a harmless no-op.  On platforms with a real
@@ -177,24 +153,8 @@ static void dcscb_power_down(void)
 	} else {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush the local CPU cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Flush the local CPU cache. */
+		v7_disable_flush_cache(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
index dfb55d45b6..fd8bc2d931 100644
--- a/arch/arm/mach-vexpress/tc2_pm.c
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
 			: : "r" (0x400) );
 		}
 
-		/*
-		 * We need to disable and flush the whole (L1 and L2) cache.
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_disable_flush_cache(all);
 
 		cci_disable_port_by_cpu(mpidr);
 
@@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
 
 		arch_spin_unlock(&tc2_pm_lock);
 
-		/*
-		 * We need to disable and flush only the L1 cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_disable_flush_cache(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Dave Martin @ 2013-07-18 18:03 UTC
  To: linux-arm-kernel

On Thu, Jul 18, 2013 at 01:24:48PM -0400, Nicolas Pitre wrote:
> [ added Russell for his opinion on the patch below ]

[...]

> commit 390cf8b9b83eeeebdfef51912f5003a6a9b84115
> Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date:   Thu Jul 18 13:12:48 2013 -0400
> 
>     ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling code
>     
>     This code is becoming duplicated in many places.  So let's consolidate
>     it into a handy macro that is known to be right and available for reuse.
>     
>     Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 17d0ae8672..8a76933e80 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -436,4 +436,33 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
>  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
>  
> +/*
> + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> + * To do so we must:
> + *
> + * - Clear the SCTLR.C bit to prevent further cache allocations
> + * - Flush the desired level of cache
> + * - Clear the ACTLR "SMP" bit to disable local coherency
> + *
> + * ... and so without any intervening memory access in between those steps,
> + * not even to the stack.
> + *
> + * The clobber list is dictated by the call to v7_flush_dcache_*.
> + */
> +#define v7_disable_flush_cache(level) \
> +	asm volatile( \
> +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> +	"isb	\n\t" \
> +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> +	"clrex	\n\t" \
> +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> +	"isb	\n\t" \
> +	"dsb	" \
> +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> +	      "r9","r10","r11","lr","memory" )
> +
>  #endif

That's roughly what I had in mind, though it might belong somewhere more
obscure than cacheflush.h(?)

"disable_flush_cache" sounds a too general-purpose for my liking.

"v7" isn't really right because we touch the ACTLR.  This only works
on certain CPUs and when Linux is running Secure.  Maybe saying "a15"
instead of "a7" is reasonable, since a7 is supposed to be an a15
workalike in most places.

I had other names in mind, like "coherency_exit" or "cache_exit".
Those are not very intelligible, but that might at least make people
pause and think before blindly using it.

Really, it's for the very specific purpose of doing native low-level
powerdown of an individual CPU in a multiprocessor system, when the
platform as a whole is not being shut down, on A15 or A7 (+ possibly
other future v7 CPUs)... it's not suitable for any other situation.
Even kexec probably shouldn't be disabling the SMP bit.

Cheers
---Dave

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Nicolas Pitre @ 2013-07-18 18:59 UTC
  To: linux-arm-kernel

On Thu, 18 Jul 2013, Dave Martin wrote:

> On Thu, Jul 18, 2013 at 01:24:48PM -0400, Nicolas Pitre wrote:
> > [ added Russell for his opinion on the patch below ]
> 
> [...]
> 
> > commit 390cf8b9b83eeeebdfef51912f5003a6a9b84115
> > Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date:   Thu Jul 18 13:12:48 2013 -0400
> > 
> >     ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling code
> >     
> >     This code is becoming duplicated in many places.  So let's consolidate
> >     it into a handy macro that is known to be right and available for reuse.
> >     
> >     Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > 
> > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > index 17d0ae8672..8a76933e80 100644
> > --- a/arch/arm/include/asm/cacheflush.h
> > +++ b/arch/arm/include/asm/cacheflush.h
> > @@ -436,4 +436,33 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> >  
> > +/*
> > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > + * To do so we must:
> > + *
> > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > + * - Flush the desired level of cache
> > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > + *
> > + * ... and so without any intervening memory access in between those steps,
> > + * not even to the stack.
> > + *
> > + * The clobber list is dictated by the call to v7_flush_dcache_*.
> > + */
> > +#define v7_disable_flush_cache(level) \
> > +	asm volatile( \
> > +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> > +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> > +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> > +	"isb	\n\t" \
> > +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> > +	"clrex	\n\t" \
> > +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> > +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> > +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> > +	"isb	\n\t" \
> > +	"dsb	" \
> > +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> > +	      "r9","r10","r11","lr","memory" )
> > +
> >  #endif
> 
> That's roughly what I had in mind, though it might belong somewhere more
> obscure than cacheflush.h(?)

Any suggestions for that?  Maybe asm/mcpm.h but that might be used in 
other circumstances as well.

> "disable_flush_cache" sounds a too general-purpose for my liking.
> 
> "v7" isn't really right because we touch the ACTLR.  This only works
> on certain CPUs and when Linux is running Secure.  Maybe saying "a15"
> instead of "a7" is reasonable, since a7 is supposed to be an a15
> workalike in most places.

Isn't this how this should be done on an A9 too?  Lorenzo asked me to do 
it this way, for fear of seeing people copy the previous implementation,
so the same sequence works on A9 as well as A15.

> I had other names in mind, like "coherency_exit" or "cache_exit".
> Those are not very intelligible, but that might at least make people
> pause and think before blindly using it.

Good point.  It should still embody the architecture name for which it 
is valid though.


Nicolas

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
From: Dave Martin @ 2013-07-19 10:28 UTC
  To: linux-arm-kernel

On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> On Thu, 18 Jul 2013, Dave Martin wrote:
> 
> > On Thu, Jul 18, 2013 at 01:24:48PM -0400, Nicolas Pitre wrote:
> > > [ added Russell for his opinion on the patch below ]
> > 
> > [...]
> > 
> > > commit 390cf8b9b83eeeebdfef51912f5003a6a9b84115
> > > Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> > > Date:   Thu Jul 18 13:12:48 2013 -0400
> > > 
> > >     ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling code
> > >     
> > >     This code is becoming duplicated in many places.  So let's consolidate
> > >     it into a handy macro that is known to be right and available for reuse.
> > >     
> > >     Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > 
> > > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > > index 17d0ae8672..8a76933e80 100644
> > > --- a/arch/arm/include/asm/cacheflush.h
> > > +++ b/arch/arm/include/asm/cacheflush.h
> > > @@ -436,4 +436,33 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> > >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> > >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> > >  
> > > +/*
> > > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > > + * To do so we must:
> > > + *
> > > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > > + * - Flush the desired level of cache
> > > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > > + *
> > > + * ... and so without any intervening memory access in between those steps,
> > > + * not even to the stack.
> > > + *
> > > + * The clobber list is dictated by the call to v7_flush_dcache_*.
> > > + */
> > > +#define v7_disable_flush_cache(level) \
> > > +	asm volatile( \
> > > +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> > > +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> > > +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> > > +	"isb	\n\t" \
> > > +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> > > +	"clrex	\n\t" \
> > > +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> > > +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> > > +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> > > +	"isb	\n\t" \
> > > +	"dsb	" \
> > > +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> > > +	      "r9","r10","r11","lr","memory" )
> > > +
> > >  #endif
> > 
> > That's roughly what I had in mind, though it might belong somewhere more
> > obscure than cacheflush.h(?)
> 
> Any suggestions for that?  Maybe asm/mcpm.h but that might be used in 
> other circumstances as well.

I don't really have a better idea...  if nobody objects strongly to
putting it in cacheflush.h, I'm happy to go with that.
> 
> > "disable_flush_cache" sounds a too general-purpose for my liking.
> > 
> > "v7" isn't really right because we touch the ACTLR.  This only works
> > on certain CPUs and when Linux is running Secure.  Maybe saying "a15"
> > instead of "a7" is reasonable, since a7 is supposed to be an a15
> > workalike in most places.
> 
> Isn't this how this should be done on an A9 too?  Lorenzo asked me to do 

Quite possibly -- the point is, it's not guaranteed that all v7 CPUs will
work the same way, especially the custom implementations.   Third-party v7
implementations like Snapdragon etc. need not be in any way the same as
ARM's CPUs with regard to details like the SMP bit.

> it this way, for fear of seeing people copy the previous implementation, 
> so the same sequence works on A9 as well as A15.

Would it make sense to have things like

#define a15_something() asm volatile( ... )
#define a7_something() a15_something()
#define a9_something() a15_something()

(where the correct one to call for a b.L system would probably be the
one corresponding to the big cluster)

... or just call it something like

cortex_a_something()

with a big comment (or alias macros, as above) documenting which CPUs we
know it works for, and warning people to stop and think before using it
on some other CPU.
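
For concreteness, a minimal sketch of that aliasing idea (the cortex_*
names are hypothetical placeholders; the common body is assumed to be the
asm sequence from the patch under discussion):

#define cortex_a15_exit_coherency(level) \
	v7_disable_flush_cache(level)	/* the sequence quoted above */
#define cortex_a7_exit_coherency(level)	cortex_a15_exit_coherency(level)
#define cortex_a9_exit_coherency(level)	cortex_a15_exit_coherency(level)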

> > I had other names in mind, like "coherency_exit" or "cache_exit".
> > Those are not very intelligible, but that might at least make people
> > pause and think before blindly using it.
> 
> Good point.  It should still embody the architecture name for which it 
> is valid though.

Sure, I was assuming something would be pasted on the start of the name.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-19 10:28           ` Dave Martin
@ 2013-07-19 10:59             ` Lorenzo Pieralisi
  2013-07-22 17:58               ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Lorenzo Pieralisi @ 2013-07-19 10:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > On Thu, 18 Jul 2013, Dave Martin wrote:
> > 
> > > On Thu, Jul 18, 2013 at 01:24:48PM -0400, Nicolas Pitre wrote:
> > > > [ added Russell for his opinion on the patch below ]
> > > 
> > > [...]
> > > 
> > > > commit 390cf8b9b83eeeebdfef51912f5003a6a9b84115
> > > > Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> > > > Date:   Thu Jul 18 13:12:48 2013 -0400
> > > > 
> > > >     ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling code
> > > >     
> > > >     This code is becoming duplicated in many places.  So let's consolidate
> > > >     it into a handy macro that is known to be right and available for reuse.
> > > >     
> > > >     Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > > 
> > > > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > > > index 17d0ae8672..8a76933e80 100644
> > > > --- a/arch/arm/include/asm/cacheflush.h
> > > > +++ b/arch/arm/include/asm/cacheflush.h
> > > > @@ -436,4 +436,33 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> > > >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> > > >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> > > >  
> > > > +/*
> > > > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > > > + * To do so we must:
> > > > + *
> > > > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > > > + * - Flush the desired level of cache
> > > > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > > > + *
> > > > + * ... and so without any intervening memory access in between those steps,
> > > > + * not even to the stack.
> > > > + *
> > > > + * The clobber list is dictated by the call to v7_flush_dcache_*.
> > > > + */
> > > > +#define v7_disable_flush_cache(level) \
> > > > +	asm volatile( \
> > > > +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> > > > +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> > > > +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> > > > +	"isb	\n\t" \
> > > > +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> > > > +	"clrex	\n\t" \
> > > > +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> > > > +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> > > > +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> > > > +	"isb	\n\t" \
> > > > +	"dsb	" \
> > > > +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> > > > +	      "r9","r10","r11","lr","memory" )
> > > > +
> > > >  #endif
> > > 
> > > That's roughly what I had in mind, though it might belong somewhere more
> > > obscure than cacheflush.h(?)
> > 
> > Any suggestions for that?  Maybe asm/mcpm.h but that might be used in 
> > other circumstances as well.
> 
> I don't really have a better idea...  if nobody objects strongly to
> putting it in cacheflush.h, I'm happy to go with that.
> > 
> > > "disable_flush_cache" sounds a too general-purpose for my liking.
> > > 
> > > "v7" isn't really right because we touch the ACTLR.  This only works
> > > on certain CPUs and when Linux is running Secure.  Maybe saying "a15"
> > > instead of "a7" is reasonable, since a7 is supposed to be an a15
> > > workalike in most places.
> > 
> > Isn't this how this should be done on an A9 too?  Lorenzo asked me to do 
> 
> Quite possibly -- the point is, it's not guaranteed that all v7 CPUs will
> work the same way, especially the custom implementations.   Third-party v7
> implementations like Snapdragon etc. need not be in any way the same as
> ARM's CPUs with regard to details like the SMP bit.

Well, if they change the SMP bit behaviour from what the TRM describes, they
certainly know what they are doing; I am not sure we should cater for that.
They will just use their own function if their behaviour differs from the
"standard" one.

At least all v7 implementations I've run into in the kernel (mainline
and platform trees) could have used the sequence above (well, security
notwithstanding).

Also, we should point out that after clearing the C bit, ldr/str exclusive
must not be used anymore (even though this is implementation defined, since it
depends on a global monitor in DRAM). The macro is also a barrier in
terms of coherency: when we call it we should know what to expect,
basically a CPU whose caches are not coherent anymore.
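
To make the constraint concrete, a hedged sketch of a caller (illustrative
only, not actual backend code):

	v7_disable_flush_cache(all);	/* C bit cleared, caches flushed,
					 * SMP bit cleared */
	/*
	 * Beyond this point no ldrex/strex users may run on this CPU:
	 * no spinlocks, no atomics.  Exclusives would now depend on an
	 * implementation-defined global monitor in DRAM.
	 */
	while (1)
		asm volatile("wfi");	/* wait to be powered down */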

> > it this way, for fear of seeing people copy the previous implementation, 
> > so the same sequence works on A9 as well as A15.
> 
> Would it make sense to have things like
> 
> #define a15_something() asm volatile( ... )
> #define a7_something() a15_something()
> #define a9_something() a15_something()
> 
> (where the correct one to call for a b.L system would probably be the
> one corresponding to the big cluster)
> 
> ... or just call it something like
> 
> cortex_a_something()
> 
> with a big comment (or alias macros, as above) documenting which CPUs we
> know it works for, and warning people to stop and think before using it
> on some other CPU.

It might be, even though a well-written comment would do, IMHO.

> > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > Those are not very intelligible, but that might at least make people
> > > pause and think before blindly using it.
> > 
> > Good point.  It should still embody the architecture name for which it 
> > is valid though.
> 
> Sure, I was assuming something would be pasted on the start of the name.

v7 :-) with a comment describing the assumptions (in particular related,
as Dave mentioned, to the SMP bit behaviour)?

Lorenzo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-19 10:59             ` Lorenzo Pieralisi
@ 2013-07-22 17:58               ` Nicolas Pitre
  2013-07-23 10:43                 ` Dave Martin
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-07-22 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:

> On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > 
> > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > Those are not very intelligible, but that might at least make people
> > > > pause and think before blindly using it.
> > > 
> > > Good point.  It should still embody the architecture name for which it 
> > > is valid though.
> > 
> > Sure, I was assuming something would be pasted on the start of the name.
> 
> v7 :-) with a comment describing the assumptions (in particular related,
> as Dave mentioned, to the SMP bit behaviour)?

OK... What about this then:

----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Thu, 18 Jul 2013 13:12:48 -0400
Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
 code

This code is becoming duplicated in many places.  So let's consolidate
it into a handy macro that is known to be right and available for reuse.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 17d0ae8672..8f4e2297e2 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
 #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
 #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
 
+/*
+ * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
+ * To do so we must:
+ *
+ * - Clear the SCTLR.C bit to prevent further cache allocations
+ * - Flush the desired level of cache
+ * - Clear the ACTLR "SMP" bit to disable local coherency
+ *
+ * ... and so without any intervening memory access in between those steps,
+ * not even to the stack.
+ *
+ * WARNING -- After this has been called:
+ *
+ * - No ldr/str exclusive must be used.
+ * - The CPU is obviously no longer coherent with the other CPUs.
+ *
+ * Further considerations:
+ *
+ * - This relies on the presence and behavior of the AUXCR.SMP bit as
+ *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
+ *   that will need their own procedure.
+ * - This is unlikely to work if Linux is running non-secure.
+ */
+#define v7_exit_coherency_flush(level) \
+	asm volatile( \
+	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
+	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
+	"isb	\n\t" \
+	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
+	"clrex	\n\t" \
+	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
+	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
+	"isb	\n\t" \
+	"dsb	" \
+	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
+	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
+	      "r9","r10","r11","lr","memory" )
+
 #endif
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 85fffa702f..14d4996887 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -133,32 +133,8 @@ static void dcscb_power_down(void)
 	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush all cache levels for this cluster.
-		 *
-		 * To do so we do:
-		 * - Clear the SCTLR.C bit to prevent further cache allocations
-		 * - Flush the whole cache
-		 * - Clear the ACTLR "SMP" bit to disable local coherency
-		 *
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including to the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Flush all cache levels for this cluster. */
+		v7_exit_coherency_flush(all);
 
 		/*
 		 * This is a harmless no-op.  On platforms with a real
@@ -177,24 +153,8 @@ static void dcscb_power_down(void)
 	} else {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush the local CPU cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Disable and flush the local CPU cache. */
+		v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
index dfb55d45b6..5940f1e317 100644
--- a/arch/arm/mach-vexpress/tc2_pm.c
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
 			: : "r" (0x400) );
 		}
 
-		/*
-		 * We need to disable and flush the whole (L1 and L2) cache.
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_exit_coherency_flush(all);
 
 		cci_disable_port_by_cpu(mpidr);
 
@@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
 
 		arch_spin_unlock(&tc2_pm_lock);
 
-		/*
-		 * We need to disable and flush only the L1 cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-22 17:58               ` Nicolas Pitre
@ 2013-07-23 10:43                 ` Dave Martin
  2013-07-23 12:28                   ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Martin @ 2013-07-23 10:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
> 
> > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > > 
> > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > Those are not very intelligible, but that might at least make people
> > > > > pause and think before blindly using it.
> > > > 
> > > > Good point.  It should still embody the architecture name for which it 
> > > > is valid though.
> > > 
> > > Sure, I was assuming something would be pasted on the start of the name.
> > 
> > v7 :-) with a comment describing the assumptions (in particular related,
> > as Dave mentioned, to the SMP bit behaviour)?
> 
> OK... What about this then:
> 
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Thu, 18 Jul 2013 13:12:48 -0400
> Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
>  code
> 
> This code is becoming duplicated in many places.  So let's consolidate
> it into a handy macro that is known to be right and available for reuse.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 17d0ae8672..8f4e2297e2 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
>  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
>  
> +/*
> + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> + * To do so we must:
> + *
> + * - Clear the SCTLR.C bit to prevent further cache allocations
> + * - Flush the desired level of cache
> + * - Clear the ACTLR "SMP" bit to disable local coherency
> + *
> + * ... and so without any intervening memory access in between those steps,
> + * not even to the stack.
> + *
> + * WARNING -- After this has been called:
> + *
> + * - No ldr/str exclusive must be used.

Maybe write ldrex/strex (I misread that the first time around as "no
ldr/str ... must be used", which seemed a bit drastic)

> + * - The CPU is obviously no longer coherent with the other CPUs.
> + *
> + * Further considerations:
> + *
> + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from

Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
not part of ARMv7 at all.  Also, it seems that A9 isn't precisely the
same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
not the same either.

This is why I preferred to treat the whole sequence as specific to a
particular CPU implementation.  The similarity between A7 and A15
might be viewed as a happy coincidence (it also makes life easier in
big.LITTLE land).

(Also, you mix the names "ACTLR" and "AUXCR", but people will generally
be able to guess those are the same thing)

 
> + *   that will need their own procedure.
> + * - This is unlikely to work if Linux is running non-secure.
> + */
> +#define v7_exit_coherency_flush(level) \

... hence:

#define cortex_a15_exit_coherency(level) ...

#define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)


What do you think?

Cheers
---Dave

> +	asm volatile( \
> +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> +	"isb	\n\t" \
> +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> +	"clrex	\n\t" \
> +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> +	"isb	\n\t" \
> +	"dsb	" \
> +	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
> +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> +	      "r9","r10","r11","lr","memory" )
> +
>  #endif
> diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
> index 85fffa702f..14d4996887 100644
> --- a/arch/arm/mach-vexpress/dcscb.c
> +++ b/arch/arm/mach-vexpress/dcscb.c
> @@ -133,32 +133,8 @@ static void dcscb_power_down(void)
>  	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
>  		arch_spin_unlock(&dcscb_lock);
>  
> -		/*
> -		 * Flush all cache levels for this cluster.
> -		 *
> -		 * To do so we do:
> -		 * - Clear the SCTLR.C bit to prevent further cache allocations
> -		 * - Flush the whole cache
> -		 * - Clear the ACTLR "SMP" bit to disable local coherency
> -		 *
> -		 * Let's do it in the safest possible way i.e. with
> -		 * no memory access within the following sequence
> -		 * including to the stack.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_all \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Flush all cache levels for this cluster. */
> +		v7_exit_coherency_flush(all);
>  
>  		/*
>  		 * This is a harmless no-op.  On platforms with a real
> @@ -177,24 +153,8 @@ static void dcscb_power_down(void)
>  	} else {
>  		arch_spin_unlock(&dcscb_lock);
>  
> -		/*
> -		 * Flush the local CPU cache.
> -		 * Let's do it in the safest possible way as above.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_louis \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Disable and flush the local CPU cache. */
> +		v7_exit_coherency_flush(louis);
>  	}
>  
>  	__mcpm_cpu_down(cpu, cluster);
> diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
> index dfb55d45b6..5940f1e317 100644
> --- a/arch/arm/mach-vexpress/tc2_pm.c
> +++ b/arch/arm/mach-vexpress/tc2_pm.c
> @@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
>  			: : "r" (0x400) );
>  		}
>  
> -		/*
> -		 * We need to disable and flush the whole (L1 and L2) cache.
> -		 * Let's do it in the safest possible way i.e. with
> -		 * no memory access within the following sequence
> -		 * including the stack.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_all \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		v7_exit_coherency_flush(all);
>  
>  		cci_disable_port_by_cpu(mpidr);
>  
> @@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
>  
>  		arch_spin_unlock(&tc2_pm_lock);
>  
> -		/*
> -		 * We need to disable and flush only the L1 cache.
> -		 * Let's do it in the safest possible way as above.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_louis \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		v7_exit_coherency_flush(louis);
>  	}
>  
>  	__mcpm_cpu_down(cpu, cluster);
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-23 10:43                 ` Dave Martin
@ 2013-07-23 12:28                   ` Nicolas Pitre
  2013-07-23 16:33                     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-07-23 12:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Jul 2013, Dave Martin wrote:

> On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> > On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
> > 
> > > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > > > 
> > > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > > Those are not very intelligible, but that might at least make people
> > > > > > pause and think before blindly using it.
> > > > > 
> > > > > Good point.  It should still embody the architecture name for which it 
> > > > > is valid though.
> > > > 
> > > > Sure, I was assuming something would be pasted on the start of the name.
> > > 
> > > v7 :-) with a comment describing the assumptions (in particular related,
> > > as Dave mentioned, to the SMP bit behaviour)?
> > 
> > OK... What about this then:
> > 
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Thu, 18 Jul 2013 13:12:48 -0400
> > Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
> >  code
> > 
> > This code is becoming duplicated in many places.  So let's consolidate
> > it into a handy macro that is known to be right and available for reuse.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > 
> > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > index 17d0ae8672..8f4e2297e2 100644
> > --- a/arch/arm/include/asm/cacheflush.h
> > +++ b/arch/arm/include/asm/cacheflush.h
> > @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> >  
> > +/*
> > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > + * To do so we must:
> > + *
> > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > + * - Flush the desired level of cache
> > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > + *
> > + * ... and so without any intervening memory access in between those steps,
> > + * not even to the stack.
> > + *
> > + * WARNING -- After this has been called:
> > + *
> > + * - No ldr/str exclusive must be used.
> 
> Maybe write ldrex/strex (I misread that the first time around as "no
> ldr/str ... must be used", which seemed a bit drastic)

OK.

> > + * - The CPU is obviously no longer coherent with the other CPUs.
> > + *
> > + * Further considerations:
> > + *
> > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> 
> Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> not part of ARMv7 at all.

Well, I just copied Lorenzo's words here, trusting he knew more about it 
than I do.

> Also, it seems that A9 isn't precisely the
> same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> not the same either.
> 
> This is why I preferred to treat the whole sequence as specific to a
> particular CPU implementation.  The similarity between A7 and A15
> might be viewed as a happy coincidence (it also makes life easier in
> big.LITTLE land).

Fair enough.

> (Also, you mix the names "ACTLR" and "AUXCR", but people will generally
> be able to guess those are the same thing)

We apparently use both in the kernel already.  Which one is the 
canonical one?

> > + *   that will need their own procedure.
> > + * - This is unlikely to work if Linux is running non-secure.
> > + */
> > +#define v7_exit_coherency_flush(level) \
> 
> ... hence:
> 
> #define cortex_a15_exit_coherency(level) ...
> 
> #define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)
> 
> What do you think?

To be pedantic again, we don't really exit coherency at some cache 
level.  Hence my usage of "flush" indicating that the level argument 
applies only to that and not to the coherency status.
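
To illustrate with the two uses in the patch:

	v7_exit_coherency_flush(all);	/* last man: flush every cache level */
	v7_exit_coherency_flush(louis);	/* others: flush to the Level of
					 * Unification Inner Shareable only */

Either way the CPU fully exits coherency; only the flush depth differs.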


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-23 12:28                   ` Nicolas Pitre
@ 2013-07-23 16:33                     ` Lorenzo Pieralisi
  2013-07-25  0:27                       ` Nicolas Pitre
  2013-07-25 12:04                       ` Dave Martin
  0 siblings, 2 replies; 20+ messages in thread
From: Lorenzo Pieralisi @ 2013-07-23 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:

[...]

> > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > + *
> > > + * Further considerations:
> > > + *
> > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > 
> > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > not part of ARMv7 at all.
> 
> Well, I just copied Lorenzo's words here, trusting he knew more about it 
> than I do.
> 
> > Also, it seems that A9 isn't precisely the
> > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > not the same either.

If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
clear it; clearing the SMP bit is enough.

See, Dave has a point: there is no explicit "unified v7 TRM disable
clean and exit coherency procedure", even though the designers' end goal is to
have one and that's the one you wrote. The code you posted is perfectly OK on
all v7 implementations in the kernel I am aware of; I stand to be corrected,
but to the best of my knowledge that's the case.

> > This is why I preferred to treat the whole sequence as specific to a
> > particular CPU implementation.  The similarity between A7 and A15
> > might be viewed as a happy coincidence (it also makes life easier in
> > big.LITTLE land).
> 
> Fair enough.

I disagree on the happy coincidence but the point is taken. I am not
sure about what we should do, but I reiterate my point: the sequence as
it stands is OK on all NS v7 implementations I am aware of. We can add
macros to differentiate processors when we need them, but again that's
just my opinion, as correct as it can be.

Lorenzo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-23 16:33                     ` Lorenzo Pieralisi
@ 2013-07-25  0:27                       ` Nicolas Pitre
  2013-07-25 12:04                       ` Dave Martin
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2013-07-25  0:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Jul 2013, Lorenzo Pieralisi wrote:

> On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> 
> [...]
> 
> > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > + *
> > > > + * Further considerations:
> > > > + *
> > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > 
> > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > not part of ARMv7 at all.
> > 
> > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > than I do.
> > 
> > > Also, it seems that A9 isn't precisely the
> > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > not the same either.
> 
> If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> clear it; clearing the SMP bit is enough.
> 
> See, Dave has a point: there is no explicit "unified v7 TRM disable
> clean and exit coherency procedure", even though the designers' end goal is to
> have one and that's the one you wrote. The code you posted is perfectly OK on
> all v7 implementations in the kernel I am aware of; I stand to be corrected,
> but to the best of my knowledge that's the case.

OK, I'm removing the allusion to an ARMv7 TRM from the comment.

> > > This is why I preferred to treat the whole sequence as specific to a
> > > particular CPU implementation.  The similarity between A7 and A15
> > > might be viewed as a happy coincidence (it also makes life easier in
> > > big.LITTLE land).
> > 
> > Fair enough.
> 
> I disagree on the happy coincidence but the point is taken. I am not
> sure about what we should do, but I reiterate my point: the sequence as
> it stands is OK on all NS v7 implementations I am aware of. We can add
> macros to differentiate processors when we need them, but again that's
> just my opinion, as correct as it can be.

I tend to prefer that as well.

"In theory, practice and theory are equivalent, but in practice they're not"

So if in _practice_ all the ARMv7 implementations we care about are OK 
with the above, then I don't see why we couldn't call it v7_*.

Here's the portion of the patch that I just changed.  All the rest 
stayed the same.  What do you think?

--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,4 +436,38 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
 #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
 #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
 
+/*
+ * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
+ * To do so we must:
+ *
+ * - Clear the SCTLR.C bit to prevent further cache allocations
+ * - Flush the desired level of cache
+ * - Clear the ACTLR "SMP" bit to disable local coherency
+ *
+ * ... and so without any intervening memory access in between those steps,
+ * not even to the stack.
+ *
+ * WARNING -- After this has been called:
+ *
+ * - No ldrex/strex (and similar) instructions must be used.
+ * - The CPU is obviously no longer coherent with the other CPUs.
+ * - This is unlikely to work as expected if Linux is running non-secure.
+ */
+#define v7_exit_coherency_flush(level) \
+	asm volatile( \
+	"mrc	p15, 0, r0, c1, c0, 0	@ get SCTLR \n\t" \
+	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 0	@ set SCTLR \n\t" \
+	"isb	\n\t" \
+	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
+	"clrex	\n\t" \
+	"mrc	p15, 0, r0, c1, c0, 1	@ get ACTLR \n\t" \
+	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 1	@ set ACTLR \n\t" \
+	"isb	\n\t" \
+	"dsb	" \
+	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
+	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
+	      "r9","r10","r11","lr","memory" )
+
 #endif


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-23 16:33                     ` Lorenzo Pieralisi
  2013-07-25  0:27                       ` Nicolas Pitre
@ 2013-07-25 12:04                       ` Dave Martin
  2013-07-26 14:58                         ` Nicolas Pitre
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Martin @ 2013-07-25 12:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> 
> [...]
> 
> > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > + *
> > > > + * Further considerations:
> > > > + *
> > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > 
> > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > not part of ARMv7 at all.
> > 
> > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > than I do.
> > 
> > > Also, it seems that A9 isn't precisely the
> > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > not the same either.
> 
> If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> clear it; clearing the SMP bit is enough.
> 
> See, Dave has a point: there is no explicit "unified v7 TRM disable
> clean and exit coherency procedure", even though the designers' end goal is to
> have one and that's the one you wrote. The code you posted is perfectly OK on
> all v7 implementations in the kernel I am aware of; I stand to be corrected,
> but to the best of my knowledge that's the case.

What I'm really concerned about here is not Cortex-*, but the vendors
who have their own CPU implementations.  I don't think I've ever seen
a TRM for one of those.

> 
> > > This is why I preferred to treat the whole sequence as specific to a
> > > particular CPU implementation.  The similarity between A7 and A15
> > > might be viewed as a happy coincidence (it also makes life easier in
> > > big.LITTLE land).
> > 
> > Fair enough.
> 
> I disagree on the happy coincidence but the point is taken. I am not

I confess to exaggeration!  There was obviously an effort to align A15
and A7, but we know there are differences at the system level, and there
is always the possibility of b.L pairings which differ on things like
the SMP bit, or b.L pairings of CPUs not originally designed to pair but
which are "similar enough" to be pairable.

> sure about what we should do, but I reiterate my point: the sequence as
> it stands is OK on all NS v7 implementations I am aware of. We can add
> macros to differentiate processors when we need them, but again that's
> just my opinion, as correct as it can be.

I still vote for naming them a15_* a7_* or similar.

For big.LITTLE pairings, I think we should assume that the big CPU's
macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
will be true, and we know it works for A15+A7.

So we would use the a15 macro on TC2, but the a7 macro on an a7-only
platform.


The other way to abstract this stuff would be to add a new per-CPU
proc_fn, but that may be overkill, particularly since we know the whole
mcpm backend tends to be platform-specific, not just the CPU coherency
exit part.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-25 12:04                       ` Dave Martin
@ 2013-07-26 14:58                         ` Nicolas Pitre
  2013-07-26 16:00                           ` Dave Martin
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-07-26 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Jul 2013, Dave Martin wrote:

> On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > 
> > [...]
> > 
> > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > + *
> > > > > + * Further considerations:
> > > > > + *
> > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > 
> > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > not part of ARMv7 at all.
> > > 
> > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > than I do.
> > > 
> > > > Also, it seems that A9 isn't precisely the
> > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > not the same either.
> > 
> > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > clear it; clearing the SMP bit is enough.
> > 
> > See, Dave has a point: there is no explicit "unified v7 TRM disable
> > clean and exit coherency procedure", even though the designers' end goal is to
> > have one and that's the one you wrote. The code you posted is perfectly OK on
> > all v7 implementations in the kernel I am aware of; I stand to be corrected,
> > but to the best of my knowledge that's the case.
> 
> What I'm really concerned about here is not Cortex-*, but the vendors
> who have their own CPU implementations.  I don't think I've ever seen
> a TRM for one of those.

Could we wait until they materialize before worrying about possible
incompatibility issues?  After all, this is not a user space ABI that
cannot be changed afterwards.  Remember, this is Linux-internal code we
have the ability to change when needed; hence we should resist the
temptation to over-engineer.

> I confess to exaggeration!  There was obviously an effort to align A15
> and A7, but we know there are differences at the system level, and there
> is always the possibility of b.L pairings which differ on things like
> the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> which are "similar enough" to be pairable.
> 
> > sure about what we should do, but I reiterate my point: the sequence as
> > it stands is OK on all NS v7 implementations I am aware of. We can add
> > macros to differentiate processors when we need them, but again that's
> > just my opinion, as correct as it can be.
> 
> I still vote for naming them a15_* a7_* or similar.
> 
> For big.LITTLE pairings, I think we should assume that the big CPU's
> macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
> will be true, and we know it works for A15+A7.

The MCPM backend doesn't have to assume any kind of pairing.  The TC2
backend already has to test whether or not it is running on an A15 in
order to turn off L2 prefetching, which is an A15-only feature.  This is
done outside of this macro precisely to keep the macro generic to all
known (so far) v7 implementations.
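
As a rough sketch of the kind of check meant here (helper names as found
in asm/cputype.h around this time; the exact TC2 code may differ slightly):

	#include <asm/cputype.h>

	if (read_cpuid_part_number() == ARM_CPU_PART_CORTEX_A15) {
		/* the L2 prefetch control register is A15-only */
		asm volatile(
		"mcr	p15, 1, %0, c15, c0, 3	@ disable L2 prefetching \n\t"
		"isb	\n\t"
		"dsb	"
		: : "r" (0x400));
	}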


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-26 14:58                         ` Nicolas Pitre
@ 2013-07-26 16:00                           ` Dave Martin
  2013-07-26 16:24                             ` Lorenzo Pieralisi
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Martin @ 2013-07-26 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> On Thu, 25 Jul 2013, Dave Martin wrote:
> 
> > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > > 
> > > [...]
> > > 
> > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > + *
> > > > > > + * Further considerations:
> > > > > > + *
> > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > > 
> > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > > not part of ARMv7 at all.
> > > > 
> > > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > > than I do.
> > > > 
> > > > > Also, it seems that A9 isn't precisely the
> > > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > > not the same either.
> > > 
> > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > clear it; clearing the SMP bit is enough.
> > > 
> > > See, Dave has a point: there is no explicit "unified v7 TRM disable
> > > clean and exit coherency procedure", even though the designers' end goal is to
> > > have one and that's the one you wrote. The code you posted is perfectly OK on
> > > all v7 implementations in the kernel I am aware of; I stand to be corrected,
> > > but to the best of my knowledge that's the case.
> > 
> > What I'm really concerned about here is not Cortex-*, but the vendors
> > who have their own CPU implementations.  I don't think I've ever seen
> > a TRM for one of those.
> 
> Could we wait until they materialize before worrying about possible
> incompatibility issues?  After all, this is not a user space ABI that
> cannot be changed afterwards.  Remember, this is Linux-internal code we
> have the ability to change when needed; hence we should resist the
> temptation to over-engineer.

If I could think of a better name, I would suggest it ... but unless we call
it "cortexa" or something, I don't know what it should be.  I'm not sure
how much I like that either.  Strictly speaking, that's still not correct.

So I guess this comes down to being clear about which CPUs we expect it to
work for.

Right now, we know it works for A7 and A15.

It sounds like we believe it might work for A9.  Lorenzo might know about
A5, A7 etc.  But I'm not sure this exact sequence has been tested on any
of those CPUs(?)

We shouldn't expect it to work for any non-ARM Ltd CPU, but on platforms
where firmware does the power control it's not Linux's responsibility
anyhow.  (Probably the case for most of those vendors.)

> > I confess to exaggeration!  There was obviously an effort to align A15
> > and A7, but we know there are differences at the system level, and there
> > is always the possibility of b.L pairings which differ on things like
> > the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> > which are "similar enough" to be pairable.
> > 
> > > sure about what we should do, but I reiterate my point: the sequence as
> > > it stands is OK on all NS v7 implementations I am aware of. We can add
> > > macros to differentiate processors when we need them, but again that's
> > > just my opinion, as correct as it can be.
> > 
> > I still vote for naming them a15_* a7_* or similar.
> > 
> > For big.LITTLE pairings, I think we should assume that the big CPU's
> > macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
> > will be true, and we know it works for A15+A7.
> 
> The MCPM backend doesn't have to assume any kind of pairing.  The TC2
> backend already has to test whether or not it is running on an A15 in
> order to turn off L2 prefetching, which is an A15-only feature.  This is
> done outside of this macro precisely to keep the macro generic to all
> known (so far) v7 implementations.

That's true.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences
  2013-07-26 16:00                           ` Dave Martin
@ 2013-07-26 16:24                             ` Lorenzo Pieralisi
  0 siblings, 0 replies; 20+ messages in thread
From: Lorenzo Pieralisi @ 2013-07-26 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 26, 2013 at 05:00:15PM +0100, Dave Martin wrote:
> On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> > On Thu, 25 Jul 2013, Dave Martin wrote:
> > 
> > > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > > + *
> > > > > > > + * Further considerations:
> > > > > > > + *
> > > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > > > 
> > > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > > > not part of ARMv7 at all.
> > > > > 
> > > > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > > > than I do.
> > > > > 
> > > > > > Also, it seems that A9 isn't precisely the
> > > > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > > > not the same either.
> > > > 
> > > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > > clear it; clearing the SMP bit is enough.
> > > > 
> > > > See, Dave has a point: there is no explicit "unified v7 TRM disable
> > > > clean and exit coherency procedure", even though the designers' end goal is to
> > > > have one and that's the one you wrote. The code you posted is perfectly OK on
> > > > all v7 implementations in the kernel I am aware of; I stand to be corrected,
> > > > but to the best of my knowledge that's the case.
> > > 
> > > What I'm really concerned about here is not Cortex-*, but the vendors
> > > who have their own CPU implementations.  I don't think I've ever seen
> > > a TRM for one of those.
> > 
> > Could we wait until they materialize before worrying about possible
> > incompatibility issues?  After all, this is not a user space ABI that
> > cannot be changed afterwards.  Remember, this is Linux-internal code we
> > have the ability to change when needed; hence we should resist the
> > temptation to over-engineer.
> 
> If I could think of a better name, I would suggest it ... but unless we call
> it "cortexa" or something, I don't know what it should be.  I'm not sure
> how much I like that either.  Strictly speaking, that's still not correct.
> 
> So I guess this comes down to being clear about which CPUs we expect it to
> work for.
> 
> Right now, we know it works for A7 and A15.
> 
> It sounds like we believe it might work for A9.  Lorenzo might know about
> A5, A7 etc.  But I'm not sure this exact sequence has been tested on any
> of those CPUs(?)

I tested this sequence on A9 MP a while ago, even though it was obviously not
this same macro. I know for certain it has been tested on A5; again, not the
same macro but the same sequence.

On R7 honestly I can't certify.

What if we postpone the consolidation until MCPM for some A9 platforms
(e.g. Tegra) is posted (and in the meantime you leave the macro in
mach-vexpress to avoid duplication)?

Thank you,
Lorenzo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-07-26 16:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-18  3:28 [PATCH 0/4] MCPM backends updates Nicolas Pitre
2013-07-18  3:28 ` [PATCH 1/4] ARM: vexpress/dcscb: fix cache disabling sequences Nicolas Pitre
2013-07-18 15:04   ` Dave Martin
2013-07-18 17:24     ` Nicolas Pitre
2013-07-18 18:03       ` Dave Martin
2013-07-18 18:59         ` Nicolas Pitre
2013-07-19 10:28           ` Dave Martin
2013-07-19 10:59             ` Lorenzo Pieralisi
2013-07-22 17:58               ` Nicolas Pitre
2013-07-23 10:43                 ` Dave Martin
2013-07-23 12:28                   ` Nicolas Pitre
2013-07-23 16:33                     ` Lorenzo Pieralisi
2013-07-25  0:27                       ` Nicolas Pitre
2013-07-25 12:04                       ` Dave Martin
2013-07-26 14:58                         ` Nicolas Pitre
2013-07-26 16:00                           ` Dave Martin
2013-07-26 16:24                             ` Lorenzo Pieralisi
2013-07-18  3:28 ` [PATCH 2/4] drivers/platform: vexpress: add Serial Power Controller (SPC) support Nicolas Pitre
2013-07-18  3:28 ` [PATCH 3/4] ARM: vexpress/TC2: basic PM support Nicolas Pitre
2013-07-18  3:28 ` [PATCH 4/4] ARM: vexpress/TC2: implement PM suspend method Nicolas Pitre
