* [PATCH v4 00/15] multi-cluster power management
@ 2013-02-05  5:21 Nicolas Pitre
  2013-02-05  5:21 ` [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code Nicolas Pitre
                   ` (15 more replies)
  0 siblings, 16 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:21 UTC (permalink / raw)
  To: linux-arm-kernel

This is version 4 of the patch series required to safely power up
and down CPUs in a cluster, as found in b.L systems.  If nothing
major is reported, I'll send a pull request to Russell for this set
very soon.

Also included are the needed patches to allow CPU hotplug on RTSM
configured as a big.LITTLE system.  I'll hopefully send a separate pull
request for them later if the remaining issue with the CCI code can be
sorted out in time.

Please refer to http://article.gmane.org/gmane.linux.ports.arm.kernel/208625
for the initial series and particularly the cover page blurb for this work.

Thanks to those who provided review comments.

Changes from v3:

- DT binding documentation for the DCSCB.
- Patch to probe DT for DCSCB presence folded into the patch that
  introduced it.
- Patch to select SMP ops at boot time now split in two patches.
- Use all 8 bits from the MPIDR affinity fields in mcpm_head.S.
- Replaced gic_raise_softirq() with arch_send_wakeup_ipi_mask().
- Clear the SCTLR.C bit rather than calling cpu_proc_fin() to keep the
  I-cache enabled.
- Added definitions in the CCI code to clean up some magic values.

Changes from v2:

- The bL_ prefix has been changed into mcpm_ and surroundings adjusted
  accordingly.
- Documentation moved up one level in Documentation/arm/.
- Clarifications in commit log for patch #1 about future work.
- The debug macro in mcpm_head.S now displays CPU and cluster numbers.
- Patch improving mcpm_cpu_die() folded into the original patch that
  created it.
- Return -EADDRNOTAVAIL on ioremap failure.
- The auxcr patch moved down in the series to better identify dependencies.

Changes from v1:

- Pulled in Rob Herring's auxcr accessor patch and converted this series
  to it.
- Major rework of various barriers (some DSBs demoted to DMBs, etc.)
- The sync_mem() macro is now split and enhanced to properly process the
  cache for writers and readers in the cluster critical region helpers.
- BL_NR_CLUSTERS and BL_CPUS_PER_CLUSTER renamed to BL_MAX_CLUSTERS
  and BL_MAX_CPUS_PER_CLUSTER.
- Removed unused C definitions and prototypes for vlocks.
- Simplified the vlock memory allocation.
- The vlock code is GPL v2.
- Replaced MPIDR inline asm by read_cpuid_mpidr().
- Use of MPIDR_AFFINITY_LEVEL() to replace explicit shifts and masks.
- Dropped gic_cpu_if_down().
- Added a DSB before SEV and WFI.
- Fixed power_up_setup helper prototype.
- Nuked smp_wmb() in bL_set_entry_vector().
- Moved the CCI driver to drivers/bus/.
- Dependency on CONFIG_EXPERIMENTAL removed.
- Leftover garbage in Makefile removed.
- Added/clarified various comments in the assembly code.
- Some documentation typos fixed.
- Copyright notices updated to 2013.

Not yet addressed in this series:

- The CCI device tree binding descriptions.

Diffstat:

 Documentation/arm/cluster-pm-race-avoidance.txt | 498 ++++++++++++++++++
 Documentation/arm/vlocks.txt                    | 211 ++++++++
 .../devicetree/bindings/arm/rtsm-dcscb.txt      |  19 +
 arch/arm/Kconfig                                |   8 +
 arch/arm/common/Makefile                        |   1 +
 arch/arm/common/mcpm_entry.c                    | 314 +++++++++++
 arch/arm/common/mcpm_head.S                     | 219 ++++++++
 arch/arm/common/mcpm_platsmp.c                  |  86 +++
 arch/arm/common/vlock.S                         | 108 ++++
 arch/arm/common/vlock.h                         |  29 +
 arch/arm/include/asm/cp15.h                     |  14 +
 arch/arm/include/asm/mach/arch.h                |   3 +
 arch/arm/include/asm/mcpm_entry.h               | 190 +++++++
 arch/arm/kernel/setup.c                         |   5 +-
 arch/arm/mach-vexpress/Kconfig                  |   9 +
 arch/arm/mach-vexpress/Makefile                 |   1 +
 arch/arm/mach-vexpress/core.h                   |   2 +
 arch/arm/mach-vexpress/dcscb.c                  | 249 +++++++++
 arch/arm/mach-vexpress/dcscb_setup.S            |  80 +++
 arch/arm/mach-vexpress/platsmp.c                |  12 +
 arch/arm/mach-vexpress/v2m.c                    |   2 +-
 drivers/bus/Kconfig                             |   5 +
 drivers/bus/Makefile                            |   2 +
 drivers/bus/arm-cci.c                           | 125 +++++
 drivers/cpuidle/cpuidle-calxeda.c               |  14 -
 include/linux/arm-cci.h                         |  30 ++
 26 files changed, 2220 insertions(+), 16 deletions(-)


* [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
@ 2013-02-05  5:21 ` Nicolas Pitre
  2013-04-23 19:19   ` Russell King - ARM Linux
  2013-02-05  5:21 ` [PATCH v4 02/15] ARM: mcpm: introduce the CPU/cluster power API Nicolas Pitre
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:21 UTC (permalink / raw)
  To: linux-arm-kernel

CPUs in cluster based systems, such as big.LITTLE, have special needs
when entering the kernel due to a hotplug event, or when resuming from
a deep sleep mode.

Kernel entry is vectored so that multiple CPUs can enter the kernel
in parallel without serialization.

The mcpm prefix stands for "multi cluster power management", however
this is usable on single cluster systems as well.  Only the basic
structure is introduced here.  This will be extended with later patches.

To avoid making things more complex than they currently need to be,
the planned work on runtime-adjusted MPIDR-based indexing and dynamic
memory allocation for cluster states is postponed to a later cycle.
The MAX_NR_CLUSTERS and MAX_CPUS_PER_CLUSTER static definitions
should be sufficient for the systems expected to be available in the
near future.
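
For illustration only, here is a sketch of how a platform's secondary
bring-up path might use this code.  mcpm_set_entry_vector() and
mcpm_entry_point come from this patch; the boot hook name and the
secondary_startup entry point are assumptions:

	#include <asm/mcpm_entry.h>

	extern void secondary_startup(void);	/* assumed SMP entry point */

	static int hypothetical_boot_secondary(unsigned int cpu,
					       unsigned int cluster)
	{
		/* Tell the woken CPU where to jump once released from reset. */
		mcpm_set_entry_vector(cpu, cluster, secondary_startup);

		/* Platform specific: power up the CPU / deassert reset here. */

		return 0;
	}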

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/Kconfig                  |  8 ++++
 arch/arm/common/Makefile          |  1 +
 arch/arm/common/mcpm_entry.c      | 29 +++++++++++++
 arch/arm/common/mcpm_head.S       | 86 +++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/mcpm_entry.h | 35 ++++++++++++++++
 5 files changed, 159 insertions(+)
 create mode 100644 arch/arm/common/mcpm_entry.c
 create mode 100644 arch/arm/common/mcpm_head.S
 create mode 100644 arch/arm/include/asm/mcpm_entry.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b82a4..200f559c1c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1584,6 +1584,14 @@ config HAVE_ARM_TWD
 	help
 	  This option enables support for the ARM timer and watchdog unit
 
+config CLUSTER_PM
+	bool "Cluster Power Management Infrastructure"
+	depends on CPU_V7 && SMP
+	help
+	  This option provides the common power management infrastructure
+	  for (multi-)cluster based systems, such as big.LITTLE based
+	  systems.
+
 choice
 	prompt "Memory split"
 	default VMSPLIT_3G
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index e8a4e58f1b..23e85b1fae 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
 obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
 obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
 obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
+obj-$(CONFIG_CLUSTER_PM)	+= mcpm_head.o mcpm_entry.o
diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
new file mode 100644
index 0000000000..3a6d7e70fd
--- /dev/null
+++ b/arch/arm/common/mcpm_entry.c
@@ -0,0 +1,29 @@
+/*
+ * arch/arm/common/mcpm_entry.c -- entry point for multi-cluster PM
+ *
+ * Created by:  Nicolas Pitre, March 2012
+ * Copyright:   (C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+
+#include <asm/mcpm_entry.h>
+#include <asm/barrier.h>
+#include <asm/proc-fns.h>
+#include <asm/cacheflush.h>
+
+extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
+
+void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
+{
+	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
+	mcpm_entry_vectors[cluster][cpu] = val;
+	__cpuc_flush_dcache_area((void *)&mcpm_entry_vectors[cluster][cpu], 4);
+	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
+			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));
+}
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
new file mode 100644
index 0000000000..cda65f200b
--- /dev/null
+++ b/arch/arm/common/mcpm_head.S
@@ -0,0 +1,86 @@
+/*
+ * arch/arm/common/mcpm_head.S -- kernel entry point for multi-cluster PM
+ *
+ * Created by:  Nicolas Pitre, March 2012
+ * Copyright:   (C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/mcpm_entry.h>
+
+	.macro	pr_dbg	string
+#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
+	b	1901f
+1902:	.asciz	"CPU"
+1903:	.asciz	" cluster"
+1904:	.asciz	": \string"
+	.align
+1901:	adr	r0, 1902b
+	bl	printascii
+	mov	r0, r9
+	bl	printhex8
+	adr	r0, 1903b
+	bl	printascii
+	mov	r0, r10
+	bl	printhex8
+	adr	r0, 1904b
+	bl	printascii
+#endif
+	.endm
+
+	.arm
+	.align
+
+ENTRY(mcpm_entry_point)
+
+ THUMB(	adr	r12, BSYM(1f)	)
+ THUMB(	bx	r12		)
+ THUMB(	.thumb			)
+1:
+	mrc	p15, 0, r0, c0, c0, 5		@ MPIDR
+	ubfx	r9, r0, #0, #8			@ r9 = cpu
+	ubfx	r10, r0, #8, #8			@ r10 = cluster
+	mov	r3, #MAX_CPUS_PER_CLUSTER
+	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
+	cmp	r4, #(MAX_CPUS_PER_CLUSTER * MAX_NR_CLUSTERS)
+	blo	2f
+
+	/* We didn't expect this CPU.  Try to cheaply make it quiet. */
+1:	wfi
+	wfe
+	b	1b
+
+2:	pr_dbg	"kernel mcpm_entry_point\n"
+
+	/*
+	 * MMU is off so we need to get to mcpm_entry_vectors in a
+	 * position independent way.
+	 */
+	adr	r5, 3f
+	ldr	r6, [r5]
+	add	r6, r5, r6			@ r6 = mcpm_entry_vectors
+
+mcpm_entry_gated:
+	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
+	cmp	r5, #0
+	wfeeq
+	beq	mcpm_entry_gated
+	pr_dbg	"released\n"
+	bx	r5
+
+	.align	2
+
+3:	.word	mcpm_entry_vectors - .
+
+ENDPROC(mcpm_entry_point)
+
+	.bss
+	.align	5
+
+	.type	mcpm_entry_vectors, #object
+ENTRY(mcpm_entry_vectors)
+	.space	4 * MAX_NR_CLUSTERS * MAX_CPUS_PER_CLUSTER
diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
new file mode 100644
index 0000000000..cc10ebbd2e
--- /dev/null
+++ b/arch/arm/include/asm/mcpm_entry.h
@@ -0,0 +1,35 @@
+/*
+ * arch/arm/include/asm/mcpm_entry.h
+ *
+ * Created by:  Nicolas Pitre, April 2012
+ * Copyright:   (C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef MCPM_ENTRY_H
+#define MCPM_ENTRY_H
+
+#define MAX_CPUS_PER_CLUSTER	4
+#define MAX_NR_CLUSTERS		2
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Platform specific code should use this symbol to set up secondary
+ * entry location for processors to use when released from reset.
+ */
+extern void mcpm_entry_point(void);
+
+/*
+ * This is used to indicate where the given CPU from given cluster should
+ * branch once it is ready to re-enter the kernel using ptr, or NULL if it
+ * should be gated.  A gated CPU is held in a WFE loop until its vector
+ * becomes non NULL.
+ */
+void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr);
+
+#endif /* ! __ASSEMBLY__ */
+#endif
-- 
1.8.1.2


* [PATCH v4 02/15] ARM: mcpm: introduce the CPU/cluster power API
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
  2013-02-05  5:21 ` [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code Nicolas Pitre
@ 2013-02-05  5:21 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup Nicolas Pitre
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:21 UTC (permalink / raw)
  To: linux-arm-kernel

This is the basic API used to handle the powering up/down of individual
CPUs in a (multi-)cluster system.  The platform specific backend
implementation is responsible for also handling the cluster level power
when the first/last CPU in a cluster is brought up/down.
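
For illustration, a hypothetical backend might be registered as shown
below; all my_* names are invented here, and a real implementation
would also maintain a usage count so that racing power_up/power_down
calls are resolved as described in the code comments:

	#include <asm/mcpm_entry.h>

	static int my_power_up(unsigned int cpu, unsigned int cluster)
	{
		/* Platform specific: raise usage count, deassert reset. */
		return 0;
	}

	static void my_power_down(void)
	{
		/*
		 * Platform specific: drop the usage count; if the CPU is
		 * still wanted, do only the minimal teardown, otherwise
		 * disable the L1 cache and coherency, then enter WFI.
		 */
	}

	static const struct mcpm_platform_ops my_power_ops = {
		.power_up	= my_power_up,
		.power_down	= my_power_down,
	};

	static int __init my_mcpm_init(void)
	{
		return mcpm_platform_register(&my_power_ops);
	}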

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/common/mcpm_entry.c      | 88 +++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/mcpm_entry.h | 92 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+)

diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
index 3a6d7e70fd..c8c0e2113e 100644
--- a/arch/arm/common/mcpm_entry.c
+++ b/arch/arm/common/mcpm_entry.c
@@ -11,11 +11,13 @@
 
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/irqflags.h>
 
 #include <asm/mcpm_entry.h>
 #include <asm/barrier.h>
 #include <asm/proc-fns.h>
 #include <asm/cacheflush.h>
+#include <asm/idmap.h>
 
 extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
 
@@ -27,3 +29,89 @@ void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
 	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
 			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));
 }
+
+static const struct mcpm_platform_ops *platform_ops;
+
+int __init mcpm_platform_register(const struct mcpm_platform_ops *ops)
+{
+	if (platform_ops)
+		return -EBUSY;
+	platform_ops = ops;
+	return 0;
+}
+
+int mcpm_cpu_power_up(unsigned int cpu, unsigned int cluster)
+{
+	if (!platform_ops)
+		return -EUNATCH; /* try not to shadow power_up errors */
+	might_sleep();
+	return platform_ops->power_up(cpu, cluster);
+}
+
+typedef void (*phys_reset_t)(unsigned long);
+
+void mcpm_cpu_power_down(void)
+{
+	phys_reset_t phys_reset;
+
+	BUG_ON(!platform_ops);
+	BUG_ON(!irqs_disabled());
+
+	/*
+	 * Do this before calling into the power_down method,
+	 * as it might not always be safe to do afterwards.
+	 */
+	setup_mm_for_reboot();
+
+	platform_ops->power_down();
+
+	/*
+	 * It is possible for a power_up request to happen concurrently
+	 * with a power_down request for the same CPU. In this case the
+	 * power_down method might not be able to actually enter a
+	 * powered down state with the WFI instruction if the power_up
+	 * method has removed the required reset condition.  The
+	 * power_down method is then allowed to return. We must perform
+	 * a re-entry in the kernel as if the power_up method just had
+	 * deasserted reset on the CPU.
+	 *
+	 * To simplify race issues, the platform specific implementation
+	 * must accommodate the possibility of unordered calls to
+	 * power_down and power_up with a usage count. Therefore, if a
+	 * call to power_up is issued for a CPU that is not down, then
+	 * the next call to power_down must not attempt a full shutdown
+	 * but only do the minimum (normally disabling L1 cache and CPU
+	 * coherency) and return just as if a concurrent power_up request
+	 * had happened as described above.
+	 */
+
+	phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
+	phys_reset(virt_to_phys(mcpm_entry_point));
+
+	/* should never get here */
+	BUG();
+}
+
+void mcpm_cpu_suspend(u64 expected_residency)
+{
+	phys_reset_t phys_reset;
+
+	BUG_ON(!platform_ops);
+	BUG_ON(!irqs_disabled());
+
+	/* Very similar to mcpm_cpu_power_down() */
+	setup_mm_for_reboot();
+	platform_ops->suspend(expected_residency);
+	phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
+	phys_reset(virt_to_phys(mcpm_entry_point));
+	BUG();
+}
+
+int mcpm_cpu_powered_up(void)
+{
+	if (!platform_ops)
+		return -EUNATCH;
+	if (platform_ops->powered_up)
+		platform_ops->powered_up();
+	return 0;
+}
diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
index cc10ebbd2e..3286d5eb91 100644
--- a/arch/arm/include/asm/mcpm_entry.h
+++ b/arch/arm/include/asm/mcpm_entry.h
@@ -31,5 +31,97 @@ extern void mcpm_entry_point(void);
  */
 void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr);
 
+/*
+ * CPU/cluster power operations API for higher subsystems to use.
+ */
+
+/**
+ * mcpm_cpu_power_up - make given CPU in given cluster runnable
+ *
+ * @cpu: CPU number within given cluster
+ * @cluster: cluster number for the CPU
+ *
+ * The identified CPU is brought out of reset.  If the cluster was powered
+ * down then it is brought up as well, taking care not to let the other CPUs
+ * in the cluster run, and ensuring appropriate cluster setup.
+ *
+ * Caller must ensure the appropriate entry vector is initialized with
+ * mcpm_set_entry_vector() prior to calling this.
+ *
+ * This must be called in a sleepable context.  However, the implementation
+ * is strongly encouraged to return early and let the operation happen
+ * asynchronously, especially when significant delays are expected.
+ *
+ * If the operation cannot be performed then an error code is returned.
+ */
+int mcpm_cpu_power_up(unsigned int cpu, unsigned int cluster);
+
+/**
+ * mcpm_cpu_power_down - power the calling CPU down
+ *
+ * The calling CPU is powered down.
+ *
+ * If this CPU is found to be the "last man standing" in the cluster
+ * then the cluster is prepared for power-down too.
+ *
+ * This must be called with interrupts disabled.
+ *
+ * This does not return.  Re-entry in the kernel is expected via
+ * mcpm_entry_point.
+ */
+void mcpm_cpu_power_down(void);
+
+/**
+ * mcpm_cpu_suspend - bring the calling CPU into a suspended state
+ *
+ * @expected_residency: duration in microseconds the CPU is expected
+ *			to remain suspended, or 0 if unknown/infinity.
+ *
+ * The calling CPU is suspended.  The expected residency argument is used
+ * as a hint by the platform specific backend to implement the appropriate
+ * sleep state level according to the knowledge it has on wake-up latency
+ * for the given hardware.
+ *
+ * If this CPU is found to be the "last man standing" in the cluster
+ * then the cluster may be prepared for power-down too, if the expected
+ * residency makes it worthwhile.
+ *
+ * This must be called with interrupts disabled.
+ *
+ * This does not return.  Re-entry in the kernel is expected via
+ * mcpm_entry_point.
+ */
+void mcpm_cpu_suspend(u64 expected_residency);
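+
+/*
+ * Illustration only, not part of this patch: a cpuidle driver might
+ * combine the above with the entry vector API, assuming the generic
+ * cpu_resume entry point and cpu/cluster numbers taken from MPIDR:
+ *
+ *	mcpm_set_entry_vector(cpu, cluster, cpu_resume);
+ *	mcpm_cpu_suspend(0);		(0 = unknown/infinite residency)
+ */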
+
+/**
+ * mcpm_cpu_powered_up - housekeeping work after a CPU has been powered up
+ *
+ * This lets the platform specific backend code perform needed housekeeping
+ * work.  This must be called by the newly activated CPU as soon as it is
+ * fully operational in kernel space, before it enables interrupts.
+ *
+ * If the operation cannot be performed then an error code is returned.
+ */
+int mcpm_cpu_powered_up(void);
+
+/*
+ * Platform specific methods used in the implementation of the above API.
+ */
+struct mcpm_platform_ops {
+	int (*power_up)(unsigned int cpu, unsigned int cluster);
+	void (*power_down)(void);
+	void (*suspend)(u64);
+	void (*powered_up)(void);
+};
+
+/**
+ * mcpm_platform_register - register platform specific power methods
+ *
+ * @ops: mcpm_platform_ops structure to register
+ *
+ * An error is returned if the registration has been done previously.
+ */
+int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
+
 #endif /* ! __ASSEMBLY__ */
 #endif
-- 
1.8.1.2


* [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
  2013-02-05  5:21 ` [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code Nicolas Pitre
  2013-02-05  5:21 ` [PATCH v4 02/15] ARM: mcpm: introduce the CPU/cluster power API Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-04-05 23:00   ` Olof Johansson
  2013-02-05  5:22 ` [PATCH v4 04/15] ARM: mcpm: Add baremetal voting mutexes Nicolas Pitre
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Dave Martin <dave.martin@linaro.org>

This provides helper methods to coordinate between CPUs coming down
and CPUs going up, as well as documentation of the algorithms used, so
that cluster teardown and setup operations are not performed on a
cluster simultaneously.

For use in the power_down() implementation:
  * __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
  * __mcpm_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
  * __mcpm_outbound_leave_critical(unsigned int cluster, int state)
  * __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)

The power_up_setup() helper should do platform-specific setup in
preparation for turning the CPU on, such as invalidating local caches
or entering coherency.  It must be assembler for now, since it must
run before the MMU can be switched on.  It is passed the affinity level
which should be initialized.

Because the mcpm_sync_struct content is looked up and modified
with the cache enabled or disabled depending on the code path, it is
crucial to always ensure proper cache maintenance to update main memory
right away.  Therefore, any cached write must be followed by a cache
clean operation and any cached read must be preceded by a cache
invalidate operation (actually a cache flush, i.e. clean+invalidate, to
avoid discarding possible concurrent writes) on the accessed memory.

Also, to prevent a cached writer from interfering with an adjacent
non-cached writer, each state variable is placed in a separate cache
line.
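
For illustration, a platform's power_down() method might use these
helpers roughly as follows.  This is only a sketch: the last-man
check and the actual teardown steps are platform-specific placeholders:

	static void my_power_down(void)
	{
		unsigned int mpidr = read_cpuid_mpidr();
		unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
		unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
		bool last_man = my_last_man_check(cluster);	/* hypothetical */

		__mcpm_cpu_going_down(cpu, cluster);

		if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
			/* Clean caches, exit cluster coherency... */
			__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
		} else {
			/* CPU-level teardown only. */
		}

		__mcpm_cpu_down(cpu, cluster);
		wfi();
	}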

Thanks to Nicolas Pitre and Achin Gupta for the help with this
patch.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/arm/cluster-pm-race-avoidance.txt | 498 ++++++++++++++++++++++++
 arch/arm/common/mcpm_entry.c                    | 197 ++++++++++
 arch/arm/common/mcpm_head.S                     | 106 ++++-
 arch/arm/include/asm/mcpm_entry.h               |  63 +++
 4 files changed, 862 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/arm/cluster-pm-race-avoidance.txt

diff --git a/Documentation/arm/cluster-pm-race-avoidance.txt b/Documentation/arm/cluster-pm-race-avoidance.txt
new file mode 100644
index 0000000000..750b6fc24a
--- /dev/null
+++ b/Documentation/arm/cluster-pm-race-avoidance.txt
@@ -0,0 +1,498 @@
+Cluster-wide Power-up/power-down race avoidance algorithm
+=========================================================
+
+This file documents the algorithm which is used to coordinate CPU and
+cluster setup and teardown operations and to manage hardware coherency
+controls safely.
+
+The section "Rationale" explains what the algorithm is for and why it is
+needed.  "Basic model" explains general concepts using a simplified view
+of the system.  The other sections explain the actual details of the
+algorithm in use.
+
+
+Rationale
+---------
+
+In a system containing multiple CPUs, it is desirable to have the
+ability to turn off individual CPUs when the system is idle, reducing
+power consumption and thermal dissipation.
+
+In a system containing multiple clusters of CPUs, it is also desirable
+to have the ability to turn off entire clusters.
+
+Turning entire clusters off and on is a risky business, because it
+involves performing potentially destructive operations affecting a group
+of independently running CPUs, while the OS continues to run.  This
+means that we need some coordination in order to ensure that critical
+cluster-level operations are only performed when it is truly safe to do
+so.
+
+Simple locking may not be sufficient to solve this problem, because
+mechanisms like Linux spinlocks may rely on coherency mechanisms which
+are not immediately enabled when a cluster powers up.  Since enabling or
+disabling those mechanisms may itself be a non-atomic operation (such as
+writing some hardware registers and invalidating large caches), other
+methods of coordination are required in order to guarantee safe
+power-down and power-up at the cluster level.
+
+The mechanism presented in this document describes a coherent memory
+based protocol for performing the needed coordination.  It aims to be as
+lightweight as possible, while providing the required safety properties.
+
+
+Basic model
+-----------
+
+Each cluster and CPU is assigned a state, as follows:
+
+	DOWN
+	COMING_UP
+	UP
+	GOING_DOWN
+
+	    +---------> UP ----------+
+	    |                        v
+
+	COMING_UP                GOING_DOWN
+
+	    ^                        |
+	    +--------- DOWN <--------+
+
+
+DOWN:	The CPU or cluster is not coherent, and is either powered off or
+	suspended, or is ready to be powered off or suspended.
+
+COMING_UP: The CPU or cluster has committed to moving to the UP state.
+	It may be part way through the process of initialisation and
+	enabling coherency.
+
+UP:	The CPU or cluster is active and coherent at the hardware
+	level.  A CPU in this state is not necessarily being used
+	actively by the kernel.
+
+GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
+	state.  It may be part way through the process of teardown and
+	coherency exit.
+
+
+Each CPU has one of these states assigned to it at any point in time.
+The CPU states are described in the "CPU state" section, below.
+
+Each cluster is also assigned a state, but it is necessary to split the
+state value into two parts (the "cluster" state and "inbound" state) and
+to introduce additional states in order to avoid races between different
+CPUs in the cluster simultaneously modifying the state.  The cluster-
+level states are described in the "Cluster state" section.
+
+To help distinguish the CPU states from cluster states in this
+discussion, the state names are given a CPU_ prefix for the CPU states,
+and a CLUSTER_ or INBOUND_ prefix for the cluster states.
+
+
+CPU state
+---------
+
+In this algorithm, each individual core in a multi-core processor is
+referred to as a "CPU".  CPUs are assumed to be single-threaded:
+therefore, a CPU can only be doing one thing at a single point in time.
+
+This means that CPUs fit the basic model closely.
+
+The algorithm defines the following states for each CPU in the system:
+
+	CPU_DOWN
+	CPU_COMING_UP
+	CPU_UP
+	CPU_GOING_DOWN
+
+	 cluster setup and
+	CPU setup complete          policy decision
+	      +-----------> CPU_UP ------------+
+	      |                                v
+
+	CPU_COMING_UP                   CPU_GOING_DOWN
+
+	      ^                                |
+	      +----------- CPU_DOWN <----------+
+	 policy decision           CPU teardown complete
+	or hardware event
+
+
+The definitions of the four states correspond closely to the states of
+the basic model.
+
+Transitions between states occur as follows.
+
+A trigger event (spontaneous) means that the CPU can transition to the
+next state as a result of making local progress only, with no
+requirement for any external event to happen.
+
+
+CPU_DOWN:
+
+	A CPU reaches the CPU_DOWN state when it is ready for
+	power-down.  On reaching this state, the CPU will typically
+	power itself down or suspend itself, via a WFI instruction or a
+	firmware call.
+
+	Next state:	CPU_COMING_UP
+	Conditions:	none
+
+	Trigger events:
+
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CPU_COMING_UP:
+
+	A CPU cannot start participating in hardware coherency until the
+	cluster is set up and coherent.  If the cluster is not ready,
+	then the CPU will wait in the CPU_COMING_UP state until the
+	cluster has been set up.
+
+	Next state:	CPU_UP
+	Conditions:	The CPU's parent cluster must be in CLUSTER_UP.
+	Trigger events:	Transition of the parent cluster to CLUSTER_UP.
+
+	Refer to the "Cluster state" section for a description of the
+	CLUSTER_UP state.
+
+
+CPU_UP:
+	When a CPU reaches the CPU_UP state, it is safe for the CPU to
+	start participating in local coherency.
+
+	This is done by jumping to the kernel's CPU resume code.
+
+	Note that the definition of this state is slightly different
+	from the basic model definition: CPU_UP does not mean that the
+	CPU is coherent yet, but it does mean that it is safe to resume
+	the kernel.  The kernel handles the rest of the resume
+	procedure, so the remaining steps are not visible as part of the
+	race avoidance algorithm.
+
+	The CPU remains in this state until an explicit policy decision
+	is made to shut down or suspend the CPU.
+
+	Next state:	CPU_GOING_DOWN
+	Conditions:	none
+	Trigger events:	explicit policy decision
+
+
+CPU_GOING_DOWN:
+
+	While in this state, the CPU exits coherency, including any
+	operations required to achieve this (such as cleaning data
+	caches).
+
+	Next state:	CPU_DOWN
+	Conditions:	local CPU teardown complete
+	Trigger events:	(spontaneous)
+
+
+Cluster state
+-------------
+
+A cluster is a group of connected CPUs with some common resources.
+Because a cluster contains multiple CPUs, it can be doing multiple
+things@the same time.  This has some implications.  In particular, a
+CPU can start up while another CPU is tearing the cluster down.
+
+In this discussion, the "outbound side" is the view of the cluster state
+as seen by a CPU tearing the cluster down.  The "inbound side" is the
+view of the cluster state as seen by a CPU setting the cluster up.
+
+In order to enable safe coordination in such situations, it is important
+that a CPU which is setting up the cluster can advertise its state
+independently of the CPU which is tearing down the cluster.  For this
+reason, the cluster state is split into two parts:
+
+	"cluster" state: The global state of the cluster; or the state
+		on the outbound side:
+
+		CLUSTER_DOWN
+		CLUSTER_UP
+		CLUSTER_GOING_DOWN
+
+	"inbound" state: The state of the cluster on the inbound side.
+
+		INBOUND_NOT_COMING_UP
+		INBOUND_COMING_UP
+
+
+	The different pairings of these states result in six possible
+	states for the cluster as a whole:
+
+	                            CLUSTER_UP
+	          +==========> INBOUND_NOT_COMING_UP -------------+
+	          #                                               |
+	                                                          |
+	     CLUSTER_UP     <----+                                |
+	  INBOUND_COMING_UP      |                                v
+
+	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
+	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
+
+	    CLUSTER_DOWN         |                                |
+	  INBOUND_COMING_UP <----+                                |
+	                                                          |
+	          ^                                               |
+	          +===========     CLUSTER_DOWN      <------------+
+	                       INBOUND_NOT_COMING_UP
+
+	Transitions -----> can only be made by the outbound CPU, and
+	only involve changes to the "cluster" state.
+
+	Transitions ===##> can only be made by the inbound CPU, and only
+	involve changes to the "inbound" state, except where there is no
+	further transition possible on the outbound side (i.e., the
+	outbound CPU has put the cluster into the CLUSTER_DOWN state).
+
+	The race avoidance algorithm does not provide a way to determine
+	which exact CPUs within the cluster play these roles.  This must
+	be decided in advance by some other means.  Refer to the section
+	"Last man and first man selection" for more explanation.
+
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
+	cluster can actually be powered down.
+
+	The parallelism of the inbound and outbound CPUs is reflected in
+	the existence of two different paths from CLUSTER_GOING_DOWN/
+	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
+	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
+	COMING_UP in the basic model).  The second path avoids cluster
+	teardown completely.
+
+	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
+	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
+	is trivial and merely resets the state machine ready for the
+	next cycle.
+
+	Details of the allowable transitions follow.
+
+	The next state in each case is notated
+
+		<cluster state>/<inbound state> (<transitioner>)
+
+	where the <transitioner> is the side on which the transition
+	can occur; either the inbound or the outbound side.
+
+
+CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
+
+	Next state:	CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
+	Conditions:	none
+	Trigger events:
+
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CLUSTER_DOWN/INBOUND_COMING_UP:
+
+	In this state, an inbound CPU sets up the cluster, including
+	enabling of hardware coherency at the cluster level and any
+	other operations (such as cache invalidation) which are required
+	in order to achieve this.
+
+	The purpose of this state is to do sufficient cluster-level
+	setup to enable other CPUs in the cluster to enter coherency
+	safely.
+
+	Next state:	CLUSTER_UP/INBOUND_COMING_UP (inbound)
+	Conditions:	cluster-level setup and hardware coherency complete
+	Trigger events:	(spontaneous)
+
+
+CLUSTER_UP/INBOUND_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	This is a transient state, leading immediately to
+	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
+	should treat these two states as equivalent.
+
+	Next state:	CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
+	Conditions:	none
+	Trigger events:	(spontaneous)
+
+
+CLUSTER_UP/INBOUND_NOT_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	The cluster will remain in this state until a policy decision is
+	made to power the cluster down.
+
+	Next state:	CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
+	Conditions:	none
+	Trigger events:	policy decision to power down the cluster
+
+
+CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
+
+	An outbound CPU is tearing the cluster down.  The selected CPU
+	must wait in this state until all CPUs in the cluster are in the
+	CPU_DOWN state.
+
+	When all CPUs are in the CPU_DOWN state, the cluster can be torn
+	down, for example by cleaning data caches and exiting
+	cluster-level coherency.
+
+	To avoid unnecessary teardown operations, the outbound CPU
+	should check the inbound cluster state for asynchronous
+	transitions to INBOUND_COMING_UP.  Alternatively, individual
+	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
+
+
+	Next states:
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
+		Conditions:	cluster torn down and ready to power off
+		Trigger events:	(spontaneous)
+
+	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
+		Conditions:	none
+		Trigger events:
+
+			a) an explicit hardware power-up operation,
+			   resulting from a policy decision on another
+			   CPU;
+
+			b) a hardware event, such as an interrupt.
+
+
+CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
+
+	The cluster is (or was) being torn down, but another CPU has
+	come online in the meantime and is trying to set up the cluster
+	again.
+
+	If the outbound CPU observes this state, it has two choices:
+
+		a) back out of teardown, restoring the cluster to the
+		   CLUSTER_UP state;
+
+		b) finish tearing the cluster down and put the cluster
+		   in the CLUSTER_DOWN state; the inbound CPU will
+		   set up the cluster again from there.
+
+	Choice (a) avoids some latency by skipping unnecessary teardown
+	and setup operations in situations where the cluster is not
+	really going to be powered down.
+
+
+	Next states:
+
+	CLUSTER_UP/INBOUND_COMING_UP (outbound)
+		Conditions:	cluster-level setup and hardware
+				coherency complete
+		Trigger events:	(spontaneous)
+
+	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
+		Conditions:	cluster torn down and ready to power off
+		Trigger events:	(spontaneous)
+
+
+Last man and First man selection
+--------------------------------
+
+The CPU which performs cluster tear-down operations on the outbound side
+is commonly referred to as the "last man".
+
+The CPU which performs cluster setup on the inbound side is commonly
+referred to as the "first man".
+
+The race avoidance algorithm documented above does not provide a
+mechanism to choose which CPUs should play these roles.
+
+
+Last man:
+
+When shutting down the cluster, all the CPUs involved are initially
+executing Linux and hence coherent.  Therefore, ordinary spinlocks can
+be used to select a last man safely, before the CPUs become
+non-coherent.
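+
+For illustration, a hypothetical selection scheme (all names invented
+here) could decrement a use count under an ordinary spinlock:
+
+	static DEFINE_SPINLOCK(pm_lock);
+	static int cluster_use_count[MAX_NR_CLUSTERS];
+
+	static bool select_last_man(unsigned int cluster)
+	{
+		bool last_man;
+
+		spin_lock(&pm_lock);
+		last_man = (--cluster_use_count[cluster] == 0);
+		spin_unlock(&pm_lock);
+		return last_man;	/* true: this CPU tears the cluster down */
+	}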
+
+
+First man:
+
+Because CPUs may power up asynchronously in response to external wake-up
+events, a dynamic mechanism is needed to make sure that only one CPU
+attempts to play the first man role and do the cluster-level
+initialisation: any other CPUs must wait for this to complete before
+proceeding.
+
+Cluster-level initialisation may involve actions such as configuring
+coherency controls in the bus fabric.
+
+The current implementation in mcpm_head.S uses a separate mutual exclusion
+mechanism to do this arbitration.  This mechanism is documented in
+detail in vlocks.txt.
+
+
+Features and Limitations
+------------------------
+
+Implementation:
+
+	The current ARM-based implementation is split between
+	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
+	arch/arm/common/mcpm_entry.c (everything else):
+
+	__mcpm_cpu_going_down() signals the transition of a CPU to the
+		CPU_GOING_DOWN state.
+
+	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
+		state.
+
+	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
+		low-level power-up code in mcpm_head.S.  This could
+		involve CPU-specific setup code, but in the current
+		implementation it does not.
+
+	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
+		handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
+		and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
+		the case of an aborted cluster power-down).
+
+		These functions are more complex than the __mcpm_cpu_*()
+		functions due to the extra inter-CPU coordination which
+		is needed for safe transitions at the cluster level.
+
+	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
+		the low-level power-up code in mcpm_head.S.  This
+		typically involves platform-specific setup code,
+		provided by the platform-specific power_up_setup
+		function registered via mcpm_sync_init.
+
+Deep topologies:
+
+	As currently described and implemented, the algorithm does not
+	support CPU topologies involving more than two levels (i.e.,
+	clusters of clusters are not supported).  The algorithm could be
+	extended by replicating the cluster-level states for the
+	additional topological levels, and modifying the transition
+	rules for the intermediate (non-outermost) cluster levels.
+
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, in
+collaboration with Nicolas Pitre and Achin Gupta.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
index c8c0e2113e..2b83121966 100644
--- a/arch/arm/common/mcpm_entry.c
+++ b/arch/arm/common/mcpm_entry.c
@@ -18,6 +18,7 @@
 #include <asm/proc-fns.h>
 #include <asm/cacheflush.h>
 #include <asm/idmap.h>
+#include <asm/cputype.h>
 
 extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
 
@@ -115,3 +116,199 @@ int mcpm_cpu_powered_up(void)
 		platform_ops->powered_up();
 	return 0;
 }
+
+struct sync_struct mcpm_sync;
+
+/*
+ * There is no __cpuc_clean_dcache_area, but we use the name anyway
+ * for code intent clarity, and alias it to __cpuc_flush_dcache_area.
+ */
+#define __cpuc_clean_dcache_area __cpuc_flush_dcache_area
+
+/*
+ * Ensure preceding writes to *p by this CPU are visible to
+ * subsequent reads by other CPUs:
+ */
+static void __sync_range_w(volatile void *p, size_t size)
+{
+	char *_p = (char *)p;
+
+	__cpuc_clean_dcache_area(_p, size);
+	outer_clean_range(__pa(_p), __pa(_p + size));
+}
+
+/*
+ * Ensure preceding writes to *p by other CPUs are visible to
+ * subsequent reads by this CPU.  We must be careful not to
+ * discard data simultaneously written by another CPU, hence the
+ * usage of flush rather than invalidate operations.
+ */
+static void __sync_range_r(volatile void *p, size_t size)
+{
+	char *_p = (char *)p;
+
+#ifdef CONFIG_OUTER_CACHE
+	if (outer_cache.flush_range) {
+		/*
+		 * Ensure dirty data migrated from other CPUs into our cache
+		 * are cleaned out safely before the outer cache is cleaned:
+		 */
+		__cpuc_clean_dcache_area(_p, size);
+
+		/* Clean and invalidate stale data for *p from outer ... */
+		outer_flush_range(__pa(_p), __pa(_p + size));
+	}
+#endif
+
+	/* ... and inner cache: */
+	__cpuc_flush_dcache_area(_p, size);
+}
+
+#define sync_w(ptr) __sync_range_w(ptr, sizeof *(ptr))
+#define sync_r(ptr) __sync_range_r(ptr, sizeof *(ptr))
+
+/*
+ * __mcpm_cpu_going_down: Indicates that the cpu is being torn down.
+ *    This must be called at the point of committing to teardown of a CPU.
+ *    The CPU cache (SCTLR.C bit) is expected to still be active.
+ */
+void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
+{
+	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_GOING_DOWN;
+	sync_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
+}
+
+/*
+ * __mcpm_cpu_down: Indicates that cpu teardown is complete and that the
+ *    cluster can be torn down without disrupting this CPU.
+ *    To avoid deadlocks, this must be called before a CPU is powered down.
+ *    The CPU cache (SCTLR.C bit) is expected to be off.
+ */
+void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)
+{
+	dmb();
+	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_DOWN;
+	sync_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
+	dsb_sev();
+}
+
+/*
+ * __mcpm_outbound_leave_critical: Leave the cluster teardown critical section.
+ * @state: the final state of the cluster:
+ *     CLUSTER_UP: no destructive teardown was done and the cluster has been
+ *         restored to the previous state (CPU cache still active); or
+ *     CLUSTER_DOWN: the cluster has been torn-down, ready for power-off
+ *         (CPU cache disabled).
+ */
+void __mcpm_outbound_leave_critical(unsigned int cluster, int state)
+{
+	dmb();
+	mcpm_sync.clusters[cluster].cluster = state;
+	sync_w(&mcpm_sync.clusters[cluster].cluster);
+	dsb_sev();
+}
+
+/*
+ * __mcpm_outbound_enter_critical: Enter the cluster teardown critical section.
+ * This function should be called by the last man, after local CPU teardown
+ * is complete.  CPU cache expected to be active.
+ *
+ * Returns:
+ *     false: the critical section was not entered because an inbound CPU was
+ *         observed, or the cluster is already being set up;
+ *     true: the critical section was entered: it is now safe to tear down the
+ *         cluster.
+ */
+bool __mcpm_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
+{
+	unsigned int i;
+	struct mcpm_sync_struct *c = &mcpm_sync.clusters[cluster];
+
+	/* Warn inbound CPUs that the cluster is being torn down: */
+	c->cluster = CLUSTER_GOING_DOWN;
+	sync_w(&c->cluster);
+
+	/* Back out if the inbound cluster is already in the critical region: */
+	sync_r(&c->inbound);
+	if (c->inbound == INBOUND_COMING_UP)
+		goto abort;
+
+	/*
+	 * Wait for all CPUs to get out of the GOING_DOWN state, so that local
+	 * teardown is complete on each CPU before tearing down the cluster.
+	 *
+	 * If any CPU has been woken up again from the DOWN state, then we
+	 * shouldn't be taking the cluster down at all: abort in that case.
+	 */
+	sync_r(&c->cpus);
+	for (i = 0; i < MAX_CPUS_PER_CLUSTER; i++) {
+		int cpustate;
+
+		if (i == cpu)
+			continue;
+
+		while (1) {
+			cpustate = c->cpus[i].cpu;
+			if (cpustate != CPU_GOING_DOWN)
+				break;
+
+			wfe();
+			sync_r(&c->cpus[i].cpu);
+		}
+
+		switch (cpustate) {
+		case CPU_DOWN:
+			continue;
+
+		default:
+			goto abort;
+		}
+	}
+
+	return true;
+
+abort:
+	__mcpm_outbound_leave_critical(cluster, CLUSTER_UP);
+	return false;
+}
+
+int __mcpm_mcpm_state(unsigned int cluster)
+{
+	sync_r(&mcpm_sync.clusters[cluster].cluster);
+	return mcpm_sync.clusters[cluster].cluster;
+}
+
+extern unsigned long mcpm_power_up_setup_phys;
+
+int __init mcpm_sync_init(
+	void (*power_up_setup)(unsigned int affinity_level))
+{
+	unsigned int i, j, mpidr, this_cluster;
+
+	BUILD_BUG_ON(MCPM_SYNC_CLUSTER_SIZE * MAX_NR_CLUSTERS != sizeof mcpm_sync);
+	BUG_ON((unsigned long)&mcpm_sync & (__CACHE_WRITEBACK_GRANULE - 1));
+
+	/*
+	 * Set initial CPU and cluster states.
+	 * Only one cluster is assumed to be active at this point.
+	 */
+	for (i = 0; i < MAX_NR_CLUSTERS; i++) {
+		mcpm_sync.clusters[i].cluster = CLUSTER_DOWN;
+		mcpm_sync.clusters[i].inbound = INBOUND_NOT_COMING_UP;
+		for (j = 0; j < MAX_CPUS_PER_CLUSTER; j++)
+			mcpm_sync.clusters[i].cpus[j].cpu = CPU_DOWN;
+	}
+	mpidr = read_cpuid_mpidr();
+	this_cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+	for_each_online_cpu(i)
+		mcpm_sync.clusters[this_cluster].cpus[i].cpu = CPU_UP;
+	mcpm_sync.clusters[this_cluster].cluster = CLUSTER_UP;
+	sync_w(&mcpm_sync);
+
+	if (power_up_setup) {
+		mcpm_power_up_setup_phys = virt_to_phys(power_up_setup);
+		sync_w(&mcpm_power_up_setup_phys);
+	}
+
+	return 0;
+}
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
index cda65f200b..86650d4e94 100644
--- a/arch/arm/common/mcpm_head.S
+++ b/arch/arm/common/mcpm_head.S
@@ -7,11 +7,19 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
+ *
+ *
+ * Refer to Documentation/arm/cluster-pm-race-avoidance.txt
+ * for details of the synchronisation algorithms used here.
  */
 
 #include <linux/linkage.h>
 #include <asm/mcpm_entry.h>
 
+.if MCPM_SYNC_CLUSTER_CPUS
+.error "cpus must be the first member of struct mcpm_sync_struct"
+.endif
+
 	.macro	pr_dbg	string
 #if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
 	b	1901f
@@ -57,24 +65,114 @@ ENTRY(mcpm_entry_point)
 2:	pr_dbg	"kernel mcpm_entry_point\n"
 
 	/*
-	 * MMU is off so we need to get to mcpm_entry_vectors in a
+	 * MMU is off so we need to get to various variables in a
 	 * position independent way.
 	 */
 	adr	r5, 3f
-	ldr	r6, [r5]
+	ldmia	r5, {r6, r7, r8}
 	add	r6, r5, r6			@ r6 = mcpm_entry_vectors
+	ldr	r7, [r5, r7]			@ r7 = mcpm_power_up_setup_phys
+	add	r8, r5, r8			@ r8 = mcpm_sync
+
+	mov	r0, #MCPM_SYNC_CLUSTER_SIZE
+	mla	r8, r0, r10, r8			@ r8 = sync cluster base
+
+	@ Signal that this CPU is coming UP:
+	mov	r0, #CPU_COMING_UP
+	mov	r5, #MCPM_SYNC_CPU_SIZE
+	mla	r5, r9, r5, r8			@ r5 = sync cpu address
+	strb	r0, [r5]
+
+	@ At this point, the cluster cannot unexpectedly enter the GOING_DOWN
+	@ state, because there is at least one active CPU (this CPU).
+
+	@ Note: the following is racy as another CPU might be testing
+	@ the same flag at the same moment.  That'll be fixed later.
+	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
+	cmp	r0, #CLUSTER_UP			@ cluster already up?
+	bne	mcpm_setup			@ if not, set up the cluster
+
+	@ Otherwise, skip setup:
+	b	mcpm_setup_complete
+
+mcpm_setup:
+	@ Control dependency implies strb not observable before previous ldrb.
+
+	@ Signal that the cluster is being brought up:
+	mov	r0, #INBOUND_COMING_UP
+	strb	r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
+	dmb
+
+	@ Any CPU trying to take the cluster into CLUSTER_GOING_DOWN from this
+	@ point onwards will observe INBOUND_COMING_UP and abort.
+
+	@ Wait for any previously-pending cluster teardown operations to abort
+	@ or complete:
+mcpm_teardown_wait:
+	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
+	cmp	r0, #CLUSTER_GOING_DOWN
+	bne	first_man_setup
+	wfe
+	b	mcpm_teardown_wait
+
+first_man_setup:
+	dmb
+
+	@ If the outbound gave up before teardown started, skip cluster setup:
+
+	cmp	r0, #CLUSTER_UP
+	beq	mcpm_setup_leave
+
+	@ power_up_setup is now responsible for setting up the cluster:
+
+	cmp	r7, #0
+	mov	r0, #1		@ second (cluster) affinity level
+	blxne	r7		@ Call power_up_setup if defined
+	dmb
+
+	mov	r0, #CLUSTER_UP
+	strb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
+	dmb
+
+mcpm_setup_leave:
+	@ Leave the cluster setup critical section:
+
+	mov	r0, #INBOUND_NOT_COMING_UP
+	strb	r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
+	dsb
+	sev
+
+mcpm_setup_complete:
+	@ If a platform-specific CPU setup hook is needed, it is
+	@ called from here.
+
+	cmp	r7, #0
+	mov	r0, #0		@ first (CPU) affinity level
+	blxne	r7		@ Call power_up_setup if defined
+	dmb
+
+	@ Mark the CPU as up:
+
+	mov	r0, #CPU_UP
+	strb	r0, [r5]
+
+	@ Observability order of CPU_UP and opening of the gate does not matter.
 
 mcpm_entry_gated:
 	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
 	cmp	r5, #0
 	wfeeq
 	beq	mcpm_entry_gated
+	dmb
+
 	pr_dbg	"released\n"
 	bx	r5
 
 	.align	2
 
 3:	.word	mcpm_entry_vectors - .
+	.word	mcpm_power_up_setup_phys - 3b
+	.word	mcpm_sync - 3b
 
 ENDPROC(mcpm_entry_point)
 
@@ -84,3 +182,7 @@ ENDPROC(mcpm_entry_point)
 	.type	mcpm_entry_vectors, #object
 ENTRY(mcpm_entry_vectors)
 	.space	4 * MAX_NR_CLUSTERS * MAX_CPUS_PER_CLUSTER
+
+	.type	mcpm_power_up_setup_phys, #object
+ENTRY(mcpm_power_up_setup_phys)
+	.space  4		@ set by mcpm_sync_init()
diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
index 3286d5eb91..e76652209d 100644
--- a/arch/arm/include/asm/mcpm_entry.h
+++ b/arch/arm/include/asm/mcpm_entry.h
@@ -15,8 +15,37 @@
 #define MAX_CPUS_PER_CLUSTER	4
 #define MAX_NR_CLUSTERS		2
 
+/* Definitions for mcpm_sync_struct */
+#define CPU_DOWN		0x11
+#define CPU_COMING_UP		0x12
+#define CPU_UP			0x13
+#define CPU_GOING_DOWN		0x14
+
+#define CLUSTER_DOWN		0x21
+#define CLUSTER_UP		0x22
+#define CLUSTER_GOING_DOWN	0x23
+
+#define INBOUND_NOT_COMING_UP	0x31
+#define INBOUND_COMING_UP	0x32
+
+/* This is a complete guess. */
+#define __CACHE_WRITEBACK_ORDER	6
+#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
+
+/* Offsets for the mcpm_sync_struct members, for use in asm: */
+#define MCPM_SYNC_CLUSTER_CPUS	0
+#define MCPM_SYNC_CPU_SIZE	__CACHE_WRITEBACK_GRANULE
+#define MCPM_SYNC_CLUSTER_CLUSTER \
+	(MCPM_SYNC_CLUSTER_CPUS + MCPM_SYNC_CPU_SIZE * MAX_CPUS_PER_CLUSTER)
+#define MCPM_SYNC_CLUSTER_INBOUND \
+	(MCPM_SYNC_CLUSTER_CLUSTER + __CACHE_WRITEBACK_GRANULE)
+#define MCPM_SYNC_CLUSTER_SIZE \
+	(MCPM_SYNC_CLUSTER_INBOUND + __CACHE_WRITEBACK_GRANULE)
+
 #ifndef __ASSEMBLY__
 
+#include <linux/types.h>
+
 /*
  * Platform specific code should use this symbol to set up secondary
  * entry location for processors to use when released from reset.
@@ -123,5 +152,39 @@ struct mcpm_platform_ops {
  */
 int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
 
+/* Synchronisation structures for coordinating safe cluster setup/teardown: */
+
+/*
+ * When modifying this structure, make sure you update the MCPM_SYNC_ defines
+ * to match.
+ */
+struct mcpm_sync_struct {
+	/* individual CPU states */
+	struct {
+		volatile s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
+	} cpus[MAX_CPUS_PER_CLUSTER];
+
+	/* cluster state */
+	volatile s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);
+
+	/* inbound-side state */
+	volatile s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
+};
+
+struct sync_struct {
+	struct mcpm_sync_struct clusters[MAX_NR_CLUSTERS];
+};
+
+extern unsigned long sync_phys;	/* physical address of *mcpm_sync */
+
+void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster);
+void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster);
+void __mcpm_outbound_leave_critical(unsigned int cluster, int state);
+bool __mcpm_outbound_enter_critical(unsigned int this_cpu, unsigned int cluster);
+int __mcpm_mcpm_state(unsigned int cluster);
+
+int __init mcpm_sync_init(
+	void (*power_up_setup)(unsigned int affinity_level));
+
 #endif /* ! __ASSEMBLY__ */
 #endif
-- 
1.8.1.2


* [PATCH v4 04/15] ARM: mcpm: Add baremetal voting mutexes
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (2 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 05/15] ARM: mcpm_head.S: vlock-based first man election Nicolas Pitre
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Dave Martin <dave.martin@linaro.org>

This patch adds a simple low-level voting mutex implementation
to be used to arbitrate during first man selection when no load/store
exclusive instructions are usable.

For want of a better name, these are called "vlocks".  (I was
tempted to call them ballot locks, but "block" is way too confusing
an abbreviation...)

There is no function to wait for the lock to be released, and no
vlock_lock() function since we don't need these at the moment.
These could straightforwardly be added if vlocks get used for other
purposes.

For architectural correctness even Strongly-Ordered memory accesses
require barriers in order to guarantee that multiple CPUs have a
coherent view of the ordering of memory accesses.  Whether or not
this matters depends on hardware implementation details of the
memory system.  Since the purpose of this code is to provide a clean,
generic locking mechanism with no platform-specific dependencies the
barriers should be present to avoid unpleasant surprises on future
platforms.

Note:

  * When taking the lock, we don't care about implicit background
    memory operations and other signalling which may be pending,
    because those are not part of the critical section anyway.

    A DMB is sufficient to ensure correctly observed ordering of
    the explicit memory accesses in vlock_trylock.

  * No barrier is required after checking the election result,
    because the result is determined by the store to
    VLOCK_OWNER_OFFSET and is already globally observed due to the
    barriers in voting_end.  This means that global agreement on
    the winner is guaranteed, even before the winner is known
    locally.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/arm/vlocks.txt | 211 +++++++++++++++++++++++++++++++++++++++++++
 arch/arm/common/vlock.S      | 108 ++++++++++++++++++++++
 arch/arm/common/vlock.h      |  29 ++++++
 3 files changed, 348 insertions(+)
 create mode 100644 Documentation/arm/vlocks.txt
 create mode 100644 arch/arm/common/vlock.S
 create mode 100644 arch/arm/common/vlock.h

diff --git a/Documentation/arm/vlocks.txt b/Documentation/arm/vlocks.txt
new file mode 100644
index 0000000000..415960a9ba
--- /dev/null
+++ b/Documentation/arm/vlocks.txt
@@ -0,0 +1,211 @@
+vlocks for Bare-Metal Mutual Exclusion
+======================================
+
+Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
+mechanism, with reasonable but minimal requirements on the memory
+system.
+
+These are intended to be used to coordinate critical activity among CPUs
+which are otherwise non-coherent, in situations where the hardware
+provides no other mechanism to support this and ordinary spinlocks
+cannot be used.
+
+
+vlocks make use of the atomicity provided by the memory system for
+writes to a single memory location.  To arbitrate, every CPU "votes for
+itself", by storing a unique number to a common memory location.  The
+final value seen in that memory location when all the votes have been
+cast identifies the winner.
+
+In order to make sure that the election produces an unambiguous result
+in finite time, a CPU will only enter the election in the first place if
+no winner has been chosen and the election does not appear to have
+started yet.
+
+
+Algorithm
+---------
+
+The easiest way to explain the vlocks algorithm is with some pseudo-code:
+
+
+	int currently_voting[NR_CPUS] = { 0, };
+	int last_vote = -1; /* no votes yet */
+
+	bool vlock_trylock(int this_cpu)
+	{
+		/* signal our desire to vote */
+		currently_voting[this_cpu] = 1;
+		if (last_vote != -1) {
+			/* someone already volunteered himself */
+			currently_voting[this_cpu] = 0;
+			return false; /* not ourself */
+		}
+
+		/* let's suggest ourself */
+		last_vote = this_cpu;
+		currently_voting[this_cpu] = 0;
+
+		/* then wait until everyone else is done voting */
+		for_each_cpu(i) {
+			while (currently_voting[i] != 0)
+				/* wait */;
+		}
+
+		/* result */
+		if (last_vote == this_cpu)
+			return true; /* we won */
+		return false;
+	}
+
+	void vlock_unlock(void)
+	{
+		last_vote = -1;
+	}
+
+
+The currently_voting[] array provides a way for the CPUs to determine
+whether an election is in progress, and plays a role analogous to the
+"entering" array in Lamport's bakery algorithm [1].
+
+However, once the election has started, the underlying memory system
+atomicity is used to pick the winner.  This avoids the need for a static
+priority rule to act as a tie-breaker, or any counters which could
+overflow.
+
+As long as the last_vote variable is globally visible to all CPUs, it
+will contain only one value that won't change once every CPU has cleared
+its currently_voting flag.
+
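+For illustration, a first-man election using this API could look like
+the following sketch, where do_onetime_setup() and the setup_complete
+flag are hypothetical:
+
+	if (vlock_trylock(this_cpu)) {
+		/* we are the first man: perform the one-time setup */
+		do_onetime_setup();
+		setup_complete = 1;
+		vlock_unlock();
+	} else {
+		/* election lost: wait for the winner to finish setup */
+		while (!setup_complete)
+			/* wait */;
+	}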
+
+Features and limitations
+------------------------
+
+ * vlocks are not intended to be fair.  In the contended case, the
+   _last_ CPU to attempt to get the lock is the most likely one
+   to win.
+
+   vlocks are therefore best suited to situations where it is necessary
+   to pick a unique winner, but it does not matter which CPU actually
+   wins.
+
+ * Like other similar mechanisms, vlocks will not scale well to a large
+   number of CPUs.
+
+   vlocks can be cascaded in a voting hierarchy to permit better scaling
+   if necessary, as in the following hypothetical example for 4096 CPUs:
+
+	/* first level: local election */
+	my_town = towns[(this_cpu >> 4) & 0xff];
+	I_won = vlock_trylock(my_town, this_cpu & 0xf);
+	if (I_won) {
+		/* we won the town election, let's go for the state */
+		my_state = states[(this_cpu >> 8) & 0xf];
+		I_won = vlock_trylock(my_state, (this_cpu >> 4) & 0xf);
+		if (I_won) {
+			/* and so on */
+			I_won = vlock_trylock(the_whole_country,
+					      (this_cpu >> 8) & 0xf);
+			if (I_won) {
+				/* ... */
+			}
+			vlock_unlock(the_whole_country);
+		}
+		vlock_unlock(my_state);
+	}
+	vlock_unlock(my_town);
+
+
+ARM implementation
+------------------
+
+The current ARM implementation [2] contains some optimisations beyond
+the basic algorithm:
+
+ * By packing the members of the currently_voting array close together,
+   we can read the whole array in one transaction (providing the number
+   of CPUs potentially contending the lock is small enough).  This
+   reduces the number of round-trips required to external memory.
+
+   In the ARM implementation, this means that we can use a single load
+   and comparison:
+
+	LDR	Rt, [Rn]
+	CMP	Rt, #0
+
+   ...in place of code equivalent to:
+
+	LDRB	Rt, [Rn]
+	CMP	Rt, #0
+	LDRBEQ	Rt, [Rn, #1]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #2]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #3]
+	CMPEQ	Rt, #0
+
+   This cuts down on the fast-path latency, as well as potentially
+   reducing bus contention in contended cases.
+
+   The optimisation relies on the fact that the ARM memory system
+   guarantees coherency between overlapping memory accesses of
+   different sizes, similarly to many other architectures.  Note that
+   we do not care which element of currently_voting appears in which
+   bits of Rt, so there is no need to worry about endianness in this
+   optimisation.
+
+   If there are too many CPUs to read the currently_voting array in
+   one transaction, then multiple transactions are still required.  The
+   implementation uses a simple loop of word-sized loads for this
+   case.  The number of transactions is still fewer than would be
+   required if bytes were loaded individually.
+
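+   Expressed in C-like pseudo-code, this multi-transaction polling
+   loop amounts to the following (a sketch only; the real code is the
+   assembler in [2]):
+
+	for (offset = VLOCK_VOTING_OFFSET;
+	     offset < VLOCK_VOTING_OFFSET + VLOCK_VOTING_SIZE;
+	     offset += 4)
+		while (*(volatile unsigned int *)(lock_base + offset))
+			/* wait */;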
+
+   In principle, we could aggregate further by using LDRD or LDM, but
+   to keep the code simple this was not attempted in the initial
+   implementation.
+
+
+ * vlocks are currently only used to coordinate between CPUs which are
+   unable to enable their caches yet.  This means that the
+   implementation removes many of the barriers which would be required
+   when executing the algorithm in cached memory.
+
+   Packing of the currently_voting array does not work with cached
+   memory unless all CPUs contending the lock are cache-coherent, due
+   to cache writebacks from one CPU clobbering values written by other
+   CPUs.  (Though if all the CPUs are cache-coherent, you should
+   probably be using proper spinlocks instead anyway).
+
+
+ * The "no votes yet" value used for the last_vote variable is 0 (not
+   -1 as in the pseudocode).  This allows statically-allocated vlocks
+   to be implicitly initialised to an unlocked state simply by putting
+   them in .bss.
+
+   An offset is added to each CPU's ID for the purpose of setting this
+   variable, so that no CPU uses the value 0 for its ID.
+
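+   As can be seen in vlock_trylock, this offset is simply the voter's
+   byte offset within the lock structure:
+
+	owner = VLOCK_VOTING_OFFSET + cpu;	/* can never be 0 */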
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, for
+use in ARM-based big.LITTLE platforms, with review and input gratefully
+received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
+grabbing most of this text out of the relevant mail thread and writing
+up the pseudocode.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
+
+
+References
+----------
+
+[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
+    Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
+
+    http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
+
+[2] linux/arch/arm/common/vlock.S, www.kernel.org.
diff --git a/arch/arm/common/vlock.S b/arch/arm/common/vlock.S
new file mode 100644
index 0000000000..ff198583f6
--- /dev/null
+++ b/arch/arm/common/vlock.S
@@ -0,0 +1,108 @@
+/*
+ * vlock.S - simple voting lock implementation for ARM
+ *
+ * Created by:	Dave Martin, 2012-08-16
+ * Copyright:	(C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ *
+ * This algorithm is described in more detail in
+ * Documentation/arm/vlocks.txt.
+ */
+
+#include <linux/linkage.h>
+#include "vlock.h"
+
+/* Select different code if voting flags can fit in a single word. */
+#if VLOCK_VOTING_SIZE > 4
+#define FEW(x...)
+#define MANY(x...) x
+#else
+#define FEW(x...) x
+#define MANY(x...)
+#endif
+
+@ voting lock for first-man coordination
+
+.macro voting_begin rbase:req, rcpu:req, rscratch:req
+	mov	\rscratch, #1
+	strb	\rscratch, [\rbase, \rcpu]
+	dmb
+.endm
+
+.macro voting_end rbase:req, rcpu:req, rscratch:req
+	dmb
+	mov	\rscratch, #0
+	strb	\rscratch, [\rbase, \rcpu]
+	dsb
+	sev
+.endm
+
+/*
+ * The vlock structure must reside in Strongly-Ordered or Device memory.
+ * This implementation deliberately eliminates most of the barriers which
+ * would be required for other memory types, and assumes that independent
+ * writes to neighbouring locations within a cacheline do not interfere
+ * with one another.
+ */
+
+@ r0: lock structure base
+@ r1: CPU ID (0-based index within cluster)
+ENTRY(vlock_trylock)
+	add	r1, r1, #VLOCK_VOTING_OFFSET
+
+	voting_begin	r0, r1, r2
+
+	ldrb	r2, [r0, #VLOCK_OWNER_OFFSET]	@ check whether lock is held
+	cmp	r2, #VLOCK_OWNER_NONE
+	bne	trylock_fail			@ fail if so
+
+	@ Control dependency implies strb not observable before previous ldrb.
+
+	strb	r1, [r0, #VLOCK_OWNER_OFFSET]	@ submit my vote
+
+	voting_end	r0, r1, r2		@ implies DMB
+
+	@ Wait for the current round of voting to finish:
+
+ MANY(	mov	r3, #VLOCK_VOTING_OFFSET			)
+0:
+ MANY(	ldr	r2, [r0, r3]					)
+ FEW(	ldr	r2, [r0, #VLOCK_VOTING_OFFSET]			)
+	cmp	r2, #0
+	wfene
+	bne	0b
+ MANY(	add	r3, r3, #4					)
+ MANY(	cmp	r3, #VLOCK_VOTING_OFFSET + VLOCK_VOTING_SIZE	)
+ MANY(	bne	0b						)
+
+	@ Check who won:
+
+	dmb
+	ldrb	r2, [r0, #VLOCK_OWNER_OFFSET]
+	eor	r0, r1, r2			@ zero if I won, else nonzero
+	bx	lr
+
+trylock_fail:
+	voting_end	r0, r1, r2
+	mov	r0, #1				@ nonzero indicates that I lost
+	bx	lr
+ENDPROC(vlock_trylock)
+
+@ r0: lock structure base
+ENTRY(vlock_unlock)
+	dmb
+	mov	r1, #VLOCK_OWNER_NONE
+	strb	r1, [r0, #VLOCK_OWNER_OFFSET]
+	dsb
+	sev
+	bx	lr
+ENDPROC(vlock_unlock)
diff --git a/arch/arm/common/vlock.h b/arch/arm/common/vlock.h
new file mode 100644
index 0000000000..eda912f915
--- /dev/null
+++ b/arch/arm/common/vlock.h
@@ -0,0 +1,29 @@
+/*
+ * vlock.h - simple voting lock implementation
+ *
+ * Created by:	Dave Martin, 2012-08-16
+ * Copyright:	(C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __VLOCK_H
+#define __VLOCK_H
+
+#include <asm/mcpm_entry.h>
+
+/* Offsets and sizes are rounded to a word (4 bytes) */
+#define VLOCK_OWNER_OFFSET	0
+#define VLOCK_VOTING_OFFSET	4
+#define VLOCK_VOTING_SIZE	((MAX_CPUS_PER_CLUSTER + 3) / 4 * 4)
+#define VLOCK_SIZE		(VLOCK_VOTING_OFFSET + VLOCK_VOTING_SIZE)
+#define VLOCK_OWNER_NONE	0
+
+#endif /* ! __VLOCK_H */
-- 
1.8.1.2


* [PATCH v4 05/15] ARM: mcpm_head.S: vlock-based first man election
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (3 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 04/15] ARM: mcpm: Add baremetal voting mutexes Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support Nicolas Pitre
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Dave Martin <dave.martin@linaro.org>

Instead of requiring the first man to be elected in advance (which
can be suboptimal in some situations), this patch uses a per-
cluster mutex to co-ordinate selection of the first man.

This should also make it more feasible to reuse this code path for
asynchronous cluster resume (as in CPUidle scenarios).

We must ensure that the vlock data doesn't share a cacheline with
anything else, or dirty cache eviction could corrupt it.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/common/Makefile    |  2 +-
 arch/arm/common/mcpm_head.S | 41 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index 23e85b1fae..c901a38c59 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -13,4 +13,4 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
 obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
 obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
 obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
-obj-$(CONFIG_CLUSTER_PM)	+= mcpm_head.o mcpm_entry.o
+obj-$(CONFIG_CLUSTER_PM)	+= mcpm_head.o mcpm_entry.o vlock.o
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
index 86650d4e94..2afc8f26cb 100644
--- a/arch/arm/common/mcpm_head.S
+++ b/arch/arm/common/mcpm_head.S
@@ -16,6 +16,8 @@
 #include <linux/linkage.h>
 #include <asm/mcpm_entry.h>
 
+#include "vlock.h"
+
 .if MCPM_SYNC_CLUSTER_CPUS
 .error "cpus must be the first member of struct mcpm_sync_struct"
 .endif
@@ -69,10 +71,11 @@ ENTRY(mcpm_entry_point)
 	 * position independent way.
 	 */
 	adr	r5, 3f
-	ldmia	r5, {r6, r7, r8}
+	ldmia	r5, {r6, r7, r8, r11}
 	add	r6, r5, r6			@ r6 = mcpm_entry_vectors
 	ldr	r7, [r5, r7]			@ r7 = mcpm_power_up_setup_phys
 	add	r8, r5, r8			@ r8 = mcpm_sync
+	add	r11, r5, r11			@ r11 = first_man_locks
 
 	mov	r0, #MCPM_SYNC_CLUSTER_SIZE
 	mla	r8, r0, r10, r8			@ r8 = sync cluster base
@@ -86,13 +89,22 @@ ENTRY(mcpm_entry_point)
 	@ At this point, the cluster cannot unexpectedly enter the GOING_DOWN
 	@ state, because there is at least one active CPU (this CPU).
 
-	@ Note: the following is racy as another CPU might be testing
-	@ the same flag at the same moment.  That'll be fixed later.
+	mov	r0, #VLOCK_SIZE
+	mla	r11, r0, r10, r11		@ r11 = cluster first man lock
+	mov	r0, r11
+	mov	r1, r9				@ cpu
+	bl	vlock_trylock			@ implies DMB
+
+	cmp	r0, #0				@ failed to get the lock?
+	bne	mcpm_setup_wait		@ wait for cluster setup if so
+
 	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
 	cmp	r0, #CLUSTER_UP			@ cluster already up?
 	bne	mcpm_setup			@ if not, set up the cluster
 
-	@ Otherwise, skip setup:
+	@ Otherwise, release the first man lock and skip setup:
+	mov	r0, r11
+	bl	vlock_unlock
 	b	mcpm_setup_complete
 
 mcpm_setup:
@@ -142,6 +154,19 @@ mcpm_setup_leave:
 	dsb
 	sev
 
+	mov	r0, r11
+	bl	vlock_unlock	@ implies DMB
+	b	mcpm_setup_complete
+
+	@ In the contended case, non-first men wait here for cluster setup
+	@ to complete:
+mcpm_setup_wait:
+	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
+	cmp	r0, #CLUSTER_UP
+	wfene
+	bne	mcpm_setup_wait
+	dmb
+
 mcpm_setup_complete:
 	@ If a platform-specific CPU setup hook is needed, it is
 	@ called from here.
@@ -173,11 +198,17 @@ mcpm_entry_gated:
 3:	.word	mcpm_entry_vectors - .
 	.word	mcpm_power_up_setup_phys - 3b
 	.word	mcpm_sync - 3b
+	.word	first_man_locks - 3b
 
 ENDPROC(mcpm_entry_point)
 
 	.bss
-	.align	5
+
+	.align	__CACHE_WRITEBACK_ORDER
+	.type	first_man_locks, #object
+first_man_locks:
+	.space	VLOCK_SIZE * MAX_NR_CLUSTERS
+	.align	__CACHE_WRITEBACK_ORDER
 
 	.type	mcpm_entry_vectors, #object
 ENTRY(mcpm_entry_vectors)
-- 
1.8.1.2


* [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (4 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 05/15] ARM: mcpm_head.S: vlock-based first man election Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-04-23 19:31   ` Russell King - ARM Linux
  2013-02-05  5:22 ` [PATCH v4 07/15] ARM: introduce common set_auxcr/get_auxcr functions Nicolas Pitre
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

Now that the cluster power API is in place, we can use it for SMP secondary
bringup and CPU hotplug in a generic fashion.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/common/Makefile       |  2 +-
 arch/arm/common/mcpm_platsmp.c | 86 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/common/mcpm_platsmp.c

diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index c901a38c59..e1c9db45de 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -13,4 +13,4 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
 obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
 obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
 obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
-obj-$(CONFIG_CLUSTER_PM)	+= mcpm_head.o mcpm_entry.o vlock.o
+obj-$(CONFIG_CLUSTER_PM)	+= mcpm_head.o mcpm_entry.o mcpm_platsmp.o vlock.o
diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
new file mode 100644
index 0000000000..e81e73d0f8
--- /dev/null
+++ b/arch/arm/common/mcpm_platsmp.c
@@ -0,0 +1,86 @@
+/*
+ * linux/arch/arm/mach-vexpress/mcpm_platsmp.c
+ *
+ * Created by:  Nicolas Pitre, November 2012
+ * Copyright:   (C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Code to handle secondary CPU bringup and hotplug for the cluster power API.
+ */
+
+#include <linux/init.h>
+#include <linux/smp.h>
+
+#include <asm/mcpm_entry.h>
+#include <asm/smp.h>
+#include <asm/smp_plat.h>
+#include <asm/hardware/gic.h>
+
+static void __init simple_smp_init_cpus(void)
+{
+	set_smp_cross_call(gic_raise_softirq);
+}
+
+static int __cpuinit mcpm_boot_secondary(unsigned int cpu, struct task_struct *idle)
+{
+	unsigned int mpidr, pcpu, pcluster, ret;
+	extern void secondary_startup(void);
+
+	mpidr = cpu_logical_map(cpu);
+	pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+	pr_debug("%s: logical CPU %d is physical CPU %d cluster %d\n",
+		 __func__, cpu, pcpu, pcluster);
+
+	mcpm_set_entry_vector(pcpu, pcluster, NULL);
+	ret = mcpm_cpu_power_up(pcpu, pcluster);
+	if (ret)
+		return ret;
+	mcpm_set_entry_vector(pcpu, pcluster, secondary_startup);
+	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+	dsb_sev();
+	return 0;
+}
+
+static void __cpuinit mcpm_secondary_init(unsigned int cpu)
+{
+	mcpm_cpu_powered_up();
+	gic_secondary_init(0);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+static int mcpm_cpu_disable(unsigned int cpu)
+{
+	/*
+	 * We assume all CPUs may be shut down.
+	 * This would be the hook to use for eventual Secure
+	 * OS migration requests as described in the PSCI spec.
+	 */
+	return 0;
+}
+
+static void mcpm_cpu_die(unsigned int cpu)
+{
+	unsigned int mpidr, pcpu, pcluster;
+	mpidr = read_cpuid_mpidr();
+	pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+	mcpm_set_entry_vector(pcpu, pcluster, NULL);
+	mcpm_cpu_power_down();
+}
+
+#endif
+
+struct smp_operations __initdata mcpm_smp_ops = {
+	.smp_init_cpus		= simple_smp_init_cpus,
+	.smp_boot_secondary	= mcpm_boot_secondary,
+	.smp_secondary_init	= mcpm_secondary_init,
+#ifdef CONFIG_HOTPLUG_CPU
+	.cpu_disable		= mcpm_cpu_disable,
+	.cpu_die		= mcpm_cpu_die,
+#endif
+};
-- 
1.8.1.2


* [PATCH v4 07/15] ARM: introduce common set_auxcr/get_auxcr functions
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (5 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support Nicolas Pitre
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <rob.herring@calxeda.com>

Move the private set_auxcr/get_auxcr functions from
drivers/cpuidle/cpuidle-calxeda.c so they can be used across platforms.
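
Later patches in this series use these accessors to drop out of local
coherency before power-down, e.g.:

	/* Disable local coherency by clearing the ACTLR "SMP" bit: */
	set_auxcr(get_auxcr() & ~(1 << 6));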

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/cp15.h       | 14 ++++++++++++++
 drivers/cpuidle/cpuidle-calxeda.c | 14 --------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
index 5ef4d8015a..ce4d01c03e 100644
--- a/arch/arm/include/asm/cp15.h
+++ b/arch/arm/include/asm/cp15.h
@@ -59,6 +59,20 @@ static inline void set_cr(unsigned int val)
 	isb();
 }
 
+static inline unsigned int get_auxcr(void)
+{
+	unsigned int val;
+	asm("mrc p15, 0, %0, c1, c0, 1	@ get AUXCR" : "=r" (val));
+	return val;
+}
+
+static inline void set_auxcr(unsigned int val)
+{
+	asm volatile("mcr p15, 0, %0, c1, c0, 1	@ set AUXCR"
+	  : : "r" (val));
+	isb();
+}
+
 #ifndef CONFIG_SMP
 extern void adjust_cr(unsigned long mask, unsigned long set);
 #endif
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index e1aab38c5a..ece83d6e04 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -37,20 +37,6 @@ extern void *scu_base_addr;
 
 static struct cpuidle_device __percpu *calxeda_idle_cpuidle_devices;
 
-static inline unsigned int get_auxcr(void)
-{
-	unsigned int val;
-	asm("mrc p15, 0, %0, c1, c0, 1	@ get AUXCR" : "=r" (val) : : "cc");
-	return val;
-}
-
-static inline void set_auxcr(unsigned int val)
-{
-	asm volatile("mcr p15, 0, %0, c1, c0, 1	@ set AUXCR"
-	  : : "r" (val) : "cc");
-	isb();
-}
-
 static noinline void calxeda_idle_restore(void)
 {
 	set_cr(get_cr() | CR_C);
-- 
1.8.1.2


* [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (6 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 07/15] ARM: introduce common set_auxcr/get_auxcr functions Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-07 18:14   ` Catalin Marinas
  2013-02-05  5:22 ` [PATCH v4 09/15] ARM: vexpress/dcscb: add CPU use counts to the power up/down API implementation Nicolas Pitre
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

This adds basic CPU and cluster reset controls on RTSM for the
A15x4-A7x4 model configuration using the Dual Cluster System
Configuration Block (DCSCB).

The cache coherency interconnect (CCI) is not handled yet.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 .../devicetree/bindings/arm/rtsm-dcscb.txt         |  19 +++
 arch/arm/mach-vexpress/Kconfig                     |   8 +
 arch/arm/mach-vexpress/Makefile                    |   1 +
 arch/arm/mach-vexpress/dcscb.c                     | 162 +++++++++++++++++++++
 4 files changed, 190 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/rtsm-dcscb.txt
 create mode 100644 arch/arm/mach-vexpress/dcscb.c

diff --git a/Documentation/devicetree/bindings/arm/rtsm-dcscb.txt b/Documentation/devicetree/bindings/arm/rtsm-dcscb.txt
new file mode 100644
index 0000000000..3b8fbf3c00
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/rtsm-dcscb.txt
@@ -0,0 +1,19 @@
+ARM Dual Cluster System Configuration Block
+-------------------------------------------
+
+The Dual Cluster System Configuration Block (DCSCB) provides basic
+functionality for controlling clocks, resets and configuration pins in
+the Dual Cluster System implemented by the Real-Time System Model (RTSM).
+
+Required properties:
+
+- compatible : should be "arm,rtsm,dcscb"
+
+- reg : physical base address and the size of the registers window
+
+Example:
+
+	dcscb@60000000 {
+		compatible = "arm,rtsm,dcscb";
+		reg = <0x60000000 0x1000>;
+	};
diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig
index 52d315b792..f3f92b120a 100644
--- a/arch/arm/mach-vexpress/Kconfig
+++ b/arch/arm/mach-vexpress/Kconfig
@@ -52,4 +52,12 @@ config ARCH_VEXPRESS_CORTEX_A5_A9_ERRATA
 config ARCH_VEXPRESS_CA9X4
 	bool "Versatile Express Cortex-A9x4 tile"
 
+config ARCH_VEXPRESS_DCSCB
+	bool "Dual Cluster System Control Block (DCSCB) support"
+	depends on CLUSTER_PM
+	help
+	  Support for the Dual Cluster System Configuration Block (DCSCB).
+	  This is needed to provide CPU and cluster power management
+	  on RTSM.
+
 endmenu
diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile
index 80b64971fb..2253644054 100644
--- a/arch/arm/mach-vexpress/Makefile
+++ b/arch/arm/mach-vexpress/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \
 
 obj-y					:= v2m.o reset.o
 obj-$(CONFIG_ARCH_VEXPRESS_CA9X4)	+= ct-ca9x4.o
+obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)	+= dcscb.o
 obj-$(CONFIG_SMP)			+= platsmp.o
 obj-$(CONFIG_HOTPLUG_CPU)		+= hotplug.o
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
new file mode 100644
index 0000000000..07e835cb72
--- /dev/null
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -0,0 +1,162 @@
+/*
+ * arch/arm/mach-vexpress/dcscb.c - Dual Cluster System Configuration Block
+ *
+ * Created by:	Nicolas Pitre, May 2012
+ * Copyright:	(C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+#include <linux/of_address.h>
+#include <linux/vexpress.h>
+
+#include <asm/mcpm_entry.h>
+#include <asm/proc-fns.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/cp15.h>
+
+
+#define RST_HOLD0	0x0
+#define RST_HOLD1	0x4
+#define SYS_SWRESET	0x8
+#define RST_STAT0	0xc
+#define RST_STAT1	0x10
+#define EAG_CFG_R	0x20
+#define EAG_CFG_W	0x24
+#define KFC_CFG_R	0x28
+#define KFC_CFG_W	0x2c
+#define DCS_CFG_R	0x30
+
+/*
+ * We can't use regular spinlocks. In the switcher case, it is possible
+ * for an outbound CPU to call power_down() after its inbound counterpart
+ * is already live using the same logical CPU number, which trips lockdep
+ * debugging.
+ */
+static arch_spinlock_t dcscb_lock = __ARCH_SPIN_LOCK_UNLOCKED;
+
+static void __iomem *dcscb_base;
+
+static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
+{
+	unsigned int rst_hold, cpumask = (1 << cpu);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	if (cpu >= 4 || cluster >= 2)
+		return -EINVAL;
+
+	/*
+	 * Since this is called with IRQs enabled, and no arch_spin_lock_irq
+	 * variant exists, we need to disable IRQs manually here.
+	 */
+	local_irq_disable();
+	arch_spin_lock(&dcscb_lock);
+
+	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
+	if (rst_hold & (1 << 8)) {
+		/* remove cluster reset and add individual CPU's reset */
+		rst_hold &= ~(1 << 8);
+		rst_hold |= 0xf;
+	}
+	rst_hold &= ~(cpumask | (cpumask << 4));
+	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
+
+	arch_spin_unlock(&dcscb_lock);
+	local_irq_enable();
+
+	return 0;
+}
+
+static void dcscb_power_down(void)
+{
+	unsigned int mpidr, cpu, cluster, rst_hold, cpumask, last_man;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+	cpumask = (1 << cpu);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	BUG_ON(cpu >= 4 || cluster >= 2);
+
+	arch_spin_lock(&dcscb_lock);
+	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
+	rst_hold |= cpumask;
+	if (((rst_hold | (rst_hold >> 4)) & 0xf) == 0xf)
+		rst_hold |= (1 << 8);
+	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
+	arch_spin_unlock(&dcscb_lock);
+	last_man = (rst_hold & (1 << 8));
+
+	/*
+	 * Now let's clean our L1 cache and shut ourself down.
+	 * If we're the last CPU in this cluster then clean L2 too.
+	 */
+
+	/*
+	 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
+	 * a preliminary flush here for those CPUs.  At least, that's
+	 * the theory -- without the extra flush, Linux explodes on
+	 * RTSM (maybe not needed anymore, to be investigated)..
+	 */
+	flush_cache_louis();
+	set_cr(get_cr() & ~CR_C);
+
+	if (!last_man) {
+		flush_cache_louis();
+	} else {
+		flush_cache_all();
+		outer_flush_all();
+	}
+
+	/* Disable local coherency by clearing the ACTLR "SMP" bit: */
+	set_auxcr(get_auxcr() & ~(1 << 6));
+
+	/* Now we are prepared for power-down, do it: */
+	dsb();
+	wfi();
+
+	/* Not dead at this point?  Let our caller cope. */
+}
+
+static const struct mcpm_platform_ops dcscb_power_ops = {
+	.power_up	= dcscb_power_up,
+	.power_down	= dcscb_power_down,
+};
+
+static int __init dcscb_init(void)
+{
+	struct device_node *node;
+	int ret;
+
+	node = of_find_compatible_node(NULL, NULL, "arm,rtsm,dcscb");
+	if (!node)
+		return -ENODEV;
+	dcscb_base = of_iomap(node, 0);
+	if (!dcscb_base)
+		return -EADDRNOTAVAIL;
+
+	ret = mcpm_platform_register(&dcscb_power_ops);
+	if (ret) {
+		iounmap(dcscb_base);
+		return ret;
+	}
+
+	/*
+	 * Future entries into the kernel can now go
+	 * through the cluster entry vectors.
+	 */
+	vexpress_flags_set(virt_to_phys(mcpm_entry_point));
+
+	return 0;
+}
+
+early_initcall(dcscb_init);
-- 
1.8.1.2


* [PATCH v4 09/15] ARM: vexpress/dcscb: add CPU use counts to the power up/down API implementation
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (7 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 10/15] ARM: vexpress/dcscb: do not hardcode number of CPUs per cluster Nicolas Pitre
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

It is possible for a CPU to be told to power up before it managed
to power itself down.  Solve this race with a usage count as mandated
by the API definition.
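
In other words, the use count implements a small per-CPU state machine.
A sketch, where assert/deassert_cpu_reset() stand in for the RST_HOLD0
register manipulation done in the actual code:

	/* dcscb_power_up(), under dcscb_lock */
	if (++use_count == 1)
		deassert_cpu_reset();	/* CPU was really down */
	else if (use_count != 2)
		BUG();			/* 2 = up requested while going down */

	/* dcscb_power_down(), under dcscb_lock */
	if (--use_count == 0)
		assert_cpu_reset();	/* really powering down */
	else if (use_count == 1)
		skip_wfi = true;	/* a power_up raced ahead of us */
	else
		BUG();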

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-vexpress/dcscb.c | 77 +++++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 07e835cb72..37a996adf3 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -44,6 +44,7 @@
 static arch_spinlock_t dcscb_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 
 static void __iomem *dcscb_base;
+static int dcscb_use_count[4][2];
 
 static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 {
@@ -60,14 +61,27 @@ static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 	local_irq_disable();
 	arch_spin_lock(&dcscb_lock);
 
-	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
-	if (rst_hold & (1 << 8)) {
-		/* remove cluster reset and add individual CPU's reset */
-		rst_hold &= ~(1 << 8);
-		rst_hold |= 0xf;
+	dcscb_use_count[cpu][cluster]++;
+	if (dcscb_use_count[cpu][cluster] == 1) {
+		rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
+		if (rst_hold & (1 << 8)) {
+			/* remove cluster reset and add individual CPU's reset */
+			rst_hold &= ~(1 << 8);
+			rst_hold |= 0xf;
+		}
+		rst_hold &= ~(cpumask | (cpumask << 4));
+		writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
+	} else if (dcscb_use_count[cpu][cluster] != 2) {
+		/*
+		 * The only possible values are:
+		 * 0 = CPU down
+		 * 1 = CPU (still) up
+		 * 2 = CPU requested to be up before it had a chance
+		 *     to actually make itself down.
+		 * Any other value is a bug.
+		 */
+		BUG();
 	}
-	rst_hold &= ~(cpumask | (cpumask << 4));
-	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
 
 	arch_spin_unlock(&dcscb_lock);
 	local_irq_enable();
@@ -77,7 +91,8 @@ static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 
 static void dcscb_power_down(void)
 {
-	unsigned int mpidr, cpu, cluster, rst_hold, cpumask, last_man;
+	unsigned int mpidr, cpu, cluster, rst_hold, cpumask;
+	bool last_man = false, skip_wfi = false;
 
 	mpidr = read_cpuid_mpidr();
 	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
@@ -88,13 +103,26 @@ static void dcscb_power_down(void)
 	BUG_ON(cpu >= 4 || cluster >= 2);
 
 	arch_spin_lock(&dcscb_lock);
-	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
-	rst_hold |= cpumask;
-	if (((rst_hold | (rst_hold >> 4)) & 0xf) == 0xf)
-		rst_hold |= (1 << 8);
-	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
+	dcscb_use_count[cpu][cluster]--;
+	if (dcscb_use_count[cpu][cluster] == 0) {
+		rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
+		rst_hold |= cpumask;
+		if (((rst_hold | (rst_hold >> 4)) & 0xf) == 0xf) {
+			rst_hold |= (1 << 8);
+			last_man = true;
+		}
+		writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
+	} else if (dcscb_use_count[cpu][cluster] == 1) {
+		/*
+		 * A power_up request went ahead of us.
+		 * Even if we do not want to shut this CPU down,
+		 * the caller expects a certain state as if the WFI
+		 * was aborted.  So let's continue with cache cleaning.
+		 */
+		skip_wfi = true;
+	} else
+		BUG();
 	arch_spin_unlock(&dcscb_lock);
-	last_man = (rst_hold & (1 << 8));
 
 	/*
 	 * Now let's clean our L1 cache and shut ourself down.
@@ -121,8 +149,10 @@ static void dcscb_power_down(void)
 	set_auxcr(get_auxcr() & ~(1 << 6));
 
 	/* Now we are prepared for power-down, do it: */
-	dsb();
-	wfi();
+	if (!skip_wfi) {
+		dsb();
+		wfi();
+	}
 
 	/* Not dead at this point?  Let our caller cope. */
 }
@@ -132,6 +162,19 @@ static const struct mcpm_platform_ops dcscb_power_ops = {
 	.power_down	= dcscb_power_down,
 };
 
+static void __init dcscb_usage_count_init(void)
+{
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	BUG_ON(cpu >= 4 || cluster >= 2);
+	dcscb_use_count[cpu][cluster] = 1;
+}
+
 static int __init dcscb_init(void)
 {
 	struct device_node *node;
@@ -144,6 +187,8 @@ static int __init dcscb_init(void)
 	if (!dcscb_base)
 		return -EADDRNOTAVAIL;
 
+	dcscb_usage_count_init();
+
 	ret = mcpm_platform_register(&dcscb_power_ops);
 	if (ret) {
 		iounmap(dcscb_base);
-- 
1.8.1.2


* [PATCH v4 10/15] ARM: vexpress/dcscb: do not hardcode number of CPUs per cluster
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (8 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 09/15] ARM: vexpress/dcscb: add CPU use counts to the power up/down API implementation Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 11/15] drivers/bus: add ARM CCI support Nicolas Pitre
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

If 4 CPUs are assumed, the A15x1-A7x1 model configuration would never
shut down the initial cluster, as the 0xf reset bit mask would never be
observed.  Let's construct this mask from the number of CPUs per
cluster provided in the DCSCB config register instead.
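
With the CPU count for each cluster encoded in the DCS_CFG_R register
(4 bits per cluster, starting at bit 16), the mask computation amounts
to (sketch):

	ncpus = (cfg >> (16 + 4 * cluster)) & 0xf;
	mask  = (1 << ncpus) - 1;	/* e.g. 1 CPU -> 0x1, 4 CPUs -> 0xf */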

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-vexpress/dcscb.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 37a996adf3..7be784a830 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -45,10 +45,12 @@ static arch_spinlock_t dcscb_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 
 static void __iomem *dcscb_base;
 static int dcscb_use_count[4][2];
+static int dcscb_mcpm_cpu_mask[2];
 
 static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 {
 	unsigned int rst_hold, cpumask = (1 << cpu);
+	unsigned int mcpm_mask = dcscb_mcpm_cpu_mask[cluster];
 
 	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
 	if (cpu >= 4 || cluster >= 2)
@@ -67,7 +69,7 @@ static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 		if (rst_hold & (1 << 8)) {
 			/* remove cluster reset and add individual CPU's reset */
 			rst_hold &= ~(1 << 8);
-			rst_hold |= 0xf;
+			rst_hold |= mcpm_mask;
 		}
 		rst_hold &= ~(cpumask | (cpumask << 4));
 		writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
@@ -91,13 +93,14 @@ static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
 
 static void dcscb_power_down(void)
 {
-	unsigned int mpidr, cpu, cluster, rst_hold, cpumask;
+	unsigned int mpidr, cpu, cluster, rst_hold, cpumask, mcpm_mask;
 	bool last_man = false, skip_wfi = false;
 
 	mpidr = read_cpuid_mpidr();
 	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
 	cpumask = (1 << cpu);
+	mcpm_mask = dcscb_mcpm_cpu_mask[cluster];
 
 	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
 	BUG_ON(cpu >= 4 || cluster >= 2);
@@ -107,7 +110,7 @@ static void dcscb_power_down(void)
 	if (dcscb_use_count[cpu][cluster] == 0) {
 		rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
 		rst_hold |= cpumask;
-		if (((rst_hold | (rst_hold >> 4)) & 0xf) == 0xf) {
+		if (((rst_hold | (rst_hold >> 4)) & mcpm_mask) == mcpm_mask) {
 			rst_hold |= (1 << 8);
 			last_man = true;
 		}
@@ -178,6 +181,7 @@ static void __init dcscb_usage_count_init(void)
 static int __init dcscb_init(void)
 {
 	struct device_node *node;
+	unsigned int cfg;
 	int ret;
 
 	node = of_find_compatible_node(NULL, NULL, "arm,rtsm,dcscb");
@@ -186,7 +190,9 @@ static int __init dcscb_init(void)
 	dcscb_base = of_iomap(node, 0);
 	if (!dcscb_base)
 		return -EADDRNOTAVAIL;
-
+	cfg = readl_relaxed(dcscb_base + DCS_CFG_R);
+	dcscb_mcpm_cpu_mask[0] = (1 << (((cfg >> 16) >> (0 << 2)) & 0xf)) - 1;
+	dcscb_mcpm_cpu_mask[1] = (1 << (((cfg >> 16) >> (1 << 2)) & 0xf)) - 1;
 	dcscb_usage_count_init();
 
 	ret = mcpm_platform_register(&dcscb_power_ops);
-- 
1.8.1.2


* [PATCH v4 11/15] drivers/bus: add ARM CCI support
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (9 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 10/15] ARM: vexpress/dcscb: do not hardcode number of CPUs per cluster Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-04-23 19:38   ` Russell King - ARM Linux
  2013-02-05  5:22 ` [PATCH v4 12/15] ARM: CCI: ensure powerdown-time data is flushed from cache Nicolas Pitre
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

On ARM multi-cluster systems coherency between cores running on
different clusters is managed by the cache-coherent interconnect (CCI).
It allows broadcasting of TLB invalidates and memory barriers and it
guarantees cache coherency at system level.

This patch enables the basic infrastructure required in Linux to
handle and programme the CCI component. The first implementation is
based on a platform device, its corresponding DT compatible property and
a simple programming interface.

Very basic for now.  More functionalities will come later.
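
The programming interface currently consists of a single call,
disable_cci(cluster), which masks snoops and DVM messages on the given
cluster's slave interface and spins until the status change completes.
A later patch in this series uses it roughly like this (sketch):

	/* last man: caches flushed, SCTLR.C and ACTLR "SMP" bit cleared */
	disable_cci(cluster);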

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 drivers/bus/Kconfig     |   4 ++
 drivers/bus/Makefile    |   2 +
 drivers/bus/arm-cci.c   | 112 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/arm-cci.h |  30 +++++++++++++
 4 files changed, 148 insertions(+)
 create mode 100644 drivers/bus/arm-cci.c
 create mode 100644 include/linux/arm-cci.h

diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 0f51ed687d..d032f74ff2 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -19,4 +19,8 @@ config OMAP_INTERCONNECT
 
 	help
 	  Driver to enable OMAP interconnect error handling driver.
+
+config ARM_CCI
+       bool "ARM CCI driver support"
+
 endmenu
diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
index 45d997c854..55aac809e5 100644
--- a/drivers/bus/Makefile
+++ b/drivers/bus/Makefile
@@ -6,3 +6,5 @@ obj-$(CONFIG_OMAP_OCP2SCP)	+= omap-ocp2scp.o
 
 # Interconnect bus driver for OMAP SoCs.
 obj-$(CONFIG_OMAP_INTERCONNECT)	+= omap_l3_smx.o omap_l3_noc.o
+
+obj-$(CONFIG_ARM_CCI)		+= arm-cci.o
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
new file mode 100644
index 0000000000..11c1513440
--- /dev/null
+++ b/drivers/bus/arm-cci.c
@@ -0,0 +1,112 @@
+/*
+ * ARM Cache Coherency Interconnect (CCI400) support
+ *
+ * Copyright (C) 2012-2013 ARM Ltd.
+ * Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/arm-cci.h>
+
+
+#define CCI_STATUS_OFFSET	0xc
+#define STATUS_CHANGE_PENDING	(1 << 0)
+
+#define CCI_SLAVE_OFFSET(n)	(0x1000 + 0x1000 * (n))
+#define CCI400_EAG_OFFSET       CCI_SLAVE_OFFSET(3)
+#define CCI400_KF_OFFSET        CCI_SLAVE_OFFSET(4)
+
+#define DRIVER_NAME	"CCI"
+struct cci_drvdata {
+	void __iomem *baseaddr;
+};
+
+static struct cci_drvdata *info;
+
+void disable_cci(int cluster)
+{
+	u32 slave_reg = cluster ? CCI400_KF_OFFSET : CCI400_EAG_OFFSET;
+	writel_relaxed(0x0, info->baseaddr + slave_reg);
+
+	while (readl_relaxed(info->baseaddr + CCI_STATUS_OFFSET)
+						& STATUS_CHANGE_PENDING)
+			barrier();
+}
+EXPORT_SYMBOL_GPL(disable_cci);
+
+static int cci_driver_probe(struct platform_device *pdev)
+{
+	struct resource *res;
+	int ret = 0;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		dev_err(&pdev->dev, "unable to allocate mem\n");
+		return -ENOMEM;
+	}
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(&pdev->dev, "No memory resource\n");
+		ret = -EINVAL;
+		goto mem_free;
+	}
+
+	if (!request_mem_region(res->start, resource_size(res),
+				dev_name(&pdev->dev))) {
+		dev_err(&pdev->dev, "address 0x%x in use\n", (u32) res->start);
+		ret = -EBUSY;
+		goto mem_free;
+	}
+
+	info->baseaddr = ioremap(res->start, resource_size(res));
+	if (!info->baseaddr) {
+		ret = -EADDRNOTAVAIL;
+		goto ioremap_err;
+	}
+
+	platform_set_drvdata(pdev, info);
+
+	pr_info("CCI loaded at %p\n", info->baseaddr);
+	return ret;
+
+ioremap_err:
+	release_region(res->start, resource_size(res));
+mem_free:
+	kfree(info);
+
+	return ret;
+}
+
+static const struct of_device_id arm_cci_matches[] = {
+	{.compatible = "arm,cci"},
+	{},
+};
+
+static struct platform_driver cci_platform_driver = {
+	.driver = {
+		   .name = DRIVER_NAME,
+		   .of_match_table = arm_cci_matches,
+		  },
+	.probe = cci_driver_probe,
+};
+
+static int __init cci_init(void)
+{
+	return platform_driver_register(&cci_platform_driver);
+}
+
+core_initcall(cci_init);
diff --git a/include/linux/arm-cci.h b/include/linux/arm-cci.h
new file mode 100644
index 0000000000..86ae587817
--- /dev/null
+++ b/include/linux/arm-cci.h
@@ -0,0 +1,30 @@
+/*
+ * CCI support
+ *
+ * Copyright (C) 2012-2013 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __LINUX_ARM_CCI_H
+#define __LINUX_ARM_CCI_H
+
+#ifdef CONFIG_ARM_CCI
+extern void disable_cci(int cluster);
+#else
+static inline void disable_cci(int cluster) { }
+#endif
+
+#endif
-- 
1.8.1.2


* [PATCH v4 12/15] ARM: CCI: ensure powerdown-time data is flushed from cache
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (10 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 11/15] drivers/bus: add ARM CCI support Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-04-23 19:40   ` Russell King - ARM Linux
  2013-02-05  5:22 ` [PATCH v4 13/15] ARM: vexpress/dcscb: handle platform coherency exit/setup and CCI Nicolas Pitre
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Dave Martin <dave.martin@linaro.org>

Non-local variables used by the CCI management function called after
disabling the cache must be flushed out to main memory in advance,
otherwise incoherency of those values may occur if they are sitting
in the cache of some other CPU when disable_cci() executes.

This patch adds the appropriate flushing to the CCI driver to ensure
that the relevant data is available in RAM ahead of time.

Because this creates a dependency on arch-specific cacheflushing
functions, this patch also makes ARM_CCI depend on ARM.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 drivers/bus/Kconfig   |  1 +
 drivers/bus/arm-cci.c | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index d032f74ff2..cd4ac9f001 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -22,5 +22,6 @@ config OMAP_INTERCONNECT
 
 config ARM_CCI
        bool "ARM CCI driver support"
+	depends on ARM
 
 endmenu
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 11c1513440..3bdab2c135 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -21,6 +21,10 @@
 #include <linux/slab.h>
 #include <linux/arm-cci.h>
 
+#include <asm/cacheflush.h>
+#include <asm/memory.h>
+#include <asm/outercache.h>
+
 
 #define CCI_STATUS_OFFSET	0xc
 #define STATUS_CHANGE_PENDING	(1 << 0)
@@ -78,6 +82,15 @@ static int cci_driver_probe(struct platform_device *pdev)
 		goto ioremap_err;
 	}
 
+	/*
+	 * Multi-cluster systems may need this data when non-coherent, during
+	 * cluster power-up/power-down. Make sure it reaches main memory:
+	 */
+	__cpuc_flush_dcache_area(info, sizeof *info);
+	__cpuc_flush_dcache_area(&info, sizeof info);
+	outer_clean_range(virt_to_phys(info), virt_to_phys(info + 1));
+	outer_clean_range(virt_to_phys(&info), virt_to_phys(&info + 1));
+
 	platform_set_drvdata(pdev, info);
 
 	pr_info("CCI loaded at %p\n", info->baseaddr);
-- 
1.8.1.2


* [PATCH v4 13/15] ARM: vexpress/dcscb: handle platform coherency exit/setup and CCI
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (11 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 12/15] ARM: CCI: ensure powerdown-time data is flushed from cache Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time Nicolas Pitre
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Dave Martin <dave.martin@linaro.org>

Add the required code to the DCSCB power down method to properly
handle race-free platform coherency exit.

The power_up_setup callback is used to enable the CCI interface for
the cluster being brought up.  This must be done in assembly before
the kernel environment is entered.
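
The callback is invoked with the MMU and caches still off, once per
affinity level being brought up (a sketch of the contract, as
implemented by dcscb_power_up_setup below):

	void power_up_setup(unsigned int affinity_level);
		/* affinity_level 0: CPU-level setup (nothing to do here) */
		/* affinity_level 1: cluster-level setup: enable CCI      */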

Thanks to Achin Gupta and Nicolas Pitre for their help and
contributions.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-vexpress/Kconfig       |  1 +
 arch/arm/mach-vexpress/Makefile      |  2 +-
 arch/arm/mach-vexpress/dcscb.c       | 74 ++++++++++++++++++++++++---------
 arch/arm/mach-vexpress/dcscb_setup.S | 80 ++++++++++++++++++++++++++++++++++++
 4 files changed, 137 insertions(+), 20 deletions(-)
 create mode 100644 arch/arm/mach-vexpress/dcscb_setup.S

diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig
index f3f92b120a..f8fbe7c6a2 100644
--- a/arch/arm/mach-vexpress/Kconfig
+++ b/arch/arm/mach-vexpress/Kconfig
@@ -55,6 +55,7 @@ config ARCH_VEXPRESS_CA9X4
 config ARCH_VEXPRESS_DCSCB
 	bool "Dual Cluster System Control Block (DCSCB) support"
 	depends on CLUSTER_PM
+	select ARM_CCI
 	help
 	  Support for the Dual Cluster System Configuration Block (DCSCB).
 	  This is needed to provide CPU and cluster power management
diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile
index 2253644054..f6e90f3272 100644
--- a/arch/arm/mach-vexpress/Makefile
+++ b/arch/arm/mach-vexpress/Makefile
@@ -6,6 +6,6 @@ ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := -I$(srctree)/$(src)/include \
 
 obj-y					:= v2m.o reset.o
 obj-$(CONFIG_ARCH_VEXPRESS_CA9X4)	+= ct-ca9x4.o
-obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)	+= dcscb.o
+obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)	+= dcscb.o	dcscb_setup.o
 obj-$(CONFIG_SMP)			+= platsmp.o
 obj-$(CONFIG_HOTPLUG_CPU)		+= hotplug.o
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 7be784a830..098e615d3f 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -16,6 +16,7 @@
 #include <linux/errno.h>
 #include <linux/of_address.h>
 #include <linux/vexpress.h>
+#include <linux/arm-cci.h>
 
 #include <asm/mcpm_entry.h>
 #include <asm/proc-fns.h>
@@ -105,6 +106,8 @@ static void dcscb_power_down(void)
 	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
 	BUG_ON(cpu >= 4 || cluster >= 2);
 
+	__mcpm_cpu_going_down(cpu, cluster);
+
 	arch_spin_lock(&dcscb_lock);
 	dcscb_use_count[cpu][cluster]--;
 	if (dcscb_use_count[cpu][cluster] == 0) {
@@ -112,6 +115,7 @@ static void dcscb_power_down(void)
 		rst_hold |= cpumask;
 		if (((rst_hold | (rst_hold >> 4)) & mcpm_mask) == mcpm_mask) {
 			rst_hold |= (1 << 8);
+			BUG_ON(__mcpm_mcpm_state(cluster) != CLUSTER_UP);
 			last_man = true;
 		}
 		writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
@@ -125,31 +129,59 @@ static void dcscb_power_down(void)
 		skip_wfi = true;
 	} else
 		BUG();
-	arch_spin_unlock(&dcscb_lock);
 
-	/*
-	 * Now let's clean our L1 cache and shut ourself down.
-	 * If we're the last CPU in this cluster then clean L2 too.
-	 */
-
-	/*
-	 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
-	 * a preliminary flush here for those CPUs.  At least, that's
-	 * the theory -- without the extra flush, Linux explodes on
-	 * RTSM (maybe not needed anymore, to be investigated)..
-	 */
-	flush_cache_louis();
-	set_cr(get_cr() & ~CR_C);
+	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
+		arch_spin_unlock(&dcscb_lock);
 
-	if (!last_man) {
-		flush_cache_louis();
-	} else {
+		/*
+		 * Flush all cache levels for this cluster.
+		 *
+		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
+		 * a preliminary flush here for those CPUs.  At least, that's
+		 * the theory -- without the extra flush, Linux explodes on
+		 * RTSM (maybe not needed anymore, to be investigated).
+		 */
 		flush_cache_all();
+		set_cr(get_cr() & ~CR_C);
+		flush_cache_all();
+
+		/*
+		 * This is a harmless no-op.  On platforms with a real
+		 * outer cache this might either be needed or not,
+		 * depending on where the outer cache sits.
+		 */
 		outer_flush_all();
+
+		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
+		set_auxcr(get_auxcr() & ~(1 << 6));
+
+		/*
+		 * Disable cluster-level coherency by masking
+		 * incoming snoops and DVM messages:
+		 */
+		disable_cci(cluster);
+
+		__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
+	} else {
+		arch_spin_unlock(&dcscb_lock);
+
+		/*
+		 * Flush the local CPU cache.
+		 *
+		 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
+		 * a preliminary flush here for those CPUs.  At least, that's
+		 * the theory -- without the extra flush, Linux explodes on
+		 * RTSM (maybe not needed anymore, to be investigated).
+		 */
+		flush_cache_louis();
+		set_cr(get_cr() & ~CR_C);
+		flush_cache_louis();
+
+		/* Disable local coherency by clearing the ACTLR "SMP" bit: */
+		set_auxcr(get_auxcr() & ~(1 << 6));
 	}
 
-	/* Disable local coherency by clearing the ACTLR "SMP" bit: */
-	set_auxcr(get_auxcr() & ~(1 << 6));
+	__mcpm_cpu_down(cpu, cluster);
 
 	/* Now we are prepared for power-down, do it: */
 	if (!skip_wfi) {
@@ -178,6 +210,8 @@ static void __init dcscb_usage_count_init(void)
 	dcscb_use_count[cpu][cluster] = 1;
 }
 
+extern void dcscb_power_up_setup(unsigned int affinity_level);
+
 static int __init dcscb_init(void)
 {
 	struct device_node *node;
@@ -196,6 +230,8 @@ static int __init dcscb_init(void)
 	dcscb_usage_count_init();
 
 	ret = mcpm_platform_register(&dcscb_power_ops);
+	if (!ret)
+		ret = mcpm_sync_init(dcscb_power_up_setup);
 	if (ret) {
 		iounmap(dcscb_base);
 		return ret;
diff --git a/arch/arm/mach-vexpress/dcscb_setup.S b/arch/arm/mach-vexpress/dcscb_setup.S
new file mode 100644
index 0000000000..cac033b982
--- /dev/null
+++ b/arch/arm/mach-vexpress/dcscb_setup.S
@@ -0,0 +1,80 @@
+/*
+ * arch/arm/include/asm/dcscb_setup.S
+ *
+ * Created by:  Dave Martin, 2012-06-22
+ * Copyright:   (C) 2012-2013  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+
+#include <linux/linkage.h>
+#include <asm/mcpm_entry.h>
+
+
+#define SLAVE_SNOOPCTL_OFFSET	0
+#define SNOOPCTL_SNOOP_ENABLE	(1 << 0)
+#define SNOOPCTL_DVM_ENABLE	(1 << 1)
+
+#define CCI_STATUS_OFFSET	0xc
+#define STATUS_CHANGE_PENDING	(1 << 0)
+
+#define CCI_SLAVE_OFFSET(n)	(0x1000 + 0x1000 * (n))
+
+#define RTSM_CCI_PHYS_BASE	0x2c090000
+#define RTSM_CCI_SLAVE_A15	3
+#define RTSM_CCI_SLAVE_A7	4
+
+#define RTSM_CCI_A15_OFFSET	CCI_SLAVE_OFFSET(RTSM_CCI_SLAVE_A15)
+#define RTSM_CCI_A7_OFFSET	CCI_SLAVE_OFFSET(RTSM_CCI_SLAVE_A7)
+
+
+ENTRY(dcscb_power_up_setup)
+
+	cmp	r0, #0			@ check affinity level
+	beq	2f
+
+/*
+ * Enable cluster-level coherency, in preparation for turning on the MMU.
+ * The ACTLR SMP bit does not need to be set here, because cpu_resume()
+ * already restores that.
+ */
+
+	mrc	p15, 0, r0, c0, c0, 5	@ MPIDR
+	ubfx	r0, r0, #8, #4		@ cluster
+
+	@ A15/A7 may not require explicit L2 invalidation on reset, depending
+	@ on hardware integration decisions.
+	@ For now, this code assumes that L2 is either already invalidated, or
+	@ invalidation is not required.
+
+	ldr	r3, =RTSM_CCI_PHYS_BASE + RTSM_CCI_A15_OFFSET
+	cmp	r0, #0		@ A15 cluster?
+	addne	r3, r3, #RTSM_CCI_A7_OFFSET - RTSM_CCI_A15_OFFSET
+
+	@ r3 now points to the correct CCI slave register block
+
+	ldr	r0, [r3, #SLAVE_SNOOPCTL_OFFSET]
+	orr	r0, r0, #SNOOPCTL_SNOOP_ENABLE | SNOOPCTL_DVM_ENABLE
+	str	r0, [r3, #SLAVE_SNOOPCTL_OFFSET]	@ enable CCI snoops
+
+	@ Wait for snoop control change to complete:
+
+	ldr	r3, =RTSM_CCI_PHYS_BASE
+
+1:	ldr	r0, [r3, #CCI_STATUS_OFFSET]
+	tst	r0, #STATUS_CHANGE_PENDING
+	bne	1b
+
+	dsb		@ Synchronise side-effects of enabling CCI
+
+	bx	lr
+
+2:	@ Implementation-specific local CPU setup operations should go here,
+	@ if any.  In this case, there is nothing to do.
+
+	bx	lr
+
+ENDPROC(dcscb_power_up_setup)
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread
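
A note for readers less at home with the assembler above: the cluster-level
path of dcscb_power_up_setup boils down to a read-modify-write of the CCI
snoop control register followed by a poll of the status register.  The
sketch below is a rough C rendering of that path, purely illustrative --
the real routine runs with the MMU and caches still off, before any C
environment exists, which is why it must stay in assembly.  The constants
mirror the #defines in the patch; the function name is invented.

	/*
	 * Illustrative C rendering of the cluster-level path of
	 * dcscb_power_up_setup (hypothetical: the real code runs with
	 * the MMU off and cannot be written in C).
	 */
	static void rtsm_cci_enable_cluster(unsigned int cluster)
	{
		/* On RTSM, the A15 cluster is CCI slave 3, the A7 slave 4 */
		unsigned long slave_off = cluster == 0 ?
			RTSM_CCI_A15_OFFSET : RTSM_CCI_A7_OFFSET;
		volatile u32 *snoopctl = (volatile u32 *)(RTSM_CCI_PHYS_BASE +
				slave_off + SLAVE_SNOOPCTL_OFFSET);
		volatile u32 *status = (volatile u32 *)(RTSM_CCI_PHYS_BASE +
				CCI_STATUS_OFFSET);

		/* Enable snoop and DVM message forwarding for this cluster */
		*snoopctl |= SNOOPCTL_SNOOP_ENABLE | SNOOPCTL_DVM_ENABLE;

		/* Spin until the CCI has applied the change */
		while (*status & STATUS_CHANGE_PENDING)
			;

		dsb();	/* synchronise the side-effects of enabling the CCI */
	}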

* [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (12 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 13/15] ARM: vexpress/dcscb: handle platform coherency exit/setup and CCI Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-04-05 22:43   ` Olof Johansson
  2013-04-09 16:30   ` Nicolas Pitre
  2013-02-05  5:22 ` [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required Nicolas Pitre
  2013-04-23 20:04 ` [PATCH v4 00/15] multi-cluster power management Russell King - ARM Linux
  15 siblings, 2 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Jon Medhurst <tixy@linaro.org>

Add a new 'smp_init' hook to machine_desc so platforms can specify a
function to be used to setup smp ops instead of having a statically
defined value.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/include/asm/mach/arch.h | 3 +++
 arch/arm/kernel/setup.c          | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 917d4fcfd9..3d01c6d6c3 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -17,8 +17,10 @@ struct pt_regs;
 struct smp_operations;
 #ifdef CONFIG_SMP
 #define smp_ops(ops) (&(ops))
+#define smp_init_ops(ops) (&(ops))
 #else
 #define smp_ops(ops) (struct smp_operations *)NULL
+#define smp_init_ops(ops) (void (*)(void))NULL
 #endif
 
 struct machine_desc {
@@ -42,6 +44,7 @@ struct machine_desc {
 	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
 	char			restart_mode;	/* default restart mode	*/
 	struct smp_operations	*smp;		/* SMP operations	*/
+	void			(*smp_init)(void);
 	void			(*fixup)(struct tag *, char **,
 					 struct meminfo *);
 	void			(*reserve)(void);/* reserve mem blocks	*/
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 3f6cbb2e3e..41edca8582 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -768,7 +768,10 @@ void __init setup_arch(char **cmdline_p)
 	arm_dt_init_cpu_maps();
 #ifdef CONFIG_SMP
 	if (is_smp()) {
-		smp_set_ops(mdesc->smp);
+		if(mdesc->smp_init)
+			(*mdesc->smp_init)();
+		else
+			smp_set_ops(mdesc->smp);
 		smp_init_cpus();
 	}
 #endif
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (13 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time Nicolas Pitre
@ 2013-02-05  5:22 ` Nicolas Pitre
  2013-02-06 16:38   ` Pawel Moll
                     ` (2 more replies)
  2013-04-23 20:04 ` [PATCH v4 00/15] multi-cluster power management Russell King - ARM Linux
  15 siblings, 3 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-05  5:22 UTC (permalink / raw)
  To: linux-arm-kernel

From: Jon Medhurst <tixy@linaro.org>

Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-vexpress/core.h    |  2 ++
 arch/arm/mach-vexpress/platsmp.c | 12 ++++++++++++
 arch/arm/mach-vexpress/v2m.c     |  2 +-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
index f134cd4a85..3a761fd76c 100644
--- a/arch/arm/mach-vexpress/core.h
+++ b/arch/arm/mach-vexpress/core.h
@@ -6,6 +6,8 @@
 
 void vexpress_dt_smp_map_io(void);
 
+void vexpress_smp_init_ops(void);
+
 extern struct smp_operations	vexpress_smp_ops;
 
 extern void vexpress_cpu_die(unsigned int cpu);
diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
index c5d70de9bb..667344b479 100644
--- a/arch/arm/mach-vexpress/platsmp.c
+++ b/arch/arm/mach-vexpress/platsmp.c
@@ -12,6 +12,7 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 #include <linux/io.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/vexpress.h>
 
@@ -206,3 +207,14 @@ struct smp_operations __initdata vexpress_smp_ops = {
 	.cpu_die		= vexpress_cpu_die,
 #endif
 };
+
+void __init vexpress_smp_init_ops(void)
+{
+	struct smp_operations *ops = &vexpress_smp_ops;
+#ifdef CONFIG_CLUSTER_PM
+	extern struct smp_operations mcpm_smp_ops;
+	if(of_find_compatible_node(NULL, NULL, "arm,cci"))
+		ops = &mcpm_smp_ops;
+#endif
+	smp_set_ops(ops);
+}
diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
index 011661a6c5..34172bd504 100644
--- a/arch/arm/mach-vexpress/v2m.c
+++ b/arch/arm/mach-vexpress/v2m.c
@@ -494,7 +494,7 @@ static const char * const v2m_dt_match[] __initconst = {
 
 DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
 	.dt_compat	= v2m_dt_match,
-	.smp		= smp_ops(vexpress_smp_ops),
+	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
 	.map_io		= v2m_dt_map_io,
 	.init_early	= v2m_dt_init_early,
 	.init_irq	= v2m_dt_init_irq,
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-02-05  5:22 ` [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required Nicolas Pitre
@ 2013-02-06 16:38   ` Pawel Moll
  2013-02-06 17:55     ` Nicolas Pitre
  2013-04-05 22:48   ` Olof Johansson
  2013-04-23 19:42   ` Russell King - ARM Linux
  2 siblings, 1 reply; 55+ messages in thread
From: Pawel Moll @ 2013-02-06 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2013-02-05 at 05:22 +0000, Nicolas Pitre wrote:
> From: Jon Medhurst <tixy@linaro.org>
> 
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/mach-vexpress/core.h    |  2 ++
>  arch/arm/mach-vexpress/platsmp.c | 12 ++++++++++++
>  arch/arm/mach-vexpress/v2m.c     |  2 +-
>  3 files changed, 15 insertions(+), 1 deletion(-)

For this and the other "ARM: vexpress" patches in the series:

Acked-by: Pawel Moll <pawel.moll@arm.com>

Thanks!

Pawel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-02-06 16:38   ` Pawel Moll
@ 2013-02-06 17:55     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-06 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 6 Feb 2013, Pawel Moll wrote:

> On Tue, 2013-02-05 at 05:22 +0000, Nicolas Pitre wrote:
> > From: Jon Medhurst <tixy@linaro.org>
> > 
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > ---
> >  arch/arm/mach-vexpress/core.h    |  2 ++
> >  arch/arm/mach-vexpress/platsmp.c | 12 ++++++++++++
> >  arch/arm/mach-vexpress/v2m.c     |  2 +-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> For this and the other "ARM: vexpress" patches in the series:
> 
> Acked-by: Pawel Moll <pawel.moll@arm.com>

Thanks.


Nicolas

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support
  2013-02-05  5:22 ` [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support Nicolas Pitre
@ 2013-02-07 18:14   ` Catalin Marinas
  2013-02-07 18:56     ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Catalin Marinas @ 2013-02-07 18:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nico,

On Tue, Feb 05, 2013 at 05:22:05AM +0000, Nicolas Pitre wrote:
> +static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
> +{
> +	unsigned int rst_hold, cpumask = (1 << cpu);
> +
> +	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
> +	if (cpu >= 4 || cluster >= 2)
> +		return -EINVAL;
> +
> +	/*
> +	 * Since this is called with IRQs enabled, and no arch_spin_lock_irq
> +	 * variant exists, we need to disable IRQs manually here.
> +	 */
> +	local_irq_disable();
> +	arch_spin_lock(&dcscb_lock);
> +
> +	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
> +	if (rst_hold & (1 << 8)) {
> +		/* remove cluster reset and add individual CPU's reset */
> +		rst_hold &= ~(1 << 8);
> +		rst_hold |= 0xf;
> +	}
> +	rst_hold &= ~(cpumask | (cpumask << 4));
> +	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);

Why do you mix relaxed and non-relaxed I/O accessors here? Do you need
the barriers implied by writel()?

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread
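
Some background on the accessor question above: on ARM the non-relaxed
MMIO accessors wrap the relaxed ones with barriers, roughly as in the
simplified sketch below (based on arch/arm/include/asm/io.h of that era,
not the verbatim definitions).

	#define readl(c)	({ u32 __v = readl_relaxed(c); __iormb(); __v; })
	#define writel(v, c)	({ __iowmb(); writel_relaxed(v, c); })

In other words, writel() implies a write barrier before the store.  When
the read-modify-write sequence is already ordered by the surrounding
spinlock and no DMA is involved, using the relaxed variant on both sides
avoids a redundant barrier, which is the change agreed to in the reply
below.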

* [PATCH v4 08/15] ARM: vexpress: introduce DCSCB support
  2013-02-07 18:14   ` Catalin Marinas
@ 2013-02-07 18:56     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-02-07 18:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 7 Feb 2013, Catalin Marinas wrote:

> Hi Nico,
> 
> On Tue, Feb 05, 2013 at 05:22:05AM +0000, Nicolas Pitre wrote:
> > +static int dcscb_power_up(unsigned int cpu, unsigned int cluster)
> > +{
> > +	unsigned int rst_hold, cpumask = (1 << cpu);
> > +
> > +	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
> > +	if (cpu >= 4 || cluster >= 2)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Since this is called with IRQs enabled, and no arch_spin_lock_irq
> > +	 * variant exists, we need to disable IRQs manually here.
> > +	 */
> > +	local_irq_disable();
> > +	arch_spin_lock(&dcscb_lock);
> > +
> > +	rst_hold = readl_relaxed(dcscb_base + RST_HOLD0 + cluster * 4);
> > +	if (rst_hold & (1 << 8)) {
> > +		/* remove cluster reset and add individual CPU's reset */
> > +		rst_hold &= ~(1 << 8);
> > +		rst_hold |= 0xf;
> > +	}
> > +	rst_hold &= ~(cpumask | (cpumask << 4));
> > +	writel(rst_hold, dcscb_base + RST_HOLD0 + cluster * 4);
> 
> Why do you mix relaxed and non-relaxed I/O accessors here? Do you need
> the barriers implied by writel()?

Most likely not.  I'll change that.

For the record, this patch is not part of the pull request I sent 
recently.


Nicolas

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time
  2013-02-05  5:22 ` [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time Nicolas Pitre
@ 2013-04-05 22:43   ` Olof Johansson
  2013-04-06 13:43     ` Nicolas Pitre
  2013-04-09 16:30   ` Nicolas Pitre
  1 sibling, 1 reply; 55+ messages in thread
From: Olof Johansson @ 2013-04-05 22:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:11AM -0500, Nicolas Pitre wrote:
> From: Jon Medhurst <tixy@linaro.org>
> 
> Add a new 'smp_init' hook to machine_desc so platforms can specify a
> function to be used to setup smp ops instead of having a statically
> defined value.
> 
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/include/asm/mach/arch.h | 3 +++
>  arch/arm/kernel/setup.c          | 5 ++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> index 917d4fcfd9..3d01c6d6c3 100644
> --- a/arch/arm/include/asm/mach/arch.h
> +++ b/arch/arm/include/asm/mach/arch.h
> @@ -17,8 +17,10 @@ struct pt_regs;
>  struct smp_operations;
>  #ifdef CONFIG_SMP
>  #define smp_ops(ops) (&(ops))
> +#define smp_init_ops(ops) (&(ops))
>  #else
>  #define smp_ops(ops) (struct smp_operations *)NULL
> +#define smp_init_ops(ops) (void (*)(void))NULL
>  #endif
>  
>  struct machine_desc {
> @@ -42,6 +44,7 @@ struct machine_desc {
>  	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
>  	char			restart_mode;	/* default restart mode	*/
>  	struct smp_operations	*smp;		/* SMP operations	*/
> +	void			(*smp_init)(void);
>  	void			(*fixup)(struct tag *, char **,
>  					 struct meminfo *);
>  	void			(*reserve)(void);/* reserve mem blocks	*/
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 3f6cbb2e3e..41edca8582 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -768,7 +768,10 @@ void __init setup_arch(char **cmdline_p)
>  	arm_dt_init_cpu_maps();
>  #ifdef CONFIG_SMP
>  	if (is_smp()) {
> -		smp_set_ops(mdesc->smp);
> +		if(mdesc->smp_init)

This will fail checkpatch: if() instead of if ().

> +			(*mdesc->smp_init)();

This is different calling style than init_early() below, which uses
mdesc->init_early(). Please be consistent.

> +		else
> +			smp_set_ops(mdesc->smp);


-Olof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-02-05  5:22 ` [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required Nicolas Pitre
  2013-02-06 16:38   ` Pawel Moll
@ 2013-04-05 22:48   ` Olof Johansson
  2013-04-06 14:02     ` Nicolas Pitre
  2013-04-23 19:42   ` Russell King - ARM Linux
  2 siblings, 1 reply; 55+ messages in thread
From: Olof Johansson @ 2013-04-05 22:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:12AM -0500, Nicolas Pitre wrote:
> From: Jon Medhurst <tixy@linaro.org>
> 
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/mach-vexpress/core.h    |  2 ++
>  arch/arm/mach-vexpress/platsmp.c | 12 ++++++++++++
>  arch/arm/mach-vexpress/v2m.c     |  2 +-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
> index f134cd4a85..3a761fd76c 100644
> --- a/arch/arm/mach-vexpress/core.h
> +++ b/arch/arm/mach-vexpress/core.h
> @@ -6,6 +6,8 @@
>  
>  void vexpress_dt_smp_map_io(void);
>  
> +void vexpress_smp_init_ops(void);
> +
>  extern struct smp_operations	vexpress_smp_ops;
>  
>  extern void vexpress_cpu_die(unsigned int cpu);
> diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
> index c5d70de9bb..667344b479 100644
> --- a/arch/arm/mach-vexpress/platsmp.c
> +++ b/arch/arm/mach-vexpress/platsmp.c
> @@ -12,6 +12,7 @@
>  #include <linux/errno.h>
>  #include <linux/smp.h>
>  #include <linux/io.h>
> +#include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/vexpress.h>
>  
> @@ -206,3 +207,14 @@ struct smp_operations __initdata vexpress_smp_ops = {
>  	.cpu_die		= vexpress_cpu_die,
>  #endif
>  };
> +
> +void __init vexpress_smp_init_ops(void)
> +{
> +	struct smp_operations *ops = &vexpress_smp_ops;
> +#ifdef CONFIG_CLUSTER_PM
> +	extern struct smp_operations mcpm_smp_ops;

Seems appropriate to put this prototype in a header file instead.

> +	if(of_find_compatible_node(NULL, NULL, "arm,cci"))

Another checkpatch error on if() whitespace.

Also, while bindings haven't been ironed out, checking for whether the
node/device is enabled or disabled could be appropriate here?

> +		ops = &mcpm_smp_ops;
> +#endif
> +	smp_set_ops(ops);
> +}
> diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
> index 011661a6c5..34172bd504 100644
> --- a/arch/arm/mach-vexpress/v2m.c
> +++ b/arch/arm/mach-vexpress/v2m.c
> @@ -494,7 +494,7 @@ static const char * const v2m_dt_match[] __initconst = {
>  
>  DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
>  	.dt_compat	= v2m_dt_match,
> -	.smp		= smp_ops(vexpress_smp_ops),
> +	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
>  	.map_io		= v2m_dt_map_io,
>  	.init_early	= v2m_dt_init_early,
>  	.init_irq	= v2m_dt_init_irq,


-Olof

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup
  2013-02-05  5:22 ` [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup Nicolas Pitre
@ 2013-04-05 23:00   ` Olof Johansson
  2013-04-06 13:41     ` Nicolas Pitre
  2013-04-24  9:10     ` Dave Martin
  0 siblings, 2 replies; 55+ messages in thread
From: Olof Johansson @ 2013-04-05 23:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Just two nits below. One could be fixed incrementally, the other is
a longer-term potential cleanup.

On Tue, Feb 05, 2013 at 12:22:00AM -0500, Nicolas Pitre wrote:

> diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> index 3286d5eb91..e76652209d 100644
> --- a/arch/arm/include/asm/mcpm_entry.h
> +++ b/arch/arm/include/asm/mcpm_entry.h
> @@ -15,8 +15,37 @@
>  #define MAX_CPUS_PER_CLUSTER	4
>  #define MAX_NR_CLUSTERS		2
>  
> +/* Definitions for mcpm_sync_struct */
> +#define CPU_DOWN		0x11
> +#define CPU_COMING_UP		0x12
> +#define CPU_UP			0x13
> +#define CPU_GOING_DOWN		0x14
> +
> +#define CLUSTER_DOWN		0x21
> +#define CLUSTER_UP		0x22
> +#define CLUSTER_GOING_DOWN	0x23
> +
> +#define INBOUND_NOT_COMING_UP	0x31
> +#define INBOUND_COMING_UP	0x32
> +
> +/* This is a complete guess. */
> +#define __CACHE_WRITEBACK_ORDER	6
> +#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)

Something a little more educational could be useful here. It needs to
be the max writeback/line size of the supported platforms of the binary
kernel, i.e. if someone builds an SoC with 128-byte L2 cache lines this
guess will break.

> +/* Offsets for the mcpm_sync_struct members, for use in asm: */
> +#define MCPM_SYNC_CLUSTER_CPUS	0
> +#define MCPM_SYNC_CPU_SIZE	__CACHE_WRITEBACK_GRANULE
> +#define MCPM_SYNC_CLUSTER_CLUSTER \
> +	(MCPM_SYNC_CLUSTER_CPUS + MCPM_SYNC_CPU_SIZE * MAX_CPUS_PER_CLUSTER)
> +#define MCPM_SYNC_CLUSTER_INBOUND \
> +	(MCPM_SYNC_CLUSTER_CLUSTER + __CACHE_WRITEBACK_GRANULE)
> +#define MCPM_SYNC_CLUSTER_SIZE \
> +	(MCPM_SYNC_CLUSTER_INBOUND + __CACHE_WRITEBACK_GRANULE)
> +
>  #ifndef __ASSEMBLY__
>  
> +#include <linux/types.h>
> +
>  /*
>   * Platform specific code should use this symbol to set up secondary
>   * entry location for processors to use when released from reset.
> @@ -123,5 +152,39 @@ struct mcpm_platform_ops {
>   */
>  int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
>  
> +/* Synchronisation structures for coordinating safe cluster setup/teardown: */
> +
> +/*
> + * When modifying this structure, make sure you update the MCPM_SYNC_ defines
> + * to match.
> + */

It would be nice to introduce something like arch/powerpc/kernel/asm-offsets.c
on ARM to generate the ASM-side automatically from the C struct instead for
this and other cases, but that's unrelated to this addition.

> +struct mcpm_sync_struct {
> +	/* individual CPU states */
> +	struct {
> +		volatile s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
> +	} cpus[MAX_CPUS_PER_CLUSTER];
> +
> +	/* cluster state */
> +	volatile s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);
> +
> +	/* inbound-side state */
> +	volatile s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
> +};


-Olof

^ permalink raw reply	[flat|nested] 55+ messages in thread
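
To spell out the hazard the alignment guards against: a CPU running with
its caches on dirties a whole cache line, and when that line is evicted it
is written back in full.  If another CPU, running with its caches off, has
meanwhile updated an adjacent variable directly in DRAM, the writeback
would overwrite it with stale data.  A hypothetical interleaving, assuming
two of the state bytes shared one line:

	/*
	 * Hypothetical failure if two state bytes shared a cache line:
	 *
	 *   CPU0 (caches on)  updates sync.cpus[0].cpu in its own cache
	 *   CPU1 (caches off) updates sync.cpus[1].cpu directly in DRAM
	 *   CPU0's dirty line is evicted and written back whole, restoring
	 *   the stale copy of sync.cpus[1].cpu it was holding -- CPU1's
	 *   update is silently lost.
	 *
	 * Padding every variable out to __CACHE_WRITEBACK_GRANULE, as the
	 * struct above does, guarantees a writeback only ever carries one
	 * of the state variables.
	 */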

* [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup
  2013-04-05 23:00   ` Olof Johansson
@ 2013-04-06 13:41     ` Nicolas Pitre
  2013-04-24  9:10     ` Dave Martin
  1 sibling, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-06 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 5 Apr 2013, Olof Johansson wrote:

> Hi,
> 
> Just two nits below. One could be fixed incrementally, the other is
> a longer-term potential cleanup.

I've updated my branch directly.  In particular, the comment addition is
best presented in the original patch to avoid confusion.

> 
> On Tue, Feb 05, 2013 at 12:22:00AM -0500, Nicolas Pitre wrote:
> 
> > diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> > index 3286d5eb91..e76652209d 100644
> > --- a/arch/arm/include/asm/mcpm_entry.h
> > +++ b/arch/arm/include/asm/mcpm_entry.h
> > @@ -15,8 +15,37 @@
> >  #define MAX_CPUS_PER_CLUSTER	4
> >  #define MAX_NR_CLUSTERS		2
> >  
> > +/* Definitions for mcpm_sync_struct */
> > +#define CPU_DOWN		0x11
> > +#define CPU_COMING_UP		0x12
> > +#define CPU_UP			0x13
> > +#define CPU_GOING_DOWN		0x14
> > +
> > +#define CLUSTER_DOWN		0x21
> > +#define CLUSTER_UP		0x22
> > +#define CLUSTER_GOING_DOWN	0x23
> > +
> > +#define INBOUND_NOT_COMING_UP	0x31
> > +#define INBOUND_COMING_UP	0x32
> > +
> > +/* This is a complete guess. */
> > +#define __CACHE_WRITEBACK_ORDER	6
> > +#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
> 
> Something a little more educational could be useful here. It needs to
> be the max writeback/line size of the supported platforms of the binary
> kernel, i.e. if someone builds an SoC with 128-byte L2 cache lines this
> guess will break.

The commit log says:

|Also, in order to prevent a cached writer from interfering with an
|adjacent non-cached writer, we ensure each state variable is located to
|a separate cache line.

Although a more explicit comment here would be helpful indeed.

Fixed.

Yet, that part will be reworked once we move to dynamic allocation 
anyway.  But for the time being it is best to keep the code simple while 
people get familiar with it.

> > +/* Offsets for the mcpm_sync_struct members, for use in asm: */
> > +#define MCPM_SYNC_CLUSTER_CPUS	0
> > +#define MCPM_SYNC_CPU_SIZE	__CACHE_WRITEBACK_GRANULE
> > +#define MCPM_SYNC_CLUSTER_CLUSTER \
> > +	(MCPM_SYNC_CLUSTER_CPUS + MCPM_SYNC_CPU_SIZE * MAX_CPUS_PER_CLUSTER)
> > +#define MCPM_SYNC_CLUSTER_INBOUND \
> > +	(MCPM_SYNC_CLUSTER_CLUSTER + __CACHE_WRITEBACK_GRANULE)
> > +#define MCPM_SYNC_CLUSTER_SIZE \
> > +	(MCPM_SYNC_CLUSTER_INBOUND + __CACHE_WRITEBACK_GRANULE)
> > +
> >  #ifndef __ASSEMBLY__
> >  
> > +#include <linux/types.h>
> > +
> >  /*
> >   * Platform specific code should use this symbol to set up secondary
> >   * entry location for processors to use when released from reset.
> > @@ -123,5 +152,39 @@ struct mcpm_platform_ops {
> >   */
> >  int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
> >  
> > +/* Synchronisation structures for coordinating safe cluster setup/teardown: */
> > +
> > +/*
> > + * When modifying this structure, make sure you update the MCPM_SYNC_ defines
> > + * to match.
> > + */
> 
> It would be nice to introduce something like arch/powerpc/kernel/asm-offsets.c
> on ARM to generate the ASM-side automatically from the C struct instead for
> this and other cases, but that's unrelated to this addition.

There is a arch/arm/kernel/asm-offsets.c already.

Here's the answer I provided before:
http://article.gmane.org/gmane.linux.ports.arm.kernel/209009

|That's how that was done initially. But that ended up cluttering
|asm-offsets.h for stuff that actually is really a local implementation
|detail which doesn't need kernel wide scope.  In other words, the end
|result looked worse.
|
|One could argue that they are still exposed too much as the only files
|that need to know about those defines are bL_head.S and bL_entry.c.


Nicolas

^ permalink raw reply	[flat|nested] 55+ messages in thread
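
For reference, the asm-offsets mechanism mentioned here works by compiling
a C file to assembly only and post-processing marker lines into a generated
header.  A sketch of how the mcpm constants could be generated that way,
using the existing DEFINE() helper from include/linux/kbuild.h (the file
and the exact set of constants are hypothetical):

	#include <linux/kbuild.h>
	#include <linux/stddef.h>
	#include <asm/mcpm_entry.h>

	int main(void)
	{
		/* each DEFINE() emits "->NAME value" into the generated .s,
		   which the build system rewrites into a #define */
		DEFINE(MCPM_SYNC_CLUSTER_CLUSTER,
		       offsetof(struct mcpm_sync_struct, cluster));
		DEFINE(MCPM_SYNC_CLUSTER_SIZE,
		       sizeof(struct mcpm_sync_struct));
		return 0;
	}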

* [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time
  2013-04-05 22:43   ` Olof Johansson
@ 2013-04-06 13:43     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-06 13:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 5 Apr 2013, Olof Johansson wrote:

> On Tue, Feb 05, 2013 at 12:22:11AM -0500, Nicolas Pitre wrote:
> > From: Jon Medhurst <tixy@linaro.org>
> > 
> > Add a new 'smp_init' hook to machine_desc so platforms can specify a
> > function to be used to setup smp ops instead of having a statically
> > defined value.
> > 
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > ---
> >  arch/arm/include/asm/mach/arch.h | 3 +++
> >  arch/arm/kernel/setup.c          | 5 ++++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> > index 917d4fcfd9..3d01c6d6c3 100644
> > --- a/arch/arm/include/asm/mach/arch.h
> > +++ b/arch/arm/include/asm/mach/arch.h
> > @@ -17,8 +17,10 @@ struct pt_regs;
> >  struct smp_operations;
> >  #ifdef CONFIG_SMP
> >  #define smp_ops(ops) (&(ops))
> > +#define smp_init_ops(ops) (&(ops))
> >  #else
> >  #define smp_ops(ops) (struct smp_operations *)NULL
> > +#define smp_init_ops(ops) (void (*)(void))NULL
> >  #endif
> >  
> >  struct machine_desc {
> > @@ -42,6 +44,7 @@ struct machine_desc {
> >  	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
> >  	char			restart_mode;	/* default restart mode	*/
> >  	struct smp_operations	*smp;		/* SMP operations	*/
> > +	void			(*smp_init)(void);
> >  	void			(*fixup)(struct tag *, char **,
> >  					 struct meminfo *);
> >  	void			(*reserve)(void);/* reserve mem blocks	*/
> > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> > index 3f6cbb2e3e..41edca8582 100644
> > --- a/arch/arm/kernel/setup.c
> > +++ b/arch/arm/kernel/setup.c
> > @@ -768,7 +768,10 @@ void __init setup_arch(char **cmdline_p)
> >  	arm_dt_init_cpu_maps();
> >  #ifdef CONFIG_SMP
> >  	if (is_smp()) {
> > -		smp_set_ops(mdesc->smp);
> > +		if(mdesc->smp_init)
> 
> This will fail checkpatch: if() instead of if ().

Indeed.

> > +			(*mdesc->smp_init)();
> 
> This is different calling style than init_early() below, which uses
> mdesc->init_early(). Please be consistent.

Fixed in my tree now.

Thanks.


Nicolas

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-05 22:48   ` Olof Johansson
@ 2013-04-06 14:02     ` Nicolas Pitre
  2013-04-08  9:10       ` Jon Medhurst (Tixy)
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-06 14:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 5 Apr 2013, Olof Johansson wrote:

> On Tue, Feb 05, 2013 at 12:22:12AM -0500, Nicolas Pitre wrote:
> > From: Jon Medhurst <tixy@linaro.org>
> > 
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > ---
> >  arch/arm/mach-vexpress/core.h    |  2 ++
> >  arch/arm/mach-vexpress/platsmp.c | 12 ++++++++++++
> >  arch/arm/mach-vexpress/v2m.c     |  2 +-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
> > index f134cd4a85..3a761fd76c 100644
> > --- a/arch/arm/mach-vexpress/core.h
> > +++ b/arch/arm/mach-vexpress/core.h
> > @@ -6,6 +6,8 @@
> >  
> >  void vexpress_dt_smp_map_io(void);
> >  
> > +void vexpress_smp_init_ops(void);
> > +
> >  extern struct smp_operations	vexpress_smp_ops;
> >  
> >  extern void vexpress_cpu_die(unsigned int cpu);
> > diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
> > index c5d70de9bb..667344b479 100644
> > --- a/arch/arm/mach-vexpress/platsmp.c
> > +++ b/arch/arm/mach-vexpress/platsmp.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/errno.h>
> >  #include <linux/smp.h>
> >  #include <linux/io.h>
> > +#include <linux/of.h>
> >  #include <linux/of_fdt.h>
> >  #include <linux/vexpress.h>
> >  
> > @@ -206,3 +207,14 @@ struct smp_operations __initdata vexpress_smp_ops = {
> >  	.cpu_die		= vexpress_cpu_die,
> >  #endif
> >  };
> > +
> > +void __init vexpress_smp_init_ops(void)
> > +{
> > +	struct smp_operations *ops = &vexpress_smp_ops;
> > +#ifdef CONFIG_CLUSTER_PM
> > +	extern struct smp_operations mcpm_smp_ops;
> 
> Seems appropriate to put this prototype in a header file instead.
> 
> > +	if(of_find_compatible_node(NULL, NULL, "arm,cci"))
> 
> Another checkpatch error on if() whitespace.
> 
> Also, while bindings haven't been ironed out, checking for whether the
> node/device is enabled or disabled could be appropriate here?

Right.

I've amended this patch slightly so as to:

1) Keep the default .smp = &vexpress_smp_ops and only perform an 
   override in vexpress_smp_init_ops() when appropriate.  This should
   remove one potential issue with Xen support that was highlighted 
   recently.

2) Add a mcpm_smp_set_ops() instead of installing mcpm_smp_ops directly
   for better abstraction.

3) Added a comment about and checked for the CCI node being enabled.

That results in the patch below.  Given those are minor changes, I kept 
the existing review tags.

From: Jon Medhurst <tixy@linaro.org>
Date: Wed, 30 Jan 2013 09:12:55 +0000
Subject: [PATCH] ARM: vexpress: Select multi-cluster SMP operation if required

Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>

diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
index 34f236af09..cc82040877 100644
--- a/arch/arm/common/mcpm_platsmp.c
+++ b/arch/arm/common/mcpm_platsmp.c
@@ -85,3 +85,8 @@ struct smp_operations __initdata mcpm_smp_ops = {
 	.cpu_die		= mcpm_cpu_die,
 #endif
 };
+
+void __init mcpm_smp_set_ops(void)
+{
+	smp_set_ops(&mcpm_smp_ops);
+}
diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
index 814623e6a1..34dfb86ff5 100644
--- a/arch/arm/include/asm/mcpm_entry.h
+++ b/arch/arm/include/asm/mcpm_entry.h
@@ -190,5 +190,7 @@ int __mcpm_cluster_state(unsigned int cluster);
 int __init mcpm_sync_init(
 	void (*power_up_setup)(unsigned int affinity_level));
 
+void __init mcpm_smp_set_ops(void);
+
 #endif /* ! __ASSEMBLY__ */
 #endif
diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
index f134cd4a85..3a761fd76c 100644
--- a/arch/arm/mach-vexpress/core.h
+++ b/arch/arm/mach-vexpress/core.h
@@ -6,6 +6,8 @@
 
 void vexpress_dt_smp_map_io(void);
 
+void vexpress_smp_init_ops(void);
+
 extern struct smp_operations	vexpress_smp_ops;
 
 extern void vexpress_cpu_die(unsigned int cpu);
diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
index dc1ace55d5..06317bc714 100644
--- a/arch/arm/mach-vexpress/platsmp.c
+++ b/arch/arm/mach-vexpress/platsmp.c
@@ -12,9 +12,11 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 #include <linux/io.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/vexpress.h>
 
+#include <asm/mcpm_entry.h>
 #include <asm/smp_scu.h>
 #include <asm/mach/map.h>
 
@@ -203,3 +205,18 @@ struct smp_operations __initdata vexpress_smp_ops = {
 	.cpu_die		= vexpress_cpu_die,
 #endif
 };
+
+void __init vexpress_smp_init_ops(void)
+{
+#ifdef CONFIG_MCPM
+	/*
+	 * The best way to detect a multi-cluster configuration at the moment
+	 * is to look for the presence of a CCI in the system.
+	 * Override the default vexpress_smp_ops if so.
+	 */
+	struct device_node *node;
+	node = of_find_compatible_node(NULL, NULL, "arm,cci");
+	if (node && of_device_is_available(node))
+		mcpm_smp_set_ops();
+#endif
+}
diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
index 915683cb67..16b42c10e0 100644
--- a/arch/arm/mach-vexpress/v2m.c
+++ b/arch/arm/mach-vexpress/v2m.c
@@ -476,6 +476,7 @@ static const char * const v2m_dt_match[] __initconst = {
 DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
 	.dt_compat	= v2m_dt_match,
 	.smp		= smp_ops(vexpress_smp_ops),
+	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
 	.map_io		= v2m_dt_map_io,
 	.init_early	= v2m_dt_init_early,
 	.init_irq	= irqchip_init,

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-06 14:02     ` Nicolas Pitre
@ 2013-04-08  9:10       ` Jon Medhurst (Tixy)
  2013-04-09  5:41         ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Jon Medhurst (Tixy) @ 2013-04-08  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2013-04-06 at 10:02 -0400, Nicolas Pitre wrote:
> I've amended this patch slightly so as to:
> 
> 1) Keep the default .smp = &vexpress_smp_ops and only perform an 
>    override in vexpress_smp_init_ops() when appropriate.  This should
>    remove one potential issue with Xen support that was highlighted 
>    recently.
> 
> 2) Add a mcpm_smp_set_ops() instead of installing mcpm_smp_ops directly
>    for better abstraction.
>
> 3) Added a comment about and checked for the CCI node being enabled.
> 
> That results in the patch below.  Given those are minor changes, I kept 
> the existing review tags.
>
> From: Jon Medhurst <tixy@linaro.org>
> Date: Wed, 30 Jan 2013 09:12:55 +0000
> Subject: [PATCH] ARM: vexpress: Select multi-cluster SMP operation if required
> 
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Acked-by: Pawel Moll <pawel.moll@arm.com>
>
> diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
> index 34f236af09..cc82040877 100644
> --- a/arch/arm/common/mcpm_platsmp.c
> +++ b/arch/arm/common/mcpm_platsmp.c
> @@ -85,3 +85,8 @@ struct smp_operations __initdata mcpm_smp_ops = {
>  	.cpu_die		= mcpm_cpu_die,
>  #endif
>  };
> +
> +void __init mcpm_smp_set_ops(void)
> +{
> +	smp_set_ops(&mcpm_smp_ops);
> +}
> diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> index 814623e6a1..34dfb86ff5 100644
> --- a/arch/arm/include/asm/mcpm_entry.h
> +++ b/arch/arm/include/asm/mcpm_entry.h
> @@ -190,5 +190,7 @@ int __mcpm_cluster_state(unsigned int cluster);
>  int __init mcpm_sync_init(
>  	void (*power_up_setup)(unsigned int affinity_level));
>  
> +void __init mcpm_smp_set_ops(void);
> +
>  #endif /* ! __ASSEMBLY__ */
>  #endif


Do the changes to the above mcpm files want to be in a separate patch as
it's generic ARM code, not vexpress specific?

> diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
> index f134cd4a85..3a761fd76c 100644
> --- a/arch/arm/mach-vexpress/core.h
> +++ b/arch/arm/mach-vexpress/core.h
> @@ -6,6 +6,8 @@
>  
>  void vexpress_dt_smp_map_io(void);
>  
> +void vexpress_smp_init_ops(void);
> +
>  extern struct smp_operations	vexpress_smp_ops;
>  
>  extern void vexpress_cpu_die(unsigned int cpu);
> diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
> index dc1ace55d5..06317bc714 100644
> --- a/arch/arm/mach-vexpress/platsmp.c
> +++ b/arch/arm/mach-vexpress/platsmp.c
> @@ -12,9 +12,11 @@
>  #include <linux/errno.h>
>  #include <linux/smp.h>
>  #include <linux/io.h>
> +#include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/vexpress.h>
>  
> +#include <asm/mcpm_entry.h>
>  #include <asm/smp_scu.h>
>  #include <asm/mach/map.h>
>  
> @@ -203,3 +205,18 @@ struct smp_operations __initdata vexpress_smp_ops = {
>  	.cpu_die		= vexpress_cpu_die,
>  #endif
>  };
> +
> +void __init vexpress_smp_init_ops(void)
> +{
> +#ifdef CONFIG_MCPM
> +	/*
> +	 * The best way to detect a multi-cluster configuration at the moment
> +	 * is to look for the presence of a CCI in the system.
> +	 * Override the default vexpress_smp_ops if so.
> +	 */
> +	struct device_node *node;
> +	node = of_find_compatible_node(NULL, NULL, "arm,cci");
> +	if (node && of_device_is_available(node))
> +		mcpm_smp_set_ops();
> +#endif
> +}

What now sets smp_ops if mcpm_smp_set_ops doesn't get called above? I
know there are umpteen versions of the patch "ARM: Enable selection of
SMP operations at boot time", but none of them seem to resolve that
issue.

> diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
> index 915683cb67..16b42c10e0 100644
> --- a/arch/arm/mach-vexpress/v2m.c
> +++ b/arch/arm/mach-vexpress/v2m.c
> @@ -476,6 +476,7 @@ static const char * const v2m_dt_match[] __initconst = {
>  DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
>  	.dt_compat	= v2m_dt_match,
>  	.smp		= smp_ops(vexpress_smp_ops),
> +	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
>  	.map_io		= v2m_dt_map_io,
>  	.init_early	= v2m_dt_init_early,
>  	.init_irq	= irqchip_init,

-- 
Tixy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-08  9:10       ` Jon Medhurst (Tixy)
@ 2013-04-09  5:41         ` Nicolas Pitre
  2013-04-09  6:00           ` Jon Medhurst (Tixy)
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-09  5:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 8 Apr 2013, Jon Medhurst (Tixy) wrote:

> On Sat, 2013-04-06 at 10:02 -0400, Nicolas Pitre wrote:
> > I've amended this patch slightly so as to:
> > 
> > 1) Keep the default .smp = &vexpress_smp_ops and only perform an 
> >    override in vexpress_smp_init_ops() when appropriate.  This should
> >    remove one potential issue with Xen support that was highlighted 
> >    recently.
> > 
> > 2) Add a mcpm_smp_set_ops() instead of installing mcpm_smp_ops directly
> >    for better abstraction.
> >
> > 3) Added a comment about and checked for the CCI node being enabled.
> > 
> > That results in the patch below.  Given those are minor changes, I kept 
> > the existing review tags.
> >
> > From: Jon Medhurst <tixy@linaro.org>
> > Date: Wed, 30 Jan 2013 09:12:55 +0000
> > Subject: [PATCH] ARM: vexpress: Select multi-cluster SMP operation if required
> > 
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> > Acked-by: Pawel Moll <pawel.moll@arm.com>
> >
> > diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
> > index 34f236af09..cc82040877 100644
> > --- a/arch/arm/common/mcpm_platsmp.c
> > +++ b/arch/arm/common/mcpm_platsmp.c
> > @@ -85,3 +85,8 @@ struct smp_operations __initdata mcpm_smp_ops = {
> >  	.cpu_die		= mcpm_cpu_die,
> >  #endif
> >  };
> > +
> > +void __init mcpm_smp_set_ops(void)
> > +{
> > +	smp_set_ops(&mcpm_smp_ops);
> > +}
> > diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> > index 814623e6a1..34dfb86ff5 100644
> > --- a/arch/arm/include/asm/mcpm_entry.h
> > +++ b/arch/arm/include/asm/mcpm_entry.h
> > @@ -190,5 +190,7 @@ int __mcpm_cluster_state(unsigned int cluster);
> >  int __init mcpm_sync_init(
> >  	void (*power_up_setup)(unsigned int affinity_level));
> >  
> > +void __init mcpm_smp_set_ops(void);
> > +
> >  #endif /* ! __ASSEMBLY__ */
> >  #endif
> 
> 
> Do the changes to the above mcpm files want to be in a separate patch as
> it's generic ARM code, not vexpress specific?

Well, since it was so trivial I didn't do it, but the split might be a 
good idea nevertheless.  So I inserted the following patch in the 
series, and corresponding changes are now removed from the patch above.  
What do you think?

Author: Nicolas Pitre <nicolas.pitre@linaro.org>
Date:   Tue Apr 9 01:29:17 2013 -0400

    ARM: mcpm: provide an interface to set the SMP ops at run time
    
    This is cleaner than exporting the mcpm_smp_ops structure.
    
    Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
index 34f236af09..79ed70d846 100644
--- a/arch/arm/common/mcpm_platsmp.c
+++ b/arch/arm/common/mcpm_platsmp.c
@@ -76,7 +76,7 @@ static void mcpm_cpu_die(unsigned int cpu)
 
 #endif
 
-struct smp_operations __initdata mcpm_smp_ops = {
+static struct smp_operations __initdata mcpm_smp_ops = {
 	.smp_init_cpus		= simple_smp_init_cpus,
 	.smp_boot_secondary	= mcpm_boot_secondary,
 	.smp_secondary_init	= mcpm_secondary_init,
@@ -85,3 +85,8 @@ struct smp_operations __initdata mcpm_smp_ops = {
 	.cpu_die		= mcpm_cpu_die,
 #endif
 };
+
+void __init mcpm_smp_set_ops(void)
+{
+	smp_set_ops(&mcpm_smp_ops);
+}
diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
index 814623e6a1..34dfb86ff5 100644
--- a/arch/arm/include/asm/mcpm_entry.h
+++ b/arch/arm/include/asm/mcpm_entry.h
@@ -190,5 +190,7 @@ int __mcpm_cluster_state(unsigned int cluster);
 int __init mcpm_sync_init(
 	void (*power_up_setup)(unsigned int affinity_level));
 
+void __init mcpm_smp_set_ops(void);
+
 #endif /* ! __ASSEMBLY__ */
 #endif

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-09  5:41         ` Nicolas Pitre
@ 2013-04-09  6:00           ` Jon Medhurst (Tixy)
  2013-04-09 16:34             ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Jon Medhurst (Tixy) @ 2013-04-09  6:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2013-04-09 at 01:41 -0400, Nicolas Pitre wrote:
> > Do the changes to the above mcpm files want to be in a separate patch as
> > it's generic ARM code, not vexpress specific?
> 
> Well, since it was so trivial I didn't do it, but the split might be a 
> good idea nevertheless.  So I inserted the following patch in the 
> series, and corresponding changes are now removed from the patch above.  
> What do you think?

Looks fine. Did you miss my other comment about smp_ops not looking like
they were getting set in the case of mcpm_smp_set_ops not getting
called?


> Author: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date:   Tue Apr 9 01:29:17 2013 -0400
> 
>     ARM: mcpm: provide an interface to set the SMP ops at run time
>     
>     This is cleaner than exporting the mcpm_smp_ops structure.
>     
>     Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/common/mcpm_platsmp.c b/arch/arm/common/mcpm_platsmp.c
> index 34f236af09..79ed70d846 100644
> --- a/arch/arm/common/mcpm_platsmp.c
> +++ b/arch/arm/common/mcpm_platsmp.c
> @@ -76,7 +76,7 @@ static void mcpm_cpu_die(unsigned int cpu)
>  
>  #endif
>  
> -struct smp_operations __initdata mcpm_smp_ops = {
> +static struct smp_operations __initdata mcpm_smp_ops = {
>  	.smp_init_cpus		= simple_smp_init_cpus,
>  	.smp_boot_secondary	= mcpm_boot_secondary,
>  	.smp_secondary_init	= mcpm_secondary_init,
> @@ -85,3 +85,8 @@ struct smp_operations __initdata mcpm_smp_ops = {
>  	.cpu_die		= mcpm_cpu_die,
>  #endif
>  };
> +
> +void __init mcpm_smp_set_ops(void)
> +{
> +	smp_set_ops(&mcpm_smp_ops);
> +}
> diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> index 814623e6a1..34dfb86ff5 100644
> --- a/arch/arm/include/asm/mcpm_entry.h
> +++ b/arch/arm/include/asm/mcpm_entry.h
> @@ -190,5 +190,7 @@ int __mcpm_cluster_state(unsigned int cluster);
>  int __init mcpm_sync_init(
>  	void (*power_up_setup)(unsigned int affinity_level));
>  
> +void __init mcpm_smp_set_ops(void);
> +
>  #endif /* ! __ASSEMBLY__ */
>  #endif
> 

-- 
Tixy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time
  2013-02-05  5:22 ` [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time Nicolas Pitre
  2013-04-05 22:43   ` Olof Johansson
@ 2013-04-09 16:30   ` Nicolas Pitre
  2013-04-09 16:55     ` Jon Medhurst (Tixy)
  1 sibling, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-09 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 5 Feb 2013, Nicolas Pitre wrote:

> From: Jon Medhurst <tixy@linaro.org>
> 
> Add a new 'smp_init' hook to machine_desc so platforms can specify a
> function to be used to setup smp ops instead of having a statically
> defined value.
> 
> Signed-off-by: Jon Medhurst <tixy@linaro.org>
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

I've slightly amended this patch to make its usage more flexible, please 
see below.

> diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> index 917d4fcfd9..3d01c6d6c3 100644
> --- a/arch/arm/include/asm/mach/arch.h
> +++ b/arch/arm/include/asm/mach/arch.h
> @@ -17,8 +17,10 @@ struct pt_regs;
>  struct smp_operations;
>  #ifdef CONFIG_SMP
>  #define smp_ops(ops) (&(ops))
> +#define smp_init_ops(ops) (&(ops))
>  #else
>  #define smp_ops(ops) (struct smp_operations *)NULL
> +#define smp_init_ops(ops) (void (*)(void))NULL
>  #endif
>  
>  struct machine_desc {
> @@ -42,6 +44,7 @@ struct machine_desc {
>  	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
>  	char			restart_mode;	/* default restart mode	*/
>  	struct smp_operations	*smp;		/* SMP operations	*/
> +	void			(*smp_init)(void);
>  	void			(*fixup)(struct tag *, char **,
>  					 struct meminfo *);
>  	void			(*reserve)(void);/* reserve mem blocks	*/
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 3f6cbb2e3e..41edca8582 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -768,7 +768,10 @@ void __init setup_arch(char **cmdline_p)
>  	arm_dt_init_cpu_maps();
>  #ifdef CONFIG_SMP
>  	if (is_smp()) {
> -		smp_set_ops(mdesc->smp);
> +		if(mdesc->smp_init)
> +			(*mdesc->smp_init)();
> +		else
> +			smp_set_ops(mdesc->smp);
>  		smp_init_cpus();
>  	}
>  #endif

I've amended it with the following changes to deal with an issue 
highlighted by Tixy.  If the runtime hook does not initialize the smp 
ops, the core may continue with a default.  That should let MCPM, PSCI 
and Xen play well together.

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index c01bf53b85..af8c54c6c6 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -19,7 +19,7 @@ struct smp_operations;
 #define smp_init_ops(ops) (&(ops))
 #else
 #define smp_ops(ops) (struct smp_operations *)NULL
-#define smp_init_ops(ops) (void (*)(void))NULL
+#define smp_init_ops(ops) (bool (*)(void))NULL
 #endif
 
 struct machine_desc {
@@ -43,7 +43,7 @@ struct machine_desc {
 	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
 	char			restart_mode;	/* default restart mode	*/
 	struct smp_operations	*smp;		/* SMP operations	*/
-	void			(*smp_init)(void);
+	bool			(*smp_init)(void);
 	void			(*fixup)(struct tag *, char **,
 					 struct meminfo *);
 	void			(*reserve)(void);/* reserve mem blocks	*/
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index e69c580c6f..cf4b08c0f9 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -768,9 +768,7 @@ void __init setup_arch(char **cmdline_p)
 	arm_dt_init_cpu_maps();
 #ifdef CONFIG_SMP
 	if (is_smp()) {
-		if (mdesc->smp_init)
-			mdesc->smp_init();
-		else
+		if (!mdesc->smp_init || !mdesc->smp_init())
 			smp_set_ops(mdesc->smp);
 		smp_init_cpus();
 	}


Nicolas

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-09  6:00           ` Jon Medhurst (Tixy)
@ 2013-04-09 16:34             ` Nicolas Pitre
  2013-04-09 17:28               ` Jon Medhurst (Tixy)
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-09 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 9 Apr 2013, Jon Medhurst (Tixy) wrote:

> On Tue, 2013-04-09 at 01:41 -0400, Nicolas Pitre wrote:
> > > Do the changes to the above mcpm files want to be in a separate patch as
> > > it's generic ARM code, not vexpress specific?
> > 
> > Well, since it was so trivial I didn't do it, but the split might be a 
> > good idea nevertheless.  So I inserted the following patch in the 
> > series, and corresponding changes are now removed from the patch above.  
> > What do you think?
> 
> Looks fine. 

May I add your ACK?

> Did you miss my other comment about smp_ops not looking like they were 
> getting set in the case of mcpm_smp_set_ops not getting called?

You are right.  See my change to #14/15 to fix that.

The VExpress patch now becomes this (only change is the return value):

Author: Jon Medhurst <tixy@linaro.org>
Date:   Wed Jan 30 09:12:55 2013 +0000

    ARM: vexpress: Select multi-cluster SMP operation if required
    
    Signed-off-by: Jon Medhurst <tixy@linaro.org>
    Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
    Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
    Acked-by: Pawel Moll <pawel.moll@arm.com>

diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
index f134cd4a85..bde4374ab6 100644
--- a/arch/arm/mach-vexpress/core.h
+++ b/arch/arm/mach-vexpress/core.h
@@ -6,6 +6,8 @@
 
 void vexpress_dt_smp_map_io(void);
 
+bool vexpress_smp_init_ops(void);
+
 extern struct smp_operations	vexpress_smp_ops;
 
 extern void vexpress_cpu_die(unsigned int cpu);
diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
index dc1ace55d5..f31a7af712 100644
--- a/arch/arm/mach-vexpress/platsmp.c
+++ b/arch/arm/mach-vexpress/platsmp.c
@@ -12,9 +12,11 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 #include <linux/io.h>
+#include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/vexpress.h>
 
+#include <asm/mcpm_entry.h>
 #include <asm/smp_scu.h>
 #include <asm/mach/map.h>
 
@@ -203,3 +205,21 @@ struct smp_operations __initdata vexpress_smp_ops = {
 	.cpu_die		= vexpress_cpu_die,
 #endif
 };
+
+bool __init vexpress_smp_init_ops(void)
+{
+#ifdef CONFIG_MCPM
+	/*
+	 * The best way to detect a multi-cluster configuration at the moment
+	 * is to look for the presence of a CCI in the system.
+	 * Override the default vexpress_smp_ops if so.
+	 */
+	struct device_node *node;
+	node = of_find_compatible_node(NULL, NULL, "arm,cci");
+	if (node && of_device_is_available(node)) {
+		mcpm_smp_set_ops();
+		return true;
+	}
+#endif
+	return false;
+}
diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
index 915683cb67..16b42c10e0 100644
--- a/arch/arm/mach-vexpress/v2m.c
+++ b/arch/arm/mach-vexpress/v2m.c
@@ -476,6 +476,7 @@ static const char * const v2m_dt_match[] __initconst = {
 DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
 	.dt_compat	= v2m_dt_match,
 	.smp		= smp_ops(vexpress_smp_ops),
+	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
 	.map_io		= v2m_dt_map_io,
 	.init_early	= v2m_dt_init_early,
 	.init_irq	= irqchip_init,

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 14/15] ARM: Enable selection of SMP operations at boot time
  2013-04-09 16:30   ` Nicolas Pitre
@ 2013-04-09 16:55     ` Jon Medhurst (Tixy)
  0 siblings, 0 replies; 55+ messages in thread
From: Jon Medhurst (Tixy) @ 2013-04-09 16:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2013-04-09 at 12:30 -0400, Nicolas Pitre wrote:
> On Tue, 5 Feb 2013, Nicolas Pitre wrote:
> 
> > From: Jon Medhurst <tixy@linaro.org>
> > 
> > Add a new 'smp_init' hook to machine_desc so platforms can specify a
> > function to be used to setup smp ops instead of having a statically
> > defined value.
> > 
> > Signed-off-by: Jon Medhurst <tixy@linaro.org>
> > Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> 
> I've slightly amended this patch to make its usage more flexible, please 
> see below.

Looks good to me.

Reviewed-by: Jon Medhurst <tixy@linaro.org>

> > diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> > index 917d4fcfd9..3d01c6d6c3 100644
> > --- a/arch/arm/include/asm/mach/arch.h
> > +++ b/arch/arm/include/asm/mach/arch.h
> > @@ -17,8 +17,10 @@ struct pt_regs;
> >  struct smp_operations;
> >  #ifdef CONFIG_SMP
> >  #define smp_ops(ops) (&(ops))
> > +#define smp_init_ops(ops) (&(ops))
> >  #else
> >  #define smp_ops(ops) (struct smp_operations *)NULL
> > +#define smp_init_ops(ops) (void (*)(void))NULL
> >  #endif
> >  
> >  struct machine_desc {
> > @@ -42,6 +44,7 @@ struct machine_desc {
> >  	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
> >  	char			restart_mode;	/* default restart mode	*/
> >  	struct smp_operations	*smp;		/* SMP operations	*/
> > +	void			(*smp_init)(void);
> >  	void			(*fixup)(struct tag *, char **,
> >  					 struct meminfo *);
> >  	void			(*reserve)(void);/* reserve mem blocks	*/
> > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> > index 3f6cbb2e3e..41edca8582 100644
> > --- a/arch/arm/kernel/setup.c
> > +++ b/arch/arm/kernel/setup.c
> > @@ -768,7 +768,10 @@ void __init setup_arch(char **cmdline_p)
> >  	arm_dt_init_cpu_maps();
> >  #ifdef CONFIG_SMP
> >  	if (is_smp()) {
> > -		smp_set_ops(mdesc->smp);
> > +		if(mdesc->smp_init)
> > +			(*mdesc->smp_init)();
> > +		else
> > +			smp_set_ops(mdesc->smp);
> >  		smp_init_cpus();
> >  	}
> >  #endif
> 
> I've amended it with the following changes to deal with an issue 
> highlighted by Tixy.  If the runtime hook does not initialize the smp 
> ops, the core may continue with a default.  That should let MCPM, PSCI 
> and Xen play well together.
> 
> diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
> index c01bf53b85..af8c54c6c6 100644
> --- a/arch/arm/include/asm/mach/arch.h
> +++ b/arch/arm/include/asm/mach/arch.h
> @@ -19,7 +19,7 @@ struct smp_operations;
>  #define smp_init_ops(ops) (&(ops))
>  #else
>  #define smp_ops(ops) (struct smp_operations *)NULL
> -#define smp_init_ops(ops) (void (*)(void))NULL
> +#define smp_init_ops(ops) (bool (*)(void))NULL
>  #endif
>  
>  struct machine_desc {
> @@ -43,7 +43,7 @@ struct machine_desc {
>  	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
>  	char			restart_mode;	/* default restart mode	*/
>  	struct smp_operations	*smp;		/* SMP operations	*/
> -	void			(*smp_init)(void);
> +	bool			(*smp_init)(void);
>  	void			(*fixup)(struct tag *, char **,
>  					 struct meminfo *);
>  	void			(*reserve)(void);/* reserve mem blocks	*/
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index e69c580c6f..cf4b08c0f9 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -768,9 +768,7 @@ void __init setup_arch(char **cmdline_p)
>  	arm_dt_init_cpu_maps();
>  #ifdef CONFIG_SMP
>  	if (is_smp()) {
> -		if (mdesc->smp_init)
> -			mdesc->smp_init();
> -		else
> +		if (!mdesc->smp_init || !mdesc->smp_init())
>  			smp_set_ops(mdesc->smp);
>  		smp_init_cpus();
>  	}
> 
> 
> Nicolas

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-09 16:34             ` Nicolas Pitre
@ 2013-04-09 17:28               ` Jon Medhurst (Tixy)
  0 siblings, 0 replies; 55+ messages in thread
From: Jon Medhurst (Tixy) @ 2013-04-09 17:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2013-04-09 at 12:34 -0400, Nicolas Pitre wrote:
> On Tue, 9 Apr 2013, Jon Medhurst (Tixy) wrote:
> 
> > On Tue, 2013-04-09 at 01:41 -0400, Nicolas Pitre wrote:
> > > > Do the changes to the above mcpm files want to be in a separate patch as
> > > > it's generic ARM code, not vexpress specific?
> > > 
> > > Well, since it was so trivial I didn't do it, but the split might be a 
> > > good idea nevertheless.  So I inserted the following patch in the 
> > > series, and corresponding changes are now removed from the patch above.  
> > > What do you think?
> > 
> > Looks fine. 
> 
> May I add your ACK?

Yes,
Acked-by: Jon Medhurst <tixy@linaro.org>

> > Did you miss my other comment about smp_ops not looking like they were 
> > getting set in the case of mcpm_smp_set_ops not getting called?
> 
> You are right.  See my change to #14/15 to fix that.
> 
> The VExpress patch now becomes this (only change is the return value):
> 
> Author: Jon Medhurst <tixy@linaro.org>
> Date:   Wed Jan 30 09:12:55 2013 +0000
> 
>     ARM: vexpress: Select multi-cluster SMP operation if required
>     
>     Signed-off-by: Jon Medhurst <tixy@linaro.org>
>     Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
>     Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
>     Acked-by: Pawel Moll <pawel.moll@arm.com>
> 
> diff --git a/arch/arm/mach-vexpress/core.h b/arch/arm/mach-vexpress/core.h
> index f134cd4a85..bde4374ab6 100644
> --- a/arch/arm/mach-vexpress/core.h
> +++ b/arch/arm/mach-vexpress/core.h
> @@ -6,6 +6,8 @@
>  
>  void vexpress_dt_smp_map_io(void);
>  
> +bool vexpress_smp_init_ops(void);
> +
>  extern struct smp_operations	vexpress_smp_ops;
>  
>  extern void vexpress_cpu_die(unsigned int cpu);
> diff --git a/arch/arm/mach-vexpress/platsmp.c b/arch/arm/mach-vexpress/platsmp.c
> index dc1ace55d5..f31a7af712 100644
> --- a/arch/arm/mach-vexpress/platsmp.c
> +++ b/arch/arm/mach-vexpress/platsmp.c
> @@ -12,9 +12,11 @@
>  #include <linux/errno.h>
>  #include <linux/smp.h>
>  #include <linux/io.h>
> +#include <linux/of.h>
>  #include <linux/of_fdt.h>
>  #include <linux/vexpress.h>
>  
> +#include <asm/mcpm_entry.h>
>  #include <asm/smp_scu.h>
>  #include <asm/mach/map.h>
>  
> @@ -203,3 +205,21 @@ struct smp_operations __initdata vexpress_smp_ops = {
>  	.cpu_die		= vexpress_cpu_die,
>  #endif
>  };
> +
> +bool __init vexpress_smp_init_ops(void)
> +{
> +#ifdef CONFIG_MCPM
> +	/*
> +	 * The best way to detect a multi-cluster configuration at the moment
> +	 * is to look for the presence of a CCI in the system.
> +	 * Override the default vexpress_smp_ops if so.
> +	 */
> +	struct device_node *node;
> +	node = of_find_compatible_node(NULL, NULL, "arm,cci");
> +	if (node && of_device_is_available(node)) {
> +		mcpm_smp_set_ops();
> +		return true;
> +	}
> +#endif
> +	return false;
> +}
> diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
> index 915683cb67..16b42c10e0 100644
> --- a/arch/arm/mach-vexpress/v2m.c
> +++ b/arch/arm/mach-vexpress/v2m.c
> @@ -476,6 +476,7 @@ static const char * const v2m_dt_match[] __initconst = {
>  DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
>  	.dt_compat	= v2m_dt_match,
>  	.smp		= smp_ops(vexpress_smp_ops),
> +	.smp_init	= smp_init_ops(vexpress_smp_init_ops),
>  	.map_io		= v2m_dt_map_io,
>  	.init_early	= v2m_dt_init_early,
>  	.init_irq	= irqchip_init,
> 


* [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code
  2013-02-05  5:21 ` [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code Nicolas Pitre
@ 2013-04-23 19:19   ` Russell King - ARM Linux
  2013-04-23 19:34     ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 19:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:21:58AM -0500, Nicolas Pitre wrote:
> +#include <asm/mcpm_entry.h>
> +#include <asm/barrier.h>
> +#include <asm/proc-fns.h>
> +#include <asm/cacheflush.h>
> +
> +extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];

This should not be volatile.  You should know by now the stance in the
Linux community against using volatile on data declarations.  See
Documentation/volatile-considered-harmful.txt to remind yourself of
the reasoning.

> +
> +void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> +{
> +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> +	mcpm_entry_vectors[cluster][cpu] = val;
> +	__cpuc_flush_dcache_area((void *)&mcpm_entry_vectors[cluster][cpu], 4);
> +	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
> +			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));

And really, if the write hasn't been done by the compiler prior to calling
__cpuc_flush_dcache_area() then we're into really bad problems.


* [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support
  2013-02-05  5:22 ` [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support Nicolas Pitre
@ 2013-04-23 19:31   ` Russell King - ARM Linux
  2013-04-23 19:36     ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 19:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:03AM -0500, Nicolas Pitre wrote:
> +static void mcpm_cpu_die(unsigned int cpu)
> +{
> +	unsigned int mpidr, pcpu, pcluster;
> +	mpidr = read_cpuid_mpidr();
> +	pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> +	pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> +	mcpm_set_entry_vector(pcpu, pcluster, NULL);
> +	mcpm_cpu_power_down();

How are dirty caches handled here for the dying CPU?  Bear in mind at the
moment, all that is left to the smp platform implementation, but I don't
see anything in your patch set so far which handles this.

This is going to be especially important as we're moving that into the
generic CPU hotplug code out of the platform code.


* [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code
  2013-04-23 19:19   ` Russell King - ARM Linux
@ 2013-04-23 19:34     ` Nicolas Pitre
  2013-04-23 20:09       ` Russell King - ARM Linux
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 19:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Feb 05, 2013 at 12:21:58AM -0500, Nicolas Pitre wrote:
> > +#include <asm/mcpm_entry.h>
> > +#include <asm/barrier.h>
> > +#include <asm/proc-fns.h>
> > +#include <asm/cacheflush.h>
> > +
> > +extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
> 
> This should not be volatile.  You should know by now the stance in the
> Linux community against using volatile on data declarations.  See
> Documentation/volatile-considered-harmful.txt to remind yourself of
> the reasoning.

That document says:

|The key point to understand with regard to volatile is that its purpose 
|is to suppress optimization, which is almost never what one really 
|wants to do.

Turns out that this is exactly what we want here: suppress optimization.

However ...

> > +
> > +void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> > +{
> > +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> > +	mcpm_entry_vectors[cluster][cpu] = val;
> > +	__cpuc_flush_dcache_area((void *)&mcpm_entry_vectors[cluster][cpu], 4);
> > +	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
> > +			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));
> 
> And really, if the write hasn't been done by the compiler prior to calling
> __cpuc_flush_dcache_area() then we're into really bad problems.

That is indeed true.  The memory might have been uncacheable at some 
point, in which case the volatile was necessary.

I removed it in my tree.


Nicolas


* [PATCH v4 06/15] ARM: mcpm: generic SMP secondary bringup and hotplug support
  2013-04-23 19:31   ` Russell King - ARM Linux
@ 2013-04-23 19:36     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 19:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Feb 05, 2013 at 12:22:03AM -0500, Nicolas Pitre wrote:
> > +static void mcpm_cpu_die(unsigned int cpu)
> > +{
> > +	unsigned int mpidr, pcpu, pcluster;
> > +	mpidr = read_cpuid_mpidr();
> > +	pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> > +	pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> > +	mcpm_set_entry_vector(pcpu, pcluster, NULL);
> > +	mcpm_cpu_power_down();
> 
> How are dirty caches handled here for the dying CPU?  Bear in mind at the
> moment, all that is left to the smp platform implementation, but I don't
> see anything in your patch set so far which handles this.
> 
> This is going to be especially important as we're moving that into the
> generic CPU hotplug code out of the platform code.

The cache is handled in the machine specific backend accessed through 
mcpm_cpu_power_down().  See the DCSCB related patches later in the 
series.
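
For illustration, a backend is expected to do something along these 
lines in its power_down method (a rough sketch with a made-up function 
name -- the real DCSCB code in this series does more work around the 
cluster-level state):

#include <asm/cacheflush.h>
#include <asm/cp15.h>

static void example_power_down(void)	/* hypothetical backend hook */
{
	/* Clean and invalidate all cache levels while still coherent */
	flush_cache_all();

	/*
	 * Clear the C bit so no new cache lines are allocated, while
	 * keeping the I-cache enabled (hence no cpu_proc_fin() here).
	 */
	set_cr(get_cr() & ~CR_C);

	/* Flush whatever was dirtied between the two steps above */
	flush_cache_all();

	/* ... then exit the coherency domain and wait for power-off ... */
}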


Nicolas


* [PATCH v4 11/15] drivers/bus: add ARM CCI support
  2013-02-05  5:22 ` [PATCH v4 11/15] drivers/bus: add ARM CCI support Nicolas Pitre
@ 2013-04-23 19:38   ` Russell King - ARM Linux
  2013-04-23 19:53     ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 19:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:08AM -0500, Nicolas Pitre wrote:
> +void disable_cci(int cluster)
> +{
> +	u32 slave_reg = cluster ? CCI400_KF_OFFSET : CCI400_EAG_OFFSET;
> +	writel_relaxed(0x0, info->baseaddr + slave_reg);
> +
> +	while (readl_relaxed(info->baseaddr + CCI_STATUS_OFFSET)
> +						& STATUS_CHANGE_PENDING)
> +			barrier();
> +}
> +EXPORT_SYMBOL_GPL(disable_cci);

This will blow up if the cci driver hasn't been probed - which I guess is
fine.

> +
> +static int cci_driver_probe(struct platform_device *pdev)
> +{
> +	struct resource *res;
> +	int ret = 0;
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info) {
> +		dev_err(&pdev->dev, "unable to allocate mem\n");
> +		return -ENOMEM;
> +	}
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(&pdev->dev, "No memory resource\n");
> +		ret = -EINVAL;
> +		goto mem_free;
> +	}
> +
> +	if (!request_mem_region(res->start, resource_size(res),
> +				dev_name(&pdev->dev))) {
> +		dev_err(&pdev->dev, "address 0x%x in use\n", (u32) res->start);
> +		ret = -EBUSY;
> +		goto mem_free;
> +	}
> +
> +	info->baseaddr = ioremap(res->start, resource_size(res));

As we are moving stuff over to the devm_* APIs, it would be a good idea to
avoid introducing new code not using those APIs - otherwise it just creates
yet more churn, something which Linus objects to.  Can we please try in
future to avoid creating stuff which then needs to be subsequently modified
for the latest ways to do stuff...
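
For comparison, the same probe written against the managed APIs would 
look roughly like this (just a sketch: it assumes the info structure 
from the driver above, and lets devm_ioremap_resource() do the region 
claiming and error reporting):

static int cci_driver_probe(struct platform_device *pdev)
{
	struct resource *res;

	/* managed allocation: freed automatically on failure or unbind */
	info = devm_kzalloc(&pdev->dev, sizeof(*info), GFP_KERNEL);
	if (!info)
		return -ENOMEM;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

	/* requests the mem region and maps it; both undone automatically */
	info->baseaddr = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(info->baseaddr))
		return PTR_ERR(info->baseaddr);

	return 0;
}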


* [PATCH v4 12/15] ARM: CCI: ensure powerdown-time data is flushed from cache
  2013-02-05  5:22 ` [PATCH v4 12/15] ARM: CCI: ensure powerdown-time data is flushed from cache Nicolas Pitre
@ 2013-04-23 19:40   ` Russell King - ARM Linux
  0 siblings, 0 replies; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 19:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:09AM -0500, Nicolas Pitre wrote:
> +	/*
> +	 * Multi-cluster systems may need this data when non-coherent, during
> +	 * cluster power-up/power-down. Make sure it reaches main memory:
> +	 */
> +	__cpuc_flush_dcache_area(info, sizeof *info);
> +	__cpuc_flush_dcache_area(&info, sizeof info);
> +	outer_clean_range(virt_to_phys(info), virt_to_phys(info + 1));
> +	outer_clean_range(virt_to_phys(&info), virt_to_phys(&info + 1));

This seems to be a recurring theme throughout these patches.  What it's
saying is that we need a proper way to do this, rather than keep on
open coding this same thing time and time again.  So let's do something
about that rather than having to go back and rework all this stuff later,
creating "pointless churn" that Linus really hates.


* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-02-05  5:22 ` [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required Nicolas Pitre
  2013-02-06 16:38   ` Pawel Moll
  2013-04-05 22:48   ` Olof Johansson
@ 2013-04-23 19:42   ` Russell King - ARM Linux
  2013-04-23 19:56     ` Nicolas Pitre
  2 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 19:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:22:12AM -0500, Nicolas Pitre wrote:
> +void __init vexpress_smp_init_ops(void)
> +{
> +	struct smp_operations *ops = &vexpress_smp_ops;
> +#ifdef CONFIG_CLUSTER_PM
> +	extern struct smp_operations mcpm_smp_ops;
> +	if(of_find_compatible_node(NULL, NULL, "arm,cci"))
> +		ops = &mcpm_smp_ops;

Hmm, so these patches haven't been subjected to checkpatch treatment?
As a seasoned kernel programmer, I'm surprised you haven't checked that
already.


* [PATCH v4 11/15] drivers/bus: add ARM CCI support
  2013-04-23 19:38   ` Russell King - ARM Linux
@ 2013-04-23 19:53     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 19:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Feb 05, 2013 at 12:22:08AM -0500, Nicolas Pitre wrote:
> > +void disable_cci(int cluster)
> > +{
> > +	u32 slave_reg = cluster ? CCI400_KF_OFFSET : CCI400_EAG_OFFSET;
> > +	writel_relaxed(0x0, info->baseaddr + slave_reg);
> > +
> > +	while (readl_relaxed(info->baseaddr + CCI_STATUS_OFFSET)
> > +						& STATUS_CHANGE_PENDING)
> > +			barrier();
> > +}
> > +EXPORT_SYMBOL_GPL(disable_cci);
> 
> This will blow up if the cci driver hasn't been probed - which I guess is
> fine.
> 
> > +
> > +static int cci_driver_probe(struct platform_device *pdev)
> > +{
> > +	struct resource *res;
> > +	int ret = 0;
> > +
> > +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> > +	if (!info) {
> > +		dev_err(&pdev->dev, "unable to allocate mem\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	if (!res) {
> > +		dev_err(&pdev->dev, "No memory resource\n");
> > +		ret = -EINVAL;
> > +		goto mem_free;
> > +	}
> > +
> > +	if (!request_mem_region(res->start, resource_size(res),
> > +				dev_name(&pdev->dev))) {
> > +		dev_err(&pdev->dev, "address 0x%x in use\n", (u32) res->start);
> > +		ret = -EBUSY;
> > +		goto mem_free;
> > +	}
> > +
> > +	info->baseaddr = ioremap(res->start, resource_size(res));
> 
> As we are moving stuff over to the devm_* APIs, it would be a good idea to
> avoid introducing new code not using those APIs - otherwise it just creates
> yet more churn, something which Linus objects to.  Can we please try in
> future to avoid creating stuff which then needs to be subsequently modified
> for the latest ways to do stuff...

This driver is not part of the MCPM pull request.  It was provided to 
illustrate how the MCPM layer is meant to be used.  The ongoing 
discussion about CPU/cluster/CCI device tree bindings by Lorenzo will 
significantly enhance this driver.  And given the current timing, it is 
likely that the enhanced driver will simply replace this one during the 
next cycle, at which point we will take your comment into account.


Nicolas


* [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required
  2013-04-23 19:42   ` Russell King - ARM Linux
@ 2013-04-23 19:56     ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Feb 05, 2013 at 12:22:12AM -0500, Nicolas Pitre wrote:
> > +void __init vexpress_smp_init_ops(void)
> > +{
> > +	struct smp_operations *ops = &vexpress_smp_ops;
> > +#ifdef CONFIG_CLUSTER_PM
> > +	extern struct smp_operations mcpm_smp_ops;
> > +	if(of_find_compatible_node(NULL, NULL, "arm,cci"))
> > +		ops = &mcpm_smp_ops;
> 
> Hmm, so these patches haven't been subjected to checkpatch treatment?
> As a seasoned kernel programmer, I'm surprised you haven't checked that
> already.

They have, and fixes were applied, although after this set was posted.  
I didn't think it was worth a repost.


Nicolas


* [PATCH v4 00/15] multi-cluster power management
  2013-02-05  5:21 [PATCH v4 00/15] multi-cluster power management Nicolas Pitre
                   ` (14 preceding siblings ...)
  2013-02-05  5:22 ` [PATCH v4 15/15] ARM: vexpress: Select multi-cluster SMP operation if required Nicolas Pitre
@ 2013-04-23 20:04 ` Russell King - ARM Linux
  2013-04-23 21:03   ` Nicolas Pitre
  2013-04-23 21:11   ` Nicolas Pitre
  15 siblings, 2 replies; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 20:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 05, 2013 at 12:21:57AM -0500, Nicolas Pitre wrote:
> This is version 4 of the patch series required to safely power up
> and down CPUs in a cluster as can be found in b.L systems.  If nothing
> major is reported, I'll send a pull request to Russell for this set
> very soon.

Okay, so from spending an hour or so quickly scanning these patches,
on the face of it there are only a few comments I have which I'm very
surprised that your exhaustive team of apparent reviewers didn't
already catch.

The biggest disappointment so far in this patch set is finding the
re-use of the repeated cache handling stuff - when I've already made
my feelings _well_ known towards re-using existing cache APIs (which
are designed to handle a _purpose_) for different purposes "just
because" they appear to do what you need on a particular CPU.  I've
stated my objection to this kind of behaviour many times in the past,
I've even _purposely_ broken stuff (mtd stuff) when that's happened in
the past, and I've not changed my views on this one bit.  It's really
surprising to me that someone who has been around for such a long time,
and who knows my views on this, is pushing a patch set which goes
against that, and somehow thinks that I won't have a problem with
it... really?

What I suggest for the time being is to provide new inline function(s)
in arch/arm/include/asm/cacheflush.h which are purposed for your application,
document them in that file, and call the implementation you're currently
using.  That means if we do have to change it in the future (for example,
we don't need to do anything in the dcache flushing stuff) we don't have
to hunt through all the code to find _your_ different use of that function
and fix it - we can just fix it in one place and we have the reference
there for what your code expects.

Again, this is nothing new - I've made my position on this kind of stuff
_exceedingly_ plain in the past.


That all said, my biggest concern so far from this set is the independent
issue of moving the cache handling into generic code, so that we avoid
the existing problem where every platform ends up implementing that stuff
time and time again - and the impact that will have on this MCPM code.

Currently as I see it, the two things are mutually incompatible with
each other - and having discussed with Will, we'd come to the conclusion
that I'd merge what I had because the comments alone on how the cpu
hotplug stuff is supposed to work are a valuable improvement, even
though the code changes don't completely solve all the issues.

However, MCPM gets in the way of that.  So... that presents a dilemma...
it's a case of this or that but not both.  Or do you think MCPM can
survive with additional LOUIS flushing before the cpu die callback is
called?


* [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code
  2013-04-23 19:34     ` Nicolas Pitre
@ 2013-04-23 20:09       ` Russell King - ARM Linux
  2013-04-23 20:19         ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 20:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 23, 2013 at 03:34:08PM -0400, Nicolas Pitre wrote:
> On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> 
> > On Tue, Feb 05, 2013 at 12:21:58AM -0500, Nicolas Pitre wrote:
> > > +#include <asm/mcpm_entry.h>
> > > +#include <asm/barrier.h>
> > > +#include <asm/proc-fns.h>
> > > +#include <asm/cacheflush.h>
> > > +
> > > +extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
> > 
> > This should not be volatile.  You should know by now the stance in the
> > Linux community against using volatile on data declarations.  See
> > Documentation/volatile-considered-harmful.txt to remind yourself of
> > the reasoning.
> 
> That document says:
> 
> |The key point to understand with regard to volatile is that its purpose 
> |is to suppress optimization, which is almost never what one really 
> |wants to do.
> 
> Turns out that this is exactly what we want here: suppress optimization.

What optimization are you trying to suppress in the function below?

> However ...
> 
> > > +
> > > +void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> > > +{
> > > +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> > > +	mcpm_entry_vectors[cluster][cpu] = val;
> > > +	__cpuc_flush_dcache_area((void *)&mcpm_entry_vectors[cluster][cpu], 4);
> > > +	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
> > > +			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));
> > 
> > And really, if the write hasn't been done by the compiler prior to calling
> > __cpuc_flush_dcache_area() then we're into really bad problems.
> 
> That is indeed true.  The memory might have been uncacheable at some 
> point and then the volatile was necessary in that case.

I don't buy the argument about it being uncachable - the compiler doesn't
know that, and the cacheability of the memory really doesn't change the
code that the compiler produces.

Moreover, the compiler can't really reorder the store to
mcpm_entry_vectors[cluster][cpu] with the calls to the cache flushing
anyway - and as the compiler has no clue what those calls are doing
so it has to ensure that the results of the preceding code is visible
to some "unknown code" which it can't see.  Therefore, it practically
has no option but to issue the store before calling those cache flush
functions.
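
To put it another way (a trivial illustration, unrelated to the actual
code):

	extern void opaque(int *p);	/* body not visible here */

	void example(int *x)
	{
		/* The store must be emitted before the call: for all
		   the compiler knows, opaque() may read *x. */
		*x = 1;
		opaque(x);
	}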

Also... consider using sizeof() rather than constant 4.


* [PATCH v4 01/15] ARM: multi-cluster PM: secondary kernel entry code
  2013-04-23 20:09       ` Russell King - ARM Linux
@ 2013-04-23 20:19         ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Apr 23, 2013 at 03:34:08PM -0400, Nicolas Pitre wrote:
> > On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> > 
> > > On Tue, Feb 05, 2013 at 12:21:58AM -0500, Nicolas Pitre wrote:
> > > > +#include <asm/mcpm_entry.h>
> > > > +#include <asm/barrier.h>
> > > > +#include <asm/proc-fns.h>
> > > > +#include <asm/cacheflush.h>
> > > > +
> > > > +extern volatile unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
> > > 
> > > This should not be volatile.  You should know by now the stance in the
> > > Linux community against using volatile on data declarations.  See
> > > Documentation/volatile-considered-harmful.txt to remind yourself of
> > > the reasoning.
> > 
> > That document says:
> > 
> > |The key point to understand with regard to volatile is that its purpose 
> > |is to suppress optimization, which is almost never what one really 
> > |wants to do.
> > 
> > Turns out that this is exactly what we want here: suppress optimization.
> 
> What optimization are you trying to suppress in the function below?

Ordering of writes to this memory wrt other code in case the above 
function was inlined.  But as I said this is a moot point now.

> > However ...
> > 
> > > > +
> > > > +void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> > > > +{
> > > > +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> > > > +	mcpm_entry_vectors[cluster][cpu] = val;
> > > > +	__cpuc_flush_dcache_area((void *)&mcpm_entry_vectors[cluster][cpu], 4);
> > > > +	outer_clean_range(__pa(&mcpm_entry_vectors[cluster][cpu]),
> > > > +			  __pa(&mcpm_entry_vectors[cluster][cpu + 1]));
> > > 
> > > And really, if the write hasn't been done by the compiler prior to calling
> > > __cpuc_flush_dcache_area() then we're into really bad problems.
> > 
> > That is indeed true.  The memory might have been uncacheable at some 
> > point and then the volatile was necessary in that case.
> 
> I don't buy the argument about it being uncachable - the compiler doesn't
> know that, and the cacheability of the memory really doesn't change the
> code that the compiler produces.

My point is that the memory is cacheable these days, which requires 
explicit cache maintenance for the writes to hit main memory.  That 
cache maintenance acts as the barrier that the volatile used to provide 
before the maintenance call was there.

> Also... consider using sizeof() rather than constant 4.

Sure.


Nicolas


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 20:04 ` [PATCH v4 00/15] multi-cluster power management Russell King - ARM Linux
@ 2013-04-23 21:03   ` Nicolas Pitre
  2013-04-23 21:46     ` Russell King - ARM Linux
  2013-04-24 14:25     ` Dave Martin
  2013-04-23 21:11   ` Nicolas Pitre
  1 sibling, 2 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 21:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> What I suggest for the time being is to provide new inline function(s)
> in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> document them in that file, and call the implementation you're currently
> using.  That means if we do have to change it in the future (for example,
> we don't need to do anything in the dcache flushing stuff) we don't have
> to hunt through all the code to find _your_ different use of that function
> and fix it - we can just fix it in one place and we have the reference
> there for what your code expects.

What about the patch below?  Once you tell me it is fine to you I'll 
adapt the MCPM series to it.

----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue, 23 Apr 2013 16:45:40 -0400
Subject: [PATCH] ARM: cacheflush: add synchronization helpers for mixed cache state accesses

Algorithms used by the MCPM layer rely on state variables which are
accessed while the cache is either active or inactive, depending
on the code path and the active state.

This patch introduces generic cache maintenance helpers to provide the
necessary cache synchronization for such state variables to always hit
main memory in an ordered way.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e1489c54cd..bff71388e7 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -363,4 +363,79 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
 		flush_cache_all();
 }
 
+/*
+ * Memory synchronization helpers for mixed cached vs non cached accesses.
+ *
+ * Some synchronization algorithms have to set states in memory with the
+ * cache enabled or disabled depending on the code path.  It is crucial
+ * to always ensure proper cache maintenance to update main memory right
+ * away in that case.
+ *
+ * Any cached write must be followed by a cache clean operation.
+ * Any cached read must be preceded by a cache invalidate operation.
+ * Yet, in the read case, a cache flush, i.e. an atomic clean+invalidate
+ * operation, is needed to avoid discarding possible concurrent writes to
+ * the accessed memory.
+ *
+ * Also, in order to prevent a cached writer from interfering with an
+ * adjacent non-cached writer, each state variable must be located in
+ * a separate cache line.
+ */
+
+/*
+ * This needs to be >= the max cache writeback size of all
+ * supported platforms included in the current kernel configuration.
+ * This is used to align state variables to their own cache lines.
+ */
+#define __CACHE_WRITEBACK_ORDER 6  /* guessed from existing platforms */
+#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
+
+/*
+ * There is no __cpuc_clean_dcache_area but we use it anyway for
+ * code intent clarity, and alias it to __cpuc_flush_dcache_area.
+ */
+#define __cpuc_clean_dcache_area __cpuc_flush_dcache_area
+
+/*
+ * Ensure preceding writes to *p by this CPU are visible to
+ * subsequent reads by other CPUs:
+ */
+static inline void __sync_cache_range_w(volatile void *p, size_t size)
+{
+	char *_p = (char *)p;
+
+	__cpuc_clean_dcache_area(_p, size);
+	outer_clean_range(__pa(_p), __pa(_p + size));
+}
+
+/*
+ * Ensure preceding writes to *p by other CPUs are visible to
+ * subsequent reads by this CPU.  We must be careful not to
+ * discard data simultaneously written by another CPU, hence the
+ * usage of flush rather than invalidate operations.
+ */
+static inline void __sync_cache_range_r(volatile void *p, size_t size)
+{
+	char *_p = (char *)p;
+
+#ifdef CONFIG_OUTER_CACHE
+	if (outer_cache.flush_range) {
+		/*
+		 * Ensure dirty data migrated from other CPUs into our cache
+		 * are cleaned out safely before the outer cache is cleaned:
+		 */
+		__cpuc_clean_dcache_area(_p, size);
+
+		/* Clean and invalidate stale data for *p from outer ... */
+		outer_flush_range(__pa(_p), __pa(_p + size));
+	}
+#endif
+
+	/* ... and inner cache: */
+	__cpuc_flush_dcache_area(_p, size);
+}
+
+#define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
+#define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
+
 #endif


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 20:04 ` [PATCH v4 00/15] multi-cluster power management Russell King - ARM Linux
  2013-04-23 21:03   ` Nicolas Pitre
@ 2013-04-23 21:11   ` Nicolas Pitre
  1 sibling, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 21:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> That all said, my biggest concern so far from this set is the independent
> issue of moving the cache handling into generic code, so that we avoid
> the existing problem where every platform ends up implementing that stuff
> time and time again - and the impact that will have on this MCPM code.
> 
> Currently as I see it, the two things are mutually incompatible with
> each other - and having discussed with Will, we'd come to the conclusion
> that I'd merge what I had because the comments alone on how the cpu
> hotplug stuff is supposed to work are a valuable improvement, even
> though the code changes don't completely solve all the issues.
> 
> However, MCPMm gets in the way of that.  So... that presents a dilema...
> it's a case of this or that but not both.  Or do you think MCPM can
> survive with additional LOUIS flushing before the cpu die callback is
> called?

We can have both for now.  MCPM most certainly can survive it.

The cache should already be mostly clean by the time the machine backend 
flushes up to LOUIS in that case, so it shouldn't be very costly to do 
it again.  And removing a CPU via the hotplug path is hardly a 
performance-critical operation.
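
Concretely, I'd expect the generic side to end up along these lines, 
flushing to the Level of Unification Inner Shareable before invoking 
the platform method (a sketch of the idea, not the final code):

void __ref cpu_die(void)
{
	idle_task_exit();
	local_irq_disable();

	/*
	 * Flush this CPU's dirty data up to the Level of Unification
	 * Inner Shareable, so nothing is lost even if the platform
	 * method below simply cuts the power.
	 */
	flush_cache_louis();

	/* hand over to the platform-specific shutdown method */
	if (smp_ops.cpu_die)
		smp_ops.cpu_die(smp_processor_id());

	/* the platform method is not expected to return */
	BUG();
}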


Nicolas


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 21:03   ` Nicolas Pitre
@ 2013-04-23 21:46     ` Russell King - ARM Linux
  2013-04-23 21:56       ` Nicolas Pitre
  2013-04-24 14:25     ` Dave Martin
  1 sibling, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 21:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 23, 2013 at 05:03:06PM -0400, Nicolas Pitre wrote:
> On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> 
> > What I suggest for the time being is to provide new inline function(s)
> > in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> > document them in that file, and call the implementation you're currently
> > using.  That means if we do have to change it in the future (for example,
> > we don't need to do anything in the dcache flushing stuff) we don't have
> > to hunt through all the code to find _your_ different use of that function
> > and fix it - we can just fix it in one place and we have the reference
> > there for what your code expects.
> 
> What about the patch below?  Once you tell me it is fine to you I'll 
> adapt the MCPM series to it.

Yes, that looks like a saner solution.  Thanks.


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 21:46     ` Russell King - ARM Linux
@ 2013-04-23 21:56       ` Nicolas Pitre
  2013-04-23 22:44         ` Russell King - ARM Linux
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-23 21:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Apr 23, 2013 at 05:03:06PM -0400, Nicolas Pitre wrote:
> > On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> > 
> > > What I suggest for the time being is to provide new inline function(s)
> > > in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> > > document them in that file, and call the implementation you're currently
> > > using.  That means if we do have to change it in the future (for example,
> > > we don't need to do anything in the dcache flushing stuff) we don't have
> > > to hunt through all the code to find _your_ different use of that function
> > > and fix it - we can just fix it in one place and we have the reference
> > > there for what your code expects.
> > 
> > What about the patch below?  Once you tell me it is fine to you I'll 
> > adapt the MCPM series to it.
> 
> Yes, that looks like a saner solution.  Thanks.

May I add your ACK to the patch?


Nicolas


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 21:56       ` Nicolas Pitre
@ 2013-04-23 22:44         ` Russell King - ARM Linux
  2013-04-24  4:11           ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-23 22:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 23, 2013 at 05:56:03PM -0400, Nicolas Pitre wrote:
> On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> 
> > On Tue, Apr 23, 2013 at 05:03:06PM -0400, Nicolas Pitre wrote:
> > > On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> > > 
> > > > What I suggest for the time being is to provide new inline function(s)
> > > > in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> > > > document them in that file, and call the implementation you're currently
> > > > using.  That means if we do have to change it in the future (for example,
> > > > we don't need to do anything in the dcache flushing stuff) we don't have
> > > > to hunt through all the code to find _your_ different use of that function
> > > > and fix it - we can just fix it in one place and we have the reference
> > > > there for what your code expects.
> > > 
> > > What about the patch below?  Once you tell me it is fine to you I'll 
> > > adapt the MCPM series to it.
> > 
> > Yes, that looks like a saner solution.  Thanks.
> 
> May I add your ACK to the patch?

Yes, sure.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 22:44         ` Russell King - ARM Linux
@ 2013-04-24  4:11           ` Nicolas Pitre
  2013-04-24 20:25             ` Russell King - ARM Linux
  0 siblings, 1 reply; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-24  4:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:

> On Tue, Apr 23, 2013 at 05:56:03PM -0400, Nicolas Pitre wrote:
> > On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> > 
> > > On Tue, Apr 23, 2013 at 05:03:06PM -0400, Nicolas Pitre wrote:
> > > > On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> > > > 
> > > > > What I suggest for the time being is to provide new inline function(s)
> > > > > in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> > > > > document them in that file, and call the implementation you're currently
> > > > > using.  That means if we do have to change it in the future (for example,
> > > > > we don't need to do anything in the dcache flushing stuff) we don't have
> > > > > to hunt through all the code to find _your_ different use of that function
> > > > > and fix it - we can just fix it in one place and we have the reference
> > > > > there for what your code expects.
> > > > 
> > > > What about the patch below?  Once you tell me it is fine to you I'll 
> > > > adapt the MCPM series to it.
> > > 
> > > Yes, that looks like a saner solution.  Thanks.
> > 
> > May I add your ACK to the patch?
> 
> Yes, sure.
> 
> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

Thanks.  I've adapted the series to this, and mcpm_set_entry_vector() 
now looks like this:

+extern unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
+
+void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
+{
+	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
+	mcpm_entry_vectors[cluster][cpu] = val;
+	sync_cache_w(&mcpm_entry_vectors[cluster][cpu]);
+}

Similarly for the CCI driver:

+	/*
+	 * Multi-cluster systems may need this data when non-coherent, during
+	 * cluster power-up/power-down. Make sure it reaches main memory:
+	 */
+	sync_cache_w(info);
+	sync_cache_w(&info);

This is also much prettier.

Assorted small changes for comments that followed the v4 post have been 
integrated as well.  I don't think they justify a repost of the whole 
thing though.  They are:

- Replace custom cache sync helpers with the sync_cache_*() ones from 
  cacheflush.h.

- Remove volatile annotations since explicit cache ops act as memory 
  barriers now.

- Rename mcpm_entry.h to mcpm.h since this is not solely about entry but
  more general.

- Export the __CACHE_WRITEBACK_* constants via asm_offsets.h since 
  they're needed by assembly code.

- Add some more comments and clarify some existing ones.

I managed to retest this on RTSM after fighting with the license system.

The latest MCPM patches are available here:

  git://git.linaro.org/people/nico/linux mcpm

Diffstat:

 Documentation/arm/cluster-pm-race-avoidance.txt | 498 ++++++++++++++++++
 Documentation/arm/vlocks.txt                    | 211 ++++++++
 arch/arm/Kconfig                                |   8 +
 arch/arm/common/Makefile                        |   1 +
 arch/arm/common/mcpm_entry.c                    | 263 +++++++++
 arch/arm/common/mcpm_head.S                     | 219 ++++++++
 arch/arm/common/mcpm_platsmp.c                  |  92 ++++
 arch/arm/common/vlock.S                         | 108 ++++
 arch/arm/common/vlock.h                         |  29 +
 arch/arm/include/asm/cacheflush.h               |  75 +++
 arch/arm/include/asm/mcpm.h                     | 209 ++++++++
 arch/arm/kernel/asm-offsets.c                   |   4 +
 12 files changed, 1717 insertions(+)

The corresponding DCSCB support for RTSM is here:

  git://git.linaro.org/people/nico/linux mcpm+dcscb

although that part is probably more suitable for ARM-SOC.

Do you want those patches to be posted again, or should I send a new 
pull request?


Nicolas


* [PATCH v4 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup
  2013-04-05 23:00   ` Olof Johansson
  2013-04-06 13:41     ` Nicolas Pitre
@ 2013-04-24  9:10     ` Dave Martin
  1 sibling, 0 replies; 55+ messages in thread
From: Dave Martin @ 2013-04-24  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 05, 2013 at 04:00:17PM -0700, Olof Johansson wrote:
> Hi,
> 
> Just two nits below. One could be fixed incrementally, the other is
> a longer-term potential cleanup.
> 
> On Tue, Feb 05, 2013 at 12:22:00AM -0500, Nicolas Pitre wrote:
> 
> > diff --git a/arch/arm/include/asm/mcpm_entry.h b/arch/arm/include/asm/mcpm_entry.h
> > index 3286d5eb91..e76652209d 100644
> > --- a/arch/arm/include/asm/mcpm_entry.h
> > +++ b/arch/arm/include/asm/mcpm_entry.h
> > @@ -15,8 +15,37 @@
> >  #define MAX_CPUS_PER_CLUSTER	4
> >  #define MAX_NR_CLUSTERS		2
> >  
> > +/* Definitions for mcpm_sync_struct */
> > +#define CPU_DOWN		0x11
> > +#define CPU_COMING_UP		0x12
> > +#define CPU_UP			0x13
> > +#define CPU_GOING_DOWN		0x14
> > +
> > +#define CLUSTER_DOWN		0x21
> > +#define CLUSTER_UP		0x22
> > +#define CLUSTER_GOING_DOWN	0x23
> > +
> > +#define INBOUND_NOT_COMING_UP	0x31
> > +#define INBOUND_COMING_UP	0x32
> > +
> > +/* This is a complete guess. */
> > +#define __CACHE_WRITEBACK_ORDER	6
> > +#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
> 
> Something a little more educational could be useful here. It needs to
> be the max writeback/line size of the supported platforms of the binary
> kernel, i.e. if someone builds an SoC with 128-byte L2 cache lines this
> guess will break.

Will commented on the same thing.  Ideally, a suitable definition should
get selected through Kconfig.

The guess specified here is adequate for known b.L platforms, but a
better framework for configuring this is desirable.

My original comment here is certainly not very informative though --
I'll post a clarification.
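
Something along these lines is the sort of thing I have in mind, with
the Kconfig symbol name invented purely for illustration:

/*
 * Hypothetical sketch: let each platform advertise its maximum cache
 * writeback granule via Kconfig instead of hard-coding a guess.
 */
#ifdef CONFIG_ARM_CACHE_WRITEBACK_SHIFT		/* made-up symbol */
#define __CACHE_WRITEBACK_ORDER	CONFIG_ARM_CACHE_WRITEBACK_SHIFT
#else
#define __CACHE_WRITEBACK_ORDER	6	/* adequate for known b.L platforms */
#endif
#define __CACHE_WRITEBACK_GRANULE	(1 << __CACHE_WRITEBACK_ORDER)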

> > +/* Offsets for the mcpm_sync_struct members, for use in asm: */
> > +#define MCPM_SYNC_CLUSTER_CPUS	0
> > +#define MCPM_SYNC_CPU_SIZE	__CACHE_WRITEBACK_GRANULE
> > +#define MCPM_SYNC_CLUSTER_CLUSTER \
> > +	(MCPM_SYNC_CLUSTER_CPUS + MCPM_SYNC_CPU_SIZE * MAX_CPUS_PER_CLUSTER)
> > +#define MCPM_SYNC_CLUSTER_INBOUND \
> > +	(MCPM_SYNC_CLUSTER_CLUSTER + __CACHE_WRITEBACK_GRANULE)
> > +#define MCPM_SYNC_CLUSTER_SIZE \
> > +	(MCPM_SYNC_CLUSTER_INBOUND + __CACHE_WRITEBACK_GRANULE)
> > +
> >  #ifndef __ASSEMBLY__
> >  
> > +#include <linux/types.h>
> > +
> >  /*
> >   * Platform specific code should use this symbol to set up secondary
> >   * entry location for processors to use when released from reset.
> > @@ -123,5 +152,39 @@ struct mcpm_platform_ops {
> >   */
> >  int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
> >  
> > +/* Synchronisation structures for coordinating safe cluster setup/teardown: */
> > +
> > +/*
> > + * When modifying this structure, make sure you update the MCPM_SYNC_ defines
> > + * to match.
> > + */
> 
> It would be nice to introduce something like arch/powerpc/kernel/asm-offsets.c
> on ARM to generate the ASM-side automatically from the C struct instead for
> this and other cases, but that's unrelated to this addition.

ARM already has its own asm-offsets, and we actually used this in an earlier
version of the patches.  However, this felt like pollution of that file for
not a lot of benefit.  Does powerpc have a problem with asm-offsets becoming
a dumping ground for random junk, or is there a better way to organise it?
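
For what it's worth, the additions needed in asm-offsets.c for this
series are tiny anyway -- essentially a couple of DEFINE() entries so
that the assembly side can see the C-side constants, e.g.:

  /* emit assembler-visible constants for mcpm_head.S and friends */
  DEFINE(CACHE_WRITEBACK_ORDER, __CACHE_WRITEBACK_ORDER);
  DEFINE(CACHE_WRITEBACK_GRANULE, __CACHE_WRITEBACK_GRANULE);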

Cheers
---Dave

> > +struct mcpm_sync_struct {
> > +	/* individual CPU states */
> > +	struct {
> > +		volatile s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
> > +	} cpus[MAX_CPUS_PER_CLUSTER];
> > +
> > +	/* cluster state */
> > +	volatile s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);
> > +
> > +	/* inbound-side state */
> > +	volatile s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
> > +};
> 
> 
> -Olof


* [PATCH v4 00/15] multi-cluster power management
  2013-04-23 21:03   ` Nicolas Pitre
  2013-04-23 21:46     ` Russell King - ARM Linux
@ 2013-04-24 14:25     ` Dave Martin
  1 sibling, 0 replies; 55+ messages in thread
From: Dave Martin @ 2013-04-24 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 23, 2013 at 05:03:06PM -0400, Nicolas Pitre wrote:
> On Tue, 23 Apr 2013, Russell King - ARM Linux wrote:
> 
> > What I suggest for the time being is to provide new inline function(s)
> > in arch/arm/include/asm/cacheflush.h which are purposed for your application,
> > document them in that file, and call the implementation you're currently
> > using.  That means if we do have to change it in the future (for example,
> > we don't need to do anything in the dcache flushing stuff) we don't have
> > to hunt through all the code to find _your_ different use of that function
> > and fix it - we can just fix it in one place and we have the reference
> > there for what your code expects.
> 
> What about the patch below?  Once you tell me it is fine to you I'll 
> adapt the MCPM series to it.

This looks appropriate for mcpm's needs, and generic enough for other
code to reuse.  Thanks for that.

Acked-by: Dave Martin <dave.martin@linaro.org>

> 
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Tue, 23 Apr 2013 16:45:40 -0400
> Subject: [PATCH] ARM: cacheflush: add synchronization helpers for mixed cache state accesses
> 
> Algorithms used by the MCPM layer rely on state variables which are
> accessed while the cache is either active or inactive, depending
> on the code path and the active state.
> 
> This patch introduces generic cache maintenance helpers to provide the
> necessary cache synchronization for such state variables to always hit
> main memory in an ordered way.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index e1489c54cd..bff71388e7 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -363,4 +363,79 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
>  		flush_cache_all();
>  }
>  
> +/*
> + * Memory synchronization helpers for mixed cached vs non cached accesses.
> + *
> + * Some synchronization algorithms have to set states in memory with the
> + * cache enabled or disabled depending on the code path.  It is crucial
> + * to always ensure proper cache maintenance to update main memory right
> + * away in that case.
> + *
> + * Any cached write must be followed by a cache clean operation.
> + * Any cached read must be preceded by a cache invalidate operation.
> + * Yet, in the read case, a cache flush, i.e. an atomic clean+invalidate
> + * operation, is needed to avoid discarding possible concurrent writes to
> + * the accessed memory.
> + *
> + * Also, in order to prevent a cached writer from interfering with an
> + * adjacent non-cached writer, each state variable must be located in
> + * a separate cache line.
> + */
> +
> +/*
> + * This needs to be >= the max cache writeback size of all
> + * supported platforms included in the current kernel configuration.
> + * This is used to align state variables to their own cache lines.
> + */
> +#define __CACHE_WRITEBACK_ORDER 6  /* guessed from existing platforms */
> +#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
> +
> +/*
> + * There is no __cpuc_clean_dcache_area but we use it anyway for
> + * code intent clarity, and alias it to __cpuc_flush_dcache_area.
> + */
> +#define __cpuc_clean_dcache_area __cpuc_flush_dcache_area
> +
> +/*
> + * Ensure preceding writes to *p by this CPU are visible to
> + * subsequent reads by other CPUs:
> + */
> +static inline void __sync_cache_range_w(volatile void *p, size_t size)
> +{
> +	char *_p = (char *)p;
> +
> +	__cpuc_clean_dcache_area(_p, size);
> +	outer_clean_range(__pa(_p), __pa(_p + size));
> +}
> +
> +/*
> + * Ensure preceding writes to *p by other CPUs are visible to
> + * subsequent reads by this CPU.  We must be careful not to
> + * discard data simultaneously written by another CPU, hence the
> + * usage of flush rather than invalidate operations.
> + */
> +static inline void __sync_cache_range_r(volatile void *p, size_t size)
> +{
> +	char *_p = (char *)p;
> +
> +#ifdef CONFIG_OUTER_CACHE
> +	if (outer_cache.flush_range) {
> +		/*
> +		 * Ensure dirty data migrated from other CPUs into our cache
> +		 * are cleaned out safely before the outer cache is cleaned:
> +		 */
> +		__cpuc_clean_dcache_area(_p, size);
> +
> +		/* Clean and invalidate stale data for *p from outer ... */
> +		outer_flush_range(__pa(_p), __pa(_p + size));
> +	}
> +#endif
> +
> +	/* ... and inner cache: */
> +	__cpuc_flush_dcache_area(_p, size);
> +}
> +
> +#define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> +#define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> +
>  #endif


* [PATCH v4 00/15] multi-cluster power management
  2013-04-24  4:11           ` Nicolas Pitre
@ 2013-04-24 20:25             ` Russell King - ARM Linux
  2013-04-24 23:31               ` Nicolas Pitre
  0 siblings, 1 reply; 55+ messages in thread
From: Russell King - ARM Linux @ 2013-04-24 20:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 24, 2013 at 12:11:26AM -0400, Nicolas Pitre wrote:
> The latest MCPM patches are available here:
> 
>   git://git.linaro.org/people/nico/linux mcpm
> 
> Diffstat:
> 
>  Documentation/arm/cluster-pm-race-avoidance.txt | 498 ++++++++++++++++++
>  Documentation/arm/vlocks.txt                    | 211 ++++++++
>  arch/arm/Kconfig                                |   8 +
>  arch/arm/common/Makefile                        |   1 +
>  arch/arm/common/mcpm_entry.c                    | 263 +++++++++
>  arch/arm/common/mcpm_head.S                     | 219 ++++++++
>  arch/arm/common/mcpm_platsmp.c                  |  92 ++++
>  arch/arm/common/vlock.S                         | 108 ++++
>  arch/arm/common/vlock.h                         |  29 +
>  arch/arm/include/asm/cacheflush.h               |  75 +++
>  arch/arm/include/asm/mcpm.h                     | 209 ++++++++
>  arch/arm/kernel/asm-offsets.c                   |   4 +
>  12 files changed, 1717 insertions(+)
> 
> The corresponding DCSCB support for RTSM is here:
> 
>   git://git.linaro.org/people/nico/linux mcpm+dcscb
> 
> although that part is probably more suitable for ARM-SOC.
> 
> Do you want those patches to be posted again, or should I send a new 
> pull request?

I've been doing some further digging in these patches this evening, via
https://git.linaro.org/gitweb?p=people/nico/linux.git;a=shortlog;h=refs/heads/mcpm

I think they're now in pretty good shape and are ready to be pulled.


* [PATCH v4 00/15] multi-cluster power management
  2013-04-24 20:25             ` Russell King - ARM Linux
@ 2013-04-24 23:31               ` Nicolas Pitre
  0 siblings, 0 replies; 55+ messages in thread
From: Nicolas Pitre @ 2013-04-24 23:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 24 Apr 2013, Russell King - ARM Linux wrote:

> On Wed, Apr 24, 2013 at 12:11:26AM -0400, Nicolas Pitre wrote:
> > The latest MCPM patches are available here:
> > 
> >   git://git.linaro.org/people/nico/linux mcpm
> > 
> > Diffstat:
> > 
> >  Documentation/arm/cluster-pm-race-avoidance.txt | 498 ++++++++++++++++++
> >  Documentation/arm/vlocks.txt                    | 211 ++++++++
> >  arch/arm/Kconfig                                |   8 +
> >  arch/arm/common/Makefile                        |   1 +
> >  arch/arm/common/mcpm_entry.c                    | 263 +++++++++
> >  arch/arm/common/mcpm_head.S                     | 219 ++++++++
> >  arch/arm/common/mcpm_platsmp.c                  |  92 ++++
> >  arch/arm/common/vlock.S                         | 108 ++++
> >  arch/arm/common/vlock.h                         |  29 +
> >  arch/arm/include/asm/cacheflush.h               |  75 +++
> >  arch/arm/include/asm/mcpm.h                     | 209 ++++++++
> >  arch/arm/kernel/asm-offsets.c                   |   4 +
> >  12 files changed, 1717 insertions(+)
> > 
> > The corresponding DCSCB support for RTSM is here:
> > 
> >   git://git.linaro.org/people/nico/linux mcpm+dcscb
> > 
> > although that part is probably more suitable for ARM-SOC.
> > 
> > Do you want those patches to be posted again, or should I send a new 
> > pull request?
> 
> I've been doing some further digging in these patches this evening, via
> https://git.linaro.org/gitweb?p=people/nico/linux.git;a=shortlog;h=refs/heads/mcpm
> 
> I think they're now in pretty good shape and are ready to be pulled.

Great!  Please feel free to pull the mcpm branch.  I've nothing to add 
to it.


Nicolas

