* [RFC PATCH v3 0/8] arm64: Verify early CPU features
@ 2015-12-09  9:57 ` Suzuki K. Poulose
  0 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

This series adds support for verifying some of the cpufeatures
that are decided early in the boot process based on the boot
CPU and hence cannot be delayed until all the CPUs are up
(e.g. ASIDBits, and maybe VHE?). It also adds support for handling
failures to boot secondary CPUs, whose status could not otherwise
be synchronised back to the master CPU.

It also adds one user of this early hook: a check for ASIDBits.
The mm_context id width is based on the ASIDBits supported by the
boot CPU and is used early in the initialisation. So we need to
make sure that all the secondary CPUs support the width reported
by the boot CPU, failing which we crash the system.
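
For reference, the core of that check boils down to something like
this (simplified sketch; patches 4, 7 and 8 contain the real code):

	/* Sketch: reject a secondary CPU with a narrower ASID space */
	void verify_cpu_asid_bits(void)
	{
		u32 asid = get_cpu_asid_bits();	/* helper from patch 7 */

		if (asid < asid_bits) {		/* asid_bits: boot CPU value */
			pr_crit("CPU%d: has smaller ASIDBits(%u) than the one in use (%u)\n",
				smp_processor_id(), asid, asid_bits);
			update_cpu_boot_status(CPU_PANIC_KERNEL);
			isb();
			while (1);		/* park; the master will panic */
		}
	}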

This series has been tested on Juno and the Fast Model, by injecting
a smaller ASIDBits value and missing page-size support.

Changes since V2:
 - Add support for synchronising the boot status of a secondary
   CPU (patch 4). Added the RFC tag.
 - Panic the system on incompatible ASIDBits

Suzuki K. Poulose (8):
  arm64: Introduce cpu_die_early
  arm64: Move cpu_die_early to smp.c
  arm64: head.S : Change register usage
  arm64: Handle early CPU boot failures
  arm64: Enable CPU capability verification unconditionally
  arm64: Add hook for checking early CPU features
  arm64: Add helper for extracting ASIDBits
  arm64: Ensure the secondary CPUs have safe ASIDBits size

 arch/arm64/include/asm/cpufeature.h  |    6 ---
 arch/arm64/include/asm/mmu_context.h |    2 +
 arch/arm64/include/asm/smp.h         |   25 +++++++++++-
 arch/arm64/kernel/asm-offsets.c      |    3 ++
 arch/arm64/kernel/cpufeature.c       |   48 ++++++++--------------
 arch/arm64/kernel/head.S             |   22 ++++++++--
 arch/arm64/kernel/smp.c              |   73 +++++++++++++++++++++++++++++++++-
 arch/arm64/mm/context.c              |   52 ++++++++++++++++++------
 8 files changed, 176 insertions(+), 55 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 1/8] arm64: Introduce cpu_die_early
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

In other words, make fail_incapable_cpu() reusable.

We use fail_incapable_cpu() to kill a secondary CPU early during
bringup when it lacks a capability advertised by the system.
This patch makes the routine more generic, so that it can kill any
secondary CPU during boot, and removes its dependency on the
capability struct. It can then be used by checks which are not
necessarily attached to a capability struct (e.g. the CPU's ASIDBits).

In the process, the function is renamed to cpu_die_early() to better
match its functionality. It will later be moved to arch/arm64/kernel/smp.c.
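
As an illustration of the intended reuse, a caller outside the
capability framework could look roughly like this (hypothetical
sketch; the feature-check helper below is made up):

	/* Hypothetical early check, run on a secondary CPU in bringup */
	static void verify_some_early_feature(void)
	{
		if (!this_cpu_has_required_feature()) {	/* assumed helper */
			pr_crit("CPU%d: missing required feature\n",
				smp_processor_id());
			cpu_die_early();		/* does not return */
		}
	}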

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/cpufeature.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0669c63..581b779 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -826,25 +826,26 @@ static u64 __raw_read_system_reg(u32 sys_id)
 }
 
 /*
- * Park the CPU which doesn't have the capability as advertised
- * by the system.
+ * Kill the calling secondary CPU, early in bringup before it is turned
+ * online.
  */
-static void fail_incapable_cpu(char *cap_type,
-				 const struct arm64_cpu_capabilities *cap)
+void cpu_die_early(void)
 {
 	int cpu = smp_processor_id();
 
-	pr_crit("CPU%d: missing %s : %s\n", cpu, cap_type, cap->desc);
+	pr_crit("CPU%d: will not boot\n", cpu);
+
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
 
 	/* Check if we can park ourselves */
 	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
 		cpu_ops[cpu]->cpu_die(cpu);
-	asm(
-	"1:	wfe\n"
-	"	wfi\n"
-	"	b	1b");
+
+	for (;;) {
+		wfe();
+		wfi();
+	}
 }
 
 /*
@@ -875,8 +876,11 @@ void verify_local_cpu_capabilities(void)
 		 * If the new CPU misses an advertised feature, we cannot proceed
 		 * further, park the cpu.
 		 */
-		if (!feature_matches(__raw_read_system_reg(caps[i].sys_reg), &caps[i]))
-			fail_incapable_cpu("arm64_features", &caps[i]);
+		if (!feature_matches(__raw_read_system_reg(caps[i].sys_reg), &caps[i])) {
+			pr_crit("CPU%d: missing feature: %s\n",
+					smp_processor_id(), caps[i].desc);
+			cpu_die_early();
+		}
 		if (caps[i].enable)
 			caps[i].enable(NULL);
 	}
@@ -884,8 +888,11 @@ void verify_local_cpu_capabilities(void)
 	for (i = 0, caps = arm64_hwcaps; caps[i].desc; i++) {
 		if (!cpus_have_hwcap(&caps[i]))
 			continue;
-		if (!feature_matches(__raw_read_system_reg(caps[i].sys_reg), &caps[i]))
-			fail_incapable_cpu("arm64_hwcaps", &caps[i]);
+		if (!feature_matches(__raw_read_system_reg(caps[i].sys_reg), &caps[i])) {
+			pr_crit("CPU%d: missing HWCAP: %s\n",
+					smp_processor_id(), caps[i].desc);
+			cpu_die_early();
+		}
 	}
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 2/8] arm64: Move cpu_die_early to smp.c
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

This patch moves cpu_die_early to smp.c, where it fits better.
No functional changes, except for adding the necessary checks
for CONFIG_HOTPLUG_CPU.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/smp.h   |    1 +
 arch/arm64/kernel/cpufeature.c |   23 -----------------------
 arch/arm64/kernel/smp.c        |   25 +++++++++++++++++++++++++
 3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index d9c3d6a..13ce01f 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -68,5 +68,6 @@ extern int __cpu_disable(void);
 
 extern void __cpu_die(unsigned int cpu);
 extern void cpu_die(void);
+extern void cpu_die_early(void);
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 581b779..517a3af 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -826,29 +826,6 @@ static u64 __raw_read_system_reg(u32 sys_id)
 }
 
 /*
- * Kill the calling secondary CPU, early in bringup before it is turned
- * online.
- */
-void cpu_die_early(void)
-{
-	int cpu = smp_processor_id();
-
-	pr_crit("CPU%d: will not boot\n", cpu);
-
-	/* Mark this CPU absent */
-	set_cpu_present(cpu, 0);
-
-	/* Check if we can park ourselves */
-	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
-		cpu_ops[cpu]->cpu_die(cpu);
-
-	for (;;) {
-		wfe();
-		wfi();
-	}
-}
-
-/*
  * Run through the enabled system capabilities and enable() it on this CPU.
  * The capabilities were decided based on the available CPUs at the boot time.
  * Any new CPU should match the system wide status of the capability. If the
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..607d876 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -313,6 +313,31 @@ void cpu_die(void)
 }
 #endif
 
+/*
+ * Kill the calling secondary CPU, early in bringup before it is turned
+ * online.
+ */
+void cpu_die_early(void)
+{
+	int cpu = smp_processor_id();
+
+	pr_crit("CPU%d: will not boot\n", cpu);
+
+	/* Mark this CPU absent */
+	set_cpu_present(cpu, 0);
+
+#ifdef CONFIG_HOTPLUG_CPU
+	/* Check if we can park ourselves */
+	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+
+	for (;;) {
+		wfe();
+		wfi();
+	}
+}
+
 static void __init hyp_mode_check(void)
 {
 	if (is_hyp_mode_available())
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 3/8] arm64: head.S : Change register usage
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

The stack information for a secondary CPU (struct secondary_data)
is passed in x21 before the MMU is turned on, while the boot CPU
uses x21 to store the FDT pointer. In code shared by both paths it
is therefore awkward to make use of this information. So, use x22
for passing the secondary_data pointer instead, and initialise it
to NULL on the boot CPU for consistency. This will be used by a
subsequent patch.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/head.S |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 23cfc08..b225d34 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -221,6 +221,7 @@ ENTRY(stext)
 	 */
 	ldr	x27, =__mmap_switched		// address to jump to after
 						// MMU has been enabled
+	mov	x22, 0				// &secondary_data for Boot CPU
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
@@ -598,13 +599,13 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x21, =secondary_data
+	ldr	x22, =secondary_data
 	ldr	x27, =__secondary_switched	// address to jump to after enabling the MMU
 	b	__enable_mmu
 ENDPROC(secondary_startup)
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x21]			// get secondary_data.stack
+	ldr	x0, [x22]			// get secondary_data.stack
 	mov	sp, x0
 	mov	x29, #0
 	b	secondary_start_kernel
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

A secondary CPU could fail to come online due to insufficient
capabilities and could simply die or loop in the kernel.
e.g. a CPU without support for the selected kernel page size
loops in the kernel with the MMU turned off, and a hotplugged CPU
which lacks one of the advertised system capabilities dies during
activation.

There is currently no way to synchronise the status of such a
failing CPU back to the master. This patch solves the issue by
adding a status field to secondary_data which can be updated by
the failing CPU.

Here are the possible states:

 -1. CPU_WAIT_RESULT - Initial value set by the master CPU.

 0. CPU_BOOT_SUCCESS - CPU has booted successfully.

 1. CPU_KILL_ME - CPU has invoked cpu_ops->cpu_die, asking the
master CPU to synchronise by issuing cpu_ops->cpu_kill.

 2. CPU_STUCK_IN_KERNEL - CPU couldn't invoke die() and is instead
looping in the kernel. This information could be used by, say,
kexec to check whether it is really safe to do a kexec reboot
(see the sketch after this list).

 3. CPU_PANIC_KERNEL - CPU detected a serious problem which
requires the kernel to crash immediately. The secondary CPU cannot
call panic() until it has initialised the GIC, so this flag can
be used to instruct the master to do it.
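
As a rough illustration of the kexec use case mentioned above, a
consumer could consult the counter introduced by this patch along
these lines (hypothetical sketch, not part of this series):

	/* Hypothetical check before attempting a kexec reboot */
	extern u32 cpus_stuck_in_kernel;	/* added by this patch */

	static bool cpus_safe_for_kexec(void)
	{
		/*
		 * CPUs parked in kernel text would keep running the old
		 * kernel's code after kexec switches images, so refuse
		 * if any CPU is stuck.
		 */
		return cpus_stuck_in_kernel == 0;
	}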

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/smp.h    |   24 +++++++++++++++++++-
 arch/arm64/kernel/asm-offsets.c |    3 +++
 arch/arm64/kernel/head.S        |   19 ++++++++++++++--
 arch/arm64/kernel/smp.c         |   48 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 13ce01f..240ab3d 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -16,6 +16,19 @@
 #ifndef __ASM_SMP_H
 #define __ASM_SMP_H
 
+/* Values for secondary_data.status */
+
+#define CPU_WAIT_RESULT		-1
+#define CPU_BOOT_SUCCESS	0
+/* The cpu invoked ops->cpu_die, synchronise it with cpu_kill */
+#define CPU_KILL_ME		1
+/* The cpu couldn't die gracefully and is looping in the kernel */
+#define CPU_STUCK_IN_KERNEL	2
+/* Fatal system error detected by secondary CPU, crash the system */
+#define CPU_PANIC_KERNEL	3
+
+#ifndef __ASSEMBLY__
+
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/thread_info.h>
@@ -54,10 +67,15 @@ asmlinkage void secondary_start_kernel(void);
 
 /*
  * Initial data for bringing up a secondary CPU.
+ * @stack  - sp for the secondary CPU
+ * @status - Result passed back from the secondary CPU to
+ *           indicate failure.
  */
 struct secondary_data {
 	void *stack;
-};
+	unsigned long status;
+} ____cacheline_aligned;
+
 extern struct secondary_data secondary_data;
 extern void secondary_entry(void);
 
@@ -70,4 +88,8 @@ extern void __cpu_die(unsigned int cpu);
 extern void cpu_die(void);
 extern void cpu_die_early(void);
 
+extern void update_cpu_boot_status(unsigned long);
+
+#endif /* ifndef __ASSEMBLY__ */
+
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 25de8b2..d8975f0 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -103,6 +103,9 @@ int main(void)
   DEFINE(TZ_MINWEST,		offsetof(struct timezone, tz_minuteswest));
   DEFINE(TZ_DSTTIME,		offsetof(struct timezone, tz_dsttime));
   BLANK();
+  DEFINE(CPU_BOOT_STACK,	offsetof(struct secondary_data, stack));
+  DEFINE(CPU_BOOT_STATUS,	offsetof(struct secondary_data, status));
+  BLANK();
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_CONTEXT,		offsetof(struct kvm_vcpu, arch.ctxt));
   DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b225d34..e0a42dd 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -34,6 +34,7 @@
 #include <asm/pgtable-hwdef.h>
 #include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/smp.h>
 #include <asm/sysreg.h>
 #include <asm/thread_info.h>
 #include <asm/virt.h>
@@ -605,7 +606,7 @@ ENTRY(secondary_startup)
 ENDPROC(secondary_startup)
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x22]			// get secondary_data.stack
+	ldr	x0, [x22, #CPU_BOOT_STACK]	// get secondary_data.stack
 	mov	sp, x0
 	mov	x29, #0
 	b	secondary_start_kernel
@@ -615,6 +616,7 @@ ENDPROC(__secondary_switched)
  * Enable the MMU.
  *
  *  x0  = SCTLR_EL1 value for turning on the MMU.
+ *  x22 = __va(secondary_data)
  *  x27 = *virtual* address to jump to upon completion
  *
  * Other registers depend on the function called upon completion.
@@ -647,6 +649,19 @@ __enable_mmu:
 ENDPROC(__enable_mmu)
 
 __no_granule_support:
+	/* Indicate that this CPU can't boot and is stuck in the kernel */
+	cmp	x22, 0				// Boot CPU doesn't update status
+	b.eq	1f
+	adrp	x1, secondary_data
+	add	x1, x1, #:lo12:secondary_data	// x1 = __pa(secondary_data)
+	mov	x0, #CPU_STUCK_IN_KERNEL
+	str	x0, [x1, #CPU_BOOT_STATUS]	// update the secondary_data.status
+	/* flush the data to PoC */
+	isb
+	dc	civac, x1
+	dsb	sy
+	isb
+1:
 	wfe
-	b __no_granule_support
+	b 1b
 ENDPROC(__no_granule_support)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 607d876..708f4b1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -63,6 +63,8 @@
  * where to place its SVC stack
  */
 struct secondary_data secondary_data;
+/* Number of CPUs which aren't online, but looping in kernel text. */
+u32 cpus_stuck_in_kernel;
 
 enum ipi_msg_type {
 	IPI_RESCHEDULE,
@@ -72,6 +74,16 @@ enum ipi_msg_type {
 	IPI_IRQ_WORK,
 };
 
+#ifdef CONFIG_HOTPLUG_CPU
+static int op_cpu_kill(unsigned int cpu);
+#else
+static inline int op_cpu_kill(unsigned int cpu)
+{
+	return -ENOSYS;
+}
+#endif
+
+
 /*
  * Boot a secondary CPU, and assign it the specified idle task.
  * This also gives us the initial stack to use for this CPU.
@@ -86,6 +98,12 @@ static int boot_secondary(unsigned int cpu, struct task_struct *idle)
 
 static DECLARE_COMPLETION(cpu_running);
 
+void update_cpu_boot_status(unsigned long status)
+{
+	secondary_data.status = status;
+	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
+}
+
 int __cpu_up(unsigned int cpu, struct task_struct *idle)
 {
 	int ret;
@@ -95,7 +113,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	 * page tables.
 	 */
 	secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
-	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
+	update_cpu_boot_status(CPU_WAIT_RESULT);
 
 	/*
 	 * Now bring the CPU into our world.
@@ -117,7 +135,31 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
 	}
 
+	mb();
+
 	secondary_data.stack = NULL;
+	if (ret && secondary_data.status) {
+		switch(secondary_data.status) {
+		default:
+			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
+					cpu, secondary_data.status);
+			break;
+		case CPU_KILL_ME:
+			if (op_cpu_kill(cpu)) {
+				pr_crit("CPU%u: may not have shut down cleanly\n",
+							cpu);
+				cpus_stuck_in_kernel++;
+			} else
+				pr_crit("CPU%u: died during early boot\n", cpu);
+			break;
+		case CPU_STUCK_IN_KERNEL:
+			pr_crit("CPU%u: is stuck in kernel\n", cpu);
+			cpus_stuck_in_kernel++;
+			break;
+		case CPU_PANIC_KERNEL:
+			panic("CPU%u detected unsupported configuration\n", cpu);
+		}
+	}
 
 	return ret;
 }
@@ -185,6 +227,7 @@ asmlinkage void secondary_start_kernel(void)
 	 */
 	pr_info("CPU%u: Booted secondary processor [%08x]\n",
 					 cpu, read_cpuid_id());
+	update_cpu_boot_status(CPU_BOOT_SUCCESS);
 	set_cpu_online(cpu, true);
 	complete(&cpu_running);
 
@@ -327,10 +370,13 @@ void cpu_die_early(void)
 	set_cpu_present(cpu, 0);
 
 #ifdef CONFIG_HOTPLUG_CPU
+	update_cpu_boot_status(CPU_KILL_ME);
 	/* Check if we can park ourselves */
 	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
 		cpu_ops[cpu]->cpu_die(cpu);
 #endif
+	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
+	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
 
 	for (;;) {
 		wfe();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 5/8] arm64: Enable CPU capability verification unconditionally
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

We currently verify the capabilities of secondary CPUs only when
hotplug is enabled. CPUs activated at boot time skip the
verification, since the check tests whether the system-wide
capabilities have been initialised, which is not yet the case.

This patch removes the capability check's dependency on
CONFIG_HOTPLUG_CPU, to make sure that all the secondary CPUs go
through the check. The boot time activated CPUs will still skip the
system-wide capability check. The plan is to hook in a check,
based on the boot CPU values, for the CPU features the kernel
already uses during early boot.
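
For context, the gate that lets the boot time CPUs skip the check
looks roughly like this (simplified sketch of the existing code;
the flag name is illustrative and corresponds to the
set_sys_caps_initialised() helper visible in the diff below):

	void verify_local_cpu_capabilities(void)
	{
		/*
		 * If the system-wide capabilities haven't been computed
		 * yet (we are still bringing up the boot time CPUs),
		 * there is nothing to verify against.
		 */
		if (!sys_caps_initialised)
			return;

		/* ... verify features and HWCAPs, run enable() hooks ... */
	}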

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |    6 ------
 arch/arm64/kernel/cpufeature.c      |   10 ----------
 2 files changed, 16 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8f271b8..26024c4 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -164,13 +164,7 @@ void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			    const char *info);
 void check_local_cpu_errata(void);
 
-#ifdef CONFIG_HOTPLUG_CPU
 void verify_local_cpu_capabilities(void);
-#else
-static inline void verify_local_cpu_capabilities(void)
-{
-}
-#endif
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 517a3af..b94a198 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -767,8 +767,6 @@ static void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps)
 			on_each_cpu(caps[i].enable, NULL, true);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
 /*
  * Flag to indicate if we have computed the system wide
  * capabilities based on the boot time active CPUs. This
@@ -873,14 +871,6 @@ void verify_local_cpu_capabilities(void)
 	}
 }
 
-#else	/* !CONFIG_HOTPLUG_CPU */
-
-static inline void set_sys_caps_initialised(void)
-{
-}
-
-#endif	/* CONFIG_HOTPLUG_CPU */
-
 static void setup_feature_capabilities(void)
 {
 	update_cpu_capabilities(arm64_features, "detected feature:");
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 6/8] arm64: Add hook for checking early CPU features
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

Add a hook for checking whether a secondary CPU has the features
already used by the kernel during early boot, based on the boot
CPU's values.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/cpufeature.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b94a198..e63da0f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -824,6 +824,14 @@ static u64 __raw_read_system_reg(u32 sys_id)
 }
 
 /*
+ * Check for CPU features that are used in early boot
+ * based on the Boot CPU value.
+ */
+static void check_early_cpu_features(void)
+{
+}
+
+/*
  * Run through the enabled system capabilities and enable() it on this CPU.
  * The capabilities were decided based on the available CPUs at the boot time.
  * Any new CPU should match the system wide status of the capability. If the
@@ -836,6 +844,8 @@ void verify_local_cpu_capabilities(void)
 	int i;
 	const struct arm64_cpu_capabilities *caps;
 
+	check_early_cpu_features();
+
 	/*
 	 * If we haven't computed the system capabilities, there is nothing
 	 * to verify.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 7/8] arm64: Add helper for extracting ASIDBits
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

Add a helper to extract the ASIDBits width supported by the current CPU.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/mm/context.c |   36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index e87f53f..643bf4b 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -40,6 +40,28 @@ static cpumask_t tlb_flush_pending;
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
 #define NUM_USER_ASIDS		ASID_FIRST_VERSION
 
+/* Get the ASIDBits supported by the current CPU */
+static u32 get_cpu_asid_bits(void)
+{
+	u32 asid;
+	int fld = cpuid_feature_extract_field(read_cpuid(ID_AA64MMFR0_EL1),
+						ID_AA64MMFR0_ASID_SHIFT);
+
+	switch (fld) {
+	default:
+		pr_warn("CPU%d: Unknown ASID size (%d); assuming 8-bit\n",
+					smp_processor_id(),  fld);
+		/* Fallthrough */
+	case 0:
+		asid = 8;
+		break;
+	case 2:
+		asid = 16;
+	}
+
+	return asid;
+}
+
 static void flush_context(unsigned int cpu)
 {
 	int i;
@@ -187,19 +209,7 @@ switch_mm_fastpath:
 
 static int asids_init(void)
 {
-	int fld = cpuid_feature_extract_field(read_cpuid(ID_AA64MMFR0_EL1), 4);
-
-	switch (fld) {
-	default:
-		pr_warn("Unknown ASID size (%d); assuming 8-bit\n", fld);
-		/* Fallthrough */
-	case 0:
-		asid_bits = 8;
-		break;
-	case 2:
-		asid_bits = 16;
-	}
-
+	asid_bits = get_cpu_asid_bits();
 	/* If we end up with more CPUs than ASIDs, expect things to crash */
 	WARN_ON(NUM_USER_ASIDS < num_possible_cpus());
 	atomic64_set(&asid_generation, ASID_FIRST_VERSION);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH v3 8/8] arm64: Ensure the secondary CPUs have safe ASIDBits size
  2015-12-09  9:57 ` Suzuki K. Poulose
@ 2015-12-09  9:57   ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-09  9:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, Suzuki K. Poulose

ID_AA64MMFR0_EL1.ASIDBits determines the size of the mm context
id and is used early during boot to make decisions. The value is
picked up from the boot CPU, and the decision cannot be delayed
until the other CPUs are up. If a secondary CPU supports a smaller
ASIDBits width than the boot CPU, things will break horribly and
the usual SANITY check is not good enough to prevent the system
from crashing. So, crash the system with enough information.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/mmu_context.h |    2 ++
 arch/arm64/kernel/cpufeature.c       |    2 ++
 arch/arm64/mm/context.c              |   16 ++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 2416578..bd8a0b9 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -147,4 +147,6 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #define deactivate_mm(tsk,mm)	do { } while (0)
 #define activate_mm(prev,next)	switch_mm(prev, next, NULL)
 
+void verify_cpu_asid_bits(void);
+
 #endif
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e63da0f..f7bcd30 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -24,6 +24,7 @@
 #include <asm/cpu.h>
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
+#include <asm/mmu_context.h>
 #include <asm/processor.h>
 #include <asm/sysreg.h>
 
@@ -829,6 +830,7 @@ static u64 __raw_read_system_reg(u32 sys_id)
  */
 static void check_early_cpu_features(void)
 {
+	verify_cpu_asid_bits();
 }
 
 /*
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 643bf4b..b945bb4 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -24,6 +24,7 @@
 
 #include <asm/cpufeature.h>
 #include <asm/mmu_context.h>
+#include <asm/smp.h>
 #include <asm/tlbflush.h>
 
 static u32 asid_bits;
@@ -62,6 +63,21 @@ static u32 get_cpu_asid_bits(void)
 	return asid;
 }
 
+/* Check if the current cpu's ASIDBits is compatible with asid_bits */
+void verify_cpu_asid_bits(void)
+{
+	u32 asid = get_cpu_asid_bits();
+
+	if (asid < asid_bits) {
+		/* Differing ASIDBits is dangerous, crash the system */
+		pr_crit("CPU%d: has smaller ASIDBits(%u) than the one in use (%u)\n",
+				smp_processor_id(), asid, asid_bits);
+		update_cpu_boot_status(CPU_PANIC_KERNEL);
+		isb();
+		while(1);
+	}
+}
+
 static void flush_context(unsigned int cpu)
 {
 	int i;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 2/8] arm64: Move cpu_die_early to smp.c
  2015-12-09  9:57   ` Suzuki K. Poulose
@ 2015-12-15 11:23     ` Will Deacon
  -1 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2015-12-15 11:23 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, catalin.marinas,
	marc.zyngier

On Wed, Dec 09, 2015 at 09:57:13AM +0000, Suzuki K. Poulose wrote:
> This patch moves cpu_die_early to smp.c, where it fits better.
> No functional changes, except for adding the necessary checks
> for CONFIG_HOTPLUG_CPU.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/smp.h   |    1 +
>  arch/arm64/kernel/cpufeature.c |   23 -----------------------
>  arch/arm64/kernel/smp.c        |   25 +++++++++++++++++++++++++
>  3 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index d9c3d6a..13ce01f 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -68,5 +68,6 @@ extern int __cpu_disable(void);
>  
>  extern void __cpu_die(unsigned int cpu);
>  extern void cpu_die(void);
> +extern void cpu_die_early(void);
>  
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 581b779..517a3af 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -826,29 +826,6 @@ static u64 __raw_read_system_reg(u32 sys_id)
>  }
>  
>  /*
> - * Kill the calling secondary CPU, early in bringup before it is turned
> - * online.
> - */
> -void cpu_die_early(void)
> -{
> -	int cpu = smp_processor_id();
> -
> -	pr_crit("CPU%d: will not boot\n", cpu);
> -
> -	/* Mark this CPU absent */
> -	set_cpu_present(cpu, 0);
> -
> -	/* Check if we can park ourselves */
> -	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
> -		cpu_ops[cpu]->cpu_die(cpu);
> -
> -	for (;;) {
> -		wfe();
> -		wfi();
> -	}
> -}
> -
> -/*
>   * Run through the enabled system capabilities and enable() it on this CPU.
>   * The capabilities were decided based on the available CPUs at the boot time.
>   * Any new CPU should match the system wide status of the capability. If the
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..607d876 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -313,6 +313,31 @@ void cpu_die(void)
>  }
>  #endif
>  
> +/*
> + * Kill the calling secondary CPU, early in bringup before it is turned
> + * online.
> + */
> +void cpu_die_early(void)
> +{
> +	int cpu = smp_processor_id();
> +
> +	pr_crit("CPU%d: will not boot\n", cpu);
> +
> +	/* Mark this CPU absent */
> +	set_cpu_present(cpu, 0);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +	/* Check if we can park ourselves */
> +	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)

IS_ENABLED?

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 2/8] arm64: Move cpu_die_early to smp.c
  2015-12-15 11:23     ` Will Deacon
@ 2015-12-15 11:26       ` Mark Rutland
  -1 siblings, 0 replies; 34+ messages in thread
From: Mark Rutland @ 2015-12-15 11:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki K. Poulose, linux-arm-kernel, linux-kernel,
	catalin.marinas, marc.zyngier

On Tue, Dec 15, 2015 at 11:23:39AM +0000, Will Deacon wrote:
> > +/*
> > + * Kill the calling secondary CPU, early in bringup before it is turned
> > + * online.
> > + */
> > +void cpu_die_early(void)
> > +{
> > +	int cpu = smp_processor_id();
> > +
> > +	pr_crit("CPU%d: will not boot\n", cpu);
> > +
> > +	/* Mark this CPU absent */
> > +	set_cpu_present(cpu, 0);
> > +
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +	/* Check if we can park ourselves */
> > +	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
> 
> IS_ENABLED?

The definition of struct cpu_ops has cpu_die in an #ifdef
CONFIG_HOTPLUG_CPU.
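
For reference, an abridged sketch of that definition (only the members
relevant here; see arch/arm64/include/asm/cpu_ops.h for the real thing):

	struct cpu_operations {
		const char	*name;
		int		(*cpu_boot)(unsigned int cpu);
	#ifdef CONFIG_HOTPLUG_CPU
		int		(*cpu_disable)(unsigned int cpu);
		void		(*cpu_die)(unsigned int cpu);
		int		(*cpu_kill)(unsigned int cpu);
	#endif
	};

With CONFIG_HOTPLUG_CPU=n the cpu_die member does not exist at all, so a
dereference guarded only by IS_ENABLED(CONFIG_HOTPLUG_CPU) would still
fail to compile; the #ifdef has to stay.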

Mark.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-09  9:57   ` Suzuki K. Poulose
@ 2015-12-15 11:37     ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-15 11:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, catalin.marinas,
	marc.zyngier, James Morse

On 09/12/15 09:57, Suzuki K. Poulose wrote:
> A secondary CPU could fail to come online due to insufficient
> capabilities and could simply die or loop in the kernel.
> e.g, a CPU with no support for the selected kernel PAGE_SIZE
> loops in kernel with MMU turned off.
> or a hotplugged CPU which doesn't have one of the advertised
> system capability will die during the activation.


>   #ifdef CONFIG_HOTPLUG_CPU
> +     update_cpu_boot_status(CPU_KILL_ME);
>       /* Check if we can park ourselves */
>       if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
>               cpu_ops[cpu]->cpu_die(cpu);
>   #endif
> +     update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
> +     __flush_dcache_area(&secondary_data, sizeof(secondary_data));

James pointed out that we don't need the explicit __flush_dcache_area() (a leftover
from the previous version). I have fixed this locally.


Thanks
Suzuki

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-09  9:57   ` Suzuki K. Poulose
@ 2015-12-15 11:55     ` Will Deacon
  -1 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2015-12-15 11:55 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, catalin.marinas,
	marc.zyngier

On Wed, Dec 09, 2015 at 09:57:15AM +0000, Suzuki K. Poulose wrote:
> A secondary CPU could fail to come online due to insufficient
> capabilities and could simply die or loop in the kernel.
> e.g, a CPU with no support for the selected kernel PAGE_SIZE
> loops in kernel with MMU turned off.
> or a hotplugged CPU which doesn't have one of the advertised
> system capability will die during the activation.
> 
> There is no way to synchronise the status of the failing CPU
> back to the master. This patch solves the issue by adding a
> field to the secondary_data which can be updated by the failing
> CPU.
> 
> Here are the possible states :
> 
>  -1. CPU_WAIT_STATUS - Initial value set by the master CPU.
> 
>  0. CPU_BOOT_SUCCESS - CPU has booted successfully.
> 
>  1. CPU_KILL_ME - CPU has invoked cpu_ops->die, indicating the
> master CPU to synchronise by issuing a cpu_ops->cpu_kill.
> 
>  2. CPU_STUCK_IN_KERNEL - CPU couldn't invoke die(), instead is
> looping in the kernel. This information could be used by say,
> kexec to check if it is really safe to do a kexec reboot.
> 
>  3. CPU_PANIC_KERNEL - CPU detected some serious issues which
> requires kernel to crash immediately. The secondary CPU cannot
> call panic() until it has initialised the GIC. This flag can
> be used to instruct the master to do so.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/smp.h    |   24 +++++++++++++++++++-
>  arch/arm64/kernel/asm-offsets.c |    3 +++
>  arch/arm64/kernel/head.S        |   19 ++++++++++++++--
>  arch/arm64/kernel/smp.c         |   48 ++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 90 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 13ce01f..240ab3d 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -16,6 +16,19 @@
>  #ifndef __ASM_SMP_H
>  #define __ASM_SMP_H
>  
> +/* Values for secondary_data.status */
> +
> +#define CPU_WAIT_RESULT		-1
> +#define CPU_BOOT_SUCCESS	0
> +/* The cpu invoked ops->cpu_die, synchronise it with cpu_kill */
> +#define CPU_KILL_ME		1
> +/* The cpu couldn't die gracefully and is looping in the kernel */
> +#define CPU_STUCK_IN_KERNEL	2
> +/* Fatal system error detected by secondary CPU, crash the system */
> +#define CPU_PANIC_KERNEL	3
> +
> +#ifndef __ASSEMBLY__
> +
>  #include <linux/threads.h>
>  #include <linux/cpumask.h>
>  #include <linux/thread_info.h>
> @@ -54,10 +67,15 @@ asmlinkage void secondary_start_kernel(void);
>  
>  /*
>   * Initial data for bringing up a secondary CPU.
> + * @stack  - sp for the secondary CPU
> + * @status - Result passed back from the secondary CPU to
> + *           indicate failure.
>   */
>  struct secondary_data {
>  	void *stack;
> -};
> +	unsigned long status;
> +} ____cacheline_aligned;

Why is this necessary?

> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index b225d34..e0a42dd 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -34,6 +34,7 @@
>  #include <asm/pgtable-hwdef.h>
>  #include <asm/pgtable.h>
>  #include <asm/page.h>
> +#include <asm/smp.h>
>  #include <asm/sysreg.h>
>  #include <asm/thread_info.h>
>  #include <asm/virt.h>
> @@ -605,7 +606,7 @@ ENTRY(secondary_startup)
>  ENDPROC(secondary_startup)
>  
>  ENTRY(__secondary_switched)
> -	ldr	x0, [x22]			// get secondary_data.stack
> +	ldr	x0, [x22, #CPU_BOOT_STACK]	// get secondary_data.stack
>  	mov	sp, x0
>  	mov	x29, #0
>  	b	secondary_start_kernel
> @@ -615,6 +616,7 @@ ENDPROC(__secondary_switched)
>   * Enable the MMU.
>   *
>   *  x0  = SCTLR_EL1 value for turning on the MMU.
> + *  x22 = __va(secondary_data)
>   *  x27 = *virtual* address to jump to upon completion
>   *
>   * Other registers depend on the function called upon completion.
> @@ -647,6 +649,19 @@ __enable_mmu:
>  ENDPROC(__enable_mmu)
>  
>  __no_granule_support:
> +	/* Indicate that this CPU can't boot and is stuck in the kernel */
> +	cmp	x22, 0				// Boot CPU doesn't update status
> +	b.eq	1f
> +	adrp	x1, secondary_data
> +	add	x1, x1, #:lo12:secondary_data	// x1 = __pa(secondary_data)

This feels a bit grotty. You end up using the virtual address of
secondary_data to figure out if you're the boot CPU or not, and then you
end up having to compute the physical address of the same variable anyway.

> +	mov	x0, #CPU_STUCK_IN_KERNEL
> +	str	x0, [x1, #CPU_BOOT_STATUS]	// update the secondary_data.status
> +	/* flush the data to PoC */

Can you call __inval_cache_range instead of open-coding this?

> +	isb

Why?

> +	dc	civac, x1
> +	dsb	sy
> +	isb

Why?

> +1:
>  	wfe

Curious, but do you know why we don't have a wfi here too (like we do in
the C code)?

> -	b __no_granule_support
> +	b 1b
>  ENDPROC(__no_granule_support)
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 607d876..708f4b1 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -63,6 +63,8 @@
>   * where to place its SVC stack
>   */
>  struct secondary_data secondary_data;
> +/* Number of CPUs which aren't online, but looping in kernel text. */
> +u32 cpus_stuck_in_kernel;
>  
>  enum ipi_msg_type {
>  	IPI_RESCHEDULE,
> @@ -72,6 +74,16 @@ enum ipi_msg_type {
>  	IPI_IRQ_WORK,
>  };
>  
> +#ifdef CONFIG_HOTPLUG_CPU
> +static int op_cpu_kill(unsigned int cpu);
> +#else
> +static inline int op_cpu_kill(unsigned int cpu)
> +{
> +	return -ENOSYS;
> +}
> +#endif
> +
> +
>  /*
>   * Boot a secondary CPU, and assign it the specified idle task.
>   * This also gives us the initial stack to use for this CPU.
> @@ -86,6 +98,12 @@ static int boot_secondary(unsigned int cpu, struct task_struct *idle)
>  
>  static DECLARE_COMPLETION(cpu_running);
>  
> +void update_cpu_boot_status(unsigned long status)
> +{
> +	secondary_data.status = status;
> +	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> +}
> +
>  int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  {
>  	int ret;
> @@ -95,7 +113,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  	 * page tables.
>  	 */
>  	secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
> -	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> +	update_cpu_boot_status(CPU_WAIT_RESULT);
>  
>  	/*
>  	 * Now bring the CPU into our world.
> @@ -117,7 +135,31 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  	}
>  
> +	mb();

Why? (and why not smp_mb(), if the barrier is actually needed?).

> +
>  	secondary_data.stack = NULL;
> +	if (ret && secondary_data.status) {
> +		switch(secondary_data.status) {
> +		default:
> +			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> +					cpu, secondary_data.status);
> +			break;
> +		case CPU_KILL_ME:
> +			if (op_cpu_kill(cpu)) {
> +				pr_crit("CPU%u: may not have shut down cleanly\n",
> +							cpu);
> +				cpus_stuck_in_kernel++;

Maybe restructure this so the failed kill case is a fallthrough to the
CPU_STUCK_IN_KERNEL case?

> +			} else
> +				pr_crit("CPU%u: died during early boot\n", cpu);

Missing braces.

> +			break;
> +		case CPU_STUCK_IN_KERNEL:
> +			pr_crit("CPU%u: is stuck in kernel\n", cpu);
> +			cpus_stuck_in_kernel++;
> +			break;
> +		case CPU_PANIC_KERNEL:
> +			panic("CPU%u detected unsupported configuration\n", cpu);

It would be nice to show what the configuration difference was, but maybe
that's work for later.

> +		}
> +	}
>  
>  	return ret;
>  }
> @@ -185,6 +227,7 @@ asmlinkage void secondary_start_kernel(void)
>  	 */
>  	pr_info("CPU%u: Booted secondary processor [%08x]\n",
>  					 cpu, read_cpuid_id());
> +	update_cpu_boot_status(CPU_BOOT_SUCCESS);

Why do we need to continue with the cacheflushing at this point?

>  	set_cpu_online(cpu, true);
>  	complete(&cpu_running);
>  
> @@ -327,10 +370,13 @@ void cpu_die_early(void)
>  	set_cpu_present(cpu, 0);
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +	update_cpu_boot_status(CPU_KILL_ME);
>  	/* Check if we can park ourselves */
>  	if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die)
>  		cpu_ops[cpu]->cpu_die(cpu);
>  #endif
> +	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
> +	__flush_dcache_area(&secondary_data, sizeof(secondary_data));

Extra flushing, but I don't see why you need it at all when you're
running in C on the CPU coming up.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 8/8] arm64: Ensure the secondary CPUs have safe ASIDBits size
  2015-12-09  9:57   ` Suzuki K. Poulose
@ 2015-12-15 12:02     ` Will Deacon
  -1 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2015-12-15 12:02 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, catalin.marinas,
	marc.zyngier

On Wed, Dec 09, 2015 at 09:57:19AM +0000, Suzuki K. Poulose wrote:
> The ID_AA64MMFR0_EL1:ASIDBits determines the size of the mm context
> id and is used in the early boot to make decisions. The value is
> picked up from the Boot CPU and cannot be delayed until other CPUs
> are up. If a secondary CPU has a smaller size than that of the Boot
> CPU, things will break horribly and the usual SANITY check is not good
> enough to prevent the system from crashing. So, crash the system with
> enough information.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/mmu_context.h |    2 ++
>  arch/arm64/kernel/cpufeature.c       |    2 ++
>  arch/arm64/mm/context.c              |   16 ++++++++++++++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 2416578..bd8a0b9 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -147,4 +147,6 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  #define deactivate_mm(tsk,mm)	do { } while (0)
>  #define activate_mm(prev,next)	switch_mm(prev, next, NULL)
>  
> +void verify_cpu_asid_bits(void);
> +
>  #endif
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index e63da0f..f7bcd30 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -24,6 +24,7 @@
>  #include <asm/cpu.h>
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
> +#include <asm/mmu_context.h>
>  #include <asm/processor.h>
>  #include <asm/sysreg.h>
>  
> @@ -829,6 +830,7 @@ static u64 __raw_read_system_reg(u32 sys_id)
>   */
>  static void check_early_cpu_features(void)
>  {
> +	verify_cpu_asid_bits();
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index 643bf4b..b945bb4 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -24,6 +24,7 @@
>  
>  #include <asm/cpufeature.h>
>  #include <asm/mmu_context.h>
> +#include <asm/smp.h>
>  #include <asm/tlbflush.h>
>  
>  static u32 asid_bits;
> @@ -62,6 +63,21 @@ static u32 get_cpu_asid_bits(void)
>  	return asid;
>  }
>  
> +/* Check if the current cpu's ASIDBits is compatible with asid_bits */
> +void verify_cpu_asid_bits(void)
> +{
> +	u32 asid = get_cpu_asid_bits();
> +
> +	if (asid < asid_bits) {
> +		/* Differing ASIDBits is dangerous, crash the system */

This comment isn't much use. How about:

  /*
   * We cannot decrease the ASID size at runtime, so panic if we support
   * fewer ASID bits than the boot CPU.
   */

> +		pr_crit("CPU%d: has smaller ASIDBits(%u) than the one in use (%u)\n",
> +				smp_processor_id(), asid, asid_bits);

"CPU%d: smaller ASID size (%d) than boot CPU (%u)\n"

> +		update_cpu_boot_status(CPU_PANIC_KERNEL);
> +		isb();

Why?

> +		while(1);

We probably want the usual wfe/wfi trick here. Perhaps we should have
an inline function for that.
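
Something like the below, maybe (the name is just a placeholder):

	/* Park the calling CPU; never returns. */
	static inline void cpu_park_loop(void)
	{
		for (;;) {
			wfe();
			wfi();
		}
	}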

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-15 11:55     ` Will Deacon
@ 2015-12-16 10:39       ` Suzuki K. Poulose
  -1 siblings, 0 replies; 34+ messages in thread
From: Suzuki K. Poulose @ 2015-12-16 10:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, catalin.marinas,
	marc.zyngier

On 15/12/15 11:55, Will Deacon wrote:
> On Wed, Dec 09, 2015 at 09:57:15AM +0000, Suzuki K. Poulose wrote:

>>   /*
>>    * Initial data for bringing up a secondary CPU.
>> + * @stack  - sp for the secondary CPU
>> + * @status - Result passed back from the secondary CPU to
>> + *           indicate failure.
>>    */
>>   struct secondary_data {
>>   	void *stack;
>> -};
>> +	unsigned long status;
>> +} ____cacheline_aligned;
>
> Why is this necessary?

That was based on a suggestion from Mark Rutland here:

   https://lkml.org/lkml/2015/12/1/580


>> @@ -615,6 +616,7 @@ ENDPROC(__secondary_switched)
>>    * Enable the MMU.
>>    *
>>    *  x0  = SCTLR_EL1 value for turning on the MMU.
>> + *  x22 = __va(secondary_data)
>>    *  x27 = *virtual* address to jump to upon completion

...

>>
>>   __no_granule_support:
>> +	/* Indicate that this CPU can't boot and is stuck in the kernel */
>> +	cmp	x22, 0				// Boot CPU doesn't update status
>> +	b.eq	1f
>> +	adrp	x1, secondary_data
>> +	add	x1, x1, #:lo12:secondary_data	// x1 = __pa(secondary_data)
>
> This feels a bit grotty. You end up using the virtual address of
> secondary_data to figure out if you're the boot CPU or not, and then you
> end up having to compute the physical address of the same variable anyway.
>

Yes, it looks a bit weird, but we pass down __va(secondary_data) to the secondary
CPU and it is used only after the MMU is turned on, to fetch the stack pointer.
We use that information to detect whether we are a secondary. When we fail before
turning the MMU on, we need to report our result back, hence the va-to-pa conversion.
Is there a better way to handle it? Am I missing something? Or we could add another
register to indicate whether this is a secondary CPU.

>> +	mov	x0, #CPU_STUCK_IN_KERNEL
>> +	str	x0, [x1, #CPU_BOOT_STATUS]	// update the secondary_data.status
>> +	/* flush the data to PoC */
>
> Can you call __inval_cache_range instead of open-coding this?

Yes, that sounds better, will use that.

>> +	isb
>
> Why?

...

>
>> +	dc	civac, x1
>> +	dsb	sy
>> +	isb
>
> Why?

I now realize how bad this code is. I wrote it up in a hurry, without a proper
understanding of the dependencies of 'dc'. I was under the assumption that the isb
was needed, just like we do after updating the system control registers, which is
not true for dc.

>
>> +1:
>>   	wfe
>
> Curious, but do you know why we don't have a wfi here too (like we do in
> the C code)?

Yes we do, will add it.

>>   	/*
>>   	 * Now bring the CPU into our world.
>> @@ -117,7 +135,31 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>   		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>>   	}
>>
>> +	mb();
>
> Why? (and why not smp_mb(), if the barrier is actually needed?).

Actually, now that I think of it, we don't need it, as the cache would have been
invalidated by the secondary. All we need is to make secondary_data.status volatile,
to make sure we don't use a stale copy of secondary_data.status cached in a
register.

I will rewrite this code.
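
For instance, something like this (only a sketch of the idea):

	/*
	 * Read the status once with READ_ONCE() so the compiler cannot
	 * reuse a copy cached in a register from before the wait; no
	 * volatile member or extra barrier is then needed.
	 */
	unsigned long status = READ_ONCE(secondary_data.status);

	if (ret && status != CPU_BOOT_SUCCESS)
		pr_err("CPU%u: failed to boot, status 0x%lx\n", cpu, status);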

>> +
>>   	secondary_data.stack = NULL;
>> +	if (ret && secondary_data.status) {
>> +		switch(secondary_data.status) {
>> +		default:
>> +			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>> +					cpu, secondary_data.status);
>> +			break;
>> +		case CPU_KILL_ME:
>> +			if (op_cpu_kill(cpu)) {
>> +				pr_crit("CPU%u: may not have shut down cleanly\n",
>> +							cpu);
>> +				cpus_stuck_in_kernel++;
>
> Maybe restructure this so the failed kill case is a fallthrough to the
> CPU_STUCK_IN_KERNEL case?

Yes, makes sense.
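
i.e., roughly (just a sketch, reusing the names from the patch; the failed
kill falls through to the stuck-in-kernel handling):

	switch (secondary_data.status) {
	default:
		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
				cpu, secondary_data.status);
		break;
	case CPU_KILL_ME:
		if (!op_cpu_kill(cpu)) {
			pr_crit("CPU%u: died during early boot\n", cpu);
			break;
		}
		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
		/* Fall through */
	case CPU_STUCK_IN_KERNEL:
		pr_crit("CPU%u: is stuck in kernel\n", cpu);
		cpus_stuck_in_kernel++;
		break;
	case CPU_PANIC_KERNEL:
		panic("CPU%u detected unsupported configuration\n", cpu);
	}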

>
>> +			} else
>> +				pr_crit("CPU%u: died during early boot\n", cpu);
>
> Missing braces.

Will fix that.

>> +			break;
>> +		case CPU_STUCK_IN_KERNEL:
>> +			pr_crit("CPU%u: is stuck in kernel\n", cpu);
>> +			cpus_stuck_in_kernel++;
>> +			break;
>> +		case CPU_PANIC_KERNEL:
>> +			panic("CPU%u detected unsupported configuration\n", cpu);
>
> It would nice to show what the configuration difference was, but maybe
> that's work for later.

One way would be to assign a code for each of the failing cases and overload the
status with the reason (in the top half). But that's only needed for cases where
we couldn't write something to the console. Anyway, that's something for later.
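
Roughly like this, perhaps (the names and the exact split are made up for
illustration only):

	/* Keep the boot state in the low half, an optional reason on top. */
	#define CPU_STUCK_REASON_SHIFT		32
	#define CPU_BOOT_STATUS_MASK		((1UL << CPU_STUCK_REASON_SHIFT) - 1)
	#define CPU_STUCK_REASON_NO_GRAN	(1UL << CPU_STUCK_REASON_SHIFT)
	#define CPU_STUCK_REASON_ASIDBITS	(1UL << (CPU_STUCK_REASON_SHIFT + 1))

	/* e.g., from verify_cpu_asid_bits(): */
	update_cpu_boot_status(CPU_PANIC_KERNEL | CPU_STUCK_REASON_ASIDBITS);

	/* ...and the master could report it as: */
	pr_crit("CPU%u: failed, reason 0x%lx\n", cpu,
		secondary_data.status & ~CPU_BOOT_STATUS_MASK);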

>
>> +		}
>> +	}
>>
>>   	return ret;
>>   }
>> @@ -185,6 +227,7 @@ asmlinkage void secondary_start_kernel(void)
>>   	 */
>>   	pr_info("CPU%u: Booted secondary processor [%08x]\n",
>>   					 cpu, read_cpuid_id());
>> +	update_cpu_boot_status(CPU_BOOT_SUCCESS);
>
> Why do we need to continue with the cacheflushing at this point?

>> +	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
>> +	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>
> Extra flushing, but I don't see why you need it at all when you're
> running in C on the CPU coming up.


You are right, we don't need to do this as we are already using the
va.

I will rewrite it properly. Thanks for the review and patience :)

Cheers
Suzuki
  


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-16 10:39       ` Suzuki K. Poulose
@ 2015-12-16 11:29         ` Will Deacon
  -1 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2015-12-16 11:29 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: mark.rutland, catalin.marinas, linux-kernel, linux-arm-kernel,
	marc.zyngier

On Wed, Dec 16, 2015 at 10:39:50AM +0000, Suzuki K. Poulose wrote:
> On 15/12/15 11:55, Will Deacon wrote:
> >On Wed, Dec 09, 2015 at 09:57:15AM +0000, Suzuki K. Poulose wrote:
> 
> >>  /*
> >>   * Initial data for bringing up a secondary CPU.
> >>+ * @stack  - sp for the secondary CPU
> >>+ * @status - Result passed back from the secondary CPU to
> >>+ *           indicate failure.
> >>   */
> >>  struct secondary_data {
> >>  	void *stack;
> >>-};
> >>+	unsigned long status;
> >>+} ____cacheline_aligned;
> >
> >Why is this necessary?
> 
> That was based on a suggestion from Mark Rutland here:
> 
>   https://lkml.org/lkml/2015/12/1/580

That thread is talking about the CWG, which is not the same thing as
____cacheline_aligned. Given that the architectural maximum for the CWG
is 2K, we can probably get away with allocating the status field amongst
the head.S text instead (which we know will be clean).

Since SMP boot is serialised, that should be sufficient, right?

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures
  2015-12-16 11:29         ` Will Deacon
@ 2015-12-16 11:36           ` Mark Rutland
  -1 siblings, 0 replies; 34+ messages in thread
From: Mark Rutland @ 2015-12-16 11:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki K. Poulose, catalin.marinas, linux-kernel,
	linux-arm-kernel, marc.zyngier

On Wed, Dec 16, 2015 at 11:29:15AM +0000, Will Deacon wrote:
> On Wed, Dec 16, 2015 at 10:39:50AM +0000, Suzuki K. Poulose wrote:
> > On 15/12/15 11:55, Will Deacon wrote:
> > >On Wed, Dec 09, 2015 at 09:57:15AM +0000, Suzuki K. Poulose wrote:
> > 
> > >>  /*
> > >>   * Initial data for bringing up a secondary CPU.
> > >>+ * @stack  - sp for the secondary CPU
> > >>+ * @status - Result passed back from the secondary CPU to
> > >>+ *           indicate failure.
> > >>   */
> > >>  struct secondary_data {
> > >>  	void *stack;
> > >>-};
> > >>+	unsigned long status;
> > >>+} ____cacheline_aligned;
> > >
> > >Why is this necessary?
> > 
> > That was based on a suggestion from Mark Rutland here:
> > 
> >   https://lkml.org/lkml/2015/12/1/580
> 
> That thread is talking about the CWG, which is not the same thing as
> ____cacheline_aligned. Given that the architectural maximum for the CWG
> is 2K, we can probably get away with allocating the status field amongst
> the head.S text instead (which we know will be clean).
> 
> Since SMP boot is serialised, that should be sufficient, right?

Assuming you mean in .head.text (rather than .text), that should work
given that we don't currently free .head.text.

Mark.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2015-12-16 11:36 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-09  9:57 [RFC PATCH v3 0/8] arm64: Verify early CPU features Suzuki K. Poulose
2015-12-09  9:57 ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 1/8] arm64: Introduce cpu_die_early Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 2/8] arm64: Move cpu_die_early to smp.c Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-15 11:23   ` Will Deacon
2015-12-15 11:23     ` Will Deacon
2015-12-15 11:26     ` Mark Rutland
2015-12-15 11:26       ` Mark Rutland
2015-12-09  9:57 ` [RFC PATCH v3 3/8] arm64: head.S : Change register usage Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 4/8] arm64: Handle early CPU boot failures Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-15 11:37   ` Suzuki K. Poulose
2015-12-15 11:37     ` Suzuki K. Poulose
2015-12-15 11:55   ` Will Deacon
2015-12-15 11:55     ` Will Deacon
2015-12-16 10:39     ` Suzuki K. Poulose
2015-12-16 10:39       ` Suzuki K. Poulose
2015-12-16 11:29       ` Will Deacon
2015-12-16 11:29         ` Will Deacon
2015-12-16 11:36         ` Mark Rutland
2015-12-16 11:36           ` Mark Rutland
2015-12-09  9:57 ` [RFC PATCH v3 5/8] arm64: Enable CPU capability verification unconditionally Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 6/8] arm64: Add hook for checking early CPU features Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 7/8] arm64: Add helper for extracting ASIDBits Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-09  9:57 ` [RFC PATCH v3 8/8] arm64: Ensure the secondary CPUs have safe ASIDBits size Suzuki K. Poulose
2015-12-09  9:57   ` Suzuki K. Poulose
2015-12-15 12:02   ` Will Deacon
2015-12-15 12:02     ` Will Deacon
