linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
@ 2020-07-31 17:38 Marc Zyngier
  2020-07-31 17:38 ` [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Marc Zyngier @ 2020-07-31 17:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Sai Prakash Ranjan, kernel-team, Suzuki K Poulose,
	Catalin Marinas, Stephen Boyd, dianders, Will Deacon

Erratum 1418040 currently prevents a late CPU from booting if none
of the early CPUs are affected by it. This is because the handling
is implemented as alternatives, and we have already got rid of them
by the time userspace onlines a new CPU.

A solution to this is to move everything into C code, and rely on
static keys instead. Once this is done, the feature can be allowed
for late CPUs.
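
A rough illustration of why alternatives get in the way, as a toy user-space model rather than kernel code. The workaround below is either fixed once at boot (standing in for an alternative) or gated by a flag that can still be flipped later (standing in for a static key). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code; all names are hypothetical.
 * 'alt_patched' stands in for an alternative that is applied once at boot
 * and can no longer change afterwards; 'key_enabled' stands in for a
 * static key that can still be enabled after boot.
 */
struct wa_state {
	bool alt_patched;
	bool key_enabled;
};

/* Boot: both mechanisms reflect whether any early CPU was affected. */
static void wa_boot(struct wa_state *s, bool any_early_cpu_affected)
{
	s->alt_patched = any_early_cpu_affected;
	s->key_enabled = any_early_cpu_affected;
}

/*
 * Alternatives model: an affected late CPU can only be accepted if the
 * workaround was already patched in at boot, since the code is now fixed.
 */
static bool wa_late_cpu_ok_alt(const struct wa_state *s, bool cpu_affected)
{
	return !cpu_affected || s->alt_patched;
}

/*
 * Static-key model: an affected late CPU turns the key on, so the
 * workaround is active from then on and the CPU can be accepted.
 */
static bool wa_late_cpu_ok_key(struct wa_state *s, bool cpu_affected)
{
	if (cpu_affected)
		s->key_enabled = true;
	return true;
}
```

With no affected early CPU, wa_late_cpu_ok_alt(&s, true) returns false, while wa_late_cpu_ok_key(&s, true) returns true and turns s.key_enabled on. The real code patching and static key machinery are deliberately left out.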

Note that CPUs affected by 1418040 also tend to lack AArch32-EL1,
meaning they cannot be used as late CPUs when KVM is enabled and
their siblings have AArch32-EL1.

* From v1:
  - Dropped check for kernel threads
  - Added comment describing the switching logic
  - Made the errata handling function __always_inline

* From v2:
  - Dropped __always_inline again
  - Simplified logic


Marc Zyngier (2):
  arm64: Move handling of erratum 1418040 into C code
  arm64: Allow booting of late CPUs affected by erratum 1418040

 arch/arm64/kernel/cpu_errata.c |  2 ++
 arch/arm64/kernel/entry.S      | 21 ---------------------
 arch/arm64/kernel/process.c    | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 21 deletions(-)

-- 
2.27.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code
  2020-07-31 17:38 [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs Marc Zyngier
@ 2020-07-31 17:38 ` Marc Zyngier
  2020-07-31 18:00   ` Stephen Boyd
  2020-07-31 17:38 ` [PATCH v3 2/2] arm64: Allow booting of late CPUs affected by erratum 1418040 Marc Zyngier
       [not found] ` <159803353178.13439.17036526669146072985.b4-ty@arm.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2020-07-31 17:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Sai Prakash Ranjan, kernel-team, Suzuki K Poulose,
	Catalin Marinas, Stephen Boyd, dianders, Will Deacon

Instead of dealing with erratum 1418040 on each entry and exit,
let's move the handling to __switch_to() instead, which has
several advantages:

- It can be applied when it matters (switching between 32 and 64
  bit tasks).
- It is written in C (yay!)
- It can rely on static keys rather than alternatives

Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/entry.S   | 21 ---------------------
 arch/arm64/kernel/process.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 35de8ba60e3d..44445d471442 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -169,19 +169,6 @@ alternative_cb_end
 	stp	x28, x29, [sp, #16 * 14]
 
 	.if	\el == 0
-	.if	\regsize == 32
-	/*
-	 * If we're returning from a 32-bit task on a system affected by
-	 * 1418040 then re-enable userspace access to the virtual counter.
-	 */
-#ifdef CONFIG_ARM64_ERRATUM_1418040
-alternative_if ARM64_WORKAROUND_1418040
-	mrs	x0, cntkctl_el1
-	orr	x0, x0, #2	// ARCH_TIMER_USR_VCT_ACCESS_EN
-	msr	cntkctl_el1, x0
-alternative_else_nop_endif
-#endif
-	.endif
 	clear_gp_regs
 	mrs	x21, sp_el0
 	ldr_this_cpu	tsk, __entry_task, x20
@@ -337,14 +324,6 @@ alternative_else_nop_endif
 	tst	x22, #PSR_MODE32_BIT		// native task?
 	b.eq	3f
 
-#ifdef CONFIG_ARM64_ERRATUM_1418040
-alternative_if ARM64_WORKAROUND_1418040
-	mrs	x0, cntkctl_el1
-	bic	x0, x0, #2			// ARCH_TIMER_USR_VCT_ACCESS_EN
-	msr	cntkctl_el1, x0
-alternative_else_nop_endif
-#endif
-
 #ifdef CONFIG_ARM64_ERRATUM_845719
 alternative_if ARM64_WORKAROUND_845719
 #ifdef CONFIG_PID_IN_CONTEXTIDR
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6089638c7d43..d8a10cf28f82 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -515,6 +515,39 @@ static void entry_task_switch(struct task_struct *next)
 	__this_cpu_write(__entry_task, next);
 }
 
+/*
+ * ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
+ * Assuming the virtual counter is enabled at the beginning of times:
+ *
+ * - disable access when switching from a 64bit task to a 32bit task
+ * - enable access when switching from a 32bit task to a 64bit task
+ */
+static void erratum_1418040_thread_switch(struct task_struct *prev,
+					  struct task_struct *next)
+{
+	bool prev32, next32;
+	u64 val;
+
+	if (!(IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) &&
+	      cpus_have_const_cap(ARM64_WORKAROUND_1418040)))
+		return;
+
+	prev32 = is_compat_thread(task_thread_info(prev));
+	next32 = is_compat_thread(task_thread_info(next));
+
+	if (prev32 == next32)
+		return;
+
+	val = read_sysreg(cntkctl_el1);
+
+	if (!next32)
+		val |= ARCH_TIMER_USR_VCT_ACCESS_EN;
+	else
+		val &= ~ARCH_TIMER_USR_VCT_ACCESS_EN;
+
+	write_sysreg(val, cntkctl_el1);
+}
+
 /*
  * Thread switching.
  */
@@ -530,6 +563,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	entry_task_switch(next);
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
+	erratum_1418040_thread_switch(prev, next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
-- 
2.27.0
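
The switching rule in erratum_1418040_thread_switch() can be checked with a small user-space model. This is a sketch under assumptions: a plain variable stands in for CNTKCTL_EL1, ARCH_TIMER_USR_VCT_ACCESS_EN is taken to be bit 1 (matching the #2 immediate in the removed entry.S code), the capability check is omitted, and model_thread_switch() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 1, matching the "#2" immediate used in the removed entry.S code. */
#define USR_VCT_ACCESS_EN ((uint64_t)1 << 1)

/* Plain variable standing in for CNTKCTL_EL1; access assumed enabled at start. */
static uint64_t fake_cntkctl = USR_VCT_ACCESS_EN;

/*
 * Mirrors the logic of erratum_1418040_thread_switch(): the register is
 * only touched when the switch crosses the 32/64-bit boundary.
 */
static void model_thread_switch(bool prev32, bool next32)
{
	uint64_t val;

	if (prev32 == next32)
		return;	/* same execution state: nothing to do */

	val = fake_cntkctl;
	if (!next32)
		val |= USR_VCT_ACCESS_EN;	/* 32-bit -> 64-bit: re-enable access */
	else
		val &= ~USR_VCT_ACCESS_EN;	/* 64-bit -> 32-bit: disable access */
	fake_cntkctl = val;
}
```

Starting with access enabled, a 64-bit to 32-bit switch clears the bit, a 32-bit to 32-bit switch leaves it alone, and a 32-bit to 64-bit switch sets it again. Only switches that cross the 32/64-bit boundary touch the register, rather than every exception entry from and return to a 32-bit task as in the old entry.S code.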


* [PATCH v3 2/2] arm64: Allow booting of late CPUs affected by erratum 1418040
  2020-07-31 17:38 [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs Marc Zyngier
  2020-07-31 17:38 ` [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code Marc Zyngier
@ 2020-07-31 17:38 ` Marc Zyngier
       [not found] ` <159803353178.13439.17036526669146072985.b4-ty@arm.com>
  2 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2020-07-31 17:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Sai Prakash Ranjan, kernel-team, Suzuki K Poulose,
	Catalin Marinas, Stephen Boyd, dianders, Will Deacon

As we can now switch from a system that isn't affected by 1418040
to a system that globally is affected, let's allow affected CPUs
to come in at a later time.

Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/cpu_errata.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 79728bfb5351..2c0b82db825b 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -910,6 +910,8 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.desc = "ARM erratum 1418040",
 		.capability = ARM64_WORKAROUND_1418040,
 		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
+		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
+			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
 	},
 #endif
 #ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_AT
-- 
2.27.0
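
To see what the added .type changes, here is a simplified user-space model of the check a late CPU goes through for a single capability. It is a sketch, not the kernel's implementation: the CAP_* bit values and late_cpu_cap_ok() are made up for illustration, and only the two LATE_CPU flag names are taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this model only. */
#define CAP_SCOPE_LOCAL_CPU          (1u << 0)
#define CAP_PERMITTED_FOR_LATE_CPU   (1u << 4)
#define CAP_OPTIONAL_FOR_LATE_CPU    (1u << 5)

/*
 * Hypothetical helper: may a late CPU come online, as far as this one
 * capability is concerned, given whether the running system has it
 * and whether the new CPU has it?
 */
static bool late_cpu_cap_ok(uint16_t type, bool system_has, bool cpu_has)
{
	if (system_has)
		/* System relies on it: a CPU lacking it needs OPTIONAL. */
		return cpu_has || (type & CAP_OPTIONAL_FOR_LATE_CPU);
	/* System lacks it: a CPU having it needs PERMITTED. */
	return !cpu_has || (type & CAP_PERMITTED_FOR_LATE_CPU);
}
```

With only SCOPE_LOCAL_CPU, an affected CPU arriving after boot on an unaffected system is refused; adding PERMITTED_FOR_LATE_CPU lets it in. A CPU that lacks the erratum while the system already has the workaround enabled is still refused by this model.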


* Re: [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code
  2020-07-31 17:38 ` [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code Marc Zyngier
@ 2020-07-31 18:00   ` Stephen Boyd
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen Boyd @ 2020-07-31 18:00 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel
  Cc: Sai Prakash Ranjan, Will Deacon, Suzuki K Poulose,
	Catalin Marinas, dianders, kernel-team

Quoting Marc Zyngier (2020-07-31 10:38:23)
> Instead of dealing with erratum 1418040 on each entry and exit,
> let's move the handling to __switch_to() instead, which has
> several advantages:
> 
> - It can be applied when it matters (switching between 32 and 64
>   bit tasks).
> - It is written in C (yay!)
> - It can rely on static keys rather than alternatives
> 
> Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> Acked-by: Will Deacon <will@kernel.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
       [not found] ` <159803353178.13439.17036526669146072985.b4-ty@arm.com>
@ 2020-09-09 14:53   ` Doug Anderson
  2020-09-10 13:43     ` Sai Prakash Ranjan
       [not found]     ` <3d5f6d5289304c558830d5fb8820e6cb@codeaurora.org>
  0 siblings, 2 replies; 10+ messages in thread
From: Doug Anderson @ 2020-09-09 14:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Sai Prakash Ranjan, Will Deacon, Suzuki K Poulose, Marc Zyngier,
	Stephen Boyd, Matthias Kaehlcke, Guenter Roeck, kernel-team,
	Linux ARM

Hi,

On Fri, Aug 21, 2020 at 11:15 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Fri, 31 Jul 2020 18:38:22 +0100, Marc Zyngier wrote:
> > Erratum 1418040 currently prevents a late CPU from booting if none
> > of the early CPUs are affected by it. This is because the handling
> > is implemented as alternatives, and we have already got rid of them
> > by the time userspace onlines a new CPU.
> >
> > A solution to this is to move everything into C code, and rely on
> > static keys instead. Once this is done, the feature can be allowed
> > for late CPUs.
> >
> > [...]
>
> Applied to arm64 (for-next/fixes), thanks!
>
> [1/2] arm64: Move handling of erratum 1418040 into C code
>       https://git.kernel.org/arm64/c/d49f7d7376d0
> [2/2] arm64: Allow booting of late CPUs affected by erratum 1418040
>       https://git.kernel.org/arm64/c/bf87bb0881d0

NOTE: patch 2 seems to have come in through a stable merge onto Chrome
OS 5.4 and is causing a regression when resuming from suspend.  In the
short term we've got a revert going into our tree:

https://crrev.com/c/2399101

...but that's obviously not a long term fix.  I haven't done any
debugging of this myself, though I can if there's nobody more
qualified to do it and/or nobody else has time.  I'm just trying to
make sure that the problem is reported somewhere where others might
notice it rather than in an obscure Chrome OS tree.  ;-)

-Doug

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
  2020-09-09 14:53   ` [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs Doug Anderson
@ 2020-09-10 13:43     ` Sai Prakash Ranjan
       [not found]     ` <3d5f6d5289304c558830d5fb8820e6cb@codeaurora.org>
  1 sibling, 0 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2020-09-10 13:43 UTC (permalink / raw)
  To: Doug Anderson, Catalin Marinas, Marc Zyngier, Will Deacon,
	Suzuki K Poulose
  Cc: Guenter Roeck, Matthias Kaehlcke, kernel-team, Linux ARM, Stephen Boyd

On 2020-09-09 20:23, Doug Anderson wrote:
> Hi,
> 
> [...]
>
> NOTE: patch 2 seems to have come in through a stable merge onto Chrome
> OS 5.4 and is causing a regression when resuming from suspend.  In the
> short term we've got a revert going into our tree:
> 
> https://crrev.com/c/2399101
> 
> ...but that's obviously not a long term fix.  I haven't done any
> debugging of this myself, though I can if there's nobody more
> qualified to do it and/or nobody else has time.  I'm just trying to
> make sure that the problem is reported somewhere where others might
> notice it rather than in an obscure Chrome OS tree.  ;-)
> 

The root cause is pretty straightforward, but I'm afraid the solution
isn't (though I may be mistaken). This happens on big.LITTLE systems
whose CPUs differ in erratum 1418040, which applies only to the big
cores and not to the little ones. So when the little cores are brought
up during resume, there is a conflict, as shown below (messages
snipped from the internal bug for more visibility).

Enabling non-boot CPUs ...
CPU features: CPU1: Detected conflict for capability 35 (ARM erratum 1418040), System: 1, CPU: 0
CPU1: will not boot
CPU1: will not boot
CPU1: failed to come online
psci: CPU1 killed (polled 0 ms)
CPU1: died during early boot
Error taking CPU1 up: -5

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
       [not found]     ` <3d5f6d5289304c558830d5fb8820e6cb@codeaurora.org>
@ 2020-09-11 13:30       ` Marc Zyngier
  2020-09-11 16:34         ` Sai Prakash Ranjan
       [not found]         ` <51c30228d9fe3dd6e2a55991831e95b0@codeaurora.org>
  0 siblings, 2 replies; 10+ messages in thread
From: Marc Zyngier @ 2020-09-11 13:30 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Will Deacon, Suzuki K Poulose, Catalin Marinas, Doug Anderson,
	Matthias Kaehlcke, Guenter Roeck, Stephen Boyd, kernel-team,
	Linux ARM

On 2020-09-10 14:43, Sai Prakash Ranjan wrote:
> The rootcause is pretty straightforward however I'm afraid the
> solution isn't so but I may be mistaken, so this happens on
> big.LITTLE systems with CPUs differing in erratum 1418040
> which was applicable only for big cores and not little cores.
> So when trying to bringup little cores during resume, there
> is a conflict as below (messages snipped from the internal bug
> for more visibility).
> 
> Enabling non-boot CPUs ...
> CPU features: CPU1: Detected conflict for capability 35 (ARM erratum
> 1418040), System: 1, CPU: 0
> CPU1: will not boot
> CPU1: will not boot
> CPU1: failed to come online
> psci: CPU1 killed (polled 0 ms)
> CPU1: died during early boot
> Error taking CPU1 up: -5

This is becoming very annoying... By allowing the buggy CPUs to come
in late, we have made it impossible for the good ones to work correctly.

Can you try this (untested yet, I'm dealing with another bucket of
errata at the moment):

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 6c8303559beb..fcf7f763400c 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -477,6 +477,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_1418040,
 		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
 		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
+			 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU |
 			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
 	},
 #endif


Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
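
Applying the same kind of simplified model (illustrative CAP_* values and a hypothetical late_cpu_cap_ok(), not kernel code) to the proposed change: with ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU added, a late CPU that lacks the erratum is accepted even though the system already has the workaround enabled, which is the "System: 1, CPU: 0" case from the log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this model only. */
#define CAP_SCOPE_LOCAL_CPU          (1u << 0)
#define CAP_PERMITTED_FOR_LATE_CPU   (1u << 4)
#define CAP_OPTIONAL_FOR_LATE_CPU    (1u << 5)

/* .type as merged in patch 2/2, and with the proposed extra flag. */
#define TYPE_BEFORE (CAP_SCOPE_LOCAL_CPU | CAP_PERMITTED_FOR_LATE_CPU)
#define TYPE_AFTER  (TYPE_BEFORE | CAP_OPTIONAL_FOR_LATE_CPU)

/* Hypothetical per-capability check for a CPU coming online late. */
static bool late_cpu_cap_ok(uint16_t type, bool system_has, bool cpu_has)
{
	if (system_has)
		/* System relies on it: a CPU lacking it needs OPTIONAL. */
		return cpu_has || (type & CAP_OPTIONAL_FOR_LATE_CPU);
	/* System lacks it: a CPU having it needs PERMITTED. */
	return !cpu_has || (type & CAP_PERMITTED_FOR_LATE_CPU);
}
```

With both LATE_CPU flags set, the capability no longer restricts which CPUs can come online late. A likely side effect, given that the switching code in patch 1/2 tests the system-wide capability, is that the counter-access toggling then also runs on unaffected CPUs.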

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
  2020-09-11 13:30       ` Marc Zyngier
@ 2020-09-11 16:34         ` Sai Prakash Ranjan
       [not found]         ` <51c30228d9fe3dd6e2a55991831e95b0@codeaurora.org>
  1 sibling, 0 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2020-09-11 16:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Suzuki K Poulose, Catalin Marinas, Doug Anderson,
	Matthias Kaehlcke, Guenter Roeck, Stephen Boyd, kernel-team,
	Linux ARM

On 2020-09-11 19:00, Marc Zyngier wrote:
> This is becoming very annoying... By allowing the buggy CPUs to come
> in late, we have made it impossible for the good ones to work 
> correctly.
> 
> Can you try this (untested yet, I'm dealing with another bucket of
> errata at the moment):
> 
> diff --git a/arch/arm64/kernel/cpu_errata.c 
> b/arch/arm64/kernel/cpu_errata.c
> index 6c8303559beb..fcf7f763400c 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -477,6 +477,7 @@ const struct arm64_cpu_capabilities arm64_errata[] 
> = {
>  		.capability = ARM64_WORKAROUND_1418040,
>  		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
>  		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
> +			 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU |
>  			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
>  	},
>  #endif
> 

Yes, this works.

Thanks,
Sai

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
       [not found]         ` <51c30228d9fe3dd6e2a55991831e95b0@codeaurora.org>
@ 2020-09-11 16:42           ` Will Deacon
  2020-09-11 17:47             ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2020-09-11 16:42 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Suzuki K Poulose, Marc Zyngier, Doug Anderson, Matthias Kaehlcke,
	Catalin Marinas, Guenter Roeck, Stephen Boyd, kernel-team,
	Linux ARM

On Fri, Sep 11, 2020 at 10:04:24PM +0530, Sai Prakash Ranjan wrote:
> On 2020-09-11 19:00, Marc Zyngier wrote:
> > This is becoming very annoying... By allowing the buggy CPUs to come
> > in late, we have made it impossible for the good ones to work correctly.
> > 
> > Can you try this (untested yet, I'm dealing with another bucket of
> > errata at the moment):
> > 
> > diff --git a/arch/arm64/kernel/cpu_errata.c
> > b/arch/arm64/kernel/cpu_errata.c
> > index 6c8303559beb..fcf7f763400c 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -477,6 +477,7 @@ const struct arm64_cpu_capabilities arm64_errata[] =
> > {
> >  		.capability = ARM64_WORKAROUND_1418040,
> >  		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
> >  		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
> > +			 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU |
> >  			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
> >  	},
> >  #endif
> > 
> 
> Yes, this works.

Maybe we should spell it "ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE" and add
a comment about the "feature"?

Will

* Re: [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs
  2020-09-11 16:42           ` Will Deacon
@ 2020-09-11 17:47             ` Marc Zyngier
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2020-09-11 17:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Suzuki K Poulose, Catalin Marinas,
	Doug Anderson, Matthias Kaehlcke, Guenter Roeck, Stephen Boyd,
	kernel-team, Linux ARM

On 2020-09-11 17:42, Will Deacon wrote:
> Maybe we should spell it "ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE" and add
> a comment about the "feature"?

Yeah, that's exactly what it amounts to. Patch incoming.

         M.

Thread overview: 10+ messages
2020-07-31 17:38 [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs Marc Zyngier
2020-07-31 17:38 ` [PATCH v3 1/2] arm64: Move handling of erratum 1418040 into C code Marc Zyngier
2020-07-31 18:00   ` Stephen Boyd
2020-07-31 17:38 ` [PATCH v3 2/2] arm64: Allow booting of late CPUs affected by erratum 1418040 Marc Zyngier
     [not found] ` <159803353178.13439.17036526669146072985.b4-ty@arm.com>
2020-09-09 14:53   ` [PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs Doug Anderson
2020-09-10 13:43     ` Sai Prakash Ranjan
     [not found]     ` <3d5f6d5289304c558830d5fb8820e6cb@codeaurora.org>
2020-09-11 13:30       ` Marc Zyngier
2020-09-11 16:34         ` Sai Prakash Ranjan
     [not found]         ` <51c30228d9fe3dd6e2a55991831e95b0@codeaurora.org>
2020-09-11 16:42           ` Will Deacon
2020-09-11 17:47             ` Marc Zyngier
