linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Cristian Marussi <cristian.marussi@arm.com>
Cc: catalin.marinas@arm.com, will@kernel.org, james.morse@arm.com,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/2] arm64: smp: fix crash_smp_send_stop() behaviour
Date: Tue, 10 Mar 2020 14:47:58 +0000	[thread overview]
Message-ID: <20200310144758.GD54660@lakrids.cambridge.arm.com> (raw)
In-Reply-To: <20200306185739.3227-3-cristian.marussi@arm.com>

On Fri, Mar 06, 2020 at 06:57:39PM +0000, Cristian Marussi wrote:
> On a system configured to trigger a crash_kexec() reboot, when only one CPU
> is online and another CPU panics while starting-up, crash_smp_send_stop()
> will fail to send any STOP message to the other already online core,
> resulting in fail to freeze and registers not properly saved.
> 
> Moreover even if the proper messages are sent (case CPUs > 2)
> it will similarly fail to account for the booting CPU when executing
> the final stop wait-loop, so potentially resulting in some CPU not
> been waited for shutdown before rebooting.
> 
> A tangible effect of this behaviour can be observed when, after a panic
> with kexec enabled and loaded, on the following reboot triggered by kexec,
> the cpu that could not be successfully stopped fails to come back online:
> 
> [  362.291022] ------------[ cut here ]------------
> [  362.291525] kernel BUG at arch/arm64/kernel/cpufeature.c:886!
> [  362.292023] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [  362.292400] Modules linked in:
> [  362.292970] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.6.0-rc4-00003-gc780b890948a #105
> [  362.293136] Hardware name: Foundation-v8A (DT)
> [  362.293382] pstate: 200001c5 (nzCv dAIF -PAN -UAO)
> [  362.294063] pc : has_cpuid_feature+0xf0/0x348
> [  362.294177] lr : verify_local_elf_hwcaps+0x84/0xe8
> [  362.294280] sp : ffff800011b1bf60
> [  362.294362] x29: ffff800011b1bf60 x28: 0000000000000000
> [  362.294534] x27: 0000000000000000 x26: 0000000000000000
> [  362.294631] x25: 0000000000000000 x24: ffff80001189a25c
> [  362.294718] x23: 0000000000000000 x22: 0000000000000000
> [  362.294803] x21: ffff8000114aa018 x20: ffff800011156a00
> [  362.294897] x19: ffff800010c944a0 x18: 0000000000000004
> [  362.294987] x17: 0000000000000000 x16: 0000000000000000
> [  362.295073] x15: 00004e53b831ae3c x14: 00004e53b831ae3c
> [  362.295165] x13: 0000000000000384 x12: 0000000000000000
> [  362.295251] x11: 0000000000000000 x10: 00400032b5503510
> [  362.295334] x9 : 0000000000000000 x8 : ffff800010c7e204
> [  362.295426] x7 : 00000000410fd0f0 x6 : 0000000000000001
> [  362.295508] x5 : 00000000410fd0f0 x4 : 0000000000000000
> [  362.295592] x3 : 0000000000000000 x2 : ffff8000100939d8
> [  362.295683] x1 : 0000000000180420 x0 : 0000000000180480
> [  362.296011] Call trace:
> [  362.296257]  has_cpuid_feature+0xf0/0x348
> [  362.296350]  verify_local_elf_hwcaps+0x84/0xe8
> [  362.296424]  check_local_cpu_capabilities+0x44/0x128
> [  362.296497]  secondary_start_kernel+0xf4/0x188
> [  362.296998] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000)
> [  362.298652] SMP: stopping secondary CPUs
> [  362.300615] Starting crashdump kernel...
> [  362.301168] Bye!
> [    0.000000] Booting Linux on physical CPU 0x0000000003 [0x410fd0f0]
> [    0.000000] Linux version 5.6.0-rc4-00003-gc780b890948a (crimar01@e120937-lin) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #105 SMP PREEMPT Fri Mar 6 17:00:42 GMT 2020
> [    0.000000] Machine model: Foundation-v8A
> [    0.000000] earlycon: pl11 at MMIO 0x000000001c090000 (options '')
> [    0.000000] printk: bootconsole [pl11] enabled
> .....
> [    0.138024] rcu: Hierarchical SRCU implementation.
> [    0.153472] its@2f020000: unable to locate ITS domain
> [    0.154078] its@2f020000: Unable to locate ITS domain
> [    0.157541] EFI services will not be available.
> [    0.175395] smp: Bringing up secondary CPUs ...
> [    0.209182] psci: failed to boot CPU1 (-22)
> [    0.209377] CPU1: failed to boot: -22
> [    0.274598] Detected PIPT I-cache on CPU2
> [    0.278707] GICv3: CPU2: found redistributor 1 region 0:0x000000002f120000
> [    0.285212] CPU2: Booted secondary processor 0x0000000001 [0x410fd0f0]
> [    0.369053] Detected PIPT I-cache on CPU3
> [    0.372947] GICv3: CPU3: found redistributor 2 region 0:0x000000002f140000
> [    0.378664] CPU3: Booted secondary processor 0x0000000002 [0x410fd0f0]
> [    0.401707] smp: Brought up 1 node, 3 CPUs
> [    0.404057] SMP: Total of 3 processors activated.
> 
> Make crash_smp_send_stop() account also for the online status of the
> calling CPU while evaluating how many CPUs are effectively online: this way
> the right number of STOPs is sent and all other stopped-cores's registers
> are properly saved.
> 
> Fixes: 78fd584cdec05 ("arm64: kdump: implement machine_crash_shutdown()")
> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> ---
>  arch/arm64/kernel/smp.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 55779218ec56..33466d197b1c 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -997,6 +997,7 @@ void crash_smp_send_stop(void)
>  	static int cpus_stopped;
>  	cpumask_t mask;
>  	unsigned long timeout;
> +	unsigned int this_cpu_online = cpu_online(smp_processor_id());
>  
>  	/*
>  	 * This function can be called twice in panic path, but obviously
> @@ -1007,7 +1008,11 @@ void crash_smp_send_stop(void)
>  
>  	cpus_stopped = 1;
>  
> -	if (num_online_cpus() == 1) {
> +	/*
> +	 * If this cpu is the only one alive at this point in time, online or
> +	 * not, there are no stop messages to be sent around, so just back out.
> +	 */
> +	if (num_online_cpus() == this_cpu_online) {

If you add num_other_online_cpus(), this can be:


	/*
	 * If no other CPUs are online at this point in time, there are
	 * no stop messages to be sent, so jsut back out.
	 */
	if (num_other_online_cpus() == 0)

>  		sdei_mask_local_cpu();
>  		return;
>  	}
> @@ -1015,7 +1020,15 @@ void crash_smp_send_stop(void)
>  	cpumask_copy(&mask, cpu_online_mask);
>  	cpumask_clear_cpu(smp_processor_id(), &mask);
>  
> -	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> +	/*
> +	 * Account for the online status of this cpu when counting for
> +	 * waiting cpus; note that always holds true that:
> +	 *
> +	 *	num_online_cpus() >= this_cpu_online
> +	 *
> +	 * so the following subtraction is safe.
> +	 */
> +	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - this_cpu_online);

... and I don't think we need the comment here with:

	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());

Otherwise, this looks sound to me.

Thanks,
Mark.

>  
>  	pr_crit("SMP: stopping secondary CPUs\n");
>  	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
> -- 
> 2.17.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

      reply	other threads:[~2020-03-10 14:48 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-06 18:57 [PATCH 0/2] Fix Kernel failing to freeze system on panic Cristian Marussi
2020-03-06 18:57 ` [PATCH 1/2] arm64: smp: fix smp_send_stop() behaviour Cristian Marussi
2020-03-10 14:44   ` Mark Rutland
2020-03-06 18:57 ` [PATCH 2/2] arm64: smp: fix crash_smp_send_stop() behaviour Cristian Marussi
2020-03-10 14:47   ` Mark Rutland [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200310144758.GD54660@lakrids.cambridge.arm.com \
    --to=mark.rutland@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=cristian.marussi@arm.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).