kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 0/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
@ 2022-05-09  4:13 Pingfan Liu
  2022-05-09  4:13 ` [PATCHv3 1/2] " Pingfan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2022-05-09  4:13 UTC (permalink / raw)
  To: kexec

For the arches (arm/arm64/ia64/riscv), which relies on the cpu hot-removing
mechanism to implement "kexec -e", it is important to make sure that the
rebooting happens on a valid online cpu. And this logic should have been
guaranteed in migrate_to_reboot_cpu().

But the current code has either contradict (resolved by [2/2]) or
redundancy (resolved by [1/2]) about the logic.

V2 -> V3:
Taking in [2/2], which also has problem with the valid rebooting
cpu. (I had sent three patches for different arches. But maybe it is
better to collapse them into one and collect acks from different arches'
maintainers )

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Donnefort <vincent.donnefort@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: kexec at lists.infradead.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
To: linux-ia64@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Marc Zyngier <maz@kernel.org>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org

Pingfan Liu (2):
  cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is
    stable
  arm/arm64/ia64: kexec: fix the primary cpu passed to
    smp_shutdown_nonboot_cpus()

 arch/arm/kernel/reboot.c    |  2 +-
 arch/arm64/kernel/process.c |  2 +-
 arch/ia64/kernel/process.c  |  2 +-
 kernel/cpu.c                | 16 ++++++++++------
 kernel/kexec_core.c         | 10 ++++------
 5 files changed, 17 insertions(+), 15 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-09  4:13 [PATCHv3 0/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable Pingfan Liu
@ 2022-05-09  4:13 ` Pingfan Liu
  2022-05-09 10:55   ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2022-05-09  4:13 UTC (permalink / raw)
  To: kexec

The following code chunk repeats in both
migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():

	if (!cpu_online(primary_cpu))
		primary_cpu = cpumask_first(cpu_online_mask);

This is due to a breakage like the following:
kernel_kexec()
   migrate_to_reboot_cpu();
   cpu_hotplug_enable();
                        -----------> comes a cpu_down(this_cpu) on other cpu
   machine_shutdown();
     smp_shutdown_nonboot_cpus(); // re-check "if (!cpu_online(primary_cpu))" to protect against the former breakin

Although the kexec-reboot task can get through a cpu_down() on its cpu,
this code looks a little confusing.

Make things straight forward by keeping cpu hotplug disabled until
smp_shutdown_nonboot_cpus() holds cpu_add_remove_lock. By this way, the
breakage is squashed out and the rebooting cpu can keep unchanged.

Note: this patch only affects the kexec-reboot on arches(arm/arm64/ia64/riscv) ,
which rely on the cpu hot-removing mechanism.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Donnefort <vincent.donnefort@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: kexec at lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 kernel/cpu.c        | 16 ++++++++++------
 kernel/kexec_core.c | 10 ++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0b42e8..bf901fc90329 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1228,20 +1228,24 @@ int remove_cpu(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(remove_cpu);
 
+/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
 void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 {
 	unsigned int cpu;
 	int error;
 
+	/*
+	 * Block other cpu hotplug event, so primary_cpu is always online if
+	 * it is not touched by us
+	 */
 	cpu_maps_update_begin();
-
 	/*
-	 * Make certain the cpu I'm about to reboot on is online.
-	 *
-	 * This is inline to what migrate_to_reboot_cpu() already do.
+	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
+	 * no further code needs to use CPU hotplug (which is true in
+	 * the reboot case). However, the kexec path depends on using
+	 * CPU hotplug again; so re-enable it here.
 	 */
-	if (!cpu_online(primary_cpu))
-		primary_cpu = cpumask_first(cpu_online_mask);
+	__cpu_hotplug_enable();
 
 	for_each_online_cpu(cpu) {
 		if (cpu == primary_cpu)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 68480f731192..db4fa6b174e3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1168,14 +1168,12 @@ int kernel_kexec(void)
 		kexec_in_progress = true;
 		kernel_restart_prepare("kexec reboot");
 		migrate_to_reboot_cpu();
-
 		/*
-		 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
-		 * no further code needs to use CPU hotplug (which is true in
-		 * the reboot case). However, the kexec path depends on using
-		 * CPU hotplug again; so re-enable it here.
+		 * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
+		 * relies on the cpu teardown to achieve reboot, it needs to
+		 * re-enable CPU hotplug there.
 		 */
-		cpu_hotplug_enable();
+
 		pr_notice("Starting new kernel\n");
 		machine_shutdown();
 	}
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-09  4:13 ` [PATCHv3 1/2] " Pingfan Liu
@ 2022-05-09 10:55   ` Thomas Gleixner
  2022-05-09 10:57     ` Thomas Gleixner
  2022-05-10  3:38     ` Pingfan Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2022-05-09 10:55 UTC (permalink / raw)
  To: kexec

On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
> The following code chunk repeats in both
> migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>
> 	if (!cpu_online(primary_cpu))
> 		primary_cpu = cpumask_first(cpu_online_mask);
>
> This is due to a breakage like the following:

I don't see what's broken here.

> kernel_kexec()
>    migrate_to_reboot_cpu();
>    cpu_hotplug_enable();
>                         -----------> comes a cpu_down(this_cpu) on other cpu
>    machine_shutdown();
>      smp_shutdown_nonboot_cpus(); // re-check "if (!cpu_online(primary_cpu))" to protect against the former breakin
>
> Although the kexec-reboot task can get through a cpu_down() on its cpu,
> this code looks a little confusing.

Confusing != broken.

> +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */

This comment makes no sense.

>  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>  {
>  	unsigned int cpu;
>  	int error;
>  
> +	/*
> +	 * Block other cpu hotplug event, so primary_cpu is always online if
> +	 * it is not touched by us
> +	 */
>  	cpu_maps_update_begin();
> -
>  	/*
> -	 * Make certain the cpu I'm about to reboot on is online.
> -	 *
> -	 * This is inline to what migrate_to_reboot_cpu() already do.
> +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> +	 * no further code needs to use CPU hotplug (which is true in
> +	 * the reboot case). However, the kexec path depends on using
> +	 * CPU hotplug again; so re-enable it here.

You want to reduce confusion, but in reality this is even more confusing
than before.

>  	 */
> -	if (!cpu_online(primary_cpu))
> -		primary_cpu = cpumask_first(cpu_online_mask);
> +	__cpu_hotplug_enable();

How is this decrement solving anything? At the end of this function, the
counter is incremented again. So what's the point of this exercise?

>  	for_each_online_cpu(cpu) {
>  		if (cpu == primary_cpu)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..db4fa6b174e3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
>  		kexec_in_progress = true;
>  		kernel_restart_prepare("kexec reboot");
>  		migrate_to_reboot_cpu();
> -
>  		/*
> -		 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> -		 * no further code needs to use CPU hotplug (which is true in
> -		 * the reboot case). However, the kexec path depends on using
> -		 * CPU hotplug again; so re-enable it here.
> +		 * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
> +		 * relies on the cpu teardown to achieve reboot, it needs to
> +		 * re-enable CPU hotplug there.

What does that for arch/powerpc/kernel/kexec_machine64.c now?

Nothing, as far as I can tell. Which means you basically reverted
011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
kexec from ST mode") unless I'm completely confused.

>  		 */
> -		cpu_hotplug_enable();

This is tinkering at best. Can we please sit down and rethink this whole
machinery instead of applying random duct tape to it?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-09 10:55   ` Thomas Gleixner
@ 2022-05-09 10:57     ` Thomas Gleixner
  2022-05-10  3:38     ` Pingfan Liu
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2022-05-09 10:57 UTC (permalink / raw)
  To: kexec

On Mon, May 09 2022 at 12:55, Thomas Gleixner wrote:
> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> -		cpu_hotplug_enable();
>
> This is tinkering at best. Can we please sit down and rethink this whole
> machinery instead of applying random duct tape to it?

That said, I still have not figured out which real world problem you are
actually trying to solve.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-09 10:55   ` Thomas Gleixner
  2022-05-09 10:57     ` Thomas Gleixner
@ 2022-05-10  3:38     ` Pingfan Liu
  2022-05-10  8:28       ` Thomas Gleixner
  1 sibling, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2022-05-10  3:38 UTC (permalink / raw)
  To: kexec

On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
> > The following code chunk repeats in both
> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
> >
> > 	if (!cpu_online(primary_cpu))
> > 		primary_cpu = cpumask_first(cpu_online_mask);
> >
> > This is due to a breakage like the following:
> 
> I don't see what's broken here.
> 

No, no broken. Could it be better to replace 'breakage' with 'breakin'?

> > kernel_kexec()
> >    migrate_to_reboot_cpu();
> >    cpu_hotplug_enable();
> >                         -----------> comes a cpu_down(this_cpu) on other cpu
> >    machine_shutdown();
> >      smp_shutdown_nonboot_cpus(); // re-check "if (!cpu_online(primary_cpu))" to protect against the former breakin
> >
> > Although the kexec-reboot task can get through a cpu_down() on its cpu,
> > this code looks a little confusing.
> 
> Confusing != broken.
> 

Yes. And it only recurs confusing.

> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
> 
> This comment makes no sense.
> 

Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
valid online cpu -- primary_cpu keeps unchange.

> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> >  {
> >  	unsigned int cpu;
> >  	int error;
> >  
> > +	/*
> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
> > +	 * it is not touched by us
> > +	 */
> >  	cpu_maps_update_begin();
> > -
> >  	/*
> > -	 * Make certain the cpu I'm about to reboot on is online.
> > -	 *
> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> > +	 * no further code needs to use CPU hotplug (which is true in
> > +	 * the reboot case). However, the kexec path depends on using
> > +	 * CPU hotplug again; so re-enable it here.
> 
> You want to reduce confusion, but in reality this is even more confusing
> than before.
> 

This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
arch-dependent code chunk (here), which is a more proper point.

Could it make things better by rephrasing the words as the following?
   migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
   reboot cpu from disappearing. But arches need cpu_down to hot remove
   cpus except rebooting-cpu, so re-enabling cpu hotplug again.

> >  	 */
> > -	if (!cpu_online(primary_cpu))
> > -		primary_cpu = cpumask_first(cpu_online_mask);
> > +	__cpu_hotplug_enable();
> 
> How is this decrement solving anything? At the end of this function, the
> counter is incremented again. So what's the point of this exercise?
> 

This decrement enables the cpu hot-removing.  Since
smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
cpu_hotplug_disabled, it returns -EBUSY. 

On the other hand, at the end of this function, cpu_hotplug_disabled++,
which aims to prevent any new coming cpu, since machine_kexec() assumes
to execute on the only rebooting-cpu, any dangling cpu has unexpected
result.

> >  	for_each_online_cpu(cpu) {
> >  		if (cpu == primary_cpu)
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 68480f731192..db4fa6b174e3 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
> >  		kexec_in_progress = true;
> >  		kernel_restart_prepare("kexec reboot");
> >  		migrate_to_reboot_cpu();
> > -
> >  		/*
> > -		 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> > -		 * no further code needs to use CPU hotplug (which is true in
> > -		 * the reboot case). However, the kexec path depends on using
> > -		 * CPU hotplug again; so re-enable it here.
> > +		 * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
> > +		 * relies on the cpu teardown to achieve reboot, it needs to
> > +		 * re-enable CPU hotplug there.
> 
> What does that for arch/powerpc/kernel/kexec_machine64.c now?
> 
> Nothing, as far as I can tell. Which means you basically reverted
> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
> kexec from ST mode") unless I'm completely confused.
> 

Oops. Forget about powerpc. Considering the cpu hotplug is an
arch-dependent feature in machine_shutdown(), as x86 does not need it.

What about supplying an cpu hotplug enable in the powerpc machine_kexec implement.

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 6cc7793b8420..8ccf22197f08 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -224,6 +224,7 @@ static void wake_offline_cpus(void)

 static void kexec_prepare_cpus(void)
 {
+       cpu_hotplug_enable();
        wake_offline_cpus();
        smp_call_function(kexec_smp_down, NULL, /* wait */0);
        local_irq_disable();


> >  		 */
> > -		cpu_hotplug_enable();
> 
> This is tinkering at best. Can we please sit down and rethink this whole
> machinery instead of applying random duct tape to it?
> 

I try to make code look consistent. As in the current source tree.
There are three sequential sites trying to resolve the valid rebooting cpu:

1.
void migrate_to_reboot_cpu(void)
{
	/* The boot cpu is always logical cpu 0 */
	int cpu = reboot_cpu;

	cpu_hotplug_disable();

	/* Make certain the cpu I'm about to reboot on is online */
	if (!cpu_online(cpu))
		cpu = cpumask_first(cpu_online_mask);
...
}

2. In each arches machine_shutdown(), the input param primary_cpu for
smp_shutdown_nonboot_cpus() looks a little arbitrary.

$git grep smp_shutdown_nonboot_cpus -- arch | grep -v \*
arch/arm/kernel/reboot.c:94:    smp_shutdown_nonboot_cpus(reboot_cpu);
arch/arm64/kernel/process.c:89: smp_shutdown_nonboot_cpus(reboot_cpu);
arch/ia64/kernel/process.c:578: smp_shutdown_nonboot_cpus(reboot_cpu);
arch/riscv/kernel/machine_kexec.c:135:  smp_shutdown_nonboot_cpus(smp_processor_id());

3. finally
smp_shutdown_nonboot_cpus(primary_cpu)
{
...
	if (!cpu_online(primary_cpu))
		primary_cpu = cpumask_first(cpu_online_mask);
...
}


With this series, the 3rd can be removed, and the 2nd can be unified to
smp_processor_id().


Thanks,

	Pingfan


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-10  3:38     ` Pingfan Liu
@ 2022-05-10  8:28       ` Thomas Gleixner
  2022-05-11  9:09         ` Pingfan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2022-05-10  8:28 UTC (permalink / raw)
  To: kexec

On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
>> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> > The following code chunk repeats in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> > This is due to a breakage like the following:
>> 
>> I don't see what's broken here.
>> 
>
> No, no broken. Could it be better to replace 'breakage' with
> 'breakin'?

There is no break-in. There is a phase where CPU hotplug is reenabled,
which might be avoided.

>> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>> 
>> This comment makes no sense.
>> 
>
> Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> valid online cpu -- primary_cpu keeps unchange.

So what is that parameter for then? If migrate_to_reboot_cpu() ensured
that the current task is on the reboot CPU then this parameter is
useless, no?

>> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>> >  {
>> >  	unsigned int cpu;
>> >  	int error;
>> >  
>> > +	/*
>> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
>> > +	 * it is not touched by us
>> > +	 */
>> >  	cpu_maps_update_begin();
>> > -
>> >  	/*
>> > -	 * Make certain the cpu I'm about to reboot on is online.
>> > -	 *
>> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
>> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > +	 * no further code needs to use CPU hotplug (which is true in
>> > +	 * the reboot case). However, the kexec path depends on using
>> > +	 * CPU hotplug again; so re-enable it here.
>> 
>> You want to reduce confusion, but in reality this is even more confusing
>> than before.
>> 
>
> This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> arch-dependent code chunk (here), which is a more proper point.
>
> Could it make things better by rephrasing the words as the following?
>    migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
>    reboot cpu from disappearing. But arches need cpu_down to hot remove
>    cpus except rebooting-cpu, so re-enabling cpu hotplug again.

Can you please use proper words. arches is not a word and it's closer to
the plural of arch, than to the word architecture. This is not twitter.

And no, the architectures do not need cpu_down() at all. This very
function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.

>> >  	 */
>> > -	if (!cpu_online(primary_cpu))
>> > -		primary_cpu = cpumask_first(cpu_online_mask);
>> > +	__cpu_hotplug_enable();
>> 
>> How is this decrement solving anything? At the end of this function, the
>> counter is incremented again. So what's the point of this exercise?
>> 
> This decrement enables the cpu hot-removing.  Since
> smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> cpu_hotplug_disabled, it returns -EBUSY.

Correct, so why can't you spell that out in concise words in the first
place right at that comment which reenables hotplug?

>> What does that for arch/powerpc/kernel/kexec_machine64.c now?
>> 
>> Nothing, as far as I can tell. Which means you basically reverted
>> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
>> kexec from ST mode") unless I'm completely confused.
>> 
>
> Oops. Forget about powerpc. Considering the cpu hotplug is an
> arch-dependent feature in machine_shutdown(), as x86 does not need it.

It's not a feature, it's a architecture specific requirement. x86 is
irrelevant here because this is a powerpc requirement.

>> This is tinkering at best. Can we please sit down and rethink this whole
>> machinery instead of applying random duct tape to it?
>> 
> I try to make code look consistent.

Emphasis on try. So far the attempt failed and resulted in a regression.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
  2022-05-10  8:28       ` Thomas Gleixner
@ 2022-05-11  9:09         ` Pingfan Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-05-11  9:09 UTC (permalink / raw)
  To: kexec

On Tue, May 10, 2022 at 10:28:11AM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> > On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
> >> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
> >> > The following code chunk repeats in both
> >> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
> >> > This is due to a breakage like the following:
> >> 
> >> I don't see what's broken here.
> >> 
> >
> > No, no broken. Could it be better to replace 'breakage' with
> > 'breakin'?
> 
> There is no break-in. There is a phase where CPU hotplug is reenabled,
> which might be avoided.
> 

OK, I will rephrase like this.

> >> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
> >> 
> >> This comment makes no sense.
> >> 
> >
> > Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> > valid online cpu -- primary_cpu keeps unchange.
> 
> So what is that parameter for then? If migrate_to_reboot_cpu() ensured
> that the current task is on the reboot CPU then this parameter is
> useless, no?
> 

Yes, it is useless after this patch. I will post V4 to kill it.

> >> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> >> >  {
> >> >  	unsigned int cpu;
> >> >  	int error;
> >> >  
> >> > +	/*
> >> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
> >> > +	 * it is not touched by us
> >> > +	 */
> >> >  	cpu_maps_update_begin();
> >> > -
> >> >  	/*
> >> > -	 * Make certain the cpu I'm about to reboot on is online.
> >> > -	 *
> >> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
> >> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> >> > +	 * no further code needs to use CPU hotplug (which is true in
> >> > +	 * the reboot case). However, the kexec path depends on using
> >> > +	 * CPU hotplug again; so re-enable it here.
> >> 
> >> You want to reduce confusion, but in reality this is even more confusing
> >> than before.
> >> 
> >
> > This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> > arch-dependent code chunk (here), which is a more proper point.
> >
> > Could it make things better by rephrasing the words as the following?
> >    migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
> >    reboot cpu from disappearing. But arches need cpu_down to hot remove
> >    cpus except rebooting-cpu, so re-enabling cpu hotplug again.
> 
> Can you please use proper words. arches is not a word and it's closer to
> the plural of arch, than to the word architecture. This is not twitter.
> 

OK, I will correct it.

> And no, the architectures do not need cpu_down() at all. This very
> function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
> shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.
> 

Yes. I will pay attention to the accuracy of the description.

> >> >  	 */
> >> > -	if (!cpu_online(primary_cpu))
> >> > -		primary_cpu = cpumask_first(cpu_online_mask);
> >> > +	__cpu_hotplug_enable();
> >> 
> >> How is this decrement solving anything? At the end of this function, the
> >> counter is incremented again. So what's the point of this exercise?
> >> 
> > This decrement enables the cpu hot-removing.  Since
> > smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> > cpu_hotplug_disabled, it returns -EBUSY.
> 
> Correct, so why can't you spell that out in concise words in the first
> place right at that comment which reenables hotplug?
> 

OK, thanks for the suggestion.

> >> What does that for arch/powerpc/kernel/kexec_machine64.c now?
> >> 
> >> Nothing, as far as I can tell. Which means you basically reverted
> >> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
> >> kexec from ST mode") unless I'm completely confused.
> >> 
> >
> > Oops. Forget about powerpc. Considering the cpu hotplug is an
> > arch-dependent feature in machine_shutdown(), as x86 does not need it.
> 
> It's not a feature, it's a architecture specific requirement. x86 is
> irrelevant here because this is a powerpc requirement.
> 

Yes.

> >> This is tinkering at best. Can we please sit down and rethink this whole
> >> machinery instead of applying random duct tape to it?
> >> 
> > I try to make code look consistent.
> 
> Emphasis on try. So far the attempt failed and resulted in a regression.
> 

I will fix the powerpc issue and post V4 after a test.


Thanks for your precious time.

Best Regards,

	Pingfan



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-05-11  9:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09  4:13 [PATCHv3 0/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable Pingfan Liu
2022-05-09  4:13 ` [PATCHv3 1/2] " Pingfan Liu
2022-05-09 10:55   ` Thomas Gleixner
2022-05-09 10:57     ` Thomas Gleixner
2022-05-10  3:38     ` Pingfan Liu
2022-05-10  8:28       ` Thomas Gleixner
2022-05-11  9:09         ` Pingfan Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).