kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: kexec@lists.infradead.org
Subject: [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
Date: Tue, 10 May 2022 10:28:11 +0200	[thread overview]
Message-ID: <87y1z9pzac.ffs@tglx> (raw)
In-Reply-To: <YnneUeRJ42SRM/M+@piliu.users.ipa.redhat.com>

On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
>> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> > The following code chunk repeats in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> > This is due to a breakage like the following:
>> 
>> I don't see what's broken here.
>> 
>
> No, no broken. Could it be better to replace 'breakage' with
> 'breakin'?

There is no break-in. There is a phase where CPU hotplug is reenabled,
which might be avoided.

>> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>> 
>> This comment makes no sense.
>> 
>
> Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> valid online cpu -- primary_cpu keeps unchange.

So what is that parameter for then? If migrate_to_reboot_cpu() ensured
that the current task is on the reboot CPU then this parameter is
useless, no?

>> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>> >  {
>> >  	unsigned int cpu;
>> >  	int error;
>> >  
>> > +	/*
>> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
>> > +	 * it is not touched by us
>> > +	 */
>> >  	cpu_maps_update_begin();
>> > -
>> >  	/*
>> > -	 * Make certain the cpu I'm about to reboot on is online.
>> > -	 *
>> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
>> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > +	 * no further code needs to use CPU hotplug (which is true in
>> > +	 * the reboot case). However, the kexec path depends on using
>> > +	 * CPU hotplug again; so re-enable it here.
>> 
>> You want to reduce confusion, but in reality this is even more confusing
>> than before.
>> 
>
> This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> arch-dependent code chunk (here), which is a more proper point.
>
> Could it make things better by rephrasing the words as the following?
>    migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
>    reboot cpu from disappearing. But arches need cpu_down to hot remove
>    cpus except rebooting-cpu, so re-enabling cpu hotplug again.

Can you please use proper words. arches is not a word and it's closer to
the plural of arch, than to the word architecture. This is not twitter.

And no, the architectures do not need cpu_down() at all. This very
function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.

>> >  	 */
>> > -	if (!cpu_online(primary_cpu))
>> > -		primary_cpu = cpumask_first(cpu_online_mask);
>> > +	__cpu_hotplug_enable();
>> 
>> How is this decrement solving anything? At the end of this function, the
>> counter is incremented again. So what's the point of this exercise?
>> 
> This decrement enables the cpu hot-removing.  Since
> smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> cpu_hotplug_disabled, it returns -EBUSY.

Correct, so why can't you spell that out in concise words in the first
place right at that comment which reenables hotplug?

>> What does that for arch/powerpc/kernel/kexec_machine64.c now?
>> 
>> Nothing, as far as I can tell. Which means you basically reverted
>> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
>> kexec from ST mode") unless I'm completely confused.
>> 
>
> Oops. Forget about powerpc. Considering the cpu hotplug is an
> arch-dependent feature in machine_shutdown(), as x86 does not need it.

It's not a feature, it's a architecture specific requirement. x86 is
irrelevant here because this is a powerpc requirement.

>> This is tinkering at best. Can we please sit down and rethink this whole
>> machinery instead of applying random duct tape to it?
>> 
> I try to make code look consistent.

Emphasis on try. So far the attempt failed and resulted in a regression.

Thanks,

        tglx


  reply	other threads:[~2022-05-10  8:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-09  4:13 [PATCHv3 0/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable Pingfan Liu
2022-05-09  4:13 ` [PATCHv3 1/2] " Pingfan Liu
2022-05-09 10:55   ` Thomas Gleixner
2022-05-09 10:57     ` Thomas Gleixner
2022-05-10  3:38     ` Pingfan Liu
2022-05-10  8:28       ` Thomas Gleixner [this message]
2022-05-11  9:09         ` Pingfan Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1z9pzac.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=kexec@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).