From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: ego@linux.vnet.ibm.com, matt@ozlabs.org,
	mahesh@linux.vnet.ibm.com, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, suzuki@in.ibm.com,
	ebiederm@xmission.com, paulus@samba.org,
	linuxppc-dev@lists.ozlabs.org,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
Date: Sat, 07 Jun 2014 00:30:08 +0530
Message-ID: <53920FB8.4090308@linux.vnet.ibm.com>
In-Reply-To: <20140606182720.GL1526@redhat.com>

On 06/06/2014 11:57 PM, Vivek Goyal wrote:
> On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote:
>> On 06/04/2014 07:16 PM, Vivek Goyal wrote:
>>> On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>>>> know from the commit log and the comment mentioned above (and from my own
>>>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>>>> long-term solution.
>>>>>
>>>>> Matt, Ben, any thoughts on this?
>>>>
>>>> The problem is with our "soft offline" which we do on some platforms. When we
>>>> offline we don't actually send the CPUs back to firmware or anything like that.
>>>>
>>>> We put them into a very low-power loop inside Linux.
>>>>
>>>> The new kernel has no way to extract them from that loop. So we must re-"online"
>>>> them before we kexec so they can be passed to the new kernel normally (or returned
>>>> to firmware like we do on powernv).
>>>
>>> Srivatsa,
>>>
>>> Looks like your patch has been merged.
>>>
>>> I don't like the following change in arch independent code.
>>>
>>> /*
>>>  * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>>>  * no further code needs to use CPU hotplug (which is true in
>>>  * the reboot case). However, the kexec path depends on using
>>>  * CPU hotplug again; so re-enable it here.
>>>  */
>>>                cpu_hotplug_enable();
>>>
>>> As this is a very powerpc-specific requirement, can you enable hotplug in
>>> powerpc arch-dependent code as a short-term solution?
>>>
>>
>> I didn't do that because that would mean that the _disable() would be
>> performed inside kernel/kexec.c and the corresponding _enable() would
>> be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
>> connection between them, which would have made them hard to relate.
> 
> Which we are doing anyway. The difference is that now we are doing it
> for all arches.
> 
> If this is a powerpc-specific requirement, then we should limit this to
> powerpc only and not let it spill over into generic code.
> 
> And putting a big fat comment should take care of being able to figure
> out why arch code is overriding the generic code's decision. Putting
> it in generic code and enforcing this on all arches does not buy us
> anything, IMHO.
> 

Yep, I see your point. Sorry about that!

Actually, I originally thought of fixing cpu_hotplug_disable/enable themselves:
their true intent is to prevent *userspace* (i.e., via sysfs) from performing
CPU hotplug after a certain quiescent point in the kernel, not to prevent
the kernel's own CPU hotplug attempts. But currently they prevent _all_ hotplug,
including operations initiated from within the kernel, which is why kexec was
effectively locking itself out on powerpc. I explored options to
fix that (which would in turn fix the powerpc problem automatically, without
having to add any code to kernel/kexec.c or even to arch/powerpc). But it
turned out to be too difficult and ugly given the current CPU hotplug locking
scheme. I'll revisit that once CPU hotplug locking is cleaned up.
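
To illustrate the lockout described above, here is a toy userspace model in C
(not kernel code). The function names mirror the kernel's, but the nesting
counter, the fixed NR_CPUS of 4, the boolean online map and the helper
online_all_cpus() are assumptions made up for this sketch; the real hotplug
code is considerably more involved.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the model */

/* Hypothetical state: >0 means hotplug is blocked for every caller. */
static int cpu_hotplug_disabled;

/* CPU 0 is online; CPUs 1-3 stand in for "soft offlined" CPUs. */
static bool cpu_online_map[NR_CPUS] = { true, false, false, false };

static void cpu_hotplug_disable(void)
{
	cpu_hotplug_disabled++;
}

static void cpu_hotplug_enable(void)
{
	cpu_hotplug_disabled--;
}

/*
 * The model makes no distinction between a request coming from
 * userspace and one coming from the kernel itself: while hotplug
 * is disabled, both are refused.
 */
static int cpu_up(int cpu)
{
	if (cpu_hotplug_disabled)
		return -EBUSY;

	cpu_online_map[cpu] = true;
	return 0;
}

/*
 * Models the powerpc kexec requirement: every offline CPU must be
 * brought back online first. Returns how many CPUs could not be onlined.
 */
static int online_all_cpus(void)
{
	int cpu, failed = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (!cpu_online_map[cpu] && cpu_up(cpu) != 0)
			failed++;

	return failed;
}
```

In this model, calling cpu_hotplug_disable() and then online_all_cpus() leaves
all three offline CPUs stuck, because the in-kernel request is refused just
like a sysfs one would be. Calling cpu_hotplug_enable() first lets the same
call succeed, which is what the _enable() call in kernel/kexec.c achieves.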

But anyway, the powerpc kexec fix that I'm working on right now is not only a
much better solution, but it will also restore the original kexec code in
kernel/kexec.c, by removing the _enable() call.

Thank you!

Regards,
Srivatsa S. Bhat




