linuxppc-dev.lists.ozlabs.org archive mirror
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: ego@linux.vnet.ibm.com, matt@ozlabs.org,
	mahesh@linux.vnet.ibm.com, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, suzuki@in.ibm.com,
	ebiederm@xmission.com, paulus@samba.org,
	linuxppc-dev@lists.ozlabs.org, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
Date: Sat, 07 Jun 2014 02:46:40 +0530
Message-ID: <53922FB8.6070408@linux.vnet.ibm.com>
In-Reply-To: <5391B413.100@linux.vnet.ibm.com>

On 06/06/2014 05:59 PM, Srivatsa S. Bhat wrote:
> On 06/04/2014 03:39 AM, Benjamin Herrenschmidt wrote:
>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>> know from the commit log and the comment mentioned above (and from my own
>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>> long-term solution.
>>>
>>> Matt, Ben, any thoughts on this?
>>
>> The problem is with our "soft offline" which we do on some platforms. When we
>> offline we don't actually send the CPUs back to firmware or anything like that.
>>
>> We put them into a very low-power loop inside Linux.
>>
>> The new kernel has no way to extract them from that loop. So we must re-"online"
>> them before we kexec so they can be passed to the new kernel normally (or returned
>> to firmware like we do on powernv).
>>
> 
> Thanks a lot for the explanation Ben!
> 
> I thought about this and this is what I think: whether the CPU is in the kernel
> or in the firmware is a hard boundary. But once we know it is still in the
> kernel, whether it is online or offline is a soft boundary, something that
> ideally shouldn't make any difference to kexec.
> 
> Then I looked at what special state kexec expects the online CPUs to be in
> before performing kexec, and I found that this state is entered via
> kexec_smp_down().
> 
> Which means, if we poke the soft-offline CPUs and make them execute
> kexec_smp_down(), we should be able to do a successful kexec without having to
> actually online them. After all, the core kexec code doesn't mandate that they
> should be online. So if we satisfy powerpc's requirement that all the CPUs are
> in a sane state, that should be good enough. (This would be similar to how the
> subcore code wakes up offline CPUs to perform the split-core procedure).
> 
> I know, this is all theory for now since I haven't tested it yet, but I think
> we can make this work.
> 
> Below are the 4 preliminary patches I have so far to implement this.
> 

And with the following hunk added (which I had forgotten earlier), it worked just
fine on powernv :-)


diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 2ef6c58..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -243,6 +243,9 @@ static void wake_offline_cpus(void)
 {
 	int cpu = 0;
 
+	if (ppc_md.kexec_wake_prepare)
+		ppc_md.kexec_wake_prepare();
+
 	for_each_present_cpu(cpu) {
 		if (!cpu_online(cpu)) {
 			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
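
For readers following the hunk above, the optional-hook pattern it relies on
can be modelled in plain userspace C. This is only a minimal sketch, not kernel
code: struct machdep_model, count_prepare_model() and wake_offline_cpus_model()
are hypothetical stand-ins for ppc_md, a platform's kexec_wake_prepare
implementation and wake_offline_cpus(). Every CPU in the array is assumed to be
present, and what "waking" a CPU involves is deliberately left abstract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's ppc_md (struct machdep_calls). */
struct machdep_model {
	void (*kexec_wake_prepare)(void);	/* optional hook, may be NULL */
};

static int prepare_calls;	/* counts hook invocations, for illustration */

/* Hypothetical platform hook; the real powernv work is not modelled. */
static void count_prepare_model(void)
{
	prepare_calls++;
}

/*
 * Model of the patched wake_offline_cpus(): run the platform hook once if
 * it is set, then visit every CPU that is present but not online.
 * Returns the number of CPUs marked in woken[].
 */
static int wake_offline_cpus_model(const struct machdep_model *md,
				   const bool *online, bool *woken, int ncpus)
{
	int cpu, count = 0;

	assert(md != NULL);

	if (md->kexec_wake_prepare)
		md->kexec_wake_prepare();

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (!online[cpu]) {
			woken[cpu] = true;	/* "waking" is left abstract */
			count++;
		}
	}
	return count;
}
```

A platform that leaves the hook NULL gets the old behaviour unchanged, which is
why the call is guarded by the pointer check instead of being made
unconditionally.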

I tested two configurations: putting the machine into ST mode and, in a separate
experiment, keeping just CPU 0 online in the first kernel. I then issued a kexec
in each. In both cases the second kernel booted successfully with all the CPUs.

I haven't explored the crashed-kernel case, though; it might need some auditing
to check whether the code handles that as well.

Regards,
Srivatsa S. Bhat
