From: Mason <slash.tmp@free.fr>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm <linux-pm@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Russell King <linux@arm.linux.org.uk>,
	Stephen Boyd <sboyd@codeaurora.org>,
	Sebastian Frias <sf84@laposte.net>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Will Deacon <will.deacon@arm.com>, Arnd Bergmann <arnd@arndb.de>
Subject: Re: Linux panics when suspend cannot offline the secondary cores
Date: Mon, 13 Jun 2016 15:50:56 +0200
Message-ID: <575EBA40.4000803@free.fr>
In-Reply-To: <2922940.3xeChLaYeK@vostro.rjw.lan>

On 13/06/2016 15:30, Rafael J. Wysocki wrote:

> On Monday, June 13, 2016 02:06:14 PM Mason wrote:
>
>> On 10/06/2016 23:37, Mason wrote:
>>
>>> On 10/06/2016 23:35, Rafael J. Wysocki wrote:
>>>
>>>> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
>>>>
>>>>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
>>>>> unhappy when the suspend framework fails to offline secondary cores.
>>>>>
>>>>> Is this expected/by design, or could it fail more gracefully?
>>>>> (It could also be something missing in my platform's code.)
>>>>
>>>> This looks like a CPU offline bug to me which is more general than just
>>>> system suspend.
>>>
>>> You may be right, I will try just off-lining cpu1.
>>> Suspend may be a red herring.
>>>
>>> By the way, I know my implementation of tango_cpu_die
>>> is incorrect, I was testing the failure mode.
>>
>> Hello Rafael,
>>
>> Suspend was indeed a red herring. Manually requesting cpu1 off-lining
>> also makes Linux panic when cpu_die() unexpectedly returns.
>>
>> The subject should perhaps have been:
>>
>>   Linux panics when secondary core off-lining fails
>>
>> Could it be made to fail more gracefully?
>> Or is this borkage inherent to the failed operation?
>> Or is it a bug in my platform code?
>> (A bug other than tango_cpu_die() failing to kill the core.)
> 
> Well, smp_ops.cpu_die() is not expected to return AFAICS, so that may be
> the reason why it fails for you the way it does.

I am aware that smp_ops.cpu_die() is not expected to return.
(I was wondering whether the framework could handle a return gracefully.)

The actual implementation of cpu_die() asks the firmware to off-line
the current core. If the operation fails, for whatever reason, is the
firmware then not supposed to return control to Linux?

Is calling panic() the only safe thing to do in Linux, as in the code below?
(If so, why doesn't the framework itself panic immediately?)

static void tango_cpu_die(unsigned int cpu)
{
	ask_firmware_to_offline(cpu);
	/* if we return here, something went wrong */
	panic("firmware could not offline");
}
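
As a rough illustration of the two outcomes in question, here is a
user-space C model (not kernel code). The names fake_firmware_offline()
and model_cpu_die(), and the firmware_works flag, are made up for this
sketch; a real cpu_die() hook returns void and simply never comes back
when the firmware succeeds.

```c
#include <stdio.h>

/*
 * Hypothetical firmware stub. A real firmware call would not return on
 * success; here 0 models "core powered off" and -1 models "refused".
 */
static int fake_firmware_offline(unsigned int cpu, int firmware_works)
{
	(void)cpu;
	return firmware_works ? 0 : -1;
}

/*
 * Models a cpu_die()-style hook. Returns 0 where the real core would be
 * gone, and -1 where kernel code would have to panic() or spin.
 */
static int model_cpu_die(unsigned int cpu, int firmware_works)
{
	if (fake_firmware_offline(cpu, firmware_works) == 0)
		return 0;	/* on real hardware, execution stops here */

	/* if we get here, something went wrong */
	fprintf(stderr, "cpu%u: firmware could not offline\n", cpu);
	return -1;	/* kernel code would call panic() instead */
}
```

Calling model_cpu_die(1, 0) takes the failure branch, which is the case
where the real hook currently returns to its caller.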

Regards.

