All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Subject: Re: Regression in 5.16-rc1 with suspend to idle
Date: Fri, 18 Feb 2022 17:15:34 +0100	[thread overview]
Message-ID: <CAJZ5v0ho8PHGp0gAUp5KkUstTXLyUMsaQ7wTL=8xDJtjtXjPRw@mail.gmail.com> (raw)
In-Reply-To: <BL1PR12MB5157E2CDD68BA585C5F4CF2BE2369@BL1PR12MB5157.namprd12.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 3933 bytes --]

On Thu, Feb 17, 2022 at 9:16 PM Limonciello, Mario
<Mario.Limonciello@amd.com> wrote:
>
> [Public]
>
> > > I've found recently that on kernel 5.17-rc4 some OEM AMD laptops are
> > shutting down when entering suspend to idle.
> >
> > Interesting.  Can you identify the exact point when the shutdown occurs?
>
> It looks like it's a platform firmware crash that causes the shutdown not a kernel
> crash.

It looks like we'll need to quirk the system in question then.

> > > What would you suggest to be done in this case?  Revert both commits?  Or
> > would you prefer to have a fixup on top
> > > of that?
> >
> > I would prefer to fix the problem on top of the current 5.16-rc.
> >
>
> Here is the minimal patch I can come up with on top of 5.17-rc4 that fixes it:
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 04ea92cbd9cf..f5bf46782043 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -32,6 +32,7 @@
>  #include <linux/suspend.h>
>  #include <trace/events/power.h>
>  #include <linux/cpufreq.h>
> +#include <linux/cpuidle.h>
>  #include <linux/devfreq.h>
>  #include <linux/timer.h>
>
> @@ -1350,6 +1351,8 @@ int dpm_suspend_noirq(pm_message_t state)
>  {
>         int ret;
>
> +       cpuidle_pause();

Can you replace this with wake_up_all_idle_cpus() and remove the
cpuidle_resume()/cpuidle_pause() calls from s2idle_enter() and see
what happens?

> +
>         device_wakeup_arm_wake_irqs();
>         suspend_device_irqs();
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 6fcdee7e87a5..1708a643ba5d 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -97,6 +97,7 @@ static void s2idle_enter(void)
>         raw_spin_unlock_irq(&s2idle_lock);
>
>         cpus_read_lock();
> +       cpuidle_resume();
>
>         /* Push all the CPUs into the idle loop. */
>         wake_up_all_idle_cpus();
> @@ -104,6 +105,7 @@ static void s2idle_enter(void)
>         swait_event_exclusive(s2idle_wait_head,
>                     s2idle_state == S2IDLE_STATE_WAKE);
>
> +       cpuidle_pause();
>         cpus_read_unlock();
>
>         raw_spin_lock_irq(&s2idle_lock);
>
>
> * Removing the cpuidle_pause call from s2idle_enter will fix the crash,
>   but the system doesn't enter the deepest sleep state.

I think you mean that removing it doesn't make a difference except
that it prevents the deepest state from being entered.

Pausing cpuidle here is kind of redundant, because it just flips the
"initialized" flag that's going to be flipped again by
cpuidle_resume() in the next iteration.  [BTW, your patch is missing
one an additional cpuidle_resume() in the resume path to match this
cpuidle_pause().]  Also calling  wake_up_all_idle_cpus() is not likely
to make a difference, so I'm guessing that synchronize_rcu() does the
trick.

> * removing the cpuidle_pause call from dpm_suspend_noirq the firmware continues to
>    crash.

So the crash occurs while running dpm_suspend_noirq().

> I also confirmed that reverting both of these commits together on top of 5.17-rc4 fixes it:
>
> 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
> 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
>
> The commit messages at least make it sound like it was just a rework for unification of the codepaths,
> not supposed to be for anything to be actually fixed, so I would think it should be safe to revert.

There is a deeper issue related to these commits and I'm not inclined
to go back to the old code if there is only one system affected by
this and the problem is related to the platform firmware on that
system.

> So please advise which way you want to go and I'll send up a patch (or if you want to write one
> I'm happy to take it and test it since I can readily reproduce it).

To start with, please test the attached debug patch on top of -rc4.

[-- Attachment #2: pm-suspend-cpuidle-debug.patch --]
[-- Type: text/x-patch, Size: 957 bytes --]

---
 drivers/base/power/main.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -32,6 +32,7 @@
 #include <linux/suspend.h>
 #include <trace/events/power.h>
 #include <linux/cpufreq.h>
+#include <linux/cpuidle.h>
 #include <linux/devfreq.h>
 #include <linux/timer.h>
 
@@ -1294,6 +1295,8 @@ static int dpm_noirq_suspend_devices(pm_
 	ktime_t starttime = ktime_get();
 	int error = 0;
 
+	cpuidle_pause();
+
 	trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
@@ -1336,6 +1339,9 @@ static int dpm_noirq_suspend_devices(pm_
 	}
 	dpm_show_time(starttime, state, error, "noirq");
 	trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, false);
+
+	cpuidle_resume();
+
 	return error;
 }
 

  reply	other threads:[~2022-02-18 16:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17  6:15 Regression in 5.16-rc1 with suspend to idle Limonciello, Mario
2022-02-17 10:48 ` Rafael J. Wysocki
2022-02-17 20:16   ` Limonciello, Mario
2022-02-18 16:15     ` Rafael J. Wysocki [this message]
2022-02-18 18:18       ` Rafael J. Wysocki
2022-02-19  3:01         ` Limonciello, Mario
2022-02-21 16:05           ` Rafael J. Wysocki
2022-02-21 18:42             ` Limonciello, Mario
2022-02-21 18:58               ` Rafael J. Wysocki
2022-02-21 20:14                 ` Limonciello, Mario
2022-02-22 12:49                   ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJZ5v0ho8PHGp0gAUp5KkUstTXLyUMsaQ7wTL=8xDJtjtXjPRw@mail.gmail.com' \
    --to=rafael@kernel.org \
    --cc=Mario.Limonciello@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.