On Thu, Feb 17, 2022 at 9:16 PM Limonciello, Mario wrote: > > [Public] > > > > I've found recently that on kernel 5.17-rc4 some OEM AMD laptops are > > shutting down when entering suspend to idle. > > > > Interesting. Can you identify the exact point when the shutdown occurs? > > It looks like it's a platform firmware crash that causes the shutdown not a kernel > crash. It looks like we'll need to quirk the system in question then. > > > What would you suggest to be done in this case? Revert both commits? Or > > would you prefer to have a fixup on top > > > of that? > > > > I would prefer to fix the problem on top of the current 5.16-rc. > > > > Here is the minimal patch I can come up with on top of 5.17-rc4 that fixes it: > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index 04ea92cbd9cf..f5bf46782043 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -1350,6 +1351,8 @@ int dpm_suspend_noirq(pm_message_t state) > { > int ret; > > + cpuidle_pause(); Can you replace this with wake_up_all_idle_cpus() and remove the cpuidle_resume()/cpuidle_pause() calls from s2idle_enter() and see what happens? > + > device_wakeup_arm_wake_irqs(); > suspend_device_irqs(); > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 6fcdee7e87a5..1708a643ba5d 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -97,6 +97,7 @@ static void s2idle_enter(void) > raw_spin_unlock_irq(&s2idle_lock); > > cpus_read_lock(); > + cpuidle_resume(); > > /* Push all the CPUs into the idle loop. */ > wake_up_all_idle_cpus(); > @@ -104,6 +105,7 @@ static void s2idle_enter(void) > swait_event_exclusive(s2idle_wait_head, > s2idle_state == S2IDLE_STATE_WAKE); > > + cpuidle_pause(); > cpus_read_unlock(); > > raw_spin_lock_irq(&s2idle_lock); > > > * Removing the cpuidle_pause call from s2idle_enter will fix the crash, > but the system doesn't enter the deepest sleep state. I think you mean that removing it doesn't make a difference except that it prevents the deepest state from being entered. Pausing cpuidle here is kind of redundant, because it just flips the "initialized" flag that's going to be flipped again by cpuidle_resume() in the next iteration. [BTW, your patch is missing one an additional cpuidle_resume() in the resume path to match this cpuidle_pause().] Also calling wake_up_all_idle_cpus() is not likely to make a difference, so I'm guessing that synchronize_rcu() does the trick. > * removing the cpuidle_pause call from dpm_suspend_noirq the firmware continues to > crash. So the crash occurs while running dpm_suspend_noirq(). > I also confirmed that reverting both of these commits together on top of 5.17-rc4 fixes it: > > 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path") > 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions") > > The commit messages at least make it sound like it was just a rework for unification of the codepaths, > not supposed to be for anything to be actually fixed, so I would think it should be safe to revert. There is a deeper issue related to these commits and I'm not inclined to go back to the old code if there is only one system affected by this and the problem is related to the platform firmware on that system. > So please advise which way you want to go and I'll send up a patch (or if you want to write one > I'm happy to take it and test it since I can readily reproduce it). To start with, please test the attached debug patch on top of -rc4.