All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Limonciello, Mario" <Mario.Limonciello@amd.com>
To: Christian Casteyde <casteyde.christian@free.fr>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>
Cc: "stable@vger.kernel.org" <stable@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Subject: RE: [REGRESSION] Laptop with Ryzen 4600H fails to resume video since 5.17.4 (works 5.17.3)
Date: Mon, 23 May 2022 22:01:49 +0000	[thread overview]
Message-ID: <MN0PR12MB61011E787AAAA98597A3A9DDE2D49@MN0PR12MB6101.namprd12.prod.outlook.com> (raw)
In-Reply-To: <2408578.XAFRqVoOGU@geek500.localdomain>

[Public]



> -----Original Message-----
> From: Christian Casteyde <casteyde.christian@free.fr>
> Sent: Monday, May 23, 2022 12:03
> To: Kai-Heng Feng <kai.heng.feng@canonical.com>; Thorsten Leemhuis
> <regressions@leemhuis.info>
> Cc: stable@vger.kernel.org; regressions@lists.linux.dev; Deucher, Alexander
> <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org; Limonciello,
> Mario <Mario.Limonciello@amd.com>
> Subject: Re: [REGRESSION] Laptop with Ryzen 4600H fails to resume video
> since 5.17.4 (works 5.17.3)
> 
> I've opened the gitlab entry for this discussion:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitla
> b.freedesktop.org%2Fdrm%2Famd%2F-
> %2Fissues%2F2023&amp;data=05%7C01%7Cmario.limonciello%40amd.com%
> 7C7a9cf928dd1e491f0c2c08da3cde2e21%7C3dd8961fe4884e608e11a82d994e
> 183d%7C0%7C0%7C637889222210477502%7CUnknown%7CTWFpbGZsb3d8ey
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> 7C3000%7C%7C%7C&amp;sdata=WnQwbF8J4j2F69GINpz49Zg5Qg0tpVCmUG
> i1FjXrCu4%3D&amp;reserved=0
> 
> I confirm I 'm not receiving mails anymore from the mailing list but I'll
> follow gitlab.
> 
> In reply to the patch proposed by Mario:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatc
> hwork.freedesktop.org%2Fpatch%2F486836%2F&amp;data=05%7C01%7Cma
> rio.limonciello%40amd.com%7C7a9cf928dd1e491f0c2c08da3cde2e21%7C3dd
> 8961fe4884e608e11a82d994e183d%7C0%7C0%7C637889222210477502%7CUn
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
> Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=FuhEnUyC8nv
> ycpaGr4yfBbGpuXHXoKqImBTQ0PL8caY%3D&amp;reserved=0
> With this patch applied on vanilla 5.18 kernel:
> - suspend still fails;
> - after suspend attempt, the screen comes back with only the cursor;
> - switching to a console let me get the following dmesg file.
> 

I spent some time with Christian today on that Gitlab issue, and want to update this
audience to a few things.

1) The first suspend failure is caused by something wrong with acpi_tad driver on his
system.  Blacklisting the driver everything works properly.

2) Failing the first time with deep and trying s2idle instead is caused by something in
his userspace (unknown right now).

3) This is not a regression IMO.  The GPU was in a bad state from the problem caused
by acpi-tad.  We introduced poking around and resetting the GPU to help with
aborted suspends, but they've led to a pile of "Oh but not this case", fix this ordering
problem, deal with this repercussion.  We shouldn't be dropping any of those incremental
solutions to deal with a path like this; the direction should be for fixing acpi_tad or whatever
userspace is using it incorrectly on Christian's system.

> CC
> 
> Le lundi 23 mai 2022, 15:02:53 CEST Christian Casteyde a écrit :
> > Hello
> >
> > I've checked with 5.18 the problem is still there.
> > Interestingly, I tried to revert the commit but it was rejected because of
> > the change in the test from:
> >         if (!adev->in_s0ix)
> > to:
> >       if (amdgpu_acpi_should_gpu_reset(adev))
> >
> > in amdgpu_pmops_suspend.
> >
> > I fixed the rejection, keeping shoud_gpu_reset, but it still fails.
> > Then I changed to restore test of in_s0ix as it was in 5.17, and it works.
> > I tried with a call to amd_gpu_asic_reset without testing at all in_s0ix, it
> > works.
> >
> > Therefore, my APU wants a reset in amdgpu_pmops_suspend.
> >
> > By curiosity, I tried to do the reset in amdgpu_pmops_suspend_noirq as
> was
> > intended in 5.18 original code, commenting out the test of
> > amdgpu_acpi_should_gpu_reset(adev) (since this APU wants a reset).
> > This does not work, I got the Fence timeout errors or freezes.
> >
> > If I leave  noirq function unchanged (original 5.18 code), and just add a
> > reset in suspend() as was done in 5.17, it works.
> >
> > Therefore, my GPU does NOT want to be reset in noirq, the reset must be
> in
> > suspend.
> >
> > In other words, I modified amdgpu_pmops_suspend (partial revert) like
> this
> > and this works on my laptop:
> >
> > static int amdgpu_pmops_suspend(struct device *dev)
> > {
> > 	struct drm_device *drm_dev = dev_get_drvdata(dev);
> > 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > +	int r;
> >
> > 	if (amdgpu_acpi_is_s0ix_active(adev))
> > 		adev->in_s0ix = true;
> > 	else
> > 		adev->in_s3 = true;
> > -	return amdgpu_device_suspend(drm_dev, true);
> > +	r = amdgpu_device_suspend(drm_dev, true);
> > +	if (r)
> > +		return r;
> > +	if (!adev->in_s0ix)
> > +		return amdgpu_asic_reset(adev);
> > 	return 0;
> > }
> >
> > static int amdgpu_pmops_suspend_noirq(struct device *dev)
> > {
> > 	struct drm_device *drm_dev = dev_get_drvdata(dev);
> > 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > 	if (amdgpu_acpi_should_gpu_reset(adev))
> > 		return amdgpu_asic_reset(adev);
> >
> > 	return 0;
> > }
> >
> > I don't know if other APU want a reset, in the same context, and how to
> > differentiate all the cases, so I cannot go further, but I can test patches
> > if needed.
> >
> > CC
> >
> > Le mercredi 18 mai 2022, 08:37:27 CEST Thorsten Leemhuis a écrit :
> > > On 18.05.22 07:54, Kai-Heng Feng wrote:
> > > > On Wed, May 18, 2022 at 1:52 PM Thorsten Leemhuis
> > > >
> > > > <regressions@leemhuis.info> wrote:
> > > >> On 17.05.22 19:37, casteyde.christian@free.fr wrote:
> > > >>> I've tryied to revert the offending commit on 5.18-rc7 (887f75cfd0da
> > > >>> ("drm/amdgpu: Ensure HDA function is suspended before ASIC
> reset"),
> > > >>> and
> > > >>> the problem disappears so it's really this commit that breaks.
> > > >>
> > > >> In that case I'll update the regzbot status to make sure it's visible
> > > >> as
> > > >> regression introduced in the 5.18 cycle:
> > > >>
> > > >> #regzbot introduced: 887f75cfd0da
> > > >>
> > > >> BTW: obviously would be nice to get this fixed before 5.18 is released
> > > >> (which might already happen on Sunday), especially as the culprit
> > > >> apparently was already backported to stable, but I guess that won't be
> > > >> easy...
> > > >>
> > > >> Which made me wondering: is reverting the culprit temporarily in
> > > >> mainline (and reapplying it later with a fix) a option here?
> > > >
> > > > It's too soon to call it's the culprit.
> > >
> > > Well, sure, the root-cause might be somewhere else. But from the point
> > > of kernel regressions (and tracking them) it's the culprit, as that's
> > > the change that triggers the misbehavior. And that's how Linus
> > > approaches these things as well when it comes to reverting to fix
> > > regressions -- and he even might...
> > >
> > > > The suspend on the system
> > > > doesn't work properly at the first place.
> > >
> > > ...ignore things like this, as long as a revert is unlikely to cause
> > > more damage than good.
> > >
> > > Ciao. Thorsten
> > >
> > > >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
> > > >> hat)
> > > >>
> > > >> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > > >> reports and sometimes miss something important when writing mails
> like
> > > >> this. If that's the case here, don't hesitate to tell me in a public
> > > >> reply, it's in everyone's interest to set the public record straight.

  reply	other threads:[~2022-05-23 22:01 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-14 14:41 [REGRESSION] Laptop with Ryzen 4600H fails to resume video since 5.17.4 (works 5.17.3) Christian Casteyde
2022-05-14 15:12 ` Thorsten Leemhuis
2022-05-14 17:34   ` Christian Casteyde
2022-05-16  2:47     ` Kai-Heng Feng
2022-05-16  3:06       ` Mario Limonciello
2022-05-16 17:23       ` Christian Casteyde
2022-05-17  2:03         ` Kai-Heng Feng
2022-05-17  6:36           ` Christian Casteyde
2022-05-17  6:36             ` Christian Casteyde
2022-05-17  6:58             ` Kai-Heng Feng
2022-05-17 17:37               ` casteyde.christian
2022-05-18  5:52                 ` Thorsten Leemhuis
2022-05-18  5:54                   ` Kai-Heng Feng
2022-05-18  6:37                     ` Thorsten Leemhuis
2022-05-18 20:15                       ` Limonciello, Mario
2022-05-23 13:02                       ` Christian Casteyde
2022-05-23 14:00                         ` Limonciello, Mario
2022-05-23 17:03                         ` Christian Casteyde
2022-05-23 22:01                           ` Limonciello, Mario [this message]
2022-05-24 20:54                           ` Christian Casteyde
2022-05-25  7:29                             ` Thorsten Leemhuis
2022-05-17 17:38               ` casteyde.christian
2022-05-17 18:13                 ` Limonciello, Mario
2022-05-18 11:02                   ` casteyde.christian
2022-05-19 17:31                     ` Limonciello, Mario
2022-05-19 18:07                       ` Limonciello, Mario
2022-05-18  2:08                 ` Kai-Heng Feng
2022-05-18  2:19                   ` Mario Limonciello
2022-05-18  7:15                   ` Christian Casteyde
2022-05-18  7:15                     ` Christian Casteyde
2022-05-18 10:53                     ` casteyde.christian
2022-05-27  9:51   ` [REGRESSION] Laptop with Ryzen 4600H fails to resume video since 5.17.4 (works 5.17.3) #forregzbot Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN0PR12MB61011E787AAAA98597A3A9DDE2D49@MN0PR12MB6101.namprd12.prod.outlook.com \
    --to=mario.limonciello@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=casteyde.christian@free.fr \
    --cc=gregkh@linuxfoundation.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.