I've opened the gitlab entry for this discussion: https://gitlab.freedesktop.org/drm/amd/-/issues/2023 I confirm I 'm not receiving mails anymore from the mailing list but I'll follow gitlab. In reply to the patch proposed by Mario: https://patchwork.freedesktop.org/patch/486836/ With this patch applied on vanilla 5.18 kernel: - suspend still fails; - after suspend attempt, the screen comes back with only the cursor; - switching to a console let me get the following dmesg file. CC Le lundi 23 mai 2022, 15:02:53 CEST Christian Casteyde a écrit : > Hello > > I've checked with 5.18 the problem is still there. > Interestingly, I tried to revert the commit but it was rejected because of > the change in the test from: > if (!adev->in_s0ix) > to: > if (amdgpu_acpi_should_gpu_reset(adev)) > > in amdgpu_pmops_suspend. > > I fixed the rejection, keeping shoud_gpu_reset, but it still fails. > Then I changed to restore test of in_s0ix as it was in 5.17, and it works. > I tried with a call to amd_gpu_asic_reset without testing at all in_s0ix, it > works. > > Therefore, my APU wants a reset in amdgpu_pmops_suspend. > > By curiosity, I tried to do the reset in amdgpu_pmops_suspend_noirq as was > intended in 5.18 original code, commenting out the test of > amdgpu_acpi_should_gpu_reset(adev) (since this APU wants a reset). > This does not work, I got the Fence timeout errors or freezes. > > If I leave noirq function unchanged (original 5.18 code), and just add a > reset in suspend() as was done in 5.17, it works. > > Therefore, my GPU does NOT want to be reset in noirq, the reset must be in > suspend. > > In other words, I modified amdgpu_pmops_suspend (partial revert) like this > and this works on my laptop: > > static int amdgpu_pmops_suspend(struct device *dev) > { > struct drm_device *drm_dev = dev_get_drvdata(dev); > struct amdgpu_device *adev = drm_to_adev(drm_dev); > + int r; > > if (amdgpu_acpi_is_s0ix_active(adev)) > adev->in_s0ix = true; > else > adev->in_s3 = true; > - return amdgpu_device_suspend(drm_dev, true); > + r = amdgpu_device_suspend(drm_dev, true); > + if (r) > + return r; > + if (!adev->in_s0ix) > + return amdgpu_asic_reset(adev); > return 0; > } > > static int amdgpu_pmops_suspend_noirq(struct device *dev) > { > struct drm_device *drm_dev = dev_get_drvdata(dev); > struct amdgpu_device *adev = drm_to_adev(drm_dev); > > if (amdgpu_acpi_should_gpu_reset(adev)) > return amdgpu_asic_reset(adev); > > return 0; > } > > I don't know if other APU want a reset, in the same context, and how to > differentiate all the cases, so I cannot go further, but I can test patches > if needed. > > CC > > Le mercredi 18 mai 2022, 08:37:27 CEST Thorsten Leemhuis a écrit : > > On 18.05.22 07:54, Kai-Heng Feng wrote: > > > On Wed, May 18, 2022 at 1:52 PM Thorsten Leemhuis > > > > > > wrote: > > >> On 17.05.22 19:37, casteyde.christian@free.fr wrote: > > >>> I've tryied to revert the offending commit on 5.18-rc7 (887f75cfd0da > > >>> ("drm/amdgpu: Ensure HDA function is suspended before ASIC reset"), > > >>> and > > >>> the problem disappears so it's really this commit that breaks. > > >> > > >> In that case I'll update the regzbot status to make sure it's visible > > >> as > > >> regression introduced in the 5.18 cycle: > > >> > > >> #regzbot introduced: 887f75cfd0da > > >> > > >> BTW: obviously would be nice to get this fixed before 5.18 is released > > >> (which might already happen on Sunday), especially as the culprit > > >> apparently was already backported to stable, but I guess that won't be > > >> easy... > > >> > > >> Which made me wondering: is reverting the culprit temporarily in > > >> mainline (and reapplying it later with a fix) a option here? > > > > > > It's too soon to call it's the culprit. > > > > Well, sure, the root-cause might be somewhere else. But from the point > > of kernel regressions (and tracking them) it's the culprit, as that's > > the change that triggers the misbehavior. And that's how Linus > > approaches these things as well when it comes to reverting to fix > > regressions -- and he even might... > > > > > The suspend on the system > > > doesn't work properly at the first place. > > > > ...ignore things like this, as long as a revert is unlikely to cause > > more damage than good. > > > > Ciao. Thorsten > > > > >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' > > >> hat) > > >> > > >> P.S.: As the Linux kernel's regression tracker I deal with a lot of > > >> reports and sometimes miss something important when writing mails like > > >> this. If that's the case here, don't hesitate to tell me in a public > > >> reply, it's in everyone's interest to set the public record straight.