From: "Quan, Evan" <Evan.Quan@amd.com>
To: Salvatore Bonaccorso <carnil@debian.org>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>
Cc: Dominique Dumont <dod@debian.org>,
	"1005005@bugs.debian.org" <1005005@bugs.debian.org>,
	"Tuikov, Luben" <Luben.Tuikov@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?
Date: Tue, 15 Feb 2022 03:07:12 +0000
Message-ID: <DM6PR12MB261963959BB02323CF27C0ACE4349@DM6PR12MB2619.namprd12.prod.outlook.com>
In-Reply-To: <Ygf7KuWyc0d4HIFu@eldamar.lan>

[AMD Official Use Only]



> -----Original Message-----
> From: Salvatore Bonaccorso <salvatore.bonaccorso@gmail.com> On Behalf
> Of Salvatore Bonaccorso
> Sent: Sunday, February 13, 2022 2:24 AM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: Dominique Dumont <dod@debian.org>; 1005005@bugs.debian.org;
> Tuikov, Luben <Luben.Tuikov@amd.com>; Quan, Evan
> <Evan.Quan@amd.com>; Sasha Levin <sashal@kernel.org>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; David
> Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>;
> amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org;
> linux-kernel@vger.kernel.org
> Subject: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic
> in suspend (v2)") on suspend?
> 
> Hi Alex, hi all
> 
> In Debian we got a regression report from Dominique Dumont (CC'ed) in
> https://bugs.debian.org/1005005 that after an update to a 5.15.15-based
> kernel, his machine no longer suspends correctly: after the screen goes
> black as usual, it comes back. The Debian bug above contains a trace.
> 
> Dominique confirmed that this issue persisted after updating to 5.16.7.
> Furthermore, he bisected the issue and found
> 
> 	3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> 	commit 3c196f05666610912645c7c5d9107706003f67c3
> 	Author: Alex Deucher <alexander.deucher@amd.com>
> 	Date:   Fri Nov 12 11:25:30 2021 -0500
> 
> 	    drm/amdgpu: always reset the asic in suspend (v2)
> 
> 	    [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
> 
> 	    If the platform suspend happens to fail and the power rail
> 	    is not turned off, the GPU will be in an unknown state on
> 	    resume, so reset the asic so that it will be in a known
> 	    good state on resume even if the platform suspend failed.
> 
> 	    v2: handle s0ix
> 
> 	    Acked-by: Luben Tuikov <luben.tuikov@amd.com>
> 	    Acked-by: Evan Quan <evan.quan@amd.com>
> 	    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 	    Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> 	 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> 	 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> to be the first bad commit, see
> https://bugs.debian.org/1005005#34 .
I checked the backtrace posted there (below). It seems the error occurred during amdgpu_device_suspend().
That means Alex's patch should not be related, as it affects only the logic after amdgpu_device_suspend().
So we might have the wrong regression point here.
[  257.842851]  ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
[  257.843356]  amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
[  257.843771]  amdgpu_device_suspend+0x62/0xc0 [amdgpu]
[  257.844184]  amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
[  257.844631]  pci_pm_suspend+0x71/0x160
[  257.844643]  ? pci_pm_freeze+0xb0/0xb0
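
To illustrate the ordering argument, here is a minimal, self-contained model. It is only a sketch: every `model_*` name is a hypothetical stand-in and not the real amdgpu code. It assumes the suspend callback first calls the device suspend step, returns early if that step fails, and only then runs the reset added by the bisected commit, skipping it for s0ix as the "v2: handle s0ix" note suggests.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the real amdgpu functions. */
int reset_calls; /* counts how often the modeled reset step runs */

/* Models amdgpu_device_suspend(): returns 0 on success, negative on failure. */
int model_device_suspend(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Models the ASIC reset that the bisected commit adds. */
int model_asic_reset(void)
{
    reset_calls++;
    return 0;
}

/* Models the assumed ordering in the suspend callback. */
int model_pmops_suspend(int suspend_fails, int in_s0ix)
{
    int r = model_device_suspend(suspend_fails);

    if (r)
        return r;               /* failure here returns before any reset */
    if (!in_s0ix)
        r = model_asic_reset(); /* assumed: skipped for s0ix */
    return r;
}

/* Checks that a failing device suspend never reaches the reset step. */
void model_self_check(void)
{
    reset_calls = 0;
    assert(model_pmops_suspend(1, 0) < 0);
    assert(reset_calls == 0);
}
```

Under these assumptions, a failure inside the device suspend step (where the trace points) happens before the reset could run at all, which is why the reset alone would not explain the failure.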

BR
Evan
> 
> Does this ring a bell? Any idea about the problem?
> 
> Regards,
> Salvatore
