From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: "Chen, Guchun" <Guchun.Chen@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal
Date: Fri, 1 Oct 2021 10:28:49 -0400
Message-ID: <b45147f2-9ced-23f2-5020-2ff9aff1e12d@amd.com>
In-Reply-To: <DM5PR12MB2469D34B746A0C1927906879F1AB9@DM5PR12MB2469.namprd12.prod.outlook.com>

No, scheduler restart and device unlock must take place in
amdgpu_pci_resume (see struct pci_error_handlers for the various
states of PCI recovery). So just add a flag (probably in amdgpu_device)
so we can remember which pci_channel_state_t we came from (unfortunately
it's not passed to us in amdgpu_pci_resume), and unless it's set, don't
do anything in amdgpu_pci_resume.

Andrey
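
The flag idea above can be sketched as a small user-space model. This is
only an illustration under stated assumptions, not the driver code: the
struct, its field names, and the model_* helpers are hypothetical
stand-ins, and the enums only borrow the kernel's names. What the frozen
case does (take the lock, stop the schedulers, ask for a reset) is
likewise assumed for the sake of the model.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's PCI error-recovery types. */
enum pci_channel_state {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
};

enum pci_ers_result {
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
};

/* Hypothetical model of the device; not the real struct amdgpu_device. */
struct amdgpu_device_model {
	enum pci_channel_state pci_channel_state; /* assumed new flag */
	bool locked;        /* models the write lock being held */
	bool sched_stopped; /* models the schedulers being stopped */
};

/* Models the error_detected callback: remember the state we came from. */
enum pci_ers_result
model_pci_error_detected(struct amdgpu_device_model *adev,
			 enum pci_channel_state state)
{
	adev->pci_channel_state = state;

	switch (state) {
	case pci_channel_io_normal:
		/* Nothing is locked or stopped in this case. */
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/* Assumed: lock the device and stop the schedulers. */
		adev->locked = true;
		adev->sched_stopped = true;
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
		return PCI_ERS_RESULT_DISCONNECT;
	}
	return PCI_ERS_RESULT_DISCONNECT;
}

/*
 * Models the resume callback: the channel state is not passed in, so the
 * remembered flag decides whether there is anything to undo.
 * Returns true if the restart and unlock were performed.
 */
bool model_pci_resume(struct amdgpu_device_model *adev)
{
	if (adev->pci_channel_state != pci_channel_io_frozen)
		return false;

	adev->sched_stopped = false; /* restart schedulers */
	adev->locked = false;        /* release the lock taken earlier */
	return true;
}
```

With this shape, a resume that follows pci_channel_io_normal returns
early and never releases a lock that was not taken, while the frozen
path still restarts and unlocks in the resume step.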

On 2021-10-01 4:21 a.m., Chen, Guchun wrote:
> [Public]
>
> Hi Andrey,
>
> Do you mean moving the drm_sched_resubmit_jobs and drm_sched_start calls from amdgpu_pci_resume to amdgpu_pci_error_detected, under the pci_channel_io_frozen case?
> That would leave amdgpu_pci_resume as an empty function, and this way we could also drop the write lock acquire/release for the pci_channel_io_normal case?
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Friday, October 1, 2021 10:22 AM
> To: Chen, Guchun <Guchun.Chen@amd.com>; amd-gfx@lists.freedesktop.org; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal
>
> On 2021-09-30 10:00 p.m., Guchun Chen wrote:
>
>> When a PCI error state pci_channel_io_normal is detected, it will
>> report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver
>> will continue the execution of PCI resume callback report_resume by
>> pci_walk_bridge, and the callback will go into amdgpu_pci_resume
>> finally, where write lock is released unconditionally without acquiring
>> such lock.
>
> Good catch, but the issue is even wider in scope: what about drm_sched_resubmit_jobs and drm_sched_start being called without the schedulers having been stopped before? Better to put the entire body of this function under a flag that is set only in pci_channel_io_frozen. As far as I remember, we don't need to do anything in the case of pci_channel_io_normal.
>
> Andrey
>
>
>> Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
>> Signed-off-by: Guchun Chen <guchun.chen@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>>    1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bb5ad2b6ca13..12f822d51de2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5370,6 +5370,7 @@ pci_ers_result_t
>> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>>    
>>    	switch (state) {
>>    	case pci_channel_io_normal:
>> +		amdgpu_device_lock_adev(adev, NULL);
>>    		return PCI_ERS_RESULT_CAN_RECOVER;
>>    	/* Fatal error, prepare for slot reset */
>>    	case pci_channel_io_frozen:

Thread overview: 6+ messages
2021-10-01  2:00 [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal Guchun Chen
2021-10-01  2:21 ` Andrey Grodzovsky
2021-10-01  8:21   ` Chen, Guchun
2021-10-01 14:28     ` Andrey Grodzovsky [this message]
2021-10-01 15:21       ` Chen, Guchun
2021-10-02 15:20         ` Chen, Guchun
