From: "Deng, Emily" <Emily.Deng@amd.com>
To: "Liu, Monk" <Monk.Liu@amd.com>,
"Koenig, Christian" <Christian.Koenig@amd.com>,
"Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
"Chen, Horace" <Horace.Chen@amd.com>,
"Chen, JingWen" <JingWen.Chen2@amd.com>
Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
Date: Fri, 24 Dec 2021 08:58:23 +0000
Message-ID: <PH0PR12MB5417F12B403B8181D5CD03988F7F9@PH0PR12MB5417.namprd12.prod.outlook.com>
In-Reply-To: <BL1PR12MB5269AE1B82F1D07433B95B59847E9@BL1PR12MB5269.namprd12.prod.outlook.com>
These patches look good to me. JingWen will pull them, run some basic TDR tests in an SRIOV environment, and report back.
Best wishes
Emily Deng
>-----Original Message-----
>From: Liu, Monk <Monk.Liu@amd.com>
>Sent: Thursday, December 23, 2021 6:14 PM
>To: Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
><Andrey.Grodzovsky@amd.com>; dri-devel@lists.freedesktop.org;
>amd-gfx@lists.freedesktop.org; Chen, Horace <Horace.Chen@amd.com>;
>Chen, JingWen <JingWen.Chen2@amd.com>; Deng, Emily <Emily.Deng@amd.com>
>Cc: daniel@ffwll.ch
>Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
>
>[AMD Official Use Only]
>
>@Chen, Horace @Chen, JingWen @Deng, Emily
>
>Please review Andrey's patch.
>
>Thanks
>-------------------------------------------------------------------
>Monk Liu | Cloud GPU & Virtualization Solution | AMD
>-------------------------------------------------------------------
>
>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Thursday, December 23, 2021 4:42 PM
>To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>;
>dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>Cc: daniel@ffwll.ch; Liu, Monk <Monk.Liu@amd.com>; Chen, Horace
><Horace.Chen@amd.com>
>Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
>
>Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:
>> Since FLR work is now serialized against GPU resets, there is no need
>> for this.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
>Acked-by: Christian König <christian.koenig@amd.com>
>
>> ---
>> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 -----------
>> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 -----------
>> 2 files changed, 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> index 487cd654b69e..7d59a66e3988 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
>> struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
>> int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> - /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> - * otherwise the mailbox msg will be ruined/reseted by
>> - * the VF FLR.
>> - */
>> - if (!down_write_trylock(&adev->reset_sem))
>> - return;
>> -
>> amdgpu_virt_fini_data_exchange(adev);
>> - atomic_set(&adev->in_gpu_reset, 1);
>>
>> xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
>> } while (timeout > 1);
>>
>> flr_done:
>> - atomic_set(&adev->in_gpu_reset, 0);
>> - up_write(&adev->reset_sem);
>> -
>> /* Trigger recovery for world switch failure if no TDR */
>> if (amdgpu_device_should_recover_gpu(adev)
>> && (!amdgpu_device_has_job_running(adev) ||
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index e3869067a31d..f82c066c8e8d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
>> struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
>> int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> - /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> - * otherwise the mailbox msg will be ruined/reseted by
>> - * the VF FLR.
>> - */
>> - if (!down_write_trylock(&adev->reset_sem))
>> - return;
>> -
>> amdgpu_virt_fini_data_exchange(adev);
>> - atomic_set(&adev->in_gpu_reset, 1);
>>
>> xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
>> } while (timeout > 1);
>>
>> flr_done:
>> - atomic_set(&adev->in_gpu_reset, 0);
>> - up_write(&adev->reset_sem);
>> -
>> /* Trigger recovery for world switch failure if no TDR */
>> if (amdgpu_device_should_recover_gpu(adev)
>> && (!amdgpu_device_has_job_running(adev) ||
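
A note on the rationale in the commit message ("flr work is serialized against GPU resets"): the following is a minimal userspace sketch, written with pthreads rather than kernel code, of why serialization makes the removed reset_sem trylock redundant. It assumes that TDR recovery and FLR handling are both submitted to one ordered queue drained by a single worker, which is what the series title and patch 5/8 ("For SRIOV send GPU reset directly to TDR queue") suggest; the actual amdgpu mechanism is not shown in this message. All names (ordered_wq, wq_queue, fake_dev, tdr_fn, flr_fn, run_demo) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical work item and single-worker ordered queue (not a kernel API). */
struct work_item {
	void (*fn)(void *arg);	/* NULL marks the stop sentinel */
	void *arg;
	struct work_item *next;
};

struct ordered_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work_item *head;
	pthread_t worker;
};

/* The only thread that runs work items: one at a time, in FIFO order. */
static void *wq_worker(void *p)
{
	struct ordered_wq *wq = p;
	for (;;) {
		pthread_mutex_lock(&wq->lock);
		while (!wq->head)
			pthread_cond_wait(&wq->cond, &wq->lock);
		struct work_item *w = wq->head;
		wq->head = w->next;
		pthread_mutex_unlock(&wq->lock);
		if (!w->fn)
			return NULL;	/* sentinel reached: earlier items all ran */
		w->fn(w->arg);		/* runs without the queue lock held */
	}
}

/* Append at the tail so items run in submission order. */
static void wq_queue(struct ordered_wq *wq, struct work_item *w)
{
	pthread_mutex_lock(&wq->lock);
	struct work_item **pp = &wq->head;
	while (*pp)
		pp = &(*pp)->next;
	w->next = NULL;
	*pp = w;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Queue a stop sentinel behind all pending work, then wait for the worker. */
static void wq_destroy(struct ordered_wq *wq)
{
	struct work_item stop = { NULL, NULL, NULL };
	wq_queue(wq, &stop);
	pthread_join(wq->worker, NULL);
	pthread_cond_destroy(&wq->cond);
	pthread_mutex_destroy(&wq->lock);
}

struct fake_dev {		/* hypothetical device state */
	atomic_int active;	/* handlers running right now */
	atomic_bool overlap;	/* set if two handlers ever ran at once */
	char log[8];		/* tags of handlers, in the order they ran */
};

static void handler_body(struct fake_dev *d, char tag)
{
	if (atomic_fetch_add(&d->active, 1) != 0)
		atomic_store(&d->overlap, true);
	size_t n = strlen(d->log);
	if (n + 1 < sizeof d->log)
		d->log[n] = tag;
	atomic_fetch_sub(&d->active, 1);
}

static void tdr_fn(void *arg) { handler_body(arg, 'T'); }	/* stands in for GPU recovery */
static void flr_fn(void *arg) { handler_body(arg, 'F'); }	/* stands in for FLR handling */

/* Queue alternating TDR and FLR jobs; returns 0 after all of them ran. */
static int run_demo(struct fake_dev *d)
{
	struct ordered_wq wq;
	struct work_item items[4];
	atomic_init(&d->active, 0);
	atomic_init(&d->overlap, false);
	memset(d->log, 0, sizeof d->log);
	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.cond, NULL);
	wq.head = NULL;
	if (pthread_create(&wq.worker, NULL, wq_worker, &wq) != 0)
		return -1;
	for (int i = 0; i < 4; i++) {
		items[i].fn = (i % 2 == 0) ? tdr_fn : flr_fn;
		items[i].arg = d;
		wq_queue(&wq, &items[i]);
	}
	wq_destroy(&wq);
	return 0;
}
```

Because only the worker thread ever runs a handler, each handler can assume no other reset-type job is in flight, which appears to be the mutual exclusion the removed down_write_trylock()/in_gpu_reset code provided inside the FLR work itself. In kernel code the closest primitive would presumably be an ordered workqueue (alloc_ordered_workqueue()). Build the sketch with -pthread.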