dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: "Deng, Emily" <Emily.Deng@amd.com>
To: "Liu, Monk" <Monk.Liu@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
	 "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Chen, Horace" <Horace.Chen@amd.com>,
	"Chen, JingWen" <JingWen.Chen2@amd.com>
Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
Date: Fri, 24 Dec 2021 08:58:23 +0000	[thread overview]
Message-ID: <PH0PR12MB5417F12B403B8181D5CD03988F7F9@PH0PR12MB5417.namprd12.prod.outlook.com> (raw)
In-Reply-To: <BL1PR12MB5269AE1B82F1D07433B95B59847E9@BL1PR12MB5269.namprd12.prod.outlook.com>

These patches look good to me. JingWen will pull these patches and do some basic TDR test on sriov environment, and give feedback.

Best wishes
Emily Deng



>-----Original Message-----
>From: Liu, Monk <Monk.Liu@amd.com>
>Sent: Thursday, December 23, 2021 6:14 PM
>To: Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey
><Andrey.Grodzovsky@amd.com>; dri-devel@lists.freedesktop.org; amd-
>gfx@lists.freedesktop.org; Chen, Horace <Horace.Chen@amd.com>; Chen,
>JingWen <JingWen.Chen2@amd.com>; Deng, Emily <Emily.Deng@amd.com>
>Cc: daniel@ffwll.ch
>Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
>for SRIOV
>
>[AMD Official Use Only]
>
>@Chen, Horace @Chen, JingWen @Deng, Emily
>
>Please take a review on Andrey's patch
>
>Thanks
>-------------------------------------------------------------------
>Monk Liu | Cloud GPU & Virtualization Solution | AMD
>-------------------------------------------------------------------
>we are hiring software manager for CVS core team
>-------------------------------------------------------------------
>
>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Thursday, December 23, 2021 4:42 PM
>To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; dri-
>devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>Cc: daniel@ffwll.ch; Liu, Monk <Monk.Liu@amd.com>; Chen, Horace
><Horace.Chen@amd.com>
>Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
>for SRIOV
>
>Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:
>> Since now flr work is serialized against  GPU resets there is no need
>> for this.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
>Acked-by: Christian König <christian.koenig@amd.com>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 -----------
>>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 -----------
>>   2 files changed, 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> index 487cd654b69e..7d59a66e3988 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>   	struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>   	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> -	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> -	 * otherwise the mailbox msg will be ruined/reseted by
>> -	 * the VF FLR.
>> -	 */
>> -	if (!down_write_trylock(&adev->reset_sem))
>> -		return;
>> -
>>   	amdgpu_virt_fini_data_exchange(adev);
>> -	atomic_set(&adev->in_gpu_reset, 1);
>>
>>   	xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct
>work_struct *work)
>>   	} while (timeout > 1);
>>
>>   flr_done:
>> -	atomic_set(&adev->in_gpu_reset, 0);
>> -	up_write(&adev->reset_sem);
>> -
>>   	/* Trigger recovery for world switch failure if no TDR */
>>   	if (amdgpu_device_should_recover_gpu(adev)
>>   		&& (!amdgpu_device_has_job_running(adev) || diff --git
>> a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index e3869067a31d..f82c066c8e8d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>   	struct amdgpu_device *adev = container_of(virt, struct
>amdgpu_device, virt);
>>   	int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> -	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> -	 * otherwise the mailbox msg will be ruined/reseted by
>> -	 * the VF FLR.
>> -	 */
>> -	if (!down_write_trylock(&adev->reset_sem))
>> -		return;
>> -
>>   	amdgpu_virt_fini_data_exchange(adev);
>> -	atomic_set(&adev->in_gpu_reset, 1);
>>
>>   	xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>>   	} while (timeout > 1);
>>
>>   flr_done:
>> -	atomic_set(&adev->in_gpu_reset, 0);
>> -	up_write(&adev->reset_sem);
>> -
>>   	/* Trigger recovery for world switch failure if no TDR */
>>   	if (amdgpu_device_should_recover_gpu(adev)
>>   		&& (!amdgpu_device_has_job_running(adev) ||

  reply	other threads:[~2021-12-24  8:58 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 22:04 [RFC v2 0/8] Define and use reset domain for GPU recovery in amdgpu Andrey Grodzovsky
2021-12-22 22:04 ` [RFC v2 1/8] drm/amdgpu: Introduce reset domain Andrey Grodzovsky
2021-12-22 22:05 ` [RFC v2 2/8] drm/amdgpu: Move scheduler init to after XGMI is ready Andrey Grodzovsky
2021-12-23  8:39   ` Christian König
2021-12-22 22:05 ` [RFC v2 3/8] drm/amdgpu: Fix crash on modprobe Andrey Grodzovsky
2021-12-23  8:40   ` Christian König
2021-12-22 22:05 ` [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Andrey Grodzovsky
2021-12-23  8:41   ` Christian König
2022-01-05  9:54   ` Lazar, Lijo
2022-01-05 12:31     ` Christian König
2022-01-05 13:11       ` Lazar, Lijo
2022-01-05 13:15         ` Christian König
2022-01-05 13:26           ` Lazar, Lijo
2022-01-05 13:41             ` Christian König
2022-01-05 18:11       ` Andrey Grodzovsky
2022-01-17 19:14         ` Andrey Grodzovsky
2022-01-17 19:17           ` Christian König
2022-01-17 19:21             ` Andrey Grodzovsky
2022-01-26 15:52               ` Andrey Grodzovsky
2022-01-28 16:57                 ` Grodzovsky, Andrey
2022-02-07  2:41                   ` JingWen Chen
2022-02-07  3:08                     ` Grodzovsky, Andrey
2021-12-22 22:13 ` [RFC v2 5/8] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue Andrey Grodzovsky
2021-12-22 22:13   ` [RFC v2 6/8] drm/amdgpu: Drop hive->in_reset Andrey Grodzovsky
2021-12-22 22:13   ` [RFC v2 7/8] drm/amdgpu: Drop concurrent GPU reset protection for device Andrey Grodzovsky
2021-12-22 22:14   ` [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV Andrey Grodzovsky
2021-12-23  8:42     ` Christian König
2021-12-23 10:14       ` Liu, Monk
2021-12-24  8:58         ` Deng, Emily [this message]
2021-12-24  9:57           ` JingWen Chen
2021-12-30 18:45             ` Andrey Grodzovsky
2022-01-03 10:17               ` Christian König
2022-01-04  9:07                 ` JingWen Chen
2022-01-04 10:18                   ` Christian König
2022-01-04 10:49                     ` Liu, Monk
2022-01-04 11:36                       ` Christian König
2022-01-04 16:56                         ` Andrey Grodzovsky
2022-01-05  7:34                           ` JingWen Chen
2022-01-05  7:59                             ` Christian König
2022-01-05 18:24                               ` Andrey Grodzovsky
2022-01-06  4:59                                 ` JingWen Chen
2022-01-06  5:18                                   ` JingWen Chen
2022-01-06  9:13                                     ` Christian König
2022-01-06 19:13                                     ` Andrey Grodzovsky
2022-01-07  3:57                                       ` JingWen Chen
2022-01-07  5:46                                         ` JingWen Chen
2022-01-07 16:02                                           ` Andrey Grodzovsky
2022-01-12  6:28                                             ` JingWen Chen
2022-01-04 17:13                         ` Liu, Shaoyun
2022-01-04 20:54                           ` Andrey Grodzovsky
2022-01-05  0:01                             ` Liu, Shaoyun
2022-01-05  7:25                         ` JingWen Chen
2021-12-30 18:39           ` Andrey Grodzovsky
2021-12-23 18:07     ` Liu, Shaoyun
2021-12-23 18:29   ` [RFC v3 5/8] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH0PR12MB5417F12B403B8181D5CD03988F7F9@PH0PR12MB5417.namprd12.prod.outlook.com \
    --to=emily.deng@amd.com \
    --cc=Andrey.Grodzovsky@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Horace.Chen@amd.com \
    --cc=JingWen.Chen2@amd.com \
    --cc=Monk.Liu@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).