All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
@ 2022-01-18 16:15 Eric Huang
  2022-01-18 16:35 ` Alex Deucher
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Huang @ 2022-01-18 16:15 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..7b24a920c12e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
+	/* Only apply no TLB flush on Aldebaran and Arcturus
+	 * to workaround regressions on other Asics.
 	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+	if (table_freed &&
+	    (adev->asic_type != CHIP_ALDEBARAN) &&
+	    (adev->asic_type != CHIP_ARCTURUS))
 		*table_freed = true;
 
 	goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..ef4d676cc71c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	    KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 16:15 [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus Eric Huang
@ 2022-01-18 16:35 ` Alex Deucher
  2022-01-18 19:09   ` Eric Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Deucher @ 2022-01-18 16:35 UTC (permalink / raw)
  To: Eric Huang; +Cc: amd-gfx list

On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
>
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.

Do we need to check for a specific fw version?

Alex

>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..7b24a920c12e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>                                 true);
>         ret = unreserve_bo_and_vms(&ctx, false, false);
>
> -       /* Only apply no TLB flush on Aldebaran to
> -        * workaround regressions on other Asics.
> +       /* Only apply no TLB flush on Aldebaran and Arcturus
> +        * to workaround regressions on other Asics.
>          */
> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> +       if (table_freed &&
> +           (adev->asic_type != CHIP_ALDEBARAN) &&
> +           (adev->asic_type != CHIP_ARCTURUS))
>                 *table_freed = true;
>
>         goto out;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..ef4d676cc71c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>         }
>         mutex_unlock(&p->mutex);
>
> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
>                 err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>                                 (struct kgd_mem *) mem, true);
>                 if (err) {
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 16:35 ` Alex Deucher
@ 2022-01-18 19:09   ` Eric Huang
  2022-01-18 19:27     ` Russell, Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Huang @ 2022-01-18 19:09 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx list

The SDMA fix is generic and not in a specific version of FW, so we don't 
have to check.

Thanks,
Eric

On 2022-01-18 11:35, Alex Deucher wrote:
> On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
>> SDMA FW fixes the hang issue for adding heavy-weight TLB
>> flush on Arcturus, so we can enable it.
> Do we need to check for a specific fw version?
>
> Alex
>
>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
>>   2 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a64cbbd943ba..7b24a920c12e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>>                                  true);
>>          ret = unreserve_bo_and_vms(&ctx, false, false);
>>
>> -       /* Only apply no TLB flush on Aldebaran to
>> -        * workaround regressions on other Asics.
>> +       /* Only apply no TLB flush on Aldebaran and Arcturus
>> +        * to workaround regressions on other Asics.
>>           */
>> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
>> +       if (table_freed &&
>> +           (adev->asic_type != CHIP_ALDEBARAN) &&
>> +           (adev->asic_type != CHIP_ARCTURUS))
>>                  *table_freed = true;
>>
>>          goto out;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index b570c0454ce9..ef4d676cc71c 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>>          }
>>          mutex_unlock(&p->mutex);
>>
>> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
>> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
>> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
>>                  err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>>                                  (struct kgd_mem *) mem, true);
>>                  if (err) {
>> --
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 19:09   ` Eric Huang
@ 2022-01-18 19:27     ` Russell, Kent
  2022-01-18 19:35       ` Alex Deucher
  0 siblings, 1 reply; 20+ messages in thread
From: Russell, Kent @ 2022-01-18 19:27 UTC (permalink / raw)
  To: Huang, JinHuiEric, Alex Deucher; +Cc: amd-gfx list

[AMD Official Use Only]

I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that SDMA version is >=18 before enabling it? Or am I misunderstanding the fix?

 Kent

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Eric Huang
> Sent: Tuesday, January 18, 2022 2:09 PM
> To: Alex Deucher <alexdeucher@gmail.com>
> Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>
> The SDMA fix is generic and not in a specific version of FW, so we don't
> have to check.
>
> Thanks,
> Eric
>
> On 2022-01-18 11:35, Alex Deucher wrote:
> > On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
> >> SDMA FW fixes the hang issue for adding heavy-weight TLB
> >> flush on Arcturus, so we can enable it.
> > Do we need to check for a specific fw version?
> >
> > Alex
> >
> >> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
> >>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
> >>   2 files changed, 7 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> index a64cbbd943ba..7b24a920c12e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >>                                  true);
> >>          ret = unreserve_bo_and_vms(&ctx, false, false);
> >>
> >> -       /* Only apply no TLB flush on Aldebaran to
> >> -        * workaround regressions on other Asics.
> >> +       /* Only apply no TLB flush on Aldebaran and Arcturus
> >> +        * to workaround regressions on other Asics.
> >>           */
> >> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> >> +       if (table_freed &&
> >> +           (adev->asic_type != CHIP_ALDEBARAN) &&
> >> +           (adev->asic_type != CHIP_ARCTURUS))
> >>                  *table_freed = true;
> >>
> >>          goto out;
> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >> index b570c0454ce9..ef4d676cc71c 100644
> >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file
> *filep,
> >>          }
> >>          mutex_unlock(&p->mutex);
> >>
> >> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> >> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> >> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
> >>                  err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> >>                                  (struct kgd_mem *) mem, true);
> >>                  if (err) {
> >> --
> >> 2.25.1
> >>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 19:27     ` Russell, Kent
@ 2022-01-18 19:35       ` Alex Deucher
  2022-01-18 20:13         ` Eric Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Deucher @ 2022-01-18 19:35 UTC (permalink / raw)
  To: Russell, Kent; +Cc: Huang, JinHuiEric, amd-gfx list

On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent <Kent.Russell@amd.com> wrote:
>
> [AMD Official Use Only]
>
> I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that SDMA version is >=18 before enabling it? Or am I misunderstanding the fix?

Yes, that was my concern.

Alex

>
>  Kent
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Eric Huang
> > Sent: Tuesday, January 18, 2022 2:09 PM
> > To: Alex Deucher <alexdeucher@gmail.com>
> > Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
> > Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
> >
> > The SDMA fix is generic and not in a specific version of FW, so we don't
> > have to check.
> >
> > Thanks,
> > Eric
> >
> > On 2022-01-18 11:35, Alex Deucher wrote:
> > > On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
> > >> SDMA FW fixes the hang issue for adding heavy-weight TLB
> > >> flush on Arcturus, so we can enable it.
> > > Do we need to check for a specific fw version?
> > >
> > > Alex
> > >
> > >> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> > >> ---
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
> > >>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
> > >>   2 files changed, 7 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> index a64cbbd943ba..7b24a920c12e 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> > >>                                  true);
> > >>          ret = unreserve_bo_and_vms(&ctx, false, false);
> > >>
> > >> -       /* Only apply no TLB flush on Aldebaran to
> > >> -        * workaround regressions on other Asics.
> > >> +       /* Only apply no TLB flush on Aldebaran and Arcturus
> > >> +        * to workaround regressions on other Asics.
> > >>           */
> > >> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> > >> +       if (table_freed &&
> > >> +           (adev->asic_type != CHIP_ALDEBARAN) &&
> > >> +           (adev->asic_type != CHIP_ARCTURUS))
> > >>                  *table_freed = true;
> > >>
> > >>          goto out;
> > >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > >> index b570c0454ce9..ef4d676cc71c 100644
> > >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > >> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file
> > *filep,
> > >>          }
> > >>          mutex_unlock(&p->mutex);
> > >>
> > >> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> > >> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> > >> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
> > >>                  err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> > >>                                  (struct kgd_mem *) mem, true);
> > >>                  if (err) {
> > >> --
> > >> 2.25.1
> > >>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 19:35       ` Alex Deucher
@ 2022-01-18 20:13         ` Eric Huang
  2022-01-18 20:15           ` Alex Deucher
                             ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Eric Huang @ 2022-01-18 20:13 UTC (permalink / raw)
  To: Alex Deucher, Russell, Kent; +Cc: amd-gfx list

I understand Alex's concern. I think usually we only check the version 
when a feature is only available in a specific version, and other 
version or newer version doesn't have.

In case of FW fix, we assume the driver and FWs have to be synchronous. 
If we have driver backward compatibility for FWs, there must be a lot of 
redundant codes for FW version check. So this patch and SDMA fix will be 
pushed into ROCm 5.1 release branch at the same time.

Regards,
Eric

On 2022-01-18 14:35, Alex Deucher wrote:
> On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent <Kent.Russell@amd.com> wrote:
>> [AMD Official Use Only]
>>
>> I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that SDMA version is >=18 before enabling it? Or am I misunderstanding the fix?
> Yes, that was my concern.
>
> Alex
>
>>   Kent
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Eric Huang
>>> Sent: Tuesday, January 18, 2022 2:09 PM
>>> To: Alex Deucher <alexdeucher@gmail.com>
>>> Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
>>> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>>>
>>> The SDMA fix is generic and not in a specific version of FW, so we don't
>>> have to check.
>>>
>>> Thanks,
>>> Eric
>>>
>>> On 2022-01-18 11:35, Alex Deucher wrote:
>>>> On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
>>>>> SDMA FW fixes the hang issue for adding heavy-weight TLB
>>>>> flush on Arcturus, so we can enable it.
>>>> Do we need to check for a specific fw version?
>>>>
>>>> Alex
>>>>
>>>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
>>>>>    drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
>>>>>    2 files changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> index a64cbbd943ba..7b24a920c12e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>>>>>                                   true);
>>>>>           ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>
>>>>> -       /* Only apply no TLB flush on Aldebaran to
>>>>> -        * workaround regressions on other Asics.
>>>>> +       /* Only apply no TLB flush on Aldebaran and Arcturus
>>>>> +        * to workaround regressions on other Asics.
>>>>>            */
>>>>> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
>>>>> +       if (table_freed &&
>>>>> +           (adev->asic_type != CHIP_ALDEBARAN) &&
>>>>> +           (adev->asic_type != CHIP_ARCTURUS))
>>>>>                   *table_freed = true;
>>>>>
>>>>>           goto out;
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> index b570c0454ce9..ef4d676cc71c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file
>>> *filep,
>>>>>           }
>>>>>           mutex_unlock(&p->mutex);
>>>>>
>>>>> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
>>>>> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
>>>>> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
>>>>>                   err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>>>>>                                   (struct kgd_mem *) mem, true);
>>>>>                   if (err) {
>>>>> --
>>>>> 2.25.1
>>>>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 20:13         ` Eric Huang
@ 2022-01-18 20:15           ` Alex Deucher
  2022-01-18 21:02           ` Dave Airlie
  2022-01-19  7:31           ` Christian König
  2 siblings, 0 replies; 20+ messages in thread
From: Alex Deucher @ 2022-01-18 20:15 UTC (permalink / raw)
  To: Eric Huang; +Cc: Russell, Kent, amd-gfx list

On Tue, Jan 18, 2022 at 3:13 PM Eric Huang <jinhuieric.huang@amd.com> wrote:
>
> I understand Alex's concern. I think usually we only check the version
> when a feature is only available in a specific version, and other
> version or newer version doesn't have.
>
> In case of FW fix, we assume the driver and FWs have to be synchronous.
> If we have driver backward compatibility for FWs, there must be a lot of
> redundant codes for FW version check. So this patch and SDMA fix will be
> pushed into ROCm 5.1 release branch at the same time.

We need to check.  The firmwares are not distributed in lock step with
the driver.

Alex


>
> Regards,
> Eric
>
> On 2022-01-18 14:35, Alex Deucher wrote:
> > On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent <Kent.Russell@amd.com> wrote:
> >> [AMD Official Use Only]
> >>
> >> I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that SDMA version is >=18 before enabling it? Or am I misunderstanding the fix?
> > Yes, that was my concern.
> >
> > Alex
> >
> >>   Kent
> >>
> >>> -----Original Message-----
> >>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Eric Huang
> >>> Sent: Tuesday, January 18, 2022 2:09 PM
> >>> To: Alex Deucher <alexdeucher@gmail.com>
> >>> Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
> >>> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
> >>>
> >>> The SDMA fix is generic and not in a specific version of FW, so we don't
> >>> have to check.
> >>>
> >>> Thanks,
> >>> Eric
> >>>
> >>> On 2022-01-18 11:35, Alex Deucher wrote:
> >>>> On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang@amd.com> wrote:
> >>>>> SDMA FW fixes the hang issue for adding heavy-weight TLB
> >>>>> flush on Arcturus, so we can enable it.
> >>>> Do we need to check for a specific fw version?
> >>>>
> >>>> Alex
> >>>>
> >>>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> >>>>> ---
> >>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
> >>>>>    drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
> >>>>>    2 files changed, 7 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>>> index a64cbbd943ba..7b24a920c12e 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>>> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >>>>>                                   true);
> >>>>>           ret = unreserve_bo_and_vms(&ctx, false, false);
> >>>>>
> >>>>> -       /* Only apply no TLB flush on Aldebaran to
> >>>>> -        * workaround regressions on other Asics.
> >>>>> +       /* Only apply no TLB flush on Aldebaran and Arcturus
> >>>>> +        * to workaround regressions on other Asics.
> >>>>>            */
> >>>>> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> >>>>> +       if (table_freed &&
> >>>>> +           (adev->asic_type != CHIP_ALDEBARAN) &&
> >>>>> +           (adev->asic_type != CHIP_ARCTURUS))
> >>>>>                   *table_freed = true;
> >>>>>
> >>>>>           goto out;
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>>> index b570c0454ce9..ef4d676cc71c 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>>> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file
> >>> *filep,
> >>>>>           }
> >>>>>           mutex_unlock(&p->mutex);
> >>>>>
> >>>>> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> >>>>> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> >>>>> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
> >>>>>                   err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> >>>>>                                   (struct kgd_mem *) mem, true);
> >>>>>                   if (err) {
> >>>>> --
> >>>>> 2.25.1
> >>>>>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 20:13         ` Eric Huang
  2022-01-18 20:15           ` Alex Deucher
@ 2022-01-18 21:02           ` Dave Airlie
  2022-01-19  7:31           ` Christian König
  2 siblings, 0 replies; 20+ messages in thread
From: Dave Airlie @ 2022-01-18 21:02 UTC (permalink / raw)
  To: Eric Huang; +Cc: Alex Deucher, Russell, Kent, amd-gfx list

On Wed, 19 Jan 2022 at 06:14, Eric Huang <jinhuieric.huang@amd.com> wrote:
>
> I understand Alex's concern. I think usually we only check the version
> when a feature is only available in a specific version, and other
> version or newer version doesn't have.
>
> In case of FW fix, we assume the driver and FWs have to be synchronous.
> If we have driver backward compatibility for FWs, there must be a lot of
> redundant codes for FW version check. So this patch and SDMA fix will be
> pushed into ROCm 5.1 release branch at the same time.
>

Please remove that assumption from everwhere.

If you have released a fw into linux-firmware then you need to support
it going forward, even if it means printing something in dmesg
recommending people upgrade for features.

Dave.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 20:13         ` Eric Huang
  2022-01-18 20:15           ` Alex Deucher
  2022-01-18 21:02           ` Dave Airlie
@ 2022-01-19  7:31           ` Christian König
  2 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2022-01-19  7:31 UTC (permalink / raw)
  To: Eric Huang, Alex Deucher, Russell, Kent; +Cc: amd-gfx list

Am 18.01.22 um 21:13 schrieb Eric Huang:
> I understand Alex's concern. I think usually we only check the version 
> when a feature is only available in a specific version, and other 
> version or newer version doesn't have.
>
> In case of FW fix, we assume the driver and FWs have to be 
> synchronous. If we have driver backward compatibility for FWs, there 
> must be a lot of redundant codes for FW version check. So this patch 
> and SDMA fix will be pushed into ROCm 5.1 release branch at the same 
> time.

Others replied as well, but just to make it clear this approach is an 
absolutely clear NAK.

Driver backward compatibility is a documented mandatory feature.

So if you already pushed this to ROCm 5.1 then please revert that ASAP.

Regards,
Christian.

>
> Regards,
> Eric


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-19 15:00         ` Eric Huang
@ 2022-01-19 15:03           ` Russell, Kent
  0 siblings, 0 replies; 20+ messages in thread
From: Russell, Kent @ 2022-01-19 15:03 UTC (permalink / raw)
  To: Huang, JinHuiEric, Kuehling, Felix, amd-gfx

[AMD Official Use Only]

> -----Original Message-----
> From: Huang, JinHuiEric <JinHuiEric.Huang@amd.com>
> Sent: Wednesday, January 19, 2022 10:01 AM
> To: Russell, Kent <Kent.Russell@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>
>
>
> On 2022-01-19 09:50, Russell, Kent wrote:
> > [AMD Official Use Only]
> >
> >> -----Original Message-----
> >> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> >> Sent: Tuesday, January 18, 2022 7:16 PM
> >> To: Russell, Kent <Kent.Russell@amd.com>; Huang, JinHuiEric
> >> <JinHuiEric.Huang@amd.com>; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
> >>
> >> Am 2022-01-18 um 7:08 p.m. schrieb Russell, Kent:
> >>> One question inline
> >>>
> >>>
> >>> *KENT RUSSELL***
> >>>
> >>> Sr. Software Engineer | Linux Compute Kernel
> >>>
> >>> 1 Commerce Valley Drive East
> >>>
> >>> Markham, ON L3T 7X6
> >>>
> >>> *O*+(1) 289-695-2122**| Ext 72122
> >>>
> >>>
> >>> ------------------------------------------------------------------------
> >>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
> >>> Felix Kuehling <felix.kuehling@amd.com>
> >>> *Sent:* Tuesday, January 18, 2022 6:36 PM
> >>> *To:* Huang, JinHuiEric <JinHuiEric.Huang@amd.com>;
> >>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> >>> *Subject:* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on
> >>> Arcturus
> >>>
> >>> Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
> >>>> SDMA FW fixes the hang issue for adding heavy-weight TLB
> >>>> flush on Arcturus, so we can enable it.
> >>>>
> >>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> >>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >>>
> >>>
> >>>> ---
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
> >>>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
> >>>>   2 files changed, 8 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> index a64cbbd943ba..acb4fd973e60 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >>>>                                 true);
> >>>>         ret = unreserve_bo_and_vms(&ctx, false, false);
> >>>>
> >>>> -     /* Only apply no TLB flush on Aldebaran to
> >>>> -      * workaround regressions on other Asics.
> >>>> -      */
> >>>> -     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> >>>> -             *table_freed = true;
> >>>> -
> >>>>         goto out;
> >>>>
> >>>>   out_unreserve:
> >>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>> index b570c0454ce9..485d4c52c7de 100644
> >>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> >>>> @@ -1596,6 +1596,12 @@ static int
> >>> kfd_ioctl_free_memory_of_gpu(struct file *filep,
> >>>>         return ret;
> >>>>   }
> >>>>
> >>>> +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
> >>>> +     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)
> >>> Do we need to add a check for sdma ver >=8 here?
> >> What's the significance of version 8 for Aldebaran? This code was
> >> working on Aldebaran without a version check before. Did we ever
> >> publicly release an SDMA firmware older than version 8 that for Aldebaran?
> > We released v5 for Aldebaran SDMA in ROCm 4.5 . If I remember the ticket correctly, the
> same fix for Arcturus was required for Aldebaran and was part of SDMA v8. But Eric is
> obviously watching the ticket more closely than I, so I'll defer to him there.
> Yes. Aldebaran has the same bug as Arcturus in SDMA, but the bug doesn't
> cause GPU hang on Aldebaran. As Felix said heavy-weight TLB flush have
> been working on Aldebaran since it was enabled, so we don't need to
> check the version for it.

Ah perfect, thanks for the clarification!

 Kent

>
> Regards,
> Eric
> >
> >   Kent
> >
> >> Regards,
> >>    Felix
> >>
> >>
> >>> ||
> >>>> +            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> >>>> +             dev->adev->sdma.instance[0].fw_version >= 18);
> >>>> +}
> >>>> +
> >>>>   static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
> >>>>                                         struct kfd_process *p, void
> >>> *data)
> >>>>   {
> >>>> @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct
> >>> file *filep,
> >>>>         }
> >>>>
> >>>>         /* Flush TLBs after waiting for the page table updates to
> >>> complete */
> >>>> -     if (table_freed) {
> >>>> +     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
> >>>>                 for (i = 0; i < args->n_devices; i++) {
> >>>>                         peer = kfd_device_by_id(devices_arr[i]);
> >>>>                         if (WARN_ON_ONCE(!peer))
> >>>> @@ -1806,7 +1812,7 @@ static int
> >>> kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
> >>>>         }
> >>>>         mutex_unlock(&p->mutex);
> >>>>
> >>>> -     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> >>>> +     if (kfd_flush_tlb_after_unmap(dev)) {
> >>>>                 err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> >>>>                                 (struct kgd_mem *) mem, true);
> >>>>                 if (err) {


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-19 14:50       ` Russell, Kent
@ 2022-01-19 15:00         ` Eric Huang
  2022-01-19 15:03           ` Russell, Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Huang @ 2022-01-19 15:00 UTC (permalink / raw)
  To: Russell, Kent, Kuehling, Felix, amd-gfx



On 2022-01-19 09:50, Russell, Kent wrote:
> [AMD Official Use Only]
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling@amd.com>
>> Sent: Tuesday, January 18, 2022 7:16 PM
>> To: Russell, Kent <Kent.Russell@amd.com>; Huang, JinHuiEric
>> <JinHuiEric.Huang@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>>
>> Am 2022-01-18 um 7:08 p.m. schrieb Russell, Kent:
>>> One question inline
>>>
>>>
>>> *KENT RUSSELL***
>>>
>>> Sr. Software Engineer | Linux Compute Kernel
>>>
>>> 1 Commerce Valley Drive East
>>>
>>> Markham, ON L3T 7X6
>>>
>>> *O*+(1) 289-695-2122**| Ext 72122
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
>>> Felix Kuehling <felix.kuehling@amd.com>
>>> *Sent:* Tuesday, January 18, 2022 6:36 PM
>>> *To:* Huang, JinHuiEric <JinHuiEric.Huang@amd.com>;
>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>> *Subject:* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on
>>> Arcturus
>>>
>>> Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
>>>> SDMA FW fixes the hang issue for adding heavy-weight TLB
>>>> flush on Arcturus, so we can enable it.
>>>>
>>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>
>>>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
>>>>   2 files changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index a64cbbd943ba..acb4fd973e60 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>>>>                                 true);
>>>>         ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>
>>>> -     /* Only apply no TLB flush on Aldebaran to
>>>> -      * workaround regressions on other Asics.
>>>> -      */
>>>> -     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
>>>> -             *table_freed = true;
>>>> -
>>>>         goto out;
>>>>
>>>>   out_unreserve:
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> index b570c0454ce9..485d4c52c7de 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>> @@ -1596,6 +1596,12 @@ static int
>>> kfd_ioctl_free_memory_of_gpu(struct file *filep,
>>>>         return ret;
>>>>   }
>>>>
>>>> +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
>>>> +     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)
>>> Do we need to add a check for sdma ver >=8 here?
>> What's the significance of version 8 for Aldebaran? This code was
>> working on Aldebaran without a version check before. Did we ever
>> publicly release an SDMA firmware older than version 8 that for Aldebaran?
> We released v5 for Aldebaran SDMA in ROCm 4.5 . If I remember the ticket correctly, the same fix for Arcturus was required for Aldebaran and was part of SDMA v8. But Eric is obviously watching the ticket more closely than I, so I'll defer to him there.
Yes. Aldebaran has the same bug as Arcturus in SDMA, but the bug doesn't 
cause GPU hang on Aldebaran. As Felix said heavy-weight TLB flush have 
been working on Aldebaran since it was enabled, so we don't need to 
check the version for it.

Regards,
Eric
>
>   Kent
>
>> Regards,
>>    Felix
>>
>>
>>> ||
>>>> +            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
>>>> +             dev->adev->sdma.instance[0].fw_version >= 18);
>>>> +}
>>>> +
>>>>   static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>>>>                                         struct kfd_process *p, void
>>> *data)
>>>>   {
>>>> @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct
>>> file *filep,
>>>>         }
>>>>
>>>>         /* Flush TLBs after waiting for the page table updates to
>>> complete */
>>>> -     if (table_freed) {
>>>> +     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
>>>>                 for (i = 0; i < args->n_devices; i++) {
>>>>                         peer = kfd_device_by_id(devices_arr[i]);
>>>>                         if (WARN_ON_ONCE(!peer))
>>>> @@ -1806,7 +1812,7 @@ static int
>>> kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>>>>         }
>>>>         mutex_unlock(&p->mutex);
>>>>
>>>> -     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
>>>> +     if (kfd_flush_tlb_after_unmap(dev)) {
>>>>                 err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>>>>                                 (struct kgd_mem *) mem, true);
>>>>                 if (err) {


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-19  0:16     ` Felix Kuehling
@ 2022-01-19 14:50       ` Russell, Kent
  2022-01-19 15:00         ` Eric Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Russell, Kent @ 2022-01-19 14:50 UTC (permalink / raw)
  To: Kuehling, Felix, Huang, JinHuiEric, amd-gfx

[AMD Official Use Only]

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Tuesday, January 18, 2022 7:16 PM
> To: Russell, Kent <Kent.Russell@amd.com>; Huang, JinHuiEric
> <JinHuiEric.Huang@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>
> Am 2022-01-18 um 7:08 p.m. schrieb Russell, Kent:
> > One question inline
> >
> >
> > *KENT RUSSELL***
> >
> > Sr. Software Engineer | Linux Compute Kernel
> >
> > 1 Commerce Valley Drive East
> >
> > Markham, ON L3T 7X6
> >
> > *O*+(1) 289-695-2122**| Ext 72122
> >
> >
> > ------------------------------------------------------------------------
> > *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
> > Felix Kuehling <felix.kuehling@amd.com>
> > *Sent:* Tuesday, January 18, 2022 6:36 PM
> > *To:* Huang, JinHuiEric <JinHuiEric.Huang@amd.com>;
> > amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> > *Subject:* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on
> > Arcturus
> >
> > Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
> > > SDMA FW fixes the hang issue for adding heavy-weight TLB
> > > flush on Arcturus, so we can enable it.
> > >
> > > Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> >
> > Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
> > >  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > index a64cbbd943ba..acb4fd973e60 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> > >                                true);
> > >        ret = unreserve_bo_and_vms(&ctx, false, false);
> > >
> > > -     /* Only apply no TLB flush on Aldebaran to
> > > -      * workaround regressions on other Asics.
> > > -      */
> > > -     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> > > -             *table_freed = true;
> > > -
> > >        goto out;
> > >
> > >  out_unreserve:
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > index b570c0454ce9..485d4c52c7de 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > @@ -1596,6 +1596,12 @@ static int
> > kfd_ioctl_free_memory_of_gpu(struct file *filep,
> > >        return ret;
> > >  }
> > >
> > > +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
> > > +     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)
> >
> > Do we need to add a check for sdma ver >=8 here?
>
> What's the significance of version 8 for Aldebaran? This code was
> working on Aldebaran without a version check before. Did we ever
> publicly release an SDMA firmware older than version 8 that for Aldebaran?

We released v5 for Aldebaran SDMA in ROCm 4.5 . If I remember the ticket correctly, the same fix for Arcturus was required for Aldebaran and was part of SDMA v8. But Eric is obviously watching the ticket more closely than I, so I'll defer to him there.

 Kent

>
> Regards,
>   Felix
>
>
> >
> > ||
> > > +            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> > > +             dev->adev->sdma.instance[0].fw_version >= 18);
> > > +}
> > > +
> > >  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
> > >                                        struct kfd_process *p, void
> > *data)
> > >  {
> > > @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct
> > file *filep,
> > >        }
> > >
> > >        /* Flush TLBs after waiting for the page table updates to
> > complete */
> > > -     if (table_freed) {
> > > +     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
> > >                for (i = 0; i < args->n_devices; i++) {
> > >                        peer = kfd_device_by_id(devices_arr[i]);
> > >                        if (WARN_ON_ONCE(!peer))
> > > @@ -1806,7 +1812,7 @@ static int
> > kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
> > >        }
> > >        mutex_unlock(&p->mutex);
> > >
> > > -     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> > > +     if (kfd_flush_tlb_after_unmap(dev)) {
> > >                err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> > >                                (struct kgd_mem *) mem, true);
> > >                if (err) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-19  0:08   ` Russell, Kent
@ 2022-01-19  0:16     ` Felix Kuehling
  2022-01-19 14:50       ` Russell, Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Kuehling @ 2022-01-19  0:16 UTC (permalink / raw)
  To: Russell, Kent, Huang, JinHuiEric, amd-gfx

Am 2022-01-18 um 7:08 p.m. schrieb Russell, Kent:
> One question inline
>
>
> *KENT RUSSELL***  
>
> Sr. Software Engineer | Linux Compute Kernel
>
> 1 Commerce Valley Drive East
>
> Markham, ON L3T 7X6
>
> *O*+(1) 289-695-2122**| Ext 72122
>
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
> Felix Kuehling <felix.kuehling@amd.com>
> *Sent:* Tuesday, January 18, 2022 6:36 PM
> *To:* Huang, JinHuiEric <JinHuiEric.Huang@amd.com>;
> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on
> Arcturus
>  
> Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
> > SDMA FW fixes the hang issue for adding heavy-weight TLB
> > flush on Arcturus, so we can enable it.
> >
> > Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
> >  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index a64cbbd943ba..acb4fd973e60 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> >                                true);
> >        ret = unreserve_bo_and_vms(&ctx, false, false);
> > 
> > -     /* Only apply no TLB flush on Aldebaran to
> > -      * workaround regressions on other Asics.
> > -      */
> > -     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> > -             *table_freed = true;
> > -
> >        goto out;
> > 
> >  out_unreserve:
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index b570c0454ce9..485d4c52c7de 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -1596,6 +1596,12 @@ static int
> kfd_ioctl_free_memory_of_gpu(struct file *filep,
> >        return ret;
> >  }
> > 
> > +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
> > +     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)
>
> Do we need to add a check for sdma ver >=8 here?

What's the significance of version 8 for Aldebaran? This code was
working on Aldebaran without a version check before. Did we ever
publicly release an SDMA firmware older than version 8 that for Aldebaran?

Regards,
  Felix


>  
> ||
> > +            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> > +             dev->adev->sdma.instance[0].fw_version >= 18);
> > +}
> > +
> >  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
> >                                        struct kfd_process *p, void
> *data)
> >  {
> > @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct
> file *filep,
> >        }
> > 
> >        /* Flush TLBs after waiting for the page table updates to
> complete */
> > -     if (table_freed) {
> > +     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
> >                for (i = 0; i < args->n_devices; i++) {
> >                        peer = kfd_device_by_id(devices_arr[i]);
> >                        if (WARN_ON_ONCE(!peer))
> > @@ -1806,7 +1812,7 @@ static int
> kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
> >        }
> >        mutex_unlock(&p->mutex);
> > 
> > -     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> > +     if (kfd_flush_tlb_after_unmap(dev)) {
> >                err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
> >                                (struct kgd_mem *) mem, true);
> >                if (err) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 23:36 ` Felix Kuehling
@ 2022-01-19  0:08   ` Russell, Kent
  2022-01-19  0:16     ` Felix Kuehling
  0 siblings, 1 reply; 20+ messages in thread
From: Russell, Kent @ 2022-01-19  0:08 UTC (permalink / raw)
  To: Kuehling, Felix, Huang, JinHuiEric, amd-gfx

[-- Attachment #1: Type: text/plain, Size: 3401 bytes --]

One question inline


KENT RUSSELL
Sr. Software Engineer | Linux Compute Kernel
1 Commerce Valley Drive East
Markham, ON L3T 7X6
O +(1) 289-695-2122 | Ext 72122

________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Felix Kuehling <felix.kuehling@amd.com>
Sent: Tuesday, January 18, 2022 6:36 PM
To: Huang, JinHuiEric <JinHuiEric.Huang@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.
>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..acb4fd973e60 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>                                true);
>        ret = unreserve_bo_and_vms(&ctx, false, false);
>
> -     /* Only apply no TLB flush on Aldebaran to
> -      * workaround regressions on other Asics.
> -      */
> -     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> -             *table_freed = true;
> -
>        goto out;
>
>  out_unreserve:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..485d4c52c7de 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
>        return ret;
>  }
>
> +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
> +     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)

Do we need to add a check for sdma ver >=8 here?

||
> +            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> +             dev->adev->sdma.instance[0].fw_version >= 18);
> +}
> +
>  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>                                        struct kfd_process *p, void *data)
>  {
> @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>        }
>
>        /* Flush TLBs after waiting for the page table updates to complete */
> -     if (table_freed) {
> +     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
>                for (i = 0; i < args->n_devices; i++) {
>                        peer = kfd_device_by_id(devices_arr[i]);
>                        if (WARN_ON_ONCE(!peer))
> @@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>        }
>        mutex_unlock(&p->mutex);
>
> -     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +     if (kfd_flush_tlb_after_unmap(dev)) {
>                err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>                                (struct kgd_mem *) mem, true);
>                if (err) {

[-- Attachment #2: Type: text/html, Size: 7840 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 22:45 Eric Huang
@ 2022-01-18 23:36 ` Felix Kuehling
  2022-01-19  0:08   ` Russell, Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Kuehling @ 2022-01-18 23:36 UTC (permalink / raw)
  To: Eric Huang, amd-gfx

Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.
>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..acb4fd973e60 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>  				true);
>  	ret = unreserve_bo_and_vms(&ctx, false, false);
>  
> -	/* Only apply no TLB flush on Aldebaran to
> -	 * workaround regressions on other Asics.
> -	 */
> -	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> -		*table_freed = true;
> -
>  	goto out;
>  
>  out_unreserve:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..485d4c52c7de 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
>  	return ret;
>  }
>  
> +static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
> +	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> +	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> +	        dev->adev->sdma.instance[0].fw_version >= 18);
> +}
> +
>  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  					struct kfd_process *p, void *data)
>  {
> @@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  	}
>  
>  	/* Flush TLBs after waiting for the page table updates to complete */
> -	if (table_freed) {
> +	if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
>  		for (i = 0; i < args->n_devices; i++) {
>  			peer = kfd_device_by_id(devices_arr[i]);
>  			if (WARN_ON_ONCE(!peer))
> @@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	}
>  	mutex_unlock(&p->mutex);
>  
> -	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +	if (kfd_flush_tlb_after_unmap(dev)) {
>  		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>  				(struct kgd_mem *) mem, true);
>  		if (err) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
@ 2022-01-18 22:45 Eric Huang
  2022-01-18 23:36 ` Felix Kuehling
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Huang @ 2022-01-18 22:45 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
-	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
-		*table_freed = true;
-
 	goto out;
 
 out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	return ret;
 }
 
+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
+	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+	        dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 					struct kfd_process *p, void *data)
 {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	}
 
 	/* Flush TLBs after waiting for the page table updates to complete */
-	if (table_freed) {
+	if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
 		for (i = 0; i < args->n_devices; i++) {
 			peer = kfd_device_by_id(devices_arr[i]);
 			if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (kfd_flush_tlb_after_unmap(dev)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 21:28 Eric Huang
  2022-01-18 21:49 ` Alex Deucher
@ 2022-01-18 21:50 ` Felix Kuehling
  1 sibling, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-01-18 21:50 UTC (permalink / raw)
  To: Eric Huang, amd-gfx

Am 2022-01-18 um 4:28 p.m. schrieb Eric Huang:
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.
>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 4 +++-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..f1fed0fc31d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>  				true);
>  	ret = unreserve_bo_and_vms(&ctx, false, false);
>  
> -	/* Only apply no TLB flush on Aldebaran to
> -	 * workaround regressions on other Asics.
> +	/* Only apply no TLB flush on Aldebaran and Arcturus
> +	 * to workaround regressions on other Asics.
>  	 */
> -	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> +	if (table_freed &&
> +	    (adev->asic_type != CHIP_ALDEBARAN) &&
> +	    (adev->asic_type != CHIP_ARCTURUS ||
> +	     adev->sdma.instance[0].fw_version < 18))
>  		*table_freed = true;

Can we move this check into the caller in kfd_chardev.c? That avoids
spreading around these conditions in several places.


>  
>  	goto out;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..0e4a76dca809 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	}
>  	mutex_unlock(&p->mutex);
>  
> -	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> +	    (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> +	     dev->adev->sdma.instance[0].fw_version >= 18)) {

Maybe put this into a helper function that can be used here and in
kfd_ioctl_map_memory_to_gpu. I also saw this being duplicated in the
upcoming CRIU patches. And we may want to adopt this in the SVM code as
well. Having one common helper makes sure we'll keep the TLB flushing
strategy consistent everywhere. Something like:

    bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
    	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
    	      (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
    	       dev->adev->sdma.instance[0].fw_version >= 18);
    }

Regards,
  Felix


>  		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>  				(struct kgd_mem *) mem, true);
>  		if (err) {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
  2022-01-18 21:28 Eric Huang
@ 2022-01-18 21:49 ` Alex Deucher
  2022-01-18 21:50 ` Felix Kuehling
  1 sibling, 0 replies; 20+ messages in thread
From: Alex Deucher @ 2022-01-18 21:49 UTC (permalink / raw)
  To: Eric Huang; +Cc: amd-gfx list

On Tue, Jan 18, 2022 at 4:28 PM Eric Huang <jinhuieric.huang@amd.com> wrote:
>
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.
>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>

Acked-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 4 +++-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..f1fed0fc31d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>                                 true);
>         ret = unreserve_bo_and_vms(&ctx, false, false);
>
> -       /* Only apply no TLB flush on Aldebaran to
> -        * workaround regressions on other Asics.
> +       /* Only apply no TLB flush on Aldebaran and Arcturus
> +        * to workaround regressions on other Asics.
>          */
> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> +       if (table_freed &&
> +           (adev->asic_type != CHIP_ALDEBARAN) &&
> +           (adev->asic_type != CHIP_ARCTURUS ||
> +            adev->sdma.instance[0].fw_version < 18))
>                 *table_freed = true;
>
>         goto out;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..0e4a76dca809 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>         }
>         mutex_unlock(&p->mutex);
>
> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> +           (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> +            dev->adev->sdma.instance[0].fw_version >= 18)) {
>                 err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>                                 (struct kgd_mem *) mem, true);
>                 if (err) {
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
@ 2022-01-18 21:28 Eric Huang
  2022-01-18 21:49 ` Alex Deucher
  2022-01-18 21:50 ` Felix Kuehling
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Huang @ 2022-01-18 21:28 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 4 +++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..f1fed0fc31d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
+	/* Only apply no TLB flush on Aldebaran and Arcturus
+	 * to workaround regressions on other Asics.
 	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+	if (table_freed &&
+	    (adev->asic_type != CHIP_ALDEBARAN) &&
+	    (adev->asic_type != CHIP_ARCTURUS ||
+	     adev->sdma.instance[0].fw_version < 18))
 		*table_freed = true;
 
 	goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..0e4a76dca809 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	    (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+	     dev->adev->sdma.instance[0].fw_version >= 18)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
@ 2022-01-18 16:04 Eric Huang
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Huang @ 2022-01-18 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..e1d90643731c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
-	/* Only apply no TLB flush on Aldebaran to
-	 * workaround regressions on other Asics.
+	/* Only apply no TLB flush on Aldebaran and Arcturus
+	 * to workaround regressions on other Asics.
 	 */
-	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
+	if (table_freed &&
+	    ((adev->asic_type != CHIP_ALDEBARAN) ||
+	     (adev->asic_type != CHIP_ARCTURUS))
 		*table_freed = true;
 
 	goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..ef4d676cc71c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 	mutex_unlock(&p->mutex);
 
-	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+	    KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
 		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
 				(struct kgd_mem *) mem, true);
 		if (err) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-01-19 15:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-18 16:15 [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus Eric Huang
2022-01-18 16:35 ` Alex Deucher
2022-01-18 19:09   ` Eric Huang
2022-01-18 19:27     ` Russell, Kent
2022-01-18 19:35       ` Alex Deucher
2022-01-18 20:13         ` Eric Huang
2022-01-18 20:15           ` Alex Deucher
2022-01-18 21:02           ` Dave Airlie
2022-01-19  7:31           ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2022-01-18 22:45 Eric Huang
2022-01-18 23:36 ` Felix Kuehling
2022-01-19  0:08   ` Russell, Kent
2022-01-19  0:16     ` Felix Kuehling
2022-01-19 14:50       ` Russell, Kent
2022-01-19 15:00         ` Eric Huang
2022-01-19 15:03           ` Russell, Kent
2022-01-18 21:28 Eric Huang
2022-01-18 21:49 ` Alex Deucher
2022-01-18 21:50 ` Felix Kuehling
2022-01-18 16:04 Eric Huang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.