* [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
@ 2022-06-08 11:51 Ramesh Errabolu
2022-06-08 19:39 ` Felix Kuehling
0 siblings, 1 reply; 5+ messages in thread
From: Ramesh Errabolu @ 2022-06-08 11:51 UTC (permalink / raw)
To: amd-gfx; +Cc: Ramesh Errabolu
In existing code MMIO and DOORBELL BOs are unpinned without ensuring the
condition that their map count has reached zero. Unpinning without checking
this constraint could lead to an error while BO is being freed. The patch
fixes this issue.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a1de900ba677..e5dc94b745b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
mutex_lock(&mem->lock);
- /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
- if (mem->alloc_flags &
- (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
- KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
- amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
- }
-
mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
is_imported = mem->is_imported;
mutex_unlock(&mem->lock);
@@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
/* Make sure restore workers don't access the BO any more */
bo_list_entry = &mem->validate_list;
mutex_lock(&process_info->lock);
- list_del(&bo_list_entry->head);
+ list_del_init(&bo_list_entry->head);
mutex_unlock(&process_info->lock);
/* No more MMU notifiers */
@@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
ret = unreserve_bo_and_vms(&ctx, false, false);
+ /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
+ if (mem->alloc_flags &
+ (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+ KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
+ amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
+
/* Free the sync object */
amdgpu_sync_free(&mem->sync);
--
2.35.1
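
The ordering problem the patch addresses can be modelled in userspace. The sketch below is only an illustration under stated assumptions: it assumes the free path bails out with -EBUSY while the map count is non-zero (the hunk context shows the count being read, but the early return is not part of this diff), and every demo_* name and the pin_count field are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a pinned buffer object. */
struct demo_bo {
	int pin_count;
};

/* Hypothetical stand-in for struct kgd_mem. */
struct demo_mem {
	struct demo_bo bo;
	int mapped_to_gpu_memory;
	int freed;
};

/* Unpin once; an unbalanced unpin is reported as an error. */
int demo_unpin(struct demo_bo *bo)
{
	if (bo->pin_count == 0)
		return -EINVAL;
	bo->pin_count--;
	return 0;
}

/* Old order: unpin first, check the map count afterwards. */
int demo_free_old(struct demo_mem *mem)
{
	int ret = demo_unpin(&mem->bo);

	if (mem->mapped_to_gpu_memory > 0)
		return -EBUSY;	/* BO stays alive but is already unpinned */
	mem->freed = 1;
	return ret;
}

/* New order: unpin only once the free can no longer be refused. */
int demo_free_new(struct demo_mem *mem)
{
	if (mem->mapped_to_gpu_memory > 0)
		return -EBUSY;	/* nothing has been torn down yet */
	mem->freed = 1;
	return demo_unpin(&mem->bo);
}
```

In this model a free attempted while the BO is still mapped leaves the old order with an unpinned BO and makes the retry hit an unbalanced unpin, while the new order leaves the pin untouched until the free actually proceeds.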
* Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
2022-06-08 11:51 [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero Ramesh Errabolu
@ 2022-06-08 19:39 ` Felix Kuehling
2022-06-08 20:03 ` Errabolu, Ramesh
0 siblings, 1 reply; 5+ messages in thread
From: Felix Kuehling @ 2022-06-08 19:39 UTC (permalink / raw)
To: amd-gfx, Errabolu, Ramesh
On 2022-06-08 07:51, Ramesh Errabolu wrote:
> In existing code MMIO and DOORBELL BOs are unpinned without ensuring the
> condition that their map count has reached zero. Unpinning without checking
> this constraint could lead to an error while BO is being freed. The patch
> fixes this issue.
>
> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a1de900ba677..e5dc94b745b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>
> mutex_lock(&mem->lock);
>
> - /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
> - if (mem->alloc_flags &
> - (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> - KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> - amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
> - }
> -
> mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
> is_imported = mem->is_imported;
> mutex_unlock(&mem->lock);
> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> /* Make sure restore workers don't access the BO any more */
> bo_list_entry = &mem->validate_list;
> mutex_lock(&process_info->lock);
> - list_del(&bo_list_entry->head);
> + list_del_init(&bo_list_entry->head);
Is this an unrelated fix? What is this needed for? I vaguely remember
discussing this before, but can't remember the reason.
Regards,
Felix
> mutex_unlock(&process_info->lock);
>
> /* No more MMU notifiers */
> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>
> ret = unreserve_bo_and_vms(&ctx, false, false);
>
> + /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
> + if (mem->alloc_flags &
> + (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> + KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
> + amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
> +
> /* Free the sync object */
> amdgpu_sync_free(&mem->sync);
>
* RE: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
2022-06-08 19:39 ` Felix Kuehling
@ 2022-06-08 20:03 ` Errabolu, Ramesh
2022-06-08 20:44 ` Felix Kuehling
0 siblings, 1 reply; 5+ messages in thread
From: Errabolu, Ramesh @ 2022-06-08 20:03 UTC (permalink / raw)
To: Kuehling, Felix, amd-gfx
[AMD Official Use Only - General]
My response is inline.
Regards,
Ramesh
-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Thursday, June 9, 2022 1:10 AM
To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh <Ramesh.Errabolu@amd.com>
Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
On 2022-06-08 07:51, Ramesh Errabolu wrote:
> In existing code MMIO and DOORBELL BOs are unpinned without ensuring
> the condition that their map count has reached zero. Unpinning without
> checking this constraint could lead to an error while BO is being
> freed. The patch fixes this issue.
>
> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a1de900ba677..e5dc94b745b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>
> mutex_lock(&mem->lock);
>
> - /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
> - if (mem->alloc_flags &
> - (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> - KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> - amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
> - }
> -
> mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
> is_imported = mem->is_imported;
> mutex_unlock(&mem->lock);
> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> /* Make sure restore workers don't access the BO any more */
> bo_list_entry = &mem->validate_list;
> mutex_lock(&process_info->lock);
> - list_del(&bo_list_entry->head);
> + list_del_init(&bo_list_entry->head);
Is this an unrelated fix? What is this needed for? I vaguely remember discussing this before, but can't remember the reason.
Ramesh: This fix is unrelated to the P2P work. I brought this issue to attention while working on IOMMU support on the DKMS branch. Basically, a user could call free() before the map count goes to zero. The patch is trying to fix that.
Regards,
Felix
> mutex_unlock(&process_info->lock);
>
> /* No more MMU notifiers */
> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>
> ret = unreserve_bo_and_vms(&ctx, false, false);
>
> + /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
> + if (mem->alloc_flags &
> + (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> + KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
> + amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
> +
> /* Free the sync object */
> amdgpu_sync_free(&mem->sync);
>
* Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
2022-06-08 20:03 ` Errabolu, Ramesh
@ 2022-06-08 20:44 ` Felix Kuehling
2022-06-09 14:44 ` Errabolu, Ramesh
0 siblings, 1 reply; 5+ messages in thread
From: Felix Kuehling @ 2022-06-08 20:44 UTC (permalink / raw)
To: Errabolu, Ramesh, amd-gfx
On 2022-06-08 16:03, Errabolu, Ramesh wrote:
> [AMD Official Use Only - General]
>
> My response is inline.
>
> Regards,
> Ramesh
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Thursday, June 9, 2022 1:10 AM
> To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh <Ramesh.Errabolu@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
>
>
> On 2022-06-08 07:51, Ramesh Errabolu wrote:
>> In existing code MMIO and DOORBELL BOs are unpinned without ensuring
>> the condition that their map count has reached zero. Unpinning without
>> checking this constraint could lead to an error while BO is being
>> freed. The patch fixes this issue.
>>
>> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a1de900ba677..e5dc94b745b1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>
>> mutex_lock(&mem->lock);
>>
>> - /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> - if (mem->alloc_flags &
>> - (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> - KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
>> - amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> - }
>> -
>> mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
>> is_imported = mem->is_imported;
>> mutex_unlock(&mem->lock);
>> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>> /* Make sure restore workers don't access the BO any more */
>> bo_list_entry = &mem->validate_list;
>> mutex_lock(&process_info->lock);
>> - list_del(&bo_list_entry->head);
>> + list_del_init(&bo_list_entry->head);
> Is this an unrelated fix? What is this needed for? I vaguely remember discussing this before, but can't remember the reason.
>
> Ramesh: This fix is unrelated to the P2P work. I brought this issue to attention while working on IOMMU support on the DKMS branch. Basically, a user could call free() before the map count goes to zero. The patch is trying to fix that.
I get that, but I couldn't remember why I suggested list_del_init here.
It has nothing to do with unpinning of BOs.
Now I recall that it had something to do with restarting the ioctl after
it was interrupted by a signal. reserve_bo_and_cond_vms can fail with
-ERESTARTSYS. In that case the ioctl is reentered. We need to make sure
it doesn't crash the second time around. list_del will remove
bo_list_entry from the list but leave the pointers dangling. The second
time around it will probably cause corruption or an oops. Using
list_del_init avoids that by reinitializing the entry so that its prev and
next pointers point back at the entry itself.
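
The difference between the two helpers can be sketched in userspace. This is a minimal model, not the kernel's list.h: the demo_* names and the poison values are hypothetical stand-ins. In the kernel, list_del() leaves poison values in the removed entry, while list_del_init() re-initializes the entry so that it points at itself, which is what makes a second removal harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Hypothetical poison values: deliberately invalid addresses. */
#define DEMO_POISON_NEXT ((struct demo_list_head *)0x100)
#define DEMO_POISON_PREV ((struct demo_list_head *)0x122)

void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
void demo_list_add(struct demo_list_head *entry, struct demo_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Splice the entry's neighbours together. */
void demo_unlink(struct demo_list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Models list_del(): unlink and poison; a second call would follow poison. */
void demo_list_del(struct demo_list_head *entry)
{
	demo_unlink(entry);
	entry->next = DEMO_POISON_NEXT;
	entry->prev = DEMO_POISON_PREV;
}

/* Models list_del_init(): unlink, then make the entry an empty list. */
void demo_list_del_init(struct demo_list_head *entry)
{
	demo_unlink(entry);
	demo_init_list_head(entry);
}

int demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}

/* Returns 1 if removing the same entry twice leaves both lists intact. */
int demo_double_del_init_is_safe(void)
{
	struct demo_list_head head, entry;

	demo_init_list_head(&head);
	demo_init_list_head(&entry);
	demo_list_add(&entry, &head);
	demo_list_del_init(&entry);
	demo_list_del_init(&entry);	/* reentry: entry only touches itself */
	return demo_list_empty(&head) && demo_list_empty(&entry);
}

/* Returns 1 if a plain delete leaves the poison values in the entry. */
int demo_del_leaves_poison(void)
{
	struct demo_list_head head, entry;

	demo_init_list_head(&head);
	demo_list_add(&entry, &head);
	demo_list_del(&entry);
	return entry.next == DEMO_POISON_NEXT &&
	       entry.prev == DEMO_POISON_PREV && demo_list_empty(&head);
}
```

After a plain delete the entry holds invalid addresses, so unlinking it again on a restarted ioctl would write through them; after the _init variant the second unlink only rewrites the entry's own pointers.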
See one more little fix below.
>
> Regards,
> Felix
>
>
>> mutex_unlock(&process_info->lock);
>>
>> /* No more MMU notifiers */
>> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>
>> ret = unreserve_bo_and_vms(&ctx, false, false);
This unreserve_bo_and_vms call cannot fail because the wait parameter is
false. If it did fail, the error handling would be broken. I'd add a
WARN_ONCE to make that assumption explicit, and change the return at the
end of this function to return 0. Basically, if we got this far, we are
not turning back, and we should return success.
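
A rough userspace sketch of that suggestion follows. Everything here is a hypothetical stand-in: demo_warn_once() only imitates the spirit of WARN_ONCE (it warns once globally rather than once per call site), and demo_unreserve() takes an injected return value so the "cannot happen" case can be exercised.

```c
#include <assert.h>
#include <stdio.h>

/* Counts emitted warnings so the behaviour can be checked. */
int demo_warnings;

/* Hypothetical stand-in for WARN_ONCE(): complain the first time only. */
int demo_warn_once(int cond, const char *msg)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		demo_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Hypothetical stub for the unreserve step; the result is injected. */
int demo_unreserve(int injected_ret)
{
	return injected_ret;
}

/* Tail of a free path that no longer turns back after this point. */
int demo_free_memory_tail(int injected_ret)
{
	int ret = demo_unreserve(injected_ret);

	/* Make the "cannot fail" assumption explicit instead of silent. */
	demo_warn_once(ret != 0, "unreserve failed unexpectedly");

	/* ... remaining teardown would run here regardless ... */

	return 0;	/* committed to freeing, so report success */
}
```

The point of the shape is that a late failure is surfaced loudly once but never propagated to the caller, since the teardown has already passed the point where an error return could be acted on.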
You could update the commit headline to be more general. Something like:
Fix error handling in amdgpu_amdkfd_gpuvm_free_memory_of_gpu.
Regards,
Felix
>>
>> + /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> + if (mem->alloc_flags &
>> + (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> + KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
>> + amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> +
>> /* Free the sync object */
>> amdgpu_sync_free(&mem->sync);
>>
* RE: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
2022-06-08 20:44 ` Felix Kuehling
@ 2022-06-09 14:44 ` Errabolu, Ramesh
0 siblings, 0 replies; 5+ messages in thread
From: Errabolu, Ramesh @ 2022-06-09 14:44 UTC (permalink / raw)
To: Kuehling, Felix, amd-gfx
[AMD Official Use Only - General]
My response is inline.
Regards,
Ramesh
-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Thursday, June 9, 2022 2:14 AM
To: Errabolu, Ramesh <Ramesh.Errabolu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
On 2022-06-08 16:03, Errabolu, Ramesh wrote:
> [AMD Official Use Only - General]
>
> My response is inline.
>
> Regards,
> Ramesh
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Thursday, June 9, 2022 1:10 AM
> To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh
> <Ramesh.Errabolu@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only
> after map count goes to zero
>
>
> On 2022-06-08 07:51, Ramesh Errabolu wrote:
>> In existing code MMIO and DOORBELL BOs are unpinned without ensuring
>> the condition that their map count has reached zero. Unpinning
>> without checking this constraint could lead to an error while BO is
>> being freed. The patch fixes this issue.
>>
>> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a1de900ba677..e5dc94b745b1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>
>> mutex_lock(&mem->lock);
>>
>> - /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> - if (mem->alloc_flags &
>> - (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> - KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
>> - amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> - }
>> -
>> mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
>> is_imported = mem->is_imported;
>> mutex_unlock(&mem->lock);
>> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>> /* Make sure restore workers don't access the BO any more */
>> bo_list_entry = &mem->validate_list;
>> mutex_lock(&process_info->lock);
>> - list_del(&bo_list_entry->head);
>> + list_del_init(&bo_list_entry->head);
> Is this an unrelated fix? What is this needed for? I vaguely remember discussing this before, but can't remember the reason.
>
> Ramesh: This fix is unrelated to the P2P work. I brought this issue to attention while working on IOMMU support on the DKMS branch. Basically, a user could call free() before the map count goes to zero. The patch is trying to fix that.
I get that, but I couldn't remember why I suggested list_del_init here.
It has nothing to do with unpinning of BOs.
Now I recall that it had something to do with restarting the ioctl after it was interrupted by a signal. reserve_bo_and_cond_vms can fail with -ERESTARTSYS. In that case the ioctl is reentered. We need to make sure it doesn't crash the second time around. list_del will remove bo_list_entry from the list but leave the pointers dangling. The second time around it will probably cause corruption or an oops. Using list_del_init avoids that by reinitializing the entry so that its prev and next pointers point back at the entry itself.
Ramesh: I see the same idiom in the method remove_kgd_mem_from_kfd_bo_list(). Should we call this method rather than rewriting the same code block? Also, the name remove_xyz_kfd_bo_list() is misleading. Should this name be changed?
See one more little fix below.
>
> Regards,
> Felix
>
>
>> mutex_unlock(&process_info->lock);
>>
>> /* No more MMU notifiers */
>> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>
>> ret = unreserve_bo_and_vms(&ctx, false, false);
This unreserve_bo_and_vms call cannot fail because the wait parameter is false. If it did fail, the error handling would be broken. I'd add a WARN_ONCE to make that assumption explicit, and change the return at the end of this function to return 0. Basically, if we got this far, we are not turning back, and we should return success.
You could update the commit headline to be more general. Something like:
Fix error handling in amdgpu_amdkfd_gpuvm_free_memory_of_gpu.
Regards,
Felix
>>
>> + /* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> + if (mem->alloc_flags &
>> + (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> + KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
>> + amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> +
>> /* Free the sync object */
>> amdgpu_sync_free(&mem->sync);
>>