From: "Errabolu, Ramesh" <Ramesh.Errabolu@amd.com>
To: "Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: RE: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
Date: Thu, 9 Jun 2022 14:44:20 +0000
Message-ID: <SN1PR12MB25751550E5E84F119F616EFBE3A79@SN1PR12MB2575.namprd12.prod.outlook.com>
In-Reply-To: <4eb71fb8-fff6-4686-03ba-877fb920770e@amd.com>

[AMD Official Use Only - General]

My response is inline.

Regards,
Ramesh

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com> 
Sent: Thursday, June 9, 2022 2:14 AM
To: Errabolu, Ramesh <Ramesh.Errabolu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero

On 2022-06-08 16:03, Errabolu, Ramesh wrote:
> [AMD Official Use Only - General]
>
> My response is inline.
>
> Regards,
> Ramesh
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Thursday, June 9, 2022 1:10 AM
> To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh 
> <Ramesh.Errabolu@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only 
> after map count goes to zero
>
>
> On 2022-06-08 07:51, Ramesh Errabolu wrote:
>> In existing code MMIO and DOORBELL BOs are unpinned without ensuring 
>> the condition that their map count has reached zero. Unpinning 
>> without checking this constraint could lead to an error while BO is 
>> being freed. The patch fixes this issue.
>>
>> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
>>    1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a1de900ba677..e5dc94b745b1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    
>>    	mutex_lock(&mem->lock);
>>    
>> -	/* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> -	if (mem->alloc_flags &
>> -	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> -	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
>> -		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> -	}
>> -
>>    	mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
>>    	is_imported = mem->is_imported;
>>    	mutex_unlock(&mem->lock);
>> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    	/* Make sure restore workers don't access the BO any more */
>>    	bo_list_entry = &mem->validate_list;
>>    	mutex_lock(&process_info->lock);
>> -	list_del(&bo_list_entry->head);
>> +	list_del_init(&bo_list_entry->head);
> Is this an unrelated fix? What is this needed for? I vaguely remember discussing this before, but can't remember the reason.
>
> Ramesh: This fix is unrelated to the P2P work. I raised this issue while working on IOMMU support on the DKMS branch. Basically, a user could call free() before the map count goes to zero. The patch is trying to fix that.

I get that, but I couldn't remember why I suggested list_del_init here. 
It has nothing to do with unpinning of BOs.

Now I recall that it had something to do with restarting the ioctl after it was interrupted by a signal. reserve_bo_and_cond_vms can fail with -ERESTARTSYS. In that case the ioctl is reentered. We need to make sure it doesn't crash the second time around. list_del will remove bo_list_entry from the list but leave its next and prev pointers poisoned (LIST_POISON1/LIST_POISON2), so they no longer refer to valid list nodes. The second time around it will probably cause corruption or an oops. Using list_del_init avoids that by reinitializing the entry so that its prev and next pointers point back at the entry itself, which makes a second deletion harmless.
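
To make the restart argument concrete, here is a small userspace sketch of why a delete-and-reinitialize is safe to repeat. It is not the kernel's list.h: the demo_* names are hypothetical stand-ins, and only the self-pointing behaviour of a reinitialized entry is modelled. An entry that points at itself only rewrites its own pointers when it is unlinked again, so the rest of the list is untouched.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An empty list, or an unlinked entry, points at itself. */
static void demo_init(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void demo_add(struct demo_list_head *entry, struct demo_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Same idea as list_del_init(): unlink, then reinitialise the entry. */
static void demo_del_init(struct demo_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	demo_init(entry);
}

static bool demo_empty(const struct demo_list_head *h)
{
	return h->next == h;
}

/*
 * Simulates the free path running twice, as it would after -ERESTARTSYS.
 * Returns true if the list is still consistent afterwards.
 */
static bool demo_delete_twice(void)
{
	struct demo_list_head head, a, b;

	demo_init(&head);
	demo_add(&a, &head);
	demo_add(&b, &head);	/* list is now: head -> b -> a */

	demo_del_init(&a);	/* first pass through the ioctl */
	demo_del_init(&a);	/* restarted pass: a only touches itself */

	return demo_empty(&a) && head.next == &b && head.prev == &b &&
	       b.next == &head && b.prev == &head;
}
```

With a plain list_del in place of the reinitializing variant, the second call would dereference the poisoned pointers instead, which is the failure mode described above.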

Ramesh: I see the same idiom in the method remove_kgd_mem_from_kfd_bo_list(). Should we call that method rather than rewriting the same code block? Also, the name remove_xyz_kfd_bo_list() is misleading. Should this name be changed?

See one more little fix below.


>
> Regards,
>     Felix
>
>
>>    	mutex_unlock(&process_info->lock);
>>    
>>    	/* No more MMU notifiers */
>> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    
>>    	ret = unreserve_bo_and_vms(&ctx, false, false);

This unreserve_bo_and_vms call cannot fail because the wait parameter is false. If it did fail, the error handling would be broken. I'd add a WARN_ONCE to make that assumption explicit, and change the return at the end of this function to return 0. Basically, if we got this far, we are not turning back, and we should return success.
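
As a rough illustration of that pattern, the userspace sketch below warns once on an unexpected error and still returns success. Everything in it is an assumption made for the example: demo_warn_once() is a stand-in for the kernel's WARN_ONCE(), demo_unreserve() stands in for unreserve_bo_and_vms(), and the real change to amdgpu_amdkfd_gpuvm_free_memory_of_gpu() may look different.

```c
#include <stdio.h>

/* Counts how often the warning actually fired (for illustration only). */
static int demo_warn_count;

/* Stand-in for WARN_ONCE(): report a true condition at most once. */
static int demo_warn_once(int cond, int *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = 1;
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Hypothetical stand-in for unreserve_bo_and_vms(); returns the forced code. */
static int demo_unreserve(int forced_ret)
{
	return forced_ret;
}

/* Tail of a free path that has passed the point of no return. */
static int demo_free_memory(int forced_ret)
{
	static int warned;
	int ret = demo_unreserve(forced_ret);

	/* The call is not expected to fail; make that assumption explicit. */
	demo_warn_once(ret != 0, &warned, "unexpected error while unreserving");

	/* ... unpin the BO, free the sync object, drop references ... */

	/* Cleanup continues regardless, so report success to the caller. */
	return 0;
}
```

The point of the design is that the caller never sees an error for a free that has in fact been carried out, while a broken assumption still leaves a trace in the log.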

You could update the commit headline to be more general. Something like: 
Fix error handling in amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

Regards,
   Felix


>>    
>> +	/* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> +	if (mem->alloc_flags &
>> +	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> +	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
>> +		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> +
>>    	/* Free the sync object */
>>    	amdgpu_sync_free(&mem->sync);
>>    

Thread overview: 5+ messages
2022-06-08 11:51 [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero Ramesh Errabolu
2022-06-08 19:39 ` Felix Kuehling
2022-06-08 20:03   ` Errabolu, Ramesh
2022-06-08 20:44     ` Felix Kuehling
2022-06-09 14:44       ` Errabolu, Ramesh [this message]