dri-devel.lists.freedesktop.org archive mirror
From: Qu Huang <jinsdb@126.com>
To: "Christian König" <christian.koenig@amd.com>,
	alexander.deucher@amd.com, airlied@linux.ie, daniel@ffwll.ch,
	sumit.semwal@linaro.org, airlied@redhat.com, ray.huang@amd.com,
	Mihir.Patel@amd.com, nirmoy.aiemd@gmail.com
Cc: linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access
Date: Sat, 3 Apr 2021 11:08:35 +0800
Message-ID: <5ccca988-fd60-f63e-850c-9b615a17b856@126.com>
In-Reply-To: <9b876791-7fa4-46da-7aec-1d1bfde83f4e@amd.com>

Hi Christian,

On 2021/4/3 0:25, Christian König wrote:
> Hi Qu,
> 
> On 02.04.21 at 05:18, Qu Huang wrote:
>> Before dma_resv_lock(bo->base.resv, NULL) in amdgpu_bo_release_notify(),
>> the bo->base.resv lock may be held by ttm_mem_evict_first(),
> 
> That can't happen, since when bo_release_notify is called the BO has no
> more references and is therefore deleted.
> 
> And we never evict a deleted BO, we just wait for it to become idle.

Yes, when the BO reference count drops to zero we enter
ttm_bo_release(). But the release notification (the call to
amdgpu_bo_release_notify()) happens first. Only after that does
ttm_bo_release() test whether the reservation object's fences have
signaled, mark the BO as deleted, and remove it from the LRU list.

So when ttm_bo_release() and ttm_mem_evict_first() run concurrently,
the BO is still on the LRU list and is not yet marked as deleted,
and this can happen.
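
To make the ordering concrete, here is a toy user-space model of the
interleaving described above. It is only a sketch: struct model_bo,
model_evict() and the two release_*() helpers are hypothetical names
invented for illustration, the bool stands in for the bo->base.resv
lock, and none of this is kernel code. The hook argument marks the
window in which a concurrent ttm_mem_evict_first() could run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM placement types (not the kernel's). */
enum placement { PL_SYSTEM, PL_GTT, PL_VRAM };

/* What happens to the wipe-on-release fill in each scenario. */
enum fill_outcome {
	FILL_SKIPPED,	/* no fill issued */
	FILL_OK,	/* fill issued while the BO is still in VRAM */
	FILL_STALE	/* fill issued after the BO left VRAM */
};

struct model_bo {
	bool locked;			/* models bo->base.resv being held */
	enum placement mem_type;	/* models bo->mem.mem_type */
	bool wipe_on_release;		/* models the WIPE_ON_RELEASE flag */
};

typedef void (*race_hook)(struct model_bo *bo);

/* Models the evictor winning the race: lock, move VRAM to GTT, unlock. */
static void model_evict(struct model_bo *bo)
{
	assert(!bo->locked);
	bo->locked = true;
	bo->mem_type = PL_GTT;
	bo->locked = false;
}

/* Old ordering: the placement is checked before the lock is taken. */
static enum fill_outcome release_check_then_lock(struct model_bo *bo,
						 race_hook window)
{
	enum fill_outcome out;

	if (bo->mem_type != PL_VRAM || !bo->wipe_on_release)
		return FILL_SKIPPED;
	if (window)
		window(bo);	/* a concurrent evictor may run here */

	bo->locked = true;
	/* The fill is issued unconditionally, based on the earlier check. */
	out = (bo->mem_type == PL_VRAM) ? FILL_OK : FILL_STALE;
	bo->locked = false;
	return out;
}

/* Patched ordering: take the lock first, then check the placement. */
static enum fill_outcome release_lock_then_check(struct model_bo *bo,
						 race_hook window)
{
	enum fill_outcome out;

	if (!bo->wipe_on_release)
		return FILL_SKIPPED;
	if (window)
		window(bo);	/* the evictor can still get in before the lock */

	bo->locked = true;
	/* The check happens under the lock, so the placement cannot change. */
	out = (bo->mem_type == PL_VRAM) ? FILL_OK : FILL_SKIPPED;
	bo->locked = false;
	return out;
}
```

With model_evict passed as the hook, release_check_then_lock() reports
FILL_STALE, while release_lock_then_check() reports FILL_SKIPPED. With
no hook (NULL), both report FILL_OK.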

As a test, we replaced the SDMA fill in amdgpu_bo_release_notify()
with a CPU memset. The result is a page fault:

PID: 5490   TASK: ffff8e8136e04100  CPU: 4   COMMAND: "gemmPerf"
  #0 [ffff8e79eaa17970] machine_kexec at ffffffffb2863784
  #1 [ffff8e79eaa179d0] __crash_kexec at ffffffffb291ce92
  #2 [ffff8e79eaa17aa0] crash_kexec at ffffffffb291cf80
  #3 [ffff8e79eaa17ab8] oops_end at ffffffffb2f6c768
  #4 [ffff8e79eaa17ae0] no_context at ffffffffb2f5aaa6
  #5 [ffff8e79eaa17b30] __bad_area_nosemaphore at ffffffffb2f5ab3d
  #6 [ffff8e79eaa17b80] bad_area_nosemaphore at ffffffffb2f5acae
  #7 [ffff8e79eaa17b90] __do_page_fault at ffffffffb2f6f6c0
  #8 [ffff8e79eaa17c00] do_page_fault at ffffffffb2f6f925
  #9 [ffff8e79eaa17c30] page_fault at ffffffffb2f6b758
     [exception RIP: memset+31]
     RIP: ffffffffb2b8668f  RSP: ffff8e79eaa17ce8  RFLAGS: 00010a17
     RAX: bebebebebebebebe  RBX: ffff8e747bff10c0  RCX: 0000060b00200000
     RDX: 0000000000000000  RSI: 00000000000000be  RDI: ffffab807f000000
     RBP: ffff8e79eaa17d10   R8: ffff8e79eaa14000   R9: ffffab7c80000000
     R10: 000000000000bcba  R11: 00000000000001ba  R12: ffff8e79ebaa4050
     R13: ffffab7c80000000  R14: 0000000000022600  R15: ffff8e8136e04100
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8e79eaa17ce8] amdgpu_bo_release_notify at ffffffffc092f2d1 [amdgpu]
#11 [ffff8e79eaa17d18] ttm_bo_release at ffffffffc08f39dd [amdttm]
#12 [ffff8e79eaa17d58] amdttm_bo_put at ffffffffc08f3c8c [amdttm]
#13 [ffff8e79eaa17d68] amdttm_bo_vm_close at ffffffffc08f7ac9 [amdttm]
#14 [ffff8e79eaa17d80] remove_vma at ffffffffb29ef115
#15 [ffff8e79eaa17da0] exit_mmap at ffffffffb29f2c64
#16 [ffff8e79eaa17e58] mmput at ffffffffb28940c7
#17 [ffff8e79eaa17e78] do_exit at ffffffffb289dc95
#18 [ffff8e79eaa17f10] do_group_exit at ffffffffb289e4cf
#19 [ffff8e79eaa17f40] sys_exit_group at ffffffffb289e544
#20 [ffff8e79eaa17f50] system_call_fastpath at ffffffffb2f74ddb

Regards,
Qu.


> 
> Regards,
> Christian.
> 
>> and the VRAM memory will be evicted, with the mem region replaced
>> by a GTT mem region. amdgpu_bo_release_notify() will then
>> take the bo->base.resv lock, and SDMA will get an invalid
>> address in amdgpu_fill_buffer(), resulting in a VMFAULT
>> or memory corruption.
>>
>> To avoid it, we have to hold bo->base.resv lock first, and
>> check whether the mem.mem_type is TTM_PL_VRAM.
>>
>> Signed-off-by: Qu Huang <jinsdb@126.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 4b29b82..8018574 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>>       if (bo->base.resv == &bo->base._resv)
>>           amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
>>
>> -    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
>> -        !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
>> +    if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
>>           return;
>>
>>       dma_resv_lock(bo->base.resv, NULL);
>>
>> +    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
>> +        dma_resv_unlock(bo->base.resv);
>> +        return;
>> +    }
>> +
>>       r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
>>       if (!WARN_ON(r)) {
>>           amdgpu_bo_fence(abo, fence, false);
>> -- 
>> 1.8.3.1
>>

