linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
@ 2021-03-18  8:32 Daniel Gomez
       [not found] ` <MN2PR12MB37755DAAB03FC0065E98EF1183699@MN2PR12MB3775.namprd12.prod.outlook.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Gomez @ 2021-03-18  8:32 UTC (permalink / raw)
  Cc: dagmcr, Daniel Gomez, Alex Deucher, Christian König,
	David Airlie, Daniel Vetter, amd-gfx, dri-devel, linux-kernel

If userptr pages have been pinned but not bounded,
they remain uncleared.

Signed-off-by: Daniel Gomez <daniel@qtec.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index e8c66d10478f..bbcc6264d48f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
 	struct radeon_ttm_tt *gtt = (void *)ttm;
 	struct radeon_device *rdev = radeon_get_rdev(bdev);
 
+	if (gtt->userptr)
+		radeon_ttm_tt_unpin_userptr(bdev, ttm);
+
 	if (!gtt->bound)
 		return;
 
 	radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
 
-	if (gtt->userptr)
-		radeon_ttm_tt_unpin_userptr(bdev, ttm);
 	gtt->bound = false;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
       [not found] ` <MN2PR12MB37755DAAB03FC0065E98EF1183699@MN2PR12MB3775.namprd12.prod.outlook.com>
@ 2021-03-19  2:57   ` Alex Deucher
  2021-03-19 20:28     ` Felix Kuehling
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2021-03-19  2:57 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Daniel Gomez, David Airlie, linux-kernel, dri-devel, amd-gfx,
	Deucher, Alexander, dagmcr

Applied.  Thanks!

Alex

On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ________________________________
> Von: Daniel Gomez <daniel@qtec.com>
> Gesendet: Donnerstag, 18. März 2021 09:32
> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
>
> If userptr pages have been pinned but not bounded,
> they remain uncleared.
>
> Signed-off-by: Daniel Gomez <daniel@qtec.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index e8c66d10478f..bbcc6264d48f 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
>          struct radeon_ttm_tt *gtt = (void *)ttm;
>          struct radeon_device *rdev = radeon_get_rdev(bdev);
>
> +       if (gtt->userptr)
> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> +
>          if (!gtt->bound)
>                  return;
>
>          radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
>
> -       if (gtt->userptr)
> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>          gtt->bound = false;
>  }
>
> --
> 2.30.2
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-03-19  2:57   ` Alex Deucher
@ 2021-03-19 20:28     ` Felix Kuehling
  2021-03-22  9:38       ` Daniel Gomez
  0 siblings, 1 reply; 9+ messages in thread
From: Felix Kuehling @ 2021-03-19 20:28 UTC (permalink / raw)
  To: Alex Deucher, Koenig, Christian
  Cc: David Airlie, linux-kernel, amd-gfx, dri-devel, Deucher,
	Alexander, Daniel Gomez, dagmcr

This caused a regression in kfdtest in a large-buffer stress test after 
memory allocation for user pages fails:

[17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
[17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17359.551494] #PF: supervisor read access in kernel mode
[17359.557375] #PF: error_code(0x0000) - not-present page
[17359.563247] PGD 0 P4D 0
[17359.566514] Oops: 0000 [#1] SMP PTI
[17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
[17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
[17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
[17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
[17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
[17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
[17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
[17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
[17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
[17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
[17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
[17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
[17359.682883] Call Trace:
[17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
[17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
[17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
[17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
[17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
[17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
[17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
[17359.734152]  __x64_sys_ioctl+0x8b/0xd0
[17359.738796]  do_syscall_64+0x2d/0x40
[17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[17359.749205] RIP: 0033:0x7febb083b6d7
[17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
[17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
[17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
[17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
[17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
[17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
[17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
[17359.837776] CR2: 0000000000000000
[17359.841888] ---[ end trace a6f27d64475b28c8 ]---
[17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
[17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
[17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
[17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
[17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
[17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
[17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
[17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
[17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
[17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0

Reverting this patch fixes the problem for me.

Regards,
   Felix

On 2021-03-18 10:57 p.m., Alex Deucher wrote:
> Applied.  Thanks!
>
> Alex
>
> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> ________________________________
>> Von: Daniel Gomez <daniel@qtec.com>
>> Gesendet: Donnerstag, 18. März 2021 09:32
>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
>>
>> If userptr pages have been pinned but not bounded,
>> they remain uncleared.
>>
>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
>> ---
>>   drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>> index e8c66d10478f..bbcc6264d48f 100644
>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
>>           struct radeon_ttm_tt *gtt = (void *)ttm;
>>           struct radeon_device *rdev = radeon_get_rdev(bdev);
>>
>> +       if (gtt->userptr)
>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>> +
>>           if (!gtt->bound)
>>                   return;
>>
>>           radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
>>
>> -       if (gtt->userptr)
>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>>           gtt->bound = false;
>>   }
>>
>> --
>> 2.30.2
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-03-19 20:28     ` Felix Kuehling
@ 2021-03-22  9:38       ` Daniel Gomez
  2021-03-22 10:34         ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Gomez @ 2021-03-22  9:38 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Alex Deucher, Koenig, Christian, David Airlie, linux-kernel,
	amd-gfx, dri-devel, Deucher, Alexander, dagmcr

On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
>
> This caused a regression in kfdtest in a large-buffer stress test after
> memory allocation for user pages fails:

I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
not this one.
Just some background for the mem leak patch if helps to understand this:
The leak was introduce here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
where the bound status was introduced for all drm drivers including
radeon and amdgpu. So this patch just reverts the logic to the
original code but keeping the bound status. In my case, the binding
code allocates the user pages memory and returns without bounding (at
amdgpu_gtt_mgr_has_gart_addr). So,
when the unbinding happens, the memory needs to be cleared to prevent the leak.

>
> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [17359.551494] #PF: supervisor read access in kernel mode
> [17359.557375] #PF: error_code(0x0000) - not-present page
> [17359.563247] PGD 0 P4D 0
> [17359.566514] Oops: 0000 [#1] SMP PTI
> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> [17359.682883] Call Trace:
> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
> [17359.738796]  do_syscall_64+0x2d/0x40
> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [17359.749205] RIP: 0033:0x7febb083b6d7
> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
> [17359.837776] CR2: 0000000000000000
> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0

From what I understand, the init_user_pages fails (returns EBUSY) and
the code goes to allocate_init_user_pages_failed where the unbind and
the userptr clear occurs.
Can we prevent this if we save the bounding status + userptr alloc? so
the function amdgpu_ttm_backend_unbind returns without trying to clear
the userptr memory?

Something like:

amdgpu_ttm_backend_bind:
    if (gtt->userptr) {
        r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
        if (r) ...
       gtt->sg_table = true;
   }

amdgpu_ttm_backend_unbind:
if (gtt->sg_table) {
        if (gtt->user_ptr) ...
}

If you agree, I'll send a v2 patch. Otherwise, maybe we could return
within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.

Any other ideas?

Regards,
Daniel

>
> Reverting this patch fixes the problem for me.
>
> Regards,
>    Felix
>
> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
> > Applied.  Thanks!
> >
> > Alex
> >
> > On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
> > <Christian.Koenig@amd.com> wrote:
> >> Reviewed-by: Christian König <christian.koenig@amd.com>
> >> ________________________________
> >> Von: Daniel Gomez <daniel@qtec.com>
> >> Gesendet: Donnerstag, 18. März 2021 09:32
> >> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> >> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
> >>
> >> If userptr pages have been pinned but not bounded,
> >> they remain uncleared.
> >>
> >> Signed-off-by: Daniel Gomez <daniel@qtec.com>
> >> ---
> >>   drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
> >>   1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> >> index e8c66d10478f..bbcc6264d48f 100644
> >> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
> >>           struct radeon_ttm_tt *gtt = (void *)ttm;
> >>           struct radeon_device *rdev = radeon_get_rdev(bdev);
> >>
> >> +       if (gtt->userptr)
> >> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >> +
> >>           if (!gtt->bound)
> >>                   return;
> >>
> >>           radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
> >>
> >> -       if (gtt->userptr)
> >> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >>           gtt->bound = false;
> >>   }
> >>
> >> --
> >> 2.30.2
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-03-22  9:38       ` Daniel Gomez
@ 2021-03-22 10:34         ` Christian König
  2021-03-22 10:49           ` Daniel Gomez
  2021-04-06 20:55           ` Alex Deucher
  0 siblings, 2 replies; 9+ messages in thread
From: Christian König @ 2021-03-22 10:34 UTC (permalink / raw)
  To: Daniel Gomez, Felix Kuehling
  Cc: David Airlie, linux-kernel, dri-devel, Deucher, Alexander,
	amd-gfx, Alex Deucher, Koenig, Christian, dagmcr

Hi Daniel,

Am 22.03.21 um 10:38 schrieb Daniel Gomez:
> On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
>> This caused a regression in kfdtest in a large-buffer stress test after
>> memory allocation for user pages fails:
> I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
> not this one.
> Just some background for the mem leak patch if helps to understand this:
> The leak was introduce here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
> where the bound status was introduced for all drm drivers including
> radeon and amdgpu. So this patch just reverts the logic to the
> original code but keeping the bound status. In my case, the binding
> code allocates the user pages memory and returns without bounding (at
> amdgpu_gtt_mgr_has_gart_addr). So,
> when the unbinding happens, the memory needs to be cleared to prevent the leak.

Ah, now I understand what's happening here. Daniel your patch is not 
really correct.

The problem is rather that we don't set the tt object to bound if it 
doesn't have a GTT address.

Going to provide a patch for this.

Regards,
Christian.

>
>> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
>> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [17359.551494] #PF: supervisor read access in kernel mode
>> [17359.557375] #PF: error_code(0x0000) - not-present page
>> [17359.563247] PGD 0 P4D 0
>> [17359.566514] Oops: 0000 [#1] SMP PTI
>> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
>> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
>> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
>> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
>> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
>> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
>> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
>> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
>> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
>> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
>> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
>> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
>> [17359.682883] Call Trace:
>> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
>> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
>> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
>> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
>> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
>> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
>> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
>> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
>> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
>> [17359.738796]  do_syscall_64+0x2d/0x40
>> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [17359.749205] RIP: 0033:0x7febb083b6d7
>> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
>> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
>> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
>> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
>> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
>> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
>> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
>> [17359.837776] CR2: 0000000000000000
>> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
>> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
>> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
>> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
>> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
>> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
>> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
>> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
>> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
>> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
>> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
>  From what I understand, the init_user_pages fails (returns EBUSY) and
> the code goes to allocate_init_user_pages_failed where the unbind and
> the userptr clear occurs.
> Can we prevent this if we save the bounding status + userptr alloc? so
> the function amdgpu_ttm_backend_unbind returns without trying to clear
> the userptr memory?
>
> Something like:
>
> amdgpu_ttm_backend_bind:
>      if (gtt->userptr) {
>          r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
>          if (r) ...
>         gtt->sg_table = true;
>     }
>
> amdgpu_ttm_backend_unbind:
> if (gtt->sg_table) {
>          if (gtt->user_ptr) ...
> }
>
> If you agree, I'll send a v2 patch. Otherwise, maybe we could return
> within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.
>
> Any other ideas?
>
> Regards,
> Daniel
>
>> Reverting this patch fixes the problem for me.
>>
>> Regards,
>>     Felix
>>
>> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
>>> Applied.  Thanks!
>>>
>>> Alex
>>>
>>> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
>>> <Christian.Koenig@amd.com> wrote:
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>> ________________________________
>>>> Von: Daniel Gomez <daniel@qtec.com>
>>>> Gesendet: Donnerstag, 18. März 2021 09:32
>>>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>>>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
>>>>
>>>> If userptr pages have been pinned but not bounded,
>>>> they remain uncleared.
>>>>
>>>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
>>>> ---
>>>>    drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> index e8c66d10478f..bbcc6264d48f 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
>>>>            struct radeon_ttm_tt *gtt = (void *)ttm;
>>>>            struct radeon_device *rdev = radeon_get_rdev(bdev);
>>>>
>>>> +       if (gtt->userptr)
>>>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>>>> +
>>>>            if (!gtt->bound)
>>>>                    return;
>>>>
>>>>            radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
>>>>
>>>> -       if (gtt->userptr)
>>>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>>>>            gtt->bound = false;
>>>>    }
>>>>
>>>> --
>>>> 2.30.2
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-03-22 10:34         ` Christian König
@ 2021-03-22 10:49           ` Daniel Gomez
  2021-04-06 20:55           ` Alex Deucher
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Gomez @ 2021-03-22 10:49 UTC (permalink / raw)
  To: Christian König
  Cc: Felix Kuehling, David Airlie, linux-kernel, dri-devel, Deucher,
	Alexander, amd-gfx, Alex Deucher, Koenig, Christian, dagmcr

On Mon, 22 Mar 2021 at 11:34, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hi Daniel,
>
> Am 22.03.21 um 10:38 schrieb Daniel Gomez:
> > On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
> >> This caused a regression in kfdtest in a large-buffer stress test after
> >> memory allocation for user pages fails:
> > I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
> > not this one.
> > Just some background for the mem leak patch if helps to understand this:
> > The leak was introduce here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
> > where the bound status was introduced for all drm drivers including
> > radeon and amdgpu. So this patch just reverts the logic to the
> > original code but keeping the bound status. In my case, the binding
> > code allocates the user pages memory and returns without bounding (at
> > amdgpu_gtt_mgr_has_gart_addr). So,
> > when the unbinding happens, the memory needs to be cleared to prevent the leak.
>
> Ah, now I understand what's happening here. Daniel your patch is not
> really correct.
>
> The problem is rather that we don't set the tt object to bound if it
> doesn't have a GTT address.
Okay, I understand.
>
> Going to provide a patch for this.
Looking forward to your patch. Thanks Christian!
>
> Regards,
> Christian.
>
> >
> >> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
> >> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> [17359.551494] #PF: supervisor read access in kernel mode
> >> [17359.557375] #PF: error_code(0x0000) - not-present page
> >> [17359.563247] PGD 0 P4D 0
> >> [17359.566514] Oops: 0000 [#1] SMP PTI
> >> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
> >> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
> >> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> >> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> >> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> >> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> >> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> >> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> >> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> >> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> >> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> >> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> >> [17359.682883] Call Trace:
> >> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
> >> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
> >> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
> >> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
> >> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
> >> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
> >> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
> >> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
> >> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
> >> [17359.738796]  do_syscall_64+0x2d/0x40
> >> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [17359.749205] RIP: 0033:0x7febb083b6d7
> >> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
> >> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> >> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
> >> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
> >> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
> >> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
> >> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
> >> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
> >> [17359.837776] CR2: 0000000000000000
> >> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
> >> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> >> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> >> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> >> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> >> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> >> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> >> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> >> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> >> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> >> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> >  From what I understand, the init_user_pages fails (returns EBUSY) and
> > the code goes to allocate_init_user_pages_failed where the unbind and
> > the userptr clear occurs.
> > Can we prevent this if we save the bounding status + userptr alloc? so
> > the function amdgpu_ttm_backend_unbind returns without trying to clear
> > the userptr memory?
> >
> > Something like:
> >
> > amdgpu_ttm_backend_bind:
> >      if (gtt->userptr) {
> >          r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
> >          if (r) ...
> >         gtt->sg_table = true;
> >     }
> >
> > amdgpu_ttm_backend_unbind:
> > if (gtt->sg_table) {
> >          if (gtt->user_ptr) ...
> > }
> >
> > If you agree, I'll send a v2 patch. Otherwise, maybe we could return
> > within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.
> >
> > Any other ideas?
> >
> > Regards,
> > Daniel
> >
> >> Reverting this patch fixes the problem for me.
> >>
> >> Regards,
> >>     Felix
> >>
> >> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
> >>> Applied.  Thanks!
> >>>
> >>> Alex
> >>>
> >>> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
> >>> <Christian.Koenig@amd.com> wrote:
> >>>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>> ________________________________
> >>>> Von: Daniel Gomez <daniel@qtec.com>
> >>>> Gesendet: Donnerstag, 18. März 2021 09:32
> >>>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> >>>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
> >>>>
> >>>> If userptr pages have been pinned but not bounded,
> >>>> they remain uncleared.
> >>>>
> >>>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
> >>>> ---
> >>>>    drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
> >>>>    1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> index e8c66d10478f..bbcc6264d48f 100644
> >>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
> >>>>            struct radeon_ttm_tt *gtt = (void *)ttm;
> >>>>            struct radeon_device *rdev = radeon_get_rdev(bdev);
> >>>>
> >>>> +       if (gtt->userptr)
> >>>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >>>> +
> >>>>            if (!gtt->bound)
> >>>>                    return;
> >>>>
> >>>>            radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
> >>>>
> >>>> -       if (gtt->userptr)
> >>>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >>>>            gtt->bound = false;
> >>>>    }
> >>>>
> >>>> --
> >>>> 2.30.2
> >>>>
> >>>> _______________________________________________
> >>>> dri-devel mailing list
> >>>> dri-devel@lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel@lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-03-22 10:34         ` Christian König
  2021-03-22 10:49           ` Daniel Gomez
@ 2021-04-06 20:55           ` Alex Deucher
  2021-04-07  7:47             ` Daniel Gomez
  1 sibling, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2021-04-06 20:55 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Gomez, Felix Kuehling, David Airlie, linux-kernel,
	dri-devel, Deucher, Alexander, amd-gfx, Koenig, Christian,
	dagmcr

On Mon, Mar 22, 2021 at 6:34 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hi Daniel,
>
> Am 22.03.21 um 10:38 schrieb Daniel Gomez:
> > On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
> >> This caused a regression in kfdtest in a large-buffer stress test after
> >> memory allocation for user pages fails:
> > I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
> > not this one.
> > Just some background for the mem leak patch if helps to understand this:
> > The leak was introduce here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
> > where the bound status was introduced for all drm drivers including
> > radeon and amdgpu. So this patch just reverts the logic to the
> > original code but keeping the bound status. In my case, the binding
> > code allocates the user pages memory and returns without bounding (at
> > amdgpu_gtt_mgr_has_gart_addr). So,
> > when the unbinding happens, the memory needs to be cleared to prevent the leak.
>
> Ah, now I understand what's happening here. Daniel your patch is not
> really correct.
>
> The problem is rather that we don't set the tt object to bound if it
> doesn't have a GTT address.
>
> Going to provide a patch for this.

Did this patch ever land?

Alex

>
> Regards,
> Christian.
>
> >
> >> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
> >> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> [17359.551494] #PF: supervisor read access in kernel mode
> >> [17359.557375] #PF: error_code(0x0000) - not-present page
> >> [17359.563247] PGD 0 P4D 0
> >> [17359.566514] Oops: 0000 [#1] SMP PTI
> >> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
> >> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
> >> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> >> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> >> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> >> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> >> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> >> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> >> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> >> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> >> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> >> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> >> [17359.682883] Call Trace:
> >> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
> >> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
> >> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
> >> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
> >> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
> >> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
> >> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
> >> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
> >> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
> >> [17359.738796]  do_syscall_64+0x2d/0x40
> >> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [17359.749205] RIP: 0033:0x7febb083b6d7
> >> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
> >> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> >> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
> >> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
> >> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
> >> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
> >> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
> >> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
> >> [17359.837776] CR2: 0000000000000000
> >> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
> >> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> >> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> >> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> >> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> >> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> >> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> >> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> >> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> >> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> >> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> >  From what I understand, the init_user_pages fails (returns EBUSY) and
> > the code goes to allocate_init_user_pages_failed where the unbind and
> > the userptr clear occurs.
> > Can we prevent this if we save the bounding status + userptr alloc? so
> > the function amdgpu_ttm_backend_unbind returns without trying to clear
> > the userptr memory?
> >
> > Something like:
> >
> > amdgpu_ttm_backend_bind:
> >      if (gtt->userptr) {
> >          r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
> >          if (r) ...
> >         gtt->sg_table = true;
> >     }
> >
> > amdgpu_ttm_backend_unbind:
> > if (gtt->sg_table) {
> >          if (gtt->user_ptr) ...
> > }
> >
> > If you agree, I'll send a v2 patch. Otherwise, maybe we could return
> > within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.
> >
> > Any other ideas?
> >
> > Regards,
> > Daniel
> >
> >> Reverting this patch fixes the problem for me.
> >>
> >> Regards,
> >>     Felix
> >>
> >> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
> >>> Applied.  Thanks!
> >>>
> >>> Alex
> >>>
> >>> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
> >>> <Christian.Koenig@amd.com> wrote:
> >>>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>>> ________________________________
> >>>> Von: Daniel Gomez <daniel@qtec.com>
> >>>> Gesendet: Donnerstag, 18. März 2021 09:32
> >>>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> >>>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
> >>>>
> >>>> If userptr pages have been pinned but not bounded,
> >>>> they remain uncleared.
> >>>>
> >>>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
> >>>> ---
> >>>>    drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
> >>>>    1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> index e8c66d10478f..bbcc6264d48f 100644
> >>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >>>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
> >>>>            struct radeon_ttm_tt *gtt = (void *)ttm;
> >>>>            struct radeon_device *rdev = radeon_get_rdev(bdev);
> >>>>
> >>>> +       if (gtt->userptr)
> >>>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >>>> +
> >>>>            if (!gtt->bound)
> >>>>                    return;
> >>>>
> >>>>            radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
> >>>>
> >>>> -       if (gtt->userptr)
> >>>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> >>>>            gtt->bound = false;
> >>>>    }
> >>>>
> >>>> --
> >>>> 2.30.2
> >>>>
> >>>> _______________________________________________
> >>>> dri-devel mailing list
> >>>> dri-devel@lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel@lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-04-06 20:55           ` Alex Deucher
@ 2021-04-07  7:47             ` Daniel Gomez
  2021-04-07  9:27               ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Gomez @ 2021-04-07  7:47 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Christian König, Felix Kuehling, David Airlie, linux-kernel,
	dri-devel, Deucher, Alexander, amd-gfx, Koenig, Christian,
	dagmcr

On Tue, 6 Apr 2021 at 22:56, Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 6:34 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Hi Daniel,
> >
> > Am 22.03.21 um 10:38 schrieb Daniel Gomez:
> > > On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
> > >> This caused a regression in kfdtest in a large-buffer stress test after
> > >> memory allocation for user pages fails:
> > > I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
> > > not this one.
> > > Just some background for the mem leak patch if helps to understand this:
> > > The leak was introduce here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
> > > where the bound status was introduced for all drm drivers including
> > > radeon and amdgpu. So this patch just reverts the logic to the
> > > original code but keeping the bound status. In my case, the binding
> > > code allocates the user pages memory and returns without bounding (at
> > > amdgpu_gtt_mgr_has_gart_addr). So,
> > > when the unbinding happens, the memory needs to be cleared to prevent the leak.
> >
> > Ah, now I understand what's happening here. Daniel your patch is not
> > really correct.
> >
> > The problem is rather that we don't set the tt object to bound if it
> > doesn't have a GTT address.
> >
> > Going to provide a patch for this.
>
> Did this patch ever land?

I don't think so but I might send a v2 following Christian's comment
if you guys agree.

Also, the patch here is for radeon but the pagefault issue reported by
Felix is affected by the amdgpu one:

radeon patch: drm/radeon/ttm: Fix memory leak userptr pages
https://patchwork.kernel.org/project/dri-devel/patch/20210318083236.43578-1-daniel@qtec.com/

amdgpu patch: drm/amdgpu/ttm: Fix memory leak userptr pages
https://patchwork.kernel.org/project/dri-devel/patch/20210317160840.36019-1-daniel@qtec.com/

I assume both need to be fixed with the same approach.

Daniel
>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> > >
> > >> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
> > >> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >> [17359.551494] #PF: supervisor read access in kernel mode
> > >> [17359.557375] #PF: error_code(0x0000) - not-present page
> > >> [17359.563247] PGD 0 P4D 0
> > >> [17359.566514] Oops: 0000 [#1] SMP PTI
> > >> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
> > >> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
> > >> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> > >> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> > >> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> > >> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> > >> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> > >> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> > >> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> > >> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> > >> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> > >> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> > >> [17359.682883] Call Trace:
> > >> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
> > >> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
> > >> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
> > >> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
> > >> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
> > >> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
> > >> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
> > >> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
> > >> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
> > >> [17359.738796]  do_syscall_64+0x2d/0x40
> > >> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >> [17359.749205] RIP: 0033:0x7febb083b6d7
> > >> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
> > >> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> > >> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
> > >> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
> > >> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
> > >> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
> > >> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
> > >> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
> > >> [17359.837776] CR2: 0000000000000000
> > >> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
> > >> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
> > >> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
> > >> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
> > >> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
> > >> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
> > >> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
> > >> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
> > >> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
> > >> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
> > >> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
> > >  From what I understand, the init_user_pages fails (returns EBUSY) and
> > > the code goes to allocate_init_user_pages_failed where the unbind and
> > > the userptr clear occurs.
> > > Can we prevent this if we save the bounding status + userptr alloc? so
> > > the function amdgpu_ttm_backend_unbind returns without trying to clear
> > > the userptr memory?
> > >
> > > Something like:
> > >
> > > amdgpu_ttm_backend_bind:
> > >      if (gtt->userptr) {
> > >          r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
> > >          if (r) ...
> > >         gtt->sg_table = true;
> > >     }
> > >
> > > amdgpu_ttm_backend_unbind:
> > > if (gtt->sg_table) {
> > >          if (gtt->user_ptr) ...
> > > }
> > >
> > > If you agree, I'll send a v2 patch. Otherwise, maybe we could return
> > > within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.
> > >
> > > Any other ideas?
> > >
> > > Regards,
> > > Daniel
> > >
> > >> Reverting this patch fixes the problem for me.
> > >>
> > >> Regards,
> > >>     Felix
> > >>
> > >> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
> > >>> Applied.  Thanks!
> > >>>
> > >>> Alex
> > >>>
> > >>> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
> > >>> <Christian.Koenig@amd.com> wrote:
> > >>>> Reviewed-by: Christian König <christian.koenig@amd.com>
> > >>>> ________________________________
> > >>>> Von: Daniel Gomez <daniel@qtec.com>
> > >>>> Gesendet: Donnerstag, 18. März 2021 09:32
> > >>>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> > >>>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
> > >>>>
> > >>>> If userptr pages have been pinned but not bounded,
> > >>>> they remain uncleared.
> > >>>>
> > >>>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
> > >>>> ---
> > >>>>    drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
> > >>>>    1 file changed, 3 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> > >>>> index e8c66d10478f..bbcc6264d48f 100644
> > >>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> > >>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> > >>>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
> > >>>>            struct radeon_ttm_tt *gtt = (void *)ttm;
> > >>>>            struct radeon_device *rdev = radeon_get_rdev(bdev);
> > >>>>
> > >>>> +       if (gtt->userptr)
> > >>>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> > >>>> +
> > >>>>            if (!gtt->bound)
> > >>>>                    return;
> > >>>>
> > >>>>            radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
> > >>>>
> > >>>> -       if (gtt->userptr)
> > >>>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
> > >>>>            gtt->bound = false;
> > >>>>    }
> > >>>>
> > >>>> --
> > >>>> 2.30.2
> > >>>>
> > >>>> _______________________________________________
> > >>>> dri-devel mailing list
> > >>>> dri-devel@lists.freedesktop.org
> > >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >>> _______________________________________________
> > >>> dri-devel mailing list
> > >>> dri-devel@lists.freedesktop.org
> > >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > _______________________________________________
> > > amd-gfx mailing list
> > > amd-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
  2021-04-07  7:47             ` Daniel Gomez
@ 2021-04-07  9:27               ` Christian König
  0 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2021-04-07  9:27 UTC (permalink / raw)
  To: Daniel Gomez, Alex Deucher
  Cc: Christian König, Felix Kuehling, David Airlie, linux-kernel,
	dri-devel, Deucher, Alexander, amd-gfx, dagmcr

Am 07.04.21 um 09:47 schrieb Daniel Gomez:
> On Tue, 6 Apr 2021 at 22:56, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Mon, Mar 22, 2021 at 6:34 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Hi Daniel,
>>>
>>> Am 22.03.21 um 10:38 schrieb Daniel Gomez:
>>>> On Fri, 19 Mar 2021 at 21:29, Felix Kuehling <felix.kuehling@amd.com> wrote:
>>>>> This caused a regression in kfdtest in a large-buffer stress test after
>>>>> memory allocation for user pages fails:
>>>> I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
>>>> not this one.
>>>> Just some background for the mem leak patch if helps to understand this:
>>>> The leak was introduce here:
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fid%3D0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2FeOQf12NBkC3YGZ7QW66%2FT%2FpyM3DjEb9IMbqUvISMfo%3D&amp;reserved=0
>>>> where the bound status was introduced for all drm drivers including
>>>> radeon and amdgpu. So this patch just reverts the logic to the
>>>> original code but keeping the bound status. In my case, the binding
>>>> code allocates the user pages memory and returns without bounding (at
>>>> amdgpu_gtt_mgr_has_gart_addr). So,
>>>> when the unbinding happens, the memory needs to be cleared to prevent the leak.
>>> Ah, now I understand what's happening here. Daniel your patch is not
>>> really correct.
>>>
>>> The problem is rather that we don't set the tt object to bound if it
>>> doesn't have a GTT address.
>>>
>>> Going to provide a patch for this.
>> Did this patch ever land?
> I don't think so but I might send a v2 following Christian's comment
> if you guys agree.

Somebody else already provided a patch which I reviewed, but I'm not 
sure if that landed either.

> Also, the patch here is for radeon but the pagefault issue reported by
> Felix is affected by the amdgpu one:
>
> radeon patch: drm/radeon/ttm: Fix memory leak userptr pages
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Fdri-devel%2Fpatch%2F20210318083236.43578-1-daniel%40qtec.com%2F&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=HSMK%2FqYz%2Bzz9qbKc%2FITUWluBDeaW9YrgyH8p0L640%2F0%3D&amp;reserved=0
>
> amdgpu patch: drm/amdgpu/ttm: Fix memory leak userptr pages
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Fdri-devel%2Fpatch%2F20210317160840.36019-1-daniel%40qtec.com%2F&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UsUZ4YjCSjHmzlPB07oTaGrsntTrQSwlGk%2BxUjwDiag%3D&amp;reserved=0
>
> I assume both need to be fixed with the same approach.

Yes correct. Let me double check where that fix went.

Thanks,
Christian.

>
> Daniel
>> Alex
>>
>>> Regards,
>>> Christian.
>>>
>>>>> [17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
>>>>> [17359.543746] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>> [17359.551494] #PF: supervisor read access in kernel mode
>>>>> [17359.557375] #PF: error_code(0x0000) - not-present page
>>>>> [17359.563247] PGD 0 P4D 0
>>>>> [17359.566514] Oops: 0000 [#1] SMP PTI
>>>>> [17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #193
>>>>> [17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
>>>>> [17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
>>>>> [17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
>>>>> [17359.614340] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
>>>>> [17359.620315] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
>>>>> [17359.628204] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
>>>>> [17359.636084] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
>>>>> [17359.643958] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
>>>>> [17359.651833] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
>>>>> [17359.659701] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
>>>>> [17359.668528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [17359.675012] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
>>>>> [17359.682883] Call Trace:
>>>>> [17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
>>>>> [17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
>>>>> [17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
>>>>> [17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
>>>>> [17359.708701]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7e5/0x910 [amdgpu]
>>>>> [17359.716307]  kfd_ioctl_alloc_memory_of_gpu+0x11a/0x220 [amdgpu]
>>>>> [17359.723036]  kfd_ioctl+0x223/0x400 [amdgpu]
>>>>> [17359.728017]  ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
>>>>> [17359.734152]  __x64_sys_ioctl+0x8b/0xd0
>>>>> [17359.738796]  do_syscall_64+0x2d/0x40
>>>>> [17359.743259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [17359.749205] RIP: 0033:0x7febb083b6d7
>>>>> [17359.753681] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
>>>>> [17359.774340] RSP: 002b:00007ffdb5522cd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>>>>> [17359.782668] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007febb083b6d7
>>>>> [17359.790566] RDX: 00007ffdb5522d60 RSI: 00000000c0284b16 RDI: 0000000000000003
>>>>> [17359.798459] RBP: 00007ffdb5522d10 R08: 00007ffdb5522dd0 R09: 00000000c4000004
>>>>> [17359.806352] R10: 0000000000000000 R11: 0000000000000202 R12: 0000559416e4e2aa
>>>>> [17359.814251] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
>>>>> [17359.822140] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
>>>>> [17359.837776] CR2: 0000000000000000
>>>>> [17359.841888] ---[ end trace a6f27d64475b28c8 ]---
>>>>> [17359.847318] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
>>>>> [17359.854479] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
>>>>> [17359.874929] RSP: 0018:ffffa4764971fc98 EFLAGS: 00010206
>>>>> [17359.881014] RAX: 0000000000000000 RBX: ffff950e8d4edf00 RCX: 0000000000000000
>>>>> [17359.889007] RDX: 0000000000000000 RSI: ffff950e8d4edf00 RDI: ffff950eadec5e80
>>>>> [17359.897008] RBP: ffff950eadec5e80 R08: 0000000000000000 R09: 0000000000000000
>>>>> [17359.905020] R10: 0000000000000246 R11: 0000000000000001 R12: ffff950c03377800
>>>>> [17359.913034] R13: ffff950eadec5e80 R14: ffff950c03377858 R15: 0000000000000000
>>>>> [17359.921050] FS:  00007febb20cb740(0000) GS:ffff950ebfc00000(0000) knlGS:0000000000000000
>>>>> [17359.930047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [17359.936674] CR2: 0000000000000000 CR3: 00000006d700e005 CR4: 00000000001706e0
>>>>   From what I understand, the init_user_pages fails (returns EBUSY) and
>>>> the code goes to allocate_init_user_pages_failed where the unbind and
>>>> the userptr clear occurs.
>>>> Can we prevent this if we save the bounding status + userptr alloc? so
>>>> the function amdgpu_ttm_backend_unbind returns without trying to clear
>>>> the userptr memory?
>>>>
>>>> Something like:
>>>>
>>>> amdgpu_ttm_backend_bind:
>>>>       if (gtt->userptr) {
>>>>           r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
>>>>           if (r) ...
>>>>          gtt->sg_table = true;
>>>>      }
>>>>
>>>> amdgpu_ttm_backend_unbind:
>>>> if (gtt->sg_table) {
>>>>           if (gtt->user_ptr) ...
>>>> }
>>>>
>>>> If you agree, I'll send a v2 patch. Otherwise, maybe we could return
>>>> within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.
>>>>
>>>> Any other ideas?
>>>>
>>>> Regards,
>>>> Daniel
>>>>
>>>>> Reverting this patch fixes the problem for me.
>>>>>
>>>>> Regards,
>>>>>      Felix
>>>>>
>>>>> On 2021-03-18 10:57 p.m., Alex Deucher wrote:
>>>>>> Applied.  Thanks!
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
>>>>>> <Christian.Koenig@amd.com> wrote:
>>>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>> ________________________________
>>>>>>> Von: Daniel Gomez <daniel@qtec.com>
>>>>>>> Gesendet: Donnerstag, 18. März 2021 09:32
>>>>>>> Cc: dagmcr@gmail.com <dagmcr@gmail.com>; Daniel Gomez <daniel@qtec.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>>>>>>> Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages
>>>>>>>
>>>>>>> If userptr pages have been pinned but not bounded,
>>>>>>> they remain uncleared.
>>>>>>>
>>>>>>> Signed-off-by: Daniel Gomez <daniel@qtec.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
>>>>>>>     1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>>>>> index e8c66d10478f..bbcc6264d48f 100644
>>>>>>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>>>>>>> @@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct ttm_bo_device *bdev, struct ttm_tt
>>>>>>>             struct radeon_ttm_tt *gtt = (void *)ttm;
>>>>>>>             struct radeon_device *rdev = radeon_get_rdev(bdev);
>>>>>>>
>>>>>>> +       if (gtt->userptr)
>>>>>>> +               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>>>>>>> +
>>>>>>>             if (!gtt->bound)
>>>>>>>                     return;
>>>>>>>
>>>>>>>             radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);
>>>>>>>
>>>>>>> -       if (gtt->userptr)
>>>>>>> -               radeon_ttm_tt_unpin_userptr(bdev, ttm);
>>>>>>>             gtt->bound = false;
>>>>>>>     }
>>>>>>>
>>>>>>> --
>>>>>>> 2.30.2
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=aU9ZAl66EOLKphjWFPXJPR%2BTM%2BZeeMv%2BVJC6vliEqrs%3D&amp;reserved=0
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=aU9ZAl66EOLKphjWFPXJPR%2BTM%2BZeeMv%2BVJC6vliEqrs%3D&amp;reserved=0
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CChristian.Koenig%40amd.com%7C65d21b6f02da409ac7b508d8f9997367%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533784761980218%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=USmYbhfkSfPcE1npvsMfRwBr9Ijresh1fH4cAeNEr2M%3D&amp;reserved=0


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-04-07  9:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18  8:32 [PATCH] drm/radeon/ttm: Fix memory leak userptr pages Daniel Gomez
     [not found] ` <MN2PR12MB37755DAAB03FC0065E98EF1183699@MN2PR12MB3775.namprd12.prod.outlook.com>
2021-03-19  2:57   ` Alex Deucher
2021-03-19 20:28     ` Felix Kuehling
2021-03-22  9:38       ` Daniel Gomez
2021-03-22 10:34         ` Christian König
2021-03-22 10:49           ` Daniel Gomez
2021-04-06 20:55           ` Alex Deucher
2021-04-07  7:47             ` Daniel Gomez
2021-04-07  9:27               ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).