* TTM leaking swap space?
@ 2018-01-17  3:21 Felix Kuehling
       [not found] ` <5ce0f6cb-4c5e-2915-da39-a8037568edcb-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2018-01-17  3:21 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I'm running an eviction stress test with KFD and find that sometimes it
starts swapping. When that happens, swap usage goes up rapidly, but it
never comes down. Even after the processes terminate, and all VRAM and
GTT allocations are freed (checked in
/sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
released.

Running the test repeatedly I was able to trigger the OOM killer quite
easily. The system died with a panic, running out of processes to kill.

The symptoms look like swap space is only allocated but never released.
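
One way to watch for this symptom from userspace is to sample swap usage before, during and after the test. The following is only a minimal sketch, assuming the usual "Key:   value kB" layout of /proc/meminfo; the helper names (meminfo_kb, swap_used_kb, read_swap_used_kb) are hypothetical and not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Find "<key>: <value> kB" in meminfo-style text; returns -1 if absent. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Match the key only at the start of a line, followed by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Swap currently in use, in kB, or -1 if either field is missing. */
long swap_used_kb(const char *text)
{
    long total = meminfo_kb(text, "SwapTotal");
    long free_kb = meminfo_kb(text, "SwapFree");

    if (total < 0 || free_kb < 0)
        return -1;
    return total - free_kb;
}

/* Read /proc/meminfo and report swap usage in kB; -1 on any failure. */
long read_swap_used_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return swap_used_kb(buf);
}
```

Calling read_swap_used_kb() before the test starts and again after all test processes have exited would show whether usage returns to its baseline.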

A quick look at the swapping code in ttm_tt.c doesn't show any obvious
problems. I'm assuming that fput should free swap space. That should
happen when BOs are swapped back in, or destroyed. As far as I can tell,
amdgpu doesn't use persistent swap space, so I'm ignoring
TTM_PAGE_FLAG_PERSISTENT_SWAP.

Any other ideas or pointers?

Thanks,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: TTM leaking swap space?
       [not found] ` <5ce0f6cb-4c5e-2915-da39-a8037568edcb-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-17  3:24   ` Chunming Zhou
       [not found]     ` <d497572f-bd28-d637-5b83-c8812d942e8e-5C7GfCeVMHo@public.gmane.org>
  2018-01-17 20:33   ` Andrey Grodzovsky
  1 sibling, 1 reply; 6+ messages in thread
From: Chunming Zhou @ 2018-01-17  3:24 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Felix,

Could I get your test so I can try it myself?


Thanks,

David Zhou


On 2018-01-17 11:21, Felix Kuehling wrote:
> I'm running an eviction stress test with KFD and find that sometimes it
> starts swapping. When that happens, swap usage goes up rapidly, but it
> never comes down. Even after the processes terminate, and all VRAM and
> GTT allocations are freed (checked in
> /sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
> released.
>
> Running the test repeatedly I was able to trigger the OOM killer quite
> easily. The system died with a panic, running out of processes to kill.
>
> The symptoms look like swap space is only allocated but never released.
>
> A quick look at the swapping code in ttm_tt.c doesn't show any obvious
> problems. I'm assuming that fput should free swap space. That should
> happen when BOs are swapped back in, or destroyed. As far as I can tell,
> amdgpu doesn't use persistent swap space, so I'm ignoring
> TTM_PAGE_FLAG_PERSISTENT_SWAP.
>
> Any other ideas or pointers?
>
> Thanks,
>    Felix
>


* Re: TTM leaking swap space?
       [not found]     ` <d497572f-bd28-d637-5b83-c8812d942e8e-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-17 16:52       ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2018-01-17 16:52 UTC (permalink / raw)
  To: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

It's on the amd-kfd-staging branch with some eviction fixes that are
still under developer testing, and a test that's still in development.
Not easy to share right now.

I'll look into ways to make it easier to reproduce. I'll experiment with
a driver change that deliberately swaps all BOs during allocation to see
if there is any obvious problem with swapping.

Regards,
  Felix


On 2018-01-16 10:24 PM, Chunming Zhou wrote:
> Hi Felix,
>
> Could I get your test so I can try it myself?
>
>
> Thanks,
>
> David Zhou
>
>
> On 2018-01-17 11:21, Felix Kuehling wrote:
>> I'm running an eviction stress test with KFD and find that sometimes it
>> starts swapping. When that happens, swap usage goes up rapidly, but it
>> never comes down. Even after the processes terminate, and all VRAM and
>> GTT allocations are freed (checked in
>> /sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
>> released.
>>
>> Running the test repeatedly I was able to trigger the OOM killer quite
>> easily. The system died with a panic, running out of processes to kill.
>>
>> The symptoms look like swap space is only allocated but never released.
>>
>> A quick look at the swapping code in ttm_tt.c doesn't show any obvious
>> problems. I'm assuming that fput should free swap space. That should
>> happen when BOs are swapped back in, or destroyed. As far as I can tell,
>> amdgpu doesn't use persistent swap space, so I'm ignoring
>> TTM_PAGE_FLAG_PERSISTENT_SWAP.
>>
>> Any other ideas or pointers?
>>
>> Thanks,
>>    Felix
>>
>


* Re: TTM leaking swap space?
       [not found] ` <5ce0f6cb-4c5e-2915-da39-a8037568edcb-5C7GfCeVMHo@public.gmane.org>
  2018-01-17  3:24   ` Chunming Zhou
@ 2018-01-17 20:33   ` Andrey Grodzovsky
       [not found]     ` <995fdb3c-27e5-4cd7-2a0b-cd686a575fde-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Andrey Grodzovsky @ 2018-01-17 20:33 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I have a private libdrm amdgpu test which allocates very big BOs in a loop
until all VRAM, GTT and swap are full, and I don't release them in the
test (yet).

Once the test process terminates, everything always gets cleared,
including swap. Could this point to a KFD-specific issue?

Thanks,

Andrey


On 01/16/2018 10:21 PM, Felix Kuehling wrote:
> I'm running an eviction stress test with KFD and find that sometimes it
> starts swapping. When that happens, swap usage goes up rapidly, but it
> never comes down. Even after the processes terminate, and all VRAM and
> GTT allocations are freed (checked in
> /sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
> released.
>
> Running the test repeatedly I was able to trigger the OOM killer quite
> easily. The system died with a panic, running out of processes to kill.
>
> The symptoms look like swap space is only allocated but never released.
>
> A quick look at the swapping code in ttm_tt.c doesn't show any obvious
> problems. I'm assuming that fput should free swap space. That should
> happen when BOs are swapped back in, or destroyed. As far as I can tell,
> amdgpu doesn't use persistent swap space, so I'm ignoring
> TTM_PAGE_FLAG_PERSISTENT_SWAP.
>
> Any other ideas or pointers?
>
> Thanks,
>    Felix
>


* Re: TTM leaking swap space?
       [not found]     ` <995fdb3c-27e5-4cd7-2a0b-cd686a575fde-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-17 23:50       ` Felix Kuehling
       [not found]         ` <a76d4e01-906a-179b-64b6-acec8526e7bd-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2018-01-17 23:50 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-01-17 03:33 PM, Andrey Grodzovsky wrote:
> I have a private libdrm amdgpu test which allocates very big BOs in a
> loop until all VRAM, GTT and swap are full, and I don't release them
> in the test (yet).
>
> Once the test process terminates, everything always gets cleared,
> including swap. Could this point to a KFD-specific issue?

That's possible.

I added some WARNs:

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5a046a3..d68141e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -175,7 +175,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
 
        if (ttm->state == tt_unbound)
                ttm_tt_unpopulate(ttm);
-
+        WARN_ON(ttm->page_flags & TTM_PAGE_FLAG_PERSISTENT_SWAP);
        if (!(ttm->page_flags & TTM_PAGE_FLAG_PERSISTENT_SWAP) &&
            ttm->swap_storage)
                fput(ttm->swap_storage);
@@ -321,6 +321,7 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
 
        return 0;
 out_err:
+        WARN(1, "Returning error, not freeing swap_storage");
        return ret;
 }
 
@@ -336,7 +337,8 @@ int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage)
        BUG_ON(ttm->state != tt_unbound && ttm->state != tt_unpopulated);
        BUG_ON(ttm->caching_state != tt_cached);
 
-       if (!persistent_swap_storage) {
+        if (!persistent_swap_storage) {
+                WARN(ttm->swap_storage, "already has swap storage");
                swap_storage = shmem_file_setup("ttm swap",
                                                ttm->num_pages << PAGE_SHIFT,
                                                0);


With these in place, I noticed that ttm_bo_swapout is getting called on
BOs that already have swap space. I think that means it's trying to swap
out a BO that's already swapped out, and that's where it leaks the
pointer to the already allocated swap space:

Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602083] ------------[ cut here ]------------
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602086] already has swap storage
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602124] WARNING: CPU: 8 PID: 1940 at /home/fkuehlin/compute/kernel/drivers/gpu/drm/ttm/ttm_tt.c:341 ttm_tt_swapout+0x230/0x250 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602126] Modules linked in: ip6_tables(E) ip_tables(E) x_tables(E) x86_pkg_temp_thermal(E) amdkfd(E) amd_iommu_v2(E) amdgpu(E) chash(E) gpu_sched(E) ttm(E)
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602139] CPU: 8 PID: 1940 Comm: kworker/u24:6 Tainted: G        W   E    4.15.0-rc2-kfd-fkuehlin #7
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602141] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 2006 04/07/2016
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602144] Workqueue: ttm_swap ttm_shrink_work [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602147] task: 00000000894fffc6 task.stack: 000000008f73bd43
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602150] RIP: 0010:ttm_tt_swapout+0x230/0x250 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602151] RSP: 0018:ffffa87a43633ce8 EFLAGS: 00010296
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602153] RAX: 0000000000000018 RBX: ffff90ba3af6b858 RCX: 0000000000000006
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602154] RDX: 0000000000001027 RSI: ffff90bbe3306df8 RDI: 0000000000000202
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602155] RBP: ffff90bbddabde00 R08: 0000000000000000 R09: 0000000000000000
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602156] R10: 000000005b2635d0 R11: 0000000000000000 R12: ffff90ba3af6b88c
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602157] R13: ffffa87a43633e3a R14: ffff90bbdda59d70 R15: ffff90bbdda59ce0
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602159] FS:  0000000000000000(0000) GS:ffff90bbe7400000(0000) knlGS:0000000000000000
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602161] CR2: 0000000001afbd58 CR3: 00000003f5010002 CR4: 00000000001606e0
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602162] Call Trace:
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602171]  ttm_bo_swapout+0x23a/0x260 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602175]  ? ttm_shrink+0xa8/0xf0 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602179]  ttm_shrink+0xb6/0xf0 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602184]  ttm_shrink_work+0x31/0x40 [ttm]
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602189]  process_one_work+0x19d/0x430
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602191]  ? process_one_work+0x136/0x430
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602196]  worker_thread+0x45/0x430
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602202]  kthread+0x134/0x170
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602204]  ? process_one_work+0x430/0x430
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602206]  ? kthread_delayed_work_timer_fn+0x80/0x80
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602212]  ret_from_fork+0x24/0x30
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602218] Code: 89 45 40 31 c0 e9 34 ff ff ff 48 c7 c7 a8 ea 13 c0 e8 0b ff f8 ed 8b 44 24 08 e9 1f ff ff ff 48 c7 c7 de fa 13 c0 e8 d0 85 f2 ed <0f> ff eb 80 48 89 ef e8 14 fc ff ff 48 8b 04 24 48 89 45 40 8b 
Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602268] ---[ end trace 019b6398cabc8266 ]---
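
The suspected leak can be illustrated with a small userspace model. This is only a sketch under assumptions: it assumes the non-persistent swap-out path allocates a fresh backing file on every call and then overwrites the swap_storage pointer, as the warning above suggests. All names here (model_file, model_tt, model_swapout and so on) are hypothetical stand-ins, not the TTM API.

```c
#include <stddef.h>

#define MODEL_MAX_FILES 16

/* Hypothetical stand-in for a shmem-backed swap file. */
struct model_file {
    int in_use;
};

/* Hypothetical stand-in for the part of struct ttm_tt that matters here. */
struct model_tt {
    struct model_file *swap_storage;
};

static struct model_file pool[MODEL_MAX_FILES];

/* Allocate a backing file from the fixed pool; NULL when exhausted. */
struct model_file *model_file_setup(void)
{
    for (size_t i = 0; i < MODEL_MAX_FILES; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Release a backing file, playing the role fput has in the real code. */
void model_fput(struct model_file *f)
{
    if (f)
        f->in_use = 0;
}

/* Swap-out without any "already swapped" check. */
int model_swapout(struct model_tt *tt)
{
    struct model_file *f = model_file_setup();

    if (!f)
        return -1;
    /* Any file previously stored here becomes unreachable. */
    tt->swap_storage = f;
    return 0;
}

/* Destroy path: releases only the file the pointer currently names. */
void model_destroy(struct model_tt *tt)
{
    model_fput(tt->swap_storage);
    tt->swap_storage = NULL;
}

/* Number of backing files that are still allocated. */
int model_live_files(void)
{
    int n = 0;

    for (size_t i = 0; i < MODEL_MAX_FILES; i++)
        n += pool[i].in_use;
    return n;
}
```

In this model, calling model_swapout twice on the same object and then model_destroy once leaves one file allocated with nothing pointing at it, which matches the observed pattern of swap usage that never comes back down.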

Regards,
  Felix


>
> Thanks,
>
> Andrey
>
>
> On 01/16/2018 10:21 PM, Felix Kuehling wrote:
>> I'm running an eviction stress test with KFD and find that sometimes it
>> starts swapping. When that happens, swap usage goes up rapidly, but it
>> never comes down. Even after the processes terminate, and all VRAM and
>> GTT allocations are freed (checked in
>> /sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
>> released.
>>
>> Running the test repeatedly I was able to trigger the OOM killer quite
>> easily. The system died with a panic, running out of processes to kill.
>>
>> The symptoms look like swap space is only allocated but never released.
>>
>> A quick look at the swapping code in ttm_tt.c doesn't show any obvious
>> problems. I'm assuming that fput should free swap space. That should
>> happen when BOs are swapped back in, or destroyed. As far as I can tell,
>> amdgpu doesn't use persistent swap space, so I'm ignoring
>> TTM_PAGE_FLAG_PERSISTENT_SWAP.
>>
>> Any other ideas or pointers?
>>
>> Thanks,
>>    Felix
>>
>


* Re: TTM leaking swap space?
       [not found]         ` <a76d4e01-906a-179b-64b6-acec8526e7bd-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-18  4:22           ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2018-01-18  4:22 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I find that ttm_bo_swapout tries to swap out the first BO on the swap LRU
list, but I can't find what removes a BO from that list once it has been
swapped. It looks like the same BO can be swapped out again, even when
it's already swapped. What am I missing?

Should swapped-out BOs still be on the swap LRU list? If so, we need to
add a condition in ttm_bo_swapout to prevent swapping out a BO that's
already swapped.
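
The kind of condition meant here can be sketched in a userspace model. This is not a patch against TTM: the list layout, the MODEL_FLAG_SWAPPED flag and the function name are all hypothetical, and it assumes swapped-out BOs do stay on the swap LRU list.

```c
#include <stddef.h>

/* Hypothetical flag, set once a BO has been swapped out. */
#define MODEL_FLAG_SWAPPED 0x1u

/* Hypothetical stand-in for a buffer object on the swap LRU list. */
struct model_bo {
    unsigned int flags;
    int swapouts;          /* how many times swap-out ran on this BO */
    struct model_bo *next; /* next entry on the model swap LRU list */
};

/*
 * Swap out the first BO on the list that is not already swapped.
 * Returns the BO that was swapped out, or NULL if none was eligible.
 */
struct model_bo *model_swapout_first(struct model_bo *lru_head)
{
    for (struct model_bo *bo = lru_head; bo; bo = bo->next) {
        /* The proposed condition: skip BOs that are already swapped. */
        if (bo->flags & MODEL_FLAG_SWAPPED)
            continue;
        bo->flags |= MODEL_FLAG_SWAPPED;
        bo->swapouts++;
        return bo;
    }
    return NULL;
}
```

With the check in place, repeated calls walk down the list and each BO is swapped out at most once. Without it, the first BO would be picked every time.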

Regards,
  Felix


On 2018-01-17 06:50 PM, Felix Kuehling wrote:
> On 2018-01-17 03:33 PM, Andrey Grodzovsky wrote:
>> I have a private libdrm amdgpu test which allocates very big BOs in a
>> loop until all VRAM, GTT and swap are full, and I don't release them
>> in the test (yet).
>>
>> Once the test process terminates, everything always gets cleared,
>> including swap. Could this point to a KFD-specific issue?
> That's possible.
>
> I added some WARNs:
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 5a046a3..d68141e 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -175,7 +175,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm)
>  
>         if (ttm->state == tt_unbound)
>                 ttm_tt_unpopulate(ttm);
> -
> +        WARN_ON(ttm->page_flags & TTM_PAGE_FLAG_PERSISTENT_SWAP);
>         if (!(ttm->page_flags & TTM_PAGE_FLAG_PERSISTENT_SWAP) &&
>             ttm->swap_storage)
>                 fput(ttm->swap_storage);
> @@ -321,6 +321,7 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
>  
>         return 0;
>  out_err:
> +        WARN(1, "Returning error, not freeing swap_storage");
>         return ret;
>  }
>  
> @@ -336,7 +337,8 @@ int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage)
>         BUG_ON(ttm->state != tt_unbound && ttm->state != tt_unpopulated);
>         BUG_ON(ttm->caching_state != tt_cached);
>  
> -       if (!persistent_swap_storage) {
> +        if (!persistent_swap_storage) {
> +                WARN(ttm->swap_storage, "already has swap storage");
>                 swap_storage = shmem_file_setup("ttm swap",
>                                                 ttm->num_pages << PAGE_SHIFT,
>                                                 0);
>
>
> With these in place, I noticed that ttm_bo_swapout is getting called on
> BOs that already have swap space. I think that means it's trying to swap
> out a BO that's already swapped out, and that's where it leaks the
> pointer to the already allocated swap space:
>
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602083] ------------[ cut here ]------------
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602086] already has swap storage
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602124] WARNING: CPU: 8 PID: 1940 at /home/fkuehlin/compute/kernel/drivers/gpu/drm/ttm/ttm_tt.c:341 ttm_tt_swapout+0x230/0x250 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602126] Modules linked in: ip6_tables(E) ip_tables(E) x_tables(E) x86_pkg_temp_thermal(E) amdkfd(E) amd_iommu_v2(E) amdgpu(E) chash(E) gpu_sched(E) ttm(E)
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602139] CPU: 8 PID: 1940 Comm: kworker/u24:6 Tainted: G        W   E    4.15.0-rc2-kfd-fkuehlin #7
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602141] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 2006 04/07/2016
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602144] Workqueue: ttm_swap ttm_shrink_work [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602147] task: 00000000894fffc6 task.stack: 000000008f73bd43
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602150] RIP: 0010:ttm_tt_swapout+0x230/0x250 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602151] RSP: 0018:ffffa87a43633ce8 EFLAGS: 00010296
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602153] RAX: 0000000000000018 RBX: ffff90ba3af6b858 RCX: 0000000000000006
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602154] RDX: 0000000000001027 RSI: ffff90bbe3306df8 RDI: 0000000000000202
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602155] RBP: ffff90bbddabde00 R08: 0000000000000000 R09: 0000000000000000
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602156] R10: 000000005b2635d0 R11: 0000000000000000 R12: ffff90ba3af6b88c
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602157] R13: ffffa87a43633e3a R14: ffff90bbdda59d70 R15: ffff90bbdda59ce0
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602159] FS:  0000000000000000(0000) GS:ffff90bbe7400000(0000) knlGS:0000000000000000
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602161] CR2: 0000000001afbd58 CR3: 00000003f5010002 CR4: 00000000001606e0
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602162] Call Trace:
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602171]  ttm_bo_swapout+0x23a/0x260 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602175]  ? ttm_shrink+0xa8/0xf0 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602179]  ttm_shrink+0xb6/0xf0 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602184]  ttm_shrink_work+0x31/0x40 [ttm]
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602189]  process_one_work+0x19d/0x430
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602191]  ? process_one_work+0x136/0x430
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602196]  worker_thread+0x45/0x430
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602202]  kthread+0x134/0x170
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602204]  ? process_one_work+0x430/0x430
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602206]  ? kthread_delayed_work_timer_fn+0x80/0x80
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602212]  ret_from_fork+0x24/0x30
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602218] Code: 89 45 40 31 c0 e9 34 ff ff ff 48 c7 c7 a8 ea 13 c0 e8 0b ff f8 ed 8b 44 24 08 e9 1f ff ff ff 48 c7 c7 de fa 13 c0 e8 d0 85 f2 ed <0f> ff eb 80 48 89 ef e8 14 fc ff ff 48 8b 04 24 48 89 45 40 8b 
> Jan 17 18:40:06 fkuehlin-hsatest2 kernel: [  196.602268] ---[ end trace 019b6398cabc8266 ]---
>
> Regards,
>   Felix
>
>
>> Thanks,
>>
>> Andrey
>>
>>
>> On 01/16/2018 10:21 PM, Felix Kuehling wrote:
>>> I'm running an eviction stress test with KFD and find that sometimes it
>>> starts swapping. When that happens, swap usage goes up rapidly, but it
>>> never comes down. Even after the processes terminate, and all VRAM and
>>> GTT allocations are freed (checked in
>>> /sys/kernel/debug/dri/0/amdgpu_{gtt|vram}_mm), swap space is still not
>>> released.
>>>
>>> Running the test repeatedly I was able to trigger the OOM killer quite
>>> easily. The system died with a panic, running out of processes to kill.
>>>
>>> The symptoms look like swap space is only allocated but never released.
>>>
>>> A quick look at the swapping code in ttm_tt.c doesn't show any obvious
>>> problems. I'm assuming that fput should free swap space. That should
>>> happen when BOs are swapped back in, or destroyed. As far as I can tell,
>>> amdgpu doesn't use persistent swap space, so I'm ignoring
>>> TTM_PAGE_FLAG_PERSISTENT_SWAP.
>>>
>>> Any other ideas or pointers?
>>>
>>> Thanks,
>>>    Felix
>>>

