* TTM allocation failure under memory pressure on suspend
@ 2019-05-29 19:18 Lorenz Brun
From: Lorenz Brun @ 2019-05-29 19:18 UTC
  To: amd-gfx@lists.freedesktop.org

Hi,

I have an RX 570 that fails to suspend properly under memory pressure; after waking up, the screens stay black.
It looks like an allocation failure during TTM's VRAM eviction is to blame:

[635471.240411] kworker/u24:26: page allocation failure: order:0, mode:0x620402(GFP_NOIO|__GFP_HIGHMEM|__GFP_RETRY_MAYFAIL|__GFP_HARDWALL), nodemask=(null),cpuset=/,mems_allowed=0
[635471.240416] CPU: 9 PID: 20884 Comm: kworker/u24:26 Tainted: P           OE     5.0.0-13-generic #14-Ubuntu
[635471.240417] Hardware name: MSI MS-7885/X99A SLI PLUS(MS-7885), BIOS 1.80 03/20/2015
[635471.240421] Workqueue: events_unbound async_run_entry_fn
[635471.240421] Call Trace:
[635471.240426]  dump_stack+0x63/0x8a
[635471.240428]  warn_alloc.cold.119+0x7b/0xfb
[635471.240429]  __alloc_pages_slowpath+0xe63/0xea0
[635471.240432]  ? flush_tlb_all+0x1c/0x20
[635471.240433]  ? change_page_attr_set_clr+0x164/0x1f0
[635471.240434]  __alloc_pages_nodemask+0x2c4/0x2e0
[635471.240437]  alloc_pages_current+0x81/0xe0
[635471.240442]  ttm_alloc_new_pages.isra.16+0x95/0x1e0 [ttm]
[635471.240444]  ttm_page_pool_get_pages+0x16b/0x380 [ttm]
[635471.240446]  ttm_pool_populate+0x1a3/0x4a0 [ttm]
[635471.240448]  ttm_populate_and_map_pages+0x28/0x250 [ttm]
[635471.240450]  ? ttm_dma_tt_alloc_page_directory+0x2d/0x60 [ttm]
[635471.240490]  amdgpu_ttm_tt_populate+0x56/0xe0 [amdgpu]
[635471.240493]  ttm_tt_populate.part.9+0x22/0x60 [ttm]
[635471.240495]  ttm_tt_bind+0x4f/0x60 [ttm]
[635471.240497]  ttm_bo_handle_move_mem+0x26c/0x500 [ttm]
[635471.240499]  ttm_bo_evict+0x142/0x1c0 [ttm]
[635471.240501]  ttm_mem_evict_first+0x19a/0x220 [ttm]
[635471.240504]  ttm_bo_force_list_clean+0xa1/0x170 [ttm]
[635471.240506]  ttm_bo_evict_mm+0x2e/0x30 [ttm]
[635471.240531]  amdgpu_bo_evict_vram+0x1a/0x20 [amdgpu]
[635471.240554]  amdgpu_device_suspend+0x1dd/0x3d0 [amdgpu]
[635471.240578]  amdgpu_pmops_suspend+0x1f/0x30 [amdgpu]
[635471.240579]  pci_pm_suspend+0x76/0x130
[635471.240580]  ? pci_pm_freeze+0xf0/0xf0
[635471.240582]  dpm_run_callback+0x66/0x150
[635471.240582]  __device_suspend+0x110/0x490
[635471.240583]  async_suspend+0x1f/0x90
[635471.240584]  async_run_entry_fn+0x3c/0x150
[635471.240586]  process_one_work+0x20f/0x410
[635471.240587]  worker_thread+0x34/0x400
[635471.240589]  kthread+0x120/0x140
[635471.240589]  ? process_one_work+0x410/0x410
[635471.240591]  ? __kthread_parkme+0x70/0x70
[635471.240592]  ret_from_fork+0x35/0x40
…
[635471.241994] [TTM] Buffer eviction failed
[635471.627554] [TTM] Buffer eviction failed

Afterwards, it fails to wake up (all three screens stay black) because of an initialization failure:

[635472.216323] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[635472.216354] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[635472.216384] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
[635472.216387] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
[635472.216390] PM: Device 0000:04:00.0 failed to resume async: error -110

I’m pretty sure the problem is the GFP_NOIO mask: it prevents the kernel from swapping anything out to free memory, so it eventually gives up trying to satisfy the allocation. I usually run under considerable memory pressure with a lot of swap (32 GiB RAM + 48 GiB swap; more than 48 GiB of memory usage is normal, so at least 16 GiB of it has to live in swap). I have looked at the code in question, but I’m not sure where the flag is coming from: it seems that neither TTM nor amdgpu sets GFP_NOIO. TTM seems to have per-pool allocation flags, and somehow GFP_NOIO is getting enabled there for the amdgpu pool.

Thanks,
Lorenz
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
