* [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: Mikhail Gavrilov @ 2021-01-30 23:17 UTC
  To: ckoenig.leichtzumerken; +Cc: Linux List Kernel Mailing, amd-gfx list

The 5.11-rc5 (git 76c057c84d28) brought a new issue.
Now the kernel log is flooded with the message "page allocation failure".

Trace:
msedge:cs0: page allocation failure: order:10, mode:0x190cc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 18 PID: 4540 Comm: msedge:cs0 Tainted: G        W        --------- ---  5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
Call Trace:
 dump_stack+0x8b/0xb0
 warn_alloc.cold+0x72/0xd6
 ? _cond_resched+0x16/0x50
 ? __alloc_pages_direct_compact+0x1a1/0x210
 __alloc_pages_slowpath.constprop.0+0xf64/0xf90
 ? kmem_cache_alloc+0x299/0x310
 ? lock_acquire+0x173/0x380
 ? trace_hardirqs_on+0x1b/0xe0
 ? lock_release+0x1e9/0x400
 __alloc_pages_nodemask+0x37d/0x400
 ttm_pool_alloc+0x2a3/0x630 [ttm]
 ttm_tt_populate+0x37/0xe0 [ttm]
 ttm_bo_handle_move_mem+0x142/0x180 [ttm]
 ttm_bo_evict+0x12e/0x1b0 [ttm]
 ? kfree+0xeb/0x660
 ? amdgpu_vram_mgr_new+0x34d/0x3d0 [amdgpu]
 ttm_mem_evict_first+0x101/0x4d0 [ttm]
 ttm_bo_mem_space+0x2c8/0x330 [ttm]
 ttm_bo_validate+0x163/0x1c0 [ttm]
 amdgpu_cs_bo_validate+0x82/0x190 [amdgpu]
 amdgpu_cs_list_validate+0x105/0x150 [amdgpu]
 amdgpu_cs_ioctl+0x803/0x1ef0 [amdgpu]
 ? trace_hardirqs_off_caller+0x41/0xd0
 ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
 drm_ioctl_kernel+0x8c/0xe0 [drm]
 drm_ioctl+0x20f/0x3c0 [drm]
 ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
 ? selinux_file_ioctl+0x147/0x200
 ? lock_acquired+0x1fa/0x380
 ? lock_release+0x1e9/0x400
 ? trace_hardirqs_on+0x1b/0xe0
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 __x64_sys_ioctl+0x82/0xb0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f829c36c11b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 25 bd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f8282c14f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f8282c14fa0 RCX: 00007f829c36c11b
RDX: 00007f8282c14fa0 RSI: 00000000c0186444 RDI: 0000000000000018
RBP: 00000000c0186444 R08: 00007f8282c15640 R09: 00007f8282c14f80
R10: 0000000000000000 R11: 0000000000000246 R12: 00001f592c0fe088
R13: 0000000000000018 R14: 0000000000000000 R15: 00000000fffffffd
Mem-Info:
active_anon:24325 inactive_anon:3569299 isolated_anon:0
 active_file:704540 inactive_file:2709725 isolated_file:0
 unevictable:1230 dirty:256317 writeback:7074
 slab_reclaimable:222328 slab_unreclaimable:112852
 mapped:838359 shmem:469422 pagetables:47722 bounce:0
 free:107165 free_pcp:1298 free_cma:0
Node 0 active_anon:97300kB inactive_anon:14277196kB active_file:2818160kB inactive_file:10838900kB unevictable:4920kB isolated(anon):0kB isolated(file):0kB mapped:3353436kB dirty:1025268kB writeback:28296kB shmem:1877688kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:62528kB pagetables:190888kB all_unreclaimable? no
Node 0 DMA free:11800kB min:32kB low:44kB high:56kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15900kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3056 31787 31787 31787
Node 0 DMA32 free:303044kB min:6492kB low:9620kB high:12748kB reserved_highatomic:0KB active_anon:20kB inactive_anon:1322808kB active_file:5136kB inactive_file:483136kB unevictable:0kB writepending:220876kB present:3314552kB managed:3246620kB mlocked:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 28731 28731 28731
Node 0 Normal free:113816kB min:61052kB low:90472kB high:119892kB reserved_highatomic:0KB active_anon:97280kB inactive_anon:12953852kB active_file:2812656kB inactive_file:10355000kB unevictable:4920kB writepending:832688kB present:30133248kB managed:29421044kB mlocked:4920kB bounce:0kB free_pcp:5180kB local_pcp:4kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB
Node 0 DMA32: 1009*4kB (UME) 724*8kB (UME) 488*16kB (UME) 1111*32kB (UME) 950*64kB (UME) 620*128kB (UME) 223*256kB (UME) 74*512kB (M) 11*1024kB (M) 2*2048kB (ME) 0*4096kB = 303684kB
Node 0 Normal: 964*4kB (UME) 719*8kB (ME) 379*16kB (UME) 192*32kB (UME) 127*64kB (UME) 130*128kB (UME) 122*256kB (UME) 18*512kB (UME) 4*1024kB (UM) 11*2048kB (UM) 0*4096kB = 113656kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
3881804 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 67108860kB
Total swap = 67108860kB
8365948 pages RAM
0 pages HighMem/MovableOnly
195057 pages reserved
0 pages cma reserved
0 pages hwpoisoned
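
For reference, the per-zone free lists in the Mem-Info dump above can be cross-checked with a short script. An order-10 request asks for 2^10 contiguous pages, which is 4096kB with 4kB pages. The Python sketch below parses the three free-list lines (copied from the log) and reports, for each zone, how many order-10 blocks are free and the largest order that has any free block. It assumes 4kB pages and ignores watermarks, lowmem reserves and migrate types, so it only approximates what the page allocator would decide. The helper names are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB base pages, as on x86_64

# Per-zone free-list lines copied from the log, with the mail wrapping undone.
FREE_LISTS = [
    "Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) "
    "1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB",
    "Node 0 DMA32: 1009*4kB (UME) 724*8kB (UME) 488*16kB (UME) 1111*32kB "
    "(UME) 950*64kB (UME) 620*128kB (UME) 223*256kB (UME) 74*512kB (M) "
    "11*1024kB (M) 2*2048kB (ME) 0*4096kB = 303684kB",
    "Node 0 Normal: 964*4kB (UME) 719*8kB (ME) 379*16kB (UME) 192*32kB "
    "(UME) 127*64kB (UME) 130*128kB (UME) 122*256kB (UME) 18*512kB (UME) "
    "4*1024kB (UM) 11*2048kB (UM) 0*4096kB = 113656kB",
]


def order_of(block_kb):
    """Buddy order of a block: order n covers 2**n base pages."""
    return (block_kb // PAGE_KB).bit_length() - 1


def parse_free_list(line):
    """Return (zone, {order: free block count}) for one free-list line."""
    zone = line.split(":")[0].split()[-1]
    counts = {order_of(int(kb)): int(n)
              for n, kb in re.findall(r"(\d+)\*(\d+)kB", line)}
    return zone, counts


def summarize(lines):
    """Map zone -> (free order-10 blocks, largest order with a free block)."""
    summary = {}
    for line in lines:
        zone, counts = parse_free_list(line)
        largest = max(o for o, n in counts.items() if n > 0)
        summary[zone] = (counts.get(10, 0), largest)
    return summary


if __name__ == "__main__":
    for zone, (order10, largest) in summarize(FREE_LISTS).items():
        print(f"{zone}: free order-10 blocks = {order10}, "
              f"largest free order = {largest}")
```

In this snapshot the DMA32 and Normal zones have no free order-10 block, while order-9 blocks are available. That fits an allocator that fails at order 10 and then succeeds with a smaller order. The DMA zone does list two order-10 blocks, but that small zone is normally shielded from ordinary allocations by lowmem_reserve.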

Full kernel log: https://pastebin.com/dJEzxzQ7

$ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_pool_alloc+0x2a3
ttm_pool_alloc+0x2a3/0x630:
alloc_pages at /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/./include/linux/gfp.h:547
(inlined by) ttm_pool_alloc_page at /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/drivers/gpu/drm/ttm/ttm_pool.c:91
(inlined by) ttm_pool_alloc at /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/drivers/gpu/drm/ttm/ttm_pool.c:383
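
A note on the notation: each call-trace entry has the form symbol+offset/size, optionally followed by the module name in brackets, and faddr2line takes the symbol+offset part as its argument. Entries prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain. The Python sketch below splits an entry into those fields. The regular expression and the function names are illustrative assumptions, not part of any kernel tool.

```python
import re

# symbol+offset/size, optionally preceded by "? " and followed by "[module]"
ENTRY_RE = re.compile(
    r"^\s*(?P<guess>\?\s+)?(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)"
    r"/(?P<size>0x[0-9a-f]+)(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_entry(line):
    """Split one call-trace line into its fields, or return None if it is not one."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),   # bytes into the function
        "size": int(m.group("size"), 16),       # total size of the function
        "module": m.group("module"),            # None for built-in code
        "reliable": m.group("guess") is None,   # False for "?" entries
    }


def faddr2line_arg(entry):
    """Rebuild the symbol+offset string that faddr2line expects."""
    return f"{entry['symbol']}+{entry['offset']:#x}"


if __name__ == "__main__":
    entry = parse_entry(" ttm_pool_alloc+0x2a3/0x630 [ttm]")
    print(entry)
    print(faddr2line_arg(entry))
```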



-- 
Best Regards,
Mike Gavrilov.


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: David Rientjes @ 2021-01-31  1:01 UTC
  To: Mikhail Gavrilov
  Cc: Christian König, ckoenig.leichtzumerken,
	Linux List Kernel Mailing, amd-gfx list

On Sun, 31 Jan 2021, Mikhail Gavrilov wrote:

> The 5.11-rc5 (git 76c057c84d28) brought a new issue.
> Now the kernel log is flooded with the message "page allocation failure".
> 
> Trace:
> msedge:cs0: page allocation failure: order:10,

Order-10, wow!

ttm_pool_alloc() will start at order-10 and back off trying smaller orders 
if necessary.  This is a regression introduced in

commit bf9eee249ac2032521677dd74e31ede5429afbc0
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Jan 13 14:02:04 2021 +0100

    drm/ttm: stop using GFP_TRANSHUGE_LIGHT

Namely, it removed the __GFP_NOWARN that we otherwise require.  I'll send 
a patch in reply.
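
The absence of the flag is visible in the reported mask itself. As a sketch, assuming the flag bit values from include/linux/gfp.h around v5.11 (they are renumbered between releases, so the table below is an assumption rather than a stable interface), the mode value 0x190cc2 from the log can be decoded in Python like this:

```python
# Assumed bit values, taken from include/linux/gfp.h as of roughly v5.11.
GFP_BITS = {
    "__GFP_DMA": 0x01,
    "__GFP_HIGHMEM": 0x02,
    "__GFP_DMA32": 0x04,
    "__GFP_MOVABLE": 0x08,
    "__GFP_RECLAIMABLE": 0x10,
    "__GFP_HIGH": 0x20,
    "__GFP_IO": 0x40,
    "__GFP_FS": 0x80,
    "__GFP_ZERO": 0x100,
    "__GFP_ATOMIC": 0x200,
    "__GFP_DIRECT_RECLAIM": 0x400,
    "__GFP_KSWAPD_RECLAIM": 0x800,
    "__GFP_WRITE": 0x1000,
    "__GFP_NOWARN": 0x2000,
    "__GFP_RETRY_MAYFAIL": 0x4000,
    "__GFP_NOFAIL": 0x8000,
    "__GFP_NORETRY": 0x10000,
    "__GFP_MEMALLOC": 0x20000,
    "__GFP_COMP": 0x40000,
    "__GFP_NOMEMALLOC": 0x80000,
    "__GFP_HARDWALL": 0x100000,
    "__GFP_THISNODE": 0x200000,
    "__GFP_ACCOUNT": 0x400000,
}


def decode_gfp(mask):
    """Return the names of all flags set in mask, lowest bit first."""
    return [name for name, bit in GFP_BITS.items() if mask & bit]


if __name__ == "__main__":
    mode = 0x190cc2  # value reported in the allocation failure message
    flags = decode_gfp(mode)
    print(" | ".join(flags))
    print("__GFP_NOWARN set:", "__GFP_NOWARN" in flags)
```

Under that assumption the decoded set corresponds to the GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC string printed by the kernel and does not contain __GFP_NOWARN, so each failed high-order attempt in the back-off sequence is reported in the log.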

> mode:0x190cc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC),
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 18 PID: 4540 Comm: msedge:cs0 Tainted: G        W
> --------- ---  5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 3402 01/13/2021
> Call Trace:
>  dump_stack+0x8b/0xb0
>  warn_alloc.cold+0x72/0xd6
>  ? _cond_resched+0x16/0x50
>  ? __alloc_pages_direct_compact+0x1a1/0x210
>  __alloc_pages_slowpath.constprop.0+0xf64/0xf90
>  ? kmem_cache_alloc+0x299/0x310
>  ? lock_acquire+0x173/0x380
>  ? trace_hardirqs_on+0x1b/0xe0
>  ? lock_release+0x1e9/0x400
>  __alloc_pages_nodemask+0x37d/0x400
>  ttm_pool_alloc+0x2a3/0x630 [ttm]
>  ttm_tt_populate+0x37/0xe0 [ttm]
>  ttm_bo_handle_move_mem+0x142/0x180 [ttm]
>  ttm_bo_evict+0x12e/0x1b0 [ttm]
>  ? kfree+0xeb/0x660
>  ? amdgpu_vram_mgr_new+0x34d/0x3d0 [amdgpu]
>  ttm_mem_evict_first+0x101/0x4d0 [ttm]
>  ttm_bo_mem_space+0x2c8/0x330 [ttm]
>  ttm_bo_validate+0x163/0x1c0 [ttm]
>  amdgpu_cs_bo_validate+0x82/0x190 [amdgpu]
>  amdgpu_cs_list_validate+0x105/0x150 [amdgpu]
>  amdgpu_cs_ioctl+0x803/0x1ef0 [amdgpu]
>  ? trace_hardirqs_off_caller+0x41/0xd0
>  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>  drm_ioctl_kernel+0x8c/0xe0 [drm]
>  drm_ioctl+0x20f/0x3c0 [drm]
>  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>  ? selinux_file_ioctl+0x147/0x200
>  ? lock_acquired+0x1fa/0x380
>  ? lock_release+0x1e9/0x400
>  ? trace_hardirqs_on+0x1b/0xe0
>  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
>  __x64_sys_ioctl+0x82/0xb0
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f829c36c11b
> Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c
> c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d 25 bd 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007f8282c14f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f8282c14fa0 RCX: 00007f829c36c11b
> RDX: 00007f8282c14fa0 RSI: 00000000c0186444 RDI: 0000000000000018
> RBP: 00000000c0186444 R08: 00007f8282c15640 R09: 00007f8282c14f80
> R10: 0000000000000000 R11: 0000000000000246 R12: 00001f592c0fe088
> R13: 0000000000000018 R14: 0000000000000000 R15: 00000000fffffffd
> Mem-Info:
> active_anon:24325 inactive_anon:3569299 isolated_anon:0
>  active_file:704540 inactive_file:2709725 isolated_file:0
>  unevictable:1230 dirty:256317 writeback:7074
>  slab_reclaimable:222328 slab_unreclaimable:112852
>  mapped:838359 shmem:469422 pagetables:47722 bounce:0
>  free:107165 free_pcp:1298 free_cma:0
> Node 0 active_anon:97300kB inactive_anon:14277196kB
> active_file:2818160kB inactive_file:10838900kB unevictable:4920kB
> isolated(anon):0kB isolated(file):0kB mapped:3353436kB dirty:1025268kB
> writeback:28296kB shmem:1877688kB shmem_thp: 0kB shmem_pmdmapped: 0kB
> anon_thp: 0kB writeback_tmp:0kB kernel_stack:62528kB
> pagetables:190888kB all_unreclaimable? no
> Node 0 DMA free:11800kB min:32kB low:44kB high:56kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:15992kB managed:15900kB mlocked:0kB bounce:0kB free_pcp:0kB
> local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 3056 31787 31787 31787
> Node 0 DMA32 free:303044kB min:6492kB low:9620kB high:12748kB
> reserved_highatomic:0KB active_anon:20kB inactive_anon:1322808kB
> active_file:5136kB inactive_file:483136kB unevictable:0kB
> writepending:220876kB present:3314552kB managed:3246620kB mlocked:0kB
> bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 0 28731 28731 28731
> Node 0 Normal free:113816kB min:61052kB low:90472kB high:119892kB
> reserved_highatomic:0KB active_anon:97280kB inactive_anon:12953852kB
> active_file:2812656kB inactive_file:10355000kB unevictable:4920kB
> writepending:832688kB present:30133248kB managed:29421044kB
> mlocked:4920kB bounce:0kB free_pcp:5180kB local_pcp:4kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0 0
> Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U)
> 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB
> Node 0 DMA32: 1009*4kB (UME) 724*8kB (UME) 488*16kB (UME) 1111*32kB
> (UME) 950*64kB (UME) 620*128kB (UME) 223*256kB (UME) 74*512kB (M)
> 11*1024kB (M) 2*2048kB (ME) 0*4096kB = 303684kB
> Node 0 Normal: 964*4kB (UME) 719*8kB (ME) 379*16kB (UME) 192*32kB
> (UME) 127*64kB (UME) 130*128kB (UME) 122*256kB (UME) 18*512kB (UME)
> 4*1024kB (UM) 11*2048kB (UM) 0*4096kB = 113656kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 3881804 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 67108860kB
> Total swap = 67108860kB
> 8365948 pages RAM
> 0 pages HighMem/MovableOnly
> 195057 pages reserved
> 0 pages cma reserved
> 0 pages hwpoisoned
> 
> Full kernel log: https://pastebin.com/dJEzxzQ7
> 
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname
> -r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_pool_alloc+0x2a3
> ttm_pool_alloc+0x2a3/0x630:
> alloc_pages at /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/./include/linux/gfp.h:547
> (inlined by) ttm_pool_alloc_page at
> /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/drivers/gpu/drm/ttm/ttm_pool.c:91
> (inlined by) ttm_pool_alloc at
> /usr/src/debug/kernel-20210128git76c057c84d28/linux-5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64/drivers/gpu/drm/ttm/ttm_pool.c:383
> 
> 
> 
> -- 
> Best Regards,
> Mike Gavrilov.
> 


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: David Rientjes @ 2021-01-31  1:03 UTC
  To: Mikhail Gavrilov
  Cc: Christian König, ckoenig.leichtzumerken,
	Linux List Kernel Mailing, amd-gfx list

On Sat, 30 Jan 2021, David Rientjes wrote:

> On Sun, 31 Jan 2021, Mikhail Gavrilov wrote:
> 
> > The 5.11-rc5 (git 76c057c84d28) brought a new issue.
> > Now the kernel log is flooded with the message "page allocation failure".
> > 
> > Trace:
> > msedge:cs0: page allocation failure: order:10,
> 
> Order-10, wow!
> 
> ttm_pool_alloc() will start at order-10 and back off trying smaller orders 
> if necessary.  This is a regression introduced in
> 
> commit bf9eee249ac2032521677dd74e31ede5429afbc0
> Author: Christian König <christian.koenig@amd.com>
> Date:   Wed Jan 13 14:02:04 2021 +0100
> 
>     drm/ttm: stop using GFP_TRANSHUGE_LIGHT
> 
> Namely, it removed the __GFP_NOWARN that we otherwise require.  I'll send 
> a patch in reply.
> 

Looks like Michel Dänzer <michel@daenzer.net> already sent a patch that 
should fix this:
https://lore.kernel.org/lkml/20210128095346.2421-1-michel@daenzer.net/


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: Christian König @ 2021-02-03 13:22 UTC
  To: David Rientjes, Mikhail Gavrilov
  Cc: Christian König, Linux List Kernel Mailing, amd-gfx list

On 31.01.21 at 02:03, David Rientjes wrote:
> On Sat, 30 Jan 2021, David Rientjes wrote:
>
>> On Sun, 31 Jan 2021, Mikhail Gavrilov wrote:
>>
>>> The 5.11-rc5 (git 76c057c84d28) brought a new issue.
>>> Now the kernel log is flooded with the message "page allocation failure".
>>>
>>> Trace:
>>> msedge:cs0: page allocation failure: order:10,
>> Order-10, wow!
>>
>> ttm_pool_alloc() will start at order-10 and back off trying smaller orders
>> if necessary.  This is a regression introduced in
>>
>> commit bf9eee249ac2032521677dd74e31ede5429afbc0
>> Author: Christian König <christian.koenig@amd.com>
>> Date:   Wed Jan 13 14:02:04 2021 +0100
>>
>>      drm/ttm: stop using GFP_TRANSHUGE_LIGHT
>>
>> Namely, it removed the __GFP_NOWARN that we otherwise require.  I'll send
>> a patch in reply.
>>
> Looks like Michel Dänzer <michel@daenzer.net> already sent a patch that
> should fix this:
> https://lore.kernel.org/lkml/20210128095346.2421-1-michel@daenzer.net/

Yeah, known issue. I already pushed Michel's fix to drm-misc-fixes. 
Should land in the next -rc by the weekend.

Regards,
Christian.


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: Mikhail Gavrilov @ 2021-02-06 18:17 UTC
  To: Christian König
  Cc: David Rientjes, Linux List Kernel Mailing, amd-gfx list

On Sun, 31 Jan 2021 at 22:22, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
>
> Yeah, known issue. I already pushed Michel's fix to drm-misc-fixes.
> Should land in the next -rc by the weekend.
>
> Regards,
> Christian.

I tested this patch [1] for several days
and can confirm that the reported issue is gone.

[1] https://lore.kernel.org/lkml/20210128095346.2421-1-michel@daenzer.net/

-- 
Best Regards,
Mike Gavrilov.


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: Christian König @ 2021-02-08  9:18 UTC
  To: Mikhail Gavrilov, Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list, David Rientjes

On 06.02.21 at 19:17, Mikhail Gavrilov wrote:
> On Sun, 31 Jan 2021 at 22:22, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>> Yeah, known issue. I already pushed Michel's fix to drm-misc-fixes.
>> Should land in the next -rc by the weekend.
>>
>> Regards,
>> Christian.
> I tested this patch [1] for several days
> and can confirm that the reported issue is gone.
>
> [1] https://lore.kernel.org/lkml/20210128095346.2421-1-michel@daenzer.net/

Are the other problems gone as well?

Regards,
Christian.


* Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]
From: Mikhail Gavrilov @ 2021-02-09 19:22 UTC
  To: Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list, David Rientjes

On Mon, 8 Feb 2021 at 14:18, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Are the other problems gone as well?
>

Yes and no.
The issue with the monitor turning off went away after rc6 (git 3aaf0a27ffc2),
but both of the following traces still appear on every boot:
1) BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 (kernel 5.11 specific)
2) WARNING: CPU: 14 PID: 504 at kernel/locking/lockdep.c:4618 lockdep_init_map_waits+0x18b/0x210 (Navi specific)

1)
[    5.806032] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[    5.806048] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 504, name: systemd-udevd
[    5.806064] 1 lock held by systemd-udevd/504:
[    5.806073]  #0: ffff9c5ac2e4f258 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x3b/0xb0
[    5.806097] CPU: 14 PID: 504 Comm: systemd-udevd Not tainted 5.11.0-0.rc6.20210204git61556703b610.145.fc34.x86_64 #1
[    5.806117] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[    5.806135] Call Trace:
[    5.806142]  dump_stack+0x8b/0xb0
[    5.806153]  ___might_sleep.cold+0xb6/0xc6
[    5.806163]  ? dcn30_clock_source_create+0x34/0xb0 [amdgpu]
[    5.806338]  kmem_cache_alloc_trace+0x204/0x230
[    5.806353]  dcn30_clock_source_create+0x34/0xb0 [amdgpu]
[    5.806516]  dcn30_create_resource_pool+0x1de/0x13b0 [amdgpu]
[    5.806678]  ? rcu_read_lock_sched_held+0x3f/0x80
[    5.806690]  ? trace_kmalloc+0xb2/0xe0
[    5.806699]  ? __kmalloc+0x191/0x280
[    5.806710]  ? dc_create_resource_pool+0x110/0x1d0 [amdgpu]
[    5.806869]  dc_create_resource_pool+0x110/0x1d0 [amdgpu]
[    5.807026]  dc_create+0x205/0x790 [amdgpu]
[    5.807181]  ? trace_kmalloc+0xb2/0xe0
[    5.807190]  ? kmem_cache_alloc_trace+0x174/0x230
[    5.807203]  amdgpu_dm_init.isra.0+0x1b9/0x250 [amdgpu]
[    5.807369]  ? dev_vprintk_emit+0x171/0x195
[    5.807385]  ? dev_printk_emit+0x3e/0x40
[    5.807403]  dm_hw_init+0xe/0x20 [amdgpu]
[    5.807563]  amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
[    5.807728]  ? pci_conf1_read+0x9b/0xf0
[    5.807741]  amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
[    5.807877]  amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
[    5.808009]  local_pci_probe+0x42/0x80
[    5.808020]  pci_device_probe+0xd9/0x1a0
[    5.808031]  really_probe+0xf2/0x440
[    5.808042]  driver_probe_device+0xe1/0x150
[    5.808053]  device_driver_attach+0xa8/0xb0
[    5.808063]  __driver_attach+0x8c/0x150
[    5.808071]  ? device_driver_attach+0xb0/0xb0
[    5.808080]  ? device_driver_attach+0xb0/0xb0
[    5.808090]  bus_for_each_dev+0x67/0x90
[    5.808101]  bus_add_driver+0x12e/0x1f0
[    5.808111]  driver_register+0x8f/0xe0
[    5.808119]  ? 0xffffffffc0c02000
[    5.808128]  do_one_initcall+0x67/0x320
[    5.808138]  ? rcu_read_lock_sched_held+0x3f/0x80
[    5.808148]  ? trace_kmalloc+0xb2/0xe0
[    5.808157]  ? kmem_cache_alloc_trace+0x174/0x230
[    5.808169]  do_init_module+0x5c/0x270
[    5.808179]  __do_sys_init_module+0x130/0x190
[    5.808196]  do_syscall_64+0x33/0x40
[    5.808205]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    5.808216] RIP: 0033:0x7f4d133aa40e
[    5.808225] Code: 48 8b 0d 65 1a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 32 1a 0c 00 f7 d8 64 89 01 48
[    5.808256] RSP: 002b:00007ffc81317fb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    5.808272] RAX: ffffffffffffffda RBX: 0000563f79509ee0 RCX: 00007f4d133aa40e
[    5.808285] RDX: 0000563f7951daa0 RSI: 0000000000b8a85e RDI: 0000563f79f03db0
[    5.808298] RBP: 0000563f79f03db0 R08: 0000563f79509fd0 R09: 00007ffc813146be
[    5.808311] R10: 0000563a1aa70959 R11: 0000000000000246 R12: 0000563f7951daa0
[    5.808324] R13: 0000563f7950e9c0 R14: 0000000000000000 R15: 0000563f7951f100


2)
[    6.064107] BUG: key ffff9c5adb339148 has not been registered!
[    6.064119] ------------[ cut here ]------------
[    6.064121] DEBUG_LOCKS_WARN_ON(1)
[    6.064124] WARNING: CPU: 14 PID: 504 at kernel/locking/lockdep.c:4618 lockdep_init_map_waits+0x18b/0x210
[    6.064131] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec igb drm ghash_clmulni_intel ccp nvme dca i2c_algo_bit nvme_core wmi pinctrl_amd fuse
[    6.064147] CPU: 14 PID: 504 Comm: systemd-udevd Tainted: G        W        --------- ---  5.11.0-0.rc6.20210204git61556703b610.145.fc34.x86_64 #1
[    6.064152] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[    6.064156] RIP: 0010:lockdep_init_map_waits+0x18b/0x210
[    6.064159] Code: 00 85 c0 0f 84 77 ff ff ff 8b 3d 08 5e f1 01 85 ff 0f 85 69 ff ff ff 48 c7 c6 cc 98 60 9a 48 c7 c7 7d d4 5a 9a e8 51 3a b7 00 <0f> 0b e9 4f ff ff ff e8 c9 82 bd 00 85 c0 74 21 44 8b 15 d6 5d f1
[    6.064165] RSP: 0018:ffffbba701be78c8 EFLAGS: 00010292
[    6.064168] RAX: 0000000000000016 RBX: ffffffff9a247b80 RCX: 0000000000000027
[    6.064171] RDX: ffff9c61c87db2a8 RSI: 0000000000000001 RDI: ffff9c61c87db2a0
[    6.064174] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffbba701be7700
[    6.064177] R10: ffffbba701be76f8 R11: 0000000000000000 R12: ffff9c5adb339148
[    6.064180] R13: 0000000000000000 R14: ffff9c5adb610348 R15: ffff9c5adb610348
[    6.064183] FS:  00007f4d1279c340(0000) GS:ffff9c61c8600000(0000) knlGS:0000000000000000
[    6.064186] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.064189] CR2: 0000563f79657000 CR3: 0000000111396000 CR4: 0000000000350ee0
[    6.064192] Call Trace:
[    6.064194]  __kernfs_create_file+0x7b/0x100
[    6.064198]  sysfs_add_file_mode_ns+0xa2/0x190
[    6.064202]  sysfs_create_bin_file+0x50/0x70
[    6.064205]  hdcp_create_workqueue+0x3bd/0x410 [amdgpu]
[    6.064365]  amdgpu_dm_init.isra.0.cold+0x293/0x13e7 [amdgpu]
[    6.064526]  ? dev_vprintk_emit+0x171/0x195
[    6.064529]  ? psp_set_srm+0xb0/0xb0 [amdgpu]
[    6.064691]  ? hdcp_update_display+0x1f0/0x1f0 [amdgpu]
[    6.064847]  ? dev_printk_emit+0x3e/0x40
[    6.064851]  dm_hw_init+0xe/0x20 [amdgpu]
[    6.065005]  amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
[    6.065160]  ? pci_conf1_read+0x9b/0xf0
[    6.065164]  amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
[    6.065291]  amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
[    6.065415]  local_pci_probe+0x42/0x80
[    6.065418]  pci_device_probe+0xd9/0x1a0
[    6.065421]  really_probe+0xf2/0x440
[    6.065425]  driver_probe_device+0xe1/0x150
[    6.065428]  device_driver_attach+0xa8/0xb0
[    6.065431]  __driver_attach+0x8c/0x150
[    6.065433]  ? device_driver_attach+0xb0/0xb0
[    6.065435]  ? device_driver_attach+0xb0/0xb0
[    6.065438]  bus_for_each_dev+0x67/0x90
[    6.065441]  bus_add_driver+0x12e/0x1f0
[    6.065445]  driver_register+0x8f/0xe0
[    6.065447]  ? 0xffffffffc0c02000
[    6.065449]  do_one_initcall+0x67/0x320
[    6.065452]  ? rcu_read_lock_sched_held+0x3f/0x80
[    6.065455]  ? trace_kmalloc+0xb2/0xe0
[    6.065458]  ? kmem_cache_alloc_trace+0x174/0x230
[    6.065462]  do_init_module+0x5c/0x270
[    6.065465]  __do_sys_init_module+0x130/0x190
[    6.065469]  do_syscall_64+0x33/0x40
[    6.065472]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    6.065475] RIP: 0033:0x7f4d133aa40e
[    6.065477] Code: 48 8b 0d 65 1a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 32 1a 0c 00 f7 d8 64 89 01 48
[    6.065483] RSP: 002b:00007ffc81317fb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    6.065487] RAX: ffffffffffffffda RBX: 0000563f79509ee0 RCX: 00007f4d133aa40e
[    6.065490] RDX: 0000563f7951daa0 RSI: 0000000000b8a85e RDI: 0000563f79f03db0
[    6.065493] RBP: 0000563f79f03db0 R08: 0000563f79509fd0 R09: 00007ffc813146be
[    6.065496] R10: 0000563a1aa70959 R11: 0000000000000246 R12: 0000563f7951daa0
[    6.065499] R13: 0000563f7950e9c0 R14: 0000000000000000 R15: 0000563f7951f100
[    6.065503] irq event stamp: 304459
[    6.065505] hardirqs last  enabled at (304459): [<ffffffff99169d57>] console_unlock+0x527/0x640
[    6.065510] hardirqs last disabled at (304458): [<ffffffff99169ca2>] console_unlock+0x472/0x640
[    6.065514] softirqs last  enabled at (304350): [<ffffffff99e01152>] asm_call_irq_on_stack+0x12/0x20
[    6.065518] softirqs last disabled at (304345): [<ffffffff99e01152>] asm_call_irq_on_stack+0x12/0x20
[    6.065522] ---[ end trace 3e996d7d10608635 ]---
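
For anyone checking their own boot log for the same two reports, here is a minimal sketch that scans a saved dmesg dump for their header lines. It assumes the log was saved as plain text with the usual `[ seconds ]` timestamps. The names `SIGNATURES`, `find_reports` and `SAMPLE` are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Header lines that open each report; the patterns are copied from the
# two traces quoted in this thread. The labels are illustrative only.
SIGNATURES = {
    "sleep-in-atomic": re.compile(
        r"BUG: sleeping function called from invalid context"),
    "lockdep-key": re.compile(
        r"BUG: key [0-9a-f]+ has not been registered!"),
}

# Leading dmesg timestamp, e.g. "[    5.806032]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_reports(log_text):
    """Return (timestamp, label) for each line that opens a known report."""
    hits = []
    for line in log_text.splitlines():
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # wrapped continuation line without a timestamp
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((float(stamp.group(1)), label))
    return hits


# Small excerpt of the log quoted in this thread, used as sample input.
SAMPLE = """\
[    5.806032] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196
[    6.064107] BUG: key ffff9c5adb339148 has not been registered!
[    6.064121] DEBUG_LOCKS_WARN_ON(1)
"""

if __name__ == "__main__":
    for stamp, label in find_reports(SAMPLE):
        print(f"{stamp:.6f} {label}")
```

Matching only on the header line keeps the check independent of addresses and offsets in the call trace, which change between builds.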


Full kernel log is here: https://pastebin.com/sguf7Tac

-- 
Best Regards,
Mike Gavrilov.


Thread overview: 12+ messages
2021-01-30 23:17 [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu] Mikhail Gavrilov
2021-01-30 23:17 ` Mikhail Gavrilov
2021-01-31  1:01 ` David Rientjes
2021-01-31  1:03   ` David Rientjes
2021-02-03 13:22     ` Christian König
2021-02-03 13:22       ` Christian König
2021-02-06 18:17       ` Mikhail Gavrilov
2021-02-06 18:17         ` Mikhail Gavrilov
2021-02-08  9:18         ` Christian König
2021-02-08  9:18           ` Christian König
2021-02-09 19:22           ` Mikhail Gavrilov
2021-02-09 19:22             ` Mikhail Gavrilov
