* list corruption in deferred_split_scan()
@ 2019-07-10 21:43 Qian Cai
2019-07-11 0:16 ` Yang Shi
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Qian Cai @ 2019-07-10 21:43 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
Running the LTP oom01 test case with swap triggers the crash below. Reverting the
series "Make deferred split shrinker memcg aware" [1] seems to fix the issue.
aefde94195ca mm: thp: make deferred split shrinker memcg aware
cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
4e050f2df876 mm: thp: extract split_queue_* into a struct
[1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
[ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
LIST_POISON1 (dead000000000100)
[ 1145.739763][ T5764] ------------[ cut here ]------------
[ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
[ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted:
G W 5.2.0-next-20190710+ #7
[ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
[ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
[ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
[ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX:
ffffffffae95d318
[ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff8888440bd380
[ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09:
ffffed1108817a70
[ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12:
dead000000000122
[ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15:
dead000000000100
[ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000)
knlGS:0000000000000000
[ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4:
00000000001406a0
[ 1145.870664][ T5764] Call Trace:
[ 1145.873835][ T5764] deferred_split_scan+0x337/0x740
[ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30
[ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0
[ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40
[ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0
[ 1145.900159][ T5764] shrink_slab+0x253/0x440
[ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110
[ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20
[ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260
[ 1145.919645][ T5764] shrink_node+0x31e/0xa30
[ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560
[ 1145.929126][ T5764] ? ktime_get+0x93/0x110
[ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820
[ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30
[ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20
[ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0
[ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0
[ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820
[ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20
[ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20
[ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0
[ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30
[ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390
[ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0
[ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20
[ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160
[ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0
[ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30
[ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490
[ 1146.026893][ T5764] ? finish_fault+0x120/0x120
[ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20
[ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0
[ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50
[ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20
[ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0
[ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370
[ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0
[ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf
[ 1146.075426][ T5764] ? page_fault+0x5/0x20
[ 1146.079553][ T5764] page_fault+0x1b/0x20
[ 1146.083594][ T5764] RIP: 0033:0x410be0
[ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
[ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
00007f98f2674497
[ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI:
0000000000000000
[ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09:
0000000000000000
[ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][
T5764] Shutting down cpus with NMI
[ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
@ 2019-07-11 0:16 ` Yang Shi
2019-07-11 21:07 ` Qian Cai
2019-07-15 4:52 ` Yang Shi
2019-07-24 21:13 ` Qian Cai
2 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-11 0:16 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
Hi Qian,
Thanks for reporting the issue. But, I can't reproduce it on my machine.
Could you please share more details about your test? How often did you
run into this problem?
Regards,
Yang
On 7/10/19 2:43 PM, Qian Cai wrote:
> Running LTP oom01 test case with swap triggers a crash below. Revert the series
> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware
> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@
> linux.alibaba.com/
>
> [...]
* Re: list corruption in deferred_split_scan()
2019-07-11 0:16 ` Yang Shi
@ 2019-07-11 21:07 ` Qian Cai
2019-07-12 19:12 ` Yang Shi
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-07-11 21:07 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
> Hi Qian,
>
>
> Thanks for reporting the issue. But, I can't reproduce it on my machine.
> Could you please share more details about your test? How often did you
> run into this problem?
I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server. Here
is some more information.
# cat .config
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
node 0 size: 19984 MB
node 0 free: 7251 MB
node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
node 4 size: 31524 MB
node 4 free: 25165 MB
node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 16 16 16 32 32 32 32
1: 16 10 16 16 32 32 32 32
2: 16 16 10 16 32 32 32 32
3: 16 16 16 10 32 32 32 32
4: 32 32 32 32 10 16 16 16
5: 32 32 32 32 16 10 16 16
6: 32 32 32 32 16 16 10 16
7: 32 32 32 32 16 16 16 10
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7601 32-Core Processor
Stepping: 2
CPU MHz: 2713.551
BogoMIPS: 4391.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127
Another possible lead is that without reverting those commits, the kdump
kernel would also always crash in shrink_slab_memcg() at this line:
map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
[ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
[ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
swapper/0/1
[ 9.072036][ T1]
[ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
20190711+ #10
[ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[ 9.072036][ T1] Call Trace:
[ 9.072036][ T1] dump_stack+0x62/0x9a
[ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
[ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
[ 9.072036][ T1] ? shrink_slab+0x111/0x440
[ 9.072036][ T1] kasan_report+0xc/0xe
[ 9.072036][ T1] __asan_load8+0x71/0xa0
[ 9.072036][ T1] shrink_slab+0x111/0x440
[ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
[ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
[ 9.072036][ T1] ? kasan_check_read+0x11/0x20
[ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
[ 9.072036][ T1] shrink_node+0x31e/0xa30
[ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
[ 9.072036][ T1] ? ktime_get+0x93/0x110
[ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
[ 9.072036][ T1] ? shrink_node+0xa30/0xa30
[ 9.072036][ T1] ? kasan_check_read+0x11/0x20
[ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
[ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
[ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
[ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 9.072036][ T1] ? unwind_dump+0x260/0x260
[ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
[ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
[ 9.072036][ T1] ? ret_from_fork+0x22/0x40
[ 9.072036][ T1] alloc_page_interleave+0x18/0x130
[ 9.072036][ T1] alloc_pages_current+0xf6/0x110
[ 9.072036][ T1] allocate_slab+0x600/0x11f0
[ 9.072036][ T1] new_slab+0x46/0x70
[ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
[ 9.072036][ T1] ? create_object+0x3a/0x3e0
[ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
[ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
[ 9.072036][ T1] ? create_object+0x3a/0x3e0
[ 9.072036][ T1] __slab_alloc+0x12/0x20
[ 9.072036][ T1] ? __slab_alloc+0x12/0x20
[ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
[ 9.072036][ T1] create_object+0x3a/0x3e0
[ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
[ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
[ 9.072036][ T1] ? kasan_check_read+0x11/0x20
[ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
[ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
[ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
[ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
[ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
[ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
[ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
[ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
[ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
[ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
[ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
[ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
[ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
[ 9.072036][ T1] acpi_load_tables+0x61/0x80
[ 9.072036][ T1] acpi_init+0x10d/0x44b
[ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
[ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
[ 9.072036][ T1] ? kernfs_get+0x13/0x20
[ 9.072036][ T1] ? kobject_uevent+0xb/0x10
[ 9.072036][ T1] ? kset_register+0x31/0x50
[ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
[ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.072036][ T1] do_one_initcall+0xfe/0x45a
[ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
[ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
[ 9.072036][ T1] ? kasan_check_write+0x14/0x20
[ 9.072036][ T1] ? up_write+0x6b/0x190
[ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
[ 9.072036][ T1] ? rest_init+0x188/0x188
[ 9.072036][ T1] kernel_init+0x11/0x138
[ 9.072036][ T1] ? rest_init+0x188/0x188
[ 9.072036][ T1] ret_from_fork+0x22/0x40
[ 9.072036][ T1]
==================================================================
[ 9.072036][ T1] Disabling lock debugging due to kernel taint
[ 9.145712][ T1] BUG: kernel NULL pointer dereference, address:
0000000000000dc8
[ 9.152036][ T1] #PF: supervisor read access in kernel mode
[ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
[ 9.152036][ T1] PGD 0 P4D 0
[ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
G B 5.2.0-next-20190711+ #10
[ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
[ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
[ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
[ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
ffffffff8112f288
[ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
ffffffff824e0440
[ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
fffffbfff049c088
[ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
00000000000001b8
[ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88905757f440
[ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
knlGS:0000000000000000
[ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
00000000001406b0
[ 9.152036][ T1] Call Trace:
[ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
[ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
[ 9.152036][ T1] ? kasan_check_read+0x11/0x20
[ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
[ 9.152036][ T1] shrink_node+0x31e/0xa30
[ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
[ 9.152036][ T1] ? ktime_get+0x93/0x110
[ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
[ 9.152036][ T1] ? shrink_node+0xa30/0xa30
[ 9.152036][ T1] ? kasan_check_read+0x11/0x20
[ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
[ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
[ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
[ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 9.152036][ T1] ? unwind_dump+0x260/0x260
[ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
[ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
[ 9.152036][ T1] ? ret_from_fork+0x22/0x40
[ 9.152036][ T1] alloc_page_interleave+0x18/0x130
[ 9.152036][ T1] alloc_pages_current+0xf6/0x110
[ 9.152036][ T1] allocate_slab+0x600/0x11f0
[ 9.152036][ T1] new_slab+0x46/0x70
[ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
[ 9.152036][ T1] ? create_object+0x3a/0x3e0
[ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
[ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
[ 9.152036][ T1] ? create_object+0x3a/0x3e0
[ 9.152036][ T1] __slab_alloc+0x12/0x20
[ 9.152036][ T1] ? __slab_alloc+0x12/0x20
[ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
[ 9.152036][ T1] create_object+0x3a/0x3e0
[ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
[ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
[ 9.152036][ T1] ? kasan_check_read+0x11/0x20
[ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
[ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
[ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
[ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
[ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
[ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
[ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
[ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
[ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
[ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
[ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
[ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
[ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
[ 9.152036][ T1] acpi_load_tables+0x61/0x80
[ 9.152036][ T1] acpi_init+0x10d/0x44b
[ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
[ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
[ 9.152036][ T1] ? kernfs_get+0x13/0x20
[ 9.152036][ T1] ? kobject_uevent+0xb/0x10
[ 9.152036][ T1] ? kset_register+0x31/0x50
[ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
[ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
[ 9.152036][ T1] do_one_initcall+0xfe/0x45a
[ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
[ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
[ 9.152036][ T1] ? kasan_check_write+0x14/0x20
[ 9.152036][ T1] ? up_write+0x6b/0x190
[ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
[ 9.152036][ T1] ? rest_init+0x188/0x188
[ 9.152036][ T1] kernel_init+0x11/0x138
[ 9.152036][ T1] ? rest_init+0x188/0x188
[ 9.152036][ T1] ret_from_fork+0x22/0x40
[ 9.152036][ T1] Modules linked in:
[ 9.152036][ T1] CR2: 0000000000000dc8
[ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
[ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
[ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
[ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
[ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
ffffffff8112f288
[ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
ffffffff824e0440
[ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
fffffbfff049c088
[ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
00000000000001b8
[ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88905757f440
[ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
knlGS:00000000
> [...]
* Re: list corruption in deferred_split_scan()
2019-07-11 21:07 ` Qian Cai
@ 2019-07-12 19:12 ` Yang Shi
2019-07-13 4:41 ` Yang Shi
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Yang Shi @ 2019-07-12 19:12 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/11/19 2:07 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>> Hi Qian,
>>
>>
>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>> Could you please share more details about your test? How often did you
>> run into this problem?
> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here
> is some more information.
>
> # cat .config
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
I tried your kernel config, but I still can't reproduce it. My compiler
doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my
test, but I don't think this would make any difference for this case.
According to the call trace in the earlier email, it looks like
deferred_split_scan() lost a race with put_compound_page().
put_compound_page() calls free_transhuge_page(), which deletes the page
from the deferred split queue, but for some reason the page may still
appear on the deferred list.
Would you please try the below patch?
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
-		list_del(page_deferred_list(page));
+		list_del_init(page_deferred_list(page));
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
>
> # numactl -H
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
> node 0 size: 19984 MB
> node 0 free: 7251 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
> node 4 size: 31524 MB
> node 4 free: 25165 MB
> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
> node 5 size: 0 MB
> node 5 free: 0 MB
> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node 0 1 2 3 4 5 6 7
> 0: 10 16 16 16 32 32 32 32
> 1: 16 10 16 16 32 32 32 32
> 2: 16 16 10 16 32 32 32 32
> 3: 16 16 16 10 32 32 32 32
> 4: 32 32 32 32 10 16 16 16
> 5: 32 32 32 32 16 10 16 16
> 6: 32 32 32 32 16 16 10 16
> 7: 32 32 32 32 16 16 16 10
>
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 128
> On-line CPU(s) list: 0-127
> Thread(s) per core: 2
> Core(s) per socket: 32
> Socket(s): 2
> NUMA node(s): 8
> Vendor ID: AuthenticAMD
> CPU family: 23
> Model: 1
> Model name: AMD EPYC 7601 32-Core Processor
> Stepping: 2
> CPU MHz: 2713.551
> BogoMIPS: 4391.39
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 64K
> L2 cache: 512K
> L3 cache: 8192K
> NUMA node0 CPU(s): 0-7,64-71
> NUMA node1 CPU(s): 8-15,72-79
> NUMA node2 CPU(s): 16-23,80-87
> NUMA node3 CPU(s): 24-31,88-95
> NUMA node4 CPU(s): 32-39,96-103
> NUMA node5 CPU(s): 40-47,104-111
> NUMA node6 CPU(s): 48-55,112-119
> NUMA node7 CPU(s): 56-63,120-127
>
> Another possible lead is that without reverting those commits, the kdump
> kernel would also always crash in shrink_slab_memcg() at this line,
>
> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
think of a place where nodeinfo would be freed while the memcg is still
online. Maybe a check is needed:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..bacda49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
+	if (!memcg->nodeinfo[nid])
+		return 0;
+
 	if (!down_read_trylock(&shrinker_rwsem))
 		return 0;
>
> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
> swapper/0/1
> [ 9.072036][ T1]
> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
> 20190711+ #10
> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 01/25/2019
> [ 9.072036][ T1] Call Trace:
> [ 9.072036][ T1] dump_stack+0x62/0x9a
> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
> [ 9.072036][ T1] ? shrink_slab+0x111/0x440
> [ 9.072036][ T1] kasan_report+0xc/0xe
> [ 9.072036][ T1] __asan_load8+0x71/0xa0
> [ 9.072036][ T1] shrink_slab+0x111/0x440
> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
> [ 9.072036][ T1] shrink_node+0x31e/0xa30
> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
> [ 9.072036][ T1] ? ktime_get+0x93/0x110
> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30
> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 9.072036][ T1] ? unwind_dump+0x260/0x260
> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40
> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130
> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110
> [ 9.072036][ T1] allocate_slab+0x600/0x11f0
> [ 9.072036][ T1] new_slab+0x46/0x70
> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
> [ 9.072036][ T1] __slab_alloc+0x12/0x20
> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20
> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
> [ 9.072036][ T1] create_object+0x3a/0x3e0
> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
> [ 9.072036][ T1] acpi_load_tables+0x61/0x80
> [ 9.072036][ T1] acpi_init+0x10d/0x44b
> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
> [ 9.072036][ T1] ? kernfs_get+0x13/0x20
> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10
> [ 9.072036][ T1] ? kset_register+0x31/0x50
> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a
> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20
> [ 9.072036][ T1] ? up_write+0x6b/0x190
> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
> [ 9.072036][ T1] ? rest_init+0x188/0x188
> [ 9.072036][ T1] kernel_init+0x11/0x138
> [ 9.072036][ T1] ? rest_init+0x188/0x188
> [ 9.072036][ T1] ret_from_fork+0x22/0x40
> [ 9.072036][ T1]
> ==================================================================
> [ 9.072036][ T1] Disabling lock debugging due to kernel taint
> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address:
> 0000000000000dc8
> [ 9.152036][ T1] #PF: supervisor read access in kernel mode
> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
> [ 9.152036][ T1] PGD 0 P4D 0
> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
> G B 5.2.0-next-20190711+ #10
> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 01/25/2019
> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> ffffffff8112f288
> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> ffffffff824e0440
> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> fffffbfff049c088
> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> 00000000000001b8
> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> ffff88905757f440
> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
> knlGS:0000000000000000
> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
> 00000000001406b0
> [ 9.152036][ T1] Call Trace:
> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
> [ 9.152036][ T1] shrink_node+0x31e/0xa30
> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
> [ 9.152036][ T1] ? ktime_get+0x93/0x110
> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30
> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 9.152036][ T1] ? unwind_dump+0x260/0x260
> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40
> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130
> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110
> [ 9.152036][ T1] allocate_slab+0x600/0x11f0
> [ 9.152036][ T1] new_slab+0x46/0x70
> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
> [ 9.152036][ T1] __slab_alloc+0x12/0x20
> [ 9.152036][ T1] ? __slab_alloc+0x12/0x20
> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
> [ 9.152036][ T1] create_object+0x3a/0x3e0
> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
> [ 9.152036][ T1] acpi_load_tables+0x61/0x80
> [ 9.152036][ T1] acpi_init+0x10d/0x44b
> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
> [ 9.152036][ T1] ? kernfs_get+0x13/0x20
> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10
> [ 9.152036][ T1] ? kset_register+0x31/0x50
> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
> [ 9.152036][ T1] ? up_write+0x6b/0x190
> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
> [ 9.152036][ T1] ? rest_init+0x188/0x188
> [ 9.152036][ T1] kernel_init+0x11/0x138
> [ 9.152036][ T1] ? rest_init+0x188/0x188
> [ 9.152036][ T1] ret_from_fork+0x22/0x40
> [ 9.152036][ T1] Modules linked in:
> [ 9.152036][ T1] CR2: 0000000000000dc8
> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> ffffffff8112f288
> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> ffffffff824e0440
> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> fffffbfff049c088
> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> 00000000000001b8
> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> ffff88905757f440
> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
> knlGS:00000000
>
>>
>> Regards,
>>
>> Yang
>>
>>
>>
>> On 7/10/19 2:43 PM, Qian Cai wrote:
>>> [...]
>>
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-12 19:12 ` Yang Shi
@ 2019-07-13 4:41 ` Yang Shi
2019-07-15 21:23 ` Qian Cai
2019-07-19 0:54 ` Qian Cai
2 siblings, 0 replies; 21+ messages in thread
From: Yang Shi @ 2019-07-13 4:41 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/12/19 12:12 PM, Yang Shi wrote:
>
>
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>>
>>>
>>> Thanks for reporting the issue. But, I can't reproduce it on my
>>> machine.
>>> Could you please share more details about your test? How often did you
>>> run into this problem?
>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10
>> server. Here
>> is some more information.
>>
>> # cat .config
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> I tried your kernel config, but I still can't reproduce it. My
> compiler doesn't have retpoline support, so CONFIG_RETPOLINE is
> disabled in my test, but I don't think this would make any difference
> for this case.
>
> According to the bug call trace in the earlier email, it looks like
> deferred_split_scan() lost a race with put_compound_page().
> put_compound_page() calls free_transhuge_page(), which deletes the
> page from the deferred split queue, but for some reason the page may
> still appear on the deferred list.
>
> Would you please try the below patch?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>  		if (!list_empty(page_deferred_list(head))) {
>  			ds_queue->split_queue_len--;
> -			list_del(page_deferred_list(head));
> +			list_del_init(page_deferred_list(head));
This line should not be changed. Please just apply the below part.
>  		}
>  		if (mapping)
>  			__dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  	if (!list_empty(page_deferred_list(page))) {
>  		ds_queue->split_queue_len--;
> -		list_del(page_deferred_list(page));
> +		list_del_init(page_deferred_list(page));
>  	}
>  	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>
>> [...]
>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
>> [ 9.152036][ T1] ? up_write+0x6b/0x190
>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>> [ 9.152036][ T1] kernel_init+0x11/0x138
>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>> [ 9.152036][ T1] ret_from_fork+0x22/0x40
>> [ 9.152036][ T1] Modules linked in:
>> [ 9.152036][ T1] CR2: 0000000000000dc8
>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f
>> 84 e2 02 00
>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07
>> 0e 00 <4f>
>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>> ffffffff8112f288
>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>> ffffffff824e0440
>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>> fffffbfff049c088
>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>> 00000000000001b8
>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>> ffff88905757f440
>> [ 9.152036][ T1] FS: 0000000000000000(0000)
>> GS:ffff889062800000(0000)
>> knlGS:00000000
>>
>>>
>>> Regards,
>>>
>>> Yang
>>>
>>>
>>>
>>> On 7/10/19 2:43 PM, Qian Cai wrote:
>>>> [...]
>>>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
2019-07-11 0:16 ` Yang Shi
@ 2019-07-15 4:52 ` Yang Shi
2019-07-24 21:13 ` Qian Cai
2 siblings, 0 replies; 21+ messages in thread
From: Yang Shi @ 2019-07-15 4:52 UTC (permalink / raw)
To: Hillf Danton, Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/13/19 8:53 PM, Hillf Danton wrote:
> On Wed, 10 Jul 2019 14:43:28 -0700 (PDT) Qian Cai wrote:
>> Running LTP oom01 test case with swap triggers a crash below. Reverting the series
>> "Make deferred split shrinker memcg aware" [1] seems to fix the issue.
>>
>> aefde94195ca mm: thp: make deferred split shrinker memcg aware
>> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
>> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>>
>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>>
>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is LIST_POISON1 (dead000000000100)
>> [ 1145.739763][ T5764] ------------[ cut here ]------------
>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G W 5.2.0-next-20190710+ #7
>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318
>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380
>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70
>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122
>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100
>> [ 1145.847455][ T5764] FS: 00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000
>> [ 1145.856299][ T5764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0
>> [ 1145.870664][ T5764] Call Trace:
>> [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740
>> [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30
>> [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0
>> [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40
>> [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0
>> [ 1145.900159][ T5764] shrink_slab+0x253/0x440
>> [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110
>> [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260
>> [ 1145.919645][ T5764] shrink_node+0x31e/0xa30
>> [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560
>> [ 1145.929126][ T5764] ? ktime_get+0x93/0x110
>> [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820
>> [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30
>> [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0
>> [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0
>> [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820
>> [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0
>> [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0
>> [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30
>> [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390
>> [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0
>> [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160
>> [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0
>> [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30
>> [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490
>> [ 1146.026893][ T5764] ? finish_fault+0x120/0x120
>> [ 1146.031461][ T5764] ? call_function_interrupt+0xa/0x20
>> [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0
>> [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50
>> [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20
>> [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20
>> [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0
>> [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370
>> [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0
>> [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf
>> [ 1146.075426][ T5764] ? page_fault+0x5/0x20
>> [ 1146.079553][ T5764] page_fault+0x1b/0x20
>> [ 1146.083594][ T5764] RIP: 0033:0x410be0
>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497
>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000
>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000
>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000
>> [ 1147.588181][ T5764] Shutting down cpus with NMI
>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Ignore the noise if you think there is no chance of corrupting the local
> list walk in some way like:
>
> CPU0                                   CPU1
> ----                                   ----
> take no lock                           spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> list_for_each_safe(pos, next,
>                    &list)
>                                        list_del(page_deferred_list(page));
> page = list_entry((void *)pos,
>                   struct page, mapping);
>                                        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
IMHO, I don't see how the race could really happen.
list_del() is called in 3 places:
1. A parallel free_transhuge_page(): the refcount bump should prevent the
race.
2. A parallel reclaimer: split_queue_lock should prevent this, so the
other reclaimer should not see the same page.
3. A parallel split_huge_page(): I'm not sure about this one, but the
page lock should be acquired before calling split_huge_page() in the
other call paths too.
I'm not sure if I missed anything; please feel free to correct me.
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
> if (!list_empty(page_deferred_list(head))) {
> ds_queue->split_queue_len--;
> - list_del(page_deferred_list(head));
> + list_del_init(page_deferred_list(head));
> }
> if (mapping)
> __dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> if (!list_empty(page_deferred_list(page))) {
> ds_queue->split_queue_len--;
> - list_del(page_deferred_list(page));
> + list_del_init(page_deferred_list(page));
> }
> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> free_compound_page(page);
> --
I proposed a similar thing.
> The major, important part is listed above; the minor, trivial part is below.
> Both are only for collecting thoughts.
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2869,9 +2869,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> struct pglist_data *pgdata = NODE_DATA(sc->nid);
> struct deferred_split *ds_queue;
> unsigned long flags;
> - LIST_HEAD(list), *pos, *next;
> struct page *page;
> - int split = 0;
> + unsigned long nr_split = 0;
>
> #ifdef CONFIG_MEMCG
> if (sc->memcg)
> @@ -2884,44 +2883,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>
> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> /* Take pin on all head pages to avoid freeing them under us */
> - list_for_each_safe(pos, next, &ds_queue->split_queue) {
> - page = list_entry((void *)pos, struct page, mapping);
> + while (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
> + bool locked, pinned;
> +
> + page = list_first_entry(&ds_queue->split_queue, struct page,
> + mapping);
> page = compound_head(page);
> +
> if (get_page_unless_zero(page)) {
> - list_move(page_deferred_list(page), &list);
> + pinned = true;
> + locked = trylock_page(page);
> } else {
> /* We lost race with put_compound_page() */
> - list_del_init(page_deferred_list(page));
> - ds_queue->split_queue_len--;
> + pinned = false;
> + locked = false;
> + }
> + list_del_init(page_deferred_list(page));
> + ds_queue->split_queue_len--;
> + --sc->nr_to_scan;
> + if (!pinned)
> + continue;
> + spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> + if (locked) {
> + if (!split_huge_page(page))
> + nr_split++;
> + unlock_page(page);
> }
> - if (!--sc->nr_to_scan)
> - break;
> - }
> - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> -
> - list_for_each_safe(pos, next, &list) {
> - page = list_entry((void *)pos, struct page, mapping);
> - if (!trylock_page(page))
> - goto next;
> - /* split_huge_page() removes page from list on success */
> - if (!split_huge_page(page))
> - split++;
> - unlock_page(page);
> -next:
> put_page(page);
> + spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> }
> -
> - spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> - list_splice_tail(&list, &ds_queue->split_queue);
> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>
> /*
> * Stop shrinker if we didn't split any page, but the queue is empty.
> * This can happen if pages were freed under us.
> */
> - if (!split && list_empty(&ds_queue->split_queue))
> + if (!nr_split && list_empty(&ds_queue->split_queue))
> return SHRINK_STOP;
> - return split;
> + return nr_split;
> }
>
> static struct shrinker deferred_split_shrinker = {
> --
* Re: list corruption in deferred_split_scan()
2019-07-12 19:12 ` Yang Shi
2019-07-13 4:41 ` Yang Shi
@ 2019-07-15 21:23 ` Qian Cai
2019-07-16 0:22 ` Yang Shi
2019-07-19 0:54 ` Qian Cai
2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-07-15 21:23 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> > Another possible lead is that, without reverting those commits, the kdump
> > kernel would also always crash in shrink_slab_memcg() at this line:
> >
> > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>
> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
> think of where nodeinfo would be freed while the memcg was still online.
> Maybe a check is needed:
Actually, "memcg" is NULL.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..bacda49 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t
> gfp_mask, int nid,
> if (!mem_cgroup_online(memcg))
> return 0;
>
> + if (!memcg->nodeinfo[nid])
> + return 0;
> +
> if (!down_read_trylock(&shrinker_rwsem))
> return 0;
>
> >
> > [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
> > [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
> > swapper/0/1
> > [ 9.072036][ T1]
> > [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
> > 20190711+ #10
> > [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 01/25/2019
> > [ 9.072036][ T1] Call Trace:
> > [ 9.072036][ T1] dump_stack+0x62/0x9a
> > [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
> > [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
> > [ 9.072036][ T1] ? shrink_slab+0x111/0x440
> > [ 9.072036][ T1] kasan_report+0xc/0xe
> > [ 9.072036][ T1] __asan_load8+0x71/0xa0
> > [ 9.072036][ T1] shrink_slab+0x111/0x440
> > [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
> > [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
> > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
> > [ 9.072036][ T1] shrink_node+0x31e/0xa30
> > [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
> > [ 9.072036][ T1] ? ktime_get+0x93/0x110
> > [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
> > [ 9.072036][ T1] ? shrink_node+0xa30/0xa30
> > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
> > [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
> > [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
> > [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
> > [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
> > [ 9.072036][ T1] ? unwind_dump+0x260/0x260
> > [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
> > [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
> > [ 9.072036][ T1] ? ret_from_fork+0x22/0x40
> > [ 9.072036][ T1] alloc_page_interleave+0x18/0x130
> > [ 9.072036][ T1] alloc_pages_current+0xf6/0x110
> > [ 9.072036][ T1] allocate_slab+0x600/0x11f0
> > [ 9.072036][ T1] new_slab+0x46/0x70
> > [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
> > [ 9.072036][ T1] ? create_object+0x3a/0x3e0
> > [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
> > [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
> > [ 9.072036][ T1] ? create_object+0x3a/0x3e0
> > [ 9.072036][ T1] __slab_alloc+0x12/0x20
> > [ 9.072036][ T1] ? __slab_alloc+0x12/0x20
> > [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
> > [ 9.072036][ T1] create_object+0x3a/0x3e0
> > [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
> > [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
> > [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
> > [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
> > [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
> > [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
> > [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
> > [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
> > [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
> > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> > [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> > [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
> > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
> > [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
> > [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
> > [ 9.072036][ T1] acpi_load_tables+0x61/0x80
> > [ 9.072036][ T1] acpi_init+0x10d/0x44b
> > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
> > [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
> > [ 9.072036][ T1] ? kernfs_get+0x13/0x20
> > [ 9.072036][ T1] ? kobject_uevent+0xb/0x10
> > [ 9.072036][ T1] ? kset_register+0x31/0x50
> > [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
> > [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.072036][ T1] do_one_initcall+0xfe/0x45a
> > [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
> > [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
> > [ 9.072036][ T1] ? kasan_check_write+0x14/0x20
> > [ 9.072036][ T1] ? up_write+0x6b/0x190
> > [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
> > [ 9.072036][ T1] ? rest_init+0x188/0x188
> > [ 9.072036][ T1] kernel_init+0x11/0x138
> > [ 9.072036][ T1] ? rest_init+0x188/0x188
> > [ 9.072036][ T1] ret_from_fork+0x22/0x40
> > [ 9.072036][ T1]
> > ==================================================================
> > [ 9.072036][ T1] Disabling lock debugging due to kernel taint
> > [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address:
> > 0000000000000dc8
> > [ 9.152036][ T1] #PF: supervisor read access in kernel mode
> > [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
> > [ 9.152036][ T1] PGD 0 P4D 0
> > [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
> > G B 5.2.0-next-20190711+ #10
> > [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 01/25/2019
> > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
> > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
> > 00
> > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
> > <4f>
> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> > ffffffff8112f288
> > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> > ffffffff824e0440
> > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> > fffffbfff049c088
> > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> > 00000000000001b8
> > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> > ffff88905757f440
> > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
> > knlGS:0000000000000000
> > [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
> > 00000000001406b0
> > [ 9.152036][ T1] Call Trace:
> > [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
> > [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
> > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
> > [ 9.152036][ T1] shrink_node+0x31e/0xa30
> > [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
> > [ 9.152036][ T1] ? ktime_get+0x93/0x110
> > [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
> > [ 9.152036][ T1] ? shrink_node+0xa30/0xa30
> > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
> > [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
> > [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
> > [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
> > [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
> > [ 9.152036][ T1] ? unwind_dump+0x260/0x260
> > [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
> > [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
> > [ 9.152036][ T1] ? ret_from_fork+0x22/0x40
> > [ 9.152036][ T1] alloc_page_interleave+0x18/0x130
> > [ 9.152036][ T1] alloc_pages_current+0xf6/0x110
> > [ 9.152036][ T1] allocate_slab+0x600/0x11f0
> > [ 9.152036][ T1] new_slab+0x46/0x70
> > [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
> > [ 9.152036][ T1] ? create_object+0x3a/0x3e0
> > [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
> > [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
> > [ 9.152036][ T1] ? create_object+0x3a/0x3e0
> > [ 9.152036][ T1] __slab_alloc+0x12/0x20
> > [ 9.152036][ T1] ? __slab_alloc+0x12/0x20
> > [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
> > [ 9.152036][ T1] create_object+0x3a/0x3e0
> > [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
> > [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
> > [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
> > [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
> > [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
> > [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
> > [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
> > [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
> > [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
> > [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
> > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> > [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
> > [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
> > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
> > [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
> > [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
> > [ 9.152036][ T1] acpi_load_tables+0x61/0x80
> > [ 9.152036][ T1] acpi_init+0x10d/0x44b
> > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
> > [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
> > [ 9.152036][ T1] ? kernfs_get+0x13/0x20
> > [ 9.152036][ T1] ? kobject_uevent+0xb/0x10
> > [ 9.152036][ T1] ? kset_register+0x31/0x50
> > [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
> > [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
> > [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
> > [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
> > [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
> > [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
> > [ 9.152036][ T1] ? up_write+0x6b/0x190
> > [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
> > [ 9.152036][ T1] ? rest_init+0x188/0x188
> > [ 9.152036][ T1] kernel_init+0x11/0x138
> > [ 9.152036][ T1] ? rest_init+0x188/0x188
> > [ 9.152036][ T1] ret_from_fork+0x22/0x40
> > [ 9.152036][ T1] Modules linked in:
> > [ 9.152036][ T1] CR2: 0000000000000dc8
> > [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
> > [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
> > [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
> > 00
> > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
> > <4f>
> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> > [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> > [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> > ffffffff8112f288
> > [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> > ffffffff824e0440
> > [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> > fffffbfff049c088
> > [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> > 00000000000001b8
> > [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> > ffff88905757f440
> > [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
> > knlGS:00000000
> >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-15 21:23 ` Qian Cai
@ 2019-07-16 0:22 ` Yang Shi
2019-07-16 1:36 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-16 0:22 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/15/19 2:23 PM, Qian Cai wrote:
> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>> Another possible lead is that, without reverting those commits, the
>>> kdump kernel would always also crash in shrink_slab_memcg() at this
>>> line:
>>>
>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>> think of a case where nodeinfo would be freed while the memcg was still
>> online. Maybe a check is needed:
> Actually, "memcg" is NULL.
That sounds weird. shrink_slab() is called on a memcg obtained from
mem_cgroup_iter(), which does pin the memcg, so the memcg should not go
away.
>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..bacda49 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>  	if (!mem_cgroup_online(memcg))
>>  		return 0;
>>
>> +	if (!memcg->nodeinfo[nid])
>> +		return 0;
>> +
>>  	if (!down_read_trylock(&shrinker_rwsem))
>>  		return 0;
>>
>>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
>>> swapper/0/1
>>> [ 9.072036][ T1]
>>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>> 20190711+ #10
>>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>> DL385
>>> Gen10, BIOS A40 01/25/2019
>>> [ 9.072036][ T1] Call Trace:
>>> [ 9.072036][ T1] dump_stack+0x62/0x9a
>>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
>>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
>>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440
>>> [ 9.072036][ T1] kasan_report+0xc/0xe
>>> [ 9.072036][ T1] __asan_load8+0x71/0xa0
>>> [ 9.072036][ T1] shrink_slab+0x111/0x440
>>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
>>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
>>> [ 9.072036][ T1] shrink_node+0x31e/0xa30
>>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>> [ 9.072036][ T1] ? ktime_get+0x93/0x110
>>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
>>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30
>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
>>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
>>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
>>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260
>>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
>>> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
>>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40
>>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130
>>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110
>>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0
>>> [ 9.072036][ T1] new_slab+0x46/0x70
>>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>> [ 9.072036][ T1] __slab_alloc+0x12/0x20
>>> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20
>>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
>>> [ 9.072036][ T1] create_object+0x3a/0x3e0
>>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
>>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
>>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
>>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80
>>> [ 9.072036][ T1] acpi_init+0x10d/0x44b
>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
>>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
>>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20
>>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10
>>> [ 9.072036][ T1] ? kset_register+0x31/0x50
>>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a
>>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
>>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20
>>> [ 9.072036][ T1] ? up_write+0x6b/0x190
>>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>> [ 9.072036][ T1] kernel_init+0x11/0x138
>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>> [ 9.072036][ T1] ret_from_fork+0x22/0x40
>>> [ 9.072036][ T1]
>>> ==================================================================
>>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint
>>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000dc8
>>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode
>>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
>>> [ 9.152036][ T1] PGD 0 P4D 0
>>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
>>> G B 5.2.0-next-20190711+ #10
>>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>> DL385
>>> Gen10, BIOS A40 01/25/2019
>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>> 00
>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>> <4f>
>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>> ffffffff8112f288
>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>> ffffffff824e0440
>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>> fffffbfff049c088
>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>> 00000000000001b8
>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>> ffff88905757f440
>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
>>> knlGS:0000000000000000
>>> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
>>> 00000000001406b0
>>> [ 9.152036][ T1] Call Trace:
>>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
>>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
>>> [ 9.152036][ T1] shrink_node+0x31e/0xa30
>>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>> [ 9.152036][ T1] ? ktime_get+0x93/0x110
>>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
>>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30
>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
>>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
>>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
>>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260
>>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
>>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
>>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40
>>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130
>>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110
>>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0
>>> [ 9.152036][ T1] new_slab+0x46/0x70
>>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>> [ 9.152036][ T1] __slab_alloc+0x12/0x20
>>> [ 9.152036][ T1] ? __slab_alloc+0x12/0x20
>>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
>>> [ 9.152036][ T1] create_object+0x3a/0x3e0
>>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
>>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
>>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
>>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80
>>> [ 9.152036][ T1] acpi_init+0x10d/0x44b
>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
>>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
>>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20
>>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10
>>> [ 9.152036][ T1] ? kset_register+0x31/0x50
>>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
>>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
>>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
>>> [ 9.152036][ T1] ? up_write+0x6b/0x190
>>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>> [ 9.152036][ T1] kernel_init+0x11/0x138
>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>> [ 9.152036][ T1] ret_from_fork+0x22/0x40
>>> [ 9.152036][ T1] Modules linked in:
>>> [ 9.152036][ T1] CR2: 0000000000000dc8
>>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>> 00
>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>> <4f>
>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>> ffffffff8112f288
>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>> ffffffff824e0440
>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>> fffffbfff049c088
>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>> 00000000000001b8
>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>> ffff88905757f440
>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
>>> knlGS:00000000
>>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-16 0:22 ` Yang Shi
@ 2019-07-16 1:36 ` Qian Cai
2019-07-16 3:00 ` Yang Shi
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-07-16 1:36 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML
> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/15/19 2:23 PM, Qian Cai wrote:
>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>> Another possible lead is that, without reverting those commits, the
>>>> kdump kernel would always also crash in shrink_slab_memcg() at this
>>>> line:
>>>>
>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>> think of a case where nodeinfo would be freed while the memcg was still
>>> online. Maybe a check is needed:
>> Actually, "memcg" is NULL.
>
> That sounds weird. shrink_slab() is called on a memcg obtained from mem_cgroup_iter(), which does pin the memcg, so the memcg should not go away.
Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem”
changed this check in shrink_slab_memcg():

-	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
+	if (!mem_cgroup_online(memcg))
 		return 0;

Since the kdump kernel boots with the parameter “cgroup_disable=memory”,
shrink_slab_memcg() can no longer handle the NULL memcg that
mem_cgroup_iter() returns in that case:

	if (mem_cgroup_disabled())
		return NULL;
>
>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a0301ed..bacda49 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>  	if (!mem_cgroup_online(memcg))
>>>  		return 0;
>>>
>>> +	if (!memcg->nodeinfo[nid])
>>> +		return 0;
>>> +
>>>  	if (!down_read_trylock(&shrinker_rwsem))
>>>  		return 0;
>>>
>>>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
>>>> swapper/0/1
>>>> [ 9.072036][ T1]
>>>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>>> 20190711+ #10
>>>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 01/25/2019
>>>> [ 9.072036][ T1] Call Trace:
>>>> [ 9.072036][ T1] dump_stack+0x62/0x9a
>>>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
>>>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
>>>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440
>>>> [ 9.072036][ T1] kasan_report+0xc/0xe
>>>> [ 9.072036][ T1] __asan_load8+0x71/0xa0
>>>> [ 9.072036][ T1] shrink_slab+0x111/0x440
>>>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
>>>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
>>>> [ 9.072036][ T1] shrink_node+0x31e/0xa30
>>>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>>> [ 9.072036][ T1] ? ktime_get+0x93/0x110
>>>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
>>>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30
>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
>>>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
>>>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
>>>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260
>>>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
>>>> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
>>>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40
>>>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130
>>>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110
>>>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0
>>>> [ 9.072036][ T1] new_slab+0x46/0x70
>>>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>>> [ 9.072036][ T1] __slab_alloc+0x12/0x20
>>>> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20
>>>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
>>>> [ 9.072036][ T1] create_object+0x3a/0x3e0
>>>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
>>>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
>>>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
>>>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80
>>>> [ 9.072036][ T1] acpi_init+0x10d/0x44b
>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
>>>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
>>>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20
>>>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10
>>>> [ 9.072036][ T1] ? kset_register+0x31/0x50
>>>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a
>>>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
>>>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20
>>>> [ 9.072036][ T1] ? up_write+0x6b/0x190
>>>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
>>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>>> [ 9.072036][ T1] kernel_init+0x11/0x138
>>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>>> [ 9.072036][ T1] ret_from_fork+0x22/0x40
>>>> [ 9.072036][ T1]
>>>> ==================================================================
>>>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint
>>>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000dc8
>>>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode
>>>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
>>>> [ 9.152036][ T1] PGD 0 P4D 0
>>>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
>>>> G B 5.2.0-next-20190711+ #10
>>>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 01/25/2019
>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>> 00
>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>> <4f>
>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>> ffffffff8112f288
>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>> ffffffff824e0440
>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>> fffffbfff049c088
>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>> 00000000000001b8
>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>> ffff88905757f440
>>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
>>>> knlGS:0000000000000000
>>>> [ 9.152036][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
>>>> 00000000001406b0
>>>> [ 9.152036][ T1] Call Trace:
>>>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
>>>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
>>>> [ 9.152036][ T1] shrink_node+0x31e/0xa30
>>>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>>> [ 9.152036][ T1] ? ktime_get+0x93/0x110
>>>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
>>>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30
>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
>>>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
>>>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
>>>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260
>>>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
>>>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
>>>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40
>>>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130
>>>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110
>>>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0
>>>> [ 9.152036][ T1] new_slab+0x46/0x70
>>>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>>> [ 9.152036][ T1] __slab_alloc+0x12/0x20
>>>> [ 9.152036][ T1] ? __slab_alloc+0x12/0x20
>>>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
>>>> [ 9.152036][ T1] create_object+0x3a/0x3e0
>>>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
>>>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
>>>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
>>>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80
>>>> [ 9.152036][ T1] acpi_init+0x10d/0x44b
>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
>>>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
>>>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20
>>>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10
>>>> [ 9.152036][ T1] ? kset_register+0x31/0x50
>>>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
>>>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
>>>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
>>>> [ 9.152036][ T1] ? up_write+0x6b/0x190
>>>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>>> [ 9.152036][ T1] kernel_init+0x11/0x138
>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>>> [ 9.152036][ T1] ret_from_fork+0x22/0x40
>>>> [ 9.152036][ T1] Modules linked in:
>>>> [ 9.152036][ T1] CR2: 0000000000000dc8
>>>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>> 00
>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>> <4f>
>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>> ffffffff8112f288
>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>> ffffffff824e0440
>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>> fffffbfff049c088
>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>> 00000000000001b8
>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>> ffff88905757f440
>>>> [ 9.152036][ T1] FS: 0000000000000000(0000) GS:ffff889062800000(0000)
>>>> knlGS:00000000
>>>>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-16 1:36 ` Qian Cai
@ 2019-07-16 3:00 ` Yang Shi
2019-07-16 23:36 ` Shakeel Butt
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-16 3:00 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML
On 7/15/19 6:36 PM, Qian Cai wrote:
>
>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>>> Another possible lead is that, without reverting those commits, the
>>>>> kdump kernel would always also crash in shrink_slab_memcg() at this
>>>>> line:
>>>>>
>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>>> think of a case where nodeinfo would be freed while the memcg was still
>>>> online. Maybe a check is needed:
>>> Actually, "memcg" is NULL.
>> That sounds weird. shrink_slab() is called on a memcg obtained from mem_cgroup_iter(), which does pin the memcg, so the memcg should not go away.
> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem”
> changed this check in shrink_slab_memcg():
>
> -	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> +	if (!mem_cgroup_online(memcg))
>  		return 0;
>
> Since the kdump kernel boots with the parameter “cgroup_disable=memory”,
> shrink_slab_memcg() can no longer handle the NULL memcg that
> mem_cgroup_iter() returns in that case:
>
> 	if (mem_cgroup_disabled())
> 		return NULL;
Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
check before calling shrink_slab_memcg(), as below:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..2f03c61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 	unsigned long ret, freed = 0;
 	struct shrinker *shrinker;

-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
 		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

 	if (!down_read_trylock(&shrinker_rwsem))
>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a0301ed..bacda49 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>  	if (!mem_cgroup_online(memcg))
>>>>  		return 0;
>>>>
>>>> +	if (!memcg->nodeinfo[nid])
>>>> +		return 0;
>>>> +
>>>>  	if (!down_read_trylock(&shrinker_rwsem))
>>>>  		return 0;
>>>>
>>>>> [ 9.072036][ T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>>>> [ 9.072036][ T1] Read of size 8 at addr 0000000000000dc8 by task
>>>>> swapper/0/1
>>>>> [ 9.072036][ T1]
>>>>> [ 9.072036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>>>> 20190711+ #10
>>>>> [ 9.072036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>>> DL385
>>>>> Gen10, BIOS A40 01/25/2019
>>>>> [ 9.072036][ T1] Call Trace:
>>>>> [ 9.072036][ T1] dump_stack+0x62/0x9a
>>>>> [ 9.072036][ T1] __kasan_report.cold.4+0xb0/0xb4
>>>>> [ 9.072036][ T1] ? unwind_get_return_address+0x40/0x50
>>>>> [ 9.072036][ T1] ? shrink_slab+0x111/0x440
>>>>> [ 9.072036][ T1] kasan_report+0xc/0xe
>>>>> [ 9.072036][ T1] __asan_load8+0x71/0xa0
>>>>> [ 9.072036][ T1] shrink_slab+0x111/0x440
>>>>> [ 9.072036][ T1] ? mem_cgroup_iter+0x98/0x840
>>>>> [ 9.072036][ T1] ? unregister_shrinker+0x110/0x110
>>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.072036][ T1] ? mem_cgroup_protected+0x39/0x260
>>>>> [ 9.072036][ T1] shrink_node+0x31e/0xa30
>>>>> [ 9.072036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>>>> [ 9.072036][ T1] ? ktime_get+0x93/0x110
>>>>> [ 9.072036][ T1] do_try_to_free_pages+0x22f/0x820
>>>>> [ 9.072036][ T1] ? shrink_node+0xa30/0xa30
>>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.072036][ T1] ? check_chain_key+0x1df/0x2e0
>>>>> [ 9.072036][ T1] try_to_free_pages+0x242/0x4d0
>>>>> [ 9.072036][ T1] ? do_try_to_free_pages+0x820/0x820
>>>>> [ 9.072036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>>>> [ 9.072036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>>> [ 9.072036][ T1] ? unwind_dump+0x260/0x260
>>>>> [ 9.072036][ T1] ? kernel_text_address+0x33/0xc0
>>>>> [ 9.072036][ T1] ? arch_stack_walk+0x8f/0xf0
>>>>> [ 9.072036][ T1] ? ret_from_fork+0x22/0x40
>>>>> [ 9.072036][ T1] alloc_page_interleave+0x18/0x130
>>>>> [ 9.072036][ T1] alloc_pages_current+0xf6/0x110
>>>>> [ 9.072036][ T1] allocate_slab+0x600/0x11f0
>>>>> [ 9.072036][ T1] new_slab+0x46/0x70
>>>>> [ 9.072036][ T1] ___slab_alloc+0x5d4/0x9c0
>>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>>>> [ 9.072036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>>>> [ 9.072036][ T1] ? ___might_sleep+0xab/0xc0
>>>>> [ 9.072036][ T1] ? create_object+0x3a/0x3e0
>>>>> [ 9.072036][ T1] __slab_alloc+0x12/0x20
>>>>> [ 9.072036][ T1] ? __slab_alloc+0x12/0x20
>>>>> [ 9.072036][ T1] kmem_cache_alloc+0x32a/0x400
>>>>> [ 9.072036][ T1] create_object+0x3a/0x3e0
>>>>> [ 9.072036][ T1] kmemleak_alloc+0x71/0xa0
>>>>> [ 9.072036][ T1] kmem_cache_alloc+0x272/0x400
>>>>> [ 9.072036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.072036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>>>> [ 9.072036][ T1] acpi_ps_alloc_op+0x76/0x122
>>>>> [ 9.072036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>>>> [ 9.072036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>>>> [ 9.072036][ T1] acpi_ns_init_one_package+0x33/0x61
>>>>> [ 9.072036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>>>> [ 9.072036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>>> [ 9.072036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>>> [ 9.072036][ T1] acpi_walk_namespace+0x9e/0xcb
>>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.072036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>>>> [ 9.072036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>>> [ 9.072036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>>> [ 9.072036][ T1] acpi_load_tables+0x61/0x80
>>>>> [ 9.072036][ T1] acpi_init+0x10d/0x44b
>>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.072036][ T1] ? bus_uevent_filter+0x16/0x30
>>>>> [ 9.072036][ T1] ? kobject_uevent_env+0x109/0x980
>>>>> [ 9.072036][ T1] ? kernfs_get+0x13/0x20
>>>>> [ 9.072036][ T1] ? kobject_uevent+0xb/0x10
>>>>> [ 9.072036][ T1] ? kset_register+0x31/0x50
>>>>> [ 9.072036][ T1] ? kset_create_and_add+0x9f/0xd0
>>>>> [ 9.072036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.072036][ T1] do_one_initcall+0xfe/0x45a
>>>>> [ 9.072036][ T1] ? initcall_blacklisted+0x150/0x150
>>>>> [ 9.072036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>>>> [ 9.072036][ T1] ? kasan_check_write+0x14/0x20
>>>>> [ 9.072036][ T1] ? up_write+0x6b/0x190
>>>>> [ 9.072036][ T1] kernel_init_freeable+0x614/0x6a7
>>>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>>>> [ 9.072036][ T1] kernel_init+0x11/0x138
>>>>> [ 9.072036][ T1] ? rest_init+0x188/0x188
>>>>> [ 9.072036][ T1] ret_from_fork+0x22/0x40
>>>>> [ 9.072036][ T1]
>>>>> ==================================================================
>>>>> [ 9.072036][ T1] Disabling lock debugging due to kernel taint
>>>>> [ 9.145712][ T1] BUG: kernel NULL pointer dereference, address: 0000000000000dc8
>>>>> [ 9.152036][ T1] #PF: supervisor read access in kernel mode
>>>>> [ 9.152036][ T1] #PF: error_code(0x0000) - not-present page
>>>>> [ 9.152036][ T1] PGD 0 P4D 0
>>>>> [ 9.152036][ T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>>>> [ 9.152036][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B 5.2.0-next-20190711+ #10
>>>>> [ 9.152036][ T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
>>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
>>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
>>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
>>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
>>>>> [ 9.152036][ T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:0000000000000000
>>>>> [ 9.152036][ T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 9.152036][ T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: 00000000001406b0
>>>>> [ 9.152036][ T1] Call Trace:
>>>>> [ 9.152036][ T1] ? mem_cgroup_iter+0x98/0x840
>>>>> [ 9.152036][ T1] ? unregister_shrinker+0x110/0x110
>>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.152036][ T1] ? mem_cgroup_protected+0x39/0x260
>>>>> [ 9.152036][ T1] shrink_node+0x31e/0xa30
>>>>> [ 9.152036][ T1] ? shrink_node_memcg+0x1560/0x1560
>>>>> [ 9.152036][ T1] ? ktime_get+0x93/0x110
>>>>> [ 9.152036][ T1] do_try_to_free_pages+0x22f/0x820
>>>>> [ 9.152036][ T1] ? shrink_node+0xa30/0xa30
>>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.152036][ T1] ? check_chain_key+0x1df/0x2e0
>>>>> [ 9.152036][ T1] try_to_free_pages+0x242/0x4d0
>>>>> [ 9.152036][ T1] ? do_try_to_free_pages+0x820/0x820
>>>>> [ 9.152036][ T1] __alloc_pages_nodemask+0x9ce/0x1bc0
>>>>> [ 9.152036][ T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>>> [ 9.152036][ T1] ? unwind_dump+0x260/0x260
>>>>> [ 9.152036][ T1] ? kernel_text_address+0x33/0xc0
>>>>> [ 9.152036][ T1] ? arch_stack_walk+0x8f/0xf0
>>>>> [ 9.152036][ T1] ? ret_from_fork+0x22/0x40
>>>>> [ 9.152036][ T1] alloc_page_interleave+0x18/0x130
>>>>> [ 9.152036][ T1] alloc_pages_current+0xf6/0x110
>>>>> [ 9.152036][ T1] allocate_slab+0x600/0x11f0
>>>>> [ 9.152036][ T1] new_slab+0x46/0x70
>>>>> [ 9.152036][ T1] ___slab_alloc+0x5d4/0x9c0
>>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>>>> [ 9.152036][ T1] ? fs_reclaim_acquire.part.15+0x5/0x30
>>>>> [ 9.152036][ T1] ? ___might_sleep+0xab/0xc0
>>>>> [ 9.152036][ T1] ? create_object+0x3a/0x3e0
>>>>> [ 9.152036][ T1] __slab_alloc+0x12/0x20
>>>>> [ 9.152036][ T1] ? __slab_alloc+0x12/0x20
>>>>> [ 9.152036][ T1] kmem_cache_alloc+0x32a/0x400
>>>>> [ 9.152036][ T1] create_object+0x3a/0x3e0
>>>>> [ 9.152036][ T1] kmemleak_alloc+0x71/0xa0
>>>>> [ 9.152036][ T1] kmem_cache_alloc+0x272/0x400
>>>>> [ 9.152036][ T1] ? kasan_check_read+0x11/0x20
>>>>> [ 9.152036][ T1] ? do_raw_spin_unlock+0xa8/0x140
>>>>> [ 9.152036][ T1] acpi_ps_alloc_op+0x76/0x122
>>>>> [ 9.152036][ T1] acpi_ds_execute_arguments+0x2f/0x18d
>>>>> [ 9.152036][ T1] acpi_ds_get_package_arguments+0x7d/0x84
>>>>> [ 9.152036][ T1] acpi_ns_init_one_package+0x33/0x61
>>>>> [ 9.152036][ T1] acpi_ns_init_one_object+0xfc/0x189
>>>>> [ 9.152036][ T1] acpi_ns_walk_namespace+0x114/0x1f2
>>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>>> [ 9.152036][ T1] ? acpi_ns_init_one_package+0x61/0x61
>>>>> [ 9.152036][ T1] acpi_walk_namespace+0x9e/0xcb
>>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.152036][ T1] acpi_ns_initialize_objects+0x99/0xed
>>>>> [ 9.152036][ T1] ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>>> [ 9.152036][ T1] ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>>> [ 9.152036][ T1] acpi_load_tables+0x61/0x80
>>>>> [ 9.152036][ T1] acpi_init+0x10d/0x44b
>>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.152036][ T1] ? bus_uevent_filter+0x16/0x30
>>>>> [ 9.152036][ T1] ? kobject_uevent_env+0x109/0x980
>>>>> [ 9.152036][ T1] ? kernfs_get+0x13/0x20
>>>>> [ 9.152036][ T1] ? kobject_uevent+0xb/0x10
>>>>> [ 9.152036][ T1] ? kset_register+0x31/0x50
>>>>> [ 9.152036][ T1] ? kset_create_and_add+0x9f/0xd0
>>>>> [ 9.152036][ T1] ? acpi_sleep_proc_init+0x36/0x36
>>>>> [ 9.152036][ T1] do_one_initcall+0xfe/0x45a
>>>>> [ 9.152036][ T1] ? initcall_blacklisted+0x150/0x150
>>>>> [ 9.152036][ T1] ? rwsem_down_read_slowpath+0x930/0x930
>>>>> [ 9.152036][ T1] ? kasan_check_write+0x14/0x20
>>>>> [ 9.152036][ T1] ? up_write+0x6b/0x190
>>>>> [ 9.152036][ T1] kernel_init_freeable+0x614/0x6a7
>>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>>>> [ 9.152036][ T1] kernel_init+0x11/0x138
>>>>> [ 9.152036][ T1] ? rest_init+0x188/0x188
>>>>> [ 9.152036][ T1] ret_from_fork+0x22/0x40
>>>>> [ 9.152036][ T1] Modules linked in:
>>>>> [ 9.152036][ T1] CR2: 0000000000000dc8
>>>>> [ 9.152036][ T1] ---[ end trace 568acce4eca01945 ]---
>>>>> [ 9.152036][ T1] RIP: 0010:shrink_slab+0x111/0x440
>>>>> [ 9.152036][ T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>>> [ 9.152036][ T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>>> [ 9.152036][ T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
>>>>> [ 9.152036][ T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
>>>>> [ 9.152036][ T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
>>>>> [ 9.152036][ T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
>>>>> [ 9.152036][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
>>>>> [ 9.152036][ T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:00000000
>>>>>
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-16 3:00 ` Yang Shi
@ 2019-07-16 23:36 ` Shakeel Butt
2019-07-17 0:12 ` Yang Shi
0 siblings, 1 reply; 21+ messages in thread
From: Shakeel Butt @ 2019-07-16 23:36 UTC (permalink / raw)
To: Yang Shi, Kirill Tkhai, Vladimir Davydov, Hugh Dickins,
Michal Hocko, Johannes Weiner, Roman Gushchin
Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML
Adding related people.
The thread starts at:
http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/15/19 6:36 PM, Qian Cai wrote:
> >
> >> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 7/15/19 2:23 PM, Qian Cai wrote:
> >>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>> Another possible lead is that without reverting those commits below,
> >>>>> kdump
> >>>>> kernel would always also crash in shrink_slab_memcg() at this line,
> >>>>>
> >>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> >>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
> >>>> think of where nodeinfo would be freed while the memcg was still online.
> >>>> Maybe a check is needed:
> >>> Actually, "memcg" is NULL.
> >> It sounds weird. shrink_slab() is called from mem_cgroup_iter(), which does pin the memcg. So, the memcg should not go away.
> > Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
> >
> > - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> > + if (!mem_cgroup_online(memcg))
> > return 0;
> >
> > Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as,
> >
> > if (mem_cgroup_disabled())
> > return NULL;
>
> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
> check before calling shrink_slab_memcg() as below:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..2f03c61 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int
> nid,
> unsigned long ret, freed = 0;
> struct shrinker *shrinker;
>
> - if (!mem_cgroup_is_root(memcg))
> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>
> if (!down_read_trylock(&shrinker_rwsem))
>
We were seeing unneeded oom-kills on kernels with
"cgroup_disable=memory", and Yang's patch series basically exposes the
bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
generalize shrink_slab() calls in shrink_node()") missed the case of
"cgroup_disable=memory". However, I am surprised that root_mem_cgroup
is allocated even with "cgroup_disable=memory"; it seems
css_alloc() is called even before checking whether the corresponding
controller is disabled.
Yang, can you please send the above change with signed-off and CC to
stable as well?
thanks,
Shakeel
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-16 23:36 ` Shakeel Butt
@ 2019-07-17 0:12 ` Yang Shi
2019-07-17 17:02 ` Shakeel Butt
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-17 0:12 UTC (permalink / raw)
To: Shakeel Butt, Kirill Tkhai, Vladimir Davydov, Hugh Dickins,
Michal Hocko, Johannes Weiner, Roman Gushchin
Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML
On 7/16/19 4:36 PM, Shakeel Butt wrote:
> Adding related people.
>
> The thread starts at:
> http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
>
> On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>> On 7/15/19 6:36 PM, Qian Cai wrote:
>>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>>>> Another possible lead is that without reverting those commits below,
>>>>>>> kdump
>>>>>>> kernel would always also crash in shrink_slab_memcg() at this line,
>>>>>>>
>>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't
>>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a
>>>>>> check is needed:
>>>>> Actually, "memcg" is NULL.
>>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away.
>>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
>>>
>>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
>>> + if (!mem_cgroup_online(memcg))
>>> return 0;
>>>
>>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as,
>>>
>>> if (mem_cgroup_disabled())
>>> return NULL;
>> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
>> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
>> check before calling shrink_slab_memcg() as below:
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..2f03c61 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int
>> nid,
>> unsigned long ret, freed = 0;
>> struct shrinker *shrinker;
>>
>> - if (!mem_cgroup_is_root(memcg))
>> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>> return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>
>> if (!down_read_trylock(&shrinker_rwsem))
>>
> We were seeing unneeded oom-kills on kernels with
> "cgroup_disable=memory", and Yang's patch series basically exposes the
> bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
> generalize shrink_slab() calls in shrink_node()") missed the case of
> "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
> is allocated even with "cgroup_disable=memory"; it seems
> css_alloc() is called even before checking whether the corresponding
> controller is disabled.
I'm surprised too. A quick test with drgn shows root memcg is definitely
allocated:
>>> prog['root_mem_cgroup']
*(struct mem_cgroup *)0xffff8902cf058000 = {
[snip]
But, isn't this a bug?
Thanks,
Yang
>
> Yang, can you please send the above change with signed-off and CC to
> stable as well?
>
> thanks,
> Shakeel
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-17 0:12 ` Yang Shi
@ 2019-07-17 17:02 ` Shakeel Butt
2019-07-17 17:09 ` Yang Shi
0 siblings, 1 reply; 21+ messages in thread
From: Shakeel Butt @ 2019-07-17 17:02 UTC (permalink / raw)
To: Yang Shi
Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko,
Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov,
Andrew Morton, Linux MM, LKML
On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/16/19 4:36 PM, Shakeel Butt wrote:
> > Adding related people.
> >
> > The thread starts at:
> > http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
> >
> > On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 7/15/19 6:36 PM, Qian Cai wrote:
> >>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/15/19 2:23 PM, Qian Cai wrote:
> >>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> > >>>>>>> Another possible lead is that without reverting those commits below,
> >>>>>>> kdump
> >>>>>>> kernel would always also crash in shrink_slab_memcg() at this line,
> >>>>>>>
> >>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> >>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't
> >>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a
> >>>>>> check is needed:
> >>>>> Actually, "memcg" is NULL.
> >>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away.
> >>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
> >>>
> >>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> >>> + if (!mem_cgroup_online(memcg))
> >>> return 0;
> >>>
> >>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as,
> >>>
> >>> if (mem_cgroup_disabled())
> >>> return NULL;
> >> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
> >> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
> >> check before calling shrink_slab_memcg() as below:
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index a0301ed..2f03c61 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int
> >> nid,
> >> unsigned long ret, freed = 0;
> >> struct shrinker *shrinker;
> >>
> >> - if (!mem_cgroup_is_root(memcg))
> >> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> >> return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> >>
> >> if (!down_read_trylock(&shrinker_rwsem))
> >>
> > We were seeing unneeded oom-kills on kernels with
> > "cgroup_disable=memory", and Yang's patch series basically exposes the
> > bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
> > generalize shrink_slab() calls in shrink_node()") missed the case of
> > "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
> > is allocated even with "cgroup_disable=memory"; it seems
> > css_alloc() is called even before checking whether the corresponding
> > controller is disabled.
>
> I'm surprised too. A quick test with drgn shows root memcg is definitely
> allocated:
>
> >>> prog['root_mem_cgroup']
> *(struct mem_cgroup *)0xffff8902cf058000 = {
> [snip]
>
> But, isn't this a bug?
It can be treated as a bug since this is not expected, but we can discuss
and take care of it later. I think we need your patch urgently, as
memory reclaim and /proc/sys/vm/drop_caches are broken for
"cgroup_disable=memory" kernels. So, please send your patch asap.
thanks,
Shakeel
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-17 17:02 ` Shakeel Butt
@ 2019-07-17 17:09 ` Yang Shi
0 siblings, 0 replies; 21+ messages in thread
From: Yang Shi @ 2019-07-17 17:09 UTC (permalink / raw)
To: Shakeel Butt
Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko,
Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov,
Andrew Morton, Linux MM, LKML
On 7/17/19 10:02 AM, Shakeel Butt wrote:
> On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>> On 7/16/19 4:36 PM, Shakeel Butt wrote:
>>> Adding related people.
>>>
>>> The thread starts at:
>>> http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
>>>
>>> On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>
>>>> On 7/15/19 6:36 PM, Qian Cai wrote:
>>>>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>>>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>>>>>> Another possible lead is that without reverting those commits below,
>>>>>>>>> kdump
>>>>>>>>> kernel would always also crash in shrink_slab_memcg() at this line,
>>>>>>>>>
>>>>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't
>>>>>>>> think of where nodeinfo was freed but memcg was still online. Maybe a
>>>>>>>> check is needed:
>>>>>>> Actually, "memcg" is NULL.
>>>>>> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away.
>>>>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
>>>>>
>>>>> - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
>>>>> + if (!mem_cgroup_online(memcg))
>>>>> return 0;
>>>>>
>>>>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as,
>>>>>
>>>>> if (mem_cgroup_disabled())
>>>>> return NULL;
>>>> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
>>>> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
>>>> check before calling shrink_slab_memcg() as below:
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a0301ed..2f03c61 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int
>>>> nid,
>>>> unsigned long ret, freed = 0;
>>>> struct shrinker *shrinker;
>>>>
>>>> - if (!mem_cgroup_is_root(memcg))
>>>> + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>>>> return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>>>
>>>> if (!down_read_trylock(&shrinker_rwsem))
>>>>
>>> We were seeing unneeded oom-kills on kernels with
>>> "cgroup_disable=memory", and Yang's patch series basically exposes the
>>> bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
>>> generalize shrink_slab() calls in shrink_node()") missed the case of
>>> "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
>>> is allocated even with "cgroup_disable=memory"; it seems
>>> css_alloc() is called even before checking whether the corresponding
>>> controller is disabled.
>> I'm surprised too. A quick test with drgn shows root memcg is definitely
>> allocated:
>>
>> >>> prog['root_mem_cgroup']
>> *(struct mem_cgroup *)0xffff8902cf058000 = {
>> [snip]
>>
>> But, isn't this a bug?
> It can be treated as a bug since this is not expected, but we can discuss
> and take care of it later. I think we need your patch urgently, as
> memory reclaim and /proc/sys/vm/drop_caches are broken for
> "cgroup_disable=memory" kernels. So, please send your patch asap.
Sure. I'm going to post the patch soon.
>
> thanks,
> Shakeel
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-12 19:12 ` Yang Shi
2019-07-13 4:41 ` Yang Shi
2019-07-15 21:23 ` Qian Cai
@ 2019-07-19 0:54 ` Qian Cai
2019-07-19 0:59 ` Yang Shi
2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-07-19 0:54 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel
> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>>
>>>
>>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>>> Could you please share more details about your test? How often did you
>>> run into this problem?
>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here
>> is some more information.
>>
>> # cat .config
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.
>
> According to the bug call trace in the earlier email, it looks like deferred_split_scan() lost a race with put_compound_page(). put_compound_page() would call free_transhuge_page(), which deletes the page from the deferred split queue, but the page may still appear on the deferred list for some reason.
>
> Would you please try the below patch?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
> if (!list_empty(page_deferred_list(head))) {
> ds_queue->split_queue_len--;
> - list_del(page_deferred_list(head));
> + list_del_init(page_deferred_list(head));
> }
> if (mapping)
> __dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> if (!list_empty(page_deferred_list(page))) {
> ds_queue->split_queue_len--;
> - list_del(page_deferred_list(page));
> + list_del_init(page_deferred_list(page));
> }
> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> free_compound_page(page);
Unfortunately, I am no longer able to reproduce the original list corruption with today’s linux-next.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-19 0:54 ` Qian Cai
@ 2019-07-19 0:59 ` Yang Shi
2019-07-24 18:10 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-19 0:59 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel
On 7/18/19 5:54 PM, Qian Cai wrote:
>
>> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/11/19 2:07 PM, Qian Cai wrote:
>>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>>> Hi Qian,
>>>>
>>>>
>>>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>>>> Could you please share more details about your test? How often did you
>>>> run into this problem?
>>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here
>>> is some more information.
>>>
>>> # cat .config
>>>
>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.
>>
>> According to the bug call trace in the earlier email, it looks like deferred_split_scan() lost a race with put_compound_page(). put_compound_page() would call free_transhuge_page(), which deletes the page from the deferred split queue, but the page may still appear on the deferred list for some reason.
>>
>> Would you please try the below patch?
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b7f709d..66bd9db 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>> if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>> if (!list_empty(page_deferred_list(head))) {
>> ds_queue->split_queue_len--;
>> - list_del(page_deferred_list(head));
>> + list_del_init(page_deferred_list(head));
>> }
>> if (mapping)
>> __dec_node_page_state(page, NR_SHMEM_THPS);
>> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> if (!list_empty(page_deferred_list(page))) {
>> ds_queue->split_queue_len--;
>> - list_del(page_deferred_list(page));
>> + list_del_init(page_deferred_list(page));
>> }
>> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>> free_compound_page(page);
> Unfortunately, I am no longer able to reproduce the original list corruption with today’s linux-next.
I guess it is because the patches have been dropped from the -mm tree by
Andrew due to this problem. You have to use next-20190711, or apply the
patches on top of today's linux-next.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-19 0:59 ` Yang Shi
@ 2019-07-24 18:10 ` Qian Cai
0 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-07-24 18:10 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel
On Thu, 2019-07-18 at 17:59 -0700, Yang Shi wrote:
>
> On 7/18/19 5:54 PM, Qian Cai wrote:
> >
> > > On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 7/11/19 2:07 PM, Qian Cai wrote:
> > > > On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
> > > > > Hi Qian,
> > > > >
> > > > >
> > > > > Thanks for reporting the issue. But, I can't reproduce it on my
> > > > > machine.
> > > > > Could you please share more details about your test? How often did you
> > > > > run into this problem?
> > > >
> > > > I can almost reproduce it every time on a HPE ProLiant DL385 Gen10
> > > > server. Here
> > > > is some more information.
> > > >
> > > > # cat .config
> > > >
> > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> > >
> > > I tried your kernel config, but I still can't reproduce it. My compiler
> > > doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my
> > > test, but I don't think this would make any difference for this case.
> > >
> > > According to the bug call trace in the earlier email, it looks like
> > > deferred_split_scan() lost a race with put_compound_page().
> > > put_compound_page() would call free_transhuge_page(), which deletes the
> > > page from the deferred split queue, but the page may still appear on the
> > > deferred list for some reason.
> > >
> > > Would you please try the below patch?
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index b7f709d..66bd9db 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page,
> > > struct list_head *list)
> > > if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
> > > if (!list_empty(page_deferred_list(head))) {
> > > ds_queue->split_queue_len--;
> > > - list_del(page_deferred_list(head));
> > > + list_del_init(page_deferred_list(head));
> > > }
> > > if (mapping)
> > > __dec_node_page_state(page, NR_SHMEM_THPS);
> > > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
> > > spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > > if (!list_empty(page_deferred_list(page))) {
> > > ds_queue->split_queue_len--;
> > > - list_del(page_deferred_list(page));
> > > + list_del_init(page_deferred_list(page));
> > > }
> > > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> > > free_compound_page(page);
> >
> > Unfortunately, I am no longer able to reproduce the original list
> > corruption with today’s linux-next.
>
> I guess it is because the patches have been dropped from the -mm tree by
> Andrew due to this problem. You have to use next-20190711, or apply the
> patches on top of today's linux-next.
>
The patch you have here does not help. I only applied the part for
free_transhuge_page() as you requested.
[ 375.006307][ T3580] list_del corruption. next->prev should be ffffea0030e10098, but was ffff888ea8d0cdb8
[ 375.015928][ T3580] ------------[ cut here ]------------
[ 375.021296][ T3580] kernel BUG at lib/list_debug.c:56!
[ 375.026491][ T3580] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 375.033680][ T3580] CPU: 84 PID: 3580 Comm: oom01 Tainted: G    W 5.2.0-next-20190711+ #2
[ 375.042964][ T3580] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019
[ 375.052256][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6
[ 375.058135][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7 c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f> 0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7
[ 375.077722][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082
[ 375.083684][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX: ffffffffb015d728
[ 375.091566][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88903263d380
[ 375.099448][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09: ffffed12064c7a70
[ 375.107330][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12: ffffea0030e10098
[ 375.115212][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15: ffffea0031d40098
[ 375.123095][ T3580] FS:  00007fc3dc851700(0000) GS:ffff889032600000(0000) knlGS:0000000000000000
[ 375.131937][ T3580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 375.138421][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4: 00000000001406a0
[ 375.146301][ T3580] Call Trace:
[ 375.149472][ T3580] deferred_split_scan+0x337/0x740
[ 375.154475][ T3580] ? split_huge_page_to_list+0xe30/0xe30
[ 375.160002][ T3580] ? __sched_text_start+0x8/0x8
[ 375.164743][ T3580] ? __radix_tree_lookup+0x12d/0x1e0
[ 375.169923][ T3580] do_shrink_slab+0x244/0x5a0
[ 375.174490][ T3580] shrink_slab+0x253/0x440
[ 375.178794][ T3580] ? unregister_shrinker+0x110/0x110
[ 375.183972][ T3580] ? kasan_check_read+0x11/0x20
[ 375.188715][ T3580] ? mem_cgroup_protected+0x20f/0x260
[ 375.193976][ T3580] ? shrink_node+0x1ad/0xa30
[ 375.198453][ T3580] shrink_node+0x31e/0xa30
[ 375.202755][ T3580] ? shrink_node_memcg+0x1560/0x1560
[ 375.207934][ T3580] ? ktime_get+0x93/0x110
[ 375.212147][ T3580] do_try_to_free_pages+0x22f/0x820
[ 375.217236][ T3580] ? shrink_node+0xa30/0xa30
[ 375.221711][ T3580] ? kasan_check_read+0x11/0x20
[ 375.226450][ T3580] ? check_chain_key+0x1df/0x2e0
[ 375.231277][ T3580] try_to_free_pages+0x242/0x4d0
[ 375.236102][ T3580] ? do_try_to_free_pages+0x820/0x820
[ 375.241370][ T3580] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 375.246721][ T3580] ? kasan_check_read+0x11/0x20
[ 375.251459][ T3580] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 375.256722][ T3580] ? kasan_check_read+0x11/0x20
[ 375.261458][ T3580] ? check_chain_key+0x1df/0x2e0
[ 375.266287][ T3580] ? do_anonymous_page+0x343/0xe30
[ 375.271289][ T3580] ? lock_downgrade+0x390/0x390
[ 375.276029][ T3580] ? __count_memcg_events+0x8b/0x1c0
[ 375.281204][ T3580] ? kasan_check_read+0x11/0x20
[ 375.285945][ T3580] ? __lru_cache_add+0x122/0x160
[ 375.290774][ T3580] alloc_pages_vma+0x89/0x2c0
[ 375.295339][ T3580] do_anonymous_page+0x3e1/0xe30
[ 375.300168][ T3580] ? __update_load_avg_cfs_rq+0x2c/0x490
[ 375.305692][ T3580] ? finish_fault+0x120/0x120
[ 375.310257][ T3580] ? alloc_pages_vma+0x21e/0x2c0
[ 375.315085][ T3580] handle_pte_fault+0x457/0x12c0
[ 375.319912][ T3580] __handle_mm_fault+0x79a/0xa50
[ 375.324738][ T3580] ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 375.330175][ T3580] ? kasan_check_read+0x11/0x20
[ 375.334913][ T3580] ? __count_memcg_events+0x8b/0x1c0
[ 375.340090][ T3580] handle_mm_fault+0x17f/0x370
[ 375.344745][ T3580] __do_page_fault+0x25b/0x5d0
[ 375.349398][ T3580] do_page_fault+0x4c/0x2cf
[ 375.353793][ T3580] ? page_fault+0x5/0x20
[ 375.357920][ T3580] page_fault+0x1b/0x20
[ 375.361959][ T3580] RIP: 0033:0x410be0
[ 375.365737][ T3580] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 375.385323][ T3580] RSP: 002b:00007fc3dc850ec0 EFLAGS: 00010206
[ 375.391283][ T3580] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
00007fda6c168497
[ 375.399164][ T3580] RDX: 00000000041e9000 RSI: 00000000c0000000 RDI:
0000000000000000
[ 375.407047][ T3580] RBP: 00007fc25b850000 R08: 00000000ffffffff R09:
0000000000000000
[ 375.414928][ T3580] R10: 0000000000000022 R11: 0000000000000246 R12:
0000000000000001
[ 375.422812][ T3580] R13: 00007ffc4a58701f R14: 0000000000000000 R15:
00007fc3dc850fc0
[ 375.430694][ T3580] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat
kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars ip_tables
x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 firmware_class
libphy dm_mirror dm_region_hash dm_log dm_mod efivarfs
[ 375.455820][ T3580] ---[ end trace 82d52f9627313e53 ]---
[ 375.461172][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6
[ 375.467048][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7
c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f>
0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7
[ 375.486635][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082
[ 375.492597][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX:
ffffffffb015d728
[ 375.500479][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff88903263d380
[ 375.508361][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09:
ffffed12064c7a70
[ 375.516244][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12:
ffffea0030e10098
[ 375.524124][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15:
ffffea0031d40098
[ 375.532007][ T3580] FS: 00007fc3dc851700(0000) GS:ffff889032600000(0000)
knlGS:0000000000000000
[ 375.540851][ T3580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 375.547335][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4:
00000000001406a0
[ 375.555217][ T3580] Kernel panic - not syncing: Fatal exception
[ 376.868640][ T3580] Shutting down cpus with NMI
[ 376.873223][ T3580] Kernel Offset: 0x2ec00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 376.884878][ T3580] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: list corruption in deferred_split_scan()
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
2019-07-11 0:16 ` Yang Shi
2019-07-15 4:52 ` Yang Shi
@ 2019-07-24 21:13 ` Qian Cai
2019-07-25 21:46 ` Yang Shi
2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-07-24 21:13 UTC (permalink / raw)
To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
> Running LTP oom01 test case with swap triggers a crash below. Revert the
> series
> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
You might want to look harder at this commit, as reverting it alone on top of
5.2.0-next-20190711 fixed the issue.
aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
[1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@linux.alibaba.com/
Below is all of the console output from running LTP oom01 before the crash,
which might be useful.
[ 656.302886][ T3384] WARNING: CPU: 79 PID: 3384 at mm/page_alloc.c:4608
__alloc_pages_nodemask+0x1a8a/0x1bc0
[ 656.304395][ T3409] kmemleak: Cannot allocate a kmemleak_object structure
[ 656.312714][ T3384] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat
kvm_amd kvm ses enclosure dax_pmem irqbypass dax_pmem_core efivars ip_tables
x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 libphy
firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs
[ 656.320916][ T3409] kmemleak: Kernel memory leak detector disabled
[ 656.344509][ T3384] CPU: 79 PID: 3384 Comm: oom01 Not tainted 5.2.0-next-
20190711+ #3
[ 656.344523][ T3384] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[ 656.352100][ T829] kmemleak: Automatic memory scanning thread ended
[ 656.358648][ T3384] RIP: 0010:__alloc_pages_nodemask+0x1a8a/0x1bc0
[ 656.358658][ T3384] Code: 00 85 d2 0f 85 a1 00 00 00 48 c7 c7 e0 29 c3 a3 e8
3b 98 62 00 65 48 8b 1c 25 80 ee 01 00 e9 85 fa ff ff 0f 0b e9 3e fb ff ff <0f>
0b 48 8b b5 00 ff ff ff 8b 8d 84 fe ff ff 48 c7 c2 00 1d 6c a3
[ 656.358675][ T3384] RSP: 0000:ffff888efa4a6210 EFLAGS: 00010046
[ 656.406140][ T3384] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffffa2b28be2
[ 656.414033][ T3384] RDX: 0000000000000000 RSI: dffffc0000000000 RDI:
ffffffffa4d15d60
[ 656.421926][ T3384] RBP: ffff888efa4a6420 R08: fffffbfff49a2bad R09:
fffffbfff49a2bac
[ 656.429818][ T3384] R10: fffffbfff49a2bac R11: 0000000000000003 R12:
ffffffffa4d15d60
[ 656.437711][ T3384] R13: 0000000000000000 R14: 0000000000000800 R15:
0000000000000000
[ 656.445605][ T3384] FS: 00007ff44adfc700(0000) GS:ffff889032f80000(0000)
knlGS:0000000000000000
[ 656.454459][ T3384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 656.460952][ T3384] CR2: 00007ff2f05e1000 CR3: 0000001012e44000 CR4:
00000000001406a0
[ 656.468843][ T3384] Call Trace:
[ 656.472026][ T3384] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 656.477303][ T3384] ? stack_depot_save+0x215/0x58b
[ 656.482228][ T3384] ? lock_downgrade+0x390/0x390
[ 656.486976][ T3384] ? stack_depot_save+0x183/0x58b
[ 656.491900][ T3384] ? kasan_check_read+0x11/0x20
[ 656.496647][ T3384] ? do_raw_spin_unlock+0xa8/0x140
[ 656.501658][ T3384] ? stack_depot_save+0x215/0x58b
[ 656.506582][ T3384] alloc_pages_current+0x9c/0x110
[ 656.511505][ T3384] allocate_slab+0x351/0x11f0
[ 656.516077][ T3384] ? kasan_slab_alloc+0x11/0x20
[ 656.520824][ T3384] new_slab+0x46/0x70
[ 656.524702][ T3384] ? pageout.isra.4+0x3e5/0xa00
[ 656.529449][ T3384] ___slab_alloc+0x5d4/0x9c0
[ 656.533933][ T3384] ? try_to_free_pages+0x242/0x4d0
[ 656.538941][ T3384] ? __alloc_pages_nodemask+0x9ce/0x1bc0
[ 656.544476][ T3384] ? alloc_pages_vma+0x89/0x2c0
[ 656.549226][ T3384] ? __do_page_fault+0x25b/0x5d0
[ 656.554064][ T3384] ? create_object+0x3a/0x3e0
[ 656.558637][ T3384] ? init_object+0x7e/0x90
[ 656.562947][ T3384] ? create_object+0x3a/0x3e0
[ 656.567520][ T3384] __slab_alloc+0x12/0x20
[ 656.571742][ T3384] ? __slab_alloc+0x12/0x20
[ 656.576142][ T3384] kmem_cache_alloc+0x32a/0x400
[ 656.580890][ T3384] create_object+0x3a/0x3e0
[ 656.585291][ T3384] ? stack_depot_save+0x183/0x58b
[ 656.590215][ T3384] kmemleak_alloc+0x71/0xa0
[ 656.594611][ T3384] kmem_cache_alloc+0x272/0x400
[ 656.599361][ T3384] ? ___might_sleep+0xab/0xc0
[ 656.603934][ T3384] ? mempool_free+0x170/0x170
[ 656.608507][ T3384] mempool_alloc_slab+0x2d/0x40
[ 656.613254][ T3384] mempool_alloc+0x10a/0x29e
[ 656.617739][ T3384] ? alloc_pages_vma+0x89/0x2c0
[ 656.622485][ T3384] ? mempool_resize+0x390/0x390
[ 656.627233][ T3384] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 656.633730][ T3384] bio_alloc_bioset+0x150/0x330
[ 656.638477][ T3384] ? bvec_alloc+0x1b0/0x1b0
[ 656.642892][ T3384] alloc_io+0x2f/0x230 [dm_mod]
[ 656.647654][ T3384] __split_and_process_bio+0x99/0x630 [dm_mod]
[ 656.653714][ T3384] ? blk_rq_map_sg+0x9f0/0x9f0
[ 656.658388][ T3384] ? __send_empty_flush.constprop.11+0x1f0/0x1f0 [dm_mod]
[ 656.665407][ T3384] ? check_chain_key+0x1df/0x2e0
[ 656.670244][ T3384] ? kasan_check_read+0x11/0x20
[ 656.674992][ T3384] ? blk_queue_split+0x60/0x90
[ 656.679654][ T3384] ? __blk_queue_split+0x970/0x970
[ 656.684679][ T3384] dm_process_bio+0x33f/0x520 [dm_mod]
[ 656.690054][ T3384] ? __process_bio+0x230/0x230 [dm_mod]
[ 656.695515][ T3384] dm_make_request+0xbd/0x150 [dm_mod]
[ 656.700888][ T3384] ? dm_wq_work+0x1b0/0x1b0 [dm_mod]
[ 656.706073][ T3384] ? lock_downgrade+0x390/0x390
[ 656.710821][ T3384] generic_make_request+0x179/0x4a0
[ 656.715917][ T3384] ? blk_queue_exit+0xc0/0xc0
[ 656.720489][ T3384] ? __unlock_page_memcg+0x4f/0x90
[ 656.725495][ T3384] ? unlock_page_memcg+0x1f/0x30
[ 656.730329][ T3384] submit_bio+0xaa/0x270
[ 656.734466][ T3384] ? generic_make_request+0x4a0/0x4a0
[ 656.739739][ T3384] __swap_writepage+0x8f5/0xba0
[ 656.744484][ T3384] ? __x64_sys_madvise.cold.0+0x22/0x22
[ 656.749931][ T3384] ? generic_swapfile_activate+0x2a0/0x2a0
[ 656.755638][ T3384] ? do_raw_spin_lock+0x118/0x1d0
[ 656.760559][ T3384] ? rwlock_bug.part.0+0x60/0x60
[ 656.765393][ T3384] ? page_swapcount+0x68/0xc0
[ 656.769967][ T3384] ? kasan_check_read+0x11/0x20
[ 656.774713][ T3384] ? do_raw_spin_unlock+0xa8/0x140
[ 656.779724][ T3384] ? __frontswap_store+0x103/0x2b0
[ 656.784735][ T3384] swap_writepage+0x65/0xb0
[ 656.789134][ T3384] pageout.isra.4+0x3e5/0xa00
[ 656.793707][ T3384] ? shrink_slab+0x440/0x440
[ 656.798192][ T3384] ? kasan_check_read+0x11/0x20
[ 656.802939][ T3384] shrink_page_list+0x159f/0x2650
[ 656.807860][ T3384] ? page_evictable+0x150/0x150
[ 656.812606][ T3384] ? kasan_check_read+0x11/0x20
[ 656.817352][ T3384] ? check_chain_key+0x1df/0x2e0
[ 656.822185][ T3384] ? shrink_inactive_list+0x2ea/0x770
[ 656.827456][ T3384] ? lock_downgrade+0x390/0x390
[ 656.832202][ T3384] ? do_raw_spin_lock+0x118/0x1d0
[ 656.837126][ T3384] ? rwlock_bug.part.0+0x60/0x60
[ 656.841959][ T3384] ? kasan_check_read+0x11/0x20
[ 656.846706][ T3384] ? do_raw_spin_unlock+0xa8/0x140
[ 656.851715][ T3384] shrink_inactive_list+0x373/0x770
[ 656.856812][ T3384] ? move_pages_to_lru+0xb60/0xb60
[ 656.861820][ T3384] ? shrink_node_memcg+0xcfa/0x1560
[ 656.866917][ T3384] ? lock_downgrade+0x390/0x390
[ 656.871665][ T3384] ? find_next_bit+0x2c/0xa0
[ 656.876151][ T3384] shrink_node_memcg+0x4ff/0x1560
[ 656.881075][ T3384] ? shrink_active_list+0xa10/0xa10
[ 656.886173][ T3384] ? dev_ifsioc+0xb0/0x4d0
[ 656.890485][ T3384] ? mem_cgroup_iter+0x18e/0x840
[ 656.895319][ T3384] ? kasan_check_read+0x11/0x20
[ 656.900066][ T3384] ? mem_cgroup_protected+0x20f/0x260
[ 656.905334][ T3384] shrink_node+0x1d3/0xa30
[ 656.909644][ T3384] ? shrink_node_memcg+0x1560/0x1560
[ 656.914828][ T3384] ? ktime_get+0x93/0x110
[ 656.919050][ T3384] do_try_to_free_pages+0x22f/0x820
[ 656.924146][ T3384] ? shrink_node+0xa30/0xa30
[ 656.928632][ T3384] ? kasan_check_read+0x11/0x20
[ 656.933379][ T3384] ? check_chain_key+0x1df/0x2e0
[ 656.938212][ T3384] try_to_free_pages+0x242/0x4d0
[ 656.943046][ T3384] ? do_try_to_free_pages+0x820/0x820
[ 656.948318][ T3384] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 656.953677][ T3384] ? kasan_check_read+0x11/0x20
[ 656.958424][ T3384] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 656.963697][ T3384] ? kasan_check_read+0x11/0x20
[ 656.968443][ T3384] ? check_chain_key+0x1df/0x2e0
[ 656.973277][ T3384] ? do_anonymous_page+0x343/0xe30
[ 656.978288][ T3384] ? lock_downgrade+0x390/0x390
[ 656.983035][ T3384] ? __count_memcg_events+0x8b/0x1c0
[ 656.988218][ T3384] ? kasan_check_read+0x11/0x20
[ 656.992966][ T3384] ? __lru_cache_add+0x122/0x160
[ 656.997802][ T3384] alloc_pages_vma+0x89/0x2c0
[ 657.002375][ T3384] do_anonymous_page+0x3e1/0xe30
[ 657.007211][ T3384] ? __update_load_avg_cfs_rq+0x2c/0x490
[ 657.012743][ T3384] ? finish_fault+0x120/0x120
[ 657.017314][ T3384] ? alloc_pages_vma+0x21e/0x2c0
[ 657.022148][ T3384] handle_pte_fault+0x457/0x12c0
[ 657.026984][ T3384] __handle_mm_fault+0x79a/0xa50
[ 657.031819][ T3384] ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 657.037267][ T3384] ? kasan_check_read+0x11/0x20
[ 657.042013][ T3384] ? __count_memcg_events+0x8b/0x1c0
[ 657.047199][ T3384] handle_mm_fault+0x17f/0x370
[ 657.051863][ T3384] __do_page_fault+0x25b/0x5d0
[ 657.056521][ T3384] do_page_fault+0x4c/0x2cf
[ 657.060922][ T3384] ? page_[ 659.105948][ T3124] kworker/2:1H: page
allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0,4
[ 659.106045][ T1598] kworker/10:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4
[ 659.118049][ T3124] CPU: 2 PID: 3124 Comm: kworker/2:1H Tainted:
G W 5.2.0-next-20190711+ #3
[ 659.137325][ T762] ODEBUG: Out of memory. ODEBUG disabled
[ 659.140015][ T3124] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[ 659.140032][ T3124] Workqueue: kblockd blk_mq_run_work_fn
[ 659.160266][ T3124] Call Trace:
[ 659.163442][ T3124] dump_stack+0x62/0x9a
[ 659.167487][ T3124] warn_alloc.cold.45+0x8a/0x12a
[ 659.172315][ T3124] ? zone_watermark_ok_safe+0x1a0/0x1a0
[ 659.177756][ T3124] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 659.184252][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.190658][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.197060][ T3124] ? __isolate_free_page+0x390/0x390
[ 659.202239][ T3124] __alloc_pages_nodemask+0x1aab/0x1bc0
[ 659.207680][ T3124] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 659.212949][ T3124] ? stack_trace_save+0x87/0xb0
[ 659.217689][ T3124] ? freezing_slow_path.cold.1+0x35/0x35
[ 659.223219][ T3124] ? __kasan_kmalloc.part.0+0x81/0xc0
[ 659.228485][ T3124] ? __kasan_kmalloc.part.0+0x44/0xc0
[ 659.233750][ T3124] ? __kasan_kmalloc.constprop.1+0xac/0xc0
[ 659.239451][ T3124] ? kasan_slab_alloc+0x11/0x20
[ 659.244196][ T3124] ? kmem_cache_alloc+0x17a/0x400
[ 659.249113][ T3124] ? alloc_iova+0x33/0x210
[ 659.253418][ T3124] ? alloc_iova_fast+0x47/0xba
[ 659.258073][ T3124] ? dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 659.263603][ T3124] ? map_sg+0x99/0x2f0
[ 659.267558][ T3124] ? scsi_dma_map+0xc6/0x160
[ 659.272042][ T3124] ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 659.280020][ T3124] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.286421][ T3124] ? scsi_queue_rq+0x7c6/0x1280
[ 659.291163][ T3124] ? ftrace_graph_ret_addr+0x2a/0xb0
[ 659.296340][ T3124] ? stack_trace_save+0x87/0xb0
[ 659.301081][ T3124] alloc_pages_current+0x9c/0x110
[ 659.305998][ T3124] allocate_slab+0x351/0x11f0
[ 659.310564][ T3124] new_slab+0x46/0x70
[ 659.314433][ T3124] ___slab_alloc+0x5d4/0x9c0
[ 659.318913][ T3124] ? should_fail+0x107/0x3bc
[ 659.323393][ T3124] ? alloc_iova+0x33/0x210
[ 659.327700][ T3124] ? lock_downgrade+0x390/0x390
[ 659.332441][ T3124] ? lock_downgrade+0x390/0x390
[ 659.337183][ T3124] ? alloc_iova+0x33/0x210
[ 659.341487][ T3124] __slab_alloc+0x12/0x20
[ 659.345704][ T3124] ? __slab_alloc+0x12/0x20
[ 659.350096][ T3124] kmem_cache_alloc+0x32a/0x400
[ 659.354838][ T3124] ? kasan_check_read+0x11/0x20
[ 659.359580][ T3124] ? do_raw_spin_unlock+0xa8/0x140
[ 659.364585][ T3124] alloc_iova+0x33/0x210
[ 659.368714][ T3124] ? iova_rcache_get+0x1a1/0x300
[ 659.373545][ T3124] alloc_iova_fast+0x47/0xba
[ 659.378026][ T3124] dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 659.383381][ T3124] map_sg+0x99/0x2f0
[ 659.387161][ T3124] scsi_dma_map+0xc6/0x160
[ 659.391470][ T3124] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 659.399274][ T3124] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[ 659.405507][ T3124] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.411733][ T3124] ? scsi_init_io+0x102/0x150
[ 659.416306][ T3124] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[ 659.422713][ T3124] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[ 659.428593][ T3124] ? sd_init_command+0x88b/0x930 [sd_mod]
[ 659.434211][ T3124] ? blk_add_timer+0xd7/0x110
[ 659.438780][ T3124] scsi_queue_rq+0x7c6/0x1280
[ 659.443350][ T3124] blk_mq_dispatch_rq_list+0x9d3/0xba0
[ 659.448702][ T3124] ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[ 659.454145][ T3124] ? blk_mq_get_driver_tag+0x290/0x290
[ 659.459498][ T3124] ? __lock_acquire.isra.13+0x430/0x830
[ 659.464938][ T3124] blk_mq_sched_dispatch_requests+0x2f4/0x300
[ 659.470903][ T3124] ? blk_mq_sched_restart+0x60/0x60
[ 659.475993][ T3124] __blk_mq_run_hw_queue+0x156/0x230
[ 659.481172][ T3124] ? hctx_lock+0xc0/0xc0
[ 659.485301][ T3124] ? process_one_work+0x426/0xa70
[ 659.490217][ T3124] blk_mq_run_work_fn+0x3b/0x40
[ 659.494959][ T3124] process_one_work+0x53b/0xa70
[ 659.499703][ T3124] ? pwq_dec_nr_in_flight+0x170/0x170
[ 659.504967][ T3124] worker_thread+0x63/0x5b0
[ 659.509361][ T3124] kthread+0x1df/0x200
[ 659.513316][ T3124] ? process_one_work+0xa70/0xa70
[ 659.518231][ T3124] ? kthread_park+0xc0/0xc0
[ 659.522625][ T3124] ret_from_fork+0x22/0x40
[ 659.526937][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted:
G W 5.2.0-next-20190711+ #3
[ 659.526991][ T3124] Mem-Info:
[ 659.536921][ T1598] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[ 659.536934][ T1598] Workqueue: kblockd blk_mq_run_work_fn
[ 659.540067][ T3124] active_anon:4662210 inactive_anon:359358
isolated_anon:2005
[ 659.540067][ T3124] active_file:10032 inactive_file:12947 isolated_file:0
[ 659.540067][ T3124] unevictable:0 dirty:12 writeback:0 unstable:0
[ 659.540067][ T3124] slab_reclaimable:71207 slab_unreclaimable:1252996
[ 659.540067][ T3124] mapped:17530 shmem:1850 pagetables:11491 bounce:0
[ 659.540067][ T3124] free:54096 free_pcp:5994 free_cma:84
[ 659.549192][ T1598] Call Trace:
[ 659.549203][ T1598] dump_stack+0x62/0x9a
[ 659.554639][ T3124] Node 0 active_anon:2246440kB inactive_anon:572540kB
active_file:19500kB inactive_file:19016kB unevictable:0kB isolated(anon):7708kB
isolated(file):0kB mapped:24840kB dirty:8kB writeback:0kB shmem:1372kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1689600kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[ 659.593619][ T1598] warn_alloc.cold.45+0x8a/0x12a
[ 659.596785][ T3124] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.600821][ T1598] ? zone_watermark_ok_safe+0x1a0/0x1a0
[ 659.630195][ T3124] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.635021][ T1598] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 659.661328][ T3124] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.661337][ T3124] Node 4 active_anon:16402112kB inactive_anon:865180kB
active_file:20600kB inactive_file:32712kB unevictable:0kB isolated(anon):304kB
isolated(file):0kB mapped:45216kB dirty:40kB writeback:12kB shmem:6028kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[ 659.666778][ T1598] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.693086][ T3124] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.693096][ T3124] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.699583][ T1598] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.725894][ T3124] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 659.755524][ T1598] ? __isolate_free_page+0x390/0x390
[ 659.761953][ T3124] Node 0 DMA free:15908kB min:24kB low:36kB high:48kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
[ 659.788234][ T1598] __alloc_pages_nodemask+0x1aab/0x1bc0
[ 659.814544][ T3124] lowmem_reserve[]: 0 1532 19982 19982 19982
[ 659.820945][ T1598] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 659.847287][ T3124] Node 0 DMA32 free:73504kB min:2676kB low:4244kB
high:5812kB active_anon:1190128kB inactive_anon:362496kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:1923080kB
managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
free_pcp:1432kB local_pcp:0kB free_cma:0kB
[ 659.852428][ T1598] ? stack_trace_save+0x87/0xb0
[ 659.852435][ T1598] ? freezing_slow_path.cold.1+0x35/0x35
[ 659.879003][ T3124] lowmem_reserve[]: 0 0 18450 18450 18450
[ 659.884446][ T1598] ? __kasan_kmalloc.part.0+0x81/0xc0
[ 659.890346][ T3124] Node 0 Normal free:47760kB min:137264kB low:156156kB
high:175048kB active_anon:1056208kB inactive_anon:209672kB active_file:19456kB
inactive_file:18996kB unevictable:0kB writepending:0kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB
bounce:0kB free_pcp:9340kB local_pcp:164kB free_cma:0kB
[ 659.895574][ T1598] ? __kasan_kmalloc.part.0+0x44/0xc0
[ 659.895581][ T1598] ? __kasan_kmalloc.constprop.1+0xac/0xc0
[ 659.924420][ T3124] lowmem_reserve[]: 0 0 0 0 0
[ 659.929163][ T1598] ? kasan_slab_alloc+0x11/0x20
[ 659.929170][ T1598] ? kmem_cache_alloc+0x17a/0x400
[ 659.934724][ T3124] Node 4 Normal free:72728kB min:234904kB low:267232kB
high:299560kB active_anon:16401776kB inactive_anon:865580kB active_file:20596kB
inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:12956kB local_pcp:24kB free_cma:336kB
[ 659.940301][ T1598] ? alloc_iova+0x33/0x210
[ 659.940307][ T1598] ? alloc_iova_fast+0x47/0xba
[ 659.945563][ T3124] lowmem_reserve[]: 0 0 0 0 0
[ 659.976773][ T1598] ? dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 659.976780][ T1598] ? map_sg+0x99/0x2f0
[ 659.982039][ T3124] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U)
1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[ 659.987736][ T1598] ? scsi_dma_map+0xc6/0x160
[ 659.987747][ T1598] ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 659.992300][ T3124] Node 0 DMA32: 0*4kB 0*8kB 2*16kB (M) 5*32kB (UM) 17*64kB
(UM) 8*128kB (UM) 12*256kB (UM) 11*512kB (UM) 10*1024kB (UM) 2*2048kB (UM)
12*4096kB (M) = 74496kB
[ 659.997045][ T1598] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 659.997051][ T1598] ? scsi_queue_rq+0x7c6/0x1280
[ 660.001958][ T3124] Node 0 Normal: 0*4kB 0*8kB 198*16kB (MEH) 356*32kB (ME)
83*64kB (UME) 15*128kB (UME) 101*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB =
47648kB
[ 660.033521][ T1598] ? ftrace_graph_ret_addr+0x2a/0xb0
[ 660.033528][ T1598] ? stack_trace_save+0x87/0xb0
[ 660.037828][ T3124] Node 4 Normal: 0*4kB 0*8kB 211*16kB (UME) 441*32kB (UME)
449*64kB (UME) 71*128kB (ME) 62*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB =
71184kB
[ 660.042481][ T1598] alloc_pages_current+0x9c/0x110
[ 660.047042][ T3124] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 660.052569][ T1598] allocate_slab+0x351/0x11f0
[ 660.056516][ T3124] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[ 660.056521][ T3124] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 660.070694][ T1598] new_slab+0x46/0x70
[ 660.075169][ T3124] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[ 660.083141][ T1598] ___slab_alloc+0x5d4/0x9c0
[ 660.098879][ T3124] 26058 total pagecache pages
[ 660.098894][ T3124] 1298 pages in swap cache
[ 660.105279][ T1598] ? should_fail+0x107/0x3bc
[ 660.105285][ T1598] ? alloc_iova+0x33/0x210
[ 660.110020][ T3124] Swap cache stats: add 2607, delete 1311, find 0/1
[ 660.110024][ T3124] Free swap = 32919548kB
[ 660.124719][ T1598] ? lock_downgrade+0x390/0x390
[ 660.124725][ T1598] ? lock_downgrade+0x390/0x390
[ 660.129894][ T3124] Total swap = 32952316kB
[ 660.129899][ T3124] 15685025 pages RAM
[ 660.134637][ T1598] ? alloc_iova+0x33/0x210
[ 660.149328][ T3124] 0 pages HighMem/MovableOnly
[ 660.149332][ T3124] 2465994 pages reserved
[ 660.154245][ T1598] __slab_alloc+0x12/0x20
[ 660.154252][ T1598] ? __slab_alloc+0x12/0x20
[ 660.163701][ T3124] 16384 pages cma reserved
[ 660.163763][ T3124] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[ 660.168269][ T1598] kmem_cache_alloc+0x32a/0x400
[ 660.168276][ T1598] ? kasan_check_read+0x11/0x20
[ 660.177465][ T3124] cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[ 660.177470][ T3124] node 0: slabs: 10580, objs: 95220, free: 0
[ 660.186924][ T1598] ? do_raw_spin_unlock+0xa8/0x140
[ 660.186930][ T1598] alloc_iova+0x33/0x210
[ 660.190792][ T3124] node 4: slabs: 2292, objs: 20628, free: 25
[ 660.199982][ T1598] ? iova_rcache_get+0x1a1/0x300
[ 660.199989][ T1598] alloc_iova_fast+0x47/0xba
[ 660.204513][ T3124] kworker/2:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4
[ 660.209026][ T1598] dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 660.351109][ T1598] map_sg+0x99/0x2f0
[ 660.354891][ T1598] ? __debug_object_init+0x412/0x7a0
[ 660.360070][ T1598] scsi_dma_map+0xc6/0x160
[ 660.364381][ T1598] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 660.372184][ T1598] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[ 660.378415][ T1598] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 660.384644][ T1598] ? scsi_init_io+0x102/0x150
[ 660.389217][ T1598] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[ 660.395622][ T1598] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[ 660.401503][ T1598] ? sd_init_command+0x88b/0x930 [sd_mod]
[ 660.407119][ T1598] ? blk_add_timer+0xd7/0x110
[ 660.411686][ T1598] scsi_queue_rq+0x7c6/0x1280
[ 660.416252][ T1598] blk_mq_dispatch_rq_list+0x9d3/0xba0
[ 660.421604][ T1598] ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[ 660.427045][ T1598] ? blk_mq_get_driver_tag+0x290/0x290
[ 660.432396][ T1598] ?
__lock_acquire.isra.13+0xT3124] __blk_mq_run_hw_queue+0x156/0x230
[ 660.822569][ T3124] ? hctx_lock+0xc0/0xc0
[ 660.826700][ T3124] ? process_one_work+0x426/0xa70
[ 660.831617][ T3124] blk_mq_run_work_fn+0x3b/0x40
[ 660.836358][ T3124] process_one_work+0x53b/0xa70
[ 660.841100][ T3124] ? pwq_dec_nr_in_flight+0x170/0x170
[ 660.846365][ T3124] worker_thread+0x63/0x5b0
[ 660.850756][ T3124] kthread+0x1df/0x200
[ 660.854712][ T3124] ? process_one_work+0xa70/0xa70
[ 660.859626][ T3124] ? kthread_park+0xc0/0xc0
[ 660.864021][ T3124] ret_from_fork+0x22/0x40
[ 660.868328][ T3124] warn_alloc_show_mem: 1 callbacks suppressed
[ 660.868332][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted:
G W 5.2.0-next-20190711+ #3
[ 660.868335][ T3124] Mem-Info:
[ 660.868485][ T3124] active_anon:4662011 inactive_anon:359383
isolated_anon:2155
[ 660.868485][ T3124] active_file:10012 inactive_file:12922 isolated_file:0
[ 660.868485][ T3124] unevictable:0 dirty:12 writeback:0 unstable:0
[ 660.868485][ T3h:175048kB active_anon:1056208kB inactive_anon:209448kB
active_file:19452kB inactive_file:18996kB unevictable:0kB writepending:0kB
present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB
pagetables:10064kB bounce:0kB free_pcp:8784kB local_pcp:164kB free_cma:0kB
[ 661.222532][ T1598] ? kernel_poison_pages.cold.2+0x8c/0x8c
[ 661.228397][ T3124] lowmem_reserve[]: 0 0 0 0 0
[ 661.233138][ T1598] ? vprintk_default+0x1f/0x30
[ 661.233146][ T1598] alloc_pages_current+0x9c/0x110
[ 661.238174][ T3124] Node 4 Normal free:71384kB min:234904kB low:267232kB
high:299560kB active_anon:16401776kB inactive_anon:865588kB active_file:20596kB
inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:12872kB local_pcp:24kB free_cma:336kB
[ 661.266900][ T1598] allocate_slab+0x351/0x11f0
[ 661.266905][ T1598] new_slab+0x46/0x70
[ 661.271461][ T3124] lowmem_reserve[]: 0 0 0 0 0
[ 661.275941][ T1598] ___slab_alloc+0x5d4/0x9c0
[ 661.275948][ T1598] ? should0
[ 661.543007][ T3132] cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[ 661.543011][ T3203] node 0: slabs: 10582, objs: 95238, free: 7
[ 661.543016][ T3132] node 0: slabs: 10582, objs: 95238, free: 7
[ 661.543020][ T3203] node 4: slabs: 2293, objs: 20637, free: 30
[ 661.543026][ T3132] node 4: slabs: 2293, objs: 20637, free: 30
[ 661.543040][ T3203] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[ 661.543046][ T3203] cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[ 661.543052][ T3203] node 0: slabs: 10582, objs: 95238, free: 7
[ 661.543057][ T3132] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[ 661.543061][ T3203] node 4: slabs: 2293, objs: 20637, free: 30
[ 661.543066][ T3132] cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[ 661.543072][ T3132] node 0: slabs: 10582, objs: 95238, free: 7
[ 661.543078][ T3132] node 4: slabs: 2293, objs: 20637, free: 30
[ 661.543544][ T3205] SLUB: Unable to allocnevictable:0kB isolated(anon):352kB
isolated(file):0kB mapped:45056kB dirty:40kB writeback:52kB shmem:6028kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[ 662.181289][ T1598] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 662.207607][ T3209] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 662.212434][ T1598] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 662.238751][ T3209] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 662.244187][ T1598] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(ano alloc_iova_fast+0x47/0xba
[ 662.835750][ T3209] dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 662.841103][ T3209] map_sg+0x99/0x2f0
[ 662.844886][ T3209] ? kasan_check_read+0x11/0x20
[ 662.849627][ T3209] scsi_dma_map+0xc6/0x160
[ 662.853938][ T3209] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 662.861740][ T3209] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[ 662.867971][ T3209] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 662.874198][ T3209] ? scsi_init_io+0x102/0x150
[ 662.878768][ T3209] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[ 662.885176][ T3209] ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[ 662.891055][ T3209] ? sd_init_command+0x88b/0x930 [sd_mod]
[ 662.896672][ T3209] ? blk_add_timer+0xd7/0x110
[ 662.901240][ T3209] scsi_queue_rq+0x7c6/0x1280
[ 662.905807][ T3209] blk_mq_dispatch_rq_list+0x9d3/0xba0
[ 662.911159][ T3209] ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[ 662.916601][ T3209] ? blk_mq_get_driver_tag+0x290/0x290
[ 662.921953][ T3209] ? __lock_acquire.isra.13+0x430/0x830
[ 662.927394][ T3209] blk_mq_sched_diag+0x290/0x290
[ 663.313403][ T3146] ? __lock_acquire.isra.13+0x430/0x830
[ 663.318844][ T3146] blk_mq_sched_dispatch_requests+0x2f4/0x300
[ 663.324807][ T3146] ? blk_mq_sched_restart+0x60/0x60
[ 663.329898][ T3146] __blk_mq_run_hw_queue+0x156/0x230
[ 663.335076][ T3146] ? hctx_lock+0xc0/0xc0
[ 663.339211][ T3146] ? process_one_work+0x426/0xa70
[ 663.344128][ T3146] blk_mq_run_work_fn+0x3b/0x40
[ 663.348870][ T3146] process_one_work+0x53b/0xa70
[ 663.353613][ T3146] ? pwq_dec_nr_in_flight+0x170/0x170
[ 663.358880][ T3146] worker_thread+0x63/0x5b0
[ 663.363277][ T3146] kthread+0x1df/0x200
[ 663.367233][ T3146] ? process_one_work+0xa70/0xa70
[ 663.372148][ T3146] ? kthread_park+0xc0/0xc0
[ 663.376543][ T3146] ret_from_fork+0x22/0x40
[ 663.380848][ T3146] warn_alloc_show_mem: 1 callbacks suppressed
[ 663.380855][ T3123] CPU: 1 PID: 3123 Comm: kworker/1:1H Tainted:
G W 5.2.0-next-20190711+ #3
[ 663.380857][ T3146] Mem-Info:
[ 663.381000][ T3146] active_anon:4654271 inactive_anon:367023
isolated_anon:2263
[ 663.381000T3123] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 663.744691][ T3146] Node 0 Normal free:74264kB min:137264kB low:156156kB
high:175048kB active_anon:1055816kB inactive_anon:209292kB active_file:19416kB
inactive_file:18964kB unevictable:0kB writepending:248kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB
bounce:0kB free_pcp:9356kB local_pcp:124kB free_cma:0kB
[ 663.750101][ T3123] ? lock_downgrade+0x390/0x390
[ 663.778942][ T3146] lowmem_reserve[]: 0 0 0 0 0
[ 663.783688][ T3123] ? do_raw_spin_lock+0x118/0x1d0
[ 663.789326][ T3146] Node 4 Normal free:81632kB min:234904kB low:267232kB
high:299560kB active_anon:16368972kB inactive_anon:898504kB active_file:20548kB
inactive_file:32468kB unevictable:0kB writepending:104kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:11372kB local_pcp:160kB free_cma:0kB
[ 663.794556][ T3123] ? rwlock_bug.part.0+0x60/0x60
[ 663.794563][ T3123] ? get_partial_node+0x48/0x540
[ 663.825936][ T3146] lowmem_reserve[]: 0 0 0 0 0
[ 663.830678][ T3123] #3
[ 664.269661][ T3202] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[ 664.278993][ T3202] Workqueue: kblockd blk_mq_run_work_fn
[ 664.284453][ T3202] Call Trace:
[ 664.287655][ T3202] dump_stack+0x62/0x9a
[ 664.291721][ T3202] warn_alloc.cold.45+0x8a/0x12a
[ 664.296577][ T3202] ? zone_watermark_ok_safe+0x1a0/0x1a0
[ 664.302044][ T3202] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 664.308564][ T3202] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 664.314996][ T3202] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 664.321420][ T3202] ? __isolate_free_page+0x390/0x390
[ 664.326613][ T3202] __alloc_pages_nodemask+0x1aab/0x1bc0
[ 664.332062][ T3202] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 664.337345][ T3202] ? stack_trace_save+0x87/0xb0
[ 664.342103][ T3202] ? freezing_slow_path.cold.1+0x35/0x35
[ 664.347647][ T3202] ? __kasan_kmalloc.part.0+0x81/0xc0
[ 664.352925][ T3202] ? __kasan_kmalloc.part.0+0x44/0xc0
[ 664.358204][ T3202] ? __kasan_kmalloc.constprop.1+0xac/0xc0
[ 664.363922][ hmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
all_unreclaimable? no
[ 664.759472][ T3127] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[ 664.759508][ T3127] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 664.785836][ T3202] Node 4 active_anon:15362196kB inactive_anon:1296156kB
active_file:15052kB inactive_file:17752kB unevictable:0kB isolated(anon):66644kB
isolated(file):112kB mapped:30596kB dirty:0kB writeback:3968kB shmem:1080kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14735360kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[ 664.789031][ T3127] ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 664.793056][ T3202] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 664.819386][ T3127] ? __isolate_free_page+0x390/0x390
[ 664.819401][ T3127] __alloc_pages_nodemask+0x1aab/0x1bc0
[ 664.824245][ T3202] Node 6 active_anon7] map_sg+0x99/0x2f0
[ 665.159320][ T3202] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[ 665.191157][ T3127] ? kasan_check_read+0x11/0x20
[ 665.191176][ T3127] scsi_dma_map+0xc6/0x160
[ 665.195480][ T3202] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 665.195490][ T3202] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[ 665.200248][ T3127] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 665.204805][ T3202] 69668 total pagecache pages
[ 665.209566][ T3127] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[ 665.213886][ T3202] 65404 pages in swap cache
[ 665.228054][ T3127] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 665.228074][ T3127] ? scsi_init_io+0x102/0x150
[ 665.232285][ T3202] Swap cache stats: add 486050, delete 428240, find 59/149
[ 665.232294][ T3202] Free swap = 30975484kB
[ 665.236832][ T3127] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[ 665.236858][ T3127] ? pqi_event_worker+0xdf0/0xdf0 [smar390
[ 665.806891][ T3141] ? lock_downgrade+0x390/0x390
[ 665.811664][ T3141] ? alloc_iova+0x33/0x210
[ 665.815987][ T3141] __slab_alloc+0x12/0x20
[ 665.820232][ T3141] ? __slab_alloc+0x12/0x20
[ 665.824654][ T3141] kmem_cache_alloc+0x32a/0x400
[ 665.829413][ T3141] ? kasan_check_read+0x11/0x20
[ 665.834179][ T3141] ? do_raw_spin_unlock+0xa8/0x140
[ 665.839221][ T3141] alloc_iova+0x33/0x210
[ 665.843369][ T3141] ? iova_rcache_get+0x1a1/0x300
[ 665.848225][ T3141] alloc_iova_fast+0x47/0xba
[ 665.852736][ T3141] dma_ops_alloc_iova.isra.5+0x86/0xa0
[ 665.858122][ T3141] map_sg+0x99/0x2f0
[ 665.861957][ T3141] ? kasan_check_read+0x11/0x20
[ 665.866759][ T3141] scsi_dma_map+0xc6/0x160
[ 665.871098][ T3141] pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[ 665.878918][ T3141] ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[ 665.885172][ T3141] pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[ 665.891435][ T3141] ? scsi_init_io+0x102/0x150
[ 665.896103][ T3141] ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[ 665.902619][ T3141] ? pqie:0kB unevictable:0kB writepending:0kB
present:15996kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 666.300385][ T3141] lowmem_reserve[]: 0 1532 19982 19982 19982
[ 666.306395][ T3141] Node 0 DMA32 free:75568kB min:2676kB low:4244kB
high:5812kB active_anon:749752kB inactive_anon:395332kB active_file:128kB
inactive_file:168kB unevictable:0kB writepending:0kB present:1923080kB
managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:28kB bounce:0kB
free_pcp:55484kB local_pcp:248kB free_cma:0kB
[ 666.335894][ T3141] lowmem_reserve[]: 0 0 18450 18450 18450
[ 666.341762][ T3141] Node 0 Normal free:52856kB min:52716kB low:71608kB
high:90500kB active_anon:1127696kB inactive_anon:80184kB active_file:492kB
inactive_file:656kB unevictable:0kB writepending:2208kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10372kB
bounce:0kB free_pcp:12848kB local_pcp:36kB free_cma:0kB
[ 666.372602][ T3141] lowmem_reserve[]: 0 0 0 0 0
[ 666.377419][ T3141] Node 4 Normal free:234488kB m[ 685.274656][ T3456]
list_del corruption. prev->next should be ffffea0022b10098, but was
0000000000000000
[ 685.284254][ T3456] ------------[ cut here ]------------
[ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
[ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
G W 5.2.0-next-20190711+ #3
[ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
[ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
[ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
[ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
ffffffffa2d5d708
[ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff8888442bd380
[ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
ffffed1108857a70
[ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
0000000000000000
[ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
ffffea0022b10098
[ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000)
knlGS:0000000000000000
[ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
00000000001406a0
[ 685.414563][ T3456] Call Trace:
[ 685.417736][ T3456] deferred_split_scan+0x337/0x740
[ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10
[ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0
[ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40
[ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0
[ 685.444071][ T3456] shrink_slab+0x253/0x440
[ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110
[ 685.453551][ T3456] ? kasan_check_read+0x11/0x20
[ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260
[ 685.463555][ T3456] shrink_node+0x31e/0xa30
[ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560
[ 685.473036][ T3456] ? ktime_get+0x93/0x110
[ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820
[ 685.482338][ T3456] ? shrink_node+0xa30/0xa30
[ 685.486815][ T3456] ? kasan_check_read+0x11/0x20
[ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0
[ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0
[ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820
[ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0
[ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 685.517089][ T3456] ? kasan_check_read+0x11/0x20
[ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0
[ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30
[ 685.531658][ T3456] ? lock_downgrade+0x390/0x390
[ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0
[ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160
[ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0
[ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30
[ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490
[ 685.560796][ T3456] ? finish_fault+0x120/0x120
[ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0
[ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0
[ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50
[ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 685.585280][ T3456] ? kasan_check_read+0x11/0x20
[ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0
[ 685.595196][ T3456] handle_mm_fault+0x17f/0x370
[ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0
[ 685.604501][ T3456] do_page_fault+0x4c/0x2cf
[ 685.608892][ T3456] ? page_fault+0x5/0x20
[ 685.613019][ T3456] page_fault+0x1b/0x20
[ 685.617058][ T3456] RIP: 0033:0x410be0
[ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 68[ 687.120156][ T3456] Shutting down cpus with NMI
[ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: list corruption in deferred_split_scan()
2019-07-24 21:13 ` Qian Cai
@ 2019-07-25 21:46 ` Yang Shi
2019-08-05 22:15 ` Yang Shi
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-07-25 21:46 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/24/19 2:13 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>> Running LTP oom01 test case with swap triggers a crash below. Revert the
>> series
>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
> You might want to look harder at this commit, as reverting it alone on top of
> 5.2.0-next-20190711 fixed the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
> linux.alibaba.com/
This is the real meat of the patch series; it is the commit that actually
converts to the per-memcg deferred split queue.
>
>
> list_del corruption. prev->next should be ffffea0022b10098, but was
> 0000000000000000
Finally I could reproduce the list corruption issue on my machine with
THP swap (the swap device is a fast device). I should have checked this with
you in the first place. The problem can't be reproduced with a rotating swap
device, so I suppose you were using THP swap too.
Actually, I found two issues with THP swap:
1. free_transhuge_page() is called in the reclaim path instead of put_page().
mem_cgroup_uncharge() is called before free_transhuge_page() in the
reclaim path, which leaves page->mem_cgroup NULL, so the wrong
deferred_split_queue is used and the THP is not deleted from the
memcg's list at all. The page might then be split or reused later, and
page->mapping would be overridden.
2. There is a race condition caused by try_to_unmap() with THP swap.
try_to_unmap() just calls page_remove_rmap() to add the THP to the
deferred split queue in the reclaim path. This can cause the race below,
which corrupts the list:
CPU A                               CPU B
deferred_split_scan
  list_move
                                    try_to_unmap
                                      list_add_tail
  list_splice  <-- The list might get corrupted here
                                    free_transhuge_page
                                      list_del  <-- kernel bug triggered
I hope the below patch would solve your problem (tested locally).
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..d6612ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+ /*
+ * The try_to_unmap() in page reclaim path might reach here too,
+ * this may cause a race condition to corrupt deferred split queue.
+ * And, if page reclaim is already handling the same page, it is
+ * unnecessary to handle it again in shrinker.
+ *
+ * Check PageSwapCache to determine if the page is being
+ * handled by page reclaim since THP swap would add the page into
+ * swap cache before reaching try_to_unmap().
+ */
+ if (PageSwapCache(page))
+ return;
+
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (list_empty(page_deferred_list(page))) {
count_vm_event(THP_DEFERRED_SPLIT_PAGE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..40c684a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct
list_head *page_list,
* Is there need to periodically free_page_list? It would
* appear not as the counts should be low
*/
- if (unlikely(PageTransHuge(page))) {
- mem_cgroup_uncharge(page);
+ if (unlikely(PageTransHuge(page)))
(*get_compound_page_dtor(page))(page);
- } else
+ else
list_add(&page->lru, &free_pages);
continue;
@@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack
move_pages_to_lru(struct lruvec *lruvec,
if (unlikely(PageCompound(page))) {
spin_unlock_irq(&pgdat->lru_lock);
- mem_cgroup_uncharge(page);
(*get_compound_page_dtor(page))(page);
spin_lock_irq(&pgdat->lru_lock);
} else
> [ 685.284254][ T3456] ------------[ cut here ]------------
> [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
> [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
> G W 5.2.0-next-20190711+ #3
> [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 06/24/2019
> [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
> [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
> [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
> [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
> ffffffffa2d5d708
> [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
> ffff8888442bd380
> [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
> ffffed1108857a70
> [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
> 0000000000000000
> [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
> ffffea0022b10098
> [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000)
> knlGS:0000000000000000
> [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
> 00000000001406a0
> [ 685.414563][ T3456] Call Trace:
> [ 685.417736][ T3456] deferred_split_scan+0x337/0x740
> [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10
> [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0
> [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40
> [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0
> [ 685.444071][ T3456] shrink_slab+0x253/0x440
> [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110
> [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20
> [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260
> [ 685.463555][ T3456] shrink_node+0x31e/0xa30
> [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560
> [ 685.473036][ T3456] ? ktime_get+0x93/0x110
> [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820
> [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30
> [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20
> [ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0
> [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0
> [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820
> [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20
> [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0
> [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30
> [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390
> [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0
> [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160
> [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0
> [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30
> [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490
> [ 685.560796][ T3456] ? finish_fault+0x120/0x120
> [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0
> [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0
> [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50
> [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20
> [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0
> [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370
> [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0
> [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf
> [ 685.608892][ T3456] ? page_fault+0x5/0x20
> [ 685.613019][ T3456] page_fault+0x1b/0x20
> [ 685.617058][ T3456] RIP: 0033:0x410be0
> [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 68[ 687.120156][ T3456] Shutting down cpus with NMI
> [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: list corruption in deferred_split_scan()
2019-07-25 21:46 ` Yang Shi
@ 2019-08-05 22:15 ` Yang Shi
2019-08-06 1:05 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Yang Shi @ 2019-08-05 22:15 UTC (permalink / raw)
To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel
On 7/25/19 2:46 PM, Yang Shi wrote:
>
>
> On 7/24/19 2:13 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>>> Running LTP oom01 test case with swap triggers a crash below. Revert
>>> the
>>> series
>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>> You might want to look harder at this commit, as reverting it alone on
>> top of 5.2.0-next-20190711 fixed the issue.
>>
>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>>
>> [1]
>> https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
>> linux.alibaba.com/
>
> This is the real meat of the patch series; it is the commit that actually
> converts to the per-memcg deferred split queue.
>
>>
>>
>> list_del corruption. prev->next should be ffffea0022b10098, but was
>> 0000000000000000
>
> Finally I could reproduce the list corruption issue on my machine with
> THP swap (the swap device is a fast device). I should have checked this
> with you in the first place. The problem can't be reproduced with a
> rotating swap device, so I suppose you were using THP swap too.
>
> Actually, I found two issues with THP swap:
> 1. free_transhuge_page() is called in the reclaim path instead of
> put_page(). mem_cgroup_uncharge() is called before
> free_transhuge_page() in the reclaim path, which leaves page->mem_cgroup
> NULL, so the wrong deferred_split_queue is used and the THP is not
> deleted from the memcg's list at all. The page might then be split or
> reused later, and page->mapping would be overridden.
>
> 2. There is a race condition caused by try_to_unmap() with THP swap.
> try_to_unmap() just calls page_remove_rmap() to add the THP to the
> deferred split queue in the reclaim path. This can cause the race below,
> which corrupts the list:
>
> CPU A                               CPU B
> deferred_split_scan
>   list_move
>                                     try_to_unmap
>                                       list_add_tail
>   list_splice  <-- The list might get corrupted here
>                                     free_transhuge_page
>                                       list_del  <-- kernel bug triggered
>
> I hope the below patch would solve your problem (tested locally).
Hi Qian,
Did the patch below solve your problem? I would like to fold the fix
into the series and target the 5.4 release.
Thanks,
Yang
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..d6612ec 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
>
> VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>
> + /*
> + * The try_to_unmap() in page reclaim path might reach here too,
> + * this may cause a race condition to corrupt deferred split
> queue.
> + * And, if page reclaim is already handling the same page, it is
> + * unnecessary to handle it again in shrinker.
> + *
> + * Check PageSwapCache to determine if the page is being
> + * handled by page reclaim since THP swap would add the page into
> + * swap cache before reaching try_to_unmap().
> + */
> + if (PageSwapCache(page))
> + return;
> +
> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> if (list_empty(page_deferred_list(page))) {
> count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..40c684a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct
> list_head *page_list,
> * Is there need to periodically free_page_list? It would
> * appear not as the counts should be low
> */
> - if (unlikely(PageTransHuge(page))) {
> - mem_cgroup_uncharge(page);
> + if (unlikely(PageTransHuge(page)))
> (*get_compound_page_dtor(page))(page);
> - } else
> + else
> list_add(&page->lru, &free_pages);
> continue;
>
> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack
> move_pages_to_lru(struct lruvec *lruvec,
>
> if (unlikely(PageCompound(page))) {
> spin_unlock_irq(&pgdat->lru_lock);
> - mem_cgroup_uncharge(page);
> (*get_compound_page_dtor(page))(page);
> spin_lock_irq(&pgdat->lru_lock);
> } else
>
>> [ 685.284254][ T3456] ------------[ cut here ]------------
>> [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
>> [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>> KASAN NOPTI
>> [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
>> G W 5.2.0-next-20190711+ #3
>> [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385
>> Gen10/ProLiant DL385
>> Gen10, BIOS A40 06/24/2019
>> [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
>> [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b
>> b8 01 00 00
>> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa
>> bc ff <0f>
>> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
>> [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
>> [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
>> ffffffffa2d5d708
>> [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
>> ffff8888442bd380
>> [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
>> ffffed1108857a70
>> [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
>> 0000000000000000
>> [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
>> ffffea0022b10098
>> [ 685.391348][ T3456] FS: 00007fbe26db4700(0000)
>> GS:ffff888844280000(0000)
>> knlGS:0000000000000000
>> [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
>> 00000000001406a0
>> [ 685.414563][ T3456] Call Trace:
>> [ 685.417736][ T3456] deferred_split_scan+0x337/0x740
>> [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10
>> [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0
>> [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40
>> [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0
>> [ 685.444071][ T3456] shrink_slab+0x253/0x440
>> [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110
>> [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20
>> [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260
>> [ 685.463555][ T3456] shrink_node+0x31e/0xa30
>> [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560
>> [ 685.473036][ T3456] ? ktime_get+0x93/0x110
>> [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820
>> [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30
>> [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20
>> [ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0
>> [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0
>> [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820
>> [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0
>> [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20
>> [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0
>> [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30
>> [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390
>> [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0
>> [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160
>> [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0
>> [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30
>> [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490
>> [ 685.560796][ T3456] ? finish_fault+0x120/0x120
>> [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0
>> [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0
>> [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50
>> [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20
>> [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20
>> [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0
>> [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370
>> [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0
>> [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf
>> [ 685.608892][ T3456] ? page_fault+0x5/0x20
>> [ 685.613019][ T3456] page_fault+0x1b/0x20
>> [ 685.617058][ T3456] RIP: 0033:0x410be0
>> [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84
>> 86 00 00 00
>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48
>> 98 90 <c6>
>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>> [ 68[ 687.120156][ T3456] Shutting down cpus with NMI
>> [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal
>> exception ]---
>
* Re: list corruption in deferred_split_scan()
2019-08-05 22:15 ` Yang Shi
@ 2019-08-06 1:05 ` Qian Cai
0 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-08-06 1:05 UTC (permalink / raw)
To: Yang Shi
Cc: Kirill A. Shutemov, Andrew Morton, Linux-MM, Linux List Kernel Mailing
> On Aug 5, 2019, at 6:15 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/25/19 2:46 PM, Yang Shi wrote:
>>
>>
>> On 7/24/19 2:13 PM, Qian Cai wrote:
>>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>>>> Running LTP oom01 test case with swap triggers a crash below. Revert the
>>>> series
>>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>>> You might want to look harder at this commit, as reverting it alone on top of
>>> 5.2.0-next-20190711 fixed the issue.
>>>
>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>>>
>>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
>>> linux.alibaba.com/
>>
>> This is the real meat of the patch series; it is the commit that actually converts to the per-memcg deferred split queue.
>>
>>>
>>>
>>> list_del corruption. prev->next should be ffffea0022b10098, but was
>>> 0000000000000000
>>
>> Finally I could reproduce the list corruption issue on my machine with THP swap (the swap device is a fast device). I should have checked this with you in the first place. The problem can't be reproduced with a rotating swap device, so I suppose you were using THP swap too.
>>
>> Actually, I found two issues with THP swap:
>> 1. free_transhuge_page() is called in the reclaim path instead of put_page(). mem_cgroup_uncharge() is called before free_transhuge_page() in the reclaim path, which leaves page->mem_cgroup NULL, so the wrong deferred_split_queue is used and the THP is not deleted from the memcg's list at all. The page might then be split or reused later, and page->mapping would be overridden.
>>
>> 2. There is a race condition caused by try_to_unmap() with THP swap. try_to_unmap() just calls page_remove_rmap() to add the THP to the deferred split queue in the reclaim path. This can cause the race below, which corrupts the list:
>>
>>     CPU A                              CPU B
>>     deferred_split_scan()
>>       list_move()
>>                                        try_to_unmap()
>>                                          list_add_tail()
>>       list_splice()   <-- the list might get corrupted here
>>
>>     free_transhuge_page()
>>       list_del()      <-- kernel BUG triggered
>>
>> I hope the patch below solves your problem (tested locally).
>
> Hi Qian,
>
> Did the patch below solve your problem? I would like to fold the fix into the series and target the 5.4 release.
It is going to take a while before I will be able to access that system again. Since you can reproduce and
test this yourself now, I'd say go ahead and post the patch.
>
> Thanks,
> Yang
>
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b7f709d..d6612ec 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
>>
>> VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>>
>> + /*
>> + * The try_to_unmap() in page reclaim path might reach here too,
>> + * this may cause a race condition to corrupt deferred split queue.
>> + * And, if page reclaim is already handling the same page, it is
>> + * unnecessary to handle it again in shrinker.
>> + *
>> + * Check PageSwapCache to determine if the page is being
>> + * handled by page reclaim since THP swap would add the page into
>> + * swap cache before reaching try_to_unmap().
>> + */
>> + if (PageSwapCache(page))
>> + return;
>> +
>> spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> if (list_empty(page_deferred_list(page))) {
>> count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..40c684a 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>> * Is there need to periodically free_page_list? It would
>> * appear not as the counts should be low
>> */
>> - if (unlikely(PageTransHuge(page))) {
>> - mem_cgroup_uncharge(page);
>> + if (unlikely(PageTransHuge(page)))
>> (*get_compound_page_dtor(page))(page);
>> - } else
>> + else
>> list_add(&page->lru, &free_pages);
>> continue;
>>
>> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>>
>> if (unlikely(PageCompound(page))) {
>> spin_unlock_irq(&pgdat->lru_lock);
>> - mem_cgroup_uncharge(page);
>> (*get_compound_page_dtor(page))(page);
>> spin_lock_irq(&pgdat->lru_lock);
>> } else
>>
>>> [ 685.284254][ T3456] ------------[ cut here ]------------
>>> [ 685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
>>> [ 685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>> [ 685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
>>> G W 5.2.0-next-20190711+ #3
>>> [ 685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
>>> Gen10, BIOS A40 06/24/2019
>>> [ 685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
>>> [ 685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
>>> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
>>> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
>>> [ 685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
>>> [ 685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
>>> ffffffffa2d5d708
>>> [ 685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
>>> ffff8888442bd380
>>> [ 685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
>>> ffffed1108857a70
>>> [ 685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
>>> 0000000000000000
>>> [ 685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
>>> ffffea0022b10098
>>> [ 685.391348][ T3456] FS: 00007fbe26db4700(0000) GS:ffff888844280000(0000)
>>> knlGS:0000000000000000
>>> [ 685.400194][ T3456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
>>> 00000000001406a0
>>> [ 685.414563][ T3456] Call Trace:
>>> [ 685.417736][ T3456] deferred_split_scan+0x337/0x740
>>> [ 685.422741][ T3456] ? split_huge_page_to_list+0xe10/0xe10
>>> [ 685.428272][ T3456] ? __radix_tree_lookup+0x12d/0x1e0
>>> [ 685.433453][ T3456] ? node_tag_get.part.0.constprop.6+0x40/0x40
>>> [ 685.439505][ T3456] do_shrink_slab+0x244/0x5a0
>>> [ 685.444071][ T3456] shrink_slab+0x253/0x440
>>> [ 685.448375][ T3456] ? unregister_shrinker+0x110/0x110
>>> [ 685.453551][ T3456] ? kasan_check_read+0x11/0x20
>>> [ 685.458291][ T3456] ? mem_cgroup_protected+0x20f/0x260
>>> [ 685.463555][ T3456] shrink_node+0x31e/0xa30
>>> [ 685.467858][ T3456] ? shrink_node_memcg+0x1560/0x1560
>>> [ 685.473036][ T3456] ? ktime_get+0x93/0x110
>>> [ 685.477250][ T3456] do_try_to_free_pages+0x22f/0x820
>>> [ 685.482338][ T3456] ? shrink_node+0xa30/0xa30
>>> [ 685.486815][ T3456] ? kasan_check_read+0x11/0x20
>>> [ 685.491556][ T3456] ? check_chain_key+0x1df/0x2e0
>>> [ 685.496383][ T3456] try_to_free_pages+0x242/0x4d0
>>> [ 685.501209][ T3456] ? do_try_to_free_pages+0x820/0x820
>>> [ 685.506476][ T3456] __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [ 685.511826][ T3456] ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [ 685.517089][ T3456] ? kasan_check_read+0x11/0x20
>>> [ 685.521826][ T3456] ? check_chain_key+0x1df/0x2e0
>>> [ 685.526657][ T3456] ? do_anonymous_page+0x343/0xe30
>>> [ 685.531658][ T3456] ? lock_downgrade+0x390/0x390
>>> [ 685.536399][ T3456] ? get_kernel_page+0xa0/0xa0
>>> [ 685.541050][ T3456] ? __lru_cache_add+0x108/0x160
>>> [ 685.545879][ T3456] alloc_pages_vma+0x89/0x2c0
>>> [ 685.550444][ T3456] do_anonymous_page+0x3e1/0xe30
>>> [ 685.555271][ T3456] ? __update_load_avg_cfs_rq+0x2c/0x490
>>> [ 685.560796][ T3456] ? finish_fault+0x120/0x120
>>> [ 685.565361][ T3456] ? alloc_pages_vma+0x21e/0x2c0
>>> [ 685.570187][ T3456] handle_pte_fault+0x457/0x12c0
>>> [ 685.575014][ T3456] __handle_mm_fault+0x79a/0xa50
>>> [ 685.579841][ T3456] ? vmf_insert_mixed_mkwrite+0x20/0x20
>>> [ 685.585280][ T3456] ? kasan_check_read+0x11/0x20
>>> [ 685.590021][ T3456] ? __count_memcg_events+0x8b/0x1c0
>>> [ 685.595196][ T3456] handle_mm_fault+0x17f/0x370
>>> [ 685.599850][ T3456] __do_page_fault+0x25b/0x5d0
>>> [ 685.604501][ T3456] do_page_fault+0x4c/0x2cf
>>> [ 685.608892][ T3456] ? page_fault+0x5/0x20
>>> [ 685.613019][ T3456] page_fault+0x1b/0x20
>>> [ 685.617058][ T3456] RIP: 0033:0x410be0
>>> [ 685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
>>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
>>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>>> [ 687.120156][ T3456] Shutting down cpus with NMI
>>> [ 687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [ 687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2019-08-06 1:05 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
2019-07-11 0:16 ` Yang Shi
2019-07-11 21:07 ` Qian Cai
2019-07-12 19:12 ` Yang Shi
2019-07-13 4:41 ` Yang Shi
2019-07-15 21:23 ` Qian Cai
2019-07-16 0:22 ` Yang Shi
2019-07-16 1:36 ` Qian Cai
2019-07-16 3:00 ` Yang Shi
2019-07-16 23:36 ` Shakeel Butt
2019-07-17 0:12 ` Yang Shi
2019-07-17 17:02 ` Shakeel Butt
2019-07-17 17:09 ` Yang Shi
2019-07-19 0:54 ` Qian Cai
2019-07-19 0:59 ` Yang Shi
2019-07-24 18:10 ` Qian Cai
2019-07-15 4:52 ` Yang Shi
2019-07-24 21:13 ` Qian Cai
2019-07-25 21:46 ` Yang Shi
2019-08-05 22:15 ` Yang Shi
2019-08-06 1:05 ` Qian Cai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).