linux-mm.kvack.org archive mirror
* list corruption in deferred_split_scan()
@ 2019-07-10 21:43 Qian Cai
  2019-07-11  0:16 ` Yang Shi
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Qian Cai @ 2019-07-10 21:43 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

Running the LTP oom01 test case with swap triggers the crash below. Reverting the
series "Make deferred split shrinker memcg aware" [1] seems to fix the issue.

aefde94195ca mm: thp: make deferred split shrinker memcg aware
cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
4e050f2df876 mm: thp: extract split_queue_* into a struct

[1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/

[ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
LIST_POISON1 (dead000000000100)
[ 1145.739763][ T5764] ------------[ cut here ]------------
[ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
[ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted:
G        W         5.2.0-next-20190710+ #7
[ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
[ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
[ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
[ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX:
ffffffffae95d318
[ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff8888440bd380
[ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09:
ffffed1108817a70
[ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12:
dead000000000122
[ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15:
dead000000000100
[ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000)
knlGS:0000000000000000
[ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4:
00000000001406a0
[ 1145.870664][ T5764] Call Trace:
[ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
[ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
[ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
[ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
[ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
[ 1145.900159][ T5764]  shrink_slab+0x253/0x440
[ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
[ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
[ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
[ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
[ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
[ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
[ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
[ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
[ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
[ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
[ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
[ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
[ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
[ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
[ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
[ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
[ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
[ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
[ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
[ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
[ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
[ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
[ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
[ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
[ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
[ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
[ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
[ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
[ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
[ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
[ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
[ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
[ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
[ 1146.075426][ T5764]  ? page_fault+0x5/0x20
[ 1146.079553][ T5764]  page_fault+0x1b/0x20
[ 1146.083594][ T5764] RIP: 0033:0x410be0
[ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
[ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
00007f98f2674497
[ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI:
0000000000000000
[ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09:
0000000000000000
[ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][
T5764] Shutting down cpus with NMI
[ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---



* Re: list corruption in deferred_split_scan()
  2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
@ 2019-07-11  0:16 ` Yang Shi
  2019-07-11 21:07   ` Qian Cai
  2019-07-14  3:53 ` Hillf Danton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-11  0:16 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

Hi Qian,


Thanks for reporting the issue. But I can't reproduce it on my machine.
Could you please share more details about your test? How often did you
run into this problem?


Regards,

Yang



On 7/10/19 2:43 PM, Qian Cai wrote:
> Running the LTP oom01 test case with swap triggers the crash below. Reverting the
> series "Make deferred split shrinker memcg aware" [1] seems to fix the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware
> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>
> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
> LIST_POISON1 (dead000000000100)
> [ 1145.739763][ T5764] ------------[ cut here ]------------
> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted:
> G        W         5.2.0-next-20190710+ #7
> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 01/25/2019
> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX:
> ffffffffae95d318
> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
> ffff8888440bd380
> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09:
> ffffed1108817a70
> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12:
> dead000000000122
> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15:
> dead000000000100
> [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000)
> knlGS:0000000000000000
> [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4:
> 00000000001406a0
> [ 1145.870664][ T5764] Call Trace:
> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
> [ 1146.083594][ T5764] RIP: 0033:0x410be0
> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
> 00007f98f2674497
> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI:
> 0000000000000000
> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09:
> 0000000000000000
> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][
> T5764] Shutting down cpus with NMI
> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---



* Re: list corruption in deferred_split_scan()
  2019-07-11  0:16 ` Yang Shi
@ 2019-07-11 21:07   ` Qian Cai
  2019-07-12 19:12     ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Qian Cai @ 2019-07-11 21:07 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
> Hi Qian,
> 
> 
> Thanks for reporting the issue. But I can't reproduce it on my machine.
> Could you please share more details about your test? How often did you
> run into this problem?

I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server. Here
is some more information.

# cat .config

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
node 0 size: 19984 MB
node 0 free: 7251 MB
node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
node 4 size: 31524 MB
node 4 free: 25165 MB
node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  16  16  16  32  32  32  32 
  1:  16  10  16  16  32  32  32  32 
  2:  16  16  10  16  32  32  32  32 
  3:  16  16  16  10  32  32  32  32 
  4:  32  32  32  32  10  16  16  16 
  5:  32  32  32  32  16  10  16  16 
  6:  32  32  32  32  16  16  10  16 
  7:  32  32  32  32  16  16  16  10

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7601 32-Core Processor
Stepping:            2
CPU MHz:             2713.551
BogoMIPS:            4391.39
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7,64-71
NUMA node1 CPU(s):   8-15,72-79
NUMA node2 CPU(s):   16-23,80-87
NUMA node3 CPU(s):   24-31,88-95
NUMA node4 CPU(s):   32-39,96-103
NUMA node5 CPU(s):   40-47,104-111
NUMA node6 CPU(s):   48-55,112-119
NUMA node7 CPU(s):   56-63,120-127

Another possible lead is that without reverting those commits below, the kdump
kernel would also always crash in shrink_slab_memcg() at this line,

map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

[    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
[    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task
swapper/0/1
[    9.072036][    T1] 
[    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
20190711+ #10
[    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[    9.072036][    T1] Call Trace:
[    9.072036][    T1]  dump_stack+0x62/0x9a
[    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
[    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
[    9.072036][    T1]  ? shrink_slab+0x111/0x440
[    9.072036][    T1]  kasan_report+0xc/0xe
[    9.072036][    T1]  __asan_load8+0x71/0xa0
[    9.072036][    T1]  shrink_slab+0x111/0x440
[    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
[    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
[    9.072036][    T1]  ? kasan_check_read+0x11/0x20
[    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
[    9.072036][    T1]  shrink_node+0x31e/0xa30
[    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
[    9.072036][    T1]  ? ktime_get+0x93/0x110
[    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
[    9.072036][    T1]  ? shrink_node+0xa30/0xa30
[    9.072036][    T1]  ? kasan_check_read+0x11/0x20
[    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
[    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
[    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
[    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
[    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[    9.072036][    T1]  ? unwind_dump+0x260/0x260
[    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
[    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
[    9.072036][    T1]  ? ret_from_fork+0x22/0x40
[    9.072036][    T1]  alloc_page_interleave+0x18/0x130
[    9.072036][    T1]  alloc_pages_current+0xf6/0x110
[    9.072036][    T1]  allocate_slab+0x600/0x11f0
[    9.072036][    T1]  new_slab+0x46/0x70
[    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
[    9.072036][    T1]  ? create_object+0x3a/0x3e0
[    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
[    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
[    9.072036][    T1]  ? create_object+0x3a/0x3e0
[    9.072036][    T1]  __slab_alloc+0x12/0x20
[    9.072036][    T1]  ? __slab_alloc+0x12/0x20
[    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
[    9.072036][    T1]  create_object+0x3a/0x3e0
[    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
[    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
[    9.072036][    T1]  ? kasan_check_read+0x11/0x20
[    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
[    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
[    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
[    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
[    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
[    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
[    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
[    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
[    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
[    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
[    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
[    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
[    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
[    9.072036][    T1]  acpi_load_tables+0x61/0x80
[    9.072036][    T1]  acpi_init+0x10d/0x44b
[    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
[    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
[    9.072036][    T1]  ? kernfs_get+0x13/0x20
[    9.072036][    T1]  ? kobject_uevent+0xb/0x10
[    9.072036][    T1]  ? kset_register+0x31/0x50
[    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
[    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.072036][    T1]  do_one_initcall+0xfe/0x45a
[    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
[    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
[    9.072036][    T1]  ? kasan_check_write+0x14/0x20
[    9.072036][    T1]  ? up_write+0x6b/0x190
[    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
[    9.072036][    T1]  ? rest_init+0x188/0x188
[    9.072036][    T1]  kernel_init+0x11/0x138
[    9.072036][    T1]  ? rest_init+0x188/0x188
[    9.072036][    T1]  ret_from_fork+0x22/0x40
[    9.072036][    T1]
==================================================================
[    9.072036][    T1] Disabling lock debugging due to kernel taint
[    9.145712][    T1] BUG: kernel NULL pointer dereference, address:
0000000000000dc8
[    9.152036][    T1] #PF: supervisor read access in kernel mode
[    9.152036][    T1] #PF: error_code(0x0000) - not-present page
[    9.152036][    T1] PGD 0 P4D 0 
[    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
G    B             5.2.0-next-20190711+ #10
[    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 01/25/2019
[    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
[    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
[    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
[    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
ffffffff8112f288
[    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
ffffffff824e0440
[    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
fffffbfff049c088
[    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
00000000000001b8
[    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88905757f440
[    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
knlGS:0000000000000000
[    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
00000000001406b0
[    9.152036][    T1] Call Trace:
[    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
[    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
[    9.152036][    T1]  ? kasan_check_read+0x11/0x20
[    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
[    9.152036][    T1]  shrink_node+0x31e/0xa30
[    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
[    9.152036][    T1]  ? ktime_get+0x93/0x110
[    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
[    9.152036][    T1]  ? shrink_node+0xa30/0xa30
[    9.152036][    T1]  ? kasan_check_read+0x11/0x20
[    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
[    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
[    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
[    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
[    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[    9.152036][    T1]  ? unwind_dump+0x260/0x260
[    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
[    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
[    9.152036][    T1]  ? ret_from_fork+0x22/0x40
[    9.152036][    T1]  alloc_page_interleave+0x18/0x130
[    9.152036][    T1]  alloc_pages_current+0xf6/0x110
[    9.152036][    T1]  allocate_slab+0x600/0x11f0
[    9.152036][    T1]  new_slab+0x46/0x70
[    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
[    9.152036][    T1]  ? create_object+0x3a/0x3e0
[    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
[    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
[    9.152036][    T1]  ? create_object+0x3a/0x3e0
[    9.152036][    T1]  __slab_alloc+0x12/0x20
[    9.152036][    T1]  ? __slab_alloc+0x12/0x20
[    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
[    9.152036][    T1]  create_object+0x3a/0x3e0
[    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
[    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
[    9.152036][    T1]  ? kasan_check_read+0x11/0x20
[    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
[    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
[    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
[    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
[    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
[    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
[    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
[    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
[    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
[    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
[    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
[    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
[    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
[    9.152036][    T1]  acpi_load_tables+0x61/0x80
[    9.152036][    T1]  acpi_init+0x10d/0x44b
[    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
[    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
[    9.152036][    T1]  ? kernfs_get+0x13/0x20
[    9.152036][    T1]  ? kobject_uevent+0xb/0x10
[    9.152036][    T1]  ? kset_register+0x31/0x50
[    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
[    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
[    9.152036][    T1]  do_one_initcall+0xfe/0x45a
[    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
[    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
[    9.152036][    T1]  ? kasan_check_write+0x14/0x20
[    9.152036][    T1]  ? up_write+0x6b/0x190
[    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
[    9.152036][    T1]  ? rest_init+0x188/0x188
[    9.152036][    T1]  kernel_init+0x11/0x138
[    9.152036][    T1]  ? rest_init+0x188/0x188
[    9.152036][    T1]  ret_from_fork+0x22/0x40
[    9.152036][    T1] Modules linked in:
[    9.152036][    T1] CR2: 0000000000000dc8
[    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
[    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
[    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
[    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
[    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
ffffffff8112f288
[    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
ffffffff824e0440
[    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
fffffbfff049c088
[    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
00000000000001b8
[    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88905757f440
[    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
knlGS:00000000

> 
> 
> Regards,
> 
> Yang
> 
> 
> 
> On 7/10/19 2:43 PM, Qian Cai wrote:
> > Running the LTP oom01 test case with swap triggers the crash below. Reverting
> > the series
> > "Make deferred split shrinker memcg aware" [1] seems to fix the issue.
> > 
> > aefde94195ca mm: thp: make deferred split shrinker memcg aware
> > cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
> > ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
> > 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
> > c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
> > 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
> > 4e050f2df876 mm: thp: extract split_queue_* into a struct
> > 
> > [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
> > 
> > [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
> > LIST_POISON1 (dead000000000100)
> > [ 1145.739763][ T5764] ------------[ cut here ]------------
> > [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
> > [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> > NOPTI
> > [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted:
> > G        W         5.2.0-next-20190710+ #7
> > [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 01/25/2019
> > [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
> > [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80
> > 9e
> > a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff
> > <0f>
> > 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
> > [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
> > [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX:
> > ffffffffae95d318
> > [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
> > ffff8888440bd380
> > [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09:
> > ffffed1108817a70
> > [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12:
> > dead000000000122
> > [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15:
> > dead000000000100
> > [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000)
> > knlGS:0000000000000000
> > [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4:
> > 00000000001406a0
> > [ 1145.870664][ T5764] Call Trace:
> > [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
> > [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
> > [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
> > [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
> > [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
> > [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
> > [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
> > [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
> > [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
> > [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
> > [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
> > [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
> > [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
> > [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
> > [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
> > [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
> > [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
> > [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> > [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
> > [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
> > [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
> > [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> > [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
> > [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
> > [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
> > [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
> > [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
> > [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
> > [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
> > [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
> > [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> > [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
> > [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> > [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
> > [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
> > [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
> > [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
> > [ 1146.079553][ T5764]  page_fault+0x1b/0x20
> > [ 1146.083594][ T5764] RIP: 0033:0x410be0
> > [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00
> > 00
> > 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90
> > <c6>
> > 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> > [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
> > [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
> > 00007f98f2674497
> > [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI:
> > 0000000000000000
> > [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09:
> > 0000000000000000
> > [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][
> > T5764] Shutting down cpus with NMI
> > [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception
> > ]---
> 
> 



* Re: list corruption in deferred_split_scan()
  2019-07-11 21:07   ` Qian Cai
@ 2019-07-12 19:12     ` Yang Shi
  2019-07-13  4:41       ` Yang Shi
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Yang Shi @ 2019-07-12 19:12 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/11/19 2:07 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>> Hi Qian,
>>
>>
>> Thanks for reporting the issue. But I can't reproduce it on my machine.
>> Could you please share more details about your test? How often did you
>> run into this problem?
> I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server. Here
> is some more information.
>
> # cat .config
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

I tried your kernel config, but I still can't reproduce it. My compiler 
doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my 
test, but I don't think this would make any difference for this case.

According to the call trace in the earlier email, it looks like
deferred_split_scan() lost a race with put_compound_page().
put_compound_page() calls free_transhuge_page(), which deletes the page
from the deferred split queue, but the page may still appear on the
deferred list for some reason.

Would you please try the below patch?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
         if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
                 if (!list_empty(page_deferred_list(head))) {
                         ds_queue->split_queue_len--;
-                       list_del(page_deferred_list(head));
+                       list_del_init(page_deferred_list(head));
                 }
                 if (mapping)
                         __dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
         if (!list_empty(page_deferred_list(page))) {
                 ds_queue->split_queue_len--;
-               list_del(page_deferred_list(page));
+               list_del_init(page_deferred_list(page));
         }
         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
         free_compound_page(page);
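
For reference, here is a minimal userspace sketch (not kernel code; the list
helpers below just mimic include/linux/list.h) of what the change buys us:
after list_del() the entry's pointers are poisoned, so a later !list_empty()
check on that entry succeeds and a second delete chases LIST_POISON1, hitting
the BUG shown above, whereas list_del_init() leaves the entry self-linked so
it is harmlessly skipped.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Same poison values as the kernel's, matching dead000000000100 in the oops */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *item, struct list_head *head)
{
        item->next = head->next;
        item->prev = head;
        head->next->prev = item;
        head->next = item;
}

static void __list_del_entry(struct list_head *e)
{
        e->next->prev = e->prev;
        e->prev->next = e->next;
}

static void list_del(struct list_head *e)       /* poisons the entry */
{
        __list_del_entry(e);
        e->next = LIST_POISON1;
        e->prev = LIST_POISON2;
}

static void list_del_init(struct list_head *e)  /* re-initializes the entry */
{
        __list_del_entry(e);
        INIT_LIST_HEAD(e);
}

int main(void)
{
        struct list_head queue, page;

        INIT_LIST_HEAD(&queue);
        INIT_LIST_HEAD(&page);
        list_add(&page, &queue);
        list_del(&page);        /* e.g. the free_transhuge_page() path */
        /* A racy !list_empty() re-check now succeeds on the poisoned entry;
         * a second list_del() here would dereference LIST_POISON1 and BUG. */
        printf("after list_del:      list_empty() = %d\n", list_empty(&page));

        INIT_LIST_HEAD(&queue);
        INIT_LIST_HEAD(&page);
        list_add(&page, &queue);
        list_del_init(&page);   /* entry is self-linked again */
        printf("after list_del_init: list_empty() = %d\n", list_empty(&page));
        return 0;
}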

>
> # numactl -H
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
> node 0 size: 19984 MB
> node 0 free: 7251 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
> node 4 size: 31524 MB
> node 4 free: 25165 MB
> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
> node 5 size: 0 MB
> node 5 free: 0 MB
> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>    0:  10  16  16  16  32  32  32  32
>    1:  16  10  16  16  32  32  32  32
>    2:  16  16  10  16  32  32  32  32
>    3:  16  16  16  10  32  32  32  32
>    4:  32  32  32  32  10  16  16  16
>    5:  32  32  32  32  16  10  16  16
>    6:  32  32  32  32  16  16  10  16
>    7:  32  32  32  32  16  16  16  10
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  2
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        8
> Vendor ID:           AuthenticAMD
> CPU family:          23
> Model:               1
> Model name:          AMD EPYC 7601 32-Core Processor
> Stepping:            2
> CPU MHz:             2713.551
> BogoMIPS:            4391.39
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            8192K
> NUMA node0 CPU(s):   0-7,64-71
> NUMA node1 CPU(s):   8-15,72-79
> NUMA node2 CPU(s):   16-23,80-87
> NUMA node3 CPU(s):   24-31,88-95
> NUMA node4 CPU(s):   32-39,96-103
> NUMA node5 CPU(s):   40-47,104-111
> NUMA node6 CPU(s):   48-55,112-119
> NUMA node7 CPU(s):   56-63,120-127
>
> Another possible lead is that without reverting those commits below, the kdump
> kernel would also always crash in shrink_slab_memcg() at this line,
>
> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
think of a case where nodeinfo would be freed while the memcg is still
online. Maybe a check is needed:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..bacda49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
         if (!mem_cgroup_online(memcg))
                 return 0;

+       if (!memcg->nodeinfo[nid])
+               return 0;
+
         if (!down_read_trylock(&shrinker_rwsem))
                 return 0;
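
As a side note, a NULL nodeinfo[nid] would also explain the faulting address
in your kdump report: the 8-byte read lands at the offset of shrinker_map
within struct mem_cgroup_per_node rather than at address 0. A tiny userspace
sketch (the struct layout below is made up; the 0xdc8 padding just mirrors
the "Read of size 8 at addr 0000000000000dc8" line):

#include <stddef.h>
#include <stdio.h>

struct memcg_shrinker_map;                      /* opaque stand-in */

struct mem_cgroup_per_node {                    /* illustrative layout only */
        char other_fields[0xdc8];               /* whatever precedes shrinker_map */
        struct memcg_shrinker_map *shrinker_map;
};

int main(void)
{
        /* With memcg->nodeinfo[nid] == NULL, the load
         *      memcg->nodeinfo[nid]->shrinker_map
         * reads 8 bytes at NULL + offsetof(struct mem_cgroup_per_node,
         * shrinker_map), i.e. at a small non-zero address. */
        printf("fault address: %#zx\n",
               offsetof(struct mem_cgroup_per_node, shrinker_map));
        return 0;
}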

>
> [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
> [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task
> swapper/0/1
> [    9.072036][    T1]
> [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
> 20190711+ #10
> [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 01/25/2019
> [    9.072036][    T1] Call Trace:
> [    9.072036][    T1]  dump_stack+0x62/0x9a
> [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
> [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
> [    9.072036][    T1]  ? shrink_slab+0x111/0x440
> [    9.072036][    T1]  kasan_report+0xc/0xe
> [    9.072036][    T1]  __asan_load8+0x71/0xa0
> [    9.072036][    T1]  shrink_slab+0x111/0x440
> [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
> [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
> [    9.072036][    T1]  shrink_node+0x31e/0xa30
> [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
> [    9.072036][    T1]  ? ktime_get+0x93/0x110
> [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
> [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
> [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
> [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
> [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [    9.072036][    T1]  ? unwind_dump+0x260/0x260
> [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
> [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
> [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
> [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
> [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
> [    9.072036][    T1]  allocate_slab+0x600/0x11f0
> [    9.072036][    T1]  new_slab+0x46/0x70
> [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
> [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
> [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
> [    9.072036][    T1]  __slab_alloc+0x12/0x20
> [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
> [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
> [    9.072036][    T1]  create_object+0x3a/0x3e0
> [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
> [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
> [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
> [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
> [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
> [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
> [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
> [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
> [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
> [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
> [    9.072036][    T1]  acpi_load_tables+0x61/0x80
> [    9.072036][    T1]  acpi_init+0x10d/0x44b
> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
> [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
> [    9.072036][    T1]  ? kernfs_get+0x13/0x20
> [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
> [    9.072036][    T1]  ? kset_register+0x31/0x50
> [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
> [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
> [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
> [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
> [    9.072036][    T1]  ? up_write+0x6b/0x190
> [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
> [    9.072036][    T1]  ? rest_init+0x188/0x188
> [    9.072036][    T1]  kernel_init+0x11/0x138
> [    9.072036][    T1]  ? rest_init+0x188/0x188
> [    9.072036][    T1]  ret_from_fork+0x22/0x40
> [    9.072036][    T1]
> ==================================================================
> [    9.072036][    T1] Disabling lock debugging due to kernel taint
> [    9.145712][    T1] BUG: kernel NULL pointer dereference, address:
> 0000000000000dc8
> [    9.152036][    T1] #PF: supervisor read access in kernel mode
> [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
> [    9.152036][    T1] PGD 0 P4D 0
> [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
> G    B             5.2.0-next-20190711+ #10
> [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 01/25/2019
> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> ffffffff8112f288
> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> ffffffff824e0440
> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> fffffbfff049c088
> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> 00000000000001b8
> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> ffff88905757f440
> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
> knlGS:0000000000000000
> [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
> 00000000001406b0
> [    9.152036][    T1] Call Trace:
> [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
> [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
> [    9.152036][    T1]  shrink_node+0x31e/0xa30
> [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
> [    9.152036][    T1]  ? ktime_get+0x93/0x110
> [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
> [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
> [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
> [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
> [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [    9.152036][    T1]  ? unwind_dump+0x260/0x260
> [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
> [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
> [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
> [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
> [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
> [    9.152036][    T1]  allocate_slab+0x600/0x11f0
> [    9.152036][    T1]  new_slab+0x46/0x70
> [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
> [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
> [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
> [    9.152036][    T1]  __slab_alloc+0x12/0x20
> [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
> [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
> [    9.152036][    T1]  create_object+0x3a/0x3e0
> [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
> [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
> [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
> [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
> [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
> [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
> [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
> [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
> [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
> [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
> [    9.152036][    T1]  acpi_load_tables+0x61/0x80
> [    9.152036][    T1]  acpi_init+0x10d/0x44b
> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
> [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
> [    9.152036][    T1]  ? kernfs_get+0x13/0x20
> [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
> [    9.152036][    T1]  ? kset_register+0x31/0x50
> [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
> [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
> [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
> [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
> [    9.152036][    T1]  ? up_write+0x6b/0x190
> [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
> [    9.152036][    T1]  ? rest_init+0x188/0x188
> [    9.152036][    T1]  kernel_init+0x11/0x138
> [    9.152036][    T1]  ? rest_init+0x188/0x188
> [    9.152036][    T1]  ret_from_fork+0x22/0x40
> [    9.152036][    T1] Modules linked in:
> [    9.152036][    T1] CR2: 0000000000000dc8
> [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
> ffffffff8112f288
> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
> ffffffff824e0440
> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
> fffffbfff049c088
> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
> 00000000000001b8
> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
> ffff88905757f440
> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
> knlGS:00000000
>
>>
>> Regards,
>>
>> Yang
>>
>>
>>
>> On 7/10/19 2:43 PM, Qian Cai wrote:
>>> Running the LTP oom01 test case with swap triggers the crash below. Reverting
>>> the series
>>> "Make deferred split shrinker memcg aware" [1] seems to fix the issue.
>>>
>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware
>>> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
>>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
>>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
>>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
>>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
>>> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>>>
>>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>>>
>>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
>>> LIST_POISON1 (dead000000000100)
>>> [ 1145.739763][ T5764] ------------[ cut here ]------------
>>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
>>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> NOPTI
>>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted:
>>> G        W         5.2.0-next-20190710+ #7
>>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>> DL385
>>> Gen10, BIOS A40 01/25/2019
>>> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
>>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80
>>> 9e
>>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff
>>> <0f>
>>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
>>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
>>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX:
>>> ffffffffae95d318
>>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
>>> ffff8888440bd380
>>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09:
>>> ffffed1108817a70
>>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12:
>>> dead000000000122
>>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15:
>>> dead000000000100
>>> [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000)
>>> knlGS:0000000000000000
>>> [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4:
>>> 00000000001406a0
>>> [ 1145.870664][ T5764] Call Trace:
>>> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
>>> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
>>> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
>>> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
>>> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
>>> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
>>> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
>>> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
>>> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
>>> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
>>> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
>>> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
>>> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
>>> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
>>> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
>>> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
>>> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
>>> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
>>> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
>>> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>>> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
>>> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
>>> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
>>> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
>>> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
>>> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
>>> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
>>> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
>>> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
>>> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
>>> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>>> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
>>> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
>>> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
>>> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
>>> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
>>> [ 1146.083594][ T5764] RIP: 0033:0x410be0
>>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00
>>> 00
>>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90
>>> <c6>
>>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
>>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
>>> 00007f98f2674497
>>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI:
>>> 0000000000000000
>>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09:
>>> 0000000000000000
>>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000[ 1147.588181][
>>> T5764] Shutting down cpus with NMI
>>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000
>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception
>>> ]---
>>



* Re: list corruption in deferred_split_scan()
  2019-07-12 19:12     ` Yang Shi
@ 2019-07-13  4:41       ` Yang Shi
  2019-07-15 21:23       ` Qian Cai
  2019-07-19  0:54       ` Qian Cai
  2 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2019-07-13  4:41 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/12/19 12:12 PM, Yang Shi wrote:
>
>
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>>
>>>
>>> Thanks for reporting the issue. But I can't reproduce it on my machine.
>>> Could you please share more details about your test? How often did you
>>> run into this problem?
>> I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server.
>> Here is some more information.
>>
>> # cat .config
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> I tried your kernel config, but I still can't reproduce it. My 
> compiler doesn't have retpoline support, so CONFIG_RETPOLINE is 
> disabled in my test, but I don't think this would make any difference 
> for this case.
>
> According to the bug call trace in the earlier email, it looks like 
> deferred_split_scan() lost a race with put_compound_page(). 
> put_compound_page() would call free_transhuge_page(), which deletes the 
> page from the deferred split queue, but the page may still appear on the 
> deferred list for some reason.
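
For context on the symptom: plain list_del() poisons the unlinked entry's
pointers, while list_del_init() leaves the entry self-linked, so a stale
entry fails a later list_empty() check instead of tripping the poison
check. A minimal sketch based on include/linux/list.h (the poison values
match R12/R13 in the oops above):

/* Sketch from include/linux/list.h: why the oops reports LIST_POISON1. */
static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;	/* 0xdead000000000100, seen in the oops */
	entry->prev = LIST_POISON2;	/* 0xdead000000000122 */
}

static inline void list_del_init(struct list_head *entry)
{
	__list_del_entry(entry);
	INIT_LIST_HEAD(entry);	/* self-linked: list_empty() is true again */
}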
>
> Would you please try the below patch?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, 
> struct list_head *list)
>         if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>                 if (!list_empty(page_deferred_list(head))) {
>                         ds_queue->split_queue_len--;
> -                       list_del(page_deferred_list(head));
> +                       list_del_init(page_deferred_list(head));

This line should not be changed. Please just apply the part below.

> }
>                 if (mapping)
>                         __dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (!list_empty(page_deferred_list(page))) {
>                 ds_queue->split_queue_len--;
> -               list_del(page_deferred_list(page));
> +               list_del_init(page_deferred_list(page));
>         }
>         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>         free_compound_page(page);
>
>>
>> # numactl -H
>> available: 8 nodes (0-7)
>> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
>> node 0 size: 19984 MB
>> node 0 free: 7251 MB
>> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
>> node 1 size: 0 MB
>> node 1 free: 0 MB
>> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
>> node 2 size: 0 MB
>> node 2 free: 0 MB
>> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
>> node 3 size: 0 MB
>> node 3 free: 0 MB
>> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
>> node 4 size: 31524 MB
>> node 4 free: 25165 MB
>> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
>> node 5 size: 0 MB
>> node 5 free: 0 MB
>> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
>> node 6 size: 0 MB
>> node 6 free: 0 MB
>> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
>> node 7 size: 0 MB
>> node 7 free: 0 MB
>> node distances:
>> node   0   1   2   3   4   5   6   7
>>    0:  10  16  16  16  32  32  32  32
>>    1:  16  10  16  16  32  32  32  32
>>    2:  16  16  10  16  32  32  32  32
>>    3:  16  16  16  10  32  32  32  32
>>    4:  32  32  32  32  10  16  16  16
>>    5:  32  32  32  32  16  10  16  16
>>    6:  32  32  32  32  16  16  10  16
>>    7:  32  32  32  32  16  16  16  10
>>
>> # lscpu
>> Architecture:        x86_64
>> CPU op-mode(s):      32-bit, 64-bit
>> Byte Order:          Little Endian
>> CPU(s):              128
>> On-line CPU(s) list: 0-127
>> Thread(s) per core:  2
>> Core(s) per socket:  32
>> Socket(s):           2
>> NUMA node(s):        8
>> Vendor ID:           AuthenticAMD
>> CPU family:          23
>> Model:               1
>> Model name:          AMD EPYC 7601 32-Core Processor
>> Stepping:            2
>> CPU MHz:             2713.551
>> BogoMIPS:            4391.39
>> Virtualization:      AMD-V
>> L1d cache:           32K
>> L1i cache:           64K
>> L2 cache:            512K
>> L3 cache:            8192K
>> NUMA node0 CPU(s):   0-7,64-71
>> NUMA node1 CPU(s):   8-15,72-79
>> NUMA node2 CPU(s):   16-23,80-87
>> NUMA node3 CPU(s):   24-31,88-95
>> NUMA node4 CPU(s):   32-39,96-103
>> NUMA node5 CPU(s):   40-47,104-111
>> NUMA node6 CPU(s):   48-55,112-119
>> NUMA node7 CPU(s):   56-63,120-127
>>
>> Another possible lead is that without reverting those commits 
>> below, the kdump
>> kernel would always also crash in shrink_slab_memcg() at this line,
>>
>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, 
>> true);
>
> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't 
> think of where nodeinfo would be freed while the memcg was still online. 
> Maybe a check is needed:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..bacda49 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t 
> gfp_mask, int nid,
>         if (!mem_cgroup_online(memcg))
>                 return 0;
>
> +       if (!memcg->nodeinfo[nid])
> +               return 0;
> +
>         if (!down_read_trylock(&shrinker_rwsem))
>                 return 0;
>
>>
>> [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>> [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task swapper/0/1
>> [    9.072036][    T1]
>> [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-20190711+ #10
>> [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>> [    9.072036][    T1] Call Trace:
>> [    9.072036][    T1]  dump_stack+0x62/0x9a
>> [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
>> [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
>> [    9.072036][    T1]  ? shrink_slab+0x111/0x440
>> [    9.072036][    T1]  kasan_report+0xc/0xe
>> [    9.072036][    T1]  __asan_load8+0x71/0xa0
>> [    9.072036][    T1]  shrink_slab+0x111/0x440
>> [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
>> [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
>> [    9.072036][    T1]  shrink_node+0x31e/0xa30
>> [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>> [    9.072036][    T1]  ? ktime_get+0x93/0x110
>> [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
>> [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
>> [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
>> [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
>> [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>> [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [    9.072036][    T1]  ? unwind_dump+0x260/0x260
>> [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
>> [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
>> [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
>> [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
>> [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
>> [    9.072036][    T1]  allocate_slab+0x600/0x11f0
>> [    9.072036][    T1]  new_slab+0x46/0x70
>> [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>> [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>> [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>> [    9.072036][    T1]  __slab_alloc+0x12/0x20
>> [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
>> [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
>> [    9.072036][    T1]  create_object+0x3a/0x3e0
>> [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
>> [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>> [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
>> [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>> [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>> [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
>> [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>> [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>> [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>> [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>> [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>> [    9.072036][    T1]  acpi_load_tables+0x61/0x80
>> [    9.072036][    T1]  acpi_init+0x10d/0x44b
>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
>> [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
>> [    9.072036][    T1]  ? kernfs_get+0x13/0x20
>> [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
>> [    9.072036][    T1]  ? kset_register+0x31/0x50
>> [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
>> [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
>> [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>> [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
>> [    9.072036][    T1]  ? up_write+0x6b/0x190
>> [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>> [    9.072036][    T1]  kernel_init+0x11/0x138
>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>> [    9.072036][    T1]  ret_from_fork+0x22/0x40
>> [    9.072036][    T1]
>> ==================================================================
>> [    9.072036][    T1] Disabling lock debugging due to kernel taint
>> [    9.145712][    T1] BUG: kernel NULL pointer dereference, address: 0000000000000dc8
>> [    9.152036][    T1] #PF: supervisor read access in kernel mode
>> [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
>> [    9.152036][    T1] PGD 0 P4D 0
>> [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B             5.2.0-next-20190711+ #10
>> [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:0000000000000000
>> [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: 00000000001406b0
>> [    9.152036][    T1] Call Trace:
>> [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
>> [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
>> [    9.152036][    T1]  shrink_node+0x31e/0xa30
>> [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>> [    9.152036][    T1]  ? ktime_get+0x93/0x110
>> [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
>> [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
>> [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
>> [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
>> [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>> [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [    9.152036][    T1]  ? unwind_dump+0x260/0x260
>> [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
>> [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
>> [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
>> [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
>> [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
>> [    9.152036][    T1]  allocate_slab+0x600/0x11f0
>> [    9.152036][    T1]  new_slab+0x46/0x70
>> [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>> [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>> [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>> [    9.152036][    T1]  __slab_alloc+0x12/0x20
>> [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
>> [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
>> [    9.152036][    T1]  create_object+0x3a/0x3e0
>> [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
>> [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>> [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>> [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
>> [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>> [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>> [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
>> [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>> [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>> [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>> [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>> [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>> [    9.152036][    T1]  acpi_load_tables+0x61/0x80
>> [    9.152036][    T1]  acpi_init+0x10d/0x44b
>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
>> [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
>> [    9.152036][    T1]  ? kernfs_get+0x13/0x20
>> [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
>> [    9.152036][    T1]  ? kset_register+0x31/0x50
>> [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>> [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
>> [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
>> [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>> [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
>> [    9.152036][    T1]  ? up_write+0x6b/0x190
>> [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>> [    9.152036][    T1]  kernel_init+0x11/0x138
>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>> [    9.152036][    T1]  ret_from_fork+0x22/0x40
>> [    9.152036][    T1] Modules linked in:
>> [    9.152036][    T1] CR2: 0000000000000dc8
>> [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:00000000
>>
>>>
>>> Regards,
>>>
>>> Yang
>>>
>>>
>>>
>>> On 7/10/19 2:43 PM, Qian Cai wrote:
>>>> Running LTP oom01 test case with swap triggers a crash below. 
>>>> Revert the
>>>> series
>>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>>>>
>>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware
>>>> cf402211cacc 
>>>> mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
>>>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
>>>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
>>>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
>>>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of 
>>>> __page_cache_release()
>>>> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>>>>
>>>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is
>>>> LIST_POISON1 (dead000000000100)
>>>> [ 1145.739763][ T5764] ------------[ cut here ]------------
>>>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
>>>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G        W         5.2.0-next-20190710+ #7
>>>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>>>> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
>>>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
>>>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
>>>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
>>>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
>>>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318
>>>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380
>>>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70
>>>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122
>>>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100
>>>> [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000
>>>> [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0
>>>> [ 1145.870664][ T5764] Call Trace:
>>>> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
>>>> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
>>>> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
>>>> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
>>>> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
>>>> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
>>>> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
>>>> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
>>>> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
>>>> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
>>>> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
>>>> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
>>>> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
>>>> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
>>>> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
>>>> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
>>>> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>>> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
>>>> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
>>>> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
>>>> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>>>> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
>>>> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
>>>> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
>>>> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
>>>> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
>>>> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
>>>> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
>>>> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
>>>> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
>>>> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
>>>> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>>>> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
>>>> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
>>>> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
>>>> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
>>>> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
>>>> [ 1146.083594][ T5764] RIP: 0033:0x410be0
>>>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
>>>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
>>>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>>>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
>>>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497
>>>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000
>>>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000
>>>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000
>>>> [ 1147.588181][ T5764] Shutting down cpus with NMI
>>>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
  2019-07-11  0:16 ` Yang Shi
@ 2019-07-14  3:53 ` Hillf Danton
  2019-07-15  4:52 ` Yang Shi
  2019-07-24 21:13 ` Qian Cai
  3 siblings, 0 replies; 22+ messages in thread
From: Hillf Danton @ 2019-07-14  3:53 UTC (permalink / raw)
  To: Qian Cai; +Cc: Yang Shi, Kirill A. Shutemov, akpm, linux-mm, linux-kernel


On Wed, 10 Jul 2019 14:43:28 -0700 (PDT) Qian Cai wrote:
> 
> Running LTP oom01 test case with swap triggers a crash below. Revert the series
> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
> 
> aefde94195ca mm: thp: make deferred split shrinker memcg aware
> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
> 4e050f2df876 mm: thp: extract split_queue_* into a struct
> 
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
> 
> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is LIST_POISON1 (dead000000000100)
> [ 1145.739763][ T5764] ------------[ cut here ]------------
> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G        W         5.2.0-next-20190710+ #7
> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318
> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380
> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70
> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122
> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100
> [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000
> [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0
> [ 1145.870664][ T5764] Call Trace:
> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
> [ 1146.083594][ T5764] RIP: 0033:0x410be0
> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497
> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000
> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000
> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000
> [ 1147.588181][ T5764] Shutting down cpus with NMI
> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---


Ignore the noise if you think there is no chance to corrupt the local list walk
in some way like:

	CPU0				CPU1
	----				----
	take no lock			spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	list_for_each_safe(pos, next,
				&list)
					list_del(page_deferred_list(page));
	page = list_entry((void *)pos,
		struct page, mapping);
					spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
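
Spelled out, the CPU0 side of the diagram is the unlocked walk over the
local list in deferred_split_scan(); this condensed sketch is reconstructed
from the pre-patch code that the second diff below removes:

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	/* pages are moved from ds_queue->split_queue onto a local "list" */
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	list_for_each_safe(pos, next, &list) {	/* no lock held here */
		page = list_entry((void *)pos, struct page, mapping);
		if (!trylock_page(page))
			goto next;
		/* split_huge_page() removes page from list on success */
		if (!split_huge_page(page))
			split++;
		unlock_page(page);
next:
		put_page(page);
	}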


--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
-		list_del(page_deferred_list(page));
+		list_del_init(page_deferred_list(page));
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
--

The major, important part is listed above; the minor, trivial part is below.
Both are only for collecting thoughts.

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2869,9 +2869,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue;
 	unsigned long flags;
-	LIST_HEAD(list), *pos, *next;
 	struct page *page;
-	int split = 0;
+	unsigned long nr_split = 0;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
@@ -2884,44 +2883,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
-	list_for_each_safe(pos, next, &ds_queue->split_queue) {
-		page = list_entry((void *)pos, struct page, mapping);
+	while (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
+		bool locked, pinned;
+
+		page = list_first_entry(&ds_queue->split_queue, struct page,
+						mapping);
 		page = compound_head(page);
+
 		if (get_page_unless_zero(page)) {
-			list_move(page_deferred_list(page), &list);
+			pinned = true;
+			locked = trylock_page(page);
 		} else {
 			/* We lost race with put_compound_page() */
-			list_del_init(page_deferred_list(page));
-			ds_queue->split_queue_len--;
+			pinned = false;
+			locked = false;
+		}
+		list_del_init(page_deferred_list(page));
+		ds_queue->split_queue_len--;
+		--sc->nr_to_scan;
+		if (!pinned)
+			continue;
+		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+		if (locked) {
+			if (!split_huge_page(page))
+				nr_split++;
+			unlock_page(page);
 		}
-		if (!--sc->nr_to_scan)
-			break;
-	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	list_for_each_safe(pos, next, &list) {
-		page = list_entry((void *)pos, struct page, mapping);
-		if (!trylock_page(page))
-			goto next;
-		/* split_huge_page() removes page from list on success */
-		if (!split_huge_page(page))
-			split++;
-		unlock_page(page);
-next:
 		put_page(page);
+		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	}
-
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
 	 * This can happen if pages were freed under us.
 	 */
-	if (!split && list_empty(&ds_queue->split_queue))
+	if (!nr_split && list_empty(&ds_queue->split_queue))
 		return SHRINK_STOP;
-	return split;
+	return nr_split;
 }
 
 static struct shrinker deferred_split_shrinker = {
--


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
  2019-07-11  0:16 ` Yang Shi
  2019-07-14  3:53 ` Hillf Danton
@ 2019-07-15  4:52 ` Yang Shi
  2019-07-24 21:13 ` Qian Cai
  3 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2019-07-15  4:52 UTC (permalink / raw)
  To: Hillf Danton, Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/13/19 8:53 PM, Hillf Danton wrote:
> On Wed, 10 Jul 2019 14:43:28 -0700 (PDT) Qian Cai wrote:
>> Running LTP oom01 test case with swap triggers a crash below. Revert the series
>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>>
>> aefde94195ca mm: thp: make deferred split shrinker memcg aware
>> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
>> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
>> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
>> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
>> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
>> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>>
>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>>
>> [ 1145.730682][ T5764] list_del corruption, ffffea00251c8098->next is LIST_POISON1 (dead000000000100)
>> [ 1145.739763][ T5764] ------------[ cut here ]------------
>> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
>> [ 1145.750320][ T5764] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G        W         5.2.0-next-20190710+ #7
>> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
>> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
>> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e
>> a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f>
>> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
>> [ 1145.802078][ T5764] RSP: 0018:ffff888514d773c0 EFLAGS: 00010082
>> [ 1145.808042][ T5764] RAX: 000000000000004e RBX: ffffea00251c8098 RCX: ffffffffae95d318
>> [ 1145.815923][ T5764] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8888440bd380
>> [ 1145.823806][ T5764] RBP: ffff888514d773d8 R08: ffffed1108817a71 R09: ffffed1108817a70
>> [ 1145.831689][ T5764] R10: ffffed1108817a70 R11: ffff8888440bd387 R12: dead000000000122
>> [ 1145.839571][ T5764] R13: dead000000000100 R14: ffffea00251c8034 R15: dead000000000100
>> [ 1145.847455][ T5764] FS:  00007f765ad4d700(0000) GS:ffff888844080000(0000) knlGS:0000000000000000
>> [ 1145.856299][ T5764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1145.862784][ T5764] CR2: 00007f8cebec7000 CR3: 0000000459338000 CR4: 00000000001406a0
>> [ 1145.870664][ T5764] Call Trace:
>> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
>> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
>> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
>> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
>> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
>> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
>> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
>> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
>> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
>> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
>> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
>> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
>> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
>> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
>> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
>> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
>> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
>> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
>> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
>> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
>> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
>> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
>> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
>> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
>> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
>> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
>> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
>> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
>> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
>> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
>> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
>> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
>> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
>> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
>> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
>> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
>> [ 1146.083594][ T5764] RIP: 0033:0x410be0
>> [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>> [ 1146.106959][ T5764] RSP: 002b:00007f765ad4cec0 EFLAGS: 00010206
>> [ 1146.112921][ T5764] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f98f2674497
>> [ 1146.120804][ T5764] RDX: 0000000001d95000 RSI: 00000000c0000000 RDI: 0000000000000000
>> [ 1146.128687][ T5764] RBP: 00007f74d9d4c000 R08: 00000000ffffffff R09: 0000000000000000
>> [ 1146.136569][ T5764] R10: 0000000000000022 R11: 000000000
>> [ 1147.588181][ T5764] Shutting down cpus with NMI
>> [ 1147.592756][ T5764] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 1147.604414][ T5764] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Ignore the noise if there is no chance you think to corrupt the local list walk
> in some way like:
>
> 	CPU0				CPU1
> 	----				----
> 	take no lock			spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 	list_for_each_safe(pos, next,
> 				&list)
> 					list_del(page_deferred_list(page));
> 	page = list_entry((void *)pos,
> 		struct page, mapping);
> 					spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

IMHO, I don't see how the race could really happen.

list_del() is called in 3 places:
1. Parallel free_transhuge_page(): The refcount bump should prevent 
the race.
2. Parallel reclaimer: split_queue_lock should prevent this, so the 
other reclaimer should not see the same page.
3. Parallel split_huge_page(): I'm not sure about this one. But the page 
lock should be acquired before calling split_huge_page() in the other 
call paths too.

I'm not sure if I missed anything; please feel free to correct me.
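
For point 1, the pin is taken while split_queue_lock is still held, so
free_transhuge_page() cannot drop the last reference and delete the page
while it sits on the local list. A condensed sketch of that part of
deferred_split_scan(), as quoted earlier in the thread:

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	list_for_each_safe(pos, next, &ds_queue->split_queue) {
		page = list_entry((void *)pos, struct page, mapping);
		page = compound_head(page);
		if (get_page_unless_zero(page)) {
			/* pinned: free_transhuge_page() can't run until put_page() */
			list_move(page_deferred_list(page), &list);
		} else {
			/* we lost race with put_compound_page() */
			list_del_init(page_deferred_list(page));
			ds_queue->split_queue_len--;
		}
		if (!--sc->nr_to_scan)
			break;
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);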

>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>   	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>   		if (!list_empty(page_deferred_list(head))) {
>   			ds_queue->split_queue_len--;
> -			list_del(page_deferred_list(head));
> +			list_del_init(page_deferred_list(head));
>   		}
>   		if (mapping)
>   			__dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>   	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	if (!list_empty(page_deferred_list(page))) {
>   		ds_queue->split_queue_len--;
> -		list_del(page_deferred_list(page));
> +		list_del_init(page_deferred_list(page));
>   	}
>   	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>   	free_compound_page(page);
> --

I proposed a similar thing.

> The major important is listed above; the minor trivial part below.
> Both are only for thought collectings.
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2869,9 +2869,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   	struct pglist_data *pgdata = NODE_DATA(sc->nid);
>   	struct deferred_split *ds_queue;
>   	unsigned long flags;
> -	LIST_HEAD(list), *pos, *next;
>   	struct page *page;
> -	int split = 0;
> +	unsigned long nr_split = 0;
>   
>   #ifdef CONFIG_MEMCG
>   	if (sc->memcg)
> @@ -2884,44 +2883,44 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   
>   	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	/* Take pin on all head pages to avoid freeing them under us */
> -	list_for_each_safe(pos, next, &ds_queue->split_queue) {
> -		page = list_entry((void *)pos, struct page, mapping);
> +	while (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
> +		bool locked, pinned;
> +
> +		page = list_first_entry(&ds_queue->split_queue, struct page,
> +						mapping);
>   		page = compound_head(page);
> +
>   		if (get_page_unless_zero(page)) {
> -			list_move(page_deferred_list(page), &list);
> +			pinned = true;
> +			locked = trylock_page(page);
>   		} else {
>   			/* We lost race with put_compound_page() */
> -			list_del_init(page_deferred_list(page));
> -			ds_queue->split_queue_len--;
> +			pinned = false;
> +			locked = false;
> +		}
> +		list_del_init(page_deferred_list(page));
> +		ds_queue->split_queue_len--;
> +		--sc->nr_to_scan;
> +		if (!pinned)
> +			continue;
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> +		if (locked) {
> +			if (!split_huge_page(page))
> +				nr_split++;
> +			unlock_page(page);
>   		}
> -		if (!--sc->nr_to_scan)
> -			break;
> -	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> -
> -	list_for_each_safe(pos, next, &list) {
> -		page = list_entry((void *)pos, struct page, mapping);
> -		if (!trylock_page(page))
> -			goto next;
> -		/* split_huge_page() removes page from list on success */
> -		if (!split_huge_page(page))
> -			split++;
> -		unlock_page(page);
> -next:
>   		put_page(page);
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	}
> -
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	list_splice_tail(&list, &ds_queue->split_queue);
>   	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>   
>   	/*
>   	 * Stop shrinker if we didn't split any page, but the queue is empty.
>   	 * This can happen if pages were freed under us.
>   	 */
> -	if (!split && list_empty(&ds_queue->split_queue))
> +	if (!nr_split && list_empty(&ds_queue->split_queue))
>   		return SHRINK_STOP;
> -	return split;
> +	return nr_split;
>   }
>   
>   static struct shrinker deferred_split_shrinker = {
> --


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-12 19:12     ` Yang Shi
  2019-07-13  4:41       ` Yang Shi
@ 2019-07-15 21:23       ` Qian Cai
  2019-07-16  0:22         ` Yang Shi
  2019-07-19  0:54       ` Qian Cai
  2 siblings, 1 reply; 22+ messages in thread
From: Qian Cai @ 2019-07-15 21:23 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> > Another possible lead is that without reverting those commits below,
> > the kdump
> > kernel would always also crash in shrink_slab_memcg() at this line,
> > 
> > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> 
> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't 
> think of where nodeinfo would be freed while the memcg was still online. 
> Maybe a check is needed:

Actually, "memcg" is NULL.

> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..bacda49 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t 
> gfp_mask, int nid,
>          if (!mem_cgroup_online(memcg))
>                  return 0;
> 
> +       if (!memcg->nodeinfo[nid])
> +               return 0;
> +
>          if (!down_read_trylock(&shrinker_rwsem))
>                  return 0;
> 
> > 
> > [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
> > [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task swapper/0/1
> > [    9.072036][    T1]
> > [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-20190711+ #10
> > [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
> > [    9.072036][    T1] Call Trace:
> > [    9.072036][    T1]  dump_stack+0x62/0x9a
> > [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
> > [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
> > [    9.072036][    T1]  ? shrink_slab+0x111/0x440
> > [    9.072036][    T1]  kasan_report+0xc/0xe
> > [    9.072036][    T1]  __asan_load8+0x71/0xa0
> > [    9.072036][    T1]  shrink_slab+0x111/0x440
> > [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
> > [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
> > [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
> > [    9.072036][    T1]  shrink_node+0x31e/0xa30
> > [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
> > [    9.072036][    T1]  ? ktime_get+0x93/0x110
> > [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
> > [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
> > [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
> > [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
> > [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
> > [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
> > [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> > [    9.072036][    T1]  ? unwind_dump+0x260/0x260
> > [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
> > [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
> > [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
> > [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
> > [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
> > [    9.072036][    T1]  allocate_slab+0x600/0x11f0
> > [    9.072036][    T1]  new_slab+0x46/0x70
> > [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
> > [    9.072036][    T1]  ? create_object+0x3a/0x3e0
> > [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
> > [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
> > [    9.072036][    T1]  ? create_object+0x3a/0x3e0
> > [    9.072036][    T1]  __slab_alloc+0x12/0x20
> > [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
> > [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
> > [    9.072036][    T1]  create_object+0x3a/0x3e0
> > [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
> > [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
> > [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
> > [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
> > [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
> > [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
> > [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
> > [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
> > [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
> > [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> > [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> > [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
> > [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
> > [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
> > [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
> > [    9.072036][    T1]  acpi_load_tables+0x61/0x80
> > [    9.072036][    T1]  acpi_init+0x10d/0x44b
> > [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
> > [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
> > [    9.072036][    T1]  ? kernfs_get+0x13/0x20
> > [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
> > [    9.072036][    T1]  ? kset_register+0x31/0x50
> > [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
> > [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
> > [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
> > [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
> > [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
> > [    9.072036][    T1]  ? up_write+0x6b/0x190
> > [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
> > [    9.072036][    T1]  ? rest_init+0x188/0x188
> > [    9.072036][    T1]  kernel_init+0x11/0x138
> > [    9.072036][    T1]  ? rest_init+0x188/0x188
> > [    9.072036][    T1]  ret_from_fork+0x22/0x40
> > [    9.072036][    T1]
> > ==================================================================
> > [    9.072036][    T1] Disabling lock debugging due to kernel taint
> > [    9.145712][    T1] BUG: kernel NULL pointer dereference, address: 0000000000000dc8
> > [    9.152036][    T1] #PF: supervisor read access in kernel mode
> > [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
> > [    9.152036][    T1] PGD 0 P4D 0
> > [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B             5.2.0-next-20190711+ #10
> > [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
> > [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
> > [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> > [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> > [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
> > [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
> > [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
> > [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
> > [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
> > [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:0000000000000000
> > [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4: 00000000001406b0
> > [    9.152036][    T1] Call Trace:
> > [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
> > [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
> > [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
> > [    9.152036][    T1]  shrink_node+0x31e/0xa30
> > [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
> > [    9.152036][    T1]  ? ktime_get+0x93/0x110
> > [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
> > [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
> > [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
> > [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
> > [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
> > [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
> > [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> > [    9.152036][    T1]  ? unwind_dump+0x260/0x260
> > [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
> > [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
> > [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
> > [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
> > [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
> > [    9.152036][    T1]  allocate_slab+0x600/0x11f0
> > [    9.152036][    T1]  new_slab+0x46/0x70
> > [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
> > [    9.152036][    T1]  ? create_object+0x3a/0x3e0
> > [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
> > [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
> > [    9.152036][    T1]  ? create_object+0x3a/0x3e0
> > [    9.152036][    T1]  __slab_alloc+0x12/0x20
> > [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
> > [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
> > [    9.152036][    T1]  create_object+0x3a/0x3e0
> > [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
> > [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
> > [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
> > [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
> > [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
> > [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
> > [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
> > [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
> > [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
> > [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
> > [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> > [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
> > [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
> > [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
> > [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
> > [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
> > [    9.152036][    T1]  acpi_load_tables+0x61/0x80
> > [    9.152036][    T1]  acpi_init+0x10d/0x44b
> > [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
> > [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
> > [    9.152036][    T1]  ? kernfs_get+0x13/0x20
> > [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
> > [    9.152036][    T1]  ? kset_register+0x31/0x50
> > [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
> > [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
> > [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
> > [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
> > [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
> > [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
> > [    9.152036][    T1]  ? up_write+0x6b/0x190
> > [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
> > [    9.152036][    T1]  ? rest_init+0x188/0x188
> > [    9.152036][    T1]  kernel_init+0x11/0x138
> > [    9.152036][    T1]  ? rest_init+0x188/0x188
> > [    9.152036][    T1]  ret_from_fork+0x22/0x40
> > [    9.152036][    T1] Modules linked in:
> > [    9.152036][    T1] CR2: 0000000000000dc8
> > [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
> > [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
> > [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02 00
> > 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00 <4f>
> > 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
> > [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
> > [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX: ffffffff8112f288
> > [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI: ffffffff824e0440
> > [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09: fffffbfff049c088
> > [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12: 00000000000001b8
> > [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88905757f440
> > [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000) knlGS:00000000
> > 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-15 21:23       ` Qian Cai
@ 2019-07-16  0:22         ` Yang Shi
  2019-07-16  1:36           ` Qian Cai
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-16  0:22 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/15/19 2:23 PM, Qian Cai wrote:
> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>> Another possible lead is that, without reverting those commits below, the
>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
>>>
>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>> think of where nodeinfo would be freed while the memcg was still online.
>> Maybe a check is needed:
> Actually, "memcg" is NULL.

That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop,
which pins the memcg, so the memcg should not go away.
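
A rough sketch, from memory, of how shrink_node() drives this (names and
details approximate, not the exact code):

	memcg = mem_cgroup_iter(root, NULL, &reclaim);
	do {
		...
		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
		...
	} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));

mem_cgroup_iter() takes a css reference on the memcg it returns, so the
memcg stays pinned while shrink_slab() runs on it.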

>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..bacda49 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>           if (!mem_cgroup_online(memcg))
>>                   return 0;
>>
>> +       if (!memcg->nodeinfo[nid])
>> +               return 0;
>> +
>>           if (!down_read_trylock(&shrinker_rwsem))
>>                   return 0;
>>
>>> [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>> [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task
>>> swapper/0/1
>>> [    9.072036][    T1]
>>> [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>> 20190711+ #10
>>> [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>> DL385
>>> Gen10, BIOS A40 01/25/2019
>>> [    9.072036][    T1] Call Trace:
>>> [    9.072036][    T1]  dump_stack+0x62/0x9a
>>> [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
>>> [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
>>> [    9.072036][    T1]  ? shrink_slab+0x111/0x440
>>> [    9.072036][    T1]  kasan_report+0xc/0xe
>>> [    9.072036][    T1]  __asan_load8+0x71/0xa0
>>> [    9.072036][    T1]  shrink_slab+0x111/0x440
>>> [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>> [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>> [    9.072036][    T1]  shrink_node+0x31e/0xa30
>>> [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>> [    9.072036][    T1]  ? ktime_get+0x93/0x110
>>> [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
>>> [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
>>> [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
>>> [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>> [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [    9.072036][    T1]  ? unwind_dump+0x260/0x260
>>> [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
>>> [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>> [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
>>> [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
>>> [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
>>> [    9.072036][    T1]  allocate_slab+0x600/0x11f0
>>> [    9.072036][    T1]  new_slab+0x46/0x70
>>> [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>> [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>> [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>> [    9.072036][    T1]  __slab_alloc+0x12/0x20
>>> [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
>>> [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
>>> [    9.072036][    T1]  create_object+0x3a/0x3e0
>>> [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
>>> [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>> [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>> [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>> [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>> [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>> [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>> [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>> [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>> [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>> [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>> [    9.072036][    T1]  acpi_load_tables+0x61/0x80
>>> [    9.072036][    T1]  acpi_init+0x10d/0x44b
>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
>>> [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
>>> [    9.072036][    T1]  ? kernfs_get+0x13/0x20
>>> [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
>>> [    9.072036][    T1]  ? kset_register+0x31/0x50
>>> [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
>>> [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
>>> [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>> [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
>>> [    9.072036][    T1]  ? up_write+0x6b/0x190
>>> [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>> [    9.072036][    T1]  kernel_init+0x11/0x138
>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>> [    9.072036][    T1]  ret_from_fork+0x22/0x40
>>> [    9.072036][    T1]
>>> ==================================================================
>>> [    9.072036][    T1] Disabling lock debugging due to kernel taint
>>> [    9.145712][    T1] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000dc8
>>> [    9.152036][    T1] #PF: supervisor read access in kernel mode
>>> [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
>>> [    9.152036][    T1] PGD 0 P4D 0
>>> [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>> [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
>>> G    B             5.2.0-next-20190711+ #10
>>> [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>> DL385
>>> Gen10, BIOS A40 01/25/2019
>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>> 00
>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>> <4f>
>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>> ffffffff8112f288
>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>> ffffffff824e0440
>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>> fffffbfff049c088
>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>> 00000000000001b8
>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>> ffff88905757f440
>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>> knlGS:0000000000000000
>>> [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
>>> 00000000001406b0
>>> [    9.152036][    T1] Call Trace:
>>> [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>> [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>> [    9.152036][    T1]  shrink_node+0x31e/0xa30
>>> [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>> [    9.152036][    T1]  ? ktime_get+0x93/0x110
>>> [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
>>> [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
>>> [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
>>> [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>> [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [    9.152036][    T1]  ? unwind_dump+0x260/0x260
>>> [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
>>> [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>> [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
>>> [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
>>> [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
>>> [    9.152036][    T1]  allocate_slab+0x600/0x11f0
>>> [    9.152036][    T1]  new_slab+0x46/0x70
>>> [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>> [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>> [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>> [    9.152036][    T1]  __slab_alloc+0x12/0x20
>>> [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
>>> [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
>>> [    9.152036][    T1]  create_object+0x3a/0x3e0
>>> [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
>>> [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>> [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>> [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>> [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>> [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>> [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>> [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>> [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>> [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>> [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>> [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>> [    9.152036][    T1]  acpi_load_tables+0x61/0x80
>>> [    9.152036][    T1]  acpi_init+0x10d/0x44b
>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
>>> [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
>>> [    9.152036][    T1]  ? kernfs_get+0x13/0x20
>>> [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
>>> [    9.152036][    T1]  ? kset_register+0x31/0x50
>>> [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>> [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
>>> [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
>>> [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>> [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
>>> [    9.152036][    T1]  ? up_write+0x6b/0x190
>>> [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>> [    9.152036][    T1]  kernel_init+0x11/0x138
>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>> [    9.152036][    T1]  ret_from_fork+0x22/0x40
>>> [    9.152036][    T1] Modules linked in:
>>> [    9.152036][    T1] CR2: 0000000000000dc8
>>> [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>> 00
>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>> <4f>
>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>> ffffffff8112f288
>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>> ffffffff824e0440
>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>> fffffbfff049c088
>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>> 00000000000001b8
>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>> ffff88905757f440
>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>> knlGS:00000000
>>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-16  0:22         ` Yang Shi
@ 2019-07-16  1:36           ` Qian Cai
  2019-07-16  3:00             ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Qian Cai @ 2019-07-16  1:36 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML



> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> 
> 
> On 7/15/19 2:23 PM, Qian Cai wrote:
>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>> Another possible lead is that, without reverting those commits below, the
>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
>>>>
>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>> think of where nodeinfo would be freed while the memcg was still online.
>>> Maybe a check is needed:
>> Actually, "memcg" is NULL.
> 
> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.

Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),

-	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
+	if (!mem_cgroup_online(memcg))
		return 0;

Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,

if (mem_cgroup_disabled())		
	return NULL;
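
To make the failure path concrete, here is a minimal userspace sketch of the
logic (my own simplification -- the struct layout, helper names, and checks
below are stand-ins, not the real mm/ code):

#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup_per_node { void *shrinker_map; };
struct mem_cgroup { struct mem_cgroup_per_node *nodeinfo[1]; };

static struct mem_cgroup root_memcg;
static bool memcg_disabled = true;	/* cgroup_disable=memory */

/* With the controller disabled, the iterator yields NULL. */
static struct mem_cgroup *memcg_iter(void)
{
	return memcg_disabled ? NULL : &root_memcg;
}

static unsigned long slab_memcg(struct mem_cgroup *memcg, int nid)
{
	/* The old !memcg_kmem_enabled() test returned 0 before this
	 * dereference; without it, memcg == NULL is dereferenced here. */
	return memcg->nodeinfo[nid]->shrinker_map ? 1 : 0;
}

static unsigned long slab(struct mem_cgroup *memcg, int nid)
{
	/* "!mem_cgroup_is_root(NULL)" is true, so the memcg path is taken. */
	if (memcg != &root_memcg)
		return slab_memcg(memcg, nid);
	return 0;	/* global shrinker walk elided */
}

int main(void)
{
	return (int)slab(memcg_iter(), 0);	/* NULL-pointer dereference */
}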

> 
>> 
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a0301ed..bacda49 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>          if (!mem_cgroup_online(memcg))
>>>                  return 0;
>>> 
>>> +       if (!memcg->nodeinfo[nid])
>>> +               return 0;
>>> +
>>>          if (!down_read_trylock(&shrinker_rwsem))
>>>                  return 0;
>>> 
>>>> [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>>> [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task
>>>> swapper/0/1
>>>> [    9.072036][    T1]
>>>> [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>>> 20190711+ #10
>>>> [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 01/25/2019
>>>> [    9.072036][    T1] Call Trace:
>>>> [    9.072036][    T1]  dump_stack+0x62/0x9a
>>>> [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
>>>> [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
>>>> [    9.072036][    T1]  ? shrink_slab+0x111/0x440
>>>> [    9.072036][    T1]  kasan_report+0xc/0xe
>>>> [    9.072036][    T1]  __asan_load8+0x71/0xa0
>>>> [    9.072036][    T1]  shrink_slab+0x111/0x440
>>>> [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>>> [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>>> [    9.072036][    T1]  shrink_node+0x31e/0xa30
>>>> [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>>> [    9.072036][    T1]  ? ktime_get+0x93/0x110
>>>> [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
>>>> [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
>>>> [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
>>>> [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>>> [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>>> [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>> [    9.072036][    T1]  ? unwind_dump+0x260/0x260
>>>> [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
>>>> [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>>> [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
>>>> [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
>>>> [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
>>>> [    9.072036][    T1]  allocate_slab+0x600/0x11f0
>>>> [    9.072036][    T1]  new_slab+0x46/0x70
>>>> [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>>> [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>>> [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
>>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>>> [    9.072036][    T1]  __slab_alloc+0x12/0x20
>>>> [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
>>>> [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
>>>> [    9.072036][    T1]  create_object+0x3a/0x3e0
>>>> [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
>>>> [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>>> [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>>> [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>>> [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>>> [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>>> [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>>> [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>> [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>>> [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>> [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>> [    9.072036][    T1]  acpi_load_tables+0x61/0x80
>>>> [    9.072036][    T1]  acpi_init+0x10d/0x44b
>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
>>>> [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
>>>> [    9.072036][    T1]  ? kernfs_get+0x13/0x20
>>>> [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
>>>> [    9.072036][    T1]  ? kset_register+0x31/0x50
>>>> [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
>>>> [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
>>>> [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>>> [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
>>>> [    9.072036][    T1]  ? up_write+0x6b/0x190
>>>> [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
>>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>>> [    9.072036][    T1]  kernel_init+0x11/0x138
>>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>>> [    9.072036][    T1]  ret_from_fork+0x22/0x40
>>>> [    9.072036][    T1]
>>>> ==================================================================
>>>> [    9.072036][    T1] Disabling lock debugging due to kernel taint
>>>> [    9.145712][    T1] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000dc8
>>>> [    9.152036][    T1] #PF: supervisor read access in kernel mode
>>>> [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
>>>> [    9.152036][    T1] PGD 0 P4D 0
>>>> [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>>> [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
>>>> G    B             5.2.0-next-20190711+ #10
>>>> [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 01/25/2019
>>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>> 00
>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>> <4f>
>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>> ffffffff8112f288
>>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>> ffffffff824e0440
>>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>> fffffbfff049c088
>>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>> 00000000000001b8
>>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>> ffff88905757f440
>>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>>> knlGS:0000000000000000
>>>> [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
>>>> 00000000001406b0
>>>> [    9.152036][    T1] Call Trace:
>>>> [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>>> [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>>> [    9.152036][    T1]  shrink_node+0x31e/0xa30
>>>> [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>>> [    9.152036][    T1]  ? ktime_get+0x93/0x110
>>>> [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
>>>> [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
>>>> [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
>>>> [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>>> [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>>> [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>> [    9.152036][    T1]  ? unwind_dump+0x260/0x260
>>>> [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
>>>> [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>>> [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
>>>> [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
>>>> [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
>>>> [    9.152036][    T1]  allocate_slab+0x600/0x11f0
>>>> [    9.152036][    T1]  new_slab+0x46/0x70
>>>> [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>>> [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>>> [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
>>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>>> [    9.152036][    T1]  __slab_alloc+0x12/0x20
>>>> [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
>>>> [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
>>>> [    9.152036][    T1]  create_object+0x3a/0x3e0
>>>> [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
>>>> [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>> [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>>> [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>>> [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>>> [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>>> [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>>> [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>>> [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>> [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>>> [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>> [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>> [    9.152036][    T1]  acpi_load_tables+0x61/0x80
>>>> [    9.152036][    T1]  acpi_init+0x10d/0x44b
>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
>>>> [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
>>>> [    9.152036][    T1]  ? kernfs_get+0x13/0x20
>>>> [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
>>>> [    9.152036][    T1]  ? kset_register+0x31/0x50
>>>> [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>> [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
>>>> [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
>>>> [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>>> [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
>>>> [    9.152036][    T1]  ? up_write+0x6b/0x190
>>>> [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
>>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>>> [    9.152036][    T1]  kernel_init+0x11/0x138
>>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>>> [    9.152036][    T1]  ret_from_fork+0x22/0x40
>>>> [    9.152036][    T1] Modules linked in:
>>>> [    9.152036][    T1] CR2: 0000000000000dc8
>>>> [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
>>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>> 00
>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>> <4f>
>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>> ffffffff8112f288
>>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>> ffffffff824e0440
>>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>> fffffbfff049c088
>>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>> 00000000000001b8
>>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>> ffff88905757f440
>>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>>> knlGS:00000000
>>>> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-16  1:36           ` Qian Cai
@ 2019-07-16  3:00             ` Yang Shi
  2019-07-16 23:36               ` Shakeel Butt
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-16  3:00 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, LKML



On 7/15/19 6:36 PM, Qian Cai wrote:
>
>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>>> Another possible lead is that, without reverting those commits below, the
>>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
>>>>>
>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>>> think of where nodeinfo would be freed while the memcg was still online.
>>>> Maybe a check is needed:
>>> Actually, "memcg" is NULL.
>> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.
> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
>
> -	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> +	if (!mem_cgroup_online(memcg))
> 		return 0;
>
> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,
>
> if (mem_cgroup_disabled())		
> 	return NULL;

Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
check before calling shrink_slab_memcg() as below:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..2f03c61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
         unsigned long ret, freed = 0;
         struct shrinker *shrinker;

-       if (!mem_cgroup_is_root(memcg))
+       if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
                 return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

         if (!down_read_trylock(&shrinker_rwsem))
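
A NULL check at the top of shrink_slab_memcg() would avoid the crash too,
e.g. (an untested sketch, not a real patch):

	if (mem_cgroup_disabled() || !mem_cgroup_online(memcg))
		return 0;

but that would still skip the global shrinker_list walk entirely, so gating
the call site as above -- letting the disabled case fall through to the
plain shrinker walk -- looks like the right placement.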

>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a0301ed..bacda49 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>           if (!mem_cgroup_online(memcg))
>>>>                   return 0;
>>>>
>>>> +       if (!memcg->nodeinfo[nid])
>>>> +               return 0;
>>>> +
>>>>           if (!down_read_trylock(&shrinker_rwsem))
>>>>                   return 0;
>>>>
>>>>> [    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
>>>>> [    9.072036][    T1] Read of size 8 at addr 0000000000000dc8 by task
>>>>> swapper/0/1
>>>>> [    9.072036][    T1]
>>>>> [    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
>>>>> 20190711+ #10
>>>>> [    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>>> DL385
>>>>> Gen10, BIOS A40 01/25/2019
>>>>> [    9.072036][    T1] Call Trace:
>>>>> [    9.072036][    T1]  dump_stack+0x62/0x9a
>>>>> [    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
>>>>> [    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
>>>>> [    9.072036][    T1]  ? shrink_slab+0x111/0x440
>>>>> [    9.072036][    T1]  kasan_report+0xc/0xe
>>>>> [    9.072036][    T1]  __asan_load8+0x71/0xa0
>>>>> [    9.072036][    T1]  shrink_slab+0x111/0x440
>>>>> [    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>>>> [    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
>>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>>>> [    9.072036][    T1]  shrink_node+0x31e/0xa30
>>>>> [    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>>>> [    9.072036][    T1]  ? ktime_get+0x93/0x110
>>>>> [    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
>>>>> [    9.072036][    T1]  ? shrink_node+0xa30/0xa30
>>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
>>>>> [    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
>>>>> [    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>>>> [    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>>>> [    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>>> [    9.072036][    T1]  ? unwind_dump+0x260/0x260
>>>>> [    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
>>>>> [    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>>>> [    9.072036][    T1]  ? ret_from_fork+0x22/0x40
>>>>> [    9.072036][    T1]  alloc_page_interleave+0x18/0x130
>>>>> [    9.072036][    T1]  alloc_pages_current+0xf6/0x110
>>>>> [    9.072036][    T1]  allocate_slab+0x600/0x11f0
>>>>> [    9.072036][    T1]  new_slab+0x46/0x70
>>>>> [    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>>>> [    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>>>> [    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
>>>>> [    9.072036][    T1]  ? create_object+0x3a/0x3e0
>>>>> [    9.072036][    T1]  __slab_alloc+0x12/0x20
>>>>> [    9.072036][    T1]  ? __slab_alloc+0x12/0x20
>>>>> [    9.072036][    T1]  kmem_cache_alloc+0x32a/0x400
>>>>> [    9.072036][    T1]  create_object+0x3a/0x3e0
>>>>> [    9.072036][    T1]  kmemleak_alloc+0x71/0xa0
>>>>> [    9.072036][    T1]  kmem_cache_alloc+0x272/0x400
>>>>> [    9.072036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.072036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>>>> [    9.072036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>>>> [    9.072036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>>>> [    9.072036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>>>> [    9.072036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>>>> [    9.072036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>>>> [    9.072036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>>> [    9.072036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>>> [    9.072036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.072036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>>>> [    9.072036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>>> [    9.072036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>>> [    9.072036][    T1]  acpi_load_tables+0x61/0x80
>>>>> [    9.072036][    T1]  acpi_init+0x10d/0x44b
>>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.072036][    T1]  ? bus_uevent_filter+0x16/0x30
>>>>> [    9.072036][    T1]  ? kobject_uevent_env+0x109/0x980
>>>>> [    9.072036][    T1]  ? kernfs_get+0x13/0x20
>>>>> [    9.072036][    T1]  ? kobject_uevent+0xb/0x10
>>>>> [    9.072036][    T1]  ? kset_register+0x31/0x50
>>>>> [    9.072036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>>>> [    9.072036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.072036][    T1]  do_one_initcall+0xfe/0x45a
>>>>> [    9.072036][    T1]  ? initcall_blacklisted+0x150/0x150
>>>>> [    9.072036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>>>> [    9.072036][    T1]  ? kasan_check_write+0x14/0x20
>>>>> [    9.072036][    T1]  ? up_write+0x6b/0x190
>>>>> [    9.072036][    T1]  kernel_init_freeable+0x614/0x6a7
>>>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>>>> [    9.072036][    T1]  kernel_init+0x11/0x138
>>>>> [    9.072036][    T1]  ? rest_init+0x188/0x188
>>>>> [    9.072036][    T1]  ret_from_fork+0x22/0x40
>>>>> [    9.072036][    T1]
>>>>> ==================================================================
>>>>> [    9.072036][    T1] Disabling lock debugging due to kernel taint
>>>>> [    9.145712][    T1] BUG: kernel NULL pointer dereference, address:
>>>>> 0000000000000dc8
>>>>> [    9.152036][    T1] #PF: supervisor read access in kernel mode
>>>>> [    9.152036][    T1] #PF: error_code(0x0000) - not-present page
>>>>> [    9.152036][    T1] PGD 0 P4D 0
>>>>> [    9.152036][    T1] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>>>> [    9.152036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted:
>>>>> G    B             5.2.0-next-20190711+ #10
>>>>> [    9.152036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>>> DL385
>>>>> Gen10, BIOS A40 01/25/2019
>>>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>>> 00
>>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>>> <4f>
>>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>>> ffffffff8112f288
>>>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>>> ffffffff824e0440
>>>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>>> fffffbfff049c088
>>>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>>> 00000000000001b8
>>>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>>> ffff88905757f440
>>>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>>>> knlGS:0000000000000000
>>>>> [    9.152036][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [    9.152036][    T1] CR2: 0000000000000dc8 CR3: 0000001070212000 CR4:
>>>>> 00000000001406b0
>>>>> [    9.152036][    T1] Call Trace:
>>>>> [    9.152036][    T1]  ? mem_cgroup_iter+0x98/0x840
>>>>> [    9.152036][    T1]  ? unregister_shrinker+0x110/0x110
>>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.152036][    T1]  ? mem_cgroup_protected+0x39/0x260
>>>>> [    9.152036][    T1]  shrink_node+0x31e/0xa30
>>>>> [    9.152036][    T1]  ? shrink_node_memcg+0x1560/0x1560
>>>>> [    9.152036][    T1]  ? ktime_get+0x93/0x110
>>>>> [    9.152036][    T1]  do_try_to_free_pages+0x22f/0x820
>>>>> [    9.152036][    T1]  ? shrink_node+0xa30/0xa30
>>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.152036][    T1]  ? check_chain_key+0x1df/0x2e0
>>>>> [    9.152036][    T1]  try_to_free_pages+0x242/0x4d0
>>>>> [    9.152036][    T1]  ? do_try_to_free_pages+0x820/0x820
>>>>> [    9.152036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>>>> [    9.152036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>>>> [    9.152036][    T1]  ? unwind_dump+0x260/0x260
>>>>> [    9.152036][    T1]  ? kernel_text_address+0x33/0xc0
>>>>> [    9.152036][    T1]  ? arch_stack_walk+0x8f/0xf0
>>>>> [    9.152036][    T1]  ? ret_from_fork+0x22/0x40
>>>>> [    9.152036][    T1]  alloc_page_interleave+0x18/0x130
>>>>> [    9.152036][    T1]  alloc_pages_current+0xf6/0x110
>>>>> [    9.152036][    T1]  allocate_slab+0x600/0x11f0
>>>>> [    9.152036][    T1]  new_slab+0x46/0x70
>>>>> [    9.152036][    T1]  ___slab_alloc+0x5d4/0x9c0
>>>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>>>> [    9.152036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
>>>>> [    9.152036][    T1]  ? ___might_sleep+0xab/0xc0
>>>>> [    9.152036][    T1]  ? create_object+0x3a/0x3e0
>>>>> [    9.152036][    T1]  __slab_alloc+0x12/0x20
>>>>> [    9.152036][    T1]  ? __slab_alloc+0x12/0x20
>>>>> [    9.152036][    T1]  kmem_cache_alloc+0x32a/0x400
>>>>> [    9.152036][    T1]  create_object+0x3a/0x3e0
>>>>> [    9.152036][    T1]  kmemleak_alloc+0x71/0xa0
>>>>> [    9.152036][    T1]  kmem_cache_alloc+0x272/0x400
>>>>> [    9.152036][    T1]  ? kasan_check_read+0x11/0x20
>>>>> [    9.152036][    T1]  ? do_raw_spin_unlock+0xa8/0x140
>>>>> [    9.152036][    T1]  acpi_ps_alloc_op+0x76/0x122
>>>>> [    9.152036][    T1]  acpi_ds_execute_arguments+0x2f/0x18d
>>>>> [    9.152036][    T1]  acpi_ds_get_package_arguments+0x7d/0x84
>>>>> [    9.152036][    T1]  acpi_ns_init_one_package+0x33/0x61
>>>>> [    9.152036][    T1]  acpi_ns_init_one_object+0xfc/0x189
>>>>> [    9.152036][    T1]  acpi_ns_walk_namespace+0x114/0x1f2
>>>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>>> [    9.152036][    T1]  ? acpi_ns_init_one_package+0x61/0x61
>>>>> [    9.152036][    T1]  acpi_walk_namespace+0x9e/0xcb
>>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.152036][    T1]  acpi_ns_initialize_objects+0x99/0xed
>>>>> [    9.152036][    T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
>>>>> [    9.152036][    T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
>>>>> [    9.152036][    T1]  acpi_load_tables+0x61/0x80
>>>>> [    9.152036][    T1]  acpi_init+0x10d/0x44b
>>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.152036][    T1]  ? bus_uevent_filter+0x16/0x30
>>>>> [    9.152036][    T1]  ? kobject_uevent_env+0x109/0x980
>>>>> [    9.152036][    T1]  ? kernfs_get+0x13/0x20
>>>>> [    9.152036][    T1]  ? kobject_uevent+0xb/0x10
>>>>> [    9.152036][    T1]  ? kset_register+0x31/0x50
>>>>> [    9.152036][    T1]  ? kset_create_and_add+0x9f/0xd0
>>>>> [    9.152036][    T1]  ? acpi_sleep_proc_init+0x36/0x36
>>>>> [    9.152036][    T1]  do_one_initcall+0xfe/0x45a
>>>>> [    9.152036][    T1]  ? initcall_blacklisted+0x150/0x150
>>>>> [    9.152036][    T1]  ? rwsem_down_read_slowpath+0x930/0x930
>>>>> [    9.152036][    T1]  ? kasan_check_write+0x14/0x20
>>>>> [    9.152036][    T1]  ? up_write+0x6b/0x190
>>>>> [    9.152036][    T1]  kernel_init_freeable+0x614/0x6a7
>>>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>>>> [    9.152036][    T1]  kernel_init+0x11/0x138
>>>>> [    9.152036][    T1]  ? rest_init+0x188/0x188
>>>>> [    9.152036][    T1]  ret_from_fork+0x22/0x40
>>>>> [    9.152036][    T1] Modules linked in:
>>>>> [    9.152036][    T1] CR2: 0000000000000dc8
>>>>> [    9.152036][    T1] ---[ end trace 568acce4eca01945 ]---
>>>>> [    9.152036][    T1] RIP: 0010:shrink_slab+0x111/0x440
>>>>> [    9.152036][    T1] Code: c7 20 8d 44 82 e8 7f 8b e8 ff 85 c0 0f 84 e2 02
>>>>> 00
>>>>> 00 4c 63 a5 4c ff ff ff 49 81 c4 b8 01 00 00 4b 8d 7c e6 08 e8 3f 07 0e 00
>>>>> <4f>
>>>>> 8b 64 e6 08 49 8d bc 24 20 03 00 00 e8 2d 07 0e 00 49 8b 84 24
>>>>> [    9.152036][    T1] RSP: 0018:ffff88905757f100 EFLAGS: 00010282
>>>>> [    9.152036][    T1] RAX: 0000000000000000 RBX: ffff88905757f1b0 RCX:
>>>>> ffffffff8112f288
>>>>> [    9.152036][    T1] RDX: 1ffffffff049c088 RSI: dffffc0000000000 RDI:
>>>>> ffffffff824e0440
>>>>> [    9.152036][    T1] RBP: ffff88905757f1d8 R08: fffffbfff049c089 R09:
>>>>> fffffbfff049c088
>>>>> [    9.152036][    T1] R10: fffffbfff049c088 R11: ffffffff824e0443 R12:
>>>>> 00000000000001b8
>>>>> [    9.152036][    T1] R13: 0000000000000000 R14: 0000000000000000 R15:
>>>>> ffff88905757f440
>>>>> [    9.152036][    T1] FS:  0000000000000000(0000) GS:ffff889062800000(0000)
>>>>> knlGS:00000000
>>>>>


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-16  3:00             ` Yang Shi
@ 2019-07-16 23:36               ` Shakeel Butt
  2019-07-17  0:12                 ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Shakeel Butt @ 2019-07-16 23:36 UTC (permalink / raw)
  To: Yang Shi, Kirill Tkhai, Vladimir Davydov, Hugh Dickins,
	Michal Hocko, Johannes Weiner, Roman Gushchin
  Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML

Adding related people.

The thread starts at:
http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw

On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/15/19 6:36 PM, Qian Cai wrote:
> >
> >> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 7/15/19 2:23 PM, Qian Cai wrote:
> >>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>> Another possible lead is that, without reverting those commits below, the
> >>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
> >>>>>
> >>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> >>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
> >>>> think of where nodeinfo would be freed while the memcg was still online.
> >>>> Maybe a check is needed:
> >>> Actually, "memcg" is NULL.
> >> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.
> > Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
> >
> > -     if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> > +     if (!mem_cgroup_online(memcg))
> >               return 0;
> >
> > Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,
> >
> > if (mem_cgroup_disabled())
> >       return NULL;
>
> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
> check before calling shrink_slab_memcg() as below:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..2f03c61 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>          unsigned long ret, freed = 0;
>          struct shrinker *shrinker;
>
> -       if (!mem_cgroup_is_root(memcg))
> +       if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>                  return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>
>          if (!down_read_trylock(&shrinker_rwsem))
>

We were seeing unneeded oom-kills on kernels with
"cgroup_disable=memory", and Yang's patch series basically exposed the
bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
generalize shrink_slab() calls in shrink_node()") missed the case of
"cgroup_disable=memory". However, I am surprised that root_mem_cgroup
is allocated even with "cgroup_disable=memory"; it seems
css_alloc() is called even before checking whether the corresponding
controller is disabled.
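
For reference, mem_cgroup_is_root() is (roughly) just a pointer compare:

	return memcg == root_mem_cgroup;

so with mem_cgroup_iter() returning NULL it evaluates false, and the
per-memcg path is taken with a NULL memcg -- which is exactly the crash
above.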

Yang, can you please send the above change with your Signed-off-by and a CC
to stable as well?

thanks,
Shakeel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-16 23:36               ` Shakeel Butt
@ 2019-07-17  0:12                 ` Yang Shi
  2019-07-17 17:02                   ` Shakeel Butt
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-17  0:12 UTC (permalink / raw)
  To: Shakeel Butt, Kirill Tkhai, Vladimir Davydov, Hugh Dickins,
	Michal Hocko, Johannes Weiner, Roman Gushchin
  Cc: Qian Cai, Kirill A. Shutemov, Andrew Morton, Linux MM, LKML



On 7/16/19 4:36 PM, Shakeel Butt wrote:
> Adding related people.
>
> The thread starts at:
> http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
>
> On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>> On 7/15/19 6:36 PM, Qian Cai wrote:
>>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>>>>> Another possible lead is that, without reverting those commits below, the
>>>>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
>>>>>>>
>>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>>>>> think of where nodeinfo would be freed while the memcg was still online.
>>>>>> Maybe a check is needed:
>>>>> Actually, "memcg" is NULL.
>>>> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.
>>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
>>>
>>> -     if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
>>> +     if (!mem_cgroup_online(memcg))
>>>                return 0;
>>>
>>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,
>>>
>>> if (mem_cgroup_disabled())
>>>        return NULL;
>> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
>> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
>> check before calling shrink_slab_memcg() as below:
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..2f03c61 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>>           unsigned long ret, freed = 0;
>>           struct shrinker *shrinker;
>>
>> -       if (!mem_cgroup_is_root(memcg))
>> +       if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>>                   return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>
>>           if (!down_read_trylock(&shrinker_rwsem))
>>
> We were seeing unneeded oom-kills on kernels with
> "cgroup_disable=memory", and Yang's patch series basically exposed the
> bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
> generalize shrink_slab() calls in shrink_node()") missed the case of
> "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
> is allocated even with "cgroup_disable=memory"; it seems
> css_alloc() is called even before checking whether the corresponding
> controller is disabled.

I'm surprised too. A quick test with drgn shows root memcg is definitely 
allocated:

 >>> prog['root_mem_cgroup']
*(struct mem_cgroup *)0xffff8902cf058000 = {
[snip]

But, isn't this a bug?

Thanks,
Yang

>
> Yang, can you please send the above change with your Signed-off-by and a CC
> to stable as well?
>
> thanks,
> Shakeel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-17  0:12                 ` Yang Shi
@ 2019-07-17 17:02                   ` Shakeel Butt
  2019-07-17 17:09                     ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Shakeel Butt @ 2019-07-17 17:02 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov,
	Andrew Morton, Linux MM, LKML

On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>
>
> On 7/16/19 4:36 PM, Shakeel Butt wrote:
> > Adding related people.
> >
> > The thread starts at:
> > http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
> >
> > On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>
> >>
> >> On 7/15/19 6:36 PM, Qian Cai wrote:
> >>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/15/19 2:23 PM, Qian Cai wrote:
> >>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>>>> Another possible lead is that, without reverting those commits below, the
> >>>>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
> >>>>>>>
> >>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> >>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
> >>>>>> think of where nodeinfo would be freed while the memcg was still online.
> >>>>>> Maybe a check is needed:
> >>>>> Actually, "memcg" is NULL.
> >>>> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.
> >>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
> >>>
> >>> -     if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> >>> +     if (!mem_cgroup_online(memcg))
> >>>                return 0;
> >>>
> >>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,
> >>>
> >>> if (mem_cgroup_disabled())
> >>>        return NULL;
> >> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
> >> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
> >> check before calling shrink_slab_memcg() as below:
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index a0301ed..2f03c61 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> >>           unsigned long ret, freed = 0;
> >>           struct shrinker *shrinker;
> >>
> >> -       if (!mem_cgroup_is_root(memcg))
> >> +       if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> >>                   return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> >>
> >>           if (!down_read_trylock(&shrinker_rwsem))
> >>
> > We were seeing unneeded oom-kills on kernels with
> > "cgroup_disable=memory", and Yang's patch series basically exposed the
> > bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
> > generalize shrink_slab() calls in shrink_node()") missed the case of
> > "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
> > is allocated even with "cgroup_disable=memory"; it seems
> > css_alloc() is called even before checking whether the corresponding
> > controller is disabled.
>
> I'm surprised too. A quick test with drgn shows root memcg is definitely
> allocated:
>
>  >>> prog['root_mem_cgroup']
> *(struct mem_cgroup *)0xffff8902cf058000 = {
> [snip]
>
> But, isn't this a bug?

It can be treated as a bug since this is not expected, but we can discuss
and take care of it later. I think we need your patch urgently, as
memory reclaim and /proc/sys/vm/drop_caches are broken on a
"cgroup_disable=memory" kernel. So, please send your patch asap.

thanks,
Shakeel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-17 17:02                   ` Shakeel Butt
@ 2019-07-17 17:09                     ` Yang Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2019-07-17 17:09 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Kirill Tkhai, Vladimir Davydov, Hugh Dickins, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Qian Cai, Kirill A. Shutemov,
	Andrew Morton, Linux MM, LKML



On 7/17/19 10:02 AM, Shakeel Butt wrote:
> On Tue, Jul 16, 2019 at 5:12 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>> On 7/16/19 4:36 PM, Shakeel Butt wrote:
>>> Adding related people.
>>>
>>> The thread starts at:
>>> http://lkml.kernel.org/r/1562795006.8510.19.camel@lca.pw
>>>
>>> On Mon, Jul 15, 2019 at 8:01 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>
>>>> On 7/15/19 6:36 PM, Qian Cai wrote:
>>>>>> On Jul 15, 2019, at 8:22 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/15/19 2:23 PM, Qian Cai wrote:
>>>>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
>>>>>>>>> Another possible lead is that, without reverting those commits below, the
>>>>>>>>> kdump kernel would always crash as well in shrink_slab_memcg() at this line,
>>>>>>>>>
>>>>>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>>>>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>>>>>>> think of where nodeinfo would be freed while the memcg was still online.
>>>>>>>> Maybe a check is needed:
>>>>>>> Actually, "memcg" is NULL.
>>>>>> That sounds weird. shrink_slab() is called inside the mem_cgroup_iter() loop, which pins the memcg, so the memcg should not go away.
>>>>> Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(),
>>>>>
>>>>> -     if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
>>>>> +     if (!mem_cgroup_online(memcg))
>>>>>                 return 0;
>>>>>
>>>>> Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle a NULL memcg from mem_cgroup_iter() as,
>>>>>
>>>>> if (mem_cgroup_disabled())
>>>>>         return NULL;
>>>> Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
>>>> Thanks for figuring this out. I think we need to add a mem_cgroup_disabled()
>>>> check before calling shrink_slab_memcg() as below:
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a0301ed..2f03c61 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int
>>>> nid,
>>>>            unsigned long ret, freed = 0;
>>>>            struct shrinker *shrinker;
>>>>
>>>> -       if (!mem_cgroup_is_root(memcg))
>>>> +       if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>>>>                    return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>>>>
>>>>            if (!down_read_trylock(&shrinker_rwsem))
>>>>
>>> We were seeing unneeded oom-kills on kernels with
>>> "cgroup_disable=memory", and Yang's patch series basically exposes the
>>> bug as a crash. I think commit aeed1d325d42 ("mm/vmscan.c:
>>> generalize shrink_slab() calls in shrink_node()") missed the
>>> "cgroup_disable=memory" case. However, I am surprised that root_mem_cgroup
>>> is allocated even with "cgroup_disable=memory"; it seems
>>> css_alloc() is called even before checking whether the corresponding
>>> controller is disabled.
>> I'm surprised too. A quick test with drgn shows root memcg is definitely
>> allocated:
>>
>>   >>> prog['root_mem_cgroup']
>> *(struct mem_cgroup *)0xffff8902cf058000 = {
>> [snip]
>>
>> But, isn't this a bug?
> It can be treated as a bug, as this is not expected, but we can discuss
> and take care of it later. I think we need your patch urgently, as
> memory reclaim and /proc/sys/vm/drop_caches are broken on a
> "cgroup_disable=memory" kernel. So, please send your patch ASAP.

Sure. I'm going to post the patch soon.

>
> thanks,
> Shakeel



* Re: list corruption in deferred_split_scan()
  2019-07-12 19:12     ` Yang Shi
  2019-07-13  4:41       ` Yang Shi
  2019-07-15 21:23       ` Qian Cai
@ 2019-07-19  0:54       ` Qian Cai
  2019-07-19  0:59         ` Yang Shi
  2 siblings, 1 reply; 22+ messages in thread
From: Qian Cai @ 2019-07-19  0:54 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel



> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> 
> 
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>> 
>>> 
>>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>>> Could you please share more details about your test? How often did you
>>> run into this problem?
>> I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server.
>> Here is some more information.
>> 
>> # cat .config
>> 
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> 
> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.
> 
> According to the BUG call trace in the earlier email, it looks like deferred_split_scan() lost a race with put_compound_page(). put_compound_page() would call free_transhuge_page(), which deletes the page from the deferred split queue, but for some reason the page may still appear to be on the deferred list.
> 
> Would you please try the below patch?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>         if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>                 if (!list_empty(page_deferred_list(head))) {
>                         ds_queue->split_queue_len--;
> -                       list_del(page_deferred_list(head));
> +                       list_del_init(page_deferred_list(head));
>                 }
>                 if (mapping)
>                         __dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (!list_empty(page_deferred_list(page))) {
>                 ds_queue->split_queue_len--;
> -               list_del(page_deferred_list(page));
> +               list_del_init(page_deferred_list(page));
>         }
>         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>         free_compound_page(page);

Unfortunately, I am no longer able to reproduce the original list corruption with today’s linux-next.
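
For reference, the semantic difference the patch relies on, as I
understand it (a minimal sketch using the generic <linux/list.h> helpers,
not the actual kernel paths):

	LIST_HEAD(queue);
	struct list_head entry;

	list_add(&entry, &queue);
	list_del(&entry);	/* entry.next/prev now hold LIST_POISON1/2 */
	/* list_empty(&entry) is false here, and deleting the entry again
	 * trips the CONFIG_DEBUG_LIST BUG ("next is LIST_POISON1") */

	list_add(&entry, &queue);
	list_del_init(&entry);	/* entry now points back at itself */
	/* list_empty(&entry) is true, so the !list_empty() checks in
	 * free_transhuge_page() and split_huge_page_to_list() stay safe */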


* Re: list corruption in deferred_split_scan()
  2019-07-19  0:54       ` Qian Cai
@ 2019-07-19  0:59         ` Yang Shi
  2019-07-24 18:10           ` Qian Cai
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-19  0:59 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel



On 7/18/19 5:54 PM, Qian Cai wrote:
>
>> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/11/19 2:07 PM, Qian Cai wrote:
>>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>>> Hi Qian,
>>>>
>>>>
>>>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>>>> Could you please share more details about your test? How often did you
>>>> run into this problem?
>>> I can reproduce it almost every time on an HPE ProLiant DL385 Gen10 server.
>>> Here is some more information.
>>>
>>> # cat .config
>>>
>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.
>>
>> According to the BUG call trace in the earlier email, it looks like deferred_split_scan() lost a race with put_compound_page(). put_compound_page() would call free_transhuge_page(), which deletes the page from the deferred split queue, but for some reason the page may still appear to be on the deferred list.
>>
>> Would you please try the below patch?
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b7f709d..66bd9db 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>          if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>>                  if (!list_empty(page_deferred_list(head))) {
>>                          ds_queue->split_queue_len--;
>> -                       list_del(page_deferred_list(head));
>> +                       list_del_init(page_deferred_list(head));
>>                  }
>>                  if (mapping)
>>                          __dec_node_page_state(page, NR_SHMEM_THPS);
>> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>>          spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>          if (!list_empty(page_deferred_list(page))) {
>>                  ds_queue->split_queue_len--;
>> -               list_del(page_deferred_list(page));
>> +               list_del_init(page_deferred_list(page));
>>          }
>>          spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>          free_compound_page(page);
> Unfortunately, I am no longer able to reproduce the original list corruption with today’s linux-next.

I guess that is because the patches have been dropped from the -mm tree
by Andrew due to this problem. You have to use next-20190711, or apply
the patches on top of today's linux-next.




* Re: list corruption in deferred_split_scan()
  2019-07-19  0:59         ` Yang Shi
@ 2019-07-24 18:10           ` Qian Cai
  0 siblings, 0 replies; 22+ messages in thread
From: Qian Cai @ 2019-07-24 18:10 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM, linux-kernel

On Thu, 2019-07-18 at 17:59 -0700, Yang Shi wrote:
> 
> On 7/18/19 5:54 PM, Qian Cai wrote:
> > 
> > > On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> > > 
> > > 
> > > 
> > > On 7/11/19 2:07 PM, Qian Cai wrote:
> > > > On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
> > > > > Hi Qian,
> > > > > 
> > > > > 
> > > > > Thanks for reporting the issue. But, I can't reproduce it on my
> > > > > machine.
> > > > > Could you please share more details about your test? How often did you
> > > > > run into this problem?
> > > > 
> > > > I can reproduce it almost every time on an HPE ProLiant DL385 Gen10
> > > > server. Here is some more information.
> > > > 
> > > > # cat .config
> > > > 
> > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> > > 
> > > I tried your kernel config, but I still can't reproduce it. My compiler
> > > doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my
> > > test, but I don't think this would make any difference for this case.
> > > 
> > > According to the BUG call trace in the earlier email, it looks like
> > > deferred_split_scan() lost a race with put_compound_page().
> > > put_compound_page() would call free_transhuge_page(), which deletes the
> > > page from the deferred split queue, but for some reason the page may
> > > still appear to be on the deferred list.
> > > 
> > > Would you please try the below patch?
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index b7f709d..66bd9db 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page,
> > > struct list_head *list)
> > >          if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
> > >                  if (!list_empty(page_deferred_list(head))) {
> > >                          ds_queue->split_queue_len--;
> > > -                       list_del(page_deferred_list(head));
> > > +                       list_del_init(page_deferred_list(head));
> > >                  }
> > >                  if (mapping)
> > >                          __dec_node_page_state(page, NR_SHMEM_THPS);
> > > @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
> > >          spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > >          if (!list_empty(page_deferred_list(page))) {
> > >                  ds_queue->split_queue_len--;
> > > -               list_del(page_deferred_list(page));
> > > +               list_del_init(page_deferred_list(page));
> > >          }
> > >          spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> > >          free_compound_page(page);
> > 
> > Unfortunately, I am no longer able to reproduce the original list
> > corruption with today’s linux-next.
> 
> I guess that is because the patches have been dropped from the -mm tree
> by Andrew due to this problem. You have to use next-20190711, or apply
> the patches on top of today's linux-next.
> 

The patch you have here does not help. I applied only the
free_transhuge_page() part, as you requested.

[  375.006307][ T3580] list_del corruption. next->prev should be
ffffea0030e10098, but was ffff888ea8d0cdb8
[  375.015928][ T3580] ------------[ cut here ]------------
[  375.021296][ T3580] kernel BUG at lib/list_debug.c:56!
[  375.026491][ T3580] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[  375.033680][ T3580] CPU: 84 PID: 3580 Comm: oom01 Tainted:
G        W         5.2.0-next-20190711+ #2
[  375.042964][ T3580] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  375.052256][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6
[  375.058135][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7
c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f>
0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7
[  375.077722][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082
[  375.083684][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX:
ffffffffb015d728
[  375.091566][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff88903263d380
[  375.099448][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09:
ffffed12064c7a70
[  375.107330][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12:
ffffea0030e10098
[  375.115212][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15:
ffffea0031d40098
[  375.123095][ T3580] FS:  00007fc3dc851700(0000) GS:ffff889032600000(0000)
knlGS:0000000000000000
[  375.131937][ T3580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  375.138421][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4:
00000000001406a0
[  375.146301][ T3580] Call Trace:
[  375.149472][ T3580]  deferred_split_scan+0x337/0x740
[  375.154475][ T3580]  ? split_huge_page_to_list+0xe30/0xe30
[  375.160002][ T3580]  ? __sched_text_start+0x8/0x8
[  375.164743][ T3580]  ? __radix_tree_lookup+0x12d/0x1e0
[  375.169923][ T3580]  do_shrink_slab+0x244/0x5a0
[  375.174490][ T3580]  shrink_slab+0x253/0x440
[  375.178794][ T3580]  ? unregister_shrinker+0x110/0x110
[  375.183972][ T3580]  ? kasan_check_read+0x11/0x20
[  375.188715][ T3580]  ? mem_cgroup_protected+0x20f/0x260
[  375.193976][ T3580]  ? shrink_node+0x1ad/0xa30
[  375.198453][ T3580]  shrink_node+0x31e/0xa30
[  375.202755][ T3580]  ? shrink_node_memcg+0x1560/0x1560
[  375.207934][ T3580]  ? ktime_get+0x93/0x110
[  375.212147][ T3580]  do_try_to_free_pages+0x22f/0x820
[  375.217236][ T3580]  ? shrink_node+0xa30/0xa30
[  375.221711][ T3580]  ? kasan_check_read+0x11/0x20
[  375.226450][ T3580]  ? check_chain_key+0x1df/0x2e0
[  375.231277][ T3580]  try_to_free_pages+0x242/0x4d0
[  375.236102][ T3580]  ? do_try_to_free_pages+0x820/0x820
[  375.241370][ T3580]  __alloc_pages_nodemask+0x9ce/0x1bc0
[  375.246721][ T3580]  ? kasan_check_read+0x11/0x20
[  375.251459][ T3580]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  375.256722][ T3580]  ? kasan_check_read+0x11/0x20
[  375.261458][ T3580]  ? check_chain_key+0x1df/0x2e0
[  375.266287][ T3580]  ? do_anonymous_page+0x343/0xe30
[  375.271289][ T3580]  ? lock_downgrade+0x390/0x390
[  375.276029][ T3580]  ? __count_memcg_events+0x8b/0x1c0
[  375.281204][ T3580]  ? kasan_check_read+0x11/0x20
[  375.285945][ T3580]  ? __lru_cache_add+0x122/0x160
[  375.290774][ T3580]  alloc_pages_vma+0x89/0x2c0
[  375.295339][ T3580]  do_anonymous_page+0x3e1/0xe30
[  375.300168][ T3580]  ? __update_load_avg_cfs_rq+0x2c/0x490
[  375.305692][ T3580]  ? finish_fault+0x120/0x120
[  375.310257][ T3580]  ? alloc_pages_vma+0x21e/0x2c0
[  375.315085][ T3580]  handle_pte_fault+0x457/0x12c0
[  375.319912][ T3580]  __handle_mm_fault+0x79a/0xa50
[  375.324738][ T3580]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[  375.330175][ T3580]  ? kasan_check_read+0x11/0x20
[  375.334913][ T3580]  ? __count_memcg_events+0x8b/0x1c0
[  375.340090][ T3580]  handle_mm_fault+0x17f/0x370
[  375.344745][ T3580]  __do_page_fault+0x25b/0x5d0
[  375.349398][ T3580]  do_page_fault+0x4c/0x2cf
[  375.353793][ T3580]  ? page_fault+0x5/0x20
[  375.357920][ T3580]  page_fault+0x1b/0x20
[  375.361959][ T3580] RIP: 0033:0x410be0
[  375.365737][ T3580] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[  375.385323][ T3580] RSP: 002b:00007fc3dc850ec0 EFLAGS: 00010206
[  375.391283][ T3580] RAX: 0000000000001000 RBX: 00000000c0000000 RCX:
00007fda6c168497
[  375.399164][ T3580] RDX: 00000000041e9000 RSI: 00000000c0000000 RDI:
0000000000000000
[  375.407047][ T3580] RBP: 00007fc25b850000 R08: 00000000ffffffff R09:
0000000000000000
[  375.414928][ T3580] R10: 0000000000000022 R11: 0000000000000246 R12:
0000000000000001
[  375.422812][ T3580] R13: 00007ffc4a58701f R14: 0000000000000000 R15:
00007fc3dc850fc0
[  375.430694][ T3580] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat
kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars ip_tables
x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 firmware_class
libphy dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  375.455820][ T3580] ---[ end trace 82d52f9627313e53 ]---
[  375.461172][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6
[  375.467048][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7
c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f>
0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7
[  375.486635][ T3580] RSP: 0018:ffff888ebc4b73c0 EFLAGS: 00010082
[  375.492597][ T3580] RAX: 0000000000000054 RBX: ffffea0030e10098 RCX:
ffffffffb015d728
[  375.500479][ T3580] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff88903263d380
[  375.508361][ T3580] RBP: ffff888ebc4b73d8 R08: ffffed12064c7a71 R09:
ffffed12064c7a70
[  375.516244][ T3580] R10: ffffed12064c7a70 R11: ffff88903263d387 R12:
ffffea0030e10098
[  375.524124][ T3580] R13: ffffea0031d40098 R14: ffffea0030e10034 R15:
ffffea0031d40098
[  375.532007][ T3580] FS:  00007fc3dc851700(0000) GS:ffff889032600000(0000)
knlGS:0000000000000000
[  375.540851][ T3580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  375.547335][ T3580] CR2: 00007fc25fa39000 CR3: 0000000884762000 CR4:
00000000001406a0
[  375.555217][ T3580] Kernel panic - not syncing: Fatal exception
[  376.868640][ T3580] Shutting down cpus with NMI
[  376.873223][ T3580] Kernel Offset: 0x2ec00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  376.884878][ T3580] ---[ end Kernel panic - not syncing: Fatal exception ]---



* Re: list corruption in deferred_split_scan()
  2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
                   ` (2 preceding siblings ...)
  2019-07-15  4:52 ` Yang Shi
@ 2019-07-24 21:13 ` Qian Cai
  2019-07-25 21:46   ` Yang Shi
  3 siblings, 1 reply; 22+ messages in thread
From: Qian Cai @ 2019-07-24 21:13 UTC (permalink / raw)
  To: Yang Shi; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel

On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
> Running the LTP oom01 test case with swap triggers the crash below. Reverting
> the series "Make deferred split shrinker memcg aware" [1] seems to fix the
> issue.

You might want to look harder at this commit, as reverting it alone on top of
5.2.0-next-20190711 fixed the issue.

aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]

[1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
linux.alibaba.com/

Here is all the console output from running LTP oom01 before the crash, which
might be useful.

[  656.302886][ T3384] WARNING: CPU: 79 PID: 3384 at mm/page_alloc.c:4608
__alloc_pages_nodemask+0x1a8a/0x1bc0
[  656.304395][ T3409] kmemleak: Cannot allocate a kmemleak_object structure
[  656.312714][ T3384] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat
kvm_amd kvm ses enclosure dax_pmem irqbypass dax_pmem_core efivars ip_tables
x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 libphy
firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  656.320916][ T3409] kmemleak: Kernel memory leak detector disabled
[  656.344509][ T3384] CPU: 79 PID: 3384 Comm: oom01 Not tainted 5.2.0-next-
20190711+ #3
[  656.344523][ T3384] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  656.352100][  T829] kmemleak: Automatic memory scanning thread ended
[  656.358648][ T3384] RIP: 0010:__alloc_pages_nodemask+0x1a8a/0x1bc0
[  656.358658][ T3384] Code: 00 85 d2 0f 85 a1 00 00 00 48 c7 c7 e0 29 c3 a3 e8
3b 98 62 00 65 48 8b 1c 25 80 ee 01 00 e9 85 fa ff ff 0f 0b e9 3e fb ff ff <0f>
0b 48 8b b5 00 ff ff ff 8b 8d 84 fe ff ff 48 c7 c2 00 1d 6c a3
[  656.358675][ T3384] RSP: 0000:ffff888efa4a6210 EFLAGS: 00010046
[  656.406140][ T3384] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffffa2b28be2
[  656.414033][ T3384] RDX: 0000000000000000 RSI: dffffc0000000000 RDI:
ffffffffa4d15d60
[  656.421926][ T3384] RBP: ffff888efa4a6420 R08: fffffbfff49a2bad R09:
fffffbfff49a2bac
[  656.429818][ T3384] R10: fffffbfff49a2bac R11: 0000000000000003 R12:
ffffffffa4d15d60
[  656.437711][ T3384] R13: 0000000000000000 R14: 0000000000000800 R15:
0000000000000000
[  656.445605][ T3384] FS:  00007ff44adfc700(0000) GS:ffff889032f80000(0000)
knlGS:0000000000000000
[  656.454459][ T3384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  656.460952][ T3384] CR2: 00007ff2f05e1000 CR3: 0000001012e44000 CR4:
00000000001406a0
[  656.468843][ T3384] Call Trace:
[  656.472026][ T3384]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  656.477303][ T3384]  ? stack_depot_save+0x215/0x58b
[  656.482228][ T3384]  ? lock_downgrade+0x390/0x390
[  656.486976][ T3384]  ? stack_depot_save+0x183/0x58b
[  656.491900][ T3384]  ? kasan_check_read+0x11/0x20
[  656.496647][ T3384]  ? do_raw_spin_unlock+0xa8/0x140
[  656.501658][ T3384]  ? stack_depot_save+0x215/0x58b
[  656.506582][ T3384]  alloc_pages_current+0x9c/0x110
[  656.511505][ T3384]  allocate_slab+0x351/0x11f0
[  656.516077][ T3384]  ? kasan_slab_alloc+0x11/0x20
[  656.520824][ T3384]  new_slab+0x46/0x70
[  656.524702][ T3384]  ? pageout.isra.4+0x3e5/0xa00
[  656.529449][ T3384]  ___slab_alloc+0x5d4/0x9c0
[  656.533933][ T3384]  ? try_to_free_pages+0x242/0x4d0
[  656.538941][ T3384]  ? __alloc_pages_nodemask+0x9ce/0x1bc0
[  656.544476][ T3384]  ? alloc_pages_vma+0x89/0x2c0
[  656.549226][ T3384]  ? __do_page_fault+0x25b/0x5d0
[  656.554064][ T3384]  ? create_object+0x3a/0x3e0
[  656.558637][ T3384]  ? init_object+0x7e/0x90
[  656.562947][ T3384]  ? create_object+0x3a/0x3e0
[  656.567520][ T3384]  __slab_alloc+0x12/0x20
[  656.571742][ T3384]  ? __slab_alloc+0x12/0x20
[  656.576142][ T3384]  kmem_cache_alloc+0x32a/0x400
[  656.580890][ T3384]  create_object+0x3a/0x3e0
[  656.585291][ T3384]  ? stack_depot_save+0x183/0x58b
[  656.590215][ T3384]  kmemleak_alloc+0x71/0xa0
[  656.594611][ T3384]  kmem_cache_alloc+0x272/0x400
[  656.599361][ T3384]  ? ___might_sleep+0xab/0xc0
[  656.603934][ T3384]  ? mempool_free+0x170/0x170
[  656.608507][ T3384]  mempool_alloc_slab+0x2d/0x40
[  656.613254][ T3384]  mempool_alloc+0x10a/0x29e
[  656.617739][ T3384]  ? alloc_pages_vma+0x89/0x2c0
[  656.622485][ T3384]  ? mempool_resize+0x390/0x390
[  656.627233][ T3384]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  656.633730][ T3384]  bio_alloc_bioset+0x150/0x330
[  656.638477][ T3384]  ? bvec_alloc+0x1b0/0x1b0
[  656.642892][ T3384]  alloc_io+0x2f/0x230 [dm_mod]
[  656.647654][ T3384]  __split_and_process_bio+0x99/0x630 [dm_mod]
[  656.653714][ T3384]  ? blk_rq_map_sg+0x9f0/0x9f0
[  656.658388][ T3384]  ? __send_empty_flush.constprop.11+0x1f0/0x1f0 [dm_mod]
[  656.665407][ T3384]  ? check_chain_key+0x1df/0x2e0
[  656.670244][ T3384]  ? kasan_check_read+0x11/0x20
[  656.674992][ T3384]  ? blk_queue_split+0x60/0x90
[  656.679654][ T3384]  ? __blk_queue_split+0x970/0x970
[  656.684679][ T3384]  dm_process_bio+0x33f/0x520 [dm_mod]
[  656.690054][ T3384]  ? __process_bio+0x230/0x230 [dm_mod]
[  656.695515][ T3384]  dm_make_request+0xbd/0x150 [dm_mod]
[  656.700888][ T3384]  ? dm_wq_work+0x1b0/0x1b0 [dm_mod]
[  656.706073][ T3384]  ? lock_downgrade+0x390/0x390
[  656.710821][ T3384]  generic_make_request+0x179/0x4a0
[  656.715917][ T3384]  ? blk_queue_exit+0xc0/0xc0
[  656.720489][ T3384]  ? __unlock_page_memcg+0x4f/0x90
[  656.725495][ T3384]  ? unlock_page_memcg+0x1f/0x30
[  656.730329][ T3384]  submit_bio+0xaa/0x270
[  656.734466][ T3384]  ? generic_make_request+0x4a0/0x4a0
[  656.739739][ T3384]  __swap_writepage+0x8f5/0xba0
[  656.744484][ T3384]  ? __x64_sys_madvise.cold.0+0x22/0x22
[  656.749931][ T3384]  ? generic_swapfile_activate+0x2a0/0x2a0
[  656.755638][ T3384]  ? do_raw_spin_lock+0x118/0x1d0
[  656.760559][ T3384]  ? rwlock_bug.part.0+0x60/0x60
[  656.765393][ T3384]  ? page_swapcount+0x68/0xc0
[  656.769967][ T3384]  ? kasan_check_read+0x11/0x20
[  656.774713][ T3384]  ? do_raw_spin_unlock+0xa8/0x140
[  656.779724][ T3384]  ? __frontswap_store+0x103/0x2b0
[  656.784735][ T3384]  swap_writepage+0x65/0xb0
[  656.789134][ T3384]  pageout.isra.4+0x3e5/0xa00
[  656.793707][ T3384]  ? shrink_slab+0x440/0x440
[  656.798192][ T3384]  ? kasan_check_read+0x11/0x20
[  656.802939][ T3384]  shrink_page_list+0x159f/0x2650
[  656.807860][ T3384]  ? page_evictable+0x150/0x150
[  656.812606][ T3384]  ? kasan_check_read+0x11/0x20
[  656.817352][ T3384]  ? check_chain_key+0x1df/0x2e0
[  656.822185][ T3384]  ? shrink_inactive_list+0x2ea/0x770
[  656.827456][ T3384]  ? lock_downgrade+0x390/0x390
[  656.832202][ T3384]  ? do_raw_spin_lock+0x118/0x1d0
[  656.837126][ T3384]  ? rwlock_bug.part.0+0x60/0x60
[  656.841959][ T3384]  ? kasan_check_read+0x11/0x20
[  656.846706][ T3384]  ? do_raw_spin_unlock+0xa8/0x140
[  656.851715][ T3384]  shrink_inactive_list+0x373/0x770
[  656.856812][ T3384]  ? move_pages_to_lru+0xb60/0xb60
[  656.861820][ T3384]  ? shrink_node_memcg+0xcfa/0x1560
[  656.866917][ T3384]  ? lock_downgrade+0x390/0x390
[  656.871665][ T3384]  ? find_next_bit+0x2c/0xa0
[  656.876151][ T3384]  shrink_node_memcg+0x4ff/0x1560
[  656.881075][ T3384]  ? shrink_active_list+0xa10/0xa10
[  656.886173][ T3384]  ? dev_ifsioc+0xb0/0x4d0
[  656.890485][ T3384]  ? mem_cgroup_iter+0x18e/0x840
[  656.895319][ T3384]  ? kasan_check_read+0x11/0x20
[  656.900066][ T3384]  ? mem_cgroup_protected+0x20f/0x260
[  656.905334][ T3384]  shrink_node+0x1d3/0xa30
[  656.909644][ T3384]  ? shrink_node_memcg+0x1560/0x1560
[  656.914828][ T3384]  ? ktime_get+0x93/0x110
[  656.919050][ T3384]  do_try_to_free_pages+0x22f/0x820
[  656.924146][ T3384]  ? shrink_node+0xa30/0xa30
[  656.928632][ T3384]  ? kasan_check_read+0x11/0x20
[  656.933379][ T3384]  ? check_chain_key+0x1df/0x2e0
[  656.938212][ T3384]  try_to_free_pages+0x242/0x4d0
[  656.943046][ T3384]  ? do_try_to_free_pages+0x820/0x820
[  656.948318][ T3384]  __alloc_pages_nodemask+0x9ce/0x1bc0
[  656.953677][ T3384]  ? kasan_check_read+0x11/0x20
[  656.958424][ T3384]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  656.963697][ T3384]  ? kasan_check_read+0x11/0x20
[  656.968443][ T3384]  ? check_chain_key+0x1df/0x2e0
[  656.973277][ T3384]  ? do_anonymous_page+0x343/0xe30
[  656.978288][ T3384]  ? lock_downgrade+0x390/0x390
[  656.983035][ T3384]  ? __count_memcg_events+0x8b/0x1c0
[  656.988218][ T3384]  ? kasan_check_read+0x11/0x20
[  656.992966][ T3384]  ? __lru_cache_add+0x122/0x160
[  656.997802][ T3384]  alloc_pages_vma+0x89/0x2c0
[  657.002375][ T3384]  do_anonymous_page+0x3e1/0xe30
[  657.007211][ T3384]  ? __update_load_avg_cfs_rq+0x2c/0x490
[  657.012743][ T3384]  ? finish_fault+0x120/0x120
[  657.017314][ T3384]  ? alloc_pages_vma+0x21e/0x2c0
[  657.022148][ T3384]  handle_pte_fault+0x457/0x12c0
[  657.026984][ T3384]  __handle_mm_fault+0x79a/0xa50
[  657.031819][ T3384]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[  657.037267][ T3384]  ? kasan_check_read+0x11/0x20
[  657.042013][ T3384]  ? __count_memcg_events+0x8b/0x1c0
[  657.047199][ T3384]  handle_mm_fault+0x17f/0x370
[  657.051863][ T3384]  __do_page_fault+0x25b/0x5d0
[  657.056521][ T3384]  do_page_fault+0x4c/0x2cf
[  657.060922][ T3384]  ? page_[  659.105948][ T3124] kworker/2:1H: page
allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0,4
[  659.106045][ T1598] kworker/10:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4
[  659.118049][ T3124] CPU: 2 PID: 3124 Comm: kworker/2:1H Tainted:
G        W         5.2.0-next-20190711+ #3
[  659.137325][  T762] ODEBUG: Out of memory. ODEBUG disabled
[  659.140015][ T3124] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  659.140032][ T3124] Workqueue: kblockd blk_mq_run_work_fn
[  659.160266][ T3124] Call Trace:
[  659.163442][ T3124]  dump_stack+0x62/0x9a
[  659.167487][ T3124]  warn_alloc.cold.45+0x8a/0x12a
[  659.172315][ T3124]  ? zone_watermark_ok_safe+0x1a0/0x1a0
[  659.177756][ T3124]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  659.184252][ T3124]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.190658][ T3124]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.197060][ T3124]  ? __isolate_free_page+0x390/0x390
[  659.202239][ T3124]  __alloc_pages_nodemask+0x1aab/0x1bc0
[  659.207680][ T3124]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  659.212949][ T3124]  ? stack_trace_save+0x87/0xb0
[  659.217689][ T3124]  ? freezing_slow_path.cold.1+0x35/0x35
[  659.223219][ T3124]  ? __kasan_kmalloc.part.0+0x81/0xc0
[  659.228485][ T3124]  ? __kasan_kmalloc.part.0+0x44/0xc0
[  659.233750][ T3124]  ? __kasan_kmalloc.constprop.1+0xac/0xc0
[  659.239451][ T3124]  ? kasan_slab_alloc+0x11/0x20
[  659.244196][ T3124]  ? kmem_cache_alloc+0x17a/0x400
[  659.249113][ T3124]  ? alloc_iova+0x33/0x210
[  659.253418][ T3124]  ? alloc_iova_fast+0x47/0xba
[  659.258073][ T3124]  ? dma_ops_alloc_iova.isra.5+0x86/0xa0
[  659.263603][ T3124]  ? map_sg+0x99/0x2f0
[  659.267558][ T3124]  ? scsi_dma_map+0xc6/0x160
[  659.272042][ T3124]  ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  659.280020][ T3124]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.286421][ T3124]  ? scsi_queue_rq+0x7c6/0x1280
[  659.291163][ T3124]  ? ftrace_graph_ret_addr+0x2a/0xb0
[  659.296340][ T3124]  ? stack_trace_save+0x87/0xb0
[  659.301081][ T3124]  alloc_pages_current+0x9c/0x110
[  659.305998][ T3124]  allocate_slab+0x351/0x11f0
[  659.310564][ T3124]  new_slab+0x46/0x70
[  659.314433][ T3124]  ___slab_alloc+0x5d4/0x9c0
[  659.318913][ T3124]  ? should_fail+0x107/0x3bc
[  659.323393][ T3124]  ? alloc_iova+0x33/0x210
[  659.327700][ T3124]  ? lock_downgrade+0x390/0x390
[  659.332441][ T3124]  ? lock_downgrade+0x390/0x390
[  659.337183][ T3124]  ? alloc_iova+0x33/0x210
[  659.341487][ T3124]  __slab_alloc+0x12/0x20
[  659.345704][ T3124]  ? __slab_alloc+0x12/0x20
[  659.350096][ T3124]  kmem_cache_alloc+0x32a/0x400
[  659.354838][ T3124]  ? kasan_check_read+0x11/0x20
[  659.359580][ T3124]  ? do_raw_spin_unlock+0xa8/0x140
[  659.364585][ T3124]  alloc_iova+0x33/0x210
[  659.368714][ T3124]  ? iova_rcache_get+0x1a1/0x300
[  659.373545][ T3124]  alloc_iova_fast+0x47/0xba
[  659.378026][ T3124]  dma_ops_alloc_iova.isra.5+0x86/0xa0
[  659.383381][ T3124]  map_sg+0x99/0x2f0
[  659.387161][ T3124]  scsi_dma_map+0xc6/0x160
[  659.391470][ T3124]  pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  659.399274][ T3124]  ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[  659.405507][ T3124]  pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.411733][ T3124]  ? scsi_init_io+0x102/0x150
[  659.416306][ T3124]  ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[  659.422713][ T3124]  ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[  659.428593][ T3124]  ? sd_init_command+0x88b/0x930 [sd_mod]
[  659.434211][ T3124]  ? blk_add_timer+0xd7/0x110
[  659.438780][ T3124]  scsi_queue_rq+0x7c6/0x1280
[  659.443350][ T3124]  blk_mq_dispatch_rq_list+0x9d3/0xba0
[  659.448702][ T3124]  ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[  659.454145][ T3124]  ? blk_mq_get_driver_tag+0x290/0x290
[  659.459498][ T3124]  ? __lock_acquire.isra.13+0x430/0x830
[  659.464938][ T3124]  blk_mq_sched_dispatch_requests+0x2f4/0x300
[  659.470903][ T3124]  ? blk_mq_sched_restart+0x60/0x60
[  659.475993][ T3124]  __blk_mq_run_hw_queue+0x156/0x230
[  659.481172][ T3124]  ? hctx_lock+0xc0/0xc0
[  659.485301][ T3124]  ? process_one_work+0x426/0xa70
[  659.490217][ T3124]  blk_mq_run_work_fn+0x3b/0x40
[  659.494959][ T3124]  process_one_work+0x53b/0xa70
[  659.499703][ T3124]  ? pwq_dec_nr_in_flight+0x170/0x170
[  659.504967][ T3124]  worker_thread+0x63/0x5b0
[  659.509361][ T3124]  kthread+0x1df/0x200
[  659.513316][ T3124]  ? process_one_work+0xa70/0xa70
[  659.518231][ T3124]  ? kthread_park+0xc0/0xc0
[  659.522625][ T3124]  ret_from_fork+0x22/0x40
[  659.526937][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted:
G        W         5.2.0-next-20190711+ #3
[  659.526991][ T3124] Mem-Info:
[  659.536921][ T1598] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  659.536934][ T1598] Workqueue: kblockd blk_mq_run_work_fn
[  659.540067][ T3124] active_anon:4662210 inactive_anon:359358
isolated_anon:2005
[  659.540067][ T3124]  active_file:10032 inactive_file:12947 isolated_file:0
[  659.540067][ T3124]  unevictable:0 dirty:12 writeback:0 unstable:0
[  659.540067][ T3124]  slab_reclaimable:71207 slab_unreclaimable:1252996
[  659.540067][ T3124]  mapped:17530 shmem:1850 pagetables:11491 bounce:0
[  659.540067][ T3124]  free:54096 free_pcp:5994 free_cma:84
[  659.549192][ T1598] Call Trace:
[  659.549203][ T1598]  dump_stack+0x62/0x9a
[  659.554639][ T3124] Node 0 active_anon:2246440kB inactive_anon:572540kB
active_file:19500kB inactive_file:19016kB unevictable:0kB isolated(anon):7708kB
isolated(file):0kB mapped:24840kB dirty:8kB writeback:0kB shmem:1372kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1689600kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[  659.593619][ T1598]  warn_alloc.cold.45+0x8a/0x12a
[  659.596785][ T3124] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.600821][ T1598]  ? zone_watermark_ok_safe+0x1a0/0x1a0
[  659.630195][ T3124] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.635021][ T1598]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  659.661328][ T3124] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.661337][ T3124] Node 4 active_anon:16402112kB inactive_anon:865180kB
active_file:20600kB inactive_file:32712kB unevictable:0kB isolated(anon):304kB
isolated(file):0kB mapped:45216kB dirty:40kB writeback:12kB shmem:6028kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[  659.666778][ T1598]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.693086][ T3124] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.693096][ T3124] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.699583][ T1598]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.725894][ T3124] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  659.755524][ T1598]  ? __isolate_free_page+0x390/0x390
[  659.761953][ T3124] Node 0 DMA free:15908kB min:24kB low:36kB high:48kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB present:15996kB managed:15908kB mlocked:0kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
[  659.788234][ T1598]  __alloc_pages_nodemask+0x1aab/0x1bc0
[  659.814544][ T3124] lowmem_reserve[]: 0 1532 19982 19982 19982
[  659.820945][ T1598]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  659.847287][ T3124] Node 0 DMA32 free:73504kB min:2676kB low:4244kB
high:5812kB active_anon:1190128kB inactive_anon:362496kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:1923080kB
managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
free_pcp:1432kB local_pcp:0kB free_cma:0kB
[  659.852428][ T1598]  ? stack_trace_save+0x87/0xb0
[  659.852435][ T1598]  ? freezing_slow_path.cold.1+0x35/0x35
[  659.879003][ T3124] lowmem_reserve[]: 0 0 18450 18450 18450
[  659.884446][ T1598]  ? __kasan_kmalloc.part.0+0x81/0xc0
[  659.890346][ T3124] Node 0 Normal free:47760kB min:137264kB low:156156kB
high:175048kB active_anon:1056208kB inactive_anon:209672kB active_file:19456kB
inactive_file:18996kB unevictable:0kB writepending:0kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB
bounce:0kB free_pcp:9340kB local_pcp:164kB free_cma:0kB
[  659.895574][ T1598]  ? __kasan_kmalloc.part.0+0x44/0xc0
[  659.895581][ T1598]  ? __kasan_kmalloc.constprop.1+0xac/0xc0
[  659.924420][ T3124] lowmem_reserve[]: 0 0 0 0 0
[  659.929163][ T1598]  ? kasan_slab_alloc+0x11/0x20
[  659.929170][ T1598]  ? kmem_cache_alloc+0x17a/0x400
[  659.934724][ T3124] Node 4 Normal free:72728kB min:234904kB low:267232kB
high:299560kB active_anon:16401776kB inactive_anon:865580kB active_file:20596kB
inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:12956kB local_pcp:24kB free_cma:336kB
[  659.940301][ T1598]  ? alloc_iova+0x33/0x210
[  659.940307][ T1598]  ? alloc_iova_fast+0x47/0xba
[  659.945563][ T3124] lowmem_reserve[]: 0 0 0 0 0
[  659.976773][ T1598]  ? dma_ops_alloc_iova.isra.5+0x86/0xa0
[  659.976780][ T1598]  ? map_sg+0x99/0x2f0
[  659.982039][ T3124] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U)
1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[  659.987736][ T1598]  ? scsi_dma_map+0xc6/0x160
[  659.987747][ T1598]  ? pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  659.992300][ T3124] Node 0 DMA32: 0*4kB 0*8kB 2*16kB (M) 5*32kB (UM) 17*64kB
(UM) 8*128kB (UM) 12*256kB (UM) 11*512kB (UM) 10*1024kB (UM) 2*2048kB (UM)
12*4096kB (M) = 74496kB
[  659.997045][ T1598]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  659.997051][ T1598]  ? scsi_queue_rq+0x7c6/0x1280
[  660.001958][ T3124] Node 0 Normal: 0*4kB 0*8kB 198*16kB (MEH) 356*32kB (ME)
83*64kB (UME) 15*128kB (UME) 101*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB =
47648kB
[  660.033521][ T1598]  ? ftrace_graph_ret_addr+0x2a/0xb0
[  660.033528][ T1598]  ? stack_trace_save+0x87/0xb0
[  660.037828][ T3124] Node 4 Normal: 0*4kB 0*8kB 211*16kB (UME) 441*32kB (UME)
449*64kB (UME) 71*128kB (ME) 62*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB =
71184kB
[  660.042481][ T1598]  alloc_pages_current+0x9c/0x110
[  660.047042][ T3124] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[  660.052569][ T1598]  allocate_slab+0x351/0x11f0
[  660.056516][ T3124] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[  660.056521][ T3124] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[  660.070694][ T1598]  new_slab+0x46/0x70
[  660.075169][ T3124] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[  660.083141][ T1598]  ___slab_alloc+0x5d4/0x9c0
[  660.098879][ T3124] 26058 total pagecache pages
[  660.098894][ T3124] 1298 pages in swap cache
[  660.105279][ T1598]  ? should_fail+0x107/0x3bc
[  660.105285][ T1598]  ? alloc_iova+0x33/0x210
[  660.110020][ T3124] Swap cache stats: add 2607, delete 1311, find 0/1
[  660.110024][ T3124] Free swap  = 32919548kB
[  660.124719][ T1598]  ? lock_downgrade+0x390/0x390
[  660.124725][ T1598]  ? lock_downgrade+0x390/0x390
[  660.129894][ T3124] Total swap = 32952316kB
[  660.129899][ T3124] 15685025 pages RAM
[  660.134637][ T1598]  ? alloc_iova+0x33/0x210
[  660.149328][ T3124] 0 pages HighMem/MovableOnly
[  660.149332][ T3124] 2465994 pages reserved
[  660.154245][ T1598]  __slab_alloc+0x12/0x20
[  660.154252][ T1598]  ? __slab_alloc+0x12/0x20
[  660.163701][ T3124] 16384 pages cma reserved
[  660.163763][ T3124] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[  660.168269][ T1598]  kmem_cache_alloc+0x32a/0x400
[  660.168276][ T1598]  ? kasan_check_read+0x11/0x20
[  660.177465][ T3124]   cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[  660.177470][ T3124]   node 0: slabs: 10580, objs: 95220, free: 0
[  660.186924][ T1598]  ? do_raw_spin_unlock+0xa8/0x140
[  660.186930][ T1598]  alloc_iova+0x33/0x210
[  660.190792][ T3124]   node 4: slabs: 2292, objs: 20628, free: 25
[  660.199982][ T1598]  ? iova_rcache_get+0x1a1/0x300
[  660.199989][ T1598]  alloc_iova_fast+0x47/0xba
[  660.204513][ T3124] kworker/2:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4
[  660.209026][ T1598]  dma_ops_alloc_iova.isra.5+0x86/0xa0
[  660.351109][ T1598]  map_sg+0x99/0x2f0
[  660.354891][ T1598]  ? __debug_object_init+0x412/0x7a0
[  660.360070][ T1598]  scsi_dma_map+0xc6/0x160
[  660.364381][ T1598]  pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  660.372184][ T1598]  ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[  660.378415][ T1598]  pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  660.384644][ T1598]  ? scsi_init_io+0x102/0x150
[  660.389217][ T1598]  ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[  660.395622][ T1598]  ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[  660.401503][ T1598]  ? sd_init_command+0x88b/0x930 [sd_mod]
[  660.407119][ T1598]  ? blk_add_timer+0xd7/0x110
[  660.411686][ T1598]  scsi_queue_rq+0x7c6/0x1280
[  660.416252][ T1598]  blk_mq_dispatch_rq_list+0x9d3/0xba0
[  660.421604][ T1598]  ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[  660.427045][ T1598]  ? blk_mq_get_driver_tag+0x290/0x290
[  660.432396][ T1598]  ?
__lock_acquire.isra.13+0xT3124]  __blk_mq_run_hw_queue+0x156/0x230
[  660.822569][ T3124]  ? hctx_lock+0xc0/0xc0
[  660.826700][ T3124]  ? process_one_work+0x426/0xa70
[  660.831617][ T3124]  blk_mq_run_work_fn+0x3b/0x40
[  660.836358][ T3124]  process_one_work+0x53b/0xa70
[  660.841100][ T3124]  ? pwq_dec_nr_in_flight+0x170/0x170
[  660.846365][ T3124]  worker_thread+0x63/0x5b0
[  660.850756][ T3124]  kthread+0x1df/0x200
[  660.854712][ T3124]  ? process_one_work+0xa70/0xa70
[  660.859626][ T3124]  ? kthread_park+0xc0/0xc0
[  660.864021][ T3124]  ret_from_fork+0x22/0x40
[  660.868328][ T3124] warn_alloc_show_mem: 1 callbacks suppressed
[  660.868332][ T1598] CPU: 10 PID: 1598 Comm: kworker/10:1H Tainted:
G        W         5.2.0-next-20190711+ #3
[  660.868335][ T3124] Mem-Info:
[  660.868485][ T3124] active_anon:4662011 inactive_anon:359383
isolated_anon:2155
[  660.868485][ T3124]  active_file:10012 inactive_file:12922 isolated_file:0
[  660.868485][ T3124]  unevictable:0 dirty:12 writeback:0 unstable:0
[  660.868485][ T3h:175048kB active_anon:1056208kB inactive_anon:209448kB
active_file:19452kB inactive_file:18996kB unevictable:0kB writepending:0kB
present:27262976kB managed:18893712kB mlocked:0kB kernel_stack:22240kB
pagetables:10064kB bounce:0kB free_pcp:8784kB local_pcp:164kB free_cma:0kB
[  661.222532][ T1598]  ? kernel_poison_pages.cold.2+0x8c/0x8c
[  661.228397][ T3124] lowmem_reserve[]: 0 0 0 0 0
[  661.233138][ T1598]  ? vprintk_default+0x1f/0x30
[  661.233146][ T1598]  alloc_pages_current+0x9c/0x110
[  661.238174][ T3124] Node 4 Normal free:71384kB min:234904kB low:267232kB
high:299560kB active_anon:16401776kB inactive_anon:865588kB active_file:20596kB
inactive_file:32692kB unevictable:0kB writepending:40kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:12872kB local_pcp:24kB free_cma:336kB
[  661.266900][ T1598]  allocate_slab+0x351/0x11f0
[  661.266905][ T1598]  new_slab+0x46/0x70
[  661.271461][ T3124] lowmem_reserve[]: 0 0 0 0 0
[  661.275941][ T1598]  ___slab_alloc+0x5d4/0x9c0
[  661.275948][ T1598]  ? should0
[  661.543007][ T3132]   cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[  661.543011][ T3203]   node 0: slabs: 10582, objs: 95238, free: 7
[  661.543016][ T3132]   node 0: slabs: 10582, objs: 95238, free: 7
[  661.543020][ T3203]   node 4: slabs: 2293, objs: 20637, free: 30
[  661.543026][ T3132]   node 4: slabs: 2293, objs: 20637, free: 30
[  661.543040][ T3203] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[  661.543046][ T3203]   cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[  661.543052][ T3203]   node 0: slabs: 10582, objs: 95238, free: 7
[  661.543057][ T3132] SLUB: Unable to allocate memory on node -1,
gfp=0xa20(GFP_ATOMIC)
[  661.543061][ T3203]   node 4: slabs: 2293, objs: 20637, free: 30
[  661.543066][ T3132]   cache: iommu_iova, object size: 40, buffer size: 448,
default order: 0, min order: 0
[  661.543072][ T3132]   node 0: slabs: 10582, objs: 95238, free: 7
[  661.543078][ T3132]   node 4: slabs: 2293, objs: 20637, free: 30
[  661.543544][ T3205] SLUB: Unable to allocnevictable:0kB isolated(anon):352kB
isolated(file):0kB mapped:45056kB dirty:40kB writeback:52kB shmem:6028kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 15167488kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[  662.181289][ T1598] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  662.207607][ T3209]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  662.212434][ T1598] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  662.238751][ T3209]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  662.244187][ T1598] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(ano  alloc_iova_fast+0x47/0xba
[  662.835750][ T3209]  dma_ops_alloc_iova.isra.5+0x86/0xa0
[  662.841103][ T3209]  map_sg+0x99/0x2f0
[  662.844886][ T3209]  ? kasan_check_read+0x11/0x20
[  662.849627][ T3209]  scsi_dma_map+0xc6/0x160
[  662.853938][ T3209]  pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  662.861740][ T3209]  ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[  662.867971][ T3209]  pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  662.874198][ T3209]  ? scsi_init_io+0x102/0x150
[  662.878768][ T3209]  ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[  662.885176][ T3209]  ? pqi_event_worker+0xdf0/0xdf0 [smartpqi]
[  662.891055][ T3209]  ? sd_init_command+0x88b/0x930 [sd_mod]
[  662.896672][ T3209]  ? blk_add_timer+0xd7/0x110
[  662.901240][ T3209]  scsi_queue_rq+0x7c6/0x1280
[  662.905807][ T3209]  blk_mq_dispatch_rq_list+0x9d3/0xba0
[  662.911159][ T3209]  ? blk_mq_flush_busy_ctxs+0x1c5/0x450
[  662.916601][ T3209]  ? blk_mq_get_driver_tag+0x290/0x290
[  662.921953][ T3209]  ? __lock_acquire.isra.13+0x430/0x830
[  662.927394][ T3209]  blk_mq_sched_diag+0x290/0x290
[  663.313403][ T3146]  ? __lock_acquire.isra.13+0x430/0x830
[  663.318844][ T3146]  blk_mq_sched_dispatch_requests+0x2f4/0x300
[  663.324807][ T3146]  ? blk_mq_sched_restart+0x60/0x60
[  663.329898][ T3146]  __blk_mq_run_hw_queue+0x156/0x230
[  663.335076][ T3146]  ? hctx_lock+0xc0/0xc0
[  663.339211][ T3146]  ? process_one_work+0x426/0xa70
[  663.344128][ T3146]  blk_mq_run_work_fn+0x3b/0x40
[  663.348870][ T3146]  process_one_work+0x53b/0xa70
[  663.353613][ T3146]  ? pwq_dec_nr_in_flight+0x170/0x170
[  663.358880][ T3146]  worker_thread+0x63/0x5b0
[  663.363277][ T3146]  kthread+0x1df/0x200
[  663.367233][ T3146]  ? process_one_work+0xa70/0xa70
[  663.372148][ T3146]  ? kthread_park+0xc0/0xc0
[  663.376543][ T3146]  ret_from_fork+0x22/0x40
[  663.380848][ T3146] warn_alloc_show_mem: 1 callbacks suppressed
[  663.380855][ T3123] CPU: 1 PID: 3123 Comm: kworker/1:1H Tainted:
G        W         5.2.0-next-20190711+ #3
[  663.380857][ T3146] Mem-Info:
[  663.381000][ T3146] active_anon:4654271 inactive_anon:367023
isolated_anon:2263
[  663.381000T3123]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  663.744691][ T3146] Node 0 Normal free:74264kB min:137264kB low:156156kB
high:175048kB active_anon:1055816kB inactive_anon:209292kB active_file:19416kB
inactive_file:18964kB unevictable:0kB writepending:248kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10064kB
bounce:0kB free_pcp:9356kB local_pcp:124kB free_cma:0kB
[  663.750101][ T3123]  ? lock_downgrade+0x390/0x390
[  663.778942][ T3146] lowmem_reserve[]: 0 0 0 0 0
[  663.783688][ T3123]  ? do_raw_spin_lock+0x118/0x1d0
[  663.789326][ T3146] Node 4 Normal free:81632kB min:234904kB low:267232kB
high:299560kB active_anon:16368972kB inactive_anon:898504kB active_file:20548kB
inactive_file:32468kB unevictable:0kB writepending:104kB present:33538048kB
managed:32332156kB mlocked:0kB kernel_stack:23040kB pagetables:35900kB
bounce:0kB free_pcp:11372kB local_pcp:160kB free_cma:0kB
[  663.794556][ T3123]  ? rwlock_bug.part.0+0x60/0x60
[  663.794563][ T3123]  ? get_partial_node+0x48/0x540
[  663.825936][ T3146] lowmem_reserve[]: 0 0 0 0 0
[  663.830678][ T3123]   #3
[  664.269661][ T3202] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  664.278993][ T3202] Workqueue: kblockd blk_mq_run_work_fn
[  664.284453][ T3202] Call Trace:
[  664.287655][ T3202]  dump_stack+0x62/0x9a
[  664.291721][ T3202]  warn_alloc.cold.45+0x8a/0x12a
[  664.296577][ T3202]  ? zone_watermark_ok_safe+0x1a0/0x1a0
[  664.302044][ T3202]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  664.308564][ T3202]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  664.314996][ T3202]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  664.321420][ T3202]  ? __isolate_free_page+0x390/0x390
[  664.326613][ T3202]  __alloc_pages_nodemask+0x1aab/0x1bc0
[  664.332062][ T3202]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  664.337345][ T3202]  ? stack_trace_save+0x87/0xb0
[  664.342103][ T3202]  ? freezing_slow_path.cold.1+0x35/0x35
[  664.347647][ T3202]  ? __kasan_kmalloc.part.0+0x81/0xc0
[  664.352925][ T3202]  ? __kasan_kmalloc.part.0+0x44/0xc0
[  664.358204][ T3202]  ? __kasan_kmalloc.constprop.1+0xac/0xc0
[  664.363922][ hmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
all_unreclaimable? no
[  664.759472][ T3127]  ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  664.759508][ T3127]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  664.785836][ T3202] Node 4 active_anon:15362196kB inactive_anon:1296156kB
active_file:15052kB inactive_file:17752kB unevictable:0kB isolated(anon):66644kB
isolated(file):112kB mapped:30596kB dirty:0kB writeback:3968kB shmem:1080kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14735360kB writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[  664.789031][ T3127]  ? pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  664.793056][ T3202] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  664.819386][ T3127]  ? __isolate_free_page+0x390/0x390
[  664.819401][ T3127]  __alloc_pages_nodemask+0x1aab/0x1bc0
[  664.824245][ T3202] Node 6 active_anon7]  map_sg+0x99/0x2f0
[  665.159320][ T3202] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[  665.191157][ T3127]  ? kasan_check_read+0x11/0x20
[  665.191176][ T3127]  scsi_dma_map+0xc6/0x160
[  665.195480][ T3202] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[  665.195490][ T3202] Node 4 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[  665.200248][ T3127]  pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  665.204805][ T3202] 69668 total pagecache pages
[  665.209566][ T3127]  ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[  665.213886][ T3202] 65404 pages in swap cache
[  665.228054][ T3127]  pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  665.228074][ T3127]  ? scsi_init_io+0x102/0x150
[  665.232285][ T3202] Swap cache stats: add 486050, delete 428240, find 59/149
[  665.232294][ T3202] Free swap  = 30975484kB
[  665.236832][ T3127]  ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[  665.236858][ T3127]  ? pqi_event_worker+0xdf0/0xdf0 [smar390
[  665.806891][ T3141]  ? lock_downgrade+0x390/0x390
[  665.811664][ T3141]  ? alloc_iova+0x33/0x210
[  665.815987][ T3141]  __slab_alloc+0x12/0x20
[  665.820232][ T3141]  ? __slab_alloc+0x12/0x20
[  665.824654][ T3141]  kmem_cache_alloc+0x32a/0x400
[  665.829413][ T3141]  ? kasan_check_read+0x11/0x20
[  665.834179][ T3141]  ? do_raw_spin_unlock+0xa8/0x140
[  665.839221][ T3141]  alloc_iova+0x33/0x210
[  665.843369][ T3141]  ? iova_rcache_get+0x1a1/0x300
[  665.848225][ T3141]  alloc_iova_fast+0x47/0xba
[  665.852736][ T3141]  dma_ops_alloc_iova.isra.5+0x86/0xa0
[  665.858122][ T3141]  map_sg+0x99/0x2f0
[  665.861957][ T3141]  ? kasan_check_read+0x11/0x20
[  665.866759][ T3141]  scsi_dma_map+0xc6/0x160
[  665.871098][ T3141]  pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470
[smartpqi]
[  665.878918][ T3141]  ? pqi_alloc_io_request+0x11e/0x140 [smartpqi]
[  665.885172][ T3141]  pqi_scsi_queue_command+0x791/0xdd0 [smartpqi]
[  665.891435][ T3141]  ? scsi_init_io+0x102/0x150
[  665.896103][ T3141]  ? sd_setup_read_write_cmnd+0x6e9/0xa90 [sd_mod]
[  665.902619][ T3141]  ? pqie:0kB unevictable:0kB writepending:0kB
present:15996kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  666.300385][ T3141] lowmem_reserve[]: 0 1532 19982 19982 19982
[  666.306395][ T3141] Node 0 DMA32 free:75568kB min:2676kB low:4244kB
high:5812kB active_anon:749752kB inactive_anon:395332kB active_file:128kB
inactive_file:168kB unevictable:0kB writepending:0kB present:1923080kB
managed:1634348kB mlocked:0kB kernel_stack:0kB pagetables:28kB bounce:0kB
free_pcp:55484kB local_pcp:248kB free_cma:0kB
[  666.335894][ T3141] lowmem_reserve[]: 0 0 18450 18450 18450
[  666.341762][ T3141] Node 0 Normal free:52856kB min:52716kB low:71608kB
high:90500kB active_anon:1127696kB inactive_anon:80184kB active_file:492kB
inactive_file:656kB unevictable:0kB writepending:2208kB present:27262976kB
managed:18893712kB mlocked:0kB kernel_stack:22240kB pagetables:10372kB
bounce:0kB free_pcp:12848kB local_pcp:36kB free_cma:0kB
[  666.372602][ T3141] lowmem_reserve[]: 0 0 0 0 0
[  666.377419][ T3141] Node 4 Normal free:234488kB m[  685.274656][ T3456]
list_del corruption. prev->next should be ffffea0022b10098, but was
0000000000000000
[  685.284254][ T3456] ------------[ cut here ]------------
[  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
[  685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
G        W         5.2.0-next-20190711+ #3
[  685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 06/24/2019
[  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
[  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
[  685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
[  685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
ffffffffa2d5d708
[  685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
ffff8888442bd380
[  685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
ffffed1108857a70
[  685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
0000000000000000
[  685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
ffffea0022b10098
[  685.391348][ T3456] FS:  00007fbe26db4700(0000) GS:ffff888844280000(0000)
knlGS:0000000000000000
[  685.400194][ T3456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
00000000001406a0
[  685.414563][ T3456] Call Trace:
[  685.417736][ T3456]  deferred_split_scan+0x337/0x740
[  685.422741][ T3456]  ? split_huge_page_to_list+0xe10/0xe10
[  685.428272][ T3456]  ? __radix_tree_lookup+0x12d/0x1e0
[  685.433453][ T3456]  ? node_tag_get.part.0.constprop.6+0x40/0x40
[  685.439505][ T3456]  do_shrink_slab+0x244/0x5a0
[  685.444071][ T3456]  shrink_slab+0x253/0x440
[  685.448375][ T3456]  ? unregister_shrinker+0x110/0x110
[  685.453551][ T3456]  ? kasan_check_read+0x11/0x20
[  685.458291][ T3456]  ? mem_cgroup_protected+0x20f/0x260
[  685.463555][ T3456]  shrink_node+0x31e/0xa30
[  685.467858][ T3456]  ? shrink_node_memcg+0x1560/0x1560
[  685.473036][ T3456]  ? ktime_get+0x93/0x110
[  685.477250][ T3456]  do_try_to_free_pages+0x22f/0x820
[  685.482338][ T3456]  ? shrink_node+0xa30/0xa30
[  685.486815][ T3456]  ? kasan_check_read+0x11/0x20
[  685.491556][ T3456]  ? check_chain_key+0x1df/0x2e0
[  685.496383][ T3456]  try_to_free_pages+0x242/0x4d0
[  685.501209][ T3456]  ? do_try_to_free_pages+0x820/0x820
[  685.506476][ T3456]  __alloc_pages_nodemask+0x9ce/0x1bc0
[  685.511826][ T3456]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  685.517089][ T3456]  ? kasan_check_read+0x11/0x20
[  685.521826][ T3456]  ? check_chain_key+0x1df/0x2e0
[  685.526657][ T3456]  ? do_anonymous_page+0x343/0xe30
[  685.531658][ T3456]  ? lock_downgrade+0x390/0x390
[  685.536399][ T3456]  ? get_kernel_page+0xa0/0xa0
[  685.541050][ T3456]  ? __lru_cache_add+0x108/0x160
[  685.545879][ T3456]  alloc_pages_vma+0x89/0x2c0
[  685.550444][ T3456]  do_anonymous_page+0x3e1/0xe30
[  685.555271][ T3456]  ? __update_load_avg_cfs_rq+0x2c/0x490
[  685.560796][ T3456]  ? finish_fault+0x120/0x120
[  685.565361][ T3456]  ? alloc_pages_vma+0x21e/0x2c0
[  685.570187][ T3456]  handle_pte_fault+0x457/0x12c0
[  685.575014][ T3456]  __handle_mm_fault+0x79a/0xa50
[  685.579841][ T3456]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[  685.585280][ T3456]  ? kasan_check_read+0x11/0x20
[  685.590021][ T3456]  ? __count_memcg_events+0x8b/0x1c0
[  685.595196][ T3456]  handle_mm_fault+0x17f/0x370
[  685.599850][ T3456]  __do_page_fault+0x25b/0x5d0
[  685.604501][ T3456]  do_page_fault+0x4c/0x2cf
[  685.608892][ T3456]  ? page_fault+0x5/0x20
[  685.613019][ T3456]  page_fault+0x1b/0x20
[  685.617058][ T3456] RIP: 0033:0x410be0
[  685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[  68[  687.120156][ T3456] Shutting down cpus with NMI
[  687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-24 21:13 ` Qian Cai
@ 2019-07-25 21:46   ` Yang Shi
  2019-08-05 22:15     ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-07-25 21:46 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/24/19 2:13 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>> Running LTP oom01 test case with swap triggers a crash below. Revert the
>> series
>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
> You might want to look harder at this commit, as reverting it alone on top of
> 5.2.0-next-20190711 fixed the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
> linux.alibaba.com/

This is the real meat of the patch series; it is the commit that
actually converts the deferred split queue to a per-memcg queue.

>
>
> list_del corruption. prev->next should be ffffea0022b10098, but was
> 0000000000000000

I could finally reproduce the list corruption issue on my machine with
THP swap (the swap device is a fast device). I should have checked this
with you in the first place. The problem can't be reproduced with a
rotating swap device, so I suppose you were using THP swap too.

Actually, I found two issues with THP swap:
1. free_transhuge_page() is called in the reclaim path instead of via
put_page(). mem_cgroup_uncharge() is called before free_transhuge_page()
in the reclaim path, which leaves page->mem_cgroup NULL, so the wrong
deferred_split_queue is used and the THP is never deleted from the
memcg's list at all. The page might then be split or reused later, and
page->mapping would be overwritten.
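
To make issue 1 concrete, here is a minimal sketch of the queue lookup
the series introduces (assuming CONFIG_MEMCG is enabled; a simplified
illustration, not the verbatim patch):

#include <linux/huge_mm.h>
#include <linux/memcontrol.h>
#include <linux/mmzone.h>

/* Simplified sketch of the per-memcg queue selection; illustration only. */
static struct deferred_split *get_deferred_split_queue(struct page *page)
{
	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
	struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));

	/*
	 * Once mem_cgroup_uncharge() has run, memcg is NULL and we fall
	 * back to the node queue -- but the THP may still be linked on
	 * the memcg queue, whose lock we then never take again.
	 */
	if (memcg)
		return &memcg->deferred_split_queue;
	else
		return &pgdat->deferred_split_queue;
}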

2. There is a race condition caused by try_to_unmap() with THP swap. In
the reclaim path, try_to_unmap() calls page_remove_rmap(), which adds
the THP to the deferred split queue. That allows the interleaving below
to corrupt the list:

                   A                                      B
deferred_split_scan
     list_move
                                                try_to_unmap
                                                       list_add_tail

list_splice <-- The list might get corrupted here

                                                free_transhuge_page
                                                       list_del <-- kernel bug triggered
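
To spell out the two sides of that diagram, here is a hedged sketch of
the list motion only (names follow mainline, but the bodies are
illustrative; it reuses the get_deferred_split_queue() sketch above and
the page_deferred_list() helper from mm/huge_memory.c):

#include <linux/huge_mm.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* CPU A: deferred_split_scan(), reduced to its list motion. */
static void scan_side_sketch(struct deferred_split *ds_queue)
{
	unsigned long flags;
	LIST_HEAD(list);

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	/* the "list_move" step: take the queued THPs private */
	list_splice_init(&ds_queue->split_queue, &list);
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	/* ... split_huge_page_to_list() attempts happen here ... */

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	/* the "list_splice" step: put the leftovers back */
	list_splice_tail(&list, &ds_queue->split_queue);
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
}

/*
 * CPU B: try_to_unmap() -> page_remove_rmap() ->
 * deferred_split_huge_page(), reduced to its list motion.  When
 * page->mem_cgroup has already been cleared (issue 1), the lookup
 * below resolves a different queue/lock than CPU A is holding, so
 * this list_add_tail() is effectively unsynchronized against A's
 * splice above.
 */
static void unmap_side_sketch(struct page *page)
{
	struct deferred_split *ds_queue = get_deferred_split_queue(page);
	unsigned long flags;

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	if (list_empty(page_deferred_list(page)))
		list_add_tail(page_deferred_list(page),
			      &ds_queue->split_queue);
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
}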

I hope the patch below solves your problem (tested locally).


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..d6612ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)

         VM_BUG_ON_PAGE(!PageTransHuge(page), page);

+       /*
+        * The try_to_unmap() in page reclaim path might reach here too,
+        * this may cause a race condition to corrupt deferred split queue.
+        * And, if page reclaim is already handling the same page, it is
+        * unnecessary to handle it again in shrinker.
+        *
+        * Check PageSwapCache to determine if the page is being
+        * handled by page reclaim since THP swap would add the page into
+        * swap cache before reaching try_to_unmap().
+        */
+       if (PageSwapCache(page))
+               return;
+
         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
         if (list_empty(page_deferred_list(page))) {
                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..40c684a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                  * Is there need to periodically free_page_list? It would
                  * appear not as the counts should be low
                  */
-               if (unlikely(PageTransHuge(page))) {
-                       mem_cgroup_uncharge(page);
+               if (unlikely(PageTransHuge(page)))
                         (*get_compound_page_dtor(page))(page);
-               } else
+               else
                         list_add(&page->lru, &free_pages);
                 continue;

@@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,

                        if (unlikely(PageCompound(page))) {
                                spin_unlock_irq(&pgdat->lru_lock);
-                               mem_cgroup_uncharge(page);
                                (*get_compound_page_dtor(page))(page);
                                spin_lock_irq(&pgdat->lru_lock);
                        } else
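
For context on why the PageSwapCache() test above is a reliable marker:
in shrink_page_list(), an anonymous page is added to the swap cache
before reclaim unmaps it, so the flag is already set by the time
try_to_unmap() can reach deferred_split_huge_page(). A heavily abridged
sketch of that ordering (following the mainline structure, with most
branches and error handling omitted):

#include <linux/rmap.h>
#include <linux/swap.h>

static void reclaim_ordering_sketch(struct page *page)
{
	if (PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page)) {
		/*
		 * add_to_swap() inserts the page into the swap cache;
		 * PageSwapCache() is true from here on -- with THP swap
		 * this covers the whole huge page.
		 */
		if (!add_to_swap(page))
			return;	/* "activate_locked" in the real code */
	}

	if (page_mapped(page)) {
		/*
		 * try_to_unmap() -> page_remove_rmap() ->
		 * deferred_split_huge_page(); with the fix above this
		 * now bails out early because PageSwapCache() is set.
		 */
		if (!try_to_unmap(page, TTU_BATCH_FLUSH))
			return;
	}
}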

> [  685.284254][ T3456] ------------[ cut here ]------------
> [  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
> [  685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
> G        W         5.2.0-next-20190711+ #3
> [  685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 06/24/2019
> [  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
> [  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
> [  685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
> [  685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
> ffffffffa2d5d708
> [  685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
> ffff8888442bd380
> [  685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
> ffffed1108857a70
> [  685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
> 0000000000000000
> [  685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
> ffffea0022b10098
> [  685.391348][ T3456] FS:  00007fbe26db4700(0000) GS:ffff888844280000(0000)
> knlGS:0000000000000000
> [  685.400194][ T3456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
> 00000000001406a0
> [  685.414563][ T3456] Call Trace:
> [  685.417736][ T3456]  deferred_split_scan+0x337/0x740
> [  685.422741][ T3456]  ? split_huge_page_to_list+0xe10/0xe10
> [  685.428272][ T3456]  ? __radix_tree_lookup+0x12d/0x1e0
> [  685.433453][ T3456]  ? node_tag_get.part.0.constprop.6+0x40/0x40
> [  685.439505][ T3456]  do_shrink_slab+0x244/0x5a0
> [  685.444071][ T3456]  shrink_slab+0x253/0x440
> [  685.448375][ T3456]  ? unregister_shrinker+0x110/0x110
> [  685.453551][ T3456]  ? kasan_check_read+0x11/0x20
> [  685.458291][ T3456]  ? mem_cgroup_protected+0x20f/0x260
> [  685.463555][ T3456]  shrink_node+0x31e/0xa30
> [  685.467858][ T3456]  ? shrink_node_memcg+0x1560/0x1560
> [  685.473036][ T3456]  ? ktime_get+0x93/0x110
> [  685.477250][ T3456]  do_try_to_free_pages+0x22f/0x820
> [  685.482338][ T3456]  ? shrink_node+0xa30/0xa30
> [  685.486815][ T3456]  ? kasan_check_read+0x11/0x20
> [  685.491556][ T3456]  ? check_chain_key+0x1df/0x2e0
> [  685.496383][ T3456]  try_to_free_pages+0x242/0x4d0
> [  685.501209][ T3456]  ? do_try_to_free_pages+0x820/0x820
> [  685.506476][ T3456]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [  685.511826][ T3456]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [  685.517089][ T3456]  ? kasan_check_read+0x11/0x20
> [  685.521826][ T3456]  ? check_chain_key+0x1df/0x2e0
> [  685.526657][ T3456]  ? do_anonymous_page+0x343/0xe30
> [  685.531658][ T3456]  ? lock_downgrade+0x390/0x390
> [  685.536399][ T3456]  ? get_kernel_page+0xa0/0xa0
> [  685.541050][ T3456]  ? __lru_cache_add+0x108/0x160
> [  685.545879][ T3456]  alloc_pages_vma+0x89/0x2c0
> [  685.550444][ T3456]  do_anonymous_page+0x3e1/0xe30
> [  685.555271][ T3456]  ? __update_load_avg_cfs_rq+0x2c/0x490
> [  685.560796][ T3456]  ? finish_fault+0x120/0x120
> [  685.565361][ T3456]  ? alloc_pages_vma+0x21e/0x2c0
> [  685.570187][ T3456]  handle_pte_fault+0x457/0x12c0
> [  685.575014][ T3456]  __handle_mm_fault+0x79a/0xa50
> [  685.579841][ T3456]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [  685.585280][ T3456]  ? kasan_check_read+0x11/0x20
> [  685.590021][ T3456]  ? __count_memcg_events+0x8b/0x1c0
> [  685.595196][ T3456]  handle_mm_fault+0x17f/0x370
> [  685.599850][ T3456]  __do_page_fault+0x25b/0x5d0
> [  685.604501][ T3456]  do_page_fault+0x4c/0x2cf
> [  685.608892][ T3456]  ? page_fault+0x5/0x20
> [  685.613019][ T3456]  page_fault+0x1b/0x20
> [  685.617058][ T3456] RIP: 0033:0x410be0
> [  685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [  68[  687.120156][ T3456] Shutting down cpus with NMI
> [  687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-07-25 21:46   ` Yang Shi
@ 2019-08-05 22:15     ` Yang Shi
  2019-08-06  1:05       ` Qian Cai
  0 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2019-08-05 22:15 UTC (permalink / raw)
  To: Qian Cai; +Cc: Kirill A. Shutemov, akpm, linux-mm, linux-kernel



On 7/25/19 2:46 PM, Yang Shi wrote:
>
>
> On 7/24/19 2:13 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>>> Running LTP oom01 test case with swap triggers a crash below. Revert 
>>> the
>>> series
>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>> You might want to look harder at this commit, as reverting it alone
>> on top of
>> 5.2.0-next-20190711 fixed the issue.
>>
>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>>
>> [1] 
>> https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
>> linux.alibaba.com/
>
> This is the real meat of the patch series; it is the commit that
> actually converts the deferred split queue to a per-memcg queue.
>
>>
>>
>> list_del corruption. prev->next should be ffffea0022b10098, but was
>> 0000000000000000
>
> I could finally reproduce the list corruption issue on my machine with
> THP swap (the swap device is a fast device). I should have checked this
> with you in the first place. The problem can't be reproduced with a
> rotating swap device, so I suppose you were using THP swap too.
>
> Actually, I found two issues with THP swap:
> 1. free_transhuge_page() is called in the reclaim path instead of via
> put_page(). mem_cgroup_uncharge() is called before
> free_transhuge_page() in the reclaim path, which leaves
> page->mem_cgroup NULL, so the wrong deferred_split_queue is used and
> the THP is never deleted from the memcg's list at all. The page might
> then be split or reused later, and page->mapping would be overwritten.
>
> 2. There is a race condition caused by try_to_unmap() with THP swap.
> In the reclaim path, try_to_unmap() calls page_remove_rmap(), which
> adds the THP to the deferred split queue. That allows the interleaving
> below to corrupt the list:
>
>                   A                                      B
> deferred_split_scan
>     list_move
>                                                try_to_unmap
>                                                       list_add_tail
>
> list_splice <-- The list might get corrupted here
>
>                                                free_transhuge_page
>                                                       list_del <-- kernel bug triggered
>
> I hope the below patch would solve your problem (tested locally).

Hi Qian,

Did the patch below solve your problem? I would like to fold the fix
into the series and target the 5.4 release.

Thanks,
Yang

>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..d6612ec 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
>
>         VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>
> +       /*
> +        * The try_to_unmap() in page reclaim path might reach here too,
> +        * this may cause a race condition to corrupt deferred split queue.
> +        * And, if page reclaim is already handling the same page, it is
> +        * unnecessary to handle it again in shrinker.
> +        *
> +        * Check PageSwapCache to determine if the page is being
> +        * handled by page reclaim since THP swap would add the page into
> +        * swap cache before reaching try_to_unmap().
> +        */
> +       if (PageSwapCache(page))
> +               return;
> +
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (list_empty(page_deferred_list(page))) {
>                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..40c684a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>                  * Is there need to periodically free_page_list? It would
>                  * appear not as the counts should be low
>                  */
> -               if (unlikely(PageTransHuge(page))) {
> -                       mem_cgroup_uncharge(page);
> +               if (unlikely(PageTransHuge(page)))
>                         (*get_compound_page_dtor(page))(page);
> -               } else
> +               else
>                         list_add(&page->lru, &free_pages);
>                 continue;
>
> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>
>                         if (unlikely(PageCompound(page))) {
>                                 spin_unlock_irq(&pgdat->lru_lock);
> -                               mem_cgroup_uncharge(page);
>                                 (*get_compound_page_dtor(page))(page);
>                                 spin_lock_irq(&pgdat->lru_lock);
>                         } else
>
>> [  685.284254][ T3456] ------------[ cut here ]------------
>> [  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
>> [  685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC 
>> KASAN NOPTI
>> [  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
>> G        W         5.2.0-next-20190711+ #3
>> [  685.311193][ T3456] Hardware name: HPE ProLiant DL385 
>> Gen10/ProLiant DL385
>> Gen10, BIOS A40 06/24/2019
>> [  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
>> [  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b 
>> b8 01 00 00
>> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa 
>> bc ff <0f>
>> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
>> [  685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
>> [  685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
>> ffffffffa2d5d708
>> [  685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
>> ffff8888442bd380
>> [  685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
>> ffffed1108857a70
>> [  685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
>> 0000000000000000
>> [  685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
>> ffffea0022b10098
>> [  685.391348][ T3456] FS:  00007fbe26db4700(0000) 
>> GS:ffff888844280000(0000)
>> knlGS:0000000000000000
>> [  685.400194][ T3456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
>> 00000000001406a0
>> [  685.414563][ T3456] Call Trace:
>> [  685.417736][ T3456]  deferred_split_scan+0x337/0x740
>> [  685.422741][ T3456]  ? split_huge_page_to_list+0xe10/0xe10
>> [  685.428272][ T3456]  ? __radix_tree_lookup+0x12d/0x1e0
>> [  685.433453][ T3456]  ? node_tag_get.part.0.constprop.6+0x40/0x40
>> [  685.439505][ T3456]  do_shrink_slab+0x244/0x5a0
>> [  685.444071][ T3456]  shrink_slab+0x253/0x440
>> [  685.448375][ T3456]  ? unregister_shrinker+0x110/0x110
>> [  685.453551][ T3456]  ? kasan_check_read+0x11/0x20
>> [  685.458291][ T3456]  ? mem_cgroup_protected+0x20f/0x260
>> [  685.463555][ T3456]  shrink_node+0x31e/0xa30
>> [  685.467858][ T3456]  ? shrink_node_memcg+0x1560/0x1560
>> [  685.473036][ T3456]  ? ktime_get+0x93/0x110
>> [  685.477250][ T3456]  do_try_to_free_pages+0x22f/0x820
>> [  685.482338][ T3456]  ? shrink_node+0xa30/0xa30
>> [  685.486815][ T3456]  ? kasan_check_read+0x11/0x20
>> [  685.491556][ T3456]  ? check_chain_key+0x1df/0x2e0
>> [  685.496383][ T3456]  try_to_free_pages+0x242/0x4d0
>> [  685.501209][ T3456]  ? do_try_to_free_pages+0x820/0x820
>> [  685.506476][ T3456]  __alloc_pages_nodemask+0x9ce/0x1bc0
>> [  685.511826][ T3456]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>> [  685.517089][ T3456]  ? kasan_check_read+0x11/0x20
>> [  685.521826][ T3456]  ? check_chain_key+0x1df/0x2e0
>> [  685.526657][ T3456]  ? do_anonymous_page+0x343/0xe30
>> [  685.531658][ T3456]  ? lock_downgrade+0x390/0x390
>> [  685.536399][ T3456]  ? get_kernel_page+0xa0/0xa0
>> [  685.541050][ T3456]  ? __lru_cache_add+0x108/0x160
>> [  685.545879][ T3456]  alloc_pages_vma+0x89/0x2c0
>> [  685.550444][ T3456]  do_anonymous_page+0x3e1/0xe30
>> [  685.555271][ T3456]  ? __update_load_avg_cfs_rq+0x2c/0x490
>> [  685.560796][ T3456]  ? finish_fault+0x120/0x120
>> [  685.565361][ T3456]  ? alloc_pages_vma+0x21e/0x2c0
>> [  685.570187][ T3456]  handle_pte_fault+0x457/0x12c0
>> [  685.575014][ T3456]  __handle_mm_fault+0x79a/0xa50
>> [  685.579841][ T3456]  ? vmf_insert_mixed_mkwrite+0x20/0x20
>> [  685.585280][ T3456]  ? kasan_check_read+0x11/0x20
>> [  685.590021][ T3456]  ? __count_memcg_events+0x8b/0x1c0
>> [  685.595196][ T3456]  handle_mm_fault+0x17f/0x370
>> [  685.599850][ T3456]  __do_page_fault+0x25b/0x5d0
>> [  685.604501][ T3456]  do_page_fault+0x4c/0x2cf
>> [  685.608892][ T3456]  ? page_fault+0x5/0x20
>> [  685.613019][ T3456]  page_fault+0x1b/0x20
>> [  685.617058][ T3456] RIP: 0033:0x410be0
>> [  685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 
>> 86 00 00 00
>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 
>> 98 90 <c6>
>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>> [  68[  687.120156][ T3456] Shutting down cpus with NMI
>> [  687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [  687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal 
>> exception ]---
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: list corruption in deferred_split_scan()
  2019-08-05 22:15     ` Yang Shi
@ 2019-08-06  1:05       ` Qian Cai
  0 siblings, 0 replies; 22+ messages in thread
From: Qian Cai @ 2019-08-06  1:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kirill A. Shutemov, Andrew Morton, Linux-MM, Linux List Kernel Mailing



> On Aug 5, 2019, at 6:15 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> 
> 
> On 7/25/19 2:46 PM, Yang Shi wrote:
>> 
>> 
>> On 7/24/19 2:13 PM, Qian Cai wrote:
>>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>>>> Running LTP oom01 test case with swap triggers a crash below. Revert the
>>>> series
>>>> "Make deferred split shrinker memcg aware" [1] seems fix the issue.
>>> You might want to look harder at this commit, as reverting it alone on top of
>>> 5.2.0-next-20190711 fixed the issue.
>>> 
>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>>> 
>>> [1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@
>>> linux.alibaba.com/
>> 
>> This is the real meat of the patch series; it is the commit that actually converts the deferred split queue to a per-memcg queue.
>> 
>>> 
>>> 
>>> list_del corruption. prev->next should be ffffea0022b10098, but was
>>> 0000000000000000
>> 
>> I could finally reproduce the list corruption issue on my machine with THP swap (the swap device is a fast device). I should have checked this with you in the first place. The problem can't be reproduced with a rotating swap device, so I suppose you were using THP swap too.
>> 
>> Actually, I found two issues with THP swap:
>> 1. free_transhuge_page() is called in the reclaim path instead of via put_page(). mem_cgroup_uncharge() is called before free_transhuge_page() in the reclaim path, which leaves page->mem_cgroup NULL, so the wrong deferred_split_queue is used and the THP is never deleted from the memcg's list at all. The page might then be split or reused later, and page->mapping would be overwritten.
>> 
>> 2. There is a race condition caused by try_to_unmap() with THP swap. In the reclaim path, try_to_unmap() calls page_remove_rmap(), which adds the THP to the deferred split queue. That allows the interleaving below to corrupt the list:
>> 
>>                   A                                      B
>> deferred_split_scan
>>     list_move
>>                                                try_to_unmap
>>                                                       list_add_tail
>> 
>> list_splice <-- The list might get corrupted here
>> 
>>                                                free_transhuge_page
>>                                                       list_del <-- kernel bug triggered
>> 
>> I hope the patch below solves your problem (tested locally).
> 
> Hi Qian,
> 
> Did the patch below solve your problem? I would like to fold the fix into the series and target the 5.4 release.

It is going to take a while before I will be able to access that system again. Since you
can reproduce this and test it yourself now, I'd say go ahead and post the patch.


> 
> Thanks,
> Yang
> 
>> 
>> 
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b7f709d..d6612ec 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
>> 
>>         VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>> 
>> +       /*
>> +        * The try_to_unmap() in page reclaim path might reach here too,
>> +        * this may cause a race condition to corrupt deferred split queue.
>> +        * And, if page reclaim is already handling the same page, it is
>> +        * unnecessary to handle it again in shrinker.
>> +        *
>> +        * Check PageSwapCache to determine if the page is being
>> +        * handled by page reclaim since THP swap would add the page into
>> +        * swap cache before reaching try_to_unmap().
>> +        */
>> +       if (PageSwapCache(page))
>> +               return;
>> +
>>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>         if (list_empty(page_deferred_list(page))) {
>>                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..40c684a 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>                  * Is there need to periodically free_page_list? It would
>>                  * appear not as the counts should be low
>>                  */
>> -               if (unlikely(PageTransHuge(page))) {
>> -                       mem_cgroup_uncharge(page);
>> +               if (unlikely(PageTransHuge(page)))
>>                         (*get_compound_page_dtor(page))(page);
>> -               } else
>> +               else
>>                         list_add(&page->lru, &free_pages);
>>                 continue;
>> 
>> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>> 
>>                         if (unlikely(PageCompound(page))) {
>>                                 spin_unlock_irq(&pgdat->lru_lock);
>> -                               mem_cgroup_uncharge(page);
>>                                 (*get_compound_page_dtor(page))(page);
>>                                 spin_lock_irq(&pgdat->lru_lock);
>>                         } else
>> 
>>> [  685.284254][ T3456] ------------[ cut here ]------------
>>> [  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
>>> [  685.294808][ T3456] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>>> [  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted:
>>> G        W         5.2.0-next-20190711+ #3
>>> [  685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
>>> Gen10, BIOS A40 06/24/2019
>>> [  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
>>> [  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00
>>> 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f>
>>> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
>>> [  685.345956][ T3456] RSP: 0018:ffff888e0c8a73c0 EFLAGS: 00010082
>>> [  685.351920][ T3456] RAX: 0000000000000054 RBX: ffffea0022b10098 RCX:
>>> ffffffffa2d5d708
>>> [  685.359807][ T3456] RDX: 0000000000000000 RSI: 0000000000000008 RDI:
>>> ffff8888442bd380
>>> [  685.367693][ T3456] RBP: ffff888e0c8a73d8 R08: ffffed1108857a71 R09:
>>> ffffed1108857a70
>>> [  685.375577][ T3456] R10: ffffed1108857a70 R11: ffff8888442bd387 R12:
>>> 0000000000000000
>>> [  685.383462][ T3456] R13: 0000000000000000 R14: ffffea0022b10034 R15:
>>> ffffea0022b10098
>>> [  685.391348][ T3456] FS:  00007fbe26db4700(0000) GS:ffff888844280000(0000)
>>> knlGS:0000000000000000
>>> [  685.400194][ T3456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  685.406681][ T3456] CR2: 00007fbcabb3f000 CR3: 0000001012e44000 CR4:
>>> 00000000001406a0
>>> [  685.414563][ T3456] Call Trace:
>>> [  685.417736][ T3456]  deferred_split_scan+0x337/0x740
>>> [  685.422741][ T3456]  ? split_huge_page_to_list+0xe10/0xe10
>>> [  685.428272][ T3456]  ? __radix_tree_lookup+0x12d/0x1e0
>>> [  685.433453][ T3456]  ? node_tag_get.part.0.constprop.6+0x40/0x40
>>> [  685.439505][ T3456]  do_shrink_slab+0x244/0x5a0
>>> [  685.444071][ T3456]  shrink_slab+0x253/0x440
>>> [  685.448375][ T3456]  ? unregister_shrinker+0x110/0x110
>>> [  685.453551][ T3456]  ? kasan_check_read+0x11/0x20
>>> [  685.458291][ T3456]  ? mem_cgroup_protected+0x20f/0x260
>>> [  685.463555][ T3456]  shrink_node+0x31e/0xa30
>>> [  685.467858][ T3456]  ? shrink_node_memcg+0x1560/0x1560
>>> [  685.473036][ T3456]  ? ktime_get+0x93/0x110
>>> [  685.477250][ T3456]  do_try_to_free_pages+0x22f/0x820
>>> [  685.482338][ T3456]  ? shrink_node+0xa30/0xa30
>>> [  685.486815][ T3456]  ? kasan_check_read+0x11/0x20
>>> [  685.491556][ T3456]  ? check_chain_key+0x1df/0x2e0
>>> [  685.496383][ T3456]  try_to_free_pages+0x242/0x4d0
>>> [  685.501209][ T3456]  ? do_try_to_free_pages+0x820/0x820
>>> [  685.506476][ T3456]  __alloc_pages_nodemask+0x9ce/0x1bc0
>>> [  685.511826][ T3456]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
>>> [  685.517089][ T3456]  ? kasan_check_read+0x11/0x20
>>> [  685.521826][ T3456]  ? check_chain_key+0x1df/0x2e0
>>> [  685.526657][ T3456]  ? do_anonymous_page+0x343/0xe30
>>> [  685.531658][ T3456]  ? lock_downgrade+0x390/0x390
>>> [  685.536399][ T3456]  ? get_kernel_page+0xa0/0xa0
>>> [  685.541050][ T3456]  ? __lru_cache_add+0x108/0x160
>>> [  685.545879][ T3456]  alloc_pages_vma+0x89/0x2c0
>>> [  685.550444][ T3456]  do_anonymous_page+0x3e1/0xe30
>>> [  685.555271][ T3456]  ? __update_load_avg_cfs_rq+0x2c/0x490
>>> [  685.560796][ T3456]  ? finish_fault+0x120/0x120
>>> [  685.565361][ T3456]  ? alloc_pages_vma+0x21e/0x2c0
>>> [  685.570187][ T3456]  handle_pte_fault+0x457/0x12c0
>>> [  685.575014][ T3456]  __handle_mm_fault+0x79a/0xa50
>>> [  685.579841][ T3456]  ? vmf_insert_mixed_mkwrite+0x20/0x20
>>> [  685.585280][ T3456]  ? kasan_check_read+0x11/0x20
>>> [  685.590021][ T3456]  ? __count_memcg_events+0x8b/0x1c0
>>> [  685.595196][ T3456]  handle_mm_fault+0x17f/0x370
>>> [  685.599850][ T3456]  __do_page_fault+0x25b/0x5d0
>>> [  685.604501][ T3456]  do_page_fault+0x4c/0x2cf
>>> [  685.608892][ T3456]  ? page_fault+0x5/0x20
>>> [  685.613019][ T3456]  page_fault+0x1b/0x20
>>> [  685.617058][ T3456] RIP: 0033:0x410be0
>>> [  685.620840][ T3456] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00
>>> 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 <c6>
>>> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
>>> [  68[  687.120156][ T3456] Shutting down cpus with NMI
>>> [  687.124731][ T3456] Kernel Offset: 0x21800000 from 0xffffffff81000000
>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [  687.136389][ T3456] ---[ end Kernel panic - not syncing: Fatal exception ]---
>> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-08-06  1:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-10 21:43 list corruption in deferred_split_scan() Qian Cai
2019-07-11  0:16 ` Yang Shi
2019-07-11 21:07   ` Qian Cai
2019-07-12 19:12     ` Yang Shi
2019-07-13  4:41       ` Yang Shi
2019-07-15 21:23       ` Qian Cai
2019-07-16  0:22         ` Yang Shi
2019-07-16  1:36           ` Qian Cai
2019-07-16  3:00             ` Yang Shi
2019-07-16 23:36               ` Shakeel Butt
2019-07-17  0:12                 ` Yang Shi
2019-07-17 17:02                   ` Shakeel Butt
2019-07-17 17:09                     ` Yang Shi
2019-07-19  0:54       ` Qian Cai
2019-07-19  0:59         ` Yang Shi
2019-07-24 18:10           ` Qian Cai
2019-07-14  3:53 ` Hillf Danton
2019-07-15  4:52 ` Yang Shi
2019-07-24 21:13 ` Qian Cai
2019-07-25 21:46   ` Yang Shi
2019-08-05 22:15     ` Yang Shi
2019-08-06  1:05       ` Qian Cai
